Solved

Data Collector - Delta Lake

  • 24 December 2021
  • 3 replies
  • 53 views


Hi

We are using StreamSets provisioned from the Google Cloud Marketplace.

We are trying to create a data pipeline with a Kafka topic as the origin and Delta Lake as the destination.

While setting it up, we observed that "Staging Location" requires either AWS S3 or Azure Storage; Google Cloud Storage and other alternatives are not listed.

We do not have an AWS or Azure account.

Is it mandatory for any Delta Lake ingestion to have AWS or Azure storage, even if our application runs on neither?


Best answer by alex.sanchez 24 December 2021, 08:52


3 replies


Hi @onlinepk ,

 

Unfortunately, our Delta Lake connector works in batches (to improve performance and reduce costs): we push each batch of data to the staging area (in Azure or AWS) and from there copy it directly into the destination table.
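For context, the staged-copy pattern described above can be sketched roughly like this. This is a hypothetical illustration, not StreamSets code: a local temp directory stands in for the S3/Azure staging area, and a plain list stands in for the Delta table that a warehouse-side bulk load (such as `COPY INTO`) would append to.

```python
import csv
import os
import tempfile

def stage_batch(records, staging_dir):
    # Step 1: write a batch of records to the staging area as a file.
    # In the real connector this would be an object uploaded to S3/Azure.
    path = os.path.join(staging_dir, "batch_0001.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(records)
    return path

def copy_into_table(staged_path, table):
    # Step 2: stand-in for the warehouse-side bulk load (e.g. COPY INTO):
    # read the staged file and append its rows to the destination "table".
    with open(staged_path, newline="") as f:
        table.extend(tuple(row) for row in csv.reader(f))

staging = tempfile.mkdtemp()   # local stand-in for the staging bucket
table = []                     # stand-in for the Delta destination table

path = stage_batch([("1", "alice"), ("2", "bob")], staging)
copy_into_table(path, table)
print(table)  # [('1', 'alice'), ('2', 'bob')]
```

The key point is that the staging area is an intermediate hop, not optional: the destination pulls data from that storage, which is why the connector currently requires AWS or Azure.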

If you are a customer, please consider opening a feature request so that we can consider adding GCP as an option.

Thanks


Thanks, Alex, for the prompt response.

Are there any alternatives or workarounds for this?

Additionally, we have subscribed to StreamSets through the Google Cloud Marketplace, so I am hoping that makes us a "customer" eligible to raise a feature request.


Hi @onlinepk,

 

Unfortunately, there is no way to replace the staging step with other functionality.

Please reach out to support to get that feature request created.
