Solved

Data Collector - Delta Lake

  • December 24, 2021
  • 3 replies
  • 56 views

onlinepk
Fan

Hi

We are using StreamSets provisioned from the Google Cloud Marketplace.

We are trying to create a data pipeline with a Kafka topic as the origin and Delta Lake as the destination.

While setting it up, we observed that "Staging Location" requires AWS S3 or Azure Storage; Google Cloud Storage and other alternatives are not offered.

We do not have an AWS or Azure account.

Is it mandatory for any Delta Lake ingestion to have AWS or Azure storage, even though our application may run in neither?

Best answer by alex.sanchez


3 replies

alex.sanchez
StreamSets Employee
  • 73 replies
  • December 24, 2021

Hi @onlinepk ,

 

Unfortunately, our Delta Lake connection works in batches (to improve performance and reduce costs): we push a batch of data to the staging area (in Azure or AWS) and from there copy it directly to the destination table.
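To illustrate the two-phase pattern described above, here is a minimal toy sketch in Python. Local directories stand in for the cloud staging bucket and the Delta table, and the function names are hypothetical illustrations, not StreamSets or Delta Lake APIs (a real pipeline would issue a bulk load such as `COPY INTO` against the table in phase 2):

```python
import json
import shutil
from pathlib import Path

def stage_batch(records, staging_dir: Path, batch_id: int) -> Path:
    """Phase 1: write a whole batch as a single file to the staging area."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    batch_file = staging_dir / f"batch-{batch_id}.json"
    batch_file.write_text("\n".join(json.dumps(r) for r in records))
    return batch_file

def copy_into_table(batch_file: Path, table_dir: Path) -> Path:
    """Phase 2: bulk-load the staged file into the destination table.
    Here a plain file copy stands in for the real COPY operation."""
    table_dir.mkdir(parents=True, exist_ok=True)
    dest = table_dir / batch_file.name
    shutil.copy(batch_file, dest)
    return dest

# Stage one batch, then copy it into the "table" in a single bulk step.
staged = stage_batch([{"id": 1}, {"id": 2}], Path("staging_area"), batch_id=0)
loaded = copy_into_table(staged, Path("delta_table"))
print(loaded)
```

The point of the intermediate staging step is that the destination ingests one large staged file per batch rather than record-by-record writes, which is why a supported staging location (currently S3 or Azure) is required.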

If you are a customer, please consider opening a feature request so that we can consider adding GCP as an option.

Thanks


onlinepk
Fan
  • Author
  • 1 reply
  • December 24, 2021

Thanks, Alex, for the prompt response.

Are there any alternatives or workarounds to get around this?

Additionally, we have subscribed to StreamSets through the Google Marketplace.

So I am hoping that makes us a "customer" able to raise a feature request.


alex.sanchez
StreamSets Employee
  • 73 replies
  • Answer
  • December 24, 2021

Hi @onlinepk,

 

Unfortunately, there is no way to replace the staging location with other functionality.

Please reach out to support to get that feature request created.

