Solved

Data Collector - Delta Lake

  • December 24, 2021
  • 3 replies
  • 56 views

onlinepk
Fan

Hi

We are using StreamSets provisioned from the Google Cloud Marketplace.

We are trying to create a data pipeline with a Kafka topic as the origin and Delta Lake as the destination.

While setting it up, we observed that "Staging Location" requires AWS S3 or Azure Storage; Google Cloud Storage and other alternatives are not offered.

We do not have an AWS or Azure account.

Is it mandatory for any Delta Lake ingestion to have AWS or Azure storage, even though our application may run in neither?

Best answer by alex.sanchez


3 replies

alex.sanchez
StreamSets Employee
  • 73 replies
  • December 24, 2021

Hi @onlinepk ,

 

Unfortunately, our Delta Lake connection works in batches (to improve performance and reduce costs): we push a batch of data to the staging area (in Azure or AWS) and from there copy it directly to the destination table.
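To illustrate the two-phase pattern described above, here is a minimal toy sketch in Python. Local directories stand in for the cloud staging bucket and the Delta table, and the function names are hypothetical illustrations, not StreamSets or Delta Lake APIs (a real pipeline would issue a bulk load such as `COPY INTO` against the table in phase 2):

```python
import json
import shutil
from pathlib import Path

def stage_batch(records, staging_dir: Path, batch_id: int) -> Path:
    """Phase 1: write a whole batch as a single file to the staging area."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    batch_file = staging_dir / f"batch-{batch_id}.json"
    batch_file.write_text("\n".join(json.dumps(r) for r in records))
    return batch_file

def copy_into_table(batch_file: Path, table_dir: Path) -> Path:
    """Phase 2: bulk-load the staged file into the destination table.
    Here a plain file copy stands in for the real COPY operation."""
    table_dir.mkdir(parents=True, exist_ok=True)
    dest = table_dir / batch_file.name
    shutil.copy(batch_file, dest)
    return dest

# Stage one batch, then copy it into the "table" in a single bulk step.
staged = stage_batch([{"id": 1}, {"id": 2}], Path("staging_area"), batch_id=0)
loaded = copy_into_table(staged, Path("delta_table"))
print(loaded)
```

The point of the intermediate staging step is that the destination ingests one large staged file per batch rather than record-by-record writes, which is why a supported staging location (currently S3 or Azure) is required.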

If you are a customer, please consider opening a feature request so that we can consider adding GCP as an option.

Thanks


onlinepk
Fan
  • Author
  • 1 reply
  • December 24, 2021

Thanks, Alex, for the prompt response.

Are there any alternatives or workarounds to get around this?

Additionally, we have subscribed to StreamSets through the Google Marketplace.

So I am hoping that makes us a "customer" able to raise a feature request.


alex.sanchez
StreamSets Employee
  • 73 replies
  • Answer
  • December 24, 2021

Hi @onlinepk,

 

Unfortunately, there is no way to replace the staging location with other functionality.

Please reach out to support to get that feature request created.

