Solved

Data Collector - Delta Lake

  • 24 December 2021
  • 3 replies
  • 53 views


Hi

We are using StreamSets provisioned from the Google Cloud Marketplace.

We are trying to create a data pipeline with a Kafka topic as the origin and Delta Lake as the destination.

While setting it up, we observed that "Staging Location" requires either AWS S3 or Azure Storage; Google Cloud Storage and other alternatives are not listed.

We do not have an AWS or Azure account.

Is it mandatory for any Delta Lake ingestion to have AWS or Azure storage, even if our application runs on neither?


Best answer by alex.sanchez 24 December 2021, 08:52


3 replies


Hi @onlinepk ,

 

Unfortunately, our Delta Lake connector works in batches (to improve performance and reduce costs): we push each batch of data to the staging area (in Azure or AWS) and from there copy it directly into the destination table.
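For context, the staged-copy pattern described above can be sketched roughly like this. This is a hypothetical illustration, not StreamSets code: a local temp directory stands in for the S3/Azure staging area, and a plain list stands in for the Delta table that a warehouse-side bulk load (such as `COPY INTO`) would append to.

```python
import csv
import os
import tempfile

def stage_batch(records, staging_dir):
    # Step 1: write a batch of records to the staging area as a file.
    # In the real connector this would be an object uploaded to S3/Azure.
    path = os.path.join(staging_dir, "batch_0001.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(records)
    return path

def copy_into_table(staged_path, table):
    # Step 2: stand-in for the warehouse-side bulk load (e.g. COPY INTO):
    # read the staged file and append its rows to the destination "table".
    with open(staged_path, newline="") as f:
        table.extend(tuple(row) for row in csv.reader(f))

staging = tempfile.mkdtemp()   # local stand-in for the staging bucket
table = []                     # stand-in for the Delta destination table

path = stage_batch([("1", "alice"), ("2", "bob")], staging)
copy_into_table(path, table)
print(table)  # [('1', 'alice'), ('2', 'bob')]
```

The key point is that the staging area is an intermediate hop, not optional: the destination pulls data from that storage, which is why the connector currently requires AWS or Azure.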

If you are a customer, please consider opening a feature request so that we can consider adding GCP as an option.

Thanks


Thanks, Alex, for the prompt response.

Are there any alternatives or workarounds for this?

Additionally, we have subscribed to StreamSets through the Google Cloud Marketplace, so I am hoping that makes us a "customer" eligible to raise a feature request.


Hi @onlinepk,

 

Unfortunately, there is no way to replace the staging step with other functionality.

Please reach out to support to get that feature request created.
