Solved

Selection of cloud libraries when pipeline needs 2 different cloud environments

6 February 2023

Dhanashri_Bhate

Hello,

We’re trying to create a Transformer pipeline with Amazon S3 as the origin, Azure ADLS as the destination, and Azure Databricks as the compute.

We tried to break it down into smaller pieces:

An S3 -> ADLS pipeline on StreamSets' own cluster worked fine.
A random data source to ADLS with Databricks compute worked fine.

But when we bring all 3 components together, we get conflicting stage-library-specific errors. We have tried different combinations from the list in the attached picture.

In this particular scenario, which stage libraries do we need to choose?

Best answer by Dhanashri_Bhate 7 February 2023, 01:23

Dhanashri_Bhate
Author
7 February 2023
Answer

Query resolved. The pipeline worked well with the following stage libraries:

[
    "streamsets-spark-azure-no-dependency-lib:5.2.0",
    "streamsets-spark-aws-no-dependency-lib:5.2.0",
    "streamsets-spark-basic-lib:5.2.0",
    "streamsets-spark-file-lib:5.2.0",
    "streamsets-spark-jdbc-lib:5.2.0"
]
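
For anyone reusing this list, here is a minimal Python sketch that sanity-checks a stage library selection before it goes into the cluster configuration. The checks are assumptions for illustration, not documented StreamSets rules: it only verifies that every entry carries the same version and that at least one AWS and one Azure library is present. The helper names `split_lib` and `check_stage_libs` are hypothetical.

```python
import json

# Library list copied verbatim from the answer above.
STAGE_LIBS_JSON = """
[
    "streamsets-spark-azure-no-dependency-lib:5.2.0",
    "streamsets-spark-aws-no-dependency-lib:5.2.0",
    "streamsets-spark-basic-lib:5.2.0",
    "streamsets-spark-file-lib:5.2.0",
    "streamsets-spark-jdbc-lib:5.2.0"
]
"""


def split_lib(entry):
    """Split a 'name:version' entry into (name, version)."""
    name, _, version = entry.rpartition(":")
    return name, version


def check_stage_libs(entries):
    """Return a list of problems found; an empty list means none were found.

    Both checks are assumptions made for this sketch, not StreamSets rules.
    """
    problems = []
    parsed = [split_lib(entry) for entry in entries]

    # Assumed check 1: all libraries share a single version.
    versions = {version for _, version in parsed}
    if len(versions) > 1:
        problems.append(f"mixed versions: {sorted(versions)}")

    # Assumed check 2: one library per cloud used by the pipeline.
    names = [name for name, _ in parsed]
    for cloud in ("aws", "azure"):
        if not any(f"-{cloud}-" in name for name in names):
            problems.append(f"no {cloud} library selected")

    return problems


stage_libs = json.loads(STAGE_LIBS_JSON)
problems = check_stage_libs(stage_libs)
print(problems or "stage library list looks consistent")
```

A check like this cannot tell you which combination Databricks will accept; it only catches obvious slips such as a missing cloud library or a mismatched version before you deploy.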
