Solved

Selection of cloud libraries when pipeline needs 2 different cloud environments

  • 6 February 2023
  • 1 reply

Hello,

We’re trying to create a Transformer pipeline with Amazon S3 as the origin and Azure ADLS as the destination, with compute running on Azure Databricks.

We tried to break it down into smaller pieces:

  1. An S3 -> ADLS pipeline on StreamSets’ own cluster worked fine.
  2. A random data source to ADLS pipeline with Databricks compute worked fine.

But when we bring all 3 components together, we get conflicting stage-library errors. We have tried different combinations from the list in the attached picture.

In this particular scenario, which stage libraries do we need to choose?

 


Best answer by Dhanashri_Bhate 7 February 2023, 01:23


1 reply

Query resolved. The pipeline worked well with the following libraries:

[
    "streamsets-spark-azure-no-dependency-lib:5.2.0",
    "streamsets-spark-aws-no-dependency-lib:5.2.0",
    "streamsets-spark-basic-lib:5.2.0",
    "streamsets-spark-file-lib:5.2.0",
    "streamsets-spark-jdbc-lib:5.2.0"
]
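
For anyone trying other combinations, here is a rough Python sketch for sanity-checking a stage library list before using it. It is not part of StreamSets: `split_lib` and `check_stage_libs` are hypothetical helper names, and the two rules it checks (every entry on the same version, and at least one library for each cloud) are assumptions drawn from the working list above, not documented requirements.

```python
import json

# Stage library list copied from the reply above.
STAGE_LIBS = json.loads("""
[
    "streamsets-spark-azure-no-dependency-lib:5.2.0",
    "streamsets-spark-aws-no-dependency-lib:5.2.0",
    "streamsets-spark-basic-lib:5.2.0",
    "streamsets-spark-file-lib:5.2.0",
    "streamsets-spark-jdbc-lib:5.2.0"
]
""")


def split_lib(entry):
    """Split a 'name:version' entry into (name, version)."""
    name, _, version = entry.rpartition(":")
    return name, version


def check_stage_libs(entries, required_keywords=("aws", "azure")):
    """Return a list of problems; an empty list means the checks passed.

    Both checks are assumptions for illustration, not StreamSets rules.
    """
    problems = []
    parsed = [split_lib(e) for e in entries]

    # Assumed rule 1: all libraries share one version.
    versions = {version for _, version in parsed}
    if len(versions) > 1:
        problems.append(f"mixed versions: {sorted(versions)}")

    # Assumed rule 2: one library per cloud the pipeline touches.
    for keyword in required_keywords:
        if not any(keyword in name.split("-") for name, _ in parsed):
            problems.append(f"no library found for: {keyword}")

    return problems


problems = check_stage_libs(STAGE_LIBS)
print(problems or "stage library list looks consistent")
```

An empty result only means the list passes these two simple checks; it does not guarantee that the libraries are compatible on a given Databricks cluster.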
