
How to add multiple origins to an existing pipeline using the Python SDK

  • December 28, 2021

jerri
Roadie

How can I add multiple origins to an existing pipeline using the Python SDK?


dima
StreamSets Employee
  • December 28, 2021

Can you describe your use case a bit more? StreamSets Data Collector pipelines can only have a single origin, so this isn't possible with or without the SDK.


jerri
Roadie
  • Author
  • December 28, 2021

My single Transformer pipeline will have multiple sources, and I have created a template for it. Now I want to replicate it across multiple projects by programmatically swapping in N origins. I am aware this can be an SCH use case, but I don't have the option of using SCH, so I need to depend on the Python SDK.


  • Fan
  • December 28, 2021

@jerri - are you looking to swap the origin module, or just to change the source connection?

In our deployment, we used one parameterized template and used the SDK to programmatically change the connections at run time. This helped us scale efficiently and monitor the flows from our standard scheduling tool.
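
To make the parameterized-template idea concrete, here is a minimal sketch of building one set of runtime parameters per project from a single set of template defaults. It uses plain dictionaries only and makes no SDK calls; the parameter names (`SOURCE_PATH`, `BATCH_SIZE`) and the helper `runtime_parameters` are hypothetical and not taken from the thread.

```python
def runtime_parameters(template_defaults, project_overrides):
    """Merge per-project values over the template's default parameters.

    Rejects overrides for parameters the template does not declare, so a
    misspelled key fails loudly instead of being silently ignored.
    """
    unknown = set(project_overrides) - set(template_defaults)
    if unknown:
        raise KeyError(f"not declared in the template: {sorted(unknown)}")
    return {**template_defaults, **project_overrides}


# Hypothetical parameter names, for illustration only.
defaults = {"SOURCE_PATH": "/data/default", "BATCH_SIZE": 1000}
per_project = {
    "project_a": {"SOURCE_PATH": "/data/project_a"},
    "project_b": {"SOURCE_PATH": "/data/project_b", "BATCH_SIZE": 500},
}
runs = {name: runtime_parameters(defaults, o) for name, o in per_project.items()}
```

Each resulting dictionary would then be handed to whatever mechanism starts the pipeline. Note that this only changes values inside a fixed pipeline shape; it does not add or remove stages.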

 


jerri
Roadie
  • Author
  • December 29, 2021

In my use case, the number of origins varies, so I also have to add and remove origins from an existing pipeline.


  • Fan
  • December 30, 2021

You have an interesting use case. I have not used the SDK for Transformer, and a quick search of the documentation does not indicate that we have Transformer-related capability.

https://docs.streamsets.com/sdk/latest/index.html

Another possibility would be to create jobs, but I do not think that option fits the requirement perfectly, and there would be some maintenance concerns as well. Sorry I could not be of much help. Please do share your final implementation model. Thanks.


jerri
Roadie
  • Author
  • January 3, 2022

Had to do it the hard way using the Python SDK.
Step 1: Retrieve the pipeline (by ID) that needs to be replicated.
Step 2: Loop through all stages and remove the origins, i.e. the stages with type=SOURCE.
Step 3: Loop over the new sources and add an origin for each one with its respective path (generating an outputLane for each of these origins).
Step 4: List each newly added origin's outputLane in the inputLane of the stage where these origins have to be connected (in our case, all origins met at a UNION stage).
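
Steps 2 to 4 can be sketched roughly as below. This is not the SDK code from my project: it works on the pipeline definition as a plain dictionary and leaves out retrieving and saving the pipeline (step 1). It assumes each stage carries `instanceName`, `inputLanes`, `outputLanes` and a `uiInfo.stageType` of `SOURCE` for origins, as in an exported pipeline definition. The helpers `make_origin` and `swap_origins`, the lane naming scheme and the `conf.path` config key are made up for illustration.

```python
import copy


def is_origin(stage):
    # Assumption: origins are marked with uiInfo.stageType == "SOURCE".
    return stage.get("uiInfo", {}).get("stageType") == "SOURCE"


def make_origin(index, path):
    # Hypothetical minimal origin; a real stage also needs its library,
    # stage name, version and full configuration.
    name = f"Origin_{index:02d}"
    return {
        "instanceName": name,
        "uiInfo": {"stageType": "SOURCE"},
        "configuration": [{"name": "conf.path", "value": path}],  # made-up key
        "inputLanes": [],
        "outputLanes": [f"{name}OutputLane"],
    }


def swap_origins(pipeline, paths, union_name):
    """Return a copy of `pipeline` whose origins are replaced, one per path."""
    result = copy.deepcopy(pipeline)  # leave the template untouched

    # Step 2: drop the existing origins and remember their output lanes.
    old_lanes = {
        lane
        for stage in result["stages"]
        if is_origin(stage)
        for lane in stage["outputLanes"]
    }
    kept = [stage for stage in result["stages"] if not is_origin(stage)]

    # Step 3: create one new origin per path, each with its own output lane.
    new_origins = [make_origin(i, p) for i, p in enumerate(paths, start=1)]

    # Step 4: wire the new output lanes into the stage where the origins meet,
    # removing the lanes of the deleted origins.
    union = next(s for s in kept if s["instanceName"] == union_name)
    surviving = [lane for lane in union["inputLanes"] if lane not in old_lanes]
    union["inputLanes"] = surviving + [o["outputLanes"][0] for o in new_origins]

    result["stages"] = new_origins + kept
    return result
```

Called as `swap_origins(template, ["/data/a", "/data/b"], "Union_01")`, this returns a new definition with two origins feeding the stage named `Union_01`, while the template itself stays unchanged and can be reused for the next project.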

Do let me know if there is a smarter way to go about this. Thanks.