
How to add multiple origins to an existing pipeline using the Python SDK

Can you describe your use case a bit more? StreamSets Data Collector pipelines can only have a single origin, so this isn't possible with or without the SDK.


My Transformer pipeline will have multiple sources, and I have created a template for it. Now I want to replicate it across multiple projects by programmatically swapping in N origins. I am aware this could be an SCH use case, but SCH is not an option for me, so I need to rely on the Python SDK.


@jerri - are you looking to swap the origin module, or just change the source connection?

In our deployment, we used one parameterized template and used the SDK to programmatically change the connections at run time. This helped us scale efficiently and monitor the flows from our standard scheduling tool.
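As a rough sketch of that pattern (not our exact code; the host, credentials, pipeline ID and the JDBC attribute below are placeholders, and exact method and keyword names can differ between SDK versions):

from streamsets.sdk import DataCollector

# Placeholder connection details for the Data Collector / Transformer instance.
sdc = DataCollector('http://localhost:18630', username='admin', password='admin')

# Fetch the parameterized template by ID (the keyword may be `id` or
# `pipeline_id` depending on the SDK version).
pipeline = sdc.pipelines.get(pipeline_id='templatePipelineId')

# Point the origin at the connection for this run. Stage attributes follow the
# UI labels, e.g. a JDBC origin typically exposes `jdbc_connection_string`.
origin = pipeline.origin_stage
origin.jdbc_connection_string = 'jdbc:postgresql://db-host:5432/project_a'

# Save the change back and start the run (check your SDK version's docs for
# the exact update/start calls).
sdc.update_pipeline(pipeline)
sdc.start_pipeline(pipeline)

If the template is fully parameterized, the same idea works by simply setting different parameter values per run instead of editing the stage.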

 



In my use case, I have a varying number of origins, which is why I also have to add/remove origins from an existing pipeline.


You have an interesting use case. I have not used the SDK for Transformer, and a quick search of the documentation does not indicate that there is Transformer-related capability:

https://docs.streamsets.com/sdk/latest/index.html

Another possibility would be to create jobs, but I do not think that option fits the requirement perfectly, and there would be some maintenance concerns as well. Sorry I could not be of much help. Please do share your final implementation. Thanks.



Had to do it the hard way using the Python SDK:
Step 1: Retrieve the pipeline (by ID) that needs to be replicated.
Step 2: Loop through all stages, and filter out and remove the stages (origins) with type=SOURCE.
Step 3: Loop and add origins with their respective paths (generating an outputLane for each of these origins).
Step 4: Set each newly added origin's outputLane as an inputLane on the stage where these origins have to be connected (in our case, all origins met at a Union stage). A rough sketch follows below.
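For reference, a sketch of steps 2-4, working directly on the exported pipeline definition (the same structure the SDK's Pipeline object wraps). The origin template, the configuration property name for the path, and the Union instance name are assumptions from my template; adapt them to yours.

import copy
import uuid

def replace_origins(pipeline_config, origin_template, paths, union_instance='Union_01'):
    # step 2: drop the existing origins (uiInfo.stageType == 'SOURCE')
    stages = [s for s in pipeline_config['stages']
              if s['uiInfo'].get('stageType') != 'SOURCE']

    union_stage = next(s for s in stages if s['instanceName'] == union_instance)
    union_stage['inputLanes'] = []

    # step 3: add one origin per path, each with its own generated outputLane
    for i, path in enumerate(paths, start=1):
        origin = copy.deepcopy(origin_template)
        origin['instanceName'] = f'FileOrigin_{i:02d}'
        lane = f"{origin['instanceName']}OutputLane{uuid.uuid4().hex}"
        origin['outputLanes'] = [lane]
        # set the path in the stage's configuration (the property name depends
        # on the origin type; '.paths' here is just a placeholder)
        for conf in origin['configuration']:
            if conf['name'].endswith('.paths'):
                conf['value'] = [path]
        stages.append(origin)

        # step 4: wire the new origin's outputLane into the Union stage's inputLanes
        union_stage['inputLanes'].append(lane)

    pipeline_config['stages'] = stages
    return pipeline_config

Step 1 (retrieving the pipeline by ID through the SDK) happens before this, and the modified definition is then saved back as the new pipeline.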

Do let me know if there is a smarter way to go about this. Thanks.

