How to add multiple origins to an existing pipeline using the Python SDK
Can you describe your use case a bit more? StreamSets Data Collector pipelines can only have a single origin, so this isn't possible with or without the SDK.
My pipeline is a single Transformer pipeline with multiple sources. I have created a template for it. Now I want to replicate it across multiple projects by swapping its N origins programmatically. I am aware this could be an SCH use case, but I don't have the option of SCH, so I need to depend on the Python SDK.
In our deployment, we used one parameterized template and used the SDK to programmatically change the connections at run time. This helped us scale efficiently and monitor the flows from our standard scheduling tool.
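To illustrate the idea of one parameterized template, here is a minimal sketch in plain Python, not SDK code. It only shows how per-project values could be merged over the template's defaults before a run; the function name `runtime_parameters` and the parameter names are hypothetical, and how the resulting values are passed to the pipeline depends on your SDK version.

```python
def runtime_parameters(template_defaults, overrides):
    """Merge per-project overrides over the template's default parameters.

    Both arguments are plain dicts (hypothetical names and values).
    Overrides for parameters the template does not declare are rejected,
    so a typo in a project config fails early instead of being ignored.
    """
    unknown = set(overrides) - set(template_defaults)
    if unknown:
        raise KeyError(f"not declared in template: {sorted(unknown)}")
    return {**template_defaults, **overrides}


# Hypothetical template defaults and one project's overrides.
defaults = {"SOURCE_PATH": "/data/default", "TARGET_TABLE": "staging"}
project_a = runtime_parameters(defaults, {"SOURCE_PATH": "/data/project_a"})
```

This works well when every project has the same number of connections and only their values differ.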
In my use case, I have a varying number of origins, so I also have to add and remove origins from an existing pipeline.
You have an interesting use case. I have not used the SDK for Transformer. A quick search of the documentation does not indicate that it has Transformer-related capability.
https://docs.streamsets.com/sdk/latest/index.html
Another possibility would be to create jobs, but I do not think that option fits the requirement perfectly. There will be some maintenance concerns as well. Sorry I could not be of much help. Please do share your final implementation model. Thanks
I had to do it the hard way using the Python SDK.
Step 1: Retrieve the pipeline (by ID) that needs to be replicated.
Step 2: Loop through all stages, then filter out and remove the stages (origins) with type=SOURCE.
Step 3: Loop over the new sources and add an origin for each with its respective path (generating an outputLane for each origin).
Step 4: Add each newly added origin's outputLane to the inputLanes of the stage these origins must connect to (in our case, all origins met at a UNION stage).
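Steps 2 to 4 can be sketched as follows. This is a minimal illustration on a plain dict, not the code I ran and not the SDK's API: the stage layout (`instanceName`, `type`, `inputLanes`, `outputLanes`, `config`), the lane naming scheme, and the function name `swap_origins` are all assumptions made for the example. Retrieving the pipeline (step 1) and publishing the result are left to whatever SDK calls your version provides.

```python
import copy


def swap_origins(pipeline, new_paths, union_name):
    """Return a copy of `pipeline` with its origins replaced, one per path.

    `pipeline` is a dict with a "stages" list (assumed, simplified layout).
    `union_name` is the instanceName of the stage where all origins meet.
    """
    result = copy.deepcopy(pipeline)  # leave the template untouched

    # Step 2: remove every origin and remember the lanes they fed.
    old_lanes = set()
    kept = []
    for stage in result["stages"]:
        if stage["type"] == "SOURCE":
            old_lanes.update(stage["outputLanes"])
        else:
            kept.append(stage)

    # Step 3: add one origin per path, each with a generated output lane.
    new_origins = [
        {
            "instanceName": f"origin_{i}",
            "type": "SOURCE",
            "inputLanes": [],
            "outputLanes": [f"origin_{i}OutputLane"],
            "config": {"path": path},  # hypothetical config key
        }
        for i, path in enumerate(new_paths, start=1)
    ]

    # Step 4: wire the new output lanes into the union stage's input lanes,
    # dropping the lanes that belonged to the removed origins.
    union = next((s for s in kept if s["instanceName"] == union_name), None)
    if union is None:
        raise ValueError(f"no stage named {union_name!r}")
    surviving = [lane for lane in union["inputLanes"] if lane not in old_lanes]
    union["inputLanes"] = surviving + [o["outputLanes"][0] for o in new_origins]

    result["stages"] = new_origins + kept
    return result
```

Working on a deep copy keeps the template reusable, so the same function can be called once per project with a different list of paths.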
Do let me know if there is a smarter way to go about this. Thanks.