When will StreamSets introduce the ability to use Git repos such as GitHub or GitLab for version control?
StreamSets Control Hub has version control for pipelines through its pipeline repository feature.
Not sure I understand. Does that allow me to use my own GitHub repo for version control?
It doesn't. Unlike source code, which developers modify directly and whose diffs are human-readable, StreamSets pipelines are JSON documents that aren't intended to be inspected by eye. I know of some Control Hub users who have set up automation that pushes pipeline JSON to external version control systems, but they use those systems more as a backup. Meanwhile, the pipeline repository feature lets you see which changes happened between pipeline versions, branch them off, roll back, etc. in a much more full-featured way.
Can you elaborate a bit more on the use case you're interested in? I could provide more detail if I understand it a bit better.
If I understand correctly, the idea is to keep the pipeline definitions in the same repo as other code and incorporate it all into a CI/CD pipeline for automated testing.
So we actually have a way of doing this kind of thing with functionality that’s already in the StreamSets DataOps Platform. Feel free to DM me on the community Slack (streamsetters.slack.com) and I can walk you through it because it’s just a bit involved for this thread.
@dima I have a similar use case to the one above. Is there any documentation on integrating StreamSets with external Git repositories? Also, I am not using Slack; is there another way to reach out? Thanks.
You can use Subscriptions on Control Hub. Create a new subscription and select ‘Pipeline Commit’ as the event you want to capture. You can add further conditions, such as the pipeline name starting with or containing a particular text.
Every time you commit a pipeline that matches the condition, the subscription generates a trigger that can be used to invoke a webhook. You can use a CI server like Jenkins to listen for this event and make an API call to download the pipeline (the pipeline ID and version number can be passed in the trigger). The CI server can then push the downloaded pipeline to any repository of your choice.
Here is a video which explains this in detail: https://youtu.be/UQwKOS9VNyE?t=237
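For anyone who wants a starting point for the CI side, here is a rough Python sketch of the flow described above: parse the webhook body, download the pipeline, write it into a Git working tree, and push. It is only an illustration under stated assumptions. The payload field names (`pipeline_id`, `version`) are hypothetical and depend on how you configure the subscription's webhook body, and `fetch_pipeline` is a placeholder you would implement yourself with the Control Hub API call that downloads a pipeline, since that call is not shown in this thread.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable, Tuple

# Hypothetical payload field names: set these to match whatever body
# your subscription's webhook is configured to send.
ID_FIELD = "pipeline_id"
VERSION_FIELD = "version"


def parse_commit_event(body: bytes) -> Tuple[str, str]:
    """Extract the pipeline ID and version from a webhook request body."""
    event = json.loads(body)
    return str(event[ID_FIELD]), str(event[VERSION_FIELD])


def write_pipeline(repo_dir: Path, pipeline_id: str, definition: dict) -> Path:
    """Write the pipeline JSON into the working tree with stable formatting."""
    # Replace characters that are unsafe in file names with underscores.
    safe_name = "".join(
        c if c.isalnum() or c in "-_." else "_" for c in pipeline_id
    )
    path = repo_dir / "pipelines" / f"{safe_name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Sorted keys and fixed indentation keep Git diffs as small as possible.
    path.write_text(json.dumps(definition, indent=2, sort_keys=True) + "\n")
    return path


def commit_and_push(repo_dir: Path, path: Path, message: str) -> None:
    """Stage, commit, and push one file using the git CLI."""
    rel = str(path.relative_to(repo_dir))
    subprocess.run(["git", "add", rel], cwd=repo_dir, check=True)
    # `git diff --cached --quiet` exits with 0 when nothing is staged.
    staged = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=repo_dir)
    if staged.returncode == 0:
        return
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    subprocess.run(["git", "push"], cwd=repo_dir, check=True)


def handle_event(
    body: bytes,
    repo_dir: Path,
    fetch_pipeline: Callable[[str, str], dict],
    publish: Callable[[Path, Path, str], None] = commit_and_push,
) -> Path:
    """Run the whole flow for one 'Pipeline Commit' webhook call.

    fetch_pipeline is a placeholder for your own download call.
    """
    pipeline_id, version = parse_commit_event(body)
    definition = fetch_pipeline(pipeline_id, version)
    path = write_pipeline(repo_dir, pipeline_id, definition)
    publish(repo_dir, path, f"Pipeline {pipeline_id} version {version}")
    return path
```

A Jenkins job (or any small webhook listener) would call `handle_event` with the raw request body and the path to a cloned repository. Passing `publish` as a parameter makes it easy to dry-run the flow without touching Git.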