Question

Alternate Git Repo Integration

  • 4 January 2022
  • 7 replies
  • 341 views

Userlevel 1

When will StreamSets introduce the ability to use Git repos such as GitHub or GitLab for version control?



Userlevel 3

Hi @mblahay,

StreamSets Control Hub has version control for pipelines through its pipeline repository feature.

Userlevel 1

Not sure I understand. Does that allow me to use my own GitHub repo for version control?

Userlevel 3

It doesn't. Unlike source code, which developers modify directly and whose diffs are human-readable, StreamSets pipelines are JSON documents that aren't intended to be inspected by eye. As such, while I know of some Control Hub users who have set up automation that pushes pipeline JSON to external version control systems, they use it more as a backup. Meanwhile, the pipeline repository feature lets you see what changed between pipeline versions, branch them off, roll back, and so on, in a much more full-featured way.
 

Can you elaborate a bit more on the use case you're interested in? I could provide more detail if I understand it a bit better. 

Userlevel 1

If I am correct, the idea is to keep the pipeline definitions in the same repo as other code and incorporate it all into a CI/CD pipeline for automated testing.

Userlevel 3

We actually have a way of doing this kind of thing with functionality that’s already in the StreamSets DataOps Platform. Feel free to DM me on the community Slack (streamsetters.slack.com) and I can walk you through it, because it’s a bit involved for this thread.

Hi @dima, I have a similar use case to the one above. Is there any documentation that covers StreamSets integration with external Git repositories? Also, I am not using Slack; is there another way to reach out? Thanks.

Userlevel 4

Hi @nchalla0,

You can use Subscriptions on Control Hub. Create a new subscription and select ‘Pipeline Commit’ as the event you want to capture. You can add further conditions, such as the pipeline name starting with or containing a particular text.

So every time you commit a pipeline that matches the condition, the subscription generates a trigger that can be used to invoke a webhook. You can use a CI server like Jenkins to listen for this event and make an API call to download the pipeline (the pipeline ID and version number can be passed in the trigger). The CI server can then push the downloaded pipeline to any repository of your choice.
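As a rough illustration of the CI-side step, here is a minimal Python sketch of what a job triggered by the webhook might do once it has the pipeline JSON in hand. Everything specific in it is an assumption rather than StreamSets' actual interface: the payload field names (`pipeline_id`, `version`) are whatever you choose to put in the subscription's webhook payload, the `pipelines/` folder layout is arbitrary, and the Control Hub download call itself is left out because the exact endpoint isn't given in this thread.

```python
import json
import subprocess
from pathlib import Path


def parse_commit_event(body: str) -> dict:
    """Pull the pipeline id and version out of the webhook payload.

    The field names are hypothetical; match them to the payload you
    configure on the subscription.
    """
    event = json.loads(body)
    return {"pipeline_id": event["pipeline_id"], "version": str(event["version"])}


def target_path(repo_dir: Path, pipeline_id: str) -> Path:
    """One file per pipeline; Git history keeps track of the versions."""
    safe_name = pipeline_id.replace(":", "_").replace("/", "_")
    return repo_dir / "pipelines" / f"{safe_name}.json"


def save_pipeline(repo_dir: Path, event: dict, pipeline_json: dict) -> Path:
    """Write the downloaded pipeline into the working tree."""
    path = target_path(repo_dir, event["pipeline_id"])
    path.parent.mkdir(parents=True, exist_ok=True)
    # Sorted keys and fixed indentation keep Git diffs between versions small.
    path.write_text(json.dumps(pipeline_json, indent=2, sort_keys=True) + "\n")
    return path


def commit_and_push(repo_dir: Path, path: Path, event: dict) -> None:
    """Commit the file and push it to whatever remote the repo tracks.

    Note: `git commit` exits non-zero when nothing changed, so check=True
    raises in that case.
    """
    message = f"Pipeline {event['pipeline_id']} version {event['version']}"
    relative = str(path.relative_to(repo_dir))
    subprocess.run(["git", "-C", str(repo_dir), "add", relative], check=True)
    subprocess.run(["git", "-C", str(repo_dir), "commit", "-m", message], check=True)
    subprocess.run(["git", "-C", str(repo_dir), "push"], check=True)
```

In a Jenkins job you would call `parse_commit_event` on the request body, download the pipeline JSON with your own Control Hub API call, then pass the result to `save_pipeline` and `commit_and_push` against a local clone of your GitHub or GitLab repo.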

 

Here is a video which explains this in detail: https://youtu.be/UQwKOS9VNyE?t=237
