Resuming a pipeline after failure and batch processing behaviour

  • 20 April 2023
  • 1 reply

I need to build a fail-safe pipeline: if the pipeline is stopped abruptly due to an issue on the VM side or in SDC, there needs to be a safe exit.
The batch it was working on should not be lost, and partial data should not be written.
When restarted, I need to make sure it does not start from the first batch again but picks up from where it left off.

What does StreamSets offer to deal with a problem like this?

StreamSets Data Collector processes data in batches. Depending on the Delivery Guarantee that you have set at the pipeline level (At Least Once or At Most Once), the offset value is committed after the data is sent to the destination. So if Data Collector goes down in the middle of batch processing, then by default your pipeline will restart from the last committed offset. You just have to choose the right delivery guarantee, and everything else will be taken care of by SDC and SCH.
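To make the offset-commit behaviour concrete, here is a minimal sketch of the at-least-once pattern described above. This is not StreamSets code; the file name `offset.json` and the functions are hypothetical. The key idea is that the offset is persisted only after a batch is durably written, so a crash mid-batch means the batch is reprocessed on restart rather than lost.

```python
import json
import os

OFFSET_FILE = "offset.json"  # hypothetical offset store (SDC manages its own)

def load_offset():
    """Return the last committed offset, or 0 on a fresh start."""
    if os.path.exists(OFFSET_FILE):
        with open(OFFSET_FILE) as f:
            return json.load(f)["offset"]
    return 0

def commit_offset(offset):
    """Write-then-rename keeps the offset file consistent even if we crash here."""
    tmp = OFFSET_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp, OFFSET_FILE)  # atomic on POSIX

def run_pipeline(records, batch_size, write_batch):
    """Process records in batches, resuming from the last committed offset."""
    offset = load_offset()
    while offset < len(records):
        batch = records[offset:offset + batch_size]
        write_batch(batch)        # 1. deliver the batch to the destination
        offset += len(batch)
        commit_offset(offset)     # 2. only then commit the new offset
```

If the process dies between steps 1 and 2, the last batch is written again on restart, which is exactly the at-least-once trade-off: no data is lost, but the destination may see duplicates, so idempotent writes help. At-most-once would swap the order of the two steps.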