
Data duplicated by SCH jobs when one SDC node was taken offline


subashini
StreamSets Employee

Product: All SCH versions

Issue:

The customer reported that 3 of their jobs loaded duplicate data. This happened after they began upgrading their SDCs and disabled one of the nodes (node2) to upgrade it, forcing all jobs to run on the other node (node1).

The customer was using Cloud SCH and, at the time of writing, was planning an SDC upgrade from 3.7.2 to 3.16.2.

The following steps were used to stop SDC node2 from running any jobs:

1. In Control Hub, went to the Jobs section.
2. All jobs were inactive, because they are scheduled to run at 8 PM UK time.
3. Opened each job -> Job Status -> Edit Job -> Label option.
4. Removed the Prod label and added the upgrade label to each job.
5. Also disabled the failover option by clearing its check box.
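To make the effect of the relabeling in step 4 concrete, here is a toy Python model of label-based job placement. It is a sketch, not the Control Hub API: it assumes a job may run on any SDC whose labels include every label assigned to the job, and the SDC label sets below are hypothetical, since the article does not say which labels node1 and node2 carried.

```python
# Toy model of label-based job placement (hypothetical; not the Control Hub API).
# Assumption: a job may run on any SDC whose labels include all of the job's labels.

# Hypothetical label sets: node1 carries the "upgrade" label, node2 does not.
SDC_LABELS = {
    "node1": {"Prod", "upgrade"},
    "node2": {"Prod"},
}


def eligible_sdcs(job_labels, sdc_labels=SDC_LABELS):
    """Return the sorted names of SDCs that carry every label assigned to the job."""
    return sorted(
        name for name, labels in sdc_labels.items() if set(job_labels) <= labels
    )


if __name__ == "__main__":
    # Before step 4: the job has the Prod label, so either node can run it.
    print(eligible_sdcs({"Prod"}))     # ['node1', 'node2']
    # After step 4: Prod removed and upgrade added, so only node1 qualifies.
    print(eligible_sdcs({"upgrade"}))  # ['node1']
```

The subset test (`<=` on sets) is the entire placement rule in this sketch; if Control Hub's actual label-matching rule differs, only that one comparison would need to change.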

Data duplication was seen in only 3 jobs; many other jobs resumed from where they left off and ran without duplicating any data.

Versions affected:

SCH Cloud latest release as of Aug 2020

Root Cause:

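For the jobs where data duplication was observed, the job configuration option "migrate offsets" was set to false.

This is the likely reason those 3 jobs duplicated data when node2 was disabled. Presumably, with offset migration turned off, a job's last-saved offset is not carried over to a different SDC, so any of these jobs that had last run on node2 started on node1 without that offset and reloaded data that had already been processed.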
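The following toy Python simulation sketches the suspected mechanism. It assumes, as the root cause above suggests, that with offset migration enabled the last-saved offset follows the job to whichever SDC runs it, and that with it disabled each SDC only knows the offset it saved itself. `Engine`, `Job`, `run_job`, `simulate`, and the `migrate_offsets` field are hypothetical names for illustration, not the StreamSets API.

```python
# Toy simulation of offset handling when a job moves between SDCs.
# Hypothetical names throughout; this is not the StreamSets Control Hub API.
from dataclasses import dataclass, field


@dataclass
class Engine:
    """Stand-in for one SDC; it stores offsets locally, keyed by job name."""
    name: str
    local_offsets: dict = field(default_factory=dict)


@dataclass
class Job:
    name: str
    migrate_offsets: bool
    # Offset kept centrally (assumed: by Control Hub) when migration is enabled.
    central_offset: int = 0


def run_job(job: Job, engine: Engine, source: list, sink: list) -> None:
    """Copy every record after the starting offset from source to sink."""
    if job.migrate_offsets:
        # The offset follows the job, whichever engine runs it.
        start = job.central_offset
    else:
        # Only this engine's own saved offset is known; a new engine starts at 0.
        start = engine.local_offsets.get(job.name, 0)
    sink.extend(source[start:])
    end = len(source)
    engine.local_offsets[job.name] = end
    if job.migrate_offsets:
        job.central_offset = end


def simulate(migrate_offsets: bool) -> list:
    """Run a job on node2, add new data, then run it on node1 (node2 disabled)."""
    node1, node2 = Engine("node1"), Engine("node2")
    job = Job("example_job", migrate_offsets)
    source, sink = ["r1", "r2", "r3"], []
    run_job(job, node2, source, sink)  # scheduled run on node2
    source.append("r4")                # new data arrives before the next run
    run_job(job, node1, source, sink)  # next scheduled run lands on node1
    return sink


if __name__ == "__main__":
    print(simulate(True))   # ['r1', 'r2', 'r3', 'r4']
    print(simulate(False))  # ['r1', 'r2', 'r3', 'r1', 'r2', 'r3', 'r4']
```

With migration enabled, node1 resumes at offset 3 and loads only `r4`; with it disabled, node1 starts from offset 0 and reloads `r1` to `r3`, which mirrors the kind of duplicates seen in the 3 affected jobs.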
