Problem:
A job does not stop after its pipeline encounters an error; it keeps running. The pipeline is configured to retry a few times on error and then fail, but the job never shuts down completely.
Solution:
When you create a job for a pipeline, some job configuration settings can override the pipeline's own settings, in particular those that govern retries when failover is enabled. Two job settings directly affect this scenario: Failover Retries per Data Collector and Global Failover Retries.
Failover Retries per Data Collector is the maximum number of pipeline failover retries to attempt on each available Data Collector. Global Failover Retries is the maximum number of pipeline failover retries to attempt across all available Data Collectors. The default value for both settings is -1, which allows unlimited retries on the job. As a result, when failover is enabled, the pipeline never stops completely, even if the pipeline's own settings are configured to stop it after a set number of retries.
To achieve the desired result, change both job settings from the default of -1 to a finite value, for example 1. The job then retries once on all available Data Collectors before shutting down completely and going into a “RED INACTIVE” state.
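As a rough illustration, the check described above could be sketched in Python. This is not a Control Hub API: the key names `failover_retries_per_data_collector` and `global_failover_retries`, the dictionary layout, and the helper `unlimited_failover_settings` are all hypothetical and only model the two job settings and their -1 default.

```python
from typing import Dict, List

# Hypothetical key names; a real Control Hub job definition may name these differently.
PER_SDC_KEY = "failover_retries_per_data_collector"
GLOBAL_KEY = "global_failover_retries"
UNLIMITED = -1  # default value described in this article: unlimited retries


def unlimited_failover_settings(job_config: Dict[str, int]) -> List[str]:
    """Return the failover retry settings that are still unlimited (-1).

    A missing key is treated as the default of -1.
    """
    return [
        key
        for key in (PER_SDC_KEY, GLOBAL_KEY)
        if job_config.get(key, UNLIMITED) == UNLIMITED
    ]


if __name__ == "__main__":
    default_job = {}  # both settings left at their defaults
    fixed_job = {PER_SDC_KEY: 1, GLOBAL_KEY: 1}
    print(unlimited_failover_settings(default_job))
    print(unlimited_failover_settings(fixed_job))
```

An empty result means both settings have a finite value, which is the configuration this article recommends for a job that should eventually stop.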