Issue:
A pipeline got into STARTING_ERROR status and cannot be recovered: neither a force stop nor a restart of the pipeline works.
Solution:
A very common scenario where this can be seen is the $SDC_DIST/data directory hitting 100% disk utilisation. If the same volume/mount-point also hosts fast-growing files such as SDC logs, proper control measures should be enforced - log rolling, a cap on the maximum size a single log file is allowed to reach, and so on.
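To confirm this is what happened, check how full the filesystem backing the data directory is and what is consuming the space. A minimal sketch, assuming the SDC_DATA environment variable is set (it falls back to the current directory so the commands are runnable anywhere; substitute your actual data path):

```shell
# SDC_DATA is an assumption here -- point this at your real data dir.
DATA_DIR="${SDC_DATA:-.}"

# Show utilisation of the filesystem backing the data directory.
# A Use% of 100% matches the STARTING_ERROR scenario described above.
df -h "$DATA_DIR"

# List the ten largest entries under it -- rolled logs sharing the
# volume are a common culprit.
du -ah "$DATA_DIR" 2>/dev/null | sort -rh | head -n 10
```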
If you go to the SDC UI -> Administration -> Logs -> Log Config, you should see the log4j properties with our default defined 'streamsets' appender. You can see this appender will have settings similar to the following:

log4j.appender.streamsets=org.apache.log4j.RollingFileAppender
log4j.appender.streamsets.MaxFileSize=256MB
log4j.appender.streamsets.MaxBackupIndex=10

With these settings the appender keeps one active log plus up to MaxBackupIndex rolled copies, so in the worst case it consumes roughly (10 + 1) x 256MB, about 2.8GB of disk.
To proactively prevent the pipeline from hitting such an issue:
- Adjust MaxFileSize and MaxBackupIndex to suit your environment's capacity.
- Better still, write the log files to a separate volume/mount-point from the one used by $SDC_DATA - especially if you retain a large amount of logs and run many pipelines - so that log retention cannot affect the writing of the pipeline states.
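Relocating the logs can be sketched as below. This assumes the SDC_LOG environment variable in libexec/sdc-env.sh controls the log directory - verify the variable name against the documentation for your SDC version, and note that /var/log/sdc is an illustrative mount-point, not a required path:

```shell
# libexec/sdc-env.sh -- point SDC logs at a dedicated volume so rolled
# logs can never fill the filesystem that holds $SDC_DATA.
# /var/log/sdc is an example path on a separate mount-point.
export SDC_LOG=/var/log/sdc
```

Restart the Data Collector after changing this so the new log location takes effect.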
A quick workaround once you have already hit this is to remove some of the older rolled log files to reduce disk space utilisation, which frees enough room for the pipeline state files to be written again.
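A minimal sketch of that cleanup, keeping only the newest rolled logs. LOG_DIR and the sdc.log.* naming pattern are assumptions - adjust both to match where your instance actually writes its logs, and note it deliberately never touches the live sdc.log:

```shell
# LOG_DIR is an assumed example path -- set it to your real log dir.
LOG_DIR="${LOG_DIR:-/var/log/sdc}"
KEEP=3   # number of rolled logs to retain

# ls -1t sorts newest first; tail skips the KEEP newest and the rest
# are deleted. Only rolled copies (sdc.log.1, ...) match the glob,
# so the active sdc.log is left alone.
ls -1t "$LOG_DIR"/sdc.log.* 2>/dev/null | tail -n +$((KEEP + 1)) | while read -r f; do
  rm -f -- "$f"
done
```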