Problem
Old SCH job run histories take up a lot of disk space, either on SDC nodes or in the SCH database, and need to be limited for better performance.
Solution
Job run history is stored in SCH and in SDC (if SDC has configured dpm.runHistory.enabled=true in $SDC_CONF/dpm.properties).
- In SCH, you can configure the Maximum number of job runs setting for each organization. This is the maximum number of runs retained in each job's history.
- SCH periodically runs a clean-up task to purge all old job histories when this limit is reached.
- SDC can also retain a local copy of job run histories if dpm.runHistory.enabled=true is configured in $SDC_CONF/dpm.properties.
- SDC keeps runtime information pertaining to each running pipeline in $SDC_DATA/runInfo.
- SDC deletes any local job history information in $SDC_DATA/runInfo whenever a pipeline finishes.
- If you set dpm.runHistory.enabled=true, SDC copies each job’s history from $SDC_DATA/runInfo into $SDC_DATA/runHistory before this clean-up occurs, retaining a persistent local copy of every job run that occurred on that SDC.
- SDC has no configuration option to limit or automatically clean up the history files in $SDC_DATA/runHistory.
- If you need to clean up your job histories on SDC due to filesystem limitations, you can either:
  - Regularly clean up the persistent run histories by deleting old files in the $SDC_DATA/runHistory directory.
  - Set dpm.runHistory.enabled=false so SDC will not retain local copies of job histories.
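
The regular clean-up of $SDC_DATA/runHistory could be scripted and run from a scheduler such as cron. The following is a minimal sketch, not an official StreamSets tool. It assumes that file modification time is an acceptable measure of a history file's age, that the SDC_DATA environment variable is set for the user running it, and that a 30-day retention period is wanted. The function name purge_old_run_history and its parameters are hypothetical, and the internal layout of the runHistory directory is not assumed: every regular file under it is considered.

```python
import os
import time
from pathlib import Path


def purge_old_run_history(run_history_dir, max_age_days, dry_run=True, now=None):
    """Delete files under run_history_dir last modified more than max_age_days ago.

    Returns the files that were deleted (or, in a dry run, would be deleted).
    Directories are left in place; only regular files are removed.
    """
    root = Path(run_history_dir)
    if not root.is_dir():
        return []
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400  # age threshold in seconds
    removed = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            removed.append(path)
            if not dry_run:
                path.unlink()
    return removed


if __name__ == "__main__":
    # Assumption: SDC_DATA is exported in this environment.
    sdc_data = os.environ.get("SDC_DATA")
    if sdc_data:
        # Dry run by default: report what would be removed without deleting.
        for old_file in purge_old_run_history(Path(sdc_data) / "runHistory", max_age_days=30):
            print("would delete:", old_file)
```

The sketch defaults to a dry run so the list of candidate files can be reviewed first. Pass dry_run=False only after confirming that the listed files are safe to remove.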