Scenario:
A pipeline with an ADLS Gen2 destination stage that writes to a nested directory structure stays in the `STARTING` state and takes a long time to transition to the `RUNNING` state.
Goal:
The bottleneck appears to be the file recovery functionality. File recovery is enabled by default, meaning "Skip file recovery" is not selected. While it is enabled, the transition to the `RUNNING` state is delayed.
This happens because, during recovery, the directories defined by the "Directory Template" are scanned to find and promote all temporary files. Temporary files can be left in the ADLS container after a forced stop of the pipeline or after the SDC process is killed. With a nested directory structure there are many directories to scan, so recovery can take a long time. After recovery has completed, the pipeline can be restarted at any time without this slowdown, provided it is stopped gracefully.
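To illustrate why a nested layout makes recovery slow, here is a minimal sketch of a recovery-style scan against a local filesystem. It is not the SDC implementation. The `_tmp_` prefix, the `recover_temp_files` function, and the date-based directory layout are all assumptions made for illustration. The sketch shows that the scan has to list every directory under the template root. The cost therefore grows with the number of directories, even when only one leftover file exists.

```python
import os
import tempfile

# Assumed temporary-file prefix, for illustration only; the real prefix
# depends on the destination's configuration.
TMP_PREFIX = "_tmp_"


def recover_temp_files(root: str) -> tuple[int, int]:
    """Hypothetical recovery scan: walk every directory under root and
    promote files whose name starts with TMP_PREFIX.

    Returns (directories_listed, files_promoted).
    """
    dirs_listed = 0
    promoted = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # One listing per directory: this is the part that grows with a
        # deeply nested directory structure.
        dirs_listed += 1
        for name in filenames:
            if name.startswith(TMP_PREFIX):
                src = os.path.join(dirpath, name)
                dst = os.path.join(dirpath, name[len(TMP_PREFIX):])
                os.rename(src, dst)  # "promote" by dropping the temp prefix
                promoted += 1
    return dirs_listed, promoted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical date-based layout: 2024/<month>/<day>
        for month in ("01", "02"):
            for day in ("01", "02", "03"):
                os.makedirs(os.path.join(root, "2024", month, day))
        # Simulate one file left behind by a forced stop
        leftover = os.path.join(root, "2024", "02", "03", TMP_PREFIX + "part-0")
        open(leftover, "w").close()
        # 10 directories are listed to promote a single file
        print(recover_temp_files(root))
```

In this toy layout, the scan lists 10 directories to promote a single file. In a real container with a deep template, the number of directory listings, each a remote call, is what delays the `RUNNING` state.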
Solution:
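- Disable the file recovery functionality, which is enabled by default.
Go to Pipeline -> ADLS Gen2 Destination -> Output Files tab and select the "Skip File Recovery" checkbox to disable recovery. Note that with recovery skipped, temporary files left behind by a forced stop are not promoted when the pipeline restarts.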