
Scheduling pipeline execution


Drew Kreiger
  • Senior Community Builder at StreamSets

Job scheduling was not offered prior to Control Hub version 3.2.0. If you're using Control Hub 3.2.0 or higher, you can use the Scheduling feature to start pipelines at specific times. 

The documentation for Control Hub's Scheduler can be found here: https://streamsets.com/documentation/controlhub/latest/help/index.html#controlhub/UserGuide/Scheduler/Scheduler_title.html#concept_up3_pm3_ldb

 

If you're not using Control Hub, or you're on a version earlier than 3.2.0, you can schedule pipelines to start via cron or another scheduling utility. Use the Data Collector CLI to start and stop the pipelines.

The documentation for Data Collector CLI is here: 
https://streamsets.com/documentation/datacollector/latest/help/index.html#Administration/Administration_title.html#concept_ywx_d5x_pt

As a reminder, the fields in a crontab entry are, in order:

Minute: 0-59

Hour: 0-23

Day of month: 1-31

Month of year: 1-12

Day of week: 0-6 (0 is Sunday)

Command: the command to execute

To run a pipeline on weekdays, at 1:00 am, your crontab entry might look like this: 

00 01 * * 1-5 bin/streamsets cli -U http://localhost:18630 manager start -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db

This entry starts the pipeline at 1:00 am, Monday through Friday. Replace 1-5 with * to include weekends.

Depending on your environment, you will likely need to adjust the path to bin/streamsets above, since cron jobs typically run from the user's home directory with a minimal environment. A wrapper script that sets up the shell's environment and can start an arbitrary pipeline may make this more manageable.
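As a rough idea of what such a wrapper could look like, here is a minimal sketch. The script name start-pipeline.sh, the SDC_HOME and SDC_URL variables, the DRY_RUN switch, and the /opt/streamsets-datacollector default path are all assumptions for illustration, not part of StreamSets. Only the CLI invocation itself is taken from the crontab entry above. Your installation may also need other environment settings that are not shown here.

```shell
#!/bin/sh
# Hypothetical wrapper: start-pipeline.sh <pipeline-id>
# All variable names and default paths below are assumptions; adjust them
# to match your installation.

# cron provides a minimal PATH, so extend it explicitly.
PATH="/usr/local/bin:/usr/bin:/bin:$PATH"
export PATH

SDC_HOME="${SDC_HOME:-/opt/streamsets-datacollector}"  # assumed install directory
SDC_URL="${SDC_URL:-http://localhost:18630}"           # URL used in the crontab example

# Print the CLI command line that would start the given pipeline.
build_start_command() {
    printf '%s/bin/streamsets cli -U %s manager start -n %s\n' \
        "$SDC_HOME" "$SDC_URL" "$1"
}

# Start the pipeline whose ID is passed as the first argument.
# With DRY_RUN=1, only print the command instead of running it.
start_pipeline() {
    if [ -z "$1" ]; then
        echo "usage: start_pipeline <pipeline-id>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        build_start_command "$1"
        return 0
    fi
    "$SDC_HOME/bin/streamsets" cli -U "$SDC_URL" manager start -n "$1"
}

# Run only when a pipeline ID was given on the command line.
if [ "$#" -gt 0 ]; then
    start_pipeline "$1"
fi
```

With the script saved somewhere cron can reach it (the path here is a placeholder), the crontab entry could then become:

00 01 * * 1-5 /path/to/start-pipeline.sh MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db

Running it once by hand with DRY_RUN=1 prints the command that would be executed, which is a cheap way to check the path before relying on cron.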

 

@bob 

March 05, 2020 10:46
