Solved

Run Pipeline for a specific amount of time


I have numerous Kafka topics that I’m moving to Databricks, but I don’t want the pipelines to run continuously. Is there a way to schedule a pipeline to run for a fixed amount of time, or to trigger it to stop after a certain amount of time, say an hour or two?


Best answer by saleempothiwala 6 March 2023, 22:35


2 replies


@pkandra 

If the source is Kafka and you want to stop after consuming the messages, you need to stop the pipeline from code.

We don’t have a built-in event for stopping the pipeline.

You can also try one thing: stop on the event by selecting the Databricks option, and check whether that helps.

Below is a snippet for your reference.

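Init script:

    sdc.state['first_batch'] = "true"

Script:

    # Once at least one non-empty batch has been seen, an empty batch means
    # the topic is drained, so emit an event that can be used to stop the pipeline.
    if sdc.state['first_batch'] == "false" and len(sdc.records) == 0:
        sdc.log.info("No more Kafka messages to consume. Stopping pipeline. See ya!")
        sdc.toEvent(sdc.createEvent("no-more-messages", 0))

    for record in sdc.records:
        try:
            sdc.output.write(record)
        except Exception as e:
            # Send record to error
            sdc.error.write(record, str(e))

    # Remember that the first non-empty batch has arrived.
    if sdc.state['first_batch'] == "true" and len(sdc.records) > 0:
        sdc.state['first_batch'] = "false"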
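
Since the question asks about stopping after a fixed amount of time rather than when the topic is drained, the same event approach could probably be driven by elapsed time instead. Here is a minimal sketch of that check in plain Python. `RunDeadline` and its `clock` parameter are hypothetical names made up for illustration, not part of StreamSets. The assumption is that you would create the object in the init script, keep it in the state object, and send the stop event from the main script once `expired()` returns True. I have not tested that wiring in a pipeline.

```python
import time


class RunDeadline(object):
    """Hypothetical helper: reports whether a maximum run time has elapsed."""

    def __init__(self, max_seconds, clock=time.time):
        # clock is injectable so the logic can be checked without waiting.
        self._clock = clock
        self._started = clock()
        self._max_seconds = max_seconds

    def elapsed(self):
        """Seconds since the object was created."""
        return self._clock() - self._started

    def expired(self):
        """True once at least max_seconds have passed."""
        return self.elapsed() >= self._max_seconds


# Example: a two-hour limit, checked once per batch.
deadline = RunDeadline(2 * 60 * 60)
if deadline.expired():
    # In the evaluator script this is where the stop event would be sent.
    pass
```

The check only runs when the script processes a batch, so the actual stop time would lag the limit by up to one batch interval.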

 

Please let me know if I can help further with this issue.

Thanks & Regards

Bikram_


@pkandra you can create two scheduled tasks:

  1. Scheduled task with Action = Start at, say, 1pm
  2. Scheduled task with Action = Stop at, say, 3pm
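
If there are many pipelines, working out each start/stop pair by hand gets tedious. A rough sketch of the arithmetic is below. `start_stop_schedule` is a hypothetical helper, and the output uses standard 5-field cron strings (minute, hour, day of month, month, day of week) purely as an assumption; the scheduler you use may expect a different cron dialect, so adjust the format accordingly.

```python
def start_stop_schedule(start_hour, start_minute, run_minutes):
    """Hypothetical helper: daily Start/Stop task times as 5-field cron strings."""
    if not (0 <= start_hour < 24 and 0 <= start_minute < 60):
        raise ValueError("start time out of range")
    if not (0 < run_minutes < 24 * 60):
        raise ValueError("run_minutes must be between 1 and 1439")

    # Work in minutes since midnight and wrap past midnight if needed.
    start_total = start_hour * 60 + start_minute
    stop_total = (start_total + run_minutes) % (24 * 60)
    stop_hour, stop_minute = divmod(stop_total, 60)

    return {
        "start": {"action": "Start",
                  "cron": "%d %d * * *" % (start_minute, start_hour)},
        "stop": {"action": "Stop",
                 "cron": "%d %d * * *" % (stop_minute, stop_hour)},
    }


# The 1pm to 3pm example above: start at 13:00 and run for 120 minutes.
schedule = start_stop_schedule(13, 0, 120)
print(schedule["start"]["cron"])  # 0 13 * * *
print(schedule["stop"]["cron"])   # 0 15 * * *
```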
