I have numerous Kafka topics that I’m moving to Databricks, but I don’t want the pipelines to run continuously. Is there a way to schedule a pipeline to run for a certain amount of time, or to trigger it to stop after a certain amount of time, say an hour or two?
If the source is Kafka and you want the pipeline to stop once it has consumed the messages, you need to stop it from code.
There is no built-in event that stops the pipeline.
You can also try one more thing: stop on the event by selecting the Databricks option, and check whether that helps.
Below is a snippet for your reference.
Init Script:
sdc.state['first_batch'] = "true"

Script:
# After at least one non-empty batch has been seen, an empty batch means
# there is nothing left to consume, so emit the stop event.
if sdc.state['first_batch'] == "false" and len(sdc.records) == 0:
    sdc.log.info("No more Kafka messages to consume. Stopping pipeline. See ya!")
    sdc.toEvent(sdc.createEvent("no-more-messages", 0))
for record in sdc.records:
    try:
        sdc.output.write(record)
    except Exception as e:
        # Send record to error
        sdc.error.write(record, str(e))
# Remember that the first non-empty batch has been processed.
if sdc.state['first_batch'] == "true" and len(sdc.records) > 0:
    sdc.state['first_batch'] = "false"
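Since the question asks about stopping after an hour or two, the same state-based pattern could be adapted to a time limit instead of an empty batch. The following is only a minimal sketch of that check in plain Python. The helper name should_stop, the 'started_at' key and the max_runtime_seconds parameter are hypothetical names chosen for illustration, and the sketch assumes the evaluator's state object can be used like a dictionary.

```python
import time


def should_stop(state, max_runtime_seconds, now=None):
    """Return True once max_runtime_seconds have passed since the first call.

    `state` is any dict-like object that survives between batches
    (assumed here to behave like the evaluator's state object).
    """
    if now is None:
        now = time.time()
    # Record the start time the first time a batch is seen.
    if 'started_at' not in state:
        state['started_at'] = now
    return (now - state['started_at']) >= max_runtime_seconds


# Stand-in state and hypothetical timestamps, in seconds, for a 1-hour limit.
state = {}
print(should_stop(state, 3600, now=0))     # first batch: False
print(should_stop(state, 3600, now=1800))  # 30 minutes in: False
print(should_stop(state, 3600, now=3600))  # one hour in: True
```

Inside the evaluator, the idea would be to call this check at the top of the script and, when it returns True, emit the same stop event that the snippet above sends when no messages are left.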
Please let me know if I can help you more on the issue.
Thanks & Regards
Bikram_
- Schedule task with Action = Start at say 1pm
- Schedule task with Action = Stop at say 3pm