
I have a pipeline that gathers records from an HTTP client. At one point in the pipeline a stream selector filters out records that I don’t want and sends them to Trash, while the remaining records are written to an S3 destination. A Pipeline Finisher executor sits at the end.

The issue arises when no records remain after the stream selector, i.e., the data contained no relevant records. In that case the pipeline job keeps running indefinitely and I have to stop it manually. How can I stop the pipeline when there are no records left after the stream selector?

@HåkonD 

There are a few things that I need clarifications on:

  1. What is the selected mode for your HTTP Client origin? There are three possible modes: Streaming, Polling, and Batch. If your client returns all the records in one batch, set it to Batch mode and your pipeline will stop automatically without adding any executors.
  2. Where have you added the Pipeline Finisher executor? If it is attached to the S3 destination, it will trigger when there is no more data, but that is not a good place for it: it will stop your pipeline after the first batch of data is processed even if the next batch would have more data.
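For reference, the usual pattern is to connect the Pipeline Finisher to the origin’s event stream and guard it with a precondition so it only fires on the no-more-data event. A sketch of that precondition (adjust to your own pipeline; the stage layout here is an assumption, not taken from your screenshot):

```
# Pipeline Finisher Executor → General → Preconditions
${record:eventType() == 'no-more-data'}
```

With this precondition in place, other events from the origin are discarded instead of stopping the pipeline prematurely.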

@saleempothiwala 

  1. That’s where it gets a bit complicated. Based on your answer to another question I posted, I set the origin to a raw data source in order to create a dynamic timestamp in the URL, so the data collection happens in an HTTP Client processor, not an origin.
  2. I added the finisher after the S3 destination. For this pipeline it isn’t really necessary to run more than one batch of data, so that’s not a problem.

Don’t know how helpful it is, but I’ve added a screenshot of the pipeline.


@HåkonD 

Ah… now I remember :-)

You can set the ‘Stop After First Batch’ flag on the Dev Raw Data Source origin. That should automatically stop the pipeline after the first batch is processed.
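For completeness, here is roughly what that setup might look like. The host and query parameter in the Resource URL are placeholders, and the exact time EL expression for your dynamic timestamp will depend on the format your API expects:

```
# Dev Raw Data Source origin → Raw → Stop After First Batch: enabled

# HTTP Client processor → HTTP → Resource URL (illustrative):
https://api.example.com/records?since=${time:extractStringFromDate(time:now(), "yyyy-MM-dd'T'HH:mm:ss")}
```

With Stop After First Batch enabled, the origin emits one batch and then stops the pipeline on its own, so the Pipeline Finisher after the S3 destination is no longer needed for this case.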
