Question

How can I rerun my pipeline whenever my source is updated?


lakshmi_narayanan_t
Discovered Fame

My origin is a Directory origin reading CSV files, and the pipeline writes the data to an AWS S3 bucket. Whenever any data in my source is updated, I need to rerun the pipeline. How can I achieve this?

2 replies

Bikram
Headliner
  • 486 replies
  • June 20, 2023

@lakshmi_narayanan_t 

 

Can you use Kafka in your case? If yes, it should solve your problem.

 

Pipeline 1:

Read data from the source and send it to a Kafka producer.

Pipeline 2:

Fetch data from the Kafka topic and send it to the S3 bucket.

With this setup, you will get the updated data whenever anything changes in the source.
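
To illustrate the idea of splitting the work into two decoupled pipelines, here is a minimal Python sketch. It is not StreamSets or Kafka code: an in-memory `queue.Queue` stands in for the Kafka topic, a plain dict stands in for the S3 bucket, and the names `pipeline_1`, `pipeline_2`, and `key_for` are hypothetical.

```python
import queue


def pipeline_1(records, topic):
    """Stand-in for pipeline 1: publish each source record to the topic."""
    for record in records:
        topic.put(record)


def pipeline_2(topic, bucket, key_for):
    """Stand-in for pipeline 2: drain the topic and write records to the bucket.

    `bucket` is a dict used as a placeholder for S3 (assumption);
    `key_for` maps a record to the object key it is stored under.
    Returns the number of records written.
    """
    written = 0
    while True:
        try:
            record = topic.get_nowait()
        except queue.Empty:
            break
        # Writing under the same key overwrites the earlier version.
        bucket[key_for(record)] = record
        written += 1
    return written
```

Because the second stage only reads from the topic, any record the first stage publishes later, including a changed version of an earlier one, reaches the destination without restarting either stage.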

 

Please let me know if this helps; otherwise I will help you with a second approach to overcome your issue.

 


Sanjeev
StreamSets Employee
  • 53 replies
  • July 11, 2023

@lakshmi_narayanan_t Depending on the configured read order, the Directory origin will automatically pick up new files as they arrive, as long as the pipeline is running continuously or runs on a regular schedule.
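
As a rough illustration of what "picking up new files" by modification time can look like, here is a small Python sketch of a polling loop step. It is only an analogy and not how the Directory origin is implemented; the function names `find_new_files` and `poll_once`, the `.csv` suffix filter, and the use of the last seen modification time as the offset are all assumptions made for the example.

```python
import os


def find_new_files(directory, last_seen_mtime, suffix=".csv"):
    """Return (path, mtime) pairs for files modified after last_seen_mtime,
    ordered oldest first."""
    candidates = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith(suffix):
            mtime = entry.stat().st_mtime
            if mtime > last_seen_mtime:
                candidates.append((entry.path, mtime))
    candidates.sort(key=lambda item: item[1])
    return candidates


def poll_once(directory, last_seen_mtime, handle):
    """Process every file newer than the stored offset and return the new offset.

    `handle` is a caller-supplied callback (hypothetical), for example a
    function that uploads the file to a destination.
    """
    for path, mtime in find_new_files(directory, last_seen_mtime):
        handle(path)
        last_seen_mtime = mtime
    return last_seen_mtime
```

Calling `poll_once` repeatedly, either in a loop or from a scheduler, and passing the returned offset back in each time mirrors the two options mentioned above: a continuously running pipeline or one that runs on a regular schedule.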

