Solved

Using records from a Control Hub API DC pipeline


Hi,
What would be the best way to take the record returned by a Control Hub API request for a job status (made from a Data Collector pipeline) and write it to a file, say on HDFS?

Cheers


4 replies

Giuseppe Mura
StreamSets Employee
  • 37 replies
  • Answer
  • July 6, 2022

@collid, could you please expand on your question? The response from the API is a JSON document, which Data Collector handles natively. You then have many options for manipulating the payload using the built-in Processor stages: you can flatten the structure, pick out individual elements, and so on. You can then write the data out in whatever format you need (JSON, Delimited, Avro, etc.) to any destination system supported by Data Collector, including HDFS if your SDC is connected to your data lake.
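
To illustrate what "flattening" a JSON job-status response means, here is a minimal Python sketch that runs outside Data Collector. The payload and its field names (`jobId`, `status`, `pipelineStatus`, `sdcId`) are hypothetical, chosen only for illustration, and are not taken from the Control Hub API. Inside a pipeline, a built-in Processor stage would do this work instead.

```python
import json


def flatten(node, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by field path.

    Paths use a slash for map keys and [n] for list positions.
    Empty dicts and lists contribute no keys.
    """
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        # Leaf value (string, number, boolean, or None).
        flat[prefix] = node
    return flat


def to_json_line(response):
    """Serialize the flattened record as a single JSON line."""
    return json.dumps(flatten(response), sort_keys=True)


# Hypothetical job-status payload; the real response shape may differ.
sample_response = {
    "jobId": "job-123",
    "status": "ACTIVE",
    "pipelineStatus": [{"sdcId": "sdc-1", "status": "RUNNING"}],
}

if __name__ == "__main__":
    print(to_json_line(sample_response))
```

Each flattened record becomes one line of output, which is the shape a JSON or Delimited data format would write to a file-based destination.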
 


  • Author
  • Roadie
  • 6 replies
  • July 6, 2022

Sorry, I probably posted this too early!
It seems that using the Expression Evaluator to access the JSON response has done what I was trying to do.

Sorry!
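
For readers wondering what accessing the JSON response by field looks like, the Python sketch below mimics a field-path lookup, roughly what an Expression Evaluator expression such as `${record:value('/status')}` does inside the pipeline. The helper name `value_at` and the sample field names are hypothetical; this is an illustration under those assumptions, not Data Collector code.

```python
import re


def value_at(record, path):
    """Return the value at a field path such as '/pipelineStatus[0]/status'.

    Returns None when any part of the path is missing.
    """
    node = record
    # Split the path into map keys ("status") and list positions ("[0]").
    for token in re.findall(r"[^/\[\]]+|\[\d+\]", path):
        try:
            if token.startswith("["):
                node = node[int(token[1:-1])]
            else:
                node = node[token]
        except (KeyError, IndexError, TypeError):
            return None
    return node


# Hypothetical job-status payload; the real response shape may differ.
response = {
    "jobId": "job-123",
    "status": "ACTIVE",
    "pipelineStatus": [{"sdcId": "sdc-1", "status": "RUNNING"}],
}

if __name__ == "__main__":
    print(value_at(response, "/status"))
    print(value_at(response, "/pipelineStatus[0]/status"))
```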


Giuseppe Mura
StreamSets Employee
  • 37 replies
  • July 6, 2022

No problem at all @collid, glad it all worked out!


  • Author
  • Roadie
  • 6 replies
  • July 8, 2022

One other question relevant to this topic: is it possible for the Job Status API call to return results for more than one job ID in a single call, or do you need a separate API processor for each job in the Data Collector pipeline?

Thanks

