Solved

Using records from a Control Hub API DC pipeline


Hi,
What would be the best way to take the record returned by a Control Hub API request for a job status (from a Data Collector pipeline) and write that record to a file, say on HDFS?

cheers


Best answer by Giuseppe Mura 6 July 2022, 11:41


4 replies


@collid, could you please expand on your question? The response from the API is a JSON document, which is handled perfectly well by Data Collector. You then have many options for manipulating the payload (you can flatten the structure, pick individual elements of it, etc.) using the many built-in Processor stages. You can then write the data out in whatever format (JSON, Delimited, Avro, etc.) to whatever destination system is supported by Data Collector, including HDFS if your SDC is connected to your data lake.
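To illustrate the kind of manipulation those Processor stages perform, here is a minimal Python sketch of flattening a nested job-status payload, similar to what a Field Flattener stage would do. The sample payload and its field names (`jobId`, `currentStatus`, etc.) are hypothetical, not taken from the actual Control Hub response.

```python
# Flatten a nested JSON payload into slash-delimited field paths,
# mimicking what a Field Flattener processor does to a nested record.
# The sample payload below is hypothetical.

def flatten(record, prefix=""):
    """Recursively flatten nested dicts into slash-delimited keys."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

sample = {
    "jobId": "abc123",
    "currentStatus": {"status": "ACTIVE", "color": "GREEN"},
}

print(flatten(sample))
# {'/jobId': 'abc123', '/currentStatus/status': 'ACTIVE', '/currentStatus/color': 'GREEN'}
```

The flattened record can then be serialized in any supported data format and written by a destination stage such as Hadoop FS.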
 

Sorry, I probably posted this too early!
It seems using the Expression Evaluator to access the JSON response has done what I was trying to do.

 

Sorry!


No problem at all @collid, glad it all worked out!

One other question relevant to this topic: is it possible for the Job Status API call to return results for more than one job ID in a single call, or do you need separate API processors in the Data Collector pipeline?

Thanks
