Solved

Logging mechanism for Data Transformer and Data Collector pipelines

  • 13 September 2021
  • 3 replies
  • 72 views

Is there a prebuilt processor or component that captures the number of records processed through each stage, along with other logging events? We need to capture record counts and other logging events and, if possible, store them in log files or MySQL.


Best answer by Giuseppe Mura 13 September 2021, 10:12


3 replies


Hi @Shanil, yes, there is a standard way to consume execution metrics, including overall pipeline/job record counts as well as statistics for each stage in a pipeline. It’s the Control Hub API:

/jobrunner/rest/v1/metrics/job/{jobId}

You can find the API documentation under the Help menu (the question mark icon in the toolbar at the top of the page).

When you call the metrics API you get a JSON document back that contains a number of objects. One of them is “counters”, which holds the information you’re after, including the following pipeline-level input/output counters:

  • pipeline.batchInputRecords.counter
  • pipeline.batchOutputRecords.counter
  • pipeline.batchErrorRecords.counter

You’ll also find stage level counters in there, if you need to retrieve information at stage level.
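As a minimal sketch of picking those counters out in Python: this assumes the metrics response has already been parsed into a dict, and that “counters” maps each counter name to an object with a "count" field. That payload shape is an assumption worth checking against a real response, and the helper name is hypothetical.

```python
# Pipeline-level counter names quoted in the answer above.
PIPELINE_COUNTERS = (
    "pipeline.batchInputRecords.counter",
    "pipeline.batchOutputRecords.counter",
    "pipeline.batchErrorRecords.counter",
)


def extract_pipeline_counts(metrics: dict) -> dict:
    """Return {counter name: count} for the pipeline-level counters.

    Assumption: metrics["counters"][name] is an object such as {"count": 42}.
    Counters missing from the payload are reported as None.
    """
    counters = metrics.get("counters") or {}
    result = {}
    for name in PIPELINE_COUNTERS:
        entry = counters.get(name)
        result[name] = entry.get("count") if isinstance(entry, dict) else None
    return result
```

The returned dict can then be written to a log line or a MySQL row with whatever tool you prefer.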

In terms of consuming the API, you can use your tool of choice, or indeed build a simple Data Collector pipeline to source from it and write any metrics you’re interested in to logs or MySQL as required.

Thanks for the response. I tried calling the API from a Data Collector pipeline using the HTTP Client origin with JSON as the data format. However, it fails with this error:

 

HTTP_00 - Cannot parse record. HTTP-Status: 200 Reason: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null') at [Source: com.streamsets.pipeline.api.ext.io.OverrunReader@1f4c5afc; line: 1, column: 2]

I noticed that there is an open issue for the same error:

https://issues.streamsets.com/browse/SDC-11039

Any pointers to proceed in this case?

 


Hi @Shanil, the most likely reason is that you haven’t authenticated to Control Hub, so the call returns an HTML page (a redirect to the login page) rather than the output of the API call. That would explain the unexpected '<' character in the JSON parse error, even though the HTTP status is 200.
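If you consume the API from a script, a quick guard makes this failure mode easier to spot. The following is a rough sketch only, with a hypothetical helper name; it treats any body that starts with '<' as HTML, which is a heuristic rather than anything Control Hub documents.

```python
import json


def parse_api_body(body: str):
    """Parse a response body as JSON, failing clearly if it looks like HTML.

    Heuristic (an assumption): a body whose first non-blank character is '<'
    is an HTML page, such as a login redirect, not an API response.
    """
    stripped = body.lstrip()
    if stripped.startswith("<"):
        raise ValueError(
            "Response looks like HTML, not JSON; check that the request "
            "carries valid Control Hub credentials."
        )
    return json.loads(stripped)
```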

I’ll refer you to the tip on the REST APIs page in Control Hub itself:


Sample script for using Control Hub REST APIs

# Get $CRED_ID & $CRED_TOKEN from API Credentials page
# Call Control Hub or Data Collector (Control Hub enabled) REST APIs using API Credentials
curl -X GET https://eu01.hub.streamsets.com/security/rest/v1/currentUser \
  -H "Content-Type:application/json" \
  -H "X-Requested-By:curl" \
  -H "X-SS-REST-CALL:true" \
  -H "X-SS-App-Component-Id: $CRED_ID" \
  -H "X-SS-App-Auth-Token: $CRED_TOKEN" \
  -i

So, what you need to do is:

  1. Get the CRED_ID and CRED_TOKEN values from the API Credentials page in the UI first
  2. Make the call to the API passing the CRED_ID and CRED_TOKEN values as in the example above
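For anyone scripting this instead of using curl, the two steps might look roughly like the Python sketch below. It reuses the headers from the sample script and the metrics path quoted earlier in the thread; the function names are hypothetical, the base URL is whatever your Control Hub instance uses, and it assumes the metrics endpoint accepts the same credential headers as the currentUser example.

```python
import json
import urllib.request


def build_headers(cred_id: str, cred_token: str) -> dict:
    """Headers copied from the sample curl script, with credentials filled in."""
    return {
        "Content-Type": "application/json",
        "X-Requested-By": "curl",  # value copied from the sample script
        "X-SS-REST-CALL": "true",
        "X-SS-App-Component-Id": cred_id,
        "X-SS-App-Auth-Token": cred_token,
    }


def metrics_url(base_url: str, job_id: str) -> str:
    """Join the Control Hub base URL with the job metrics path from the thread."""
    return f"{base_url.rstrip('/')}/jobrunner/rest/v1/metrics/job/{job_id}"


def fetch_job_metrics(base_url: str, job_id: str,
                      cred_id: str, cred_token: str) -> dict:
    """GET the job metrics document and parse it as JSON."""
    request = urllib.request.Request(
        metrics_url(base_url, job_id),
        headers=build_headers(cred_id, cred_token),
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

In a Data Collector pipeline, the equivalent would be to set the same headers on the HTTP Client origin.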

I hope this helps!
