
Audit logic for CDC pipelines

  • February 25, 2022
  • 4 replies
  • 114 views

harshith
Discovered Fame

We are ingesting data from an Oracle source using CDC pipelines (StreamSets), so the pipeline picks up every INSERT/UPDATE/DELETE that happens at the source. I wanted to know how an audit can be performed on this data, given that it runs 24x7, and what the logic for the audit checks could be.

We are using Databricks for the audit checks.

4 replies

Giuseppe Mura
StreamSets Employee
  • StreamSets Employee
  • 37 replies
  • February 25, 2022

 Hi @harshith, the pipeline continuously collects metrics about its own performance; the histograms in the pipeline give you a graphical view of that performance (e.g. record throughput).
 

 

More detailed information can be extracted from the Control Hub repository through the REST API, using the following endpoint:

/jobrunner/rest/v1/metrics/job/{jobId}

 

Given a specific jobId, it returns a JSON document with all the metrics related to that job's execution.
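As a rough illustration of how that JSON could feed an audit, the Python sketch below pulls record counts out of a metrics document. It assumes the response uses the Data Collector / Dropwizard metrics layout, with counters such as pipeline.batchInputRecords.counter under a "counters" key; both the layout and the counter names are assumptions, so check them against a real response first. The consistency check (input = output + error) is only a heuristic for a simple origin-to-destination pipeline that does not drop records on purpose.

```python
import json
from typing import Any, Dict

# ASSUMPTION: record counters follow the Data Collector / Dropwizard metrics layout,
# i.e. {"counters": {"<counter name>": {"count": N}, ...}}.
# Verify the layout and the names below against a real response.
INPUT_COUNTER = "pipeline.batchInputRecords.counter"
OUTPUT_COUNTER = "pipeline.batchOutputRecords.counter"
ERROR_COUNTER = "pipeline.batchErrorRecords.counter"


def record_counts_from_json(raw: str) -> Dict[str, int]:
    """Parse a metrics JSON document and return input/output/error record counts."""
    metrics: Dict[str, Any] = json.loads(raw)
    counters = metrics.get("counters", {})

    def count(name: str) -> int:
        # A missing counter is reported as 0 instead of raising KeyError.
        return int(counters.get(name, {}).get("count", 0))

    return {
        "input": count(INPUT_COUNTER),
        "output": count(OUTPUT_COUNTER),
        "error": count(ERROR_COUNTER),
    }


def pipeline_is_consistent(counts: Dict[str, int]) -> bool:
    """Heuristic audit check: every record read was either written or sent to error.

    Only meaningful for a simple origin-to-destination pipeline that does not
    filter or discard records intentionally.
    """
    return counts["input"] == counts["output"] + counts["error"]
```

The counts returned here could be stored alongside the job ID and run date, which gives the daily audit something to compare against the source.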

 


harshith
Discovered Fame
  • Author
  • Discovered Fame
  • 11 replies
  • February 25, 2022

Giuseppe Mura, thanks for the reply. Since this will run in production on a daily basis, we want to capture the record count every day and reconcile it against the source in a Delta table.


Giuseppe Mura
StreamSets Employee
  • StreamSets Employee
  • 37 replies
  • February 25, 2022

@harshith, that makes a lot of sense: you want an audit of the loads and some level of reconciliation against the source systems. You can get the record count from the UI, but that only helps an interactive user; for automation and reporting, you can use the REST API.
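Once the counts are available, the daily reconciliation itself could look roughly like the Python sketch below: for each table, compare a source count (for example SELECT COUNT(*) on the Oracle table up to a fixed cutoff) with the corresponding count on the Databricks side, and record the outcome as one audit row. The AuditResult fields, the tolerance parameter, the MATCH/MISMATCH statuses and the ORDERS table are hypothetical choices for illustration, not part of StreamSets or Databricks.

```python
from dataclasses import asdict, dataclass
from datetime import date
from typing import Any, Dict


@dataclass
class AuditResult:
    """One row of a hypothetical daily audit table (for example, a Delta table)."""
    audit_date: str
    table_name: str
    source_count: int
    target_count: int
    difference: int
    status: str

    def as_row(self) -> Dict[str, Any]:
        # Plain dict, convenient for building a DataFrame from many results.
        return asdict(self)


def reconcile(table_name: str, audit_day: date, source_count: int,
              target_count: int, tolerance: int = 0) -> AuditResult:
    """Compare source and target counts for one table on one audit day.

    tolerance allows for CDC changes still in flight at the cutoff;
    0 means the counts must match exactly.
    """
    difference = source_count - target_count
    status = "MATCH" if abs(difference) <= tolerance else "MISMATCH"
    return AuditResult(
        audit_date=audit_day.isoformat(),
        table_name=table_name,
        source_count=source_count,
        target_count=target_count,
        difference=difference,
        status=status,
    )


# Example with made-up counts for a hypothetical ORDERS table.
result = reconcile("ORDERS", date(2022, 2, 25), source_count=1000, target_count=990)
print(result.status, result.difference)  # MISMATCH 10
```

In a Databricks notebook the rows could then be appended to a Delta table with something like spark.createDataFrame([r.as_row() for r in results]).write.format("delta").mode("append").saveAsTable("audit.cdc_reconciliation"), where the table name is a placeholder. Because the pipeline runs 24x7, counts taken at slightly different moments will drift, so comparing both sides up to the same cutoff (or allowing a small tolerance) helps avoid false mismatches; if the target is an append-only change log rather than a merged copy of the table, compare change counts rather than row counts.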

Another easy option is to use the Data Collector “Control Hub API” stage.


 

 

Note that you’ll need to pass the API credentials in the request headers.
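If you call the REST API directly instead (for example from the Databricks job that writes the audit table), the credentials likewise go into HTTP headers. The Python sketch below uses only the standard library. The base URL is a hypothetical placeholder, and the header names (X-SS-App-Component-Id, X-SS-App-Auth-Token, X-SS-REST-CALL, X-Requested-By) are assumed from StreamSets' API-credential conventions, so confirm them against the documentation for your Control Hub version.

```python
import urllib.request

# Hypothetical placeholder host; replace with your Control Hub URL.
BASE_URL = "https://controlhub.example.com"


def build_job_metrics_request(job_id: str, credential_id: str,
                              auth_token: str) -> urllib.request.Request:
    """Build a GET request for /jobrunner/rest/v1/metrics/job/{jobId}.

    ASSUMPTION: header names follow StreamSets' API-credential conventions;
    verify them for your Control Hub version.
    """
    url = f"{BASE_URL}/jobrunner/rest/v1/metrics/job/{job_id}"
    headers = {
        "Content-Type": "application/json",
        "X-Requested-By": "audit-job",
        "X-SS-REST-CALL": "true",
        "X-SS-App-Component-Id": credential_id,
        "X-SS-App-Auth-Token": auth_token,
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_job_metrics(job_id: str, credential_id: str, auth_token: str) -> str:
    """Send the request and return the raw JSON body as text."""
    req = build_job_metrics_request(job_id, credential_id, auth_token)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Keeping request construction separate from the network call makes the header handling easy to check without contacting Control Hub; in practice the credential ID and token would come from a secret store rather than being hard-coded.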
 

 


Drew Kreiger
Rock star
  • Senior Community Builder at StreamSets
  • 95 replies
  • March 7, 2022

@harshith, were @Giuseppe Mura's suggestions helpful? If so, please mark one as “Best Answer”.

