audit logic for CDC pipelines

  • 25 February 2022
  • 4 replies

We are ingesting data from an Oracle source using CDC pipelines (StreamSets), so any INSERT/UPDATE/DELETE that happens at the source gets ingested. The pipelines run 24*7. I wanted to know how an audit can be performed on this, and what the logic for the audit checks could be.

We are using Databricks for the audit checks.

4 replies

Userlevel 5

@harshith, were @Giuseppe Mura’s suggestions helpful? If so, please mark “Best Answer”.

Userlevel 3

@harshith, that makes a lot of sense: you want some audit of the loads and some level of reconciliation against the source systems. You can get the record count from the UI, but that’s only helpful if you’re an interactive user; for automation and reporting, you can use the REST API.

Another easy option is to use the Data Collector “Control Hub API” stage.



Note that you’ll need to pass the API credentials in the request headers.
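For anyone calling the API from a script instead of the stage, here is a minimal sketch of what those headers could look like in Python. The header names (`X-SS-App-Component-Id`, `X-SS-App-Auth-Token`, `X-SS-REST-CALL`, `X-Requested-By`) are an assumption based on how API credentials are usually passed on StreamSets DataOps Platform, so check them against your Control Hub version. The function name and the placeholder values are hypothetical.

```python
def build_auth_headers(credential_id: str, auth_token: str) -> dict:
    """Return HTTP headers carrying Control Hub API credentials.

    Assumption: the header names follow the DataOps Platform convention;
    verify them against the documentation for your Control Hub version.
    """
    if not credential_id or not auth_token:
        raise ValueError("credential_id and auth_token are both required")
    return {
        "Content-Type": "application/json",
        "X-Requested-By": "audit-script",  # free-form client identifier
        "X-SS-REST-CALL": "true",
        "X-SS-App-Component-Id": credential_id,
        "X-SS-App-Auth-Token": auth_token,
    }


# Placeholder values; in practice read these from a secret store.
headers = build_auth_headers("<credential-id>", "<auth-token>")
```

On Databricks the two values would normally come from a secret scope, not from literals in a notebook.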


Giuseppe Mura, thanks for the reply. Since this will run in production on a daily basis, we want to capture the record count and match it against the source every day in a Delta table.
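One possible shape for that daily check, as a minimal sketch only: compare the counts per operation type and produce one audit row per operation. How the source and target counts are obtained is left out; the function name, the column names, and the sample numbers are all illustrative assumptions, not something taken from a real pipeline.

```python
from datetime import date


def reconcile_counts(run_date, source_counts, target_counts):
    """Compare per-operation record counts and return one audit row each.

    source_counts / target_counts map an operation name
    ("INSERT", "UPDATE", "DELETE") to a record count.
    """
    rows = []
    for op in ("INSERT", "UPDATE", "DELETE"):
        src = source_counts.get(op, 0)
        tgt = target_counts.get(op, 0)
        rows.append({
            "run_date": run_date.isoformat(),
            "operation": op,
            "source_count": src,
            "target_count": tgt,
            "difference": src - tgt,
            "status": "MATCH" if src == tgt else "MISMATCH",
        })
    return rows


# Made-up numbers, purely to show the output shape.
rows = reconcile_counts(
    date(2022, 2, 25),
    {"INSERT": 120, "UPDATE": 45, "DELETE": 3},
    {"INSERT": 120, "UPDATE": 44, "DELETE": 3},
)
for row in rows:
    print(row["operation"], row["status"], row["difference"])
```

In a Databricks notebook the resulting rows could then be appended to a Delta table, for example with `spark.createDataFrame(rows).write.format("delta").mode("append").saveAsTable("audit.cdc_reconciliation")`, where the table name is hypothetical.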

Userlevel 3

Hi @harshith, the pipeline itself keeps collecting metrics about its performance; the histograms in the pipeline give you a graphical view of that performance (e.g. record throughput).


Note that the detailed information can be extracted from the Control Hub repository using the REST APIs. Given a specific jobId, the metrics call returns a JSON document with all the metrics related to that execution.
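A rough Python sketch of such a call, using only the standard library, might look like the following. The endpoint path in `METRICS_PATH` is an assumption and may differ on your instance, so take the real one from the REST API documentation of your Control Hub; the function names and the base URL are hypothetical as well.

```python
import json
import urllib.parse
import urllib.request

# Assumption: illustrative path only; confirm it in your Control Hub API docs.
METRICS_PATH = "/jobrunner/rest/v1/metrics/job/{job_id}"


def build_metrics_request(base_url, job_id, headers):
    """Build (but do not send) a GET request for the metrics of one job."""
    safe_id = urllib.parse.quote(job_id, safe=":")
    url = base_url.rstrip("/") + METRICS_PATH.format(job_id=safe_id)
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_job_metrics(base_url, job_id, headers, timeout=30):
    """Send the request and return the parsed JSON metrics document."""
    request = build_metrics_request(base_url, job_id, headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A scheduled notebook could call `fetch_job_metrics("https://<your-control-hub-host>", job_id, headers)` once per job and store the returned document alongside the daily audit results.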