We are using the PostgreSQL CDC Client to ingest data from Postgres to ADLS. It works for our other pipelines, which receive less than 1 GB of CDC records.
When we drop and recreate the replication slot, it works fine. But after some time, CDC files stop being generated by StreamSets. We do not find any errors in the pipeline logs, and the job stays active and running.
When we query pg_replication_slots, the volume of data for the slot has grown from more than 1 GB to several GB. The CDC pipeline does not stream again until we drop the replication slot.
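For reference, the slot size mentioned above can be measured roughly as follows. This is only a minimal sketch: it assumes PostgreSQL 10 or later (for pg_current_wal_lsn and pg_wal_lsn_diff), and the names SLOT_LAG_QUERY, lsn_to_int and retained_wal_bytes, as well as the LSN values, are made up for illustration and are not taken from our pipeline.

```python
# Illustrative sketch only; names and LSN values are assumptions.

# Query that could be run in psql (PostgreSQL 10+) to show how much WAL
# each replication slot is holding back.
SLOT_LAG_QUERY = """
SELECT slot_name,
       active,
       restart_lsn,
       confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
"""


def lsn_to_int(lsn: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between a slot's restart_lsn and the current WAL position."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


if __name__ == "__main__":
    # Hypothetical LSNs, chosen only to show the arithmetic.
    lag = retained_wal_bytes("1/40000000", "1/0")
    print(f"{lag / 1024**3:.2f} GiB retained")
```

If confirmed_flush_lsn stops advancing while the retained WAL keeps growing, that would suggest the consumer is no longer acknowledging changes, which seems to match what we observe.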
Please suggest how to fix this issue. Do we need to change the StreamSets configuration or the Postgres configuration? Please let us know.
Our StreamSets configuration is as follows:
Max Batch Size: 15000
StreamSets engine: 4.4.1
Batch Wait Time: 15000
Query Timeout: ${45 * MINUTES}
Poll Interval: ${1 * SECONDS}
Status Interval: ${30 * SECONDS}
CDC Generator Queue Size: 20000