Question

PostgreSQL CDC Client stops processing data when the replication slot volume reaches 1 GB

  • 28 November 2022

We are using the PostgreSQL CDC Client to ingest data from Postgres to ADLS. It works for our other pipelines, which receive less than 1 GB of CDC records.

When we drop and recreate the slots, it works fine. But after some time, CDC files stop being generated by StreamSets. We do not find any errors in the pipeline logs, and the job is active and running.

When we query pg_replication_slots, the volume of data held by the slot has grown to more than 1 GB, sometimes several GB. Until we drop the replication slot, the CDC pipeline does not stream.
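For anyone following along, here is a minimal sketch of how the slot volume can be measured. It assumes PostgreSQL 10 or later (where `pg_current_wal_lsn()` and `pg_wal_lsn_diff()` exist). The helper names `lsn_to_int`, `retained_bytes` and `exceeds_threshold` are made up for illustration, and the 1 GiB default threshold simply mirrors the figure in this thread.

```python
# Sketch only: measure how much WAL a replication slot is holding back.
# Assumes PostgreSQL 10+; run SLOT_LAG_QUERY with any SQL client.

SLOT_LAG_QUERY = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)         AS retained_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS unconfirmed_bytes
FROM pg_replication_slots;
"""


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def retained_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Client-side equivalent of pg_wal_lsn_diff(current_lsn, restart_lsn)."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def exceeds_threshold(current_lsn: str, restart_lsn: str,
                      threshold_bytes: int = 1024 ** 3) -> bool:
    """True when the slot retains more WAL than the threshold (1 GiB by default)."""
    return retained_bytes(current_lsn, restart_lsn) > threshold_bytes
```

If `retained_bytes` keeps growing while `active` is true, the consumer is connected but is not confirming its position back to the server, which is different from the slot simply being unused.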

Please suggest how to fix this issue. Do we need to change the StreamSets configuration or the Postgres configuration? Please let us know.

Our StreamSets configuration is as follows:

Max Batch Size: 15000
StreamSets engine: 4.4.1
Batch Wait Time: 15000
Query Timeout: ${45 * MINUTES}
Poll Interval: ${1 * SECONDS}
Status Interval: ${30 * SECONDS}
CDC Generator Queue Size: 20000


5 replies

@durairaj 

Can you please reduce the batch size to 50k or 100K and check if it helps.

 

Hi Bikram,

Thanks for your response. I have already set the batch size to 15K. Do you want me to increase the batch size to 50K, or reduce it to less than 15K? Please confirm.

I tried batch sizes of 50K and 100K for those databases, but the issue is the same.

Thanks & Regards

Durairaj S

@durairaj 

The batch size looks fine. If possible, can you please try the setup in PostgreSQL and validate whether it helps?

Thanks & regards

Bikram_

Hi Bikram,

Thanks for your response. I will make the change and test it. Before that, I need some information: I am unable to find the configuration parameters below on the Postgres server. Are they mandatory to set, or have these parameters been renamed?

checkpoint_flush_after (not available)
checkpoint_timeout (not available)
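
One way to confirm whether the server exposes these two settings is to look them up in the `pg_settings` view. The sketch below uses the standard PostgreSQL spellings, `checkpoint_flush_after` and `checkpoint_timeout`. The helper `missing_parameters` is a made-up name for illustration, and `rows` is assumed to be whatever your SQL client returns for the query as (name, setting, unit) tuples.

```python
# Sketch only: check which expected settings the server actually exposes.

SETTINGS_QUERY = """
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('checkpoint_flush_after', 'checkpoint_timeout');
"""

EXPECTED = ["checkpoint_flush_after", "checkpoint_timeout"]


def missing_parameters(expected, rows):
    """Return the expected setting names that do not appear in the query result."""
    found = {row[0] for row in rows}
    return [name for name in expected if name not in found]
```

An empty list from `missing_parameters(EXPECTED, rows)` means both settings are visible to the connected role. If a name is reported missing, it may be hidden by a managed service rather than renamed.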

@durairaj 

If you are not seeing those options, go ahead with the other configurations, run the pipeline, and check whether it throws the same error.

I connected to PostgreSQL from the JDBC consumers and it worked fine for me. I will try to provide you the details for your reference.

Thanks & Regards

Bikram_ 
