Question

Postgres CDC records are not generating over period of time

  • 8 November 2022
  • 9 replies
  • 74 views

We have Streamsets pipeline for generating Postgres incremental records using Logical replication slots.

It ran successfully for few days and generated the files.

Then after few days Streamsets are running and are in active state, but the incremental files are not getting generated. 

We dont see any errors in Streamsets and Postgres logs. The replication slots are getting filled up in the backend.

Do, someone knows the rootcause of this issue and any solution to fix this.

Please suggest.

 

Thank you,

Jose Kattakayam

 


9 replies

Userlevel 3
Badge

Hello @jk56273 - Thank you for reaching out to the StreamSets Community.

Could you please confirm the data collector version and share the screenshot of the pipeline with us?

Userlevel 3
Badge +1

@jk56273 

 

can you please try to run the job with “Reset origin option “ and validate if the pipeline is running and generating the file.

Hi @Bikram,

We have tried running the job from Reset origin option, but still the issue persists. After certain time, the Postgres replication slot gets piled up. 

@AkshayJadhav The Data collector version we use is V4.4.1. We cannot share the pipeline screenshot.

Userlevel 3
Badge

@jk56273  - Could you please confirm if `Remove Replication Slot on Close` is set to true?

Userlevel 3
Badge +1

@jk56273 

 

Can you please check if the configuration is same as given below.

 

Thanks & Regards

Bikram_

 

@jk56273  - Could you please confirm if `Remove Replication Slot on Close` is set to true?

 

We tested after enabling “Remove Replication Slot on Close”, We restart the pipeline every one hour with 5 mins gap. But if any transactions happen in this 5 min, it will be missed as the Replication slot is not present. Thats why we didnt enable this option.

Userlevel 3
Badge

Hi @jk56273  - Could you please add the debug and see if that shows any error in the sdc.log file during the issue time:

Go to SDC UI -> 'Administration' (cog-wheel top-right) -> 'Logs' -> 'Log Config'
Append the following line:

  log4j.logger.com.streamsets.pipeline.stage.origin.jdbc.cdc.postgres=DEBUG
 

Hi @AkshayJadhav

We have added above debug in log config and rerun the pipeline. But no records are processed. We get the following DEBUG message “Checking if any active runner is idle”.

Meanwhile, Replication Slot is having 3137 Mb data size.

Is the data processing not happening in streamsets because of the huge data in Replication slot?

Userlevel 3
Badge

Hello @jk56273 We can ignore those messages for now. Would it be possible to open up support case with Streamsets for further troubleshooting?

Thank you - AkshayJ

Reply