CDC Oracle Problem

  • 5 September 2022
  • 1 reply

Hi everyone, my name is Sebastian and rigth know Im start working with Streamsets, the plataform looks good but  I have some problems y want to bring data from Oracle Database Insert Update and Delete, all good at this point the problem is i want to read only one table but i watch the CDC reads al SCN of the datatable and when make the operation (Insert, Update, Delete) the SCN is 2238200632 and in the pipeline is 2237969670 so is huge space that the pipeline needs to record to take the operation, so my question is if is possible to improve or increased speed of read to try to catch the SCN of database? 

To operation on the DBB and go to Streamset is like 1 or 2 hours and is impossible to work with this teams in a table that only have 300 registers.


I hope one of you can help me!! Thank you

1 reply

Userlevel 2

HI @SebasR,

Our Oracle CDC Origin is really fast, being able to ingest many thousand records per second in high load database servers. I don’t think you have a performance issue here, albeit you are using a very old SDC version. As you don’t mention it, I will assume you are using latest SDC version (5.1).

Given this, it looks you have some settings in your CDC pipeline that can be improved. Please check the following settings:

  • Dictionary Source: If you don’t need to track dictionary changes (probably you don’t need this in your first contact with this stage), set it to Online Catalog.
  • Maximum Transaction Length: 1 minute is enough in most cases.
  • LogMiner Session Window: 1 to 2 minutes should be enough in most cases.
  • DB Time Zone: Make 100% sure it is exactly the same as in your database (select DBTIMEZONE from dual).
  • Max Batch Size (records): Use a reasonable value from 100 up.
  • Batch Wait Time (secs): Avoid 0 to start with. Something from 1 second to 30 seconds is a good value to begin experimenting with this stage.
  • JDBC Fetch Size for Current Window: Use some large value. 1000 is a good value for experimental testing and low load databases.
  • JDBC Fetch Size for Past Windows: Use some large value. 1000 is a good value for experimental testing and low load databases.
  • Disable Continuous Mine: Check this flag, specially if your database is 12c. Continuos Mine has been deprecated by Oracle, newer versions of Oracle do not support it, and, in most cases, disabling it will give you better performance. It is a good idea to disable it for experimental testing, even if after you decide to use it.

Please, let me know if these recommendations help you to address your issues. If not, sharing your pipeline would help to give further assistance.