Missing LogMiner files - what's happening?

  • 10 January 2022


The Oracle CDC origin reads data from redo logs using an Oracle tool called LogMiner. Redo logs store every single change made to the database, which is why they grow very large. To avoid consuming massive amounts of disk space, these logs are typically removed from the database once a file contains only data older than the retention window, commonly 24-48 hours.
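To see how far back redo data is still available on a given database, you can query the archived log catalog. This is a minimal sketch using the standard V$ARCHIVED_LOG view (exact column behavior can vary by Oracle version):

```sql
-- Earliest and latest change times still covered by archived redo logs
-- that have not been deleted (STATUS = 'A' means available).
SELECT MIN(FIRST_TIME)    AS earliest_time,
       MAX(NEXT_TIME)     AS latest_time,
       MIN(FIRST_CHANGE#) AS earliest_scn
FROM   V$ARCHIVED_LOG
WHERE  DELETED = 'NO'
AND    STATUS  = 'A';
```

If the pipeline's start date or SCN falls before the earliest time/SCN reported here, LogMiner has nothing left to read for that window.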

One of the most common issues that we have seen is an error that looks like this:

ERROR OracleCDCSource - SQLException while trying to setup record generator thread
java.sql.SQLException: ORA-01292: no log file has been specified for the current LogMiner session
ORA-06512: at "SYS.DBMS_LOGMNR", line 58
ORA-06512: at line 1
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
    ...

What does this mean?

This means that the pipeline is trying to read data from a timeframe for which the redo logs are no longer available. The origin reports that no redo log file contains data from that time and makes no progress (by design, to avoid losing data).
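For context, the origin drives LogMiner roughly like the following PL/SQL sketch. This is an illustration only, not the origin's actual code; the exact options passed depend on the Oracle version and the origin's configuration:

```sql
-- Illustrative only: start a LogMiner session over a time window.
-- ORA-01292 is raised when no redo log file covers the requested window.
BEGIN
  DBMS_LOGMNR.START_LOGMNR(
    STARTTIME => TO_DATE('2022-01-09 00:00:00', 'YYYY-MM-DD HH24:MI:SS'),
    ENDTIME   => TO_DATE('2022-01-09 01:00:00', 'YYYY-MM-DD HH24:MI:SS'),
    OPTIONS   => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG
               + DBMS_LOGMNR.CONTINUOUS_MINE);
END;
/
```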

To figure out why the pipeline entered such a state:

  1. Look at the start date or start SCN. Is it well before the time the pipeline was started? If so, this could be the cause. Remember that resetting offsets moves your read window back to the original start date or SCN specified in the pipeline.
  2. The pipeline might be affected by SDC-9216. This is a rare occurrence: it happens only when the tables receive no data for an extended period and the pipeline is then restarted, because the stored offset points to the last record received, which may by then be outside the retention window. A running pipeline is NOT affected by this bug: the offset is used only to restart the pipeline, not to track reads while it is running.
  3. If the user stops a pipeline and restarts it after the retention window has expired, they can hit this issue, since we try to read from where we left off (regardless of whether the fix for SDC-9216 is present). To fix this, change the start date/SCN to a time when the redo logs are available, reset the origin, and then start the pipeline. Note that any unread data between the stop time and the new start time will be lost. To avoid this, do not stop pipelines for longer than the redo log retention window.
  4. Look at the stage timers in the metrics. If they show the origin, and not the rest of the pipeline, taking the most time, go to (5). Since the origin buffers only about 2 * batch size, a slow stage (or combination of stages) downstream causes lag because the buffer drains very slowly. The usual suspects here are JDBC Lookup, Hive stages, etc. In addition to the stage timers, a good way to identify the culprit is to create an Oracle CDC to Trash pipeline and add stages back one by one. In general, if the use case involves complex processing of the data read from the Oracle CDC origin, it is a good idea to split the pipeline in two: Oracle CDC origin to Kafka (SDC Record data format), and Kafka to the rest of the complex pipeline. This keeps the lag from growing until unread data in the redo logs is deleted.
  5. If the origin takes more than 60% of the pipeline time, it is reading data very slowly. There are several possibilities:
    1. Parsing is taking too long. There is no direct way to confirm this, but high CPU usage on a pipeline that is simply Oracle CDC to Trash is a good indication. In that case, switch to the experimental PEG parser (Oracle CDC Origin -> Advanced tab -> Enable PEG Parser), and increase the number of parsing threads if your transactions are likely to contain a large number of records.
    2. The pipeline is far from the Oracle DB from a network perspective. Try moving the pipeline to the same data center, or even the same rack, as the Oracle node.
    3. JDBC server-side buffering. The Oracle JDBC driver buffers data on the Oracle side before fetching it. The JDBC fetch size is configurable and defaults to 1. For low-volume tables, keeping the fetch size low helps improve latency.
    4. JDBC is also a very chatty protocol, so setting the fetch size very low on a high-volume table, especially a narrow one, makes the JDBC overhead very high. For such tables, increase the JDBC fetch size to a higher value.
    5. Ensure that the timezone of the DB is configured correctly. An incorrect timezone causes us to attempt to read data from the wrong LogMiner windows. Use the following SQL to check the database time and confirm it matches what you expect, given the selected timezone:
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';
SELECT SYSDATE FROM DUAL;
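To cross-check the timezone configuration itself, you can also compare the database and session timezones directly. This sketch uses standard Oracle functions:

```sql
-- Compare database and session timezones against the current timestamps.
SELECT DBTIMEZONE, SESSIONTIMEZONE, SYSTIMESTAMP, CURRENT_TIMESTAMP
FROM   DUAL;
```

If DBTIMEZONE or SESSIONTIMEZONE differs from what the pipeline is configured with, adjust the origin's timezone setting to match.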
