
Clustered Kafka Error - kafka.common.OffsetOutOfRangeException

  • February 14, 2022

AkshayJadhav
StreamSets Employee

ISSUE:

When you configure a clustered Kafka pipeline and the Kafka topic retention is set to a low value (for example, 30 minutes), you may run into kafka.common.OffsetOutOfRangeException errors. This exception occurs when the pipeline requests offsets for data that has already been deleted because it aged past the retention period.

 

SOLUTION:

The kafka.common.OffsetOutOfRangeException error you are seeing occurs because the requested offsets no longer exist: with retention set to 30 minutes, Kafka deletes data very quickly. Typical deployments we've seen in the field keep retention between 24 hours and a full week.
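To confirm what retention a topic actually has, you can inspect its retention.ms setting. The following is a minimal Java sketch using the standard Kafka AdminClient; the broker address and topic name are placeholders for your own cluster and topic:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class CheckRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical broker address; replace with your cluster's bootstrap servers.
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // "my_topic" is a placeholder for the topic the pipeline consumes.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my_topic");
            Config config = admin.describeConfigs(Collections.singleton(topic))
                                 .all().get().get(topic);
            // retention.ms controls how long Kafka keeps data before deleting it:
            // 30 minutes = 1800000; 24 hours = 86400000; a week = 604800000.
            System.out.println("retention.ms = " + config.get("retention.ms").value());
        }
    }
}
```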

In this use case, if a pipeline with the Kafka Consumer origin starts at offset X while retention is set to 30 minutes, and the pipeline then goes down for a couple of hours, offset X will have vanished by the time the pipeline restarts, because retention keeps only 30 minutes' worth of data. Restarting from offset X then raises the OffsetOutOfRangeException.

Due to a limitation in the spark-kafka library we are using, there is no way to issue an OffsetRequest to determine the earliest or latest offset currently available. You can see the related JIRA here:

https://issues.apache.org/jira/browse/SPARK-12693 
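For reference, the modern Kafka consumer API can answer this question directly: beginningOffsets() and endOffsets() return the earliest and latest offsets the broker still holds for a partition. The sketch below illustrates the check that the spark-kafka library in this integration cannot perform; the broker address, topic, and partition are placeholder values:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetRangeCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my_topic", 0); // placeholders
            Map<TopicPartition, Long> earliest =
                    consumer.beginningOffsets(Collections.singleton(tp));
            Map<TopicPartition, Long> latest =
                    consumer.endOffsets(Collections.singleton(tp));
            // A stored offset below beginningOffsets() has been deleted by
            // retention; replaying from it raises OffsetOutOfRangeException.
            System.out.println("earliest available: " + earliest.get(tp));
            System.out.println("latest: " + latest.get(tp));
        }
    }
}
```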
 

WORKAROUND:


1) For now, as a workaround, you could clear the stored offset file and restart the pipeline (a sketch of this appears after this list).

Offset file location in cluster mode: hdfs://<user_home>/.streamsets-spark-streaming/<sdcId>/<topic>/<consumer_group>/pipeline_name/offset.json

2) For a long-term solution, you could increase the retention time, which reduces the probability of hitting the error (see the second sketch below).
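The following is a minimal sketch of workaround 1, deleting the offset file via the Hadoop FileSystem API. The path components (SDC id, topic, consumer group, pipeline name) are hypothetical and must be substituted to match the pattern above; stop the pipeline before deleting and restart it afterwards:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClearOffset {
    public static void main(String[] args) throws Exception {
        // Assumes HADOOP_CONF_DIR is on the classpath so fs.defaultFS
        // resolves to the cluster's namenode.
        Configuration conf = new Configuration();
        // Hypothetical values; substitute your own SDC id, topic, consumer
        // group, and pipeline name following the path pattern above.
        Path offset = new Path("/user/sdc/.streamsets-spark-streaming/"
                + "mySdcId/myTopic/myConsumerGroup/myPipeline/offset.json");

        try (FileSystem fs = FileSystem.get(conf)) {
            // Delete the stale offset file so the restarted pipeline begins
            // from a valid offset instead of one that retention removed.
            boolean deleted = fs.delete(offset, false); // false = non-recursive
            System.out.println(deleted ? "Offset file removed"
                                       : "Offset file not found");
        }
    }
}
```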
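And a sketch of workaround 2, raising retention.ms with the Kafka AdminClient. The 24-hour value and topic name are illustrative; choose a retention that comfortably exceeds your longest expected pipeline downtime:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RaiseRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my_topic");
            // Raise retention from 30 minutes to 24 hours (86400000 ms) so
            // offsets survive pipeline downtime of up to a day.
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", "86400000"),
                    AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> updates =
                    Collections.singletonMap(topic, Collections.singletonList(setRetention));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}
```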
