Sometimes we notice the Kafka consumer fetching more records than actually exist in the source topic. Let us take the following assumptions to understand the issue.
- You have 250 records in a Kafka topic, and no new events are being streamed to this topic. The batch size of the Kafka consumer is also set to 250.
- The Kafka consumer in the pipeline is set up with a consumer group to read from the topic in the first point.
- The destination is an HTTP Client that calls configured API endpoints to consume the relevant Kafka data. The destination could be anything, not just an HTTP Client.
- You observe that the Kafka consumer's record count increases by 250 every few minutes, e.g. 500, 750, 1000, and so on.
Troubleshooting:
The first thing to verify is the "Stage Batch Processing Timer (in seconds)" metric, noting how much time each stage takes to process a batch.
If you see the destination or processor next to the Kafka Consumer origin taking more than 90% of the pipeline's processing time (90% is just an arbitrary threshold; the point is that the affected stage takes significantly longer than the origin), it is likely that the stage cannot keep up with the origin's records. When a batch is not processed within the Kafka consumer's configured timeout, the consumer is considered dead, the group rebalances, and the same uncommitted batch is delivered again, which is why the record count keeps climbing by one batch size. The standalone sketch below reproduces this.
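To make the failure mode concrete, here is a minimal standalone Java consumer (a sketch, not the pipeline itself; the broker address, topic name, group id, and the 90-second sleep are made-up values chosen so that processing deliberately exceeds max.poll.interval.ms):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SlowSinkDemo {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");              // assumed group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        // Match the scenario above: fetch at most 250 records per poll.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "250");
        // Deliberately small: the consumer must call poll() again within this
        // window or the group coordinator evicts it (illustrative value).
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "60000");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic")); // assumed topic
            while (true) {
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(1));
                System.out.println("Fetched " + batch.count() + " records");
                // Simulate a slow destination (e.g. one HTTP call per record)
                // that takes longer than max.poll.interval.ms.
                Thread.sleep(90_000);
            }
        }
    }
}
```

Run against a topic holding 250 records and no new traffic, this loop prints "Fetched 250 records" over and over: each sleep exceeds max.poll.interval.ms, the consumer is evicted from the group before its offsets are committed, and after the rebalance it re-fetches the same batch.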
Solution:
To fix the issue, the configs below can be set according to the maximum processing time of the destination (see the sketch after this list).
1) max.poll.interval.ms: the maximum delay allowed between calls to poll(); it should comfortably exceed the time the slowest stage needs to process a full batch.
2) session.timeout.ms: when setting this, make sure the value is within the range of the broker configs group.min.session.timeout.ms and group.max.session.timeout.ms.
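Assuming your pipeline lets you pass these properties through to the underlying Kafka consumer, the tuned values might look like the following (a sketch; 600000 and 30000 are illustrative numbers that should be sized from your "Stage Batch Processing Timer" readings plus some headroom):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class TimeoutTuning {
    /** Illustrative consumer overrides for a slow destination. */
    static Properties tunedProps() {
        Properties props = new Properties();
        // Allow up to 10 minutes between poll() calls, i.e. up to 10 minutes
        // of downstream processing per batch, before the consumer is evicted
        // from the group (assumed value; size it from your slowest stage).
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000");
        // Heartbeat-based liveness check; must fall between the broker's
        // group.min.session.timeout.ms and group.max.session.timeout.ms.
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
        return props;
    }
}
```

As a rule of thumb, max.poll.interval.ms should comfortably exceed the batch size multiplied by the worst-case per-record processing time of the destination.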
We also recommend going through https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html to learn more about these configurations.