Problem:
The performance of the Kafka stages decreases, and the following exception appears in the logs:
ERROR Encountered error in multi kafka thread 0 during read org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
Solution:
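These errors are usually caused by a consumer group rebalance on the Kafka side. When the processing time between poll() calls is long (for example, when the pipeline processes a large volume of data), it can exceed session.timeout.ms (or max.poll.interval.ms, as named in the exception, on newer Kafka clients). The consumer is then treated as failed, its partitions are assigned to another member of the group, and the next offset commit fails.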
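As a rough illustration of this reasoning, the following Python sketch estimates how long one batch takes to process and compares it with the consumer timeout. The function names and the numbers are hypothetical and only illustrate the arithmetic; the real processing time per record has to be measured in your own pipeline.

```python
def batch_processing_ms(records_per_poll: int, ms_per_record: float) -> float:
    """Estimated time spent processing one batch between two poll() calls."""
    return records_per_poll * ms_per_record


def exceeds_timeout(records_per_poll: int, ms_per_record: float, timeout_ms: int) -> bool:
    """True if the estimated gap between poll() calls is longer than the timeout."""
    return batch_processing_ms(records_per_poll, ms_per_record) > timeout_ms


def max_safe_records(ms_per_record: float, timeout_ms: int, headroom: float = 0.5) -> int:
    """Largest batch size that uses at most `headroom` of the timeout.

    The 0.5 default is an assumed rule of thumb, not a Kafka recommendation.
    """
    return int(timeout_ms * headroom / ms_per_record)


# Illustrative numbers only: 500 records per poll, 80 ms of processing each,
# checked against a 30 000 ms timeout.
print(exceeds_timeout(500, 80.0, 30_000))  # 40 000 ms of work vs. a 30 000 ms timeout
print(max_safe_records(80.0, 30_000))      # batch size that stays within half the timeout
```

If the estimate exceeds the timeout, either the timeout has to grow or the batch size has to shrink, which are the two options the exception message itself suggests.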
To mitigate the problem, increase session.timeout.ms and request.timeout.ms on the consumer. You may also need to increase group.max.session.timeout.ms on the brokers, because it sets the upper limit for the session timeout a consumer can use. Alternatively, as the exception message suggests, reduce the size of the batches returned by poll() with max.poll.records. The right values depend on your environment configuration; in most cases, increasing session.timeout.ms is enough to fix the problem. For more information about these parameters, see the Kafka documentation.
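The relationships between these settings can be checked before a change is rolled out. The sketch below is a hypothetical helper, not part of Kafka or Data Collector. It assumes that the broker only accepts a session.timeout.ms within the range defined by group.min.session.timeout.ms and group.max.session.timeout.ms, and that request.timeout.ms should be larger than session.timeout.ms (a requirement on older consumer clients). The values used are illustrative, not recommendations.

```python
def check_timeouts(consumer: dict, broker: dict) -> list:
    """Return a list of problems found; an empty list means the values are consistent."""
    problems = []
    session = consumer["session.timeout.ms"]
    request = consumer["request.timeout.ms"]

    # Assumption: the broker rejects session timeouts outside its configured range.
    if session > broker["group.max.session.timeout.ms"]:
        problems.append("session.timeout.ms is above group.max.session.timeout.ms")
    if session < broker["group.min.session.timeout.ms"]:
        problems.append("session.timeout.ms is below group.min.session.timeout.ms")

    # Assumption: applies to older consumer clients.
    if request <= session:
        problems.append("request.timeout.ms should be larger than session.timeout.ms")

    return problems


# Illustrative values only.
broker_settings = {
    "group.min.session.timeout.ms": 6_000,
    "group.max.session.timeout.ms": 300_000,
}
consumer_settings = {
    "session.timeout.ms": 120_000,
    "request.timeout.ms": 130_000,
}

for problem in check_timeouts(consumer_settings, broker_settings):
    print(problem)
```

With the illustrative values above the helper reports nothing, because the session timeout lies inside the broker range and the request timeout is larger than the session timeout.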
To change a configuration property such as session.timeout.ms, set it in the Kafka Consumer or Kafka Multitopic Consumer origin: SDC UI > Stage configuration > Kafka tab > Kafka Configuration.