Question

Kafka client does not traverse the entire bootstrap list

  • 20 September 2023

  • StreamSets Employee

Problem statement: 

If you are doing disaster recovery (DR) testing, or one of your Kafka brokers is unavailable, you might see the following error message in the job log:

Error getting metadata for topic.
error: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata

Explanation: 

Suppose the bootstrap list contains three brokers, as in the example below, and the first broker goes down during DR or for some other reason. If the client's request to the first broker times out, the pipeline does not retry against the next available broker; instead it fails with the timeout error shown above. Ideally, the client should try every broker in the list before reporting a failure.

server1:9092,server2:9092,server3:9092

This can happen with Kafka client library version 2.6 or earlier. See KIP-601 for more details.
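To make the failure mode concrete, here is a toy model in Python. It is not the Kafka client's actual node-selection logic. It assumes brokers are tried strictly in list order, that a dead broker consumes the full connect timeout, and that the metadata fetch has a 60-second deadline. The function name `brokers_attempted` and all numeric values are illustrative assumptions; 127 seconds is roughly how long a Linux connect attempt can block with `tcp_syn_retries` at 6.

```python
BOOTSTRAP = ["server1:9092", "server2:9092", "server3:9092"]


def brokers_attempted(bootstrap, down, connect_timeout_s, deadline_s):
    """Toy model: return (brokers tried, whether one was reached) before the deadline.

    Assumes sequential attempts and that each unreachable broker blocks
    for the full connect timeout. Purely illustrative.
    """
    elapsed = 0
    attempted = []
    for broker in bootstrap:
        if elapsed >= deadline_s:
            break  # the metadata deadline expired before this broker could be tried
        attempted.append(broker)
        if broker not in down:
            return attempted, True  # reached a live broker in time
        elapsed += connect_timeout_s  # a dead broker burns the whole connect timeout
    return attempted, False


down = {"server1:9092"}

# OS-controlled connect timeout (about 127 s) exceeds the assumed 60 s deadline:
# the client never gets to server2.
print(brokers_attempted(BOOTSTRAP, down, connect_timeout_s=127, deadline_s=60))

# A shorter, client-controlled connect timeout (assumed 10 s) leaves time to move on.
print(brokers_attempted(BOOTSTRAP, down, connect_timeout_s=10, deadline_s=60))
```

Under these assumptions, the first call gives up after trying only server1, while the second reaches server2 well within the deadline.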

 

Solution: 

  1. Upgrade the Kafka client library to 2.7 or later and tune the socket timeouts accordingly. That version introduced the two configurations below, which let the client control the socket connection setup timeout.

socket.connection.setup.timeout.max.ms
socket.connection.setup.timeout.ms

  2. Decrease the TCP SYN retry value in the file /proc/sys/net/ipv4/tcp_syn_retries to 3 (the default is 6).
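As a rough sketch of what the two options above change: the first part estimates how long a connection attempt to an unreachable host can block on Linux for a given `tcp_syn_retries` value, assuming an initial retransmission timeout of 1 second that doubles after each retry. The second part prints example client properties. The helper name `syn_connect_timeout_s` and the values 5000 and 15000 are assumptions chosen for illustration, not recommendations; tune them for your environment.

```python
def syn_connect_timeout_s(syn_retries, initial_rto_s=1):
    """Approximate worst-case seconds before connect() gives up on a silent host.

    Assumes the initial SYN plus `syn_retries` retransmissions, with the wait
    doubling each time, starting from `initial_rto_s`.
    """
    return sum(initial_rto_s * 2 ** i for i in range(syn_retries + 1))


# Option 2: effect of lowering tcp_syn_retries from the default.
print(syn_connect_timeout_s(6))  # 127
print(syn_connect_timeout_s(3))  # 15

# Option 1: client-side settings available in Kafka client 2.7 and later.
# The values below are illustrative assumptions only.
client_overrides = {
    # Initial time to wait for a socket connection to be established.
    "socket.connection.setup.timeout.ms": 5000,
    # Upper bound the setup timeout can grow to after repeated failures.
    "socket.connection.setup.timeout.max.ms": 15000,
}

for key, value in client_overrides.items():
    print(f"{key}={value}")
```

With the default of 6 retries the estimate is about 127 seconds, which is longer than many client-side request deadlines; with 3 retries it drops to about 15 seconds. The printed `key=value` lines can be added wherever your pipeline accepts additional Kafka client properties.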
