
I'm trying to troubleshoot a connection to Kafka and am getting the following error:


com.streamsets.pipeline.api.StageException: KAFKA_29 - Error fetching data from Kafka: org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition xxx at offset 0. If needed, please seek past the record to continue consumption.

xxx is my topic, which I can connect to without issue using a GUI tool named Kafka Magic.

Hi @pkandra. Based on the error message, my assumption is that your data format is Avro. Please correct me if I am mistaken. Have you specified the method the origin uses for deserializing the message? If the Avro schema ID is included in each message, make sure to set the key and value deserializers to Confluent.
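For reference, the equivalent plain Kafka consumer configuration looks something like the sketch below. The broker address, group ID, and registry URL are placeholders, not values from this thread; the point is that both deserializers must be Confluent's Avro deserializer so the schema ID embedded in each message can be resolved against the Schema Registry.

```java
import java.util.Properties;

public class AvroConsumerConfig {
    /** Builds consumer properties for Confluent Avro deserialization. */
    public static Properties build() {
        Properties props = new Properties();
        // Placeholder addresses -- substitute your own brokers and registry.
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "troubleshooting-group");
        // Confluent's deserializer reads the schema ID prefixed to each
        // message and fetches the matching schema from the registry.
        props.put("key.deserializer",
                  "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("value.deserializer",
                  "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://schema-registry:8081");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("value.deserializer"));
    }
}
```

If a plain consumer (or a tool like Kafka Magic) reads the topic with these settings but the pipeline does not, the mismatch is in the origin's configuration rather than the data itself.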


Thanks @Rishi. Yes, the key and value deserializers in the origin are set to Confluent. I have no issue viewing the data with another tool named Kafka Magic using the exact same settings. I'm also getting the following error now: DATA_FORMAT_06 - Cannot create the parser factory: java.lang.RuntimeException: Could not create DataFactory instance for 'com.streamsets.pipeline.lib.parser.avro.AvroDataParserFactory': com.streamsets.pipeline.lib.util.SchemaRegistryException: com.streamsets.pipeline.lib.util.SchemaRegistryException: java.net.SocketTimeoutException: connect timed out
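A `SocketTimeoutException: connect timed out` from the SchemaRegistryException usually means the Data Collector host cannot open a TCP connection to the Schema Registry at all (firewall, wrong host/port, or a registry URL that resolves differently from the machine where Kafka Magic runs). A quick way to isolate this, as a hedged sketch with a placeholder host and port, is a plain socket connect test from the same host that runs the pipeline:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class RegistryReachability {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // A timeout or refusal here mirrors the DATA_FORMAT_06 symptom:
            // the pipeline host cannot reach the Schema Registry.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder host/port -- use your actual Schema Registry address.
        System.out.println(canConnect("schema-registry", 8081, 3000));
    }
}
```

If this returns false from the Data Collector host but the registry is reachable from your desktop, the problem is network-level rather than a pipeline misconfiguration.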


I actually figured this out. In addition to the streamsets-datacollector-apache-kafka_3_2-lib stage library, I also had the CDP 7.1 library installed, and both support the Kafka Multitopic Consumer as an origin. Once I removed the CDP 7.1 library from my deployment, my pipeline worked as expected. This really wasn't documented anywhere; discovering it was part trial and error and part luck.

