Solved

Kafka consumer in Transformer throws java.lang.NoClassDefFoundError

  • 24 July 2022
  • 5 replies

Hello, could you please help me out?

In Transformer pipelines, adding a Kafka consumer throws:

java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer$

The same connection works fine with Data Collector.

I am using local Spark in Docker.

Thanks!


Best answer by Rishi 25 July 2022, 07:05




@Mike Arov, could you please confirm the Scala version of your Spark cluster and also check the Transformer Scala version? Make sure both match.

Thank you! It looks like they match:

  • Transformer 5.0.0 Scala 2.12
  • Spark library version: 3.0.3
  • kafka-clients-2.6.0.jar

I am also using local[*] ….

One thing I did notice was that /opt/streamsets-transformer/streamsets-libs/streamsets-spark-kafka-lib/lib/ contained libs for 3.0.2, while the Spark version was 3.0.3.

I manually downloaded the 3.0.3 jars listed below, but it did not make a difference :(

  • spark-sql-kafka-0-10_2.12-3.0.3.jar
  • spark-tags_2.12-3.0.3.jar
  • spark-token-provider-kafka-0-10_2.12-3.0.3.jar

Solved it!

I changed to the Scala 2.11 version and it worked out of the box:

  • Transformer 5.0.0 Scala 2.11

 

It appears Transformer 5.0.0 Scala 2.12 has a bug ...

Now I was able to get Transformer 5.0.0 Scala 2.12 to work as well!

It turns out commons-pool2-2.11.0.jar (https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.11.0/commons-pool2-2.11.0.jar) was missing from the Spark jars directory!

 

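# Download the missing commons-pool2 jar from Maven Central
wget https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.11.0/commons-pool2-2.11.0.jar
# Move it into the jars directory of the local Spark install
sudo mv commons-pool2-2.11.0.jar /opt/streamsets/spark-3.0.3-bin-hadoop3.2/jars/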

This got the Kafka stage working, but I think this jar needs to be packaged by the Transformer deployment installer. @Rishi, maybe you can fix it in the next release? ;)
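
For anyone hitting the same error, here is a minimal sketch of how to check whether a commons-pool2 jar is already in the Spark jars directory before downloading anything. The helper name has_jar and the SPARK_JARS variable are hypothetical names of my own; the default directory is the one from the commands above, so adjust it for your install. As far as I can tell, the Spark 3 Kafka source pools its consumers with Apache Commons Pool 2, which would explain why KafkaDataConsumer fails to initialize without it, but treat that as my reading rather than a confirmed diagnosis.

```shell
#!/bin/sh
# has_jar DIR PREFIX (hypothetical helper)
# Succeeds if DIR contains at least one file named PREFIX*.jar.
has_jar() {
  dir="$1"
  prefix="$2"
  for f in "$dir"/"$prefix"*.jar; do
    # If the glob matched nothing it stays literal, so test for existence.
    if [ -e "$f" ]; then
      return 0
    fi
  done
  return 1
}

# Assumption: same Spark location as in the thread; override via SPARK_JARS.
SPARK_JARS="${SPARK_JARS:-/opt/streamsets/spark-3.0.3-bin-hadoop3.2/jars}"

if has_jar "$SPARK_JARS" "commons-pool2-"; then
  echo "commons-pool2 found in $SPARK_JARS"
else
  echo "commons-pool2 missing from $SPARK_JARS"
fi
```

If it reports the jar as missing, the wget and mv commands above should fix it. Restart Transformer afterwards so the new jar is picked up.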
