
Hello, could you please help me out?

In Transformer pipelines, adding a Kafka consumer throws:

java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer$

The same connection works fine with Data Collector.

I am using local Spark in Docker.

Thanks!

@Mike Arov, could you please confirm the Scala version of your Spark cluster and also check the Transformer Scala version? Make sure both match.
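
For reference, here is one way to check both (a sketch; the Docker container name and the exact Transformer directory naming are assumptions, adjust to your setup):

# Spark prints both its own version and the Scala version it was built with.
# If Spark runs inside Docker: docker exec <spark-container> spark-submit --version
spark-submit --version

# Transformer's Scala version typically shows up in the install/tarball name,
# e.g. something ending in _2.12 (the naming pattern is an assumption).
ls /opt | grep streamsets-transformer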


Thank you! It looks like they match:

  • Transformer 5.0.0 Scala 2.12
  • Spark library version: 3.0.3
  • kafka-clients-2.6.0.jar

I am also using local

  • ….


• One thing I did notice was that /opt/streamsets-transformer/streamsets-libs/streamsets-spark-kafka-lib/lib/ contained libs for 3.0.2, while the Spark version was 3.0.3. I manually downloaded the 3.0.3 jars (the swap is sketched after this list), but it did not make a difference :(

  • spark-sql-kafka-0-10_2.12-3.0.3.jar
  • spark-tags_2.12-3.0.3.jar
  • spark-token-provider-kafka-0-10_2.12-3.0.3.jar
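
For anyone repeating this, the swap looked roughly like the following (a sketch; the 3.0.2 filenames being removed are assumed from the bundled lib, and the URLs are the standard Maven Central coordinates):

cd /opt/streamsets-transformer/streamsets-libs/streamsets-spark-kafka-lib/lib/
# Drop the bundled 3.0.2 jars and fetch the 3.0.3 equivalents from Maven Central
sudo rm spark-sql-kafka-0-10_2.12-3.0.2.jar spark-tags_2.12-3.0.2.jar spark-token-provider-kafka-0-10_2.12-3.0.2.jar
sudo wget https://repo1.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.12/3.0.3/spark-sql-kafka-0-10_2.12-3.0.3.jar
sudo wget https://repo1.maven.org/maven2/org/apache/spark/spark-tags_2.12/3.0.3/spark-tags_2.12-3.0.3.jar
sudo wget https://repo1.maven.org/maven2/org/apache/spark/spark-token-provider-kafka-0-10_2.12/3.0.3/spark-token-provider-kafka-0-10_2.12-3.0.3.jar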

Solved it!

I changed to the Scala 2.11 version and it worked out of the box:

• Transformer 5.0.0 Scala 2.11

It appears

• Transformer 5.0.0 Scala 2.12

has a bug ...


Now I was able to get Transformer 5.0.0 Scala 2.12 to work as well!

Turns out https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.11.0/commons-pool2-2.11.0.jar was missing!
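
One way to confirm a gap like this (a sketch; the jars path is my local Spark install from above):

# KafkaDataConsumer pools Kafka consumers via Apache Commons Pool 2, so check
# whether any jar on Spark's classpath actually provides those classes.
cd /opt/streamsets/spark-3.0.3-bin-hadoop3.2/jars/
for j in *.jar; do
  unzip -l "$j" | grep -q 'org/apache/commons/pool2/' && echo "$j"
done
# No output means nothing supplies org.apache.commons.pool2 -- hence the
# NoClassDefFoundError when KafkaDataConsumer$ initializes.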

     

wget https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.11.0/commons-pool2-2.11.0.jar
sudo mv commons-pool2-2.11.0.jar /opt/streamsets/spark-3.0.3-bin-hadoop3.2/jars/
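
A quick sanity check after copying the jar (same path as above; restarting the pipeline afterwards is assumed, so the driver picks up the new classpath):

ls /opt/streamsets/spark-3.0.3-bin-hadoop3.2/jars/ | grep commons-pool2
# Expected output: commons-pool2-2.11.0.jar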

This got the Kafka stage working, but I think this jar needs to be packaged by the Transformer deployment installer. @Rishi, maybe you can fix it in the next release? ;)

