I’m used to configuring the Data Collector Kafka origin with a specific consumer group. I need this to control Kafka offsets and the Kafka broker requires this.
In Transformer, I don’t see any way to define the consumer group. How is this done?
If you need to consume data from Kafka and perform real-time stream processing, you should use StreamSets Data Collector and take advantage of its Kafka Consumer origin. If you require more complex data transformations at scale, you can use StreamSets Transformer for batch processing with Apache Spark.
In Transformer, data is read from Kafka by topic, so no consumer details need to be configured.
Thanks for that. Yes. We’re a combined SDC and Transformer implementation already. I just had a first look at the Transformer Kafka origin having used the SDC Kafka multitopic origin extensively. If it doesn’t have a consumer group setting it isn’t usable in any scenario I can foresee. We have access controls on consumer groups so you can’t just be an arbitrary consumer.
https://issues.apache.org/jira/browse/SPARK-26350
With Spark v3.0+ you should be able to specify the consumer group via additional properties (kafka.group.id).
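For reference, here is a rough sketch of what that property looks like at the Spark level. It assumes Spark 3.0 or later with the Kafka source package on the classpath. The helper names, broker address, topic, and group name are made up for illustration, and how Transformer passes additional properties through to Spark is something to confirm in your own setup.

```python
def kafka_source_options(bootstrap_servers, topic, group_id):
    """Build the option map for Spark's Kafka source.

    "kafka.group.id" is the Spark 3.0+ option added by SPARK-26350.
    All argument values are supplied by the caller.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "kafka.group.id": group_id,
    }


def read_kafka_batch(spark, options):
    """Apply the options to a batch read; `spark` is an existing SparkSession."""
    reader = spark.read.format("kafka")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()


# Hypothetical values, for illustration only.
options = kafka_source_options("broker1:9092", "orders", "approved-consumer-group")
# df = read_kafka_batch(spark, options)  # needs a live SparkSession and a reachable broker
```

In Transformer, the equivalent would be adding kafka.group.id with your authorized group name as an additional property on the Kafka origin. The Spark documentation advises using this option with caution, since queries that share a group ID can interfere with each other.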