Offset storage for cluster mode Kafka pipelines.

February 14, 2022

AkshayJadhav
StreamSets Employee

Question:

How do cluster (YARN streaming) mode Kafka pipelines store offsets?

Answer:

In cluster mode, the SDC application itself tracks Kafka offsets. It does not store offsets in Kafka, which is the default for standalone Kafka consumer pipelines.

Offsets are persisted to HDFS at the following location:

/user/<sdc_user>/.streamsets-spark-streaming/<streamsets_id>/<topic_name>/<consumer_group>/<pipeline_name>/offset.json
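As a rough illustration, the path template above can be filled in with a small helper. This is only a sketch: the function name `offset_file_path` and the sample values are hypothetical, and the only thing taken from the answer is the order of the path segments.

```python
def offset_file_path(sdc_user, streamsets_id, topic_name, consumer_group, pipeline_name):
    """Build the HDFS offset file path from the template in the answer above.

    All arguments are plain strings; this helper is illustrative and is not
    part of SDC.
    """
    segments = [
        "/user",
        sdc_user,
        ".streamsets-spark-streaming",
        streamsets_id,
        topic_name,
        consumer_group,
        pipeline_name,
        "offset.json",
    ]
    return "/".join(segments)


# Hypothetical values, for illustration only.
example_path = offset_file_path("sdc", "abc123", "orders", "orders-group", "orders-pipeline")
print(example_path)
```

The resulting path can then be inspected with the standard Hadoop CLI, for example `hdfs dfs -cat <path>`, assuming the user running the command has read access to that directory.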
