Pipeline design consideration - Drift in cluster streaming mode.

2 years ago
May 23, 2022
0 replies
22 views

AkshayJadhav
StreamSets Employee
101 replies

Question:

We are trying to do a Hive/Impala Data Drift in cluster streaming mode. However, the pipeline stalls (new partitions) and often fails (new columns) when jobs try to simultaneously hit the Hive/Impala Executor.

Should we be streaming all the Data Drift events to a single pipeline and de-dupe them to manage the Hive/Impala changes?

What is the recommended way to proceed here?

Answer:

The general recommendation is not to hit the Hive Metastore from multiple pipelines. The reason behind this approach revolves around the premise that two pipelines could try to create a table or add a column at the same time. Having Hive Metastore target, hence, in a separate pipeline and funneling all the metadata records via Kafka/SDC RPC is the optimal way to design it in such a scenario.

The Hive metastore target is capable of de-duplicating events and manages that with just one query if it is all done via a single instance of the stage. And, this is applicable for cluster pipelines/multiple standalone pipelines. For multi-threaded pipelines, it is not needed because we make use of the shared cache.

Note: If using Kafka stage to process the records from the pipeline to another pipeline with Hive Metastore destination, make sure that you pick SDC Record Data type.

Did this topic help you find an answer to your question?

Be the first to reply!

Reply

Related topics

Subscriptions status stuck in "Pending Binary Approval" on App Store Connecticon

Update accepted & Subscription stuck in "Pending Binary Approval" App Store Connecticon

Amazon: Error fetching offerings - PurchasesError(code=UnknownError, underlyingErrorMessage=Timeout error trying to get Amazon user data., message='Unknown error.')icon

Unable to Fetch Subscription Products During App Store Reviewicon

IOS configuration problem |😿‼️ There is an issue with your configurationicon

Tags

Couldn't find what you're looking for?

Sign up

Social Login

Login to the community

Social Login

Scanning file for viruses.

This file cannot be downloaded