Post-processing capabilities for Hadoop origin

February 14, 2022

AkshayJadhav
StreamSets Employee

Scenario:

When the Hadoop FS origin processes a large volume of data, some input files may contain bad data and fail to be processed.

Goal:

When a file fails, the pipeline should be able to move that file out of the way and continue processing the rest of the data.

Solution:

The Hadoop FS Standalone origin (available as of Data Collector v3.2.0) provides post-processing options, including the option to specify an error directory for input files that could not be fully processed.
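The behavior described above can be illustrated with a minimal Python sketch. This is not Data Collector code and does not use any StreamSets or Hadoop API: it uses the local filesystem as a stand-in for HDFS, and the names `process_directory` and `parse_integers` are hypothetical helpers introduced only for illustration. It shows the general pattern of moving a file that fails into an error directory while the remaining files continue to be processed.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Tuple


def parse_integers(path: Path) -> None:
    """Hypothetical parser: raises ValueError if any line is not an integer."""
    for line in path.read_text().splitlines():
        int(line)


def process_directory(
    input_dir: Path,
    error_dir: Path,
    parse_file: Callable[[Path], None],
) -> Tuple[List[str], List[str]]:
    """Process each file in input_dir; move files that fail into error_dir.

    Local-filesystem illustration only (assumption: a real pipeline would
    read from HDFS and rely on the origin's own post-processing settings).
    """
    error_dir.mkdir(parents=True, exist_ok=True)
    processed: List[str] = []
    failed: List[str] = []

    # Materialize the listing first so moving files does not disturb iteration.
    files = sorted(p for p in input_dir.iterdir() if p.is_file())
    for path in files:
        try:
            parse_file(path)
            processed.append(path.name)
        except Exception:
            # Bad file: move it aside and keep going with the next file.
            shutil.move(str(path), str(error_dir / path.name))
            failed.append(path.name)

    return processed, failed
```

The function returns the names of the files that were processed and the names of those that were moved, so the files in the error directory can be inspected or reprocessed later without blocking the rest of the run.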
