How to handle records with multi-character delimiters?

December 20, 2021

AkshayJadhav
StreamSets Employee

Scenario:

Each data record uses a sequence of several characters (for example, |~|) as a single delimiter. This makes it difficult to parse those records directly by specifying the multi-character delimiter in the stage configuration.

Goal:

To parse records with multi-character delimiters using SDC (StreamSets Data Collector) stages, without having to cleanse the data before it flows through the pipeline.

Solution:

The following sample pipeline should help achieve that:

Directory Origin => Expression Evaluator => Data Parser => Local FS

In the Expression Evaluator stage, apply the following Field Expression:

${str:replaceAll(record:value('/text'), "\\|\\~\\|", "^")}
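To check what the expression does outside SDC, here is a rough Python equivalent. It assumes str:replaceAll performs a regular-expression replace-all (the escaped pattern suggests it does): the EL string "\\|\\~\\|" unescapes to the regex \|\~\|, which matches the literal sequence |~|. The function name and sample line are hypothetical.

```python
import re

# Regex equivalent of the EL pattern "\\|\\~\\|". The pipes must be escaped
# because | means alternation in a regular expression; escaping ~ is optional.
MULTI_DELIMITER = re.compile(r"\|~\|")

def replace_multi_delimiter(text: str, replacement: str = "^") -> str:
    """Replace every occurrence of |~| with a single-character delimiter."""
    return MULTI_DELIMITER.sub(replacement, text)

# Hypothetical line, as the Text data format would place it in /text.
line = "1001|~|Alice|~|Engineering"
print(replace_multi_delimiter(line))  # prints 1001^Alice^Engineering
```

Single | or ~ characters that are not part of the full |~| sequence are left untouched.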

This approach assumes there is a character that never appears in the incoming data; that character is used as the replacement separator in the Expression Evaluator. Here, ^ is used for the purpose of the example.
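Because everything depends on the replacement character being absent from the data, it may be worth verifying that assumption on a sample of the input before picking one. This is a standalone Python sketch, not an SDC feature; the function name and sample records are hypothetical.

```python
from typing import Iterable

def sentinel_is_safe(lines: Iterable[str], sentinel: str = "^") -> bool:
    """Return True if the candidate replacement delimiter occurs in none of the lines."""
    return all(sentinel not in line for line in lines)

sample = ["1001|~|Alice|~|Engineering", "1002|~|Bob^Jr|~|Sales"]
print(sentinel_is_safe(sample))            # prints False ("Bob^Jr" already contains ^)
print(sentinel_is_safe(sample, "\u2603"))  # prints True
```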

In short, the Directory origin reads the files using the Text data format, the Expression Evaluator replaces every occurrence of |~| with ^, the Data Parser parses the result as delimited data using ^ as the delimiter, and Local FS writes out the parsed records.
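The flow can be approximated in plain Python to see the data at each step. This is a sketch, not SDC code: splitting the input into lines stands in for the Directory origin's Text format, a regex replace stands in for the Expression Evaluator, and Python's csv module stands in for the Data Parser. The function name is hypothetical. It also shows why the replacement step is needed, since csv.reader accepts only a one-character delimiter.

```python
import csv
import re
from typing import List

MULTI_DELIMITER = re.compile(r"\|~\|")

def parse_multi_delimited(text: str, sentinel: str = "^") -> List[List[str]]:
    """Rewrite |~| to a single-character sentinel, then split each line on it."""
    records = []
    for line in text.splitlines():  # one record per line, like the Text format's /text
        if not line:
            continue  # skip blank lines
        replaced = MULTI_DELIMITER.sub(sentinel, line)  # Expression Evaluator step
        fields = next(csv.reader([replaced], delimiter=sentinel))  # Data Parser step
        records.append(fields)
    return records

data = "1001|~|Alice|~|Engineering\n1002|~|Bob|~|Sales\n"
for row in parse_multi_delimited(data):
    print(row)
# ['1001', 'Alice', 'Engineering']
# ['1002', 'Bob', 'Sales']
```

Note that csv.reader also applies its default quoting rules ("), which may or may not match how the Data Parser is configured.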

To reduce the chance of picking a character that already appears in the data, you could use a rare Unicode character such as \u2603 (the snowman, ☃) as the replacement for the multi-character delimiter, that is, use \u2603 instead of ^.
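A short Python sketch of why a rare character helps: when a field value itself contains ^, replacing |~| with ^ produces an extra split, while \u2603 keeps the field intact. The helper name and the record are hypothetical, and csv.reader stands in for the Data Parser.

```python
import csv
import re
from typing import List

MULTI_DELIMITER = re.compile(r"\|~\|")

def parse_line(line: str, sentinel: str) -> List[str]:
    """Replace |~| with the sentinel, then split the line on the sentinel."""
    return next(csv.reader([MULTI_DELIMITER.sub(sentinel, line)], delimiter=sentinel))

line = "1002|~|Bob^Jr|~|Sales"     # hypothetical record whose data contains ^
print(parse_line(line, "^"))       # prints ['1002', 'Bob', 'Jr', 'Sales'] (4 fields, wrong)
print(parse_line(line, "\u2603"))  # prints ['1002', 'Bob^Jr', 'Sales'] (3 fields, correct)
```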


