Scenario:
Data records use a sequence of several characters (for example, |~|) as a single delimiter. Such records are difficult to parse directly by specifying the multi-character delimiter in the stage configuration.
Goal:
Parse records that use a multi-character delimiter with SDC stages, without having to cleanse the data before it flows through the pipeline.
Solution:
The following sample pipeline should help to achieve that end.
Directory Origin => Expression Evaluator => Data Parser => Local FS
In the Expression Evaluator stage, apply the following Field Expression:
${str:replaceAll(record:value('/text'), "\\|\\~\\|", "^")}
This approach assumes that there is a character that never appears in the incoming data; that character is used as the alternate separator set in the Expression Evaluator. Here, ^ is used for the purpose of the example.
In short, the pipeline reads from a Directory origin using the Text data format, the Expression Evaluator replaces every occurrence of |~| with ^, and the Data Parser then parses the resulting text as delimited data with ^ as the delimiter.
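The replace-then-parse logic can be tried outside SDC to check it against sample data. The following is a minimal Python sketch of the same idea, not SDC code: the function name parse_line and the sample record are hypothetical, and the regular expression is the one from the Field Expression, written with single backslashes as a Python raw string.

```python
import csv
import re

# Same regular expression as in the Field Expression; it matches the literal |~|.
MULTI_DELIMITER_PATTERN = r"\|\~\|"
# Assumption: this character never occurs in the incoming data.
REPLACEMENT = "^"


def parse_line(line):
    """Replace every |~| with ^, then parse the line as ^-delimited text."""
    normalized = re.sub(MULTI_DELIMITER_PATTERN, REPLACEMENT, line)
    # csv.reader stands in for the Data Parser stage in this sketch.
    return next(csv.reader([normalized], delimiter=REPLACEMENT))


# Hypothetical sample record, for illustration only.
print(parse_line("alice|~|42|~|New York"))  # ['alice', '42', 'New York']
```

A single | inside a value is left untouched, because only the full |~| sequence matches the pattern.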
To reduce the chance that the replacement character also appears in the data, one could use a rare Unicode character such as \u2603 as the replacement for the multi-character delimiter, i.e., use \u2603 instead of ^.
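Even a rare character is only safe if it is truly absent from the data, so it can be worth verifying that assumption on a sample before relying on it. The sketch below is plain Python rather than SDC configuration; the function name split_multi_delimited and the explicit guard are assumptions added for illustration, not part of the pipeline described above.

```python
# The rare Unicode character suggested above (U+2603).
RARE_SEPARATOR = "\u2603"


def split_multi_delimited(line, multi_delimiter="|~|", replacement=RARE_SEPARATOR):
    """Swap the multi-character delimiter for a single rare character, then split."""
    # Guard (an addition for this sketch): fail loudly if the replacement
    # character already occurs in the data, since the split would be ambiguous.
    if replacement in line:
        raise ValueError("replacement character already present in the data")
    return line.replace(multi_delimiter, replacement).split(replacement)


# A caret in the data is harmless here, because ^ is no longer the separator.
print(split_multi_delimited("a^b|~|c"))  # ['a^b', 'c']
```

Running such a check over a representative sample of input files gives some evidence, though no guarantee, that the chosen replacement character is safe.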