In the StreamSets Data Collector UI, you can set the Delimiter Character, Escape Character, and Quote Character for the Delimited data format.
The Quote Character, often " (double quote), encloses field values that contain special or reserved characters such as commas, double quotes, or, less commonly, newlines, so that the whole value stays in one field.
For example (see the attached screenshots), suppose I set the following values (they are also set by default):
Delimiter Character: |
Escape Character: \
Quote Character: "
Example1:
Input:
Text 1|Text 2 and Text 3|Text 4
Output (quotation marks are around the field values,
because each field is saved as a string):
Field 1: "Text 1"
Field 2: "Text 2 and Text 3"
Field 3: "Text 4"
Example2 (quotation marks used as a quote character):
Input:
Text 1|"Text2 and Text 3|Text 4"
Output:
Field 1: "Text 1"
Field 2: "Text2 and Text 3|Text 4"
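
The two examples above can be reproduced outside Data Collector with a small sketch. It uses Python's standard csv module purely as a stand-in: it is not the parser Data Collector uses, and edge-case behavior may differ. The helper name parse_line is hypothetical.

```python
import csv

def parse_line(line, delimiter="|", quotechar='"', escapechar="\\"):
    """Split one delimited line into fields (hypothetical helper).

    strict=True makes the csv module reject malformed quoting
    instead of silently accepting it.
    """
    reader = csv.reader([line], delimiter=delimiter, quotechar=quotechar,
                        escapechar=escapechar, strict=True)
    return next(reader)

# Example1: no quoting, the delimiter splits the line into three fields.
example1 = parse_line("Text 1|Text 2 and Text 3|Text 4")
print(example1)  # ['Text 1', 'Text 2 and Text 3', 'Text 4']

# Example2: the quoted section keeps the embedded | inside one field.
example2 = parse_line('Text 1|"Text2 and Text 3|Text 4"')
print(example2)  # ['Text 1', 'Text2 and Text 3|Text 4']
```

The quotation marks printed around each value are only Python's string notation, much like the quotation marks shown around the field values in the outputs above.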
When the data contains delimiters (here, quotation marks) that are meant as plain text rather than as boundaries between separate regions, it can cause a delimiter collision. See the error message below:
Example3:
Input:
Text 1 |"Text 2" and Text3|Text 4
Output: S3_SPOOLDIR_01 - Failed to process object 'example5' at position '0':
com.streamsets.pipeline.stage.origin.s3.BadSpoolObjectException: com.streamsets.pipeline.lib.parser.DataParserException: DATA_PARSER_02 -
Parser error: 'java.io.IOException: (line 1) invalid char between
encapsulated token and delimiter'
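
A similar failure can be provoked with Python's standard csv module, used here only as an illustrative stand-in for the Data Collector parser. This is a sketch under the assumption that strict mode is enabled; without strict=True the csv module accepts this input leniently. The helper name try_parse is hypothetical, and the error text differs from the Data Collector message.

```python
import csv

def try_parse(line):
    """Return (fields, None) on success or (None, error text) on failure."""
    reader = csv.reader([line], delimiter="|", quotechar='"',
                        escapechar="\\", strict=True)
    try:
        return next(reader), None
    except csv.Error as exc:
        return None, str(exc)

# Example3: the quoted section closes after "Text 2", but more text follows
# before the next delimiter, so the strict parser rejects the line.
fields, error = try_parse('Text 1 |"Text 2" and Text3|Text 4')
print(fields)  # None
print(error)   # parser-specific message about the character after the quote
```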
In this case, there are a few options:
1) Use another Quote Character
2) Delete quotation marks from your data
3) If you need to keep the quotation marks, escape them with the Escape Character (see Example4 and Example5)
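
Options 1 and 2 can be sketched as follows, again with Python's csv module as a stand-in rather than the Data Collector parser. The single quote chosen as the alternative Quote Character is an assumption for illustration; any character that does not occur in the data would do. The helper name parse_line is hypothetical.

```python
import csv

def parse_line(line, delimiter="|", quotechar='"', escapechar="\\"):
    """Split one delimited line into fields (hypothetical helper)."""
    reader = csv.reader([line], delimiter=delimiter, quotechar=quotechar,
                        escapechar=escapechar, strict=True)
    return next(reader)

line = 'Text 1 |"Text 2" and Text3|Text 4'

# Option 1: pick a Quote Character that does not occur in the data,
# so the double quotes are treated as ordinary text.
option1 = parse_line(line, quotechar="'")
print(option1)  # ['Text 1 ', '"Text 2" and Text3', 'Text 4']

# Option 2: remove the quotation marks from the data before parsing.
option2 = parse_line(line.replace('"', ""))
print(option2)  # ['Text 1 ', 'Text 2 and Text3', 'Text 4']
```

Note that this stand-in keeps the space before the first delimiter as part of the first field.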
Example4 (quotation marks used in the text and also as the Quote Character):
Input:
Text 1 |"\"Text 2\" and Text3|Text 4"
Output:
Field 1: "Text 1"
Field 2: ""Text 2" and Text3|Text 4"
OR
Example5 (quotation marks used in the text, not as the Quote Character):
Input:
Text 1 |\"Text 2\" and Text3|Text 4
Output:
Field 1: "Text 1"
Field 2: ""Text 2" and Text3"
Field 3: "Text 4"
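
The effect of the Escape Character in Example4 and Example5 can be sketched with Python's csv module, which is only a stand-in here and not the parser Data Collector uses. The helper name parse_line is hypothetical, and raw strings are used so that each backslash reaches the parser unchanged.

```python
import csv

def parse_line(line, delimiter="|", quotechar='"', escapechar="\\"):
    """Split one delimited line into fields (hypothetical helper)."""
    reader = csv.reader([line], delimiter=delimiter, quotechar=quotechar,
                        escapechar=escapechar, strict=True)
    return next(reader)

# Example4: escaped quotes inside a quoted field are kept as text,
# and the | inside the quoted field does not split it.
example4 = parse_line(r'Text 1 |"\"Text 2\" and Text3|Text 4"')
print(example4)  # ['Text 1 ', '"Text 2" and Text3|Text 4']

# Example5: escaped quotes in an unquoted field are kept as text,
# and the | still acts as a delimiter.
example5 = parse_line(r'Text 1 |\"Text 2\" and Text3|Text 4')
print(example5)  # ['Text 1 ', '"Text 2" and Text3', 'Text 4']
```

In this stand-in the space before the first delimiter remains part of the first field.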