How to use delimiter-separated values vs. the S3_SPOOLDIR_01 error message?

January 11, 2022

AkshayJadhav
StreamSets Employee

In the StreamSets Data Collector UI, you can set the Delimiter Character, Escape Character, and Quote Character for the Delimited data format.

The Quote Character, usually " (double quote), is used in delimited files to keep a field value in one field when it contains special or reserved characters such as delimiters, double quotes, or, less commonly, newlines.

For example (see the attached screenshots), suppose I set the following values, which are also the defaults:

Delimiter Character: | 
Escape Character: \ 
Quote Character: "

Example1:

Input: 
Text 1|Text 2 and Text 3|Text 4 

Output (the quotation marks around the field values only indicate that each field is saved as a string):
Field 1: "Text 1"
Field 2: "Text 2 and Text 3"
Field 3: "Text 4"

Example2 (quotation marks used as a quote character):

Input: 
Text 1|"Text2 and Text 3|Text 4" 

Output: 
Field 1: "Text 1" 
Field 2: "Text2 and Text 3|Text 4"
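Example1 and Example2 can be reproduced outside Data Collector with a minimal sketch. It uses Python's standard csv module as a stand-in for the Data Collector parser, so this is an assumption about equivalent behavior, not the actual implementation. The helper name parse_line is hypothetical.

```python
import csv


def parse_line(line, delimiter="|", escapechar="\\", quotechar='"'):
    """Split one delimited line into fields.

    Python's csv module is only a stand-in here; the Data Collector
    parser may differ in details.
    """
    reader = csv.reader(
        [line],
        delimiter=delimiter,
        escapechar=escapechar,
        quotechar=quotechar,
    )
    return next(reader)


# Example1: no quote characters, so every | is a field boundary.
print(parse_line('Text 1|Text 2 and Text 3|Text 4'))

# Example2: the | inside the quoted region is kept as part of Field 2.
print(parse_line('Text 1|"Text2 and Text 3|Text 4"'))
```

In this sketch the first call returns three fields and the second returns two, matching the outputs shown above.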

When delimiter or quote characters appear in the text without being intended as boundaries between fields, the parser can misread them. This is known as delimiter collision, and it produces the error message shown below:

Example3:

Input: 
Text 1 |"Text 2" and Text3|Text 4 

Output: S3_SPOOLDIR_01 - Failed to process object 'example5' at position '0':
com.streamsets.pipeline.stage.origin.s3.BadSpoolObjectException: com.streamsets.pipeline.lib.parser.DataParserException: DATA_PARSER_02 -
Parser error: 'java.io.IOException: (line 1) invalid char between
encapsulated token and delimiter'
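A comparable failure can be reproduced with a short sketch. It assumes Python's csv module in strict mode as a rough analogue of the Data Collector parser. The wording of the Python error differs from DATA_PARSER_02, and the helper name try_parse is hypothetical.

```python
import csv


def try_parse(line):
    """Parse one line strictly and report a parse error instead of raising.

    strict=True makes Python's csv module reject text that follows a
    closing quote character before the next delimiter.
    """
    reader = csv.reader(
        [line],
        delimiter="|",
        escapechar="\\",
        quotechar='"',
        strict=True,
    )
    try:
        return next(reader)
    except csv.Error as exc:
        return f"parse error: {exc}"


# Example3: Field 2 opens with a quote, the quote closes after "Text 2",
# and then more text follows before the next | delimiter.
print(try_parse('Text 1 |"Text 2" and Text3|Text 4'))
```

The parser treats "Text 2" as a complete quoted field and then finds extra characters where it expects a delimiter, which is the same situation the error message describes.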

In this case, there are a few options:

1) Use another Quote Character

2) Delete quotation marks from your data

3) If you need to keep the quotation marks, precede them with the Escape Character (see Example4 and Example5)
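Options 1 and 2 can be sketched as follows, again assuming Python's csv module as a stand-in for the Data Collector parser. The single quote chosen as the alternative Quote Character is only an illustration, and the helper name parse_strict is hypothetical.

```python
import csv


def parse_strict(line, quotechar='"'):
    """Parse one |-delimited line in strict mode with the given quote character."""
    reader = csv.reader(
        [line],
        delimiter="|",
        escapechar="\\",
        quotechar=quotechar,
        strict=True,
    )
    return next(reader)


line = 'Text 1 |"Text 2" and Text3|Text 4'

# Option 1: with a different quote character, " is ordinary text.
print(parse_strict(line, quotechar="'"))

# Option 2: remove the quotation marks from the data before parsing.
print(parse_strict(line.replace('"', '')))
```

Note that Python's csv module keeps the trailing space in the first field ("Text 1 ").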

Example4 (quotation marks used in text and also as a quotation character):

Input: 
Text 1 |"\"Text 2\" and Text3|Text 4" 

Output: 
Field 1: "Text 1" 
Field 2: ""Text 2" and Text3|Text 4"

OR

Example5 (quotation marks used in a text, not as a Quote Character):

Input: 
Text 1 |\"Text 2\" and Text3|Text 4 

Output: 
Field 1: "Text 1" 
Field 2: ""Text 2" and Text3" 
Field 3: "Text 4"
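Example4 and Example5 can be checked with a minimal sketch under the same assumption: Python's csv module stands in for the Data Collector parser and is not the actual implementation. The helper name parse_escaped is hypothetical.

```python
import csv


def parse_escaped(line):
    """Parse one |-delimited line with \\ as escape and " as quote character."""
    reader = csv.reader(
        [line],
        delimiter="|",
        escapechar="\\",
        quotechar='"',
        strict=True,
    )
    return next(reader)


# Example4: the outer quotes group the field, the escaped quotes are literal.
print(parse_escaped(r'Text 1 |"\"Text 2\" and Text3|Text 4"'))

# Example5: no quoting at all, the escaped quotes are literal text.
print(parse_escaped(r'Text 1 |\"Text 2\" and Text3|Text 4'))
```

In this sketch the first call yields two fields and the second yields three, with the quotation marks preserved inside the second field in both cases. Python's csv module also keeps the trailing space in the first field.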
