Question

How to ignore the unexpected line in CSV file and proceed to next?

  • October 19, 2021
  • 4 replies
  • 950 views

swayam
Discovered Fame
  • 20 replies

Hi - while processing a CSV file via the Directory origin, I encountered the following error: "SPOOLDIR_01 - Failed to process file '/tmp/out/customer_daily_incremental/CUSTOMER/CUSTOMER_20092021.csv' at position '177394': com.streamsets.pipeline.lib.dirspooler.BadSpoolFileException: java.io.IOException: (line 1138) invalid char between encapsulated token and delimiter". I now know that the record on line 1138 doesn't comply with the expected format (there is an additional double quote), but I don't want SDC to abort the whole file. Is there a configuration option that lets me log the record with its error message and proceed to the next line in the file?
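
To illustrate the behaviour being asked for (this is not an SDC setting, just a standalone sketch), the Python snippet below parses a delimited file one line at a time, records any line the parser rejects together with its line number and error message, and carries on with the next line. The name `parse_lenient` is hypothetical, and the sketch assumes that no field contains an embedded line break, so each physical line is one record.

```python
import csv


def parse_lenient(lines, delimiter="|", quotechar='"'):
    """Parse each line on its own; collect bad lines instead of aborting."""
    good, bad = [], []
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()  # drop the line ending and stray outer whitespace
        if not text:
            continue  # skip blank lines
        try:
            # strict=True makes the csv module raise csv.Error on a stray quote
            reader = csv.reader([text], delimiter=delimiter,
                                quotechar=quotechar, strict=True)
            good.append(next(reader))
        except csv.Error as exc:
            # keep the line number, the parser message and the raw text
            bad.append((line_no, str(exc), text))
    return good, bad
```

Calling `parse_lenient(open(path, encoding="utf-8"))` would return the parsed rows plus a list of `(line_number, message, raw_line)` tuples for the rejected lines, which could then be written to an error file.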

4 replies

Sanjeev
StreamSets Employee
  • 53 replies
  • October 19, 2021

swayam
Discovered Fame
  • Author
  • 20 replies
  • October 20, 2021

Hi Sanjeev - I’m not able to get rid of this.

Sample File

ORT_TEL|CMS_CUST_ID|CMS_INIT|CMS_NAME|CMS_MEMBER_NO|CMS_DATE_BIRTH|CMS_EMAIL|CMS_GENDER|MOD_BY|UPD_DATE|CUST_REGISTER_DATE|ORT_ADDR|ORT_UNIT| ORT_STREET

"9999900000"|"68073437"|""|"Witdiertty"|""|""|""|""|"EIOS_WEB_G"|"2021-08-26 19:51"|"2021-08-26"|"238"|""|"Jalan Taat Kampung Melayu" 

"9999912345"|"6246636"|"MR"|"Aidah Othman"|"12665772"|""|"12345@yahoo.com"|""|"EIOS"|"2021-09-23 15:37:59"|"2021-09-23"|"17"|"17"|"Jalan Hiliran" 

"9999955578"|"7946061"|"MR"|"To my beloved mom "Saripah""|"15678232"|""|"test12345@yahoo.com"|""|"EIOS"|"2021-09-20 12:35:02"|"2021-09-20"|"29"|""|"Jalan Balau 9 Taman Rinting"

"9999943251"|"5970743"|""|"shila"|"12092237"|""|"12345@gmail.com"|""|"EIOS"|"2021-09-20 14:47:18"|"2021-09-20"|"17"|"17"|"Jalan Hiliran"

Sample Pipeline (screenshot not available)

Data Format Settings (screenshot not available)

The error is in the 3rd record of the file, where the quotes around Saripah are not escaped.

I tried setting the stage error handling to Discard and to Send to Error, but neither setting made the pipeline skip the 3rd record and process the rest of the file.
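
One possible workaround outside SDC is to repair or pre-split such lines before the pipeline reads them. The sketch below is only an illustration under two assumptions: every field in a data line is wrapped in double quotes (as in the sample above), and no field value contains the three-character sequence `"|"`. Under those assumptions the field boundaries can be found without interpreting the inner quotes at all. `split_fully_quoted` is a hypothetical helper name, not part of any StreamSets API.

```python
def split_fully_quoted(line, delimiter="|"):
    """Split a line whose fields are all double-quoted, keeping inner quotes."""
    text = line.strip()  # the sample lines carry trailing spaces
    if len(text) < 2 or not (text.startswith('"') and text.endswith('"')):
        raise ValueError("line is not fully quoted: %r" % line)
    # Remove the outermost quotes, then split on the closing-quote,
    # delimiter, opening-quote boundary between adjacent fields.
    return text[1:-1].split('"' + delimiter + '"')
```

Applied to the 3rd sample record, this keeps `To my beloved mom "Saripah"` as a single field instead of failing on the unescaped quotes. The header line is not quoted, so it would need to be split on the plain delimiter instead.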

Regards

Swayam


Drew Kreiger
Rock star
  • Senior Community Builder at StreamSets
  • 95 replies
  • February 7, 2022
