Question

How to ignore an unexpected line in a CSV file and proceed to the next one?

  • 19 October 2021
  • 3 replies
  • 41 views

Userlevel 2

Hi - while processing a CSV file via the Directory origin, I encountered the following error: "SPOOLDIR_01 - Failed to process file '/tmp/out/customer_daily_incremental/CUSTOMER/CUSTOMER_20092021.csv' at position '177394': com.streamsets.pipeline.lib.dirspooler.BadSpoolFileException: java.io.IOException: (line 1138) invalid char between encapsulated token and delimiter". I now know that the record on line 1138 doesn't comply with the expected format (there is an additional double quote), but I don't want SDC to abort the whole file. Is there a configuration option that lets me log the record with its error message and proceed to the next line in the file?


3 replies

Userlevel 1

@swayam configuring the stage error handling should help in this case - https://docs.streamsets.com/portal/#controlhub/latest/help/datacollector/UserGuide/Pipeline_Design/ErrorHandling.html#concept_atr_j4y_5r

 

Userlevel 2

Hi Sanjeev - I’m not able to get rid of this.

Sample File

ORT_TEL|CMS_CUST_ID|CMS_INIT|CMS_NAME|CMS_MEMBER_NO|CMS_DATE_BIRTH|CMS_EMAIL|CMS_GENDER|MOD_BY|UPD_DATE|CUST_REGISTER_DATE|ORT_ADDR|ORT_UNIT| ORT_STREET

"9999900000"|"68073437"|""|"Witdiertty"|""|""|""|""|"EIOS_WEB_G"|"2021-08-26 19:51"|"2021-08-26"|"238"|""|"Jalan Taat Kampung Melayu" 

"9999912345"|"6246636"|"MR"|"Aidah Othman"|"12665772"|""|"12345@yahoo.com"|""|"EIOS"|"2021-09-23 15:37:59"|"2021-09-23"|"17"|"17"|"Jalan Hiliran" 

"9999955578"|"7946061"|"MR"|"To my beloved mom "Saripah""|"15678232"|""|"test12345@yahoo.com"|""|"EIOS"|"2021-09-20 12:35:02"|"2021-09-20"|"29"|""|"Jalan Balau 9 Taman Rinting"

"9999943251"|"5970743"|""|"shila"|"12092237"|""|"12345@gmail.com"|""|"EIOS"|"2021-09-20 14:47:18"|"2021-09-20"|"17"|"17"|"Jalan Hiliran"

[Image: Sample Pipeline]

[Image: Data Format Settings]

The error is in the 3rd record of the file (the 4th line, counting the header), where the quotes around Saripah are not escaped.

I tried setting the stage error handling to both Discard and Send to Error, but neither made SDC skip the 3rd record and process the rest of the file.

Regards

Swayam
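
If the origin itself cannot be made to skip the bad line, one possible workaround is to pre-validate the file before SDC picks it up. The following is only a minimal Python sketch of that idea, not an SDC feature or setting. It parses each line on its own with the standard csv module in strict mode, keeps the lines that parse, and collects the ones that fail together with the parser's message. The function name split_good_and_bad and the shortened four-column sample are hypothetical, and the sketch assumes that no field contains an embedded line break.

```python
import csv


def split_good_and_bad(lines, delimiter="|", quotechar='"'):
    """Parse each line independently; return (records, rejects).

    records: list of parsed field lists for well-formed lines.
    rejects: list of (line_number, error_message, raw_line) tuples.
    Assumes one record per line (no embedded newlines in fields).
    """
    records, rejects = [], []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()  # drop trailing spaces and the newline
        if not line:
            continue  # ignore blank lines
        try:
            # strict=True makes the parser raise csv.Error on a stray
            # character after a closing quote instead of guessing.
            fields = next(csv.reader([line], delimiter=delimiter,
                                     quotechar=quotechar, strict=True))
        except csv.Error as exc:
            rejects.append((line_no, str(exc), line))
            continue
        records.append(fields)
    return records, rejects


# Shortened version of the sample file above (first four columns only).
sample = [
    'ORT_TEL|CMS_CUST_ID|CMS_INIT|CMS_NAME',
    '"9999900000"|"68073437"|""|"Witdiertty"',
    '"9999912345"|"6246636"|"MR"|"Aidah Othman"',
    '"9999955578"|"7946061"|"MR"|"To my beloved mom "Saripah""',
    '"9999943251"|"5970743"|""|"shila"',
]

records, rejects = split_good_and_bad(sample)
for line_no, message, raw in rejects:
    print(f"line {line_no} rejected: {message}")
```

The well-formed lines could then be written to the directory that the pipeline reads, and the rejected lines to a separate file for manual correction.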

