
Parser Overrun Errors or Max record length (chars) property doesn't take effect.

  • February 19, 2022

AkshayJadhav
StreamSets Employee

When parsing XML, JSON, or CSV files, you may get an exception such as:

Directory Origin (reading XML files):

"SPOOLDIR_01 - Failed to process file '/tmp/out/dir27/file31.xml' at position '0': com.streamsets.pipeline.stage.origin.spooldir.BadSpoolFileException: com.streamsets.pipeline.lib.parser.DataParserException: XML_PARSER_02 - XML object exceeded maximum length: readerId 'file13.xml', offset '0', maximum length '2147483647'"

or S3 Origin reading JSON:

com.streamsets.pipeline.stage.origin.s3.BadSpoolObjectException: com.streamsets.pipeline.api.ext.io.OverrunException: Reader exceeded the read limit '1048576'

In both cases, regardless of the value set in the "Max Object Length (chars)" field on the Data Format tab, Data Collector's internal parsing buffer defaults to 1 MB (1048576). Trying to parse a file larger than the internal buffer, rather than larger than the size specified in the UI, results in the error.
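The interaction between the two settings can be pictured with a small Python sketch. It is only a simplified model, assuming the effective ceiling is the smaller of the UI value and the internal buffer size; the function name `effective_limit` is hypothetical and is not part of Data Collector, and the difference between characters and bytes is ignored here.

```python
# Simplified model (an assumption, not Data Collector's actual code):
# the parser can never read more than the internal buffer allows,
# no matter how large the UI value is.

DEFAULT_PARSER_LIMIT = 1048576  # 1 MB, the read limit quoted in the S3 error


def effective_limit(ui_max_length: int,
                    parser_limit: int = DEFAULT_PARSER_LIMIT) -> int:
    """Return the limit that applies in practice under this model."""
    return min(ui_max_length, parser_limit)


# UI set to the maximum, internal buffer left at its default
print(effective_limit(2147483647))

# UI set to the maximum, internal buffer raised to 10 MB
print(effective_limit(2147483647, 10485760))
```

Under this model, raising only the UI value changes nothing until the internal buffer is raised as well.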

To provide a 10 MB buffer for XML, JSON, and CSV parsing, update the parser.limit parameter in the sdc.properties file, e.g.:

parser.limit=10485760

 

Then restart Data Collector.
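If you would rather script the change than edit the file by hand, something like the Python sketch below may help. The helper names `mb_to_limit` and `set_parser_limit` are hypothetical; the code works on the properties text you pass in and makes no assumption about where sdc.properties lives in your installation.

```python
import re


def mb_to_limit(megabytes: int) -> int:
    """Convert a size in MB to the plain number used by parser.limit."""
    return megabytes * 1024 * 1024


def set_parser_limit(properties_text: str, megabytes: int) -> str:
    """Return the properties text with parser.limit set to the given size.

    An existing uncommented parser.limit line is replaced; otherwise a new
    line is appended at the end.
    """
    new_line = f"parser.limit={mb_to_limit(megabytes)}"
    # Match only an active (uncommented) parser.limit entry on its own line.
    pattern = re.compile(r"^[ \t]*parser\.limit[ \t]*=.*$", re.MULTILINE)
    if pattern.search(properties_text):
        return pattern.sub(new_line, properties_text, count=1)
    # Make sure the appended entry starts on a fresh line.
    needs_newline = properties_text and not properties_text.endswith("\n")
    separator = "\n" if needs_newline else ""
    return properties_text + separator + new_line + "\n"


sample = "a=1\nparser.limit=1048576\nb=2\n"
print(set_parser_limit(sample, 10))
```

Write the returned text back to your sdc.properties file and restart Data Collector for the new limit to take effect.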

You can also check the following article: OverRunLimit Exception

 
