When parsing XML, JSON, or CSV files, you may see an exception such as:
Directory Origin (reading XML files):
"SPOOLDIR_01 - Failed to process file '/tmp/out/dir27/file31.xml' at position '0': com.streamsets.pipeline.stage.origin.spooldir.BadSpoolFileException: com.streamsets.pipeline.lib.parser.DataParserException: XML_PARSER_02 - XML object exceeded maximum length: readerId 'file13.xml', offset '0', maximum length '2147483647'"
or S3 Origin reading JSON:
com.streamsets.pipeline.stage.origin.s3.BadSpoolObjectException: com.streamsets.pipeline.api.ext.io.OverrunException: Reader exceeded the read limit '1048576'
In both cases, regardless of the value set in the Data Format tab's "Max Object Length (chars)" field, Data Collector's internal parsing buffer defaults to 1 MB. Attempting to parse a file larger than that internal buffer - not the size specified in the UI - results in the error.
To provide a 10 MB buffer for XML, JSON, and CSV parsing, update the parser.limit parameter in the sdc.properties file, e.g.:
parser.limit=10485760
Then restart Data Collector.
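For reference, a minimal sdc.properties excerpt is shown below. The exact file location depends on your installation type (for example, the directory pointed to by SDC_CONF for package installs, or the etc/ directory of a tarball install); the value of 10485760 is simply 10 * 1024 * 1024 characters:

# Raise the internal parser buffer used for XML, JSON, and CSV parsing.
# 10 MB expressed in characters: 10 * 1024 * 1024 = 10485760
parser.limit=10485760

Keep in mind that a larger buffer can increase memory use while parsing, so size it to the largest record you realistically expect rather than an arbitrarily large value.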
For more information, see the related article: OverRunLimit Exception