Getting "Failed to process object" when previewing JSON files.

  • February 19, 2022

AkshayJadhav
StreamSets Employee

Issue:

My pipeline takes in JSON files as a whole file and then attempts to parse them in a Jython stage with the help of some Java IO packages.

This works fine for small files, but it's taking hours (and still not finishing) when I try to preview it with larger files (the smallest sample file I could find is ~25 MB).

Also, I'm not reading these in as JSON-formatted files because doing so generates the following error:

Failed to process object 'Input/RawDataTD.json' at position '1048568': com.streamsets.pipeline.stage.origin.s3.BadSpoolObjectException: com.fasterxml.jackson.databind.JsonMappingException: Unexpected end-of-input within/between Array entries at [Source: com.streamsets.pipeline.api.ext.io.OverrunReader@26bd1e84; line: 1, column: 1049155] at [Source: com.streamsets.pipeline.api.ext.io.OverrunReader@26bd1e84; line: 1, column: 1048569] (through reference chain: com.streamsets.datacollector.json.OverrunJsonObjectReaderImpl$EnforcerMap["TimeDomainData"]->com.streamsets.datacollector.json.OverrunJsonObjectReaderImpl$EnforcerList[758]->com.streamsets.datacollector.json.OverrunJsonObjectReaderImpl$EnforcerMap["ChannelSamples"]->com.streamsets.datacollector.json.OverrunJsonObjectReaderImpl$EnforcerList[0])

 

Similarly, if I try to read it in as text, I get:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit.

How can this be solved?

 

Solution:

The error seen during pipeline preview occurs because a hard limit is enforced ONLY for the preview, to prevent too much data from being pushed into the browser. As a result, previewing large files will not work. However, actually running the pipeline should work without issues, provided the following are taken care of:

  1. Update both the JVM memory settings and the `parser.limit` property to accommodate large JSON files.
  2. Make sure that Buffer Limit (KB) is set to an appropriate value, e.g. 4096 (i.e. 4 MB) if your files will not exceed 4 MB, and in any case to a value less than `parser.limit`. The origin will complain if the value you set is equal to `parser.limit`.
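As a sketch of step 1 (the file paths, heap size, and the 50 MB value below are assumptions — adjust them for your installation), the JVM heap is typically raised via `SDC_JAVA_OPTS` in the Data Collector environment script, and the parser limit via `parser.limit` in `sdc.properties`. Note that the position `1048568` in the error above sits just under 1048576 bytes (1 MB), which is consistent with a default 1 MB parser limit being hit:

```
# In $SDC_DIST/libexec/sdc-env.sh (path may differ per install):
# raise the JVM heap so large JSON objects fit in memory
export SDC_JAVA_OPTS="-Xmx8g -Xms8g ${SDC_JAVA_OPTS}"

# In $SDC_CONF/sdc.properties:
# maximum size (in bytes) of a record the parser will read,
# e.g. 50 MB to cover the ~25 MB sample files with headroom
parser.limit=52428800
```

After changing either setting, restart Data Collector for the new values to take effect, and keep the origin's Buffer Limit (KB) below the new `parser.limit`.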

 

Note: There should be no need to use the Jython processor to parse the file manually in this particular use case.