Getting "Failed to process object" when previewing JSON files.

  • February 19, 2022

AkshayJadhav
StreamSets Employee

Issue:

My pipeline takes in JSON files as a whole file and then attempts to parse them in a Jython stage with the help of some Java IO packages.

This works fine for small files, but it's taking hours (and still not finishing) when I try to preview it with larger files (the smallest sample file I could find is ~25 MB).

Also, I'm not reading these in as JSON-formatted files because doing so generates the following error:

Failed to process object 'Input/RawDataTD.json' at position '1048568': com.streamsets.pipeline.stage.origin.s3.BadSpoolObjectException: com.fasterxml.jackson.databind.JsonMappingException: Unexpected end-of-input within/between Array entries at [Source: com.streamsets.pipeline.api.ext.io.OverrunReader@26bd1e84; line: 1, column: 1049155] at [Source: com.streamsets.pipeline.api.ext.io.OverrunReader@26bd1e84; line: 1, column: 1048569] (through reference chain: com.streamsets.datacollector.json.OverrunJsonObjectReaderImpl$EnforcerMap["TimeDomainData"]->com.streamsets.datacollector.json.OverrunJsonObjectReaderImpl$EnforcerList[758]->com.streamsets.datacollector.json.OverrunJsonObjectReaderImpl$EnforcerMap["ChannelSamples"]->com.streamsets.datacollector.json.OverrunJsonObjectReaderImpl$EnforcerList[0])

 

Similarly, if I try to read it in as text, I get:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit.

How can this be solved?

 

Solution:

The error seen during pipeline preview occurs because a hard limit is enforced ONLY for the preview, to prevent too much data from being pushed into the browser. As a result, previewing large files will not work. However, actually running the pipeline should work without issues, provided the following are taken care of:

  1. Update both the JVM memory settings and the `parser.limit` property to accommodate large JSON files.
  2. Make sure that Buffer Limit (KB) is set to an appropriate value, e.g. 4096 (i.e. 4 MB) if your files will not exceed 4 MB, and in any case to a value less than `parser.limit`. The origin will complain if the value you set is equal to `parser.limit`.
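As a sketch of step 1 (the file paths, heap size, and the 50 MB value below are assumptions — adjust them for your installation), the JVM heap is typically raised via `SDC_JAVA_OPTS` in the Data Collector environment script, and the parser limit via `parser.limit` in `sdc.properties`. Note that the position `1048568` in the error above sits just under 1048576 bytes (1 MB), which is consistent with a default 1 MB parser limit being hit:

```
# In $SDC_DIST/libexec/sdc-env.sh (path may differ per install):
# raise the JVM heap so large JSON objects fit in memory
export SDC_JAVA_OPTS="-Xmx8g -Xms8g ${SDC_JAVA_OPTS}"

# In $SDC_CONF/sdc.properties:
# maximum size (in bytes) of a record the parser will read,
# e.g. 50 MB to cover the ~25 MB sample files with headroom
parser.limit=52428800
```

After changing either setting, restart Data Collector for the new values to take effect, and keep the origin's Buffer Limit (KB) below the new `parser.limit`.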

 

Note: There should be no need to use the Jython processor to parse the file manually in this particular use case.