Solved

ArrayIndexOutOfBoundsException when renaming array elements

  • 8 September 2022
  • 5 replies

I’m building a data collector pipeline in which I want to rename elements within an array.
The input consists of JSON files read from a data lake. Each file is a single document containing an array of identically structured documents, e.g.:


{ "entries": [
  {
    "wd:Employee_User_ID": "lol",
    "wd:Scheduled_Weekly_Hours": "40",
    "wd:fte": "100"
  },
  {
    "wd:Employee_User_ID": "rofl",
    "wd:Scheduled_Weekly_Hours": "37.5",
    "wd:fte": "100"
  }
]}

I need to rename these fields, so I’ve put the following into the Field Renamer processor:

/entries[*]/'wd:Employee_User_ID' → /entries[*]/alias

But that throws:

java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.elementData(ArrayList.java:424)
    at java.util.ArrayList.get(ArrayList.java:437)
    at com.streamsets.datacollector.record.RecordImpl.get(RecordImpl.java:309)
    at com.streamsets.datacollector.record.RecordImpl.has(RecordImpl.java:374)
    at com.streamsets.pipeline.stage.processor.fieldrenamer.FieldRenamerProcessor.process(FieldRenamerProcessor.java:300)

It works quite well (for the first array element only) if I replace the asterisk with 0 in both expressions. Isn't this the correct syntax for renaming all array elements?
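To make the intended result concrete, here is a minimal plain-Java sketch, assuming each record's `entries` array is modeled as a `List<Map<String, String>>`. The class and method names are hypothetical and this is not the StreamSets record API. It applies the same rename to every element rather than only to index 0, and it moves the value (remove, then put) instead of copying it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RenameInAllEntries {

    // Hypothetical helper: renames key `from` to `to` in every element of `entries`.
    // Elements that do not contain `from` are left unchanged.
    static void renameInAll(List<Map<String, String>> entries, String from, String to) {
        for (Map<String, String> entry : entries) {
            if (entry.containsKey(from)) {
                // remove() returns the old value, so this is a move, not a copy
                entry.put(to, entry.remove(from));
            }
        }
    }

    // Builds one entry shaped like the sample input above.
    static Map<String, String> entry(String user, String hours, String fte) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("wd:Employee_User_ID", user);
        m.put("wd:Scheduled_Weekly_Hours", hours);
        m.put("wd:fte", fte);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> entries = new ArrayList<>();
        entries.add(entry("lol", "40", "100"));
        entries.add(entry("rofl", "37.5", "100"));

        renameInAll(entries, "wd:Employee_User_ID", "alias");
        System.out.println(entries);
    }
}
```

Both elements end up with an `alias` key and no `wd:Employee_User_ID` key, which is what the `[*]` mapping is meant to express.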

Best answer by saleempothiwala 8 September 2022, 15:31

5 replies

Userlevel 4

@chisou Try the Field Mapper processor.

Thanks, @saleempothiwala 
I did that:

It works, but it copies the field rather than renaming it. It also handles only one field, and having a separate Field Mapper for each field seems inefficient.

Is the Field Renamer not able to do this? The documentation states “To rename an array or map, you can specify a single array index or map element, or you can use the asterisk wildcard to represent all array indices and map elements.” 
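One reading of the documented wildcard behavior is that `[*]` expands to every concrete index, with each source path paired with the target path at the same index. The sketch below illustrates only that reading: `expand` is a hypothetical helper that handles a single `[*]`, and it makes no claim about how the Field Renamer actually resolves paths internally.

```java
import java.util.ArrayList;
import java.util.List;

public class WildcardPaths {

    // Hypothetical helper: replaces the first "[*]" in a field path with each
    // concrete index 0..size-1, e.g. /entries[*]/x -> /entries[0]/x, /entries[1]/x.
    // Only a single wildcard is handled in this sketch.
    static List<String> expand(String path, int size) {
        List<String> result = new ArrayList<>();
        int star = path.indexOf("[*]");
        if (star < 0) {
            result.add(path); // no wildcard: the path is already concrete
            return result;
        }
        String head = path.substring(0, star);
        String tail = path.substring(star + 3); // skip the three characters "[*]"
        for (int i = 0; i < size; i++) {
            result.add(head + "[" + i + "]" + tail);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> from = expand("/entries[*]/'wd:Employee_User_ID'", 2);
        List<String> to = expand("/entries[*]/alias", 2);
        // Pair each source path with the target path at the same index.
        for (int i = 0; i < from.size(); i++) {
            System.out.println(from.get(i) + " -> " + to.get(i));
        }
    }
}
```

Under this reading, the two-element sample would yield one rename per index, which matches the "replace the asterisk with 0" result for index 0 and extends it to index 1.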

 

Userlevel 4

@chisou

See this below:

Output:

Userlevel 4

So if you can write an expression that identifies the fields to be changed, it can be done for multiple fields at once.

For example, if you want to remove the wd: prefix from all field names, simply use something like:
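The same prefix-stripping idea in plain Java, as an illustration rather than Field Mapper expression syntax: remove a fixed prefix such as `wd:` from every key of an entry. The class and method names are hypothetical; in a pipeline, this would be applied once per array element.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StripPrefix {

    // Hypothetical helper: returns a copy of `entry` with `prefix` removed from the
    // start of every key. Keys without the prefix are kept as-is, and insertion order
    // is preserved. If two keys collide after stripping, the later value wins.
    static Map<String, String> stripPrefix(Map<String, String> entry, String prefix) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : entry.entrySet()) {
            String key = e.getKey();
            String newKey = key.startsWith(prefix) ? key.substring(prefix.length()) : key;
            out.put(newKey, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> entry = new LinkedHashMap<>();
        entry.put("wd:Employee_User_ID", "lol");
        entry.put("wd:Scheduled_Weekly_Hours", "40");
        entry.put("wd:fte", "100");
        // prints {Employee_User_ID=lol, Scheduled_Weekly_Hours=40, fte=100}
        System.out.println(stripPrefix(entry, "wd:"));
    }
}
```

This only works because every name shares one prefix; arbitrary one-off renames still need an explicit mapping per field.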

Ah, replace, thanks! Unfortunately, my field names don't follow a pattern like that, so I'll need to use multiple Mappers for now.

Do you happen to know whether the Renamer is considered legacy or something? Being able to simply rename record labels seems like a pretty basic requirement. In most cases you'd probably want to do that directly within Snowflake, I guess, but still ...
