
I’m building a Data Collector pipeline in which I want to rename fields within an array.
The input consists of JSON files read from a data lake. Each file is a single document containing an array of identically structured documents, e.g.


{ "entries": [
  {
    "wd:Employee_User_ID": "lol",
    "wd:Scheduled_Weekly_Hours": "40",
    "wd:fte": "100"
  },
  {
    "wd:Employee_User_ID": "rofl",
    "wd:Scheduled_Weekly_Hours": "37.5",
    "wd:fte": "100"
  }
]}

 

I need to rename these fields, so I’ve put the following into the Field Renamer processor:

/entries[*]/'wd:Employee_User_ID' → /entries[*]/alias

 

But that fails with:

java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.elementData(ArrayList.java:424)
    at java.util.ArrayList.get(ArrayList.java:437)
    at com.streamsets.datacollector.record.RecordImpl.get(RecordImpl.java:309)
    at com.streamsets.datacollector.record.RecordImpl.has(RecordImpl.java:374)
    at com.streamsets.pipeline.stage.processor.fieldrenamer.FieldRenamerProcessor.process(FieldRenamerProcessor.java:300)

 

It works quite well (for the first array element) if I replace the asterisk with a 0 in the expressions. Isn’t this the correct syntax for renaming all array elements? 

@chisou Try the Field Mapper processor.


Thanks, @saleempothiwala 
I did that:

And it works, but it copies the field rather than renaming it. It also covers just one field, and having a separate Field Mapper for each field seems so inefficient.

Is the Field Renamer not able to do this? The documentation states “To rename an array or map, you can specify a single array index or map element, or you can use the asterisk wildcard to represent all array indices and map elements.” 

 


@chisou 

 

See this below:

 

Output:

 


So if you can write an expression that identifies the values to be changed, it can be done for multiple fields at once.

For example, if you want to strip wd: from all the field names, simply use something like:
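
To illustrate the idea outside the pipeline, here is a minimal plain-Python sketch of the transformation being described: stripping the wd: prefix from every field name in every array element. It is not StreamSets expression syntax, and the function name strip_prefix is hypothetical; the sample record is the one from the question.

```python
def strip_prefix(record, prefix="wd:"):
    """Return a copy of the record with `prefix` removed from each
    field name in each element of the "entries" array.
    (Hypothetical helper, for illustration only.)"""
    return {
        "entries": [
            {
                (name[len(prefix):] if name.startswith(prefix) else name): value
                for name, value in entry.items()
            }
            for entry in record["entries"]
        ]
    }


# Sample record taken from the question above.
record = {
    "entries": [
        {"wd:Employee_User_ID": "lol", "wd:Scheduled_Weekly_Hours": "40", "wd:fte": "100"},
        {"wd:Employee_User_ID": "rofl", "wd:Scheduled_Weekly_Hours": "37.5", "wd:fte": "100"},
    ]
}

stripped = strip_prefix(record)
print(stripped["entries"][0])
```

The values are untouched; only the field names change, and names without the prefix pass through as they are.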

 


Ah, replace, thanks! Unfortunately my renames don’t follow a pattern like this, so I would need to go with multiple Mappers for now. 
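
For reference, this is roughly the behaviour I’m after, sketched in plain Python rather than in any StreamSets processor: an explicit old-name to new-name table applied to every array element. Only the alias mapping comes from my example above; the RENAMES table and the rename_fields function are hypothetical names for illustration.

```python
# Explicit rename table. Only the first mapping appears in the thread;
# further entries would be added per field as needed.
RENAMES = {
    "wd:Employee_User_ID": "alias",
}


def rename_fields(record, renames):
    """Return a copy of the record in which each field listed in
    `renames` is renamed in every element of the "entries" array.
    Fields not listed keep their original names."""
    return {
        "entries": [
            {renames.get(name, name): value for name, value in entry.items()}
            for entry in record["entries"]
        ]
    }


sample = {
    "entries": [
        {"wd:Employee_User_ID": "lol", "wd:Scheduled_Weekly_Hours": "40", "wd:fte": "100"},
        {"wd:Employee_User_ID": "rofl", "wd:Scheduled_Weekly_Hours": "37.5", "wd:fte": "100"},
    ]
}

renamed = rename_fields(sample, RENAMES)
print(renamed["entries"][1])
```

Unlike a copy, the old field name no longer exists in the output, and a single table covers all fields instead of one stage per field.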

Do you happen to know whether the Field Renamer is considered legacy or something? Being able to just rename record fields seems like a pretty basic requirement. In most cases you’d probably want to do that within Snowflake directly, I guess, but still ...