
I’m building a Data Collector pipeline in which I want to rename elements within an array.
The input is a set of JSON files read from a data lake. Each file is a single document containing an array of identically structured documents, e.g.


{ "entries": [
  {
     "wd:Employee_User_ID": "lol",
     "wd:Scheduled_Weekly_Hours": "40",
     "wd:fte": "100" },
  {
     "wd:Employee_User_ID": "rofl",
     "wd:Scheduled_Weekly_Hours": "37.5",
     "wd:fte": "100" }
]}

 

I need to rename these fields, so I’ve put the following into the Field Renamer processor:

/entries[*]/'wd:Employee_User_ID' → /entries[*]/alias

     

    But that gives an exception:

    java.lang.ArrayIndexOutOfBoundsException: -1
        at java.util.ArrayList.elementData(ArrayList.java:424)
        at java.util.ArrayList.get(ArrayList.java:437)
        at com.streamsets.datacollector.record.RecordImpl.get(RecordImpl.java:309)
        at com.streamsets.datacollector.record.RecordImpl.has(RecordImpl.java:374)
        at com.streamsets.pipeline.stage.processor.fieldrenamer.FieldRenamerProcessor.process(FieldRenamerProcessor.java:300)

     

    It works quite well (for the first array element) if I replace the asterisk with a 0 in the expressions. Isn’t this the correct syntax for renaming all array elements? 
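    To make the intent concrete, here is a minimal sketch in plain Python of what I expect the wildcard to do: apply the same rename to every element of the array, not just index 0. This is not StreamSets configuration; the function name rename_in_all_entries is made up for illustration.

```python
import json

# Sample document with the same shape as the input shown above.
doc = json.loads("""
{ "entries": [
  { "wd:Employee_User_ID": "lol",  "wd:Scheduled_Weekly_Hours": "40",   "wd:fte": "100" },
  { "wd:Employee_User_ID": "rofl", "wd:Scheduled_Weekly_Hours": "37.5", "wd:fte": "100" }
]}
""")


def rename_in_all_entries(document, old_name, new_name):
    """Rename old_name to new_name in every element of document["entries"]."""
    for entry in document["entries"]:
        if old_name in entry:
            # pop() removes the old key, so this is a rename rather than a copy.
            entry[new_name] = entry.pop(old_name)
    return document


rename_in_all_entries(doc, "wd:Employee_User_ID", "alias")
print(json.dumps(doc, indent=2))
```

    With an index of 0 instead of the loop, only the first element would be touched, which matches what I see in the pipeline.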

  • @chisou Try the Field Mapper processor.


    Thanks, @saleempothiwala 
    I did that:

    It works, but it copies the field rather than renaming it. It also handles just one field, and having a separate Field Mapper for each field seems inefficient.

    Is the Field Renamer not able to do this? The documentation states “To rename an array or map, you can specify a single array index or map element, or you can use the asterisk wildcard to represent all array indices and map elements.” 

     


    @chisou 

     

    See this below:

     

    Output:

     


    So if you can write an expression that identifies the values to be changed, it can be done for multiple fields.

    For example, if you want to strip wd: from all the field names, simply use something like:

     


    Ah, replace, thanks! Unfortunately, my renames don’t follow a pattern like this, so I would need to go with multiple Mappers for now.

    Do you happen to know whether the Renamer is considered legacy or something? Being able to just rename record labels seems like a pretty basic requirement. In most cases you’d probably want to do that within Snowflake directly I guess, but still ...
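    For reference, what I’m after is a single lookup table applied to every array element, roughly like the Python sketch below. It only illustrates the logic and is not a Data Collector stage; apart from alias, the target names in the table are invented placeholders, and rename_fields is a hypothetical helper.

```python
# Hypothetical rename table: only "alias" appears in this thread,
# the other target names are placeholders for illustration.
RENAMES = {
    "wd:Employee_User_ID": "alias",
    "wd:Scheduled_Weekly_Hours": "weekly_hours",
    "wd:fte": "fte_percent",
}

entries = [
    {"wd:Employee_User_ID": "lol", "wd:Scheduled_Weekly_Hours": "40", "wd:fte": "100"},
    {"wd:Employee_User_ID": "rofl", "wd:Scheduled_Weekly_Hours": "37.5", "wd:fte": "100"},
]


def rename_fields(items, renames):
    """Return new dicts with keys renamed via the table; unknown keys are kept as-is."""
    return [
        {renames.get(key, key): value for key, value in item.items()}
        for item in items
    ]


renamed = rename_fields(entries, RENAMES)
for item in renamed:
    print(item)
```

    One table covers all fields, so there is no need for a pattern in the names or for one stage per field.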

