Solved

How to remove special characters from field names in streamsets data collector?


Userlevel 5
Badge +1

Solution.

The issue can be handled by Field mapper processor. Below the syntax for the same.

 

${str:replaceAll(f:name(), '[.\\-/()!&%^$#@]','')}

 

Attached the screen shot for the reference.

 

NB : The issue also can be handled in different ways e;g(Expression Evaluator, scripting processor etc) 

icon

Best answer by pkandra 30 January 2023, 20:21

View original

15 replies

I frequently use this, but ran across a few different situations.

  • Could the same logic be used for removing unwanted characters in field values?
  • How would you go about removing control characters like a tab or new line character from field names or values?
Userlevel 5
Badge +1

@pkandra 

Kindly try the above the logic and check if it helps in fixing the issues.

If not then in the logic we need to mention which are characters are causing problem.

If i get the error details with your input data which is failing ,so i can help you on that.

Thanks & Regards

Bikram_

 

@Bikram your logic works perfectly for the special characters listed.  I can’t seem to get it to work for control characters though.  StreamSets regex indicates a \s should handle what I need but I can’t get it to work.

Represents a whitespace character - includes space, tab, line break and form feed.
Userlevel 5
Badge +1

can you please try with the below code and check if it helps.

 

${str:replaceAll(f:name(), '[.\\-/()!&%^$#@ \t\n\r','')}

No luck.

 

Userlevel 5
Badge +1

can i get the sample input data for which you are having issues in it, so i can try to reproduce the same and try to help you on it.

Sure.  I’m pulling from a JDBC Multitable consumer source and one of the tables has a column that has some embedded new line characters...an example string is:

  Gas 2012 C1500 Silverado

 

 

 

Userlevel 5
Badge +1

@pkandra 

can you please give a try the below code.

 

${str:replaceAll(f:name(), '[\\s*.\\-/()!&%^$#@]','')}

 

Thanks & Regards

Bikram_

Thanks @Bikram.  I should be able to modify that one to meet my needs...probably just \\n so I don’t remove spaces.

Userlevel 5
Badge +1

@pkandra 

May i know if it helps in fixing your issues.

Yes, but I need to only use the regex value I need and also use a conditional expression to replace the new line character within a field value if it exists or else it will turn numeric and date fields to strings for some reason.  The below works for what I need.  Thanks for your assistance.

 

 

Userlevel 5
Badge +1

@pkandra 

Thanks and good to know that it helps.

Kindly help me to close the issue as resolved and mark as best answer.

Thanks & Regards

Bikram_

@Bikram sure...how would I go about doing that?  I had appended my question to your original post.

Userlevel 5
Badge +1

@pkandra 

Just click on Best answer and publish

@Bikram  I believe the Best Answer link only gets displayed if I asked the initial question.

Reply