Control characters replacement

  • 6 September 2022
  • 2 replies

How can I replace control characters in StreamSets?

The data comes from Kafka in the UTF-8 charset and contains characters such as “i, inverted question mark, ½ symbol”.

How can I replace these using Groovy, the Expression Evaluator, or Jython?

2 replies


@Priyanka Mynepally 

Can you please try the expression below in the Expression Evaluator and check whether it helps?

str.replaceAll("[^a-zA-Z0-9]", " ")


Kindly provide some sample inputs so I can give it a try. If it works, I can provide you with a sample pipeline.
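Note that the pattern `[^a-zA-Z0-9]` replaces every character that is not an ASCII letter or digit, including spaces, punctuation, and accented letters. If only control characters should go, a narrower pattern may be safer. Below is a minimal Python sketch of that idea. The function name `replace_control_chars` is hypothetical, and wiring it into a Jython Evaluator stage is not shown. Printable symbols such as ½ and the inverted question mark are not control characters, so this pattern leaves them untouched.

```python
import re

# C0 controls (U+0000-U+001F), DEL (U+007F) and C1 controls (U+0080-U+009F).
# This range includes tab, carriage return and newline; narrow it if those
# must be preserved.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f-\x9f]")


def replace_control_chars(text, replacement=" "):
    """Return text with each control character swapped for replacement.

    Hypothetical helper for illustration; not part of any StreamSets API.
    """
    return CONTROL_CHARS.sub(replacement, text)


# Example: the NUL and unit-separator characters become spaces.
cleaned = replace_control_chars("ab\x00c\x1fd")
```

Passing an empty string as `replacement` deletes the characters instead of substituting a space.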


@Priyanka Mynepally, before replacing these characters, make sure you are using the right encoding and decoding. These characters might be valid data in another language, which you would then be replacing with null.
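One way to check for an encoding mismatch, sketched in Python under some assumptions: when UTF-8 bytes are decoded as Latin-1, each multi-byte character turns into two or three odd-looking symbols. For instance, the Unicode replacement character U+FFFD has the UTF-8 bytes EF BF BD, which read as Latin-1 give "ï¿½", a sequence that resembles the symbols described in the question. Whether that is what happens in this pipeline is only a guess. The helper name `repair_mojibake` and the choice of Latin-1 as the wrong charset are assumptions for illustration.

```python
def repair_mojibake(text, wrong="latin-1", right="utf-8"):
    """Undo a decode that used the wrong charset, if the text allows it.

    Hypothetical helper: re-encodes text with the charset assumed to have
    been applied by mistake, then decodes with the intended one. Text that
    does not round-trip is returned unchanged.
    """
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# UTF-8 bytes for "caf\u00e9" misread as Latin-1, then repaired.
garbled = "caf\u00e9".encode("utf-8").decode("latin-1")
fixed = repair_mojibake(garbled)
```

If a repaired value turns out to be U+FFFD itself, the original character was already lost upstream, and fixing the charset at the producer or in the Kafka origin's configuration is likely the better remedy than replacing symbols downstream.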