How can I parse fixed-width records?

  • 17 January 2022

If your files contain records made up of fixed-width fields, with each record ending in a line terminator, you can ingest them into StreamSets by pairing the Directory origin with an Expression Evaluator processor. In the Directory origin, go to the Data Format tab and set the data format to "Text". Each line, up to its line terminator, is then placed in the /text field of its own record.

 

Let's look at a sample data recordset:

 

Now add an Expression Evaluator processor as the next stage in the pipeline and configure it as follows:

Notes:

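1. Each entry in the configuration defines a new field that will be created on the record.
2. substring takes a starting position and an ending position, and returns the characters from the starting position up to, but not including, the ending position. Positions are counted from 0, so /name (start 0, end 8) covers positions 0 through 7, which is 8 characters.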
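
To make the "up to but not including" rule concrete, here is a minimal Python sketch of the same slicing logic outside of StreamSets. Only the /name range (0 to 8) comes from this post; the city and zip fields, their widths, and the sample record are hypothetical and exist only for illustration. Python slices use the same half-open convention as substring, so text[0:8] corresponds to what an expression of the form ${str:substring(record:value('/text'), 0, 8)} would be expected to return. Trimming the padding is an extra step added in this sketch.

```python
# Field layout as (start, end) pairs; end is exclusive, as with substring.
# Only "name" (0, 8) is taken from the post. "city" and "zip" are
# hypothetical fields added for illustration.
LAYOUT = {
    "name": (0, 8),
    "city": (8, 18),
    "zip": (18, 23),
}


def parse_fixed_width(text, layout=LAYOUT):
    """Slice one line of text into named fields using [start, end) ranges."""
    # strip() removes the padding spaces; this is a convenience of the
    # sketch, not something the slicing itself does.
    return {name: text[start:end].strip() for name, (start, end) in layout.items()}


# Hypothetical record: the whole line sits in "text", mirroring /text.
line = "Alice".ljust(8) + "Boston".ljust(10) + "12345"
record = {"text": line}

# Add the parsed fields alongside the original text field.
record.update(parse_fixed_width(record["text"]))
print(record["name"], record["city"], record["zip"])
```

Because each end position is exclusive, the end of one field can be reused as the start of the next (8 and 18 above) without overlapping or skipping a character.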

 

Optionally, you can place a Field Remover processor after the Expression Evaluator to remove the original /text field.

