How to format Excel files before writing to endpoint?

3 years ago
December 22, 2021
0 replies
528 views

AkshayJadhav
StreamSets Employee
101 replies

Product: All SDC versions that support Data Format Excel

Scenario:

We have an excel sheet that is an input as Whole file to Directory / HDFS origin. We want to run some basic formatting on all the worksheets and eliminate some unwanted data before writing to the endpoint.

Goal:

Format Excel files before loading to endpoint

Data Format : Excel

i. Eliminate the first 2 rows (completely ignore) and process from the 3rd row
ii. Pick column values A,B,C and M for first row and D,E,F,I,J,K,L columns from row 2 onwards.

The above is a screenshot of one of the worksheets in the SampleExcel2.xlsx file attached to this article

What we want to do here is to skip rows 1,2 and columns A,B,C in all worksheets and write the remaining columns and rows to the endpoint

Solution:

Please refer to attachments to this KB which has a sample input sheet, screenshots of sample output, and 2 example pipeline JSON files to handle this.

There are 2 ways of handling this. One is using the groovy script and the other is using an EL evaluator. I have uploaded the sample pipeline for both methods.

+ We integrated a groovy script and saw it was working for a single sheet.
+ Here is a code change you need to make to your current groovy evaluator to make it work for

multiple sheets

Groovy Evaluator ->> Groovy

Init Script :

sdc.state['currentFile'] = ""
 sdc.state['worksheet'] = ""
 sdc.state['lineNum'] = 0

Script :

// Sample Groovy code
 records = sdc.records
 for (record in records) {
    try {

      // initialize, so we work correctly on the first file.
      // do this for the first record of the first file only.
      if(sdc.state['currentFile'].equals("")) {
        sdc.state['currentFile'] = record.attributes['file'];
      }
      if(sdc.state['worksheet'].equals("")) {
        sdc.state['worksheet'] = record.attributes['worksheet'];
      }

      if(sdc.state['currentFile'].equals(record.attributes['file'])) {
        if (sdc.state['worksheet'].equals(record.attributes['worksheet']))
        {
if(sdc.state['lineNum'] < 2) {  // number of records to skip.  <<<<<<<<<<<<< THIS WILL CHANGE BASED ON NUMBER OF LINES YOU WANT TO SKIP
          ++sdc.state['lineNum'];
            sdc.log.info("CASE 13303: currentFile {} worksheet {} lineNum {}  " ,sdc.state['currentFile'],
                        sdc.state['worksheet'], sdc.state['lineNum']);
          continue;  // skip this record

          }
        } else {
          sdc.state['worksheet'] = record.attributes['worksheet'];
          sdc.state['lineNum'] = 1;
            continue;  // skip this record
        }
      } else {
        sdc.log.info("CASE 13303: {} set to {}", sdc.state['currentFile'], record.attributes['file']);
        sdc.state['currentFile'] = record.attributes['file'];
        sdc.state['worksheet'] = record.attributes['worksheet'];
        sdc.state['lineNum'] = 0;
      }
      // Write a record to the processor output
      sdc.output.write(record);
    } catch (e) {
        // Write a record to the error pipeline
        sdc.log.error(e.toString(), e)
        sdc.error.write(record, e.toString())
    }
 }

Did this topic help you find an answer to your question?

This topic has been closed for comments

Scenario:

Goal:

Solution:

Related topics

Why so many bugs in FD for contact & User Managementicon

Freshdesk Contact record thread to FreshService requestericon

change the settings in FD contact listicon

The New FreshCaller - 6 Things you should know!!!

Next-Gen Project Management: Link User Stories and Bugs directly to versionsicon

Tags

Couldn't find what you're looking for?

Sign up

Social Login

Login to the community

Social Login

Scanning file for viruses.

This file cannot be downloaded