
This post covers how to convert an Avro file that includes its schema to a Parquet file.

STEP 1

Set up your pipeline as follows:

Directory origin -> Whole File Transformer processor -> Local FS destination

 

STEP 2

Configure your Directory origin as follows:

 

GENERAL TAB

 

FILES TAB

 

Adjust any fields that depend on your own setup, such as File Name Pattern, and keep the remaining settings as shown in the screenshot.
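If you use a glob-style File Name Pattern, you can roughly preview which files it would pick up before running the pipeline. The sketch below uses Python's `fnmatch` module as an approximation. The pattern `*.avro`, the file names, and the helper `matching_files` are all examples invented here, and the origin's own matching rules may differ in edge cases.

```python
from fnmatch import fnmatchcase


def matching_files(names, pattern):
    """Return the names that match a glob-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Example file names; replace these with a listing of your own directory.
candidates = ["orders.avro", "orders.avro.tmp", "notes.txt"]
print(matching_files(candidates, "*.avro"))
```

A pattern such as `*.avro` only matches names that end in `.avro`, so temporary files like `orders.avro.tmp` are left out.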

 

POST PROCESSING TAB

 

DATA FORMAT TAB

 

STEP 3

Configure the Whole File Transformer stage as follows:

 

GENERAL TAB

 

JOB TAB

*Amend the file directory as necessary.

 

AVRO TO PARQUET TAB

 

If you preview the pipeline, select the following boxes so that you can click into each record and its schema:

 

 

You should now have a .parquet file converted from your original Avro file.
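For a quick sanity check on the output, you can confirm that the file looks like Parquet. A Parquet file begins and ends with the 4-byte magic `PAR1`, so the sketch below simply inspects those bytes. It is not a full validation of the contents, and the helper name `looks_like_parquet` is an invention for this example.

```python
import os


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic 'PAR1'."""
    # Leading magic (4) + footer length (4) + trailing magic (4) is the minimum size.
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as fh:
        head = fh.read(4)
        fh.seek(-4, os.SEEK_END)
        tail = fh.read(4)
    return head == b"PAR1" and tail == b"PAR1"
```

Call it with the path of the file written by the Local FS destination, for example `looks_like_parquet("/tmp/out/sample.parquet")`, where the path is a placeholder for your own output location.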
