Question

Reading an Avro file from a bucket similar to S3 and converting it to Parquet

  • 1 June 2023
  • 3 replies

I am reading an Avro file from a custom cloud store similar to S3 and trying to convert it into a Parquet file using the whole file evaluator, but it gives the following error:

 

Record1-Error Record1 CONVERT_01 - Failed to validate record is a whole file data format : java.lang.IllegalArgumentException: Record does not contain the mandatory fields /fileRef,/fileInfo,/fileInfo/size for Whole File Format.
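
For illustration, the check this message describes can be sketched in plain Python. This is only a sketch under assumptions: the helper names (`has_field`, `missing_whole_file_fields`) and the sample records are hypothetical and are not part of StreamSets Data Collector. The only detail taken from the error itself is the list of mandatory field paths. One plausible reading of the error is that the record reaching the converter holds parsed Avro fields rather than a whole-file reference.

```python
# Hypothetical helper for illustration only; not a StreamSets API.
# The three paths are copied from the error message above.
MANDATORY_FIELDS = ("/fileRef", "/fileInfo", "/fileInfo/size")


def has_field(record, path):
    """Return True if the slash-separated path resolves inside the nested dict."""
    node = record
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


def missing_whole_file_fields(record):
    """List the mandatory whole-file paths that the record lacks."""
    return [p for p in MANDATORY_FIELDS if not has_field(record, p)]


# Assumed shape of a record parsed from Avro data: only the Avro fields.
parsed_record = {"id": 1, "name": "example"}

# Assumed shape of a whole-file record: a file reference plus file metadata.
whole_file_record = {"fileRef": "<reference to the file>", "fileInfo": {"size": 1024}}

print(missing_whole_file_fields(parsed_record))      # all three paths are missing
print(missing_whole_file_fields(whole_file_record))  # nothing is missing
```

If the first case matches what the pipeline is doing, the data format configured on the origin (whole file versus parsed Avro) may be worth checking.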

 

 


3 replies


@kushal_upadhyay 

Can you please try reading the file from S3 as a whole file, converting it to Parquet with the Whole File Transformer, and storing the new file in S3?

Then read the data from the new file and proceed with your pipeline integration.

Please let me know in case of any issues.

 

Thanks & Regards

Bikram_

Hi Bikram,

Thanks. We were able to write the Parquet file using two separate pipelines. First, we created one pipeline that converts the Avro file to Parquet and stores it on the local file system. Then we created a second pipeline that reads the Parquet file from the same directory and stores it in S3. Is there any way to write Parquet from Kafka → AWS S3 bucket in a single flow, converting Avro to Parquet along the way?

 

Here we first get the SDC records from Kafka and pass them to the Whole File generator with the convert-to-Parquet option selected, and then we need to pass them to S3. Any example pipeline would help.

 

Hi @kushal_upadhyay ,

 

I’m not sure if this is still needed, but I have just published an article on converting from Avro to Parquet that you might find helpful. It can be found here:

 

 

Regards, 

Gary
