Question

File conversion

  • 9 March 2023
  • 4 replies

Hi Team

I need help converting CSV files to Parquet files (or to whole files) using the Jython Evaluator or the Groovy Evaluator. I need guidance on the code to implement this use case.

Origin: incoming Avro file data

Jython Evaluator: save the Avro data in whole file format

Whole File Transformer: convert Avro to Parquet

Local FS: write the Parquet file to a local directory

Please suggest any other ideas you have.

 

Thanks 

Tamilarasu

 




@tamilarasup 

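SDC processes data row by row (it is record based), so it currently does not support writing Parquet directly.

You can read the data from CSV and write it as Avro files with the Local FS destination.

Enable event generation on Local FS so that it produces an event each time it finishes (closes) a file. Read each finished file as a whole file and use the Whole File Transformer processor to convert it from Avro to Parquet. The Schema Generator processor can generate the Avro schema needed to write the Avro files in the first step.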
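The first step needs an Avro schema for the CSV data. As a rough, hypothetical illustration of what such a schema can look like (this is not how the Schema Generator processor works internally), the Python 3 sketch below infers a flat Avro record schema from a CSV header and sample rows using only the standard library. The function names, the record name CsvRecord, and the typing rules (try long, then double, then fall back to string; columns with empty values become nullable) are assumptions made for the example.

```python
import csv
import io
import json


def infer_avro_type(values):
    """Return the narrowest Avro primitive ('long', 'double', 'string') fitting all non-empty values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"  # no evidence either way; fall back to string
    for avro_type, parse in (("long", int), ("double", float)):
        try:
            for v in non_empty:
                parse(v)
        except ValueError:
            continue  # some value did not parse; try the next, wider type
        return avro_type
    return "string"


def infer_avro_schema(csv_text, record_name="CsvRecord"):
    """Build a flat Avro record schema (as a dict) from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = []
    for name in reader.fieldnames:
        # Short rows yield None for missing columns; treat those as empty.
        values = [row.get(name) or "" for row in rows]
        avro_type = infer_avro_type(values)
        if "" in values:
            # Nullable column: union with null listed first, default null.
            fields.append({"name": name, "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": name, "type": avro_type})
    return {"type": "record", "name": record_name, "fields": fields}


if __name__ == "__main__":
    sample = "id,name,price\n1,apple,0.5\n2,pear,\n"
    print(json.dumps(infer_avro_schema(sample), indent=2))
```

Avro field names must start with a letter or underscore and contain only letters, digits, and underscores, so CSV headers with spaces or dashes would need renaming before a schema like this is usable.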

 

More details on this approach: https://docs.streamsets.com/portal/platform-datacollector/latest/datacollector/UserGuide/Solutions/Parquet.html


Follow-up question: I also need to know how to import modules or packages into Data Collector (SDC).

The Jython Evaluator runs custom Jython code to process data the way we need.

In the Jython Evaluator script I added import pandas (and some other packages), and I got a "no module named" error.

How can I make these modules available to the Jython Evaluator in Data Collector?
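For context, a Jython Evaluator script loops over the incoming records, writes each processed record to the output, and sends failures to error handling. The sketch below wraps that loop in a function so it can be tried outside SDC. It assumes the bindings of recent SDC versions (sdc.records, sdc.output, sdc.error; older versions expose records, output, and error), and the name field and the upper-casing are hypothetical examples. It sticks to Python 2.7 syntax because Jython implements Python 2.7.

```python
# Jython Evaluator script sketch (Python 2.7 syntax, as Jython implements 2.7).
# Assumption: recent SDC versions bind sdc.records, sdc.output and sdc.error;
# older versions expose records, output and error instead.


def process(records, output, error):
    """Upper-case a hypothetical 'name' field on every incoming record."""
    for record in records:
        try:
            # 'name' is a hypothetical field used only for illustration.
            record.value['name'] = record.value['name'].upper()
            output.write(record)
        except Exception as e:
            # Send the failing record to the stage's error handling with a message.
            error.write(record, str(e))


# Inside the Jython Evaluator, call it with the script bindings, e.g.:
# process(sdc.records, sdc.output, sdc.error)
```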

 

 


@tamilarasup 

As Saleem suggested, you can handle your case with two pipelines.

 

Pipeline 1:

Read data from Kafka and store it in Local FS.

 

Pipeline 2:

Read the file as a whole file, convert it to Parquet using the Whole File Transformer, and send it to Local FS as a whole file.

 

https://docs.streamsets.com/platform-datacollector/latest/datacollector/UserGuide/Processors/WholeFileTransformer.html

 

To answer your question about installing Python packages: we need to install the Python pandas package in SDC.

Kindly try the following in SDC and check whether it helps:

pip3 install pandas
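One caveat worth checking: the Jython Evaluator runs Jython, a Python 2.7 implementation on the JVM, which cannot load CPython C-extension packages such as pandas (or NumPy, which pandas depends on). Packages installed with pip3 also go into a CPython 3 environment that Jython does not read. Pure-Python, Python 2.7 compatible modules can generally be made importable by adding their directory to sys.path at the top of the script. The sketch below does that and adds a small helper for checking what the interpreter can actually import; the directory path and the helper name are hypothetical.

```python
import sys

# Hypothetical directory holding pure-Python, Python 2.7 compatible modules.
# C-extension packages (pandas, NumPy) will not load in Jython from any path.
EXTRA_LIB_DIR = '/opt/sdc-extras/jython-lib'

# Append only once so running the script again does not keep growing sys.path.
if EXTRA_LIB_DIR not in sys.path:
    sys.path.append(EXTRA_LIB_DIR)


def importable(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False
```

Calling importable('pandas') from inside the Jython Evaluator is a quick way to confirm whether a package is visible to the Jython interpreter at all, independent of what pip3 installed for CPython.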

 

 

Thanks & Regards

Bikram_


Thanks @Bikram

Another way to do this in a single pipeline is to use events and executors on the Local FS destination.
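To make the single-pipeline idea a bit more concrete: with event generation enabled, Local FS emits an event record each time it closes an output file, and a stage or executor on the event stream can act on that file. The Local FS documentation describes these as file-closed events, with the sdc.event.type record header attribute set to file-closed and a filepath field, but please verify those names for your SDC version. The plain-Python sketch below models event records as dictionaries and selects the closed files a conversion step would pick up; the dictionary layout, the function name, and the suffix filter are assumptions, not the SDC record API.

```python
def closed_file_paths(events, suffix='.avro'):
    """Return file paths from file-closed event records that end with suffix.

    Events are modeled as plain dicts with 'headers' and 'fields' keys; this
    layout, and the suffix check, are assumptions for illustration only.
    """
    paths = []
    for event in events:
        # Assumed header attribute and value marking a file-closed event.
        if event.get('headers', {}).get('sdc.event.type') != 'file-closed':
            continue
        # Assumed field carrying the absolute path of the closed file.
        path = event.get('fields', {}).get('filepath', '')
        if path.endswith(suffix):
            paths.append(path)
    return paths


if __name__ == '__main__':
    sample_events = [
        {'headers': {'sdc.event.type': 'file-closed'},
         'fields': {'filepath': '/data/avro/part-0001.avro'}},
        {'headers': {'sdc.event.type': 'file-closed'},
         'fields': {'filepath': '/data/avro/notes.txt'}},
    ]
    print(closed_file_paths(sample_events))  # ['/data/avro/part-0001.avro']
```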
