Question

Reading parquet files from Google Cloud Storage using Data Collector

  • 12 October 2022

Hi,

I understand that the GCS stage in SDC does not support Parquet. Is there a reason for this, or is there a workaround to read Parquet files from GCS using Data Collector?

Thank you.


1 reply


@anirbanch 

Parquet is a column-oriented storage format, whereas SDC is a row-based, micro-batch processing engine, so it cannot read Parquet.
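To make the row-versus-column distinction concrete, here is a toy Python sketch. It is not how SDC or Parquet are actually implemented: real Parquet files add row groups, pages, encodings and compression on top of this basic idea. It only shows that columnar data stores one list per field, so an engine that processes one record at a time has to reassemble each record from every column first. The helper names `rows_to_columns` and `columns_to_rows` are hypothetical.

```python
from typing import Any, Dict, List

# Toy model only (hypothetical helpers): real Parquet adds row groups,
# pages, encodings and compression on top of this column-per-field layout.
Row = Dict[str, Any]
Columns = Dict[str, List[Any]]


def rows_to_columns(rows: List[Row]) -> Columns:
    """Pivot row records into one list per field (columnar layout).

    Assumes every row has the same set of fields.
    """
    columns: Columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def columns_to_rows(columns: Columns) -> List[Row]:
    """Rebuild row records, which is what a row-at-a-time engine consumes."""
    fields = list(columns)
    count = len(columns[fields[0]]) if fields else 0
    return [{f: columns[f][i] for f in fields} for i in range(count)]


rows = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Austin"}]
cols = rows_to_columns(rows)
print(cols)                           # {'id': [1, 2], 'city': ['Pune', 'Austin']}
print(columns_to_rows(cols) == rows)  # True
```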

You can use the StreamSets Transformer engine to read Parquet files and transform them.
