Question

Reading parquet files from Google Cloud Storage using Data Collector

  • October 12, 2022
  • 1 reply
  • 62 views

Hi,

I understand that the GCS stage in SDC does not support Parquet. Is there a reason for this, or is there a workaround for reading Parquet from GCS using Data Collector?

Thank you.

1 reply

saleempothiwala
Headliner

@anirbanch 

Parquet is a column-oriented data storage format, whereas SDC is a row-based, micro-batch processing engine, hence it cannot read Parquet.

You can use the StreamSets Transformer engine to read Parquet files and transform them.
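
To make the column-versus-row distinction a bit more concrete, here is a toy sketch in plain Python. It does not use any SDC, Transformer, or Parquet API; the function name `columns_to_row_batches` and the sample data are made up for illustration only. It just shows that data stored column by column has to be pivoted into records before a row-based, micro-batch engine can consume it.

```python
from typing import Any, Dict, Iterator, List


def columns_to_row_batches(
    columns: Dict[str, List[Any]], batch_size: int
) -> Iterator[List[Dict[str, Any]]]:
    """Yield micro-batches of row records built from a column-oriented table.

    Hypothetical helper for illustration; not part of any StreamSets API.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    # Every column must hold the same number of values.
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    total = lengths.pop() if lengths else 0

    names = list(columns)
    for start in range(0, total, batch_size):
        stop = min(start + batch_size, total)
        # Pivot: take the i-th value of every column to form one record.
        yield [{name: columns[name][i] for name in names} for i in range(start, stop)]


# Made-up sample data laid out column by column.
columns = {"id": [1, 2, 3], "city": ["Oslo", "Lima", "Pune"]}
batches = list(columns_to_row_batches(columns, batch_size=2))
```

With a batch size of 2, the three values per column come out as two batches of records, the first holding two rows and the second holding one.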

