Question

Reading parquet files from Google Cloud Storage using Data Collector

  • October 12, 2022

Hi,

I understand that the GCS stage in SDC does not support Parquet. Is there a reason for this, or a workaround for reading Parquet from GCS using Data Collector?

Thank you.

1 reply

saleempothiwala
  • October 13, 2022

@anirbanch 

Parquet is a column-oriented data storage format, whereas SDC is a row-based, micro-batching processing engine, so it cannot read Parquet.
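To make the column-oriented versus row-based distinction concrete, here is a toy sketch in plain Python. It is only an illustration of the two layouts, not Parquet's actual on-disk format or how SDC builds records internally; the function name `columns_to_rows` and the sample data are made up for this example.

```python
# Toy illustration only: this is NOT Parquet's real format or SDC's internals.
from typing import Any, Dict, List


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Pivot column-oriented data into a list of row records."""
    names = list(columns)
    # Every column must hold the same number of values to form whole rows.
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*...) walks the columns in parallel, yielding one tuple per row.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Column-oriented: all values of one field are stored together.
columnar = {"id": [1, 2, 3], "city": ["Pune", "Oslo", "Lima"]}

# Row-based: each record carries one value per field.
rows = columns_to_rows(columnar)
```

A row-based engine works on records like the entries in `rows`, one at a time within a batch, while a columnar file keeps each field's values together, as in `columnar`.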

You can use the StreamSets Transformer engine to read Parquet files and transform them.