Data Collector vs. Transformer

  • 3 January 2023
  • 1 reply


Can anyone provide a quick overview of the pros and cons of Data Collector vs. Transformer, beyond what the StreamSets documentation covers?

1 reply



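Data Collector is an ingestion engine: it reads data from a source (A), optionally applies some transformations, and writes the result to a destination (B). The typical use case is reading from multiple sources and writing to a landing area, or delivering data to downstream systems. Data is ingested in a streaming fashion, as a continuous series of small batches.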
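To illustrate the "read from A, transform, write to B in batches" model, here is a minimal plain-Python sketch. It is not StreamSets code: `read_in_batches`, `run_ingestion`, the sample records, and the batch size of 2 are all hypothetical. The point is that each batch is processed and written before the next one is read, and each record is transformed on its own.

```python
from typing import Callable, Iterable, Iterator

Record = dict

def read_in_batches(source: Iterable[Record], batch_size: int) -> Iterator[list]:
    """Hypothetical reader: yield records in fixed-size batches (the last may be smaller)."""
    batch = []
    for record in source:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def run_ingestion(source: Iterable[Record], destination: list,
                  transform: Callable[[Record], Record], batch_size: int = 2) -> list:
    """Hypothetical pipeline loop: read a batch, transform each record, write, repeat."""
    for batch in read_in_batches(source, batch_size):
        # Each record is transformed independently of every other record.
        destination.extend(transform(r) for r in batch)
    return destination

source_a = [{"id": 1, "name": " alice "}, {"id": 2, "name": "BOB"}, {"id": 3, "name": "carol"}]
landing_b = run_ingestion(source_a, [], lambda r: {**r, "name": r["name"].strip().lower()})
print(landing_b)
```

This record-at-a-time shape suits cleanup, masking, or routing on the way to a landing area. Operations that need to see the whole dataset at once fit less naturally.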

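Transformer is an engine that runs pipelines on Apache Spark to provide ETL at scale. Data is generally processed in batch mode, and transformations operate on whole datasets rather than on individual records, which makes operations such as joins and aggregations across the full dataset possible.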
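To show what "transformations operate on datasets" means in practice, here is a minimal single-process Python sketch of a join followed by an aggregation. It does not use Spark or any StreamSets API, and `join_and_aggregate` and the sample data are hypothetical. The key property is that the result depends on all the input rows together, which is the kind of work Spark distributes across a cluster.

```python
from collections import defaultdict

# Hypothetical sample datasets.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 30.0},
    {"order_id": 2, "customer_id": "c2", "amount": 20.0},
    {"order_id": 3, "customer_id": "c1", "amount": 50.0},
]
customers = [
    {"customer_id": "c1", "country": "US"},
    {"customer_id": "c2", "country": "DE"},
]

def join_and_aggregate(orders: list, customers: list) -> dict:
    """Inner-join orders to customers on customer_id, then sum amounts per country."""
    country_by_customer = {c["customer_id"]: c["country"] for c in customers}
    totals = defaultdict(float)
    for o in orders:
        country = country_by_customer.get(o["customer_id"])
        if country is not None:  # inner join: skip orders without a matching customer
            totals[country] += o["amount"]
    return dict(totals)

print(join_and_aggregate(orders, customers))  # {'US': 80.0, 'DE': 20.0}
```

The total for "US" can only be computed after every matching order has been seen. A per-record, per-batch pipeline would need extra state to produce it, while a dataset-oriented engine handles it directly.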