Can anyone give a quick overview of the pros and cons of Data Collector vs. Transformer, beyond what the StreamSets documentation already covers?

@lakshmi_narayanan_t 

Data Collector is an ingestion engine that reads data from a source (A) and writes it to a destination (B), applying some transformations to the data in batches along the way. The typical use case is reading from multiple sources and writing to a landing area, or otherwise delivering the data. Data is ingested as a continuous stream of batches.


Transformer is an engine that uses Apache Spark to provide ETL at scale. Data is generally processed in batch mode, and transformations are applied to datasets.
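
To make the distinction a bit more concrete, here is a rough toy sketch in plain Python. It does not use any StreamSets or Spark API; the function names (`read_in_batches`, `collector_style`, `transformer_style`) and the sample records are hypothetical and only meant to illustrate the two processing styles described above, assuming the difference comes down to per-record work on small batches versus operations over a dataset.

```python
from collections import defaultdict
from itertools import islice


def read_in_batches(source, batch_size):
    """Yield successive lists of at most batch_size records from an iterable."""
    it = iter(source)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def collector_style(source, batch_size=2):
    """Hypothetical ingestion loop: transform each record, batch by batch."""
    delivered = []
    for batch in read_in_batches(source, batch_size):
        # Each record is transformed on its own; no record depends on another.
        delivered.extend({**r, "amount_cents": r["amount"] * 100} for r in batch)
    return delivered


def transformer_style(dataset):
    """Hypothetical dataset-level step: an aggregate computed over all records."""
    totals = defaultdict(int)
    for r in dataset:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


# Made-up sample data for illustration only.
records = [
    {"customer": "a", "amount": 5},
    {"customer": "b", "amount": 7},
    {"customer": "a", "amount": 3},
]

ingested = collector_style(records)        # per-record work, small batches
per_customer = transformer_style(records)  # aggregate over the dataset
```

In this sketch the first style can hand off each batch as soon as it is read, while the second needs to see the dataset before it can produce a result, which is roughly why an engine built on Spark suits the second kind of work.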

