Can anyone give a quick overview of the pros and cons of Data Collector vs. Transformer, beyond what the StreamSets documentation covers?
Data Collector is an ingestion engine: it reads data from a source (A) and writes it to a destination (B), applying some transformation along the way. The typical use case is reading from multiple sources and either writing to a landing area or delivering the data downstream. Data is ingested continuously, as a stream of batches.
Transformer is an engine that runs on Apache Spark to provide ETL at scale. Data is generally processed in batch, and transformations operate on whole datasets rather than on individual records.
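
To make the difference concrete, here is a rough sketch in plain Python. It does not use the StreamSets or Spark APIs; the function names (read_batches, collector_style, transformer_style) and the sample records are made up for illustration. The first function mimics the Data Collector pattern (small batches, each record touched individually and delivered as soon as its batch is done), while the second mimics the Transformer pattern (an operation such as a group-by that needs the whole dataset).

```python
from collections import defaultdict
from itertools import islice


def read_batches(source, batch_size):
    """Yield successive lists of up to batch_size records from source."""
    it = iter(source)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def collector_style(source, batch_size=2):
    """Hypothetical Data Collector-like flow: enrich each record, deliver per batch."""
    delivered = []
    for batch in read_batches(source, batch_size):
        # Record-level transformation: add a field to every record.
        enriched = [{**record, "origin": "orders"} for record in batch]
        # Each batch is handed to the destination as soon as it is processed.
        delivered.append(enriched)
    return delivered


def transformer_style(dataset):
    """Hypothetical Transformer-like flow: aggregate across the whole dataset."""
    totals = defaultdict(int)
    for record in dataset:
        totals[record["user"]] += record["amount"]
    return dict(totals)


# Illustrative sample data, not taken from any real pipeline.
records = [
    {"user": "a", "amount": 10},
    {"user": "b", "amount": 5},
    {"user": "a", "amount": 7},
]

batches = collector_style(records, batch_size=2)
totals = transformer_style(records)
```

With three records and a batch size of 2, collector_style delivers two batches, and no batch needs to know about the others. transformer_style can only produce the per-user totals after it has seen every record, which is the kind of work that benefits from running on a Spark cluster.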