Question:
What is the ideal batch size to set for better performance?
Answer:
One important thing to remember about batch size is that each batch is processed in memory. The heap size allocated to SDC on the machine and the number of other pipelines running simultaneously (each with its own memory consumption) are therefore important factors when changing your batch size.
Unfortunately, we don't have a set batch size or formula, as the ideal size depends on a variety of factors.
The batch size allowed for pipelines can be limited in the sdc.properties file by changing production.maxBatchSize; the default setting is 1000.
If you are using Cloudera Manager, change this property in the Cloudera Manager UI instead: StreamSets --> Configuration --> Max Batch Size (Running), as shown in the attached screenshot.
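
To check which limit is currently in effect, a small script can read the property from a copy of sdc.properties. The following is only a minimal sketch: the helper names (read_max_batch_size, effective_batch_size) and the sample file content are hypothetical and not part of SDC, and it assumes that a batch size requested above the limit is simply capped at the limit.

```python
def read_max_batch_size(properties_text, default=1000):
    """Return production.maxBatchSize from sdc.properties-style text.

    Falls back to the documented default of 1000 when the key is absent.
    If the key appears more than once, the last occurrence wins.
    """
    value = default
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, raw_value = line.partition("=")
        if sep and key.strip() == "production.maxBatchSize":
            value = int(raw_value.strip())
    return value


def effective_batch_size(requested, max_batch_size):
    """Assumption: a requested batch size above the limit is capped at it."""
    return min(requested, max_batch_size)


# Hypothetical excerpt of an sdc.properties file, for illustration only.
sample = """
# Maximum batch size for running pipelines
production.maxBatchSize=5000
"""

cap = read_max_batch_size(sample)
print("limit:", cap)
print("effective batch size:", effective_batch_size(20000, cap))
```

In practice you would pass in the content of your own sdc.properties file rather than the sample string. Raising the limit only permits larger batches; whether they help still depends on the heap size and the other pipelines described above.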