
FAQ - Ideal batch size for performance


Drew Kreiger
  • Senior Community Builder at StreamSets

Question:

What is the ideal batch size which one can set for better performance?

 

Answer:

One important thing to remember about batch size is that each batch is processed in memory. This means that the heap size (memory) allocated to SDC (StreamSets Data Collector) on the machine, and the number of other pipelines (each with its own memory consumption) running at the same time, are important factors to weigh when changing your batch size.
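To show why heap size and the number of concurrent pipelines matter together, here is a rough back-of-the-envelope check. It is not an official StreamSets formula and does not tell you the ideal size. It assumes that each running pipeline holds one full batch in memory at a time and that every record takes about the same number of kilobytes. The function names and the `headroom` fraction are hypothetical.

```python
def estimate_batch_memory_mb(batch_size: int, avg_record_kb: float,
                             concurrent_pipelines: int) -> float:
    """Rough estimate of heap held by in-flight batches, in MB.

    Assumption (not SDC internals): each running pipeline holds one full
    batch in memory, and each record uses about avg_record_kb of heap.
    """
    return batch_size * avg_record_kb * concurrent_pipelines / 1024


def fits_in_heap(batch_size: int, avg_record_kb: float,
                 concurrent_pipelines: int, heap_mb: float,
                 headroom: float = 0.5) -> bool:
    """Return True if the estimate stays within a fraction of the heap.

    headroom is a hypothetical safety factor: only this share of the heap
    is budgeted for batch data, leaving the rest for everything else.
    """
    estimate = estimate_batch_memory_mb(batch_size, avg_record_kb,
                                        concurrent_pipelines)
    return estimate <= heap_mb * headroom
```

For example, 1,000 records of about 10 KB each across 4 pipelines works out to roughly 39 MB, while 100,000 records under the same assumptions comes to about 3,906 MB, far more than half of a 1,024 MB heap.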

Unfortunately, we don't have a set batch size or formula, because the ideal size depends on a variety of factors.

You can cap the batch size allowed for pipelines in the sdc.properties file by changing production.maxBatchSize; the default setting is 1000.
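As a minimal sketch of how this cap works, the snippet below reads production.maxBatchSize from the text of an sdc.properties file and falls back to 1000 when the property is absent. It also assumes that when a stage asks for a larger batch, the cap simply wins (a plain `min`). The parser handles only simple `key=value` lines and comments, not every Java properties feature such as line continuations or `:` separators. The function names are hypothetical.

```python
DEFAULT_MAX_BATCH_SIZE = 1000  # default for production.maxBatchSize


def read_max_batch_size(properties_text: str) -> int:
    """Return production.maxBatchSize from sdc.properties text, or the default.

    Only simple "key=value" lines are parsed; lines starting with # or !
    are treated as comments.
    """
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "production.maxBatchSize":
            return int(value.strip())
    return DEFAULT_MAX_BATCH_SIZE


def effective_batch_size(requested: int, max_batch_size: int) -> int:
    """Batch size actually used, assuming the cap overrides larger requests."""
    return min(requested, max_batch_size)
```

With `production.maxBatchSize=5000` in the file, a stage configured for 10,000 records would, under this assumption, get batches of at most 5,000.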

If you are using Cloudera Manager, change this property in the Cloudera Manager UI under StreamSets --> Configuration --> Max Batch Size (Running), as shown in the attached screenshot.

 

Daniel Amador

August 11, 2021 00:03