SDC ships with a default heap size of 1g. This is generally adequate for developing simple pipelines with a few stages. To run more pipelines concurrently, or to use memory-intensive stages such as Field Pivoter or the JSON Parser, it is recommended to increase the default heap size.
Starting with SDC 3.15.0, there is a new option for setting the heap size of the SDC JVM. Although it is documented to some extent, it was intended for use only in the StreamSets cloud vendor marketplace offerings. For example, the StreamSets AWS Marketplace AMI dynamically adjusts the heap size at startup based on the memory capacity of the underlying EC2 node.
On Linux systems, SDC_HEAP_SIZE_PERCENTAGE can be enabled to size the heap as a percentage of the system memory. This option is supported only on Linux. On Mac OS the parameter has no effect, so SDC starts with the -Xmx and -Xms values specified in the SDC_JAVA_OPTS environment variable. SDC_JAVA_OPTS is usually configured in the user's shell profile or specified in sdc-env.sh or sdcd-env.sh.
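As an illustration of the explicit approach, the fragment below sets a fixed heap through SDC_JAVA_OPTS. It is a sketch rather than a recommended configuration: the 4g value is an arbitrary example, and appending the flags after any existing options assumes that the JVM honors the last -Xmx and -Xms it sees.

```shell
# Sketch for sdc-env.sh or sdcd-env.sh (or a shell profile).
# 4g is an illustrative value only; choose one that fits the host.
# The new flags are appended so they come last on the command line.
export SDC_JAVA_OPTS="${SDC_JAVA_OPTS:-} -Xmx4g -Xms4g"
```

Setting -Xmx and -Xms to the same value makes the JVM allocate the full heap at startup instead of growing it on demand.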
The percentage given in SDC_HEAP_SIZE_PERCENTAGE is applied to the available system memory, and the result is configured as SDC's heap. The parameter has limits: SDC checks whether a `ulimit` or cgroup setting applies and should be taken into account, enforces a minimum value of 512 KiB, and enforces a maximum of 50% of the free memory on the system.
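The arithmetic behind these limits can be sketched as a small clamp. This is not SDC's actual startup logic: `heap_kb_from_percent` is a hypothetical helper, its defaults simply mirror the floor and ceiling quoted above, and it ignores the `ulimit` and cgroup checks entirely.

```shell
# Hypothetical helper, not part of SDC.
# Usage: heap_kb_from_percent MEM_KB PERCENT [MIN_KB] [MAX_PERCENT]
#   MEM_KB       memory figure (in KiB) that the percentage is applied to
#   PERCENT      requested percentage
#   MIN_KB       floor for the result, default 512 (from the text above)
#   MAX_PERCENT  ceiling for the percentage, default 50 (from the text above)
heap_kb_from_percent() {
  mem_kb=$1
  percent=$2
  min_kb=${3:-512}
  max_percent=${4:-50}

  # Cap the requested percentage at the ceiling.
  if [ "$percent" -gt "$max_percent" ]; then
    percent=$max_percent
  fi

  # Integer arithmetic; the result is rounded down.
  heap_kb=$(( mem_kb * percent / 100 ))

  # Enforce the floor.
  if [ "$heap_kb" -lt "$min_kb" ]; then
    heap_kb=$min_kb
  fi

  echo "$heap_kb"
}

# 8 GiB host (8388608 KiB) at 25 percent
heap_kb_from_percent 8388608 25   # prints 2097152 (2 GiB)
# A request of 80 percent is capped at 50 percent
heap_kb_from_percent 8388608 80   # prints 4194304 (4 GiB)
```

On Linux, a memory figure in KiB can be read with `awk '/^MemTotal:/ {print $2}' /proc/meminfo`, or from the `MemAvailable:` line if the available rather than the total memory is wanted.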
There are a few details to consider when setting this parameter.
SDC uses a significant amount of memory that is not heap memory: memory for the code itself, off-heap memory for IO buffers and metadata, and memory reserved for thread state and thread stacks. None of that memory is taken into account, because this parameter sets only the heap size.
The rest of the system needs memory as well, so it is a good idea to reserve some for Linux itself, some for the Linux buffer pool, and some for other transient applications such as shells, tar, and backup utilities. With the SDC_HEAP_SIZE_PERCENTAGE limit of 50% of system memory, it is unlikely that the Linux Out of Memory Killer will be invoked, except on systems with a very small memory configuration.
On a machine with a small amount of memory, SDC_HEAP_SIZE_PERCENTAGE should be reduced accordingly. The settings can be verified by looking for the last -Xmx and -Xms values on SDC's command line: `ps -ef | grep sdc | grep --color Xmx`. After configuring SDC_HEAP_SIZE_PERCENTAGE and starting SDC, the easiest way to see SDC's total memory footprint on the system is to run `top` and check the memory column for the SDC process.
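When the command line carries several -Xmx or -Xms flags, picking out the last one by eye is error-prone. The sketch below does it with a small helper; `last_jvm_flag` and the `SDC_PID` variable are hypothetical names introduced for illustration, and the sample command line is made up.

```shell
# Hypothetical helper, not part of SDC.
# Usage: last_jvm_flag PREFIX "COMMAND LINE"
# Prints the last space-separated argument that starts with PREFIX.
last_jvm_flag() {
  printf '%s\n' "$2" | tr ' ' '\n' | grep -- "^$1" | tail -n 1
}

# Made-up command line with a default heap followed by an override.
sample="java -Xmx1024m -Xms1024m -server -Xmx4g -Xms4g"
last_jvm_flag -Xmx "$sample"   # prints -Xmx4g
last_jvm_flag -Xms "$sample"   # prints -Xms4g

# Against a running process, with SDC_PID set to the SDC process ID:
#   last_jvm_flag -Xmx "$(ps -o args= -p "$SDC_PID")"
# Resident memory of that process in KiB, as an alternative to top:
#   ps -o rss= -p "$SDC_PID"
```

The resident figure reported by `ps` or `top` will normally be larger than the configured heap once SDC has been running for a while, since it also includes the non-heap memory described earlier.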