If you run into an exception similar to the one below, it means that Data Collector allocated all available private classloaders.
You can estimate how many private classloaders an SDC instance needs by counting the Hadoop stages used across all pipelines that run simultaneously (stopping a pipeline should release its private classloaders).
java.lang.RuntimeException: Could not get a private ClassLoader for
'streamsets-datacollector-cdh_5_7-lib', for stage
'com_streamsets_pipeline_stage_destination_hdfs_HdfsDTarget',
active private ClassLoaders='50': java.util.NoSuchElementException: Pool exhausted
If your running pipelines use more than 50 Hadoop stages, you need to increase max.stage.private.classloaders in the sdc.properties file; it is set to 50 by default.
From the sdc.properties file:
#Maximum number of private classloaders to allow in the data collector.
#Stages that have configuration singletons (i.e. Hadoop FS & HBase) require private classloaders
max.stage.private.classloaders=50
For example, if 100 pipelines within one SDC are running and writing to an HDFS destination, you have to set this value to at least 100. If a single pipeline contains an HDFS destination, a Hive Metadata processor, and a Hive Metastore destination, that one pipeline alone needs 3 private classloaders.
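As a rough sizing sketch combining those two examples (the figures are illustrative, not a recommendation): 100 simultaneously running pipelines that each use those 3 Hadoop stages would need 100 x 3 = 300 private classloaders, so you could set:
max.stage.private.classloaders=300
Changes to sdc.properties take effect only after Data Collector is restarted.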
Other stages that use a private classloader include (this is not an exhaustive list): MapR DB origin, MapR DB target, HDFS origin, HDFS metadata executor, Hadoop FS target, BigTable target, Hive Metadata processor, Hive Query executor, Hive target, Hive Metastore target, Spark processor, Amazon S3 target, Kudu Lookup processor, Kudu target, MapReduce executor, HBase target, and HBase Lookup processor.
If you installed SDC with Cloudera Manager, you can set the value in the Cloudera Manager UI under StreamSets configuration > "Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties".
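For example, you could paste a line like the following into that safety valve field (300 is an illustrative value sized for the sketch above):
max.stage.private.classloaders=300
Cloudera Manager applies safety valve entries by overriding the corresponding properties in sdc.properties, and the Data Collector role typically needs to be restarted for the change to take effect.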
Leaving max.stage.private.classloaders unlimited, or set to a value much higher than you need, is not recommended. Each private classloader consumes memory, so an oversized limit drives up memory consumption and puts pressure on JVM heap management and garbage collection, which in turn hurts performance. These problems can become very difficult to trace when you hit an issue with a pipeline.
What if my running pipelines do not use more than 50 Hadoop stages?
If you run only a few pipelines using Hadoop stages simultaneously in one Data Collector and the total is not higher than 50, yet you still see this exception, please contact our support team, as Data Collector might not be releasing the private classloaders properly.