After you upgrade the MapR client libraries or apply a MapR client patch on an SDC node, pipelines that use Hadoop stages fail at startup with a "Provider could not be instantiated" error or a ClassNotFoundException. Both errors indicate that a required Hadoop class cannot be found on the Java classpath.
Due to licensing restrictions, StreamSets cannot distribute MapR libraries with Data Collector. As a result, you must perform additional steps to enable the Data Collector machine to connect to MapR.
When you complete the bin/streamsets setup-mapr prerequisite step, SDC creates symlinks in the streamsets-libs/streamsets-datacollector-mapr-version-lib/lib directory. These symlinks point to the MapR client libraries installed on the system.
A MapR client patch or an upgrade of the MapR client package on the SDC machine can rename some of the JAR files in the MapR client installation. The existing symlinks in the SDC MapR stage library directory still point to the old file names, so they become broken.
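To confirm that broken symlinks are the cause before re-running the setup step, you can list the dangling links in the MapR stage library directory. The following Python script is a minimal sketch and is not part of SDC. The script name check_mapr_links.py is hypothetical, and the sketch assumes a Linux or Unix SDC node and that you pass the stage library's lib directory as the only argument. A link counts as broken when the link itself exists but the file it points to does not.

```python
import os
import sys
from pathlib import Path


def find_broken_symlinks(lib_dir):
    """Return (link name, recorded target) for each dangling symlink in lib_dir.

    A symlink is "broken" when the link itself exists but the file it points
    to does not, for example because a MapR client patch renamed the JAR.
    """
    broken = []
    for entry in sorted(Path(lib_dir).iterdir()):
        # is_symlink() inspects the link itself; exists() follows the link,
        # so it returns False when the target file is gone.
        if entry.is_symlink() and not entry.exists():
            broken.append((entry.name, os.readlink(entry)))
    return broken


# Only run the report when a directory is passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    dangling = find_broken_symlinks(sys.argv[1])
    for name, target in dangling:
        print(f"BROKEN: {name} -> {target}")
    print(f"{len(dangling)} broken symlink(s) found")
    # A non-zero exit status signals that setup-mapr should be re-run.
    sys.exit(1 if dangling else 0)
```

For example, run python3 check_mapr_links.py <SDC_HOME>/streamsets-libs/streamsets-datacollector-mapr-version-lib/lib, where <SDC_HOME> stands for your Data Collector installation directory. Each BROKEN line shows the old JAR path that the link still references, which you can compare against the renamed files in the MapR client installation.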
The "Provider could not be instantiated" error indicates that some MapR JAR files are missing from the SDC classpath. Whenever you apply a MapR client patch or upgrade the MapR client version on the SDC node, re-run the bin/streamsets setup-mapr prerequisite step so that SDC updates all of the symlinks to point to the new MapR client JAR files.
After re-running the step, restart SDC so that it loads the updated JAR files onto its Java classpath. This resolves both the "Provider could not be instantiated" error and the ClassNotFoundException errors.