The StreamSets CLI is an excellent way to script pipeline execution or to provide statistics and metrics to other applications, such as enterprise monitoring dashboards. The Data Collector CLI is covered in the StreamSets documentation; search for "Command Line Interface".
Generally the CLI tool is easy to set up, but some users run into issues when getting started. In some cases you may receive “timeout” or “connection refused” errors.
When you are not using SCH, these errors are often caused by a wrong hostname or incorrect authentication information.
If you’re a Cloud SCH customer, the first thing to check is whether the machine running the CLI command can reach the SCH servers, so step 1 is to `ping cloud.streamsets.com`. If this fails, you may need to set up a proxy. Many environments already use a proxy so that StreamSets Data Collector can talk to SCH, in which case you can find the correct proxy configuration values in SDC_JAVA_OPTS.
The CLI uses a different environment variable, SDC_CLI_JAVA_OPTS. Just as with SDC_JAVA_OPTS, the SDC_CLI_JAVA_OPTS variable needs the following configuration options set to support a proxy:
-Dhttps.proxyHost=myproxyname -Dhttps.proxyPort=myport
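As a minimal sketch of how those options might be set in the shell that launches the CLI: the helper name `proxy_opts` is hypothetical, and `myproxyname` and `8080` are placeholders for your own proxy host and port. Any value already present in SDC_CLI_JAVA_OPTS is kept and the proxy flags are appended.

```shell
# Hypothetical helper: build the JVM proxy flags from a host ($1) and port ($2).
proxy_opts() {
  echo "-Dhttps.proxyHost=$1 -Dhttps.proxyPort=$2"
}

# myproxyname and 8080 are placeholders; substitute your proxy's values.
# ${VAR:+...} keeps any existing options and adds a separating space.
SDC_CLI_JAVA_OPTS="${SDC_CLI_JAVA_OPTS:+$SDC_CLI_JAVA_OPTS }$(proxy_opts myproxyname 8080)"
export SDC_CLI_JAVA_OPTS

# Show the value the CLI will see.
echo "$SDC_CLI_JAVA_OPTS"
```

The export must happen in the same shell session (or profile script) that later runs the CLI command, otherwise the variable will not be visible to it.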
After configuring the proxy, you may in some cases still see an error message “error”. Typically this is an authentication error: if you’re an SCH customer, you need to provide an SCH URL and SCH credentials to run a CLI command, using parameters such as -D <SCHUrl>, -a dpm, -u <SCHUsername> and -p <SCHPassword>.
A complete CLI command when using Cloud-based SCH would look like this:
/bin/streamsets cli -U http://localhost:18630 -D https://cloud.streamsets.com -a dpm -u admin@myOrg -p myPassword manager status -n TheSamplePipelinef91ff995-045c-40fd-a7d4-a017746e7efe
NB: in the -U URL, localhost may be replaced by the hostname of a remote machine whose Data Collector you want the CLI command to run against.
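When scripting this, it can help to assemble the command from its parts so the Data Collector URL, SCH URL, credentials and pipeline ID are easy to swap. The sketch below only prints the command for review rather than running it. The function name `build_status_cmd` and the `STREAMSETS_BIN` variable are hypothetical, and the default path is simply the one used in the example above; adjust it for your installation.

```shell
# Hypothetical variable: path to the streamsets launcher, defaulting to the
# path shown in the article's example. Adjust for your installation.
STREAMSETS_BIN="${STREAMSETS_BIN:-/bin/streamsets}"

# Hypothetical helper: print (not run) a pipeline status command.
# $1 = Data Collector URL   $2 = SCH URL       $3 = SCH username
# $4 = SCH password         $5 = pipeline ID
build_status_cmd() {
  echo "$STREAMSETS_BIN cli -U $1 -D $2 -a dpm -u $3 -p $4 manager status -n $5"
}

# Values taken from the example above; all are placeholders.
build_status_cmd http://localhost:18630 https://cloud.streamsets.com \
  admin@myOrg myPassword \
  TheSamplePipelinef91ff995-045c-40fd-a7d4-a017746e7efe
```

Printing first makes it easy to confirm the hostname and credentials, the two most common causes of the errors described earlier, before the command is actually executed.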