Good afternoon, I'm currently using StreamSets Data Collector 3.14 (the last open source version of SDC). I see that it is possible to connect SDC to a Control Hub, and there are many advantages to doing so. There is also documentation for installing an on-prem Control Hub from an archive: https://docs.streamsets.com/portal/controlhub/latest/onpremhelp/controlhub/UserGuide/Install/InstallingDPM.html#concept_exg_11p_hbb

Use the following command to extract the tarball:

tar xzvf streamsets-dpm-<version>.tar.gz

There was even a tutorial on how to build your own SCH on-prem from a GitHub repo; however, the repo is no longer available: http://github.com/streamsets/domainserver

Would it be possible to get access to the repository or to the archive again? Thank you.
Question: How are the SDC metrics collected and what do they represent?

Answer: The graphs on the SDC Metrics page in StreamSets Data Collector, and on the Metrics tab of the Execution Engine page in Control Hub, are system-level metrics collected from standard Java core libraries. They should roughly correspond with metrics reported by other system-level tools, such as top and uptime at the command line, as well as external monitoring tools that show system-level metrics. Note that the numbers reported by different tools won't match each other exactly, due to differences in reporting intervals and other factors, but there should be a correlation.
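For instance, the load averages printed by top and uptime can be read directly from the OS for a quick sanity check against the graphs; a minimal sketch in Python (Unix only):

```python
# Cross-check SDC's system-level graphs against what the OS itself reports.
# os.getloadavg() returns the same 1/5/15-minute load averages that
# `uptime` and `top` print at the command line.
import os

load1, load5, load15 = os.getloadavg()
print(f"load averages: {load1:.2f} (1m), {load5:.2f} (5m), {load15:.2f} (15m)")
```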
Hi, I have a pipeline with one runtime parameter, for a table name. I created a job on top of this pipeline, so the job also has that runtime parameter, and when creating the job I provided a default value.

I am trying to use the Python SDK to start the job. If I run it like below, the job runs with the default parameter (TEST_TABLE1) defined in the pipeline and job, which is expected:

job = sch.jobs.get(job_name="MY_SDC_JOB")
sch.start_job(job)

However, when I run it like this, intending to override the default runtime parameter, it still uses the default (TEST_TABLE1). How do I make it run with the given parameter (TEST_TABLE2)?

sch.start_job(job, TABLE_NAME='TEST_TABLE2')
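One pattern that is sometimes suggested, sketched below rather than a confirmed fix: set the parameters on the job object and push the update before starting. This assumes the SDK's Job exposes a runtime_parameters dict and that ControlHub has an update_job() method, as in recent releases of the streamsets SDK; the credential values are placeholders.

```python
# A hedged sketch: override the runtime parameter on the job object,
# persist the change, then start the job.
from streamsets.sdk import ControlHub

sch = ControlHub(credential_id='<CRED_ID>', token='<CRED_TOKEN>')  # placeholders
job = sch.jobs.get(job_name='MY_SDC_JOB')
job.runtime_parameters = {'TABLE_NAME': 'TEST_TABLE2'}  # replaces the default
sch.update_job(job)   # push the updated parameter value to Control Hub
sch.start_job(job)    # this run should now use TEST_TABLE2
```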
Hi, I am following the steps to generate an API credential, and I ran the content in the green frame directly; however, it returns the error below.

curl -X GET https://eu01.hub.streamsets.com/security/rest/v1/currentUser -H "Content-Type:application/json" -H "X-Requested-By:curl" -H "X-SS-REST-CALL:true" -H "X-SS-App-Component-Id: $CRED_ID" -H "X-SS-App-Auth-Token: $CRED_TOKEN" -i

HTTP/1.1 403 Forbidden
content-length: 19
content-type: text/plain
content-security-policy: object-src 'none';script-src 'self' https://cdn.cookielaw.org https://privacyportal.onetrust.com https://geolocation.onetrust.com https://app.intercom.io https://widget.intercom.io https://js.intercomcdn.com https://js.userflow.com https://cdn.userflow.com;style-src 'self' https://fonts.googleapis.com h
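For what it's worth, one common cause of a 403 on this call is that $CRED_ID and $CRED_TOKEN were never exported in the shell, so the auth headers go out empty. A minimal sketch of the same request in Python using the requests library, with the credentials set explicitly to rule that out (values are placeholders):

```python
# Replicate the documented curl call with explicit header values.
import requests

headers = {
    "Content-Type": "application/json",
    "X-Requested-By": "curl",
    "X-SS-REST-CALL": "true",
    "X-SS-App-Component-Id": "<CRED_ID>",    # paste the credential ID here
    "X-SS-App-Auth-Token": "<CRED_TOKEN>",   # paste the auth token here
}
resp = requests.get(
    "https://eu01.hub.streamsets.com/security/rest/v1/currentUser",
    headers=headers,
)
print(resp.status_code)
print(resp.text)
```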
I want to create a parameter that I can use across multiple pipelines. For example, I have multiple pipelines that are configured to run on an EMR cluster. Instead of providing the same EMR configuration values for all the pipelines, is there a way for me to create parameters globally that I can refer to in each of the relevant pipelines? Thanks!
Is there a way to create a pipeline that successfully calls StreamSets API endpoints without leveraging the Python SDK for StreamSets? I have tried the REST Service and HTTP Client origins, and even attempted (badly) a Jython Scripting origin, to no avail. I am fairly certain that authentication is the problem, since the API calls work when I run them manually through the SCH RESTful API section, and I receive the following error when attempting to use a pipeline stage:

com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (JSON String, Number, Array, Object or token 'null', 'true' or 'false') at [Source: REDACTED (`StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION` disabled); line: 1, column: 2]

My ask for a UI-only solution is because 1) it would allow less experienced admins to troubleshoot without immediately calling on me if problems arise, and 2) it would allow us to build sequences for business- or management-oriented users to
Organization 5c4e8df2-f1c4-11ee-9efd-39b2ca65500f already has 0 pipelines which is the maximum allowed. Please contact support : PIPELINE_STORE_07
Hi all, is there a StreamSets Postman collection available? If yes, would someone please share the download link? Thank you,
I am trying to build a pipeline by following the steps in the StreamSets docs (https://docs.streamsets.com/portal/platform-controlhub/controlhub/UserGuide/GettingStarted/Try.html#task_q3r_p2x_k4b), but I am getting an issue while writing to the local file system. The issue is in the Directory Template:

Directory Template: /HDFS_output/${YYYY()}-${MM()}-${DD()}-${hh()}-${every(5,mm())}
Error: HADOOPFS_41 - Base directory path could not be created

Can anyone help? Is there any configuration that needs to be in place before running StreamSets?
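One thing worth checking (a guess, not a confirmed diagnosis): HADOOPFS_41 reports that the base directory could not be created, which often comes down to the path not existing and the Data Collector process user not having permission to create it. A minimal pre-check in Python, assuming the base directory is /HDFS_output as in the template above, run as the user that runs Data Collector:

```python
# Verify the Directory Template's base path can be created and written.
import os

base_dir = "/HDFS_output"  # taken from the template in the post
try:
    os.makedirs(base_dir, exist_ok=True)  # raises PermissionError if not allowed
    print("writable:", os.access(base_dir, os.W_OK))
except PermissionError as err:
    print("cannot create base directory:", err)
```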
I was able to configure an EMR cluster from my Transformer pipeline and start the job, but the job does not finish. It fails with the error:

ERROR Client: Application diagnostics message: Shutdown hook called before final status was reported.

Any ideas? Thanks!

Regards,
Srinivasa Nanduru