Hi,

I'm working with StreamSets and have three pipelines that each consume from different Kafka topics. Each pipeline writes to its own dedicated table (e.g., table1, table2, table3), but all three also write to a shared table. I'm using upsert logic (update or insert) to avoid duplicates. My concern is about concurrent writes: if all three pipelines try to write to the same row of the shared table at the same time, will StreamSets or the database handle it safely? I'm also wondering how to properly configure retry attempts, error handling, and backpressure in StreamSets so that no data is lost or skipped if database locks or contention occur. What are the best practices for configuring JDBC and pipeline error handling for this kind of multi-pipeline write scenario?
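For reference, the safety of concurrent writes to the shared table ultimately comes down to the statement the JDBC stage issues. Below is a minimal sketch outside StreamSets, assuming a PostgreSQL shared table with a primary key on id (the table name, columns, and connection details are hypothetical; other databases have MERGE equivalents). It shows an atomic upsert plus a simple retry on deadlock/serialization errors, which is the kind of behavior the question is asking the JDBC and retry settings to provide:

```python
import time
import psycopg2
from psycopg2.extensions import TransactionRollbackError

# Hypothetical connection details; adjust for your environment.
conn = psycopg2.connect(host="db-host", dbname="demo", user="etl", password="secret")

UPSERT = """
    INSERT INTO shared_table (id, payload, updated_at)
    VALUES (%s, %s, now())
    ON CONFLICT (id) DO UPDATE
    SET payload = EXCLUDED.payload,
        updated_at = EXCLUDED.updated_at;
"""

def upsert_with_retry(record_id, payload, attempts=5):
    """Retry the upsert if the database reports a deadlock or serialization failure."""
    for attempt in range(1, attempts + 1):
        try:
            with conn, conn.cursor() as cur:   # commits on success, rolls back on error
                cur.execute(UPSERT, (record_id, payload))
            return
        except TransactionRollbackError:
            if attempt == attempts:
                raise
            time.sleep(0.5 * attempt)          # simple backoff before retrying

upsert_with_retry(42, '{"source": "pipeline-1"}')
```

Because the upsert is a single atomic statement, the database serializes the three writers on that row; the retry loop mirrors what stage-level retry settings would need to cover when contention occurs.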
We have an HTTP Client origin that pulls from our ServiceNow site the records that would be returned if we ran a report in ServiceNow directly. At first blush the records pulled appear to all be accounted for, but unfortunately the origin just keeps looping and pulling every record over and over. I reduced the batch size to as low as 100 records without effect (i.e., it still loops endlessly). The URL with parameters is:

https://<intentionally_redacted>.servicenowservices.com/api/now/table/task?sysparm_query=closed_atISNOTEMPTY%5Eclosed_at%3E%3Djavascript%3Ags.dateGenerate('2023-01-01'%2C'00%3A00%3A00')%5Eclosed_by.department.nameSTARTSWITHPBO%20-%20ETS%5Eassignment_group!%3Daeb1d3bc3772310057c29da543990ea2%5Eassignment_group!%3D4660e3fc3772310057c29da543990e0b%5EnumberNOT%20LIKEGAPRV%5EnumberNOT%20LIKERCC%5Esysparm_display_value=true%5Esysparm_limit=100%5Esysparm_offset=0

I have the stage set to pull in Batch mode, and for pagination I have tried all 5 modes, including “None”. Since
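For context, offset-based pagination against the ServiceNow Table API normally means incrementing sysparm_offset by sysparm_limit until a page comes back with fewer rows than the limit. A rough sketch of that loop outside StreamSets (the instance URL, credentials, and trimmed-down query are placeholders), just to illustrate the stop condition the origin needs to reach instead of looping forever:

```python
import requests

BASE = "https://example.servicenowservices.com/api/now/table/task"  # placeholder instance
LIMIT = 100
params = {
    "sysparm_query": "closed_atISNOTEMPTY",  # trimmed-down version of the real query
    "sysparm_display_value": "true",
    "sysparm_limit": LIMIT,
    "sysparm_offset": 0,
}

records = []
while True:
    resp = requests.get(BASE, params=params, auth=("user", "password"), timeout=30)
    resp.raise_for_status()
    page = resp.json().get("result", [])     # the Table API wraps rows in "result"
    records.extend(page)
    if len(page) < LIMIT:                    # last page reached: stop paging
        break
    params["sysparm_offset"] += LIMIT        # advance the offset for the next page

print(f"pulled {len(records)} records")
```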
Environment: StreamSets Data Collector v4.3, Control Hub v3.51.4 (inactivity period for session termination = 30 min).

Issue: It has been observed that a credentials timeout occurs on long-running streaming pipelines when using the Start Jobs stage. This can happen when using the User & Password authentication type. The following error, captured from the Control Hub logs, indicates that the token has expired:

START_JOB_04 - Failed to start job template for job ID: 1238759hvbv949:Test, status code '401': {"ISSUES":[{"code":"SSO_01","message":"User not authenticated"}]}

Solution: After an internal investigation, we found that this is COLLECTOR-1125, which has been addressed and resolved in Data Collector v5.1.
Is it possible to query the SCH for the longest running jobs? For example, say you have 100 jobs, and you want to know the 10 longest running jobs out of the total (I’d happily take just reporting on averages if that is all there is). It seems like this would be a useful report or API endpoint to have.
I’m following the DataOps Platform Fundamentals course, and the Build a Pipeline chapter has you enter “/zomato” as the Files Directory in the configuration of a Directory origin. However, when validating the pipeline I get the error: SPOOLDIR_12 - Directory ‘/zomato’ does not exist: conf.spoolDir. Any solution for this?
Hello! Currently, I’m using the SDK to extract metrics from some jobs. However, instead of having only {run_count, input_count, output_count, total_error_count} from sch.job.metrics, I would like to have some other metrics as well, such as a record count for each stage of my job. Is this possible? How can I achieve this? I’m also looking to get the last received record (for each pipeline) that we can see in the real-time summary. How can I get this metric from Control Hub using the SDK?
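For reference, a minimal sketch of the call being described, assuming the StreamSets Platform SDK for Python and API credentials (the credential values and job name are placeholders; older 3.x SDKs authenticate with server URL, username, and password instead). The per-stage counters and the "last received record" value are not part of this summary object, which is the gap the question is about:

```python
from streamsets.sdk import ControlHub

# Placeholder API credentials generated in Control Hub.
sch = ControlHub(credential_id='MY_CRED_ID', token='MY_AUTH_TOKEN')

# Hypothetical job name.
job = sch.jobs.get(job_name='My Kafka to JDBC job')

# Summary metrics mentioned in the question: only the aggregate counters
# (run_count, input_count, output_count, total_error_count) are exposed here.
print(job.metrics)
```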
Hi, I am creating a new Connection with the SFTP protocol and trying to connect to the SFTP server using a private key. I entered the following while creating the Connection: Authentication as Private Key, Private Key Provider as Plain Text, Private Key as the text that I copied from the ppk file, and the correct Username; there was no passphrase. When I hit Test Connection I get the error below:

Stage 'com_streamsets_pipeline_lib_remote_RemoteConnectionVerifier_01' initialization error: java.lang.NullPointerException: Cannot invoke "net.schmizz.sshj.userauth.keyprovider.KeyProvider.getPublic()" because "this.kProv" is null (CONTAINER_0701)

Alternatively, if I enter the same values in the Credentials tab of the pipeline and preview, I am able to successfully connect to the SFTP server and read the file there. The problem only occurs when I create the same via a Connection. Kindly help me resolve this issue.
I am attempting to use Postman to query the runtime history of all jobs so that I can provide a list of the X slowest-running jobs. I have been unable to configure Postman correctly for this GET call, and would like to know from the hivemind what I am doing wrong here. I am trying to use this API endpoint:

https://{Control_Hub_URL}.streamsets.com/jobrunner/rest/v1/job/{jobId}/history

Auth is set to “No auth” based upon another forum post I read that stated we should include the ID and key for the API credentials in the header, though I have also tried it with API Key and providing the credentials there. I have read through the 14 questions and 4 conversations that are returned when searching here for “Postman” and tried a few of the proffered solutions, but without success. I have no parameters, and the headers are where I have been trying to define the API credentials. Note that these credentials are used by automation in Control Hub, so they are valid. I think my issue lies in what header names I ne
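For reference, here is a rough equivalent of the call being attempted, written with Python requests instead of Postman. It assumes the API credential ID and auth token are passed as the X-SS-App-Component-Id and X-SS-App-Auth-Token headers (my understanding of how Platform API credentials are supplied; confirm against the Control Hub REST API documentation), and the Control Hub URL and job ID are placeholders:

```python
import requests

CONTROL_HUB = "https://na01.hub.streamsets.com"          # placeholder Control Hub URL
JOB_ID = "1234abcd-0000-0000-0000-000000000000:myorg"    # placeholder job ID

headers = {
    # API credential ID and auth token generated in Control Hub.
    "X-SS-App-Component-Id": "MY_CRED_ID",
    "X-SS-App-Auth-Token": "MY_AUTH_TOKEN",
    "X-Requested-By": "curl",          # some Control Hub endpoints expect this header
    "Content-Type": "application/json",
}

resp = requests.get(f"{CONTROL_HUB}/jobrunner/rest/v1/job/{JOB_ID}/history",
                    headers=headers, timeout=30)
resp.raise_for_status()
print(resp.json())
```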
I am using Data Collector 5.8.1. I selected JDBC in my pipeline start event, and I am using this query in the start event:

DECLARE @XYZWorkFlowID INT = (SELECT TOP 1 WorkFlowID FROM dbo.RefWorkFlow WHERE WorkFlowName = 'XYZ')
INSERT INTO JobTracker (JobName, WorkFlowID, Campaign, LastExecutedTime, JobStatus)
VALUES ('XYZ_ABC_Export', @XYZWorkFlowID, 'Dialer', GETDATE(), 'InProgress')

I need to get the identity value generated by the JobTracker table and assign it to a 'JobTrackerId' pipeline parameter, and then use the same parameter in the stop event JDBC to update the JobTracker table. I tried adding:

${JobTrackerId} = SELECT @@Identity

I'm pretty sure this is the wrong approach. Can anyone guide me through the right one?
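For reference on the SQL side (independent of how the value gets back into a pipeline parameter), the usual T-SQL pattern for capturing a generated identity is an OUTPUT INSERTED.<column> clause or SCOPE_IDENTITY(). A minimal sketch with pyodbc, assuming the identity column is named JobTrackerId and the connection string is a placeholder:

```python
import pyodbc

# Placeholder connection string.
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;UID=etl;PWD=secret")
cur = conn.cursor()

sql = """
SET NOCOUNT ON;
DECLARE @XYZWorkFlowID INT = (SELECT TOP 1 WorkFlowID FROM dbo.RefWorkFlow WHERE WorkFlowName = 'XYZ');
INSERT INTO JobTracker (JobName, WorkFlowID, Campaign, LastExecutedTime, JobStatus)
OUTPUT INSERTED.JobTrackerId              -- assumed name of the identity column
VALUES ('XYZ_ABC_Export', @XYZWorkFlowID, 'Dialer', GETDATE(), 'InProgress');
"""

# The OUTPUT clause returns the new identity value as a one-row result set.
job_tracker_id = cur.execute(sql).fetchone()[0]
conn.commit()
print(job_tracker_id)
```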
Docs and at least one community thread have suggested that the following expression should work in a stream selector condition:

${record:exists('/ids') && length(record:value('/ids')) > 0}

This returns:

ELException: No function is mapped to the name "length"

The field in question is a list of integers. What am I not understanding? Thanks
I have the resource URL “https://api.simpleng.com.au/v2/ActionSheets/SummaryForSite/3?data=bma&changesAfter=2024-11-01T00:00:00&pageNumber=1” and a response like:

{ "page": 1, "ofPages": 3, "content": [ ] }

I want to get all records from pages 1, 2, and 3. Despite using pagination I am not able to get the records from all the pages; I only get records from the first page. I am using pagination by page number with the ${StartAt} variable. Refer to the screenshots below.
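For context, page-number pagination against an API shaped like this response usually means requesting pageNumber=1, reading ofPages from the body, and then walking the remaining pages. A rough sketch outside StreamSets (the URL is the one from the question; any required auth headers are omitted):

```python
import requests

BASE = "https://api.simpleng.com.au/v2/ActionSheets/SummaryForSite/3"
params = {"data": "bma", "changesAfter": "2024-11-01T00:00:00", "pageNumber": 1}

records = []
while True:
    resp = requests.get(BASE, params=params, timeout=30)   # add auth as required
    resp.raise_for_status()
    body = resp.json()
    records.extend(body.get("content", []))
    if body["page"] >= body["ofPages"]:       # last page reached
        break
    params["pageNumber"] = body["page"] + 1   # move to the next page

print(f"collected {len(records)} records across {params['pageNumber']} pages")
```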
While I was working with a table that had no primary key, and the total number of records was huge, I thought of a basic SQL concept: to uniquely identify a row, we sometimes use multiple columns. To apply the same idea in incremental mode in the JDBC Query Consumer, I thought of using multiple columns as the offset, but I have gotten nowhere. Does anybody have any thoughts on this? Thank you.
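For illustration, the "multiple columns as an offset" idea usually shows up in SQL as a composite (keyset) comparison: the WHERE clause filters and orders on the column pair, and the last-seen pair plays the role of the single offset value. A small sketch of what that predicate and offset bookkeeping look like, with hypothetical column names col_a and col_b (parameter placeholder syntax varies by driver):

```python
# Hypothetical composite-offset query: col_a and col_b together identify a row.
QUERY = """
SELECT col_a, col_b, payload
FROM my_table
WHERE col_a > :last_a
   OR (col_a = :last_a AND col_b > :last_b)
ORDER BY col_a, col_b
"""

def next_offset(rows, previous):
    """After processing a batch, remember the composite offset of the last row."""
    if not rows:
        return previous
    last = rows[-1]
    return {"last_a": last["col_a"], "last_b": last["col_b"]}
```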
Hello everybody. I'm doing an exercise on StreamSets, and when creating a pipeline it throws this error:

VALIDATION_0096 - This pipeline was created in version 6.1.0 and is not compatible with current version 6.0.0.

How do I fix it? I'm using the 30-day trial version of StreamSets. Thank you very much.