Community Articles and Got a Question?

Can't find what you're looking for? Ask it here or check out the Community articles.

997 Topics
2,280 Replies

Meghana Chinnu (Fan)

asked in Community Articles and Got a Question?

HTTP Client missing output records

The HTTP Client in my pipeline is not processing all of the input records it receives. For example, the HTTP Client receives 1430 input records but outputs only 1360, with 0 error records. I'm not sure whether I am missing a configuration that would make the input and output record counts match and send the missing records to the error stage.

5 days ago

Karan Bhatia (Fan)

asked in Community Articles and Got a Question?

API request using multiple values for a query parameter in one job

Hello there, I am trying to solve a very specific use case. I am trying to query a DB through an API that takes multiple query parameters. One of the query parameters is an ID, and there are more than 100 IDs. The catch is that I can't pass all 100 IDs as an array in one API call, because the API is not designed to accept an array for that parameter. Instead, the process has to loop over the IDs one by one, calling the API with the next ID as the parameter value in each iteration. These IDs also need to be fetched from a Snowflake table and then passed as the parameter in the API call, so I am thinking of using a Snowflake or JDBC Query Consumer as the origin. The number of IDs will grow over time, so I'd like to make this as dynamic as possible, but that is not a priority for now. Using one job per ID would mean 100+ jobs, a number that would keep increasing, which is not good practice at all. Could someone please suggest the best possi

5 days ago
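For the question above about calling an API once per ID, the usual alternative to one job per ID is to let each record carry one ID and issue one request per record (in Data Collector, the HTTP Client processor's resource URL can be defined with an expression that reads a record field). The Python sketch below only illustrates that fan-out logic; the endpoint `https://api.example.com/items` and the parameter names `id` and `status` are hypothetical placeholders, not the poster's API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.com/items"


def build_request_urls(ids, fixed_params):
    """Return one request URL per ID, each carrying a single `id` value.

    `fixed_params` holds the query parameters that are the same for every
    call; only the hypothetical `id` parameter changes per iteration.
    """
    urls = []
    for record_id in ids:
        params = dict(fixed_params)  # copy so the shared dict is not mutated
        params["id"] = record_id
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


# In the pipeline, the IDs would come from the Snowflake or JDBC origin, e.g.:
# build_request_urls([101, 102], {"status": "active"})
```

Because the list of IDs is read at run time, new IDs added to the table are picked up without creating new jobs.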

ricsamma (Fan)

asked in Community Articles and Got a Question?

Kafka Connection error Http failure response for https TUNNELING_INSTANCE_ID=tunneling-1: 500 OK

Dear all, I have configured local Docker images for StreamSets and Kafka to try a simple Kafka connection, but I'm receiving the following: Http failure response for https://na01.hub.streamsets.com/tunneling/rest/660c2f92-c396-4322-a9ea-cd73758897a1/rest/v1/pipeline/dynamicPreview?TUNNELING_INSTANCE_ID=tunneling-1: 500 OK

The Kafka containers are:

    bash-3.2$ docker-compose ps
    NAME       IMAGE                    COMMAND                  SERVICE     CREATED          STATUS          PORTS
    kafka      wurstmeister/kafka       "start-kafka.sh"         kafka       27 minutes ago   Up 27 minutes   0.0.0.0:9092->9092/tcp
    zookeeper  wurstmeister/zookeeper   "/bin/sh -c '/usr/sb…"   zookeeper   27 minutes ago   Up 27 minutes   22/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp

When I run Test Connection with kafka:9092 or localhost:9092, I receive the Http failure response for https://na01.hub.streamsets.com/tunneling

13 days ago

jangala.karthik (Fan)

asked in Community Articles and Got a Question?

Snowflake runnable process batch failed: com.streamsets.pipeline.api.StageException: SNOWFLAKE_00 - Could not perform SQL operation: java.sql.SQLException: Error when adding primary keys:

I have a simple pipeline that reads multiple Oracle tables and writes to a Snowflake destination. This error only happens when the destination table does not exist yet, even though the pipeline does create the table with the primary key. After the pipeline has run, I can truncate the table and rerun the pipeline without any issues. Any ideas?

1 month ago

ino (Fan)

asked in Community Articles and Got a Question?

Java throws an error when I use the JavaScript Evaluator in SDC

Hello! I'm currently trying out the free (open source) version of SDC. When I created an empty pipeline, added a Cron Scheduler and a JavaScript Evaluator, and tried to run it, I encountered the following error:
Pipeline Status: STARTING_ERROR: java.lang.NoClassDefFoundError: Could not initialize class com.streamsets.pipeline.stage.processor.javascript.Java8JavaScriptObjectFactory
I was under the impression that the JavaScript Evaluator did not require any additional package installations. Is there an additional step that needs to be taken? I need some help. Thank you.

1 month ago

ravvichaudhary (Fan)

asked in Community Articles and Got a Question?

HTTP Client Processor Not Retrying Despite Configuration in StreamSets Data Collector

Hello, I'm currently using the HTTP Client Processor in StreamSets Data Collector and I've encountered an issue with the retry mechanism. Despite configuring the processor to retry on receiving certain HTTP status codes, it doesn't seem to be doing so.
Specifically, I've set the processor to retry immediately when it receives a 409 status code, with a maximum of 2 retries. However, when the processor receives a 409 status code, it doesn't retry the request and instead gives the following error: "HTTP_101: Applying passthrough and error policy on status configuration".
I've checked the pipeline configuration, and I'm not sure why the processor isn't retrying as expected. Has anyone else encountered this issue? Any insights or suggestions would be greatly appreciated. Thank you in advance for your help.

1 month ago

dhanraj_shinde (Fan)

asked in Community Articles and Got a Question?

Tarball engine installation

Hi team, while installing the Data Collector tarball engine with the install script from the DataOps Platform, the script asks for the download and installation directories at run time. Is there a way to avoid entering them during execution, either by passing them to the install script or by using the current directory?

1 month ago

Clément Vi (Fan)

asked in Community Articles and Got a Question?

Kafka Consumer offset from the beginning on every batch

Hello, I'm currently working on a simple pipeline that ingests Kafka messages into a log file. I'm trying to consume all the data from the beginning of a topic, but I'm only getting newer data added to the topic. Once consumed, the earlier messages in the topic are not accessible anymore. I've already tested all the different "Auto Offset Reset" options, with the same result for both the single-topic and multitopic consumers.
The official documentation (docs.streamsets.com) lists these properties: auto.commit.interval.ms, bootstrap.servers, enable.auto.commit, group.id, max.poll.records. If I understand correctly, all of these properties are locked, so I can't disable offset management and process all the data from the beginning of a topic.
Is there an additional Kafka configuration property to use, or do I need to configure the topic directly via the Kafka CLI?
StreamSets Data Collector version: 3.14.0
Kafka Consumer version: 2.0.0
Regards.

1 month ago

andalv (Fan)

asked in Community Articles and Got a Question?

How to retain the original file name when writing to S3?

I read from an SFTP server and when writing to S3 my files get renamed. How can I retain the original file name?

1 month ago

shankaravi (Fan)

asked in Community Articles and Got a Question?

Update Job using REST API

I have a requirement to update a job parameter via the REST API. I get the job details with a GET call and modify the content in code, but I am not sure how to pass the modified JSON content (in a variable or a file) in the REST API POST call. I need to run the API call as an automated batch with no manual intervention. I tried the two ways below, but neither worked, and I don't see any other viable option for passing the modified content in the curl command.
var1=`cat /home/script/modified.json`
curl -X POST https://XXX.hub.streamsets.com/jobrunner/rest/v1/job/72b8200c-0e3b-426e-b564-bceb17220b1e:8c2c652f-e3d9-11eb-9fb3-b974ac4c3f67 -d '{ @var1}’ -H "Content-Type:application/json" -H "X-Requested-By:curl" -H "X-SS-REST-CALL:true" -H "X-SS-App-Component-Id: $CRED_ID" -H "X-SS-App-Auth-Token: $CRED_TOKEN" -i
curl -X POST https://XXX.hub.streamsets.com/jobrunner/rest/v1/job/72b8200c-0e3b-426e-b564-bceb17220b1e:8c2c652f-e3d9-11eb-9fb3-b974ac4c3f67 -d '{ /home/script/modified.json}’ -H "Content-Type:application/json" -H "X-R

1 month ago
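In the "Update Job using REST API" attempts above, curl receives the literal text `{ @var1}` or `{ /home/script/modified.json}` as the body: the shell does not substitute anything inside single quotes, and curl only reads request data from a file when the `-d` argument itself begins with `@` (for example `-d @/home/script/modified.json`, or `--data-binary @/home/script/modified.json` to send the file byte for byte). For a batch script, the same call can also be made from Python. The sketch below is a minimal standard-library version that reuses the URL pattern and headers from the question; whether this endpoint accepts the edited job JSON as-is is an assumption to verify, and the URL in the usage comment is a placeholder.

```python
import json
import urllib.request


def build_job_update_request(url, json_path, cred_id, cred_token):
    """Build a POST request whose body is the JSON stored in `json_path`.

    Headers mirror the curl command in the question. The file is parsed
    first, so an invalid edit fails here instead of at the server.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        payload = json.load(f)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Requested-By": "curl",  # same value as in the question
            "X-SS-REST-CALL": "true",
            "X-SS-App-Component-Id": cred_id,
            "X-SS-App-Auth-Token": cred_token,
        },
    )


# Usage (placeholders; credentials read from the same CRED_ID/CRED_TOKEN
# environment variables the curl command uses):
# req = build_job_update_request(
#     "https://XXX.hub.streamsets.com/jobrunner/rest/v1/job/<job-id>",
#     "/home/script/modified.json",
#     os.environ["CRED_ID"], os.environ["CRED_TOKEN"])
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```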

yogesh0590

asked in Community Articles and Got a Question?

Update “globalMaxRetries” job config using REST API Call

Hello team, I need to update one job configuration, "globalMaxRetries", using a REST call. I referred to the RESTful API section in Control Hub, but the "updateJob" REST call requires many other configurations to be included in the request body as well. Could you please assist me with this?

1 month ago
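For the "globalMaxRetries" question above, one workable pattern, assuming the update call expects the full job definition, is read-modify-write: fetch the job JSON with a GET, change only the one field, and send the whole object back so every other required configuration is supplied unchanged. The Python sketch below covers only the modify step. It assumes `globalMaxRetries` is a top-level key in the job JSON returned by the GET, which should be checked against the actual response from your Control Hub.

```python
import copy
import json


def set_global_max_retries(job, retries):
    """Return a copy of a job definition with only globalMaxRetries changed.

    Assumes `globalMaxRetries` is a top-level key of the job JSON. All other
    fields are carried over unchanged for the update request body.
    """
    if "globalMaxRetries" not in job:
        raise KeyError("globalMaxRetries not found at the top level of the job JSON")
    if not isinstance(retries, int):
        raise TypeError("retries must be an integer")
    updated = copy.deepcopy(job)  # leave the fetched job untouched
    updated["globalMaxRetries"] = retries
    return updated


def update_payload(job_json_text, retries):
    """Turn the GET response body into the JSON text for the update call."""
    return json.dumps(set_global_max_retries(json.loads(job_json_text), retries))
```

Failing loudly when the key is missing avoids silently posting a job definition that the server might interpret differently.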

shivraj (Roadie)

asked in Community Articles and Got a Question?

Releasing the VM instance port used by a pipeline with an HTTP Server origin

I have developed a pipeline that uses an HTTP Server origin. In the origin configuration, I have designated port 8000 for writing data to the pipeline. However, the pipeline does not release this port. I want to use the same port for multiple pipelines, distinguishing them by their respective Application ID. After I run the first pipeline, the port remains occupied and is not released, and when I try to run a second pipeline with the same origin, I get the error message "Port is not free." If anyone has a solution, please help me out with this.

1 month ago
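For the HTTP Server port question above, a quick way to confirm whether port 8000 is still held after the first pipeline stops is to try binding it on the engine host. The Python sketch below performs only that check; it does not release the port, and the default host `0.0.0.0` is an assumption (use the address the origin actually listens on).

```python
import socket


def is_port_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind to (host, port) right now.

    False means the port is still in use: some process is listening on it,
    or recently closed connections on it are still in TIME_WAIT.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


# Example: is_port_free(8000) after stopping the first pipeline.
```

On Linux, `ss -ltnp` lists listening TCP sockets with the owning process, which shows whether the Data Collector JVM is the one still holding the port.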

Hamid K (Fan)

asked in Community Articles and Got a Question?

JSON payload validation using json schema

Hello community, I am building a pipeline that receives HTTP requests with a JSON payload in the body. I want to validate the payload against a JSON schema. I came across this article, which is interesting: https://streamsets.com/blog/json-validator/. However, I keep getting this error: JSON_VAL_02 - The JSON object supplied is invalid: org.json.JSONException: Expected a ':' after a key at 6 [character 7 line 1]. When I check the record as JSON string checkbox, it raises the exception … com.streamsets.pipeline.api.impl.TypeSupportConversionException: Cannot convert Map to com.streamsets.pipeline.api.impl.StringTypeSupport@374848a. Also, that processor restricts me, because I would like to fetch the JSON schema from a repository. I have tried the JavaScript Evaluator as well, and there too I am running into trouble, with library versions I believe … Below is the error I am getting … Caused by: java.io.IOException: resource /draftv4/schema not found. Using libraries … json-schem

1 month ago

Hamid K (Fan)

asked in Community Articles and Got a Question?

Testing HTTP server origin on SDC deployed in docker

Hello all, I am trying to run a pipeline with an HTTP Server origin to receive messages and then push them to a Kafka topic. When I start my Docker container without port forwarding for the port the origin listens on, that port is not accessible from my local laptop. But when I enable port forwarding, I get the notification: "The engine cannot be reached. Check that the engine is running and can communicate with Control Hub." Can anyone help me out with this? Many thanks! Hamid K.

2 months ago

fhermosilla (Fan)

asked in Community Articles and Got a Question?

Using GCP EL credentials in a custom function

Hi everyone. We are working on a solution that integrates GCP Secret Manager with StreamSets. The client stores a secret in JSON format and wants to extract one field with their data. The format is similar to this: How can I extract the dbuser field? I know there are custom functions, but I don't know whether I can use the StreamSets EL function to obtain the JSON information. Can I use ${credential:get("gcp", "group@org", "dbuser?latest")} as an argument, so that the function returns only the dbuser value? Thanks!

2 months ago
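For the GCP secret question above, whatever mechanism retrieves the secret text, the remaining step is to parse it as JSON and return a single field. The Python sketch below shows only that parsing logic. The secret layout (`{"dbuser": ..., "dbpassword": ...}`) is a hypothetical stand-in because the poster's example did not survive, and an actual custom EL function for Data Collector would be implemented in Java rather than Python.

```python
import json


def extract_secret_field(secret_text, field):
    """Return one top-level field from a secret stored as a JSON object.

    Raises ValueError if the text is not valid JSON or not a JSON object,
    and KeyError if the field is missing, so a malformed secret fails
    loudly instead of yielding an empty value.
    """
    data = json.loads(secret_text)
    if not isinstance(data, dict):
        raise ValueError("secret is not a JSON object")
    return data[field]


# Hypothetical secret layout:
# extract_secret_field('{"dbuser": "app_user", "dbpassword": "..."}', "dbuser")
```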

hrishikeh132609 (Fan)

asked in Community Articles and Got a Question?

Teradata to Snowflake (Extract and Load)

Hi, here is the use case: we are loading data from Teradata to Snowflake. We are not doing any transformation, just extracting the data and loading it into Snowflake. Which engine would be best for this use case: 1. Data Collector, or 2. Transformer for Spark? Note: our data volume is in the millions. Best regards, Hrishikesh

2 months ago

nachiket_pethe (Fan)

asked in Community Articles and Got a Question?

After enabling HTTPS on Control Hub, the login screen redirects back to the login screen

We have performed the following steps according to this documentation: https://docs.streamsets.com/portal/controlhub/latest/onpremhelp/controlhub/UserGuide/Install/EnableHTTPS.html
1. Stored the .p12 file in the etc/dpm directory.
2. Made changes in dpm.properties (exactly as given in the documentation).
3. Made changes in common-to-all.properties (exactly as given in the documentation).
4. Stored the keystore password in keystore-password.txt.
5. Started Control Hub manually.
We are using Control Hub version 3.51.4.

2 months ago

pedro (StreamSets Employee)

posted in Community Articles and Got a Question?

How to use Content Type ‘multipart/form-data’ in the HTTP Client stage

SDC’s HTTP Client stage does not have a native option to send HTTP requests using the Content Type ‘multipart/form-data’. With the default options, SDC appears to make the call correctly, but it does not add the boundary delimiters that a request with this kind of Content Type requires. However, we can force SDC to send the call in the correct ‘multipart/form-data’ format. To do this, you will need to set these parameters in the following way:
For this example, the request SDC would make is equivalent to the following curl command:
curl --location 'https://test.url.com/api/jobs' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer *********************' \
--header 'Cookie: staging-connections-canary="45412ba154f512d34"' \
--form 'options="{\"options\":{},\"url\":\"https://test.url.com\",\"sslVersion\":\"TLS 1.2\",\"password\":xxxxxxxxxx,\"userId\":xxxxxxxxxx,\"displayName\":\"Pedro\",\"jobName\":\"pedro_test_job\",\"batchsize\":1000,\"waitTimeout

2 months ago
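The key point of the multipart/form-data article above is the boundary: it is declared in the Content-Type header and delimits every part of the body, which is what SDC omits with the default options. The Python sketch below builds such a body for simple text fields with only the standard library, to show what the request has to look like on the wire (format per RFC 7578). It illustrates the format, not SDC's configuration; the field name `options` mirrors the curl example, and the default boundary is a random hex string.

```python
import uuid


def build_multipart_body(fields, boundary=None):
    """Encode text form fields as a multipart/form-data body.

    Returns (content_type, body). The boundary appears in the Content-Type
    header and must not occur inside any field value.
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        if boundary in value:
            raise ValueError(f"boundary occurs in value of field {name!r}")
        lines.append(f"--{boundary}")  # delimiter that opens each part
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from part body
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing delimiter ends the body
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body
```

For `{"options": '{"a": 1}'}` with boundary `XyZ`, the body is `--XyZ`, the `Content-Disposition` line, a blank line, the JSON value, and `--XyZ--`, each separated by CRLF.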

N_ali (Opening Band)

asked in Community Articles and Got a Question?

Move files from an SFTP server to an HTTP Client destination as an attachment?

Hi, I need to transfer CSV files from an SFTP server and post them as attachments in a POST API call (HTTP Client). Can you help me figure out how to accomplish this?

2 months ago

Pradeep (StreamSets Employee)

asked in Community Articles and Got a Question?

API which returns all pipelines

Problem description: By default, https://<control-hub or platform endpoint>:<port-number>/pipelinestore/rest/v1/pipelines returns only the pipelines that are committed. The pipeline count shown in the image below is the number of committed pipelines. What if we want the pipelines API to return non-committed (draft) pipelines too?

2 months ago

john.mcavoy (StreamSets Employee)

posted in Community Articles and Got a Question?

UTF-8 Special Characters being replaced by Question Marks (�) in Pipelines

Problem Description
When running an SDC pipeline that processes records with Strings containing UTF-8 special characters, these special characters are replaced by question marks (? or �) in various parts of your pipeline.
Example
Input record: {"productName": "My Product™"}
Output record: {"productName": "My Product�"}
Root Cause
This problem indicates that your Java Runtime Environment is using a non-UTF-8 character set that does not support these special characters, so the special characters are replaced by the � (U+FFFD) REPLACEMENT CHARACTER.
By default, SDC tries to set your JAVA_OPTS to use UTF-8 encoding. However, the JAVA_OPTS parameters may be set outside of SDC and override either of the following JVM parameters, which tell the JRE which encoding to use at runtime: file.encoding and sun.jnu.encoding.
The JRE also has a default character set, which it uses if these two parameters are not specified. This default character set can vary be

3 months ago
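The substitution described in the UTF-8 article above can be reproduced outside SDC. In the Python sketch below, encoding the example value with a charset that has no ™ (ISO-8859-1) yields `?`, and reading its UTF-8 bytes with a charset that cannot decode them yields U+FFFD. Python is used here only to illustrate the mechanism; the fix in SDC is still making sure `file.encoding` and `sun.jnu.encoding` resolve to UTF-8 for the JVM that runs it.

```python
SAMPLE = "My Product\u2122"  # "My Product™", the example value from the article


def encode_with(charset):
    """Encode SAMPLE using `charset`; characters it lacks become '?'."""
    return SAMPLE.encode(charset, errors="replace")


def decode_utf8_bytes_as(charset):
    """Read SAMPLE's UTF-8 bytes as if they were in `charset`.

    Bytes that `charset` cannot decode become U+FFFD (the replacement character).
    """
    return SAMPLE.encode("utf-8").decode(charset, errors="replace")


# encode_with("latin-1")         -> b"My Product?"   (ISO-8859-1 has no ™)
# decode_utf8_bytes_as("ascii")  -> "My Product\ufffd\ufffd\ufffd"
```

Here each of the three UTF-8 bytes of ™ becomes its own U+FFFD; how many replacement characters appear in a pipeline's output depends on which decoder performs the conversion.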

Douglas R (Fan)

posted in Community Articles and Got a Question?

"Failed at Step EXEC - Permission denied" when starting SDC as a Service on Systems with SELinux

Problem
When starting StreamSets Data Collector as a service under systemd, the service fails immediately on startup. The following error is shown when the status of the service is checked (using the systemctl status sdc command):
systemd[13801]: sdc.service: Failed at step EXEC spawning /opt/streamsets-datacollector/bin/streamsets: Permission denied
Cause
This error indicates that the script SDC_HOME/bin/streamsets (which is used to start SDC) could not be launched by systemd. This can be caused by incorrect file or directory permissions or by problems with the SELinux context (on systems where SELinux is enabled).
Solution
Step 1: Examine the Service Configuration
Before looking at possible permissions issues, it is important to verify that the user, group, and installation directory for the service are correctly configured in the service unit file, /etc/systemd/system/sdc.service. Exami

3 months ago
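Step 1 of the SELinux article above (checking the user, group, and installation directory in `/etc/systemd/system/sdc.service`) can be partly scripted. The Python sketch below reads the `[Service]` section with `configparser` and checks whether the `ExecStart` program exists and is executable by the user running the check. The sample unit text in the usage comment is hypothetical, the check runs as the current user rather than the service's `User=`, and it does not inspect SELinux contexts (`ls -Z` shows those).

```python
import configparser
import os
import shlex


def read_service_settings(unit_text):
    """Return User, Group, and the ExecStart program from a systemd unit file."""
    # interpolation=None: systemd specifiers such as %i are not configparser
    # interpolation; strict=False tolerates repeated keys like Environment=.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case (User, ExecStart, ...)
    parser.read_string(unit_text)
    service = parser["Service"]
    exec_start = service.get("ExecStart", "")
    # The first word of ExecStart is the program; strip systemd's optional
    # prefix characters (for example "-" or "@") before treating it as a path.
    program = shlex.split(exec_start)[0].lstrip("-@:+!") if exec_start else None
    return {"User": service.get("User"), "Group": service.get("Group"), "program": program}


def program_is_executable(path):
    """True if `path` is an existing file that the *current* user may execute."""
    return path is not None and os.path.isfile(path) and os.access(path, os.X_OK)


# Usage with a hypothetical unit file:
# with open("/etc/systemd/system/sdc.service") as f:
#     settings = read_service_settings(f.read())
# print(settings, program_is_executable(settings["program"]))
```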
