
Show us your Pipelines

Share an example of your pipeline pattern. Upload a screenshot and a detailed description. No wrong answers. Do NOT share personal or credential information.

57 Topics
66 Replies


Dolphin (Discovered Fame)

asked in Show us your Pipelines

How to configure a pipeline so the "Stop Event" does not run when the pipeline fails

Hi team, we have a pipeline with a stop event configured to run a SQL statement once the pipeline finishes processing data. However, we found that when the pipeline fails for some reason, this "Stop Event" still runs, which is not what we expect. Is there a setting that prevents the pipeline "Stop Event" from running when the pipeline fails while processing data? I found that if the pipeline "Start Event" fails, the pipeline does not run, which is expected. The "Stop Event", however, always runs, even when the pipeline has failed.
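One way to reason about this question is to gate the cleanup statement on why the pipeline stopped. The sketch below is plain Python, not StreamSets configuration. It assumes the stop event record exposes a `reason` field whose value distinguishes a normal finish from an error or a user stop; the field name and its values are assumptions to check against the documentation for your version, and `should_run_cleanup` is a hypothetical helper.

```python
def should_run_cleanup(stop_event: dict) -> bool:
    """Return True only when the pipeline stopped because it finished normally.

    Assumption: the stop event carries a 'reason' field; the comparison is
    case-insensitive because the exact spelling of the values is not confirmed.
    """
    reason = str(stop_event.get("reason", "")).strip().upper()
    return reason == "FINISHED"


# Example decisions for three hypothetical stop events.
for event in ({"reason": "Finished"}, {"reason": "Error"}, {}):
    print(event, "->", should_run_cleanup(event))
```

In a pipeline, the equivalent check would sit in front of the stage that runs the SQL statement, if the stop event handling in your version allows such a condition.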

vishalsdc2024 (New Member)

asked in Show us your Pipelines

StreamSets pipeline is not starting

Hi, I am facing an issue while starting my SDC service: SDC is not coming up. This is a production instance and it needs to be up. Can someone please help with this at the earliest?

Caused by: com.streamsets.datacollector.store.PipelineStoreException: CONTAINER_0206 - Cannot load details for pipeline 'SupplyCha__eed7a3d5-486c-499b-baa1-6eac92aa198b__averydennison.com': java.io.IOException: File '/apps/sdc/data/pipelines/SupplyCha__eed7a3d5-486c-499b-baa1-6eac92aa198b__averydennison.com/pipeline.json-tmp' exists, '/apps/sdc/data/pipelines/SupplyCha__eed7a3d5-486c-499b-baa1-6eac92aa198b__averydennison.com/pipeline.json-old' should exists

We have referred to a link for a solution, but we are not sure what can be done, as the files are not there for those pipelines.

Thanks & Regards, Vishal Verma
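The error above reports a `pipeline.json-tmp` file that exists without a matching `pipeline.json-old` file. As a read-only diagnostic, the sketch below lists every `-tmp` file under a pipelines directory that has no `-old` sibling. It changes nothing on disk and does not say how to repair the store; `find_incomplete_saves` is a hypothetical helper, and the directory layout (one subdirectory per pipeline) is inferred from the path in the error message.

```python
from pathlib import Path


def find_incomplete_saves(pipelines_dir):
    """List '-tmp' files that have no matching '-old' file next to them.

    Assumes one subdirectory per pipeline, as in the path from the error.
    Nothing is modified; the function only inspects file names.
    """
    suspects = []
    for tmp_file in sorted(Path(pipelines_dir).glob("*/*-tmp")):
        base_name = tmp_file.name[: -len("-tmp")]
        old_file = tmp_file.with_name(base_name + "-old")
        if not old_file.exists():
            suspects.append(str(tmp_file))
    return suspects
```

Calling `find_incomplete_saves("/apps/sdc/data/pipelines")` on the affected host would show whether other pipelines are in the same state. Take a backup of the data directory before changing any of the listed files.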

Colin (New Member)

asked in Show us your Pipelines

Inquiry Regarding StreamSets Product Versions and Feature Differences

As per my understanding, the Community Edition is available for free. Could you please provide insights into the advantages of the Enterprise Edition over the Community Edition? Additionally, what are the limitations of the Community Edition?

Dolphin (Discovered Fame)

asked in Show us your Pipelines

Oracle CDC Client pipeline keeps running without processing any records, but the same account can get data from V$LOGMNR_CONTENTS in DBeaver

Hi, I built a very simple pipeline using the Oracle CDC Client and have attached the exported pipeline. I am currently using an account with sysdba access in StreamSets. With the same account, the query I ran in DBeaver returns records from V$LOGMNR_CONTENTS; please refer to the attached screenshot "DBeaver_logmnr_screeashot.png". However, the StreamSets CDC pipeline keeps running without any input or output. I have also attached the sdc.log from the server. From the log I can see that the pipeline has obtained the timestamp of the starting SCN operation, but it cannot get records from LogMiner and insert them into the destination. Can you let me know if anything is wrong here?

dima (StreamSets Employee)

posted in Show us your Pipelines

Analyzing chess games with batch and streaming pipelines

I wanted a way to send chess game information from lichess.org (a popular chess server) to Elasticsearch and Snowflake, so that I could visualize statistics (e.g., how often does World Champion GM Magnus Carlsen lose to GM Daniel Naroditsky?) and generate reports in Snowflake (e.g., when Magnus does lose, which openings does he tend to play?). I ended up accomplishing this with two pipelines.

The first is a batch pipeline that ingests data using the lichess REST API and pushes it to a Kafka topic. It uses an endpoint that allows for pulling down all games by username, so I've parameterized the username portion and can then use Control Hub jobs to pick which specific users I want to study.

The second pipeline consumes game data from my Kafka topic, does some basic cleanup of the data (adds a field for the name of the winner rather than the color of the winning pieces and converts timestamps from long to datetime) as well as some basic enrichment (it adds a field that calculates
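The cleanup step described in the post (adding the winner's name instead of the winning color, and converting a long timestamp to a datetime) can be sketched in plain Python. This is an illustration, not the pipeline's actual configuration. The record shape is an assumption: a `winner` field holding `"white"` or `"black"`, a `players` mapping with a `name` per color, and a `createdAt` value in epoch milliseconds are all hypothetical names chosen for the example.

```python
from datetime import datetime, timezone


def clean_game(game: dict) -> dict:
    """Add the winner's name and a timezone-aware creation time to a game record.

    Assumed input fields (hypothetical): 'winner', 'players', 'createdAt'.
    """
    cleaned = dict(game)
    color = game.get("winner")  # missing when the game was drawn
    cleaned["winner_name"] = game["players"][color]["name"] if color else None
    # The long timestamp is treated as milliseconds since the Unix epoch.
    cleaned["created_at"] = datetime.fromtimestamp(
        game["createdAt"] / 1000, tz=timezone.utc
    )
    return cleaned
```

Keeping the original fields and adding new ones, rather than overwriting them, makes it easier to compare the raw and cleaned values downstream.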

asked in Show us your Pipelines

Can values be inserted into another database table from a temporary table that is created inside a function?

realmatcha (Opening Band)

asked in Show us your Pipelines

JDBC ERROR

Hello everyone, I want to pull data from the source into the warehouse. The source has a "customer_id" column. When the StreamSets pipeline runs, why do I get an invalid customer error? In StreamSets I also flag based on the "id" column. Please help, thank you.

Dolphin (Discovered Fame)

asked in Show us your Pipelines

About Oracle CDC Client configuration

Hi team, I am configuring the Oracle CDC Client now. Our Oracle version is 19c, non-CDB. I have confirmed that our Oracle instance has database archiving mode enabled, because the statement "select log_mode from v$database" returns "ARCHIVELOG".

My question is about the step "Enable Supplemental Logging", since I get three "NO" values for the statement "select supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_all from v$database;". Is it mandatory to run "alter database add supplemental log data;"?

I do not want to enable supplemental logging for all database tables, only for some related tables. In that case, should I only run "alter table <schema name>.<table name> add supplemental log data (all) columns;" and then "alter system archive log current;", or do I need to run the following in sequence?

Step 1: Enable minimal supplemental logging, run "alter database add supplemental log data;"
Step 2: alter table <schema name>.<table name> add su

hrishikeh132609 (Fan)

asked in Show us your Pipelines

Transformer for Spark (extract and load)

Hi,

Use case: we are loading data from Teradata to Snowflake. We are not doing any transformation, just extracting the data and loading it into Snowflake. Right now I am using:

1. JDBC Query Consumer (origin)
2. Snowflake (destination)

Note: our sources don't have any primary key or unique key, so I am not able to define an offset column, and because of that I cannot partition the data.

antmcmullen (StreamSets Employee)

posted in Show us your Pipelines

Ordnance Survey - Address Base Premium: processing multi-schema documents easily

Ordnance Survey (OS) produces a suite of products for the UK market. One of these is Address Base Premium (ABP), a set of details that describe buildings, businesses, residences and other items you'd find on a map, such as lat/long data, how the Royal Mail refers to them, and how the local authorities/councils describe and classify them.

Sounds good, right? Well, it's not such a nice set of data to deal with. It is shipped in 5 km batches of data in a single CSV file, with 10 different schema patterns within it. Nightmare? Nope! You know the secret of StreamSets!

StreamSets doesn't care about schema on read. That is the key to unlocking this. In the ABP files, the first column holds a record identifier, which tells us which schema and rules to apply.

This pipeline makes an issue that is normally difficult to deal with clear and transparent. You can watch a record come in from the raw file, read it with no header, look at the first column, and process that to a given lane. That l
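The routing idea in this post (read each row with no header, look at the first column, and send the row to the lane for that record type) can be sketched in plain Python. This illustrates the pattern only, not the pipeline itself. `route_by_record_id` is a hypothetical helper, and the identifier values used in the usage example are placeholders rather than a statement of the real ABP record types.

```python
import csv
import io
from collections import defaultdict


def route_by_record_id(raw_csv: str) -> dict:
    """Group headerless CSV rows into lanes keyed by the value in column one.

    Each lane keeps the remaining columns, so rows of different widths
    (different schemas) can sit in the same input file.
    """
    lanes = defaultdict(list)
    for row in csv.reader(io.StringIO(raw_csv)):
        if row:  # skip blank lines
            lanes[row[0]].append(row[1:])
    return dict(lanes)


# Placeholder identifiers "21" and "24" stand in for two record types.
sample = "21,a,b\n24,c\n21,d,e,f\n"
print(route_by_record_id(sample))
```

Once rows are grouped by identifier, each lane can apply the column names and rules for its own schema without affecting the others.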

dixit.singla (Fan)

asked in Show us your Pipelines

MongoDB Atlas Connection seems to have a bug.

I was trying to add a MongoDB Atlas connection, and on clicking the Test Connection button I get the error "[unauthorized] not authorized on local to execute command". I tested the connection string and credentials in Mongo Compass, and there they worked fine, so I could not understand why the test connection is failing. After many tries, I decided to save the connection as it is and try it in a pipeline. When used in the pipeline, I was able to fetch the data from MongoDB. When I then tried Test Connection again, I got the same error. Can anyone help me understand this behavior?

hrishikeh132609 (Fan)

asked in Show us your Pipelines

Load data from Teradata to Snowflake

I am trying to load data from Teradata to Snowflake. I am using Transformer for Spark as the engine. Origin: JDBC Query Consumer. Destination: Snowflake. I am able to load 12 million records in 1 hour, but I want to improve the performance. Below is the configuration of my pipeline.

dixit.singla (Fan)

asked in Show us your Pipelines

Posting XML data

Hi Team,

We are currently working on a pipeline that will consume JSON data from Kafka, transform it into XML, and then post the XML data to a particular REST API.

Using the Kafka stage, we are able to consume the data successfully. Using the Jython Evaluator, we are able to generate the desired XML from JSON (we can't use the data generator because we did not find a way to add namespaces to the final XML). The output of the Jython Evaluator is an XML string.

Now I have to post that XML to a particular API using the HTTP Client. In the HTTP Client I don't see an option for XML in Data Format. How can I output data in XML format from the Jython Evaluator, and how can I set the data format to XML? I have already tried setting Content-Type = application/xml in the headers, but no luck.

Regards, Dixit
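For reference, the two steps described in this post (building namespaced XML from JSON, then posting it with an XML content type) look like this in standalone Python 3. This is not a Jython Evaluator script and does not use any StreamSets API; the namespace URI, prefix, root tag and URL are hypothetical, and the converter only handles a flat JSON object whose keys are valid XML names. The request object is built but never sent.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

NAMESPACE = "http://example.com/orders"  # hypothetical namespace URI


def json_to_xml(payload: str, root_tag: str = "order") -> bytes:
    """Convert a flat JSON object into namespaced XML bytes."""
    data = json.loads(payload)
    ET.register_namespace("ord", NAMESPACE)  # hypothetical prefix
    root = ET.Element(f"{{{NAMESPACE}}}{root_tag}")
    for key, value in data.items():
        child = ET.SubElement(root, f"{{{NAMESPACE}}}{key}")
        child.text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_request(url: str, xml_body: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the XML body."""
    return urllib.request.Request(
        url,
        data=xml_body,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
```

Sending the prepared request would be a separate call to `urllib.request.urlopen`. The point of the sketch is that the body is sent as raw bytes with the XML content type, rather than being re-serialized in another data format.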

hrishikeh132609 (Fan)

asked in Show us your Pipelines

How to create multiple job instances

Hi, I have created a metadata-driven data ingestion framework.

Step 1: I read the metadata from the source and pass the runtime parameters (database name and table name) to the job.
Step 2: I am able to load one table at a time.

Is there any way to create multiple job instances, one for each record? Please reach out to me if you have any solution. Email: hrishikeshe143143@gmail.com

asked in Show us your Pipelines

Init Query in the JDBC Query Consumer not deleting previous data

I use StreamSets Data Collector to load data from Stage 2 database tables (JDBC Query Consumer) using a query, and to write the loaded data to another Stage 2 database table (JDBC Producer). I use the Init Query below to delete the previous records before loading data, but it does not delete any records from the table. It would be great if someone could help me.

asked in Show us your Pipelines

Error while connecting to Solace event broker

Hi, I am trying to connect to a Solace event broker using the JMS Producer. I have configured all JMS parameters (user name and password), but the pipeline still throws the error below:

RETRY: JMS_00 - Could not create initial context 'com.solacesystems.jndi.SolJNDIInitialContextFactory' with provider URL 'tcps://mr-connection-irj89q7fz0j.messaging.solace.cloud:55443' : javax.naming.NamingException: Username must be specified

I have already uploaded all Solace lib files in the external resources section of the engine. Please suggest.

drvynguyencosmetic (Fan)

asked in Show us your Pipelines

drvynguyencosmetic

Committed to giving customers the best choice, Drvynguyencosmetic offers an impressive selection from leading brands such as SVR, which is favored by dermatologists for its high-quality products. Visit https://drvynguyencosmetic.com/ now to learn more about the information above.

Contact information: Phone: 0931510129

See also:
https://drvynguyencosmetic.com/
https://www.facebook.com/drvynguyencosmetic
https://www.instagram.com/drvynguyencosmetic
https://twitter.com/dr_vynguyen
https://www.youtube.com/@drvynguyen.cosmetic/about
https://www.flickr.com/people/drvynguyencosmetic/
https://www.pinterest.com/drvynguyencosmetic/
https://www.tumblr.com/drvynguyencosmetic
https://myspace.com/drvynguyencosmetic

lakshmi_narayanan_t (Discovered Fame)

asked in Show us your Pipelines

How to use the Google address API to validate an address column or records

I want to validate the address column using the Google address API. How can I do that? This is my record: I want to validate the address. Can anyone help with this scenario?

asked in Show us your Pipelines

Issue with TCP Server origin listening port

I am trying to listen on a port with the TCP Server origin, but it is not giving any results. When I use the same port from the Linux console on the host where the StreamSets server is deployed, I am able to receive the messages. I am currently using StreamSets Data Collector 3.19.1. In the log I get:

TCP_00 - Cannot bind to port [0.0.0.0/0.0.0.0:******]: java.net.BindException: Address already in use

Here I am hiding the port number with * for confidentiality. Could you please help me listen to messages with the StreamSets TCP Server origin? Thanks in advance.
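"Address already in use" is the operating system reporting that another socket already holds the port at the moment of the bind attempt. A quick way to check this outside the pipeline is to try binding the port from a short script run on the same host. The sketch below is generic Python, not part of StreamSets, and `port_is_free` is a hypothetical helper.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically 'Address already in use', or a permission error
            # for privileged ports.
            return False
    return True
```

If the function returns False while the pipeline is stopped, some other process (for example a console listener left running on the same port) is holding the port and needs to be stopped, or the origin needs a different port.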

realmatcha (Opening Band)

asked in Show us your Pipelines

Connecting to SQL Server

Hi, I am new to StreamSets and started using it today. I have a task to back up some data from SQL Server to Hadoop and Hive. The JDBC connection has been changed to SQL Server, but it still doesn't work. Please help me.

HIMANSHU_SURANA (Fan)

asked in Show us your Pipelines

Consuming MySQL binlog data in JDBC Producer

I am trying to build a CDC pipeline to migrate a MySQL database to a MySQL database on a different server. Here's the Data Collector pipeline I've created. As per the documentation here, the JDBC Producer should be able to process binlog data. I've used a Field Remover to keep only the /Data and /Table fields. When I run the pipeline, I get an error that the input record has no data for <schema>.<table>. How can I create a pipeline to consume binlog records?

asked in Show us your Pipelines

HTTP Server

Can anyone let me know how to use the HTTP Server origin stage to fetch a large volume of records? My problem statement is that I need to fetch a large volume of data from a source using a GET call. We could use the HTTP Client stage for this, but since the source application doesn't have pagination configured, we would receive all the records at once. I therefore want to use the HTTP Server stage to achieve the same. Can anyone suggest how to achieve this? Appreciate your help! Thanks in advance!

ajinkya (StreamSets Employee)

asked in Show us your Pipelines

Sorting a specific column and writing it to a new table

Use Case: We have a dataset with the following columns: FIRST_NAME, LAST_NAME, EMAIL, PHONE, GENDER, DEPARTMENT, JOB_TITLE, YEARS_OF_EXPEREIENCE, SALARY. Let's sort the column SALARY in ascending order and write to a new table with just 4 columns: FIRST_NAME, LAST_NAME, YEARS_OF_EXPEREIENCE, SALARY.

Pipeline Design: Snowflake Table (origin), Sort (processor), Column Remover (processor), Snowflake Table (destination).

Pipeline Working:
The Snowflake origin fetches the table and columns and passes the records to the Sort processor.
The Sort processor sorts the data based on its configuration (SALARY column, ascending order) and passes it to the Column Remover.
The Column Remover keeps or removes columns based on its configuration.
The Snowflake Table destination writes the data to a new Snowflake table.
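The same sort-then-project logic can be shown on in-memory records in plain Python. This only illustrates what the Sort and Column Remover steps do to the data; it is not how the pipeline stages are configured. `sort_and_project` is a hypothetical helper, and the column names are copied from the post as written.

```python
from operator import itemgetter

# Columns kept in the new table, spelled as in the source dataset.
KEEP = ("FIRST_NAME", "LAST_NAME", "YEARS_OF_EXPEREIENCE", "SALARY")


def sort_and_project(rows):
    """Sort rows by SALARY ascending, then keep only the KEEP columns."""
    ordered = sorted(rows, key=itemgetter("SALARY"))
    return [{col: row[col] for col in KEEP} for row in ordered]
```

Sorting before removing columns matters only when the sort key is one of the columns being dropped; here SALARY is kept, so either order would give the same result.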

Ankur (Discovered Fame)

asked in Show us your Pipelines

Error: java.util.concurrent.TimeoutException: Idle timeout expired: 30000/30000 ms

Hi, we need to connect to AWS S3 Select using Groovy scripting, and for that we need to upload jar files. While uploading the jar files below, we get the error in the subject line. Each of these jars is less than 1 MB in size.

joda-time.jar
httpclient.jar
httpcore.jar
aws-java-sdk-s3.jar

Can you please help me resolve this?

Error: java.util.concurrent.TimeoutException: Idle timeout expired: 30000/30000 ms
