30-Day Free Trial: It’s Never Been Easier To Get Started With StreamSets

8 months ago


Welcome

Welcome to the StreamSets Community

89 Topics


realmatcha (Opening Band)

asked in Show us your Pipelines

Connect SqlServer

Hi, I am new to StreamSets and started using it today. I have a task to back up data from SQL Server to Hadoop and Hive. I changed the JDBC connection to SQL Server, but it still doesn't work. Please help me.

8 months ago

Subha (Fan)

asked in Show us your Pipelines

HTTP Server

Can anyone let me know how to use the HTTP Server origin stage to fetch a large volume of records? My problem statement: I need to fetch a large volume of data from a source using a GET call. We could use the HTTP Client stage for this, but since the source application doesn't have pagination configured, we would receive all the records at once. That is why I want to use the HTTP Server stage instead. Can anyone suggest how to achieve this? I appreciate your help! Thanks in advance!

8 months ago

ajinkya (StreamSets Employee)

asked in Show us your Pipelines

Sorting a specific column and writing it to a new table

Use case: We have a dataset with the following columns: FIRST_NAME, LAST_NAME, EMAIL, PHONE, GENDER, DEPARTMENT, JOB_TITLE, YEARS_OF_EXPEREIENCE, SALARY. Let's sort the SALARY column in ascending order and write the result to a new table with just four columns: FIRST_NAME, LAST_NAME, YEARS_OF_EXPEREIENCE, SALARY.

Pipeline design:
Snowflake Table (origin) → Sort (processor) → Column Remover (processor) → Snowflake Table (destination)

Pipeline working:
- The Snowflake origin fetches the table and columns and passes the records to the Sort processor.
- The Sort processor sorts the data based on its configuration (SALARY column, ascending order) and passes it to the Column Remover.
- The Column Remover keeps or removes columns based on its configuration.
- The Snowflake Table destination writes the data to a new Snowflake table.
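To make the data flow concrete, here is a plain-Python sketch of what the Sort and Column Remover steps do to the records. It is only an illustration under assumptions: the sample rows are hypothetical, the helper name `sort_and_project` is made up, and no StreamSets or Snowflake API is involved.

```python
from operator import itemgetter
from typing import Any, Dict, List

Record = Dict[str, Any]

# Columns kept for the new table, in the order listed in the use case.
KEEP = ["FIRST_NAME", "LAST_NAME", "YEARS_OF_EXPEREIENCE", "SALARY"]


def sort_and_project(records: List[Record]) -> List[Record]:
    """Hypothetical helper: sort by SALARY ascending, then keep only KEEP columns."""
    # Sort processor step: ascending order on SALARY.
    ordered = sorted(records, key=itemgetter("SALARY"))
    # Column Remover step: drop every field not in KEEP.
    return [{col: rec[col] for col in KEEP} for rec in ordered]


# Hypothetical sample rows standing in for the Snowflake origin table.
ROWS = [
    {"FIRST_NAME": "Ana", "LAST_NAME": "Diaz", "EMAIL": "ana@example.com",
     "PHONE": "555-0100", "GENDER": "F", "DEPARTMENT": "Sales",
     "JOB_TITLE": "Account Executive", "YEARS_OF_EXPEREIENCE": 6, "SALARY": 72000},
    {"FIRST_NAME": "Ben", "LAST_NAME": "Okafor", "EMAIL": "ben@example.com",
     "PHONE": "555-0101", "GENDER": "M", "DEPARTMENT": "Support",
     "JOB_TITLE": "Analyst", "YEARS_OF_EXPEREIENCE": 2, "SALARY": 48000},
]
```

Python's `sorted()` is stable, so rows with equal salaries keep their incoming order. Keep in mind that SQL tables, Snowflake's included, generally do not guarantee row order when read back without an ORDER BY, so the ascending order may need to be reapplied in whatever query reads the new table.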

8 months ago

Ankur (Discovered Fame)

asked in Show us your Pipelines

Error: java.util.concurrent.TimeoutException: Idle timeout expired: 30000/30000 ms

Hi, we need to connect to AWS S3 Select using Groovy scripting, and for that we need to upload some JAR files. While uploading the JAR files below, we get the error in the subject line, even though these JARs are smaller than 1 MB:
joda-time.jar
httpclient.jar
httpcore.jar
aws-java-sdk-s3.jar
Can you please help me resolve this?
Error: java.util.concurrent.TimeoutException: Idle timeout expired: 30000/30000 ms

9 months ago

Anonymous

published in Events & Webinars

New Course: Agile Reporting with StreamSets

We are proud to announce a new, free, and completely self-paced course focused on real-world scenarios: Agile Reporting with StreamSets.

Learn how to streamline financial reporting through modern, resilient, and repeatable data integration with StreamSets. Quickly and securely integrate and migrate data from internal and external sources, legacy systems, and files into modern data platforms such as Snowflake to accelerate financial reporting.

This self-paced, hands-on training course teaches you how to overcome several common challenges experienced by organizations attempting to accelerate reporting capabilities:
- "hard to access" data in legacy systems
- data silos and data sprawl
- heavy reliance on IT and a lack of self-service access
- moving data from legacy to modern data environments
- integrating internal and external data sources, including files

Audience
The course is designed for data engineers and anyone who is interested in learning how to use StreamSets to solve reporting challenges.

Enr

850

10 months ago

newkanyemerch32 (Fan)

asked in Show us your Pipelines

btsmerchstore45

I have developed a straightforward Transformer pipeline to run on an EMR cluster, and I provided the required configuration details when configuring the cluster. I have the following problem:

11 months ago

tamilarasup (Discovered Fame)

asked in Show us your Pipelines

Databricks Transformer

Hi, I am now working on the StreamSets DataOps Transformer platform. I am trying to connect to a Databricks cluster from StreamSets, but I am facing some issues:

JOBRUNNER_63 - Pipeline status: 'START_ERROR', message: 'Failed to start pipeline, check logs for error. The Transformer Spark application in the Spark cluster might not be able to reach Transformer at ‘http://661f1312d1e9:19630’. If this is not the correct URL, update the transformer.base.http.url property in the Transformer configuration file or define a cluster callback URL for the pipeline and restart Transformer.'

How do I clear this error? And how do I connect the cluster to
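The message says the Spark application on the cluster may not be able to reach Transformer at http://661f1312d1e9:19630. One way to narrow this down is to test that URL from the cluster side. Below is a minimal sketch, assuming you can run Python on a cluster node (for example, from a notebook attached to the cluster). The function name `transformer_reachable` is hypothetical, and it only checks that a TCP connection to the host and port succeeds, not that Transformer itself is answering. A hostname like 661f1312d1e9 looks like a Docker container's default hostname, which machines outside that Docker host usually cannot resolve.

```python
import socket
from urllib.parse import urlparse


def transformer_reachable(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical check: True if a TCP connection to the URL's host/port succeeds.

    Only network reachability is tested; the service on that port is not verified.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    if not host:
        raise ValueError(f"no hostname in URL: {url!r}")
    # Fall back to the scheme's default port if the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections, and timeouts.
        return False
```

For example, `transformer_reachable("http://661f1312d1e9:19630")` run on a cluster node returning False would suggest setting transformer.base.http.url (or a cluster callback URL for the pipeline) to an address the cluster can actually reach, as the error message recommends.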

11 months ago

tamilarasup (Discovered Fame)

asked in Show us your Pipelines

Transformer

In Transformer, I am sending data from a File origin (directory) to a File destination (local FS), but the destination fails with this error:

JOBRUNNER_63 - Pipeline status: 'RUN_ERROR', message: 'Operator File_3 failed due to org.apache.spark.sql.AnalysisException, check Driver Logs for further information'

How can I clear this error? I need help.
Thanks,
Tamilarasu

11 months ago

tamilarasup (Discovered Fame)

asked in Show us your Pipelines

MongoDB

Is it possible to perform CDC on MongoDB Atlas using the MongoDB Oplog origin? The MongoDB Oplog connection string should be valid for MongoDB Atlas, but I get a timeout error. Please give some guidance to clear that error.

1 year ago

Anonymous

published in Events & Webinars

Business Value of Data Engineering Survey is now open (News)

The first Business Value of Data Engineering Survey is now open. We know you’re more than a data ‘plumber’; we see it every day in the outcomes you help your organizations create. With this survey, we’re looking at how you work with your business colleagues to make it happen. Please take 5–7 minutes to fill it out, and you’ll automatically be entered to win one of five $100 Amazon gift cards! Take the survey here ▶️ https://tinyurl.com/dataengbv

180

1 year ago

Pradeep (StreamSets Employee)

posted in Show us your Pipelines

Example pipelines for the Delta Lake Lookup processor

Input data:
[
  { "product_id": 1, "unit_price": 23.2 },
  { "product_id": 4, "unit_price": 21.64 },
  { "product_id": 2, "unit_price": 10.87 }
]

Delta Lake lookup table (the same for all operations performed):

Return all matching rows, generating a record for each match:

Return first matching row:
There won’t be any change in the output from above, since there are no duplicates in the input.

Return a count of matching rows:

Return true if matches exist, otherwise false:

Input data:
[
  { "product_id": 1, "unit_price": 23.2 },
  { "product_id": 2, "unit_price": 21.64 },
  { "product_id": 4, "unit_price": 10.87 },
  { "product_id": 5, "unit_price": 99.32 },
  { "product_id": 4, "unit_price": 20.33 }
]

Return all matching rows, generating a record for each match:

Return first matching row:

Return true if matches exist, otherwise false:

Return a count of matching rows:
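The lookup table and the outputs for each mode were shown as images that did not survive. As a rough illustration of how the four modes differ, here is a plain-Python sketch using the second input dataset. Everything else is an assumption for illustration, not the processor's documented behavior: the lookup table and its `product_name` column, the output field names `match_count` and `match_exists`, and passing unmatched records through unchanged. product_id 2 is deliberately duplicated in the hypothetical table so that "all matching rows" and "first matching row" give different results.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical lookup table; product_id 2 appears twice on purpose.
LOOKUP: List[Record] = [
    {"product_id": 1, "product_name": "pen"},
    {"product_id": 2, "product_name": "notebook"},
    {"product_id": 2, "product_name": "notebook (bulk)"},
    {"product_id": 4, "product_name": "stapler"},
]

# Second input dataset from the post.
INPUT: List[Record] = [
    {"product_id": 1, "unit_price": 23.2},
    {"product_id": 2, "unit_price": 21.64},
    {"product_id": 4, "unit_price": 10.87},
    {"product_id": 5, "unit_price": 99.32},
    {"product_id": 4, "unit_price": 20.33},
]


def matches(record: Record, table: List[Record], key: str = "product_id") -> List[Record]:
    """All lookup rows whose key equals the record's key."""
    return [row for row in table if row[key] == record[key]]


def all_matches(records: List[Record], table: List[Record]) -> List[Record]:
    """One output record per match; unmatched records pass through (assumption)."""
    out = []
    for rec in records:
        hits = matches(rec, table)
        if not hits:
            out.append(dict(rec))
        for hit in hits:
            out.append({**rec, **hit})
    return out


def first_match(records: List[Record], table: List[Record]) -> List[Record]:
    """Merge only the first matching row into each record."""
    out = []
    for rec in records:
        hits = matches(rec, table)
        out.append({**rec, **hits[0]} if hits else dict(rec))
    return out


def count_matches(records: List[Record], table: List[Record],
                  field: str = "match_count") -> List[Record]:
    """Add the number of matching rows to each record."""
    return [{**rec, field: len(matches(rec, table))} for rec in records]


def any_match(records: List[Record], table: List[Record],
              field: str = "match_exists") -> List[Record]:
    """Add True if at least one row matches, otherwise False."""
    return [{**rec, field: bool(matches(rec, table))} for rec in records]
```

With this hypothetical table, `all_matches` returns six records (two for product_id 2), `first_match` returns five, `count_matches` yields counts of 1, 2, 1, 0, 1, and `any_match` yields True, True, True, False, True.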

1 year ago

Santhosh Kumar (Discovered Fame)

asked in Show us your Pipelines

What is the best way to learn StreamSets?

Hello team, I'm Santhosh Kumar, a beginner with StreamSets. What is the best way to learn StreamSets in depth?

1 year ago

lakshmi_narayanan_t (Discovered Fame)

asked in Show us your Pipelines

JDBC pipeline not running, shows error: JDBC_16 - Table '' does not exist or PDB is incorrect. Make sure the correct PDB was specified

I am facing an issue running a JDBC pipeline; it shows the error below. I have tried many things I found on Google, but nothing worked. Can anybody help?

1 year ago

Anonymous

published in Events & Webinars

StreamSets Roadshow - San Francisco and New York (News)

I’m excited to share that we’ve announced dates for the StreamSets Roadshow. (Details here and in your right-hand toolbar.) First up are San Francisco and New York. I wanted to personally invite all my fellow data engineers to join me at my training sessions during these events.

If you use my code BuuckSF50 for the San Francisco Roadshow or BuuckNYC50 for New York, you can get 50% off a 4-hour training and a half-day of info on best practices and innovation with StreamSets. I hope to see you there!

270

1 year ago

gkognole (Fan)

asked in Show us your Pipelines

Unable to ingest data from Azure SQL (CDC) to Azure Databricks using StreamSets

I am trying to build a data pipeline with Azure SQL Server DB (CDC) as the source and Azure Databricks (Delta tables) as the destination. I referred to the sample data pipeline at https://github.com/streamsets/pipeline-library/tree/master/datacollector/sample-pipelines/pipelines/SQLServer%20CDC%20to%20Delta%20Lake

I am getting the error below for a few records, in Schema preview as well:

DELTA_LAKE_34 - Databricks Delta Lake load request failed: 'DELTA_LAKE_32 - Could not copy staged file 'sdc-4a076fce-7a73-45ba-8dd7-29e58848cf23.csv': java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Unable to infer schema for CSV. It must be specified manually.

Note: On Preview/Draft Run, the pipeline is able to capture changes from the source DB, successfully creates files in the stage (ADLS container), and creates Delta tables at the destination, but it fails to ingest re

1 year ago

iaakashbansal (Fan)

asked in Show us your Pipelines

HTTP Client Origin - Stop Condition for Pagination

I need to extract data from an API using an OAuth2 connection. The API returns a /cursor field at the end of each page, and that cursor can be used to get the records from the next page. In the Pagination tab, I used Link in Response Field and tried to add a Stop Condition with /cursor, but I am not able to handle this scenario. Can someone please help? Thanks in advance!
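The cursor pattern described in this question can be sketched in plain Python to show what the stop condition has to express: keep requesting pages while the current response carries a non-empty cursor, and stop when it does not. This is a sketch under assumptions, not the HTTP Client origin's implementation. The `fetch_page` callable, the response shape `{"records": [...], "cursor": ...}`, the assumption that the API omits or empties `cursor` on the last page, and the repeated-cursor guard are all hypothetical. In the origin itself, the same check would be written as a stop-condition expression whose exact syntax should be taken from the HTTP Client documentation.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def fetch_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield records page by page, following `cursor` until it is missing or empty.

    `fetch_page(cursor)` is a hypothetical callable that performs the
    authenticated GET (cursor=None for the first page) and returns the
    parsed JSON body, assumed to look like {"records": [...], "cursor": "..."}.
    """
    cursor: Optional[str] = None
    seen = set()
    while True:
        page = fetch_page(cursor)
        yield from page.get("records", [])
        cursor = page.get("cursor")
        # Stop condition: no cursor on this page (assumed to mean the last page),
        # or a cursor we already followed, which would otherwise loop forever.
        if not cursor or cursor in seen:
            break
        seen.add(cursor)
```

The repeated-cursor guard is a defensive choice: some APIs return the same cursor on the final page instead of omitting it, and without the guard the loop would never end.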