Recently active
Hi - Welcome to the StreamSets Academy area of the Community! Please post questions related to the self-paced video course “DataOps Platform Fundamentals”. Please use the tag “DataOps Platform Fundamentals Course” when creating your question under the “StreamSets Academy” category. Thank you!
Hi, I am new to StreamSets; I started using it today. I have a task to back up some data to Hive and Hadoop using the JDBC Multitable Consumer. The problem is that there is a table name with '-' instead of '_', for example call-outcomme_id, when the correct one should be call_outcomme_id. I used the Field Remover to keep the name with 'Keep Listed Fields', but when I start the pipeline an error occurs; when I select 'Remove Listed Fields' instead, the column that I targeted is deleted and it works. Thanks in advance.
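One hedged suggestion: instead of removing the field, a Field Mapper set to map field names could rewrite the hyphen in place. A minimal sketch, assuming SDC's str:replaceAll follows Java regex semantics:

```
${str:replaceAll(f:name(), '-', '_')}
```

This would rename call-outcomme_id to call_outcomme_id while leaving other field names untouched.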
Hi, I am trying to establish a connection to jdbc:mysql://mysqldb:3306/zomato but it is not working. I tried two different approaches. At first, I did not want to use Strigo, so that I was not limited to the 8 hours; I ran my engines in containers on my local machine instead. The engines show as running in Control Hub and I can create and run some pipelines. However, I would like to use the connection cited in one of the labs with mysqldb and zomato. I am not sure how to “install” the zomato db and reviews table locally. I have a container running MySQL, but from there I don't know what to do. If I try to create the connection using my local engine, I get the error below. JDBC_00 - Cannot connect to specified database: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure. The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. I could see a post with the same error message but it does…
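A hedged sketch of one likely cause: the hostname mysqldb only resolves if the engine container and the MySQL container share a Docker network. Assuming Docker and the official images (the zomato lab data itself would still have to be loaded separately):

```
docker network create sdc-net
docker run -d --name mysqldb --network sdc-net \
    -e MYSQL_DATABASE=zomato -e MYSQL_ROOT_PASSWORD=secret mysql:8
# run the engine container on the same network so "mysqldb" resolves
docker run -d --network sdc-net streamsets/datacollector
```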
We currently have a C# console application for processing large CSV files, and we are exploring using StreamSets to do the same. Below is my high-level requirement. Currently we have 50+ consumers calling our application by placing a CSV file in an on-premises shared drive folder. File sizes range from 10,000 to 2 million rows, and the application should be able to handle processing up to 5 million rows an hour. Each file drop triggers an individual process run, so that multiple files are processed simultaneously. These are the high-level steps involved in processing a file:
- Write the file to a database table.
- Enrich every record with additional data attributes.
- Make an API call for each record.
- Update the DB record with the response received from the API call; also write the response of the API call to a result CSV file for the consumer.
- After all the records are processed, write the result CSV file for the consumer to pick up from the shared drive.
Thank you.
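A rough, hedged mapping of those steps onto stock SDC stages; this is a sketch of one possible layout, not a sizing guarantee, and the fan-out after the HTTP call is an assumption about the design:

```
Directory origin           (spools *.csv from the shared folder)
  -> JDBC Tee              (inserts each row into the DB table, passes records on)
  -> enrichment            (e.g. JDBC Lookup / Expression Evaluator processors)
  -> HTTP Client processor (one API call per record, response stored in a field)
     -> JDBC Query executor  (UPDATE the staged row with the API response)
     -> Local FS destination (result CSV for the consumer to pick up)
```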
I have a Field Mapper to remap camel-case fields to snake case. This is the message I am trying to save: { "id": "e35fd529-613f-5e79-036f-f37c2801041d", "batchName": "SIT Axle3+ 9_12_test", "batchSize": 35, "numberOfAudited": 35, "batchType": "TRIP_BASED", "batchMode": "RANDOM", "batchFacility": "71", "batchDirection": "West", "batchPlaza": "", "batchLane": "", "fromTimestamp": 1694538060, "toTimestamp": 1694566800, "createdTimestamp": 1694626323, "createdBy": "mmccammon@etcc.com", "completedTimestamp": 1697039389, "completedBy": "asayed@etcc.com", "auditDetailList": [ { "id": "a4f274d5-8de5-4c54-ae70-b6a498ab4e9e", "auditBatchId": "e35fd529-613f-5e79-036f-f37c2801041d", "eventId": "49dcbe2c-51a8-11ee-9348-3cecefd01508", "eventType": "TRIP", "eventTimeStamp": 1694549374, "auditResult": false, "auditCorrections": [], "auditedTimestamp": 1694635817, "auditedBy": "ppradeep@etcc.co…
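A minimal hedged sketch for the camel-to-snake mapping, assuming a Field Mapper operating on field names and Java-regex semantics in str:replaceAll:

```
${str:toLower(str:replaceAll(f:name(), '([A-Z])', '_$1'))}
```

For example, batchName would become batch_name and auditBatchId would become audit_batch_id.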
Pipeline Status: RUN_ERROR: QUERY_EXECUTOR_001 - Failed to execute query 'UPDATE user SET created_by='HistoricalLoad', created_date="2020-04-27 15:11:25", modified_by='', modified_date="" WHERE id = '71021903' LIMIT 1': Data truncation: Incorrect datetime value: '' for column 'modified_date' at row 1

UPDATE user SET
created_by='${record:value('/created_by')}',
created_date="${str:isNullOrEmpty(record:value('/created_date')) ? NULL : record:value('/created_date')}",
modified_by='${str:isNullOrEmpty(record:value("/modified_by")) ? NULL : record:value("/modified_by")}',
modified_date="${str:isNullOrEmpty(record:value('/modified_date')) ? NULL : record:value('/modified_date')}"
WHERE id = '${record:value('/id')}' LIMIT 1
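A hedged sketch of one fix, assuming MySQL: because the ${...} expressions sit inside quotes, a NULL result still becomes the literal string '' in the final SQL, which MySQL rejects for a DATETIME column. Letting the database do the conversion with NULLIF avoids having to emit the NULL keyword from the expression language:

```sql
UPDATE user SET
  created_by    = '${record:value('/created_by')}',
  -- NULLIF(x, '') returns NULL when x is the empty string,
  -- so empty dates are stored as SQL NULL instead of ''
  created_date  = NULLIF('${record:value('/created_date')}', ''),
  modified_by   = NULLIF('${record:value('/modified_by')}', ''),
  modified_date = NULLIF('${record:value('/modified_date')}', '')
WHERE id = '${record:value('/id')}' LIMIT 1
```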
I have some JSON objects that look as follows, and I want them all merged into an array like: data: [ { nstaskid: "...", awb: "..." }, { nstaskid: "...", awb: "..." } ]. I tried to use a JavaScript Evaluator to loop over the records, push them into an array, and write that array record, but it didn't work. Can you please help me with that?
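A minimal hedged sketch for a JavaScript Evaluator in record-processing mode. Note that it can only merge the records that arrive in the same batch, and the field name data is taken from the question:

```javascript
// Collect the root value (a map) of every record in this batch.
var items = [];
for (var i = 0; i < records.length; i++) {
  items.push(records[i].value);
}

if (records.length > 0) {
  // Reuse one incoming record as the container for the merged output
  // and emit only that single record.
  var merged = records[0];
  merged.value = { data: items };
  output.write(merged);
}
```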
Hi team, I am using a Field Renamer to rename an MSSQL column from the origin name “regionid” to the target Snowflake name “RegionID”. Attachment 1 is my Field Renamer configuration. When I ran the previewer, it did not show any errors. Attachment 2 is the log. I cannot figure out the exact reason for the failure from the log. Can you suggest anything?
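For reference, a hedged sketch of a typical Field Renamer mapping for this case (illustrative values; whether this matches the attached configuration can't be verified from the preview):

```
Source Field Expression: /regionid
Target Field Expression: /RegionID
```

If the log complains about the target field already existing, the stage's “Target Field Already Exists” setting may also be worth checking.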
Solution: The issue can be handled with the Field Mapper processor. Below is the syntax for it: ${str:replaceAll(f:name(), '[.\\-/()!&%^$#@]','')} See the attached screenshot for reference. NB: the issue can also be handled in other ways, e.g. with an Expression Evaluator or a scripting processor.
Hi, I am getting a “SNOWFLAKE_56 - Key fields not specified for table 'ORCL_EMP'” error. Table Auto Create is enabled, CDC is also enabled, and the table key column is set to the ID column.
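A hedged pointer: if memory serves, the Snowflake destination's CDC handling reads keys from its “Table Key Columns” property, and the table name there must match the evaluated target table name exactly (illustrative values below):

```
Table Key Columns:
  Table:       ORCL_EMP
  Key Columns: ID
```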
I have a bunch of Excel (XLS) files that are hosted online (HTTPS) and periodically updated. I want to be able to store the Excel files in ADLS Gen2. What is the best way to approach this?
Data Source: no authentication/authorization needed, HTTPS, XLS format.
Environment Stack: StreamSets, Azure shop.
There is a directory created in IIS (Windows) and published via FTPS. When trying to use this directory in StreamSets with the SFTP/FTP/FTPS component, it returns the following error: "REMOTE_11 - Unable to connect to remote host 'ftps://ftps.hostname.net:921/PLV' with given credentials. Please verify if the host is reachable, and the credentials and other configuration are valid. The logs may have more details. Message: Could not list the contents of "ftps://ftps.hostname.net:921/PLV" because it is not a . : conf.remoteConfig.remoteAddress". The credentials are OK, as the same directory opens fine with LFTP on Linux. Is there a StreamSets configuration missing to fix this issue?
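A hedged guess at settings worth checking on the SFTP/FTP/FTPS stage, since IIS FTPS sites tend to be strict about the TLS mode and data-channel protection (option names as I recall them; values illustrative):

```
FTPS Mode:                          Explicit   # try Implicit if port 921 expects TLS on connect
FTPS Data Channel Protection Level: Private
```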
I am trying to write Avro from Kafka to Local FS in whole-file format, but I get an error. Below I share the pipeline; can you please suggest how to get this working?
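A hedged note: as far as I recall, the Whole File data format is only available between file-based origins and destinations, so a Kafka origin can't produce it. If the topic contains Avro messages, reading and writing them as Avro may be the intended setup:

```
Kafka Consumer (Data Format: Avro)  ->  Local FS (Data Format: Avro)
```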
I am getting this error: JDBC_00 - Cannot connect to specified database: com.mysql.cj.jdbc.exceptions.CommunicationsException. Can anybody help me with this?
I would like to process tables with zero records from the JDBC Multitable Consumer to my target location. Is it possible to do so? Please help me find a solution to this query.
I want to connect MariaDB to StreamSets via the JDBC processors. MariaDB is similar to MySQL, so I need to configure the connection, but I am getting a “no suitable driver found” error. Please help me find the driver for MariaDB.
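A hedged sketch, assuming the MariaDB Connector/J JAR has been installed as an external library for the JDBC stage:

```
JDBC Connection String: jdbc:mariadb://<host>:3306/<database>
```

The MySQL Connector/J driver also generally accepts MariaDB servers via jdbc:mysql://<host>:3306/<database>, but the jdbc:mariadb:// prefix requires the MariaDB driver.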
Hello team, could you please assist me with flattening the JSON below? [ { "ID1": "Test_value", "ID2": "Test_value2", "ID3": { "ID3_1": 1, "ID3_2": 2, "MAIN_1": { "OUTPUT": [ { "OUT1": "1", "OUT2": "2", "OUT3": "3", "OUT4": "4" }, { "OUT1": "6", "OUT2": "7", "OUT3": "8", "OUT4": "9" }, { "OUT1": "10", "OUT2": "11", "OUT3": "12", "OUT4": "13" } ] }, "MAIN_2": { "OUTPUT": [ { "OUT1": "1", "OUT2": "2", "OUT3": "3", "OUT4": "4" }, { "OUT1": "6", "OUT2": "7", "OUT3": "8", "OUT4": "9" }, { "OUT1": "10", "OUT2": "11", "OUT3": "12", "OUT4": "13" } ] }, "MAIN_3":…
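A hedged pointer: the stock Field Flattener processor may cover this, assuming the version in use flattens lists as well as maps (output path below is illustrative):

```
Field Flattener (Flatten: Entire Record, Name separator: '.')
  e.g. /ID3.MAIN_1.OUTPUT.0.OUT1 = "1"
```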
Hi community, I am using StreamSets Data Collector for data migration from PostgreSQL to Cassandra, but while testing the Cassandra connection I am encountering the above error.
Hi all, I created a microservice pipeline with a REST API origin on the DataOps platform as per this video: https://www.youtube.com/watch?v=wIZWMV1bMl4. Now I am unable to invoke the pipeline using third-party tools like Postman. I have chosen the default settings in SDC and the headers X-SDC-APPLICATION-ID: sdc_microservice and Content-Type: application/json, yet I get “Error: connect ETIMEDOUT xx.xx.xx.xx:8000”. However, when I use the same URL in a curl command on the host where the SDC container is running, I see a 200 OK response: curl -i -X GET http://xx.xx.xx.xx:8000/rest/v1/user --header "X-SDC-APPLICATION-ID:sdc_microservice". There is no connectivity issue on the host itself, so can anyone tell me why I am unable to use Postman, or whether I am missing anything here? Below are images showing the same:
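A hedged observation: curl succeeding on the Docker host while Postman times out from elsewhere usually points at port 8000 not being reachable from outside (a firewall, or the container port not published). If SDC runs in Docker, publishing the REST origin's port would look roughly like:

```
docker run -p 8000:8000 -p 18630:18630 streamsets/datacollector
```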
Hi, I am new to StreamSets, so please bear with my questions :) Here is my first one. I created a simple pipeline to copy data from an employee table on SQL Server to an employee table on PostgreSQL. I used a JDBC Query Consumer to pull records incrementally using the offset value, and a JDBC Producer stage to insert them. When I run the pipeline, it starts execution, selects all records from SQL Server, inserts them into Postgres, and then returns a java.lang.NullPointerException. It goes into retry mode and continues to fail with the same error on every retry attempt. I tried adding more records to the table so that the next retry attempt would pull the new records based on the offset value; however, the pipeline does not pull them. I also tried filtering null records using a Stream Selector, but it didn't work. I also added a Pipeline Finisher stage to end the pipeline when a “no more data” event is generated, but it doesn't seem to work as expected either. The pipeline continues to throw…