Team, our DataOps deployment runs on AWS EC2, and we have one Data Collector engine running on that deployment. We are seeing this message frequently, i.e. we are losing the connection to our Data Collector engine quite often. How do I check whether my Data Collector is running fine on the EC2 machine (Linux), to confirm that this is a connection issue (Control Hub talking to the Data Collector engine)? Any idea why we lose engine connectivity every few minutes or every hour? Any help would be appreciated. Cheers, Srini
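One basic check, as a sketch: from the EC2 machine itself you can verify that the Data Collector process is up and that its web service answers locally, which helps separate an engine crash from a Control Hub connectivity problem. The snippet below assumes the default engine URL http://localhost:18630; adjust it to whatever port your engine is configured to listen on.

```python
import requests

# Hypothetical local engine URL; replace with the port your Data Collector
# is actually configured to listen on.
SDC_URL = "http://localhost:18630"

try:
    # Any HTTP response (even a 401 asking for credentials) means the engine
    # process is up and its web server is reachable locally.
    resp = requests.get(SDC_URL, timeout=5)
    print("Data Collector answered locally with HTTP", resp.status_code)
except requests.exceptions.ConnectionError:
    print("No response on", SDC_URL, "- the engine process may be down.")
except requests.exceptions.Timeout:
    print("Engine did not respond within 5 seconds - possible resource pressure.")
```

If the engine answers locally but Control Hub still reports lost connectivity, the problem is more likely networking, a proxy, or resource pressure on the EC2 instance than the engine process itself.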
We are using multithreading to ingest bulk data from the source using a Databricks notebook, and we run the notebook as a job. We have a requirement to pause the job and resume it from where it left off (since we can only ingest at particular times). I want to know how to run the Databricks notebook job so that it pauses at a certain time and later resumes from where it left off, instead of starting from scratch again.
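One common pattern, sketched below under assumptions: instead of literally pausing the job, persist the last processed offset (for example the max ingested key or batch number) to durable storage after each unit of work, stop cleanly when the ingestion window closes, and have the next scheduled run read that offset and continue. The checkpoint path, window cutoff, and ingest_batch helper are hypothetical placeholders.

```python
from datetime import datetime, time

# Hypothetical checkpoint location and ingestion-window cutoff.
CHECKPOINT_PATH = "dbfs:/checkpoints/bulk_ingest_offset"
WINDOW_END = time(6, 0)

def read_offset():
    """Return the last committed offset, or 0 on the very first run."""
    try:
        return int(dbutils.fs.head(CHECKPOINT_PATH))  # dbutils is available inside Databricks notebooks
    except Exception:
        return 0

def write_offset(offset):
    """Persist progress so the next scheduled run resumes from here."""
    dbutils.fs.put(CHECKPOINT_PATH, str(offset), overwrite=True)

def ingest_batch(start_offset, batch_size=10000):
    """Hypothetical placeholder: ingest one slice of the source starting at
    start_offset and return the new offset (e.g. the max source key processed)."""
    # ... real multithreaded ingestion logic goes here ...
    return start_offset + batch_size

offset = read_offset()
while datetime.now().time() < WINDOW_END:   # stop cleanly when the window closes
    offset = ingest_batch(offset)
    write_offset(offset)                    # commit progress after every batch
```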
Hi all, I am reading a CSV file created by an instrument. The file contains header information, then a collection of fields with values for each element tested. I want to write each element value set to JDBC as a single row. Example record: machine_id, date, time, run_number, element, element_value, element_error, element, element_value, element_error (50 of these sets). For each set of element, value, and error I need to write machine_id, date, time, run_number, element, element_value, element_error to the database. I can't figure out how to loop through the record, so from the one CSV record I need to write 50 JDBC records. Is this possible in StreamSets? Thanks
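For what it's worth, a scripting processor can fan one record out into many before the JDBC destination. A minimal Jython Evaluator sketch is below; the repeated-column naming convention (element_1, element_value_1, element_error_1, ...) is an assumption, so adjust it to how the delimited parser actually names your fields, and note that on some Data Collector versions the script bindings live under an sdc object instead.

```python
# Jython Evaluator sketch: emit one output record per element/value/error set.
COMMON = ['machine_id', 'date', 'time', 'run_number']
SETS = 50

for record in records:
    try:
        for i in range(1, SETS + 1):
            # Create a new record per element set, carrying the common columns.
            new_record = sdcFunctions.createRecord(record.sourceId + '::element-' + str(i))
            new_record.value = {}
            for f in COMMON:
                new_record.value[f] = record.value[f]
            # Assumed repeated-column naming: element_1, element_value_1, element_error_1, ...
            new_record.value['element'] = record.value['element_' + str(i)]
            new_record.value['element_value'] = record.value['element_value_' + str(i)]
            new_record.value['element_error'] = record.value['element_error_' + str(i)]
            output.write(new_record)
    except Exception as e:
        error.write(record, str(e))
```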
There is a function, str:splitKV, which creates a map from a string, provided the key/value pairs are encoded in the string. But how does one create a map from a group of individual fields? I suppose one could concatenate the values into a single string with the necessary key/value pairings and then use str:splitKV, but such a conversion seems excessive and might require extra type conversions. Is there a better way?
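One alternative sketch: a scripting evaluator can assemble the map field directly, with no string round trip. Minimal Jython Evaluator example, assuming the source fields are /key1, /key2, /key3 and the target map field is /attributes (all hypothetical names):

```python
for record in records:
    try:
        # Build a map field directly from existing scalar fields.
        record.value['attributes'] = {
            'key1': record.value['key1'],
            'key2': record.value['key2'],
            'key3': record.value['key3'],
        }
        output.write(record)
    except Exception as e:
        error.write(record, str(e))
```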
We are ingesting data from Oracle to Databricks. While ingesting, I can see that some of the staging files (CSV) in the S3 bucket are failing to insert into Databricks; they show up as stage errors. Is there a way to move these staging files to a different bucket and retry them?
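If you end up moving the failed staging files yourself, a small boto3 sketch like the one below can copy them to a quarantine bucket for reprocessing. The bucket names and prefix are hypothetical, and it assumes AWS credentials are already configured in the environment.

```python
import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "my-staging-bucket"        # hypothetical
RETRY_BUCKET = "my-staging-retry-bucket"   # hypothetical
PREFIX = "stage-errors/"                   # hypothetical prefix of the failed files

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Copy the staging file to the retry bucket, then remove the original.
        s3.copy_object(Bucket=RETRY_BUCKET, Key=key,
                       CopySource={"Bucket": SOURCE_BUCKET, "Key": key})
        s3.delete_object(Bucket=SOURCE_BUCKET, Key=key)
        print("moved", key)
```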
While ingesting data from Oracle to Databricks, I can see that a few staging files (in the S3 bucket) are going to stage errors. Error: 'DELTA_LAKE_32 - Could not copy stage file <filename>: Error running query, at least one column must be specified for the table'. When we inserted the same record manually in Databricks, the record was inserted successfully, and no issues were found in the data itself. Can you please give suggestions on the error we are receiving?
After launching a pipeline using the SDK, if I need to make any changes to the pipeline, I want to make them from the SDK instead of the UI, and those changes have to be reflected in the UI. How can I achieve this without launching the pipeline again? Some more queries: 1. How do I preview a pipeline using the SDK? 2. How do I know a stage has no errors in the SDK?
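A rough sketch with the StreamSets SDK for Python, assuming the pipeline lives in Control Hub (the Platform SDK; older SDK versions authenticate differently). The pipeline name, stage label, and property name are placeholders, so check them against the SDK reference for your version. Publishing a new commit from the SDK is what makes the change visible in the UI; the running job then has to be restarted or upgraded to pick up the new version.

```python
from streamsets.sdk import ControlHub

# Hypothetical credentials and names - replace with your own.
sch = ControlHub(credential_id='MY_CRED_ID', token='MY_TOKEN')

pipeline = sch.pipelines.get(name='my_pipeline')   # fetch the existing pipeline
stage = pipeline.stages.get(label='Trash 1')       # hypothetical stage label

# Stage configuration properties are exposed as attributes on the stage object;
# the exact attribute name depends on the stage type, e.g.:
# stage.some_property = 'new value'

sch.publish_pipeline(pipeline, commit_message='Updated from SDK')  # new commit, visible in the UI
```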
Hi, we know an SDC job can be invoked through an HTTP endpoint, but can a set of parameters be passed as input? Effectively we want the following: originating system X invokes an SDC job with a set of parameters > the SDC job looks up a database table with the passed parameters > it returns the results to a different external system Y [note: not the originating system X] > it sends a response back to the originating system with a pass/fail status. Any demo pipeline would help. Are we advised to use the REST Service origin, per https://docs.streamsets.com/portal/controlhub/latest/help/datacollector/UserGuide/Microservice/Microservice_Title.html? Regards, Anirban
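As a rough illustration of the microservice approach: if the pipeline starts with a REST Service origin, the originating system simply POSTs its parameters as the request payload, the pipeline can use them (for example in a JDBC Lookup), and a Send Response to Origin destination returns the pass/fail status. The host, port, path, application ID, and field names below are all assumptions.

```python
import requests

# Hypothetical endpoint exposed by the microservice pipeline's REST Service origin.
ENDPOINT = "http://sdc-host:8000/lookup"
HEADERS = {"X-SDC-APPLICATION-ID": "my-app-id"}  # only needed if an application ID is configured

payload = {"customer_id": "12345", "region": "EMEA"}  # parameters for the lookup

resp = requests.post(ENDPOINT, json=payload, headers=HEADERS, timeout=30)
resp.raise_for_status()

# Whatever the pipeline's "Send Response to Origin" stage emits comes back here,
# e.g. a pass/fail status for the originating system.
print(resp.status_code, resp.json())
```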
I am reading a fixed-width file from S3 and trying to parse it with a Jython Evaluator in Data Collector. When I try to run the pipeline, I get the issue below, even though I am reading the file with the UTF-8 option: Script error while processing batch: javax.script.ScriptException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Cannot create PyString with non-byte value in <script> at line number 60
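For what it's worth, this error usually appears when the script forces a Java string containing non-Latin-1 characters into a Jython byte string (PyString). A hedged sketch of one workaround is to keep the value as unicode while slicing the fixed-width columns; the field name /text and the column offsets below are assumptions.

```python
for record in records:
    try:
        # Keep the raw line as unicode so multi-byte UTF-8 characters are not
        # forced into a byte-oriented PyString.
        line = unicode(record.value['text'])

        # Hypothetical fixed-width layout - adjust the offsets to your file.
        record.value['machine_id'] = line[0:10].strip()
        record.value['reading'] = line[10:25].strip()

        output.write(record)
    except Exception as e:
        error.write(record, str(e))
```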
Hi, I'm trying to fetch data from Shopify using the HTTP Client origin with pagination set to "Link in HTTP Header". Per Shopify (see https://shopify.dev/api/usage/pagination-rest#link-headers), the "Link" header that comes back contains the page_info parameter that we should use when querying the next page. With the StreamSets HTTP Client origin, the "Link in HTTP Header" pagination option is not working; I think this is because SDC expects the parameter name to be "page". Has anybody faced this before, and if so, how did you make progress? Can anyone from StreamSets comment on this?
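For comparison, this is the pagination behaviour Shopify expects: the next request simply reuses the full URL (including page_info) returned in the Link header. A small requests sketch, with a hypothetical shop, API version, and access token:

```python
import requests

# Hypothetical shop and token - replace with your own.
BASE = "https://my-shop.myshopify.com/admin/api/2023-04/products.json?limit=250"
HEADERS = {"X-Shopify-Access-Token": "shpat_xxx"}

url = BASE
while url:
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    for product in resp.json().get("products", []):
        print(product["id"])
    # Shopify returns the full next-page URL (with page_info) in the Link header;
    # requests exposes the parsed header via resp.links.
    url = resp.links.get("next", {}).get("url")
```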
How do I fix an error saying my syntax does not correspond with my MariaDB/MySQL version? First error: SQLState: 42000, Error Code: 1064.
My StreamSets load gets this error, but I don't know how to fix it. In detail, I have pasted the error and the pipeline image here. Many thanks for helping. com.streamsets.pipeline.api.StageException: JDBC_77 - SQLSyntaxErrorException attempting to execute query 'SELECT ID AS product_id, AMOUNT AS amount, MODIFIED_DATE FROM TBL_PRODUCT WHERE MODIFIED_DATE > 2021-12-27 17:31:43 ORDER BY MODIFIED_DATE'. Giving up after 1 errors as per stage configuration. First error: SQLState: 42000 Error Code: 1064 Message: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '17:31:43 ORDER BY MODIFIED_DATE' at line 6. This error appeared after I validated the pipeline and it had run; the MariaDB database received the full data successfully, but then this error came up.
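For context, MariaDB rejects this query because the datetime offset is interpolated into the WHERE clause without quotes, so 2021-12-27 17:31:43 is parsed as a number followed by stray tokens. A sketch of the usual fix is to quote the offset expression in the origin's query; the ${OFFSET} placeholder below follows the JDBC Query Consumer convention, so adjust it to however your origin injects the offset:

```sql
SELECT ID AS product_id, AMOUNT AS amount, MODIFIED_DATE
FROM TBL_PRODUCT
WHERE MODIFIED_DATE > '${OFFSET}'   -- quoting makes the datetime a string literal
ORDER BY MODIFIED_DATE
```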
Hi - while processing a CSV file via the Directory origin, I encountered the following error: "SPOOLDIR_01 - Failed to process file '/tmp/out/customer_daily_incremental/CUSTOMER/CUSTOMER_20092021.csv' at position '177394': com.streamsets.pipeline.lib.dirspooler.BadSpoolFileException: java.io.IOException: (line 1138) invalid char between encapsulated token and delimiter". I now know that the record on line 1138 doesn't comply with the expected format (I see there is an additional double quote). But I don't want SDC to completely abort this file. Is there a configuration available where I can opt to log the record with an error message and proceed to the next line in the file?
The data pipeline is JDBC → Hive Metadata → Hadoop FS and Hive Metastore. Data moves from an Oracle database to the Hadoop file system. The schema is getting created, but there is no data in the tables. I tried changing the idle timeout setting to 10 seconds. Any help in this regard will be greatly appreciated.
We are ingesting a table using StreamSets which has a column containing XML data. We added an extra stage to create a hash value for all the records coming through. In our target we can see that all the XML data is getting suffixed with the hash value, in addition to the separate hash column. How do we get only the XML value in the XML column, without the hash value suffixed to it?
How can we check a condition in a Stream Selector for the case where the service gives no response because of some exception at the service end? If there is an exception at the service end, the HTTP Client shows a "No output records produced." message. Based on this, how can we branch further: if a success response comes, we move the record to Trash, and if the "No output records produced." message comes, we insert into our SQL table. How can we identify this situation in a Stream Selector, or is there another way?
I am reading a file from an S3 bucket and doing a lookup against data from a Postgres DB using a JDBC Lookup processor in Data Collector. The source has about 341k records, against 341 records in the Postgres DB. My observations are: 1. It is taking 30 minutes to process 50k records. 2. Some records are going to error even though a matching record is present in the DB. 3. I have tried enabling the local cache.
When Microsoft D365 data is landed in an Azure Data Lake, the CSV files do not contain a header record; the schema is actually stored in a separate JSON file. In SDC, is there a way to apply a schema (mainly, field names) to a pipeline, either manually (i.e. by supplying the field names) or by reading them from a separate file?
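One workaround sketch, in case it helps: a Jython Evaluator can rename the positional fields that the delimited parser produces when there is no header line. The field-name list below is a hypothetical stand-in for the names you would read out of the D365 schema JSON.

```python
# Hypothetical field names - in practice these could be loaded from the
# separate JSON schema file (e.g. in the init script) instead of hard-coded.
FIELD_NAMES = ['account_id', 'account_name', 'modified_on']

for record in records:
    try:
        new_value = {}
        # With no header line, the delimited parser names fields '0', '1', '2', ...
        for i, name in enumerate(FIELD_NAMES):
            new_value[name] = record.value[str(i)]
        record.value = new_value
        output.write(record)
    except Exception as e:
        error.write(record, str(e))
```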