After building a pipeline and running a job using the Python SDK, I am unable to find a way to schedule a job through the SDK, and there are no steps for this in the StreamSets documentation. Can anyone guide me on this?
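A minimal sketch of one possible answer, using the Platform SDK's scheduled-task builder against an assumed existing job named 'My Job'; exact method names and the cron format can vary across SDK versions, so verify against the SDK documentation for your release:

```python
from streamsets.sdk import ControlHub

# Connect to Control Hub (credential id and token are placeholders).
sch = ControlHub(credential_id='<credential id>', token='<auth token>')

# Look up the job to schedule ('My Job' is an assumed job name).
job = sch.jobs.get(job_name='My Job')

# Build a scheduled task that starts the job on a cron schedule.
task_builder = sch.get_scheduled_task_builder()
scheduled_task = task_builder.build(task_object=job,
                                    action='START',
                                    name='Nightly run of My Job',
                                    cron_expression='0 0 2 * * ?',  # assumed Quartz-style: 02:00 daily
                                    time_zone='UTC')

# Publish the task so Control Hub begins triggering it.
sch.publish_scheduled_task(scheduled_task)
```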
Hi team, I have two JSON requests. For a singleton: [{"a": "1", "b": "2", "c": "3"}], and for a batch: [{"a": "1", "b": "2", "c": "3"}, {"x": "4", "y": "5", "z": "6"}]. If a singleton comes, I need to pass that request through; if a batch comes, I need to pick the second request, and within it keep only the first value. I am using the JSON Generator. How do I convert the request into an array, and for a batch, how do I pick the first value from the multiple values?
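To make the requirement concrete, here is a plain-Python sketch of the selection logic (in a pipeline this could live in a Jython or Groovy evaluator; the function name is hypothetical):

```python
import json

def pick_request(payload: str) -> dict:
    """Return the record to forward: a singleton passes through as-is,
    and for a batch only the first element is kept."""
    records = json.loads(payload)  # both shapes arrive as a JSON array
    return records[0]

print(pick_request('[{"a": "1", "b": "2", "c": "3"}]'))
print(pick_request('[{"a": "1", "b": "2", "c": "3"}, '
                   '{"x": "4", "y": "5", "z": "6"}]'))
```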
When I use the Field Renamer to perform multiple functions on the same field, it shows an error. Is there an option to perform operations such as upper-casing and removing special characters from a field with one processor?
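One common approach is a single Expression Evaluator stage with nested string functions, e.g. ${str:toUpper(str:replaceAll(record:value('/name'), '[^a-zA-Z0-9 ]', ''))} (the field path /name is an assumption). A Python sketch of the same transformation, for clarity:

```python
import re

def normalize(value: str) -> str:
    # Strip special characters first, then upper-case -- the same nesting
    # an Expression Evaluator applies with str:replaceAll inside str:toUpper.
    return re.sub(r'[^a-zA-Z0-9 ]', '', value).upper()

print(normalize('acme-corp #42'))  # -> 'ACMECORP 42'
```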
Hi team, we observed that after creating a deployment with the Python SDK (version 5.1.0), we still have to run the "Get Install Script" output on our local VM to create the engine. Apart from this step, is there a way to create the engine using the Python SDK alone? Please provide the information. Thank you, Suresh_venkata
I am reading a file from an S3 bucket and doing a lookup against data from a Postgres DB with the JDBC Lookup processor in Data Collector. The source has about 341k records, against 341 records in the Postgres DB. My observations:
1. It takes 30 minutes to process 50k records.
2. Some records go to error even when a matching row is present in the DB.
3. I have tried enabling the local cache.
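Since the Postgres side holds only ~341 rows, one thing worth testing outside the pipeline is loading the whole lookup table once and joining in memory, which removes the per-record JDBC round trip. A hypothetical sketch (connection details, table and column names are placeholders):

```python
import psycopg2

# Pre-load the small lookup table once; ~341 rows fits easily in memory.
conn = psycopg2.connect(host='localhost', dbname='mydb',
                        user='me', password='secret')
with conn, conn.cursor() as cur:
    cur.execute('SELECT lookup_key, lookup_value FROM lookup_table')
    lookup = dict(cur.fetchall())

def enrich(record: dict) -> dict:
    # O(1) in-memory lookup. Mismatches like trailing whitespace or case
    # differences in the key are a common cause of false "no match" errors.
    record['lookup_value'] = lookup.get(record['lookup_key'].strip())
    return record
```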
I am executing a simple Transformer pipeline on an EMR cluster (6.5.0). The EMR logs show that the job succeeded, but the Transformer pipeline fails with the error "START_ERROR: Application has completed. Clearing staged files..". What could be the issue here? The pipeline logs are below:

2023-01-27 10:44:46,788 INFO Current Step Status is PENDING. Waiting for 30 seconds before checking status again.. EMRAppLauncher *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663 runner-pool-2-thread-29
2023-01-27 10:45:16,824 INFO Application started successfully. Current Status is RUNNING EMRAppLauncher *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663 runner-pool-2-thread-29
2023-01-27 10:45:16,824 INFO DataTransformerLauncher start method finished DataTransformerLauncher *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663 runner-pool-2-thread-29
2023-01-27 10:46:16,857 INFO
Hi, I have created a simple Transformer pipeline and am trying to run it on an EMR cluster. In the cluster configuration I have used the required configuration details. I am facing the issue below; can anyone help me with this?
Hello, please help if you can. I was given these instructions: "Please go through the installation steps for the common tarball version and build Data Collector pipelines based on the Basic and Extended Tutorials (find the links below). You can use your home computer/laptop, as we support a variety of operating systems (except Windows). After creating your account and logging in, select 'Download Data Collector,' then 'Linux Server - For production use,' and then 'Tarball - recommended'. This will allow you to download Data Collector version 3.22.3." I am using macOS and followed the instructions above, but I cannot seem to get it to work via the terminal. I am new to the community and to learning about StreamSets, and I was given a task that I can't get through. I need assistance installing SDC and building pipelines on macOS. Please help.
One of the pipelines we created immediately switches to the "FINISHED" state after we start it. This started happening after we did a "FORCE SHUTDOWN" of the pipeline with the CRON origin. It looks like the pipeline ID got registered somewhere with the CRON shutdown status and won't allow us to restart.

DEBUG Starting pipeline with offset: {} ProductionPipelineRunner
ERROR The Scheduler cannot be restarted after shutdown() has been called. SchedulerPushSource
org.quartz.SchedulerException: The Scheduler cannot be restarted after shutdown() has been called.
    at org.quartz.core.QuartzScheduler.start(QuartzScheduler.java:529)
    at org.quartz.impl.StdScheduler.start(StdScheduler.java:142)
How easy would it be to create a custom Scala or PySpark stage that can output an array of Spark DataFrames (like it can receive as input), rather than just one?
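For context, a runnable PySpark sketch of the contract such a stage would need — a hypothetical transform signature illustrating one input DataFrame fanning out to a list of outputs, not the actual Transformer stage API:

```python
from typing import List

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

def transform(inputs: List[DataFrame]) -> List[DataFrame]:
    # Split the first input into two outputs by a flag column, so one
    # incoming DataFrame fans out to an array of DataFrames.
    df = inputs[0]
    return [df.filter(F.col('flag') == 1), df.filter(F.col('flag') == 0)]

if __name__ == '__main__':
    spark = SparkSession.builder.master('local[1]').getOrCreate()
    df = spark.createDataFrame([(1, 'a'), (0, 'b')], ['flag', 'val'])
    for out in transform([df]):
        out.show()
```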