Producing single S3 output file
Hi, we are processing S3 files with a batch size of 1000, and we plan to store the output in S3 as well. Since our input file has 10,000 records, we are seeing 10 output files in S3. Per the client's requirement, we need to produce a single file. Is there a way to create a single S3 output file from StreamSets?
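StreamSets writes one output object per batch/file roll, so a common workaround is a post-processing step (for example, a script triggered after the pipeline finishes) that downloads the part objects and uploads a single merged one. A minimal sketch of the merge logic, assuming part keys end in a numeric index (the key pattern and the in-memory `objects` mapping are assumptions standing in for real S3 downloads):

```python
def merge_part_objects(objects):
    """Concatenate part payloads in part order.

    `objects` maps S3 keys like 'out/part-0' to their byte payloads
    (e.g. as downloaded with boto3); keys are sorted by their numeric
    suffix so records keep their original order in the merged file.
    """
    def part_index(key):
        return int(key.rsplit("-", 1)[-1])
    return b"".join(objects[k] for k in sorted(objects, key=part_index))
```

The merged payload can then be uploaded as one object and the parts deleted.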
How to pass the table name as an attribute when producing events from Hadoop FS?
My pipeline is configured to pick data from a JDBC Multitable Consumer origin and write it to a Hadoop FS destination. My requirement is to rename the output file at the destination as TableName_TimeStamp. I am able to produce the timestamp using an Expression Evaluator. How do I get the table name in the event passed from Hadoop FS, so that it can be used in the HDFS File Metadata executor?
Hi, I have a product table named 'product' in MySQL as follows:

product_id | Product | FieldName
1          | Milk    | milk
2          | Water   | water
3          | Coffee  | coffee

Then I have a fully de-normalized source table named 'raw_transaction' as follows:

transaction_Id | Date     | customer | milk | water | coffee
1              | 1/1/2021 | John     | 1    |       |
2              | 1/1/2021 | Mary     | 1    | 1     |
3              | 1/1/2021 | Anna     |      |       | 1

Can you give me a hint on how to create a pipeline in StreamSets that uses the product table as metadata to build a dynamic query, so that I can populate a 'FactCustomerProduct' table as follows:

For each product in products:
INSERT INTO FactCustomerProduct (product_id, date_id, customer_id, transaction_id, quantity)
SELECT p.product_id, r.date_id, customer_id, r.transaction_id, r.<fieldName>
FROM 'raw_transaction' r [...] WHERE r.<fiel
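One way to drive such a dynamic query is to read the 'product' rows and generate one INSERT ... SELECT per product, for example from a scripting stage or an external script feeding a JDBC executor. A minimal Python sketch, assuming the table and column names from the question (the `IS NOT NULL` filter is a placeholder for the truncated WHERE clause):

```python
# Product metadata as it would be read from the 'product' table.
products = [
    (1, "Milk", "milk"),
    (2, "Water", "water"),
    (3, "Coffee", "coffee"),
]

def build_insert(product_id, field_name):
    """Generate one INSERT ... SELECT statement for a single product column."""
    return (
        "INSERT INTO FactCustomerProduct "
        "(product_id, date_id, customer_id, transaction_id, quantity) "
        f"SELECT {product_id}, r.date_id, r.customer_id, r.transaction_id, r.{field_name} "
        "FROM raw_transaction r "
        f"WHERE r.{field_name} IS NOT NULL"
    )

statements = [build_insert(pid, field) for pid, _name, field in products]
```

Each generated statement can then be executed in turn against the warehouse.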
ORA-00942 While running Oracle CDC on 19c Databases
The configuration was done according to the linked Oracle CDC documentation: https://docs.streamsets.com/portal/datacollector/3.17.x/help/datacollector/UserGuide/Origins/OracleCDC.html#concept_rs5_hjj_tw but we run into the error below:

JDBC_52 - Error starting LogMiner
Caused by: com.streamsets.pipeline.api.StageException: JDBC_603 - Error while retrieving LogMiner metadata: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
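ORA-00942 while retrieving LogMiner metadata usually means the CDC user cannot see the LogMiner V$ views. A sketch of the grants commonly required on 19c (object names are from Oracle's documentation; `sdc_user` is a placeholder, and the exact list should be verified against the StreamSets docs for your version):

```sql
-- Privileges the Oracle CDC user typically needs for LogMiner on 19c.
GRANT EXECUTE ON DBMS_LOGMNR TO sdc_user;
GRANT LOGMINING TO sdc_user;            -- required on 12c and later
GRANT SELECT ON V_$DATABASE TO sdc_user;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO sdc_user;
GRANT SELECT ON V_$LOGMNR_LOGS TO sdc_user;
GRANT SELECT ON V_$ARCHIVED_LOG TO sdc_user;
```

In a multitenant (CDB/PDB) setup the user and grants generally belong in the container the docs specify, which is a frequent source of this error.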
How does JDBC destination handle fields without matching columns?
How does the JDBC destination handle fields without matching columns? I am encountering a situation where fields without matching columns appear to be ignored, but this is not defined in the documentation and seems counter to what I would expect (an exception complaining that a column does not exist in the table). Please provide a deeper description of what is happening here.
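For reference, the behavior described above (unmatched fields silently dropped) amounts to a simple name-based filter; this sketch only illustrates the observed behavior, not StreamSets' documented or internal logic:

```python
def map_record_to_row(record, table_columns):
    """Keep only fields whose names match a table column; silently drop the rest.

    Illustration of the *observed* behavior described in the question;
    the real destination logic is StreamSets-internal.
    """
    return {name: value for name, value in record.items() if name in table_columns}

# A record with an extra field that has no matching column:
row = map_record_to_row({"id": 1, "name": "a", "extra": "x"}, {"id", "name"})
```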
Error using a directory created in IIS (Windows) and published via FTPS
There is a directory created in IIS (Windows) and published via FTPS. When trying to use this directory in StreamSets with the SFTP/FTP/FTPS component, it returns the following error: "REMOTE_11 - Unable to connect to remote host 'ftps://ftps.hostname.net:921/PLV' with given credentials. Please verify if the host is reachable, and the credentials and other configuration are valid. The logs may have more details. Message: Could not list the contents of "ftps://ftps.hostname.net:921/PLV" because it is not a folder. : conf.remoteConfig.remoteAddress" The credentials are fine, since the same directory opens with the LFTP client on Linux. Is some StreamSets configuration missing to fix this problem?
Dynamically creating folders in S3 using StreamSets
I'm using Oracle as a source and S3 as the destination. I'm ingesting records from the source and adding the table name as a column through an Expression Evaluator. I want to use this table name to create a folder in S3 dynamically before dropping those records into the S3 bucket, so the folder name is derived at runtime from the table name fetched by StreamSets. What should the approach be? For example, if I'm fetching records from table abc, I need to create a folder called "abc" and drop all the records inside that folder.
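Since S3 has no real directories, "creating a folder" just means writing objects under a key prefix; in the S3 destination this is typically achieved by putting a record expression such as `${record:value('/table')}` in the prefix/partition configuration (the exact configuration field name varies by version, so treat that as an assumption to verify). The resulting key layout can be sketched as:

```python
def object_key(table_name, file_name):
    """Build an S3 object key whose leading 'folder' is the table name.

    S3 folders are just key prefixes: writing 'abc/part-0000' makes an
    'abc/' folder appear in the console automatically.
    """
    return f"{table_name.strip()}/{file_name}"
```

No separate "create folder" step is needed; writing the first object under the prefix is enough.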
Find the status of a job using REST API
Hi, I have triggered a job using the REST API in StreamSets, and I need to know whether the job failed or succeeded. I am able to get the status of the job, but I see the status as Inactive in both the success and the failure case. I need the exact outcome, such as success or failure, plus the error message when the job fails. Please let me know how I can achieve this in StreamSets.
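Control Hub reports a job as INACTIVE once it stops regardless of outcome, so the extra detail usually has to come from other fields of the status payload. The field names below (`status`, `color`, `errorMessage`) are assumptions to verify against the REST API response for your Control Hub version; a sketch of classifying such a payload:

```python
def job_outcome(status_json):
    """Classify a Control Hub job-status payload into success/failure/running.

    Assumes a payload with 'status' and 'color' fields, where an
    INACTIVE status combined with a RED color indicates failure.
    """
    status = status_json.get("status")
    color = status_json.get("color")
    if status == "INACTIVE" and color == "GREEN":
        return "success"
    if status == "INACTIVE_ERROR" or color == "RED":
        return ("failure", status_json.get("errorMessage"))
    return "running"
```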
How to check the retry records in the HTTP Client
Can you please help me with the points below? How can we find out how many times StreamSets has retried failed records, and what their data is? What value should we give in the Base Backoff Interval field, and what other settings do we have to configure? I ask because the incoming data to StreamSets does not match the processed record count, and the difference between the two keeps increasing. Can you please suggest something on this?
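On the backoff question: the base backoff interval is the starting delay between retries, and with an exponential backoff policy each subsequent retry waits roughly twice as long (a common scheme; StreamSets' exact formula for the HTTP Client stage may differ, so treat this as a sketch):

```python
def backoff_delays(base_ms, max_retries):
    """Exponential backoff schedule: base, 2*base, 4*base, ...

    Shows why a small base interval can still produce long total waits
    as the retry count grows.
    """
    return [base_ms * (2 ** attempt) for attempt in range(max_retries)]
```

So a 1000 ms base with 4 retries waits 1 s, 2 s, 4 s, then 8 s, which can make processed counts lag behind incoming counts while retries are in flight.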
Not all files are copied from one folder to another within the same S3 bucket
Hi, I have tried to copy all the files from one folder to another folder within the same S3 bucket using a StreamSets job, but only one or two files are copied to the destination folder compared to the source folder (for example, if there are 7 files in the source folder, I see only 1 or 2 files in the destination folder). Can anyone help me with this issue? Thanks, Murali
How to know which values are valid for the Field Remover action in the SDK
In Control Hub I can see which values are available for the Field Remover action, but how do I find them in the SDK? field_remover = pipeline_builder_14.add_stage('Field Remover') — for field_remover.action, how do I discover the allowed values through the SDK? Thanks, Ashok.
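One generic way to discover accepted values in the SDK is plain Python introspection: stage attributes are ordinary Python properties, and their docstrings often list the choices, so `help(field_remover)` can show them (this relies on how the SDK documents attributes, not on a dedicated API). A sketch using a stand-in class, since the real SDK object isn't available here:

```python
class FakeStage:
    """Stand-in for an SDK stage object such as Field Remover."""
    @property
    def action(self):
        """Action to perform. Choices: REMOVE, REMOVE_NULL, KEEP."""
        return None

def attribute_doc(stage, attribute):
    """Read the docstring of an attribute's property, where the
    accepted values are often listed; help(stage) prints the same text."""
    prop = getattr(type(stage), attribute, None)
    return getattr(prop, "__doc__", "") or ""
```

With the real SDK, `attribute_doc(field_remover, 'action')` (or simply `help(field_remover)`) would be the equivalent call.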
Lookups (into Delta tables) delivering extremely poor performance when used in Transformer
Lookups into a Delta table give extremely poor performance (sometimes the pipeline stays in the pre-execution stage forever) when used in Transformer with an origin of 1,000 records, although they work decently in streaming mode, which I guess is due to the lower number of incoming records.
I want to extract multiple fields from JSON/XML using the XML Parser
I want to extract multiple fields from JSON/XML using the XML Parser. I am able to extract them with Groovy, but I want to achieve it without scripting. I am reading a file from S3 with the data format set to XML, and in step 2 I want to extract multiple fields from XML such as:

<body><head>1</head><m>3</m><tail>2</tail></body>

In step 2 I want two values in my output, using only the XML Parser or Field Mapper, without any Groovy. As of today I can extract only one value, e.g. /body/head, but I want to extract both /body/head and /body/tail.
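For reference, the extraction itself is the equivalent of mapping two element paths to output fields; the Python standard-library version of that logic looks like this (the XML Parser/Field Mapper configuration needed to get the same result in StreamSets is version-dependent and not shown here):

```python
import xml.etree.ElementTree as ET

def extract_fields(xml_text, paths):
    """Extract the text of multiple elements by path relative to the root."""
    root = ET.fromstring(xml_text)
    out = {}
    for path in paths:
        el = root.find(path)
        out[path] = el.text if el is not None else None
    return out

# The sample document from the question, with both values extracted at once.
record = extract_fields("<body><head>1</head><m>3</m><tail>2</tail></body>",
                        ["head", "tail"])
```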
View job version differences for same pipeline
I’m looking for a tool to help prevent missed job version updates when moving code from one environment to another (development / UAT / production). If we update a pipeline and job in our development environment but move ancillary code to our test environment a week later, I’ve seen that it is easy for our team to miss the test environment job update, resulting in wasted testing time. It would be helpful to see at a glance the differences in job versions before a deployment, to validate that we have the correct list of jobs to update as part of that deployment. I see the REST API and could put together a script to compare versions, but I was wondering if there is any way we could see that visually in Control Hub, or even a way to build a pipeline or report that could be run to give this information. Thanks in advance! -Spyder
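Until something visual exists, the REST-API script can be quite small: collect a job-name to pipeline-version map from each environment and diff them (the endpoints and response fields used to build those maps are assumptions to verify against the Control Hub API docs). A sketch of the comparison step:

```python
def version_diff(dev_jobs, test_jobs):
    """Return job names whose pipeline version differs between environments.

    Each argument maps job name -> pipeline version string, e.g. as
    collected from the Control Hub REST API for one environment.
    """
    return sorted(
        name for name, version in dev_jobs.items()
        if test_jobs.get(name) != version
    )
```

Running this before each deployment yields the exact list of jobs still needing an update in the target environment.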