I want to read a file in the middle of the pipeline instead of from the origin. Can we do that using a Groovy script?
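For illustration, here is a minimal sketch of reading a file from inside a Groovy Evaluator processor. It assumes the classic scripting bindings (records, output, error), and the field names /filePath and /fileContent are hypothetical; adjust them to whatever your records actually carry.

```groovy
// Sketch for a Groovy Evaluator processor (classic scripting bindings assumed).
// '/filePath' and '/fileContent' are hypothetical field names.
for (record in records) {
  try {
    def path = record.value['filePath']            // field holding the file location
    def content = new File(path).getText('UTF-8')  // read the file from the engine's filesystem
    record.value['fileContent'] = content          // attach the contents to the record
    output.write(record)
  } catch (e) {
    error.write(record, e.toString())
  }
}
```

Note that the file has to be readable from the machine the engine runs on, since the script executes there rather than at the origin.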
Hi Team, I am looking to append a date and time timestamp to the end of each file name, in the format <filename>_YYYYMMDDHH24MM.txt. Can someone help me achieve this in a shell script? (It needs to work for different time zones, e.g. EST, IST.)
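For illustration, a minimal shell sketch under these assumptions: the files live in a single directory, and the time zone is selected via the TZ variable (America/New_York for EST/EDT, Asia/Kolkata for IST). The directory path is hypothetical.

```bash
#!/bin/bash
# Append a _YYYYMMDDHH24MM timestamp before the .txt extension.
# TZ selects the zone the timestamp is taken in.
ts=$(TZ="America/New_York" date +%Y%m%d%H%M)

for f in /path/to/files/*.txt; do   # hypothetical directory
  base="${f%.txt}"                  # strip the .txt extension
  mv -- "$f" "${base}_${ts}.txt"    # e.g. report.txt -> report_202405141230.txt
done
```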
I am getting an error from the Excel parser while reading a file from SFTP. It should parse the file and read records, but instead it is reading the whole file. Initially it ran fine without any issue; it seems an empty Excel file arrived and could not be parsed.

org.apache.poi.openxml4j.exceptions.InvalidFormatException: Your InputStream was neither an OLE2 stream, nor an OOXML stream
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:186)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:149)
    at com.streamsets.pipeline.lib.parser.excel.WorkbookParserFactory.open(WorkbookParserFactory.java:70)
    at com.streamsets.pipeline.lib.parser.excel.WorkbookParserFactory.createParser(WorkbookParserFactory.java:58)
    at com.streamsets.pipeline.lib.parser.excel.WorkbookParserFactory.getParser(WorkbookParserFactory.java:53)
    at com.streamsets.pipeline.lib.parser.WrapperDataParserFactory.getParser(WrapperDataParserFactory.java:66)
    at com.streamsets.pipeline.stage.origin.remote.Rem
Hi team, we are using StreamSets to load data into a Snowflake table. We configured a common internal stage for the pipeline, as well as target tables. Can we confirm the mechanism for such pipelines: is the data first written as files and put into the Snowflake internal stage, and then a COPY command triggered from the internal stage into the target Snowflake tables, like "copy into mytable …"? And is that from a user stage, a table stage, or a named stage? Can StreamSets load data to an external stage and then copy it into the table? Is there any cost difference between loading to an external stage versus an internal stage with StreamSets? I suppose it is just the difference between Snowflake storage cost and external cloud storage cost, since external cloud storage is a bit cheaper than Snowflake internal storage. Is there any compute resource difference when loading to internal versus external stages? Thanks.
Hi, the above pipeline reads the CSV files uploaded into an S3 bucket for new contact file uploads. Based on the new contacts available in the CSV, it converts them into JSON. The critical part is that it merges each batch (batch size is 1,000) into a single JSON of multiple records and finally makes a single HTTP POST call to one of our internal systems to carry out a CONTACT update call.
Hello StreamSets community. StreamSets is installed through Cloudera Manager, but there are no graphs showing the JVM memory usage. Because of that, I need to know how to query the JVM memory usage of StreamSets from the command line.
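For illustration, the standard JDK command-line tools can report a JVM's memory usage once you know the Data Collector process ID. The pgrep match string below is an assumption (a typical SDC process reports DataCollectorMain as its main class); verify it against what jps or ps shows on your host.

```bash
# Find the Data Collector JVM's PID (match string is an assumption; verify with `jps -l`).
pid=$(pgrep -f DataCollectorMain)

# Heap and GC statistics, sampled every 5 seconds:
jstat -gc "$pid" 5000

# One-shot heap summary:
jcmd "$pid" GC.heap_info
```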
Hi Team, how do I run a StreamSets job in debug mode?
Hi! I have a pipeline that uses a JDBC Multitable Consumer origin, and it populates the jdbc.tables header attribute, but when I swap the origin for a JDBC Query Consumer the attribute is no longer populated. Any advice on why this might be happening?
Hello, I’m new to StreamSets. I started by creating an account in StreamSets Control Hub. In the process, I created a test environment and a deployment, but I don’t know how to do the step below:

“After creating a self-managed deployment, you set up a machine that meets the engine requirements. The machine can be a local on-premises machine or a cloud computing machine. Then, you manually run the engine installation script to install and launch an engine instance on the machine.”

Can you please suggest how to do this?
Hi, running the PostgreSQL Metadata processor and the JDBC Producer in a pipeline with the Oracle CDC Client, I run into the error below when adding a column to a table on the source side. Checking the destination table, I found that the Metadata processor altered the target table perfectly, but it seems the JDBC Producer does not recognize the new field: it assumes that the record, consisting of two columns (the primary key column and a value for the newly added column), does not carry any relevant data, and raises the error. I attach a screenshot with the pipeline and data. Maybe someone has an idea what’s wrong? I checked the articles about data drift and cannot find any hint about what I may have configured wrongly. Any help is greatly appreciated! Here is the stack trace:

com.streamsets.pipeline.api.base.OnRecordErrorException: JDBC_90 - Record doesn't have any columns for table 'bunkerstreamsets.s_product'
    at com.streamsets.pipeline.lib.jdbc.JdbcGenericRecordWriter.processQueue(JdbcGener
Is there a way we can use an RSA public and private key pair to encrypt and decrypt the contents of a file in a pipeline?
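For illustration, a minimal Groovy sketch using only the standard java.security / javax.crypto APIs (it generates a throwaway key pair; in practice you would load your existing keys). Keep in mind that plain RSA can only encrypt a payload smaller than the key size (roughly 245 bytes for a 2048-bit key with PKCS#1 padding), so encrypting whole files is usually done by encrypting the data with AES and wrapping only the AES key with RSA.

```groovy
import javax.crypto.Cipher
import java.security.KeyPairGenerator

// Throwaway key pair for the sketch; in practice load your own public/private keys.
def keyGen = KeyPairGenerator.getInstance('RSA')
keyGen.initialize(2048)
def pair = keyGen.generateKeyPair()

byte[] plain = 'small payload'.bytes

def enc = Cipher.getInstance('RSA/ECB/PKCS1Padding')
enc.init(Cipher.ENCRYPT_MODE, pair.public)
byte[] cipherText = enc.doFinal(plain)

def dec = Cipher.getInstance('RSA/ECB/PKCS1Padding')
dec.init(Cipher.DECRYPT_MODE, pair.private)
assert new String(dec.doFinal(cipherText)) == 'small payload'
```

The same pattern could run from a Groovy Evaluator in the pipeline, or be written in Java in a custom processor.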
Is there a way to write header rows first, like the ones in the yellow rows in the screenshot, before adding the needed columns afterwards in a CSV file? After the 4th row the data should be displayed as a normal CSV file; however, I cannot figure out how to write the first 3 rows.
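For illustration only, one simple post-processing approach outside the pipeline (not necessarily how you would do it inside a StreamSets stage) is to write the preamble rows first and append the normal CSV below them. The row contents and file names below are hypothetical, since the screenshot is not available here.

```bash
# Hypothetical preamble rows and file names.
{
  printf 'Report Title,,\n'
  printf 'Generated For,ACME,\n'
  printf 'Period,2024-05,\n'
  cat normal_data.csv          # the regular CSV: header row + data rows
} > final_report.csv
```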