E2E integration testing
We want to create an integration test suite and run it in a self-contained world of headless containers. The high-level flow:

- docker-compose starts all containers (StreamSets, Kafka)
- configure StreamSets
- send data to a Kafka topic
- StreamSets processes the data in the topic and sends it to other topics
- the application under test processes the data (fails / passes): this is the test
- tear down

Question: How do I configure a StreamSets pipeline without using the UI?
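One common approach is to design the pipeline once in the UI, export it to JSON, and have the test harness import it headlessly through SDC's REST API (the UI's Help menu links to the RESTful API reference). A minimal compose sketch for the container world, assuming Confluent's public Kafka images and the public streamsets/datacollector image (image tags and ports are my assumptions, not a tested recipe):

```yaml
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  sdc:
    image: streamsets/datacollector:latest
    ports:
      - "18630:18630"   # UI/REST; the test harness pushes the exported pipeline JSON here
```

From the harness, the exported pipeline JSON can then be posted to SDC's pipeline import endpoint and started the same way (check the exact paths for your SDC version under Help > RESTful API before relying on them).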
Can't parse XML element names containing colon ':'
Trying to read XML in the following format, with ":" in the element names, throws a Can't parse XML element names containing colon ':' error.

<sh:root>
  <sh:book> </sh:book>
  <sh:genre> </sh:genre>
  <sh:id> </sh:id>
  <sh:book> </sh:book>
  <sh:genre> </sh:genre>
  <sh:id> </sh:id>
  <sh:book> </sh:book>
  <sh:genre> </sh:genre>
  <sh:id> </sh:id>
</sh:root>

What's the best way to read such XML? (Note that changing ":" to "_" in the XML works.)
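For what it's worth, the colon makes a parser treat `sh` as an XML namespace prefix, and namespace-aware parsers reject prefixes that are never declared with an `xmlns:sh` attribute on an enclosing element. A quick illustration of the same XML rule with Python's stdlib parser (not SDC itself):

```python
import xml.etree.ElementTree as ET

undeclared = "<sh:root><sh:book/></sh:root>"
declared = '<sh:root xmlns:sh="http://example.com/sh"><sh:book/></sh:root>'

# Without a namespace declaration the prefix 'sh' is unbound and parsing fails.
try:
    ET.fromstring(undeclared)
    parsed_undeclared = True
except ET.ParseError:
    parsed_undeclared = False

# With xmlns:sh declared on the root, the same document parses fine.
root = ET.fromstring(declared)

print(parsed_undeclared)  # False
print(root.tag)           # {http://example.com/sh}root
```

So rather than renaming ":" to "_", adding the `xmlns:sh="..."` declaration to the root element (or having the producing system emit it) should let the document parse.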
Enabling Kerberos: javax.security.auth.login.LoginException: Unable to obtain password from user
I'm trying to enable Kerberos for my SDC RPM installation, but when I start SDC I get the following exception:

java.lang.RuntimeException: Could not get Kerberos credentials: javax.security.auth.login.LoginException
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
    at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)

How do I move forward?
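For what it's worth, this particular LoginException usually means the Krb5LoginModule could not read a keytab for the configured principal and fell back to prompting for a password. A sketch of the relevant `sdc.properties` entries, with placeholder principal/realm/keytab values (verify the keytab contents with `klist -kt` and make sure the user running SDC can read the file):

```
# $SDC_CONF/sdc.properties (placeholder values, adjust for your realm)
kerberos.client.enabled=true
kerberos.client.principal=sdc/_HOST@EXAMPLE.COM
# a relative path here is resolved against the SDC configuration directory
kerberos.client.keytab=sdc.keytab
```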
The configuration = was supplied but isn't a known config
I have a StreamSets Data Collector running in Docker, and when I run a pipeline with a Kafka Consumer I see these error messages:

The configuration = was supplied but isn't a known config
The configuration schema.registry.url = was supplied but isn't a known config

How do I get past this error?
Failed to get driver instance with multiple JDBC connections
I have a few JDBC-based stages in my pipeline (origin, JDBC Lookup, etc.). When I try to replace the existing JDBC-based origin with another (for example, Oracle CDC with MySQL Binary Log), validation fails on just the lookup processors with a "Failed to get driver instance with multiple JDBC connections" error, even though I haven't changed anything on those processors. Here's the stack trace:

java.lang.RuntimeException: Failed to get driver instance for jdbcUrl=jdbc:oracle:thin:@connection_URL
    at com.zaxxer.hikari.util.DriverDataSource.<init>(DriverDataSource.java:112)
    at com.zaxxer.hikari.pool.PoolBase.initializeDataSource(PoolBase.java:336)
    at com.zaxxer.hikari.pool.PoolBase.<init>(PoolBase.java:109)
    at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:108)
    at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81)
    at com.streamsets.pipeline.lib.jdbc.JdbcUtil.createDataSourceForRead(JdbcUtil.java:875)
    at com.streams
Unable to write object to Amazon S3: The request signature we calculated does not match the signature you provided. Check your key and signing method.
I am trying to write to an Amazon S3 destination with its Authentication Method set to AWS Keys, but when I run the pipeline I get an "Unable to write object to Amazon S3: The request signature we calculated does not match the signature you provided. Check your key and signing method." error. Here's the stack trace:

Caused by: com.streamsets.pipeline.api.StageException: S3_21 - Unable to write object to Amazon S3, reason : com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch; Request ID: 12345678915ABCDE; S3 Extended Request ID: xyzxyzxyzxyzxyzxyz=; Proxy: null), S3 Extended Request ID: xyzxyzxyzxyzxyzxyz=
    at com.streamsets.pipeline.stage.destination.s3.AmazonS3Target.write(AmazonS3Target.java:182)
    at com.streamsets.pipeline.api.base.configurablestage.DTarget.write(DTarget.java:34)
    at com.streamsets.datacoll
Exception while using Windowing Aggregator
Hi, I'm currently using SDC 3.21 and I'm hitting the error that is also mentioned in this thread: https://issues.streamsets.com/plugins/servlet/mobile#issue/SDC-12129

Any suggestions on how to resolve this issue permanently? At present the only workaround I have is to restart StreamSets. I did that in the development (local) environment, but that's not an option in production.

Regards,
Swayam
Logging mechanism for Data Transformer and Data Collector pipelines
Is there a prebuilt processor/component that captures the number of records processed through stages and other logging events? We have a requirement to capture the number of records processed, along with other logging events, and possibly store them in log files or MySQL.
Unable to ingest large XML data (4 MB) from S3 to Snowflake (Data Collector)
While trying to ingest XML data from S3 into Snowflake, I am facing the error below:

S3_SPOOLDIR_01 - Failed to process object 'UBO/GSRL_Sample_XML.xml' at position '0': com.streamsets.pipeline.stage.origin.s3.BadSpoolObjectException: com.streamsets.pipeline.api.service.dataformats.DataParserException: XML_PARSER_02 - XML object exceeded maximum length: readerId 'com.dnb.asc.stream-sets.us-west-2.poc/UBO/GSRL_Sample_XML.xml', offset '0', maximum length '2147483647'

The size of the XML file is 4 MB. The properties used for the Amazon S3 component are attached, and I have already increased Max Record Length to its maximum:

- Max Record Length: 2147483647
- Data Format: XML

Can you please advise? Is there a size-related constraint? We have successfully loaded smaller files from S3 to Snowflake.
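One thing worth checking beyond the stage setting: as far as I know, SDC also enforces a Data Collector-wide parser buffer limit (roughly 1 MB by default) configured in `sdc.properties`, which would explain a 4 MB record failing even with the stage's Max Record Length maxed out. A sketch of the override (property name as I understand it from the SDC docs; restart SDC after changing it):

```
# $SDC_CONF/sdc.properties
# raise the overall parser buffer to ~8 MB (value in bytes); tune to your largest record
parser.limit=8388608
```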
XSLT support in Data Collector
Dear StreamSets, we have a requirement to transform complex XML data into JSON using XSLT. This needs to be done in Data Collector. The incoming file will contain millions of records, and for each record we need to apply an XSLT transformation and write the output to an S3 location. I could not find any resource on XSLT support in the Data Collector documentation. Could you please help me with this query?

Note 1: We also have a similar use case to transform JSON data into XML. Does StreamSets support the use of FreeMarker in a Data Collector pipeline?
Note 2: Both XSLT and FreeMarker use external Java functions to support the transformation.
Note 3: Both XSLT and FreeMarker templates are compiled once per run for better performance.

Regards,
Varadha
XML to JSON and derivations
Hi, I have an XML as shown below:

<events>
  <event>
    <type>online</type>
    <event_date>1-Jan-21</event_date>
    <feedback_status>Closed</feedback_status>
  </event>
  <event>
    <type>online</type>
    <event_date>1-Jan-20</event_date>
    <feedback_status>Closed</feedback_status>
  </event>
  <event>
    <type>online</type>
    <event_date>1-Aug-21</event_date>
    <feedback_status>Open</feedback_status>
  </event>
  <event>
    <type>offline</type>
    <event_date>1-Mar-21</event_date>
    <feedback_status>Closed</feedback_status>
  </event>
  <event>
    <type>offline</type>
    <event_date>1-Feb-20</event_date>
    <feedback_status>Closed</feedback_status>
  </event>
</events>
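Within Data Collector, the XML data format with `event` as the delimiter element plus a JSON-writing destination should cover the basic conversion. If a quick standalone check is useful, here is a sketch with Python's stdlib that turns the `<event>` elements above into a JSON array (field names taken from the sample):

```python
import json
import xml.etree.ElementTree as ET

xml_doc = """
<events>
  <event><type>online</type><event_date>1-Jan-21</event_date><feedback_status>Closed</feedback_status></event>
  <event><type>offline</type><event_date>1-Mar-21</event_date><feedback_status>Closed</feedback_status></event>
</events>
"""

root = ET.fromstring(xml_doc)
# one JSON object per <event>: child tag -> text
events = [{child.tag: child.text for child in event} for event in root]
print(json.dumps(events, indent=2))
```

From there, derivations such as filtering on feedback_status or picking the latest event_date per type are plain list/dict operations; on the SDC side the equivalent would be Stream Selector or Expression Evaluator stages.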
Standard out-of-box origins for SaaS apps
Hi StreamSets, I would like to know whether there is an initiative to introduce standard origins for various popular SaaS apps like Shopify, Magento, Branch, etc. If there is a place where we can vote for these connectors so they can be prioritised for development, please do share that information.
Gradle Build Problem with Golang go-gradle plugin
I have been using the legacy version until now and recently started facing this build issue. I was trying to build datacollector-edge-oss version 3.14 from source with the command:

gradlew goClean dist publishToMavenLocal --build-cache --stacktrace --info --scan

The build is failing as per the scan results: gradle scan link. It seems the Bitbucket API is not accessible, and a fork of the same inflect library is present at the volatile tech link. Kindly let me know how I can solve this.
Unable to Read Data Using S3origin in Data Transformer
Hi Team, I am facing an issue reading data through the S3 origin within Data Transformer; I am able to read data through the S3 origin in Data Collector. I am trying to read data from the S3 origin and copy it to a different location using the S3 destination, with EMR as the computing engine. The job runs for several minutes on EMR and completes successfully, and there is no error in the logs (both EMR and StreamSets pipeline logs). I do get the warning below, but I am not sure whether it is causing the issue:

java.nio.file.NoSuchFileException: /data/transformer/runInfo/testRun__9e731964-6f21-4956-99fa-82206f3451f5__149e11c1-f697-11eb-b9dc-fd846d33049d__56e36c1c-f8c6-11eb-9295-0fa62e75e081@149e11c1-f697-11eb-b9dc-fd846d33049d/run1630923519827/driver-topLevelError.log

I have verified the staging directory as well; all required files are getting populated there and are eventually read through spark-submit. In the end, the Transformer pipeline ends with status START_ERROR: Job completed successfully. This is a show stopper.
Any specific templates to write a blog
We are using 3.21 OSS StreamSets in a unique way: we always create pipelines as templates. We have put a custom UI in the customer's hands to pick their preferred source, and based on their choice we call the StreamSets REST APIs to create customer-specific pipelines in real time. That's awesome to me, and I believe this is a unique way of building pipelines that others may find interesting. I'm thinking of writing a blog and am curious whether StreamSets suggests following any specific templates.