Got a Question?
Trying to read XML in the following format, where element names contain ":", throws a "Can't parse XML element names containing colon ':'" error:

<sh:root>
  <sh:book> </sh:book>
  <sh:genre> </sh:genre>
  <sh:id> </sh:id>
  <sh:book> </sh:book>
  <sh:genre> </sh:genre>
  <sh:id> </sh:id>
  <sh:book> </sh:book>
  <sh:genre> </sh:genre>
  <sh:id> </sh:id>
</sh:root>

What's the best way to read such XML? (Note that changing ":" to "_" in the XML works.)
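The colon marks an XML namespace prefix, so a conforming parser needs an xmlns:sh declaration before it will accept these element names. A minimal Python sketch of reading such a document once the prefix is declared (the namespace URI here is a placeholder assumption):

import xml.etree.ElementTree as ET

# Without the xmlns:sh declaration the parser raises "unbound prefix",
# which is the same class of failure as the colon error above.
xml_doc = """<sh:root xmlns:sh="http://example.com/sh">
  <sh:book>B1</sh:book>
  <sh:genre>G1</sh:genre>
  <sh:id>1</sh:id>
</sh:root>"""

root = ET.fromstring(xml_doc)
ns = {"sh": "http://example.com/sh"}   # map the prefix to the same placeholder URI
for book in root.findall("sh:book", ns):
    print(book.text)

If the source system cannot be changed to declare the namespace, rewriting ":" to "_" before parsing, as the question notes, is a workable fallback.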
I'm trying to enable Kerberos for my SDC RPM installation, but when I start SDC I get the following exception:

java.lang.RuntimeException: Could not get Kerberos credentials: javax.security.auth.login.LoginEx
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
 at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
 at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
 at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)

How do I move forward?
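"Unable to obtain password from user" generally means the JVM fell back to prompting for a password because the configured keytab or principal could not be used. A minimal sketch (keytab path and principal are placeholder assumptions) that verifies the keytab itself works outside SDC:

import subprocess

keytab = "/etc/sdc/sdc.keytab"                   # placeholder keytab path
principal = "sdc/host.example.com@EXAMPLE.COM"   # placeholder principal

# kinit -kt obtains a ticket from the keytab without prompting; if this
# fails, the keytab/principal pair is the problem rather than SDC itself.
result = subprocess.run(["kinit", "-kt", keytab, principal],
                        capture_output=True, text=True)
print("OK" if result.returncode == 0 else "FAILED: " + result.stderr)

If kinit succeeds, re-check that the principal and keytab path configured for SDC match exactly what was tested.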
I have a StreamSets Data Collector running in Docker, and when I run a pipeline with a Kafka Consumer I see these error messages:

The configuration = was supplied but isn't a known config
The configuration schema.registry.url = was supplied but isn't a known config

How do I get past this error?
I have a few JDBC-based stages in my pipeline (origin, JDBC Lookup, etc.). When I try to replace the existing JDBC-based origin with another (for example, Oracle CDC with MySQL Binary Log), validation fails on just the lookup processors with a "Failed to get driver instance" error when multiple JDBC connections are present, even though I haven't changed anything on those processors. Here's the stack trace:

java.lang.RuntimeException: Failed to get driver instance for jdbcUrl=jdbc:oracle:thin:@connection_URL
 at com.zaxxer.hikari.util.DriverDataSource.<init>(DriverDataSource.java:112)
 at com.zaxxer.hikari.pool.PoolBase.initializeDataSource(PoolBase.java:336)
 at com.zaxxer.hikari.pool.PoolBase.<init>(PoolBase.java:109)
 at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:108)
 at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81)
 at com.streamsets.pipeline.lib.jdbc.JdbcUtil.createDataSourceForRead(JdbcUtil.java:875)
 at com.streams
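One way to narrow this down is to confirm that each driver and JDBC URL still works in isolation, outside the pipeline. A minimal Python sketch using jaydebeapi (driver class, URL, credentials, and jar path are all placeholder assumptions):

import jaydebeapi

# Verify the Oracle driver loads and the URL connects, independently of SDC.
conn = jaydebeapi.connect(
    "oracle.jdbc.OracleDriver",                  # driver class
    "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1",  # placeholder URL
    ["db_user", "db_password"],                  # placeholder credentials
    "/path/to/ojdbc8.jar",                       # path to the driver jar
)
curs = conn.cursor()
curs.execute("SELECT 1 FROM dual")
print(curs.fetchall())
conn.close()

Repeating the same check with the MySQL driver and URL shows whether one of the two drivers fails to load once both are in play.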
I am trying to write to an Amazon S3 destination with its Authentication Method set to AWS Keys, but when I run the pipeline I get an "Unable to write object to Amazon S3: The request signature we calculated does not match the signature you provided. Check your key and signing method." error. Here's the entire stack trace:

Caused by: com.streamsets.pipeline.api.StageException: S3_21 - Unable to write object to Amazon S3, reason : com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch; Request ID: 12345678915ABCDE; S3 Extended Request ID: xyzxyzxyzxyzxyzxyz=; Proxy: null), S3 Extended Request ID: xyzxyzxyzxyzxyzxyz=
 at com.streamsets.pipeline.stage.destination.s3.AmazonS3Target.write(AmazonS3Target.java:182)
 at com.streamsets.pipeline.api.base.configurablestage.DTarget.write(DTarget.java:34)
 at com.streamsets.datacoll
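SignatureDoesNotMatch usually points at the secret key itself (a typo, a rotated key, or stray whitespace copied along with it) rather than at the pipeline. A minimal boto3 sketch (bucket, key, and credential values are placeholders) that validates the same credentials outside SDC:

import boto3

# If this put_object also fails with SignatureDoesNotMatch, the
# credentials are the problem. All identifiers below are placeholders.
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIAEXAMPLE".strip(),        # strip() guards against copied whitespace
    aws_secret_access_key="secretEXAMPLE".strip(),
)
s3.put_object(Bucket="my-test-bucket", Key="connectivity-check.txt", Body=b"ok")
print("write succeeded")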
I want to run Data Collector on my machine. I have gone through some Git README.md files and some blogs; they say to download the tarball, but I am not able to find the URL to download it. Can someone help with this, or with some other way to download the tarball?
The data pipeline is JDBC → Hive Metadata → Hadoop FS and Hive Metastore. Data moves from an Oracle database to the Hadoop file system. The schema is getting created, but there is no data in the tables. I tried changing the idle timeout setting to 10 seconds. Any help in this regard will be greatly appreciated.
Hi, I'm currently using SDC 3.21 and I'm hitting the error that is also mentioned in this thread: https://issues.streamsets.com/plugins/servlet/mobile#issue/SDC-12129. Any suggestions on how to resolve this issue permanently? At present the only workaround I have is to restart StreamSets. I did that in the development (local) environment, but that's not an option in production.

Regards,
Swayam
While trying to ingest XML data from S3 into Snowflake, I'm facing the error below:

S3_SPOOLDIR_01 - Failed to process object 'UBO/GSRL_Sample_XML.xml' at position '0': com.streamsets.pipeline.stage.origin.s3.BadSpoolObjectException: com.streamsets.pipeline.api.service.dataformats.DataParserException: XML_PARSER_02 - XML object exceeded maximum length: readerId 'com.dnb.asc.stream-sets.us-west-2.poc/UBO/GSRL_Sample_XML.xml', offset '0', maximum length '2147483647'

The size of the XML file is 4 MB. The properties used for the Amazon S3 component are attached; I have also increased Max Record Length to its maximum.

S3 properties:
- Max Record Length: 2147483647
- Data Format: XML

Can you please advise? Is there a size-related constraint involved? We have successfully loaded smaller files from S3 to Snowflake.
Is there a prebuilt processor/component that captures the number of records processed through stages and other logging events? We have a requirement to capture the number of records processed, plus other logging events, and possibly store them in log files or MySQL.
Hi StreamSets, I would like to know whether there is an initiative to introduce standard origins for popular SaaS apps like Shopify, Magento, Branch, etc. If there is a space where we can vote for these connectors so they can be prioritised for development, please do share that information.
Dear Community users, SDC offers a great out-of-the-box processor, Field Type Converter, to convert an incoming field of one data type to another. This is a very useful transformation that is often required before we send data to the final destination. However, sometimes we have to write our own transformation function, and today I'm going to share a scenario where I was forced to use a Jython Evaluator to detect the date format of an incoming field and convert it to the single unified format our destination expects.

Problem statement: We need to process an incoming file where one field must be handled as a "date" data type and another as a "timestamp" data type before we generate the final feed for the destination. The destination accepts "date" fields in YYYY-MM-DD format and "timestamp" fields in YYYY-MM-DD HH:MI:SS GMT+05:30 format, while the origin fields can arrive in a variety of formats. The solution is outlined below.
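A minimal Python sketch of that approach for the "date" field, assuming a hypothetical list of candidate input formats (the post's exact Jython script is not reproduced here):

from datetime import datetime

# Candidate input formats are an assumption; extend the list as needed.
CANDIDATE_FORMATS = [
    "%d-%b-%y",    # e.g. 1-Jan-21
    "%m/%d/%Y",    # e.g. 01/01/2021
    "%Y/%m/%d",    # e.g. 2021/01/01
    "%Y-%m-%d",    # already in the unified form
]

def to_unified(value, out_format="%Y-%m-%d"):
    """Try each known format and re-emit the value in the unified format."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime(out_format)
        except ValueError:
            continue
    raise ValueError("unrecognized date format: %r" % value)

print(to_unified("1-Jan-21"))   # -> 2021-01-01

Inside a Jython Evaluator, the same function would be applied to the date field of each incoming record as the script iterates over the batch, with unparseable values routed to the error stream instead of raising.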
Hi, I have an XML as shown below:

<events>
  <event>
    <type>online</type>
    <event_date>1-Jan-21</event_date>
    <feedback_status>Closed</feedback_status>
  </event>
  <event>
    <type>online</type>
    <event_date>1-Jan-20</event_date>
    <feedback_status>Closed</feedback_status>
  </event>
  <event>
    <type>online</type>
    <event_date>1-Aug-21</event_date>
    <feedback_status>Open</feedback_status>
  </event>
  <event>
    <type>offline</type>
    <event_date>1-Mar-21</event_date>
    <feedback_status>Closed</feedback_status>
  </event>
  <event>
    <type>offline</type>
    <event_date>1-Feb-20</event_date>
    <feedback_status>Closed</feedback_status>
  </event>
</events>
Hi team, I am facing an issue reading data through the S3 origin within Data Transformer (I am able to read data through the S3 origin in Data Collector). I am trying to read data from an S3 origin and copy it to a different location using an S3 destination, with EMR as the compute engine. The job runs for several minutes on EMR and completes successfully, and there is no error in the logs (both the EMR and StreamSets pipeline logs). I do get the warning below, but I am not sure whether it is causing the issue:

java.nio.file.NoSuchFileException: /data/transformer/runInfo/testRun__9e731964-6f21-4956-99fa-82206f3451f5__149e11c1-f697-11eb-b9dc-fd846d33049d__56e36c1c-f8c6-11eb-9295-0fa62e75e081@149e11c1-f697-11eb-b9dc-fd846d33049d/run1630923519827/driver-topLevelError.log

I have verified the staging directory as well; all the required files seem to be populated there and are eventually read by spark-submit. In the end, the Transformer pipeline finishes with status START_ERROR: Job completed successfully. This is a show stopper.
I am using a legacy version and recently started facing this build issue. I was trying to build datacollector-edge-oss version 3.14 from source with the command

gradlew goClean dist publishToMavenLocal --build-cache --stacktrace --info --scan

and the build is failing per the scan results: gradle scan link. It seems the Bitbucket API is not accessible, and a fork of the same inflect library is present at volatile tech link. Kindly let me know how I can solve this.