StreamSets asking to connect to Control Hub or enter a registration code after upgrading to version 4.2.0.
Hi Team, To fix the Log4j vulnerability, I upgraded StreamSets from streamsets/datacollector:3.18.1 to streamsets/datacollector:4.2.0. After that, I am not able to create a new pipeline or import one. The user interface asks me to connect to Control Hub or enter an activation code, which was not the case in version 3.18.1.
Hi, I have tried copying all the files from one folder to another folder within the same S3 bucket using a StreamSets job. But I am seeing more files copied into the destination folder than are in the source folder (for example, if the source folder has 7 files, the destination folder ends up with 8, 10, or 12). This issue only occurs on the first run of the day; if I run the same job again later that day, the record count matches between source and destination. Can anyone help me with this issue? Thanks, Murali
I have declared the S3 data format as below, but in the UI/Control Hub the data format is showing blank.
pipeline_builder = sch.get_pipeline_builder(engine_type='data_collector', engine_url='XXXX')
s3_origin = pipeline_builder.add_stage('Amazon S3', type='origin')
s3_origin.data_format = 'Text'
How can I see the values allowed for any particular attribute of a component in the SDK (e.g. s3_origin.data_format, s3_origin.delimiter, etc.)?
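One way to explore what a stage object exposes is plain Python introspection rather than a StreamSets-specific API. The sketch below assumes the same Control Hub URL, credential ID, token, and engine URL placeholders used in the posts on this page.

```python
# Minimal sketch: build the stage as in the post above, then inspect it with
# standard Python introspection. The connection placeholders (<SCH URL>,
# <credential id>, <token>, <SDC URL>) are assumptions, not real values.
from streamsets.sdk import ControlHub

sch = ControlHub('<SCH URL>', credential_id='<credential id>', token='<token>')
pipeline_builder = sch.get_pipeline_builder(engine_type='data_collector',
                                            engine_url='<SDC URL>')
s3_origin = pipeline_builder.add_stage('Amazon S3', type='origin')

# List the public attributes the SDK exposes on this stage instance.
print(sorted(attr for attr in dir(s3_origin) if not attr.startswith('_')))

# Print any built-in documentation for the stage class.
help(type(s3_origin))
```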
Hi, I have created a pipeline in StreamSets Data Collector which reads data from an Apache Kafka topic and inserts it into a Databricks Delta table. Table auto-creation has been disabled; I created the table on the Databricks instance separately, and the columns and data types in the table are correct. But I am getting the following error while validating the pipeline. What could be the reason for the error?
Caused by: com.streamsets.pipeline.api.StageException: DELTA_LAKE_13 - Table 'gov_src_req_sts_v1' column '' unsupported type ''
at com.streamsets.pipeline.stage.destination.definitions.JdbcTableDefStore.get(JdbcTableDefStore.java:157)
at com.streamsets.pipeline.stage.destination.definitions.CacheTableDefStore$1.load(CacheTableDefStore.java:59)
at com.streamsets.pipeline.stage.destination.definitions.CacheTableDefStore$1.load(CacheTableDefStore.java:53)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
at com.google.common.cache.LocalCa
From the Python shell I am unable to launch the Data Collector. Below are the options I have tried.
1. from streamsets.sdk import DataCollector
   dc = DataCollector('https://localhost:18630')
   Error: 'NoneType' object has no attribute 'use_websocket_tunneling'
2. from streamsets.sdk import DataCollector, ControlHub
   sch = ControlHub(<SCH URL>, credential_id=<credential id>, token=<token>)
   pipeline_builder = sch.get_pipeline_builder(engine_type='data_collector', engine_url=<SDC URL>)
   In the step above I supplied engine_url from the active engine listed under the Engines tab after logging into StreamSets, and I am getting the error: instance is not in list
I am installing Control Hub and when I run 01-iitdb.sh, I get a DbUtils.DBType.POSTGRES error:
SLF4J: Found binding in [jar:file:/opt/streamsets-dpm/app-lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/streamsets-dpm/server-lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Error while running 'BuildSchemaCommand': java.lang.IllegalArgumentException: No enum constant com.streamsets.lib.util.DbUtils.DBType.POSTGRES
----------------------------------------
java.lang.IllegalArgumentException: No enum constant com.streamsets.lib.util.DbUtils.DBType.POSTGRES
at java.lang.Enum.valueOf(Enum.java:238)
at com.streamsets.lib.util.DbUtils$DBType.valueOf(DbUtils.java:180)
We are using the cron scheduler, for which we have to provide a cron expression. As per our requirement, the pipeline has to run every 15 minutes. Is the cron expression below correct for the pipeline to run every 15 minutes? If not, please suggest changes.
Cron expression used in the pipeline: 0 /15 * * * * *
When I do a test run of the pipeline it is not working. Can anyone explain how the cron scheduler works and whether the above cron expression is correct?
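For reference, here is a sketch of two common cron dialects for "every 15 minutes". Which dialect the StreamSets scheduler accepts is an assumption to verify against its documentation: Quartz-style expressions start with a seconds field, while classic UNIX cron starts with minutes.

```python
# Sketch only: two common cron dialects for "every 15 minutes". Which one the
# scheduler in use accepts is an assumption, not confirmed by the post above.
quartz_every_15_min = "0 0/15 * * * ?"   # seconds minutes hours day-of-month month day-of-week
unix_every_15_min = "*/15 * * * *"       # minutes hours day-of-month month day-of-week
```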
Hi, we are using StreamSets provisioned from the Google Cloud Marketplace. We are trying to create a data pipeline with a Kafka topic as the origin and Delta Lake as the destination. While setting it up, we observed that the "Staging Location" requires AWS S3 or Azure Storage; Google Cloud Storage and other alternatives are not available. We do not have an AWS or Azure account. Is it mandatory for any Delta Lake ingestion to have AWS or Azure storage, even though our application may run in neither of them?
Hey guys, in the DataOps Platform Fundamentals course, the video on Data Drift is missing: https://academy.streamsets.com/courses/dataops-platform-fundamentals/lessons/data-drift-2/ Same with CI/CD: https://academy.streamsets.com/courses/dataops-platform-fundamentals/lessons/implementing-pipeline-ci-cd/ Is there a video, or should I mark it complete and move on? Jim
Hello, we are getting an error in the Hadoop FS (HDFS) destination while writing a file. It works fine when we use the CDH 5.16 stage library, but fails with any higher version such as CDH 6.x or CDP 7.1. What may be causing this if the same setup works in SDC with the CDH 5.16 stage library? It also works fine with CDP 7.1 if we do not use an HDFS encrypted directory, since then it does not call the KMS provider. The error in the log states:
WARN KMS provider at [http://x.y.z:16000/kms/v1/] threw an IOException: java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:487)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:788)
When I am using the JDBC Producer to insert data into Sybase ASE, it throws an error: JDBC_17 - Failed to look up primary keys for table 'xxxxxx': SQLState: ZZZZZ Error code: 2762 Message: The 'CREATE TABLE' command is not allowed within a multi-statement transaction in the 'tempdb' database. SQLState: 42000 Error code: 208 Message: #keys not found. Please have a look and advise.
I am trying to use the Jython Evaluator for fuzzy matching. I need to compare a record from one source with a record from another in Jython code and emit only one record from the evaluator. So my question is: how do I access records (values) from two different sources at the same time in the Jython Evaluator? Requesting assistance.
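For context, a Jython Evaluator script only sees the records in its own input batch, so data from a second origin would have to be brought onto those records upstream (for example via a lookup or join) before it can be compared in the script. The sketch below shows that record-access pattern using the classic records/output/error bindings (newer Data Collector versions expose them under sdc.*); the field names and the match check are placeholders, not the actual fuzzy-matching logic.

```python
# Minimal sketch of a Jython Evaluator script body (classic bindings: records,
# output, error). Field names 'name' and 'lookup_name' and the equality check
# are placeholder assumptions; real fuzzy matching would replace the comparison.
for record in records:
    try:
        candidate = record.value.get('name')          # placeholder field from this batch
        reference = record.value.get('lookup_name')   # placeholder field added upstream
        if candidate is not None and candidate == reference:
            output.write(record)                      # emit only the matching record
    except Exception as e:
        error.write(record, str(e))                   # route failures to error handling
```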
Hello. In one of our pipelines, we end with a call to a RabbitMQ Producer node. The node publishes to a fanout exchange. Later, in another application, we create a listener queue on that same exchange and listen for messages on the newly created queue. This happens for each Docker container running our application that is listening on the fanout exchange. As a side effect of the RabbitMQ Producer node creating the exchange, it also creates a queue along with it. This results in a queue with no consumers receiving all of the messages sent to the exchange. We would like to prevent this default queue from being created. Is there a way to do that in the RabbitMQ Producer node? Below I will outline the steps to make the situation clearer:
RabbitMQ Producer node creates the fanout exchange
RabbitMQ Producer node creates queue "A" on the exchange
Instance 1 of the separate application creates queue "B" on the exchange
Instance 2 of the separate application creates qu
Hey guys, I am running through the Academy training and seem to have hit a snag. I am not very knowledgeable about Docker/Kubernetes, so I thought I'd ask here. Is Kafka running in the labs? I am getting timeout errors when I try to connect from Control Hub. https://academy.streamsets.com/courses/dataops-platform-fundamentals/lessons/build-a-kafka-pipeline/topic/lab-build-a-kafka-pipeline-2/ Any help is appreciated. Jim
I have a StreamSets pipeline that is part of a scheduled job. The pipeline reads a CSV file stored at an AWS SFTP location. That CSV file gets overwritten every night. The scheduled job is supposed to read the file well after it has been overwritten. The scheduler does run the job at the specified hour; however, the pipeline only reads the first line of the CSV file and ends, even though there are many records to process. If I manually run the job and reset the origin, the job runs as expected. I have only been working with StreamSets for about 5 months. Can anyone suggest what I might be missing in my pipeline or possibly the scheduler?