Got a Question?
Can't find what you're looking for? Ask it here!
- 689 Topics
- 1,772 Replies
How to configure REST service
Hi Team, I am new to StreamSets and would like more information about the REST Service. We have on-prem databases, and we run a procedure on a database that initiates a REST API call to a Tomcat web server (the service is published there) on port 8908. We then receive a URL that contains various parameters, and I need to configure the same call in StreamSets. Can you please let me know how we can configure this in the REST service?
JDBC Multitable SqlServer to snowflake primary keys replication
Hello, thank you very much for taking the time to read this and help me. :) I need help because I have a problem that I can't solve.
Context: I am developing a CDC integration with the following characteristics: a pipeline that performs the first data load and creates the tables in Snowflake (JDBC Multitable Consumer → Snowflake connector), and a second pipeline that performs continuous integration using the SQL Server CDC stage (SQL Server CDC → Snowflake connector). The integration covers hundreds of tables, so manual management and object-level configuration are not an option. The update strategy is to configure the Snowflake connector that loads the CDC transactions to MERGE the data, keeping only the latest version of each row.
Problem: the Multitable stage creates the tables in Snowflake and moves the historical data, but I am finding that it does not create the primary keys that ARE defined in the source (SQL Server in this case). Is there any way to replicate the primary key definitions as well?
How does the JDBC Lookup Cache work?
The JDBC Lookup documentation explains how to configure the cache and how eviction works, but there is no discussion of how the cache actually works. Is the query being evaluated and then hashed, with the hash value being used as a key? Or is something else happening?
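Purely as an illustration of the behaviour being asked about (not the actual JDBC Lookup implementation), a cache keyed by the fully resolved lookup query would look roughly like this in Python:

    # Illustrative sketch only -- assumes the cache key is the resolved query text.
    cache = {}

    def lookup(resolved_query, run_query):
        # If this exact query (after field substitution) was seen before, reuse the rows.
        if resolved_query in cache:
            return cache[resolved_query]
        rows = run_query(resolved_query)   # hit the database only on a cache miss
        cache[resolved_query] = rows
        return rows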
Capturing pipeline time series statistics in InfluxDB
Hello, I'm trying to capture time series data for the pipeline. I'm thinking about using the REST API to pull the data and store it in InfluxDB. Having said this, if we set the pipeline/job configuration to write the pipeline statistics to Control Hub, can those statistics be accessed so they can be moved to InfluxDB, or does it write to InfluxDB automatically? Thx, Eric
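For reference, a rough sketch of the REST-pull approach. The metrics URL, auth headers, and JSON field names below are placeholders and need to be checked against your actual Control Hub deployment:

    import requests
    from influxdb_client import InfluxDBClient, Point
    from influxdb_client.client.write_api import SYNCHRONOUS

    # Hypothetical metrics endpoint and auth headers -- substitute your real values.
    METRICS_URL = 'https://<control-hub-host>/jobrunner/rest/v1/metrics/job/<job-id>'
    HEADERS = {'X-SS-App-Component-Id': '<component-id>', 'X-SS-App-Auth-Token': '<auth-token>'}

    resp = requests.get(METRICS_URL, headers=HEADERS)
    resp.raise_for_status()
    metrics = resp.json()

    # Write a couple of counters into InfluxDB as a single measurement point.
    client = InfluxDBClient(url='http://localhost:8086', token='<influx-token>', org='<org>')
    write_api = client.write_api(write_options=SYNCHRONOUS)
    point = (Point('pipeline_stats')
             .tag('job_id', '<job-id>')
             .field('input_records', metrics.get('inputRecords', 0))
             .field('output_records', metrics.get('outputRecords', 0)))
    write_api.write(bucket='pipelines', record=point)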
Unable to create a deployment using python SDK
After creating an environment through the Python SDK, when I try to create a deployment using the following script I get the error described below:

    >>> deployment_builder = sch.get_deployment_builder(deployment_type='SELF')
    >>> # sample_environment is an instance of streamsets.sdk.sch_models.SelfManagedEnvironment
    >>> deployment = deployment_builder.build(deployment_name='Sample Deployment',
    ...                                       environment=sample_environment,
    ...                                       engine_type='DC',
    ...                                       engine_version='4.1.0',
    ...                                       deployment_tags=['self-managed-tag'])
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    NameError: name 'sample_environment' is not defined

Please help me with this.
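The traceback just means the name sample_environment was never bound in that session. A minimal sketch of fetching the previously created environment first, assuming it exists under the name used below (credential values, names, and versions are placeholders):

    from streamsets.sdk import ControlHub

    sch = ControlHub(credential_id='<credential-id>', token='<token>')

    # Bind sample_environment by looking up the environment created earlier.
    sample_environment = sch.environments.get(environment_name='Sample Environment')

    deployment_builder = sch.get_deployment_builder(deployment_type='SELF')
    deployment = deployment_builder.build(deployment_name='Sample Deployment',
                                          environment=sample_environment,
                                          engine_type='DC',
                                          engine_version='4.1.0',
                                          deployment_tags=['self-managed-tag'])
    sch.add_deployment(deployment)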
Error : Java heap Space
I have 98 columns/fields and 250,000 rows, and every time I run the pipeline it errors out. What I'm doing now is reducing the following settings, because with the default of 1000 the error occurs:
Max Batch Size (Records) = 10
Max Clob Size (Characters) = 10
Max Blob Size (Bytes) = 10
Fetch Size = 10
How can I handle this problem?
DELTA_LAKE_01 - Could not create SQL DataSource
After configuring the Databricks Delta Lake destination, when I try to validate my pipeline I receive the following error: DELTA_LAKE_01 - Could not create SQL DataSource: com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: [Simba][SparkJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.
Can pipeline access kafka headers?
I would like to add OpenTelemetry tracing support to our pipelines. Trace information is propagated through headers, so I’d like to access the headers to pass them through the pipeline. I don’t see the headers on the record so I’m curious if it is possible to access them.
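For what it's worth, if the origin can surface the Kafka message headers as record header attributes, a Jython Evaluator could copy them into the record along these lines (the 'traceparent' attribute name is only a hypothetical example):

    # Jython Evaluator sketch -- assumes Kafka headers arrive as record header attributes.
    for record in records:
        try:
            trace_header = record.attributes.get('traceparent')  # hypothetical attribute name
            if trace_header is not None:
                # Copy the trace context into a field so downstream stages can use it.
                record.value['trace_parent'] = trace_header
            output.write(record)
        except Exception as e:
            error.write(record, str(e))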
Post-processing for whole files format
Hi everybody! I want to remove files (whole file data format) from an origin stage (GCS) after processing them, but when I activate the post-processing option, set it to "Delete", and try to start the job associated with the pipeline, I get the error: GCS_11 - Error validating postprocessing options. If 'Whole File' data format is used, neither 'Delete' nor 'Archive -> Move into ...' can be used. Is there any way to do that directly from a pipeline, or is the best practice to add a shell script at the end of the pipeline to remove the files from the command line with Google CLI commands after processing them?
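As an illustration of the scripted cleanup alternative, a small stand-alone sketch using the google-cloud-storage Python client (bucket name and prefix are placeholders) could run after the job instead of a shell script:

    from google.cloud import storage

    # Delete the whole-file objects once the pipeline/job has finished with them.
    client = storage.Client()
    for blob in client.list_blobs('my-input-bucket', prefix='processed/'):
        blob.delete()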
refreshing pipeline launch using SDK
After launching a pipeline using the SDK, if I have any changes to make to the pipeline, I want to make them from the SDK instead of the UI, and those changes have to be reflected in the UI. How can I achieve this without launching the pipeline again? Some more queries:
1. How do I preview a pipeline using the SDK?
2. How do I know that a stage has no errors in the SDK?
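A rough sketch of editing and republishing an existing pipeline from the SDK so the change shows up in the UI. The pipeline name, stage label, and config attribute below are hypothetical placeholders:

    from streamsets.sdk import ControlHub

    sch = ControlHub(credential_id='<credential-id>', token='<token>')

    # Fetch the existing pipeline, tweak a stage, then publish a new version.
    pipeline = sch.pipelines.get(name='My Pipeline')
    stage = pipeline.stages.get(label='Directory 1')      # hypothetical stage label
    stage.batch_size_in_recs = 500                        # hypothetical config attribute
    sch.publish_pipeline(pipeline, commit_message='Updated batch size via SDK')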
File list-url's in a file to process
Hi guys, I have a requirement where I have thousands of JSON file URLs in a file (all files have the same format). I need to process the data for every file and load the data into the destination. For example, local_dir/file_list.txt contains:
https://example1.json
https://example2.json
https://example3.json
https://example4.json
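Outside of a pipeline, the intent would look roughly like this plain-Python sketch (the path and URLs are the examples from above; the destination load step is left as a placeholder):

    import json
    import requests

    # Read the list of JSON file URLs, fetch each one, and parse the payload.
    with open('local_dir/file_list.txt') as f:
        urls = [line.strip() for line in f if line.strip()]

    for url in urls:
        resp = requests.get(url)
        resp.raise_for_status()
        payload = resp.json()
        # ... load 'payload' into the destination here ...
        print(json.dumps(payload)[:200])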
How does JDBC destination handle fields without matching columns?
How does JDBC destination handle fields without matching columns? I am encountering a situation where it appears that the fields without matching columns are ignored, but this is not defined in the documentation and seems counter to what I would expect (an exception complaining about a column not existing in the table). Please provide a deeper description of what is happening here.
Lookups (into DeltaTable) delivering extremely bad performance when used in Transformer
Lookups (into a Delta table) give extremely bad performance (sometimes the pipeline stays in the pre-execution stage forever) when used in Transformer with an origin of 1000 records, although it works decently enough in streaming mode, which I guess is due to the smaller number of incoming records.