Question

How to fetch actual error message in a Streamsets pipeline?

March 14, 2024
7 replies
213 views

Ritantara
Fan
2 replies

We are logging pipeline errors and orchestrated task errors in a database. We often see runtime errors whose message is just something like “for actual error, open the logs”, and with our current solution that text is what gets logged to the database. We don’t want that; we want the actual error messages to be logged instead. How can we do that?

Bikram
Headliner
486 replies
March 22, 2024

@Ritantara

You can redirect the error records to an S3 bucket or a local directory and check the error details there.

Once errors are present in your S3 or local folder, you can selectively filter the error details that you wish to retain in the database for future reference.
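As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes the error records were written as JSON lines (one JSON object per line) into a local folder; the field names fileName, errorStage, errorCode and errorMessage, the *.json file pattern and the error_log table are all hypothetical, and SQLite only stands in for your actual database.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical field names: match them to what your error files really contain.
KEEP_FIELDS = ("fileName", "errorStage", "errorCode", "errorMessage")


def load_error_details(folder):
    """Read every *.json file in `folder` (one JSON object per line) and keep only KEEP_FIELDS."""
    rows = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # ignore blank lines
                record = json.loads(line)
                # Missing fields become None instead of raising KeyError.
                rows.append(tuple(record.get(field) for field in KEEP_FIELDS))
    return rows


def store_error_details(conn, rows):
    """Insert the filtered rows into a hypothetical error_log table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS error_log ("
        "file_name TEXT, stage_name TEXT, error_code TEXT, error_message TEXT)"
    )
    conn.executemany("INSERT INTO error_log VALUES (?, ?, ?, ?)", rows)
    conn.commit()
```

For example, `store_error_details(sqlite3.connect("errors.db"), load_error_details("error_files"))` would load every error file in a (hypothetical) error_files folder. Keeping only a handful of fields is what lets you decide which details are worth retaining.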

Please let me know if it helps. Otherwise, give me some more details on what you are looking for, so I can help you with it.

Thanks & Regards

Bikram_

hg508
Roadie
6 replies
March 28, 2024

Hello Community, @Bikram,

I am working on a similar setup.

For ease of understanding, here are the stages in the pipeline:

s3 origin -> expression_01 -> stream selector -> JDBC query consumer-> groovy_script -> s3 destination

Let’s say there are 10 files in the S3 origin, and a couple of them have data issues that lead to errors at various stages, such as expression_01, the JDBC query stage, or groovy_script.

I would like to capture the error details in the following format.

File_name  stage_name     error_code             error_message
=========  =============  =====================  =============================================
file_05    expression_01  XXXX                   Invalid expression <variable name>
file_07    JDBC           XXXX                   Query failed due to invalid col <column name>
file_09    GroovyScript   XXXX                   Script error at line <XX>
file_XX    <Stage name>   <StreamSets_Err_code>  <StreamSets error message>

Are there any built-in functions in StreamSets, similar to Oracle’s SQLCODE and SQLERRM, for capturing error codes and error messages?

Do you have any suggestions or pointers for implementing custom solutions?

Please note that I want to capture the errors at the file level.

Thanks!

Bikram
Headliner
486 replies
March 28, 2024

@hg508

StreamSets doesn't offer built-in functionality to handle this scenario, but it can be accomplished with the following steps:

Firstly, generate error files and store them in an S3 bucket. Then, retrieve the error files from the S3 bucket and format the data according to your requirements before storing it in the database.
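A minimal Python sketch of the "format the data" step, producing the File_name / stage_name / error_code / error_message layout asked for in the question. It assumes each error record has already been parsed into a dict with the hypothetical keys fileName, errorStage, errorCode and errorMessage; ErrorRow, to_rows and format_report are illustrative names, not StreamSets APIs.

```python
from dataclasses import dataclass


@dataclass
class ErrorRow:
    file_name: str
    stage_name: str
    error_code: str
    error_message: str


def to_rows(error_records):
    """Map error-record dicts (hypothetical keys) to one ErrorRow per file."""
    rows = {}
    for rec in error_records:
        name = rec.get("fileName", "<unknown>")
        if name not in rows:  # one row per file: keep the first error seen for it
            rows[name] = ErrorRow(
                file_name=name,
                stage_name=rec.get("errorStage", ""),
                error_code=rec.get("errorCode", ""),
                error_message=rec.get("errorMessage", ""),
            )
    return list(rows.values())


def format_report(rows):
    """Render rows as a fixed-width text table with an '=' separator line."""
    header = ("File_name", "stage_name", "error_code", "error_message")
    data = [(r.file_name, r.stage_name, r.error_code, r.error_message) for r in rows]
    # Column width = longest value in that column (header included).
    widths = [max(len(str(v)) for v in column) for column in zip(header, *data)]
    separator = tuple("=" * w for w in widths)
    lines = [
        "  ".join(str(v).ljust(w) for v, w in zip(row, widths)).rstrip()
        for row in [header, separator, *data]
    ]
    return "\n".join(lines)
```

Keeping only the first error per file is one design choice that fits the "one row per file" requirement; if you want every error, collect the rows in a plain list instead of a dict keyed by file name.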

Please give it a try and let me know if it helps.

hg508
Roadie
6 replies
March 28, 2024

We prefer to streamline the process without involving S3 as an intermediary step. Instead, we aim to capture errors directly within the same stream. For instance, utilizing a stream selector switch, if any error occurs in the pipeline (event status = false), we'd like to write the necessary error information to the database, as outlined above.

Do you have any suggestions on how we can achieve this?

Bikram
Headliner
486 replies
April 1, 2024

@hg508

If errors occur in any stage, they cannot be handled in the Stream Selector or any other downstream processor: once a record fails in a stage, it is taken out of the main data stream and passed to error handling, so it never reaches the later stages.

In such cases, we need to address this with two separate pipelines, as discussed previously. One pipeline processes the data and writes its error records to S3 or Kafka, while the other pipeline reads that error data and stores the details in the database.

Once these pipelines are set up, you can use orchestration to run them together as a single flow, which simplifies handling this use case.
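To make the two-pipeline idea concrete, here is a toy Python model of it (not StreamSets code). Pipeline 1 runs each record through its stages; a record that raises an error is taken out of the main stream and its error details are pushed to an error sink, so downstream stages such as a stream selector never see it. Pipeline 2 drains that sink and hands each error to a store function. A `queue.Queue` stands in for Kafka or S3, and StageError, run_main_pipeline and run_error_pipeline are hypothetical names.

```python
from queue import Queue


class StageError(Exception):
    """Hypothetical error a stage raises when it cannot process a record."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code
        self.message = message


def run_main_pipeline(records, stages, error_sink):
    """Pipeline 1: push each record through (name, function) stages in order.

    A failing record is removed from the main stream and its error details go
    to error_sink, so later stages (e.g. a stream selector) never see it.
    """
    output = []
    for record in records:
        current = record
        for stage_name, stage in stages:
            try:
                current = stage(current)
            except StageError as err:
                error_sink.put({
                    "fileName": record.get("fileName"),
                    "errorStage": stage_name,
                    "errorCode": err.code,
                    "errorMessage": err.message,
                })
                break  # stop processing this record; move on to the next one
        else:
            output.append(current)  # only records that passed every stage
    return output


def run_error_pipeline(error_sink, store):
    """Pipeline 2: drain the error sink and pass each error dict to `store`."""
    count = 0
    while not error_sink.empty():
        store(error_sink.get())
        count += 1
    return count
```

In a real deployment the sink would be the error record destination of the first pipeline, and `store` would be the database write done by the second pipeline.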

hg508
Roadie
6 replies
April 3, 2024

@Bikram, I’m wondering how we can capture the generated error codes and error messages themselves, not just the error records. Specifically, I’d like to extract the error code and error message, like the ones shown in the logs, whenever there’s an error in the pipeline.

Any insights on this?

Bikram
Headliner
486 replies
April 3, 2024

@hg508

When a pipeline stage runs into an issue, StreamSets generates an error code, and these codes vary depending on the stage. Therefore, we first need to store the error details in a file so they can be filtered afterwards.
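StreamSets error codes typically look like a stage-specific prefix plus a number (for example JDBC_06 or SNOWFLAKE_11), so once the error details are stored, filtering by prefix is a simple way to pick out the errors of one kind of stage. Below is a minimal Python sketch, assuming each stored error is a dict with a hypothetical errorCode key and that codes follow that PREFIX_NN shape. Separately, the StreamSets expression language documents record functions such as record:errorCode() and record:errorMessage() for error records, which may let the second pipeline pull those values into fields directly; check the docs for your Data Collector version.

```python
import re
from collections import defaultdict

# Assumed code shape: upper-case prefix (may contain digits/underscores), "_", digits.
CODE_PATTERN = re.compile(r"^([A-Z][A-Z0-9_]*)_(\d+)$")


def code_prefix(code):
    """Return the stage-specific prefix of a code like 'JDBC_06', or None if it doesn't match."""
    match = CODE_PATTERN.match(code or "")
    return match.group(1) if match else None


def group_by_prefix(errors):
    """Group error dicts (hypothetical 'errorCode' key) by code prefix."""
    groups = defaultdict(list)
    for err in errors:
        groups[code_prefix(err.get("errorCode")) or "UNKNOWN"].append(err)
    return dict(groups)


def keep_prefixes(errors, prefixes):
    """Keep only the errors whose code prefix is in `prefixes`, e.g. {"JDBC"}."""
    return [err for err in errors if code_prefix(err.get("errorCode")) in prefixes]
```

Errors without a recognizable code land in an "UNKNOWN" group rather than being dropped, so nothing silently disappears from the report.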

Portal - StreamSets Docs


Related topics

Kafka Producer - kafka.common.MessageSizeTooLargeException.

MySQL: JDBC_06: Failed to Initialize Connection Pool: Access Denied Error

Is it possible to read data from SOAPUI using http client

SNOWFLAKE_11 - Could not create SQL DataSource

Connecting to Microsoft SQL Server from SDC v5.1 is failing "unable to find valid certification path to requested target"
