We are logging pipeline errors and orchestrated task errors in a database. Often we see runtime error messages like “for actual error, open the logs”. With our current solution, this placeholder message is what gets logged into the database. We don’t want that; instead, we want the actual errors to be logged. How can we do that?
You can redirect the error records to an S3 bucket or a local directory and check the error details there.
Once the errors are present in S3 or the local folder, you can selectively filter the error details that you wish to retain in the database for future reference.
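For illustration, here is a minimal sketch of that filtering step in Python, assuming the error records were written to a local directory in SDC Record (JSON) format, one record per line, with the error details in the record header. The directory path and the header field names (errorCode, errorMessage, errorStage) are assumptions, so inspect a sample error file from your own pipeline for the exact layout.

```python
# Minimal sketch: read SDC error-record files from a local directory and
# pull out the error details from each record's header.
import json
from pathlib import Path

ERROR_DIR = Path("/data/sdc-errors")  # hypothetical error directory

def iter_error_details(error_dir: Path):
    """Yield (error_stage, error_code, error_message) from each error file."""
    for error_file in error_dir.glob("*"):
        if not error_file.is_file():
            continue
        with error_file.open() as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                # Header layout is an assumption; verify against a real file.
                header = json.loads(line).get("header", {})
                yield (
                    header.get("errorStage"),
                    header.get("errorCode"),
                    header.get("errorMessage"),
                )

for stage, code, message in iter_error_details(ERROR_DIR):
    print(f"{stage}: {code} - {message}")
```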
Please let me know if this helps; otherwise, provide some more details about what you are looking for, so I can help you with it.
Thanks & Regards
Bikram
Hello Community,
I am working on a similar kind of setup.
For ease of understanding, below are the stages in a pipeline.
s3 origin -> expression_01 -> stream selector -> JDBC query consumer -> groovy_script -> s3 destination
Let’s say there are 10 files in the s3 origin, and a couple of them have data issues, leading to errors at various stages like expression_01, the JDBC query consumer, or groovy_script.
I would like to capture the error details in the following format.
File_name   Stage_name      Error_code              Error_message
=========   ==========      ==========              =============
file_05     expression_01   XXXX                    Invalid expression <variable name>
file_07     JDBC            XXXX                    Query failed due to invalid col <column name>
file_09     GroovyScript    XXXX                    Script error at line <XX>
file_XX     <Stage name>    <StreamSets_Err_code>   <StreamSets error message>
Are there any built-in functions in StreamSets, similar to Oracle’s SQLCODE and SQLERRM, for capturing error codes and error messages?
Do you have any suggestions or pointers for implementing custom solutions?
Please note that I want to capture the errors at each file level.
Thanks!
StreamSets doesn't offer built-in functionality to handle this scenario, but it can be accomplished with the following steps:
First, generate error files and store them in an S3 bucket. Then retrieve the error files from the bucket and format the data according to your requirements before storing it in the database.
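As a rough sketch of the second pipeline’s logic, here it is as a standalone Python script rather than actual pipeline stages. It assumes the error files land in S3 as line-delimited SDC Record JSON, that the originating file name was copied into a custom header attribute (called source_file here, a name you would set yourself), and it uses a SQLite table as a stand-in for your target database; the bucket, prefix, and attribute names are all placeholders.

```python
# Sketch: pull error files from S3, extract per-file error details, and
# load rows in the File_name / Stage_name / Error_code / Error_message shape.
import json
import sqlite3

import boto3

BUCKET = "my-error-bucket"   # placeholder
PREFIX = "pipeline-errors/"  # placeholder

s3 = boto3.client("s3")
db = sqlite3.connect("pipeline_errors.db")
db.execute(
    """CREATE TABLE IF NOT EXISTS pipeline_errors (
           file_name TEXT, stage_name TEXT, error_code TEXT, error_message TEXT
       )"""
)

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"]
        for line in body.iter_lines():
            if not line:
                continue
            record = json.loads(line)
            header = record.get("header", {})
            attrs = header.get("values", {})  # record header attributes (assumed layout)
            db.execute(
                "INSERT INTO pipeline_errors VALUES (?, ?, ?, ?)",
                (
                    attrs.get("source_file"),  # assumed custom attribute
                    header.get("errorStage"),
                    header.get("errorCode"),
                    header.get("errorMessage"),
                ),
            )
db.commit()
```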
Please give it a try and let me know if it helps.
We would prefer to streamline the process without involving S3 as an intermediary step. Instead, we aim to capture errors directly within the same stream. For instance, using a stream selector switch: if any error occurs in the pipeline (event status = false), we would like to write the necessary error information to the database, in the format outlined above.
Do you have any suggestions on how we can achieve this?
If errors occur in a stage, they cannot be handled within the Stream Selector or any other processor, because a record’s processing halts once a stage raises an error.
In such cases, we need to address this by implementing two separate pipelines, as previously discussed. One pipeline will be responsible for generating errors and storing them in S3 or Kafka, while the other pipeline will retrieve the error data and store the details in the database.
Once these pipelines are set up, orchestration can be utilized to execute them within a single stream, simplifying the handling of this use case.
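If you would rather drive the sequence from outside StreamSets, a sketch along these lines, using Data Collector’s REST API, is one option. The start and status endpoints below exist in SDC’s REST API, but verify the paths, authentication, and pipeline IDs against your own instance; every ID and credential shown is a placeholder.

```python
# Sketch: start the error-producing pipeline, wait for it to finish, then
# start the pipeline that loads the error details into the database.
import time

import requests

SDC_URL = "http://localhost:18630"   # placeholder SDC instance
AUTH = ("admin", "admin")            # placeholder credentials
HEADERS = {"X-Requested-By": "sdc"}  # SDC requires this header on POSTs

def run_pipeline(pipeline_id: str, poll_seconds: int = 5) -> str:
    """Start a pipeline and block until it leaves the running states."""
    resp = requests.post(
        f"{SDC_URL}/rest/v1/pipeline/{pipeline_id}/start",
        auth=AUTH, headers=HEADERS,
    )
    resp.raise_for_status()
    while True:
        status = requests.get(
            f"{SDC_URL}/rest/v1/pipeline/{pipeline_id}/status",
            auth=AUTH, headers=HEADERS,
        ).json().get("status")
        if status not in ("STARTING", "RUNNING", "FINISHING"):
            return status
        time.sleep(poll_seconds)

# Placeholder IDs for the two pipelines described above.
run_pipeline("errors-to-s3")
run_pipeline("s3-errors-to-db")
```

For batch-style pipelines the returned status should normally be FINISHED; anything else (for example RUN_ERROR) can be treated as a failure of the whole sequence.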
@Bikram, I’m wondering how we can specifically capture the generated error codes and error messages, not just the error records. In particular, I’m keen on extracting the error code and error message similar to those shown in the logs whenever there’s an error in the pipeline.
Any insights on this?
In the event of any issues in the pipeline stages, StreamSets will generate error codes, and these codes vary depending on the stage. Therefore, before they can be retrieved, the error details must be stored in a file for subsequent filtering.
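As a small illustration of that subsequent filtering, you could keep only the error codes whose stage prefixes you care about before loading rows into the database. The prefixes below follow StreamSets’ usual stage-prefixed naming but are assumptions; take the exact values from your own error files.

```python
# Sketch: filter error rows by stage-specific error-code prefix before
# inserting them into the database. All rows and prefixes are illustrative.
KEEP_PREFIXES = ("EXPR_", "JDBC_", "SCRIPTING_")  # assumed code prefixes

def should_keep(error_code):
    """Retain only errors whose code starts with a prefix of interest."""
    return bool(error_code) and error_code.startswith(KEEP_PREFIXES)

rows = [
    ("file_07", "JDBC", "JDBC_02", "Query failed"),
    ("file_09", "GroovyScript", "SCRIPTING_04", "Script error"),
    ("file_11", "S3", "OTHER_01", "Unrelated error"),
]
filtered = [row for row in rows if should_keep(row[2])]
print(filtered)
```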