Question

Error Handling on Pipeline Step

  • 16 March 2022
  • 2 replies
  • 81 views

I am looking for a way to perform a typical try/catch operation on a pipeline step.
In my case, I want to detect a specific error on the JDBC Producer (e.g. duplicate record, or record does not exist)
and, based on that error, retry the write with a changed sdc.operation.type.
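
To make the intent concrete, here is a minimal sketch of that try/catch in plain Python. It is not StreamSets code: the standard-library sqlite3 module stands in for the JDBC destination, and the `customers` table and the `write_with_fallback` helper are hypothetical names used only for illustration.

```python
import sqlite3


def write_with_fallback(conn, record):
    """Try INSERT first; on a constraint violation, retry the record as UPDATE."""
    try:
        conn.execute(
            "INSERT INTO customers (id, name) VALUES (:id, :name)", record
        )
        return "INSERT"
    except sqlite3.IntegrityError:
        # Assumption: the only constraint that can fail here is the primary key,
        # so the error means "duplicate record". Switch the operation and retry.
        conn.execute(
            "UPDATE customers SET name = :name WHERE id = :id", record
        )
        return "UPDATE"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# The second record reuses id 1, so its INSERT fails and it is retried as UPDATE.
ops = [
    write_with_fallback(conn, rec)
    for rec in (
        {"id": 1, "name": "Ada"},
        {"id": 1, "name": "Ada Lovelace"},
    )
]
rows = conn.execute("SELECT id, name FROM customers").fetchall()
```

The existence check is paid only when the first attempt fails, which is the point of avoiding a lookup for every record.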

Below a picture to explain that use case.

The reason is that StreamSets does not seem to support MERGE (often also called UPSERT). I also want to avoid checking for every record whether it already exists, because that costs time and resources (as described here: How to upsert into JDBC Destination? | StreamSets Community).


Any hints on how to do a simple try/catch are welcome.


2 replies


Hi @robert.bernhard , I don’t think that can be achieved exactly the way you’ve drawn it - a StreamSets Pipeline is a Directed Acyclic Graph, so the flow never goes back to a previous node.

However, if the origin is a CDC one, why do you need to worry about performing MERGE/UPSERT operations? Your changes will be ordered correctly, and applied in that order by the destination.

For records that are rejected by the destination, you could have the pipeline write them out to the Error Record (to any of the supported Error Records destinations) and then have a second pipeline process them.

Apologies if I’m missing the point of your question. Maybe we can discuss via DM?

@Giuseppe Mura 
Thanks for your feedback.
The topic is independent of whether a CDC origin, a JDBC Consumer, or anything else is used on the left side.
So imagine REST, CDC, JDBC, a JMS Consumer, or any other source.

The problem, as described here: How to upsert into JDBC Destination? | StreamSets Community,
is detecting at the destination which operation to perform. Sometimes the events pass through pub/sub in between,
so you want to detect the operation at the destination to be safe.

A try/catch is just a standard pattern for handling errors, and I see that it is not supported, as you stated, probably
because the graph has to stay linear. But it would help in some cases, and I think it could be kept linear
if a stage could expose an error output, similar to what the Stream Selector does.
That means every stage would have an exit point (on error) where you could handle the error in the same pipeline:
pass the error and the pipeline context onward, check the error, and, to stay linear, use another JDBC Producer
to perform the second operation. (Pointing back to the same producer and just changing sdc.operation.type would be nicer, but that would break the linear flow.)
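
The error-exit idea described above can be sketched in plain Python. This only models the proposal and is not an existing StreamSets feature: the `Record` class, the `producer` stage with its second (error) output, the `operation` and `error` header names, and the in-memory `table` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    key: int
    value: str
    headers: dict = field(default_factory=dict)


class DuplicateKey(Exception):
    pass


class MissingKey(Exception):
    pass


def producer(table, records):
    """Hypothetical destination stage with two outputs: (written, error_lane)."""
    written, errors = [], []
    for rec in records:
        op = rec.headers.get("operation", "INSERT")
        try:
            if op == "INSERT" and rec.key in table:
                raise DuplicateKey(rec.key)
            if op == "UPDATE" and rec.key not in table:
                raise MissingKey(rec.key)
            table[rec.key] = rec.value
            written.append(rec)
        except (DuplicateKey, MissingKey) as exc:
            # Pass the error context along with the record instead of dropping it.
            rec.headers["error"] = type(exc).__name__
            errors.append(rec)
    return written, errors


def flip_operation(records):
    """Error handler between the two producers: choose the opposite operation."""
    for rec in records:
        error = rec.headers.pop("error", None)
        if error == "DuplicateKey":
            rec.headers["operation"] = "UPDATE"
        elif error == "MissingKey":
            rec.headers["operation"] = "INSERT"
    return records


table = {1: "old"}
batch = [Record(1, "new"), Record(2, "fresh")]

# First producer -> error lane -> handler -> second producer: no edge points backward.
written, failed = producer(table, batch)
retried, still_failed = producer(table, flip_operation(failed))
```

The second `producer` call plays the role of the additional JDBC Producer, so the flow stays acyclic even though the record is effectively retried.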

That is just one use case where you may want to handle the error directly, but I can see other use cases where such functionality would help as well.

The generic error handler is fine for just logging errors or acting on them at a wider level, but the handling
I am looking for is specific to a pipeline and should be part of that pipeline.
