SUMMARY OF THE ISSUE:
SDC performance degrades when pipelines use Kudu stages, which leads to timeouts and to failures when starting additional pipelines that contain Kudu stages.
SYMPTOMS:
- RPC Connection Errors for a pipeline with Kudu destination.
- Reset by peer (error 104) in the Kudu logs, which indicates that the connection timed out on the client side.
- Kudu logs show that RPC negotiations take significant time (37 seconds, 84 seconds, etc.).
- The thread dump shows many threads used by the Kudu client library while pipelines with Kudu stages are running.
- An exception in sdc.log similar to the following:
RUN_ERROR: com.streamsets.pipeline.api.StageException: KUDU_03 - Errors while interacting with Kudu: Row error for primary key=[...], tablet=null, server=null, status=Timed out: can not complete before timeout: Batch{operations=..., tablet="..." [0x0000000B, ), ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write, tablet=..., attempt=2, DeadlineTracker(timeout=10000, elapsed=10242), Traces: [0ms] sending RPC to server ..., [6200ms] received from server ... response Network error: connection disconnected, [6200ms] delaying RPC due to Network error: connection disconnected, [6215ms] querying master, [6215ms] Sub rpc: GetTableLocations sending RPC to server , [9726ms] Sub rpc: GetTableLocations received from server response OK)}
VERSIONS AFFECTED:
SDC 3.3.0 and earlier
SOLUTION:
The problem is that each Kudu client starts many worker threads. The number of worker threads is configurable through the Kudu client API, which allows setting a maximum worker count (workerCount). In the affected versions, the Kudu destination does not set this limit, so the Kudu client library defaults to (2 * the number of available processors) threads per stage.
For example, on a machine with 32 CPUs, one Kudu stage uses 64 threads (2 x 32 CPUs). If every pipeline contains only one Kudu stage, 8 pipelines can use 512 threads (2 x 32 CPUs x 8 pipelines), and 16 pipelines can use 1024 threads.
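The arithmetic above can be sketched in Java to estimate thread usage for a given machine and pipeline count. This is only an illustration of the default rule described here (2 x available processors per Kudu stage); the class and method names are hypothetical and are not part of SDC or the Kudu client library.

```java
// Hypothetical helper for estimating Kudu client worker threads.
// Assumes the default described above: 2 x available processors per Kudu stage.
class KuduThreadEstimate {

    // Default worker threads for one Kudu stage when no limit is configured.
    static int threadsPerStage(int processors) {
        return 2 * processors;
    }

    // Total worker threads across all pipelines, assuming every pipeline
    // contains the same number of Kudu stages.
    static int totalThreads(int processors, int pipelines, int stagesPerPipeline) {
        return threadsPerStage(processors) * pipelines * stagesPerPipeline;
    }

    public static void main(String[] args) {
        // Estimate for the machine this code runs on.
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("This machine, one Kudu stage: "
                + threadsPerStage(processors) + " threads");

        // The 32-CPU example from the text, one Kudu stage per pipeline.
        System.out.println("32 CPUs, 8 pipelines: " + totalThreads(32, 8, 1) + " threads");
        System.out.println("32 CPUs, 16 pipelines: " + totalThreads(32, 16, 1) + " threads");
    }
}
```

Running the estimate with your own CPU and pipeline counts gives a rough upper bound on the threads that Kudu stages can create when no worker thread limit is set.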
SDC 3.3.1 introduces two additional configuration properties on the Advanced tab of the Kudu destination and the Kudu Lookup processor: "Maximum Number of Worker Threads" and "Admin Operation Timeout (milliseconds)".

For more information about the recommended number of worker threads, see the following KB article: Kudu: Recommended Number of Worker Threads - SDC 3.1.1 and later.