Hi, I am using the JDBC Multitable origin stage and I want to run 200+ pipelines in parallel while keeping their processing time as low as possible. I have been tuning the sdc.properties file but still cannot find the right values. I have configured max.stage.private.classloaders, runner.thread.pool.size, and pipeline.max.runners.count, but I still cannot reduce the run time. When the pipelines run in parallel, each one takes longer than it does when run on its own. Also, some pipelines do not start immediately and instead sit in the STARTING state. Which configurations should I consider so that I can run these 200+ pipelines concurrently with the lowest possible run time?
Running Concurrent 200+ Pipelines
Best answer by saleempothiwala
Hi
Please have a look at this video:
Your Data Collector has a fixed amount of memory available. Every pipeline you run consumes roughly:
memory = record size x batch size x number of destinations + other overheads
So there is only a certain number of pipelines you can run in parallel. Any others have to wait for resources to free up before they can run. There is no magic property that will let you run 200 pipelines in one go. The STARTING status you see is the pipelines waiting for resources, and the more runners are waiting, the more resources they consume.
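As a rough illustration of the formula above, here is a minimal Python sketch that estimates per-pipeline memory and how many pipelines would fit on one Data Collector. The function names and all numbers (record size, batch size, overhead, heap size) are hypothetical placeholders, not measured values or StreamSets defaults; plug in figures from your own pipelines.

```python
def pipeline_memory_bytes(record_size, batch_size, destinations, overhead=0):
    """Estimate memory per pipeline using the formula from the answer:
    record size x batch size x destinations + other overheads."""
    return record_size * batch_size * destinations + overhead


def max_parallel_pipelines(available_bytes, per_pipeline_bytes):
    """How many pipelines fit into the available memory (rounded down)."""
    return available_bytes // per_pipeline_bytes


# Hypothetical inputs: 2 KB records, batches of 1000, 2 destinations,
# 50 MB of assumed fixed overhead per pipeline, 4 GiB of usable heap.
per_pipeline = pipeline_memory_bytes(
    record_size=2_000,
    batch_size=1_000,
    destinations=2,
    overhead=50_000_000,
)
capacity = max_parallel_pipelines(4 * 1024**3, per_pipeline)

print(f"Estimated memory per pipeline: {per_pipeline} bytes")
print(f"Estimated pipelines per Data Collector: {capacity}")
```

This is only a back-of-the-envelope estimate. The "other overheads" term is the hardest part to pin down, so treat the result as an upper bound and confirm it by watching actual memory usage.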
The best approach is to add more Data Collectors with the same configuration and labels, create jobs from your pipelines, and assign the same tags to the Data Collectors. SCH will then distribute the load across them.
The number of SDCs you need depends on the calculation above. Assuming you can run 20 pipelines in parallel on one SDC, you will need 10 SDCs to run all 200 in parallel.
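The sizing arithmetic can be sketched in a few lines of Python. The helper name `collectors_needed` is made up for illustration, and the figure of 20 pipelines per SDC is the assumption from the answer, not a measured limit.

```python
def collectors_needed(total_pipelines, pipelines_per_sdc):
    """Number of Data Collectors required, rounded up to a whole instance."""
    if pipelines_per_sdc <= 0:
        raise ValueError("pipelines_per_sdc must be positive")
    # Integer ceiling division: a partially filled SDC still counts as one.
    return -(-total_pipelines // pipelines_per_sdc)


# Figures from the answer: 200 pipelines, an assumed 20 per SDC.
print(collectors_needed(200, 20))  # 10
```

Rounding up matters once the numbers stop dividing evenly: 201 pipelines at 20 per SDC would need 11 instances, not 10.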