
I am using a SQL Server Change Tracking origin in the pipeline. There are many tables in the SQL source.

If I don’t turn on the “Produce Events” option, the pipeline processes data from all tables, as expected. However, because no event is produced, the pipeline must run continuously, which I would like to avoid.

If I turn on the “Produce Events” option, the pipeline stops after processing data from only one table. This is a problem because I need to process data from all tables.

What can I do to make it process all tables before producing the event?

Thanks a lot for your attention! I hope to hear from you soon!

@jyuan 

Kindly increase the Number of Threads property and the Maximum Pool Size property, as in the following example.

 

Example

Say you are reading from 10 tables. You set the Number of Threads property to 5 and the Maximum Pool Size property to 6. When you start the pipeline, the origin retrieves the list of tables. The origin then creates five threads to read from the first five tables, and by default Data Collector creates a matching number of pipeline runners. Upon receiving data, a thread passes the batch to an available pipeline runner for processing.

At any given moment, the five pipeline runners can each process a batch, so this multithreaded pipeline processes up to five batches at a time. When incoming data slows, the pipeline runners sit idle, available for use as soon as the data flow increases.
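The scheduling described above can be sketched with a small Python illustration (this is not StreamSets code; the table names and the `read_and_process` function are made up for demonstration). Five worker threads drain a list of ten tables, so every table is eventually read even though only five are read at any one time:

```python
# Illustrative sketch only -- mimics the origin's threading model:
# 10 tables, 5 reader threads, each batch handled by an available worker.
from concurrent.futures import ThreadPoolExecutor

TABLES = [f"table_{i}" for i in range(1, 11)]  # 10 source tables
NUM_THREADS = 5  # corresponds to the "Number of Threads" property

def read_and_process(table):
    # A real thread would poll the table for changes and pass each
    # batch to a pipeline runner; here we just return a marker.
    return f"processed {table}"

# Five threads work through all ten tables; no table is skipped,
# tables simply wait until a thread frees up.
with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
    results = list(pool.map(read_and_process, TABLES))

print(len(results))  # 10 -- every table gets processed
```

The point of the sketch is that a thread count lower than the table count does not skip tables; it only limits how many are read concurrently.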

 

 Thanks & Regards

Bikram_


@jyuan ,

 

Please verify that you are using the correct event type in the precondition of your Pipeline Finisher.

Also, make sure your Pipeline Finisher is connected to the event output stream (the ‘E’) of your CDC origin.
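For example, with a no-more-data event, the Pipeline Finisher precondition is typically an expression like the following (this is the commonly documented pattern; adjust the event type string to match what your origin actually emits):

```
${record:eventType() == 'no-more-data'}
```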

 

Here is what the documentation says about the SQL Server CDC origin.

 

no-more-data

The origin generates a no-more-data event record when the origin completes processing all available data in a time window and the number of seconds configured for event generation delay elapses without any new files appearing to be processed. The origin generates the event record even when subsequent time windows remain for processing.

Make sure your time window is big enough.


After taking @Bikram’s advice (increasing the thread count and pool size), I see that 4 tables are processed, but there are still tables left out.

I need to share more info: I am using the SQL Server Change Tracking origin. We have used this approach to load CDC data from other SQL Servers without problems, so I wonder if something could be wrong with this SQL Server’s CDC configuration.

Here, @saleempothiwala mentioned the SQL Server CDC Client origin. We haven’t used it; I wonder if I should go ahead and try it.

 

 


I also tried the SQL Server CDC Client origin. Still, only a few tables are processed, leaving the others out.

I have given up: I will not use events in this pipeline, which means I will have to run it continuously.

If someone has an example of using events with either origin that processes all tables before the event is produced, I would really like to see it and learn from your experience.

Thanks a lot for your attention!

