Question

Performance is very slow when using the JDBC Lookup processor in Data Collector

28 March 2022
2 replies
176 views

HEMANTH14194
Fan

I am trying to filter records by looking up a specific value in a table with the JDBC Lookup processor. My source has around 100K records and the lookup table has only 1 record, but the pipeline takes 2 minutes to write 1,000 records to Hive. Overall, execution takes around 200 minutes for 100K records, which is far too slow. Can you help me with this?
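The reported figures are self-consistent. They imply a steady cost of about 120 ms per record, so total run time grows linearly with the record count. A fixed per-record cost like this is consistent with, for example, one database query per record, though the post does not establish that this is the cause.

```latex
\[
\frac{2\ \text{min}}{1{,}000\ \text{records}}
  = \frac{120\ \text{s}}{1{,}000\ \text{records}}
  = 0.12\ \text{s/record} = 120\ \text{ms/record}
\]
\[
100{,}000\ \text{records} \times 0.12\ \text{s/record} = 12{,}000\ \text{s} = 200\ \text{min}
\]
```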

2 replies

Bikram
Headliner
4 April 2022

@HEMANTH14194,

Can you enable local caching on the JDBC Lookup processor, run the pipeline again, and let me know if it helps?

Rishi
StreamSets Employee
29 March 2022

@HEMANTH14194 Overall pipeline performance can vary with a wide range of factors, and diagnosing it would require a detailed analysis of the data source and destination involved and of the transformations within the pipeline. The underlying system configuration (CPU, memory, network) can also be a contributing factor.

However, one quick thing you can try is configuring the JDBC Lookup processor to locally cache the values returned from the database table. You can also increase the number of threads that the JDBC Lookup processor uses to pre-populate the lookup cache.
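To illustrate why a local cache helps, here is a minimal Python sketch. It is not how Data Collector implements its cache. It uses an in-memory sqlite3 database as a stand-in for the JDBC table, and the table name `filter_values` and the helpers `DbLookup`, `CachedLookup`, and `filter_records` are hypothetical. It counts how many database queries the filter issues with and without a per-key local cache:

```python
import sqlite3


def make_db():
    """In-memory stand-in for the lookup table (hypothetical schema, one 'keep' row)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE filter_values (value TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO filter_values (value) VALUES ('keep')")
    conn.commit()
    return conn


class DbLookup:
    """Issues one SQL query per call, like an uncached lookup, and counts the queries."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def __call__(self, value):
        self.queries += 1
        row = self.conn.execute(
            "SELECT 1 FROM filter_values WHERE value = ?", (value,)
        ).fetchone()
        return row is not None


class CachedLookup:
    """Stores each key's result locally, so repeated keys never reach the database."""

    def __init__(self, lookup):
        self.lookup = lookup
        self.cache = {}

    def __call__(self, value):
        if value not in self.cache:
            self.cache[value] = self.lookup(value)
        return self.cache[value]


def filter_records(records, matches):
    """Keep only records whose 'value' field is found in the lookup table."""
    return [r for r in records if matches(r["value"])]


if __name__ == "__main__":
    records = [{"id": i, "value": "keep" if i % 2 == 0 else "drop"} for i in range(10_000)]

    uncached = DbLookup(make_db())
    kept = filter_records(records, uncached)
    print(len(kept), uncached.queries)  # 5000 10000

    backend = DbLookup(make_db())
    kept = filter_records(records, CachedLookup(backend))
    print(len(kept), backend.queries)  # 5000 2
```

The cache only pays off when lookup keys repeat. Here just two distinct values occur, so 10,000 lookups collapse to 2 queries. The cache options the processor actually exposes are described in the documentation linked below.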

Check out our documentation: https://docs.streamsets.com/portal/platform-datacollector/latest/datacollector/UserGuide/Processors/JDBCLookup.html#concept_jt5_kx2_px
