Question

Performance is very slow when I am using JDBC Lookup processor in Data Collector

2 years ago
28 March 2022
2 replies
176 views

HEMANTH14194
Fan
1 reply

I am trying to filter records by looking up a specific value in a table using the JDBC Lookup processor. My source has around 100k records and my JDBC lookup has only 1 record, but the pipeline takes 2 minutes to write 1,000 records to Hive. Overall, the execution time is around 200 minutes for 100k records, which is very slow. Could you please help me with this?

2 replies

Rishi
StreamSets Employee
96 replies
2 years ago
29 March 2022

@HEMANTH14194 Overall pipeline performance can vary with a wide range of factors, and pinning it down would require a detailed analysis of the data source and destination involved and of the transformations you have within the pipeline. The underlying system configuration (for example CPU, memory, and network) can be a contributing factor too.

However, one quick thing you can try is configuring the JDBC Lookup processor to locally cache the values returned from the database table. You can also increase the number of threads that the JDBC Lookup processor uses to pre-populate the lookup cache.

Check out our documentation: https://docs.streamsets.com/portal/platform-datacollector/latest/datacollector/UserGuide/Processors/JDBCLookup.html#concept_jt5_kx2_px
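To illustrate why local caching of lookup values can help, here is a minimal Python sketch of the general idea. It is not Data Collector code and does not reflect how the JDBC Lookup processor is implemented: the `lookup` table, the `CachedLookup` class, the `code` field, and the in-memory SQLite database are all hypothetical stand-ins chosen for illustration. The sketch counts how many queries actually reach the database when many records share the same lookup key.

```python
import sqlite3


def build_lookup_table():
    """Create a hypothetical one-row lookup table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lookup (code TEXT PRIMARY KEY, label TEXT)")
    conn.execute("INSERT INTO lookup VALUES ('A', 'allowed')")
    conn.commit()
    return conn


class CachedLookup:
    """Look up a label by code, querying the database once per distinct code."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # code -> label (or None when the code is absent)
        self.queries = 0   # number of queries that reached the database

    def get(self, code):
        if code in self.cache:
            return self.cache[code]
        self.queries += 1
        row = self.conn.execute(
            "SELECT label FROM lookup WHERE code = ?", (code,)
        ).fetchone()
        value = row[0] if row else None
        # Assumption of this sketch: missing codes are cached as well,
        # so repeated misses do not trigger repeated queries.
        self.cache[code] = value
        return value


def run(n_records=100_000):
    """Filter records through the cached lookup; return (kept, queries)."""
    conn = build_lookup_table()
    lookup = CachedLookup(conn)
    # Half the records carry code 'A' (present), half carry 'B' (absent).
    records = [{"id": i, "code": "A" if i % 2 == 0 else "B"}
               for i in range(n_records)]
    kept = [r for r in records if lookup.get(r["code"]) is not None]
    conn.close()
    return len(kept), lookup.queries
```

In this sketch, calling `run()` sends one query per distinct code rather than one per record, so the query count stays constant as the record count grows. Whether caching yields a comparable gain in a real pipeline depends on the factors mentioned above, such as the destination and the underlying system.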