
I am trying to filter records by looking up a specific value from a table using the JDBC Lookup processor. My source has around 100K records and my JDBC lookup has only 1 record, but the pipeline is taking 2 minutes to write 1000 records to Hive. Overall, the execution time is around 200 minutes for 100K records, which is very bad. Could you please help me with this?

@HEMANTH14194 Overall pipeline performance can vary with a wide range of factors, and pinning it down would require a detailed analysis of the data source and destination involved and of the transformations you have within the pipeline. The underlying system configuration (such as CPU, memory, and network) can be a contributing factor too.


However, one quick thing you can try is configuring the JDBC Lookup processor to locally cache the values returned from the database table. You can also increase the number of threads that the JDBC Lookup processor uses to pre-populate the lookup cache.

Check out our documentation: https://docs.streamsets.com/portal/platform-datacollector/latest/datacollector/UserGuide/Processors/JDBCLookup.html#concept_jt5_kx2_px


@HEMANTH14194,

Can you please enable the cache, run the pipeline again, and let me know if it helps?

 

