Lookups into a Delta table deliver extremely poor performance when used in Transformer

  • 3 January 2022

Lookups into a Delta table perform extremely poorly (sometimes the pipeline stays in the pre-execution stage indefinitely) when used in a Transformer pipeline whose origin reads 1,000 records. However, the same lookup works reasonably well in streaming mode, which I guess is due to the smaller number of incoming records.

1 reply

Hi @jerri, Delta Lake Lookup performance can vary depending on several factors, including the storage system or layer used to look up keys (e.g., Amazon S3, ADLS, or HDFS).

However, here are a few quick things you can try:

  • If the lookup table is static, you can configure the processor to load the table only once, enabling the processor to cache and reuse the data for the duration of the pipeline run.
  • If you do not load the table only once and the processor passes data to multiple stages, consider enabling caching to improve pipeline performance.
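Both options are set on the Delta Lake Lookup processor itself, not in code. To illustrate why they help, here is a minimal plain-Python sketch that only models the cost difference. `LookupSource`, `enrich`, and `run_two_stages` are hypothetical names, and the `reads` counter stands in for an expensive scan of the Delta table. Since Transformer runs on Spark, the sketch assumes the usual Spark behaviour where an uncached result is recomputed for every downstream stage that consumes it.

```python
from typing import Dict, List, Optional, Tuple

Batch = List[str]
Enriched = List[List[Tuple[str, Optional[str]]]]


class LookupSource:
    """Hypothetical stand-in for the Delta lookup table; counts full reads."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = rows
        self.reads = 0

    def load(self) -> Dict[str, str]:
        self.reads += 1  # each call models one full scan of the lookup table
        return dict(self._rows)


def enrich(batches: List[Batch], source: LookupSource, load_once: bool) -> Enriched:
    """Attach the looked-up value (or None) to every key in every batch."""
    cached = source.load() if load_once else None  # "load table only once"
    result: Enriched = []
    for batch in batches:
        # Without load_once, the table is re-read for every batch.
        table = cached if cached is not None else source.load()
        result.append([(key, table.get(key)) for key in batch])
    return result


def run_two_stages(
    batches: List[Batch], source: LookupSource, load_once: bool, cache_output: bool
) -> Tuple[int, int]:
    """Feed the lookup output to two downstream stages.

    Without cache_output, each stage recomputes the lookup, modelling an
    uncached Spark result that is recomputed for every consumer.
    """
    if cache_output:
        enriched = enrich(batches, source, load_once)  # computed once, shared
        first, second = enriched, enriched
    else:
        first = enrich(batches, source, load_once)
        second = enrich(batches, source, load_once)
    matched = sum(1 for b in first for _, v in b if v is not None)
    unmatched = sum(1 for b in second for _, v in b if v is None)
    return matched, unmatched


if __name__ == "__main__":
    batches = [["a", "b"], ["c", "x"], ["a", "y"]]
    for load_once, cache_output in [(False, False), (True, False), (True, True)]:
        src = LookupSource({"a": "1", "b": "2", "c": "3"})
        counts = run_two_stages(batches, src, load_once, cache_output)
        print(f"load_once={load_once} cache={cache_output} "
              f"counts={counts} reads={src.reads}")
```

With three batches and two downstream stages, the number of table reads drops from 6 (no load-once, no cache) to 2 (load once) to 1 (load once plus cached output), while the lookup results stay the same.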

Because performance issues are generally complex and require detailed analysis, I encourage you to open a support ticket if your organization has an Enterprise support contract.