I have a pipeline migrating an Oracle DB to Salesforce, which fetches a couple of product attributes from Salesforce for all previously migrated products. Each record in the pipeline processes one product and needs to reference this map, which uses local caching so that Salesforce is called only once to build the product map of everything (rather than calling for every product being processed, which is too slow and uses too many API calls).
With this approach, every record gets the map of all products. The approach works with small batch sizes of 50, but runs out of heap memory with batch sizes larger than that.
Is there a way for every record to access this large map of Product information loaded from Salesforce without duplicating it inside every record?
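One generic way to keep a single copy of lookup data is a module-level (per-process) cache that is built lazily on first use and then shared by reference. A minimal sketch in Python, where `fetch_all_products()` is a hypothetical stand-in for the single bulk Salesforce query:

```python
from threading import Lock

def fetch_all_products():
    # Hypothetical stand-in for the one bulk Salesforce call.
    return {"P-1": {"color": "red"}, "P-2": {"color": "blue"}}

_product_map = None
_lock = Lock()

def get_product_map():
    """Build the map once; every caller shares the same dict by reference."""
    global _product_map
    if _product_map is None:
        with _lock:
            # Double-checked locking: re-test after acquiring the lock.
            if _product_map is None:
                _product_map = fetch_all_products()
    return _product_map

def process_record(product_id):
    # Each record looks up only its own attributes; the map is never copied.
    return get_product_map().get(product_id)
```

Because every record holds a reference to the same map rather than its own copy, heap usage stays roughly constant regardless of batch size.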
With your approach, one reasonable solution would be to set the Multiple Values Behavior option to Split into multiple records, then connect a Stream Selector that sends unmatched records (that is, records whose product ID is not the product ID of the expanded attributes) to Trash. As long as your pipeline is reasonably fast and your batches are not very big, you would reclaim the wasted memory quickly, thus dodging memory issues.
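For reference, the Stream Selector condition for the unmatched stream could look something like this (the field paths `/productId` and `/attrProductId` are assumptions; substitute the actual field names from your records):

```
${record:value('/productId') != record:value('/attrProductId')}
```

Records matching this condition would be routed to the stream connected to Trash, while the remaining stream carries only the record whose ID matches its expanded attributes.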