If you would like the LocalFS, HDFS, or ADLS2 destinations to write all of your processed data to a single file, or to split it into multiple files based on configuration, you can use any of the following triggers.

  1. Max Records in File
  2. Max File Size (MB)
  3. Idle Timeout
  4. Late Record Time Limit (time basis set to processing time)

Whichever trigger first reaches its configured value closes the file. For example, if Max File Size is set to 10 GB and Max Records in File to one million, and the file reaches one million processed records before its size reaches 10 GB, the file is closed on the record-count trigger.
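If it helps to picture the behaviour, here is a minimal Python sketch of the "first trigger wins" logic. The class, function, and field names are illustrative assumptions, not the destination's actual configuration API.

```python
from dataclasses import dataclass


@dataclass
class RolloverTriggers:
    max_records: int            # Max Records in File
    max_file_bytes: int         # Max File Size, in bytes
    idle_timeout_s: float       # Idle Timeout
    late_record_limit_s: float  # Late Record Time Limit (processing time)


def should_close_file(records_written: int,
                      file_bytes: int,
                      secs_since_last_record: float,
                      secs_since_file_opened: float,
                      t: RolloverTriggers) -> bool:
    """Close the file as soon as ANY configured trigger is met."""
    return (records_written >= t.max_records
            or file_bytes >= t.max_file_bytes
            or secs_since_last_record >= t.idle_timeout_s
            or secs_since_file_opened >= t.late_record_limit_s)


# Max File Size 10 GB, Max Records in File 1,000,000: one million records
# arrive before the file reaches 10 GB, so the record-count trigger wins.
triggers = RolloverTriggers(max_records=1_000_000,
                            max_file_bytes=10 * 1024**3,
                            idle_timeout_s=3600,
                            late_record_limit_s=5 * 3600)
print(should_close_file(records_written=1_000_000,
                        file_bytes=4 * 1024**3,
                        secs_since_last_record=2,
                        secs_since_file_opened=1200,
                        t=triggers))  # True
```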

To write to a single file and close it on one particular trigger, set that trigger to the desired value and set the remaining triggers to values high enough that they can never be reached. For example, for a pipeline that never runs longer than 5 hours, where you are not sure how many records it will process or how large the file will grow, set Late Record Time Limit to 5 hours and set Max Records in File and Max File Size to values your processed data cannot reach, as in the sketch below.
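As a hedged illustration of that setup, here is what such a configuration might look like. The property names below are hypothetical placeholders chosen for readability, not the destination's actual field names; the point is only that one trigger stays reachable while the rest are pushed out of range.

```python
# Only the 5-hour Late Record Time Limit is realistically reachable;
# the other triggers are set far beyond what the pipeline can produce.
single_file_config = {
    "late_record_time_limit_secs": 5 * 3600,  # 5 hours: the trigger expected to fire
    "max_records_in_file": 10**12,            # far beyond any realistic record count
    "max_file_size_mb": 100 * 1024 * 1024,    # ~100 TB, effectively unreachable
    "idle_timeout_secs": 10**9,               # never trips on idleness
}
print(single_file_config)
```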
