Question:
When running a pipeline with an Amazon S3 destination, a user may see many of the following warnings in the log file:
WARN AmazonS3Client - No content length specified for stream data. Stream contents will be buffered in memory and could result in out of memory errors.
Answer:
The warning is a general message from the AWS S3 Java client. S3 requires the Content-Length HTTP header, which gives the size of the object in bytes, on every upload. When the client uploads a file directly, it sets this header automatically. Because the pipeline streams data rather than uploading a file, the client must buffer the entire stream in memory to calculate the content length before sending the data to Amazon S3. The message only warns that if a batch is larger than the available memory, buffering it can cause out-of-memory errors. Unless you are actually seeing memory issues, you do not need to worry about this warning.
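To illustrate why an unknown stream length forces buffering, here is a minimal sketch that uses only the Java standard library. It is not the AWS SDK's actual implementation; the class name StreamLengthSketch and the method bufferFully are hypothetical names chosen for this example. It shows that the only way to learn the size of a stream that does not report its length is to read all of it into memory first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this class is not part of the AWS SDK.
class StreamLengthSketch {

    // Reads the whole stream into a byte array. Memory use grows with the
    // size of the stream, which is the risk the warning describes.
    static byte[] bufferFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for a batch of records streamed to the destination.
        InputStream batch = new ByteArrayInputStream(
                "record-1\nrecord-2\n".getBytes(StandardCharsets.UTF_8));

        // The length is known only after the entire stream has been buffered.
        byte[] buffered = bufferFully(batch);
        long contentLength = buffered.length;
        System.out.println("Content-Length: " + contentLength);
    }
}
```

For code that calls the AWS SDK for Java 1.x directly and already knows the size of its data, setting it with ObjectMetadata.setContentLength before calling putObject with an InputStream lets the client skip this buffering step. Whether that option applies to a given pipeline destination depends on how the destination is implemented.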