The error message "kafka.common.MessageSizeTooLargeException" can occur when the Kafka Producer in StreamSets Data Collector tries to send a message that is too large to the Kafka broker.
In SDC you have to set the maximum record size, and this size should be smaller than (or equal to) message.max.bytes, which is set on the Kafka broker (not in the StreamSets Data Collector properties), usually in server.properties.
For Kafka 0.8, when a message fails to be published, the Kafka client receives a very generic exception (FailedToSendMessageException: Failed to send messages after 3 tries) with no proper semantics. Based on this exception alone, it is not possible to determine the actual cause of the issue (whether the message was too large or something else went wrong).
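As an illustration, the following minimal sketch uses the old (0.8) producer API; the broker address, topic name, and payload size are placeholder values, not part of the original article:

    import java.util.Properties;

    import kafka.common.FailedToSendMessageException;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class LargeMessageSendExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092");               // placeholder broker
            props.put("serializer.class", "kafka.serializer.StringEncoder"); // send String payloads
            props.put("message.send.max.retries", "3");                      // the "3 tries" in the exception message

            Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
            try {
                // Payload larger than the broker's message.max.bytes (placeholder size)
                String largePayload = new String(new char[2 * 1024 * 1024]);
                producer.send(new KeyedMessage<String, String>("myTopic", largePayload));
            } catch (FailedToSendMessageException e) {
                // The 0.8 client only reports "Failed to send messages after 3 tries";
                // the broker-side MessageSizeTooLargeException is not visible here.
                System.err.println("Send failed: " + e.getMessage());
            } finally {
                producer.close();
            }
        }
    }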
When you hit this error message, check the following configuration values:
Broker configs:
- message.max.bytes – Maximum size of a message the broker will accept. This has to be smaller than the consumer's fetch.message.max.bytes, or the broker will accept messages that cannot be consumed, causing consumers to hang.
- replica.fetch.max.bytes – Maximum size of data that a broker can replicate. This has to be larger than message.max.bytes, or a broker will accept messages that it then fails to replicate, leading to potential data loss.
- log.segment.bytes – Size of a Kafka data file (log segment). Make sure it is larger than one message (large messages probably shouldn't exceed 1 GB in any case).
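For example, a consistent set of broker-side values in server.properties could look like the sketch below; the numbers are illustrative only, not recommendations:

    # server.properties (broker side) – example values only
    # Largest message the broker will accept (~10 MB)
    message.max.bytes=10000000
    # Larger than message.max.bytes so replicas can copy every accepted message
    replica.fetch.max.bytes=10485760
    # 1 GB segment files, comfortably larger than a single message
    log.segment.bytes=1073741824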
Consumer configs:
- fetch.message.max.bytes – Maximum size of a message a consumer can read. This should be equal to or larger than message.max.bytes.
- For Kafka 0.9 and above, use the new consumer API property max.partition.fetch.bytes instead of fetch.message.max.bytes (see the sketch after this list).
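A minimal sketch of a new (0.9) consumer configured to read such messages is shown below, assuming the same ~10 MB limit; the broker address, group id, and topic are placeholders (with the old high-level consumer you would set fetch.message.max.bytes instead):

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class LargeMessageConsumerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");     // placeholder broker
            props.put("group.id", "large-message-group");       // placeholder group id
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            // New consumer (0.9+): per-partition fetch limit, set >= the broker's message.max.bytes
            props.put("max.partition.fetch.bytes", "10485760");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
            try {
                consumer.subscribe(Collections.singletonList("myTopic"));   // placeholder topic
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println("Read record of " + record.value().length() + " chars");
                }
            } finally {
                consumer.close();
            }
        }
    }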
Take into consideration how you change these values so that Kafka performance is not reduced.
If you need to send large messages through Kafka, see also the article "Handling Large Messages in Kafka": http://ingest.tips/2015/01/21/handling-large-messages-kafka/
With Kafka 0.9 and later, this issue is resolved in StreamSets Data Collector by treating messages that are too large as error records.
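For illustration only (this is not SDC's actual code), the sketch below shows why per-record error handling becomes possible with the 0.9 producer API: an oversized record fails with RecordTooLargeException for that specific record, so the caller can route just that record to error handling. The broker address, topic, and sizes are placeholders:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.RecordTooLargeException;

    public class OversizedRecordExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("max.request.size", "1048576");          // 1 MB producer-side limit

            KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
            String oversized = new String(new char[2 * 1024 * 1024]);   // ~2 MB payload
            try {
                producer.send(new ProducerRecord<String, String>("myTopic", oversized)).get();
            } catch (Exception e) {
                // The 0.9 producer reports RecordTooLargeException for this specific record
                // (either thrown directly or via the returned future), which lets the caller
                // route just that record to error handling instead of failing the whole batch.
                Throwable cause = (e.getCause() != null) ? e.getCause() : e;
                if (cause instanceof RecordTooLargeException) {
                    System.err.println("Record too large: " + cause.getMessage());
                }
            } finally {
                producer.close();
            }
        }
    }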