Why is a new line added in Kafka when using Delimited "RFC4180_CSV" as the data format?



Scenario:

A new line is added to every record when using the Delimited RFC4180_CSV data format in the Kafka Producer.

With any other data format, the extra line does not appear. Here are the screenshots of the configuration and the console-consumer output.

This is the Data Format configuration.
Pipeline has processed 10 messages.
Kafka-console-consumer output.

In the above output, an extra line is added after every record. We tested the pipeline with the Replace New Line Characters option enabled and found that it no longer adds an extra line after every record. However, it still adds a line after every batch of records, i.e. a new line after 1000 records.

 

Solution:

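The new line is introduced by the CSV format used in the pipeline, which means the Kafka Producer is working as expected. RFC 4180 uses a CRLF line break to delimit records, so each record written in this format carries that line break [1].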

[1] Common Format and MIME Type for Comma-Separated Values (CSV) Files, pages 2-3
https://datatracker.ietf.org/doc/html/rfc4180

++++++++++++

CSV: It is an example of a "flat file" format. It is a delimited data format that has fields/columns separated by the comma character %x2C (Hex 2C) and records/rows/lines separated by characters indicating a line break. RFC 4180 stipulates the use of CRLF pairs to denote line breaks, where CR is %x0D (Hex 0D) and LF is %x0A (Hex 0A). Each line should contain the same number of fields. Fields that contain a special character (comma, CR, LF, or double quote), must be "escaped" by enclosing them in double quotes (Hex 22). An optional header line may appear as the first line of the file with the same format as normal record lines. This header will contain names corresponding to the fields in the file and should contain the same number of fields as the records in the rest of the file. CSV commonly employs US-ASCII as a character set, but other character sets are permitted.

++++++++++++
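
To illustrate where the extra line likely comes from, here is a minimal Python sketch. It is not the pipeline's actual code: `serialize_record` and `console_print` are hypothetical helpers, and the sketch assumes that the producer writes each record with a trailing CRLF and that the console consumer prints its own line separator after every message.

```python
import csv
import io


def serialize_record(fields):
    """Hypothetical stand-in for an RFC 4180 style serializer.

    Writes one record and terminates it with CRLF, as the RFC describes.
    """
    buf = io.StringIO(newline="")  # newline="" keeps "\r\n" untouched
    csv.writer(buf, lineterminator="\r\n").writerow(fields)
    return buf.getvalue()


def console_print(message):
    """Hypothetical stand-in for a console consumer.

    Assumption: the consumer appends its own newline after each message.
    """
    return message + "\n"


payload = serialize_record(["1", "alice", "NY"])
rendered = console_print(payload)

# The payload already ends with CRLF, so the consumer's newline
# shows up as an empty line after the record.
print(repr(payload))
print(rendered.splitlines())

# If the trailing line break is unwanted downstream, it can be stripped
# on the reading side before further processing.
clean = payload.rstrip("\r\n")
print(repr(clean))
```

Under these assumptions, the message value itself contains the line break, so the blank line seen in the console output is a display effect of printing a CRLF-terminated record followed by the consumer's own newline.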

