Can I convert JSON to CSV (delimited) data?

2 years ago
17 January 2022
1 reply
303 views

Userlevel 4

AkshayJadhav
StreamSets Employee
101 replies

It's possible to convert a data format that supports complex fields such as JSON to a flat file format such as CSV, but it can often get tricky very fast.

Consider a JSON example of:

{ "first_name": "Mary", 
  "last_name": "Jane"
}

Seems simple enough. This JSON actually resembles a flat-file format as there is no nesting of fields. We could flatten this into a CSV that looks like:

first_name,last_name
Mary,Jane

However, this gets a little more complicated if we had the following:

{
  "name": {
    "first": "Mary",
    "last": "Jane"
  }
}

Now we must take each level of nesting and "flatten" it by creating some kind of composite field name. This is still doable, with a recursive function such as the one below that can be used in the Jython Evaluator Processor.

def flatten_dict(d):
    def items():
        for key, value in d.items():
            if isinstance(value, dict):
                for subkey, subvalue in flatten_dict(value).items():
                    yield key + "." + subkey, subvalue
            else:
                yield key, value

    return dict(items())

The final case to account for is the most difficult, where you may still have arrays in your final data, another complex field type. Arrays are harder to flatten because they don't have a natural key (field name) and you must generate one. It's possible to use the field name of the array itself plus a number.

The best solution really as you may have already figured out is to use a more complex data-friendly format such as Avro, which is also supported in StreamSets and by the vast majority of the Hadoop ecosystem.

1 reply

mikelsr
StreamSets Employee
5 replies
2 years ago
18 January 2022

Hi Akshay,

Check out this thread where the same topic is discussed and a solution is found https://groups.google.com/a/streamsets.com/g/sdc-user/c/oMANqyP3KQA . The initial solution they propose is using a Field Flattener, but it gets a little more complicated after that!

Regards, Mikel.

Reply

Couldn't find what you're looking for?

Sign up

Social Login

Login to the community

Social Login

Scanning file for viruses.

This file cannot be downloaded