Create and append key value pairs to a file of JSON objects with a regex of each JSON object value

Ask Ubuntu Asked on January 6, 2022

I have a big file that contains JSON objects, one object per line.

File example

{"Name" :"%Hana-29-Mrs-Smith","job":"engineer"}
{"Name" :"%Mike-31-Mr-Larry","job":"marketing"}
{"Name" :"%Jhon-40-Mr-Doe","job":"engineer"}

Desired output:

{"Name" :"%Hana-29-Mrs-Smith", "f_nams":"Hana", "age":29, "title":"Mrs", "l_name":"Smith","job":"engineer"}
{"Name" :"%Mike-31-Mr-Larry", "f_nams":"Mike", "age":31, "title":"Mr", "l_name":"Larry","job":"marketing"}
{"Name" :"%Jhon-40-Mr-Doe", "f_nams":"Jhon", "age":40, "title":"Mr", "l_name":"Doe","job":"engineer"}

2 Answers

One possible way that is expressive, procedural, and clear (although the script itself can seem a bit lengthy) is to use Python 3 with the json module.

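#!/usr/bin/env python3
import json
import sys

# Read the file named on the command line, one JSON object per line.
with open(sys.argv[1]) as json_file:
    for line in json_file:
        json_obj = json.loads(line)
        # "%Hana-29-Mrs-Smith" -> ["%Hana", "29", "Mrs", "Smith"]
        tokens = json_obj["Name"].split('-')
        extra_data = {
            "f_nams": tokens[0].replace('%', ''),
            "age"   : tokens[1],
            "title" : tokens[2],
            "l_name": tokens[3]
        }
        # Merge the original object with the new keys and print it as JSON.
        joined_data = {**json_obj, **extra_data}
        print(json.dumps(joined_data))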

The way it works is that we open the file with open() inside a with statement (a context manager), so the file is closed automatically upon completion. From the sample data in the question we may assume that each JSON object is on its own line (NOTE: if the actual data has multi-line JSON objects, you may have to adapt the script, for example with a try-except block that keeps reading the file until a full JSON object has been read into a variable).
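For the multi-line case mentioned in the note, here is a rough sketch of one possible adaptation. It swaps the try-except idea for json.JSONDecoder.raw_decode(), which parses one value and reports where it ended. The helper name iter_json_objects and the sample string are made up for illustration, and the sketch assumes the whole input fits in memory, which may not hold for a very big file.

```python
import json

def iter_json_objects(text):
    """Yield each JSON value found in text, whether or not it spans lines."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the parsed value and the index just past its end.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical input: the first object is split across two lines.
sample = (
    '{"Name": "%Hana-29-Mrs-Smith",\n "job": "engineer"}\n'
    '{"Name": "%Mike-31-Mr-Larry", "job": "marketing"}\n'
)
objects = list(iter_json_objects(sample))
print(len(objects))
```

Each yielded object can then go through the same split-and-merge steps as in the script above.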

From there it's just text manipulation and Python magic: split the value of the "Name" key on the - character into a list of tokens, put the tokens into a new dictionary, and join the two dictionaries with the ** operator, which works inside dictionary literals since Python 3.5 and is usually called "dictionary unpacking" (if you use another version of Python, check the link for alternatives). All that is converted back into a JSON object and printed on standard output. Note that age stays a string here because split() returns strings; wrap tokens[1] in int() if you need a number. If you do need to save the result to a new file, use shell redirection as in ./ ./data.json > ./new_data.json or, if you want to see it on screen at the same time, ./ ./data.json | tee ./new_data.json
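As a small aside on the merge step only, this sketch compares the ** form with a copy-and-update form that should also work on older Python 3 versions. The two dictionaries are shortened sample values, not the full records.

```python
json_obj = {"Name": "%Hana-29-Mrs-Smith", "job": "engineer"}
extra_data = {"f_nams": "Hana", "age": "29"}

# Python 3.5+: unpack both dictionaries into a new one.
merged_unpack = {**json_obj, **extra_data}

# Older Python 3: copy the first dictionary, then update the copy.
merged_update = dict(json_obj)
merged_update.update(extra_data)

# Python 3.9+ also offers: json_obj | extra_data

print(merged_unpack == merged_update)  # prints True
```

In both forms the original json_obj is left untouched, and keys from extra_data win if a key exists in both.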

How the script works in action:

$ ./ ./data.json 
{"Name": "%Hana-29-Mrs-Smith", "job": "engineer", "f_nams": "Hana", "age": "29", "title": "Mrs", "l_name": "Smith"}
{"Name": "%Mike-31-Mr-Larry", "job": "marketing", "f_nams": "Mike", "age": "31", "title": "Mr", "l_name": "Larry"}
{"Name": "%Jhon-40-Mr-Doe", "job": "engineer", "f_nams": "Jhon", "age": "40", "title": "Mr", "l_name": "Doe"}

$ cat ./data.json 
{"Name" :"%Hana-29-Mrs-Smith","job":"engineer"}
{"Name" :"%Mike-31-Mr-Larry","job":"marketing"}
{"Name" :"%Jhon-40-Mr-Doe","job":"engineer"}
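The output above differs from the question's desired output in two ways: age is a string, and the new keys come after job. Since the question title mentions a regex, here is a possible variation using the re module. It is only a sketch, assuming every Name has the shape %first-age-title-last with no extra hyphens; the names NAME_RE and expand are made up for illustration.

```python
import json
import re

# Assumed shape of "Name": %<first>-<age>-<title>-<last>
NAME_RE = re.compile(
    r"^%(?P<f_nams>[^-]+)-(?P<age>\d+)-(?P<title>[^-]+)-(?P<l_name>[^-]+)$"
)

def expand(line):
    """Parse one JSON line and insert the name parts right after "Name"."""
    obj = json.loads(line)
    match = NAME_RE.match(obj.get("Name", ""))
    if match is None:
        # Leave records that do not fit the assumed pattern unchanged.
        return obj
    result = {
        "Name": obj["Name"],
        "f_nams": match.group("f_nams"),
        "age": int(match.group("age")),  # number instead of string
        "title": match.group("title"),
        "l_name": match.group("l_name"),
    }
    # Append the remaining original keys (such as "job") at the end.
    for key, value in obj.items():
        if key != "Name":
            result[key] = value
    return result

sample = '{"Name" :"%Hana-29-Mrs-Smith","job":"engineer"}'
print(json.dumps(expand(sample)))
```

To process a whole file, the loop from the script above can call expand(line) and print json.dumps() of the result. A last name that itself contains a hyphen would not match this pattern and would be passed through unchanged.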

Answered by Sergiy Kolodyazhnyy on January 6, 2022

For non-nested objects such as these, you could consider using Miller:

$ mlr --json put -S '
    @x = splitnv(substr($Name,1,-1),"-"); $f_nams = @x[1]; $age = @x[2]; $title = @x[3]; $l_name = @x[4]
  ' then reorder -e -f job file.json
{ "Name": "%Hana-29-Mrs-Smith", "f_nams": "Hana", "age": 29, "title": "Mrs", "l_name": "Smith", "job": "engineer" }
{ "Name": "%Mike-31-Mr-Larry", "f_nams": "Mike", "age": 31, "title": "Mr", "l_name": "Larry", "job": "marketing" }
{ "Name": "%Jhon-40-Mr-Doe", "f_nams": "Jhon", "age": 40, "title": "Mr", "l_name": "Doe", "job": "engineer" }

Answered by steeldriver on January 6, 2022
