Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share


Original post: Remove duplicates from JSON data

This is only my second post. I didn't have enough reputation to ask this as a comment on the original post, so here I am.

Andy Hayden makes a great point: "Also, those aren't really duplicates..."

My question is about exactly that situation: how can you remove duplicates from a JSON file by matching on more than one key?

Here is the original example (as was pointed out, it is not valid JSON):

{
  {obj_id: 123,
    location: {
      x: 123,
      y: 323,
  },
  {obj_id: 13,
    location: {
      x: 23,
      y: 333,
  },
 {obj_id: 123,
    location: {
      x: 122,
      y: 133,
  },
}

My case is very similar to this example, except that in my case all of these entries would be kept, because the x and y values for each obj_id are different. If the x and y values were the same, one of them would be removed from the JSON file.
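For the obj_id example above, one way to express "duplicate" is to build a tuple from every field that must match and keep only the first record for each tuple. This is a minimal sketch, assuming the data has been fixed up into a valid JSON list of objects, each with `obj_id` and a nested `location` holding `x` and `y`. The helper name `unique_by` is hypothetical, and the fourth record is an invented exact duplicate added only for illustration.

```python
from __future__ import annotations

from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")


def unique_by(items: Iterable[T], key: Callable[[T], Hashable]) -> list[T]:
    """Return items in their original order, keeping only the first one per key."""
    seen: set = set()
    result: list[T] = []
    for item in items:
        k = key(item)
        if k not in seen:  # first time this combination of fields appears
            seen.add(k)
            result.append(item)
    return result


# Assumed valid-JSON version of the example: a list of objects.
objects = [
    {"obj_id": 123, "location": {"x": 123, "y": 323}},
    {"obj_id": 13, "location": {"x": 23, "y": 333}},
    {"obj_id": 123, "location": {"x": 122, "y": 133}},
    {"obj_id": 123, "location": {"x": 123, "y": 323}},  # hypothetical exact duplicate of the first
]

# A record counts as a duplicate only if obj_id, x and y all match.
deduped = unique_by(
    objects,
    key=lambda o: (o["obj_id"], o["location"]["x"], o["location"]["y"]),
)
print(len(deduped))  # 3
```

The three original entries survive because no two share all of obj_id, x and y; only the added copy of the first entry is dropped.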

All the examples I have found only remove entries based on a match on a single key.

I don't know if it matters, but the keys I need to match against are "Company Name", "First Name", and "Last Name". The file is a JSON of companies and contacts that runs to over 100k lines. Sometimes the same person is a contact at multiple companies, which is why I need to match against multiple keys.
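For a file like the one described here, the three field names can be combined into a single tuple key. This sketch assumes the file is a JSON array of flat objects that each carry "Company Name", "First Name" and "Last Name"; the function names `dedupe_contacts` and `dedupe_file`, and the sample names and companies, are hypothetical. Keeping the seen keys in a set gives an average O(1) membership check per record, which matters at 100k+ lines.

```python
import json

KEY_FIELDS = ("Company Name", "First Name", "Last Name")


def dedupe_contacts(records: list) -> list:
    """Keep the first record for each (Company Name, First Name, Last Name) combination."""
    seen = set()
    unique = []
    for record in records:
        # .get() treats a missing field as None (an assumption about messy data).
        key = tuple(record.get(field) for field in KEY_FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def dedupe_file(in_path: str, out_path: str) -> None:
    """Read a JSON array from in_path, drop duplicates, and write the result to out_path."""
    with open(in_path, encoding="utf-8") as f:
        records = json.load(f)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(dedupe_contacts(records), f, indent=2)


# Hypothetical sample data.
sample = [
    {"Company Name": "Acme", "First Name": "Jane", "Last Name": "Doe"},
    {"Company Name": "Globex", "First Name": "Jane", "Last Name": "Doe"},  # same person, other company: kept
    {"Company Name": "Acme", "First Name": "Jane", "Last Name": "Doe"},    # exact repeat: dropped
]
print(len(dedupe_contacts(sample)))  # 2
```

Because the company is part of the key, a person who is a contact at two companies keeps both records.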

Thanks.


Fight dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss gazes back at you…

1 Answer

I hope this does what you are looking for. (It only checks whether the first and last names are different.)

raw_data = [
    {
        "Company": 123,
        "Person": {
            "First Name": 123,
            "Last Name": 323
        }
    },
    {
        "Company": 13,
        "Person": {
            "First Name": 123,
            "Last Name": 323
        }
    },
    {
        "Company": 123,
        "Person": {
            "First Name": 122,
            "Last Name": 133
        }
    }
]

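# Keep a record only if no already-kept record has the same "Person"
# (first and last name); "Company" is not compared.
unique = []
for company in raw_data:
    if all(unique_comp["Person"] != company["Person"] for unique_comp in unique):
        unique.append(company)

print(unique)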

#>>> [{'Company': 123, 'Person': {'First Name': 123, 'Last Name': 323}}, {'Company': 123, 'Person': {'First Name': 122, 'Last Name': 133}}]
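Because this answer compares only "Person", the entry for company 13 is dropped even though it belongs to a different company, while the question asks to keep such entries. The `all(...)` check also rescans `unique` for every record, which is quadratic in the number of records. A minimal variant that includes the company in the key and tracks seen keys in a set might look like this; it reuses the answer's nested `raw_data` layout, which is itself an assumption about the real file.

```python
raw_data = [
    {"Company": 123, "Person": {"First Name": 123, "Last Name": 323}},
    {"Company": 13, "Person": {"First Name": 123, "Last Name": 323}},
    {"Company": 123, "Person": {"First Name": 122, "Last Name": 133}},
]

seen = set()
unique = []
for entry in raw_data:
    # Company, first name and last name must all match for a record to be a duplicate.
    key = (entry["Company"], entry["Person"]["First Name"], entry["Person"]["Last Name"])
    if key not in seen:
        seen.add(key)
        unique.append(entry)

print(len(unique))  # 3: the same person at two different companies is kept
```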
