Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have the following dict, which I want to convert into a pandas DataFrame. The dict has nested lists that can appear for one node but not for others.

dis = {"companies": [{"object_id": 123,
                      "name": "Abd ",
                      "contact_name": ["xxxx", "yyyy"],
                      "contact_id": [1234, 33455]},
                     {"object_id": 654,
                      "name": "DDSPP"},
                     {"object_id": 987,
                      "name": "CCD"}]}

into this:

object_id, name, contact_name, contact_id
123,Abd,xxxx,1234
123,Abd,yyyy,
654,DDSPP,,
987,CCD,,

How can I achieve this?

I was trying something like:

abc = pd.DataFrame(dis).set_index['object_id','contact_name']

but it raises:

'method' object is not subscriptable

1 Answer

This is inspired by @jezrael's answer to: Splitting multiple columns into rows in pandas dataframe

Use:

import pandas as pd

s = {"companies": [{"object_id": 123,
                    "name": "Abd ",
                    "contact_name": ["xxxx", "yyyy"],
                    "contact_id": [1234, 33455]},
                   {"object_id": 654,
                    "name": "DDSPP"},
                   {"object_id": 987,
                    "name": "CCD"}]}
df = pd.DataFrame(s)  # convert into a DataFrame with a single 'companies' column
df = df['companies'].apply(pd.Series)  # expand each dict's keys and values into columns
# turn each list into one row per element, indexed by the original row
split1 = df.apply(lambda x: pd.Series(x['contact_id']), axis=1).stack().reset_index(level=1, drop=True)
split2 = df.apply(lambda x: pd.Series(x['contact_name']), axis=1).stack().reset_index(level=1, drop=True)
df1 = pd.concat([split1, split2], axis=1, keys=['contact_id', 'contact_name'])
pd.options.display.float_format = '{:.0f}'.format  # print floats without decimals
# replace the list columns with the split rows, then renumber the index
print(df.drop(['contact_id', 'contact_name'], axis=1).join(df1).reset_index(drop=True))

Output with regular index:

    name  object_id  contact_id contact_name
0   Abd         123        1234         xxxx
1   Abd         123       33455         yyyy
2   DDSPP       654         nan          NaN
3   CCD         987         nan          NaN
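
As a possible alternative (not part of the approach above), newer pandas versions can do the same split with DataFrame.explode. This is only a sketch: it assumes pandas 1.3 or later, where explode accepts a list of columns, and it assumes the two lists in each record always have the same length. The names list_cols and flat are just illustrative.

```python
import pandas as pd

dis = {"companies": [{"object_id": 123,
                      "name": "Abd ",
                      "contact_name": ["xxxx", "yyyy"],
                      "contact_id": [1234, 33455]},
                     {"object_id": 654,
                      "name": "DDSPP"},
                     {"object_id": 987,
                      "name": "CCD"}]}

# Build the frame from the list of records; missing keys become NaN.
df = pd.DataFrame(dis["companies"])

# Assumption: both list columns have matching lengths in every row.
list_cols = ["contact_name", "contact_id"]

# One output row per list element; rows without lists keep a single NaN row.
flat = df.explode(list_cols, ignore_index=True)
print(flat)
```

Records without contacts stay as a single row with missing values in both contact columns, so no separate join step is needed.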

Is this something you were looking for?
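
About the error in the question: set_index is a method, so it has to be called with parentheses. Writing set_index[...] subscripts the method object itself, which is what raises "'method' object is not subscriptable". A small sketch on a cut-down version of the data (the companies, indexed and message names are only for illustration):

```python
import pandas as pd

companies = [
    {"object_id": 123, "name": "Abd"},
    {"object_id": 654, "name": "DDSPP"},
]
df = pd.DataFrame(companies)

# Call the method with parentheses and pass the column names as a list.
indexed = df.set_index(["object_id", "name"])

# Subscripting the method instead of calling it reproduces the error.
try:
    df.set_index["object_id", "name"]
except TypeError as exc:
    message = str(exc)
    print(message)
```

Note that the columns also have to exist first: pd.DataFrame(dis) only has a single 'companies' column, so the frame needs to be built from dis["companies"] before indexing on object_id.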

