I want to create a sample dataframe, based on a JSON template, that looks as realistic as possible. For that reason I would like the generated values to follow a normal distribution.
This is what I have tried:
import json, random
import pandas as pd
sample_data = """{"product1":[
{"category":"Fruits",
"productlist":["Bell Peppers","Red Chillies", "Onions", "Tomatoes"]}
],
"product2":[
{"category":"Vegetables",
"productlist":["Apple","Mango","Banana"]}
]}"""
products = json.loads(sample_data)
colHeaders = []
for k, v in products.items():
    colHeaders.append(v[0]['category'])
df = pd.DataFrame(columns=colHeaders)
for i in range(1000):
    itemlist = []
    for k, v in products.items():
        itemlist.append(random.choice(v[0]['productlist']))
    #print(itemlist)
    df.loc[len(df)] = itemlist
print(df)
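For reference, `random.choice` picks every item with equal probability, so each column produced this way should come out roughly uniform rather than bell-shaped. A minimal sketch of how the per-item frequencies could be inspected with `value_counts`; the fixed seed and the standalone `items` list are only assumptions to keep the example repeatable and self-contained:

```python
import random
import pandas as pd

random.seed(0)  # assumption: fixed seed, only so the sketch is repeatable

# One of the product lists from the template above
items = ["Apple", "Mango", "Banana"]

# Same sampling approach as in the loop above, for a single column
col = pd.Series([random.choice(items) for _ in range(1000)], name="Vegetables")

# Relative frequency of each item; with random.choice these are all close to 1/3
freq = col.value_counts(normalize=True)
print(freq)
```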
I am not sure I am doing this correctly. If not, please help me with:
- How can I check whether the data frame rows follow a normal distribution?
- How can I try other distributions in this case?
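To make the two questions concrete, here is a rough sketch of one possible direction, not a confirmed solution. Product names are categorical and have no natural order, so "normal" here assumes the list position is treated as the ordering and the middle positions are made most likely. The helper `bell_weights`, its `sigma` parameter, and the seed are hypothetical names introduced only for illustration; `numpy.random.Generator.choice` with its `p` argument does the weighted sampling.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # assumption: fixed seed for repeatability
items = ["Bell Peppers", "Red Chillies", "Onions", "Tomatoes"]

def bell_weights(n, sigma=1.0):
    """Hypothetical helper: normal-density-shaped weights over positions 0..n-1,
    centred on the middle position and normalised to sum to 1."""
    pos = np.arange(n)
    centre = (n - 1) / 2
    w = np.exp(-0.5 * ((pos - centre) / sigma) ** 2)
    return w / w.sum()

# Target probabilities: items in the middle of the list are drawn most often
p = bell_weights(len(items))

# Weighted sampling instead of the uniform random.choice
sample = pd.Series(rng.choice(items, size=1000, p=p))

# Compare observed relative frequencies against the target probabilities
observed = sample.value_counts(normalize=True).reindex(items, fill_value=0.0)
print(pd.DataFrame({"expected": p, "observed": observed.values}, index=items))
```

Trying another distribution would then amount to passing a different probability vector as `p`, for example weights that decay with position, and comparing the observed column against it in the same way.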
Related Stack Overflow questions I have referred to:
- Select one element from a list using python following the normal distribution
- Create distribution in Pandas