Good day all, I have an excel with multiple sheets which I load via pd.read_excel into a dict object. Each excel sheet contains some column names which are the same:
Like sheet1: ['IDx','IDy'], sheet2: ['IDx','IDy']
I usually use pd.concat in order to load everything into one data frame. This way same column names also get merged correctly.
But this way is not robust in the case that there is some accidental extra white space in one of the ID columns but they're actually meant to get merged with pd.concat.
like: sheet1: ['IDx','IDy'], sheet2: ['IDx',' IDy']
To tackle this problem I thought to iterate over each data frame from the excel dict and strip() the column names of white space and then pd.concat afterwards.
excel = pd.read_excel('D:ExcelFile.xlsx', sheet_name=None)
new_excel = {}
for name, sheet in excel.items():
sheet = sheet.columns.str.strip()
#print(sheet)
new_excel[name] = sheet
print(new_excel)
output:
{'Sheet1': Index([ 'IDx', 'IDy', ..... ]}
at this point I am stuck. I can't do anything with the new_excel dict. It seems that I am accessing each data frame incorrectly and just get the Index object. I can't get my head around this issue.
When trying to concat with new_excel:
TypeError: cannot concatenate object of type '<class 'pandas.core.indexes.base.Index'>'; only Series and DataFrame objs are valid
Many thanks in advance!
question from:https://stackoverflow.com/questions/65935924/problems-manipulation-dataframes-in-dictionary-from-pd-read-excel