Starting from
>>> df
val1 val2 val3
city_id
houston,tx 1 2 0
houston,tx 0 0 1
houston,tx 2 1 1
somewhere,ew 4 3 7
I might do
>>> df.groupby(df.index).sum()
val1 val2 val3
city_id
houston,tx 3 3 2
somewhere,ew 4 3 7
or
>>> df.reset_index().groupby("city_id").sum()
val1 val2 val3
city_id
houston,tx 3 3 2
somewhere,ew 4 3 7
The first approach passes the index values (in this case, the city_id
values) to groupby
and tells it to use those as the group keys, and the second resets the index and then selects the city_id
column. See this section of the docs for more examples. Note that there are lots of other methods in the DataFrameGroupBy
objects, too:
>>> df.groupby(df.index)
<pandas.core.groupby.DataFrameGroupBy object at 0x1045a1790>
>>> df.groupby(df.index).max()
val1 val2 val3
city_id
houston,tx 2 2 1
somewhere,ew 4 3 7
>>> df.groupby(df.index).mean()
val1 val2 val3
city_id
houston,tx 1 1 0.666667
somewhere,ew 4 3 7.000000