Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

When unstacking a pd.DataFrame from long format to wide format, pandas automatically sorts the new columns in ascending order. How can I avoid the sorting altogether? I get the expected result when I reorder the columns manually, but this can't be the best solution, can it?

import pandas as pd

df = pd.DataFrame(
    {"cols": ["B", "A", "C", "D"],
     "rows": [2, 1, 3, 4],
     "value": [3, 1, 2, 4]})

df = df.set_index(["cols", "rows"], drop=True)
print(df)

           value
cols rows       
B    2         3
A    1         1
C    3         2
D    4         4

actual_result = df.unstack(level="cols").droplevel(level=0, axis=1)
print(actual_result)

cols    A    B    C    D
rows                    
1     1.0  NaN  NaN  NaN
2     NaN  3.0  NaN  NaN
3     NaN  NaN  2.0  NaN
4     NaN  NaN  NaN  4.0

expected_result = actual_result[["B", "A", "C", "D"]]
print(expected_result)

cols    B    A    C    D
rows                    
1     NaN  1.0  NaN  NaN
2     3.0  NaN  NaN  NaN
3     NaN  NaN  2.0  NaN
4     NaN  NaN  NaN  4.0
Question from: https://stackoverflow.com/questions/65938714/avoid-sorting-columns-when-unstacking-a-pandas-dataframe

1 Answer

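Instead of reordering the columns in the last step, one idea is to convert the original column to an ordered categorical whose categories follow the order of first appearance. unstack then orders the new columns by category order rather than alphabetically: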

df["cols"] = pd.Categorical(df["cols"], ordered=True, categories=df["cols"].unique())
df = df.set_index(["cols", "rows"], drop=True)


actual_result = df.unstack(level="cols").droplevel(level=0, axis=1)
print(actual_result)
cols    B    A    C    D
rows                    
1     NaN  1.0  NaN  NaN
2     3.0  NaN  NaN  NaN
3     NaN  NaN  2.0  NaN
4     NaN  NaN  NaN  4.0
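
One side effect worth checking: with this approach the column labels of the result are most likely a CategoricalIndex rather than plain strings, which can matter for later operations on the columns. The following is a minimal, self-contained sketch that repeats the categorical approach on the question's data and then casts the labels back to ordinary strings. The variable name wide and the final astype(str) step are additions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {"cols": ["B", "A", "C", "D"],
     "rows": [2, 1, 3, 4],
     "value": [3, 1, 2, 4]})

# Categories follow the order of first appearance: B, A, C, D.
df["cols"] = pd.Categorical(df["cols"], categories=df["cols"].unique(), ordered=True)

wide = (df.set_index(["cols", "rows"])
          .unstack(level="cols")
          .droplevel(level=0, axis=1))

# Optional (an assumption about what later code needs): turn the
# categorical column labels back into plain string labels.
wide.columns = wide.columns.astype(str)

print(list(wide.columns))
```

If the categorical labels are not a problem for later steps, the cast can simply be dropped.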

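Another idea is to leave the column as it is and pass its unique values (in order of first appearance) to reindex after unstacking. Here df is the original DataFrame, before set_index: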

df1 = df.set_index(["cols", "rows"], drop=True)
print(df1)


actual_result = (df1.unstack(level="cols")
                    .droplevel(level=0, axis=1)
                    .reindex(df["cols"].unique(), axis=1))
print(actual_result)
cols    B    A    C    D
rows                    
1     NaN  1.0  NaN  NaN
2     3.0  NaN  NaN  NaN
3     NaN  NaN  2.0  NaN
4     NaN  NaN  NaN  4.0

Or take the unique values from the first index level of df1, so the original column is no longer needed:

df1 = df.set_index(["cols", "rows"], drop=True)
print(df1)


actual_result = (df1.unstack(level="cols")
                    .droplevel(level=0, axis=1)
                    .reindex(df1.index.get_level_values(0).unique(), axis=1))
print(actual_result)

cols    B    A    C    D
rows                    
1     NaN  1.0  NaN  NaN
2     3.0  NaN  NaN  NaN
3     NaN  NaN  2.0  NaN
4     NaN  NaN  NaN  4.0
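
If this pattern is needed in several places, the last variant can be wrapped in a small function. This is only a sketch: the name unstack_keep_order is hypothetical, and it assumes a MultiIndex-ed frame with a single value column, so that dropping the outer column level is safe.

```python
import pandas as pd


def unstack_keep_order(frame, level):
    """Hypothetical helper: unstack `level` and keep its labels in
    order of first appearance instead of sorted order."""
    # unique() on an Index preserves the order of first appearance.
    order = frame.index.get_level_values(level).unique()
    return (frame.unstack(level=level)
                 .droplevel(level=0, axis=1)   # assumes one value column
                 .reindex(order, axis=1))


long_df = pd.DataFrame(
    {"cols": ["B", "A", "C", "D"],
     "rows": [2, 1, 3, 4],
     "value": [3, 1, 2, 4]}).set_index(["cols", "rows"])

wide = unstack_keep_order(long_df, "cols")
print(wide)
```

The column order is read from the index before unstacking, so the helper does not depend on the original long-format column still being available.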
