On the pandas tag, I often see users asking questions about melting dataframes in pandas. I am going to attempt a canonical Q&A (self-answer) on this topic.
I am going to clarify:
What is melt?
How do I use melt?
When do I use melt?
I have seen some popular questions about melt, such as:
pandas convert some columns into rows: This one could actually be good, but more explanation would help.
Pandas Melt Function: Nice question, and the answer is good, but it is a bit too vague, without much explanation.
Melting a pandas dataframe: Also a nice answer! But it only covers that particular situation, which is pretty simple and needs only pd.melt(df).
Pandas dataframe use columns as rows (melt): Very neat! But it only addresses the specific question the OP asked, which also requires pivot_table.
So I am going to attempt a canonical Q&A for this topic.
Dataset:
All my answers will use this dataset of random grades for random people with random ages (which makes the answers easier to explain :D):
import pandas as pd
df = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                   'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                   'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
                   'Age': [13, 16, 16, 15, 15, 13]})
>>> df
Name Math English Age
0 Bob A+ C 13
1 John B B 16
2 Foo A B 16
3 Bar F A+ 15
4 Alex D F 15
5 Tom C A 13
>>>
Problems:
I will pose some problems below, and they will be solved in my self-answer.
Problem 1:
How do I melt a dataframe so that the original dataframe becomes:
Name Age Subject Grades
0 Bob 13 English C
1 John 16 English B
2 Foo 16 English B
3 Bar 15 English A+
4 Alex 15 English F
5 Tom 13 English A
6 Bob 13 Math A+
7 John 16 Math B
8 Foo 16 Math A
9 Bar 15 Math F
10 Alex 15 Math D
11 Tom 13 Math C
I want to reshape this so that one column holds the subject and the other columns hold the repeated names of the students along with their age and grade.
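For reference, here is a minimal sketch of one way this shape could be produced, assuming the df from the Dataset section. The value column is named Grades to match the later problems, and the stable sort on Subject is only my assumption about how to get the English rows listed first.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                   'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                   'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
                   'Age': [13, 16, 16, 15, 15, 13]})

# Keep Name and Age as identifier columns; every other column
# (Math, English) is stacked into Subject / Grades pairs.
melted = df.melt(id_vars=['Name', 'Age'],
                 var_name='Subject', value_name='Grades')

# melt lists Math first (original column order); a stable sort on
# Subject moves English first while keeping the student order.
melted = (melted.sort_values('Subject', kind='mergesort')
                .reset_index(drop=True))
print(melted)
```

Without the sort, the rows come out in the original column order (Math, then English).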
Problem 2:
This is similar to Problem 1, but this time I want the Subject column of the Problem 1 output to contain only Math; I want to filter out English:
Name Age Subject Grades
0 Bob 13 Math A+
1 John 16 Math B
2 Foo 16 Math A
3 Bar 15 Math F
4 Alex 15 Math D
5 Tom 13 Math C
I want the output to be like the above.
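As a rough sketch, assuming the same df as in the Dataset section, restricting the melt to a single column with value_vars is one way to get this. Dropping the English column before melting would be another.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                   'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                   'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
                   'Age': [13, 16, 16, 15, 15, 13]})

# value_vars limits which columns are unpivoted, so English
# never appears in the result.
math_only = df.melt(id_vars=['Name', 'Age'], value_vars=['Math'],
                    var_name='Subject', value_name='Grades')
print(math_only)
```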
Problem 3:
If I were to group the melted dataframe and order the students by their grades, how would I get the desired output below?
value Name Subjects
0 A Foo, Tom Math, English
1 A+ Bob, Bar Math, English
2 B John, John, Foo Math, English, English
3 C Tom, Bob Math, English
4 D Alex Math
5 F Bar, Alex Math, English
I need it to be ordered, with the names separated by commas and the Subjects separated by commas in the same respective order.
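One possible sketch, assuming the df from the Dataset section: melt with the default column names (variable and value), then group on value and join the strings. The rename to Subjects is only there to match the header shown above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                   'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                   'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
                   'Age': [13, 16, 16, 15, 15, 13]})

# Without var_name / value_name, melt uses 'variable' and 'value'.
melted = df.melt(id_vars=['Name', 'Age'])

# groupby sorts the grade keys; within each group the melted row
# order is kept, so names and subjects stay aligned with each other.
grouped = (melted.groupby('value', as_index=False)
                 .agg({'Name': ', '.join, 'variable': ', '.join})
                 .rename(columns={'variable': 'Subjects'}))
print(grouped)
```

Note that the ordering here is plain string ordering of the grades (A, A+, B, ...), not a ranking from best to worst.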
Problem 4:
How would I unmelt a melted dataframe? Let's say I already melted this dataframe:
print(df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades'))
To become:
Name Age Subject Grades
0 Bob 13 Math A+
1 John 16 Math B
2 Foo 16 Math A
3 Bar 15 Math F
4 Alex 15 Math D
5 Tom 13 Math C
6 Bob 13 English C
7 John 16 English B
8 Foo 16 English B
9 Bar 15 English A+
10 Alex 15 English F
11 Tom 13 English A
Then how would I convert this back to the original dataframe, shown below?
Name Math English Age
0 Bob A+ C 13
1 John B B 16
2 Foo A B 16
3 Bar F A+ 15
4 Alex D F 15
5 Tom C A 13
How would I go about doing this?
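A hedged sketch of one way to reverse the melt, assuming the df from the Dataset section. It uses pivot_table with aggfunc='first' because the grades are strings, and it assumes each (Name, Age, Subject) combination appears only once.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                   'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                   'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
                   'Age': [13, 16, 16, 15, 15, 13]})

melted = df.melt(id_vars=['Name', 'Age'],
                 var_name='Subject', value_name='Grades')

# Spread Subject back into columns; 'first' just picks the single
# grade that exists for each (Name, Age, Subject) combination.
restored = (melted.pivot_table(index=['Name', 'Age'], columns='Subject',
                               values='Grades', aggfunc='first')
                  .reset_index()
                  .rename_axis(columns=None))

# pivot_table sorts the new columns alphabetically, so put them
# back in the original order.
restored = restored[['Name', 'Math', 'English', 'Age']]
print(restored)
```

The rows come back sorted by Name rather than in the original row order, since pivot_table sorts its index.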
Problem 5:
If I were to group by the names of the students and separate the subjects and grades with commas, how would I do it?
Name Subject Grades
0 Alex Math, English D, F
1 Bar Math, English F, A+
2 Bob Math, English A+, C
3 Foo Math, English A, B
4 John Math, English B, B
5 Tom Math, English C, A
I want to have a dataframe like above.
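A minimal sketch of one approach, assuming the df from the Dataset section: melt first, then group on Name and join the Subject and Grades strings.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                   'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                   'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
                   'Age': [13, 16, 16, 15, 15, 13]})

melted = df.melt(id_vars=['Name', 'Age'],
                 var_name='Subject', value_name='Grades')

# One row per student; subjects and grades are joined in the same
# order, so the n-th grade belongs to the n-th subject.
by_name = (melted.groupby('Name', as_index=False)
                 .agg({'Subject': ', '.join, 'Grades': ', '.join}))
print(by_name)
```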
Problem 6:
If I wanted to completely melt my dataframe, with all columns as values, how would I do it?
Column Value
0 Name Bob
1 Name John
2 Name Foo
3 Name Bar
4 Name Alex
5 Name Tom
6 Math A+
7 Math B
8 Math A
9 Math F
10 Math D
11 Math C
12 English C
13 English B
14 English B
15 English A+
16 English F
17 English A
18 Age 13
19 Age 16
20 Age 16
21 Age 15
22 Age 15
23 Age 13
I want to have a dataframe like above. All columns as values.
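As a sketch, assuming the df from the Dataset section: calling melt with no id_vars unpivots every column, and var_name / value_name supply the Column and Value headers shown above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
                   'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
                   'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
                   'Age': [13, 16, 16, 15, 15, 13]})

# With no id_vars, every column is treated as a value column, so the
# result has one row per original cell (6 rows x 4 columns = 24 rows).
full = df.melt(var_name='Column', value_name='Value')
print(full)
```

The Value column ends up holding a mix of strings and integers, since names, grades, and ages are all stacked together.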