Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I'm trying to replace missing values of column "Age" but under condition of other columns on this data Titanic - Machine Learning from Disaster

df.Age[(df['Sex'] == 0) & (df['Pclass'] == 1)]

I tried to do that using SimpleImputer:

from sklearn.impute import SimpleImputer
Imputer = SimpleImputer(missing_values=np.nan, strategy='most_frequent')

Imputer.fit_transform( pd.DataFrame(df.Age[(df['Sex'] == 0) & (df['Pclass'] == 1)]) )

but it doesn't work and tried to save values to the column:

df.loc[(df.Age.isnull()) & (df.Age[(df['Sex'] == 0) & (df['Pclass'] == 1)]), 'Age'] = Imputer.fit_transform( pd.DataFrame(df.Age[(df['Sex'] == 0) & (df['Pclass'] == 1)]) )

but doesn't work also.

I tried to do it manually using fillna()

df.loc[(df['Sex'] == 0) & (df['Pclass'] == 1), 'Age'].fillna(int(df.Age[(df['Sex'] == 0) & (df['Pclass'] == 1)].mode()), inplace=True)

I tried to use indexes to access rows and update their values:

mod = int(df.Age[(df['Sex'] == 0) & (df['Pclass'] == 1)].mode())
indices = df.loc[(df.Age.isnull()) & (df.Sex == 0) & (df.Pclass == 1), 'Age'].isnull().index
df.loc[ind, 'Age'] = mod
df[(df['Sex'] == 0) & (df['Pclass'] == 1)]['Age'].isnull().sum()

it worked and the output was: 0, but when I'm trying to apply it in for loop it gives me an error

for i in range(1,3):
    for j in range(1,4):    
        indices = df.loc[(df.Sex == i) & (df.Pclass == j), 'Age'].isnull().index
        mod = int(df.Age[(df['Sex'] == i) & (df['Pclass'] == j)].mode())
        df.loc[ind, 'Age'] = mod

I want to know what is the wrong of first 2 ways an why the 3rd doesn't work in loop?

question from:https://stackoverflow.com/questions/65643621/replace-missing-values-with-most-frequent-number-under-condition

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
337 views
Welcome To Ask or Share your Answers For Others

1 Answer

Try to use the method pad. It takes the most recent value. Afterward, you can drop certain values based on conditions from other columns.

df.fillna(method='pad')

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...