I recently started working with Pandas and I'm currently trying to impute some missing values in my dataset.
I want to impute the missing values based on the median (for numerical entries) and mode (for categorical entries).
However, I do not want to calculate the median and mode over the whole dataset, but per group, based on a GroupBy of my column called "make".
For numerical values I have done the following:
data = data.fillna(data.groupby("make").transform("median"))
--> This works perfectly and replaces all my numerical NA values with the median of their "make".
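For reference, here is a minimal, self-contained sketch of that median step on toy data. Only the "make" column name comes from my dataset; the "price" and "body" columns and their values are made up for illustration. It restricts the median to numeric columns, since depending on the pandas version, passing text columns to the median can raise an error.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the "make" column name comes from the question.
data = pd.DataFrame({
    "make": ["audi", "audi", "audi", "bmw", "bmw", "bmw"],
    "price": [10.0, np.nan, 30.0, 5.0, 7.0, np.nan],
    "body": ["sedan", None, "sedan", "suv", None, "suv"],
})

# Restrict the median to numeric columns so text columns are not passed to it.
num_cols = data.select_dtypes(include="number").columns.tolist()

# transform("median") returns one row per original row, holding the median
# of that row's "make" group, so it lines up with data for fillna.
group_medians = data.groupby("make")[num_cols].transform("median")
data[num_cols] = data[num_cols].fillna(group_medians)
```

The categorical column ("body" here) is left untouched by this step.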
However, I couldn't manage to do the same thing for the mode, i.e. replace all categorical NA values with the mode of their "make".
Does anyone know how to do it?
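One possible approach, sketched here on made-up data rather than as a definitive answer: compute the mode per "make" with a small helper, then map it back onto the rows and use it in fillna. The helper name group_mode, the "body" column, and the explicit cat_cols list are assumptions for illustration. Note that mode() can return several values on a tie; this sketch simply takes the first one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the "make" column name comes from the question.
data = pd.DataFrame({
    "make": ["audi", "audi", "audi", "bmw", "bmw", "bmw"],
    "body": ["sedan", None, "sedan", "suv", None, "suv"],
})

def group_mode(s):
    """Return the most frequent non-missing value of s, or NaN if there is none."""
    m = s.mode()  # mode() drops missing values by default
    return m.iloc[0] if not m.empty else np.nan

# Assumed list of categorical columns to impute (hypothetical).
cat_cols = ["body"]

for col in cat_cols:
    # One mode per make, indexed by make.
    modes = data.groupby("make")[col].agg(group_mode)
    # Look up each row's group mode and use it only where the value is missing.
    data[col] = data[col].fillna(data["make"].map(modes))
```

If a "make" has no non-missing values at all in a column, its rows stay NA, which seems safer than borrowing a value from another group.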
Asked by mt1212 (translated from SO)