I believe the following function is a working solution for pandas DataFrame rolling argmin/max:
import numpy as np
import pandas as pd


def data_frame_rolling_arg_func(df, window_size, func):
    # func is 'min' or 'max'; it selects np.argmin or np.argmax.
    # Note: applymap was renamed to DataFrame.map in pandas 2.1.
    ws = window_size
    wm1 = window_size - 1
    return (df.rolling(ws).apply(getattr(np, f'arg{func}'))[wm1:].astype(int) +
            np.array([np.arange(len(df) - wm1)]).T).applymap(
        lambda x: df.index[x]).combine_first(df.applymap(lambda x: np.nan))
It is inspired by a partial solution for rolling idxmax on pandas Series.
Explanations:
- Apply the numpy argmin/max function to the rolling window.
- Only keep the non-`NaN` values.
- Convert the values to `int`.
- Realign the values to original row numbers.
- Use `applymap` to replace the row numbers by the index values.
- Combine with the original `DataFrame` filled with `NaN` in order to add the first rows with expected `NaN` values.
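The steps above can be traced one at a time on a single column of sample data. This is only a sketch: the intermediate names (`in_window`, `offsets`, `row_numbers`, `labels`) are made up for illustration, and steps 5 and 6 use plain NumPy indexing and `reindex` instead of `applymap` and `combine_first`, as an assumption that they give the same result.

```python
import numpy as np
import pandas as pd

# One column of sample data, kept as a DataFrame.
df = pd.DataFrame({0: [-4, 0, 7, 11, 6, -1, 6, 8, -2, 0]},
                  index=list("abcdefghij"))
ws = 3
wm1 = ws - 1

# Step 1: position of the maximum inside each window (0 .. ws - 1).
# The first wm1 rows have no full window and come back as NaN.
in_window = df.rolling(ws).apply(np.argmax, raw=True)

# Steps 2 and 3: keep only the rows with a full window, convert to int.
in_window = in_window.iloc[wm1:].astype(int)

# Step 4: the window ending at row p starts at row p - wm1,
# so adding that start offset gives the absolute row number.
offsets = np.arange(len(df) - wm1).reshape(-1, 1)
row_numbers = in_window.to_numpy() + offsets

# Step 5: replace row numbers by index labels (NumPy indexing here,
# not applymap as in the function above).
labels = pd.DataFrame(df.index.to_numpy()[row_numbers],
                      index=in_window.index, columns=df.columns)

# Step 6: restore the first wm1 rows as NaN (reindex here,
# not combine_first as in the function above).
result = labels.reindex(df.index)
print(result)
```

The offset in step 4 is what turns a within-window position into a row number of the whole frame; without it every value would stay in the range 0 to `ws - 1`.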
In [1]: index = map(chr, range(ord('a'), ord('a') + 10))
In [2]: df = pd.DataFrame((10 * np.random.randn(10, 3)).astype(int), index=index)
In [3]: df
Out[3]:
     0    1    2
a   -4   15    0
b    0   -6    4
c    7    8  -18
d   11   12  -16
e    6    3   -6
f   -1    4   -9
g    6  -10   -7
h    8   11  -25
i   -2  -10   -8
j    0   10   -7
In [4]: data_frame_rolling_arg_func(df, 3, 'max')
Out[4]:
     0    1    2
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c    c    a    b
d    d    d    b
e    d    d    e
f    d    d    e
g    e    f    e
h    h    h    g
i    h    h    g
j    h    h    j
In [5]: data_frame_rolling_arg_func(df, 3, 'min')
Out[5]:
     0    1    2
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c    a    b    c
d    b    b    c
e    e    e    c
f    f    e    d
g    f    g    f
h    f    g    h
i    i    g    h
j    i    i    h
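As a sanity check on the outputs above, here is a brute-force sketch that calls `idxmax`/`idxmin` on every window explicitly and compares the result with the labels printed in `Out[4]` and `Out[5]`. The helper name `rolling_idx_brute_force` is made up for this check, and the loop is meant as a slow reference, not as a replacement for the function.

```python
import numpy as np
import pandas as pd


def rolling_idx_brute_force(df, window_size, func):
    """Slow reference: call idxmax/idxmin on each full window."""
    rows = []
    for end in range(len(df)):
        if end < window_size - 1:
            # Not enough rows yet for a full window.
            rows.append([np.nan] * df.shape[1])
        else:
            window = df.iloc[end - window_size + 1:end + 1]
            rows.append(getattr(window, f"idx{func}")().tolist())
    return pd.DataFrame(rows, index=df.index, columns=df.columns)


# The same values as the example DataFrame shown in Out[3].
df = pd.DataFrame(
    [[-4, 15, 0], [0, -6, 4], [7, 8, -18], [11, 12, -16], [6, 3, -6],
     [-1, 4, -9], [6, -10, -7], [8, 11, -25], [-2, -10, -8], [0, 10, -7]],
    index=list("abcdefghij"),
)

# Labels for rows c..j, copied from Out[4] (max) and Out[5] (min).
expected = {
    "max": {0: list("cdddehhh"), 1: list("adddfhhh"), 2: list("bbeeeggj")},
    "min": {0: list("abefffii"), 1: list("bbeegggi"), 2: list("cccdfhhh")},
}

all_match = all(
    rolling_idx_brute_force(df, 3, func)[col].iloc[2:].tolist() == labels
    for func, columns in expected.items()
    for col, labels in columns.items()
)
print(all_match)
```

Both `idxmax` and `np.argmax` return the first occurrence on ties, which matters for windows such as rows g to i of column 1 in the `min` case.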
My questions are:
- Can you find any mistakes?
- Is there a better solution? That is: more performant and/or more elegant.
And for pandas maintainers out there: it would be nice if the already great pandas library included rolling idxmax and idxmin.