If I have the following dataframe, derived like so: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 1)))
   0
0  0
1  2
2  8
3  1
4  0
5  0
6  7
7  0
8  2
9  2
Is there an efficient way to cumsum rows with a limit, so that each time the limit is reached a new cumsum starts? Each time the limit is reached (after however many rows), a row is created with the total cumsum.
Below I have created an example of a function that does this, but it's very slow, especially when the dataframe becomes very large. I don't like that my function is looping and I am looking for a way to make it faster (I guess a way without a loop).
def foo(df, max_value):
    last_value = 0
    storage = []
    for index, row in df.iterrows():
        this_value = np.nansum([row[0], last_value])
        if this_value >= max_value:
            storage.append((index, this_value))
            this_value = 0
        last_value = this_value
    return storage
If you run my function like so: foo(df, 5)
In the above context, it returns:
    0
2  10
6   8
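For reference, here is a minimal sketch of the same reset-on-limit logic that avoids `iterrows()` by walking the column's underlying NumPy array. It is still a loop, not a vectorized solution, so treat it as a baseline for comparison rather than the answer I'm after. The name `foo_fast` is hypothetical, and the dataframe is hard-coded to the example values shown above so the result can be reproduced.

```python
import numpy as np
import pandas as pd


def foo_fast(series, max_value):
    """Collect (index label, running total) each time the running sum reaches max_value.

    Hypothetical helper: same logic as foo(), but it iterates over plain
    values instead of building a Series per row with iterrows().
    """
    storage = []
    running = 0.0
    values = series.to_numpy(dtype=float)
    for index, value in zip(series.index, values):
        if not np.isnan(value):  # mirror np.nansum: NaN is treated as zero
            running += value
        if running >= max_value:
            storage.append((index, running))
            running = 0.0  # limit reached, start a new cumsum
    return storage


# The example values from the dataframe above, hard-coded for reproducibility.
df = pd.DataFrame([0, 2, 8, 1, 0, 0, 7, 0, 2, 2])
result = foo_fast(df[0], 5)
print(result)  # [(2, 10.0), (6, 8.0)]
```

The totals come back as floats because the column is converted with `dtype=float` so that NaN values can be skipped, which is an assumption on my side about how missing data should be handled.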