The following DataFrame represents events received from users: the id of the user and the timestamp of each event (a snippet to rebuild it follows the listing):

    id           timestamp
0    1 2020-09-01 18:14:35
1    1 2020-09-01 18:14:39
2    1 2020-09-01 18:14:40
3    1 2020-09-01 02:09:22
4    1 2020-09-01 02:09:35
5    1 2020-09-01 02:09:53
6    1 2020-09-01 02:09:57
7    2 2020-09-01 18:14:35
8    2 2020-09-01 18:14:39
9    2 2020-09-01 18:14:40
10   2 2020-09-01 02:09:22
11   2 2020-09-01 02:09:35
12   2 2020-09-01 02:09:53
13   2 2020-09-01 02:09:57
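
For reference, the sample frame can be rebuilt with something like this (a minimal sketch; the variable name df matches the code further down):

import pandas as pd

# rebuild the sample events shown above (the same seven rows for each user)
times = ['2020-09-01 18:14:35', '2020-09-01 18:14:39', '2020-09-01 18:14:40',
         '2020-09-01 02:09:22', '2020-09-01 02:09:35', '2020-09-01 02:09:53',
         '2020-09-01 02:09:57']
df = pd.DataFrame({'id': [1] * 7 + [2] * 7, 'timestamp': pd.to_datetime(times * 2)})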

I would like to get the average expanding session time. A session is defined as a sequence of events that is terminated by a break of more than 5 minutes.

I've grouped the sessions like so:

df.groupby(['id', pd.Grouper(key="timestamp", freq='5min', origin='start')])

And got the right groups:

   id           timestamp
3   1 2020-09-01 02:09:22
4   1 2020-09-01 02:09:35
5   1 2020-09-01 02:09:53
6   1 2020-09-01 02:09:57
   id           timestamp
0   1 2020-09-01 18:14:35
1   1 2020-09-01 18:14:39
2   1 2020-09-01 18:14:40
    id           timestamp
10   2 2020-09-01 02:09:22
11   2 2020-09-01 02:09:35
12   2 2020-09-01 02:09:53
13   2 2020-09-01 02:09:57
   id           timestamp
7   2 2020-09-01 18:14:35
8   2 2020-09-01 18:14:39
9   2 2020-09-01 18:14:40
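
The listing above can be reproduced by iterating over the groups, roughly like this:

# print each (id, 5-minute bin) group separately
g = df.groupby(['id', pd.Grouper(key="timestamp", freq='5min', origin='start')])
for _, session in g:
    print(session)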

Now I would like to calculate the average session time in seconds per user at any given row, so that the output is as follows (a short check of the arithmetic appears after the table):

    id           timestamp  avg_session_time
0    1 2020-09-01 18:14:35  0 <-- first event
1    1 2020-09-01 18:14:39  4 <-- 2nd event after 4 seconds
2    1 2020-09-01 18:14:40  5 <-- 3rd event after 5 seconds
--- session end
3    1 2020-09-01 02:09:22  5 <-- first event of the second session (only the completed first session's 5 seconds count so far)
4    1 2020-09-01 02:09:35  9 <-- 2nd event after 13 seconds ((13 seconds in the 2nd session + 5 in the 1st session) divided by the number of sessions, 2)
5    1 2020-09-01 02:09:53  18 <-- 3rd event after 31 seconds ((31 + 5) / 2 = 18)
6    1 2020-09-01 02:09:57  20 <-- 4th event after 35 seconds ((35 + 5) / 2 = 20)
---
7    2 2020-09-01 18:14:35  0
8    2 2020-09-01 18:14:39  4
9    2 2020-09-01 18:14:40  5
---
10   2 2020-09-01 02:09:22  5
11   2 2020-09-01 02:09:35  9
12   2 2020-09-01 02:09:53  18
13   2 2020-09-01 02:09:57  20
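
To make the arithmetic concrete, here is a minimal check of user 1's second session (plain Python, using only the numbers above):

prev_session_total = 5                    # the 18:14 session lasted 5 seconds in total
for elapsed in [0, 13, 31, 35]:           # seconds since the start of the 02:09 session
    if elapsed == 0:
        avg = prev_session_total          # first event of a session: only the finished session counts
    else:
        avg = (prev_session_total + elapsed) / 2   # average over the 2 sessions seen so far
    print(avg)                            # 5, 9.0, 18.0, 20.0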

Any help would be awesome :)

question from: https://stackoverflow.com/questions/65561016/pandas-expanding-average-session-time

1 Answer

Use:

import numpy as np
import pandas as pd

# convert to datetimes
df['timestamp'] = pd.to_datetime(df['timestamp'])

# group per id and per 5-minute bin
g = df.groupby(['id', pd.Grouper(key="timestamp", freq='5min', origin='start')])
# first timestamp per group to a new column
df['diff'] = g['timestamp'].transform('first')
# subtract the session start from the timestamp and convert the timedeltas to seconds
df['diff'] = df['timestamp'].sub(df['diff']).dt.total_seconds()
# shift per id groups
df['new'] = df.groupby('id')['diff'].shift()
# first value per group of the shifted column (the previous session's total seconds, 0 for the first session)
df['new'] = g['new'].transform('first')
# replace 0 with missing values and take the row-wise average
df['last'] = df[['new','diff']].replace(0, np.nan).mean(axis=1).fillna(df['new'])

print (df)
    id           timestamp  diff  new  last
0    1 2020-09-01 18:14:35   0.0  0.0   0.0
1    1 2020-09-01 18:14:39   4.0  0.0   4.0
2    1 2020-09-01 18:14:40   5.0  0.0   5.0
3    1 2020-09-01 02:09:22   0.0  5.0   5.0
4    1 2020-09-01 02:09:35  13.0  5.0   9.0
5    1 2020-09-01 02:09:53  31.0  5.0  18.0
6    1 2020-09-01 02:09:57  35.0  5.0  20.0
7    2 2020-09-01 18:14:35   0.0  0.0   0.0
8    2 2020-09-01 18:14:39   4.0  0.0   4.0
9    2 2020-09-01 18:14:40   5.0  0.0   5.0
10   2 2020-09-01 02:09:22   0.0  5.0   5.0
11   2 2020-09-01 02:09:35  13.0  5.0   9.0
12   2 2020-09-01 02:09:53  31.0  5.0  18.0
13   2 2020-09-01 02:09:57  35.0  5.0  20.0
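
If only the requested column is needed, the helper columns can then be dropped and last renamed, e.g.:

# keep only the result column under the requested name
out = df.drop(columns=['diff', 'new']).rename(columns={'last': 'avg_session_time'})
print(out)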
