Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

Here are my 2 data frames: df1

eid start_dt   end_dt     flag
1   2020-12-01 2020-12-07 0
1   2020-12-08 2020-12-15 0
1   2020-12-16 2020-12-23 1
2   2020-12-01 2020-12-07 0

df2

eid event_dt   col1 col2
1   2020-12-01 .    .
1   2020-12-09 .    .
1   2020-12-17 .    .
2   2020-12-02 .    .

output df.

- If in df1 and df2,  the eids match AND event_dt is between start_dt,end_dt
 -- add a new column 
 -- update the flag

The output data frame df would look like this

eid event_dt   col1 col2 flag
1   2020-12-01 .    .    0
1   2020-12-09 .    .    0
1   2020-12-17 .    .    1
2   2020-12-02 .    .    0

How would I go about doing this?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
217 views
Welcome To Ask or Share your Answers For Others

1 Answer

Try merge and query:

df2['flag'] = (df1.assign(idx=df1.index)
    .merge(df2, on='eid', how='left')
    .query('start_dt <= event_dt <= end_dt')
    .set_index('idx')
    ['flag']
)

Output:

   eid    event_dt col1 col2  flag
0    1  2020-12-01    .    .     0
1    1  2020-12-09    .    .     0
2    1  2020-12-17    .    .     1
3    2  2020-12-02    .    .     0

Update: For the bigger dataset, the above approach may yield a MemoryError. Use pd.merge_asof instead:

df2['flag'] = (pd.merge_asof(df2.sort_values('event_dt'),
                             df1.assign(idx=df1.index).sort_values('end_dt'),
                             by='eid', left_on='event_dt',
                             right_on='start_dt')
               .query('event_dt<=end_dt')
               .set_index('idx')
               ['flag']
               )

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...