Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I'm trying to create categories based on multiple columns in pandas, but it is taking forever to run, so I'm not sure it is correct. I left it for 30 minutes and it was still running, so I stopped it. I'm trying to create a new column based on several other columns (in my actual data there are about 15). However, when I try it on a smaller dataset it is very quick. Any suggestions?

other_cols = ['col1', 'col2', 'col3', 'col4', 'col5']


def labels(row):
    if ((row['col 6'] > 1) & (row[other_cols] < 1)).all():
        return 'Yes'
    if ((row['col 6'] >1) & (row['col 7'] >1) & (row[other_cols] <1)).all():
        return 'Maybe'
    if ((row['col 6'] <1) & (row['col 7']>1) & (row[other_cols] <1)).all():
        return 'no'

df['category'] = df.apply(lambda row: labels(row), axis=1)
question from:https://stackoverflow.com/questions/65940888/creating-new-column-based-on-multiple-columns-pandas


1 Answer

You could try something like this:

other_cols = ['col1', 'col2', 'col3', 'col4', 'col5']


def labels(row):
    # Write the label into the row itself and hand the row back to apply
    if ((row['col 6'] > 1) & (row[other_cols] < 1)).all():
        row['category'] = 'Yes'
    elif ((row['col 6'] > 1) & (row['col 7'] > 1) & (row[other_cols] < 1)).all():
        row['category'] = 'Maybe'
    elif ((row['col 6'] < 1) & (row['col 7'] > 1) & (row[other_cols] < 1)).all():
        row['category'] = 'no'
    else:
        row['category'] = ''
    return row

df = df.apply(labels, axis=1)
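
If the row-wise apply is what makes this slow, one option that may help is to build the conditions on whole columns and let numpy.select pick the label. The following is only a sketch under assumptions: the small DataFrame is made up for illustration, and the helper names (others_small, c6_big, c6_small, c7_big) are not from the question. The conditions are kept in the same order as in the question, and numpy.select takes the first one that matches, so a row that satisfies the 'Maybe' condition also satisfies 'Yes' and is labelled 'Yes'. If 'Maybe' is meant to win, it would need to be listed first.

```python
import numpy as np
import pandas as pd

other_cols = ['col1', 'col2', 'col3', 'col4', 'col5']

# Made-up data, for illustration only
df = pd.DataFrame({
    'col1': [0, 0, 0, 2],
    'col2': [0, 0, 0, 0],
    'col3': [0, 0, 0, 0],
    'col4': [0, 0, 0, 0],
    'col5': [0, 0, 0, 0],
    'col 6': [2, 2, 0, 2],
    'col 7': [0, 2, 2, 2],
})

# One boolean Series per building block, computed over all rows at once
others_small = (df[other_cols] < 1).all(axis=1)
c6_big = df['col 6'] > 1
c6_small = df['col 6'] < 1
c7_big = df['col 7'] > 1

# Same order as in the question; the first matching condition wins
conditions = [
    c6_big & others_small,
    c6_big & c7_big & others_small,
    c6_small & c7_big & others_small,
]
choices = ['Yes', 'Maybe', 'no']

# Rows that match none of the conditions get an empty string
df['category'] = np.select(conditions, choices, default='')
print(df['category'].tolist())
```

This avoids calling a Python function once per row, which is typically where apply with axis=1 spends its time on large frames.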

What is the size of your dataset?

I'm sorry, I cannot comment yet; I am still new here.

