Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I'm trying to convert a string column in a dataframe to int. Each distinct string should be replaced with an integer key.

Data:

user_id site_id 
100     url1.com 
100     url2.com 
100     url1.com 
101     url2.com 
101     url2.com 
101     url2.com

Wanted output:

user_id site_id 
100     1 
100     2 
100     1 
101     2 
101     2 
101     2

I tried to get all unique urls with:

names = pd.unique(df.site_id.ravel()) 
urls = pd.Series(np.arange(len(names)), names) 

and then

df["site_id"] = df.applymap(urls.get)


1 Answer

You want pd.factorize, which encodes each distinct value as an integer:

In [52]:
df['site_id'] = pd.factorize(df['site_id'])[0] + 1
df

Out[52]:
   user_id  site_id
0      100        1
1      100        2
2      100        1
3      101        2
4      101        2
5      101        2

Here factorize returns a tuple: an array of integer codes, and an index of the unique values they refer to:

In [53]:
pd.factorize(df['site_id'])

Out[53]:
(array([0, 1, 0, 1, 1, 1], dtype=int64), Int64Index([1, 2], dtype='int64'))
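The output above was produced after site_id had already been overwritten with integers, which is why the unique values show up as [1, 2]. As a minimal sketch, assuming the original string column from the question (the sample frame is rebuilt here for illustration), the tuple can be unpacked so the unique URLs are kept for decoding later:

```python
import pandas as pd

# Sample data rebuilt from the question, for illustration only.
df = pd.DataFrame({
    "user_id": [100, 100, 100, 101, 101, 101],
    "site_id": ["url1.com", "url2.com", "url1.com",
                "url2.com", "url2.com", "url2.com"],
})

# factorize returns (codes, uniques); codes are 0-based positions into uniques,
# assigned in order of first appearance.
codes, uniques = pd.factorize(df["site_id"])

encoded = codes + 1            # shift so the keys start at 1
decoded = uniques.take(codes)  # look the original strings back up from the codes
```

Keeping uniques around is what makes the encoding reversible; if only the codes are stored, the mapping back to URLs is lost.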

We want the encoded values, which are the first element of the tuple, and add 1 to each so the keys start at 1 instead of 0:

pd.factorize(df['site_id'])[0] + 1
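If you would rather keep the lookup-table approach from the question, something along these lines should also work. This is only a sketch under the same sample data (rebuilt here as an assumption): it builds the table with 1-based keys and maps just the site_id column, rather than applying a function over the whole frame.

```python
import pandas as pd

# Sample data rebuilt from the question, for illustration only.
df = pd.DataFrame({
    "user_id": [100, 100, 100, 101, 101, 101],
    "site_id": ["url1.com", "url2.com", "url1.com",
                "url2.com", "url2.com", "url2.com"],
})

# Unique URLs in order of first appearance.
names = df["site_id"].unique()

# Lookup table: URL -> 1-based integer key.
urls = pd.Series(range(1, len(names) + 1), index=names)

# Map only the site_id column; user_id is left untouched.
df["site_id"] = df["site_id"].map(urls)
```

Both routes give the same keys for this data; factorize is shorter, while the explicit table is handy when the same mapping has to be reused on another frame.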
