Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have tried passing the dtype parameter with read_csv as dtype={n: pandas.Categorical} but this does not work properly (the result is an Object). The manual is unclear.

question from:https://stackoverflow.com/questions/30272300/is-it-possible-to-read-categorical-columns-with-pandas-read-csv

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
355 views
Welcome To Ask or Share your Answers For Others

1 Answer

In version 0.19.0 you can use parameter dtype='category' in read_csv:

data = 'col1,col2,col3
a,b,1
a,b,2
c,d,3'
df = pd.read_csv(pd.compat.StringIO(data), dtype='category')
print (df)
  col1 col2 col3
0    a    b    1
1    a    b    2
2    c    d    3

print (df.dtypes)
col1    category
col2    category
col3    category
dtype: object

If want specify column for category use dtype with dictionary:

df = pd.read_csv(pd.compat.StringIO(data), dtype={'col1':'category'})
print (df)
  col1 col2  col3
0    a    b     1
1    a    b     2
2    c    d     3

print (df.dtypes)
col1    category
col2      object
col3       int64
dtype: object

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

548k questions

547k answers

4 comments

86.3k users

...