My system:win7+R-3.0.2.
> Sys.getlocale()
[1] "LC_COLLATE=Chinese (Simplified)_People's Republic of China.936;LC_CTYPE=Chinese
(Simplified)_People's Republic of China.936;LC_MONETARY=Chinese (Simplified)_People's
republic of China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_People's Republic of China.936"
There are two files with same content saved in microsoft notepad: one is saved as ansi format, the other is saved as utf8 format.The data is death name in M370 Malaysia Airlines . Or you can create the file this way.
1)copy the data into microsoft notepad.
乘客姓名,性别,出生日期
HuangTianhui,男,1948/05/28
姜翠云,女,1952/03/27
李红晶,女,1994/12/09
2)save it as test.ansi with ansi format in notepad.
3)save it as test.utf8 with utf-8 format in notepad.
read.table("test.ansi",sep=",",header=TRUE) #can work fine
read.table("test.utf8",sep=",",header=TRUE) #can't work
Then, i set encoding into utf-8.
options(encoding="utf-8")
read.table("test.utf8",sep=",",header=TRUE,encoding="utf-8")
In read.table("test.utf8", sep = ",",header=TRUE,encoding = "utf-8") :
invalid input found on input connection 'test.utf8'
How can I read the data file (test.utf8)?
In python,it is so simple
rfile=open("g:\test.utf8","r",encoding="utf-8").read()
rfile
'ufeff乘客姓名,性别,出生日期
HuangTianhui,男,1948/05/28
姜翠云,女,1952/03
/27
李红晶,女,1994/12/09'
rfile.replace("
","
").replace("ufeff","").splitlines()
['乘客姓名,性别,出生日期', 'HuangTianhui,男,1948/05/28', '姜翠云,女,1952/03/27',
'李红晶,女,1994/12/09']
Python can do such job better than R.
I do as Sathish say, problem solved a little ,still remain some.
I found that when the data is in data.frame ,it can not be displayed properly,
when the data is a column of data.frame ,it can be displayed properly,
strange enough,when the data is a row of data.frame,it can not be displayed properly .
See Question&Answers more detail:os