Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

For a CSV file encoded as UTF-16LE, reading the data gives me junk characters

To get the file encoding I use the command below:

 file -bi test.csv

It gives me text/plain; charset=utf-16le

To read the file data I use the command below:

head -n1 test.csv | tr '^' ','

It gives me ??colon1,colon2,colon3

Why is it giving me junk characters?



1 Answer

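Your CSV file is encoded as UTF-16LE and starts with a BOM (Byte Order Mark), so its first two bytes are 0xff and 0xfe. Those two bytes are not valid UTF-8, which is why they show up as ?? at the start of the line. You can see them with: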

head -n1 test.csv | xxd
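
If xxd is not installed, od can show the same two bytes. The following is only a sketch: sample.csv is a made-up file built here for illustration, not your test.csv, and it assumes iconv and head -c are available on your system.

```shell
# Build a small sample file (hypothetical name): the BOM bytes ff fe
# followed by one header line encoded as UTF-16LE.
{ printf '\377\376'; printf 'colon1^colon2^colon3\n' | iconv -f UTF-8 -t UTF-16LE; } > sample.csv

# Print the first two bytes as hex with the spaces and newline removed.
bom=$(head -c 2 sample.csv | od -An -tx1 | tr -d ' \n')
echo "$bom"
```

If the file starts with a UTF-16LE BOM, this should print fffe.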

UTF-8 is the most common encoding now, and UTF-16 is used less and less (even on Windows). Your locale most likely defaults to UTF-8 as well, so head, tr and your terminal treat the UTF-16 bytes as if they were UTF-8. Convert the file first:

iconv -f UTF-16LE -t UTF-8 test.csv | head -n1 | tr '^' ','

This converts the CSV file to UTF-8 before head and tr process it.
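
One caveat: with -f UTF-16LE, GNU iconv typically does not treat the BOM specially, so it can come through as an invisible U+FEFF character at the start of the first line. A minimal sketch that skips the two BOM bytes before converting, assuming the file really does start with ff fe (sample.csv is again a made-up file for illustration):

```shell
# Sample input (hypothetical name): BOM followed by UTF-16LE text.
{ printf '\377\376'; printf 'colon1^colon2^colon3\n' | iconv -f UTF-8 -t UTF-16LE; } > sample.csv

# tail -c +3 starts output at the third byte, dropping the 2-byte BOM.
# Only do this when the file is known to begin with ff fe.
header=$(tail -c +3 sample.csv | iconv -f UTF-16LE -t UTF-8 | head -n1 | tr '^' ',')
echo "$header"
```

For the sample file this should print colon1,colon2,colon3 with no leading junk. If a file has no BOM, leave out the tail step, otherwise the first real character is lost.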

