I have a text file full of non-ASCII characters.
I can not detect the encoding by either file
or enca
.
file non_ascii.txt
non_ascii.txt: Non-ISO extended-ASCII text
enca non_ascii.txt
Unrecognized encoding
But I can open it normally in Windows Notepad++
Edit: The expression above leads misunderstanding. Sorry for this. In fact, I picked some parts of the original file and put them into new text file, then opened in notepad++.
The 2 parts shows as below. They are decoded in 2 different ways by notepad++.
Question:
- How could I detect the files encoding under linux?
- how do I recover the characters represented by
<F1><EE><E9><E4><FF>
? I couldn't get result by "grep 'сойдя' win.txt" even though the "сойдя" is encoded into<F1><EE><E9><E4><FF>
?
The file content slice as follows:
less non_ascii.txt
"non_ascii.txt" may be a binary file. See it anyway?
<F1><EE><E9><E4><FF>
<F2><F0><E0><EA><F2><EE><E2><E0><F2><FC><F1><FF>
<D0><F2><E9><E4><D7><E9><E7><E1><EC><E1><F3><F8>
<D1><E5><EA><F3><ED><E4>
<F0><E0><E7><E3><F0><F3><E7><EA><E8>
<EF><EE><E4><F1><F2><E0><E2><EB><FF><F2><FC>
<F0><E0><E7><E3><F0><F3><E7><EA><E5>
<F1><EE><E9><E4><F3>
<F0><E0><E7><E3><F0><F3><E7><EA><E0>
<F1><EE><E2><EB><E0><E4><E0><EB><E8>
<C1><D7><E9><E1><F0><EF><FE><F4><E1>
<CB><C1><D3><D3><C9><D4><C5><D2><C9><D4>
<F1><EE><E2><EB><E0><E4><E0><EB><EE>
<F1><EE><E9><E4><E8>
<F1><EE><E2><EB><E0><E4><E0><EB><E0>
question from:https://stackoverflow.com/questions/65901320/how-to-check-s3-file-encoding