Windows 10 Notepad no longer requires Unicode files to start with a BOM header, and by default it no longer writes one. This breaks existing code that checks for the header to decide whether a file is Unicode. How can I now tell in C++ whether a file is Unicode? Source: https://www.bleepingcomputer.com/news/microsoft/windows-10-notepad-is-getting-better-utf-8-encoding-support/
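The code we currently use to detect Unicode:
#include <windows.h> // for BYTE

// Inspects the first three bytes of a file for a byte order mark (BOM).
// Returns 1 for UTF-8, 2 for UTF-16 (BE), 3 for UTF-16 (LE), 0 if no BOM is found.
int IsUnicode(const BYTE p2bytes[3])
{
    if (p2bytes[0] == 0xEF && p2bytes[1] == 0xBB && p2bytes[2] == 0xBF)
        return 1; // UTF-8
    if (p2bytes[0] == 0xFE && p2bytes[1] == 0xFF)
        return 2; // UTF-16 (BE)
    if (p2bytes[0] == 0xFF && p2bytes[1] == 0xFE)
        return 3; // UTF-16 (LE)
    return 0; // no BOM: encoding unknown
}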
If this is such a pain, why isn't there a standard function to determine the encoding?
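With no BOM to rely on, one common fallback is to check whether the bytes form well-formed UTF-8, since text in a legacy ANSI code page that contains non-ASCII characters rarely passes that check by accident. Below is a minimal sketch of that heuristic, not a definitive detector. `LooksLikeUtf8` is a hypothetical helper name (it is not part of the code above), and it assumes the caller has already read the file, or a leading chunk of it, into a buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if the buffer is well-formed UTF-8.
// Pure ASCII also returns true, because ASCII is a subset of UTF-8.
bool LooksLikeUtf8(const unsigned char* data, std::size_t len)
{
    std::size_t i = 0;
    while (i < len) {
        const unsigned char lead = data[i];
        if (lead < 0x80) { ++i; continue; }  // single-byte (ASCII) character

        std::size_t extra = 0;   // number of continuation bytes expected
        std::uint32_t cp = 0;    // code point being decoded
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return false;       // stray continuation byte or invalid lead byte

        if (len - i <= extra) return false;  // sequence cut off at end of buffer

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = data[i + k];
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3Fu);
        }

        // Reject overlong encodings, surrogates, and values above U+10FFFF.
        static const std::uint32_t kMin[] = {0, 0x80, 0x800, 0x10000};
        if (cp < kMin[extra]) return false;
        if (cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;

        i += extra + 1;
    }
    return true;
}
```

A caller might try `IsUnicode` first and fall back to `LooksLikeUtf8` only when it returns 0. Two caveats apply to this sketch: if only a prefix of the file is checked, a multi-byte sequence cut off at the end of the buffer is reported as invalid, and BOM-less UTF-16 is not detected (ASCII-range text stored as UTF-16 consists of bytes below 0x80, including NULs, so it would pass; a separate check for embedded zero bytes may be needed).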
question from:https://stackoverflow.com/questions/65933277/detecting-unicode-in-files-in-windows-10