I am reading a byte sequence from a stream. Assume, for the sake of argument, that the sequence has a fixed length and that I read the whole thing into a byte array (in my case it's a vector<char>,
but that's not important for this question). This byte sequence contains a string, which may be encoded in either UTF-16 or UTF-8. Unfortunately, there's no indicator of which one it is.
I can verify whether the byte sequence is valid UTF-16 and also whether it is valid UTF-8, but I can also imagine how the same sequence of bytes could be valid UTF-8 and valid UTF-16 at the same time.
So, does that mean there's no way to generically figure out which one it is?
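To make the ambiguity concrete, here is a minimal sketch of the two validity checks I have in mind. The function names are hypothetical, and the UTF-16 check assumes little-endian byte order with no BOM, which the real data may not satisfy.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: strict UTF-8 well-formedness check
// (rejects overlong forms, surrogates, and values above U+10FFFF).
bool is_valid_utf8(const std::vector<char>& bytes) {
    static const std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0, n = bytes.size();
    while (i < n) {
        std::uint8_t b0 = static_cast<std::uint8_t>(bytes[i]);
        std::size_t len;
        std::uint32_t cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;                       // stray continuation or invalid lead byte
        if (i + len > n) return false;           // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            std::uint8_t b = static_cast<std::uint8_t>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp[len]) return false;              // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate code point
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}

// Reads one 16-bit code unit at byte offset i, assuming little-endian order.
static std::uint16_t utf16le_unit(const std::vector<char>& bytes, std::size_t i) {
    return static_cast<std::uint16_t>(
        static_cast<std::uint8_t>(bytes[i]) |
        (static_cast<std::uint8_t>(bytes[i + 1]) << 8));
}

// Hypothetical helper: UTF-16 check, ASSUMING little-endian and no BOM.
bool is_valid_utf16le(const std::vector<char>& bytes) {
    if (bytes.size() % 2 != 0) return false;     // must be whole code units
    std::size_t i = 0, n = bytes.size();
    while (i < n) {
        std::uint16_t u = utf16le_unit(bytes, i);
        i += 2;
        if (u >= 0xD800 && u <= 0xDBFF) {        // high surrogate needs a low one
            if (i >= n) return false;
            std::uint16_t lo = utf16le_unit(bytes, i);
            if (lo < 0xDC00 || lo > 0xDFFF) return false;
            i += 2;
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return false;                        // unpaired low surrogate
        }
    }
    return true;
}
```

For example, the two bytes 0x61 0x62 pass both checks: as UTF-8 they are the string "ab", and as little-endian UTF-16 they are the single code unit 0x6261, which falls in the CJK Unified Ideographs block. So validity alone cannot tell the two encodings apart for this input.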