c++ - Is the u8 string literal necessary in C++11

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

From Wikipedia:

For the purpose of enhancing support for Unicode in C++ compilers, the definition of the type char has been modified to be at least the size necessary to store an eight-bit coding of UTF-8.

I'm wondering what exactly this means for writing portable applications. Is there any difference between writing this

const char str[] = "Test String";

or this?

const char str[] = u8"Test String";

Is there any reason not to use the latter for every string literal in your code?

What happens when there are non-ASCII characters inside the test string?

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:57:52+0000

The encoding of "Test String" is the implementation-defined execution character set (the narrow, possibly multibyte system encoding).

The encoding of u8"Test String" is always UTF-8.

The examples aren't terribly telling, since every character in them is plain ASCII. If you included some universal character names (such as \U0010FFFF) in the string, then in the u8 literal you would always get exactly those code points, encoded as UTF-8. Whether they can be expressed in the system-encoded string at all, and if so what their values would be, is implementation-defined.

If it helps, imagine you're authoring the source code on an EBCDIC machine. Then the text "Test String" is EBCDIC-encoded in the source file itself in both cases, but the array initialized from the u8 literal contains UTF-8 encoded values, whereas the array initialized from the plain literal contains EBCDIC-encoded values.
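
To make the difference concrete, here is a minimal sketch that dumps the bytes stored in a string literal. The names hex_bytes and check_u8_is_utf8 are made up for this illustration and are not part of any library. The helper is templated on the element type because a u8 literal has type const char[] in C++11 through C++17 but const char8_t[] from C++20 on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render the bytes of a string literal as uppercase hex,
// separated by spaces, excluding the terminating NUL.
// CharT is deduced as char for ordinary literals (and for u8 literals before
// C++20) and as char8_t for u8 literals in C++20 and later.
template <typename CharT, std::size_t N>
std::string hex_bytes(const CharT (&literal)[N]) {
    static_assert(sizeof(CharT) == 1, "expects a narrow or UTF-8 literal");
    std::string out;
    char buf[4];  // two hex digits, one space, terminating NUL
    for (std::size_t i = 0; i + 1 < N; ++i) {
        std::snprintf(buf, sizeof buf, "%02X ",
                      static_cast<unsigned>(static_cast<unsigned char>(literal[i])));
        out += buf;
    }
    if (!out.empty()) out.pop_back();  // drop the trailing space
    return out;
}

// Hypothetical self-check: a u8 literal is UTF-8 on every conforming
// implementation, whatever the execution character set is.
inline void check_u8_is_utf8() {
    assert(hex_bytes(u8"\u00E9") == "C3 A9");             // U+00E9, two bytes
    assert(hex_bytes(u8"\U0010FFFF") == "F4 8F BF BF");   // U+10FFFF, four bytes
}
```

Calling hex_bytes(u8"Test String") yields the ASCII-compatible UTF-8 bytes on any platform, whereas hex_bytes("Test String") yields the same bytes only where the execution character set is ASCII-based. On an EBCDIC implementation the second call would report different values, and hex_bytes("\U0010FFFF") has an implementation-defined result everywhere.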
