Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

In Python 2.7's documentation, three rules about Unicode are described as follows:

If the code point is <128, it’s represented by the corresponding byte value.

If the code point is between 128 and 0x7ff, it’s turned into two byte values between 128 and 255.

Code points >0x7ff are turned into three- or four-byte sequences, where each byte of the sequence is between 128 and 255.

Then I made some tests about it:

>>> unichr(40960)

u'\ua000'

>>> ord(u'\ua000')

40960

In my view, 40960 is a code point > 0x7ff, so it should be turned into a three- or four-byte sequence in which each byte is between 128 and 255. But it seems to be turned into only a two-byte sequence, and the value '00' in u'\ua000' is lower than 128, which does not match the rules quoted above. Why?

What's more, I found other Unicode characters, such as u'\u1234'. The values in it ("12" and "34") are also lower than 128, but according to the rules quoted above they shouldn't be. Is there some other rule that I missed?

Thanks for all answers.


1 Answer

"In Python 2.7's documentation, three rules about Unicode are described as follows:"

That is a description of the UTF-8 encoding.

"Then I made some tests about it:"

\ua000 is an escape sequence representing a single Unicode character. The a000 is the hexadecimal representation of that character's numerical code point (0xa000 is 40960). It has nothing to do with the UTF-8 encoding.
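As a quick check of this point, the following sketch shows that the digits after \u are just the code point written in hexadecimal. It assumes Python 2.7 or Python 3.3+ (both accept the u'' prefix); the variable names are illustrative only.

```python
# The escape \ua000 names ONE character whose code point is hex A000.
ch = u'\ua000'

code_point = ord(ch)                 # numeric code point of the character
as_hex = format(code_point, '04x')   # the same number written in hexadecimal

print(len(ch))       # 1: a single character, not a two-byte sequence
print(code_point)    # 40960
print(as_hex)        # a000
```

So the "a0" and "00" in the escape are not bytes at all; they are hex digits of the number 40960.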

You get UTF-8 bytes only when you explicitly encode a unicode string using the UTF-8 encoding, for example with u'\ua000'.encode('utf-8').
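To see the three rules from the documentation in action, here is a minimal sketch that encodes a few characters explicitly and lists the resulting byte values. The helper name utf8_byte_values is hypothetical, chosen for this illustration; it relies only on the built-in encode method and bytearray, and should behave the same on Python 2.7 and Python 3.

```python
def utf8_byte_values(text):
    """Return the UTF-8 encoding of text as a list of integer byte values."""
    # bytearray yields integers on both Python 2.7 and Python 3.
    return list(bytearray(text.encode('utf-8')))

# One character from each range described in the documentation,
# plus the two characters from the question.
for ch in (u'A', u'\u00e9', u'\ua000', u'\u1234'):
    print('U+%04X -> %s' % (ord(ch), utf8_byte_values(ch)))

# U+0041 -> [65]
# U+00E9 -> [195, 169]
# U+A000 -> [234, 128, 128]
# U+1234 -> [225, 136, 180]
```

Both u'\ua000' and u'\u1234' become three-byte sequences in which every byte is between 128 and 255, exactly as the rule for code points above 0x7ff says.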

