Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I am sure this question has already been asked, so forgive me for the duplicate.

Python's chr() function returns the one-character string for a single Unicode ordinal (code point). How can I convert a string of concatenated ordinals back into a string? For example:
john:
j - 106
o - 111
h - 104
n - 110

The full string of ordinals is: 106111104110

My current method is:

from textwrap import wrap
ct = "106111104110"  # concatenated ordinals
Split = wrap(ct, 3)  # split into a list of three-character chunks
inInt = list(map(int, Split))  # convert the list of strings into a list of ints
answer = ''.join([chr(num) for num in inInt])  # convert each ordinal to its character and join
print(answer)

The above works correctly, printing "john".

However, this does not work when a character's ordinal has fewer than three digits, that is, when it is less than 100. For example:
apple:
a - 97
p - 112
p - 112
l - 108
e - 101

The full string of ordinals is: 97112112108101

However doing:

ct="97112112108101"
Split = wrap(ct,3) 
inInt = list(map(int, Split)) 
answer=''.join([chr(num) for num in inInt]) 
print(answer)

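will print ϋyyQ followed by a control character, because the ordinal of a is 97, which has only two digits, so the three-character chunks become 971, 121, 121, 081 and 01. I would like not to be restricted to characters with ordinals of 100 or more.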

Is there a python library that has the functionality I am looking for? Many thanks in advance.

Question from: https://stackoverflow.com/questions/65602239/is-it-possible-to-chr-a-string

1 Answer

Unicode code points can be up to six hexadecimal digits or seven decimal digits (the largest is 0x10FFFF, or 1114111), so you could pad each one with leading zeros to a fixed width:

>>> ''.join(format(ord(x),'06x') for x in 'john')
'00006a00006f00006800006e'
>>> ''.join(chr(int(_[i:i+6],16)) for i in range(0,len(_),6))  # _ gets previous result from REPL.
'john'
>>> ''.join(format(ord(x),'06x') for x in '你好吗')
'004f6000597d005417'
>>> ''.join(chr(int(_[i:i+6],16)) for i in range(0,len(_),6))
'你好吗'
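
For the decimal format used in the question, the same zero-padding idea might be sketched as below. The helper names `encode_ordinals` and `decode_ordinals` and the `WIDTH` constant are made up for illustration; a width of seven is an assumption based on the largest code point, 1114111, having seven decimal digits.

```python
WIDTH = 7  # assumption: seven decimal digits cover the largest code point, 1114111

def encode_ordinals(text, width=WIDTH):
    """Concatenate each character's code point as a zero-padded decimal field."""
    return ''.join(format(ord(ch), f'0{width}d') for ch in text)

def decode_ordinals(digits, width=WIDTH):
    """Split the digit string into fixed-width fields and convert each with chr()."""
    if len(digits) % width:
        raise ValueError(f'length must be a multiple of {width}')
    return ''.join(chr(int(digits[i:i + width]))
                   for i in range(0, len(digits), width))

encoded = encode_ordinals('apple')
print(encoded)
print(decode_ordinals(encoded))  # apple
```

A narrower width would also work if the input is known to stay within a smaller range, at the cost of failing on characters outside it.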

However, text is more typically serialized as bytes, so you could encode to UTF-8 first and then use the bytes methods to get two hex digits per byte:

>>> 'apple'.encode('utf8').hex()
'6170706c65'
>>> bytes.fromhex(_).decode()
'apple'
>>> '你好吗'.encode('utf8').hex()
'e4bda0e5a5bde59097'
>>> bytes.fromhex(_).decode('utf8')
'你好吗'
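
If the digits need to stay decimal and three wide, as in the question's wrap(ct, 3) approach, one possible variation is to write each UTF-8 byte as three decimal digits, since a byte never exceeds 255. This is only a sketch; `encode_bytes_decimal` and `decode_bytes_decimal` are hypothetical names, not part of any library.

```python
def encode_bytes_decimal(text):
    """Encode to UTF-8, then write each byte (0-255) as three decimal digits."""
    return ''.join(format(b, '03d') for b in text.encode('utf8'))

def decode_bytes_decimal(digits):
    """Read three digits at a time back into bytes, then decode as UTF-8."""
    if len(digits) % 3:
        raise ValueError('length must be a multiple of 3')
    raw = bytes(int(digits[i:i + 3]) for i in range(0, len(digits), 3))
    return raw.decode('utf8')

encoded = encode_bytes_decimal('apple')
print(encoded)                        # 097112112108101
print(decode_bytes_decimal(encoded))  # apple
```

With this layout the question's original string for john, 106111104110, decodes unchanged, while apple gains a leading zero for the two-digit ordinal.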
