I have this helper function that gets rid of control characters in XML text:
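import unicodedata

def remove_control_characters(s):
    # Replace control characters in XML text with a space,
    # and drop commas and double quotation marks entirely.
    t = ""
    for ch in s:
        if unicodedata.category(ch)[0] == "C":
            t += " "   # any "C*" category (Cc, Cf, ...) becomes a space
        elif ch == "," or ch == '"':
            t += ""    # commas and double quotes are removed
        else:
            t += ch
    # Control characters were already replaced above, so no second pass is needed.
    return t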
I would like to know whether there is a Unicode category I could use to exclude quotation marks and commas in the same way, instead of comparing against each character explicitly.
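For context, here is a quick check of what `unicodedata.category` reports for these characters. As far as I can tell, the comma and the ASCII double quote both fall under `Po` ("Punctuation, other"), which they share with characters such as `!` and `.`, so filtering on that category would remove more than intended. The sketch below therefore falls back to an explicit set; the names `DROP_CHARS` and `clean_text` are made up for illustration and are not part of any library.

```python
import unicodedata

# Hypothetical set of characters to drop; extend it as needed
# (for example with curly quotes "\u201c" and "\u201d").
DROP_CHARS = {",", '"'}

def clean_text(s):
    """Drop characters in DROP_CHARS and replace "C*" category characters with a space."""
    out = []
    for ch in s:
        if ch in DROP_CHARS:
            continue  # removed entirely
        if unicodedata.category(ch).startswith("C"):
            out.append(" ")  # control/format/etc. characters become a space
        else:
            out.append(ch)
    return "".join(out)

# The comma and the ASCII double quote share a category with other punctuation.
for ch in [",", '"', "!", "\u201c", "\u201d"]:
    print(repr(ch), unicodedata.category(ch))
```

Collecting the pieces in a list and joining once also avoids repeated string concatenation inside the loop.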