According to the following table for the ISO-8859-1 standard, there seems to be an entity name and an entity number associated with each reserved HTML character.
So, for example, for the character é:
Entity Name: &eacute;
Entity Number: &#233;
Similarly, for the character >:
Entity Name: &gt;
Entity Number: &#62;
For a given string, the HttpUtility.HtmlEncode method returns an HTML-encoded string, but I can't figure out how it works. Here is what I mean:
Console.WriteLine(HttpUtility.HtmlEncode("é>"));
// Outputs &#233;&gt;
It seems to be using the entity number for the é character but the entity name for the > character.
So does the HtmlEncode method really work with the ISO-8859-1 standard? If it does, is there a reason why it sometimes uses the entity name and other times the entity number? More importantly, can I force it to give me the entity name reliably?
EDIT: Thanks for the answers, guys. I cannot decode the string before I perform the search, though. Without getting into too many details, the text is stored in a SharePoint List and the "search" is done by SharePoint itself (using a CAML query), so decoding first is not an option.
I'm trying to think of a way to convert the entity numbers into names. Is there a function in .NET that does that? Or does anyone have another idea?
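To make the idea concrete, here is a rough sketch of the kind of conversion I have in mind. It is not a built-in .NET function: the EntityNames class, the NumbersToNames method and the lookup table are all hypothetical names of my own, and the table is deliberately partial. It only rewrites decimal references such as &#233; and leaves anything it does not recognise untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET framework.
static class EntityNames
{
    // Assumption: a hand-maintained, partial table mapping a code point
    // to its HTML 4 entity name. A real table would need every entity
    // that can appear in the stored text.
    static readonly Dictionary<int, string> Names = new Dictionary<int, string>
    {
        { 224, "agrave" },
        { 231, "ccedil" },
        { 232, "egrave" },
        { 233, "eacute" },
    };

    // Replaces decimal references (&#NNN;) with named ones where the
    // table has an entry; everything else is returned unchanged.
    public static string NumbersToNames(string encoded)
    {
        return Regex.Replace(encoded, "&#([0-9]+);", m =>
        {
            int codePoint;
            string name;
            if (int.TryParse(m.Groups[1].Value, out codePoint)
                && Names.TryGetValue(codePoint, out name))
            {
                return "&" + name + ";";
            }
            return m.Value; // unknown code point: keep the numeric form
        });
    }
}

static class Program
{
    static void Main()
    {
        // Input is the string HtmlEncode produced in the example above.
        Console.WriteLine(EntityNames.NumbersToNames("&#233;&gt;"));
        // Prints &eacute;&gt;
    }
}
```

The obvious drawback is that the table has to be maintained by hand, which is why I would prefer something built in if it exists.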