.net - Hexadecimal value 0x00 is an invalid character

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I am generating an XML document from a StringBuilder, basically something like:

string.Format("<text><row>{0}</row><col>{1}</col><textHeight>{2}</textHeight><textWidth>{3}</textWidth><data>{4}</data><rotation>{5}</rotation></text>

Later, something like:

XmlDocument document = new XmlDocument();
document.LoadXml(xml);
XmlNodeList labelSetNodes = document.GetElementsByTagName("labels");
for (int index = 0; index < labelSetNodes.Count; index++)
{
    //do something
}

All the data comes from a database. Recently I've had a few issues with the error:

Hexadecimal value 0x00 is an invalid character, line 1, position nnnnn

But it's not consistent. Sometimes 'blank' data works, and the 'faulty' data works on some PCs but not on others.

In the database, the data is always a blank string. It is never NULL, and in the XML it comes out as <data></data>, i.e. with nothing between the opening and closing tags (though I'm not sure this can be relied on, since I'm copying it from the Immediate window in Visual Studio and pasting it into TextPad).

There may also be differences in SQL Server version (it fails on 2008 and works on 2005) and in collation. Could either of these be the cause?

But exactly the same code and data will sometimes fail. Any ideas where the problem lies?


1 Answer

深蓝 · Answer 1 · 2021-10-23T17:52:36+0000

Without your actual data or source, it will be hard for us to diagnose what is going wrong. However, I can make a few suggestions:

Unicode NUL (0x00) is illegal in every version of XML, and conforming parsers (validating or not) must reject input that contains it.
Despite the above, real-world XML that has never been checked can contain any kind of ill-formed bytes imaginable.
XML 1.1 also permits control characters other than NUL (written as character references), and any file can contain zero-width or nonprinting characters, or in broken files raw NULs, that a text editor will not show. You cannot look at an XML file in a text editor and know what characters it contains.
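To confirm that a single raw NUL is enough to trigger this error, here is a minimal sketch that parses markup with the same XmlDocument.LoadXml call the question uses. The NulRepro class and TryLoad method are hypothetical names for illustration, not part of .NET.

```csharp
using System.Xml;

// Hypothetical helper: parses markup with XmlDocument.LoadXml, the call used
// in the question, and reports the XmlException instead of throwing it.
public static class NulRepro
{
    // Returns null when the markup parses, otherwise the XmlException raised.
    public static XmlException TryLoad(string xml)
    {
        var document = new XmlDocument();
        try
        {
            document.LoadXml(xml);
            return null;
        }
        catch (XmlException ex)
        {
            // For a raw U+0000 the message should mention "hexadecimal value 0x00",
            // with LineNumber/LinePosition pointing near the offending character.
            return ex;
        }
    }
}
```

TryLoad("<data></data>") should return null, while TryLoad("<data>\0</data>") should return an XmlException. XmlConvert.IsXmlChar('\0') likewise returns false.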

Given what you wrote, I suspect whatever converts the database data to XML is broken; it's propagating non-XML characters.
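One way to make the converter refuse to propagate bad characters is to build the XML with XmlWriter instead of string.Format: XmlWriter escapes markup characters in the data and, with CheckCharacters left at its default of true, throws an ArgumentException when it meets an illegal character such as NUL. This is a sketch only; the element names come from the question, but TextXmlBuilder, BuildText and its parameters are hypothetical.

```csharp
using System.IO;
using System.Xml;

// Sketch: builds the question's <text> element with XmlWriter rather than
// string.Format. Class, method and parameter names are hypothetical.
public static class TextXmlBuilder
{
    public static string BuildText(int row, int col, int textHeight, int textWidth,
                                   string data, int rotation)
    {
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true, // emit only the element, like the original string
            CheckCharacters = true     // the default: illegal characters throw ArgumentException
        };
        using (var output = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(output, settings))
            {
                writer.WriteStartElement("text");
                // XmlConvert.ToString formats numbers independently of the current culture.
                writer.WriteElementString("row", XmlConvert.ToString(row));
                writer.WriteElementString("col", XmlConvert.ToString(col));
                writer.WriteElementString("textHeight", XmlConvert.ToString(textHeight));
                writer.WriteElementString("textWidth", XmlConvert.ToString(textWidth));
                // The database value is escaped (e.g. "<" becomes "&lt;") and checked here.
                writer.WriteElementString("data", data ?? string.Empty);
                writer.WriteElementString("rotation", XmlConvert.ToString(rotation));
                writer.WriteEndElement();
            }
            return output.ToString();
        }
    }
}
```

With this approach a NUL coming out of the database fails at the point where it is written, which is much easier to trace than a parse error at "position nnnnn" later on.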

Create some database entries containing non-XML characters (NUL and the other C0 control characters besides tab, line feed, and carriage return) and run your XML converter on them. Write the XML to a file and look at it in a hex editor. If it contains non-XML characters, your converter is broken. Fix it or, if you cannot, add a preprocessing step that rejects output containing such characters.
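A preprocessing step like that could be sketched as follows, assuming the XML is available as a .NET string. It relies on XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair to decide what is legal; the XmlCharScanner class and its methods are hypothetical names.

```csharp
using System;
using System.Xml;

// Hypothetical pre-flight check: finds or rejects characters that are not
// legal XML characters according to XmlConvert.
public static class XmlCharScanner
{
    // Returns the index of the first illegal character, or -1 if all are legal.
    public static int FindFirstInvalidXmlChar(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            if (XmlConvert.IsXmlChar(text[i]))
                continue;
            // IsXmlChar rejects surrogates, so accept a valid high/low pair here.
            // Note the parameter order: IsXmlSurrogatePair(lowChar, highChar).
            if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], text[i]))
            {
                i++; // skip the low surrogate as well
                continue;
            }
            return i;
        }
        return -1;
    }

    // Passes the XML through unchanged, or throws if it contains an illegal character.
    public static string RejectInvalid(string xml)
    {
        int index = FindFirstInvalidXmlChar(xml);
        if (index >= 0)
        {
            throw new ArgumentException(
                $"Illegal XML character U+{(int)xml[index]:X4} at index {index}.", nameof(xml));
        }
        return xml;
    }
}
```

Rejecting rather than silently stripping keeps the bad data visible, so you can go back and find which database rows contain it.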

If the converter output looks good, the problem is in your XML consumer; it's inserting non-XML characters somewhere. You will have to break your consumption process into separate steps, examine the output at each step, and narrow down what is introducing the bad characters.
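To compare the output of each step without going through the clipboard and a text editor, a small hypothetical helper can print the UTF-16 code units around a suspicious position in hex. The name HexWindow and the default radius are illustrative choices.

```csharp
using System;
using System.Text;

// Hypothetical diagnostic: shows the UTF-16 code units around a position as hex,
// so invisible characters such as U+0000 show up as "0000".
public static class XmlStageDump
{
    public static string HexWindow(string text, int index, int radius = 8)
    {
        int start = Math.Max(0, index - radius);
        int end = Math.Min(text.Length, index + radius + 1); // exclusive upper bound
        var sb = new StringBuilder();
        for (int i = start; i < end; i++)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append(((int)text[i]).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

Assuming the document is a single line (the error says "line 1") and that XmlException.LinePosition is 1-based, HexWindow(xml, ex.LinePosition - 1) should land on or next to the offending character; logging it after each stage shows where the 0000 first appears.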

Check file encoding (for UTF-16)

Update: I just ran into an example of this myself! The producer was encoding the XML as UTF-16 while the consumer expected UTF-8. Since UTF-16 uses 0x00 as the high byte of every ASCII character and UTF-8 doesn't, the consumer saw every second byte as a NUL. In my case I could change the encoding, but I also suggested that every XML payload start with a byte-order mark (BOM) so the consumer can detect the encoding.
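The mismatch is easy to reproduce in .NET: encode a document as UTF-16 with a BOM, then decode the bytes as UTF-8, and NUL characters appear between the ASCII ones. Handing the raw bytes to XmlDocument.Load(Stream) instead lets the parser detect the encoding from the BOM. This is a sketch; the EncodingMismatchDemo class and its method names are hypothetical.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

// Hypothetical demo of the UTF-16 producer / UTF-8 consumer mismatch.
public static class EncodingMismatchDemo
{
    // UTF-16LE bytes preceded by a BOM (FF FE), as a UTF-16 producer might emit them.
    public static byte[] Utf16WithBom(string xml) =>
        Encoding.Unicode.GetPreamble().Concat(Encoding.Unicode.GetBytes(xml)).ToArray();

    // Wrong: decoding UTF-16 bytes as UTF-8 turns each 0x00 high byte into U+0000.
    public static string DecodeAsUtf8(byte[] bytes) => Encoding.UTF8.GetString(bytes);

    // Better: give the parser the raw bytes so it can detect the encoding from the BOM.
    public static XmlDocument LoadFromBytes(byte[] bytes)
    {
        var document = new XmlDocument();
        using (var stream = new MemoryStream(bytes))
        {
            document.Load(stream);
        }
        return document;
    }
}
```

Passing the result of DecodeAsUtf8 to LoadXml fails, while LoadFromBytes on the same bytes parses normally.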
