Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have created a WordPress/WooCommerce plugin that generates an XML file from our products.

However, some rows contain illegal characters, and parsing the file fails with:

error on line 15622 at column 22: Input is not proper UTF-8, indicate encoding !
Bytes: 0x03 0xC3 0xB6 0x73

How can I fix this so that the XML is parsed correctly?


The code that generates the file looks roughly like this:

$dom = new DOMDocument('1.0', 'UTF-8');

// Create the root element
$root = $dom->createElement("termeklista");
$dom->appendChild($root);
$dom->formatOutput = true;

Then a while loop fills in the data. The problem is in the description element:

// DESCRIPTION
$description = $dom->createElement("leiras");
$producta->appendChild($description);
// Create a CDATA section wrapping the excerpt in newlines
$cdata = $dom->createCDATASection("\n" . $loop->post->post_excerpt . "\n");
$description->appendChild($cdata);

I have tried iconv, utf8_encode, and a custom function to replace the bad characters, but I cannot figure out what the issue is.

The WooCommerce product's post excerpt does not contain any visible illegal characters.
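Control characters such as 0x03 are invisible in most editors and in the WordPress admin, so an excerpt can look clean while still containing one. A quick diagnostic, sketched below under the assumption that the excerpt is available as a plain PHP string, is to scan it for the C0 control bytes that XML 1.0 forbids and print their byte offsets. The helper name `find_control_chars` and the sample string are hypothetical, chosen only to reproduce the bytes from the error message.

```php
<?php
// Hypothetical helper: report C0 control characters (other than tab, LF
// and CR, which XML allows) hidden in a string, with their byte offsets.
function find_control_chars(string $text): array
{
    $found = [];
    // This byte class matches the C0 controls that XML 1.0 disallows.
    // No /u modifier is needed: these bytes never occur inside a UTF-8
    // multibyte sequence, whose bytes are all >= 0x80.
    preg_match_all('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $text, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[0] as [$char, $offset]) {
        $found[] = sprintf('0x%02X at byte %d', ord($char), $offset);
    }
    return $found;
}

// Hypothetical sample reproducing the error's bytes: 0x03, "ö" (0xC3 0xB6), "s".
$excerpt = "abc\x03ös";
print_r(find_control_chars($excerpt)); // reports "0x03 at byte 3"
```

Running this over each product's `post_excerpt` would show which products carry the stray byte.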


1 Answer

0x03 (also written ^C, known as ETX or "end of text") is not an allowed character in XML. Production [2] of the XML 1.0 specification permits only these characters:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

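Therefore your data is not well-formed XML, and any conformant XML processor must report an error such as the one you received. The other bytes in the message are harmless: 0xC3 0xB6 is the UTF-8 encoding of "ö" and 0x73 is "s". The culprit is the leading 0x03, which is perfectly valid UTF-8 but forbidden in XML, so re-encoding the text with iconv or utf8_encode cannot remove it.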
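To make the production concrete, here is a direct transcription of it as a PHP predicate on Unicode code points. The function name `is_xml_char` is hypothetical; it simply mirrors the ranges listed above.

```php
<?php
// Transcription of production [2] (Char) from XML 1.0: returns true if
// the Unicode code point $cp may appear anywhere in an XML document.
function is_xml_char(int $cp): bool
{
    return $cp === 0x9
        || $cp === 0xA
        || $cp === 0xD
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

var_dump(is_xml_char(0x03)); // bool(false): ETX is rejected
var_dump(is_xml_char(0xF6)); // bool(true): "ö" (U+00F6) is fine
```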

You must repair the data by treating it as text, not XML, and removing any illegal characters (manually or automatically) before passing it to any XML library.
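A minimal sketch of the automatic approach, assuming the excerpt is UTF-8: strip every code point outside the Char production with a Unicode-aware `preg_replace` before building the CDATA section. The helper `strip_invalid_xml_chars` is a hypothetical name, not a PHP built-in; `$root` stands in for the question's `$producta`, and a literal string stands in for `$loop->post->post_excerpt`.

```php
<?php
// Hypothetical helper: remove every character that XML 1.0's Char
// production forbids, treating the input as UTF-8 text.
function strip_invalid_xml_chars(string $text): string
{
    // The /u modifier makes PCRE match UTF-8 code points. The negated class
    // lists the allowed ranges from the spec; everything else is removed.
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );
    // preg_replace returns null when $text is not valid UTF-8. Returning an
    // empty string is an assumption; you may prefer to log or re-encode.
    return $clean ?? '';
}

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;
$root = $dom->createElement('termeklista');
$dom->appendChild($root);

// Stand-in for $loop->post->post_excerpt, containing the 0x03 from the error.
$excerpt = "abc\x03ös";

$description = $dom->createElement('leiras');
$root->appendChild($description);
$description->appendChild(
    $dom->createCDATASection("\n" . strip_invalid_xml_chars($excerpt) . "\n")
);

echo $dom->saveXML();
```

Cleaning at the point where the text enters the DOM keeps the rest of the feed code unchanged; fixing the stored excerpts in WordPress is the alternative if the stray bytes should not exist at all.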
