Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have an XML document that has a default namespace attached to it, eg

<foo xmlns="http://www.example.com/ns/1.0">
...
</foo>

In reality this is a complex XML document that conforms to a complex schema. My job is to parse out some data from it. To aid me, I have a spreadsheet of XPath. The XPath is rather deeply nested, eg

level1/level2/level3[@foo="bar"]/level4[@foo="bar"]/level5/level6[2]

The person who generate the XPath is an expert in the schema, so I am going with the assumption that I can't simplify it, or use object traversal shortcuts.

I am using SimpleXML to parse everything out. My problem has to do with how the default namespace gets handled.

Since there is a default namespace on the root element, I can't just do

$xml = simplexml_load_file($somepath);
$node = $xml->xpath('level1/level2/level3[@foo="bar"]/level4[@foo="bar"]/level5/level6[2]');

I have to register the namespace, assign it to a prefix, and then use the prefix in my XPath, eg

$xml = simplexml_load_file($somepath);
$xml->registerXPathNamespace('myns', 'http://www.example.com/ns/1.0');
$node = $xml->xpath('myns:level1/myns:level2/myns:level3[@foo="bar"]/myns:level4[@foo="bar"]/myns:level5/myns:level6[2]');

Adding the prefixes isn't going to be manageable in the long run.

Is there a proper way to handle default namespaces without needing to using prefixes with XPath?

Using an empty prefix doesn't work ($xml->registerXPathNamespace('', 'http://www.example.com/ns/1.0');). I can string out the default namespace, eg

$xml = file_get_contents($somepath);
$xml = str_replace('xmlns="http://www.example.com/ns/1.0"', '', $xml);
$xml = simplexml_load_string($xml);

but that is skirting the issue.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
396 views
Welcome To Ask or Share your Answers For Others

1 Answer

From a bit of reading online, this is not restricted to any particular PHP or other library, but to XPath itself - at least in XPath version 1.0

XPath 1.0 does not include any concept of a "default" namespace, so regardless of how the element names appear in the XML source, if they have a namespace bound to them, the selectors for them must be prefixed in basic XPath selectors of the form ns:name. Note that ns is a prefix defined within the XPath processor, not by the document being processed, so has no relationship to how xmlns attributes are used in the XML representation.

See e.g. this "common XSLT mistakes" page, talking about the closely related XSLT 1.0:

To access namespaced elements in XPath, you must define a prefix for their namespace. [...] Unfortunately, XSLT version 1.0 has no concept similar to a default namespace; therefore, you must repeat namespace prefixes again and again.

According to an answer to a similar question, XPath 2.0 does include a notion of "default namespace", and the XSLT page linked above mentions this also in the context of XSLT 2.0.

Unfortunately, all of the built-in XML extensions in PHP are built on top of the libxml2 and libxslt libraries, which support only version 1.0 of XPath and XSLT.

So other than pre-processing the document not to use namespaces, your only option would be to find an XPath 2.0 processor that you could plug in to PHP.

(As an aside, it's worth noting that if you have unprefixed attributes in your XML document, they are not technically in the default namespace, but rather in no namespace at all; see XML Namespaces and Unprefixed Attributes for discussion of this oddity of the Namespace spec.)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...