Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I need to dynamically construct an XPath query for an element attribute, where the attribute value is provided by the user. I'm unsure how to go about cleaning or sanitizing this value to prevent the XPath equivalent of a SQL injection attack. For example (in PHP):

<?php
function xPathQuery($attr) {
    $xml = simplexml_load_file('example.xml');
    return $xml->xpath("//myElement[@content='{$attr}']");
}

xPathQuery('This should work fine');
# //myElement[@content='This should work fine']

xPathQuery('As should "this"');
# //myElement[@content='As should "this"']

xPathQuery('This'll cause problems');
# //myElement[@content='This'll cause problems']

xPathQuery('']/../privateElement[@content='private data');
# //myElement[@content='']/../privateElement[@content='private data']

The last one in particular is reminiscent to the SQL injection attacks of yore.

Now, I know for a fact there will be attributes containing single quotes and attributes containing double quotes. Since these are provided as an argument to a function, what would be the ideal way to sanitize the input for these?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
343 views
Welcome To Ask or Share your Answers For Others

1 Answer

XPath does actually include a method of doing this safely, in that it permits variable references in the form $varname in expressions. The library on which PHP's SimpleXML is based provides an interface to supply variables, however this is not exposed by the xpath function in your example.

As a demonstration of really how simple this can be:

>>> from lxml import etree
>>> n = etree.fromstring('<n a='He said "I&apos;m here"'/>')
>>> n.xpath("@a=$maybeunsafe", maybeunsafe='He said "I'm here"')
True

That's using lxml, a python wrapper for the same underlying library as SimpleXML, with a similar xpath function. Booleans, numbers, and node-sets can also be passed directly.

If switching to a more capable XPath interface is not an option, a workaround when given external string would be something (feel free to adapt to PHP) along the lines of:

def safe_xpath_string(strvar):
    if "'" in strvar:
        return "',"'",'".join(strvar.split("'")).join(("concat('","')"))
    return strvar.join("''")

The return value can be directly inserted in your expression string. As that's not actually very readable, here is how it behaves:

>>> print safe_xpath_string("basic")
'basic'
>>> print safe_xpath_string('He said "I'm here"')
concat('He said "I',"'",'m here"')

Note, you can't use escaping in the form &apos; outside of an XML document, nor are generic XML serialisation routines applicable. However, the XPath concat function can be used to create a string with both types of quotes in any context.

PHP variant:

function safe_xpath_string($value)
{
    $quote = "'";
    if (FALSE === strpos($value, $quote))
        return $quote.$value.$quote;
    else
        return sprintf("concat('%s')", implode("', "'", '", explode($quote, $value)));
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...