I got stuck with XML and Python. The task is simple, but I haven't been able to solve it so far and have spent a long time on it. I'm hoping for advice on how to solve it in a couple of lines.
Thanks for any help with traversing the tree: I always end up with too many or too few elements. Elements can be nested to any depth; the XML below is just an example. I'll accept any solution and am not picky about DOM, minidom, SAX or anything else.
I have an XML file similar to this one:
<root>
  <elm>
    <elm>Common content</elm>
    <elm xmlns="http://example.org/ns">
      <elm lang="en">Content EN</elm>
      <elm lang="cs">Žluťoučky koníček</elm>
    </elm>
    <elm xml:id="abc123">Common content</elm>
    <elm lang="en">Content EN</elm>
    <elm lang="cs">Content CS</elm>
    <elm lang="en">
      <elm>Content EN</elm>
      <elm>Content EN</elm>
    </elm>
    <elm lang="cs">
      <elm>Content CS</elm>
      <elm>Content CS</elm>
    </elm>
  </elm>
</root>
What I need: parse the XML and write a new file that contains all elements for a given language plus all elements that have no lang attribute.
For the "cs" language, the output file should contain this:
<root>
  <elm>
    <elm>Common content</elm>
    <elm xmlns="http://example.org/ns">
      <elm lang="cs">Žluťoučky koníček</elm>
    </elm>
    <elm xml:id="abc123">Common content</elm>
    <elm lang="cs">Content CS</elm>
    <elm lang="cs">
      <elm>Content CS</elm>
      <elm>Content CS</elm>
    </elm>
  </elm>
</root>
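One way to get this output, sketched with xml.dom.minidom from the standard library: walk the tree recursively, remove every element whose lang attribute is set to a different language, and descend only into the elements that are kept. minidom stores the xmlns declaration and xml:id as ordinary attributes, so they are written back on the same elements. The names filter_by_lang, filter_xml and SAMPLE are illustrative, and the code is written for Python 3; the minidom calls also exist in Python 2.5, but details such as the source encoding declaration would need adapting.

```python
from xml.dom import minidom, Node

# The sample document from the question.
SAMPLE = u"""<root>
<elm>
<elm>Common content</elm>
<elm xmlns="http://example.org/ns">
<elm lang="en">Content EN</elm>
<elm lang="cs">Žluťoučky koníček</elm>
</elm>
<elm xml:id="abc123">Common content</elm>
<elm lang="en">Content EN</elm>
<elm lang="cs">Content CS</elm>
<elm lang="en">
<elm>Content EN</elm>
<elm>Content EN</elm>
</elm>
<elm lang="cs">
<elm>Content CS</elm>
<elm>Content CS</elm>
</elm>
</elm>
</root>"""


def filter_by_lang(node, lang):
    """Recursively remove element children of *node* whose lang
    attribute is present and differs from *lang*.

    Elements without a lang attribute are kept and searched further,
    so any nesting depth works (up to Python's recursion limit).
    """
    # Iterate over a copy: removing from node.childNodes while looping
    # over it directly would skip the sibling after each removed node.
    for child in list(node.childNodes):
        if child.nodeType != Node.ELEMENT_NODE:
            continue  # text nodes, comments etc. are left alone
        # getAttribute() returns "" when the attribute is missing.
        child_lang = child.getAttribute("lang")
        if child_lang and child_lang != lang:
            node.removeChild(child)
            child.unlink()  # break minidom's internal reference cycles
        else:
            filter_by_lang(child, lang)


def filter_xml(xml_text, lang):
    """Parse *xml_text*, keep only *lang* content and return it as text."""
    doc = minidom.parseString(xml_text)
    filter_by_lang(doc.documentElement, lang)
    result = doc.documentElement.toxml()
    doc.unlink()
    return result


print(filter_xml(SAMPLE, "cs"))
```

Removed elements leave their surrounding whitespace text nodes behind, so the printed result contains a few blank lines, but the element structure is the one shown for "cs".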
It would be even better if the new file omitted the lang attribute, but that's not essential.
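Dropping the lang attribute can be done in the same pass: after deciding to keep an element, remove its lang attribute. This sketch (Python 3, xml.dom.minidom; filter_and_strip is a hypothetical name) uses an explicit stack instead of recursion, since nesting is unbounded and a recursive walk could hit Python's recursion limit on extremely deep documents. The sample is a trimmed part of the question's XML.

```python
from xml.dom import minidom, Node

# A trimmed part of the question's sample.
SAMPLE = u"""<root>
<elm>
<elm>Common content</elm>
<elm lang="en">Content EN</elm>
<elm lang="cs">Content CS</elm>
<elm lang="en">
<elm>Content EN</elm>
<elm>Content EN</elm>
</elm>
<elm lang="cs">
<elm>Content CS</elm>
<elm>Content CS</elm>
</elm>
</elm>
</root>"""


def filter_and_strip(root, lang):
    """Keep elements without lang or with lang == *lang*, and remove the
    lang attribute from the elements that are kept.

    An explicit stack replaces recursion, so deep nesting cannot
    exceed Python's recursion limit.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        for child in list(node.childNodes):  # copy: we mutate childNodes
            if child.nodeType != Node.ELEMENT_NODE:
                continue
            child_lang = child.getAttribute("lang")  # "" if absent
            if child_lang and child_lang != lang:
                node.removeChild(child)
                child.unlink()
                continue
            # removeAttribute() raises NotFoundErr for a missing
            # attribute, so check first.
            if child.hasAttribute("lang"):
                child.removeAttribute("lang")
            stack.append(child)


doc = minidom.parseString(SAMPLE)
filter_and_strip(doc.documentElement, "cs")
result = doc.documentElement.toxml()
print(result)
```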
UPDATE 1: Added Unicode characters and a namespace attribute.
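For the Unicode characters, a reasonable approach is to let the parser decode the file's bytes itself and to write the result through a file object with an explicit encoding. A sketch, assuming UTF-8 input and Python 3; the helper filter_file and the file names input.xml and output-cs.xml are hypothetical, and filter_by_lang is a small recursive minidom filter defined inline so the block runs on its own.

```python
import os
import tempfile
from xml.dom import minidom, Node

# Part of the question's sample: default namespace, xml:id and Czech text.
SAMPLE = u"""<root>
<elm>
<elm xmlns="http://example.org/ns">
<elm lang="en">Content EN</elm>
<elm lang="cs">Žluťoučky koníček</elm>
</elm>
<elm xml:id="abc123">Common content</elm>
</elm>
</root>"""


def filter_by_lang(node, lang):
    """Recursively drop element children whose lang differs from *lang*."""
    for child in list(node.childNodes):  # copy: we mutate childNodes
        if child.nodeType != Node.ELEMENT_NODE:
            continue
        child_lang = child.getAttribute("lang")  # "" if absent
        if child_lang and child_lang != lang:
            node.removeChild(child)
            child.unlink()
        else:
            filter_by_lang(child, lang)


def filter_file(src_path, dst_path, lang):
    """Read XML from src_path, keep *lang* content, write UTF-8 to dst_path."""
    # minidom.parse() opens the file in binary mode and lets expat decode
    # it, honoring the XML declaration (UTF-8 when there is none).
    doc = minidom.parse(src_path)
    filter_by_lang(doc.documentElement, lang)
    # writexml(encoding=...) only writes the XML declaration; the actual
    # encoding is done by the text-mode file object.
    with open(dst_path, "w", encoding="utf-8") as out:
        doc.writexml(out, encoding="utf-8")
    doc.unlink()


with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "input.xml")
    dst = os.path.join(tmp_dir, "output-cs.xml")
    with open(src, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    filter_file(src, dst, "cs")
    with open(dst, encoding="utf-8") as f:
        written = f.read()

print(written)
```

Because minidom keeps xmlns="..." as a plain attribute, the namespace declaration stays on the element where it appeared in the input.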
UPDATE 2: I'm using Python 2.5; standard libraries are preferred.
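xml.etree.ElementTree, part of the standard library since Python 2.5, is another option. The pruning logic is the same recursive walk. One caveat: ElementTree does not keep the original namespace syntax, so the default-namespace declaration is typically re-emitted as a generated prefix (such as ns0) declared on the root element. The result is namespace-equivalent but textually different from the expected output. This sketch assumes Python 3 (encoding="unicode" in tostring() is Python 3 only); prune is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The sample document from the question.
SAMPLE = u"""<root>
<elm>
<elm>Common content</elm>
<elm xmlns="http://example.org/ns">
<elm lang="en">Content EN</elm>
<elm lang="cs">Žluťoučky koníček</elm>
</elm>
<elm xml:id="abc123">Common content</elm>
<elm lang="en">Content EN</elm>
<elm lang="cs">Content CS</elm>
<elm lang="en">
<elm>Content EN</elm>
<elm>Content EN</elm>
</elm>
<elm lang="cs">
<elm>Content CS</elm>
<elm>Content CS</elm>
</elm>
</elm>
</root>"""


def prune(elem, lang):
    """Recursively remove children whose lang attribute differs from *lang*."""
    # list(elem) snapshots the children so removing one doesn't skip the next.
    for child in list(elem):
        child_lang = child.get("lang")  # None if the attribute is absent
        if child_lang and child_lang != lang:
            elem.remove(child)  # also drops the child's tail whitespace
        else:
            prune(child, lang)


root = ET.fromstring(SAMPLE)
prune(root, "cs")
# encoding="unicode" returns str; the default returns ASCII bytes.
result = ET.tostring(root, encoding="unicode")
print(result)
```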