Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

This is my sample xml file

<manifest>
 <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
 </default>
 <project>
    <name>common</name>
    <path>opensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
 </project>
 <project>
   <name>external</name>
   <path>source/tp</path>
   <x-ship>none</x-ship>
 </project>
 <project>
   <name>ws</name>
   <path>opensource/ws</path>
   <remote>nj</remote>
   <revision>myno</revision>
   <x-ship>none</x-ship>
 </project>
</manifest>

In this I need to update the value of revision only when <path> has "opensource" string in it.

I searched a lot but couldn't find anything useful to achieve this, I could modify the value based on the position as below, can anyone help me updating this? Or let me know if there is a better Perl library to do this.

#!/usr/bin/perl

use strict;
use warnings;

use XML::Simple;

my $xml_file = 'dev.xml';

my $xml = XMLin(
    $xml_file,
    KeepRoot => 1,
    ForceArray => 1,
);

$xml->{manifest}->[0]->{project}->[2]->{revision} = 'kyo';

XMLout(
    $xml,
    KeepRoot => 1,
    NoAttr => 1,
    OutputFile => $xml_file,
);
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
739 views
Welcome To Ask or Share your Answers For Others

1 Answer

There is definitely a learning curve, but XML::Twig and XPath syntax can handle this quite well. The following demonstrates that for a variation of the fake data that you provided.

Please note that one of the big features to twig is the ability to parse data as you go instead of having to load an extremely large XML file entirely into memory. That may not be a limitation in your case, but is an important feature for some.

use strict;
use warnings;

use XML::Twig;

my $data = do { local $/; <DATA> };

my $t= XML::Twig->new( 
    twig_handlers => {
        q{project[string(path) =~ /opensource/]/revision} => &revision,
    },
    pretty_print => 'indented',
);
$t->parse( $data );
$t->print;

sub revision {
    my ($twig, $rev) = @_;
    $rev->set_text("open source - " . $rev->text());
}

__DATA__
<manifest>
 <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
 </default>
 <project>
    <name>common</name>
    <path>NOTopensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
 </project>
 <project>
   <name>external</name>
   <path>source/tp</path>
   <x-ship>none</x-ship>
 </project>
 <project>
   <name>ws</name>
   <path>opensource/ws</path>
   <remote>nj</remote>
   <revision>myno</revision>
   <x-ship>none</x-ship>
 </project>
</manifest>

Output:

You'll notice that the last revision has open source - prefixed to it.

<manifest>
  <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
  </default>
  <project>
    <name>common</name>
    <path>NOTopensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
  </project>
  <project>
    <name>external</name>
    <path>source/tp</path>
    <x-ship>none</x-ship>
  </project>
  <project>
    <name>ws</name>
    <path>opensource/ws</path>
    <remote>nj</remote>
    <revision>open source - myno</revision>
    <x-ship>none</x-ship>
  </project>
</manifest>

Addendum about sibling elements:

Yes, there are methods to traverse within a twig to nearby xml elements. For example, if I wanted to pull the name of the revision I was editting, and place that in the new text, I could do the following:

sub revision {
    my ($twig, $rev) = @_;
    my $name = $rev->parent()->first_child("name");
    $rev->set_text("open source - " $name->text() . ' - '. $rev->text());
}

Note the ws now added to the edited revision tag:

  <project>
    <name>ws</name>
    <path>opensource/ws</path>
    <remote>nj</remote>
    <revision>open source - ws - myno</revision>
    <x-ship>none</x-ship>
  </project>

This method of traversing the twig to nearby elements can often be a useful way of filtering. I easily could've done the same to enforce this branch be one with a path that contained opensource, but setting that requirement in the xpath for the handler is convenient if one is familiar with the xpath syntax.

Also note, in my above example, I assume there is a sibling of type name. Normally I would check to make sure before calling ->text() or one might get an error.

Addendum about Attributes:

Concerning your edge case with an alternative format:

<project path="opensource" revision="apple" name="platform" x-ship="none"/>

The above contains the same data as other projects, but instead of the values being child elements, they are attributes. This is also a feature of XML, but it is different and so must be handled in a different way.

The following is an edit of the originally suggested script that adds a new handler for projects that contain a path attribute versus a child:

use strict;
use warnings;

use XML::Twig;

my $data = do { local $/; <DATA> };

my $t= XML::Twig->new( 
    twig_handlers => {
        q{project[string(path) =~ /opensource/]/revision} => &revision,
        q{project[@path =~ /opensource/]} => &project,
    },
    pretty_print => 'indented',
);
$t->parse( $data );
$t->print;

sub revision {
    my ($twig, $rev) = @_;
    $rev->set_text("open source - " . $rev->text());
}

sub project {
    my ($twig, $project) = @_;

    $project->set_att(
        revision => 'open source - ' . $project->{att}{revision},
    );
}

__DATA__
<manifest>
 <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
 </default>
 <project>
    <name>common</name>
    <path>NOTopensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
 </project>
 <project path="opensource" revision="apple" name="platform" x-ship="none"/>
 <project>
   <name>external</name>
   <path>source/tp</path>
   <x-ship>none</x-ship>
 </project>
 <project>
   <name>ws</name>
   <path>opensource/ws</path>
   <remote>nj</remote>
   <revision>myno</revision>
   <x-ship>none</x-ship>
 </project>
</manifest>

And just to give you something to compare and learn from, here is the same code but with the filtering done in the handler versus using xpath:

    twig_handlers => {
        q{project[string(path) =~ /opensource/]/revision} => &revision,
        q{project} => &project,
    },

...

sub project {
    my ($twig, $project) = @_;

    if ($project->{att}{path} && $project->{att}{path} =~ /opensource/) {
        $project->set_att(
            revision => 'open source - ' . $project->{att}{revision},
        );
    }
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...