Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I am trying to parse a multi-line HTML file using a regex.

HTML code:

<td>Details</td></tr>  
<tr class=d1>
<td>uss_vod_translator</td>

Regex Expression:

if ($line =~ m/Details<\/td>\s*<\/tr>\s*<tr\s*class=d1>\s*<td>(\w*)<\/td>/)
{
    print "$1";
}

I am using \s* (whitespace) to match across lines, but it is not working. I searched for a fix and even tried /? for multi-line matching, but that did not work either.

Can anyone please suggest how to parse multi-line HTML?

I know regex is a poor tool for parsing HTML, but I have legacy HTML code that I need to parse and I have no other choice.


1 Answer

Can anyone please suggest how to parse multi-line HTML?

Stop trying to use regular expressions and use a module that will parse it for you.

HTML::TreeBuilder is a good solution.

HTML::TreeBuilder::LibXML gives you the same API but backed by a fast parser.

HTML::TreeBuilder::XPath adds XPath support as well as a fast parser.
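
As a rough illustration of the parser-based approach, here is a minimal sketch using HTML::TreeBuilder on the snippet from the question. The surrounding <table> element and the closing </tr> are assumptions added so the fragment forms a complete table; the real legacy file may be structured differently. The sketch also looks only for the first <tr class=d1> row and does not check that it follows the "Details" row, which the original regex did.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup from the question. The <table> wrapper and the closing
# </tr> are assumptions added for this illustration.
my $html = <<'HTML';
<table>
<tr><td>Details</td></tr>
<tr class=d1>
<td>uss_vod_translator</td>
</tr>
</table>
HTML

# Parse the whole document at once, so line breaks between tags do not matter.
my $tree = HTML::TreeBuilder->new_from_content($html);

# In scalar context look_down returns the first matching element (or undef).
my $row  = $tree->look_down(_tag => 'tr', class => 'd1');
my $cell = $row ? $row->look_down(_tag => 'td') : undef;

# Text of the cell with leading and trailing whitespace removed.
my $name = $cell ? $cell->as_trimmed_text : '';
print "$name\n";

# Release the tree's memory when finished.
$tree->delete;
```

For a real file, HTML::TreeBuilder->new_from_file($path) can be used in place of new_from_content, where $path is a placeholder for the legacy file's location.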

