Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.7k views
in Technique[技术] by (71.8m points)

Regex to parse a multiline HTML

am trying to parse a multi-line html file using regex.

HTML code:

<td>Details</td></tr>  
<tr class=d1>
<td>uss_vod_translator</td>

Regex Expression:

if ($line =~ m/Details</td>s*</tr>s*<trs*class=d1>s*<td>(w*)</td>/)
{
    print "$1";
}

I am using /s* (space) for multi-line, but it is not working. I searched about it, even used /? for multi-line but that too did not work.

Can any one please suggest me how to parse a multiline HTML?

I know regex is a poor solution to parse HTML. But i have a legacy HTML code which i need to parse and have no other choice.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Can any one please suggest me how to parse a multiline HTML?

Stop trying to use regular expressions and use a module that will parse it for you.

HTML::TreeBuilder is a good solution.

HTML::TreeBuilder::LibXML gives you the same API but backed by a fast parser.

HTML::TreeBuilder::XPath adds XPath support as well as a fast parser.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...