Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I know how to fetch the HTML source with cURL, but I want to remove the comments from the HTML document (that is, everything between <!-- and -->). In addition, I would like to extract just the BODY of the HTML document. Thank you.

1 Answer

Try PHP's DOM extension:

$html = '<html><body><!--a comment--><div>some content</div></body></html>'; // put your cURL result here

// Parse the markup into a DOM tree.
$dom = new DOMDocument;
$dom->loadHTML($html);

// Select every comment node in the document and detach it from its parent.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//comment()') as $comment) {
    $comment->parentNode->removeChild($comment);
}

// Serialize only the <body> element; item(0) is null when no <body> was found.
$body = $xpath->query('//body')->item(0);
$newHtml = $body instanceof DOMNode ? $dom->saveXML($body) : 'something failed';

var_dump($newHtml);

Output:

string(36) "<body><div>some content</div></body>"
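
Pages fetched with cURL are often not well-formed, and loadHTML() then emits parser warnings. The following is a minimal sketch of the same approach wrapped in a function, assuming you want those warnings suppressed and null returned on failure. The function name extractBodyWithoutComments and the null-on-failure convention are hypothetical, not part of the original answer. It also swaps saveXML() for saveHTML() (which accepts a node since PHP 5.3.6), so that empty elements are written as <div></div> rather than <div/>.

```php
<?php
// Hypothetical helper: the name and the null-on-failure convention are assumptions.
function extractBodyWithoutComments(string $html): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return null;
    }

    // Collect libxml parser warnings internally instead of emitting them,
    // then restore whatever setting was active before.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // Remove every comment node, as in the answer above.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//comment()') as $comment) {
        $comment->parentNode->removeChild($comment);
    }

    // Serialize only the <body> element using the HTML serializer.
    $body = $xpath->query('//body')->item(0);
    return $body instanceof DOMNode ? $dom->saveHTML($body) : null;
}

// Usage, where $curlResult holds the string returned by curl_exec():
// $bodyHtml = extractBodyWithoutComments($curlResult);
```

Restoring the previous libxml error setting keeps the helper from changing error handling for the rest of the script.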
