Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I dont parse this url: http://foldmunka.net

$ch = curl_init("http://foldmunka.net");

//curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); //not necessary unless the file redirects (like the PHP example we're using here)
$data = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
clearstatcache();
if ($data === false) {
  echo 'cURL failed';
  exit;
}
$dom = new DOMDocument();
$data = mb_convert_encoding($data, 'HTML-ENTITIES', "utf-8");
$data = preg_replace('/<!--[if(.*)]>/', '', $data);
$data = str_replace('<![endif]-->', '', $data);
$data = str_replace('<!--', '', $data);
$data = str_replace('-->', '', $data);
$data = preg_replace('@<script[^>]*?>.*?</script>@si', '', $data);
$data = preg_replace('@<style[^>]*?>.*?</style>@si', '', $data);

$data = mb_convert_encoding($data, 'HTML-ENTITIES', "utf-8");
@$dom->loadHTML($data);

$els = $dom->getElementsByTagName('*');
foreach($els as $el){
  print $el->nodeName." | ".$el->getAttribute('content')."<hr />";
  if($el->getAttribute('title'))$el->nodeValue = $el->getAttribute('title')." ".$el->nodeValue;
  if($el->getAttribute('alt'))$el->nodeValue = $el->getAttribute('alt')." ".$el->nodeValue;
  print $el->nodeName." | ".$el->nodeValue."<hr />";
}

I need sequentially the alt, title attributes and the simple text, but this page i cannot access the nodes within the body tag.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
577 views
Welcome To Ask or Share your Answers For Others

1 Answer

I'm not sure I'm getting what this script does - the replace operations look like an attempt at sanitation but I'm not sure what for, if you're just extracting some parts of the code - but have you tried the Simple HTML DOM Browser? It may be able to handle the parsing part more easily. Check out the examples.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...