Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

Given a list of urls, I would like to check that each url:

  • Returns a 200 OK status code
  • Returns a response within X amount of time

The end goal is a system that is capable of flagging urls as potentially broken so that an administrator can review them.

The script will be written in PHP and will most likely run on a daily basis via cron.

The script will be processing approximately 1000 urls at a go.

Question has two parts:

  • Are there any bigtime gotchas with an operation like this, what issues have you run into?
  • What is the best method for checking the status of a url in PHP considering both accuracy and performance?
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
543 views
Welcome To Ask or Share your Answers For Others

1 Answer

Use the PHP cURL extension. Unlike fopen() it can also make HTTP HEAD requests which are sufficient to check the availability of a URL and save you a ton of bandwith as you don't have to download the entire body of the page to check.

As a starting point you could use some function like this:

function is_available($url, $timeout = 30) {
    $ch = curl_init(); // get cURL handle

    // set cURL options
    $opts = array(CURLOPT_RETURNTRANSFER => true, // do not output to browser
                  CURLOPT_URL => $url,            // set URL
                  CURLOPT_NOBODY => true,         // do a HEAD request only
                  CURLOPT_TIMEOUT => $timeout);   // set timeout
    curl_setopt_array($ch, $opts); 

    curl_exec($ch); // do it!

    $retval = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200; // check if HTTP OK

    curl_close($ch); // close handle

    return $retval;
}

However, there's a ton of possible optimizations: You might want to re-use the cURL instance and, if checking more than one URL per host, even re-use the connection.

Oh, and this code does check strictly for HTTP response code 200. It does not follow redirects (302) -- but there also is a cURL-option for that.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...