Download the content of a webpage or URL using PHP and cURL


What is cURL?

“cURL is the name of the project. The name is a play on ‘Client for URLs’, originally with URL spelled in uppercase to make it obvious it deals with URLs. The fact it can also be pronounced ‘see URL’ also helped, it works as an abbreviation for “Client URL Request Library” or why not the recursive version: “Curl URL Request Library”.”

cURL is a tool for transferring files and data using URL syntax. It supports many protocols, including HTTP, FTP and TELNET. cURL was initially designed as a command-line tool.
PHP has supported the cURL library since PHP 4.0.2, so let’s look at how to load the content of a webpage using PHP and cURL.

Why cURL?

Besides cURL, PHP offers other ways to fetch the content of a web page.

// file_get_contents(): returns the whole page as a string
$content = file_get_contents("http://www.creative-geeks.com");
// file(): returns the page as an array of lines
$lines = file("http://www.creative-geeks.com");
// readfile(): writes the page straight to the output
readfile("http://www.creative-geeks.com");

These methods do not have the flexibility that cURL has, and they can only read URLs when allow_url_fopen is enabled. Their error handling is limited to a false return value plus a warning, and they do not provide transfer metrics (loading time, amount of data transferred, transfer speed, …).

Also, certain tasks are awkward or impossible with these methods, such as dealing with cookies, authentication and file uploads.
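
As a rough sketch of what this flexibility looks like, the options below configure HTTP basic authentication and a cookie file on a single handle. The URL, the credentials and the cookie file location are made-up placeholders, not values from this article, and no request is actually sent.

```php
<?php
// Hypothetical endpoint, credentials and cookie file, for illustration only.
$cookieFile = sys_get_temp_dir() . '/example_cookies.txt';

$ch = curl_init('https://example.com/members');
$configured = curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    // Send a username and password using HTTP basic authentication.
    CURLOPT_HTTPAUTH       => CURLAUTH_BASIC,
    CURLOPT_USERPWD        => 'demo-user:demo-password',
    // Read cookies from this file and save received cookies to it
    // when the handle is released.
    CURLOPT_COOKIEFILE     => $cookieFile,
    CURLOPT_COOKIEJAR      => $cookieFile,
]);
// $configured is true when every option was accepted;
// calling curl_exec($ch) would then send the request.
```

None of this has a one-line equivalent in file_get_contents, file or readfile.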

Initialize a cURL handle using curl_init

Create a cURL handle as follows:

$ch = curl_init();
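
curl_init also accepts the URL as an optional argument, and it returns false when no handle could be created. A minimal sketch of both points, reusing the site address from the earlier snippets:

```php
<?php
// Passing the URL here has the same effect as setting CURLOPT_URL later.
$ch = curl_init('http://www.creative-geeks.com');

if ($ch === false) {
    // Initialisation rarely fails, but the return value is worth checking.
    throw new RuntimeException('Could not create a cURL handle.');
}
```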

Use curl_setopt to set options

curl_setopt is your go-to function when configuring a cURL transfer.
curl_setopt — Set an option for a cURL transfer.

/*
$ch - A cURL handle returned by curl_init().
option - The CURLOPT_XXX option to set.
value - The value to be set on option.
*/

curl_setopt($ch, CURLOPT_URL, $url);

See all available options at http://php.net/manual/en/function.curl-setopt.php
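
When several options are needed, curl_setopt_array sets them in one call. The sketch below is only an illustration: the option values and the user agent string are assumptions chosen for the example, not recommendations from the PHP manual.

```php
<?php
$ch = curl_init();

$ok = curl_setopt_array($ch, [
    CURLOPT_URL            => 'http://www.creative-geeks.com',
    CURLOPT_RETURNTRANSFER => true,  // return the body from curl_exec() instead of printing it
    CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects...
    CURLOPT_MAXREDIRS      => 5,     // ...but give up after five of them
    CURLOPT_CONNECTTIMEOUT => 5,     // seconds allowed for connecting
    CURLOPT_TIMEOUT        => 15,    // seconds allowed for the whole transfer
    CURLOPT_USERAGENT      => 'ExampleFetcher/1.0', // hypothetical user agent
]);

if ($ok === false) {
    // curl_setopt_array() stops at the first option that could not be set.
    throw new RuntimeException('One of the cURL options was rejected.');
}
```

Keeping the options in one array also makes it easy to reuse the same configuration for several handles.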

curl_getinfo

Use this cURL function to retrieve information about a specific transfer.

/*
Value fields available for the cURL transfer
"url"
"content_type"
"http_code"
"header_size"
"request_size"
"filetime"
"ssl_verify_result"
"redirect_count"
"total_time"
"namelookup_time"
"connect_time"
"pretransfer_time"
"size_upload"
"size_download"
"speed_download"
"speed_upload"
"download_content_length"
"upload_content_length"
"starttransfer_time"
"redirect_time"
"certinfo"
"primary_ip"
"primary_port"
"local_ip"
"local_port"
"redirect_url"
"request_header"
*/

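$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://creative-geeks.com/blog/2017/05/03/angular-js-series-how-to-optimize-the-performance-of-your-angularjs-applications/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch); // run the transfer first, otherwise the metrics are empty
$info = curl_getinfo($ch); // associative array with the fields listed above
curl_close($ch);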
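
Since curl_getinfo returns an associative array, the metrics are easy to turn into a log line. The helper below, summarize_transfer, is a hypothetical name introduced for this sketch; it relies only on keys from the list above and falls back to 0 when a key is missing. The sample array is made-up data, not a real measurement.

```php
<?php
/**
 * Build a one-line summary from the array returned by curl_getinfo().
 * Hypothetical helper, not part of the cURL extension.
 */
function summarize_transfer(array $info): string
{
    return sprintf(
        'HTTP %d, %.0f bytes in %.3f s (%.0f bytes/s)',
        $info['http_code'] ?? 0,
        $info['size_download'] ?? 0,
        $info['total_time'] ?? 0,
        $info['speed_download'] ?? 0
    );
}

// Made-up values standing in for a real curl_getinfo() result.
$sample = [
    'http_code'      => 200,
    'size_download'  => 1024.0,
    'total_time'     => 0.5,
    'speed_download' => 2048.0,
];

echo summarize_transfer($sample), PHP_EOL;
// prints: HTTP 200, 1024 bytes in 0.500 s (2048 bytes/s)
```

When only one value is needed, it can be requested directly by passing a constant as the second argument, for example curl_getinfo($ch, CURLINFO_HTTP_CODE).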

Code

/*
* Get the data from a specified URL.
* @param string $url - the URL to fetch
* @return array - 'data' holds the response body (FALSE on failure), 'info' holds the curl_getinfo() array
*/

function get_url_content($url) {
    $ch = curl_init();
    // set webpage url
    curl_setopt($ch, CURLOPT_URL, $url);
    // Make curl_exec() return the response as a string instead of printing it. It still returns FALSE on failure.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // The number of seconds to wait while trying to connect. Use 0 to wait indefinitely.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    // execute cURL handle
    $data = curl_exec($ch);
    // get curl info
    $info = curl_getinfo($ch);
    // close cURL handle
    curl_close($ch);
    // return the body together with the transfer info
    return array(
        'data' => $data,
        'info' => $info
    );
}


/* Get request content */
$returned_content = get_url_content('http://creative-geeks.com/blog/2017/05/03/angular-js-series-how-to-optimize-the-performance-of-your-angularjs-applications/');
var_dump($returned_content['data']);
var_dump($returned_content['info']);
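
curl_exec returns false when the transfer fails, so the first var_dump would simply print bool(false) in that case. A possible variant, sketched here under the name fetch_or_fail (a hypothetical helper, not part of the code above), uses curl_errno and curl_error to turn a failure into an exception. Treating HTTP status codes of 400 and above as failures via CURLOPT_FAILONERROR is a design choice made for this example.

```php
<?php
/**
 * Fetch a URL and return its body, or throw when the transfer fails.
 * Hypothetical helper for illustration.
 */
function fetch_or_fail(string $url): string
{
    $ch = curl_init();
    if ($ch === false) {
        throw new RuntimeException('Could not create a cURL handle.');
    }

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    // Report HTTP status codes >= 400 as transfer errors.
    curl_setopt($ch, CURLOPT_FAILONERROR, true);

    $body = curl_exec($ch);

    if ($body === false) {
        // curl_errno() gives the numeric error code, curl_error() the message.
        throw new RuntimeException(
            'cURL error ' . curl_errno($ch) . ': ' . curl_error($ch),
            curl_errno($ch)
        );
    }

    // The handle is released automatically when $ch goes out of scope.
    return $body;
}
```

Calling fetch_or_fail inside a try/catch block lets the caller log the message from the RuntimeException instead of silently working with false.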


About the Author

Daan is a Creative-Geek who loves learning and sharing new techniques! Follow him on Twitter to keep up to date with the Creative-Geeks blog and other subjects. Contact him by e-mail: info[at]creative-geeks.com.