cURL is a sleek way to grab the contents of a URL. It is especially helpful in PHP on servers where file_get_contents() is not allowed to fetch remote URLs. A while back I shared a post showing how to grab the contents of a URL using cURL. However, cURL can do much more than just grab data. It can also be used to make API calls (which again amount to requesting a particular URL with the relevant data), such as those offered by Facebook and Twitter.
When a URL is requested via cURL, the server responds just as it would to a request sent from a browser. However, a default cURL request does not include some information that certain servers use to validate a request, e.g. the user agent, the accepted language, etc.
In fact, cURL lets you specify these parameters, and once you do, the cURL request looks like a request from just another browser. This can be handy when you need to scrape some data off a website (personally, I do not recommend this). Besides the user agent, we also need to set other header information to make our cURL request look legit. Now let us see how you can spoof various pieces of information in a cURL request to make it look like a browser request. First, the user agent:
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1');
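For context, here is a minimal sketch of a complete request built around that line. The helper name fetch_with_user_agent() is my own placeholder, not something from cURL itself, and the error handling is just one reasonable choice.

```php
<?php
// Hypothetical helper: fetch a URL while sending a browser-like User-Agent.
function fetch_with_user_agent($url, $userAgent)
{
    $curl = curl_init($url);

    // Send the given string in the User-Agent request header.
    curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);

    // Return the response body as a string instead of printing it.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    $body = curl_exec($curl); // false on failure
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($curl));
    }

    return $body;
}
```

Calling fetch_with_user_agent() with a URL and the Firefox string above should return the page body as a string, or throw if the request cannot be completed.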
You can also randomize the user agent if you are making continuous requests to a particular URL/service, as follows:
    $agents = array(
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1',
        'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100508 SeaMonkey/2.0.4',
        'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)',
        'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; da-dk) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1'
    );
    // pick a random user agent for this request
    curl_setopt($curl, CURLOPT_USERAGENT, $agents[array_rand($agents)]);
You can add as many user agents as you want. array_rand() returns a random key of the array, so a randomly chosen string is sent to the server with each cURL request.
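If you rotate agents in several places, the random pick can be wrapped in a small function. This is only a sketch: pick_user_agent() is a name I made up, and the empty-pool check is my own addition, since array_rand() does not accept an empty array.

```php
<?php
// Hypothetical helper: pick one user agent string at random from a pool.
function pick_user_agent(array $agents)
{
    if (count($agents) === 0) {
        throw new InvalidArgumentException('The user agent pool is empty.');
    }

    // array_rand() returns a random key, which we use to look up the string.
    return $agents[array_rand($agents)];
}

$agents = array(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100508 SeaMonkey/2.0.4',
);

$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, pick_user_agent($agents));
```

Call pick_user_agent() again before each new request so that every request gets a fresh pick.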
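There are also two ways to set PHP's default user agent: the ini_set() function or an .htaccess file. Note that the user_agent setting is used by PHP's own HTTP stream wrapper (for example, file_get_contents() and fopen() on URLs). cURL does not read it, so for cURL requests you still need CURLOPT_USERAGENT.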
The PHP way:
    ini_set('user_agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1');
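The .htaccess way (this works only when PHP runs as an Apache module, and the value must be quoted because it contains spaces):
    php_value user_agent "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9"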
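Because the user_agent setting belongs to PHP's HTTP stream wrapper rather than to cURL, one way to keep both in sync is to read the configured value back and hand it to cURL explicitly. A small sketch, using the Firefox string from above as an example value:

```php
<?php
// Set the default user agent used by PHP's HTTP stream wrapper
// (file_get_contents(), fopen() on URLs).
ini_set('user_agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1');

// Read the configured value back, wherever it was set (ini_set, php.ini or .htaccess).
$defaultAgent = ini_get('user_agent');

// cURL does not pick this setting up on its own, so pass it on explicitly.
$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, $defaultAgent);
```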
Now that we have set the user agent for the cURL request, it’s time to set the headers so the request looks completely legit.
    // set the header params
    $header = array();
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // an empty value tells cURL not to send this header

    // assign to the curl request
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
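Putting the pieces together, the user agent and the headers can be collected in one options array and applied with curl_setopt_array(). This is a rough sketch rather than a definitive recipe: browser_like_options() and fetch_like_browser() are hypothetical names of mine, and the header list is a shortened version of the one above.

```php
<?php
// Hypothetical helper: collect the options discussed in this post in one array.
function browser_like_options($url, $userAgent)
{
    $header = array(
        'Accept-Language: en-us,en;q=0.5',
        'Cache-Control: max-age=0',
        'Connection: keep-alive',
    );

    return array(
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HTTPHEADER     => $header,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
    );
}

// Hypothetical helper: run the request and return the body, or false on failure.
function fetch_like_browser($url, $userAgent)
{
    $curl = curl_init();
    curl_setopt_array($curl, browser_like_options($url, $userAgent));

    return curl_exec($curl);
}
```

Keeping the options in a plain array makes it easy to inspect or tweak them before any request is actually sent.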
Now you are all set to execute your cURL requests and have fun.
Stay Digified!!
Sachin Khosla