SEM Labs

Handcrafted Pixels, Code & Title Tags

PHP cURL Class With Multi-Threading

This is an object-oriented wrapper for PHP's cURL support, designed to cut down on the amount of boilerplate required to deal with cURL in PHP. It supports multi-threading and has a built-in retry facility that will try to re-download a URL a given number of times if it receives an HTTP status code greater than or equal to 400.

All of cURL's facilities have been implemented in the wrapper. They are documented using the Javadoc format, so they should be pretty easy to use for anyone familiar with cURL. As standard, you will need to set CURLOPT_RETURNTRANSFER to true in order to store the results in a variable. Here are a couple of examples to get you going:

Download A URL

In the above example, cURL will attempt to download yahoo.com twice. The result will be stored in an array. The clear method clears the cURL wrapper of its sessions, allowing you to make fresh connections.
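The retry behaviour described above can be sketched with PHP's plain cURL functions. This is not the wrapper's own code or API: curlTransport, fetchWithRetry and the $transport hook are hypothetical names made up for illustration, and treating a status code of 0 (no HTTP response at all) as a failure is an added assumption.

```php
<?php
/**
 * Performs one cURL transfer and returns array( HTTP status code, body ).
 * Plain PHP cURL, not the wrapper class described in this post.
 */
function curlTransport( $url, array $opts )
{
	$ch = curl_init( $url );
	// "+" keeps the integer CURLOPT_* keys intact (array_merge would renumber them).
	curl_setopt_array( $ch, $opts + array( CURLOPT_RETURNTRANSFER => true ) );
	$body = curl_exec( $ch );
	$code = (int) curl_getinfo( $ch, CURLINFO_HTTP_CODE );
	return array( $code, $body === false ? '' : $body );
}

/**
 * Tries a URL up to $attempts times, retrying while the status code is
 * 400 or above, or 0 (assumption: 0 means no HTTP response was received).
 * $transport is a hypothetical hook so the retry logic can be exercised
 * without a network connection.
 */
function fetchWithRetry( $url, $attempts = 2, array $opts = array(), $transport = 'curlTransport' )
{
	$code = 0;
	$body = '';
	for ( $try = 1; $try <= $attempts; $try++ )
	{
		list( $code, $body ) = call_user_func( $transport, $url, $opts );
		if ( $code > 0 && $code < 400 )
		{
			break; // success, no further attempts needed
		}
	}
	return array( 'code' => $code, 'body' => $body, 'tries' => min( $try, $attempts ) );
}

// Usage (needs network access):
// $result = fetchWithRetry( 'http://www.yahoo.com/', 2 );
```

Passing the transport in as a callable is only a convenience for testing; in everyday use the default is enough.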

Download Multiple URLs

In this example, cURL will attempt to download three URLs at once (multi-threading) and return the results in an array.
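For readers who want to see what happens underneath, here is a minimal sketch of a concurrent download using PHP's own curl_multi functions. It is not the wrapper's code; fetchAll is a hypothetical name and the example URLs are placeholders.

```php
<?php
/**
 * Downloads several URLs concurrently with PHP's curl_multi functions and
 * returns the bodies keyed like the input array.
 * Plain PHP cURL, not the wrapper class described in this post.
 */
function fetchAll( array $urls, array $opts = array() )
{
	$results = array();
	if ( !$urls )
	{
		return $results;
	}

	$mh      = curl_multi_init();
	$handles = array();
	foreach ( $urls as $key => $url )
	{
		$ch = curl_init( $url );
		curl_setopt_array( $ch, $opts + array( CURLOPT_RETURNTRANSFER => true ) );
		curl_multi_add_handle( $mh, $ch );
		$handles[$key] = $ch;
	}

	// Drive all transfers until none are still active.
	do
	{
		$status = curl_multi_exec( $mh, $active );
		if ( $active )
		{
			// Wait (up to 1 second) for activity on any connection.
			curl_multi_select( $mh, 1.0 );
		}
	}
	while ( $active && $status == CURLM_OK );

	// Collect each body, then detach the handle from the multi handle.
	foreach ( $handles as $key => $ch )
	{
		$results[$key] = curl_multi_getcontent( $ch );
		curl_multi_remove_handle( $mh, $ch );
	}
	curl_multi_close( $mh );

	return $results;
}

// Usage (needs network access):
// $pages = fetchAll( array( 'http://www.example.com/', 'http://www.example.org/', 'http://www.example.net/' ) );
```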

This class is available under the MIT License.

Comments

nTommy Replied at 10:15 PM on 27 Jan 2009

Class looks real good. I will implement it into my eCommerce payment gateway when I get back to work.

David Replied at 10:17 PM on 27 Jan 2009

Will be good to see the new eCommerce framework.

Arshelbic Replied at 7:43 PM on 8 Mar 2009

Thank You! This was a huge help to me. The clean() function was a life saver. I was looking at another lib and couldn't figure out how to remove handles individually. It seems multithreading pages actually is taking me longer. Not sure why yet. I think it may have to do with strange characters, some are utf-8. Once that is sorted out, I think it will be much faster. Thanks again, awesome contribution.

David Replied at 1:05 AM on 9 Mar 2009

Hi, glad it was of help. They certainly didn't make multi-threading very easy to do in PHP. I am not sure why multi-threading would take you longer. I find an optimum number of URLs to download at once is 50 on my 256kbps line. Here is something that may help you with your UTF-8 issues:


// Browser-like request headers.
$header	= array( 
	"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
	"Accept-Language: en-gb,en;q=0.5",
	"Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
	"Keep-Alive: 300",
	"Connection: keep-alive"
);

$opts = array(
	CURLOPT_HTTPHEADER	=> $header,
	// Sends the Accept-Encoding header and makes cURL decompress the response.
	// A hand-written Accept-Encoding header would leave the body compressed.
	CURLOPT_ENCODING	=> "gzip,deflate",
	CURLOPT_USERAGENT	=> "Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.8.1.14) Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14",
	// Certificate checks are switched off here; only do this for hosts you trust.
	CURLOPT_SSL_VERIFYHOST	=> false,
	CURLOPT_SSL_VERIFYPEER	=> false,
	CURLOPT_FOLLOWLOCATION	=> true,
	CURLOPT_RETURNTRANSFER	=> true,
	CURLOPT_TIMEOUT		=> 15
);

The above is a fairly standard set of cURL opts that I might use. It basically emulates a standard set of browser HTTP headers, so it may stop any encoding issues.

Arshelbic Replied at 8:16 AM on 9 Mar 2009

I don't know why either, spent the whole day digging into UTF and Unicode. My project was scraping images from Flickr, without using their API. I wanted to also get the titles, descriptions, and tags. Flickr allows Unicode chars, I'm pretty certain that was part of the issue. My script would stall out on some pages, and then go pretty quick on others. I was only using 24 threads at a time. I finally gave up on that script and went to the API. I'm still using your wrapper for my project. I'll let you know how it goes today. Thanks again.

David Replied at 2:26 PM on 9 Mar 2009

Unicode is not handled too well by PHP5. It is handled properly by PHP6 though, but it's not been documented yet in the manual. Other things you could try are these:


setlocale( LC_CTYPE, 'en_GB.utf8' );

This sets the character-type locale to UTF-8, which affects PHP's locale-aware string functions.

Alternatively, you may have been having issues because you were outputting stuff to the browser without declaring a character set. If so, use this at the top of the file:


header('Content-Type: text/html; charset=utf-8'); 
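If the downloaded pages themselves arrive in mixed encodings, another option is to normalise each body to UTF-8 before storing or printing it. The following is only a sketch: toUtf8 is a hypothetical helper, it assumes the iconv extension is available, and falling back to ISO-8859-1 when the server names no charset is an assumption rather than a rule.

```php
<?php
/**
 * Converts a downloaded body to UTF-8 using the charset named in the
 * response's Content-Type header.
 */
function toUtf8( $body, $contentType, $fallback = 'ISO-8859-1' )
{
	// Pull "charset=..." out of e.g. "text/html; charset=ISO-8859-1".
	$charset = preg_match( '/charset\s*=\s*"?([\w.:-]+)/i', (string) $contentType, $m )
		? strtoupper( $m[1] )
		: $fallback;

	if ( $charset === 'UTF-8' || $charset === 'UTF8' )
	{
		return $body; // already UTF-8, nothing to do
	}

	// iconv() returns false for unknown charsets or invalid input;
	// in that case the body is handed back unchanged.
	$converted = @iconv( $charset, 'UTF-8', $body );
	return $converted === false ? $body : $converted;
}
```

With plain cURL, the Content-Type value can be read after a transfer with curl_getinfo( $ch, CURLINFO_CONTENT_TYPE ).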

Arshelbic Replied at 9:54 AM on 9 Mar 2009

Very helpful, thanks. I read up on Unicode at the php.net site, I considered upgrading to PHP6. Not ready for that headache yet. : )

I'm going through the Flickr API now, it's blazing fast, and I found a wrapper that makes it silly simple (phpFlickr). I'm getting about 100 medium images in under 30 seconds using your multicurl, now I need to try grabbing the amplifying data and store it in a DB. Using 100 threads (requests) doesn't even seem to faze my localhost, I'll push it up to the Flickr limit of 500 and just see what happens. All I'm doing is grabbing the jpg files, multicurl puts them into arrays and I use file_put_contents($path,$arrayItem) to write the jpg file.

I can't tell you how much time you probably saved me with your class! Thanks.

Neor Replied at 9:47 AM on 12 Jun 2009

Nice class, nothing to improve

Jason Replied at 12:47 PM on 29 Jun 2009

This class is great - just what I need (multi-handling is a real pain)!

Do you have any thoughts on how to enhance this to allow someone to use it in a continual fashion?

Example: Say I have 500 URLs to obtain, and I want to run 10 download 'slots'. Then, when a slot finishes, it adds the next URL to that position. The idea being that you are always active, but have no more than 10 (or whatever) concurrent downloads at once.

David Replied at 9:59 PM on 29 Jun 2009

See this other class: .

Anton Replied at 7:20 AM on 3 Jul 2009

Have a look at Multiplatform PHP Multithreading engine http://anton.vedeshin.com/articles/lightweight-and-multiplatform-php-multithreading-engine

Keny Replied at 9:04 PM on 16 Aug 2009

Thank you very, very much. I will use this class on my website!

BTW, I ran some tests to compare my old version with the new version that uses the class.

For 3 pages, here are the script loading times in seconds:

old      new
3.235    1.167
3.478    1.142
3.014    1.186
2.791    1.282
2.976    1.115

That's almost 3 times faster!

Did I say thanks?

Vincent Replied at 9:09 AM on 3 Oct 2009

Anyone having CPU problems when some files need time to load ? My CPU is at 100% most of the time...

Could adding a usleep( 100000 ) here help a bit?

while ( $active && $mrc == CURLM_OK )
{
	if ( curl_multi_select( $mh ) != -1 )
	{
		do
		{
			$mrc = curl_multi_exec( $mh, $active );
			usleep( 100000 );
		}
		while ( $mrc == CURLM_CALL_MULTI_PERFORM );
	}
}
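A variation worth trying, sketched here with plain curl_multi calls: in the loop above, nothing waits when curl_multi_select() returns -1, so the outer loop can spin at full speed. Sleeping only in that case keeps the CPU idle without delaying every step of every transfer. Whether this is the cause of the 100% CPU reported here is an assumption; driveMulti is a hypothetical name and the back-off interval is arbitrary.

```php
<?php
/**
 * Runs every transfer on a curl_multi handle to completion, sleeping
 * briefly only when curl_multi_select() reports a failure (-1).
 * Returns the last curl_multi_exec() status code.
 */
function driveMulti( $mh, $backoffMicroseconds = 100000 )
{
	// Kick off the transfers.
	do
	{
		$mrc = curl_multi_exec( $mh, $active );
	}
	while ( $mrc == CURLM_CALL_MULTI_PERFORM );

	while ( $active && $mrc == CURLM_OK )
	{
		// select() normally blocks until a connection has activity, so the CPU idles.
		// A return value of -1 means it came back at once; pause instead of spinning.
		if ( curl_multi_select( $mh ) == -1 )
		{
			usleep( $backoffMicroseconds );
		}

		do
		{
			$mrc = curl_multi_exec( $mh, $active );
		}
		while ( $mrc == CURLM_CALL_MULTI_PERFORM );
	}

	return $mrc;
}
```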

Nikko Replied at 9:58 PM on 15 Nov 2009

First of all, thank you for providing a nice class, I'm thinking of incorporating it to a project I have at work.

Just a suggestion, it might be good to implement a singleton pattern for this class, just to ensure that it won't be instantiated more than once when used.

"Example: Say I have 500 URLs to obtain, and I want to run 10 download 'slots'. Then, when a slot finishes, it adds the next URL to that position. The idea being that you are always active, but have no more than 10 (or whatever) concurrent downloads at once."

I think this can be applied by creating a sort of callback function to the exec method.
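A rough sketch of that idea using plain curl_multi calls rather than the class's exec method: a fixed number of slots, a queue of URLs, and a callback fired as each download completes. rollingFetch and the callback signature are hypothetical, chosen only for illustration.

```php
<?php
/**
 * Downloads $urls with at most $slots transfers in flight, refilling a
 * slot as soon as its transfer finishes. $onDone( $url, $body, $httpCode )
 * is called once per completed URL. Returns the number of completed URLs.
 */
function rollingFetch( array $urls, $slots, $onDone, array $opts = array() )
{
	if ( $slots < 1 )
	{
		throw new InvalidArgumentException( 'slots must be at least 1' );
	}
	$queue = array_values( $urls );
	if ( !$queue )
	{
		return 0;
	}

	$mh       = curl_multi_init();
	$inFlight = 0;
	$done     = 0;

	// Starts the next queued URL, if there is one.
	$startNext = function () use ( &$queue, &$inFlight, $mh, $opts )
	{
		if ( !$queue )
		{
			return;
		}
		$url = array_shift( $queue );
		$ch  = curl_init( $url );
		// CURLOPT_PRIVATE remembers the original URL on the handle itself.
		curl_setopt_array( $ch, $opts + array( CURLOPT_RETURNTRANSFER => true, CURLOPT_PRIVATE => $url ) );
		curl_multi_add_handle( $mh, $ch );
		$inFlight++;
	};

	// Fill the initial slots.
	while ( $inFlight < $slots && $queue )
	{
		$startNext();
	}

	while ( $inFlight > 0 )
	{
		curl_multi_exec( $mh, $active );

		// Hand over finished transfers and reuse their slots.
		while ( $info = curl_multi_info_read( $mh ) )
		{
			$ch   = $info['handle'];
			$url  = curl_getinfo( $ch, CURLINFO_PRIVATE );
			$code = (int) curl_getinfo( $ch, CURLINFO_HTTP_CODE );
			call_user_func( $onDone, $url, curl_multi_getcontent( $ch ), $code );
			curl_multi_remove_handle( $mh, $ch );
			$inFlight--;
			$done++;
			$startNext();
		}

		if ( $inFlight > 0 )
		{
			curl_multi_select( $mh, 1.0 ); // wait for activity, at most 1 second
		}
	}

	curl_multi_close( $mh );
	return $done;
}

// Usage (needs network access): 500 URLs, never more than 10 at once.
// rollingFetch( $urls, 10, function ( $url, $body, $code ) { echo "$code $url\n"; } );
```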

David Replied at 5:09 PM on 16 Nov 2009

Hi, you don't need to build singletons into classes. You can have a singleton class that you can use for any class you want. I use this: Singleton Class. Then all you do is:

$curl = Sing::get( 'CURL' );

This will either create a new CURL singleton or get the existing object if it exists. You can also expand that class to allow you to use multiple singletons of the same class by using a prefix.
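The linked class is not reproduced here, so the following is only a guess at what such a helper could look like: a small registry that builds objects through Reflection and hands back the same instance on later calls. The method body, the $prefix and $args parameters, and the key format are all assumptions; only the Sing::get( 'CURL' ) call comes from the comment above.

```php
<?php
/**
 * Hypothetical stand-in for the Sing class mentioned above: keeps one
 * shared instance per class name (and optional prefix).
 */
class Sing
{
	/** @var array shared instances, keyed by "prefix:ClassName" */
	private static $instances = array();

	public static function get( $class, $prefix = '', array $args = array() )
	{
		$key = $prefix . ':' . $class;
		if ( !isset( self::$instances[$key] ) )
		{
			// Reflection lets any class be built, with or without constructor arguments.
			$reflection            = new ReflectionClass( $class );
			self::$instances[$key] = $reflection->newInstanceArgs( $args );
		}
		return self::$instances[$key];
	}
}

// The same object comes back on every call with the same class and prefix.
$first  = Sing::get( 'ArrayObject' );
$second = Sing::get( 'ArrayObject' );
var_dump( $first === $second );
```

A different prefix yields a separate shared instance of the same class, which is one way to read the "multiple singletons" remark.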

Nikko Replied at 6:54 PM on 16 Nov 2009

Great way of using Reflection and the Singleton pattern. :)

Jez Replied at 3:22 PM on 15 Jun 2010

Hi David,

I was just Googling "how to thread curl php" to learn how to do this and you came up #1. I had not realised you had posted this class ... thanks ;-)

Derik Replied at 1:43 AM on 26 Aug 2010

Thanks for this. I'm currently working on a project where product details get pulled from external repositories in XML format, and on one staging server we have a really slow connection, causing cURL to time out without results. Your retry method solved this issue for me so again, kudos!

WNK Replied at 3:12 PM on 16 Jan 2011

Thanks for this PHP class... I've been using it for a while now (basic use, mainly for fetching HTML data), coupled with the regexp functions.

Now, I'm using this technique for this: http://www.appbrain.com/app/simple-daily-horoscope-(beta)/com.wk.horoscope

darren Replied at 10:30 PM on 18 Jan 2011

Hi,

Thanks for such a great script...

I only have one problem when it's running in multi-mode.

foreach ($report_files as $report_file)
{
	$curl->addSession(SERVER . "{$rel_path}{$report_file}", $opts);
	$result[$report][] = $curl->exec();
}

It appears it doesn't finish actioning each completed request until it's cleared.

Since I am writing out CSV files and the last PHP file in the loop reads a base CSV file, it fails because that file hasn't been written yet.

The only way I found to overcome this was to modify your function execSingle and add $this->sessions = array(); right before its return.

I couldn't see any other way around this?

Thanks.

galileo Replied at 7:16 AM on 11 Feb 2011

This isn't actually multithreading; it's just a bunch of sockets. curl multi creates the requested number of sockets and checks their status in a while loop. If a socket responds with a changed status, curl reads the socket, etc.

You can see how this works in 'pure' php at:

http://code.google.com/p/phpsocketdaemon/

Ejz Replied at 8:41 AM on 12 Sep 2011

You can simply use the shell command "wget" with the -b flag (meaning background) if you have problems with cURL.

Max Replied at 11:55 AM on 9 Nov 2011

Wow, this class turned out to be extremely helpful and saved me a lot of time. I'm now using it to check my clients' websites and landing pages against different criteria (PPC campaigns, where the target is to improve AdWords quality score), which gives really valuable insights. It wouldn't be possible at all to do this manually... Thanks again! :)

Andree Christaldi Replied at 11:24 AM on 5 Mar 2013

Thanks for making this available. I found a few other scripts, but the code was horrendous.
