PHP cURL Class With Multi-Threading
This is an object-oriented wrapper for PHP's cURL support, designed to cut down on the amount of bloat that is required to deal with cURL in PHP. It supports multi-threading and has a built-in retry facility that will try to re-download a URL a given number of times if it receives an HTTP status code greater than or equal to 400.
All of cURL's facilities have been implemented in the wrapper. They are documented using the Javadoc format, so it should be pretty easy to use for anyone familiar with cURL. As standard, you will need to set CURLOPT_RETURNTRANSFER to true in order to store the results in a variable. Here are a couple of examples to get you going:
Download A URL
In the above example, cURL will attempt to download yahoo.com twice. The result will be stored in an array. The clear method clears the cURL wrapper of its sessions, allowing you to make fresh connections.
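To make the retry behaviour concrete, here is a rough sketch using PHP's plain cURL functions rather than the wrapper itself. The names `fetchOnce` and `fetchWithRetry`, the `$fetch` parameter, and the choice to treat a status of 0 (no response at all) as a failure are assumptions made for illustration; the wrapper's own method names may differ.

```php
<?php
// Sketch only: plain cURL functions, not the wrapper's API.
// fetchOnce() and fetchWithRetry() are hypothetical names.

/**
 * Performs a single GET request.
 * @return array [int $httpStatus, string $body]
 */
function fetchOnce(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return [$status, $body === false ? '' : $body];
}

/**
 * Calls $fetch up to $maxAttempts times, stopping at the first response
 * whose status is below 400. A status of 0 (assumed to mean "no response")
 * also counts as a failure.
 */
function fetchWithRetry(string $url, int $maxAttempts = 2, ?callable $fetch = null): array
{
    $fetch  = $fetch ?? 'fetchOnce';
    $status = 0;
    $body   = '';
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        [$status, $body] = $fetch($url);
        if ($status > 0 && $status < 400) {
            break; // success, no further attempts needed
        }
    }
    return [
        'status'   => $status,
        'body'     => $body,
        'attempts' => min($attempt, $maxAttempts),
    ];
}

// Usage: $result = fetchWithRetry('http://www.yahoo.com/', 2);
```

Passing the fetcher in as a callable keeps the retry loop separate from the network call, so the loop can be exercised with a stub.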
Download Multiple URLs
In this example, cURL will attempt to download three URLs at once (multi-threading) and return the results in an array.
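For readers who want to see what happens underneath, the following is a minimal sketch of a concurrent download built directly on PHP's `curl_multi_*` functions. It is not the wrapper's code; `fetchAll` is a hypothetical name, and the one-second select timeout is an arbitrary choice.

```php
<?php
// Sketch only: plain curl_multi_* functions, not the wrapper's API.
// fetchAll() is a hypothetical name.

/**
 * Downloads several URLs concurrently.
 * @param string[] $urls
 * @return array response bodies keyed by URL
 */
function fetchAll(array $urls): array
{
    $mh      = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body in memory
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh, 1.0); // wait for activity, at most one second
        }
    } while ($active && $status === CURLM_OK);

    // Collect each body and detach the handle from the multi handle.
    $results = [];
    foreach ($handles as $url => $ch) {
        $results[$url] = (string) curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}

// Usage: $pages = fetchAll(['http://www.example.com/', 'http://www.example.org/']);
```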
This class is available under the MIT License.
Comments
Class looks real good. I will implement it into my eCommerce payment gateway when I get back to work.
It will be good to see the new eCommerce framework.
Thank You! This was a huge help to me. The clean() function was a life saver. I was looking at another lib and couldn't figure out how to remove handles individually. It seems multithreading pages actually is taking me longer. Not sure why yet. I think it may have to do with strange characters, some are utf-8. Once that is sorted out, I think it will be much faster. Thanks again, awesome contribution.
Hi, glad it was of help. They certainly didn't make multi-threading very easy to do in PHP. I am not sure why multi-threading would take you longer. I find an optimum number of URLs to download at once is 50 on my 256kbps line. Here is something that may help you with your UTF-8 issues:
The above is a fairly standard set of cURL opts that I might use. It basically emulates a standard set of browser HTTP headers, so it may stop any encoding issues.
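A set of options in that spirit might look like the sketch below. The user agent string and header values are illustrative assumptions, not necessarily the ones used in the reply above.

```php
<?php
// Illustrative values only; the user agent and header strings are assumptions.
$browserLikeOptions = [
    CURLOPT_RETURNTRANSFER => true,  // return the body as a string
    CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
    CURLOPT_ENCODING       => '',    // empty string: accept every encoding libcurl supports
    CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    CURLOPT_HTTPHEADER     => [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
        'Accept-Charset: utf-8;q=1.0,*;q=0.5',
    ],
];

// Apply them to a handle in one call (no request is sent yet).
$ch      = curl_init('http://www.example.com/');
$applied = curl_setopt_array($ch, $browserLikeOptions);
```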
I don't know why either, spent the whole day digging into UTF and Unicode. My project was scraping images from Flickr, without using their API. I wanted to also get the titles, descriptions, and tags. Flickr allows Unicode chars, I'm pretty certain that was part of the issue. My script would stall out on some pages, and then go pretty quick on others. I was only using 24 threads at a time. I finally gave up on that script and went to the API. I'm still using your wrapper for my project. I'll let you know how it goes today. Thanks again.
Unicode is not handled too well by PHP5. It is handled properly by PHP6, but it's not been documented yet in the manual. Other things you could try are these:
This tells PHP to use UTF-8 for everything.
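Settings of this kind might look like the sketch below. It assumes the mbstring extension is loaded; exactly which calls the reply above had in mind is not known.

```php
<?php
// Assumes the mbstring extension is available.
ini_set('default_charset', 'UTF-8'); // default charset PHP uses and advertises
mb_internal_encoding('UTF-8');       // encoding assumed by the mb_* string functions
mb_regex_encoding('UTF-8');          // encoding used by the mb_ereg* functions

// With these set, mb_* functions count characters rather than bytes.
$characters = mb_strlen('Flickr ☺');
```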
Alternatively, you may have been having issues because you were outputting stuff to the browser without declaring a charset. If so, use this at the top of the file:
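One common way to declare the charset is to send a Content-Type header before any output, roughly as sketched here. The helper name `declareUtf8` is hypothetical, and the `headers_sent()` guard is an addition so the call is harmless if output has already started.

```php
<?php
// declareUtf8() is a hypothetical helper name.
// Returns true if the header was set, false if output had already started.
function declareUtf8(): bool
{
    if (headers_sent()) {
        return false; // too late: headers can only be set before any output
    }
    header('Content-Type: text/html; charset=utf-8');
    return true;
}

declareUtf8();
```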
Very helpful, thanks. I read up on Unicode at the php.net site and considered upgrading to PHP6. Not ready for that headache yet. : )
I'm going through the Flickr API now. It's blazing fast, and I found a wrapper that makes it silly simple (phpFlickr). I'm getting about 100 medium images in under 30 seconds using your multicurl; now I need to try grabbing the amplifying data and storing it in a DB. Using 100 threads (requests) doesn't even seem to faze my localhost, so I'll push it up to the Flickr limit of 500 and just see what happens. All I'm doing is grabbing the jpg files; multicurl puts them into arrays and I use file_put_contents($path,$arrayItem) to write the jpg file.
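The save step described above could be sketched as follows. `saveBodies` is a hypothetical helper, and the assumption that the array is keyed by file name is mine, not something the wrapper guarantees.

```php
<?php
// saveBodies() is a hypothetical helper; $bodies stands in for the array of downloaded data.

/**
 * Writes each downloaded body into $dir, named after its array key.
 * @param array  $bodies file name => raw bytes
 * @return string[] paths that were written
 */
function saveBodies(array $bodies, string $dir): array
{
    $written = [];
    foreach ($bodies as $name => $data) {
        if ($data === '') {
            continue; // skip failed or empty downloads
        }
        // basename() keeps a key such as "../x.jpg" from escaping $dir.
        $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . basename((string) $name);
        if (file_put_contents($path, $data) !== false) {
            $written[] = $path;
        }
    }
    return $written;
}

// Usage: saveBodies(['photo1.jpg' => $jpegBytes], '/path/to/images');
```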
I can't tell you how much time you probably saved me with your class! Thanks.
Nice class, nothing to improve
This class is great - just what I need (multi-handling is a real pain)!
Do you have any thoughts on how to enhance this to allow someone to use it in a continual fashion?
Example: Say I have 500 URLs to obtain, and I want to run 10 download 'slots'. Then, when a slot finishes, it adds the next URL to that position. The idea being that you are always active, but have no more than 10 (or whatever) concurrent downloads at once.
See this other class: .
Have a look at Multiplatform PHP Multithreading engine http://anton.vedeshin.com/articles/lightweight-and-multiplatform-php-multithreading-engine
Thank you very, very much. I will use that class in my website!!
BTW, I have done some tests to compare my old version with the new version that uses the class.
For 3 pages, here is the script loading time in seconds:
old     new
3.235   1.167
3.478   1.142
3.014   1.186
2.791   1.282
2.976   1.115
That's roughly 2.6 times faster!
Did I say thanks?
Is anyone having CPU problems when some files need time to load? My CPU is at 100% most of the time...
Could adding a usleep(100000) here help a bit?
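while ( $active && $mrc == CURLM_OK )
{
    if ( curl_multi_select( $mh ) != -1 )
    {
        // Braces are required here: a do...while body with two statements is a syntax error without them.
        do
        {
            $mrc = curl_multi_exec( $mh, $active );
            usleep( 100000 ); // pause 100 ms between calls
        }
        while ( $mrc == CURLM_CALL_MULTI_PERFORM );
    }
}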
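One possible cause of the 100% CPU, offered as a guess rather than a diagnosis: on some setups `curl_multi_select()` returns -1 straight away instead of waiting, so a loop like the one above never pauses. A variant that sleeps only in that case might look like this sketch. `runTransfers` is a hypothetical name and the timings are arbitrary.

```php
<?php
// Sketch with plain curl_multi_* calls; runTransfers() is a hypothetical name.

/**
 * Runs every transfer attached to $mh to completion.
 * @return int the last status code from curl_multi_exec()
 */
function runTransfers($mh): int
{
    do {
        $mrc = curl_multi_exec($mh, $active);
        if ($active && $mrc === CURLM_OK) {
            // Wait up to one second for socket activity. A result of -1 means
            // select could not wait, so sleep briefly instead of spinning.
            if (curl_multi_select($mh, 1.0) === -1) {
                usleep(100000); // 100 ms
            }
        }
    } while ($active && $mrc === CURLM_OK);

    return $mrc;
}
```

Sleeping only when select fails keeps normal transfers responsive, whereas an unconditional `usleep` after every `curl_multi_exec` call adds a fixed delay to each pass.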
First of all, thank you for providing a nice class. I'm thinking of incorporating it into a project I have at work.
Just a suggestion, it might be good to implement a singleton pattern for this class, just to ensure that it won't be instantiated more than once when used.
"Example: Say I have 500 URLs to obtain, and I want to run 10 download 'slots'. Then, when a slot finishes, it adds the next URL to that position. The idea being that you are always active, but have no more than 10 (or whatever) concurrent downloads at once."
I think this can be done by adding a sort of callback function to the exec method.
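As a rough illustration of the slot idea combined with a callback, here is a sketch built on the plain `curl_multi_*` functions. It is not part of the class; `rollingFetch`, its parameters, and the callback signature are all hypothetical.

```php
<?php
// Sketch only: plain curl_multi_* functions; rollingFetch() and its parameters are hypothetical.

/**
 * Downloads $urls with at most $slots transfers in flight, calling
 * $onDone($url, $body, $httpCode) as each transfer finishes.
 */
function rollingFetch(array $urls, int $slots, callable $onDone): void
{
    $mh       = curl_multi_init();
    $queue    = array_values($urls);
    $inFlight = 0;

    // Starts the next queued URL, if any, in a free slot.
    $startNext = function () use (&$queue, &$inFlight, $mh): void {
        if ($queue === []) {
            return;
        }
        $url = array_shift($queue);
        $ch  = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_PRIVATE, $url); // remember which URL this handle is for
        curl_multi_add_handle($mh, $ch);
        $inFlight++;
    };

    for ($i = 0; $i < $slots; $i++) {
        $startNext();
    }

    while ($inFlight > 0) {
        if (curl_multi_exec($mh, $active) !== CURLM_OK) {
            break; // give up on a multi-level error
        }

        // Hand finished transfers to the callback and refill their slots.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $onDone(
                curl_getinfo($ch, CURLINFO_PRIVATE),
                (string) curl_multi_getcontent($ch),
                (int) curl_getinfo($ch, CURLINFO_HTTP_CODE)
            );
            curl_multi_remove_handle($mh, $ch);
            $inFlight--;
            $startNext();
        }

        // Wait for activity; back off briefly if select cannot wait.
        if ($active > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(100000);
        }
    }
    curl_multi_close($mh);
}

// Usage: rollingFetch($fiveHundredUrls, 10, function ($url, $body, $code) { /* store $body */ });
```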
Hi, you don't need to build singletons into classes. You can have a singleton class that you can use for any class you want. I use this: Singleton Class. Then all you do is:
$curl = Sing::get( 'CURL' );
This will either create a new CURL singleton or get the existing object if it exists. You can also expand that class to allow you to use multiple singletons of the same class by using a prefix.
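A registry along these lines might be sketched as follows. This is not the author's `Sing` class; the name `SingletonRegistry`, the `$prefix` and `$args` parameters, and the key format are assumptions for illustration.

```php
<?php
// Hypothetical sketch, not the author's Sing class.
final class SingletonRegistry
{
    /** @var array shared instances keyed by "prefix:class" */
    private static $instances = [];

    /**
     * Returns the shared instance of $class, creating it on first use.
     * A non-empty $prefix yields a separate shared instance of the same class.
     */
    public static function get(string $class, string $prefix = '', array $args = []): object
    {
        $key = $prefix . ':' . $class;
        if (!isset(self::$instances[$key])) {
            // Reflection lets the registry build any class by name.
            $reflection = new ReflectionClass($class);
            self::$instances[$key] = $reflection->newInstanceArgs($args);
        }
        return self::$instances[$key];
    }
}

// Usage with a built-in class standing in for the cURL wrapper:
$first  = SingletonRegistry::get('ArrayObject');
$again  = SingletonRegistry::get('ArrayObject');           // same object as $first
$second = SingletonRegistry::get('ArrayObject', 'second'); // separate object under a prefix
```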
Great way of using Reflection and the Singleton pattern. :)