This is a PHP class to quickly download single URLs and stacks of URLs using multiple threads. It uses the OO cURL wrapper.
One of the most common URI-related tasks is building stacks of URLs (multi-dimensional arrays) to be downloaded and stored. The main purpose of the class is to provide a simple way to do this.
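To give a rough idea of the approach, here is a minimal sketch that downloads a nested stack of URLs in parallel using PHP's built-in `curl_multi_*` functions. It is not the class itself: the function names `flatten_stack` and `download_stack`, the `group/key` result ids, and the 30 second timeout are all assumptions made for illustration, and it assumes PHP 8 with the cURL extension loaded.

```php
<?php
// Hypothetical helper: flatten a nested stack of URLs into "group/key" => url.
// A group may hold an array of URLs or a single URL string.
function flatten_stack(array $stack): array
{
    $jobs = [];
    foreach ($stack as $group => $urls) {
        foreach ((array) $urls as $key => $url) {
            $jobs["$group/$key"] = $url;
        }
    }
    return $jobs;
}

// Hypothetical helper: download every URL in the stack concurrently.
// Returns "group/key" => ['status' => int, 'body' => string|null].
function download_stack(array $stack): array
{
    $multi   = curl_multi_init();
    $handles = [];

    foreach (flatten_stack($stack) as $id => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // assumed timeout
        curl_multi_add_handle($multi, $ch);
        $handles[$id] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0); // wait for activity rather than busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $id => $ch) {
        $results[$id] = [
            'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'body'   => curl_multi_getcontent($ch),
        ];
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $results;
}
```

Calling `download_stack(['news' => ['a' => 'https://example.com/1', 'b' => 'https://example.com/2']])` would return results keyed `news/a` and `news/b`, so each downloaded page can be matched back to its place in the stack.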
In the process of making this site, I needed a facility to strip attributes from HTML elements. My first stop was the strip_tags page in the PHP manual. However, the functions on there were pretty poor and broke a lot. Google didn't provide any better results, so I ended up having to make one. The results are pretty good. After a few tweaks I ran over 10,000 tests on different web pages and didn't have any problems.
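For illustration, one way to strip attributes is to parse the markup with PHP's DOM extension and remove attributes node by node. This is a sketch of that technique, not the function described in the post: the name `strip_attributes`, the `$allowed` whitelist parameter, and the wrapper `<div>` are assumptions, and unbalanced input (for example a stray `</div>`) is not handled.

```php
<?php
// Hypothetical sketch: remove every attribute from every element in an HTML
// fragment, except attribute names listed in $allowed (lowercase, e.g. 'href').
function strip_attributes(string $html, array $allowed = []): string
{
    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup

    // The XML declaration hints UTF-8 to the parser; the wrapper <div> gives
    // us a known root element to read the cleaned fragment back from.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);

    foreach ($root->getElementsByTagName('*') as $element) {
        // Walk backwards: the attribute map shrinks as attributes are removed.
        for ($i = $element->attributes->length - 1; $i >= 0; $i--) {
            $name = $element->attributes->item($i)->nodeName;
            if (!in_array(strtolower($name), $allowed, true)) {
                $element->removeAttribute($name);
            }
        }
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

For example, `strip_attributes('<p class="x">Hi <a href="/a" onclick="evil()">link</a></p>', ['href'])` drops `class` and `onclick` while keeping the link target.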
A common task in SEO scripting and dealing with APIs is downloading paged data, i.e. page iteration. I created a class to make this task a bit easier about a year ago. It supports downloading paginated data that uses GET or POST to move the cursor on. To make sure it doesn't whir away when there is no data left, it has a callback function that is called after each URL has been downloaded. You can use this to run a regex or whatever on each page to make sure there is still data to be scraped.
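The callback idea can be sketched roughly as follows. This is not the class's real interface: `iterate_pages`, its parameters, and the `$maxPages` safety cap are hypothetical, and the `$fetch` callable stands in for the actual GET or POST request so the loop can be shown (and tried out) without a network.

```php
<?php
// Hypothetical sketch of page iteration with a "still data left?" callback.
//   $fetch   : callable(int $page): string   performs the GET/POST for one page
//   $hasMore : callable(string $body, int $page): bool   called after each download
function iterate_pages(callable $fetch, callable $hasMore, int $start = 1, int $maxPages = 100): array
{
    $pages = [];
    for ($page = $start; $page < $start + $maxPages; $page++) {
        $body = $fetch($page);

        // Stop as soon as the callback reports that the page holds no data.
        if (!$hasMore($body, $page)) {
            break;
        }
        $pages[$page] = $body;
    }
    return $pages;
}

// Usage with a fake fetcher: three pages of results, then an empty page.
$fakeFetch = fn(int $page): string => $page <= 3 ? "<li>item $page</li>" : '<p>No results</p>';
$pages = iterate_pages($fakeFetch, fn(string $body): bool => preg_match('/<li>/', $body) === 1);
```

In real use, `$fetch` would build the paged URL (or POST body) and download it, while the second callable runs whatever check suits the target site. The `$maxPages` cap is an extra guard in case the check never fails.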
This is an object-oriented wrapper for PHP's cURL support, designed to cut down on the amount of boilerplate that is required to deal with cURL in PHP. It supports multi-threading and has a built-in retry facility that will try to re-download a URL a given number of times if it receives an HTTP status code of 400 or higher.
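The retry behaviour might look something like the sketch below. It is a plain-function illustration, not the wrapper's actual API: `simple_get`, `fetch_with_retry`, the `$maxRetries` default, and the optional `$request` parameter (which lets a fake request be swapped in) are all assumptions. Note that a transport failure reports status 0 in this sketch and is therefore not retried.

```php
<?php
// Hypothetical helper: one plain cURL GET, returning the status code and body.
function simple_get(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30, // assumed timeout
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response was received

    return ['status' => $status, 'body' => $body === false ? '' : $body];
}

// Hypothetical sketch: re-download a URL up to $maxRetries extra times
// while the server answers with an HTTP status code of 400 or higher.
function fetch_with_retry(string $url, int $maxRetries = 3, ?callable $request = null): array
{
    $request  = $request ?? 'simple_get';
    $attempts = 0;

    do {
        $attempts++;
        $response = $request($url);
    } while ($response['status'] >= 400 && $attempts <= $maxRetries);

    $response['attempts'] = $attempts; // first try plus any retries
    return $response;
}
```

A real implementation would probably want a short pause between attempts and might treat some 4xx codes (such as 404) as final instead of retrying them.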