SEM Labs

Handcrafted Pixels, Code & Title Tags

This is a PHP class to quickly download single URLs and stacks of URLs using multiple threads. It uses the OO cURL wrapper.

One of the most common URL-related tasks is to create stacks of URLs (multi-dimensional arrays) to be downloaded and stored. The main purpose of the class is to provide a simple way to do this.

If you are using the Nested Set Model to store hierarchical data in SQL, this PHP function will help you convert that data from a flat array into a multi-dimensional array.
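To illustrate the idea, here is a minimal sketch of such a conversion. It is not the function from the post: the name `nestedSetToTree` and the row keys `name`, `lft`, `rgt`, and `children` are assumptions made for this example. It sorts rows by their left value and keeps a list of the right values of the ancestors that are still open.

```php
<?php
// Hypothetical row shape: ['name' => string, 'lft' => int, 'rgt' => int].
function nestedSetToTree(array $rows): array
{
    // Sorting by left value guarantees every parent is seen before its children.
    usort($rows, fn(array $a, array $b): int => $a['lft'] <=> $b['lft']);

    $tree   = [];
    $path   = [];       // $path[$depth] references the child list to append to at that depth
    $path[0] = &$tree;
    $rights = [];       // right values of the ancestors that are still open

    foreach ($rows as $row) {
        // Close every ancestor whose interval ends before this node begins.
        while ($rights !== [] && end($rights) < $row['lft']) {
            array_pop($rights);
        }

        $depth = count($rights);
        $row['children'] = [];
        $path[$depth][] = $row;

        // Point the next depth at the children list of the node just added.
        $lastIndex = count($path[$depth]) - 1;
        $path[$depth + 1] = &$path[$depth][$lastIndex]['children'];
        $rights[] = $row['rgt'];
    }

    return $tree;
}

// Example data (made up for illustration).
$flat = [
    ['name' => 'Electronics', 'lft' => 1, 'rgt' => 8],
    ['name' => 'Televisions', 'lft' => 2, 'rgt' => 5],
    ['name' => 'LCD',         'lft' => 3, 'rgt' => 4],
    ['name' => 'Radios',      'lft' => 6, 'rgt' => 7],
];
$tree = nestedSetToTree($flat);
```

With this data, `Electronics` ends up as the single root with `Televisions` and `Radios` as its children, and `LCD` nested under `Televisions`.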

A while back someone contacted me about their WordPress blog borking out on some of their posts. After a bit of poking about it became apparent that this was because WordPress doesn't allow high Unicode characters in the URL. At first, I thought this would just be a change to a line in .htaccess, but there are a couple of other things that need to be changed too.

In the process of making this site, I needed a facility to strip attributes from HTML elements. My first stop was the strip_tags page in the PHP manual. However, the functions on there were pretty poor and borked out a lot. Google didn't provide any better results, so I ended up having to make one. The results are pretty good. After a few tweaks I ran over 10,000 tests on different web pages and didn't have any problems.
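For a rough idea of what attribute stripping involves, here is a small sketch built on PHP's DOM extension. It is not the function described above, whose approach is not shown here; the name `stripAttributes` and the `$keep` whitelist parameter are assumptions for this example, and it assumes the input is a UTF-8 HTML fragment.

```php
<?php
// Hypothetical helper: removes every attribute except those named in $keep.
function stripAttributes(string $html, array $keep = []): string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration is a common trick to make loadHTML treat input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);

    foreach ($body->getElementsByTagName('*') as $element) {
        // Walk backwards because removing an attribute shrinks the attribute map.
        for ($i = $element->attributes->length - 1; $i >= 0; $i--) {
            $name = $element->attributes->item($i)->nodeName;
            if (!in_array(strtolower($name), $keep, true)) {
                $element->removeAttribute($name);
            }
        }
    }

    // Serialise only the fragment, not the html/body wrapper added for parsing.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

$clean = stripAttributes(
    '<p class="intro" style="color:red">Hi <a href="/a" onclick="x()">link</a></p>',
    ['href']
);
```

Here `$clean` keeps the `href` on the link and drops `class`, `style`, and `onclick`. A parser-based approach normalises the markup as a side effect, which may or may not be what you want.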

A common task in SEO scripting and dealing with APIs is downloading paged data, also known as page iteration. I created a class to make this task a bit easier about a year ago. It supports downloading paginated data that uses GET or POST to move the cursor on. To make sure it doesn't whir away when there is no data left, it has a callback function that is called after each URL has been downloaded. You can use this to run a regex or whatever on each page to make sure there is still data to be scraped.
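The callback idea can be sketched in a few lines. This is not the class itself: the function `iteratePages`, its parameters, and the fake page data are all made up for illustration. The downloader is passed in as a callable so the loop can be tried without touching the network; in real use it would wrap a cURL GET or POST request.

```php
<?php
// Hypothetical helper: fetches page 1, 2, 3, ... until the callback says stop.
function iteratePages(callable $fetch, callable $stillHasData, int $maxPages = 100): array
{
    $pages = [];

    // $maxPages is a safety net in case the callback never returns false.
    for ($page = 1; $page <= $maxPages; $page++) {
        $body = $fetch($page);

        // The callback runs after each download and decides whether to carry on.
        if (!$stillHasData($body)) {
            break;
        }

        $pages[$page] = $body;
    }

    return $pages;
}

// Stand-in for a real site: three pages of results, then nothing.
$fakeSite = [
    1 => '<li class="result">a</li>',
    2 => '<li class="result">b</li>',
    3 => '<li class="result">c</li>',
];
$fetch = fn(int $page): string => $fakeSite[$page] ?? '<p>No results</p>';

// Run a regex on each page to check there is still data to be scraped.
$stillHasData = fn(string $body): bool => preg_match('/<li class="result">/', $body) === 1;

$pages = iteratePages($fetch, $stillHasData);
```

The loop stops at the fourth request, because the regex no longer matches, and `$pages` holds the three pages that did contain results.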