While building this site, I needed a way to strip attributes from HTML elements. My first stop was the strip_tags page in the PHP manual. However, the functions posted there were pretty poor and broke a lot. Google didn’t turn up anything better, so I ended up writing my own. The results are pretty good: after a few tweaks I ran over 10,000 tests on different web pages and didn’t have any problems.

The class works on any form of XML markup, not just HTML. It doesn’t strip tags; for that you will need to use it in conjunction with strip_tags. It also picks up the malformed attribute forms that browsers still render, like name=value and name = "value". So you can make sure you’re not gonna get owned by progressive HTML injectors and XSS kiddies 🙂
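To give a feel for the loose attribute forms mentioned above, here is a minimal sketch of a pattern that matches quoted, single-quoted and unquoted values with optional whitespace around the equals sign. The pattern and the `$attributeString` variable are my own illustration, not the expression the class uses, and it only looks at the inside of a single tag.

```php
<?php
// Illustrative only: the attribute part of a tag, with sloppy spacing
// and an unquoted value, as a browser would still accept it.
$attributeString = 'href = "x.html" onclick=evil() class=\'c\'';

// name, optional whitespace, "=", optional whitespace, then a
// double-quoted, single-quoted or unquoted value.
$pattern = '/([^\s=\/<>"\']+)\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/';

preg_match_all($pattern, $attributeString, $matches, PREG_SET_ORDER);

// Collect just the attribute names that were found.
$names = array_map(function ($m) { return $m[1]; }, $matches);

print_r($names); // href, onclick, class
```

A real implementation also has to deal with attributes that have no value at all and with ">" characters inside quoted values, which this sketch leaves out.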

To keep it easy to use, I just bodged a required function that escapes strings for regular expressions into the top of the class. For some reason PCRE’s own escaping didn’t work properly for me. You may want to move this function elsewhere.
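One common cause of escaping trouble, which may or may not be what happened here, is that PHP's built-in `preg_quote` leaves the pattern delimiter alone unless you pass it as the second argument. A minimal sketch, with `$needle` as a made-up input:

```php
<?php
// Made-up input containing both a delimiter ("/") and a metacharacter (".").
$needle = 'data/src.type';

// Without the delimiter argument the "/" is left unescaped.
$withoutDelimiter = preg_quote($needle);      // data/src\.type

// Passing the delimiter escapes it as well.
$withDelimiter = preg_quote($needle, '/');    // data\/src\.type

// The fully escaped string is now safe inside a "/"-delimited pattern.
$found = preg_match('/^' . $withDelimiter . '$/', 'data/src.type');

var_dump($found); // int(1)
```

If that covers your case, the bundled helper could be swapped for `preg_quote` with the delimiter passed in.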

The class is configured through three variables: allow sets attributes that are allowed on all elements, exceptions sets attributes that are allowed on specific elements, and ignore sets elements that are skipped entirely. So, with id and class in allow, src and alt listed for img and href and title listed for a in exceptions, and an empty ignore list, the class will accept id and class attributes on any element, src and alt attributes on img elements, href and title on a elements, and will not ignore any tags.
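As a rough illustration of how the three settings interact, here is a small stand-alone sketch. It is not the class itself: the function name `filterAttributes` and its arguments are hypothetical, and it works on a list of attribute names that has already been parsed out of one element.

```php
<?php
// Hypothetical helper, not part of the class: decides which attribute
// names survive for one element under the three settings.
function filterAttributes(string $element, array $attributes, array $allow, array $exceptions, array $ignore): array
{
    // Elements on the ignore list are left untouched.
    if (in_array($element, $ignore, true)) {
        return $attributes;
    }

    // Globally allowed attributes plus any per-element exceptions.
    $permitted = array_merge($allow, $exceptions[$element] ?? []);

    // Keep only permitted names; array_values() reindexes the result.
    return array_values(array_intersect($attributes, $permitted));
}

// The settings from the example: id and class everywhere,
// src/alt on img, href/title on a, nothing ignored.
$allow = ['id', 'class'];
$exceptions = ['img' => ['src', 'alt'], 'a' => ['href', 'title']];
$ignore = [];

print_r(filterAttributes('img', ['src', 'onerror', 'class'], $allow, $exceptions, $ignore));
print_r(filterAttributes('p', ['style', 'id'], $allow, $exceptions, $ignore));
```

The first call keeps src and class and drops onerror; the second keeps only id. The comparison here is case-sensitive, so a real implementation would probably want to lowercase names first.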

Version History

Version   Release       Notes
0.1       27 Jan 2009   Initial release
0.2       16 Mar 2009   Expanded support for XML node names (such as namespaces), fixed a bug with finding self-closing tags and expanded support for malformed attributes.
0.2.1     23 Oct 2009   Fixed parsing of elements that contain new lines.

This class is available under the MIT License.