Enabling Chinese, Arabic and Other High Unicode in WordPress Slugs

SEMLabs.co.uk / General

A while back someone contacted me about their WordPress blog borking out on some of their posts. After a bit of poking about, it became apparent that this was because WordPress doesn't allow high Unicode characters in the URL. At first I thought this would just mean changing a line in .htaccess, but there are a couple of other things that need to be changed too.

Here is how to let your WordPress blog use high Unicode characters, such as Han or Arabic script, in its URLs:

Edit .htaccess

First of all, open your .htaccess file. You will find a block something like this:

You will need to replace the fourth and fifth lines with one of the following:


The first option allows URLs containing high Unicode characters, while the second allows all URLs through. This re-routes URLs containing those characters to index.php so WordPress can deal with them. However, there are a couple of other things you need to do…

Edit Post Slugs

  • Open up the file wp-includes/query.php
  • Search for the line: $q['name'] = sanitize_title($q['name']);
  • Replace it with: $q['name'] = addslashes(strip_tags($q['name']));

This stops WordPress from converting the requested post slug into percent-encoded UTF-8 byte codes before looking it up, so your posts will now be loaded from the database.
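To see why the swap matters, the sketch below runs both approaches on a Chinese slug. `encode_slug_like_wordpress()` is a hypothetical and heavily simplified stand-in for the UTF-8 handling in sanitize_title(): it percent-encodes each non-ASCII byte and lowercases the result. The real function also converts spaces to dashes and runs filters. `keep_slug_as_is()` is a hypothetical wrapper around the replacement line, which uses the standard PHP functions `strip_tags()` and `addslashes()`.

```php
<?php
// Hypothetical, simplified stand-in for sanitize_title()'s handling of a
// UTF-8 slug: percent-encode every non-ASCII byte, then lowercase the result.
// The real WordPress function does more (dashes, filters, length limits).
function encode_slug_like_wordpress(string $slug): string
{
    return strtolower(rawurlencode($slug));
}

// Hypothetical wrapper around the replacement line: strip HTML tags and
// escape quotes/backslashes, but leave the UTF-8 characters untouched.
function keep_slug_as_is(string $slug): string
{
    return addslashes(strip_tags($slug));
}

$requested = '中文'; // a two-character Han slug

echo encode_slug_like_wordpress($requested), "\n"; // %e4%b8%ad%e6%96%87
echo keep_slug_as_is($requested), "\n";            // 中文
```

If the slug stored in the database holds the characters themselves, only the second form will compare equal to it.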

Edit Page Slugs

  • Open wp-includes/post.php
  • Search for the function called get_page_by_path
  • Paste the following at the top of the function:

This will allow any pages that contain high Unicode characters in their slugs to be loaded.
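Conceptually, resolving a hierarchical page path such as /about/team/ means walking it one segment at a time. Each segment has to match a page whose parent matched the previous segment. The sketch below models that walk in memory. The `$pages` array stands in for the posts table, and `find_page_id_by_path()` is a hypothetical helper, not WordPress's get_page_by_path(). The sketch assumes slugs are stored as raw UTF-8, and it decodes the request path before comparing.

```php
<?php
// Hypothetical in-memory page table standing in for the posts table; each
// page has an ID, a parent ID (0 = top level) and a raw UTF-8 slug.
$pages = [
    ['ID' => 1, 'parent' => 0, 'slug' => '关于'],
    ['ID' => 2, 'parent' => 1, 'slug' => '团队'],
    ['ID' => 3, 'parent' => 0, 'slug' => 'about'],
];

// Walk the decoded path segment by segment, matching each slug as-is under
// the page found for the previous segment. Returns null when nothing matches.
function find_page_id_by_path(string $path, array $pages): ?int
{
    $segments = array_filter(
        explode('/', rawurldecode($path)),
        static fn (string $s): bool => $s !== ''
    );
    $parent = 0;
    $id = null;
    foreach ($segments as $segment) {
        $id = null;
        foreach ($pages as $page) {
            if ($page['parent'] === $parent && $page['slug'] === $segment) {
                $id = $page['ID'];
                break;
            }
        }
        if ($id === null) {
            return null; // no page with this slug under the current parent
        }
        $parent = $id;
    }
    return $id;
}

$path = '/' . rawurlencode('关于') . '/' . rawurlencode('团队') . '/';
var_dump(find_page_id_by_path($path, $pages));      // int(2)
var_dump(find_page_id_by_path('/团队/', $pages));   // NULL (not a top-level page)
```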

Edit Category Slugs

  • Open wp-includes/category.php
  • Search for the function called get_category_by_path
  • Comment out this line by adding a hash (#) at the beginning of it: $category_path = rawurlencode(urldecode($category_path));
  • After the following line: $full_path .= ( $pathdir != '' ? '/' : '' ) . sanitize_title( $pathdir ); paste this:

This will allow any categories that contain high Unicode characters in their slugs to be loaded.
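The line commented out in the second step decodes the category path and then re-encodes it with rawurlencode(). That turns every non-ASCII byte, and the / separators, into %XX codes. The sketch below shows this effect on a made-up two-level category path. It only illustrates why the line is disabled; it is not the code the last step asks you to paste.

```php
<?php
$category_path = '新闻/体育'; // hypothetical parent/child category path

// Effect of the line that gets commented out: every UTF-8 byte and the '/'
// separator come back as %XX escapes, so the path no longer contains the
// characters themselves.
$encoded = rawurlencode(urldecode($category_path));
echo $encoded, "\n"; // %E6%96%B0%E9%97%BB%2F%E4%BD%93%E8%82%B2

// With the line disabled, the path keeps its characters and splits cleanly
// into one slug per category level.
$segments = explode('/', trim($category_path, '/'));
echo implode(' > ', $segments), "\n"; // 新闻 > 体育
```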