If you recently upgraded to WordPress 2.2+ from an earlier version, you may have found your posts all messed up, with Â, â€ and ™ characters dotted everywhere. This happened to the Blogging Tips posts. I quickly removed the characters from all of the affected posts, but I didn’t spend any time looking into what caused them.
Last week I helped Elina from My Lil Venture upgrade her blog to WordPress 2.2.1. Elina emailed me back to say that all of her posts were now filled with Â and â€ characters, so clearly the problem had something to do with the WordPress upgrade.
Here is an example of the output in one of Elina’s posts
I decided to look into the problem further, as a lot of WordPress users are running into it.
What causes these characters to appear after upgrading?
WordPress added two new lines to the wp-config.php file in version 2.2: define('DB_CHARSET', 'utf8'); and define('DB_COLLATE', '');
In brief, DB_CHARSET lets you define the character set used for your blog’s database, and DB_COLLATE lets you set the collation, i.e. the rules used to sort and compare text in that character set. I don’t want to go into too much detail about these new variables, as they won’t concern the majority of WordPress users. You can find out more information about these variables here.
So why does this problem arise? If you install a fresh copy of WordPress you won’t see it, but if your blog was upgraded to 2.2+ from an earlier version, you probably will. This is because the WordPress upgrade does not convert the old Latin1 character set to the new UTF-8 character set. Many older installs created their tables as Latin1 while still storing UTF-8 text in them; once DB_CHARSET asks MySQL for UTF-8, MySQL converts that text a second time, so a single character such as a curly apostrophe comes out as several Latin1 characters (â€™). I’m baffled that WordPress didn’t say more about this when they released 2.2, as it’s clearly going to cause problems for a lot of bloggers.
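To see why one curly apostrophe turns into â€™, here is a minimal Python sketch of that double conversion. It assumes MySQL’s Latin1 behaves like Python’s cp1252 codec (close, but not identical), and the function names double_encode and repair are my own for illustration; nothing here is WordPress or MySQL code.

```python
# Sketch of the "double encoding" problem: text stored as UTF-8 bytes is
# misread as Latin1 (approximated with Python's cp1252 codec), producing the
# Â / â€ / ™ debris. Assumption: cp1252 is a stand-in for MySQL's latin1.
# Note: cp1252 leaves a few bytes undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D),
# so characters whose UTF-8 form contains them will raise an error here.


def double_encode(text: str) -> str:
    """Return what a reader sees after UTF-8 bytes are misread as cp1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo exactly one round of the misreading by reversing both steps."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    samples = ["I didn\u2019t", "caf\u00e9", "10\u00a0km"]
    for original in samples:
        garbled = double_encode(original)
        print(repr(original), "->", repr(garbled))
        assert repair(garbled) == original
```

Running it shows I didn’t becoming I didnâ€™t and café becoming cafÃ©, the same kind of debris seen in upgraded posts. The repair function only works on text that was mangled exactly once, which is one reason converting the database itself is the safer long-term fix.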
How do you remove these strange characters from your posts?
There are a few ways you can fix this problem.
- Remove the references to DB_CHARSET and DB_COLLATE in the wp-config.php file – If you simply remove the new lines from your wp-config.php file, your posts should go back to normal. In the long term, though, it’s probably best to convert your database. Here is a screenshot of the page I referenced before. As you can see, the unwanted characters disappeared simply by removing the lines from the wp-config.php file.
- Convert your database the hard way – WordPress have a guide to converting your database character set. Unless you have experience with MySQL, I wouldn’t recommend doing this, as there is a much better alternative (noted below). If you do choose to convert your database using this step-by-step guide, make sure you back up your database beforehand.
- Download the WordPress UTF-8 Database Converter plugin – g30rg3x has released a WordPress plugin called UTF-8 Database Converter, which converts your database and therefore removes all the strange characters from your posts. You can download it here. All you need to do is back up your database, upload the plugin and then activate it. Once you have activated it and selected the converter in your plugins tab, you will see this screen.
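For the curious, converting by hand commonly relies on a binary round trip: each text column is switched to its binary equivalent so MySQL stops interpreting the bytes, then switched back to a text type labelled as UTF-8. The Python sketch below only generates those ALTER TABLE statements. The helper name conversion_sql and the example columns are hypothetical, I haven’t checked it step by step against the WordPress guide, and real columns need their NOT NULL, DEFAULT and index details added by hand.

```python
# Hypothetical helper that builds SQL for the "binary round trip" conversion.
# Step 1: MODIFY each text column to its binary type (bytes are left as-is).
# Step 2: MODIFY it back to the text type with CHARACTER SET utf8, so the
#         UTF-8 bytes already stored are finally labelled correctly.

BINARY_EQUIVALENT = {
    "CHAR": "BINARY",
    "VARCHAR": "VARBINARY",
    "TINYTEXT": "TINYBLOB",
    "TEXT": "BLOB",
    "MEDIUMTEXT": "MEDIUMBLOB",
    "LONGTEXT": "LONGBLOB",
}


def conversion_sql(table: str, columns: dict[str, str]) -> list[str]:
    """Return ALTER TABLE statements for one table.

    `columns` maps column name -> type, e.g. {"post_title": "TEXT"} or
    {"guid": "VARCHAR(255)"}. NOT NULL, DEFAULT and index details are NOT
    preserved by this sketch; add them by hand for real tables.
    """
    statements = []
    for name, col_type in columns.items():
        base, _, size = col_type.partition("(")
        binary = BINARY_EQUIVALENT[base.strip().upper()]
        suffix = f"({size}" if size else ""  # keep the length, e.g. (255)
        statements.append(f"ALTER TABLE `{table}` MODIFY `{name}` {binary}{suffix};")
        statements.append(
            f"ALTER TABLE `{table}` MODIFY `{name}` {col_type} CHARACTER SET utf8;"
        )
    # Make utf8 the default for any columns added to the table later.
    statements.append(f"ALTER TABLE `{table}` DEFAULT CHARACTER SET utf8;")
    return statements


if __name__ == "__main__":
    example = {"post_title": "TEXT", "post_content": "LONGTEXT"}  # illustrative only
    for stmt in conversion_sql("wp_posts", example):
        print(stmt)
```

Review the generated statements before running them, and only ever run them against a database you have already backed up.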
Don’t get too alarmed by this screen; just double-check that you have backed up your blog database so that if anything goes wrong, you’re covered.
Removing or commenting out the DB_CHARSET and DB_COLLATE references in your wp-config.php file is definitely the quickest way to resolve this problem. However, I’ve no doubt that future versions of WordPress will include these new character set variables, so it might be worthwhile converting your database using the plugin I mentioned.
I hope this guide helps WordPress users who have had problems with this. If you are unsure about anything, please let me know and I will do my best to help.
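If you would rather script the quick fix than edit by hand, here is a minimal Python sketch that comments out the two lines. It assumes they use the usual define('DB_CHARSET', ...) form, one per line; the regex and the function name comment_out_charset are my own, not part of WordPress.

```python
import re

# Matches an uncommented define() line for DB_CHARSET or DB_COLLATE, e.g.
#   define('DB_CHARSET', 'utf8');
# Group 1 keeps any leading indentation, group 2 is the define() call itself.
DEFINE_LINE = re.compile(
    r"""^([ \t]*)(define\(\s*['"]DB_(?:CHARSET|COLLATE)['"].*)$""",
    re.MULTILINE,
)


def comment_out_charset(config: str) -> str:
    """Prefix the DB_CHARSET/DB_COLLATE define lines with a PHP // comment.

    Lines that are already commented out are left alone, so running this
    twice gives the same result as running it once.
    """
    return DEFINE_LINE.sub(r"\1// \2", config)


if __name__ == "__main__":
    sample = (
        "<?php\n"
        "define('DB_NAME', 'wordpress');\n"
        "define('DB_CHARSET', 'utf8');\n"
        "define('DB_COLLATE', '');\n"
    )
    print(comment_out_charset(sample))
```

To use it on a real blog you would read wp-config.php, save an untouched copy first, and write the returned text back; commenting out rather than deleting makes it easy to restore the lines once the database has been converted.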