Well, if you tried to check my site about a week ago, you would have noticed that it was not there. Okay, it was still “there”, but the “there” was somewhere off the publicly accessible Internet and thus for all intents and purposes “gone”.
This site and the blog were hosted on an old dedicated web server of mine that was getting on in years. It just suddenly decided that it was time for it to go. Admittedly, it was not all that sudden. The server had accumulated various minor “diseases” and flukes over the past seven years while it was running 24/7. It had even crashed before, but until recently never “fatally”. A few days of downtime and things like a new hard disk always brought it back up and running again.
Not so this time. May it rest in peace.
Web Server Fatal and Permanent Crash – Site and Blog Move
I also have to admit that I had been dragging out the long overdue move to a different server for way too long. Something else always came up and I simply did not do it. Well, now I no longer had the liberty to choose the when and how. That decision was made for me.
Coincidentally, Google also discontinued the FTP feature for Google/Blogger blogs. That feature was meant for webmasters who wanted to host the actual blog media and content files on their own web server, rather than having the blog hosted at blogspot.com or running it on a sub-domain like blog.website.com, where the blog itself is still hosted at Google and not on their own web servers.
This meant that I not only had to move this site (and two other sites as well), but also had to do something about this blog. I had already toyed with the idea of migrating the blog to a different platform, such as WordPress, but here, too, the time for lengthy deliberation and comfortable migration timelines had vanished completely.
Google announced the discontinuation of the FTP feature several months ago, but of course I waited until they had actually disabled the feature and I wanted to make a change to an existing post. Well, that was the moment when I thought I would just take care of the blog migration. On that day I went to my site and got an error screen telling me that there was no site anymore at all, and I realized that I had a hell of a lot more to deal with than just the migration of my “little” personal blog. And that is how I ended up with a stressful week and way too little sleep.
Short version: the blog is now running on WordPress, which I host on my own dedicated server running Microsoft IIS and not Apache. That was less of a problem in many respects than I expected it to be.
The move of the main web site(s) was, all in all, not that much of a problem. I also had up-to-date content backups, which fortunately prevented the loss of data. What I did not have a 100% backup of were the damn Blogger blog images. There were multiple reasons why that happened. The major reason was probably the history of the blog itself, which I started in February 2006 as a blogspot.com hosted blog, then moved to Cumbrowski.com with the FTP option and then in 2007 to RoySAC.com. I also started using Windows Live Writer as my main editor, having used the Blogger.com interface and then another editor (I think it was called Blogworks) before that. That also brought some changes to how, where and when images were published to the blog.
Image locations changed several times, and in the end there were about 4 of them. One of the 4 locations was unfortunately outside the main web site's directory and was missed by my frequent file backups. I had also converted all blog posts that I ever wrote to MS Word and PDF, which means that the lost image files were not lost forever, only, how do I put it best… less conveniently accessible and not in a format that would have allowed a swift transition to a different blogging platform. But my problems did not end there. Let me start from the beginning.
As I mentioned already, getting WordPress up and running on a Windows Server wasn’t much of a problem. That used to be very different only a few years ago. Okay, I already had PHP installed and configured. MySQL was also set up, though not yet configured. But having those things installed alongside the IIS web server does not mean that you can call it a day; there is still work to do before WordPress runs properly on top of them.
1. WordPress/PHP/MySQL Server Install on Windows Server with IIS
With the exception of Windows Live Writer (which they later had to bundle with Windows Live Essentials to get people mad at them again), Microsoft appears to have created something that is actually fantastically useful, working, plug and play rather than plug-n-pray, pain-free and fee-free on top of that. It is called “Microsoft Web Platform” and WordPress is one of the applications it already supports. Download it, run setup, and the thing not only downloads and installs the latest version of WordPress right on your server, but also downloads, installs and/or configures all dependent applications like PHP and MySQL Server within the same process as needed. It took less than 5 minutes (or maybe 1-2 more; well, it was light-speed fast compared to a Windows Service Pack install).
I could not believe that this was all I had to do and prepared myself for the worst, but to my amazement it was really working and ready to use. I only had to change one thing myself, and that was to increase the allowed upload file size so that I could import my blog backup from Blogger into WordPress. My 4+ years of blogging added up to more than just 2 MB, which was the default limit of the PHP installation. WordPress figured that out and told me exactly what the problem was, what I had to do, where and why in order to fix it. Nothing like the typical Windows error message that says nothing and offers no help or anything near an explanation of what happened in the first place.
|Fix: Open the PHP.INI configuration file, usually located in the PHP installation directory and/or maybe also in the System32 sub-directory of Windows itself. Notepad will do. Find the parameter “upload_max_filesize” and change the default value of “2M” to whatever you see fit, e.g. “10M”.|
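If you would rather script that change than edit the file in Notepad, here is a minimal Python sketch. It works on the text of the configuration file rather than on a real path, because the location of PHP.INI differs between installations. The function name set_upload_limit and the sample excerpt are my own illustrations, not part of PHP or WordPress.

```python
import re


def set_upload_limit(ini_text: str, new_value: str = "10M") -> str:
    """Return the php.ini text with upload_max_filesize set to new_value."""
    # Match only active directives; lines commented out with ';' are skipped.
    pattern = re.compile(
        r"^([ \t]*upload_max_filesize[ \t]*=[ \t]*)\S*", re.MULTILINE
    )
    updated, count = pattern.subn(lambda m: m.group(1) + new_value, ini_text)
    if count == 0:
        # Directive not present at all: append it at the end of the file.
        updated = ini_text.rstrip("\n") + "\nupload_max_filesize = " + new_value + "\n"
    return updated


# Hypothetical excerpt of a php.ini file, for demonstration only.
sample = "; demo excerpt\nupload_max_filesize = 2M\npost_max_size = 8M\n"
result = set_upload_limit(sample)
print(result)
```

Keep in mind that PHP's post_max_size setting also caps the size of an upload, so if your backup file is larger than that value you will probably have to raise it as well.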
2. Conversion of Blogger Backup XML to WordPress WXR Format
I mentioned the Blogger backup already. It is a nice feature in Blogger that has been around for a while. It simply exports all blog posts, comments, tags and categories into a single XML text file that you can download for local safekeeping or, in my case, for migration to another blogging platform.
Unfortunately, the Blogger import feature in WordPress does not make use of this full backup file and only offers solutions that involve a live hosted blog, which I did not have anymore at that time.
Fortunately, some folks developed a free tool using Google’s App Engine that converts those Blogger backup files to the formats of various other blogging platforms, including the WordPress backup format, WXR. You can download the tool and run it yourself or use the online version of it, which is also provided by that developer team at: http://blogger2wordpress.appspot.com/
Posts, comments and categories were imported into WordPress. One catch though: Blogger “Categories” are converted to WordPress “Tags” and not “Categories”. All posts ended up under the WP default category “Uncategorized”.
3. WordPress URL Formatting and IIS
WordPress allows the configuration of various different permalink URLs, giving users a great deal of freedom. There are also tons of additional WordPress plug-ins available that let you tweak your post URLs even further. There is only one problem: most of those only work properly, or at all, with Apache web servers and/or .htaccess files for the internal redirection to the WordPress internal URL format, which looks basically like this (where 123 stands for the numeric post ID):
www.domain.com/blog/?p=123
Nice URLs with a folder-like structure such as this one: www.domain.com/blog/year/month/post-name/ do not work with IIS web servers right off the bat. IIS will return a 404 error (not found), because there is no real directory for each year, month and post, but IIS keeps looking for one. The solution that works out of the box looks something like this:
www.domain.com/blog/index.php/year/month/post-name/
“blog” is an actual folder on my site where the blog really resides. Notice the “index.php” that follows right after that? This is the actual script file located in the sub-folder “blog”, which generates the blog home page and many of its sub-pages as well. If you browse to www.domain.com/blog/, index.php is called automatically without the need to enter it, because that is how I configured it in IIS. This works fine for default scripts in a directory, root or sub-directory, but it requires that the directory itself exists before IIS starts looking for a default script to process. Apart from messing around with the default IIS error pages (like the 404 not found error page that you get from a web server that is unable to locate the requested document), I believe the only practical, workable solution is the installation of an ISAPI filter in IIS itself. No PHP script, regardless of how sophisticated, will help you get around it as far as I know. Possibly with IIS 7, which I am not really familiar with, but not with IIS 6 or lower.
But here, too, there is a free solution available that is pretty easy to set up and configure:
WordPressURLRewrite by Binary Fortress Software is a free and easy-to-configure ISAPI filter plug-in for WordPress installations running on IIS web servers. There is a 32-bit and a 64-bit version available, depending on what type of web server you are using. Copy the Dynamic Link Library for your platform, “WordPressURLRewrite32.dll” or “WordPressURLRewrite64.dll”, plus the configuration file “WordPressURLRewrite.ini” to a location on the web server where you are able to set the user permissions of the account IIS is running under to “WRITE” for the DLL and INI. A new sub-directory of the WordPress “plugins” sub-folder will do just fine.
The INI file is simple and straightforward and in most cases ready to be used without modification. It requires the logical path of the blog on your web site and the list of exceptions that you don’t want the plug-in to mess with. Those are the usual suspects, like the WordPress content and admin sub-folders. I only had to add a few more exceptions to mine to include image sub-folders etc.
#Paths to Rewrite (not case sensitive)
/blog/

#Path Exceptions (not case sensitive)
/blog/wp-admin
/blog/wp%2dadmin
/blog/wp-content
/blog/wp%2dcontent
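To make the idea behind those two lists a bit more tangible, here is a small Python sketch of the kind of decision such a filter has to make for every request. This is only my own simplified model of the behaviour described above, not the actual code of WordPressURLRewrite, and the function name rewrite is made up for the illustration.

```python
# Values taken from the INI example above (compared case-insensitively).
REWRITE_PATHS = ["/blog/"]
EXCEPTIONS = [
    "/blog/wp-admin",
    "/blog/wp%2dadmin",
    "/blog/wp-content",
    "/blog/wp%2dcontent",
]


def rewrite(url_path: str) -> str:
    """Map a 'nice' permalink path to the index.php form, or leave it alone."""
    lowered = url_path.lower()
    # Exceptions win: admin pages and content files must stay untouched.
    if any(lowered.startswith(prefix) for prefix in EXCEPTIONS):
        return url_path
    for base in REWRITE_PATHS:
        already_done = lowered.startswith(base + "index.php")
        if lowered.startswith(base) and not already_done:
            # Insert the real script name between the folder and the rest.
            return base + "index.php/" + url_path[len(base):]
    return url_path


print(rewrite("/blog/2010/05/demo-post/"))
print(rewrite("/blog/wp-admin/edit.php"))
```

The first call yields the index.php form that IIS can actually resolve, while the second path is returned unchanged because it falls under one of the exceptions.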
The installation in IIS is straightforward. Open the IIS (Internet Information Services) Manager under “Administrative Tools”, expand “Web Sites” and right-click on the site where you installed WordPress. Click “Properties” and select the tab “ISAPI Filters” in the window that pops up. Click the “Add” button, “Browse” for the 32-bit or 64-bit DLL of WordPressURLRewrite, depending on your server, then OK, OK and you are done there.
I don’t know why, but every ISAPI filter in IIS that I have encountered so far requires WRITE permission on itself in Windows (NOT web site permissions in IIS). To set that, you need to know the Windows user account that IIS is running under. You can find that out in IIS Manager, in the properties for the web site where you set up the ISAPI filter. Instead of “ISAPI Filters”, select the “Directory Security” tab and click the “Edit” button in the “Authentication and Access Control” section. The second pop-up window that opens shows, in the “User name” text field, the Windows account that is used for anonymous/public visitors to your web site. This user needs the write permission IN WINDOWS. It is probably best to configure this via Windows Explorer: right-click the plug-in folder, choose “Properties”, select the “Security” tab and add the account used by IIS there, checking “Write” in the “Allow” column, which is “off” by default for any new user account that you add permissions for.
So, that takes care of the nasty “Index.php” problem in the URL.
4. Old Blogger Permalinks to New WordPress Permalinks Redirection
It would have ended right there if this had been a new blog or one with the same permalink URLs as I have now. Unfortunately that is not the case. Blogger URLs look similar, but are not the same.
They look like this: http://www.domain.com/blog/year/month/post-name.html
The actual post looks like a file name and not a folder. If that had been the only difference, no problem, because a “Custom” permalink structure configuration in WordPress looking like this: /%year%/%monthnum%/%postname%.html instead of /%year%/%monthnum%/%postname%/ would accomplish the same goal. It would have been too easy though. While this solution might work for a large number of posts, it probably won’t work for all of them, and if it fails for just one, it’s already bad, especially if you have no idea which one and several hundred to check.
Blogger does not use the entire post title for the generated URL. It cuts the title off at some point if it exceeds a certain length. I read somewhere that Blogger cuts off after 40 characters, but I won’t vouch for that information. It also makes some different choices for the conversion of special characters used within a post title. Some are simply ignored, while others are replaced by a “dash” or “hyphen” character, the generic and universal “minus” sign: “-”. It does that with SPACES, which is the same rule WordPress uses for its permalink URLs, but when it comes to other characters, both platforms do it differently. What the rules are exactly, I don’t know. Unless your post titles are always short and never use any characters other than letters, numbers and spaces, odds are that one or more of your new WordPress URLs will not match your old Blogger ones.
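Since the exact rules are unknown to me, the best I can offer is a rough way to find the posts that are worth a closer look. The Python sketch below simply flags every title that is longer than the unverified 40 character figure or that contains anything other than letters, numbers and spaces. The function name and the example titles are made up, and the limit is an assumption, so treat the result as a list of suspects and nothing more.

```python
import re

# Unverified figure mentioned above; adjust it if you know better.
ASSUMED_BLOGGER_LIMIT = 40


def needs_manual_check(title: str, limit: int = ASSUMED_BLOGGER_LIMIT) -> bool:
    """Flag titles whose Blogger and WordPress URLs may differ."""
    too_long = len(title) > limit
    # Anything besides ASCII letters, digits and spaces counts as "special".
    has_special = re.search(r"[^A-Za-z0-9 ]", title) is not None
    return too_long or has_special


# Hypothetical post titles, for demonstration only.
titles = [
    "Hello World",
    "ANSI & ASCII Art: What's the Difference?",
    "A" * 41,
]
flagged = [t for t in titles if needs_manual_check(t)]
print(flagged)
```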
|If you migrate from Blogger hosted (YourBlog.Blogspot.com) to WordPress, check out this solution by Justin Somnia for redirecting users accessing posts at your old BlogSpot.com location to the same post at your new self-hosted WordPress blog.|
I have not found anything that worked properly for this problem. If the number of your posts isn’t too large and manually tweaking a permalink here and there is acceptable for you, I’d suggest stopping here. Use the custom link structure “/%year%/%monthnum%/%postname%.html” in WordPress and correct the few generated permalinks that differ from the original Blogger ones in the “Edit” screen of the post in WordPress. WordPress lets you manually change and overwrite the permalink of every post whenever you feel like it.
5. Darker Territory
If you are a perfectionist or have too many posts in the “possibly generated wrongly by WordPress” category to check them one by one, then continue to follow me.
I decided to set up a one-to-one redirection for each post from the old Blogger URL to the new WordPress URL.
That required me to do four things: 1) get the permalinks that WordPress generated for all of my imported posts, 2) get the permalinks of all posts as originally published by Blogger, 3) match them all up with as little manual labor as possible and, last but not least, 4) establish a 301 redirection for each of those URL pairs.
Step 4) One-On-One Redirection
Let me start with 4), the redirection. The ISAPI filter WordPressURLRewrite does not offer any help here, but a different ISAPI filter does. Since you have already installed one, installing a second one should be a piece of cake. It’s the same thing all over again. There is only another DLL and a different configuration file to worry about; the rest is the same. I decided to go with the ISAPI filter I am most familiar with, which is called “ISAPI_Rewrite by Helicon”. The free version of it will do; the paid pro version does a heck of a lot more, but none of that has anything to do with our little problem. You could also use any other ISAPI filter that has a one-to-one URL redirection option. See this page on my other web site for tools and resources for this kind of purpose, where you can also find a link to download the ISAPI_Rewrite filter by Helicon.
Add the DLL “ISAPI_Rewrite.dll” of the ISAPI_Rewrite filter by Helicon to IIS, give the IIS user write permission on its folder in Windows and configure the filter via the httpd.ini configuration file. Example:
RewriteCond Host: (?:www\.)?roysac\.com
RewriteRule ^/blog/2010/05/demo-post\.html /blog/2010/05/demo-post/ [I,R]
RewriteRule …
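With several hundred posts you will not want to type those rule lines by hand. Here is a short Python sketch that produces lines in the same style as the example above from a list of old and new paths. The helper names escape_regex and make_rule are my own, and the flags are simply copied from my example, so check them against the documentation of the filter version you are using.

```python
def escape_regex(path: str) -> str:
    """Backslash-escape characters that have a special meaning in a regex."""
    specials = set(".^$*+?()[]{}|\\")
    return "".join("\\" + ch if ch in specials else ch for ch in path)


def make_rule(old_path: str, new_path: str) -> str:
    """Build one redirect line in the style of the httpd.ini example."""
    return "RewriteRule ^" + escape_regex(old_path) + " " + new_path + " [I,R]"


# Hypothetical URL pairs; in practice they come out of the matching step.
pairs = [
    ("/blog/2010/05/demo-post.html", "/blog/2010/05/demo-post/"),
]
rules = [make_rule(old, new) for old, new in pairs]
print("\n".join(rules))
```

The dot in “.html” has to be escaped because the left side of each rule is a regular expression, while the right side is a plain target path.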
Step 1) Getting New WordPress Permalinks
Actually, getting the “Post Title” AND the permalink is much more helpful, because it reduces the amount of manual labor in step 3), hehe. There are many different ways to do this. I used MySQL to get to the “meat”. If you have the user name and password for the MySQL installation that is used for the blog, good.
Start the MySQL Administrator and connect to the DB server with that user name and password, preferably the “root” (master) user account. If you cannot remember it from the original installation of WordPress, use the manager to find out the “Database” or “Catalog” name used for WordPress. The right catalog or database will have tables with names that start with the same prefix (which you specified during the WP installation, “wp_” by default). The table we are interested in is called “PREFIX_posts” (e.g. wp_posts).
Start the MySQL Command Line Client from the “Tools” menu in MySQL Administrator and type:
USE CATALOGNAME;
where CATALOGNAME is the name of the database or catalog used by your WP installation. Then type:
SELECT w.post_title, w.post_name INTO OUTFILE 'C:/WPLINKS.TXT' FROM wp_posts w WHERE w.post_type='post';
where you only have to replace the “wp_posts” towards the end, if you are using a different table prefix. This writes a text file called “WPLINKS.TXT” to the root of the “C:” drive.
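If you want to process that text file with a script of your own, here is a minimal Python sketch for reading it. It assumes the default output format of the statement above, which is one row per line with the columns separated by a tab character. Note that the post_name column only holds the last part of the permalink (the “slug”), not the full URL. The function name parse_outfile and the sample rows are made up for this illustration.

```python
from typing import List, Tuple


def parse_outfile(text: str) -> List[Tuple[str, str]]:
    """Split tab-separated 'title<TAB>slug' lines into (title, slug) pairs."""
    rows = []
    for line in text.splitlines():
        # Split at the LAST tab: a slug never contains one, a title might.
        title, separator, slug = line.rpartition("\t")
        if not separator:
            continue  # blank or malformed line without a tab
        rows.append((title, slug))
    return rows


# Hypothetical file content; read the real file with open(...).read().
sample = "Demo Post\tdemo-post\nAnother Entry\tanother-entry\n"
posts = parse_outfile(sample)
print(posts)
```

This sketch does not undo the backslash escaping that MySQL applies to special characters inside a field, which should rarely matter for post titles.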
Step 2) and 3) Getting Old Blogger Permalinks, Matching Them
… and post titles, which should actually be the same as in WordPress, if you did not change them there for any reason. You can get them from an old RSS feed or from the XML backup file.
1. Download this ZIP file and extract the small executable included in the archive.
2. Copy the following files into the same folder where you extracted the EXE file, renaming them as noted:
2 a) Your Blogger backup XML file, rename it to “bglxml.txt”
2 b) the text file created from MySQL called “wplinks.txt”
Run the EXE and it should start reading the Blogger backup file, parsing URLs and post titles from it, then reading the WordPress extract and matching them up. It generates two files in the same folder called OUT1.TXT and OUT2.TXT.
OUT1.TXT contains ready-to-use line items for the ISAPI_Rewrite tool for all posts that it was able to match up. OUT2.TXT contains the Blogger as well as WordPress post titles and URLs that it was not able to match up. Those you have to deal with yourself and by hand.
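For those who prefer a script over an EXE file from some guy on the Internet, the matching idea can be sketched in a few lines of Python. This is not the source code of my little tool, only an illustration of the principle: compare the post titles from both sides and sort the posts into matched pairs and leftovers. All names and sample values are hypothetical.

```python
def normalize(title: str) -> str:
    """Lower-case a title and collapse whitespace so small differences match."""
    return " ".join(title.lower().split())


def match_posts(blogger, wordpress):
    """Pair old and new paths by post title.

    blogger and wordpress are lists of (title, path) tuples. Posts that
    share an identical title would collide here and need a manual check.
    """
    wp_by_title = {normalize(title): path for title, path in wordpress}
    matched, unmatched_blogger, used = [], [], set()
    for title, old_path in blogger:
        key = normalize(title)
        if key in wp_by_title:
            matched.append((old_path, wp_by_title[key]))
            used.add(key)
        else:
            unmatched_blogger.append((title, old_path))
    unmatched_wp = [(t, p) for t, p in wordpress if normalize(t) not in used]
    return matched, unmatched_blogger, unmatched_wp


# Hypothetical sample data.
blogger = [
    ("Demo Post", "/blog/2010/05/demo-post.html"),
    ("Old Only", "/blog/2009/01/old-only.html"),
]
wordpress = [
    ("demo  post", "/blog/2010/05/demo-post/"),
    ("New Only", "/blog/2010/06/new-only/"),
]
matched, left_blogger, left_wp = match_posts(blogger, wordpress)
print(matched)
```

The matched pairs correspond to what ends up in OUT1.TXT, the two leftover lists to what ends up in OUT2.TXT.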
Final Words and Comments
Well, I had to recover many images manually for a bunch of my blog posts and while I was going through my posts, I noticed that in some cases something was not right. Embedded videos were missing and some HTML was messed up. This was not consistent enough for me to nail it down to anything specific, like the video sharing site or the editing platform that I used. Below are “before” and “after” screenshots of a severe case of mal-formatting.
I am not sure if the cause for this can be found in any of the tools and steps that I mentioned above or if it was caused by my other messing around working towards a solution by trial and error.
|Before Fixing||After Fixing|
Good luck with your own migration project. I hope that my tips, resources and experiences are able to make your migration go much smoother than my own.
Carsten aka Roy/SAC