Migrating Blog The Engineering Way
Migrations have always been a topic close to me. I have been part of many migrations throughout my career and recently did one of the biggest at my current company [article soon to be published at the company tech blog]. This time I have something unique, a cross between blogging and engineering practices: migrating my blog from Blogger to Ghost by applying the best practices from engineering migrations.
I recently completed the migration from Blogger to Ghost for many good reasons. If you are a returning reader, you will see a completely new look, as the blog is now hosted on Ghost instead of Blogger. In this article I will cover many aspects of the migration, with a special focus on how I applied my engineering approach to make it successful and painless.
In my last post, Large Scale Migration Best Practices, I shared how engineering migrations should be done. That post was written for a team or company, while this one is a bit different: it is a solo project with me as the lone engineer doing everything.
At a high level, I first split this migration project into smaller phases. This is the approach we usually take in engineering projects, and as an engineer I wanted to do it the right way so that it would be less painful and more likely to succeed.
Engineering Practice: Split into smaller digestible components
Motivation
I have been writing content on the internet for the past several years, and being used to Blogger, I was happy with it most of the time. Blogger is one of the oldest blogging platforms, if not the oldest. However, in recent years Blogger seems to have lost its ability to keep up with modern features despite being owned by Google.
Many reasons contributed to my decision to look for an alternative. I can shed light on a few of them:
Editor
The Blogger editor is very buggy and cannot copy, paste, and format content in a consistent way. It has been a hard journey. It was acceptable while I was not writing frequently, or maybe I was not motivated to write often because of the painful process. Furthermore, the Blogger editor does not support inline code snippets, code blocks, etc. It is flexible enough to let you add them yourself, but keeping them consistent and maintaining them becomes too time consuming.
Subscription
Blogger had FeedBurner for email subscriptions, or what you would now call a newsletter. Recently they killed that service. The Google Blogger team did so despite the recent wave of subscription-based models like Substack, and even though it was just a basic subscription service with no option for different tiers to control free vs paid content.
Themes
I have mostly been setting up free themes and editing them myself, as there is no marketplace and there are no proper upgrades for themes. Theme setup is tricky and challenging, especially if you like to make modifications on top. There is one big template file and that's it.
Accelerated Mobile Pages (AMP)
AMP was started many years ago by Google Search to speed up load times of content on mobile phones, but Blogger never added that capability. Without it, Blogger blogs may end up losing organic traffic.
Google Structured Schema
Similar to AMP, Google Structured Schema is very important for structuring your content on Google Search, and it was lacking: either complete customization or a new theme was required. I was mainly focused on the Article schema.
Community
The Blogger community may have been around for more than two decades, but it feels like most active bloggers have moved on and the community is draining, with less content. Since no new features are introduced, there is not much to discuss. If you started on Blogger ten years ago, you would have the same features today, except maybe a new design in the backend.
I personally feel Google will soon kill Blogger.
Migration
Let's switch gears and focus on the migration. The first thing was to create a solid plan split into phases so that I could focus on one component every week. Initially I thought it would be a quick weekend project, but with lots of things to consider, it took three to four weekends.
Research Phase
Free Trial / POC Phase
Migration Phase
Final Phase
Research Phase
This is exactly the phase that most senior engineers go through many times when looking for tools to solve problems for the team or company. Here I needed to research, share, and ask folks about the top platforms and their experience with them. I was already aware of many major blogging platforms like Medium, WordPress, and Substack.
Since I had already tested Medium before, it was out of the picture. It was mostly between three: WordPress vs Substack vs Ghost.
Let's dive into what I looked for in a blog:
Feature Reviews
I was mainly looking to solve the problems mentioned in the motivation section: a better, modern editor, support for Google structured content, a growing community, and customization and flexibility better than or at least on par with Blogger. Substack lacked most of these items, while WordPress required more than I needed: a plugin is required for everything, unlike Ghost, which has solid built-in integrations.
I had previously tested Substack and did not enjoy the writing experience.
Cost and Flexibility
Ghost and WordPress are both open source, meaning cost can be controlled at any time in the future, while Substack uses a revenue sharing model: it starts free, but with a successful paid subscriber count you could end up paying more than the hosting fee of the other two. Since I am an engineer, I knew cost would not be a big problem, as I can later self-host to save costs in case I grow tremendously.
Content Migration Support
Another important aspect was how easy it is to migrate content, with the flexibility of adding custom 301 redirects, as I was moving from old URLs to new, cleaner ones. Ghost provided enough documentation to allow me to write some custom scripts to make this migration easy. Content import on Substack was super easy, needing just the click of a button, but it missed the redirection part. I will discuss this in a later section.
Benchmarking
Site speed is very important, not just for users but for search engine crawlers as well. Ghost is known to be very fast, with neat and clean themes and no need for external plugins, which can make WordPress slow and buggy. You could still make Ghost buggy and slow if you don't have experience with customizing the templates.
I tested the speed of existing simple Ghost and WordPress blogs similar to my Blogger blog, and Ghost came out on top. Unfortunately, I could not document everything, but I remember that Ghost achieved a performance rating of 98. Since Ghost was very fast, I decided not to enable AMP pages initially.
Blogging on Ghost provides both newsletter traffic (memberships) and organic traffic from search engines (SEO).
Engineering Practice: Research and Dive into Documentation
Free Trial / POC Phase
After shortlisting the options, it was time for the next phase: trying things out using the free trial. Since Ghost was already at the top of my mind, I started by checking whether it could deliver everything I had researched and how much I liked it as a content creator.
The free trial lasted 14 days, and I tried it for a week before buying Ghost(Pro).
I am not going to dive into detail but here is a list of things that satisfied me:
Great experience with modern editor
Simple, clean and fast themes
Full control on template customization
Members analytics
Dynamic Routing
Redirection
Detailed Content level SEO and Social Media Structure
Content Importer
Ghost documentation and community
Tag Organizer
Built in Integrations
Some features came as a surprise to me and I treated them as an added bonus. That being said, I ended up buying Ghost(Pro) after a week.
Last but not least, I ran the speed test again, comparing against the results from the earlier benchmarking, to find out how performant the new blog was, and the results were great!
Engineering Practice: Testing and benchmarking are important
Migration Phase
Since I was satisfied with everything Ghost offered, I came up with a migration plan. I went deeper into the migration phase by splitting it further and working out what could be automated and what had to be done manually.
A few requirements:
Do the migration incrementally, and iterate on the automation scripts
Backfill posts with proper cleanup of images and tags when possible
Write automated tests to make sure content import works as expected
Engineering Practice: Increment and Iterate
Automation
Since Ghost had good enough documentation on how to structure data for content import, I was able to use my superpower of coming up with rough one-off scripts. This was a big win; otherwise, manual copy-pasting would have been a pain.
The high-level script steps [nothing fancy]:
Get JSON from Blogger API
self.URL = f"https://www.googleapis.com/blogger/v3/blogs/{BLOG_ID}/posts?key={API_KEY}&maxResults={self.POSTS_COUNT}"
resp = requests.get(self.URL).json()
Extract and clean HTML content (the Blogger API returns HTML)
def content_cleanup(content):
    print("Fixing html content, cleaning up")
    # Round-trip the text through UTF-8
    content = content.encode().decode('utf8')
    # Swap double quotes for single quotes and drop newlines
    content = content.replace('"', "'").replace('\n', '')
    return content
Add additional missing fields in correct format like dates
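from datetime import datetime

def date_conversion_to_unix(date_str):
    print(f"Converting Date: {date_str}")
    # Drop the colon in the UTC offset (+05:00 -> +0500) so %z parses on older Pythons
    parsed = datetime.strptime(''.join(date_str.rsplit(':', 1)), '%Y-%m-%dT%H:%M:%S%z')
    # timestamp() honors the parsed offset; time.mktime() would treat the fields as local time
    return int(parsed.timestamp()) * 1000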
Convert to the Ghost JSON schema for content import (output.json), docs: Ghost Schema
Download all images in the Blogger post (so I keep a copy in case Blogger loses them)
import os
import re
import requests

def img_download(content, title):
    # Find Blogger-hosted image URLs in the post HTML
    img_urls = re.findall(r'(?:http\:|https\:)\/\/(?:blogger|[0-9]\.bp).*?.(?:png|jpeg|jpg|=s16000)', content)
    if not img_urls:
        return None  # nothing to download for this post
    img_url = img_urls[0]  # only the first image is downloaded
    img_name = re.sub('[^A-Za-z0-9]+', '-', title) + '.png'
    print(f"Downloading Image from URL {img_url}, {img_name}")
    img_data = requests.get(img_url).content
    # Each image gets its own folder: images/<img_name>/<img_name>
    os.makedirs(f"images/{img_name}", exist_ok=True)
    with open(f'images/{img_name}/{img_name}', 'wb') as handler:
        handler.write(img_data)
    return img_url
Generate a one-to-one 301 redirect mapping (`redirects.yaml`)
def redirect_yaml_generator(self, redirect_yaml, url, slug):
    print(f"Adding mapping for url {url}")
    redirect_yaml[301] = redirect_yaml.get(301, {})
    # Keep only the path of the old URL (everything after the domain)
    redirect_yaml[301]["/" + url.split('/', 3)[3]] = f"/blog/{slug}/"
    print(redirect_yaml)
    # Write only once when all posts are done
    if len(redirect_yaml[301]) == self.POSTS_COUNT:
        with open(self.PATH_REDIRECT, 'w') as outfile:
            yaml.dump(redirect_yaml, outfile, default_flow_style=False, sort_keys=False)
I wrote one big Python script to deal with all of the above challenges; it can be found on GitHub. There is a lot of room for code improvement, as this was done purely as a one-off script. Depending on your content quality, you may need to adjust a few methods.
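As an illustration of the "convert to Ghost JSON schema" step, here is a minimal sketch, not the actual script. The input keys (`title`, `content`, `published`) are the ones the Blogger API returns for a post. The output field names, the `meta`/`data` layout, and the version string are assumptions modeled on Ghost's documented import format, so check them against the Ghost docs before importing. `slugify`, `to_epoch_ms`, and `to_ghost_import` are hypothetical helper names.

```python
import json
import re
import time
from datetime import datetime


def slugify(title):
    """Lowercase the title and join alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def to_epoch_ms(date_str):
    """Convert an ISO 8601 timestamp with a UTC offset to epoch milliseconds."""
    return int(datetime.fromisoformat(date_str).timestamp() * 1000)


def to_ghost_import(blogger_posts):
    """Build a Ghost-style import document from Blogger post dicts.

    Output field names are assumptions; verify them against the Ghost docs.
    """
    posts = []
    for post_id, post in enumerate(blogger_posts, start=1):
        posts.append({
            "id": post_id,
            "title": post["title"],
            "slug": slugify(post["title"]),
            "html": post["content"],
            "status": "published",
            "published_at": to_epoch_ms(post["published"]),
        })
    return {
        # The version string is a placeholder, not a verified value
        "meta": {"exported_on": int(time.time() * 1000), "version": "5.0.0"},
        "data": {"posts": posts},
    }


if __name__ == "__main__":
    sample = [{
        "title": "Hello, World!",
        "content": "<p>hi</p>",
        "published": "2022-10-01T10:00:00+00:00",
    }]
    print(json.dumps(to_ghost_import(sample), indent=2))
```

Deriving the slug from the title is only one option; where a slug needs a manual fix, the explicit redirect mapping is what keeps the old link working.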
Some limitations that prevented full automation:
Blogger does not provide the search description via its API, so this was a manual action to be done post migration.
Images could not be uploaded to Ghost easily, or maybe a plan upgrade was needed.
Tag cleanup and organization was hard to do via scripting.
Code blocks and inline code need to be formatted.
These items were kept for later as manual steps.
Engineering Practice: Automation where possible
Data Quality
At this point the blog was ready. If I had hundreds of posts, I could have kept it as is with only a slight manual review. But given the limitations above, I had already planned to go through the posts one by one to review and fix issues on the fly. I ended up investing an hour or two of my time in that.
Automated tests helped make sure the data generated by the scripts was in good shape.
Manual tests helped me follow up on missed items like the limitations above.
Engineering Practice: Data Quality is crucial
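An automated check on the generated data could look like the sketch below. It is not the test from the original script: `validate_posts` is a hypothetical helper, and the field names (`title`, `slug`, `html`, `published_at`) are assumed to be those used in the generated import file.

```python
def validate_posts(posts):
    """Return a list of human-readable problems found in the post dicts."""
    problems = []
    seen_slugs = set()
    for post in posts:
        slug = post.get("slug", "")
        label = slug or "<no slug>"
        if not post.get("title"):
            problems.append(f"{label}: missing title")
        if not slug:
            problems.append("post without a slug")
        elif slug in seen_slugs:
            problems.append(f"{slug}: duplicate slug")
        seen_slugs.add(slug)
        if not isinstance(post.get("published_at"), int):
            problems.append(f"{label}: published_at is not an integer timestamp")
        if not post.get("html"):
            problems.append(f"{label}: empty html")
    return problems


if __name__ == "__main__":
    sample = [
        {"title": "A", "slug": "a", "html": "<p>a</p>", "published_at": 1664618400000},
        {"title": "B", "slug": "a", "html": "", "published_at": "2022-10-01"},
    ]
    for problem in validate_posts(sample):
        print(problem)
```

Returning a list of problems instead of failing on the first one makes it easier to fix a whole batch of posts in one pass.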
Final Phase
This phase focused on the final testing and configurations related to blog and domain.
Custom Domain
Since I wanted zero disruption for my blog (although it's not a big deal), I had been using the Ghost subdomain for all the work until now. This was the time to migrate the domain to its new address as well.
Engineering Practice: Zero service disruption
Ghost provided the CNAME records and I was able to connect the domain within a few seconds.
During the process I also transferred the domain from GoDaddy to Google Domains, as the email alias was free, e.g. hi@junaideffendi.com
Further, I set up domain forwarding as I was moving from blog.junaideffendi.com to www.junaideffendi.com.
Dynamic Routing
Since Ghost provides dynamic routing, I decided to keep my blog under this URL, www.junaideffendi.com/blog, which allows me to customize the homepage in the future. The dynamic routes are defined in a YAML file that can be uploaded easily.
Snippet from the full routes.yaml:
collections:
  /:
    permalink: /blog/{slug}/
    template: index
Redirect
The last step was to upload the generated redirects.yaml file to Ghost to allow older links to work seamlessly. To make sure the 301 redirects were working, I wrote a test script that checks each old URL lands on the URL I expected.
"""
Post migration
Compares the URL each old link redirects to against the expected path in redirects.yaml
If a mapping does not match, it is logged
"""
import yaml
import requests

OLD_URL = "https://blog.junaideffendi.com"
NEW_URL = "https://www.junaideffendi.com"
PATH = "../outputs/redirects.yaml"

with open(PATH, "r") as stream:
    redirects = yaml.safe_load(stream)

i = 1
for o, n in redirects[301].items():
    o = f'{OLD_URL}{o}'
    n = f'{NEW_URL}{n}'
    # requests follows redirects, so resp.url is the final landing URL
    resp = requests.get(o)
    if resp.url != n:
        print(f"mismatch between old and new url {o} -> {n}")
    i = i + 1
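The script above compares only the final URL. A possible extension, sketched here as an assumption rather than as part of the original script, also checks that every hop was a permanent (301) redirect. `check_redirect` is a hypothetical helper; it relies only on the `history` and `url` attributes that `requests.Response` exposes, and the example.com URLs are placeholders.

```python
def check_redirect(resp, expected_url):
    """Return a list of problems for one redirected response.

    `resp` is anything shaped like requests.Response: `.history` holds the
    intermediate responses and `.url` is the final landing URL.
    """
    problems = []
    if not resp.history:
        problems.append("no redirect happened")
    for hop in resp.history:
        if hop.status_code != 301:
            problems.append(f"hop returned {hop.status_code}, expected 301")
    if resp.url != expected_url:
        problems.append(f"landed on {resp.url}, expected {expected_url}")
    return problems


# Usage with requests (needs network access):
#   resp = requests.get("https://blog.example.com/2022/10/post.html")
#   print(check_redirect(resp, "https://www.example.com/blog/post/"))
```

A temporary redirect (302) would still land on the right page, but search engines treat it differently from a 301, which is why the status code is worth checking separately.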
Below is what the redirects.yaml looks like:
301:
  /2022/10/large-scale-migration-best-practices.html: /blog/large-scale-migration-best-practices/
  /2022/08/automated-unit-testing.html: /blog/automated-unit-testing/
Ghost redirects also have regex support, but since I had to fix some URLs, I kept a list of all 59 mappings.
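To illustrate the tradeoff, the sketch below shows what a single pattern-based rule would do, written with Python's `re` module rather than Ghost's own redirect syntax. The pattern and the `regex_redirect` helper are hypothetical and assume old paths of the form /YYYY/MM/slug.html.

```python
import re

# Hypothetical pattern for Blogger-style paths: /YYYY/MM/slug.html
BLOGGER_PATH = re.compile(r"^/\d{4}/\d{2}/(?P<slug>[^/]+)\.html$")


def regex_redirect(old_path):
    """Map an old Blogger path to the new /blog/<slug>/ path, or None."""
    match = BLOGGER_PATH.match(old_path)
    if match is None:
        return None
    return f"/blog/{match.group('slug')}/"


if __name__ == "__main__":
    print(regex_redirect("/2022/10/large-scale-migration-best-practices.html"))
```

A rule like this only works when the new slug is identical to the old one. Any post whose slug was cleaned up during the migration would redirect to a page that does not exist, which is the case an explicit one-to-one list handles.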
Speed Test
The final speed test against the benchmarking numbers gave a lower homepage speed rating of 80. Image size was the culprit that reduced my score. This is one item that I missed during the migration, but it should not be that bad after all, as I need to resize and reupload only a handful of images. The script below helped me achieve that.
"""
Convert images to webp post download
"""
from pathlib import Path
from PIL import Image

PATH = "../images"

# Images are stored as images/<name>/<name>, hence the two-level glob
paths = Path(PATH).glob("*/*")
for source in paths:
    destination = source.with_suffix(".webp")
    image = Image.open(source)
    image.save(destination, format="webp")
    print(f"image saved at {destination}")
And that's how the blog migration was completed and ready to be shared.
Future
My personal goal is to be consistent and attract more readers in the near future. For that I will be investing time in topics and content quality, and I aim to reuse my previous SEO experience to get more organic traffic as well.
Furthermore, as I get more experienced with Ghost, I will be able to customise a theme down the line to make the blog more unique and have a little more control.
On pricing, Ghost(Pro) can end up being expensive. If that happens, I have no problem switching to self-hosted Ghost, which could require more time and effort but is worth the tradeoff.
Moving to a new platform motivates me in many ways; a change was needed.