How to convert VMDK virtual hard disks to Amazon Elastic Compute Cloud (EC2) AMI format

Updated: May 14, 2009

Let me introduce the Amazon Elastic Compute Cloud (EC2) concept for a few moments. Amazon EC2 is a web service designed to provide instantly resizable, scalable computing capacity to users around the world. It's a massive virtualization grid, hosted in Amazon's server farms, providing CPU cycles and hard disk space to whoever may require them, on the fly. The only thing you have to do is pay - and not that much, actually.

To make the service actually useful, Amazon offers it at a very affordable price, comparable to what you would spend on buying and maintaining your own infrastructure. For home users, this may not seem like the most immediate necessity, but small to medium businesses with flexible demands will definitely like the prospect of hardware-independent freedom.

Some of you will like the idea, some of you will hate it. Personally, I think this is a great project, especially since the Amazon cloud is based on Linux and mainly supports Linux, creating a tremendous opportunity for Linux market growth. Mark Shuttleworth suffered quite a bit of criticism for his intention to incorporate Amazon functionality into the upcoming Ubuntu release, but I think it's a wise step, as the merging of needs between the desktop and the Web beckons for an "ethereal" flexibility only available in the Cloud.

At home ...

At home, you will rarely need more than classic virtualization provides. But you will probably start considering Amazon whenever the price tag of having your own database server with 16GB RAM in the living room comes up.

If ever that happens, you will want to know how to use the Amazon EC2. A full tutorial is definitely beyond the scope of this article; you have the official documentation for that.

Instead, I'm going to demonstrate but a part of the whole scheme, albeit probably the most important one that the home user might come across - the creation of Amazon virtual machine images. I'm going to show you how to convert home-brewed virtual hard disks used by VMware products into Amazon-compatible images that you can use with EC2.

Why not create Amazon images in the first place or use existing builds?

A good question! Amazon allows you to create images from scratch or use one of the existing templates (SUSE, Ubuntu, Red Hat, etc.) as your starting point. However, creating images is a tad complicated. And using existing templates might not be what you want. For example, you may want to run a special, custom-configured distro, or you may want to use a private, digitally-signed image that is unavailable to anyone but you. You want total control and you can't afford to let others create the images for you.

Objective: Create image at home, then upload to Amazon

I'm narrowing down this tutorial to a very specific subject, which I find quite important. You want to have a private image, with its own certificate. You want to create it at home, at leisure, at your own time and expense, without wasting money on bandwidth and the long hours of configurations. Once it's ready, you'll convert it and upload it to Amazon storage, called Amazon Simple Storage Service (S3).

Note: You must have a valid Amazon EC2 account before you can put the instructions in this tutorial to any use. Furthermore, you will have to use Linux to follow the tutorial successfully. Certain tools used here are only available for Linux - not surprising, considering that Amazon runs on Linux.

First question: convert from what?

Another good question. There are many desktop virtualization formats available. However, if you run virtualization at home, there is a good chance you're using one of VMware's products. VMware uses the .vmdk format for its virtual hard disks. This is our source format, which we want to convert to the Amazon Machine Image (AMI) format.

However, you cannot directly convert the VMDK virtual disks to AMI. To show you what needs to be done, I've written this tutorial. Let's begin.

Step 1: Make sure you have EVERYTHING you need

Before we begin, there are quite a few very important things to take into consideration; do not rush off converting this and that just yet.

Furthermore, before you start reading, I need you to realize that the Amazon EC2 project is still in its infancy, so the recommendations given here may only partially apply to future setups - or not at all. While I do not usually hedge my findings, in this particular case, due to the very large number of delicate variables, I cannot guarantee success with this endeavor. However, do not let this small admission put you down! Follow me.

General Amazon requirements (regardless of VMware)

Xen requirements

You will also have to xenify your distribution before you can convert it and use it on Amazon. What does this mean? Well, the Amazon virtual infrastructure is based on Xen, an open-source hypervisor similar to VMware, KVM or other virtualization products. Amazon virtual machines require specific kernels and kernel modules to be able to run on this platform. You will have to download the appropriate Xen kernel, extract it, install Xen modules, and update the GRUB menu.

In greater detail, go to the website / software repository of your favorite distribution and download the relevant kernel, matching your version. Extract the archive and place the files contained therein into /boot and /lib. Your next step is to generate the module dependency files:

depmod -F /boot/System.map-<kernel>-xenU -a <kernel>-xenU

Where <kernel> matches the kernel version you have (compare with uname -r).

Once this is done, check that modules exist under /lib/modules/<kernel>-xenU directory. Finally, update the GRUB menu.lst configuration file so that it contains a Xen kernel entry. This entry (stanza) should be booted by default. For more about GRUB, please check my extensive tutorial.
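For illustration, a Xen stanza in menu.lst might look something like this - the kernel version, file names and root device here are assumptions, so match them to what your distribution actually ships:

```
# Illustrative menu.lst stanza for the Xen domU kernel
title Xen kernel (xenU)
root (hd0,0)
kernel /boot/vmlinuz-2.6.16-xenU ro root=/dev/sda1
```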

Partitioning requirements

Another tricky one. Amazon virtual machines must have a very specific partitioning layout. Your /etc/fstab file should look like this:

/dev/sda1   /           ext3    defaults        1    1
none        /dev/pts    devpts  gid=5,mode=620  0    0
none        /dev/shm    tmpfs   defaults        0    0
none        /proc       proc    defaults        0    0
none        /sys        sysfs   defaults        0    0

If you're using your own swap and /mnt mountpoints, remove them from /etc/fstab, as Amazon will use its own when running the machine instance.

In general, please read the Creating an AMI document for many geeky details about what your virtual machine should look like. Do not be alarmed by the overwhelming abundance of data, as most stuff is irrelevant for our test case. What you need to pay attention to are the kernel modules, the partitioning layout and the networking configuration.

Default runlevel & services

Furthermore, to actually be able to connect to your Amazon machine, you will have to have the firewall either disabled or configured to allow incoming SSH connections, and naturally, the SSH service enabled in your default runlevel. By default, SSH uses TCP port 22.

Speaking of runlevels, Xen machines normally boot into the otherwise unused runlevel 4, so you will have to edit your /etc/inittab file and set runlevel 4 as the default. Second, you will have to enable the relevant services in that runlevel. I do not have a magic set to recommend, unfortunately.
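The inittab edit boils down to a single line. Here is a sketch using sed; the INITTAB variable and the scratch copy are my assumptions so you can experiment safely - point it at the real /etc/inittab (as root) when you mean it:

```shell
# Sketch: switch the default runlevel to 4 in an inittab-style file.
# INITTAB defaults to a scratch copy; set INITTAB=/etc/inittab to do it for real.
INITTAB="${INITTAB:-/tmp/inittab.demo}"

# Create a sample file with a stock default-runlevel line, if missing.
[ -f "$INITTAB" ] || echo 'id:3:initdefault:' > "$INITTAB"

# Replace whatever runlevel is currently the default with runlevel 4.
sed -i 's/^id:[0-9]:initdefault:/id:4:initdefault:/' "$INITTAB"

# Show the result - the line should now read id:4:initdefault:
grep '^id:' "$INITTAB"
```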

Likewise, you cannot run X in Amazon (yet), so runlevel 5 is out of the question. If you need help configuring services to run in different runlevels, take a look at my Linux services tutorial.

Network interfaces

You will also have to configure your network device to lease the IP address via DHCP and have the IPv6 protocol disabled. In most cases, this is the default setting, so you will not have to work too hard here.
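On a Red Hat-style system, for instance, the interface configuration lives in /etc/sysconfig/network-scripts/ifcfg-eth0 and might look like this - the values are illustrative, and other distributions keep this in different files:

```
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
IPV6INIT=no
```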

Additional important notes:

VMware products allow you to create dynamically expandable images that grow as they're filled with data. Therefore, a 40GB virtual disk may only weigh 700MB, if it contains only 700MB of data. It is also possible to pre-allocate the size, in which case the disk will be inflated to its full 40GB size. However, to conserve hard disk space, many users will opt to use the first option - dynamic hard disks.

When it comes to converting VMDK to RAW, any benefit gained with the dynamic expansion is lost. The disk image will be inflated to its real size. So be careful when creating virtual machines and pay attention to disk size.

Secondly, Amazon currently supports images only up to 10GB. This means that your VMDK should not exceed 10GB. Otherwise, you won't be able to create the AMI. Use an image too big and you'll get an error:

ERROR: the specified image file <something>.raw is too large

An example of such an error: [screenshot: image too large]
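You can check the size yourself before running the AMI tools. A minimal sketch - the IMAGE name and the demo file it falls back to are assumptions, so point it at your real .raw file:

```shell
# Sketch: verify a raw image does not exceed Amazon's 10GB limit.
# IMAGE is a placeholder name; set it to your actual .raw file.
IMAGE="${IMAGE:-/tmp/demo.raw}"
[ -f "$IMAGE" ] || dd if=/dev/zero of="$IMAGE" bs=1024 count=1 2>/dev/null

SIZE=$(stat -c %s "$IMAGE")          # image size in bytes (GNU stat)
LIMIT=$((10 * 1024 * 1024 * 1024))   # Amazon's 10GB ceiling

if [ "$SIZE" -le "$LIMIT" ]; then
    echo "OK: $IMAGE is $SIZE bytes"
else
    echo "ERROR: $IMAGE exceeds the 10GB limit"
fi
```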

Thirdly, you will need Java and Ruby installed on your Linux machine. And yes, you will need a Linux machine for step 3 of this tutorial. Make sure you satisfy ALL these demands before proceeding with your escapade.

Step 2: Convert VMDK to RAW

Now that we know what we need, we'll use QEMU for conversion. QEMU, available for both Windows and Linux, is a rather powerful, jack-of-all-trades emulator/image utility, which allows you to convert the VMDK files to RAW format.

I've introduced QEMU as a great tool for the creation of virtual hard disks in my VMware Player article, long ago. You can also use it for other disk-related tasks, like conversion.

Now, you will have to convert the VMware disks to raw format. Essentially, this will strip the hard disk image of any smart VMware algorithms and expand it into a sector-by-sector disk image.

QEMU does this well. In fact, you can use QEMU for all sorts of conversions. But currently, we're interested in vmdk > raw. To convert, simply locate the relevant .vmdk file and run the following qemu-img command:

qemu-img convert -O raw source.vmdk target.raw

Let the process run. Depending on your machine specs and the image size, it can take quite a while.

Step 3: Bundle the image with AMI tools

Amazon offers two bundles of tools for working with the EC2 service. One is the set of API tools, a client interface for the EC2 service. The other is the set of AMI tools, a collection of command-line utilities used to create, bundle and upload AMIs to Amazon S3.

We need the AMI tools. Download and extract them, preferably inside your home directory. Now, you will have to run a long and tedious conversion command that will look something like this:

./ec2-bundle-image -i <image> -r <arch> -c <cert> \
-k <key> --user <user id>

Let's explain the options: -i takes the raw image file, -r the target architecture, -c your certificate, -k your private key, and --user your Amazon user ID. One important gotcha: do not use a leading slash when pointing to the certificate and private key, or the tool will refuse the value with an error like this:

--prefix has invalid value 'cert.pem':'/' character not allowed.

Here's an actual screenshot of such an error (output manipulated for clarity):

Slash not allowed

Requirements

But this is not enough. Even if you follow the command to the letter, the ec2-bundle-image will complain. Something like this:

ec2-bundle-image: line 3: EC2_HOME: Neither of EC2_AMITOOL or EC2_HOME environment variables are set

This means you will have to export a few environment variables before using the utility.

BASH

On Bash, you will use the export command, like this:

export EC2_HOME=<path>
export EC2_AMITOOL_HOME=<path>

TCSH

TCSH does not support export. So instead, you will have to use setenv:

setenv EC2_HOME <path>
setenv EC2_AMITOOL_HOME <path>

Both paths should point to the directory where you extracted the AMI tools.

You may also want to export/setenv other variables, which makes the process of typing them manually rather boring. To make things more efficient, you may want to create a file containing all your variables and their values, and then source it when you need to run the AMI tools. Something like this:

source file-containing-all-exports
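Putting it together, the variables file might look like this - the file name and the tools' location are assumptions, so adjust them to wherever you actually extracted the AMI tools:

```shell
# Sketch: a reusable environment file for the AMI tools.
# ENVFILE and the ec2-ami-tools path are placeholder assumptions.
ENVFILE="${ENVFILE:-/tmp/ec2-env.sh}"

cat > "$ENVFILE" <<'EOF'
export EC2_HOME="$HOME/ec2-ami-tools"
export EC2_AMITOOL_HOME="$HOME/ec2-ami-tools"
export PATH="$EC2_HOME/bin:$PATH"
EOF

# Load the variables into the current shell before using the tools.
. "$ENVFILE"
echo "EC2_HOME is $EC2_HOME"
```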

Now, we're ready. Run your long conversion command. And wait. After a while, the process should complete successfully. The emphasis is on the word should, as you have to complete a series of delicate preparations for the thing to work.

Conversion complete

If you did successfully convert the image - VMDK > RAW > AMI - then you're ready to upload it. The indication of a successful conversion is a list of files plus an XML manifest file in the /tmp directory, the default output location.

Directory listing

This concludes the actual conversion process. Your next step is the upload, but this is beyond the scope of this article. We'll talk about this on another occasion. Amazon is here to stay and you're likely to see several more articles on the subject on Dedoimedo in the future.

Conclusion

As you can see, the task is not that simple, but it is manageable and will save you quite a bit of time, bandwidth and ultimately money. Being able to convert images offers you freedom beyond the specific needs of the Amazon EC2 service. We've already seen how to use VMware Converter to this end. Now, QEMU is another power tool we can use to enhance our virtualization needs.

You must satisfy quite a few rules, though, especially in regard to the AMI tools. You must make sure the image is not too big, avoid the leading slash in the certificate and private key paths, make sure the environment variables are exported, and in general, not place the Amazon tools in "unsafe" directories. Now, have fun walking on clouds.

In sequel articles, we will talk about Kiwi, a handsome component of the powerful SUSE Build Service, and see how it can be used to create Xen or even AMI images from custom distributions or even your own physical installation, offering similar capabilities to VMware Converter - and then some. To this end, we'll play with Image Creator and Product Creator. We will also talk about SUSE Studio and many other exciting, revolutionary concepts, ideas and projects.

Markus, you asked about an AMI tutorial? Here you go. Enjoy.

Cheers.