Virtualizing the UCSF Cluster.

Written by  on July 19, 2009

The computing cluster we maintain at UCSF is built on stock Ubuntu Hardy (desktop), with software added either to match the previous configuration or as needed by the experiments being conducted on the system. This decision was a compromise between a user community that was familiar with Ubuntu and the need for a stable (LTS) platform from an administration perspective. Since the hardware in the cluster is aging and reinstalling systems that require on-site administration is expensive, we are evaluating alternatives such as the Amazon Elastic Compute Cloud (EC2). This is an overview of an attempt to create a virtual instance of the servers used in the cluster. (The process is actually done twice, since the head node is built on Intrepid (desktop) because its processor configuration was not compatible with Hardy’s install kernel.)

Out of the hundreds of prebuilt images available for the “Elastic Cloud”, there are two sets of EC2 AMIs that closely match the platform we are running. One is from Alestic and the other is a relatively new bundle from Ubuntu.

My initial attempt to create a package used the Alestic desktop image. On top of taking more than 7 hours to merge in the packages installed on the reference platform, the result was completely unusable. Some of this was due to my not catching packages that do not work on EC2, such as grub and the kernel installed on the reference platform. Unlike VMware’s virtualization, the Amazon cloud is pretty particular about a few things. For this reason I found it prudent to start with systems that are native to the cloud and modify them.

The Ubuntu AMI advertises itself as “server”, which is bare bones to the point of uselessness. Comparing the installed packages on the reference platform to those on the server left over 500 packages to sort through.
Fortunately the running instance lets you install a set of packages that includes the desktop. This left us with a delta of around 100 packages. Eliminating the packages that were not likely to work on EC2, such as the standard kernel and grub, resulted in fewer than 90 new packages to install.

Once I had set up the EC2 tools and environment in accordance with Amazon’s documentation, I created an instance of the Ubuntu AMI (ami-5d59be34).

$ ec2-run-instances ami-5d59be34 -k n01-keypair
$ ec2-describe-instances (...wait until it's running)
$ ssh -i id_rsa-furm-keypair
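
The “wait until it's running” step can be scripted instead of rerunning the command by hand. What follows is only a sketch: the function names instance_state and wait_for_running are mine, the 15 second interval is arbitrary, and rather than assume which column holds the state it scans each INSTANCE line for a known state word.

```shell
#!/bin/sh
# instance_state: read ec2-describe-instances output on stdin and print the
# state of the instance whose id is given as $1.
# Assumption: the state appears as a field of its own on the INSTANCE line.
instance_state() {
    awk -v id="$1" '
        $1 == "INSTANCE" && $2 == id {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^(pending|running|shutting-down|terminated)$/) {
                    print $i
                    exit
                }
        }'
}

# wait_for_running: poll until the instance reports "running".
# Note that this loops forever if the instance never reaches that state.
wait_for_running() {
    while [ "$(ec2-describe-instances "$1" | instance_state "$1")" != "running" ]; do
        sleep 15
    done
}
```

With these defined, “wait_for_running” followed by the instance id printed by ec2-run-instances would block until it is safe to ssh in.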

With the basic instance installed you can run “sudo tasksel” and select Ubuntu desktop (in addition to the ssh server and any other appropriate packages). Once the installation completes, you can get a list of the installed packages, compare it to the packages on the reference platform, and install the additional packages using aptitude. I use aptitude since it handles dependencies more ‘apt’ly than apt-get.

# dpkg --get-selections|grep install|cut -f1>ubuntu-intrepid-ec2.packages
# exit
$ exit
$ scp -i ~/.ssh/id_rsa-n01-keypair .
$ dpkg --get-selections|grep install |cut -f1 >reference.packages
$ diff ubuntu-intrepid-ec2.packages reference.packages |grep \>|sed 's/> //'>packages2add
$ nano packages2add (...delete linux*, grub and any kernel packages)
$ scp -i ~/.ssh/id_rsa-n01-keypair packages2add
$ ssh -i ~/.ssh/id_rsa-n01-keypair
$ sudo bash
# for p in `cat packages2add` ; do echo $p;  aptitude -y install $p; done
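
The diff and hand-editing steps above can also be combined into one filter. This is a sketch under my own assumptions rather than what I actually ran: packages_to_add is a made-up name, and the pattern that drops anything starting with “linux-” or “grub” is only an approximation of the manual cleanup, so the result still deserves a look before you install from it.

```shell
#!/bin/sh
# packages_to_add INSTANCE_LIST REFERENCE_LIST
# Print packages that appear in REFERENCE_LIST but not in INSTANCE_LIST,
# leaving out kernel and boot loader packages (assumed pattern, see above).
packages_to_add() {
    inst=$(mktemp) || return 1
    ref=$(mktemp) || { rm -f "$inst"; return 1; }
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort -u "$1" > "$inst"
    LC_ALL=C sort -u "$2" > "$ref"
    # comm -13 keeps only the lines unique to the second file.
    LC_ALL=C comm -13 "$inst" "$ref" | grep -Ev '^(linux-|grub)'
    rm -f "$inst" "$ref"
}
```

For example, “packages_to_add ubuntu-intrepid-ec2.packages reference.packages >packages2add” would produce the same kind of list as the diff pipeline, without the lines that only exist on the instance.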

At this point you should check the new package manifest against the reference again and reboot the instance to make sure that you didn’t break anything before using your modified instance to create a new bundle. The Ubuntu AMIs come with two disks defined by default, / and /mnt, which are close to the same size. You use /mnt to create the image. The process is convoluted at best. There is a pretty good description at

$ scp -i ~/.ssh/id_rsa-n01-keypair  *.pem
cert-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.pem     100%  916     0.9KB/s   00:00
pk-XXXXXXXXXXXXXXXXXXXXXXXXXXXXX.pem       100%  922     0.9KB/s   00:00
$ ssh -i ~/.ssh/id_rsa-n01-keypair
$ sudo bash
# ec2-bundle-vol -r i386 -d /mnt -p ucsf-intrepid-head-node \
-s 10240 -e /mnt,/tmp,/root/.ssh,/home/ubuntu/.ssh
Copying / into the image file /mnt/ucsf-intrepid-head-node...
... <<<<<<< wait for a small eternity >>>>>>>
Bundling image file
... <<<<<<< wait for another small eternity >>>>>>>
Creating bundle manifest...
ec2-bundle-vol complete..
# ec2-upload-bundle -b ucsfintrepid -m /mnt/ucsf-intrepid-head-node.manifest.xml   \
Creating bucket...
Uploading bundled image parts to the S3 bucket ucsfintrepid ...
Uploaded ucsf-intrepid-head-node.part.000
Uploaded ucsf-intrepid-head-node.part.167
Uploading manifest ...
Uploaded manifest.
Bundle upload completed.

Once this is done you have to register the bundle. This is done from the local machine (the reference platform).

$ ec2-register ucsfintrepid/ucsf-intrepid-head-node.manifest.xml
IMAGE    ami-f339d89a
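
If you would rather not copy the AMI id by hand, it can be pulled out of the ec2-register output. A small sketch, assuming the output is the single “IMAGE” line shown above; the function name ami_id_from_register is hypothetical.

```shell
#!/bin/sh
# ami_id_from_register: read ec2-register output on stdin and print the id
# from the first "IMAGE <ami-id>" line.
ami_id_from_register() {
    awk '$1 == "IMAGE" { print $2; exit }'
}

# Example use (commented out so that sourcing this file launches nothing):
# ami=$(ec2-register ucsfintrepid/ucsf-intrepid-head-node.manifest.xml | ami_id_from_register)
# ec2-run-instances "$ami" -k n01-keypair
```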

Then you can test load it.

$ ec2-run-instances ami-f339d89a -k n01-keypair
$ ec2-describe-instances
...INSTANCE    i-af261cc6    ami-f339d89a    domU-12-31-39-00-65-C8.compute-1.internal    running    n01-keypair    0        m1.small    2009-07-20T03:41:11+0000    us-east-1c    aki-714daa18    ari-6a5bbc03        monitoring-disabled
$ ssh -i ~/.ssh/id_rsa-n01-keypair

If this is successful you can terminate your instances. Note that the loop below terminates every instance listed for the account, including the new test instance.

$ for i in `ec2-describe-instances |grep INSTANCE |cut -f2` ; do ec2-terminate-instances $i; done
INSTANCE    i-cd4f75a4    running    shutting-down
INSTANCE    i-af261cc6    running    shutting-down

(This is important since Amazon charges a nominal amount for each instance’s runtime.)
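
Since the loop above shuts down everything, a more selective variant may be useful while you are still testing the new image. This is a sketch, assuming the AMI id is the third field of each INSTANCE line as in the ec2-describe-instances listing earlier; the function names are mine.

```shell
#!/bin/sh
# instances_except_ami: read ec2-describe-instances output on stdin and print
# the ids of all instances that were NOT launched from the AMI given as $1.
instances_except_ami() {
    awk -v keep="$1" '$1 == "INSTANCE" && $3 != keep { print $2 }'
}

# terminate_except_ami: terminate every instance except those launched from
# the AMI given as $1, one id at a time.
terminate_except_ami() {
    ec2-describe-instances | instances_except_ami "$1" |
    while read -r id; do
        ec2-terminate-instances "$id"
    done
}
```

Running “terminate_except_ami ami-f339d89a” would then leave the instance started from the newly registered image alone and stop the rest.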