BigGrid virtualisation

Members, assignment and progress

  • Sander Klous - Nikhef (Chair)
  • Ronald Starink - Nikhef
  • Marc van Driel - NBIC
  • Pieter van Beek - SARA
  • Ron Trompert - SARA

Project Proposal, approved by the Executive Team (ET)
Progress Report, completion of phase 1

Meetings

Kick-off - Monday July 6, 2009

Tuesday August 18, 2009

Thursday September 03, 2009

Wednesday September 16, 2009

Wednesday October 28, 2009

Presentations

  • Slides - Sander Klous (Monday July 6, 2009), a summary of the CERN virtual machines workshop (see other information) and an introduction for the kick-off meeting of the BiG Grid virtual machines working group.
  • Slides - Sander Klous (Tuesday August 18, 2009), Class 2 VM scenario.
  • Slides - Pieter van Beek (Tuesday August 18, 2009), Virtual Machines on BiG Grid Hardware.
  • Slides - Ronald Starink (Tuesday August 18, 2009), Policy & Security issues for Class 2/3 Virtual Machines.
  • Slides - Marc van Driel (Thursday September 03, 2009), First steps toward the construction of custom virtual machines that can run on the grid.
  • Slides - Pieter van Beek (Thursday September 03, 2009), Policy & Security issues for Class 2/3 Virtual Machines.
  • Slides - Sander Klous (Wednesday September 16, 2009), Technology study about virtual machine infrastructure.
  • Slides - Sander Klous (Monday September 21, 2009), Invited talk at EGEE'09 on Grid and Cloud integration.
  • Slides - Sander Klous (Thursday February 4, 2010), Progress report at the BiG Grid Executive Team meeting (completion of phase 1).

User requests and requirements

Pjotr Prins

-----Original Message-----
From: Pjotr Prins [mailto:pjotr2009@thebird.nl] 
Sent: Wednesday, August 26, 2009 5:43 PM
To: Tom Visser
Subject: [harm.nijveen@wur.nl: [Fwd: [Bioassist-users] - opportunity: small scale cloud pilot: Claudia -]]

Dear Tom,

I am working on an application for the cloud. It is a program
written in Erlang for genome-wide searching for gene interactions.
In principle it runs under Debian Linux. My idea is to scale up
quickly on EC2. I am also working with Jack Leunissen on running XEN
images on his cluster, so that I can pull those in as well. I have
been running XEN on my own systems (some four compute servers) for years.

My software could also run on the GRID, with some effort, but I
find it very attractive to bring my own environment along, partly
because I want to run specific versions of Erlang, gcc and glibc
(for reasons of correctness). So in practice it cannot run on the
GRID.

In short, I am interested in participating.

Pjotr Prins
Wageningen University

Open Issues

  • Network Address Translation - What is the load?
  • Virtual Machine Isolation - Prohibit internal network connectivity with IPTables? (see the sketch after this list)
  • Image repository - Storage Area Network or distributed over worker nodes?
  • Image access - ACL based on grid certificate?
  • Policy document for Virtual Machines
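
To make the isolation and NAT questions above more concrete, the sketch below shows one possible iptables approach on a Xen dom0: drop forwarded guest traffic towards internal subnets and NAT outbound guest traffic through the uplink. The interface patterns, guest subnet and internal ranges are assumptions for illustration only, not the testbed's actual configuration.

# Assumption: guest traffic enters dom0 via vif* interfaces and bridged
# traffic is handed to iptables:
sysctl -w net.bridge.bridge-nf-call-iptables=1
# Block guest traffic towards internal (RFC1918) ranges:
iptables -I FORWARD -m physdev --physdev-in vif+ -d 10.0.0.0/8 -j DROP
iptables -I FORWARD -m physdev --physdev-in vif+ -d 172.16.0.0/12 -j DROP
# NAT outbound guest traffic (hypothetical guest subnet) through eth0;
# the load of this NAT step is the open question listed above:
iptables -t nat -A POSTROUTING -s 192.168.122.0/24 -o eth0 -j MASQUERADE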

Infrastructure

We are setting up a testbed to investigate technical issues related to virtual machine management.

Hardware and Operating Systems

  • Two Dell 1950 machines, dual CPU, 4 cores per CPU
    • One machine has a CentOS-5 installation
    • One machine has a Debian-squeeze installation

Software

  • CentOS-5 comes with Xen 3.0
  • Debian-squeeze comes with Xen 3.3
    • Debian-squeeze Xen packages have a problem with tap:aio.
Fix:
ln -s /usr/lib/xen-3.2-1/bin/tapdisk /usr/sbin
echo xenblktap >> /etc/modules
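For context, tap:aio is the blktap disk backend referenced from Xen domU configuration files; a typical disk line that depends on it looks like the hypothetical example below, which is the kind of configuration affected by the problem above.
# Hypothetical domU disk specification using the blktap aio backend;
# the image path and device name are examples only.
disk = [ 'tap:aio:/var/lib/xen/images/example.img,xvda,w' ]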
  • OpenNebula has been installed (stand-alone) on CentOS-5 following this guide
    • A few additional steps were needed:
      • Install rubygems and rubygem-sqlite3
      • The opennebula user has to be added to the sudoers file for xm and xentop
      • Sudoers should not require a tty
wget ftp://fr.rpmfind.net/linux/EPEL/5/x86_64/rubygem-sqlite3-ruby-1.2.4-1.el5.x86_64.rpm
wget ftp://fr.rpmfind.net/linux/EPEL/5/x86_64/rubygems-1.3.1-1.el5.noarch.rpm
sudo rpm -Uvh rubygems-1.3.1-1.el5.noarch.rpm rubygem-sqlite3-ruby-1.2.4-1.el5.x86_64.rpm 
In /etc/sudoers (on all machines)
opennebula ALL = NOPASSWD: /usr/sbin/xm
opennebula ALL = NOPASSWD: /usr/sbin/xentop
#Defaults    requiretty
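A quick way to check the sudo setup afterwards (using the opennebula account from the entries above):
# List the sudo privileges granted to the opennebula user (run as root):
sudo -l -U opennebula
# Non-interactive test; should print the domain list without a password prompt:
sudo -u opennebula sudo -n /usr/sbin/xm list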
  • Installed iSCSI target and client software for shared image repository
    • Howtos: Debian client/server, CentOS client, CentOS server
    • Switched iSCSI target software from default TGT to IET.
      • IET offers a blockio mode instead of the fileio mode, which disables caching on the iSCSI target. The interplay between iSCSI and LVM is a sensitive issue: caching is being investigated for performance reasons (see Local caching below), but iSCSI-level caching does not provide the required features and is not configurable enough to make this work.
    • Maybe test later with encrypted iSCSI
    • Two new machines ordered with the required iSCSI offload
  • Image repository consists of LVM volume groups (see the sketch below)
    • Performance of LVM volumes is better than that of file-based images
    • Each logical volume contains an image
    • This allows easy creation and deletion of images
    • VMs can run from cloned (Copy-On-Write) images
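
As an illustration of the repository layout described above, the sketch below creates a logical volume for a base image and a Copy-On-Write clone from which a VM could run. Volume group, volume names and sizes are hypothetical.

# Hypothetical volume group vg_images holding the image repository.
# Create a logical volume and copy a base image into it:
lvcreate -L 10G -n sl5-base vg_images
dd if=sl5-base.img of=/dev/vg_images/sl5-base bs=1M
# Create a Copy-On-Write clone (LVM snapshot) for one VM instance:
lvcreate -s -L 2G -n sl5-clone-001 /dev/vg_images/sl5-base
# The VM uses /dev/vg_images/sl5-clone-001 as its disk; deleting the
# instance is simply:
lvremove -f /dev/vg_images/sl5-clone-001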

Implementation issues

iSCSI image management has been implemented for OpenNebula following the storage guide

In /opt/opennebula/etc/oned.conf:
TM_MAD = [
   name       = "tm_iscsi",
   executable = "one_tm",
   arguments  = "tm_iscsi/tm_iscsi.conf",
   default    = "tm_iscsi/tm_iscsi.conf" ]
The following files were added or modified:
/opt/opennebula/etc/tm_iscsi/tm_iscsi.conf
/opt/opennebula/etc/tm_iscsi/tm_iscsirc
/opt/opennebula/lib/tm_commands/iscsi/tm_clone.sh
/opt/opennebula/lib/tm_commands/iscsi/tm_delete.sh
/opt/opennebula/lib/tm_commands/iscsi/tm_ln.sh
/opt/opennebula/lib/tm_commands/iscsi/tm_mkimage.sh
/opt/opennebula/lib/tm_commands/iscsi/tm_mkswap.sh
/opt/opennebula/lib/tm_commands/iscsi/tm_mv.sh
.../one-1.2.0/src/vmm/XenDriver.cc
.../one-1.2.0/src/tm/TransferManager.cc
In /etc/sudoers (on all machines)
opennebula ALL = NOPASSWD: /usr/sbin/lvcreate
opennebula ALL = NOPASSWD: /usr/sbin/lvremove
opennebula ALL = NOPASSWD: /usr/sbin/lvchange
opennebula ALL = NOPASSWD: /usr/sbin/lvrename
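
As an illustration of what the transfer manager scripts do in this setup, the sketch below outlines a hypothetical clone step along the lines of tm_clone.sh: it creates a Copy-On-Write LVM snapshot of the base image inside the iSCSI-exported volume group, so the VM gets its own writable copy. The argument convention, names and snapshot size are assumptions; the actual scripts listed above may differ.

#!/bin/bash
# Hypothetical sketch of an iSCSI/LVM clone step (not the actual tm_clone.sh).
# Transfer manager scripts are called as: tm_clone.sh <SRC> <DST>,
# where both arguments are assumed to be in host:path form.
SRC=$1
DST=$2
SRC_DEV=${SRC#*:}                             # strip the "host:" prefix
DST_PATH=${DST#*:}
CLONE_NAME="one-$(basename "$DST_PATH")-$$"   # unique LV name (assumption)
# Copy-On-Write snapshot of the base image in the shared volume group;
# the VM runs from the snapshot, the base image remains untouched:
sudo /usr/sbin/lvcreate -s -L 2G -n "$CLONE_NAME" "$SRC_DEV"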

Scalability

The implementation above works fine for a minimalistic scenario. However, when more than one VM needs to boot concurrently from the same virtual machine image, some changes are required. First of all, the cluster extensions for LVM (clvm) should be enabled. Note that the locking mechanism of the cluster LVM daemon relies on the Red Hat cluster management tools (cman). To get cluster LVM running, the following packages and minimal configuration files were installed (a sketch of a minimal cluster.conf follows the list):

  • On CentOS: cman and lvm2-cluster
  • On Debian: cman, libcman and clvm
  • Configuration files (both systems): /etc/cluster/cluster.conf and /etc/cluster/lvm.conf
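
As a minimal sketch (cluster and node names are assumptions, not the actual testbed hosts), a two-node cluster.conf for cman could look like the snippet below. In addition, LVM has to be switched to cluster-wide locking, typically locking_type = 3 in lvm.conf, so that clvmd handles the locks.

<?xml version="1.0"?>
<!-- Hypothetical two-node configuration; names and the two_node settings
     are examples only, and no fencing is configured here. -->
<cluster name="vmtest" config_version="1">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="vm-host-1" nodeid="1" votes="1"/>
    <clusternode name="vm-host-2" nodeid="2" votes="1"/>
  </clusternodes>
</cluster>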

The second issue is that LVM snapshots and clones are not (yet) cluster aware. Fortunately, for our purpose they do not need to be, because the base virtual machine image does not change during use. The workaround is to create a cluster-aware Copy-On-Write partition in LVM. This partition is activated exclusively on the worker node and mapped, together with the (cluster-aware) base virtual machine image, to a local snapshot with 'dmsetup' (see the sketch below). When modifications have to be stored after a shutdown of the virtual machine, it is sufficient to remove the mapping and synchronize the cluster-aware Copy-On-Write partition by deactivating it on the worker node. The Copy-On-Write clone is then activated in the repository for the entire cluster, which makes it accessible from all worker nodes, and it can be used by virtual machines like any other base image. In other words, Copy-On-Write clones are fully recursive. A few modifications were needed to implement this:
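
The sketch below illustrates such a local snapshot mapping with dmsetup. Device and volume names are hypothetical; the device-mapper 'snapshot' target takes the origin device, the COW device, a persistence flag and a chunk size in sectors.

# Hypothetical names: cluster-aware base image and its Copy-On-Write volume.
BASE=/dev/vg_images/sl5-base
COW=/dev/vg_images/sl5-cow-001
# Activate the COW volume exclusively on this worker node:
sudo /usr/sbin/lvchange -aey "$COW"
# Map base + COW to a local snapshot with the device-mapper 'snapshot' target:
SIZE=$(sudo /sbin/blockdev --getsz "$BASE")    # size in 512-byte sectors
echo "0 $SIZE snapshot $BASE $COW p 64" | sudo /sbin/dmsetup create vm-001-disk
# The VM boots from /dev/mapper/vm-001-disk.
# After the VM shuts down: remove the mapping and re-activate the COW volume
# cluster-wide, so the stored modifications are visible from all worker nodes:
sudo /sbin/dmsetup remove vm-001-disk
sudo /usr/sbin/lvchange -an "$COW"
sudo /usr/sbin/lvchange -ay "$COW"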

In /etc/sudoers (on all machines)
opennebula ALL = NOPASSWD: /usr/sbin/lvs
opennebula ALL = NOPASSWD: /sbin/dmsetup
opennebula ALL = NOPASSWD: /sbin/blockdev

Note (ToDo): the current transfer management scripts require /dev/<Virtual Machine VG> to be owned by the opennebula user. This is not a problem in itself, but on a reboot the owner is reset to root. The requirement can be dropped entirely with some minor changes to the scripts:

  • Use UUIDs to move between LVM and dmsetup.
  • Create the COW image on the local node, move it with dmsetup and put the snapshot in place.
    • This would avoid the need to create an additional link to boot the VM.
    • This already works when the VM has finished and the image is activated in the repository.

Local caching

Network traffic for Virtual Machine management can be optimized significantly with two caches on each worker node:

  1. A read cache for the original Virtual Machine image to facilitate reuse on the same worker node.
  2. A write-back cache for the copy-on-write clone to allow local writes when the virtual machine is active.

If requested by the user, the copy-on-write clone can be synchronized with the image repository when the virtual machine has finished. After this synchronization, the write-back cache becomes obsolete and can be removed. We implemented both the read cache and the write-back cache at block device level (i.e. at the iSCSI/LVM level) with dm-cache. One LVM partition on the worker node serves as a persistent local read cache for the virtual machine image. Another LVM partition on the worker node serves as a transient local write-back cache for the copy-on-write clone. The transient cache is created and removed on demand by OpenNebula (a sketch of the intended layering follows below).
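
A rough sketch of the intended layering is shown below. The local volume group and cache sizes are assumptions, and the exact dm-cache table syntax depends on the dm-cache version, so the mapping step is only indicated in comments.

# Hypothetical local volume group vg_local on the worker node.
# Persistent read cache for the base virtual machine image:
lvcreate -L 20G -n cache-sl5-base vg_local
# Transient write-back cache for the COW clone (created and removed on
# demand by OpenNebula):
lvcreate -L 5G -n cache-sl5-cow-001 vg_local
# dm-cache would then map:
#   (iSCSI-backed base image, cache-sl5-base)    -> locally read-cached origin
#   (iSCSI-backed COW clone,  cache-sl5-cow-001) -> locally write-back-cached COW
# and these two cached devices feed the dmsetup snapshot mapping shown in
# the Scalability section above.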

Unfortunately, no CentOS or Debian packages are available for dm-cache. Here is the recipe to build the kernel module from source.

On debian:
apt-get install linux-source linux-patch-debian
cd /usr/src
tar jxf linux-source-2.6.26.tar.bz2
/usr/src/kernel-patches/all/2.6.26/apply/debian -a x86_64 -f xen
cd linux-source-2.6.26
cp /boot/config-2.6.26-2-xen-amd64 .config
<In the Makefile: EXTRAVERSION = -2-xen-amd64>
make prepare
cp /usr/src/linux-headers-2.6.26-2-xen-amd64/Module.symvers .
cp -r /usr/src/linux-kbuild-2.6.26/scripts/* scripts
cd
wget -O dm-cache.tar.gz http://github.com/mingzhao/dm-cache/tarball/master
tar zxvf dm-cache.tar.gz
cd dm-cache/2.6.29
ln -s /usr/src/linux-source-2.6.26/drivers/md/dm.h .
ln -s /usr/src/linux-source-2.6.26/drivers/md/dm-bio-list.h .
<In dm-cache.c: change BIO_RW_SYNCIO to BIO_RW_SYNC (line 172)>
<Create Makefile>
make
insmod dm-cache.ko
cp dm-cache.ko /lib/modules/2.6.26-2-xen-amd64/kernel/drivers/md/dm-cache.ko
depmod -a
On CentOS:
yum install kernel-devel
mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
echo '%_topdir %(echo $HOME)/rpmbuild' > ~/.rpmmacros
rpm -Uvh kernel-<version>.src.rpm
cd ~/rpmbuild/BUILD/kernel-<version>/linux-<version>
cp /boot/config-2.6.18-128.7.1.el5xen .config
<In the Makefile: EXTRAVERSION = -128.7.1.el5xen>
make prepare
cp /usr/src/kernels/2.6.18-128.7.1.el5-xen-x86_64/Module.symvers .
cp -a /usr/src/kernels/2.6.18-128.7.1.el5-xen-x86_64/scripts .
cd
wget -O dm-cache.tar.gz http://github.com/mingzhao/dm-cache/tarball/master
tar zxvf dm-cache.tar.gz
cd dm-cache/2.6.19
<Extract dm-cache.c from patch>
ln -s ~/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64/drivers/md/dm.h .
ln -s ~/rpmbuild/BUILD/kernel-2.6.18/linux-2.6.18.x86_64/drivers/md/dm-bio-list.h .
<In dm-cache.c change the dm-io.h include to: #include <linux/dm-io.h> >
<Create Makefile>
make
insmod dm-cache.ko
cp dm-cache.ko /lib/modules/2.6.18-128.7.1.el5xen/kernel/drivers/md/dm-cache.ko
depmod -a
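
After a successful build on either system, a quick sanity check is to verify that the module is loaded and its target has been registered with the device mapper:

lsmod | grep dm_cache
dmsetup targets    # the dm-cache target should appear in this list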

So far, I have not been able to make dm-cache work on either CentOS-5 (kernel panic) or Debian-squeeze (cache writes corrupt the image). I will contact Ming Zhao (the author of this code) at a later stage to sort things out.

Performance tests

An overview of the performance of the VM test cluster can be found on the Nikhef Ganglia monitoring page.

File I/O performance

Network performance

Realistic load performance

Links