Monday, February 20, 2006
Interview on Xen with Manuel Bouyer
Virtualization requires guest OSes to be built for the host machine's processor. It should not be confused with emulation, which does not have this requirement: when an OS runs on top of a virtualizer, its code runs unchanged on the processor, whereas an emulator has to interpret the guest OS's code. MAME and Basilisk are examples of emulators.
Binary compatibility is yet another, distinct feature: it is the ability of an OS to run applications built for another OS. For instance, BSD systems are able to run Linux binaries.
Manuel Bouyer is a NetBSD developer who has been involved in kernel hacking for many years. He recently added support for Xen to NetBSD, based on Christian Limpach's work for the Xen team.
In this interview, Manuel tells us what is so good about Xen, and what work was required to get it running on NetBSD.
ED: Hello Manuel, and thank you for accepting this interview. First, could you tell our readers how virtualization actually works? It is easy to understand that the guest OS code will run unchanged, but what happens when it tries to access the hardware? One could expect conflicts if both the host and the guest OS try to use a disk controller at the same time...
MB: The most common approach is that the virtualizer traps accesses to devices and schedules accesses to the hardware. For example, for a shared disk device, the virtualizer will trap each guest's access to the device and issue the command to the device itself after checking the request against the guest's privileges. Once the command completes, it will give the result back to the guest. The virtualizer will also apply scheduling policies to device accesses to ensure a proper share of resources, as defined by the administrator (for example, to make sure one guest won't use all the disk or network bandwidth).
ED: The host OS also has to control the guest OS memory accesses, to make sure that it does not screw up the host OS memory. How is it done?
MB: The answer to this is highly dependent on the hardware. It won't work the same way on Sun or IBM systems, which have hardware support for virtualization, as on ordinary x86 hardware, where this has to be emulated. In the latter case, the virtualizer won't let the guest's kernel run in privileged mode. When a privileged instruction occurs, a trap is generated and the virtualizer emulates the instruction after checking that it's within the domain's bounds (for example, that the domain doesn't try to map a page which doesn't belong to its memory space).
ED: I am a bit confused about the virtualizer not allowing the guest kernel to run in privileged mode (also known as kernel mode). When an OS boots, it runs the processor in kernel mode. If the guest OS kernel is started in user mode, how does it avoid a panic on the first privileged instruction?
MB: When the guest kernel executes a privileged instruction or tries to write to protected memory (e.g. a page table), a fault is generated, which is handled by the virtualizer (not by the guest's kernel). The virtualizer will then decode the faulting instruction and take the appropriate action: really execute the privileged instruction if it's OK, do the memory operation if it's OK, or forward the fault to the guest's kernel if it's a "real" fault.
ED: Let us talk about Xen, now. This virtualizer claims an unprecedented level of performance. How is it achieved?
MB: I can talk about the differences between Xen and VMware because it's what I know best. The Xen papers also present benchmarks against User Mode Linux, but I don't know how User Mode Linux works so I can't really comment on this one.
VMware emulates real hardware so that an unmodified OS can run on top of the virtualizer. Run an OS in VMware, and this OS will see some kind of VGA adapter for the console along with the usual PS/2 keyboard and mouse, some kind of network interface (a DEC Tulip, if my memory is right), and some kind of IDE or SCSI adapter for mass storage. To achieve this, the virtualizer has to emulate these devices. This means emulating their registers and interrupts, quite often with MMU faults when these fake registers are accessed, so a single I/O operation will most likely require several context switches. The same goes for the MMU: accesses to privileged CPU registers and page tables and the use of privileged instructions will all generate context switches.
Xen's virtualizer (the hypervisor) doesn't emulate real hardware but offers an interface to the guest OS for these operations. This means that the guest OS has to be modified to use these interfaces. But then these interfaces can be optimized for the task, allowing the guest to batch large numbers of operations into a single communication with the hypervisor, and so limiting the number of context switches required for I/O or MMU operations.
ED: Your work was not just about making NetBSD a Xen host but also a Xen guest. What were the required changes for running as a guest? Were they limited to fake device drivers and machine-dependent parts of memory management (the kernel part known as pmap)? Or did you have other things to change?
MB: To make it clear: I didn't start from scratch. My first task was to take the NetBSD patches provided by the Xen team (more specifically, by Christian Limpach) and include them in NetBSD's sources. This allowed NetBSD to run as a non-privileged guest. Then I added support for domain0 (in Xen, domain0 is a "privileged" guest OS which has access to the hardware, controls the hypervisor and starts other guest OSes. It also runs the driver backends, which provide block and network devices to the other guests). Now, to answer your question: no changes were required in the existing NetBSD sources; all the needed bits were added as new code in NetBSD/Xen. Then some changes were made in the i386-specific sources to allow more code sharing between NetBSD/i386 and NetBSD/Xen (but this was mostly cosmetic, like splitting a file into parts that can be shared and parts that can't).
ED: You introduced a new NetBSD port for running NetBSD as a Xen guest, called NetBSD/xen. Would it have been possible to merge Xen guest support into NetBSD/i386 instead of making a new port? What would have been the pros and cons of such an approach?
MB: Well, Christian introduced it, but I think it was the right approach. i386 is the name of the processor (MACHINE_ARCH in NetBSD kernel terminology), while xen is the name of the architecture (MACHINE). NetBSD/xen is different from NetBSD/i386 in the same way NetBSD/mac68k and NetBSD/sun3 are different, even though they run on the same processor and can share binaries. Note that, even though Xen is a different port, there's no separate xen distribution; it's merged with i386: the i386 binary sets contain the Xen kernels.
ED: What about portability? For now NetBSD supports Xen as host or guest on i386 only. Will we see support for other architectures, for instance amd64 or powerpc?
MB: There's no reason it can't happen (but then NetBSD/xen will have to be renamed NetBSD/xeni386 first :). I'm interested in working on xenamd64 myself.
ED: Since you helped port it, I imagine you were interested in using Xen. Can you tell us how you use it?
MB: The primary use was to merge several different servers from an overloaded machine room onto a single piece of hardware. These servers are managed by different groups of people, running different services and/or different OSes, and for all of them the hardware was underused. Thanks to Xen I've been able to merge 10 servers onto 2 physical boxes. Another use is to build virtual network platforms for network experiments: each node in the testbed can be a Xen guest instead of a physical system.
And finally, Xen is a wonderful platform for kernel development. Instead of a separate test machine, you can use a Xen guest. A Xen guest boots much faster, and when something goes bad you don't have to reboot to an older kernel just to get back a working system from which to install a new kernel. And, as the hypervisor can export PCI devices to guest OSes, you can even use a Xen guest for driver development.
ED: From the security point of view, having different physical servers is a good thing, since an attacker compromising one machine may not be able to compromise another one. One could imagine an attack against the Xen virtualizer, enabling the superuser of a guest OS to get root on the host OS. Is it likely to happen?
MB: Of course this is something that could happen, and anyone using virtualization to host different systems has to keep it in mind. As usual it's a matter of risk vs. benefits. The hypervisor is in fact quite small - only 240 source files for Xen 2, 77000 lines of code - so it's easy to audit and maintain. Also, the interactions between the guest kernel and the hypervisor are mostly limited to events (which are just a bitmask); there are only a few places where a guest kernel can pass a pointer to arbitrary data to the hypervisor. So I think the chances of compromising the hypervisor are very small - much smaller than the chances of getting control of a regular Unix kernel. Then there is the risk of a guest compromising domain0's kernel through the driver backends. Again, these drivers are small, so the risk should be low. I think the main risk is getting domain0 compromised through a bug in some network tool, via the virtual network shared with the guests.
ED: Let us talk about the configuration. How complex is it to set up a Linux running as a guest of NetBSD, for instance?
MB: There are two steps: first install NetBSD on top of Xen, then add more guests. This is documented on the Xen port page on NetBSD's web site. The first step is easy: do a regular NetBSD/i386 install, install grub and the xen packages, and create the xen devices in /dev (a simple MAKEDEV xen is enough). Add a grub entry that loads the Xen kernel with the netbsd-XEN0 kernel as a module, and you're done: you now have a Xen system running with NetBSD as domain0.
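As a rough sketch, that domain0 setup might look like this (the pkgsrc package names and the grub entry below are from the Xen 2 era and are given only as an illustration; check the NetBSD Xen howto for the exact names on your system):

    # after a regular NetBSD/i386 install, from pkgsrc:
    cd /usr/pkgsrc/sysutils/grub && make install
    cd /usr/pkgsrc/sysutils/xenkernel20 && make install   # the Xen 2 hypervisor
    cd /usr/pkgsrc/sysutils/xentools20 && make install    # xm, xend, sample configs
    # create the xen pseudo-devices
    cd /dev && sh MAKEDEV xen
    # then add an entry like this one to grub's menu.lst:
    #   title Xen 2 / NetBSD (domain0)
    #     kernel (hd0,0)/xen.gz dom0_mem=65536
    #     module (hd0,0)/netbsd-XEN0 root=wd0a ro console=pc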
Adding a guest isn't much more difficult: you need a free partition or a large file for the virtual disk (you can create this file using dd(1), for example) and a bridge interface pre-configured with your real Ethernet interface. Then create a Xen config file (you just need to change a few lines in the provided sample) pointing to the kernel you want to load, the file or partition to use as the virtual disk, and the bridge to use for the virtual network.
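For illustration, creating the disk file, the bridge, and the guest config could look like the following (device and file names here are made up; the config lines follow the Xen 2 xm format used by the sample files):

    # create a 4 GB file to back the guest's virtual disk
    dd if=/dev/zero of=/home/xen/guest1.img bs=1m count=4096
    # bridge the guests' virtual interfaces to the real ethernet (here fxp0)
    ifconfig bridge0 create
    brconfig bridge0 add fxp0 up
    # then adjust a copy of the sample config file, e.g.:
    #   kernel = "/home/xen/netbsd-XENU"
    #   memory = 64
    #   name   = "guest1"
    #   disk   = [ 'file:/home/xen/guest1.img,0x1,w' ]
    #   vif    = [ 'bridge=bridge0' ]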
A guest won't run properly without a root filesystem on its virtual disk (note that it's possible to use an NFS root too, as you've got networking). For a NetBSD guest it's easy: create your domain with the netbsd-INSTALL_XENU kernel and the guest will boot into sysinst, just like a regular NetBSD install. Once the install is complete, replace netbsd-INSTALL_XENU with netbsd-XENU and you've got your guest running NetBSD. This is really fast: creating a new NetBSD guest only takes a few minutes, the most time-consuming part being creating the big file for the virtual disk with dd :)
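A minimal sketch of that install cycle with the xm tool (the config path is hypothetical):

    # with kernel = "/home/xen/netbsd-INSTALL_XENU" in the config file,
    # create the domain with its console attached; sysinst runs there
    xm create -c /usr/pkg/etc/xen/guest1
    # ...run through sysinst as for a regular install, then halt the guest...
    # point the config's kernel line at /home/xen/netbsd-XENU and boot it:
    xm create -c /usr/pkg/etc/xen/guest1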
For Linux it's a bit more tricky, because of the different distributions and install procedures available. It should be possible to create a Linux XENU kernel that launches an install procedure, just like NetBSD's INSTALL_XENU, but I'm not aware of such a kernel. The documented way is to format the virtual disk as ext2 from domain0 and populate it with Linux binaries; the easiest way is to copy them from an existing, running Linux system. But once you've got your virtual disk populated you can easily create as many guests as you want: just copy the file backing the virtual disk.
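A sketch of populating an ext2 virtual disk from a NetBSD domain0, assuming mke2fs is available (e.g. from pkgsrc's sysutils/e2fsprogs) and a Linux box is reachable to copy from; all names below are hypothetical:

    # attach the disk image to a vnode pseudo-disk
    vnconfig vnd0 /home/xen/linux1.img
    # create an ext2 filesystem on it
    mke2fs /dev/rvnd0d
    # mount it with NetBSD's native ext2fs support and copy a Linux tree over
    mount -t ext2fs /dev/vnd0d /mnt
    ssh linuxbox 'tar cpf - -C / bin etc lib sbin usr var dev' | tar xpf - -C /mnt
    umount /mnt
    vnconfig -u vnd0
    # cloning more guests is then just a matter of copying linux1.img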
ED: As far as I understand, Xen also offers virtual machine migration, where you freeze a Xen guest, move it to another machine and resume it there. Could you give us an insight into how this is done from the administrator's point of view?
MB: Yes, it offers migration and suspend/resume, which, from the virtual machine kernel's point of view, are the same thing. NetBSD doesn't support this yet and I've not played with it under Linux, so I don't know the exact details. Suspend/resume is easy: the current state of the guest is saved to a large file at suspend time, and reloaded at resume time. Migration works in a similar way, but instead of saving the state of the guest to a file, it's transferred over the network to the remote Xen machine. This also means that a similar environment for the guest has to exist on the remote system. For example, the same network segment has to be available on both systems so that the guest's virtual interface can be connected back to the same network on the new host. The backend storage for the virtual block devices also has to be present on the new host. To achieve this, you could use a SAN, or maybe just a file on an NFS server.
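From the administrator's side, with Xen's xm tools this looks roughly like the following (domain and host names are made up):

    # suspend: save the guest's state to a file and stop it
    xm save guest1 /home/xen/guest1.state
    # resume it later on the same host
    xm restore /home/xen/guest1.state
    # or migrate it to another Xen host running xend
    xm migrate guest1 otherhost
    # with --live, the guest keeps running while its memory is copied over
    xm migrate --live guest1 otherhost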
ED: Manuel, thank you for answering my questions. Do you have something to add about Xen on NetBSD?
MB: Xen, combined with the performance and features of NetBSD, is a wonderful tool for system administrators and developers. Try it, you'll like it!
Source: http://ezine.daemonnews.org/200602/xen.html