translatedcode

QEMU for and on ARM cores

Debian on QEMU’s Raspberry Pi 3 model


For the QEMU 2.12 release we added support for a model of the Raspberry Pi 3 board (thanks to everybody involved in developing and upstreaming that code). The model is sufficient to boot a Debian image, so I wanted to write up how to do that.

Things to know before you start

Before I start, some warnings about the current state of the QEMU emulation of this board:

  • We don’t emulate the boot rom, so QEMU will not automatically boot from an SD card image. You need to manually extract the kernel, initrd and device tree blob from the SD image first. I’ll talk about how to do that below.
  • We don’t have an emulation of the BCM2835 USB controller. This means that there is no networking support, because on the raspi devices the ethernet hangs off the USB controller.
  • Our raspi3 model will only boot AArch64 (64-bit) kernels. If you want to boot a 32-bit kernel you should use the “raspi2” board model.
  • The QEMU model is missing models of some devices, and others are guesswork due to a lack of documentation of the hardware; so although the kernel I tested here will boot, it’s quite possible that other kernels may fail.

You’ll need the following things on your host system:

  • QEMU version 2.12 or later
  • libguestfs (on Debian and Ubuntu, install the libguestfs-tools package)

Getting the image

I’m using the unofficial preview images described on the Debian wiki.

$ wget https://people.debian.org/~stapelberg/raspberrypi3/2018-01-08/2018-01-08-raspberry-pi-3-buster-PREVIEW.img.xz
$ xz -d 2018-01-08-raspberry-pi-3-buster-PREVIEW.img.xz

Extracting the guest boot partition contents

I use libguestfs to extract files from the guest SD card image. There are other ways to do this but I think libguestfs is the easiest to use. First, check that libguestfs is working on your system:

$ virt-filesystems -a 2018-01-08-raspberry-pi-3-buster-PREVIEW.img
/dev/sda1
/dev/sda2

If this doesn’t work, then you should sort that out first. A couple of common reasons I’ve seen:

  • if you’re on Ubuntu then your kernels in /boot are installed not-world-readable; you can fix this with sudo chmod 644 /boot/vmlinuz*
  • if you’re running Virtualbox on the same host it will interfere with libguestfs’s attempt to run KVM; you can fix that by exiting Virtualbox
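
The first of those two problems is easy to check for from a script. Here's a minimal sketch; the function name unreadable_kernels is my own invention, and it assumes the host kernels follow the usual /boot/vmlinuz* naming:

```shell
# List kernel images under $1 (default /boot) that the current user
# cannot read. Prints nothing if everything is readable.
unreadable_kernels() (
    dir=${1:-/boot}
    for k in "$dir"/vmlinuz*; do
        [ -e "$k" ] || continue            # the glob matched nothing
        [ -r "$k" ] || printf '%s\n' "$k"  # report unreadable kernels
    done
)
```

If unreadable_kernels prints any filenames, those are the ones to fix with the chmod command above.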

Now you can ask libguestfs to extract the contents of the boot partition:

$ mkdir bootpart
$ guestfish --ro -a 2018-01-08-raspberry-pi-3-buster-PREVIEW.img -m /dev/sda1

Then at the guestfish prompt type:

copy-out / bootpart/
quit

This should have copied various files into the bootpart/ subdirectory.
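
If you'd rather not type at the guestfish prompt, guestfish also accepts commands on its own command line. Here's a sketch of a non-interactive version of the step above, assuming the boot partition is /dev/sda1 as in the virt-filesystems output (extract_bootpart is a hypothetical helper name, not part of libguestfs):

```shell
# Copy the contents of the guest boot partition into a host directory.
#   $1 = SD card image, $2 = destination directory (default: bootpart)
extract_bootpart() (
    image=$1
    dest=${2:-bootpart}
    [ -f "$image" ] || { echo "no such image: $image" >&2; exit 1; }
    mkdir -p "$dest" || exit 1
    # --ro mounts read-only, so the image itself is never modified
    guestfish --ro -a "$image" -m /dev/sda1 copy-out / "$dest"/
)
```

You would call it as: extract_bootpart 2018-01-08-raspberry-pi-3-buster-PREVIEW.img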

Run the guest image

You should now be able to run the guest image:

$ qemu-system-aarch64 \
  -kernel bootpart/vmlinuz-4.14.0-3-arm64 \
  -initrd bootpart/initrd.img-4.14.0-3-arm64 \
  -dtb bootpart/bcm2837-rpi-3-b.dtb \
  -M raspi3 -m 1024 \
  -serial stdio \
  -append "rw earlycon=pl011,0x3f201000 console=ttyAMA0 loglevel=8 root=/dev/mmcblk0p2 fsck.repair=yes net.ifnames=0 rootwait memtest=1" \
  -drive file=2018-01-08-raspberry-pi-3-buster-PREVIEW.img,format=raw,if=sd

and have it boot to a login prompt (the root password for this Debian image is “raspberry”).
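
The kernel and initrd filenames include a version number, which will differ if you use a newer image. As a sketch, assuming there is exactly one kernel and one initrd in bootpart/, you could look the names up rather than hard-coding them (find_boot_file and run_raspi3 are names I made up for illustration; the QEMU options are the same as above):

```shell
# Print the single file in directory $1 whose name starts with $2;
# fail if there are zero or several candidates.
find_boot_file() (
    dir=$1 prefix=$2
    set -- "$dir"/"$prefix"*
    [ "$#" -eq 1 ] && [ -e "$1" ] || {
        echo "expected exactly one $prefix* in $dir" >&2
        exit 1
    }
    printf '%s\n' "$1"
)

# Boot the image $1 using the files extracted into $2 (default: bootpart).
run_raspi3() (
    image=$1 boot=${2:-bootpart}
    kernel=$(find_boot_file "$boot" vmlinuz-) || exit 1
    initrd=$(find_boot_file "$boot" initrd.img-) || exit 1
    exec qemu-system-aarch64 \
      -kernel "$kernel" -initrd "$initrd" \
      -dtb "$boot/bcm2837-rpi-3-b.dtb" \
      -M raspi3 -m 1024 -serial stdio \
      -append "rw earlycon=pl011,0x3f201000 console=ttyAMA0 loglevel=8 root=/dev/mmcblk0p2 fsck.repair=yes net.ifnames=0 rootwait memtest=1" \
      -drive "file=$image,format=raw,if=sd"
)
```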

The kernel will print several WARNING messages as it starts, each with a backtrace like this:

[  145.157957] [] uart_get_baud_rate+0xe4/0x188
[  145.158349] [] pl011_set_termios+0x60/0x348
[  145.158733] [] uart_change_speed.isra.3+0x50/0x130
[  145.159147] [] uart_set_termios+0x7c/0x180
[  145.159570] [] tty_set_termios+0x168/0x200
[  145.159976] [] set_termios+0x2b0/0x338
[  145.160647] [] tty_mode_ioctl+0x358/0x590
[  145.161127] [] n_tty_ioctl_helper+0x54/0x168
[  145.161521] [] n_tty_ioctl+0xd4/0x1a0
[  145.161883] [] tty_ioctl+0x150/0xac0
[  145.162255] [] do_vfs_ioctl+0xc4/0x768
[  145.162620] [] SyS_ioctl+0x8c/0xa8

These are ugly but harmless. (The underlying cause is that QEMU doesn’t implement the undocumented ‘cprman’ clock control hardware, and so Linux thinks that the UART is running at a zero baud rate and complains.)

Written by pm215

April 25, 2018 at 9:07 am

Posted in linaro, qemu

Installing Debian on QEMU’s 64-bit ARM “virt” board


This post is a 64-bit companion to an earlier post of mine where I described how to get Debian running on QEMU emulating a 32-bit ARM “virt” board. Thanks to commenter snak3xe for reminding me that I’d said I’d write this up…

Why the “virt” board?

For 64-bit ARM, QEMU emulates far fewer boards, so “virt” is almost the only choice, unless you specifically know that you want to emulate one of the 64-bit Xilinx boards. “virt” supports PCI, virtio, a recent ARM CPU and large amounts of RAM. The only thing it doesn’t have out of the box is graphics.

Prerequisites and assumptions

I’m going to assume you have a Linux host, and a recent version of QEMU (at least QEMU 2.8). I also use libguestfs to extract files from a QEMU disk image, but you could use a different tool for that step if you prefer.

I’m going to document how to set up a guest which directly boots the kernel. It should also be possible to have QEMU boot a UEFI image which then boots the kernel from a disk image, but that’s not something I’ve looked into doing myself. (There may be tutorials elsewhere on the web.)

Getting the installer files

I suggest creating a subdirectory for these and the other files we’re going to create.

wget -O installer-linux http://http.us.debian.org/debian/dists/stretch/main/installer-arm64/current/images/netboot/debian-installer/arm64/linux
wget -O installer-initrd.gz http://http.us.debian.org/debian/dists/stretch/main/installer-arm64/current/images/netboot/debian-installer/arm64/initrd.gz

Saving them locally as installer-linux and installer-initrd.gz means they won’t be confused with the final kernel and initrd that the installation process produces.

(If we were installing on real hardware we would also need a “device tree” file to tell the kernel the details of the exact hardware it’s running on. QEMU’s “virt” board automatically creates a device tree internally and passes it to the kernel, so we don’t need to provide one.)

Installing

First we need to create an empty disk drive to install onto. I picked a 5GB disk but you can make it larger if you like.

qemu-img create -f qcow2 hda.qcow2 5G

(Oops — an earlier version of this blogpost created a “qcow” format image, which will work but is less efficient. If you created a qcow image by mistake, you can convert it to qcow2 with mv hda.qcow2 old-hda.qcow && qemu-img convert -O qcow2 old-hda.qcow hda.qcow2. Don’t try it while the VM is running! You then need to update your QEMU command line to say “format=qcow2” rather than “format=qcow”. You can delete the old-hda.qcow once you’ve checked that the new qcow2 file works.)

Now we can run the installer:

qemu-system-aarch64 -M virt -m 1024 -cpu cortex-a53 \
  -kernel installer-linux \
  -initrd installer-initrd.gz \
  -drive if=none,file=hda.qcow2,format=qcow2,id=hd \
  -device virtio-blk-pci,drive=hd \
  -netdev user,id=mynet \
  -device virtio-net-pci,netdev=mynet \
  -nographic -no-reboot

The installer will display its messages on the text console (via an emulated serial port). Follow its instructions to install Debian to the virtual disk; it’s straightforward, but if you have any difficulty the Debian installation guide may help.

The actual install process will take a few hours as it downloads packages over the network and writes them to disk. It will occasionally stop to ask you questions.

Late in the process, the installer will print the following warning dialog:

   +-----------------| [!] Continue without boot loader |------------------+
   |                                                                       |
   |                       No boot loader installed                        |
   | No boot loader has been installed, either because you chose not to or |
   | because your specific architecture doesn't support a boot loader yet. |
   |                                                                       |
   | You will need to boot manually with the /vmlinuz kernel on partition  |
   | /dev/vda1 and root=/dev/vda2 passed as a kernel argument.             |
   |                                                                       |
   |                              <Continue>                               |
   |                                                                       |
   +-----------------------------------------------------------------------+  

Press continue for now, and we’ll sort this out later.

Eventually the installer will finish by rebooting — this should cause QEMU to exit (since we used the -no-reboot option).

At this point you might like to make a copy of the hard disk image file, to save the tedium of repeating the install later.
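
A minimal sketch of that backup step, which refuses to overwrite an existing copy. The helper name backup_image and the "-fresh-install" suffix are my own choices, and you should only run it once QEMU has exited:

```shell
# Copy a disk image to a backup file without clobbering an existing one.
#   $1 = image, $2 = backup name (default: <image>-fresh-install.qcow2)
backup_image() (
    src=$1
    dst=${2:-${src%.qcow2}-fresh-install.qcow2}
    [ -f "$src" ] || { echo "no such image: $src" >&2; exit 1; }
    [ ! -e "$dst" ] || { echo "refusing to overwrite $dst" >&2; exit 1; }
    cp -- "$src" "$dst" && printf '%s\n' "$dst"   # print the backup's name
)
```

For example, backup_image hda.qcow2 would create hda-fresh-install.qcow2 alongside the original.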

Extracting the kernel

The installer warned us that it didn’t know how to arrange to automatically boot the right kernel, so we need to do it manually. For QEMU that means we need to extract the kernel the installer put into the disk image so that we can pass it to QEMU on the command line.

There are various tools you can use for this, but I’m going to recommend libguestfs, because it’s the simplest to use. To check that it works, let’s look at the partitions in our virtual disk image:

$ virt-filesystems -a hda.qcow2 
/dev/sda1
/dev/sda2

If this doesn’t work, then you should sort that out first. A couple of common reasons I’ve seen:

  • if you’re on Ubuntu then your kernels in /boot are installed not-world-readable; you can fix this with sudo chmod 644 /boot/vmlinuz*
  • if you’re running Virtualbox on the same host it will interfere with libguestfs’s attempt to run KVM; you can fix that by exiting Virtualbox

Looking at what’s in our disk we can see the kernel and initrd in /boot:

$ virt-ls -a hda.qcow2 /boot/
System.map-4.9.0-3-arm64
config-4.9.0-3-arm64
initrd.img
initrd.img-4.9.0-3-arm64
initrd.img.old
lost+found
vmlinuz
vmlinuz-4.9.0-3-arm64
vmlinuz.old

and we can copy them out to the host filesystem:

virt-copy-out -a hda.qcow2 /boot/vmlinuz-4.9.0-3-arm64 /boot/initrd.img-4.9.0-3-arm64 .

(We want the longer filenames, because vmlinuz and initrd.img are just symlinks and virt-copy-out won’t copy them.)

An important warning about libguestfs, or any other tools for accessing disk images from the host system: do not try to use them while QEMU is running, or you will get disk corruption when both the guest OS inside QEMU and libguestfs try to update the same image.

If you subsequently upgrade the kernel inside the guest, you’ll need to repeat this step to extract the new kernel and initrd, and then update your QEMU command line appropriately.
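
After an upgrade there may be several versioned kernels in /boot. Here's a sketch of picking the newest one from a virt-ls listing; newest_versioned is a hypothetical helper, and it relies on the version-sort option (-V) of GNU sort:

```shell
# Read a directory listing on stdin and print the highest-versioned
# entry that starts with "$1-" (for example "vmlinuz-").
newest_versioned() {
    grep "^$1-" | sort -V | tail -n 1
}
```

With the VM shut down, you could then use it like this: virt-ls -a hda.qcow2 /boot/ | newest_versioned vmlinuz (and the same again with initrd.img), and pass the results to virt-copy-out.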

Running

To run the installed system we need a different command line which boots the installed kernel and initrd, and passes the kernel the command line arguments the installer told us we’d need:

qemu-system-aarch64 -M virt -m 1024 -cpu cortex-a53 \
  -kernel vmlinuz-4.9.0-3-arm64 \
  -initrd initrd.img-4.9.0-3-arm64 \
  -append 'root=/dev/vda2' \
  -drive if=none,file=hda.qcow2,format=qcow2,id=hd \
  -device virtio-blk-pci,drive=hd \
  -netdev user,id=mynet \
  -device virtio-net-pci,netdev=mynet \
  -nographic

This should boot to a login prompt, where you can log in with the user and password you set up during the install.

The installation has an SSH client, so one easy way to get files in and out is to use “scp” from inside the VM to talk to an SSH server outside it. Or you can use libguestfs to write files directly into the disk image (for instance using virt-copy-in) — but make sure you only use libguestfs when the VM is not running, or you will get disk corruption.

Written by pm215

July 24, 2017 at 10:25 am

Posted in linaro, qemu

Installing Debian on QEMU’s 32-bit ARM “virt” board


In this post I’m going to describe how to set up Debian on QEMU emulating a 32-bit ARM “virt” board. There are a lot of older tutorials out there which suggest using boards like “versatilepb” or “vexpress-a9”, but these days “virt” is a far better choice for most people, so some documentation of how to use it seems overdue. (I may do a followup post for 64-bit ARM later.)

Update 2017-07-24: I have now written that post about installing a 64-bit ARM guest.

Why the “virt” board?

QEMU has models of nearly 50 different ARM boards, which makes it difficult for new users to pick one which is right for their purposes. This wild profusion reflects a similar diversity in the real hardware world: ARM systems come in many different flavours with very different hardware components and capabilities. A kernel which is expecting to run on one system will likely not run on another. Many of QEMU’s models are annoyingly limited because the real hardware was also limited: there’s no PCI bus on most mobile devices, after all, and a fifteen-year-old development board wouldn’t have had a gigabyte of RAM on it.

My recommendation is that if you don’t know for certain that you want a model of a specific device, you should choose the “virt” board. This is a purely virtual platform designed for use in virtual machines, and it supports PCI, virtio, a recent ARM CPU and large amounts of RAM. The only thing it doesn’t have out of the box is graphics, but graphical programs on a fully emulated system run very slowly anyway so are best avoided.

Why Debian?

Debian has had good support for ARM for a long time, and with the Debian Jessie release it has a “multiplatform” kernel, so there’s no need to build a custom kernel. Because we’re installing a full distribution rather than a cut-down embedded environment, any development tools you need inside the VM will be easy to install later.

Prerequisites and assumptions

I’m going to assume you have a Linux host, and a recent version of QEMU (at least QEMU 2.6). I also use libguestfs to extract files from a QEMU disk image, but you could use a different tool for that step if you prefer.

Getting the installer files

I suggest creating a subdirectory for these and the other files we’re going to create.

To install on QEMU we will want the multiplatform “armmp” kernel and initrd from the Debian website:

wget -O installer-vmlinuz http://http.us.debian.org/debian/dists/jessie/main/installer-armhf/current/images/netboot/vmlinuz
wget -O installer-initrd.gz http://http.us.debian.org/debian/dists/jessie/main/installer-armhf/current/images/netboot/initrd.gz

Saving them locally as installer-vmlinuz and installer-initrd.gz means they won’t be confused with the final kernel and initrd that the installation process produces.

(If we were installing on real hardware we would also need a “device tree” file to tell the kernel the details of the exact hardware it’s running on. QEMU’s “virt” board automatically creates a device tree internally and passes it to the kernel, so we don’t need to provide one.)

Installing

First we need to create an empty disk drive to install onto. I picked a 5GB disk but you can make it larger if you like.

qemu-img create -f qcow2 hda.qcow2 5G

(Oops — an earlier version of this blogpost created a “qcow” format image, which will work but is less efficient. If you created a qcow image by mistake, you can convert it to qcow2 with mv hda.qcow2 old-hda.qcow && qemu-img convert -O qcow2 old-hda.qcow hda.qcow2. Don’t try it while the VM is running! You then need to update your QEMU command line to say “format=qcow2” rather than “format=qcow”. You can delete the old-hda.qcow once you’ve checked that the new qcow2 file works.)

Now we can run the installer:

qemu-system-arm -M virt -m 1024 \
  -kernel installer-vmlinuz \
  -initrd installer-initrd.gz \
  -drive if=none,file=hda.qcow2,format=qcow2,id=hd \
  -device virtio-blk-device,drive=hd \
  -netdev user,id=mynet \
  -device virtio-net-device,netdev=mynet \
  -nographic -no-reboot

(I would have preferred to use QEMU’s PCI virtio devices, but unfortunately the Debian kernel doesn’t support them; a future Debian release very likely will, which would allow you to use virtio-blk-pci and virtio-net-pci instead of virtio-blk-device and virtio-net-device.)
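
To make the naming difference concrete: the “-device” variants attach to the virt board’s virtio-mmio transport, and the “-pci” variants to its PCI bus. Here's a small sketch of a helper that picks the right name, so a script can switch between the two in one place (virtio_device and the transport labels “mmio” and “pci” are my own naming):

```shell
# Print the QEMU device name for a virtio device class ($1: blk, net, ...)
# on a given transport ($2: mmio or pci).
virtio_device() {
    case $2 in
        pci)  printf 'virtio-%s-pci\n' "$1" ;;
        mmio) printf 'virtio-%s-device\n' "$1" ;;
        *)    echo "unknown transport: $2" >&2; return 1 ;;
    esac
}
```

For this Debian kernel you would use -device "$(virtio_device blk mmio),drive=hd" and -device "$(virtio_device net mmio),netdev=mynet".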

The installer will display its messages on the text console (via an emulated serial port). Follow its instructions to install Debian to the virtual disk; it’s straightforward, but if you have any difficulty the Debian release manual may help.
(Don’t worry about all the warnings the installer kernel produces about GPIOs when it first boots.)

The actual install process will take a few hours as it downloads packages over the network and writes them to disk. It will occasionally stop to ask you questions.

Late in the process, the installer will print the following warning dialog:

   +-----------------| [!] Continue without boot loader |------------------+
   |                                                                       |
   |                       No boot loader installed                        |
   | No boot loader has been installed, either because you chose not to or |
   | because your specific architecture doesn't support a boot loader yet. |
   |                                                                       |
   | You will need to boot manually with the /vmlinuz kernel on partition  |
   | /dev/vda1 and root=/dev/vda2 passed as a kernel argument.             |
   |                                                                       |
   |                              <Continue>                               |
   |                                                                       |
   +-----------------------------------------------------------------------+  

Press continue for now, and we’ll sort this out later.

Eventually the installer will finish by rebooting — this should cause QEMU to exit (since we used the -no-reboot option).

At this point you might like to make a copy of the hard disk image file, to save the tedium of repeating the install later.

Extracting the kernel

The installer warned us that it didn’t know how to arrange to automatically boot the right kernel, so we need to do it manually. For QEMU that means we need to extract the kernel the installer put into the disk image so that we can pass it to QEMU on the command line.

There are various tools you can use for this, but I’m going to recommend libguestfs, because it’s the simplest to use. To check that it works, let’s look at the partitions in our virtual disk image:

$ virt-filesystems -a hda.qcow2 
/dev/sda1
/dev/sda2

If this doesn’t work, then you should sort that out first. A couple of common reasons I’ve seen:

  • if you’re on Ubuntu then your kernels in /boot are installed not-world-readable; you can fix this with sudo chmod 644 /boot/vmlinuz*
  • if you’re running Virtualbox on the same host it will interfere with libguestfs’s attempt to run KVM; you can fix that by exiting Virtualbox

Looking at what’s in our disk we can see the kernel and initrd in /boot:

$ virt-ls -a hda.qcow2 /boot/
System.map-3.16.0-4-armmp-lpae
config-3.16.0-4-armmp-lpae
initrd.img
initrd.img-3.16.0-4-armmp-lpae
lost+found
vmlinuz
vmlinuz-3.16.0-4-armmp-lpae

and we can copy them out to the host filesystem:

$ virt-copy-out -a hda.qcow2 /boot/vmlinuz-3.16.0-4-armmp-lpae /boot/initrd.img-3.16.0-4-armmp-lpae .

(We want the longer filenames, because vmlinuz and initrd.img are just symlinks and virt-copy-out won’t copy them.)

An important warning about libguestfs, or any other tools for accessing disk images from the host system: do not try to use them while QEMU is running, or you will get disk corruption when both the guest OS inside QEMU and libguestfs try to update the same image.

Running

To run the installed system we need a different command line which boots the installed kernel and initrd, and passes the kernel the command line arguments the installer told us we’d need:

qemu-system-arm -M virt -m 1024 \
  -kernel vmlinuz-3.16.0-4-armmp-lpae \
  -initrd initrd.img-3.16.0-4-armmp-lpae \
  -append 'root=/dev/vda2' \
  -drive if=none,file=hda.qcow2,format=qcow2,id=hd \
  -device virtio-blk-device,drive=hd \
  -netdev user,id=mynet \
  -device virtio-net-device,netdev=mynet \
  -nographic

This should boot to a login prompt, where you can log in with the user and password you set up during the install.

The installation has an SSH client, so one easy way to get files in and out is to use “scp” from inside the VM to talk to an SSH server outside it. Or you can use libguestfs to write files directly into the disk image (for instance using virt-copy-in) — but make sure you only use libguestfs when the VM is not running, or you will get disk corruption.

Written by pm215

November 3, 2016 at 10:33 pm

Posted in linaro, qemu

Tricks for Debugging QEMU — savevm snapshots


For the next entry in this occasional series of posts about tricks for debugging QEMU I want to talk about savevm snapshots.

QEMU’s savevm snapshot feature is designed as a user feature, but it’s surprisingly handy as a developer tool too. Suppose you have a guest image which misbehaves when you run a particular userspace program inside the guest. This can be very awkward to debug because it takes so long to get to the point of failure, especially if it requires user interaction along the way. If you take a snapshot of the VM state just before the bug manifests itself, you can create a simpler and shorter test case by making QEMU start execution from the snapshot point. It’s then often practical to use debug techniques like turning on QEMU’s slow and voluminous tracing of all execution, now that you’re only dealing with a short run of execution.

To use savevm snapshots you’ll need to be using a disk image format which supports them, like QCOW2. If you have a different format like a raw disk, you can convert it with qemu-img:

qemu-img convert -f raw -O qcow2 your-disk.img your-disk.qcow2

and then change your command line to use the qcow2 file rather than the old raw image. (As a bonus it should be faster and take less disk space too!)
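
If you're not sure what format an image is in, qemu-img info will tell you on its “file format:” line. Here's a sketch that converts only when needed; the helper names are hypothetical, and it assumes the converted file should sit next to the original with a .qcow2 extension:

```shell
# Extract the format name from "qemu-img info" output on stdin.
image_format_from_info() {
    sed -n 's/^file format: //p'
}

# Print the name of a qcow2 version of image $1, converting if necessary.
ensure_qcow2() (
    img=$1
    fmt=$(qemu-img info "$img" | image_format_from_info)
    [ -n "$fmt" ] || { echo "cannot determine format of $img" >&2; exit 1; }
    if [ "$fmt" = qcow2 ]; then printf '%s\n' "$img"; exit 0; fi
    out=${img%.*}.qcow2
    [ "$out" != "$img" ] || { echo "output would overwrite $img" >&2; exit 1; }
    qemu-img convert -f "$fmt" -O qcow2 "$img" "$out" && printf '%s\n' "$out"
)
```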

If the QEMU system you’re trying to debug doesn’t have a disk image at all, you can create a dummy disk which will be used for nothing but snapshots like this:

qemu-img create -f qcow2 dummy.qcow2 32M

and then add this option to your command line:

-drive if=none,format=qcow2,file=dummy.qcow2

(QEMU may warn that the drive is “orphaned” because it’s not connected to anything, but that’s fine.)

To create a snapshot, you use this QEMU monitor command:

savevm some-name

This will save the VM state, and usually takes a second or two. Once it’s done you can type quit at the monitor to exit QEMU. You can make multiple snapshots with different names.

Then to make QEMU start automatically from the snapshot add the option:

-loadvm some-name

to your QEMU command line. (You still need to specify all the same device and configuration options you did when you saved the snapshot.)
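
One way to make sure the options really are the same is to keep them in one place. Here's a sketch of a wrapper that starts QEMU with an optional snapshot name. The machine options are placeholders for whatever your real command line uses, and QEMU_DRY_RUN is a variable I've invented to print the command instead of running it:

```shell
# Start the VM, optionally resuming from snapshot $1.
# The options before -loadvm must match the ones used when saving.
run_vm() (
    snapshot=${1:-}
    # Placeholder options: substitute your own board, memory and devices.
    set -- -M virt -m 1024 \
        -drive if=none,format=qcow2,file=dummy.qcow2
    [ -z "$snapshot" ] || set -- "$@" -loadvm "$snapshot"
    # With QEMU_DRY_RUN set, this expands to "echo qemu-system-arm ..."
    ${QEMU_DRY_RUN:+echo} qemu-system-arm "$@"
)
```

Run run_vm once to boot normally and take the snapshot, then run_vm some-name to start from it.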

Before you dive into debugging your reduced test case, do check that the bug you’re reproducing is still present in the shortened test case. Some bugs don’t reproduce in a snapshot — for instance if the problem is that QEMU has stale information cached in its TLB or translated code cache, then the bug will probably not manifest when the snapshot is loaded, because these caches will be empty. (Not reproducing in a snapshot is interesting diagnostic information in itself, in fact.)

You should also be aware that snapshotting requires support from all the devices in the system QEMU is modelling. This works fine for the x86 PC models, and also for most of the major ARM boards (including ‘virt’, ‘vexpress’ and ‘versatilepb’), but if you’re trying this on a more obscure guest CPU architecture or board you might run into trouble. Missing snapshotting support will manifest as the reloaded system misbehaving (eg device stops working, or perhaps there are no interrupts so nothing responds). I think this debugging technique is valuable enough that it’s worth stopping to fix up missing snapshot support in devices just so you can use it. If you don’t feel up to that, feel free to report the bugs on qemu-devel…

You can automate the process of taking the initial snapshot using the ‘expect’ utility. Here are some command line options that create a monitor session on TCP port 4444 and make QEMU start up in a ‘stopped’ state, so the VM doesn’t run until we ask it to:

-chardev socket,id=monitor,host=127.0.0.1,port=4444,server,nowait,telnet -mon chardev=monitor,mode=readline -S

And here’s an expect script that connects to the monitor, tells QEMU to start, and then takes a snapshot 0.6 seconds into the run:

#!/usr/bin/expect -f

set timeout -1
spawn telnet localhost 4444
expect "(qemu)"
send "c\r"
sleep 0.6
send "savevm foo\r"
expect "(qemu)"
send "quit\r"
expect eof

I used this recently to debug a problem in early boot that was causing a hang; by adjusting the sleep duration I was able to get a snapshot very close to the point where the trouble occurred. Even a second of execution can generate enough execution trace to be unmanageable…
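
Since finding the right moment usually takes a few attempts, it can help to generate the expect script with the delay and snapshot name as parameters. This is only a sketch: gen_snapshot_script is a made-up name, and the monitor port is assumed to be 4444 as above.

```shell
# Print an expect script that resumes the VM, waits $1 seconds,
# then saves a snapshot named $2 and quits QEMU.
gen_snapshot_script() (
    delay=$1 name=$2
    # In an unquoted here-document "\r" is left as-is for expect to interpret.
    cat <<EOF
#!/usr/bin/expect -f

set timeout -1
spawn telnet localhost 4444
expect "(qemu)"
send "c\r"
sleep $delay
send "savevm $name\r"
expect "(qemu)"
send "quit\r"
expect eof
EOF
)
```

For example, gen_snapshot_script 0.4 t04 > snap.exp writes a script that snapshots 0.4 seconds in; start QEMU with the monitor options above and then run expect -f snap.exp.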

Snapshots won’t solve your debugging problem all on their own, but they can cut the problem down to a size where you can apply some of the other tools in your toolbox.

Written by pm215

July 6, 2015 at 2:27 pm

Posted in debugging-tricks, qemu

Tricks for debugging QEMU — rr


Over the years I’ve picked up a few tricks for tracking down problems in QEMU, and it seemed worth writing them up. First on the list is a tool I’ve found relatively recently: rr, from the folks at Mozilla.

rr is a record-and-replay tool for C and C++: you run your program under the recorder and provoke the bug you’re interested in. Then you can debug using a replay of the recording. The replay is deterministic and side-effect-free, so you can debug it as many times as you want, knowing that even an intermittent bug will always reveal itself in the same way. Better still, rr recently gained support for reverse-debugging, so you can set a breakpoint or watchpoint and then run time backwards to find the previous occurrence of what you’re looking for. This is fantastic for debugging problems which manifest only a long time after they occur, like memory corruption or stale entries in cache data structures. The idea of record-and-replay is not new; where rr is different is that it’s very low overhead and capable of handling complex programs like QEMU and Mozilla. It’s a usable production quality debug tool, not just a research project. It has a few rough edges, but the developers have been very responsive to bug reports.

Here’s a worked example with a real-world bug I tracked down last week. (This is a compressed account of the last part of a couple of weeks of head-scratching; I have omitted various wrong turns and false starts…)

I had an image for QEMU’s Zaurus (“spitz”) machine, which managed to boot the guest kernel but then got random segfaults trying to execute userspace. Use of git bisect showed that this regression happened with commit 2f0d8631b7. That change is valid, but it did vastly reduce the number of unnecessary guest TLB flushes we were doing. This suggested that the cause of the segfaults was a bug where we weren’t flushing the TLB properly somewhere, which was only exposed when we stopped flushing the TLB on practically every guest kernel-to-userspace transition.

Insufficient TLB flushing is a little odd for an ARM guest, because in practice we end up flushing all of QEMU’s TLB every time the guest asks for a single page to be flushed. (This is forced on us by having to support the legacy ARMv5 1K page tables, so for most guests which use 4K pages all pages are “huge pages” and take a less efficient path through QEMU’s TLB handling.) So I had a hunch that maybe we weren’t actually doing the flush correctly. OK, change the code to handle the “TLB invalidate by virtual address” guest operations so that they explicitly flush the whole TLB — bug goes away. Take that back out, and put an assert(0) in the cputlb.c function that handles “delete a single entry from the TLB cache”. This should never fire for an ARM guest with 4K pages, and yet it did.

At this point I was pretty sure I was near to tracking down the cause of the bug; but the problem wasn’t likely to be near the assertion, but somewhere further back in execution when the entry got added to the TLB in the first place. Time for rr.

Recording is simple: just rr record qemu-system-arm args.... Then rr replay will start replaying the last record, and by default will drop you into a gdb at the start of the recording. Let’s just let it run forward until the assertion:

(gdb) c
Continuing.
[...]
qemu-system-arm: /home/petmay01/linaro/qemu-from-laptop/qemu/cputlb.c:80: tlb_flush_entry: Assertion `0' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 18096.18098]
0x0000000070000018 in ?? ()

Looking back up the stack we find that we were definitely trying to flush a valid TLB entry:

(gdb) frame 13
#13 0x0000555555665eb1 in tlb_flush_page (cpu=0x55555653bea0, addr=1074962432)
    at /home/petmay01/linaro/qemu-from-laptop/qemu/cputlb.c:118
118                 tlb_flush_entry(&env->tlb_v_table[mmu_idx][k], addr);
(gdb) print /x env->tlb_v_table[mmu_idx][k]
$2 = {addr_read = 0x4012a000, addr_write = 0x4012a000, addr_code = 0x4012a000, 
  addend = 0x2aaa83cf6000, dummy = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}}

and checking env->tlb_flush_mask and env->tlb_flush_addr shows that QEMU thinks this address is outside the range covered by huge pages. Maybe we miscalculated them when we were adding the page? Let’s go back and find out what happened then:

(gdb) break tlb_set_page_with_attrs if vaddr == 0x4012a000
Breakpoint 1 at 0x5555556663f2: file /home/petmay01/linaro/qemu-from-laptop/qemu/cputlb.c, line 256.
(gdb) rc
Continuing.

Program received signal SIGABRT, Aborted.
0x0000000070000016 in ?? ()
(gdb) rc
Continuing.

Breakpoint 1, tlb_set_page_with_attrs (cpu=0x55555653bea0, vaddr=1074962432, paddr=2684485632, 
    attrs=..., prot=7, mmu_idx=0, size=1024)
    at /home/petmay01/linaro/qemu-from-laptop/qemu/cputlb.c:256

(Notice that we hit the assertion again as we went backwards over it, so we just repeat the reverse-continue.) We stop exactly where we want to be to investigate the insertion of the TLB entry. In a normal debug session we could have tried restarting execution from the beginning with a conditional breakpoint, but there would be no guarantee that guest execution was deterministic enough for the guest address to be the same, or that the call we wanted to stop at was the only time we added a TLB entry for this address.

Stepping forwards through the TLB code I notice we don’t think this is a huge page at all, and in fact you can see from the function parameters that the size is 1024, not the expected 4096. Where did this come from? Setting a breakpoint in arm_cpu_handle_mmu_fault and doing yet another reverse-continue brings us to the start of the code that’s doing the page table walk so we can step forwards through it. (You can use rn and rs to step backwards if you like but personally I find that a little confusing.) Now that rr has led us to the scene of the crime, it’s very obvious that the problem is in our handling of an XScale-specific page table descriptor, which we’re incorrectly claiming to indicate a 1K page rather than 4K. Fix that, and the bug is vanquished.

Without rr this would have been much more tedious to track down. Being able to follow the chain of causation backwards from the failing assertion to the exact point where things diverged from your expectations is priceless. And anybody who’s done much debugging will have had the experience of accidentally stepping or continuing one time too often and zooming irrevocably past the point they wanted to look at — with reverse execution those errors are easily undoable.

I can’t recommend rr highly enough — I think it deserves to become a standard part of the Linux C/C++ developer’s toolkit, as valgrind has done before it.

Written by pm215

May 30, 2015 at 7:45 pm

Posted in debugging-tricks, qemu

AArch64 system emulation has landed in QEMU upstream

In my last post I mentioned that we were nearly done with support for emulating an entire AArch64 system in QEMU. Those last few pieces of code have now landed upstream, and Alex Bennée has written a great guide to how to build QEMU and a test image so you can give it a spin.

Written by pm215

May 13, 2014 at 10:42 am

Posted in linaro, qemu

64-bit ARM usermode emulation in QEMU 2.0.0

The QEMU Project released version 2.0.0 of QEMU last week; this seems like a good time to summarise our progress with ARMv8 QEMU work.

One of the major new ARM related features in this release is support for emulating AArch64 processes in QEMU’s “linux-user” mode; in Linaro we’ve been working on this over the last few months (building on a great foundation established by SUSE) and we just managed to squeeze support for the last few instructions into 2.0.0.

“linux-user” mode is where we run a single Linux guest binary, and QEMU converts the system calls the guest makes into system calls to the host Linux kernel. Typically you’d use this to run an AArch64 binary on a more conveniently available host, usually x86_64, by setting up a cross-architecture chroot and putting QEMU in it. We’ve implemented support for all the mandatory A64 instructions, including floating point and Advanced SIMD, but not the optional instructions in the crypto and CRC extensions.

As well as adding an entirely new instruction set for 64 bit support, the ARMv8 architecture included a few new instructions for the 32 bit A32 and T32 instruction sets. QEMU also now implements all the mandatory new instructions, though this will for the moment probably mostly be of use only to people running compiler test suites.

Two other uses for QEMU involve running it on AArch64 hardware. Firstly, you can use it to emulate other CPU architectures on AArch64 hosts, for instance running an x86 kernel in an emulated machine. This was contributed by Huawei last year, and has been supported since the previous release of QEMU (1.7).

You can also use QEMU as the userspace device emulation part of a virtual machine which uses KVM and the hardware’s virtualization extensions to provide fast AArch64-on-AArch64 VMs. This too has been supported since 1.7, though some features are not yet implemented (for instance, VM migration and debugging a guest VM are both not currently supported).

The final use for QEMU I want to talk about is the only one which isn’t in the 2.0.0 release, but many people have been waiting for it so here’s a status update. AArch64 system emulation is where you emulate a complete system and boot a full system including an AArch64 Linux kernel and user space, typically running on an x86 host. We’re working on this right now, and in fact as soon as QEMU’s git repository reopened for development after the 2.0.0 release we landed a large set of patches which implement all the necessary CPU emulation support. The only remaining missing piece in upstream QEMU master to be able to boot a kernel is to add support for running the “virt” board model with a Cortex-A57 and a GICv2 with an appropriate register layout. This last bit of work should be done shortly.

If you want to try out QEMU 2.0.0 you can build it yourself from the upstream released tarballs. If you’re an Ubuntu user then you’re in luck, because these changes are also in the QEMU shipped in the newly released Ubuntu 14.04 LTS.

Written by pm215

April 24, 2014 at 10:05 pm

Posted in linaro, qemu

QEMU KVM on ARMv7 support is upstream!

with one comment

This week the QEMU support patches for KVM on ARM were accepted into upstream. Since the kernel KVM on ARM patchset was accepted for the 3.9 kernel, this means that there is now enough support in purely upstream kernel and QEMU for basic working KVM on ARMv7 cores with the Virtualization Extensions (specifically, Cortex-A15). There are still a number of features left to be implemented but nonetheless I feel this is an important milestone. Thanks to everybody who’s played a part in getting us this far!

Written by pm215

March 8, 2013 at 7:08 am

Posted in kvm, linaro, qemu

Draining the CP15 Swamp

A surprisingly large amount of the work we’ve been doing with QEMU and with KVM on ARM has been trying to get handling of CP15 correct.

CP15 is the System Control coprocessor; the architecture manual says it is for “control and configuration of the ARM processor system, including architecture and feature identification”. So this is the place where the control knobs for all the interestingly complicated processor features live: MMU, TLBs, caches, TrustZone access controls, performance monitors, virtualization… and complicated features need a lot of control knobs. Although early system control coprocessors were very simple (ARMv3 system coprocessors had just 8 registers), a modern ARMv7-A processor like the Cortex-A15 has about 150 different CP15 registers.

The difficulty for QEMU is twofold. Firstly, the CP15 emulation code has grown organically along with the architecture. When we were dealing with 8 to 16 registers a simple set of switch statements was workable. As registers have been added the switch statements have got more and more cumbersome. Secondly, unlike hardware we want to support multiple CPUs in the same codebase, so we need to deal with all these variations simultaneously. As we added more conditionals things rapidly became unreadable. Registers were being defined for more CPUs than they should be, and it was hard to add new registers without accidentally breaking other CPUs, especially where some older CPUs defined registers that were reused for different purposes in newer architecture versions, or where the older CPU didn’t completely decode the CP15 instructions and so provided the same register in several different locations.

I spent a fair amount of time earlier this year rewriting QEMU’s CP15 code to use a more data-driven approach. Each register is described by a structure like this:

    { .name = "FCSEIDR", .cp = 15, .crn = 13, .crm = 0, .opc1 = 0, .opc2 = 0,
      .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.c13_fcse),
      .resetvalue = 0, .writefn = fcse_write },

which concisely describes where it sits in the coprocessor, its read/write access permissions, what fields of QEMU’s CPUARMState structure hold the information, and any special-purpose read or write accessor functions that might be needed. At startup we simply define the right registers based on the CPU feature bits. The rewrite also throws in some useful new features like support for 64 bit coprocessor registers and much better support for UNDEFfing on bad register accesses.

This is much easier to work with and we’re starting to see the benefits. When I wrote the LPAE support patches (which have just landed upstream) it was really easy to add the necessary new registers and modify the behaviour of some of the existing ones.

On the kernel side, we currently only support the Cortex-A15, but we’re anxious to keep things clean from the start (and we have the added incentive that if we fail to handle a CP15 register it could potentially let the guest mess with the host’s CPU state, which would be a security hole). Rusty Russell has just posted a patchset to the KVM ARM mailing list which also drives the CP15 emulation from a data table. These patches create a flexible userspace-to-kernel ABI (borrowed from the x86 handling of MSRs) which lets QEMU query the kernel for which registers it supports and read and write only the registers that both QEMU and the kernel know about. This should help avoid nasty binary compatibility breaks in the future when we add code to deal with new CP15 registers.

We’re not completely done yet; for instance we still need to think about how we handle possible compatibility issues with migration of a VM between QEMU instances which are different versions of QEMU and might have different CP15 support. But we’ve definitely drained a fair amount of the muddy water from this swamp and dispatched a few of the alligators…

Written by pm215

July 22, 2012 at 6:37 pm

Posted in linaro, qemu

This End Up…

I’ve just been reading the ARM ARM on the subject of big-endian support. It’s quite complicated now (as with many bits of the architecture), especially if, like QEMU, you need to support both obsolete features and their replacements. First, a quick summary:

ARM v4 and v5 supported a big-endian model now known as BE32 (although at the time it was just big-endian mode). The key features of BE32 are:

  • word invariant: this means that if you store a 32 bit word in little-endian mode, then flip to big-endian and reload it, you’ll get the same value back. However, if you do a byte load in big-endian mode you’re reading a different byte of RAM than you would for a byte load of the same address in little-endian mode. (Under the hood, the hardware adjusts the addresses for loads and stores of bytes and halfwords.)
  • operates on all memory accesses: data loads and stores, instruction fetches and translation table walks.
  • system wide: it is controlled by bit 7 in the System Control Register (SCTLR.B), and only the operating system can set or clear this. (Implementations might make the bit read-only if they don’t support big-endian mode or if they only allow it to be set via an external signal on reset.)

ARM v6 deprecated BE32 and introduced BE8 as its replacement. Key features:

  • byte invariant: a byte load from address X in little-endian mode accesses the same data as a byte load from X in big-endian mode. However, a word access in big-endian mode will return a word whose bytes are in the opposite order to the same word access in little-endian mode. (Instead of fiddling with addresses like BE32 hardware, BE8 hardware simply flips the four bytes of data for 32 bit accesses, and flips two bytes of data for 16 bit accesses.)
  • only operates on data accesses. Loads and stores done by the program will be in big-endian order, but when the CPU fetches instructions it does so little-endian. This means that self-modifying code needs to know it’s in BE8 mode: instructions must always be in RAM little-endian, so the instruction words it reads from memory as data will appear to it to be the “wrong way” round. Since executables are loaded into memory without distinguishing code from data, this also means that when the toolchain writes out a BE8 executable it effectively needs to flip the instructions. This is usually done in the linker.
  • potentially per-user-process: the main control bit is the CPSR.E bit, which can be changed with the unprivileged SETEND instruction. So that the OS gets a predictable data endianness there is a new bit SCTLR.EE in the System Control Register (“exception endianness”) which controls the value of CPSR.E on exception entry; it also determines endianness used for translation table walks.

Notice that both “byte invariant” and “word invariant” approaches meet the key big-endian requirement that if the CPU stores a word 0x12345678 to an address and then reads back a byte from that address it will read 0x12. You can only tell the difference if you have some other way to look at the actual bytes in memory (for instance if you have a second little-endian processor in the system that can read the RAM, or if you can switch the CPU back into little-endian mode).

A v6 core can support both BE32 and BE8, so it still has the SCTLR.B bit. Attempting to turn them both on at once is (fortunately!) UNPREDICTABLE…

In ARMv7 BE32 was dropped completely, so SCTLR.B will always read as zero. However, for R profile only, implementations may support reversing byte order for instruction accesses as well as data. If this is provided then it’s only changeable by asserting an input signal to the CPU on reset. A new System Control Register bit SCTLR.IE tells you whether this instruction endianness flipping is in effect. A system with SCTLR.EE, SCTLR.IE and CPSR.E all set looks pretty similar to a BE32 system from the point of view of the code running on the CPU.

So how does this fit into QEMU? QEMU’s basic model of endianness is that it is a fixed thing; targets are at compile time specified to be big- or little-endian, and the QEMU core then swaps data if the host and guest are of differing endianness; all memory and device accesses are assumed to be of the same endianness. This is really a kind of byte-invariant big-endianness, but we can use it to implement support for BE32 systems provided that you can never switch back into little-endian mode. In fact, QEMU’s current armeb targets provide exactly this fixed always-BE32 system. [Update: we don’t have any BE32 system targets currently, only the linux-user one, but in theory it should work.]

We don’t currently support BE8, and to do so we need to support separate control of data and code access byteswapping. Paul Brook has posted some patches to add BE8 support to linux-user mode, again as a fixed always-on setting (automatically enabled if the ELF file we’re running specifies that it is BE8). This works by telling QEMU’s core that the guest CPU is big-endian (which means data accesses are correct); we then have manual code to swap back the values when we’re doing a read which is an instruction access. This is much simpler than trying to swap only the data accesses because there are far fewer places where we read words as instructions. The inefficiency of swapping twice is not as bad as it might seem, because we will only do it when we first read code to translate it; subsequent reexecution of the instruction will just reexecute the translated code. I expect this user-mode-only BE8 support to get into upstream QEMU and qemu-linaro within a month or so.

BE8 in system mode would be trickier, and ideally we’d support dynamic endianness switching. The simplest approach would be to have QEMU treat the system as “little-endian”, and then do the byteswapping for data accesses by translating a LDR instruction as “load 32 bits; byteswap 32 bit word”, and so on. Of course if you were running in BE8 mode on a big-endian host system you’d end up swapping everything twice; it would be more efficient to add some support to QEMU’s core for this. However there isn’t really much demand for BE8 system mode support at the moment, so we don’t have any plans to work on it.

Written by pm215

April 2, 2012 at 6:53 pm

Posted in linaro, qemu