QEMU for and on ARM cores

Archive for the ‘debugging-tricks’ Category

Tricks for Debugging QEMU — savevm snapshots

with 2 comments

For the next entry in this occasional series of posts about tricks for debugging QEMU I want to talk about savevm snapshots.

QEMU’s savevm snapshot feature is designed as a user feature, but it’s surprisingly handy as a developer tool too. Suppose you have a guest image which misbehaves when you run a particular userspace program inside the guest. This can be very awkward to debug because it takes so long to get to the point of failure, especially if it requires user interaction along the way. If you take a snapshot of the VM state just before the bug manifests itself, you can create a simpler and shorter test case by making QEMU start execution from the snapshot point. It’s then often practical to use debug techniques like turning on QEMU’s slow and voluminous tracing of all execution, now that you’re only dealing with a short run of execution.

To use savevm snapshots you’ll need to be using a disk image format which supports them, like QCOW2. If you have a different format like a raw disk, you can convert it with qemu-img:

qemu-img convert -f raw -O qcow2 your-disk.img your-disk.qcow2

and then change your command line to use the qcow2 file rather than the old raw image. (As a bonus it should be faster and take less disk space too!)

If the QEMU system you’re trying to debug doesn’t have a disk image at all, you can create a dummy disk which will be used for nothing but snapshots like this:

qemu-img create -f qcow2 dummy.qcow2 32M

and then add this option to your command line:

-drive if=none,format=qcow2,file=dummy.qcow2

(QEMU may warn that the drive is “orphaned” because it’s not connected to anything, but that’s fine.)

To create a snapshot, you use this QEMU monitor command:

savevm some-name

This will save the VM state, and usually takes a second or two. Once it’s done you can type quit at the monitor to exit QEMU. You can make multiple snapshots with different names.

Then to make QEMU start automatically from the snapshot add the option:

-loadvm some-name

to your QEMU command line. (You still need to specify all the same device and configuration options you did when you saved the snapshot.)

Before you dive into debugging your reduced test case, do check that the bug you’re reproducing is still present in the shortened test case. Some bugs don’t reproduce in a snapshot — for instance if the problem is that QEMU has stale information cached in its TLB or translated code cache, then the bug will probably not manifest when the snapshot is loaded, because these caches will be empty. (Not reproducing in a snapshot is interesting diagnostic information in itself, in fact.)

You should also be aware that snapshotting requires support from all the devices in the system QEMU is modelling. This works fine for the x86 PC models, and also for most of the major ARM boards (including ‘virt’, ‘vexpress’ and ‘versatilepb’), but if you’re trying this on a more obscure guest CPU architecture or board you might run into trouble. Missing snapshotting support will manifest as the reloaded system misbehaving (eg device stops working, or perhaps there are no interrupts so nothing responds). I think this debugging technique is valuable enough that it’s worth stopping to fix up missing snapshot support in devices just so you can use it. If you don’t feel up to that, feel free to report the bugs on qemu-devel…

You can automate the process of taking the initial snapshot using the ‘expect’ utility. Here are some command line options that create a monitor session on TCP port 4444 and make QEMU start up in a ‘stopped’ state, so the VM doesn’t run until we ask it to:

-chardev socket,id=monitor,host=,port=4444,server,nowait,telnet -mon chardev=monitor,mode=readline -S

And here’s an expect script that connects to the monitor, tells QEMU to start, and then takes a snapshot 0.6 seconds into the run:

#!/usr/bin/expect -f

set timeout -1
spawn telnet localhost 4444
expect "(qemu)"
send "c\r"
sleep 0.6
send "savevm foo\r"
expect "(qemu)"
send "quit\r"
expect eof

I used this recently to debug a problem in early boot that was causing a hang — by adjusting the timeout I was able to get a snapshot very close to the point where the trouble occured. Even a second of execution can generate enough execution trace to be unmanageable…

Snapshots won’t solve your debugging problem all on their own, but they can cut the problem down to a size where you can apply some of the other tools in your toolbox.

Written by pm215

July 6, 2015 at 2:27 pm

Posted in debugging-tricks, qemu

Tricks for debugging QEMU — rr

with one comment

Over the years I’ve picked up a few tricks for tracking down problems in QEMU, and it seemed worth writing them up. First on the list is a tool I’ve found relatively recently: rr, from the folks at Mozilla.

rr is a record-and-replay tool for C and C++: you run your program under the recorder and provoke the bug you’re interested in. Then you can debug using a replay of the recording. The replay is deterministic and side-effect-free, so you can debug it as many times as you want, knowing that even an intermittent bug will always reveal itself in the same way. Better still, rr recently gained support for reverse-debugging, so you can set a breakpoint or watchpoint and then run time backwards to find the previous occurrence of what you’re looking for. This is fantastic for debugging problems which manifest only a long time after they occur, like memory corruption or stale entries in cache data structures. The idea of record-and-replay is not new; where rr is different is that it’s very low overhead and capable of handling complex programs like QEMU and Mozilla. It’s a usable production quality debug tool, not just a research project. It has a few rough edges, but the developers have been very responsive to bug reports.

Here’s a worked example with a real-world bug I tracked down last week. (This is a compressed account of the last part of a couple of weeks of head-scratching; I have omitted various wrong turns and false starts…)

I had an image for QEMU’s Zaurus (“spitz”) machine, which managed to boot the guest kernel but then got random segfaults trying to execute userspace. Use of git bisect showed that this regression happened with commit 2f0d8631b7. That change is valid, but it did vastly reduce the number of unnecessary guest TLB flushes we were doing. This suggested that the cause of the segfaults was a bug where we weren’t flushing the TLB properly somewhere, which was only exposed when we stopped flushing the TLB on practically every guest kernel-to-userspace transition.

Insufficient TLB flushing is a little odd for an ARM guest, because in practice we end up flushing all of QEMU’s TLB every time the guest asks for a single page to be flushed. (This is forced on us by having to support the legacy ARMv5 1K page tables, so for most guests which use 4K pages all pages are “huge pages” and take a less efficient path through QEMU’s TLB handling.) So I had a hunch that maybe we weren’t actually doing the flush correctly. OK, change the code to handle the “TLB invalidate by virtual address” guest operations so that they explicitly flush the whole TLB — bug goes away. Take that back out, and put an assert(0) in the cputlb.c function that handles “delete a single entry from the TLB cache”. This should never fire for an ARM guest with 4K pages, and yet it did.

At this point I was pretty sure I was near to tracking down the cause of the bug; but the problem wasn’t likely to be near the assertion, but somewhere further back in execution when the entry got added to the TLB in the first place. Time for rr.

Recording is simple: just rr record qemu-system-arm args.... Then rr replay will start replaying the last record, and by default will drop you into a gdb at the start of the recording. Let’s just let it run forward until the assertion:

(gdb) c
qemu-system-arm: /home/petmay01/linaro/qemu-from-laptop/qemu/cputlb.c:80: tlb_flush_entry: Assertion `0' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 18096.18098]
0x0000000070000018 in ?? ()

Looking back up the stack we find that we were definitely trying to flush a valid TLB entry:

(gdb) frame 13
#13 0x0000555555665eb1 in tlb_flush_page (cpu=0x55555653bea0, addr=1074962432)
    at /home/petmay01/linaro/qemu-from-laptop/qemu/cputlb.c:118
118                 tlb_flush_entry(&env->tlb_v_table[mmu_idx][k], addr);
(gdb) print /x env->tlb_v_table[mmu_idx][k]
$2 = {addr_read = 0x4012a000, addr_write = 0x4012a000, addr_code = 0x4012a000, 
  addend = 0x2aaa83cf6000, dummy = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}}

and checking env->tlb_flush_mask and env->tlb_flush_addr shows that QEMU thinks this address is outside the range covered by huge pages. Maybe we miscalculated them when we were adding the page? Let’s go back and find out what happened then:

(gdb) break tlb_set_page_with_attrs if vaddr == 0x4012a000
Breakpoint 1 at 0x5555556663f2: file /home/petmay01/linaro/qemu-from-laptop/qemu/cputlb.c, line 256.
(gdb) rc

Program received signal SIGABRT, Aborted.
0x0000000070000016 in ?? ()
(gdb) rc

Breakpoint 1, tlb_set_page_with_attrs (cpu=0x55555653bea0, vaddr=1074962432, paddr=2684485632, 
    attrs=..., prot=7, mmu_idx=0, size=1024)
    at /home/petmay01/linaro/qemu-from-laptop/qemu/cputlb.c:256

(Notice that we hit the assertion again as we went backwards over it, so we just repeat the reverse-continue.) We stop exactly where we want to be to investigate the insertion of the TLB entry. In a normal debug session we could have tried restarting execution from the beginning with a conditional breakpoint, but there would be no guarantee that guest execution was deterministic enough for the guest address to be the same, or that the call we wanted to stop at was the only time we added a TLB entry for this address. Stepping forwards through the tlb code I notice we don’t think this is a huge page at all, and in fact you can see from the function parameters that the size is 1024, not the expected 4096. Where did this come from? Setting a breakpoint in arm_cpu_handle_mmu_fault and doing yet another reverse-continue brings us to the start of the code that’s doing the page table walk so we can step forwards through it. (You can use rn and rs to step backwards if you like but personally I find that a little confusing.). Now rr has led us to the scene of the crime it’s very obvious that the problem is in our handling of an XScale-specific page table descriptor, which we’re incorrectly claiming to indicate a 1K page rather than 4K. Fix that, and the bug is vanquished.

Without rr this would have been much more tedious to track down. Being able to follow the chain of causation backwards from the failing assertion to the exact point where things diverged from your expectations is priceless. And anybody who’s done much debugging will have had the experience of accidentally stepping or continuing one time too often and zooming irrevocably past the point they wanted to look at — with reverse execution those errors are easily undoable.

I can’t recommend rr highly enough — I think it deserves to become a standard part of the Linux C/C++ developer’s toolkit, as valgrind has done before it.

Written by pm215

May 30, 2015 at 7:45 pm

Posted in debugging-tricks, qemu