QEMU for and on ARM cores

This End Up…

I’ve just been reading the ARM ARM (the ARM Architecture Reference Manual) on the subject of big-endian support. It’s quite complicated now (as with many bits of the architecture), especially if, like QEMU, you need to support both obsolete features and their replacements. First, a quick summary:

ARMv4 and ARMv5 supported a big-endian model now known as BE32 (although at the time it was just called big-endian mode). The key features of BE32 are:

  • word invariant: this means that if you store a 32 bit word in little-endian mode, then flip to big-endian and reload it, you’ll get the same value back. However, if you do a byte load in big-endian mode you’re reading a different byte of RAM than you would for a byte load of the same address in little-endian mode. (Under the hood, the hardware adjusts the addresses for loads and stores of bytes and halfwords.)
  • operates on all memory accesses: data loads and stores, instruction fetches and translation table walks.
  • system wide: it is controlled by bit 7 in the System Control Register (SCTLR.B), and only the operating system can set or clear this. (Implementations might make the bit read-only if they don’t support big-endian mode or if they only allow it to be set via an external signal on reset.)
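
To make the “word invariant” idea concrete, here is a toy model of the address adjustment described above. It is only a sketch, not QEMU code: the class `BE32Memory` and its `big_endian` flag (standing in for SCTLR.B) are names made up for illustration, and the RAM is assumed to be a plain array of bytes laid out as a little-endian observer would see it.

```python
import struct


class BE32Memory:
    """Toy model of RAM seen through BE32-style address adjustment."""

    def __init__(self, size=16):
        self.ram = bytearray(size)  # raw bytes, as a little-endian observer sees them
        self.big_endian = False     # hypothetical stand-in for SCTLR.B

    def store_word(self, addr, value):
        # Word accesses are never modified: this is the word-invariant part.
        struct.pack_into("<I", self.ram, addr, value)

    def load_word(self, addr):
        return struct.unpack_from("<I", self.ram, addr)[0]

    def load_half(self, addr):
        # Halfword accesses have address bit 1 flipped in big-endian mode.
        if self.big_endian:
            addr ^= 2
        return struct.unpack_from("<H", self.ram, addr)[0]

    def load_byte(self, addr):
        # Byte accesses have the low two address bits flipped in big-endian mode.
        if self.big_endian:
            addr ^= 3
        return self.ram[addr]


mem = BE32Memory()
mem.store_word(0, 0x12345678)       # stored while "little-endian"
le_byte = mem.load_byte(0)          # 0x78
mem.big_endian = True
be_word = mem.load_word(0)          # still 0x12345678
be_byte = mem.load_byte(0)          # 0x12, read from raw offset 3
```

The word comes back unchanged after the switch, but the byte load at the same address now reads a different byte of RAM.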

ARMv6 deprecated BE32 and introduced BE8 as its replacement. Key features:

  • byte invariant: a byte load from address X in little-endian mode accesses the same data as a byte load from X in big-endian mode. However, a word access in big-endian mode will return a word whose bytes are in the opposite order to the same word access in little-endian mode. (Instead of fiddling with addresses like BE32 hardware, BE8 hardware simply flips the four bytes of data for 32 bit accesses, and flips two bytes of data for 16 bit accesses.)
  • only operates on data accesses: loads and stores done by the program are big-endian, but the CPU always fetches instructions little-endian, so instructions must always be in RAM that way round. Self-modifying code therefore needs to know it’s in BE8 mode, because the instruction words it reads from memory as data will appear to it to be the “wrong way” round. Since executables are loaded into memory without distinguishing code from data, the toolchain also effectively needs to flip the instructions when it writes out a BE8 executable. This is usually done in the linker.
  • potentially per-user-process: the main control bit is the CPSR.E bit, which can be changed with the unprivileged SETEND instruction. So that the OS gets a predictable data endianness there is a new bit SCTLR.EE in the System Control Register (“exception endianness”) which controls the value of CPSR.E on exception entry; it also determines endianness used for translation table walks.
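
The BE8 rules can be sketched the same way. Again this is a toy model under stated assumptions, not anything taken from QEMU or the ARM ARM: `BE8Memory`, `cpsr_e` and `fetch_insn` are hypothetical names, and the model ignores SCTLR.EE and exceptions entirely.

```python
import struct


class BE8Memory:
    """Toy model of BE8: bytes stay put, wider data accesses are byte-reversed."""

    def __init__(self, size=16):
        self.ram = bytearray(size)
        self.cpsr_e = False  # hypothetical stand-in for CPSR.E

    def _fmt(self, code):
        # Data accesses follow CPSR.E: ">" is big-endian, "<" is little-endian.
        return (">" if self.cpsr_e else "<") + code

    def store_word(self, addr, value):
        struct.pack_into(self._fmt("I"), self.ram, addr, value)

    def load_word(self, addr):
        return struct.unpack_from(self._fmt("I"), self.ram, addr)[0]

    def load_byte(self, addr):
        # Byte invariant: the same address always names the same byte of RAM.
        return self.ram[addr]

    def fetch_insn(self, addr):
        # Instruction fetches ignore CPSR.E and are always little-endian.
        return struct.unpack_from("<I", self.ram, addr)[0]


mem = BE8Memory()
mem.store_word(0, 0x12345678)       # stored while little-endian
mem.cpsr_e = True                   # roughly what SETEND BE would do
same_byte = mem.load_byte(0)        # 0x78 in either mode
swapped_word = mem.load_word(0)     # 0x78563412
insn_word = mem.fetch_insn(0)       # 0x12345678
```

The last two lines show why self-modifying code has to care: the same four bytes read as a data word and as an instruction give byte-reversed values.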

Notice that both “byte invariant” and “word invariant” approaches meet the key big-endian requirement that if the CPU stores a word 0x12345678 to an address and then reads back a byte from that address it will read 0x12. You can only tell the difference if you have some other way to look at the actual bytes in memory (for instance if you have a second little-endian processor in the system that can read the RAM, or if you can switch the CPU back into little-endian mode).
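
As a quick worked example of that point, the snippet below stores the same word under each scheme and then looks at it both ways: through a CPU byte load, and as the raw bytes a little-endian observer would see. The variable names are made up for illustration, and the BE32 byte address is modelled as an XOR with 3.

```python
import struct

WORD = 0x12345678

# BE8 (byte invariant): the word's bytes are reversed on their way to RAM,
# and byte address 0 really is raw byte 0.
be8_ram = bytearray(struct.pack(">I", WORD))
be8_byte0 = be8_ram[0]

# BE32 (word invariant): the word is stored unchanged, and the byte address
# is adjusted instead, so byte address 0 reads raw byte 3.
be32_ram = bytearray(struct.pack("<I", WORD))
be32_byte0 = be32_ram[0 ^ 3]

print(hex(be8_byte0), hex(be32_byte0))  # 0x12 0x12
print(be8_ram.hex(), be32_ram.hex())    # 12345678 78563412
```

The CPU sees 0x12 either way; only the raw RAM contents give the game away.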

An ARMv6 core can support both BE32 and BE8, so it still has the SCTLR.B bit. Attempting to turn them both on at once is (fortunately!) UNPREDICTABLE…

In ARMv7 BE32 was dropped completely, so SCTLR.B will always read as zero. However, for R profile only, implementations may support reversing byte order for instruction accesses as well as data. If this is provided then it’s only changeable by asserting an input signal to the CPU on reset. A new System Control Register bit SCTLR.IE tells you whether this instruction endianness flipping is in effect. A system with SCTLR.EE, SCTLR.IE and CPSR.E all set looks pretty similar to a BE32 system from the point of view of the code running on the CPU.

So how does this fit in to QEMU? QEMU’s basic model of endianness is that it is a fixed thing: each target is specified at compile time to be big- or little-endian, and the QEMU core then swaps data if the host and guest are of differing endianness. All memory and device accesses are assumed to be of the same endianness. This is really a kind of byte-invariant big-endianness, but we can use it to implement support for BE32 systems provided that you can never switch back into little-endian mode. In fact, QEMU’s current armeb targets provide exactly this fixed always-BE32 system. [Update: we don’t have any BE32 system targets currently, only the linux-user one, but in theory it should work.]

We don’t currently support BE8, and to do so we need to support separate control of data and code access byteswapping. Paul Brook has posted some patches to add BE8 support to linux-user mode, again as a fixed always-on setting (automatically enabled if the ELF file we’re running specifies that it is BE8). This works by telling QEMU’s core that the guest CPU is big-endian (which means data accesses are correct); we then have manual code to swap back the values when we’re doing a read which is an instruction access. This is much simpler than trying to swap only the data accesses, because there are far fewer places where we read words as instructions. The inefficiency of swapping twice is not as bad as it might seem, because we will only do it when we first read code to translate it; subsequent re-execution of the instruction will just re-execute the translated code. I expect this user-mode-only BE8 support to get into upstream QEMU and qemu-linaro within a month or so.
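
The “swap back on instruction fetch” idea can be sketched in a few lines. This is not the code from those patches: `guest_load_word`, `bswap32` and `fetch_insn_be8` are hypothetical helpers, and the “core” here is just a function that always reads words big-endian.

```python
import struct


def guest_load_word(ram, addr):
    """Stand-in for the core's word load on a target declared big-endian."""
    return struct.unpack_from(">I", ram, addr)[0]


def bswap32(value):
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


def fetch_insn_be8(ram, addr):
    """Instruction read: undo the core's swap, since BE8 code is little-endian."""
    return bswap32(guest_load_word(ram, addr))


# MOV r0, r0 (0xe1a00000) as it sits in RAM for a BE8 program: little-endian.
ram = bytearray(struct.pack("<I", 0xE1A00000))

as_data = guest_load_word(ram, 0)   # 0x0000a0e1, what a data load would see
as_insn = fetch_insn_be8(ram, 0)    # 0xe1a00000, what the translator needs
```

Only the translator’s instruction reads go through `fetch_insn_be8`, which is why there are so few places to patch.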

BE8 in system mode would be trickier, and ideally we’d support dynamic endianness switching. The simplest approach would be to have QEMU treat the system as “little-endian”, and then do the byteswapping for data accesses by translating a LDR instruction as “load 32 bits; byteswap 32 bit word”, and so on. Of course if you were running in BE8 mode on a big-endian host system you’d end up swapping everything twice; it would be more efficient to add some support to QEMU’s core for this. However, there isn’t really much demand for BE8 system mode support at the moment, so we don’t have any plans to work on it.
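
To see where the double swap comes from, here is a toy “translator” that just lists the operations a 32-bit guest load would turn into under that simplest approach. The function `ldr_ops` and the operation names are invented for this sketch and do not correspond to real QEMU ops.

```python
def ldr_ops(host_big_endian, cpsr_e):
    """Return hypothetical micro-ops for a 32-bit guest LDR.

    The guest is treated as a little-endian target, so the core swaps once
    on a big-endian host; the translator adds its own swap for BE8 data.
    """
    ops = ["host_load32"]
    if host_big_endian:
        ops.append("bswap32")  # core: little-endian target on a big-endian host
    if cpsr_e:
        ops.append("bswap32")  # translator: BE8 data access
    return ops


le_host_be8 = ldr_ops(host_big_endian=False, cpsr_e=True)
be_host_be8 = ldr_ops(host_big_endian=True, cpsr_e=True)
```

On a little-endian host the BE8 load costs one swap; on a big-endian host it costs two swaps that cancel out, which is the waste that core support would avoid.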

Written by pm215

April 2, 2012 at 6:53 pm

Posted in linaro, qemu