risu -- random instruction sequence generator for userspace testing

risu is a tool intended to assist in testing the implementation of
models of the ARM architecture such as qemu and valgrind. In particular
it restricts itself to considering the parts of the architecture
visible from Linux userspace, so it can be used to test programs
which only implement userspace, like valgrind and qemu's linux-user

risu is also the Japanese word for squirrel.


risu comes in two parts -- a perl script 'risugen' for generating
test blobs (which can be run anywhere), and a Linux executable
'risu' which runs on the target architecture (ie ARM). To
build the executable part:

    [VAR=VALUE] ... ./configure [--static]

where [VAR=VALUE] ... allows you to specify any options.
Most useful is
 CROSS_PREFIX= which specifies the cross compiler prefix; you'll
   need this if you're not building on the target system
   (Example: CROSS_PREFIX=arm-linux-gnueabihf- )

Another useful flag is
 CPPFLAGS= which specifies pre-processor flags, usually -I statements
   for specifying extra include paths. Use this if you need something
   from new kernel headers not installed on your system, for instance.
   (Example: CPPFLAGS=-I/path/to/new-kernel-headers/include)

Passing --static will build a statically linked binary which is useful
if you don't want to mess around with a chroot to run the binary.

For other possibilities run 'configure --help'.

Building into a separate build tree from the source code is supported:
    mkdir my-build-dir
    cd my-build-dir
    [VAR=VALUE] ... ../configure

The build directory doesn't need to be inside the source tree;
just run configure from inside it.

For risu developers: there is a build-all-archs script which
will automatically configure and build risu for every CPU
architecture that we support and that you have a cross compiler
installed for. This is useful for confirming that your changes
to risu haven't broken anything.

Coding Style

risu follows the same coding style as the QEMU project, namely 4
spaces (no tabs) for indentation and the One True Brace Style variant
of K&R. The source tree includes a .dir-locals.el for Emacs users that
will set this automatically. Other editors are available.


The principle is straightforward: we generate a random sequence of
instructions, and run it on both native hardware and the model under
test. Register values are cross-checked after every instruction and
if there is a mismatch then the test fails.

risugen is a Perl script which generates a binary blob containing
the random instructions. It works by reading a configuration file
which specifies instruction patterns to be generated. Command line
options can be used to restrict the set of instruction patterns
used. For example:

  ./risugen --numinsns 10000 --pattern 'VQSHL.*imm.*' arm.risu vqshlimm.out

reads the configuration file arm.risu, and generates 10000 instructions
based on the instruction patterns matching the regular expression
"VQSHL.*imm.*". The resulting binary is written to vqshlimm.out.

An alternative to using regular expression patterns is to use the
--group specifier. This relies on the configuration file having been
annotated with suitable @ markers.

This binary can then be passed to the risu program, which is
written in C. You need to run risu on both an ARM native target
and on the program under test. The ARM native system is the 'master'
end, so run it like this:

  ./risu --master vqshlimm.out

It will sit waiting for a TCP connection from the 'apprentice'
which must be run on the program under test. In theory this is
as simple as:

  risu --host ip-addr-of-master vqshlimm.out

However since you actually need to run it under qemu or similar
you probably need an ARM chroot to run it in, and to do something

  sudo chroot /srv/chroot/arm-mav /risu --host ipaddr vqshlimm.out

If you built the binary statically you can simply run:

  /path/to/qemu ./risu --host ipaddr vqshlimm.out

When the apprentice connects to the master, they will both start
running the binary and checking results with each other. When the
test ends the master will print a register dump and the match or
mismatch status to its standard output.

NB that in the register dump the r15 (pc) value will be given
as an offset from the start of the binary, not an absolute value.

While the master/slave setup works well it is a bit fiddly for running
regression tests and other sorts of automation. For this reason risu
supports recording a trace of its execution to a file. For example:

  risu --master FxxV_across_lanes.risu.bin -t FxxV_across_lanes.risu.trace

And then playback with:

  risu FxxV_across_lanes.risu.bin -t FxxV_across_lanes.risu.trace

Ideally it should be built with zlib to compress the trace files which
would otherwise be huge. If building with zlib proves too tricky you
can pipe to stdout and an external compression binary using "-t -".

  risu --master FxxV_across_lanes.risu.bin -t - | gzip --best > trace.file


  gunzip -c trace.file | risu -t - FxxV_across_lanes.risu.bin

File format

The .risu file specifies instruction patterns to be tested.

Lines starting with '#' are comments and are ignored.
Blank lines are ignored. A '\' at the end of the line is 
a line continuation character.

Lines starting with a '.' are directives to risu/risugen:
 * ".mode [thumb|arm]" specifies whether the file contains ARM
   or Thumb instructions; it must precede all instruction patterns.

Lines starting with a '@' are a grouping directive. Instructions
following will be assigned to a comma separated list of groups. The
list of groups is reset at the next '@' directive which may be empty.
This provides an alternative method to selecting instructions than RE

Other lines are instruction patterns:
 insnname encodingname bitfield ... [ [ !blockname ] { blocktext } ]
where each bitfield is either:
  var:sz  specifying a variable field of size sz (sz == 0 if :sz omitted)
  [01]*   specifying fixed bits
Field names beginning 'r' are special as they are assumed to be general
purpose registers. They get an automatic "cannot be 13 or 15" (sp/pc)

The optional blocks at the end of the line are generally named;
an unnamed block is (for backwards compatibility) treated as one
named "constraints". Currently the following named blocks are

 * constraints :

The block is a perl statement to be evaluated and which
must return true if the generated statement is OK, false if the
generator should retry with a fresh random number. It is evaluated
in a context where variables with the same names as the defined
variable fields are initialised. The intention is that odd cases
where you need to apply some sort of constraint to the generated
instruction can be handled via this mechanism.
NB that there is no sanity checking that you don't do bad things
in the eval block, although there is a basic check for syntax
errors and and we bail out if the constraint returns failure too often.

 * memory :

The block indicates what memory address the instruction accesses
(either load or store). It should be a fragment of perl code which
is a call to a risugen function which implements support for the
addressing mode used by the instruction. As with the 'constraints'
block, the variable field values are provided as Perl variables.
By convention, the function always accepts as its last argument(s)
a list of the registers which will be trashed by the function
(this information is needed to avoid problems handling insns which
load to their base register.)
Currently supported addressing modes:
 reg(reg, trashed);
 reg_plus_imm(reg, immediate, trashed);
 reg_minus_imm(reg, immediate, trashed);
 reg_plus_reg(basereg, indexreg, trashed);
 reg_plus_reg_shifted(basereg, indexreg, shift, trashed);
    -- this is for [basereg + indexreg LSL shift]
The block can also include a call to the align() function
to indicate the memory alignment required for the access.
The default is 4-alignment. The align() call must precede
the addressing mode function call.

Implementation details and points to note

The register checking is done by registering a signal handler for
SIGILL, which then has access to register contents via the
sigcontext argument to the handler. Particular opcodes in the
guaranteed-to-UNDEF space are then used to say "check register
values" and "end of test".

There are some obvious limitations to this approach:

 * we assume that all the interesting state is in the registers
accessible to a signal handler. This is true in most cases but
we can't test complex instructions like ldrexd/strexd.
 * the generator is fairly simplistic and just alternates generated
instructions and "check-registers" commands. So branches and
loads or stores can't be checked this way. (This is more of a
restriction in the generator, not the test harness proper.)
 * we only catch gross errors of decode or implementation of
an instruction. We won't notice problems like overenthusiastic
reordering of instructions in the model's code generator, for
 * by definition, we can only test user-space visible instructions,
not those which are only accessible to privileged code.

Some limits which are more accidental:

 * I'm only testing ARM. The generator is rather ARM-specific.
The test harness is less so (there's a skeleton of an x86
implementation, for example) but only ARM is tested.
 * we don't actually compare FP status flags, simply because
I'm pretty sure qemu doesn't get them right yet and I'm more
interested in fixing gross bugs first.
 * You can compile statically to avoid the requirement for the ARM
chroot for qemu testing but you can no longer use gethostbyname() and need
to specify your hosts by IP address.
 * the documentation is rather minimal. This is because I don't
really expect many people to need to use this :-)

Contributed scripts

The contrib/ directory contains various contributed scripts
and other tools that are not necessary to use risu, but
might be helpful. See the comments in each script for
more information and documentation.

-- Peter Maydell <peter.maydell@linaro.org>