diff --git a/docs/04-Cache-hit-rate-howto.txt b/docs/04-Cache-hit-rate-howto.txt
new file mode 100644
index 0000000..60d37a5
--- /dev/null
+++ b/docs/04-Cache-hit-rate-howto.txt
@@ -0,0 +1,208 @@
+Cache hit-rate HOWTO
+====================
+
+A Introduction
+
+ The ARM Fast Models are accompanied by a trace infrastructure
+ referred to as the Model Trace Interface (MTI). The MTI
+ provides a mechanism to dynamically register for events from the
+ model. The GenericTrace.so MTI trace plugin provides a number of
+ trace events whose output can be logged in a simple text file.
+ The usage of this plugin is given in Section B.
+
+ In this document we will consider how the GenericTrace.so plugin
+ can be used during a cluster switchover to calculate the number
+ of cache hits in the outbound cluster L2 cache originating from
+ the inbound cluster before the outbound L2 is flushed and the
+ cluster placed in reset.
+
+B Plugin Usage
+
+ The GenericTrace plugin is loaded using the "--trace-plugin"
+ parameter in the command line to launch the model.
+
+ The trace sources provided by the plugin can be listed as
+ follows:
+
+ "RTSM_VE_Cortex-A15x1-A7x1 --trace-plugin GenericTrace.so
+ --parameter TRACE.GenericTrace.trace-sources= "
+
+ The parameters supported by the GenericTrace plugin can be
+ listed as follows:
+
+ "RTSM_VE_Cortex-A15x1-A7x1 --trace-plugin GenericTrace.so -l"
+
+ Some of the interesting parameters are:
+
+ TRACE.GenericTrace.trace-file: The trace file to write into. If
+ empty, the trace is printed to the console (STDOUT).
+
+ TRACE.GenericTrace.perf-period: Print performance every N
+ instructions. Since the instruction count and the global counter
+ have the same value on the Fast Models, this parameter provides
+ a good approximation of time.
+
+ TRACE.GenericTrace.flush: If set to true then the trace file will be
+ flushed after every event.
+
+C Plugin Trace sources
+
+ The GenericTrace plugin provides events which allow each cluster
+ to trace snoop requests originating from a different cluster that
+ hit in its caches. For snoops originating from the Cortex-A7 cluster
+ that hit in the A15 cluster, the event is 'read_for_4_came_from_snoop'
+ and for the opposite case the event is 'read_for_3_came_from_snoop'.
+ The numbers '3' and '4' in the names of the trace sources are the ids
+ of the CCI slave interfaces from which the snoops originated.
+
+ These trace sources are the per-cluster implementation of the
+ event id '0xA' "(Read data last handshake - data returned
+ from the cache rather than from downstream)" of the CCI PMU.
+ Please refer to the "Cache Coherent Interconnect (CCI-400)
+ Architecture Specification" for further details.
+
+ The plugin also provides the ability to trace code execution through
+ a memory mapped "tube" interface. This interface defines a set of
+ registers. When they are written in a particular sequence, and the
+ 'sw_trace_event' trace source was selected during model invocation,
+ the register values are printed to the trace file.
+
+ The "tube" interface defines:
+
+ - Three LE 64 bit registers of arbitrary data that can be
+ written (and retain their values).
+
+ - A tube-like char register which when written with '\0'
+ will generate an event with the current state of the
+ 64-bit registers and with the characters sent to the
+ device with a unique sequence_id.
+
+ All of these registers are banked and write-only. The trace
+ event will also output the cluster id and the CPU id. ARM
+ FastModels implement 1 to 4 TUBE interfaces. Please refer to
+ Section E for supported interfaces in the current model
+ release. The memory map of these registers can be found in
+ big-little/include/misc.h.
+
+ The 'write_trace' function in big-little/lib/tube.c implements the
+ software sequence to program the tube interface. This function is
+ called at various points in the switchover process. It prints out a
+ message which indicates that an event is about to start or has
+ completed, along with the value of the global counter in one of the
+ 64 bit registers. To enable this functionality, the environment
+ variable "TUBE" needs to be set to TRUE prior to code compilation.
+
+D Putting it all together
+
+ The list of steps to use the above mentioned functionality is:
+
+ 1. Build the Virtualizer code with "TUBE" support. On the
+ tcsh shell, this is done as follows:
+
+ $ setenv TUBE TRUE; make clean && make
+
+ 2. Launch the model with the MTI trace plugin support and a
+ selection of the right trace sources using a suitable
+ MXScript file in the 'bootwrapper' directory.
+
+ Once the switchover process starts, the trace file will contain output
+ that looks like this (not including the comments):
+
+ .
+ .
+ .
+ .
+ // Lines beginning with "PERFORMANCE" are a result of the value of the
+ // "TRACE.GenericTrace.perf-period" parameter. This string is printed
+ // to the trace file every <value> instructions (200 in this case).
+ // It indicates the rate at which the model is executing instructions
+ // and the number of instructions executed thus far.
+ PERFORMANCE: 2.8 MIPS (Inst:67216767)
+ .
+ .
+ .
+ // Lines beginning with "sw_trace_event<x>" are a result of enabling
+ // "TUBE" support in the code and selecting the "sw_trace_event" source
+ // while invoking the model. The interpretation of this message is:
+ //
+ // <x> : indicates the "TUBE" interface number.
+ // sequence_id : a unique number assigned to each message
+ // cluster_and_cpu_id : in the format 0x<cluster id><cpu id>. Each id
+ // occupies 8 bits.
+ // data0 : first 64-bit register value. Programmed with
+ // the value of the global counter.
+ // data1 : second 64-bit register value. Not used.
+ // data2 : third 64-bit register value. Not used.
+ // message : String written to the TUBE register
+ sw_trace_event2: sequence_id=0x00000001 cluster_and_cpu_id=0x0000 data0=0x000000000401a3dc data1=0x0000000000000000 data2=0x0000000000000000 message="Secure Coherency Enable Start":30
+ .
+ .
+ .
+ PERFORMANCE: 0.2 MIPS (Inst:67217079)
+ sw_trace_event2: sequence_id=0x00000002 cluster_and_cpu_id=0x0000 data0=0x000000000401a581 data1=0x0000000000000000 data2=0x0000000000000000 message="Secure Coherency Enable End":28
+ PERFORMANCE: 0.9 MIPS (Inst:67217301)
+ PERFORMANCE: 5.8 MIPS (Inst:67217511)
+ .
+ .
+ .
+ // Lines beginning with "read_for_<x>_came_from_snoop" are a result of
+ // enabling the event sources for monitoring the cache hits resulting
+ // from snoops originating from slave interface <x> on the CCI.
+ // The following line indicates that a snoop from the Cortex-A7 cluster
+ // hit in the caches of the A15 cluster. It also prints the cache line
+ // address and whether the access was Secure or Non-secure.
+ read_for_4_came_from_snoop: Bus address=0x000000008ff02440 Is non secure=N
+ read_for_4_came_from_snoop: Bus address=0x000000008ff02440 Is non secure=N
+ read_for_4_came_from_snoop: Bus address=0x000000008ff02240 Is non secure=N
+ read_for_4_came_from_snoop: Bus address=0x000000008ff02240 Is non secure=N
+ read_for_4_came_from_snoop: Bus address=0x000000008ff012c0 Is non secure=N
+ PERFORMANCE: 0.0 MIPS (Inst:135292834)
+ sw_trace_event: sequence_id=0x00000010 cluster_and_cpu_id=0x0000 data0=0x000000000810672e data1=0x0000000000000000 data2=0x0000000000000000 message="L2 Flush Begin":15
+ PERFORMANCE: 5.5 MIPS (Inst:135293056)
+ PERFORMANCE: 7.2 MIPS (Inst:135293374)
+ PERFORMANCE: 7.4 MIPS (Inst:135293587)
+ PERFORMANCE: 12.4 MIPS (Inst:135293800)
+ PERFORMANCE: 10.0 MIPS (Inst:135294118)
+ read_for_4_came_from_snoop: Bus address=0x0000000080054a80 Is non secure=Y
+ read_for_4_came_from_snoop: Bus address=0x0000000080054a80 Is non secure=Y
+ read_for_4_came_from_snoop: Bus address=0x0000000080054ac0 Is non secure=Y
+ read_for_4_came_from_snoop: Bus address=0x0000000080054ac0 Is non secure=Y
+ read_for_4_came_from_snoop: Bus address=0x0000000080074c80 Is non secure=Y
+ PERFORMANCE: 0.5 MIPS (Inst:135294331)
+ .
+ .
+ .
+ .
+ PERFORMANCE: 10.5 MIPS (Inst:135541612)
+ PERFORMANCE: 3.3 MIPS (Inst:135541929)
+ sw_trace_event: sequence_id=0x00000011 cluster_and_cpu_id=0x0000 data0=0x0000000008143442 data1=0x0000000000000000 data2=0x0000000000000000 message="L2 Flush End":13
+ .
+ .
+ .
+ .
+
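The field layout described in the comments above can be decoded mechanically. The following is a minimal Python sketch, assuming the trace file contains 'sw_trace_event<x>' lines in exactly the form shown in the sample output. The regular expression and the function name 'parse_sw_trace_event' are illustrative and are not part of the plugin or the Virtualizer code.

```python
import re

# Pattern modelled on the sample trace lines in this document. The exact
# layout produced by other model releases is an assumption.
SW_EVENT = re.compile(
    r'^(?P<source>sw_trace_event\d*):\s+'
    r'sequence_id=0x(?P<seq>[0-9a-fA-F]+)\s+'
    r'cluster_and_cpu_id=0x(?P<ids>[0-9a-fA-F]+)\s+'
    r'data0=0x(?P<data0>[0-9a-fA-F]+)\s+'
    r'data1=0x(?P<data1>[0-9a-fA-F]+)\s+'
    r'data2=0x(?P<data2>[0-9a-fA-F]+)\s+'
    r'message="(?P<message>[^"]*)"'
)


def parse_sw_trace_event(line):
    """Decode one sw_trace_event line; return None for any other line."""
    m = SW_EVENT.match(line.strip())
    if m is None:
        return None
    ids = int(m.group('ids'), 16)
    return {
        'source': m.group('source'),            # e.g. "sw_trace_event2"
        'sequence_id': int(m.group('seq'), 16),
        'cluster_id': (ids >> 8) & 0xFF,        # upper 8 bits
        'cpu_id': ids & 0xFF,                   # lower 8 bits
        'global_counter': int(m.group('data0'), 16),
        'message': m.group('message'),
    }
```

Applied to the "L2 Flush Begin" and "L2 Flush End" lines above, the difference between the two decoded 'global_counter' values (0x8143442 - 0x810672e) gives the duration of the flush in global counter ticks.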
+ Post-processing scripts can be developed which count the number of
+ 'read_for_<x>_came_from_snoop' events between two 'sw_trace_event<x>'
+ events. In the above example, the result will be the number of snoop
+ hits in the A15 caches while they were being flushed. In addition,
+ the "PERFORMANCE" strings can be used to determine the cache hit rate:
+ the number of snoop events logged between two consecutive "PERFORMANCE"
+ lines is the number of hits in the last perf-period instructions (200
+ in this case). Repeated iterations can be done where each iteration
+ changes the point in time at which the L2 cache is flushed during a
+ switchover. By monitoring the effect on the cache hit rate, a suitable
+ time can be determined to power down the outbound L2 cache.
+
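As an illustration of such a post-processing script, here is a minimal Python sketch. It assumes the trace file has the line formats shown in Section D. The function name 'snoop_hits_between' and the regular expressions are hypothetical and are not shipped with the model or the Virtualizer code.

```python
import re

# Patterns modelled on the sample trace in Section D (an assumption about
# the exact output format of the GenericTrace plugin).
SNOOP = re.compile(r'^read_for_(\d+)_came_from_snoop:')
MESSAGE = re.compile(r'^sw_trace_event\d*:.*message="([^"]*)"')
PERF = re.compile(r'^PERFORMANCE:.*\(Inst:(\d+)\)')


def snoop_hits_between(lines, start_msg, end_msg):
    """Count snoop hits logged between two sw_trace_event messages.

    Returns (total, windows). 'total' maps the CCI interface id to the
    number of hits seen in the region. 'windows' holds one
    (instruction_count, hits) pair per PERFORMANCE line in the region,
    where 'hits' is the number of snoop hits since the previous
    PERFORMANCE line (or since the start message).
    """
    total = {}
    windows = []
    window_hits = 0
    inside = False
    for raw in lines:
        line = raw.strip()
        m = MESSAGE.match(line)
        if m:
            if m.group(1) == start_msg:
                inside = True
                window_hits = 0
            elif inside and m.group(1) == end_msg:
                break
            continue
        if not inside:
            continue
        m = SNOOP.match(line)
        if m:
            interface = int(m.group(1))
            total[interface] = total.get(interface, 0) + 1
            window_hits += 1
            continue
        m = PERF.match(line)
        if m:
            windows.append((int(m.group(1)), window_hits))
            window_hits = 0
    return total, windows
```

The function accepts any iterable of lines, so an open trace file can be passed directly together with the two message strings, for example "L2 Flush Begin" and "L2 Flush End". Lines that match none of the patterns are ignored.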
+E Status of "TUBE" support
+
+ The current version of ARM FastModels (RTSM VE Cortex-A15 KF
+ CCI version MODEL_VERSION) implements only one 'tube'
+ interface i.e. TUBE0.
+
+ Subsequent releases will support up to four 'tube' interfaces i.e. TUBE0-3.
+ The Virtualizer code has been internally tested to work with all four 'tube'
+ sources and assumes their presence. Writing to a non-existent
+ 'tube' interface is treated as a nop and the trace file will contain
+ messages only from the 'sw_trace_event' source i.e. TUBE0.
+
+ (Please correspond with the ARM FastModels team for details on future ARM
+ FastModels releases that will support all four tube interfaces).