Diffstat (limited to 'docs/04-Cache-hit-rate-howto.txt')
-rw-r--r-- | docs/04-Cache-hit-rate-howto.txt | 208 |
1 files changed, 208 insertions, 0 deletions
diff --git a/docs/04-Cache-hit-rate-howto.txt b/docs/04-Cache-hit-rate-howto.txt
new file mode 100644
index 0000000..60d37a5
--- /dev/null
+++ b/docs/04-Cache-hit-rate-howto.txt
@@ -0,0 +1,208 @@
+Cache hit-rate HOWTO
+====================
+
+A Introduction
+
+    The ARM Fast Models are accompanied by a trace infrastructure
+    referred to as the Model Trace Interface (MTI). The MTI trace
+    provides a mechanism to dynamically register for events from the
+    model. The GenericTrace.so MTI trace plugin provides a number of
+    trace events whose output can be logged in a simple text file.
+    The usage of this plugin is given in Section B.
+
+    In this document we will consider how the GenericTrace.so plugin
+    can be used during a cluster switchover to calculate the number
+    of cache hits in the outbound cluster L2 cache originating from
+    the inbound cluster before the outbound L2 is flushed and the
+    cluster placed in reset.
+
+B Plugin Usage
+
+    The GenericTrace plugin is loaded using the "--trace-plugin"
+    parameter in the command line to launch the model.
+
+    A list of trace sources provided by the plugin can be listed as
+    follows:
+
+        "RTSM_VE_Cortex-A15x1-A7x1 --trace-plugin GenericTrace.so
+        --parameter TRACE.GenericTrace.trace-sources= "
+
+    A list of parameters supported by the Generic Trace plugin can
+    be listed as follows:
+
+        "RTSM_VE_Cortex-A15x1-A7x1 --trace-plugin GenericTrace.so -l"
+
+    Some of the interesting parameters are:
+
+    TRACE.GenericTrace.trace-file: The trace file to write into. If
+    empty, the trace is printed to console / STDOUT.
+
+    TRACE.GenericTrace.perf-period: Print performance every N
+    instructions. Since the instruction count and the global counter
+    have the same value on the Fast Models, this parameter provides
+    a good approximation of time.
+
+    TRACE.GenericTrace.flush: If set to true then the trace file will be
+    flushed after every event.
+
+C Plugin Trace sources
+
+    The GenericTrace plugin provides events which allow each cluster
+    to trace snoop requests originating from a different cluster that
+    hit in its caches. For snoops originating from the Cortex-A7 cluster
+    that hit in the A15 cluster, the event is 'read_for_4_came_from_snoop'
+    & for the opposite case the event is 'read_for_3_came_from_snoop'.
+    The numbers '3' & '4' in the name of the trace sources are the ids
+    of the CCI slave interfaces from where the snoop originated.
+
+    These trace sources are the per-cluster implementation of the
+    event id '0xA' "(Read data last handshake - data returned
+    from the cache rather than from downstream)" of the CCI PMU.
+    Please refer to the "Cache Coherent Interconnect (CCI-400)
+    Architecture Specification" for further details.
+
+    The plugin also provides the ability to trace code execution through
+    a memory mapped "tube" interface. This interface defines a list of
+    registers which, when written in a particular sequence with the
+    'sw_trace_event' trace source selected during model invocation,
+    cause the register values to be printed in the trace file.
+
+    The "tube" interface defines:
+
+        - Three LE 64 bit registers of arbitrary data that can be
+          written (and retain their values).
+
+        - A tube-like char register which when written with '\0'
+          will generate an event with the current state of the
+          64-bit registers and with the characters sent to the
+          device with a unique sequence_id.
+
+    All of these registers are banked and write-only; the trace
+    event will also output the cluster id and the CPU id. ARM
+    FastModels implement 1 to 4 TUBE interfaces. Please refer to
+    Section E for supported interfaces in the current model
+    release. The memory map of these registers can be found in
+    big-little/include/misc.h.
+
+    The 'write_trace' function in big-little/lib/tube.c implements the
+    software sequence to program the tube interface. This function is
+    called at various points in the switchover process. It prints out a
+    message which indicates that an event is about to start or has
+    completed along with the value of the global counter in one of the
+    64 bit registers. To enable this functionality, the environment
+    variable "TUBE" needs to be defined to TRUE prior to code compilation.
+
+D Putting it all together
+
+    The list of steps to use the above mentioned functionality is:
+
+    1. Build the Virtualizer code with "TUBE" support. On the
+       tcsh shell, this is done as follows:
+
+       $ setenv TUBE TRUE; make clean && make
+
+    2. Launch the model with the MTI trace plugin support and a
+       selection of the right trace sources using a suitable
+       MXScript file in the 'bootwrapper' directory.
+
+    Once the switchover process starts, the trace file will contain output
+    that looks like this (not including the comments):
+
+    .
+    .
+    .
+    .
+    // Lines beginning with "PERFORMANCE" are a result of the value of the
+    // "TRACE.GenericTrace.perf-period" parameter. This string is printed
+    // every <value> number of instructions (200 in this case) in the trace
+    // file. It indicates the rate at which the model is executing
+    // instructions & the number of instructions executed thus far.
+    PERFORMANCE: 2.8 MIPS (Inst:67216767)
+    .
+    .
+    .
+    // Lines beginning with "sw_trace_event<x>" are a result of enabling
+    // "TUBE" support in the code and selecting the "sw_trace_event" source
+    // while invoking the model. The interpretation of this message is:
+    //
+    // <x>                : indicates the "TUBE" interface number.
+    // sequence_id        : a unique number assigned to each message
+    // cluster_and_cpu_id : in the format 0x<cluster id><cpu id>. Each id
+    //                      occupies 8 bits.
+    // data0              : first 64-bit register value. Programmed with
+    //                      the value of the global counter.
+    // data1              : second 64-bit register value. Not used.
+    // data2              : third 64-bit register value. Not used.
+    // message            : String written to the TUBE register
+    sw_trace_event2: sequence_id=0x00000001 cluster_and_cpu_id=0x0000 data0=0x000000000401a3dc data1=0x0000000000000000 data2=0x0000000000000000 message="Secure Coherency Enable Start":30
+    .
+    .
+    .
+    PERFORMANCE: 0.2 MIPS (Inst:67217079)
+    sw_trace_event2: sequence_id=0x00000002 cluster_and_cpu_id=0x0000 data0=0x000000000401a581 data1=0x0000000000000000 data2=0x0000000000000000 message="Secure Coherency Enable End":28
+    PERFORMANCE: 0.9 MIPS (Inst:67217301)
+    PERFORMANCE: 5.8 MIPS (Inst:67217511)
+    .
+    .
+    .
+    // Lines beginning with "read_for_<x>_came_from_snoop" are a result of
+    // enabling the event sources for monitoring the cache hits resulting
+    // from snoops originating from slave interface <x> on the CCI.
+    // The following line indicates that a snoop from the Cortex-A7 cluster
+    // hit in the caches of the A15 cluster. It also prints the cache line
+    // address and whether the access was Secure or Non-secure.
+    read_for_4_came_from_snoop: Bus address=0x000000008ff02440 Is non secure=N
+    read_for_4_came_from_snoop: Bus address=0x000000008ff02440 Is non secure=N
+    read_for_4_came_from_snoop: Bus address=0x000000008ff02240 Is non secure=N
+    read_for_4_came_from_snoop: Bus address=0x000000008ff02240 Is non secure=N
+    read_for_4_came_from_snoop: Bus address=0x000000008ff012c0 Is non secure=N
+    PERFORMANCE: 0.0 MIPS (Inst:135292834)
+    sw_trace_event: sequence_id=0x00000010 cluster_and_cpu_id=0x0000 data0=0x000000000810672e data1=0x0000000000000000 data2=0x0000000000000000 message="L2 Flush Begin":15
+    PERFORMANCE: 5.5 MIPS (Inst:135293056)
+    PERFORMANCE: 7.2 MIPS (Inst:135293374)
+    PERFORMANCE: 7.4 MIPS (Inst:135293587)
+    PERFORMANCE: 12.4 MIPS (Inst:135293800)
+    PERFORMANCE: 10.0 MIPS (Inst:135294118)
+    read_for_4_came_from_snoop: Bus address=0x0000000080054a80 Is non secure=Y
+    read_for_4_came_from_snoop: Bus address=0x0000000080054a80 Is non secure=Y
+    read_for_4_came_from_snoop: Bus address=0x0000000080054ac0 Is non secure=Y
+    read_for_4_came_from_snoop: Bus address=0x0000000080054ac0 Is non secure=Y
+    read_for_4_came_from_snoop: Bus address=0x0000000080074c80 Is non secure=Y
+    PERFORMANCE: 0.5 MIPS (Inst:135294331)
+    .
+    .
+    .
+    .
+    PERFORMANCE: 10.5 MIPS (Inst:135541612)
+    PERFORMANCE: 3.3 MIPS (Inst:135541929)
+    sw_trace_event: sequence_id=0x00000011 cluster_and_cpu_id=0x0000 data0=0x0000000008143442 data1=0x0000000000000000 data2=0x0000000000000000 message="L2 Flush End":13
+    .
+    .
+    .
+    .
+
+    Post-processing scripts can be developed which count the number of
+    'read_for_<x>_came_from_snoop' events between two 'sw_trace_event<x>'
+    events. In the above example, the result will be the number of snoop
+    hits in the A15 caches while they were being flushed. In addition,
+    the "PERFORMANCE" strings can be used to determine the cache hit rate.
+    In this case, they indicate the number of hits in the last 200
+    instructions. Repeated iterations can be done where each iteration
+    changes the point of time when the L2 cache is flushed during a
+    switchover. By monitoring its effect on the cache hit rate, a suitable
+    time can be determined to power down the outbound L2 cache.
+
+E Status of "TUBE" support
+
+    The current version of ARM FastModels (RTSM VE Cortex-A15 KF
+    CCI version MODEL_VERSION) implements only one 'tube'
+    interface i.e. TUBE0.
+
+    Subsequent releases will support up to four 'tube' interfaces i.e. TUBE0-3.
+    The Virtualizer code has been internally tested to work with all four 'tube'
+    sources and assumes their presence. Writing to a non-existent
+    'tube' interface is treated as a nop and the trace file will contain
+    messages only from the 'sw_trace_event' source i.e. TUBE0.
+
+    (Please correspond with the ARM FastModels team for details on future ARM
+    FastModels releases that will support all four tube interfaces).
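
Note on the post-processing step: Section D of the added HOWTO says scripts can count 'read_for_<x>_came_from_snoop' events between two 'sw_trace_event<x>' events, but the patch does not include one. The following is a minimal Python sketch of such a script, not part of the commit. The names count_snoop_hits and SAMPLE are hypothetical, and the regular expressions assume the trace file has exactly the line format of the sample output in the HOWTO.

```python
import re

# Assumed line formats, taken from the sample trace in Section D.
SW_EVENT = re.compile(r'^sw_trace_event\d*:.*\bmessage="([^"]*)"')
SNOOP_HIT = re.compile(r'^read_for_(\d+)_came_from_snoop:')


def count_snoop_hits(lines, start_msg, end_msg):
    """Count snoop-hit events, keyed by the interface id in the event
    name, that appear between the sw_trace_event carrying start_msg
    and the next one carrying end_msg."""
    counts = {}
    inside = False
    for line in lines:
        line = line.strip()
        event = SW_EVENT.match(line)
        if event:
            if event.group(1) == start_msg:
                inside = True
            elif inside and event.group(1) == end_msg:
                break
            continue
        hit = SNOOP_HIT.match(line)
        if inside and hit:
            iface = int(hit.group(1))
            counts[iface] = counts.get(iface, 0) + 1
    return counts


# Hypothetical excerpt assembled from lines of the sample trace.
SAMPLE = """\
read_for_4_came_from_snoop: Bus address=0x000000008ff012c0 Is non secure=N
PERFORMANCE: 0.0 MIPS (Inst:135292834)
sw_trace_event: sequence_id=0x00000010 cluster_and_cpu_id=0x0000 data0=0x000000000810672e data1=0x0000000000000000 data2=0x0000000000000000 message="L2 Flush Begin":15
PERFORMANCE: 5.5 MIPS (Inst:135293056)
read_for_4_came_from_snoop: Bus address=0x0000000080054a80 Is non secure=Y
read_for_4_came_from_snoop: Bus address=0x0000000080054ac0 Is non secure=Y
PERFORMANCE: 0.5 MIPS (Inst:135294331)
sw_trace_event: sequence_id=0x00000011 cluster_and_cpu_id=0x0000 data0=0x0000000008143442 data1=0x0000000000000000 data2=0x0000000000000000 message="L2 Flush End":13
"""

hits = count_snoop_hits(SAMPLE.splitlines(), "L2 Flush Begin", "L2 Flush End")
for iface, n in sorted(hits.items()):
    print(f"read_for_{iface}_came_from_snoop: {n} hits during L2 flush")
```

For a real trace, the same function can be given an open file object in place of SAMPLE.splitlines(). The hit that precedes "L2 Flush Begin" in the excerpt is deliberately left out of the count, since only events inside the window are of interest.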