aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu/perf_event_p4.c
AgeCommit message (Collapse)Author
2014-02-09perf/x86/p4: Block PMIs on init to prevent a stream of unkown NMIsDon Zickus
A bunch of unknown NMIs have popped up on a Pentium4 recently when booting into a kdump kernel. This was exposed because the watchdog timer went from 60 seconds down to 10 seconds (increasing the ability to reproduce this problem). What is happening is on boot up of the second kernel (the kdump one), the previous nmi_watchdogs were enabled on thread 0 and thread 1. The second kernel only initializes one cpu but the perf counter on thread 1 still counts. Normally in a kdump scenario, the other cpus are blocking in an NMI loop, but more importantly their local apics have the performance counters disabled (iow LVTPC is masked). So any counters that fire are masked and never get through to the second kernel. However, on a P4 the local apic is shared by both threads and thread1's PMI (despite being configured to only interrupt thread1) will generate an NMI on thread0. Because thread0 knows nothing about this NMI, it is seen as an unknown NMI. This would be fine because it is a kdump kernel, strange things happen what is the big deal about a single unknown NMI. Unfortunately, the P4 comes with another quirk: clearing the overflow bit to prevent a stream of NMIs. This is the problem. The kdump kernel can not execute because of the endless NMIs that happen. To solve this, I instrumented the p4 perf init code, to walk all the counters and zero them out (just like a normal reset would). Now when the counters go off, they do not generate anything and no unknown NMIs are seen. I tested this on a P4 we have in our lab. After two or three crashes, I could normally reproduce the problem. Now after 10 crashes, everything continues to boot correctly. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140120154115.GZ25953@redhat.com [ Fixed a stylistic detail. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09perf/x86/p4: Fix counter corruption when using lots of perf groupsDon Zickus
On a P4 box stressing perf with: ./perf record -o perf.data ./perf stat -v ./perf bench all it was noticed that a slew of unknown NMIs would pop out rather quickly. Painfully debugging this ancient platform, led me to notice cross cpu counter corruption. The P4 machine is special in that it has 18 counters, half are used for cpu0 and the other half is for cpu1 (or all 18 if hyperthreading is disabled). But the splitting of the counters has to be actively managed by the software. In this particular bug, one of the cpu0 specific counters was being used by cpu1 and caused all sorts of random unknown nmis. I am not entirely sure on the corruption path, but what happens is: o perf schedules a group with p4_pmu_schedule_events() o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused but for a different cpu, so it 'swaps' the config bits and returns the updated 'assign' array with a _new_ index. o perf schedules another group with p4_pmu_schedule_events() o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused (the same one as above) but for the _same_ cpu [BUG!!], so it updates the 'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in an earlier part of the 'assign' array (and hasn't been committed yet). o perf commits the transaction using the wrong index and corrupts the other cpu The [BUG!!] is because the 'hwc->config' is updated but not the 'hwc->idx'. So the check for 'p4_should_swap_ts()' is correct the first time around but incorrect the second time around (because hwc->config was updated in between). I think the spirit of perf was to not modify anything until all the transactions had a chance to 'test' if they would succeed, and if so, commit atomically. However, P4 breaks this spirit by touching the hwc->config element. So my fix is to continue the un-perf like breakage, by assigning hwc->idx to -1 on swap to tell follow up group scheduling to find a new index. Of course if the transaction fails rolling this back will be difficult, but that is not different than how the current code works. :-) And I wasn't sure how much effort to cleanup the code I should do for a platform that is almost 10 years old by now. Hence the lazy fix. Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1391024270-19469-1-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-26perf/x86/intel/P4: Robistify P4 PMU typesIngo Molnar
Linus found, while extending integer type extension checks in the sparse static code checker, various fragile patterns of mixed signed/unsigned 64-bit/32-bit integer use in perf_events_p4.c. The relevant hardware register ABI is 64 bit wide on 32-bit kernels as well, so clean it all up a bit, remove unnecessary casts, and make sure we use 64-bit unsigned integers in these places. [ Unfortunately this patch was not tested on real P4 hardware, those are pretty rare already. If this patch causes any problems on P4 hardware then please holler ... ] Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Miller <davem@davemloft.net> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130424072630.GB1780@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Rename Intel specific macrosRobert Richter
There are macros that are Intel specific and not x86 generic. Rename them into INTEL_*. This patch removes X86_PMC_IDX_GENERIC and does: $ sed -i -e 's/X86_PMC_MAX_/INTEL_PMC_MAX_/g' \ arch/x86/include/asm/kvm_host.h \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_p4.c \ arch/x86/kvm/pmu.c $ sed -i -e 's/X86_PMC_IDX_FIXED/INTEL_PMC_IDX_FIXED/g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_intel_ds.c \ arch/x86/kvm/pmu.c $ sed -i -e 's/X86_PMC_MSK_/INTEL_PMC_MSK_/g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-07x86, cpu: Rename checking_wrmsrl() to wrmsrl_safe()H. Peter Anvin
Rename checking_wrmsrl() to wrmsrl_safe(), to match the naming convention used by all the other MSR access functions/macros. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-05-09perf: Pass last sampling period to perf_sample_data_init()Robert Richter
We always need to pass the last sample period to perf_sample_data_init(), otherwise the event distribution will be wrong. Thus, modifiyng the function interface with the required period as argument. So basically a pattern like this: perf_sample_data_init(&data, ~0ULL); data.period = event->hw.last_period; will now be like that: perf_sample_data_init(&data, ~0ULL, event->hw.last_period); Avoids unininitialized data.period and simplifies code. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1333390758-10893-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-03perf/x86/p4: Add format attributesPeter Zijlstra
Steven reported his P4 not booting properly, the missing format attributes cause a NULL ptr deref. Cure this by adding the missing format specification. I took the format description out of the comment near p4_config_pack*() and hope that comment is still relatively accurate. Reported-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1332859842.16159.227.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
2011-11-14perf: Don't use -ENOSPC for out of PMU resourcesPeter Zijlstra
People (Linus) objected to using -ENOSPC to signal not having enough resources on the PMU to satisfy the request. Use -EINVAL. Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Daney <david.daney@cavium.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-xv8geaz2zpbjhlx0svmpp28n@git.kernel.org [ merged to newer kernel, fixed up MIPS impact ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-26x86, perf: Clean up perf_event cpu codeKevin Winchester
The CPU support for perf events on x86 was implemented via included C files with #ifdefs. Clean this up by creating a new header file and compiling the vendor-specific files as needed. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314747665-2090-1-git-send-email-kjwinchester@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21x86, perf: P4 PMU - Fix typos in comments and style cleanupCyrill Gorcunov
This patch: - fixes typos in comments and clarifies the text - renames obscure p4_event_alias::original and ::alter members to ::original and ::alternative as appropriate - drops parenthesis from the return of p4_get_alias_event() No functional changes. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/r/20110721160625.GX7492@sun Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-14perf, x86: P4 PMU - Introduce event alias featureCyrill Gorcunov
Instead of hw_nmi_watchdog_set_attr() weak function and appropriate x86_pmu::hw_watchdog_set_attr() call we introduce even alias mechanism which allow us to drop this routines completely and isolate quirks of Netburst architecture inside P4 PMU code only. The main idea remains the same though -- to allow nmi-watchdog and perf top run simultaneously. Note the aliasing mechanism applies to generic PERF_COUNT_HW_CPU_CYCLES event only because arbitrary event (say passed as RAW initially) might have some additional bits set inside ESCR register changing the behaviour of event and we can't guarantee anymore that alias event will give the same result. P.S. Thanks a huge to Don and Steven for for testing and early review. Acked-by: Don Zickus <dzickus@redhat.com> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Ingo Molnar <mingo@elte.hu> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Stephane Eranian <eranian@google.com> CC: Lin Ming <ming.m.lin@intel.com> CC: Arnaldo Carvalho de Melo <acme@redhat.com> CC: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20110708201712.GS23657@sun Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-01perf, arch: Add generic NODE cache eventsPeter Zijlstra
Add a NODE level to the generic cache events which is used to measure local vs remote memory accesses. Like all other cache events, an ACCESS is HIT+MISS, if there is no way to distinguish between reads and writes do reads only etc.. The below needs filling out for !x86 (which I filled out with unsupported events). I'm fairly sure ARM can leave it like that since it doesn't strike me as an architecture that even has NUMA support. SH might have something since it does appear to have some NUMA bits. Sparc64, PowerPC and MIPS certainly want a good look there since they clearly are NUMA capable. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Miller <davem@davemloft.net> Cc: Anton Blanchard <anton@samba.org> Cc: David Daney <ddaney@caviumnetworks.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01perf: Remove the nmi parameter from the swevent and overflow interfacePeter Zijlstra
The nmi parameter indicated if we could do wakeups from the current context, if not, we would set some state and self-IPI and let the resulting interrupt do the wakeup. For the various event classes: - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from the PMI-tail (ARM etc.) - tracepoint: nmi=0; since tracepoint could be from NMI context. - software: nmi=[0,1]; some, like the schedule thing cannot perform wakeups, and hence need 0. As one can see, there is very little nmi=1 usage, and the down-side of not using it is that on some platforms some software events can have a jiffy delay in wakeup (when arch_irq_work_raise isn't implemented). The up-side however is that we can remove the nmi parameter and save a bunch of conditionals in fast paths. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Cree <mcree@orcon.net.nz> Cc: Will Deacon <will.deacon@arm.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01perf, x86: Add hw_watchdog_set_attr() in a sake of nmi-watchdog on P4Cyrill Gorcunov
Due to restriction and specifics of Netburst PMU we need a separated event for NMI watchdog. In particular every Netburst event consumes not just a counter and a config register, but also an additional ESCR register. Since ESCR registers are grouped upon counters (i.e. if ESCR is occupied for some event there is no room for another event to enter until its released) we need to pick up the "least" used ESCR (or the most available one) for nmi-watchdog purposes -- so MSR_P4_CRU_ESCR2/3 was chosen. With this patch nmi-watchdog and perf top should be able to run simultaneously. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Lin Ming <ming.m.lin@intel.com> CC: Arnaldo Carvalho de Melo <acme@redhat.com> CC: Frederic Weisbecker <fweisbec@gmail.com> Tested-and-reviewed-by: Don Zickus <dzickus@redhat.com> Tested-and-reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110623124918.GC13050@sun Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-01Merge branch 'tip/perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2011-04-27perf, x86, nmi: Move LVT un-masking into irq handlersDon Zickus
It was noticed that P4 machines were generating double NMIs for each perf event. These extra NMIs lead to 'Dazed and confused' messages on the screen. I tracked this down to a P4 quirk that said the overflow bit had to be cleared before re-enabling the apic LVT mask. My first attempt was to move the un-masking inside the perf nmi handler from before the chipset NMI handler to after. This broke Nehalem boxes that seem to like the unmasking before the counters themselves are re-enabled. In order to keep this change simple for 2.6.39, I decided to just simply move the apic LVT un-masking to the beginning of all the chipset NMI handlers, with the exception of Pentium4's to fix the double NMI issue. Later on we can move the un-masking to later in the handlers to save a number of 'extra' NMIs on those particular chipsets. I tested this change on a P4 machine, an AMD machine, a Nehalem box, and a core2quad box. 'perf top' worked correctly along with various other small 'perf record' runs. Anything high stress breaks all the machines but that is a different problem. Thanks to various people for testing different versions of this patch. Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Link: http://lkml.kernel.org/r/1303900353-10242-1-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu> CC: Cyrill Gorcunov <gorcunov@gmail.com>
2011-04-24perf events, x86, P4: Fix typo in commentJustin P. Mattock
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: trivial@kernel.org Link: http://lkml.kernel.org/r/1303492132-3004-1-git-send-email-justinmattock@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflowCyrill Gorcunov
It's not enough to simply disable event on overflow the cpuc->active_mask should be cleared as well otherwise counter may stall in "active" even in real being already disabled (which potentially may lead to the situation that user may not use this counter further). Don pointed out that: " I also noticed this patch fixed some unknown NMIs on a P4 when I stressed the box". Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22perf, x86: P4 PMU -- Use perf_sample_data_init helperCyrill Gorcunov
Instead of opencoded assignments better to use perf_sample_data_init helper. Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Link: http://lkml.kernel.org/r/1303398203-2918-2-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-29perf, x86: P4 PMU - clean up the code a bitCyrill Gorcunov
No change on the functional level, just align the table properly. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <4D8FA213.5050108@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-25Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflows perf symbols: Look at .dynsym again if .symtab not found perf build-id: Add quirk to deal with perf.data file format breakage perf session: Pass evsel in event_ops->sample() perf: Better fit max unprivileged mlock pages for tools needs perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx() perf top: Fix uninitialized 'counter' variable tracing: Fix set_ftrace_filter probe function display perf, x86: Fix Intel fixed counters base initialization
2011-03-24perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflowsDon Zickus
The read of a proper MSR register was missed and instead of counter the configration register was tested (it has ARCH_P4_UNFLAGGED_BIT always cleared) leading to unknown NMI hitting the system. As result the user may obtain "Dazed and confused, but trying to continue" message. Fix it by reading a proper MSR register. When an NMI happens on a P4, the perf nmi handler checks the configuration register to see if the overflow bit is set or not before taking appropriate action. Unfortunately, various P4 machines had a broken overflow bit, so a backup mechanism was implemented. This mechanism checked to see if the counter rolled over or not. A previous commit that implemented this backup mechanism was broken. Instead of reading the counter register, it used the configuration register to determine if the counter rolled over or not. Reading that bit would give incorrect results. This would lead to 'Dazed and confused' messages for the end user when using the perf tool (or if the nmi watchdog is running). The fix is to read the counter register before determining if the counter rolled over or not. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <4D8BAB49.3080701@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-18x86: Fix common misspellingsLucas De Marchi
They were generated by 'codespell' and then manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: trivial@kernel.org LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-16perf, x86: Store perfctr msr addresses in config_base/event_baseRobert Richter
Instead of storing the base addresses we can store the counter's msr addresses directly in config_base/event_base of struct hw_perf_event. This avoids recalculating the address with each msr access. The addresses are configured one time. We also need this change to later modify the address calculation. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1296664860-10886-5-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-16perf, x86: P4 PMU: Fix spurious NMI messagesCyrill Gorcunov
Several people have reported spurious unknown NMI messages on some P4 CPUs. This patch fixes it by checking for an overflow (negative counter values) directly, instead of relying on the P4_CCCR_OVF bit. Reported-by: George Spelvin <linux@horizon.com> Reported-by: Meelis Roos <mroos@linux.ee> Reported-by: Don Zickus <dzickus@redhat.com> Reported-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <AANLkTinfuTfCck_FfaOHrDqQZZehtRzkBum4SpFoO=KJ@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-27perf: Fix Pentium4 raw event validationStephane Eranian
This patch fixes some issues with raw event validation on Pentium 4 (Netburst) based processors. As I was testing libpfm4 Netburst support, I ran into two problems in the p4_validate_raw_event() function: - the shared field must be checked ONLY when HT is on - the binding to ESCR register was missing The second item was causing raw events to not be encoded correctly compared to generic PMU events. With this patch, I can now pass Netburst events to libpfm4 examples and get meaningful results: $ task -e global_power_events:running:u noploop 1 noploop for 1 seconds 3,206,304,898 global_power_events:running Signed-off-by: Stephane Eranian <eranian@google.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: peterz@infradead.org Cc: paulus@samba.org Cc: davem@davemloft.net Cc: fweisbec@gmail.com Cc: perfmon2-devel@lists.sf.net Cc: eranian@gmail.com Cc: robert.richter@amd.com Cc: acme@redhat.com Cc: gorcunov@gmail.com Cc: ming.m.lin@intel.com LKML-Reference: <4d3efb2f.1252d80a.1a80.ffffc83f@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-09perf, x86: P4 PMU - Fix unflagged overflows handlingCyrill Gorcunov
Don found that P4 PMU reads CCCR register instead of counter itself (in attempt to catch unflagged event) this makes P4 NMI handler to consume all NMIs it observes. So the other NMI users such as kgdb simply have no chance to get NMI on their hands. Side note: at moment there is no way to run nmi-watchdog together with perf tool. This is because both 'perf top' and nmi-watchdog use same event. So while nmi-watchdog reserves one event/counter for own needs there is no room for perf tool left (there is a way to disable nmi-watchdog on boot of course). Ming has tested this patch with the following results | 1. watchdog disabled | | kgdb tests on boot OK | perf works OK | | 2. watchdog enabled, without patch perf-x86-p4-nmi-4 | | kgdb tests on boot hang | | 3. watchdog enabled, without patch perf-x86-p4-nmi-4 and do not run kgdb | tests on boot | | "perf top" partialy works | cpu-cycles no | instructions yes | cache-references no | cache-misses no | branch-instructions no | branch-misses yes | bus-cycles no | | 4. watchdog enabled, with patch perf-x86-p4-nmi-4 applied | | kgdb tests on boot OK | perf does not work, NMI "Dazed and confused" messages show up | Which means we still have problems with p4 box due to 'unknown' nmi happens but at least it should fix kgdb test cases. Reported-by: Jason Wessel <jason.wessel@windriver.com> Reported-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Don Zickus <dzickus@redhat.com> Acked-by: Lin Ming <ming.m.lin@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <4D275E7E.3040903@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-05Merge branch 'perf/urgent' into perf/coreIngo Molnar
Conflicts: tools/perf/util/ui/browsers/hists.c Merge reason: fix the conflict and merge in changes for dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-30perf, x86: Handle in flight NMIs on P4 platformCyrill Gorcunov
Stephane reported we've forgot to guard the P4 platform against spurious in-flight performance IRQs. Fix it. This fixes potential spurious 'dazed and confused' NMI messages. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: fweisbec@gmail.com Cc: peterz@infradead.org Cc: Robert Richter <robert.richter@amd.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <1285815698-4298-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge reason: Pick up pending fixes before applying dependent new changes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03perf, x86: Fix handle_irq return valuesPeter Zijlstra
Now that we rely on the number of handled overflows, ensure all handle_irq implementations actually return the right number. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: robert.richter@amd.com Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com Cc: ying.huang@intel.com Cc: ming.m.lin@intel.com Cc: eranian@google.com LKML-Reference: <1283454469-1909-4-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-01perf, x86, Pentium4: Add RAW events verificationCyrill Gorcunov
Implements verification of - Bits of ESCR EventMask field (meaningful bits in field are hardware predefined and others bits should be set to zero) - INSTR_COMPLETED event (it is available on predefined cpu model only) - Thread shared events (they should be guarded by "perf_event_paranoid" sysctl due to security reason). The side effect of this action is that PERF_COUNT_HW_BUS_CYCLES become a "paranoid" general event. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Tested-by: Lin Ming <ming.m.lin@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20100825182334.GB14874@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flagLin Ming
If on Pentium4 CPUs the FORCE_OVF flag is set then an NMI happens on every event, which can generate a flood of NMIs. Clear it. Reported-by: Vince Weaver <vweaver1@eecs.utk.edu> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-08perf, x86: P4 PMU -- update nmi irq statistics and unmask lvt entry properlyCyrill Gorcunov
In case if last active performance counter is not overflowed at moment of NMI being triggered by another counter, the irq statistics may miss an update stage. As a more serious consequence -- apic quirk may not be triggered so apic lvt entry stay masked. Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Stephane Eranian <eranian@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100805150917.GA6311@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-05perf, x86: P4 PMU -- redesign cache eventsCyrill Gorcunov
To support cache events we have reserved the low 6 bits in hw_perf_event::config (which is a part of CCCR register configuration actually). These bits represent Replay Event mertic enumerated in enum P4_PEBS_METRIC. The caller should not care about which exact bits should be set and how -- the caller just chooses one P4_PEBS_METRIC entity and puts it into the config. The kernel will track it and set appropriate additional MSR registers (metrics) when needed. The reason for this redesign was the PEBS enable bit, which should not be set until DS (and PEBS sampling) support will be implemented properly. TODO ==== - PEBS sampling (note it's tricky and works with _one_ counter only so for HT machines it will be not that easy to handle both threads) - tracking of PEBS registers state, a user might need to turn PEBS off completely (ie no PEBS enable, no UOP_tag) but some other event may need it, such events clashes and should not run simultaneously, at moment we just don't support such events - eventually export user space bits in separate header which will allow user apps to configure raw events more conveniently. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1278295769.9540.15.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-09perf, x86: Make a second write to performance counter if neededCyrill Gorcunov
On Netburst PMU we need a second write to a performance counter due to cpu erratum. A simple flag test instead of alternative instructions was choosen because wrmsrl is already a macro and if virtualization is turned on will need an additional wrapper call which is more expencise. nb: we should propably switch to jump-labels as only this facility reach the mainline. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100602212304.GC5264@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-19perf, x86: P4_pmu_schedule_events -- use smp_processor_id instead of raw_Cyrill Gorcunov
This snippet somehow escaped the commit: | commit 137351e0feeb9f25d99488ee1afc1c79f5499a9a | Author: Cyrill Gorcunov <gorcunov@openvz.org> | Date: Sat May 8 15:25:52 2010 +0400 | | x86, perf: P4 PMU -- protect sensible procedures from preemption so bring it eventually back. It helps to catch preemption issue (if there will be, rule of thumb -- don't use raw_ if you can). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100518212439.167259349@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-19perf, x86: P4 PMU -- do a real check for ESCR address being in hashCyrill Gorcunov
To prevent from clashes in future code modifications do a real check for ESCR address being in hash. At moment the callers are known to pass sane values but better to be on a safe side. And comment fix. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Lin Ming <ming.m.lin@intel.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100518212439.004503600@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-18perf, x86: P4 PMU -- fix typo in unflagged NMI handlingCyrill Gorcunov
Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> LKML-Reference: <1274174954.22793.17.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-18perf, x86: P4 PMU -- handle unflagged eventsCyrill Gorcunov
It might happen that an event can overflow without the proper overflow flag set. Check the sign bit in the raw counter value to solve this problem. Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: fweisbec@gmail.com Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1274083984.6540.15.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-15x86, perf: P4 PMU - fix counters management logicCyrill Gorcunov
Jaswinder reported this #GP: | | Message from syslogd@ht at May 14 09:39:32 ... | kernel:[ 314.908612] EIP: [<c100ccca>] | x86_perf_event_set_period+0x19d/0x1b2 SS:ESP 0068:edac3d70 | Ming has narrowed it down to a comparision issue between arguments with different sizes and signs. As result event index reached a wrong value which in turn led to a GP fault. At the same time it was found that p4_next_cntr has broken logic and should return the counter index only if it was not yet borrowed for another event. Reported-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Reported-by: Lin Ming <ming.m.lin@intel.com> Bisected-by: Lin Ming <ming.m.lin@intel.com> Tested-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100514190815.GG13509@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-13x86, perf: P4 PMU -- use hash for p4_get_escr_idx()Cyrill Gorcunov
Linear search over all p4 MSRs should be fine if only we would not use it in events scheduling routine which is pretty time critical. Lets use hashes. It should speed scheduling up significantly. v2: Steven proposed to use more gentle approach than issue BUG on error, so we use WARN_ONCE now Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100512174242.GA5190@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-08x86, perf: P4 PMU -- check for proper event index in RAW eventsCyrill Gorcunov
RAW events are special and we should be ready for user passing in insane event index values. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100508112717.315897547@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-08x86, perf: P4 PMU -- Get rid of redundant check for array indexCyrill Gorcunov
The caller already has done such a check. And it was wrong anyway, it had to be '>=' rather than '>' Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100508112717.130386882@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-08x86, perf: P4 PMU -- protect sensible procedures from preemptionCyrill Gorcunov
Steven reported: | | I'm getting: | | Pid: 3477, comm: perf Not tainted 2.6.34-rc6 #2727 | Call Trace: | [<ffffffff811c7565>] debug_smp_processor_id+0xd5/0xf0 | [<ffffffff81019874>] p4_hw_config+0x2b/0x15c | [<ffffffff8107acbc>] ? trace_hardirqs_on_caller+0x12b/0x14f | [<ffffffff81019143>] hw_perf_event_init+0x468/0x7be | [<ffffffff810782fd>] ? debug_mutex_init+0x31/0x3c | [<ffffffff810c68b2>] T.850+0x273/0x42e | [<ffffffff810c6cab>] sys_perf_event_open+0x23e/0x3f1 | [<ffffffff81009e6a>] ? sysret_check+0x2e/0x69 | [<ffffffff81009e32>] system_call_fastpath+0x16/0x1b | | When running perf record in latest tip/perf/core | Due to the fact that p4 counters are shared between HT threads we synthetically divide the whole set of counters into two non-intersected subsets. And while we're "borrowing" counters from these subsets we should not be preempted (well, strictly speaking in p4_hw_config we just pre-set reference to the subset which allow to save some cycles in schedule routine if it happens on the same cpu). So use get_cpu/put_cpu pair. Also p4_pmu_schedule_events should use smp_processor_id rather than raw_ version. This allow us to catch up preemption issue (if there will ever be). Reported-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100508112716.963478928@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-08x86, perf: P4 PMU -- configure predefined eventsCyrill Gorcunov
If an event is not RAW we should not exit p4_hw_config early but call x86_setup_perfctr as well. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Robert Richter <robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-07perf, x86: Call x86_setup_perfctr() from .hw_config()Robert Richter
The perfctr setup calls are in the corresponding .hw_config() functions now. This makes it possible to introduce config functions for other pmu events that are not perfctr specific. Also, all of a sudden the code looks much nicer. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271190201-25705-4-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02perf, x86: Fix __initconst vs constPeter Zijlstra
All variables that have __initconst should also be const. Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02perf, x86: Fix up the ANY flag stuffPeter Zijlstra
Stephane noticed that the ANY flag was in generic arch code, and Cyrill reported that it broke the P4 code. Solve this by merging x86_pmu::raw_event into x86_pmu::hw_config and provide intel_pmu and amd_pmu specific versions of this callback. The intel_pmu one deals with the ANY flag, the amd_pmu adds the few extra event bits AMD64 has. Reported-by: Stephane Eranian <eranian@google.com> Reported-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Robert Richter <robert.richter@amd.com> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1269968113.5258.442.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02perf, x86: Undo some some *_counter* -> *_event* renamesRobert Richter
The big rename: cdd6c48 perf: Do the big rename: Performance Counters -> Performance Events accidentally renamed some members of stucts that were named after registers in the spec. To avoid confusion this patch reverts some changes. The related specs are MSR descriptions in AMD's BKDGs and the ARCHITECTURAL PERFORMANCE MONITORING section in the Intel 64 and IA-32 Architectures Software Developer's Manuals. This patch does: $ sed -i -e 's:num_events:num_counters:g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event_amd.c \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_p6.c \ arch/x86/kernel/cpu/perf_event_p4.c \ arch/x86/oprofile/op_model_ppro.c $ sed -i -e 's:event_bits:cntval_bits:g' -e 's:event_mask:cntval_mask:g' \ arch/x86/kernel/cpu/perf_event_amd.c \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_p6.c \ arch/x86/kernel/cpu/perf_event_p4.c Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1269880612-25800-2-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>