aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2011-05-22Merge commit 'v2.6.38.7' into linaro-2.6.38Nicolas Pitre
Conflicts: mm/memory.c
2011-05-21x86, mce, AMD: Fix leaving freed data in a listJulia Lawall
commit d9a5ac9ef306eb5cc874f285185a15c303c50009 upstream. b may be added to a list, but is not removed before being freed in the case of an error. This is done in the corresponding deallocation function, so the code here has been changed to follow that. The sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E,E1,E2; identifier l; @@ *list_add(&E->l,E1); ... when != E1 when != list_del(&E->l) when != list_del_init(&E->l) when != E = E2 *kfree(E);// </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Link: http://lkml.kernel.org/r/1305294731-12127-1-git-send-email-julia@diku.dk Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-21x86, apic: Fix spurious error interrupts triggering on all non-boot APsYouquan Song
commit e503f9e4b092e2349a9477a333543de8f3c7f5d9 upstream. This patch fixes a bug reported by a customer, who found that many unreasonable error interrupts reported on all non-boot CPUs (APs) during the system boot stage. According to Chapter 10 of Intel Software Developer Manual Volume 3A, Local APIC may signal an illegal vector error when an LVT entry is set as an illegal vector value (0~15) under FIXED delivery mode (bits 8-11 is 0), regardless of whether the mask bit is set or an interrupt actually happen. These errors are seen as error interrupts. The initial value of thermal LVT entries on all APs always reads 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI sequence to them and LVT registers are reset to 0s except for the mask bits which are set to 1s when APs receive INIT IPI. When the BIOS takes over the thermal throttling interrupt, the LVT thermal deliver mode should be SMI and it is required from the kernel to keep AP's LVT thermal monitoring register programmed as such as well. This issue happens when BIOS does not take over thermal throttling interrupt, AP's LVT thermal monitor register will be restored to 0x10000 which means vector 0 and fixed deliver mode, so all APs will signal illegal vector error interrupts. This patch check if interrupt delivery mode is not fixed mode before restoring AP's LVT thermal monitor register. Signed-off-by: Youquan Song <youquan.song@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yong Wang <yong.y.wang@intel.com> Cc: hpa@linux.intel.com Cc: joe@perches.com Cc: jbaron@redhat.com Cc: trenn@suse.de Cc: kent.liu@intel.com Cc: chaohong.guo@intel.com Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-21x86, AMD: Fix ARAT feature setting againBorislav Petkov
commit 14fb57dccb6e1defe9f89a66f548fcb24c374c1d upstream. Trying to enable the local APIC timer on early K8 revisions uncovers a number of other issues with it, in conjunction with the C1E enter path on AMD. Fixing those causes much more churn and troubles than the benefit of using that timer brings so don't enable it on K8 at all, falling back to the original functionality the kernel had wrt to that. Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com> Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Nick Bowler <nbowler@elliptictech.com> Cc: Joerg-Volker-Peetz <jvpeetz@web.de> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-21Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors"Borislav Petkov
commit 328935e6348c6a7cb34798a68c326f4b8372e68a upstream. This reverts commit e20a2d205c05cef6b5783df339a7d54adeb50962, as it crashes certain boxes with specific AMD CPU models. Moving the lower endpoint of the Erratum 400 check to accomodate earlier K8 revisions (A-E) opens a can of worms which is simply not worth to fix properly by tweaking the errata checking framework: * missing IntPenging MSR on revisions < CG cause #GP: http://marc.info/?l=linux-kernel&m=130541471818831 * makes earlier revisions use the LAPIC timer instead of the C1E idle routine which switches to HPET, thus not waking up in deeper C-states: http://lkml.org/lkml/2011/4/24/20 Therefore, leave the original boundary starting with K8-revF. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-21x86, hw_breakpoints: Fix racy access to ptrace breakpointsFrederic Weisbecker
commit 87dc669ba25777b67796d7262c569429e58b1ed4 upstream. While the tracer accesses ptrace breakpoints, the child task may concurrently exit due to a SIGKILL and thus release its breakpoints at the same time. We can then dereference some freed pointers. To fix this, hold a reference on the child breakpoints before manipulating them. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Will Deacon <will.deacon@arm.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Link: http://lkml.kernel.org/r/1302284067-7860-3-git-send-email-fweisbec@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-16Merge commit 'v2.6.38.6' into linaro-2.6.38Nicolas Pitre
2011-05-09x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processorsBoris Ostrovsky
commit e20a2d205c05cef6b5783df339a7d54adeb50962 upstream. Older AMD K8 processors (Revisions A-E) are affected by erratum 400 (APIC timer interrupts don't occur in C states greater than C1). This, for example, means that X86_FEATURE_ARAT flag should not be set for these parts. This addresses regression introduced by commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 ("x86, AMD: Set ARAT feature on AMD processors") where the system may become unresponsive until external interrupt (such as keyboard input) occurs. This results, for example, in time not being reported correctly, lack of progress on the system and other lockups. Reported-by: Joerg-Volker Peetz <jvpeetz@web.de> Tested-by: Joerg-Volker Peetz <jvpeetz@web.de> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Boris Ostrovsky <Boris.Ostrovsky@amd.com> Link: http://lkml.kernel.org/r/1304113663-6586-1-git-send-email-ostr@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-02Merge commit 'v2.6.38.5' into linaro-2.6.38Nicolas Pitre
2011-05-02x86, gart: Make sure GART does not map physmem above 1TBJoerg Roedel
commit 665d3e2af83c8fbd149534db8f57d82fa6fa6753 upstream. The GART can only map physical memory below 1TB. Make sure the gart driver in the kernel does not try to map memory above 1TB. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1303134346-5805-5-git-send-email-joerg.roedel@amd.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-02x86, gart: Set DISTLBWALKPRB bit alwaysJoerg Roedel
commit c34151a742d84ae65db2088ea30495063f697fbe upstream. The DISTLBWALKPRB bit must be set for the GART because the gatt table is mapped UC. But the current code does not set the bit at boot when the BIOS setup the aperture correctly. Fix that by setting this bit when enabling the GART instead of the other places. Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1303134346-5805-4-git-send-email-joerg.roedel@amd.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-25Merge commit 'v2.6.38.4' into linaro-2.6.38Nicolas Pitre
2011-04-21x86, amd: Disable GartTlbWlkErr when BIOS forgets itJoerg Roedel
commit 5bbc097d890409d8eff4e3f1d26f11a9d6b7c07e upstream. This patch disables GartTlbWlk errors on AMD Fam10h CPUs if the BIOS forgets to do is (or is just too old). Letting these errors enabled can cause a sync-flood on the CPU causing a reboot. The AMD BKDG recommends disabling GART TLB Wlk Error completely. This patch is the fix for https://bugzilla.kernel.org/show_bug.cgi?id=33012 on my machine. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/20110415131152.GJ18463@8bytes.org Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-21x86, AMD: Set ARAT feature on AMD processorsBoris Ostrovsky
commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 upstream. Support for Always Running APIC timer (ARAT) was introduced in commit db954b5898dd3ef3ef93f4144158ea8f97deb058. This feature allows us to avoid switching timers from LAPIC to something else (e.g. HPET) and go into timer broadcasts when entering deep C-states. AMD processors don't provide a CPUID bit for that feature but they also keep APIC timers running in deep C-states (except for cases when the processor is affected by erratum 400). Therefore we should set ARAT feature bit on AMD CPUs. Tested-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Mark Langsdorf <mark.langsdorf@amd.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> LKML-Reference: <1300205624-4813-1-git-send-email-ostr@amd64.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14Merge commit 'v2.6.38.3' into linaro-2.6.38Nicolas Pitre
2011-04-14Revert "x86: Cleanup highmap after brk is concluded"Greg Kroah-Hartman
This reverts upstream commit e5f15b45ddf3afa2bbbb10c7ea34fb32b6de0a0e It caused problems in the stable tree and should not have been there. Cc: Yinghai Lu <yinghai@kernel.org> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14x86, mtrr, pat: Fix one cpu getting out of sync during resumeSuresh Siddha
commit 84ac7cdbdd0f04df6b96153f7a79127fd6e45467 upstream. On laptops with core i5/i7, there were reports that after resume graphics workloads were performing poorly on a specific AP, while the other cpu's were ok. This was observed on a 32bit kernel specifically. Debug showed that the PAT init was not happening on that AP during resume and hence it contributing to the poor workload performance on that cpu. On this system, resume flow looked like this: 1. BP starts the resume sequence and we reinit BP's MTRR's/PAT early on using mtrr_bp_restore() 2. Resume sequence brings all AP's online 3. Resume sequence now kicks off the MTRR reinit on all the AP's. 4. For some reason, between point 2 and 3, we moved from BP to one of the AP's. My guess is that printk() during resume sequence is contributing to this. We don't see similar behavior with the 64bit kernel but there is no guarantee that at this point the remaining resume sequence (after AP's bringup) has to happen on BP. 5. set_mtrr() was assuming that we are still on BP and skipped the MTRR/PAT init on that cpu (because of 1 above) 6. But we were on an AP and this led to not reprogramming PAT on this cpu leading to bad performance. Fix this by doing unconditional mtrr_if->set_all() in set_mtrr() during MTRR/PAT init. This might be unnecessary if we are still running on BP. But it is of no harm and will guarantee that after resume, all the cpu's will be in sync with respect to the MTRR/PAT registers. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1301438292-28370-1-git-send-email-eric@anholt.net> Signed-off-by: Eric Anholt <eric@anholt.net> Tested-by: Keith Packard <keithp@keithp.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-31Merge remote branch 'lttng/2.6.38-lttng-0.247'Avik Sil
Conflicts: arch/arm/kernel/traps.c arch/arm/mach-omap2/clock34xx.c arch/arm/mach-omap2/pm34xx.c
2011-03-27x86: Cleanup highmap after brk is concludedYinghai Lu
commit e5f15b45ddf3afa2bbbb10c7ea34fb32b6de0a0e upstream. Now cleanup_highmap actually is in two steps: one is early in head64.c and only clears above _end; a second one is in init_memory_mapping() and tries to clean from _brk_end to _end. It should check if those boundaries are PMD_SIZE aligned but currently does not. Also init_memory_mapping() is called several times for numa or memory hotplug, so we really should not handle initial kernel mappings there. This patch moves cleanup_highmap() down after _brk_end is settled so we can do everything in one step. Also we honor max_pfn_mapped in the implementation of cleanup_highmap. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-27x86: Fix binutils-2.21 symbol related build failuresSedat Dilek
commit 2ae9d293b14d17f35eff624272cfecac7979a2ee upstream. [only 1/2 of the upstream commit was needed for stable - gkh] New binutils version 2.21.0.20110302-1 started checking that the symbol parameter to the .size directive matches the entry name's symbol parameter, unearthing two mismatches: AS arch/x86/kernel/acpi/wakeup_rm.o arch/x86/kernel/acpi/wakeup_rm.S: Assembler messages: arch/x86/kernel/acpi/wakeup_rm.S:12: Error: .size expression with symbol `wakeup_code_start' does not evaluate to a constant arch/x86/kernel/entry_32.S: Assembler messages: arch/x86/kernel/entry_32.S:1421: Error: .size expression with symbol `apf_page_fault' does not evaluate to a constant The problem was discovered while using Debian's binutils (2.21.0.20110302-1) and experimenting with binutils from upstream. Thanks Alexander and H.J. for the vital help. Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1299620364-21644-1-git-send-email-sedat.dilek@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-23x86, binutils, xen: Fix another wrong size directiveAlexander van Heukelum
commit 371c394af27ab7d1e58a66bc19d9f1f3ac1f67b4 upstream. The latest binutils (2.21.0.20110302/Ubuntu) breaks the build yet another time, under CONFIG_XEN=y due to a .size directive that refers to a slightly differently named (hence, to the now very strict and unforgiving assembler, non-existent) symbol. [ mingo: This unnecessary build breakage caused by new binutils version 2.21 gets escallated back several kernel releases spanning several years of Linux history, affecting over 130,000 upstream kernel commits (!), on CONFIG_XEN=y 64-bit kernels (i.e. essentially affecting all major Linux distro kernel configs). Git annotate tells us that this slight debug symbol code mismatch bug has been introduced in 2008 in commit 3d75e1b8: 3d75e1b8 (Jeremy Fitzhardinge 2008-07-08 15:06:49 -0700 1231) ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs) The 'bug' is just a slight assymetry in ENTRY()/END() debug-symbols sequences, with lots of assembly code between the ENTRY() and the END(): ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs) ... END(do_hypervisor_callback) Human reviewers almost never catch such small mismatches, and binutils never even warned about it either. This new binutils version thus breaks the Xen build on all upstream kernels since v2.6.27, out of the blue. This makes a straightforward Git bisection of all 64-bit Xen-enabled kernels impossible on such binutils, for a bisection window of over hundred thousand historic commits. (!) This is a major fail on the side of binutils and binutils needs to turn this show-stopper build failure into a warning ASAP. ] Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jan Beulich <jbeulich@novell.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1299877178-26063-1-git-send-email-heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-23x86: stop_machine_text_poke() should issue sync_core()Mathieu Desnoyers
commit 0e00f7aed6af21fc09b2a94d28bc34e449bd3a53 upstream. Intel Archiecture Software Developer's Manual section 7.1.3 specifies that a core serializing instruction such as "cpuid" should be executed on _each_ core before the new instruction is made visible. Failure to do so can lead to unspecified behavior (Intel XMC erratas include General Protection Fault in the list), so we should avoid this at all cost. This problem can affect modified code executed by interrupt handlers after interrupt are re-enabled at the end of stop_machine, because no core serializing instruction is executed between the code modification and the moment interrupts are reenabled. Because stop_machine_text_poke performs the text modification from the first CPU decrementing stop_machine_first, modified code executed in thread context is also affected by this problem. To explain why, we have to split the CPUs in two categories: the CPU that initiates the text modification (calls text_poke_smp) and all the others. The scheduler, executed on all other CPUs after stop_machine, issues an "iret" core serializing instruction, and therefore handles core serialization for all these CPUs. However, the text modification initiator can continue its execution on the same thread and access the modified text without any scheduler call. Given that the CPU that initiates the code modification is not guaranteed to be the one actually performing the code modification, it falls into the XMC errata. Q: Isn't this executed from an IPI handler, which will return with IRET (a serializing instruction) anyway? A: No, now stop_machine uses per-cpu workqueue, so that handler will be executed from worker threads. There is no iret anymore. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20110303160137.GB1590@Krystal> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-23x86, quirk: Fix SB600 revision checkAndreas Herrmann
commit 1d3e09a304e6c4e004ca06356578b171e8735d3c upstream. Commit 7f74f8f28a2bd9db9404f7d364e2097a0c42cc12 (x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systems) introduced a regression. It removed some SB600 specific code to determine the revision ID without adapting a corresponding revision ID check for SB600. See this mail thread: http://marc.info/?l=linux-kernel&m=129980296006380&w=2 This patch adapts the corresponding check to cover all SB600 revisions. Tested-by: Wang Lei <f3d27b@gmail.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20110315143137.GD29499@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-23x86: Emit "mem=nopentium ignored" warning when not supportedKamal Mostafa
commit 9a6d44b9adb777ca9549e88cd55bd8f2673c52a2 upstream. Emit warning when "mem=nopentium" is specified on any arch other than x86_32 (the only that arch supports it). Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1296783486-23033-2-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-23x86: Fix panic when handling "mem={invalid}" paramKamal Mostafa
commit 77eed821accf5dd962b1f13bed0680e217e49112 upstream. Avoid removing all of memory and panicing when "mem={invalid}" is specified, e.g. mem=blahblah, mem=0, or mem=nopentium (on platforms other than x86_32). Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1296783486-23033-1-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-16trace-clock-userspaceMathieu Desnoyers
TRACE_CLOCK and TRACE_CLOCK_FREQ in clock_gettime These new options to clock_gettime allows the user to retreive the TSC frequency and the current TSC from userspace. We use the LTTng infrastructure to make sure the TSC is synchronized. If it is not, we fallback to a syscall (which for the moment does the same thing but in the future will be modified to ensure consistency for the tracing between user and kernel space). The main difference with using the TSC clocksource directly is that the time starts at machine boot and not at Linux boot which makes it possible to correlate user and kernelspace events. Also we export frequency and cycles, we don't do the conversion in sec.nsec from the kernel since we don't need it. The differences between the v1 are : - we validated on 32 bits the clock_gettime vDSO doesn't exist so it cleans up the vDSO code; - the syscall is now properly defined using the posix timer architecture - we export the frequency to userspace so we don't need to convert the cycles in sec.nsec anymore. Which means that on 64 bits machine, the nsec field will contain the whole cycle counter and on 32 bits the value is split between the two fields sec and nsec. - remove the rdtsc_barrier() which is overkill for tracing purpose - trace_clock_is_sync field is updated as soon as the LTTng trace clock detects an inconsistency Updated benchmarks (with 20000000 iterations reading the tsc before and after each call on an i7 920): 64 bits with vDSO average cycles for clock_realtime: 101 average cycles for clock_monotonic: 104 average cycles for clock_trace: 52 64 bits without vDSO (using syscall) average cycles for clock_realtime: 240 average cycles for clock_monotonic: 256 average cycles for clock_trace: 219 32 bits (without vDSO) average cycles for clock_realtime: 649 average cycles for clock_monotonic: 661 average cycles for clock_trace: 616 Signed-off-by: Julien Desfossez <julien.desfossez@polymtl.ca> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2011-03-16trace-clock-remove-extra-barriers-on-x86Mathieu Desnoyers
trace clock remove extra barriers on x86 Given that a tracer cannot realistically provide accuracy better than the inaccuracy between the traced action (e.g. an atomic operation) and the timestamp read, having barriers around the timestamp read is just overkill. This will speed up tracing. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2011-03-16trace-clock-get-may-failMathieu Desnoyers
Trace clock get may fail ARM pmu reservation may fail, so we have to change the trace clock get prototype. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2011-03-16x86-trace-clock-use-mod-timerMathieu Desnoyers
x86 trace clock use del timer sync It is incorrect to call add_timer_on from the timer handler. This fixes this issue. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2011-03-16idle-notifier-x86_32-fix-export-symbolMathieu Desnoyers
idle notifier x86_32 fix export symbol Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
2011-03-16idle-notifier-x86_32-fixMathieu Desnoyers
idle notifier x86_32 fix - Comment cleanup - Fix apm.c for 32-bit. Need to include asm/idle.h and need __exit_idle symbol. Therefore, x86 64 __exit_idle must become non-static too. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
2011-03-16lttng-instrumentation-idle-x86_32Mathieu Desnoyers
lttng instrumentation idle x86_32 Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2011-03-16idle-notifier-x86_32Mathieu Desnoyers
idle notifier standardization x86_32 Add idle notifier callback to x86_32. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2011-03-16idle-notifier-standardizeMathieu Desnoyers
2011-03-16lttng-pm-idle-instrumentationMathieu Desnoyers
LTTng pm_idle instrumentation, arch-specific Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
2011-03-16x86-trace-clock-use-spinlockMathieu Desnoyers
x86 trace-clock use spinlock CPU hotplug notifiers should use spinlocks, not mutexes. They are not sleepable. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
2011-03-16markers-update-to-channel-apiMathieu Desnoyers
Markers update to new channel API Update marker users accordingly. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
2011-03-16lttng-statedump/lttng-statedump-x86Mathieu Desnoyers
lttng statedump x86 Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
2011-03-16lttng-instrumentation/lttng-instrumentation-x86Mathieu Desnoyers
LTTng - x86 instrumentation Make x86 support TIF_SYSCALL_TRACE async flag set in entry.S When the flag is inactive upon syscall entry and concurrently activated before exit, we seem to reach a state where the top of stack is incorrect upon return to user space. Fix this by fixing the top of stack and jumping to int_ret_from_sys_call if we detect that thread flags has been modified. We make sure that the thread flag read is coherent between our new test and the ALLWORK_MASK test by first saving it in a register used for both comparisons. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: H. Peter Anvin <hpa@zytor.com>
2011-03-16lttng-instrumentation/lttng-kernel-trace-thread-flag-x86Mathieu Desnoyers
LTTng Linux Kernel Trace Thread Flag x86 Add a thread flag to activate system-wide syscall tracing. Make x86 support TIF_SYSCALL_TRACE async flag set in entry_32.S/entry_64.S. x86_64 : When the flag is inactive upon syscall entry and concurrently activated before exit, we seem to reach a state where the top of stack is incorrect upon return to user space. Fix this by fixing the top of stack and jumping to int_ret_from_sys_call if we detect that thread flags has been modified. We make sure that the thread flag read is coherent between our new test and the ALLWORK_MASK test by first saving it in a register used for both comparisons. Note : Removed : # perform syscall exit tracing ALIGN syscall_exit_work: - testb $_TIF_WORK_SYSCALL_EXIT, %cl jz work_pending TRACE_IRQS_ON ENABLE_INTERRUPTS(CLBR_ANY) # could let syscall_trace_leave() call
2011-03-16lttng-core/lttng-core-x86Mathieu Desnoyers
LTTng core x86 Adds OOPS printing indication of nesting within tracer code. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: H. Peter Anvin <hpa@zytor.com>
2011-03-16trace-clock/x86-trace-clockMathieu Desnoyers
x86 trace clock X86 trace clock. Depends on tsc_sync to detect if timestamp counters are synchronized on the machine. I am leaving this poorly scalable solution for now as this is the simplest, yet working, solution I found (compared to using the HPET which also scales very poorly, probably due to bus contention). This should be a good start and let us trace a good amount of machines out there. A "Big Fat" (TM) warning is shown on the console when the trace clock is used on systems without synchronized TSCs to tell the user to - use force_tsc_sync=1 - use idle=poll - disable Powernow or Speedstep In order to get accurate and fast timestamps. This keeps room for further improvement in a second phase. Changelog: - freq_scale is not used as a divisor rather than multiplier to support systems with frequency < 1HZ. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Steven Rostedt <rostedt@goodmis.org>
2011-03-16trace-clock/x86-remove-arch-specific-tsc_syncMathieu Desnoyers
x86 : remove arch-specific tsc_sync.c Depends on the new arch. independent kernel/time/tsc-sync.c Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Steven Rostedt <rostedt@goodmis.org>
2011-03-16nmi-safe-kernel/x86-nmi-safe-int3-and-page-faultMathieu Desnoyers
x86 NMI-safe INT3 and Page Fault Implements an alternative iret with popf and return so trap and exception handlers can return to the NMI handler without issuing iret. iret would cause NMIs to be reenabled prematurely. x86_32 uses popf and far return. x86_64 has to copy the return instruction pointer to the top of the previous stack, issue a popf, loads the previous esp and issue a near return (ret). It allows placing immediate values (and therefore optimized trace_marks) in NMI code since returning from a breakpoint would be valid. Accessing vmalloc'd memory, which allows executing module code or accessing vmapped or vmalloc'd areas from NMI context, would also be valid. This is very useful to tracers like LTTng. This patch makes all faults, traps and exception safe to be called from NMI context *except* single-stepping, which requires iret to restore the TF (trap flag) and jump to the return address in a single instruction. Sorry, no kprobes support in NMI handlers because of this limitation. We cannot single-step an NMI handler, because iret must set the TF flag and return back to the instruction to single-step in a single instruction. This cannot be emulated with popf/lret, because lret would be single-stepped. It does not apply to immediate values because they do not use single-stepping. This code detects if the TF flag is set and uses the iret path for single-stepping, even if it reactivates NMIs prematurely. Test to detect if nested under a NMI handler is only done upon the return from trap/exception to kernel, which is not frequent. Other return paths (return from trap/exception to userspace, return from interrupt) keep the exact same behavior (no slowdown). alpha and avr32 use the active count bit 31. This patch moves them to 28. TODO : test alpha and avr32 active count modification TODO : test with lguest, xen, kvm. ** This patch depends on the "Stringify support commas" patchset ** ** Also depends on fix-x86_64-page-fault-scheduler-race patch ** tested on x86_32 (tests implemented in a separate patch) : - instrumented the return path to export the EIP, CS and EFLAGS values when taken so we know the return path code has been executed. - trace_mark, using immediate values, with 10ms delay with the breakpoint activated. Runs well through the return path. - tested vmalloc faults in NMI handler by placing a non-optimized marker in the NMI handler (so no breakpoint is executed) and connecting a probe which touches every pages of a 20MB vmalloc'd buffer. It executes trough the return path without problem. - Tested with and without preemption tested on x86_64 - instrumented the return path to export the EIP, CS and EFLAGS values when taken so we know the return path code has been executed. - trace_mark, using immediate values, with 10ms delay with the breakpoint activated. Runs well through the return path. To test on x86_64 : - Test without preemption - Test vmalloc faults - Test on Intel 64 bits CPUs. (AMD64 was fine) Changelog since v1 : - x86_64 fixes. Changelog since v2 : - fix paravirt build Changelog since v3 : - Include modifications suggested by Jeremy Changelog since v4 : - including hardirq.h in entry_32/64.S is a bad idea (non ifndef'd C code), define NMI_MASK in the .S files directly. Changelog since v5 : - Add NMI_MASK to irq_count() and make die() more verbose for NMIs. Changelog since v7 : - Implement paravirtualized nmi_return. Changelog since v8 : - refreshed the patch for asm-offsets. Those were left out of v8. - now depends on "Stringify support commas" patch. Changelog since v9 : - Only test the nmi nested preempt count flag upon return from exceptions, not on return from interrupts. Only the kernel return path has this test. - Add Xen, VMI, lguest support. Use their iret pavavirt ops in lieu of nmi_return. - update for 2.6.30-rc1 Follow NMI_MASK bits merged in mainline. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: akpm@osdl.org CC: mingo@elte.hu CC: "H. Peter Anvin" <hpa@zytor.com> CC: Jeremy Fitzhardinge <jeremy@goop.org> CC: Steven Rostedt <rostedt@goodmis.org> CC: "Frank Ch. Eigler" <fche@redhat.com>
2011-03-14Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: ce4100: Set pci ops via callback instead of module init x86/mm: Fix pgd_lock deadlock x86/mm: Handle mm_fault_error() in kernel space x86: Don't check for BIOS corruption in first 64K when there's no need to
2011-03-09[CPUFREQ] pcc-cpufreq: don't load driver if get_freq fails during init.Naga Chumbalkar
Return 0 on failure. This will cause the initialization of the driver to fail and prevent the driver from loading if the BIOS cannot handle the PCC interface command to "get frequency". Otherwise, the driver will load and display a very high value like "4294967274" (which is actually -EINVAL) for frequency: # cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq 4294967274 Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> CC: stable@kernel.org Signed-off-by: Dave Jones <davej@redhat.com>
2011-03-09x86: Don't check for BIOS corruption in first 64K when there's no need toNaga Chumbalkar
Due to commit 781c5a67f152c17c3e4a9ed9647f8c0be6ea5ae9 it is likely that the number of areas to scan for BIOS corruption is 0 -- especially when the first 64K is already reserved (X86_RESERVE_LOW is 64K by default). If that's the case then don't set up the scan. Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: <stable@kernel.org> LKML-Reference: <20110225202838.2229.71011.sendpatchset@nchumbalkar.americas.hpqcorp.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-01[CPUFREQ] p4-clockmod: print EST-capable warning message only onceNaga Chumbalkar
Print the message only once. I see it 16 times on a 2P box with 16 logical CPUs. Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
2011-03-01[CPUFREQ] Fix another notifier leak in powernow-k8.Dave Jones
Do the notifier registration later, so we don't have to worry about freeing it if we fail the msr allocation. Signed-off-by: Dave Jones <davej@redhat.com>
2011-03-01[CPUFREQ] Missing "unregister_cpu_notifier" in powernow-k8.cNeil Brown
It appears that when powernow-k8 finds that No compatible ACPI _PSS objects found. and suggests Try again with latest BIOS. it fails the module load, but does not unregister the cpu_notifier that was registered in powernowk8_init This ends up leaving freed memory on the cpu notifier list for some other poor module (e.g. md/raid5) to come along and trip over. The following might be a partial fix, but I suspect there is probably other clean-up that is needed. ( https://bugzilla.novell.com/show_bug.cgi?id=655215 has full dmesg traces). Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Neil Brown <neilb@suse.de>