path: root/arch/x86/include
Age | Commit message | Author
2011-05-22 | Merge commit 'v2.6.38.7' into linaro-2.6.38 | Nicolas Pitre
Conflicts: mm/memory.c
2011-05-21 | x86: Fix UV BAU for non-consecutive nasids | Cliff Wickman
commit 77ed23f8d995a01cd8101d84351b567bf5177a30 upstream. This is a fix for the SGI Altix-UV Broadcast Assist Unit code, which is used for TLB flushing. Certain hardware configurations (that customers are ordering) cause nasids (NUMA address space IDs) to be non-consecutive. Specifically, once you have more than 4 blades in an IRU (Individual Rack Unit, or 1/2 rack) but fewer than the maximum of 16, the nasid numbering becomes non-consecutive. This currently results in a 'catastrophic error' (CATERR) detected by the firmware during OS boot. The BAU is generating an 'INTD' request that is targeting a non-existent nasid value. Such configurations may also occur when a blade is configured off because of hardware errors. (There is one UV hub per blade.) This patch is required to support such configurations. The problem with the tlb_uv.c code is that it is using the consecutive hub numbers as indices into the BAU distribution bit map. These are simply the ordinal position of the hub or blade within its partition. It should be using physical node numbers (pnodes), which correspond to the physical nasid values. Use of the hub number only works as long as the nasids in the partition are consecutive and increase with a stride of 1. This patch changes the index to be the pnode number, thus allowing nasids to be non-consecutive. It also provides a table in local memory for each cpu to translate target cpu number to target pnode and nasid. And it improves naming to properly reflect 'node' and 'uvhub' versus 'nasid'. Signed-off-by: Cliff Wickman <cpw@sgi.com> Link: http://lkml.kernel.org/r/E1QJmxX-0002Mz-Fk@eag09.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
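The indexing change described above can be pictured with a small userspace sketch. This is not the tlb_uv.c code: the structure, the 64-bit map width, the function name, and the choice to index relative to the partition's lowest pnode are all assumptions made for illustration. It only shows why indexing by pnode keeps working when pnodes have gaps.

```c
#include <stdint.h>

#define MAP_BITS 64 /* hypothetical width of the distribution bit map */

/* Hypothetical per-cpu translation entry: target cpu -> pnode and nasid. */
struct cpu_target {
    uint16_t pnode;
    uint16_t nasid;
};

/*
 * Mark the hub that owns `cpu` in the distribution bit map.
 * The bit index is the pnode (relative to an assumed base pnode of the
 * partition), not the ordinal hub number, so gaps in the numbering
 * still select the right hardware target.
 */
static int distribution_set(uint64_t *bitmap, const struct cpu_target *table,
                            int cpu, uint16_t base_pnode)
{
    uint16_t pnode = table[cpu].pnode;

    if (pnode < base_pnode || pnode - base_pnode >= MAP_BITS)
        return -1; /* target does not fit in this map */

    *bitmap |= (uint64_t)1 << (pnode - base_pnode);
    return 0;
}
```

With three hubs at pnodes 0, 1 and 4, the third hub has ordinal position 2 but sets bit 4, which is the position the hardware would expect under this assumption.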
2011-05-21 | x86, apic: Fix spurious error interrupts triggering on all non-boot APs | Youquan Song
commit e503f9e4b092e2349a9477a333543de8f3c7f5d9 upstream. This patch fixes a bug reported by a customer, who found many spurious error interrupts reported on all non-boot CPUs (APs) during the system boot stage. According to Chapter 10 of the Intel Software Developer Manual Volume 3A, the Local APIC may signal an illegal vector error when an LVT entry is set to an illegal vector value (0~15) under FIXED delivery mode (bits 8-11 are 0), regardless of whether the mask bit is set or an interrupt actually happens. These errors are seen as error interrupts. The initial value of the thermal LVT entries on all APs always reads 0x10000, because APs are woken up by the BSP issuing an INIT-SIPI-SIPI sequence to them, and LVT registers are reset to 0s, except for the mask bits which are set to 1s, when APs receive the INIT IPI. When the BIOS takes over the thermal throttling interrupt, the LVT thermal delivery mode should be SMI, and the kernel is required to keep the AP's LVT thermal monitoring register programmed as such as well. The issue arises when the BIOS does not take over the thermal throttling interrupt: the AP's LVT thermal monitor register is then restored to 0x10000, which means vector 0 and fixed delivery mode, so all APs signal illegal vector error interrupts. This patch checks that the interrupt delivery mode is not fixed mode before restoring the AP's LVT thermal monitor register. Signed-off-by: Youquan Song <youquan.song@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yong Wang <yong.y.wang@intel.com> Cc: hpa@linux.intel.com Cc: joe@perches.com Cc: jbaron@redhat.com Cc: trenn@suse.de Cc: kent.liu@intel.com Cc: chaohong.guo@intel.com Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
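The check described in this message can be sketched as two small predicates. The macro and function names are hypothetical, and the field layout (vector in bits 0-7, delivery mode in bits 8-10, mask in bit 16) is an assumption taken from the usual LVT register layout rather than from the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed LVT layout; names are illustrative, not the kernel's. */
#define LVT_VECTOR_MASK    0x000000ffu /* bits 0-7: vector */
#define LVT_DELIVERY_MASK  0x00000700u /* bits 8-10: delivery mode */
#define LVT_DELIVERY_FIXED 0x00000000u
#define LVT_DELIVERY_SMI   0x00000200u
#define LVT_MASKED         0x00010000u /* bit 16: mask */

/* True when writing this value would trigger an illegal vector error. */
static bool lvt_is_illegal_fixed_vector(uint32_t lvt)
{
    return (lvt & LVT_DELIVERY_MASK) == LVT_DELIVERY_FIXED &&
           (lvt & LVT_VECTOR_MASK) < 16;
}

/* Restore the saved value only if its delivery mode is not fixed. */
static bool lvt_should_restore(uint32_t saved_lvt)
{
    return (saved_lvt & LVT_DELIVERY_MASK) != LVT_DELIVERY_FIXED;
}
```

Under these assumptions the reset value 0x10000 is masked, fixed mode, vector 0, so it is not restored, while a BIOS-programmed SMI value such as 0x10200 is.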
2011-05-02 | Merge commit 'v2.6.38.5' into linaro-2.6.38 | Nicolas Pitre
2011-05-02 | x86, gart: Set DISTLBWALKPRB bit always | Joerg Roedel
commit c34151a742d84ae65db2088ea30495063f697fbe upstream. The DISTLBWALKPRB bit must be set for the GART because the gatt table is mapped UC. But the current code does not set the bit at boot when the BIOS has set up the aperture correctly. Fix that by setting this bit when enabling the GART instead of in the other places. Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1303134346-5805-4-git-send-email-joerg.roedel@amd.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-25 | Merge commit 'v2.6.38.4' into linaro-2.6.38 | Nicolas Pitre
2011-04-25 | Merge commit '07d5eca' into linaro-2.6.38 | Nicolas Pitre
2011-04-21 | x86, amd: Disable GartTlbWlkErr when BIOS forgets it | Joerg Roedel
commit 5bbc097d890409d8eff4e3f1d26f11a9d6b7c07e upstream. This patch disables GartTlbWlk errors on AMD Fam10h CPUs if the BIOS forgets to do it (or is just too old). Leaving these errors enabled can cause a sync-flood on the CPU, causing a reboot. The AMD BKDG recommends disabling GART TLB Wlk Error completely. This patch is the fix for https://bugzilla.kernel.org/show_bug.cgi?id=33012 on my machine. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/20110415131152.GJ18463@8bytes.org Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-31 | Merge remote branch 'lttng/2.6.38-lttng-0.247' | Avik Sil
Conflicts: arch/arm/kernel/traps.c arch/arm/mach-omap2/clock34xx.c arch/arm/mach-omap2/pm34xx.c
2011-03-23 | x86: Flush TLB if PGD entry is changed in i386 PAE mode | Shaohua Li
commit 4981d01eada5354d81c8929d5b2836829ba3df7b upstream. According to the Intel CPU manual, every time a PGD entry is changed in i386 PAE mode, we need to do a full TLB flush. The current code follows this, and there is a comment about it in the code. But the current code misses the multi-threaded case: a changed page table might be used by several CPUs, and every such CPU should flush its TLB. Usually this isn't a problem, because we prepopulate all PGD entries at process fork. But when the process does an munmap followed by a new mmap, this issue will be triggered. When it happens, some CPUs keep doing page faults: http://marc.info/?l=linux-kernel&m=129915020508238&w=2 Reported-by: Yasunori Goto <y-goto@jp.fujitsu.com> Tested-by: Yasunori Goto <y-goto@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Mallick Asit K <asit.k.mallick@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm <linux-mm@kvack.org> LKML-Reference: <1300246649.2337.95.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-16 | trace-clock-userspace | Mathieu Desnoyers
TRACE_CLOCK and TRACE_CLOCK_FREQ in clock_gettime
These new options to clock_gettime allow the user to retrieve the TSC frequency and the current TSC from userspace. We use the LTTng infrastructure to make sure the TSC is synchronized. If it is not, we fall back to a syscall (which for the moment does the same thing but in the future will be modified to ensure consistency for the tracing between user and kernel space). The main difference from using the TSC clocksource directly is that the time starts at machine boot and not at Linux boot, which makes it possible to correlate user and kernelspace events. Also, we export frequency and cycles; we don't do the conversion to sec.nsec in the kernel since we don't need it.
The differences from v1 are:
- we validated that on 32 bits the clock_gettime vDSO doesn't exist, so this cleans up the vDSO code;
- the syscall is now properly defined using the posix timer architecture;
- we export the frequency to userspace so we don't need to convert the cycles to sec.nsec anymore. This means that on 64-bit machines the nsec field will contain the whole cycle counter, and on 32 bits the value is split between the two fields sec and nsec;
- remove the rdtsc_barrier(), which is overkill for tracing purposes;
- the trace_clock_is_sync field is updated as soon as the LTTng trace clock detects an inconsistency.
Updated benchmarks (with 20000000 iterations reading the tsc before and after each call on an i7 920):
64 bits with vDSO:
  average cycles for clock_realtime: 101
  average cycles for clock_monotonic: 104
  average cycles for clock_trace: 52
64 bits without vDSO (using syscall):
  average cycles for clock_realtime: 240
  average cycles for clock_monotonic: 256
  average cycles for clock_trace: 219
32 bits (without vDSO):
  average cycles for clock_realtime: 649
  average cycles for clock_monotonic: 661
  average cycles for clock_trace: 616
Signed-off-by: Julien Desfossez <julien.desfossez@polymtl.ca> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2011-03-16 | trace-clock-remove-extra-barriers-on-x86 | Mathieu Desnoyers
trace clock remove extra barriers on x86 Given that a tracer cannot realistically provide accuracy better than the inaccuracy between the traced action (e.g. an atomic operation) and the timestamp read, having barriers around the timestamp read is just overkill. This will speed up tracing. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2011-03-16 | trace-clock-get-may-fail | Mathieu Desnoyers
Trace clock get may fail ARM pmu reservation may fail, so we have to change the trace clock get prototype. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2011-03-16 | Move KVM trace includes in a standard directory | Mathieu Desnoyers
The mmutrace.h and trace.h were defined in arch/x86/kvm/. Moved them in arch/x86/include and fixed the dependencies to make inclusion possible by lttng-modules. From: Julien Desfossez <julien.desfossez@polymtl.ca> Signed-off-by: Julien Desfossez <julien.desfossez@polymtl.ca> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2011-03-16 | idle-notifier-x86_32-fix-export-symbol | Mathieu Desnoyers
idle notifier x86_32 fix export symbol Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
2011-03-16 | idle-notifier-x86_32-fix | Mathieu Desnoyers
idle notifier x86_32 fix - Comment cleanup - Fix apm.c for 32-bit. Need to include asm/idle.h and need __exit_idle symbol. Therefore, x86 64 __exit_idle must become non-static too. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
2011-03-16 | idle-notifier-x86_32 | Mathieu Desnoyers
idle notifier standardization x86_32 Add idle notifier callback to x86_32. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2011-03-16 | idle-notifier-standardize | Mathieu Desnoyers
2011-03-16 | lttng-instrumentation/lttng-kernel-trace-thread-flag-x86 | Mathieu Desnoyers
LTTng Linux Kernel Trace Thread Flag x86 Add a thread flag to activate system-wide syscall tracing. Make x86 support TIF_SYSCALL_TRACE async flag set in entry_32.S/entry_64.S. x86_64 : When the flag is inactive upon syscall entry and concurrently activated before exit, we seem to reach a state where the top of stack is incorrect upon return to user space. Fix this by fixing the top of stack and jumping to int_ret_from_sys_call if we detect that thread flags has been modified. We make sure that the thread flag read is coherent between our new test and the ALLWORK_MASK test by first saving it in a register used for both comparisons. Note : Removed : # perform syscall exit tracing ALIGN syscall_exit_work: - testb $_TIF_WORK_SYSCALL_EXIT, %cl jz work_pending TRACE_IRQS_ON ENABLE_INTERRUPTS(CLBR_ANY) # could let syscall_trace_leave() call
2011-03-16 | trace-clock/x86-trace-clock | Mathieu Desnoyers
x86 trace clock. Depends on tsc_sync to detect if timestamp counters are synchronized on the machine. I am leaving this poorly scalable solution for now as this is the simplest, yet working, solution I found (compared to using the HPET, which also scales very poorly, probably due to bus contention). This should be a good start and let us trace a good number of machines out there. A "Big Fat" (TM) warning is shown on the console when the trace clock is used on systems without synchronized TSCs, telling the user to do the following in order to get accurate and fast timestamps:
- use force_tsc_sync=1
- use idle=poll
- disable Powernow or Speedstep
This keeps room for further improvement in a second phase.
Changelog:
- freq_scale is now used as a divisor rather than a multiplier to support systems with frequency < 1HZ.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Steven Rostedt <rostedt@goodmis.org>
2011-03-16 | trace-clock/x86-remove-arch-specific-tsc_sync | Mathieu Desnoyers
x86 : remove arch-specific tsc_sync.c Depends on the new arch. independent kernel/time/tsc-sync.c Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Steven Rostedt <rostedt@goodmis.org>
2011-03-16 | trace-clock/get-cycles-x86-have-get-cycles | Mathieu Desnoyers
get_cycles() : x86 HAVE_GET_CYCLES This patch selects HAVE_GET_CYCLES and makes sure get_cycles_barrier() and get_cycles_rate() are implemented. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: David Miller <davem@davemloft.net> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Andrew Morton <akpm@linux-foundation.org> CC: Ingo Molnar <mingo@redhat.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Thomas Gleixner <tglx@linutronix.de> CC: Steven Rostedt <rostedt@goodmis.org> CC: linux-arch@vger.kernel.org
2011-03-16 | nmi-safe-kernel/x86-nmi-safe-int3-and-page-fault | Mathieu Desnoyers
x86 NMI-safe INT3 and Page Fault Implements an alternative iret with popf and return so trap and exception handlers can return to the NMI handler without issuing iret. iret would cause NMIs to be reenabled prematurely. x86_32 uses popf and far return. x86_64 has to copy the return instruction pointer to the top of the previous stack, issue a popf, loads the previous esp and issue a near return (ret). It allows placing immediate values (and therefore optimized trace_marks) in NMI code since returning from a breakpoint would be valid. Accessing vmalloc'd memory, which allows executing module code or accessing vmapped or vmalloc'd areas from NMI context, would also be valid. This is very useful to tracers like LTTng. This patch makes all faults, traps and exception safe to be called from NMI context *except* single-stepping, which requires iret to restore the TF (trap flag) and jump to the return address in a single instruction. Sorry, no kprobes support in NMI handlers because of this limitation. We cannot single-step an NMI handler, because iret must set the TF flag and return back to the instruction to single-step in a single instruction. This cannot be emulated with popf/lret, because lret would be single-stepped. It does not apply to immediate values because they do not use single-stepping. This code detects if the TF flag is set and uses the iret path for single-stepping, even if it reactivates NMIs prematurely. Test to detect if nested under a NMI handler is only done upon the return from trap/exception to kernel, which is not frequent. Other return paths (return from trap/exception to userspace, return from interrupt) keep the exact same behavior (no slowdown). alpha and avr32 use the active count bit 31. This patch moves them to 28. TODO : test alpha and avr32 active count modification TODO : test with lguest, xen, kvm. 
** This patch depends on the "Stringify support commas" patchset ** ** Also depends on fix-x86_64-page-fault-scheduler-race patch ** tested on x86_32 (tests implemented in a separate patch) : - instrumented the return path to export the EIP, CS and EFLAGS values when taken so we know the return path code has been executed. - trace_mark, using immediate values, with 10ms delay with the breakpoint activated. Runs well through the return path. - tested vmalloc faults in NMI handler by placing a non-optimized marker in the NMI handler (so no breakpoint is executed) and connecting a probe which touches every pages of a 20MB vmalloc'd buffer. It executes trough the return path without problem. - Tested with and without preemption tested on x86_64 - instrumented the return path to export the EIP, CS and EFLAGS values when taken so we know the return path code has been executed. - trace_mark, using immediate values, with 10ms delay with the breakpoint activated. Runs well through the return path. To test on x86_64 : - Test without preemption - Test vmalloc faults - Test on Intel 64 bits CPUs. (AMD64 was fine) Changelog since v1 : - x86_64 fixes. Changelog since v2 : - fix paravirt build Changelog since v3 : - Include modifications suggested by Jeremy Changelog since v4 : - including hardirq.h in entry_32/64.S is a bad idea (non ifndef'd C code), define NMI_MASK in the .S files directly. Changelog since v5 : - Add NMI_MASK to irq_count() and make die() more verbose for NMIs. Changelog since v7 : - Implement paravirtualized nmi_return. Changelog since v8 : - refreshed the patch for asm-offsets. Those were left out of v8. - now depends on "Stringify support commas" patch. Changelog since v9 : - Only test the nmi nested preempt count flag upon return from exceptions, not on return from interrupts. Only the kernel return path has this test. - Add Xen, VMI, lguest support. Use their iret pavavirt ops in lieu of nmi_return. 
- update for 2.6.30-rc1 Follow NMI_MASK bits merged in mainline. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: akpm@osdl.org CC: mingo@elte.hu CC: "H. Peter Anvin" <hpa@zytor.com> CC: Jeremy Fitzhardinge <jeremy@goop.org> CC: Steven Rostedt <rostedt@goodmis.org> CC: "Frank Ch. Eigler" <fche@redhat.com>
2011-03-14 | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: ce4100: Set pci ops via callback instead of module init
  x86/mm: Fix pgd_lock deadlock
  x86/mm: Handle mm_fault_error() in kernel space
  x86: Don't check for BIOS corruption in first 64K when there's no need to
2011-03-14 | x86: ce4100: Set pci ops via callback instead of module init | Sebastian Andrzej Siewior
Setting the pci ops on subsys initcall unconditionally will break multi platform kernels on anything except ce4100. Use x86_init.pci.init ops to call this only on real ce4100 platforms. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: sodaville@linutronix.de LKML-Reference: <20110314093340.GA21026@www.tglx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-11 | futex: Sanitize futex ops argument types | Michel Lespinasse
Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic prototypes to use u32 types for the futex as this is the data type the futex core code uses all over the place. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Darren Hart <darren@dvhart.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110311025058.GD26122@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-11 | futex: Sanitize cmpxchg_futex_value_locked API | Michel Lespinasse
The cmpxchg_futex_value_locked API was funny in that it returned either the original, user-exposed futex value OR an error code such as -EFAULT. This was confusing at best, and could be a source of livelocks in places that retry the cmpxchg_futex_value_locked after trying to fix the issue by running fault_in_user_writeable(). This change makes the cmpxchg_futex_value_locked API more similar to the get_futex_value_locked one, returning an error code and updating the original value through a reference argument. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile] Acked-by: Tony Luck <tony.luck@intel.com> [ia64] Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michal Simek <monstr@monstr.eu> [microblaze] Acked-by: David Howells <dhowells@redhat.com> [frv] Cc: Darren Hart <darren@dvhart.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110311024851.GC26122@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
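The API shape this message describes, an error code as the return value and the observed value delivered through a reference argument, can be illustrated with a userspace stand-in. The function below is hypothetical and not atomic; a NULL address stands in for a faulting user pointer. It only demonstrates why separating the two channels removes the ambiguity.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a compare-and-exchange on a user word.
 * Returns 0 or a negative errno; the value found at `uaddr` is written
 * to `*curval`. Not atomic: this is an interface sketch only.
 */
static int cmpxchg_value_sketch(uint32_t *curval, uint32_t *uaddr,
                                uint32_t oldval, uint32_t newval)
{
    if (uaddr == NULL)
        return -EFAULT; /* models a fault on the user address */

    *curval = *uaddr;
    if (*uaddr == oldval)
        *uaddr = newval;
    return 0;
}
```

With the older single-return style, a futex word whose bit pattern happened to equal `(uint32_t)-EFAULT` could not be told apart from a real fault; here the return value is 0 and the pattern arrives in `*curval`.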
2011-03-10 | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: Initialize the broadcast assist unit base destination node id properly
  x86, numa: Fix numa_emulation code with memory-less node0
  x86, build: Make sure mkpiggy fails on read error
2011-03-09 | x86, UV: Initialize the broadcast assist unit base destination node id properly | Cliff Wickman
The BAU's initialization of the broadcast description header is lacking the coherence domain (high bits) in the nasid. This causes a catastrophic system failure when running on a system with multiple coherence domains. Signed-off-by: Cliff Wickman <cpw@sgi.com> LKML-Reference: <E1PxKBB-0005F0-3U@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-02 | Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 | Linus Torvalds
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
  intel_idle: disable Atom/Lincroft HW C-state auto-demotion
  intel_idle: disable NHM/WSM HW C-state auto-demotion
2011-02-28 | x86: Use u32 instead of long to set reset vector back to 0 | Don Zickus
A customer of ours complained that when setting the reset vector back to 0, it trashed other data and hung their box. They noticed that when only 4 bytes were set to 0 instead of 8, everything worked correctly.
Matthew pointed out:
| We're supposed to be resetting trampoline_phys_low and
| trampoline_phys_high here, which are two 16-bit values.
| Writing 64 bits is definitely going to overwrite space
| that we're not supposed to be touching.
So limit the area modified to u32.
Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Matthew Garrett <mjg@redhat.com> Cc: <stable@kernel.org> LKML-Reference: <1297139100-424-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
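The width problem can be reproduced in a few lines of userspace C. The struct below is an illustrative layout, not the real memory map: it assumes the two 16-bit trampoline fields are immediately followed by four bytes of unrelated data, and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative layout only: two 16-bit fields, then unrelated bytes. */
struct reset_area_sketch {
    uint16_t trampoline_phys_low;
    uint16_t trampoline_phys_high;
    uint8_t  neighbour[4]; /* hypothetical data that must survive */
};

/* Clear exactly the two 16-bit fields by writing one 32-bit zero. */
static void reset_vector_clear(struct reset_area_sketch *area)
{
    uint32_t zero = 0;

    /* 4 bytes cover both fields; an 8-byte store would hit neighbour[]. */
    memcpy(area, &zero, sizeof(zero));
}
```

On a 64-bit build `long` is 8 bytes, so a store through a `long` pointer at the same address would also zero `neighbour`, which matches the corruption reported above.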
2011-02-25 | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systems
  x86/mrst: Fix apb timer rating when lapic timer is used
  x86: Fix reboot problem on VersaLogic Menlow boards
2011-02-24 | x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systems | Andreas Herrmann
On some SB800 systems polarity for IOAPIC pin2 is wrongly specified as low active by BIOS. This caused system hangs after resume from S3 when HPET was used in one-shot mode on such systems because a timer interrupt was missed (HPET signal is high active). For more details see: http://marc.info/?l=linux-kernel&m=129623757413868 Tested-by: Manoj Iyer <manoj.iyer@canonical.com> Tested-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: stable@kernel.org # 37.x, 32.x LKML-Reference: <20110224145346.GD3658@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-17 | intel_idle: disable Atom/Lincroft HW C-state auto-demotion | Len Brown
Just as we had to disable auto-demotion for NHM/WSM, we need to do the same for Atom (Lincroft version). In particular, auto-demotion will prevent Lincroft from entering the S0i3 idle power saving state. https://bugzilla.kernel.org/show_bug.cgi?id=25252 Signed-off-by: Len Brown <len.brown@intel.com>
2011-02-17 | intel_idle: disable NHM/WSM HW C-state auto-demotion | Len Brown
Hardware C-state auto-demotion is a mechanism where the HW overrides the OS C-state request, instead demoting to a shallower state, which is less expensive, but saves less power. Modern Linux should generally get exactly the states it requests. In particular, when a CPU is taken off-line, it must not be demoted, else it can prevent the entire package from reaching deep C-states. https://bugzilla.kernel.org/show_bug.cgi?id=25252 Signed-off-by: Len Brown <len.brown@intel.com>
2011-02-16 | perf, x86: P4 PMU: Fix spurious NMI messages | Cyrill Gorcunov
Several people have reported spurious unknown NMI messages on some P4 CPUs. This patch fixes it by checking for an overflow (negative counter values) directly, instead of relying on the P4_CCCR_OVF bit. Reported-by: George Spelvin <linux@horizon.com> Reported-by: Meelis Roos <mroos@linux.ee> Reported-by: Don Zickus <dzickus@redhat.com> Reported-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <AANLkTinfuTfCck_FfaOHrDqQZZehtRzkBum4SpFoO=KJ@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
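One way to read "checking for an overflow (negative counter values) directly" is sketched below. It assumes a 40-bit counter that is programmed with a negative start value and counts upward, so the top bit stays set until the counter wraps. The width, macro names, and function name are assumptions for illustration, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define COUNTER_BITS     40 /* assumed hardware counter width */
#define COUNTER_MASK     ((1ULL << COUNTER_BITS) - 1)
#define COUNTER_SIGN_BIT (1ULL << (COUNTER_BITS - 1))

/*
 * A counter armed with -period reads as "negative" (sign bit set) while
 * it is still counting. Once the sign bit is clear it has wrapped past
 * zero, i.e. overflowed, whether or not a status flag was latched.
 */
static bool counter_overflowed(uint64_t raw)
{
    return (raw & COUNTER_SIGN_BIT) == 0;
}
```

The point of testing the value itself is that it does not depend on a separate overflow flag being set by the hardware.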
2011-02-14 | x86: Fix mwait_usable section mismatch | Borislav Petkov
We use it in non __cpuinit code now too so drop marker. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20110211171754.GA21047@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-10 | x86: Fix section mismatch in LAPIC initialization | Jan Beulich
Additionally doing things conditionally upon smp_processor_id() being zero is generally a bad idea, as this means CPU 0 cannot be offlined and brought back online later again. While there may be other places where this is done, I think adding more of those should be avoided so that some day SMP can really become "symmetrical". Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> LKML-Reference: <4D525C7E0200007800030EE1@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-06 | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32: Make sure the stack is set up before we use it
  x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms
  x86, nx: Don't force pages RW when setting NX bits
2011-02-04 | x86-32: Make sure the stack is set up before we use it | H. Peter Anvin
Since checkin ebba638ae723d8a8fc2f7abce5ec18b688b791d7 we call verify_cpu even in 32-bit mode. Unfortunately, calling a function means using the stack, and the stack pointer was not initialized in the 32-bit setup code! This code initializes the stack pointer, and simplifies the interface slightly since it is easier to rely on just a pointer value rather than a descriptor; we need to have different values for the segment register anyway. This retains start_stack as a virtual address, even though a physical address would be more convenient for 32 bits; the 64-bit code wants the other way around... Reported-by: Matthieu Castet <castet.matthieu@free.fr> LKML-Reference: <4D41E86D.8060205@free.fr> Tested-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-02-03 | x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm | Suresh Siddha
Clearing the cpu in prev's mm_cpumask early will avoid the flush tlb IPI's while the cr3 is still pointing to the prev mm. And this window can lead to the possibility of bogus TLB fills resulting in strange failures. One such problematic scenario is mentioned below.
T1. CPU-1 is context switching from mm1 to mm2 context and got a NMI etc between the point of clearing the cpu from the mm_cpumask(mm1) and before reloading the cr3 with the new mm2.
T2. CPU-2 is tearing down a specific vma for mm1 and will proceed with flushing the TLB for mm1. It doesn't send the flush TLB to CPU-1 as it doesn't see that cpu listed in the mm_cpumask(mm1).
T3. After the TLB flush is complete, CPU-2 goes ahead and frees the page-table pages associated with the removed vma mapping.
T4. CPU-2 now allocates those freed page-table pages for something else.
T5. As the CR3 and TLB caches for mm1 is still active on CPU-1, CPU-1 can potentially speculate and walk through the page-table caches and can insert new TLB entries. As the page-table pages are already freed and being used on CPU-2, this page walk can potentially insert a bogus global TLB entry depending on the (random) contents of the page that is being used on CPU-2.
T6. This bogus TLB entry being global will be active across future CR3 changes and can result in weird memory corruption etc.
To avoid this issue, for the prev mm that is handing over the cpu to another mm, clear the cpu from the mm_cpumask(prev) after the cr3 is changed. Marking it for -stable, though we haven't seen any reported failure that can be attributed to this.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org [v2.6.32+] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
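The ordering that this message argues for can be written down as a three-step model. The types and the function are hypothetical userspace stand-ins (a 64-bit word for the cpumask, a plain field for CR3); the sketch only encodes the order of operations, not the real switch_mm().

```c
#include <stdint.h>

/* Hypothetical stand-ins for an mm and a CPU. */
struct mm_sketch {
    uint64_t cpumask; /* CPUs that must receive TLB flush IPIs */
    uint64_t pgd;     /* page-table root for this mm */
};

struct cpu_sketch {
    int      id;
    uint64_t cr3;
};

static void switch_mm_sketch(struct cpu_sketch *cpu,
                             struct mm_sketch *prev,
                             struct mm_sketch *next)
{
    uint64_t bit = 1ULL << cpu->id;

    next->cpumask |= bit;  /* 1. start receiving flushes for next */
    cpu->cr3 = next->pgd;  /* 2. stop walking prev's page tables */
    prev->cpumask &= ~bit; /* 3. only now stop receiving flushes for prev */
}
```

Because step 3 comes after step 2, there is no point at which the CPU can still walk prev's page tables while being invisible to a flush of prev, which is the window described in T1 to T6.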
2011-01-28 | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  percpu, x86: Fix percpu_xchg_op()
  x86: Remove left over system_64.h
  x86-64: Don't use pointer to out-of-scope variable in dump_trace()
2011-01-26percpu, x86: Fix percpu_xchg_op()Eric Dumazet
These recent percpu commits:

  2485b6464cf8: x86,percpu: Move out of place 64 bit ops into X86_64 section
  8270137a0d50: cpuops: Use cmpxchg for xchg to avoid lock semantics

caused this 'perf top' crash:

  Kernel panic - not syncing: Fatal exception in interrupt
  Pid: 0, comm: swapper Tainted: G D 2.6.38-rc2-00181-gef71723 #413
  Call Trace:
   <IRQ> [<ffffffff810465b5>] ? panic
   ? kmsg_dump
   ? kmsg_dump
   ? oops_end
   ? no_context
   ? __bad_area_nosemaphore
   ? perf_output_begin
   ? bad_area_nosemaphore
   ? do_page_fault
   ? __task_pid_nr_ns
   ? perf_event_tid
   ? __perf_event_header__init_id
   ? validate_chain
   ? perf_output_sample
   ? trace_hardirqs_off
   ? page_fault
   ? irq_work_run
   ? update_process_times
   ? tick_sched_timer
   ? tick_sched_timer
   ? __run_hrtimer
   ? hrtimer_interrupt
   ? account_system_vtime
   ? smp_apic_timer_interrupt
   ? apic_timer_interrupt
   ...

Looking at the assembly code, I found that:

  list = this_cpu_xchg(irq_work_list, NULL);

gives this wrong code (gcc-4.1.2 cross compiler):

  ffffffff810bc45e:
    mov %gs:0xead0,%rax
    cmpxchg %rax,%gs:0xead0
    jne ffffffff810bc45e <irq_work_run+0x3e>
    test %rax,%rax
    je ffffffff810bc4aa <irq_work_run+0x8a>

Tell gcc we dirty the eax/rax register in percpu_xchg_op(). The compiler must use another register to store pxo_new__.

We also don't need to reload the percpu value after a jump, since a 'failed' cmpxchg has already updated eax/rax.

Wrong generated code was:

    xor %rax,%rax /* load 0 into %rax */
  1: mov %gs:0xead0,%rax
    cmpxchg %rax,%gs:0xead0
    jne 1b
    test %rax,%rax

After patch:

    xor %rdx,%rdx /* load 0 into %rdx */
    mov %gs:0xead0,%rax
  1: cmpxchg %rdx,%gs:0xead0
    jne 1b
    test %rax,%rax

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <1295973114.3588.312.camel@edumazet-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
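The register problem described in the percpu_xchg_op() fix above can be reproduced outside the kernel. The following is a minimal userspace sketch, not the kernel macro itself: the function names xchg_via_cmpxchg() and xchg_self_check() are hypothetical, the %gs-relative percpu addressing is replaced by a plain pointer, and the non-x86 fallback is only there so the file builds elsewhere. The point it illustrates is the early-clobber output constraint "=&a", which tells gcc that eax/rax is written before the inputs are consumed, so the new value has to live in a different register.

```c
#include <assert.h>

/*
 * Illustrative only: exchange *ptr with new_val using a cmpxchg loop and
 * return the previous value. As with the kernel's percpu ops, there is no
 * lock prefix, so this is not atomic with respect to other CPUs.
 */
static inline unsigned long xchg_via_cmpxchg(unsigned long *ptr,
                                             unsigned long new_val)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned long old;

    /*
     * "=&a": old lives in eax/rax and is written early, so gcc may not
     * place new_val (operand %2) in the same register.
     * The load sits before label 1: a failed cmpxchg already reloads
     * eax/rax with the current memory value, so no second load is needed.
     */
    __asm__ __volatile__("mov %1, %0\n"
                         "1:\tcmpxchg %2, %1\n\t"
                         "jne 1b"
                         : "=&a" (old), "+m" (*ptr)
                         : "r" (new_val)
                         : "memory", "cc");
    return old;
#else
    /* Assumption: on other architectures fall back to a compiler builtin. */
    return __atomic_exchange_n(ptr, new_val, __ATOMIC_RELAXED);
#endif
}

/* Small self-check: the old value comes back and the slot is updated. */
static void xchg_self_check(void)
{
    unsigned long slot = 42;

    assert(xchg_via_cmpxchg(&slot, 0) == 42);
    assert(slot == 0);
}
```

Without the early-clobber marker, gcc is free to pick eax/rax for new_val, and the initial load then overwrites the value that was supposed to be stored, which matches the wrong code shown in the commit message.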
2011-01-26x86: Remove left over system_64.hYinghai Lu
Left-over from the x86 merge ...

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D3E23D1.7010405@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-26thp: fix PARAVIRT x86 32bit noPAEAndrea Arcangeli
This fixes TRANSPARENT_HUGEPAGE=y with PARAVIRT=y and HIGHMEM64G=n.

The #ifdef that this patch removes was mistakenly introduced to fix a build error for noPAE (where pmd.pmd doesn't exist). The kernel then built, but it failed at runtime because set_pmd_at was a no-op. This corrects it by enabling set_pmd_at for noPAE mode too.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: werner <w.landgraf@ru.ru>
Reported-by: Minchan Kim <minchan.kim@gmail.com>
Tested-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-25Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix jump label with RO/NX module protection crash
  x86, hotplug: Fix powersavings with offlined cores on AMD
  x86, mcheck, therm_throt.c: Export symbol platform_thermal_notify to allow coretemp to handler intr
  x86: Use asm-generic/cacheflush.h
  x86: Update CPU cache attributes table descriptors
2011-01-23x86: Fix jump label with RO/NX module protection crashmatthieu castet
If a jump label is used in module init code, its entries are marked as removed in the __jump_table section after init is done. But by then ro permissions have already been applied to the module, so the read-only section cannot be modified (crash in remove_jump_label_module_init).

Make the __jump_table section rw.

Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Xiaotian Feng <xtfeng@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Siarhei Liakh <sliakh.lkml@gmail.com>
Cc: Xuxian Jiang <jiang@cs.ncsu.edu>
Cc: James Morris <jmorris@namei.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4D3C3F20.7030203@free.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-21x86, hotplug: Fix powersavings with offlined cores on AMDBorislav Petkov
ea53069231f9317062910d6e772cca4ce93de8c8 made a CPU use monitor/mwait when offline. This is not the optimal choice for AMD wrt powersavings and we'd prefer our cores to halt (i.e. enter C1) instead. For this, the same selection whether to use monitor/mwait has to be used as when we select the idle routine for the machine.

With this patch, offlining cores 1-5 on a X6 machine allows core0 to boost again.

[ hpa: putting this in urgent since it is a (power) regression fix ]

Reported-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: stable@kernel.org # 37.x
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1295534572-10730-1-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-01-21Merge branch 'fixes-2.6.38' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu

* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  x86,percpu: Move out of place 64 bit ops into X86_64 section
2011-01-21x86: Use asm-generic/cacheflush.hAkinobu Mita
The implementation of the cache flushing interfaces on x86 is identical to the default implementation in asm-generic.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: arnd@arndb.de
LKML-Reference: <1295523136-4277-2-git-send-email-akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>