aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2012-12-11Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Lots of activity: 211 files changed, 8328 insertions(+), 4116 deletions(-) most of it on the tooling side. Main changes: * ftrace enhancements and fixes from Steve Rostedt. * uprobes fixes, cleanups and preparation for the ARM port from Oleg Nesterov. * UAPI fixes, from David Howels - prepares the arch/x86 UAPI transition * Separate perf tests into multiple objects, one per test, from Jiri Olsa. * Make hardware event translations available in sysfs, from Jiri Olsa. * Fixes to /proc/pid/maps parsing, preparatory to supporting data maps, from Namhyung Kim * Implement ui_progress for GTK, from Namhyung Kim * Add framework for automated perf_event_attr tests, where tools with different command line options will be run from a 'perf test', via python glue, and the perf syscall will be intercepted to verify that the perf_event_attr fields set by the tool are those expected, from Jiri Olsa * Add a 'link' method for hists, so that we can have the leader with buckets for all the entries in all the hists. This new method is now used in the default 'diff' output, making the sum of the 'baseline' column be 100%, eliminating blind spots. * libtraceevent fixes for compiler warnings trying to make perf it build on some distros, like fedora 14, 32-bit, some of the warnings really pointed to real bugs. * Add a browser for 'perf script' and make it available from the report and annotate browsers. It does filtering to find the scripts that handle events found in the perf.data file used. From Feng Tang * perf inject changes to allow showing where a task sleeps, from Andrew Vagin. * Makefile improvements from Namhyung Kim. * Add --pre and --post command hooks in 'stat', from Peter Zijlstra. * Don't stop synthesizing threads when one vanishes, this is for the existing threads when we start a tool like trace. * Use sched:sched_stat_runtime to provide a thread summary, this produces the same output as the 'trace summary' subcommand of tglx's original "trace" tool. * Support interrupted syscalls in 'trace' * Add an event duration column and filter in 'trace'. * There are references to the man pages in some tools, so try to build Documentation when installing, warning the user if that is not possible, from Borislav Petkov. * Give user better message if precise is not supported, from David Ahern. * Try to find cross-built objdump path by using the session environment information in the perf.data file header, from Irina Tirdea, original patch and idea by Namhyung Kim. * Diplays more output on features check for make V=1, so that one can figure out what is happening by looking at gcc output, etc. From Jiri Olsa. * Add on_exit implementation for systems without one, e.g. Android, from Bernhard Rosenkraenzer. * Only process events for vcpus of interest, helps handling large number of events, from David Ahern. * Cross compilation fixes for Android, from Irina Tirdea. * Add documentation on compiling for Android, from Irina Tirdea. * perf diff improvements from Jiri Olsa. * Target (task/user/cpu/syswide) handling improvements, from Namhyung Kim. * Add support in 'trace' for tracing workload given by command line, from Namhyung Kim. * ... and much more." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits) uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race perf evsel: Introduce is_group_member method perf powerpc: Use uapi/unistd.h to fix build error tools: Pass the target in descend tools: Honour the O= flag when tool build called from a higher Makefile tools: Define a Makefile function to do subdir processing perf ui: Always compile browser setup code perf ui: Add ui_progress__finish() perf ui gtk: Implement ui_progress functions perf ui: Introduce generic ui_progress helper perf ui tui: Move progress.c under ui/tui directory perf tools: Add basic event modifier sanity check perf tools: Omit group members from perf_evlist__disable/enable perf tools: Ensure single disable call per event in record comand perf tools: Fix 'disabled' attribute config for record command perf tools: Fix attributes for '{}' defined event groups perf tools: Use sscanf for parsing /proc/pid/maps perf tools: Add gtk.<command> config option for launching GTK browser perf tools: Fix compile error on NO_NEWT=1 build perf hists: Initialize all of he->stat with zeroes ...
2012-12-11Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU update from Ingo Molnar: "The major features of this tree are: 1. A first version of no-callbacks CPUs. This version prohibits offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y. Relaxing this constraint is in progress, but not yet ready for prime time. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/724. 2. Changes to SRCU that allows statically initialized srcu_struct structures. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/296. 3. Restructuring of RCU's debugfs output. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/341. 4. Additional CPU-hotplug/RCU improvements, posted to LKML at https://lkml.org/lkml/2012/10/30/327. Note that the commit eliminating __stop_machine() was judged to be too-high of risk, so is deferred to 3.9. 5. Changes to RCU's idle interface, most notably a new module parameter that redirects normal grace-period operations to their expedited equivalents. These were posted to LKML at https://lkml.org/lkml/2012/10/30/739. 6. Additional diagnostics for RCU's CPU stall warning facility, posted to LKML at https://lkml.org/lkml/2012/10/30/315. The most notable change reduces the default RCU CPU stall-warning time from 60 seconds to 21 seconds, so that it once again happens sooner than the softlockup timeout. 7. Documentation updates, which were posted to LKML at https://lkml.org/lkml/2012/10/30/280. A couple of late-breaking changes were posted at https://lkml.org/lkml/2012/11/16/634 and https://lkml.org/lkml/2012/11/16/547. 8. Miscellaneous fixes, which were posted to LKML at https://lkml.org/lkml/2012/10/30/309. 9. Finally, a fix for an lockdep-RCU splat was posted to LKML at https://lkml.org/lkml/2012/11/7/486." * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) context_tracking: New context tracking susbsystem sched: Mark RCU reader in sched_show_task() rcu: Separate accounting of callbacks from callback-free CPUs rcu: Add callback-free CPUs rcu: Add documentation for the new rcuexp debugfs trace file rcu: Update documentation for TREE_RCU debugfs tracing rcu: Reduce default RCU CPU stall warning timeout rcu: Fix TINY_RCU rcu_is_cpu_rrupt_from_idle check rcu: Clarify memory-ordering properties of grace-period primitives rcu: Add new rcutorture module parameters to start/end test messages rcu: Remove list_for_each_continue_rcu() rcu: Fix batch-limit size problem rcu: Add tracing for synchronize_sched_expedited() rcu: Remove old debugfs interfaces and also RCU flavor name rcu: split 'rcuhier' to each flavor rcu: split 'rcugp' to each flavor rcu: split 'rcuboost' to each flavor rcu: split 'rcubarrier' to each flavor rcu: Fix tracing formatting rcu: Remove the interface "rcudata.csv" ...
2012-12-11Merge branch 'akpm' (Andrew's patchbomb)Linus Torvalds
Merge misc updates from Andrew Morton: "About half of most of MM. Going very early this time due to uncertainty over the coreautounifiednumasched things. I'll send the other half of most of MM tomorrow. The rest of MM awaits a slab merge from Pekka." * emailed patches from Andrew Morton: (71 commits) memory_hotplug: ensure every online node has NORMAL memory memory_hotplug: handle empty zone when online_movable/online_kernel mm, memory-hotplug: dynamic configure movable memory and portion memory drivers/base/node.c: cleanup node_state_attr[] bootmem: fix wrong call parameter for free_bootmem() avr32, kconfig: remove HAVE_ARCH_BOOTMEM mm: cma: remove watermark hacks mm: cma: skip watermarks check for already isolated blocks in split_free_page() mm, oom: fix race when specifying a thread as the oom origin mm, oom: change type of oom_score_adj to short mm: cleanup register_node() mm, mempolicy: remove duplicate code mm/vmscan.c: try_to_freeze() returns boolean mm: introduce putback_movable_pages() virtio_balloon: introduce migration primitives to balloon pages mm: introduce compaction and migration for ballooned pages mm: introduce a common interface for balloon pages mobility mm: redefine address_space.assoc_mapping mm: adjust address_space_operations.migratepage() return code arch/sparc/kernel/sys_sparc_64.c: s/COLOUR/COLOR/ ...
2012-12-11mm: use vm_unmapped_area() in hugetlbfs on i386 architectureMichel Lespinasse
Update the i386 hugetlb_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. [akpm@linux-foundation.org: fix build] Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11mm: fix cache coloring on x86_64 architectureMichel Lespinasse
Fix the x86-64 cache alignment code to take pgoff into account. Use the x86 and MIPS cache alignment code as the basis for a generic cache alignment function. The old x86 code will always align the mmap to aliasing boundaries, even if the program mmaps the file with a non-zero pgoff. If program A mmaps the file with pgoff 0, and program B mmaps the file with pgoff 1. The old code would align the mmaps, resulting in misaligned pages: A: 0123 B: 123 After this patch, they are aligned so the pages line up: A: 0123 B: 123 Proposed by Rik van Riel. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11mm: use vm_unmapped_area() on x86_64 architectureMichel Lespinasse
Update the x86_64 arch_get_unmapped_area[_topdown] functions to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLBAndi Kleen
There was some desire in large applications using MAP_HUGETLB or SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on others. This is useful together with NUMA policy: use 2MB interleaving on some mappings, but 1GB on local mappings. This patch extends the IPC/SHM syscall interfaces slightly to allow specifying the page size. It borrows some upper bits in the existing flag arguments and allows encoding the log of the desired page size in addition to the *_HUGETLB flag. When 0 is specified the default size is used, this makes the change fully compatible. Extending the internal hugetlb code to handle this is straight forward. Instead of a single mount it just keeps an array of them and selects the right mount based on the specified page size. When no page size is specified it uses the mount of the default page size. The change is not visible in /proc/mounts because internal mounts don't appear there. It also has very little overhead: the additional mounts just consume a super block, but not more memory when not used. I also exported the new flags to the user headers (they were previously under __KERNEL__). Right now only symbols for x86 and some other architecture for 1GB and 2MB are defined. The interface should already work for all other architectures though. Only architectures that define multiple hugetlb sizes actually need it (that is currently x86, tile, powerpc). However tile and powerpc have user configurable hugetlb sizes, so it's not easy to add defines. A program on those architectures would need to query sysfs and use the appropiate log2. [akpm@linux-foundation.org: cleanups] [rientjes@google.com: fix build] [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11KVM: emulator: fix real mode segment checks in address linearizationGleb Natapov
In real mode CS register is writable, so do not #GP on write. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-11VMX: remove unneeded enable_unrestricted_guest checkGleb Natapov
If enable_unrestricted_guest is true vmx->rmode.vm86_active will always be false. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-11KVM: VMX: fix DPL during entry to protected modeGleb Natapov
On CPUs without support for unrestricted guests DPL cannot be smaller than RPL for data segments during guest entry, but this state can occurs if a data segment selector changes while vcpu is in real mode to a value with lowest two bits != 00. Fix that by forcing DPL == RPL on transition to protected mode. This is a regression introduced by c865c43de66dc97. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-11Merge tag 'tty-3.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull TTY/Serial merge from Greg Kroah-Hartman: "Here's the big tty/serial tree set of changes for 3.8-rc1. Contained in here is a bunch more reworks of the tty port layer from Jiri and bugfixes from Alan, along with a number of other tty and serial driver updates by the various driver authors. Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the TTY layer, which is much appreciated by me. All of these have been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fixed up some trivial conflicts in the staging tree, due to the fwserial driver having come in both ways (but fixed up a bit in the serial tree), and the ioctl handling in the dgrp driver having been done slightly differently (staging tree got that one right, and removed both TIOCGSOFTCAR and TIOCSSOFTCAR). * tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (146 commits) staging: sb105x: fix potential NULL pointer dereference in mp_chars_in_buffer() staging/fwserial: Remove superfluous free staging/fwserial: Use WARN_ONCE when port table is corrupted staging/fwserial: Destruct embedded tty_port on teardown staging/fwserial: Fix build breakage when !CONFIG_BUG staging: fwserial: Add TTY-over-Firewire serial driver drivers/tty/serial/serial_core.c: clean up HIGH_BITS_OFFSET usage staging: dgrp: dgrp_tty.c: Audit the return values of get/put_user() staging: dgrp: dgrp_tty.c: Remove the TIOCSSOFTCAR ioctl handler from dgrp driver serial: ifx6x60: Add modem power off function in the platform reboot process serial: mxs-auart: unmap the scatter list before we copy the data serial: mxs-auart: disable the Receive Timeout Interrupt when DMA is enabled serial: max310x: Setup missing "can_sleep" field for GPIO tty/serial: fix ifx6x60.c declaration warning serial: samsung: add devicetree properties for non-Exynos SoCs serial: samsung: fix potential soft lockup during uart write tty: vt: Remove redundant null check before kfree. tty/8250 Add check for pci_ioremap_bar failure tty/8250 Add support for Commtech's Fastcom Async-335 and Fastcom Async-PCIe cards tty/8250 Add XR17D15x devices to the exar_handle_irq override ...
2012-12-11x86/kexec: crash_vmclear_local_vmcss needs __rcuZhang Yanfei
This removes the sparse warning: arch/x86/kernel/crash.c:49:32: sparse: incompatible types in comparison expression (different address spaces) Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-11Merge tag 'pm+acpi-for-3.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: - Introduction of device PM QoS flags. - ACPI device power management update allowing subsystems other than PCI to use it more easily. - ACPI device enumeration rework allowing additional kinds of devices to be enumerated via ACPI. From Mika Westerberg, Adrian Hunter, Mathias Nyman, Andy Shevchenko, and Rafael J. Wysocki. - ACPICA update to version 20121018 from Bob Moore and Lv Zheng. - ACPI memory hotplug update from Wen Congyang and Yasuaki Ishimatsu. - Introduction of acpi_handle_<level>() messaging macros and ACPI-based CPU hot-remove support from Toshi Kani. - ACPI EC updates from Feng Tang. - cpufreq updates from Viresh Kumar, Fabio Baltieri and others. - cpuidle changes to quickly notice governor prediction failure from Youquan Song. - Support for using multiple cpuidle drivers at the same time and cpuidle cleanups from Daniel Lezcano. - devfreq updates from Nishanth Menon and others. - cpupower update from Thomas Renninger. - Fixes and small cleanups all over the place. * tag 'pm+acpi-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (196 commits) mmc: sdhci-acpi: enable runtime-pm for device HID INT33C6 ACPI: add Haswell LPSS devices to acpi_platform_device_ids list ACPI: add documentation about ACPI 5 enumeration pnpacpi: fix incorrect TEST_ALPHA() test ACPI / PM: Fix header of acpi_dev_pm_detach() in acpi.h ACPI / video: ignore BIOS initial backlight value for HP Folio 13-2000 ACPI : do not use Lid and Sleep button for S5 wakeup ACPI / PNP: Do not crash due to stale pointer use during system resume ACPI / video: Add "Asus UL30VT" to ACPI video detect blacklist ACPI: do acpisleep dmi check when CONFIG_ACPI_SLEEP is set spi / ACPI: add ACPI enumeration support gpio / ACPI: add ACPI support PM / devfreq: remove compiler error with module governors (2) cpupower: IvyBridge (0x3a and 0x3e models) support cpupower: Provide -c param for cpupower monitor to schedule process on all cores cpupower tools: Fix warning and a bug with the cpu package count cpupower tools: Fix malloc of cpu_info structure cpupower tools: Fix issues with sysfs_topology_read_file cpupower tools: Fix minor warnings cpupower tools: Update .gitignore for files created in the debug directories ...
2012-12-11mm: numa: Add fault driven placement and migrationPeter Zijlstra
NOTE: This patch is based on "sched, numa, mm: Add fault driven placement and migration policy" but as it throws away all the policy to just leave a basic foundation I had to drop the signed-offs-by. This patch creates a bare-bones method for setting PTEs pte_numa in the context of the scheduler that when faulted later will be faulted onto the node the CPU is running on. In itself this does nothing useful but any placement policy will fundamentally depend on receiving hints on placement from fault context and doing something intelligent about it. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com>
2012-12-11mm: numa: pte_numa() and pmd_numa()Andrea Arcangeli
Implement pte_numa and pmd_numa. We must atomically set the numa bit and clear the present bit to define a pte_numa or pmd_numa. Once a pte or pmd has been set as pte_numa or pmd_numa, the next time a thread touches a virtual address in the corresponding virtual range, a NUMA hinting page fault will trigger. The NUMA hinting page fault will clear the NUMA bit and set the present bit again to resolve the page fault. The expectation is that a NUMA hinting page fault is used as part of a placement policy that decides if a page should remain on the current node or migrated to a different node. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de>
2012-12-11mm: numa: define _PAGE_NUMAAndrea Arcangeli
The objective of _PAGE_NUMA is to be able to trigger NUMA hinting page faults to identify the per NUMA node working set of the thread at runtime. Arming the NUMA hinting page fault mechanism works similarly to setting up a mprotect(PROT_NONE) virtual range: the present bit is cleared at the same time that _PAGE_NUMA is set, so when the fault triggers we can identify it as a NUMA hinting page fault. _PAGE_NUMA on x86 shares the same bit number of _PAGE_PROTNONE (but it could also use a different bitflag, it's up to the architecture to decide). It would be confusing to call the "NUMA hinting page faults" as "do_prot_none faults". They're different events and _PAGE_NUMA doesn't alter the semantics of mprotect(PROT_NONE) in any way. Sharing the same bitflag with _PAGE_PROTNONE in fact complicates things: it requires us to ensure the code paths executed by _PAGE_PROTNONE remains mutually exclusive to the code paths executed by _PAGE_NUMA at all times, to avoid _PAGE_NUMA and _PAGE_PROTNONE to step into each other toes. Because we want to be able to set this bitflag in any established pte or pmd (while clearing the present bit at the same time) without losing information, this bitflag must never be set when the pte and pmd are present, so the bitflag picked for _PAGE_NUMA usage, must not be used by the swap entry format. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>
2012-12-11x86/mm: Introduce pte_accessible()Rik van Riel
We need pte_present to return true for _PAGE_PROTNONE pages, to indicate that the pte is associated with a page. However, for TLB flushing purposes, we would like to know whether the pte points to an actually accessible page. This allows us to skip remote TLB flushes for pages that are not actually accessible. Fill in this method for x86 and provide a safe (but slower) method on other architectures. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixed-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-66p11te4uj23gevgh4j987ip@git.kernel.org [ Added Linus's review fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-11x86: mm: drop TLB flush from ptep_set_access_flagsRik van Riel
Intel has an architectural guarantee that the TLB entry causing a page fault gets invalidated automatically. This means we should be able to drop the local TLB invalidation. Because of the way other areas of the page fault code work, chances are good that all x86 CPUs do this. However, if someone somewhere has an x86 CPU that does not invalidate the TLB entry causing a page fault, this one-liner should be easy to revert. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com>
2012-12-11x86: mm: only do a local tlb flush in ptep_set_access_flags()Rik van Riel
The function ptep_set_access_flags() is only ever invoked to set access flags or add write permission on a PTE. The write bit is only ever set together with the dirty bit. Because we only ever upgrade a PTE, it is safe to skip flushing entries on remote TLBs. The worst that can happen is a spurious page fault on other CPUs, which would flush that TLB entry. Lazily letting another CPU incur a spurious page fault occasionally is (much!) cheaper than aggressively flushing everybody else's TLB. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michel Lespinasse <walken@google.com> Cc: Ingo Molnar <mingo@kernel.org>
2012-12-10Merge branch 'pci/mjg-pci-roms-from-efi' into nextBjorn Helgaas
* pci/mjg-pci-roms-from-efi: PCI: Use phys_addr_t for physical ROM address
2012-12-10x86: Fix the error of using "const" in gen-insn-attr-x86.awkCong Ding
The original version code causes following sparse warnings: arch/x86/lib/inat-tables.c:1080:25: warning: duplicate const arch/x86/lib/inat-tables.c:1095:25: warning: duplicate const arch/x86/lib/inat-tables.c:1118:25: warning: duplicate const for the variables inat_escape_tables, inat_group_tables, and inat_avx_tables in the code generated by gen-insn-attr-x86.awk. The author Masami Hiramutsu says here is to make both the value pointed by the pointers and the pointers itself read-only, so we move the "const" to be after the "*". Signed-off-by: Cong Ding <dinggnu@gmail.com> Link: http://lkml.kernel.org/r/20121209082103.GA9181@gmail.com Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-12-10PCI: Use phys_addr_t for physical ROM addressBjorn Helgaas
Use phys_addr_t rather than "void *" for physical memory address. This removes casts and fixes a "cast from pointer to integer of different size" warning on ppc44x_defconfig. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-12-08Merge branch 'tip/perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core Pull ftrace updates from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08Merge branch 'uprobes/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core Pull uprobes fixes, cleanups and preparation for the ARM port from Oleg Nesterov. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08Merge branch 'linus' into perf/coreIngo Molnar
Conflicts: tools/perf/Makefile tools/perf/builtin-test.c tools/perf/perf.h tools/perf/tests/parse-events.c tools/perf/util/evsel.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-07Merge branch 'pci/daniel-numachip' into nextBjorn Helgaas
* pci/daniel-numachip: x86/PCI: Add NumaChip remote PCI support
2012-12-07x86/PCI: Add NumaChip remote PCI supportDaniel J Blueman
Add NumaChip-specific PCI access mechanism via MMCONFIG cycles, but preventing access to AMD Northbridges which shouldn't respond. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-12-07x86 mtrr: fix comment typo in mtrr_bp_initOlaf Hering
Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-06Merge branch 'pci/mjg-pci-roms-from-efi' into nextBjorn Helgaas
* pci/mjg-pci-roms-from-efi: x86: Use PCI setup data PCI: Add support for non-BAR ROMs PCI: Add pcibios_add_device EFI: Stash ROMs if they're not in the PCI BAR
2012-12-06KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdumpZhang Yanfei
The vmclear function will be assigned to the callback function pointer when loading kvm-intel module. And the bitmap indicates whether we should do VMCLEAR operation in kdump. The bits in the bitmap are set/unset according to different conditions. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-06x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessaryZhang Yanfei
This patch provides a way to VMCLEAR VMCSs related to guests on all cpus before executing the VMXOFF when doing kdump. This is used to ensure the VMCSs in the vmcore updated and non-corrupted. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-06propagate name change to comments in kernel sourceNadia Yvette Chambers
I've legally changed my name with New York State, the US Social Security Administration, et al. This patch propagates the name change and change in initials and login to comments in the kernel source as well. Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-06crypto: cast5/cast6 - move lookup tables to shared moduleJussi Kivilinna
CAST5 and CAST6 both use same lookup tables, which can be moved shared module 'cast_common'. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-12-06KVM: MMU: optimize for set_spteXiao Guangrong
There are two cases we need to adjust page size in set_spte: 1): the one is other vcpu creates new sp in the window between mapping_level() and acquiring mmu-lock. 2): the another case is the new sp is created by itself (page-fault path) when guest uses the target gfn as its page table. In current code, set_spte drop the spte and emulate the access for these case, it works not good: - for the case 1, it may destroy the mapping established by other vcpu, and do expensive instruction emulation. - for the case 2, it may emulate the access even if the guest is accessing the page which not used as page table. There is a example, 0~2M is used as huge page in guest, in this huge page, only page 3 used as page table, then guest read/writes on other pages can cause instruction emulation. Both of these cases can be fixed by allowing guest to retry the access, it will refault, then we can establish the mapping by using small page Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-05x86: Use PCI setup dataMatthew Garrett
EFI can provide PCI ROMs out of band via boot services, which may not be available after boot. Add support for using the data handed off to us by the boot stub or bootloader. [bhelgaas: added Seth's boot_params section mismatch fix] [bhelgaas: drop "boot_params.hdr.version < 0x0209" test] Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Seth Forshee <seth.forshee@canonical.com>
2012-12-05EFI: Stash ROMs if they're not in the PCI BARMatthew Garrett
EFI provides support for providing PCI ROMs via means other than the ROM BAR. This support vanishes after we've exited boot services, so add support for stashing copies of the ROMs in setup_data if they're not otherwise available. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Seth Forshee <seth.forshee@canonical.com>
2012-12-05KVM: x86: Make register state after reset conform to specificationJulian Stecklina
VMX behaves now as SVM wrt to FPU initialization. Code has been moved to generic code path. General-purpose registers are now cleared on reset and INIT. SVM code properly initializes EDX. Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-05kvm: don't use bit24 for detecting address-specific invalidation capabilityZhang Xiantao
Bit24 in VMX_EPT_VPID_CAP_MASI is not used for address-specific invalidation capability reporting, so remove it from KVM to avoid conflicts in future. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-05kvm: remove unnecessary bit checking for ept violationZhang Xiantao
Bit 6 in EPT vmexit's exit qualification is not defined in SDM, so remove it. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-03Merge branch 'rcu/next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Conflicts: arch/x86/kernel/ptrace.c Pull the latest RCU tree from Paul E. McKenney: " The major features of this series are: 1. A first version of no-callbacks CPUs. This version prohibits offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y. Relaxing this constraint is in progress, but not yet ready for prime time. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/724, and are at branch rcu/nocb. 2. Changes to SRCU that allows statically initialized srcu_struct structures. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/296, and are at branch rcu/srcu. 3. Restructuring of RCU's debugfs output. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/341, and are at branch rcu/tracing. 4. Additional CPU-hotplug/RCU improvements, posted to LKML at https://lkml.org/lkml/2012/10/30/327, and are at branch rcu/hotplug. Note that the commit eliminating __stop_machine() was judged to be too-high of risk, so is deferred to 3.9. 5. Changes to RCU's idle interface, most notably a new module parameter that redirects normal grace-period operations to their expedited equivalents. These were posted to LKML at https://lkml.org/lkml/2012/10/30/739, and are at branch rcu/idle. 6. Additional diagnostics for RCU's CPU stall warning facility, posted to LKML at https://lkml.org/lkml/2012/10/30/315, and are at branch rcu/stall. The most notable change reduces the default RCU CPU stall-warning time from 60 seconds to 21 seconds, so that it once again happens sooner than the softlockup timeout. 7. Documentation updates, which were posted to LKML at https://lkml.org/lkml/2012/10/30/280, and are at branch rcu/doc. A couple of late-breaking changes were posted at https://lkml.org/lkml/2012/11/16/634 and https://lkml.org/lkml/2012/11/16/547. 8. Miscellaneous fixes, which were posted to LKML at https://lkml.org/lkml/2012/10/30/309, along with a late-breaking change posted at Fri, 16 Nov 2012 11:26:25 -0800 with message-ID <20121116192625.GA447@linux.vnet.ibm.com>, but which lkml.org seems to have missed. These are at branch rcu/fixes. 9. Finally, a fix for an lockdep-RCU splat was posted to LKML at https://lkml.org/lkml/2012/11/7/486. This is at rcu/next. " Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-02KVM: x86: Fix uninitialized return codeJan Kiszka
This is a regression caused by 18595411a7. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-01Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fix from Ingo Molnar: "Fix leaking RCU extended quiescent state, which might trigger warnings and mess up the extended quiescent state tracking logic into thinking that we are in "RCU user mode" while we aren't." * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Fix unrecovered RCU user mode in syscall_trace_leave()
2012-12-01Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This is mostly about unbreaking architectures that took the UAPI changes in the v3.7 cycle, plus misc fixes." * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf kvm: Fix building perf kvm on non x86 arches perf kvm: Rename perf_kvm to perf_kvm_stat perf: Make perf build for x86 with UAPI disintegration applied perf powerpc: Use uapi/unistd.h to fix build error tools: Pass the target in descend tools: Honour the O= flag when tool build called from a higher Makefile tools: Define a Makefile function to do subdir processing x86: Export asm/{svm.h,vmx.h,perf_regs.h} perf tools: Fix strbuf_addf() when the buffer needs to grow perf header: Fix numa topology printing perf, powerpc: Fix hw breakpoints returning -ENOSPC
2012-11-30Merge branch 'arm-privcmd-for-3.8' of ↵Konrad Rzeszutek Wilk
git://xenbits.xen.org/people/ianc/linux into stable/for-linus-3.8 * 'arm-privcmd-for-3.8' of git://xenbits.xen.org/people/ianc/linux: xen: arm: implement remap interfaces needed for privcmd mappings. xen: correctly use xen_pfn_t in remap_domain_mfn_range. xen: arm: enable balloon driver xen: balloon: allow PVMMU interfaces to be compiled out xen: privcmd: support autotranslated physmap guests. xen: add pages parameter to xen_remap_domain_mfn_range Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-11-30xen/PVonHVM: fix compile warning in init_hvm_pv_infoOlaf Hering
After merging the xen-two tree, today's linux-next build (x86_64 allmodconfig) produced this warning: arch/x86/xen/enlighten.c: In function 'init_hvm_pv_info': arch/x86/xen/enlighten.c:1617:16: warning: unused variable 'ebx' [-Wunused-variable] arch/x86/xen/enlighten.c:1617:11: warning: unused variable 'eax' [-Wunused-variable] Introduced by commit 9d02b43dee0d ("xen PVonHVM: use E820_Reserved area for shared_info"). Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-11-30x86, fpu: Avoid FPU lazy restore after suspendVincent Palatin
When a cpu enters S3 state, the FPU state is lost. After resuming for S3, if we try to lazy restore the FPU for a process running on the same CPU, this will result in a corrupted FPU context. Ensure that "fpu_owner_task" is properly invalided when (re-)initializing a CPU, so nobody will try to lazy restore a state which doesn't exist in the hardware. Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off, by doing thousands of suspend/resume cycles with 4 processes doing FPU operations running. Without the patch, a process is killed after a few hundreds cycles by a SIGFPE. Cc: Duncan Laurie <dlaurie@chromium.org> Cc: Olof Johansson <olofj@chromium.org> Cc: <stable@kernel.org> v3.4+ # for 3.4 need to replace this_cpu_write by percpu_write Signed-off-by: Vincent Palatin <vpalatin@chromium.org> Link: http://lkml.kernel.org/r/1354306532-1014-1-git-send-email-vpalatin@chromium.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-30KVM: x86: Emulate IA32_TSC_ADJUST MSRWill Auld
CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control. However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations. The argument against storing it in the actual MSR is performance. This is likely to be seldom used while the save/restore is required on every transition. IA32_TSC_ADJUST was created as a way to solve some issues with writing TSC itself so that is not an option either. The remaining option, defined above as our solution has the problem of returning incorrect vmcs tsc_offset values (unless we intercept and fix, not done here) as mentioned above. However, more problematic is that storing the data in vmcs tsc_offset will have a different semantic effect on the system than does using the actual MSR. This is illustrated in the following example: The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest process performs a rdtsc. In this case the guest process will get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics as seen by the guest do not and hence this will not cause a problem. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-30KVM: x86: Add code to track call origin for msr assignmentWill Auld
In order to track who initiated the call (host or guest) to modify an msr value I have changed function call parameters along the call path. The specific change is to add a struct pointer parameter that points to (index, data, caller) information rather than having this information passed as individual parameters. The initial use for this capability is for updating the IA32_TSC_ADJUST msr while setting the tsc value. It is anticipated that this capability is useful for other tasks. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-30context_tracking: New context tracking susbsystemFrederic Weisbecker
Create a new subsystem that probes on kernel boundaries to keep track of the transitions between level contexts with two basic initial contexts: user or kernel. This is an abstraction of some RCU code that use such tracking to implement its userspace extended quiescent state. We need to pull this up from RCU into this new level of indirection because this tracking is also going to be used to implement an "on demand" generic virtual cputime accounting. A necessary step to shutdown the tick while still accounting the cputime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: fix whitespace error and email address. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-29KVM: VMX: fix memory order between loading vmcs and clearing vmcsXiao Guangrong
vmcs->cpu indicates whether it exists on the target cpu, -1 means the vmcs does not exist on any vcpu If vcpu load vmcs with vmcs.cpu = -1, it can be directly added to cpu's percpu list. The list can be corrupted if the cpu prefetch the vmcs's list before reading vmcs->cpu. Meanwhile, we should remove vmcs from the list before making vmcs->vcpu == -1 be visible Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>