aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2013-01-20ia64: kill thread_matches(), unexport ptrace_check_attach()Oleg Nesterov
The ia64 function "thread_matches()" has no users since commit e868a55c2a8c ("[IA64] remove find_thread_for_addr()"). Remove it. This allows us to make ptrace_check_attach() static to kernel/ptrace.c, which is good since we'll need to change the semantics of it and fix up all the callers. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-16Merge tag 'pm+acpi-for-3.8-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: - cpuidle regression fix related to the initialization of state kobjects from Krzysztof Mazur. - cpuidle fix removing some not very useful code and making some user-visible problems go away at the same time. From Daniel Lezcano. - ACPI build fix from Yinghai Lu. * tag 'pm+acpi-for-3.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpuidle: remove the power_specified field in the driver ACPI / glue: Fix build with ACPI_GLUE_DEBUG set cpuidle: fix number of initialized/destroyed states
2013-01-16lockdep, rwsem: fix down_write_nest_lock() if !CONFIG_DEBUG_LOCK_ALLOCJiri Kosina
Commit 1b963c81b145 ("lockdep, rwsem: provide down_write_nest_lock()") contains a bug in a codepath when CONFIG_DEBUG_LOCK_ALLOC is disabled, which causes down_read() to be called instead of down_write() by mistake on such configurations. Fix that. Reported-and-tested-by: Andrew Clayton <andrew@digital-domain.net> Reported-and-tested-by: Zlatko Calusic <zlatko.calusic@iskon.hr> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-16module, async: async_synchronize_full() on module init iff async is usedTejun Heo
If the default iosched is built as module, the kernel may deadlock while trying to load the iosched module on device probe if the probing was running off async. This is because async_synchronize_full() at the end of module init ends up waiting for the async job which initiated the module loading. async A modprobe 1. finds a device 2. registers the block device 3. request_module(default iosched) 4. modprobe in userland 5. load and init module 6. async_synchronize_full() Async A waits for modprobe to finish in request_module() and modprobe waits for async A to finish in async_synchronize_full(). Because there's no easy to track dependency once control goes out to userland, implementing properly nested flushing is difficult. For now, make module init perform async_synchronize_full() iff module init has queued async jobs as suggested by Linus. This avoids the described deadlock because iosched module doesn't use async and thus wouldn't invoke async_synchronize_full(). This is hacky and incomplete. It will deadlock if async module loading nests; however, this works around the known problem case and seems to be the best of bad options. For more details, please refer to the following thread. http://thread.gmane.org/gmane.linux.kernel/1420814 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Alex Riesen <raa.lkml@gmail.com> Tested-by: Ming Lei <ming.lei@canonical.com> Tested-by: Alex Riesen <raa.lkml@gmail.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-15cpuidle: remove the power_specified field in the driverDaniel Lezcano
We realized that the power usage field is never filled and when it is filled for tegra, the power_specified flag is not set causing all of these values to be reset when the driver is initialized with set_power_state(). However, the power_specified flag can be simply removed under the assumption that the states are always backward sorted, which is the case with the current code. This change allows the menu governor select function and the cpuidle_play_dead() to be simplified. Moreover, the set_power_states() function can removed as it does not make sense any more. Drop the power_specified flag from struct cpuidle_driver and make the related changes as described above. As a consequence, this also fixes the bug where on the dynamic C-states system, the power fields are not initialized. [rjw: Changelog] References: https://bugzilla.kernel.org/show_bug.cgi?id=42870 References: https://bugzilla.kernel.org/show_bug.cgi?id=43349 References: https://lkml.org/lkml/2012/10/16/518 Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-14Merge tag 'driver-core-3.8-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg Kroah-Hartman: "Here are two patches for 3.8-rc3. One removes the __dev* defines from init.h now that all usages of it are gone from your tree. The other fix is for debugfs's paramater that was using the wrong base for the option. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'driver-core-3.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: debugfs: convert gid= argument from decimal, not octal Remove __dev* markings from init.h
2013-01-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix regression allowing IP_TTL setting of zero, fix from Cong Wang. 2) Fix leak regressions in tunap, from Jason Wang. 3) be2net driver always returns IRQ_HANDLED in INTx handler, fix from Sathya Perla. 4) qlge doesn't really support NETIF_F_TSO6, don't set that flag. Fix from Amerigo Wang. 5) Add 802.11ad Atheros wil6210 driver, from Vladimir Kondratiev. 6) Fix MTU calculations in mac80211 layer, from T Krishna Chaitanya. 7) Station info layer of mac80211 needs to use del_timer_sync(), from Johannes Berg. 8) tcp_read_sock() can loop forever, because we don't immediately stop when recv_actor() returns zero. Fix from Eric Dumazet. 9) Fix WARN_ON() in tcp_cleanup_rbuf(). We have to use sk_eat_skb() in tcp_recv_skb() to handle the case where a large GRO packet is split up while it is use by a splice() operation. Fix also from Eric Dumazet. 10) addrconf_get_prefix_route() in ipv6 tests flags incorrectly, it does: if (X && (p->flags & Y) != 0) when it really meant to go: if (X && (p->flags & X) != 0) fix from Romain Kuntz. 11) Fix lost Kconfig dependency for bfin_mac driver hardware timestamping. From Lars-Peter Clausen. 12) Fix regression in handling of RST without ACK in TCP, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits) be2net: fix unconditionally returning IRQ_HANDLED in INTx tuntap: fix leaking reference count tuntap: forbid calling TUNSETIFF when detached tuntap: switch to use rtnl_dereference() net, wireless: overwrite default_ethtool_ops qlge: remove NETIF_F_TSO6 flag tcp: accept RST without ACK flag net: ethernet: xilinx: Do not use NO_IRQ in axienet net: ethernet: xilinx: Do not use axienet on PPC bnx2x: Allow management traffic after boot from SAN bnx2x: Fix fastpath structures when memory allocation fails bfin_mac: Restore hardware time-stamping dependency on BF518 tun: avoid owner checks on IFF_ATTACH_QUEUE bnx2x: move debugging code before the return tuntap: refuse to re-attach to different tun_struct ipv6: use addrconf_get_prefix_route for prefix route lookup [v2] ipv6: fix the noflags test in addrconf_get_prefix_route tcp: fix splice() and tcp collapsing interaction tcp: splice: fix an infinite loop in tcp_read_sock() net: prevent setting ttl=0 via IP_TTL ...
2013-01-11net, wireless: overwrite default_ethtool_opsStanislaw Gruszka
Since: commit 2c60db037034d27f8c636403355d52872da92f81 Author: Eric Dumazet <edumazet@google.com> Date: Sun Sep 16 09:17:26 2012 +0000 net: provide a default dev->ethtool_ops wireless core does not correctly assign ethtool_ops. After alloc_netdev*() call, some cfg80211 drivers provide they own ethtool_ops, but some do not. For them, wireless core provide generic cfg80211_ethtool_ops, which is assigned in NETDEV_REGISTER notify call: if (!dev->ethtool_ops) dev->ethtool_ops = &cfg80211_ethtool_ops; But after Eric's commit, dev->ethtool_ops is no longer NULL (on cfg80211 drivers without custom ethtool_ops), but points to &default_ethtool_ops. In order to fix the problem, provide function which will overwrite default_ethtool_ops and use it by wireless core. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11lib/rbtree.c: avoid the use of non-static __always_inlineMichel Lespinasse
lib/rbtree.c declared __rb_erase_color() as __always_inline void, and then exported it with EXPORT_SYMBOL. This was because __rb_erase_color() must be exported for augmented rbtree users, but it must also be inlined into rb_erase() so that the dummy callback can get optimized out of that call site. (Actually with a modern compiler, none of the dummy callback functions should even be generated as separate text functions). The above usage is legal C, but it was unusual enough for some compilers to warn about it. This change makes things more explicit, with a static __always_inline ____rb_erase_color function for use in rb_erase(), and a separate non-inline __rb_erase_color function for use in rb_erase_augmented call sites. Signed-off-by: Michel Lespinasse <walken@google.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11mm: compaction: partially revert capture of suitable high-order pageMel Gorman
Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when waiting for POLLIN on a local TCP socket. It was easier to trigger if there was disk IO and dirty pages at the same time and he bisected it to commit 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page immediately when it is made available"). The intention of that patch was to improve high-order allocations under memory pressure after changes made to reclaim in 3.6 drastically hurt THP allocations but the approach was flawed. For Eric, the problem was that page->pfmemalloc was not being cleared for captured pages leading to a poor interaction with swap-over-NFS support causing the packets to be dropped. However, I identified a few more problems with the patch including the fact that it can increase contention on zone->lock in some cases which could result in async direct compaction being aborted early. In retrospect the capture patch took the wrong approach. What it should have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it was allocating for THP and avoided races that way. While the patch was showing to improve allocation success rates at the time, the benefit is marginal given the relative complexity and it should be revisited from scratch in the context of the other reclaim-related changes that have taken place since the patch was first written and tested. This patch partially reverts commit 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page immediately when it is made available"). Reported-and-tested-by: Eric Wong <normalperson@yhbt.net> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11linux/audit.h: move ptrace.h include to kernel headerMike Frysinger
While the kernel internals want pt_regs (and so it includes linux/ptrace.h), the user version of audit.h does not need it. So move the include out of the uapi version. This avoids issues where people want the audit defines and userland ptrace api. Including both the kernel ptrace and the userland ptrace headers can easily lead to failure. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11audit: create explicit AUDIT_SECCOMP event typeKees Cook
The seccomp path was using AUDIT_ANOM_ABEND from when seccomp mode 1 could only kill a process. While we still want to make sure an audit record is forced on a kill, this should use a separate record type since seccomp mode 2 introduces other behaviors. In the case of "handled" behaviors (process wasn't killed), only emit a record if the process is under inspection. This change also fixes userspace examination of seccomp audit events, since it was considered malformed due to missing fields of the AUDIT_ANOM_ABEND event type. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Julien Tinnes <jln@google.com> Acked-by: Will Drewry <wad@chromium.org> Acked-by: Steve Grubb <sgrubb@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11lockdep, rwsem: provide down_write_nest_lock()Jiri Kosina
down_write_nest_lock() provides a means to annotate locking scenario where an outer lock is guaranteed to serialize the order nested locks are being acquired. This is analogoue to already existing mutex_lock_nest_lock() and spin_lock_nest_lock(). Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mel Gorman <mel@csn.ul.ie> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11lib: cpu_rmap: avoid flushing all workqueuesDavid Decotigny
In some cases, free_irq_cpu_rmap() is called while holding a lock (eg rtnl). This can lead to deadlocks, because it invokes flush_scheduled_work() which ends up waiting for whole system workqueue to flush, but some pending works might try to acquire the lock we are already holding. This commit uses reference-counting to replace irq_run_affinity_notifiers(). It also removes irq_run_affinity_notifiers() altogether. [akpm@linux-foundation.org: eliminate free_cpu_rmap, rename cpu_rmap_reclaim() to cpu_rmap_release(), propagate kref_put() retval from cpu_rmap_put()] Signed-off-by: David Decotigny <decot@googlers.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-10Remove __dev* markings from init.hGreg Kroah-Hartman
Now that all in-kernel users of __dev* are gone, let's remove them from init.h to keep them from popping up again and again. Thanks to Bill Pemberton for doing all of the hard work to make removal of this possible. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-09cred: Remove tgcred pointer from struct credMarc Dionne
Commit 3a50597de863 ("KEYS: Make the session and process keyrings per-thread") removed the definition of the thread_group_cred structure, but left a now unused pointer in struct cred. Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull namei.h missing include fix from Al Viro. The new use of ESTALE in namei.h can cause compile failures on ARM with certain configurations due to lack of errno.h. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: namei.h: include errno.h
2013-01-04mm: fix zone_watermark_ok_safe() accounting of isolated pagesBartlomiej Zolnierkiewicz
Commit 702d1a6e0766 ("memory-hotplug: fix kswapd looping forever problem") added an isolated pageblocks counter (nr_pageblock_isolate in struct zone) and used it to adjust free pages counter in zone_watermark_ok_safe() to prevent kswapd looping forever problem. Then later, commit 2139cbe627b8 ("cma: fix counting of isolated pages") fixed accounting of isolated pages in global free pages counter. It made the previous zone_watermark_ok_safe() fix unnecessary and potentially harmful (cause now isolated pages may be accounted twice making free pages counter incorrect). This patch removes the special isolated pageblocks counter altogether which fixes zone_watermark_ok_safe() free pages check. Reported-by: Tomasz Stanislawski <t.stanislaws@samsung.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Minchan Kim <minchan@kernel.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Aaditya Kumar <aaditya.kumar.30@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04selftests: IPC message queue copy feature testStanislav Kinsbursky
This test can be used to check wheither kernel supports IPC message queue copy and restore features (required by CRIU project). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04ipc: message queue receive cleanupStanislav Kinsbursky
Move all message related manipulation into one function msg_fill(). Actually, two functions because of the compat one. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04ipc: add sysctl to specify desired next object idStanislav Kinsbursky
Add 3 new variables and sysctls to tune them (by one "next_id" variable for messages, semaphores and shared memory respectively). This variable can be used to set desired id for next allocated IPC object. By default it's equal to -1 and old behaviour is preserved. If this variable is non-negative, then desired idr will be extracted from it and used as a start value to search for free IDR slot. Notes: 1) this patch doesn't guarantee that the new object will have desired id. So it's up to user space how to handle new object with wrong id. 2) After a sucessful id allocation attempt, "next_id" will be set back to -1 (if it was non-negative). [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-03include: remove __dev* attributes.Greg Kroah-Hartman
CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit from some include files that were previously missed. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-03Drivers: bcma: remove __dev* attributes.Greg Kroah-Hartman
CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: "Rafał Miłecki" <zajec5@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-03pstore: remove __dev* attributes.Greg Kroah-Hartman
CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit from the pstore filesystem. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Anton Vorontsov <cbouatmailru@gmail.com> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-02Merge tag '3.8-pci-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Some fixes for v3.8. They include a fix for the new SR-IOV sysfs management support, an expanded quirk for Ricoh SD card readers, a Stratus DMI quirk fix, and a PME polling fix." * tag '3.8-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHz PCI/PM: Do not suspend port if any subordinate device needs PME polling PCI: Add PCIe Link Capability link speed and width names PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check) PCI: Remove spurious error for sriov_numvfs store and simplify flow
2013-01-02UAPI: Remove empty Kbuild filesDavid Howells
Empty files can get deleted by the patch program, so remove empty Kbuild files and their links from the parent Kbuilds. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-02mm: mempolicy: Convert shared_policy mutex to spinlockMel Gorman
Sasha was fuzzing with trinity and reported the following problem: BUG: sleeping function called from invalid context at kernel/mutex.c:269 in_atomic(): 1, irqs_disabled(): 0, pid: 6361, name: trinity-main 2 locks held by trinity-main/6361: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff810aa314>] __do_page_fault+0x1e4/0x4f0 #1: (&(&mm->page_table_lock)->rlock){+.+...}, at: [<ffffffff8122f017>] handle_pte_fault+0x3f7/0x6a0 Pid: 6361, comm: trinity-main Tainted: G W 3.7.0-rc2-next-20121024-sasha-00001-gd95ef01-dirty #74 Call Trace: __might_sleep+0x1c3/0x1e0 mutex_lock_nested+0x29/0x50 mpol_shared_policy_lookup+0x2e/0x90 shmem_get_policy+0x2e/0x30 get_vma_policy+0x5a/0xa0 mpol_misplaced+0x41/0x1d0 handle_pte_fault+0x465/0x6a0 This was triggered by a different version of automatic NUMA balancing but in theory the current version is vunerable to the same problem. do_numa_page -> numa_migrate_prep -> mpol_misplaced -> get_vma_policy -> shmem_get_policy It's very unlikely this will happen as shared pages are not marked pte_numa -- see the page_mapcount() check in change_pte_range() -- but it is possible. To address this, this patch restores sp->lock as originally implemented by Kosaki Motohiro. In the path where get_vma_policy() is called, it should not be calling sp_alloc() so it is not necessary to treat the PTL specially. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-02Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: "Various bug fixes for ext4. Perhaps the most serious bug fixed is one which could cause file system corruptions when performing file punch operations." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid hang when mounting non-journal filesystems with orphan list ext4: lock i_mutex when truncating orphan inodes ext4: do not try to write superblock on ro remount w/o journal ext4: include journal blocks in df overhead calcs ext4: remove unaligned AIO warning printk ext4: fix an incorrect comment about i_mutex ext4: fix deadlock in journal_unmap_buffer() ext4: split off ext4_journalled_invalidatepage() jbd2: fix assertion failure in jbd2_journal_flush() ext4: check dioread_nolock on remount ext4: fix extent tree corruption caused by hole punch
2013-01-02mempolicy: remove arg from mpol_parse_str, mpol_to_strHugh Dickins
Remove the unused argument (formerly no_context) from mpol_parse_str() and from mpol_to_str(). Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fixes from Eric Biederman: "This tree includes two bug fixes for problems Oleg spotted on his review of the recent pid namespace work. A small fix to not enable bottom halves with irqs disabled, and a trivial build fix for f2fs with user namespaces enabled." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: f2fs: Don't assign e_id in f2fs_acl_from_disk proc: Allow proc_free_inum to be called from any context pidns: Stop pid allocation when init dies pidns: Outlaw thread creation after unshare(CLONE_NEWPID)
2012-12-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) GRE tunnel drivers don't set the transport header properly, they also blindly deref the inner protocol ipv4 and needs some checks. Fixes from Isaku Yamahata. 2) Fix sleeps while atomic in netdevice rename code, from Eric Dumazet. 3) Fix double-spinlock in solos-pci driver, from Dan Carpenter. 4) More ARP bug fixes. Fix lockdep splat in arp_solicit() and then the bug accidentally added by that fix. From Eric Dumazet and Cong Wang. 5) Remove some __dev* annotations that slipped back in, as well as all HOTPLUG references. From Greg KH 6) RDS protocol uses wrong interfaces to access scatter-gather elements, causing a regression. From Mike Marciniszyn. 7) Fix build error in cpts driver, from Richard Cochran. 8) Fix arithmetic in packet scheduler, from Stefan Hasko. 9) Similarly, fix association during calculation of random backoff in batman-adv. From Akinobu Mita. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits) ipv6/ip6_gre: set transport header correctly ipv4/ip_gre: set transport header correctly to gre header IB/rds: suppress incompatible protocol when version is known IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len net/vxlan: Use the underlying device index when joining/leaving multicast groups tcp: should drop incoming frames without ACK flag set netprio_cgroup: define sk_cgrp_prioidx only if NETPRIO_CGROUP is enabled cpts: fix a run time warn_on. cpts: fix build error by removing useless code. batman-adv: fix random jitter calculation arp: fix a regression in arp_solicit() net: sched: integer overflow fix CONFIG_HOTPLUG removal from networking core Drivers: network: more __dev* removal bridge: call br_netpoll_disable in br_add_if ipv4: arp: fix a lockdep splat in arp_solicit() tuntap: dont use a private kmem_cache net: devnet_rename_seq should be a seqcount ip_gre: fix possible use after free ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally ...
2012-12-26mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDEDChristoffer Dall
Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and (PageHead) is true, for tail pages. If this is indeed the intended behavior, which I doubt because it breaks cache cleaning on some ARM systems, then the nomenclature is highly problematic. This patch makes sure PageHead is only true for head pages and PageTail is only true for tail pages, and neither is true for non-compound pages. [ This buglet seems ancient - seems to have been introduced back in Apr 2008 in commit 6a1e7f777f61: "pageflags: convert to the use of new macros". And the reason nobody noticed is because the PageHead() tests are almost all about just sanity-checking, and only used on pages that are actual page heads. The fact that the old code returned true for tail pages too was thus not really noticeable. - Linus ] Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Will Deacon <Will.Deacon@arm.com> Cc: Steve Capper <Steve.Capper@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: stable@kernel.org # 2.6.26+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-26PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHzAndy Lutomirski
Otherwise it fails like this on cards like the Transcend 16GB SDHC card: mmc0: new SDHC card at address b368 mmcblk0: mmc0:b368 SDC 15.0 GiB mmcblk0: error -110 sending status command, retrying mmcblk0: error -84 transferring data, sector 0, nr 8, cmd response 0x900, card status 0xb0 Tested on my Lenovo x200 laptop. [bhelgaas: changelog] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Chris Ball <cjb@laptop.org> CC: Manoj Iyer <manoj.iyer@canonical.com> CC: stable@vger.kernel.org
2012-12-25pidns: Stop pid allocation when init diesEric W. Biederman
Oleg pointed out that in a pid namespace the sequence. - pid 1 becomes a zombie - setns(thepidns), fork,... - reaping pid 1. - The injected processes exiting. Can lead to processes attempting access their child reaper and instead following a stale pointer. That waitpid for init can return before all of the processes in the pid namespace have exited is also unfortunate. Avoid these problems by disabling the allocation of new pids in a pid namespace when init dies, instead of when the last process in a pid namespace is reaped. Pointed-out-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-25namei.h: include errno.hStephen Warren
This solves: In file included from fs/ext3/symlink.c:20:0: include/linux/namei.h: In function 'retry_estale': include/linux/namei.h:114:19: error: 'ESTALE' undeclared (first use in this function) Signed-off-by: Stephen Warren <swarren@wwwdotorg.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-25ext4: fix deadlock in journal_unmap_buffer()Jan Kara
We cannot wait for transaction commit in journal_unmap_buffer() because we hold page lock which ranks below transaction start. We solve the issue by bailing out of journal_unmap_buffer() and jbd2_journal_invalidatepage() with -EBUSY. Caller is then responsible for waiting for transaction commit to finish and try invalidation again. Since the issue can happen only for page stradding i_size, it is simple enough to manually call jbd2_journal_invalidatepage() for such page from ext4_setattr(), check the return value and wait if necessary. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-21Merge git://www.linux-watchdog.org/linux-watchdogLinus Torvalds
Pull watchdog updates from Wim Van Sebroeck: "This includes some fixes and code improvements (like clk_prepare_enable and clk_disable_unprepare), conversion from the omap_wdt and twl4030_wdt drivers to the watchdog framework, addition of the SB8x0 chipset support and the DA9055 Watchdog driver and some OF support for the davinci_wdt driver." * git://www.linux-watchdog.org/linux-watchdog: (22 commits) watchdog: mei: avoid oops in watchdog unregister code path watchdog: Orion: Fix possible null-deference in orion_wdt_probe watchdog: sp5100_tco: Add SB8x0 chipset support watchdog: davinci_wdt: add OF support watchdog: da9052: Fix invalid free of devm_ allocated data watchdog: twl4030_wdt: Change TWL4030_MODULE_PM_RECEIVER to TWL_MODULE_PM_RECEIVER watchdog: remove depends on CONFIG_EXPERIMENTAL watchdog: Convert dev_printk(KERN_<LEVEL> to dev_<level>( watchdog: DA9055 Watchdog driver watchdog: omap_wdt: eliminate goto watchdog: omap_wdt: delete redundant platform_set_drvdata() calls watchdog: omap_wdt: convert to devm_ functions watchdog: omap_wdt: convert to new watchdog core watchdog: WatchDog Timer Driver Core: fix comment watchdog: s3c2410_wdt: use clk_prepare_enable and clk_disable_unprepare watchdog: imx2_wdt: Select the driver via ARCH_MXC watchdog: cpu5wdt.c: add missing del_timer call watchdog: hpwdt.c: Increase version string watchdog: Convert twl4030_wdt to watchdog core davinci_wdt: preparation for switch to common clock framework ...
2012-12-21Merge tag 'dm-3.8-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull dm update from Alasdair G Kergon: "Miscellaneous device-mapper fixes, cleanups and performance improvements. Of particular note: - Disable broken WRITE SAME support in all targets except linear and striped. Use it when kcopyd is zeroing blocks. - Remove several mempools from targets by moving the data into the bio's new front_pad area(which dm calls 'per_bio_data'). - Fix a race in thin provisioning if discards are misused. - Prevent userspace from interfering with the ioctl parameters and use kmalloc for the data buffer if it's small instead of vmalloc. - Throttle some annoying error messages when I/O fails." * tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (36 commits) dm stripe: add WRITE SAME support dm: remove map_info dm snapshot: do not use map_context dm thin: dont use map_context dm raid1: dont use map_context dm flakey: dont use map_context dm raid1: rename read_record to bio_record dm: move target request nr to dm_target_io dm snapshot: use per_bio_data dm verity: use per_bio_data dm raid1: use per_bio_data dm: introduce per_bio_data dm kcopyd: add WRITE SAME support to dm_kcopyd_zero dm linear: add WRITE SAME support dm: add WRITE SAME support dm: prepare to support WRITE SAME dm ioctl: use kmalloc if possible dm ioctl: remove PF_MEMALLOC dm persistent data: improve improve space map block alloc failure message dm thin: use DMERR_LIMIT for errors ...
2012-12-21Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull more infiniband changes from Roland Dreier: "Second batch of InfiniBand/RDMA changes for 3.8: - cxgb4 changes to fix lookup engine hash collisions - mlx4 changes to make flow steering usable - fix to IPoIB to avoid pinning dst reference for too long" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/cxgb4: Fix bug for active and passive LE hash collision path RDMA/cxgb4: Fix LE hash collision bug for passive open connection RDMA/cxgb4: Fix LE hash collision bug for active open connection mlx4_core: Allow choosing flow steering mode mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV mlx4_core: Fix error flow in the flow steering wrapper mlx4_core: Add QPN enforcement for flow steering rules set by VFs cxgb4: Add LE hash collision bug fix path in LLD driver cxgb4: Add T4 filter support IPoIB: Call skb_dst_drop() once skb is enqueued for sending
2012-12-21net: devnet_rename_seq should be a seqcountEric Dumazet
Using a seqlock for devnet_rename_seq is not a good idea, as device_rename() can sleep. As we hold RTNL, we dont need a protection for writers, and only need a seqcount so that readers can catch a change done by a writer. Bug added in commit c91f6df2db4972d3 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name) Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-21dm: remove map_infoMikulas Patocka
This patch removes map_info from bio-based device mapper targets. map_info is still used for request-based targets. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: move target request nr to dm_target_ioMikulas Patocka
This patch moves target_request_nr from map_info to dm_target_io and makes it accessible with dm_bio_get_target_request_nr. This patch is a preparation for the next patch that removes map_info. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: introduce per_bio_dataMikulas Patocka
Introduce a field per_bio_data_size in struct dm_target. Targets can set this field in the constructor. If a target sets this field to a non-zero value, "per_bio_data_size" bytes of auxiliary data are allocated for each bio submitted to the target. These data can be used for any purpose by the target and help us improve performance by removing some per-target mempools. Per-bio data is accessed with dm_per_bio_data. The argument data_size must be the same as the value per_bio_data_size in dm_target. If the target has a pointer to per_bio_data, it can get a pointer to the bio with dm_bio_from_per_bio_data() function (data_size must be the same as the value passed to dm_per_bio_data). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: prepare to support WRITE SAMEMike Snitzer
Allow targets to opt in to WRITE SAME support by setting 'num_write_same_requests' in the dm_target structure. A dm device will only advertise WRITE SAME support if all its targets and all its underlying devices support it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-20Merge branch 'for-next' of git://git.infradead.org/users/eparis/notifyLinus Torvalds
Pull filesystem notification updates from Eric Paris: "This pull mostly is about locking changes in the fsnotify system. By switching the group lock from a spin_lock() to a mutex() we can now hold the lock across things like iput(). This fixes a problem involving unmounting a fs and having inodes be busy, first pointed out by FAT, but reproducible with tmpfs. This also restores signal driven I/O for inotify, which has been broken since about 2.6.32." Ugh. I *hate* the timing of this. It was rebased after the merge window opened, and then left to sit with the pull request coming the day before the merge window closes. That's just crap. But apparently the patches themselves have been around for over a year, just gathering dust, so now it's suddenly critical. Fixed up semantic conflict in fs/notify/fdinfo.c as per Stephen Rothwell's fixes from -next. * 'for-next' of git://git.infradead.org/users/eparis/notify: inotify: automatically restart syscalls inotify: dont skip removal of watch descriptor if creation of ignored event failed fanotify: dont merge permission events fsnotify: make fasync generic for both inotify and fanotify fsnotify: change locking order fsnotify: dont put marks on temporary list when clearing marks by group fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark() fsnotify: pass group to fsnotify_destroy_mark() fsnotify: use a mutex instead of a spinlock to protect a groups mark list fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed fsnotify: take groups mark_lock before mark lock fsnotify: use reference counting for groups fsnotify: introduce fsnotify_get_group() inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()
2012-12-20Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge the rest of Andrew's patches for -rc1: "A bunch of fixes and misc missed-out-on things. That'll do for -rc1. I still have a batch of IPC patches which still have a possible bug report which I'm chasing down." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (25 commits) keys: use keyring_alloc() to create module signing keyring keys: fix unreachable code sendfile: allows bypassing of notifier events SGI-XP: handle non-fatal traps fat: fix incorrect function comment Documentation: ABI: remove testing/sysfs-devices-node proc: fix inconsistent lock state linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors memcg: don't register hotcpu notifier from ->css_alloc() checkpatch: warn on uapi #includes that #include <uapi/... revert "rtc: recycle id when unloading a rtc driver" mm: clean up transparent hugepage sysfs error messages hfsplus: add error message for the case of failure of sync fs in delayed_sync_fs() method hfsplus: rework processing of hfs_btree_write() returned error hfsplus: rework processing errors in hfsplus_free_extents() hfsplus: avoid crash on failed block map free kcmp: include linux/ptrace.h drivers/rtc/rtc-imxdi.c: must include <linux/spinlock.h> mm: cma: WARN if freed memory is still in use exec: do not leave bprm->interp on stack ...
2012-12-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS update from Al Viro: "fscache fixes, ESTALE patchset, vmtruncate removal series, assorted misc stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (79 commits) vfs: make lremovexattr retry once on ESTALE error vfs: make removexattr retry once on ESTALE vfs: make llistxattr retry once on ESTALE error vfs: make listxattr retry once on ESTALE error vfs: make lgetxattr retry once on ESTALE vfs: make getxattr retry once on an ESTALE error vfs: allow lsetxattr() to retry once on ESTALE errors vfs: allow setxattr to retry once on ESTALE errors vfs: allow utimensat() calls to retry once on an ESTALE error vfs: fix user_statfs to retry once on ESTALE errors vfs: make fchownat retry once on ESTALE errors vfs: make fchmodat retry once on ESTALE errors vfs: have chroot retry once on ESTALE error vfs: have chdir retry lookup and call once on ESTALE error vfs: have faccessat retry once on an ESTALE error vfs: have do_sys_truncate retry once on an ESTALE error vfs: fix renameat to retry on ESTALE errors vfs: make do_unlinkat retry once on ESTALE errors vfs: make do_rmdir retry once on ESTALE errors vfs: add a flags argument to user_path_parent ...
2012-12-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "sigaltstack infrastructure + conversion for x86, alpha and um, COMPAT_SYSCALL_DEFINE infrastructure. Note that there are several conflicts between "unify SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline; resolution is trivial - just remove definitions of SS_ONSTACK and SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and include/uapi/linux/signal.h contains the unified variant." Fixed up conflicts as per Al. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: alpha: switch to generic sigaltstack new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those generic compat_sys_sigaltstack() introduce generic sys_sigaltstack(), switch x86 and um to it new helper: compat_user_stack_pointer() new helper: restore_altstack() unify SS_ONSTACK/SS_DISABLE definitions new helper: current_user_stack_pointer() missing user_stack_pointer() instances Bury the conditionals from kernel_thread/kernel_execve series COMPAT_SYSCALL_DEFINE: infrastructure
2012-12-20linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisorsGuenter Roeck
Commit 263a523d18bc ("linux/kernel.h: Fix warning seen with W=1 due to change in DIV_ROUND_CLOSEST") fixes a warning seen with W=1 due to change in DIV_ROUND_CLOSEST. Unfortunately, the C compiler converts divide operations with unsigned divisors to unsigned, even if the dividend is signed and negative (for example, -10 / 5U = 858993457). The C standard says "If one operand has unsigned int type, the other operand is converted to unsigned int", so the compiler is not to blame. As a result, DIV_ROUND_CLOSEST(0, 2U) and similar operations now return bad values, since the automatic conversion of expressions such as "0 - 2U/2" to unsigned was not taken into account. Fix by checking for the divisor variable type when deciding which operation to perform. This fixes DIV_ROUND_CLOSEST(0, 2U), but still returns bad values for negative dividends divided by unsigned divisors. Mark the latter case as unsupported. One observed effect of this problem is that the s2c_hwmon driver reports a value of 4198403 instead of 0 if the ADC reads 0. Other impact is unpredictable. Problem is seen if the divisor is an unsigned variable or constant and the dividend is less than (divisor/2). Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Juergen Beisert <jbe@pengutronix.de> Tested-by: Juergen Beisert <jbe@pengutronix.de> Cc: Jean Delvare <khali@linux-fr.org> Cc: <stable@vger.kernel.org> [3.7.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-20exec: do not leave bprm->interp on stackKees Cook
If a series of scripts are executed, each triggering module loading via unprintable bytes in the script header, kernel stack contents can leak into the command line. Normally execution of binfmt_script and binfmt_misc happens recursively. However, when modules are enabled, and unprintable bytes exist in the bprm->buf, execution will restart after attempting to load matching binfmt modules. Unfortunately, the logic in binfmt_script and binfmt_misc does not expect to get restarted. They leave bprm->interp pointing to their local stack. This means on restart bprm->interp is left pointing into unused stack memory which can then be copied into the userspace argv areas. After additional study, it seems that both recursion and restart remains the desirable way to handle exec with scripts, misc, and modules. As such, we need to protect the changes to interp. This changes the logic to require allocation for any changes to the bprm->interp. To avoid adding a new kmalloc to every exec, the default value is left as-is. Only when passing through binfmt_script or binfmt_misc does an allocation take place. For a proof of concept, see DoTest.sh from: http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/ Signed-off-by: Kees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>