aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2010-05-07Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: create rcu_my_thread_group_empty() wrapper memcg: css_id() must be called under rcu_read_lock() cgroup: Check task_lock in task_subsys_state() sched: Fix an RCU warning in print_task() cgroup: Fix an RCU warning in alloc_css_id() cgroup: Fix an RCU warning in cgroup_path() KEYS: Fix an RCU warning in the reading of user keys KEYS: Fix an RCU warning
2010-05-06rcu: create rcu_my_thread_group_empty() wrapperPaul E. McKenney
Some RCU-lockdep splat repairs need to know whether they are running in a single-threaded process. Unfortunately, the thread_group_empty() primitive is defined in sched.h, and can induce #include hell. This commit therefore introduces a rcu_my_thread_group_empty() wrapper that is defined in rcupdate.c, thus avoiding the need to include sched.h everywhere. Signed-off-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
2010-05-05Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: flush_delayed_work: keep the original workqueue for re-queueing
2010-05-04Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix resource leak in failure path of perf_event_open()
2010-05-04Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: Fix RCU lockdep splat on freezer_fork path rcu: Fix RCU lockdep splat in set_task_cpu on fork path mutex: Don't spin when the owner CPU is offline or other weird cases
2010-05-04sched: Fix an RCU warning in print_task()Li Zefan
With CONFIG_PROVE_RCU=y, a warning can be triggered: $ cat /proc/sched_debug ... kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection! ... Both cgroup_path() and task_group() should be called with either rcu_read_lock or cgroup_mutex held. The rcu_dereference_check() does include cgroup_lock_is_held(), so we know that this lock is not held. Therefore, in a CONFIG_PREEMPT kernel, to say nothing of a CONFIG_PREEMPT_RT kernel, the original code could have ended up copying a string out of the freelist. This patch inserts RCU read-side primitives needed to prevent this scenario. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-04cgroup: Fix an RCU warning in alloc_css_id()Li Zefan
With CONFIG_PROVE_RCU=y, a warning can be triggered: # mount -t cgroup -o memory xxx /mnt # mkdir /mnt/0 ... kernel/cgroup.c:4442 invoked rcu_dereference_check() without protection! ... This is a false-positive. It's safe to directly access parent_css->id. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-04cgroup: Fix an RCU warning in cgroup_path()Li Zefan
with CONFIG_PROVE_RCU=y, a warning can be triggered: # mount -t cgroup -o debug xxx /mnt # cat /proc/$$/cgroup ... kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection! ... This is a false-positive, because cgroup_path() can be called with either rcu_read_lock() held or cgroup_mutex held. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-01perf: Fix resource leak in failure path of perf_event_open()Tejun Heo
perf_event_open() kfrees event after init failure which doesn't release all resources allocated by perf_event_alloc(). Use free_event() instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@au1.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: <stable@kernel.org> LKML-Reference: <4BDBE237.1040809@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-30rcu: Fix RCU lockdep splat on freezer_fork pathPaul E. McKenney
Add an RCU read-side critical section to suppress this false positive. Located-by: Eric Paris <eparis@parisplace.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: eric.dumazet@gmail.com LKML-Reference: <1271880131-3951-2-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-30rcu: Fix RCU lockdep splat in set_task_cpu on fork pathPeter Zijlstra
Add an RCU read-side critical section to suppress this false positive. Located-by: Eric Paris <eparis@parisplace.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: eric.dumazet@gmail.com LKML-Reference: <1271880131-3951-1-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-30workqueue: flush_delayed_work: keep the original workqueue for re-queueingOleg Nesterov
flush_delayed_work() always uses keventd_wq for re-queueing, but it should use the workqueue this dwork was queued on. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-04-24kernel/sys.c: fix compat uname machineAndreas Schwab
On ppc64 you get this error: $ setarch ppc -R true setarch: ppc: Unrecognized architecture because uname still reports ppc64 as the machine. So mask off the personality flags when checking for PER_LINUX32. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-23mutex: Don't spin when the owner CPU is offline or other weird casesBenjamin Herrenschmidt
Due to recent load-balancer changes that delay the task migration to the next wakeup, the adaptive mutex spinning ends up in a live lock when the owner's CPU gets offlined because the cpu_online() check lives before the owner running check. This patch changes mutex_spin_on_owner() to return 0 (don't spin) in any case where we aren't sure about the owner struct validity or CPU number, and if the said CPU is offline. There is no point going back & re-evaluate spinning in corner cases like that, let's just go to sleep. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271212509.13059.135.camel@pasglop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-22CRED: Fix a race in creds_are_invalid() in credentials debuggingDavid Howells
creds_are_invalid() reads both cred->usage and cred->subscribers and then compares them to make sure the number of processes subscribed to a cred struct never exceeds the refcount of that cred struct. The problem is that this can cause a race with both copy_creds() and exit_creds() as the two counters, whilst they are of atomic_t type, are only atomic with respect to themselves, and not atomic with respect to each other. This means that if creds_are_invalid() can read the values on one CPU whilst they're being modified on another CPU, and so can observe an evolving state in which the subscribers count now is greater than the usage count a moment before. Switching the order in which the counts are read cannot help, so the thing to do is to remove that particular check. I had considered rechecking the values to see if they're in flux if the test fails, but I can't guarantee they won't appear the same, even if they've changed several times in the meantime. Note that this can only happen if CONFIG_DEBUG_CREDENTIALS is enabled. The problem is only likely to occur with multithreaded programs, and can be tested by the tst-eintr1 program from glibc's "make check". The symptoms look like: CRED: Invalid credentials CRED: At include/linux/cred.h:240 CRED: Specified credentials: ffff88003dda5878 [real][eff] CRED: ->magic=43736564, put_addr=(null) CRED: ->usage=766, subscr=766 CRED: ->*uid = { 0,0,0,0 } CRED: ->*gid = { 0,0,0,0 } CRED: ->security is ffff88003d72f538 CRED: ->security {359, 359} ------------[ cut here ]------------ kernel BUG at kernel/cred.c:850! ... RIP: 0010:[<ffffffff81049889>] [<ffffffff81049889>] __invalid_creds+0x4e/0x52 ... Call Trace: [<ffffffff8104a37b>] copy_creds+0x6b/0x23f Note the ->usage=766 and subscr=766. The values appear the same because they've been re-read since the check was made. Reported-by: Roland McGrath <roland@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-04-21CRED: Fix double free in prepare_usermodehelper_creds() error handlingDavid Howells
Patch 570b8fb505896e007fd3bb07573ba6640e51851d: Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Tue Mar 30 00:04:00 2010 +0100 Subject: CRED: Fix memory leak in error handling attempts to fix a memory leak in the error handling by making the offending return statement into a jump down to the bottom of the function where a kfree(tgcred) is inserted. This is, however, incorrect, as it does a kfree() after doing put_cred() if security_prepare_creds() fails. That will result in a double free if 'error' is jumped to as put_cred() will also attempt to free the new tgcred record by virtue of it being pointed to by the new cred record. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-04-19rcu: Make RCU lockdep check the lockdep_recursion variablePaul E. McKenney
The lockdep facility temporarily disables lockdep checking by incrementing the current->lockdep_recursion variable. Such disabling happens in NMIs and in other situations where lockdep might expect to recurse on itself. This patch therefore checks current->lockdep_recursion, disabling RCU lockdep splats when this variable is non-zero. In addition, this patch removes the "likely()", as suggested by Lai Jiangshan. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: David Miller <davem@davemloft.net> Tested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: eric.dumazet@gmail.com LKML-Reference: <20100415195039.GA22623@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-10PM / Hibernate: user.c, fix SNAPSHOT_SET_SWAP_AREA handlingJiri Slaby
When CONFIG_DEBUG_BLOCK_EXT_DEVT is set we decode the device improperly by old_decode_dev and it results in an error while hibernating with s2disk. All users already pass the new device number, so switch to new_decode_dev(). Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
2010-04-08Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix sched_getaffinity()
2010-04-07mm: avoid null-pointer deref in sync_mm_rss()KAMEZAWA Hiroyuki
- We weren't zeroing p->rss_stat[] at fork() - Consequently sync_mm_rss() was dereferencing tsk->mm for kernel threads and was oopsing. - Make __sync_task_rss_stat() static, too. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=15648 [akpm@linux-foundation.org: remove the BUG_ON(!mm->rss)] Reported-by: Troels Liebe Bentsen <tlb@rapanden.dk> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> "Michael S. Tsirkin" <mst@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-06Merge branch 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Force MSI irq handlers to run with interrupts disabled
2010-04-06sched: Fix sched_getaffinity()Anton Blanchard
taskset on 2.6.34-rc3 fails on one of my ppc64 test boxes with the following error: sched_getaffinity(0, 16, 0x10029650030) = -1 EINVAL (Invalid argument) This box has 128 threads and 16 bytes is enough to cover it. Commit cd3d8031eb4311e516329aee03c79a08333141f1 (sched: sched_getaffinity(): Allow less than NR_CPUS length) is comparing this 16 bytes agains nr_cpu_ids. Fix it by comparing nr_cpu_ids to the number of bits in the cpumask we pass in. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Sharyathi Nagesh <sharyath@in.ibm.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Mike Travis <travis@sgi.com> LKML-Reference: <20100406070218.GM5594@kryten> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-05Fix up possibly racy module refcountingNick Piggin
Module refcounting is implemented with a per-cpu counter for speed. However there is a race when tallying the counter where a reference may be taken by one CPU and released by another. Reference count summation may then see the decrement without having seen the previous increment, leading to lower than expected count. A module which never has its actual reference drop below 1 may return a reference count of 0 due to this race. Module removal generally runs under stop_machine, which prevents this race causing bugs due to removal of in-use modules. However there are other real bugs in module.c code and driver code (module_refcount is exported) where the callers do not run under stop_machine. Fix this by maintaining running per-cpu counters for the number of module refcount increments and the number of refcount decrements. The increments are tallied after the decrements, so any decrement seen will always have its corresponding increment counted. The final refcount is the difference of the total increments and decrements, preventing a low-refcount from being returned. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-05audit: preface audit printk with auditEric Paris
There have been a number of reports of people seeing the message: "name_count maxed, losing inode data: dev=00:05, inode=3185" in dmesg. These usually lead to people reporting problems to the filesystem group who are in turn clueless what they mean. Eventually someone finds me and I explain what is going on and that these come from the audit system. The basics of the problem is that the audit subsystem never expects a single syscall to 'interact' (for some wish washy meaning of interact) with more than 20 inodes. But in fact some operations like loading kernel modules can cause changes to lots of inodes in debugfs. There are a couple real fixes being bandied about including removing the fixed compile time limit of 20 or not auditing changes in debugfs (or both) but neither are small and obvious so I am not sending them for immediate inclusion (I hope Al forwards a real solution next devel window). In the meantime this patch simply adds 'audit' to the beginning of the crap message so if a user sees it, they come blame me first and we can talk about what it means and make sure we understand all of the reasons it can happen and make sure this gets solved correctly in the long run. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-05Merge branch 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/miscLinus Torvalds
* 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc: eeepc-wmi: include slab.h staging/otus: include slab.h from usbdrv.h percpu: don't implicitly include slab.h from percpu.h kmemcheck: Fix build errors due to missing slab.h include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h iwlwifi: don't include iwl-dev.h from iwl-devtrace.h x86: don't include slab.h from arch/x86/include/asm/pgtable_32.h Fix up trivial conflicts in include/linux/percpu.h due to is_kernel_percpu_address() having been introduced since the slab.h cleanup with the percpu_up.c splitup.
2010-04-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: module: add stub for is_module_percpu_address percpu, module: implement and use is_kernel/module_percpu_address() module: encapsulate percpu handling better and record percpu_size
2010-04-05Merge branch 'master' into export-slabhTejun Heo
2010-04-04Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Always build the powerpc perf_arch_fetch_caller_regs version perf: Always build the stub perf_arch_fetch_caller_regs version perf, probe-finder: Build fix on Debian perf/scripts: Tuple was set from long in both branches in python_process_event() perf: Fix 'perf sched record' deadlock perf, x86: Fix callgraphs of 32-bit processes on 64-bit kernels perf, x86: Fix AMD hotplug & constraint initialization x86: Move notify_cpu_starting() callback to a later stage x86,kgdb: Always initialize the hw breakpoint attribute perf: Use hot regs with software sched switch/migrate events perf: Correctly align perf event tracing buffer
2010-04-04Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: set_cpus_allowed_ptr(): Don't use rq->migration_thread after unlock sched: Fix proc_sched_set_task()
2010-04-04Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ring-buffer: Add missing unlock tracing: Fix lockdep warning in global_clock()
2010-04-03perf: Always build the stub perf_arch_fetch_caller_regs versionFrederic Weisbecker
Now that software events use perf_arch_fetch_caller_regs() too, we need the stub version to be always built in for archs that don't implement it. Fixes the following build error in PARISC: kernel/built-in.o: In function `perf_event_task_sched_out': (.text.perf_event_task_sched_out+0x54): undefined reference to `perf_arch_fetch_caller_regs' Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org>
2010-04-02Merge branch 'kgdb-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'kgdb-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb: Turn off tracing while in the debugger kgdb: use atomic_inc and atomic_dec instead of atomic_set kgdb: eliminate kgdb_wait(), all cpus enter the same way kgdbts,sh: Add in breakpoint pc offset for superh kgdb: have ebin2mem call probe_kernel_write once
2010-04-02Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: Freezer: Fix buggy resume test for tasks frozen with cgroup freezer Freezer: Only show the state of tasks refusing to freeze
2010-04-02kgdb: Turn off tracing while in the debuggerJason Wessel
The kernel debugger should turn off kernel tracing any time the debugger is active and restore it on resume. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
2010-04-02kgdb: use atomic_inc and atomic_dec instead of atomic_setJason Wessel
Memory barriers should be used for the kgdb cpu synchronization. The atomic_set() does not imply a memory barrier. Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-04-02kgdb: eliminate kgdb_wait(), all cpus enter the same wayJason Wessel
This is a kgdb architectural change to have all the cpus (master or slave) enter the same function. A cpu that hits an exception (wants to be the master cpu) will call kgdb_handle_exception() from the trap handler and then invoke a kgdb_roundup_cpu() to synchronize the other cpus and bring them into the kgdb_handle_exception() as well. A slave cpu will enter kgdb_handle_exception() from the kgdb_nmicallback() and set the exception state to note that the processor is a slave. Previously the salve cpu would have called kgdb_wait(). This change allows the debug core to change cpus without resuming the system in order to inspect arch specific cpu information. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-04-02kgdb: have ebin2mem call probe_kernel_write onceJason Wessel
Rather than call probe_kernel_write() one byte at a time, process the whole buffer locally and pass the entire result in one go. This way, architectures that need to do special handling based on the length can do so, or we only end up calling memcpy() once. [sonic.zhang@analog.com: Reported original problem and preliminary patch] Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2010-04-02sched: set_cpus_allowed_ptr(): Don't use rq->migration_thread after unlockOleg Nesterov
Trivial typo fix. rq->migration_thread can be NULL after task_rq_unlock(), this is why we have "mt" which should be used instead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100330165829.GA18284@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02sched: Fix proc_sched_set_task()Mike Galbraith
Latencytop clearing sum_exec_runtime via proc_sched_set_task() breaks task_times(). Other places in kernel use nvcsw and nivcsw, which are being cleared as well, Clear task statistics only. Reported-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1269940193.19286.14.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02perf: Fix 'perf sched record' deadlockMike Galbraith
perf sched record can deadlock a box should the holder of handle->data->lock take an interrupt, and then attempt to acquire an rq lock held by a CPU trying to acquire the same lock. Disable interrupts. CPU0 CPU1 sched event with rq->lock held grab handle->data->lock spin on handle->data->lock interrupt try to grab rq->lock Reported-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Tested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1269598293.6174.8.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02Merge branch 'perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent
2010-04-01perf: Use hot regs with software sched switch/migrate eventsFrederic Weisbecker
Scheduler's task migration events don't work because they always pass NULL regs perf_sw_event(). The event hence gets filtered in perf_swevent_add(). Scheduler's context switches events use task_pt_regs() to get the context when the event occured which is a wrong thing to do as this won't give us the place in the kernel where we went to sleep but the place where we left userspace. The result is even more wrong if we switch from a kernel thread. Use the hot regs snapshot for both events as they belong to the non-interrupt/exception based events family. Unlike page faults or so that provide the regs matching the exact origin of the event, we need to save the current context. This makes the task migration event working and fix the context switch callchains and origin ip. Example: perf record -a -e cs Before: 10.91% ksoftirqd/0 0 [k] 0000000000000000 | --- (nil) perf_callchain perf_prepare_sample __perf_event_overflow perf_swevent_overflow perf_swevent_add perf_swevent_ctx_event do_perf_sw_event __perf_sw_event perf_event_task_sched_out schedule run_ksoftirqd kthread kernel_thread_helper After: 23.77% hald-addon-stor [kernel.kallsyms] [k] schedule | --- schedule | |--60.00%-- schedule_timeout | wait_for_common | wait_for_completion | blk_execute_rq | scsi_execute | scsi_execute_req | sr_test_unit_ready | | | |--66.67%-- sr_media_change | | media_changed | | cdrom_media_changed | | sr_block_media_changed | | check_disk_change | | cdrom_open v2: Always build perf_arch_fetch_caller_regs() now that software events need that too. They don't need it from modules, unlike trace events, so we keep the EXPORT_SYMBOL in trace_event_perf.c Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>
2010-04-01perf: Correctly align perf event tracing bufferFrederic Weisbecker
The trace event buffer used by perf to record raw sample events is typed as an array of char and may then not be aligned to 8 by alloc_percpu(). But we need it to be aligned to 8 in sparc64 because we cast this buffer into a random structure type built by the TRACE_EVENT() macro to store the traces. So if a random 64 bits field is accessed inside, it may be not under an expected good alignment. Use an array of long instead to force the appropriate alignment, and perform a compile time check to ensure the size in byte of the buffer is a multiple of sizeof(long) so that its actual size doesn't get shrinked under us. This fixes unaligned accesses reported while using perf lock in sparc 64. Suggested-by: David Miller <davem@davemloft.net> Suggested-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Steven Rostedt <rostedt@goodmis.org>
2010-03-31genirq: Force MSI irq handlers to run with interrupts disabledThomas Gleixner
Network folks reported that directing all MSI-X vectors of their multi queue NICs to a single core can cause interrupt stack overflows when enough interrupts fire at the same time. This is caused by the fact that we run interrupt handlers by default with interrupts enabled unless the driver reuqests the interrupt with the IRQF_DISABLED set. The NIC handlers do not set this flag, so simultaneous interrupts can nest unlimited and cause the stack overflow. The only safe counter measure is to run the interrupt handlers with interrupts disabled. We can't switch to this mode in general right now, but it is safe to do so for MSI interrupts. Force IRQF_DISABLED for MSI interrupt handlers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@osdl.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: stable@kernel.org
2010-03-30Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: CRED: Fix memory leak in error handling
2010-03-30Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Do not free zero sized per cpu areas x86: Make sure free_init_pages() frees pages on page boundary x86: Make smp_locks end with page alignment
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30CRED: Fix memory leak in error handlingMathieu Desnoyers
Fix a memory leak on an OOM condition in prepare_usermodehelper_creds(). Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-03-29ring-buffer: Add missing unlockJulia Lawall
In some error handling cases the lock is not unlocked. The return is converted to a goto, to share the unlock at the end of the function. A simplified version of the semantic patch that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ expression E1; identifier f; @@ f (...) { <+... * spin_lock_irq (E1,...); ... when != E1 * return ...; ...+> } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> LKML-Reference: <Pine.LNX.4.64.1003291736440.21896@ask.diku.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-03-29tracing: Fix lockdep warning in global_clock()Li Zefan
# echo 1 > events/enable # echo global > trace_clock ------------[ cut here ]------------ WARNING: at kernel/lockdep.c:3162 check_flags+0xb2/0x190() ... ---[ end trace 3f86734a89416623 ]--- possible reason: unannotated irqs-on. ... There's no reason to use the raw_local_irq_save() in trace_clock_global. The local_irq_save() version is fine, and does not cause the bug in lockdep. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4BA97FA1.7030606@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>