path: root/lib
Age  Commit message  (Author)
2010-10-15 llseek: automatically add .llseek fop (Arnd Bergmann)
    All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer.

    The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek.

    New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file.

    The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle.

    Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window.

    Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this.

    ===== begin semantic patch =====
    // This adds an llseek= method to all file operations,
    // as a preparation for making no_llseek the default.
    //
    // The rules are
    // - use no_llseek explicitly if we do nonseekable_open
    // - use seq_lseek for sequential files
    // - use default_llseek if we know we access f_pos
    // - use noop_llseek if we know we don't access f_pos,
    //   but we still want to allow users to call lseek
    //
    @ open1 exists @
    identifier nested_open;
    @@
    nested_open(...)
    {
    <+...
    nonseekable_open(...)
    ...+>
    }

    @ open exists@
    identifier open_f;
    identifier i, f;
    identifier open1.nested_open;
    @@
    int open_f(struct inode *i, struct file *f)
    {
    <+...
    (
    nonseekable_open(...)
    |
    nested_open(...)
    )
    ...+>
    }

    @ read disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    <+...
    (
    *off = E
    |
    *off += E
    |
    func(..., off, ...)
    |
    E = *off
    )
    ...+>
    }

    @ read_no_fpos disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ write @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    <+...
    (
    *off = E
    |
    *off += E
    |
    func(..., off, ...)
    |
    E = *off
    )
    ...+>
    }

    @ write_no_fpos @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ fops0 @
    identifier fops;
    @@
    struct file_operations fops = {
    ...
    };

    @ has_llseek depends on fops0 @
    identifier fops0.fops;
    identifier llseek_f;
    @@
    struct file_operations fops = {
    ...
    .llseek = llseek_f,
    ...
    };

    @ has_read depends on fops0 @
    identifier fops0.fops;
    identifier read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    };

    @ has_write depends on fops0 @
    identifier fops0.fops;
    identifier write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    };

    @ has_open depends on fops0 @
    identifier fops0.fops;
    identifier open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    };

    // use no_llseek if we call nonseekable_open
    ////////////////////////////////////////////
    @ nonseekable1 depends on !has_llseek && has_open @
    identifier fops0.fops;
    identifier nso ~= "nonseekable_open";
    @@
    struct file_operations fops = {
    ...
    .open = nso,
    ...
    +.llseek = no_llseek, /* nonseekable */
    };

    @ nonseekable2 depends on !has_llseek @
    identifier fops0.fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    +.llseek = no_llseek, /* open uses nonseekable */
    };

    // use seq_lseek for sequential files
    /////////////////////////////////////
    @ seq depends on !has_llseek @
    identifier fops0.fops;
    identifier sr ~= "seq_read";
    @@
    struct file_operations fops = {
    ...
    .read = sr,
    ...
    +.llseek = seq_lseek, /* we have seq_read */
    };

    // use default_llseek if there is a readdir
    ///////////////////////////////////////////
    @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier readdir_e;
    @@
    // any other fop is used that changes pos
    struct file_operations fops = {
    ...
    .readdir = readdir_e,
    ...
    +.llseek = default_llseek, /* readdir is present */
    };

    // use default_llseek if at least one of read/write touches f_pos
    /////////////////////////////////////////////////////////////////
    @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read.read_f;
    @@
    // read fops use offset
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    +.llseek = default_llseek, /* read accesses f_pos */
    };

    @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    + .llseek = default_llseek, /* write accesses f_pos */
    };

    // Use noop_llseek if neither read nor write accesses f_pos
    ///////////////////////////////////////////////////////////
    @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    identifier write_no_fpos.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read and write both use no f_pos */
    };

    @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write_no_fpos.write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    +.llseek = noop_llseek, /* write uses no f_pos */
    };

    @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read uses no f_pos */
    };

    @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    @@
    struct file_operations fops = {
    ...
    +.llseek = noop_llseek, /* no read or write fn */
    };
    ===== End semantic patch =====

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Cc: Julia Lawall <julia@diku.dk>
    Cc: Christoph Hellwig <hch@infradead.org>
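Illustration (not part of the commit): the selection rules listed at the top of the semantic patch can be modelled as a small decision function in plain userspace C. This is only a simplified sketch. `struct fops_facts`, `enum llseek_choice` and `pick_llseek()` are hypothetical names, the precedence between the nonseekable and seq_read rules is an assumption, and the model ignores file_operations that the patch leaves untouched because none of its patterns match.

```c
#include <stdbool.h>

/* Hypothetical summary of what the semantic patch can detect about one
 * struct file_operations instance. None of these names exist in the kernel. */
struct fops_facts {
    bool has_llseek;          /* .llseek is already set */
    bool open_is_nonseekable; /* .open is, or calls, nonseekable_open() */
    bool read_is_seq_read;    /* .read is seq_read */
    bool has_readdir;         /* .readdir is set */
    bool read_uses_pos;       /* the read function touches its offset argument */
    bool write_uses_pos;      /* the write function touches its offset argument */
};

enum llseek_choice {
    KEEP_EXISTING,
    NO_LLSEEK,
    SEQ_LSEEK,
    DEFAULT_LLSEEK,
    NOOP_LLSEEK
};

/* Apply the rules in the order the commit message lists them. */
enum llseek_choice pick_llseek(const struct fops_facts *f)
{
    if (f->has_llseek)
        return KEEP_EXISTING;   /* never override an explicit choice */
    if (f->open_is_nonseekable)
        return NO_LLSEEK;       /* seeking is refused at open time anyway */
    if (f->read_is_seq_read)
        return SEQ_LSEEK;       /* sequential files */
    if (f->has_readdir || f->read_uses_pos || f->write_uses_pos)
        return DEFAULT_LLSEEK;  /* the file position matters */
    return NOOP_LLSEEK;         /* position ignored, lseek still succeeds */
}
```

The fall-through to `NOOP_LLSEEK` mirrors the commit's stated goal of keeping the current behavior (no error from a seek) for files whose offset is provably ignored.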
2010-10-14 kmemleak: add TILE to the list of supported architectures. (Chris Metcalf)
    All the necessary functionality was already there; we just need to make it possible to select the config option.

    Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2010-10-11 swiotlb: Use page alignment for early buffer allocation (Yinghai Lu)
    We could call free_bootmem_late() if swiotlb is not used, and it will shrink the range to page alignment. So allocate the buffers with page alignment in the first place, to avoid losing two pages.

    Before the patch:
    [ 0.000000] memblock_x86_reserve_range: [00d3600000, 00d7600000] swiotlb buffer
    [ 0.000000] memblock_x86_reserve_range: [00d7e7ef40, 00d7e9ef40] swiotlb list
    [ 0.000000] memblock_x86_reserve_range: [00d7e3ef40, 00d7e7ef40] swiotlb orig_ad
    [ 0.000000] memblock_x86_reserve_range: [000008a000, 0000092000] swiotlb overflo

    After the patch:
    [ 0.000000] memblock_x86_reserve_range: [00d3600000, 00d7600000] swiotlb buffer
    [ 0.000000] memblock_x86_reserve_range: [00d7e7e000, 00d7e9e000] swiotlb list
    [ 0.000000] memblock_x86_reserve_range: [00d7e3e000, 00d7e7e000] swiotlb orig_ad
    [ 0.000000] memblock_x86_reserve_range: [000008a000, 0000092000] swiotlb overflo

    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
    Cc: Becky Bruce <beckyb@kernel.crashing.org>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
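Illustration (not part of the commit): the arithmetic behind the lost pages can be checked with a small userspace sketch. It assumes 4 KiB pages and treats the logged ranges as half-open byte ranges; `SK_PAGE_SIZE`, `sk_page_align_up()`, `sk_page_align_down()` and `sk_whole_page_bytes()` are hypothetical names, not kernel functions.

```c
#include <stdint.h>

#define SK_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Round an address up to the next page boundary. */
static inline uint64_t sk_page_align_up(uint64_t addr)
{
    return (addr + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1);
}

/* Round an address down to the previous page boundary. */
static inline uint64_t sk_page_align_down(uint64_t addr)
{
    return addr & ~(SK_PAGE_SIZE - 1);
}

/* Number of bytes in [start, end) that lie in whole pages, i.e. what a
 * page-granular free can hand back once the range is shrunk inward. */
uint64_t sk_whole_page_bytes(uint64_t start, uint64_t end)
{
    uint64_t s = sk_page_align_up(start);
    uint64_t e = sk_page_align_down(end);

    return e > s ? e - s : 0;
}
```

With the "swiotlb list" range from the log, the unaligned range 0xd7e7ef40..0xd7e9ef40 contains 0x1f000 bytes of whole pages (31 pages), while the aligned range 0xd7e7e000..0xd7e9e000 contains the full 0x20000 bytes (32 pages). One page per unaligned buffer is consistent with the two pages the message mentions for the two unaligned ranges.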
2010-10-11 swiotlb: make io_tlb_overflow static (FUJITA Tomonori)
    We don't need to export io_tlb_overflow_buffer. I'll remove io_tlb_overflow_buffer completely in the long term though.

    Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-08 Merge commit 'v2.6.36-rc7' into perf/core (Ingo Molnar)
    Conflicts:
        arch/x86/kernel/module.c

    Merge reason: Resolve the conflict, pick up fixes.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-07 move async raid6 test to lib/Kconfig.debug (Dan Williams)
    The prompt for "Self test for hardware accelerated raid6 recovery" does not belong in the top level configuration menu. All the options in crypto/async_tx/Kconfig are selected and do not depend on CRYPTO. Kconfig.debug seems like a reasonable fit.

    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: David Woodhouse <David.Woodhouse@intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-10-07 Merge commit 'v2.6.36-rc7' into core/rcu (Ingo Molnar)
    Merge reason: Update from -rc3 to -rc7.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-07 Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/rcu (Ingo Molnar)
2010-10-06 slub: Enable sysfs support for !CONFIG_SLUB_DEBUG (Christoph Lameter)
    Currently, disabling CONFIG_SLUB_DEBUG also disables SYSFS support, meaning that the slabs cannot be tuned without DEBUG.

    Make SYSFS support independent of CONFIG_SLUB_DEBUG.

    Signed-off-by: Christoph Lameter <cl@linux.com>
    Signed-off-by: Pekka Enberg <penberg@kernel.org>
2010-10-05 modules: Fix module_bug_list list corruption race (Linus Torvalds)
    With all the recent module loading cleanups, we've minimized the code that sits under module_mutex, fixing various deadlocks and making it possible to do most of the module loading in parallel.

    However, that whole conversion totally missed the rather obscure code that adds a new module to the list for BUG() handling. That code was doubly obscure because (a) the code itself lives in lib/bug.c (for dubious reasons) and (b) it gets called from the architecture-specific "module_finalize()" rather than from generic code.

    Calling it from arch-specific code makes no sense whatsoever to begin with, and is now actively wrong since that code isn't protected by the module loading lock any more.

    So this commit moves the "module_bug_{finalize,cleanup}()" calls away from the arch-specific code, and into the generic code - and in the process protects it with the module_mutex so that the list operations are now safe.

    Future fixups:
     - move the module list handling code into kernel/module.c where it belongs.
     - get rid of 'module_bug_list' and just use the regular list of modules (called 'modules' - imagine that) that we already create and maintain for other reasons.

    Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Rusty Russell <rusty@rustcorp.com.au>
    Cc: Adrian Bunk <bunk@kernel.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01 lib/list_sort: do not pass bad pointers to cmp callback (Don Mullis)
    If the length of the original list is a power of two (POT), the first callback from line 73 will pass a==b, both pointing to the original list_head. This is dangerous because the 'list_sort()' user can use 'container_of()' and access the "containing" object, which does not necessarily exist for the list head. So the user can access RAM which does not belong to him. If this is a write access, we can end up with memory corruption.

    Signed-off-by: Don Mullis <don.mullis@gmail.com>
    Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
    Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
    Cc: <stable@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
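Illustration (not part of the commit): a userspace sketch of what a typical cmp callback does with its arguments, which shows why handing it the bare list head is unsafe. `struct item` and `item_cmp()` are hypothetical; `struct list_head` and `container_of()` are re-declared locally in simplified form so the example compiles outside the kernel, and the callback signature is assumed to match the list_sort() of that era.

```c
#include <stddef.h>

/* Minimal local stand-ins for the kernel definitions. */
struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical list element: the list_head is embedded in a larger object. */
struct item {
    int key;
    struct list_head node;
};

/* A cmp callback as a list_sort() user might write it. It assumes that both
 * a and b are embedded in a struct item. If either one is the standalone
 * list head, container_of() yields a pointer to memory that is not a
 * struct item at all, and reading or writing through it is invalid. */
int item_cmp(void *priv, struct list_head *a, struct list_head *b)
{
    const struct item *ia = container_of(a, struct item, node);
    const struct item *ib = container_of(b, struct item, node);

    (void)priv; /* unused in this sketch */
    return (ia->key > ib->key) - (ia->key < ib->key);
}
```

The callback has no way to tell an embedded node from the head, so the only safe contract is for the sorter never to pass the head in.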
2010-09-23 rcu: Add advice to PROVE_RCU_REPEATEDLY kernel config parameter (Paul E. McKenney)
    The PROVE_RCU_REPEATEDLY option has no "Say Y"/"Say N" advice, so this commit adds it.

    Reported-by: Johannes Berg <johannes@sipsolutions.net>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-09-22 jump label: Convert dynamic debug to use jump labels (Jason Baron)
    Convert the 'dynamic debug' infrastructure to use jump labels.

    Signed-off-by: Jason Baron <jbaron@redhat.com>
    LKML-Reference: <b77627358cea3e27d7be4386f45f66219afb8452.1284733808.git.jbaron@redhat.com>
    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-09-15 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core (Ingo Molnar)
2010-09-10 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block (Linus Torvalds)
    * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
      block: Range check cpu in blk_cpu_to_group
      scatterlist: prevent invalid free when alloc fails
      writeback: Fix lost wake-up shutting down writeback thread
      writeback: do not lose wakeup events when forking bdi threads
      cciss: fix reporting of max queue depth since init
      block: switch s390 tape_block and mg_disk to elevator_change()
      block: add function call to switch the IO scheduler from a driver
      fs/bio-integrity.c: return -ENOMEM on kmalloc failure
      bio-integrity.c: remove dependency on __GFP_NOFAIL
      BLOCK: fix bio.bi_rw handling
      block: put dev->kobj in blk_register_queue fail path
      cciss: handle allocation failure
      cfq-iosched: Documentation help for new tunables
      cfq-iosched: blktrace print per slice sector stats
      cfq-iosched: Implement tunable group_idle
      cfq-iosched: Do group share accounting in IOPS when slice_idle=0
      cfq-iosched: Do not idle if slice_idle=0
      cciss: disable doorbell reset on reset_devices
      blkio: Fix return code for mkdir calls
2010-08-31 tracing/lockdep: Fix dependency of TRACE_IRQFLAGS (Steven Rostedt)
    When CONFIG_IRQSOFF_TRACER is set and CONFIG_PROVE_LOCKING is not, we get the following error:

    $ make oldconfig
    scripts/kconfig/conf --oldconfig arch/x86/Kconfig
    warning: (IRQSOFF_TRACER && TRACING_SUPPORT && FTRACE && TRACE_IRQFLAGS_SUPPORT && !ARCH_USES_GETTIMEOFFSET) selects TRACE_IRQFLAGS which has unmet direct dependencies (DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && PROVE_LOCKING)
    warning: (IRQSOFF_TRACER && TRACING_SUPPORT && FTRACE && TRACE_IRQFLAGS_SUPPORT && !ARCH_USES_GETTIMEOFFSET) selects TRACE_IRQFLAGS which has unmet direct dependencies (DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && PROVE_LOCKING)

    This is because IRQSOFF_TRACER selects TRACE_IRQFLAGS but TRACE_IRQFLAGS has PROVE_LOCKING as a dependency.

    This code is incorrect, and this patch changes TRACE_IRQFLAGS to be just a simple bool that does not depend on or select anything. Instead both IRQSOFF_TRACER and PROVE_LOCKING select it.

    Reported-by: Richard Kennedy <richard@rsk.demon.co.uk>
    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-31 idr: describe how nextidp works in idr_get_next(). (Naohiro Aota)
    It was unclear in the original kernel-doc how nextidp worked in idr_get_next(). Let's describe it.

    Signed-off-by: Naohiro Aota <naota@elisp.net>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-08-31 idr: fix kernel-doc warnings. (Naohiro Aota)
    Fix the following kernel-doc warnings.

    % perl scripts/kernel-doc lib/idr.c > /dev/null
    Warning(lib/idr.c:300): No description found for parameter 'starting_id'
    Warning(lib/idr.c:300): Excess function parameter 'start_id' description in 'idr_get_new_above'
    Warning(lib/idr.c:485): No description found for parameter 'idp'
    Warning(lib/idr.c:596): No description found for parameter 'nextidp'
    Warning(lib/idr.c:596): Excess function parameter 'id' description in 'idr_get_next'
    Warning(lib/idr.c:774): No description found for parameter 'starting_id'
    Warning(lib/idr.c:774): Excess function parameter 'staring_id' description in 'ida_get_new_above'
    Warning(lib/idr.c:918): No description found for parameter 'ida'

    Signed-off-by: Naohiro Aota <naota@elisp.net>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-08-30 scatterlist: prevent invalid free when alloc fails (Jeffrey Carlyle)
    When alloc fails, free_table is being called. Depending on the number of bytes requested, we determine if we are going to call __get_free_page() or kmalloc(). When alloc fails, our math is wrong (due to sg_size - 1), and the last buffer is wrongfully assumed to have been allocated by kmalloc. Hence, kfree gets called and a panic occurs.

    Signed-off-by: Jeffrey Carlyle <jeff.carlyle@motorola.com>
    Signed-off-by: Olusanya Soyannwo <c23746@motorola.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
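Illustration (not part of the commit): the failure mode described above is an allocator mismatch caused by an off-by-one in the size used on the free path. The sketch below models only that decision. `SK_MAX_SINGLE_ALLOC`, `sk_allocator_for()` and `sk_free_matches_alloc()` are hypothetical names, and the threshold value of 128 entries is an assumption made for the example.

```c
#include <stdbool.h>

/* Assumption for this sketch: a table of exactly this many entries fills
 * one page and therefore comes from the page allocator; any other size
 * comes from kmalloc(). */
#define SK_MAX_SINGLE_ALLOC 128U

enum sk_allocator { SK_FROM_PAGE, SK_FROM_KMALLOC };

/* Which allocator a table of nents entries is served from. */
enum sk_allocator sk_allocator_for(unsigned int nents)
{
    return nents == SK_MAX_SINGLE_ALLOC ? SK_FROM_PAGE : SK_FROM_KMALLOC;
}

/* The free path is only correct if it derives the allocator from the same
 * entry count that the allocation path used. */
bool sk_free_matches_alloc(unsigned int alloc_nents, unsigned int free_nents)
{
    return sk_allocator_for(alloc_nents) == sk_allocator_for(free_nents);
}
```

If the free path computes a count that is one lower than the count used to allocate a full-page table, it selects the kmalloc branch and ends up passing a page-allocator address to kfree, which is the panic the message describes.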
2010-08-30 Move .gitignore from drivers/md to lib/raid6 (NeilBrown)
    Another missing bit of the raid6 -> /lib move.

    Reported-by: Andreas Schwab <schwab@linux-m68k.org>
    Signed-off-by: NeilBrown <neilb@suse.de>
2010-08-23 kobject_uevent: fix typo in comments (Xiaotian Feng)
    s/ending/sending, s/kobject_uevent()/kobject_uevent_env() in the comments.

    Signed-off-by: Xiaotian Feng <xtfeng@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-23 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/rcu (Ingo Molnar)
2010-08-22 Merge branch 'radix-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev (Linus Torvalds)
    * 'radix-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev:
      radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags
      radix-tree: clear all tags in radix_tree_node_rcu_free
2010-08-23 radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags (Dave Chinner)
    Commit ebf8aa44beed48cd17893a83d92a4403e5f9d9e2 ("radix-tree: implement function radix_tree_range_tag_if_tagged") does not safely set tags on intermediate tree nodes. The code walks down the tree setting tags before it has fully resolved the path to the leaf under the assumption there will be a leaf slot with the tag set in the range it is searching.

    Unfortunately, this is not a valid assumption - we can abort after setting a tag on an intermediate node if we overrun the number of tags we are allowed to set in a batch, or stop scanning because we have passed the last scan index before we reach a leaf slot with the tag we are searching for set.

    As a result, we can leave the function with tags set on intermediate nodes which can be tripped over later by tag-based lookups. The result of these stale tags is that lookup may end prematurely or livelock because the lookup cannot make progress.

    The fix for the problem involves recording the traversal path we take to the leaf nodes, and only propagating the tags back up the tree once the tag is set in the leaf node slot. We are already recording the path for efficient traversal, so there is no additional overhead to do the intermediate node tag setting in this manner.

    This fixes a radix tree lookup livelock triggered by the new writeback sync livelock avoidance code introduced in commit f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement writeback livelock avoidance using page tagging").

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Acked-by: Jan Kara <jack@suse.cz>
2010-08-23 radix-tree: clear all tags in radix_tree_node_rcu_free (Dave Chinner)
    Commit f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement writeback livelock avoidance using page tagging") introduced a new radix tree tag, increasing the number of tags in each node from 2 to 3. It did not, however, fix up the code in radix_tree_node_rcu_free() that cleans up after radix_tree_shrink() and hence could leave stray tags set in the new tag array.

    The result is that the livelock avoidance code added in the above commit would hit stale tags when doing tag based lookups, resulting in livelocks when trying to traverse the tree.

    Fix this problem in radix_tree_node_rcu_free() so it doesn't happen again in the future by using a loop to walk all the tags up to RADIX_TREE_MAX_TAGS to clear the stray tags radix_tree_shrink() leaves behind.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Acked-by: Nick Piggin <npiggin@kernel.dk>
    Acked-by: Jan Kara <jack@suse.cz>
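Illustration (not part of the commit): the difference between clearing a hard-coded set of tags and looping up to the tag count can be sketched in userspace C. `struct sk_node`, `SK_MAX_TAGS` and `sk_clear_slot0_tags()` are hypothetical stand-ins and the node layout is an assumption. The point is only that the loop bound follows the constant, so a future fourth tag is covered without touching this function.

```c
#define SK_MAX_TAGS  3 /* stand-in for RADIX_TREE_MAX_TAGS after the new tag */
#define SK_TAG_LONGS 1 /* assumption: one word of tag bits per tag */

/* Hypothetical, heavily reduced radix tree node: only the tag bitmaps. */
struct sk_node {
    unsigned long tags[SK_MAX_TAGS][SK_TAG_LONGS];
};

/* Clear the bit for slot 0 in every tag bitmap. Looping to SK_MAX_TAGS
 * instead of naming tags 0 and 1 explicitly keeps this correct whenever
 * the number of tags changes. */
void sk_clear_slot0_tags(struct sk_node *node)
{
    int tag;

    for (tag = 0; tag < SK_MAX_TAGS; tag++)
        node->tags[tag][0] &= ~1UL;
}
```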
2010-08-20 lib/radix-tree.c: fix overflow in radix_tree_range_tag_if_tagged() (Jan Kara)
    When radix_tree_maxindex() is ~0UL, it can happen that scanning overflows index and tree traversal code goes astray reading memory until it hits unreadable memory. Check for overflow and exit in that case.

    Signed-off-by: Jan Kara <jack@suse.cz>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
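Illustration (not part of the commit): a userspace sketch of an index-advance step with an explicit wrap-around check. `sk_next_index()` is a hypothetical helper, and the way the index is advanced (jump to the start of the next block of 2^shift indices) is an assumption about the scan. The relevant part is that a bound check alone cannot catch the wrap when the last index is ~0UL.

```c
#include <stdbool.h>

/* Advance *index to the start of the next block of 2^shift indices.
 * Returns false, leaving *index unchanged, when the scan is finished:
 * either the next index lies beyond last_index, or the addition wrapped
 * around to 0. shift must be smaller than the width of unsigned long. */
bool sk_next_index(unsigned long *index, unsigned int shift,
                   unsigned long last_index)
{
    /* Unsigned arithmetic wraps modulo 2^N, so this is well defined. */
    unsigned long next = ((*index >> shift) + 1) << shift;

    /* With last_index == ~0UL the test "next > last_index" can never be
     * true, so a wrapped next == 0 has to be detected separately. */
    if (next == 0 || next > last_index)
        return false;

    *index = next;
    return true;
}
```

Without the `next == 0` test, a scan that reaches the top of the index space would restart from index 0 instead of terminating.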
2010-08-19 rcu: Allow RCU CPU stall warnings to be off at boot, but manually enablable (Paul E. McKenney)
    Currently, if RCU CPU stall warnings are enabled, they are enabled immediately upon boot. They can be manually disabled via /sys (and also re-enabled via /sys), and are automatically disabled upon panic. However, some users need RCU CPU stalls to be disabled at boot time, but to be enabled without rebuilding/rebooting. For example, someone running a real-time application in production might not want the additional latency of RCU CPU stall detection in normal operation, but might need to enable it at any point for fault isolation purposes.

    This commit therefore provides a new CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE kernel configuration parameter that maintains the current behavior (enable at boot) by default, but allows a kernel to be configured with RCU CPU stall detection built into the kernel, but disabled at boot time.

    Requested-by: Clark Williams <williams@redhat.com>
    Requested-by: John Kacur <jkacur@redhat.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-08-19 radix-tree: __rcu annotations (Arnd Bergmann)
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Nick Piggin <npiggin@suse.de>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2010-08-19 rcu: make CPU stall warning timeout configurable (Paul E. McKenney)
    Also set the default to 60 seconds, up from the previous hard-coded timeout of 10 seconds. This allows people who care to set short timeouts, while preventing people with unusual configurations (make randconfig!!!) from being bothered with spurious CPU stall warnings.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2010-08-19 rcu: define __rcu address space modifier for sparse (Paul E. McKenney)
    This commit provides definitions for the __rcu annotation defined earlier. This annotation permits sparse to check for correct use of RCU-protected pointers. If a pointer that is annotated with __rcu is accessed directly (as opposed to via rcu_dereference(), rcu_assign_pointer(), or one of their variants), sparse can be made to complain. To enable such complaints, use the new default-disabled CONFIG_SPARSE_RCU_POINTER kernel configuration option. Please note that these sparse complaints are intended to be a debugging aid, -not- a code-style-enforcement mechanism.

    There are special rcu_dereference_protected() and rcu_access_pointer() accessors for use when RCU read-side protection is not required, for example, when no other CPU has access to the data structure in question or while the current CPU holds the update-side lock.

    This patch also updates a number of docbook comments that were showing their age.

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Christopher Li <sparse@chrisli.org>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2010-08-17 latencytop: Fix kconfig dependency warnings (Randy Dunlap)
    warning: (LATENCYTOP && HAVE_LATENCYTOP_SUPPORT) selects SCHED_DEBUG which has unmet direct dependencies (DEBUG_KERNEL && PROC_FS)
    warning: (LATENCYTOP && HAVE_LATENCYTOP_SUPPORT) selects SCHEDSTATS which has unmet direct dependencies (DEBUG_KERNEL && PROC_FS)

    Add depends on STACKTRACE_SUPPORT for 'select STACKTRACE'.
    Add depends on PROC_FS since that is where the output goes.

    Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
    Cc: Arjan van de Ven <arjan@linux.intel.com>
    LKML-Reference: <20100812123121.a7c99cde.randy.dunlap@oracle.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-12 Merge branch 'for-linus' of git://neil.brown.name/md (Linus Torvalds)
    * 'for-linus' of git://neil.brown.name/md:
      Further tidyup of raid6 naming in lib/raid6
      Make lib/raid6/test build correctly.
      Rename raid6 files now they're in a 'raid6' directory.
2010-08-12 MN10300: Don't try and #include <linux/slab.h> in lib/inflate.c from bootloader (David Howells)
    Don't try and #include <linux/slab.h> in lib/inflate.c from the bootloader code as linux/slab.h hauls in function defs that aren't available in the bootloader code and may also haul in conflicting functions.

    To fix this, make the inclusion of linux/slab.h contingent on NO_INFLATE_MALLOC as are the usages of kmalloc() and kfree().

    In MN10300, this causes the following errors:

    In file included from include/linux/string.h:21,
                     from include/linux/bitmap.h:8,
                     from include/linux/nodemask.h:93,
                     from include/linux/mmzone.h:16,
                     from include/linux/gfp.h:4,
                     from include/linux/slab.h:12,
                     from arch/mn10300/boot/compressed/../../../../lib/inflate.c:106,
                     from arch/mn10300/boot/compressed/misc.c:170:
    /warthog/am33/linux-2.6-mn10300/arch/mn10300/include/asm/string.h:19: error: conflicting types for 'memset'
    arch/mn10300/boot/compressed/misc.c:59: error: previous definition of 'memset' was here

    Signed-off-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12 Further tidyup of raid6 naming in lib/raid6 (NeilBrown)
    Rename raid6/raid6x86.h to raid6/x86.h and modify some comments.

    Signed-off-by: NeilBrown <neilb@suse.de>
2010-08-12 Make lib/raid6/test build correctly. (NeilBrown)
    Some bit-rot needs to be cleaned out.

    Signed-off-by: NeilBrown <neilb@suse.de>
2010-08-11 lib/decompress_bunzip2.c: fix checkstack warning (Prarit Bhargava)
    Fix checkstack error:

    lib/decompress_bunzip2.c: In function `get_next_block':
    lib/decompress_bunzip2.c:511: warning: the frame size of 1932 bytes is larger than 1024 bytes

    byteCount, symToByte, and mtfSymbol cannot be declared static or allocated dynamically so place them in the bunzip_data struct.

    Signed-off-by: Prarit Bhargava <prarit@redhat.com>
    Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
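Illustration (not part of the commit): moving large scratch arrays out of a function's locals and into the state structure that is already passed around by pointer removes them from the stack frame. The sketch below uses hypothetical names (`struct sk_bunzip_data`, `sk_reset_tables()`), and the element types and array sizes are assumptions chosen to resemble the three arrays named in the message.

```c
#include <string.h>

/* Hypothetical decoder state. The three tables used to be locals of the
 * block-decoding function; as struct members they live wherever the state
 * object lives instead of in that function's stack frame. */
struct sk_bunzip_data {
    unsigned int  byte_count[256];  /* assumed element type */
    unsigned char sym_to_byte[256];
    unsigned char mtf_symbol[256];
};

/* Prepare the tables for a new block, working through the state pointer.
 * The function itself needs only a loop counter on its stack. */
void sk_reset_tables(struct sk_bunzip_data *bd)
{
    int i;

    memset(bd->byte_count, 0, sizeof(bd->byte_count));
    memset(bd->sym_to_byte, 0, sizeof(bd->sym_to_byte));
    for (i = 0; i < 256; i++)
        bd->mtf_symbol[i] = (unsigned char)i; /* identity ordering */
}
```

Under these assumed types the three arrays together occupy at least 1536 bytes, which would account for most of the 1932-byte frame reported by the warning.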
2010-08-11 lib/bug.c: add oops end marker to WARN implementation (Anton Blanchard)
    We are missing the oops end marker for the exception based WARN implementation in lib/bug.c. This is useful for logfile analysis tools.

    Signed-off-by: Anton Blanchard <anton@samba.org>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Arjan van de Ven <arjan@infradead.org>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 lib/bug.c: make WARN implementation match the kernel/panic.c one (Anton Blanchard)
    There are a few issues with the exception based WARN implementation in lib/bug.c:

    - Inconsistent printk flags. The "cut here" line is printed at KERN_EMERG, so the console and all logged in users see the single line:

      ------------[ cut here ]------------

      for each WARN. Fix this so we print everything at KERN_WARNING to match the kernel/panic.c version.

    - The lib/bug.c WARN would print "Badness at". Change it to match the kernel/panic.c version which prints "WARNING: at".

    - Print the list of modules, similar to kernel/panic.c

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Anton Blanchard <anton@samba.org>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Arjan van de Ven <arjan@infradead.org>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 Rename raid6 files now they're in a 'raid6' directory. (David Woodhouse)
    Linus asks 'why "raid6" twice?'. No reason.

    Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-08-10 Merge branch 'for-linus' of git://neil.brown.name/md (Linus Torvalds)
    * 'for-linus' of git://neil.brown.name/md: (24 commits)
      md: clean up do_md_stop
      md: fix another deadlock with removing sysfs attributes.
      md: move revalidate_disk() back outside open_mutex
      md/raid10: fix deadlock with unaligned read during resync
      md/bitmap: separate out loading a bitmap from initialising the structures.
      md/bitmap: prepare for storing write-intent-bitmap via dm-dirty-log.
      md/bitmap: optimise scanning of empty bitmaps.
      md/bitmap: clean up plugging calls.
      md/bitmap: reduce dependence on sysfs.
      md/bitmap: white space clean up and similar.
      md/raid5: export raid5 unplugging interface.
      md/plug: optionally use plugger to unplug an array during resync/recovery.
      md/raid5: add simple plugging infrastructure.
      md/raid5: export is_congested test
      raid5: Don't set read-ahead when there is no queue
      md: add support for raising dm events.
      md: export various start/stop interfaces
      md: split out md_rdev_init
      md: be more careful setting MD_CHANGE_CLEAN
      md/raid5: ensure we create a unique name for kmem_cache when mddev has no gendisk
      ...
2010-08-10 Merge branch 'kmemleak' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-2.6-cm (Linus Torvalds)
    * 'kmemleak' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-2.6-cm:
      kmemleak: Fix typo in the comment
      lib/scatterlist: Hook sg_kmalloc into kmemleak (v2)
      kmemleak: Add DocBook style comments to kmemleak.c
      kmemleak: Introduce a default off mode for kmemleak
      kmemleak: Show more information for objects found by alias
2010-08-09 rwsem: smaller wrappers around rwsem_down_failed_common (Michel Lespinasse)
    More code can be pushed from rwsem_down_read_failed and rwsem_down_write_failed into rwsem_down_failed_common.

    The following change, which adds down_read_critical infrastructure support, also benefits from having the flags available in a register rather than having to fish them out of the struct rwsem_waiter...

    Signed-off-by: Michel Lespinasse <walken@google.com>
    Acked-by: David Howells <dhowells@redhat.com>
    Cc: Mike Waychison <mikew@google.com>
    Cc: Suleiman Souhlal <suleiman@google.com>
    Cc: Ying Han <yinghan@google.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 rwsem: wake queued readers when writer blocks on active read lock (Michel Lespinasse)
    This change addresses the following situation:

    - Thread A acquires the rwsem for read
    - Thread B tries to acquire the rwsem for write, notices there is already an active owner for the rwsem.
    - Thread C tries to acquire the rwsem for read, notices that thread B already tried to acquire it.
    - Thread C grabs the spinlock and queues itself on the wait queue.
    - Thread B grabs the spinlock and queues itself behind C. At this point A is the only remaining active owner on the rwsem.

    In this situation thread B could notice that it was the last active writer on the rwsem, and decide to wake C to let it proceed in parallel with A since they both only want the rwsem for read.

    Signed-off-by: Michel Lespinasse <walken@google.com>
    Acked-by: David Howells <dhowells@redhat.com>
    Cc: Mike Waychison <mikew@google.com>
    Cc: Suleiman Souhlal <suleiman@google.com>
    Cc: Ying Han <yinghan@google.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09  rwsem: let RWSEM_WAITING_BIAS represent any number of waiting threads  Michel Lespinasse
Previously each waiting thread added a bias of RWSEM_WAITING_BIAS. With this change, the bias is added only once to indicate that the wait list is non-empty.

This has a few nice properties which will be used in following changes:

- when the spinlock is held and the waiter list is known to be non-empty, count < RWSEM_WAITING_BIAS <=> there is an active writer on that sem
- count == RWSEM_WAITING_BIAS <=> there are waiting threads and no active readers/writers on that sem

Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Mike Waychison <mikew@google.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Ying Han <yinghan@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
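The two count properties described in this commit can be checked with a small userspace model. This is only a sketch: the bias constants are assumed to follow the classic 32-bit rwsem layout, and model_count(), has_active_writer() and is_idle_with_waiters() are hypothetical helpers that do not exist in the kernel. The model covers quiescent states only and ignores the transient adjustments made by threads that are in the middle of a failed down_xxxx().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed constants, following the classic 32-bit rwsem count layout. */
#define RWSEM_ACTIVE_BIAS        1L
#define RWSEM_ACTIVE_MASK        0xffffL
#define RWSEM_WAITING_BIAS       (-RWSEM_ACTIVE_MASK - 1)
#define RWSEM_ACTIVE_READ_BIAS   RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS  (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

/*
 * Hypothetical helper: build the count for a quiescent state.
 * The waiting bias is added once if the wait list is non-empty,
 * no matter how many threads are queued.
 */
static long model_count(int readers, bool writer, bool waiters)
{
    long count = 0;

    assert(readers >= 0 && readers <= RWSEM_ACTIVE_MASK);
    assert(!(writer && readers > 0)); /* a writer excludes readers */

    if (waiters)
        count += RWSEM_WAITING_BIAS;
    count += readers * RWSEM_ACTIVE_READ_BIAS;
    if (writer)
        count += RWSEM_ACTIVE_WRITE_BIAS;
    return count;
}

/* Only meaningful when the wait list is known to be non-empty. */
static bool has_active_writer(long count)
{
    return count < RWSEM_WAITING_BIAS;
}

/* True when there are waiters and no active readers or writers. */
static bool is_idle_with_waiters(long count)
{
    return count == RWSEM_WAITING_BIAS;
}
```

With waiters queued, an active writer pushes the count below RWSEM_WAITING_BIAS, active readers push it above, and an idle semaphore sits exactly at RWSEM_WAITING_BIAS.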
2010-08-09  rwsem: lighter active count checks when waking up readers  Michel Lespinasse
In __rwsem_do_wake(), we can skip the active count check unless we come there from up_xxxx(). Also, when checking the active count, it is not actually necessary to increment it; this allows us to get rid of the read side undo code and simplify the calculation of the final rwsem count adjustment once we've counted the reader threads to wake.

The basic observation is the following. When there are waiter threads on a rwsem and the spinlock is held, other threads can only increment the active count by trying to grab the rwsem in down_xxxx(). However, down_xxxx() will notice there are waiter threads and take the down_failed path, blocking to acquire the spinlock on the way there. Therefore, a thread observing an active count of zero with waiters queued and the spinlock held is protected against other threads acquiring the rwsem until it wakes the last waiter or releases the spinlock.

Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Mike Waychison <mikew@google.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Ying Han <yinghan@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09  rwsem: fully separate code paths to wake writers vs readers  Michel Lespinasse
This is in preparation for later changes in the series. In __rwsem_do_wake(), the first queued waiter is checked first in order to determine whether it's a writer or a reader. The code paths diverge at this point. The code that checks and increments the rwsem active count is duplicated on both sides - the point is that later changes in the series will be able to independently modify both sides. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Mike Waychison <mikew@google.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Ying Han <yinghan@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09  flex_array: add helpers to get and put to make pointers easy to use  Eric Paris
Getting and putting arrays of pointers with flex arrays is a PITA. You have to remember to pass &ptr to the _put and you have to do weird and wacky casting to get the ptr back from the _get. Add two functions flex_array_get_ptr() and flex_array_put_ptr() to handle all of the magic. [akpm@linux-foundation.org: simplification suggested by Joe] Signed-off-by: Eric Paris <eparis@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Joe Perches <joe@perches.com> Cc: James Morris <jmorris@namei.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
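The awkwardness described here comes from an array that stores elements by value: a pointer has to be passed by address on put and dereferenced once more on get. The following userspace sketch illustrates the idea with a toy fixed-size array. All toy_* names are hypothetical stand-ins and are not the flex_array API; only the shape of the two pointer helpers is meant to mirror what the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 8

/* Toy stand-in for an array that stores its elements by value. */
struct toy_array {
    size_t element_size;
    unsigned char data[TOY_SLOTS * sizeof(void *)];
};

static void toy_init(struct toy_array *a)
{
    a->element_size = sizeof(void *);
    memset(a->data, 0, sizeof(a->data));
}

/* Copy element_size bytes from src into slot nr. */
static int toy_put(struct toy_array *a, unsigned int nr, const void *src)
{
    if (nr >= TOY_SLOTS)
        return -1;
    memcpy(a->data + nr * a->element_size, src, a->element_size);
    return 0;
}

/* Return the address of slot nr, or NULL if nr is out of range. */
static void *toy_get(struct toy_array *a, unsigned int nr)
{
    if (nr >= TOY_SLOTS)
        return NULL;
    return a->data + nr * a->element_size;
}

/* Pointer helper: the caller passes ptr, the helper passes &ptr. */
static int toy_put_ptr(struct toy_array *a, unsigned int nr, void *ptr)
{
    assert(a->element_size == sizeof(void *));
    return toy_put(a, nr, &ptr);
}

/* Pointer helper: hides the extra dereference and the casting. */
static void *toy_get_ptr(struct toy_array *a, unsigned int nr)
{
    void *slot = toy_get(a, nr);
    void *ptr;

    if (!slot)
        return NULL;
    memcpy(&ptr, slot, sizeof(ptr)); /* read the stored pointer value */
    return ptr;
}
```

Callers of the helpers never see the slot address, so the "remember to pass &ptr" rule lives in exactly one place.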
2010-08-09  lib: vsprintf: useless strlen() removed  Michal Nazarewicz
The strict_strtoul() and strict_strtoull() functions used strlen() to check the argument's length in a situation where it wasn't strictly necessary. Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Cc: "Yi Yang" <yi.y.yang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
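One way a strict parser can avoid strlen() is to look only at the characters that follow the parsed digits. The sketch below is a userspace approximation, not the kernel code: strict_parse_ul() is a hypothetical name, and it relies on the C library's strtoul(), which, unlike the kernel's simple_strtoul(), also accepts leading whitespace and a sign.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical strict parser: accepts a number optionally followed by
 * exactly one trailing newline, and rejects everything else.
 * No strlen() call is needed; only the tail is inspected.
 */
static int strict_parse_ul(const char *cp, unsigned int base,
                           unsigned long *res)
{
    char *tail;
    unsigned long val;

    assert(cp != NULL && res != NULL);

    *res = 0;
    if (*cp == '\0')
        return -EINVAL;         /* empty string */

    val = strtoul(cp, &tail, (int)base);
    if (tail == cp)
        return -EINVAL;         /* no digits consumed */

    /* End of string, or a single newline followed by end of string. */
    if (tail[0] == '\0' || (tail[0] == '\n' && tail[1] == '\0')) {
        *res = val;
        return 0;
    }
    return -EINVAL;
}
```

Checking tail[0] and tail[1] costs two byte reads, whereas strlen() walks the whole input before parsing even starts.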
2010-08-09  list debugging: warn when deleting a deleted entry  Baruch Siach
Use the magic LIST_POISON* values to detect an incorrect use of list_del on a deleted entry. This DEBUG_LIST specific warning is easier to understand than the generic Oops message caused by LIST_POISON dereference. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
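The check can be illustrated in userspace with a minimal doubly linked list. This is a sketch under assumptions: the poison constants are taken to be the values used by kernel headers of that period, checked_list_del() is a hypothetical name, and the message printed to stderr stands in for the kernel's warning machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Assumed poison values; they are compared but never dereferenced. */
#define LIST_POISON1 ((struct list_head *)0x00100100)
#define LIST_POISON2 ((struct list_head *)0x00200200)

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert item right after head. */
static void list_add(struct list_head *item, struct list_head *head)
{
    assert(item != NULL && head != NULL);
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/*
 * Hypothetical debug variant of list_del(): returns false and prints a
 * warning if the entry already carries the poison left by a prior delete.
 */
static bool checked_list_del(struct list_head *entry)
{
    if (entry->next == LIST_POISON1 || entry->prev == LIST_POISON2) {
        fprintf(stderr, "list_del: entry %p was already deleted\n",
                (void *)entry);
        return false;
    }
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = LIST_POISON1;   /* mark as deleted */
    entry->prev = LIST_POISON2;
    return true;
}
```

Without the check, the second delete would dereference a poison address and crash with a far less specific fault.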
2010-08-09  iommu: inline iommu_num_pages  Anton Blanchard
A profile of a network benchmark showed iommu_num_pages rather high up:

    0.52% iommu_num_pages

Looking at the profile, an integer divide is taking almost all of the time:

        %  :  c000000000376ea4 <.iommu_num_pages>:
     1.93  :  c000000000376ea4:  fb e1 ff f8  std    r31,-8(r1)
     0.00  :  c000000000376ea8:  f8 21 ff c1  stdu   r1,-64(r1)
     0.00  :  c000000000376eac:  7c 3f 0b 78  mr     r31,r1
     3.86  :  c000000000376eb0:  38 84 ff ff  addi   r4,r4,-1
     0.00  :  c000000000376eb4:  38 05 ff ff  addi   r0,r5,-1
     0.00  :  c000000000376eb8:  7c 84 2a 14  add    r4,r4,r5
    46.95  :  c000000000376ebc:  7c 00 18 38  and    r0,r0,r3
    45.66  :  c000000000376ec0:  7c 84 02 14  add    r4,r4,r0
     0.00  :  c000000000376ec4:  7c 64 2b 92  divdu  r3,r4,r5
     0.00  :  c000000000376ec8:  38 3f 00 40  addi   r1,r31,64
     0.00  :  c000000000376ecc:  eb e1 ff f8  ld     r31,-8(r1)
     1.61  :  c000000000376ed0:  4e 80 00 20  blr

Since every caller of iommu_num_pages passes in a constant power of two we can inline this such that the divide is replaced by a shift. The entire function is only a few instructions once optimised, so it is a good candidate for inlining overall.

Signed-off-by: Anton Blanchard <anton@samba.org> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
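The arithmetic visible in the disassembly (mask the address with the page size minus one, add the length, round up, divide) can be written out as below. This is a sketch reconstructed from the instruction sequence, not a copy of the kernel source; num_pages_shift() is a hypothetical helper showing the shift form that a compiler can produce once the page size is a constant power of two at the call site.

```c
#include <assert.h>
#include <limits.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Number of I/O pages touched by the range [addr, addr + len).
 * With a non-constant io_page_size this needs a real divide.
 */
static inline unsigned long iommu_num_pages(unsigned long addr,
                                            unsigned long len,
                                            unsigned long io_page_size)
{
    /* Offset of addr within its page, plus the length of the range. */
    unsigned long size = (addr & (io_page_size - 1)) + len;

    return DIV_ROUND_UP(size, io_page_size);
}

/*
 * Hypothetical equivalent for a power-of-two page size given as a
 * shift: the divide becomes a right shift.
 */
static inline unsigned long num_pages_shift(unsigned long addr,
                                            unsigned long len,
                                            unsigned int page_shift)
{
    unsigned long page_size, size;

    assert(page_shift < sizeof(unsigned long) * CHAR_BIT);

    page_size = 1UL << page_shift;
    size = (addr & (page_size - 1)) + len;
    return (size + page_size - 1) >> page_shift;
}
```

For example, a 0x20-byte range starting at 0x1ff0 with 4096-byte pages straddles a page boundary, so both forms report two pages.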