aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2006-01-08[PATCH] cpuset: minor spacing initializer fixesPaul Jackson
Four trivial cpuset fixes: remove extra spaces, remove useless initializers, mark one __read_mostly. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] cpuset: memory pressure meterPaul Jackson
Provide a simple per-cpuset metric of memory pressure, tracking the -rate- that the tasks in a cpuset call try_to_free_pages(), the synchronous (direct) memory reclaim code. This enables batch managers monitoring jobs running in dedicated cpusets to efficiently detect what level of memory pressure that job is causing. This is useful both on tightly managed systems running a wide mix of submitted jobs, which may choose to terminate or reprioritize jobs that are trying to use more memory than allowed on the nodes assigned them, and with tightly coupled, long running, massively parallel scientific computing jobs that will dramatically fail to meet required performance goals if they start to use more memory than allowed to them. This patch just provides a very economical way for the batch manager to monitor a cpuset for signs of memory pressure. It's up to the batch manager or other user code to decide what to do about it and take action. ==> Unless this feature is enabled by writing "1" to the special file /dev/cpuset/memory_pressure_enabled, the hook in the rebalance code of __alloc_pages() for this metric reduces to simply noticing that the cpuset_memory_pressure_enabled flag is zero. So only systems that enable this feature will compute the metric. Why a per-cpuset, running average: Because this meter is per-cpuset, rather than per-task or mm, the system load imposed by a batch scheduler monitoring this metric is sharply reduced on large systems, because a scan of the tasklist can be avoided on each set of queries. Because this meter is a running average, instead of an accumulating counter, a batch scheduler can detect memory pressure with a single read, instead of having to read and accumulate results for a period of time. Because this meter is per-cpuset rather than per-task or mm, the batch scheduler can obtain the key information, memory pressure in a cpuset, with a single read, rather than having to query and accumulate results over all the (dynamically changing) set of tasks in the cpuset. A per-cpuset simple digital filter (requires a spinlock and 3 words of data per-cpuset) is kept, and updated by any task attached to that cpuset, if it enters the synchronous (direct) page reclaim code. A per-cpuset file provides an integer number representing the recent (half-life of 10 seconds) rate of direct page reclaims caused by the tasks in the cpuset, in units of reclaims attempted per second, times 1000. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] cpuset: mempolicy one more nodemask conversionPaul Jackson
Finish converting mm/mempolicy.c from bitmaps to nodemasks. The previous conversion had left one routine using bitmaps, since it involved a corresponding change to kernel/cpuset.c Fix that interface by replacing with a simple macro that calls nodes_subset(), or if !CONFIG_CPUSET, returns (1). Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <christoph@lameter.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Simpler signal-exit concurrency handlingPaul E. McKenney
Some simplification in checking signal delivery against concurrent exit. Instead of using get_task_struct_rcu(), which increments the task_struct reference count, check the reference count after acquiring sighand lock. Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] RCU signal handlingIngo Molnar
RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked instead of tasklist_lock read-locked. This is a scalability improvement on SMP and a preemption-latency improvement under PREEMPT_RCU. Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: William Irwin <wli@holomorphy.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Change maxaligned_in_smp alignemnt macros to internodealigned_in_smp ↵Ravikiran G Thirumalai
macros ____cacheline_maxaligned_in_smp is currently used to align critical structures and avoid false sharing. It uses per-arch L1_CACHE_SHIFT_MAX and people find L1_CACHE_SHIFT_MAX useless. However, we have been using ____cacheline_maxaligned_in_smp to align structures on the internode cacheline size. As per Andi's suggestion, following patch kills ____cacheline_maxaligned_in_smp and introduces INTERNODE_CACHE_SHIFT, which defaults to L1_CACHE_SHIFT for all arches. Arches needing L3/Internode cacheline alignment can define INTERNODE_CACHE_SHIFT in the arch asm/cache.h. Patch replaces ____cacheline_maxaligned_in_smp with ____cacheline_internodealigned_in_smp With this patch, L1_CACHE_SHIFT_MAX can be killed Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] cpusets: swap migration interfacePaul Jackson
Add a boolean "memory_migrate" to each cpuset, represented by a file containing "0" or "1" in each directory below /dev/cpuset. It defaults to false (file contains "0"). It can be set true by writing "1" to the file. If true, then anytime that a task is attached to the cpuset so marked, the pages of that task will be moved to that cpuset, preserving, to the extent practical, the cpuset-relative placement of the pages. Also anytime that a cpuset so marked has its memory placement changed (by writing to its "mems" file), the tasks in that cpuset will have their pages moved to the cpusets new nodes, preserving, to the extent practical, the cpuset-relative placement of the moved pages. Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Swap Migration V5: sys_migrate_pages interfaceChristoph Lameter
sys_migrate_pages implementation using swap based page migration This is the original API proposed by Ray Bryant in his posts during the first half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org. The intent of sys_migrate is to migrate memory of a process. A process may have migrated to another node. Memory was allocated optimally for the prior context. sys_migrate_pages allows to shift the memory to the new node. sys_migrate_pages is also useful if the processes available memory nodes have changed through cpuset operations to manually move the processes memory. Paul Jackson is working on an automated mechanism that will allow an automatic migration if the cpuset of a process is changed. However, a user may decide to manually control the migration. This implementation is put into the policy layer since it uses concepts and functions that are also needed for mbind and friends. The patch also provides a do_migrate_pages function that may be useful for cpusets to automatically move memory. sys_migrate_pages does not modify policies in contrast to Ray's implementation. The current code here is based on the swap based page migration capability and thus is not able to preserve the physical layout relative to it containing nodeset (which may be a cpuset). When direct page migration becomes available then the implementation needs to be changed to do a isomorphic move of pages between different nodesets. The current implementation simply evicts all pages in source nodeset that are not in the target nodeset. Patch supports ia64, i386 and x86_64. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] add schedule_on_each_cpu()Christoph Lameter
swap migration's isolate_lru_page() currently uses an IPI to notify other processors that the lru caches need to be drained if the page cannot be found on the LRU. The IPI interrupt may interrupt a processor that is just processing lru requests and cause a race condition. This patch introduces a new function run_on_each_cpu() that uses the keventd() to run the LRU draining on each processor. Processors disable preemption when dealing the LRU caches (these are per processor) and thus executing LRU draining from another process is safe. Thanks to Lee Schermerhorn <lee.schermerhorn@hp.com> for finding this race condition. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] Make high and batch sizes of per_cpu_pagelists configurableRohit Seth
As recently there has been lot of traffic on the right values for batch and high water marks for per_cpu_pagelists. This patch makes these two variables configurable through /proc interface. A new tunable /proc/sys/vm/percpu_pagelist_fraction is added. This entry controls the fraction of pages at most in each zone that are allocated for each per cpu page list. The min value for this is 8. It means that we don't allow more than 1/8th of pages in each zone to be allocated in any single per_cpu_pagelist. The batch value of each per cpu pagelist is also updated as a result. It is set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8) Signed-off-by: Rohit Seth <rohit.seth@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] drop-pagecacheAndrew Morton
Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to discard as much pagecache and/or reclaimable slab objects as it can. THis operation requires root permissions. It won't drop dirty data, so the user should run `sync' first. Caveats: a) Holds inode_lock for exorbitant amounts of time. b) Needs to be taught about NUMA nodes: propagate these all the way through so the discarding can be controlled on a per-node basis. This is a debugging feature: useful for getting consistent results between filesystem benchmarks. We could possibly put it under a config option, but it's less than 300 bytes. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-09[PATCH] powerpc: Add arch-dependent copy_oldmem_pageMichael Ellerman
Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] spufs: The SPU file system, baseArnd Bergmann
This is the current version of the spu file system, used for driving SPEs on the Cell Broadband Engine. This release is almost identical to the version for the 2.6.14 kernel posted earlier, which is available as part of the Cell BE Linux distribution from http://www.bsc.es/projects/deepcomputing/linuxoncell/. The first patch provides all the interfaces for running spu application, but does not have any support for debugging SPU tasks or for scheduling. Both these functionalities are added in the subsequent patches. See Documentation/filesystems/spufs.txt on how to use spufs. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-06[PATCH] Fix posix-cpu-timers sched_time accumulationDavid S. Miller
I've spent the past 3 days digging into a glibc testsuite failure in current CVS, specifically libc/rt/tst-cputimer1.c The thr1 and thr2 timers fire too early in the second pass of this test. The second pass is noteworthy because it makes use of intervals, whereas the first pass does not. All throughout the posix-cpu-timers.c code, the calculation of the process sched_time sum is implemented roughly as: unsigned long long sum; sum = tsk->signal->sched_time; t = tsk; do { sum += t->sched_time; t = next_thread(t); } while (t != tsk); In fact this is the exact scheme used by check_process_timers(). In the case of check_process_timers(), current->sched_time has just been updated (via scheduler_tick(), which is invoked by update_process_times(), which subsequently invokes run_posix_cpu_timers()) So there is no special processing necessary wrt. that. In other contexts, we have to allot for the fact that tsk->sched_time might be a bit out of date if we are current. And the posix-cpu-timers.c code uses current_sched_time() to deal with that. Unfortunately it does so in an erroneous and inconsistent manner in one spot which is what results in the early timer firing. In cpu_clock_sample_group_locked(), it does this: cpu->sched = p->signal->sched_time; /* Add in each other live thread. */ while ((t = next_thread(t)) != p) { cpu->sched += t->sched_time; } if (p->tgid == current->tgid) { /* * We're sampling ourselves, so include the * cycles not yet banked. We still omit * other threads running on other CPUs, * so the total can always be behind as * much as max(nthreads-1,ncpus) * (NSEC_PER_SEC/HZ). */ cpu->sched += current_sched_time(current); } else { cpu->sched += p->sched_time; } The problem is the "p->tgid == current->tgid" test. If "p" is not current, and the tgids are the same, we will add the process t->sched_time twice into cpu->sched and omit "p"'s sched_time which is very very very wrong. posix-cpu-timers.c has a helper function, sched_ns(p) which takes care of this, so my fix is to use that here instead of this special tgid test. The fact that current can be one of the sub-threads of "p" points out that we could make things a little bit more accurate, perhaps by using sched_ns() on every thread we process in these loops. It also points out that we don't use the most accurate value for threads in the group actively running other cpus (and this is mentioned in the comment). But that is a future enhancement, and this fix here definitely makes sense. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] kernel/module.c: removed dead codeJayachandran C
This patch fixes an issue reported by Coverity in kernel/module.c Error reported: Cannot reach this line of code "else return ptr;" Patch description: This is the error path, so 'err' will be negative, the else case is not required, this patch removes it. Signed-off-by: Jayachandran C. <c.jayachandran@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] s390: cleanup KconfigMartin Schwidefsky
Sanitize some s390 Kconfig options. We have ARCH_S390, ARCH_S390X, ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT. Replace these 6 options by S390, 64BIT and COMPAT. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] s390: cputime_t fixesMartin Schwidefsky
There are some more places where the use of cputime_t instead of an integer type and the associated macros is necessary for the virtual cputime accounting on s390. Affected are the s390 specific appldata code and BSD process accounting. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] swsusp: save image header firstRafael J. Wysocki
This makes the swsusp_info structure become the header of the image in the literal sense (ie. it is saved to the swap and read before any other image data with the help of the swsusp's swap map structure, so generally it is treated in the same way as the rest of the image). The main thing it does is to make swsusp_header contain the offset of the swap map used to track the image data pages rather than the offset of swsusp_info.  Simultaneously, swsusp_info becomes the first image page written to the swap. The other changes are generally consequences of the above with a few exceptions (there's some consolidation in the image reading part as a few functions turn into trivial wrappers around something else). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] swsusp: improve handling of swap partitionsRafael J. Wysocki
This changes the handling of swap partitions by swsusp to avoid locking of the swap devices that are not used for suspend and, consequently, simplifies the code. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] swsusp: make image size limit tunableRafael J. Wysocki
Make the suspend image size limit tunable via /sys/power/image_size. It is necessary for systems on which there is a limited amount of swap available for suspend. It can also be useful for optimizing performance of swsusp on systems with 1 GB of RAM or more. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] swsusp: limit image sizeRafael J. Wysocki
Limit the size of the suspend image to approx. 500 MB, which should improve the overall performance of swsusp on systems with more than 1 GB of RAM. It introduces the constant IMAGE_SIZE that can be set to the preferred size of the image (in MB) and modifies the memory-shrinking part of swsusp to take this constant into account (500 is the default value of IMAGE_SIZE). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] swsusp: Drop duplicate prototypesPavel Machek
These two prototypes are already present in sched.h, remove duplicate version. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] swsusp: fix enough_free_memRafael J. Wysocki
This patch fixes a problem with the function enough_free_mem() used by swsusp to verify if there is a sufficient number of memory pages available to it to create and save the suspend image. Namely, enough_free_mem() uses nr_free_pages() to obtain the number of free memory pages, which is incorrect, because this function returns the total number of free pages, including free highmem pages, and the highmem pages cannot be used by swsusp for storing the image data. The patch makes enough_free_mem() avoid counting the free highmem pages as available to swsusp. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] swsusp: improve freeing of memoryRafael J. Wysocki
This patch makes swsusp free only as much memory as needed to complete the suspend and not as much as possible.  In the most of cases this should speed up the suspend and make the system much more responsive after resume, especially if a GUI (eg. X Windows) is used. If needed, the old behavior (ie to free as much memory as possible during suspend) can be restored by unsetting FAST_FREE in power.h Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] swsusp: introduce the swap map structureRafael J. Wysocki
This patch introduces the swap map structure that can be used by swsusp for keeping tracks of data pages written to the swap.  The structure itself is described in a comment within the patch. The overall idea is to reduce the amount of metadata written to the swap and to write and read the image pages sequentially, in a file-alike way. This makes the swap-handling part of swsusp fairly independent of its snapshot-handling part and will hopefully allow us to completely separate these two parts in the future. This patch is needed to remove the suspend image size limit imposed by the limited size of the swsusp_info structure, which is essential for x86-64 systems with more than 512 MB of RAM. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] swsusp: remove encryptionRafael J. Wysocki
This patch removes the image encryption that is only used by swsusp instead of zeroing the image after resume in order to prevent someone from reading some confidential data from it in the future and it does not protect the image from being read by an unauthorized person before resume. The functionality it provides should really belong to the user space and will possibly be reimplemented after the swap-handling functionality of swsusp is moved to the user space. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] Alpha: convert to generic irq framework (generic part)Ivan Kokshaysky
Thanks to Christoph for doing most of the work. This allows automatic SMP IRQ affinity assignment other than default "all interrupts on all CPUs" which is rather expensive. This might be useful if the hardware can be programmed to distribute interrupts among different CPUs, like Alpha does. Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Christoph Hellwig <hch@lst.de> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] FRV: Make futex code compilable on nommu [try #2]David Howells
Make the futex code compilable and usable on NOMMU by making the attempt to handle page faults conditional on CONFIG_MMU. If this is not enabled, then we can assume that EFAULT returned from futex_atomic_op_inuser() is not recoverable, and that the address lies outside of valid memory. handle_mm_fault() is made to BUG if called on NOMMU without attempting to invoke the actual handler (__handle_mm_fault). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] swsusp: resume_store() retval fixAndrew Morton
- This function returns -EINVAL all the time. Fix. - Decruftify it a bit too. - Writing to it doesn't seem to do what it's suppoed to do. Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds
Trivial manual merge fixup for usb_find_interface clashes.
2006-01-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuildLinus Torvalds
2006-01-04[PATCH] kobject_uevent CONFIG_NET=n fixakpm@osdl.org
lib/lib.a(kobject_uevent.o)(.text+0x25f): In function `kobject_uevent': : undefined reference to `__alloc_skb' lib/lib.a(kobject_uevent.o)(.text+0x2a1): In function `kobject_uevent': : undefined reference to `skb_over_panic' lib/lib.a(kobject_uevent.o)(.text+0x31d): In function `kobject_uevent': : undefined reference to `skb_over_panic' lib/lib.a(kobject_uevent.o)(.text+0x356): In function `kobject_uevent': : undefined reference to `netlink_broadcast' lib/lib.a(kobject_uevent.o)(.init.text+0x9): In function `kobject_uevent_init': : undefined reference to `netlink_kernel_create' make: *** [.tmp_vmlinux1] Error 1 Netlink is unconditionally enabled if CONFIG_NET, so that's OK. kobject_uevent.o is compiled even if !CONFIG_HOTPLUG, which is lazy. Let's compound the sin. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04[PATCH] driver core: replace "hotplug" by "uevent"Kay Sievers
Leave the overloaded "hotplug" word to susbsystems which are handling real devices. The driver core does not "plug" anything, it just exports the state to userspace and generates events. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04[PATCH] add uevent_helper control in /sys/kernel/Kay Sievers
This deprecates the /proc/sys/kernel/hotplug file, as all this stuff should be in /sys some day, right? :) In /sys/kernel/ we have now uevent_seqnum and uevent_helper. The seqnum is no longer used by udev, as the version for this kernel depends on netlink which events will never get out-of-order. Recent udev versions disable the /sbin/hotplug helper with an init script, cause it leads to OOM on big boxes by running hundreds of shells in parallel. It should be done now by: echo "" > /sys/kernel/uevent_helper (Note that "-n" does not work, cause neighter proc nor sysfs support truncate().) Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04[PATCH] remove CONFIG_KOBJECT_UEVENT optionKay Sievers
It makes zero sense to have hotplug, but not the netlink events enabled today. Remove this option and merge the kobject_uevent.h header into the kobject.h header file. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-03update the email address of Randy DunlapAdrian Bunk
This patch removes all references to the bouncing address rddunlap@osdl.org and one dead web page from the kernel. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Randy Dunlap <rdunlap@xenotime.net>
2006-01-03gitignore: ignore more generated files
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2005-12-31sysctl: make sure to terminate strings with a NULLinus Torvalds
This is a slightly more complete fix for the previous minimal sysctl string fix. It always terminates the returned string with a NUL, even if the full result wouldn't fit in the user-supplied buffer. The returned length is the full untruncated length, so that you can tell when truncation has occurred. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-30[PATCH] Fix false old value return of sysctlYi Yang
For the sysctl syscall, if the user wants to get the old value of a sysctl entry and set a new value for it in the same syscall, the old value is always overwritten by the new value if the sysctl entry is of string type and if the user sets its strategy to sysctl_string. This issue lies in the strategy being run twice if the strategy is set to sysctl_string, the general strategy sysctl_string always returns 0 if success. Such strategy routines as sysctl_jiffies and sysctl_jiffies_ms return 1 because they do read and write for the sysctl entry. The strategy routine sysctl_string return 0 although it actually read and write the sysctl entry. According to my analysis, if a strategy routine do read and write, it should return 1, if it just does some necessary check but not read and write, it should return 0, for example sysctl_intvec. Signed-off-by: Yi Yang <yang.y.yi@gmail.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-30sysctl: don't overflow the user-supplied buffer with '\0'Linus Torvalds
If the string was too long to fit in the user-supplied buffer, the sysctl layer would zero-terminate it by writing past the end of the buffer. Don't do that. Noticed by Yi Yang <yang.y.yi@gmail.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-24[PATCH] Fix memory ordering problem in wake_futex()Andrew Morton
Fix a memory ordering problem that occurs on IA64. The "store" to q->lock_ptr in wake_futex() can become visible before wake_up_all() clears the lock in the futex_q. Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-20[PATCH] kernel/params.c: fix sysfs access with CONFIG_MODULES=nJason Wessel
All the work was done to setup the file and maintain the file handles but the access functions were zeroed out due to the #ifdef. Removing the #ifdef allows full access to all the parameters when CONFIG_MODULES=n. akpm: put it back again, but use CONFIG_SYSFS instead. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] kprobes: increment kprobe missed count for multiprobesKeshavamurthy Anil S
When multiple probes are registered at the same address and if due to some recursion (probe getting triggered within a probe handler), we skip calling pre_handlers and just increment nmissed field. The below patch make sure it walks the list for multiple probes case. Without the below patch we get incorrect results of nmissed count for multiple probe case. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] kprobes: no probes on critical pathKeshavamurthy Anil S
For Kprobes critical path is the path from debug break exception handler till the control reaches kprobes exception code. No probes can be supported in this path as we will end up in recursion. This patch prevents this by moving the below function to safe __kprobes section onto which no probes can be inserted. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] Add try_to_freeze to kauditdPierre Ossman
kauditd was causing suspends to fail because it refused to freeze. Adding a try_to_freeze() to its sleep loop solves the issue. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx> Acked-by: Pavel Machek <pavel@suse.cz> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] kprobes: fix race in aggregate kprobe registrationKeshavamurthy Anil S
When registering multiple kprobes at the same address, we leave a small window where the kprobe hlist will not contain a reference to the registered kprobe, leading to potentially, a system crash if the breakpoint is hit on another processor. Patch below now automically relpace the old kprobe with the new kprobe from the hash list. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] Add getnstimestamp functionMatt Helsley
There are several functions that might seem appropriate for a timestamp: get_cycles() current_kernel_time() do_gettimeofday() <read jiffies/jiffies_64> Each has problems with combinations of SMP-safety, low resolution, and monotonicity. This patch adds a new function that returns a monotonic SMP-safe timestamp with nanosecond resolution where available. Changes: Split timestamp into separate patch Moved to kernel/time.c Renamed to getnstimestamp Fixed unintended-pointer-arithmetic bug Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] Fix RCU race in access of nohz_cpu_maskSrivatsa Vaddagiri
Accessing nohz_cpu_mask before incrementing rcp->cur is racy. It can cause tickless idle CPUs to be included in rsp->cpumask, which will extend graceperiods unnecessarily. Fix this race. It has been tested using extensions to RCU torture module that forces various CPUs to become idle. Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] Fix bug in RCU torture testSrivatsa Vaddagiri
While doing some test of RCU torture module, I hit a OOPS in rcu_do_batch, which was trying to processes callback of a module that was just removed. This is because we weren't waiting long enough for all callbacks to fire. Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] add rcu_barrier() synchronization pointDipankar Sarma
This introduces a new interface - rcu_barrier() which waits until all the RCUs queued until this call have been completed. Reiser4 needs this, because we do more than just freeing memory object in our RCU callback: we also remove it from the list hanging off super-block. This means, that before freeing reiser4-specific portion of super-block (during umount) we have to wait until all pending RCU callbacks are executed. The only change of reiser4 made to the original patch, is exporting of rcu_barrier(). Cc: Hans Reiser <reiser@namesys.com> Cc: Vladimir V. Saveliev <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>