aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2009-09-24hugetlb_file_setup(): use C, not cppAndrew Morton
Why macros are always wrong: mm/mmap.c: In function 'do_mmap_pgoff': mm/mmap.c:953: warning: unused variable 'user' also, move a couple of struct forward-decls outside `#ifdef CONFIG_HUGETLB_PAGE' - it's pointless and frequently harmful to make these conditional (eg, this patch needed `struct user_struct'). Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24genetlink: fix netns vs. netlink table locking (2)Johannes Berg
Similar to commit d136f1bd366fdb7e747ca7e0218171e7a00a98a5, there's a bug when unregistering a generic netlink family, which is caught by the might_sleep() added in that commit: BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183 in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod 2 locks held by rmmod/1510: #0: (genl_mutex){+.+.+.}, at: [<ffffffff8138283b>] genl_unregister_family+0x2b/0x130 #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120 Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444 Call Trace: [<ffffffff81044ff9>] __might_sleep+0x119/0x150 [<ffffffff81380501>] netlink_table_grab+0x21/0x100 [<ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60 [<ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120 [<ffffffff81382866>] genl_unregister_family+0x56/0x130 [<ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211] [<ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211] Fix in the same way by grabbing the netlink table lock before doing rcu_read_lock(). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-24tunnel: eliminate recursion fieldEric Dumazet
It seems recursion field from "struct ip_tunnel" is not anymore needed. recursion prevention is done at the upper level (in dev_queue_xmit()), since we use HARD_TX_LOCK protection for tunnels. This avoids a cache line ping pong on "struct ip_tunnel" : This structure should be now mostly read on xmit and receive paths. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-24Phonet: error on broadcast sending (unimplemented)Rémi Denis-Courmont
If we ever implement this, then we can stop returning an error. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-24Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller
Conflicts: drivers/staging/Kconfig drivers/staging/Makefile drivers/staging/cpc-usb/TODO drivers/staging/cpc-usb/cpc-usb_drv.c drivers/staging/cpc-usb/cpc.h drivers/staging/cpc-usb/cpc_int.h drivers/staging/cpc-usb/cpcusb.h
2009-09-24Merge branch 'origin' into for-linusRussell King
Conflicts: MAINTAINERS
2009-09-24Merge branch 'drm-intel-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel * 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (57 commits) drm/i915: Handle ERESTARTSYS during page fault drm/i915: Warn before mmaping a purgeable buffer. drm/i915: Track purged state. drm/i915: Remove eviction debug spam drm/i915: Immediately discard any backing storage for uneeded objects drm/i915: Do not mis-classify clean objects as purgeable drm/i915: Whitespace correction for madv drm/i915: BUG_ON page refleak during unbind drm/i915: Search harder for a reusable object drm/i915: Clean up evict from list. drm/i915: Add tracepoints drm/i915: framebuffer compression for GM45+ drm/i915: split display functions by chip type drm/i915: Skip the sanity checks if the current relocation is valid drm/i915: Check that the relocation points to within the target drm/i915: correct FBC update when pipe base update occurs drm/i915: blacklist Acer AspireOne lid status ACPI: make ACPI button funcs no-ops if not built in drm/i915: prevent FIFO calculation overflows on 32 bits with high dotclocks drm/i915: intel_display.c handle latency variable efficiently ... Fix up trivial conflicts in drivers/gpu/drm/i915/{i915_dma.c|i915_drv.h}
2009-09-24Merge branch 'linux-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits) x86/PCI: make 32 bit NUMA node array int, not unsigned char x86/PCI: default pcibus cpumask to all cpus if it lacks affinity MAINTAINTERS: remove hotplug driver entries PCI: pciehp: remove slot capabilities definitions PCI: pciehp: remove error message definitions PCI: pciehp: remove number field PCI: pciehp: remove hpc_ops PCI: pciehp: remove pci_dev field PCI: pciehp: remove crit_sect mutex PCI: pciehp: remove slot_bus field PCI: pciehp: remove first_slot field PCI: pciehp: remove slot_device_offset field PCI: pciehp: remove hp_slot field PCI: pciehp: remove device field PCI: pciehp: remove bus field PCI: pciehp: remove slot_num_inc field PCI: pciehp: remove num_slots field PCI: pciehp: remove slot_list field PCI: fix VGA arbiter header file PCI: Disable AER with pci=nomsi ... Fixed up trivial conflicts in MAINTAINERS
2009-09-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: module: don't call percpu_modfree on NULL pointer. module: fix memory leak when load fails after srcversion/version allocated module: preferred way to use MODULE_AUTHOR param: allow whitespace as kernel parameter separator module: reduce string table for loaded modules (v2) module: reduce symbol table for loaded modules (v2)
2009-09-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: truncate: use new helpers truncate: new helpers fs: fix overflow in sys_mount() for in-kernel calls fs: Make unload_nls() NULL pointer safe freeze_bdev: grab active reference to frozen superblocks freeze_bdev: kill bd_mount_sem exofs: remove BKL from super operations fs/romfs: correct error-handling code vfs: seq_file: add helpers for data filling vfs: remove redundant position check in do_sendfile vfs: change sb->s_maxbytes to a loff_t vfs: explicitly cast s_maxbytes in fiemap_check_ranges libfs: return error code on failed attr set seq_file: return a negative error code when seq_path_root() fails. vfs: optimize touch_time() too vfs: optimization for touch_atime() vfs: split generic_forget_inode() so that hugetlbfs does not have to copy it fs/inode.c: add dev-id and inode number for debugging in init_special_inode() libfs: make simple_read_from_buffer conventional
2009-09-25module: preferred way to use MODULE_AUTHORJohannes Berg
For the longest time now we've been using multiple MODULE_AUTHOR() statements when a module has more than one author, but the comment here disagrees. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Luciano Coelho <luciano.coelho@nokia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25module: reduce string table for loaded modules (v2)Jan Beulich
Also remove all parts of the string table (referenced by the symbol table) that are not needed for kallsyms use (i.e. which were only referenced by symbols discarded by the previous patch, or not referenced at all for whatever reason). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25module: reduce symbol table for loaded modules (v2)Jan Beulich
Discard all symbols not interesting for kallsyms use: absolute, section, and in the common case (!KALLSYMS_ALL) data ones. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-24Merge branch 'for-linus' of git://neil.brown.name/mdLinus Torvalds
* 'for-linus' of git://neil.brown.name/md: (97 commits) md: raid-1/10: fix RW bits manipulation md: remove unnecessary memset from multipath. md: report device as congested when suspended md: Improve name of threads created by md_register_thread md: remove sparse warnings about lock context. md: remove sparse waring "symbol xxx shadows an earlier one" async_tx/raid6: add missing dma_unmap calls to the async fail case ioat3: fix uninitialized var warnings drivers/dma/ioat/dma_v2.c: fix warnings raid6test: fix stack overflow ioat2: clarify ring size limits md/raid6: cleanup ops_run_compute6_2 md/raid6: eliminate BUG_ON with side effect dca: module load should not be an error message ioat: driver version 4.0 dca: registering requesters in multiple dca domains async_tx: remove HIGHMEM64G restriction dmaengine: sh: Add Support SuperH DMA Engine driver dmaengine: Move all map_sg/unmap_sg for slave channel to its client fsldma: Add DMA_SLAVE support ...
2009-09-24Merge branch 'hwpoison' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits) HWPOISON: Enable error_remove_page on btrfs HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs HWPOISON: Add madvise() based injector for hardware poisoned pages v4 HWPOISON: Enable error_remove_page for NFS HWPOISON: Enable .remove_error_page for migration aware file systems HWPOISON: The high level memory error handler in the VM v7 HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process HWPOISON: shmem: call set_page_dirty() with locked page HWPOISON: Define a new error_remove_page address space op for async truncation HWPOISON: Add invalidate_inode_page HWPOISON: Refactor truncate to allow direct truncating of page v2 HWPOISON: check and isolate corrupted free pages v2 HWPOISON: Handle hardware poisoned pages in try_to_unmap HWPOISON: Use bitmask/action code for try_to_unmap behaviour HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2 HWPOISON: Add poison check to page fault handling HWPOISON: Add basic support for poisoned pages in fault handler v3 HWPOISON: Add new SIGBUS error codes for hardware poison signals HWPOISON: Add support for poison swap entries v2 HWPOISON: Export some rmap vma locking to outside world ...
2009-09-24task_struct cleanup: move binfmt field to mm_structHiroshi Shimamoto
Because the binfmt is not different between threads in the same process, it can be moved from task_struct to mm_struct. And binfmt moudle is handled per mm_struct instead of task_struct. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24include/linux/unaligned/{l,b}e_byteshift.h: fix usage for compressed kernelsAlbin Tonnerre
When unaligned accesses are required for uncompressing a kernel (such as for LZO decompression on ARM in a patch that follows), including <linux/kernel.h> causes issues as it brings in a lot of things that are not available in the decompression environment. linux/kernel.h brings at least: extern int console_printk[]; extern const char hex_asc[]; which causes errors at link-time as they are not available when compiling the pre-boot environement. There are also a few others: arch/arm/boot/compressed/misc.o: In function `valid_user_regs': arch/arm/include/asm/ptrace.h:158: undefined reference to `elf_hwcap' arch/arm/boot/compressed/misc.o: In function `console_silent': include/linux/kernel.h:292: undefined reference to `console_printk' arch/arm/boot/compressed/misc.o: In function `console_verbose': include/linux/kernel.h:297: undefined reference to `console_printk' arch/arm/boot/compressed/misc.o: In function `pack_hex_byte': include/linux/kernel.h:360: undefined reference to `hex_asc' arch/arm/boot/compressed/misc.o: In function `hweight_long': include/linux/bitops.h:45: undefined reference to `hweight32' arch/arm/boot/compressed/misc.o: In function `__cmpxchg_local_generic': include/asm-generic/cmpxchg-local.h:21: undefined reference to `wrong_size_cmpxchg' include/asm-generic/cmpxchg-local.h:42: undefined reference to `wrong_size_cmpxchg' arch/arm/boot/compressed/misc.o: In function `__xchg': arch/arm/include/asm/system.h:309: undefined reference to `__bad_xchg' However, those files apparently use nothing from <linux/kernel.h>, all they need is the declaration of types such as u32 or u64, so <linux/types.h> should be enough Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24aio: ifdef fields in mm_structAlexey Dobriyan
->ioctx_lock and ->ioctx_list are used only under CONFIG_AIO. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24linux/futex.h: place kernel types behind __KERNEL__Mike Frysinger
The forward decls for some kernel types are only needed by the code behind __KERNEL__, so don't bleed these types to userspace. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24sysctl: remove "struct file *" argument of ->proc_handlerAlexey Dobriyan
It's unused. It isn't needed -- read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places at arch/frv for some reason. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24signals: inline __fatal_signal_pendingRoland McGrath
__fatal_signal_pending inlines to one instruction on x86, probably two instructions on other machines. It takes two longer x86 instructions just to call it and test its return value, not to mention the function itself. On my random x86_64 config, this saved 70 bytes of text (59 of those being __fatal_signal_pending itself). Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24fcntl: add F_[SG]ETOWN_EXPeter Zijlstra
In order to direct the SIGIO signal to a particular thread of a multi-threaded application we cannot, like suggested by the manpage, put a TID into the regular fcntl(F_SETOWN) call. It will still be send to the whole process of which that thread is part. Since people do want to properly direct SIGIO we introduce F_SETOWN_EX. The need to direct SIGIO comes from self-monitoring profiling such as with perf-counters. Perf-counters uses SIGIO to notify that new sample data is available. If the signal is delivered to the same task that generated the new sample it can augment that data by inspecting the task's user-space state right after it returns from the kernel. This is esp. convenient for interpreted or virtual machine driven environments. Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex as argument: struct f_owner_ex { int type; pid_t pid; }; Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Tested-by: stephane eranian <eranian@googlemail.com> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Roland McGrath <roland@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24signals: introduce do_send_sig_info() helperOleg Nesterov
Introduce do_send_sig_info() and convert group_send_sig_info(), send_sig_info(), do_send_specific() to use this helper. Hopefully it will have more users soon, it allows to specify specific/group behaviour via "bool group" argument. Shaves 80 bytes from .text. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stephane eranian <eranian@googlemail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24exec: fix set_binfmt() vs sys_delete_module() raceOleg Nesterov
sys_delete_module() can set MODULE_STATE_GOING after search_binary_handler() does try_module_get(). In this case set_binfmt()->try_module_get() fails but since none of the callers check the returned error, the task will run with the wrong old ->binfmt. The proper fix should change all ->load_binary() methods, but we can rely on fact that the caller must hold a reference to binfmt->module and use __module_get() which never fails. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24signals: tracehook_notify_jctl changeRoland McGrath
This changes tracehook_notify_jctl() so it's called with the siglock held, and changes its argument and return value definition. These clean-ups make it a better fit for what new tracing hooks need to check. Tracing needs the siglock here, held from the time TASK_STOPPED was set, to avoid potential SIGCONT races if it wants to allow any blocking in its tracing hooks. This also folds the finish_stop() function into its caller do_signal_stop(). The function is short, called only once and only unconditionally. It aids readability to fold it in. [oleg@redhat.com: do not call tracehook_notify_jctl() in TASK_STOPPED state] [oleg@redhat.com: introduce tracehook_finish_jctl() helper] Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24ptrace: __ptrace_detach: do __wake_up_parent() if we reap the traceeOleg Nesterov
The bug is old, it wasn't cause by recent changes. Test case: static void *tfunc(void *arg) { int pid = (long)arg; assert(ptrace(PTRACE_ATTACH, pid, NULL, NULL) == 0); kill(pid, SIGKILL); sleep(1); return NULL; } int main(void) { pthread_t th; long pid = fork(); if (!pid) pause(); signal(SIGCHLD, SIG_IGN); assert(pthread_create(&th, NULL, tfunc, (void*)pid) == 0); int r = waitpid(-1, NULL, __WNOTHREAD); printf("waitpid: %d %m\n", r); return 0; } Before the patch this program hangs, after this patch waitpid() correctly fails with errno == -ECHILD. The problem is, __ptrace_detach() reaps the EXIT_ZOMBIE tracee if its ->real_parent is our sub-thread and we ignore SIGCHLD. But in this case we should wake up other threads which can sleep in do_wait(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24memory controller: soft limit reclaim on contentionBalbir Singh
Implement reclaim from groups over their soft limit Permit reclaim from memory cgroups on contention (via the direct reclaim path). memory cgroup soft limit reclaim finds the group that exceeds its soft limit by the largest number of pages and reclaims pages from it and then reinserts the cgroup into its correct place in the rbtree. Add additional checks to mem_cgroup_hierarchical_reclaim() to detect long loops in case all swap is turned off. The code has been refactored and the loop check (loop < 2) has been enhanced for soft limits. For soft limits, we try to do more targetted reclaim. Instead of bailing out after two loops, the routine now reclaims memory proportional to the size by which the soft limit is exceeded. The proportion has been empirically determined. [akpm@linux-foundation.org: build fix] [kamezawa.hiroyu@jp.fujitsu.com: fix softlimit css refcnt handling] [nishimura@mxp.nes.nec.co.jp: refcount of the "victim" should be decremented before exiting the loop] Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24memory controller: soft limit organize cgroupsBalbir Singh
Organize cgroups over soft limit in a RB-Tree Introduce an RB-Tree for storing memory cgroups that are over their soft limit. The overall goal is to 1. Add a memory cgroup to the RB-Tree when the soft limit is exceeded. We are careful about updates, updates take place only after a particular time interval has passed 2. We remove the node from the RB-Tree when the usage goes below the soft limit The next set of patches will exploit the RB-Tree to get the group that is over its soft limit by the largest amount and reclaim from it, when we face memory contention. [hugh.dickins@tiscali.co.uk: CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_PREEMPT=y fails to boot] Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24memory controller: soft limit interfaceBalbir Singh
Add an interface to allow get/set of soft limits. Soft limits for memory plus swap controller (memsw) is currently not supported. Resource counters have been enhanced to support soft limits and new type RES_SOFT_LIMIT has been added. Unlike hard limits, soft limits can be directly set and do not need any reclaim or checks before setting them to a newer value. Kamezawa-San raised a question as to whether soft limit should belong to res_counter. Since all resources understand the basic concepts of hard and soft limits, it is justified to add soft limits here. Soft limits are a generic resource usage feature, even file system quotas support soft limits. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24memcg: remove the overhead associated with the root cgroupBalbir Singh
Change the memory cgroup to remove the overhead associated with accounting all pages in the root cgroup. As a side-effect, we can no longer set a memory hard limit in the root cgroup. A new flag to track whether the page has been accounted or not has been added as well. Flags are now set atomically for page_cgroup, pcg_default_flags is now obsolete and removed. [akpm@linux-foundation.org: fix a few documentation glitches] Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24cgroups: let ss->can_attach and ss->attach do whole threadgroups at a timeBen Blum
Alter the ss->can_attach and ss->attach functions to be able to deal with a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a pre-patch to cgroup-procs-writable.patch.) Currently, new mode of the attach function can only tell the subsystem about the old cgroup of the threadgroup leader. No subsystem currently needs that information for each thread that's being moved, but if one were to be added (for example, one that counts tasks within a group) this bit would need to be reworked a bit to tell the subsystem the right information. [hidave.darkstar@gmail.com: fix build] Signed-off-by: Ben Blum <bblum@google.com> Signed-off-by: Paul Menage <menage@google.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24cgroups: change css_set freeing mechanism to be under RCUBen Blum
Changes css_set freeing mechanism to be under RCU This is a prepatch for making the procs file writable. In order to free the old css_sets for each task to be moved as they're being moved, the freeing mechanism must be RCU-protected, or else we would have to have a call to synchronize_rcu() for each task before freeing its old css_set. Signed-off-by: Ben Blum <bblum@google.com> Signed-off-by: Paul Menage <menage@google.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24cgroups: ensure correct concurrent opening/reading of pidlists across pid ↵Ben Blum
namespaces Previously there was the problem in which two processes from different pid namespaces reading the tasks or procs file could result in one process seeing results from the other's namespace. Rather than one pidlist for each file in a cgroup, we now keep a list of pidlists keyed by namespace and file type (tasks versus procs) in which entries are placed on demand. Each pidlist has its own lock, and that the pidlists themselves are passed around in the seq_file's private pointer means we don't have to touch the cgroup or its master list except when creating and destroying entries. Signed-off-by: Ben Blum <bblum@google.com> Signed-off-by: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24cgroups: add a read-only "procs" file similar to "tasks" that shows only ↵Ben Blum
unique tgids struct cgroup used to have a bunch of fields for keeping track of the pidlist for the tasks file. Those are now separated into a new struct cgroup_pidlist, of which two are had, one for procs and one for tasks. The way the seq_file operations are set up is changed so that just the pidlist struct gets passed around as the private data. Interface example: Suppose a multithreaded process has pid 1000 and other threads with ids 1001, 1002, 1003: $ cat tasks 1000 1001 1002 1003 $ cat cgroup.procs 1000 $ Signed-off-by: Ben Blum <bblum@google.com> Signed-off-by: Paul Menage <menage@google.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24cgroups: revert "cgroups: fix pid namespace bug"Paul Menage
The following series adds a "cgroup.procs" file to each cgroup that reports unique tgids rather than pids, and allows all threads in a threadgroup to be atomically moved to a new cgroup. The subsystem "attach" interface is modified to support attaching whole threadgroups at a time, which could introduce potential problems if any subsystem were to need to access the old cgroup of every thread being moved. The attach interface may need to be revised if this becomes the case. Also added is functionality for read/write locking all CLONE_THREAD fork()ing within a threadgroup, by means of an rwsem that lives in the sighand_struct, for per-threadgroup-ness and also for sharing a cacheline with the sighand's atomic count. This scheme should introduce no extra overhead in the fork path when there's no contention. The final patch reveals potential for a race when forking before a subsystem's attach function is called - one potential solution in case any subsystem has this problem is to hang on to the group's fork mutex through the attach() calls, though no subsystem yet demonstrates need for an extended critical section. This patch: Revert commit 096b7fe012d66ed55e98bc8022405ede0cc80e96 Author: Li Zefan <lizf@cn.fujitsu.com> AuthorDate: Wed Jul 29 15:04:04 2009 -0700 Commit: Linus Torvalds <torvalds@linux-foundation.org> CommitDate: Wed Jul 29 19:10:35 2009 -0700 cgroups: fix pid namespace bug This is in preparation for some clashing cgroups changes that subsume the original commit's functionaliy. The original commit fixed a pid namespace bug which Ben Blum fixed independently (in the same way, but with different code) as part of a series of patches. I played around with trying to reconcile Ben's patch series with Li's patch, but concluded that it was simpler to just revert Li's, given that Ben's patch series contained essentially the same fix. Signed-off-by: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24docs: fix various Documentation/ paths in header filesRandy Dunlap
Fix various Documentation/ paths in include/linux/. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Reviewed-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24time: add function to convert between calendar time and broken-down time for ↵Zhaolei
universal use There are many similar code in kernel for one object: convert time between calendar time and broken-down time. Here is some source I found: fs/ncpfs/dir.c fs/smbfs/proc.c fs/fat/misc.c fs/udf/udftime.c fs/cifs/netmisc.c net/netfilter/xt_time.c drivers/scsi/ips.c drivers/input/misc/hp_sdc_rtc.c drivers/rtc/rtc-lib.c arch/ia64/hp/sim/boot/fw-emu.c arch/m68k/mac/misc.c arch/powerpc/kernel/time.c arch/parisc/include/asm/rtc.h ... We can make a common function for this type of conversion, At least we can get following benefit: 1: Make kernel simple and unify 2: Easy to fix bug in converting code 3: Reduce clone of code in future For example, I'm trying to make ftrace display walltime, this patch will make me easy. This code is based on code from glibc-2.6 Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Machek <pavel@ucw.cz> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24truncate: new helpersnpiggin@suse.de
Introduce new truncate helpers truncate_pagecache and inode_newsize_ok. vmtruncate is also consolidated from mm/memory.c and mm/nommu.c and into mm/truncate.c. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24freeze_bdev: grab active reference to frozen superblocksChristoph Hellwig
Currently we held s_umount while a filesystem is frozen, despite that we might return to userspace and unlock it from a different process. Instead grab an active reference to keep the file system busy and add an explicit check for frozen filesystems in remount and reject the remount instead of blocking on s_umount. Add a new get_active_super helper to super.c for use by freeze_bdev that grabs an active reference to a superblock from a given block device. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24freeze_bdev: kill bd_mount_semChristoph Hellwig
Now that we have the freeze count there is not much reason for bd_mount_sem anymore. The actual freeze/thaw operations are serialized using the bd_fsfreeze_mutex, and the only other place we take bd_mount_sem is get_sb_bdev which tries to prevent mounting a filesystem while the block device is frozen. Instead of add a check for bd_fsfreeze_count and return -EBUSY if a filesystem is frozen. While that is a change in user visible behaviour a failing mount is much better for this case rather than having the mount process stuck uninterruptible for a long time. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: seq_file: add helpers for data fillingMiklos Szeredi
Add two helpers that allow access to the seq_file's own buffer, but hide the internal details of seq_files. This allows easier implementation of special purpose filling functions. It also cleans up some existing functions which duplicated the seq_file logic. Make these inline functions in seq_file.h, as suggested by Al. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: change sb->s_maxbytes to a loff_tJeff Layton
sb->s_maxbytes is supposed to indicate the maximum size of a file that can exist on the filesystem. It's declared as an unsigned long long. Even if a filesystem has no inherent limit that prevents it from using every bit in that unsigned long long, it's still problematic to set it to anything larger than MAX_LFS_FILESIZE. There are places in the kernel that cast s_maxbytes to a signed value. If it's set too large then this cast makes it a negative number and generally breaks the comparison. Change s_maxbytes to be loff_t instead. That should help eliminate the temptation to set it too large by making it a signed value. Also, add a warning for couple of releases to help catch filesystems that set s_maxbytes too large. Eventually we can either convert this to a BUG() or just remove it and in the hope that no one will get it wrong now that it's a signed value. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Robert Love <rlove@google.com> Cc: Mandeep Singh Baines <msb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: split generic_forget_inode() so that hugetlbfs does not have to copy itJan Kara
Hugetlbfs needs to do special things instead of truncate_inode_pages(). Currently, it copied generic_forget_inode() except for truncate_inode_pages() call which is asking for trouble (the code there isn't trivial). So create a separate function generic_detach_inode() which does all the list magic done in generic_forget_inode() and call it from hugetlbfs_forget_inode(). Signed-off-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits) cpumask: Move deprecated functions to end of header. cpumask: remove unused deprecated functions, avoid accusations of insanity cpumask: use new-style cpumask ops in mm/quicklist. cpumask: use mm_cpumask() wrapper: x86 cpumask: use mm_cpumask() wrapper: um cpumask: use mm_cpumask() wrapper: mips cpumask: use mm_cpumask() wrapper: mn10300 cpumask: use mm_cpumask() wrapper: m32r cpumask: use mm_cpumask() wrapper: arm cpumask: Use accessors for cpu_*_mask: um cpumask: Use accessors for cpu_*_mask: powerpc cpumask: Use accessors for cpu_*_mask: mips cpumask: Use accessors for cpu_*_mask: m32r cpumask: remove arch_send_call_function_ipi cpumask: arch_send_call_function_ipi_mask: s390 cpumask: arch_send_call_function_ipi_mask: powerpc cpumask: arch_send_call_function_ipi_mask: mips cpumask: arch_send_call_function_ipi_mask: m32r cpumask: arch_send_call_function_ipi_mask: alpha cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64 ...
2009-09-23headers: utsname.h reduxAlexey Dobriyan
* remove asm/atomic.h inclusion from linux/utsname.h -- not needed after kref conversion * remove linux/utsname.h inclusion from files which do not need it NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however due to some personality stuff it _is_ needed -- cowardly leave ELF-related headers and files alone. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24cpumask: Move deprecated functions to end of header.Rusty Russell
The new ones have pretty kerneldoc. Move the old ones to the end to avoid confusing people. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: benh@kernel.crashing.org
2009-09-24cpumask: remove unused deprecated functions, avoid accusations of insanityRusty Russell
We're not forcing removal of the old cpu_ functions, but we might as well delete the now-unused ones. Especially CPUMASK_ALLOC and friends. I actually got a phone call (!) from a hacker who thought I had introduced them as the new cpumask API. He seemed bewildered that I had lost all taste. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: benh@kernel.crashing.org
2009-09-24cpumask: remove obsolete topology_core_siblings and ↵Rusty Russell
topology_thread_siblings: core There were replaced by topology_core_cpumask and topology_thread_cpumask. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-24cpumask: remove the deprecated smp_call_function_mask()Rusty Russell
Everyone is now using smp_call_function_many(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-24cpumask: don't define set_cpus_allowed() if CONFIG_CPUMASK_OFFSTACK=yRusty Russell
You're not supposed to pass cpumasks on the stack in that case. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>