path: root/include/linux
Age  Commit message  (Author)
2010-09-27  tcp: Fix >4GB writes on 64-bit.  (David S. Miller)
Fixes kernel bugzilla #16603. tcp_sendmsg() truncates iov_len to an 'int', which causes a 4GB write to write zero bytes, for example. There is also the problem higher up of how verify_iovec() works. It wants to prevent the total length from looking like an error return value. However, it does this using 'int', but syscalls return 'long' (and thus signed 64-bit on 64-bit machines), so it could trigger false positives on 64-bit as written. Fix it to use 'long'. Reported-by: Olaf Bonorden <bono@onlinehome.de> Reported-by: Daniel Büse <dbuese@gmx.de> Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
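The truncation described above can be illustrated in user space. This is a minimal sketch, not the kernel code: `low_32_bits` and `as_signed_64` are hypothetical helpers that only model what happens when a 64-bit length passes through a 32-bit field versus a 64-bit signed one.

```c
#include <stdint.h>

/* Hypothetical helper: models a 64-bit length squeezed through a
 * 32-bit field, as when iov_len is stored in an 'int'. */
static uint32_t low_32_bits(uint64_t len)
{
    return (uint32_t)len;   /* the upper 32 bits are discarded */
}

/* Hypothetical helper: the same length kept in a 64-bit signed type,
 * which is what 'long' is on 64-bit Linux. */
static int64_t as_signed_64(uint64_t len)
{
    return (int64_t)len;
}
```

With a length of exactly 4GB (2^32), the 32-bit view is zero while the 64-bit view keeps the full positive value, which is consistent with the "write zero bytes" symptom in the report.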
2010-09-22  net: Move "struct net" declaration inside the __KERNEL__ macro guard  (Ollie Wild)
This patch reduces namespace pollution by moving the "struct net" declaration out of the userspace-facing portion of linux/netlink.h. It has no impact on the kernel. (This came up because we have several C++ applications which use "net" as a namespace name.) Signed-off-by: Ollie Wild <aaw@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-17  netpoll: Disable IRQ around RCU dereference in netpoll_rx  (Herbert Xu)
We cannot use rcu_dereference_bh safely in netpoll_rx as we may be called with IRQs disabled. We could however simply disable IRQs as that too causes BH to be disabled and is safe in either case. Thanks to John Linville for discovering this bug and providing a patch. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-09  Merge branch 'vhost-net' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost  (David S. Miller)
2010-09-07  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6  (Linus Torvalds)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: bus speed strings should be const
  PCI hotplug: Fix build with CONFIG_ACPI unset
  PCI: PCIe: Remove the port driver module exit routine
  PCI: PCIe: Move PCIe PME code to the pcie directory
  PCI: PCIe: Disable PCIe port services during port initialization
  PCI: PCIe: Ask BIOS for control of all native services at once
  ACPI/PCI: Negotiate _OSC control bits before requesting them
  ACPI/PCI: Do not preserve _OSC control bits returned by a query
  ACPI/PCI: Make acpi_pci_query_osc() return control bits
  ACPI/PCI: Reorder checks in acpi_pci_osc_control_set()
  PCI: PCIe: Introduce commad line switch for disabling port services
  PCI: PCIe AER: Introduce pci_aer_available()
  x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set
  PCI: provide stub pci_domain_nr function for !CONFIG_PCI configs
2010-09-07  Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc  (Linus Torvalds)
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/pseries: Correct rtas_data_buf locking in dlpar code
  powerpc/85xx: Add P1021 PCI IDs and quirks
  arch/powerpc/sysdev/qe_lib/qe.c: Add of_node_put to avoid memory leak
  arch/powerpc/platforms/83xx/mpc837x_mds.c: Add missing iounmap
  fsl_rio: fix compile errors
  powerpc/85xx: Fix compile issue with p1022_ds due to lmb rename to memblock
  powerpc/85xx: Fix compilation of mpc85xx_mds.c
  powerpc: Don't use kernel stack with translation off
  powerpc/perf_event: Reduce latency of calling perf_event_do_pending
  powerpc/kexec: Adds correct calling convention for kexec purgatory
2010-09-07  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu  (Linus Torvalds)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: fix a mismatch between code and comment
  percpu: fix a memory leak in pcpu_extend_area_map()
  percpu: add __percpu notations to UP allocator
  percpu: handle __percpu notations in UP accessors
2010-09-07  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq  (Linus Torvalds)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: use zalloc_cpumask_var() for gcwq->mayday_mask
  workqueue: fix GCWQ_DISASSOCIATED initialization
  workqueue: Add a workqueue chapter to the tracepoint docbook
  workqueue: fix cwq->nr_active underflow
  workqueue: improve destroy_workqueue() debuggability
  workqueue: mark lock acquisition on worker_maybe_bind_and_lock()
  workqueue: annotate lock context change
  workqueue: free rescuer on destroy_workqueue
2010-09-07  Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6  (Linus Torvalds)
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
  tty: fix tty_line must not be equal to number of allocated tty pointers in tty driver
  serial: bfin_sport_uart: restore transmit frame sync fix
  serial: fix port type conflict between NS16550A & U6_16550A
  MAINTAINERS: orphan isicom
  vt: Fix console corruption on driver hand-over.
2010-09-07  agp/intel: Fix cache control for Sandybridge  (Zhenyu Wang)
Sandybridge GTT has new cache control bits in the PTE, which control whether graphics pages are cached in LLC or LLC/MLC, so we need to extend the mask function to respect the new bits. Also set cache control to LLC only by default on Gen6. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: stable@kernel.org Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-09-05  cgroups: fix API thinko  (Michael S. Tsirkin)
The cgroup_attach_task_current_cg API that we have upstream is backwards: we really need an API to attach to the cgroups from another process A to the current one. In our case (vhost), a privileged user wants to attach its task to cgroups from a less privileged one; the API makes us run it in the other task's context, and this fails. So let's make the API generic and just pass in 'from' and 'to' tasks. Add an inline wrapper for cgroup_attach_task_current_cg to avoid breaking bisect. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com>
2010-09-03  serial: fix port type conflict between NS16550A & U6_16550A  (Philippe Langlais)
Bug seen by Dr. David Alan Gilbert with sparse Signed-off-by: Philippe Langlais <philippe.langlais@stericsson.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-31  powerpc/85xx: Add P1021 PCI IDs and quirks  (Anton Vorontsov)
This is needed for proper PCI-E support on P1021 SoCs. Signed-off-by: Anton Vorontsov <avorontsov@mvista.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-08-28  Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify  (Linus Torvalds)
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  fsnotify: drop two useless bools in the fnsotify main loop
  fsnotify: fix list walk order
  fanotify: Return EPERM when a process is not privileged
  fanotify: resize pid and reorder structure
  fanotify: drop duplicate pr_debug statement
  fanotify: flush outstanding perm requests on group destroy
  fsnotify: fix ignored mask handling between inode and vfsmount marks
  fanotify: add MAINTAINERS entry
  fsnotify: reset used_inode and used_vfsmount on each pass
  fanotify: do not dereference inode_mark when it is unset
2010-08-28  Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6  (Linus Torvalds)
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  vgaarb: Wrap vga_(get|put) in CONFIG_VGA_ARB
  drm/radeon/kms: add missing scratch update in dp_detect
  drm/modes: Fix CVT-R modeline generation
  drm: fix regression in drm locking since BKL removal.
  drm/radeon/kms: remove stray radeon_i2c_destroy
  drm: mm: fix range restricted allocations
  drm/nouveau: drop drm_global_mutex before sleeping in submission path
  drm: export drm_global_mutex for drivers to use
  drm/nv20: Don't use pushbuf calls on the original nv20.
  drm/nouveau: Fix TMDS on some DCB1.5 boards.
  drm/nouveau: Fix backlight control on PPC machines with an internal TMDS panel.
  drm/nv30: Apply modesetting to the correct slave encoder
  drm/nouveau: Use a helper function to match PCI device/subsystem IDs.
  drm/nv50: add dcb type 14 to enum to prevent compiler complaint
2010-08-28  NOMMU: Stub out vm_get_page_prot() if there's no MMU  (David Howells)
Stub out vm_get_page_prot() if there's no MMU. This was added by commit 804af2cf6e7a ("[AGPGART] remove private page protection map") and is used in commit c07fbfd17e61 ("fbmem: VM_IO set, but not propagated") in the fbmem video driver, but the function doesn't exist on NOMMU, resulting in an undefined symbol at link time. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-28  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input  (Linus Torvalds)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: pxa27x_keypad - remove input_free_device() in pxa27x_keypad_remove()
  Input: mousedev - fix regression of inverting axes
  Input: uinput - add devname alias to allow module on-demand load
  Input: hil_kbd - fix compile error
  USB: drop tty argument from usb_serial_handle_sysrq_char()
  Input: sysrq - drop tty argument form handle_sysrq()
  Input: sysrq - drop tty argument from sysrq ops handlers
2010-08-27  fanotify: resize pid and reorder structure  (Tvrtko Ursulin)
Resize pid and reorder the fanotify_event_metadata so it is naturally aligned and we can work towards dropping the packed attribute. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com> Cc: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Eric Paris <eparis@redhat.com>
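Why natural alignment makes a packed attribute droppable can be sketched in user space. The structs below are hypothetical and are not claimed to match the real fanotify_event_metadata layout; they only show that when every field already sits at a multiple of its own size, packing changes nothing. `__attribute__((packed))` is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record, ordered so each field is naturally aligned. */
struct event_natural {
    uint32_t event_len;
    uint32_t version;
    uint64_t mask;      /* offset 8, a multiple of its 8-byte size */
    int32_t  fd;
    int32_t  pid;
};

/* The same fields with the packed attribute applied. */
struct event_packed {
    uint32_t event_len;
    uint32_t version;
    uint64_t mask;
    int32_t  fd;
    int32_t  pid;
} __attribute__((packed));
```

Both structs have the same size and field offsets, so user space sees the same layout with or without the attribute.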
2010-08-27  vgaarb: Wrap vga_(get|put) in CONFIG_VGA_ARB  (Chris Wilson)
Fix link failure without the vga arbitrator. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Airlie <airlied@redhat.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-08-25  workqueue: fix cwq->nr_active underflow  (Tejun Heo)
cwq->nr_active is used to keep track of how many work items are active for the cpu workqueue, where 'active' is defined as either pending on global worklist or executing. This is used to implement the max_active limit and workqueue freezing. If a work item is queued after nr_active has already reached max_active, the work item doesn't increment nr_active and is put on the delayed queue and gets activated later as previous active work items retire. try_to_grab_pending(), which is used in the cancellation path, unconditionally decremented nr_active whether the work item being cancelled is currently active or delayed, so cancelling a delayed work item makes nr_active underflow. This breaks max_active enforcement and triggers BUG_ON() in destroy_workqueue() later on. This patch fixes this bug by adding a flag WORK_STRUCT_DELAYED, which is set while a work item is on the delayed list, and making try_to_grab_pending() decrement nr_active iff the work item is currently active. The addition of the flag enlarges cwq alignment to 256 bytes which is getting a bit too large. It's scheduled to be reduced back to 128 bytes by merging WORK_STRUCT_PENDING and WORK_STRUCT_CWQ in the next devel cycle. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Johannes Berg <johannes@sipsolutions.net>
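The accounting rule in this fix can be modelled in a few lines. This is a simplified user-space sketch under stated assumptions, not the workqueue implementation: `struct cwq_sketch`, `struct work_sketch`, and both functions are hypothetical names, and the `delayed` boolean stands in for the WORK_STRUCT_DELAYED flag.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the per-cpu workqueue counters. */
struct cwq_sketch {
    int nr_active;   /* items counted against max_active */
    int max_active;
    int nr_delayed;  /* items parked on the delayed list */
};

struct work_sketch {
    bool delayed;    /* stands in for WORK_STRUCT_DELAYED */
};

/* Queue an item: it only counts as active while below max_active. */
static void queue_work_sketch(struct cwq_sketch *q, struct work_sketch *w)
{
    if (q->nr_active < q->max_active) {
        w->delayed = false;
        q->nr_active++;
    } else {
        w->delayed = true;
        q->nr_delayed++;
    }
}

/* Cancel an item: decrement nr_active only if the item was active. */
static void cancel_work_sketch(struct cwq_sketch *q, struct work_sketch *w)
{
    if (w->delayed) {
        w->delayed = false;
        q->nr_delayed--;
    } else {
        q->nr_active--;
    }
}
```

Without the `delayed` check in the cancel path, cancelling a delayed item would decrement nr_active for an item that never incremented it, which is the underflow described above.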
2010-08-24  ACPI/PCI: Negotiate _OSC control bits before requesting them  (Rafael J. Wysocki)
It is possible that the BIOS will not grant control of all _OSC features requested via acpi_pci_osc_control_set(), so it is recommended to negotiate the final set of _OSC features with the query flag set before calling _OSC to request control of these features. To implement it, rework acpi_pci_osc_control_set() so that the caller can specify the mask of _OSC control bits to negotiate and the mask of _OSC control bits that are absolutely necessary to it. Then, acpi_pci_osc_control_set() will run _OSC queries in a loop until the mask of _OSC control bits returned by the BIOS is equal to the mask passed to it. Also, before running the _OSC request acpi_pci_osc_control_set() will check if the caller's required control bits are present in the final mask. Using this mechanism we will be able to avoid situations in which the BIOS doesn't grant control of certain _OSC features, because they depend on some other _OSC features that have not been requested. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
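The negotiation loop described here can be sketched as follows. This is an illustrative model only: `negotiate_osc`, `osc_query_fn`, and `fake_bios_query` are hypothetical names, and the fake BIOS policy (bit 2 is never granted, bit 1 depends on bit 0) is invented to exercise the loop. It does not reproduce the signature of acpi_pci_osc_control_set().

```c
#include <stdint.h>

/* Hypothetical callback modelling an _OSC query: given the control bits
 * asked for, return the bits the firmware would be willing to grant. */
typedef uint32_t (*osc_query_fn)(uint32_t requested);

/* Invented firmware policy: never grants bit 2, and grants bit 1 only
 * when bit 0 is requested as well. */
static uint32_t fake_bios_query(uint32_t requested)
{
    uint32_t granted = requested & ~0x4u;
    if (!(requested & 0x1u))
        granted &= ~0x2u;
    return granted;
}

/* Query repeatedly until the granted mask equals the mask passed in,
 * then check that the caller's required bits survived.
 * Returns 0 and updates *mask on success, -1 otherwise. */
static int negotiate_osc(osc_query_fn query, uint32_t *mask, uint32_t required)
{
    uint32_t ctrl = *mask;

    if ((ctrl & required) != required)
        return -1;

    for (;;) {
        /* Never accept bits that were not asked for. */
        uint32_t granted = query(ctrl) & ctrl;
        if (granted == ctrl)
            break;
        ctrl = granted;   /* strictly fewer bits, so the loop ends */
    }

    if ((ctrl & required) != required)
        return -1;

    *mask = ctrl;
    return 0;
}
```

Because each retry can only clear bits, the loop runs at most once per bit in the mask before the query result stabilises.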
2010-08-24  guard page for stacks that grow upwards  (Luck, Tony)
pa-risc and ia64 have stacks that grow upwards. Check that they do not run into other mappings. By making VM_GROWSUP 0x0 on architectures that do not ever use it, we can avoid some unpleasant #ifdefs in check_stack_guard_page(). Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
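The "define the flag as 0x0" trick can be shown with a small sketch. The flag values and the `ARCH_STACK_GROWSUP` macro below are illustrative assumptions, not taken from the kernel headers; the point is only that a flag defined as zero makes the test constant-false, so callers need no #ifdef.

```c
/* Illustrative flag values; ARCH_STACK_GROWSUP is a hypothetical
 * configuration macro standing in for "this arch grows stacks upward". */
#define VM_GROWSDOWN 0x00000100UL
#ifdef ARCH_STACK_GROWSUP
#define VM_GROWSUP   0x00000200UL
#else
#define VM_GROWSUP   0x00000000UL
#endif

/* No #ifdef needed here: when VM_GROWSUP is 0 the test can never be
 * true, and the compiler can drop the branch entirely. */
static int stack_grows_up(unsigned long vm_flags)
{
    return (vm_flags & VM_GROWSUP) != 0;
}

static int stack_grows_down(unsigned long vm_flags)
{
    return (vm_flags & VM_GROWSDOWN) != 0;
}
```

On a build without the hypothetical macro, `stack_grows_up()` returns 0 for every input, so the upward-growth path is dead code without any conditional compilation at the call site.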
2010-08-24  workqueue: improve destroy_workqueue() debuggability  (Tejun Heo)
Now that the worklist is global, having works pending after wq destruction can easily lead to an oops, and destroy_workqueue() has several BUG_ON()s to catch these cases. Unfortunately, BUG_ON() doesn't tell much about how the work became pending after the final flush_workqueue(). This patch adds WQ_DYING, which is set before the final flush begins. If a work is requested to be queued on a dying workqueue, WARN_ON_ONCE() is triggered and the request is ignored. This clearly indicates which caller is trying to queue a work on a dying workqueue and keeps the system working in most cases. The locking rule comment is updated such that the 'I' rule includes modifying the field from the destruction path. Signed-off-by: Tejun Heo <tj@kernel.org>
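The "reject work on a dying queue" behaviour can be modelled roughly like this. It is a user-space sketch with hypothetical names (`struct wq_sketch`, `WQ_DYING_SKETCH`, and the two functions); the counter `nr_rejected` stands in for the point where the kernel would emit its one-time warning.

```c
#include <stdbool.h>

#define WQ_DYING_SKETCH 0x1u   /* hypothetical stand-in for WQ_DYING */

struct wq_sketch {
    unsigned int flags;
    int nr_queued;
    int nr_rejected;   /* where a real implementation would warn */
};

/* Mark the queue as dying before the final flush would begin. */
static void wq_begin_destroy(struct wq_sketch *wq)
{
    wq->flags |= WQ_DYING_SKETCH;
}

/* Queue a work item unless the queue is dying; returns true if queued. */
static bool wq_queue_work(struct wq_sketch *wq)
{
    if (wq->flags & WQ_DYING_SKETCH) {
        wq->nr_rejected++;   /* request ignored instead of crashing later */
        return false;
    }
    wq->nr_queued++;
    return true;
}
```

Rejecting at queue time identifies the offending caller at the moment of the bad request, rather than at a later assertion during destruction.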
2010-08-24  Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6  (Linus Torvalds)
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  kobject_uevent: fix typo in comments
  firmware_class: fix typo in error path
  kobject: Break the kobject namespace defs into their own header
2010-08-24  Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6  (Linus Torvalds)
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (29 commits)
  ARM: imx: fix build failure concerning otg/ulpi
  USB: ftdi_sio: add product ID for Lenz LI-USB
  USB: adutux: fix misuse of return value of copy_to_user()
  USB: iowarrior: fix misuse of return value of copy_to_user()
  USB: xHCI: update ring dequeue pointer when process missed tds
  USB: xhci: Remove buggy assignment in next_trb()
  USB: ftdi_sio: Add ID for Ionics PlugComputer
  USB: serial: io_ti.c: don't return 0 if writing the download record failed
  USB: otg: twl4030: fix wrong assumption of starting state
  USB: gadget: Return -ENOMEM on memory allocation failure
  USB: gadget: fix composite kernel-doc warnings
  USB: ssu100: set tty_flags in ssu100_process_packet
  USB: ssu100: add disconnect function for ssu100
  USB: serial: export symbol usb_serial_generic_disconnect
  USB: ssu100: rework logic for TIOCMIWAIT
  USB: ssu100: add register parameter to ssu100_setregister
  USB: ssu100: remove duplicate #defines in ssu100
  USB: ssu100: refine process_packet in ssu100
  USB: ssu100: add locking for port private data in ssu100
  USB: r8a66597-udc: return -ENOMEM if kzalloc() fails
  ...
2010-08-23  USB: gadget: fix composite kernel-doc warnings  (Randy Dunlap)
Warning(include/linux/usb/composite.h:284): No description found for parameter 'disconnect'
Warning(drivers/usb/gadget/composite.c:744): No description found for parameter 'c'
Warning(drivers/usb/gadget/composite.c:744): Excess function parameter 'cdev' description in 'usb_string_ids_n'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-23  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6  (Linus Torvalds)
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
  netfilter: fix CONFIG_COMPAT support
  isdn/avm: fix build when PCMCIA is not enabled
  header: fix broken headers for user space
  e1000e: don't check for alternate MAC addr on parts that don't support it
  e1000e: disable ASPM L1 on 82573
  ll_temac: Fix poll implementation
  netxen: fix a race in netxen_nic_get_stats()
  qlnic: fix a race in qlcnic_get_stats()
  irda: fix a race in irlan_eth_xmit()
  net: sh_eth: remove unused variable
  netxen: update version 4.0.74
  netxen: fix inconsistent lock state
  vlan: Match underlying dev carrier on vlan add
  ibmveth: Fix opps during MTU change on an active device
  ehea: Fix synchronization between HW and SW send queue
  bnx2x: Update bnx2x version to 1.52.53-4
  bnx2x: Fix PHY locking problem
  rds: fix a leak of kernel memory
  netlink: fix compat recvmsg
  netfilter: fix userspace header warning
  ...
2010-08-23  kobject: Break the kobject namespace defs into their own header  (David Howells)
Break the kobject namespace defs into their own header to avoid a header file inclusion ordering problem between linux/sysfs.h and linux/kobject.h. This fixes the build breakage on older versions of gcc. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-22  header: fix broken headers for user space  (Changli Gao)
__packed is only defined in kernel space, so we should use __attribute__((packed)) for the code shared between kernel and user space. Two __attribute() annotations are replaced with __attribute__() too. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
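The difference is easy to see in a user-space sketch. Outside the kernel, `__packed` is normally just an undefined identifier, so a declaration ending in `} __packed;` would typically be parsed as declaring a variable of that name rather than packing the struct. Spelling the attribute out avoids that. The struct names below are hypothetical, and `__attribute__((packed))` assumes a GCC-compatible compiler.

```c
#include <stdint.h>

/* Header shared with user space: the attribute is written out in full
 * instead of relying on the kernel-only __packed shorthand. */
struct wire_hdr_packed {
    uint8_t  type;
    uint32_t length;
} __attribute__((packed));

/* The same fields without the attribute, for comparison: the compiler
 * inserts padding after 'type' so that 'length' is aligned. */
struct wire_hdr_plain {
    uint8_t  type;
    uint32_t length;
};
```

The packed variant occupies exactly 5 bytes, while the plain one is larger because of the alignment padding.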
2010-08-22  fanotify: flush outstanding perm requests on group destroy  (Eric Paris)
When an fanotify listener is closing it may cause a deadlock between the listener and the original task doing an fs operation. If the original task is waiting for a permissions response it will be holding the srcu lock. The listener cannot clean up and exit until after that srcu lock is synchronized. Thus deadlock. The fix introduced here is to stop accepting new permissions events when a listener is shutting down and to grant permission for all outstanding events. Thus the original task will eventually release the srcu lock and the listener can complete shutdown. Reported-by: Andreas Gruenbacher <agruen@suse.de> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-08-22  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6  (Linus Torvalds)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slab: fix object alignment
  slub: add missing __percpu markup in mm/slub_def.h
2010-08-21  mm: make the vma list be doubly linked  (Linus Torvalds)
It's a really simple list, and several of the users want to go backwards in it to find the previous vma. So rather than have to look up the previous entry with 'find_vma_prev()' or something similar, just make it doubly linked instead. Tested-by: Ian Campbell <ijc@hellion.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
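What the back pointer buys can be sketched with a toy list. `struct vma_sketch` and its helpers are hypothetical simplifications and do not reproduce the kernel's vm_area_struct; they only show that finding the previous entry becomes a single pointer read once insertion maintains both links.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for a vma list node. */
struct vma_sketch {
    unsigned long start, end;
    struct vma_sketch *next;
    struct vma_sketch *prev;   /* the added back pointer */
};

/* Insert 'vma' right after 'pos', keeping both directions consistent. */
static void vma_insert_after(struct vma_sketch *pos, struct vma_sketch *vma)
{
    vma->prev = pos;
    vma->next = pos->next;
    if (pos->next)
        pos->next->prev = vma;
    pos->next = vma;
}

/* Finding the previous entry no longer needs a search from the head. */
static struct vma_sketch *vma_prev(const struct vma_sketch *vma)
{
    return vma->prev;
}
```

The cost is one extra pointer per node and one extra store on insert and remove, in exchange for constant-time backward steps.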
2010-08-21  Input: uinput - add devname alias to allow module on-demand load  (Kay Sievers)
Recent modprobe and udev versions allow device nodes to be created for modules which are not loaded. Only the first access will cause the in-kernel module loader to pull in the module. Systems which never access the device node will not needlessly load the module, and no longer need init scripts or other facilities to unconditionally load it. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-08-21  USB: drop tty argument from usb_serial_handle_sysrq_char()  (Dmitry Torokhov)
Since handle_sysrq() does not take tty as argument anymore we can drop it from usb_serial_handle_sysrq_char() as well. Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-08-21  Input: sysrq - drop tty argument form handle_sysrq()  (Dmitry Torokhov)
Sysrq operations do not accept tty argument anymore so no need to pass it to us. [Stephen Rothwell <sfr@canb.auug.org.au>: fix build breakage in drm code caused by sysrq using bool but not including linux/types.h] [Sachin Sant <sachinp@in.ibm.com>: fix build breakage in s390 keyboard driver] Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-08-20  kfifo: implement missing __kfifo_skip_r()  (Andrea Righi)
kfifo_skip() is currently broken because the internal helper function is missing. Add it. Signed-off-by: Andrea Righi <arighi@develer.com> Cc: Greg KH <greg@kroah.com> Acked-by: Stefani Seibold <stefani@seibold.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-19  Input: sysrq - drop tty argument from sysrq ops handlers  (Dmitry Torokhov)
No one is using the tty argument so let's get rid of it. Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-08-18  netfilter: fix userspace header warning  (Sam Ravnborg)
"make headers_check" issued the following warning:
CHECK include/linux/netfilter (64 files)
usr/include/linux/netfilter/xt_ipvs.h:19: found __[us]{8,16,32,64} type without #include <linux/types.h>
Fix this by including linux/types.h, as suggested. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-18  net: add Fast Ethernet driver for PXA168.  (Sachin Sanap)
Signed-off-by: Sachin Sanap <ssanap@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-18  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6  (Linus Torvalds)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  fs: brlock vfsmount_lock
  fs: scale files_lock
  lglock: introduce special lglock and brlock spin locks
  tty: fix fu_list abuse
  fs: cleanup files_lock locking
  fs: remove extra lookup in __lookup_hash
  fs: fs_struct rwlock to spinlock
  apparmor: use task path helpers
  fs: dentry allocation consolidation
  fs: fix do_lookup false negative
  mbcache: Limit the maximum number of cache entries
  hostfs ->follow_link() braino
  hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
  remove SWRITE* I/O types
  kill BH_Ordered flag
  vfs: update ctime when changing the file's permission by setfacl
  cramfs: only unlock new inodes
  fix reiserfs_evict_inode end_writeback second call
2010-08-18  Merge branch 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6  (Linus Torvalds)
* 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6:
  spi.h: missing kernel-doc notation, please fix
  of: fix missing headers for of_address_to_resource() in MTD and SysACE drivers
  of: Fix missing includes
  ata: update for of_device to platform_device replacement
  microblaze: Fix of: eliminate of_device->node and dev_archdata->{of,prom}_node
  microblaze: Fix of/address: Merge all of the bus translation code
  booting-without-of: Remove nonexistent chapters from TOC, fix numbering
2010-08-18  fs: scale files_lock  (Nick Piggin)
Improve scalability of files_lock by adding per-cpu, per-sb files lists, protected with an lglock. The lglock provides fast access to the per-cpu lists to add and remove files. It also provides a snapshot of all the per-cpu lists (although this is very slow).

One difficulty with this approach is that a file can be removed from the list by another CPU. We must track which per-cpu list the file is on with a new variable in the file struct (packed into a hole on 64-bit archs). Scalability could suffer if files are frequently removed from different cpu's list. However loads with frequent removal of files imply short interval between adding and removing the files, and the scheduler attempts to avoid moving processes too far away. Also, even in the case of cross-CPU removal, the hardware has much more opportunity to parallelise cacheline transfers with N cachelines than with 1.

A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs degenerates to contending on a single lock, which is no worse than before. When more than one CPU is allocating files, even if they are always freed by different CPUs, there will be more parallelism than the single-lock case.

Testing results: On a 2 socket, 8 core opteron, I measure the number of times the lock is taken to remove the file, the number of times it is removed by the same CPU that added it, and the number of times it is removed by the same node that added it.

               locks     cpu-hits          node-hits
  Booting      25049     23174 (92.5%)     23945 (95.6%)
  kbuild -j16  2281913   2208126 (96.8%)   2252674 (98.7%)
  dbench 64    4306582   4287247 (99.6%)   4299527 (99.8%)

So a file is removed from the same CPU it was added by over 90% of the time. It remains within the same node 95% of the time.

Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

               throughput
  2.6.34-rc2   24.5
  +patch       24.9

               us      sys     idle    IO wait (in %)
  2.6.34-rc2   51.25   28.25   17.25   3.25
  +patch       53.75   18.5    19      8.75

So significantly less CPU time spent in kernel code, higher idle time and slightly higher throughput. Single threaded performance difference was within the noise of microbenchmarks. That is not to say a penalty does not exist: the code is larger and more memory accesses are required, so it will be slightly slower.

Cc: linux-kernel@vger.kernel.org Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18  lglock: introduce special lglock and brlock spin locks  (Nick Piggin)
This patch introduces "local-global" locks (lglocks). These can be used to:
- Provide fast exclusive access to per-CPU data, with exclusive access to another CPU's data allowed but possibly subject to contention, and to provide very slow exclusive access to all per-CPU data.
- Or to provide very fast and scalable read serialisation, and to provide very slow exclusive serialisation of data (not necessarily per-CPU data).
Brlocks are also implemented as a short-hand notation for the latter use case. Thanks to Paul for local/global naming convention. Cc: linux-kernel@vger.kernel.org Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
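The local/global idea can be sketched in user space with one spinlock per slot. This is an assumption-laden illustration, not the kernel's lglock: `struct lglock_sketch`, `NR_SLOTS`, and all function names are hypothetical, the slots stand in for CPUs, and C11 `atomic_flag` spinning replaces the kernel's per-CPU spinlocks.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS 4   /* hypothetical stand-in for the number of CPUs */

struct lglock_sketch {
    atomic_flag slot[NR_SLOTS];   /* one small spinlock per "CPU" */
};

static void lg_init(struct lglock_sketch *lg)
{
    for (int i = 0; i < NR_SLOTS; i++)
        atomic_flag_clear(&lg->slot[i]);
}

/* Fast path: take only the lock belonging to one slot. */
static void lg_local_lock(struct lglock_sketch *lg, int cpu)
{
    while (atomic_flag_test_and_set_explicit(&lg->slot[cpu],
                                             memory_order_acquire))
        ;   /* spin until the slot is free */
}

/* Non-blocking variant: returns true if the slot lock was acquired. */
static bool lg_local_trylock(struct lglock_sketch *lg, int cpu)
{
    return !atomic_flag_test_and_set_explicit(&lg->slot[cpu],
                                              memory_order_acquire);
}

static void lg_local_unlock(struct lglock_sketch *lg, int cpu)
{
    atomic_flag_clear_explicit(&lg->slot[cpu], memory_order_release);
}

/* Slow path: take every slot lock, always in the same order, to get
 * exclusive access to all per-slot data at once. */
static void lg_global_lock(struct lglock_sketch *lg)
{
    for (int i = 0; i < NR_SLOTS; i++)
        lg_local_lock(lg, i);
}

static void lg_global_unlock(struct lglock_sketch *lg)
{
    for (int i = 0; i < NR_SLOTS; i++)
        lg_local_unlock(lg, i);
}
```

The local path touches a single lock, so uncontended users on different slots never share a cacheline for locking, while the global path pays for acquiring all of them. Taking the slots in a fixed order keeps two concurrent global lockers from deadlocking each other.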
2010-08-18  tty: fix fu_list abuse  (Nick Piggin)
tty code abuses fu_list, which causes a bug in remount,ro handling. If a tty device node is opened on a filesystem, and then the last link to the inode is removed, the filesystem will be allowed to be remounted readonly. This is because fs_may_remount_ro does not find the 0 link tty inode on the file sb list (because the tty code incorrectly removed it to use for its own purpose). This can result in a filesystem with errors after it is marked "clean". Taking the idea from Christoph's initial patch, allocate a tty private struct at file->private_data and put our required list fields in there, linking file and tty. This makes tty nodes behave the same way as other device nodes, avoids meddling with the vfs, and avoids this bug. The error handling is not trivial in the tty code, so for this bugfix, I take the simple approach of using __GFP_NOFAIL and don't worry about memory errors. This is not a problem because our allocator doesn't fail small allocs as a rule anyway. So proper error handling is left as an exercise for tty hackers. [ Arguably filesystem's device inode would ideally be divorced from the driver's pseudo inode when it is opened, but in practice it's not clear whether that will ever be worth implementing. ] Cc: linux-kernel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18  fs: cleanup files_lock locking  (Nick Piggin)
Lock tty_files with a new spinlock, tty_files_lock; provide helpers to manipulate the per-sb files list; unexport the files_lock spinlock. Cc: linux-kernel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18  fs: fs_struct rwlock to spinlock  (Nick Piggin)
struct fs_struct.lock is an rwlock with the read-side used to protect root and pwd members while taking references to them. Taking a reference to a path typically requires just 2 atomic ops, so the critical section is very small. Parallel read-side operations would have cacheline contention on the lock, the dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a real parallelism increase. Replace it with a spinlock to avoid one or two atomic operations in typical path lookup fastpath. Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18  remove SWRITE* I/O types  (Christoph Hellwig)
These flags aren't real I/O types, but tell ll_rw_block to always lock the buffer instead of giving up on a failed trylock. Instead add a new write_dirty_buffer helper that implements this semantic and use it from the existing SWRITE* callers. Note that the ll_rw_block code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which this patch fixes. In the ufs code clean up the helper that used to call ll_rw_block to mirror sync_dirty_buffer, which is the function it implements for compound buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18  kill BH_Ordered flag  (Christoph Hellwig)
Instead of abusing a buffer_head flag just add a variant of sync_dirty_buffer which allows passing the exact type of write flag required. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-17  spi.h: missing kernel-doc notation, please fix  (Ernst Schwab)
Added comments in kernel-doc notation for previously added struct fields. Signed-off-by: Ernst Schwab <eschwab@online.de> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-08-17  Merge master.kernel.org:/home/rmk/linux-2.6-arm  (Linus Torvalds)
* master.kernel.org:/home/rmk/linux-2.6-arm:
  VIDEO: amba clcd: don't disable an already disabled clock
  ARM: Tighten check for allowable CPSR values
  ARM: 6329/1: wire up sys_accept4() on ARM
  ARM: 6328/1: Build with -fno-dwarf2-cfi-asm
  ARM: 6326/1: kgdb: fix GDB_MAX_REGS no longer used