path: root/include/linux
Age | Commit message | Author
2007-05-07 | KMEM_CACHE(): simplify slab cache creation | Christoph Lameter

This patch provides a new macro KMEM_CACHE(<struct>, <flags>) to simplify slab creation. KMEM_CACHE creates a slab with the name of the struct, with the size of the struct and with the alignment of the struct. Additional slab flags may be specified if necessary.

Example:

    struct test_slab {
            int a, b, c;
            struct list_head list;
    } __cacheline_aligned_in_smp;

    test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC)

will create a new slab named "test_slab" of the size sizeof(struct test_slab) and aligned to the alignment of struct test_slab. If it fails then we panic.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
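The macro described above can be approximated in userspace. This is a minimal sketch, not the kernel definition: `demo_cache_create()`, `struct demo_cache` and `DEMO_PANIC` are hypothetical stand-ins for `kmem_cache_create()`, the cache descriptor and `SLAB_PANIC`. It only shows how the name, size and alignment can all be derived from the struct tag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for a slab cache descriptor. */
struct demo_cache {
	const char *name;
	size_t size;
	size_t align;
	unsigned long flags;
};

#define DEMO_PANIC 0x1UL	/* hypothetical flag standing in for SLAB_PANIC */

/* Stand-in for kmem_cache_create(): it only records its parameters. */
static struct demo_cache demo_cache_create(const char *name, size_t size,
					   size_t align, unsigned long flags)
{
	struct demo_cache c = { name, size, align, flags };
	return c;
}

/*
 * #__struct turns the struct tag into the cache name; sizeof and _Alignof
 * derive the remaining parameters from the type itself.
 */
#define DEMO_KMEM_CACHE(__struct, __flags)				\
	demo_cache_create(#__struct, sizeof(struct __struct),		\
			  _Alignof(struct __struct), (__flags))

struct test_slab {
	int a, b, c;
};
```

A caller would write `struct demo_cache c = DEMO_KMEM_CACHE(test_slab, DEMO_PANIC);` and get a descriptor named "test_slab" without repeating the struct name, size or alignment by hand.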
2007-05-07 | slab allocators: Remove obsolete SLAB_MUST_HWCACHE_ALIGN | Christoph Lameter

This patch was recently posted to lkml and acked by Pekka.

The flag SLAB_MUST_HWCACHE_ALIGN is

1. Never checked by SLAB at all.
2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB.
3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.

The only remaining use is in sparc64 and ppc64, and their use there reflects some earlier role that the slab flag once may have had. If it is specified then SLAB_HWCACHE_ALIGN is also specified.

The flag is confusing, inconsistent and has no purpose.

Remove it.

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | mm: optimize kill_bdev() | Peter Zijlstra

Remove duplicate work in kill_bdev().

It currently invalidates and then truncates the bdev's mapping. invalidate_mapping_pages() will opportunistically remove pages from the mapping. And truncate_inode_pages() will forcefully remove all pages.

The only thing truncate doesn't do is flush the bh lrus. So do that explicitly. This avoids (very unlikely) but possible invalid lookup results if the same bdev is quickly re-issued.

It also will prevent extreme kernel latencies which are observed when blockdevs which have a large amount of pagecache are unmounted, by avoiding invalidate_mapping_pages() on that path. invalidate_mapping_pages() has no cond_resched (it can be called under spinlock), whereas truncate_inode_pages() has one.

[akpm@linux-foundation.org: restore nrpages==0 optimisation]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | mm: remove destroy_dirty_buffers from invalidate_bdev() | Peter Zijlstra

Remove the destroy_dirty_buffers argument from invalidate_bdev(); it hasn't been used in 6 years (so akpm says).

    find * -name \*.[ch] | xargs grep -l invalidate_bdev |
    while read file; do
            quilt add $file;
            sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
    done

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | Quicklists for page table pages | Christoph Lameter

On x86_64 this cuts allocation overhead for page table pages down to a fraction (kernel compile / editing load, TSC based measurement of time spent in each function):

no quicklist

    pte_alloc   1569048   4.3s      (401ns/2.7us/179.7us)
    pmd_alloc    780988   2.1s      (337ns/2.7us/86.1us)
    pud_alloc    780072   2.2s      (424ns/2.8us/300.6us)
    pgd_alloc    260022   1s        (920ns/4us/263.1us)

quicklist:

    pte_alloc    452436   573.4ms   (8ns/1.3us/121.1us)
    pmd_alloc    196204   174.5ms   (7ns/889ns/46.1us)
    pud_alloc    195688   172.4ms   (7ns/881ns/151.3us)
    pgd_alloc     65228   9.8ms     (8ns/150ns/6.1us)

pgd allocations are the most complex and there we see the most dramatic improvement (maybe we can cut down the amount of pgds cached somewhat?). But even the pte allocations still see a doubling of performance.

1. Proven code from the IA64 arch.

   The method used here has been fine tuned for years and is NUMA aware. It is based on the knowledge that accesses to page table pages are sparse in nature. Taking a page off the freelists instead of allocating a zeroed page allows a reduction of the number of cachelines touched in addition to getting rid of the slab overhead. So performance improves. This is particularly useful if pgds contain standard mappings. We can save on the teardown and setup of such a page if we have some on the quicklists. This includes avoiding list operations that are otherwise necessary on alloc and free to track pgds.

2. Light weight alternative to use slab to manage page size pages

   Slab overhead is significant and even page allocator use is pretty heavy weight. The use of a per cpu quicklist means that we touch only two cachelines for an allocation. There is no need to access the page_struct (unless arch code needs to fiddle around with it). So the fast path just means bringing in one cacheline at the beginning of the page. That same cacheline may then be used to store the page table entry. Or a second cacheline may be used if the page table entry is not in the first cacheline of the page. The current code will zero the page which means touching 32 cachelines (assuming 128-byte cachelines). We get down from 32 to 2 cachelines in the fast path.

3. x86_64 gets lightweight page table page management.

   This will allow x86_64 arch code to repopulate pgds and other page table entries faster. The list operations for pgds are reduced in the same way as for i386 to the point where a pgd is allocated from the page allocator and when it is freed back to the page allocator. A pgd can pass through the quicklists without having to be reinitialized.

4. Consolidation of code from multiple arches

   So far arches have their own implementation of quicklist management. This patch moves that feature into the core, allowing easier maintenance and consistent management of quicklists.

Page table pages have the characteristic that they are typically zero or in a known state when they are freed. This is usually exactly the same state as needed after allocation. So it makes sense to build a list of freed page table pages and then consume the pages already in use first. Those pages have already been initialized correctly (thus no need to zero them) and are likely already cached in such a way that the MMU can use them most effectively. Page table pages are used in a sparse way so zeroing them on allocation is not too useful.

Such an implementation already exists for ia64. However, that implementation did not support constructors and destructors as needed by i386 / x86_64. It also only supported a single quicklist. The implementation here has constructor and destructor support as well as the ability for an arch to specify how many quicklists are needed.

Quicklists are defined by an arch defining CONFIG_QUICKLIST. If more than one quicklist is necessary then we can define NR_QUICK for additional lists. F.e. i386 needs two and thus has

    config NR_QUICK
            int
            default 2

If an arch has requested quicklist support then pages can be allocated from the quicklist (or from the page allocator if the quicklist is empty) via:

    quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>)

Page table pages can be freed using:

    quicklist_free(<quicklist-nr>, <destructor>, <page>)

Pages must have a definite state after allocation and before they are freed. If no constructor is specified then pages will be zeroed on allocation and must be zeroed before they are freed.

If a constructor is used then the constructor will establish a definite page state. F.e. the i386 and x86_64 pgd constructors establish certain mappings.

Constructors and destructors can also be used to track the pages. i386 and x86_64 use a list of pgds in order to be able to dynamically update standard mappings.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
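The alloc/free interface above can be sketched in userspace. This is an illustration under assumptions, not the kernel code: the `demo_` names are hypothetical, `calloc()` stands in for the page allocator, there is one global set of lists rather than per-CPU lists, and the sketch never trims pages back to the system (so the destructor is accepted but unused). It shows the key idea that a freed page is threaded onto a list through its first word and handed back without being zeroed again.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096
#define DEMO_NR_QUICK  2	/* mirrors the NR_QUICK example for i386 */

struct demo_quicklist {
	void *page;		/* head of the list of cached pages */
	int nr_pages;
};

static struct demo_quicklist demo_lists[DEMO_NR_QUICK];
static int demo_fresh_allocs;	/* counts fallbacks to the "page allocator" */

static void *demo_quicklist_alloc(int nr, void (*ctor)(void *))
{
	struct demo_quicklist *q = &demo_lists[nr];
	void **p = q->page;

	if (p) {
		/* Fast path: pop a cached page and clear the link word. */
		q->page = p[0];
		p[0] = NULL;
		q->nr_pages--;
		return p;
	}

	/* Slow path: get a zeroed page and run the constructor once. */
	p = calloc(1, DEMO_PAGE_SIZE);
	if (p) {
		demo_fresh_allocs++;
		if (ctor)
			ctor(p);
	}
	return p;
}

static void demo_quicklist_free(int nr, void (*dtor)(void *), void *page)
{
	struct demo_quicklist *q = &demo_lists[nr];
	void **p = page;

	(void)dtor;	/* unused here: the sketch never returns pages to the system */

	/* The caller must hand the page back in its definite (e.g. zeroed) state. */
	p[0] = q->page;
	q->page = p;
	q->nr_pages++;
}
```

Only the first word of a cached page is touched on either path, which is the property the commit message relies on when it counts cachelines.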
2007-05-07 | slub: enable tracking of full slabs | Christoph Lameter

If slab tracking is on then build a list of full slabs so that we can verify the integrity of all slabs and are also able to build a list of alloc/free callers.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | Add virt_to_head_page and consolidate code in slab and slub | Christoph Lameter

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | mm: optimize compound_head() by avoiding a shared page flag | Christoph Lameter

The patch adds PageTail(page) and PageHead(page) to check if a page is the head or the tail of a compound page. This is done by masking the two bits describing the state of a compound page and then comparing them. So one comparison and a branch instead of two bit checks and two branches.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
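The mask-and-compare check can be illustrated as follows. This is a sketch with assumed bit positions: `DEMO_PG_COMPOUND` and `DEMO_PG_TAIL` are hypothetical values, not the kernel's page flag layout. Each predicate does one mask and one comparison instead of testing the two bits separately.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions; the real page flag layout differs. */
#define DEMO_PG_COMPOUND (1UL << 0)	/* page belongs to a compound page */
#define DEMO_PG_TAIL     (1UL << 1)	/* page is a tail page */
#define DEMO_PG_MASK     (DEMO_PG_COMPOUND | DEMO_PG_TAIL)

/* Head page: compound bit set, tail bit clear. One mask, one compare. */
static bool demo_page_head(unsigned long flags)
{
	return (flags & DEMO_PG_MASK) == DEMO_PG_COMPOUND;
}

/* Tail page: both bits set. Again one mask, one compare. */
static bool demo_page_tail(unsigned long flags)
{
	return (flags & DEMO_PG_MASK) == DEMO_PG_MASK;
}
```

A page with only the tail bit set is reported as neither head nor tail, which matters when that bit is shared with another meaning outside compound pages.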
2007-05-07 | Make page->private usable in compound pages | Christoph Lameter

If we add a new flag so that we can distinguish between the first page and the tail pages then we can avoid using page->private in the first page. page->private == page for the first page, so there is no real information in there.

Freeing up page->private makes the use of compound pages more transparent. They become more usable like real pages. Right now we have to be careful f.e. if we are going beyond PAGE_SIZE allocations in the slab on i386 because we can then no longer use the private field. This is one of the issues that cause us not to support debugging for page size slabs in SLAB.

Having page->private available for SLUB would allow more meta information in the page struct. I can probably avoid the 16 bit ints that I have in there right now.

Also if page->private is available then a compound page may be equipped with buffer heads. This may free up the way for filesystems to support larger blocks than page size.

We add PageTail as an alias of PageReclaim. Compound pages cannot currently be reclaimed. Because of the alias one needs to check PageCompound first.

The RFC for this approach was discussed at http://marc.info/?t=117574302800001&r=1&w=2

[nacc@us.ibm.com: fix hugetlbfs]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | SLUB: allocate smallest object size if the user asks for 0 bytes | Christoph Lameter

Makes SLUB behave like SLAB in this area to avoid issues. Throw a stack dump to alert people.

At some point the behavior should be switched back. NULL is no memory as far as I can tell, and if the user asked for 0 bytes then they should get no memory.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | SLUB core | Christoph Lameter

This is a new slab allocator which was motivated by the complexity of the existing code in mm/slab.c. It attempts to address a variety of concerns with the existing implementation.

A. Management of object queues

   A particular concern was the complex management of the numerous object queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for each allocating CPU and use objects from a slab directly instead of queueing them up.

B. Storage overhead of object queues

   SLAB object queues exist per node, per CPU. The alien cache queue even has a queue array that contains a queue for each processor on each node. For very large systems the number of queues and the number of objects that may be caught in those queues grows exponentially. On our systems with 1k nodes / processors we have several gigabytes just tied up for storing references to objects for those queues. This does not include the objects that could be on those queues. One fears that the whole memory of the machine could one day be consumed by those queues.

C. SLAB meta data overhead

   SLAB has overhead at the beginning of each slab. This means that data cannot be naturally aligned at the beginning of a slab block. SLUB keeps all meta data in the corresponding page_struct. Objects can be naturally aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte boundaries and can fit tightly into a 4k page with no bytes left over. SLAB cannot do this.

D. SLAB has a complex cache reaper

   SLUB does not need a cache reaper for UP systems. On SMP systems the per CPU slab may be pushed back into the partial list but that operation is simple and does not require an iteration over a list of objects. SLAB expires per CPU, shared and alien object queues during cache reaping which may cause strange hold offs.

E. SLAB has complex NUMA policy layer support

   SLUB pushes NUMA policy handling into the page allocator. This means that allocation is coarser (SLUB does interleave on a page level) but that situation was also present before 2.6.13. SLAB's application of policies to individual slab objects allocated in SLAB is certainly a performance concern due to the frequent references to memory policies which may lead a sequence of objects to come from one node after another. SLUB will get a slab full of objects from one node and then will switch to the next.

F. Reduction of the size of partial slab lists

   SLAB has per node partial lists. This means that over time a large number of partial slabs may accumulate on those lists. These can only be reused if allocations occur on specific nodes. SLUB has a global pool of partial slabs and will consume slabs from that pool to decrease fragmentation.

G. Tunables

   SLAB has sophisticated tuning abilities for each slab cache. One can manipulate the queue sizes in detail. However, filling the queues still requires the use of the spin lock to check out slabs. SLUB has a global parameter (min_slab_order) for tuning. Increasing the minimum slab order can decrease the locking overhead. The bigger the slab order, the fewer movements of pages between per CPU and partial lists occur and the better SLUB will scale.

H. Slab merging

   We often have slab caches with similar parameters. SLUB detects those on boot up and merges them into the corresponding general caches. This leads to more effective memory use. About 50% of all caches can be eliminated through slab merging. This will also decrease slab fragmentation because partially allocated slabs can be filled up again. Slab merging can be switched off by specifying slub_nomerge on boot up.

   Note that merging can expose heretofore unknown bugs in the kernel because corrupted objects may now be placed differently and corrupt differing neighboring objects. Enable sanity checks to find those.

I. Diagnostics

   The current slab diagnostics are difficult to use and require a recompilation of the kernel. SLUB contains debugging code that is always available (but is kept out of the hot code paths). SLUB diagnostics can be enabled via the "slub_debug" option. Parameters can be specified to select a single or a group of slab caches for diagnostics. This means that the system is running with the usual performance and it is much more likely that race conditions can be reproduced.

J. Resiliency

   If basic sanity checks are on then SLUB is capable of detecting common error conditions and recovering as best as possible to allow the system to continue.

K. Tracing

   Tracing can be enabled via the slub_debug=T,<slabcache> option during boot. SLUB will then log all actions on that slabcache and dump the object contents on free.

L. On demand DMA cache creation.

   Generally DMA caches are not needed. If a kmalloc is used with __GFP_DMA then just create this single slabcache that is needed. For systems that have no ZONE_DMA requirement the support is completely eliminated.

M. Performance increase

   Some benchmarks have shown speed improvements on kernbench in the range of 5-10%. The locking overhead of slub is based on the underlying base allocation size. If we can reliably allocate larger order pages then it is possible to increase slub performance much further. The anti-fragmentation patches may enable further performance increases.

Tested on: i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

SLUB Boot options

    slub_nomerge            Disable merging of slabs
    slub_min_order=x        Require a minimum order for slab caches. This
                            increases the managed chunk size and therefore
                            reduces meta data and locking overhead.
    slub_min_objects=x      Minimum objects per slab. Default is 8.
    slub_max_order=x        Avoid generating slabs larger than order specified.
    slub_debug              Enable all diagnostics for all caches
    slub_debug=<options>    Enable selective options for all caches
    slub_debug=<o>,<cache>  Enable selective options for a certain set of
                            caches

Available Debug options

    F   Double Free checking, sanity and resiliency
    R   Red zoning
    P   Object / padding poisoning
    U   Track last free / alloc
    T   Trace all allocs / frees (only use for individual slabs).

To use SLUB: Apply this patch and then select SLUB as the default slab allocator.

[hugh@veritas.com: fix an oops-causing locking error]
[akpm@linux-foundation.org: various stupid cleanups and small fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
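The arithmetic behind the natural-alignment point and the slub_min_objects / slub_max_order options can be worked through in a few lines. This is a simplified sketch and not SLUB's actual order calculation: the `demo_` functions are hypothetical, a 4k page is assumed, and `header` models metadata stored at the start of a slab (0 models metadata kept outside the slab).

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumed page size */

/*
 * Number of objects of `size` bytes that fit in a slab of 2^order pages
 * when `header` bytes at the start of the slab are reserved for metadata.
 */
static size_t demo_objects_per_slab(unsigned int order, size_t size,
				    size_t header)
{
	size_t bytes = DEMO_PAGE_SIZE << order;

	if (size == 0 || header >= bytes)
		return 0;
	return (bytes - header) / size;
}

/*
 * Smallest order below max_order that holds at least min_objects objects;
 * falls back to max_order when no smaller order is large enough.
 */
static unsigned int demo_min_order(size_t size, size_t min_objects,
				   unsigned int max_order)
{
	unsigned int order;

	for (order = 0; order < max_order; order++)
		if (demo_objects_per_slab(order, size, 0) >= min_objects)
			return order;
	return max_order;
}
```

With no in-slab header, 128-byte objects fill a 4k page exactly (32 objects); reserving even a small header at the front drops that to 31 and leaves the remainder unused.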
2007-05-07 | readahead: code cleanup | Jan Kara

Rename file_ra_state.prev_page to prev_index and file_ra_state.offset to prev_offset. Also, the update of prev_index in do_generic_mapping_read() is now moved close to the update of prev_offset.

[wfg@mail.ustc.edu.cn: fix it]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | readahead: improve heuristic detecting sequential reads | Jan Kara

Introduce ra.offset and store in it an offset where the previous read ended. This way we can detect whether reads are really sequential (and thus we should not mark the page as accessed repeatedly) or whether they are random and just happen to be in the same page (and the page should really be marked accessed again).

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
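The heuristic can be sketched as a small state machine. This is an assumption-laden illustration, not the code in do_generic_mapping_read(): `struct demo_ra_state` and `demo_read_chunk()` are hypothetical, the field names follow the prev_index / prev_offset naming used in this log, and each read is assumed to stay within one page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the readahead state. */
struct demo_ra_state {
	unsigned long prev_index;	/* page index where the previous read ended */
	unsigned int prev_offset;	/* offset in that page where it ended */
};

/*
 * Record a read of `len` bytes at (index, offset) and report whether the
 * page should be marked accessed. A read that starts exactly where the
 * previous one ended is a sequential continuation and is not marked again.
 */
static bool demo_read_chunk(struct demo_ra_state *ra, unsigned long index,
			    unsigned int offset, unsigned int len)
{
	bool mark = index != ra->prev_index || offset != ra->prev_offset;

	ra->prev_index = index;
	ra->prev_offset = offset + len;
	return mark;
}
```

Starting from a state whose prev_index matches no real page, the first read of a page is marked, a follow-up read that continues at the previous end offset is not, and a read that jumps elsewhere in the same page is marked again.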
2007-05-07 | smaps: add clear_refs file to clear reference | David Rientjes

Adds /proc/pid/clear_refs. When any non-zero number is written to this file, pte_mkold() and ClearPageReferenced() are called for each pte and its corresponding page, respectively, in that task's VMAs. This file is only writable by the user who owns the task.

It is now possible to measure _approximately_ how much memory a task is using by clearing the reference bits with

    echo 1 > /proc/pid/clear_refs

and checking the reference count for each VMA from the /proc/pid/smaps output at a measured time interval. For example, to observe the approximate change in memory footprint for a task, write a script that clears the references (echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and extracts the size in kB. Add the sizes for each VMA together for the total referenced footprint. Moments later, repeat the process and observe the difference.

For example, using an efficient Mozilla:

    accumulated time    referenced memory
    ----------------    -----------------
              0 s                408 kB
              1 s                408 kB
              2 s                556 kB
              3 s               1028 kB
              4 s                872 kB
              5 s               1956 kB
              6 s                416 kB
              7 s               1560 kB
              8 s               2336 kB
              9 s               1044 kB
             10 s                416 kB

This is a valuable tool to get an approximate measurement of the memory footprint for a task.

Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
[akpm@linux-foundation.org: build fixes]
[mpm@selenic.com: rename for_each_pmd]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | Add uninitialized_var() macro for suppressing gcc warnings | Borislav Petkov

Introduce a macro that stops gcc from generating a warning about a probable uninitialized state of a variable.

Example:

    -	spinlock_t *ptl;
    +	spinlock_t *uninitialized_var(ptl);

Not a happy solution, but those warnings are obnoxious.

- Using the usual pointlessly-set-it-to-zero approach wastes several bytes of text.

- Using a macro means we can (hopefully) do something else if gcc changes cause the `x = x' hack to stop working

- Using a macro means that people who are worried about hiding true bugs can easily turn it off.

Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
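The `x = x' hack mentioned above can be shown in a few lines. This is a sketch of the idiom as the message describes it; `demo_pick()` is a hypothetical caller, and the self-initialization relies on gcc treating it as a no-op rather than on anything the C standard promises.

```c
#include <assert.h>

/* The `x = x' hack: the declaration initializes the variable with itself. */
#define uninitialized_var(x) x = x

/* Hypothetical caller: `value` is assigned on every path before it is read. */
static int demo_pick(int cond)
{
	int uninitialized_var(value);	/* expands to: int value = value; */

	if (cond)
		value = 42;
	else
		value = 7;
	return value;
}
```

Because the hack sits behind a macro, it could be redefined to plain `x` in one place to bring the warnings back.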
2007-05-07 | add pfn_valid_within helper for sub-MAX_ORDER hole detection | Andy Whitcroft

Generally we work under the assumption that the mem_map array is contiguous and valid out to a MAX_ORDER_NR_PAGES block of pages, i.e. that if we have validated any page within this MAX_ORDER_NR_PAGES block we need not check any other. This is not true when CONFIG_HOLES_IN_ZONE is set and we must check each and every reference we make from a pfn.

Add a pfn_valid_within() helper which should be used when scanning pages within a MAX_ORDER_NR_PAGES block when we have already checked the validity of the block normally with pfn_valid(). This can then be optimised away when we do not have holes within a MAX_ORDER_NR_PAGES block of pages.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
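The compile-time switch described above can be sketched like this. The names are hypothetical userspace stand-ins: `DEMO_HOLES_IN_ZONE` plays the role of CONFIG_HOLES_IN_ZONE, and `demo_pfn_valid()` is a toy validity check with a made-up hole at pfns 4 and 5.

```c
#include <assert.h>

#define DEMO_HOLES_IN_ZONE 1	/* stands in for CONFIG_HOLES_IN_ZONE */

/* Toy pfn_valid(): pretend pfns 4 and 5 form a hole. */
static int demo_pfn_valid(unsigned long pfn)
{
	return pfn < 4 || pfn > 5;
}

#ifdef DEMO_HOLES_IN_ZONE
/* Holes possible: every pfn inside the block must be checked. */
#define demo_pfn_valid_within(pfn) demo_pfn_valid(pfn)
#else
/* No holes: the check compiles away to a constant. */
#define demo_pfn_valid_within(pfn) (1)
#endif

/* Count usable pages in a block whose first pfn is validated normally. */
static unsigned long demo_count_valid(unsigned long start, unsigned long nr)
{
	unsigned long pfn, count = 0;

	if (!demo_pfn_valid(start))
		return 0;

	for (pfn = start; pfn < start + nr; pfn++) {
		if (!demo_pfn_valid_within(pfn))
			continue;
		count++;
	}
	return count;
}
```

Without `DEMO_HOLES_IN_ZONE` the inner check becomes the constant 1, so the compiler can drop the branch from the scan loop entirely.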
2007-05-07 | mm/slab.c: proper prototypes | Adrian Bunk

Add proper prototypes in include/linux/slab.h.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | mm: make read_cache_page synchronous | Nick Piggin

Ensure pages are uptodate after returning from read_cache_page, which allows us to cut out most of the filesystem-internal PageUptodate calls.

I didn't have a great look down the call chains, but this appears to fix 7 possible use-before-uptodate cases in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs, 1 in jffs2, and a possible overwrite of cleared data with readpage in block2mtd. All depend on whether the filler is async and/or can return with a !uptodate page.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | mm: remove gcc workaround | Nick Piggin

Minimum gcc version is 3.2 now. However, with likely profiling, even modern gcc versions cannot always eliminate the call.

Replace the placeholder functions with the more conventional empty static inlines, which should be optimal for everyone.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | proper prototype for hugetlb_get_unmapped_area() | Adrian Bunk

Add a proper prototype for hugetlb_get_unmapped_area() in include/linux/hugetlb.h.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | Add apply_to_page_range() which applies a function to a pte range | Jeremy Fitzhardinge

Add a new mm function apply_to_page_range() which applies a given function to every pte in a given virtual address range in a given mm structure. This is a generic alternative to cut-and-pasting the Linux idiomatic pagetable walking code in every place that a sequence of PTEs must be accessed.

Although this interface is intended to be useful in a wide range of situations, it is currently used specifically by several Xen subsystems, for example: to ensure that pagetables have been allocated for a virtual address range, and to construct batched special pagetable update requests to map I/O memory (in ioremap()).

[akpm@linux-foundation.org: fix warning, unpleasantly]
Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Matt Mackall <mpm@waste.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
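The callback-per-pte pattern can be illustrated with a flat toy page table. This is a sketch under assumptions rather than the kernel interface: all `demo_` types and functions are hypothetical, the "page table" is a single array instead of a multi-level tree, and the address is assumed to be page aligned.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_NR_PTES    16

typedef unsigned long demo_pte_t;
typedef int (*demo_pte_fn_t)(demo_pte_t *pte, unsigned long addr, void *data);

/* Toy address space: one flat array of ptes instead of a page table tree. */
struct demo_mm {
	demo_pte_t ptes[DEMO_NR_PTES];
};

/*
 * Call fn on the pte of every page in [addr, addr + size).
 * Stops and returns the first non-zero value fn returns.
 */
static int demo_apply_to_page_range(struct demo_mm *mm, unsigned long addr,
				    unsigned long size, demo_pte_fn_t fn,
				    void *data)
{
	unsigned long end = addr + size;
	int err;

	for (; addr < end; addr += DEMO_PAGE_SIZE) {
		unsigned long idx = addr >> DEMO_PAGE_SHIFT;

		if (idx >= DEMO_NR_PTES)
			return -1;	/* outside the toy address space */
		err = fn(&mm->ptes[idx], addr, data);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: mark the pte present and count the pages visited. */
static int demo_mark_present(demo_pte_t *pte, unsigned long addr, void *data)
{
	int *count = data;

	*pte = addr | 1UL;
	(*count)++;
	return 0;
}
```

The walker owns the iteration and error propagation, so each user only supplies the per-pte action and an opaque `data` pointer for its own state.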
2007-05-07 | serial: define FIXED_PORT flag for serial_core | David Gibson

At present, the serial core always allows setserial in userspace to change the port address, irq and base clock of any serial port. That makes sense for legacy ISA ports, but not for (say) embedded ns16550 compatible serial ports at peculiar addresses. In these cases, the kernel code configuring the ports must know exactly where they are, and their clocking arrangements (which can be unusual on embedded boards). It doesn't make sense for userspace to change these settings.

Therefore, this patch defines a UPF_FIXED_PORT flag for the uart_port structure. If this flag is set when the serial port is configured, any attempts to alter the port's type, io address, irq or base clock with setserial are ignored.

In addition this patch uses the new flag for on-chip serial ports probed in arch/powerpc/kernel/legacy_serial.c, and for other hard-wired serial ports probed by drivers/serial/of_serial.c.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
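The effect of the flag can be sketched as a guard at the top of the settings path. The structures and `demo_set_info()` below are hypothetical simplifications, and `DEMO_UPF_FIXED_PORT` uses an arbitrary bit rather than the real UPF_FIXED_PORT value; the port type is left out for brevity.

```c
#include <assert.h>

/* Hypothetical bit; the real flag is UPF_FIXED_PORT in the uart_port flags. */
#define DEMO_UPF_FIXED_PORT (1U << 0)

struct demo_uart_port {
	unsigned int flags;
	unsigned long iobase;
	unsigned int irq;
	unsigned int uartclk;
};

/* Settings a setserial-style request would like to apply. */
struct demo_serial_request {
	unsigned long iobase;
	unsigned int irq;
	unsigned int uartclk;
};

/* Returns 1 if the request was applied, 0 if it was ignored. */
static int demo_set_info(struct demo_uart_port *port,
			 const struct demo_serial_request *req)
{
	/* Hard-wired ports keep the settings the kernel configured. */
	if (port->flags & DEMO_UPF_FIXED_PORT)
		return 0;

	port->iobase = req->iobase;
	port->irq = req->irq;
	port->uartclk = req->uartclk;
	return 1;
}
```

A legacy port without the flag accepts the new address, irq and clock, while a fixed port silently keeps what the platform code set up.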
2007-05-07 | RM9000 serial driver | Thomas Koeller

Add support for the integrated serial ports of the MIPS RM9122 processor and its relatives.

The patch also does some whitespace cleanup.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Thomas Koeller <thomas.koeller@baslerweb.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | serial driver PMC MSP71xx | Marc St-Jean

Serial driver patch for the PMC-Sierra MSP71xx devices.

There are three different fixes:

1. Fix for DesignWare APB THRE errata: In brief, this is a non-standard 16550 in that the THRE interrupt will not re-assert itself simply by disabling and re-enabling the THRI bit in the IER, it is only re-enabled if a character is actually sent out.

   It appears that the "8250-uart-backup-timer.patch" in the "mm" tree also fixes it so we have dropped our initial workaround. This patch now needs to be applied on top of that "mm" patch.

2. Fix for Busy Detect on LCR write: The DesignWare APB UART has a feature which causes a new Busy Detect interrupt to be generated if it's busy when the LCR is written. This fix saves the value of the LCR and rewrites it after clearing the interrupt.

3. Workaround for interrupt/data concurrency issue: The SoC needs to ensure that writes that can cause interrupts to be cleared reach the UART before returning from the ISR. This fix reads a non-destructive register on the UART so the read transaction completion ensures the previously queued write transaction has also completed.

Signed-off-by: Marc St-Jean <Marc_St-Jean@pmc-sierra.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 | slab: introduce krealloc | Pekka Enberg

This introduces krealloc(), which reallocates memory while keeping the contents unchanged. The allocator avoids reallocation if the new size fits the currently used cache. I also added a simple non-optimized version for mm/slob.c for compatibility.

[akpm@linux-foundation.org: fix warnings]
Acked-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Acked-by: Matt Mackall <mpm@selenic.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
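The "avoid reallocation if the new size fits" behaviour can be sketched in userspace. Everything here is an assumption made for illustration: the `demo_` allocator rounds requests up to power-of-two size classes and stores the class in a small header, which merely imitates an object living in a fixed-size cache; it is not how the slab allocators track object size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header recording the size class an object was taken from. */
struct demo_hdr {
	size_t capacity;
};

/* Round a request up to a power-of-two size class (minimum 32 bytes). */
static size_t demo_size_class(size_t n)
{
	size_t c = 32;

	while (c < n)
		c <<= 1;
	return c;
}

static void *demo_kmalloc(size_t size)
{
	struct demo_hdr *h;
	size_t cap;

	if (size > SIZE_MAX / 4)	/* keep the rounding loop from overflowing */
		return NULL;
	cap = demo_size_class(size);
	h = malloc(sizeof(*h) + cap);
	if (!h)
		return NULL;
	h->capacity = cap;
	return h + 1;
}

static void demo_kfree(void *p)
{
	if (p)
		free((struct demo_hdr *)p - 1);
}

/* Usable size of an object, i.e. the size of its class. */
static size_t demo_ksize(const void *p)
{
	return ((const struct demo_hdr *)p - 1)->capacity;
}

static void *demo_krealloc(void *p, size_t new_size)
{
	void *ret;

	if (!p)
		return demo_kmalloc(new_size);
	if (!new_size) {
		demo_kfree(p);
		return NULL;
	}
	/* The new size still fits the current size class: nothing to do. */
	if (new_size <= demo_ksize(p))
		return p;

	ret = demo_kmalloc(new_size);
	if (ret) {
		memcpy(ret, p, demo_ksize(p));
		demo_kfree(p);
	}
	return ret;
}
```

Growing a 10-byte object to 30 bytes returns the same pointer because both sizes fall in the 32-byte class; growing it to 100 bytes moves the contents to a new object.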
2007-05-06 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild | Linus Torvalds

* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (38 commits)
  kconfig: fix mconf segmentation fault
  kbuild: enable use of code from a different dir
  kconfig: error out if recursive dependencies are found
  kbuild: scripts/basic/fixdep segfault on pathological string-o-death
  kconfig: correct minor typo in Kconfig warning message.
  kconfig: fix path to modules.txt in Kconfig help
  usr/Kconfig: fix typo
  kernel-doc: alphabetically-sorted entries in index.html of 'htmldocs'
  kbuild: be more explicit on missing .config file
  kbuild: clarify the creation of the LOCALVERSION_AUTO string.
  kbuild: propagate errors from find in scripts/gen_initramfs_list.sh
  kconfig: refer to qt3 if we cannot find qt libraries
  kbuild: handle compressed cpio initramfs-es
  kbuild: ignore section mismatch warning for references from .paravirtprobe to .init.text
  kbuild: remove stale comment in modpost.c
  kbuild/mkuboot.sh: allow spaces in CROSS_COMPILE
  kbuild: fix make mrproper for Documentation/DocBook/man
  kbuild: remove kconfig binaries during make mrproper
  kconfig/menuconfig: do not hardcode '.config'
  kbuild: override build timestamp & version
  ...
2007-05-06 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm | Linus Torvalds

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (66 commits)
  KVM: Remove unused 'instruction_length'
  KVM: Don't require explicit indication of completion of mmio or pio
  KVM: Remove extraneous guest entry on mmio read
  KVM: SVM: Only save/restore MSRs when needed
  KVM: fix an if() condition
  KVM: VMX: Add lazy FPU support for VT
  KVM: VMX: Properly shadow the CR0 register in the vcpu struct
  KVM: Don't complain about cpu erratum AA15
  KVM: Lazy FPU support for SVM
  KVM: Allow passing 64-bit values to the emulated read/write API
  KVM: Per-vcpu statistics
  KVM: VMX: Avoid unnecessary vcpu_load()/vcpu_put() cycles
  KVM: MMU: Avoid heavy ASSERT at non debug mode.
  KVM: VMX: Only save/restore MSR_K6_STAR if necessary
  KVM: Fold drivers/kvm/kvm_vmx.h into drivers/kvm/vmx.c
  KVM: VMX: Don't switch 64-bit msrs for 32-bit guests
  KVM: VMX: Reduce unnecessary saving of host msrs
  KVM: Handle guest page faults when emulating mmio
  KVM: SVM: Report hardware exit reason to userspace instead of dmesg
  KVM: Retry sleeping allocation if atomic allocation fails
  ...
2007-05-05 | Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 | Linus Torvalds

* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits)
  [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
  [PATCH] i386: type may be unused
  [PATCH] i386: Some additional chipset register values validation.
  [PATCH] i386: Add missing !X86_PAE dependency to the 2G/2G split.
  [PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff
  [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
  [PATCH] i386: white space fixes in i387.h
  [PATCH] i386: Drop noisy e820 debugging printks
  [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c
  [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems
  [PATCH] x86-64: Share identical video.S between i386 and x86-64
  [PATCH] x86-64: Remove CONFIG_REORDER
  [PATCH] x86-64: Print type and size correctly for unknown compat ioctls
  [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)
  [PATCH] i386: Little cleanups in smpboot.c
  [PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning
  [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
  [PATCH] i386: Add X86_FEATURE_RDTSCP
  [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
  [PATCH] i386: Implement alternative_io for i386
  ...

Fix up trivial conflict in include/linux/highmem.h manually.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-05Fix nfsroot buildRalf Baechle
CC fs/nfs/nfsroot.o fs/nfs/nfsroot.c:131: error: tokens causes a section type conflict make[2]: *** [fs/nfs/nfsroot.o] Error 1 This is due to mixing const and non-const content in the same section which halfway recent gccs absolutely hate. Fixed by dropping the const. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-05Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [TG3]: Add TG3_FLAG_SUPPORT_MSI flag. [TG3]: Eliminate the TG3_FLAG_5701_REG_WRITE_BUG flag. [TG3]: Eliminate the TG3_FLAG_GOT_SERDES_FLOWCTL flag. [TG3]: Remove reset during MAC address changes. [TG3]: WoL fixes. [TG3]: Clear GPIO mask before storing. [TG3]: Improve NVRAM sizing. [TG3]: Fix TSO bugs. [MAC80211]: Add maintainers entry for mac80211. [MAC80211]: Add debugfs attributes. [MAC80211]: Add mac80211 wireless stack. [MAC80211]: Add generic include/linux/ieee80211.h [NETLINK]: Remove references to process ID [AF_IUCV]: Compile fix - adopt to skbuff changes.
2007-05-05sl82c105: rework PIO support (take 2)Sergei Shtylyov
Get rid of the 'pio_speed' member of 'ide_drive_t' that was only used by this driver by storing the PIO mode timings in the 'drive_data' instead -- this allows us to greatly simplify the process of "reloading" of the chip's timing register and do it right in sl82c105_dma_off_quietly() and to get rid of two extra arguments to config_for_pio() -- which got renamed to sl82c105_tune_pio() and now returns the selected PIO mode, with ide_config_drive_speed() call moved into the tuneproc() method, now called sl82c105_tune_drive() with the code to set drive's 'io_32bit' and 'unmask' flags in its turn moved to its proper place in the init_hwif() method. Also, while at it, rename get_timing_sl82c105() into get_pio_timings() and get rid of the code in it clamping cycle counts to 32 which was both incorrect and never executed anyway... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-05-05[MAC80211]: Add generic include/linux/ieee80211.hJiri Benc
Add generic IEEE 802.11 definitions. Signed-off-by: Jiri Benc <jbenc@suse.cz> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-05[NETLINK]: Remove references to process IDHerbert Xu
People treating the *_pid fields in netlink as a process ID has caused endless confusion over the years. The fact that our own netlink.h does this only adds to the confusion. So here is a patch to change the comments to refer to it as the port ID which hopefully will make it clear what the purpose of the fields really is. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
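The distinction between a port ID and a process ID can be seen from user space. The following is a minimal sketch, assuming a Linux system where NETLINK_ROUTE sockets are available; the helper name open_netlink_port() is hypothetical and is not part of the patch. Binding with nl_pid set to 0 asks the kernel to choose the port ID, and getsockname() reports what was chosen.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a NETLINK_ROUTE socket, let the kernel pick
 * the port ID by binding with nl_pid = 0, and report the assigned ID.
 * Returns the socket fd on success, or -1 on error.
 */
static int open_netlink_port(__u32 *port_id)
{
    struct sockaddr_nl addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = 0;    /* 0 asks the kernel to assign a port ID */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    *port_id = addr.nl_pid;
    return fd;
}
```

Calling the helper twice from one process should yield two different nl_pid values, which a process ID could not do; that is the confusion the comment change is meant to avoid.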
2007-05-04Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmcLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc: (46 commits) mmc-omap: Clean up omap set_ios and make MMC_POWER_ON work mmc-omap: Fix omap to use MMC_POWER_ON mmc-omap: add missing '\n' mmc: make tifm_sd_set_dma_data() static mmc: remove old card states mmc: support unsafe resume of cards mmc: separate out reading EXT_CSD mmc: break apart switch function MMC: Fix handling of low-voltage cards MMC: Consolidate voltage definitions mmc: add bus handler wbsd: check for data opcode earlier mmc: Separate out protocol ops mmc: Move core functions to subdir mmc: deprecate mmc bus topology mmc: remove card upon suspend mmc: allow suspended block driver to be removed mmc: Flush pending detects on host removal mmc: Move host and card drivers to subdirs mmc: Move queue functions to mmc_block ...
2007-05-04Merge git://git.linux-nfs.org/pub/linux/nfs-2.6Linus Torvalds
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (28 commits) NFS: Fix a compile glitch on 64-bit systems NFS: Clean up nfs_create_request comments spkm3: initialize hash spkm3: remove bad kfree, unnecessary export spkm3: fix spkm3's use of hmac NFS4: invalidate cached acl on setacl NFS: Fix directory caching problem - with test case and patch. NFS: Set meaningful value for fattr->time_start in readdirplus results. NFS: Added support to turn off the NFSv3 READDIRPLUS RPC. SUNRPC: RPC client should retry with different versions of rpcbind SUNRPC: remove old portmapper NFS: switch NFSROOT to use new rpcbind client SUNRPC: switch the RPC server to use the new rpcbind registration API SUNRPC: switch socket-based RPC transports to use rpcbind SUNRPC: introduce rpcbind: replacement for in-kernel portmapper SUNRPC: Eliminate side effects from rpc_malloc SUNRPC: RPC buffer size estimates are too large NLM: Shrink the maximum request size of NLM4 requests NFS: Use pgoff_t in structures and functions that pass page cache offsets NFS: Clean up nfs_sync_mapping_wait() ...
2007-05-04Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (49 commits) [SCTP]: Set assoc_id correctly during INIT collision. [SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv() [SCTP]: Fix the SO_REUSEADDR handling to be similar to TCP. [SCTP]: Verify all destination ports in sctp_connectx. [XFRM] SPD info TLV aggregation [XFRM] SAD info TLV aggregationx [AF_RXRPC]: Sort out MTU handling. [AF_IUCV/IUCV] : Add missing section annotations [AF_IUCV]: Implementation of a skb backlog queue [NETLINK]: Remove bogus BUG_ON [IPV6]: Some cleanups in include/net/ipv6.h [TCP]: zero out rx_opt in tcp_disconnect() [BNX2]: Fix TSO problem with small MSS. [NET]: Rework dev_base via list_head (v3) [TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_start [BNX2]: Update version and reldate. [BNX2]: Print bus information for PCIE devices. [BNX2]: Add 1-shot MSI handler for 5709. [BNX2]: Restructure PHY event handling. [BNX2]: Add indirect spinlock. ...
2007-05-04Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/dtor/input: (65 commits) Input: gpio_keys - add support for switches (EV_SW) Input: cobalt_btns - convert to use polldev library Input: add skeleton for simple polled devices Input: update some documentation Input: wistron - fix typo in keymap for Acer TM610 Input: add input_set_capability() helper Input: i8042 - add Fujitsu touchscreen/touchpad PNP IDs Input: i8042 - add Panasonic CF-29 to nomux list Input: lifebook - split into 2 devices Input: lifebook - add signature of Panasonic CF-29 Input: lifebook - activate 6-byte protocol on select models Input: lifebook - work properly on Panasonic CF-18 Input: cobalt buttons - separate device and driver registration Input: ati_remote - make button repeat sensitivity configurable Input: pxa27x - do not use deprecated SA_INTERRUPT flag Input: ucb1400 - make delays configurable Input: misc devices - switch to using input_dev->dev.parent Input: joysticks - switch to using input_dev->dev.parent Input: touchscreens - switch to using input_dev->dev.parent Input: mice - switch to using input_dev->dev.parent ... Fixed up conflicts with core device model removal of "struct subsystem" manually. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: remove "struct subsystem" as it is no longer needed sysfs: printk format warning DOC: Fix wrong identifier name in Documentation/driver-model/devres.txt platform: reorder platform_device_del Driver core: fix show_uevent from taking up way too much stack
2007-05-04Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (59 commits) PCI: Free resource files in error path of pci_create_sysfs_dev_files() pci-quirks: disable MSI on RS400-200 and RS480 PCI hotplug: Use menuconfig objects PCI: ZT5550 CPCI Hotplug driver fix PCI: rpaphp: Remove semaphores PCI: rpaphp: Ensure more pcibios_add/pcibios_remove symmetry PCI: rpaphp: Use pcibios_remove_pci_devices() symmetrically PCI: rpaphp: Document is_php_dn() PCI: rpaphp: Document find_php_slot() PCI: rpaphp: Rename rpaphp_register_pci_slot() to rpaphp_enable_slot() PCI: rpaphp: refactor tail call to rpaphp_register_slot() PCI: rpaphp: remove rpaphp_set_attention_status() PCI: rpaphp: remove print_slot_pci_funcs() PCI: rpaphp: Remove setup_pci_slot() PCI: rpaphp: remove a call that does nothing but a pointer lookup PCI: rpaphp: Remove another wrappered function PCI: rpaphp: Remve another call that is a wrapper PCI: rpaphp: remove a function that does nothing but wrap debug printks PCI: rpaphp: Remove un-needed goto PCI: rpaphp: Fix a memleak; slot->location string was never freed ...
2007-05-04Merge master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6: [CRYPTO] padlock: Remove pointless padlock module [CRYPTO] api: Add ablkcipher_request_set_tfm [CRYPTO] cryptd: Add software async crypto daemon [CRYPTO] api: Do not remove users unless new algorithm matches [CRYPTO] cryptomgr: Fix parsing of nested templates [CRYPTO] api: Add async blkcipher type [CRYPTO] templates: Pass type/mask when creating instances [CRYPTO] tcrypt: Use async blkcipher interface [CRYPTO] api: Add async block cipher interface [CRYPTO] api: Proc functions should be marked as unused
2007-05-04Convert non-highmem kmap_atomic() to static inline functionGeert Uytterhoeven
Convert kmap_atomic() in the non-highmem case from a macro to a static inline function, for better type-checking and the ability to pass void pointers instead of struct page pointers. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
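The type-checking benefit can be illustrated outside the kernel. This is a rough user-space sketch, not the patch itself: struct page, enum km_type and page_address() below are simplified stand-ins, and the names kmap_atomic_macro() and kmap_atomic_inline() are hypothetical labels for the "before" and "after" forms.

```c
#include <stddef.h>

/* Simplified stand-ins for kernel types; the real definitions differ. */
struct page { void *virt; };
enum km_type { KM_USER0, KM_USER1 };

static inline void *page_address(struct page *page)
{
    return page->virt;
}

/*
 * Macro form: the second argument is dropped by the preprocessor, so the
 * compiler never sees it and cannot check its type (or even whether it
 * names something that exists).
 */
#define kmap_atomic_macro(page, idx) page_address(page)

/*
 * Inline-function form: both arguments are real parameters, so each call
 * site is checked against the prototype. In C, a void pointer converts
 * implicitly to struct page *, so callers holding a void * can pass it.
 */
static inline void *kmap_atomic_inline(struct page *page, enum km_type idx)
{
    (void)idx;  /* unused when there is no highmem to map */
    return page_address(page);
}
```

Both forms return the same address for a valid call; the difference only shows up at compile time, where the inline function rejects calls whose arguments do not match the prototype.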
2007-05-04NuBus header updateFinn Thain
Sync the nubus defines with the latest code in the mac68k repo. Some of these are needed for DP8390 driver update in the next patch. Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04lockdep: Add missing disable/enable irq variantRoman Zippel
Add missing disable/enable irq variant Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04m68k: Atari keyboard and mouse support.Michael Schmitz
Atari keyboard and mouse support. (reformatting and Kconfig fixes by Roman Zippel) Signed-off-by: Michael Schmitz <schmitz@debian.org> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6Linus Torvalds
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6: (44 commits) i2c-s3c2410: Fix bug in releasing driver i2c-s3c2410: Fix I2C SDA to SCL setup time i2c: New i2c-tiny-usb bus driver i2c: Documentation update i2c: SPIN_LOCK_UNLOCKED cleanup i2c: Obsolete i2c-ixp2000, i2c-ixp4xx and scx200_i2c i2c: New Simtec I2C bus driver i2c: Bitbanging I2C bus driver using the GPIO API Use menuconfig objects - I2C i2c: Restore i2c_smbus_read_block_data i2c-pxa: Clean transaction stop i2c-algo-bit: Improve debugging i2c-algo-bit: Implement a 50/50 SCL duty cycle i2c-omap: Switch to static adapter numbering i2c: Blackfin Two Wire Interface driver i2c-algo-sgi: Comment and whitespace cleanups i2c: Make i2c_del_driver a void function i2c: Move i2c-isa-only exported symbol declarations i2c: Document i2c_new_device() i2c: Add i2c_new_probed_device() ... Fixed trivial conflict in Documentation/feature-removal-schedule.txt manually. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreqLinus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Report the number of processors in PowerNow-k8 correctly [CPUFREQ] do not declare undefined functions [CPUFREQ] cleanup kconfig options [CPUFREQ] Longhaul - Revert Longhaul ver. 2 [CPUFREQ] Remove deprecated /proc/acpi/processor/performance write support [CPUFREQ] Fix limited cpufreq when booted on battery Fix preemption warnings in speedstep-centrino.c [CPUFREQ] Longhaul - Correct PCI code [CPUFREQ] p4-clockmod: switch to rdmsr_on_cpu/wrmsr_on_cpu
2007-05-04[XFRM] SPD info TLV aggregationJamal Hadi Salim
Aggregate the SPD info TLVs. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04[XFRM] SAD info TLV aggregationxJamal Hadi Salim
Aggregate the SAD info TLVs. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03[NET]: Rework dev_base via list_head (v3)Pavel Emelianov
Cleanup of dev_base list use, with the aim to simplify making device list per-namespace. In almost every occasion, use of dev_base variable and dev->next pointer could be easily replaced by for_each_netdev loop. A few most complicated places were converted to using first_netdev()/next_netdev(). Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
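The shape of the conversion can be sketched in user space. The code below is an illustration under stated assumptions, not the kernel's implementation: the list helpers are a minimal copy of the intrusive-list idea, the dev_list member name is assumed, and this for_each_netdev() takes an explicit list head, which may differ from the signature of the real macro.

```c
#include <stddef.h>

/* Minimal user-space version of an intrusive doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified device: the list linkage replaces a hand-rolled ->next. */
struct net_device {
    const char *name;
    struct list_head dev_list;  /* assumed member name */
};

/* Hypothetical iterator: visit every device linked on `head`. */
#define for_each_netdev(head, d) \
    for ((d) = container_of((head)->next, struct net_device, dev_list); \
         &(d)->dev_list != (head); \
         (d) = container_of((d)->dev_list.next, struct net_device, dev_list))

/* Example walker: count the devices on a list. */
static int count_netdevs(struct list_head *head)
{
    struct net_device *d;
    int n = 0;

    for_each_netdev(head, d)
        n++;
    return n;
}
```

Because callers only see the iterator, the storage behind it can later change (for example, one list per namespace) without touching every loop, which is the simplification the commit message is aiming for.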
2007-05-03[BNX2]: Add support for 5709 Serdes.Michael Chan
Add PCI ID and code to support the 5709 Serdes PHY. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>