path: root/fs
AgeCommit message (Collapse)Author
2009-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits) netxen: update copyright netxen: fix tx timeout recovery netxen: fix file firmware leak netxen: improve pci memory access netxen: change firmware write size tg3: Fix return ring size breakage netxen: build fix for INET=n cdc-phonet: autoconfigure Phonet address Phonet: back-end for autoconfigured addresses Phonet: fix netlink address dump error handling ipv6: Add IFA_F_DADFAILED flag net: Add DEVTYPE support for Ethernet based devices mv643xx_eth.c: remove unused txq_set_wrr() ucc_geth: Fix hangs after switching from full to half duplex ucc_geth: Rearrange some code to avoid forward declarations phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs drivers/net/phy: introduce missing kfree drivers/net/wan: introduce missing kfree net: force bridge module(s) to be GPL Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded ... Fixed up trivial conflicts: - arch/x86/include/asm/socket.h converted to <asm-generic/socket.h> in the x86 tree. The generic header has the same new #define's, so that works out fine. - drivers/net/tun.c fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that switched over to using 'tun->socket.sk' instead of the redundantly available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks to the TUN driver") which added a new 'tun->sk' use. Noted in 'next' by Stephen Rothwell.
2009-09-14udf: Fix possible corruption when close races with writeJan Kara
When we close a file, we remove preallocated blocks from it. But this truncation was not protected by i_mutex and thus it could have raced with a write through a different fd and cause crashes or even filesystem corruption. Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14udf: Perform preallocation only for regular filesJan Kara
So far we preallocated blocks also for directories but that brings a problem, when to get rid of preallocated blocks we don't need. So far we removed them in udf_clear_inode() which has a disadvantage that 1) blocks are unavailable long after writing to a directory finished and thus one can get out of space unnecessarily early 2) releasing blocks from udf_clear_inode is problematic because VFS does not expect us to redirty inode there and it also slows down memory reclaim. So preallocate blocks only for regular files where we can drop preallocation in udf_release_file. Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14udf: Remove wrong assignment in udf_symlinkJan Kara
Recomputation of the pointer was wrong (it should have been just increment). Luckily, we never use the computed value. Remove it. Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14udf: Remove dead codeJan Kara
Remove code that gets never used. Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14fsync: wait for data writeout completion before calling ->fsyncChristoph Hellwig
Currenly vfs_fsync(_range) first calls filemap_fdatawrite to write out the data, the calls into ->fsync to write out the metadata and then finally calls filemap_fdatawait to wait for the data I/O to complete. What sounds like a clever micro-optimization actually is nast trap for many filesystems. For many modern filesystems i_size or other inode information is only updated on I/O completion and we need to wait for I/O to finish before we can write out the metadata. For old fashionen filesystems that instanciate blocks during the actual write and also update the metadata at that point it opens up a large window were we could expose uninitialized blocks after a crash. While a few filesystems that need it already wait for the I/O to finish inside their ->fsync methods it is rather suboptimal as it is done under the i_mutex and also always for the whole file instead of just a part as we could do for O_SYNC handling. Here is a small audit of all fsync instances in the tree: - spufs_mfc_fsync: - ps3flash_fsync: - vol_cdev_fsync: - printer_fsync: - fb_deferred_io_fsync: - bad_file_fsync: - simple_sync_file: don't care - filesystems/drivers do't use the page cache or are purely in-memory. - simple_fsync: - file_fsync: - affs_file_fsync: - fat_file_fsync: - jfs_fsync: - ubifs_fsync: - reiserfs_dir_fsync: - reiserfs_sync_file: never touch pagecache themselves. We need to wait before if we do not want to expose stale data after an allocation. - afs_fsync: - fuse_fsync_common: do the waiting writeback itself in awkward ways, would benefit from proper semantics - block_fsync: Does a filemap_write_and_wait on the block device inode. Because we now have f_mapping that is the same inode we call it on in vfs_fsync. So just removing it and letting the VFS do the work in one go would be an improvement. - btrfs_sync_file: - cifs_fsync: - xfs_file_fsync: need the wait first and currently do it themselves. would benefit from doing it outside i_mutex. - coda_fsync: - ecryptfs_fsync: - exofs_file_fsync: - shm_fsync: only passes the fsync through to the lower layer - ext3_sync_file: doesn't seem to care, comments are confusing. - ext4_sync_file: would need the wait to work correctly for delalloc mode with late i_size updates. Otherwise the ext3 comment applies. currently implemens it's own writeback and wait in an odd way, could benefit from doing it properly. - gfs2_fsync: not needed for journaled data mode, but probably harmless there. Currently writes back data asynchronously itself. Needs some major audit. - hostfs_fsync: just calls fsync/datasync on the host FD. Without the wait before data might not even be inflight yet if we're unlucky. - hpfs_file_fsync: - ncp_fsync: no-ops. Dangerous before and after. - jffs2_fsync: just calls jffs2_flush_wbuf_gc, not sure how this relates to data. - nfs_fsync_dir: just increments stats, claims all directory operations are synchronous - nfs_file_fsync: only writes out data??? Looks very odd. - nilfs_sync_file: looks like it expects all data done, but not sure from the code - ntfs_dir_fsync: - ntfs_file_fsync: appear to do their own data writeback. Very convoluted code. - ocfs2_sync_file: does it's own data writeback, but no wait. probably needs the wait. - smb_fsync: according to a comment expects all pages written already, probably needs the wait before. This patch only changes vfs_fsync_range, removal of the wait in the methods that have it is left to the filesystem maintainers. Note that most filesystems really do need an audit for their fsync methods given the gems found in this very brief audit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14vfs: Remove generic_osync_inode() and sync_page_range{_nolock}()Jan Kara
Remove these three functions since nobody uses them anymore. Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14fat: Opencode sync_page_range_nolock()Jan Kara
fat_cont_expand() is the only user of sync_page_range_nolock(). It's also the only user of generic_osync_inode() which does not have a file open. So opencode needed actions for FAT so that we can convert generic_osync_inode() to a standard syncing path. Update a comment about generic_osync_inode(). CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14xfs: Convert sync_page_range() to simple filemap_write_and_wait_range()Jan Kara
Christoph Hellwig says that it is enough for XFS to call filemap_write_and_wait_range() instead of sync_page_range() because we do all the metadata syncing when forcing the log. CC: Felix Blyakher <felixb@sgi.com> CC: xfs@oss.sgi.com CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14ocfs2: Update syncing after splicing to match generic versionJan Kara
Update ocfs2 specific splicing code to use generic syncing helper. The sync now does not happen under rw_lock because generic_write_sync() acquires i_mutex which ranks above rw_lock. That should not matter because standard fsync path does not hold it either. Acked-by: Joel Becker <Joel.Becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> CC: ocfs2-devel@oss.oracle.com Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14ntfs: Use new syncing helpers and update commentsJan Kara
Use new syncing helpers in .write and .aio_write functions. Also remove superfluous syncing in ntfs_file_buffered_write() and update comments about generic_osync_inode(). CC: Anton Altaparmakov <aia21@cantab.net> CC: linux-ntfs-dev@lists.sourceforge.net Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14ext4: Remove syncing logic from ext4_file_writeJan Kara
The syncing is now properly handled by generic_file_aio_write() so no special ext4 code is needed. CC: linux-ext4@vger.kernel.org CC: tytso@mit.edu Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14ext3: Remove syncing logic from ext3_file_writeJan Kara
Syncing is now properly done by generic_file_aio_write() so no special logic is needed in ext3. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14ext2: Update comment about generic_osync_inodeJan Kara
We rely on generic_write_sync() now. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14vfs: Introduce new helpers for syncing after writing to O_SYNC file or ↵Jan Kara
IS_SYNC inode Introduce new function for generic inode syncing (vfs_fsync_range) and use it from fsync() path. Introduce also new helper for syncing after a sync write (generic_write_sync) using the generic function. Use these new helpers for syncing from generic VFS functions. This makes O_SYNC writes to block devices acquire i_mutex for syncing. If we really care about this, we can make block_fsync() drop the i_mutex and reacquire it before it returns. CC: Evgeniy Polyakov <zbr@ioremap.net> CC: ocfs2-devel@oss.oracle.com CC: Joel Becker <joel.becker@oracle.com> CC: Felix Blyakher <felixb@sgi.com> CC: xfs@oss.sgi.com CC: Anton Altaparmakov <aia21@cantab.net> CC: linux-ntfs-dev@lists.sourceforge.net CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> CC: linux-ext4@vger.kernel.org CC: tytso@mit.edu Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14vfs: Rename generic_file_aio_write_nolockChristoph Hellwig
generic_file_aio_write_nolock() is now used only by block devices and raw character device. Filesystems should use __generic_file_aio_write() in case generic_file_aio_write() doesn't suit them. So rename the function to blkdev_aio_write() and move it to fs/blockdev.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolockJan Kara
Use the new helper. We have to submit data pages ourselves in case of O_SYNC write because __generic_file_aio_write does not do it for us. OCFS2 developpers might think about moving the sync out of i_mutex which seems to be easily possible but that's out of scope of this patch. CC: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-14fs/Kconfig: move nilfs2 outside misc filesystemsRyusuke Konishi
Some people asked me questions like the following: On Wed, 15 Jul 2009 13:11:21 +0200, Leon Woestenberg wrote: > just wondering, any reasons why NILFS2 is one of the miscellaneous > filesystems and, for example, btrfs, is not in Kconfig? Actually, nilfs is NOT a filesystem came from other operating systems, but a filesystem created purely for Linux. Nor is it a flash filesystem but that for generic block devices. So, this moves nilfs outside the misc category as I responded in LKML "Re: Why does NILFS2 hide under Miscellaneous filesystems?" (Message-Id: <20090716.002526.93465395.ryusuke@osrg.net>). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: convert nilfs_bmap_lookup to an inline functionRyusuke Konishi
The nilfs_bmap_lookup() is now a wrapper function of nilfs_bmap_lookup_at_level(). This moves the nilfs_bmap_lookup() to a header file converting it to an inline function and gives an opportunity for optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: allow btree code to directly call dat operationsRyusuke Konishi
The current btree code is written so that btree functions call dat operations via wrapper functions in bmap.c when they allocate, free, or modify virtual block addresses. This abstraction requires additional function calls and causes frequent call of nilfs_bmap_get_dat() function since it is used in the every wrapper function. This removes the wrapper functions and makes them available from btree.c and direct.c, which will increase the opportunity of compiler optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: add update functions of virtual block address to datRyusuke Konishi
This is a preparation for the successive cleanup ("nilfs2: allow btree to directly call dat operations"). This adds functions bundling a few operations to change an entry of virtual block address on the dat file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: remove individual gfp constants for each metadata fileRyusuke Konishi
This gets rid of NILFS_CPFILE_GFP, NILFS_SUFILE_GFP, NILFS_DAT_GFP, and NILFS_IFILE_GFP. All of these constants refer to NILFS_MDT_GFP, and can be removed. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: stop zero-fill of btree path just before free itRyusuke Konishi
The btree path object is cleared just before it is freed. This will remove the code doing the unnecessary clear operation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: remove unused btree argument from btree functionsRyusuke Konishi
Even though many btree functions take a btree object as their first argument, most of them are not used in their functions. This sticky use of the btree argument is hurting code readability and giving the possibility of inefficient code generation. So, this removes the unnecessary btree arguments. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: remove nilfs_dat_abort_start and nilfs_dat_abort_freeRyusuke Konishi
These functions are not called from any functions. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: shorten freeze period due to GC in write operation v3Jiro SEKIBA
This is a re-revised patch to shorten freeze period. This version include a fix of the bug Konishi-san mentioned last time. When GC is runnning, GC moves live block to difference segments. Copying live blocks into memory is done in a transaction, however it is not necessarily to be in the transaction. This patch will get the nilfs_ioctl_move_blocks() out from transaction lock and put it before the transaction. I ran sysbench fileio test against nilfs partition. I copied some DVD/CD images and created snapshot to create live blocks before starting the benchmark. Followings are summary of rc8 and rc8 w/ the patch of per-request statistics, which is min/max and avg. I ran each test three times and bellow is average of those numers. According to this benchmark result, average time is slightly degrated. However, worstcase (max) result is significantly improved. This can address a few seconds write freeze. - random write per-request performance of rc8 min 0.843ms max 680.406ms avg 3.050ms - random write per-request performance of rc8 w/ this patch min 0.843ms -> 100.00% max 380.490ms -> 55.90% avg 3.233ms -> 106.00% - sequential write per-request performance of rc8 min 0.736ms max 774.343ms avg 2.883ms - sequential write per-request performance of rc8 w/ this patch min 0.720ms -> 97.80% max 644.280ms-> 83.20% avg 3.130ms -> 108.50% -----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<----- protection_period 150 selection_policy timestamp # timestamp in ascend order nsegments_per_clean 2 cleaning_interval 2 retry_interval 60 use_mmap log_priority info -----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<----- Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: add more check routines in mount processZhu Yanhai
nilfs2: Add more safeguard routines and protections in mount process, which also makes nilfs2 report consistency error messages when checkpoint number is invalid. Signed-off-by: Zhu Yanhai <zhu.yanhai@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: An unassigned variable is assigned to a never used structure memberZhang Qiang
nilfs2: In procedure 'nilfs_get_sb()', when a nilfs filesysttem is mounted for the first time, local variable 'nilfs->ns_last_cno' is used before loading the latest checkpoint number from disk (in 'nilfs_fill_super'). 'nilfs->ns_last_cno' is assigned to 'sd.cno', but 'sd.cno' has never been used in the procedure. Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAITRyusuke Konishi
Alberto Bertogli advised me about bio_alloc() use in nilfs: On Sat, 13 Jun 2009 22:52:40 -0300, Alberto Bertogli wrote: > By the way, those bio_alloc()s are using GFP_NOWAIT but it looks > like they could use at least GFP_NOIO or GFP_NOFS, since the caller > can (and sometimes do) sleep. The only caller is nilfs_submit_bh(), > which calls nilfs_submit_seg_bio() which can sleep calling > wait_for_completion(). This takes in the comment and replaces the use of GFP_NOWAIT flag with GFP_NOIO. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: stop using periodic write_super callbackJiro SEKIBA
This removes nilfs_write_super and commit super block in nilfs internal thread, instead of periodic write_super callback. VFS layer calls ->write_super callback periodically. However, it looks like that calling back is ommited when disk I/O is busy. And when cleanerd (nilfs GC) is runnig, disk I/O tend to be busy thus nilfs superblock is not synchronized as nilfs designed. To avoid it, syncing superblock by nilfs thread instead of pdflush. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: clean up nilfs_write_superJiro SEKIBA
Separate conditions that check if syncing super block and alternative super block are required as inline functions to reuse the conditions. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fsJiro SEKIBA
This fixes disorder of nilfs_write_super in nilfs_sync_fs. Commiting super block must be the end of the function so that every changes are reflected. ->sync_fs() is not called frequently so this makes nilfs_sync_fs call nilfs_commit_super instead of nilfs_write_super. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: remove redundant super block commitJiro SEKIBA
This removes redundant super block commit. nilfs_write_super will call nilfs_commit_super to store super block into block device. However, nilfs_put_super will call nilfs_commit_super right after calling nilfs_write_super. So calling nilfs_write_super in nilfs_put_super would be redundant. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: implement nilfs_show_options to display mount options in /proc/mountsJiro SEKIBA
This is a patch to display mount options in procfs. Mount options will show up in the /proc/mounts as other fs does. ... /dev/sda6 /mnt nilfs2 ro,relatime,barrier=off,cp=3,order=strict 0 0 ... Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: always lookup disk block address before reading metadata blockRyusuke Konishi
The current metadata file code skips disk address lookup for its data block if the buffer has a mapped flag. This has a potential risk to cause read request to be performed against the stale block address that GC moved, and it may lead to meta data corruption. The mapped flag is safe if the buffer has an uptodate flag, otherwise it may prevent necessary update of disk address in the next read. This will avoid the potential problem by ensuring disk address lookup before reading metadata block even for buffers with the mapped flag. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: use semaphore to protect pointer to a writable FS-instanceRyusuke Konishi
will get rid of nilfs_get_writer() and nilfs_put_writer() pair used to retain a writable FS-instance for a period. The pair functions were making up some kind of recursive lock with a mutex, but they became overkill since the commit 201913ed746c7724a40d33ee5a0b6a1fd2ef3193. Furthermore, they caused the following lockdep warning because the mutex can be released by a task which didn't lock it: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- kswapd0/422 is trying to release lock (&nilfs->ns_writer_mutex) at: [<c1359ff5>] mutex_unlock+0x8/0xa but there are no more locks to release! other info that might help us debug this: no locks held by kswapd0/422. stack backtrace: Pid: 422, comm: kswapd0 Not tainted 2.6.31-rc4-nilfs #51 Call Trace: [<c1358f97>] ? printk+0xf/0x18 [<c104fea7>] print_unlock_inbalance_bug+0xcc/0xd7 [<c11578de>] ? prop_put_global+0x3/0x35 [<c1050195>] lock_release+0xed/0x1dc [<c1359ff5>] ? mutex_unlock+0x8/0xa [<c1359f83>] __mutex_unlock_slowpath+0xaf/0x119 [<c1359ff5>] mutex_unlock+0x8/0xa [<d1284add>] nilfs_mdt_write_page+0xd8/0xe1 [nilfs2] [<c1092653>] shrink_page_list+0x379/0x68d [<c109171b>] ? isolate_pages_global+0xb4/0x18c [<c1092bd2>] shrink_list+0x26b/0x54b [<c10930be>] shrink_zone+0x20c/0x2a2 [<c10936b7>] kswapd+0x407/0x591 [<c1091667>] ? isolate_pages_global+0x0/0x18c [<c1040603>] ? autoremove_wake_function+0x0/0x33 [<c10932b0>] ? kswapd+0x0/0x591 [<c104033b>] kthread+0x69/0x6e [<c10402d2>] ? kthread+0x0/0x6e [<c1003e33>] kernel_thread_helper+0x7/0x1a This patch uses a reader/writer semaphore instead of the own lock and kills this warning. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: fix format string compile warning (ino_t)Heiko Carstens
Unlike on most other architectures ino_t is an unsigned int on s390. So add an explicit cast to avoid this compile warning: fs/nilfs2/recovery.c: In function 'recover_dsync_blocks': fs/nilfs2/recovery.c:555: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: fix ignored error code in __nilfs_read_inode()Ryusuke Konishi
The __nilfs_read_inode function is ignoring the error code returned from nilfs_read_inode_common(), and wrongly delivers a success code (zero) when it escapes from the function in erroneous cases. This adds the missing error handling. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14GFS2: Whitespace fixesSteven Whitehouse
Reported-by: Daniel Walker <dwalker@fifo99.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-09-14block: use blkdev_issue_discard in blk_ioctl_discardChristoph Hellwig
blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard, the only difference between the two is that blkdev_issue_discard needs to send a barrier discard request and blk_ioctl_discard a non-barrier one, and blk_ioctl_discard needs to wait on the request. To facilitates this add a flags argument to blkdev_issue_discard to control both aspects of the behaviour. This will be very useful later on for using the waiting funcitonality for other callers. Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-14Seperate read and write statistics of in_flight requestsNikanth Karthikesan
Currently, there is a single in_flight counter measuring the number of requests in the request_queue. But some monitoring tools would like to know how many read requests and write requests are in progress. Split the current in_flight counter into two seperate counters for read and write. This information is exported as a sysfs attribute, as changing the currently available stat files would break the existing tools. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (87 commits) NFSv4: Disallow 'mount -t nfs4 -overs=2' and 'mount -t nfs4 -overs=3' NFS: Allow the "nfs" file system type to support NFSv4 NFS: Move details of nfs4_get_sb() to a helper NFS: Refactor NFSv4 text-based mount option validation NFS: Mount option parser should detect missing "port=" NFS: out of date comment regarding O_EXCL above nfs3_proc_create() NFS: Handle a zero-length auth flavor list SUNRPC: Ensure that sunrpc gets initialised before nfs, lockd, etc... nfs: fix compile error in rpc_pipefs.h nfs: Remove reference to generic_osync_inode from a comment SUNRPC: cache must take a reference to the cache detail's module on open() NFS: Use the DNS resolver in the mount code. NFS: Add a dns resolver for use with NFSv4 referrals and migration SUNRPC: Fix a typo in cache_pipefs_files nfs: nfs4xdr: optimize low level decoding nfs: nfs4xdr: get rid of READ_BUF nfs: nfs4xdr: simplify decode_exchange_id by reusing decode_opaque_inline nfs: nfs4xdr: get rid of COPYMEM nfs: nfs4xdr: introduce decode_sessionid helper nfs: nfs4xdr: introduce decode_verifier helper ...
2009-09-11ext4: Fix initalization of s_flex_groupsTheodore Ts'o
The s_flex_groups array should have been initialized using atomic_add to sum up the free counts from the block groups that make up a flex_bg. By using atomic_set, the value of the s_flex_groups array was set to the values of the last block group in the flex_bg. The impact of this bug is that the block and inode allocation algorithms might not pick the best flex_bg for new allocation. Thanks to Damien Guibouret for pointing out this problem! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-11Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits) sched: Fix sched::sched_stat_wait tracepoint field sched: Disable NEW_FAIR_SLEEPERS for now sched: Keep kthreads at default priority sched: Re-tune the scheduler latency defaults to decrease worst-case latencies sched: Turn off child_runs_first sched: Ensure that a child can't gain time over it's parent after fork() sched: enable SD_WAKE_IDLE sched: Deal with low-load in wake_affine() sched: Remove short cut from select_task_rq_fair() sched: Turn on SD_BALANCE_NEWIDLE sched: Clean up topology.h sched: Fix dynamic power-balancing crash sched: Remove reciprocal for cpu_power sched: Try to deal with low capacity, fix update_sd_power_savings_stats() sched: Try to deal with low capacity sched: Scale down cpu_power due to RT tasks sched: Implement dynamic cpu_power sched: Add smt_gain sched: Update the cpu_power sum during load-balance sched: Add SD_PREFER_SIBLING ...
2009-09-11Merge branch 'nfs-for-2.6.32'Trond Myklebust
2009-09-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (377 commits) ASoC: au1x: PSC-AC97 bugfixes ALSA: dummy - Increase MAX_PCM_SUBSTREAMS to 128 ALSA: dummy - Add debug proc file ALSA: Add const prefix to proc helper functions ALSA: Re-export snd_pcm_format_name() function ALSA: hda - Use auto model for HP laptops with ALC268 codec ALSA: cs46xx - Fix minimum period size ASoC: Fix WM835x Out4 capture enumeration ALSA: Remove unneeded ifdef from sound/core.h ALSA: Remove struct snd_monitor_file from public sound/core.h ASoC: Remove unuused hw_read_t sound: oxygen: work around MCE when changing volume ALSA: dummy - Fake buffer allocations ALSA: hda/realtek: Added support for CLEVO M540R subsystem, 6 channel + digital ASoC: fix pxa2xx-ac97.c breakage ALSA: dummy - Fix the timer calculation in systimer mode ALSA: dummy - Add more description ALSA: dummy - Better jiffies handling ALSA: dummy - Support high-res timer mode ALSA: Release v1.0.21 ...
2009-09-11Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'writeback' of git://git.kernel.dk/linux-2.6-block: writeback: check for registered bdi in flusher add and inode dirty writeback: add name to backing_dev_info writeback: add some debug inode list counters to bdi stats writeback: get rid of pdflush completely writeback: switch to per-bdi threads for flushing data writeback: move dirty inodes from super_block to backing_dev_info writeback: get rid of generic_sync_sb_inodes() export
2009-09-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (57 commits) binfmt_elf: fix PT_INTERP bss handling TPM: Fixup boot probe timeout for tpm_tis driver sysfs: Add labeling support for sysfs LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information. VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx. KEYS: Add missing linux/tracehook.h #inclusions KEYS: Fix default security_session_to_parent() Security/SELinux: includecheck fix kernel/sysctl.c KEYS: security_cred_alloc_blank() should return int under all circumstances IMA: open new file for read KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] KEYS: Extend TIF_NOTIFY_RESUME to (almost) all architectures [try #6] KEYS: Do some whitespace cleanups [try #6] KEYS: Make /proc/keys use keyid not numread as file position [try #6] KEYS: Add garbage collection for dead, revoked and expired keys. [try #6] KEYS: Flag dead keys to induce EKEYREVOKED [try #6] KEYS: Allow keyctl_revoke() on keys that have SETATTR but not WRITE perm [try #6] KEYS: Deal with dead-type keys appropriately [try #6] CRED: Add some configurable debugging [try #6] selinux: Support for the new TUN LSM hooks ...
2009-09-11splice: update mtime and atime on filesMiklos Szeredi
Splice should update the modification and access times on regular files just like read and write. Not updating mtime will confuse backup tools, etc... This patch only adds the time updates for regular files. For pipes and other special files that splice touches the need for updating the times is less clear. Let's discuss and fix that separately. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11bio: first step in sanitizing the bio->bi_rw flag testingJens Axboe
Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>