path: root/fs
Age  Commit message  Author
2009-09-11writeback: check for registered bdi in flusher add and inode dirtyJens Axboe
Also a debugging aid. We want to catch dirty inodes being added to backing devices that don't do writeback. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: add name to backing_dev_infoJens Axboe
This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: get rid of pdflush completelyJens Axboe
It is now unused, so kill it off. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: switch to per-bdi threads for flushing dataJens Axboe
This gets rid of pdflush for bdi writeout and kupdated style cleaning. pdflush writeout suffers from lack of locality and also requires more threads to handle the same workload, since it has to work in a non-blocking fashion against each queue. This also introduces lumpy behaviour and potential request starvation, since pdflush can be starved for queue access if others are accessing it. A sample ffsb workload that does random writes to files is about 8% faster here on a simple SATA drive during the benchmark phase. File layout also seems a LOT more smooth in vmstat:

 r  b swpd   free buff  cache si so bi     bo  in  cs us sy id wa
 0  1    0 608848 2652 375372  0  0  0  71024 604  24  1 10 48 42
 0  1    0 549644 2712 433736  0  0  0  60692 505  27  1  8 48 44
 1  0    0 476928 2784 505192  0  0  4  29540 553  24  0  9 53 37
 0  1    0 457972 2808 524008  0  0  0  54876 331  16  0  4 38 58
 0  1    0 366128 2928 614284  0  0  4  92168 710  58  0 13 53 34
 0  1    0 295092 3000 684140  0  0  0  62924 572  23  0  9 53 37
 0  1    0 236592 3064 741704  0  0  4  58256 523  17  0  8 48 44
 0  1    0 165608 3132 811464  0  0  0  57460 560  21  0  8 54 38
 0  1    0 102952 3200 873164  0  0  4  74748 540  29  1 10 48 41
 0  1    0  48604 3252 926472  0  0  0  53248 469  29  0  7 47 45

where vanilla tends to fluctuate a lot in the creation phase:

 r  b swpd   free buff  cache si so bi     bo  in  cs us sy id wa
 1  1    0 678716 5792 303380  0  0  0  74064 565  50  1 11 52 36
 1  0    0 662488 5864 319396  0  0  4    352 302 329  0  2 47 51
 0  1    0 599312 5924 381468  0  0  0  78164 516  55  0  9 51 40
 0  1    0 519952 6008 459516  0  0  4  78156 622  56  1 11 52 37
 1  1    0 436640 6092 541632  0  0  0  82244 622  54  0 11 48 41
 0  1    0 436640 6092 541660  0  0  0      8 152  39  0  0 51 49
 0  1    0 332224 6200 644252  0  0  4 102800 728  46  1 13 49 36
 1  0    0 274492 6260 701056  0  0  4  12328 459  49  0  7 50 43
 0  1    0 211220 6324 763356  0  0  0 106940 515  37  1 10 51 39
 1  0    0 160412 6376 813468  0  0  0   8224 415  43  0  6 49 45
 1  1    0  85980 6452 886556  0  0  4 113516 575  39  1 11 54 34
 0  2    0  85968 6452 886620  0  0  0   1640 158 211  0  0 46 54

A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A SSD based writeback test on XFS performs over 20% better as well, with the throughput being very stable around 1GB/sec, where pdflush only manages 750MB/sec and fluctuates wildly while doing so. Random buffered writes to many files behave a lot better as well, as does random mmap'ed writes. A separate thread is added to sync the super blocks. In the long term, adding sync_supers_bdi() functionality could get rid of this thread again. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: move dirty inodes from super_block to backing_dev_infoJens Axboe
This is a first step at introducing per-bdi flusher threads. We should have no change in behaviour, although sb_has_dirty_inodes() is now ridiculously expensive, as there's no easy way to answer that question. Not a huge problem, since it'll be deleted in subsequent patches. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: get rid of generic_sync_sb_inodes() exportJens Axboe
This adds two new exported functions:
- writeback_inodes_sb(), which only attempts to writeback dirty inodes on this super_block, for WB_SYNC_NONE writeout.
- sync_inodes_sb(), which writes out all dirty inodes on this super_block and also waits for the IO to complete.
Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-10ext4: Always set dx_node's fake_dirent explicitly.Andreas Schlick
When ext4_dx_add_entry() has to split an index node, it has to ensure that name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck won't recognise it as an intermediate htree node and consider the htree to be corrupted. Signed-off-by: Andreas Schlick <schlick@lavabit.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-11ext4: Fix async commit mode to be safe by using a barrierTheodore Ts'o
Previously the journal_async_commit mount option was equivalent to using barrier=0 (and just as unsafe). This patch fixes it so that we eliminate the barrier before the commit block (by not using ordered mode), and explicitly issue an empty barrier bio after writing the commit block. Because of the journal checksum, it is safe to do this; if the journal blocks are not all written before a power failure, the checksum in the commit block will prevent the last transaction from being replayed. Using the fs_mark benchmark, using journal_async_commit shows a 50% improvement:

FSUse%  Count   Size  Files/sec  App Overhead
     8   1000  10240       30.5         28242

vs.

FSUse%  Count   Size  Files/sec  App Overhead
     8   1000  10240       45.8         28620

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
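The safety argument in this message (a commit block whose checksum does not match the journal blocks is not replayed) can be sketched in userspace C. This is only an illustrative model, not the jbd2 code: the `toy_` names, the commit-record layout, and the checksum function are all hypothetical, and the real journal uses a proper CRC rather than this toy hash.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy checksum (NOT the CRC the real journal uses): folds every byte of
 * the transaction's journal blocks into one 32-bit value. */
uint32_t toy_checksum(const uint8_t *blocks, size_t nblocks, size_t block_size)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < nblocks * block_size; i++)
        sum = sum * 31u + blocks[i];
    return sum;
}

/* Hypothetical commit record: stores the checksum computed at commit time. */
struct toy_commit_block {
    uint32_t checksum;
};

/* Returns 1 if the blocks found on disk match what the commit block
 * recorded (safe to replay), 0 if the transaction must be discarded. */
int toy_should_replay(const struct toy_commit_block *commit,
                      const uint8_t *blocks, size_t nblocks, size_t block_size)
{
    return toy_checksum(blocks, nblocks, block_size) == commit->checksum;
}
```

If a power failure leaves one of the journal blocks unwritten, the recomputed value differs from the stored one and the last transaction is skipped, which is why the barrier before the commit block can be dropped.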
2009-09-11Merge branch 'next' into for-linusJames Morris
2009-09-10ext4: Don't update superblock write time when filesystem is read-onlyTheodore Ts'o
This avoids updating the superblock write time when we are mounting the root file system read/only but we need to replay the journal. At that point, for people who are east of GMT and who make their clock tick in localtime for Windows bug-for-bug compatibility, the system clock has not yet been corrected, so the write time would appear to be in the future, and this will cause e2fsck to complain and force a full file system check. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-10Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Alex Elder
2009-09-10Merge branch 'topic/soundcore-preclaim' into for-linusTakashi Iwai
* topic/soundcore-preclaim:
  sound: make OSS device number claiming optional and schedule its removal
  sound: request char-major-* module aliases for missing OSS devices
  chrdev: implement __[un]register_chrdev()
2009-09-10binfmt_elf: fix PT_INTERP bss handlingRoland McGrath
In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT. Here is a small test case. (Yes, there are other, useful PT_INTERP which have only .text and no .data/.bss.)

	----- ptinterp.S
	_start: .globl _start
		 nop
		 int3
	-----
	$ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
	$ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
	$ ./hello
	Segmentation fault  # during execve() itself

After applying the patch:

	$ ./hello
	Trace trap  # user-mode execution after execve() finishes

If the ELF headers are actually self-inconsistent, then dying is fine. But having no PROT_WRITE segment is perfectly normal and correct if there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser suggested checking for PROT_WRITE in the bss logic. I think it makes most sense to simply apply the bss logic only when there is bss. This patch looks less trivial than it is due to some reindentation. It just moves the "if (last_bss > elf_bss) {" test up to include the partial-page bss logic as well as the more-pages bss logic. Reported-by: John Reiser <jreiser@bitwagon.com> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
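The condition the message relies on can be written down as a minimal userspace sketch. It is not the kernel code: `struct seg_sizes` is a hypothetical stand-in for the two program-header fields involved, and both helper names are invented for illustration. Only the comparisons themselves (p_memsz > p_filesz, and last_bss > elf_bss) come from the message.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the two ELF program header
 * fields that matter here. */
struct seg_sizes {
    uint32_t p_filesz;  /* bytes of the segment backed by the file */
    uint32_t p_memsz;   /* bytes the segment occupies in memory */
};

/* A segment carries bss only when it occupies more memory than the
 * file provides for it. */
int segment_has_bss(const struct seg_sizes *seg)
{
    return seg->p_memsz > seg->p_filesz;
}

/* Guard described in the commit: do any bss work (partial-page zeroing
 * as well as mapping further pages) only when there is bss at all. */
int needs_bss_handling(unsigned long elf_bss, unsigned long last_bss)
{
    return last_bss > elf_bss;
}
```

For an interpreter with only .text, the file size and memory size of every segment are equal, so neither helper reports bss and no zeroing of a read-only page would be attempted.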
2009-09-09ext4: Clarify the locking details in mballocAneesh Kumar K.V
We don't need to take the alloc_sem lock when we are adding new groups, since mballoc won't see the new group added until we bump sbi->s_groups_count. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2009-09-09ext4: check for need init flag in ext4_mb_load_buddyAneesh Kumar K.V
We should check for need init flag with the group's alloc_sem held, to make sure while we are loading the buddy cache and holding a reference to it, a file system resize can't add new blocks to same group. The patch also drops the need init flag check in ext4_mb_regular_allocator() because doing the check without holding alloc_sem is racy. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2009-09-09ext4: move ext4_mb_init_group() function earlier in the mballoc.cAneesh Kumar K.V
This moves the function around so that it can be called from ext4_mb_load_buddy(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-09Merge branch 'lookup-permissions-cleanup'Linus Torvalds
* lookup-permissions-cleanup:
  jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'
  ext[234]: move over to 'check_acl' permission model
  shmfs: use 'check_acl' instead of 'permission'
  Make 'check_acl()' a first-class filesystem op
  Simplify exec_permission_lite(), part 3
  Simplify exec_permission_lite() further
  Simplify exec_permission_lite() logic
  Do not call 'ima_path_check()' for each path component
2009-09-09binfmt_elf: fix PT_INTERP bss handlingRoland McGrath
In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT. Here is a small test case. (Yes, there are other, useful PT_INTERP which have only .text and no .data/.bss.)

	----- ptinterp.S
	_start: .globl _start
		 nop
		 int3
	-----
	$ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
	$ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
	$ ./hello
	Segmentation fault  # during execve() itself

After applying the patch:

	$ ./hello
	Trace trap  # user-mode execution after execve() finishes

If the ELF headers are actually self-inconsistent, then dying is fine. But having no PROT_WRITE segment is perfectly normal and correct if there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser suggested checking for PROT_WRITE in the bss logic. I think it makes most sense to simply apply the bss logic only when there is bss. This patch looks less trivial than it is due to some reindentation. It just moves the "if (last_bss > elf_bss) {" test up to include the partial-page bss logic as well as the more-pages bss logic. Reported-by: John Reiser <jreiser@bitwagon.com> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-09ext4: Make non-journal fsync work properlyFrank Mayhar
Teach ext4_write_inode() and ext4_do_update_inode() about non-journal mode: If we're not using a journal, ext4_write_inode() now calls ext4_do_update_inode() (after getting the iloc via ext4_get_inode_loc()) with a new "do_sync" parameter. If that parameter is nonzero _and_ we're not using a journal, ext4_do_update_inode() calls sync_dirty_buffer() instead of ext4_handle_dirty_metadata(). This problem was found in power-fail testing, checking the amount of loss of files and blocks after a power failure when using fsync() and when not using fsync(). It turned out that using fsync() was actually worse than not doing so, possibly because it increased the likelihood that the inodes would remain unflushed and would therefore be lost at the power failure. Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-12ext4: Assure that metadata blocks are written during fsync in no journal modeTheodore Ts'o
When there is no journal present, we must attach buffer heads associated with extent tree and indirect blocks to the inode's mapping->private_list via mark_buffer_dirty_inode() so that ext4_sync_file() --- which is called to service fsync() and fdatasync() system calls --- can write out the inode's metadata blocks by calling sync_mapping_buffers(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-09ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}()Theodore Ts'o
When ext4 is using a journal, a metadata block which is deallocated must be passed into the journal layer so it can be dropped from the current transaction and/or revoked. This is done by calling the functions ext4_journal_forget() and ext4_journal_revoke(), which call jbd2_journal_forget(), and jbd2_journal_revoke(), respectively. Since the jbd2_journal_forget() and jbd2_journal_revoke() call bforget(), if ext4 is not using a journal, ext4_journal_forget() and ext4_journal_revoke() must call bforget() to avoid a dirty metadata block overwriting a block after it has been reallocated and reused for another inode's data block. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-10sysfs: Add labeling support for sysfsDavid P. Quigley
This patch adds a setxattr handler to the file, directory, and symlink inode_operations structures for sysfs. The patch uses hooks introduced in the previous patch to handle the getting and setting of security information for the sysfs inodes. As was suggested by Eric Biederman the struct iattr in the sysfs_dirent structure has been replaced by a structure which contains the iattr, secdata and secdata length to allow the changes to persist in the event that the inode representing the sysfs_dirent is evicted. Because sysfs only stores this information when a change is made all the optional data is moved into one dynamically allocated field. This patch addresses an issue where SELinux was denying virtd access to the PCI configuration entries in sysfs. The lack of setxattr handlers for sysfs required that a single label be assigned to all entries in sysfs. Granting virtd access to every entry in sysfs is not an acceptable solution so fine grained labeling of sysfs is required such that individual entries can be labeled appropriately. [sds: Fixed compile-time warnings, coding style, and setting of inode security init flags.] Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-10VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx. David P. Quigley
This factors out the part of the vfs_setxattr function that performs the setting of the xattr and its notification. This is needed so the SELinux implementation of inode_setsecctx can handle the setting of the xattr while maintaining the proper separation of layers. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-09xfs: use correct log reservation when handling ENOSPC in xfs_createChristoph Hellwig
We added the ENOSPC handling patch in xfs_create just after it got merged with xfs_mkdir. Change the log reservation to the variable for either the create or mkdir value so it does the right thing if we get here for creating a directory. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-09-09GFS2: Remove unused sysfs fileSteven Whitehouse
The /sys/fs/gfs2/<fsname>/lock_module/id file has been unused for some time now, so we can remove it. We still accept the mount option though, as userspace still sends that. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-09-08NFSv4: Disallow 'mount -t nfs4 -overs=2' and 'mount -t nfs4 -overs=3'Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08NFS: Allow the "nfs" file system type to support NFSv4Chuck Lever
When mounting an "nfs" type file system, recognize "v4," "vers=4," or "nfsvers=4" mount options, and convert the file system to "nfs4" under the covers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [trondmy: fixed up binary mount code so it sets the 'version' field too] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
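As a rough illustration of the recognition step described here, the following userspace sketch checks a single, already tokenized mount option against the three spellings listed in the message. The function name is hypothetical, and the real parser works on a full option string with its own token table; this only shows the matching rule.

```c
#include <string.h>

/* Returns 1 if a single mount option (already split out of the
 * comma-separated option string) asks for NFSv4, 0 otherwise.
 * The three spellings are the ones listed in the commit message. */
int option_requests_nfs4(const char *opt)
{
    return strcmp(opt, "v4") == 0 ||
           strcmp(opt, "vers=4") == 0 ||
           strcmp(opt, "nfsvers=4") == 0;
}
```

A caller would then switch the mount over to the "nfs4" handling whenever any option in the list matches.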
2009-09-08NFS: Move details of nfs4_get_sb() to a helperChuck Lever
Clean up: Refactor nfs4_get_sb() to allow its guts to be invoked by nfs_get_sb(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08NFS: Refactor NFSv4 text-based mount option validationChuck Lever
Clean up: Refactor the part of nfs4_validate_mount_options() that handles text-based options, so we can call it from the NFSv2/v3 option validation function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08NFS: Mount option parser should detect missing "port="Chuck Lever
The meaning of not specifying the "port=" mount option is different for "-t nfs" and "-t nfs4" mounts. The default port value for NFSv2/v3 mounts is 0, but the default for NFSv4 mounts is 2049. To support "-t nfs -o vers=4", the mount option parser must detect when "port=" is missing so that the correct default port value can be set depending on which NFS version is requested. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
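The reason the parser has to remember that "port=" was absent can be shown with a small sketch. The sentinel `TOY_PORT_UNSPEC` and the function name are assumptions made for this example; the defaults of 0 and 2049 are the ones stated in the message.

```c
/* Hypothetical sentinel meaning "no port= option was given". */
#define TOY_PORT_UNSPEC (-1)

/* Picks the port to use once the NFS version is known. An explicit
 * port= always wins; otherwise the default depends on the version:
 * 0 for NFSv2/v3, 2049 for NFSv4. */
int effective_nfs_port(int nfs_version, int parsed_port)
{
    if (parsed_port != TOY_PORT_UNSPEC)
        return parsed_port;
    return nfs_version == 4 ? 2049 : 0;
}
```

Had the parser stored a default of 0 immediately, a later "vers=4" could not tell an omitted port from an explicit "port=0", which is the ambiguity the sentinel avoids.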
2009-09-08NFS: out of date comment regarding O_EXCL above nfs3_proc_create()Harshula Jayasuriya
Hi Trond, Recently we were observing the behaviour difference between a 2.4.x and 2.6.x kernel with respect to O_EXCL. A comment from 2.4.x era, "For now, we don't implement O_EXCL." seems inaccurate in TOT. If so, here's a patch to remove the comment. This patch is against: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Signed-off-by: Harshula Jayasuriya <harshula@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'Linus Torvalds
This avoids an indirect call in the VFS for each path component lookup. Well, at least as long as you own the directory in question, and the ACL check is unnecessary. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08ext[234]: move over to 'check_acl' permission modelLinus Torvalds
Don't implement per-filesystem 'extX_permission()' functions that have to be called for every path component operation, and instead just expose the actual ACL checking so that the VFS layer can now do it for us. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08Make 'check_acl()' a first-class filesystem opLinus Torvalds
This is stage one in flattening out the callchains for the common permission testing. Rather than have most filesystem implement their own inode->i_op->permission function that just calls back down to the VFS layers 'generic_permission()' with the per-filesystem ACL checking function, the filesystem can just expose its 'check_acl' function directly, and let the VFS layer do everything for it. This is all just preparatory - no filesystem actually enables this yet. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
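A simplified userspace model may help picture the flattened call chain: the generic permission check calls an optional per-filesystem ACL hook directly instead of going through a per-filesystem permission function. Everything prefixed `toy_` is hypothetical, capabilities are ignored, and the real generic_permission() has more conditions than shown here.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAY_EXEC  1
#define TOY_MAY_WRITE 2
#define TOY_MAY_READ  4

/* Hypothetical, heavily simplified inode. */
struct toy_inode {
    unsigned int mode;  /* low 9 bits: rwxrwxrwx */
    unsigned int uid;
    unsigned int gid;
    /* Optional per-filesystem ACL hook; returns -EAGAIN when there is
     * no ACL and the mode bits should decide. May be NULL. */
    int (*check_acl)(const struct toy_inode *inode, int mask);
};

/* Example hook standing in for a filesystem whose ACL denies access. */
int toy_deny_acl(const struct toy_inode *inode, int mask)
{
    (void)inode;
    (void)mask;
    return -EACCES;
}

/* Generic check: the owner is decided by the owner bits alone, so the
 * ACL hook is never consulted for them; everyone else goes through the
 * hook first and falls back to the group/other bits on -EAGAIN. */
int toy_permission(const struct toy_inode *inode,
                   unsigned int uid, unsigned int gid, int mask)
{
    unsigned int mode = inode->mode;

    if (uid == inode->uid) {
        mode >>= 6;
    } else {
        if (inode->check_acl) {
            int err = inode->check_acl(inode, mask);
            if (err != -EAGAIN)
                return err;
        }
        if (gid == inode->gid)
            mode >>= 3;
    }
    return ((mode & (unsigned int)mask) == (unsigned int)mask) ? 0 : -EACCES;
}
```

In this model a filesystem only supplies `check_acl`; the owner/group/other logic lives in one shared place.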
2009-09-08Simplify exec_permission_lite(), part 3Linus Torvalds
Don't call down to the generic inode_permission() function just to call the inode-specific permission function - just do it directly. The generic inode_permission() code does things like checking MAY_WRITE and devcgroup_inode_permission(), neither of which are relevant for the light pathname walk permission checks (we always do just MAY_EXEC, and the inode is never a special device). Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08Simplify exec_permission_lite() furtherLinus Torvalds
This function is only called for path components that are already known to be directories (they have a '->lookup' method). So don't bother doing that whole S_ISDIR() testing, the whole point of the 'lite()' version is that we know that we are looking at a directory component, and that we're only checking name lookup permission. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08Simplify exec_permission_lite() logicLinus Torvalds
Instead of returning EAGAIN and having the caller do something special for that case, just do the special case directly. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08Do not call 'ima_path_check()' for each path componentLinus Torvalds
Not only is that a supremely timing-critical path, but it's hopefully some day going to be lockless for the common case, and ima can't do that. Plus the integrity code doesn't even care about non-regular files, so it was always a total waste of time and effort. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08GFS2: Be extra careful about deallocating inodesSteven Whitehouse
There is a potential race in the inode deallocation code if two nodes try to deallocate the same inode at the same time. Most of the issue is solved by the iopen locking. There is still a small window which is not covered by the iopen lock. This patch fixes that and also makes the deallocation code more robust in the face of any errors in the rgrp bitmaps, or erroneous iopen callbacks from other nodes. This does introduce one extra disk read, but that is generally not an issue since it's the same block that must be written to later in the deallocation process. The total disk accesses therefore stay the same. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-09-08ext4: print more sysadmin-friendly message in check_block_validity()Theodore Ts'o
Drop the WARN_ON(1), as the stack trace is not appropriate, since it is triggered by file system corruption, and it misleads users into thinking there is a kernel bug. In addition, change the message displayed by ext4_error() to make it clear that this is a file system corruption problem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-09ext4: Take page lock before looking at attached buffer_heads flagsAneesh Kumar K.V
In order to check whether the buffer_heads are mapped we need to hold page lock. Otherwise a reclaim can cleanup the attached buffer_heads. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-07IMA: update ima_counts_putMimi Zohar
- As ima_counts_put() may be called after the inode has been freed, verify that the inode is not NULL, before dereferencing it.
- Maintain the IMA file counters in may_open() properly, decrementing any counter increments on subsequent errors.
Reported-by: Ciprian Docan <docan@eden.rutgers.edu> Reported-by: J.R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-05ext4: Fix small typo for move_extent_per_page()Akira Fujita
This function means moving extents every page, so change its name from move_exgtent_par_page(). Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.co.jp> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-05ext4: Return exchanged blocks count to user space in failureAkira Fujita
Return exchanged blocks count (moved_len) to user space, if ext4_move_extents() failed on the way. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-05ext4: Remove unneeded BUG_ON() in ext4_move_extents()Akira Fujita
The ext4_move_extents() function checks with BUG_ON() whether the exchanged blocks count matches the requested blocks count. But if the target range (orig_start + len) includes sparse block(s), 'moved_len' (exchanged blocks count) does not agree with 'len' (requested blocks count), since sparse blocks are not counted in 'moved_len'. This causes us to hit the BUG_ON(), even though the function succeeded. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-16ext4: Fix wrong comparisons in mext_check_arguments()Akira Fujita
The mext_check_arguments() function in move_extents.c has wrong comparisons. orig_start, which is passed from user space, is in units of blocks, but the inode's i_size is in bytes, so the checks do not work correctly. This mis-check leads to an overflow of 'len', which then hits the BUG_ON() in ext4_move_extents(). The patch fixes this issue. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Reviewed-by: Greg Freemyer <greg.freemyer@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
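The unit mismatch is easy to reproduce outside the kernel. The sketch below is not the actual patch; the `toy_` helpers are hypothetical and simply show one way to compare a block offset against a byte size by converting the size to blocks first (rounding up, with `blkbits` being the base-2 logarithm of the block size).

```c
#include <stdint.h>

/* Number of blocks needed to hold i_size_bytes, rounding up. */
uint64_t toy_size_in_blocks(uint64_t i_size_bytes, unsigned int blkbits)
{
    uint64_t block_size = (uint64_t)1 << blkbits;
    return (i_size_bytes + block_size - 1) >> blkbits;
}

/* Compares like with like: a start offset in blocks against the file
 * size converted to blocks. Returns 1 if the start lies inside the file. */
int toy_start_within_file(uint64_t orig_start_blocks,
                          uint64_t i_size_bytes, unsigned int blkbits)
{
    return orig_start_blocks < toy_size_in_blocks(i_size_bytes, blkbits);
}
```

With 4096-byte blocks and an 8192-byte file, a start offset of block 100 is well past the end of the file, yet a naive `100 < 8192` comparison of blocks against bytes would accept it.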
2009-09-05ext4: fix cache flush in ext4_sync_fileChristoph Hellwig
We need to flush the write cache unconditionally in ->fsync, otherwise writes into already allocated blocks can get lost. Writes into fully allocated files are very common when using disk images for virtualization, and without this fix can easily lose data after an fdatasync, which is the typical implementation for a cache flush on the virtual drive. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-05Merge git://git.infradead.org/~dwmw2/mtd-2.6.31Linus Torvalds
* git://git.infradead.org/~dwmw2/mtd-2.6.31:
  JFFS2: add missing verify buffer allocation/deallocation
  mtd: nftl: fix offset alignments
  mtd: nftl: write support is broken
  mtd: m25p80: fix null pointer dereference bug
2009-09-05Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: actually enable the swapext compat handler
2009-09-05Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key