aboutsummaryrefslogtreecommitdiff
path: root/drivers/md/bcache
AgeCommit message (Collapse)Author
2014-02-06bcache: Data corruption fixKent Overstreet
commit ef71ec00002d92a08eb27e9d036e3d48835b6597 upstream. The code that handles overlapping extents that we've just read back in from disk was depending on the behaviour of the code that handles overlapping extents as we're inserting into a btree node in the case of an insert that forced an existing extent to be split: on insert, if we had to split we'd also insert a new extent to represent the top part of the old extent - and then that new extent would get written out. The code that read the extents back in thus not bother with splitting extents - if it saw an extent that ovelapped in the middle of an older extent, it would trim the old extent to only represent the bottom part, assuming that the original insert would've inserted a new extent to represent the top part. I still haven't figured out _how_ it can happen, but I'm now pretty convinced (and testing has confirmed) that there's some kind of an obscure corner case (probably involving extent merging, and multiple overwrites in different sets) that breaks this. The fix is to change the mergesort fixup code to split extents itself when required. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-13bcache: Fixed incorrect order of arguments to bio_alloc_bioset()Kent Overstreet
commit d4eddd42f592a0cf06818fae694a3d271f842e4d upstream. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13bcache: Fix a null ptr deref regressionKent Overstreet
commit 2fe80d3bbf1c8bd9efc5b8154207c8dd104e7306 upstream. Commit c0f04d88e46d ("bcache: Fix flushes in writeback mode") was fixing a reported data corruption bug, but it seems some last minute refactoring or rebasing introduced a null pointer deref. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Reported-by: Gabriel de Perthuis <g2p.code@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05bcache: Fix flushes in writeback modeKent Overstreet
commit c0f04d88e46d14de51f4baebb6efafb7d59e9f96 upstream. In writeback mode, when we get a cache flush we need to make sure we issue a flush to the backing device. The code for sending down an extra flush was wrong - by cloning the bio we were probably getting flags that didn't make sense for a bare flush, and also the old code was firing for FUA bios, for which we don't need to send a flush to the backing device. This was causing data corruption somehow - the mechanism was never determined, but this patch fixes it for the users that were seeing it. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05bcache: Fix for handling overlapping extents when reading in a btree nodeKent Overstreet
commit 84786438ed17978d72eeced580ab757e4da8830b upstream. btree_sort_fixup() was overly clever, because it was trying to avoid pulling a key off the btree iterator in more than one place. This led to a really obscure bug where we'd break early from the loop in btree_sort_fixup() if the current key overlapped with keys in more than one older set, and the next key it overlapped with was zero size. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05bcache: Fix a shrinker deadlockKent Overstreet
commit a698e08c82dfb9771e0bac12c7337c706d729b6d upstream. GFP_NOIO means we could be getting called recursively - mca_alloc() -> mca_data_alloc() - definitely can't use mutex_lock(bucket_lock) then. Whoops. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05bcache: Fix a dumb CPU spinning bug in writebackKent Overstreet
commit 79e3dab90d9f826ceca67c7890e048ac9169de49 upstream. schedule_timeout() != schedule_timeout_uninterruptible() Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05bcache: Fix a flush/fua performance bugKent Overstreet
commit 1394d6761b6e9e15ee7c632a6d48791188727b40 upstream. bch_journal_meta() was missing the flush to make the journal write actually go down (instead of waiting up to journal_delay_ms)... Whoops Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05bcache: Fix a writeback performance regressionKent Overstreet
commit c2a4f3183a1248f615a695fbd8905da55ad11bba upstream. Background writeback works by scanning the btree for dirty data and adding those keys into a fixed size buffer, then for each dirty key in the keybuf writing it to the backing device. When read_dirty() finishes and it's time to scan for more dirty data, we need to wait for the outstanding writeback IO to finish - they still take up slots in the keybuf (so that foreground writes can check for them to avoid races) - without that wait, we'll continually rescan when we'll be able to add at most a key or two to the keybuf, and that takes locks that starves foreground IO. Doh. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05bcache: Fix for when no journal entries are foundKent Overstreet
commit c426c4fd46f709ade2bddd51c5738729c7ae1db5 upstream. The journal replay code didn't handle this case, causing it to go into an infinite loop... Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05bcache: Strip endline when writing the label through sysfsGabriel de Perthuis
commit aee6f1cfff3ce240eb4b43b41ca466b907acbd2e upstream. sysfs attributes with unusual characters have crappy failure modes in Squeeze (udev 164); later versions of udev are unaffected. This should make these characters more unusual. Signed-off-by: Gabriel de Perthuis <g2p.code@gmail.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05bcache: Fix a dumb journal discard bugKent Overstreet
commit 6d9d21e35fbfa2934339e96934f862d118abac23 upstream. That switch statement was obviously wrong, leading to some sort of weird spinning on rare occasion with discards enabled... Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-29bcache: FUA fixesKent Overstreet
commit e49c7c374e7aacd1f04ecbc21d9dbbeeea4a77d6 upstream. Journal writes need to be marked FUA, not just REQ_FLUSH. And btree node writes have... weird ordering requirements. Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-29md: bcache: io.c: fix a potential NULL pointer dereferenceKumar Amit Mehta
commit 5c694129c8db6d89c9be109049a16510b2f70f6d upstream. bio_alloc_bioset returns NULL on failure. This fix adds a missing check for potential NULL pointer dereferencing. Signed-off-by: Kumar Amit Mehta <gmate.amit@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28bcache: Journal replay fixKent Overstreet
commit faa5673617656ee58369a3cfe4a312cfcdc59c81 upstream. The journal replay code starts by finding something that looks like a valid journal entry, then it does a binary search over the unchecked region of the journal for the journal entries with the highest sequence numbers. Trouble is, the logic was wrong - journal_read_bucket() returns true if it found journal entries we need, but if the range of journal entries we're looking for loops around the end of the journal - in that case journal_read_bucket() could return true when it hadn't found the highest sequence number we'd seen yet, and in that case the binary search did the wrong thing. Whoops. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28bcache: Fix GC_SECTORS_USED() calculationKent Overstreet
commit 29ebf465b9050f241c4433a796a32e6c896a9dcd upstream. Part of the job of garbage collection is to add up however many sectors of live data it finds in each bucket, but that doesn't work very well if it doesn't reset GC_SECTORS_USED() when it starts. Whoops. This wouldn't have broken anything horribly, but allocation tries to preferentially reclaim buckets that are mostly empty and that's not gonna work with an incorrect GC_SECTORS_USED() value. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28bcache: Fix a sysfs splat on shutdownKent Overstreet
commit c9502ea4424b31728703d113fc6b30bfead14633 upstream. If we stopped a bcache device when we were already detaching (or something like that), bcache_device_unlink() would try to remove a symlink from sysfs that was already gone because the bcache dev kobject had already been removed from sysfs. So keep track of whether we've removed stuff from sysfs. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28bcache: Shutdown fixKent Overstreet
commit 5caa52afc5abd1396e4af720469abb5843a71eb8 upstream. Stopping a cache set is supposed to make it stop attached backing devices, but somewhere along the way that code got lost. Fixing this mainly has the effect of fixing our reboot notifier. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28bcache: Advertise that flushes are supportedKent Overstreet
commit 54d12f2b4fd0f218590d1490b41a18d0e2328a9a upstream. Whoops - bcache's flush/FUA was mostly correct, but flushes get filtered out unless we say we support them... Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28bcache: Fix a dumb raceKent Overstreet
commit 6aa8f1a6ca41c49721d2de4e048d3da8d06411f9 upstream. In the far-too-complicated closure code - closures can have destructors, for probably dubious reasons; they get run after the closure is no longer waiting on anything but before dropping the parent ref, intended just for freeing whatever memory the closure is embedded in. Trouble is, when remaining goes to 0 and we've got nothing more to run - we also have to unlock the closure, setting remaining to -1. If there's a destructor, that unlock isn't doing anything - nobody could be trying to lock it if we're about to free it - but if the unlock _is needed... that check for a destructor was racy. Argh. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-15bcache: Fix error handling in init codeKent Overstreet
This code appears to have rotted... fix various bugs and do some refactoring. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-05-15bcache: drop "select CLOSURES"Paul Bolle
The Kconfig entry for BCACHE selects CLOSURES. But there's no Kconfig symbol CLOSURES. That symbol was used in development versions of bcache, but was removed when the closures code was no longer provided as a kernel library. It can safely be dropped. Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
2013-05-15bcache: Fix incompatible pointer type warningEmil Goode
The function pointer release in struct block_device_operations should point to functions declared as void. Sparse warnings: drivers/md/bcache/super.c:656:27: warning: incorrect type in initializer (different base types) drivers/md/bcache/super.c:656:27: expected void ( *release )( ... ) drivers/md/bcache/super.c:656:27: got int ( static [toplevel] *<noident> )( ... ) drivers/md/bcache/super.c:656:2: warning: initialization from incompatible pointer type [enabled by default] drivers/md/bcache/super.c:656:2: warning: (near initialization for ‘bcache_ops.release’) [enabled by default] Signed-off-by: Emil Goode <emilgoode@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-30bcache: Use bd_link_disk_holder()Kent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-30bcache: Allocator cleanup/fixesKent Overstreet
The main fix is that bch_allocator_thread() wasn't waiting on garbage collection to finish (if invalidate_buckets had set ca->invalidate_needs_gc); we need that to make sure the allocator doesn't spin and potentially block gc from finishing. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-24bcache: Make sure blocksize isn't smaller than device blocksizeKent Overstreet
Sanity check to make sure we don't end up doing IO the device doesn't support. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-22bcache: Fix merge_bvec_fn usage for when it modifies the bvmKent Overstreet
Stacked md devices reuse the bvm for the subordinate device, causing problems... Reported-by: Michael Balser <michael.balser@profitbricks.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-20bcache: Correctly check against BIO_MAX_PAGESKent Overstreet
bch_bio_max_sectors() was checking against BIO_MAX_PAGES as if the limit was for the total bytes in the bio, not the number of segments. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-20bcache: Hack around stuff that clones up to bi_max_vecsKent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-20bcache: Set ra_pages based on backing device's ra_pagesKent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-20bcache: Take data offset from the bdev superblock.Kent Overstreet
Add a new superblock version, and consolidate related defines. Signed-off-by: Gabriel de Perthuis <g2p.code+bcache@gmail.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-08bcache: Disable broken btree fuzz testerKent Overstreet
Reported-by: <sasha.levin@oracle.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-08bcache: Fix a format string overflowKent Overstreet
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-08bcache: Fix a minor memory leak on device teardownKent Overstreet
Reported-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-08bcache: Use WARN_ONCE() instead of __WARN()Kent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-08bcache: Add missing #include <linux/prefetch.h>Geert Uytterhoeven
m68k/allmodconfig: drivers/md/bcache/bset.c: In function ‘bset_search_tree’: drivers/md/bcache/bset.c:727: error: implicit declaration of function ‘prefetch’ drivers/md/bcache/btree.c: In function ‘bch_btree_node_get’: drivers/md/bcache/btree.c:933: error: implicit declaration of function ‘prefetch’ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-08bcache: Sparse fixesKent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-03-28bcache: Don't export utility code, prefix with bch_Kent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-25bcache: Fix for the build fixesKent Overstreet
Commit 82a84eaf7e51ba3da0c36cbc401034a4e943492d left a return 0 in closure_debug_init(). Whoops. Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-25bcache: Style/checkpatch fixesKent Overstreet
Took out some nested functions, and fixed some more checkpatch complaints. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-25bcache: Build fixes from test robotKent Overstreet
config: make ARCH=i386 allmodconfig All error/warnings: drivers/md/bcache/bset.c: In function 'bch_ptr_bad': >> drivers/md/bcache/bset.c:164:2: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat] -- drivers/md/bcache/debug.c: In function 'bch_pbtree': >> drivers/md/bcache/debug.c:86:4: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat] -- drivers/md/bcache/btree.c: In function 'bch_btree_read_done': >> drivers/md/bcache/btree.c:245:8: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' [-Wformat] -- drivers/md/bcache/closure.o: In function `closure_debug_init': >> (.init.text+0x0): multiple definition of `init_module' >> drivers/md/bcache/super.o:super.c:(.init.text+0x0): first defined here Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-23bcache: A block layer cacheKent Overstreet
Does writethrough and writeback caching, handles unclean shutdown, and has a bunch of other nifty features motivated by real world usage. See the wiki at http://bcache.evilpiepirate.org for more. Signed-off-by: Kent Overstreet <koverstreet@google.com>