author NeilBrown <neilb@suse.de> 2011-06-28 16:59:42 +1000
committer Greg Kroah-Hartman <gregkh@suse.de> 2011-07-13 05:29:25 +0200
commit 1ca39696ba621b0737c78af2c104939c60b29ce4
tree ba3fee215495632bc74e5066727ac46fa99c45c9 /drivers
parent 48984ada7416c0f533bc2a0f886ccb6c3646fe9a
md: avoid endless recovery loop when waiting for fail device to complete.
commit 4274215d24633df7302069e51426659d4759c5ed upstream.

If a device fails in a way that causes pending requests to take a while to
complete, md will not be able to immediately remove it from the array in
remove_and_add_spares. It will then incorrectly look like a spare device,
and md will try to recover it even though it is failed. This leads to a
recovery process starting and instantly aborting over and over again.

We should check if the device is faulty before considering it to be a
spare. This will avoid trying to start a recovery that cannot proceed.

This bug was introduced in 2.6.26, so this patch is suitable for any kernel
since then.

Reported-by: Jim Paradis <james.paradis@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
1 files changed, 1 insertions, 0 deletions
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7c5129f6f4ed..c199c7028534 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6623,6 +6623,7 @@ static int remove_and_add_spares(mddev_t *mddev)
 		list_for_each_entry(rdev, &mddev->disks, same_set) {
 			if (rdev->raid_disk >= 0 &&
 			    !test_bit(In_sync, &rdev->flags) &&
+			    !test_bit(Faulty, &rdev->flags) &&
 			    !test_bit(Blocked, &rdev->flags))
 			if (rdev->raid_disk < 0
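
The effect of the added Faulty test can be seen in a small userspace model. The sketch below is not kernel code: `model_rdev`, the `MODEL_*` flag values, `model_test` and `count_recoverable` are hypothetical stand-ins invented for illustration, and only the shape of the condition is taken from the hunk above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's per-device flag bits. */
enum model_flag {
	MODEL_IN_SYNC = 1u << 0,
	MODEL_FAULTY  = 1u << 1,
	MODEL_BLOCKED = 1u << 2
};

/* Hypothetical, simplified model of a member device. */
struct model_rdev {
	int raid_disk;		/* slot in the array, negative if unassigned */
	unsigned int flags;	/* bitwise OR of MODEL_* values */
};

/* Userspace stand-in for test_bit(); assumed semantics: true if flag is set. */
static bool model_test(unsigned int flag, const struct model_rdev *rdev)
{
	return (rdev->flags & flag) != 0;
}

/* Condition as it stood before the patch: a failed device still matches. */
static bool recoverable_before(const struct model_rdev *rdev)
{
	return rdev->raid_disk >= 0 &&
	       !model_test(MODEL_IN_SYNC, rdev) &&
	       !model_test(MODEL_BLOCKED, rdev);
}

/* Condition with the added Faulty test: a failed device no longer matches. */
static bool recoverable_after(const struct model_rdev *rdev)
{
	return rdev->raid_disk >= 0 &&
	       !model_test(MODEL_IN_SYNC, rdev) &&
	       !model_test(MODEL_FAULTY, rdev) &&
	       !model_test(MODEL_BLOCKED, rdev);
}

/* Count the devices that the given condition treats as recovery candidates. */
static size_t count_recoverable(const struct model_rdev *devs, size_t n,
				bool (*pred)(const struct model_rdev *))
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (pred(&devs[i]))
			count++;
	return count;
}
```

In this model, a device that still occupies a slot (`raid_disk >= 0`) and carries only `MODEL_FAULTY` is counted by `recoverable_before` but not by `recoverable_after`. That corresponds to the case the commit message describes, where a failed device that cannot yet be removed would otherwise trigger a recovery that cannot proceed.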