md: fix a potential deadlock of raid5/raid10 reshape

[ Upstream commit 8876391e440ba615b10eef729576e111f0315f87 ] There is a potential deadlock if mount/umount happens when raid5_finish_reshape() tries to grow the size of emulated disk. How the deadlock happens? 1) The raid5 resync thread finished reshape (expanding array). 2) The mount or umount thread holds VFS sb->s_umount lock and tries to write through critical data into raid5 emulated block device. So it waits for raid5 kernel thread handling stripes in order to finish it I/Os. 3) In the routine of raid5 kernel thread, md_check_recovery() will be called first in order to reap the raid5 resync thread. That is, raid5_finish_reshape() will be called. In this function, it will try to update conf and call VFS revalidate_disk() to grow the raid5 emulated block device. It will try to acquire VFS sb->s_umount lock. The raid5 kernel thread cannot continue, so no one can handle mount/ umount I/Os (stripes). Once the write-through I/Os cannot be finished, mount/umount will not release sb->s_umount lock. The deadlock happens. The raid5 kernel thread is an emulated block device. It is responible to handle I/Os (stripes) from upper layers. The emulated block device should not request any I/Os on itself. That is, it should not call VFS layer functions. (If it did, it will try to acquire VFS locks to guarantee the I/Os sequence.) So we have the resync thread to send resync I/O requests and to wait for the results. For solving this potential deadlock, we can put the size growth of the emulated block device as the final step of reshape thread. 2017/12/29: Thanks to Guoqing Jiang <gqjiang@suse.com>, we confirmed that there is the same deadlock issue in raid10. It's reproducible and can be fixed by this patch. For raid10.c, we can remove the similar code to prevent deadlock as well since they has been called before. Reported-by: Alex Wu <alexwu@synology.com> Reviewed-by: Alex Wu <alexwu@synology.com> Reviewed-by: Chung-Chiang Cheng <cccheng@synology.com> Signed-off-by: BingJing Chang <bingjingc@synology.com> Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
author: BingJing Chang <bingjingc@synology.com> 2018-02-22 13:34:46 +0800
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2018-05-30 07:50:31 +0200
commit: 547f11fd132d908018109417216d23b4008a2780 (patch)
tree: f4e1ffb996ad670f5f02aa0051843b04002b5702 /drivers/md/raid10.c
parent: 527ed41ff2776311bdae56c2472ee0a5cbb60605 (diff)
1 files changed, 1 insertions, 7 deletions
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 8166e0d2973b..b138b5cba286 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4684,17 +4684,11 @@ static void raid10_finish_reshape(struct mddev *mddev)
 		return;
 
 	if (mddev->delta_disks > 0) {
-		sector_t size = raid10_size(mddev, 0, 0);
-		md_set_array_sectors(mddev, size);
 		if (mddev->recovery_cp > mddev->resync_max_sectors) {
 			mddev->recovery_cp = mddev->resync_max_sectors;
 			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 		}
-		mddev->resync_max_sectors = size;
-		if (mddev->queue) {
-			set_capacity(mddev->gendisk, mddev->array_sectors);
-			revalidate_disk(mddev->gendisk);
-		}
+		mddev->resync_max_sectors = mddev->array_sectors;
 	} else {
 		int d;
 		rcu_read_lock();
author	BingJing Chang <bingjingc@synology.com>	2018-02-22 13:34:46 +0800
committer	Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-05-30 07:50:31 +0200
commit	547f11fd132d908018109417216d23b4008a2780 (patch)
tree	f4e1ffb996ad670f5f02aa0051843b04002b5702 /drivers/md/raid10.c
parent	527ed41ff2776311bdae56c2472ee0a5cbb60605 (diff)