From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Alex Wu, Chung-Chiang Cheng, BingJing Chang, Shaohua Li, Sasha Levin
Subject: [PATCH 4.14 140/496] md: fix a potential deadlock of raid5/raid10 reshape
Date: Mon, 28 May 2018 11:58:45 +0200
Message-Id: <20180528100325.824843394@linuxfoundation.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180528100319.498712256@linuxfoundation.org>
References: <20180528100319.498712256@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: BingJing Chang

[ Upstream commit 8876391e440ba615b10eef729576e111f0315f87 ]

There is a potential deadlock if mount/umount happens when
raid5_finish_reshape() tries to grow the size of the emulated disk.

How does the deadlock happen?
1) The raid5 resync thread finished reshape (expanding array).
2) The mount or umount thread holds the VFS sb->s_umount lock and tries
   to write through critical data into the raid5 emulated block device.
   So it waits for the raid5 kernel thread to handle stripes in order
   to finish its I/Os.
3) In the routine of the raid5 kernel thread, md_check_recovery() will
   be called first in order to reap the raid5 resync thread. That is,
   raid5_finish_reshape() will be called. In this function, it will try
   to update conf and call VFS revalidate_disk() to grow the raid5
   emulated block device. It will try to acquire the VFS sb->s_umount
   lock.
The raid5 kernel thread cannot continue, so no one can handle mount/
umount I/Os (stripes). Once the write-through I/Os cannot be finished,
mount/umount will not release the sb->s_umount lock. The deadlock
happens.

The raid5 kernel thread implements an emulated block device. It is
responsible for handling I/Os (stripes) from upper layers. The emulated
block device should not request any I/Os on itself. That is, it should
not call VFS layer functions. (If it did, it would try to acquire VFS
locks to guarantee the I/O sequence.) So we have the resync thread to
send resync I/O requests and to wait for the results.

To solve this potential deadlock, we can put the size growth of the
emulated block device as the final step of the reshape thread.

2017/12/29: Thanks to Guoqing Jiang, we confirmed that there is the same
deadlock issue in raid10. It's reproducible and can be fixed by this
patch. For raid10.c, we can remove the similar code to prevent deadlock
as well since they have been called before.

Reported-by: Alex Wu
Reviewed-by: Alex Wu
Reviewed-by: Chung-Chiang Cheng
Signed-off-by: BingJing Chang
Signed-off-by: Shaohua Li
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 drivers/md/md.c     |   13 +++++++++++++
 drivers/md/raid10.c |    8 +-------
 drivers/md/raid5.c  |    8 +-------
 3 files changed, 15 insertions(+), 14 deletions(-)

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8522,6 +8522,19 @@ void md_do_sync(struct md_thread *thread
 		set_mask_bits(&mddev->sb_flags, 0,
 			      BIT(MD_SB_CHANGE_PENDING) | BIT(MD_SB_CHANGE_DEVS));
 
+	if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
+			!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
+			mddev->delta_disks > 0 &&
+			mddev->pers->finish_reshape &&
+			mddev->pers->size &&
+			mddev->queue) {
+		mddev_lock_nointr(mddev);
+		md_set_array_sectors(mddev, mddev->pers->size(mddev, 0, 0));
+		mddev_unlock(mddev);
+		set_capacity(mddev->gendisk, mddev->array_sectors);
+		revalidate_disk(mddev->gendisk);
+	}
+
 	spin_lock(&mddev->lock);
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
 		/* We completed so min/max setting can be forgotten if used. */
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4693,17 +4693,11 @@ static void raid10_finish_reshape(struct
 		return;
 
 	if (mddev->delta_disks > 0) {
-		sector_t size = raid10_size(mddev, 0, 0);
-		md_set_array_sectors(mddev, size);
 		if (mddev->recovery_cp > mddev->resync_max_sectors) {
 			mddev->recovery_cp = mddev->resync_max_sectors;
 			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 		}
-		mddev->resync_max_sectors = size;
-		if (mddev->queue) {
-			set_capacity(mddev->gendisk, mddev->array_sectors);
-			revalidate_disk(mddev->gendisk);
-		}
+		mddev->resync_max_sectors = mddev->array_sectors;
 	} else {
 		int d;
 		rcu_read_lock();
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8001,13 +8001,7 @@ static void raid5_finish_reshape(struct
 
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
 
-		if (mddev->delta_disks > 0) {
-			md_set_array_sectors(mddev, raid5_size(mddev, 0, 0));
-			if (mddev->queue) {
-				set_capacity(mddev->gendisk, mddev->array_sectors);
-				revalidate_disk(mddev->gendisk);
-			}
-		} else {
+		if (mddev->delta_disks <= 0) {
 			int d;
 			spin_lock_irq(&conf->device_lock);
 			mddev->degraded = raid5_calc_degraded(conf);
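
Not part of the patch: the wait-for relationships described in the
changelog can be modelled in user space to check the reasoning. The
sketch below is only an illustration under stated assumptions. The actor
names, the waits_for[] tables and has_deadlock() are hypothetical and
exist nowhere in the md code; each actor is assumed to block on at most
one other actor. Before the patch, the mount thread waits for the raid5
kernel thread and the raid5 kernel thread waits for s_umount held by the
mount thread, which is a cycle. After the patch, the s_umount wait moves
to the resync thread, so the raid5 kernel thread stays runnable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors for the model; not kernel identifiers. */
enum actor { MOUNT, RAID5D, RESYNC, NR_ACTORS, NOBODY = -1 };

/*
 * waits_for[a] names the actor that 'a' is blocked on, or NOBODY.
 * Before the patch: mount holds s_umount and waits for raid5d to handle
 * stripes, while raid5d (via finish_reshape -> revalidate_disk) waits
 * for s_umount, which mount holds.
 */
static const int before_patch[NR_ACTORS] = {
	[MOUNT]  = RAID5D,
	[RAID5D] = MOUNT,
	[RESYNC] = NOBODY,
};

/*
 * After the patch: the resync thread grows the device at the end of
 * md_do_sync(), so it is the one that may wait for s_umount. raid5d
 * waits for nobody and can finish the mount thread's I/O.
 */
static const int after_patch[NR_ACTORS] = {
	[MOUNT]  = RAID5D,
	[RAID5D] = NOBODY,
	[RESYNC] = MOUNT,
};

/* Follow each actor's wait chain; returning to the start is a cycle. */
static bool has_deadlock(const int waits_for[NR_ACTORS])
{
	for (int start = 0; start < NR_ACTORS; start++) {
		int cur = waits_for[start];

		for (int hops = 0; cur != NOBODY && hops < NR_ACTORS; hops++) {
			if (cur == start)
				return true;
			cur = waits_for[cur];
		}
	}
	return false;
}

/* Sanity check of the two scenarios described in the changelog. */
static void model_self_check(void)
{
	assert(has_deadlock(before_patch));
	assert(!has_deadlock(after_patch));
}
```

Calling model_self_check() from a small main() exercises both tables.
The model says nothing about the locking inside md itself; it only shows
why moving the revalidate_disk() call out of the raid5 kernel thread
breaks the cycle.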