From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Logan Gunthorpe, Song Liu, Sasha Levin, linux-raid@vger.kernel.org
Subject: [PATCH AUTOSEL 5.15 28/47] md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d
Date: Wed, 12 Oct 2022 20:21:03 -0400
Message-Id: <20221013002124.1894077-28-sashal@kernel.org>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20221013002124.1894077-1-sashal@kernel.org>
References: <20221013002124.1894077-1-sashal@kernel.org>
X-stable: review
X-Patchwork-Hint: Ignore
X-Mailing-List: linux-kernel@vger.kernel.org

From: Logan Gunthorpe

[ Upstream commit 5e2cf333b7bd5d3e62595a44d598a254c697cd74 ]

A complicated deadlock exists when using the journal and an elevated
group_thread_cnt. It was found with loop devices, but it's not clear
whether it can be seen with real disks. The deadlock can occur simply
by writing data with an fio script.

When the deadlock occurs, multiple threads will hang in different ways:

 1) The group threads will hang in the blk-wbt code with bios waiting to
    be submitted to the block layer:

        io_schedule+0x70/0xb0
        rq_qos_wait+0x153/0x210
        wbt_wait+0x115/0x1b0
        io_schedule+0x70/0xb0
        rq_qos_wait+0x153/0x210
        wbt_wait+0x115/0x1b0
        __rq_qos_throttle+0x38/0x60
        blk_mq_submit_bio+0x589/0xcd0
        wbt_wait+0x115/0x1b0
        __rq_qos_throttle+0x38/0x60
        blk_mq_submit_bio+0x589/0xcd0
        __submit_bio+0xe6/0x100
        submit_bio_noacct_nocheck+0x42e/0x470
        submit_bio_noacct+0x4c2/0xbb0
        ops_run_io+0x46b/0x1a30
        handle_stripe+0xcd3/0x36b0
        handle_active_stripes.constprop.0+0x6f6/0xa60
        raid5_do_work+0x177/0x330

    Or:

        io_schedule+0x70/0xb0
        rq_qos_wait+0x153/0x210
        wbt_wait+0x115/0x1b0
        __rq_qos_throttle+0x38/0x60
        blk_mq_submit_bio+0x589/0xcd0
        __submit_bio+0xe6/0x100
        submit_bio_noacct_nocheck+0x42e/0x470
        submit_bio_noacct+0x4c2/0xbb0
        flush_deferred_bios+0x136/0x170
        raid5_do_work+0x262/0x330

 2) The r5l_reclaim thread will hang in the same way, submitting a
    bio to the block layer:

        io_schedule+0x70/0xb0
        rq_qos_wait+0x153/0x210
        wbt_wait+0x115/0x1b0
        __rq_qos_throttle+0x38/0x60
        blk_mq_submit_bio+0x589/0xcd0
        __submit_bio+0xe6/0x100
        submit_bio_noacct_nocheck+0x42e/0x470
        submit_bio_noacct+0x4c2/0xbb0
        submit_bio+0x3f/0xf0
        md_super_write+0x12f/0x1b0
        md_update_sb.part.0+0x7c6/0xff0
        md_update_sb+0x30/0x60
        r5l_do_reclaim+0x4f9/0x5e0
        r5l_reclaim_thread+0x69/0x30b

    However, before hanging, the MD_SB_CHANGE_PENDING flag will be
    set for sb_flags in r5l_write_super_and_discard_space(). This
    flag will never be cleared because the submit_bio() call never
    returns.

 3) Due to the MD_SB_CHANGE_PENDING flag being set, handle_stripe()
    will do no processing on any pending stripes and re-set
    STRIPE_HANDLE. This will cause the raid5d thread to enter an
    infinite loop, constantly trying to handle the same stripes stuck
    in the queue.

    The raid5d thread has a blk_plug that holds a number of bios
    that are also stuck waiting, because the thread is in a loop
    that never schedules. These bios have already been accounted for
    by blk-wbt, which prevents the other threads above from
    continuing when they try to submit bios. The result is a
    deadlock.

To fix this, add the same wait_event() that is used in raid5_do_work()
to raid5d() such that if MD_SB_CHANGE_PENDING is set, the thread will
schedule and wait until the flag is cleared. The schedule action will
flush the plug, which will allow the r5l_reclaim thread to continue,
thus preventing the deadlock.

However, md_check_recovery() calls can also clear MD_SB_CHANGE_PENDING
from the same thread and can thus deadlock if the thread is put to
sleep. So avoid waiting if md_check_recovery() is being called in the
loop.

It's not clear when the deadlock was introduced, but the similar
wait_event() call in raid5_do_work() was added in 2017 by this
commit:

    16d997b78b15 ("md/raid5: simplfy delaying of writes while metadata
                   is updated.")

Link: https://lore.kernel.org/r/7f3b87b6-b52a-f737-51d7-a4eec5c44112@deltatee.com
Signed-off-by: Logan Gunthorpe
Signed-off-by: Song Liu
Signed-off-by: Sasha Levin
---
 drivers/md/raid5.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 19e497a7e747..169d27dcad50 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -36,6 +36,7 @@
  */
 
 #include
+#include
 #include
 #include
 #include
@@ -6522,7 +6523,18 @@ static void raid5d(struct md_thread *thread)
 			spin_unlock_irq(&conf->device_lock);
 			md_check_recovery(mddev);
 			spin_lock_irq(&conf->device_lock);
+
+			/*
+			 * Waiting on MD_SB_CHANGE_PENDING below may deadlock
+			 * seeing md_check_recovery() is needed to clear
+			 * the flag when using mdmon.
+			 */
+			continue;
 		}
+
+		wait_event_lock_irq(mddev->sb_wait,
+			!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags),
+			conf->device_lock);
 	}
 	pr_debug("%d stripes handled\n", handled);
 
-- 
2.35.1
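
As a reading aid only, the deadlock and the fix described in the commit
message can be approximated with a toy, single-threaded model in plain
C. This is a sketch under stated assumptions, not kernel code: every
name in it (toy_state, toy_flush_plug, toy_reclaim_step, toy_raid5d) is
hypothetical, the blk-wbt accounting is reduced to a simple slot
counter, and plugged bios are assumed to complete as soon as the plug
is flushed.

```c
#include <stdbool.h>

/*
 * Toy model of the raid5d deadlock. All names are hypothetical
 * stand-ins for illustration; none of them are kernel APIs.
 */
struct toy_state {
	int wbt_budget;         /* free blk-wbt slots for new bios */
	int plugged;            /* bios counted by wbt but held in the plug */
	bool sb_change_pending; /* stands in for MD_SB_CHANGE_PENDING */
};

/*
 * Assumption: once the plug is flushed the held bios are issued and
 * complete immediately, which returns their slots to the budget.
 */
static void toy_flush_plug(struct toy_state *s)
{
	s->wbt_budget += s->plugged;
	s->plugged = 0;
}

/*
 * The reclaim side needs one free slot to write the superblock;
 * only then can it clear the pending flag.
 */
static void toy_reclaim_step(struct toy_state *s)
{
	if (s->sb_change_pending && s->wbt_budget > 0)
		s->sb_change_pending = false;
}

/*
 * Run a raid5d-like loop for at most max_iters passes.
 * Returns true if the pending flag was cleared, false if the loop
 * only spun without making progress.
 */
bool toy_raid5d(bool wait_on_pending, int max_iters)
{
	struct toy_state s = {
		.wbt_budget = 0,
		.plugged = 4,
		.sb_change_pending = true,
	};

	for (int i = 0; i < max_iters; i++) {
		toy_reclaim_step(&s); /* give the other thread a turn */
		if (!s.sb_change_pending)
			return true;
		if (wait_on_pending)
			toy_flush_plug(&s); /* sleeping flushes the plug */
		/* otherwise the same stripes are re-queued and we spin */
	}
	return false;
}
```

Calling toy_raid5d(false, 1000) models the loop before the patch: the
plug is never flushed, so the flag is never cleared. Calling
toy_raid5d(true, 1000) models the added wait: the flush frees the slots
and the reclaim step can clear the flag on the next pass.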