Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2
To: Yu Kuai, Xiao Ni
Cc: zkabelac@redhat.com, agk@redhat.com, snitzer@kernel.org,
 mpatocka@redhat.com, dm-devel@lists.linux.dev, song@kernel.org,
 heinzm@redhat.com, neilb@suse.de, jbrassow@redhat.com,
 linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
 yi.zhang@huawei.com, yangerkun@huawei.com, "yukuai (C)"
References: <20240301095657.662111-1-yukuai1@huaweicloud.com>
 <0091f7d1-2273-16ff-8285-5fa3f7e2e0f7@huaweicloud.com>
From: Yu Kuai
Message-ID: <35feaa54-db9e-f0d6-d5a5-a10a45bb90a5@huaweicloud.com>
Date: Mon, 4 Mar 2024 09:23:56 +0800
In-Reply-To: <0091f7d1-2273-16ff-8285-5fa3f7e2e0f7@huaweicloud.com>

Hi,

On 2024/03/04 9:07, Yu Kuai wrote:
> Hi,
>
> On 2024/03/03 21:16, Xiao Ni wrote:
>> Hi all
>>
>> There is an error report from the lvm regression tests. The case is
>> lvconvert-raid-reshape-stripes-load-reload.sh. I saw this error when I
>> tried to fix dmraid regression problems too. In my patch set, after
>> reverting ad39c08186f8a0f221337985036ba86731d6aafe (md: Don't register
>> sync_thread for reshape directly), this problem doesn't appear.
>
> How often did you see this test fail? I have been running the tests for
> over two days now, 30+ rounds, and this test has never failed in my VM.

Taking a quick look, there is still a path in raid10 where
MD_RECOVERY_FROZEN can be cleared, so in theory this problem can be
triggered. Can you test the following patch on top of this set? I'll
keep running the test myself.

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a5f8419e2df1..7ca29469123a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4575,7 +4575,8 @@ static int raid10_start_reshape(struct mddev *mddev)
 	return 0;
 
 abort:
-	mddev->recovery = 0;
+	if (mddev->gendisk)
+		mddev->recovery = 0;
 	spin_lock_irq(&conf->device_lock);
 	conf->geo = conf->prev;
 	mddev->raid_disks = conf->geo.raid_disks;

Thanks,
Kuai

>
> Thanks,
> Kuai
>
>>
>> I put the log in the attachment.
>>
>> On Fri, Mar 1, 2024 at 6:03 PM Yu Kuai wrote:
>>>
>>> From: Yu Kuai
>>>
>>> link to part1:
>>> https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
>>>
>>> part1 contains fixes for deadlocks when stopping sync_thread
>>>
>>> This set contains fixes for:
>>>   - reshape can start unexpectedly and cause data corruption, patch 1,5,6;
>>>   - deadlocks when reshape runs concurrently with IO, patch 8;
>>>   - a lockdep warning, patch 9;
>>>
>>> I have been running the lvm2 tests with the following script for a few
>>> rounds now,
>>>
>>> for t in `ls test/shell`; do
>>>          if cat test/shell/$t | grep raid &> /dev/null; then
>>>                  make check T=shell/$t
>>>          fi
>>> done
>>>
>>> There are no deadlocks and no fs corruption now, however, there are
>>> still four failed tests:
>>>
>>> ###       failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
>>> ###       failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
>>> ###       failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
>>> ###       failed: [ndev-vanilla] shell/lvextend-raid.sh
>>>
>>> And the failure reasons are the same:
>>>
>>> ## ERROR: The test started dmeventd (147856) unexpectedly
>>>
>>> I have no clue yet, and it seems other folks don't have this issue.
>>>
>>> Yu Kuai (9):
>>>    md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
>>>    md: export helpers to stop sync_thread
>>>    md: export helper md_is_rdwr()
>>>    md: add a new helper reshape_interrupted()
>>>    dm-raid: really frozen sync_thread during suspend
>>>    md/dm-raid: don't call md_reap_sync_thread() directly
>>>    dm-raid: add a new helper prepare_suspend() in md_personality
>>>    dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
>>>      concurrent with reshape
>>>    dm-raid: fix lockdep waring in "pers->hot_add_disk"
>>>
>>>   drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
>>>   drivers/md/md.c      | 73 ++++++++++++++++++++++++++--------
>>>   drivers/md/md.h      | 38 +++++++++++++++++-
>>>   drivers/md/raid5.c   | 32 ++++++++++++++-
>>>   4 files changed, 196 insertions(+), 40 deletions(-)
>>>
>>> --
>>> 2.39.2
>>>
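
For anyone who wants to put a number on how often the reported test fails, a small wrapper around the make check invocation quoted in this thread may help. This is only a sketch: count_failures is a hypothetical helper, not part of lvm2, and it assumes that make check exits non-zero when the selected test fails.

```shell
#!/bin/sh
# Hypothetical helper (not part of lvm2): run a command for a given number
# of rounds and report how many of those rounds exited non-zero.
count_failures() {
    rounds=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Discard the command's output; only its exit status is counted.
        if ! "$@" > /dev/null 2>&1; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed/$rounds rounds failed"
}

# Example, run from the top of an lvm2 tree (assumes make check returns
# non-zero on a failed test):
#   count_failures 30 make check T=shell/lvconvert-raid-reshape-stripes-load-reload.sh
```

Because the output of each round is discarded, a failing round would have to be rerun by hand to collect its log.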