Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
To: Dragan Stancevic, Yu Kuai, song@kernel.org
Cc: guoqing.jiang@linux.dev, it+raid@molgen.mpg.de, linux-kernel@vger.kernel.org,
    linux-raid@vger.kernel.org, msmith626@gmail.com, yangerkun@huawei.com
From: Donald Buczek
Date: Wed, 13 Sep 2023 11:08:12 +0200

On 9/5/23 3:54 PM, Dragan Stancevic wrote:
> On 9/4/23 22:50, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/08/30 9:36, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2023/08/29 4:32, Dragan Stancevic wrote:
>>>
>>>> Just a follow-up on 6.1 testing. I tried reproducing this problem for 5 days with the 6.1.42 kernel without your patches, and I was not able to reproduce it.
>>
>> Oops, I forgot that you need to backport this patch first to reproduce
>> this problem:
>>
>> https://lore.kernel.org/all/20230529132037.2124527-2-yukuai1@huaweicloud.com/
>>
>> The patch fixes the deadlock as well, but it introduces some regressions.

We just had an unplanned lockup on the "check" to "idle" transition with 6.1.52 after a few hours on a backup server. For the last 2 1/2 years we have used the patch I originally proposed [1] with multiple kernel versions. But it no longer seems to be valid, or maybe it is even destructive in combination with the other changes.

But I have totally lost track of further development. As I understand it, there are patches queued up in mainline that should fix the problem and might go into 6.1, too, but have not landed there yet?

Can anyone give me exact references to the patches I'd need to apply to 6.1.52, so that I could hopefully fix my problem and also test those patches for you on production systems with a load that tends to run into this problem easily?

Thanks

Donald

[1]: https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@molgen.mpg.de/

> Ha, jinx :) I was about to email you that, with testing over the weekend, I isolated the change that made it more difficult to reproduce in 6.1, and that the original change must be reverted :)
>
>>
>> Thanks,
>> Kuai
>>
>>>>
>>>> It seems that 6.1 has some other code that prevents this from happening.
>>>>
>>>
>>> I see that there are lots of patches for raid456 between 5.10 and 6.1;
>>> however, I remember that I used to reproduce the deadlock after 6.1, and
>>> it's true that it's not easy to reproduce, see below:
>>>
>>> https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
>>>
>>> My guess is that the deadlock is harder to reproduce on 6.1 than on 5.10
>>> due to some changes inside raid456.
>>>
>>> By the way, raid10 had a similar deadlock, which can be fixed the same
>>> way, so it makes sense to backport these patches.
>>>
>>> https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com
>>>
>>> Thanks,
>>> Kuai
>>>
>>>
>>>> On 5.10 I can reproduce it within minutes to an hour.
>>>>
>
-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433
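For readers trying to reproduce the "check" to "idle" lockup described in this thread, here is a minimal sketch of a userspace loop that repeatedly starts a check through the md sysfs attribute sync_action and interrupts it with "idle". It is not the reproducer used by anyone in this thread. The helper names write_with_timeout and cycle_check_idle, the device name md0 and the timeout values are all hypothetical choices for illustration. Each write runs in a worker thread so that a write which never returns (as would be expected if the array hangs on the transition) is reported instead of blocking the loop forever. The sketch does not generate the background write load that, according to the thread, makes the problem much easier to hit.

```python
import threading
import time
from pathlib import Path


def write_with_timeout(path: Path, value: str, timeout: float) -> bool:
    """Write `value` plus a newline to `path` from a worker thread.

    Returns True if the write returned within `timeout` seconds and False if
    the worker is still blocked. OSErrors raised by the write are re-raised.
    """
    errors = []

    def do_write() -> None:
        try:
            path.write_text(value + "\n")
        except OSError as exc:
            errors.append(exc)

    worker = threading.Thread(target=do_write, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        return False  # write still blocked: possible hang on this transition
    if errors:
        raise errors[0]
    return True


def cycle_check_idle(sync_action: Path, cycles: int,
                     check_seconds: float = 5.0,
                     timeout: float = 60.0) -> int:
    """Start a "check", let it run, then interrupt it with "idle".

    Returns the number of completed cycles, stopping at the first write
    that does not return within `timeout` seconds.
    """
    for done in range(cycles):
        if not write_with_timeout(sync_action, "check", timeout):
            return done
        time.sleep(check_seconds)  # let the check make some progress
        if not write_with_timeout(sync_action, "idle", timeout):
            return done
    return cycles
```

As an example, cycle_check_idle(Path("/sys/block/md0/md/sync_action"), cycles=100) would run 100 cycles against the hypothetical array md0; this needs root and should run alongside a separate I/O load on the array. If a write really is stuck in the kernel, the worker thread, and possibly the whole process, may not be able to exit.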