From: Song Liu
Date: Tue, 28 Mar 2023 17:01:09 -0700
Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
To: Marc Smith, Yu Kuai
Cc: Guoqing Jiang, Donald Buczek, linux-raid@vger.kernel.org, Linux Kernel Mailing List, it+raid@molgen.mpg.de

On Thu, Mar 16, 2023 at 8:25 AM Marc Smith wrote:
>
> On Tue, Mar 14, 2023 at 10:45 AM Marc Smith wrote:
> >
> > On Tue, Mar 14, 2023 at 9:55 AM Guoqing Jiang wrote:
> > >
> > >
> > >
> > > On 3/14/23 21:25, Marc Smith wrote:
> > > > On Mon, Feb 8, 2021 at 7:49 PM Guoqing Jiang
> > > > wrote:
> > > >> Hi Donald,
> > > >>
> > > >> On 2/8/21 19:41, Donald Buczek wrote:
> > > >>> Dear Guoqing,
> > > >>>
> > > >>> On 08.02.21 15:53, Guoqing Jiang wrote:
> > > >>>>
> > > >>>> On 2/8/21 12:38, Donald Buczek wrote:
> > > >>>>>> 5. maybe don't hold reconfig_mutex when try to unregister
> > > >>>>>> sync_thread, like this.
> > > >>>>>>
> > > >>>>>>          /* resync has finished, collect result */
> > > >>>>>>          mddev_unlock(mddev);
> > > >>>>>>          md_unregister_thread(&mddev->sync_thread);
> > > >>>>>>          mddev_lock(mddev);
> > > >>>>> As above: While we wait for the sync thread to terminate, wouldn't it
> > > >>>>> be a problem, if another user space operation takes the mutex?
> > > >>>> I don't think other places can be blocked while hold mutex, otherwise
> > > >>>> these places can cause potential deadlock. Please try above two lines
> > > >>>> change. And perhaps others have better idea.
> > > >>> Yes, this works. No deadlock after >11000 seconds,
> > > >>>
> > > >>> (Time till deadlock from previous runs/seconds: 1723, 37, 434, 1265,
> > > >>> 3500, 1136, 109, 1892, 1060, 664, 84, 315, 12, 820 )
> > > >> Great. I will send a formal patch with your reported-by and tested-by.
> > > >>
> > > >> Thanks,
> > > >> Guoqing
> > > > I'm still hitting this issue with Linux 5.4.229 -- it looks like 1/2
> > > > of the patches that supposedly resolve this were applied to the stable
> > > > kernels, however, one was omitted due to a regression:
> > > > md: don't unregister sync_thread with reconfig_mutex held (upstream
> > > > commit 8b48ec23cc51a4e7c8dbaef5f34ebe67e1a80934)
> > > >
> > > > I don't see any follow-up on the thread from June 8th 2022 asking for
> > > > this patch to be dropped from all stable kernels since it caused a
> > > > regression.
> > > >
> > > > The patch doesn't appear to be present in the current mainline kernel
> > > > (6.3-rc2) either. So I assume this issue is still present there, or it
> > > > was resolved differently and I just can't find the commit/patch.
> > >
> > > It should be fixed by commit 9dfbdafda3b3"md: unlock mddev before reap
> > > sync_thread in action_store".
> >
> > Okay, let me try applying that patch... it does not appear to be
> > present in my 5.4.229 kernel source. Thanks.
>
> Yes, applying this '9dfbdafda3b3 "md: unlock mddev before reap
> sync_thread in action_store"' patch on top of vanilla 5.4.229 source
> appears to fix the problem for me -- I can't reproduce the issue with
> the script, and it's been running for >24 hours now. (Previously I was
> able to induce the issue within a matter of minutes.)

Hi Marc,

Could you please run your reproducer on the md-tmp branch?

https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-tmp

This contains a different version of the fix by Yu Kuai.

Thanks,
Song
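
For readers following the locking discussion quoted above, here is a rough user-space analogy of the "unlock, reap the thread, relock" pattern. It is only a sketch with pthreads, not kernel code and not the actual md fix: `reconfig`, `worker`, `stop_requested`, `finished_work` and `stop_worker_unlocked` are hypothetical names, and it assumes the worker needs the lock (directly or indirectly) before it can exit, which is the situation where joining with the lock held would hang.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: `reconfig` plays the role of reconfig_mutex and
 * `worker` the role of the sync thread. This is an analogy, not md code. */
pthread_mutex_t reconfig = PTHREAD_MUTEX_INITIALIZER;
atomic_bool stop_requested;
int finished_work; /* protected by `reconfig` while the worker is running */

void *worker(void *arg)
{
    (void)arg;
    /* Spin until asked to stop. */
    while (!atomic_load(&stop_requested))
        sched_yield();
    /* Assumption: the worker must take the lock once more before exiting.
     * If the stopper still held it while joining, both would wait forever. */
    pthread_mutex_lock(&reconfig);
    finished_work = 1;
    pthread_mutex_unlock(&reconfig);
    return NULL;
}

/* Stop the worker in the order shown in the quoted snippet:
 * drop the lock, wait for the thread, take the lock again.
 * Returns 1 if the worker completed its final step, -1 on error. */
int stop_worker_unlocked(void)
{
    pthread_t tid;
    int result;

    atomic_store(&stop_requested, false);
    finished_work = 0;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    pthread_mutex_lock(&reconfig);       /* caller arrives holding the lock */
    atomic_store(&stop_requested, true);
    pthread_mutex_unlock(&reconfig);     /* analogous to mddev_unlock()          */
    if (pthread_join(tid, NULL) != 0)    /* analogous to md_unregister_thread()  */
        return -1;
    pthread_mutex_lock(&reconfig);       /* analogous to mddev_lock()            */
    result = finished_work;
    pthread_mutex_unlock(&reconfig);
    return result;
}
```

Moving the `pthread_join()` call above the `pthread_mutex_unlock()` in this toy would make it hang, because the worker blocks on the same mutex. The window between unlock and relock is also where another lock holder could slip in, which is the concern Donald raises in the quoted text.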