Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
From: Donald Buczek
To: Guoqing Jiang, Song Liu, linux-raid@vger.kernel.org, Linux Kernel Mailing List, it+raid@molgen.mpg.de
Date: Tue, 26 Jan 2021 13:58:06 +0100
Message-ID: <12f09162-c92f-8fbb-8382-cba6188bfb29@molgen.mpg.de>
In-Reply-To: <6c7008df-942e-13b1-2e70-a058e96ab0e9@cloud.ionos.com>

On 26.01.21 12:14, Guoqing Jiang wrote:
> Hi Donald,
>
> On 1/26/21 10:50, Donald Buczek wrote:
> [...]
>
>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>>> index 2d21c298ffa7..f40429843906 100644
>>>> --- a/drivers/md/md.c
>>>> +++ b/drivers/md/md.c
>>>> @@ -4687,11 +4687,13 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>>>>              clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>>>>          if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
>>>>              mddev_lock(mddev) == 0) {
>>>> +            set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
>>>>              flush_workqueue(md_misc_wq);
>>>>              if (mddev->sync_thread) {
>>>>                  set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>>>                  md_reap_sync_thread(mddev);
>>>>              }
>>>> +            clear_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
>>>>              mddev_unlock(mddev);
>>>>          }
>>>>      } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>>>
>>> Yes, it could break the deadlock, but I am not sure it is the right way: we only set ALLOW_SB_UPDATE in suspend, where it makes sense because I/O is quiesced, while the "idle" action can't guarantee anything similar.
>>
>> Thinking of (and documenting) MD_ALLOW_SB_UPDATE as "the holder of reconfig_mutex promises not to make any changes which would exclude superblocks from being written" might make the usage easier to accept.
>
> I am not sure it is safe to set the flag here, since writing "idle" can't prevent I/O from the filesystem, while mddev_suspend can guarantee that.
>
>>> I prefer to make the resync thread not wait forever here.
>
> [...]
>
>>> -        sh = raid5_get_active_stripe(conf, new_sector, previous,
>>> +        sh = raid5_get_active_stripe(conf, new_sector, previous, 0,
>>
>> Mistake here (fourth argument added instead of third).
>
> Thanks for checking.
>
> [...]
>
>> Unfortunately, this patch did not fix the issue:
>>
>>      root@sloth:~/linux# cat /proc/$(pgrep md3_resync)/stack
>>      [<0>] raid5_get_active_stripe+0x1e7/0x6b0
>>      [<0>] raid5_sync_request+0x2a7/0x3d0
>>      [<0>] md_do_sync.cold+0x3ee/0x97c
>>      [<0>] md_thread+0xab/0x160
>>      [<0>] kthread+0x11b/0x140
>>      [<0>] ret_from_fork+0x22/0x30
>>
>> which is (according to objdump -d -Sl drivers/md/raid5.o) at https://elixir.bootlin.com/linux/v5.11-rc5/source/drivers/md/raid5.c#L735
>>
>> Isn't it still the case that the superblocks are not written, therefore stripes are not processed, and therefore the number of active stripes does not decrease? Who is expected to wake up the conf->wait_for_stripe waiters?
>
> Hmm, how about waking the waiter up in the while loop of raid5d?
>
> @@ -6520,6 +6532,11 @@ static void raid5d(struct md_thread *thread)
>                         md_check_recovery(mddev);
>                         spin_lock_irq(&conf->device_lock);
>                 }
> +
> +               if ((atomic_read(&conf->active_stripes)
> +                    < (conf->max_nr_stripes * 3 / 4) ||
> +                    (test_bit(MD_RECOVERY_INTR, &mddev->recovery))))
> +                       wake_up(&conf->wait_for_stripe);
>         }
>         pr_debug("%d stripes handled\n", handled);
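
For reference, the wait this wake_up() is aimed at, condensed from raid5_get_active_stripe() in v5.11-rc5 drivers/md/raid5.c (the spot behind the L735 link above). I'm quoting it from my reading of the tree, so take it as context, not part of any patch:

        /*
         * Taken with *(conf->hash_locks + hash) held. As I read it,
         * while stripe handling is stalled on the pending superblock
         * write, no stripe is released, none of these conditions can
         * become true, and nothing wakes conf->wait_for_stripe, so
         * the resync thread sleeps here forever.
         */
        wait_event_lock_irq(conf->wait_for_stripe,
                !list_empty(conf->inactive_list + hash) &&
                (atomic_read(&conf->active_stripes)
                  < (conf->max_nr_stripes * 3 / 4)
                 || !test_bit(R5_INACTIVE_BLOCKED, &conf->cache_state)),
                *(conf->hash_locks + hash));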

Hmm... With this patch on top of your other one, we still have the basic symptom (md3_raid6 busy looping), but the sync thread is now hanging at

    root@sloth:~# cat /proc/$(pgrep md3_resync)/stack
    [<0>] md_do_sync.cold+0x8ec/0x97c
    [<0>] md_thread+0xab/0x160
    [<0>] kthread+0x11b/0x140
    [<0>] ret_from_fork+0x22/0x30

instead, which is https://elixir.bootlin.com/linux/latest/source/drivers/md/md.c#L8963

And, unlike before, "md: md3: data-check interrupted." from the pr_info two lines above appears in dmesg.

Best
  Donald

> If the issue still appears, then I will change the waiter to break out just when MD_RECOVERY_INTR is set, something like:
>
> wait_event_lock_irq(conf->wait_for_stripe,
>     (test_bit(MD_RECOVERY_INTR, &mddev->recovery) && sync_req) ||
>      /* the previous condition */,
>     *(conf->hash_locks + hash));
>
> Thanks,
> Guoqing
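
Spelled out against the condition quoted further up, I read that as something like the sketch below. Untested, and just my reading of the proposal: inside raid5_get_active_stripe() the mddev would be conf->mddev, and "sync_req" stands for however the resync path ends up being flagged (e.g. the extra argument from your first patch):

        /*
         * Hypothetical combination, not a tested patch: let the
         * resync path (sync_req != 0) bail out of the wait once
         * MD_RECOVERY_INTR is set, instead of waiting for a stripe
         * that is never going to be released.
         */
        wait_event_lock_irq(conf->wait_for_stripe,
                (test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery) &&
                 sync_req) ||
                (!list_empty(conf->inactive_list + hash) &&
                 (atomic_read(&conf->active_stripes)
                   < (conf->max_nr_stripes * 3 / 4)
                  || !test_bit(R5_INACTIVE_BLOCKED, &conf->cache_state))),
                *(conf->hash_locks + hash));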