Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
To: Donald Buczek, Song Liu, linux-raid@vger.kernel.org,
 Linux Kernel Mailing List, it+raid@molgen.mpg.de
From: Guoqing Jiang
Message-ID: <6757d55d-ada8-9b7e-b7fd-2071fe905466@cloud.ionos.com>
In-Reply-To: <12f09162-c92f-8fbb-8382-cba6188bfb29@molgen.mpg.de>
Date: Tue, 26 Jan 2021 15:06:01 +0100
On 1/26/21 13:58, Donald Buczek wrote:
>
>> Hmm, how about wake the waiter up in the while loop of raid5d?
>>
>> @@ -6520,6 +6532,11 @@ static void raid5d(struct md_thread *thread)
>>                          md_check_recovery(mddev);
>>                          spin_lock_irq(&conf->device_lock);
>>                  }
>> +
>> +               if ((atomic_read(&conf->active_stripes)
>> +                    < (conf->max_nr_stripes * 3 / 4) ||
>> +                    (test_bit(MD_RECOVERY_INTR, &mddev->recovery))))
>> +                       wake_up(&conf->wait_for_stripe);
>>          }
>>          pr_debug("%d stripes handled\n", handled);
>
> Hmm... With this patch on top of your other one, we still have the basic
> symptoms (md3_raid6 busy looping), but the sync thread is now hanging at
>
>     root@sloth:~# cat /proc/$(pgrep md3_resync)/stack
>     [<0>] md_do_sync.cold+0x8ec/0x97c
>     [<0>] md_thread+0xab/0x160
>     [<0>] kthread+0x11b/0x140
>     [<0>] ret_from_fork+0x22/0x30
>
> instead, which is
> https://elixir.bootlin.com/linux/latest/source/drivers/md/md.c#L8963

Not sure why recovery_active is not zero: it is set to 0 before
blk_start_plug(), raid5_sync_request() returns 0, and skipped is also
set to 1. Perhaps handle_stripe() calls md_done_sync().

Could you double-check the value of recovery_active? Or just don't wait
if the resync thread is interrupted:

wait_event(mddev->recovery_wait,
	   test_bit(MD_RECOVERY_INTR, &mddev->recovery) ||
	   !atomic_read(&mddev->recovery_active));

> And, unlike before, "md: md3: data-check interrupted." from the pr_info
> two lines above appears in dmesg.

Yes, that is intentional, since MD_RECOVERY_INTR is set by the write-idle
transition.

Anyway, I will try the script and investigate the issue further.

Thanks,
Guoqing