Received: by 2002:a05:6358:45e:b0:b5:b6eb:e1f9 with SMTP id 30csp2090936rwe; Sun, 28 Aug 2022 03:46:01 -0700 (PDT) X-Google-Smtp-Source: AA6agR5qtXNOjbFpaulfEx93xW4W2nHM2sUxYtTOhnSRuWjIZcTa2fMUcJMjZfNdtD2dLTMtDLYc X-Received: by 2002:a63:d1a:0:b0:42a:d321:faa4 with SMTP id c26-20020a630d1a000000b0042ad321faa4mr9963871pgl.532.1661683560954; Sun, 28 Aug 2022 03:46:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1661683560; cv=none; d=google.com; s=arc-20160816; b=HV1ngX6Xduc3J1LiRmVFoUEooZ70owwOfGnTK67hudaHODZfFXq1Zthdq724OB38Xd A9kgdxug7jy+ZYmA3oWbfZlJlua1PXQX/UTnLHL+WhxB+V2EaCqnuuskgNpEVouoQrh2 j8OvUN5a9jrAhJm/qi8cbrnjUe51EQe1AS64LV1AjjOzxKBw9zRCOzOYGbpJcGe+6//B h2hgwNqiXSXK8YzGMhTbknMOiCKQDidQ+q9dW//CW48ny9vaXtv6cHZebMOzK8XU84YZ t4TpTUJ67yUSAFBUQYJg/a0UYYIyFXVwQp2prFdZ+QctGjW1CvSW8ParJncHWfQE/Qsz xg8Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id; bh=TJczTZdCxjt2xQ03y8Ou7+XjyAqdiXhkgadOh2pUhnc=; b=KVENxXckGD4mmpz99EdP6FC+WkrIUNY9/Od/pCf4VDoqrkmU1AblBXrubfoUZJzh5m nYt7z+Yv7Bh+XGq7dq6XDxR48/kbS+yQfq1tTeO/1a1M/pFOH2o8jqiusjZyVlWxAmGA nCKjrp6itLOSMsut9Utr6mXmJEuLtYhyTv73WmcmgyE93zI6zZXy7GRb+L54WWOXefdF +4EiUtvIueX3vkW9t7ea7la6kCvA965C3g0Uv28gQUru0C3BvKxF8jVO2T1UmgwULDqS jMwfzoO46l030d2/pAt6B32BTsO+EDB1IAfvfC3AL+Yj/b8VNtSng3zqXG8+NT5pGfJ0 be0g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 71-20020a63014a000000b0042b80821b1csi5686619pgb.371.2022.08.28.03.45.49; Sun, 28 Aug 2022 03:46:00 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229538AbiH1KY1 (ORCPT + 99 others); Sun, 28 Aug 2022 06:24:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229436AbiH1KYZ (ORCPT ); Sun, 28 Aug 2022 06:24:25 -0400 Received: from wp530.webpack.hosteurope.de (wp530.webpack.hosteurope.de [IPv6:2a01:488:42:1000:50ed:8234::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B853E4CA00; Sun, 28 Aug 2022 03:24:23 -0700 (PDT) Received: from [2a02:8108:963f:de38:eca4:7d19:f9a2:22c5]; authenticated by wp530.webpack.hosteurope.de running ExIM with esmtpsa (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) id 1oSFSn-0006qz-Ha; Sun, 28 Aug 2022 12:24:21 +0200 Message-ID: <9968ab01-f9c4-ee47-9dd0-9e46560e459e@leemhuis.info> Date: Sun, 28 Aug 2022 12:24:21 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.2.0 Subject: Re: regression caused by block: freeze the queue earlier in del_gendisk Content-Language: en-US To: Dusty Mabe , Jens Axboe , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Cc: hch@lst.de, "regressions@lists.linux.dev" References: <017845ae-fbae-70f6-5f9e-29aff2742b8c@dustymabe.com> From: Thorsten Leemhuis In-Reply-To: <017845ae-fbae-70f6-5f9e-29aff2742b8c@dustymabe.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-bounce-key: webpack.hosteurope.de;regressions@leemhuis.info;1661682263;b86e50f9; X-HE-SMSGID: 1oSFSn-0006qz-Ha X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,NICE_REPLY_A, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, this is your Linux kernel regression tracker. Thx for the report. CCing the regression mailing list, as it should be in the loop for all regressions, as explained here: https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html On 26.08.22 18:15, Dusty Mabe wrote: > > I think I've found a regression introduced by: > a09b314 o block: freeze the queue earlier in del_gendisk Just FYI, in case you are not aware of it already: there was another report that this commit causes problems. See this thread for details: https://lore.kernel.org/all/72a5bf2e-cd56-a85c-2b99-cb8729a66fed@deltatee.com/#t Anyway, let me add this report to the regressions tracking: [TLDR: I'm adding this regression report to the list of tracked regressions; all text from me you find below is based on a few templates paragraphs you might have encountered already already in similar form.] > In Fedora CoreOS we have tests that set up RAID1 on the /boot/ and /root/ partitions > and then subsequently removes one of the disks to simulate a failure. Sometime recently > this test started timing out occasionally. Looking a bit closer it appears instances are > getting stuck during reboot with a bunch of looping messages: > > ``` > [ 17.978854] block device autoloading is deprecated and will be removed. > [ 17.982555] block device autoloading is deprecated and will be removed. > [ 17.985537] block device autoloading is deprecated and will be removed. > [ 17.987546] block device autoloading is deprecated and will be removed. > [ 17.989540] block device autoloading is deprecated and will be removed. > [ 17.991547] block device autoloading is deprecated and will be removed. > [ 17.993555] block device autoloading is deprecated and will be removed. > [ 17.995539] block device autoloading is deprecated and will be removed. > [ 17.997577] block device autoloading is deprecated and will be removed. > [ 17.999544] block device autoloading is deprecated and will be removed. > [ 22.979465] blkdev_get_no_open: 1666 callbacks suppressed > ... > ... > ... > [ 618.221270] blkdev_get_no_open: 1664 callbacks suppressed > [ 618.221273] block device autoloading is deprecated and will be removed. > [ 618.224274] block device autoloading is deprecated and will be removed. > [ 618.227267] block device autoloading is deprecated and will be removed. > [ 618.229274] block device autoloading is deprecated and will be removed. > [ 618.231277] block device autoloading is deprecated and will be removed. > [ 618.233277] block device autoloading is deprecated and will be removed. > [ 618.235282] block device autoloading is deprecated and will be removed. > [ 618.237370] block device autoloading is deprecated and will be removed. > [ 618.239356] block device autoloading is deprecated and will be removed. > [ 618.241290] block device autoloading is deprecated and will be removed. > ``` > > Using the Fedora kernels I narrowed it down to being introduced between > `kernel-5.19.0-0.rc3.27.fc37` (good) and `kernel-5.19.0-0.rc4.33.fc37` (bad). > > I then did a bisect and found: > > ``` > $ git bisect bad > a09b314005f3a0956ebf56e01b3b80339df577cc is the first bad commit > commit a09b314005f3a0956ebf56e01b3b80339df577cc > Author: Christoph Hellwig > Date: Tue Jun 14 09:48:27 2022 +0200 > > block: freeze the queue earlier in del_gendisk > > Freeze the queue earlier in del_gendisk so that the state does not > change while we remove debugfs and sysfs files. > > Ming mentioned that being able to observer request in debugfs might > be useful while the queue is being frozen in del_gendisk, which is > made possible by this change. > > Signed-off-by: Christoph Hellwig > Link: https://lore.kernel.org/r/20220614074827.458955-5-hch@lst.de > Signed-off-by: Jens Axboe > > block/genhd.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > ``` > > Reverting this commit on top of latest git master (4c612826b) gave me successful results. > > Any ideas on what could be amiss here? Luckily the patch is tiny so hopefully it might > be obvious. > > More details (including logs) in the following locations: > > - https://bugzilla.redhat.com/show_bug.cgi?id=2121791 > - https://github.com/coreos/fedora-coreos-tracker/issues/1282 > > > Thanks! > Dusty > Thanks for the report. To be sure below issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression tracking bot: #regzbot ^introduced a09b314005f3a0 #regzbot title block: timeouts when removing a disk from a RAID1 #regzbot ignore-activity This isn't a regression? This issue or a fix for it are already discussed somewhere else? It was fixed already? You want to clarify when the regression started to happen? Or point out I got the title or something else totally wrong? Then just reply -- ideally with also telling regzbot about it, as explained here: https://linux-regtracking.leemhuis.info/tracked-regression/ Reminder for developers: When fixing the issue, add 'Link:' tags pointing to the report (the mail this one replies to), as explained for in the Linux kernel's documentation; above webpage explains why this is important for tracked regressions. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.