Date: Tue, 20 Sep 2022 11:11:50 +0200
Subject: Re: regression caused by block: freeze the queue earlier in del_gendisk
To: Dusty Mabe, Ming Lei, Christoph Hellwig
Cc: Jens Axboe, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, regressions@lists.linux.dev
From: Thorsten Leemhuis
In-Reply-To: <95cbd47d-46ed-850e-7d4f-851b35d03069@dustymabe.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi, this is your Linux kernel regression tracker.

On 13.09.22 04:36, Dusty Mabe wrote:
> On 9/12/22 21:55, Ming Lei wrote:
>> On Mon, Sep 12, 2022 at 09:16:18AM +0200, Christoph Hellwig wrote:
>>> On Fri, Sep 09, 2022 at 04:24:40PM +0800, Ming Lei wrote:
>>>> On Wed, Sep 07, 2022 at 09:33:24AM +0200, Christoph Hellwig wrote:
>>>>> On Thu, Sep 01, 2022 at 03:06:08PM +0800, Ming Lei wrote:
>>>>>> It is a bit hard to associate the above commit with reported issue.
>>>>>
>>>>> So the messages clearly are about something trying to open a device
>>>>> that went away at the block layer, but somehow does not get removed
>>>>> in time by udev (which seems to be a userspace bug in CoreOS). But
>>>>> even with that we really should not hang.
>>>>
>>>> Xiao Ni provides one script[1] which can reproduce the issue more or less.
>>>
>>> I've run the reproduced 10000 times on current mainline, and while
>>> it prints one of the autoloading messages per run, I've not actually
>>> seen any kind of hang.
>>
>> I can't reproduce the hang too.
>
> I obviously can reproduce the issue with the test in our Fedora CoreOS
> test suite. It's part of a framework (i.e. it's not simple some script
> you can run) but it is very reproducible so one can add some instrumentation
> to the kernel and feed it through a build/test cycle to see different
> results or logs.
>
> I'm willing to share this with other people (maybe a screen share or
> some written down instructions) if anyone would be interested.

This thread looked stalled, or was there any progress in the past week?
If not: Fedora apparently removed the patch in their kernels a while
ago, as quite a few users were hitting it. What is preventing us from
doing the same in mainline and 5.19.y until the issue can be resolved?
The description of a09b314005f3 ("block: freeze the queue earlier in
del_gendisk") doesn't sound like the change does something crucial that
can't wait a bit. I might be totally wrong about that, but I think it's
my duty to ask that question at this point.

>> What I meant is that new raid disk can be added by mdadm after stopping
>> the imsm container and raid disk with the autoloading messages printed,
>> I understand this behavior isn't correct, but I am not familiar with
>> raid enough.
>>
>> It might be related with the delay deleting gendisk from wq & md kobj
>> release handler.
>>
>> During reboot, if mdadm does this stupid thing without stopping, the hang
>> could be caused.
>>
>> I think the root cause is that why mdadm tries to open/add new raid bdev
>> crazily during reboot.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply; it's in everyone's interest to set the public record straight.