Date: Wed, 17 Nov 2021 09:07:53 +0000
Message-ID: <8735nv880m.wl-maz@kernel.org>
From: Marc Zyngier
To: Krzysztof Wilczyński, Yuji Nakao
Cc: Damien Le Moal, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
 Bjorn Helgaas, Arnd Bergmann, Sasha Levin
Subject: Re: Kernel 5.15 doesn't detect SATA drive on boot
References: <87h7ccw9qc.fsf@yujinakao.com> <8951152e-12d7-0ebe-6f61-7d3de7ef28cb@opensource.wdc.com>

Hi Krzysztof, Yuji,

On Tue, 16 Nov 2021 23:26:18 +0000,
Krzysztof Wilczyński wrote:
>
> [+CC Arnd, Bjorn, Marc and Sasha for visibility]
>
> Hello Damien and Yuji,
>
> [...]
> > > I'm using Arch Linux on MacBook Air 2010. I updated `linux` package[1]
> > > from v5.14.16 to v5.15.2 the other day, and the boot process stalled
> > > with the following message.
> > >
> > > ```shell
> > > :: running early hook [udev]
> > > Starting version 249.6-3-arch
> > > :: running hook [udev]
> > > :: Triggering uevents...
> > > Waiting 10 seconds for device /dev/sda3 ...
> > > ERROR: device '/dev/sda3' not found. Skipping fsck.
> > > :: mounting '/dev/sda' on real root
> > > mount: /new_root: no filesystem type specified.
> > > You are now being dropped into an emergency shell.
> > > sh: can't access tty; job control turned off
> > > [rootfs ]#
> > > ```
> > >
> > > In the emergency shell there are no `sda` devices when I type `$ ls
> > > /dev/`. By downgrading the kernel, boot process works properly.
> > >
> > > See also Arch Linux bug tracker[2]. There are similar reports on
> > > Apple devices.
> > >
> > > `dmesg` output in the emergency shell is attached. I guess this issue is
> > > related to libata, so CCed to Damien Le Moal.
> >
> > I think that this problem is due to recent PCI subsystem changes which broke Mac
> > support. The problem shows up as the interrupts not being delivered, which in
> > turn results in the kernel assuming that the drive is not working (see the
> > timeout error messages in your dmesg output). Hence your boot drive detection
> > fails and no rootfs to mount.
> >
> > Adding linux-pci list.
> >
> > >
> > > Regards.
> > >
> > > [1] https://archlinux.org/packages/core/x86_64/linux/
> > > [2] https://bugs.archlinux.org/task/72734
>
> The error in the dmesg output (see [2] where the log file is attached)
> looks similar to the problem reported a week or so ago, as per:
>
>   https://lore.kernel.org/linux-pci/ee3884db-da17-39e3-4010-bcc8f878e2f6@xenosoft.de/
>
> The problematic commits were reverted by Bjorn and the Pull Request that
> did it was accepted, as per:
>
>   https://lore.kernel.org/linux-pci/20211111195040.GA1345641@bhelgaas/
>
> Thus, this would have made its way into 5.16-rc1, I suppose. We might have to
> back-port this to the stable and long-term kernels.
>
> Yuji, could you, if you have some time to spare, try the 5.16-rc1 to see if
> this has gotten better on your system?

I'm afraid you have the wrong end of the stick on this one. The issue
is reported on 5.15, and the issue you are pointing at was introduced
during the 5.16 merge window. The problematic commit wasn't reverted,
but instead fixed in 10a20b34d735 ("of/irq: Don't ignore
interrupt-controller when interrupt-map failed").

The issue is instead very close to the one reported at [1], for which
we have a very conservative workaround in 5.16-rc1 (commits
2226667a145d and f21082fb20db).

Looking at the dmesg log provided by Yuji, you find the following
nugget:

[ 0.378564] pci 0000:00:0a.0: [10de:0d88] type 00 class 0x010601

Oh look, an NVIDIA AHCI controller, probably similar enough to the one
discussed in the issue reported by Rui.

Yuji, could you please test the patch below on top of 5.16-rc1?

Thanks,

	M.

[1] https://lore.kernel.org/r/CALjTZvbzYfBuLB+H=fj2J+9=DxjQ2Uqcy0if_PvmJ-nU-qEgkg@mail.gmail.com

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 003950c738d2..cd88eddf614d 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5857,3 +5857,4 @@ static void nvidia_ion_ahci_fixup(struct pci_dev *pdev)
 	pdev->dev_flags |= PCI_DEV_FLAGS_HAS_MSI_MASKING;
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0ab8, nvidia_ion_ahci_fixup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0d88, nvidia_ion_ahci_fixup);

-- 
Without deviation from the norm, progress is not possible.