Received: by 2002:a05:7412:5112:b0:fa:6e18:a558 with SMTP id fm18csp31573rdb; Mon, 22 Jan 2024 10:57:38 -0800 (PST) X-Google-Smtp-Source: AGHT+IHnwQhUEFkAhDpWyv6ykd0SsuEJ2Np+lCCBh3DyOPOPrx/5LlGlYy/ltCpTDHWRX/LavhaN X-Received: by 2002:a05:620a:2187:b0:783:539a:f30e with SMTP id g7-20020a05620a218700b00783539af30emr5496939qka.83.1705949857979; Mon, 22 Jan 2024 10:57:37 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705949857; cv=pass; d=google.com; s=arc-20160816; b=EXsooVF2ugmpf37YsyYzPdcsiBXTXEWIeTC/782V7MnCgojn7D2fENCw5AxnvYc09Y kAzI72WZ/X9WpRphE4a+2rjIpieWrpznpCZA3qSO090hgYg3mW5m7jkrqcEZF6X2KNRT Ph9n7DUzpaaIr01BHgOXFLYqCCM41LzlyynUQO/s2wx5j6kCSL7+75hq1HjGa74iD8Yh 0D1mNs6DozpKBbtPvyfH5tNNGiHiqZIPIaWiMjfLMryytr9yYHcrmPSh3dScuNygi0r+ VUCSnz78mRXMytLcvbzASPFSBdRhIijqIhA0ktLCtGOqyYu3zDxc6QamoKGUBZd3fNQ/ mAJg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:subject:cc:to:from :date:dkim-signature; bh=Ln1ircINb9z5d6LTIvhhkBnkiuFR4Lz7WlMnZBMvRb4=; fh=D/+JBx5lWLoSPnfw2sRXWw5A4E4l8yiBuBAV7IA1zZc=; b=qXD7LwkdYFlbTvIJhLqfOv6IkYPWXJxW+jjBzLKGZcPAiceaXtJImg2fKdzzpeulgS h1i2A7vnAIFtpGl68rBU2TRbf5Zc7ZNDNi4BsoNJ3zxl50OjYnjZdnm3qUVO6/cWcgub wY3rxoBJuWJHzejGRSo/nNifjXCIwLeoRvXH8RVBIIiHLTlS9X2JxQS9KB0hdtaEO6z7 0er1U4+qJ3lEfn0D0Mz4w4yJbrFtbVXQAhrBHMPfxePB+vPXqmUKkCdND8OHJnqlOXW1 KZO0iOqxTZ3vjjMREZ1PZswMorO55Es6k6Pdls8lN5PnAHLAaM8znvsck/nbY4oBmarl zwTg== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=sbF57wLp; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-33981-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-33981-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id x11-20020ae9e90b000000b007834bf573dfsi6199732qkf.284.2024.01.22.10.57.37 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Jan 2024 10:57:37 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-33981-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=sbF57wLp; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-33981-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-33981-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id ABE5C1C29D42 for ; Mon, 22 Jan 2024 18:57:37 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B16F54D127; Mon, 22 Jan 2024 18:26:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="sbF57wLp" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B6EE24D10E; Mon, 22 Jan 2024 18:26:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705947977; cv=none; b=m04GXNAyYGtrC+z93+dE6i2lm9nM8txf3aByv7nwd+ZbPTtupdfX1zU//SGHxgAVPeMO8dBDkGIr4or+50peMJZHqzP2mCnRLu7RsOZLTcif/sgejgmBac6rgeEFZxQts6soaL0HAWOZgTEBsDVtv2futT4qqzHorP/nAJSe/TY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705947977; c=relaxed/simple; bh=xqaElstZ6Be2KexYuJuX9vmqnmJZ8Y7gmDJsiNOQ56Q=; h=Date:From:To:Cc:Subject:Message-ID:MIME-Version:Content-Type: Content-Disposition:In-Reply-To; b=Wo/96XAL3APwKThy/wSaTZmAjibcnyquk1LWdobjDYwgPU4sCmVaCwsBOgK9XE2U2QL0NcOuTYtw0NmmNwpAjD++qSaJZxatxH6AVMhwoYq1xUAGqtzFctdh2onVX4zF/YcatYXWpZ81oBEGxYa4pkCKrvBHHAQtTQNo8X1n3rg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=sbF57wLp; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id F1769C433C7; Mon, 22 Jan 2024 18:26:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1705947977; bh=xqaElstZ6Be2KexYuJuX9vmqnmJZ8Y7gmDJsiNOQ56Q=; h=Date:From:To:Cc:Subject:In-Reply-To:From; b=sbF57wLpzO4Po/5GpjgYQjSECzQjLbxSiNv/RKB8wRHfNNYDrfybv1BM0T+rwpu13 WXBW4KZf0YWrt1XaNQpp3Hl67mdkmKjBMm10UrSe1YtlB/jlWvemQbqfZuLtOPugV7 w7zmscDHsgJyt9pCLUdy4K+WKwdOJJpQ3FOQmgSkL2jGVV98ELA1IJyO15yK9on5jV VC9xEDPpXQjYMabsXM2bkSsDb+HZW462zvO1CPzN+F7m0wmEkizHIv1x1NG7IONcso 8MTlsPoksu0gnKedabN5/EDAjZJhLwQE5iKmSugXt0ALOpa2XjuZhtiIBNbXTIbBTi EA0XaD5Y4jSgA== Date: Mon, 22 Jan 2024 12:26:15 -0600 From: Bjorn Helgaas To: Johan Hovold Cc: Michael Schaller , Kai-Heng Feng , linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, regressions@lists.linux.dev, "Maciej W . Rozycki" , Ajay Agarwal , Kuppuswamy Sathyanarayanan , Greg Kroah-Hartman , Heiner Kallweit , Johan Hovold , Bjorn Helgaas , stable@vger.kernel.org, regressions@leemhuis.info Subject: Re: PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()") Message-ID: <20240122182615.GA277100@bhelgaas> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Mon, Jan 22, 2024 at 11:53:35AM +0100, Johan Hovold wrote: > Hi Bjorn, > > I never got a reply to this one so resending with updated Subject in > case it got buried in your inbox. I did see it but decided it was better to fix the problem with resume causing an unintended reboot, even though fixing that meant breaking lockdep again, since I don't think we have user reports of the potential deadlock lockdep finds. 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") was a start at fixing other problems and also improving the ASPM style, so I hope somebody steps up to fix both it and the lockdep issue. I haven't looked at it enough to have a preference for *how* to fix it. Bjorn > On Mon, Jan 08, 2024 at 09:39:07AM +0100, Johan Hovold wrote: > > > On Tue, Jan 02, 2024 at 05:25:50PM -0600, Bjorn Helgaas wrote: > > > From: Bjorn Helgaas > > > > > > This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c. > > > > > > Michael reported that when attempting to resume from suspend to RAM on ASUS > > > mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426 > > > ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay > > > with no output, followed by a reboot. > > > > > > Workarounds include: > > > > > > - Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") > > > - Booting with "pcie_aspm=off" > > > - Booting with "pcie_aspm.policy=performance" > > > - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm" > > > before suspending > > > - Connecting a USB flash drive > > > > > > Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") > > > Reported-by: Michael Schaller > > > Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de > > > Signed-off-by: Bjorn Helgaas > > > Cc: > > > --- > > > > > +/* @pdev: the root port or switch downstream port */ > > > +void pcie_aspm_pm_state_change(struct pci_dev *pdev) > > > +{ > > > + struct pcie_link_state *link = pdev->link_state; > > > + > > > + if (aspm_disabled || !link) > > > + return; > > > + /* > > > + * Devices changed PM state, we should recheck if latency > > > + * meets all functions' requirement > > > + */ > > > + down_read(&pci_bus_sem); > > > + mutex_lock(&aspm_lock); > > > + pcie_update_aspm_capable(link->root); > > > + pcie_config_aspm_path(link); > > > + mutex_unlock(&aspm_lock); > > > + up_read(&pci_bus_sem); > > > +} > > > > This function is now restored in 6.7 final and is called in paths which > > already hold the pci_bus_sem as reported by lockdep (see splat below). > > > > This can potentially lead to a deadlock and specifically prevents using > > lockdep on Qualcomm platforms. > > > > Not sure if you want to propagate whether the bus semaphore is held to > > pcie_aspm_pm_state_change() or if there was some alternative to > > restoring this function which should be explored instead. > > So to summarise, this patch, which is now commit > > f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"") > > introduced a regression in 6.7-final for Qualcomm platforms (and some > Intel platforms) similar to the one recently fixed by commit > > f352ce999260 ("PCI: qcom: Fix potential deadlock when enabling ASPM"). > > Johan > > > #regzbot introduced: f93e71aea6c6 > > > ============================================ > > WARNING: possible recursive locking detected > > 6.7.0 #40 Not tainted > > -------------------------------------------- > > kworker/u16:5/90 is trying to acquire lock: > > ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc > > pcieport 0002:00:00.0: PME: Signaling with IRQ 197 > > > > but task is already holding lock: > > ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc > > > > other info that might help us debug this: > > Possible unsafe locking scenario: > > > > CPU0 > > ---- > > lock(pci_bus_sem); > > lock(pci_bus_sem); > > > > *** DEADLOCK *** > > > > May be due to missing lock nesting notation > > > > 4 locks held by kworker/u16:5/90: > > #0: ffff06c5c0008d38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x150/0x53c > > #1: ffff800081c0bdd0 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x150/0x53c > > #2: ffff06c5c0b7d0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach_async_helper+0x3c/0xf4 > > #3: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc > > > > stack backtrace: > > CPU: 1 PID: 90 Comm: kworker/u16:5 Not tainted 6.7.0 #40 > > Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022 > > Workqueue: events_unbound async_run_entry_fn > > Call trace: > > dump_backtrace+0x9c/0x11c > > show_stack+0x18/0x24 > > dump_stack_lvl+0x60/0xac > > dump_stack+0x18/0x24 > > print_deadlock_bug+0x25c/0x348 > > __lock_acquire+0x10a4/0x2064 > > lock_acquire+0x1e8/0x318 > > down_read+0x60/0x184 > > pcie_aspm_pm_state_change+0x58/0xdc > > pci_set_full_power_state+0xa8/0x114 > > pci_set_power_state+0xc4/0x120 > > qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom] > > pci_walk_bus+0x64/0xbc > > qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]