Date: Tue, 4 May 2021 21:12:36 -0500
From: Bjorn Helgaas
To: Shanker R Donthineni
Cc: Alex Williamson, Bjorn Helgaas, linux-pci@vger.kernel.org,
    linux-kernel@vger.kernel.org, Sinan Kaya, Vikram Sethi, Amey Narkhede
Subject: Re: [PATCH v4 2/2] PCI: Enable NO_BUS_RESET quirk for Nvidia GPUs
Message-ID: <20210505021236.GA1244944@bjorn-Precision-5520>
In-Reply-To: <478efe56-fb64-6987-f64c-f3d930a3b330@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 03, 2021 at 09:07:11PM -0500, Shanker R Donthineni wrote:
> On 5/3/21 5:42 PM, Bjorn Helgaas wrote:

> > Obviously _RST only works for built-in devices, since there's no AML
> > for plug-in devices, right? So if there's a plug-in card with this
> > GPU, neither SBR nor _RST will work?
> These are not plug-in PCIe GPU cards, will exist on upcoming server
> baseboards. ACPI-reset should work for plug-in devices as well as long
> as firmware has _RST method defined in ACPI-device associated with
> the PCIe hot-plug slot.

Maybe I'm missing something, but I don't see how _RST can work for
plug-in devices. _RST is part of the system firmware, and that
firmware knows nothing about what will be plugged into the slot. So
if system firmware supplies _RST that knows how to reset the Nvidia
GPU, it's not going to do the right thing if you plug in an NVMe
device instead.

Can you elaborate on how _RST would work for plug-in devices?

My only point here is that IF this GPU is ever on a plug-in card,
neither _RST nor SBR would work, so we'd have to use whatever other
reset methods *do* work (I guess only FLR?)

> I've verified PCIe plug-in feature using SYSFS interface.
>
> 1) Remove device using sysfs interface
>   root@test:/sys/bus/pci# echo 1 > devices/0005:01:00.0/remove
>   root@test:/sys/bus/pci# lspci -s 0005:01:00.0
>
> 2) Rescan PCI bus using sysfs interface
>   root@test:/sys/bus/pci# echo 1 > devices/0005:00:00.0/rescan
>   root@test:/sys/bus/pci# lspci -s 0005:01:00.0
>   0005:01:00.0 3D controller: NVIDIA Corporation Device 2341 (rev a1)
>
> 3) List current reset methods
>   root@jetson:/sys/bus/pci# cat devices/0005:01:00.0/reset_method
>   acpi,flr
>
> Example AML code:
>  // Device definition for slot/devfn
>   Device(GPU0) {
>      Name(_ADR,0x00000000)
>      Method (_RST, 0)
>      {
>         printf("Entering ACPI _RST method")
>         // RESET code
>         printf("Exiting ACPI _RST method")
>      }
>   }
>
> 4) Issue device reset from the userspace
>  root@test:/sys/bus/pci# echo 1 > devices/0005:01:00.0/reset
>
> dmesg:
>  [ 6156.426303] ACPI Debug:  "Entering PCI9 _RST method"
>  [ 6156.427007] ACPI Debug:  "Exiting PCI9 _RST method"
>
> > I'm wondering if we should log something to dmesg in
> > quirk_no_bus_reset(), quirk_no_pm_reset(), quirk_no_flr(), etc., just
> > so we have a hint about the fact that resets won't work quite as
> > expected on these devices.
> Yes, it would be very useful to know what PCI quirks were applied
> during boot. Should I create a separate patch for adding pci_info()
> or include as part of this patch?

Don't include it as part of this patch. It's a separate logical
change so should be a separate patch. We can worry about that later.

> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3556,6 +3556,7 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID,
>  static void quirk_no_bus_reset(struct pci_dev *dev)
>  {
>         dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
> +       pci_info(dev, "Applied NO_BUS_RESET quirk\n");
>  }
>
>  /*
> @@ -3598,6 +3599,7 @@ static void quirk_no_pm_reset(struct pci_dev *dev)
>          */
>         if (!pci_is_root_bus(dev->bus))
>                 dev->dev_flags |= PCI_DEV_FLAGS_NO_PM_RESET;
> +       pci_info(dev, "Applied NO_PM_RESET quirk\n");
>  }
>
>  /*
> @@ -5138,6 +5140,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
>  static void quirk_no_flr(struct pci_dev *dev)
>  {
>         dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
> +       pci_info(dev, "Applied NO_FLR_RESET quirk\n");
>  }
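
For anyone who wants to script the "which reset methods are left?"
check from step 3 above, something along these lines might do. It is
only a sketch: has_reset_method is a made-up helper name, not part of
any kernel tooling, and it assumes the reset_method attribute prints a
comma-separated list like the "acpi,flr" output quoted above. The
separator could differ in other revisions of the series, so whitespace
is accepted as well.

```shell
#!/bin/sh
# has_reset_method FILE METHOD
#
# Hypothetical helper (name and interface are illustrative only).
# Succeeds if METHOD is listed in the reset_method attribute FILE.
# Assumes a comma- or whitespace-separated list such as "acpi,flr",
# as in the output quoted in this thread.
has_reset_method() {
    file=$1
    want=$2

    # Attribute missing or unreadable: report a distinct status.
    [ -r "$file" ] || return 2

    # Put each method on its own line, then require an exact,
    # fixed-string match so "fl" does not match "flr".
    tr -s ', \t' '\n\n\n' < "$file" | grep -qxF -- "$want"
}
```

Usage would be roughly:

  has_reset_method /sys/bus/pci/devices/0005:01:00.0/reset_method flr \
      && echo "FLR is available"

Taking the attribute path as an argument keeps the helper independent
of any particular device address.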