Received: by 10.223.185.116 with SMTP id b49csp1107236wrg; Fri, 16 Feb 2018 12:35:47 -0800 (PST) X-Google-Smtp-Source: AH8x225AAKVn+W/H+YMGfnHxiffWKxCgUz++CGISYKlcyS9E+vH+fC5lR7aI95ukcEQjgsVqGYHY X-Received: by 10.98.224.24 with SMTP id f24mr7103458pfh.157.1518813347577; Fri, 16 Feb 2018 12:35:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1518813347; cv=none; d=google.com; s=arc-20160816; b=aJHO7kzfWU3VqMyiIYEpZXDDbwPTSC5A4FozntYKVw27sfszj8bAxLGX0O3DYjCyIE b6WUjUJPGaa/Lf1l0V9KbC/45Q0OgJAuUzrU2JLjwV+qMavML60MdOR6onmsCFKDPPmC wEV+iTPKfcknEa20TxeS7X129ITa6MzC7k7RbLz+TctCNAVdLpHVZbnmqMhZrYRlP71c cou13I8NsI9uc1ed0cba/sO4RE1TGcbEPs8rXegdhrLS4+Vwh6jzA1pX9hssw9noIHMW W6F4Cclyar8HFPPAiA7adh5RFS2zx0ehMRxJK3Fqr0plWkoPqxX74AxHNhm/nvIxzFy5 8VsA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dmarc-filter:arc-authentication-results; bh=tKEiB0st8fd6JWnysrD0ViJUWQjxXlDNfuUG1jPDdsc=; b=t4h3PBnQsPL+oq1q2KwEEhnO5KKf7GQq1nP4jJAT2rz0y7i8y/dx8Ys4xycgHQH2E5 Y3NvyW9FUZT5MUAGb5WHihqwOeRNKKa9rCFlwEyKFmcR/q1wRsvgqU0SGwCOWttJq5Te LDtfNedxPXb/Q7OcrFH0hFCxBhW88SeT7DVgiSUnKAQifhGGM43qVxTIDiGmJ7ETo5Pi e4I6+AVthVjiQq4SocyVT2bbaEBMHGRpJXo6fab/7ktdRmYbeVwR0VykJoj/1p9QAenB VTeH9yF2UH06UgceAuEdGwu1jwyWnZQRXgJlWUdESYfT+qgY5OBVw1q4JogX54mwdv8w mN1A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m72si1825673pga.376.2018.02.16.12.35.32; Fri, 16 Feb 2018 12:35:47 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751184AbeBPUej (ORCPT + 99 others); Fri, 16 Feb 2018 15:34:39 -0500 Received: from mail.kernel.org ([198.145.29.99]:42164 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750801AbeBPUeg (ORCPT ); Fri, 16 Feb 2018 15:34:36 -0500 Received: from localhost (unknown [69.55.156.246]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id C2CB4217A0; Fri, 16 Feb 2018 20:34:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C2CB4217A0 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=helgaas@kernel.org Date: Fri, 16 Feb 2018 14:34:34 -0600 From: Bjorn Helgaas To: "Rafael J. Wysocki" Cc: Mika Westerberg , George Cherian , George Cherian , linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, bhelgaas@google.com, Jayachandran.Nair@cavium.com, Robert.Richter@cavium.com, Lorenzo Pieralisi Subject: Re: [PATCH] PCI: Add quirk for Cavium Thunder-X2 PCIe erratum #173 Message-ID: <20180216203434.GC11014@bhelgaas-glaptop.roam.corp.google.com> References: <1517554846-16703-1-git-send-email-george.cherian@cavium.com> <26660034.hqiP3ND89J@aspire.rjw.lan> <20180215233900.GA195653@bhelgaas-glaptop.roam.corp.google.com> <2323301.ORZpb3hFRe@aspire.rjw.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2323301.ORZpb3hFRe@aspire.rjw.lan> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Feb 16, 2018 at 01:40:37PM +0100, Rafael J. Wysocki wrote: > On Friday, February 16, 2018 12:39:00 AM CET Bjorn Helgaas wrote: > > On Thu, Feb 15, 2018 at 10:57:25PM +0100, Rafael J. Wysocki wrote: > > > On Wednesday, February 14, 2018 9:16:53 PM CET Bjorn Helgaas wrote: > > > > On Wed, Feb 14, 2018 at 04:58:08PM +0530, George Cherian wrote: > > > > > On 02/13/2018 08:39 PM, Bjorn Helgaas wrote: > > > > > >On Fri, Feb 02, 2018 at 07:00:46AM +0000, George Cherian wrote: > > > > > >>[ 12.775202] Unhandled fault: synchronous external abort (0x96000610) at 0x0000000000000000 > > > > > > > > > >>[ 12.813518] PC is at pci_generic_config_read+0x5c/0xf0 > > > > > >>[ 12.818643] LR is at pci_generic_config_read+0x48/0xf0 > > > > > >> ... > > > > > >>[ 13.269819] [] pci_generic_config_read+0x5c/0xf0 > > > > > >>[ 13.275987] [] pci_bus_read_config_dword+0xb4/0xd8 > > > > > >>[ 13.282328] [] pcie_capability_read_dword+0x64/0xb8 > > > > > >>[ 13.288757] [] __pci_dev_reset+0x90/0x328 > > > > > >>[ 13.294317] [] pci_probe_reset_function+0x24/0x30 > > > > > >>[ 13.300571] [] pci_create_sysfs_dev_files+0x18c/0x2a0 > > > > > >>[ 13.307173] [] pci_sysfs_init+0x38/0x60 > > > > > >>[ 13.312560] [] do_one_initcall+0x5c/0x170 > > > > > >>[ 13.318122] [] kernel_init_freeable+0x1c0/0x27c > > > > > >>[ 13.324205] [] kernel_init+0x18/0x110 > > > > > >>[ 13.329416] [] ret_from_fork+0x10/0x40 > > > > > > > > I should have asked this before: why are we even trying to do this > > > > config read to a device that's in D3? I assume this root port started > > > > out in D0 because we apparently enumerated it successfully, but it > > > > must have been put into D3 later? > > > > > > > > The pci_probe_reset_function() function comment says "the PCI device > > > > must be responsive to PCI config space in order to use this function." > > > > So apparently the caller should make sure the device is in at least > > > > D3hot and any bridges leading to it are in D0. > > > > > > > > If we're missing whatever it is that makes sure the target device is > > > > reachable and responsive to config space, we likely have issues on > > > > other systems as well. On Cavium this causes the external abort, but > > > > on other systems, we'd probably just get all 1's data back from the > > > > config read and silently do the wrong thing in > > > > pci_probe_reset_function(). > > > > > > > > I don't know how this runtime PM works, but maybe Rafael can help us > > > > out. > > > > > > I'm not sure what the question is to be honest. > > > > My questions are basically "What does the PCI core need to do to make > > sure a device is in D0 before it operates on it? And where do we need > > to do that?" > > > > > Unused ports are autosuspended after 100ms as per pcie_portdrv_probe(). > > > > So I guess this is only done for PCIe bridges, not for conventional > > PCI bridges. Is this because autosuspend depends on a PCIe-only > > feature? > > Yes, that's PCIe-only. What PCIe feature does this depend on? I see the PCIe test in pci_bridge_d3_possible(), added by 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend"), but unfortunately that commit doesn't say *why* we only do this for PCIe, or only for BIOS dates of 2015 or newer. > > Here's the path we're in now: > > > > pcie_portdrv_init # device_initcall (level 6) > > pci_register_driver(&pcie_portdriver) > > ... > > pcie_portdrv_probe # pcie_portdriver.probe > > pm_runtime_set_autosuspend_delay(&dev->dev, 100) > > > > pci_sysfs_init # late_initcall (level 7) > > for_each_pci_dev(dev) > > pci_create_sysfs_dev_files(dev) > > pci_create_capabilities_sysfs(dev) > > pci_probe_reset_function(dev) > > ... > > pcie_capability_read_dword <-- config read causing abort > > > > I guess it's conceivable that more than 100ms elapsed between > > pcie_portdrv_probe() and pci_probe_reset_function(), and obviously we > > can't depend on any particular timing. > > Right. George, can you find out which specific devices are involved here? My working assumption is that as for_each_pci_dev() iterates through all the devices, we call pci_probe_reset_function() for a root port that happens to be in D3hot (this should work fine because the root port's config space should be accessible), then we call pci_probe_reset_function() for a device *below* the root port. That doesn't work because the root port's link is down and and devices below it are not accessible. The root port should treat this as an Unsupported Request. > > > Now, it looks like they should be resumed in the code path in question, > > > so this case seems to have been overlooked. > > > > How do we identify which code paths need to resume the device, and > > what do we need to do to resume it? > > The "suspended" metastate is generally defined as "the device may not be > accessible now" and for the majority of bus types it is unsafe to attempt > to access suspended devices. > > That's not strictly the case for PCI, but IMO PCI devices should be resumed > before accessing them too because of the below. :-) > > pm_runtime_get_sync() resumes the device and prevents it from suspending > again until pm_runtime_put() (or equivalent) is called on it. How do we identify the places that need to call pm_runtime_get_sync()? PCI devices will respond to config accesses while in D3hot, but not to memory or I/O accesses. I think that distinction is probably too detailed and the general rule should be "the device must be in D0 before you access it." I guess generally the driver is in control, and if it suspends the device, it has to be smart enough to resume it before touching it. This particular pci_probe_reset_function() case is a problem because it's the PCI *core* doing the access while the device may already be claimed by a driver. I think it's generally taboo for the core to do this, and I have a patch to move this access so it happens before a driver can claim the device. We also have the case where userspace accesses a device via sysfs. I guess all those cases probably need to wake up the device. It looks like pci_config_pm_runtime_get() already does that for config accesses. But I think other entry points, e.g., the following, should have the same problem and I don't see anything there: pci_read_legacy_io() pci_write_legacy_io() pci_mmap_legacy_mem() pci_mmap_legacy_io() pci_mmap_resource() pci_resource_io() pci_read_rom() reset_store() Plus probably the ASPM control files, VPD, maybe others? Bjorn