Received: by 2002:ac0:b08d:0:0:0:0:0 with SMTP id l13csp3983443imc; Sun, 24 Feb 2019 18:29:41 -0800 (PST) X-Google-Smtp-Source: AHgI3Ia2U+ElpVL/GuRXo5dhFxZ3Ny0LdcC4qLIWOX+YVmGGGW+/TxuqQjCYfl5E09TFnc32Ieha X-Received: by 2002:a62:e082:: with SMTP id d2mr18129011pfm.240.1551061781861; Sun, 24 Feb 2019 18:29:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1551061781; cv=none; d=google.com; s=arc-20160816; b=NXibyo41edLT4CuPGKshmU2wx2ZA7ws6OGanTvQb7eIOvyv2Vc3/mNF1RDNcZPW5gr oHjEZKuBiBPtcD+SSmEQHpvFeC9YLqsBzzDdqeY7ZDZPceYfMvzJdZ1iPBH7wU2HLz6a n9rtVz1677rHnXC5QsqZVJJD/gk3GEnzGu+WmKWrZ15DW5QtRvNbk3VYg7ZLR8jEI7Hk r2sMcd+xR+PBQqeFltpHffBOdb1suSI07J09DNwV1Jp39Dy+Qrd1OFS9JAIzyC38MYs0 UqkgaQjo4bT8Nt2iA2NwETTWLbgm493QKeUwpkpMXWLwECqnnIvx+KHdkiiHl47d/olC tstg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=+eHaVyu7/Ldoikxkm5zKy5mb/or1HzqKxiVcZO9g2F0=; b=gKpGU5z3pbgoSIRj1QPwxEOaCkeh4keNwoANehaPZuwGtJiEG9de7zyZFkhdEaEYO1 cKuuYX4kOnN6w2T6q822pJ4nx84K3CrQ3s8gOi6T/dz1WHuptTHPqdI16KOGmnuqom2T N1lEwCYjdVEIr9dzr64d3WKUUsk7kAKSpgkCHe4IyZyI+dZGR7GDFYCHsb84EYI8vWt3 9znM4YcpOg8cDVHHbYCeWyImEaGcrRjR2Kb573EijkjdeJTVkzd4JN9Z+JwFldfWI7ry 4jy+5KoFYbk/WyxV7LHDRyPn7EJzF69ZcbbkdEic+OEqxLBW5i12tHvSFu36aFNdyVyA 0hLw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id w31si8435957pla.308.2019.02.24.18.29.26; Sun, 24 Feb 2019 18:29:41 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728417AbfBYC3B (ORCPT + 99 others); Sun, 24 Feb 2019 21:29:01 -0500 Received: from bmailout2.hostsharing.net ([83.223.90.240]:48073 "EHLO bmailout2.hostsharing.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727197AbfBYC3B (ORCPT ); Sun, 24 Feb 2019 21:29:01 -0500 Received: from h08.hostsharing.net (h08.hostsharing.net [83.223.95.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.hostsharing.net", Issuer "COMODO RSA Domain Validation Secure Server CA" (not verified)) by bmailout2.hostsharing.net (Postfix) with ESMTPS id 8CE9A2800B3D1; Mon, 25 Feb 2019 03:28:57 +0100 (CET) Received: by h08.hostsharing.net (Postfix, from userid 100393) id 2D9DF63B55; Mon, 25 Feb 2019 03:28:57 +0100 (CET) Date: Mon, 25 Feb 2019 03:28:57 +0100 From: Lukas Wunner To: Alexandru Gagniuc Cc: helgaas@kernel.org, alex_gagniuc@dellteam.com, Austin.Bolen@dell.com, Shyam.Iyer@dell.com, Bjorn Helgaas , "Rafael J. Wysocki" , Keith Busch , Oza Pawandeep , Mika Westerberg , Frederick Lawler , linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org Subject: Re: [PATCH v2] PCI: pciehp: Report degraded links via link bandwidth notification Message-ID: <20190225022857.awuspuxb4bedirjp@wunner.de> References: <20181129230454.GF178809@google.com> <20181207182021.16344-1-mr.nuke.me@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181207182021.16344-1-mr.nuke.me@gmail.com> User-Agent: NeoMutt/20170113 (1.7.2) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Dec 07, 2018 at 12:20:00PM -0600, Alexandru Gagniuc wrote: > A warning is generated when a PCIe device is probed with a degraded > link, but there was no similar mechanism to warn when the link becomes > degraded after probing. The Link Bandwidth Notification provides this > mechanism. > > Use the link bandwidth notification interrupt to detect bandwidth > changes, and rescan the bandwidth, looking for the weakest point. This > is the same logic used in probe(). This is unrelated to pciehp, so the subject prefix should be changed to "PCI/portdrv". > Q: Why is this unconditionally compiled in? > A: The symmetrical check in pci probe() is also always compiled in. Hm, it looks like the convention is to provide a separate Kconfig entry for each port service. > Q: Why call module_init() instead of adding a call in pcie_init_services() ? > A: A call in pcie_init_services() also requires a prototype in portdrv.h, a > non-static implementation in bw_notification.c. Using module_init() is > functionally equivalent, and takes less code. Commit c29de84149ab ("PCI: portdrv: Initialize service drivers directly") moved away from module_init() on purpose, apparently to fix a race condition. > Q: Why print only on degraded links and not when a link is back to full speed? > For symmetry with PCI probe(). Although I see a benefit in conveying that a > link is back to full speed, I expect this to be extremely rare. Secondary bus > reset is usually needed to retrain at full bandwidth. What if the link is retrained at the same speed/width? Intuitively I'd compare the speed in the Link Status Register to what is cached in the cur_bus_speed member of struct pci_bus and print a message only if the speed has changed. (Don't we need to cache the width as well?) > +static irqreturn_t pcie_bw_notification_irq(int irq, void *context) > +{ > + struct pcie_device *srv = context; > + struct pci_dev *port = srv->port; > + struct pci_dev *dev; > + u16 link_status, events; > + int ret; > + > + ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status); > + events = link_status & PCI_EXP_LNKSTA_LBMS; > + > + if (!events || ret != PCIBIOS_SUCCESSFUL) > + return IRQ_NONE; > + > + /* Print status from upstream link partner, not this downstream port. */ > + list_for_each_entry(dev, &port->subordinate->devices, bus_list) > + __pcie_print_link_status(dev, false); > + > + pcie_capability_write_word(port, PCI_EXP_LNKSTA, events); > + return IRQ_HANDLED; > +} You need to hold pci_bus_sem for the list iteration to protect against simultaneous removal of child devices. This may sleep, so request the IRQ with request_threaded_irq(), pass NULL for the "handler" argument and pcie_bw_notification_irq for the "thread_fn" argument. You may want to call pcie_update_link_speed() to update the now stale speed cached in the pci_bus struct. > + ret = devm_request_irq(&srv->device, srv->irq, pcie_bw_notification_irq, > + IRQF_SHARED, "PCIe BW notif", srv); The plan is to move away from port services as devices in the future, so device-managed allocations should be avoided. Apart from that, with a device-managed IRQ, if this service driver is unbound, its IRQ handler may still be invoked if some other port service signals an interrupt (because the IRQ is shared). Which seems wrong. > + .port_type = PCI_EXP_TYPE_DOWNSTREAM, According to PCIe r4.0, sec 7.8.6, Link Bandwidth Notification Capability is also required for root ports, so I think you need to match for PCIE_ANY_PORT and amend pcie_link_bandwidth_notification_supported() to check that you're dealing with a root or downstream port. > + .service = PCIE_PORT_SERVICE_BwNOTIF, All caps please. > @@ -136,9 +136,12 @@ static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask) > } > > /* PME and hotplug share an MSI/MSI-X vector */ > - if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) { > - irqs[PCIE_PORT_SERVICE_PME_SHIFT] = pci_irq_vector(dev, pme); > - irqs[PCIE_PORT_SERVICE_HP_SHIFT] = pci_irq_vector(dev, pme); > + if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP | > + PCIE_PORT_SERVICE_BwNOTIF)) { > + pcie_irq = pci_irq_vector(dev, pme); > + irqs[PCIE_PORT_SERVICE_PME_SHIFT] = pcie_irq; > + irqs[PCIE_PORT_SERVICE_HP_SHIFT] = pcie_irq; > + irqs[PCIE_PORT_SERVICE_BWNOTIF_SHIFT] = pcie_irq; Please amend the code comment with something like - /* PME and hotplug share an MSI/MSI-X vector */ + /* PME, hotplug and bandwidth notification share an MSI/MSI-X vector */ > @@ -250,6 +253,9 @@ static int get_port_device_capability(struct pci_dev *dev) > pci_aer_available() && services & PCIE_PORT_SERVICE_AER) > services |= PCIE_PORT_SERVICE_DPC; > > + if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM) > + services |= PCIE_PORT_SERVICE_BwNOTIF; > + Also need to check for root ports, see above. Otherwise LGTM. Thanks, Lukas