Received: by 2002:ac0:a582:0:0:0:0:0 with SMTP id m2-v6csp1441965imm; Thu, 4 Oct 2018 13:46:21 -0700 (PDT) X-Google-Smtp-Source: ACcGV62HNfi25v3xYX/DTgCAmmpfLU0I+hGglwkveZnZQ2WloZsfe8hQMVetNSBKtIg1E//fPvut X-Received: by 2002:a63:d208:: with SMTP id a8-v6mr6725915pgg.99.1538685981635; Thu, 04 Oct 2018 13:46:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1538685981; cv=none; d=google.com; s=arc-20160816; b=z5nkn8HIllR9QdTCuserNdsTH8CDq5B8+TGadf1jdEQL1f5CcYM0ZykmOho8AFwQhq l5dypYxrAQ6Qwn4nYx/s3RoqZTn5RE4HJS4QppYHwwYrpagMGUsDOY1t7Rjpw/9iYUH3 zRYW7T26848o2taHQ0skJLrJyDSSIK+41f+p7lgU+K4WpKE6LscrsZYfDmGbUO7A0N9s eqYl0TfsAdVBwMx2+Py5X1sDH9bh09tqMKsKROEgHJ/tG2smoESzuDfaizWmv+YZUy0S JJqIC7LQdR60MiUC2sq85MzAVp/cmELcGFxyJwsV8o8CcV7ssjClnNPby0CyFqMcyzWc HWeg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=sw9QGLETyHABEHZL6P5UFMpLY8ncRXu0POeJeNBbI1Y=; b=aNfcOS0Y1dCMV9MNFTYBor+q+Ba6MQgzEAN0u1p99rjCbNQbsfOFRt0zWZK9IJsUSS ETO9t50/Pd2ElRZV39n1j87qW+amKd2mLF17PAhtKpEg7N2kRcNdMHeV4ILcI9C0O6wd 2kXT+d6uN66XZN2xTYn+FdqeIvryo4py/IPB1RNN/YTYl4CkFyCPna6Q3lckz7b2JfFc Zg6FlArJqq1fZYjO5vPTFy4ImQDsl0Q+YG3nRxwU75p54zgiDfWlNxoqFZGZ4RTux64Z PW3MTy/+i0UwSAnrqNApqlJ3HO9GjU04SWaauwuvKZ6+6ODMJt8lv2oLSXlgaHZkvO21 R8TA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=tzVckIbl; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a11-v6si5732396plp.225.2018.10.04.13.46.06; Thu, 04 Oct 2018 13:46:21 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=tzVckIbl; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727814AbeJEDkw (ORCPT + 99 others); Thu, 4 Oct 2018 23:40:52 -0400 Received: from mail.kernel.org ([198.145.29.99]:55116 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727499AbeJEDkw (ORCPT ); Thu, 4 Oct 2018 23:40:52 -0400 Received: from localhost (unknown [69.71.4.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 98FDC2083F; Thu, 4 Oct 2018 20:45:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1538685951; bh=zBx1GMyIX0MtAw7hjW3EbbabSYdg572zWaTRlMTFX+Q=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=tzVckIblDAGOPrZD5JlE0IbyH86AbZrX/mZ/5kvl7twsrgbQlCcZDUE2BO+U6fwpy IzZ/FF3Gn2QY5Zax45vNMfpWspFNFWUOLxeRYwVGli7baitRmf3h5Yal8MGV8+d4Vy NH4nwaOZMFKbxvgrg3bScMBNjS3fabTwDM0xFAH4= Date: Thu, 4 Oct 2018 15:45:49 -0500 From: Bjorn Helgaas To: Alex_Gagniuc@Dellteam.com Cc: mr.nuke.me@gmail.com, linux-pci@vger.kernel.org, bhelgaas@google.com, keith.busch@intel.com, Austin.Bolen@dell.com, Shyam.Iyer@dell.com, ariel.elior@cavium.com, everest-linux-l2@cavium.com, davem@davemloft.net, michael.chan@broadcom.com, ganeshgr@chelsio.com, jeffrey.t.kirsher@intel.com, tariqt@mellanox.com, saeedm@mellanox.com, leon@kernel.org, jakub.kicinski@netronome.com, dirk.vandermerwe@netronome.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org, linux-rdma@vger.kernel.org, oss-drivers@netronome.com, stephen@networkplumber.org, mj@ucw.cz, Alex Williamson Subject: Re: [PATCH 1/9] PCI: sysfs: Export available PCIe bandwidth Message-ID: <20181004204549.GI120535@bhelgaas-glaptop.roam.corp.google.com> References: <20180903180242.14504-1-mr.nuke.me@gmail.com> <20180903180242.14504-2-mr.nuke.me@gmail.com> <20181003213054.GH120535@bhelgaas-glaptop.roam.corp.google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [+cc Alex (VC mentioned below)] On Wed, Oct 03, 2018 at 10:00:19PM +0000, Alex_Gagniuc@Dellteam.com wrote: > On 10/03/2018 04:31 PM, Bjorn Helgaas wrote: > > On Mon, Sep 03, 2018 at 01:02:28PM -0500, Alexandru Gagniuc wrote: > >> For certain bandwidth-critical devices (e.g. multi-port network cards) > >> it is useful to know the available bandwidth to the root complex. This > >> information is only available via the system log, which doesn't > >> account for link degradation after probing. > >> > >> With a sysfs attribute, we can computes the bandwidth on-demand, and > >> will detect degraded links. > >> > >> Signed-off-by: Alexandru Gagniuc > >> --- > >> drivers/pci/pci-sysfs.c | 13 +++++++++++++ > >> 1 file changed, 13 insertions(+) > >> > >> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c > >> index 9ecfe13157c0..6658e927b1f5 100644 > >> --- a/drivers/pci/pci-sysfs.c > >> +++ b/drivers/pci/pci-sysfs.c > >> @@ -218,6 +218,18 @@ static ssize_t current_link_width_show(struct device *dev, > >> } > >> static DEVICE_ATTR_RO(current_link_width); > >> > >> +static ssize_t available_bandwidth_show(struct device *dev, > >> + struct device_attribute *attr, char *buf) > >> +{ > >> + struct pci_dev *pci_dev = to_pci_dev(dev); > >> + u32 bw_avail; > >> + > >> + bw_avail = pcie_bandwidth_available(pci_dev, NULL, NULL, NULL); > >> + > >> + return sprintf(buf, "%u.%03u Gb/s\n", bw_avail / 1000, bw_avail % 1000); > >> +} > >> +static DEVICE_ATTR_RO(available_bandwidth); > > > > Help me understand this. We already have these sysfs attributes: > > > > max_link_speed # eg, 16 GT/s > > max_link_width # eg, 8 > > current_link_speed # eg, 16 GT/s > > current_link_width # eg, 8 > > > > so I think the raw materials are already exposed. > > > > The benefits I see for this new file are that > > > > - pcie_bandwidth_available() does the work of traversing up the > > tree, doing the computations (link width * speed, reduced by > > encoding overhead), and finding the minimum, and > > > > - it re-traverses the path every time we look at it, while the > > boot-time check is a one-time event. > > > > In principle this could all be done in user space with the attributes > > that are already exported. There's some precedent for things like > > this in lspci, e.g., "NUMA node" [1], and lspci might even be a more > > user-friendly place for users to look for this, as opposed to > > searching through sysfs. > > Parsing the endpoint to root port bandwidth is, in principle, possible > from userspace. It's just that in practice it's very clumsy to do, and, > as you pointed out, not that reliable. I don't remember the reliability issue. Was that in a previous conversation? AFAICT, using current_link_speed and current_link_width should be as reliable as pcie_bandwidth_available() because they're both based on reading PCI_EXP_LNKSTA. This patch exposes only the available bandwidth, not the limiting device, link speed, or link width. Maybe that extra information isn't important in this context. Of course, it's easy to derive using current_link_speed and current_link_width, but if we do that, there's really no benefit to adding a new file. Linux doesn't really support Virtual Channels (VC) yet, and I don't know much about it, but it seems like Quality-of-Service features like VC could affect this idea of available bandwidth. Since we already expose the necessary information, and we might throw additional things like VC into the mix, I'm a little hesitant about adding things to sysfs because they're very difficult to change later. > I understand it's not information that all users would jump in the air > to know. However, it was important enough for certain use cases, that > the kernel already has a very reliable way to calculate it. > > It seems to me that the most elegant way is to let the kernel tell us, > since the kernel already has this facility. To quote one of the texts > under Documentation/, it is an elegant way to "avoid reinventing kernel > wheels in userspace". > > Alex > > > [1] https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/commit/?id=90ec4a6d0ae8 > > > >> static ssize_t secondary_bus_number_show(struct device *dev, > >> struct device_attribute *attr, > >> char *buf) > >> @@ -786,6 +798,7 @@ static struct attribute *pcie_dev_attrs[] = { > >> &dev_attr_current_link_width.attr, > >> &dev_attr_max_link_width.attr, > >> &dev_attr_max_link_speed.attr, > >> + &dev_attr_available_bandwidth.attr, > >> NULL, > >> }; > >> > >> -- > >> 2.17.1 > >> > > >