Date: Thu, 21 Jun 2018 08:17:34 -0500
From: Bjorn Helgaas
To: Rajat Jain
Cc: Bjorn Helgaas, Jonathan Corbet, Philippe Ombredanne, Kate Stewart,
    Thomas Gleixner, Greg Kroah-Hartman, Frederick Lawler, Oza Pawandeep,
    Keith Busch, Alexandru Gagniuc, Thomas Tai, "Steven Rostedt (VMware)",
    linux-pci@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, Jes Sorensen, Kyle McMartin,
    rajatxjain@gmail.com
Subject: Re: [PATCH v5 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices
Message-ID: <20180621131734.GA14136@bhelgaas-glaptop.roam.corp.google.com>
References: <20180522222805.80314-1-rajatja@google.com> <20180620234147.48438-1-rajatja@google.com>
In-Reply-To: <20180620234147.48438-1-rajatja@google.com>

On Wed, Jun 20, 2018 at 04:41:43PM -0700, Rajat Jain wrote:
> Define a structure to hold the AER statistics. There are 2 groups
> of statistics: dev_* counters that are to be collected for all AER
> capable devices and rootport_* counters that are collected for all
> (AER capable) rootports only. Allocate and free this structure when
> device is added or released (thus counters survive the lifetime of the
> device).
>
> Signed-off-by: Rajat Jain
> ---
> v5: Same as v4
> v4: Same as v3
> v3: Merge everything in aer.c
>
>  drivers/pci/pcie/aer.c | 60 ++++++++++++++++++++++++++++++++++++++++++
>  drivers/pci/probe.c    |  1 +
>  include/linux/pci.h    |  3 +++
>  3 files changed, 64 insertions(+)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index a2e88386af28..f9fa994b6c33 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -33,6 +33,9 @@
>  #define AER_ERROR_SOURCES_MAX		100
>  #define AER_MAX_MULTI_ERR_DEVICES	5	/* Not likely to have more */
>
> +#define AER_MAX_TYPEOF_CORRECTABLE_ERRS 16	/* as per PCI_ERR_COR_STATUS */
> +#define AER_MAX_TYPEOF_UNCORRECTABLE_ERRS 26	/* as per PCI_ERR_UNCOR_STATUS*/
> +
>  struct aer_err_info {
>  	struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
>  	int error_dev_num;
> @@ -76,6 +79,40 @@ struct aer_rpc {
>  	 */
>  };
>
> +/* AER stats for the device */
> +struct aer_stats {
> +
> +	/*
> +	 * Fields for all AER capable devices. They indicate the errors
> +	 * "as seen by this device". Note that this may mean that if an
> +	 * end point is causing problems, the AER counters may increment
> +	 * at its link partner (e.g. root port) because the errors will be
> +	 * "seen" by the link partner and not the problematic end point
> +	 * itself (which may report all counters as 0 as it never saw any
> +	 * problems).
> +	 */
> +	/* Individual counters for different types of correctable errors */
> +	u64 dev_cor_errs[AER_MAX_TYPEOF_CORRECTABLE_ERRS];
> +	/* Individual counters for different types of uncorrectable errors */
> +	u64 dev_uncor_errs[AER_MAX_TYPEOF_UNCORRECTABLE_ERRS];
> +	/* Total number of correctable errors seen by this device */
> +	u64 dev_total_cor_errs;
> +	/* Total number of fatal uncorrectable errors seen by this device */
> +	u64 dev_total_fatal_errs;
> +	/* Total number of non-fatal uncorrectable errors seen by this device */
> +	u64 dev_total_nonfatal_errs;
> +
> +	/*
> +	 * Fields for Root ports only; these indicate the total number of
> +	 * ERR_COR, ERR_FATAL, and ERR_NONFATAL messages received by the
> +	 * rootport, INCLUDING the ones that are generated internally (by
> +	 * the rootport itself)

Strictly speaking, I think these are applicable for both root ports
and root complex event collectors, right?

> +	 */
> +	u64 rootport_total_cor_errs;
> +	u64 rootport_total_fatal_errs;
> +	u64 rootport_total_nonfatal_errs;
> +};
> +
>  #define AER_LOG_TLP_MASKS	(PCI_ERR_UNC_POISON_TLP|	\
>  					PCI_ERR_UNC_ECRC|		\
>  					PCI_ERR_UNC_UNSUP|		\
> @@ -402,12 +439,35 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
>  	return 0;
>  }
>
> +static int pci_aer_stats_init(struct pci_dev *pdev)
> +{
> +	pdev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
> +	if (!pdev->aer_stats) {
> +		dev_err(&pdev->dev, "No memory for aer_stats\n");

Use pci_err() here, if we need the message at all. Based on
c7abb2352c29 ("PCI: Remove unnecessary messages for memory allocation
failures"), I'd be inclined to drop the message completely.
> +		return -ENOMEM;
> +	}
> +	return 0;
> +}
> +
> +static void pci_aer_stats_exit(struct pci_dev *pdev)
> +{
> +	kfree(pdev->aer_stats);
> +	pdev->aer_stats = NULL;
> +}
> +
>  int pci_aer_init(struct pci_dev *dev)
>  {
>  	dev->aer_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
> +	if (!dev->aer_cap || pci_aer_stats_init(dev))
> +		return -EIO;

This skips pci_cleanup_aer_error_status_regs() if the kzalloc() fails.
I think we should still do pci_cleanup_aer_error_status_regs(), even if
the alloc fails.

Nobody checks the return value of pci_aer_init(), so I think you can
simplify this by making these functions void. Maybe even squash them
together, i.e., do the kzalloc() directly in pci_aer_init() and the
kfree() directly in pci_aer_exit()?

>  	return pci_cleanup_aer_error_status_regs(dev);
>  }
>
> +void pci_aer_exit(struct pci_dev *dev)
> +{
> +	pci_aer_stats_exit(dev);
> +}
> +
>  #define AER_AGENT_RECEIVER		0
>  #define AER_AGENT_REQUESTER		1
>  #define AER_AGENT_COMPLETER		2
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ac876e32de4b..48edd0c9e4bc 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2064,6 +2064,7 @@ static void pci_configure_device(struct pci_dev *dev)
>
>  static void pci_release_capabilities(struct pci_dev *dev)
>  {
> +	pci_aer_exit(dev);
>  	pci_vpd_release(dev);
>  	pci_iov_release(dev);
>  	pci_free_cap_save_buffers(dev);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 340029b2fb38..8d59c6c19a19 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -299,6 +299,7 @@ struct pci_dev {
>  	u8		hdr_type;	/* PCI header type (`multi' flag masked out) */
>  #ifdef CONFIG_PCIEAER
>  	u16		aer_cap;	/* AER capability offset */
> +	struct aer_stats *aer_stats;	/* AER stats for this device */
>  #endif
>  	u8		pcie_cap;	/* PCIe capability offset */
>  	u8		msi_cap;	/* MSI capability offset */
> @@ -1471,10 +1472,12 @@ static inline bool pcie_aspm_support_enabled(void) { return false; }
>  void pci_no_aer(void);
>  bool pci_aer_available(void);
>  int pci_aer_init(struct pci_dev *dev);
> +void pci_aer_exit(struct pci_dev *dev);

With the exception of pci_aer_available(), these are only used inside
drivers/pci. This might be a good opportunity to move those private
things to drivers/pci/pci.h (in a separate patch, of course).

>  #else
>  static inline void pci_no_aer(void) { }
>  static inline bool pci_aer_available(void) { return false; }
>  static inline int pci_aer_init(struct pci_dev *d) { return -ENODEV; }
> +static inline void pci_aer_exit(struct pci_dev *d) { }
>  #endif
>
>  #ifdef CONFIG_PCIE_ECRC
> --
> 2.18.0.rc1.244.gcf134e6275-goog
>