Received: by 2002:a05:6a10:6744:0:0:0:0 with SMTP id w4csp2296138pxu; Fri, 9 Oct 2020 12:57:45 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx4ab2c8MDL5bxVUls+p1a1Zb5QRTRJ85/61UVqzhN7tGZF3KBrKMEi7Tmq3mVMatPuP/Mg X-Received: by 2002:a50:9b14:: with SMTP id o20mr892618edi.328.1602273465741; Fri, 09 Oct 2020 12:57:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1602273465; cv=none; d=google.com; s=arc-20160816; b=TtQcwTgejEcGG6d0wyUvaiTuODjWsjXpKoVQdy8Vlf71mAdbc4EZy9Jgi9LCUQ17ox Kzpp6q9fPBx8wET6MYp5gjOaVOjcD/RZFKMo4ZPKc5B1qbo80Dok+X3JXHr+hGF9aPYM bFfGqZCQNhzwCKRbaBdiltn2+YTNvAw6mr0IJxURGMSIL9SMpuD1oKIRmt71VSTDOf7I nc524zvXoBYHbwa9514fDLvpvwEtpPJ1GwzA+IQJD2mP2VMOObdM33LW44g0OX9q69yh XqOI70vKs5UE1gFwHG750gnZ0HXD6xjWTOpJLFGjwksIH0Hvhqdvd0tD5HN5Wx9R/Eqh nZjA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=tXXPpZHJQ0wRrGaibo/iiYcPbewRYAUfJ4x3oxucUqk=; b=Hau2U6fijKQ853N9G4nfK78Y/nodWXlkI5pbdkUEY7l+eDDb5DbJaF2Y7qgaNyBdU2 WKnHk2qZB/q4fip0TxC62w2evvQUUQ/gxN2TkgHJsF79jyf2yaeKe+GlTP1B7JhPcww6 Z5zqBnsetrcyYGAzGIRG0qrTf+0roo/gsfkMDVrpQpribqcVWTraTaF1rfFJtWBA5m/P gs3sr/dZoc2nVD9ML8jR+dq9ofCE1ACKi5vlRH/L2W7zQl5TAd/OMXxti6Oire5moJp9 OZMl+tj225yursToYuLr+5qEzm5GGO/zqwVphtddYWxygs4x6Tt0Tgj+A/891EYG2LaC 0X4w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel-com.20150623.gappssmtp.com header.s=20150623 header.b=UiAbnnn5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id z15si6701540edq.289.2020.10.09.12.57.23; Fri, 09 Oct 2020 12:57:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@intel-com.20150623.gappssmtp.com header.s=20150623 header.b=UiAbnnn5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390403AbgJIS0V (ORCPT + 99 others); Fri, 9 Oct 2020 14:26:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33846 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388114AbgJIS0V (ORCPT ); Fri, 9 Oct 2020 14:26:21 -0400 Received: from mail-pl1-x641.google.com (mail-pl1-x641.google.com [IPv6:2607:f8b0:4864:20::641]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 11A87C0613D5 for ; Fri, 9 Oct 2020 11:26:21 -0700 (PDT) Received: by mail-pl1-x641.google.com with SMTP id y20so4838323pll.12 for ; Fri, 09 Oct 2020 11:26:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=tXXPpZHJQ0wRrGaibo/iiYcPbewRYAUfJ4x3oxucUqk=; b=UiAbnnn53s5H1As902luwcCycpN3MIYHCfm2tMeGf8dOel4RN/O31cUhpTA77+ogQF Mq8Uh8CbLEzzUK9dslW6QmyBkw0AVfD0n4nAEree1Qk8Syy3kjNC2rp4ULYfMe+b6oka xwURuLpSZrrrMea0cGuXPQT62BiJ1lOVXJcGMIt5911GKxsqIZCeK7Qy/Pwj1Sy+NoYL sWqnDBfVcTm9VKP0MO4qu5lEMMW2QxixCV7EGr31+aqNBG/nPyHk7FQY64JUYpnnks1L cUeR8YPgKFnuYIM4Nb/roHpvQwjXGvJJ+b7zKrNpiJPkxlSjSXHl2g+Y2rDvwEDpLIHS qz2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=tXXPpZHJQ0wRrGaibo/iiYcPbewRYAUfJ4x3oxucUqk=; b=fDC+lmsD3fzwxZuh8i1oJdicuE1bHuAlddKj1b8OVl6hYp7nZZDiCToiqVEE0/gWlq YQdXB0gCgJMP/0WBLQxE0LwuMMpbeFTnYm6t8Sy3sat/ksUa8+XTwY2Wn60kDkPBYxjJ W6g9wRWvB0MimQmb50Bmv29RwVZvn3srnb/ZnZbMgZGejQxCiDu2e6spn8tAl2vzXBEC 8zsv/2v9UrgYXrGUDildRKa4s7MLHR1D8F0vowWCXkqLcJFUp8B0sDyLMMqZf9NRQZOj izfzARSv/6xZ9/Ua0dyp+lfDXMKRiC2l/3uLAW39FCwJi69gQLpWoW/T5TDwmf6MGTuG HT+A== X-Gm-Message-State: AOAM530IsLkbbkxOnkEoYKMBEHC/Q1NxlB7ORo8rYq0w3jv4UDn9vYgK 70ZhjjGfVY14daN+nvOvkpz84Q== X-Received: by 2002:a17:902:b718:b029:d4:ccbc:7f2b with SMTP id d24-20020a170902b718b02900d4ccbc7f2bmr11041pls.10.1602267980378; Fri, 09 Oct 2020 11:26:20 -0700 (PDT) Received: from [10.213.166.37] (fmdmzpr03-ext.fm.intel.com. [192.55.54.38]) by smtp.gmail.com with ESMTPSA id e16sm12346661pfh.45.2020.10.09.11.26.18 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Fri, 09 Oct 2020 11:26:19 -0700 (PDT) From: "Sean V Kelley" To: "Bjorn Helgaas" Cc: "Sean V Kelley" , bhelgaas@google.com, Jonathan.Cameron@huawei.com, rafael.j.wysocki@intel.com, ashok.raj@intel.com, tony.luck@intel.com, sathyanarayanan.kuppuswamy@intel.com, qiuxu.zhuo@intel.com, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v8 11/14] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR Date: Fri, 09 Oct 2020 11:26:14 -0700 X-Mailer: MailMate (1.13.2r5673) Message-ID: <41B2B0A9-80B0-41DA-9FF3-A5D9465E6E08@intel.com> In-Reply-To: <20201009175745.GA3489710@bjorn-Precision-5520> References: <20201009175745.GA3489710@bjorn-Precision-5520> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Bjorn, On 9 Oct 2020, at 10:57, Bjorn Helgaas wrote: > On Fri, Oct 02, 2020 at 11:47:32AM -0700, Sean V Kelley wrote: >> From: Qiuxu Zhuo >> >> When attempting error recovery for an RCiEP associated with an RCEC >> device, >> there needs to be a way to update the Root Error Status, the >> Uncorrectable >> Error Status and the Uncorrectable Error Severity of the parent RCEC. >> In some non-native cases in which there is no OS visible device >> associated with the RCiEP, there is nothing to act upon as the >> firmware >> is acting before the OS. So add handling for the linked 'rcec' in >> AER/ERR >> while taking into account non-native cases. >> >> Co-developed-by: Sean V Kelley >> Signed-off-by: Sean V Kelley >> Signed-off-by: Qiuxu Zhuo >> Reviewed-by: Jonathan Cameron >> --- >> drivers/pci/pcie/aer.c | 9 +++++---- >> drivers/pci/pcie/err.c | 39 ++++++++++++++++++++++++++++----------- >> 2 files changed, 33 insertions(+), 15 deletions(-) >> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c >> index 65dff5f3457a..dccdba60b5d9 100644 >> --- a/drivers/pci/pcie/aer.c >> +++ b/drivers/pci/pcie/aer.c >> @@ -1358,17 +1358,18 @@ static int aer_probe(struct pcie_device *dev) >> static pci_ers_result_t aer_root_reset(struct pci_dev *dev) >> { >> int aer = dev->aer_cap; >> + int rc = 0; >> u32 reg32; >> - int rc; >> - >> >> /* Disable Root's interrupt in response to error messages */ >> pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, ®32); >> reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; >> pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32); >> >> - rc = pci_bus_error_reset(dev); >> - pci_info(dev, "Root Port link has been reset\n"); >> + if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC) { >> + rc = pci_bus_error_reset(dev); >> + pci_info(dev, "Root Port link has been reset\n"); >> + } >> >> /* Clear Root Error Status */ >> pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, ®32); >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >> index 38abd7984996..956ad4c86d53 100644 >> --- a/drivers/pci/pcie/err.c >> +++ b/drivers/pci/pcie/err.c >> @@ -149,7 +149,8 @@ static int report_resume(struct pci_dev *dev, >> void *data) >> /** >> * pci_walk_bridge - walk bridges potentially AER affected >> * @bridge bridge which may be an RCEC with associated RCiEPs, >> - * an RCiEP associated with an RCEC, or a Port. >> + * or a Port. >> + * @dev an RCiEP lacking an associated RCEC. >> * @cb callback to be called for each device found >> * @userdata arbitrary pointer to be passed to callback. >> * >> @@ -160,13 +161,20 @@ static int report_resume(struct pci_dev *dev, >> void *data) >> * If the device provided has no subordinate bus, call the provided >> * callback on the device itself. >> */ >> -static void pci_walk_bridge(struct pci_dev *bridge, int (*cb)(struct >> pci_dev *, void *), >> +static void pci_walk_bridge(struct pci_dev *bridge, struct pci_dev >> *dev, >> + int (*cb)(struct pci_dev *, void *), >> void *userdata) >> { >> - if (bridge->subordinate) >> + /* >> + * In a non-native case where there is no OS-visible reporting >> + * device the bridge will be NULL, i.e., no RCEC, no PORT. >> + */ >> + if (bridge && bridge->subordinate) >> pci_walk_bus(bridge->subordinate, cb, userdata); >> - else >> + else if (bridge) >> cb(bridge, userdata); >> + else >> + cb(dev, userdata); >> } >> >> static pci_ers_result_t flr_on_rciep(struct pci_dev *dev) >> @@ -196,16 +204,25 @@ pci_ers_result_t pcie_do_recovery(struct >> pci_dev *dev, >> type = pci_pcie_type(dev); >> if (type == PCI_EXP_TYPE_ROOT_PORT || >> type == PCI_EXP_TYPE_DOWNSTREAM || >> - type == PCI_EXP_TYPE_RC_EC || >> - type == PCI_EXP_TYPE_RC_END) >> + type == PCI_EXP_TYPE_RC_EC) >> bridge = dev; >> + else if (type == PCI_EXP_TYPE_RC_END) >> + bridge = dev->rcec; >> else >> bridge = pci_upstream_bridge(dev); >> >> pci_dbg(dev, "broadcast error_detected message\n"); >> if (state == pci_channel_io_frozen) { >> - pci_walk_bridge(bridge, report_frozen_detected, &status); >> + pci_walk_bridge(bridge, dev, report_frozen_detected, &status); >> if (type == PCI_EXP_TYPE_RC_END) { >> + /* >> + * The callback only clears the Root Error Status >> + * of the RCEC (see aer.c). Only perform this for the >> + * native case, i.e., an RCEC is present. >> + */ >> + if (bridge) >> + reset_subordinate_devices(bridge); > > Help me understand this. There are lots of callbacks in this picture, > but I guess this "callback only clears Root Error Status" must refer > to aer_root_reset(), i.e., the reset_subordinate_devices pointer? The ‘bridge’ in this case will always be dev->rcec, the event collector for the associated RC_END. And that’s what’s being cleared in aer.c via aer_root_reset() as the callback. It’s also being checked for native/non-native here. > > Of course, the caller of pcie_do_recovery() supplied that pointer. > And we can infer that it must be aer_root_reset(), not > dpc_reset_link(), because RCECs and RCiEPs are not allowed to > implement DPC. Correct. > > I wish we didn't have either this assumption about what > reset_subordinate_devices points to, or the assumption about what > aer_root_reset() does. They both seem a little bit tenuous. Agree. It’s the relationship between the RC_END and the RC_EC. > > We already made aer_root_reset() smart enough to check for RCECs. Can > we put the FLR there, too? Then we wouldn't have this weird situation > where reset_subordinate_devices() does a reset and clears error > status, EXCEPT for this case where it only clears error status and we > do the reset here? We could add the smarts to aer_root_reset() to check for an RC_END in that callback and perform the clear there on its RC_EC. We just wouldn’t map ‘bridge’ to dev->rcec for RC_END in pcie_do_recovery() which would simplify things. Further, the FLR in the case of flr_on_rciep() below is specific to the RCiEP itself. So it could be performed either in aer_root_reset() or remain in the pcie_do_recovery(). That should work. Sean > >> status = flr_on_rciep(dev); >> if (status != PCI_ERS_RESULT_RECOVERED) { >> pci_warn(dev, "function level reset failed\n"); >> @@ -219,13 +236,13 @@ pci_ers_result_t pcie_do_recovery(struct >> pci_dev *dev, >> } >> } >> } else { >> - pci_walk_bridge(bridge, report_normal_detected, &status); >> + pci_walk_bridge(bridge, dev, report_normal_detected, &status); >> } >> >> if (status == PCI_ERS_RESULT_CAN_RECOVER) { >> status = PCI_ERS_RESULT_RECOVERED; >> pci_dbg(dev, "broadcast mmio_enabled message\n"); >> - pci_walk_bridge(bridge, report_mmio_enabled, &status); >> + pci_walk_bridge(bridge, dev, report_mmio_enabled, &status); >> } >> >> if (status == PCI_ERS_RESULT_NEED_RESET) { >> @@ -236,14 +253,14 @@ pci_ers_result_t pcie_do_recovery(struct >> pci_dev *dev, >> */ >> status = PCI_ERS_RESULT_RECOVERED; >> pci_dbg(dev, "broadcast slot_reset message\n"); >> - pci_walk_bridge(bridge, report_slot_reset, &status); >> + pci_walk_bridge(bridge, dev, report_slot_reset, &status); >> } >> >> if (status != PCI_ERS_RESULT_RECOVERED) >> goto failed; >> >> pci_dbg(dev, "broadcast resume message\n"); >> - pci_walk_bridge(bridge, report_resume, &status); >> + pci_walk_bridge(bridge, dev, report_resume, &status); >> >> if (type == PCI_EXP_TYPE_ROOT_PORT || >> type == PCI_EXP_TYPE_DOWNSTREAM || >> -- >> 2.28.0 >>