From: Oza Pawandeep
To: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
 Greg Kroah-Hartman, Kate Stewart, linux-pci@vger.kernel.org,
 linux-kernel@vger.kernel.org, Dongdong Liu, Keith Busch, Wei Zhang,
 Sinan Kaya, Timur Tabi
Cc: Oza Pawandeep
Subject: [PATCH v16 3/9] PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices
Date: Fri, 11 May 2018 06:43:22 -0400
Message-Id: <1526035408-31328-4-git-send-email-poza@codeaurora.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1526035408-31328-1-git-send-email-poza@codeaurora.org>
References: <1526035408-31328-1-git-send-email-poza@codeaurora.org>
Sender:
 linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

This patch changes how ERR_FATAL is handled: the devices beneath the AER
agent are removed, the link is reset, and the devices are then
re-enumerated.

Errors are now handled as follows:

ERR_NONFATAL => call driver recovery entry points
ERR_FATAL    => remove and re-enumerate

Please refer to Documentation/PCI/pci-error-recovery.txt for more details.

Signed-off-by: Oza Pawandeep
Reviewed-by: Keith Busch

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 0ea5acc..649dd1f 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include "aerdrv.h"
+#include "../../pci.h"
 
 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
 	PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
@@ -475,35 +476,84 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
 }
 
 /**
- * do_recovery - handle nonfatal/fatal error recovery process
+ * do_fatal_recovery - handle fatal error recovery process
+ * @dev: pointer to a pci_dev data structure of agent detecting an error
+ *
+ * Invoked when an error is fatal. Once being invoked, removes the devices
+ * beneath this AER agent, followed by reset link e.g. secondary bus reset
+ * followed by re-enumeration of devices.
+ */
+
+static void do_fatal_recovery(struct pci_dev *dev)
+{
+	struct pci_dev *udev;
+	struct pci_bus *parent;
+	struct pci_dev *pdev, *temp;
+	pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED;
+	struct aer_broadcast_data result_data;
+
+	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+		udev = dev;
+	else
+		udev = dev->bus->self;
+
+	parent = udev->subordinate;
+	pci_lock_rescan_remove();
+	list_for_each_entry_safe_reverse(pdev, temp, &parent->devices,
+					 bus_list) {
+		pci_dev_get(pdev);
+		pci_dev_set_disconnected(pdev, NULL);
+		if (pci_has_subordinate(pdev))
+			pci_walk_bus(pdev->subordinate,
+				     pci_dev_set_disconnected, NULL);
+		pci_stop_and_remove_bus_device(pdev);
+		pci_dev_put(pdev);
+	}
+
+	result = reset_link(udev);
+
+	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+		/*
+		 * If the error is reported by a bridge, we think this error
+		 * is related to the downstream link of the bridge, so we
+		 * do error recovery on all subordinates of the bridge instead
+		 * of the bridge and clear the error status of the bridge.
+		 */
+		pci_walk_bus(dev->subordinate, report_resume, &result_data);
+		pci_cleanup_aer_uncorrect_error_status(dev);
+	}
+
+	if (result == PCI_ERS_RESULT_RECOVERED) {
+		if (pcie_wait_for_link(udev, true))
+			pci_rescan_bus(udev->bus);
+	} else {
+		pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
+		pci_info(dev, "AER: Device recovery failed\n");
+	}
+
+	pci_unlock_rescan_remove();
+}
+
+/**
+ * do_nonfatal_recovery - handle nonfatal error recovery process
  * @dev: pointer to a pci_dev data structure of agent detecting an error
- * @severity: error severity type
  *
  * Invoked when an error is nonfatal/fatal. Once being invoked, broadcast
  * error detected message to all downstream drivers within a hierarchy in
  * question and return the returned code.
  */
-static void do_recovery(struct pci_dev *dev, int severity)
+static void do_nonfatal_recovery(struct pci_dev *dev)
 {
-	pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
+	pci_ers_result_t status;
 	enum pci_channel_state state;
 
-	if (severity == AER_FATAL)
-		state = pci_channel_io_frozen;
-	else
-		state = pci_channel_io_normal;
+	state = pci_channel_io_normal;
 
 	status = broadcast_error_message(dev,
 			state,
 			"error_detected",
 			report_error_detected);
 
-	if (severity == AER_FATAL) {
-		result = reset_link(dev);
-		if (result != PCI_ERS_RESULT_RECOVERED)
-			goto failed;
-	}
-
 	if (status == PCI_ERS_RESULT_CAN_RECOVER)
 		status = broadcast_error_message(dev,
 				state,
@@ -562,8 +612,10 @@ static void handle_error_source(struct pcie_device *aerdev,
 		if (pos)
 			pci_write_config_dword(dev, pos + PCI_ERR_COR_STATUS,
 					info->status);
-	} else
-		do_recovery(dev, info->severity);
+	} else if (info->severity == AER_NONFATAL)
+		do_nonfatal_recovery(dev);
+	else if (info->severity == AER_FATAL)
+		do_fatal_recovery(dev);
 }
 
 #ifdef CONFIG_ACPI_APEI_PCIEAER
@@ -627,8 +679,10 @@ static void aer_recover_work_func(struct work_struct *work)
 			continue;
 		}
 		cper_print_aer(pdev, entry.severity, entry.regs);
-		if (entry.severity != AER_CORRECTABLE)
-			do_recovery(pdev, entry.severity);
+		if (entry.severity == AER_NONFATAL)
+			do_nonfatal_recovery(pdev);
+		else if (entry.severity == AER_FATAL)
+			do_fatal_recovery(pdev);
 		pci_dev_put(pdev);
 	}
 }
-- 
2.7.4
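
For readers who want to reason about the new control flow outside the
kernel, the following is a minimal userspace sketch of the severity dispatch
and of the fatal path ordering (remove, reset link, re-enumerate). It is a
toy model under stated assumptions, not kernel code: every name prefixed
with model_ or MODEL_ is hypothetical, the bus is a fixed array of slots,
and the outcome of the link reset is passed in by the caller rather than
read from hardware.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SLOTS 4	/* hypothetical: number of slots below the port */

/* Hypothetical severity codes for this model; not the kernel's definitions. */
enum model_severity { MODEL_NONFATAL, MODEL_FATAL, MODEL_CORRECTABLE };

/* Hypothetical recovery paths, one per severity class. */
enum model_action {
	MODEL_CLEAR_STATUS,	/* correctable: only clear the status bits */
	MODEL_DRIVER_CALLBACKS,	/* nonfatal: call driver recovery entry points */
	MODEL_REMOVE_AND_RESCAN	/* fatal: remove, reset link, re-enumerate */
};

/* Mirrors the if/else-if chain in the patch: one path per severity. */
static enum model_action model_dispatch(enum model_severity sev)
{
	switch (sev) {
	case MODEL_NONFATAL:
		return MODEL_DRIVER_CALLBACKS;
	case MODEL_FATAL:
		return MODEL_REMOVE_AND_RESCAN;
	default:
		return MODEL_CLEAR_STATUS;
	}
}

/* Toy bus: which slots hold a device, and whether the link is up. */
struct model_bus {
	int present[MODEL_SLOTS];
	int link_up;
};

/*
 * Model of the fatal path. Removes every device below the port (last slot
 * first), resets the link, and re-enumerates only if the link came back.
 * Returns the number of devices found by the rescan, or -1 on failure.
 */
static int model_fatal_recovery(struct model_bus *bus, int reset_ok)
{
	size_t i;
	int found = 0;

	assert(bus != NULL);

	/* Step 1: remove devices in reverse order. */
	for (i = MODEL_SLOTS; i-- > 0; )
		bus->present[i] = 0;

	/* Step 2: reset the link; the caller decides the outcome here. */
	bus->link_up = reset_ok;
	if (!bus->link_up)
		return -1;	/* devices stay removed, as on a failed recovery */

	/* Step 3: re-enumerate; this model assumes every slot answers. */
	for (i = 0; i < MODEL_SLOTS; i++) {
		bus->present[i] = 1;
		found++;
	}
	return found;
}
```

Calling model_fatal_recovery() with reset_ok set to 0 leaves all slots
empty and returns -1; with reset_ok set to 1 the slots are repopulated.
Unlike the nonfatal path, no driver callback is involved in this model.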