Subject: Re: [PATCH v1 1/1] PCI/ERR: Handle fatal error recovery for non-hotplug capable devices
To: Jay Vosburgh
References: <18609.1588812972@famine> <9908.1589311230@famine>
CC: liudongdong 00290354, Linuxarm
From: Yicong Yang
Message-ID: <1216d38b-bc0a-b4d5-967f-f5a86d96287c@hisilicon.com>
Date: Wed, 13 May 2020 09:50:06 +0800
In-Reply-To: <9908.1589311230@famine>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2020/5/13 3:20, Jay Vosburgh wrote:
> sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>
>> From: 
Kuppuswamy Sathyanarayanan
>>
>> If there are non-hotplug capable devices connected to a given
>> port, then during the fatal error recovery(triggered by DPC or
>> AER), after calling reset_link() function, we cannot rely on
>> hotplug handler to detach and re-enumerate the device drivers
>> in the affected bus. Instead, we will have to let the error
>> recovery handler call report_slot_reset() for all devices in
>> the bus to notify about the reset operation. Although this is
>> only required for non hot-plug capable devices, doing it for
>> hotplug capable devices should not affect the functionality.
>
> 	Yicong,
>
> 	Does the patch below also resolve the issue for you, as with
> your changed version of my original patch?

Yes. It works.

>
> 	-J
>
>> Along with above issue, this fix also applicable to following
>> issue.
>>
>> Commit 6d2c89441571 ("PCI/ERR: Update error status after
>> reset_link()") added support to store status of reset_link()
>> call. Although this fixed the error recovery issue observed if
>> the initial value of error status is PCI_ERS_RESULT_DISCONNECT
>> or PCI_ERS_RESULT_NO_AER_DRIVER, it also discarded the status
>> result from report_frozen_detected. This can cause a failure to
>> recover if _NEED_RESET is returned by report_frozen_detected and
>> report_slot_reset is not invoked.
>>
>> Such an event can be induced for testing purposes by reducing the
>> Max_Payload_Size of a PCIe bridge to less than that of a device
>> downstream from the bridge, and then initiating I/O through the
>> device, resulting in oversize transactions. In the presence of DPC,
>> this results in a containment event and attempted reset and recovery
>> via pcie_do_recovery. After 6d2c89441571 report_slot_reset is not
>> invoked, and the device does not recover.
>>
>> [original patch is from jay.vosburgh@canonical.com]
>> [original patch link https://lore.kernel.org/linux-pci/18609.1588812972@famine/]
>> Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
>> Signed-off-by: Jay Vosburgh
>> Signed-off-by: Kuppuswamy Sathyanarayanan
>> ---
>>  drivers/pci/pcie/err.c | 19 +++++++++++++++----
>>  1 file changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index 14bb8f54723e..db80e1ecb2dc 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -165,13 +165,24 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>  	pci_dbg(dev, "broadcast error_detected message\n");
>>  	if (state == pci_channel_io_frozen) {
>>  		pci_walk_bus(bus, report_frozen_detected, &status);
>> -		status = reset_link(dev);
>> -		if (status != PCI_ERS_RESULT_RECOVERED) {
>> +		status = PCI_ERS_RESULT_NEED_RESET;
>> +	} else {
>> +		pci_walk_bus(bus, report_normal_detected, &status);
>> +	}
>> +
>> +	if (status == PCI_ERS_RESULT_NEED_RESET) {
>> +		if (reset_link) {
>> +			if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED)
>> +				status = PCI_ERS_RESULT_DISCONNECT;
>> +		} else {
>> +			if (pci_bus_error_reset(dev))
>> +				status = PCI_ERS_RESULT_DISCONNECT;
>> +		}
>> +
>> +		if (status == PCI_ERS_RESULT_DISCONNECT) {
>>  			pci_warn(dev, "link reset failed\n");
>>  			goto failed;
>>  		}
>> -	} else {
>> -		pci_walk_bus(bus, report_normal_detected, &status);
>>  	}
>>
>>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>> --
>> 2.17.1
>>
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com
> .
>
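
For readers following the thread, the control flow of the quoted hunk can be modelled in userspace roughly as below. This is only a sketch under stated assumptions, not kernel code: every model_* name is hypothetical, the enum stands in for the pci_ers_result_t values named in the patch, bus_reset_rc stands in for the return value of pci_bus_error_reset(), and the slot_reset_reported flag assumes that the later stage of pcie_do_recovery() (outside the hunk) calls report_slot_reset() whenever the status is still NEED_RESET, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the pci_ers_result_t values named in the patch. */
enum model_result {
	MODEL_CAN_RECOVER,
	MODEL_NEED_RESET,
	MODEL_DISCONNECT,
	MODEL_RECOVERED,
};

/* Hypothetical stand-in for the optional reset_link() callback. */
typedef enum model_result (*model_reset_fn)(void);

struct model_outcome {
	enum model_result status;
	bool slot_reset_reported;	/* assumed later report_slot_reset() stage */
};

static enum model_result model_reset_ok(void)   { return MODEL_RECOVERED; }
static enum model_result model_reset_fail(void) { return MODEL_DISCONNECT; }

/*
 * Mirrors the patched hunk: a frozen channel always requests a reset,
 * a failed reset turns into DISCONNECT, and a successful one leaves
 * NEED_RESET in place so the slot-reset broadcast still happens.
 */
static struct model_outcome model_recover(bool frozen,
					  enum model_result detected,
					  model_reset_fn reset_link,
					  int bus_reset_rc)
{
	struct model_outcome out = { .status = detected,
				     .slot_reset_reported = false };

	if (frozen)
		out.status = MODEL_NEED_RESET;

	if (out.status == MODEL_NEED_RESET) {
		if (reset_link) {
			if (reset_link() != MODEL_RECOVERED)
				out.status = MODEL_DISCONNECT;
		} else if (bus_reset_rc) {
			out.status = MODEL_DISCONNECT;
		}

		if (out.status == MODEL_DISCONNECT)
			return out;	/* the "link reset failed" path */

		out.slot_reset_reported = true;
	}

	return out;
}

/* Small self-check covering the cases discussed in the thread. */
static void model_self_check(void)
{
	struct model_outcome o;

	/* Frozen channel, reset succeeds: slot reset is still reported. */
	o = model_recover(true, MODEL_CAN_RECOVER, model_reset_ok, 0);
	assert(o.status == MODEL_NEED_RESET && o.slot_reset_reported);

	/* Frozen channel, reset fails: recovery stops with DISCONNECT. */
	o = model_recover(true, MODEL_NEED_RESET, model_reset_fail, 0);
	assert(o.status == MODEL_DISCONNECT && !o.slot_reset_reported);

	/* No reset_link callback: fall back to the bus reset result. */
	o = model_recover(false, MODEL_NEED_RESET, NULL, 0);
	assert(o.status == MODEL_NEED_RESET && o.slot_reset_reported);

	/* Non-frozen channel that can recover: no reset is attempted. */
	o = model_recover(false, MODEL_CAN_RECOVER, model_reset_fail, 1);
	assert(o.status == MODEL_CAN_RECOVER && !o.slot_reset_reported);
}
```

Calling model_self_check() from a test program exercises the four paths. The point of the sketch is the first case: after a successful reset the status stays NEED_RESET rather than being overwritten, which is what would let report_slot_reset run for non-hotplug capable devices.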