Date: Fri, 9 Oct 2020 05:05:54 +0100
From: Hedi Berriche
To: "Raj, Ashok"
Cc: Kuppuswamy Sathyanarayanan, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Russ Anderson, Bjorn Helgaas, Joerg Roedel, stable@kernel.org
Subject: Re: [PATCH v1 1/1] PCI/ERR: don't clobber status after reset_link()
Message-ID: <20201009040554.GB2365427@sarge.linuxathome.me>
References: <20201009025251.2360659-1-hedi.berriche@hpe.com> <20201009034614.GB60852@otc-nc-03>
In-Reply-To: <20201009034614.GB60852@otc-nc-03>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Oct 09, 2020 at 04:46 Raj, Ashok wrote:

Hi Ashok,

Thanks for looking into this.

>On Fri, Oct 09, 2020 at 03:52:51AM +0100, Hedi Berriche wrote:
>> Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
>> changed pcie_do_recovery() so that status is updated with the return
>> value from reset_link(); this was to fix the problem where we would
>> wrongly report recovery failure, despite a successful reset_link(),
>> whenever the initial error status is PCI_ERS_RESULT_DISCONNECT or
>> PCI_ERS_RESULT_NO_AER_DRIVER.
>>
>> Unfortunately this breaks the flow of pcie_do_recovery() as it prevents
>
>What is the reference to "this breaks" above?

The code change introduced by commit 6d2c89441571; would "this code change
breaks" instead of "this breaks" work better?

If not, I can also rephrase the whole paragraph along the following lines:

    Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
    breaks the flow of pcie_do_recovery() as it prevents the actions needed
    when the initial error is PCI_ERS_RESULT_CAN_RECOVER or
    PCI_ERS_RESULT_NEED_RESET from taking place which causes error recovery
    to fail.

... and do away with the first paragraph.

>> the actions needed when the initial error is PCI_ERS_RESULT_CAN_RECOVER
>> or PCI_ERS_RESULT_NEED_RESET from taking place which causes error
>> recovery to fail.
>>
>> Don't clobber status after reset_link() to restore the intended flow in
>> pcie_do_recovery().
>>
>> Fix the original problem by saving the return value from reset_link()
>> and use it later on to decide whether error recovery should be deemed
>> successful in the scenarios where the initial error status is
>> PCI_ERS_RESULT_{DISCONNECT,NO_AER_DRIVER}.
>
>I would rather rephrase the above to make it clear what is being proposed.
>Since the description seems to talk about the old problem and new solution
>all mixed up.

OK; will do that to clarify that what's being proposed here is:

  1. fix the regression introduced by commit 6d2c89441571
  2. address the problem that commit 6d2c89441571 aimed to fix

>> Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
>> Signed-off-by: Hedi Berriche
>> Cc: Russ Anderson
>> Cc: Kuppuswamy Sathyanarayanan
>> Cc: Bjorn Helgaas
>> Cc: Ashok Raj
>> Cc: Keith Busch
>> Cc: Joerg Roedel
>>
>> Cc: stable@kernel.org # v5.7+
>> ---
>>  drivers/pci/pcie/err.c | 13 ++++++++++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index c543f419d8f9..dbd0b56bd6c1 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -150,7 +150,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>                          pci_channel_state_t state,
>>                          pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
>>  {
>> -        pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
>> +        pci_ers_result_t post_reset_status, status = PCI_ERS_RESULT_CAN_RECOVER;
>
>why call it post_reset_status?

Perhaps post_reset_status is not a great choice; would reset_result or
reset_link_result be better?

Cheers,
Hedi.

>
>>          struct pci_bus *bus;
>>
>>          /*
>> @@ -165,8 +165,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>          pci_dbg(dev, "broadcast error_detected message\n");
>>          if (state == pci_channel_io_frozen) {
>>                  pci_walk_bus(bus, report_frozen_detected, &status);
>> -                status = reset_link(dev);
>> -                if (status != PCI_ERS_RESULT_RECOVERED) {
>> +                post_reset_status = reset_link(dev);
>> +                if (post_reset_status != PCI_ERS_RESULT_RECOVERED) {
>>                          pci_warn(dev, "link reset failed\n");
>>                          goto failed;
>>                  }
>> @@ -174,6 +174,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>                  pci_walk_bus(bus, report_normal_detected, &status);
>>          }
>>
>> +        if ((status == PCI_ERS_RESULT_DISCONNECT ||
>> +             status == PCI_ERS_RESULT_NO_AER_DRIVER) &&
>> +             post_reset_status == PCI_ERS_RESULT_RECOVERED) {
>> +                /* error recovery succeeded thanks to reset_link() */
>> +                status = PCI_ERS_RESULT_RECOVERED;
>> +        }
>> +
>>          if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>                  status = PCI_ERS_RESULT_RECOVERED;
>>                  pci_dbg(dev, "broadcast mmio_enabled message\n");
>> --
>> 2.28.0
>>

-- 
Be careful of reading health books, you might die of a misprint.
        -- Mark Twain
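
For illustration only, here is a minimal user-space sketch of the status
handling the patch is aiming for: keep the status gathered from the
error_detected broadcast, store the reset_link() result separately, and only
promote DISCONNECT/NO_AER_DRIVER to RECOVERED when the link reset succeeded.
This is not the kernel code. The enum ers_result type, status_after_detect(),
reset_ok() and reset_fail() are hypothetical names made up for this sketch,
reset_result is one of the names floated above, and initialising it up front
is a choice of the sketch rather than something taken from the v1 patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's pci_ers_result_t values;
 * the numeric values are illustrative only. */
enum ers_result {
    ERS_NONE,
    ERS_CAN_RECOVER,
    ERS_NEED_RESET,
    ERS_DISCONNECT,
    ERS_RECOVERED,
    ERS_NO_AER_DRIVER
};

/* Hypothetical reset_link() callbacks: one succeeds, one fails. */
enum ers_result reset_ok(void)   { return ERS_RECOVERED; }
enum ers_result reset_fail(void) { return ERS_DISCONNECT; }

/*
 * Models only the first part of the recovery flow.
 *   detected   - merged result of the error_detected broadcast
 *   frozen     - nonzero when the channel state is "frozen"
 *   reset_link - link reset callback
 *   status     - out: status carried into the later recovery steps
 * Returns 0 when recovery should continue, -1 for the "goto failed" path.
 */
int status_after_detect(enum ers_result detected, int frozen,
                        enum ers_result (*reset_link)(void),
                        enum ers_result *status)
{
    /* Initialised so the non-frozen path never reads an indeterminate
     * value (an assumption of this sketch). */
    enum ers_result reset_result = ERS_NONE;

    assert(reset_link != NULL && status != NULL);
    *status = detected;

    if (frozen) {
        /* Keep the reset result apart from *status: do not clobber it. */
        reset_result = reset_link();
        if (reset_result != ERS_RECOVERED)
            return -1;  /* link reset failed */
    }

    /* Original problem: a successful reset means recovery succeeded even
     * though the drivers reported DISCONNECT or NO_AER_DRIVER. */
    if ((*status == ERS_DISCONNECT || *status == ERS_NO_AER_DRIVER) &&
        reset_result == ERS_RECOVERED)
        *status = ERS_RECOVERED;

    return 0;
}
```

With this model, a frozen channel whose drivers reported NEED_RESET still
carries NEED_RESET into the later steps after a successful reset, whereas
DISCONNECT and NO_AER_DRIVER are turned into RECOVERED.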