Date: Mon, 16 Apr 2018 11:03:25 +0530
From: poza@codeaurora.org
To: Bjorn Helgaas
Cc: Sinan Kaya, Keith Busch, Bjorn Helgaas, Philippe Ombredanne,
 Thomas Gleixner, Greg Kroah-Hartman, Kate Stewart,
 linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
 Dongdong Liu, Wei Zhang, Timur Tabi, Alex Williamson
Subject: Re: [PATCH v13 6/6] PCI/DPC: Do not do recovery for hotplug enabled system
In-Reply-To: <20180416031726.GB158153@bhelgaas-glaptop.roam.corp.google.com>
References: <20180410210349.GG54986@bhelgaas-glaptop.roam.corp.google.com>
 <13efe2e8-74c8-acb4-ec58-f79b14a1f182@codeaurora.org>
 <20180412140648.GD145698@bhelgaas-glaptop.roam.corp.google.com>
 <20180412143954.GB4810@localhost.localdomain>
 <20180412150231.GD4810@localhost.localdomain>
 <20180412170911.GA6424@localhost.localdomain>
 <20180416031726.GB158153@bhelgaas-glaptop.roam.corp.google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2018-04-16 08:47, Bjorn Helgaas wrote:
> On Sat, Apr 14, 2018 at 11:53:17AM -0400, Sinan Kaya wrote:
>
>> You indicated that you want to unify the AER and DPC behavior. Let's
>> settle on what we want to do one more time. We have been going forth
>> and back on the direction.
>
> My thinking is that as much as possible, similar events should be
> handled similarly, whether the mechanism is AER, DPC, EEH, etc.
> Ideally, drivers shouldn't have to be aware of which mechanism is in
> use.
>
> Error recovery includes conventional PCI as well, but right now I
> think we're only concerned with PCIe. The following error types are
> from PCIe r4.0, sec 6.2.2:
>
>   ERR_COR
>     Corrected by hardware with no software intervention. Software
>     involved for logging only.
>
>     Handled by AER via pci_error_handlers; DPC is never involved.
>
>     Link is unaffected.
>
>   ERR_NONFATAL
>     A transaction is unreliable but the link is fully functional.
>
>     If DPC is not supported, handled by AER via pci_error_handlers and
>     the link is unaffected.
>
>     If DPC supported, handled by DPC (because we set
>     PCI_EXP_DPC_CTL_EN_NONFATAL) via remove/re-enumerate.
>
>   ERR_FATAL
>     The link is unreliable.
>
>     If DPC is not supported, handled by AER via pci_error_handlers and
>     the link is reset.
>
>     If DPC supported, handled by DPC via remove/re-enumerate.
>
> It doesn't seem right to me that we handle both ERR_NONFATAL and
> ERR_FATAL events differently if we happen to have DPC support in a
> switch.
>
> Maybe we should consider triggering DPC only on ERR_FATAL? That would
> keep DPC out of the ERR_NONFATAL cases.
>
> For ERR_FATAL, maybe we should bite the bullet and use
> remove/re-enumerate for AER as well as for DPC. That would be painful
> for higher-level software, but if we're willing to accept that pain
> for new systems that support DPC, maybe life would be better overall
> if it worked the same way on systems without DPC?
>
> Bjorn

This had crossed my mind when I first looked at the code.
DPC is triggered for both the ERR_NONFATAL and ERR_FATAL cases.
I thought the primary purpose of DPC was to recover from fatal errors
by triggering HW recovery. But what if some platform wants to handle
both FATAL and NON_FATAL with DPC?

As you said, AER FATAL cases and DPC FATAL cases should be handled
similarly, e.g. by remove/re-enumerate of the devices, while in the
NON_FATAL case only AER would come into the picture.
If some platform would like to handle NON_FATAL with DPC, then it
should follow the AER NON_FATAL path (which does not do
remove/re-enumerate).

In the case where hotplug is enabled, remove/re-enumerate makes more
sense for ERR_FATAL. In the case where hotplug is disabled, only
re-enumeration is required (no need to remove the devices). But then,
do we need to handle this case specifically? What is the harm in
removing the devices in all cases, followed by re-enumeration?

Regards,
Oza.
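
To make the comparison concrete, the behaviour described in the quoted table and the unified behaviour under discussion can be sketched as two small decision functions. This is a stand-alone userspace illustration, not kernel code: the enum names, `current_policy()` and `proposed_policy()` are hypothetical and exist only for this sketch, and the "proposed" mapping is just one reading of the thread (DPC-style recovery only for ERR_FATAL, AER handlers for ERR_NONFATAL), not a settled design.

```c
#include <stdbool.h>

/* Error severities from PCIe r4.0, sec 6.2.2 (names as used in the thread). */
enum err_severity { ERR_COR, ERR_NONFATAL, ERR_FATAL };

/* Hypothetical labels for the recovery paths mentioned in the discussion. */
enum recovery_action {
	ACT_AER_HANDLERS,            /* pci_error_handlers, link unaffected */
	ACT_AER_HANDLERS_LINK_RESET, /* pci_error_handlers plus link reset */
	ACT_REMOVE_REENUMERATE,      /* remove devices, then re-enumerate */
};

/* Behaviour as described in the quoted table: the result depends on DPC. */
static inline enum recovery_action
current_policy(enum err_severity sev, bool dpc_supported)
{
	switch (sev) {
	case ERR_COR:
		return ACT_AER_HANDLERS; /* DPC is never involved */
	case ERR_NONFATAL:
		return dpc_supported ? ACT_REMOVE_REENUMERATE
				     : ACT_AER_HANDLERS;
	case ERR_FATAL:
	default:
		return dpc_supported ? ACT_REMOVE_REENUMERATE
				     : ACT_AER_HANDLERS_LINK_RESET;
	}
}

/*
 * One reading of the unified behaviour under discussion (an assumption of
 * this sketch): the result no longer depends on whether DPC is present.
 */
static inline enum recovery_action
proposed_policy(enum err_severity sev)
{
	if (sev == ERR_FATAL)
		return ACT_REMOVE_REENUMERATE;
	return ACT_AER_HANDLERS;
}
```

With this sketch, `current_policy(ERR_NONFATAL, true)` and `current_policy(ERR_NONFATAL, false)` give different actions, which is the inconsistency being discussed, whereas `proposed_policy()` takes no DPC argument at all.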