Date: Mon, 16 Apr 2018 11:21:15 +0530
From: poza@codeaurora.org
To: Bjorn Helgaas
Cc: Sinan Kaya, Keith Busch, Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner, Greg Kroah-Hartman, Kate Stewart, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Dongdong Liu, Wei Zhang, Timur Tabi, Alex Williamson, linux-pci-owner@vger.kernel.org
Subject: Re: [PATCH v13 6/6] PCI/DPC: Do not do recovery for hotplug enabled system

On 2018-04-16 11:03, poza@codeaurora.org wrote:
> On 2018-04-16 08:47, Bjorn Helgaas wrote:
>> On Sat, Apr 14, 2018 at 11:53:17AM -0400, Sinan Kaya wrote:
>>
>>> You indicated that you want to unify the AER and DPC behavior. Let's
>>> settle on what we want to do one more time. We have been going forth
>>> and back on the direction.
>>
>> My thinking is that as much as possible, similar events should be
>> handled similarly, whether the mechanism is AER, DPC, EEH, etc.
>> Ideally, drivers shouldn't have to be aware of which mechanism is in
>> use.
>>
>> Error recovery includes conventional PCI as well, but right now I
>> think we're only concerned with PCIe. The following error types are
>> from PCIe r4.0, sec 6.2.2:
>>
>>   ERR_COR
>>     Corrected by hardware with no software intervention. Software
>>     involved for logging only.
>>
>>     Handled by AER via pci_error_handlers; DPC is never involved.
>>
>>     Link is unaffected.
>>
>>   ERR_NONFATAL
>>     A transaction is unreliable but the link is fully functional.
>>
>>     If DPC is not supported, handled by AER via pci_error_handlers and
>>     the link is unaffected.
>>
>>     If DPC supported, handled by DPC (because we set
>>     PCI_EXP_DPC_CTL_EN_NONFATAL) via remove/re-enumerate.
>>
>>   ERR_FATAL
>>     The link is unreliable.
>>
>>     If DPC is not supported, handled by AER via pci_error_handlers and
>>     the link is reset.
>>
>>     If DPC supported, handled by DPC via remove/re-enumerate.
>>
>> It doesn't seem right to me that we handle both ERR_NONFATAL and
>> ERR_FATAL events differently if we happen to have DPC support in a
>> switch.
>>
>> Maybe we should consider triggering DPC only on ERR_FATAL? That would
>> keep DPC out of the ERR_NONFATAL cases.
>>
>> For ERR_FATAL, maybe we should bite the bullet and use
>> remove/re-enumerate for AER as well as for DPC. That would be painful
>> for higher-level software, but if we're willing to accept that pain
>> for new systems that support DPC, maybe life would be better overall
>> if it worked the same way on systems without DPC?
>>
>> Bjorn
>
> This had crossed my mind when I first looked at the code.
> DPC is getting triggered for both the ERR_NONFATAL and ERR_FATAL cases.
> I thought the primary purpose of DPC is to recover from fatal errors by
> triggering HW recovery.
> But what if some platform wants to handle both FATAL and NON_FATAL with
> DPC?
>
> As you said, AER FATAL cases and DPC FATAL cases should be handled
> similarly,
> e.g. remove/re-enumerate the devices.
>
> In the NON_FATAL case, only AER would come into the picture.
> If some platform would like to handle NON_FATAL with DPC, then it should
> follow the AER NON_FATAL path (which does not do remove/re-enumerate).
>
> Where hotplug is enabled, remove/re-enumerate makes more sense
> in case of ERR_FATAL.
> Where hotplug is disabled, only re-enumeration is
> required (no need to remove the devices).
> But then do we need to handle this case specifically? What is the harm
> in removing the devices in all cases, followed by re-enumeration?

To clarify the last line: what I meant is that in case of ERR_FATAL we
can always remove/re-enumerate the devices, irrespective of whether
hotplug is enabled or not.
In case of ERR_NONFATAL, DPC will follow the AER path (where it just
tries to recover), although I am not sure how to handle the
ERR_NONFATAL case if hotplug is enabled, because, as Keith suggested,
the device might have been changed at run time.

>
> Regards,
> Oza.
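
To make the comparison concrete, the handling described in this thread can be written down as a small decision table. This is only a sketch for discussion, not kernel code: the enum names and the functions current_action() and proposed_action() are hypothetical and do not exist in the PCI core. The "current" table follows Bjorn's summary above; the "proposed" table follows the clarification in this mail, and leaves the ERR_NONFATAL-with-hotplug question open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not kernel identifiers. */
enum err_severity { SEV_COR, SEV_NONFATAL, SEV_FATAL };

enum recovery_action {
	ACT_AER_HANDLERS,            /* pci_error_handlers, link unaffected */
	ACT_AER_HANDLERS_LINK_RESET, /* pci_error_handlers, link is reset */
	ACT_REMOVE_REENUMERATE       /* remove devices, then re-enumerate */
};

/* Behavior as summarized by Bjorn: the outcome depends on DPC support. */
enum recovery_action current_action(enum err_severity sev, bool dpc_supported)
{
	switch (sev) {
	case SEV_COR:
		return ACT_AER_HANDLERS; /* DPC is never involved */
	case SEV_NONFATAL:
		return dpc_supported ? ACT_REMOVE_REENUMERATE
				     : ACT_AER_HANDLERS;
	case SEV_FATAL:
		return dpc_supported ? ACT_REMOVE_REENUMERATE
				     : ACT_AER_HANDLERS_LINK_RESET;
	}
	assert(0 && "unknown severity");
	return ACT_AER_HANDLERS;
}

/*
 * Behavior proposed in this mail: the outcome depends on severity only,
 * with or without DPC and with or without hotplug. How ERR_NONFATAL
 * should behave when hotplug is enabled is still an open question.
 */
enum recovery_action proposed_action(enum err_severity sev)
{
	switch (sev) {
	case SEV_COR:
	case SEV_NONFATAL:
		return ACT_AER_HANDLERS; /* follow the AER path */
	case SEV_FATAL:
		return ACT_REMOVE_REENUMERATE;
	}
	assert(0 && "unknown severity");
	return ACT_AER_HANDLERS;
}
```

With this model, current_action(SEV_NONFATAL, true) and current_action(SEV_NONFATAL, false) give different answers, which is the inconsistency Bjorn pointed out, while proposed_action() gives one answer per severity.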