Date: Mon, 12 Mar 2018 08:58:24 -0600
From: Keith Busch
To: poza@codeaurora.org
Cc: Sinan Kaya, Bjorn Helgaas, Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner, Greg Kroah-Hartman, Kate Stewart, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Dongdong Liu, Wei Zhang, Timur Tabi, linux-pci-owner@vger.kernel.org
Subject: Re: [PATCH v12 0/6] Address error and recovery for AER and DPC
Message-ID: <20180312145823.GC18494@localhost.localdomain>

On Mon, Mar 12, 2018 at 08:16:38PM +0530, poza@codeaurora.org wrote:
> On 2018-03-12 19:55, Keith Busch wrote:
> > On Sun, Mar 11, 2018 at 11:03:58PM -0400, Sinan Kaya wrote:
> > > On 3/11/2018 6:03 PM, Bjorn Helgaas wrote:
> > > > On Wed, Feb 28, 2018 at 10:34:11PM +0530, Oza Pawandeep wrote:
> > > >
> > > > That difference has been there since the beginning of DPC, so it has
> > > > nothing to do with *this* series EXCEPT for the fact that it really
> > > > complicates the logic you're adding to reset_link() and
> > > > broadcast_error_message().
> > > >
> > > > We ought to be able to simplify that somehow, because the only real
> > > > difference between AER and DPC should be that DPC automatically
> > > > disables the link and AER does it in software.
> > >
> > > I agree this should be possible. The code execution path should be
> > > almost identical to the fatal error case.
> > >
> > > Is there any reason why you went with the stop-driver path, Keith?
> >
> > The fact is the link is truly down during a DPC event. When the link
> > is enabled again, you don't know at that point if the device(s) on the
> > other side have changed. Calling a driver's error handler for the wrong
> > device in an unknown state may have undefined results. Enumerating the
> > slot from scratch should be safe, and will assign resources, tune bus
> > settings, and bind to the matching driver.
> >
> > Per spec, DPC is the recommended way of handling surprise removal
> > events, and the spec even recommends that DPC-capable slots *not* set
> > 'Surprise' in Slot Capabilities so that removals are always handled by
> > DPC. This service driver was developed with that use in mind.
>
> Now it begs the question: after a DPC trigger, should we
>
> enumerate the devices?
> or
> run the error handling callbacks, followed by stopping the devices,
> followed by enumeration?
> or
> run the error handling callbacks, followed by enumeration (without
> stopping the devices)?

I'm not sure I understand. The link is disabled while DPC is triggered,
so if anything, you'd want to un-enumerate everything below the
contained port (that's what it does today).

After releasing a slot from DPC, the link is allowed to retrain. If
there is a working device on the other side, a link-up event occurs.
That event is handled by the pciehp driver, which schedules enumeration
no matter what you do to the DPC driver.