Received: by 10.192.165.156 with SMTP id m28csp2006911imm; Thu, 12 Apr 2018 07:10:43 -0700 (PDT) X-Google-Smtp-Source: AIpwx4+Lau/pVXnjOsyHb8Vg8jSY/s7QjhPiJYT95SdgDevu0Uy9v4qPM9rGmYbYPi/4+JH2/fGj X-Received: by 2002:a17:902:6949:: with SMTP id k9-v6mr1239800plt.185.1523542243399; Thu, 12 Apr 2018 07:10:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1523542243; cv=none; d=google.com; s=arc-20160816; b=ckdEu15AjzVAEHEtLYNdDTwcvBkIIE5ZXMQobYKsGPDnW2+oJSP00ER2P35F5zSFPl OnLDqtemJ1psneji3DgqP0/aXPkqomJ2j+2vpEo321oaFcxHCpo41hxnovogDXBKfMCj /nC3D1+Eox0ZUyX5EAb256SIoI6bq1LMKQhOxasotnhf3h3CCPcCaa17y/2uB7BeT5/k kNJOXh7ZO7fYx3rlTSVj3p/1tyIAb7K0Z2ojIznPRrwlIqJKs+CWg/PKdth+8QxoLefl pVk6mY3QrLPCT7vwhGX+749eeQD5Y+5+Nb8kkT6M/jkGCPL13j3ptjGZ7lxUYUIaFKMN Oryw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dmarc-filter:arc-authentication-results; bh=eRS+4KeJ5Nn0ot+T+MWeDcQQ+UUwuSiBjoIxCZUUahc=; b=voV+xX2pIdYkSiRqC0jaCK9vKzEskuV5xd7iJU9gkV7NYrB26FaqoSU//PlAWHTVy9 a5r4+jbU8RokMUkKhdCXR0DpebHcigreX2up4jDkdMCHFdKjDeZt6xzV2wURPEtp10jN N6V3lOs8bjGlxFAWYaCv1Trev1YzLN7Uljs8myIF3KJ8lgLJa2x2lPyBDhZiBf0tngbv R7eHxvTJSyPZKM0IHCcMgYCKqoNeVLHfEql37w5mWSnQR6on8jgvK2dV5dulkEoivlWR EXuidfMGSyENrKrhcTlmdqplrhE5kptWEuUJ8GM+H1XWC+E1IMcjaU/0oxx4SrbqeKt0 dBdA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id q124si2345052pgq.215.2018.04.12.07.10.05; Thu, 12 Apr 2018 07:10:43 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752989AbeDLOG4 (ORCPT + 99 others); Thu, 12 Apr 2018 10:06:56 -0400 Received: from mail.kernel.org ([198.145.29.99]:59294 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752406AbeDLOGy (ORCPT ); Thu, 12 Apr 2018 10:06:54 -0400 Received: from localhost (243.sub-174-234-139.myvzw.com [174.234.139.243]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 983BF2178E; Thu, 12 Apr 2018 14:06:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 983BF2178E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=helgaas@kernel.org Date: Thu, 12 Apr 2018 09:06:49 -0500 From: Bjorn Helgaas To: Sinan Kaya Cc: Oza Pawandeep , Bjorn Helgaas , Philippe Ombredanne , Thomas Gleixner , Greg Kroah-Hartman , Kate Stewart , linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Dongdong Liu , Keith Busch , Wei Zhang , Timur Tabi , Alex Williamson Subject: Re: [PATCH v13 6/6] PCI/DPC: Do not do recovery for hotplug enabled system Message-ID: <20180412140648.GD145698@bhelgaas-glaptop.roam.corp.google.com> References: <1523284914-2037-1-git-send-email-poza@codeaurora.org> <1523284914-2037-7-git-send-email-poza@codeaurora.org> <20180410210349.GG54986@bhelgaas-glaptop.roam.corp.google.com> <13efe2e8-74c8-acb4-ec58-f79b14a1f182@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <13efe2e8-74c8-acb4-ec58-f79b14a1f182@codeaurora.org> User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [+cc Alex because of his interest in device reset] For context, here's the part of the patch we're discussing: > >> static void dpc_work(struct work_struct *work) > >> { > >> struct dpc_dev *dpc = container_of(work, struct dpc_dev, work); > >> struct pci_dev *pdev = dpc->dev->port; > >> > >> /* From DPC point of view error is always FATAL. */ > >> - pcie_do_recovery(pdev, DPC_FATAL); > >> + if (!pdev->is_hotplug_bridge) > >> + pcie_do_recovery(pdev, DPC_FATAL); > >> + else > >> + dpc_reset_link_remove_dev(pdev); > >> } This is at the point where DPC has been triggered in the hardware and the DPC driver is starting recovery, and I'm wondering why we need to handle the "!pdev->is_hotplug_bridge" case differently. On Wed, Apr 11, 2018 at 09:41:56PM -0400, Sinan Kaya wrote: > On 4/10/2018 5:03 PM, Bjorn Helgaas wrote: > >> DPC and AER should attempt recovery in the same way, except the > >> cases where system is with hotplug enabled. > > What's the connection with hotplug? I see from the patch that for > > hotplug bridges you remove the tree below the bridge, and otherwise > > you just reset the secondary link (I think). > > > > The changelog should explain why we need the difference. > > > > I'm a little skeptical to begin with, because I'm not sure why we > > should handle a DPC event differently just because a bridge has the > > *capability* of hotplug. Even if a hotplug bridge reports a DPC > > event, that doesn't necessarily mean a hotplug has occurred. > > Let's do a recap on what we have discussed about this until now. > > There are two conflicting error recovery mechanisms for PCIe. > > If a system supports both hotplug and DPC, endpoint can be removed > and inserted safely. DPC driver shuts down the driver on link down. > When link comes back up, hotplug driver takes over and initiates an > enumeration process. Keith mentioned the stop and re-enumerate > design was chosen because someone could remove a drive and insert an > unrelated drive back to the system. We can't really save and > restore state as we do in the AER path. > > Now, let's assume a system without hotplug capability. Second > mechanism is to go through DPC/AER path and do an automatic link > down recovery via DPC retrain/secondary bus reset including register > save and restore. Second mechanism is more suitable for handling > "surprise link down" event. The goal is to retrain the link and > continue driver operation. > > The goal of this patch to separate these two cases from each other > as the DPC driver needs to work on both contexts. Current DPC code > doesn't handle the second use case. I think the scenario you are describing is two systems that are identical except that in the first, the endpoint is below a hotplug bridge, while in the second, it's below a non-hotplug bridge. There's no physical hotplug (no drive removed or inserted), and DPC is triggered in both systems. I suggest that DPC should be handled identically in both systems: - The PCI core should have the same view of the endpoint: it should be removed and re-added in both cases (or in neither case). - The endpoint itself should not be able to tell the difference: it should see a link down event, followed by a link retrain, followed by the same sequence of config accesses, etc. - The endpoint driver should not be able to tell the difference, i.e., we should be calling the same pci_error_handlers callbacks in both cases. It's true that in the non-hotplug system, pciehp probably won't start re-enumeration, so we might need an alternate path to trigger that. But that's not what we're doing in this patch. In this patch we're adding a much bigger difference: for hotplug bridges, we stop and remove the hierarchy below the bridge; for non-hotplug bridges, we do the AER-style flow of calling pci_error_handlers callbacks. Bjorn