Date: Sun, 28 Mar 2021 11:53:32 +0200
From: Lukas Wunner
To: "Kuppuswamy, Sathyanarayanan"
Cc: Sathyanarayanan Kuppuswamy Natarajan, Dan Williams, Bjorn Helgaas,
    Linux PCI, Linux Kernel Mailing List, "Raj, Ashok", Keith Busch,
    knsathya@kernel.org, Sinan Kaya
Subject: Re: [PATCH v2 1/1] PCI: pciehp: Skip DLLSC handling if DPC is triggered
Message-ID: <20210328095332.GA8657@wunner.de>
References: <59cb30f5e5ac6d65427ceaadf1012b2ba8dbf66c.1615606143.git.sathyanarayanan.kuppuswamy@linux.intel.com>
 <20210317041342.GA19198@wunner.de>
 <20210317053114.GA32370@wunner.de>
 <20210317190151.GA27146@wunner.de>
 <0a020128-80e8-76a7-6b94-e165d3c6f778@linux.intel.com>
In-Reply-To: <0a020128-80e8-76a7-6b94-e165d3c6f778@linux.intel.com>
Content-Type: text/plain; charset=us-ascii
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 17, 2021 at 01:02:07PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> On 3/17/21 12:01 PM, Lukas Wunner wrote:
> > If the events are ignored, the driver of the device in the hotplug slot
> > is not unbound and rebound.  So the driver must be able to cope with
> > loss of TLPs during DPC recovery and it must be able to cope with
> > whatever state the endpoint device is in after DPC recovery.
> > Is this really safe?  How does the nvme driver deal with it?
>
> During DPC recovery, in pcie_do_recovery() function, we use
> report_frozen_detected() to notify all devices attached to the port
> about the fatal error. After this notification, we expect all
> affected devices to halt its IO transactions.
>
> Regarding state restoration, after successful recovery, we use
> report_slot_reset() to notify about the slot/link reset. So device
> drivers are expected to restore the device to working state after this
> notification.

Thanks a lot for the explanation.

> I am not sure how pure firmware DPC recovery works. Is there a platform
> which uses this combination? For firmware DPC model, spec does not clarify
> following points.
>
> 1. Who will notify the affected device drivers to halt the IO transactions.
> 2. Who is responsible to restore the state of the device after link reset.
>
> IMO, pure firmware DPC does not support seamless recovery. I think after it
> clears the DPC trigger status, it might expect hotplug handler be responsible
> for device recovery.
>
> I don't want to add fix to the code path that I don't understand. This is the
> reason for extending this logic to pure firmware DPC case.

I agree, let's just declare synchronization of pciehp with pure firmware
DPC recovery as unsupported for now.
I've just submitted a refined version of my patch to the list:

https://lore.kernel.org/linux-pci/b70e19324bbdded90b728a5687aa78dc17c20306.1616921228.git.lukas@wunner.de/

If you could give this new version a whirl I'd be grateful.

This version contains more code comments and kernel-doc. There's now a
check in dpc_completed() whether the DPC Status register contains "all
ones", which can happen when a DPC-capable hotplug port is hot-removed,
i.e. for cascaded DPC-capable hotplug ports.

I've also realized that the previous version was prone to races which are
theoretical but should nonetheless be avoided: E.g., previously the DLLSC
event was only removed from "events" if the link is still up after DPC
recovery. However if DPC triggers and recovers multiple times in a row,
the link may happen to be up but a new DLLSC event may have been picked
up in "pending_events" which should be ignored. I've solved this by
inverting the logic such that DLLSC is *always* removed from "events",
and if the link is unexpectedly *down* after successful recovery, a DLLSC
event is synthesized.

Thanks,

Lukas