Received: by 2002:a05:6a10:1287:0:0:0:0 with SMTP id d7csp3531464pxv; Mon, 19 Jul 2021 02:30:43 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzlDYw0hWNantVxLasE+/gLCItgDxe084Zq48X/iZ+FciZ9WgWwrBKaRwOKiUfgj1YlBdtt X-Received: by 2002:a17:906:3c0d:: with SMTP id h13mr26433249ejg.335.1626687042812; Mon, 19 Jul 2021 02:30:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1626687042; cv=none; d=google.com; s=arc-20160816; b=dUu3FSSRRuV1eAdpDGHdpf1Dlr6IUn2GHGXKHiCuZw189CmuVpwBjmuH7FFVVBDBL5 mXvQg8dfheJkW5qqNsSr1mrmX2LOundfR6DOJBD4hVbbAkrp5TbuSJOgzQb1Gb18g5Wn q//e6Ht9oBhShySsyUoTLe1GX/fmZ3AepmdSqwYNcJoDCIfvO/pYf3T/9Bc2oer4rMBA m1vBtLFS5SMspJ25rSlB4ARmafsSrBlA/0SnXVrItfVqqC+nG8iYXQnGwcEFfqU/+FZ/ 2weF6FTB63VWO3B8acpYGSKpaBLx7wWLzTnLK1/3Tv9zdu1AUAUfZntI0uc1fVsRR4Dq Mp4Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=zfmlvgYZF9ya8XeRC56p7k9TICKFfORGfVYAIqo+E9E=; b=0vSv/rwbIzvPCcDBX9GwlPsYrNIjqDY6c+yqGgC8f1zANOb/T/aYTqS6J5eWU6ZOtD kOPVjdE94fkUFx6vnjJVXLJ1bf0AthonBqICCxb/4Kung1YQUi0S9Cxn4N+Rpe5DD8mq G2WG2mseWvjovt6MxlzTOm2jsFsOuuz+lGvOEDKNwYbx98qG7cadnE+XhCl/cFPi1k5V aajchW272Gehl76timdB9T0tRj3cf6k0WFtHXMnj8Gu14BYksv6INFNoSCho788Ea/8H vNPfp9RejpUxOscXpSXklYB0b5vGXSgp3PagX3Me+iZiyTbTzahTd9sn/wh4BW/soi/2 WC5w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=oWbFMB5a; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h2si5729136eje.232.2021.07.19.02.30.20; Mon, 19 Jul 2021 02:30:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=oWbFMB5a; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235651AbhGSIsp (ORCPT + 99 others); Mon, 19 Jul 2021 04:48:45 -0400 Received: from mail.kernel.org ([198.145.29.99]:53358 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234868AbhGSIso (ORCPT ); Mon, 19 Jul 2021 04:48:44 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 5D24D60240; Mon, 19 Jul 2021 08:13:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1626682422; bh=0TE5z1JwXY+ttzhYw+dqHxv31KAXFHmt/AZXZQRPKEk=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=oWbFMB5aDV3OXGYjtcSxVa7L7v6T+xq91Sy/9s0bYFGq0eORYxBMSV/Kqkx7AbvFO lrxlaP6M3LHNnmc+J1FFxh8jrqdEtrRcywtw0sOgZk6fwjW3tdYjhwZ+V4dipbXtbe nUZoSI6rI3WKqQLtNIMZqbfrPAbuwAkjEp4jZpMLn+b1nOggU4zCWAblcaFs6+d6+k p9Nk7EgpoCfveNsBAgEFO+GWHe5u9vMcHimhKwhzIqINvHluzEKV7S62kpPA4lQPrs xNqEB/4kM/owBHg8WHnJOFtzEjjJ6xg/f2rWaz0ogQ8aycQm2Aw4yOH5m8LBp3RHOK VC6hh9Ebms8Fw== Received: by pali.im (Postfix) id E64CEADB; Mon, 19 Jul 2021 10:13:39 +0200 (CEST) Date: Mon, 19 Jul 2021 10:13:39 +0200 From: Pali =?utf-8?B?Um9ow6Fy?= To: Oliver O'Halloran Cc: Bjorn Helgaas , Aaron Ma , jesse.brandeburg@intel.com, anthony.l.nguyen@intel.com, "David S. Miller" , Jakub Kicinski , intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, Linux Kernel Mailing List , Krzysztof =?utf-8?Q?Wilczy=C5=84ski?= , linux-pci Subject: Re: [PATCH 1/2] igc: don't rd/wr iomem when PCI is removed Message-ID: <20210719081339.52inudhug3rgpbed@pali> References: <20210708154550.GA1019947@bjorn-Precision-5520> <20210718225059.hd3od4k4on3aopcu@pali> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: NeoMutt/20180716 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Monday 19 July 2021 12:49:18 Oliver O'Halloran wrote: > On Mon, Jul 19, 2021 at 8:51 AM Pali Rohár wrote: > > > > And do we have some solution for this kind of issue? There are more PCIe > > controllers / platforms which do not like MMIO read/write operation when > > card / link is not connected. > > Do you have some actual examples? The few times I've seen those > crashes were due to broken firmware-first error handling. The AER > notifications would be escalated into some kind of ACPI error which > the kernel didn't have a good way of dealing with so it panicked > instead. I have experience and examples with pci aardvark controller. When card is disconnected it sends synchronous abort to CPU when doing MMIO read operation. One example is in this linux-usb thread: https://lore.kernel.org/linux-usb/20210505120117.4wpmo6fhvzznf3wv@pali/t/#u I can trigger this issue at least for xhci, nvme and ath drivers. > Assuming it is a real problem then as Bjorn pointed out this sort of > hack doesn't really fix the issue because hotplug and AER > notifications are fundamentally asynchronous. In case of pci aardvark it is not AER notification. And for MMIO read it is synchronous abort. Anyway, hotplug events are really asynchronous, but there is main issue that this hotplug disconnect event instruct device driver to "unbind" and e.g. these ethernet or usb controllers try to do MMIO operations in their cleanup / remove / unbind phase, even when card is already "disconnected" in PCI subsystem. > If the driver is > actively using the device when the error / removal happens then the > pci_dev_is_disconnected() check will pass and the MMIO will go > through. If the MMIO is poisonous because of dumb hardware then this > sort of hack will only paper over the issue. > > > If we do not provide a way how to solve these problems then we can > > expect that people would just hack ethernet / wifi / ... device drivers > > which are currently crashing by patches like in this thread. > > > > Maybe PCI subsystem could provide wrapper function which implements > > above pattern and which can be used by device drivers? > > We could do that and I think there was a proposal to add some > pci_readl(pdev, ) style wrappers at one point. On powerpc > there's hooks in the arch provided MMIO functions to detect error > responses and kick off the error handling machinery when a problem is > detected. Those hooks are mainly there to help the platform detect > errors though and they don't make life much easier for drivers. Due to > locking concerns the driver's .error_detected() callback cannot be > called in the MMIO hook so even when the platform detects errors > synchronously the driver notifications must happen asynchronously. In > the meanwhile the driver still needs to handle the 0xFFs response > safely and there's not much we can do from the platform side to help > there. > > Oliver