Date: Wed, 15 Jul 2020 17:12:30 -0500
From: Bjorn Helgaas
To: David Laight
Cc: 'Oliver O'Halloran', Arnd Bergmann, Keith Busch, Paul Mackerras,
    sparclinux, Toan Le, Greg Ungerer, Marek Vasut, Rob Herring,
    Lorenzo Pieralisi, Sagi Grimberg, Russell King, Ley Foon Tan,
    Christoph Hellwig, Geert Uytterhoeven, Kevin Hilman, linux-pci,
    Jakub Kicinski, Matt Turner,
    "linux-kernel-mentees@lists.linuxfoundation.org", Guenter Roeck,
    Ray Jui, Jens Axboe, Ivan Kokshaysky, Shuah Khan,
    "bjorn@helgaas.com", Boris Ostrovsky, Richard Henderson,
    Juergen Gross, Bjorn Helgaas, Thomas Bogendoerfer, Scott Branden,
    Jingoo Han, "Saheed O. Bolarinwa", "linux-kernel@vger.kernel.org",
    Philipp Zabel, Greg Kroah-Hartman, Gustavo Pimentel, linuxppc-dev,
    "David S. Miller", Heiner Kallweit
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
Message-ID: <20200715221230.GA563957@bjorn-Precision-5520>
In-Reply-To: <1e2ae69a55f542faa18988a49e9b9491@AcuMS.aculab.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 15, 2020 at 02:38:29PM +0000, David Laight wrote:
> From: Oliver O'Halloran
> > Sent: 15 July 2020 05:19
> >
> > On Wed, Jul 15, 2020 at 8:03 AM Arnd Bergmann wrote:
> ...
> > > - config space accesses are very rare compared to memory
> > >   space access and on the hardware side the error handling
> > >   would be similar, but readl/writel don't return errors, they just
> > >   access wrong registers or return 0xffffffff.
> > >   arch/powerpc/kernel/eeh.c has a ton extra code written to
> > >   deal with it, but no other architectures do.
> >
> > TBH the EEH MMIO hooks were probably a mistake to begin with. Errors
> > detected via MMIO are almost always asynchronous to the error itself
> > so you usually just wind up with a misleading stack trace rather than
> > any kind of useful synchronous error reporting. It seems like most
> > drivers don't bother checking for 0xFFs either and rely on the
> > asynchronous reporting via .error_detected() instead, so I have to
> > wonder what the point is.
> > I've been thinking of removing the MMIO
> > hooks and using a background poller to check for errors on each PHB
> > periodically (assuming we don't have an EEH interrupt) instead. That
> > would remove the requirement for eeh_dev_check_failure() to be
> > interrupt safe too, so it might even let us fix all the godawful races
> > in EEH.
>
> I've 'played' with PCIe error handling - without much success.
> What might be useful is for a driver that has just read ~0u to
> be able to ask 'has there been an error signalled for this device?'.

In many cases a driver will know that ~0 is not a valid value for the
register it's reading.  But if ~0 *could* be valid, an interface like
you suggest could be useful.  I don't think we have anything like that
today, but maybe we could.

It would certainly be nice if the PCI core noticed, logged, and
cleared errors.  We have some of that for AER, but that's an optional
feature, and support for the error bits in the garden-variety
PCI_STATUS register is pretty haphazard.  As you note below, this sort
of SERR/PERR reporting is frequently hard-wired in ways that take it
out of our purview.

Bjorn
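
To make the "has there been an error signalled for this device?" idea
a bit more concrete, here is a rough userspace sketch of the check,
not an existing kernel interface.  The helper names
status_signals_error() and read_is_suspect() are made up for
illustration.  The bit values are the PCI_STATUS error bits from
include/uapi/linux/pci_regs.h.  The sketch assumes the caller has
already fetched the status word, which in a driver would presumably
come from pci_read_config_word(dev, PCI_STATUS, &status).

```c
#include <stdbool.h>
#include <stdint.h>

/* Error bits of the 16-bit PCI_STATUS register (values as in
 * include/uapi/linux/pci_regs.h). */
#define PCI_STATUS_PARITY            0x0100 /* master data parity error */
#define PCI_STATUS_SIG_TARGET_ABORT  0x0800
#define PCI_STATUS_REC_TARGET_ABORT  0x1000
#define PCI_STATUS_REC_MASTER_ABORT  0x2000
#define PCI_STATUS_SIG_SYSTEM_ERROR  0x4000
#define PCI_STATUS_DETECTED_PARITY   0x8000

#define PCI_STATUS_ERROR_BITS (PCI_STATUS_PARITY | \
                               PCI_STATUS_SIG_TARGET_ABORT | \
                               PCI_STATUS_REC_TARGET_ABORT | \
                               PCI_STATUS_REC_MASTER_ABORT | \
                               PCI_STATUS_SIG_SYSTEM_ERROR | \
                               PCI_STATUS_DETECTED_PARITY)

/* Hypothetical helper: does this PCI_STATUS value indicate an error?
 * An all-ones status is treated as an error too, on the assumption
 * that the config read itself failed. */
static bool status_signals_error(uint16_t status)
{
	if (status == 0xFFFF)
		return true;
	return (status & PCI_STATUS_ERROR_BITS) != 0;
}

/* Hypothetical helper: an MMIO read is only suspect if it returned ~0u
 * AND the device's status word reports an error.  A ~0u read with a
 * clean status is taken as a legitimate register value. */
static bool read_is_suspect(uint32_t val, uint16_t status)
{
	if (val != UINT32_MAX)
		return false;
	return status_signals_error(status);
}
```

A driver for which ~0 is never a valid register value would not need
the status lookup at all.  The second helper only matters in the
ambiguous case, and it inherits the caveats above: the status bits may
be haphazardly supported or already cleared by firmware.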