From: Linus Torvalds
Date: Wed, 20 Mar 2019 14:44:31 -0700
Subject: Re: [PATCH v3] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected
To: Bjorn Helgaas
Cc: Alexandru Gagniuc, austin_bolen@dell.com, Alex Gagniuc, Keith Busch,
    Shyam_Iyer@dell.com, Lukas Wunner, Sinan Kaya, linux-pci@vger.kernel.org,
    Linux List Kernel Mailing, Jon Derrick, Jens Axboe, Christoph Hellwig,
    Sagi Grimberg, linux-nvme@lists.infradead.org, Greg Kroah-Hartman,
    "Oliver O'Halloran"
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 20, 2019 at 1:52 PM Bjorn Helgaas wrote:
>
> AFAICT, the consensus there was that it would be better to find some
> sort of platform solution instead of dealing with it in individual
> drivers.
> The PCI core isn't really a driver, but I think the same
> argument applies to it: if we had a better way to recover from readl()
> errors, that way would work equally well in nvme-pci and the PCI core.

I think that patches with the pattern "if (disconnected) don't do IO"
are fundamentally broken and we should look for alternatives in all
cases.

They are fundamentally broken because they are racy: if it's an actual
sudden disconnect in the middle of IO, there's no guarantee that we'll
even be notified in time.

They are fundamentally broken because they add new magic special cases
that very few people will ever test, and the people who do test them
tend to do so with old irrelevant kernels.

Finally, they are fundamentally broken because they always end up
being just special cases. One or two special case accesses in a
driver, or perhaps all accesses of a particular type in just _one_
special driver.

Yes, yes, I realize that people want error reporting, and that
hot-removal can cause various error conditions (perhaps just parity
errors for the IO, but also perhaps other random errors caused by
firmware perhaps doing special HW setup).

But the "you get a fatal interrupt, so avoid the IO" kind of model is
completely broken, and needs to just be fixed differently. See above
why it's so completely broken.

So if the hw is set up to send some kind of synchronous interrupt or
machine check that cannot sanely be handled (perhaps because it will
just repeat forever), we should try to just disable said thing.

PCIe allows for just polling for errors on the bridges, afaik. It's
been years since I looked at it, and maybe I'm wrong. And I bet there
are various "platform-specific value add" registers etc that may need
tweaking outside of any standard spec for PCIe error reporting. But
let's do that in a platform driver, to set up the platform to not do
the silly "I'm just going to die if I see an error" thing.
It's way better to have a model where you poll each bridge once a
minute (or once an hour) and let people know "guys, your hardware
reports errors", than make random crappy changes to random drivers
because the hardware was set up to die on said errors.

And if some MIS person wants the "hardware will die" setting, then
they can damn well have that, and then it's not our problem, but it
also means that we don't start changing random drivers for that insane
setting. It's dead, Jim, and it was the user's choice.

Notice how in neither case does it make sense to try to do some "if
(disconnected) dont_do_io()" model for the drivers.

               Linus
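
As a rough illustration of the polling model described in the message (a sketch added for clarity, not code from this thread or from the kernel), the following userspace C models one polling pass over a set of bridges. Everything here is an assumption made for the example: `struct fake_bridge`, the `BRIDGE_ERR_*` bits, and both helper functions are hypothetical stand-ins, and a real platform driver would read the bridges' actual error status registers instead of a struct field. The point it tries to show is only that the poller reports what the hardware latched and never gates or skips driver IO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical error bits for this sketch; not a real register layout. */
#define BRIDGE_ERR_CORRECTABLE 0x1u
#define BRIDGE_ERR_FATAL       0x4u

/* Hypothetical stand-in for a bridge whose hardware latches error bits. */
struct fake_bridge {
    const char *name;
    uint32_t err_status; /* sticky bits, set by the simulated hardware */
};

/* Return the latched error bits and clear them, so that each error is
 * reported once per polling pass. */
static uint32_t bridge_read_and_clear_errors(struct fake_bridge *b)
{
    uint32_t status = b->err_status;

    b->err_status = 0;
    return status;
}

/* One polling pass over all bridges. It only logs what was latched and
 * leaves driver IO paths untouched. Returns the number of bridges that
 * reported at least one error. */
static size_t poll_bridges_once(struct fake_bridge *bridges, size_t n)
{
    size_t reporting = 0;

    assert(bridges != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        uint32_t status = bridge_read_and_clear_errors(&bridges[i]);

        if (status == 0)
            continue;

        reporting++;
        fprintf(stderr, "%s: hardware reports errors (status 0x%x)%s\n",
                bridges[i].name, (unsigned)status,
                (status & BRIDGE_ERR_FATAL) ? " [fatal]" : "");
    }
    return reporting;
}
```

A caller would presumably invoke `poll_bridges_once()` from a periodic timer, once a minute or once an hour as the message suggests. Because the status bits are cleared on read, a second pass with no new errors reports nothing.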