Received: by 2002:ac0:bc90:0:0:0:0:0 with SMTP id a16csp330451img; Wed, 20 Mar 2019 21:59:13 -0700 (PDT) X-Google-Smtp-Source: APXvYqxbq8PS4CG6zcuiVuVrBhgTAWtmxvT5yYdB/7Lh68F/oOReRGXUlqfu7L1AuEYrOgonxd7h X-Received: by 2002:a65:4547:: with SMTP id x7mr1586247pgr.350.1553144353732; Wed, 20 Mar 2019 21:59:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553144353; cv=none; d=google.com; s=arc-20160816; b=L+SOHprP14OQ1y14PxmI0cI2n/R1KtCIie/vech+c+VAIbORzF3OmfWiu+V8YydkTn IaZBJBewumgIbh9UDslDo6PdLRYKcpE5R+p4n8iAtjqaeXUQuI9R9MIH3WNsvEet+hLl fa5DpXPLwLWxOdZBLIPPjXL/I5PZtGRp+/BEcGCxxgXOPm2l7az/0+ayzvwPp1UMBn+t vK5qzhSS54hOhjoZgf46+aHlPTV++1i0Ze0lAU3taenZcBtafLZUu6DK11ob8nZ2xBxh yXkXxzA7XeS2uOFE+29eBDPLIuUuUuFkwn/xP36F33JKfLbuQv2L5ohi7M3n2hNrL6Uj doKw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=3dyLsgRjy52GT6zOA+aLJRUshJeDc3eyQetzOoKEFkc=; b=gs5lPeCp+HlgNnaCjsXi6kN3yszPvkjcV8ORoibaw8jT2AiIo3PeMQfh30WLycZvbi tovZQxMFIaJCat7Uyi3bwUM6kTsenRh5DqNwGH6nQ7alnSTCvlhuGhWOrmNQDkVkTrUA lTbK4vSmaWJMGxaBAv+yeLzKuKiFhWWitGBgPcdAaHJfCka4nnncb9t1ad06U5QXFAEm mpA3EDmGRZT3zuHCguf8txdqvHMNOGUUrDYwaVbr+ibVCQXZFH3yOTQtJLzrKa26nm0W rRH2+EAEhE43LVy2qR2t1JlZVhCdsBWtJAfunQzuzjSz36mjlWOsdAdfUi8/oGNcFw8K M2IA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=siojCPH1; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u11si3240829pgp.205.2019.03.20.21.58.48; Wed, 20 Mar 2019 21:59:13 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=siojCPH1; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726012AbfCUE5u (ORCPT + 99 others); Thu, 21 Mar 2019 00:57:50 -0400 Received: from mail-io1-f65.google.com ([209.85.166.65]:46750 "EHLO mail-io1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725800AbfCUE5u (ORCPT ); Thu, 21 Mar 2019 00:57:50 -0400 Received: by mail-io1-f65.google.com with SMTP id b9so4188651iot.13; Wed, 20 Mar 2019 21:57:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=3dyLsgRjy52GT6zOA+aLJRUshJeDc3eyQetzOoKEFkc=; b=siojCPH1RG6sCukAfkZk2VjfJF+9XBYR34fEZ5mmaWuNRrN9YBmk+pQzk9PH+2JblD Lmn12S29Ec8LWfYdkHI9mo3dKPoFjqpakChfNHHFVNw4lq0GNCcRG91z3hKClpEhIhI2 GLM5F7sEoTSGYtnxJRCn3O+xT12mlXKy9xmV0yzU0stTrecBBK0aUjfdgh4pkIBWnCSF Rx56AjrmXSEhVx57HG4qxjHkIxXFK7B/rWT427+jWHjYzMx6Fk/A396QJ6uDEeSjUmbz nco1JOMfwEOj+VYqww2J5xu8T5UICNQCFhSaFmssY5DmBYsWzBA9qno+zs7NuaeI1aSx LO4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=3dyLsgRjy52GT6zOA+aLJRUshJeDc3eyQetzOoKEFkc=; b=kF7BcHAKy170ivcUGSY7X2cFU8TemvynkEZyF6wSGkR4zqdCJumpp7YHzbbkfvon2i sxvwZlEntu+cvqW43qtHYgNavDPqvFRC7944ypAPTw2ViC5ixCObqY/rtcyZNH4MAztV hn83T/G25P+whQQwosPd5bUImSZVjAbhptfaGARv+TNY8QX1/NZpHstprN+bWAo2mRLA ZU7Ckj8gT3yyyfw10p7s/p29RPHDCkHIARBEbUbhFvu3bgqW3m2Wu1pMrBLMfWtjsSod LYayrtoXIZfAe8dCtjoR+FwYUnLo0/2repFHYerS15H7HZfemZfSbINanZSI0LNjPGt/ RF9w== X-Gm-Message-State: APjAAAXRA7O9TbmOlCCo4Yj/VHwyWQwPH+hCx2wYnMTyBKPKP+vwbkV0 iFQUrsGYNtn8F+Ay8HNqnzq+qh3g3WNkQl0IG88= X-Received: by 2002:a6b:e305:: with SMTP id u5mr1342124ioc.262.1553144268944; Wed, 20 Mar 2019 21:57:48 -0700 (PDT) MIME-Version: 1.0 References: <20190222194808.15962-1-mr.nuke.me@gmail.com> <20190320205233.GE251185@google.com> <156dec38-438d-1ce9-64fa-af95d37c14bd@gmail.com> In-Reply-To: <156dec38-438d-1ce9-64fa-af95d37c14bd@gmail.com> From: Oliver Date: Thu, 21 Mar 2019 15:57:37 +1100 Message-ID: Subject: Re: [PATCH v3] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected To: Alex G Cc: Linus Torvalds , Bjorn Helgaas , austin_bolen@dell.com, Alex Gagniuc , Keith Busch , Shyam_Iyer@dell.com, Lukas Wunner , Sinan Kaya , linux-pci@vger.kernel.org, Linux List Kernel Mailing , Jon Derrick , Jens Axboe , Christoph Hellwig , Sagi Grimberg , linux-nvme@lists.infradead.org, Greg Kroah-Hartman Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 21, 2019 at 12:27 PM Alex G wrote: > > On 3/20/19 4:44 PM, Linus Torvalds wrote: > > On Wed, Mar 20, 2019 at 1:52 PM Bjorn Helgaas wrote: > >> > >> AFAICT, the consensus there was that it would be better to find some > >> sort of platform solution instead of dealing with it in individual > >> drivers. The PCI core isn't really a driver, but I think the same > >> argument applies to it: if we had a better way to recover from readl() > >> errors, that way would work equally well in nvme-pci and the PCI core. > > > > I think that patches with the pattern "if (disconnected) don't do IO" > > are fundamentally broken and we should look for alternatives in all > > cases. > > > > They are fundamentally broken because they are racy: if it's an actual > > sudden disconnect in the middle of IO, there's no guarantee that we'll > > even be notified in time. > > > > They are fundamentally broken because they add new magic special cases > > that very few people will ever test, and the people who do test them > > tend to do so with old irrelevant kernels. > > > > Finally, they are fundamentally broken because they always end up > > being just special cases. One or two special case accesses in a > > driver, or perhaps all accesses of a particular type in just _one_ > > special driver. > > > > Yes, yes, I realize that people want error reporting, and that > > hot-removal can cause various error conditions (perhaps just parity > > errors for the IO, but also perhaps other random errors caused by > > firmware perhaps doing special HW setup). > > > > But the "you get a fatal interrupt, so avoid the IO" kind of model is > > completely broken, and needs to just be fixed differently. See above > > why it's so completely broken. > > > > So if the hw is set up to send some kinf of synchronous interrupt or > > machine check that cannot sanely be handled (perhaps because it will > > just repeat forever), we should try to just disable said thing. > > > > PCIe allows for just polling for errors on the bridges, afaik. It's > > been years since I looked at it, and maybe I'm wrong. And I bet there > > are various "platform-specific value add" registers etc that may need > > tweaking outside of any standard spec for PCIe error reporting. But > > let's do that in a platform driver, to set up the platform to not do > > the silly "I'm just going to die if I see an error" thing. > > > > It's way better to have a model where you poll each bridge once a > > minute (or one an hour) and let people know "guys, your hardware > > reports errors", than make random crappy changes to random drivers > > because the hardware was set up to die on said errors. > > > > And if some MIS person wants the "hardware will die" setting, then > > they can damn well have that, and then it's not out problem, but it > > also means that we don't start changing random drivers for that insane > > setting. It's dead, Jim, and it was the users choice. > > > > Notice how in neither case does it make sense to try to do some "if > > (disconnected) dont_do_io()" model for the drivers. > > I disagree with the idea of doing something you know can cause an error > to propagate. That being said, in this particular case we have come to > rely a little too much on the if (disconnected) model. My main gripe with the if (disconnected) model is that it's only really good for inactive devices. If a device is being used then odds are the driver will do an MMIO before the pci core has had a chance to mark the device as broken so you crash anyway. > You mentioned in the other thread that fixing the GHES driver will pay > much higher dividends. I'm working on reviving a couple of changes to do > just that. Some industry folk were very concerned about the "don't > panic()" approach, and I want to make sure I fairly present their > arguments in the cover letter. > > I'm hoping one day we'll have the ability to use page tables to prevent > the situations that we're trying to fix today in less than ideal ways. > Although there are other places in msi.c that use if (disconnected), I'm > okay with dropping this change -- provided we come up with an equivalent > fix. What's the idea there? Scan the ioremap space for mappings over the device BARs and swap them with a normal memory page? > But even if FFS doesn't crash, do we really want to lose hundreds of > milliseconds to SMM --on all cores-- when all it takes is a couple of > cycles to check a flag? Using pci_dev_is_disconnected() to opportunistically avoid waiting for MMIO timeouts is fair enough IMO, even if it's a bit ugly. It would help your case if you did some measurements to show the improvement and look for other cases it might help. It might also be a good idea to document when it is appropriate to use pci_is_dev_disconnected() so we aren't stuck having the same argument again and again, but that's probably a job for Bjorn though. Oliver