Date: Thu, 8 Nov 2018 16:03:27 -0700
From: Keith Busch
To: Greg Kroah-Hartman
Cc: Bjorn Helgaas, Alexandru Gagniuc, linux-pci@vger.kernel.org,
 alex_gagniuc@dellteam.com, austin_bolen@dell.com, shyam_iyer@dell.com,
 linux-kernel@vger.kernel.org, Jonathan Derrick, Lukas Wunner,
 Russell Currey, Sam Bobroff, Oliver O'Halloran,
 linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected
Message-ID: <20181108230327.GE2932@localhost.localdomain>
References: <20180918221501.13112-1-mr.nuke.me@gmail.com> <20181107234257.GC41183@google.com> <20181108200855.GE41183@google.com>
 <20181108220117.GA11466@kroah.com> <20181108223258.GD2932@localhost.localdomain> <20181108224255.GA20619@kroah.com>
In-Reply-To: <20181108224255.GA20619@kroah.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 08, 2018 at 02:42:55PM -0800, Greg Kroah-Hartman wrote:
> On Thu, Nov 08, 2018 at 03:32:58PM -0700, Keith Busch wrote:
> > On Thu, Nov 08, 2018 at 02:01:17PM -0800, Greg Kroah-Hartman wrote:
> > > On Thu, Nov 08, 2018 at 02:09:17PM -0600, Bjorn Helgaas wrote:
> > > > I'm having second thoughts about this. One thing I'm uncomfortable
> > > > with is that sprinkling pci_dev_is_disconnected() around feels ad hoc
> > > > instead of systematic, in the sense that I don't know how we convince
> > > > ourselves that this (and only this) is the correct place to put it.
> > >
> > > I think my stance always has been that this call is not good at all
> > > because once you call it you never really know if it is still true as
> > > the device could have been removed right afterward.
> > >
> > > So almost any code that relies on it is broken, there is no locking and
> > > it can and will race and you will lose.
> >
> > AIUI, we're not trying to create code to rely on this. This is more about
> > reducing reliance on hardware. If the software misses the race once and
> > accesses disconnected device memory, that's usually not a big deal to
> > let hardware sort it out, but the point is not to push our luck.
>
> Then why even care about this call at all? If you need to really know
> if the read worked, you have to check the value. If the value is FF
> then you have a huge hint that the hardware is now gone. And you can
> rely on it being gone, you can never rely on making the call to the
> function to check if the hardware is there to be still valid at any point
> in time after the call returns.
>
> > Surprise hot remove is empirically more reliable the less we interact
> > with hardware and firmware. That shouldn't be necessary, but is just an
> > unfortunate reality.
>
> You are not "interacting", you are reading/writing to the hardware, as
> you have to do so. So I really don't understand what you are talking
> about here, sorry.

We're reading hardware memory, yes, but the hardware isn't there.
Something obviously needs to return FF, so we are indirectly interacting
with whatever mechanism handles that. Sometimes that mechanism doesn't
handle it gracefully, and instead of having FF to consider, you have a
machine check rebooting your system.
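
For reference, the "check the value" approach from the quoted text can be
sketched in plain userspace C. This is only an illustrative model, not kernel
code: struct fake_dev, fake_readl() and read_reg_checked() are hypothetical
names made up for this sketch, and the model assumes that a read from a
missing device always completes and returns all 1s.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; not a kernel structure. */
struct fake_dev {
    bool     present;  /* flips to false on surprise removal */
    uint32_t reg;      /* the single register this model exposes */
};

/* Models a read of device memory: a missing device reads as all 1s. */
static uint32_t fake_readl(const struct fake_dev *dev)
{
    assert(dev != NULL);
    return dev->present ? dev->reg : UINT32_MAX;
}

/*
 * Check-after-read: judge the device by the value that came back,
 * rather than by a presence test done before the access.
 * Returns false and leaves *out untouched when the read looks dead.
 */
static bool read_reg_checked(const struct fake_dev *dev, uint32_t *out)
{
    uint32_t val = fake_readl(dev);

    if (val == UINT32_MAX)
        return false;  /* strong hint that the device is gone */
    *out = val;
    return true;
}
```

Note that all 1s is only a hint, since a register could legitimately hold that
value. The sketch also cannot model the failure described in the reply above,
where the platform raises a machine check instead of returning FF, so the
check after the read never gets a chance to run.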