Date: Wed, 9 May 2018 15:16:00 +0200
From: Lukas Wunner
To: Bjorn Helgaas
Cc: Paul Menzel, Bjorn Helgaas, linux-pci@vger.kernel.org,
    linux-kernel@vger.kernel.org, Sinan Kaya
Subject: Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038
    (issued 65284 msec ago)
Message-ID: <20180509131600.GA3712@wunner.de>
References: <8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de>
    <20180427192207.GG8199@bhelgaas-glaptop.roam.corp.google.com>
    <20180509114124.GA20639@wunner.de>
    <20180509125752.GA234395@bhelgaas-glaptop.roam.corp.google.com>
In-Reply-To: <20180509125752.GA234395@bhelgaas-glaptop.roam.corp.google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 09, 2018 at 07:57:52AM -0500, Bjorn Helgaas wrote:
> On Wed, May 09, 2018 at 01:41:24PM +0200, Lukas Wunner wrote:
> > On Fri, Apr 27, 2018 at 02:22:07PM -0500, Bjorn Helgaas wrote:
> > > Sinan mooted the idea of using a "no-wait" path of sending the "don't
> > > generate hotplug interrupts" command. I think we should work on this
> > > idea a little more. If we're shutting down the whole system, I can't
> > > believe there's much value in *anything* we do in the pciehp_remove()
> > > path.
> > >
> > > Maybe we should just get rid of pciehp_remove() (and probably
> > > pcie_port_remove_service() and the other service driver remove methods)
> > > completely. That dates from when the service drivers could be modules that
> > > could be potentially unloaded, but unloading them hasn't been possible for
> > > years.
> >
> > Every Thunderbolt device contains a PCIe switch with at least one
> > (downstream) hotplug port, so pciehp_remove() is executed on unplug
> > of a Thunderbolt device and the assumption that it's unnecessary
> > simply because it's builtin isn't correct.
>
> I agree that simply being builtin isn't a sufficient argument for getting
> rid of pciehp_remove().
>
> But if we do need pciehp_remove(), we should be able to make a rational
> case for why that is. If we're about to turn off the power, it's not
> obvious why we would need to deallocate memory, remove sysfs stuff, etc.
> If we need to configure the hardware to make it easier for a kexec'd
> kernel, that's a possible argument but we should make it explicit.

With Thunderbolt, up to 6 devices may be daisy-chained. This means that
a hotplug port may have further hotplug ports as (grand-)children.
If power is turned off manually via sysfs for a hotplug port, all
children (including hotplug ports) are removed by pciehp even though
they physically remain attached to the machine. If such
removed-in-software-but-physically-still-present devices send an
interrupt, and interrupts were not orderly disabled on ->remove, they
will be considered spurious interrupts by genirq code. In particular,
level-triggered INTx interrupts will immediately lead to an unpleasant
user-visible splat and the interrupt will be switched to polling.

So there's no way around orderly disabling interrupts in the ->remove
path.

I agree that ->shutdown is a different story in principle and that
disabling devices seems superfluous and counter-intuitive. I imagine
kexec might not be the only reason, but also devices passed through
to VMs. (What happens if a VM hands a device back to the host in an
unclean state on shutdown?)

Thanks,

Lukas
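
To illustrate the spurious-interrupt argument, here is a minimal userspace
sketch, not kernel code. All names in it (struct irq_line, deliver_irq(),
model_remove(), simulate_after_remove()) are hypothetical, and the
100000/99900 thresholds are assumptions modeled loosely on genirq's
spurious-interrupt accounting; the real logic is more involved. It only
shows why a device that is removed in software but still attached has to
be quiesced before its handler goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed thresholds: if more than UNHANDLED_LIMIT out of IRQ_WINDOW
 * interrupts go unclaimed, the line is treated as spurious. */
#define IRQ_WINDOW      100000u
#define UNHANDLED_LIMIT  99900u

struct irq_line {
	bool handler_registered;  /* a driver is still bound to the line */
	bool device_irq_enabled;  /* the device may still assert the line */
	unsigned int count;       /* interrupts seen in the current window */
	unsigned int unhandled;   /* of those, how many nobody claimed */
	bool switched_to_polling; /* line was disabled as spurious */
};

/* Deliver one interrupt attempt; returns true if the line actually fired. */
static bool deliver_irq(struct irq_line *line)
{
	assert(line != NULL);

	if (!line->device_irq_enabled || line->switched_to_polling)
		return false;

	if (!line->handler_registered)
		line->unhandled++;

	if (++line->count < IRQ_WINDOW)
		return true;

	/* End of window: decide whether the line looks spurious. */
	if (line->unhandled > UNHANDLED_LIMIT) {
		line->switched_to_polling = true;
		printf("nobody cared: disabling line, switching to polling\n");
	}
	line->count = 0;
	line->unhandled = 0;
	return true;
}

/* Hypothetical ->remove: an orderly remove quiesces the device first. */
static void model_remove(struct irq_line *line, bool orderly)
{
	if (orderly)
		line->device_irq_enabled = false;
	line->handler_registered = false;
}

/* A still-attached device keeps asserting its line after removal.
 * Returns true if the line ended up disabled and polled. */
static bool simulate_after_remove(bool orderly, unsigned int attempts)
{
	struct irq_line line = {
		.handler_registered = true,
		.device_irq_enabled = true,
	};

	model_remove(&line, orderly);
	for (unsigned int i = 0; i < attempts; i++)
		deliver_irq(&line);
	return line.switched_to_polling;
}
```

Calling simulate_after_remove(false, 100000) models a ->remove that frees
the handler without disabling interrupt generation: every interrupt goes
unclaimed and the line is switched to polling. With orderly set to true the
device is silenced first, so nothing is ever counted as unhandled.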