Date: Mon, 5 Aug 2019 13:18:54 +0200
From: Lukas Wunner
To: Mika Westerberg
Cc: Bjorn Helgaas, "Rafael J.
    Wysocki", Keith Busch, Andy Shevchenko, Sinan Kaya, Kai Heng Feng,
    linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] PCI: pciehp: Prevent deadlock on disconnect
Message-ID: <20190805111854.al5bj3q2gdng5ai6@wunner.de>
References: <20190618125051.2382-1-mika.westerberg@linux.intel.com>
    <20190618125051.2382-2-mika.westerberg@linux.intel.com>
In-Reply-To: <20190618125051.2382-2-mika.westerberg@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 18, 2019 at 03:50:51PM +0300, Mika Westerberg wrote:
> If there are more than one PCIe switch with hotplug downstream ports
> hot-removing them leads to a following deadlock:
[...]
> What happens here is that the whole hierarchy is runtime resumed and the
> parent PCIe downstream port, who got the hot-remove event, starts
> removing devices below it taking pci_lock_rescan_remove() lock. When the
> child PCIe port is runtime resumed it calls pciehp_check_presence()
> which ends up calling pciehp_card_present() and pciehp_check_link_active().
> Both of these read their parts of PCIe config space by calling helper
> function pcie_capability_read_word(). Now, this function notices that
> the underlying device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND
> with the capability value set to 0. When pciehp gets this value it
> thinks that its child device is also hot-removed and schedules its IRQ
> thread to handle the event.
>
> The deadlock happens when the child's IRQ thread runs and tries to
> acquire pci_lock_rescan_remove() which is already taken by the parent
> and the parent waits for the child's IRQ thread to finish.
>
> We can prevent this from happening by checking the return value of
> pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop
> performing any hot-removal activities.

IIUC this patch only avoids the deadlock if the child hotplug port
happens to be runtime suspended when it is surprise removed.
The deadlock isn't avoided if it is runtime resumed.

This patch I posted last year should cover both cases:

https://patchwork.kernel.org/patch/10468065/

However, as I've noted in this follow-up to the patch, I don't consider
my solution a proper fix either:

https://patchwork.kernel.org/patch/10468065/#22206721

Rather, the problem should be addressed by unbinding PCI drivers without
holding pci_lock_rescan_remove().

I'm truly sorry but I haven't been able to make much progress on this
as I got caught up with other things. Part of the problem is that this
is volunteer work. Maybe someone's interested in hiring me to work on it?
Resume available on request. (But I'll get to it sooner or later whether
paid or not, unless someone else beats me to it. :-) )

Thanks,

Lukas
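
For readers following along, the return-value check described in the
quoted commit message can be sketched in plain userspace C. This is
only an illustration under stated assumptions, not the pciehp code:
struct fake_port, fake_capability_read_word() and card_present() are
hypothetical stand-ins, and only the two constants mirror the kernel's
definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* These values mirror the kernel headers; everything else is hypothetical. */
#define PCIBIOS_SUCCESSFUL        0x00
#define PCIBIOS_DEVICE_NOT_FOUND  0x86
#define PCI_EXP_SLTSTA_PDS        0x0040  /* Presence Detect State */

/* Hypothetical stand-in for a hotplug port whose config space may vanish. */
struct fake_port {
	bool gone;        /* true once the port itself has been unplugged */
	uint16_t sltsta;  /* Slot Status contents while the port is reachable */
};

/*
 * Mimics the behaviour described in the quoted text: on a vanished
 * device the read fails and the returned value is zeroed.
 */
static int fake_capability_read_word(const struct fake_port *p, uint16_t *val)
{
	if (p->gone) {
		*val = 0;
		return PCIBIOS_DEVICE_NOT_FOUND;
	}
	*val = p->sltsta;
	return PCIBIOS_SUCCESSFUL;
}

/*
 * Returns 1 if a card is present, 0 if the slot is empty, and -ENODEV
 * if the port is gone, so a caller can skip hot-removal handling.
 */
static int card_present(const struct fake_port *p)
{
	uint16_t sltsta;

	if (fake_capability_read_word(p, &sltsta) == PCIBIOS_DEVICE_NOT_FOUND)
		return -ENODEV;

	return !!(sltsta & PCI_EXP_SLTSTA_PDS);
}
```

Without the check, a zeroed Slot Status is indistinguishable from an
empty slot. With it, a caller can tell "port gone" (-ENODEV) apart
from "slot empty" (0) and refrain from scheduling the IRQ thread.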