Date: Mon, 5 Aug 2019 13:40:53 +0200
From: Lukas Wunner
To: Xiongfeng Wang
Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] pciehp: fix a race between pciehp and removing operations by sysfs
Message-ID: <20190805114053.srbngho3wbziy2uy@wunner.de>
References: <1519648875-38196-1-git-send-email-wangxiongfeng2@huawei.com> <20190802003618.GJ151852@google.com> <0c0512fd-e95a-74be-09c2-1576844d9c97@huawei.com>
In-Reply-To: <0c0512fd-e95a-74be-09c2-1576844d9c97@huawei.com>

On Fri, Aug 02, 2019 at 04:23:33PM +0800, Xiongfeng Wang wrote:
> If I use a global flag to mark whether any PCI device is being rescanned
> or removed, the problem is that we can't remove two devices belonging to
> two root ports at the same time.
> Since a root port produces a PCI tree, I was planning to make the flag
> per root port slot, i.e. add the flag to 'struct slot'.
> But in some situations the root port doesn't support hotplug while the
> downstream port below it does. I am not sure whether it's better to add
> the flag to the 'struct pci_dev' of the root port.

We're susceptible to deadlocks if at least two hotplug ports are removed
simultaneously and one is a parent of the other. What you're witnessing
is basically a variation of that problem, wherein a hotplug port is
removed while it is simultaneously removing its children.

pci_lock_rescan_remove() was introduced by commit 9d16947b7583 to fix
races (which are real), but at the same time it caused these deadlocks.
The lock is too coarse-grained and needs to be replaced with more
fine-grained locking. Specifically, unbinding PCI devices from their
drivers on removal need not and should not happen under that lock.
Doing so will fix all of these deadlocks.

I submitted a patch last year to address one class of those deadlocks
but withdrew it when I realized it's not a proper fix:

https://patchwork.kernel.org/patch/10468065/

What you can do is add a flag to struct pci_dev (or the priv_flags
embedded therein) to indicate that a device is about to be removed.
Set this flag on all children of the device being removed before
acquiring pci_lock_rescan_remove(), and avoid taking that lock in
pciehp_unconfigure_device() if the flag is set on the hotplug port.

But again, that approach is just a band-aid; the real fix is to unbind
devices without holding the lock.

Thanks,

Lukas
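To make the shape of the deadlock concrete, here is a simplified userspace model in C; it is not kernel code. A pthread mutex stands in for pci_lock_rescan_remove(), and sysfs_remove_port() and pciehp_irq_thread() are hypothetical stand-ins for the sysfs removal path and the pciehp thread that ends up in pciehp_unconfigure_device(). Unbinding the pciehp driver is assumed to wait for that thread to finish, modelled here with pthread_join; that captures the shape of the wait, not the exact kernel mechanism. The thread uses trylock so the model reports the conflict instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for the global lock taken by pci_lock_rescan_remove(). */
static pthread_mutex_t rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;

/* Error code the simulated pciehp thread got when trying the lock. */
static int irq_thread_result = -1;

/*
 * Hypothetical model of the pciehp thread handling a removal event.
 * The real code would block on the lock; trylock is used so the model
 * reports EBUSY instead of hanging forever.
 */
static void *pciehp_irq_thread(void *arg)
{
    (void)arg;
    irq_thread_result = pthread_mutex_trylock(&rescan_remove_lock);
    if (irq_thread_result == 0)
        pthread_mutex_unlock(&rescan_remove_lock);
    return NULL;
}

/*
 * Hypothetical model of removal via sysfs: take the coarse lock, then
 * unbind the pciehp driver, which waits for its thread to finish.
 * Returns what that thread saw when it tried to take the lock.
 */
static int sysfs_remove_port(void)
{
    pthread_t irq;
    int rc;

    pthread_mutex_lock(&rescan_remove_lock);
    rc = pthread_create(&irq, NULL, pciehp_irq_thread, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(irq, NULL);   /* models waiting for the pciehp thread */
    pthread_mutex_unlock(&rescan_remove_lock);
    return irq_thread_result;
}
```

Calling sysfs_remove_port() returns EBUSY: the remover holds the lock while waiting for a thread that needs the same lock. With a blocking lock in place of trylock, both sides would wait forever, which is why unbinding should not happen under that lock.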
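The band-aid suggested in the mail can be sketched as a userspace model under stated assumptions. Everything here is hypothetical: struct dev is a stripped-down stand-in for struct pci_dev, its removing field models a new "about to be removed" bit in priv_flags, unconfigure_port() models pciehp_unconfigure_device(), and a pthread mutex stands in for pci_lock_rescan_remove(). Unbinding a hotplug port is modelled as waiting (pthread_join) for a thread that runs its unconfigure path while the remover holds the lock. The sketch marks the removed device itself as well as everything below it, which is an assumption (the mail says "all children"). It is not the kernel API and not a proposed patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_CHILDREN 4

/* Stand-in for the global lock taken by pci_lock_rescan_remove(). */
static pthread_mutex_t rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;

/* How often the unconfigure path actually took the lock. */
static int lock_acquisitions;

/* Hypothetical, heavily simplified stand-in for struct pci_dev. */
struct dev {
    bool removing;        /* models a new priv_flags bit */
    bool is_hotplug_port;
    int nr_children;
    struct dev *children[MAX_CHILDREN];
};

/* Mark dev and everything below it; called before taking the lock. */
static void mark_removing(struct dev *dev)
{
    dev->removing = true;
    for (int i = 0; i < dev->nr_children; i++)
        mark_removing(dev->children[i]);
}

/* Hypothetical model of pciehp_unconfigure_device(). */
static void unconfigure_port(struct dev *port)
{
    bool take_lock = !port->removing;   /* remover already holds it */

    if (take_lock) {
        pthread_mutex_lock(&rescan_remove_lock);
        lock_acquisitions++;
    }
    /* ... tear down the devices below the port ... */
    if (take_lock)
        pthread_mutex_unlock(&rescan_remove_lock);
}

/* Models the pciehp thread that unbinding a hotplug port waits for. */
static void *pciehp_thread(void *arg)
{
    unconfigure_port(arg);
    return NULL;
}

/* Unbind hotplug ports bottom-up, waiting for each port's thread. */
static void unbind_subtree(struct dev *dev)
{
    for (int i = 0; i < dev->nr_children; i++)
        unbind_subtree(dev->children[i]);

    if (dev->is_hotplug_port) {
        pthread_t t;
        int rc = pthread_create(&t, NULL, pciehp_thread, dev);

        assert(rc == 0);
        (void)rc;
        pthread_join(t, NULL);
    }
}

/* Hypothetical model of removal via sysfs. */
static void remove_device(struct dev *dev)
{
    mark_removing(dev);   /* before acquiring the lock */
    pthread_mutex_lock(&rescan_remove_lock);
    unbind_subtree(dev);
    pthread_mutex_unlock(&rescan_remove_lock);
}
```

For a root device with a hotplug port below it and an endpoint below that, remove_device() completes and lock_acquisitions stays 0, because the port's thread sees the flag and skips the lock; without the flag check, that thread would block while the remover waits for it. The model also hints at why this is only a band-aid: a pciehp thread that checked the flag before mark_removing() ran, and is already waiting for the lock, is not helped.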