Received: by 10.223.185.116 with SMTP id b49csp3767281wrg; Mon, 26 Feb 2018 05:50:26 -0800 (PST) X-Google-Smtp-Source: AG47ELtCbUbxGEGAGClsUkHIcmWK7yw6T81/FbZeA/I8JMzK9/hywBIv+QWFgTsSoqnQA6fCu3Hk X-Received: by 10.99.42.83 with SMTP id q80mr942127pgq.115.1519653026305; Mon, 26 Feb 2018 05:50:26 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1519653026; cv=none; d=google.com; s=arc-20160816; b=xdz5IJvY5g0+hhBDF0xBxvXsCIMKPfGya6/hCQySSgGJiIJL/9DOEoPnVIpYF7s26O 85o52k2BxnOxxDGYexp8rw1B1uTxZJ9/spx5vJ9VuPlIOVQ/DcBJZe18TzJFn96f2AL+ a5VJpVc5k3TNXybmD4AhPt1E8jWAqGBBy+VJJXjTqg1pRbnL5Psn3QLFRgGXv1y+lMxg +cI3w22RxrehQNRFh5MvAeTMGXROFG49w5IPNQt9p9pW2VOu7Xh3Pw/D8lt6X2y9UJaZ hd4k+l9oAoe+mLenfHFU6/Dc0687IsYo6sG3zTFg7c1Ypqui7TxdT5z8l1PxyA9+tkSY +N8w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:message-id:date:subject:cc :to:from:arc-authentication-results; bh=r9g5f23H0iq/hO1tpwMI0mAPA7sKSClqT1ldT/WQ8Fo=; b=wjglHNsOqCR/p1XjrQd3dRFAmODBjJ0xcV5tCNywQv9tltnhD5L9REaGMwJHDSQfOb ng4uVVPG0XvOliuk4eb5auPZWoBsCZN0XE6eA+rHvh0fOq1/laswXG1+2Mfnqnn4VK3D Gk52CCu3fCDFJC/zWIdx4qI7TPrp9xIFhse+LDzjidJ6xxu6zCSzoG3EnFfoK2ZKT9It Lw3YN+sfe1YhHV+XwVtsxHigXPyCiGNkyGTVnjIRVYoOV0svqu6+v5wSRZ/HrJdl47nh ZXmITaTPTasuZMt1ujcvKun6neq0CPIlmEkkAzkjpox5QaOWBwn8mAW+wGAfUtio5Di4 Eskw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d2-v6si6973345pln.115.2018.02.26.05.50.11; Mon, 26 Feb 2018 05:50:26 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752869AbeBZMuv (ORCPT + 99 others); Mon, 26 Feb 2018 07:50:51 -0500 Received: from szxga04-in.huawei.com ([45.249.212.190]:5675 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752548AbeBZMuu (ORCPT ); Mon, 26 Feb 2018 07:50:50 -0500 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id CC8BF4D4FEF72; Mon, 26 Feb 2018 20:50:32 +0800 (CST) Received: from linux-ibm.site (10.175.102.37) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.361.1; Mon, 26 Feb 2018 20:50:26 +0800 From: Xiongfeng Wang To: CC: , , Subject: [PATCH] pciehp: fix a race between pciehp and removing operations by sysfs Date: Mon, 26 Feb 2018 20:41:15 +0800 Message-ID: <1519648875-38196-1-git-send-email-wangxiongfeng2@huawei.com> X-Mailer: git-send-email 1.7.12.4 MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.175.102.37] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Xiongfeng Wang When I run a stress test about pcie hotplug and removing operations by sysfs, I got a hange task, and the following call trace is printed. INFO: task kworker/0:2:4413 blocked for more than 120 seconds. Tainted: P W O 4.12.0-rc1 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/0:2 D 0 4413 2 0x00000000 Workqueue: pciehp-0 pciehp_power_thread Call trace: [] __switch_to+0x94/0xa8 [] __schedule+0x1b0/0x708 [] schedule+0x40/0xa4 [] schedule_preempt_disabled+0x28/0x40 [] __mutex_lock.isra.8+0x148/0x50c [] __mutex_lock_slowpath+0x24/0x30 [] mutex_lock+0x48/0x54 [] pci_lock_rescan_remove+0x20/0x28 [] pciehp_unconfigure_device+0x54/0x1cc [] pciehp_disable_slot+0x4c/0xbc [] pciehp_power_thread+0xa0/0xb8 [] process_one_work+0x13c/0x3f8 [] worker_thread+0x60/0x3e4 [] kthread+0x10c/0x138 [] ret_from_fork+0x10/0x50 INFO: task bash:31732 blocked for more than 120 seconds. Tainted: P W O 4.12.0-rc1 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. bash D 0 31732 1 0x00000009 Call trace: [] __switch_to+0x94/0xa8 [] __schedule+0x1b0/0x708 [] schedule+0x40/0xa4 [] schedule_timeout+0x1a0/0x340 [] wait_for_common+0x108/0x1bc [] wait_for_completion+0x28/0x34 [] flush_workqueue+0x130/0x488 [] drain_workqueue+0xc4/0x164 [] destroy_workqueue+0x28/0x1f4 [] pciehp_release_ctrl+0x34/0xe0 [] pciehp_remove+0x30/0x3c [] pcie_port_remove_service+0x3c/0x54 [] device_release_driver_internal+0x150/0x1d0 [] device_release_driver+0x28/0x34 [] bus_remove_device+0xe0/0x11c [] device_del+0x200/0x304 [] device_unregister+0x20/0x38 [] remove_iter+0x44/0x54 [] device_for_each_child+0x4c/0x90 [] pcie_port_device_remove+0x2c/0x48 [] pcie_portdrv_remove+0x60/0x6c [] pci_device_remove+0x48/0x110 [] device_release_driver_internal+0x150/0x1d0 [] device_release_driver+0x28/0x34 [] pci_stop_bus_device+0x9c/0xac [] pci_stop_and_remove_bus_device_locked+0x24/0x3c [] remove_store+0x74/0x80 [] dev_attr_store+0x44/0x5c [] sysfs_kf_write+0x5c/0x74 [] kernfs_fop_write+0xcc/0x1dc [] __vfs_write+0x48/0x13c [] vfs_write+0xa8/0x198 [] SyS_write+0x54/0xb0 [] el0_svc_naked+0x24/0x28 There is a race condition between these two kinds of operations. When the Attention button on a PCIE slot is pressed, 5 seconds later, pciehp_power_thread() will be scheduled on slot->wq. This function will call pciehp_unconfigure_device(), which will try to get a global mutex lock 'pci_rescan_remove_lock'. At the same time, we remove the pcie port by sysfs, which results in pci_stop_and_remove_bus_device_locked() called. This function will get the global mutex lock 'pci_rescan_remove_lock', and then release the struct 'ctrl', which will wait until the work_struct on slot->wq is finished. If pci_stop_and_remove_bus_device_locked() got the mutex lock, and before it drains workqueue slot->wq, pciehp_power_thread() is scheduled on slot->wq and tries to get the mutex lock but failed, so it will just wait. Then pci_stop_and_remove_bus_device_locked() tries to drain workqueue slot->wq and wait until work struct 'pciehp_power_thread()' is finished. Then a hung_task occurs. So this two kinds of operation, removing through attention buttion and removing through /sys/devices/pci***/remove, should not be excuted at the same time. This patch add a global variable to mark that one of these operations is under processing. When this variable is set, if another operation is requested, it will be rejected. At first, I want to add a flag for each pci slot to record whether a removing operation is under processing. When a bridge is being removed, the flags of all the slots below the bridge need to be checked. But it is hard for us to guarantee the atomic access. So I just use a global flag. Signed-off-by: Xiongfeng Wang --- drivers/pci/hotplug/pciehp_ctrl.c | 7 +++++++ drivers/pci/hotplug/pciehp_hpc.c | 12 +++++++++++- drivers/pci/pci-sysfs.c | 11 +++++++++-- drivers/pci/remove.c | 6 ++++++ include/linux/pci.h | 3 +++ 5 files changed, 36 insertions(+), 3 deletions(-) diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c index c684faa..e6fc5d7 100644 --- a/drivers/pci/hotplug/pciehp_ctrl.c +++ b/drivers/pci/hotplug/pciehp_ctrl.c @@ -30,6 +30,7 @@ void pciehp_queue_interrupt_event(struct slot *p_slot, u32 event_type) info = kmalloc(sizeof(*info), GFP_ATOMIC); if (!info) { ctrl_err(p_slot->ctrl, "dropped event %d (ENOMEM)\n", event_type); + slot_being_removed_rescanned = 0; return; } @@ -174,6 +175,7 @@ static void pciehp_power_thread(struct work_struct *work) mutex_lock(&p_slot->lock); p_slot->state = STATIC_STATE; mutex_unlock(&p_slot->lock); + slot_being_removed_rescanned = 0; break; case ENABLE_REQ: mutex_lock(&p_slot->hotplug_lock); @@ -184,6 +186,7 @@ static void pciehp_power_thread(struct work_struct *work) mutex_lock(&p_slot->lock); p_slot->state = STATIC_STATE; mutex_unlock(&p_slot->lock); + slot_being_removed_rescanned = 0; break; default: break; @@ -202,6 +205,7 @@ static void pciehp_queue_power_work(struct slot *p_slot, int req) if (!info) { ctrl_err(p_slot->ctrl, "no memory to queue %s request\n", (req == ENABLE_REQ) ? "poweron" : "poweroff"); + slot_being_removed_rescanned = 0; return; } info->p_slot = p_slot; @@ -270,6 +274,7 @@ static void handle_button_press_event(struct slot *p_slot) ctrl_info(ctrl, "Slot(%s): Action canceled due to button press\n", slot_name(p_slot)); p_slot->state = STATIC_STATE; + slot_being_removed_rescanned = 0; break; case POWEROFF_STATE: case POWERON_STATE: @@ -280,10 +285,12 @@ static void handle_button_press_event(struct slot *p_slot) */ ctrl_info(ctrl, "Slot(%s): Button ignored\n", slot_name(p_slot)); + slot_being_removed_rescanned = 0; break; default: ctrl_err(ctrl, "Slot(%s): Ignoring invalid state %#x\n", slot_name(p_slot), p_slot->state); + slot_being_removed_rescanned = 0; break; } } diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 18a42f8..5ef5387 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -608,7 +608,17 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) if (events & PCI_EXP_SLTSTA_ABP) { ctrl_info(ctrl, "Slot(%s): Attention button pressed\n", slot_name(slot)); - pciehp_queue_interrupt_event(slot, INT_BUTTON_PRESS); + + if (!test_and_set_bit(0, &slot_being_removed_rescanned)) + pciehp_queue_interrupt_event(slot, INT_BUTTON_PRESS); + else { + if (slot->state == BLINKINGOFF_STATE || slot->state == BLINKINGON_STATE) + pciehp_queue_interrupt_event(slot, INT_BUTTON_PRESS); + else + ctrl_info(ctrl, "Slot(%s): Slot operation failed because a remove or" + " rescan operation is under processing, please try later!\n", + slot_name(slot)); + } } /* diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index eb6bee8..9fd1699 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -499,8 +499,15 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr, if (kstrtoul(buf, 0, &val) < 0) return -EINVAL; - if (val && device_remove_file_self(dev, attr)) - pci_stop_and_remove_bus_device_locked(to_pci_dev(dev)); + if (val && device_remove_file_self(dev, attr)) { + if (!test_and_set_bit(0, &slot_being_removed_rescanned)) { + pci_stop_and_remove_bus_device_locked(to_pci_dev(dev)); + slot_being_removed_rescanned = 0; + } else { + pr_info("Slot is being removed or rescanned, please try later!\n"); + return -EPERM; + } + } return count; } static struct device_attribute dev_remove_attr = __ATTR(remove, diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c index 6f072ea..6d36d53 100644 --- a/drivers/pci/remove.c +++ b/drivers/pci/remove.c @@ -4,6 +4,12 @@ #include #include "pci.h" +/* + * When a slot is being hotplug through Attention Button or being + * removed/rescanned through sysfs, this flag is set. + */ +unsigned long slot_being_removed_rescanned; + static void pci_free_resources(struct pci_dev *dev) { int i; diff --git a/include/linux/pci.h b/include/linux/pci.h index 024a1be..e1711ce 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -846,6 +846,9 @@ enum pcie_bus_config_types { /* Do NOT directly access these two variables, unless you are arch-specific PCI * code, or PCI core code. */ extern struct list_head pci_root_buses; /* List of all known PCI buses */ + +extern unsigned long slot_being_removed_rescanned; + /* Some device drivers need know if PCI is initiated */ int no_pci_devices(void); -- 1.7.12.4