Subject: Re: [RFC PATCH][resend] pciehp: fix a race between pciehp and removing operations by sysfs
From: Xiongfeng Wang
Date: Tue, 12 Dec 2017 19:30:31 +0800
Message-ID: <55cc79f2-cece-c84b-5c95-050a954d3669@huawei.com>
In-Reply-To: <1513067384-10914-1-git-send-email-wangxiongfeng2@huawei.com>
References: <1513067384-10914-1-git-send-email-wangxiongfeng2@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch seems to introduce another issue. pciehp_power_thread() uses
'container_of' to get the 'slot' from the 'work_struct'. If the 'slot' has
already been freed by the time the work runs, the work will access freed
memory.

On 2017/12/12 16:29, Xiongfeng Wang wrote:
> When the Attention button on a PCIE slot is pressed, 5 seconds later,
> pciehp_power_thread() will be scheduled on slot->wq. This function will
> get a global mutex lock 'pci_rescan_remove_lock' in
> pciehp_unconfigure_device().
>
> At the same time, we remove the pcie port by sysfs, which results in
> pci_stop_and_remove_bus_device_locked() being called. This function will get
> the global mutex lock 'pci_rescan_remove_lock', and then release the
> struct 'ctrl', which will wait until the work_struct on slot->wq is
> finished.
>
> If pci_stop_and_remove_bus_device_locked() got the mutex lock, and
> before it drains workqueue slot->wq, pciehp_power_thread() is scheduled
> on slot->wq and tries to get the mutex lock. Then
> pci_stop_and_remove_bus_device_locked() tries to drain workqueue
> slot->wq and waits until work struct 'pciehp_power_thread()' is finished.
> Then a hung_task happens.
>
> This patch solves this problem by scheduling 'pciehp_power_thread()' on a
> system workqueue instead of slot->wq.
>
> The Call Trace we got is as follows.
>
> INFO: task kworker/0:2:4413 blocked for more than 120 seconds.
> Tainted: P W O 4.12.0-rc1 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/0:2 D 0 4413 2 0x00000000
> Workqueue: pciehp-0 pciehp_power_thread
> Call trace:
> [] __switch_to+0x94/0xa8
> [] __schedule+0x1b0/0x708
> [] schedule+0x40/0xa4
> [] schedule_preempt_disabled+0x28/0x40
> [] __mutex_lock.isra.8+0x148/0x50c
> [] __mutex_lock_slowpath+0x24/0x30
> [] mutex_lock+0x48/0x54
> [] pci_lock_rescan_remove+0x20/0x28
> [] pciehp_unconfigure_device+0x54/0x1cc
> [] pciehp_disable_slot+0x4c/0xbc
> [] pciehp_power_thread+0xa0/0xb8
> [] process_one_work+0x13c/0x3f8
> [] worker_thread+0x60/0x3e4
> [] kthread+0x10c/0x138
> [] ret_from_fork+0x10/0x50
> INFO: task bash:31732 blocked for more than 120 seconds.
> Tainted: P W O 4.12.0-rc1 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> bash D 0 31732 1 0x00000009
> Call trace:
> [] __switch_to+0x94/0xa8
> [] __schedule+0x1b0/0x708
> [] schedule+0x40/0xa4
> [] schedule_timeout+0x1a0/0x340
> [] wait_for_common+0x108/0x1bc
> [] wait_for_completion+0x28/0x34
> [] flush_workqueue+0x130/0x488
> [] drain_workqueue+0xc4/0x164
> [] destroy_workqueue+0x28/0x1f4
> [] pciehp_release_ctrl+0x34/0xe0
> [] pciehp_remove+0x30/0x3c
> [] pcie_port_remove_service+0x3c/0x54
> [] device_release_driver_internal+0x150/0x1d0
> [] device_release_driver+0x28/0x34
> [] bus_remove_device+0xe0/0x11c
> [] device_del+0x200/0x304
> [] device_unregister+0x20/0x38
> [] remove_iter+0x44/0x54
> [] device_for_each_child+0x4c/0x90
> [] pcie_port_device_remove+0x2c/0x48
> [] pcie_portdrv_remove+0x60/0x6c
> [] pci_device_remove+0x48/0x110
> [] device_release_driver_internal+0x150/0x1d0
> [] device_release_driver+0x28/0x34
> [] pci_stop_bus_device+0x9c/0xac
> [] pci_stop_and_remove_bus_device_locked+0x24/0x3c
> [] remove_store+0x74/0x80
> [] dev_attr_store+0x44/0x5c
> [] sysfs_kf_write+0x5c/0x74
> [] kernfs_fop_write+0xcc/0x1dc
> [] __vfs_write+0x48/0x13c
> [] vfs_write+0xa8/0x198
> [] SyS_write+0x54/0xb0
> [] el0_svc_naked+0x24/0x28
>
> Signed-off-by: Xiongfeng Wang
> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 83f3d4a..9d39d85 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -221,7 +221,7 @@ static void pciehp_queue_power_work(struct slot *p_slot, int req)
>  	info->p_slot = p_slot;
>  	INIT_WORK(&info->work, pciehp_power_thread);
>  	info->req = req;
> -	queue_work(p_slot->wq, &info->work);
> +	schedule_work(&info->work);
>  }
>
>  void pciehp_queue_pushbutton_work(struct work_struct *work)
>
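
To illustrate the container_of concern raised at the top of this reply,
here is a minimal userspace sketch of how a work handler recovers its
enclosing request from the embedded work item and then follows the pointer
to the slot. The struct layouts are simplified stand-ins, not the real
kernel definitions; the helper names work_to_info() and slot_id_from_work()
are hypothetical; and the container_of macro is a userspace approximation
of the kernel one.

```c
#include <stddef.h>

/* Userspace approximation of the kernel's container_of(): step back from a
 * pointer to an embedded member to the start of the enclosing structure. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins (assumptions), not the real kernel structures. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

struct slot {
	int id;
};

struct power_work_info {
	struct slot *p_slot;      /* borrowed pointer; the slot is owned elsewhere */
	int req;
	struct work_struct work;  /* embedded work item handed to the workqueue */
};

/* Hypothetical helper: recover the request that embeds this work item. */
static struct power_work_info *work_to_info(struct work_struct *work)
{
	return container_of(work, struct power_work_info, work);
}

/* Hypothetical helper: what a handler effectively does before touching the
 * slot. The dereference is only valid while *info->p_slot is still
 * allocated; nothing in the work item itself keeps the slot alive. */
static int slot_id_from_work(struct work_struct *work)
{
	struct power_work_info *info = work_to_info(work);

	return info->p_slot->id;
}
```

The point of the sketch is that the work item only carries a borrowed
pointer to the slot. Once the work runs on a system workqueue, draining
slot->wq no longer waits for it, so the slot can be released while the work
is still pending, unless the work is flushed or the slot is pinned by some
other means.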