Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756423AbZD1FSQ (ORCPT ); Tue, 28 Apr 2009 01:18:16 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751922AbZD1FSA (ORCPT ); Tue, 28 Apr 2009 01:18:00 -0400 Received: from g1t0029.austin.hp.com ([15.216.28.36]:44805 "EHLO g1t0029.austin.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751295AbZD1FR7 (ORCPT ); Tue, 28 Apr 2009 01:17:59 -0400 X-IMAP-Sender: achiang Date: Mon, 27 Apr 2009 23:15:49 -0600 X-OfflineIMAP-2053168646-6c646c-494e424f582e4f7574626f78: 1240895877-0989849774897-v6.0.3 From: Alex Chiang To: Andrew Morton Cc: Greg Kroah-Hartman , linux-kernel@vger.kernel.org, Lai Jiangshan , a.p.zijlstra@chello.nl Subject: Re: [PATCH 01/10] sysfs: don't use global workqueue in sysfs_schedule_callback() Message-ID: <20090428051549.GE12234@soy.asiapacific.hpqcorp.net> Mail-Followup-To: Alex Chiang , Andrew Morton , Greg Kroah-Hartman , linux-kernel@vger.kernel.org, Lai Jiangshan , a.p.zijlstra@chello.nl References: <20090417190144.GB3548@kroah.com> <1239995074-4065-1-git-send-email-gregkh@suse.de> <20090422131402.b35a76bc.akpm@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20090422131402.b35a76bc.akpm@linux-foundation.org> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5102 Lines: 120 * Andrew Morton : > On Fri, 17 Apr 2009 12:04:25 -0700 > Greg Kroah-Hartman wrote: > > > From: Alex Chiang > > > > A sysfs attribute using sysfs_schedule_callback() to commit suicide > > may end up calling device_unregister(), which will eventually call > > a driver's ->remove function. > > > > Drivers may call flush_scheduled_work() in their shutdown routines, > > in which case lockdep will complain with something like the following: > > > > ============================================= > > [ INFO: possible recursive locking detected ] > > 2.6.29-rc8-kk #1 > > --------------------------------------------- > > events/4/56 is trying to acquire lock: > > (events){--..}, at: [] flush_workqueue+0x0/0xa0 > > > > but task is already holding lock: > > (events){--..}, at: [] run_workqueue+0x108/0x230 > > > > other info that might help us debug this: > > 3 locks held by events/4/56: > > #0: (events){--..}, at: [] run_workqueue+0x108/0x230 > > #1: (&ss->work){--..}, at: [] run_workqueue+0x108/0x230 > > #2: (pci_remove_rescan_mutex){--..}, at: [] remove_callback+0x21/0x40 > > > > stack backtrace: > > Pid: 56, comm: events/4 Not tainted 2.6.29-rc8-kk #1 > > Call Trace: > > [] validate_chain+0xb7d/0x1260 > > [] __lock_acquire+0x42e/0xa40 > > [] lock_acquire+0x58/0x80 > > [] ? flush_workqueue+0x0/0xa0 > > [] flush_workqueue+0x4d/0xa0 > > [] ? flush_workqueue+0x0/0xa0 > > [] flush_scheduled_work+0x10/0x20 > > [] e1000_remove+0x55/0xfe [e1000e] > > [] ? sysfs_schedule_callback_work+0x0/0x50 > > [] pci_device_remove+0x32/0x70 > > [] __device_release_driver+0x59/0x90 > > [] device_release_driver+0x2b/0x40 > > [] bus_remove_device+0xa6/0x120 > > [] device_del+0x12b/0x190 > > [] device_unregister+0x26/0x70 > > [] pci_stop_dev+0x49/0x60 > > [] pci_remove_bus_device+0x40/0xc0 > > [] remove_callback+0x29/0x40 > > [] sysfs_schedule_callback_work+0x1f/0x50 > > [] run_workqueue+0x15a/0x230 > > [] ? run_workqueue+0x108/0x230 > > [] worker_thread+0x9f/0x100 > > [] ? autoremove_wake_function+0x0/0x40 > > [] ? worker_thread+0x0/0x100 > > [] kthread+0x4d/0x80 > > [] child_rip+0xa/0x20 > > [] ? restore_args+0x0/0x30 > > [] ? kthread+0x0/0x80 > > [] ? child_rip+0x0/0x20 > > > > Although we know that the device_unregister path will never acquire > > a lock that a driver might try to acquire in its ->remove, in general > > we should never attempt to flush a workqueue from within the same > > workqueue, and lockdep rightly complains. > > > > So as long as sysfs attributes cannot commit suicide directly and we > > are stuck with this callback mechanism, put the sysfs callbacks on > > their own workqueue instead of the global one. > > > > This has the side benefit that if a suicidal sysfs attribute kicks > > off a long chain of ->remove callbacks, we no longer induce a long > > delay on the global queue. > > I still don't know why I merged > > : commit 2355b70fd59cb5be7de2052a9edeee7afb7ff099 > : Author: Lai Jiangshan > : Date: Thu Apr 2 16:58:24 2009 -0700 > : > : workqueue: avoid recursion in run_workqueue() > > there was nothing wrong with permitting limited recursion into > run_workqueue(). It never deadlocked and the three-deep-recursion > warning never triggered. According to Peter Zjilstra and Lai Jiangshan, there actually is danger of deadlock when flushing a workqueue from the same workqueue: http://thread.gmane.org/gmane.linux.kernel.pci/3713/focus=811703 And in any case, I think it's a good idea to avoid the global workqueue, as removing a PCI bridge near the root of the hierarchy may result in a longish series of operations as we unregister drivers, etc. > > + if (sysfs_workqueue == NULL) { > > + sysfs_workqueue = create_workqueue("sysfsd"); > > + if (sysfs_workqueue == NULL) { > > + module_put(owner); > > + return -ENOMEM; > > + } > > + } > > This will create a kernel thread per CPU. Surely > create_singlethread_workqueue() will suffice? Oh darn, you are right. We do not need a per-CPU thread. I will queue up a patch to use create_singlethread_workqueue. Thanks for the review. /ac -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/