Date: Sun, 7 Jan 2007 15:56:03 +0300
From: Oleg Nesterov
To: Srivatsa Vaddagiri
Cc: Andrew Morton, David Howells, Christoph Hellwig, Ingo Molnar,
    Linus Torvalds, linux-kernel@vger.kernel.org, Gautham shenoy
Subject: Re: [PATCH] fix-flush_workqueue-vs-cpu_dead-race-update
Message-ID: <20070107125603.GA74@tv-sign.ru>
In-Reply-To: <20070107104328.GC13579@in.ibm.com>

On 01/07, Srivatsa Vaddagiri wrote:
>
> On Sat, Jan 06, 2007 at 08:34:16PM +0300, Oleg Nesterov wrote:
> > I suspect this can't help either.
> >
> > The problem is that flush_workqueue() may be called while cpu hotplug event
> > in progress and CPU_DEAD waits for kthread_stop(), so we have the same
> > deadlock if work->func() does flush_workqueue(). This means that Andrew's
> > change to use preempt_disable() is good and anyway needed.
>
> Well ..a lock_cpu_hotplug() in run_workqueue() and support for recursive
> calls to lock_cpu_hotplug() by the same thread will avoid the problem
> you mention.

Srivatsa, I'm completely new to cpu-hotplug, so please correct me if I'm
wrong (in fact I _hope_ I am wrong), but as I see it, the hotplug/workqueue
interaction is broken by design; it can't be fixed by changing the locking
alone.

Once again. A CPU dies, CPU_DEAD calls kthread_stop() and sleeps until
cwq->thread exits. To do so, this thread must at least complete the
currently running work->func().

work->func() calls flush_workqueue(WQ), which does lock_cpu_hotplug() or
_whatever_. Now the question: does it block?

if YES:
	This is what the stable tree does - deadlock.

if NOT:
	This is what we have with Andrew's s/mutex_lock/preempt_disable/
	patch - race or deadlock, we have a choice.

Suppose that WQ has pending works on that dead CPU. Note that at this point
this CPU is not present in cpu_online_map. This means that (without other
changes) we have lost.

	- flush_workqueue(WQ) can't return until CPU_DEAD transfers these
	  works to some other CPU in the cpu_online_map.

	- CPU_DEAD can't do take_over_work() until flush_workqueue() returns.

Andrew, Ingo, this also means that the freezer can't solve this particular
problem either (if I am right).

Thoughts?

Oleg.
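
The circular wait described in this mail can be checked mechanically. What
follows is a minimal sketch in plain userspace C, not kernel code: it treats
each step as an event in a dependency graph and looks for a cycle. The event
names, build_graph() and has_deadlock() are hypothetical, and the edges encode
only the assumptions stated above (the hotplug lock is not released until
CPU_DEAD handling finishes, and take_over_work() runs only after
kthread_stop() returns).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Events in the scenario. Illustrative names, not kernel symbols. */
enum event {
	KTHREAD_STOP_RETURNS,	/* CPU_DEAD's kthread_stop() completes */
	WORK_FUNC_RETURNS,	/* the running work->func() completes */
	FLUSH_RETURNS,		/* flush_workqueue(WQ) in work->func() returns */
	HOTPLUG_LOCK_RELEASED,	/* assumed: dropped only after CPU_DEAD is done */
	TAKE_OVER_WORK_DONE,	/* pending works moved to an online CPU */
	NR_EVENTS
};

/* dep[a][b] == true means: event a cannot happen before event b. */
typedef bool dep_graph[NR_EVENTS][NR_EVENTS];

enum color { WHITE, GREY, BLACK };

void build_graph(dep_graph dep, bool flush_blocks, bool pending_on_dead_cpu)
{
	memset(dep, 0, sizeof(dep_graph));

	/* kthread_stop() waits for cwq->thread, which must finish work->func(). */
	dep[KTHREAD_STOP_RETURNS][WORK_FUNC_RETURNS] = true;
	/* work->func() calls flush_workqueue(WQ) and needs it to return. */
	dep[WORK_FUNC_RETURNS][FLUSH_RETURNS] = true;
	/* Both of these happen only after kthread_stop() has returned. */
	dep[HOTPLUG_LOCK_RELEASED][KTHREAD_STOP_RETURNS] = true;
	dep[TAKE_OVER_WORK_DONE][KTHREAD_STOP_RETURNS] = true;

	if (flush_blocks)
		/* "if YES": the flush sleeps on the hotplug lock. */
		dep[FLUSH_RETURNS][HOTPLUG_LOCK_RELEASED] = true;
	else if (pending_on_dead_cpu)
		/* "if NOT": the flush waits for works stuck on the dead CPU. */
		dep[FLUSH_RETURNS][TAKE_OVER_WORK_DONE] = true;
}

/* Depth-first search; reaching a GREY node again means a cycle. */
static bool visit(dep_graph dep, enum color *color, int node)
{
	assert(node >= 0 && node < NR_EVENTS);
	color[node] = GREY;
	for (int next = 0; next < NR_EVENTS; next++) {
		if (!dep[node][next])
			continue;
		if (color[next] == GREY)
			return true;
		if (color[next] == WHITE && visit(dep, color, next))
			return true;
	}
	color[node] = BLACK;
	return false;
}

/* True if the dependency graph for this configuration contains a cycle. */
bool has_deadlock(bool flush_blocks, bool pending_on_dead_cpu)
{
	dep_graph dep;
	enum color color[NR_EVENTS] = { WHITE };

	build_graph(dep, flush_blocks, pending_on_dead_cpu);
	for (int n = 0; n < NR_EVENTS; n++)
		if (color[n] == WHITE && visit(dep, color, n))
			return true;
	return false;
}
```

Under these assumptions, has_deadlock(true, false) and has_deadlock(false, true)
both find a cycle. Only the non-blocking case with no pending work on the dead
CPU comes out clean, and the race in that case is not modeled here.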