From: "Srivatsa S. Bhat"
Date: Fri, 05 Oct 2012 11:05:58 +0530
To: Yasuaki Ishimatsu
CC: Andrew Morton, Jiri Kosina, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, "Paul E. McKenney", Christoph Lameter, Pekka Enberg, Josh Triplett, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
Message-ID: <506E71BE.5030602@linux.vnet.ibm.com>
In-Reply-To: <506E52E1.3090609@jp.fujitsu.com>

On 10/05/2012 08:54 AM, Yasuaki Ishimatsu wrote:
> 2012/10/04 15:16, Srivatsa S. Bhat wrote:
>> On 10/04/2012 02:43 AM, Andrew Morton wrote:
>>> On Wed, 03 Oct 2012 18:23:09 +0530
>>> "Srivatsa S. Bhat" wrote:
>>>
>>>> The synchronization between CPU hotplug readers and writers is
>>>> achieved by means of refcounting, safe-guarded by the
>>>> cpu_hotplug.lock.
>>>>
>>>> get_online_cpus() increments the refcount, whereas put_online_cpus()
>>>> decrements it. If we ever hit an imbalance between the two, we end up
>>>> compromising the guarantees of the hotplug synchronization i.e, for
>>>> example, an extra call to put_online_cpus() can end up allowing a
>>>> hotplug reader to execute concurrently with a hotplug writer. So, add
>>>> a BUG_ON() in put_online_cpus() to detect such cases where the
>>>> refcount can go negative.
>>>>
>>>> Signed-off-by: Srivatsa S. Bhat
>>>> ---
>>>>
>>>>  kernel/cpu.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>>>> index f560598..00d29bc 100644
>>>> --- a/kernel/cpu.c
>>>> +++ b/kernel/cpu.c
>>>> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>>>>      if (cpu_hotplug.active_writer == current)
>>>>          return;
>>>>      mutex_lock(&cpu_hotplug.lock);
>>>> +    BUG_ON(cpu_hotplug.refcount == 0);
>>>>      if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>>>          wake_up_process(cpu_hotplug.active_writer);
>>>>      mutex_unlock(&cpu_hotplug.lock);
>>>
>>> I think calling BUG() here is a bit harsh. We should only do that if
>>> there's a risk to proceeding: a risk of data loss, a reduced ability to
>>> analyse the underlying bug, etc.
>>>
>>> But a cpu-hotplug locking imbalance is a really really really minor
>>> problem! So how about we emit a warning then try to fix things up?
>>
>> That would be better indeed, thanks!
>>
>>> This should increase the chance that the machine will keep running and
>>> so will increase the chance that a user will be able to report the bug
>>> to us.
>>>
>>
>> Yep, sounds good.
>>
>>>
>>> --- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
>>> +++ a/kernel/cpu.c
>>> @@ -80,9 +80,12 @@ void put_online_cpus(void)
>>>      if (cpu_hotplug.active_writer == current)
>>>          return;
>>>      mutex_lock(&cpu_hotplug.lock);
>>> -    BUG_ON(cpu_hotplug.refcount == 0);
>>> -    if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>> -        wake_up_process(cpu_hotplug.active_writer);
>>> +    if (!--cpu_hotplug.refcount) {
>>
>> This won't catch it. We'll enter this 'if' condition only when
>> cpu_hotplug.refcount was decremented to zero. We'll miss out the case
>> when it went negative (which we intended to detect).
>>
>>> +        if (WARN_ON(cpu_hotplug.refcount == -1))
>>> +            cpu_hotplug.refcount++; /* try to fix things up */
>>> +        if (unlikely(cpu_hotplug.active_writer))
>>> +            wake_up_process(cpu_hotplug.active_writer);
>>> +    }
>>>      mutex_unlock(&cpu_hotplug.lock);
>>>
>>>  }
>>
>> So how about something like below:
>>
>> ------------------------------------------------------>
>>
>> From: Srivatsa S. Bhat
>> Subject: [PATCH] CPU hotplug, debug: Detect imbalance between
>> get_online_cpus() and put_online_cpus()
>>
>> The synchronization between CPU hotplug readers and writers is
>> achieved by means of refcounting, safe-guarded by the
>> cpu_hotplug.lock.
>>
>> get_online_cpus() increments the refcount, whereas put_online_cpus()
>> decrements it. If we ever hit an imbalance between the two, we end up
>> compromising the guarantees of the hotplug synchronization i.e, for
>> example, an extra call to put_online_cpus() can end up allowing a
>> hotplug reader to execute concurrently with a hotplug writer. So, add
>> a WARN_ON() in put_online_cpus() to detect such cases where the
>> refcount can go negative, and also attempt to fix it up, so that we
>> can continue to run.
>>
>> Signed-off-by: Srivatsa S. Bhat
>> ---
>
> Looks good to me.
> Reviewed-by: Yasuaki Ishimatsu
>

Thanks for your review Yasuaki!

Regards,
Srivatsa S. Bhat

>>
>>  kernel/cpu.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index f560598..42bd331 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -80,6 +80,10 @@ void put_online_cpus(void)
>>      if (cpu_hotplug.active_writer == current)
>>          return;
>>      mutex_lock(&cpu_hotplug.lock);
>> +
>> +    if (WARN_ON(!cpu_hotplug.refcount))
>> +        cpu_hotplug.refcount++; /* try to fix things up */
>> +
>>      if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>          wake_up_process(cpu_hotplug.active_writer);
>>      mutex_unlock(&cpu_hotplug.lock);
>>
>>
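
For anyone following the thread, the difference between the two placements of the check can be sketched in userspace. This is only an illustrative model, not kernel code: struct model, put_nested_check() and put_check_first() are hypothetical names, the warned flag stands in for WARN_ON(), and the mutex and the writer wake-up are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for cpu_hotplug.refcount. The lock and the
 * active_writer wake-up from the real function are omitted. */
struct model {
    int refcount;
    bool warned;   /* set where the kernel code would hit WARN_ON() */
};

/* Check nested inside the "dropped to zero" branch, as in the interim
 * suggestion. Inside the branch refcount is exactly 0, so the -1 test
 * can never be true and an unbalanced put goes unnoticed. */
static void put_nested_check(struct model *m)
{
    if (!--m->refcount) {
        if (m->refcount == -1) {
            m->warned = true;
            m->refcount++;
        }
    }
}

/* Check before the decrement, as in the final patch. */
static void put_check_first(struct model *m)
{
    if (!m->refcount) {
        m->warned = true;
        m->refcount++;   /* try to fix things up */
    }
    --m->refcount;

    /* In this model, a count that starts non-negative stays non-negative. */
    assert(m->refcount >= 0);
}
```

Calling each function once on a model whose refcount is 0 (a put with no matching get) leaves the nested variant at refcount -1 with no warning, while the check-first variant warns and leaves refcount at 0.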