Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756028Ab2JCVNO (ORCPT ); Wed, 3 Oct 2012 17:13:14 -0400
Received: from mail.linuxfoundation.org ([140.211.169.12]:34079 "EHLO
	mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755592Ab2JCVNN (ORCPT );
	Wed, 3 Oct 2012 17:13:13 -0400
Date: Wed, 3 Oct 2012 14:13:11 -0700
From: Andrew Morton
To: "Srivatsa S. Bhat"
Cc: Jiri Kosina, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	"Paul E. McKenney", Christoph Lameter, Pekka Enberg,
	"Paul E. McKenney", Josh Triplett,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] CPU hotplug, debug: Detect imbalance between
	get_online_cpus() and put_online_cpus()
Message-Id: <20121003141311.09fb3ffc.akpm@linux-foundation.org>
In-Reply-To: <506C3535.3070401@linux.vnet.ibm.com>
References: <20121002170149.GC2465@linux.vnet.ibm.com>
	<20121002233138.GD2465@linux.vnet.ibm.com>
	<20121003001530.GF2465@linux.vnet.ibm.com>
	<506C2E02.9080804@linux.vnet.ibm.com>
	<506C3535.3070401@linux.vnet.ibm.com>
X-Mailer: Sylpheed 3.0.2 (GTK+ 2.20.1; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2671
Lines: 68

On Wed, 03 Oct 2012 18:23:09 +0530
"Srivatsa S. Bhat" wrote:

> The synchronization between CPU hotplug readers and writers is achieved by
> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>
> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
> it. If we ever hit an imbalance between the two, we end up compromising the
> guarantees of the hotplug synchronization i.e, for example, an extra call to
> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
> where the refcount can go negative.
>
> Signed-off-by: Srivatsa S. Bhat
> ---
>
>  kernel/cpu.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index f560598..00d29bc 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>  	if (cpu_hotplug.active_writer == current)
>  		return;
>  	mutex_lock(&cpu_hotplug.lock);
> +	BUG_ON(cpu_hotplug.refcount == 0);
>  	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>  		wake_up_process(cpu_hotplug.active_writer);
>  	mutex_unlock(&cpu_hotplug.lock);

I think calling BUG() here is a bit harsh.  We should only do that if
there's a risk to proceeding: a risk of data loss, a reduced ability to
analyse the underlying bug, etc.

But a cpu-hotplug locking imbalance is a really really really minor
problem!  So how about we emit a warning then try to fix things up?

This should increase the chance that the machine will keep running and
so will increase the chance that a user will be able to report the bug
to us.

--- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
+++ a/kernel/cpu.c
@@ -80,9 +80,12 @@ void put_online_cpus(void)
 	if (cpu_hotplug.active_writer == current)
 		return;
 	mutex_lock(&cpu_hotplug.lock);
-	BUG_ON(cpu_hotplug.refcount == 0);
-	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
-		wake_up_process(cpu_hotplug.active_writer);
+	if (!--cpu_hotplug.refcount) {
+		if (WARN_ON(cpu_hotplug.refcount == -1))
+			cpu_hotplug.refcount++; /* try to fix things up */
+		if (unlikely(cpu_hotplug.active_writer))
+			wake_up_process(cpu_hotplug.active_writer);
+	}
 	mutex_unlock(&cpu_hotplug.lock);
 }

_
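
For illustration only, here is a minimal userspace sketch of the "warn, then try to fix things up" idea from the reply above. It is not the kernel code: `struct hotplug_state`, `hp`, `get_online()`, `put_online()` and the `warnings` counter are hypothetical stand-ins, `fprintf()` stands in for WARN_ON(), and the locking and writer wake-up are left out (in the kernel the count is only touched under cpu_hotplug.lock). One deliberate difference: the sketch tests the count before decrementing it. In the diff as posted, the `refcount == -1` test sits inside the `!--cpu_hotplug.refcount` branch, where the count has just become 0, so as written it does not look like it can fire.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's cpu_hotplug state. */
struct hotplug_state {
	int refcount;		/* number of active readers */
	unsigned int warnings;	/* imbalances detected so far */
};

static struct hotplug_state hp = { 0, 0 };

/* Reader entry: take a reference (locking omitted in this sketch). */
static void get_online(void)
{
	hp.refcount++;
}

/*
 * Reader exit: drop a reference. An unbalanced call is reported and
 * repaired instead of aborting, so the program keeps running.
 */
static void put_online(void)
{
	if (hp.refcount == 0) {
		/* Stand-in for WARN_ON(): report, but do not crash. */
		fprintf(stderr, "warning: unbalanced put_online()\n");
		hp.warnings++;
		hp.refcount++;	/* try to fix things up */
	}
	hp.refcount--;
}
```

With this shape, a balanced get_online()/put_online() pair leaves the count at 0 without a warning, and one extra put_online() produces a single warning while the count stays at 0 rather than going negative.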