Date: Tue, 14 Mar 2017 18:43:45 +0100 (CET)
From: Thomas Gleixner
To: Bart Van Assche
cc: "bigeasy@linutronix.de", "torvalds@linux-foundation.org",
    "linux-kernel@vger.kernel.org", "mingo@kernel.org",
    "akpm@linux-foundation.org", "hpa@zytor.com"
Subject: Re: [PATCH] cpu/hotplug: Serialize callback invocations proper
In-Reply-To: <1489513067.2676.9.camel@sandisk.com>
References: <1488851515.6858.2.camel@sandisk.com>
    <20170314150645.g4tdyoszlcbajmna@linutronix.de>
    <1489513067.2676.9.camel@sandisk.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 14 Mar 2017, Bart Van Assche wrote:
> On Tue, 2017-03-14 at 16:06 +0100, Sebastian Andrzej Siewior wrote:
> > The setup/remove_state/instance() functions in the hotplug core code are
> > serialized against concurrent CPU hotplug, but unfortunately not serialized
> > against themself.
> >
> > As a consequence a concurrent invocation of these function results in
> > corruption of the callback machinery because two instances try to invoke
> > callbacks on remote cpus at the same time. This results in missing callback
> > invocations and initiator threads waiting forever on the completion.
> >
> > The obvious solution to replace get_cpu_online() with cpu_hotplug_begin()
> > is not possible because at least one callsite calls into these functions
> > from a get_online_cpu() locked region.
> >
> > Extend the protection scope of the cpuhp_state_mutex from solely protecting
> > the state arrays to cover the callback invocation machinery as well.
> >
> > Reported-by: Bart Van Assche
> > Fixes: 5b7aa87e0482 ("cpu/hotplug: Implement setup/removal interface")
> > Signed-off-by: Sebastian Andrzej Siewior
>
> Tested-by: Bart Van Assche
>
> So this regression was introduced in kernel v4.6? Anyway, thanks for the patch!

And it's very timing sensitive ....

Thanks for trying to bisect this!

	tglx