Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754662AbbB0ItP (ORCPT ); Fri, 27 Feb 2015 03:49:15 -0500 Received: from e9.ny.us.ibm.com ([32.97.182.139]:55826 "EHLO e9.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753201AbbB0ItO (ORCPT ); Fri, 27 Feb 2015 03:49:14 -0500 Message-ID: <54F02F81.10901@linux.vnet.ibm.com> Date: Fri, 27 Feb 2015 14:19:05 +0530 From: Preeti U Murthy User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Nicolas Pitre , Peter Zijlstra , tglx@linutronix.de CC: linux-kernel@vger.kernel.org, mingo@kernel.org, rjw@rjwysocki.net, Michael Ellerman Subject: Re: [PATCH 32/35] clockevents: Fix cpu down race for hrtimer based broadcasting References: <20150216121435.203983131@infradead.org> <20150216122413.880378334@infradead.org> <20150221124659.GG23367@worktop.ger.corp.intel.com> <20150223161457.GA5029@twins.programming.kicks-ass.net> <54EEAFCF.4080208@linux.vnet.ibm.com> In-Reply-To: <54EEAFCF.4080208@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 15022708-0033-0000-0000-000001F7FE7E Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5876 Lines: 169 On 02/26/2015 11:01 AM, Preeti U Murthy wrote: > On 02/23/2015 11:03 PM, Nicolas Pitre wrote: >> On Mon, 23 Feb 2015, Nicolas Pitre wrote: >> >>> On Mon, 23 Feb 2015, Peter Zijlstra wrote: >>> >>>> The reported function that fails: bL_switcher_restore_cpus() is called >>>> in the error paths of the former and the main path in the latter to make >>>> the 'stolen' cpus re-appear. >>>> >>>> The patch in question somehow makes that go boom. >>>> >>>> >>>> Now what all do you need to do to make it go boom? Just enable/disable >>>> the switcher once and it'll explode? Or does it need to do actual >>>> switches while it is enabled? >>> >>> It gets automatically enabled during boot. Then several switches are >>> performed while user space is brought up. If I manually disable it >>> via /sys then it goes boom. >> >> OK. Forget the bL switcher. I configured it out of my kernel and then >> managed to get the same crash by simply hotplugging out one CPU and >> plugging it back in. >> >> $ echo 0 > /sys/devices/system/cpu/cpu2/online >> [CPU2 gone] >> $ echo 1 > /sys/devices/system/cpu/cpu2/online >> [Boom!] >> >> > I saw an issue with this patch as well. I tried to do an smt mode switch > on a power machine, i.e varying the number of hyperthreads on an SMT 8 > system, and the system hangs. Worse, there are no softlockup > messages/warnings/bug_ons reported. I am digging into this issue. > > A couple of points though. Looking at the patch, I see that we are > shutting down tick device of the hotplugged out cpu *much before* > migrating the timers and hrtimers from it. Migration of timers is done > in the CPU_DEAD phase, while we shutdown tick devices in the CPU_DYING > phase. There is quite a bit of a gap here. Earlier we would do both in a > single notification. > > Another point is that the tick devices are shutdown before the > hotplugged out cpu actually dies in __cpu_die(). At first look none of > these two points should create any issues. But since we are noticing > problems with this patch, I thought it would be best to put them forth. > > But why are tick devices being shutdown that early ? Is there any > specific advantage to this? Taking/handing over tick duties should be > done before __cpu_die(), but shutdown of tick devices should be done > after this phase. This seems more natural, doesn't it? The problem reported in the changelog of this patch is causing severe regressions very frequently on our machines for certain usecases. It would help to put in a fix in place first and then follow that up with these cleanups. A fix on the below lines : --------------------------------------------------------------------- clockevents: Fix cpu down race for hrtimer based broadcasting --- include/linux/tick.h | 10 +++++++--- kernel/cpu.c | 2 ++ kernel/time/tick-broadcast.c | 19 +++++++++++-------- 3 files changed, 20 insertions(+), 11 deletions(-) diff --git a/include/linux/tick.h b/include/linux/tick.h index eda850c..8735776 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -91,14 +91,18 @@ extern void tick_cancel_sched_timer(int cpu); static inline void tick_cancel_sched_timer(int cpu) { } # endif -# ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST +# if defined CONFIG_GENERIC_CLOCKEVENTS_BROADCAST extern struct tick_device *tick_get_broadcast_device(void); extern struct cpumask *tick_get_broadcast_mask(void); -# ifdef CONFIG_TICK_ONESHOT +# if defined CONFIG_TICK_ONESHOT extern struct cpumask *tick_get_broadcast_oneshot_mask(void); +extern void tick_takeover(int deadcpu); +# else +static void tick_takeover(int deadcpu) {} # endif - +# else +static void tick_takeover(int deadcpu) {} # endif /* BROADCAST */ # ifdef CONFIG_TICK_ONESHOT diff --git a/kernel/cpu.c b/kernel/cpu.c index 1972b16..f9ca351 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include "smpboot.h" @@ -411,6 +412,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen) while (!idle_cpu(cpu)) cpu_relax(); + tick_takeover(cpu); /* This actually kills the CPU. */ __cpu_die(cpu); diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c index 066f0ec..0fd6634 100644 --- a/kernel/time/tick-broadcast.c +++ b/kernel/time/tick-broadcast.c @@ -669,14 +669,19 @@ static void broadcast_shutdown_local(struct clock_event_device *bc, clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN); } -static void broadcast_move_bc(int deadcpu) +void tick_takeover(int deadcpu) { - struct clock_event_device *bc = tick_broadcast_device.evtdev; + struct clock_event_device *bc; + unsigned long flags; - if (!bc || !broadcast_needs_cpu(bc, deadcpu)) - return; - /* This moves the broadcast assignment to this cpu */ - clockevents_program_event(bc, bc->next_event, 1); + raw_spin_lock_irqsave(&tick_broadcast_lock, flags); + bc = tick_broadcast_device.evtdev; + + if (bc && broadcast_needs_cpu(bc, deadcpu)) { + /* This moves the broadcast assignment to this cpu */ + clockevents_program_event(bc, bc->next_event, 1); + } + raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags); } /* @@ -913,8 +918,6 @@ void tick_shutdown_broadcast_oneshot(unsigned int *cpup) cpumask_clear_cpu(cpu, tick_broadcast_pending_mask); cpumask_clear_cpu(cpu, tick_broadcast_force_mask); - broadcast_move_bc(cpu); - raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags); } Regards Preeti U Murthy > > > Regards > Preeti U Murthy > >> Nicolas >> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/