Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754058AbbGaPhA (ORCPT ); Fri, 31 Jul 2015 11:37:00 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42361 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752840AbbGaPg7 (ORCPT ); Fri, 31 Jul 2015 11:36:59 -0400 Date: Fri, 31 Jul 2015 11:36:56 -0400 From: Don Zickus To: Jiri Olsa Cc: Peter Zijlstra , lkml , Ingo Molnar , Michal Hocko , Ulrich Obergfell , Stephane Eranian , Andi Kleen , Paul Bunyan Subject: Re: [PATCH] watchdog: Fix race on single cpu boot Message-ID: <20150731153656.GB42272@redhat.com> References: <1438343805-29771-1-git-send-email-jolsa@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1438343805-29771-1-git-send-email-jolsa@kernel.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7019 Lines: 201 On Fri, Jul 31, 2015 at 01:56:45PM +0200, Jiri Olsa wrote: > We are experiencing following crashes under kdump > kernel execution: > > [ 1.930428] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 > > SNIP > > [ 2.478604] Call Trace: > [ 2.479364] [] perf_event_enable+0x16/0x40 > [ 2.481035] [] watchdog_nmi_enable+0x10c/0x170 > [ 2.482906] [] watchdog_enable+0x54/0xc0 > [ 2.484583] [] smpboot_thread_fn+0xb9/0x1a0 > [ 2.486302] [] ? schedule+0x29/0x70 > [ 2.487811] [] ? lg_double_unlock+0x90/0x90 > [ 2.4810a484f>] kthread+0xcf/0xe0 > [ 2.991478] [] ? kthread_create_on_node+0x140/0x140 > [ 2.993519] [] ret_from_fork+0x58/0x90 > [ 2.995192] [] ? kthread_create_on_node+0x140/0x140 > [ 3.005787] RIP [] perf_event_ctx_lock_nested.isra.50+0x5a/0xa0 > [ 3.008350] RSP > [ 3.009417] CR2: 0000000000000080 > [ 3.010432] ---[ end trace 892dd7be017c5cde ]--- > [ 3.011792] Kernel panic - not syncing: Fatal exception > [ 3.013379] Rebooting in 10 seconds.. > > It's caused by watchdog_nmi_disable_all/watchdog_nmi_enable_all calls > executed by fixup_ht_bug (as fallback for HT bug) and watchdog_enable > function executed by watchdog's smp_hotplug_thread setup callback. Hi Jiri, Uli privately has been working on a patchset that cleans up a bunch of these race conditions. We believe it should cover this case. It uses the proc_mutex to synchronize everything. I think he is reaching out to you. If you could try his patchset to see if it fixes things, it might be a cleaner approach than what you are doing. Ok, I was poked on IRC and you have already done this and it passed your testcase. Let me get Uli to post his patches. :-) Cheers, Don > > The watchdog_enable function calls watchdog_nmi_enable which is not serialized > with watchdog_nmi_disable_all/watchdog_nmi_enable_all. We see following race > during the boot. > > (booting with 'maxcpus=1' to emulate kdump mode and execute HT bug fallback). > > --- > kernel_init > kernel_init_freeable > lockup_detector_init > smpboot_register_percpu_thread > > -> schedules out to the new kthread > --- > smpboot_thread_fn > watchdog_enable > watchdog_nmi_enable > perf_event_create_kernel_counter > per_cpu(watchdog_ev, cpu) = event > perf_event_enable > perf_event_ctx_lock_nested > -> schedules out back to the kernel_init > --- > kernel_init > ... > do_basic_setup > do_initcalls > fixup_ht_bug > watchdog_nmi_disable_all > -> release per_cpu(watchdog_ev, cpu) initialized by watchdog_enable > > watchdog_nmi_enable_all > ... > -> schedules out to the smpboot_thread_fn > --- > smpboot_thread_fn > ... > -> crash in perf_event_ctx_lock_nested because the event > was released > > Protecting watchdog_nmi_(enable|disable) calls with new per-cpu mutex. > > I can easily reproduce this on SNB server with 'maxcpus=1' command line > option, and I guess it could be triggered on any server that deals with > the HT bug. I dig out following: > > case 42: /* 32nm SandyBridge */ > case 45: /* 32nm SandyBridge-E/EN/EP */ > case 58: /* 22nm IvyBridge */ > case 62: /* 22nm IvyBridge-EP/EX */ > case 60: /* 22nm Haswell Core */ > case 63: /* 22nm Haswell Server */ > case 69: /* 22nm Haswell ULT */ > case 70: /* 22nm Haswell + GT3e (Intel Iris Pro graphics) */ > > Signed-off-by: Jiri Olsa > --- > kernel/watchdog.c | 22 ++++++++++++++++++++-- > 1 file changed, 20 insertions(+), 2 deletions(-) > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index a6ffa43f2993..8fcc75116277 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -82,6 +82,7 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn); > static DEFINE_PER_CPU(bool, watchdog_nmi_touch); > static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); > static DEFINE_PER_CPU(struct perf_event *, watchdog_ev); > +static DEFINE_PER_CPU(struct mutex, watchdog_ev_lock); > #endif > static unsigned long soft_lockup_nmi_warn; > > @@ -526,7 +527,10 @@ static unsigned long cpu0_err; > static int watchdog_nmi_enable(unsigned int cpu) > { > struct perf_event_attr *wd_attr; > - struct perf_event *event = per_cpu(watchdog_ev, cpu); > + struct perf_event *event; > + > + mutex_lock(&per_cpu(watchdog_ev_lock, cpu)); > + event = per_cpu(watchdog_ev, cpu); > > /* nothing to do if the hard lockup detector is disabled */ > if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED)) > @@ -557,6 +561,8 @@ static int watchdog_nmi_enable(unsigned int cpu) > goto out_save; > } > > + mutex_unlock(&per_cpu(watchdog_ev_lock, cpu)); > + > /* > * Disable the hard lockup detector if _any_ CPU fails to set up > * set up the hardware perf event. The watchdog() function checks > @@ -593,12 +599,16 @@ out_save: > out_enable: > perf_event_enable(per_cpu(watchdog_ev, cpu)); > out: > + mutex_unlock(&per_cpu(watchdog_ev_lock, cpu)); > return 0; > } > > static void watchdog_nmi_disable(unsigned int cpu) > { > - struct perf_event *event = per_cpu(watchdog_ev, cpu); > + struct perf_event *event; > + > + mutex_lock(&per_cpu(watchdog_ev_lock, cpu)); > + event = per_cpu(watchdog_ev, cpu); > > if (event) { > perf_event_disable(event); > @@ -611,6 +621,7 @@ static void watchdog_nmi_disable(unsigned int cpu) > /* watchdog_nmi_enable() expects this to be zero initially. */ > cpu0_err = 0; > } > + mutex_unlock(&per_cpu(watchdog_ev_lock, cpu)); > } > > void watchdog_nmi_enable_all(void) > @@ -928,6 +939,8 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write, > > void __init lockup_detector_init(void) > { > + int cpu __maybe_unused; > + > set_sample_period(); > > #ifdef CONFIG_NO_HZ_FULL > @@ -942,6 +955,11 @@ void __init lockup_detector_init(void) > cpumask_copy(&watchdog_cpumask, cpu_possible_mask); > #endif > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR > + for_each_possible_cpu(cpu) > + mutex_init(&per_cpu(watchdog_ev_lock, cpu)); > +#endif > + > if (watchdog_enabled) > watchdog_enable_all_cpus(); > } > -- > 2.4.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/