Message-ID: <55B0BDB7.2050809@hitachi.com>
Date: Thu, 23 Jul 2015 19:11:03 +0900
From: Hidehiro Kawai
To: Michal Hocko
CC: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, "Eric W. Biederman", "H. Peter Anvin", Andrew Morton, Thomas Gleixner, Vivek Goyal, Masami Hiramatsu, x86@kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH 0/3] x86: Fix panic vs. NMI issues
References: <20150722021421.5155.74460.stgit@softrs> <20150723082514.GC9386@dhcp22.suse.cz>
In-Reply-To: <20150723082514.GC9386@dhcp22.suse.cz>

Hi,

Thanks for the feedback.

(2015/07/23 17:25), Michal Hocko wrote:
> Hi,
>
> On Wed 22-07-15 11:14:21, Hidehiro Kawai wrote:
>> When an HA cluster software or administrator detects non-response
>> of a host, they issue an NMI to the host to completely stop current
>> works and take a crash dump. If the kernel has already panicked
>> or is capturing a crash dump at that time, further NMI can cause
>> a crash dump failure.
>>
>> To solve this issue, this patch set does two things:
>>
>> - Don't panic on NMI if the kernel has already panicked
>> - Introduce "noextnmi" boot option which masks external NMI at the
>>   boot time (supported only for x86)
>
> I am currently debugging the same issue for our customer. Curiously
> enough the issue happens on a Hitachi HW.

I found these issues through white-box testing and source code
reading, so they haven't occurred at our customers' sites yet, but
they could.

> I haven't posted my patch for an upstream review yet because I still
> do not have a feedback but I believe your solution is unnecessarily
> too complex. Unless I am missing something the following should be enough,
> no?

Your patch solves some cases, but I don't think it covers all the
cases I want to solve. How about the following cases?

1) panic -> acquire panic_lock -> unknown NMI on this CPU ->
   panic -> failed to acquire panic_lock -> infinite loop
   ==> no one processes the kdump procedure.

2) crash_kexec w/o entering panic -> acquire kexec_mutex ->
   unknown NMI on this CPU -> panic -> crash_kexec ->
   failed to acquire kexec_mutex -> return to panic ->
   smp_send_stop

Even with your patch, case 2) causes an infinite loop of
try_crash_kexec calls, and no one processes the kdump procedure.

Regards,

> ---
> From ba6ef85d26113e720a630ea22b08efef5b70210f Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Fri, 17 Jul 2015 15:17:08 +0200
> Subject: [PATCH] kexec: Never return from crash_kexec when kexex is in
>  progress
>
> We had a report when kdump kernel hasn't booted after unknown NMI has
> been delivered and unknown_nmi_panic is enabled. The NMI is triggered
> by HW and it is delivered to all CPUs at the same time. The machine has
> hundreds of CPUs and the most plausible theory is that one CPU really
> manages to kick the kexec but it cannot shut down all the CPUs because
> they are processing NMI and so cannot process an IPI.
> Another CPU then
> manages to call smp_send_stop from a concurrent panic and this stops the
> kexec CPU which has managed to switch to the new kernel and doesn't run
> in the NMI mode anymore.
>
> Fix this by making crash_kexec to never return if there is a kexec in
> progress. This can be done easily by relying on the fact that
> kexec_mutex will never be released for an ongoing kexec so we just have
> to loop over the try lock. The only tricky part is that
> kexec_crash_image might be not loaded when we have to return. The check
> has to be done under the lock. Extract the trylock and check into
> try_crash_kexec and make it return true only if crash kexec is disabled.
>
> Signed-off-by: Michal Hocko
> ---
>  kernel/kexec.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index a785c1015e25..d61b1478167d 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1470,7 +1470,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
>
>  #endif /* CONFIG_KEXEC_FILE */
>
> -void crash_kexec(struct pt_regs *regs)
> +static bool try_crash_kexec(struct pt_regs *regs)
>  {
>  	/* Take the kexec_mutex here to prevent sys_kexec_load
>  	 * running on one cpu from replacing the crash kernel
> @@ -1490,7 +1490,20 @@ void crash_kexec(struct pt_regs *regs)
>  			machine_kexec(kexec_crash_image);
>  		}
>  		mutex_unlock(&kexec_mutex);
> +		return true;
>  	}
> +	return false;
> +}
> +
> +void crash_kexec(struct pt_regs *regs)
> +{
> +	/*
> +	 * Never return from this function if a kexec is in progress
> +	 * already because next steps might interfere with it.
> +	 * try_crash_kexec will never succeed in such a case.
> +	 */
> +	while (!try_crash_kexec(regs))
> +		cpu_relax();
>  }
>
>  size_t crash_get_memory_size(void)
>

-- 
Hidehiro Kawai
Hitachi, Ltd.
Research & Development Group