From: 河合英宏 / KAWAI，HIDEHIRO
To: 'Michal Hocko'
CC: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, "Eric W. Biederman",
    "H. Peter Anvin", Andrew Morton, Thomas Gleixner, Vivek Goyal,
    "linux-doc@vger.kernel.org", "x86@kernel.org",
    "kexec@lists.infradead.org", "linux-kernel@vger.kernel.org",
    Ingo Molnar, 平松雅巳 / HIRAMATU，MASAMI
Subject: RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
Date: Wed, 29 Jul 2015 09:09:18 +0000
Message-ID: <04EAB7311EE43145B2D3536183D1A8445491DB5E@GSjpTKYDCembx31.service.hitachi.net>
References: <20150727015850.4928.87717.stgit@softrs>
    <20150727015850.4928.50289.stgit@softrs>
    <20150727143405.GF11317@dhcp22.suse.cz>
    <55B6E2A3.8070004@hitachi.com>
    <04EAB7311EE43145B2D3536183D1A8445491D5E8@GSjpTKYDCembx31.service.hitachi.net>
    <20150729082329.GA15801@dhcp22.suse.cz>
In-Reply-To: <20150729082329.GA15801@dhcp22.suse.cz>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

> From: Michal Hocko [mailto:mhocko@kernel.org]
> On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi,
> >
> > > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Hidehiro Kawai
> > > (2015/07/27 23:34), Michal Hocko wrote:
> > > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
> > [...]
> > > > The check could be also relaxed a bit and nmi_panic would
> > > > return only if the ongoing panic is the current cpu when we really have
> > > > to return and allow the preempted panic to finish.
> > >
> > > It's reasonable.  I'll do that in the next version.
> >
> > I noticed atomic_read() is insufficient.  Please consider the following
> > scenario.
> >
> > CPU 1: call panic() in the normal context
> > CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic()
> > CPU 1: set 1 to panic_cpu
> > CPU 0: fail to set 0 to panic_cpu, then do an infinite loop
> > CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus()
> >
> > At this point, since CPU 0 loops in NMI context, it never executes
> > the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
> > that no register states are saved and no cleanups for VMX/SVM are
> > performed.
>
> Yes this is true but it is no different from the current state, isn't
> it? So if you want to handle that then it deserves a separate patch.
> It is certainly not harmful wrt. panic behavior.
>
> > So, we should still use atomic_cmpxchg() in nmi_panic() to
> > prevent other cpus from running panic routines.
>
> Not sure what you mean by that.

I mean that we should use the same logic as in my V2 patch, like this:

#define nmi_panic(fmt, ...)                                            \
	do {                                                           \
		if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) \
		    == -1)                                             \
			panic(fmt, ##__VA_ARGS__);                     \
	} while (0)

By using atomic_cmpxchg() here, we can ensure that only this cpu
runs panic routines.  This is important to prevent an NMI-context cpu
from calling panic_smp_self_stop().

void panic(const char *fmt, ...)
{
...
	 * `old_cpu == -1' means we are the first comer.
	 * `old_cpu == this_cpu' means we came here due to panic on NMI.
	 */
	this_cpu = raw_smp_processor_id();
	old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
	if (old_cpu != -1 && old_cpu != this_cpu)
		panic_smp_self_stop();

Please assume that CPU 0 calls nmi_panic() in NMI context
and CPU 1 calls panic() in normal context at the same time.

If CPU 1 sets panic_cpu before CPU 0 does, CPU 1 runs panic routines
and CPU 0 returns from the NMI handler.  Eventually CPU 0 is stopped
by nmi_shootdown_cpus().

If CPU 0 sets panic_cpu before CPU 1 does, CPU 0 runs panic routines.
CPU 1 calls panic_smp_self_stop(), and waits for the NMI sent by
nmi_shootdown_cpus().

Anyway, I tested my approach and it worked fine.

Regards,
Kawai
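
For reference, the two orderings described above can be replayed in user
space.  The following is only a rough sketch, not kernel code: it assumes
C11 <stdatomic.h> in place of the kernel's atomic_cmpxchg(), and the names
PANIC_CPU_INVALID, model_panic(), model_nmi_panic(), model_reset() and
enum panic_outcome are hypothetical, introduced just for this illustration.
Each function returns what the cpu would do instead of actually doing it.

```c
#include <stdatomic.h>

/* Hypothetical name; the mail uses the literal -1 for "no owner yet". */
#define PANIC_CPU_INVALID (-1)

static atomic_int panic_cpu = PANIC_CPU_INVALID;

enum panic_outcome {
	RUNS_PANIC,        /* this cpu owns panic_cpu and runs panic routines */
	SELF_STOP,         /* stands in for panic_smp_self_stop() */
	RETURNS_FROM_NMI,  /* nmi_panic() lost the race and simply returns */
};

/* Models the check at the top of panic() quoted in the mail. */
static enum panic_outcome model_panic(int this_cpu)
{
	int old_cpu = PANIC_CPU_INVALID;

	/* On failure, old_cpu is overwritten with the current owner. */
	if (atomic_compare_exchange_strong(&panic_cpu, &old_cpu, this_cpu))
		return RUNS_PANIC;      /* first comer */
	if (old_cpu == this_cpu)
		return RUNS_PANIC;      /* came here due to panic on NMI */
	return SELF_STOP;               /* another cpu is already panicking */
}

/* Models the nmi_panic() macro: call panic() only if we claim panic_cpu. */
static enum panic_outcome model_nmi_panic(int this_cpu)
{
	int old_cpu = PANIC_CPU_INVALID;

	if (atomic_compare_exchange_strong(&panic_cpu, &old_cpu, this_cpu))
		return model_panic(this_cpu);
	return RETURNS_FROM_NMI;        /* never reaches the self-stop path */
}

/* Resets the model so that another ordering can be replayed. */
static void model_reset(void)
{
	atomic_store(&panic_cpu, PANIC_CPU_INVALID);
}
```

Calling model_panic(1) and then model_nmi_panic(0) replays the first
ordering (CPU 1 wins and CPU 0 returns from the NMI handler); after
model_reset(), calling model_nmi_panic(0) and then model_panic(1) replays
the second one (CPU 0 wins and CPU 1 self-stops).  In neither ordering does
the NMI-context cpu reach the self-stop path.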