From: 河合英宏 / KAWAI，HIDEHIRO
To: "'Peter Zijlstra'"
CC: Jonathan Corbet, Ingo Molnar, "Eric W. Biederman", "H. Peter Anvin",
    Andrew Morton, Thomas Gleixner, Vivek Goyal,
    "linux-doc@vger.kernel.org", "x86@kernel.org",
    "kexec@lists.infradead.org", "linux-kernel@vger.kernel.org",
    Michal Hocko, Ingo Molnar, 平松雅巳 / HIRAMATU，MASAMI
Subject: RE: [V3 PATCH 2/4] panic/x86: Allow cpus to save registers even
    if they are looping in NMI context
Date: Sat, 22 Aug 2015 01:43:00 +0000
Message-ID: <04EAB7311EE43145B2D3536183D1A8445493C80C@GSjpTKYDCembx31.service.hitachi.net>
References: <20150806054543.25766.29590.stgit@softrs>
    <20150806054543.25766.5526.stgit@softrs>
    <20150820231816.GH3161@worktop.event.rightround.com>
In-Reply-To: <20150820231816.GH3161@worktop.event.rightround.com>

> From: Peter Zijlstra [mailto:peterz@infradead.org]
>
> On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > When cpu-A panics on NMI just after cpu-B has panicked, cpu-A loops
> > infinitely in NMI context. Especially for x86, cpu-B issues NMI IPI
> > to other cpus to save their register states and do some cleanups if
> > kdump is enabled, but cpu-A can't handle the NMI and fails to save
> > register states.
> >
> > To solve this issue, we wait for the timing of the NMI IPI, then
> > call the NMI handler which saves register states.
>
> Sorry, I don't follow, what?

First, nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends an
NMI IPI to the non-panicking cpus to stop them while saving their
registers and doing some cleanups for crash dumping. So if a
non-panicking cpu is looping infinitely in NMI context at that time,
we fail to save its register information, and that information is
missing from the crash dump.

An `infinite loop in NMI context' can happen when a panic on NMI is
about to happen while another cpu is already processing panic().

To save regs and do some cleanups in that case too, this patch does
two things:

1. Moves the `infinite loop in NMI context' (actually
   panic_smp_self_stop()) outside of panic() to keep the pt_regs
   object
2. Calls the callback of nmi_shootdown_cpus() directly to save regs
   and do some cleanups after waiting_for_crash_ipi is set;
   waiting_for_crash_ipi is used for counting down the number of
   cpus which have handled the callback

Does that answer your question?

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group
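
For illustration only, the two steps above can be modelled roughly in
userspace C as below. This is a sketch under assumptions, not the code
of the patch: only waiting_for_crash_ipi is a name taken from the
description. struct fake_regs (standing in for pt_regs),
crash_ipi_issued, crash_callback(), shootdown_begin() and
nmi_self_stop() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-in for struct pt_regs. */
struct fake_regs {
	unsigned long ip;
	unsigned long sp;
};

static struct fake_regs saved_regs[NR_CPUS];
static bool regs_saved[NR_CPUS];

/* Counts down the cpus that still have to run the crash callback. */
static atomic_int waiting_for_crash_ipi;

/* Hypothetical flag: set once the panicking cpu has "sent" the IPI. */
static atomic_bool crash_ipi_issued;

/* Models the shootdown callback: save regs, then count this cpu as done. */
static void crash_callback(int cpu, const struct fake_regs *regs)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	saved_regs[cpu] = *regs;
	regs_saved[cpu] = true;
	atomic_fetch_sub(&waiting_for_crash_ipi, 1);
}

/* Panicking cpu: set the counter first, then announce the IPI. */
static void shootdown_begin(int nr_other_cpus)
{
	atomic_store(&waiting_for_crash_ipi, nr_other_cpus);
	atomic_store(&crash_ipi_issued, true);
}

/*
 * Cpu that is stuck in NMI context and therefore cannot take the NMI IPI.
 * It still holds its own regs because the loop runs outside of panic()
 * (step 1), so it waits for the IPI timing and calls the callback
 * directly (step 2).
 */
static void nmi_self_stop(int cpu, const struct fake_regs *regs)
{
	while (!atomic_load(&crash_ipi_issued))
		; /* spin until the panicking cpu starts the shootdown */
	crash_callback(cpu, regs);
}
```

In this model, calling shootdown_begin(1) and then
nmi_self_stop(cpu, &regs) leaves the regs of that cpu in saved_regs[]
and brings waiting_for_crash_ipi back to zero, which is what the
panicking cpu waits for before it goes on with the crash dump.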