From: 河合英宏 / KAWAI, HIDEHIRO
To: 'Michal Hocko'
Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
    H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
    linux-doc@vger.kernel.org, x86@kernel.org, kexec@lists.infradead.org,
    linux-kernel@vger.kernel.org, 平松雅巳 / HIRAMATU, MASAMI
Subject: RE: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
Date: Fri, 31 Jul 2015 11:23:00 +0000

> From: Michal Hocko [mailto:mhocko@kernel.org]
>
> On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > From: Michal Hocko [mailto:mhocko@kernel.org]
> [...]
> > > Could you point me to the code which does that, please? Maybe we are
> > > missing that in our 3.0 kernel. I was quite surprised to see this
> > > behavior as well.
> >
> > Please see the snippet below.
> >
> > void setup_local_APIC(void)
> > {
> > ...
> > 	/*
> > 	 * only the BP should see the LINT1 NMI signal, obviously.
> > 	 */
> > 	if (!cpu)
> > 		value = APIC_DM_NMI;
> > 	else
> > 		value = APIC_DM_NMI | APIC_LVT_MASKED;
> > 	if (!lapic_is_integrated())	/* 82489DX */
> > 		value |= APIC_LVT_LEVEL_TRIGGER;
> > 	apic_write(APIC_LVT1, value);
> >
> >
> > The LINT1 pins of CPUs other than CPU 0 are masked here.
> > However, at least on some Hitachi servers, the NMI raised by the NMI
> > button doesn't seem to be delivered through LINT1, so my wording
> > "external NMI" may not be correct.
>
> I am not familiar with the details here, but I can tell you that this
> particular code snippet is the same in our 3.0-based kernel, so it seems
> that the HW is indeed doing something differently.

Yes, and it turned out that my PATCH 3/3 doesn't work at all on some
hardware...
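(As an aside, a quick way to confirm that masking on a given machine is to
dump each CPU's LVT1 entry.  The snippet below is only an illustrative
sketch, not part of the patch set; it assumes a debugging context, e.g. a
small test module, where <asm/apic.h> is usable, and the function names are
made up for the example.)

#include <linux/smp.h>
#include <linux/printk.h>
#include <asm/apic.h>

/* Print this CPU's LVT1 (LINT1) entry and whether it is masked. */
static void dump_lvt1_entry(void *unused)
{
	unsigned int v = apic_read(APIC_LVT1);

	pr_info("CPU%d: LVT1=0x%08x (%smasked)\n",
		smp_processor_id(), v,
		(v & APIC_LVT_MASKED) ? "" : "not ");
}

static void dump_all_lvt1(void)
{
	/* Run the dump on every online CPU and wait for completion. */
	on_each_cpu(dump_lvt1_entry, NULL, 1);
}

On a machine set up as described above, only CPU 0 should report
"not masked".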
> > > You might still get a panic on hardlockup which will happen on all CPUs
> > > from the NMI context so we have to be able to handle panic in NMI on
> > > many CPUs.
> >
> > Do you mean the case of a kernel panic while other CPUs are locked up
> > in NMI context?  In that case, there is no way to do the things needed
> > by the kdump procedure, including saving registers...
>
> I am saying that watchdog_overflow_callback might trigger on more CPUs
> and panic from NMI context as well. So this is not reduced to the NMI
> button sending an NMI to more CPUs.

I understand.  So I also have to modify watchdog_overflow_callback to
call nmi_panic().

> Why cannot the panic() context save all the registers if we are going to
> loop in NMI context? This would be preferable to returning from
> panic, IMO.

I'm not saying we cannot save registers and do some cleanups in NMI
context; I feel that it would introduce unneeded complexity.  Since
watchdog_overflow_callback is generic code, we would also need to
implement the kdump preparation for the other architectures.  (I haven't
checked which architectures support both the NMI watchdog and kdump,
though.)

Anyway, I came up with a simple solution for x86: in nmi_panic(), wait
until nmi_shootdown_cpus() runs, then invoke the callback registered by
nmi_shootdown_cpus() (see the rough sketch at the end of this mail).

> > > I can provide the full log but it is quite mangled. I guess the
> > > CPU130 was the only one allowed to proceed with the panic while others
> > > returned from the unknown NMI handling. It took a lot of time until
> > > CPU130 managed to boot the crash kernel with soft lockups and RCU stalls
> > > reports. CPU0 is most probably locked up waiting for CPU130 to
> > > acknowledge the IPI which will not happen apparently.
> >
> > There is a timeout of 1000 ms in nmi_shootdown_cpus(), so I don't know
> > why CPU 130 waits so long.  I'll think about it for a while.
>
> Yes, I do not understand the timing here either and the fact that the
> log is a complete mess in the important parts doesn't help a wee bit.

I'm interested in where "Kernel panic - not syncing:" appears in the log.
It may give us a clue.

Regards,
Kawai
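P.S.  For reference, below is a rough sketch of the direction I mean --
illustrative only, not the actual patch.  The names panic_cpu and
PANIC_CPU_INVALID are just assumptions for the example; the real
interface may end up looking different.

#include <linux/atomic.h>
#include <linux/kernel.h>
#include <linux/smp.h>
#include <asm/processor.h>

#define PANIC_CPU_INVALID	-1

/* CPU that is allowed to run panic(); -1 means no panic in progress. */
static atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);

/*
 * Called instead of panic() from NMI context (e.g. from
 * watchdog_overflow_callback or the external-NMI handler).
 * Only one CPU proceeds into panic(); the others stay in NMI context
 * so panic() is never re-entered.
 */
void nmi_panic(const char *msg)
{
	int this_cpu = raw_smp_processor_id();
	int old_cpu;

	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
	if (old_cpu == PANIC_CPU_INVALID) {
		/* This CPU won the race: run the normal panic path. */
		panic("%s", msg);
	} else if (old_cpu != this_cpu) {
		/*
		 * Another CPU is already panicking.  Do not re-enter
		 * panic().  (In the x86 approach described above, this
		 * loop would instead wait for nmi_shootdown_cpus() to
		 * register its callback and then call it; here it just
		 * spins until the crash NMI stops this CPU.)
		 */
		while (1)
			cpu_relax();
	}
}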