From: 河合英宏 / KAWAI, HIDEHIRO
To: 'Michal Hocko'
Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
    H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
    linux-doc@vger.kernel.org, x86@kernel.org, kexec@lists.infradead.org,
    linux-kernel@vger.kernel.org, 平松雅巳 / HIRAMATU, MASAMI
Subject: RE: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
Date: Fri, 31 Jul 2015 11:23:00 +0000

> From: Michal Hocko [mailto:mhocko@kernel.org]
>
> On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > From: Michal Hocko [mailto:mhocko@kernel.org]
> [...]
> > > Could you point me to the code which does that, please? Maybe we are
> > > missing that in our 3.0 kernel. I was quite surprised to see this
> > > behavior as well.
> >
> > Please see the snippet below.
> >
> > void setup_local_APIC(void)
> > {
> > ...
> > 	/*
> > 	 * only the BP should see the LINT1 NMI signal, obviously.
> > 	 */
> > 	if (!cpu)
> > 		value = APIC_DM_NMI;
> > 	else
> > 		value = APIC_DM_NMI | APIC_LVT_MASKED;
> > 	if (!lapic_is_integrated())	/* 82489DX */
> > 		value |= APIC_LVT_LEVEL_TRIGGER;
> > 	apic_write(APIC_LVT1, value);
> >
> >
> > The LINT1 pins of CPUs other than CPU 0 are masked here.
> > However, at least on some Hitachi servers, the NMI raised by the NMI
> > button doesn't seem to be delivered through LINT1, so my wording
> > "external NMI" may not be correct.
>
> I am not familiar with the details here, but I can tell you that this
> particular code snippet is the same in our 3.0-based kernel, so it seems
> that the HW is indeed doing something differently.

Yes, and it turned out that my PATCH 3/3 doesn't work at all on some
hardware...
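(As an aside, a quick way to confirm that masking on a given machine is to
dump each CPU's LVT1 entry.  The snippet below is only an illustrative
sketch, not part of the patch set; it assumes a debugging context, e.g. a
small test module, where <asm/apic.h> is usable, and the function names are
made up for the example.)

#include <linux/smp.h>
#include <linux/printk.h>
#include <asm/apic.h>

/* Print this CPU's LVT1 (LINT1) entry and whether it is masked. */
static void dump_lvt1_entry(void *unused)
{
	unsigned int v = apic_read(APIC_LVT1);

	pr_info("CPU%d: LVT1=0x%08x (%smasked)\n",
		smp_processor_id(), v,
		(v & APIC_LVT_MASKED) ? "" : "not ");
}

static void dump_all_lvt1(void)
{
	/* Run the dump on every online CPU and wait for completion. */
	on_each_cpu(dump_lvt1_entry, NULL, 1);
}

On a machine set up as described above, only CPU 0 should report
"not masked".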
> > > You might still get a panic on hardlockup which will happen on all CPUs
> > > from the NMI context so we have to be able to handle panic in NMI on
> > > many CPUs.
> >
> > Do you mean the case of a kernel panic while other CPUs are locked up
> > in NMI context?  In that case, there is no way to do the things needed
> > by the kdump procedure, including saving registers...
>
> I am saying that watchdog_overflow_callback might trigger on more CPUs
> and panic from NMI context as well. So this is not reduced to the NMI
> button sending an NMI to more CPUs.

I understand.  So I also have to modify watchdog_overflow_callback to
call nmi_panic().

> Why cannot the panic() context save all the registers if we are going to
> loop in NMI context? This would be preferable to returning from
> panic, IMO.

I'm not saying we cannot save registers and do some cleanups in NMI
context; I feel that it would introduce unneeded complexity.  Since
watchdog_overflow_callback is generic code, we would also need to
implement the kdump preparation for the other architectures.  (I haven't
checked which architectures support both the NMI watchdog and kdump,
though.)

Anyway, I came up with a simple solution for x86: in nmi_panic(), wait
until nmi_shootdown_cpus() runs, then invoke the callback registered by
nmi_shootdown_cpus() (see the rough sketch at the end of this mail).

> > > I can provide the full log but it is quite mangled. I guess the
> > > CPU130 was the only one allowed to proceed with the panic while others
> > > returned from the unknown NMI handling. It took a lot of time until
> > > CPU130 managed to boot the crash kernel with soft lockups and RCU stalls
> > > reports. CPU0 is most probably locked up waiting for CPU130 to
> > > acknowledge the IPI which will not happen apparently.
> >
> > There is a timeout of 1000 ms in nmi_shootdown_cpus(), so I don't know
> > why CPU 130 waits so long.  I'll think about it for a while.
>
> Yes, I do not understand the timing here either and the fact that the
> log is a complete mess in the important parts doesn't help a wee bit.

I'm interested in where "Kernel panic - not syncing:" appears in the log.
It may give us a clue.

Regards,
Kawai
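P.S.  For reference, below is a rough sketch of the direction I mean --
illustrative only, not the actual patch.  The names panic_cpu and
PANIC_CPU_INVALID are just assumptions for the example; the real
interface may end up looking different.

#include <linux/atomic.h>
#include <linux/kernel.h>
#include <linux/smp.h>
#include <asm/processor.h>

#define PANIC_CPU_INVALID	-1

/* CPU that is allowed to run panic(); -1 means no panic in progress. */
static atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);

/*
 * Called instead of panic() from NMI context (e.g. from
 * watchdog_overflow_callback or the external-NMI handler).
 * Only one CPU proceeds into panic(); the others stay in NMI context
 * so panic() is never re-entered.
 */
void nmi_panic(const char *msg)
{
	int this_cpu = raw_smp_processor_id();
	int old_cpu;

	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
	if (old_cpu == PANIC_CPU_INVALID) {
		/* This CPU won the race: run the normal panic path. */
		panic("%s", msg);
	} else if (old_cpu != this_cpu) {
		/*
		 * Another CPU is already panicking.  Do not re-enter
		 * panic().  (In the x86 approach described above, this
		 * loop would instead wait for nmi_shootdown_cpus() to
		 * register its callback and then call it; here it just
		 * spins until the crash NMI stops this CPU.)
		 */
		while (1)
			cpu_relax();
	}
}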