From: ebiederm@xmission.com (Eric W. Biederman)
To: Don Zickus
Cc: Yinghai Lu, linux-kernel@vger.kernel.org, mingo@redhat.com,
    hpa@zytor.com, torvalds@linux-foundation.org,
    kexec@lists.infradead.org, vgoyal@redhat.com,
    akpm@linux-foundation.org, tglx@linutronix.de, mingo@elte.hu,
    linux-tip-commits@vger.kernel.org
Subject: Re: [tip:x86/debug] x86/kdump: No need to disable ioapic/lapic in crash path
References: <20120216172735.GX9751@redhat.com> <20120216215603.GH9751@redhat.com>
Date: Thu, 16 Feb 2012 19:38:21 -0800
In-Reply-To: <20120216215603.GH9751@redhat.com> (Don Zickus's message of "Thu, 16 Feb 2012 16:56:03 -0500")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Don Zickus writes:

> On Thu, Feb 16, 2012 at 01:53:29PM -0800, Yinghai Lu wrote:
>> On Thu, Feb 16, 2012 at 9:27 AM, Don Zickus wrote:
>>
>> > So I think I figured it out.  I went through and commented out code in
>> > disable_local_APIC until I narrowed it down to the piece of code that
>> > needs to be disabled for it to work.
>> >
>> > Surprise, surprise...
>> > its LVTPC or perf! :-)  Actually it is the
>> > nmi_watchdog which uses perf.  My theory is NMIs are not disabled and one
>> > is generated by the local apic during decompression (just bad timing) and
>> > *splat*.
>> >
>> > Yinghai, you can probably prove this by
>> >
>> > echo 0 > /proc/sys/kernel/nmi_watchdog
>> >
>> > then do your kdump crash test.
>>
>> yes. that will make kdump crash working.
>
> Cool. Thanks.
>
> Eric,
>
> Just let me know how you want to handle disabling NMIs in the kexec in
> panic shutdown case.

Interesting.  Apparently we have been avoiding this problem by accident.
Thanks for hunting this down.

The options I can see are:
- Ensure we can handle and ignore exceptions like this.
- Always shut off the lapic and ioapic entries that can generate this.

The good news is that both solutions should be lock free.

The current kernel boot code relies on the assumption that all
interrupts can be disabled.  With NMIs that is clearly not the case.

The most robust solution, and what we want to do long term, is to
install an idt that will simply ignore all interrupts until the idt is
replaced.  Since really all we need to deal with is the NMI vector,
which is vector #2, we can have a very small interrupt descriptor table.

Unfortunately we go through some cpu mode switches in /sbin/kexec,
allowing us to enter the kernel's 32bit entry point before we run the
decompressor, so at first glance both /sbin/kexec and the kernel need
to be fixed in a coordinated fashion.

There are two ways I can see of removing the need for an exactly
coordinated release.
- Document that an old /sbin/kexec userspace requires you not to use
  the nmi watchdog with modern kernels.
- For a short while simply retain code that stomps the nmi watchdog.
  (But that still leaves us open to other kinds of NMIs.)

Grr.
Looking a little more closely, all throughout the Linux kernel's boot
there is the assumption that any interrupt during boot is a failure of
some kind, and except for an errant nmi watchdog that is a true
assumption.

Don, I guess I really have to recommend disabling the nmi watchdog in
the kexec on panic path if we can do so at all reasonably.

I like the idea of ignoring NMIs during boot, but that seems to be a
slightly larger project with little practical improvement in kexec on
panic quality, other than getting what should be one or two i/o writes
out of the kexec on panic path.

Eric
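
For reference, the "very small interrupt descriptor table" discussed
above could look roughly like the sketch below.  This is not code from
this thread or from the kernel.  The names idt_gate32, build_tiny_idt
and tiny_idt_limit are hypothetical, the gate layout is the standard
32-bit protected-mode interrupt gate, and the handler address is assumed
to point at a stub that does nothing but iret.  The sketch only fills in
the table; loading it with lidt is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit protected-mode interrupt gate descriptor (8 bytes). */
struct idt_gate32 {
    uint16_t offset_lo;  /* handler address, bits 15..0 */
    uint16_t selector;   /* code segment selector for the handler */
    uint8_t  reserved;   /* must be zero */
    uint8_t  type_attr;  /* present bit, DPL, gate type */
    uint16_t offset_hi;  /* handler address, bits 31..16 */
} __attribute__((packed));

_Static_assert(sizeof(struct idt_gate32) == 8, "gate must be 8 bytes");

#define NMI_VECTOR        2                 /* NMI is always vector 2 */
#define TINY_IDT_ENTRIES  (NMI_VECTOR + 1)  /* vectors 0, 1 and 2 only */
#define GATE_INT32        0x8E              /* present, DPL 0, 32-bit interrupt gate */

/*
 * Hypothetical helper: clear the table, then install a single gate for
 * the NMI vector.  Vectors 0 and 1 stay not-present, so only NMIs are
 * handled; any other exception is still treated as fatal.
 */
void build_tiny_idt(struct idt_gate32 *idt, uint32_t nmi_handler,
                    uint16_t code_selector)
{
    assert(idt != NULL);
    memset(idt, 0, TINY_IDT_ENTRIES * sizeof(*idt));

    idt[NMI_VECTOR].offset_lo = (uint16_t)(nmi_handler & 0xFFFF);
    idt[NMI_VECTOR].offset_hi = (uint16_t)(nmi_handler >> 16);
    idt[NMI_VECTOR].selector  = code_selector;
    idt[NMI_VECTOR].type_attr = GATE_INT32;
}

/* Limit field for the IDT pseudo-descriptor: table size in bytes minus one. */
uint16_t tiny_idt_limit(void)
{
    return (uint16_t)(TINY_IDT_ENTRIES * sizeof(struct idt_gate32) - 1);
}
```

A caller would pass the address of its own iret-only stub and the
selector of a flat code segment, then point the IDT register at the
table with the limit returned by tiny_idt_limit().  Whether this has to
live in /sbin/kexec, in the kernel's early boot code, or in both is
exactly the coordination question raised above.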