Message-ID: <4DD22692.7050209@intel.com>
Date: Tue, 17 May 2011 15:41:06 +0800
From: Huang Ying
To: Ingo Molnar
CC: Don Zickus, huang ying, "linux-kernel@vger.kernel.org", Andi Kleen, Robert Richter, Andi Kleen, Borislav Petkov
Subject: Re: [RFC] x86, NMI, Treat unknown NMI as hardware error
References: <1305275018-20596-1-git-send-email-ying.huang@intel.com> <20110513124523.GM13984@redhat.com> <20110513130011.GA6474@elte.hu> <20110513152033.GB3854@elte.hu> <20110513160029.GD31888@redhat.com> <20110516112934.GE19837@elte.hu>
In-Reply-To: <20110516112934.GE19837@elte.hu>

On 05/16/2011 07:29 PM, Ingo Molnar wrote:
>
> * Don Zickus wrote:
>
>> On Fri, May 13, 2011 at 05:20:33PM +0200, Ingo Molnar wrote:
>>>
>>> * huang ying wrote:
>>>
>>>>> What should be done instead is to add an event for unknown NMIs, which can
>>>>> then be processed by the RAS daemon to implement policy.
>>>>>
>>>>> By using 'active' event filters it could even be set on a system to panic
>>>>> the box by default.
>>>>
>>>> If there is real fatal hardware error, maybe we have no luxury to go from NMI
>>>> handler to user space RAS daemon to determine what to do. System may explode,
>>>> bad data may go to disk before that.
>>>
>>> That is why i suggested:
>>>
>>> > > By using 'active' event filters it could even be set on a system to panic
>>> > > the box by default.
>>>
>>> event filters are evaluated in the kernel, so the panic could be instantaneous,
>>> without the event having to reach user-space.
>>
>> Interesting. Question though, what do you mean by 'event filtering'. Is
>> that different then setting 'unknown_nmi_panic' panic on the commandline or
>> procfs?
>>
>> Or are you suggesting something like registering another callback on the
>> die_chain that looks for DIE_NMIUNKNOWN as the event, swallows them and
>> implements the policy? That way only on HEST related platforms would
>> register them while others would keep the default of 'Dazed and confused'
>> messages?
>
> The idea is that "event filters", which are an existing upstream feature and
> which can be used in rather flexible ways:
>
> http://lkml.org/lkml/2011/4/27/660
>
> Could be used to trigger non-standard policy action as well - such as to panic
> the box.
>
> This would replace various very limited /debugfs and /sys event filtering hacks
> (and hardcoded policies) such as arch/x86/kernel/cpu/mcheck/mce-severity.c, and
> it would allow nonstandard behavior like 'panic the box on unknown NMIs' as
> well.
>
> This could be set by the RAS daemon, and it could be propagated to the kernel
> boot line as well, where event filter syntax would look like this:
>
>   events=nmi::unknown"if (reason == 0) panic();"
>
> (Where the 'reason' field of the NMI event is the current legacy 'reason' value
> there.)
>
> The filter code would have to be modified to be able to recognize the panic()
> bit, but that's desirable anyway and it is a one-time effort.
>
> This:
>
>   events=nmi::unknown:"if (reason == 0) ignore();"
>
> would be a possible outcome as well, on certain boxes - to skip certain events.

We can determine whether NMI is unknown in kernel now.
If you want to push all unknown NMI logic into user space (although I
don't think that is the best solution), is it not sufficient to just
check the system in user space (via PCI ID or DMI ID, etc.) and set the
existing "unknown_nmi_panic" accordingly?

Best Regards,
Huang Ying
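
For illustration only, the user-space alternative mentioned above (check the platform via its DMI ID, then set the existing "unknown_nmi_panic" knob) could look roughly like the sketch below. It assumes the DMI strings are read from /sys/class/dmi/id and the knob is written through /proc/sys/kernel/unknown_nmi_panic. The platform list and the function names are hypothetical and not taken from any existing tool.

```python
from pathlib import Path

# Standard locations on x86 Linux; passed as parameters so the logic can be
# exercised against ordinary files without root.
DMI_DIR = Path("/sys/class/dmi/id")
SYSCTL = Path("/proc/sys/kernel/unknown_nmi_panic")

# Hypothetical list of (sys_vendor, product_name) pairs for platforms where an
# unknown NMI should be treated as a fatal hardware error.
PANIC_PLATFORMS = {("ExampleVendor", "ExampleServer 1000")}


def read_dmi(dmi_dir=DMI_DIR):
    """Return (sys_vendor, product_name) as exported by the DMI sysfs files."""
    vendor = (dmi_dir / "sys_vendor").read_text().strip()
    product = (dmi_dir / "product_name").read_text().strip()
    return vendor, product


def want_panic(platform, panic_platforms=PANIC_PLATFORMS):
    """Policy decision: panic on unknown NMI only on listed platforms."""
    return platform in panic_platforms


def apply_policy(dmi_dir=DMI_DIR, sysctl=SYSCTL, panic_platforms=PANIC_PLATFORMS):
    """Write 1 or 0 to the unknown_nmi_panic knob and return the decision."""
    enable = want_panic(read_dmi(dmi_dir), panic_platforms)
    sysctl.write_text("1\n" if enable else "0\n")
    return enable
```

Calling apply_policy() with no arguments as root, early in boot, would apply the policy to the running system. A PCI ID check could replace or complement the DMI lookup in the same way.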