From: Seiji Aguchi
To: "hpa@zytor.com", "andi@firstfloor.org", "ebiederm@xmission.com",
    "bp@alien8.de", "seto.hidetoshi@jp.fujitsu.com", "gregkh@suse.de",
    "linux-doc@vger.kernel.org", "linux-kernel@vger.kernel.org",
    "linux-mm@kvack.org", "x86@kernel.org",
    "dle-develop@lists.sourceforge.net", "amwang@redhat.com"
CC: Satoru Moriya
Date: Wed, 9 Feb 2011 11:35:43 -0500
Subject: [RFC][PATCH v2] Controlling kexec behaviour when hardware error happened.
Message-ID: <5C4C569E8A4B9B42A84A977CF070A35B2C1494DBE0@USINDEVS01.corp.hds.com>

Hi,

I submitted a very similar patch last December.
http://www.spinics.net/lists/linux-mm/msg13157.html

I am resubmitting it with a different description of its purpose.

[Changelog]
from v1:
 - Rename the sysctl parameter from kexec_on_mce to kexec_on_hwerr.
 - Move variable declaration from to .
 - Remove CONFIG_X86_MCE in *.c files.
 - Modify [Purpose]/[Patch Description].

[Purpose]
Enterprise servers provide firmware/hardware logging features such as
SEL and BMC. When an MCE occurs, we investigate the firmware/hardware
logs first and replace the broken hardware.
So a memory dump is not necessary to find the root cause of a machine
check. Also, we can reduce downtime by skipping kdump.

Of course, many servers have no firmware/hardware logging features.
So I propose an option to control kexec behaviour when a hardware error
has occurred.

[Patch Description]
This patch adds a sysctl option, kernel.kexec_on_hwerr, that controls
kexec behaviour when a hardware error has occurred.

 - Permission
   - 0644
 - Value (default is "1")
   - non-zero: Kexec is enabled regardless of hardware error.
   - 0: Kexec is disabled when MCE occurred.

Matrix of kernel.kexec_on_hwerr value, hardware error and kexec

--------------------------------------------------
kernel.kexec_on_hwerr | hardware error | kexec
--------------------------------------------------
non-zero              | occurred       | enabled
                      |---------------------------
                      | not occurred   | enabled
--------------------------------------------------
0                     | occurred       | disabled
                      |---------------------------
                      | not occurred   | enabled
--------------------------------------------------

Any comments and suggestions are welcome.

Signed-off-by: Seiji Aguchi
---
 Documentation/sysctl/kernel.txt  |   11 +++++++++++
 arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
 include/linux/kernel.h           |    2 ++
 include/linux/sysctl.h           |    1 +
 kernel/panic.c                   |   15 ++++++++++++++-
 kernel/sysctl.c                  |    8 ++++++++
 kernel/sysctl_binary.c           |    1 +
 mm/memory-failure.c              |    2 ++
 8 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 11d5ced..3159111 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -34,6 +34,7 @@ show up in /proc/sys/kernel:
 - hotplug
 - java-appletviewer           [ binfmt_java, obsolete ]
 - java-interpreter            [ binfmt_java, obsolete ]
+- kexec_on_hwerr              [ x86 only ]
 - kptr_restrict
 - kstack_depth_to_print       [ X86 only ]
 - l2cr                        [ PPC only ]
@@ -261,6 +262,16 @@
 This flag controls the L2 cache of G3 processor boards. If
 0, the cache is disabled. Enabled if nonzero.
 
 ==============================================================
+kexec_on_hwerr: (X86 only)
+
+Controls the behaviour of kexec when panic occurred due to hardware
+error.
+Default value is 1.
+
+0: Kexec is disabled.
+non-zero: Kexec is enabled.
+
+==============================================================
 
 kptr_restrict:
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d916183..e76b47b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -944,6 +944,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 
 	percpu_inc(mce_exception_count);
 
+	hwerr_flag = 1;
+
 	if (notify_die(DIE_NMI, "machine check", regs, error_code,
 			   18, SIGKILL) == NOTIFY_STOP)
 		goto out;
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2fe6e84..c2fba7c 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -242,6 +242,8 @@ extern void add_taint(unsigned flag);
 extern int test_taint(unsigned flag);
 extern unsigned long get_taint(void);
 extern int root_mountflags;
+extern int kexec_on_hwerr;
+extern int hwerr_flag;
 
 extern bool early_boot_irqs_disabled;
 
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 7bb5cb6..8ae5bfe 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -153,6 +153,7 @@ enum
 	KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */
 	KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
 	KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
+	KERN_KEXEC_ON_HWERR=77, /* int: behaviour of kexec for hardware error */
 };
 
 
diff --git a/kernel/panic.c b/kernel/panic.c
index 991bb87..84c1d2e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -28,6 +28,8 @@
 #define PANIC_BLINK_SPD 18
 
 int panic_on_oops;
+int kexec_on_hwerr = 1;
+int hwerr_flag;
 static unsigned long tainted_mask;
 static int pause_on_oops;
 static int pause_on_oops_flag;
@@ -45,6 +47,16 @@ static long no_blink(int state)
 	return 0;
 }
 
+static int kexec_should_skip(void)
+{
+	if (!kexec_on_hwerr && hwerr_flag) {
+		printk(KERN_WARNING "Kexec is skipped because hardware error "
+		       "occurred.\n");
+		return 1;
+	}
+	return 0;
+}
+
 /* Returns how long it waited in ms */
 long (*panic_blink)(int state);
 EXPORT_SYMBOL(panic_blink);
@@ -86,7 +98,8 @@ NORET_TYPE void panic(const char * fmt, ...)
 	 * everything else.
 	 * Do we want to call this before we try to display a message?
 	 */
-	crash_kexec(NULL);
+	if (!kexec_should_skip())
+		crash_kexec(NULL);
 
 	kmsg_dump(KMSG_DUMP_PANIC);
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 0f1bd83..f78edd8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -811,6 +811,14 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "kexec_on_hwerr",
+		.data		= &kexec_on_hwerr,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+
 #endif
 #if defined(CONFIG_MMU)
 	{
diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
index b875bed..8d572ca 100644
--- a/kernel/sysctl_binary.c
+++ b/kernel/sysctl_binary.c
@@ -137,6 +137,7 @@ static const struct bin_table bin_kern_table[] = {
 	{ CTL_INT,	KERN_COMPAT_LOG,		"compat-log" },
 	{ CTL_INT,	KERN_MAX_LOCK_DEPTH,		"max_lock_depth" },
 	{ CTL_INT,	KERN_PANIC_ON_NMI,		"panic_on_unrecovered_nmi" },
+	{ CTL_INT,	KERN_KEXEC_ON_HWERR,		"kexec_on_hwerr" },
 	{}
 };
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0207c2f..0178f47 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -994,6 +994,8 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 	int res;
 	unsigned int nr_pages;
 
+	hwerr_flag = 1;
+
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure from trap %d on page %lx", trapno, pfn);
 
-- 
1.7.1
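
As a rough illustration of the matrix in [Patch Description], the
following user-space sketch models the skip decision. Only the condition
itself is taken from kexec_should_skip() in the patch; struct hwerr_state
and print_matrix() are hypothetical names introduced for this example,
and nothing here is kernel code.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space stand-in for the two kernel globals the patch
 * adds; grouping them in a struct is an assumption made for this sketch.
 */
struct hwerr_state {
	int kexec_on_hwerr;	/* sysctl value; the patch defaults it to 1 */
	int hwerr_flag;		/* non-zero once a hardware error was seen */
};

/* Same condition as kexec_should_skip() in the patch, minus the printk. */
bool kexec_should_skip(const struct hwerr_state *s)
{
	return !s->kexec_on_hwerr && s->hwerr_flag;
}

/*
 * Print one line per row of the matrix and return how many rows end
 * with kexec disabled.
 */
int print_matrix(void)
{
	int skipped = 0;

	for (int sysctl = 1; sysctl >= 0; sysctl--) {
		for (int hwerr = 1; hwerr >= 0; hwerr--) {
			struct hwerr_state s = { sysctl, hwerr };
			bool skip = kexec_should_skip(&s);

			printf("kexec_on_hwerr=%d hwerr=%d -> kexec %s\n",
			       sysctl, hwerr, skip ? "disabled" : "enabled");
			skipped += skip;
		}
	}
	return skipped;
}
```

Calling print_matrix() from a small main() prints the four rows of the
matrix; only the combination kexec_on_hwerr=0 with a hardware error
should end in "disabled".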