Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756716Ab1EFQy1 (ORCPT ); Fri, 6 May 2011 12:54:27 -0400
Received: from e28smtp04.in.ibm.com ([122.248.162.4]:37384 "EHLO e28smtp04.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756666Ab1EFQyW (ORCPT ); Fri, 6 May 2011 12:54:22 -0400
Date: Fri, 6 May 2011 22:24:12 +0530
From: "K.Prasad"
To: Linux Kernel Mailing List
Cc: Andi Kleen, "Luck, Tony", Vivek Goyal, kexec@lists.infradead.org
Subject: [Bug] Kdump does not work when panic triggered due to MCE
Message-ID: <20110506165412.GB2719@in.ibm.com>
Reply-To: prasad@linux.vnet.ibm.com
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 5876
Lines: 126

Hi All,

I wanted to test the behaviour of kdump when a panic is triggered due to
an MCE on x86 and found that no kdump is captured.

While the kdump service is configured and running, and non-MCE panics
(such as those triggered through /proc/sysrq-trigger) successfully
capture a kdump, any fatal MCE injected through the mce-inject tool
causes a reboot of the machine.

The code has been traced (using early_serial_putc()) to enter the kexec
path, i.e. panic()->crash_kexec()->machine_kexec()->relocate_kernel(),
but is untraceable further.

Kdump works fine when a similar test is carried out inside a KVM guest.

Has anybody tested this before? Or found kdump working when fatal MCEs
have actually occurred?
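The two cases compared above can be sketched roughly as the shell
functions below. This is only a minimal sketch, not the exact procedure
used: the run helper, the DRY_RUN variable and the function names are
hypothetical, and writing 'c' to /proc/sysrq-trigger is assumed as the
non-MCE panic trigger. By default the commands are only printed, since
actually running them crashes the machine.

```shell
#!/bin/sh
# Reproduction sketch for the two panic cases.
# DRY_RUN is a hypothetical knob (default 1): commands are only printed.
# Set DRY_RUN=0, as root on a disposable test machine, to really run them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Case 1: non-MCE panic through sysrq ('c' crashes the kernel).
# In the report this case captures a kdump as expected.
sysrq_panic() {
    run service kdump status
    run sh -c 'echo c > /proc/sysrq-trigger'
}

# Case 2: fatal MCE injected with mce-inject.
# In the report this case reboots the machine without a dump.
# $1 is the path to an mce-inject input file.
mce_panic() {
    run service kdump status
    run mce-inject "$1"
}
```

For example, mce_panic /home/prasadkr/mce/mce-test/cases/soft-inj/panic_ucr/data/srar_over
corresponds to the failing case shown in the screen logs.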
Thanks,
K.Prasad

Relevant Screen logs
---------------------
login: root
Password:
Last login: Fri May 6 11:16:52 from 9.77.122.190
# uname -a
Linux elm3a97.beaverton.ibm.com 2.6.39-rc6.prasad_kdump+ #1 SMP Fri May 6 07:47:31 EDT 2011 i686 i686 i386 GNU/Linux
# lsmod | grep mce
mce_inject 2355 0 [permanent]
# service kdump status
Kdump is operational
# mce-inject /home/prasadkr/mce/mce-test/cases/soft-inj/panic_ucr/data/srar_over
Triggering MCE exception on CPU 0
Disabling lock debugging due to kernel taint
[Hardware Error]: CPU 0: Machine Check Exception: 6 Bank 2: f580000000000000
[Hardware Error]: RIP 73:<000000001eadbabe>
[Hardware Error]: TSC 21dde8717030 ADDR 1234
[Hardware Error]: PROCESSOR 0:106a5 TIME 1304696989 SOCKET 0 APIC 0
[Hardware Error]: No human readable MCE decoding support on this CPU type.
[Hardware Error]: Run the message through 'mcelog --ascii' to decode.
[Hardware Error]: Machine check: Overflowed uncorrected
Kernel panic - not syncing: Fatal Machine check
Pid: 0, comm: kworker/0:0 Tainted: G M W 2.6.39-rc6.prasad_kdump+ #1
------------[ cut here ]------------
kernel BUG at arch/x86/kernel/traps.c:436!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
Modules linked in: mce_inject autofs4 cpufreq_ondemand acpi_cpufreq mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod cdc_ether usbnet mii microcode sg i2c_i801 serio_raw pcspkr iTCO_wdt iTCO_vendor_support bnx2x libcrc32c mdio ioatdma dca i7core_edac edac_core bnx2 ext4 jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas [last unloaded: scsi_wait_scan]
Pid: 0, comm: kworker/0:1 Tainted: G M W 2.6.39-rc6.prasad_kdump+ #1 IBM IBM System x -[7839AC1]-/46C7890
EIP: 0060:[] EFLAGS: 00010006 CPU: 12
EIP is at do_nmi+0x89/0xa0
EAX: e9ba9c9c EBX: e9ba9c9c ECX: 04010000 EDX: e9ba8000
ESI: 0000000c EDI: 00000af0 EBP: e9ba9c94 ESP: e9ba9c90
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process kworker/0:1 (pid: 0, ti=e9ba8000 task=e9ba70d0 task.ti=e9ba8000)
Stack:
 0000000c e9ba9ce8 c0860678 0000000c 00000010 000021dd 0000000c 00000af0
 e9ba9ce8 f0eca855 c0aa007b 0000007b e9ba00d8 c08600e0 f0eca855 c06048fd
 00000060 00000246 f0eca5dd 004b9910 0000000c 0000000c e9ba9cf0 c06048be
Call Trace:
 [] nmi_stack_correct+0x2f/0x34
 [] ? invalidate_interrupt23+0x3c/0x3c
 [] ? delay_tsc+0x3d/0x70
 [] __const_udelay+0x1e/0x20
 [] wait_for_panic+0x25/0x50
 [] mce_timed_out+0x48/0x90
 [] mce_end+0x59/0x100
 [] do_machine_check+0x3db/0x6a0
 [] ? __hrtimer_start_range_ns+0xa0/0x470
 [] raise_exception+0x34/0xa0 [mce_inject]
 [] ? sched_clock+0x8/0x10
 [] ? sched_clock_cpu+0x145/0x190
 [] ? __lock_acquire+0x2c0/0x490
 [] mce_raise_notify+0x61/0x70 [mce_inject]
 [] notifier_call_chain+0x43/0x60
 [] __atomic_notifier_call_chain+0x5b/0x80
 [] ? notifier_call_chain+0x60/0x60
 [] atomic_notifier_call_chain+0x1a/0x20
 [] notify_die+0x2d/0x30
 [] default_do_nmi+0x32/0x290
 [] ? __lock_release+0x72/0x180
 [] ? clockevents_notify+0x3a/0xf0
 [] do_nmi+0x87/0xa0
 [] nmi_stack_correct+0x2f/0x34
 [] ? leaps_between+0x3b/0x90
 [] ? intel_idle+0x8c/0x100
 [] cpuidle_idle_call+0x8d/0x210
 [] cpu_idle+0x9b/0xd0
 [] start_secondary+0xdd/0xe3
Code: 5e 2f c2 ff 89 e0 25 00 e0 ff ff 8b 50 14 f7 c2 00 00 00 04 74 1e 81 ea 00 00 01 04 89 50 14 5b 5d c3 89 d8 e8 e9 fc ff ff eb c2 <0f> 0b 90 8d 74 26 00 eb f9 0f 0b eb fe 8d 76 00 8d bc 27 00 00
EIP: [] do_nmi+0x89/0xa0 SS:ESP 0068:e9ba9c90
Call Trace:
 [] panic+0x57/0x165
 [] mce_panic+0x1c0/0x1e0
 [] mce_reign+0x110/0x120
 [] mce_end+0xea/0x100
 [] do_machine_check+0x3db/0x6a0
 [] ? __hrtimer_start_range_ns+0xa0/0x470
 [] raise_exception+0x34/0xa0 [mce_inject]
 [] ? sched_clock+0x8/0x10
 [] ? sched_clock_cpu+0x145/0x190
 [] ? __lock_acquire+0x2c0/0x490
 [] mce_raise_notify+0x61/0x70 [mce_inject]
 [] nx30
 [] default_do_nmi+0x32/0x290
 [] ? __lock_release+0x72/0x180
 [] ? clockevents_notify+0x3a/0xf0
 [] do_nmi+0x87/0xa0
 [] nmi_stack_correct+0x2f/0x34
 [] ? leaps_between+0x3b/0x90
 [] ? intel_idle+0x8c/0x100
 [] cpuidle_idle_call+0x8d/0x210
 [] cpu_idle+0x9b/0xd0
 [] start_secondary+0xdd/0xe3
Rebooting in 1 seconds..
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/