From: Rui Wang
To: bp@suse.de
Cc: tony.luck@intel.com, gong.chen@intel.com, linux-kernel@vger.kernel.org, Rui Wang
Subject: Re: MCE bug?
Date: Thu, 18 Jun 2015 17:18:45 +0800
Message-Id: <1434619125-7142-1-git-send-email-rui.y.wang@intel.com>

> On Wed, Jun 17, 2015 at 11:41:56AM +0200, Borislav Petkov wrote:
>> And I was waiting in line to get a chance to do some injection on our
>> EINJ box here too. But it seems you have the required setup already so
>> if you want to give those changes a run, I've uploaded them here:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
>>
>> It'll be much appreciated.
>
> and the answer is ....
>
>
>
> no. :-(

I see a different panic with this kernel. Not seen every time. It was after reboot due to injected errors.

[ 0.234672] mce: CPU supports 22 MCE banks
[ 0.239291] CPU0: Thermal monitoring enabled (TM1)
[ 0.244680] process: using mwait in idle threads
[ 0.249844] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
[ 0.256654] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
[ 0.264330] Freeing SMP alternatives memory: 20K (ffffffff81d1e000 - ffffffff81d23000)
[ 0.274057] ftrace: allocating 22650 entries in 89 pages
[ 0.289946] x2apic: IRQ remapping doesn't support X2APIC mode
[ 0.296505] Switched APIC routing to physical flat.
[ 0.302838] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.349289] smpboot: CPU0: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz (fam: 06, model: 3f, stepping: 03)
[ 0.359844] Performance Events: PEBS fmt2+, 16-deep LBR, Haswell events, full-width counters, Intel PMU driver.
[ 0.371173] ... version: 3
[ 0.375649] ... bit width: 48
[ 0.380222] ... generic registers: 4
[ 0.384698] ... value mask: 0000ffffffffffff
[ 0.390632] ... max period: 0000ffffffffffff
[ 0.396566] ... fixed-purpose events: 3
[ 0.401043] ... event mask: 000000070000000f
[ 0.410260] x86: Booting SMP configuration:
[ 0.414933] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17
[ 0.706763] .... node #1, CPUs: #18
[ 0.822565] mce: [Hardware Error]: Machine check events logged
[ 0.822801] #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33
[ 1.078660] mce: [Hardware Error]: Machine check events logged
[ 1.093416] #34
[ 1.095433] BUG: unable to handle kernel
[ 1.100045] #35
[ 1.102193] NULL pointer dereference at 0000000000000008
[ 1.108126] IP: [] pool_mayday_timeout+0x81/0x150
[ 1.111969]
[ 1.116818] .... node #0, CPUs: #36
[ 1.121101] PGD 0
[ 1.123348] Oops: 0000 [#1] SMP
[ 1.126975] Modules linked in:
[ 1.130402] CPU: 33 PID: 0 Comm: swapper/33 Not tainted 4.1.0-rc3-7-default+ #1
[ 1.138570] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
[ 1.150134] task: ffff88046e86e0d0 ti: ffff88046e874000 task.ti: ffff88046e874000
[ 1.158496] RIP: 0010:[] [] pool_mayday_timeout+0x81/0x150
[ 1.168228] RSP: 0000:ffff88087f5e3e08 EFLAGS: 00010046
[ 1.174164] RAX: 0000000fffffffe0 RBX: 0000000000000000 RCX: 0000000000000000
[ 1.182135] RDX: ffff88087f5f4898 RSI: ffffffff8107ec80 RDI: ffffffff81dd332c
[ 1.190108] RBP: ffff88087f5e3e48 R08: 0000000000000000 R09: ffff88087f5ed8c0
[ 1.198080] R10: 0000000000000004 R11: 0000000000000005 R12: ffffffff81d4d880
[ 1.206052] R13: 0000000000000101 R14: ffffffff8107ec80 R15: ffff88087f5f4880
[ 1.214026] FS: 0000000000000000(0000) GS:ffff88087f5e0000(0000) knlGS:0000000000000000
[ 1.223066] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.229486] #37
[ 1.229486] CR2: 0000000000000008 CR3: 0000000001a0e000 CR4: 00000000001406e0
[ 1.239605] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.247578] #38
[ 1.247578] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.257697] #39
[ 1.257697] Stack:
[ 1.262090] ffff88087f5e3e48 ffffffff810bf45b 0000000000000021 ffff88087f5ed8c0
[ 1.270398] ffff88087f5f4910[ 1.273400] #40 0000000000000101 ffffffff8107ec80 ffff88087f5f4880
[ 1.280867] ffff88087f5e3e88 ffffffff810cf559 ffff88087f5e3e88 ffff88087f5ed8c0
[ 1.289177] Call Trace:
[ 1.291910] #41
[ 1.294068]
[ 1.294068] [] ? console_unlock+0x1fb/0x460
[ 1.302927] [] ? wq_unbind_fn+0x130/0x130
[ 1.309242] #42
[ 1.309242] [] call_timer_fn+0x39/0x130
[ 1.317509] [] ? wq_unbind_fn+0x130/0x130
[ 1.323833] #43
[ 1.323834] [] run_timer_softirq+0x211/0x300
[ 1.332598] [] __do_softirq+0xe4/0x290
[ 1.338629] [] irq_exit+0x9d/0xb0
[ 1.344177] #44
[ 1.344177] [] smp_apic_timer_interrupt+0x4a/0x60
[ 1.353424] [] apic_timer_interrupt+0x6e/0x80
[ 1.360135] #45
[ 1.362292]
[ 1.362292] [] ? mwait_idle+0x6d/0x90
[ 1.370568] [] arch_cpu_idle+0xf/0x20
[ 1.376507] #46
[ 1.376507] [] cpu_startup_entry+0x2f4/0x3c0
[ 1.385274] [] start_secondary+0x143/0x170
[ 1.391694] #47
[ 1.391694] Code: 49 83 ec 08 31 c9 eb 14 66 90 49 8b 44 24 08 48 39 c2 4c 8d 60 f8 0f 84 8e 00 00 00 49 8b 04
[ 1.404957] #48 24 48 89 c3 30 db a8 04 48 0f 44 d9 <4c> 8b 6b 08 49 83 bd 90 00 00 00 00 74 d1 4c 8d b3 80 00 00 00
[ 1.417801] RIP [] pool_mayday_timeout+0x81/0x150
[ 1.424914] #49
[ 1.424914] RSP
[ 1.430955] CR2: 0000000000000008
[ 1.434665] ---[ end trace 4b134008a4be60b6 ]---
[ 1.439823] #50
[ 1.439824] Kernel panic - not syncing: Fatal exception in interrupt
[ 1.449088] ---[ end Kernel panic - not syncing: Fatal exception in interrupt