From: Daniel J Blueman
To: Doug Thompson, Borislav Petkov
Cc: Daniel J Blueman, Mauro Carvalho Chehab, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, Steffen Persvold
Subject: [PATCH] x86: Prevent oops with >16 memory controllers
Date: Sat, 14 Feb 2015 11:18:40 +0800
Message-Id: <1423883920-6425-1-git-send-email-daniel@numascale.com>
X-Mailer: git-send-email 1.9.1

When ECC interrupts occur on memory controllers after EDAC_MAX_MCS (16), the
kernel fatally dereferences unallocated structures [1]; this occurs on at
least NumaConnect systems.

Minimally fix by checking if a memory controller info structure is
allocated; candidate for stable.

Signed-off-by: Daniel J Blueman

--
[1]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
IP: [] decode_bus_error+0x2f/0x2b0
PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
Oops: 0000 [#2] SMP
Modules linked in:
CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G D 3.19.0 #1
Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b 01/28/2015
task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000
RIP: 0010:[] [] decode_bus_error+0x2f/0x2b0
RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297
RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6
RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c
RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022
R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008
R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000
FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0
Stack:
 0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13
 000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a
 ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010
Call Trace:
 [] ? vprintk_default+0x1a/0x20
 [] ? printk+0x41/0x43
 [] amd_decode_mce+0x58f/0x8e0
 [] notifier_call_chain+0x4d/0x80
 [] atomic_notifier_call_chain+0x15/0x20
 [] mce_log+0x1d/0x130
 [] machine_check_poll+0x194/0x260
 [] mce_timer_fn+0x116/0x140
 [] ? mce_cpu_restart+0x40/0x40
 [] call_timer_fn.isra.29+0x17/0x80
 [] run_timer_softirq+0x18b/0x220
 [] __do_softirq+0xf9/0x200
 [] irq_exit+0x76/0xa0
 [] smp_apic_timer_interrupt+0x41/0x50
 [] apic_timer_interrupt+0x67/0x70
 [] ? down_read_trylock+0x15/0x20
 [] __do_page_fault+0xbb/0x4c0
 [] ? __schedule+0x25a/0x850
 [] do_page_fault+0xc/0x10
 [] page_fault+0x1f/0x30
---
 drivers/edac/amd64_edac.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 17638d7..baccc0e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2175,7 +2175,7 @@ static void __log_bus_error(struct mem_ctl_info *mci, struct err_info *err,
 static inline void decode_bus_error(int node_id, struct mce *m)
 {
 	struct mem_ctl_info *mci = mcis[node_id];
-	struct amd64_pvt *pvt = mci->pvt_info;
+	struct amd64_pvt *pvt;
 	u8 ecc_type = (m->status >> 45) & 0x3;
 	u8 xec = XEC(m->status, 0x1f);
 	u16 ec = EC(m->status);
@@ -2190,6 +2190,11 @@ static inline void decode_bus_error(int node_id, struct mce *m)
 	if (xec && xec != F10_NBSL_EXT_ERR_ECC)
 		return;
 
+	/* Unable to decode on memory controllers after EDAC_MAX_MCS, as no mci is allocated */
+	if (!mci)
+		return;
+	pvt = mci->pvt_info;
+
 	memset(&err, 0, sizeof(err));
 
 	sys_addr = get_error_address(pvt, m);
-- 
1.9.1
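
For readers who want to try the guard outside the kernel, here is a minimal
userspace sketch of the same pattern: look up a per-node pointer, return early
if nothing was allocated for that node, and only then read through it. It is
not the driver code. Every sketch_* name, the array size and the return values
are hypothetical, and the range check on node_id is an addition of the sketch
that the patch itself does not make.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sizes: more nodes than there are allocated controller slots. */
#define SKETCH_MAX_MCS 16   /* mirrors the limit of 16 named in the patch */
#define SKETCH_NODES   32   /* assumption: nodes that may report errors */

/* Stand-ins for the driver's private data and controller info structures. */
struct sketch_pvt { int model; };
struct sketch_mci { struct sketch_pvt *pvt_info; };

/* One slot per node; slots without an allocated structure stay NULL. */
static struct sketch_mci *sketch_mcis[SKETCH_NODES];

/* Attach controller info to a node; returns 0 on success, -1 if refused. */
int sketch_attach(unsigned int node_id, struct sketch_mci *mci)
{
	if (node_id >= SKETCH_MAX_MCS)
		return -1;	/* nodes past the limit never get a structure */
	sketch_mcis[node_id] = mci;
	return 0;
}

/* Returns 1 if the error was decoded, 0 if it had to be skipped. */
int sketch_decode(unsigned int node_id)
{
	struct sketch_mci *mci;
	struct sketch_pvt *pvt;

	if (node_id >= SKETCH_NODES)
		return 0;	/* extra range check, not part of the patch */

	mci = sketch_mcis[node_id];
	if (!mci)
		return 0;	/* nothing allocated for this node: bail out */

	pvt = mci->pvt_info;	/* safe only after the NULL check above */
	printf("node %u: decoding with model 0x%x\n", node_id, pvt->model);
	return 1;
}
```

Calling sketch_decode() for a node that was attached decodes normally, while a
node past the limit is skipped quietly instead of faulting. The pointer read
sits after the check for the same reason the patch moves the pvt assignment
out of the declaration.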