From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Borislav Petkov, Xunlei Pang, Borislav Petkov, Tony Luck, Naoya Horiguchi, kexec@lists.infradead.org, linux-edac, Thomas Gleixner, Sasha Levin
Subject: [PATCH 4.9 022/241] x86/mce: Handle broadcasted MCE gracefully with kexec
Date: Mon, 19 Mar 2018 19:04:47 +0100
Message-Id: <20180319180752.087801059@linuxfoundation.org>
In-Reply-To: <20180319180751.172155436@linuxfoundation.org>
References: <20180319180751.172155436@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xunlei Pang

[ Upstream commit 5bc329503e8191c91c4c40836f062ef771d8ba83 ]

When we are about to kexec a crash kernel and right then and there a
broadcasted MCE fires while we're still in the first kernel and while
the other CPUs remain in a holding pattern, the #MC handler of the
first kernel will timeout and then panic due to never completing MCE
synchronization.

Handle this in a similar way as to when the CPUs are offlined when that
broadcasted MCE happens.

[ Boris: rewrote commit message and comments. ]

Suggested-by: Borislav Petkov
Signed-off-by: Xunlei Pang
Signed-off-by: Borislav Petkov
Acked-by: Tony Luck
Cc: Naoya Horiguchi
Cc: kexec@lists.infradead.org
Cc: linux-edac
Link: http://lkml.kernel.org/r/1487857012-9059-1-git-send-email-xlpang@redhat.com
Link: http://lkml.kernel.org/r/20170313095019.19351-1-bp@alien8.de
Signed-off-by: Thomas Gleixner
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 arch/x86/include/asm/reboot.h    |    1 +
 arch/x86/kernel/cpu/mcheck/mce.c |   18 ++++++++++++++++--
 arch/x86/kernel/reboot.c         |    5 +++--
 3 files changed, 20 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -15,6 +15,7 @@ struct machine_ops {
 };
 
 extern struct machine_ops machine_ops;
+extern int crashing_cpu;
 
 void native_machine_crash_shutdown(struct pt_regs *regs);
 void native_machine_shutdown(void);
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -48,6 +48,7 @@
 #include
 #include
 #include
+#include
 
 #include "mce-internal.h"
 
@@ -1081,9 +1082,22 @@ void do_machine_check(struct pt_regs *re
 	 * on Intel.
 	 */
 	int lmce = 1;
+	int cpu = smp_processor_id();
 
-	/* If this CPU is offline, just bail out. */
-	if (cpu_is_offline(smp_processor_id())) {
+	/*
+	 * Cases where we avoid rendezvous handler timeout:
+	 * 1) If this CPU is offline.
+	 *
+	 * 2) If crashing_cpu was set, e.g. we're entering kdump and we need to
+	 *    skip those CPUs which remain looping in the 1st kernel - see
+	 *    crash_nmi_callback().
+	 *
+	 * Note: there still is a small window between kexec-ing and the new,
+	 * kdump kernel establishing a new #MC handler where a broadcasted MCE
+	 * might not get handled properly.
+	 */
+	if (cpu_is_offline(cpu) ||
+	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
 		u64 mcgstatus;
 
 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -769,10 +769,11 @@ void machine_crash_shutdown(struct pt_re
 #endif
 
 
+/* This is the CPU performing the emergency shutdown work. */
+int crashing_cpu = -1;
+
 #if defined(CONFIG_SMP)
 
-/* This keeps a track of which one is crashing cpu. */
-static int crashing_cpu;
 static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
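
For anyone reviewing the new condition in do_machine_check(), here is a
minimal userspace sketch of the bail-out predicate. It is only an
illustration and not kernel code: mce_should_skip_rendezvous() and
check_examples() are hypothetical names, the cpu_offline argument stands in
for cpu_is_offline(cpu), and crashing_cpu is modelled as a plain static
variable with the same -1 "not crashing" default the patch gives it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel global made non-static by this patch.
 * -1 means no CPU is currently performing the emergency shutdown work.
 */
static int crashing_cpu = -1;

/*
 * Hypothetical helper modelling the condition from the patch:
 *   cpu_is_offline(cpu) || (crashing_cpu != -1 && crashing_cpu != cpu)
 * cpu_offline is an assumption standing in for cpu_is_offline(cpu).
 */
static bool mce_should_skip_rendezvous(int cpu, bool cpu_offline)
{
	return cpu_offline || (crashing_cpu != -1 && crashing_cpu != cpu);
}

/* Walks through the cases listed in the patch comment. */
static void check_examples(void)
{
	/* Normal operation: an online CPU takes part in the rendezvous. */
	crashing_cpu = -1;
	assert(!mce_should_skip_rendezvous(0, false));

	/* Case 1: an offline CPU bails out. */
	assert(mce_should_skip_rendezvous(0, true));

	/* Case 2: CPU 2 is crashing; the other CPUs bail out, CPU 2 does not. */
	crashing_cpu = 2;
	assert(mce_should_skip_rendezvous(1, false));
	assert(!mce_should_skip_rendezvous(2, false));

	crashing_cpu = -1;
}
```

The -1 default matters here: with the old zero-initialised static, a value
of 0 could not be told apart from "CPU 0 is crashing", so a check like this
one would make every other CPU skip the rendezvous during normal operation.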