Date: Wed, 23 Sep 2020 07:52:35 -0500
From: Corey Minyard
To: Wu Bo
Cc: Corey Minyard, arnd@arndb.de, gregkh@linuxfoundation.org,
    linux-kernel@vger.kernel.org, linfeilong@huawei.com,
    hidehiro.kawai.ez@hitachi.com,
    openipmi-developer@lists.sourceforge.net, liuzhiqiang26@huawei.com
Subject: Re: [Openipmi-developer] [PATCH] x86: Fix MCE error handing when kdump is enabled
Message-ID: <20200923125235.GW3674@minyard.net>
Reply-To: minyard@acm.org
References: <20200922161311.GQ3674@minyard.net>
    <20200922182940.31843-1-minyard@acm.org>
    <20200922184332.GT3674@minyard.net>
    <29448f27-12f7-82a1-7483-80471c36d48c@huawei.com>
In-Reply-To: <29448f27-12f7-82a1-7483-80471c36d48c@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 23, 2020 at 04:48:31PM +0800, Wu Bo wrote:
> On 2020/9/23 2:43, Corey Minyard wrote:
> > On Tue, Sep 22, 2020 at 01:29:40PM -0500, minyard@acm.org wrote:
> > > From: Corey Minyard
> > >
> > > If kdump is enabled, the handling of shooting down CPUs does not use the
> > > RESET_VECTOR irq before trying to use NMIs to shoot down the CPUs.
> > >
> > > For normal errors that is fine.  MCEs, however, are already running in
> > > an NMI, so sending them an NMI won't do anything.  The MCE code is set
> > > up to receive the RESET_VECTOR because it disables CPUs, but it won't
> >                                             ^ should be "enables irqs"
> > > work on the NMI-only case.
> > >
> > > There is already code in place to scan for the NMI callback being ready,
> > > simply call that from the MCE's wait_for_panic() code so it will pick up
> > > and handle it if an NMI shootdown is requested.  This required
> > > propagating the registers down to wait_for_panic().
> > >
> > > Signed-off-by: Corey Minyard
> > > ---
> > > After looking at it a bit, I think this is the proper way to fix the
> > > issue, though I'm not an expert on this code so I'm not sure.
> > >
> > > I have not even tested this patch, I have only compiled it.  But from
> > > what I can tell, things waiting in NMIs for a shootdown should call
> > > run_crash_ipi_callback() in their wait loop.
>
> Hi,
>
> In my VM (using qemu-kvm), kdump is enabled and mce-inject is used to
> inject an uncorrectable error. I have an issue with the IPMI driver's
> panic handling running while the other CPUs are sitting in
> "wait_for_panic()" with interrupts on, and IPMI interrupts interfering
> with the panic handling. As a result, the IPMI panic handling hangs for
> more than 3000 seconds.
>
> After I applied and tested this patch, the problem of IPMI hangs has
> disappeared. It should be a solution to the problem.

Thanks for testing this.  I have submitted the patch to the MCE
maintainers.

-corey

>
> Thanks,
>
> Wu Bo
>
> > >
> > >  arch/x86/kernel/cpu/mce/core.c | 67 ++++++++++++++++++++++------------
> > >  1 file changed, 44 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > > index f43a78bde670..3a842b3773b3 100644
> > > --- a/arch/x86/kernel/cpu/mce/core.c
> > > +++ b/arch/x86/kernel/cpu/mce/core.c
> > > @@ -282,20 +282,35 @@ static int fake_panic;
> > >  static atomic_t mce_fake_panicked;
> > >  /* Panic in progress. Enable interrupts and wait for final IPI */
> > > -static void wait_for_panic(void)
> > > +static void wait_for_panic(struct pt_regs *regs)
> > >  {
> > >  	long timeout = PANIC_TIMEOUT*USEC_PER_SEC;
> > >  	preempt_disable();
> > >  	local_irq_enable();
> > > -	while (timeout-- > 0)
> > > +	while (timeout-- > 0) {
> > > +		/*
> > > +		 * We are in an NMI waiting to be stopped by the
> > > +		 * handing processor.  For kdump handling, we need to
> > > +		 * be monitoring crash_ipi_issued since that is what
> > > +		 * is used for an NMI stop used by kdump.  But we also
> > > +		 * need to have interrupts enabled some so that
> > > +		 * RESET_VECTOR will interrupt us on a normal
> > > +		 * shutdown.
> > > +		 */
> > > +		local_irq_disable();
> > > +		run_crash_ipi_callback(regs);
> > > +		local_irq_enable();
> > > +
> > >  		udelay(1);
> > > +	}
> > >  	if (panic_timeout == 0)
> > >  		panic_timeout = mca_cfg.panic_timeout;
> > >  	panic("Panicing machine check CPU died");
> > >  }
> > > -static void mce_panic(const char *msg, struct mce *final, char *exp)
> > > +static void mce_panic(const char *msg, struct mce *final, char *exp,
> > > +		      struct pt_regs *regs)
> > >  {
> > >  	int apei_err = 0;
> > >  	struct llist_node *pending;
> > > @@ -306,7 +321,7 @@ static void mce_panic(const char *msg, struct mce *final, char *exp)
> > >  		 * Make sure only one CPU runs in machine check panic
> > >  		 */
> > >  		if (atomic_inc_return(&mce_panicked) > 1)
> > > -			wait_for_panic();
> > > +			wait_for_panic(regs);
> > >  		barrier();
> > >  		bust_spinlocks(1);
> > > @@ -817,7 +832,7 @@ static atomic_t mce_callin;
> > >  /*
> > >   * Check if a timeout waiting for other CPUs happened.
> > >   */
> > > -static int mce_timed_out(u64 *t, const char *msg)
> > > +static int mce_timed_out(u64 *t, const char *msg, struct pt_regs *regs)
> > >  {
> > >  	/*
> > >  	 * The others already did panic for some reason.
> > > @@ -827,12 +842,12 @@ static int mce_timed_out(u64 *t, const char *msg)
> > >  	 */
> > >  	rmb();
> > >  	if (atomic_read(&mce_panicked))
> > > -		wait_for_panic();
> > > +		wait_for_panic(regs);
> > >  	if (!mca_cfg.monarch_timeout)
> > >  		goto out;
> > >  	if ((s64)*t < SPINUNIT) {
> > >  		if (mca_cfg.tolerant <= 1)
> > > -			mce_panic(msg, NULL, NULL);
> > > +			mce_panic(msg, NULL, NULL, regs);
> > >  		cpu_missing = 1;
> > >  		return 1;
> > >  	}
> > > @@ -866,7 +881,7 @@ static int mce_timed_out(u64 *t, const char *msg)
> > >   * All the spin loops have timeouts; when a timeout happens a CPU
> > >   * typically elects itself to be Monarch.
> > >   */
> > > -static void mce_reign(void)
> > > +static void mce_reign(struct pt_regs *regs)
> > >  {
> > >  	int cpu;
> > >  	struct mce *m = NULL;
> > > @@ -896,7 +911,7 @@ static void mce_reign(void)
> > >  	 * other CPUs.
> > >  	 */
> > >  	if (m && global_worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
> > > -		mce_panic("Fatal machine check", m, msg);
> > > +		mce_panic("Fatal machine check", m, msg, regs);
> > >  	/*
> > >  	 * For UC somewhere we let the CPU who detects it handle it.
> > > @@ -909,7 +924,8 @@ static void mce_reign(void)
> > >  	 * source or one CPU is hung. Panic.
> > >  	 */
> > >  	if (global_worst <= MCE_KEEP_SEVERITY && mca_cfg.tolerant < 3)
> > > -		mce_panic("Fatal machine check from unknown source", NULL, NULL);
> > > +		mce_panic("Fatal machine check from unknown source", NULL, NULL,
> > > +			  regs);
> > >  	/*
> > >  	 * Now clear all the mces_seen so that they don't reappear on
> > > @@ -928,7 +944,7 @@ static atomic_t global_nwo;
> > >   * in the entry order.
> > >   * TBD double check parallel CPU hotunplug
> > >   */
> > > -static int mce_start(int *no_way_out)
> > > +static int mce_start(int *no_way_out, struct pt_regs *regs)
> > >  {
> > >  	int order;
> > >  	int cpus = num_online_cpus();
> > > @@ -949,7 +965,8 @@ static int mce_start(int *no_way_out)
> > >  	 */
> > >  	while (atomic_read(&mce_callin) != cpus) {
> > >  		if (mce_timed_out(&timeout,
> > > -				  "Timeout: Not all CPUs entered broadcast exception handler")) {
> > > +				  "Timeout: Not all CPUs entered broadcast exception handler",
> > > +				  regs)) {
> > >  			atomic_set(&global_nwo, 0);
> > >  			return -1;
> > >  		}
> > > @@ -975,7 +992,8 @@ static int mce_start(int *no_way_out)
> > >  		 */
> > >  		while (atomic_read(&mce_executing) < order) {
> > >  			if (mce_timed_out(&timeout,
> > > -					  "Timeout: Subject CPUs unable to finish machine check processing")) {
> > > +					  "Timeout: Subject CPUs unable to finish machine check processing",
> > > +					  regs)) {
> > >  				atomic_set(&global_nwo, 0);
> > >  				return -1;
> > >  			}
> > > @@ -995,7 +1013,7 @@ static int mce_start(int *no_way_out)
> > >   * Synchronize between CPUs after main scanning loop.
> > >   * This invokes the bulk of the Monarch processing.
> > >   */
> > > -static int mce_end(int order)
> > > +static int mce_end(int order, struct pt_regs *regs)
> > >  {
> > >  	int ret = -1;
> > >  	u64 timeout = (u64)mca_cfg.monarch_timeout * NSEC_PER_USEC;
> > > @@ -1020,12 +1038,13 @@ static int mce_end(int order)
> > >  		 */
> > >  		while (atomic_read(&mce_executing) <= cpus) {
> > >  			if (mce_timed_out(&timeout,
> > > -					  "Timeout: Monarch CPU unable to finish machine check processing"))
> > > +					  "Timeout: Monarch CPU unable to finish machine check processing",
> > > +					  regs))
> > >  				goto reset;
> > >  			ndelay(SPINUNIT);
> > >  		}
> > > -		mce_reign();
> > > +		mce_reign(regs);
> > >  		barrier();
> > >  		ret = 0;
> > >  	} else {
> > > @@ -1034,7 +1053,8 @@ static int mce_end(int order)
> > >  		 */
> > >  		while (atomic_read(&mce_executing) != 0) {
> > >  			if (mce_timed_out(&timeout,
> > > -					  "Timeout: Monarch CPU did not finish machine check processing"))
> > > +					  "Timeout: Monarch CPU did not finish machine check processing",
> > > +					  regs))
> > >  				goto reset;
> > >  			ndelay(SPINUNIT);
> > >  		}
> > > @@ -1286,9 +1306,9 @@ noinstr void do_machine_check(struct pt_regs *regs)
> > >  	 */
> > >  	if (lmce) {
> > >  		if (no_way_out)
> > > -			mce_panic("Fatal local machine check", &m, msg);
> > > +			mce_panic("Fatal local machine check", &m, msg, regs);
> > >  	} else {
> > > -		order = mce_start(&no_way_out);
> > > +		order = mce_start(&no_way_out, regs);
> > >  	}
> > >  	__mc_scan_banks(&m, final, toclear, valid_banks, no_way_out, &worst);
> > > @@ -1301,7 +1321,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
> > >  	 * When there's any problem use only local no_way_out state.
> > >  	 */
> > >  	if (!lmce) {
> > > -		if (mce_end(order) < 0)
> > > +		if (mce_end(order, regs) < 0)
> > >  			no_way_out = worst >= MCE_PANIC_SEVERITY;
> > >  	} else {
> > >  		/*
> > > @@ -1314,7 +1334,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
> > >  		 */
> > >  		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
> > >  			mce_severity(&m, cfg->tolerant, &msg, true);
> > > -			mce_panic("Local fatal machine check!", &m, msg);
> > > +			mce_panic("Local fatal machine check!", &m, msg, regs);
> > >  		}
> > >  	}
> > > @@ -1325,7 +1345,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
> > >  	if (cfg->tolerant == 3)
> > >  		kill_it = 0;
> > >  	else if (no_way_out)
> > > -		mce_panic("Fatal machine check on current CPU", &m, msg);
> > > +		mce_panic("Fatal machine check on current CPU", &m, msg, regs);
> > >  	if (worst > 0)
> > >  		irq_work_queue(&mce_irq_work);
> > > @@ -1361,7 +1381,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
> > >  		 */
> > >  		if (m.kflags & MCE_IN_KERNEL_RECOV) {
> > >  			if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
> > > -				mce_panic("Failed kernel mode recovery", &m, msg);
> > > +				mce_panic("Failed kernel mode recovery", &m,
> > > +					  msg, regs);
> > >  		}
> > >  	}
> > >  }
> > > --
> > > 2.17.1
> > >
> > >
> > > _______________________________________________
> > > Openipmi-developer mailing list
> > > Openipmi-developer@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/openipmi-developer
> > .
> >
>