Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp242540pxk; Wed, 23 Sep 2020 01:50:13 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzANEJlJM8GXmCqgIrt3oxSAfryvFJCN5RNoyiBSMrCkAKQrYG2IT113RjD8GTuSNQaWJpt X-Received: by 2002:a17:906:6ce:: with SMTP id v14mr8865979ejb.451.1600851013203; Wed, 23 Sep 2020 01:50:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600851013; cv=none; d=google.com; s=arc-20160816; b=PaaWD0qWZ1A6CrWi9G3i13RxFCWPtWTpaLsvjfvDsvoFqLzXlTsyvQashQNkH0eofh AmDMLCr1SAxIJrrAEf5LfnJtWBkPxJMiGG435x4e/wY1HvVcfm+sW32PT1gyKqBGzKxq EkzsQr17wChqZz/M8rBNReSZUm2St0hUz+8sNDiyB8B+TGn+bpG5X6CDKWdnIRR0uhRZ wxksFEpHbWrB3ayOC+87uHAr2bRcyrevpm5q1vouaETf+oXZOIDpJxp3fixFOjUAMAFn T9ZEhPRCXn1xaLGVPZvRJS7uhH2+siZBPq+YaHyvCnWto3xvt3hwF12gJTUIhIfGE7X5 B0kg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=VQ0aQnL2FvvqrqKWy/0UdyEctHvlBdh2SvHlebjG90M=; b=Bd0VINm64lMR7Mo/2y9gyn5LRqBR5Ci5/kLsmHdDFHZxTOmWQCT1Sn8iCkaRaSoiZx MWRg13kcC/AsjPSnla1xqt5arUsBhfR3V4Uv+DT6wTCumISud5iSG1BeZTClsFQJVtOA E4zPE3/xXD1sdcWIyqUw8SWlB29Sh2ziT3y9+qcoX8WMK+AftlBW12MXxDPB+fsjx81E JsphCoLFs5aHJpU7b5FHw+y1Ob19AMhEMwDIMh+m0urvNZFak44k4+gTR7/+bOC4/LoA dftD+5LZBHkhWGze/DYO6zu7CgDkna6cfuSVyY3fa8xW6x9stad/vTdm3TvjMfWKIUrl D5mw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h11si12221965edr.319.2020.09.23.01.49.48; Wed, 23 Sep 2020 01:50:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726342AbgIWIst (ORCPT + 99 others); Wed, 23 Sep 2020 04:48:49 -0400 Received: from szxga04-in.huawei.com ([45.249.212.190]:14265 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726244AbgIWIst (ORCPT ); Wed, 23 Sep 2020 04:48:49 -0400 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id 99245D33C459C597CA47; Wed, 23 Sep 2020 16:48:42 +0800 (CST) Received: from [10.174.179.35] (10.174.179.35) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.487.0; Wed, 23 Sep 2020 16:48:32 +0800 Subject: Re: [Openipmi-developer] [PATCH] x86: Fix MCE error handing when kdump is enabled To: CC: Corey Minyard , , , , , , , References: <20200922161311.GQ3674@minyard.net> <20200922182940.31843-1-minyard@acm.org> <20200922184332.GT3674@minyard.net> From: Wu Bo Message-ID: <29448f27-12f7-82a1-7483-80471c36d48c@huawei.com> Date: Wed, 23 Sep 2020 16:48:31 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.2.2 MIME-Version: 1.0 In-Reply-To: <20200922184332.GT3674@minyard.net> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.174.179.35] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2020/9/23 2:43, Corey Minyard wrote: > On Tue, Sep 22, 2020 at 01:29:40PM -0500, minyard@acm.org wrote: >> From: Corey Minyard >> >> If kdump is enabled, the handling of shooting down CPUs does not use the >> RESET_VECTOR irq before trying to use NMIs to shoot down the CPUs. >> >> For normal errors that is fine. MCEs, however, are already running in >> an NMI, so sending them an NMI won't do anything. The MCE code is set >> up to receive the RESET_VECTOR because it disables CPUs, but it won't > ^ should be "enables irqs" >> work on the NMI-only case. >> >> There is already code in place to scan for the NMI callback being ready, >> simply call that from the MCE's wait_for_panic() code so it will pick up >> and handle it if an NMI shootdown is requested. This required >> propagating the registers down to wait_for_panic(). >> >> Signed-off-by: Corey Minyard >> --- >> After looking at it a bit, I think this is the proper way to fix the >> issue, though I'm not an expert on this code so I'm not sure. >> >> I have not even tested this patch, I have only compiled it. But from >> what I can tell, things waiting in NMIs for a shootdown should call >> run_crash_ipi_callback() in their wait loop. Hi, In my VM (using qemu-kvm), Kump is enabled, used mce-inject injects an uncorrectable error. I has an issue with the IPMI driver's panic handling running while the other CPUs are sitting in "wait_for_panic()" with interrupt on, and IPMI interrupts interfering with the panic handling, As a result, IPMI panic hangs for more than 3000 seconds. After I has patched and tested this patch, the problem of IPMI hangs has disappeared. It should be a solution to the problem. Thanks, Wu Bo >> >> arch/x86/kernel/cpu/mce/core.c | 67 ++++++++++++++++++++++------------ >> 1 file changed, 44 insertions(+), 23 deletions(-) >> >> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c >> index f43a78bde670..3a842b3773b3 100644 >> --- a/arch/x86/kernel/cpu/mce/core.c >> +++ b/arch/x86/kernel/cpu/mce/core.c >> @@ -282,20 +282,35 @@ static int fake_panic; >> static atomic_t mce_fake_panicked; >> >> /* Panic in progress. Enable interrupts and wait for final IPI */ >> -static void wait_for_panic(void) >> +static void wait_for_panic(struct pt_regs *regs) >> { >> long timeout = PANIC_TIMEOUT*USEC_PER_SEC; >> >> preempt_disable(); >> local_irq_enable(); >> - while (timeout-- > 0) >> + while (timeout-- > 0) { >> + /* >> + * We are in an NMI waiting to be stopped by the >> + * handing processor. For kdump handling, we need to >> + * be monitoring crash_ipi_issued since that is what >> + * is used for an NMI stop used by kdump. But we also >> + * need to have interrupts enabled some so that >> + * RESET_VECTOR will interrupt us on a normal >> + * shutdown. >> + */ >> + local_irq_disable(); >> + run_crash_ipi_callback(regs); >> + local_irq_enable(); >> + >> udelay(1); >> + } >> if (panic_timeout == 0) >> panic_timeout = mca_cfg.panic_timeout; >> panic("Panicing machine check CPU died"); >> } >> >> -static void mce_panic(const char *msg, struct mce *final, char *exp) >> +static void mce_panic(const char *msg, struct mce *final, char *exp, >> + struct pt_regs *regs) >> { >> int apei_err = 0; >> struct llist_node *pending; >> @@ -306,7 +321,7 @@ static void mce_panic(const char *msg, struct mce *final, char *exp) >> * Make sure only one CPU runs in machine check panic >> */ >> if (atomic_inc_return(&mce_panicked) > 1) >> - wait_for_panic(); >> + wait_for_panic(regs); >> barrier(); >> >> bust_spinlocks(1); >> @@ -817,7 +832,7 @@ static atomic_t mce_callin; >> /* >> * Check if a timeout waiting for other CPUs happened. >> */ >> -static int mce_timed_out(u64 *t, const char *msg) >> +static int mce_timed_out(u64 *t, const char *msg, struct pt_regs *regs) >> { >> /* >> * The others already did panic for some reason. >> @@ -827,12 +842,12 @@ static int mce_timed_out(u64 *t, const char *msg) >> */ >> rmb(); >> if (atomic_read(&mce_panicked)) >> - wait_for_panic(); >> + wait_for_panic(regs); >> if (!mca_cfg.monarch_timeout) >> goto out; >> if ((s64)*t < SPINUNIT) { >> if (mca_cfg.tolerant <= 1) >> - mce_panic(msg, NULL, NULL); >> + mce_panic(msg, NULL, NULL, regs); >> cpu_missing = 1; >> return 1; >> } >> @@ -866,7 +881,7 @@ static int mce_timed_out(u64 *t, const char *msg) >> * All the spin loops have timeouts; when a timeout happens a CPU >> * typically elects itself to be Monarch. >> */ >> -static void mce_reign(void) >> +static void mce_reign(struct pt_regs *regs) >> { >> int cpu; >> struct mce *m = NULL; >> @@ -896,7 +911,7 @@ static void mce_reign(void) >> * other CPUs. >> */ >> if (m && global_worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) >> - mce_panic("Fatal machine check", m, msg); >> + mce_panic("Fatal machine check", m, msg, regs); >> >> /* >> * For UC somewhere we let the CPU who detects it handle it. >> @@ -909,7 +924,8 @@ static void mce_reign(void) >> * source or one CPU is hung. Panic. >> */ >> if (global_worst <= MCE_KEEP_SEVERITY && mca_cfg.tolerant < 3) >> - mce_panic("Fatal machine check from unknown source", NULL, NULL); >> + mce_panic("Fatal machine check from unknown source", NULL, NULL, >> + regs); >> >> /* >> * Now clear all the mces_seen so that they don't reappear on >> @@ -928,7 +944,7 @@ static atomic_t global_nwo; >> * in the entry order. >> * TBD double check parallel CPU hotunplug >> */ >> -static int mce_start(int *no_way_out) >> +static int mce_start(int *no_way_out, struct pt_regs *regs) >> { >> int order; >> int cpus = num_online_cpus(); >> @@ -949,7 +965,8 @@ static int mce_start(int *no_way_out) >> */ >> while (atomic_read(&mce_callin) != cpus) { >> if (mce_timed_out(&timeout, >> - "Timeout: Not all CPUs entered broadcast exception handler")) { >> + "Timeout: Not all CPUs entered broadcast exception handler", >> + regs)) { >> atomic_set(&global_nwo, 0); >> return -1; >> } >> @@ -975,7 +992,8 @@ static int mce_start(int *no_way_out) >> */ >> while (atomic_read(&mce_executing) < order) { >> if (mce_timed_out(&timeout, >> - "Timeout: Subject CPUs unable to finish machine check processing")) { >> + "Timeout: Subject CPUs unable to finish machine check processing", >> + regs)) { >> atomic_set(&global_nwo, 0); >> return -1; >> } >> @@ -995,7 +1013,7 @@ static int mce_start(int *no_way_out) >> * Synchronize between CPUs after main scanning loop. >> * This invokes the bulk of the Monarch processing. >> */ >> -static int mce_end(int order) >> +static int mce_end(int order, struct pt_regs *regs) >> { >> int ret = -1; >> u64 timeout = (u64)mca_cfg.monarch_timeout * NSEC_PER_USEC; >> @@ -1020,12 +1038,13 @@ static int mce_end(int order) >> */ >> while (atomic_read(&mce_executing) <= cpus) { >> if (mce_timed_out(&timeout, >> - "Timeout: Monarch CPU unable to finish machine check processing")) >> + "Timeout: Monarch CPU unable to finish machine check processing", >> + regs)) >> goto reset; >> ndelay(SPINUNIT); >> } >> >> - mce_reign(); >> + mce_reign(regs); >> barrier(); >> ret = 0; >> } else { >> @@ -1034,7 +1053,8 @@ static int mce_end(int order) >> */ >> while (atomic_read(&mce_executing) != 0) { >> if (mce_timed_out(&timeout, >> - "Timeout: Monarch CPU did not finish machine check processing")) >> + "Timeout: Monarch CPU did not finish machine check processing", >> + regs)) >> goto reset; >> ndelay(SPINUNIT); >> } >> @@ -1286,9 +1306,9 @@ noinstr void do_machine_check(struct pt_regs *regs) >> */ >> if (lmce) { >> if (no_way_out) >> - mce_panic("Fatal local machine check", &m, msg); >> + mce_panic("Fatal local machine check", &m, msg, regs); >> } else { >> - order = mce_start(&no_way_out); >> + order = mce_start(&no_way_out, regs); >> } >> >> __mc_scan_banks(&m, final, toclear, valid_banks, no_way_out, &worst); >> @@ -1301,7 +1321,7 @@ noinstr void do_machine_check(struct pt_regs *regs) >> * When there's any problem use only local no_way_out state. >> */ >> if (!lmce) { >> - if (mce_end(order) < 0) >> + if (mce_end(order, regs) < 0) >> no_way_out = worst >= MCE_PANIC_SEVERITY; >> } else { >> /* >> @@ -1314,7 +1334,7 @@ noinstr void do_machine_check(struct pt_regs *regs) >> */ >> if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) { >> mce_severity(&m, cfg->tolerant, &msg, true); >> - mce_panic("Local fatal machine check!", &m, msg); >> + mce_panic("Local fatal machine check!", &m, msg, regs); >> } >> } >> >> @@ -1325,7 +1345,7 @@ noinstr void do_machine_check(struct pt_regs *regs) >> if (cfg->tolerant == 3) >> kill_it = 0; >> else if (no_way_out) >> - mce_panic("Fatal machine check on current CPU", &m, msg); >> + mce_panic("Fatal machine check on current CPU", &m, msg, regs); >> >> if (worst > 0) >> irq_work_queue(&mce_irq_work); >> @@ -1361,7 +1381,8 @@ noinstr void do_machine_check(struct pt_regs *regs) >> */ >> if (m.kflags & MCE_IN_KERNEL_RECOV) { >> if (!fixup_exception(regs, X86_TRAP_MC, 0, 0)) >> - mce_panic("Failed kernel mode recovery", &m, msg); >> + mce_panic("Failed kernel mode recovery", &m, >> + msg, regs); >> } >> } >> } >> -- >> 2.17.1 >> >> >> >> _______________________________________________ >> Openipmi-developer mailing list >> Openipmi-developer@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/openipmi-developer > . >