From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg
    Kroah-Hartman, stable@vger.kernel.org, Hari Bathini,
    Michael Ellerman, Sasha Levin
Subject: [PATCH 5.15 648/846] powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic
Date: Mon, 24 Jan 2022 19:42:45 +0100
Message-Id: <20220124184123.373050483@linuxfoundation.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220124184100.867127425@linuxfoundation.org>
References: <20220124184100.867127425@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Hari Bathini

[ Upstream commit 06e629c25daa519be620a8c17359ae8fc7a2e903 ]

In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37dffdc
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, vmcore captured with fadump via panic
path would not process register data for all CPUs except the panic'ing
CPU. Sample output of crash-utility with such vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crashes
    due to exceptions.

  - The above did change since commit 8341f2f222d7 ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    dereferencing a NULL pointer to force a kernel crash. But then
    commit de6e5d38417e ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline till kernel commit
    bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()") reverted
    that change.

To ensure post processing register data of all other CPUs happens
as intended, let panic() function take the crash friendly path (read
crash_smp_send_stop()) with the help of crash_kexec_post_notifiers
option. Also, as register data for all CPUs is captured by f/w, skip
IPI callbacks here for fadump, to avoid any complications in finding
the right backtraces.
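The ordering problem described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not
kernel code: struct panic_model and the model_*() helpers are
hypothetical names invented for illustration, the online mask is
reduced to a single 64-bit word, and the __crash_kexec() call that
panic() also makes on the non-post-notifier path is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8u

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct panic_model {
	uint64_t online_mask;            /* stand-in for __cpu_online_mask */
	bool crash_kexec_post_notifiers; /* mirrors the kernel flag of that name */
	bool fadump_crash;               /* stand-in for should_fadump_crash() */
};

/* Models smp_send_stop(): all CPUs but panic_cpu are stopped and marked offline. */
static void model_smp_send_stop(struct panic_model *m, unsigned int panic_cpu)
{
	assert(panic_cpu < MODEL_NR_CPUS);
	m->online_mask &= UINT64_C(1) << panic_cpu;
}

/* Models crash_smp_send_stop() for the fadump case only: return before any IPI. */
static void model_crash_smp_send_stop(struct panic_model *m)
{
	if (m->fadump_crash)
		return;
	/* The non-fadump crash path is deliberately not modelled here. */
}

/* Returns the online mask that the panic notifier, and so the vmcore, would see. */
static uint64_t model_panic(struct panic_model *m, unsigned int panic_cpu)
{
	if (!m->crash_kexec_post_notifiers)
		model_smp_send_stop(m, panic_cpu);
	else
		model_crash_smp_send_stop(m);
	return m->online_mask;
}
```

Starting from an online mask of 0xff with the panic on CPU 1, the
model yields 0x2 when crash_kexec_post_notifiers is false, which
matches the __cpu_online_mask shown in the crash output, and leaves
0xff intact when the flag is true and the fadump check applies.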

Signed-off-by: Hari Bathini
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20211207103719.91117-2-hbathini@linux.ibm.com
Signed-off-by: Sasha Levin
---
 arch/powerpc/kernel/fadump.c |  8 ++++++++
 arch/powerpc/kernel/smp.c    | 10 ++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b7ceb041743c9..60f5fc14aa235 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1641,6 +1641,14 @@ int __init setup_fadump(void)
 	else if (fw_dump.reserve_dump_area_size)
 		fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 
+	/*
+	 * In case of panic, fadump is triggered via ppc_panic_event()
+	 * panic notifier. Setting crash_kexec_post_notifiers to 'true'
+	 * lets panic() function take crash friendly path before panic
+	 * notifiers are invoked.
+	 */
+	crash_kexec_post_notifiers = true;
+
 	return 1;
 }
 subsys_initcall(setup_fadump);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index d03823aa7e4de..fb95f92dcfac6 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -61,6 +61,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef DEBUG
 #include
@@ -638,6 +639,15 @@ void crash_smp_send_stop(void)
 {
 	static bool stopped = false;
 
+	/*
+	 * In case of fadump, register data for all CPUs is captured by f/w
+	 * on ibm,os-term rtas call. Skip IPI callbacks to other CPUs before
+	 * this rtas call to avoid tricky post processing of those CPUs'
+	 * backtraces.
+	 */
+	if (should_fadump_crash())
+		return;
+
 	if (stopped)
 		return;
 
-- 
2.34.1
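
The guard ordering in the crash_smp_send_stop() hunk can be sketched
in isolation as follows. This is an illustrative userspace
approximation, not the kernel function: struct stop_state, its fields
and sketch_crash_smp_send_stop() are hypothetical names, the
fadump_crash field stands in for should_fadump_crash(), and a counter
stands in for actually sending stop IPIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bundle; in the kernel 'stopped' is a function-local static. */
struct stop_state {
	bool fadump_crash; /* plays the role of should_fadump_crash() */
	bool stopped;      /* one-shot latch, as in the patched function */
	int ipi_rounds;    /* counts how often stop IPIs would have been sent */
};

static void sketch_crash_smp_send_stop(struct stop_state *s)
{
	assert(s != NULL);

	/* fadump: firmware captures register data, so skip the IPIs entirely. */
	if (s->fadump_crash)
		return;

	/* Non-fadump: send the stop IPIs at most once. */
	if (s->stopped)
		return;
	s->stopped = true;
	s->ipi_rounds++;
}
```

Because the fadump check sits ahead of the latch, a fadump crash
never sets the latch and never sends IPIs in this sketch, while
repeated calls on the non-fadump path send them exactly once.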