Date: Fri, 8 Jan 2021 13:47:56 +0100
From: Petr Mladek
To: William Roche
Cc: linux-kernel@vger.kernel.org, John Ogness, Peter Zijlstra,
 Steven Rostedt, Andrew Morton, Sergey Senozhatsky, Thomas Gleixner,
 Borislav Petkov
Subject: Re: [PATCH v1] panic: push panic() messages to the console even from the MCE nmi
 handler
In-Reply-To: <1609794955-3661-1-git-send-email-william.roche@oracle.com>

On Mon 2021-01-04 16:15:55, William Roche wrote:
> From: William Roche
>
> Force push panic messages to the console as panic() can be called from NMI
> interrupt handler functions where printed messages can't always reach the
> console without an explicit push provided by printk_safe_flush_on_panic()
> and console_flush_on_panic().
> This is the case with the MCE handler that can lead to a system panic
> giving information on the fatal MCE root cause that must reach the console.
>
> Signed-off-by: William Roche
> ---
>
> Notes:
>     While testing MCE injection and kernel reaction, we discovered a bug
>     in the way the kernel provides the panic reason information: When dealing
>     with fatal MCE, the machine (physical or virtual) can reboot without
>     leaving any message on the console.
>
>     This behavior can be reproduced on Intel with the mce-inject tool
>     with a simple:
>         # modprobe mce-inject
>         # mce-inject test/uncorrected
>
>     The investigations showed that the MCE panic can be totally message-less
>     or can give a small set of messages. This behavior depends on the use of the
>     crash_kexec mechanism (using the "crashkernel" parameter). Not using this
>     parameter, we get a partial [Hardware Error] information on panic, but some
>     important notifications can be missing. And when using it, a fatal MCE can
>     panic the system without leaving any information.
>
>     . Without "crashkernel", a Fatal MCE injection shows:
>
>     [  212.153928] mce: Machine check injector initialized
>     [  236.730682] mce: Triggering MCE exception on CPU 0
>     [  236.731304] Disabling lock debugging due to kernel taint
>     [  236.731947] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 1: b000000000000000
>     [  236.731948] mce: [Hardware Error]: TSC 78418fb4a83f
>     [  236.731949] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1605312952 SOCKET 0 APIC 0 microcode 1
>     [  236.731949] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
>     [  236.731950] mce: [Hardware Error]: Machine check: MCIP not set in MCA handler
>     [  236.731950] Kernel panic - not syncing: Fatal machine check
>     [  236.732047] Kernel Offset: disabled
>
>     The system hangs 30 seconds without any additional message, and finally
>     reboots.
>
>     . With the use of "crashkernel", a Fatal MCE injection shows only the
>     injection message
>
>     [   80.811708] mce: Machine check injector initialized
>     [   92.298755] mce: Triggering MCE exception on CPU 0
>     [   92.299362] Disabling lock debugging due to kernel taint
>
>     No other messages is displayed and the system reboots immediately.

But you can find the messages in the crash dump, can't you?

It works this way by "design". The idea is the following:

Taking any locks from NMI context might lead to a deadlock.
Re-initializing the locks might lead to a deadlock as well
because of a possible double unlock. Ignoring the locks might
lead to problems as well.

A compromise is needed:

1. crashdump disabled:

   console_flush_on_panic() is called. It tries hard to get
   the messages on the console because it is the only chance.

   It does console_trylock(). It is called after
   bust_spinlocks(1) so that even the console-specific locks
   are taken only with trylock, see oops_in_progress.

   BTW: There are people that do not like this because there
   is still a risk of a deadlock. Some code paths take locks
   without checking oops_in_progress.
   For these people, a more reliable reboot is more important
   because they want to have the system back ASAP (cloud people).

2. crashdump enabled:

   Only printk_safe_flush_on_panic() is called. It makes a best
   effort to flush messages from the per-CPU buffers into the main
   log buffer so that they can be found easily in the core.

   It is pretty reliable. It should not be needed at all once
   the new lockless ringbuffer gets fully integrated.

   It does not try to flush the messages to the console. Getting
   the crash dump is more important than risking a deadlock with
   consoles.

Best Regards,
Petr
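
The two cases above can be pictured with a toy user-space model. This is
only a sketch under stated assumptions, not the kernel code: ToyConsole,
toy_panic_flush() and the returned strings are made up for illustration.
It assumes that the per-CPU buffers are moved into the main log buffer in
both cases, and it simply gives up when the trylock fails. What the real
console_flush_on_panic() does in that situation is not modeled here.

```python
import threading

# Toy user-space model, NOT kernel code. All names are hypothetical.


class ToyConsole:
    def __init__(self):
        self.lock = threading.Lock()  # stands in for the console lock
        self.output = []              # messages that reached the "console"


def toy_panic_flush(per_cpu_buffers, main_log, console, crashdump_enabled):
    # Models the role of printk_safe_flush_on_panic(): move messages
    # from the per-CPU buffers into the main log buffer.
    # Assumption: this step happens in both cases.
    for buf in per_cpu_buffers:
        main_log.extend(buf)
        buf.clear()

    if crashdump_enabled:
        # Case 2: do not touch the consoles at all. The messages stay
        # in main_log, where the crash dump can find them.
        return "in-log-only"

    # Case 1: models the role of console_flush_on_panic(). Only a
    # trylock is used, so this path never blocks on the console lock.
    if console.lock.acquire(blocking=False):
        try:
            console.output.extend(main_log)
        finally:
            console.lock.release()
        return "flushed"

    # Simplification of this sketch: give up when the lock is held.
    return "console-busy"
```

With crashdump_enabled=True the messages end up only in main_log, which
corresponds to the "no messages on the console, but they are in the dump"
behavior reported in the patch notes.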