Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp564264pxu; Tue, 5 Jan 2021 20:37:18 -0800 (PST) X-Google-Smtp-Source: ABdhPJyM/fzrBzns1Ev82ma0hNDSScw4sWnSkqiTPnCUVSHiQIOtzZyLsSMu1qPR49pL+L/i5zbw X-Received: by 2002:a17:906:2b50:: with SMTP id b16mr1638850ejg.255.1609907838131; Tue, 05 Jan 2021 20:37:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1609907838; cv=none; d=google.com; s=arc-20160816; b=UqMeft5orixU74DIO3eu0QYLG/+GBbSSTHtxIwFy7SxQ46q5DL7pjnR6g4W9vBzrX0 cXkyTUiRdcBNiKJC4oPwrR3tn4zrVboVYgQ4tKgtWYgvrAC0Mh0wOeA0BloqWQXvMvbc PXgevuoQeCxmJ2c90+8jTjxX1UfktddaToSu1tmUcace+R5uhFFaqLHcqfFS9O8kU79/ lOq1F/bQ4afGmsor7EjcJYhXp0IhO3EQLh0wWULfyxrwt9PMuWSn0Mcg98XCcaCfft5f eSZHpsp8ryDw3Eh3Krh3aUGCUBYHgkGdbd8CrcOHKxjNhfBol1P7/twGoD1dmBBBt5TI GIVQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=0jZbWD+sHK2GdWV/CUfefgawgJK/eGwztAnh5wrZIDg=; b=qg52ukYaUwe6KZ/koeQDM9lQtOsvpR6ak19TmuV+N3UbbMbW1YUg/tbbQMkpJFYOEE g0Rp16f9LF77pwicSWhtfBwES13qr4K1+KOu6E6goOvQ0yTqk5UPlHu6DfawSLyEHBjR l2BBbX8ujwyQ23G52OaTN1bU5h2S5iwl8wMhcWIQY+3Yw14/H9uijPpW5fP0AC2OCNY3 3VV00EV77so0IMzwBB0rE1iMo7YxTARRBVYKf7FRLBaLk3AV5JH/fiHNg8spNPjwZUnI QRQWwnAUFNSXQaNBKPsB6VhlUTxs2eUwKA6n63bi8r9owh8GHHe3pPfsFN2x7FenqW8Q Qvkg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=e49I3amG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a12si499520ejg.220.2021.01.05.20.36.53; Tue, 05 Jan 2021 20:37:18 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=e49I3amG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725948AbhAFEfz (ORCPT + 99 others); Tue, 5 Jan 2021 23:35:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49100 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725822AbhAFEfz (ORCPT ); Tue, 5 Jan 2021 23:35:55 -0500 Received: from mail-pg1-x534.google.com (mail-pg1-x534.google.com [IPv6:2607:f8b0:4864:20::534]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 16F96C06134C for ; Tue, 5 Jan 2021 20:35:15 -0800 (PST) Received: by mail-pg1-x534.google.com with SMTP id i5so1405474pgo.1 for ; Tue, 05 Jan 2021 20:35:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=0jZbWD+sHK2GdWV/CUfefgawgJK/eGwztAnh5wrZIDg=; b=e49I3amGEEQbWoPzQpGgiyM82/JwRi3TmzYiIu71mWZMeD9cLdyDHHneOsCo4bGoyb JEwGp13KKX+dOOIz4oBCZ0QzmtAsOdWIvHgKAoqw/zJ4P4/akalOoY1NOrStWAuN+b0H HDIDBmc0N08ZGAezGW3I9KUaGlbulHaBtdevif1RV/2MhRiZpAb/g8AzIXlux+GI7ow9 VTDGNj/E+D7mpmM114VgA2sO3Z3I+cdr6Iov3m17EeXHhzjMZUuT9TPHMFUPhG2gooMd 6KNuba+O28y6Qu7gbyf4NefmL9NrLAz+VRiL8I5xm+NNT/C5nXVXwBAtYSCGc5KlIVPy 3nwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=0jZbWD+sHK2GdWV/CUfefgawgJK/eGwztAnh5wrZIDg=; b=fT5cnjIfO6Z6cEcsx8LZPYAPTIsoASD8bvIin++R5GmiU1HgzgKcc4it1OBF69BAoy wLJq1a+zIRzhcmlAYThznjePXtDn8IQodkli934eCbw82vfn556gFlcvPMqZoXSupNXB i8M0MU6YpaHjA0SDmpwcFRkOCO/++PV8KRa0mYvAtDP4fUQc8znNNVasnzgo1XbyDMzv bHgWLstoY9BQgjkLELyc6UziXThIddrYer4oKGKzObYPZi8AWdv8ohGtpO8NfpjgzIZI Sy+AundSHtS8PdNYwjxyHxrdTCBRwixucniQeiWgulGba6hF4+YhcyZBEORhHcL0nvy9 SdrA== X-Gm-Message-State: AOAM531ictEif5f4wnQdq+/8f0LNufe5LUFO/TVtGgYs0UiL5XpIhfd6 hpITnhDUJW7qoEC3rGk3+bs= X-Received: by 2002:a63:5d10:: with SMTP id r16mr2476110pgb.406.1609907714470; Tue, 05 Jan 2021 20:35:14 -0800 (PST) Received: from localhost ([2409:10:2e40:5100:6e29:95ff:fe2d:8f34]) by smtp.gmail.com with ESMTPSA id 193sm832263pfz.36.2021.01.05.20.35.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Jan 2021 20:35:13 -0800 (PST) Date: Wed, 6 Jan 2021 13:35:11 +0900 From: Sergey Senozhatsky To: =?utf-8?Q?=E2=80=9CWilliam?= Roche Cc: linux-kernel@vger.kernel.org, John Ogness , Peter Zijlstra , Petr Mladek , Steven Rostedt , Andrew Morton , Sergey Senozhatsky , Thomas Gleixner , Borislav Petkov Subject: Re: [PATCH v1] panic: push panic() messages to the console even from the MCE nmi handler Message-ID: References: <1609794955-3661-1-git-send-email-william.roche@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1609794955-3661-1-git-send-email-william.roche@oracle.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On (21/01/04 16:15), “William Roche wrote: [..] > diff --git a/kernel/panic.c b/kernel/panic.c > index 332736a..eb90cc0 100644 > --- a/kernel/panic.c > +++ b/kernel/panic.c > @@ -166,6 +166,15 @@ static void panic_print_sys_info(void) > ftrace_dump(DUMP_ALL); > } > > +/* > + * Force flush messages to the console. > + */ > +static void panic_flush_to_console(void) > +{ > + printk_safe_flush_on_panic(); > + console_flush_on_panic(CONSOLE_FLUSH_PENDING); > +} You must debug_locks_off() as the very first step here. But see below. > /** > * panic - halt the system > * @fmt: The text string to print > @@ -247,7 +256,7 @@ void panic(const char *fmt, ...) > * Bypass the panic_cpu check and call __crash_kexec directly. > */ > if (!_crash_kexec_post_notifiers) { > - printk_safe_flush_on_panic(); > + panic_flush_to_console(); > __crash_kexec(NULL); It's dangerous to call console_flush_on_panic() before we stop secondary CPUs. console_flush_on_panic() ignores the state console_sem, so if any of the secondary is currently printing something on the consoles you'll get corrupted messages - we use `static char buffer` for messages we push to consoles. Another issue is that with this panic_flush_to_console() call console_sem can end up being locked once (by secondary CPU) and unlocked twice (by second and panic CPUs) [*] > /* > @@ -271,9 +280,8 @@ void panic(const char *fmt, ...) > */ > atomic_notifier_call_chain(&panic_notifier_list, 0, buf); > > - /* Call flush even twice. It tries harder with a single online CPU */ > - printk_safe_flush_on_panic(); > kmsg_dump(KMSG_DUMP_PANIC); > + panic_flush_to_console(); Why? > /* > * If you doubt kdump always works fine in any situation, > @@ -298,7 +306,7 @@ void panic(const char *fmt, ...) > * buffer. Try to acquire the lock then release it regardless of the > * result. The release will also print the buffers out. Locks debug > * should be disabled to avoid reporting bad unlock balance when > - * panic() is not being callled from OOPS. > + * panic() is not being called from OOPS. > */ > debug_locks_off(); > console_flush_on_panic(CONSOLE_FLUSH_PENDING); > @@ -314,6 +322,7 @@ void panic(const char *fmt, ...) > * We can't use the "normal" timers since we just panicked. > */ > pr_emerg("Rebooting in %d seconds..\n", panic_timeout); > + panic_flush_to_console(); [*] So this > for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) { > touch_nmi_watchdog(); > @@ -347,6 +356,7 @@ void panic(const char *fmt, ...) > disabled_wait(); > #endif > pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); > + panic_flush_to_console(); [*] and this are both very interesting points. Those extra console_flush_on_panic() calls indicate that normal printk() cannot succeed in locking the console_sem so it doesn't try to console_unlock(): either because we killed the secondary CPU while it was holding the lock, or because we locked it once and unlocked it twice. I think it would make sense to just re-init console_sem, so that normal printk()-s will have chance to grab the console_sem lock and then we don't need any extra panic_flush_to_console() calls. Maybe we can do something like this --- diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index ffdd0dc7ec6d..4bd2e29c8cc0 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2638,6 +2638,7 @@ void console_flush_on_panic(enum con_flush_mode mode) * context and we don't want to get preempted while flushing, * ensure may_schedule is cleared. */ + sema_init(&console_sem, 1); console_trylock(); console_may_schedule = 0;