From: John Ogness
To: Eugeniu Rosca
Cc: Sergey Senozhatsky, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
    Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang,
    Valdis Kletnieks, Sebastian Andrzej Siewior, Andrew Gabbasov,
    Dirk Behme, Eugeniu Rosca
Subject: Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup
Date: Thu, 19 Mar 2020 09:20:27 +0100
Message-ID: <87r1xog3hw.fsf@linutronix.de>
In-Reply-To: <20200318180525.GA5790@lxhi-065.adit-jv.com> (Eugeniu Rosca's
    message of "Wed, 18 Mar 2020 19:05:25 +0100")
References: <20200315170903.17393-1-erosca@de.adit-jv.com>
    <20200315170903.17393-4-erosca@de.adit-jv.com>
    <20200317021818.GD219881@google.com>
    <20200318180525.GA5790@lxhi-065.adit-jv.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2020-03-18, Eugeniu Rosca wrote:
> On Tue, Mar 17, 2020 at 11:18:18AM +0900, Sergey Senozhatsky wrote:
>> On (20/03/15 18:09), Eugeniu Rosca wrote:
>>> @@ -428,6 +428,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>>  			}
>>>  		}
>>>  
>>> +		console_verbose_start();
>>> +
>>>  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
>>>  			smp_processor_id(), duration,
>>>  			current->comm, task_pid_nr(current));
>>> @@ -453,6 +455,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>>  		if (softlockup_panic)
>>>  			panic("softlockup: hung tasks");
>>>  		__this_cpu_write(soft_watchdog_warn, true);
>>> +
>>> +		console_verbose_end();
>>>  	} else
>>>  		__this_cpu_write(soft_watchdog_warn, false);
>>
>> I'm afraid, as of now, this approach is not going to work the way it's
>> supposed to work in 100% of cases. Because the only thing that printk()
>> call sort of guarantees is that the message will be stored somewhere.
>> Either in the main kernel log buffer, or in one of the auxiliary per-CPU
>> log buffers. It does not guarantee, generally speaking, that the message
>> will be printed on the console immediately.
>
> I take this passage as an acknowledgement of the problem being _real_,
> in spite of the fix being not perfect.
>
> One aspect I would like to emphasize is that (please, NAK this
> statement if it's not accurate) the problem reported in this patch is
> not specific to the existing printk mechanism, but also applies to the
> upcoming kthread-based printk. If that's true, then IMHO this is a
> compelling argument to join forces and try to find a working, safe and
> future-proof solution.

Let me clarify that the upcoming rework is _not_ simply about making a
kthread-based printk. The upcoming rework addresses several issues
(kthreads being only one piece):

1. allow printk-callers to get their messages immediately and
   locklessly into the log buffer from any context

2. during normal operation, printk-callers are completely decoupled
   from console printing

3. in the case of an emergency, every printk-caller will directly and
   synchronously perform console printing of its messages on consoles
   that support atomic writing

For the rework we decided that triggering an oops is what irreversibly
transitions the system from "normal operation" to "emergency".

One could interpret the current console_verbose() as some sort of
equivalent to irreversibly transitioning the system to "emergency", but
that is not the interpretation we decided on. In the rework,
printk-callers have no influence over forcing immediate console
printing (unless they trigger an oops).

console_verbose() is still relevant in the rework: it affects _what_ is
printed to the consoles, _not when_.

Once the printk rework is complete, the mechanisms for atomic and
synchronous console printing will be in place. It would be possible to
use these mechanisms in non-oops scenarios as well, but that would need
to be discussed in depth and clearly defined, with care taken to prevent
abuse.

John Ogness