Date: Wed, 18 Mar 2020 19:05:25 +0100
From: Eugeniu Rosca
To: Sergey Senozhatsky
CC: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
 Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang,
 Valdis Kletnieks, Sebastian Andrzej Siewior, Andrew Gabbasov,
 Dirk Behme, Eugeniu Rosca, Eugeniu Rosca
Subject: Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup
Message-ID: <20200318180525.GA5790@lxhi-065.adit-jv.com>
References: <20200315170903.17393-1-erosca@de.adit-jv.com>
 <20200315170903.17393-4-erosca@de.adit-jv.com>
 <20200317021818.GD219881@google.com>
In-Reply-To: <20200317021818.GD219881@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Sergey,

Many thanks for your feedback!

On Tue, Mar 17, 2020 at 11:18:18AM +0900, Sergey Senozhatsky wrote:
> On (20/03/15 18:09), Eugeniu Rosca wrote:
> > [..]
>
> > @@ -428,6 +428,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >  			}
> >  		}
> >
> > +		console_verbose_start();
> > +
> >  		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> >  			smp_processor_id(), duration,
> >  			current->comm, task_pid_nr(current));
> > @@ -453,6 +455,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >  		if (softlockup_panic)
> >  			panic("softlockup: hung tasks");
> >  		__this_cpu_write(soft_watchdog_warn, true);
> > +
> > +		console_verbose_end();
> >  	} else
> >  		__this_cpu_write(soft_watchdog_warn, false);
> >
> I'm afraid, as of now, this approach is not going to work the way it's
> supposed to work in 100% of cases. Because the only thing that printk()
> call sort of guarantees is that the message will be stored somewhere.
> Either in the main kernel log buffer, or in one of auxiliary per-CPU
> log buffers. It does not guarantee, generally speaking, that the message
> will be printed on the console immediately.

I take this passage as an acknowledgement that the problem is _real_,
even though the fix is not perfect.
One aspect I would like to emphasize (please NAK this statement if it
is not accurate) is that the problem reported in this patch is not
specific to the existing printk mechanism, but also applies to the
upcoming kthread-based printk. If that's true, then IMHO this is a
compelling argument to join forces and try to find a working, safe and
future-proof solution.

>
> Consider the following example:
>
>     CPU0                        CPU1
>     console_lock();
>     schedule();
>
>                                 watchdog()
>                                  console_verbose_start();
>                                  printk()
>                                   log_store()
>                                   if (!console_trylock())
>                                       return;
>                                  console_verbose_end();
>
>     ...
>     console_unlock()
>      print logbuf messages to the consoles
>      we missed the console_verbose_start/end
>      on CPU1

This looks plausible. However, I wonder to what degree the same
scenario is a concern in the kthread-based approach?

My current standpoint is that, as long as points [A-D] are met, it
should do no harm to accept a (partial) fix like the one in my series:

 - [A] the patch tackles at least a subset of problematic use-cases
 - [B] the fix is non-intrusive and easy to review
 - [C] there is hope to reuse it in the new lockless buffer based printk
 - [D] there are no regressions employing the major console knobs
       (ignore_loglevel, quiet, loglevel, etc) as happened in
       a6ae928c25835c ("Revert "printk: make sure to print log on
       console."")

Of the above points, my only major concern is that the current series
breaks the expectations of users who pass loglevel=0 on the kernel
command line and expect the system to be totally silent. This has
already been expressed in the cover letter. I would especially
appreciate hearing whether others share (or reject) this view.

>
> IIRC, we had a similar approach in the past. See commit 375899cddcbb26
> ("printk: make sure to print log on console"). And we reverted it, see
> a6ae928c25835 ("Revert "printk: make sure to print log on console.").

Thanks for this reference.
It looks to me that, in spite of being relatively compact, commit
375899cddcbb26 ("printk: make sure to print log on console.") broke
criterion [D] listed above. I intend to avoid that by testing multiple
console knob values on my arm64 system.

Looking forward to your feedback on the questions posted above. TIA!

--
Best Regards
Eugeniu Rosca