From: John Ogness
To: Petr Mladek
Cc: Linus Torvalds, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel@vger.kernel.org, Greg Kroah-Hartman
Subject: Re: panic context: was: Re: [PATCH printk v2 04/11] printk: nbcon: Provide functions to mark atomic write sections
References: <87h6n5teos.fsf@jogness.linutronix.de> <87il7hv2v2.fsf@jogness.linutronix.de>
Date: Tue, 10 Oct 2023 18:08:40 +0206
Message-ID: <874jiy8nz3.fsf@jogness.linutronix.de>

On 2023-10-09, Petr Mladek wrote:
> We really have to distinguish emergency and panic context!

OK.
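A minimal userspace model of why the distinction matters, assuming three ordered priorities (normal, emergency, panic) where a context may take over a console held at a strictly lower priority. The names (enum prio, con_try_acquire(), con_release()) are hypothetical and are not the nbcon API; the model ignores CPUs, locking and atomic state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical priorities for the three printing contexts discussed
 * in this thread. Illustrative only, not the kernel's names. */
enum prio { PRIO_NONE = 0, PRIO_NORMAL, PRIO_EMERGENCY, PRIO_PANIC };

/* Minimal model of per-console ownership. */
struct console_model {
	enum prio owner;	/* priority of the current owner, or PRIO_NONE */
};

/* Acquire the console if it is free or held at a strictly lower
 * priority (a takeover). Returns true on success. */
static bool con_try_acquire(struct console_model *con, enum prio p)
{
	assert(p != PRIO_NONE);
	if (con->owner == PRIO_NONE || p > con->owner) {
		con->owner = p;
		return true;
	}
	return false;
}

/* Release only if the caller still owns the console, so a context
 * that was taken over cannot clear the new owner's state. */
static void con_release(struct console_model *con, enum prio p)
{
	if (con->owner == p)
		con->owner = PRIO_NONE;
}
```

Replaying the boot/WARN()/panic() sequence discussed in this thread: the kthread acquires at PRIO_NORMAL, WARN() takes over at PRIO_EMERGENCY, and panic() takes over from WARN() at PRIO_PANIC. After that, neither lower priority can reclaim the console, and a late release by the interrupted WARN() context leaves panic ownership intact.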

>> The LPC2022 demo/talk was recorded:
>>
>> https://www.youtube.com/watch?v=TVhNcKQvzxI
>>
>> At 55:55 is where the situation occurred and triggered the conversation,
>> ultimately leading to this new feature.
>
> Thanks for the link. My understanding is that the following scenario
> has happened:
>
> 1. System was booting and messages were being flushed using the kthread.
>
> 2. WARN() happened. It printed the 1st line, took over the per-console
>    console lock and started flushing the backlog. There were still
>    many pending messages from the boot.
>
> 3. NMI came and caused panic(). The panic context printed its first line,
>    took over the console from the WARN context, flushed the rest
>    of the backlog and continued printing/flushing more messages from
>    the panic() context.
>
>
> Problem:
>
>    People complained that they saw only the first line from WARN().
>    The related detailed info, including backtrace, was missing.
>
>    It was because panic() interrupted WARN() before it was able
>    to flush the backlog and print/flush all WARN() messages.

Thanks for taking the time to review it in detail.

> Proposed solution:
>
>    WARN()/emergency context should first store the messages and
>    flush them at the end.
>
>    It would increase the chance that all WARN() messages will
>    be stored in the ring buffer before NMI/panic() is called.
>
>    panic() would then flush all pending messages including
>    the stored WARN() messages.

OK.

> Important:
>
>    The problem is that panic() interrupted WARN().

Ack.

>> You may also want to reread my summary:
>>
>> https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de
>
> Again, thanks for the pointer. Let me paste 2 paragraphs here:
>
>
> - Printing the backlog is important! If some emergency situation occurs,
>   make sure the backlog gets printed.
>
> - When an emergency occurs, put the full backtrace into the ringbuffer
>   before flushing any backlog.
>   This ensures that the backtrace makes it
>   into the ringbuffer in case a panic occurs while flushing the backlog.
>
>
> My understanding is:
>
> 1st paragraph is the reason why:
>
>    + we have three priorities: normal, emergency, panic
>
>    + messages in normal context might be offloaded to kthread
>
>    + emergency and panic context should try to flush the messages
>      from this context.
>
>
> 2nd paragraph talks about that emergency context should first store
> the messages and flush them later. And the important part:
>
>     "in case a panic occurs while flushing the backlog.
>
>     => panic might interrupt emergency
>
> It clearly distinguishes emergency and panic context.
>
>
>> as well as Linus' follow-up message:
>>
>> https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@mail.gmail.com
>
> IMHO, the important part is:
>
>
> Yeah, I really liked the notion of doing the oops with just filling
> the back buffer but still getting it printed out if something goes
> wrong in the middle.
>
>
> He was talking about oops => emergency context
>
> Also he wanted to get it out when something goes wrong in the middle
>    => panic in the middle ?
>
>
> And another paragraph:
>
>
> I doubt it ends up being an issue in practice, but basically I wanted
> to just pipe up and say that the exact details of how much of the back
> buffer needs to be flushed first _could_ be tweaked if it ever does
> come up as an issue.
>
>
> Linus had doubts that there might be problems with too long backlog
> in practice. And I have the doubts as well.
>
> And this is my view. The deferred flush is trying to solve a corner
> case and we are forgetting what blocked printk kthreads >10 years.

OK. Thank you for the detailed analysis. For v3 I will do something
similar to what you proposed [0], except that I will use a per-cpu
variable (to track printk emergency nesting) instead of adding a new
field to the task struct.
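A userspace sketch of the store-first, flush-at-end idea combined with a per-CPU emergency nesting counter, as described above. Everything here is hypothetical and is not the v3 patch: the names (emergency_enter(), emergency_exit(), model_printk(), model_panic(), NR_CPUS_MODEL) are invented, the CPU is passed explicitly (a kernel version would need the section to stay on one CPU, e.g. with migration disabled), the toy ring buffer never wraps, and the printing kthread is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4
#define RB_SIZE 64

/* Hypothetical per-CPU emergency nesting counters. */
static int emergency_nesting[NR_CPUS_MODEL];
static bool panic_in_progress;

/* Toy ring buffer: records are only appended. */
static const char *ringbuf[RB_SIZE];
static size_t next_seq;			/* number of records stored */
static size_t printed;			/* number of records flushed */
static const char *last_printed;	/* stands in for console output */

static void flush_pending(void)
{
	/* A real console driver would write each record out here. */
	while (printed < next_seq)
		last_printed = ringbuf[printed++];
}

static void store_record(const char *msg)
{
	assert(next_seq < RB_SIZE);	/* toy buffer: no wraparound */
	ringbuf[next_seq++] = msg;
}

static void emergency_enter(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	emergency_nesting[cpu]++;
}

static void emergency_exit(int cpu)
{
	assert(emergency_nesting[cpu] > 0);
	emergency_nesting[cpu]--;
	/* Outermost exit: flush the backlog plus the stored lines. */
	if (emergency_nesting[cpu] == 0)
		flush_pending();
}

static void model_printk(int cpu, const char *msg)
{
	store_record(msg);
	if (panic_in_progress) {
		flush_pending();	/* panic: print immediately */
		return;
	}
	if (emergency_nesting[cpu] > 0)
		return;			/* emergency: store now, flush at exit */
	/* normal: left to the printing kthread (not modeled) */
}

static void model_panic(void)
{
	panic_in_progress = true;
	flush_pending();	/* includes any stored WARN() lines */
}
```

Because the WARN() lines are stored before any flushing starts, a panic() arriving in the middle still finds the full backtrace in the ring buffer and flushes it together with the boot backlog. The nesting counter ensures that a WARN() inside another emergency section only flushes when the outermost section exits.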

John Ogness

[0] https://lore.kernel.org/lkml/ZRLBxsXPCym2NC5Q@alley/