Received: by 2002:a05:7412:2a8c:b0:e2:908c:2ebd with SMTP id u12csp1253676rdh; Mon, 25 Sep 2023 07:31:27 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGcKnWLfbFA36DYDRFgOddQK9iknXamDmm/FT4QpcVk6yELD0KN3tm5Ywa80gmNh20Pa9zg X-Received: by 2002:a05:6e02:2167:b0:349:7663:566c with SMTP id s7-20020a056e02216700b003497663566cmr8897467ilv.30.1695652287177; Mon, 25 Sep 2023 07:31:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695652287; cv=none; d=google.com; s=arc-20160816; b=GOKq/my6sGxiIB8y4n8AiNQg8w6eMhFCI/xj40B/v/FeEDmqwC4oEXohhREydGLRzp sCVuWUrOaT5mD8oYB2eyAb/Wm7Ig2IDKahWOBN/TQo+l/1pVLJrdwR1OZHD7z1Gp0NNd Qr3wTFV/7shToprHYBCWko5KAWSUSljeOdjJ6wzM02RqZRiQLx3QSqDl1s9ybCpd8N1R qLH9Y2MqS7hS3uyhT6Dt6u55epS+InS3KGze9w1yBRAEfb8AW4PpAP8tRxGID9dZ2iBC JyffI4y7P1NwhDxVAHpWUN7qvUotQXSyNv/tZpkiZHz/5vOLSqELBbMxbh8C2SN9Nn7E pOPg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:date:references :in-reply-to:subject:cc:to:dkim-signature:dkim-signature:from; bh=VGDSLgZSdKcOXlcPNU3EmTBi5UZKRWa8ZqAxGYScQ7k=; fh=pKnPKUsir0uEGSsor+4Zc2vgbu+g+ayvUgsdzkuXaoA=; b=Gz0gjpUHqYdbTl+ZiAgycNSeYB+Dvo//Ev9TpwEcCXdVmmX57peMjborq5cLBl5m0i H0f7zSu/iYrxO55C4oMox18HQjflA6jM5pVEDlVfwi5oOAsKZtgjRXkOn+waAAkC936h 1PrI+EHbnVgUHO2ZHjWsEv1qpEbfkWOAzANoGcH1yvfukleX9b8TuMtfZKKDKFz9rhrw Mmn8G9vMX0vwZ9mG+mR9eZCnzTnedVrutx2dOJQLxTVNagTsNN8Iyb7PaFNg77CNGOwi 6IRac0MFIdvp+VBZeEq0R6I4owM5JQMPNK57Q5XaDsZrotS4cM3z63R8iAijhUuVCiQb seYQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=Jery8T2F; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=DBs6vZn2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Return-Path: Received: from pete.vger.email (pete.vger.email. [2620:137:e000::3:6]) by mx.google.com with ESMTPS id w4-20020a634744000000b005781e99d048si10016507pgk.889.2023.09.25.07.31.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Sep 2023 07:31:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) client-ip=2620:137:e000::3:6; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=Jery8T2F; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=DBs6vZn2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by pete.vger.email (Postfix) with ESMTP id 8A993803D5B7; Mon, 25 Sep 2023 06:37:24 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at pete.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231367AbjIYNhP (ORCPT + 99 others); Mon, 25 Sep 2023 09:37:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56370 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230284AbjIYNhO (ORCPT ); Mon, 25 Sep 2023 09:37:14 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F2CD83 for ; Mon, 25 Sep 2023 06:37:07 -0700 (PDT) From: John Ogness DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1695649025; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=VGDSLgZSdKcOXlcPNU3EmTBi5UZKRWa8ZqAxGYScQ7k=; b=Jery8T2F2WP7uudy/8tYZzr+//iP/fgJ7SpgOm7Akv5vxNM/jrSid5HZw4g+YdMjJCDoh2 HVYmebEdU6MvapmloX3U2inF17aN+TRclObm/XHu36J88DX8F6jgj/Pjk0aJAGmiKEGmkN Ey8yoKCBsYXMLfNQvyXv7ZjAvj+Rnq5mtWfcgGB+EvXJ2gokUGth6Ok18Rknns4Q/M0aQ9 XR/xvSfFoIobvBlGt8Vpwyiuj45ZSz50mT4YAV9FTIq5qpqVxREMaHpbNJj78OSMSrym/N iPB+PuabumgtfqyhnIwcuztA41P6peeYd18CFoAjoqzFr0a8yJRPQjKg9ZxIiA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1695649025; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=VGDSLgZSdKcOXlcPNU3EmTBi5UZKRWa8ZqAxGYScQ7k=; b=DBs6vZn2IIgiNt54vyTaseQaWCsU0uI7QLXbWZFgWvsbkrzCkgSniqabEsXRswaJJ0+FOt GNtI08x/HgJr5KCw== To: Petr Mladek Cc: Sergey Senozhatsky , Steven Rostedt , Thomas Gleixner , linux-kernel@vger.kernel.org Subject: Re: [PATCH printk v2 06/11] printk: nbcon: Wire up nbcon console atomic flushing In-Reply-To: References: <20230919230856.661435-1-john.ogness@linutronix.de> <20230919230856.661435-7-john.ogness@linutronix.de> Date: Mon, 25 Sep 2023 15:43:03 +0206 Message-ID: <87a5tabchs.fsf@jogness.linutronix.de> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INVALID_DATE_TZ_ABSURD, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on pete.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (pete.vger.email [0.0.0.0]); Mon, 25 Sep 2023 06:37:24 -0700 (PDT) On 2023-09-22, Petr Mladek wrote: >> Perform nbcon console atomic flushing at key call sites: >> >> nbcon_atomic_exit() - When exiting from the outermost atomic >> printing section. > > IMHO, it would not help because it all depends > whether nbcon_context_try_acquire() succeeds or now, see below. Sure, but it first attempts a _safe_ flush on all nbcon consoles that alow it, _then_ it falls back to an _unsafe_ flush (which cannot fail, but might explode). >> If the priority was NBCON_PRIO_PANIC, >> attempt a second flush allowing unsafe hostile >> takeovers. > > I was first scared that this would be called by any printk() > in panic(). But it seems that nbcon_atomic_exit() is called > only one after disabling printk(). It might deserve a comment > where it is supposed to be used. See a proposal below. OK. >> console_flush_on_panic() - Called from several call sites to >> trigger ringbuffer dumping in an urgent situation. >> >> console_flush_on_panic() - Typically the panic() function will > > This is a second description of console_flush_of_panic() which > looks like a mistake. Oops. The first one should not be there. >> take care of atomic flushing the nbcon consoles on >> panic. However, there are several users of >> console_flush_on_panic() outside of panic(). > > The generic panic() seems to use console_flush_on_panic() correctly > at the very end. > > Hmm, I see that console_flush_on_panic() is called also in [...] > Anyway, we should make clear that console_flush_on_panic() might break > the system and should be called as the last attempt to flush consoles. > The above arch-specific users are worth review. In an upcoming series you will see that console_flush_on_panic() only takes the console lock if there are legacy consoles. Ideally, eventually there will only be nbcon consoles, so your concern would disappear. And if those users continue to use legacy consoles, then the risks will be the same as now. >> * Return: The previous priority that needs to be fed into >> * the corresponding nbcon_atomic_exit() >> * Context: Any context. Disables migration. >> + * >> + * When within an atomic printing section, no atomic printing occurs. This >> + * is to allow all emergency messages to be dumped into the ringbuffer before >> + * flushing the ringbuffer. > > The comment sounds like it is an advantage. But I think that it would be > a disadvantage. Please explain. At LPC2022 we agreed it was an advantage and it is implemented on purpose using the atomic printing sections. > Fortunately, this is not true. Even the atomic context tries > to flush the messages immediately when it is able to get > the per-cpu lock. It happes via console_flush_all() called > from console_unlock(). You are talking about legacy consoles. The goal of the nbcon consoles is to introduce a _new_ console type to support controlled decisions for emergency printing. Legacy consoles will continue to work (or not work) as before. >> @@ -1130,8 +1135,13 @@ void nbcon_atomic_exit(enum nbcon_prio prio, enum nbcon_prio prev_prio) >> { >> struct nbcon_cpu_state *cpu_state; >> >> + __nbcon_atomic_flush_all(false); > > IMHO, this would not help. Either this context was able to acquire the > lock and flush each message directly. What do you mean by "help"? We are exiting an atomic printing section. This is where the accumulated emergency messages are to be printed. If this is a nested atomic printing section, it does nothing because the outermost atomic printing section will flush. > Or it would fail to get the lock even here. If it fails to get the lock, we are talking about the worst case scenario. That scenario demands unsafe printing, which is only allowed in panic. If we are not in panic, it is assumed the current owner will take care of it. >> + >> cpu_state = nbcon_get_cpu_state(); >> >> + if (cpu_state->prio == NBCON_PRIO_PANIC) >> + __nbcon_atomic_flush_all(true); > > It would deserve a comment that nbcon_atomic_exit() is used > in panic() to calm down the consoles completely, similar effect > as setting the variable "suppress_printk". Actually, it is nbcon_atomic_enter() that does the calming down, but only for the emergency CPU. Other CPUs are allowed to print as wildly as they want. And all CPUs are allowed to continue to fill the ringbuffer. @suppress_printk is quite different as it globally blocks any new messages from being stored into the ringbuffer. This is not calming down the consoles, but rather putting a finite limit on what they produce. >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >> index 6ef33cefa4d0..419c0fabc481 100644 >> --- a/kernel/printk/printk.c >> +++ b/kernel/printk/printk.c >> @@ -3159,6 +3159,8 @@ void console_flush_on_panic(enum con_flush_mode mode) >> console_srcu_read_unlock(cookie); >> } >> >> + nbcon_atomic_flush_all(); >> + >> console_flush_all(false, &next_seq, &handover); > > It seems that console_flush_all() tries to flush nbcon conosoles > as well. And nbcon_atomic_flush_all() is explicitely called > close above this flush_on_panic(). This is from panic() > after this patchset is applied. > > > void panic(const char *fmt, ...) > { > [...] > nbcon_atomic_flush_all(); > > console_unblank(); > > /* > * We may have ended up stopping the CPU holding the lock (in > * smp_send_stop()) while still having some valuable data in the console > * buffer. Try to acquire the lock then release it regardless of the > * result. The release will also print the buffers out. Locks debug > * should be disabled to avoid reporting bad unlock balance when > * panic() is not being callled from OOPS. > */ > debug_locks_off(); > console_flush_on_panic(CONSOLE_FLUSH_PENDING); > > > By other words, we try to flush nbcon consoles 3 times almost > immediately after each other. "Almost immediately" != "immediately". Keep in mind that this is in atomic printing context. Generally speaking, no messages will be seen on the consoles yet. Anytime time there is a signficant piece (such as console_unblank()) it would probably be wise to flush. We will probably want even more flush points. But I wanted to at least start with the risky points (such as console_unblank() when legacy consoles still exist). > It would deserve some comment what is the purpose. Otherwise, people > would tend to remove the "redundant" calls. OK. >> @@ -3903,6 +3905,10 @@ void defer_console_output(void) >> >> void printk_trigger_flush(void) >> { >> + migrate_disable(); >> + nbcon_atomic_flush_all(); >> + migrate_enable(); > > Can this be actually called in NMI? No. I need to add in_nmi() checks. >> defer_console_output(); >> } > > This function would deserve some description because it looks strange. > There are three names which are contradicting each other: > > + trigger_flush: it sounds like it tells someone to do the flush > > + nbcon_atomic_flush_all: does the flush immediately > > + defer_console_output: sounds like it prevents the current context > to flush the consoles immediately Agreed. > We should either choose better names and/or add comments: > > /** > * console_flush_or_trigger() - Make sure that the consoles will get flushed. > * > * Try to flush consoles when possible or queue flushing consoles like > * in the deferred printk. > * > * Context: Can be used in any context (including NMI?) > */ > void printk_flush_or_trigger(void) > { > /* > * Try to flush consoles which do not depend on console_lock() > * and support safe atomic_write() > */ if (!in_nmi()) migrate_disable(); > nbcon_atomic_flush_all(); if (!in_nmi()) migrate_enable(); > > /* Try flushing legacy consoles or queue the deferred handling. */ > if (!in_nmi() && console_trylock()) > console_unlock(); > else > defer_console_output(); > } I would be OK with this name and (fixed up for NMI) implementation. > All in all. I feel a bit overwhelmed by the many *flush*() functions at > the moment. Some flush only nbcon consoles. Some flush all. Some > are using only the safe takeover/handover and some allow also > harsh takeover. Some ignore port->lock because of bust_spinlock(). > Some are ignoring console_lock. They are called on different > locations. The nbcon variants are called explicitly and also > inside the generic *flush*() functions. Agreed. This madness exists in part because we are continuing to support legacy consoles. I tried to keep the two separate as much as possible. But when it comes to flushing, there will be some overlap. When working on the code, I tend to either look at the legacy consoles _or_ the nbcon consoles. If you try to visualize the whole picture on a system with legacy and nbcon consoles, it is a bit overwhelming. I recommend focussing on them separately. I.e. when talking about nbcon consoles, there is no reason to be concerned about the state of the console lock or the port lock. When talking about legacy consoles, there is no reason to be concerned about CPU atomic printing states. John