LinuxLists.cc - [patch RFC 00/29] printk: A new approach

2022-09-10 22:31:22

Subject: [patch RFC 00/29] printk: A new approach - WIP

Folks!

After the recent revert of the threaded printk patches, John and I sat down
and did a thorough analysis of the failure(s). It turned out that there are
several reasons why this failed:

1) The blurry semantics of console lock which triggers unpleasant
memories of BKL. That in turn inspired me to flag the new consoles
with CON_NOBKL and use the nobkl theme throughout the series as I
could not come up with a better name. :)

2) The assumption that a printk thread can be separated from console lock
while at the same time partially depending on console lock. That does not
really work out.

3) The operation of consoles and printk threads was depending solely on
global state and that state is on/off so it does not really qualify as
stateful and is therefore not really useful to create a stable
mechanism.

So I have to correct myself and admit that the revert was the Right Thing
to do. My other statements in that mail [1] still stand.

Nevertheless threaded printing is not only the last missing prerequisite
for enabling RT in mainline, it is also a long standing request from the
enterprise department.

Synchronous printk is slow especially with serial consoles (which are at
the same time the most reliable ones) where a single character takes ~87
microseconds at 115200 Baud. With enough noise this causes lockup detectors
to trigger and the current "load balancing" approach of handing over the
consoles after one line to a different CPU is just a bandaid which "works"
by chance and makes CPUs spinwait for milliseconds.
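
For reference, the ~87 microsecond figure follows from simple arithmetic. The sketch below assumes the common 8N1 framing (1 start bit, 8 data bits, 1 stop bit), which is an assumption and not stated above; the helper names are hypothetical.

```c
/* Assumption: 8N1 framing, i.e. 10 bits on the wire per character. */
#define BITS_PER_CHAR 10ULL

/* Time to transmit one character, in nanoseconds (rounded down). */
static unsigned long long char_time_ns(unsigned int baud)
{
	return BITS_PER_CHAR * 1000000000ULL / baud;
}

/* Time to transmit a line of len characters, in microseconds. */
static unsigned long long line_time_us(unsigned int baud, unsigned int len)
{
	return char_time_ns(baud) * len / 1000ULL;
}
```

Under that assumption a character takes about 86.8 microseconds at 115200 Baud, so a single 80 character line occupies the printing CPU for roughly 6.9 milliseconds.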

After taking a step back we decided to go for a radically different
approach:

Create infrastructure which is designed to overcome the current
console/printk shortcomings and convert drivers one by one.

A CPU hotplug déjà vu. And yes, we should have gone for that right away.
Water under the bridge...

The infrastructure we implemented comes with the following properties:

- It shares the console list, but only for registration/unregistration
purposes. Consoles which are utilizing the new infrastructure are
ignored by the existing mechanisms and vice versa. This allows reusing
as much code as possible and preserves the printk semantics
except for the threaded printing part.

- The console list walk becomes SRCU protected to avoid any restrictions
on contexts

- Consoles become stateful to handle handover and takeover in a graceful
way.

- All console state operations rely solely on atomic*_try_cmpxchg() so
they work in any context.

- Console locking is per console to allow friendly handover or "safe"
hostile takeover in emergency/panic situations. Console lock is not
relevant for consoles converted to the new infrastructure.

- The core provides interfaces for console drivers to query whether they
can proceed and to denote 'unsafe' sections in the console state, which
is unavoidable for some console drivers.

In fact there is not a single regular (non-early) console driver today
which is reentrancy safe under all circumstances, which enforces that
NMI context is excluded from printing directly. TBH, that's a sad state
of affairs.

The unsafe state bit allows the core code to make informed decisions,
e.g. to postpone printing if there are consoles available which
are safe to acquire. In case of a hostile takeover the unsafe state bit
is handed to the atomic write callback so that the console driver can
act accordingly.

- Printing is delegated to a per console printer thread except for the
following situations:

- Early boot
- Emergency printing (WARN/OOPS)
- Panic printing
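
To illustrate the "stateful console, cmpxchg only" idea from the list above, here is a minimal userspace sketch using C11 atomics. It is not the code from the series: the state layout, the priority enum and all function names are made up for illustration, and ownership is tracked by priority alone, which is a simplification.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state layout, for illustration only. */
#define ST_LOCKED	(1u << 0)	/* console has an owner */
#define ST_UNSAFE	(1u << 1)	/* owner is inside an unsafe section */
#define ST_PRIO_SHIFT	2		/* owner priority sits above the flag bits */

enum prio { PRIO_NONE, PRIO_NORMAL, PRIO_EMERGENCY, PRIO_PANIC };

struct console_state {
	_Atomic uint32_t bits;
};

static uint32_t owner_prio(uint32_t bits)
{
	return bits >> ST_PRIO_SHIFT;
}

/*
 * Acquire the console if it is free, or if the caller outranks the
 * current owner and the owner is not in an unsafe section. Only a
 * compare-and-exchange loop is used, so no context restrictions apply.
 */
static bool console_try_acquire(struct console_state *cs, enum prio prio)
{
	uint32_t old = atomic_load(&cs->bits);
	uint32_t next;

	do {
		if (old & ST_LOCKED) {
			if (owner_prio(old) >= (uint32_t)prio)
				return false;
			if (old & ST_UNSAFE)
				return false;
		}
		next = ST_LOCKED | ((uint32_t)prio << ST_PRIO_SHIFT);
	} while (!atomic_compare_exchange_weak(&cs->bits, &old, next));

	return true;
}

/*
 * Enter or leave an unsafe section. Fails if the caller is no longer
 * the owner, which tells the driver that it must stop touching hardware.
 */
static bool console_set_unsafe(struct console_state *cs, enum prio prio, bool unsafe)
{
	uint32_t old = atomic_load(&cs->bits);
	uint32_t next;

	do {
		if (!(old & ST_LOCKED) || owner_prio(old) != (uint32_t)prio)
			return false;
		next = unsafe ? (old | ST_UNSAFE) : (old & ~ST_UNSAFE);
	} while (!atomic_compare_exchange_weak(&cs->bits, &old, next));

	return true;
}
```

In this model a thread at PRIO_NORMAL that has marked itself unsafe cannot be displaced by PRIO_EMERGENCY until it clears the bit. A hostile takeover which ignores the unsafe bit is deliberately left out here.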

The integration is complete, but without any fancy things, like locking all
consoles when entering a WARN, print and unlock when done. Such things only
make sense once all drivers are converted over because that conflicts with
the way the existing console lock mechanics work.

For testing we used the most simple driver: a hacked up version of the
early uart8250 console as we wanted to concentrate on validating the core
mechanisms of friendly handover and hostile takeovers instead of dealing
with the horrors of port locks or whatever at the same time. That's the
next challenge. Hack patch will be provided in a reply.

Here is sample output where we let the atomic and thread write functions
prepend each line with the printing context (A=atomic, T=thread):

A[ 0.394066] ... fixed-purpose events: 3
A[ 0.395130] ... event mask: 000000070000000f

End of early boot, thread starts

TA[ 0.396821] rcu: Hierarchical SRCU implementation.

^^ Thread starts printing and immediately raises a warning, so atomic
context at the emergency priority takes over and continues printing.

This is a forceful, but safe takeover scenario as the WARN context
is obviously on the same CPU as the printing thread where friendly
is not an option.

A[ 0.397133] rcu: Max phase no-delay instances is 400.
A[ 0.397640] ------------[ cut here ]------------
A[ 0.398072] WARNING: CPU: 0 PID: 13 at drivers/tty/serial/8250/8250_early.c:123 __early_serial8250_write.isra.0+0x80/0xa0
....
A[ 0.440131] ret_from_fork+0x1f/0x30
A[ 0.441133] </TASK>
A[ 0.441867] ---[ end trace 0000000000000000 ]---
T[ 0.443493] smp: Bringing up secondary CPUs ...

After the warning the thread continues printing.

....

T[ 1.916873] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
T[ 1.918719] md: Waiting for all devices to be available before autod

A[ 1.918719] md: Waiting for all devices to be available before autodetect

System panics because it can't find a root file system. Panic printing
takes over the console from the printer thread immediately and reprints
the interrupted line.

This case is a friendly handover from the printing thread to the panic
context because the printing thread was not running on the panic CPU, but
handed over gracefully.

A[ 1.919942] md: If you don't use raid, use raid=noautodetect
A[ 1.921030] md: Autodetecting RAID arrays.
A[ 1.921919] md: autorun ...
A[ 1.922686] md: ... autorun DONE.
A[ 1.923761] /dev/root: Can't open blockdev

So far the implemented state handling machinery holds up on the various
handover and hostile takeover situations we enforced for testing.
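
The interrupted-line pattern in the sample output can be modelled in a few lines of userspace C. This is a sketch only: the sink, the lose_after parameter (which stands in for an ownership check failing mid-line) and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the UART: collects the characters that were "sent". */
struct sink {
	char buf[128];
	size_t used;
};

static void sink_putc(struct sink *s, char c)
{
	if (s->used + 1 < sizeof(s->buf)) {
		s->buf[s->used++] = c;
		s->buf[s->used] = '\0';
	}
}

/*
 * Emit one record prefixed with a context tag ('T' or 'A'). Ownership
 * is checked before every character; lose_after models the point at
 * which that check fails (SIZE_MAX means it never fails).
 * Returns false if the record was interrupted.
 */
static bool write_record(struct sink *s, const char *rec, char tag, size_t lose_after)
{
	size_t pos;

	sink_putc(s, tag);
	for (pos = 0; rec[pos] != '\0'; pos++) {
		if (pos >= lose_after)
			return false;	/* new owner reprints the record */
		sink_putc(s, rec[pos]);
	}
	return true;
}
```

A caller that sees write_record() fail for the thread context would let the new owner start over from the beginning of the record, which is what produces the truncated T line followed by the complete A line above.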

Hostile takeover is nevertheless a special case. If the driver is in an
unsafe region that's something which needs to be dealt with per driver.
There is not much the core code can do here except for trying a friendly
handover first and only enforcing it after a timeout or not trying to print
on such consoles.

This needs some thought, but we explicitly did not implement any takeover
policy into the core state handling mechanism as this is really a decision
which needs to be made at the call site. See patch 28.
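
As a rough model of what such a call site policy could look like (friendly first, hostile only after a timeout, or skip the console), consider the sketch below. It is not what patch 28 does; the struct, the enum and the parameters are hypothetical and the waiting is reduced to a number.

```c
#include <stdbool.h>

enum acquire_how { ACQ_FRIENDLY, ACQ_HOSTILE, ACQ_SKIP };

/* Hypothetical snapshot of the current owner as seen by the caller. */
struct owner_view {
	bool owned;			/* somebody holds the console */
	bool unsafe;			/* owner is in an unsafe section */
	unsigned int release_in_us;	/* modelled time until it hands over */
};

/*
 * Decide how to get the console: wait for a friendly handover up to
 * timeout_us, then take it over by force if that is safe or explicitly
 * allowed by the caller, otherwise do not print on this console.
 */
static enum acquire_how decide_takeover(const struct owner_view *o,
					unsigned int timeout_us,
					bool allow_unsafe_hostile)
{
	if (!o->owned || o->release_in_us <= timeout_us)
		return ACQ_FRIENDLY;
	if (!o->unsafe || allow_unsafe_hostile)
		return ACQ_HOSTILE;
	return ACQ_SKIP;
}
```

The point of keeping this out of the core is visible in the last parameter: a WARN call site would likely pass false, while the very end of panic() might pass true.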

We are soliciting feedback on that approach and we hope that we can
organize a BOF in Dublin on short notice.

The series is also available from git:

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git printk

The series has the following parts:

Patches 1 - 5: Cleanups

Patches 6 - 12: Locking and list conversion

Patches 13 - 18: Improved output buffer handling to prepare for
code sharing

Patches 19 - 29: New infrastructure implementation

Most of the preparatory patches 1-18 probably have value on their own.

Don't be scared by the patch stats below. We added kernel doc and
extensive comments to the code:

kernel/printk/printk_nobkl.c: Code: 668 lines, Comments: 697 lines, Ratio: 1:1.043

Of course the code is trivial and straightforward as any other facility
which has to deal with concurrency and the twist of being safe in any
context. :)

Comments welcome.

Thanks,

tglx
---
[1] https://lore.kernel.org/lkml/87r11qp63n.ffs@tglx/
---
arch/parisc/include/asm/pdc.h | 2
arch/parisc/kernel/pdc_cons.c | 53 -
arch/parisc/kernel/traps.c | 17
b/kernel/printk/printk_nobkl.c | 1564 +++++++++++++++++++++++++++++++++++++++++
drivers/tty/serial/kgdboc.c | 7
drivers/tty/tty_io.c | 6
fs/proc/consoles.c | 12
fs/proc/kmsg.c | 2
include/linux/console.h | 375 +++++++++
include/linux/printk.h | 9
include/linux/syslog.h | 3
kernel/debug/kdb/kdb_io.c | 7
kernel/panic.c | 12
kernel/printk/printk.c | 485 ++++++++----
14 files changed, 2304 insertions(+), 250 deletions(-)

2022-09-10 23:02:30

by Thomas Gleixner

Subject: Re: [patch RFC 00/29] printk: A new approach - WIP

On Sun, Sep 11 2022 at 00:27, Thomas Gleixner wrote:
> For testing we used the most simple driver: a hacked up version of the
> early uart8250 console as we wanted to concentrate on validating the core
> mechanisms of friendly handover and hostile takeovers instead of dealing
> with the horrors of port locks or whatever at the same time. That's the
> next challenge. Hack patch will be provided in a reply.

Here you go.

---
Subject: serial: 8250: Use 8250 serial for exploring noBKL consoles
From: John Ogness <[email protected]>
Date: Sat, 10 Sep 2022 01:05:34 +0200

From: John Ogness <[email protected]>

Utilize 8250 early console - the only console in the kernel which is
reentrancy and NMI safe - to explore the noBKL console infrastructure.

Not-Signed-off-by: John Ogness <[email protected]>
Not-Signed-off-by: Thomas Gleixner <[email protected]>
---
drivers/tty/serial/8250/8250_early.c | 32 ++++++++++++++++++++++++++++++++
drivers/tty/serial/8250/Kconfig | 11 ++++++++++-
drivers/tty/serial/8250/Makefile | 2 +-
3 files changed, 43 insertions(+), 2 deletions(-)

--- a/drivers/tty/serial/8250/8250_early.c
+++ b/drivers/tty/serial/8250/8250_early.c
@@ -107,6 +107,34 @@ static void early_serial8250_write(struc
 	uart_console_write(port, s, count, serial_putc);
 }

+static bool __early_serial8250_write(struct console *con, struct cons_write_context *wctxt,
+				     unsigned char c)
+{
+	struct earlycon_device *device = con->data;
+	struct uart_port *port = &device->port;
+	unsigned char *s = wctxt->outbuf;
+
+	serial_putc(port, c);
+
+	for (; wctxt->pos < wctxt->len; wctxt->pos++, s++) {
+		if (!console_can_proceed(wctxt))
+			return false;
+
+		uart_console_write(port, s, 1, serial_putc);
+	}
+	return true;
+}
+
+static bool early_serial8250_write_thread(struct console *con, struct cons_write_context *wctxt)
+{
+	return __early_serial8250_write(con, wctxt, 'T');
+}
+
+static bool early_serial8250_write_atomic(struct console *con, struct cons_write_context *wctxt)
+{
+	return __early_serial8250_write(con, wctxt, 'A');
+}
+
 #ifdef CONFIG_CONSOLE_POLL
 static int early_serial8250_read(struct console *console,
 				 char *s, unsigned int count)
@@ -170,6 +198,10 @@ int __init early_serial8250_setup(struct

 	device->con->write = early_serial8250_write;
 	device->con->read = early_serial8250_read;
+	device->con->flags &= ~CON_BOOT;
+	device->con->flags |= CON_NO_BKL;
+	device->con->write_thread = early_serial8250_write_thread;
+	device->con->write_atomic = early_serial8250_write_atomic;
 	return 0;
 }
 EARLYCON_DECLARE(uart8250, early_serial8250_setup);
--- a/drivers/tty/serial/8250/Kconfig
+++ b/drivers/tty/serial/8250/Kconfig
@@ -82,9 +82,18 @@ config SERIAL_8250_FINTEK

 	  If unsure, say N.

+config SERIAL_8250_CONSOLE_EARLY
+	bool "Console on 8250/16550 and compatible noBKL console mockup"
+	default SERIAL_8250
+	select SERIAL_CORE_CONSOLE
+	select SERIAL_EARLYCON
+	help
+	  Mockup to demonstrate the core capabilities for noBKL consoles.
+	  OTOH, the _only_ reliable reentrant and NMI safe console...
+
 config SERIAL_8250_CONSOLE
 	bool "Console on 8250/16550 and compatible serial port"
-	depends on SERIAL_8250=y
+	depends on SERIAL_8250=y && !SERIAL_8250_CONSOLE_EARLY
 	select SERIAL_CORE_CONSOLE
 	select SERIAL_EARLYCON
 	help
--- a/drivers/tty/serial/8250/Makefile
+++ b/drivers/tty/serial/8250/Makefile
@@ -20,7 +20,7 @@ obj-$(CONFIG_SERIAL_8250_CS) += serial_
obj-$(CONFIG_SERIAL_8250_ACORN) += 8250_acorn.o
obj-$(CONFIG_SERIAL_8250_ASPEED_VUART) += 8250_aspeed_vuart.o
obj-$(CONFIG_SERIAL_8250_BCM2835AUX) += 8250_bcm2835aux.o
-obj-$(CONFIG_SERIAL_8250_CONSOLE) += 8250_early.o
+obj-$(CONFIG_SERIAL_8250_CONSOLE_EARLY) += 8250_early.o
obj-$(CONFIG_SERIAL_8250_FOURPORT) += 8250_fourport.o
obj-$(CONFIG_SERIAL_8250_ACCENT) += 8250_accent.o
obj-$(CONFIG_SERIAL_8250_BOCA) += 8250_boca.o

2022-09-11 09:16:02

by Paul E. McKenney

Subject: Re: [patch RFC 00/29] printk: A new approach - WIP

On Sun, Sep 11, 2022 at 12:27:31AM +0200, Thomas Gleixner wrote:

[ . . . ]

> The infrastructure we implemented comes with the following properties:
>
> - It shares the console list, but only for registration/unregistration
> purposes. Consoles which are utilizing the new infrastructure are
> ignored by the existing mechanisms and vice versa. This allows to
> reuse as much code as possible and preserves the printk semantics
> except for the threaded printing part.
>
> - The console list walk becomes SRCU protected to avoid any restrictions
> on contexts

I am guessing that this means that you need an NMI-safe srcu_read_lock()
and srcu_read_unlock(). If my guess is correct, please let me know,
and I will create one for you. (As it stands, these are NMI-safe on x86,
but not on architectures using the asm-generic variant of this_cpu_inc().)

The result would be srcu_read_lock_nmi() and srcu_read_unlock_nmi()
or similar. There would need to be something to prevent mixing of
srcu_read_lock() and srcu_read_lock_nmi().

Or are you somehow avoiding ever invoking either srcu_read_lock() or
srcu_read_unlock() from NMI context?

For example, are we simply living dangerously with NMI-based stack traces
as suggested below? ;-)
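
One conceivable shape for the "something to prevent mixing" check is to record the reader flavor on first use and to flag any later mismatch. The sketch below is a userspace model with C11 atomics and made-up names; it is not how SRCU implements anything.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reader flavors of one SRCU-like domain. */
enum reader_flavor { FLAVOR_UNSET, FLAVOR_NORMAL, FLAVOR_NMI };

struct srcu_model {
	_Atomic int flavor;	/* flavor seen on first reader entry */
};

/*
 * Record the flavor on first use and report whether the caller's
 * flavor matches the recorded one. A single compare-and-exchange keeps
 * the check itself usable from any context in this model.
 */
static bool reader_flavor_ok(struct srcu_model *d, enum reader_flavor f)
{
	int expected = FLAVOR_UNSET;

	if (atomic_compare_exchange_strong(&d->flavor, &expected, (int)f))
		return true;	/* first reader decides the flavor */

	return expected == (int)f;
}
```

A real implementation would presumably warn once on a mismatch rather than return a bool, but the decision logic would look similar.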

> - Consoles become stateful to handle handover and takeover in a graceful
> way.
>
> - All console state operations rely solely on atomic*_try_cmpxchg() so
> they work in any context.
>
> - Console locking is per console to allow friendly handover or "safe"
> hostile takeover in emergency/panic situations. Console lock is not
> relevant for consoles converted to the new infrastructure.
>
> - The core provides interfaces for console drivers to query whether they
> can proceed and to denote 'unsafe' sections in the console state, which
> is unavoidable for some console drivers.
>
> In fact there is not a single regular (non-early) console driver today
> which is reentrancy safe under all circumstances, which enforces that
> NMI context is excluded from printing directly. TBH, that's a sad state
> of affairs.
>
> The unsafe state bit allows to make informed decisions in the core
> code, e.g. to postpone printing if there are consoles available which
> are safe to acquire. In case of a hostile takeover the unsafe state bit
> is handed to the atomic write callback so that the console driver can
> act accordingly.
>
> - Printing is delegated to a per console printer thread except for the
> following situations:
>
> - Early boot
> - Emergency printing (WARN/OOPS)
> - Panic printing
>
> The integration is complete, but without any fancy things, like locking all
> consoles when entering a WARN, print and unlock when done. Such things only
> make sense once all drivers are converted over because that conflicts with
> the way how the existing console lock mechanics work.
>
> For testing we used the most simple driver: a hacked up version of the
> early uart8250 console as we wanted to concentrate on validating the core
> mechanisms of friendly handover and hostile takeovers instead of dealing
> with the horrors of port locks or whatever at the same time. That's the
> next challenge. Hack patch will be provided in a reply.
>
> Here is sample output where we let the atomic and thread write functions
> prepend each line with the printing context (A=atomic, T=thread):
>
> A[ 0.394066] ... fixed-purpose events: 3
> A[ 0.395130] ... event mask: 000000070000000f
>
> End of early boot, thread starts
>
> TA[ 0.396821] rcu: Hierarchical SRCU implementation.
>
> ^^ Thread starts printing and immediately raises a warning, so atomic
> context at the emergency priority takes over and continues printing.
>
> This is a forceful, but safe takeover scenario as the WARN context
> is obviously on the same CPU as the printing thread where friendly
> is not an option.
>
> A[ 0.397133] rcu: Max phase no-delay instances is 400.
> A[ 0.397640] ------------[ cut here ]------------
> A[ 0.398072] WARNING: CPU: 0 PID: 13 at drivers/tty/serial/8250/8250_early.c:123 __early_serial8250_write.isra.0+0x80/0xa0
> ....
> A[ 0.440131] ret_from_fork+0x1f/0x30
> A[ 0.441133] </TASK>
> A[ 0.441867] ---[ end trace 0000000000000000 ]---
> T[ 0.443493] smp: Bringing up secondary CPUs ...
>
> After the warning the thread continues printing.
>
> ....
>
> T[ 1.916873] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
> T[ 1.918719] md: Waiting for all devices to be available before autod
>
> A[ 1.918719] md: Waiting for all devices to be available before autodetect
>
> System panics because it can't find a root file system. Panic printing
> takes over the console from the printer thread immediately and reprints
> the interrupted line.
>
> This case is a friendly handover from the printing thread to the panic
> context because the printing thread was not running on the panic CPU, but
> handed over gracefully.
>
> A[ 1.919942] md: If you don't use raid, use raid=noautodetect
> A[ 1.921030] md: Autodetecting RAID arrays.
> A[ 1.921919] md: autorun ...
> A[ 1.922686] md: ... autorun DONE.
> A[ 1.923761] /dev/root: Can't open blockdev
>
> So far the implemented state handling machinery holds up on the various
> handover and hostile takeover situations we enforced for testing.
>
> Hostile takeover is nevertheless a special case. If the driver is in an
> unsafe region that's something which needs to be dealt with per driver.
> There is not much the core code can do here except of trying a friendly
> handover first and only enforcing it after a timeout or not trying to print
> on such consoles.
>
> This needs some thought, but we explicitly did not implement any takeover
> policy into the core state handling mechanism as this is really a decision
> which needs to be made at the call site. See patch 28.
>
> We are soliciting feedback on that approach and we hope that we can
> organize a BOF in Dublin on short notice.

Last I checked, there were still some slots open.

> The series is also available from git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git printk
>
> The series has the following parts:
>
> Patches 1 - 5: Cleanups
>
> Patches 6 - 12: Locking and list conversion
>
> Patches 13 - 18: Improved output buffer handling to prepare for
> code sharing
>
> Patches 19 - 29: New infrastructure implementation
>
> Most of the preparatory patches 1-18 have probably a value on their own.
>
> Don't be scared about the patch stats below. We added kernel doc and
> extensive comments to the code:
>
> kernel/printk/printk_nobkl.c: Code: 668 lines, Comments: 697 lines, Ratio: 1:1.043

When I reach stable AC, I will fire off some rcutorture tests. From the
discussion above, it sounds like I should expect some boot-time warnings?
Or were those strictly printk-specific testing?

> Of course the code is trivial and straight forward as any other facility
> which has to deal with concurrency and the twist of being safe in any
> context. :)

;-) ;-) ;-)

Thanx, Paul

> Comments welcome.
>
> Thanks,
>
> tglx
> ---
> [1] https://lore.kernel.org/lkml/87r11qp63n.ffs@tglx/
> ---
> arch/parisc/include/asm/pdc.h | 2
> arch/parisc/kernel/pdc_cons.c | 53 -
> arch/parisc/kernel/traps.c | 17
> b/kernel/printk/printk_nobkl.c | 1564 +++++++++++++++++++++++++++++++++++++++++
> drivers/tty/serial/kgdboc.c | 7
> drivers/tty/tty_io.c | 6
> fs/proc/consoles.c | 12
> fs/proc/kmsg.c | 2
> include/linux/console.h | 375 +++++++++
> include/linux/printk.h | 9
> include/linux/syslog.h | 3
> kernel/debug/kdb/kdb_io.c | 7
> kernel/panic.c | 12
> kernel/printk/printk.c | 485 ++++++++----
> 14 files changed, 2304 insertions(+), 250 deletions(-)

2022-09-11 12:44:05

by Linus Torvalds

Subject: Re: [patch RFC 00/29] printk: A new approach - WIP

On Sat, Sep 10, 2022 at 6:27 PM Thomas Gleixner <[email protected]> wrote:
>
> After taking a step back we decided to go for a radically different
> approach:

From a quick look through the patches this morning, I see nothing
alarming. The proof is in the pudding, but this seems to have a sane
model for console list handling and for handling the individual
console states.

But I'm on a laptop and only read through the patches while going
through my email this morning, so I may well have missed something.

Linus

2022-09-12 16:57:19

by John Ogness

Subject: printk meeting at LPC 2022

Hi,

We now have a room/timeslot [0] where Thomas and I will be presenting
and discussing this new approach [1] for bringing kthread and atomic
console printing to the kernel.

Wednesday, 14 Sep. @ 3:00pm-4:30pm in room "Meeting 9"

John Ogness

[0] https://lpc.events/event/16/contributions/1394/
[1] https://lore.kernel.org/all/[email protected]/

2022-09-15 11:36:11

by Steven Rostedt

Subject: Re: printk meeting at LPC 2022

On Thu, 15 Sep 2022 20:00:57 +0900
Sergey Senozhatsky <[email protected]> wrote:

> Hi,
>
> On (22/09/12 18:46), John Ogness wrote:
> > Hi,
> >
> > We now have a room/timeslot [0] where Thomas and I will be presenting
> > and discussing this new approach [1] for bringing kthread and atomic
> > console printing to the kernel.
> >
> > Wednesday, 14 Sep. @ 3:00pm-4:30pm in room "Meeting 9"
>
> Was this recorded? I glanced through LPC/kernel summit schedules and didn't
> find it anywhere.

Yes it was, but it will take a bit to extract it from BBB and upload it to YouTube.

-- Steve

2022-09-15 11:56:21

by Sergey Senozhatsky

Subject: Re: printk meeting at LPC 2022

Hi,

On (22/09/12 18:46), John Ogness wrote:
> Hi,
>
> We now have a room/timeslot [0] where Thomas and I will be presenting
> and discussing this new approach [1] for bringing kthread and atomic
> console printing to the kernel.
>
> Wednesday, 14 Sep. @ 3:00pm-4:30pm in room "Meeting 9"

Was this recorded? I glanced through LPC/kernel summit schedules and didn't
find it anywhere.

2022-09-15 15:39:49

by Sergey Senozhatsky

Subject: Re: printk meeting at LPC 2022

On (22/09/15 07:09), Steven Rostedt wrote:
> > > We now have a room/timeslot [0] where Thomas and I will be presenting
> > > and discussing this new approach [1] for bringing kthread and atomic
> > > console printing to the kernel.
> > >
> > > Wednesday, 14 Sep. @ 3:00pm-4:30pm in room "Meeting 9"
> >
> > Was this recorded? I glanced through LPC/kernel summit schedules and didn't
> > find it anywhere.
>
> Yes it was, but it will take a bit to extract it from BBB and upload it to YouTube.

Thanks Steven!

2022-09-23 15:02:48

by John Ogness

Subject: Re: printk meeting at LPC 2022

Hi,

On 2022-09-12, John Ogness <[email protected]> wrote:
> We now have a room/timeslot [0] where Thomas and I will be presenting
> and discussing this new approach [1] for bringing kthread and atomic
> console printing to the kernel.

Thanks to everyone who attended the meeting (in person and virtually)!
It was a productive and fun discussion that left me thinking we will get
the printk threading right this time.

Here are the main points that I took away from the meeting:

- Printing the backlog is important! If some emergency situation occurs,
make sure the backlog gets printed.

- When an emergency occurs, put the full backtrace into the ringbuffer
before flushing any backlog. This ensures that the backtrace makes it
into the ringbuffer in case a panic occurs while flushing the backlog.

- A newline should be added when an atomic console takes over from a
threaded console. This improves readability. We may decide later that
the atomic console prints some extra information upon takeover, or
that it completes the line the threaded console was printing. But for
now we will just use a newline to keep things simple.

- It should be visible to users and in crash reports if legacy consoles
were in use. It was suggested that a new TAINT flag could be used for
this.

- There will need to be new console flags introduced so that safe
printing decisions can be made in emergency and panic situations.

For example, upon panic, initially only the consoles marked RELIABLE
would be used. If any of the RELIABLE consoles required a hostile
takeover, they would only be used if they are labeled to support safe
hostile takeovers.

All other consoles could then be tried as a "last hope" at the very
end of panic(), after all records have been flushed to reliable
consoles and when it no longer matters if a console kills the CPU. For
non-panic emergencies (warn, rcu stalls, etc), there may be other
flags that would be needed.

Initially we do not plan to have any such flags. We can add them on an
as-needed basis as console drivers are moved over to the new
thread/atomic interface.
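
The panic ordering described above could be expressed roughly as follows. Only the name RELIABLE comes from the discussion; the second flag, the enum and the function are hypothetical, and treating a RELIABLE console that would need an unsupported hostile takeover as "last hope" is one reading of the summary, not a decision.

```c
#include <stdbool.h>

/* Hypothetical console flags for illustration. */
#define CONF_RELIABLE		(1u << 0)	/* console is considered reliable */
#define CONF_SAFE_HOSTILE	(1u << 1)	/* supports safe hostile takeover */

enum panic_phase {
	PANIC_PHASE_RELIABLE,	/* flushed first on panic */
	PANIC_PHASE_LAST_HOPE,	/* tried at the very end of panic() */
};

/*
 * Decide in which phase of panic() a console is used. needs_hostile
 * tells whether the console can only be obtained by a hostile takeover.
 */
static enum panic_phase panic_phase_for(unsigned int flags, bool needs_hostile)
{
	if (!(flags & CONF_RELIABLE))
		return PANIC_PHASE_LAST_HOPE;
	if (needs_hostile && !(flags & CONF_SAFE_HOSTILE))
		return PANIC_PHASE_LAST_HOPE;
	return PANIC_PHASE_RELIABLE;
}
```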

If I have missed anything relevant, please let me know.

John Ogness

> [0] https://lpc.events/event/16/contributions/1394/
> [1] https://lore.kernel.org/all/[email protected]/

2022-09-23 15:43:52

by Sebastian Andrzej Siewior

Subject: Re: printk meeting at LPC 2022

On 2022-09-23 16:55:28 [+0206], John Ogness wrote:
> If I have missed anything relevant, please let me know.

I just wanted to state that there was a discussion at the end about
removing the early_printk drivers in favour of the atomic-printing
driver/support, which should be capable of doing the same.

> John Ogness

Sebastian

2022-09-23 15:50:38

by Linus Torvalds

Subject: Re: printk meeting at LPC 2022

On Fri, Sep 23, 2022 at 7:49 AM John Ogness <[email protected]> wrote:
>
> - Printing the backlog is important! If some emergency situation occurs,
> make sure the backlog gets printed.

Yeah, I really liked the notion of doing the oops with just filling
the back buffer but still getting it printed out if something goes
wrong in the middle.

That said, I'm sure we can tweak the exact "how much of the back log
we print" if there are any real life issues that look even remotely
like the demo did.

It's not like you couldn't do a "skipping lines" message if there are
thousands of old non-emergency lines in the back buffer, and
prioritize getting the recent ones out first.

I doubt it ends up being an issue in practice, but basically I wanted
to just pipe up and say that the exact details of how much of the back
buffer needs to be flushed first _could_ be tweaked if it ever does
come up as an issue.
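
If it ever did come up, the tweak could be as small as capping the old backlog and reporting the number of skipped lines. A sketch under made-up names, assuming records are identified by increasing sequence numbers and that emergency_seq is never smaller than oldest_seq:

```c
#include <stdint.h>

/* Hypothetical result of planning an emergency flush. */
struct flush_plan {
	uint64_t first_seq;	/* first record to print */
	uint64_t skipped;	/* old records dropped from the output */
};

/*
 * Records in [oldest_seq, emergency_seq) are old non-emergency backlog.
 * Print at most max_backlog of them, keeping the most recent ones, and
 * let the caller emit a "skipping N lines" note for the rest.
 */
static struct flush_plan plan_backlog_flush(uint64_t oldest_seq,
					     uint64_t emergency_seq,
					     uint64_t max_backlog)
{
	struct flush_plan plan = { .first_seq = oldest_seq, .skipped = 0 };
	uint64_t backlog = emergency_seq - oldest_seq;

	if (backlog > max_backlog) {
		plan.skipped = backlog - max_backlog;
		plan.first_seq = oldest_seq + plan.skipped;
	}
	return plan;
}
```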

Linus

2022-09-23 16:18:17

by Steven Rostedt

Subject: Re: printk meeting at LPC 2022

On Fri, 23 Sep 2022 16:55:28 +0206
John Ogness <[email protected]> wrote:

> All other consoles could then be tried as a "last hope" at the very
> end of panic(), after all records have been flushed to reliable
> consoles and when it no longer matters if a console kills the CPU. For
> non-panic emergencies (warn, rcu stalls, etc), there may be other
> flags that would be needed.

I think we may need to check if kexec is involved. We don't want one of
these "last hope" consoles to lock up the system, preventing kexec from occurring.

But if there's no kexec, and the system is just going to lock up anyway,
then sure go ahead and call the unsafe consoles.

-- Steve