2022-05-21 07:24:30

by Rik van Riel

[permalink] [raw]
Subject: Re: [RFC PATCH v2] printk: console: Allow each console to have its own loglevel

On Fri, 2022-05-20 at 13:57 +0100, Chris Down wrote:
> [Once the goals of this patch are generally agreed upon, it can be
> split
> out further with more detailed changelogs if desired.]
>
> Consoles can have vastly different latencies and throughputs. For
> example, writing a message to the serial console can take on the
> order
> of tens of milliseconds to get the UART to successfully write a
> message.
> While this might be fine for a single, one-off message, this can
> cause
> significant application-level stalls in situations where the kernel
> writes large amounts of information to the console.
>
It's more than just application-level stalls. I have seen
some cases of the kernel spending so much time logging things
to serial console that it thinks it locked up, and panics as
a result of how slow the serial console is.

Adding insult to injury, because the log level is sytem wide,
we only see _some_ of the hints of why the kernel started
spewing like that in the netcons logs.

If we print all the information, we will have more hosts panic
because we spent too much time in the serial console code.

If we print less information, we won't find out some of the
other things causing issues on systems.

Having per console log levels will allow us to avoid the
serial console issues, and gather all the info we need on
other stuff happening on the system.

--
All Rights Reversed.


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part

2022-07-07 14:48:41

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH v2] printk: console: Allow each console to have its own loglevel

On Fri 2022-05-20 12:06:33, Rik van Riel wrote:
> On Fri, 2022-05-20 at 13:57 +0100, Chris Down wrote:
> > [Once the goals of this patch are generally agreed upon, it can be
> > split
> > out further with more detailed changelogs if desired.]
> >
> > Consoles can have vastly different latencies and throughputs. For
> > example, writing a message to the serial console can take on the
> > order
> > of tens of milliseconds to get the UART to successfully write a
> > message.
> > While this might be fine for a single, one-off message, this can
> > cause
> > significant application-level stalls in situations where the kernel
> > writes large amounts of information to the console.
> >
> It's more than just application-level stalls. I have seen
> some cases of the kernel spending so much time logging things
> to serial console that it thinks it locked up, and panics as
> a result of how slow the serial console is.
>
> Adding insult to injury, because the log level is sytem wide,
> we only see _some_ of the hints of why the kernel started
> spewing like that in the netcons logs.
>
> If we print all the information, we will have more hosts panic
> because we spent too much time in the serial console code.
>
> If we print less information, we won't find out some of the
> other things causing issues on systems.
>
> Having per console log levels will allow us to avoid the
> serial console issues, and gather all the info we need on
> other stuff happening on the system.

The problem is clear. But the big part of the problem is that printk()
tries to show the messages on all consoles immediately.

I wonder how much the per-console loglevel would be needed
when the console handling is offloaded to per-console kthreads, see
https://lore.kernel.org/all/[email protected]/
It causes that printk() should "never" block and each console might
run on its own speed.

It still might be useful from some reasons:

+ Serial consoles might miss messages because the old messages are
over-written before they reach the console. It might be solved
by big enough buffer.

+ printk() still tries to show the messages immediately in some
critical situations, for example, early boot, watchdog warnings,
suspend, reboot, OOps, panic(). The slow consoles might still
cause stalls and put the system into its knees.

+ People might need to explicitly disable the kthreads, for
example, when debugging a situation when kthreads are not
scheduled.


So, I think that the per-console loglevels might still be useful.
But I wonder if they really will be used in practice. It does
not make sense to add feature that would get obsoleted by
the kthreads.

Note that the per-console kthreads were added into 5.19-rc1.
Unfortunately they were reverted in 5.19-rc4 because there were
some issues that need more work. But we still believe that
they are needed and we could make them working reliably.

Best Regards,
Petr

PS: I am sorry for the late response. I am still snowed under
many tasks. The printk kthreads are complicated and need
a lot of attention. Plus there was a sickness, vacations,
and other tasks.

2022-07-13 14:52:48

by Chris Down

[permalink] [raw]
Subject: Re: [RFC PATCH v2] printk: console: Allow each console to have its own loglevel

Petr Mladek writes:
>The problem is clear. But the big part of the problem is that printk()
>tries to show the messages on all consoles immediately.
>
>I wonder how much the per-console loglevel would be needed
>when the console handling is offloaded to per-console kthreads, see
>https://lore.kernel.org/all/[email protected]/
>It causes that printk() should "never" block and each console might
>run on its own speed.
>
>It still might be useful from some reasons:
>
> + Serial consoles might miss messages because the old messages are
> over-written before they reach the console. It might be solved
> by big enough buffer.
>
> + printk() still tries to show the messages immediately in some
> critical situations, for example, early boot, watchdog warnings,
> suspend, reboot, OOps, panic(). The slow consoles might still
> cause stalls and put the system into its knees.
>
> + People might need to explicitly disable the kthreads, for
> example, when debugging a situation when kthreads are not
> scheduled.

Indeed. In addition to these, there are cases (like the pstore case mentioned
by Vincent) where we want to bump the console loglevel up to the maximum for
debugging, but it still doesn't make sense to emit it over all consoles --
especially netconsoles where processing capacity on the netconsole
receiver/server is likely limited. The same is true for things like baseboard
management controllers where they may blindly store the console output in a
similar fashion.

The kthread offloading is definitely going to help a lot here here, but to
capably be able to use netconsole in prod we're going to need both.

>PS: I am sorry for the late response. I am still snowed under
>many tasks. The printk kthreads are complicated and need
>a lot of attention. Plus there was a sickness, vacations,
>and other tasks.

Don't worry, I totally understand :-) I really appreciate you getting back.