2023-04-25 13:07:58

by Chris Down

[permalink] [raw]
Subject: [PATCH v5 0/2] printk: console: Per-console loglevels

v5:

- Fix syntax in boot_delay

v4:

- Change base to Linus' master
- Use SRCU iterators for console walks
- Override per-console loglevels on magic sysrq
- Fix htmldocs
- Fix mistaken __user annotation in sysctl callbacks
- Consistently use indexed names (eg. ttyS0 instead of ttyS)
- Remove "The loglevel for a console can be set in many places" comment
- Remove CON_LOGLEVEL flag and infer based on >0
- Open code our dev_get_drvdata console stashing
- Split out console_effective_loglevel functions per Petr's suggestion
- Make boot_delay_msec/printk_delay check if it would be emitted
- Simplify warning on SYSLOG_ACTION_CONSOLE_LEVEL
- Save/restore ignore_per_console_loglevel on SYSLOG_ACTION_CONSOLE_{ON,OFF}
- Unify min/max level checks across sysfs/proc/syslog
- Add find_and_remove_console_option to avoid affecting io/mmio options

v3:

- Update to work with John's kthread patches
- Remove force_console_loglevel, now we only have global and local levels
- Remove minimum_console_loglevel control and document how to change it
- The minimum loglevel is now only honoured on setting global/local level
- Add ignore_per_console_loglevel
- Return -EINVAL if trying to set below minimum console level
- Add parser for named console= options
- Fix docs around ignore_loglevel: it can be changed at runtime
- Fix ordering in "in order of authority" docs
- Remove duplicated default_console_loglevel doc
- Only warn once on syslog() use

v2:

- Dynamically allocate struct device*
- Document sysfs attributes in Documentation/ABI/
- Use sysfs_emit() instead of sprintf() in dev sysfs files
- Remove WARN_ON() for device_add/IS_ERR(console_class)
- Remove "soon" comment for kernel.printk
- Fix !CONFIG_PRINTK build
- Fix device_unregister() NULL dereference if called before class setup
- Add new documentation to MAINTAINERS

Chris Down (2):
printk: Do not delay messages which aren't solicited by any console
printk: console: Support console-specific loglevels

Documentation/ABI/testing/sysfs-class-console | 47 ++
Documentation/admin-guide/index.rst | 1 +
.../admin-guide/kernel-parameters.txt | 28 +-
.../admin-guide/per-console-loglevel.rst | 92 ++++
Documentation/admin-guide/serial-console.rst | 17 +-
Documentation/core-api/printk-basics.rst | 35 +-
Documentation/networking/netconsole.rst | 17 +
MAINTAINERS | 3 +
drivers/tty/sysrq.c | 15 +
include/linux/console.h | 3 +
include/linux/printk.h | 9 +
kernel/printk/console_cmdline.h | 2 +
kernel/printk/printk.c | 417 +++++++++++++++++-
kernel/printk/sysctl.c | 57 ++-
14 files changed, 692 insertions(+), 51 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-class-console
create mode 100644 Documentation/admin-guide/per-console-loglevel.rst


base-commit: 61d325dcbc05d8fef88110d35ef7776f3ac3f68b
--
2.40.0


2023-04-25 13:10:01

by Chris Down

[permalink] [raw]
Subject: [PATCH v5 1/2] printk: Do not delay messages which aren't solicited by any console

Boot-delayed printk messages already have this checked up front, and
it's odd that we don't do the same check for global printk delays, since
there's no reason to delay if we are not going to emit anything.

Signed-off-by: Chris Down <[email protected]>
---
kernel/printk/printk.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fd0c9f913940..041c7bd56005 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1289,15 +1289,13 @@ static int __init boot_delay_setup(char *str)
}
early_param("boot_delay", boot_delay_setup);

-static void boot_delay_msec(int level)
+static void boot_delay_msec(void)
{
unsigned long long k;
unsigned long timeout;

- if ((boot_delay == 0 || system_state >= SYSTEM_RUNNING)
- || suppress_message_printing(level)) {
+ if (boot_delay == 0 || system_state >= SYSTEM_RUNNING)
return;
- }

k = (unsigned long long)loops_per_msec * boot_delay;

@@ -1316,7 +1314,7 @@ static void boot_delay_msec(int level)
}
}
#else
-static inline void boot_delay_msec(int level)
+static inline void boot_delay_msec(void)
{
}
#endif
@@ -2064,7 +2062,10 @@ int printk_delay_msec __read_mostly;

static inline void printk_delay(int level)
{
- boot_delay_msec(level);
+ if (suppress_message_printing(level))
+ return;
+
+ boot_delay_msec();

if (unlikely(printk_delay_msec)) {
int m = printk_delay_msec;
--
2.40.0

2023-04-25 13:10:30

by Chris Down

[permalink] [raw]
Subject: [PATCH v5 2/2] printk: console: Support console-specific loglevels

Consoles can have vastly different latencies and throughputs. For
example, writing a message to the serial console can take on the order
of tens of milliseconds to get the UART to successfully write a message.
While this might be fine for a single, one-off message, this can cause
significant application-level stalls in situations where the kernel
writes large amounts of information to the console.

This means that while you might want to send at least INFO level
messages to (for example) netconsole, which is relatively fast, you may
only want to send at least WARN level messages to the serial console.
Such an implementation would permit debugging using the serial console
in cases that netconsole doesn't receive messages during particularly
bad system issues, while still keeping the noise low enough to avoid
inducing latency in userspace applications. This patch adds such an
interface, extending the existing console loglevel controls to allow
each console to have its own loglevel.

One can't just disable the serial console, because one may actually need
it in situations where the machine is in a bad enough state that nothing
is received on netconsole. One also can't just bump the loglevel at
runtime after the issue, because usually the machine is already so
wedged by this point that it isn't responsive to such requests.

In terms of technical implementation, this patch embeds a device pointer
in the console struct, and registers each console using it so we can
expose attributes in sysfs. We currently expose the following
attributes:

% ls -l /sys/class/console/ttyS/
total 0
lrwxrwxrwx 1 root root 0 Jul 20 17:37 subsystem -> ../../../../class/console/
-r--r--r-- 1 root root 4096 Jul 20 17:38 effective_loglevel
-r--r--r-- 1 root root 4096 Jul 20 17:38 effective_loglevel_source
-r--r--r-- 1 root root 4096 Jul 20 17:38 enabled
-rw-r--r-- 1 root root 4096 Jul 20 17:38 loglevel
-rw-r--r-- 1 root root 4096 Jul 20 17:37 uevent

The lifecycle of this classdev looks like this on registration:

register_console(con)/printk_late_init()
console_register_device(con)
device_initialize(con->classdev) # refcount++
device_add(con->classdev) # refcount++

At stable state, the refcount is two.

Console unregistration looks like this:

[con->classdev refcount drops to 0]
console_classdev_release(con->classdev)
kfree(con->classdev)

unregister_console(con)
device_unregister(con->classdev)
device_del(con->classdev) # refcount--
device_remove_class_symlinks()
kernfs_remove_by_name_ns()
kernfs_drain()
kernfs_drain_open_files() # wait for close()
put_device(con->classdev) # refcount--

We also deprecate the kernel.printk sysctl as it doesn't know about
per-console loglevels, and is generally pretty confusing.

For information on the precedence and application of the new controls,
see Documentation/ABI/testing/sysfs-class-console and
Documentation/admin-guide/per-console-loglevel.rst.

Signed-off-by: Chris Down <[email protected]>
---
Documentation/ABI/testing/sysfs-class-console | 47 ++
Documentation/admin-guide/index.rst | 1 +
.../admin-guide/kernel-parameters.txt | 28 +-
.../admin-guide/per-console-loglevel.rst | 92 ++++
Documentation/admin-guide/serial-console.rst | 17 +-
Documentation/core-api/printk-basics.rst | 35 +-
Documentation/networking/netconsole.rst | 17 +
MAINTAINERS | 3 +
drivers/tty/sysrq.c | 15 +
include/linux/console.h | 3 +
include/linux/printk.h | 9 +
kernel/printk/console_cmdline.h | 2 +
kernel/printk/printk.c | 410 +++++++++++++++++-
kernel/printk/sysctl.c | 57 ++-
14 files changed, 688 insertions(+), 48 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-class-console
create mode 100644 Documentation/admin-guide/per-console-loglevel.rst

diff --git a/Documentation/ABI/testing/sysfs-class-console b/Documentation/ABI/testing/sysfs-class-console
new file mode 100644
index 000000000000..35a117d521cf
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-console
@@ -0,0 +1,47 @@
+What: /sys/class/console/
+Date: April 2023
+Contact: Chris Down <[email protected]>
+Description: Interface for viewing and setting per-console attributes, like
+ the per-console loglevel. For a high-level document describing
+ the motivations for this interface and related non-sysfs
+ controls, see
+ Documentation/admin-guide/per-console-loglevel.rst.
+
+What: /sys/class/console/<C>/effective_loglevel
+Date: April 2023
+Contact: Chris Down <[email protected]>
+Description: Read only. The currently effective loglevel for this console.
+ All messages emitted with a loglevel below the effective value
+ will be emitted to the console.
+
+What: /sys/class/console/<C>/effective_loglevel_source
+Date: April 2023
+Contact: Chris Down <[email protected]>
+Description: Read only. The currently effective loglevel source for this
+ console -- for example, whether it was set globally, or whether
+ it was set locally for this console.
+
+ Possible values are:
+ =============== ============================================
+ local The loglevel comes from the per-console
+ loglevel.
+ global The loglevel comes from the global loglevel.
+ ignore_loglevel Both the per-console loglevel and global
+ loglevels are ignored as ignore_loglevel is
+ present on the kernel command line.
+ =============== ============================================
+
+What: /sys/class/console/<C>/enabled
+Date: April 2023
+Contact: Chris Down <[email protected]>
+Description: Read only. "1" if the console is enabled, "0" otherwise.
+
+What: /sys/class/console/<C>/loglevel
+Date: April 2023
+Contact: Chris Down <[email protected]>
+Description: Read write. The current per-console loglevel, which will take
+ effect if not overridden by other non-sysfs controls (see
+ Documentation/admin-guide/per-console-loglevel.rst). Bounds are
+ 0 (LOGLEVEL_EMERG) to 8 (LOGLEVEL_DEBUG + 1) inclusive. Also
+ takes the special value "-1" to indicate that no per-console
+ loglevel is set, and we should defer to the global controls.
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 43ea35613dfc..6143a83c48f3 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -117,6 +117,7 @@ configure specific aspects of kernel behavior to your liking.
namespaces/index
numastat
parport
+ per-console-loglevel
perf-security
pm/index
pnp
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 10e2e5c3ff0b..d956424ccd35 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -722,13 +722,18 @@
ttyS<n>[,options]
ttyUSB0[,options]
Use the specified serial port. The options are of
- the form "bbbbpnf", where "bbbb" is the baud rate,
- "p" is parity ("n", "o", or "e"), "n" is number of
- bits, and "f" is flow control ("r" for RTS or
- omit it). Default is "9600n8".
+ the form "bbbbpnf,extra", where "bbbb" is the baud
+ rate, "p" is parity ("n", "o", or "e"), "n" is
+ number of bits, and "f" is flow control ("r" for RTS
+ or omit it). Default is "9600n8".

- See Documentation/admin-guide/serial-console.rst for more
- information. See
+ At present the only extra option is "loglevel" to
+ set the per-console loglevel. For example:
+
+ console=ttyS0,9600n8,loglevel:3
+
+ See Documentation/admin-guide/serial-console.rst for
+ more information. See
Documentation/networking/netconsole.rst for an
alternative.

@@ -2877,10 +2882,13 @@
logibm.irq= [HW,MOUSE] Logitech Bus Mouse Driver
Format: <irq>

- loglevel= All Kernel Messages with a loglevel smaller than the
- console loglevel will be printed to the console. It can
- also be changed with klogd or other programs. The
- loglevels are defined as follows:
+ loglevel= Sets the global loglevel. All messages with a loglevel
+ smaller than the console loglevel will be printed to
+ the console. Note that this can be overridden
+ per-console, see
+ Documentation/admin-guide/per-console-loglevel.rst.
+
+ The loglevels are defined as follows:

0 (KERN_EMERG) system is unusable
1 (KERN_ALERT) action must be taken immediately
diff --git a/Documentation/admin-guide/per-console-loglevel.rst b/Documentation/admin-guide/per-console-loglevel.rst
new file mode 100644
index 000000000000..4a7ecce7ba8a
--- /dev/null
+++ b/Documentation/admin-guide/per-console-loglevel.rst
@@ -0,0 +1,92 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _per_console_loglevel:
+
+Per-console loglevel support
+============================
+
+Motivation
+----------
+
+Consoles can have vastly different latencies and throughputs. For example,
+writing a message to the serial console can take on the order of tens of
+milliseconds to get the UART to successfully write a message. While this might
+be fine for a single, one-off message, this can cause significant
+application-level stalls in situations where the kernel writes large amounts of
+information to the console.
+
+This means that while you might want to send at least INFO level messages to
+(for example) netconsole, which is relatively fast, you may only want to send
+at least WARN level messages to the serial console. This permits debugging
+using the serial console in cases that netconsole doesn't receive messages
+during particularly bad system issues, while still keeping the noise low enough
+to avoid inducing latency in userspace applications.
+
+Tunables
+--------
+
+In order to allow tuning this, the following controls exist:
+
+Global
+~~~~~~
+
+The global loglevel is set by the ``kernel.console_loglevel`` sysctl, which can
+also be set as ``loglevel=`` on the kernel command line.
+
+The printk module also takes two parameters which modify this behaviour
+further:
+
+* ``ignore_loglevel`` on the kernel command line or set in printk parameters:
+ Emit all messages. All other controls are ignored if this is present.
+* ``ignore_per_console_loglevel`` on the kernel command line or set in printk
+ parameters: Ignore all per-console loglevels and use the global loglevel.
+
+The default value for ``kernel.console_loglevel`` comes from
+``CONFIG_CONSOLE_LOGLEVEL_DEFAULT``, or ``CONFIG_CONSOLE_LOGLEVEL_QUIET`` if
+``quiet`` is passed on the kernel command line.
+
+Console attributes
+~~~~~~~~~~~~~~~~~~
+
+Registered consoles are exposed at ``/sys/class/console``. For example, if you
+are using ``ttyS0``, the console backing it can be viewed at
+``/sys/class/console/ttyS0/``. The following files are available:
+
+* ``effective_loglevel`` (r): The effective loglevel after considering all
+ loglevel authorities. For example, if the console-specific loglevel is 3, but
+ the global minimum console loglevel [*]_ is 5, then the value will be 5.
+* ``effective_loglevel_source`` (r): The loglevel authority which resulted in
+ the effective loglevel being set. The following values can be present:
+
+ * ``local``: The console-specific loglevel is in effect.
+ * ``global``: The global loglevel (``kernel.console_loglevel``) is in
+ effect. Set a console-specific loglevel to override it.
+ * ``ignore_loglevel``: ``ignore_loglevel`` was specified on the kernel
+ command line or at ``/sys/module/printk/parameters/ignore_loglevel``.
+ Disable it to use level controls.
+ * ``ignore_per_console_loglevel``: ``ignore_per_console_loglevel`` was
+ specified on the kernel command line or at
+ ``/sys/module/printk/parameters/ignore_per_console_loglevel``. Disable it
+ to use per-console level controls.
+
+* ``enabled`` (r): Whether the console is enabled.
+* ``loglevel`` (rw): The local, console-specific loglevel for this console.
+ This will be in effect if no other global control overrides it. Look at
+ ``effective_loglevel`` and ``effective_loglevel_source`` to verify that.
+
+.. [*] The existence of a minimum console loglevel is generally considered to
+ be a confusing and rarely used interface, and as such is not exposed through
+ the modern printk sysctl APIs that obsoleted ``kernel.printk``. Use the
+ legacy ``kernel.printk`` sysctl to control it if you have a rare use case
+ that requires changing it. The default value is ``CONSOLE_LOGLEVEL_MIN``.
+
+Deprecated
+~~~~~~~~~~
+
+* ``kernel.printk`` sysctl: this takes four values, setting
+ ``kernel.console_loglevel``, ``kernel.default_message_loglevel``, the minimum
+ console loglevel, and a fourth unused value. The interface is generally
+ considered to quite confusing, doesn't perform checks on the values given,
+ and is unaware of per-console loglevel semantics.
+
+Chris Down <[email protected]>, 27-April-2023
diff --git a/Documentation/admin-guide/serial-console.rst b/Documentation/admin-guide/serial-console.rst
index 58b32832e50a..4e204115fe4a 100644
--- a/Documentation/admin-guide/serial-console.rst
+++ b/Documentation/admin-guide/serial-console.rst
@@ -32,15 +32,25 @@ The format of this option is::
and F is flow control ('r' for RTS). Default is
9600n8. The maximum baudrate is 115200.

+ One can also specify the per-console loglevel for this
+ console by providing a loglevel parameter, for example
+ "loglevel:4" to set this console's loglevel to 4. The
+ value provided can be from 0 (LOGLEVEL_EMERG) to 8
+ (LOGLEVEL_DEBUG + 1), and messages below that will be
+ emitted onto the console as they become available.
+
You can specify multiple console= options on the kernel command line.
Output will appear on all of them. The last device will be used when
you open ``/dev/console``. So, for example::

- console=ttyS1,9600 console=tty0
+ console=ttyS1,9600,loglevel:5 console=tty0

defines that opening ``/dev/console`` will get you the current foreground
-virtual console, and kernel messages will appear on both the VGA
-console and the 2nd serial port (ttyS1 or COM2) at 9600 baud.
+virtual console, and kernel messages will appear on both the VGA console and
+the 2nd serial port (ttyS1 or COM2) at 9600 baud. The optional loglevel "5"
+indicates that this console will emit messages more serious than
+LOGLEVEL_NOTICE (that is, LOGLEVEL_WARNING and below, since more serious
+messages have lower ordering).

Note that you can only define one console per device type (serial, video).

@@ -113,3 +123,4 @@ Replace the sample values as needed.
the integration of these patches into m68k, ppc and alpha.

Miquel van Smoorenburg <[email protected]>, 11-Jun-2000
+Chris Down <[email protected]>, 27-April-2023
diff --git a/Documentation/core-api/printk-basics.rst b/Documentation/core-api/printk-basics.rst
index 2dde24ca7d9f..bfad359505bb 100644
--- a/Documentation/core-api/printk-basics.rst
+++ b/Documentation/core-api/printk-basics.rst
@@ -54,32 +54,33 @@ string, the log level is not a separate argument). The available log levels are:

The log level specifies the importance of a message. The kernel decides whether
to show the message immediately (printing it to the current console) depending
-on its log level and the current *console_loglevel* (a kernel variable). If the
-message priority is higher (lower log level value) than the *console_loglevel*
-the message will be printed to the console.
+on its log level and the current global *console_loglevel* or local per-console
+loglevel (kernel variables). If the message priority is higher (lower log level
+value) than the effective loglevel the message will be printed to the console.

If the log level is omitted, the message is printed with ``KERN_DEFAULT``
level.

-You can check the current *console_loglevel* with::
+You can check the current console's loglevel -- for example if you want to
+check the loglevel for serial consoles:

- $ cat /proc/sys/kernel/printk
- 4 4 1 7
+ $ cat /sys/class/console/ttyS0/effective_loglevel
+ 6
+ $ cat /sys/class/console/ttyS0/effective_loglevel_source
+ local

-The result shows the *current*, *default*, *minimum* and *boot-time-default* log
-levels.
+To change the default loglevel for all consoles, simply write the desired level
+to ``/proc/sys/kernel/console_loglevel``. For example::

-To change the current console_loglevel simply write the desired level to
-``/proc/sys/kernel/printk``. For example, to print all messages to the console::
+ # echo 5 > /proc/sys/kernel/console_loglevel

- # echo 8 > /proc/sys/kernel/printk
+This sets the console_loglevel to print KERN_WARNING (4) or more severe
+messages to console. Consoles with a per-console loglevel set will ignore it
+unless ``ignore_per_console_loglevel`` is set on the kernel command line or at
+``/sys/module/printk/parameters/ignore_per_console_loglevel``.

-Another way, using ``dmesg``::
-
- # dmesg -n 5
-
-sets the console_loglevel to print KERN_WARNING (4) or more severe messages to
-console. See ``dmesg(1)`` for more information.
+For more information on per-console loglevels, see
+Documentation/admin-guide/per-console-loglevel.rst.

As an alternative to printk() you can use the ``pr_*()`` aliases for
logging. This family of macros embed the log level in the macro names. For
diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst
index dd0518e002f6..e2c094a96f16 100644
--- a/Documentation/networking/netconsole.rst
+++ b/Documentation/networking/netconsole.rst
@@ -67,6 +67,23 @@ Built-in netconsole starts immediately after the TCP stack is
initialized and attempts to bring up the supplied dev at the supplied
address.

+You can also set a loglevel at boot time on the kernel command line::
+
+ console=netcon0,loglevel:2
+
+This can also be changed at runtime::
+
+ $ ls -l /sys/class/console/netcon0/
+ total 0
+ lrwxrwxrwx 1 root root 0 May 18 13:28 subsystem -> ../../../../class/console/
+ -r--r--r-- 1 root root 4096 May 18 13:28 effective_loglevel
+ -r--r--r-- 1 root root 4096 May 18 13:28 effective_loglevel_source
+ -r--r--r-- 1 root root 4096 May 18 13:28 enabled
+ -rw-r--r-- 1 root root 4096 May 18 13:28 loglevel
+ -rw-r--r-- 1 root root 4096 May 18 13:28 uevent
+
+See Documentation/admin-guide/per-console-loglevel.rst for more information.
+
The remote host has several options to receive the kernel messages,
for example:

diff --git a/MAINTAINERS b/MAINTAINERS
index 14fadebf42fd..ecf906bb37ee 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16805,6 +16805,9 @@ R: Steven Rostedt <[email protected]>
R: John Ogness <[email protected]>
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git
+F: Documentation/ABI/testing/sysfs-class-console
+F: Documentation/admin-guide/per-console-loglevel.rst
+F: Documentation/core-api/printk-basics.rst
F: include/linux/printk.h
F: kernel/printk/

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index b6e70c5cfa17..f62beafd5d49 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -51,6 +51,7 @@
#include <linux/syscalls.h>
#include <linux/of.h>
#include <linux/rcupdate.h>
+#include <linux/console.h>

#include <asm/ptrace.h>
#include <asm/irq_regs.h>
@@ -101,12 +102,26 @@ __setup("sysrq_always_enabled", sysrq_always_enabled_setup);
static void sysrq_handle_loglevel(int key)
{
int i;
+ int cookie;
+ int warned = 0;
+ struct console *con;

i = key - '0';
console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
pr_info("Loglevel set to %d\n", i);
console_loglevel = i;
+
+ cookie = console_srcu_read_lock();
+ for_each_console_srcu(con) {
+ if (!warned && per_console_loglevel_is_set(con)) {
+ warned = 1;
+ pr_warn("Overriding per-console loglevel from sysrq\n");
+ }
+ con->level = -1;
+ }
+ console_srcu_read_unlock(cookie);
}
+
static const struct sysrq_key_op sysrq_loglevel_op = {
.handler = sysrq_handle_loglevel,
.help_msg = "loglevel(0-9)",
diff --git a/include/linux/console.h b/include/linux/console.h
index d3195664baa5..77052f863571 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -17,6 +17,7 @@
#include <linux/atomic.h>
#include <linux/bits.h>
#include <linux/rculist.h>
+#include <linux/device.h>
#include <linux/types.h>

struct vc_data;
@@ -203,6 +204,8 @@ struct console {
unsigned long dropped;
void *data;
struct hlist_node node;
+ int level;
+ struct device *classdev;
};

#ifdef CONFIG_LOCKDEP
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 8ef499ab3c1e..25ba0873dee8 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -69,6 +69,8 @@ extern int console_printk[];

extern void console_verbose(void);

+int clamp_loglevel(int level);
+
/* strlen("ratelimit") + 1 */
#define DEVKMSG_STR_MAX_SIZE 10
extern char devkmsg_log_str[];
@@ -139,6 +141,7 @@ void early_printk(const char *s, ...) { }
#endif

struct dev_printk_info;
+struct console;

#ifdef CONFIG_PRINTK
asmlinkage __printf(4, 0)
@@ -192,6 +195,8 @@ void show_regs_print_info(const char *log_lvl);
extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
extern asmlinkage void dump_stack(void) __cold;
void printk_trigger_flush(void);
+
+bool per_console_loglevel_is_set(const struct console *con);
#else
static inline __printf(1, 0)
int vprintk(const char *s, va_list args)
@@ -271,6 +276,10 @@ static inline void dump_stack(void)
static inline void printk_trigger_flush(void)
{
}
+static inline bool per_console_loglevel_is_set(const struct console *con)
+{
+ return false;
+}
#endif

#ifdef CONFIG_SMP
diff --git a/kernel/printk/console_cmdline.h b/kernel/printk/console_cmdline.h
index 3ca74ad391d6..40f1a1ff0965 100644
--- a/kernel/printk/console_cmdline.h
+++ b/kernel/printk/console_cmdline.h
@@ -6,6 +6,8 @@ struct console_cmdline
{
char name[16]; /* Name of the driver */
int index; /* Minor dev. to use */
+ int level; /* Log level to use */
+ short flags; /* Initial flags */
bool user_specified; /* Specified by command line vs. platform */
char *options; /* Options for the driver */
#ifdef CONFIG_A11Y_BRAILLE_CONSOLE
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 041c7bd56005..f006960ce18b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -44,6 +44,7 @@
#include <linux/irq_work.h>
#include <linux/ctype.h>
#include <linux/uio.h>
+#include <linux/device.h>
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
@@ -199,6 +200,12 @@ static int __init control_devkmsg(char *str)
}
__setup("printk.devkmsg=", control_devkmsg);

+int clamp_loglevel(int level)
+{
+ return clamp(level, minimum_console_loglevel,
+ CONSOLE_LOGLEVEL_MOTORMOUTH);
+}
+
char devkmsg_log_str[DEVKMSG_STR_MAX_SIZE] = "ratelimit";
#if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL)
int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
@@ -466,6 +473,14 @@ static struct latched_seq clear_seq = {
.val[1] = 0,
};

+static struct class *console_class;
+
+enum loglevel_source {
+ LLS_GLOBAL,
+ LLS_LOCAL,
+ LLS_IGNORE_LOGLEVEL,
+};
+
#define LOG_LEVEL(v) ((v) & 0x07)
#define LOG_FACILITY(v) ((v) >> 3 & 0xff)

@@ -696,7 +711,8 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
return len;
}

-static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
+static bool printk_get_next_message(struct printk_message *pmsg,
+ struct console *con, u64 seq,
bool is_extended, bool may_supress);

/* /dev/kmsg - userspace message inject/listen interface */
@@ -799,7 +815,8 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
if (ret)
return ret;

- if (!printk_get_next_message(&pmsg, atomic64_read(&user->seq), true, false)) {
+ if (!printk_get_next_message(&pmsg, NULL, atomic64_read(&user->seq),
+ true, false)) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
goto out;
@@ -816,8 +833,8 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
* This pairs with __wake_up_klogd:A.
*/
ret = wait_event_interruptible(log_wait,
- printk_get_next_message(&pmsg, atomic64_read(&user->seq), true,
- false)); /* LMM(devkmsg_read:A) */
+ printk_get_next_message(&pmsg, NULL, atomic64_read(&user->seq),
+ true, false)); /* LMM(devkmsg_read:A) */
if (ret)
goto out;
}
@@ -1261,9 +1278,119 @@ module_param(ignore_loglevel, bool, S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(ignore_loglevel,
"ignore loglevel setting (prints all kernel messages to the console)");

-static bool suppress_message_printing(int level)
+static bool __read_mostly ignore_per_console_loglevel;
+
+static int __init ignore_per_console_loglevel_setup(char *str)
+{
+ ignore_per_console_loglevel = true;
+ return 0;
+}
+
+early_param("ignore_per_console_loglevel", ignore_per_console_loglevel_setup);
+module_param(ignore_per_console_loglevel, bool, 0644);
+MODULE_PARM_DESC(
+ ignore_per_console_loglevel,
+ "ignore per-console loglevel setting (only respect global console loglevel)");
+
+bool per_console_loglevel_is_set(const struct console *con)
+{
+ return !ignore_per_console_loglevel && con && (con->level > 0);
+}
+
+/*
+ * Hierarchy of loglevel authority:
+ *
+ * 1. con->level. The locally set, console-specific loglevel. Optional, only
+ * valid if >0.
+ * 2. console_loglevel. The default global console loglevel, always present.
+ *
+ * The behaviour can be further changed by the following printk module
+ * parameters:
+ *
+ * 1. ignore_loglevel. Can be set at boot or at runtime with
+ * /sys/module/printk/parameters/ignore_loglevel. Overrides absolutely
+ * everything since it's used to debug.
+ * 2. ignore_per_console_loglevel. Existing per-console loglevel values are left
+ * intact, but are ignored in favour of console_loglevel as long as this is
+ * true. Also manipulated through syslog(SYSLOG_ACTION_CONSOLE_{ON,OFF}).
+ */
+static enum loglevel_source
+console_effective_loglevel_source(const struct console *con)
+{
+ if (WARN_ON_ONCE(!con))
+ return LLS_GLOBAL;
+
+ if (ignore_loglevel)
+ return LLS_IGNORE_LOGLEVEL;
+
+ if (per_console_loglevel_is_set(con))
+ return LLS_LOCAL;
+
+ return LLS_GLOBAL;
+}
+
+static int console_effective_loglevel(const struct console *con)
{
- return (level >= console_loglevel && !ignore_loglevel);
+ enum loglevel_source source;
+ int level;
+
+ if (WARN_ON_ONCE(!con))
+ return default_console_loglevel;
+
+ source = console_effective_loglevel_source(con);
+
+ switch (source) {
+ case LLS_IGNORE_LOGLEVEL:
+ level = CONSOLE_LOGLEVEL_MOTORMOUTH;
+ break;
+ case LLS_LOCAL:
+ level = con->level;
+ break;
+ case LLS_GLOBAL:
+ level = console_loglevel;
+ break;
+ default:
+ pr_warn("Unhandled console loglevel source: %d", source);
+ level = default_console_loglevel;
+ break;
+ }
+
+ return level;
+}
+
+static const char *
+console_effective_loglevel_source_str(const struct console *con)
+{
+ enum loglevel_source source;
+ const char *str;
+
+ if (WARN_ON_ONCE(!con))
+ return "unknown";
+
+ source = console_effective_loglevel_source(con);
+
+ switch (source) {
+ case LLS_IGNORE_LOGLEVEL:
+ str = "ignore_loglevel";
+ break;
+ case LLS_LOCAL:
+ str = "local";
+ break;
+ case LLS_GLOBAL:
+ str = "global";
+ break;
+ default:
+ pr_warn("Unhandled console loglevel source: %d", source);
+ str = "unknown";
+ break;
+ }
+
+ return str;
+}
+
+static bool suppress_message_printing(int level, struct console *con)
+{
+ return level >= console_effective_loglevel(con);
}

#ifdef CONFIG_BOOT_PRINTK_DELAY
@@ -1718,6 +1845,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
struct printk_info info;
bool clear = false;
static int saved_console_loglevel = LOGLEVEL_DEFAULT;
+ static int saved_ignore_per_console_loglevel;
int error;

error = check_syslog_permissions(type, source);
@@ -1758,19 +1886,28 @@ int do_syslog(int type, char __user *buf, int len, int source)
break;
/* Disable logging to console */
case SYSLOG_ACTION_CONSOLE_OFF:
- if (saved_console_loglevel == LOGLEVEL_DEFAULT)
+ if (saved_console_loglevel == LOGLEVEL_DEFAULT) {
saved_console_loglevel = console_loglevel;
+ saved_ignore_per_console_loglevel =
+ ignore_per_console_loglevel;
+ }
console_loglevel = minimum_console_loglevel;
+ ignore_per_console_loglevel = true;
break;
/* Enable logging to console */
case SYSLOG_ACTION_CONSOLE_ON:
if (saved_console_loglevel != LOGLEVEL_DEFAULT) {
console_loglevel = saved_console_loglevel;
+ ignore_per_console_loglevel =
+ saved_ignore_per_console_loglevel;
saved_console_loglevel = LOGLEVEL_DEFAULT;
}
break;
/* Set level of messages printed to console */
case SYSLOG_ACTION_CONSOLE_LEVEL:
+ if (!ignore_per_console_loglevel)
+ pr_warn_once(
+ "SYSLOG_ACTION_CONSOLE_LEVEL is ignored by consoles with an explicitly set per-console loglevel, see Documentation/admin-guide/per-console-loglevel\n");
if (len < 1 || len > 8)
return -EINVAL;
if (len < minimum_console_loglevel)
@@ -2060,11 +2197,8 @@ static u8 *__printk_recursion_counter(void)

int printk_delay_msec __read_mostly;

-static inline void printk_delay(int level)
+static inline void printk_delay(void)
{
- if (suppress_message_printing(level))
- return;
-
boot_delay_msec();

if (unlikely(printk_delay_msec)) {
@@ -2278,7 +2412,9 @@ asmlinkage int vprintk_emit(int facility, int level,
const char *fmt, va_list args)
{
int printed_len;
+ int cookie;
bool in_sched = false;
+ struct console *con;

/* Suppress unimportant messages after panic happens */
if (unlikely(suppress_printk))
@@ -2293,7 +2429,14 @@ asmlinkage int vprintk_emit(int facility, int level,
in_sched = true;
}

- printk_delay(level);
+ cookie = console_srcu_read_lock();
+ for_each_console_srcu(con) {
+ if (!suppress_message_printing(level, con)) {
+ printk_delay();
+ break;
+ }
+ }
+ console_srcu_read_unlock(cookie);

printed_len = vprintk_store(facility, level, dev_info, fmt, args);

@@ -2370,7 +2513,10 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
struct dev_printk_info *dev_info) { return 0; }
static void console_lock_spinning_enable(void) { }
static int console_lock_spinning_disable_and_check(int cookie) { return 0; }
-static bool suppress_message_printing(int level) { return false; }
+static bool suppress_message_printing(int level, struct console *con)
+{
+ return false;
+}
static bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; }
static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress) { return true; }

@@ -2410,11 +2556,78 @@ static void set_user_specified(struct console_cmdline *c, bool user_specified)
console_set_on_cmdline = 1;
}

+static bool find_and_remove_console_option(char *options, const char *key,
+ char *val_buf, size_t val_buf_size)
+{
+ bool found = false, first = true;
+ char *option, *next = options;
+
+ while ((option = strsep(&next, ","))) {
+ char *value;
+
+ value = strchr(option, ':');
+ if (value)
+ *(value++) = '\0';
+
+ if (strcmp(option, key) == 0) {
+ found = true;
+ if (value) {
+ if (strlen(value) > val_buf_size - 1) {
+ pr_warn("Can't copy console option value for %s:%s: not enough space (%zu)\n",
+ option, value, val_buf_size);
+ found = false;
+ } else {
+ strscpy(val_buf, value, val_buf_size);
+ }
+ } else
+ *val_buf = '\0';
+ }
+
+ if (found)
+ break;
+
+ if (next)
+ *(next - 1) = ',';
+ if (value)
+ *(value - 1) = ':';
+
+ first = false;
+ }
+
+ if (found) {
+ if (next)
+ memmove(option, next, strlen(next) + 1);
+ else if (first)
+ *option = '\0';
+ else
+ *--option = '\0';
+ }
+
+ return found;
+}
+
+static int find_and_remove_loglevel_option(char *options)
+{
+ char val[3];
+ int loglevel;
+
+ if (!find_and_remove_console_option(options, "loglevel", val,
+ sizeof(val)))
+ return -ENOENT;
+
+ if (kstrtoint(val, 10, &loglevel)) {
+ pr_warn("Invalid console loglevel, ignoring: %s\n", val);
+ return -EINVAL;
+ }
+
+ return clamp_loglevel(loglevel);
+}
+
static int __add_preferred_console(char *name, int idx, char *options,
char *brl_options, bool user_specified)
{
struct console_cmdline *c;
- int i;
+ int i, ret;

/*
* See if this tty is not yet registered, and
@@ -2435,6 +2648,11 @@ static int __add_preferred_console(char *name, int idx, char *options,
if (!brl_options)
preferred_console = i;
strscpy(c->name, name, sizeof(c->name));
+
+ ret = find_and_remove_loglevel_option(options);
+ if (ret >= 0)
+ c->level = ret;
+
c->options = options;
set_user_specified(c, user_specified);
braille_set_options(c, brl_options);
@@ -2764,7 +2982,8 @@ static void console_prepend_dropped(struct printk_message *pmsg, unsigned long d
* of @pmsg are valid. (See the documentation of struct printk_message
* for information about the @pmsg fields.)
*/
-static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
+static bool printk_get_next_message(struct printk_message *pmsg,
+ struct console *con, u64 seq,
bool is_extended, bool may_suppress)
{
static int panic_console_dropped;
@@ -2808,7 +3027,7 @@ static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
}

/* Skip record that has level above the console loglevel. */
- if (may_suppress && suppress_message_printing(r.info->level))
+ if (may_suppress && suppress_message_printing(r.info->level, con))
goto out;

if (is_extended) {
@@ -2851,7 +3070,7 @@ static bool console_emit_next_record(struct console *con, bool *handover, int co

*handover = false;

- if (!printk_get_next_message(&pmsg, con->seq, is_extended, true))
+ if (!printk_get_next_message(&pmsg, con, con->seq, is_extended, true))
return false;

con->dropped += pmsg.dropped;
@@ -3199,6 +3418,144 @@ static int __init keep_bootcon_setup(char *str)

early_param("keep_bootcon", keep_bootcon_setup);

+#ifdef CONFIG_PRINTK
+static ssize_t loglevel_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct console *con = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%d\n", con->level);
+}
+
+static ssize_t loglevel_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t size)
+{
+ struct console *con = dev_get_drvdata(dev);
+ ssize_t ret;
+ int level;
+
+ ret = kstrtoint(buf, 10, &level);
+ if (ret < 0)
+ return ret;
+
+ if (level == -1) {
+ con->level = level;
+ return size;
+ }
+
+ if (clamp_loglevel(level) != level)
+ return -ERANGE;
+
+ con->level = level;
+
+ return size;
+}
+
+static DEVICE_ATTR_RW(loglevel);
+
+static ssize_t effective_loglevel_source_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct console *con = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%s\n",
+ console_effective_loglevel_source_str(con));
+}
+
+static DEVICE_ATTR_RO(effective_loglevel_source);
+
+static ssize_t effective_loglevel_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct console *con = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%d\n", console_effective_loglevel(con));
+}
+
+static DEVICE_ATTR_RO(effective_loglevel);
+
+static ssize_t enabled_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct console *con = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%d\n", !!(con->flags & CON_ENABLED));
+}
+
+static DEVICE_ATTR_RO(enabled);
+
+static struct attribute *console_sysfs_attrs[] = {
+ &dev_attr_loglevel.attr,
+ &dev_attr_effective_loglevel_source.attr,
+ &dev_attr_effective_loglevel.attr,
+ &dev_attr_enabled.attr,
+ NULL,
+};
+
+ATTRIBUTE_GROUPS(console_sysfs);
+
+static void console_classdev_release(struct device *dev)
+{
+ kfree(dev);
+}
+
+static void console_register_device(struct console *con)
+{
+ /*
+ * We might be called from register_console() before the class is
+ * registered. If that happens, we'll take care of it in
+ * printk_late_init.
+ */
+ if (IS_ERR_OR_NULL(console_class))
+ return;
+
+ if (WARN_ON(con->classdev))
+ return;
+
+ con->classdev = kzalloc(sizeof(struct device), GFP_KERNEL);
+ if (!con->classdev)
+ return;
+
+ device_initialize(con->classdev);
+ dev_set_name(con->classdev, "%s%d", con->name, con->index);
+ dev_set_drvdata(con->classdev, con);
+ con->classdev->release = console_classdev_release;
+ con->classdev->class = console_class;
+ if (device_add(con->classdev))
+ put_device(con->classdev);
+}
+
+static void console_setup_class(void)
+{
+ struct console *con;
+ int cookie;
+
+ /*
+ * printk exists for the lifetime of the kernel, it cannot be unloaded,
+ * so we should never end up back in here.
+ */
+ if (WARN_ON(console_class))
+ return;
+
+ console_class = class_create(THIS_MODULE, "console");
+ if (!IS_ERR(console_class))
+ console_class->dev_groups = console_sysfs_groups;
+
+ cookie = console_srcu_read_lock();
+ for_each_console_srcu(con)
+ console_register_device(con);
+ console_srcu_read_unlock(cookie);
+}
+#else /* CONFIG_PRINTK */
+static void console_register_device(struct console *new)
+{
+}
+static void console_setup_class(void)
+{
+}
+#endif
+
/*
* This is called by register_console() to try to match
* the newly registered console with any of the ones selected
@@ -3231,6 +3588,14 @@ static int try_enable_preferred_console(struct console *newcon,
if (newcon->index < 0)
newcon->index = c->index;

+ if (c->level > 0)
+ newcon->level = c->level;
+ else
+ newcon->level = -1;
+
+ newcon->flags |= c->flags;
+ newcon->classdev = NULL;
+
if (_braille_register_console(newcon, c))
return 0;

@@ -3456,6 +3821,7 @@ void register_console(struct console *newcon)
* register_console() completes.
*/

+ console_register_device(newcon);
console_sysfs_notify();

/*
@@ -3523,6 +3889,9 @@ static int unregister_console_locked(struct console *console)
*/
synchronize_srcu(&console_srcu);

+ if (console->classdev)
+ device_unregister(console->classdev);
+
console_sysfs_notify();

if (console->exit)
@@ -3627,6 +3996,10 @@ void __init console_init(void)
* To mitigate this problem somewhat, only unregister consoles whose memory
* intersects with the init section. Note that all other boot consoles will
* get unregistered when the real preferred console is registered.
+ *
+ * Early consoles will also have been registered before we had the
+ * infrastructure to put them into /sys/class/console, so make sure they get
+ * set up now that we're ready.
*/
static int __init printk_late_init(void)
{
@@ -3664,6 +4037,9 @@ static int __init printk_late_init(void)
console_cpu_notify, NULL);
WARN_ON(ret < 0);
printk_sysctl_init();
+
+ console_setup_class();
+
return 0;
}
late_initcall(printk_late_init);
diff --git a/kernel/printk/sysctl.c b/kernel/printk/sysctl.c
index c228343eeb97..688e1ebb7066 100644
--- a/kernel/printk/sysctl.c
+++ b/kernel/printk/sysctl.c
@@ -7,10 +7,14 @@
#include <linux/printk.h>
#include <linux/capability.h>
#include <linux/ratelimit.h>
+#include <linux/console.h>
#include "internal.h"

static const int ten_thousand = 10000;

+static int min_loglevel = LOGLEVEL_EMERG;
+static int max_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH;
+
static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
@@ -20,13 +24,48 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
}

+static int printk_sysctl_deprecated(struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ int res = proc_dointvec(table, write, buffer, lenp, ppos);
+
+ if (write)
+ pr_warn_once(
+ "printk: The kernel.printk sysctl is deprecated. Consider using kernel.console_loglevel or kernel.default_message_loglevel instead.\n");
+
+ return res;
+}
+
+static int printk_console_loglevel(struct ctl_table *table, int write,
+ void *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct ctl_table ltable = *table;
+ int ret, level;
+
+ if (!write)
+ return proc_dointvec(table, write, buffer, lenp, ppos);
+
+ ltable.data = &level;
+
+ ret = proc_dointvec(&ltable, write, buffer, lenp, ppos);
+ if (ret)
+ return ret;
+
+ if (level != -1 && level != clamp_loglevel(level))
+ return -ERANGE;
+
+ console_loglevel = level;
+
+ return 0;
+}
+
static struct ctl_table printk_sysctls[] = {
{
.procname = "printk",
.data = &console_loglevel,
.maxlen = 4*sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = printk_sysctl_deprecated,
},
{
.procname = "printk_ratelimit",
@@ -76,6 +115,22 @@ static struct ctl_table printk_sysctls[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_TWO,
},
+ {
+ .procname = "console_loglevel",
+ .data = &console_loglevel,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = printk_console_loglevel,
+ },
+ {
+ .procname = "default_message_loglevel",
+ .data = &default_message_loglevel,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &min_loglevel,
+ .extra2 = &max_loglevel,
+ },
{}
};

--
2.40.0

2023-04-26 09:12:57

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] printk: Do not delay messages which aren't solicited by any console

On Tue 2023-04-25 14:06:22, Chris Down wrote:
> Boot-delayed printk messages already have this checked up front, and
> it's odd that we don't do the same check for global printk delays, since
> there's no reason to delay if we are not going to emit anything.
>
> Signed-off-by: Chris Down <[email protected]>

Great catch!

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2023-04-27 14:09:12

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] printk: console: Support console-specific loglevels

On Tue 2023-04-25 14:06:26, Chris Down wrote:
> Consoles can have vastly different latencies and throughputs. For
> example, writing a message to the serial console can take on the order
> of tens of milliseconds to get the UART to successfully write a message.
> While this might be fine for a single, one-off message, this can cause
> significant application-level stalls in situations where the kernel
> writes large amounts of information to the console.
>
> This means that while you might want to send at least INFO level
> messages to (for example) netconsole, which is relatively fast, you may
> only want to send at least WARN level messages to the serial console.
> Such an implementation would permit debugging using the serial console
> in cases that netconsole doesn't receive messages during particularly
> bad system issues, while still keeping the noise low enough to avoid
> inducing latency in userspace applications. This patch adds such an
> interface, extending the existing console loglevel controls to allow
> each console to have its own loglevel.

I like the approach. I have many comments but there is nothing
serious. Mostly only cosmetic issues.

> --- /dev/null
> +++ b/Documentation/admin-guide/per-console-loglevel.rst
> @@ -0,0 +1,92 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +.. _per_console_loglevel:
> +
> +Per-console loglevel support
> +============================
> +
> +Motivation
> +----------
> +
> +Consoles can have vastly different latencies and throughputs. For example,
> +writing a message to the serial console can take on the order of tens of
> +milliseconds to get the UART to successfully write a message. While this might
> +be fine for a single, one-off message, this can cause significant
> +application-level stalls in situations where the kernel writes large amounts of
> +information to the console.
> +
> +This means that while you might want to send at least INFO level messages to
> +(for example) netconsole, which is relatively fast, you may only want to send
> +at least WARN level messages to the serial console. This permits debugging
> +using the serial console in cases that netconsole doesn't receive messages
> +during particularly bad system issues, while still keeping the noise low enough
> +to avoid inducing latency in userspace applications.
> +

It would make sense to list the available loglevel somewhere in this
file:

Loglevel
--------

The loglevels are defined the followind way:

+---+--------------+-----------------------------------+
| 0 | KERN_EMERG | system is unusable |
+---+--------------+-----------------------------------+
| 1 | KERN_ALERT | action must be taken immediately |
+---+--------------+-----------------------------------+
| 2 | KERN_CRIT | critical conditions |
+---+--------------+-----------------------------------+
| 3 | KERN_ERR | error conditions |
+---+--------------+-----------------------------------+
| 4 | KERN_WARNING | warning conditions |
+---+--------------+-----------------------------------+
| 5 | KERN_NOTICE | normal but significant condition |
+---+--------------+-----------------------------------+
| 6 | KERN_INFO | informational |
+---+--------------+-----------------------------------+
| 7 | KERN_DEBUG | debug-level messages |
+---+--------------+-----------------------------------+

> +Tunables
> +--------
> +
> +In order to allow tuning this, the following controls exist:
> +
> +Global
> +~~~~~~
> +
> +The global loglevel is set by the ``kernel.console_loglevel`` sysctl, which can
> +also be set as ``loglevel=`` on the kernel command line.
> +
> +The printk module also takes two parameters which modify this behaviour
> +further:
> +
> +* ``ignore_loglevel`` on the kernel command line or set in printk parameters:
> + Emit all messages. All other controls are ignored if this is present.

Please add empty lines between "* item with a long descripion".
It would make the text better readable in the plain text form.

> +* ``ignore_per_console_loglevel`` on the kernel command line or set in printk
> + parameters: Ignore all per-console loglevels and use the global loglevel.
> +
> +The default value for ``kernel.console_loglevel`` comes from
> +``CONFIG_CONSOLE_LOGLEVEL_DEFAULT``, or ``CONFIG_CONSOLE_LOGLEVEL_QUIET`` if
> +``quiet`` is passed on the kernel command line.
> +
> +Console attributes
> +~~~~~~~~~~~~~~~~~~
> +
> +Registered consoles are exposed at ``/sys/class/console``. For example, if you
> +are using ``ttyS0``, the console backing it can be viewed at
> +``/sys/class/console/ttyS0/``. The following files are available:
> +
> +* ``effective_loglevel`` (r): The effective loglevel after considering all
> + loglevel authorities. For example, if the console-specific loglevel is 3, but
> + the global minimum console loglevel [*]_ is 5, then the value will be 5.

I would rather avoid this example. It confused me ;-)

I thought that it was talking about the "global console loglevel"
instead of "global minimum console loglevel". I was surprised
that the global value was used when a console-specific one
was defined.

</more details about my confusion>
The console loglevel defines a maximal message loglevel.
The important thing is that it defines a "limit". Most people,
including me, do not remember if the numbers go up or down.
So I ignored the word "minimum". I though that it described
the effect of the value and not a limit for the value.

Also the global minimum loglevel is a really weird setting.
I do not see much practical use for it. I guess that most
people are not aware of this limit. I think that it creates
more harm than good.
</more details about my confusion>

My proposal. I would replace the "confusing" paragraph with
something like:

* ``effective_loglevel`` (r): The effective loglevel after considering all
loglevel authorities. For example, it would show the value of
the console-specific loglevel when it is defined or the global
console loglevel value when the console-specific one is not defined.


> +* ``effective_loglevel_source`` (r): The loglevel authority which resulted in
> + the effective loglevel being set. The following values can be present:
> +
> + * ``local``: The console-specific loglevel is in effect.

> + * ``global``: The global loglevel (``kernel.console_loglevel``) is in
> + effect. Set a console-specific loglevel to override it.

> + * ``ignore_loglevel``: ``ignore_loglevel`` was specified on the kernel
> + command line or at ``/sys/module/printk/parameters/ignore_loglevel``.
> + Disable it to use level controls.

> + * ``ignore_per_console_loglevel``: ``ignore_per_console_loglevel`` was
> + specified on the kernel command line or at
> + ``/sys/module/printk/parameters/ignore_per_console_loglevel``. Disable it
> + to use per-console level controls.

The code does not show this value. And I would not show it either.

> +* ``enabled`` (r): Whether the console is enabled.

> +* ``loglevel`` (rw): The local, console-specific loglevel for this console.
> + This will be in effect if no other global control overrides it. Look at
> + ``effective_loglevel`` and ``effective_loglevel_source`` to verify that.
> +
> +.. [*] The existence of a minimum console loglevel is generally considered to
> + be a confusing and rarely used interface, and as such is not exposed through
> + the modern printk sysctl APIs that obsoleted ``kernel.printk``. Use the
> + legacy ``kernel.printk`` sysctl to control it if you have a rare use case
> + that requires changing it. The default value is ``CONSOLE_LOGLEVEL_MIN``.
> +
> +Deprecated
> +~~~~~~~~~~
> +
> +* ``kernel.printk`` sysctl: this takes four values, setting
> + ``kernel.console_loglevel``, ``kernel.default_message_loglevel``, the minimum
> + console loglevel, and a fourth unused value. The interface is generally
> + considered to quite confusing, doesn't perform checks on the values given,
> + and is unaware of per-console loglevel semantics.
> +
> +Chris Down <[email protected]>, 27-April-2023
> diff --git a/Documentation/admin-guide/serial-console.rst b/Documentation/admin-guide/serial-console.rst
> index 58b32832e50a..4e204115fe4a 100644
> --- a/Documentation/admin-guide/serial-console.rst
> +++ b/Documentation/admin-guide/serial-console.rst
> @@ -32,15 +32,25 @@ The format of this option is::
> and F is flow control ('r' for RTS). Default is
> 9600n8. The maximum baudrate is 115200.
>
> + One can also specify the per-console loglevel for this
> + console by providing a loglevel parameter, for example
> + "loglevel:4" to set this console's loglevel to 4. The
> + value provided can be from 0 (LOGLEVEL_EMERG) to 8
> + (LOGLEVEL_DEBUG + 1), and messages below that will be
> + emitted onto the console as they become available.

It would be handy to add the table with the available log levels
into this file as well.

> +
> You can specify multiple console= options on the kernel command line.
> Output will appear on all of them. The last device will be used when
> you open ``/dev/console``. So, for example::
>
> - console=ttyS1,9600 console=tty0
> + console=ttyS1,9600,loglevel:5 console=tty0
>
> defines that opening ``/dev/console`` will get you the current foreground
> -virtual console, and kernel messages will appear on both the VGA
> -console and the 2nd serial port (ttyS1 or COM2) at 9600 baud.
> +virtual console, and kernel messages will appear on both the VGA console and
> +the 2nd serial port (ttyS1 or COM2) at 9600 baud. The optional loglevel "5"
> +indicates that this console will emit messages more serious than
> +LOGLEVEL_NOTICE (that is, LOGLEVEL_WARNING and below, since more serious
> +messages have lower ordering).
>
> Note that you can only define one console per device type (serial, video).
>
> --- a/drivers/tty/sysrq.c
> +++ b/drivers/tty/sysrq.c
> @@ -101,12 +102,26 @@ __setup("sysrq_always_enabled", sysrq_always_enabled_setup);
> static void sysrq_handle_loglevel(int key)
> {
> int i;
> + int cookie;
> + int warned = 0;
> + struct console *con;
>
> i = key - '0';
> console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
> pr_info("Loglevel set to %d\n", i);
> console_loglevel = i;
> +
> + cookie = console_srcu_read_lock();
> + for_each_console_srcu(con) {
> + if (!warned && per_console_loglevel_is_set(con)) {
> + warned = 1;
> + pr_warn("Overriding per-console loglevel from sysrq\n");
> + }
> + con->level = -1;

Please, use WRITE_ONCE(con->level, -1) to make sure that it will be
atomic.

READ_ONCE()/WRITE_ONCE() should always be used when the values
are read/written using RCU synchronization. Otherwise the compiler
might do some optimizations and read/write bytes separately.

> + }
> + console_srcu_read_unlock(cookie);
> }

Also we should safe/set/restore "ignore_per_console_loglevel"
in __handle_sysrq(). It already does the same with
"console_loglevel".

__handle_sysrq() increases the loglevel to show all
messages printed by the sysrq handler on all consoles.
Many handlers print some information that might be useful
for debugging.

> static const struct sysrq_key_op sysrq_loglevel_op = {
> .handler = sysrq_handle_loglevel,
> .help_msg = "loglevel(0-9)",
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -203,6 +204,8 @@ struct console {
> unsigned long dropped;
> void *data;
> struct hlist_node node;
> + int level;
> + struct device *classdev;

Please, update also the comment above the structure.

> };
>
> #ifdef CONFIG_LOCKDEP
> --- a/kernel/printk/console_cmdline.h
> +++ b/kernel/printk/console_cmdline.h
> @@ -6,6 +6,8 @@ struct console_cmdline
> {
> char name[16]; /* Name of the driver */
> int index; /* Minor dev. to use */
> + int level; /* Log level to use */
> + short flags; /* Initial flags */

Are the initial flags actually set anywhere?

> bool user_specified; /* Specified by command line vs. platform */
> char *options; /* Options for the driver */
> #ifdef CONFIG_A11Y_BRAILLE_CONSOLE
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 041c7bd56005..f006960ce18b 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1261,9 +1278,119 @@ module_param(ignore_loglevel, bool, S_IRUGO | S_IWUSR);
> MODULE_PARM_DESC(ignore_loglevel,
> "ignore loglevel setting (prints all kernel messages to the console)");
>
> -static bool suppress_message_printing(int level)
> +static bool __read_mostly ignore_per_console_loglevel;
> +
> +static int __init ignore_per_console_loglevel_setup(char *str)
> +{
> + ignore_per_console_loglevel = true;
> + return 0;
> +}
> +
> +early_param("ignore_per_console_loglevel", ignore_per_console_loglevel_setup);
> +module_param(ignore_per_console_loglevel, bool, 0644);
> +MODULE_PARM_DESC(
> + ignore_per_console_loglevel,
> + "ignore per-console loglevel setting (only respect global console loglevel)");
> +
> +bool per_console_loglevel_is_set(const struct console *con)
> +{
> + return !ignore_per_console_loglevel && con && (con->level > 0);

It should use READ_ONCE(con->level).

> +}
> +
> +/*
> + * Hierarchy of loglevel authority:
> + *
> + * 1. con->level. The locally set, console-specific loglevel. Optional, only
> + * valid if >0.
> + * 2. console_loglevel. The default global console loglevel, always present.
> + *
> + * The behaviour can be further changed by the following printk module
> + * parameters:
> + *
> + * 1. ignore_loglevel. Can be set at boot or at runtime with
> + * /sys/module/printk/parameters/ignore_loglevel. Overrides absolutely
> + * everything since it's used to debug.
> + * 2. ignore_per_console_loglevel. Existing per-console loglevel values are left
> + * intact, but are ignored in favour of console_loglevel as long as this is
> + * true. Also manipulated through syslog(SYSLOG_ACTION_CONSOLE_{ON,OFF}).
> + */
> +static enum loglevel_source
> +console_effective_loglevel_source(const struct console *con)
> +{
> + if (WARN_ON_ONCE(!con))
> + return LLS_GLOBAL;
> +
> + if (ignore_loglevel)
> + return LLS_IGNORE_LOGLEVEL;
> +
> + if (per_console_loglevel_is_set(con))
> + return LLS_LOCAL;
> +
> + return LLS_GLOBAL;
> +}
> +
> +static int console_effective_loglevel(const struct console *con)
> {
> - return (level >= console_loglevel && !ignore_loglevel);
> + enum loglevel_source source;
> + int level;
> +
> + if (WARN_ON_ONCE(!con))
> + return default_console_loglevel;

This looks superfluous. The check in
console_effective_loglevel_source(con) should be enough.

I know that it would change the behavior of this function.
It would return the value of "console_loglevel" instead of
"default_console_loglevel" when there is a bug. But it
does not matter. There is no good value. It just
should never happen.

> + source = console_effective_loglevel_source(con);
> +
> + switch (source) {
> + case LLS_IGNORE_LOGLEVEL:
> + level = CONSOLE_LOGLEVEL_MOTORMOUTH;
> + break;
> + case LLS_LOCAL:
> + level = con->level;
> + break;
> + case LLS_GLOBAL:
> + level = console_loglevel;
> + break;
> + default:
> + pr_warn("Unhandled console loglevel source: %d", source);
> + level = default_console_loglevel;
> + break;
> + }
> +
> + return level;
> +}
> +
> +static const char *
> +console_effective_loglevel_source_str(const struct console *con)
> +{
> + enum loglevel_source source;
> + const char *str;
> +
> + if (WARN_ON_ONCE(!con))
> + return "unknown";

Same here. I would remove this check.

> + source = console_effective_loglevel_source(con);
> +
> + switch (source) {
> + case LLS_IGNORE_LOGLEVEL:
> + str = "ignore_loglevel";
> + break;
> + case LLS_LOCAL:
> + str = "local";
> + break;
> + case LLS_GLOBAL:
> + str = "global";
> + break;
> + default:
> + pr_warn("Unhandled console loglevel source: %d", source);
> + str = "unknown";
> + break;
> + }
> +
> + return str;
> +}
> +
> +static bool suppress_message_printing(int level, struct console *con)
> +{
> + return level >= console_effective_loglevel(con);
> }
>
> #ifdef CONFIG_BOOT_PRINTK_DELAY
> @@ -2293,7 +2429,14 @@ asmlinkage int vprintk_emit(int facility, int level,
> in_sched = true;
> }
>
> - printk_delay(level);
> + cookie = console_srcu_read_lock();
> + for_each_console_srcu(con) {
> + if (!suppress_message_printing(level, con)) {
> + printk_delay();

This would repeat the delay for each registered and non-supressed console.
But it should be called only once when there is at least one
non-suppressed console.

> + break;
> + }
> + }
> + console_srcu_read_unlock(cookie);

I would prefer to keep the @level parameter and do this in
printk_delay(). vprintk_emit() historically grew into a monster
function. It took a long time to clean it. I am happy that
it contains only the top level logic now ;-)

>
> printed_len = vprintk_store(facility, level, dev_info, fmt, args);
>
> @@ -2410,11 +2556,78 @@ static void set_user_specified(struct console_cmdline *c, bool user_specified)
> console_set_on_cmdline = 1;
> }
>
> +static bool find_and_remove_console_option(char *options, const char *key,
> + char *val_buf, size_t val_buf_size)
> +{
> + bool found = false, first = true;
> + char *option, *next = options;
> +
> + while ((option = strsep(&next, ","))) {
> + char *value;
> +
> + value = strchr(option, ':');
> + if (value)
> + *(value++) = '\0';
> +
> + if (strcmp(option, key) == 0) {
> + found = true;
> + if (value) {
> + if (strlen(value) > val_buf_size - 1)
> {

It might undeflow when val_buf_size == 0. I would use:

if (strlen(value) >= val_buf_size)

> + pr_warn("Can't copy console option value for %s:%s: not enough space (%zu)\n",
> + option, value, val_buf_size);
> + found = false;
> + } else {
> + strscpy(val_buf, value, val_buf_size);
> + }
> + } else
> + *val_buf = '\0';
> + }
> +
> + if (found)
> + break;
> +
> + if (next)
> + *(next - 1) = ',';
> + if (value)
> + *(value - 1) = ':';
> +
> + first = false;
> + }
> +
> + if (found) {
> + if (next)
> + memmove(option, next, strlen(next) + 1);
> + else if (first)
> + *option = '\0';
> + else
> + *--option = '\0';
> + }
> +
> + return found;
> +}
> +
> +static int find_and_remove_loglevel_option(char *options)
> +{
> + char val[3];
> +
> + int loglevel;
> +
> + if (!find_and_remove_console_option(options, "loglevel", val,
> + sizeof(val)))
> + return -ENOENT;
> +
> + if (kstrtoint(val, 10, &loglevel)) {
> + pr_warn("Invalid console loglevel, ignoring: %s\n", val);
> + return -EINVAL;
> + }
> +
> + return clamp_loglevel(loglevel);

I would use the same logic as in loglevel_store():

if (clamp_loglevel(loglevel) != loglevel) {
pr_warn("Per-console loglevel out of range, ignoring: %d\n", loglevel);
return -ERANGE;
}

return loglevel;

Nit: Also I would use a bigger buffer, e.g. val[12]. It would allow to
read a bigger range of invalid values. The message about that
a value is out of range is more clear than the generic one about
that the buffer is too small ;-)

> +}
> +
> static int __add_preferred_console(char *name, int idx, char *options,
> char *brl_options, bool user_specified)
> {
> struct console_cmdline *c;
> - int i;
> + int i, ret;
>
> /*
> * See if this tty is not yet registered, and
> @@ -2435,6 +2648,11 @@ static int __add_preferred_console(char *name, int idx, char *options,
> if (!brl_options)
> preferred_console = i;
> strscpy(c->name, name, sizeof(c->name));
> +
> + ret = find_and_remove_loglevel_option(options);
> + if (ret >= 0)
> + c->level = ret;

c->level == 0 is also the initial value. We should do:

if (ret >= 0)
c->level = ret;
else
c->level = -1;

I know that it was not a real problem because the level never could be 0
because of clamping by the minimal console loglevel. But it is not obvious.
Let's have the logic obvious and clear.

> +
> c->options = options;
> set_user_specified(c, user_specified);
> braille_set_options(c, brl_options);
> @@ -2764,7 +2982,8 @@ static void console_prepend_dropped(struct printk_message *pmsg, unsigned long d
> * of @pmsg are valid. (See the documentation of struct printk_message
> * for information about the @pmsg fields.)
> */
> -static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
> +static bool printk_get_next_message(struct printk_message *pmsg,
> + struct console *con, u64 seq,
> bool is_extended, bool may_suppress)

@is_extended has to match @con->flags & CON_EXTENDED. It would be
ugly if the API is not used correctly and they do not match.

Also the "may_suppress" value might be guessed from @con.

I see two solutions:

1. If we pass @con then @is_extended and @may_suppress parameters
are not needed. The values might be guessed. The logic is
the following:

if (con) {
is_extended = console_srcu_read_flags(con) & CON_EXTENDED;
may_suppress = true;
} else {
/* Used only by devkmsg_read(). */
is_extended = true;
may_suppress = false;
}

If we go this way then we should do the parameter switch in a
separate patch. The commit message should explain that
the upcoming changes will need to read even more information
from struct console.

The drawback is that @con->seq and @seq values might still
be inconsistent. But the @seq value must be passed explicitly
when called from dev_kmsg() and @con is NULL. Also the explicit
@seq parameter will most likely useful also for the
con->atomic_write() callback added by John's patchset.
The atomic consoles will not use con->seq at all.


2. We could pass @con_loglevel instead of @con. And pass it to
suppress_message_printing() instead of @con as well.

It is probably cleaner solution but the many parameters suck.


I would personally use the 1st proposal and live with the fact
that the function would ignore con->seq and use the passed
@seq instead. The many parameters suck, and especially
bool parameters suck.

> {
> static int panic_console_dropped;
> @@ -2808,7 +3027,7 @@ static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
> }
>
> /* Skip record that has level above the console loglevel. */
> - if (may_suppress && suppress_message_printing(r.info->level))
> + if (may_suppress && suppress_message_printing(r.info->level, con))

This would be:

/* Never suppress when used in devkmsg_read() */
if (con && suppress_message_printing(r.info->level, con))

> goto out;
>
> if (is_extended) {
> @@ -2851,7 +3070,7 @@ static bool console_emit_next_record(struct console *con, bool *handover, int co
>
> *handover = false;
>
> - if (!printk_get_next_message(&pmsg, con->seq, is_extended, true))
> + if (!printk_get_next_message(&pmsg, con, con->seq, is_extended, true))

The "con, con->seq" parameters look a bit ugly.
But I could live with it as explained above.

> return false;
>
> con->dropped += pmsg.dropped;
> @@ -3199,6 +3418,144 @@ static int __init keep_bootcon_setup(char *str)
>
> early_param("keep_bootcon", keep_bootcon_setup);
>
> +#ifdef CONFIG_PRINTK

It might be better to put this into separate source file, e.g.
kernel/printk/sysfs.c, like the existing kernel/printk/sysctl.c.

printk.c is already huge. And any longer #ifdef section is always
hard to follow in such long files. This would actually be
3rd #ifdef CONFIG_PRINTK section there.

> +static ssize_t loglevel_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + struct console *con = dev_get_drvdata(dev);
> +
> + return sysfs_emit(buf, "%d\n", con->level);

It should use READ_ONCE(con->loglevel).

> +}
> +
> +static ssize_t loglevel_store(struct device *dev, struct device_attribute *attr,
> + const char *buf, size_t size)
> +{
> + struct console *con = dev_get_drvdata(dev);
> + ssize_t ret;
> + int level;
> +
> + ret = kstrtoint(buf, 10, &level);
> + if (ret < 0)
> + return ret;
> +
> + if (level == -1) {
> + con->level = level;
> + return size;
> + }

Nit: This might be:

if (level == -1)
goto out:

> + if (clamp_loglevel(level) != level)
> + return -ERANGE;

out:
> + con->level = level;
> + return size;
> +}
> +
> +static DEVICE_ATTR_RW(loglevel);
[...]
> +
> +static ssize_t enabled_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + struct console *con = dev_get_drvdata(dev);
> +
> + return sysfs_emit(buf, "%d\n", !!(con->flags & CON_ENABLED));

This should use console_srcu_read_flags(con) & CON_ENABLED);

> +}
> +
[...]
> +static void console_register_device(struct console *con)
> +{

Please add:

lockdep_assert_console_list_lock_held();

, see below.


> + /*
> + * We might be called from register_console() before the class is
> + * registered. If that happens, we'll take care of it in
> + * printk_late_init.
> + */
> + if (IS_ERR_OR_NULL(console_class))
> + return;
> +
> + if (WARN_ON(con->classdev))
> + return;
> +
> + con->classdev = kzalloc(sizeof(struct device), GFP_KERNEL);

Better would be sizeof(*con->classdev)

> + if (!con->classdev)
> + return;
> +
> + device_initialize(con->classdev);
> + dev_set_name(con->classdev, "%s%d", con->name, con->index);
> + dev_set_drvdata(con->classdev, con);
> + con->classdev->release = console_classdev_release;
> + con->classdev->class = console_class;
> + if (device_add(con->classdev))
> + put_device(con->classdev);
> +}
> +
> +static void console_setup_class(void)

Please. rename it to console_sysfs_init() to match the name of
printk_sysctl_init(). They are called after each other in
printk_late_init()...

> +{
> + struct console *con;
> + int cookie;
> +
> + /*
> + * printk exists for the lifetime of the kernel, it cannot be unloaded,
> + * so we should never end up back in here.
> + */
> + if (WARN_ON(console_class))
> + return;
> +
> + console_class = class_create(THIS_MODULE, "console");
> + if (!IS_ERR(console_class))
> + console_class->dev_groups = console_sysfs_groups;
> +

I would add a comment here:

/*
* Create sysfs interface for consoles that were registered
* before the console class was created.
*/

> + cookie = console_srcu_read_lock();
> + for_each_console_srcu(con)

This does not loop properly serialized against register_console() and
unregister_console(). I am not 100% sure. The race is not much
realistic, definitely. But better be on the safe side and do:

console_list_lock();
for_each_console(con) {

> + console_register_device(con);
> + console_srcu_read_unlock(cookie);
> +}
> +#else /* CONFIG_PRINTK */
> +static void console_register_device(struct console *new)
> +{
> +}
> +static void console_setup_class(void)
> +{
> +}
> +#endif
> +
> /*
> * This is called by register_console() to try to match
> * the newly registered console with any of the ones selected
> @@ -3231,6 +3588,14 @@ static int try_enable_preferred_console(struct console *newcon,
> if (newcon->index < 0)
> newcon->index = c->index;
>
> + if (c->level > 0)
> + newcon->level = c->level;
> + else
> + newcon->level = -1;

Better would be:

if (c->level >= minimum_console_loglevel)
newcon->level = c->level;
else
newcon->level = -1;


> +
> + newcon->flags |= c->flags;

Is newcon->flags actually set anywhere, please?

> + newcon->classdev = NULL;
> +
> if (_braille_register_console(newcon, c))
> return 0;
>
> @@ -3627,6 +3996,10 @@ void __init console_init(void)
> * To mitigate this problem somewhat, only unregister consoles whose memory
> * intersects with the init section. Note that all other boot consoles will
> * get unregistered when the real preferred console is registered.
> + *
> + * Early consoles will also have been registered before we had the
> + * infrastructure to put them into /sys/class/console, so make sure they get
> + * set up now that we're ready.

Actually, it is not only about early consoles. Even normal consoles
are usually registered before late init calls.

Anyway, we do not need to add a comment here. It is too lost and
unrelated to the above comments. The comment in console_setup_class()
above the for-cycle would be enough.

> */
> static int __init printk_late_init(void)
> {
> --- a/kernel/printk/sysctl.c
> +++ b/kernel/printk/sysctl.c
> @@ -7,10 +7,14 @@
> #include <linux/printk.h>
> #include <linux/capability.h>
> #include <linux/ratelimit.h>
> +#include <linux/console.h>
> #include "internal.h"
>
> static const int ten_thousand = 10000;
>
> +static int min_loglevel = LOGLEVEL_EMERG;
> +static int max_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH;

The default message loglevel should be meaningful. I would use
LOGLEVEL_DEBUG at max.

Also, please, call the variable min_msg_... to make it clear
that these limits are for message loglevels:

static int min_msg_loglevel = LOGLEVEL_EMERG;
static int max_msg_loglevel = LOGLEVEL_DEBUG;

> +
> static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
> void *buffer, size_t *lenp, loff_t *ppos)
> {
> @@ -20,13 +24,48 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
> return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> }
>
> +static int printk_sysctl_deprecated(struct ctl_table *table, int write,
> + void *buffer, size_t *lenp, loff_t *ppos)
> +{
> + int res = proc_dointvec(table, write, buffer, lenp, ppos);
> +
> + if (write)
> + pr_warn_once(
> + "printk: The kernel.printk sysctl is deprecated. Consider using kernel.console_loglevel or kernel.default_message_loglevel instead.\n");

Nit: Please, put it on the same line as pr_warn_once("printk: The kernel....
The small shift left did not help much ;-)

> +
> + return res;
> +}
> +
> +static int printk_console_loglevel(struct ctl_table *table, int write,
> + void *buffer, size_t *lenp, loff_t *ppos)
> +{
> + struct ctl_table ltable = *table;
> + int ret, level;
> +
> + if (!write)
> + return proc_dointvec(table, write, buffer, lenp, ppos);
> +
> + ltable.data = &level;
> +
> + ret = proc_dointvec(&ltable, write, buffer, lenp, ppos);
> + if (ret)
> + return ret;
> +
> + if (level != -1 && level != clamp_loglevel(level))
> + return -ERANGE;
> +
> + console_loglevel = level;
> +
> + return 0;
> +}
> +
> static struct ctl_table printk_sysctls[] = {
> {
> .procname = "printk",
> .data = &console_loglevel,
> .maxlen = 4*sizeof(int),
> .mode = 0644,
> - .proc_handler = proc_dointvec,
> + .proc_handler = printk_sysctl_deprecated,
> },
> {
> .procname = "printk_ratelimit",
> @@ -76,6 +115,22 @@ static struct ctl_table printk_sysctls[] = {
> .extra1 = SYSCTL_ZERO,
> .extra2 = SYSCTL_TWO,
> },
> + {
> + .procname = "console_loglevel",
> + .data = &console_loglevel,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = printk_console_loglevel,
> + },
> + {
> + .procname = "default_message_loglevel",
> + .data = &default_message_loglevel,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = &min_loglevel,
> + .extra2 = &max_loglevel,
> + },
> {}
> };

Finally, the patch is really huge. It might cause problems
when dealing with regressions. It would be nice to split it.
It would help with review, bisection, and make the life
easier for future archaeologists.

The following separate patches comes to my mind:

1. Fix printk_delay() call (already done, current 1st patch)

2. Pass @con instead of @is_extended/@may_schedule in
in printk_get_next_message().

3. Add con->level and the logic that will use it in printk(),
including console_effective_loglevel(). It will be
temporary initialized to -1 in register_console().

4. Add ignore_per_console_loglevel parameter, use it
in per_console_loglevel_is_set(), do_syslog(),
and __handle_sysrq().

5. Add sysfs interface that would allow to set per-console
log level at runtime.

6. Add the command line interface that would allow to define
the console loglevel via console= parameter.

7. Update the sysctl interface.

I know that it might be painful to do so but I believe that
it would pay off in the long term.

How does this sounds, please?

Best Regards,
Petr

2023-04-27 15:52:11

by John Ogness

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] printk: console: Support console-specific loglevels

On 2023-04-27, Petr Mladek <[email protected]> wrote:
>> @@ -2764,7 +2982,8 @@ static void console_prepend_dropped(struct printk_message *pmsg, unsigned long d
>> * of @pmsg are valid. (See the documentation of struct printk_message
>> * for information about the @pmsg fields.)
>> */
>> -static bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
>> +static bool printk_get_next_message(struct printk_message *pmsg,
>> + struct console *con, u64 seq,
>> bool is_extended, bool may_suppress)
>
> @is_extended has to match @con->flags & CON_EXTENDED. It would be
> ugly if the API is not used correctly and they do not match.
>
> Also the "may_suppress" value might be guessed from @con.
>
> I see two solutions:
>
> 1. If we pass @con then @is_extended and @may_suppress parameters
> are not needed. The values might be guessed. The logic is
> the following:
>
> if (con) {
> is_extended = console_srcu_read_flags(con) & CON_EXTENDED;
> may_suppress = true;
> } else {
> /* Used only by devkmsg_read(). */
> is_extended = true;
> may_suppress = false;
> }

This is correct, but we also need to keep in mind that the caller of
printk_get_next_message() also needs to know if it is extended, so that
it can know if it should prepend the dropped messages. Although I guess
it doesn't matter if the caller is also reading @con->flags to get the
extended status.

I don't want to move console_prepend_dropped() inside
printk_get_next_message() because then printk_get_next_message() would
be modifying @dropped. The main purpose of printk_get_next_message() was
to locklessly get formatted text, thus simplifying the atomic console
implementation.

> The drawback is that @con->seq and @seq values might still
> be inconsistent. But the @seq value must be passed explicitly
> when called from dev_kmsg() and @con is NULL. Also the explicit
> @seq parameter will most likely useful also for the
> con->atomic_write() callback added by John's patchset.
> The atomic consoles will not use con->seq at all.

Correct. @flags and @loglevel are the _only_ fields that may be read
locklessly.

> 2. We could pass @con_loglevel instead of @con. And pass it to
> suppress_message_printing() instead of @con as well.
>
> It is probably cleaner solution but the many parameters suck.

We would need to pass console_effective_loglevel(). But yes, 3 separate
parameters when they could be determined from 1 sucks.

> I would personally use the 1st proposal and live with the fact
> that the function would ignore con->seq and use the passed
> @seq instead.

Whitelist: only @flags and @level may be read

I agree that we should go with your 1st proposal. But I would like to
see some comments added into printk_get_next_message() that only @flags
and @level may be considered valid.

John

2023-04-28 16:29:59

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] printk: console: Support console-specific loglevels

Hi Chris,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 61d325dcbc05d8fef88110d35ef7776f3ac3f68b]

url: https://github.com/intel-lab-lkp/linux/commits/Chris-Down/printk-Do-not-delay-messages-which-aren-t-solicited-by-any-console/20230425-210828
base: 61d325dcbc05d8fef88110d35ef7776f3ac3f68b
patch link: https://lore.kernel.org/r/4d3846bf2543de20aa071b2a12de924eea3e9574.1682427812.git.chris%40chrisdown.name
patch subject: [PATCH v5 2/2] printk: console: Support console-specific loglevels
reproduce:
# https://github.com/intel-lab-lkp/linux/commit/c92f68c2bbfd8d7ff05785c20f0c2535861143c5
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Chris-Down/printk-Do-not-delay-messages-which-aren-t-solicited-by-any-console/20230425-210828
git checkout c92f68c2bbfd8d7ff05785c20f0c2535861143c5
make menuconfig
# enable CONFIG_COMPILE_TEST, CONFIG_WARN_MISSING_DOCUMENTS, CONFIG_WARN_ABI_ERRORS
make htmldocs

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

>> Warning: kernel/printk/printk.c references a file that doesn't exist: Documentation/admin-guide/per-console-loglevel

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

2023-04-30 17:01:58

by calumlikesapplepie

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] printk: console: Remove sysrq exception

Hullo! I'm a random college student; this is my first message to the
LKML, please don't be too mean!

On Thu, 2023-04-27 at 15:49 +0200, Petr Mladek wrote:
> On Tue 2023-04-25 14:06:26, Chris Down wrote:
> > Consoles can have vastly different latencies and throughputs. For
> > example, writing a message to the serial console can take on the order
> > of tens of milliseconds to get the UART to successfully write a message.
> > While this might be fine for a single, one-off message, this can cause
> > significant application-level stalls in situations where the kernel
> > writes large amounts of information to the console.
>
> > This means that while you might want to send at least INFO level
> > messages to (for example) netconsole, which is relatively fast, you may
> > only want to send at least WARN level messages to the serial console.
> > Such an implementation would permit debugging using the serial console
> > in cases that netconsole doesn't receive messages during particularly
> > bad system issues, while still keeping the noise low enough to avoid
> > inducing latency in userspace applications. This patch adds such an
> > interface, extending the existing console loglevel controls to allow
> > each console to have its own loglevel.

What I'm coming here to say isn't really perfectly relevant to this
goal, but it is related to the work you're doing to accomplish it.
Basically: sysrq interaction with the console is terrible, and is
causing problems for no good reasons as a result of being an edge case,
making me sad. Details on why I am sad are in the next two paragraphs,
but you can just skip them. In fact, I recommend it.

Yes, this email is the result of about two and a half hours of
dedicated procrastination right at the start of finals week. I blame
Steam for eating up all my RAM and swap to pre-cache Vulkan shaders,
whatever error handler froze my desktop after it did, kdump for
generating a nice crash report revealing that the sysrq's were
processed but that the OOM Killer was never invoked (despite a kworker
allocation failure), atopd for creating a log entry and proving the
system was alive and well after the freeze, and systemd-journald for
consuming 15% cpu in that entry but not actually syncing anything to
disk.

I suppose my decision to get said shaders ready for next week's jubilee
might be partly to blame. Or my habit of running two browsers with a
few hundred tabs apeice, a CAD program, and a half-dozen other things
at the same time. My assumption that sysrq headers print
unconditionally on all consoles means that they also probably count as
KERN_CRIT messages (which trigger an immediate journal sync) might not
have helped. I suppose if I'd been more patient my system might have
either recovered or finally gotten the OOM killer out, and been
responsive. And maybe if I'd known more sysrq's than 'emergency sync
to disk' and 'crash the kernel', or had chosen to trigger a mass SIGINT
rather than a crash, I wouldn't have embarked on this journey.

But really, I blame 1993 Linus Torvalids for coding up the sysrq
loglevel dance in the first place.

[snip]
> > --- a/drivers/tty/sysrq.c
> > +++ b/drivers/tty/sysrq.c
> > @@ -101,12 +102,26 @@ __setup("sysrq_always_enabled",
> > sysrq_always_enabled_setup);
> >  static void sysrq_handle_loglevel(int key)
> >  {
> >         int i;
> > +       int cookie;
> > +       int warned = 0;
> > +       struct console *con;
> >  
> >         i = key - '0';
> >         console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
> >         pr_info("Loglevel set to %d\n", i);
> >         console_loglevel = i;
> > +
> > +       cookie = console_srcu_read_lock();
> > +       for_each_console_srcu(con) {
> > +               if (!warned && per_console_loglevel_is_set(con)) {
> > +                       warned = 1;
> > +                       pr_warn("Overriding per-console loglevel
> > from sysrq\n");
> > +               }
> > +               con->level = -1;
>
> Please, use WRITE_ONCE(con->level, -1) to make sure that it will be
> atomic.
>
> READ_ONCE()/WRITE_ONCE() should always be used when the values
> are read/written using RCU synchronization. Otherwise the compiler
> might do some optimizations and read/write bytes separately.
>

I would argue we remove this whole logic, and just go to a nice and
simple printk at KERN_EMERG priority. Why add interactions with the
SRCU subsystem in code that we need to run even in the event of some
massive SRCU bug? This code will be both infrequently tested and make
a bad day worse if it's problematic.

> > +       }
> > +       console_srcu_read_unlock(cookie);
> >  }
>
> Also we should safe/set/restore "ignore_per_console_loglevel"
> in __handle_sysrq(). It already does the same with
> "console_loglevel".
>
> __handle_sysrq() increases the loglevel to show all
> messages printed by the sysrq handler on all consoles.
> Many handlers print some information that might be useful
> for debugging.

Actually, it doesn't! Be kinda cool if it did, but there is a
legitimate reason for it not to: some sysrq messages, like the list of
all processes in the system, are extremely long, and so you might want
to print them to a file but not every console. Instead, it prints
exactly one message with the increased log level visibility: an info-
priority message that states the "action message" of the sysrq, before
returning to the default log level.

In other words: magic sysrq messages are printed to every console,
because they are deemed to be important emergencies that no user should
miss, even if they told the kernel it better be actively on fire if it
wants to talk. Cool. However, they are not marked as emergency
priority. Instead, they're marked as info priority, but do a dance
with kernel log level to ensure they get printed everywhere.

Why? See this thread, where they discuss adding the (kind-of vague)
comment that now exists in the code to indicate the reasoning.

https://lore.kernel.org/lkml/[email protected]/

Okay, so that explains why we do the song-and-dance only for the header
message. Still wondering why the printk at low priority? They
acknowledge it as something weird, but don't talk about why. So I dove
into git history, only to find that the log-level shifting code was
added with the initial import from bitkeeper. And then that, while
there was a change to the file in the bitkeeper history, the log level
shifting code predates it.

Uh oh.

So lets go back to May 31st, 1997, when the version of sysrq.c that
makes it into the 2.1.43 kernel is written by Martin Mares. Fun fact:
the email address he used then probably still works; I've added him to
the CC on this, since it was him who created the code. And... yup.
The log level shifting code is there, complete with the same variable
names. Sysrq.c does not exist in the previous kernel version.

https://mirrors.edge.kernel.org/pub/linux/kernel/v2.1/linux-2.1.43.tar.gz

No, I could not find the email where this patch was first submitted.
It may not exist on the internet; it certainly doesn't in the parts
where Google looks.

Now, printk did support priority levels at this point: without the
emails discussing the patch, therefore, it's impossible to say why
Martin decided all those years ago to use this technique. It's
probably a moot point anyways.

The fact that sysrq messages behave in this way isn't well documented,
and is definitely harmful. Users might (rightfully) assume that if
they set the log level to 1, they will not see any messages that are
not of EMERGENCY or ALERT priority. If they program on that
assumption, however, a sysrq might break their code: it will be shown
on their console, but break whatever boolean they have looking at
priority. The chance of that occurring will go up with these changes.

sysrq's behaving in this way, as info-level emergencies, is just weird.
No sane system uses a sysrq in normal running: its an awkward
interface, designed for use by a human, with pretty much all of the
keys causing some level of system-level disruption. The stated reason
doesn't really make sense; why is it important enough to be shown on
every console, but not important enough to be seen as an emergency by
the computer?

If your system experiences a sysrq, either you have some weird backup
software that is using the wrong interface, or someone with extremely
privileged access to your system believes that there is something so
fundamentally wrong with your system that they need to bypass the
entirety of userspace and much of the kernel to get something done.
Either of those situations are at least as important as a typo in a
password for sudo; which is given a CRITICAL priority.

Lets not add a pile of code in order to maintain a behavior that no
sane userspace will be depending on, and which might even be causing
bugs in sane userspaces. Like, for instance, systemd-journald deciding
not to write out journals when I instruct my kernel to do an emergency
sync.

>
> The following separate patches comes to my mind:
>
> 1. Fix printk_delay() call (already done, current 1st patch)
>
> 2. Pass @con instead of @is_extended/@may_schedule in
>    in printk_get_next_message().
>
> 3. Add con->level and the logic that will use it in printk(),
>    including console_effective_loglevel(). It will be
>    temporary initialized to -1 in register_console().
>
> 4. Add ignore_per_console_loglevel parameter, use it
>    in per_console_loglevel_is_set(), do_syslog(),
>    and __handle_sysrq().

In other words: sysrq's use of the printk subsystem in this way is
unique, and thus almost certainly a bad idea.

> 5. Add sysfs interface that would allow to set per-console
>    log level at runtime.
>
> 6. Add the command line interface that would allow to define
>    the console loglevel via console= parameter.
>
> 7. Update the sysctl interface.
>

2023-05-02 09:28:44

by Chris Down

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] printk: console: Remove sysrq exception

Hi Calum,

This patch is almost at completion and already changes the way kernel.printk
works. I'm sure we'd all be happy to look at proposed changes in another
patchset, but I'm not likely to add more changes to this one. In general,
there's plenty of historical baggage in printk space, but changing it requires
careful thought about API/ABI compatibility, new designs, and compatibility.

Thanks,

Chris

2023-05-02 10:39:49

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] printk: console: Remove sysrq exception

On Sun 2023-04-30 19:00:42, [email protected] wrote:
> > > --- a/drivers/tty/sysrq.c
> > > +++ b/drivers/tty/sysrq.c
> > > @@ -101,12 +102,26 @@ __setup("sysrq_always_enabled",
> > > sysrq_always_enabled_setup);
> > > ?static void sysrq_handle_loglevel(int key)
> > > ?{
> > > ????????int i;
> > > +???????int cookie;
> > > +???????int warned = 0;
> > > +???????struct console *con;
> > > ?
> > > ????????i = key - '0';
> > > ????????console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
> > > ????????pr_info("Loglevel set to %d\n", i);
> > > ????????console_loglevel = i;
> > > +
> > > +???????cookie = console_srcu_read_lock();
> > > +???????for_each_console_srcu(con) {
> > > +???????????????if (!warned && per_console_loglevel_is_set(con)) {
> > > +???????????????????????warned = 1;
> > > +???????????????????????pr_warn("Overriding per-console loglevel
> > > from sysrq\n");
> > > +???????????????}
> > > +???????????????con->level = -1;
> >
> > Please, use WRITE_ONCE(con->level, -1) to make sure that it will be
> > atomic.
> >
> > READ_ONCE()/WRITE_ONCE() should always be used when the values
> > are read/written using RCU synchronization. Otherwise the compiler
> > might do some optimizations and read/write bytes separately.
> >
>
> I would argue we remove this whole logic, and just go to a nice and
> simple printk at KERN_EMERG priority. Why add interactions with the
> SRCU subsystem in code that we need to run even in the event of some
> massive SRCU bug? This code will be both infrequently tested and make
> a bad day worse if it's problematic.

Any bug in SRCU should not break this sysrq code.
console_srcu_read_lock()/unlock() are non-blocking operations.

The problem might be in a path calling synchronize_srcu()
but it is not sysrq case, definitely.

> > > +???????}
> > > +???????console_srcu_read_unlock(cookie);
> > > ?}
> >
> > Also we should safe/set/restore "ignore_per_console_loglevel"
> > in __handle_sysrq(). It already does the same with
> > "console_loglevel".
> >
> > __handle_sysrq() increases the loglevel to show all
> > messages printed by the sysrq handler on all consoles.
> > Many handlers print some information that might be useful
> > for debugging.
>
> Actually, it doesn't! Be kinda cool if it did, but there is a
> legitimate reason for it not to: some sysrq messages, like the list of
> all processes in the system, are extremely long, and so you might want
> to print them to a file but not every console. Instead, it prints
> exactly one message with the increased log level visibility: an info-
> priority message that states the "action message" of the sysrq, before
> returning to the default log level.

Ah, I forgot and misread the code. It really prints only the single
line with updated console loglevel.

> In other words: magic sysrq messages are printed to every console,
> because they are deemed to be important emergencies that no user should
> miss, even if they told the kernel it better be actively on fire if it
> wants to talk. Cool. However, they are not marked as emergency
> priority. Instead, they're marked as info priority, but do a dance
> with kernel log level to ensure they get printed everywhere.

I have different theory why it works this way, see below.


> So I dove into git history, only to find that the log-level shifting code was
> added with the initial import from bitkeeper.
[...]
> Sysrq.c does not exist in the previous kernel version.
>
> https://mirrors.edge.kernel.org/pub/linux/kernel/v2.1/linux-2.1.43.tar.gz

It seems that the original code actually printed all messages with
the updated console_loglevel.

It was modified to printk only the first line or help by:

commit 2433aae9cbfbe77b5c5af11e6174d390e06053a6
Author: linus1 <[email protected]>
Date: Sun Sep 23 11:00:00 2001 -0600

v2.4.10.1 -> v2.4.10.2

- me/Al Viro: fix bdget() oops with block device modules that don't
clean up after they exit
- Alan Cox: continued merging (drivers, license tags)
- David Miller: sparc update, network fixes
- Christoph Hellwig: work around broken drivers that add a gendisk more
than once
- Jakub Jelinek: handle more ELF loading special cases
- Trond Myklebust: NFS client and lockd reclaimer cleanups/fixes
- Greg KH: USB updates
- Mikael Pettersson: sparate out local APIC / IO-APIC config options

, see
https://github.com/mpe/linux-fullhistory/commit/2433aae9cbfbe77b5c5af11e6174d390e06053a6

I do not see any explanation for this. I guess that the too long
output caused some problems.

> No, I could not find the email where this patch was first submitted.
> It may not exist on the internet; it certainly doesn't in the parts
> where Google looks.
>
> Now, printk did support priority levels at this point: without the
> emails discussing the patch, therefore, it's impossible to say why
> Martin decided all those years ago to use this technique. It's
> probably a moot point anyways.

I agree that it is pretty questionable. I am not sure about
the exact motivation.

Well, sysrq is primary used when the system is not responsive.
It is good to know that it is being handled by printing
the first line at least. And it is handy when it does this
out of box.

IMHO, the debug output is not printed because it might be too
much for slow consoles. It might actually be an advantage to
distinguish the log level of the various messages. It allows
to filter the messages on the console.

Note that the console loglevel could be set by sysrq-0 .. 9.

One use case might be to print the debug messages into
the internal log buffer and then trigger crash to
produce crash dump. It would allow to see the log even
when it is not shown to the console. And the log might
be much easier way than digging the information from
the various structures in the crashdump.

> The fact that sysrq messages behave in this way isn't well documented,
> and is definitely harmful. Users might (rightfully) assume that if
> they set the log level to 1, they will not see any messages that are
> not of EMERGENCY or ALERT priority.

sysrq does not know in which state the system is. It might be
called even on a normally running system. Will the EMERGENCY
or ALERT be correct in that case?

> If your system experiences a sysrq, either you have some weird backup
> software that is using the wrong interface

Is there any backup system doing this? Or is it just some wild theory?

> , or someone with extremely
> privileged access to your system believes that there is something so
> fundamentally wrong with your system that they need to bypass the
> entirety of userspace and much of the kernel to get something done.

My understanding is that sysrq is primary used when userspace
does not longer work. IMHO, the original use-case was to
trigger it from the keyboard.

> Either of those situations are at least as important as a typo in a
> password for sudo; which is given a CRITICAL priority.
>
> Lets not add a pile of code in order to maintain a behavior that no
> sane userspace will be depending on, and which might even be causing
> bugs in sane userspaces. Like, for instance, systemd-journald deciding
> not to write out journals when I instruct my kernel to do an emergency
> sync.

Honestly, I am not sure what would be your preferred behavior.
It might be because I am not a native speaker. And the mail is
really long.

Is the problem that systemd-journald did not write the log?
Or is the problem that it did eat 15% CPU?

Eating 15% CPU looks like a bug. The fact that it did not write
anything might be because of the OOM situation. Most things get
blocked when there is no memory.

What exact sysrq behavior would you suggest, please?


> > 4. Add ignore_per_console_loglevel parameter, use it
> > ?? in per_console_loglevel_is_set(), do_syslog(),
> > ?? and __handle_sysrq().
>
> In other words: sysrq's use of the printk subsystem in this way is
> unique, and thus almost certainly a bad idea.

sysrq is very old interface. Various people might expect different
behavior depending on the use case. It might be impossible to
make all people happy.

Changing the default behavior a significant way might be seen as a regression.
Especially, printing all messages with EMERGENCY loglevel looks like
a pretty bad idea because it would prevent any filtering on the
console level.

My feeling is that your primary problem is somewhere else,
systemd-journald or OOM behavior.

Best Regards,
Petr

2023-05-02 14:48:06

by Chris Down

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] printk: console: Support console-specific loglevels

Thanks for the detailed feedback! All comments not explicitly replied to are
acked and will be updated in v6.

Petr Mladek writes:
>> @@ -2293,7 +2429,14 @@ asmlinkage int vprintk_emit(int facility, int level,
>> in_sched = true;
>> }
>>
>> - printk_delay(level);
>> + cookie = console_srcu_read_lock();
>> + for_each_console_srcu(con) {
>> + if (!suppress_message_printing(level, con)) {
>> + printk_delay();
>
>This would repeat the delay for each registered and non-supressed console.
>But it should be called only once when there is at least one
>non-suppressed console.
>
>> + break;
>> + }
>> + }
>> + console_srcu_read_unlock(cookie);

Doesn't the `break' below avoid that already? Maybe I'm missing something?

[...]

>> +static int printk_sysctl_deprecated(struct ctl_table *table, int write,
>> + void *buffer, size_t *lenp, loff_t *ppos)
>> +{
>> + int res = proc_dointvec(table, write, buffer, lenp, ppos);
>> +
>> + if (write)
>> + pr_warn_once(
>> + "printk: The kernel.printk sysctl is deprecated. Consider using kernel.console_loglevel or kernel.default_message_loglevel instead.\n");
>
>Nit: Please, put it on the same line as pr_warn_once("printk: The kernel....
> The small shift left did not help much ;-)

I think this is done by clang-format with kernel default config, since my
preference is single line as well. Maybe worth discussing changing the config?

>How does this sounds, please?

Sounds great! Thank you again for the very detailed feedback :-)

2023-05-02 14:52:51

by Chris Down

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] printk: console: Support console-specific loglevels

John Ogness writes:
>I agree that we should go with your 1st proposal. But I would like to
>see some comments added into printk_get_next_message() that only @flags
>and @level may be considered valid.

Thanks, will do for v6.