Hello,
Here is the next series for the printk-rework to remove
@logbuf_lock. Since this lock protected more than just the
ringbuffer, a new finer-grained @syslog_lock is introduced.
With this series, vprintk_store() is lockless.
A consequence of this locklessness is that a different method
of providing temporary sprint space was needed. A DIY memory
buffer pool has been implemented for this. Since printk is
used before per-cpu variables are available, I felt that this
was the simplest approach. Feel free to throw tomatoes at me
if another approach should be taken. (And please suggest that
other approach.)
Another consequence of a lockless vprintk_store() is that
the NMI safe buffers are no longer needed, so they are also
removed. The safe buffers for printk recursion are still
used.
This series falls in line with the printk-rework plan as
presented [0] at Linux Plumbers in Lisbon 2019.
The series is based on the printk-rework branch of the printk
git tree:
f35efc78add6 ("printk: remove dict ring")
John Ogness
[0] https://linuxplumbersconf.org/event/4/contributions/290/attachments/276/463/lpc2019_jogness_printk.pdf (slide 23)
John Ogness (5):
printk: get new seq before enabling interrupts
printk: kmsg_dump_rewind_nolock: start from first record
printk: use buffer pool for sprint buffers
printk: remove logbuf_lock, add syslog_lock
printk: remove nmi safe buffers
kernel/printk/internal.h | 4 +-
kernel/printk/printk.c | 320 ++++++++++++++++++++++++------------
kernel/printk/printk_safe.c | 64 +-------
3 files changed, 218 insertions(+), 170 deletions(-)
--
2.20.1
kmsg_dump_rewind_nolock() accesses @clear_seq without any locking.
However, accessing this global variable requires holding
@logbuf_lock. For the _nolock() variant, start from the first record
in the ringbuffer rather than the @clear_seq record.
Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 00bc1fce3299..cff13b33e926 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3410,11 +3410,12 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
* kmsg_dump_get_buffer() can be called again and used multiple
* times within the same dumper.dump() callback.
*
- * The function is similar to kmsg_dump_rewind(), but grabs no locks.
+ * The function is similar to kmsg_dump_rewind(), but grabs no locks
+ * and starts from the oldest record rather than from @clear_seq.
*/
void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
{
- dumper->cur_seq = clear_seq;
+ dumper->cur_seq = 0;
dumper->next_seq = prb_next_seq(prb);
}
@@ -3431,7 +3432,8 @@ void kmsg_dump_rewind(struct kmsg_dumper *dumper)
unsigned long flags;
logbuf_lock_irqsave(flags);
- kmsg_dump_rewind_nolock(dumper);
+ dumper->cur_seq = clear_seq;
+ dumper->next_seq = prb_next_seq(prb);
logbuf_unlock_irqrestore(flags);
}
EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
--
2.20.1
Since the ringbuffer is lockless, there is no need for it to be
protected by @logbuf_lock. Remove @logbuf_lock.
This means that printk_nmi_direct and printk_safe_flush_on_panic()
no longer need to acquire any lock to run.
The global variables @syslog_seq, @syslog_partial, @syslog_time,
@clear_seq were also protected by @logbuf_lock. Introduce
@syslog_lock to protect these.
@console_seq, @exclusive_console_stop_seq, @console_dropped are
protected by @console_lock.
Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/internal.h | 4 +-
kernel/printk/printk.c | 162 +++++++++++++++---------------------
kernel/printk/printk_safe.c | 21 +----
3 files changed, 70 insertions(+), 117 deletions(-)
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 3a8fd491758c..e7acc2888c8e 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -12,8 +12,6 @@
#define PRINTK_NMI_CONTEXT_OFFSET 0x010000000
-extern raw_spinlock_t logbuf_lock;
-
__printf(4, 0)
int vprintk_store(int facility, int level,
const struct dev_printk_info *dev_info,
@@ -59,7 +57,7 @@ void defer_console_output(void);
__printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; }
/*
- * In !PRINTK builds we still export logbuf_lock spin_lock, console_sem
+ * In !PRINTK builds we still export console_sem
* semaphore and some of console functions (console_unlock()/etc.), so
* printk-safe must preserve the existing local IRQ guarantees.
*/
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 763494d1d6b3..65e3cdbddeff 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -355,53 +355,29 @@ enum log_flags {
LOG_CONT = 8, /* text is a fragment of a continuation line */
};
-/*
- * The logbuf_lock protects kmsg buffer, indices, counters. This can be taken
- * within the scheduler's rq lock. It must be released before calling
- * console_unlock() or anything else that might wake up a process.
- */
-DEFINE_RAW_SPINLOCK(logbuf_lock);
-
-/*
- * Helper macros to lock/unlock logbuf_lock and switch between
- * printk-safe/unsafe modes.
- */
-#define logbuf_lock_irq() \
- do { \
- printk_safe_enter_irq(); \
- raw_spin_lock(&logbuf_lock); \
- } while (0)
-
-#define logbuf_unlock_irq() \
- do { \
- raw_spin_unlock(&logbuf_lock); \
- printk_safe_exit_irq(); \
- } while (0)
-
-#define logbuf_lock_irqsave(flags) \
- do { \
- printk_safe_enter_irqsave(flags); \
- raw_spin_lock(&logbuf_lock); \
- } while (0)
-
-#define logbuf_unlock_irqrestore(flags) \
- do { \
- raw_spin_unlock(&logbuf_lock); \
- printk_safe_exit_irqrestore(flags); \
- } while (0)
+/* The syslog_lock protects syslog_* and clear_seq variables. */
+DEFINE_RAW_SPINLOCK(syslog_lock);
+#define syslog_lock_irq() raw_spin_lock_irq(&syslog_lock)
+#define syslog_unlock_irq() raw_spin_unlock_irq(&syslog_lock)
+#define syslog_lock_irqsave(flags) raw_spin_lock_irqsave(&syslog_lock, flags)
+#define syslog_unlock_irqrestore(flags) raw_spin_unlock_irqrestore(&syslog_lock, flags)
#ifdef CONFIG_PRINTK
DECLARE_WAIT_QUEUE_HEAD(log_wait);
+
+/* All 3 protected by @syslog_lock. */
/* the next printk record to read by syslog(READ) or /proc/kmsg */
static u64 syslog_seq;
static size_t syslog_partial;
static bool syslog_time;
+/* All 3 protected by @console_lock. */
/* the next printk record to write to the console */
static u64 console_seq;
static u64 exclusive_console_stop_seq;
static unsigned long console_dropped;
+/* Protected by @syslog_lock. */
/* the next printk record to read after the last 'clear' command */
static u64 clear_seq;
@@ -766,27 +742,22 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
if (ret)
return ret;
- logbuf_lock_irq();
if (!prb_read_valid(prb, user->seq, r)) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
- logbuf_unlock_irq();
goto out;
}
- logbuf_unlock_irq();
ret = wait_event_interruptible(log_wait,
prb_read_valid(prb, user->seq, r));
if (ret)
goto out;
- logbuf_lock_irq();
}
if (user->seq < prb_first_valid_seq(prb)) {
/* our last seen message is gone, return error and reset */
user->seq = prb_first_valid_seq(prb);
ret = -EPIPE;
- logbuf_unlock_irq();
goto out;
}
@@ -796,7 +767,6 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
&r->info->dev_info);
user->seq = r->info->seq + 1;
- logbuf_unlock_irq();
if (len > count) {
ret = -EINVAL;
@@ -823,7 +793,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
if (offset)
return -ESPIPE;
- logbuf_lock_irq();
switch (whence) {
case SEEK_SET:
/* the first record */
@@ -835,7 +804,9 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
* like issued by 'dmesg -c'. Reading /dev/kmsg itself
* changes no global state, and does not clear anything.
*/
+ syslog_lock_irq();
user->seq = clear_seq;
+ syslog_unlock_irq();
break;
case SEEK_END:
/* after the last record */
@@ -844,7 +815,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
default:
ret = -EINVAL;
}
- logbuf_unlock_irq();
return ret;
}
@@ -858,7 +828,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
poll_wait(file, &log_wait, wait);
- logbuf_lock_irq();
if (prb_read_valid(prb, user->seq, NULL)) {
/* return error when data has vanished underneath us */
if (user->seq < prb_first_valid_seq(prb))
@@ -866,7 +835,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
else
ret = EPOLLIN|EPOLLRDNORM;
}
- logbuf_unlock_irq();
return ret;
}
@@ -899,9 +867,7 @@ static int devkmsg_open(struct inode *inode, struct file *file)
prb_rec_init_rd(&user->record, &user->info,
&user->text_buf[0], sizeof(user->text_buf));
- logbuf_lock_irq();
user->seq = prb_first_valid_seq(prb);
- logbuf_unlock_irq();
file->private_data = user;
return 0;
@@ -1164,7 +1130,7 @@ void __init setup_log_buf(int early)
new_descs, ilog2(new_descs_count),
new_infos);
- logbuf_lock_irqsave(flags);
+ local_irq_save(flags);
log_buf_len = new_log_buf_len;
log_buf = new_log_buf;
@@ -1183,7 +1149,7 @@ void __init setup_log_buf(int early)
seq = prb_next_seq(&printk_rb_static) - seq;
- logbuf_unlock_irqrestore(flags);
+ local_irq_restore(flags);
if (seq)
pr_err("dropped %llu messages\n", seq);
@@ -1451,9 +1417,9 @@ static int syslog_print(char __user *buf, int size)
size_t n;
size_t skip;
- logbuf_lock_irq();
+ syslog_lock_irq();
if (!prb_read_valid(prb, syslog_seq, &r)) {
- logbuf_unlock_irq();
+ syslog_unlock_irq();
break;
}
if (r.info->seq != syslog_seq) {
@@ -1482,7 +1448,7 @@ static int syslog_print(char __user *buf, int size)
syslog_partial += n;
} else
n = 0;
- logbuf_unlock_irq();
+ syslog_unlock_irq();
if (!n)
break;
@@ -1507,6 +1473,8 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
struct printk_info info;
unsigned int line_count;
struct printk_record r;
+ u64 newest_seq;
+ u64 clr_seq;
char *text;
int len = 0;
u64 seq;
@@ -1517,17 +1485,31 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
return -ENOMEM;
time = printk_time;
- logbuf_lock_irq();
+ syslog_lock_irq();
+ clr_seq = clear_seq;
+ syslog_unlock_irq();
+
/*
* Find first record that fits, including all following records,
* into the user-provided buffer for this dump.
*/
- prb_for_each_info(clear_seq, prb, seq, &info, &line_count)
+ prb_for_each_info(clr_seq, prb, seq, &info, &line_count)
len += get_record_print_text_size(&info, line_count, true, time);
- /* move first record forward until length fits into the buffer */
- prb_for_each_info(clear_seq, prb, seq, &info, &line_count) {
- if (len <= size)
+ /*
+ * Keep track of the latest in case new records are coming in fast
+ * and overwriting the older records.
+ */
+ newest_seq = seq;
+
+ /*
+ * Move first record forward until length fits into the buffer. This
+ * is a best effort attempt. If @newest_seq is reached because the
+ * ringbuffer is wrapping too fast, just start filling the buffer
+ * from there.
+ */
+ prb_for_each_info(clr_seq, prb, seq, &info, &line_count) {
+ if (len <= size || seq > newest_seq)
break;
len -= get_record_print_text_size(&info, line_count, true, time);
}
@@ -1545,20 +1527,20 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
break;
}
- logbuf_unlock_irq();
if (copy_to_user(buf + len, text, textlen))
len = -EFAULT;
else
len += textlen;
- logbuf_lock_irq();
if (len < 0)
break;
}
- if (clear)
+ if (clear) {
+ syslog_lock_irq();
clear_seq = seq;
- logbuf_unlock_irq();
+ syslog_unlock_irq();
+ }
kfree(text);
return len;
@@ -1566,9 +1548,9 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
static void syslog_clear(void)
{
- logbuf_lock_irq();
+ syslog_lock_irq();
clear_seq = prb_next_seq(prb);
- logbuf_unlock_irq();
+ syslog_unlock_irq();
}
int do_syslog(int type, char __user *buf, int len, int source)
@@ -1576,6 +1558,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
bool clear = false;
static int saved_console_loglevel = LOGLEVEL_DEFAULT;
int error;
+ u64 seq;
error = check_syslog_permissions(type, source);
if (error)
@@ -1593,8 +1576,11 @@ int do_syslog(int type, char __user *buf, int len, int source)
return 0;
if (!access_ok(buf, len))
return -EFAULT;
+ syslog_lock_irq();
+ seq = syslog_seq;
+ syslog_unlock_irq();
error = wait_event_interruptible(log_wait,
- prb_read_valid(prb, syslog_seq, NULL));
+ prb_read_valid(prb, seq, NULL));
if (error)
return error;
error = syslog_print(buf, len);
@@ -1642,7 +1628,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
break;
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
- logbuf_lock_irq();
+ syslog_lock_irq();
if (syslog_seq < prb_first_valid_seq(prb)) {
/* messages are gone, move to first one */
syslog_seq = prb_first_valid_seq(prb);
@@ -1669,7 +1655,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
}
error -= syslog_partial;
}
- logbuf_unlock_irq();
+ syslog_unlock_irq();
break;
/* Size of the log buffer */
case SYSLOG_ACTION_SIZE_BUFFER:
@@ -2106,10 +2092,9 @@ asmlinkage int vprintk_emit(int facility, int level,
boot_delay_msec(level);
printk_delay();
- /* This stops the holder of console_sem just where we want him */
- logbuf_lock_irqsave(flags);
+ printk_safe_enter_irqsave(flags);
printed_len = vprintk_store(facility, level, dev_info, fmt, args);
- logbuf_unlock_irqrestore(flags);
+ printk_safe_exit_irqrestore(flags);
/* If called from the scheduler, we can not call up(). */
if (!in_sched) {
@@ -2541,7 +2526,6 @@ void console_unlock(void)
size_t len;
printk_safe_enter_irqsave(flags);
- raw_spin_lock(&logbuf_lock);
skip:
if (!prb_read_valid(prb, console_seq, &r))
break;
@@ -2585,7 +2569,6 @@ void console_unlock(void)
console_msg_format & MSG_FORMAT_SYSLOG,
printk_time);
console_seq++;
- raw_spin_unlock(&logbuf_lock);
/*
* While actively printing out messages, if another printk()
@@ -2612,8 +2595,6 @@ void console_unlock(void)
console_locked = 0;
- raw_spin_unlock(&logbuf_lock);
-
up_console_sem();
/*
@@ -2622,9 +2603,7 @@ void console_unlock(void)
* there's a new owner and the console_unlock() from them will do the
* flush, no worries.
*/
- raw_spin_lock(&logbuf_lock);
retry = prb_read_valid(prb, console_seq, NULL);
- raw_spin_unlock(&logbuf_lock);
printk_safe_exit_irqrestore(flags);
if (retry && console_trylock())
@@ -2691,9 +2670,9 @@ void console_flush_on_panic(enum con_flush_mode mode)
if (mode == CONSOLE_REPLAY_ALL) {
unsigned long flags;
- logbuf_lock_irqsave(flags);
+ local_irq_save(flags);
console_seq = prb_first_valid_seq(prb);
- logbuf_unlock_irqrestore(flags);
+ local_irq_restore(flags);
}
console_unlock();
}
@@ -2922,7 +2901,6 @@ void register_console(struct console *newcon)
* console_unlock(); will print out the buffered messages
* for us.
*/
- logbuf_lock_irqsave(flags);
/*
* We're about to replay the log buffer. Only do this to the
* just-registered console to avoid excessive message spam to
@@ -2934,8 +2912,10 @@ void register_console(struct console *newcon)
*/
exclusive_console = newcon;
exclusive_console_stop_seq = console_seq;
+
+ syslog_lock_irqsave(flags);
console_seq = syslog_seq;
- logbuf_unlock_irqrestore(flags);
+ syslog_unlock_irqrestore(flags);
}
console_unlock();
console_sysfs_notify();
@@ -3338,10 +3318,11 @@ void kmsg_dump(enum kmsg_dump_reason reason)
/* initialize iterator with data about the stored records */
dumper->active = true;
- logbuf_lock_irqsave(flags);
+ syslog_lock_irqsave(flags);
dumper->cur_seq = clear_seq;
+ syslog_unlock_irqrestore(flags);
+
dumper->next_seq = prb_next_seq(prb);
- logbuf_unlock_irqrestore(flags);
/* invoke dumper which will iterate over records */
dumper->dump(dumper, reason);
@@ -3428,14 +3409,7 @@ bool kmsg_dump_get_line_nolock(struct kmsg_dumper *dumper, bool syslog,
bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool syslog,
char *line, size_t size, size_t *len)
{
- unsigned long flags;
- bool ret;
-
- logbuf_lock_irqsave(flags);
- ret = kmsg_dump_get_line_nolock(dumper, syslog, line, size, len);
- logbuf_unlock_irqrestore(flags);
-
- return ret;
+ return kmsg_dump_get_line_nolock(dumper, syslog, line, size, len);
}
EXPORT_SYMBOL_GPL(kmsg_dump_get_line);
@@ -3464,7 +3438,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
struct printk_info info;
unsigned int line_count;
struct printk_record r;
- unsigned long flags;
u64 seq;
u64 next_seq;
size_t l = 0;
@@ -3476,17 +3449,14 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
if (!dumper->active || !buf || !size)
goto out;
- logbuf_lock_irqsave(flags);
if (dumper->cur_seq < prb_first_valid_seq(prb)) {
/* messages are gone, move to first available one */
dumper->cur_seq = prb_first_valid_seq(prb);
}
/* last entry */
- if (dumper->cur_seq >= dumper->next_seq) {
- logbuf_unlock_irqrestore(flags);
+ if (dumper->cur_seq >= dumper->next_seq)
goto out;
- }
/* calculate length of entire buffer */
seq = dumper->cur_seq;
@@ -3526,7 +3496,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
dumper->next_seq = next_seq;
ret = true;
- logbuf_unlock_irqrestore(flags);
out:
if (len)
*len = l;
@@ -3563,10 +3532,11 @@ void kmsg_dump_rewind(struct kmsg_dumper *dumper)
{
unsigned long flags;
- logbuf_lock_irqsave(flags);
+ syslog_lock_irqsave(flags);
dumper->cur_seq = clear_seq;
+ syslog_unlock_irqrestore(flags);
+
dumper->next_seq = prb_next_seq(prb);
- logbuf_unlock_irqrestore(flags);
}
EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 5dbc40160990..7c186cefdb50 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -16,8 +16,7 @@
#include "internal.h"
/*
- * printk() could not take logbuf_lock in NMI context. Instead,
- * it uses an alternative implementation that temporary stores
+ * Use an alternative implementation that temporarily stores
* the strings into a per-CPU buffer. The content of the buffer
* is later flushed into the main ring buffer via IRQ work.
*
@@ -266,18 +265,6 @@ void printk_safe_flush(void)
*/
void printk_safe_flush_on_panic(void)
{
- /*
- * Make sure that we could access the main ring buffer.
- * Do not risk a double release when more CPUs are up.
- */
- if (raw_spin_is_locked(&logbuf_lock)) {
- if (num_online_cpus() > 1)
- return;
-
- debug_locks_off();
- raw_spin_lock_init(&logbuf_lock);
- }
-
printk_safe_flush();
}
@@ -371,17 +358,15 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
* Try to use the main logbuf even in NMI. But avoid calling console
* drivers that might have their own locks.
*/
- if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK) &&
- raw_spin_trylock(&logbuf_lock)) {
+ if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
int len;
len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
- raw_spin_unlock(&logbuf_lock);
defer_console_output();
return len;
}
- /* Use extra buffer in NMI when logbuf_lock is taken or in safe mode. */
+ /* Use extra buffer in NMI. */
if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
return vprintk_nmi(fmt, args);
--
2.20.1
vprintk_store() is using a single static buffer as a temporary
sprint buffer for the message text. This will not work once
@logbuf_lock is removed. Replace the single static buffer with a
pool of buffers.
The pool is large enough to support a worst case of 2 contexts
per CPU (non-NMI and NMI).
Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk.c | 142 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 137 insertions(+), 5 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index cff13b33e926..763494d1d6b3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1872,6 +1872,112 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
}
}
+/*
+ * The sprint buffers are used with interrupts disabled, so each CPU
+ * only requires 2 buffers: for non-NMI and NMI contexts. Recursive
+ * printk() calls are handled by the safe buffers.
+ */
+#define SPRINT_CTX_DEPTH 2
+
+/* Static sprint buffers for early boot (only 1 CPU). */
+static DECLARE_BITMAP(sprint_static_textbuf_map, SPRINT_CTX_DEPTH);
+static char sprint_static_textbuf[SPRINT_CTX_DEPTH * LOG_LINE_MAX];
+
+/* Dynamically allocated sprint buffers. */
+static unsigned int sprint_dynamic_textbuf_count;
+static unsigned long *sprint_dynamic_textbuf_map;
+static char *sprint_dynamic_textbuf;
+
+/*
+ * Acquire an unused buffer, returning its index. If no buffer is
+ * available, @count is returned.
+ */
+static int _get_sprint_buf(unsigned long *map, int count)
+{
+ int index;
+
+ do {
+ index = find_first_zero_bit(map, count);
+ if (index == count)
+ break;
+ /*
+ * Guarantee map changes are ordered for the other CPUs.
+ * Pairs with clear_bit() in _put_sprint_buf().
+ */
+ } while (test_and_set_bit(index, map));
+
+ return index;
+}
+
+/* Mark the buffer @index as unused. */
+static void _put_sprint_buf(unsigned long *map, unsigned int count, unsigned int index)
+{
+ /*
+ * Guarantee map changes are ordered for the other CPUs.
+ * Pairs with test_and_set_bit() in _get_sprint_buf().
+ */
+ clear_bit(index, map);
+}
+
+/*
+ * Get a buffer sized LOG_LINE_MAX for sprinting. On success, @id is set and
+ * interrupts are disabled. @id is used to put back the buffer.
+ *
+ * @id is non-negative for static buffers, negative for dynamic buffers.
+ */
+static char *get_sprint_buf(int *id, unsigned long *flags)
+{
+ unsigned int index;
+
+ local_irq_save(*flags);
+
+ /* First try with static pool. */
+ index = _get_sprint_buf(sprint_static_textbuf_map, SPRINT_CTX_DEPTH);
+ if (index != SPRINT_CTX_DEPTH) {
+ *id = index;
+ return &sprint_static_textbuf[index * LOG_LINE_MAX];
+ }
+
+ /*
+ * Fallback to dynamic pool, if available.
+ *
+ * Guarantee the buffer is loaded before loading the map and count.
+ * Pairs with smp_store_release() in console_init().
+ */
+ if (smp_load_acquire(&sprint_dynamic_textbuf)) {
+ index = _get_sprint_buf(sprint_dynamic_textbuf_map,
+ sprint_dynamic_textbuf_count);
+ if (index != sprint_dynamic_textbuf_count) {
+ /* Convert to dynamic buffer representation. */
+ *id = -index - 1;
+ return &sprint_dynamic_textbuf[index * LOG_LINE_MAX];
+ }
+ }
+
+ /* Failed to get a buffer. */
+ local_irq_restore(*flags);
+ return NULL;
+}
+
+/* Put back a sprint buffer and restore interrupts. */
+static void put_sprint_buf(int id, unsigned long flags)
+{
+ unsigned int index;
+
+ if (id >= 0) {
+ index = id;
+ _put_sprint_buf(sprint_static_textbuf_map,
+ SPRINT_CTX_DEPTH, index);
+ } else {
+ /* Convert from dynamic buffer representation. */
+ index = -id - 1;
+ _put_sprint_buf(sprint_dynamic_textbuf_map,
+ sprint_dynamic_textbuf_count, index);
+ }
+
+ local_irq_restore(flags);
+}
+
int printk_delay_msec __read_mostly;
static inline void printk_delay(void)
@@ -1921,21 +2027,26 @@ static size_t log_output(int facility, int level, enum log_flags lflags,
dev_info, text, text_len);
}
-/* Must be called under logbuf_lock. */
int vprintk_store(int facility, int level,
const struct dev_printk_info *dev_info,
const char *fmt, va_list args)
{
- static char textbuf[LOG_LINE_MAX];
- char *text = textbuf;
size_t text_len;
enum log_flags lflags = 0;
+ unsigned long irqflags;
+ int sprint_id;
+ char *text;
+ int ret;
+
+ text = get_sprint_buf(&sprint_id, &irqflags);
+ if (WARN_ON_ONCE(!text))
+ return 0;
/*
* The printf needs to come first; we need the syslog
* prefix which might be passed-in as a parameter.
*/
- text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
+ text_len = vscnprintf(text, LOG_LINE_MAX, fmt, args);
/* mark and strip a trailing newline */
if (text_len && text[text_len-1] == '\n') {
@@ -1968,7 +2079,11 @@ int vprintk_store(int facility, int level,
if (dev_info)
lflags |= LOG_NEWLINE;
- return log_output(facility, level, lflags, dev_info, text, text_len);
+ ret = log_output(facility, level, lflags, dev_info, text, text_len);
+
+ put_sprint_buf(sprint_id, irqflags);
+
+ return ret;
}
asmlinkage int vprintk_emit(int facility, int level,
@@ -2920,6 +3035,23 @@ void __init console_init(void)
initcall_t call;
initcall_entry_t *ce;
+ sprint_dynamic_textbuf_count = num_possible_cpus() * SPRINT_CTX_DEPTH;
+ sprint_dynamic_textbuf_map = bitmap_alloc(sprint_dynamic_textbuf_count, GFP_KERNEL);
+ if (sprint_dynamic_textbuf_map) {
+ /*
+ * Guarantee the buffer is stored after storing the
+ * map and count. Pairs with smp_load_acquire() in
+ * get_sprint_buf().
+ */
+ smp_store_release(&sprint_dynamic_textbuf,
+ kmalloc(sprint_dynamic_textbuf_count * LOG_LINE_MAX, GFP_KERNEL));
+ }
+ if (!sprint_dynamic_textbuf) {
+ bitmap_free(sprint_dynamic_textbuf_map);
+ pr_info("sprint_dynamic_textbuf: %u buffers not available\n",
+ sprint_dynamic_textbuf_count);
+ }
+
/* Setup the default TTY line discipline. */
n_tty_init();
--
2.20.1
Since the ringbuffer is now lockless, new records can be inserted
directly from NMI context. There is no need for the NMI safe
buffers. Handle all NMI printk() calls using defer_console_output()
to avoid calling console drivers that might have their own locks.
Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk_safe.c | 47 ++++---------------------------------
1 file changed, 4 insertions(+), 43 deletions(-)
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 7c186cefdb50..720935d19a3e 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -44,10 +44,6 @@ struct printk_safe_seq_buf {
static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
static DEFINE_PER_CPU(int, printk_context);
-#ifdef CONFIG_PRINTK_NMI
-static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
-#endif
-
/* Get flushed in a more safe context. */
static void queue_flush_work(struct printk_safe_seq_buf *s)
{
@@ -245,12 +241,8 @@ void printk_safe_flush(void)
{
int cpu;
- for_each_possible_cpu(cpu) {
-#ifdef CONFIG_PRINTK_NMI
- __printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
-#endif
+ for_each_possible_cpu(cpu)
__printk_safe_flush(&per_cpu(safe_print_seq, cpu).work);
- }
}
/**
@@ -268,20 +260,6 @@ void printk_safe_flush_on_panic(void)
printk_safe_flush();
}
-#ifdef CONFIG_PRINTK_NMI
-/*
- * Safe printk() for NMI context. It uses a per-CPU buffer to
- * store the message. NMIs are not nested, so there is always only
- * one writer running. But the buffer might get flushed from another
- * CPU, so we need to be careful.
- */
-static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
-{
- struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
-
- return printk_safe_log_store(s, fmt, args);
-}
-
void noinstr printk_nmi_enter(void)
{
this_cpu_add(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
@@ -313,15 +291,6 @@ void printk_nmi_direct_exit(void)
this_cpu_and(printk_context, ~PRINTK_NMI_DIRECT_CONTEXT_MASK);
}
-#else
-
-static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
-{
- return 0;
-}
-
-#endif /* CONFIG_PRINTK_NMI */
-
/*
* Lock-less printk(), to avoid deadlocks should the printk() recurse
* into itself. It uses a per-CPU buffer to store the message, just like
@@ -355,10 +324,11 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
#endif
/*
- * Try to use the main logbuf even in NMI. But avoid calling console
+ * Use the main ringbuffer in NMI. But avoid calling console
* drivers that might have their own locks.
*/
- if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
+ if ((this_cpu_read(printk_context) &
+ (PRINTK_NMI_CONTEXT_MASK | PRINTK_NMI_DIRECT_CONTEXT_MASK))) {
int len;
len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
@@ -366,10 +336,6 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
return len;
}
- /* Use extra buffer in NMI. */
- if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
- return vprintk_nmi(fmt, args);
-
/* Use extra buffer to prevent a recursion deadlock in safe mode. */
if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
return vprintk_safe(fmt, args);
@@ -387,11 +353,6 @@ void __init printk_safe_init(void)
s = &per_cpu(safe_print_seq, cpu);
init_irq_work(&s->work, __printk_safe_flush);
-
-#ifdef CONFIG_PRINTK_NMI
- s = &per_cpu(nmi_print_seq, cpu);
- init_irq_work(&s->work, __printk_safe_flush);
-#endif
}
/* Flush pending messages that did not have scheduled IRQ works. */
--
2.20.1
On Tue 2020-09-22 17:44:13, John Ogness wrote:
> kmsg_dump_rewind_nolock() accesses @clear_seq without any locking.
> However, accessing this global variable requires holding
> @logbuf_lock. For the _nolock() variant, start from the first record
> in the ringbuffer rather than the @clear_seq record.
> Signed-off-by: John Ogness <[email protected]>
> ---
> kernel/printk/printk.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 00bc1fce3299..cff13b33e926 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3410,11 +3410,12 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
> * kmsg_dump_get_buffer() can be called again and used multiple
> * times within the same dumper.dump() callback.
> *
> - * The function is similar to kmsg_dump_rewind(), but grabs no locks.
> + * The function is similar to kmsg_dump_rewind(), but grabs no locks
> + * and starts from the oldest record rather than from @clear_seq.
> */
> void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
> {
> - dumper->cur_seq = clear_seq;
> + dumper->cur_seq = 0;
Just to understand it. Is the problem that the value might be in
an inconsistent state?
I mean that it might be modified by multiple instructions, for example,
because of compiler optimizations, or on a 32-bit system by definition.
I still have to look at the later patches. But it seems that
the new syslog_lock is mostly taken only around reading or
writing the global @clear_seq variable. An atomic variable might
do the same job.
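A minimal sketch of that idea (untested; atomic64_t here is only
a suggestion, with prb_next_seq() used as in this series):

	static atomic64_t clear_seq = ATOMIC64_INIT(0);

	/* writer side, e.g. syslog_clear() */
	atomic64_set(&clear_seq, prb_next_seq(prb));

	/* lockless reader side, e.g. kmsg_dump_rewind_nolock() */
	dumper->cur_seq = atomic64_read(&clear_seq);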
Best Regards,
Petr
On Tue 2020-09-22 17:44:14, John Ogness wrote:
> vprintk_store() is using a single static buffer as a temporary
> sprint buffer for the message text. This will not work once
> @logbuf_lock is removed. Replace the single static buffer with a
> pool of buffers.
The buffer is used because we do not know the length of the
formatted message to reserve the right space in the ring buffer
in advance.
There was the idea to call vsprintf(NULL, fmt, args) to count
the length in advance.
AFAIK, there is one catch. We need to use va_copy() around
the 1st call because va_format can be processed only once.
See the va_format handling in lib/vsprintf.c as an example.
Is there any other problem, please?
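For illustration, the counting pass could work the same way
kvasprintf() already does it (an untested sketch; @text stands
for the space reserved in the ringbuffer):

	va_list args2;
	int text_len;

	va_copy(args2, args);
	/* 1st pass: count only; vsnprintf() accepts NULL with size 0 */
	text_len = vsnprintf(NULL, 0, fmt, args2);
	va_end(args2);

	/* reserve text_len + 1 bytes in the ringbuffer, then format */
	vscnprintf(text, text_len + 1, fmt, args);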
Best Regards,
Petr
From: Petr Mladek
> Sent: 23 September 2020 16:11
>
> On Tue 2020-09-22 17:44:14, John Ogness wrote:
> > vprintk_store() is using a single static buffer as a temporary
> > sprint buffer for the message text. This will not work once
> > @logbuf_lock is removed. Replace the single static buffer with a
> > pool of buffers.
>
> The buffer is used because we do not know the length of the
> formatted message to reserve the right space in the ring buffer
> in advance.
>
> There was the idea to call vsprintf(NULL, fmt, args) to count
> the length in advance.
>
> AFAIK, there is one catch. We need to use va_copy() around
> the 1st call because va_format can be processed only once.
> See the va_format handling in lib/vsprintf.c as an example.
>
> Is there any other problem, please?
Potentially the data can change after the vsprintf(NULL, ...)
call so that the buffer isn't guaranteed to be the right length.
Never mind the extra cost of doing all the work twice.
David
On 2020-09-23, Petr Mladek <[email protected]> wrote:
>> kmsg_dump_rewind_nolock() accesses @clear_seq without any locking.
>> However, accessing this global variable requires holding
>> @logbuf_lock. For the _nolock() variant, start from the first record
>> in the ringbuffer rather than the @clear_seq record.
>
>> Signed-off-by: John Ogness <[email protected]>
>> ---
>> kernel/printk/printk.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 00bc1fce3299..cff13b33e926 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -3410,11 +3410,12 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
>> * kmsg_dump_get_buffer() can be called again and used multiple
>> * times within the same dumper.dump() callback.
>> *
>> - * The function is similar to kmsg_dump_rewind(), but grabs no locks.
>> + * The function is similar to kmsg_dump_rewind(), but grabs no locks
>> + * and starts from the oldest record rather than from @clear_seq.
>> */
>> void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
>> {
>> - dumper->cur_seq = clear_seq;
>> + dumper->cur_seq = 0;
>
> Just to understand it. Is the problem that the value might be in
> an inconsistent state?
>
> I mean that it might be modified by multiple instructions, for example,
> because of compiler optimizations, or on a 32-bit system by definition.
Correct.
> I still have to look at the later patches. But it seems that
> the new syslog_lock is mostly taken only around reading or
> writing the global @clear_seq variable. An atomic variable might
> do the same job.
Until now I have avoided using atomic64 types. Perhaps my reluctance to
use this type is unfounded. Using an atomic64 for @clear_seq would free
it from needing to be protected by @syslog_lock.
John Ogness
On Tue 2020-09-22 17:44:15, John Ogness wrote:
> Since the ringbuffer is lockless, there is no need for it to be
> protected by @logbuf_lock. Remove @logbuf_lock.
>
> This means that printk_nmi_direct and printk_safe_flush_on_panic()
> no longer need to acquire any lock to run.
>
> The global variables @syslog_seq, @syslog_partial, @syslog_time,
> @clear_seq were also protected by @logbuf_lock. Introduce
> @syslog_lock to protect these.
>
> @console_seq, @exclusive_console_stop_seq, @console_dropped are
> protected by @console_lock.
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 763494d1d6b3..65e3cdbddeff 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> #ifdef CONFIG_PRINTK
> DECLARE_WAIT_QUEUE_HEAD(log_wait);
> +
> +/* All 3 protected by @syslog_lock. */
> /* the next printk record to read by syslog(READ) or /proc/kmsg */
> static u64 syslog_seq;
> static size_t syslog_partial;
> static bool syslog_time;
I agree that it makes sense to synchronize these three variables
in 3 locations, see below.
> +/* All 3 protected by @console_lock. */
> /* the next printk record to write to the console */
> static u64 console_seq;
> static u64 exclusive_console_stop_seq;
> static unsigned long console_dropped;
>
> +/* Protected by @syslog_lock. */
> /* the next printk record to read after the last 'clear' command */
> static u64 clear_seq;
The synchronization of this variable is strange. It is not clear
against which changes it is synchronized.
> @@ -823,7 +793,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
> if (offset)
> return -ESPIPE;
>
> - logbuf_lock_irq();
> switch (whence) {
> case SEEK_SET:
> /* the first record */
SEEK_SET does:
user->seq = prb_first_valid_seq(prb);
I wonder if we actually need to protect user->seq with the user->lock
mutex, as is done in devkmsg_read(). The logbuf_lock somehow prevented
a possible race. Well, such a race is not very realistic.
> @@ -858,7 +828,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
>
> poll_wait(file, &log_wait, wait);
>
> - logbuf_lock_irq();
Also this should probably get replaced by user->lock mutex.
> if (prb_read_valid(prb, user->seq, NULL)) {
> /* return error when data has vanished underneath us */
> if (user->seq < prb_first_valid_seq(prb))
> @@ -1593,8 +1576,11 @@ int do_syslog(int type, char __user *buf, int len, int source)
> return 0;
> if (!access_ok(buf, len))
> return -EFAULT;
> + syslog_lock_irq();
> + seq = syslog_seq;
> + syslog_unlock_irq();
It is not clear why a lock is suddenly needed here.
All the locks around a single variable read/write are suspicious. They
help only against an inconsistent value (compiler optimizations, or
64-bit value manipulation on a 32-bit system).
It might make sense, but it has clearly been ignored before.
> error = wait_event_interruptible(log_wait,
> - prb_read_valid(prb, syslog_seq, NULL));
> + prb_read_valid(prb, seq, NULL));
> if (error)
> return error;
> error = syslog_print(buf, len);
> @@ -1642,7 +1628,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
> break;
> /* Number of chars in the log buffer */
> case SYSLOG_ACTION_SIZE_UNREAD:
> - logbuf_lock_irq();
> + syslog_lock_irq();
I agree that some locking is needed here to keep @syslog_seq,
@syslog_partial, and @syslog_time consistent.
> if (syslog_seq < prb_first_valid_seq(prb)) {
> /* messages are gone, move to first one */
> syslog_seq = prb_first_valid_seq(prb);
> @@ -1669,7 +1655,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
> }
> error -= syslog_partial;
> }
> - logbuf_unlock_irq();
> + syslog_unlock_irq();
> break;
> /* Size of the log buffer */
> case SYSLOG_ACTION_SIZE_BUFFER:
> @@ -2106,10 +2092,9 @@ asmlinkage int vprintk_emit(int facility, int level,
> boot_delay_msec(level);
> printk_delay();
>
> - /* This stops the holder of console_sem just where we want him */
> - logbuf_lock_irqsave(flags);
> + printk_safe_enter_irqsave(flags);
> printed_len = vprintk_store(facility, level, dev_info, fmt, args);
> - logbuf_unlock_irqrestore(flags);
Why exactly does this need to be called in printk_safe context, please?
Infinite recursion might be prevented by a per-CPU counter.
A lack of line buffers could hopefully be prevented by vscnprintf(NULL,
...) or by extending the pool from the 2nd patch.
Is there any other reason, please?
> + printk_safe_exit_irqrestore(flags);
>
> /* If called from the scheduler, we can not call up(). */
> if (!in_sched) {
> @@ -2691,9 +2670,9 @@ void console_flush_on_panic(enum con_flush_mode mode)
> if (mode == CONSOLE_REPLAY_ALL) {
> unsigned long flags;
>
> - logbuf_lock_irqsave(flags);
> + local_irq_save(flags);
> console_seq = prb_first_valid_seq(prb);
> - logbuf_unlock_irqrestore(flags);
> + local_irq_restore(flags);
What is the reason for disabling IRQs here, please?
> }
> console_unlock();
> }
> @@ -3476,17 +3449,14 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
> if (!dumper->active || !buf || !size)
> goto out;
>
> - logbuf_lock_irqsave(flags);
I wonder if logbuf_lock actually also synchronized some variables
in struct kmsg_dumper (cur_seq and next_seq).
We might need to add a lock to struct kmsg_dumper.
> if (dumper->cur_seq < prb_first_valid_seq(prb)) {
> /* messages are gone, move to first available one */
> dumper->cur_seq = prb_first_valid_seq(prb);
> }
>
> /* last entry */
> - if (dumper->cur_seq >= dumper->next_seq) {
> - logbuf_unlock_irqrestore(flags);
> + if (dumper->cur_seq >= dumper->next_seq)
> goto out;
> - }
>
> /* calculate length of entire buffer */
> seq = dumper->cur_seq;
Sigh, I wish the locking world was easier.
Best Regards,
Petr
On Tue 2020-09-22 17:44:16, John Ogness wrote:
> Since the ringbuffer is now lockless, new records can be inserted
> directly from NMI context. There is no need for the NMI safe
> buffers. Handle all NMI printk() calls using defer_console_output()
> to avoid calling console drivers that might have their own locks.
>
> Signed-off-by: John Ogness <[email protected]>
> ---
> kernel/printk/printk_safe.c | 47 ++++---------------------------------
> 1 file changed, 4 insertions(+), 43 deletions(-)
>
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index 7c186cefdb50..720935d19a3e 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -355,10 +324,11 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
> #endif
>
> /*
> - * Try to use the main logbuf even in NMI. But avoid calling console
> + * Use the main ringbuffer in NMI. But avoid calling console
> * drivers that might have their own locks.
> */
> - if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
> + if ((this_cpu_read(printk_context) &
> + (PRINTK_NMI_CONTEXT_MASK | PRINTK_NMI_DIRECT_CONTEXT_MASK))) {
> int len;
>
> len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
> @@ -366,10 +336,6 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
> return len;
> }
>
> - /* Use extra buffer in NMI. */
> - if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
> - return vprintk_nmi(fmt, args);
> -
> /* Use extra buffer to prevent a recursion deadlock in safe mode. */
> if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
> return vprintk_safe(fmt, args);
I still hope that we will be able to get rid of printk_safe completely.
I would expect that only two situations would need special handling
with the lockless ringbuffer:
+ call vkdb_printf() in kdb context
+ defer console in NMI context
Eventually we should also prevent too deep a level of recursion.
Best Regards,
Petr
On Wed 2020-09-23 15:21:41, David Laight wrote:
> From: Petr Mladek
> > Sent: 23 September 2020 16:11
> >
> > On Tue 2020-09-22 17:44:14, John Ogness wrote:
> > > vprintk_store() is using a single static buffer as a temporary
> > > sprint buffer for the message text. This will not work once
> > > @logbuf_lock is removed. Replace the single static buffer with a
> > > pool of buffers.
> >
> > The buffer is used because we do not know the length of the
> > formatted message to reserve the right space in the ring buffer
> > in advance.
> >
> > There was the idea to call vsprintf(NULL, fmt, args) to count
> > the length in advance.
> >
> > AFAIK, there is one catch. We need to use va_copy() around
> > the 1st call because va_format can be processed only once.
> > See the va_format handling in lib/vsprintf.c as an example.
> >
> > Is there any other problem, please?
>
> Potentially the data can change after the vsprintf(NULL, ...)
> call so that the buffer isn't guaranteed to be the right length.
Yup. Well, in the worst case, we would shrink the text when the length
increased in the meantime. The content of such a message is racy
anyway.
> Never mind the extra cost of doing all the work twice.
IMHO, neither of these is worth the extra complexity of a pool
of buffers.
I would prefer to increase the complexity only when there are real life
problems.
Best Regards,
Petr
On (20/09/22 17:44), John Ogness wrote:
[..]
> int vprintk_store(int facility, int level,
> const struct dev_printk_info *dev_info,
> const char *fmt, va_list args)
> {
> - static char textbuf[LOG_LINE_MAX];
> - char *text = textbuf;
> size_t text_len;
> enum log_flags lflags = 0;
> + unsigned long irqflags;
> + int sprint_id;
> + char *text;
> + int ret;
> +
> + text = get_sprint_buf(&sprint_id, &irqflags);
> + if (WARN_ON_ONCE(!text))
> + return 0;
So we failed to get the buffer. How likely is it that this recursive
WARN_ON_ONCE()->printk()->get_sprint_buf() will be able to get one?
I would rather expect that we will entirely lose the WARN_ON_ONCE()
output.
-ss
On (20/09/23 17:11), Petr Mladek wrote:
>
> AFAIK, there is one catch. We need to use va_copy() around
> the 1st call because va_format can be proceed only once.
>
Current printk() should be good enough for reporting, say, "Kernel
stack overflow" errors. Is the extra pressure that va_copy() adds
something that we need to consider?
-ss
On (20/09/22 17:44), John Ogness wrote:
> +/*
> + * The sprint buffers are used with interrupts disabled, so each CPU
> + * only requires 2 buffers: for non-NMI and NMI contexts. Recursive
> + * printk() calls are handled by the safe buffers.
> + */
> +#define SPRINT_CTX_DEPTH 2
> +
> +/* Static sprint buffers for early boot (only 1 CPU). */
> +static DECLARE_BITMAP(sprint_static_textbuf_map, SPRINT_CTX_DEPTH);
> +static char sprint_static_textbuf[SPRINT_CTX_DEPTH * LOG_LINE_MAX];
> +
> +/* Dynamically allocated sprint buffers. */
> +static unsigned int sprint_dynamic_textbuf_count;
> +static unsigned long *sprint_dynamic_textbuf_map;
> +static char *sprint_dynamic_textbuf;
Just a question:
Can dynamic_textbuf be a PER_CPU array of five textbuf[1024] buffers
(for normal printk, nmi, hard irq, soft irq and one extra buffer for
recursive printk calls)?
So then we'd
vprintk(...)
{
	preempt_disable();
	buf = this_cpu_ptr(... preempt_count_to_ctx());
	...
	preempt_enable();
}
preempt_disable()/preempt_enable() is already in printk().
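To make that more concrete, a rough sketch (all names hypothetical
and untested; the 5th buffer for one level of printk() recursion
would additionally need a per-CPU depth counter):

	#define PRINTK_CTX_TASK		0
	#define PRINTK_CTX_SOFTIRQ	1
	#define PRINTK_CTX_HARDIRQ	2
	#define PRINTK_CTX_NMI		3
	#define PRINTK_CTX_NUM		4

	struct printk_ctx_bufs {
		char text[PRINTK_CTX_NUM][LOG_LINE_MAX];
	};
	static DEFINE_PER_CPU(struct printk_ctx_bufs, printk_ctx_bufs);

	static int printk_ctx(void)
	{
		if (in_nmi())
			return PRINTK_CTX_NMI;
		if (in_irq())
			return PRINTK_CTX_HARDIRQ;
		if (in_softirq())
			return PRINTK_CTX_SOFTIRQ;
		return PRINTK_CTX_TASK;
	}

	/* in vprintk_store(), with preemption disabled */
	char *text = this_cpu_ptr(&printk_ctx_bufs)->text[printk_ctx()];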
-ss
On Thu 2020-09-24 14:40:58, Sergey Senozhatsky wrote:
> On (20/09/23 17:11), Petr Mladek wrote:
> >
> > AFAIK, there is one catch. We need to use va_copy() around
> > the 1st call because va_format can be processed only once.
> >
>
> Current printk() should be good enough for reporting, say, "Kernel
> stack overflow" errors. Is the extra pressure that va_copy() adds
> something that we need to consider?
The thing is that vsprintf() traverses the arguments using va_arg().
It modifies internal values so that the next va_arg() will read
the next value.
If we want to call vsprintf() twice then we need to reset the internal
va_list states. My understanding is that va_copy() is the only legal
way when we are already nested inside va_start()/va_end().
See also the commit 5756b76e4db643d8f7 ("vsprintf: make %pV handling
compatible with kasprintf()").
Best Regards,
Petr
On (20/09/22 17:44), John Ogness wrote:
[..]
> void printk_safe_flush_on_panic(void)
> {
> - /*
> - * Make sure that we could access the main ring buffer.
> - * Do not risk a double release when more CPUs are up.
> - */
> - if (raw_spin_is_locked(&logbuf_lock)) {
> - if (num_online_cpus() > 1)
> - return;
> -
> - debug_locks_off();
> - raw_spin_lock_init(&logbuf_lock);
If we panic() with syslog_lock locked then we need to zap syslog_lock
and disable debug_locks for kmsg_dump() to succeed. Otherwise panic()
may deadlock:
syslog_lock_irq()
-> NMI
panic()
...
printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC); // syslog_lock_irqsave(flags)
...
Did I miss the point where we make sure that kmsg_dump() will not
deadlock?
-ss
On (20/09/24 10:45), Petr Mladek wrote:
> On Thu 2020-09-24 14:40:58, Sergey Senozhatsky wrote:
> > On (20/09/23 17:11), Petr Mladek wrote:
> > >
> > > AFAIK, there is one catch. We need to use va_copy() around
> > > the 1st call because va_format can be processed only once.
> > >
> >
> > Current printk() should be good enough for reporting, say, "Kernel
> > stack overflow" errors. Is the extra pressure that va_copy() adds
> > something that we need to consider?
>
> The thing is that vsprintf() traverses the arguments using va_arg().
> It modifies internal values so that the next va_arg() will read
> the next value.
Yes, I understand the purpose of va_copy(). I'm asking if we are
always on the safe side doing va_copy for every printk (+ potential
recursive va_copy-s).
> If we want to call vsprintf() twice then we need to reset the internal
> va_list states. My understanding is that va_copy() is the only legal
> way when we are already nested inside va_start()/va_end().
Yes, it is. My question is a bit different.
-ss
On Thu 2020-09-24 15:17:46, Sergey Senozhatsky wrote:
> On (20/09/22 17:44), John Ogness wrote:
> > +/*
> > + * The sprint buffers are used with interrupts disabled, so each CPU
> > + * only requires 2 buffers: for non-NMI and NMI contexts. Recursive
> > + * printk() calls are handled by the safe buffers.
> > + */
> > +#define SPRINT_CTX_DEPTH 2
> > +
> > +/* Static sprint buffers for early boot (only 1 CPU). */
> > +static DECLARE_BITMAP(sprint_static_textbuf_map, SPRINT_CTX_DEPTH);
> > +static char sprint_static_textbuf[SPRINT_CTX_DEPTH * LOG_LINE_MAX];
> > +
> > +/* Dynamically allocated sprint buffers. */
> > +static unsigned int sprint_dynamic_textbuf_count;
> > +static unsigned long *sprint_dynamic_textbuf_map;
> > +static char *sprint_dynamic_textbuf;
>
> Just a question:
>
> Can dynamic_textbuf be a PER_CPU array of five textbuf[1024] buffers
> (for normal printk, nmi, hard irq, soft irq and one extra buffer for
> recursive printk calls)?
That would be my preferred fallback when the approach with
vsprintf(NULL, ) is not acceptable for some reasons.
But I still think that calling vsprintf(NULL, ) is the most trivial
and good enough solution.
IMHO, the solution with per-CPU buffers is not that trivial, for
example:
What if recursive printk() is interrupted by NMI and it causes
yet another recursion?
Is one level of recursion enough?
Best Regards,
Petr
On (20/09/24 10:54), Petr Mladek wrote:
> > Just a question:
> >
> > Can dynamic_textbuf be a PER_CPU array of five textbuf[1024] buffers
> > (for normal printk, nmi, hard irq, soft irq and one extra buffer for
> > recursive printk calls)?
>
> That would be my preferred fallback when the approach with
> vsprintf(NULL, ) is not acceptable for some reasons.
OK.
> But I still think that calling vsprintf(NULL, ) is the most trivial
> and good enough solution.
It's probably good enough.
> IMHO, the solution with per-CPU buffers is not that trivial, for
> example:
>
> What if recursive printk() is interrupted by NMI and it causes
> yet another recursion?
>
> Is one level of recursion enough?
We might try the current approach - when we, for example, have
recursion in printk_safe() we just end up writing data to the
same per-CPU buffer. We need to limit the depth of recursion
one way or another. With per-CPU counter we will just bail out
of "deeply recursive printk" without attempting to store its
messages; with the buffers approach we will write the data to
a static buffer and see how badly it will be overlapped at the
end. Just a thought.
-ss
On 2020-09-24, Sergey Senozhatsky <[email protected]> wrote:
> [..]
>> void printk_safe_flush_on_panic(void)
>> {
>> - /*
>> - * Make sure that we could access the main ring buffer.
>> - * Do not risk a double release when more CPUs are up.
>> - */
>> - if (raw_spin_is_locked(&logbuf_lock)) {
>> - if (num_online_cpus() > 1)
>> - return;
>> -
>> - debug_locks_off();
>> - raw_spin_lock_init(&logbuf_lock);
>
> If we panic() with syslog_lock locked then we need to zap syslog_lock
> and disable debug_locks for kmsg_dump() to succeed. Otherwise panic()
> may deadlock:
>
> syslog_lock_irq()
> -> NMI
> panic()
> ...
> printk_safe_flush_on_panic();
> kmsg_dump(KMSG_DUMP_PANIC); // syslog_lock_irqsave(flags)
> ...
>
> Did I miss the point where we make sure that kmsg_dump() will not
> deadlock?
You are correct. Thanks.
@clear_seq should either become atomic64 or kmsg_dump() should start
from the oldest available message (seq=0).
John Ogness
On 24/09/2020 10.53, Sergey Senozhatsky wrote:
> On (20/09/24 10:45), Petr Mladek wrote:
>> On Thu 2020-09-24 14:40:58, Sergey Senozhatsky wrote:
>>> On (20/09/23 17:11), Petr Mladek wrote:
>>>>
>>>> AFAIK, there is one catch. We need to use va_copy() around
>>>> the 1st call because va_format can be processed only once.
>>>>
>>>
>>> Current printk() should be good enough for reporting, say, "Kernel
>>> stack overflow" errors. Is the extra pressure that va_copy() adds
>>> something that we need to consider?
>>
>> The thing is that vsprintf() traverses the arguments using va_arg().
>> It modifies internal values so that the next va_arg() will read
>> the next value.
>
> Yes, I understand the purpose of va_copy(). I'm asking if we are
> always on the safe side doing va_copy for every printk (+ potential
> recursive va_copy-s).
va_copy doesn't really add any extra stack use worth talking about. When
the ABI says all arguments are passed on the stack, va_list is just a
pointer into the arguments, and va_copy merely copies that pointer. When
some arguments are passed in registers, the function prologue sets up a
"register save area" where those registers are stashed, and va_list then
contains two pointers: one to the reg save area, one to the place on the
stack where the rest of the arguments are, plus a bit of control
information on how many of the register arguments have been consumed so
far (and that control info is the only reason one must "reset" a
va_list, or rather use a copy that was made before consumption started).
In either case, an extra va_list doesn't cost more than 24 bytes or so
of stack - even if printk() was called with 100 arguments, those
90-100ish arguments are only stored once on the stack.
Rasmus
On 23/09/2020 17.11, Petr Mladek wrote:
> On Tue 2020-09-22 17:44:14, John Ogness wrote:
>> vprintk_store() is using a single static buffer as a temporary
>> sprint buffer for the message text. This will not work once
>> @logbuf_lock is removed. Replace the single static buffer with a
>> pool of buffers.
>
> The buffer is used because we do not know the length of the
> formatted message to reserve the right space in the ring buffer
> in advance.
>
> There was the idea to call vsprintf(NULL, fmt, args) to count
> the length in advance.
sprintf is dog slow. If you do this, perhaps say "we can afford to use
128 bytes of stack" and do vsnprintf(stackbuf, 128, fmt, args) to do the
counting, and in the vast majority of cases where the text fits we don't
need to do vsnprintf() again.
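Roughly like this (untested; assumes @args2 is a va_copy() taken
before the first pass, and @text is the reserved ringbuffer space):

	char stackbuf[128];
	int len;

	len = vsnprintf(stackbuf, sizeof(stackbuf), fmt, args2);
	if (len < sizeof(stackbuf)) {
		/* common case: already formatted, just copy it over */
		memcpy(text, stackbuf, len + 1);
	} else {
		/* rare case: reserve len + 1 bytes and format again */
		vscnprintf(text, len + 1, fmt, args);
	}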
Rasmus
On 24/09/2020 11.54, Rasmus Villemoes wrote:
> On 23/09/2020 17.11, Petr Mladek wrote:
>> On Tue 2020-09-22 17:44:14, John Ogness wrote:
>>> vprintk_store() is using a single static buffer as a temporary
>>> sprint buffer for the message text. This will not work once
>>> @logbuf_lock is removed. Replace the single static buffer with a
>>> pool of buffers.
>>
>> The buffer is used because we do not know the length of the
>> formatted message to reserve the right space in the ring buffer
>> in advance.
>>
>> There was the idea to call vsprintf(NULL, fmt, args) to count
>> the length in advance.
>
> sprintf is dog slow. If you do this, perhaps say "we can afford to use
> 128 bytes of stack" and do vsnprintf(stackbuf, 128, fmt, args) to do the
> counting, and in the vast majority of cases where the text fits we don't
> need to do vsnprintf() again.
Or, since 128 bytes of stack may be too much, combine the "pre-allocate
a few buffers" with "but fall back to vsprintf(NULL) if we can't get
one". That should allow choosing the X in X*1024 smaller than what
worst-case requires (which will never actually be used, assuming the
machine is doing something useful rather than just printk'ing all day
long) and still works even when a tmp buffer can't be obtained.
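Something like (a sketch only, reusing get_sprint_buf() and the
variables from the patch, and falling back to the counting pass):

	text = get_sprint_buf(&sprint_id, &irqflags);
	if (text) {
		text_len = vscnprintf(text, LOG_LINE_MAX, fmt, args);
	} else {
		va_list args2;

		/* no tmp buffer available: count first ... */
		va_copy(args2, args);
		text_len = vsnprintf(NULL, 0, fmt, args2);
		va_end(args2);
		/* ... then reserve text_len + 1 bytes in the
		 * ringbuffer and format directly into the record. */
	}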
Rasmus
On Thu 2020-09-24 18:06:53, Sergey Senozhatsky wrote:
> On (20/09/24 10:54), Petr Mladek wrote:
> > > Just a question:
> > >
> > > Can dynamic_textbuf be a PER_CPU array of five textbuf[1024] buffers
> > > (for normal printk, nmi, hard irq, soft irq and one extra buffer for
> > > recursive printk calls)?
> >
> > That would be my preferred fallback when the approach with
> > vsprintf(NULL, ) is not acceptable for some reasons.
>
> OK.
>
> > But I still think that calling vsprintf(NULL, ) is the most trivial
> > and good enough solution.
>
> It's probably good enough.
>
> > IMHO, the solution with per-CPU buffers is not that trivial, for
> > example:
> >
> > What if recursive printk() is interrupted by NMI and it causes
> > yet another recursion?
> >
> > Is one level of recursion enough?
>
> We might try the current approach - when we, for example, have
> recursion in printk_safe() we just end up writing data to the
> same per-CPU buffer. We need to limit the depth of recursion
> one way or another. With per-CPU counter we will just bail out
> of "deeply recursive printk" without attempting to store its
> messages; with the buffers approach we will write the data to
> a static buffer and see how badly it will be overlapped at the
> end. Just a thought.
It makes some sense, but it has just another bunch of problems:
+ it requires some tricky code [*]
+ temporary buffers need extra care in panic()
+ it does not guarantee that there is enough stack for the recursion
I think that limiting recursion to some level (3 or 5?) is much
easier and provides similar protection; see the sketch below.
[*] We could reuse printk_safe, but we would need to maintain it.
In fact, the code might be simpler because it would not need
to read the buffer from another CPU. But it would need to be
tricky anyway.
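The limit could be as simple as (illustrative only; relies on
interrupts staying disabled across vprintk_store(), as in this
series):

	static DEFINE_PER_CPU(int, printk_recursion);
	#define PRINTK_MAX_RECURSION 3

	/* on entry to vprintk_store() */
	if (this_cpu_inc_return(printk_recursion) > PRINTK_MAX_RECURSION) {
		this_cpu_dec(printk_recursion);
		return 0;	/* too deep, drop the message */
	}

	/* ... format and store the message ... */

	/* on exit */
	this_cpu_dec(printk_recursion);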
Best Regards,
Petr
On Thu 2020-09-24 11:49:06, Rasmus Villemoes wrote:
> On 24/09/2020 10.53, Sergey Senozhatsky wrote:
> > On (20/09/24 10:45), Petr Mladek wrote:
> >> On Thu 2020-09-24 14:40:58, Sergey Senozhatsky wrote:
> >>> On (20/09/23 17:11), Petr Mladek wrote:
> >>>>
> >>>> AFAIK, there is one catch. We need to use va_copy() around
> >>>> the 1st call because va_format can be processed only once.
> >>>>
> >>>
> >>> Current printk() should be good enough for reporting, say, "Kernel
> >>> stack overflow" errors. Is the extra pressure that va_copy() adds
> >>> something that we need to consider?
> >>
> >> The thing is that vsprintf() traverses the arguments using va_arg().
> >> It modifies internal values so that the next va_arg() will read
> >> the next value.
> >
> > Yes, I understand the purpose of va_copy(). I'm asking if we are
> > always on the safe side doing va_copy for every printk (+ potential
> > recursive va_copy-s).
>
> va_copy doesn't really add any extra stack use worth talking about. When
> the ABI says all arguments are passed on the stack, va_list is just a
> pointer into the arguments, and va_copy merely copies that pointer. When
> some arguments are passed in registers, the function prologue sets up a
> "register save area" where those registers are stashed, and va_list then
> contains two pointers: one to the reg save area, one to the place on the
> stack where the rest of the arguments are, plus a bit of control
> information on how many of the register arguments have been consumed so
> far (and that control info is the only reason one must "reset" a
> va_list, or rather use a copy that was made before consumption started).
> In either case, an extra va_list doesn't cost more than 24 bytes or so
> of stack - even if printk() was called with 100 arguments, those
> 90-100ish arguments are only stored once on the stack.
Thanks a lot for the detailed information. It is good to know that
va_copy() does not cost that much.
Best Regards,
Petr
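To make the va_copy() mechanics concrete, a small self-contained
userspace illustration (not kernel code) of the two-pass approach: the
first vsnprintf() consumes the va_list, so the second pass must use a
copy taken before consumption started:

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

static int format_twice(const char *fmt, ...)
{
	va_list args, args2;
	char *buf;
	int len;

	va_start(args, fmt);
	va_copy(args2, args);	/* cheap: ~24 bytes of stack, as noted above */

	len = vsnprintf(NULL, 0, fmt, args);	/* pass 1: count only */
	va_end(args);

	buf = malloc(len + 1);
	if (buf) {
		vsnprintf(buf, len + 1, fmt, args2);	/* pass 2: format */
		puts(buf);
		free(buf);
	}
	va_end(args2);
	return len;
}

int main(void)
{
	return format_twice("%s adds about %d bytes of stack",
			    "va_copy()", 24) < 0;
}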
On Thu 2020-09-24 14:32:49, Rasmus Villemoes wrote:
> On 24/09/2020 11.54, Rasmus Villemoes wrote:
> > On 23/09/2020 17.11, Petr Mladek wrote:
> >> On Tue 2020-09-22 17:44:14, John Ogness wrote:
> >>> vprintk_store() is using a single static buffer as a temporary
> >>> sprint buffer for the message text. This will not work once
> >>> @logbuf_lock is removed. Replace the single static buffer with a
> >>> pool of buffers.
> >>
> >> The buffer is used because we do not know the length of the
> >> formatted message to reserve the right space in the ring buffer
> >> in advance.
> >>
> >> There was the idea to call vsprintf(NULL, fmt, args) to count
> >> the length in advance.
> >
> > sprintf is dog slow.
Well, printk() is a slow path. It has never been much optimized for
speed. The main purpose is to report problems and, occasionally, some
other interesting information.
Yes, it is also used for debugging. But there is always trace_printk(),
which is designed to be fast. I guess that people already have to use it
today when debugging performance-critical code.
> > If you do this, perhaps say "we can afford to use
> > 128 bytes of stack" and do vsnprintf(stackbuf, 128, fmt, args) to do the
> > counting, and in the vast majority of cases where the text fits we don't
> > need to do vsnprintf() again.
It is an interesting idea. I am a bit concerned about the stack usage.
It might limit the possible nesting depth too much.
> Or, since 128 bytes of stack may be too much, combine the "pre-allocate
> a few buffers" with "but fall back to vsprintf(NULL) if we can't get
> one". That should allow choosing the X in X*1024 smaller than what
> worst-case requires (which will never actually be used, assuming the
> machine is doing something useful rather than just printk'ing all day
> long) and still works even when a tmp buffer can't be obtained.
I would like to keep it simple at the beginning. We could always add
some fancy tricks if it turns out not to be good enough.
Best Regards,
Petr
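A condensed sketch of this simple variant, written against the
lockless ringbuffer API from this series (the function name is made
up; error handling, record metadata, and prefix parsing are omitted):

#include <linux/kernel.h>
#include "printk_ringbuffer.h"

static int sketch_store(struct printk_ringbuffer *rb,
			const char *fmt, va_list args)
{
	struct prb_reserved_entry e;
	struct printk_record r;
	va_list args2;
	int text_len;

	va_copy(args2, args);
	/* pass 1: count only, nothing is stored yet */
	text_len = vsnprintf(NULL, 0, fmt, args);

	prb_rec_init_wr(&r, text_len + 1);
	if (!prb_reserve(&e, rb, &r)) {
		va_end(args2);
		return 0;
	}

	/* pass 2: format directly into the reserved ringbuffer space */
	vsnprintf(&r.text_buf[0], text_len + 1, fmt, args2);
	r.info->text_len = text_len;
	prb_commit(&e);

	va_end(args2);
	return text_len;
}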
On 25/09/2020 10.28, Petr Mladek wrote:
> On Thu 2020-09-24 14:32:49, Rasmus Villemoes wrote:
>> On 24/09/2020 11.54, Rasmus Villemoes wrote:
>>> On 23/09/2020 17.11, Petr Mladek wrote:
>>>> On Tue 2020-09-22 17:44:14, John Ogness wrote:
>>>>> vprintk_store() is using a single static buffer as a temporary
>>>>> sprint buffer for the message text. This will not work once
>>>>> @logbuf_lock is removed. Replace the single static buffer with a
>>>>> pool of buffers.
>>>>
>>>> The buffer is used because we do not know the length of the
>>>> formatted message to reserve the right space in the ring buffer
>>>> in advance.
>>>>
>>>> There was the idea to call vsprintf(NULL, fmt, args) to count
>>>> the length in advance.
>>>
>>> sprintf is dog slow.
>
> Well, printk() is a slow path. It has never been much optimized for
> speed. The main purpose is to report problems and, occasionally, some
> other interesting information.
True. But remember that printk is called from _everywhere_, with all
sorts of locks held and/or preemption disabled or whatnot, and every
cycle spent in printk makes those windows wider. Doubling the cost of
every single printk by unconditionally doing vsnprintf() twice is a bad
idea.
Rasmus
On Wed 2020-09-30 10:06:24, Rasmus Villemoes wrote:
> On 25/09/2020 10.28, Petr Mladek wrote:
> > On Thu 2020-09-24 14:32:49, Rasmus Villemoes wrote:
> >> On 24/09/2020 11.54, Rasmus Villemoes wrote:
> >>> On 23/09/2020 17.11, Petr Mladek wrote:
> >>>> On Tue 2020-09-22 17:44:14, John Ogness wrote:
> >>>>> vprintk_store() is using a single static buffer as a temporary
> >>>>> sprint buffer for the message text. This will not work once
> >>>>> @logbuf_lock is removed. Replace the single static buffer with a
> >>>>> pool of buffers.
> >>>>
> >>>> The buffer is used because we do not know the length of the
> >>>> formatted message to reserve the right space in the ring buffer
> >>>> in advance.
> >>>>
> >>>> There was the idea to call vsprintf(NULL, fmt, args) to count
> >>>> the length in advance.
> >>>
> >>> sprintf is dog slow.
> >
> > Well, printk() is a slow path. It has never been much optimized for
> > speed. The main purpose is to report problems and, occasionally, some
> > other interesting information.
>
> True. But remember that printk is called from _everywhere_, with all
> sorts of locks held and/or preemption disabled or whatnot, and every
> cycle spent in printk makes those windows wider. Doubling the cost of
> every single printk by unconditionally doing vsnprintf() twice is a bad
> idea.
I would prefer to solve this when there are real-life problems.
printk() should not get called in performance-sensitive paths in
the first place.
We could always make the code more complicated when it can be
justified. Extra buffers are yet another layer (code and memory)
in the processing. And the current arguments sound theoretical.
Anyway, the performance cost is not that obvious. The 1st run should
be faster because the result is not stored; the 2nd run should
be faster because the input values will likely be cached. Using
temporary buffers would require extra copying.
Best Regards,
Petr
On 2020-09-30, Petr Mladek <[email protected]> wrote:
>> Doubling the cost of every single printk by unconditionally doing
>> vsnprintf() twice is a bad idea.
>
> I would prefer to solve this when there are real-life problems.
> printk() should not get called in performance-sensitive paths in
> the first place.
>
> We could always make the code more complicated when it can be
> justified. Extra buffers are yet another layer (code and memory)
> in the processing. And the current arguments sound theoretical.
I am preparing a v2 that avoids the memory pool.
John Ogness
On Wed, 30 Sep 2020 10:06:24 +0200
Rasmus Villemoes <[email protected]> wrote:
> True. But remember that printk is called from _everywhere_, with all
> sorts of locks held and/or preemption disabled or whatnot, and every
> cycle spent in printk makes those windows wider. Doubling the cost of
> every single printk by unconditionally doing vsnprintf() twice is a bad
> idea.
But the console output is usually orders of magnitude more expensive
than the vsnprintf(); would doing it twice really make a difference?
-- Steve
From: Steven Rostedt
> Sent: 30 September 2020 14:36
>
> On Wed, 30 Sep 2020 10:06:24 +0200
> Rasmus Villemoes <[email protected]> wrote:
>
> > True. But remember that printk is called from _everywhere_, with all
> > sorts of locks held and/or preemption disabled or whatnot, and every
> > cycle spent in printk makes those windows wider. Doubling the cost of
> > every single printk by unconditionally doing vsnprintf() twice is a bad
> > idea.
>
> But the console output is usually orders of magnitude more expensive
> than the vsnprintf(); would doing it twice really make a difference?
Are there any strange %pX modifiers that do anything really horrid?
David
On 30/09/2020 15.35, Steven Rostedt wrote:
> On Wed, 30 Sep 2020 10:06:24 +0200
> Rasmus Villemoes <[email protected]> wrote:
>
>> True. But remember that printk is called from _everywhere_, with all
>> sorts of locks held and/or preemption disabled or whatnot, and every
>> cycle spent in printk makes those windows wider. Doubling the cost of
>> every single printk by unconditionally doing vsnprintf() twice is a bad
>> idea.
>
> But the console output is usually orders of magnitude more expensive
> than the vsnprintf(); would doing it twice really make a difference?
AFAIU, not every message gets printed to the console directly - syslog(2):
/proc/sys/kernel/printk
       /proc/sys/kernel/printk is a writable file containing four
       integer values that influence kernel printk() behavior when
       printing or logging error messages. The four values are:

       console_loglevel
              Only messages with a log level lower than this value
              will be printed to the console. The default value for
              this field is DEFAULT_CONSOLE_LOGLEVEL (7), but it is
              set to 4 if the kernel command line contains the word
              "quiet",
So the normal state of things is that you don't pay the cost of printing
to the console for all the pr_debug (ok, they may be compiled out or
run-time disabled depending on DYNAMIC_DEBUG etc.), nor info, notice,
warn. For those messages that are not directly written to the console,
the vsnprintf() is a large part of the cost (not exactly half, of
course, so doubling is an exaggeration, but whether it's 70% or 100%
doesn't really matter).
I'm not at all concerned about pr_err and above becoming more expensive,
they are rare. But random drivers are filled with random pr_info in
random contexts - just a small selection from dmesg -x shows these
really important things:
kern :info : [ 4631.338105] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179 - Link status is: 1
kern :info : [ 4642.218100] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179 - Link status is: 0
kern :info : [ 4643.882038] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179 - Link status is: 1
kern :info : [ 4667.562011] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179 - Link status is: 0
...
kern :info : [ 9149.215456] [drm] ring test on 1 succeeded in 1 usecs
kern :info : [ 9149.215459] [drm] ring test on 2 succeeded in 1 usecs
kern :info : [ 9149.215466] [drm] ring test on 3 succeeded in 4 usecs
and if I'm reading the code correctly, the former is even an example of
something that happens in irq context.
Rasmus
On Thu 2020-10-01 09:15:39, Rasmus Villemoes wrote:
> On 30/09/2020 15.35, Steven Rostedt wrote:
> > On Wed, 30 Sep 2020 10:06:24 +0200
> > Rasmus Villemoes <[email protected]> wrote:
> >
> >> True. But remember that printk is called from _everywhere_, with all
> >> sorts of locks held and/or preemption disabled or whatnot, and every
> >> cycle spent in printk makes those windows wider. Doubling the cost of
> >> every single printk by unconditionally doing vsnprintf() twice is a bad
> >> idea.
> >
> > But the console output is usually orders of magnitude more expensive
> > than the vsnprintf(); would doing it twice really make a difference?
>
> AFAIU, not every message gets printed to the console directly - syslog(2):
>
> /proc/sys/kernel/printk
>        /proc/sys/kernel/printk is a writable file containing four
>        integer values that influence kernel printk() behavior when
>        printing or logging error messages. The four values are:
>
>        console_loglevel
>               Only messages with a log level lower than this value
>               will be printed to the console. The default value for
>               this field is DEFAULT_CONSOLE_LOGLEVEL (7), but it is
>               set to 4 if the kernel command line contains the word
>               "quiet",
>
> So the normal state of things is that you don't pay the cost of printing
> to the console for all the pr_debug (ok, they may be compiled out or
> run-time disabled depending on DYNAMIC_DEBUG etc.), nor info, notice,
> warn. For those messages that are not directly written to the console,
> the vsnprintf() is a large part of the cost (not exactly half, of
> course, so doubling is an exaggeration, but whether it's 70% or 100%
> doesn't really matter).
>
> I'm not at all concerned about pr_err and above becoming more expensive,
> they are rare. But random drivers are filled with random pr_info in
> random contexts - just a small selection from dmesg -x shows these
> really important things:
>
> kern :info : [ 4631.338105] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179 - Link status is: 1
> kern :info : [ 4642.218100] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179 - Link status is: 0
> kern :info : [ 4643.882038] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179 - Link status is: 1
> kern :info : [ 4667.562011] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179 - Link status is: 0
> ...
> kern :info : [ 9149.215456] [drm] ring test on 1 succeeded in 1 usecs
> kern :info : [ 9149.215459] [drm] ring test on 2 succeeded in 1 usecs
> kern :info : [ 9149.215466] [drm] ring test on 3 succeeded in 4 usecs
>
> and if I'm reading the code correctly, the former is even an example of
> something that happens in irq context.
As I already wrote, we might optimize this when it causes real-life
problems. And trace_printk() is a better choice for
performance-sensitive debugging.
Best Regards,
Petr