This is the next part of our printk-rework effort (points 2 and 6 of
the LPC 2019 summary [0]).
This series implements a kthread for each registered console. The
kthreads locklessly retrieve the records from the printk ringbuffer
and also do not cause any lock contention between one another. This
allows consoles to run at full speed. For example, a netconsole is
able to dump records much faster than a serial or vt console.
There are situations where kthread printing is not sufficient. For
example, during early boot before kthreads exist, or during panic,
when the kthreads may not get a chance to schedule. In
these cases, the current method of printing directly within the
printk() caller context is used. New functions printk_direct_enter()
and printk_direct_exit() are made available to mark areas of the
kernel where direct printing should be used.
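A minimal usage sketch (the surrounding emergency path is illustrative
only; the enter/exit calls are the functions introduced by this series):

	/* mark a critical section for direct printing */
	printk_direct_enter();	/* printk() now prints from caller context */
	pr_emerg("critical condition, dumping state\n");
	dump_stack();
	printk_direct_exit();	/* printing is handed back to the kthreads */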
This series also introduces pr_flush(): a might_sleep() function
that will block until all active printing threads have caught up
to the latest record at the time of the pr_flush() call. This
function is useful, for example, to wait until pending records
are flushed to consoles before suspending.
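For example (a sketch; the 1000 ms timeout matches what this series
uses at its suspend/resume call sites):

	/* block until consoles catch up, or give up after 1000 ms */
	if (!pr_flush(1000, true))
		pr_warn("consoles were not able to catch up\n");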
And finally, this series introduces a new console flag
CON_MIGHT_SLEEP that can be set by consoles that only want their
write() callback called in sleepable contexts. This was requested
by Daniel Vetter [1] due to the large and growing number of deadlock
situations in the fbcon code.
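A console driver opts in by setting the flag at registration time. A
sketch with a hypothetical driver (the vt console is converted this
way in the final patch of the series):

	static struct console my_console = {
		.name	= "mycon",		/* hypothetical driver */
		.write	= my_console_write,	/* only called from sleepable context */
		.flags	= CON_PRINTBUFFER | CON_MIGHT_SLEEP,
		.index	= -1,
	};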
Note that this series does *not* increase the reliability of console
printing. Rather, it focuses on the non-interference aspect of
printk() by decoupling printk() callers from printing (during normal
operation). Nonetheless, the reliability aspect should not worsen
due to this series (with perhaps the exception of the fbcon, which
will only print from sleepable contexts).
[0] https://lore.kernel.org/lkml/[email protected]
[1] https://lore.kernel.org/all/[email protected]
John Ogness (13):
printk: rename cpulock functions
printk: cpu sync always disable interrupts
printk: use percpu flag instead of cpu_online()
printk: get caller_id/timestamp after migration disable
printk: call boot_delay_msec() in printk_delay()
printk: refactor and rework printing logic
printk: move buffer definitions into console_emit_next_record() caller
printk: add pr_flush()
printk: add functions to allow direct printing
printk: add kthread console printers
printk: reimplement console_lock for proper kthread support
printk: remove @console_locked
console: introduce CON_MIGHT_SLEEP for vt
drivers/tty/sysrq.c | 2 +
drivers/tty/vt/vt.c | 2 +-
include/linux/console.h | 22 +-
include/linux/printk.h | 70 ++--
kernel/hung_task.c | 11 +-
kernel/printk/printk.c | 903 +++++++++++++++++++++++++++++-----------
kernel/rcu/tree_stall.h | 2 +
kernel/reboot.c | 14 +-
kernel/watchdog.c | 4 +
kernel/watchdog_hld.c | 4 +
lib/dump_stack.c | 6 +-
lib/nmi_backtrace.c | 6 +-
12 files changed, 765 insertions(+), 281 deletions(-)
base-commit: a229327733b86aa585effdb0d27a87b12aa51597
--
2.30.2
Deadlocks involving the framebuffer console have been a recurring
issue that keeps getting worse. Daniel Vetter suggested [0] that
fbcon->write() should no longer be called from an atomic context.
Introduce a new console flag CON_MIGHT_SLEEP for a console driver to
specify that it is only called from sleepable contexts. Set the
fbcon to use this new flag.
[0] https://lore.kernel.org/all/[email protected]
Signed-off-by: John Ogness <[email protected]>
---
drivers/tty/vt/vt.c | 2 +-
include/linux/console.h | 1 +
kernel/printk/printk.c | 2 ++
3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 7359c3e80d63..ab4712cc9327 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3161,7 +3161,7 @@ static struct console vt_console_driver = {
.write = vt_console_print,
.device = vt_console_device,
.unblank = unblank_screen,
- .flags = CON_PRINTBUFFER,
+ .flags = CON_PRINTBUFFER|CON_MIGHT_SLEEP,
.index = -1,
};
#endif
diff --git a/include/linux/console.h b/include/linux/console.h
index c51c7f5507a5..ea52c56b3ff8 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -138,6 +138,7 @@ static inline int con_debug_leave(void)
#define CON_BRL (32) /* Used for a braille device */
#define CON_EXTENDED (64) /* Use the extended output format a la /dev/kmsg */
#define CON_PAUSED (128) /* Sleep while console is locked */
+#define CON_MIGHT_SLEEP (256) /* Can only be called from sleepable context */
struct console {
char name[16];
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 086155578f10..b92ef67a5aa2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2836,6 +2836,8 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
if (!console_is_usable(con))
continue;
+ if ((con->flags & CON_MIGHT_SLEEP) && !do_cond_resched)
+ continue;
any_usable = true;
if (con->flags & CON_EXTENDED) {
--
2.30.2
Since the printk cpulock is CPU-reentrant and since it is used
in all contexts, its usage must be carefully considered and
most likely will require programming locklessly. To avoid
mistaking the printk cpulock as a typical lock, rename it to
cpu_sync. The main functions then become:
printk_cpu_sync_get_irqsave(flags);
printk_cpu_sync_put_irqrestore(flags);
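Usage is otherwise unchanged from the cpulock API, for example (as in
the dump_stack() caller updated below):

	unsigned long flags;

	printk_cpu_sync_get_irqsave(flags);
	/* terminal section: program locklessly, take no other locks here */
	__dump_stack(log_lvl);
	printk_cpu_sync_put_irqrestore(flags);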
Signed-off-by: John Ogness <[email protected]>
---
include/linux/printk.h | 42 ++++++++++++-------------
kernel/printk/printk.c | 71 +++++++++++++++++++++---------------------
lib/dump_stack.c | 4 +--
lib/nmi_backtrace.c | 4 +--
4 files changed, 61 insertions(+), 60 deletions(-)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 9497f6b98339..780408b29c13 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -281,43 +281,43 @@ static inline void printk_trigger_flush(void)
#endif
#ifdef CONFIG_SMP
-extern int __printk_cpu_trylock(void);
-extern void __printk_wait_on_cpu_lock(void);
-extern void __printk_cpu_unlock(void);
+extern int __printk_cpu_sync_try_get(void);
+extern void __printk_cpu_sync_wait(void);
+extern void __printk_cpu_sync_put(void);
/**
- * printk_cpu_lock_irqsave() - Acquire the printk cpu-reentrant spinning
- * lock and disable interrupts.
+ * printk_cpu_sync_get_irqsave() - Acquire the printk cpu-reentrant spinning
+ * lock and disable interrupts.
* @flags: Stack-allocated storage for saving local interrupt state,
- * to be passed to printk_cpu_unlock_irqrestore().
+ * to be passed to printk_cpu_sync_put_irqrestore().
*
* If the lock is owned by another CPU, spin until it becomes available.
* Interrupts are restored while spinning.
*/
-#define printk_cpu_lock_irqsave(flags) \
- for (;;) { \
- local_irq_save(flags); \
- if (__printk_cpu_trylock()) \
- break; \
- local_irq_restore(flags); \
- __printk_wait_on_cpu_lock(); \
+#define printk_cpu_sync_get_irqsave(flags) \
+ for (;;) { \
+ local_irq_save(flags); \
+ if (__printk_cpu_sync_try_get()) \
+ break; \
+ local_irq_restore(flags); \
+ __printk_cpu_sync_wait(); \
}
/**
- * printk_cpu_unlock_irqrestore() - Release the printk cpu-reentrant spinning
- * lock and restore interrupts.
- * @flags: Caller's saved interrupt state, from printk_cpu_lock_irqsave().
+ * printk_cpu_sync_put_irqrestore() - Release the printk cpu-reentrant spinning
+ * lock and restore interrupts.
+ * @flags: Caller's saved interrupt state, from printk_cpu_sync_get_irqsave().
*/
-#define printk_cpu_unlock_irqrestore(flags) \
+#define printk_cpu_sync_put_irqrestore(flags) \
do { \
- __printk_cpu_unlock(); \
+ __printk_cpu_sync_put(); \
local_irq_restore(flags); \
- } while (0) \
+ } while (0)
#else
-#define printk_cpu_lock_irqsave(flags) ((void)flags)
-#define printk_cpu_unlock_irqrestore(flags) ((void)flags)
+#define printk_cpu_sync_get_irqsave(flags) ((void)flags)
+#define printk_cpu_sync_put_irqrestore(flags) ((void)flags)
#endif /* CONFIG_SMP */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 155229f0cf0f..d1b773823d63 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3598,26 +3598,26 @@ EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
#endif
#ifdef CONFIG_SMP
-static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
-static atomic_t printk_cpulock_nested = ATOMIC_INIT(0);
+static atomic_t printk_cpu_sync_owner = ATOMIC_INIT(-1);
+static atomic_t printk_cpu_sync_nested = ATOMIC_INIT(0);
/**
- * __printk_wait_on_cpu_lock() - Busy wait until the printk cpu-reentrant
- * spinning lock is not owned by any CPU.
+ * __printk_cpu_sync_wait() - Busy wait until the printk cpu-reentrant
+ * spinning lock is not owned by any CPU.
*
* Context: Any context.
*/
-void __printk_wait_on_cpu_lock(void)
+void __printk_cpu_sync_wait(void)
{
do {
cpu_relax();
- } while (atomic_read(&printk_cpulock_owner) != -1);
+ } while (atomic_read(&printk_cpu_sync_owner) != -1);
}
-EXPORT_SYMBOL(__printk_wait_on_cpu_lock);
+EXPORT_SYMBOL(__printk_cpu_sync_wait);
/**
- * __printk_cpu_trylock() - Try to acquire the printk cpu-reentrant
- * spinning lock.
+ * __printk_cpu_sync_try_get() - Try to acquire the printk cpu-reentrant
+ * spinning lock.
*
* If no processor has the lock, the calling processor takes the lock and
* becomes the owner. If the calling processor is already the owner of the
@@ -3626,7 +3626,7 @@ EXPORT_SYMBOL(__printk_wait_on_cpu_lock);
* Context: Any context. Expects interrupts to be disabled.
* Return: 1 on success, otherwise 0.
*/
-int __printk_cpu_trylock(void)
+int __printk_cpu_sync_try_get(void)
{
int cpu;
int old;
@@ -3636,79 +3636,80 @@ int __printk_cpu_trylock(void)
/*
* Guarantee loads and stores from this CPU when it is the lock owner
* are _not_ visible to the previous lock owner. This pairs with
- * __printk_cpu_unlock:B.
+ * __printk_cpu_sync_put:B.
*
* Memory barrier involvement:
*
- * If __printk_cpu_trylock:A reads from __printk_cpu_unlock:B, then
- * __printk_cpu_unlock:A can never read from __printk_cpu_trylock:B.
+ * If __printk_cpu_sync_try_get:A reads from __printk_cpu_sync_put:B,
+ * then __printk_cpu_sync_put:A can never read from
+ * __printk_cpu_sync_try_get:B.
*
* Relies on:
*
- * RELEASE from __printk_cpu_unlock:A to __printk_cpu_unlock:B
+ * RELEASE from __printk_cpu_sync_put:A to __printk_cpu_sync_put:B
* of the previous CPU
* matching
- * ACQUIRE from __printk_cpu_trylock:A to __printk_cpu_trylock:B
- * of this CPU
+ * ACQUIRE from __printk_cpu_sync_try_get:A to
+ * __printk_cpu_sync_try_get:B of this CPU
*/
- old = atomic_cmpxchg_acquire(&printk_cpulock_owner, -1,
- cpu); /* LMM(__printk_cpu_trylock:A) */
+ old = atomic_cmpxchg_acquire(&printk_cpu_sync_owner, -1,
+ cpu); /* LMM(__printk_cpu_sync_try_get:A) */
if (old == -1) {
/*
* This CPU is now the owner and begins loading/storing
- * data: LMM(__printk_cpu_trylock:B)
+ * data: LMM(__printk_cpu_sync_try_get:B)
*/
return 1;
} else if (old == cpu) {
/* This CPU is already the owner. */
- atomic_inc(&printk_cpulock_nested);
+ atomic_inc(&printk_cpu_sync_nested);
return 1;
}
return 0;
}
-EXPORT_SYMBOL(__printk_cpu_trylock);
+EXPORT_SYMBOL(__printk_cpu_sync_try_get);
/**
- * __printk_cpu_unlock() - Release the printk cpu-reentrant spinning lock.
+ * __printk_cpu_sync_put() - Release the printk cpu-reentrant spinning lock.
*
* The calling processor must be the owner of the lock.
*
* Context: Any context. Expects interrupts to be disabled.
*/
-void __printk_cpu_unlock(void)
+void __printk_cpu_sync_put(void)
{
- if (atomic_read(&printk_cpulock_nested)) {
- atomic_dec(&printk_cpulock_nested);
+ if (atomic_read(&printk_cpu_sync_nested)) {
+ atomic_dec(&printk_cpu_sync_nested);
return;
}
/*
* This CPU is finished loading/storing data:
- * LMM(__printk_cpu_unlock:A)
+ * LMM(__printk_cpu_sync_put:A)
*/
/*
* Guarantee loads and stores from this CPU when it was the
* lock owner are visible to the next lock owner. This pairs
- * with __printk_cpu_trylock:A.
+ * with __printk_cpu_sync_try_get:A.
*
* Memory barrier involvement:
*
- * If __printk_cpu_trylock:A reads from __printk_cpu_unlock:B,
- * then __printk_cpu_trylock:B reads from __printk_cpu_unlock:A.
+ * If __printk_cpu_sync_try_get:A reads from __printk_cpu_sync_put:B,
+ * then __printk_cpu_sync_try_get:B reads from __printk_cpu_sync_put:A.
*
* Relies on:
*
- * RELEASE from __printk_cpu_unlock:A to __printk_cpu_unlock:B
+ * RELEASE from __printk_cpu_sync_put:A to __printk_cpu_sync_put:B
* of this CPU
* matching
- * ACQUIRE from __printk_cpu_trylock:A to __printk_cpu_trylock:B
- * of the next CPU
+ * ACQUIRE from __printk_cpu_sync_try_get:A to
+ * __printk_cpu_sync_try_get:B of the next CPU
*/
- atomic_set_release(&printk_cpulock_owner,
- -1); /* LMM(__printk_cpu_unlock:B) */
+ atomic_set_release(&printk_cpu_sync_owner,
+ -1); /* LMM(__printk_cpu_sync_put:B) */
}
-EXPORT_SYMBOL(__printk_cpu_unlock);
+EXPORT_SYMBOL(__printk_cpu_sync_put);
#endif /* CONFIG_SMP */
diff --git a/lib/dump_stack.c b/lib/dump_stack.c
index 6b7f1bf6715d..83471e81501a 100644
--- a/lib/dump_stack.c
+++ b/lib/dump_stack.c
@@ -102,9 +102,9 @@ asmlinkage __visible void dump_stack_lvl(const char *log_lvl)
* Permit this cpu to perform nested stack dumps while serialising
* against other CPUs
*/
- printk_cpu_lock_irqsave(flags);
+ printk_cpu_sync_get_irqsave(flags);
__dump_stack(log_lvl);
- printk_cpu_unlock_irqrestore(flags);
+ printk_cpu_sync_put_irqrestore(flags);
}
EXPORT_SYMBOL(dump_stack_lvl);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 199ab201d501..d01aec6ae15c 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -99,7 +99,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
* Allow nested NMI backtraces while serializing
* against other CPUs.
*/
- printk_cpu_lock_irqsave(flags);
+ printk_cpu_sync_get_irqsave(flags);
if (!READ_ONCE(backtrace_idle) && regs && cpu_in_idle(instruction_pointer(regs))) {
pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n",
cpu, (void *)instruction_pointer(regs));
@@ -110,7 +110,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
else
dump_stack();
}
- printk_cpu_unlock_irqrestore(flags);
+ printk_cpu_sync_put_irqrestore(flags);
cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
return true;
}
--
2.30.2
boot_delay_msec() is always called immediately before printk_delay()
so just call it from within printk_delay().
Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 206405f349d0..8c9a89c60989 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2002,8 +2002,10 @@ static u8 *__printk_recursion_counter(void)
int printk_delay_msec __read_mostly;
-static inline void printk_delay(void)
+static inline void printk_delay(int level)
{
+ boot_delay_msec(level);
+
if (unlikely(printk_delay_msec)) {
int m = printk_delay_msec;
@@ -2224,8 +2226,7 @@ asmlinkage int vprintk_emit(int facility, int level,
in_sched = true;
}
- boot_delay_msec(level);
- printk_delay();
+ printk_delay(level);
printed_len = vprintk_store(facility, level, dev_info, fmt, args);
--
2.30.2
The CON_ANYTIME console flag is used to label consoles that will
work correctly before percpu resources are allocated. To check
the condition, cpu_online(raw_smp_processor_id()) was used.
However, this is odd because CPUs can go offline at a later point.
Also, the function is forced to use the raw_ variant because
migration is not disabled.
Since commit ab6f762f0f53 ("printk: queue wake_up_klogd irq_work
only if per-CPU areas are ready") there is a variable to identify
if percpu resources have been allocated. Use that variable instead
of cpu_online(raw_smp_processor_id()).
Signed-off-by: John Ogness <[email protected]>
---
include/linux/console.h | 2 +-
kernel/printk/printk.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/console.h b/include/linux/console.h
index 7cd758a4f44e..aa4d3fb4c1c5 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -133,7 +133,7 @@ static inline int con_debug_leave(void)
#define CON_CONSDEV (2) /* Preferred console, /dev/console */
#define CON_ENABLED (4)
#define CON_BOOT (8)
-#define CON_ANYTIME (16) /* Safe to call when cpu is offline */
+#define CON_ANYTIME (16) /* Safe to call before per-cpu resources ready */
#define CON_BRL (32) /* Used for a braille device */
#define CON_EXTENDED (64) /* Use the extended output format a la /dev/kmsg */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d1b773823d63..b346e60e9e51 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2577,11 +2577,11 @@ static int have_callable_console(void)
*
* Console drivers may assume that per-cpu resources have been allocated. So
* unless they're explicitly marked as being able to cope (CON_ANYTIME) don't
- * call them until this CPU is officially up.
+ * call them until per-cpu resources have been allocated.
*/
static inline int can_use_console(void)
{
- return cpu_online(raw_smp_processor_id()) || have_callable_console();
+ return (printk_percpu_data_ready() || have_callable_console());
}
/**
--
2.30.2
Provide a might-sleep function to allow waiting for threaded console
printers to catch up to the latest logged message.
Use pr_flush() whenever it is desirable to get buffered messages
printed before continuing: suspend_console(), resume_console(),
console_stop(), console_start(), console_unblank().
Signed-off-by: John Ogness <[email protected]>
---
include/linux/printk.h | 7 ++++
kernel/printk/printk.c | 73 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 79 insertions(+), 1 deletion(-)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index cd8192e611a1..6596f02d1f05 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -170,6 +170,8 @@ extern void __printk_safe_exit(void);
#define printk_deferred_enter __printk_safe_enter
#define printk_deferred_exit __printk_safe_exit
+extern bool pr_flush(int timeout_ms, bool reset_on_progress);
+
/*
* Please don't use printk_ratelimit(), because it shares ratelimiting state
* with all other unrelated printk_ratelimit() callsites. Instead use
@@ -224,6 +226,11 @@ static inline void printk_deferred_exit(void)
{
}
+static inline bool pr_flush(int timeout_ms, bool reset_on_progress)
+{
+ return true;
+}
+
static inline int printk_ratelimit(void)
{
return 0;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 02bde45c1149..1e80fd052bd5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2449,6 +2449,7 @@ void suspend_console(void)
if (!console_suspend_enabled)
return;
pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
+ pr_flush(1000, true);
console_lock();
console_suspended = 1;
up_console_sem();
@@ -2461,6 +2462,7 @@ void resume_console(void)
down_console_sem();
console_suspended = 0;
console_unlock();
+ pr_flush(1000, true);
}
/**
@@ -2802,8 +2804,10 @@ void console_unblank(void)
if (oops_in_progress) {
if (down_trylock_console_sem() != 0)
return;
- } else
+ } else {
+ pr_flush(1000, true);
console_lock();
+ }
console_locked = 1;
console_may_schedule = 0;
@@ -2869,6 +2873,7 @@ struct tty_driver *console_device(int *index)
*/
void console_stop(struct console *console)
{
+ pr_flush(1000, true);
console_lock();
console->flags &= ~CON_ENABLED;
console_unlock();
@@ -2880,6 +2885,7 @@ void console_start(struct console *console)
console_lock();
console->flags |= CON_ENABLED;
console_unlock();
+ pr_flush(1000, true);
}
EXPORT_SYMBOL(console_start);
@@ -3249,6 +3255,71 @@ static int __init printk_late_init(void)
late_initcall(printk_late_init);
#if defined CONFIG_PRINTK
+/**
+ * pr_flush() - Wait for printing threads to catch up.
+ *
+ * @timeout_ms: The maximum time (in ms) to wait.
+ * @reset_on_progress: Reset the timeout if forward progress is seen.
+ *
+ * A value of 0 for @timeout_ms means no waiting will occur. A value of -1
+ * represents infinite waiting.
+ *
+ * If @reset_on_progress is true, the timeout will be reset whenever any
+ * printer has been seen to make some forward progress.
+ *
+ * Context: Process context. May sleep while acquiring console lock.
+ * Return: true if all enabled printers are caught up.
+ */
+bool pr_flush(int timeout_ms, bool reset_on_progress)
+{
+ int remaining = timeout_ms;
+ struct console *con;
+ u64 last_diff = 0;
+ u64 printk_seq;
+ u64 diff;
+ u64 seq;
+
+ might_sleep();
+
+ seq = prb_next_seq(prb);
+
+ for (;;) {
+ diff = 0;
+
+ console_lock();
+ for_each_console(con) {
+ if (!console_is_usable(con))
+ continue;
+ printk_seq = con->seq;
+ if (printk_seq < seq)
+ diff += seq - printk_seq;
+ }
+ console_unlock();
+
+ if (diff != last_diff && reset_on_progress)
+ remaining = timeout_ms;
+
+ if (diff == 0 || remaining == 0)
+ break;
+
+ if (remaining < 0) {
+ /* no timeout limit */
+ msleep(100);
+ } else if (remaining < 100) {
+ msleep(remaining);
+ remaining = 0;
+ } else {
+ msleep(100);
+ remaining -= 100;
+ }
+
+ last_diff = diff;
+ }
+
+ return (diff == 0);
+}
+EXPORT_SYMBOL(pr_flush);
+
/*
* Delayed printk version, for scheduler-internal messages:
*/
--
2.30.2
Currently the local CPU timestamp and caller_id for the record are
collected while migration is enabled. Since this information is
CPU-specific, it should be collected with migration disabled.
Migration is disabled immediately after collecting this information
anyway, so just move the information collection to after migration
is disabled.
Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b346e60e9e51..206405f349d0 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2017,7 +2017,7 @@ static inline void printk_delay(void)
static inline u32 printk_caller_id(void)
{
return in_task() ? task_pid_nr(current) :
- 0x80000000 + raw_smp_processor_id();
+ 0x80000000 + smp_processor_id();
}
/**
@@ -2099,7 +2099,6 @@ int vprintk_store(int facility, int level,
const struct dev_printk_info *dev_info,
const char *fmt, va_list args)
{
- const u32 caller_id = printk_caller_id();
struct prb_reserved_entry e;
enum printk_info_flags flags = 0;
struct printk_record r;
@@ -2109,10 +2108,14 @@ int vprintk_store(int facility, int level,
u8 *recursion_ptr;
u16 reserve_size;
va_list args2;
+ u32 caller_id;
u16 text_len;
int ret = 0;
u64 ts_nsec;
+ if (!printk_enter_irqsave(recursion_ptr, irqflags))
+ return 0;
+
/*
* Since the duration of printk() can vary depending on the message
* and state of the ringbuffer, grab the timestamp now so that it is
@@ -2121,8 +2124,7 @@ int vprintk_store(int facility, int level,
*/
ts_nsec = local_clock();
- if (!printk_enter_irqsave(recursion_ptr, irqflags))
- return 0;
+ caller_id = printk_caller_id();
/*
* The sprintf needs to come first since the syslog prefix might be
--
2.30.2
With non-threaded console printers, preemption is disabled while
holding the console lock in order to avoid the situation where the
console printer is scheduled away and no other task can lock the
console (for printing or otherwise). Disabling preemption is
necessary because the console lock is implemented purely as a
semaphore, which has no owner.
Like non-threaded console printers, kthread printers use the
console lock to synchronize during printing. However, since they
use console_lock() instead of a best-effort console_trylock(), it
is not possible to disable preemption upon locking. Therefore an
alternative for synchronizing and avoiding the above mentioned
situation is needed.
The kthread printers do not need to synchronize against each other,
but they do need to synchronize against console_lock() callers. To
provide this synchronization, introduce a per-console mutex. The
mutex is taken by the kthread printer during printing and is also
taken by console_lock() callers. Since mutexes have owners, when
calling console_lock(), the scheduler is able to schedule any
kthread printers that may have been preempted while printing.
Rather than console_lock() callers holding the per-console mutex
for the duration of the console lock, the per-console mutex is only
taken in order to set a new CON_PAUSED flag, which is checked by
the kthread printers. This avoids any issues due to nested locking
between the various per-console mutexes.
The kthread printers must also synchronize against console_trylock()
callers. Since console_trylock() is non-blocking, a global atomic
counter will be used to identify if any kthread printers are active.
The kthread printers will also check the atomic counter to identify
if the console has been locked by another task via
console_trylock().
A locking overview for console_lock(), console_trylock(), and the
kthread printers is as follows (pseudo code):
console_lock()
{
down(&console_sem);
for_each_console(con) {
mutex_lock(&con->lock);
con->flags |= CON_PAUSED;
mutex_unlock(&con->lock);
}
}
console_trylock()
{
assert(down_trylock(&console_sem));
assert(atomic_cmpxchg(&console_lock_count, 0, -1));
}
kthread_printer()
{
mutex_lock(&con->lock);
assert(con->flags & CON_PAUSED);
assert(atomic_inc_unless_negative(&console_lock_count));
con->write();
atomic_dec(&console_lock_count);
mutex_unlock(&con->lock);
}
Also note that the console owner and waiter logic now only applies
between contexts that have both taken the console lock via
console_trylock(). This is for two reasons:
1. Contexts that have taken the console lock via console_lock()
require a sleepable context when unlocking to unpause the kthread
printers. But a waiter context has used console_trylock() and
may not be sleepable.
2. The kthread printers no longer acquire the console lock, so it is
not possible to handover the console lock.
This also has implications for console_unlock(), which attempts a
console_trylock() before returning. Introduce
console_trylock_sched() to allow console_unlock() to specify if it
is in a sleepable context.
Signed-off-by: John Ogness <[email protected]>
---
include/linux/console.h | 15 ++++
kernel/printk/printk.c | 190 +++++++++++++++++++++++++++++++---------
2 files changed, 166 insertions(+), 39 deletions(-)
diff --git a/include/linux/console.h b/include/linux/console.h
index 0f94b1771df8..c51c7f5507a5 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -16,6 +16,7 @@
#include <linux/atomic.h>
#include <linux/types.h>
+#include <linux/mutex.h>
struct vc_data;
struct console_font_op;
@@ -136,6 +137,7 @@ static inline int con_debug_leave(void)
#define CON_ANYTIME (16) /* Safe to call before per-cpu resources ready */
#define CON_BRL (32) /* Used for a braille device */
#define CON_EXTENDED (64) /* Use the extended output format a la /dev/kmsg */
+#define CON_PAUSED (128) /* Sleep while console is locked */
struct console {
char name[16];
@@ -155,6 +157,19 @@ struct console {
unsigned long dropped;
struct task_struct *thread;
+ /*
+ * The per-console lock is used by printing kthreads to synchronize
+ * this console with callers of console_lock(). This is necessary in
+ * order to allow printing kthreads to run in parallel to each other,
+ * while each safely accessing their own @flags and synchronizing
+ * against direct printing via console_lock/console_unlock.
+ *
+ * Note: For synchronizing against direct printing via
+ * console_trylock/console_unlock, see the static global
+ * variable @console_lock_count.
+ */
+ struct mutex lock;
+
void *data;
struct console *next;
};
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e182f31fec58..135fbe647092 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -214,6 +214,26 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
/* Number of registered extended console drivers. */
static int nr_ext_console_drivers;
+/*
+ * Used to synchronize printing kthreads against direct printing via
+ * console_trylock/console_unlock.
+ *
+ * Values:
+ * -1 = console locked (via trylock), kthreads will not print
+ * 0 = no kthread printing, console not locked (via trylock)
+ * >0 = kthread(s) actively printing
+ *
+ * Note: For synchronizing against direct printing via
+ * console_lock/console_unlock, see the @lock variable in
+ * struct console.
+ */
+static atomic_t console_lock_count = ATOMIC_INIT(0);
+
+#define console_excl_trylock() (atomic_cmpxchg(&console_lock_count, 0, -1) == 0)
+#define console_excl_unlock() atomic_cmpxchg(&console_lock_count, -1, 0)
+#define console_printer_tryenter() atomic_inc_unless_negative(&console_lock_count)
+#define console_printer_exit() atomic_dec(&console_lock_count)
+
/*
* Helper macros to handle lockdep when locking/unlocking console_sem. We use
* macros instead of functions so that _RET_IP_ contains useful information.
@@ -256,6 +276,37 @@ static void __up_console_sem(unsigned long ip)
}
#define up_console_sem() __up_console_sem(_RET_IP_)
+/*
+ * Tracks whether kthread printers are all paused. A value of true implies
+ * that the console is locked via console_lock() or the console is suspended.
+ * Reading and writing to this variable requires holding @console_sem.
+ */
+static bool consoles_paused;
+
+/*
+ * Pause or unpause all kthread printers.
+ *
+ * Requires the console_lock.
+ */
+static void __pause_all_consoles(bool do_pause)
+{
+ struct console *con;
+
+ for_each_console(con) {
+ mutex_lock(&con->lock);
+ if (do_pause)
+ con->flags |= CON_PAUSED;
+ else
+ con->flags &= ~CON_PAUSED;
+ mutex_unlock(&con->lock);
+ }
+
+ consoles_paused = do_pause;
+}
+
+#define pause_all_consoles() __pause_all_consoles(true)
+#define unpause_all_consoles() __pause_all_consoles(false)
+
/*
* This is used for debugging the mess that is the VT code by
* keeping track if we have the console semaphore held. It's
@@ -2506,10 +2557,6 @@ void resume_console(void)
down_console_sem();
console_suspended = 0;
console_unlock();
-
- /* Wake the kthread printers. */
- wake_up_klogd();
-
pr_flush(1000, true);
}
@@ -2547,6 +2594,7 @@ void console_lock(void)
down_console_sem();
if (console_suspended)
return;
+ pause_all_consoles();
console_locked = 1;
console_may_schedule = 1;
}
@@ -2568,15 +2616,45 @@ int console_trylock(void)
up_console_sem();
return 0;
}
+ if (!console_excl_trylock()) {
+ up_console_sem();
+ return 0;
+ }
console_locked = 1;
console_may_schedule = 0;
return 1;
}
EXPORT_SYMBOL(console_trylock);
+/*
+ * A variant of console_trylock() that allows specifying if the context may
+ * sleep. If yes, a trylock on @console_sem is attempted and if successful,
+ * the threaded printers are paused. This is important to ensure that
+ * sleepable contexts do not become involved in console_lock handovers and
+ * will call cond_resched() during the printing loop.
+ */
+static int console_trylock_sched(bool may_schedule)
+{
+ if (!may_schedule)
+ return console_trylock();
+
+ might_sleep();
+
+ if (down_trylock_console_sem())
+ return 0;
+ if (console_suspended) {
+ up_console_sem();
+ return 0;
+ }
+ pause_all_consoles();
+ console_locked = 1;
+ console_may_schedule = 1;
+ return 1;
+}
+
int is_console_locked(void)
{
- return console_locked;
+ return (console_locked || atomic_read(&console_lock_count));
}
EXPORT_SYMBOL(is_console_locked);
@@ -2610,6 +2688,19 @@ static inline bool console_is_usable(struct console *con)
static void __console_unlock(void)
{
console_locked = 0;
+
+ /*
+ * Depending on whether console_lock() or console_trylock() was used,
+ * appropriately allow the kthread printers to continue.
+ */
+ if (consoles_paused)
+ unpause_all_consoles();
+ else
+ console_excl_unlock();
+
+ /* Wake the kthread printers. */
+ wake_up_klogd();
+
up_console_sem();
}
@@ -2632,7 +2723,8 @@ static void __console_unlock(void)
*
* @handover will be set to true if a printk waiter has taken over the
* console_lock, in which case the caller is no longer holding the
- * console_lock. Otherwise it is set to false.
+ * console_lock. Otherwise it is set to false. A NULL pointer may be provided
+ * to disable allowing the console_lock to be taken over by a printk waiter.
*/
static bool console_emit_next_record(struct console *con, char *text, char *ext_text,
char *dropped_text, bool *handover)
@@ -2640,12 +2732,14 @@ static bool console_emit_next_record(struct console *con, char *text, char *ext_
struct printk_info info;
struct printk_record r;
unsigned long flags;
+ bool allow_handover;
char *write_text;
size_t len;
prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
- *handover = false;
+ if (handover)
+ *handover = false;
if (!prb_read_valid(prb, con->seq, &r))
return false;
@@ -2671,18 +2765,23 @@ static bool console_emit_next_record(struct console *con, char *text, char *ext_
len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
}
- /*
- * While actively printing out messages, if another printk()
- * were to occur on another CPU, it may wait for this one to
- * finish. This task can not be preempted if there is a
- * waiter waiting to take over.
- *
- * Interrupts are disabled because the hand over to a waiter
- * must not be interrupted until the hand over is completed
- * (@console_waiter is cleared).
- */
- printk_safe_enter_irqsave(flags);
- console_lock_spinning_enable();
+ /* Handovers may only happen between trylock contexts. */
+ allow_handover = (handover && atomic_read(&console_lock_count) == -1);
+
+ if (allow_handover) {
+ /*
+ * While actively printing out messages, if another printk()
+ * were to occur on another CPU, it may wait for this one to
+ * finish. This task can not be preempted if there is a
+ * waiter waiting to take over.
+ *
+ * Interrupts are disabled because the hand over to a waiter
+ * must not be interrupted until the hand over is completed
+ * (@console_waiter is cleared).
+ */
+ printk_safe_enter_irqsave(flags);
+ console_lock_spinning_enable();
+ }
stop_critical_timings(); /* don't trace print latency */
call_console_driver(con, write_text, len, dropped_text);
@@ -2690,8 +2789,10 @@ static bool console_emit_next_record(struct console *con, char *text, char *ext_
con->seq++;
- *handover = console_lock_spinning_disable_and_check();
- printk_safe_exit_irqrestore(flags);
+ if (allow_handover) {
+ *handover = console_lock_spinning_disable_and_check();
+ printk_safe_exit_irqrestore(flags);
+ }
printk_delay(r.info->level);
skip:
@@ -2825,7 +2926,7 @@ void console_unlock(void)
* Re-check if there is a new record to flush. If the trylock
* fails, another context is already handling the printing.
*/
- } while (prb_read_valid(prb, next_seq, NULL) && console_trylock());
+ } while (prb_read_valid(prb, next_seq, NULL) && console_trylock_sched(do_cond_resched));
}
EXPORT_SYMBOL(console_unlock);
@@ -2856,6 +2957,10 @@ void console_unblank(void)
if (oops_in_progress) {
if (down_trylock_console_sem() != 0)
return;
+ if (!console_excl_trylock()) {
+ up_console_sem();
+ return;
+ }
} else {
pr_flush(1000, true);
console_lock();
@@ -2937,10 +3042,6 @@ void console_start(struct console *console)
console_lock();
console->flags |= CON_ENABLED;
console_unlock();
-
- /* Wake the kthread printers. */
- wake_up_klogd();
-
pr_flush(1000, true);
}
EXPORT_SYMBOL(console_start);
@@ -3135,7 +3236,11 @@ void register_console(struct console *newcon)
if (newcon->flags & CON_EXTENDED)
nr_ext_console_drivers++;
+ if (consoles_paused)
+ newcon->flags |= CON_PAUSED;
+
newcon->dropped = 0;
+ mutex_init(&newcon->lock);
if (newcon->flags & CON_PRINTBUFFER) {
/* Get a consistent copy of @syslog_seq. */
mutex_lock(&syslog_lock);
@@ -3397,16 +3502,17 @@ static bool printer_should_wake(struct console *con, u64 seq)
if (kthread_should_stop())
return true;
- if (console_suspended)
- return false;
-
/*
* This is an unsafe read to con->flags, but false positives
* are not an issue as long as they are rare.
*/
flags = data_race(READ_ONCE(con->flags));
- if (!(flags & CON_ENABLED))
+
+ if (!(flags & CON_ENABLED) ||
+ (flags & CON_PAUSED) ||
+ atomic_read(&console_lock_count) == -1) {
return false;
+ }
return prb_read_valid(prb, seq, NULL);
}
@@ -3417,7 +3523,6 @@ static int printk_kthread_func(void *data)
char *dropped_text = NULL;
char *ext_text = NULL;
bool progress;
- bool handover;
u64 seq = 0;
char *text;
int error;
@@ -3450,9 +3555,17 @@ static int printk_kthread_func(void *data)
continue;
do {
- console_lock();
- if (console_suspended) {
- console_unlock();
+ error = mutex_lock_interruptible(&con->lock);
+ if (error)
+ break;
+
+ if (!console_is_usable(con)) {
+ mutex_unlock(&con->lock);
+ break;
+ }
+
+ if ((con->flags & CON_PAUSED) || !console_printer_tryenter()) {
+ mutex_unlock(&con->lock);
break;
}
@@ -3466,14 +3579,13 @@ static int printk_kthread_func(void *data)
*/
console_may_schedule = 0;
progress = console_emit_next_record(con, text, ext_text,
- dropped_text, &handover);
- if (handover)
- break;
+ dropped_text, NULL);
seq = con->seq;
- /* Unlock console without invoking direct printing. */
- __console_unlock();
+ console_printer_exit();
+
+ mutex_unlock(&con->lock);
} while (progress);
}
out:
--
2.30.2
The CPU sync functions are a NOP for !CONFIG_SMP. But for
!CONFIG_SMP they still need to disable interrupts in order to
preserve context within the CPU sync sections.
Signed-off-by: John Ogness <[email protected]>
---
include/linux/printk.h | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 780408b29c13..cd8192e611a1 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -285,9 +285,16 @@ extern int __printk_cpu_sync_try_get(void);
extern void __printk_cpu_sync_wait(void);
extern void __printk_cpu_sync_put(void);
+#else
+
+#define __printk_cpu_sync_try_get() true
+#define __printk_cpu_sync_wait()
+#define __printk_cpu_sync_put()
+#endif /* CONFIG_SMP */
+
/**
- * printk_cpu_sync_get_irqsave() - Acquire the printk cpu-reentrant spinning
- * lock and disable interrupts.
+ * printk_cpu_sync_get_irqsave() - Disable interrupts and acquire the printk
+ * cpu-reentrant spinning lock.
* @flags: Stack-allocated storage for saving local interrupt state,
* to be passed to printk_cpu_sync_put_irqrestore().
*
@@ -314,13 +321,6 @@ extern void __printk_cpu_sync_put(void);
local_irq_restore(flags); \
} while (0)
-#else
-
-#define printk_cpu_sync_get_irqsave(flags) ((void)flags)
-#define printk_cpu_sync_put_irqrestore(flags) ((void)flags)
-
-#endif /* CONFIG_SMP */
-
extern int kptr_restrict;
/**
--
2.30.2
On Mon 2022-02-07 20:49:11, John Ogness wrote:
> Since the printk cpulock is CPU-reentrant and since it is used
> in all contexts, its usage must be carefully considered and
> most likely will require programming locklessly. To avoid
> mistaking the printk cpulock as a typical lock, rename it to
> cpu_sync. The main functions then become:
>
> printk_cpu_sync_get_irqsave(flags);
> printk_cpu_sync_put_irqrestore(flags);
It is possible that I will understand the motivation later when
reading the entire patchset. But my initial reaction is confusion ;-)
From my POV, it is a lock. It tries to get exclusive access and
has to wait until the current owner releases it.
As you say: "its usage must be carefully considered and most likely
will require programming locklessly." I guess that it is related to:
+ There is a risk of deadlocks that are typically associated with
locks. After all the word "lock" is part of "deadlock".
+ It requires lockless programming because it is supposed to be
a terminal lock. It means that no other locks should be taken
under it.
Is there any other API using this naming scheme, please?
I have get() and put() associated with reference counting. But it has
an opposite meaning. It usually guards an object from freeing as long
as there is at least one user. And it allows many users.
Regarding the reentrancy: it seems that the "_nested" suffix is used
for this type of lock, for example, mutex_lock_nested(),
spin_lock_nested().
It might be enough to add a "_nested" suffix and explain why it has
to be used carefully (terminal lock) in a comment.
But I might miss something.
Best Regards,
Petr
On 2022-02-11, Petr Mladek <[email protected]> wrote:
> On Mon 2022-02-07 20:49:11, John Ogness wrote:
>> Since the printk cpulock is CPU-reentrant and since it is used
>> in all contexts, its usage must be carefully considered and
>> most likely will require programming locklessly. To avoid
>> mistaking the printk cpulock as a typical lock, rename it to
>> cpu_sync. The main functions then become:
>>
>> printk_cpu_sync_get_irqsave(flags);
>> printk_cpu_sync_put_irqrestore(flags);
>
> It is possible that I will understand the motivation later when
> reading the entire patchset. But my initial reaction is confusion ;-)
Actually, the motivation comes from a discussion we had during the RT
Track at Plumbers 2021 [0]. It isn't a lock and so we didn't want to
call it a lock. (More below.)
> From my POV, it is a lock. It tries to get exclusive access and
> has to wait until the current owner releases it.
It is only exclusive for a CPU. If another context on that CPU tries to
get the "lock" it will succeed. For example:
process context lock() -> success
--- INTERRUPT ---
irq context lock() -> success
--- NMI ---
nmi context lock() -> success
None of these contexts can assume that they have synchronized access
because clearly they have all interrupted each other. If an object does
not provide synchronized access to data, then "lock" is probably not a
good name for that object.
> As you say: "its usage must be carefully considered and most likely
> will require programming locklessly." I guess that it is related to:
>
> + There is a risk of deadlocks that are typically associated with
> locks. After all the word "lock" is part of "deadlock".
>
> + It requires lockless programming because it is supposed to be
> terminal lock. It means that no other locks should be taken
> under it.
It is because (as in the example above), taking this "lock" does not
provide synchronization to data. It is only synchronizing between
CPUs. It was Steven's suggestion to call the thing a cpu_sync object and
nobody in the RT Track seemed to disagree.
> Is there any other API using this naming scheme, please?
No.
> I have get() and put() associated with reference counting. But it has
> an opposite meaning. It usually guards an object from freeing as long
> as there is at least one user. And it allows many users.
This _is_ reference counting. In fact, if you look at the implementation
you see:
atomic_inc(&printk_cpu_sync_nested);
It is allowing multiple users (from the same CPU).
> Regarding the reentrancy: it seems that the "_nested" suffix is used
> for this type of lock, for example, mutex_lock_nested(),
> spin_lock_nested().
>
> It might be enough to add a "_nested" suffix and explain why it has
> to be used carefully (terminal lock) in a comment.
The internal counter is called "_nested" to make it clear to us printk
developers. IMO the common _get and _put semantics are appropriate
here. The important thing is that the word "lock" is removed. It is not
a lock.
John
[0] https://youtu.be/cZUzc0U1jJ4?t=12946
On Mon 2022-02-07 20:49:12, John Ogness wrote:
> The CPU sync functions are a NOP for !CONFIG_SMP. But for
> !CONFIG_SMP they still need to disable interrupts in order to
> preserve context within the CPU sync sections.
>
> Signed-off-by: John Ogness <[email protected]>
Good catch!
It is interesting that the original code did not disable interrupts
around __dump_stack() on non-SMP, see the commit 766c268bc6d39b8124
("lib/dump_stack: move cpu lock to printk.c"). But it was rather
a bug. The patch makes perfect sense.
Reviewed-by: Petr Mladek <[email protected]>
Best Regards,
Petr
On Fri, Feb 11, 2022 at 03:57:27PM -0500, Steven Rostedt wrote:
> On Fri, 11 Feb 2022 15:48:08 +0106
> John Ogness <[email protected]> wrote:
>
> > It is because (as in the example above), taking this "lock" does not
> > provide synchronization to data. It is only synchronizing between
> > CPUs. It was Steven's suggestion to call the thing a cpu_sync object and
> > nobody in the RT Track seemed to disagree.
>
> I love causing trouble ;-)
>
> Actually, it wasn't just my suggestion. IIRC, I believe Peter Zijlstra was
> against calling it a lock (Peter, you can use lore to see the context here).
All I remember is that it was in a room and I was late, I can't even
remember what City we were all in at the time. Was this Lisbon?
Anyway, as Steve said, it isn't really a strict exclusion thing, it only
avoids the most egregious inter-cpu interleaving. I'm down with
goldi-locks, something has to have that name :-)
On Mon 2022-02-07 20:49:13, John Ogness wrote:
> The CON_ANYTIME console flag is used to label consoles that will
> work correctly before percpu resources are allocated. To check
> the condition, cpu_online(raw_smp_processor_id()) was used.
> However, this is odd because CPUs can go offline at a later point.
> Also, the function is forced to use the raw_ variant because
> migration is not disabled.
>
> Since commit ab6f762f0f53 ("printk: queue wake_up_klogd irq_work
> only if per-CPU areas are ready") there is a variable to identify
> if percpu resources have been allocated. Use that variable instead
> of cpu_online(raw_smp_processor_id()).
>
> Signed-off-by: John Ogness <[email protected]>
I dug more into the history to be more confident about the change.
I am not sure that it is safe.
CON_ANYTIME really seems to be related to per-CPU variables and early
boot, see the commit 76a8ad293912cd2f01eca ("[PATCH] Make printk work
for really early debugging").
I am not 100% sure. I think that the support for per-CPU variables
is enabled by setup_per_cpu_areas(). It is done for_each_possible_cpu(),
so per-CPU variables should always be available after
printk_percpu_data_ready(). So far, so good.
The problem is the commit ac25575203c11145066ea ("[PATCH] CPU hotplug
printk fix"). It suggests that per-CPU data of slab are freed during
hotplug.
There are many other things that are manipulated during cpu hotplug.
And there are the two notifiers "printk:dead" and "printk:online",
see printk_late_init(). Maybe, we should use them to decide whether
the non-trivial consoles are callable during CPU hotplug.
Best Regards,
Petr
On (22/02/11 17:05), Petr Mladek wrote:
> I dug more into the history to be more confident about the change.
> I am not sure that it is safe.
>
> CON_ANYTIME really seems to be related to per-CPU variables and early
> boot, see the commit 76a8ad293912cd2f01eca ("[PATCH] Make printk work
> for really early debugging").
Hmm... This looks a bit tricky. It seems that the commit in question
is indeed concerned with per-CPU variable access; not printk()'s per-CPU
access though, but per-CPU access from con->write(). I guess
console_unlock() did not use to access per-CPU variables before, but it
does now, so can_use_console() in fact protects the per-CPU access of
both printk() (which certainly happens) and con->write() (do we even
have it?):
console_unlock()
{
if (!can_use_console())
return;
for (;;) {
printk_safe_enter_irqsave() -> __this_cpu()
printk_safe_exit_irqrestore() -> __this_cpu()
}
}
On (22/02/11 17:05), Petr Mladek wrote:
> On Mon 2022-02-07 20:49:13, John Ogness wrote:
[..]
> The problem is the commit ac25575203c11145066ea ("[PATCH] CPU hotplug
> printk fix"). It suggests that per-CPU data of slab are freed during
> hotplug.
>
> There are many other things that are manipulated during cpu hotplug.
> And there are the two notifiers "printk:dead" and "printk:online",
> see printk_late_init(). Maybe, we should use them to decide whether
> the non-trivial consoles are callable during CPU hotplug.
Great findings. Looks like we only set __printk_percpu_data_ready to
true and never set it back to false, relying on cpu_online() in such
cases. But here's the thing: we have printk_percpu_data_ready() in
__printk_recursion_counter() and in wake_up_klogd() and in
defer_console_output(), but why do we never check __printk_percpu_data_ready
in __down_trylock_console_sem()/__up_console_sem() and, more importantly,
in console_trylock_spinning()? Those do access this_cpu() in printk safe
enter/exit. Am I missing something?
On (22/02/11 13:58), Petr Mladek wrote:
> On Mon 2022-02-07 20:49:12, John Ogness wrote:
> > The CPU sync functions are a NOP for !CONFIG_SMP. But for
> > !CONFIG_SMP they still need to disable interrupts in order to
> > preserve context within the CPU sync sections.
> >
> > Signed-off-by: John Ogness <[email protected]>
>
> Good catch!
>
> It is interesting that the original code did not disable interrupts
> around __dump_stack() on non-SMP, see the commit 766c268bc6d39b8124
> ("lib/dump_stack: move cpu lock to printk.c"). But it was rather
> a bug. The patch makes perfect sense.
>
> Reviewed-by: Petr Mladek <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
On Fri, 11 Feb 2022 15:48:08 +0106
John Ogness <[email protected]> wrote:
> It is because (as in the example above), taking this "lock" does not
> provide synchronization to data. It is only synchronizing between
> CPUs. It was Steven's suggestion to call the thing a cpu_sync object and
> nobody in the RT Track seemed to disagree.
I love causing trouble ;-)
Actually, it wasn't just my suggestion. IIRC, I believe Peter Zijlstra was
against calling it a lock (Peter, you can use lore to see the context here).
And besides calling it "goldie locks" I suggested calling it something that
has to do with "synchronization" instead of true exclusiveness. And name it
something to do with keeping one CPU from stepping on another CPU. And the
entire group agreed to not call it a lock, but instead a CPU
synchronization object.
Linux Plumbers RT MC summary:
https://lpc.events/event/11/page/104-accepted-microconferences#summary-rt
Printk topic video:
https://youtu.be/5jkvV4A2kT8?list=PLVsQ_xZBEyN3wA8Ej4BUjudXFbXuxhnfc&t=9217
Printk topic slides:
https://linuxplumbersconf.org/event/11/contributions/1060/attachments/829/1600/lpc2021_rtmc_ogness.pdf
(This is why I pushed all the MCs to provide summaries!)
-- Steve
On 2022-02-14, Sergey Senozhatsky <[email protected]> wrote:
> printk has been using enter/exit naming for a while now (starting with
> nmi enter/exit, then printk_safe enter/exit and soon direct
> enter/exit); so may be we can follow suit here and use
> printk_cpu_sync_enter() and printk_cpu_sync_exit()?
I like this suggestion. So it would be:
printk_cpu_sync_enter_irqsave()
printk_cpu_sync_exit_irqrestore()
On (22/02/11 13:44), Petr Mladek wrote:
> On Mon 2022-02-07 20:49:11, John Ogness wrote:
> > Since the printk cpulock is CPU-reentrant and since it is used
> > in all contexts, its usage must be carefully considered and
> > most likely will require programming locklessly. To avoid
> > mistaking the printk cpulock as a typical lock, rename it to
> > cpu_sync. The main functions then become:
> >
> > printk_cpu_sync_get_irqsave(flags);
> > printk_cpu_sync_put_irqrestore(flags);
>
> It is possible that I will understand the motivation later when
> reading the entire patchset. But my initial reaction is confusion ;-)
>
> From my POV, it is a lock. It tries to get exclusive access and
> has to wait until the current owner releases it.
printk has been using enter/exit naming for a while now (starting with
nmi enter/exit, then printk_safe enter/exit and soon direct enter/exit);
so may be we can follow suit here and use printk_cpu_sync_enter() and
printk_cpu_sync_exit()?
> After all the word "lock" is part of "deadlock".
That's a good one.
On (22/02/07 20:49), John Ogness wrote:
> Currently the local CPU timestamp and caller_id for the record are
> collected while migration is enabled. Since this information is
> CPU-specific, it should be collected with migration disabled.
>
> Migration is disabled immediately after collecting this information
> anyway, so just move the information collection to after the
> migration disabling.
>
> Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
On (22/02/07 20:49), John Ogness wrote:
> boot_delay_msec() is always called immediately before printk_delay()
> so just call it from within printk_delay().
>
> Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
> -static inline void printk_delay(void)
> +static inline void printk_delay(int level)
> {
> + boot_delay_msec(level);
> +
> if (unlikely(printk_delay_msec)) {
> int m = printk_delay_msec;
>
> @@ -2224,8 +2226,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> in_sched = true;
> }
>
> - boot_delay_msec(level);
> - printk_delay();
> + printk_delay(level);
As a side note, I wonder why boot_delay_msec() and printk_delay()
touch only the NMI watchdog. Shouldn't they also touch the softlockup,
WQ, RCU, etc. watchdogs?
On Fri 2022-02-11 15:48:08, John Ogness wrote:
> On 2022-02-11, Petr Mladek <[email protected]> wrote:
> > On Mon 2022-02-07 20:49:11, John Ogness wrote:
> >> Since the printk cpulock is CPU-reentrant and since it is used
> >> in all contexts, its usage must be carefully considered and
> >> most likely will require programming locklessly. To avoid
> >> mistaking the printk cpulock as a typical lock, rename it to
> >> cpu_sync. The main functions then become:
> >>
> >> printk_cpu_sync_get_irqsave(flags);
> >> printk_cpu_sync_put_irqrestore(flags);
> >
> > It is possible that I will understand the motivation later when
> > reading the entire patchset. But my initial reaction is confusion ;-)
>
> Actually, the motivation comes from a discussion we had during the RT
> Track at Plumbers 2021 [0]. It isn't a lock and so we didn't want to
> call it a lock. (More below.)
Thanks for the link. I have listened to the discussion. And I am still not
persuaded ;-)
> > From my POV, it is a lock. It tries to get exclusive access and
> > has to wait until the current owner releases it.
>
> It is only exclusive for a CPU. If another context on that CPU tries to
> get the "lock" it will succeed. For example:
>
> process context lock() -> success
> --- INTERRUPT ---
> irq context lock() -> success
> --- NMI ---
> nmi context lock() -> success
>
> None of these contexts can assume that they have synchronized access
> because clearly they have all interrupted each other. If an object does
> not provide synchronized access to data, then "lock" is probably not a
> good name for that object.
All _nested_ locks have these limits. In fact, _all_ locks have these
limits. This is why it is common to take many locks (chain of locks)
if you really want to serialize some things. This is why there are
ABBA problems.
> > As you say: "its usage must be carefully considered and most likely
> > will require programming locklessly." I guess that it is related to:
> >
> > + There is a risk of deadlocks that are typically associated with
> > locks. After all the word "lock" is part of "deadlock".
> >
> > + It requires lockless programming because it is supposed to be
> > terminal lock. It means that no other locks should be taken
> > under it.
>
> It is because (as in the example above), taking this "lock" does not
> provide synchronization to data. It is only synchronizing between
> CPUs. It was Steven's suggestion to call the thing a cpu_sync object and
> nobody in the RT Track seemed to disagree.
IMHO, the main task of this API is to synchronize CPUs. It is normal that
a lock does not protect all objects that are accessed under the lock.
> > I have get() and put() associated with reference counting. But it has
> > an opposite meaning. It usually guards an object from freeing as long
> as there is at least one user. And it allows many users.
>
> This _is_ reference counting. In fact, if you look at the implementation
> you see:
>
> atomic_inc(&printk_cpu_sync_nested);
>
> It is allowing multiple users (from the same CPU).
Yes. My point is that reference counting prevents an object from
being released. It does not prevent parallel access. The parallel access
is prevented by locks.
From my POV, the main task of this API is to prevent parallel
printing from other CPUs. Even Steven Rostedt wrote in the chat
"This makes it a lock" see the recording[0] around the time 2:46:14.
> > Regarding the reentrancy: it seems that the "_nested" suffix is used
> > for this type of lock, for example, mutex_lock_nested(),
> > spin_lock_nested().
> >
> > It might be enough to add a "_nested" suffix and explain why it has
> > to be used carefully (terminal lock) in a comment.
>
> The internal counter is called "_nested" to make it clear to us printk
> developers. IMO the common _get and _put semantics are appropriate
> here. The important thing is that the word "lock" is removed. It is not
> a lock.
Why is it so important to get rid of the word "lock", please?
Well, I probably understand it. The API must be used carefully.
This whole discussion is about how to make the risks more obvious.
My main fear is deadlocks caused when someone tries to get this
"cpu_sync" thing. This is why I would like to call it a lock.
I guess that you are more concerned about races between different
contexts when implementing atomic consoles. This is why you prefer
to avoid the word "lock".
OK, this discussion helped me to improve my mental model about
this API. So, the name is getting less important for me. I would
still slightly prefer to keep "lock". But I am fine with the renaming
to "put/get".
[0] https://youtu.be/cZUzc0U1jJ4?t=12946
Best Regards,
Petr
On Fri 2022-02-11 22:04:48, Peter Zijlstra wrote:
> On Fri, Feb 11, 2022 at 03:57:27PM -0500, Steven Rostedt wrote:
> > On Fri, 11 Feb 2022 15:48:08 +0106
> > John Ogness <[email protected]> wrote:
> >
> > > It is because (as in the example above), taking this "lock" does not
> > > provide synchronization to data. It is only synchronizing between
> > > CPUs. It was Steven's suggestion to call the thing a cpu_sync object and
> > > nobody in the RT Track seemed to disagree.
> >
> > I love causing trouble ;-)
> >
> > Actually, it wasn't just my suggestion. IIRC, I believe Peter Zijlstra was
> > against calling it a lock (Peter, you can use lore to see the context here).
>
> All I remember is that it was in a room and I was late, I can't even
> remember what City we were all in at the time. Was this Lisbon?
>
> Anyway, as Steve said, it isn't really a strict exclusion thing, it only
> avoids the most egregious inter-cpu interleaving. I'm down with
> goldi-locks, something has to have that name :-)
You troublemakers :-)
OK, I know, I am the troublemaker here.
Best Regards,
Petr
On Mon 2022-02-07 20:49:15, John Ogness wrote:
> boot_delay_msec() is always called immediately before printk_delay()
> so just call it from within printk_delay().
>
> Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Best Regards,
Petr
On Mon 2022-02-14 15:49:08, Sergey Senozhatsky wrote:
> On (22/02/11 13:44), Petr Mladek wrote:
> > On Mon 2022-02-07 20:49:11, John Ogness wrote:
> > > Since the printk cpulock is CPU-reentrant and since it is used
> > > in all contexts, its usage must be carefully considered and
> > > most likely will require programming locklessly. To avoid
> > > mistaking the printk cpulock as a typical lock, rename it to
> > > cpu_sync. The main functions then become:
> > >
> > > printk_cpu_sync_get_irqsave(flags);
> > > printk_cpu_sync_put_irqrestore(flags);
> >
> > It is possible that I will understand the motivation later when
> > reading the entire patchset. But my initial reaction is confusion ;-)
> >
> > From my POV, it is a lock. It tries to get exclusive access and
> > has to wait until the current owner releases it.
>
> printk has been using enter/exit naming for a while now (starting with
> nmi enter/exit, then printk_safe enter/exit and soon direct enter/exit);
> so maybe we can follow suit here and use printk_cpu_sync_enter() and
> printk_cpu_sync_exit()?
Interesting idea.
Honestly, I do not like it much. The existing enter/exit API is not
blocking, but the "cpu_sync" API is.
This patch is about how to make it more obvious that this
API has to be used carefully. I see the following main risks when
using this API:
+ it might cause deadlocks, especially in panic
+ it is supposed to be a tail lock (no other locks allowed in this
context)
+ it is re-entrant
+ it does not prevent parallel (nested) access on the same CPU
"get/put" are more acceptable for me. They create at lest some feeling
that it tries to get something and it might take some time. Even
thought many "get/put" APIs are not blocking.
BTW: The API disables IRQs. So the nested access is limited to two
levels: normal/IRQ context and a nested NMI context.
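To illustrate the expected usage (a minimal sketch; dump_stack() stands
in for whatever work is done inside the section):

        unsigned long flags;

        printk_cpu_sync_get_irqsave(flags);
        /*
         * Serialized against all other CPUs, but the same CPU may
         * re-enter this section from NMI context. No other locks
         * may be taken here.
         */
        dump_stack();
        printk_cpu_sync_put_irqrestore(flags);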
Best Regards,
Petr
On Tue 2022-02-15 14:58:51, Sergey Senozhatsky wrote:
> On (22/02/07 20:49), John Ogness wrote:
> > boot_delay_msec() is always called immediately before printk_delay()
> > so just call it from within printk_delay().
> >
> > Signed-off-by: John Ogness <[email protected]>
>
> Reviewed-by: Sergey Senozhatsky <[email protected]>
>
> > -static inline void printk_delay(void)
> > +static inline void printk_delay(int level)
> > {
> > + boot_delay_msec(level);
> > +
> > if (unlikely(printk_delay_msec)) {
> > int m = printk_delay_msec;
> >
> > @@ -2224,8 +2226,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> > in_sched = true;
> > }
> >
> > - boot_delay_msec(level);
> > - printk_delay();
> > + printk_delay(level);
>
> As a side note, I wonder why boot_delay_msec() and printk_delay()
> touch only the NMI watchdog. Shouldn't they also touch the softlockup, WQ,
> RCU, etc. watchdogs?
It seems that touch_nmi_watchdog() actually calls touch_softlockup_watchdog(),
which in turn calls wq_watchdog_touch().
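Roughly, the chain looks like this (a sketch from memory of
include/linux/nmi.h and kernel/watchdog.c; the details vary by config):

static inline void touch_nmi_watchdog(void)
{
        arch_touch_nmi_watchdog();
        touch_softlockup_watchdog();
}

notrace void touch_softlockup_watchdog(void)
{
        touch_softlockup_watchdog_sched();
        wq_watchdog_touch(raw_smp_processor_id());
}

So touching the NMI watchdog already covers the softlockup and
workqueue watchdogs.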
Best Regards,
Petr
On Mon 2022-02-14 16:35:18, Sergey Senozhatsky wrote:
> On (22/02/11 17:05), Petr Mladek wrote:
> > On Mon 2022-02-07 20:49:13, John Ogness wrote:
> [..]
> > The problem is the commit ac25575203c11145066ea ("[PATCH] CPU hotplug
> > printk fix"). It suggests that per-CPU data of slab are freed during
> > hotplug.
> >
> > There are many other things that are manipulated during cpu hotplug.
> > And there are the two notifiers "printk:dead" and "printk:online",
> > see printk_late_init(). Maybe, we should use them to decide whether
> > the non-trivial consoles are callable during CPU hotplug.
>
> Great findings. Looks like we only set __printk_percpu_data_ready to
> true and never set it back to false, relying on cpu_online() in such
> cases. But here's the thing: we have printk_percpu_data_ready() in
> __printk_recursion_counter() and in wake_up_klogd() and in
defer_console_output(), but why do we never check __printk_percpu_data_ready
in __down_trylock_console_sem()/__up_console_sem() and, more importantly,
in console_trylock_spinning()? Those do access this_cpu() in printk safe
enter/exit. Am I missing something?
Great point!
I am not 100% sure. But it seems that static per-CPU variables might
actually be usable since early boot.
This is from mm/percpu.c
* There is special consideration for the first chunk which must handle
* the static percpu variables in the kernel image as allocation services
* are not online yet. In short, the first chunk is structured like so:
*
* <Static | [Reserved] | Dynamic>
*
* The static data is copied from the original section managed by the
* linker. The reserved section, if non-zero, primarily manages static
* percpu variables from kernel modules. Finally, the dynamic section
* takes care of normal allocations.
I thought that it might work only for CPU0. But it seems that it
probably works for each possible cpu, see:
bool __is_kernel_percpu_address(unsigned long addr, unsigned long *can_addr)
{
#ifdef CONFIG_SMP
        const size_t static_size = __per_cpu_end - __per_cpu_start;
        void __percpu *base = __addr_to_pcpu_ptr(pcpu_base_addr);
        unsigned int cpu;

        for_each_possible_cpu(cpu) {
                void *start = per_cpu_ptr(base, cpu);
                void *va = (void *)addr;

                if (va >= start && va < start + static_size) {
                [...]
}
and
/**
* is_kernel_percpu_address - test whether address is from static percpu area
* @addr: address to test
*
* Test whether @addr belongs to in-kernel static percpu area. Module
* static percpu areas are not considered. For those, use
* is_module_percpu_address().
*
* RETURNS:
* %true if @addr is from in-kernel static percpu area, %false otherwise.
*/
bool is_kernel_percpu_address(unsigned long addr)
{
        return __is_kernel_percpu_address(addr, NULL);
}
Most likely, only dynamically allocated per-cpu variables have to wait
until the per-cpu areas are initialized.
This might explain why there is no generic
are_per_cpu_variables_ready() callback.
We should probably revisit the code and remove the fallback to
normal static variables.
Best Regards,
Petr
On (22/02/15 11:38), Petr Mladek wrote:
> On Mon 2022-02-14 16:35:18, Sergey Senozhatsky wrote:
> > On (22/02/11 17:05), Petr Mladek wrote:
> > > On Mon 2022-02-07 20:49:13, John Ogness wrote:
> > [..]
> > > The problem is the commit ac25575203c11145066ea ("[PATCH] CPU hotplug
> > > printk fix"). It suggests that per-CPU data of slab are freed during
> > > hotplug.
> > >
> > > There are many other things that are manipulated during cpu hotplug.
> > > And there are the two notifiers "printk:dead" and "printk:online",
> > > see printk_late_init(). Maybe, we should use them to decide whether
> > > the non-trivial consoles are callable during CPU hotplug.
> >
> > Great findings. Looks like we only set __printk_percpu_data_ready to
> > true and never set it back to false, relying on cpu_online() in such
> > cases. But here's the thing: we have printk_percpu_data_ready() in
> > __printk_recursion_counter() and in wake_up_klogd() and in
> > defer_console_output(), but why do we never check __printk_percpu_data_ready
> > in __down_trylock_console_sem()/__up_console_sem() and, more importantly,
> > in console_trylock_spinning()? Those do access this_cpu() in printk safe
> > enter/exit. Am I missing something?
>
> Great point!
>
> I am not 100% sure. But it seems that static per-CPU variables might
> actually be usable since early boot.
Wow, this is great to learn. Thanks!
> This is from mm/percpu.c
>
> * There is special consideration for the first chunk which must handle
> * the static percpu variables in the kernel image as allocation services
> * are not online yet. In short, the first chunk is structured like so:
> *
> * <Static | [Reserved] | Dynamic>
> *
> * The static data is copied from the original section managed by the
> * linker. The reserved section, if non-zero, primarily manages static
> * percpu variables from kernel modules. Finally, the dynamic section
> * takes care of normal allocations.
>
>
> I thought that it might work only for CPU0. But it seems that it
> probably works for each possible cpu, see:
>
> bool __is_kernel_percpu_address(unsigned long addr, unsigned long *can_addr)
> {
> #ifdef CONFIG_SMP
> 	const size_t static_size = __per_cpu_end - __per_cpu_start;
> 	void __percpu *base = __addr_to_pcpu_ptr(pcpu_base_addr);
> 	unsigned int cpu;
>
> 	for_each_possible_cpu(cpu) {
> 		void *start = per_cpu_ptr(base, cpu);
> 		void *va = (void *)addr;
>
> 		if (va >= start && va < start + static_size) {
> 		[...]
> }
>
> and
>
> /**
> * is_kernel_percpu_address - test whether address is from static percpu area
> * @addr: address to test
> *
> * Test whether @addr belongs to in-kernel static percpu area. Module
> * static percpu areas are not considered. For those, use
> * is_module_percpu_address().
> *
> * RETURNS:
> * %true if @addr is from in-kernel static percpu area, %false otherwise.
> */
> bool is_kernel_percpu_address(unsigned long addr)
> {
> 	return __is_kernel_percpu_address(addr, NULL);
> }
>
>
> Most likely, only dynamically allocated per-cpu variables have to wait
> until the per-cpu areas are initialized.
>
> This might explain why there is no generic
> are_per_cpu_variables_ready() callback.
>
> We should probably revisit the code and remove the fallback to
> normal static variables.
Agreed.
On (22/02/15 15:59), Petr Mladek wrote:
> > > -static inline void printk_delay(void)
> > > +static inline void printk_delay(int level)
> > > {
> > > + boot_delay_msec(level);
> > > +
> > > if (unlikely(printk_delay_msec)) {
> > > int m = printk_delay_msec;
> > >
> > > @@ -2224,8 +2226,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> > > in_sched = true;
> > > }
> > >
> > > - boot_delay_msec(level);
> > > - printk_delay();
> > > + printk_delay(level);
> >
> > As a side note, I wonder why boot_delay_msec() and printk_delay()
> > touch only the NMI watchdog. Shouldn't they also touch the softlockup, WQ,
> > RCU, etc. watchdogs?
>
> It seems that touch_nmi_watchdog() actually calls touch_softlockup_watchdog(),
> which in turn calls wq_watchdog_touch().
Oh, you are right, good point.
Well, we still have the RCU stall detector, which can trigger (in theory).
On (22/02/15 10:29), Petr Mladek wrote:
> > printk has been using enter/exit naming for a while now (starting with
> > nmi enter/exit, then printk_safe enter/exit and soon direct enter/exit);
> > so maybe we can follow suit here and use printk_cpu_sync_enter() and
> > printk_cpu_sync_exit()?
>
> Interesting idea.
>
> Honestly, I do not like it much.
:) OK
> The existing enter/exit API is not blocking but "cpu_sync" API is.
>
> This patch is about how to make it more obvious that this
> API has to be used carefully. I see the following main risks when
> using this API:
>
> + it might cause deadlocks, especially in panic
>
> + it is supposed to be a tail lock (no other locks allowed in this
> context)
>
> + it is re-entrant
>
> + it does not prevent parallel (nested) access on the same CPU
>
>
> "get/put" are more acceptable for me. They create at lest some feeling
> that it tries to get something and it might take some time. Even
> thought many "get/put" APIs are not blocking.
I don't mind "get/put".
The thing that looks nice about enter/exit is that they don't
expose any implementation details: whether there is a lock behind it or not,
whether there is a ref-counter or not, and so on.
On Mon 2022-02-07 20:49:14, John Ogness wrote:
> Currently the local CPU timestamp and caller_id for the record are
> collected while migration is enabled. Since this information is
> CPU-specific, it should be collected with migration disabled.
>
> Migration is disabled immediately after collecting this information
> anyway, so just move the information collection to after the
> migration disabling.
>
> Signed-off-by: John Ogness <[email protected]>
Great catch!
Reviewed-by: Petr Mladek <[email protected]>
Best Regards,
Petr
On Mon 2022-02-07 20:49:13, John Ogness wrote:
> The CON_ANYTIME console flag is used to label consoles that will
> work correctly before percpu resources are allocated. To check
> the condition, cpu_online(raw_smp_processor_id()) was used.
> However, this is odd because CPUs can go offline at a later point.
> Also, the function is forced to use the raw_ variant because
> migration is not disabled.
>
> Since commit ab6f762f0f53 ("printk: queue wake_up_klogd irq_work
> only if per-CPU areas are ready") there is a variable to identify
> if percpu resources have been allocated. Use that variable instead
> of cpu_online(raw_smp_processor_id()).
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d1b773823d63..b346e60e9e51 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2577,11 +2577,11 @@ static int have_callable_console(void)
> *
> * Console drivers may assume that per-cpu resources have been allocated. So
> * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't
> - * call them until this CPU is officially up.
> + * call them until per-cpu resources have been allocated.
> */
> static inline int can_use_console(void)
> {
> - return cpu_online(raw_smp_processor_id()) || have_callable_console();
> + return (printk_percpu_data_ready() || have_callable_console());
> }
The cpu_online(raw_smp_processor_id()) check is also used in
call_console_drivers(). The same logic should be used in both
locations.
I found this when reviewing the 6th patch, which replaced both checks
with a single one.
Note that I am still not sure if this change is correct at all.
It will allow the console to always be called during CPU hotplug,
and I am not sure if that is safe. IMHO, it might cause problems when
a console driver uses, for example, CPU-bound workqueues.
Best Regards,
Petr
On Mon 2022-02-07 20:49:18, John Ogness wrote:
> Provide a might-sleep function to allow waiting for threaded console
> printers to catch up to the latest logged message.
>
> Use pr_flush() whenever it is desirable to get buffered messages
> printed before continuing: suspend_console(), resume_console(),
> console_stop(), console_start(), console_unblank().
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 02bde45c1149..1e80fd052bd5 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2449,6 +2449,7 @@ void suspend_console(void)
> if (!console_suspend_enabled)
> return;
> pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
> + pr_flush(1000, true);
> console_lock();
> console_suspended = 1;
> up_console_sem();
> @@ -2461,6 +2462,7 @@ void resume_console(void)
> down_console_sem();
> console_suspended = 0;
> console_unlock();
> + pr_flush(1000, true);
> }
>
> /**
> @@ -2802,8 +2804,10 @@ void console_unblank(void)
> if (oops_in_progress) {
> if (down_trylock_console_sem() != 0)
> return;
> - } else
> + } else {
> + pr_flush(1000, true);
It would make more sense to flush the consoles after they are
unblanked. I mean to move this to the end of the function.
Also it is not obvious why this is not called when oops_in_progress
is set. I guess that it is because trylock is needed in this case.
It should be handled inside pr_flush().
I mean that pr_flush() should internally use trylock when
@oops_in_progress is set. It will make it safe even in this
mode.
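Something like the following inside pr_flush() (just a sketch of the
idea, mirroring the console_unblank() pattern above; not tested):

        if (oops_in_progress) {
                /* Best effort only: do not block in oops/panic. */
                if (down_trylock_console_sem() != 0)
                        return false;
        } else {
                console_lock();
        }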
> console_lock();
> + }
>
> console_locked = 1;
> console_may_schedule = 0;
> @@ -2869,6 +2873,7 @@ struct tty_driver *console_device(int *index)
> */
> void console_stop(struct console *console)
> {
> + pr_flush(1000, true);
It would be enough to flush just the given @console.
It might be possible to take over the job from the related
kthread and flush it in this context. Well, I am not sure if
it is a good idea.
> console_lock();
> console->flags &= ~CON_ENABLED;
> console_unlock();
> @@ -2880,6 +2885,7 @@ void console_start(struct console *console)
> console_lock();
> console->flags |= CON_ENABLED;
> console_unlock();
> + pr_flush(1000, true);
Same here.
> }
> EXPORT_SYMBOL(console_start);
>
> @@ -3249,6 +3255,71 @@ static int __init printk_late_init(void)
> late_initcall(printk_late_init);
>
> #if defined CONFIG_PRINTK
> +/**
> + * pr_flush() - Wait for printing threads to catch up.
> + *
An alternative solution would be to take over the job from the kthreads
and flush the consoles in this context. Well, I am not sure
if it is a good idea or not.
> + * @timeout_ms: The maximum time (in ms) to wait.
> + * @reset_on_progress: Reset the timeout if forward progress is seen.
> + *
> + * A value of 0 for @timeout_ms means no waiting will occur. A value of -1
> + * represents infinite waiting.
> + *
> + * If @reset_on_progress is true, the timeout will be reset whenever any
> + * printer has been seen to make some forward progress.
> + *
> + * Context: Process context. May sleep while acquiring console lock.
> + * Return: true if all enabled printers are caught up.
> + */
> +bool pr_flush(int timeout_ms, bool reset_on_progress)
> +{
> + int remaining = timeout_ms;
> + struct console *con;
> + u64 last_diff = 0;
> + u64 printk_seq;
> + u64 diff;
> + u64 seq;
> +
> + might_sleep();
> +
> + seq = prb_next_seq(prb);
> +
> + for (;;) {
> + diff = 0;
> +
> + console_lock();
> + for_each_console(con) {
> + if (!console_is_usable(con))
> + continue;
> + printk_seq = con->seq;
> + if (printk_seq < seq)
> + diff += seq - printk_seq;
> + }
> + console_unlock();
> +
> + if (diff != last_diff && reset_on_progress)
> + remaining = timeout_ms;
> +
> + if (diff == 0 || remaining == 0)
> + break;
> +
> + if (remaining < 0) {
> + /* no timeout limit */
> + msleep(100);
> + } else if (remaining < 100) {
> + msleep(remaining);
> + remaining = 0;
> + } else {
> + msleep(100);
> + remaining -= 100;
> + }
> +
> + last_diff = diff;
> + }
> +
> + return (diff == 0);
> +}
> +EXPORT_SYMBOL(pr_flush);
Summary:
The pr_flush() API and the optional timeout look reasonable to me.
Please handle oops_in_progress in pr_flush() and make it safe in this
mode. It will allow moving it to the end of console_unblank(), where
it makes more sense.
I do not insist on flushing only the given console in console_stop()
and console_start(). It is nice to have and can be done later.
Also, I do not insist on doing the flush in the context of the caller.
I am not even sure if it is a good idea. We could play with it
later if there are problems with the current approach
in practice.
Best Regards,
Petr
On 2022-02-15, Petr Mladek <[email protected]> wrote:
> This patch is about how to make it more obvious that this
> API has to be used carefully. I see the following main risks when
> using this API:
>
> + it might cause deadlocks, especially in panic
>
> + it is supposed to be a tail lock (no other locks allowed in this
> context)
>
> + it is re-entrant
>
> + it does not prevent parallel (nested) access on the same CPU
I can add more information about the risks to the function documentation
(include/linux/printk.h).
> "get/put" are more acceptable for me. They create at lest some feeling
> that it tries to get something and it might take some time. Even
> thought many "get/put" APIs are not blocking.
OK. So I will keep the naming for the series v2.
> BTW: The API disables IRQs. So the nested access is limited to two
> levels: normal/IRQ context and a nested NMI context.
And there are reentrant call chains. For example, we already have:
nmi_cpu_backtrace()
  printk_cpu_sync_get_irqsave(flags);
  dump_stack();
    printk_cpu_sync_get_irqsave(flags);
    __dump_stack(log_lvl);
    printk_cpu_sync_put_irqrestore(flags);
  printk_cpu_sync_put_irqrestore(flags);
John
Hi,
the solution is very interesting and I am sure that it has many
advantages. But it is also quite complex. I want to be sure
about the motivations before I start discussing details.
On Mon 2022-02-07 20:49:21, John Ogness wrote:
> With non-threaded console printers preemption is disabled while
> holding the console lock in order to avoid the situation where the
> console printer is scheduled away and no other task can lock the
> console (for printing or otherwise).
I guess that you are talking about the fact that vprintk_emit() disables
preemption. console_lock() does not disable preemption.
> Disabling preemption is necessary because the console lock is
> implemented purely as a semaphore, which has no owner.
I do not understand this explanation.
The preemption disabling was added into vprintk_emit() by the commit
fd5f7cde1b85d4c8e09c ("printk: Never set console_may_schedule
in console_trylock()").
It is not really necessary. It actually increases the risk of softlockups.
The motivation was to increase the chance that messages will get
printed. Nobody prints the messages when the owner is sleeping.
We hoped that the console owner stealing would be efficient enough
to prevent the softlockups.
> Like non-threaded console printers, kthread printers use the
> console lock to synchronize during printing. However, since they
> use console_lock() instead of a best-effort console_trylock(), it
> is not possible to disable preemption upon locking.
^^^^^^^^^^^^^^^
IMHO, it is possible to disable preemption. But we probably do
not want to do it, especially on RT.
> Therefore an alternative for synchronizing and avoiding
> the above mentioned situation is needed.
I do not think that preemption is the reason for an alternative
solution.
IMHO, the reason is to handle different consoles independently.
It will allow handling more consoles in parallel. And more
importantly, they will not block each other.
> The kthread printers do not need to synchronize against each other,
yes
> but they do need to synchronize against console_lock() callers.
We should make it more clear why. I see the following reasons:
+ it is needed for switching between direct printing and
offload to kthreads.
+ other operations on all consoles, e.g. suspend, updating the
list of registered consoles
+ console_lock() has several other external users and their
dependencies are complicated.
> To provide this synchronization, introduce a per-console mutex. The
> mutex is taken by the kthread printer during printing and is also
> taken by console_lock() callers.
ok
> Since mutexes have owners, when calling console_lock(), the
> scheduler is able to schedule any kthread printers that may have
> been preempted while printing.
I do not understand the relation between mutex owners and scheduling
kthreads. The mutex can be taken by any process.
> Rather than console_lock() callers holding the per-console mutex
> for the duration of the console lock, the per-console mutex is only
> taken in order to set a new CON_PAUSED flag, which is checked by
> the kthread printers. This avoids any issues due to nested locking
> between the various per-console mutexes.
The flag adds some complexity. The logic will use two locks + one flag
instead of just two locks. IMHO, it deserves a more clear explanation
why it is needed. When can nested locking happen? Will it make the
nested locking safe?
The obvious ordering is to always take con->mutex under console_sem.
Are you afraid that some console driver might need to take console_sem
from the con->write() callback? IMHO, this might cause a deadlock even when
using the CON_PAUSED flag.
CPU0:                                   CPU1:
console_lock()
  down(&console_sem);
  for_each_console(con)
                                        kthread_printer()
                                          mutex_lock(&con->lock);
                                          con->write()
                                            down(&console_sem)
    mutex_lock(&con->lock);

DEADLOCK!!!
> The kthread printers must also synchronize against console_trylock()
> callers. Since console_trylock() is non-blocking, a global atomic
> counter will be used to identify if any kthread printers are active.
> The kthread printers will also check the atomic counter to identify
> if the console has been locked by another task via
> console_trylock().
This is another complexity. An easier solution would be to do:
console_trylock()
{
        if (down_trylock(&console_sem))
                return failed;

        for_each_console(con) {
                if (!mutex_trylock(&con->lock))
                        goto fail;
        }

        return success;

fail:
        for_each_console(con_fail) {
                if (con_fail == con)
                        break;
                mutex_unlock(&con_fail->lock);
        }
}
Of course, the atomic counter is interesting. But it is
also another extra complexity. Is it worth it?
Best Regards,
Petr
On 2022-02-18, John Ogness <[email protected]> wrote:
>> This is another complexity. An easier solution would be to do:
>>
>> console_trylock()
>> {
>> 	if (down_trylock(&console_sem))
>> 		return failed;
>>
>> 	for_each_console(con) {
>> 		if (!mutex_trylock(&con->lock))
>> 			goto fail;
>> 	}
>>
>> 	return success;
>>
>> fail:
>> 	for_each_console(con_fail) {
>> 		if (con_fail == con)
>> 			break;
>> 		mutex_unlock(&con_fail->lock);
>> 	}
>> }
>
> No, this cannot work. Semaphores are the _only_ synchronization object in
> the kernel that is allowed to perform a trylock from any context
> (because they don't have owners).
Sorry, I need to clarify this bold statement.
Semaphores are the only synchronization object in the kernel that is
allowed to perform a trylock from any context and that does not disable
preemption upon locking. (This is important because console lock holders
expect preemption to be unaffected when taking the lock.)
John
On 2022-02-18, Petr Mladek <[email protected]> wrote:
> the solution is very interesting and I am sure that it has many
> advantages. But it is also quite complex. I want to be sure
> about the motivations before I start discussing details.
I am jumping ahead to this email so that we can start discussing
details. This patch is really the key to making the whole series work
correctly.
> On Mon 2022-02-07 20:49:21, John Ogness wrote:
>> With non-threaded console printers preemption is disabled while
>> holding the console lock in order to avoid the situation where the
>> console printer is scheduled away and no other task can lock the
>> console (for printing or otherwise).
>
> I guess that you are talking about the fact that vprintk_emit() disables
> preemption. console_lock() does not disable preemption.
Yes. I did not mean to imply that console_lock() is disabling
preemption. But preemption is disabled at some point _within_ the
console_lock()/console_unlock() window. I consider
console_lock()/console_unlock() to be the main API functions for direct
printing and I wanted to point out that calling that pair will cause
preemption disabling.
>> Disabling preemption is necessary because the console lock is
>> implemented purely as a semaphore, which has no owner.
>
> I do not understand this explanation.
>
> The preemption disabling was added into vprintk_emit() by the commit
> fd5f7cde1b85d4c8e09c ("printk: Never set console_may_schedule
> in console_trylock()").
>
> It is not really necessary. It actually increases the risk of softlockups.
> The motivation was to increase the chance that messages will get
> printed. Nobody prints the messages when the owner is sleeping.
> We hoped that the console owner stealing would be efficient enough
> to prevent the softlockups.
Yes, I am referring to this case where the context taking the semaphore
goes to sleep. If another context tries to take the semaphore, the
scheduler does not know which task it can schedule to resolve the
situation because there is no true lock contention.
With a mutex there is an owner. When another task tries to lock a mutex,
the scheduler knows which task must be scheduled to resolve this true
lock contention. (There are also other benefits relating to priority
inheritance, but I chose not to mention this.)
>> Like non-threaded console printers, kthread printers use the
>> console lock to synchronize during printing. However, since they
>> use console_lock() instead of a best-effort console_trylock(), it
>> is not possible to disable preemption upon locking.
> ^^^^^^^^^^^^^^^
>
> IMHO, it is possible to disable preemption. But we probably do
> not want to do it, especially on RT.
As stated in the commit message of fd5f7cde1b85 ("printk: Never set
console_may_schedule in console_trylock()"), enabling preemption is "a
mistake". That is why I said "not possible". Perhaps I should say "bad
idea".
>> Therefore an alternative for synchronizing and avoiding
>> the above mentioned situation is needed.
>
> I do not think that preemption is the reason for an alternative
> solution.
Well, if enabling preemption is "a mistake" then we _need_ an
alternative.
> IMHO, the reason is to handle different consoles independently.
> It will allow handling more consoles in parallel. And more
> importantly, they will not block each other.
This is an additional benefit, but it was not my primary motivation when
designing the alternative. Honestly, I do not care how we phrase the
motivation and benefits. I only wanted to make it clear that disabling
preemption is not an acceptable option.
>> The kthread printers do not need to synchronize against each other,
>
> yes
>
>> but they do need to synchronize against console_lock() callers.
>
> We should make it more clear why. I see the following reasons:
>
> + it is needed for switching between direct printing and
> offload to kthreads.
>
> + other operations on all consoles, e.g. suspend, updating the
> list of registered consoles
>
> + console_lock() has several other external users and their
> dependencies are complicated.
OK. Those are also the reasons I see.
>> To provide this synchronization, introduce a per-console mutex. The
>> mutex is taken by the kthread printer during printing and is also
>> taken by console_lock() callers.
>
> ok
>
>> Since mutexes have owners, when calling console_lock(), the
>> scheduler is able to schedule any kthread printers that may have
>> been preempted while printing.
>
> I do not understand the relation between mutex owners and scheduling
> kthreads. The mutex can be taken by any process.
As mentioned above, it is about creating true lock contention between
the kthreads and the console_lock() callers.
>> Rather than console_lock() callers holding the per-console mutex
>> for the duration of the console lock, the per-console mutex is only
>> taken in order to set a new CON_PAUSED flag, which is checked by
>> the kthread printers. This avoids any issues due to nested locking
>> between the various per-console mutexes.
>
> The flag adds some complexity. The logic will use two locks + one flag
> instead of just two locks. IMHO, it deserves a more clear explanation
> why it is needed. When can nested locking happen? Will it make the
> nested locking safe?
>
> The obvious ordering is to always take con->mutex under console_sem.
>
> Are you afraid that some console driver might need to take console_sem
> from the con->write() callback?
No. By nested locking I mean:
for_each_console(c)
        mutex_lock(&c->lock);

/* .... do something .... */

for_each_console_reversed(c)
        mutex_unlock(&c->lock);
Aside from this looking ugly and being a potential lockdep problem if
the console ordering were to change, it would also mean that all the
console mutexes are locked while suspended. That is not necessarily an
issue, but I just did not like the idea of holding so many locks while
suspended. And I still do not feel so comfortable about the somewhat
arbitrary ordering of console locking.
If you think the above construct is good, then we do not need the flag
approach.
>> The kthread printers must also synchronize against console_trylock()
>> callers. Since console_trylock() is non-blocking, a global atomic
>> counter will be used to identify if any kthread printers are active.
>> The kthread printers will also check the atomic counter to identify
>> if the console has been locked by another task via
>> console_trylock().
>
> This is another complexity. An easier solution would be to do:
>
> console_trylock()
> {
> 	if (down_trylock(&console_sem))
> 		return failed;
>
> 	for_each_console(con) {
> 		if (!mutex_trylock(&con->lock))
> 			goto fail;
> 	}
>
> 	return success;
>
> fail:
> 	for_each_console(con_fail) {
> 		if (con_fail == con)
> 			break;
> 		mutex_unlock(&con_fail->lock);
> 	}
> }
No, this cannot work. Semaphores are the _only_ synchronization object in
the kernel that is allowed to perform a trylock from any context
(because they don't have owners).
> Of course, the atomic counter is interesting. But it is
> also another extra complexity. Is it worth it?
I could find no alternative. But I am open for ideas if anyone thinks it
can be done with less complexity.
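For the record, the intended pairing on the kthread side looks roughly
like this (a sketch using the macros from this patch; the surrounding
loop and error handling are omitted):

        /*
         * Enter as an active printer unless the console is locked
         * via console_trylock() (counter value -1).
         */
        if (console_printer_tryenter()) {
                /* Emit one record to this console. */
                console_printer_exit();
        } else {
                /* A console_trylock() holder is active; wait and retry. */
        }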
John
Hi,
first, to make it clear. The changes make sense and I am going to
review the code. But I would still like to clarify the commit message.
I do not understand it myself. Future generations should know
why we did this ;-)
On Fri 2022-02-18 22:47:56, John Ogness wrote:
> On 2022-02-18, Petr Mladek <[email protected]> wrote:
> > the solution is very interesting and I am sure that it has many
> > advantages. But it is also quite complex. I want to be sure
> > about the motivations before I start discussing details.
>
> I am jumping ahead to this email so that we can start discussing
> details. This patch is really the key to make the whole series work
> correctly.
>
> > On Mon 2022-02-07 20:49:21, John Ogness wrote:
> >> With non-threaded console printers preemption is disabled while
> >> holding the console lock in order to avoid the situation where the
> >> console printer is scheduled away and no other task can lock the
> >> console (for printing or otherwise).
> >
> > I guess that you are talking about the fact that vprintk_emit() disables
> > preemption. console_lock() does not disable preemption.
>
> Yes. I did not mean to imply that console_lock() is disabling
> preemption. But preemption is disabled at some point _within_ the
> console_lock()/console_unlock() window. I consider
> console_lock()/console_unlock() to be the main API functions for direct
> printing and I wanted to point out that calling that pair will cause
> preemption disabling.
But console_unlock() should not call the consoles when kthreads are
usable. It should just wake them up. The kthreads might still
use console_lock() and __console_unlock().
I do not understand why this should be a reason to introduce
per-console locks.
> >> Disabling preemption is necessary because the console lock is
> >> implemented purely as a semaphore, which has no owner.
> >
> > I do not understand this explanation.
> >
> > The preemption disabling was added into vprintk_emit() by the commit
> > fd5f7cde1b85d4c8e09c ("printk: Never set console_may_schedule
> > in console_trylock()").
> >
> > It is not really necessary. It actually increases the risk of softlockups.
> > The motivation was to increase the chance that messages will get
> > printed. Nobody prints the messages when the owner is sleeping.
> > We hoped that the console owner stealing would be efficient enough
> > to prevent the softlockups.
>
> Yes, I am referring to this case where the context taking the semaphore
> goes to sleep. If another context tries to take the semaphore, the
> scheduler does not know which task it can schedule to resolve the
> situation because there is no true lock contention.
>
> With a mutex there is an owner. When another task tries to lock a mutex,
> the scheduler knows which task must be scheduled to resolve this true
> lock contention. (There are also other benefits relating to priority
> inheritance, but I chose not to mention this.)
This sounds interesting. Does the scheduler wake up or prioritize
mutex owners?
I do not see this in the code. There is the optimistic spin, but
it is used only when the current owner is running on another CPU.
Again, I do not see how this could be a motivation to introduce
per-console mutexes.
> >> Like non-threaded console printers, kthread printers use the
> >> console lock to synchronize during printing. However, since they
> >> use console_lock() instead of a best-effort console_trylock(), it
> >> is not possible to disable preemption upon locking.
> > ^^^^^^^^^^^^^^^
> >
> > IMHO, it is possible to disable preemption. But we probably do
> > not want to do it, especially on RT.
>
> As stated in the commit message of fd5f7cde1b85 ("printk: Never set
> console_may_schedule in console_trylock()"), enabling preemption is "a
> mistake". That is why I said "not possible". Perhaps I should say "bad
> idea".
"bad idea" is better. I am sorry that I am too picky. But the disabled
preemption was a big mystery when I started working on printk code
ages ago. And it was hard to find any explanation if it was required
or just nice to have. Clear commit message or comment would have
helped a lot.
> >> Therefore an alternative for synchronizing and avoiding
> >> the above mentioned situation is needed.
> >
> > I do not think that preemption is the reason for an alternative
> > solution.
>
> Well, if enabling preemption is "a mistake" then we _need_ an
> alternative.
The alternative is the offload to kthreads that was introduced
in another patch. This patch introduces per-console locks, and
the kthreads are usable even without them.
This is why I am confused by this commit message.
> > IMHO, the reason is to handle different consoles independently.
> > It will allow handling more consoles in parallel. And more
> > importantly, they will not block each other.
>
> This is an additional benefit, but it was not my primary motivation when
> designing the alternative. Honestly, I do not care how we phrase the
> motivation and benefits. I only wanted to make it clear that disabling
> preemption is not an acceptable option.
I am sorry, but I still do not understand the primary motivation.
I do not see how the per-console mutexes help with preemption:
+ console_lock()/console_unlock() is fast when kthreads are usable
+ console_lock()/console_unlock() behaves the same as before
when kthreads are not usable, for example, during early boot,
or panic.
Idea: Did you want to prevent the situation where console_lock()
is blocked because a printk kthread is sleeping with the lock?
But it is still the case. console_lock() must still take
each per-console mutex and can be blocked by any sleeping
printk kthread.
BTW: The possibility to process printk kthreads in parallel is
a good enough reason from my POV. But I would still like
to understand your primary motivation. There must be some
important aspect that I do not understand. And it would be
great to document it.
> >> To provide this synchronization, introduce a per-console mutex. The
> >> mutex is taken by the kthread printer during printing and is also
> >> taken by console_lock() callers.
> >
> > ok
> >
> >> Since mutexes have owners, when calling console_lock(), the
> >> scheduler is able to schedule any kthread printers that may have
> >> been preempted while printing.
> >
> > I do not understand the relation between mutex owners and scheduling
> > kthreads. The mutex can be taken by any process.
>
> As mentioned above, it is about creating true lock contention between
> the kthreads and the console_lock() callers.
I understand that it solves synchronization between the global
console_lock() and per-console mutexes.
But I still do not understand the effect of mutex owners and whether
the scheduler might wake them. I do not see any code for this.
Will this behavior be added by some followup patchset?
> >> Rather than console_lock() callers holding the per-console mutex
> >> for the duration of the console lock, the per-console mutex is only
> >> taken in order to set a new CON_PAUSED flag, which is checked by
> >> the kthread printers. This avoids any issues due to nested locking
> >> between the various per-console mutexes.
> >
> > The flag adds some complexity. The logic will use two locks + one flag
> > instead of just two locks. IMHO, it deserves a more clear explanation
> > why it is needed. When can nested locking happen? Will it make the
> > nested locking safe?
> >
> > The obvious ordering is to always take con->mutex under console_sem.
> >
> > Are you afraid that some console driver might need to take console_sem
> > from the con->write() callback?
>
> No. By nested locking I mean:
>
> for_each_console(c)
> 	mutex_lock(&c->lock);
>
> /* .... do something .... */
>
> for_each_console_reversed(c)
> 	mutex_unlock(&c->lock);
>
> Aside from this looking ugly and being a potential lockdep problem if
> the console ordering were to change, it would also mean that all the
> console mutexes are locked while suspended. That is not necessarily an
> issue, but I just did not like the idea of holding so many locks while
> suspended. And I still do not feel so comfortable about the somewhat
> arbitrary ordering of console locking.
It makes perfect sense. I see why we need the flags now.
> If you think the above construct is good, then we do not need the flag
> approach.
>
> >> The kthread printers must also synchronize against console_trylock()
> >> callers. Since console_trylock() is non-blocking, a global atomic
> >> counter will be used to identify if any kthread printers are active.
> >> The kthread printers will also check the atomic counter to identify
> >> if the console has been locked by another task via
> >> console_trylock().
> >
> > This is another complexity. An easier solution would be to do:
> >
> > console_trylock()
> > {
> > 	if (down_trylock(&console_sem))
> > 		return failed;
> >
> > 	for_each_console(con) {
> > 		if (!mutex_trylock(&con->lock))
> > 			goto fail;
> > 	}
> >
> > 	return success;
> >
> > fail:
> > 	for_each_console(con_fail) {
> > 		if (con_fail == con)
> > 			break;
> > 		mutex_unlock(&con_fail->lock);
> > 	}
> > }
> No, this cannot work. Semaphores are the _only_ synchronization object in
> the kernel that is allowed to perform a trylock from any context
> (because they don't have owners).
Ah, mutex_trylock() can't be used in atomic context. I was not aware
of this.
It means that the atomic counter also makes perfect sense. And it is
a very nice trick.
Best Regards,
Petr
PS: I am going to review the code.
On Mon 2022-02-07 20:49:21, John Ogness wrote:
> Rather than console_lock() callers holding the per-console mutex
> for the duration of the console lock, the per-console mutex is only
> taken in order to set a new CON_PAUSED flag, which is checked by
> the kthread printers. This avoids any issues due to nested locking
> between the various per-console mutexes.
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 0f94b1771df8..c51c7f5507a5 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -16,6 +16,7 @@
>
> #include <linux/atomic.h>
> #include <linux/types.h>
> +#include <linux/mutex.h>
>
> struct vc_data;
> struct console_font_op;
> @@ -136,6 +137,7 @@ static inline int con_debug_leave(void)
> #define CON_ANYTIME (16) /* Safe to call before per-cpu resources ready */
> #define CON_BRL (32) /* Used for a braille device */
> #define CON_EXTENDED (64) /* Use the extended output format a la /dev/kmsg */
> +#define CON_PAUSED (128) /* Sleep while console is locked */
I was confused by the name a few times. It sounds like the console
does not show any messages at all. I mean that it is something like
console_suspend().
It is not easy to find a good short name explaining that it is only
about the kthreads. What about CON_DIRECT?
>
> struct console {
> char name[16];
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index e182f31fec58..135fbe647092 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -214,6 +214,26 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
> /* Number of registered extended console drivers. */
> static int nr_ext_console_drivers;
>
> +/*
> + * Used to synchronize printing kthreads against direct printing via
> + * console_trylock/console_unlock.
> + *
> + * Values:
> + * -1 = console locked (via trylock), kthreads will not print
> + * 0 = no kthread printing, console not locked (via trylock)
> + * >0 = kthread(s) actively printing
> + *
> + * Note: For synchronizing against direct printing via
> + * console_lock/console_unlock, see the @lock variable in
> + * struct console.
> + */
> +static atomic_t console_lock_count = ATOMIC_INIT(0);
I am getting lost in all the variables called lock.
What about calling this "console_kthreads_active"? The counter is
incremented when a kthread is actively printing.
> +#define console_excl_trylock() (atomic_cmpxchg(&console_lock_count, 0, -1) == 0)
> +#define console_excl_unlock() atomic_cmpxchg(&console_lock_count, -1, 0)
> +#define console_printer_tryenter() atomic_inc_unless_negative(&console_lock_count)
> +#define console_printer_exit() atomic_dec(&console_lock_count)
Similar here. What about the following
#define console_kthreads_atomic_tryblock() (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0)
#define console_kthreads_atomic_unblock() atomic_cmpxchg(&console_kthreads_active, -1, 0)
#define console_kthread_tryenter() atomic_inc_unless_negative(&console_kthreads_active)
#define console_kthread_exit() atomic_dec(&console_kthreads_active)
It makes it more clear that it is an "atomic" way to block/pause the "kthreads".
I do not insist on the word "block". I somehow like it more than
"pause". But "pause" is good as well when used together with "kthreads".
> +
> /*
> * Helper macros to handle lockdep when locking/unlocking console_sem. We use
> * macros instead of functions so that _RET_IP_ contains useful information.
> @@ -256,6 +276,37 @@ static void __up_console_sem(unsigned long ip)
> }
> #define up_console_sem() __up_console_sem(_RET_IP_)
>
> +/*
> + * Tracks whether kthread printers are all paused. A value of true implies
> + * that the console is locked via console_lock() or the console is suspended.
> + * Reading and writing to this variable requires holding @console_sem.
> + */
> +static bool consoles_paused;
> +
> +/*
> + * Pause or unpause all kthread printers.
> + *
> + * Requires the console_lock.
> + */
> +static void __pause_all_consoles(bool do_pause)
Again, it feels like no consoles will work anymore. What about
console_kthreads_block()?
> +{
> + struct console *con;
> +
> + for_each_console(con) {
> + mutex_lock(&con->lock);
> + if (do_pause)
> + con->flags |= CON_PAUSED;
> + else
> + con->flags &= ~CON_PAUSED;
> + mutex_unlock(&con->lock);
> + }
> +
> + consoles_paused = do_pause;
> +}
> +
> +#define pause_all_consoles() __pause_all_consoles(true)
> +#define unpause_all_consoles() __pause_all_consoles(false)
> +
> /*
> * This is used for debugging the mess that is the VT code by
> * keeping track if we have the console semaphore held. It's
> @@ -2568,15 +2616,45 @@ int console_trylock(void)
> up_console_sem();
> return 0;
> }
> + if (!console_excl_trylock()) {
> + up_console_sem();
> + return 0;
> + }
> console_locked = 1;
> console_may_schedule = 0;
> return 1;
> }
> EXPORT_SYMBOL(console_trylock);
>
> +/*
> + * A variant of console_trylock() that allows specifying if the context may
> + * sleep. If yes, a trylock on @console_sem is attempted and if successful,
> + * the threaded printers are paused. This is important to ensure that
> + * sleepable contexts do not become involved in console_lock handovers and
> + * will call cond_resched() during the printing loop.
> + */
> +static int console_trylock_sched(bool may_schedule)
> +{
> + if (!may_schedule)
> + return console_trylock();
> +
> + might_sleep();
> +
> + if (down_trylock_console_sem())
> + return 0;
> + if (console_suspended) {
> + up_console_sem();
> + return 0;
> + }
> + pause_all_consoles();
This is weird. Any trylock function should be fast and non-blocking.
But pause_all_consoles() uses mutex_lock().
My expectation would be that console_trylock_sched() behaves
exactly the same as console_trylock() except that it will
set console_may_schedule according to the given parameter.
I would do it the other way. Rename console_trylock() and
implement:
int console_trylock(void)
{
        return console_trylock_sched(false);
}
LATER: I got it. It is used by console_trylock_sched() called
in console_unlock() when "do_cond_resched == true". In this
case, the trylock might wait for the mutexes. It will prevent
transferring the console_lock from a schedulable to an atomic context
via the check in console_emit_next_record().
Hmm, I would still prefer to keep the console_trylock_sched()
behavior sane: non-blocking in all situations. It means
that we actually do not need it and console_trylock()
is enough.
It will allow stealing the console_lock() from a schedulable
context. But it is not a regression. And it is only
a corner case when console_unlock() re-takes the semaphore
after releasing it.
We could do the same optimization in console_unlock() by
calling console_emit_next_record() with a NULL handover pointer
when do_cond_resched == true. But we should do it
as a separate patch later; see the sketch below.
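Something along these lines (a rough sketch of the suggested later
optimization, not part of this patch; @progress is assumed to be the
local variable of the printing loop):

        /*
         * In a schedulable context there is no need to offer the
         * console_lock to a printk() waiter; pass a NULL handover
         * pointer to disable the handover logic.
         */
        if (do_cond_resched)
                progress = console_emit_next_record(con, text, ext_text,
                                                    dropped_text, NULL);
        else
                progress = console_emit_next_record(con, text, ext_text,
                                                    dropped_text, &handover);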
> + console_locked = 1;
> + console_may_schedule = 1;
> + return 1;
> +}
> +
> int is_console_locked(void)
> {
> - return console_locked;
> + return (console_locked || atomic_read(&console_lock_count));
> }
> EXPORT_SYMBOL(is_console_locked);
>
> @@ -2610,6 +2688,19 @@ static inline bool console_is_usable(struct console *con)
> static void __console_unlock(void)
> {
> console_locked = 0;
> +
> + /*
> + * Depending on whether console_lock() or console_trylock() was used,
> + * appropriately allow the kthread printers to continue.
> + */
> + if (consoles_paused)
> + unpause_all_consoles();
> + else
> + console_excl_unlock();
This is a good example of when the meaning of console_excl_unlock() is not
obvious. IMHO, the following is easier to understand:
if (console_kthreads_blocked)
        console_kthreads_unblock();
else
        console_kthreads_atomic_unblock();
> +
> + /* Wake the kthread printers. */
> + wake_up_klogd();
> +
> up_console_sem();
> }
>
> @@ -2632,7 +2723,8 @@ static void __console_unlock(void)
> *
> * @handover will be set to true if a printk waiter has taken over the
> * console_lock, in which case the caller is no longer holding the
> - * console_lock. Otherwise it is set to false.
> + * console_lock. Otherwise it is set to false. A NULL pointer may be provided
> + * to disable allowing the console_lock to be taken over by a printk waiter.
> */
> static bool console_emit_next_record(struct console *con, char *text, char *ext_text,
> char *dropped_text, bool *handover)
> @@ -2640,12 +2732,14 @@ static bool console_emit_next_record(struct console *con, char *text, char *ext_
> struct printk_info info;
> struct printk_record r;
> unsigned long flags;
> + bool allow_handover;
> char *write_text;
> size_t len;
>
> prb_rec_init_rd(&r, &info, text, CONSOLE_LOG_MAX);
>
> - *handover = false;
> + if (handover)
> + *handover = false;
>
> if (!prb_read_valid(prb, con->seq, &r))
> return false;
> @@ -2671,18 +2765,23 @@ static bool console_emit_next_record(struct console *con, char *text, char *ext_
> len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
> }
>
> - /*
> - * While actively printing out messages, if another printk()
> - * were to occur on another CPU, it may wait for this one to
> - * finish. This task can not be preempted if there is a
> - * waiter waiting to take over.
> - *
> - * Interrupts are disabled because the hand over to a waiter
> - * must not be interrupted until the hand over is completed
> - * (@console_waiter is cleared).
> - */
> - printk_safe_enter_irqsave(flags);
> - console_lock_spinning_enable();
> + /* Handovers may only happen between trylock contexts. */
> + allow_handover = (handover && atomic_read(&console_lock_count) == -1);
It is far from obvious why the check for console_lock_count is needed.
I found the reason only after removing the check and seeing the
warning about using mutex_unlock() in atomic context.
I suggest something like:
/*
 * Allow handover of the lock only from atomic context. The handover
 * is used to transfer the console owner between printk() callers.
 * They are not allowed to unblock the console kthreads via mutexes
 * in atomic context.
 */
allow_handover = (handover && console_kthreads_atomically_blocked());
> +
> + if (allow_handover) {
> + /*
> + * While actively printing out messages, if another printk()
> + * were to occur on another CPU, it may wait for this one to
> + * finish. This task can not be preempted if there is a
> + * waiter waiting to take over.
> + *
> + * Interrupts are disabled because the hand over to a waiter
> + * must not be interrupted until the hand over is completed
> + * (@console_waiter is cleared).
> + */
> + printk_safe_enter_irqsave(flags);
> + console_lock_spinning_enable();
> + }
>
> stop_critical_timings(); /* don't trace print latency */
> call_console_driver(con, write_text, len, dropped_text);
> @@ -2825,7 +2926,7 @@ void console_unlock(void)
> * Re-check if there is a new record to flush. If the trylock
> * fails, another context is already handling the printing.
> */
> - } while (prb_read_valid(prb, next_seq, NULL) && console_trylock());
> + } while (prb_read_valid(prb, next_seq, NULL) && console_trylock_sched(do_cond_resched));
> }
> EXPORT_SYMBOL(console_unlock);
>
> @@ -2856,6 +2957,10 @@ void console_unblank(void)
> if (oops_in_progress) {
> if (down_trylock_console_sem() != 0)
> return;
> + if (!console_excl_trylock()) {
> + up_console_sem();
> + return;
> + }
It would be better to use
if (oops_in_progress) {
        if (!console_trylock())
                return;
> } else {
> pr_flush(1000, true);
> console_lock();
> @@ -3397,16 +3502,17 @@ static bool printer_should_wake(struct console *con, u64 seq)
> if (kthread_should_stop())
> return true;
>
> - if (console_suspended)
> - return false;
> -
> /*
> * This is an unsafe read to con->flags, but false positives
> * are not an issue as long as they are rare.
> */
> flags = data_race(READ_ONCE(con->flags));
> - if (!(flags & CON_ENABLED))
> +
> + if (!(flags & CON_ENABLED) ||
> + (flags & CON_PAUSED) ||
> + atomic_read(&console_lock_count) == -1) {
> return false;
> + }
I would also remove this and do the check after waking up. I think
that it is actually already being discussed in another patch.
>
> return prb_read_valid(prb, seq, NULL);
> }
Otherwise, it looks good.
This must have been hard to invent. Great job!
Best Regards,
Petr
On Mon 2022-02-07 20:49:23, John Ogness wrote:
> Deadlocks and the framebuffer console have been a recurring issue
> that is getting worse. Daniel Vetter suggested [0] that
> fbcon->write() should no longer be called from an atomic context.
We should make it clear that people will no longer see kernel
messages on ttyX during early boot, panic, and in some other
situations when the printk kthreads are not available.
Or am I missing something?
Do we really want this?
Do the tty maintainers really want to give up on supporting
modes where processes/kthreads do not work?
Maybe it is inevitable. Maybe people are not using the tty
for debugging that much.
Anyway, this change has to be approved by tty guys.
Best Regards,
Petr
> Introduce a new console flag CON_MIGHT_SLEEP for a console driver to
> specify that it is only called from sleepable contexts. Set the
> fbcon to use this new flag.
>
> [0] https://lore.kernel.org/all/[email protected]
>
> Signed-off-by: John Ogness <[email protected]>
> ---
> drivers/tty/vt/vt.c | 2 +-
> include/linux/console.h | 1 +
> kernel/printk/printk.c | 2 ++
> 3 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
> index 7359c3e80d63..ab4712cc9327 100644
> --- a/drivers/tty/vt/vt.c
> +++ b/drivers/tty/vt/vt.c
> @@ -3161,7 +3161,7 @@ static struct console vt_console_driver = {
> .write = vt_console_print,
> .device = vt_console_device,
> .unblank = unblank_screen,
> - .flags = CON_PRINTBUFFER,
> + .flags = CON_PRINTBUFFER|CON_MIGHT_SLEEP,
> .index = -1,
> };
> #endif
> diff --git a/include/linux/console.h b/include/linux/console.h
> index c51c7f5507a5..ea52c56b3ff8 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -138,6 +138,7 @@ static inline int con_debug_leave(void)
> #define CON_BRL (32) /* Used for a braille device */
> #define CON_EXTENDED (64) /* Use the extended output format a la /dev/kmsg */
> #define CON_PAUSED (128) /* Sleep while console is locked */
> +#define CON_MIGHT_SLEEP (256) /* Can only be called from sleepable context */
>
> struct console {
> char name[16];
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 086155578f10..b92ef67a5aa2 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2836,6 +2836,8 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
>
> if (!console_is_usable(con))
> continue;
> + if ((con->flags & CON_MIGHT_SLEEP) && !do_cond_resched)
> + continue;
This means that ttyX will be able to show the messages only
from process context. It will no longer show the messages
during early boot, panic, suspend, and some other situations.
Do we really want this?
Do the tty maintainers really want to give up support
in these modes where processes/kthreads do not work?
Maybe it is inevitable. Maybe people are not using tty
for debugging much.
Anyway, this has to be approved by the tty maintainers.
> any_usable = true;
>
> if (con->flags & CON_EXTENDED) {
> --
> 2.30.2
On Wed, Feb 23, 2022 at 02:37:55PM +0100, Petr Mladek wrote:
> On Mon 2022-02-07 20:49:23, John Ogness wrote:
> > Deadlocks and the framebuffer console have been a recurring issue
> > that is getting worse. Daniel Vetter suggested [0] that
> > fbcon->write() should no longer be called from an atomic context.
>
> We should make it clear that people will no longer see kernel
> messages on ttyX during early boot, panic, and in some other
> situations when the printk kthreads are not available.
>
> Or do I miss something?
>
> Do we really want this?
>
> Do the tty maintainers really want to give up on supporting
> modes when processes/kthreads do not work?
I would love that, but:
> Maybe, it is inevitable. Maybe, people are not using tty
> for debugging too much.
People are using tty consoles for debugging all the time. We cannot
break that.
thanks,
greg k-h
On 2022-02-22, Petr Mladek <[email protected]> wrote:
>> >> With non-threaded console printers preemption is disabled while
>> >> holding the console lock in order to avoid the situation where the
>> >> console printer is scheduled away and no other task can lock the
>> >> console (for printing or otherwise).
>> >
>> > I guess that you are talking about the fact that vprintk_emit() disables
>> > preemption. console_lock() does not disable preemption.
>>
>> Yes. I did not mean to imply that console_lock() is disabling
>> preemption. But preemption is disabled at some point _within_ the
>> console_lock()/console_unlock() window. I consider
>> console_lock()/console_unlock() to be the main API functions for direct
>> printing and I wanted to point out that calling that pair will cause
>> preemption disabling.
>
> But console_unlock() should not call consoles when kthreads are
> usable. It should just wake up them. The kthreads might still
> use console_lock() and __console_unlock().
I am only talking about when direct printing should be performed. Direct
printing depends on more than just the kthreads. See
allow_direct_printing().
> I do not understand why this should be a reason to introduce
> per-console locks.
I see your point and you are correct. The reason for per-console locks
is purely to allow the console printers to run in parallel.
>> >> Disabling preemption is necessary because the console lock is
>> >> implemented purely as a semaphore, which has no owner.
>> >
>> > I do not understand this explanation.
>> >
>> > The preemption was added into vprintk_emit() by the commit
>> > fd5f7cde1b85d4c8e09c ("printk: Never set console_may_schedule
>> > in console_trylock()").
>> >
>> > It is not really necessary. It actually increases the risk of
>> > softlockups. The motivation was to increase the chance that
>> > messages will get printed. Nobody prints the messages when the
>> > owner is sleeping. We hoped that the console owner stealing would
>> > be efficient enough to prevent the softlockups.
>>
>> Yes, I am referring to this case where the context taking the
>> semaphore goes to sleep. If another context tries to take the
>> semaphore, the scheduler does not know which task it can schedule to
>> resolve the situation because there is no true lock contention.
>>
>> With a mutex there is an owner. When another task tries to lock a
>> mutex, the scheduler knows which task must be scheduled to resolve
>> this true lock contention. (There are also other benefits relating to
>> priority inheritance, but I chose not to mention this.)
>
> This sounds interesting. Does the scheduler wake up or prioritize
> mutex owners?
Sorry, the only example of this is priority inheritance. But for
non-PREEMPT_RT there is no priority inheritance. The lock would need to
be an rtmutex for this ability, which probably doesn't make sense for
printk.
And besides, the priority inheritance only works in one direction. With
the use of the flags approach, there is no way for printing kthreads to
boost console_lock owners. So even mentioning priority inheritance is
misleading.
> I do not see this in the code. There is the optimistic spin but
> it is used only when the current owner is just running on another CPU.
>
> Again, I do not see how this could be a motivation to introduce
> per-console mutexes.
You are correct.
>> >> Like non-threaded console printers, kthread printers use the
>> >> console lock to synchronize during printing. However, since they
>> >> use console_lock() instead of a best-effort console_trylock(), it
>> >> is not possible to disable preemption upon locking.
>> > ^^^^^^^^^^^^^^^
>> >
>> > IMHO, it is possible to disable preemption. But we probably do
>> > not want to do it, especially on RT.
>>
>> As stated in the commit message of fd5f7cde1b85 ("printk: Never set
>> console_may_schedule in console_trylock()"), enabling preemption is
>> "a mistake". That is why I said "not possible". Perhaps I should say
>> "bad idea".
>
> "bad idea" is better. I am sorry that I am too picky. But the disabled
> preemption was a big mystery when I started working on printk code
> ages ago. And it was hard to find any explanation of whether it was
> required or just nice to have. A clear commit message or comment
> would have helped a lot.
AFAICT the issue only applies to the printk() call chain, because the
caller may be holding other locks that are critical to the system. (In
Tetsuo's case [0], it was @oom_lock.)
[0] https://lore.kernel.org/linux-mm/[email protected]/
That means the reasoning in my commit message about kthreaded printing
and disabling preemption is moot. I will clean that up. Thanks for
pointing out my handwaving.
> I do not see how the per-console mutexes help with preemption:
>
> + console_lock()/console_unlock() is fast when kthreads are usable
>
> + console_lock()/console_unlock() behaves the same as before
> when kthreads are not usable, for example, during early boot,
> or panic.
>
> BTW: The possibility to process printk kthreads in parallel is
> a good enough reason from my POV. But I would still like
> to understand your primary motivation. There must be some
> important aspect that I do not understand. And it would be
> great to document it.
You are right. The v2 commit message will focus on:
- the motivation for per-console locks is parallel printing
- explain how disabling preemption is only necessary for direct
printing via printk() because the caller may be holding
system-critical and/or timing-sensitive locks (and also to allow the
console owner/waiter logic to work correctly)
- correctly clarify why the various types
(semaphore/mutex/flag/atomic) were chosen to implement the printing
synchronization between atomic-direct, non-atomic-direct, and kthreads
(and I will explicitly remind the audience that mutex_trylock() cannot
be used in atomic context)
Thanks, Petr!
John
On Wed 2022-02-23 18:26:54, John Ogness wrote:
> On 2022-02-22, Petr Mladek <[email protected]> wrote:
> >> With a mutex there is an owner. When another task tries to lock a
> >> mutex, the scheduler knows which task must be scheduled to resolve
> >> this true lock contention. (There are also other benefits relating to
> >> priority inheritance, but I chose not to mention this.)
> >
> > This sounds interesting. Does the scheduler wake up or prioritize
> > mutex owners?
>
> Sorry, the only example of this is priority inheritance. But for
> non-PREEMPT_RT there is no priority inheritance. The lock would need to
> be an rtmutex for this ability, which probably doesn't make sense for
> printk.
Good to know.
> The v2 commit message will focus on:
>
> - the motivation for per-console locks is parallel printing
>
> - explain how disabling preemption is only necessary for direct
> printing via printk() because the caller may be holding
> system-critical and/or timing-sensitive locks (and also to allow the
> console owner/waiter logic to work correctly)
>
> - correctly clarify why the various types
> (semaphore/mutex/flag/atomic) were chosen to implement the printing
> synchronization between atomic-direct, non-atomic-direct, and kthreads
> (and I will explicitly remind the audience that mutex_trylock() cannot
> be used in atomic context)
Sounds great.
Thanks a lot for the hard work. It is very appreciated.
Best Regards,
Petr
On 2022-02-16, Petr Mladek <[email protected]> wrote:
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index d1b773823d63..b346e60e9e51 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2577,11 +2577,11 @@ static int have_callable_console(void)
>> *
>> * Console drivers may assume that per-cpu resources have been allocated. So
>> * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't
>> - * call them until this CPU is officially up.
>> + * call them until per-cpu resources have been allocated.
>> */
>> static inline int can_use_console(void)
>> {
>> - return cpu_online(raw_smp_processor_id()) || have_callable_console();
>> + return (printk_percpu_data_ready() || have_callable_console());
>> }
>
> cpu_online(raw_smp_processor_id()) check is used also in
> call_console_drivers(). The same logic should be used in both
> locations.
>
> I found this when reviewing 6th patch that replaced both checks
> with a single one.
>
> Note that I am still not sure if this change is correct at all.
> It will allow to always call the console during CPU hotplug
> and I am not sure if it is safe. IMHO, it might cause problems when
> a console driver uses, for example, CPU-bound workqueues.
You are correct. We must take hotplug into account for !CON_ANYTIME
consoles. There may be some hotplug callbacks that make memory
unavailable for the console.
However, I will add the use of printk_percpu_data_ready() in the
check. !CON_ANYTIME consoles also should not be called until the per-cpu
areas are ready. For example, it would be bad if a console queued
irq_work before per-cpu areas are setup (cpu_online() is true during
this time).
One of my main concerns was that raw_smp_processor_id() was used for the
check. It is conceptually wrong to exclude certain consoles based on a
current CPU when migration is still enabled. I understand that the use
of can_use_console() is an optimization to avoid doing extra work where
there are no consoles available. But the task could be preemptible there
and _conceptually_, could get moved to another CPU before its write()
callback is called. The cpu_online() check belongs in code where
preemption is disabled.
If the context is preemptible, I do not think it will ever see
!cpu_online(). So I think if the cpu_online() check is limited to
unlocking when console_trylock() was used, it will be correct.
In the current implementation of printk(), it would be odd to do this
conditional check (perhaps by passing @do_cond_resched to
can_use_console()). But my series does significant refactoring and
actually does need to distinguish between console_lock() and
console_trylock() due to the kthreads and supporting the handover. So it
should work well that the cpu_online() check for !CON_ANYTIME is only
performed when !preemptible.
Regardless, my v2 will keep cpu_online() checks since they are necessary
for hotplug support.
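For illustration, such a conditional check could look roughly like this
(a sketch only; the actual series ends up structuring this differently):

	static inline bool can_use_console(bool preemptible_ctx)
	{
		/* No console may be called before per-cpu data is ready. */
		if (!printk_percpu_data_ready())
			return false;

		/*
		 * Checking the current CPU is only meaningful once
		 * migration is impossible, i.e. in the non-preemptible
		 * console_trylock() path. A preemptible caller will never
		 * see !cpu_online() anyway.
		 */
		if (preemptible_ctx)
			return true;

		return cpu_online(raw_smp_processor_id()) ||
		       have_callable_console();
	}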
John
On 2022-02-17, Petr Mladek <[email protected]> wrote:
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 02bde45c1149..1e80fd052bd5 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2802,8 +2804,10 @@ void console_unblank(void)
>> if (oops_in_progress) {
>> if (down_trylock_console_sem() != 0)
>> return;
>> - } else
>> + } else {
>> + pr_flush(1000, true);
>
> It would make more sense to flush the consoles after they are
> unblanked. I mean to move this to the end of the function.
Agreed.
> Also it is not obvious why this is not called when oops_in_progress
> is set. I guess that it is because trylock is needed in this case.
> It should be handled inside pr_flush().
>
> I mean that pr_flush() should internally use trylock when
> @oops_in_progress is set. It will make it safe even in this
> mode.
pr_flush() is a might_sleep() function. We agreed on this at
LPC2019.
Creating a pr_flush() that will directly push out the messages (or
busy-wait) in non-preemptible contexts is complicated. It might be
something to attempt for the future, but I would prefer to avoid it at
this stage.
>> console_lock();
>> + }
>>
>> console_locked = 1;
>> console_may_schedule = 0;
>> @@ -2869,6 +2873,7 @@ struct tty_driver *console_device(int *index)
>> */
>> void console_stop(struct console *console)
>> {
>> + pr_flush(1000, true);
>
> It would be enough to flush just the given @console.
For v2 I will create an internal __pr_flush() to allow specifying that
only a single console is flushed. The high level pr_flush() will then
call __pr_flush() specifying all consoles.
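A minimal sketch of that split, assuming a @con parameter where NULL
means "all consoles" (names and shape are illustrative for v2):

	static bool __pr_flush(struct console *con, int timeout_ms,
			       bool reset_on_progress)
	{
		u64 seq = prb_next_seq(prb);
		struct console *c;
		u64 diff;

		might_sleep();

		for (;;) {
			diff = 0;

			console_lock();
			for_each_console(c) {
				/* Only consider the requested console. */
				if (con && con != c)
					continue;
				if (!console_is_usable(c))
					continue;
				if (c->seq < seq)
					diff += seq - c->seq;
			}
			console_unlock();

			if (diff == 0)
				break;

			/*
			 * ... msleep()/timeout/@reset_on_progress handling
			 * exactly as in the pr_flush() body quoted above ...
			 */
		}

		return (diff == 0);
	}

	bool pr_flush(int timeout_ms, bool reset_on_progress)
	{
		return __pr_flush(NULL, timeout_ms, reset_on_progress);
	}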
> It might be possible to take over the job from the related
> kthread and flush it in this context. Well, I am not sure if
> it is a good idea.
I agree that it might not be a good idea. Let's keep things simple for
now.
>> console_lock();
>> console->flags &= ~CON_ENABLED;
>> console_unlock();
>> @@ -2880,6 +2885,7 @@ void console_start(struct console *console)
>> console_lock();
>> console->flags |= CON_ENABLED;
>> console_unlock();
>> + pr_flush(1000, true);
>
> Same here.
OK.
>> }
>> EXPORT_SYMBOL(console_start);
>>
>> @@ -3249,6 +3255,71 @@ static int __init printk_late_init(void)
>> late_initcall(printk_late_init);
>>
>> #if defined CONFIG_PRINTK
>> +/**
>> + * pr_flush() - Wait for printing threads to catch up.
>> + *
>
> Alternative solution would be to take over the job from the kthreads
> and flush the consoles in this context. Well, I am not sure
> if it is a good idea or not.
Since pr_flush() is might_sleep() this would be relatively simple. Just
grab the console mutex and go. My concern is that this task may have
different scheduling parameters that could negatively affect the
system. For normal operation, I really would prefer that the designated
kthreads do the work. If "waiting for the kthreads" turns out to be
problematic, then maybe we could go down this path.
>> + * @timeout_ms: The maximum time (in ms) to wait.
>> + * @reset_on_progress: Reset the timeout if forward progress is seen.
>> + *
>> + * A value of 0 for @timeout_ms means no waiting will occur. A value of -1
>> + * represents infinite waiting.
>> + *
>> + * If @reset_on_progress is true, the timeout will be reset whenever any
>> + * printer has been seen to make some forward progress.
>> + *
>> + * Context: Process context. May sleep while acquiring console lock.
>> + * Return: true if all enabled printers are caught up.
>> + */
>> +bool pr_flush(int timeout_ms, bool reset_on_progress)
>> +{
>> + int remaining = timeout_ms;
>> + struct console *con;
>> + u64 last_diff = 0;
>> + u64 printk_seq;
>> + u64 diff;
>> + u64 seq;
>> +
>> + might_sleep();
>> +
>> + seq = prb_next_seq(prb);
>> +
>> + for (;;) {
>> + diff = 0;
>> +
>> + console_lock();
>> + for_each_console(con) {
>> + if (!console_is_usable(con))
>> + continue;
>> + printk_seq = con->seq;
>> + if (printk_seq < seq)
>> + diff += seq - printk_seq;
>> + }
>> + console_unlock();
>> +
>> + if (diff != last_diff && reset_on_progress)
>> + remaining = timeout_ms;
>> +
>> + if (diff == 0 || remaining == 0)
>> + break;
>> +
>> + if (remaining < 0) {
>> + /* no timeout limit */
>> + msleep(100);
>> + } else if (remaining < 100) {
>> + msleep(remaining);
>> + remaining = 0;
>> + } else {
>> + msleep(100);
>> + remaining -= 100;
>> + }
>> +
>> + last_diff = diff;
>> + }
>> +
>> + return (diff == 0);
>> +}
>> +EXPORT_SYMBOL(pr_flush);
>
> Summary:
>
> The pr_flush() API and the optional timeout look reasonable to me.
>
> Please, handle oops_in_progress in pr_flush() and make it safe in this
> mode. It will allow to move it at the end of console_unblank() where
> it makes more sense.
I will add another oops_in_progress check at the end. pr_flush() will
not be made safe for oops_in_progress. Keep in mind that when
oops_in_progress is set, direct printing will be active, so there should
be no need for pr_flush() anyway.
> I do not resist on flushing only the given consoles in console_stop()
> and console_start(). It is nice to have and can be done later.
It is a simple change. I will do it for v2.
> Also I do not resist on doing the flush in the context of the caller.
> I am not even sure if it is a good idea. We could play with it
> later when there are some problems with the current approach
> in practice.
For now, let's keep it as a waiting function.
John
I have taken some time to investigate the percpu implementation so that
I could provide clear answers here.
On 2022-02-15, Petr Mladek <[email protected]> wrote:
> I am not 100% sure. But it seems that static per-CPU variables might
> actually be used since the boot.
You are correct, but until per-cpu areas are setup, they all point to
CPU0. Normally that is not a problem since usually code is using
this_cpu_ptr() (which will always be the CPU0 in early boot), rather
than specifying foreign CPUs with per_cpu_ptr().
> Most likely, only dynamically allocated per-cpu variables have to wait
> until the per-cpu areas are initialized.
It is also important to wait if data will be stored that is no longer
valid after per-cpu areas are setup. setup_per_cpu_areas() copies the
static CPU0 per-cpu value to all the newly setup per-cpu areas.
This is actually the cause for the mystery [0] of failing irq_work when
printk_deferred was added with commit
15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a
lockdep splat with printk()"). The problem was that the irq work was
being queued and _claimed_ before per-cpu areas were setup. But the
actual work functions are not called there. So when the irq_work was
copied to all the new per-cpu areas and the actual work was carried out,
the work item that was cleared was the old static variable and not the
new per-cpu area variable. This means all the irq_work flags were left
in the claimed state (PENDING|BUSY) and could no longer be claimed.
By avoiding queueing irq_work before setup_per_cpu_areas(), we correctly
avoided this problem. (I considered sending a patch so that
irq_work_claim() will fail if a global @percpu_complete is not yet
set. But for now, our set_percpu_data_ready() solution is at least good
enough for the printk subsystem.)
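A simplified sketch of that guard, using the helper names from this
thread (the real code differs in detail, e.g. the irq_work is
initialized with its callback elsewhere):

	static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work);
	static bool __printk_percpu_data_ready;

	/* Called once setup_per_cpu_areas() has completed. */
	void set_percpu_data_ready(void)
	{
		__printk_percpu_data_ready = true;
	}

	static bool printk_percpu_data_ready(void)
	{
		return __printk_percpu_data_ready;
	}

	void defer_console_output(void)
	{
		/*
		 * Before setup_per_cpu_areas(), all per-cpu pointers alias
		 * the static CPU0 copy. An irq_work claimed now would be
		 * duplicated into every new per-cpu area with PENDING|BUSY
		 * still set, making those copies unclaimable. So do not
		 * queue anything yet.
		 */
		if (!printk_percpu_data_ready())
			return;

		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
	}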
> We should probably revisit the code and remove the fallback to
> normal static variables.
Definitely. Now it is clear that @printk_count and @printk_count_nmi do
not need early variants.
John
[0] https://lore.kernel.org/lkml/[email protected]/
On Wed 2022-03-02 18:29:09, John Ogness wrote:
> On 2022-02-17, Petr Mladek <[email protected]> wrote:
> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >> index 02bde45c1149..1e80fd052bd5 100644
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -2802,8 +2804,10 @@ void console_unblank(void)
> >> if (oops_in_progress) {
> >> if (down_trylock_console_sem() != 0)
> >> return;
> >> - } else
> >> + } else {
> >> + pr_flush(1000, true);
> >
> > It would make more sense to flush the consoles after they are
> > unblanked. I mean to move this to the end of the function.
>
> Agreed.
>
> > Also it is not obvious why this is not called when oops_in_progress
> > is set. I guess that it is because trylock is needed in this case.
> > It should be handled inside pr_flush().
> >
> > I mean that pr_flush() should internally use trylock when
> > @oops_in_progress is set. It will make it safe even in this
> > mode.
>
> pr_flush() is a might_sleep() function. We agreed on this at
> LPC2019.
I see.
> Creating a pr_flush() that will directly push out the messages (or
> busy-wait) in non-preemptible contexts is complicated. It might be
> something to attempt for the future, but I would prefer to avoid it at
> this stage.
Sure. It is not needed now, definitely.
> >> console_lock();
> >> + }
> >>
> >> console_locked = 1;
> >> console_may_schedule = 0;
> >> @@ -2869,6 +2873,7 @@ struct tty_driver *console_device(int *index)
> >> */
> >> void console_stop(struct console *console)
> >> {
> >> + pr_flush(1000, true);
> >
> > It would be enough to flush just the given @console.
>
> For v2 I will create an internal __pr_flush() to allow specifying that
> only a single console is flushed. The high level pr_flush() will then
> call __pr_flush() specifying all consoles.
Feel free to keep it as is when it gets too complicated.
We could always optimize it later.
> > It might be possible to take over the job from the related
> > kthread and flush it in this context. Well, I am not sure if
> > it is a good idea.
>
> I agree that it might not be a good idea. Let's keep things simple for
> now.
I agree.
> >> console_lock();
> >> console->flags &= ~CON_ENABLED;
> >> console_unlock();
> >> @@ -2880,6 +2885,7 @@ void console_start(struct console *console)
> >> console_lock();
> >> console->flags |= CON_ENABLED;
> >> console_unlock();
> >> + pr_flush(1000, true);
> >
> > Same here.
>
> OK.
>
> >> }
> >> EXPORT_SYMBOL(console_start);
> >>
> >> @@ -3249,6 +3255,71 @@ static int __init printk_late_init(void)
> >> late_initcall(printk_late_init);
> >>
> >> #if defined CONFIG_PRINTK
> >> +/**
> >> + * pr_flush() - Wait for printing threads to catch up.
> >> + *
> >
> > Alternative solution would be to take over the job from the kthreads
> > and flush the consoles in this context. Well, I am not sure
> > if it is a good idea or not.
>
> Since pr_flush() is might_sleep() this would be relatively simple. Just
> grab the console mutex and go. My concern is that this task may have
> different scheduling parameters that could negatively affect the
> system. For normal operation, I really would prefer that the designated
> kthreads do the work. If "waiting for the kthreads" turns out to be
> problematic, then maybe we could go down this path.
The different scheduling parameters were actually my concern. I thought
about a high-priority task waiting for normal-priority kthreads.
Anyway, let's keep it simple now. Only practice will show if
a more complex solution is needed.
> >> + * @timeout_ms: The maximum time (in ms) to wait.
> >> + * @reset_on_progress: Reset the timeout if forward progress is seen.
> >> + *
> >> + * A value of 0 for @timeout_ms means no waiting will occur. A value of -1
> >> + * represents infinite waiting.
> >> + *
> >> + * If @reset_on_progress is true, the timeout will be reset whenever any
> >> + * printer has been seen to make some forward progress.
> >> + *
> >> + * Context: Process context. May sleep while acquiring console lock.
> >> + * Return: true if all enabled printers are caught up.
> >> + */
> >> +bool pr_flush(int timeout_ms, bool reset_on_progress)
> >> +{
> >> + int remaining = timeout_ms;
> >> + struct console *con;
> >> + u64 last_diff = 0;
> >> + u64 printk_seq;
> >> + u64 diff;
> >> + u64 seq;
> >> +
> >> + might_sleep();
> >> +
> >> + seq = prb_next_seq(prb);
> >> +
> >> + for (;;) {
> >> + diff = 0;
> >> +
> >> + console_lock();
> >> + for_each_console(con) {
> >> + if (!console_is_usable(con))
> >> + continue;
> >> + printk_seq = con->seq;
> >> + if (printk_seq < seq)
> >> + diff += seq - printk_seq;
> >> + }
> >> + console_unlock();
> >> +
> >> + if (diff != last_diff && reset_on_progress)
> >> + remaining = timeout_ms;
> >> +
> >> + if (diff == 0 || remaining == 0)
> >> + break;
> >> +
> >> + if (remaining < 0) {
> >> + /* no timeout limit */
> >> + msleep(100);
> >> + } else if (remaining < 100) {
> >> + msleep(remaining);
> >> + remaining = 0;
> >> + } else {
> >> + msleep(100);
> >> + remaining -= 100;
> >> + }
> >> +
> >> + last_diff = diff;
> >> + }
> >> +
> >> + return (diff == 0);
> >> +}
> >> +EXPORT_SYMBOL(pr_flush);
> >
> > Summary:
> >
> > The pr_flush() API and the optional timeout look reasonable to me.
> >
> > Please, handle oops_in_progress in pr_flush() and make it safe in this
> > mode. It will allow to move it at the end of console_unblank() where
> > it makes more sense.
>
> I will add another oops_in_progress check at the end. pr_flush() will
> not be made safe for oops_in_progress. Keep in mind that when
> oops_in_progress is set, direct printing will be active, so there should
> be no need for pr_flush() anyway.
Sounds good.
Best Regards,
Petr
On Wed 2022-03-02 15:27:50, John Ogness wrote:
> I have taken some time to investigate the percpu implementation so that
> I could provide clear answers here.
>
> On 2022-02-15, Petr Mladek <[email protected]> wrote:
> > I am not 100% sure. But it seems that static per-CPU variables might
> > actually be used since the boot.
>
> You are correct, but until per-cpu areas are setup, they all point to
> CPU0. Normally that is not a problem since usually code is using
> this_cpu_ptr() (which will always be the CPU0 in early boot), rather
> than specifying foreign CPUs with per_cpu_ptr().
>
> > Most likely, only dynamically allocated per-cpu variables have to wait
> > until the per-cpu areas are initialized.
>
> It is also important to wait if data will be stored that is no longer
> valid after per-cpu areas are setup. setup_per_cpu_areas() copies the
> static CPU0 per-cpu value to all the newly setup per-cpu areas.
>
> This is actually the cause for the mystery [0] of failing irq_work when
> printk_deferred was added with commit
> 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a
> lockdep splat with printk()").
Just for the record, the right commit ID in the mainline is
1b710b1b10eff9d466. It used printk_deferred() in _warn_unseeded_randomness():
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1687,8 +1687,9 @@ static void _warn_unseeded_randomness(const char *func_name, void *caller,
print_once = true;
#endif
if (__ratelimit(&unseeded_warning))
- pr_notice("random: %s called from %pS with crng_init=%d\n",
- func_name, caller, crng_init);
+ printk_deferred(KERN_NOTICE "random: %s called from %pS "
+ "with crng_init=%d\n", func_name, caller,
+ crng_init);
}
/*
> By avoiding queueing irq_work before setup_per_cpu_areas(), we correctly
> avoided this problem. (I considered sending a patch so that
> irq_work_claim() will fail if a global @percpu_complete is not yet
> set.
That is a great TODO :-)
> But for now, our set_percpu_data_ready() solution is at least good
> enough for the printk subsystem.)
Yes but it is most likely not needed for CON_ANYTIME consoles,
see below.
> > We should probably revisit the code and remove the fallback to
> > normal static variables.
>
> Definitely. Now it is clear that @printk_count and @printk_count_nmi do
> not need early variants.
Note the ordering:
asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
{
[...]
early_security_init();
[...]
setup_per_cpu_areas();
[...]
parse_early_param();
after_dashes = parse_args("Booting kernel",
My guess is that _warn_unseeded_randomness() was called from code
before early_security_init(). It was bad because per-CPU variables
were not ready yet.
The early consoles are enabled by parse_early_param(). It happens
after setup_per_cpu_areas(). It means that all console drivers
should be on the safe side.
Best Regards,
Petr
On Wed 2022-03-02 15:55:23, John Ogness wrote:
> On 2022-02-16, Petr Mladek <[email protected]> wrote:
> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >> index d1b773823d63..b346e60e9e51 100644
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -2577,11 +2577,11 @@ static int have_callable_console(void)
> >> *
> >> * Console drivers may assume that per-cpu resources have been allocated. So
> >> * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't
> >> - * call them until this CPU is officially up.
> >> + * call them until per-cpu resources have been allocated.
> >> */
> >> static inline int can_use_console(void)
> >> {
> >> - return cpu_online(raw_smp_processor_id()) || have_callable_console();
> >> + return (printk_percpu_data_ready() || have_callable_console());
> >> }
> >
> > cpu_online(raw_smp_processor_id()) check is used also in
> > call_console_drivers(). The same logic should be used in both
> > locations.
> >
> > I found this when reviewing 6th patch that replaced both checks
> > with a single one.
> >
> > Note that I am still not sure if this change is correct at all.
> > It will allow to always call the console during CPU hotplug
> > and I am not sure if it is safe. IMHO, it might cause problems when
> > a console driver uses, for example, CPU-bound workqueues.
>
> You are correct. We must take hotplug into account for !CON_ANYTIME
> consoles. There may be some hotplug callbacks that make memory
> unavailable for the console.
>
> However, I will add the use of printk_percpu_data_ready() in the
> check. !CON_ANYTIME consoles also should not be called until the per-cpu
> areas are ready. For example, it would be bad if a console queued
> irq_work before per-cpu areas are setup (cpu_online() is true during
> this time).
>
> One of my main concerns was that raw_smp_processor_id() was used for the
> check. It is conceptually wrong to exclude certain consoles based on a
> current CPU when migration is still enabled. I understand that the use
> of can_use_console() is an optimization to avoid doing extra work where
> there are no consoles available. But the task could be preemptible there
> and _conceptually_, could get moved to another CPU before its write()
> callback is called. The cpu_online() check belongs in code where
> preemption is disabled.
>
> If the context is preemptible, I do not think it will ever see
> !cpu_online(). So I think if the cpu_online() check is limited to
> unlocking when console_trylock() was used, it will be correct.
This would require calling console_lock()/console_unlock()
in a hotplug code when cpu_online() already returns false.
Do I get it correctly?
I agree that it should not happen. console_lock() must be called in
a preemptible context. And a CPU should not be in a preemptible
context when cpu_online() returns false. To be honest, I did
not check the code. It just does not make much sense.
> In the current implementation of printk(), it would be odd to do this
> conditional check (perhaps by passing @do_cond_resched to
> can_use_console()). But my series does significant refactoring and
> actually does need to distinguish between console_lock() and
> console_trylock() due to the kthreads and supporting the handover. So it
> should work well that the cpu_online() check for !CON_ANYTIME is only
> performed when !preemptible.
>
> Regardless, my v2 will keep cpu_online() checks since they are necessary
> for hotplug support.
Yes, I would do it to stay on the safe side.
Best Regards,
Petr
Hi Petr,
On Fri, Mar 04, 2022 at 04:56:55PM +0100, Petr Mladek wrote:
> Just for the record, the right commit ID in the mainline is
> 1b710b1b10eff9d466. It used printk_deferred() in _warn_unseeded_randomness():
>
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1687,8 +1687,9 @@ static void _warn_unseeded_randomness(const char *func_name, void *caller,
> print_once = true;
> #endif
> if (__ratelimit(&unseeded_warning))
> - pr_notice("random: %s called from %pS with crng_init=%d\n",
> - func_name, caller, crng_init);
> + printk_deferred(KERN_NOTICE "random: %s called from %pS "
> + "with crng_init=%d\n", func_name, caller,
> + crng_init);
> }
Are we able to revert this yet? Or is it still required because of
locking issues? Would gladly take a patch to revert that if the
non-deferred function is fine for 5.18.
Jason
On 2022-03-04, Petr Mladek <[email protected]> wrote:
>> If the context is preemptible, I do not think it will ever see
>> !cpu_online(). So I think if the cpu_online() check is limited to
>> unlocking when console_trylock() was used, it will be correct.
>
> This would require calling console_lock()/console_unlock()
> in a hotplug code when cpu_online() already returns false.
> Do I get it correctly?
Yes, but there is also the other side: Are there preemptible code paths
that are calling console_trylock()? And if yes, then why are they not
using console_lock()?
There is only a handful of console_trylock() users (about 10
sites). Most of them appear to be differentiating between preemptible and
non-preemptible contexts. I will investigate them in detail.
If the raw_smp_processor_id() is changed to smp_processor_id() within
the console_trylock()/console_unlock() call path, then at least the
kernel's lock debugging would identify if there are preemptible contexts
that are using console_trylock().
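In can_use_console() that would be a one-line change (sketch only):

	-	return cpu_online(raw_smp_processor_id()) || have_callable_console();
	+	return cpu_online(smp_processor_id()) || have_callable_console();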
John
On Wed 2022-03-02 15:55:23, John Ogness wrote:
> On 2022-02-16, Petr Mladek <[email protected]> wrote:
> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >> index d1b773823d63..b346e60e9e51 100644
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -2577,11 +2577,11 @@ static int have_callable_console(void)
> >> *
> >> * Console drivers may assume that per-cpu resources have been allocated. So
> >> * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't
> >> - * call them until this CPU is officially up.
> >> + * call them until per-cpu resources have been allocated.
> >> */
> >> static inline int can_use_console(void)
> >> {
> >> - return cpu_online(raw_smp_processor_id()) || have_callable_console();
> >> + return (printk_percpu_data_ready() || have_callable_console());
> >> }
> >
> > cpu_online(raw_smp_processor_id()) check is used also in
> > call_console_drivers(). The same logic should be used in both
> > locations.
> >
> > I found this when reviewing 6th patch that replaced both checks
> > with a single one.
> >
> > Note that I am still not sure if this change is correct at all.
> > It will allow to always call the console during CPU hotplug
> > and I am not sure if it is safe. IMHO, it might cause problems when
> > a console driver uses, for example, CPU-bound workqueues.
>
> You are correct. We must take hotplug into account for !CON_ANYTIME
> consoles. There may be some hotplug callbacks that make memory
> unavailable for the console.
>
> However, I will add the use of printk_percpu_data_ready() in the
> check. !CON_ANYTIME consoles also should not be called until the per-cpu
> areas are ready. For example, it would be bad if a console queued
> irq_work before per-cpu areas are setup (cpu_online() is true during
> this time).
>
> One of my main concerns was that raw_smp_processor_id() was used for the
> check. It is conceptually wrong to exclude certain consoles based on a
> current CPU when migration is still enabled. I understand that the use
> of can_use_console() is an optimization to avoid doing extra work where
> there are no consoles available. But the task could be preemptible there
> and _conceptually_, could get moved to another CPU before its write()
> callback is called. The cpu_online() check belongs in code where
> preemption is disabled.
>
> If the context is preemptible, I do not think it will ever see
> !cpu_online(). So I think if the cpu_online() check is limited to
> unlocking when console_trylock() was used, it will be correct.
I investigated the cpu hotplug code and found the following:
1. In the cpu_down() code path, @cpu_online mask is cleared
by this call chain:
+ take_cpu_down()
+ __cpu_disable()
+ smp_ops.cpu_disable()
+ native_cpu_disable() # x86
+ cpu_disable_common()
+ remove_cpu_from_maps()
+ set_cpu_online(cpu, false)
, where take_cpu_down() is called via:
+ .teardown.single() calback for CPUHP_TEARDOWN_CPU state
+ takedown_cpu()
+ stop_machine_cpuslocked()
+ stop_cpus()
+ __stop_cpus()
+ queue_stop_cpus_work()
+ cpu_stop_queue_work()
, which queues the work in cpu_stopper thread that is bound
to the CPU:
+ cpu_stop_init()
+ smpboot_register_percpu_thread()
+ __smpboot_create_thread()
+ kthread_create_on_cpu()
Summary: @cpu_online mask is cleared on the affected CPU in
cpu_stopper thread that is bound to the same CPU.
It happens when handling CPUHP_TEARDOWN_CPU.
2. The CPU hotplug states are split into three groups:
+ code running on control CPU (another CPU)
+ low level code running on the hotplugged CPU
+ code running in the hotplug thread on the hotplugged CPU
It is described in include/linux/cpuhotplug.h:
/* PREPARE section invoked on a control CPU */
CPUHP_OFFLINE = 0,
[...]
/*
* STARTING section invoked on the hotplugged CPU in low level
* bringup and teardown code.
*/
CPUHP_AP_IDLE_DEAD,
[...]
CPUHP_AP_ONLINE,
CPUHP_TEARDOWN_CPU,
/* Online section invoked on the hotplugged CPU from the hotplug thread */
CPUHP_AP_ONLINE_IDLE,
CPUHP_AP_SCHED_WAIT_EMPTY,
[...]
CPUHP_ONLINE,
, where sched_cpu_wait_empty() is the .teardown.single callback for
CPUHP_AP_SCHED_WAIT_EMPTY. After this callback, other tasks
should not be scheduled on this CPU. Any attempt should be
caught and handled by sched_cpu_dying().
Note that CPUHP_AP_SCHED_WAIT_EMPTY is called before
CPUHP_TEARDOWN_CPU when the CPU goes down.
Summary: @cpu_online mask is cleared for the CPU when other tasks
can no longer be scheduled there.
Result: cpu_online(raw_smp_processor_id()) should be safe for our
purpose. It will return false only when the task can no longer
migrate from the CPU.
I have to admit that it is far from obvious and tricky like
hell.
OK, cpu_online(raw_smp_processor_id()) check is not racy. Another
question is whether it is a good check whether the consoles are usable
or not.
I found the following:
1. My understanding is that affinity of IRQs is disabled right after
clearing @cpu_online mask:
void cpu_disable_common(void)
{
[...]
remove_cpu_from_maps(cpu);
[...]
fixup_irqs();
[...]
}
2. Timers must not be used soon after clearing the @cpu_online mask,
see include/linux/cpuhotplug.h:
/* Must be the last timer callback */
CPUHP_AP_DUMMY_TIMER_STARTING,
[...]
CPUHP_AP_ONLINE,
CPUHP_TEARDOWN_CPU,
Result: From my POV, cpu_online(raw_smp_processor_id()) marks a
reasonable place in the CPU hotplug code where the consoles
start/stop being usable.
But again, it is far from obvious and tricky like hell.
Summary: We need to keep cpu_online(raw_smp_processor_id()) check
to make the consoles safe during CPU hotplug.
IMHO, it is not about per-CPU variables. It is more
about timers and interrupts. The hotplugged CPU is not
ready to call console code at these early hotplug stages.
> Regardless, my v2 will keep cpu_online() checks since they are necessary
> for hotplug support.
Yes, please. We should also somehow document this. But it can be done
separately. It is not necessarily in the scope of your patchset.
Best Regards,
Petr
On Sat 2022-03-05 10:05:02, Jason A. Donenfeld wrote:
> Hi Petr,
>
> On Fri, Mar 04, 2022 at 04:56:55PM +0100, Petr Mladek wrote:
>
> > Just for the record, the right commit ID in the mainline is
> > 1b710b1b10eff9d466. It used printk_deferred() in _warn_unseeded_randomness():
> >
> > --- a/drivers/char/random.c
> > +++ b/drivers/char/random.c
> > @@ -1687,8 +1687,9 @@ static void _warn_unseeded_randomness(const char *func_name, void *caller,
> > print_once = true;
> > #endif
> > if (__ratelimit(&unseeded_warning))
> > - pr_notice("random: %s called from %pS with crng_init=%d\n",
> > - func_name, caller, crng_init);
> > + printk_deferred(KERN_NOTICE "random: %s called from %pS "
> > + "with crng_init=%d\n", func_name, caller,
> > + crng_init);
> > }
>
> Are we able to revert this yet? Or is it still required because of
> locking issues? Would gladly take a patch to revert that if the
> non-deferred function is fine for 5.18.
Unfortunately, printk_deferred() will still be needed in 5.18 here.
One thing is that this patchset introducing console kthreads will not
be ready for 5.18.
But more importantly, the kthreads will not be enough to remove the
cyclic dependency. The legacy synchronous printing will still be used
during early boot, panic, etc.
Honestly, I am still not sure if we will be able to get rid of
printk_deferred(). I hope so but some more magic will be necessary.
Anyway, the kthreads should help to achieve this.
Best Regards,
Petr
On 2022-02-23, Petr Mladek <[email protected]> wrote:
>> +/*
>> + * A variant of console_trylock() that allows specifying if the context may
>> + * sleep. If yes, a trylock on @console_sem is attempted and if successful,
>> + * the threaded printers are paused. This is important to ensure that
>> + * sleepable contexts do not become involved in console_lock handovers and
>> + * will call cond_resched() during the printing loop.
>> + */
>> +static int console_trylock_sched(bool may_schedule)
>> +{
>> + if (!may_schedule)
>> + return console_trylock();
>> +
>> + might_sleep();
>> +
>> + if (down_trylock_console_sem())
>> + return 0;
>> + if (console_suspended) {
>> + up_console_sem();
>> + return 0;
>> + }
>> + pause_all_consoles();
>
> This is weird. Any trylock function should be fast and non-blocking.
> But pause_all_consoles() uses mutex_lock().
>
> My expectation would be that console_trylock_sched() behaves
> exactly the same as console_trylock() except that it will
> set console_may_schedule according to the given parameter.
>
> I would do it the other way. Rename console_trylock() and
> implement:
>
> int console_trylock(void)
> {
> return console_trylock_sched(false);
> }
>
> LATER: I got it. It is used by console_trylock_sched() called
> in console_unlock() when "do_cond_resched == true". In this
> case, the trylock might wait for the mutexes. It will prevent
> transferring console_lock from schedulable to atomic context
> by the check in console_emit_next_record().
Yes!
> Hmm, I would still prefer to keep console_trylock_sched()
> behavior sane: non-blocking in all situations. It means
> that we actually do not need it and console_trylock()
> is enough.
>
> It will allow stealing the console lock from a schedulable
> context. But it is not a regression. And it is only
> a corner case when console_unlock() re-takes the semaphore
> after releasing it.
A console waiter must not wait on a schedulable context. The console
waiter is burning the CPU waiting for a transfer. If the console owner
gets scheduled away while still holding the console lock, that is bad.
> We could do the same optimization in console_unlock() by
> calling console_emit_next_record() with NULL handover pointer
> when do_cond_resched == true. But we should do it
> as a separate patch later.
It is not an optimization, it is needed. Passing a NULL handover pointer
when do_cond_resched == true would handle it correctly, but this feels
like a workaround to me.
The reason for adding console_trylock_sched() is because a context that
previously acquired the console lock via console_lock() wants to try to
reacquire it. If it reacquires the console lock using the kthread
mutexes, the locking scenario returns to the same as it was
before... all kthreads are blocked via their mutex.
You are suggesting that a console_lock() context later tries to
reacquire the console lock, but using the console_trylock() method
(atomic counter) and keeping console_may_schedule=1.
IMHO, _this_ is a weird variant that requires passing in a NULL handover
pointer as a workaround. It introduces a third locking scenario where a
schedulable context is using functions created for atomic use.
Also, as I mentioned in the percpu thread [0], I think we need to avoid
console_trylock() usage in schedulable contexts. Functions need to be
aware in what contexts they are running and call the appropriate
functions for it.
>> @@ -2856,6 +2957,10 @@ void console_unblank(void)
>> if (oops_in_progress) {
>> if (down_trylock_console_sem() != 0)
>> return;
>> + if (!console_excl_trylock()) {
>> + up_console_sem();
>> + return;
>> + }
>
> It would be better to use
>
> if (oops_in_progress) {
> if (!console_trylock())
> return;
>
Well, that is slightly different. It would mean that @console_suspended
is now also considered. I will investigate if that matters, but
currently it is not considered.
John
[0] https://lore.kernel.org/lkml/[email protected]
On 2022-03-10, Petr Mladek <[email protected]> wrote:
> console_unlock()
> {
> [...]
> if (may_schedule)
> retry = console_trylock_sched();
> else
> retry = console_trylock();
> }
I believe the main confusion comes from the function name I chose and
the poor function description. Using your above code idea and changing
to a more fitting name, I would suggest:
console_unlock()
{
[...]
if (may_schedule)
retry = console_lock_reacquire();
else
retry = console_trylock();
}
This console_lock_reacquire() acquires the console lock the same way
that console_lock() does it. The only reason we don't just use
console_lock() is because we want to perform a try on @console_sem. But
if we are successful, in the end, we have re-taken the console lock
exactly as console_lock() did before: @console_sem locked, kthreads
blocked by mutex.
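Roughly, based on the console_trylock_sched() body quoted earlier
(a sketch using the name suggested above):

	static int console_lock_reacquire(void)
	{
		might_sleep();

		if (down_trylock_console_sem())
			return 0;
		if (console_suspended) {
			up_console_sem();
			return 0;
		}
		/* Re-block the kthread printers via their per-console mutexes. */
		pause_all_consoles();
		console_may_schedule = 1;
		return 1;
	}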
You say this creates deadlock potential, but I do not see how that could
be. We are in the same context and locking the same way we did before.
But my primary concern is not the naming or workarounds or confusing
APIs. So we should not let ourselves be diverted by that aspect.
My primary concern is the technical difference when a schedulable
context reacquires via atomic counter (which fails if any kthread is
active) vs. reacquiring via mutex (which never fails).
The reason for the reacquire is because (during direct printing) we see
that a new record appeared and we need to make sure it gets printed
(because other direct printers may have aborted, expecting us to print
it).
This scenario is only interesting if kthread printers exist because
otherwise @console_sem is enough to handle the direct printing.
So the questions are:
1. Is it OK to assume the new record will be printed if any kthread is
active? If yes, then it is enough to use the atomic counter.
2. Or, since we are responsible for direct printing, do we want to be
certain that the record is printed by printing it ourselves? If yes,
then we must block all the kthreads and perform the printing directly to
all the consoles. This requires the mutex approach.
IMHO #1 relies heavily on kthreads waking up and printing (even
though the printk caller requested direct printing), whereas #2 will
cause direct printers to more actively print (possibly printing more
than was requested).
I prefer to go the route of #2 because it is more conservative. IMHO,
when direct printing becomes active, we really should make a strong
effort to direct print.
Having now stated the issues (hopefully clearly), I will not fight for
#2. If you say it is enough to rely on the kthreads in this scenario,
then I will implement the atomic counter solution for my v2.
John
On Thu 2022-03-10 17:14:18, John Ogness wrote:
> On 2022-03-10, Petr Mladek <[email protected]> wrote:
> > console_unlock()
> > {
> > [...]
> > if (may_schedule)
> > retry = console_trylock_sched();
> > else
> > retry = console_trylock();
> > }
>
> I believe the main confusion comes from the function name I chose and
> the poor function description. Using your above code idea and changing
> to a more fitting name, I would suggest:
>
> console_unlock()
> {
> [...]
> if (may_schedule)
> retry = console_lock_reacquire();
> else
> retry = console_trylock();
> }
>
> This console_lock_reacquire() acquires the console lock the same way
> that console_lock() does it. The only reason we don't just use
> console_lock() is because we want to perform a try on @console_sem. But
> if we are successful, in the end, we have re-taken the console lock
> exactly as console_lock() did before: @console_sem locked, kthreads
> blocked by mutex.
>
> You say this creates deadlock potential, but I do not see how that could
> be. We are in the same context and locking the same way we did before.
Yeah, it looks safe when used to re-acquire the lock in console_unlock().
I thought about it as a generic API. If it exists, it might be used
anywhere. And it has pretty special behavior that one has to keep
in mind.
> But my primary concern is not the naming or workarounds or confusing
> APIs. So we should not let ourselves be diverted by that aspect.
>
> My primary concern is the technical difference when a schedulable
> context reacquires via atomic counter (which fails if any kthread is
> active) vs. reacquiring via mutex (which never fails).
>
> The reason for the reacquire is because (during direct printing) we see
> that a new record appeared and we need to make sure it gets printed
> (because other direct printers may have aborted, expecting us to print
> it).
I see. I missed this POV.
> This scenario is only interesting if kthread printers exist because
> otherwise @console_sem is enough to handle the direct printing.
>
> So the questions are:
>
> 1. Is it OK to assume the new record will be printed if any kthread is
> active? If yes, then it is enough to use the atomic counter.
>
> 2. Or, since we are responsible for direct printing, do we want to be
> certain that the record is printed by printing it ourselves? If yes,
> then we must block all the kthreads and perform the printing directly to
> all the consoles. This requires the mutex approach.
>
> IMHO #1 relies heavily on kthreads waking up and printing (even
> though the printk caller requested direct printing), whereas #2 will
> cause direct printers to more actively print (possibly printing more
> than was requested).
OK, it means that the main problem here _is not_ the scheduling context,
console_lock() vs. console_trylock(). The main problem _is_ the direct
printing vs. the offload to kthreads.
Of course, the context is important. It affects how we could re-take
the lock. But the main problem is the printing mode. We must make sure
that:
1. someone is printing pending messages when the direct mode is needed
2. kthreads are woken and can enter the printing mode when the direct
mode is disabled.
Will console_trylock_sched()/console_lock_reacquire() really
help here?
The API theoretically helps in direct mode when the lock was taken
via console_lock(). But it does not help when the lock was taken
via console_trylock() from printk(). It might mean that
forward progress is not guaranteed in the direct mode
(early boot, panic, ...).
Hmm, the forward progress seems to be guaranteed in the direct
mode most of the time. console_trylock() can take over
the atomic counter because console kthreads are not allowed
to enter the printing mode in this case.
I used "most of the time" because there might be races when
the mode is switched. "printk_direct" is an atomic variable.
CON_DIRECT is set under con->mutex but console_trylock()
does not take the mutex...
There are also races when the offload to console kthreads
is allowed. For example, console_trylock() might block
console_kthread_printing_tryenter().
Sigh, I am afraid that we have non-trivial problems
guaranteeing that all messages will be printed:
+ races when switching between direct mode
and offload to kthreads. It might cause
stall in both modes.
+ console_trylock() races with
console_kthread_printing_tryenter().
It might put a kthread to sleep even when
it is supposed to print the message.
IMHO, console_trylock_sched() does not help much here.
We need to solve console_trylock() path anyway.
I think that the solution might be:
+ make sure that the state of the "printk_direct" atomic variable
is enough to distinguish the mode.
+ always wakeup() console kthreads after console_trylock()
to handle the possible race with
console_kthread_printing_tryenter()
I have to think more about it.
Does it make any sense, please?
Best Regards,
Petr
On Wed 2022-03-09 15:02:07, John Ogness wrote:
> On 2022-02-23, Petr Mladek <[email protected]> wrote:
> >> +/*
> >> + * A variant of console_trylock() that allows specifying if the context may
> >> + * sleep. If yes, a trylock on @console_sem is attempted and if successful,
> >> + * the threaded printers are paused. This is important to ensure that
> >> + * sleepable contexts do not become involved in console_lock handovers and
> >> + * will call cond_resched() during the printing loop.
> >> + */
> >> +static int console_trylock_sched(bool may_schedule)
> >> +{
> >> + if (!may_schedule)
> >> + return console_trylock();
> >> +
> >> + might_sleep();
> >> +
> >> + if (down_trylock_console_sem())
> >> + return 0;
> >> + if (console_suspended) {
> >> + up_console_sem();
> >> + return 0;
> >> + }
> >> + pause_all_consoles();
> >
> > This is weird. Any trylock function should be fast and non-blocking.
> > But pause_all_consoles() uses mutex_lock().
> >
> > My expectation would be that console_trylock_sched() behaves
> > exactly the same as console_trylock() except that it will
> > set console_may_schedule according to the given parameter.
> >
> > I would do it the other way. Rename console_trylock() and
> > implement:
> >
> > int console_trylock(void)
> > {
> > return console_trylock_sched(false);
> > }
> >
> > LATER: I got it. It is used by console_trylock_sched() called
> > in console_unlock() when "do_cond_resched == true". In this
> > case, the trylock might wait for the mutexes. It will prevent
> > transferring console_lock from schedulable to atomic context
> > by the check in console_emit_next_record().
>
> Yes!
>
> > Hmm, I would still prefer to keep console_trylock_sched()
> > behavior sane: non-blocking in all situations. It means
> > that we actually do not need it and console_trylock()
> > is enough.
> >
> > It will allow stealing the console lock from a schedulable
> > context. But it is not a regression. And it is only
> > a corner case when console_unlock() re-takes the semaphore
> > after releasing it.
>
> A console waiter must not wait on a schedulable context. The console
> waiter is burning the CPU waiting for a transfer. If the console owner
> gets scheduled away while still holding the console lock, that is bad.
There must be some confusion. Of course, preemption must be disabled
when the current console_lock() owner allows the lock to be taken over.
It is currently achieved by disabling interrupts:
printk_safe_enter_irqsave(flags);
console_lock_spinning_enable();
call_console_driver(con, write_text, len, dropped_text);
*handover = console_lock_spinning_disable_and_check();
printk_safe_exit_irqrestore(flags);
I wanted to say that it is safe even when the preemption is otherwise
enabled around this code.
Also I wanted to say that it is not ideal when the current owner
is called with preemption enabled and the new owner will continue
handling consoles with preemption disabled. But the original code
worked this way.
It would be nice to avoid moving the lock from a preemptive context
to non-preemptive one. But I would prefer to do it separately
because the proposed console_trylock_sched() is really controversial.
IMHO, it is not worth delaying this patchset because of this.
> > We could do the same optimization in console_unlock() by
> > calling console_emit_next_record() with NULL handover pointer
> > when do_cond_resched == true. But we should do it
> > as a separate patch later.
>
> It is not an optimization, it is needed. Passing a NULL handover pointer
> when do_cond_resched == true would handle it correctly, but this feels
> like a workaround to me.
>
> The reason for adding console_trylock_sched() is because a context that
> previously acquired the console lock via console_lock() wants to try to
> reacquire it. If it reacquires the console lock using the kthread
> mutexes, the locking scenario returns to the same as it was
> before... all kthreads are blocked via their mutex.
>
> You are suggesting that a console_lock() context later tries to
> reacquire the console lock, but using the console_trylock() method
> (atomic counter) and keeping console_may_schedule=1.
>
> IMHO, _this_ is a weird variant that requires passing in a NULL handover
> pointer as a workaround. It introduces a third locking scenario where a
> schedulable context is using functions created for atomic use.
>
> Also, as I mentioned in the percpu thread [0], I think we need to avoid
> console_trylock() usage in schedulable contexts. Functions need to be
> aware in what contexts they are running and call the appropriate
> functions for it.
Let's put the NULL parameter aside. It is an implementation detail.
It can be handled by another parameter or similar.
IMHO, the main disagreement is:
+ I do not like the proposed console_trylock_sched() API because
it uses mutex_lock().
From my POV, it can obviously cause deadlock. It is something
that nobody would expect from any *_trylock() API. IMHO,
it is a ticking bomb.
+ You do not like that console_lock()/console_unlock() might
re-take the lock using another approach (atomic counter
vs. con->flags).
From your POV, using non-blocking console_trylock() in
console_unlock() would be 3rd locking scheme. Also you say that
that the caller should know the context and use the appropriate
function.
Let me try to persuade you ;-)
1. Regarding the 3rd locking scheme:
It is exactly the opposite from my POV. console_trylock_sched()
looks like a 3rd locking variant to me. We would have blocking
console_lock(), non-blocking console_trylock(), and semi-blocking
console_trylock_sched().
And it is even worse. console_trylock_sched() behaves differently
according to the parameter. It makes it even harder to
understand the behavior.
2. Regarding using appropriate function:
Let's see how it looks in the code:
printk()
  preempt_disable()
  console_trylock()
  console_unlock()
    // printing
    up_console_sem()
    retry = console_trylock_sched();
It feels like using a sleeping lock in a non-preemptive context.
OK, we could make it more obvious:
console_unlock()
{
        [...]
        if (may_schedule)
                retry = console_trylock_sched();
        else
                retry = console_trylock();
}
But it is yet another complexity in console_unlock().
Is it really necessary?
3. Implementation detail:
From my POV, the way console_lock()/console_trylock() takes
the lock is an implementation detail.
console_lock() uses con->flags because it can and actually must
wait/sleep. console_trylock() uses the atomic counter because
it must not block.
We hide this implementation detail in __console_unlock(). It
releases the lock the same way as it was taken.
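Roughly (a simplified sketch with assumed helper names, not the
literal patch code):

static void __console_unlock(void)
{
        /*
         * Unblock the kthreads the same way they were blocked:
         * console_lock() blocked them via the per-console locks,
         * console_trylock() via the atomic counter.
         */
        if (console_kthreads_blocked)
                console_kthreads_unblock();        /* console_lock() path */
        else
                console_kthreads_atomic_unblock(); /* console_trylock() path */

        up_console_sem();
}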
4. may_schedule vs. handover
The information about whether console_unlock() may schedule is stored
in the @console_may_schedule variable. console_lock() allows scheduling
and console_trylock() disables scheduling.
console_emit_next_record() allows handover according to
the atomic counter. It is set by the way the console lock
was taken: console_lock() disables handover, console_trylock()
enables handover.
Now, console_trylock_sched() is needed to re-take the lock
the same way as it was taken originally. Otherwise,
the atomic counter will be used even when scheduling was enabled.
As a result, console_emit_next_record() would work differently.
In other words, console_trylock_sched() is used to encode
@console_may_schedule into @console_lock_count.
It looks like a hack similar to passing NULL *handover. But from my POV,
it is much worse. The price is a console_trylock_sched() API with
very strange behavior. It is super tricky. It took me hours to
understand the motivation and behavior.
From my POV, the clearest solution is to pass a "may_schedule"
parameter to console_emit_next_record().
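Something like this (only a sketch; the signature is abridged and
the buffer/handover details are elided):

static bool console_emit_next_record(struct console *con, char *text,
                                     char *ext_text, char *dropped_text,
                                     bool may_schedule)
{
        [...]
        if (!may_schedule) {
                /* atomic context: allow handover to a spinning waiter */
                printk_safe_enter_irqsave(flags);
                console_lock_spinning_enable();
                call_console_driver(con, write_text, len, dropped_text);
                *handover = console_lock_spinning_disable_and_check();
                printk_safe_exit_irqrestore(flags);
        } else {
                /* schedulable context: never hand over to atomic waiters */
                call_console_driver(con, write_text, len, dropped_text);
                *handover = false;
        }
        [...]
}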
Does it make sense?
Am I missing anything?
Best Regards,
Petr
On Fri 2022-03-11 14:34:40, John Ogness wrote:
> On 2022-03-11, Petr Mladek <[email protected]> wrote:
> > But it does not help when the lock was taken
> > via console_trylock() from printk(). It might mean that
> > the forward progress might not be guaranteed in the direct mode
> > (early boot, panic, ...).
>
> How is the console_trylock() case different from current mainline now?
> As I mentioned above, the kthreads can block console_trylock(), but so
> can a console_lock() currently in mainline.
CPU0                          CPU1                     CPU2

printk()
  // direct mode allowed
  console_trylock()
  console_unlock()
    console_flush_all()
                              printk_direct_enter()
      allows_direct_printing() -> false;
      break;
    __console_unlock()
      wakeup_klogd()
                                                       // woken printk_kthread
                                                       console_thread_printing_enter()
                              printk_direct_exit()
                              console_trylock()
                                atomic_tryblock()
                                //fails because thread active
                                return;
                              printk_direct_enter()
                                                       console_thread_printing_exit()
                                                       // sleep because
                                                       // atomic_read(&printk_direct)
                                                       // is not zero
Result: nobody prints
Note: The thread will actually not sleep because printk_should_wake()
does not check atomic_read(&printk_direct).
But that is a bug. An active thread should check it and allow
entering the direct mode.
Note2: Even when one printk kthread flushes the messages, there might
be another thread that never gets scheduled and nobody would
print the messages on the related console.
This is the race that I see with console_trylock(). IMHO, if we solve
this race then we do not need console_trylock_sched().
Well, I might be wrong. It is Friday evening. I still do not have
the entire picture. I probably should have waited for Monday.
I am sure that I misunderstood several things, including the wait_event()
API or the CON_DIRECT flag. But the above race looks real. And it somehow
supports my feeling that there are some races that can't be
solved by console_trylock_sched().
Best Regards,
Petr
On Fri 2022-03-11 14:34:40, John Ogness wrote:
> On 2022-03-11, Petr Mladek <[email protected]> wrote:
> >> console_unlock()
> >> {
> >>         [...]
> >>         if (may_schedule)
> >>                 retry = console_lock_reacquire();
> >>         else
> >>                 retry = console_trylock();
> >> }
> >>
>
> [...]
>
> > OK, it means that the main problem here _is not_ the scheduling context,
> > console_lock() vs. console_trylock(). The main problem _is_ the direct
> > printing vs. the offload to kthreads.
> >
> > Of course, the context is important. It affects how we could re-take
> > the lock. But the main problem is the printing mode. We must make sure
> > that:
> >
> > 1. someone is printing pending messages when the direct mode is needed
>
> console_trylock() causes difficulties here because it will fail if any
> kthread is active. It is an example of direct mode failure. But is that
> really any different than current mainline console_trylock() failing
> because a console_lock() context is active (and possibly not scheduled
> on a CPU)?
I see. Well, console_lock() does not mean that we are in direct mode.
Yes, it stops the kthreads. But it might be called by register_console()
or a tty driver. console_unlock() might do nothing when
allows_direct_printing() returns false in console_flush_all().
> > 2. kthreads are woken and can enter the printing mode when the direct
> > mode is disabled.
>
> This happens at the end of vprintk_emit() and within __console_unlock(),
> regardless if the printk() was running in direct mode or not.
>
> > Will console_lock_reacquire() really help here?
> >
> > The API theoretically helps in direct mode when the lock was taken
> > via console_lock().
>
> console_lock_reacquire() only exists for the console_lock() case.
>
> > But it does not help when the lock was taken
> > via console_trylock() from printk(). It might mean that
> > the forward progress might not be guaranteed in the direct mode
> > (early boot, panic, ...).
>
> How is the console_trylock() case different from current mainline now?
> As I mentioned above, the kthreads can block console_trylock(), but so
> can a console_lock() currently in mainline.
It is very different. Failing console_trylock() in the current mainline
means that someone else is responsible for printing all the pending
messages. Everything should be on the safe side.
Failing console_trylock() in this patch means only that some printk
kthread is active, nothing else. The kthread might handle one
message and then go to sleep because someone requested the
direct mode using atomic_inc(). Other kthreads might be sleeping
and ignoring everything. The failing console_trylock() would mean
that nobody handles the rest of the messages. The race
is described in the other reply.
Honestly, handling the console_trylock() case looks more important to me
because it is used by printk(). It is the main use case. The
console_lock() code path should be used rarely, during
console registration. Well, it is also used by tty code but
I am not sure how often it is really called there.
> > Hmm, the forward progress seems to be guaranteed in the direct
> > mode most of the time. console_trylock() can take over
> > the atomic counter because console kthreads are not allowed
> > to enter the printing mode in this case.
> >
> > I used "most of the time" because there might be races when
> > the mode is switched. "printk_direct" is an atomic variable.
> > CON_DIRECT is set under con->mutex but console_trylock()
> > does not take the mutex...
>
> You are mixing issues here. If CON_DIRECT is set, there is already a
> console_lock() in progress, so console_trylock() fails on @console_sem.
Yes, I am sorry for the confusion. I forgot the meaning of the flag.
I thought that it was set when the direct mode was requested.
But that is not true. It only means that the console lock is taken
and the holder could do the direct printing.
It would make sense to rename it. It might be part of all the
confusion here. The name creates a false impression that
the direct mode has been requested. I think that I actually
suggested renaming the flag in an earlier mail.
Best Regards,
Petr
On 2022-03-11, Petr Mladek <[email protected]> wrote:
>> console_unlock()
>> {
>>         [...]
>>         if (may_schedule)
>>                 retry = console_lock_reacquire();
>>         else
>>                 retry = console_trylock();
>> }
>>
[...]
> OK, it means that the main problem here _is not_ the scheduling context,
> console_lock() vs. console_trylock(). The main problem _is_ the direct
> printing vs. the offload to kthreads.
>
> Of course, the context is important. It affects how we could re-take
> the lock. But the main problem is the printing mode. We must make sure
> that:
>
> 1. someone is printing pending messages when the direct mode is needed
console_trylock() causes difficulties here because it will fail if any
kthread is active. It is an example of direct mode failure. But is that
really any different than current mainline console_trylock() failing
because a console_lock() context is active (and possibly not scheduled
on a CPU)?
> 2. kthreads are woken and can enter the printing mode when the direct
> mode is disabled.
This happens at the end of vprintk_emit() and within __console_unlock(),
regardless if the printk() was running in direct mode or not.
> Will console_lock_reacquire() really help here?
>
> The API theoretically helps in direct mode when the lock was taken
> via console_lock().
console_lock_reacquire() only exists for the console_lock() case.
> But it does not help when the lock was taken
> via console_trylock() from printk(). It might mean that
> the forward progress might not be guaranteed in the direct mode
> (early boot, panic, ...).
How is the console_trylock() case different from current mainline now?
As I mentioned above, the kthreads can block console_trylock(), but so
can a console_lock() currently in mainline.
> Hmm, the forward progress seems to be guaranteed in the direct
> mode most of the time. console_trylock() can take over
> the atomic counter because console kthreads are not allowed
> to enter the printing mode in this case.
>
> I used "most of the time" because there might be races when
> the mode is switched. "printk_direct" is an atomic variable.
> CON_DIRECT is set under con->mutex but console_trylock()
> does not take the mutex...
You are mixing issues here. If CON_DIRECT is set, there is already a
console_lock() in progress, so console_trylock() fails on @console_sem.
> There are also races when the offload to consoles kthreads
> is allowed. For example, console_trylock() might block
> console_kthread_printing_tryenter().
I do not see how that is a problem. If any context has the console lock
(no matter how it got that lock) then the kthreads are blocked.
If direct printing is enabled (from @printk_direct or @oops_in_progress
or @system_state != SYSTEM_RUNNING), the task with the console lock will
print out *all* the remaining records.
If direct printing is not enabled, the kthreads are woken in
__console_unlock().
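Roughly, the direct-printing condition boils down to a check like
this (a simplified sketch of the conditions listed above, not the
exact code from the series):

static inline bool allows_direct_printing(void)
{
        return atomic_read(&printk_direct) ||
               oops_in_progress ||
               system_state != SYSTEM_RUNNING;
}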
> Sigh, I am afraid that we have non-trivial problems
> to guarantee that all messages will be printed:
>
> + races when switching between direct mode
> and offload to kthreads. It might cause
> stall in both modes.
Schedulable contexts holding locks can cause stalls. We have that same
problem with console_lock() in mainline now. The kernel provides
mechanisms to avoid such stalls (niceness, priorities, policies,
priority inheritance), but this is all problem-specific and must be
fine-tuned by the user if they are running workloads that are causing
problems. kthreads are not solving the reliability problem (and they
never will).
> + console_trylock() races with
> console_kthread_printing_tryenter().
> It might put kthread into a sleep even when
> it is supposed to print the message.
A kthread is never _supposed_ to print a message. It is there to offload
direct printing. If console_trylock() (direct printing) wins, then that
is the context that does the printing.
> IMHO, console_lock_reacquire() does not help much here.
> We need to solve console_trylock() path anyway.
It preserves a consistent locking scenario for the console_lock()
path. That is all it is intended to do.
> I think that the solution might be:
>
> + make sure that the state of "printk_direct" atomic variable
> is enough to distinguish about the mode.
The printk subsystem does not have absolute control over which
task/context is doing the actual printing. Currently in mainline there
are printers that hand off to waiters and even some that do not print at
all because there are already waiters. Adding kthreads introduces a new
task that can print. But there is still no real control over who
prints. The important thing is that there is some context or runnable
task that is running the printing code (whether scheduled or not).
> + always wakeup() console kthreads after console_trylock()
> to handle the possible race with
> console_kthread_printing_tryenter()
I do not understand. If console_trylock() wins, it already wakes the
kthreads on __console_unlock(). If console_trylock() loses, the kthreads
are already running.
John
Hi Petr,
I do not think the example below is showing what you think it does, but
it does shed light on an important point.
On 2022-03-11, Petr Mladek <[email protected]> wrote:
> CPU0 CPU1 CPU2
>
> printk()
> // direct mode allowed
OK, so @printk_direct is >= 1.
> console_trylock()
> console_unlock()
> console_flush_all()
>
> printk_direct_enter()
@printk_direct is now >= 2.
>
> allows_direct_printing() -> false;
I do not understand why you say it returns false. @printk_direct is not
zero. This returns _true_ and will print all records currently in the
ringbuffer.
> break;
>
> __console_unlock()
> wakeup_klogd()
Note that all kthreads wake here. To keep it simple, let us assume there
is only 1.
> // woken printk_kthread
The kthread will only go into its printing loop if new records have
appeared after the above __console_unlock(). If any did appear, _that_
printk() will attempt to print them because direct printing is
active. But let us assume that the kthread on CPU2 woke up before the
new printk() context could attempt a console_lock() or
console_trylock().
> console_thread_printing_enter()
@console_lock_count is now 1. Now console_lock() and console_trylock()
will block or fail, respectively.
> printk_direct_exit()
@printk_direct is now >= 1.
> console_trylock()
> atomic_tryblock()
> //fails because thread active
Correct. (Note that if CPU0 had been in a console_lock() context, it
would _not_ fail because console_lock_reacquire() would wait for the
active kthreads to finish their current record.)
> return;
>
> printk_direct_enter()
@printk_direct is now >= 2.
> console_thread_printing_exit()
Note that CPU2 would have printed a single record.
> // sleep because
> atomic_read(&printk_direct) is not
> zero
No, the kthread does not sleep. It will continue printing until it is
blocked via console_lock()/console_trylock() or until all the records
are printed.
> Result: nobody prints
Sorry, but I have no idea how you came to that conclusion.
> Note: The thread will actually not sleep because printk_should_wake()
> does not check atomic_read(&printk_direct).
Exactly.
> But that is a bug. An active thread should check it and allow
> entering the direct mode.
We are in direct mode during this entire example. I do not understand what you
mean by "allow entering direct mode". Perhaps you mean "allow
console_trylock() to succeed".
If that is what you mean, then you are suggesting that the
console_trylock() spins until all the kthreads have finished their
current record. This could be a new variant of console_trylock().
And then rather than continuing their printing loop, the kthreads all
stay asleep as long as @printk_direct is non-zero (in addition to the
existing sleep conditions).
This could work if the kthreads disable preemption for the
console_thread_printing_enter()/_exit() window.
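For example (sketch):

        /* make the per-record printing window non-preemptible */
        preempt_disable();
        if (console_thread_printing_enter()) {
                /* emit exactly one record */
                [...]
                console_thread_printing_exit();
        }
        preempt_enable();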
Although it should be noted that this still will not help if a
console_lock() context is scheduled out.
> Note2: Even when one printk kthread flushes the messages, there might
> be another thread that never gets scheduled and nobody would
> print the messages on the related console.
That is what atomic consoles are for. Until atomic consoles are
available, that situation is covered by @oops_in_progress and
console_flush_on_panic().
> This is the race that I see with console_trylock(). IMHO, if we solve
> this race then we do not need console_lock_reacquire().
I do not understand why you continue to mix console_trylock() and
console_lock_reacquire(). console_lock_reacquire() is only for the
console_lock() context.
> Well, I might be wrong. It is Friday evening. I still do not have
> the entire picture. I probably should have waited for Monday.
I believe that you sense a danger with direct printing and
console_trylock(). It allows for scenarios like your example that end up
relying on kthreads to finish the printing (if there is no
panic). Mainline already has this issue because console_lock() can also
be scheduled away and console_trylock() has no chance to print. This
really is the same issue and ultimately relies on @oops_in_progress and
console_flush_on_panic() to get the messages out.
I believe you are hinting at the worst-case scenario: a kthread getting
scheduled out while printing and never seeing a CPU again because the
system is so busy. Assuming the system does not panic, no printing would
be seen on that console anymore, even if direct printing is enabled.
The only solution I see (aside from atomic consoles) is to disable
preemption during printing. Perhaps for non-atomic consoles, this is
what we need to do. That, together with a new console_trylock() variant,
should avoid this concern. Do you agree? Do we want to go that route?
Disabling preemption would be a problem moving forward for the fbcon
because, for the future, I really would like fbcon to live in a
sleepable context. I already have some new ideas about this. But that is
not a topic for this series.
John
P.S. By "new console_trylock() variant" I mean something for the
console_trylock() context like:
1. grab @console_sem with down_trylock_console_sem()
2. spin on atomic_tryblock()
If console_thread_printing_enter() disables preemption, this should work
about as well as the current owner/waiter logic for @console_sem. I can
do some prototype testing to see if there is some detail I am missing.
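In code it could look like this (sketch only; the function name is
made up for illustration):

int console_trylock_blocking(void)
{
        if (down_trylock_console_sem())
                return 0;

        /*
         * The kthreads are non-preemptible inside their printing
         * window, so each holds it for at most one record and
         * this spin is bounded.
         */
        while (!console_kthreads_atomic_tryblock())
                cpu_relax();

        console_may_schedule = 0;
        return 1;
}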
John
On Fri 2022-03-11 23:27:51, John Ogness wrote:
> Hi Petr,
>
> I do not think the example below is showing what you think it does, but
> it does shed light on an important point.
>
> On 2022-03-11, Petr Mladek <[email protected]> wrote:
> > CPU0 CPU1 CPU2
> >
> > printk()
> > // direct mode allowed
>
> OK, so @printk_direct is >= 1.
>
> > console_trylock()
> > console_unlock()
> > console_flush_all()
> >
> > printk_direct_enter()
>
> @printk_direct is now >= 2.
I am sorry. I made a mistake in the example. I hope that
the following shows what I wanted to show:
CPU0                          CPU1                     CPU2

                              printk_direct_enter()
                              @printk_direct == 1
printk()
  // direct mode allowed
  console_trylock()
  console_unlock()
                              printk_direct_exit()
                              @printk_direct == 0
    console_flush_all()
      allows_direct_printing() -> false;
      break;
    __console_unlock()
      wakeup_klogd()
                                                       // woken printk_kthread
                                                       console_thread_printing_enter()
                              console_trylock()
                                atomic_tryblock()
                                //fails because thread active
                                return;
                              printk_direct_enter()
                              @printk_direct == 1
                                                       console_thread_printing_exit()
My expectation: the printk kthread will go to sleep because the system
is in direct mode. In that case, nobody would be
scheduled to continue the printing.
> No, the kthread does not sleep. It will continue printing until it is
> blocked via console_lock()/console_trylock() or until all the records
> are printed.
I see. In the patch, the kthread does not check @printk_direct at all.
I have missed this :-(
From my POV, it is a very non-intuitive behavior. But it seems that
you did it intentionally.
> If that is what you mean, then you are suggesting that the
> console_trylock() spins until all the kthreads have finished their
> current record. This could be a new variant of console_trylock().
No, console_trylock() must not wait. My expectation would be
something like:
static bool printer_should_wake(struct console *con, u64 seq)
{
        [...]
        /*
         * Bail out when the direct mode is requested. Make sure
         * that someone tries to continue printing in
         * the direct mode.
         */
        if (atomic_read(&printk_direct)) {
                defer_console_output();
                return false;
        }
        [...]
}
> That is what atomic consoles are for. Until atomic consoles are
> available, that situation is covered by @oops_in_progress and
> console_flush_on_panic().
> > This is the race that I see with console_trylock(). IMHO, if we solve
> > this race then we do not need console_lock_reacquire().
>
> I do not understand why you continue to mix console_trylock() and
> console_lock_reacquire(). console_lock_reacquire() is only for the
> console_lock() context.
Because the code path in console_unlock() is the same for
console_lock() and console_trylock(). I am sorry but I still do
not see any convincing argument why it is so important to re-take
the lock a different way according to how it was taken originally.
The result of console_trylock() and console_lock_reacquire() is exactly
the same:
+ console_sem is taken
+ kthreads are blocked by a single value (flag or counter)
My view is that console_lock_reacquire() is a weird
function that is not necessary. If we really need it to avoid
any race then the logic is too complex and we have to make
it easier. console_trylock() must always be safe to use!
Simple logic:
If the atomic console_trylock() fails, it means that someone else is
already printing. It does not matter whether it is another console_lock
owner or the kthreads.
If console_trylock() is enough in one code path then it should be
enough in the other one as well.
Also:
If the console lock is released then anything could happen,
especially the kthreads might start printing. I do not see
any point in stopping them again.
They will start working _only_ when the direct mode is _not_
requested. If they did not start working then we could re-take
the lock atomically and we do not need to set the CON_DIRECT flag.
My view is that console_kthreads_atomic_tryblock() is a fast
path for blocking the kthreads. IMHO, it is perfectly fine
to use it even in console_lock(). The complicated and
sleeping console_kthreads_block() is needed only
when any kthread is printing and we really need to wait.
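I.e. something like (sketch):

        /* fast path: block the kthreads atomically when none is printing */
        if (!console_kthreads_atomic_tryblock()) {
                /* slow path: a kthread is printing, wait for it (may sleep) */
                console_kthreads_block();
        }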
OK, you wrote:
<John>
The reason for the reacquire is because (during direct printing) we see
that a new record appeared and we need to make sure it gets printed
(because other direct printers may have aborted, expecting us to print
it).
</John>
Let's make sure that we understand it the same way. When are we
expected to print the pending records directly?
+ When console_lock() was used? My answer is "No".
+ When @printk_direct > 0? My answer is "Yes".
> > Well, I might be wrong. It is Friday evening. I still do not have
> > the entire picture. I probably should have waited for Monday.
>
> I believe that you sense a danger with direct printing and
> console_trylock(). It allows for scenarios like your example that end up
> relying on kthreads to finish the printing (if there is no
> panic). Mainline already has this issue because console_lock() can also
> be scheduled away and console_trylock() has no chance to print. This
> really is the same issue and ultimately relies on @oops_in_progress and
> console_flush_on_panic() to get the messages out.
Sure.
> I believe you are hinting at the worst-case scenario: a kthread getting
> scheduled out while printing and never seeing a CPU again because the
> system is so busy. Assuming the system does not panic, no printing would
> be seen on that console anymore, even if direct printing is enabled.
We cannot do much about this. As you say, it already happens in current
mainline. If the current console_lock owner is sleeping, it might
never release the console lock. The fallback is in
console_flush_on_panic().
IMHO, it is worse with the kthreads. Any printk kthread might block switching
to the direct mode and printing messages on other consoles. We should
allow switching to the direct mode ASAP. This is why I suggest that
the kthreads should check the @printk_direct counter and bail out. It is
similar to
https://lore.kernel.org/all/[email protected]/
It makes the logic symmetric and easier:
+ console_lock() and printk_direct_enter() block kthreads.
The kthreads will bail out on the next record when printing.
+ console_unlock() and printk_direct_exit() allow using kthreads.
+ console_unlock() will stop direct printing when @printk_direct == 0.
+ printk_kthread() will get blocked when @printk_direct > 0
+ vprintk_emit() and console_unlock() will make the same decision
according to the @printk_direct counter. They will either try
to print directly or wake up the kthreads (see the sketch below).
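Then both decision points could look literally the same, for
example (sketch):

        /* e.g. in vprintk_emit(), after the record has been stored: */
        if (allows_direct_printing()) {
                /* try to flush the consoles directly */
                if (console_trylock())
                        console_unlock();
        } else {
                /* leave the job to the kthreads */
                wakeup_klogd();
        }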
Slightly unrelated:
+ __console_emit_next_record() should allow handover by a parameter
passed from the caller, not by guessing from a global state.
> The only solution I see (aside from atomic consoles) is to disable
> preemption during printing. Perhaps for non-atomic consoles, this is
> what we need to do. That, together with a new console_trylock() variant,
> should avoid this concern. Do you agree? Do we want to go that route?
>
> Disabling preemption would be a problem moving forward for the fbcon
> because, for the future, I really would like fbcon to live in a
> sleepable context. I already have some new ideas about this. But that is
> not a topic for this series.
I thought about this as well. But let's keep preemption enabled in
the kthreads for now. We could always disable it when it causes
problems in real life.
Best Regards,
Petr
PS: It seems like we are going in circles. My intention is to keep
the logic as simple and as clear as possible:
+ if we need lock then use lock
+ if we need trylock then use trylock
+ if we want direct mode then block kthreads and try enter
the direct mode ASAP.
+ if kthreads mode is allowed then do nothing in
console_unlock() and leave the job to kthreads.
+ console_lock() temporarily blocks kthreads but
it handles messages only when the direct mode is enforced.
On 2022-03-14, Petr Mladek <[email protected]> wrote:
> My intention is to keep the logic as simple and as clear as possible:
>
> + if we need lock then use lock
>
> + if we need trylock then use trylock
>
> + if we want direct mode then block kthreads and try enter
> the direct mode ASAP.
>
> + if kthreads mode is allowed then do nothing in
> console_unlock() and leave the job to kthreads.
>
> + console_lock() temporarily blocks kthreads but
> it handles messages only when the direct mode is enforced.
Thank you for your examples, detailed analysis, insight, and summaries.
This particular review became quite complicated because offline you sent
me a heavily revised version. Several of your comments are criticizing
your version and not the actual series I posted. For v2 we need to
handle it better so that the list has a chance at following our
discussion. ;-)
I will post a v2 that attempts to address your concerns and try to frame
the naming and structures to align with your suggestions.
John
On Mon 2022-03-14 15:49:39, John Ogness wrote:
> On 2022-03-14, Petr Mladek <[email protected]> wrote:
> > My intention is to keep the logic as simple and as clear as possible:
> >
> > + if we need lock then use lock
> >
> > + if we need trylock then use trylock
> >
> > + if we want direct mode then block kthreads and try enter
> > the direct mode ASAP.
> >
> > + if kthreads mode is allowed then do nothing in
> > console_unlock() and leave the job to kthreads.
> >
> > + console_lock() temporarily blocks kthreads but
> > it handles messages only when the direct mode is enforced.
>
> Thank you for your examples, detailed analysis, insight, and summaries.
>
> This particular review became quite complicated because offline you sent
> me a heavily revised version. Several of your comments are criticizing
> your version and not the actual series I posted. For v2 we need to
> handle it better so that the list has a chance at following our
> discussion. ;-)
I am really sorry for this. And some of my mails were also a bit
misleading because I missed something in the code.
One small plus is that most of the new names were mentioned
in the public https://lore.kernel.org/r/YhYKP/UuSKENGwfj@alley
But I agree that I probably made it more confusing than needed.
> I will post a v2 that attempts to address your concerns and try to frame
> the naming and structures to align with your suggestions.
Yes, sending v2 and continuing the discussion sounds like a good idea.
Best Regards,
Petr