Some threads still call printk() for printing while resume_console() is
being executed. In practice, such a printk() call runs for a period of time
before returning. The duration is determined by the number of messages
cached in the prb during the suspend/resume process. At the same time,
resume_console() returns quickly.
Based on the owner/waiter mechanism, the first caller that fails to lock the
console becomes the waiter and starts spinning. When the current owner
finishes printing one message, if a waiter is waiting, the owner gives up and
lets the waiter become the new owner. The new owner has to flush the whole
prb until the prb is empty or another new waiter comes and takes the job over.
So the first waiter after resume_console() may take seconds to help flush
the prb, and a driver calling printk() may be stalled by this. Add a new flag
to mark that resume is flushing the prb. While the console is resuming and
the prb is not yet empty, temporarily stop setting a new waiter.
Signed-off-by: Luo Jiaxing <[email protected]>
---
kernel/printk/printk.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 575a34b..2c680a5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -287,6 +287,9 @@ EXPORT_SYMBOL(console_set_on_cmdline);
/* Flag: console code may call schedule() */
static int console_may_schedule;
+/* Flags: console flushing prb when resume */
+static atomic_t console_resume_flush_prb = ATOMIC_INIT(0);
+
enum con_msg_format_flags {
MSG_FORMAT_DEFAULT = 0,
MSG_FORMAT_SYSLOG = (1 << 0),
@@ -1781,7 +1784,8 @@ static int console_trylock_spinning(void)
raw_spin_lock(&console_owner_lock);
owner = READ_ONCE(console_owner);
waiter = READ_ONCE(console_waiter);
- if (!waiter && owner && owner != current) {
+ if (!waiter && owner && owner != current &&
+ !atomic_read(&console_resume_flush_prb)) {
WRITE_ONCE(console_waiter, true);
spin = true;
}
@@ -2355,6 +2359,7 @@ void resume_console(void)
if (!console_suspend_enabled)
return;
down_console_sem();
+ atomic_set(&console_resume_flush_prb, 1);
console_suspended = 0;
console_unlock();
}
@@ -2592,6 +2597,8 @@ void console_unlock(void)
raw_spin_unlock(&logbuf_lock);
up_console_sem();
+ if (atomic_read(&console_resume_flush_prb))
+ atomic_set(&console_resume_flush_prb, 0);
/*
* Someone could have filled up the buffer again, so re-check if there's
--
2.7.4
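For context, the per-message handover described in the commit message is implemented by two helpers in kernel/printk/printk.c; simplified (lockdep annotations omitted), they look roughly like this:

static void console_lock_spinning_enable(void)
{
	raw_spin_lock(&console_owner_lock);
	console_owner = current;
	raw_spin_unlock(&console_owner_lock);
}

static int console_lock_spinning_disable_and_check(void)
{
	int waiter;

	raw_spin_lock(&console_owner_lock);
	waiter = READ_ONCE(console_waiter);
	console_owner = NULL;
	raw_spin_unlock(&console_owner_lock);

	/* No waiter: the current owner keeps flushing. */
	if (!waiter)
		return 0;

	/* Hand over: the spinning waiter becomes the new console lock owner. */
	WRITE_ONCE(console_waiter, false);
	return 1;
}

console_unlock() calls the first helper before printing each record and the second one right after it, which is the per-message handover point the commit message refers to.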
On Thu, 6 May 2021 16:00:26 +0800
Luo Jiaxing <[email protected]> wrote:
> Some threads still call printk() for printing while resume_console() is
> being executed. In practice, such a printk() call runs for a period of time
> before returning. The duration is determined by the number of messages
> cached in the prb during the suspend/resume process. At the same time,
> resume_console() returns quickly.
>
> Based on the owner/waiter mechanism, the first caller that fails to lock the
> console becomes the waiter and starts spinning. When the current owner
> finishes printing one message, if a waiter is waiting, the owner gives up and
> lets the waiter become the new owner. The new owner has to flush the whole
> prb until the prb is empty or another new waiter comes and takes the job over.
>
> So the first waiter after resume_console() may take seconds to help flush
> the prb, and a driver calling printk() may be stalled by this. Add a new flag
> to mark that resume is flushing the prb. While the console is resuming and
> the prb is not yet empty, temporarily stop setting a new waiter.
>
> Signed-off-by: Luo Jiaxing <[email protected]>
> ---
> kernel/printk/printk.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 575a34b..2c680a5 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -287,6 +287,9 @@ EXPORT_SYMBOL(console_set_on_cmdline);
> /* Flag: console code may call schedule() */
> static int console_may_schedule;
>
> +/* Flags: console flushing prb when resume */
> +static atomic_t console_resume_flush_prb = ATOMIC_INIT(0);
Why are you using an atomic? It's accessed within locks.
static bool console_resuming;
> +
> enum con_msg_format_flags {
> MSG_FORMAT_DEFAULT = 0,
> MSG_FORMAT_SYSLOG = (1 << 0),
> @@ -1781,7 +1784,8 @@ static int console_trylock_spinning(void)
> raw_spin_lock(&console_owner_lock);
> owner = READ_ONCE(console_owner);
> waiter = READ_ONCE(console_waiter);
resuming = READ_ONCE(console_resuming);
> - if (!waiter && owner && owner != current) {
if (!resuming && (!waiter ...
> + if (!waiter && owner && owner != current &&
> + !atomic_read(&console_resume_flush_prb)) {
> WRITE_ONCE(console_waiter, true);
> spin = true;
> }
> @@ -2355,6 +2359,7 @@ void resume_console(void)
> if (!console_suspend_enabled)
> return;
> down_console_sem();
> + atomic_set(&console_resume_flush_prb, 1);
> console_suspended = 0;
resuming = true;
> console_unlock();
/* Keep clearing resume from entering the console_unlock */
smp_wmb();
resuming = false;
> }
> @@ -2592,6 +2597,8 @@ void console_unlock(void)
> raw_spin_unlock(&logbuf_lock);
>
> up_console_sem();
> + if (atomic_read(&console_resume_flush_prb))
> + atomic_set(&console_resume_flush_prb, 0);
Get rid of the above.
-- Steve
>
> /*
> * Someone could have filled up the buffer again, so re-check if there's
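Pulling Steven's inline fragments together, the suggested non-atomic variant would look roughly like this (the exact naming and placement are inferred from the fragments above, not taken from a posted patch):

static bool console_resuming;

static int console_trylock_spinning(void)
{
	bool resuming;
	...
	raw_spin_lock(&console_owner_lock);
	owner = READ_ONCE(console_owner);
	waiter = READ_ONCE(console_waiter);
	resuming = READ_ONCE(console_resuming);
	if (!resuming && (!waiter && owner && owner != current)) {
		WRITE_ONCE(console_waiter, true);
		spin = true;
	}
	...
}

void resume_console(void)
{
	if (!console_suspend_enabled)
		return;
	down_console_sem();
	console_suspended = 0;
	console_resuming = true;
	console_unlock();
	/* Order console_unlock()'s stores before clearing the flag. */
	smp_wmb();
	console_resuming = false;
}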
On Thu 2021-05-06 16:00:26, Luo Jiaxing wrote:
> Some threads still call printk() for printing while resume_console() is
> being executed. In practice, such a printk() call runs for a period of time
> before returning. The duration is determined by the number of messages
> cached in the prb during the suspend/resume process. At the same time,
> resume_console() returns quickly.
The last sentence is a bit misleading. resume_console() returns
quickly only when @console_owner was passed to another process.
> Based on the owner/waiter mechanism, the first caller that fails to lock the
> console becomes the waiter and starts spinning. When the current owner
> finishes printing one message, if a waiter is waiting, the owner gives up and
> lets the waiter become the new owner. The new owner has to flush the whole
> prb until the prb is empty or another new waiter comes and takes the job over.
>
> So the first waiter after resume_console() may take seconds to help flush
It need not be the first waiter. The console_lock ownership might be passed
on several times.
But you have a point. Many messages might get accumulated when the
console was suspended and any console_owner might spend a long time
processing them. resume_console() seems to be always called in
preemptible context, so it is safe to process all messages here.
> the prb, and a driver calling printk() may be stalled by this. Add a new flag
> to mark that resume is flushing the prb. While the console is resuming and
> the prb is not yet empty, temporarily stop setting a new waiter.
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -287,6 +287,9 @@ EXPORT_SYMBOL(console_set_on_cmdline);
> /* Flag: console code may call schedule() */
> static int console_may_schedule;
>
> +/* Flags: console flushing prb when resume */
> +static atomic_t console_resume_flush_prb = ATOMIC_INIT(0);
> +
> enum con_msg_format_flags {
> MSG_FORMAT_DEFAULT = 0,
> MSG_FORMAT_SYSLOG = (1 << 0),
> @@ -1781,7 +1784,8 @@ static int console_trylock_spinning(void)
> raw_spin_lock(&console_owner_lock);
> owner = READ_ONCE(console_owner);
> waiter = READ_ONCE(console_waiter);
> - if (!waiter && owner && owner != current) {
> + if (!waiter && owner && owner != current &&
> + !atomic_read(&console_resume_flush_prb)) {
atomic_set()/atomic_read() do not provide any memory barriers.
IMHO, the atomic operations are not enough to serialize @console_owner
and @console_resume_flush_prb manipulation.
See below.
> WRITE_ONCE(console_waiter, true);
> spin = true;
> }
> @@ -2355,6 +2359,7 @@ void resume_console(void)
> if (!console_suspend_enabled)
> return;
> down_console_sem();
> + atomic_set(&console_resume_flush_prb, 1);
> console_suspended = 0;
> console_unlock();
> }
> @@ -2592,6 +2597,8 @@ void console_unlock(void)
> raw_spin_unlock(&logbuf_lock);
>
> up_console_sem();
> + if (atomic_read(&console_resume_flush_prb))
> + atomic_set(&console_resume_flush_prb, 0);
This should be done under console_lock. Otherwise,
it is not serialized at all.
Also there is one more return from console_unlock():
if (!can_use_console()) {
console_locked = 0;
up_console_sem();
return;
}
@console_resume_flush_prb must be cleared here as well.
Otherwise, the next random console_unlock() caller will not
be allowed to pass the console lock owner.
OK, the above patch tries to tell console_trylock_spinning()
that it should ignore console_owner even when set.
@console_resume_flush_prb variable is set/read by different
processes in parallel which makes it complicated.
Instead, we should simply tell console_unlock() that it should not
set console_owner in this case. The most straightforward
way is to pass this via a parameter.
Such a console_unlock() might also be used in other locations
with a preemptible context.
What about the following patch?
From 574e844f512c9f450e64832f09cc389bc9915f83 Mon Sep 17 00:00:00 2001
From: Petr Mladek <[email protected]>
Date: Thu, 6 May 2021 12:40:56 +0200
Subject: [PATCH] printk: Prevent softlockup when resuming console
Many printk messages might get accumulated while the consoles are suspended.
They are processed when console_unlock() is called in resume_console().
The possibility to pass the console lock owner was added to reduce the risk
of softlockup when too many messages were handled in an atomic context.
Now, resume_console() is always called in a preemptible context that is safe
to handle all accumulated messages. The possibility to pass the console
lock owner actually makes things worse. The new owner might be in an atomic
context and might cause a softlockup when processing all messages accumulated
while the console was suspended.
Create a new console_unlock_preemptible() that does not allow passing
the console lock owner. As a result, all accumulated messages will
be processed in the safe preemptible process.
Use it in resume_console(). It might also be used in other locations
where the console lock is used in preemptible context and many messages
might get accumulated while the process was sleeping.
Reported-by: Luo Jiaxing <[email protected]>
Suggested-by: Luo Jiaxing <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
---
include/linux/console.h | 1 +
kernel/printk/printk.c | 39 +++++++++++++++++++++++++++++++++------
2 files changed, 34 insertions(+), 6 deletions(-)
diff --git a/include/linux/console.h b/include/linux/console.h
index 20874db50bc8..0c444c6448e8 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -174,6 +174,7 @@ extern struct console *console_drivers;
extern void console_lock(void);
extern int console_trylock(void);
extern void console_unlock(void);
+extern void console_unlock_preemptible(void);
extern void console_conditional_schedule(void);
extern void console_unblank(void);
extern void console_flush_on_panic(enum con_flush_mode mode);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 421c35571797..a7e94c898646 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2425,7 +2425,7 @@ void resume_console(void)
return;
down_console_sem();
console_suspended = 0;
- console_unlock();
+ console_unlock_preemptible();
}
/**
@@ -2534,10 +2534,8 @@ static inline int can_use_console(void)
* the output prior to releasing the lock.
*
* If there is output waiting, we wake /dev/kmsg and syslog() users.
- *
- * console_unlock(); may be called from any context.
*/
-void console_unlock(void)
+void __console_unlock(bool spinning_enabled)
{
static char ext_text[CONSOLE_EXT_LOG_MAX];
static char text[CONSOLE_LOG_MAX];
@@ -2546,6 +2544,7 @@ void console_unlock(void)
struct printk_info info;
struct printk_record r;
+
if (console_suspended) {
up_console_sem();
return;
@@ -2637,13 +2636,15 @@ void console_unlock(void)
* finish. This task can not be preempted if there is a
* waiter waiting to take over.
*/
- console_lock_spinning_enable();
+ if (spinning_enabled)
+ console_lock_spinning_enable();
stop_critical_timings(); /* don't trace print latency */
call_console_drivers(ext_text, ext_len, text, len);
start_critical_timings();
- if (console_lock_spinning_disable_and_check()) {
+ if (spinning_enabled &&
+ console_lock_spinning_disable_and_check()) {
printk_safe_exit_irqrestore(flags);
return;
}
@@ -2670,8 +2671,34 @@ void console_unlock(void)
if (retry && console_trylock())
goto again;
}
+
+/*
+ * Classic console_unlock() that might be called in any context.
+ *
+ * It allows to pass the console lock owner when processing the buffered
+ * messages. It helps to prevent soft lockups in an atomic context.
+ */
+void console_unlock()
+{
+ __console_unlock(true);
+}
EXPORT_SYMBOL(console_unlock);
+/*
+ * Variant of the console unlock that can be called only in preemptible
+ * context.
+ *
+ * All messages are processed in this safe context. It helps to prevent
+ * softlockups when the console lock owner was passed to an atomic context.
+ */
+void console_unlock_preemptible()
+{
+ lockdep_assert_preemption_enabled();
+
+ __console_unlock(false);
+}
+EXPORT_SYMBOL(console_unlock_preemptible);
+
/**
* console_conditional_schedule - yield the CPU if required
*
--
2.26.2
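To illustrate the intended usage of the new helper, here is a hypothetical additional call site of the kind the commit message mentions ("other locations where console lock was used in preemptible context"); do_something_that_may_sleep() is a placeholder, not an existing function:

	console_lock();
	/* Sleep while holding the console lock; many messages may pile up. */
	do_something_that_may_sleep();
	/* Flush everything in this process context; do not hand the lock over. */
	console_unlock_preemptible();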
Thanks for Cc-ing Petr
On (21/05/06 15:39), Petr Mladek wrote:
>
> Many printk messages might get accumulated when consoles were suspended.
> They are proceed when console_unlock() is called in resume_console().
>
> The possibility to pass the console lock owner was added to reduce the risk
> of softlockup when too many messages were handled in an atomic context.
>
> Now, resume_console() is always in a preemptible context that is safe
> to handle all accumulated messages. The possibility to pass the console
> lock owner actually makes things worse. The new owner might be in an atomic
> context and might cause softlockup when processing all messages accumulated
> when the console was suspended.
>
> Create new console_unlock_preemptible() that will not allow to pass
> the console lock owner. As a result, all accumulated messages will
> be proceed in the safe preemptible process.
If we have a lot of pending messages in the logbuf, then there is
something chatty - some context (task, irq) or maybe several contexts.
And those contexts can continue adding messages, while we print them
_exclusively_ from preemptible context only, without ever throttling down
printk() callers - something that console_owner spinning and handover
does for us. And those printk() callers can even preempt
console_unlock_preemptible() and cause delays and lost messages.
In this regard, I'm afraid, console_unlock_preemptible() is somewhat
similar, IMHO, to the solution which we reverted - removal of
preempt_disable() before console_unlock() in vprintk_emit().
How about this.
Can we count the number of lines that we print from the `current` context
in console_unlock() and if after N messages there is no console_lock waiter
waiting for the `current` to handover console lock ownership, then create
one: schedule IRQ work that will become a console lock owner, spin on
console lock and call console_unlock() once it acquired the ownership.
That 'artificial' console lock owner will do the same - print N
messages, if nothing wants to become a console lock owner then it'll
queue another IRQ work.
On 2021-05-06, Sergey Senozhatsky <[email protected]> wrote:
>> Can we count the number of lines that we print from the `current` context
>> in console_unlock() and if after N messages there is no console_lock waiter
>> waiting for the `current` to handover console lock ownership, then create
>> one: schedule IRQ work that will become a console lock owner, spin on
>> console lock and call console_unlock() once it acquired the ownership.
>> That 'artificial' console lock owner will do the same - print N
>> messages, if nothing wants to become a console lock owner then it'll
>> queue another IRQ work.
>
> Or even simpler
>
> console_unlock()
> {
> ...
>
> if (printed_messages > limit && !console_lock_spinning_disable_and_check()) {
> printk_safe_exit_irqrestore(flags);
>
> console_locked = 0;
> up_console_sem();
>
> defer_console_output();
> return;
> }
>
> ...
> }
Or instead to keep it suspend/resume related maybe...
void resume_console(void)
{
if (!console_suspend_enabled)
return;
down_console_sem();
console_suspended = 0;
- console_unlock();
+ console_locked = 0;
+ up_console_sem();
+ defer_console_output();
}
John Ogness
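For reference, defer_console_output() at this point in time looked roughly like this; it only marks output as pending and queues an irq_work on the local CPU, which is what the later remark about triggering an IRQ on the same CPU again and again refers to:

void defer_console_output(void)
{
	if (!printk_percpu_data_ready())
		return;

	preempt_disable();
	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
	preempt_enable();
}

The queued irq_work handler then tries console_trylock() and, on success, calls console_unlock() to flush the pending records.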
On (21/05/06 23:07), Sergey Senozhatsky wrote:
>
> Can we count the number of lines that we print from the `current` context
> in console_unlock() and if after N messages there is no console_lock waiter
> waiting for the `current` to handover console lock ownership, then create
> one: schedule IRQ work that will become a console lock owner, spin on
> console lock and call console_unlock() once it acquired the ownership.
> That 'artificial' console lock owner will do the same - print N
> messages, if nothing wants to become a console lock owner then it'll
> queue another IRQ work.
Or even simpler
console_unlock()
{
...
if (printed_messages > limit && !console_lock_spinning_disable_and_check()) {
printk_safe_exit_irqrestore(flags);
console_locked = 0;
up_console_sem();
defer_console_output();
return;
}
...
}
On (21/05/07 15:33), luojiaxing wrote:
> > console_unlock()
> > {
> > ...
> >
> > if (printed_messages > limit && !console_lock_spinning_disable_and_check()) {
> > printk_safe_exit_irqrestore(flags);
> >
> > console_locked = 0;
> > up_console_sem();
> >
> > defer_console_output();
> > return;
> > }
>
>
> Hi, Sergey, I tested this, and it works.
>
>
> But, I have a doubt. If the log buffer exceeds limit, we can schedule IRQ
> work to become the console lock owner and let current context return.
>
> So why not just let the IRQ work process the console output without limit?
The log buffer can be several tens of megabytes in size, which the IRQ work
will have to print to a potentially slow serial console, and that will
trigger watchdogs on the CPU that the IRQ is running on.
> I wonder if the driver can only cache the print and queue the output tasks
> to a workqueue to return
A workqueue task is preemptible, which did cause problems in the past:
huge printing delays under memory pressure. So, ideally, what we want
from a console lock owner is to be non-preemptible and to either print
pending messages or hand over the lock to another task.
On 2021/5/6 22:12, Sergey Senozhatsky wrote:
> On (21/05/06 23:07), Sergey Senozhatsky wrote:
>> Can we count the number of lines that we print from the `current` context
>> in console_unlock() and if after N messages there is no console_lock waiter
>> waiting for the `current` to handover console lock ownership, then create
>> one: schedule IRQ work that will become a console lock owner, spin on
>> console lock and call console_unlock() once it acquired the ownership.
>> That 'artificial' console lock owner will do the same - print N
>> messages, if nothing wants to become a console lock owner then it'll
>> queue another IRQ work.
> Or even simpler
>
> console_unlock()
> {
> ...
>
> if (printed_messages > limit && !console_lock_spinning_disable_and_check()) {
> printk_safe_exit_irqrestore(flags);
>
> console_locked = 0;
> up_console_sem();
>
> defer_console_output();
> return;
> }
Hi, Sergey, I tested this, and it works.
But I have a doubt. If the log buffer exceeds the limit, we can schedule IRQ
work to become the console lock owner and let the current context return.
So why not just let the IRQ work process the console output without a
limit? To be clear, a driver calling printk() only cares about the correct
sequence of timestamps and does not care when the output actually reaches
the console. Actually, I wonder whether the driver could simply cache the
message, queue the output work to a workqueue, and then return directly?
I don't know if this is a stupid idea, because I'm not familiar with printk.
Thanks
Jiaxing
> ...
> }
>
> .
>
On 2021/5/6 23:14, John Ogness wrote:
> On 2021-05-06, Sergey Senozhatsky <[email protected]> wrote:
>>> Can we count the number of lines that we print from the `current` context
>>> in console_unlock() and if after N messages there is no console_lock waiter
>>> waiting for the `current` to handover console lock ownership, then create
>>> one: schedule IRQ work that will become a console lock owner, spin on
>>> console lock and call console_unlock() once it acquired the ownership.
>>> That 'artificial' console lock owner will do the same - print N
>>> messages, if nothing wants to become a console lock owner then it'll
>>> queue another IRQ work.
>> Or even simpler
>>
>> console_unlock()
>> {
>> ...
>>
>> if (printed_messages > limit && !console_lock_spinning_disable_and_check()) {
>> printk_safe_exit_irqrestore(flags);
>>
>> console_locked = 0;
>> up_console_sem();
>>
>> defer_console_output();
>> return;
>> }
>>
>> ...
>> }
> Or instead to keep it suspend/resume related maybe...
>
> void resume_console(void)
> {
> if (!console_suspend_enabled)
> return;
> down_console_sem();
> console_suspended = 0;
> - console_unlock();
> + console_locked = 0;
> + up_console_sem();
> + defer_console_output();
Hi, John, I tested this and found a call trace below:
[ 467.670366] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 467.670367] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 467.670368] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 467.670369] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 467.670370] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 467.670371] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 468.297467] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 489.298709] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 489.298712] rcu: 44-....: (4324 ticks this GP)
idle=e6e/1/0x4000000000000002 softirq=1212/1212 fqs=2625
[ 489.298722] (t=5250 jiffies g=3285 q=1820)
[ 489.298725] Task dump for CPU 44:
[ 489.298726] task:bash state:R running task stack: 0
pid:16937 ppid: 16935 flags:0x00000206
[ 489.298731] Call trace:
[ 489.298732] dump_backtrace+0x0/0x1a4
[ 489.298738] show_stack+0x24/0x40
[ 489.298740] sched_show_task+0x148/0x170
[ 489.298744] dump_cpu_task+0x50/0x60
[ 489.298748] rcu_dump_cpu_stacks+0x10c/0x154
[ 489.298751] rcu_sched_clock_irq+0x8d4/0xd94
[ 489.298755] update_process_times+0xa8/0xf4
[ 489.298758] tick_sched_handle+0x38/0x74
[ 489.298760] tick_sched_timer+0x58/0xb0
[ 489.298762] __hrtimer_run_queues+0x18c/0x3a0
[ 489.298764] hrtimer_interrupt+0xf4/0x2cc
[ 489.298767] arch_timer_handler_phys+0x40/0x50
[ 489.298771] handle_percpu_devid_irq+0x94/0x280
[ 489.298775] __handle_domain_irq+0x88/0xf0
[ 489.298777] gic_handle_irq+0xd8/0x2c0
[ 489.298779] el1_irq+0xbc/0x180
[ 489.298780] console_unlock+0x390/0x580
[ 489.298782] vprintk_emit+0x104/0x280
[ 489.298784] dev_vprintk_emit+0x14c/0x184
[ 489.298787] dev_printk_emit+0x64/0x8c
[ 489.298789] __dev_printk+0x60/0x84
[ 489.298792] _dev_info+0x6c/0x94
[ 489.298794] debugfs_test_if_1_write+0xc0/0x14c [hisi_sas_v3_hw]
[ 489.298801] full_proxy_write+0x6c/0xc0
[ 489.298804] vfs_write+0xfc/0x2b0
[ 489.298807] ksys_write+0x74/0x100
[ 489.298809] __arm64_sys_write+0x28/0x34
[ 489.298812] el0_svc_common.constprop.0+0x70/0x1a4
[ 489.298815] do_el0_svc+0x28/0x3c
[ 489.298817] el0_svc+0x2c/0x70
[ 489.298820] el0_sync_handler+0x1a4/0x1b0
[ 489.298822] el0_sync+0x170/0x180
[ 492.474266] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 492.479137] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 492.483996] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 492.488857] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 492.493716] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 492.498574] hisi_sas_v3_hw 0000:b4:02.0: print test
My test method is as follows: Use a kernel thread to make the console
enter suspend, sleep for 10 seconds, and then resume.
At the same time, another thread repeatedly calls printk() for output.
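A minimal sketch of what such a reproducer might look like (hypothetical; this is not the code used in this thread). suspend_console()/resume_console() are not exported to modules, so it would have to be built into the kernel, and pr_info() stands in for the driver's dev_info() calls:

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/console.h>
#include <linux/printk.h>
#include <linux/jiffies.h>

static int suspender_fn(void *data)
{
	suspend_console();
	msleep(10 * 1000);
	resume_console();	/* the records accumulated in the prb are flushed here */
	return 0;
}

static int printer_fn(void *data)
{
	unsigned long end = jiffies + 15 * HZ;

	/*
	 * Keep printing for ~15 seconds. While the console is suspended these
	 * calls return quickly and the records pile up in the prb.
	 */
	while (time_before(jiffies, end)) {
		pr_info("print test\n");
		msleep(1);
	}
	return 0;
}

static int __init printk_resume_test_init(void)
{
	kthread_run(suspender_fn, NULL, "console-suspender");
	kthread_run(printer_fn, NULL, "printk-flooder");
	return 0;
}
late_initcall(printk_resume_test_init);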
Thanks
Jiaxing
> }
>
> John Ogness
>
> .
>
On 2021/5/6 21:13, Steven Rostedt wrote:
> On Thu, 6 May 2021 16:00:26 +0800
> Luo Jiaxing <[email protected]> wrote:
>
>> Some threads still call printk() for printing while resume_console() is
>> being executed. In practice, such a printk() call runs for a period of time
>> before returning. The duration is determined by the number of messages
>> cached in the prb during the suspend/resume process. At the same time,
>> resume_console() returns quickly.
>>
>> Based on the owner/waiter mechanism, the first caller that fails to lock the
>> console becomes the waiter and starts spinning. When the current owner
>> finishes printing one message, if a waiter is waiting, the owner gives up and
>> lets the waiter become the new owner. The new owner has to flush the whole
>> prb until the prb is empty or another new waiter comes and takes the job over.
>>
>> So the first waiter after resume_console() may take seconds to help flush
>> the prb, and a driver calling printk() may be stalled by this. Add a new flag
>> to mark that resume is flushing the prb. While the console is resuming and
>> the prb is not yet empty, temporarily stop setting a new waiter.
>>
>> Signed-off-by: Luo Jiaxing <[email protected]>
>> ---
>> kernel/printk/printk.c | 9 ++++++++-
>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 575a34b..2c680a5 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -287,6 +287,9 @@ EXPORT_SYMBOL(console_set_on_cmdline);
>> /* Flag: console code may call schedule() */
>> static int console_may_schedule;
>>
>> +/* Flags: console flushing prb when resume */
>> +static atomic_t console_resume_flush_prb = ATOMIC_INIT(0);
> Why are you using an atomic? It's accessed within locks.
Hi, Steven, my method is to add a flag. When the flag is set, the
console_resume context can exclusively keep the console owner lock.
Since each CPU may invoke printk() at any time, we have to ensure that
each CPU can obtain the latest flag value in a timely manner.
Initially, wmb was also considered, but I think wmb is not suitable for
machines with multiple NUMA nodes, so atomic variables were used instead.
In addition, only console_resume sets the flag, so locking around the flag
was not considered.
Thanks
Jiaxing
>
> static bool console_resuming;
>
>
>> +
>> enum con_msg_format_flags {
>> MSG_FORMAT_DEFAULT = 0,
>> MSG_FORMAT_SYSLOG = (1 << 0),
>> @@ -1781,7 +1784,8 @@ static int console_trylock_spinning(void)
>> raw_spin_lock(&console_owner_lock);
>> owner = READ_ONCE(console_owner);
>> waiter = READ_ONCE(console_waiter);
> resuming = READ_ONCE(console_resuming);
>
>> - if (!waiter && owner && owner != current) {
> if (!resuming && (!waiter ...
>
>> + if (!waiter && owner && owner != current &&
>> + !atomic_read(&console_resume_flush_prb)) {
>> WRITE_ONCE(console_waiter, true);
>> spin = true;
>> }
>> @@ -2355,6 +2359,7 @@ void resume_console(void)
>> if (!console_suspend_enabled)
>> return;
>> down_console_sem();
>> + atomic_set(&console_resume_flush_prb, 1);
>> console_suspended = 0;
> resuming = true;
>
>> console_unlock();
> /* Keep clearing resume from entering the console_unlock */
> smp_wmb();
> resuming = false;
>
>
>> }
>> @@ -2592,6 +2597,8 @@ void console_unlock(void)
>> raw_spin_unlock(&logbuf_lock);
>>
>> up_console_sem();
>> + if (atomic_read(&console_resume_flush_prb))
>> + atomic_set(&console_resume_flush_prb, 0);
> Get rid of the above.
>
> -- Steve
>
>>
>> /*
>> * Someone could have filled up the buffer again, so re-check if there's
>
> .
>
On Thu 2021-05-06 23:07:19, Sergey Senozhatsky wrote:
> Thanks for Cc-ing Petr
>
> On (21/05/06 15:39), Petr Mladek wrote:
> >
> > Many printk messages might get accumulated when consoles were suspended.
> > They are proceed when console_unlock() is called in resume_console().
> >
> > The possibility to pass the console lock owner was added to reduce the risk
> > of softlockup when too many messages were handled in an atomic context.
> >
> > Now, resume_console() is always in a preemptible context that is safe
> > to handle all accumulated messages. The possibility to pass the console
> > lock owner actually makes things worse. The new owner might be in an atomic
> > context and might cause softlockup when processing all messages accumulated
> > when the console was suspended.
> >
> > Create new console_unlock_preemptible() that will not allow to pass
> > the console lock owner. As a result, all accumulated messages will
> > be proceed in the safe preemptible process.
>
> If we have a lot of pending messages in the logbuf, then there is
> something chatty - some context (task, irq) or maybe several contexts.
> And those contexts can continue adding messages, while we print them
> _exclusively_ from preemptible context only. without ever throttling down
> printk() callers - something that console_owner spinning and handover
> does for us. And those printk() callers can even preempt
> console_unlock_preemptible() and cause delays and lost messages.
Luo, please, correct me if I am wrong.
This patch is about one well-defined scenario. The messages are
accumulated between suspend_console() and resume_console().
It is a "small" part of the system hibernation. And we need
to get them out now. There might be many of them if something special
was being debugged.
I am pretty sure that Luo did not see any flood of messages:
+ A flood in multiple contexts would be balanced by switching
the console_owner.
+ A flood in one context would be naturally throttled because
that context will become the console_owner.
In each case, these messages would be generated after
console_resume(). Luo's original patch was explicitly talking
about messages accumulated during the suspend.
Luo, could you please provide some log showing the problem?
Best Regards,
Petr
On Thu 2021-05-06 23:12:02, Sergey Senozhatsky wrote:
> On (21/05/06 23:07), Sergey Senozhatsky wrote:
> >
> > Can we count the number of lines that we print from the `current` context
> > in console_unlock() and if after N messages there is no console_lock waiter
> > waiting for the `current` to handover console lock ownership, then create
> > one: schedule IRQ work that will become a console lock owner, spin on
> > console lock and call console_unlock() once it acquired the ownership.
> > That 'artificial' console lock owner will do the same - print N
> > messages, if nothing wants to become a console lock owner then it'll
> > queue another IRQ work.
>
> Or even simpler
>
> console_unlock()
> {
> ...
>
> if (printed_messages > limit && !console_lock_spinning_disable_and_check()) {
> printk_safe_exit_irqrestore(flags);
>
> console_locked = 0;
> up_console_sem();
>
> defer_console_output();
> return;
> }
>
> ...
> }
No, please, no.
This is exactly the opposite. The original patch tried to keep the
work in a preemptible context. This proposal moves the work into irq
context, which is bad. Not to mention that defer_console_output() would
trigger an IRQ on the same CPU again and again.
All the problems with printk() are that we try to support all
scenarios. But it simply does not work. We need to say that some
situations are not supported.
A flood of messages combined with a slow console requires a miracle. The only
chance is a huge buffer and getting them out later. An infinite flood of
messages is impossible to handle by definition.
I hope that we have agreed that the priority is to keep printk() safe
and not break the system. The ultimate solution is to handle
consoles in a separate preemptible context (kthread).
There should be atomic consoles for those who want to see the messages
immediately, at the cost of slowing down the code doing the real job.
I do not have a strong opinion on whether the proposed patch is worth it.
It is just another compromise that might be better in some situations
and worse in others.
Well, this situation is special. There might be many accumulated
messages during the hibernation. They did not have any chance to
be handled by more CPUs. Using the well-known preemptible context
sounds slightly better than risking a random victim in an atomic
context.
Anyway, I am fine with discarding this patch and focusing on
the offload to kthreads.
Best Regards,
Petr
On 2021/5/6 21:39, Petr Mladek wrote:
> On Thu 2021-05-06 16:00:26, Luo Jiaxing wrote:
>> Some threads still call printk() for printing while resume_console() is
>> being executed. In practice, such a printk() call runs for a period of time
>> before returning. The duration is determined by the number of messages
>> cached in the prb during the suspend/resume process. At the same time,
>> resume_console() returns quickly.
> The last sentence is a bit misleading. resume_console() returns
> quickly only when @console_owner was passed to another process.
>
>
>> Based on the owner/waiter mechanism, the first caller that fails to lock the
>> console becomes the waiter and starts spinning. When the current owner
>> finishes printing one message, if a waiter is waiting, the owner gives up and
>> lets the waiter become the new owner. The new owner has to flush the whole
>> prb until the prb is empty or another new waiter comes and takes the job over.
>>
>> So the first waiter after resume_console() may take seconds to help flush
> It need not be the first waiter. The console_lock ownership might be passed
> on several times.
>
> But you have a point. Many messages might get accumulated when the
> console was suspended and any console_owner might spend a long time
> processing them. resume_console() seems to be always called in
> preemptible context, so it is safe to process all messages here.
>
>
>> the prb, and a driver calling printk() may be stalled by this. Add a new flag
>> to mark that resume is flushing the prb. While the console is resuming and
>> the prb is not yet empty, temporarily stop setting a new waiter.
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -287,6 +287,9 @@ EXPORT_SYMBOL(console_set_on_cmdline);
>> /* Flag: console code may call schedule() */
>> static int console_may_schedule;
>>
>> +/* Flags: console flushing prb when resume */
>> +static atomic_t console_resume_flush_prb = ATOMIC_INIT(0);
>> +
>> enum con_msg_format_flags {
>> MSG_FORMAT_DEFAULT = 0,
>> MSG_FORMAT_SYSLOG = (1 << 0),
>> @@ -1781,7 +1784,8 @@ static int console_trylock_spinning(void)
>> raw_spin_lock(&console_owner_lock);
>> owner = READ_ONCE(console_owner);
>> waiter = READ_ONCE(console_waiter);
>> - if (!waiter && owner && owner != current) {
>> + if (!waiter && owner && owner != current &&
>> + !atomic_read(&console_resume_flush_prb)) {
> atomic_set()/atomic_read() do not provide any memory barriers.
> IMHO, the atomic operations are not enough to serialize @console_owner
> and @console_resume_flush_prb manipulation.
>
> See below.
>
>> WRITE_ONCE(console_waiter, true);
>> spin = true;
>> }
>> @@ -2355,6 +2359,7 @@ void resume_console(void)
>> if (!console_suspend_enabled)
>> return;
>> down_console_sem();
>> + atomic_set(&console_resume_flush_prb, 1);
>> console_suspended = 0;
>> console_unlock();
>> }
>> @@ -2592,6 +2597,8 @@ void console_unlock(void)
>> raw_spin_unlock(&logbuf_lock);
>>
>> up_console_sem();
>> + if (atomic_read(&console_resume_flush_prb))
>> + atomic_set(&console_resume_flush_prb, 0);
> This should be done under console_lock. Othwerwise,
> it is not serialized at all.
>
> Also there is one more return from console_unlock():
>
> if (!can_use_console()) {
> console_locked = 0;
> up_console_sem();
> return;
> }
>
> @console_resume_flush_prb must be cleared here as well.
> Otherwise, the next random console_unlock() caller will not
> be allowed to pass the console lock owner.
>
>
> OK, the above patch tries to tell console_trylock_spinning()
> that it should ignore console_owner even when set.
> @console_resume_flush_prb variable is set/read by different
> processes in parallel which makes it complicated.
>
> Instead, we should simply tell console_unlock() that it should not
> set console_owner in this case. The most strightforward
> way is to pass this via parameter.
>
> Such console_unlock() might be used even on another locations
> with preemptible context.
>
>
> What about the following patch?
Hi, Petr, I tested your patch and I think it needs some
modifications to fix the problem.
My test method is as follows:
Kernel thread A causes the console to enter suspend and then resumes it
10 seconds later.
Kernel thread B repeatedly invokes dev_info() for 15 seconds after the
console suspend.
My expected result is that the call to dev_info() in kernel thread B
does not block for a long time.
But the actual test result shows that kernel thread B is still blocked
after console_resume.
I modified some of the code to meet expectations; please check below.
>
>
> From 574e844f512c9f450e64832f09cc389bc9915f83 Mon Sep 17 00:00:00 2001
> From: Petr Mladek <[email protected]>
> Date: Thu, 6 May 2021 12:40:56 +0200
> Subject: [PATCH] printk: Prevent softlockup when resuming console
>
> Many printk messages might get accumulated when consoles were suspended.
> They are proceed when console_unlock() is called in resume_console().
>
> The possibility to pass the console lock owner was added to reduce the risk
> of softlockup when too many messages were handled in an atomic context.
>
> Now, resume_console() is always in a preemptible context that is safe
> to handle all accumulated messages. The possibility to pass the console
> lock owner actually makes things worse. The new owner might be in an atomic
> context and might cause softlockup when processing all messages accumulated
> when the console was suspended.
>
> Create new console_unlock_preemptible() that will not allow to pass
> the console lock owner. As a result, all accumulated messages will
> be proceed in the safe preemptible process.
>
> Use it in resume_console(). But it might be used also on other locations
> where console lock was used in preemptible context and many messages
> might get accumulated when the process was sleeping.
>
> Reported-by: Luo Jiaxing <[email protected]>
> Suggested-by: Luo Jiaxing <[email protected]>
> Signed-off-by: Petr Mladek <[email protected]>
> ---
> include/linux/console.h | 1 +
> kernel/printk/printk.c | 39 +++++++++++++++++++++++++++++++++------
> 2 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 20874db50bc8..0c444c6448e8 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -174,6 +174,7 @@ extern struct console *console_drivers;
> extern void console_lock(void);
> extern int console_trylock(void);
> extern void console_unlock(void);
> +extern void console_unlock_preemptible(void);
> extern void console_conditional_schedule(void);
> extern void console_unblank(void);
> extern void console_flush_on_panic(enum con_flush_mode mode);
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 421c35571797..a7e94c898646 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2425,7 +2425,7 @@ void resume_console(void)
> return;
> down_console_sem();
> console_suspended = 0;
> - console_unlock();
> + console_unlock_preemptible();
> }
>
> /**
> @@ -2534,10 +2534,8 @@ static inline int can_use_console(void)
> * the output prior to releasing the lock.
> *
> * If there is output waiting, we wake /dev/kmsg and syslog() users.
> - *
> - * console_unlock(); may be called from any context.
> */
> -void console_unlock(void)
> +void __console_unlock(bool spinning_enabled)
> {
> static char ext_text[CONSOLE_EXT_LOG_MAX];
> static char text[CONSOLE_LOG_MAX];
> @@ -2546,6 +2544,7 @@ void console_unlock(void)
> struct printk_info info;
> struct printk_record r;
>
> +
> if (console_suspended) {
> up_console_sem();
> return;
> @@ -2637,13 +2636,15 @@ void console_unlock(void)
> * finish. This task can not be preempted if there is a
> * waiter waiting to take over.
> */
> - console_lock_spinning_enable();
> + if (spinning_enabled)
> + console_lock_spinning_enable();
change to
+ if (!spinning_enabled) {
+ raw_spin_lock(&console_owner_lock);
+ WRITE_ONCE(console_waiter, true);
+ raw_spin_unlock(&console_owner_lock);
+ }
Setting console_waiter to true here prevents other threads from spinning.
>
> stop_critical_timings(); /* don't trace print latency */
> call_console_drivers(ext_text, ext_len, text, len);
> start_critical_timings();
>
> - if (console_lock_spinning_disable_and_check()) {
> + if (spinning_enabled &&
> + console_lock_spinning_disable_and_check()) {
> printk_safe_exit_irqrestore(flags);
> return;
> }
Add some code to release console_waiter before returning:
@@ -2592,6 +2595,12 @@ void __console_unlock(bool spinning_enabled)
cond_resched();
}
+ if (!spinning_enabled) {
+ raw_spin_lock(&console_owner_lock);
+ WRITE_ONCE(console_waiter, false);
+ raw_spin_unlock(&console_owner_lock);
+ }
+
console_locked = 0;
Thanks
Jiaxing
> @@ -2670,8 +2671,34 @@ void console_unlock(void)
> if (retry && console_trylock())
> goto again;
> }
> +
> +/*
> + * Classic console_unlock() that might be called in any context.
> + *
> + * It allows to pass the console lock owner when processing the buffered
> + * messages. It helps to prevent soft lockups in an atomic context.
> + */
> +void console_unlock()
> +{
> + __console_unlock(true);
> +}
> EXPORT_SYMBOL(console_unlock);
>
> +/*
> + * Variant of the console unlock that can be called only in preemptible
> + * context.
> + *
> + * All messages are processed in this safe context. It helps to prevent
> + * softlockups when the console lock owner was passed to an atomic context.
> + */
> +void console_unlock_preemptible()
> +{
> + lockdep_assert_preemption_enabled();
> +
> + __console_unlock(false);
> +}
> +EXPORT_SYMBOL(console_unlock_preemptible);
> +
> /**
> * console_conditional_schedule - yield the CPU if required
> *
On (21/05/07 18:36), Petr Mladek wrote:
> > console_unlock()
> > {
> > ...
> >
> > if (printed_messages > limit && !console_lock_spinning_disable_and_check()) {
> > printk_safe_exit_irqrestore(flags);
> >
> > console_locked = 0;
> > up_console_sem();
> >
> > defer_console_output();
> > return;
> > }
> >
> > ...
> > }
>
> No, please, no.
>
Well, the alternative patch set just gives everyone an API that selectively
downgrades printk() to the pre-console_sem_owner age, where console_unlock()
would never hand over the printing duty. It'll take no time for this
function to pop up somewhere we don't want it to be used.
E.g.
rcu_read_lock();
...
console_unlock_preemptible();
...
rcu_read_unlock();
lockdep_assert_preemption_enabled() is not entirely reliable - it
depends on __lockdep_enabled, provided that the system in question has
CONFIG_PROVE_LOCKING set.
> Not to say, that defer_console_output() would trigger IRQ on the same
> CPU again and again.
You mean only on archs that have arch_irq_work_raise()?
It queues the work IF we have pending messages AND there are NO active
console_sem waiters spinning on console_sem waiting for us to hand over
the printing duty. And the IRQ shall write to consoles only N messages out
of possibly M pending messages (M > N). N, obviously, should be small,
e.g. 42 lines: if after 42 printed lines we didn't hand over printing
to another context, then queue another IRQ work and iret. But it keeps
the console_owner mechanism enabled.
On 2021/5/8 0:13, Petr Mladek wrote:
> On Thu 2021-05-06 23:07:19, Sergey Senozhatsky wrote:
>> Thanks for Cc-ing Petr
>>
>> On (21/05/06 15:39), Petr Mladek wrote:
>>> Many printk messages might get accumulated when consoles were suspended.
>>> They are proceed when console_unlock() is called in resume_console().
>>>
>>> The possibility to pass the console lock owner was added to reduce the risk
>>> of softlockup when too many messages were handled in an atomic context.
>>>
>>> Now, resume_console() is always in a preemptible context that is safe
>>> to handle all accumulated messages. The possibility to pass the console
>>> lock owner actually makes things worse. The new owner might be in an atomic
>>> context and might cause softlockup when processing all messages accumulated
>>> when the console was suspended.
>>>
>>> Create new console_unlock_preemptible() that will not allow to pass
>>> the console lock owner. As a result, all accumulated messages will
>>> be proceed in the safe preemptible process.
>> If we have a lot of pending messages in the logbuf, then there is
>> something chatty - some context (task, irq) or maybe several contexts.
>> And those contexts can continue adding messages, while we print them
>> _exclusively_ from preemptible context only. without ever throttling down
>> printk() callers - something that console_owner spinning and handover
>> does for us. And those printk() callers can even preempt
>> console_unlock_preemptible() and cause delays and lost messages.
> Luo, please, correct me if I am wrong.
Hi, Petr, I replied with the test result to your first reply to this patch.
Please check it when you are free.
>
> This patch a about one well defined scenario. The messages are
> accumulated between suspend_console() and resume_console().
> It is "small" part of the system hibernation. And we need
> to get them out now. There might be many if something special
> was debugged.
>
> I am pretty sure that Luo did not see any flood of messages:
>
> + Flood in more contexts would be balanced by switching
> the console_owner.
>
> + Flood in one context would be naturally throttled because
> this context will become the console_owner.
>
> In each case, these messages would be generated after
> console_resume(). Luo's original patch was explicitly talking
> about messages accumulated during the suspend.
>
> Luo, could you please provide some log showing the problem?
Sure. But it is not easy to see that printk has such a problem in
the user's logs.
So I'm attaching a log of the simulation test. You can clearly see that
the driver thread calling dev_info() is blocked.
My test method is as follows:
Kernel thread A causes the console to enter suspend and then resumes it 10
seconds later.
Kernel thread B repeatedly invokes dev_info() for 15 seconds after the
console suspend.
Part of the dmesg log I saved earlier:
[ 288.013869] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 288.013870] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 288.013871] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 288.013872] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 288.013873] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 288.013874] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 288.013875] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 288.013876] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 288.013877] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 289.670256] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 299.286325] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 299.291198] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 299.296063] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 299.300924] hisi_sas_v3_hw 0000:b4:02.0: print test
[ 299.305787] hisi_sas_v3_hw 0000:b4:02.0: print test
The log shows that when thread B invokes dev_info, dev_info is blocked
for several seconds.
Thanks
Jiaxing
>
> Best Regards,
> Petr
>
> .
>
On Mon 2021-05-10 15:41:31, luojiaxing wrote:
>
> On 2021/5/6 21:39, Petr Mladek wrote:
> Hi, Petr, I test your patch and I think it needs to make some modifications
> to fix the problem.
>
>
> My test method is as follows:
> Kernel thread A causes the console to enter suspend and then resume it 10
> seconds later.
> Kernel thread B repeatedly invokes dev_info() for 15 seconds after the
> console suspend.
Could you please provide the test code?
If kthread B starts invoking dev_info() after console_resume() then it
has nothing to do with suspend/resume. It can happen anytime that a
process starts a flood of printk() calls.
Also, do you see this problem in real life, please?
What motivated you to investigate this scenario, please?
> > >From 574e844f512c9f450e64832f09cc389bc9915f83 Mon Sep 17 00:00:00 2001
> > From: Petr Mladek <[email protected]>
> > Date: Thu, 6 May 2021 12:40:56 +0200
> > Subject: [PATCH] printk: Prevent softlockup when resuming console
> >
> > Many printk messages might get accumulated when consoles were suspended.
> > They are proceed when console_unlock() is called in resume_console().
> >
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2637,13 +2636,15 @@ void console_unlock(void)
> > * finish. This task can not be preempted if there is a
> > * waiter waiting to take over.
> > */
> > - console_lock_spinning_enable();
> > + if (spinning_enabled)
> > + console_lock_spinning_enable();
>
>
> change to
>
>
> + if (!spinning_enabled) {
> + raw_spin_lock(&console_owner_lock);
> + WRITE_ONCE(console_waiter, true);
> + raw_spin_unlock(&console_owner_lock);
> + }
>
IMHO, both variants have the same result and behavior:
console_trylock_spinning() has the following condition:
if (!waiter && owner && owner != current) {
WRITE_ONCE(console_waiter, true);
spin = true;
}
My variant causes @owner to stay "NULL".
Your variant causes @waiter to be "true".
In both cases, the condition fails and spin will stay "false",
which means that any parallel printk() will not spin.
Best Regards,
Petr
On Mon 2021-05-10 16:29:11, luojiaxing wrote:
>
> On 2021/5/8 0:13, Petr Mladek wrote:
> > On Thu 2021-05-06 23:07:19, Sergey Senozhatsky wrote:
> > > Thanks for Cc-ing Petr
> > >
> > > On (21/05/06 15:39), Petr Mladek wrote:
> > > > Many printk messages might get accumulated when consoles were suspended.
> > > > They are proceed when console_unlock() is called in resume_console().
> > > >
> > > > The possibility to pass the console lock owner was added to reduce the risk
> > > > of softlockup when too many messages were handled in an atomic context.
> > > >
> > > > Now, resume_console() is always in a preemptible context that is safe
> > > > to handle all accumulated messages. The possibility to pass the console
> > > > lock owner actually makes things worse. The new owner might be in an atomic
> > > > context and might cause softlockup when processing all messages accumulated
> > > > when the console was suspended.
> > > >
> > > > Create new console_unlock_preemptible() that will not allow to pass
> > > > the console lock owner. As a result, all accumulated messages will
> > > > be proceed in the safe preemptible process.
> > > If we have a lot of pending messages in the logbuf, then there is
> > > something chatty - some context (task, irq) or maybe several contexts.
> > > And those contexts can continue adding messages, while we print them
> > > _exclusively_ from preemptible context only. without ever throttling down
> > > printk() callers - something that console_owner spinning and handover
> > > does for us. And those printk() callers can even preempt
> > > console_unlock_preemptible() and cause delays and lost messages.
> > Luo, please, correct me if I am wrong.
>
>
> Hi, Petr, I reply the test result on your first reply for this patch.
>
> Please check it when you are free.
>
>
> >
> > This patch a about one well defined scenario. The messages are
> > accumulated between suspend_console() and resume_console().
> > It is "small" part of the system hibernation. And we need
> > to get them out now. There might be many if something special
> > was debugged.
> >
> > I am pretty sure that Luo did not see any flood of messages:
> >
> > + Flood in more contexts would be balanced by switching
> > the console_owner.
> >
> > + Flood in one context would be naturally throttled because
> > this context will become the console_owner.
> >
> > In each case, these messages would be generated after
> > console_resume(). Luo's original patch was explicitly talking
> > about messages accumulated during the suspend.
> >
> > Luo, could you please provide some log showing the problem?
>
>
> Sure, But, it is not easy to find that the printk got such problem in the
> user's logs.
>
> So I'm attaching a log of the simulation test. You can clearly see that the
> driver thread calling dev_info() is blocked.
>
>
> My test method is as follows:
> Kernel thread A causes the console to enter suspend and then resume it 10
> seconds later.
> Kernel thread B repeatedly invokes dev_info() for 15 seconds after the
> console suspend.
>
>
> Part of dmesg log I save before:
>
> [ 288.013869] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 288.013870] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 288.013871] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 288.013872] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 288.013873] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 288.013874] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 288.013875] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 288.013876] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 288.013877] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 289.670256] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 299.286325] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 299.291198] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 299.296063] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 299.300924] hisi_sas_v3_hw 0000:b4:02.0: print test
> [ 299.305787] hisi_sas_v3_hw 0000:b4:02.0: print test
>
> The log shows that when thread B invokes dev_info, dev_info is blocked for
> several seconds.
This works as expected. It is not ideal, but it is the best we
can do with the current printk() design. kthread B causes a flood
of messages and it pays the price for it. The console work
is a natural throttling.
The only problem is that it becomes the console_owner too late.
It spends a long time flushing the accumulated messages. But
they are its own messages after all.
The only real solution is to pass the console work to a separate
kthread and not block any printk() caller.
In other words, the above test case does not justify the need
for the patch.
The only justification for the patch would be if you see many
messages accumulated between suspend_console() and resume_console()
in real life.
Best Regards,
Petr
On Mon 2021-05-10 17:26:45, Sergey Senozhatsky wrote:
> On (21/05/07 18:36), Petr Mladek wrote:
> Well, the alternative patch set just gives everyone an API that selectively
> downgrades printk() to pre-console_sem_owner age: when console_unlock()
> would never handover the printing duty. It'll take no time until this
> function will pop up somewhere where we don't want it to be used.
>
> E.g.
>
> rcu_read_lock();
> ...
> console_unlock_preemptible();
> ...
> rcu_read_unlock();
>
> lockdep_assert_preemption_enabled() is not entirely reliable - it
> depends on __lockdep_enabled, provided that system in question has
> CONFIG_PROVE_LOCKING set.
I understand the concern. I think that I would be able to sleep with
this. I believe that such problems would be discovered soon.
Also I doubt that it would spread that quickly. It is the same as
with printk_deferred(). It is just a workaround for a problem
that only a few people are aware of.
If it is still a concern, we could make it static and block
any patches that would make it public.
But it does not make sense to fight over this now.
I am not sure that console_unlock_preemptible() is worth it after all.
Luo has to prove that it might fix a real life problem.
> It queues the work IF we have pending messages AND there are NO active
> console_sem waiters spinning on consolse_sem waiting for us to handover
> the printing duty. And IRQ shall write to consoles only N messages out
> of possibly M pending messages (M > N). N, obviously, should be small,
> e.g. 42 lines: if after 42 printed lines we didn't handover printing
> to another context then - queue another IRQ work and iret. But it keeps
> the console_owner mechanism enabled.
I am sorry, but I am not going to continue this discussion. Many
printk() designs have been discussed in the past, and this is just
repeating the same arguments again and again.
The current plan is to move console work to kthreads (separate
preemptive context). Using IRQ is a complete opposite way.
There is always the fight between getting the messages out as soon
as possible and the risk of breaking the system (softlockups,
deadlocks).
The kthread approach reduces the risk of system breakage to a bare
minimum. The price is that some messages might never reach console.
There is finally a consensus to give it a try. If it fails, we might
start looking for alternatives again.
Best Regards,
Petr
On 2021-05-10, Petr Mladek <[email protected]> wrote:
> The current plan is to move console work to kthreads (separate
> preemptive context). Using IRQ is a complete opposite way.
>
> There is always the fight between getting the messages out as soon
> as possible and the risk of breaking the system (softlockups,
> deadlocks).
>
> The kthread approach reduces the risk of system breakage to a bare
> minimum. The price is that some messages might never reach console.
> There is finally a consensus to give it a try. If it fails, we might
> start looking for alternatives again.
+1
I think it is clear that any such fixups will disappear once
atomic-consoles and console printing kthreads arrive. That doesn't mean
we should ignore the fixups. We just need to decide if it is a real
problem that needs our immediate attention, thus warranting a fixup in
the current implementation.
I can see the suspend/resume issue might be a real problem. If this
should be addressed now, I would support Petr's patch, forcing the
backlog to be printed in the preemptible resuming context. But let's
just keep it a suspend/resume fixup. I do not think we want to start
playing with how console_unlock() behaves.
John Ogness
On (21/05/10 12:17), Petr Mladek wrote:
> > rcu_read_lock();
> > ...
> > console_unlock_preemptible();
> > ...
> > rcu_read_unlock();
> >
> > lockdep_assert_preemption_enabled() is not entirely reliable - it
> > depends on __lockdep_enabled, provided that the system in question has
> > CONFIG_PROVE_LOCKING set.
>
> I understand the concern. I think that I would be able to sleep with
> this. I believe that such problems would be discovered soon.
>
> Also I doubt that it would spread that quickly. It is the same as
> with printk_deferred(). It is just a workaround for a problem
> that only few people are aware of.
>
> If it is still a concern, we could make it static and block
> any patches that would make it public.
>
> But it does not make sense to fight over this now.
Is any attempt to discuss alternative fixup solutions going to be labeled
as a fight? Up to you; I'm personally not having any fights.
> I am not sure that console_unlock_preemptible() is worth it after all.
> Luo has to prove that it might fix a real life problem.
Good point.
> I am sorry but I am not going to continue in this discussion.
[..]
> The current plan is to move console work to kthreads (separate
> preemptive context). Using IRQ is a complete opposite way.
I'm not objecting to future directions.
I'm discussing current fixup proposals. And I'm not very comfortable
with an approach that introduces dynamic duality to printk behaviour: the new
printk behaviour or the old one, with that API available system wide. The
IRQ thing is not beautiful, but it's already there, it's working, and we
know how it behaves. That's it.
On (21/05/10 12:32), John Ogness wrote:
> On 2021-05-10, Petr Mladek <[email protected]> wrote:
> > The current plan is to move console work to kthreads (separate
> > preemptive context). Using IRQ is a complete opposite way.
> >
> > There is always the fight between getting the messages out as soon
> > as possible and the risk of breaking the system (softlockups,
> > deadlocks).
> >
> > The kthread approach reduces the risk of system breakage to a bare
> > minimum. The price is that some messages might never reach console.
> > There is finally a consensus to give it a try. If it fails, we might
> > start looking for alternatives again.
>
> +1
>
> I think it is clear that any such fixups will disappear once
> atomic-consoles and console printing kthreads arrive.
Yes, hopefully.
> That doesn't mean we should ignore the fixups.
Well, that also doesn't mean that we should not discuss the fixups.
And there seems to be some sort of taboo on discussions.
> We just need to decide if it is a real problem that needs our
> immediate attention, thus warranting a fixup in the current implementation.
That's a good point.
> I can see the suspend/resume issue might be a real problem. If this
> should be addressed now, I would support Petr's patch, forcing the
> backlog to be printed in the preemptible resuming context. But let's
> just keep it a suspend/resume fixup. I do not think we want to start
> playing with how console_unlock() behaves.
And yes again. If console suspend/resume is a problem, then something
specific to suspend/resume can be done. A system-wide API that
makes printk behave either like the "old" or like the "new" one depending on
some flags runs slightly counter to the "keep printk simple" intention. IMHO.
On (21/05/10 11:50), Petr Mladek wrote:
[..]
> The only problem is that it becomes the console_owner too late.
> It spends a long time flushing the accumulated messages. But
> they are its own messages after all.
>
> The only real solution is to pass the console work to a separate
> kthread and not block any printk() caller.
Yeah, but even that thing is getting complex fairly quickly.
wake_up_process() queues the kthread on the same CPU, so if the
printk() (ab)user is in atomic context then the printing kthread
won't get scheduled.
On 2021/5/10 17:30, Petr Mladek wrote:
> On Mon 2021-05-10 15:41:31, luojiaxing wrote:
>> On 2021/5/6 21:39, Petr Mladek wrote:
>> Hi Petr, I tested your patch and I think it needs some modifications
>> to fix the problem.
>>
>>
>> My test method is as follows:
>> Kernel thread A suspends the console and then resumes it 10
>> seconds later.
>> Kernel thread B repeatedly invokes dev_info() for 15 seconds after the
>> console is suspended.
> Could you please provide the test code?
To facilitate debugging, I replaced the implementation code of the
kernel thread with the debugfs interface.
debugfs_test_if_1_write() corresponds to kernel thread A.
+static ssize_t debugfs_test_if_1_write(struct file *file,
+				       const char __user *user_buf,
+				       size_t count, loff_t *ppos)
+{
+	char buf[DUMP_BUF_SIZE];
+
+	if (!count || count >= sizeof(buf))
+		return -EINVAL;
+
+	if (copy_from_user(buf, user_buf, count))
+		return -EFAULT;
+
+	if (buf[0] != '1')
+		return -EFAULT;
+
+	/* Simulate the suspend/resume window: suspend the console,
+	 * busy-wait, then resume it.
+	 */
+	suspend_console();
+	mdelay(5000);
+	resume_console();
+
+	return count;
+}
debugfs_test_if_2_write() corresponds to kernel thread B.
+static ssize_t debugfs_test_if_2_write(struct file *file,
+				       const char __user *user_buf,
+				       size_t count, loff_t *ppos)
+{
+	struct hisi_hba *hisi_hba = file->f_inode->i_private;
+	char buf[DUMP_BUF_SIZE];
+	int i;
+
+	if (!count || count >= sizeof(buf))
+		return -EINVAL;
+
+	if (copy_from_user(buf, user_buf, count))
+		return -EFAULT;
+
+	if (buf[0] != '1')
+		return -EFAULT;
+
+	/* Flood printk() with 1000 messages to fill the ring buffer. */
+	for (i = 0; i < 1000; i++)
+		dev_info(hisi_hba->dev, "print test");
+
+	return count;
+}
Steps:
1. echo 1 > debugfs_test_if_1
2. Repeatedly echo 1 > debugfs_test_if_2 until debugfs_test_if_1 returns.
3. Check the timestamps with dmesg.
>
> If kthread B starts invoking dev_info() after console_resume() then it
> has nothing to do with suspend/resume.
It's possible that my simulation test was not designed clearly enough
and was misunderstood. Sorry.
Ideally I should have designed a kernel thread C that calls printk() while
console_resume() is still running (before it has returned).
But that was a little more difficult to achieve, so I had kthread B
serve two test purposes. Before console_resume() is invoked, kthread B
simulates loading a large number of messages into the printk
log buffer. After console_resume() is invoked, kthread B is used to
prove that it takes over the prb flushing work from console_resume(),
which causes kthread B to be blocked.
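For reference, a rough sketch of what such a kthread C could look like.
The timing here is simplified to a fixed delay and the names are made up;
it only illustrates calling printk() around the resume window and measuring
how long the call blocks.

#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/printk.h>
#include <linux/ktime.h>

static int thread_c_fn(void *data)
{
	ktime_t t0, t1;

	/* Give thread A time to reach resume_console(). */
	msleep(5000);

	t0 = ktime_get();
	printk(KERN_INFO "thread C: printk during resume_console()\n");
	t1 = ktime_get();

	/* If this printk() became the new console owner, the delta is
	 * roughly the time spent flushing the accumulated backlog.
	 */
	pr_info("thread C: printk took %lld ms\n",
		ktime_to_ms(ktime_sub(t1, t0)));

	return 0;
}

/* Usage: kthread_run(thread_c_fn, NULL, "printk-test-c"); */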
> It can happen anytime that a
> process starts a flood of printk() calls.
>
> Also, do you see this problem in the real life, please?
Yes, the following is part of a log found when the user performs S4
suspend and resume on a PC:
[ 1368979] PM: thaw_processes over
[ 146.369606] PM: __pm_notifier_call_chain over
[ 146.374285] cpu1 pid4321 irq0 hisi_sas_debug_I_T_nexus_reset 1844
phy5 unlock mutex
[ 146.374287] cpu1 pid4321 irq0 hisi_sas_debug_I_T_nexus_reset 1845
phy5 reset over
[ 146.374288] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 1780
phy4 get mutex
[ 146.374297] hisi_sas_v3_hw 0000:74:02.0: phydown: phy4 phy_state=0x21
[ 146.531336] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 1810
phy4 wait start 2
[ 146.533523] hisi_sas_v3_hw 0000:74:02.0: ignore flutter phy4 down
[ 148.551332] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 1812
phy4 wait over 2
[ 148.552442] cpu0 pid302 irq128 phy_up_v3_hw 1586 phy4
[ 148.552449] sas: sas_form_port: phy4 belongs to port0 already(1)!
[ 148.559980] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 182reset
timeout
[ 148.567480] ata5.00: configured for UDMA/133
[ 148.574743] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 14 phy4
unlock mut 148.574744] cpu2 pid4250 hisi_sas_debug_I_T_nexus_reset 1845
phy4 reset over
[ 148.734754] PM: pm_restore_console over
[ 148.738587] PM: atomic_inc over
[ 148.738588] PM: hibernation exit
[ 148.738714] PM: hibernation entry
You can see "hisi_sas_debug_I_T_nexus_reset 182reset timeout" in the
above output; it comes from debug prints we added to the kernel.
The driver waits for a phy-up interrupt; because the interrupt did not
appear to come back within 2s, the driver reported a timeout.
However, when we checked the hardware register, the phy-up interrupt
flag had been set, so the interrupt should have been delivered.
After analysis, we found that the dev_info() called from the phy-up
interrupt was blocked for more than 2s. We proved that this dev_info()
took over the job of flushing the prb from console_resume(), and
unfortunately no other thread called printk() at that moment.
So it took more than 2 seconds to return, prolonging the phy-up
interrupt handler's execution time and finally leading to the
spurious timeout.
> What motivated you to investigate this scenario, please?
I also tried to work around it in my own driver by prolonging the timeout
by several seconds. However, since the time to flush the prb depends
on the number of messages cached in the logbuf, I cannot determine by how
many seconds to extend the timeout.
In addition, I understand that many kernel drivers do not expect
printk() to block for more than a few seconds when they call it. And
printk() is blocked only in a few specific scenarios, and the framework
does not need to be modified, so it may be simple to fix.
Therefore, I proposed this bugfix.
>
>>> From 574e844f512c9f450e64832f09cc389bc9915f83 Mon Sep 17 00:00:00 2001
>>> From: Petr Mladek <[email protected]>
>>> Date: Thu, 6 May 2021 12:40:56 +0200
>>> Subject: [PATCH] printk: Prevent softlockup when resuming console
>>>
>>> Many printk messages might get accumulated while consoles are suspended.
>>> They are processed when console_unlock() is called in resume_console().
>>>
>>> --- a/kernel/printk/printk.c
>>> +++ b/kernel/printk/printk.c
>>> @@ -2637,13 +2636,15 @@ void console_unlock(void)
>>> * finish. This task can not be preempted if there is a
>>> * waiter waiting to take over.
>>> */
>>> - console_lock_spinning_enable();
>>> + if (spinning_enabled)
>>> + console_lock_spinning_enable();
>>
>> change to
>>
>>
>> + if (!spinning_enabled) {
>> + raw_spin_lock(&console_owner_lock);
>> + WRITE_ONCE(console_waiter, true);
>> + raw_spin_unlock(&console_owner_lock);
>> + }
>>
> IMHO, both variants have the same result and behavior:
Yes, you are right. If we keep owner as NULL, it has the same effect as
setting waiter to true.
Sorry for misunderstanding your purpose. It does fix my problem.
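For clarity, the condition quoted below can be read as the following minimal
sketch (not the kernel code itself; the helper name is made up):

#include <linux/sched.h>

/* Petr's variant leaves @owner NULL, my variant sets @waiter true.
 * Either way the check is false, so a parallel printk() caller never
 * sets console_waiter and never starts spinning.
 */
static bool trylock_would_spin(struct task_struct *owner, bool waiter,
			       struct task_struct *cur)
{
	return !waiter && owner && owner != cur;
}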
Tested-by: Luo Jiaxing <[email protected]>
Thanks
Jiaxing
>
> console_trylock_spinning() has the following condition:
>
> if (!waiter && owner && owner != current) {
> WRITE_ONCE(console_waiter, true);
> spin = true;
> }
>
> My variant causes @owner to stay "NULL".
> Your variant causes @waiter to be "true".
>
> In both cases, the condition fails and spin stays "false",
> which means that any parallel printk() will not spin.
>
> Best Regards,
> Petr
>
> .
>
On Tue 2021-05-11 15:32:13, luojiaxing wrote:
>
> On 2021/5/10 17:30, Petr Mladek wrote:
> > On Mon 2021-05-10 15:41:31, luojiaxing wrote:
> > > On 2021/5/6 21:39, Petr Mladek wrote:
> To facilitate debugging, I replaced the implementation code of the kernel
> thread with the debugfs interface.
>
> It's possible that my simulation test was not designed clearly enough
> and was misunderstood. Sorry.
No problem. The simulation is not that important. We all agree that
printk() has this problem. I was just curious how closely the simulation
matched the real-life problem.
> > Also, do you see this problem in the real life, please?
>
> Yes, the following is part of a log found when the user performs S4 suspend
> and resume on a PC:
>
> [ 1368979] PM: thaw_processes over
> [ 146.369606] PM: __pm_notifier_call_chain over
> [ 146.374285] cpu1 pid4321 irq0 hisi_sas_debug_I_T_nexus_reset 1844 phy5
> unlock mutex
> [ 146.374287] cpu1 pid4321 irq0 hisi_sas_debug_I_T_nexus_reset 1845 phy5
> reset over
> [ 146.374288] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 1780 phy4
> get mutex
> [ 146.374297] hisi_sas_v3_hw 0000:74:02.0: phydown: phy4 phy_state=0x21
> [ 146.531336] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 1810 phy4
> wait start 2
> [ 146.533523] hisi_sas_v3_hw 0000:74:02.0: ignore flutter phy4 down
So, here is the delay caused by flushing the accumulated printk()
messages. Am I right, please?
> [ 148.551332] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 1812 phy4
> wait over 2
> [ 148.552442] cpu0 pid302 irq128 phy_up_v3_hw 1586 phy4
> [ 148.552449] sas: sas_form_port: phy4 belongs to port0 already(1)!
> [ 148.559980] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 182reset
> timeout
> [ 148.567480] ata5.00: configured for UDMA/133
> [ 148.574743] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 14 phy4
> unlock mut 148.574744] cpu2 pid4250 hisi_sas_debug_I_T_nexus_reset 1845
> phy4 reset over
> [? 148.734754] PM: pm_restore_console over
I am a bit confused that pm_restore_console is mentioned after the
problematic delay.
I guess that "over" means that the function returned here.
Does it mean that resume_console() was called earlier before
the above delay?
> [ 148.738587] PM: atomic_inc over
> [ 148.738588] PM: hibernation exit
> [ 148.738714] PM: hibernation entry
>
>
> You can see "hisi_sas_debug_I_T_nexus_reset 182reset timeout" in the above
> output; it comes from debug prints we added to the kernel.
>
> The driver waits for a phy-up interrupt; because the interrupt did not
> appear to come back within 2s, the driver reported a timeout.
>
> However, when we checked the hardware register, the phy-up interrupt flag
> had been set, so the interrupt should have been delivered.
>
> After analysis, we found that the dev_info() called from the phy-up
> interrupt was blocked for more than 2s. We proved that this dev_info()
> took over the job of flushing the prb from console_resume(), and
> unfortunately no other thread called printk() at that moment.
>
> So it took more than 2 seconds to return, prolonging the phy-up interrupt
> handler's execution time and finally leading to the spurious timeout.
OK.
> > What motivated you to investigate this scenario, please?
>
>
> I also tried to work around it in my own driver by prolonging the timeout
> by several seconds. However, since the time to flush the prb depends
>
> on the number of messages cached in the logbuf, I cannot determine by how
> many seconds to extend the timeout.
Unfortunately, there is no hard limit at the moment. There are
situations where printk() causes softlockups, which means that the
console work takes longer than 30 sec.
> In addition, I understand that many kernel drivers do not expect printk()
> to block for more than a few seconds when they call it. And
>
> printk() is blocked only in a few specific scenarios, and the framework
> does not need to be modified, so it may be simple to fix.
This is a very old problem. If I remember correctly, the first attempts
to offload the console handling started in 2012. The main problem is
that offloading increases the risk that the messages will never reach
the console. This blocked any change for years.
There is finally a consensus to give it a chance. RT people are
working hard on it. They have put a lot of effort into the lockless
ringbuffer. Offloading is the next step. But it requires implementing
atomic console(s), serializing backtraces from all CPUs in NMI,
and trying to push the messages immediately when things go wrong
or when the kthreads cannot be scheduled by definition,
for example, during early boot, suspend, shutdown, or panic.
> Therefore, I proposed this bugfix.
The fix is rather a workaround. And it helps only in one particular
scenario. It will get obsolete when the console work is offloaded.
I would like to be sure that it is important enough.
Does the fix prevent "only" an extra reset?
Is the system functional in the end?
Is it acceptable to keep it as is and wait for the proper solution?
To be honest, it might take a long time (even > 1 year) until
the kthreads are upstream. As I already wrote, there was a pushback
against it and we need to do it properly.
Best Regards,
Petr
On 2021/5/11 17:08, Petr Mladek wrote:
> On Tue 2021-05-11 15:32:13, luojiaxing wrote:
>> On 2021/5/10 17:30, Petr Mladek wrote:
>>> On Mon 2021-05-10 15:41:31, luojiaxing wrote:
>>>> On 2021/5/6 21:39, Petr Mladek wrote:
>> To facilitate debugging, I replaced the implementation code of the kernel
>> thread with the debugfs interface.
>>
>> It's possible that my simulation test was not designed clearly enough
>> and was misunderstood. Sorry.
> No problem. The simulation is not that important. We all agree that
> printk() has this problem. I was just curious how closely the simulation
> matched the real-life problem.
>
>
>>> Also, do you see this problem in the real life, please?
>> Yes, the following is part of a log found when the user performs S4 suspend
>> and resume on a PC:
>>
>> [ 1368979] PM: thaw_processes over
>> [ 146.369606] PM: __pm_notifier_call_chain over
>> [ 146.374285] cpu1 pid4321 irq0 hisi_sas_debug_I_T_nexus_reset 1844 phy5
>> unlock mutex
>> [ 146.374287] cpu1 pid4321 irq0 hisi_sas_debug_I_T_nexus_reset 1845 phy5
>> reset over
>> [ 146.374288] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 1780 phy4
>> get mutex
>> [ 146.374297] hisi_sas_v3_hw 0000:74:02.0: phydown: phy4 phy_state=0x21
>> [ 146.531336] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 1810 phy4
>> wait start 2
>> [ 146.533523] hisi_sas_v3_hw 0000:74:02.0: ignore flutter phy4 down
> So, here is the delay caused by flushing the accumulated printk()
> messages. Am I right, please?
Yes. My personal expectation was that the delay in printk() would be about
10 ms, but this situation was far worse than expected.
>
>
>> [ 148.551332] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 1812 phy4
>> wait over 2
>> [ 148.552442] cpu0 pid302 irq128 phy_up_v3_hw 1586 phy4
>> [ 148.552449] sas: sas_form_port: phy4 belongs to port0 already(1)!
>> [ 148.559980] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 182reset
>> timeout
>> [ 148.567480] ata5.00: configured for UDMA/133
>> [ 148.574743] cpu2 pid4256 irq0 hisi_sas_debug_I_T_nexus_reset 14 phy4
>> unlock mut 148.574744] cpu2 pid4250 hisi_sas_debug_I_T_nexus_reset 1845
>> phy4 reset over
>> [ 148.734754] PM: pm_restore_console over
> I am a bit confused that pm_restore_console is mentioned after the
> problematic delay.
>
> I guess that "over" means that the function returned here.
> Does it mean that resume_console() was called earlier before
> the above delay?
Yes, resume_console() is called before the phy-up interrupt calls printk().
Here is another log that may be clearer:
[ 739.860835] hisi_sas_v3_hw 0000:74:02.0: resuming from operating
state [D0]
[ 739.867233] hisi_sas_v3_hw 0000:74:04.0: resuming from operating
state [D0]
...
[ 742.906981] cpu2 pid4182 irq0 hisi_sas_debug_I_T_nexus_reset 1799
phy0 wait start
[ 742.907970] cpu0 pid4107 irq128 phy_up_v3_hw 1463 phy0
[ 744.934979] cpu2 pid4182 irq0 hisi_sas_debug_I_T_nexus_reset 1801
phy0 wait over
[ 744.944173] cpu2 pid4182 irq0 hisi_sas_debug_I_T_nexus_reset 1811
phy0 reset timeout
The PC resumes and console_resume() is called somewhere between
[ 739.867233] and [ 742.906981].
Then our driver executes I_T_nexus_reset on phy0 and starts to wait for
phy up. The phy-up interrupt actually comes back soon and its message
"phy_up_v3_hw 1463 phy0" is cached at [ 742.907970], but, as you know,
printk() is blocked there, and after 2s the I_T_nexus_reset wait expires
and a timeout is printed.
>
>
>> [ 148.738587] PM: atomic_inc over
>> [ 148.738588] PM: hibernation exit
>> [ 148.738714] PM: hibernation entry
>>
>>
>> You can see "hisi_sas_debug_I_T_nexus_reset 182reset timeout" in the above
>> output; it comes from debug prints we added to the kernel.
>>
>> The driver waits for a phy-up interrupt; because the interrupt did not
>> appear to come back within 2s, the driver reported a timeout.
>>
>> However, when we checked the hardware register, the phy-up interrupt flag
>> had been set, so the interrupt should have been delivered.
>>
>> After analysis, we found that the dev_info() called from the phy-up
>> interrupt was blocked for more than 2s. We proved that this dev_info()
>> took over the job of flushing the prb from console_resume(), and
>> unfortunately no other thread called printk() at that moment.
>>
>> So it took more than 2 seconds to return, prolonging the phy-up interrupt
>> handler's execution time and finally leading to the spurious timeout.
> OK.
>
>
>>> What motivated you to investigate this scenario, please?
>>
>> I also tried to work around it in my own driver by prolonging the timeout
>> by several seconds. However, since the time to flush the prb depends
>>
>> on the number of messages cached in the logbuf, I cannot determine by how
>> many seconds to extend the timeout.
> Unfortunately, there is no hard limit at the moment. There are
> situations where printk() causes softlockups, which means that the
> console work takes longer than 30 sec.
>
>
>> In addition, I understand that many kernel drivers do not expect printk()
>> to block for more than a few seconds when they call it. And
>>
>> printk() is blocked only in a few specific scenarios, and the framework
>> does not need to be modified, so it may be simple to fix.
> This is a very old problem. If I remember correctly, the first attempts
> to offload the console handling started in 2012. The main problem is
> that offloading increases the risk that the messages will never reach
> the console. This blocked any change for years.
>
> There is finally a consensus to give it a chance. RT people are
> working hard on it. They have put a lot of effort into the lockless
> ringbuffer. Offloading is the next step. But it requires implementing
> atomic console(s), serializing backtraces from all CPUs in NMI,
> and trying to push the messages immediately when things go wrong
> or when the kthreads cannot be scheduled by definition,
> for example, during early boot, suspend, shutdown, or panic.
I understand that the effect of offloading is that a driver calling printk()
returns quickly without any blocking, right?
>
>> Therefore, I proposed this bugfix.
> The fix is rather a workaround. And it helps only in one particular
> scenario. It will get obsolete when the console work is offloaded.
> I would like to be sure that it is important enough.
>
> Does the fix prevent "only" an extra reset?
> Is the system functional in the end?
From the results, the system disk becomes read-only after resume. We are
a storage driver.
> Is it acceptable to keep it as is and wait for the proper solution?
As you said, offloading may take more than a year to reach upstream. If this
bugfix is small and scenario-specific,
can we fix it first? Perhaps this bugfix will be removed along with the
rest of the code when offloading lands upstream.
Other drivers would not notice any change.
If our driver has to carry the workaround instead, we would have to revert
it after offloading is upstream, which may be more troublesome for us.
Thanks
Jiaxing
>
> To be honest, it might take a long time (even > 1 year) until
> the kthreads are upstream. As I already wrote, there was a pushback
> against it and we need to do it properly.
>
> Best Regards,
> Petr
>
> .
>