2015-05-25 12:46:59

by Petr Mladek

[permalink] [raw]
Subject: [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup

The main source of deadlocks caused by printk() in NMI context has been
solved by the commit a9edc88093287 ("x86/nmi: Perform a safe NMI stack
trace on all CPUs").

But there are still few warnings printed in the NMI code that could
case a deadlock. For example, see the freeze discussed at
https://lkml.org/lkml/2015/5/20/481

The warnings are not easy to avoid. We could not postpone them easily.
We could not allocate buffers in NMI context. Therefore we would need
some lockless mechanism to share a buffer between NMI and normal context.
But this would make printk() code much more complicated and it is not
worth it. There has already been an attempt to do so and it has been
rejected, see https://lkml.org/lkml/2014/6/10/388


This patch set provides much easier solution. It prevents from
the deadlock on logbuf_lock by using trylock rather than spin_lock.
If the lock can not be taken, the message is ignored and some
warning is printed later on.

1st patch adds the simple solution.

2nd, 4th, 5th patches makes the solution a bit more robust.

3rd patch just shuffle some code to make 4th patch trivial.

6th..10th patches split the spaghetti vprintk_emit() into more
pieces. They basically just shuffle the code to make it
better readable.


The patch have been tested against Linus' tree but it can be applied
also against linux-next.

Petr Mladek (10):
printk: Avoid deadlock in NMI context
printk: Try harder to get logbuf_lock on NMI
printk: Move the deferred printk stuff
printk: Merge and flush NMI buffer predictably via IRQ work
printk: Try hard to print Oops message in NMI context
printk: Split delayed printing of warnings from vprintk_emit()
printk: Split text formatting and analyze from vprintk_emit()
printk: Detect scheduler messages in vprintk_format_and_analyze()
printk: Split text storing logic from vprintk_emit()
printk: Split console call from vprintk_emit()

kernel/printk/printk.c | 378 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 269 insertions(+), 109 deletions(-)

--
1.8.5.6


2015-05-25 12:46:55

by Petr Mladek

[permalink] [raw]
Subject: [PATCH 01/10] printk: Avoid deadlock in NMI context

printk() cannot be used in NMI context safely because it uses an internal
lock and thus could cause a deadlock. This is fine because NMI context
is very special. The handlers should be short, effective, and do not
use printk().

But developers tend to print warnings even from NMI code. They are pretty
hard to debug when they lockup the machine and nothing appears in the logs.

This patch prevents from the deadlock on logbuf_lock by using trylock
rather than spin_lock. If the lock can not be taken, the message is
ignored and some warning is printed later on.

We also must not try to get console from NMI context. It needs
even more locks and there is even higher chance to hung up.

Unfortunately, we could not print more details about the lost message.
We could not alloc a buffer in NMI. Therefore we would need some
lockless mechanism to share a buffer between NMI and normal context.
But this would make printk() code much more complicated and
it is not worth it. There has already been an attempt to do so
and it has been rejected, see https://lkml.org/lkml/2014/6/10/388
This is also the reason why we use the atomic counter.

Hint: If anybody wants to know what NMI message is being lost,
trace_dump_stack(0) can be called when try_lock() fails. Then
trace_printk() could be used on that place.

This is only the basic logic. There will be some improvements in
the followup patches.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 42 ++++++++++++++++++++++++++++++++++++++----
1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c099b082cd02..94fcf6f0b542 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1615,6 +1615,7 @@ asmlinkage int vprintk_emit(int facility, int level,
const char *fmt, va_list args)
{
static int recursion_bug;
+ static atomic_t nmi_message_lost;
static char textbuf[LOG_LINE_MAX];
char *text = textbuf;
size_t text_len = 0;
@@ -1641,7 +1642,7 @@ asmlinkage int vprintk_emit(int facility, int level,
/*
* Ouch, printk recursed into itself!
*/
- if (unlikely(logbuf_cpu == this_cpu)) {
+ if (unlikely(logbuf_cpu == this_cpu) && !in_nmi()) {
/*
* If a crash is occurring during printk() on this CPU,
* then try to get the crash message out but make sure
@@ -1658,7 +1659,26 @@ asmlinkage int vprintk_emit(int facility, int level,
}

lockdep_off();
- raw_spin_lock(&logbuf_lock);
+
+ /*
+ * printk() should not be used in NMI context because the interrupt
+ * might came when lockbuf_lock is taken. Such situation is hard
+ * to debug. Therefore, we try to avoid the deadlock, use trylock
+ * in NMI context, and later warn about lost messages.
+ *
+ * Hint: If you get into troubles, use trace_dump_stack(0)
+ * to get location of the lost message and use trace_printk()
+ * there.
+ */
+ if (!in_nmi()) {
+ raw_spin_lock(&logbuf_lock);
+ } else if (!raw_spin_trylock(&logbuf_lock)) {
+ atomic_inc(&nmi_message_lost);
+ lockdep_on();
+ local_irq_restore(flags);
+ return 0;
+ }
+
logbuf_cpu = this_cpu;

if (unlikely(recursion_bug)) {
@@ -1672,6 +1692,17 @@ asmlinkage int vprintk_emit(int facility, int level,
strlen(recursion_msg));
}

+ if (unlikely(atomic_read(&nmi_message_lost))) {
+ int lost = atomic_xchg(&nmi_message_lost, 0);
+
+ text_len = scnprintf(text, sizeof(textbuf),
+ "BAD LUCK: lost %d message(s) from NMI context!",
+ lost);
+ /* emit KERN_CRIT message */
+ printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+ NULL, 0, text, text_len);
+ }
+
/*
* The printf needs to come first; we need the syslog
* prefix which might be passed-in as a parameter.
@@ -1759,8 +1790,11 @@ asmlinkage int vprintk_emit(int facility, int level,
lockdep_on();
local_irq_restore(flags);

- /* If called from the scheduler, we can not call up(). */
- if (!in_sched) {
+ /*
+ * If called from the scheduler or NMI context, we can not get console
+ * without a possible deadlock.
+ */
+ if (!in_sched && !in_nmi()) {
lockdep_off();
/*
* Disable preemption to avoid being preempted while holding
--
1.8.5.6

2015-05-25 12:46:57

by Petr Mladek

[permalink] [raw]
Subject: [PATCH 02/10] printk: Try harder to get logbuf_lock on NMI

If the logbuf_lock is not available immediately, it does not mean
that there is a deadlock. We should try harder and wait a bit.

On the other hand, we must not forget that we are in NMI and the timeout
has to be rather small. It must not cause dangerous stalls.

I even got full system freeze when the timeout was 10ms and I printed
backtraces from all CPUs. In this case, all CPUs were blocked for
too long.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 38 +++++++++++++++++++++++++++++++++++---
1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 94fcf6f0b542..e6c00d6ee8dc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -231,6 +231,8 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);

#ifdef CONFIG_PRINTK
DECLARE_WAIT_QUEUE_HEAD(log_wait);
+/* cpu currently holding logbuf_lock */
+static unsigned int logbuf_cpu = UINT_MAX;
/* the next printk record to read by syslog(READ) or /proc/kmsg */
static u64 syslog_seq;
static u32 syslog_idx;
@@ -1610,6 +1612,38 @@ static size_t cont_print_text(char *text, size_t size)
return textlen;
}

+/*
+ * This value defines the maximum delay that we spend waiting for logbuf_lock
+ * in NMI context. 100us looks like a good compromise. Note that, for example,
+ * syslog_print_all() might hold the lock for quite some time. On the other
+ * hand, waiting 10ms caused system freeze when many backtraces were printed
+ * in NMI.
+ */
+#define TRY_LOCKBUF_LOCK_MAX_DELAY_NS 100000UL
+
+/* We must be careful in NMI when we managed to preempt a running printk */
+static int try_logbuf_lock_in_nmi(void)
+{
+ u64 start_time, current_time;
+ int this_cpu = smp_processor_id();
+
+ /* no way if we are already locked on this CPU */
+ if (logbuf_cpu == this_cpu)
+ return 0;
+
+ /* try hard to get the lock but do not wait forever */
+ start_time = cpu_clock(this_cpu);
+ current_time = start_time;
+ while (current_time - start_time < TRY_LOCKBUF_LOCK_MAX_DELAY_NS) {
+ if (raw_spin_trylock(&logbuf_lock))
+ return 1;
+ cpu_relax();
+ current_time = cpu_clock(this_cpu);
+ }
+
+ return 0;
+}
+
asmlinkage int vprintk_emit(int facility, int level,
const char *dict, size_t dictlen,
const char *fmt, va_list args)
@@ -1624,8 +1658,6 @@ asmlinkage int vprintk_emit(int facility, int level,
int this_cpu;
int printed_len = 0;
bool in_sched = false;
- /* cpu currently holding logbuf_lock in this function */
- static unsigned int logbuf_cpu = UINT_MAX;

if (level == LOGLEVEL_SCHED) {
level = LOGLEVEL_DEFAULT;
@@ -1672,7 +1704,7 @@ asmlinkage int vprintk_emit(int facility, int level,
*/
if (!in_nmi()) {
raw_spin_lock(&logbuf_lock);
- } else if (!raw_spin_trylock(&logbuf_lock)) {
+ } else if (!try_logbuf_lock_in_nmi()) {
atomic_inc(&nmi_message_lost);
lockdep_on();
local_irq_restore(flags);
--
1.8.5.6

2015-05-25 12:48:59

by Petr Mladek

[permalink] [raw]
Subject: [PATCH 03/10] printk: Move the deferred printk stuff

We will want to use the workqueue also in NMI because we could not take
the console safely there.

This patch just move the code up in the source file. There is no
other change.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 108 ++++++++++++++++++++++++-------------------------
1 file changed, 54 insertions(+), 54 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e6c00d6ee8dc..bf2abdda5869 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1509,6 +1509,60 @@ static inline void printk_delay(void)
}

/*
+ * Delayed printk version, for scheduler-internal messages:
+ */
+#define PRINTK_PENDING_WAKEUP 0x01
+#define PRINTK_PENDING_OUTPUT 0x02
+
+static DEFINE_PER_CPU(int, printk_pending);
+
+static void wake_up_klogd_work_func(struct irq_work *irq_work)
+{
+ int pending = __this_cpu_xchg(printk_pending, 0);
+
+ if (pending & PRINTK_PENDING_OUTPUT) {
+ /* If trylock fails, someone else is doing the printing */
+ if (console_trylock())
+ console_unlock();
+ }
+
+ if (pending & PRINTK_PENDING_WAKEUP)
+ wake_up_interruptible(&log_wait);
+}
+
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
+ .func = wake_up_klogd_work_func,
+ .flags = IRQ_WORK_LAZY,
+};
+
+void wake_up_klogd(void)
+{
+ preempt_disable();
+ if (waitqueue_active(&log_wait)) {
+ this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+ irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+ }
+ preempt_enable();
+}
+
+int printk_deferred(const char *fmt, ...)
+{
+ va_list args;
+ int r;
+
+ preempt_disable();
+ va_start(args, fmt);
+ r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
+ va_end(args);
+
+ __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+ irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+ preempt_enable();
+
+ return r;
+}
+
+/*
* Continuation lines are buffered, and not committed to the record buffer
* until the line is complete, or a race forces it. The line fragments
* though, are printed immediately to the consoles to ensure everything has
@@ -2665,60 +2719,6 @@ late_initcall(printk_late_init);

#if defined CONFIG_PRINTK
/*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP 0x01
-#define PRINTK_PENDING_OUTPUT 0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
-{
- int pending = __this_cpu_xchg(printk_pending, 0);
-
- if (pending & PRINTK_PENDING_OUTPUT) {
- /* If trylock fails, someone else is doing the printing */
- if (console_trylock())
- console_unlock();
- }
-
- if (pending & PRINTK_PENDING_WAKEUP)
- wake_up_interruptible(&log_wait);
-}
-
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
- .func = wake_up_klogd_work_func,
- .flags = IRQ_WORK_LAZY,
-};
-
-void wake_up_klogd(void)
-{
- preempt_disable();
- if (waitqueue_active(&log_wait)) {
- this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
- irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
- }
- preempt_enable();
-}
-
-int printk_deferred(const char *fmt, ...)
-{
- va_list args;
- int r;
-
- preempt_disable();
- va_start(args, fmt);
- r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
- va_end(args);
-
- __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
- irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
- preempt_enable();
-
- return r;
-}
-
-/*
* printk rate limiting, lifted from the networking subsystem.
*
* This enforces a rate limit: not more than 10 kernel messages
--
1.8.5.6

2015-05-25 12:47:05

by Petr Mladek

[permalink] [raw]
Subject: [PATCH 04/10] printk: Merge and flush NMI buffer predictably via IRQ work

It might take ages until users see messages from NMI context. They cannot
be flushed to the console because the operation involves taking and
releasing a bunch of locks. Everything gets fixed by the followup printk
in normal context but it is not predictable.

The same problem has printk_sched() and this patch reuses the existing
solution.

There is no special printk() variant for NMI context. Hence the IRQ work
need to get queued from vprintk_emit().

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bf2abdda5869..c2ae9ff388ae 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1554,9 +1554,6 @@ int printk_deferred(const char *fmt, ...)
va_start(args, fmt);
r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
va_end(args);
-
- __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
- irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
preempt_enable();

return r;
@@ -1880,7 +1877,10 @@ asmlinkage int vprintk_emit(int facility, int level,
* If called from the scheduler or NMI context, we can not get console
* without a possible deadlock.
*/
- if (!in_sched && !in_nmi()) {
+ if (in_sched || in_nmi()) {
+ __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+ irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+ } else {
lockdep_off();
/*
* Disable preemption to avoid being preempted while holding
--
1.8.5.6

2015-05-25 12:47:09

by Petr Mladek

[permalink] [raw]
Subject: [PATCH 05/10] printk: Try hard to print Oops message in NMI context

Oops messages are important for debugging. We should try harder to get
them into the ring buffer and print them to the console. This is
problematic in NMI context because the needed locks might
already be taken.

What we can do, though, is to zap all printk locks. We already do this
when a printk recursion is detected. This should be safe because
the system is crashing and there shouldn't be any printk caller
that would cause the deadlock.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c2ae9ff388ae..8fb0aaaa6258 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1672,14 +1672,20 @@ static size_t cont_print_text(char *text, size_t size)
*/
#define TRY_LOCKBUF_LOCK_MAX_DELAY_NS 100000UL

-/* We must be careful in NMI when we managed to preempt a running printk */
+/*
+ * We must be careful in NMI when we managed to preempt a running printk.
+ * Special case are Oops messages from NMI context. We try hard to print
+ * them and forcefully drop existing locks.
+ */
static int try_logbuf_lock_in_nmi(void)
{
u64 start_time, current_time;
int this_cpu = smp_processor_id();

- /* no way if we are already locked on this CPU */
- if (logbuf_cpu == this_cpu)
+ if (oops_in_progress)
+ zap_locks();
+ else if (logbuf_cpu == this_cpu)
+ /* no way if we are already locked on this CPU */
return 0;

/* try hard to get the lock but do not wait forever */
@@ -1876,8 +1882,13 @@ asmlinkage int vprintk_emit(int facility, int level,
/*
* If called from the scheduler or NMI context, we can not get console
* without a possible deadlock.
+ *
+ * The only exception are Oops messages from NMI context where all
+ * relevant locks have been forcefully dropped in
+ * try_logbuf_lock_in_nmi(). We have to try to get the console,
+ * otherwise the last messages would get lost.
*/
- if (in_sched || in_nmi()) {
+ if (in_sched || (in_nmi() && !oops_in_progress)) {
__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
} else {
--
1.8.5.6

2015-05-25 12:48:32

by Petr Mladek

[permalink] [raw]
Subject: [PATCH 06/10] printk: Split delayed printing of warnings from vprintk_emit()

printk_emit() is too long. Let's split the delayed printing of warnings
into a separate function.

This patch just shuffles the code. There is no change in the functionality.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 64 +++++++++++++++++++++++++++++++-------------------
1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8fb0aaaa6258..91b8043044ac 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -233,6 +233,9 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);
DECLARE_WAIT_QUEUE_HEAD(log_wait);
/* cpu currently holding logbuf_lock */
static unsigned int logbuf_cpu = UINT_MAX;
+/* flags used for delayed printing of warnings */
+static int recursion_bug;
+static atomic_t nmi_message_lost;
/* the next printk record to read by syslog(READ) or /proc/kmsg */
static u64 syslog_seq;
static u32 syslog_idx;
@@ -269,6 +272,8 @@ static u32 clear_idx;
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf = __log_buf;
static u32 log_buf_len = __LOG_BUF_LEN;
+/* helper buffer to print formatted text */
+static char textbuf[LOG_LINE_MAX];

/* Return log buffer address */
char *log_buf_addr_get(void)
@@ -1701,13 +1706,44 @@ static int try_logbuf_lock_in_nmi(void)
return 0;
}

+/*
+ * This function modifies the global textbuf and therefore it must be called
+ * under lockbuf_lock!
+ */
+static int vprintk_delayed_warnings(void)
+{
+ int printed_len = 0;
+ int text_len;
+
+ if (unlikely(recursion_bug)) {
+ static const char recursion_msg[] =
+ "BUG: recent printk recursion!";
+
+ recursion_bug = 0;
+ /* emit KERN_CRIT message */
+ printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+ NULL, 0, recursion_msg,
+ strlen(recursion_msg));
+ }
+
+ if (unlikely(atomic_read(&nmi_message_lost))) {
+ int lost = atomic_xchg(&nmi_message_lost, 0);
+
+ text_len = scnprintf(textbuf, sizeof(textbuf),
+ "BAD LUCK: lost %d message(s) from NMI context!",
+ lost);
+ /* emit KERN_CRIT message */
+ printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+ NULL, 0, textbuf, text_len);
+ }
+
+ return printed_len;
+}
+
asmlinkage int vprintk_emit(int facility, int level,
const char *dict, size_t dictlen,
const char *fmt, va_list args)
{
- static int recursion_bug;
- static atomic_t nmi_message_lost;
- static char textbuf[LOG_LINE_MAX];
char *text = textbuf;
size_t text_len = 0;
enum log_flags lflags = 0;
@@ -1770,27 +1806,7 @@ asmlinkage int vprintk_emit(int facility, int level,

logbuf_cpu = this_cpu;

- if (unlikely(recursion_bug)) {
- static const char recursion_msg[] =
- "BUG: recent printk recursion!";
-
- recursion_bug = 0;
- /* emit KERN_CRIT message */
- printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
- NULL, 0, recursion_msg,
- strlen(recursion_msg));
- }
-
- if (unlikely(atomic_read(&nmi_message_lost))) {
- int lost = atomic_xchg(&nmi_message_lost, 0);
-
- text_len = scnprintf(text, sizeof(textbuf),
- "BAD LUCK: lost %d message(s) from NMI context!",
- lost);
- /* emit KERN_CRIT message */
- printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
- NULL, 0, text, text_len);
- }
+ printed_len += vprintk_delayed_warnings();

/*
* The printf needs to come first; we need the syslog
--
1.8.5.6

2015-05-25 12:48:09

by Petr Mladek

[permalink] [raw]
Subject: [PATCH 07/10] printk: Split text formatting and analyze from vprintk_emit()

vprintk_emit() is too long. Let's move the text formatting and
the text analyze into a separate function.

I am not super happy with the result. The primary aim is to reduce
future mistakes when backporting fixes in this code to older stable
kernel releases. Therefore I want to keep the changes in the code
at minimum. Especially, I want to avoid conversion of @text into
"char **" because it is harder to read.

Well, there is one exception. @level is accessed via a pointer.
It looks safe because compiler warns about type mismatch when
the access is not updated. Also it used only in a very
simple code.

Better ideas are welcome!

This patch just shuffles the code. There is no change in
the functionality.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 122 +++++++++++++++++++++++++++++++------------------
1 file changed, 78 insertions(+), 44 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 91b8043044ac..7d009144f97f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1740,13 +1740,85 @@ static int vprintk_delayed_warnings(void)
return printed_len;
}

+/**
+ * vprintk_format_and_analyze - format and analyze the text message
+ * @fmt: text format
+ * @args: argumetns used to generate the formated text
+ * @dict: pointer to the message dictionary
+ * @facility: syslog facility
+ * @level: syslog level; might be replaced by the one found in the fomatted text
+ * @ftext: pointer to the formatted text after the syslog prefix
+ * @ftext_len: length of the formatted text without the syslog prefix
+ *
+ * This function modifies the global textbuf and therefore it must be called
+ * under lockbuf_lock!
+ */
+static enum log_flags vprinkt_format_and_analyze(const char *fmt, va_list args,
+ const char *dict, int facility,
+ int *level,
+ char **ftext,
+ size_t *ftext_len)
+{
+ char *text = textbuf;
+ size_t text_len;
+ enum log_flags lflags = 0;
+
+ /*
+ * The printf needs to come first; we need the syslog
+ * prefix which might be passed-in as a parameter.
+ */
+ text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
+
+ /* mark and strip a trailing newline */
+ if (text_len && text[text_len-1] == '\n') {
+ text_len--;
+ lflags |= LOG_NEWLINE;
+ }
+
+ /* strip kernel syslog prefix and extract log level or control flags */
+ if (facility == 0) {
+ int kern_level = printk_get_level(text);
+
+ if (kern_level) {
+ const char *end_of_header = printk_skip_level(text);
+
+ switch (kern_level) {
+ case '0' ... '7':
+ if (*level == LOGLEVEL_DEFAULT)
+ *level = kern_level - '0';
+ /* fallthrough */
+ case 'd': /* KERN_DEFAULT */
+ lflags |= LOG_PREFIX;
+ }
+ /*
+ * No need to check length here because vscnprintf
+ * put '\0' at the end of the string. Only valid and
+ * newly printed level is detected.
+ */
+ text_len -= end_of_header - text;
+ text = (char *)end_of_header;
+ }
+ }
+
+ if (*level == LOGLEVEL_DEFAULT)
+ *level = default_message_loglevel;
+
+ if (dict)
+ lflags |= LOG_PREFIX|LOG_NEWLINE;
+
+ *ftext = text;
+ *ftext_len = text_len;
+
+ return lflags;
+}
+
asmlinkage int vprintk_emit(int facility, int level,
const char *dict, size_t dictlen,
const char *fmt, va_list args)
{
- char *text = textbuf;
- size_t text_len = 0;
- enum log_flags lflags = 0;
+ char *text;
+ size_t text_len;
+ enum log_flags lflags;
unsigned long flags;
int this_cpu;
int printed_len = 0;
@@ -1808,47 +1880,9 @@ asmlinkage int vprintk_emit(int facility, int level,

printed_len += vprintk_delayed_warnings();

- /*
- * The printf needs to come first; we need the syslog
- * prefix which might be passed-in as a parameter.
- */
- text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
-
- /* mark and strip a trailing newline */
- if (text_len && text[text_len-1] == '\n') {
- text_len--;
- lflags |= LOG_NEWLINE;
- }
-
- /* strip kernel syslog prefix and extract log level or control flags */
- if (facility == 0) {
- int kern_level = printk_get_level(text);
-
- if (kern_level) {
- const char *end_of_header = printk_skip_level(text);
- switch (kern_level) {
- case '0' ... '7':
- if (level == LOGLEVEL_DEFAULT)
- level = kern_level - '0';
- /* fallthrough */
- case 'd': /* KERN_DEFAULT */
- lflags |= LOG_PREFIX;
- }
- /*
- * No need to check length here because vscnprintf
- * put '\0' at the end of the string. Only valid and
- * newly printed level is detected.
- */
- text_len -= end_of_header - text;
- text = (char *)end_of_header;
- }
- }
-
- if (level == LOGLEVEL_DEFAULT)
- level = default_message_loglevel;
-
- if (dict)
- lflags |= LOG_PREFIX|LOG_NEWLINE;
+ lflags = vprinkt_format_and_analyze(fmt, args, dict, facility,
+ &level,
+ &text, &text_len);

if (!(lflags & LOG_NEWLINE)) {
/*
--
1.8.5.6

2015-05-25 12:47:54

by Petr Mladek

[permalink] [raw]
Subject: [PATCH 08/10] printk: Detect scheduler messages in vprintk_format_and_analyze()

The special hack for messages printed by the scheduler belongs to
the new vprintk_format_and_analyze().

This patch just shuffles the code. There is no change in
the functionality.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7d009144f97f..6e53b6f60ca3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1747,15 +1747,16 @@ static int vprintk_delayed_warnings(void)
* @dict: pointer to the message dictionary
* @facility: syslog facility
* @level: syslog level; might be replaced by the one found in the fomatted text
+ * @in_sched: is set to true when the message is printed by the scheduler
* @ftext: pointer to the formatted text after the syslog prefix
* @ftext_len: length of the formatted text without the syslog prefix
*
* This function modifies the global textbuf and therefore it must be called
* under lockbuf_lock!
*/
-static enum log_flags vprinkt_format_and_analyze(const char *fmt, va_list args,
+static enum log_flags vprintk_format_and_analyze(const char *fmt, va_list args,
const char *dict, int facility,
- int *level,
+ int *level, bool *in_sched,
char **ftext,
size_t *ftext_len)
{
@@ -1775,6 +1776,15 @@ static enum log_flags vprinkt_format_and_analyze(const char *fmt, va_list args,
lflags |= LOG_NEWLINE;
}

+ /*
+ * Messages printed by the scheduler must not wake up the console.
+ * They are detected via a special @level, see printk_deferred().
+ */
+ if (*level == LOGLEVEL_SCHED) {
+ *level = LOGLEVEL_DEFAULT;
+ *in_sched = true;
+ }
+
/* strip kernel syslog prefix and extract log level or control flags */
if (facility == 0) {
int kern_level = printk_get_level(text);
@@ -1824,11 +1834,6 @@ asmlinkage int vprintk_emit(int facility, int level,
int printed_len = 0;
bool in_sched = false;

- if (level == LOGLEVEL_SCHED) {
- level = LOGLEVEL_DEFAULT;
- in_sched = true;
- }
-
boot_delay_msec(level);
printk_delay();

@@ -1880,8 +1885,8 @@ asmlinkage int vprintk_emit(int facility, int level,

printed_len += vprintk_delayed_warnings();

- lflags = vprinkt_format_and_analyze(fmt, args, dict, facility,
- &level,
+ lflags = vprintk_format_and_analyze(fmt, args, dict, facility,
+ &level, &in_sched,
&text, &text_len);

if (!(lflags & LOG_NEWLINE)) {
--
1.8.5.6

2015-05-25 12:47:13

by Petr Mladek

[permalink] [raw]
Subject: [PATCH 09/10] printk: Split text storing logic from vprintk_emit()

vprintk_emit() is too long. Let's split the code that stores the actual
text buffer with respect to the cont buffer and the detected log flags.

This patch just shuffles the code. There is no change in
the functionality.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 96 ++++++++++++++++++++++++++++++--------------------
1 file changed, 57 insertions(+), 39 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 6e53b6f60ca3..c0b97653987a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1822,6 +1822,61 @@ static enum log_flags vprintk_format_and_analyze(const char *fmt, va_list args,
return lflags;
}

+/*
+ * This function stores the given text into the log buffer with respect
+ * to the cont buffer and the text flags.
+ *
+ * It must be called under lockbuf_lock!
+ */
+static int vprintk_store_text(int facility, int level, enum log_flags lflags,
+ const char *dict, size_t dictlen,
+ const char *text, size_t text_len)
+{
+ int printed_len = 0;
+
+ if (!(lflags & LOG_NEWLINE)) {
+ /*
+ * Flush the conflicting buffer. An earlier newline was missing,
+ * or another task also prints continuation lines.
+ */
+ if (cont.len && (lflags & LOG_PREFIX || cont.owner != current))
+ cont_flush(LOG_NEWLINE);
+
+ /* buffer line if possible, otherwise store it right away */
+ if (cont_add(facility, level, text, text_len))
+ printed_len += text_len;
+ else
+ printed_len += log_store(facility, level,
+ lflags | LOG_CONT, 0,
+ dict, dictlen, text, text_len);
+ } else {
+ bool stored = false;
+
+ /*
+ * If an earlier newline was missing and it was the same task,
+ * either merge it with the current buffer and flush, or if
+ * there was a race with interrupts (prefix == true) then just
+ * flush it out and store this line separately.
+ * If the preceding printk was from a different task and missed
+ * a newline, flush and append the newline.
+ */
+ if (cont.len) {
+ if (cont.owner == current && !(lflags & LOG_PREFIX))
+ stored = cont_add(facility, level, text,
+ text_len);
+ cont_flush(LOG_NEWLINE);
+ }
+
+ if (stored)
+ printed_len += text_len;
+ else
+ printed_len += log_store(facility, level, lflags, 0,
+ dict, dictlen, text, text_len);
+ }
+
+ return printed_len;
+}
+
asmlinkage int vprintk_emit(int facility, int level,
const char *dict, size_t dictlen,
const char *fmt, va_list args)
@@ -1889,45 +1944,8 @@ asmlinkage int vprintk_emit(int facility, int level,
&level, &in_sched,
&text, &text_len);

- if (!(lflags & LOG_NEWLINE)) {
- /*
- * Flush the conflicting buffer. An earlier newline was missing,
- * or another task also prints continuation lines.
- */
- if (cont.len && (lflags & LOG_PREFIX || cont.owner != current))
- cont_flush(LOG_NEWLINE);
-
- /* buffer line if possible, otherwise store it right away */
- if (cont_add(facility, level, text, text_len))
- printed_len += text_len;
- else
- printed_len += log_store(facility, level,
- lflags | LOG_CONT, 0,
- dict, dictlen, text, text_len);
- } else {
- bool stored = false;
-
- /*
- * If an earlier newline was missing and it was the same task,
- * either merge it with the current buffer and flush, or if
- * there was a race with interrupts (prefix == true) then just
- * flush it out and store this line separately.
- * If the preceding printk was from a different task and missed
- * a newline, flush and append the newline.
- */
- if (cont.len) {
- if (cont.owner == current && !(lflags & LOG_PREFIX))
- stored = cont_add(facility, level, text,
- text_len);
- cont_flush(LOG_NEWLINE);
- }
-
- if (stored)
- printed_len += text_len;
- else
- printed_len += log_store(facility, level, lflags, 0,
- dict, dictlen, text, text_len);
- }
+ printed_len += vprintk_store_text(facility, level, lflags,
+ dict, dictlen, text, text_len);

logbuf_cpu = UINT_MAX;
raw_spin_unlock(&logbuf_lock);
--
1.8.5.6

2015-05-25 12:47:24

by Petr Mladek

[permalink] [raw]
Subject: [PATCH 10/10] printk: Split console call from vprintk_emit()

vprintk_emit() is too long. Let's split the logic about when and how
to call the console into a separate function.

Well, it is not only about console but also about /dev/kmsg and syslog().
Hence the generic name vprintk_poke_loggers() seems to be more appropriate.

This patch just shuffles the code. There is no change in the functionality.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 72 ++++++++++++++++++++++++++++----------------------
1 file changed, 41 insertions(+), 31 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c0b97653987a..fb4af49c2d4e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1877,6 +1877,46 @@ static int vprintk_store_text(int facility, int level, enum log_flags lflags,
return printed_len;
}

+/*
+ * Try to get console, wake up logging services, and flush the last
+ * messages there.
+ */
+static void vprintk_poke_loggers(bool in_sched)
+{
+ /*
+ * If called from the scheduler or NMI context, we can not get console
+ * without a possible deadlock. In this case, we must deffer the task
+ * via IRQ work.
+ *
+ * The only exception are Oops messages from NMI context where all
+ * relevant locks have been forcefully dropped in
+ * try_logbuf_lock_in_nmi(). We have to try to get the console,
+ * otherwise the last messages would get lost.
+ */
+ if (in_sched || (in_nmi() && !oops_in_progress)) {
+ __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+ irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+ } else {
+ lockdep_off();
+ /*
+ * Disable preemption to avoid being preempted while holding
+ * console_sem which would prevent anyone from printing to
+ * console
+ */
+ preempt_disable();
+
+ /*
+ * Try to acquire and then immediately release the console
+ * semaphore. The release will print out buffers and wake up
+ * /dev/kmsg and syslog() users.
+ */
+ if (console_trylock_for_printk())
+ console_unlock();
+ preempt_enable();
+ lockdep_on();
+ }
+}
+
asmlinkage int vprintk_emit(int facility, int level,
const char *dict, size_t dictlen,
const char *fmt, va_list args)
@@ -1952,37 +1992,7 @@ asmlinkage int vprintk_emit(int facility, int level,
lockdep_on();
local_irq_restore(flags);

- /*
- * If called from the scheduler or NMI context, we can not get console
- * without a possible deadlock.
- *
- * The only exception are Oops messages from NMI context where all
- * relevant locks have been forcefully dropped in
- * try_logbuf_lock_in_nmi(). We have to try to get the console,
- * otherwise the last messages would get lost.
- */
- if (in_sched || (in_nmi() && !oops_in_progress)) {
- __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
- irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
- } else {
- lockdep_off();
- /*
- * Disable preemption to avoid being preempted while holding
- * console_sem which would prevent anyone from printing to
- * console
- */
- preempt_disable();
-
- /*
- * Try to acquire and then immediately release the console
- * semaphore. The release will print out buffers and wake up
- * /dev/kmsg and syslog() users.
- */
- if (console_trylock_for_printk())
- console_unlock();
- preempt_enable();
- lockdep_on();
- }
+ vprintk_poke_loggers(in_sched);

return printed_len;
}
--
1.8.5.6

2015-06-01 13:06:27

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup

On Fri 29-05-15 13:50:45, Andrew Morton wrote:
> On Mon, 25 May 2015 14:46:23 +0200 Petr Mladek <[email protected]> wrote:
>
> > The main source of deadlocks caused by printk() in NMI context has been
> > solved by the commit a9edc88093287 ("x86/nmi: Perform a safe NMI stack
> > trace on all CPUs").
> >
> > But there are still few warnings printed in the NMI code that could
> > case a deadlock. For example, see the freeze discussed at
> > https://lkml.org/lkml/2015/5/20/481
>
> I'm not (yet) convinced that we want the entire patchset btw. Do we
> really want to try to semi-support printk from NMI? With a rather
> nasty set of hacks?
>
> Why not just delete the offending printks?
And what about WARN_ONs and BUG_ONs? Delete as well? Or just don't print
anything when we are in NMI? I agree that NMI is so problematic context
that restricting printk there makes some sence. OTOH propagating
information from NMI to user is useful as well so I'm somewhat undecided.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-06-02 09:49:04

by Wang Long

[permalink] [raw]
Subject: Re: [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup

On 2015/6/1 21:06, Jan Kara wrote:
> On Fri 29-05-15 13:50:45, Andrew Morton wrote:
>> On Mon, 25 May 2015 14:46:23 +0200 Petr Mladek <[email protected]> wrote:
>>
>>> The main source of deadlocks caused by printk() in NMI context has been
>>> solved by the commit a9edc88093287 ("x86/nmi: Perform a safe NMI stack
>>> trace on all CPUs").
>>>
>>> But there are still few warnings printed in the NMI code that could
>>> case a deadlock. For example, see the freeze discussed at
>>> https://lkml.org/lkml/2015/5/20/481
>>
>> I'm not (yet) convinced that we want the entire patchset btw. Do we
>> really want to try to semi-support printk from NMI? With a rather
>> nasty set of hacks?
>>
>> Why not just delete the offending printks?
> And what about WARN_ONs and BUG_ONs? Delete as well? Or just don't print
> anything when we are in NMI? I agree that NMI is so problematic context
> that restricting printk there makes some sence. OTOH propagating
> information from NMI to user is useful as well so I'm somewhat undecided.
>
> Honza
>

I think that delete all printks in NMI context is not a good solution.
because some information is very useful to user.

The commit a9edc88093287 ("x86/nmi: Perform a safe NMI stack trace on all CPUs")
solved the deadlock problem only in arch_trigger_all_cpu_backtrace_handler() by
replacing the printk to a special print function (nmi_vprintk). But all the
other NMI handlers still use the original printk.

How about replacing printk function earlier? we can replace printk function
before we calling default_do_nmi(arch/x86/kernel/nmi.c) and replace back
after calling.

Is it a feasible solution? or does it introduce other problems?

Best Regards
Wang Long

2015-06-02 09:53:08

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH 00/10] printk: Avoid deadlock in NMI + vprintk_emit() cleanup

On Tue, 2 Jun 2015, long.wanglong wrote:

> How about replacing printk function earlier? we can replace printk
> function before we calling default_do_nmi(arch/x86/kernel/nmi.c) and
> replace back after calling.
>
> Is it a feasible solution? or does it introduce other problems?

This has been already discussed in the past when we were fixing the
NMI-based stackdumping. See

http://marc.info/?l=linux-kernel&m=141640218702588&w=2

--
Jiri Kosina
SUSE Labs