From: Petr Mladek <pmladek@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
        Steven Rostedt <rostedt@goodmis.org>,
        Dave Anderson <anderson@redhat.com>,
        "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
        Kay Sievers <kay@vrfy.org>, Jiri Kosina <jkosina@suse.cz>,
        Michal Hocko <mhocko@suse.cz>, Jan Kara <jack@suse.cz>,
        linux-kernel@vger.kernel.org, Wang Long <long.wanglong@huawei.com>,
        peifeiyue@huawei.com, dzickus@redhat.com, morgan.wang@huawei.com,
        sasha.levin@oracle.com, Petr Mladek <pmladek@suse.cz>
Subject: [PATCH 01/10] printk: Avoid deadlock in NMI context
Date: Mon, 25 May 2015 14:46:24 +0200
Message-Id: <1432557993-20458-2-git-send-email-pmladek@suse.cz>
In-Reply-To: <1432557993-20458-1-git-send-email-pmladek@suse.cz>
References: <1432557993-20458-1-git-send-email-pmladek@suse.cz>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4425
Lines: 124

printk() cannot be used in NMI context safely because it uses an internal
lock and thus could cause a deadlock. This is fine because NMI context
is very special. The handlers should be short, effective, and do not
use printk().

But developers tend to print warnings even from NMI code. They are pretty
hard to debug when they lockup the machine and nothing appears in the logs.

This patch prevents from the deadlock on logbuf_lock by using trylock
rather than spin_lock. If the lock can not be taken, the message is
ignored and some warning is printed later on.

We also must not try to get console from NMI context. It needs
even more locks and there is even higher chance to hung up.

Unfortunately, we could not print more details about the lost message.
We could not alloc a buffer in NMI. Therefore we would need some
lockless mechanism to share a buffer between NMI and normal context.
But this would make printk() code much more complicated and
it is not worth it. There has already been an attempt to do so
and it has been rejected, see https://lkml.org/lkml/2014/6/10/388
This is also the reason why we use the atomic counter.

Hint: If anybody wants to know what NMI message is being lost,
trace_dump_stack(0) can be called when try_lock() fails. Then
trace_printk() could be used on that place.

This is only the basic logic. There will be some improvements in
the followup patches.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
---
 kernel/printk/printk.c | 42 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c099b082cd02..94fcf6f0b542 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1615,6 +1615,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const char *fmt, va_list args)
 {
 	static int recursion_bug;
+	static atomic_t nmi_message_lost;
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
 	size_t text_len = 0;
@@ -1641,7 +1642,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	/*
 	 * Ouch, printk recursed into itself!
 	 */
-	if (unlikely(logbuf_cpu == this_cpu)) {
+	if (unlikely(logbuf_cpu == this_cpu) && !in_nmi()) {
 		/*
 		 * If a crash is occurring during printk() on this CPU,
 		 * then try to get the crash message out but make sure
@@ -1658,7 +1659,26 @@ asmlinkage int vprintk_emit(int facility, int level,
 	}
 
 	lockdep_off();
-	raw_spin_lock(&logbuf_lock);
+
+	/*
+	 * printk() should not be used in NMI context because the interrupt
+	 * might came when lockbuf_lock is taken. Such situation is hard
+	 * to debug. Therefore, we try to avoid the deadlock, use trylock
+	 * in NMI context, and later warn about lost messages.
+	 *
+	 * Hint: If you get into troubles, use trace_dump_stack(0)
+	 * to get location of the lost message and use trace_printk()
+	 * there.
+	 */
+	if (!in_nmi()) {
+		raw_spin_lock(&logbuf_lock);
+	} else if (!raw_spin_trylock(&logbuf_lock)) {
+		atomic_inc(&nmi_message_lost);
+		lockdep_on();
+		local_irq_restore(flags);
+		return 0;
+	}
+
 	logbuf_cpu = this_cpu;
 
 	if (unlikely(recursion_bug)) {
@@ -1672,6 +1692,17 @@ asmlinkage int vprintk_emit(int facility, int level,
 					 strlen(recursion_msg));
 	}
 
+	if (unlikely(atomic_read(&nmi_message_lost))) {
+		int lost = atomic_xchg(&nmi_message_lost, 0);
+
+		text_len = scnprintf(text, sizeof(textbuf),
+				     "BAD LUCK: lost %d message(s) from NMI context!",
+				     lost);
+		/* emit KERN_CRIT message */
+		printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+					 NULL, 0, text, text_len);
+	}
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
@@ -1759,8 +1790,11 @@ asmlinkage int vprintk_emit(int facility, int level,
 	lockdep_on();
 	local_irq_restore(flags);
 
-	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	/*
+	 * If called from the scheduler or NMI context, we can not get console
+	 * without a possible deadlock.
+	 */
+	if (!in_sched && !in_nmi()) {
 		lockdep_off();
 		/*
 		 * Disable preemption to avoid being preempted while holding
-- 
1.8.5.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/