Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751297AbbEYMqz (ORCPT ); Mon, 25 May 2015 08:46:55 -0400 Received: from cantor2.suse.de ([195.135.220.15]:53550 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750771AbbEYMqu (ORCPT ); Mon, 25 May 2015 08:46:50 -0400 From: Petr Mladek To: Andrew Morton Cc: Frederic Weisbecker , Steven Rostedt , Dave Anderson , "Paul E. McKenney" , Kay Sievers , Jiri Kosina , Michal Hocko , Jan Kara , linux-kernel@vger.kernel.org, Wang Long , peifeiyue@huawei.com, dzickus@redhat.com, morgan.wang@huawei.com, sasha.levin@oracle.com, Petr Mladek Subject: [PATCH 01/10] printk: Avoid deadlock in NMI context Date: Mon, 25 May 2015 14:46:24 +0200 Message-Id: <1432557993-20458-2-git-send-email-pmladek@suse.cz> X-Mailer: git-send-email 1.8.5.6 In-Reply-To: <1432557993-20458-1-git-send-email-pmladek@suse.cz> References: <1432557993-20458-1-git-send-email-pmladek@suse.cz> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4425 Lines: 124 printk() cannot be used in NMI context safely because it uses an internal lock and thus could cause a deadlock. This is fine because NMI context is very special. The handlers should be short, effective, and do not use printk(). But developers tend to print warnings even from NMI code. They are pretty hard to debug when they lockup the machine and nothing appears in the logs. This patch prevents from the deadlock on logbuf_lock by using trylock rather than spin_lock. If the lock can not be taken, the message is ignored and some warning is printed later on. We also must not try to get console from NMI context. It needs even more locks and there is even higher chance to hung up. Unfortunately, we could not print more details about the lost message. We could not alloc a buffer in NMI. Therefore we would need some lockless mechanism to share a buffer between NMI and normal context. But this would make printk() code much more complicated and it is not worth it. There has already been an attempt to do so and it has been rejected, see https://lkml.org/lkml/2014/6/10/388 This is also the reason why we use the atomic counter. Hint: If anybody wants to know what NMI message is being lost, trace_dump_stack(0) can be called when try_lock() fails. Then trace_printk() could be used on that place. This is only the basic logic. There will be some improvements in the followup patches. Signed-off-by: Petr Mladek --- kernel/printk/printk.c | 42 ++++++++++++++++++++++++++++++++++++++---- 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index c099b082cd02..94fcf6f0b542 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1615,6 +1615,7 @@ asmlinkage int vprintk_emit(int facility, int level, const char *fmt, va_list args) { static int recursion_bug; + static atomic_t nmi_message_lost; static char textbuf[LOG_LINE_MAX]; char *text = textbuf; size_t text_len = 0; @@ -1641,7 +1642,7 @@ asmlinkage int vprintk_emit(int facility, int level, /* * Ouch, printk recursed into itself! */ - if (unlikely(logbuf_cpu == this_cpu)) { + if (unlikely(logbuf_cpu == this_cpu) && !in_nmi()) { /* * If a crash is occurring during printk() on this CPU, * then try to get the crash message out but make sure @@ -1658,7 +1659,26 @@ asmlinkage int vprintk_emit(int facility, int level, } lockdep_off(); - raw_spin_lock(&logbuf_lock); + + /* + * printk() should not be used in NMI context because the interrupt + * might came when lockbuf_lock is taken. Such situation is hard + * to debug. Therefore, we try to avoid the deadlock, use trylock + * in NMI context, and later warn about lost messages. + * + * Hint: If you get into troubles, use trace_dump_stack(0) + * to get location of the lost message and use trace_printk() + * there. + */ + if (!in_nmi()) { + raw_spin_lock(&logbuf_lock); + } else if (!raw_spin_trylock(&logbuf_lock)) { + atomic_inc(&nmi_message_lost); + lockdep_on(); + local_irq_restore(flags); + return 0; + } + logbuf_cpu = this_cpu; if (unlikely(recursion_bug)) { @@ -1672,6 +1692,17 @@ asmlinkage int vprintk_emit(int facility, int level, strlen(recursion_msg)); } + if (unlikely(atomic_read(&nmi_message_lost))) { + int lost = atomic_xchg(&nmi_message_lost, 0); + + text_len = scnprintf(text, sizeof(textbuf), + "BAD LUCK: lost %d message(s) from NMI context!", + lost); + /* emit KERN_CRIT message */ + printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0, + NULL, 0, text, text_len); + } + /* * The printf needs to come first; we need the syslog * prefix which might be passed-in as a parameter. @@ -1759,8 +1790,11 @@ asmlinkage int vprintk_emit(int facility, int level, lockdep_on(); local_irq_restore(flags); - /* If called from the scheduler, we can not call up(). */ - if (!in_sched) { + /* + * If called from the scheduler or NMI context, we can not get console + * without a possible deadlock. + */ + if (!in_sched && !in_nmi()) { lockdep_off(); /* * Disable preemption to avoid being preempted while holding -- 1.8.5.6 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/