2015-12-09 13:21:32

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v3 0/4] Cleaning printk stuff in NMI context

This patch set generalizes the already existing solution for
printing NMI messages. The main idea comes from Peter Zijlstra.

Changes against v2:

+ fixed compilation problems reported by 0-day build robot

+ MN10300 and Xtensa architectures will get handled separately

+ dropped the patch that printed NMI messages directly when Oops
in progress; it made the solution less reliable

+ made the size of the buffer configurable; use real numbers
instead of PAGE_SIZE


Changes against v1:

+ rebased on top of 4.4-rc2; there the old implementation was
moved to lib/nmi_backtrace.c and used also on arm; I hope that
I got the arm side correctly; I was not able to test on arm :-(

+ defined HAVE_NMI on arm for !CPU_V7M instead of !CPU_V7;
handle_fiq_as_nmi() is called from entry-armv.S that
is compiled when !CPU_V7M

+ defined HAVE_NMI also on mips; it calls nmi_enter() and
seems to have real NMIs (or am I wrong?)

+ serialized backtraces when printing directly
(oops_in_progress)


Petr Mladek (4):
printk/nmi: Generic solution for safe printk in NMI
printk/nmi: Use IRQ work only when ready
printk/nmi: Warn when some message has been lost in NMI context
printk/nmi: Increase the size of NMI buffer and make it configurable

arch/Kconfig | 7 ++
arch/arm/Kconfig | 2 +
arch/arm/kernel/smp.c | 2 +
arch/avr32/Kconfig | 1 +
arch/blackfin/Kconfig | 1 +
arch/cris/Kconfig | 1 +
arch/mips/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/s390/Kconfig | 1 +
arch/sh/Kconfig | 1 +
arch/sparc/Kconfig | 1 +
arch/tile/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/kernel/apic/hw_nmi.c | 1 -
include/linux/hardirq.h | 2 +
include/linux/percpu.h | 3 -
include/linux/printk.h | 12 ++-
init/Kconfig | 27 ++++++
init/main.c | 1 +
kernel/printk/Makefile | 1 +
kernel/printk/nmi.c | 218 ++++++++++++++++++++++++++++++++++++++++++
kernel/printk/printk.c | 29 +++---
kernel/printk/printk.h | 55 +++++++++++
lib/nmi_backtrace.c | 89 +----------------
24 files changed, 352 insertions(+), 107 deletions(-)
create mode 100644 kernel/printk/nmi.c
create mode 100644 kernel/printk/printk.h

--
1.8.5.6


2015-12-09 13:22:47

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v3 1/4] printk/nmi: Generic solution for safe printk in NMI

printk() takes some locks and could not be used a safe way in NMI
context.

The chance of a deadlock is real especially when printing
stacks from all CPUs. This particular problem has been addressed
on x86 by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack
trace on all CPUs").

This patch reuses most of the code and makes it generic. It is
useful for all messages and architectures that support NMI.

The alternative printk_func is set when entering and is reseted when
leaving NMI context. It queues IRQ work to copy the messages into
the main ring buffer in a safe context.

__printk_nmi_flush() copies all available messages and reset
the buffer. Then we could use a simple cmpxchg operations to
get synchronized with writers. There is also used a spinlock
to get synchronized with other flushers.

We do not longer use seq_buf because it depends on external lock.
It would be hard to make all supported operations safe for
a lockless use. It would be confusing and error prone to
make only some operations safe.

The code is put into separate printk/nmi.c as suggested by
Steven Rostedt. It needs a per-CPU buffer and is compiled only
on architectures that call nmi_enter(). This is achieved by
the new HAVE_NMI Kconfig flag.

One exception is arm where the deferred printing is used for
printing backtraces even without NMI. For this purpose,
we define NEED_PRINTK_NMI Kconfig flag. The alternative
printk_func is explicitly set when IPI_CPU_BACKTRACE is
handled.

The other exceptions are MN10300 and Xtensa architectures.
We need to clean up NMI handling there first. Let's do it
separately.

The patch is heavily based on the draft from Peter Zijlstra,
see https://lkml.org/lkml/2015/6/10/327

Suggested-by: Peter Zijlstra <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
---
arch/Kconfig | 7 ++
arch/arm/Kconfig | 2 +
arch/arm/kernel/smp.c | 2 +
arch/avr32/Kconfig | 1 +
arch/blackfin/Kconfig | 1 +
arch/cris/Kconfig | 1 +
arch/mips/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/s390/Kconfig | 1 +
arch/sh/Kconfig | 1 +
arch/sparc/Kconfig | 1 +
arch/tile/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/kernel/apic/hw_nmi.c | 1 -
include/linux/hardirq.h | 2 +
include/linux/percpu.h | 3 -
include/linux/printk.h | 12 ++-
init/Kconfig | 5 ++
init/main.c | 1 +
kernel/printk/Makefile | 1 +
kernel/printk/nmi.c | 203 ++++++++++++++++++++++++++++++++++++++++++
kernel/printk/printk.c | 19 +---
kernel/printk/printk.h | 44 +++++++++
lib/nmi_backtrace.c | 89 ++----------------
24 files changed, 294 insertions(+), 107 deletions(-)
create mode 100644 kernel/printk/nmi.c
create mode 100644 kernel/printk/printk.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 4e949e58b192..7ce5101c2472 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -187,7 +187,14 @@ config HAVE_OPTPROBES
config HAVE_KPROBES_ON_FTRACE
bool

+config HAVE_NMI
+ bool
+
+config NEED_PRINTK_NMI
+ bool
+
config HAVE_NMI_WATCHDOG
+ depends on HAVE_NMI
bool
#
# An arch should select this if it provides all these things:
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 34e1569a11ee..96d2c275f0f7 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -63,6 +63,8 @@ config ARM
select HAVE_KRETPROBES if (HAVE_KPROBES)
select HAVE_MEMBLOCK
select HAVE_MOD_ARCH_SPECIFIC
+ select HAVE_NMI if (!CPU_V7M)
+ select NEED_PRINTK_NMI if (CPU_V7M)
select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
select HAVE_OPTPROBES if !THUMB2_KERNEL
select HAVE_PERF_EVENTS
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index b26361355dae..a960adb9bd7d 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -648,7 +648,9 @@ void handle_IPI(int ipinr, struct pt_regs *regs)

case IPI_CPU_BACKTRACE:
irq_enter();
+ printk_nmi_enter();
nmi_cpu_backtrace(regs);
+ printk_nmi_exit();
irq_exit();
break;

diff --git a/arch/avr32/Kconfig b/arch/avr32/Kconfig
index b6878eb64884..9dc3e2b1180b 100644
--- a/arch/avr32/Kconfig
+++ b/arch/avr32/Kconfig
@@ -17,6 +17,7 @@ config AVR32
select GENERIC_CLOCKEVENTS
select HAVE_MOD_ARCH_SPECIFIC
select MODULES_USE_ELF_RELA
+ select HAVE_NMI
help
AVR32 is a high-performance 32-bit RISC microprocessor core,
designed for cost-sensitive embedded applications, with particular
diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
index af76634f8d98..47c0a55acafd 100644
--- a/arch/blackfin/Kconfig
+++ b/arch/blackfin/Kconfig
@@ -40,6 +40,7 @@ config BLACKFIN
select HAVE_MOD_ARCH_SPECIFIC
select MODULES_USE_ELF_RELA
select HAVE_DEBUG_STACKOVERFLOW
+ select HAVE_NMI

config GENERIC_CSUM
def_bool y
diff --git a/arch/cris/Kconfig b/arch/cris/Kconfig
index e086f9e93728..62148eaf0189 100644
--- a/arch/cris/Kconfig
+++ b/arch/cris/Kconfig
@@ -69,6 +69,7 @@ config CRIS
select GENERIC_CLOCKEVENTS if ETRAX_ARCH_V32
select GENERIC_SCHED_CLOCK if ETRAX_ARCH_V32
select HAVE_DEBUG_BUGVERBOSE if ETRAX_ARCH_V32
+ select HAVE_NMI

config HZ
int
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 71683a853372..35ae01fe980a 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -63,6 +63,7 @@ config MIPS
select HAVE_IRQ_TIME_ACCOUNTING
select GENERIC_TIME_VSYSCALL
select ARCH_CLOCKSOURCE_DATA
+ select HAVE_NMI

menu "Machine selection"

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index db49e0d796b1..1bd8a7503a48 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -156,6 +156,7 @@ config PPC
select NO_BOOTMEM
select HAVE_GENERIC_RCU_GUP
select HAVE_PERF_EVENTS_NMI if PPC64
+ select HAVE_NMI if PERF_EVENTS
select EDAC_SUPPORT
select EDAC_ATOMIC_SCRUB
select ARCH_HAS_DMA_SET_COHERENT_MASK
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 3a55f493c7da..9a8a76829508 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -159,6 +159,7 @@ config S390
select TTY
select VIRT_CPU_ACCOUNTING
select VIRT_TO_BUS
+ select HAVE_NMI


config SCHED_OMIT_FRAME_POINTER
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index d514df7e04dd..ff8093b574f1 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -44,6 +44,7 @@ config SUPERH
select OLD_SIGSUSPEND
select OLD_SIGACTION
select HAVE_ARCH_AUDITSYSCALL
+ select HAVE_NMI
help
The SuperH is a RISC processor targeted for use in embedded systems
and consumer electronics; it was also used in the Sega Dreamcast
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 56442d2d7bbc..e05ea973fd6a 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -80,6 +80,7 @@ config SPARC64
select NO_BOOTMEM
select HAVE_ARCH_AUDITSYSCALL
select ARCH_SUPPORTS_ATOMIC_RMW
+ select HAVE_NMI

config ARCH_DEFCONFIG
string
diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 106c21bd7f44..050468c41dc6 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -29,6 +29,7 @@ config TILE
select HAVE_DEBUG_STACKOVERFLOW
select ARCH_WANT_FRAME_POINTERS
select HAVE_CONTEXT_TRACKING
+ select HAVE_NMI if USE_PMC
select EDAC_SUPPORT
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index db3622f22b61..e83f99194c4c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -126,6 +126,7 @@ config X86
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_MIXED_BREAKPOINTS_REGS
+ select HAVE_NMI
select HAVE_OPROFILE
select HAVE_OPTPROBES
select HAVE_PCSPKR_PLATFORM
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 045e424fb368..7788ce643bf4 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -18,7 +18,6 @@
#include <linux/nmi.h>
#include <linux/module.h>
#include <linux/delay.h>
-#include <linux/seq_buf.h>

#ifdef CONFIG_HARDLOCKUP_DETECTOR
u64 hw_nmi_get_sample_period(int watchdog_thresh)
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index dfd59d6bc6f0..a477e0766d2e 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -67,10 +67,12 @@ extern void irq_exit(void);
preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
rcu_nmi_enter(); \
trace_hardirq_enter(); \
+ printk_nmi_enter(); \
} while (0)

#define nmi_exit() \
do { \
+ printk_nmi_exit(); \
trace_hardirq_exit(); \
rcu_nmi_exit(); \
BUG_ON(!in_nmi()); \
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index caebf2a758dc..04c68b9f56f8 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -135,7 +135,4 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
(typeof(type) __percpu *)__alloc_percpu(sizeof(type), \
__alignof__(type))

-/* To avoid include hell, as printk can not declare this, we declare it here */
-DECLARE_PER_CPU(printk_func_t, printk_func);
-
#endif /* __LINUX_PERCPU_H */
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 9729565c25ff..7aba6b92d020 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -122,7 +122,17 @@ static inline __printf(1, 2) __cold
void early_printk(const char *s, ...) { }
#endif

-typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
+#ifdef CONFIG_PRINTK_NMI
+extern void printk_nmi_init(void);
+extern void printk_nmi_enter(void);
+extern void printk_nmi_exit(void);
+extern void printk_nmi_flush(void);
+#else
+static inline void printk_nmi_init(void) { }
+static inline void printk_nmi_enter(void) { }
+static inline void printk_nmi_exit(void) { }
+static inline void printk_nmi_flush(void) { }
+#endif /* PRINTK_NMI */

#ifdef CONFIG_PRINTK
asmlinkage __printf(5, 0)
diff --git a/init/Kconfig b/init/Kconfig
index c24b6f767bf0..c1c0b6a2d712 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1456,6 +1456,11 @@ config PRINTK
very difficult to diagnose system problems, saying N here is
strongly discouraged.

+config PRINTK_NMI
+ def_bool y
+ depends on PRINTK
+ depends on HAVE_NMI || NEED_PRINTK_NMI
+
config BUG
bool "BUG() support" if EXPERT
default y
diff --git a/init/main.c b/init/main.c
index 9e64d7097f1a..45c1e05125dc 100644
--- a/init/main.c
+++ b/init/main.c
@@ -591,6 +591,7 @@ asmlinkage __visible void __init start_kernel(void)
timekeeping_init();
time_init();
sched_clock_postinit();
+ printk_nmi_init();
perf_event_init();
profile_init();
call_function_init();
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 85405bdcf2b3..abb0042a427b 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,2 +1,3 @@
obj-y = printk.o
+obj-$(CONFIG_PRINTK_NMI) += nmi.o
obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
new file mode 100644
index 000000000000..7b5ff81b291f
--- /dev/null
+++ b/kernel/printk/nmi.c
@@ -0,0 +1,203 @@
+/*
+ * nmi.c - Safe printk in NMI context
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/preempt.h>
+#include <linux/spinlock.h>
+#include <linux/smp.h>
+#include <linux/cpumask.h>
+#include <linux/irq_work.h>
+#include <linux/printk.h>
+
+#include "printk.h"
+
+/*
+ * printk() could not take logbuf_lock in NMI context. Instead,
+ * it uses an alternative implementation that temporary stores
+ * the strings into a per-CPU buffer. The content of the buffer
+ * is later flushed into the main ring buffer via IRQ work.
+ *
+ * The alternative implementation is chosen transparently
+ * via @printk_func per-CPU variable.
+ *
+ * The implementation allows to flush the strings also from another CPU.
+ * There are situations when we want to make sure that all buffers
+ * were handled or when IRQs are blocked.
+ */
+DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
+
+#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
+
+struct nmi_seq_buf {
+ atomic_t len; /* length of written data */
+ struct irq_work work; /* IRQ work that flushes the buffer */
+ unsigned char buffer[NMI_LOG_BUF_LEN];
+};
+static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
+
+/*
+ * Safe printk() for NMI context. It uses a per-CPU buffer to
+ * store the message. NMIs are not nested, so there is always only
+ * one writer running. But the buffer might get flushed from another
+ * CPU, so we need to be careful.
+ */
+static int vprintk_nmi(const char *fmt, va_list args)
+{
+ struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
+ int add = 0, len;
+
+again:
+ len = atomic_read(&s->len);
+
+ if (len >= sizeof(s->buffer))
+ return 0;
+
+ /*
+ * Make sure that all old data have been read before the buffer was
+ * reseted. This is not needed when we just append data.
+ */
+ if (!len)
+ smp_rmb();
+
+ add = vsnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, args);
+
+ /*
+ * Do it once again if the buffer has been flushed in the meantime.
+ * Note that atomic_cmpxchg() is an implicit memory barrier that
+ * makes sure that the data were written before updating s->len.
+ */
+ if (atomic_cmpxchg(&s->len, len, len + add) != len)
+ goto again;
+
+ /* Get flushed in a more safe context. */
+ if (add)
+ irq_work_queue(&s->work);
+
+ return add;
+}
+
+/*
+ * printk one line from the temporary buffer from @start index until
+ * and including the @end index.
+ */
+static void print_nmi_seq_line(struct nmi_seq_buf *s, int start, int end)
+{
+ const char *buf = s->buffer + start;
+
+ printk("%.*s", (end - start) + 1, buf);
+}
+
+/*
+ * Flush data from the associated per_CPU buffer. The function
+ * can be called either via IRQ work or independently.
+ */
+static void __printk_nmi_flush(struct irq_work *work)
+{
+ static raw_spinlock_t read_lock =
+ __RAW_SPIN_LOCK_INITIALIZER(read_lock);
+ struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
+ int len, size, i, last_i;
+
+ /*
+ * The lock has two functions. First, one reader has to flush all
+ * available message to make the lockless synchronization with
+ * writers easier. Second, we do not want to mix messages from
+ * different CPUs. This is especially important when printing
+ * a backtrace.
+ */
+ raw_spin_lock(&read_lock);
+
+ i = 0;
+more:
+ len = atomic_read(&s->len);
+
+ /*
+ * This is just a paranoid check that nobody has manipulated
+ * the buffer an unexpected way. If we printed something then
+ * @len must only increase.
+ */
+ WARN_ON(i && i >= len);
+
+ if (!len)
+ goto out; /* Someone else has already flushed the buffer. */
+
+ /* Make sure that data has been written up to the @len */
+ smp_rmb();
+
+ size = min_t(int, len, sizeof(s->buffer));
+ last_i = i;
+
+ /* Print line by line. */
+ for (; i < size; i++) {
+ if (s->buffer[i] == '\n') {
+ print_nmi_seq_line(s, last_i, i);
+ last_i = i + 1;
+ }
+ }
+ /* Check if there was a partial line. */
+ if (last_i < size) {
+ print_nmi_seq_line(s, last_i, size - 1);
+ pr_cont("\n");
+ }
+
+ /*
+ * Check that nothing has got added in the meantime and truncate
+ * the buffer. Note that atomic_cmpxchg() is an implicit memory
+ * barrier that makes sure that the data were copied before
+ * updating s->len.
+ */
+ if (atomic_cmpxchg(&s->len, len, 0) != len)
+ goto more;
+
+out:
+ raw_spin_unlock(&read_lock);
+}
+
+/**
+ * printk_nmi_flush - flush all per-cpu nmi buffers.
+ *
+ * The buffers are flushed automatically via IRQ work. This function
+ * is useful only when someone wants to be sure that all buffers have
+ * been flushed at some point.
+ */
+void printk_nmi_flush(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ __printk_nmi_flush(&per_cpu(nmi_print_seq, cpu).work);
+}
+
+void __init printk_nmi_init(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct nmi_seq_buf *s = &per_cpu(nmi_print_seq, cpu);
+
+ init_irq_work(&s->work, __printk_nmi_flush);
+ }
+}
+
+void printk_nmi_enter(void)
+{
+ this_cpu_write(printk_func, vprintk_nmi);
+}
+
+void printk_nmi_exit(void)
+{
+ this_cpu_write(printk_func, vprintk_default);
+}
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2ce8826f1053..88641c74163d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -54,6 +54,7 @@

#include "console_cmdline.h"
#include "braille.h"
+#include "printk.h"

int console_printk[4] = {
CONSOLE_LOGLEVEL_DEFAULT, /* console_loglevel */
@@ -1867,14 +1868,6 @@ int vprintk_default(const char *fmt, va_list args)
}
EXPORT_SYMBOL_GPL(vprintk_default);

-/*
- * This allows printk to be diverted to another function per cpu.
- * This is useful for calling printk functions from within NMI
- * without worrying about race conditions that can lock up the
- * box.
- */
-DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
-
/**
* printk - print a kernel message
* @fmt: format string
@@ -1898,21 +1891,11 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
*/
asmlinkage __visible int printk(const char *fmt, ...)
{
- printk_func_t vprintk_func;
va_list args;
int r;

va_start(args, fmt);
-
- /*
- * If a caller overrides the per_cpu printk_func, then it needs
- * to disable preemption when calling printk(). Otherwise
- * the printk_func should be set to the default. No need to
- * disable preemption here.
- */
- vprintk_func = this_cpu_read(printk_func);
r = vprintk_func(fmt, args);
-
va_end(args);

return r;
diff --git a/kernel/printk/printk.h b/kernel/printk/printk.h
new file mode 100644
index 000000000000..dc97c1c4d53b
--- /dev/null
+++ b/kernel/printk/printk.h
@@ -0,0 +1,44 @@
+/*
+ * printk.h - printk internal definitions
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/percpu.h>
+
+typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
+
+int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
+
+#ifdef CONFIG_PRINTK_NMI
+
+/*
+ * printk() could not take logbuf_lock in NMI context. Instead,
+ * it temporary stores the strings into a per-CPU buffer.
+ * The alternative implementation is chosen transparently
+ * via per-CPU variable.
+ */
+DECLARE_PER_CPU(printk_func_t, printk_func);
+static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+{
+ return this_cpu_read(printk_func)(fmt, args);
+}
+
+#else /* CONFIG_PRINTK_NMI */
+
+static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+{
+ return vprintk_default(fmt, args);
+}
+
+#endif /* CONFIG_PRINTK_NMI */
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 6019c53c669e..26caf51cc238 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -16,33 +16,14 @@
#include <linux/delay.h>
#include <linux/kprobes.h>
#include <linux/nmi.h>
-#include <linux/seq_buf.h>

#ifdef arch_trigger_all_cpu_backtrace
/* For reliability, we're prepared to waste bits here. */
static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
-static cpumask_t printtrace_mask;
-
-#define NMI_BUF_SIZE 4096
-
-struct nmi_seq_buf {
- unsigned char buffer[NMI_BUF_SIZE];
- struct seq_buf seq;
-};
-
-/* Safe printing in NMI context */
-static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);

/* "in progress" flag of arch_trigger_all_cpu_backtrace */
static unsigned long backtrace_flag;

-static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
-{
- const char *buf = s->buffer + start;
-
- printk("%.*s", (end - start) + 1, buf);
-}
-
/*
* When raise() is called it will be is passed a pointer to the
* backtrace_mask. Architectures that call nmi_cpu_backtrace()
@@ -52,8 +33,7 @@ static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
void nmi_trigger_all_cpu_backtrace(bool include_self,
void (*raise)(cpumask_t *mask))
{
- struct nmi_seq_buf *s;
- int i, cpu, this_cpu = get_cpu();
+ int i, this_cpu = get_cpu();

if (test_and_set_bit(0, &backtrace_flag)) {
/*
@@ -68,17 +48,6 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
if (!include_self)
cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));

- cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));
-
- /*
- * Set up per_cpu seq_buf buffers that the NMIs running on the other
- * CPUs will write to.
- */
- for_each_cpu(cpu, to_cpumask(backtrace_mask)) {
- s = &per_cpu(nmi_print_seq, cpu);
- seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
- }
-
if (!cpumask_empty(to_cpumask(backtrace_mask))) {
pr_info("Sending NMI to %s CPUs:\n",
(include_self ? "all" : "other"));
@@ -94,73 +63,25 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
}

/*
- * Now that all the NMIs have triggered, we can dump out their
- * back traces safely to the console.
+ * Force flush any remote buffers that might be stuck in IRQ context
+ * and therefore could not run their irq_work.
*/
- for_each_cpu(cpu, &printtrace_mask) {
- int len, last_i = 0;
+ printk_nmi_flush();

- s = &per_cpu(nmi_print_seq, cpu);
- len = seq_buf_used(&s->seq);
- if (!len)
- continue;
-
- /* Print line by line. */
- for (i = 0; i < len; i++) {
- if (s->buffer[i] == '\n') {
- print_seq_line(s, last_i, i);
- last_i = i + 1;
- }
- }
- /* Check if there was a partial line. */
- if (last_i < len) {
- print_seq_line(s, last_i, len - 1);
- pr_cont("\n");
- }
- }
-
- clear_bit(0, &backtrace_flag);
- smp_mb__after_atomic();
+ clear_bit_unlock(0, &backtrace_flag);
put_cpu();
}

-/*
- * It is not safe to call printk() directly from NMI handlers.
- * It may be fine if the NMI detected a lock up and we have no choice
- * but to do so, but doing a NMI on all other CPUs to get a back trace
- * can be done with a sysrq-l. We don't want that to lock up, which
- * can happen if the NMI interrupts a printk in progress.
- *
- * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
- * the content into a per cpu seq_buf buffer. Then when the NMIs are
- * all done, we can safely dump the contents of the seq_buf to a printk()
- * from a non NMI context.
- */
-static int nmi_vprintk(const char *fmt, va_list args)
-{
- struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
- unsigned int len = seq_buf_used(&s->seq);
-
- seq_buf_vprintf(&s->seq, fmt, args);
- return seq_buf_used(&s->seq) - len;
-}
-
bool nmi_cpu_backtrace(struct pt_regs *regs)
{
int cpu = smp_processor_id();

if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
- printk_func_t printk_func_save = this_cpu_read(printk_func);
-
- /* Replace printk to write into the NMI seq */
- this_cpu_write(printk_func, nmi_vprintk);
pr_warn("NMI backtrace for cpu %d\n", cpu);
if (regs)
show_regs(regs);
else
dump_stack();
- this_cpu_write(printk_func, printk_func_save);
-
cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
return true;
}
--
1.8.5.6

2015-12-09 13:21:38

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v3 2/4] printk/nmi: Use IRQ work only when ready

NMIs could happen at any time. This patch makes sure that the safe
printk() in NMI will schedule IRQ work only when the related
structs are initialized.

All pending messages are flushed when the IRQ work is being
initialized.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/nmi.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 7b5ff81b291f..29a498586465 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -38,6 +38,7 @@
* were handled or when IRQs are blocked.
*/
DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
+static int printk_nmi_irq_ready;

#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))

@@ -83,8 +84,11 @@ again:
goto again;

/* Get flushed in a more safe context. */
- if (add)
+ if (add && printk_nmi_irq_ready) {
+ /* Make sure that IRQ work is really initialized. */
+ smp_rmb();
irq_work_queue(&s->work);
+ }

return add;
}
@@ -190,6 +194,13 @@ void __init printk_nmi_init(void)

init_irq_work(&s->work, __printk_nmi_flush);
}
+
+ /* Make sure that IRQ works are initialized before enabling. */
+ smp_wmb();
+ printk_nmi_irq_ready = 1;
+
+ /* Flush pending messages that did not have scheduled IRQ works. */
+ printk_nmi_flush();
}

void printk_nmi_enter(void)
--
1.8.5.6

2015-12-09 13:22:03

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v3 3/4] printk/nmi: Warn when some message has been lost in NMI context

We could not resize the temporary buffer in NMI context. Let's warn
if a message is lost.

This is rather theoretical. printk() should not be used in NMI.
The only sensible use is when we want to print backtrace from all
CPUs. The current buffer should be enough for this purpose.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/nmi.c | 5 ++++-
kernel/printk/printk.c | 10 ++++++++++
kernel/printk/printk.h | 11 +++++++++++
3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 29a498586465..5465230b75ec 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -39,6 +39,7 @@
*/
DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
static int printk_nmi_irq_ready;
+atomic_t nmi_message_lost;

#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))

@@ -63,8 +64,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
again:
len = atomic_read(&s->len);

- if (len >= sizeof(s->buffer))
+ if (len >= sizeof(s->buffer)) {
+ atomic_inc(&nmi_message_lost);
return 0;
+ }

/*
* Make sure that all old data have been read before the buffer was
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 88641c74163d..a6bb0b0838f3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1669,6 +1669,7 @@ asmlinkage int vprintk_emit(int facility, int level,
unsigned long flags;
int this_cpu;
int printed_len = 0;
+ int nmi_message_lost;
bool in_sched = false;
/* cpu currently holding logbuf_lock in this function */
static unsigned int logbuf_cpu = UINT_MAX;
@@ -1719,6 +1720,15 @@ asmlinkage int vprintk_emit(int facility, int level,
strlen(recursion_msg));
}

+ nmi_message_lost = get_nmi_message_lost();
+ if (unlikely(nmi_message_lost)) {
+ text_len = scnprintf(textbuf, sizeof(textbuf),
+ "BAD LUCK: lost %d message(s) from NMI context!",
+ nmi_message_lost);
+ printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+ NULL, 0, textbuf, text_len);
+ }
+
/*
* The printf needs to come first; we need the syslog
* prefix which might be passed-in as a parameter.
diff --git a/kernel/printk/printk.h b/kernel/printk/printk.h
index dc97c1c4d53b..7f778b609492 100644
--- a/kernel/printk/printk.h
+++ b/kernel/printk/printk.h
@@ -34,6 +34,12 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
return this_cpu_read(printk_func)(fmt, args);
}

+extern atomic_t nmi_message_lost;
+static inline int get_nmi_message_lost(void)
+{
+ return atomic_xchg(&nmi_message_lost, 0);
+}
+
#else /* CONFIG_PRINTK_NMI */

static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
@@ -41,4 +47,9 @@ static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
return vprintk_default(fmt, args);
}

+static inline int get_nmi_message_lost(void)
+{
+ return 0;
+}
+
#endif /* CONFIG_PRINTK_NMI */
--
1.8.5.6

2015-12-09 13:21:41

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

Testing has shown that the backtrace sometimes does not fit
into the 4kB temporary buffer that is used in NMI context.
The warnings are gone when I double the temporary buffer size.

This patch doubles the buffer size and makes it configurable.

Note that this problem existed even in the x86-specific
implementation that was added by the commit a9edc8809328
("x86/nmi: Perform a safe NMI stack trace on all CPUs").
Nobody noticed it because it did not print any warnings.

Signed-off-by: Petr Mladek <[email protected]>
---
init/Kconfig | 22 ++++++++++++++++++++++
kernel/printk/nmi.c | 3 ++-
2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index c1c0b6a2d712..efcff25a112d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -866,6 +866,28 @@ config LOG_CPU_MAX_BUF_SHIFT
13 => 8 KB for each CPU
12 => 4 KB for each CPU

+config NMI_LOG_BUF_SHIFT
+ int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
+ range 10 21
+ default 13
+ depends on PRINTK && HAVE_NMI
+ help
+ Select the size of a per-CPU buffer where NMI messages are temporary
+ stored. They are copied to the main log buffer in a safe context
+ to avoid a deadlock. The value defines the size as a power of 2.
+
+ NMI messages are rare and limited. The largest one is when
+ a backtrace is printed. It usually fits into 4KB. Select
+ 8KB if you want to be on the safe side.
+
+ Examples:
+ 17 => 128 KB for each CPU
+ 16 => 64 KB for each CPU
+ 15 => 32 KB for each CPU
+ 14 => 16 KB for each CPU
+ 13 => 8 KB for each CPU
+ 12 => 4 KB for each CPU
+
#
# Architectures with an unreliable sched_clock() should select this:
#
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index 5465230b75ec..78c07d441b4e 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -41,7 +41,8 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
static int printk_nmi_irq_ready;
atomic_t nmi_message_lost;

-#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
+#define NMI_LOG_BUF_LEN ((1 << CONFIG_NMI_LOG_BUF_SHIFT) - \
+ sizeof(atomic_t) - sizeof(struct irq_work))

struct nmi_seq_buf {
atomic_t len; /* length of written data */
--
1.8.5.6

2015-12-09 23:50:12

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] printk/nmi: Generic solution for safe printk in NMI

On Wed, 9 Dec 2015 14:21:02 +0100 Petr Mladek <[email protected]> wrote:

> printk() takes some locks and could not be used a safe way in NMI
> context.
>
> The chance of a deadlock is real especially when printing
> stacks from all CPUs. This particular problem has been addressed
> on x86 by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack
> trace on all CPUs").
>
> This patch reuses most of the code and makes it generic. It is
> useful for all messages and architectures that support NMI.
>
> The alternative printk_func is set when entering and is reseted when
> leaving NMI context. It queues IRQ work to copy the messages into
> the main ring buffer in a safe context.
>
> __printk_nmi_flush() copies all available messages and reset
> the buffer. Then we could use a simple cmpxchg operations to
> get synchronized with writers. There is also used a spinlock
> to get synchronized with other flushers.
>
> We do not longer use seq_buf because it depends on external lock.
> It would be hard to make all supported operations safe for
> a lockless use. It would be confusing and error prone to
> make only some operations safe.
>
> The code is put into separate printk/nmi.c as suggested by
> Steven Rostedt. It needs a per-CPU buffer and is compiled only
> on architectures that call nmi_enter(). This is achieved by
> the new HAVE_NMI Kconfig flag.
>
> One exception is arm where the deferred printing is used for
> printing backtraces even without NMI. For this purpose,
> we define NEED_PRINTK_NMI Kconfig flag. The alternative
> printk_func is explicitly set when IPI_CPU_BACKTRACE is
> handled.
>
> The other exceptions are MN10300 and Xtensa architectures.
> We need to clean up NMI handling there first. Let's do it
> separately.
>
> The patch is heavily based on the draft from Peter Zijlstra,
> see https://lkml.org/lkml/2015/6/10/327
>

I guess this code is useful even on CONFIG_SMP=n: to avoid corruption
of the printk internal structures whcih the problematic locking
protects.

> ...
>
> +/*
> + * printk() could not take logbuf_lock in NMI context. Instead,
> + * it uses an alternative implementation that temporary stores
> + * the strings into a per-CPU buffer. The content of the buffer
> + * is later flushed into the main ring buffer via IRQ work.
> + *
> + * The alternative implementation is chosen transparently
> + * via @printk_func per-CPU variable.
> + *
> + * The implementation allows to flush the strings also from another CPU.
> + * There are situations when we want to make sure that all buffers
> + * were handled or when IRQs are blocked.
> + */
> +DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
> +
> +#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
> +
> +struct nmi_seq_buf {
> + atomic_t len; /* length of written data */
> + struct irq_work work; /* IRQ work that flushes the buffer */
> + unsigned char buffer[NMI_LOG_BUF_LEN];

When this buffer overflows, which characters get lost? Most recent or
least recent?

I'm not sure which is best, really. For an oops trace you probably
want to preserve the least recent output: the stuff at the start of the
output.

> +};
>
> ...
>
> +static void __printk_nmi_flush(struct irq_work *work)
> +{
> + static raw_spinlock_t read_lock =
> + __RAW_SPIN_LOCK_INITIALIZER(read_lock);
> + struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
> + int len, size, i, last_i;
> +
> + /*
> + * The lock has two functions. First, one reader has to flush all
> + * available message to make the lockless synchronization with
> + * writers easier. Second, we do not want to mix messages from
> + * different CPUs. This is especially important when printing
> + * a backtrace.
> + */
> + raw_spin_lock(&read_lock);
> +
> + i = 0;
> +more:
> + len = atomic_read(&s->len);
> +
> + /*
> + * This is just a paranoid check that nobody has manipulated
> + * the buffer an unexpected way. If we printed something then
> + * @len must only increase.
> + */
> + WARN_ON(i && i >= len);

hm, dumping a big backtrace in this context seems a poor idea. Oh
well, shouldn't happen.

> + if (!len)
> + goto out; /* Someone else has already flushed the buffer. */
> +
> + /* Make sure that data has been written up to the @len */
> + smp_rmb();
> +
> + size = min_t(int, len, sizeof(s->buffer));

len and size should have type size_t.

> + last_i = i;
> +
> + /* Print line by line. */
> + for (; i < size; i++) {
> + if (s->buffer[i] == '\n') {
> + print_nmi_seq_line(s, last_i, i);
> + last_i = i + 1;
> + }
> + }
> + /* Check if there was a partial line. */
> + if (last_i < size) {
> + print_nmi_seq_line(s, last_i, size - 1);
> + pr_cont("\n");
> + }
> +
> + /*
> + * Check that nothing has got added in the meantime and truncate
> + * the buffer. Note that atomic_cmpxchg() is an implicit memory
> + * barrier that makes sure that the data were copied before
> + * updating s->len.
> + */
> + if (atomic_cmpxchg(&s->len, len, 0) != len)
> + goto more;
> +
> +out:
> + raw_spin_unlock(&read_lock);
> +}
>
> ...
>
> --- /dev/null
> +++ b/kernel/printk/printk.h

I find it a bit irritating to have duplicated filenames. We could
follow convention and call this "internal.h".

>
> ...
>

2015-12-10 15:26:13

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] printk/nmi: Generic solution for safe printk in NMI

On Wed 2015-12-09 15:50:07, Andrew Morton wrote:
> On Wed, 9 Dec 2015 14:21:02 +0100 Petr Mladek <[email protected]> wrote:
>
> > printk() takes some locks and could not be used a safe way in NMI
> > context.
> >
> > The chance of a deadlock is real especially when printing
> > stacks from all CPUs. This particular problem has been addressed
> > on x86 by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack
> > trace on all CPUs").
> >
> > This patch reuses most of the code and makes it generic. It is
> > useful for all messages and architectures that support NMI.
> >
> > The patch is heavily based on the draft from Peter Zijlstra,
> > see https://lkml.org/lkml/2015/6/10/327
> >
>
> I guess this code is useful even on CONFIG_SMP=n: to avoid corruption
> of the printk internal structures whcih the problematic locking
> protects.

Yup and it is used even on CONFIG_SMP=n if I am not missing
something. At least, CONFIG_PRINTK_NMI stays enabled here.


> > +#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
> > +
> > +struct nmi_seq_buf {
> > + atomic_t len; /* length of written data */
> > + struct irq_work work; /* IRQ work that flushes the buffer */
> > + unsigned char buffer[NMI_LOG_BUF_LEN];
>
> When this buffer overflows, which characters get lost? Most recent or
> least recent?

The most recent messages are lost when the buffer overflows. The other
way would require to use a ring-buffer instead the seq_buf. We would need
a lock-less synchronization for both, begin and end, pointers. It
would add quite some complications.


> I'm not sure which is best, really. For an oops trace you probably
> want to preserve the least recent output: the stuff at the start of the
> output.

I agree. Fortunately, this is easier and it works this way.


> > +static void __printk_nmi_flush(struct irq_work *work)
> > +{
> > + static raw_spinlock_t read_lock =
> > + __RAW_SPIN_LOCK_INITIALIZER(read_lock);
> > + struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
> > + int len, size, i, last_i;
> > +
> > + /*
> > + * The lock has two functions. First, one reader has to flush all
> > + * available message to make the lockless synchronization with
> > + * writers easier. Second, we do not want to mix messages from
> > + * different CPUs. This is especially important when printing
> > + * a backtrace.
> > + */
> > + raw_spin_lock(&read_lock);
> > +
> > + i = 0;
> > +more:
> > + len = atomic_read(&s->len);
> > +
> > + /*
> > + * This is just a paranoid check that nobody has manipulated
> > + * the buffer an unexpected way. If we printed something then
> > + * @len must only increase.
> > + */
> > + WARN_ON(i && i >= len);
>
> hm, dumping a big backtrace in this context seems a poor idea. Oh
> well, shouldn't happen.

I see and the backtrace probably would not help much because "len"
might be manipulated also from NMI context. I am going to change
this to:

if (i && i >= len)
pr_err("printk_nmi_flush: internal error: i=%d >=
len=%d)\n", i, len);



> > + if (!len)
> > + goto out; /* Someone else has already flushed the buffer. */
> > +
> > + /* Make sure that data has been written up to the @len */
> > + smp_rmb();
> > +
> > + size = min_t(int, len, sizeof(s->buffer));
>
> len and size should have type size_t.

OK

> > --- /dev/null
> > +++ b/kernel/printk/printk.h
>
> I find it a bit irritating to have duplicated filenames. We could
> follow convention and call this "internal.h".

No problem. I am going to send an updated patchset soon.

Thanks a lot for review,
Petr

2015-12-11 11:10:06

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Wed, Dec 9, 2015 at 2:21 PM, Petr Mladek <[email protected]> wrote:
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -866,6 +866,28 @@ config LOG_CPU_MAX_BUF_SHIFT
> 13 => 8 KB for each CPU
> 12 => 4 KB for each CPU
>
> +config NMI_LOG_BUF_SHIFT
> + int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
> + range 10 21
> + default 13
> + depends on PRINTK && HAVE_NMI

Symbol NMI_LOG_BUF_SHIFT does not exist if its dependencies are not met.

> + help
> + Select the size of a per-CPU buffer where NMI messages are temporary
> + stored. They are copied to the main log buffer in a safe context
> + to avoid a deadlock. The value defines the size as a power of 2.
> +
> + NMI messages are rare and limited. The largest one is when
> + a backtrace is printed. It usually fits into 4KB. Select
> + 8KB if you want to be on the safe side.
> +
> + Examples:
> + 17 => 128 KB for each CPU
> + 16 => 64 KB for each CPU
> + 15 => 32 KB for each CPU
> + 14 => 16 KB for each CPU
> + 13 => 8 KB for each CPU
> + 12 => 4 KB for each CPU
> +
> #
> # Architectures with an unreliable sched_clock() should select this:
> #
> diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
> index 5465230b75ec..78c07d441b4e 100644
> --- a/kernel/printk/nmi.c
> +++ b/kernel/printk/nmi.c
> @@ -41,7 +41,8 @@ DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
> static int printk_nmi_irq_ready;
> atomic_t nmi_message_lost;
>
> -#define NMI_LOG_BUF_LEN (4096 - sizeof(atomic_t) - sizeof(struct irq_work))
> +#define NMI_LOG_BUF_LEN ((1 << CONFIG_NMI_LOG_BUF_SHIFT) - \
> + sizeof(atomic_t) - sizeof(struct irq_work))

kernel/printk/nmi.c:50:24: error: 'CONFIG_NMI_LOG_BUF_SHIFT'
undeclared here (not in a function)

E.g. efm32_defconfig
http://kisskb.ellerman.id.au/kisskb/buildresult/12565754/

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2015-12-11 12:42:06

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Fri 2015-12-11 12:10:02, Geert Uytterhoeven wrote:
> On Wed, Dec 9, 2015 at 2:21 PM, Petr Mladek <[email protected]> wrote:
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -866,6 +866,28 @@ config LOG_CPU_MAX_BUF_SHIFT
> > 13 => 8 KB for each CPU
> > 12 => 4 KB for each CPU
> >
> > +config NMI_LOG_BUF_SHIFT
> > + int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
> > + range 10 21
> > + default 13
> > + depends on PRINTK && HAVE_NMI
>
> Symbol NMI_LOG_BUF_SHIFT does not exist if its dependencies are not met.

?h, the NMI buffer is enabled on arm via NEED_PRINTK_NMI.

The buffer is compiled when CONFIG_PRINTK_NMI is defined. I am going
to fix it the following way:


diff --git a/init/Kconfig b/init/Kconfig
index efcff25a112d..61cfd96a3c96 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -870,7 +870,7 @@ config NMI_LOG_BUF_SHIFT
int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
range 10 21
default 13
- depends on PRINTK && HAVE_NMI
+ depends on PRINTK_NMI
help
Select the size of a per-CPU buffer where NMI messages are temporary
stored. They are copied to the main log buffer in a safe context


Thanks a lot for report,
Petr

2015-12-11 12:49:34

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Friday 11 December 2015 13:41:59 Petr Mladek wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index efcff25a112d..61cfd96a3c96 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -870,7 +870,7 @@ config NMI_LOG_BUF_SHIFT
> int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
> range 10 21
> default 13
> - depends on PRINTK && HAVE_NMI
> + depends on PRINTK_NMI
> help
> Select the size of a per-CPU buffer where NMI messages are temporary
> stored. They are copied to the main log buffer in a safe context
>
>

Acked-by: Arnd Bergmann <[email protected]>

I found this on linux-next as well today and came to the same conclusion.

Arnd

2015-12-11 12:50:19

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Fri, Dec 11, 2015 at 1:41 PM, Petr Mladek <[email protected]> wrote:
> On Fri 2015-12-11 12:10:02, Geert Uytterhoeven wrote:
>> On Wed, Dec 9, 2015 at 2:21 PM, Petr Mladek <[email protected]> wrote:
>> > --- a/init/Kconfig
>> > +++ b/init/Kconfig
>> > @@ -866,6 +866,28 @@ config LOG_CPU_MAX_BUF_SHIFT
>> > 13 => 8 KB for each CPU
>> > 12 => 4 KB for each CPU
>> >
>> > +config NMI_LOG_BUF_SHIFT
>> > + int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
>> > + range 10 21
>> > + default 13
>> > + depends on PRINTK && HAVE_NMI
>>
>> Symbol NMI_LOG_BUF_SHIFT does not exist if its dependencies are not met.
>
> Åh, the NMI buffer is enabled on arm via NEED_PRINTK_NMI.
>
> The buffer is compiled when CONFIG_PRINTK_NMI is defined. I am going
> to fix it the following way:
>
>
> diff --git a/init/Kconfig b/init/Kconfig
> index efcff25a112d..61cfd96a3c96 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -870,7 +870,7 @@ config NMI_LOG_BUF_SHIFT
> int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
> range 10 21
> default 13
> - depends on PRINTK && HAVE_NMI
> + depends on PRINTK_NMI
> help
> Select the size of a per-CPU buffer where NMI messages are temporary
> stored. They are copied to the main log buffer in a safe context

Makes sense, as kernel/printk/nmi.c is compiled if PRINTK_NMI is set.

Acked-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2015-12-11 22:57:30

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Fri, 11 Dec 2015 13:41:59 +0100 Petr Mladek <[email protected]> wrote:

> On Fri 2015-12-11 12:10:02, Geert Uytterhoeven wrote:
> > On Wed, Dec 9, 2015 at 2:21 PM, Petr Mladek <[email protected]> wrote:
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -866,6 +866,28 @@ config LOG_CPU_MAX_BUF_SHIFT
> > > 13 => 8 KB for each CPU
> > > 12 => 4 KB for each CPU
> > >
> > > +config NMI_LOG_BUF_SHIFT
> > > + int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
> > > + range 10 21
> > > + default 13
> > > + depends on PRINTK && HAVE_NMI
> >
> > Symbol NMI_LOG_BUF_SHIFT does not exist if its dependencies are not met.
>
> __h, the NMI buffer is enabled on arm via NEED_PRINTK_NMI.
>
> The buffer is compiled when CONFIG_PRINTK_NMI is defined. I am going
> to fix it the following way:
>
>
> diff --git a/init/Kconfig b/init/Kconfig
> index efcff25a112d..61cfd96a3c96 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -870,7 +870,7 @@ config NMI_LOG_BUF_SHIFT
> int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)"
> range 10 21
> default 13
> - depends on PRINTK && HAVE_NMI
> + depends on PRINTK_NMI
> help
> Select the size of a per-CPU buffer where NMI messages are temporary
> stored. They are copied to the main log buffer in a safe context

I'm wondering why we're building kernel/printk/nmi.o if HAVE_NMI is not
set.

obj-$(CONFIG_PRINTK_NMI) += nmi.o

and

config PRINTK_NMI
def_bool y
depends on PRINTK
depends on HAVE_NMI || NEED_PRINTK_NMI

This is a bit messy. NEED_PRINTK_NMI is an added-on hack for one
particular arm variant. From the changelog:

"One exception is arm where the deferred printing is used for
printing backtraces even without NMI. For this purpose, we define
NEED_PRINTK_NMI Kconfig flag. The alternative printk_func is
explicitly set when IPI_CPU_BACKTRACE is handled."


- why does arm needs deferred printing for backtraces?

- why is this specific to CONFIG_CPU_V7M?

- can this Kconfig logic be cleaned up a bit?

2015-12-11 23:22:36

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Fri, Dec 11, 2015 at 02:57:25PM -0800, Andrew Morton wrote:
> This is a bit messy. NEED_PRINTK_NMI is an added-on hack for one
> particular arm variant. From the changelog:
>
> "One exception is arm where the deferred printing is used for
> printing backtraces even without NMI. For this purpose, we define
> NEED_PRINTK_NMI Kconfig flag. The alternative printk_func is
> explicitly set when IPI_CPU_BACKTRACE is handled."
>
>
> - why does arm needs deferred printing for backtraces?
>
> - why is this specific to CONFIG_CPU_V7M?
>
> - can this Kconfig logic be cleaned up a bit?

I think this comes purely from this attempt to apply another round of
cleanups to the nmi backtrace work I did.

As I explained when I did that work, the vast majority of ARM platforms
are unable to trigger anything like a NMI - the FIQ is something that's
generally a property of the secure monitor, and is not accessible to
Linux. However, there are platforms where it is accessible.

The work to add the FIQ-based variant never happened (I've no idea what
happened to that part, Daniel seems to have lost interest in working on
it.) So, what we have is the IRQ-based variant merged in mainline, which
would be the fallback for the "FIQ not available" cases, and I carry a
local hack in my tree which provides the FIQ-based version - but if it
were to trigger, it takes out all interrupts (hence why I've not merged
my hack.)

I think the reason that the FIQ-based variant has never really happened
is that hooking into the interrupt controller code to clear down the FIQ
creates such a horrid layering violation, and also a locking mess that
I suspect it's just been given up with.

However, I've found my "hack" useful - it's turned a number of totally
undebuggable hangs (where one CPU silently hangs leaving the others
running with no way to find out where the hung CPU is) into something
that can be debugged.

Now, when we end up triggering the IRQ-based variant, we could already
be in a situation where IRQs are off for the local CPU, so the IRQ is
never delivered. Others decided that it wasn't acceptable to wait 10sec
for the local CPU to time out, and (iirc) we'd also loose the local CPUs
backtrace in certain situations.

I'm personally happy with the existing code, and I've been wondering why
there's this effort to apply further cleanups - to me, the changelogs
don't seem to make that much sense, unless we want to start using
printk() extensively in NMI functions - using the generic nmi backtrace
code surely gets us something that works across all architectures...

I've been assuming that I've missed something, which is why I've not
said anything on that point until now.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-12-11 23:26:18

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Fri, 11 Dec 2015, Russell King - ARM Linux wrote:

> I'm personally happy with the existing code, and I've been wondering why
> there's this effort to apply further cleanups - to me, the changelogs
> don't seem to make that much sense, unless we want to start using
> printk() extensively in NMI functions - using the generic nmi backtrace
> code surely gets us something that works across all architectures...

It is already being used extensively, and not only for all-CPU backtraces.
For starters, please consider

- WARN_ON(in_nmi())
- BUG_ON(in_nmi())
- anything being printed out from MCE handlers

--
Jiri Kosina
SUSE Labs

2015-12-11 23:30:59

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Fri, 11 Dec 2015 23:21:13 +0000 Russell King - ARM Linux <[email protected]> wrote:

> On Fri, Dec 11, 2015 at 02:57:25PM -0800, Andrew Morton wrote:
> > This is a bit messy. NEED_PRINTK_NMI is an added-on hack for one
> > particular arm variant. From the changelog:
> >
> > "One exception is arm where the deferred printing is used for
> > printing backtraces even without NMI. For this purpose, we define
> > NEED_PRINTK_NMI Kconfig flag. The alternative printk_func is
> > explicitly set when IPI_CPU_BACKTRACE is handled."
> >
> >
> > - why does arm needs deferred printing for backtraces?
> >
> > - why is this specific to CONFIG_CPU_V7M?
> >
> > - can this Kconfig logic be cleaned up a bit?
>
> I think this comes purely from this attempt to apply another round of
> cleanups to the nmi backtrace work I did.
>
> As I explained when I did that work, the vast majority of ARM platforms
> are unable to trigger anything like a NMI - the FIQ is something that's
> generally a property of the secure monitor, and is not accessible to
> Linux. However, there are platforms where it is accessible.

OK, thanks. So "not needed at present, might be needed in the future,
useful for out-of-tree debug code"?

> I'm personally happy with the existing code, and I've been wondering why
> there's this effort to apply further cleanups - to me, the changelogs
> don't seem to make that much sense, unless we want to start using
> printk() extensively in NMI functions - using the generic nmi backtrace
> code surely gets us something that works across all architectures...

Yes, I was scratching my head over that. The patchset takes an nmi-safe
all-cpu-backtrace and generalises that into an nmi-safe printk. That
*sounds* like a good thing to do but yes, some additional justification
would be helpful. What real-world value does this patchset really
bring to real-world users?

2015-12-14 10:28:55

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On 11/12/15 23:21, Russell King - ARM Linux wrote:
> As I explained when I did that work, the vast majority of ARM platforms
> are unable to trigger anything like a NMI - the FIQ is something that's
> generally a property of the secure monitor, and is not accessible to
> Linux. However, there are platforms where it is accessible.
>
> The work to add the FIQ-based variant never happened (I've no idea what
> happened to that part, Daniel seems to have lost interest in working on
> it.) So, what we have is the IRQ-based variant merged in mainline, which
> would be the fallback for the "FIQ not available" cases, and I carry a
> local hack in my tree which provides the FIQ-based version - but if it
> were to trigger, it takes out all interrupts (hence why I've not merged
> my hack.)
>
> I think the reason that the FIQ-based variant has never really happened
> is that hooking into the interrupt controller code to clear down the FIQ
> creates such a horrid layering violation, and also a locking mess that
> I suspect it's just been given up with.

I haven't quite given up; I'm still looking into this stuff. However
you're certainly right that connecting the FIQ handler to the GIC code
in an elegant way is tough.

I've been working in parallel on an arm64 implementation with the result
that I'm now two lumps of code that are almost, but not quite, ready.

Right now I hope to share latest arm code fairly late in the this
devcycle (for review rather than merge) followed up with a new version
very early in v4.6. Even now I think the code needs a long soak in -next
just in case there are any lurking regressions on particular platforms.

I don't expect anyone to base decisions on my aspirations above but
would like to reassure Russell that I haven't given up on it.


Daniel.

2015-12-15 14:26:28

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Fri 2015-12-11 15:30:54, Andrew Morton wrote:
> On Fri, 11 Dec 2015 23:21:13 +0000 Russell King - ARM Linux <[email protected]> wrote:
>
> > On Fri, Dec 11, 2015 at 02:57:25PM -0800, Andrew Morton wrote:
> > > This is a bit messy. NEED_PRINTK_NMI is an added-on hack for one
> > > particular arm variant. From the changelog:
> > >
> > > "One exception is arm where the deferred printing is used for
> > > printing backtraces even without NMI. For this purpose, we define
> > > NEED_PRINTK_NMI Kconfig flag. The alternative printk_func is
> > > explicitly set when IPI_CPU_BACKTRACE is handled."
> > >
> > >
> > > - why does arm needs deferred printing for backtraces?
> > >
> > > - why is this specific to CONFIG_CPU_V7M?



> > > - can this Kconfig logic be cleaned up a bit?
> >
> > I think this comes purely from this attempt to apply another round of
> > cleanups to the nmi backtrace work I did.
> >
> > As I explained when I did that work, the vast majority of ARM platforms
> > are unable to trigger anything like a NMI - the FIQ is something that's
> > generally a property of the secure monitor, and is not accessible to
> > Linux. However, there are platforms where it is accessible.
>
> OK, thanks. So "not needed at present, might be needed in the future,
> useful for out-of-tree debug code"?

It is possible that I got it a wrong way on arm. The NMI buffer is
usable there on two locations.

First, the temporary is currently used to handle IPI_CPU_BACKTRACE.
It seems that it is not a real NMI. But it seems to be available
(compiled) on all arm system. This is why I introduced NEED_PRINTK_NMI
Kconfig flag to avoid confusion with a real NMI.

Second, there is the FIQ "NMI" handler that is called from
/arch/arm/kernel/entry-armv.S. It is compiled only if _not_
defined $(CONFIG_CPU_V7M). It calls nmi_enter() and nmi_stop().
It looks like a real NMI handler. This is why I defined HAVE_NMI
if (!CPU_V7M).

A solution would be to define HAVE_NMI on all Arm systems and get rid
of NEED_PRINTK_NMI. If you think that it would cause less confusion...


> > there's this effort to apply further cleanups - to me, the changelogs
> > don't seem to make that much sense, unless we want to start using
> > printk() extensively in NMI functions - using the generic nmi backtrace
> > code surely gets us something that works across all architectures...
>
> Yes, I was scratching my head over that. The patchset takes an nmi-safe
> all-cpu-backtrace and generalises that into an nmi-safe printk. That
> *sounds* like a good thing to do but yes, some additional justification
> would be helpful. What real-world value does this patchset really
> bring to real-world users?

The patchset brings two big advantages. First, it makes the NMI
backtraces safe on all architectures for free. Second, it makes
all NMI messages almost[*] safe on all architectures.

Note that there already are several messages printed in NMI context.
See the mail from Jiri Kosina. They are not easy to avoid.

[*] The temporary buffer is limited. We still should keep
the number of messages in NMI context at minimum.

Best Regards,
Petr

2015-12-17 22:39:03

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Tue, 15 Dec 2015 15:26:21 +0100 Petr Mladek <[email protected]> wrote:

> > OK, thanks. So "not needed at present, might be needed in the future,
> > useful for out-of-tree debug code"?
>
> It is possible that I got it a wrong way on arm. The NMI buffer is
> usable there on two locations.
>
> First, the temporary is currently used to handle IPI_CPU_BACKTRACE.
> It seems that it is not a real NMI. But it seems to be available
> (compiled) on all arm system. This is why I introduced NEED_PRINTK_NMI
> Kconfig flag to avoid confusion with a real NMI.
>
> Second, there is the FIQ "NMI" handler that is called from
> /arch/arm/kernel/entry-armv.S. It is compiled only if _not_
> defined $(CONFIG_CPU_V7M). It calls nmi_enter() and nmi_stop().
> It looks like a real NMI handler. This is why I defined HAVE_NMI
> if (!CPU_V7M).
>
> A solution would be to define HAVE_NMI on all Arm systems and get rid
> of NEED_PRINTK_NMI. If you think that it would cause less confusion...

So does this mean that the patch will be updated?

>
> > > there's this effort to apply further cleanups - to me, the changelogs
> > > don't seem to make that much sense, unless we want to start using
> > > printk() extensively in NMI functions - using the generic nmi backtrace
> > > code surely gets us something that works across all architectures...
> >
> > Yes, I was scratching my head over that. The patchset takes an nmi-safe
> > all-cpu-backtrace and generalises that into an nmi-safe printk. That
> > *sounds* like a good thing to do but yes, some additional justification
> > would be helpful. What real-world value does this patchset really
> > bring to real-world users?
>
> The patchset brings two big advantages. First, it makes the NMI
> backtraces safe on all architectures for free. Second, it makes
> all NMI messages almost[*] safe on all architectures.
>
> Note that there already are several messages printed in NMI context.
> See the mail from Jiri Kosina. They are not easy to avoid.
>
> [*] The temporary buffer is limited. We still should keep
> the number of messages in NMI context at minimum.

This is important info - in fact a paragraph which starts with "The
patchset brings two big advantages" is *the most* important info. I
added the below text to the [1/n] changelog:

: The patchset brings two big advantages. First, it makes the NMI
: backtraces safe on all architectures for free. Second, it makes all NMI
: messages almost safe on all architectures (the temporary buffer is
: limited. We still should keep the number of messages in NMI context at
: minimum).
:
: Note that there already are several messages printed in NMI context:
: WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
: handlers. These are not easy to avoid.

2015-12-18 10:18:18

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On 11/12/15 23:26, Jiri Kosina wrote:
> On Fri, 11 Dec 2015, Russell King - ARM Linux wrote:
>
>> I'm personally happy with the existing code, and I've been wondering why
>> there's this effort to apply further cleanups - to me, the changelogs
>> don't seem to make that much sense, unless we want to start using
>> printk() extensively in NMI functions - using the generic nmi backtrace
>> code surely gets us something that works across all architectures...
>
> It is already being used extensively, and not only for all-CPU backtraces.
> For starters, please consider
>
> - WARN_ON(in_nmi())
> - BUG_ON(in_nmi())

Sorry to join in so late but...

Today we risk deadlock when we try to issue these diagnostic errors
directly from NMI context.

After this change we will still risk deadlock, because that's what the
diagnostic code is trying to tell us, *and* we delay actually reporting
the error until, and only if, the NMI handler completes.

I'm not entirely sure that this is an improvement.


> - anything being printed out from MCE handlers

The MCE handlers should only call printk() when they decide to panic and
*after* busting the spinlocks. At this point deferring printk() until it
is safe is not very helpful.

When we bust the spinlocks we should probably restore the normal
printk() function to give best chance of the failure messages making it out.


Daniel.

2015-12-18 11:29:29

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Fri, Dec 18, 2015 at 10:18:08AM +0000, Daniel Thompson wrote:
> I'm not entirely sure that this is an improvement.

What I do these days is delete everything in vprintk_emit() and simply
call early_printk().

Kill the useless kmsg buffer crap and locking, just pound bytes to the
UART registers without anything in between.

The other semi usable solution is redirecting to trace_printk() and
recovering the trace buffers from your kdump. But I've found that
typically kdump doesn't work anymore if you properly wedge the machine.
So this is very much a second rate solution.

But this globally locked buffer, calling out to console drivers that do
locking and even scheduling, is an unreliable unfixable trainwreck that
I've given up on.

2015-12-18 12:11:53

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Fri, Dec 18, 2015 at 12:29:02PM +0100, Peter Zijlstra wrote:
> On Fri, Dec 18, 2015 at 10:18:08AM +0000, Daniel Thompson wrote:
> > I'm not entirely sure that this is an improvement.
>
> What I do these days is delete everything in vprintk_emit() and simply
> call early_printk().

On that, whoever made the device model use vprintk_emit() broke the
debugger (KGDB/KDB) printk intercept, and the whole vprintk_func
redirection scheme.

2015-12-18 14:52:15

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Fri 2015-12-18 10:18:08, Daniel Thompson wrote:
> On 11/12/15 23:26, Jiri Kosina wrote:
> >On Fri, 11 Dec 2015, Russell King - ARM Linux wrote:
> >
> >>I'm personally happy with the existing code, and I've been wondering why
> >>there's this effort to apply further cleanups - to me, the changelogs
> >>don't seem to make that much sense, unless we want to start using
> >>printk() extensively in NMI functions - using the generic nmi backtrace
> >>code surely gets us something that works across all architectures...
> >
> >It is already being used extensively, and not only for all-CPU backtraces.
> >For starters, please consider
> >
> >- WARN_ON(in_nmi())
> >- BUG_ON(in_nmi())
>
> Sorry to join in so late but...
>
> Today we risk deadlock when we try to issue these diagnostic errors
> directly from NMI context.
>
> After this change we will still risk deadlock, because that's what
> the diagnostic code is trying to tell us, *and* we delay actually
> reporting the error until, and only if, the NMI handler completes.

I think that NMI messages about a possible deadlock are the ones
from

kernel/locking/rtmutex.c
kernel/irq_work.c
include/linux/hardirq.h

You are right that if the deadlock happens, this patch set lowers the
chance to see the message.

On the other hand, all the other printk's in NMI seems to be non-fatal
warnings. In this case, this patch set increases the chance to see
them.

A compromise might be to explicitly call printk_nmi_flush() in the few
fatal cases. Alternatively we could force the messages on the
early_console when available.


> >- anything being printed out from MCE handlers
>
> The MCE handlers should only call printk() when they decide to panic
> and *after* busting the spinlocks. At this point deferring printk()
> until it is safe is not very helpful.
>
> When we bust the spinlocks we should probably restore the normal
> printk() function to give best chance of the failure messages making
> it out.

The problem is that we do not know what locks need to be busted. There
are too many consoles and too many locks involved. Also busting locks
open another can of worms.

Best Regards,
Petr

2015-12-18 16:18:42

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Thu 2015-12-17 14:38:58, Andrew Morton wrote:
> On Tue, 15 Dec 2015 15:26:21 +0100 Petr Mladek <[email protected]> wrote:
>
> > > OK, thanks. So "not needed at present, might be needed in the future,
> > > useful for out-of-tree debug code"?
> >
> > It is possible that I got it a wrong way on arm. The NMI buffer is
> > usable there on two locations.
> >
> > First, the temporary is currently used to handle IPI_CPU_BACKTRACE.
> > It seems that it is not a real NMI. But it seems to be available
> > (compiled) on all arm system. This is why I introduced NEED_PRINTK_NMI
> > Kconfig flag to avoid confusion with a real NMI.
> >
> > Second, there is the FIQ "NMI" handler that is called from
> > /arch/arm/kernel/entry-armv.S. It is compiled only if _not_
> > defined $(CONFIG_CPU_V7M). It calls nmi_enter() and nmi_stop().
> > It looks like a real NMI handler. This is why I defined HAVE_NMI
> > if (!CPU_V7M).
> >
> > A solution would be to define HAVE_NMI on all Arm systems and get rid
> > of NEED_PRINTK_NMI. If you think that it would cause less confusion...
>
> So does this mean that the patch will be updated?

Please, find the follow up patch below. I guess that you will want to
squash it into the [1/n] one.


>From 144d90ada2e34b8807efcc01922c48d7c09797e7 Mon Sep 17 00:00:00 2001
From: Petr Mladek <[email protected]>
Date: Fri, 18 Dec 2015 16:04:35 +0100
Subject: [PATCH] printk/nmi: Remove the questionable CONFIG_NEED_PRINTK_NMI

The flag NEED_PRINTK_NMI was added because of Arm. It used the NMI safe
backtrace implementation on all Arm systems. But it did not have a real
NMI handling on CPU_V7M.

It seems that it causes more confusion than good. Let's use HAVE_NMI
on all arm systems and get rid of the problematic flag.

Signed-off-by: Petr Mladek <[email protected]>
---
arch/Kconfig | 3 ---
arch/arm/Kconfig | 3 +--
init/Kconfig | 2 +-
3 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 7ce5101c2472..d1a18b313624 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -190,9 +190,6 @@ config HAVE_KPROBES_ON_FTRACE
config HAVE_NMI
bool

-config NEED_PRINTK_NMI
- bool
-
config HAVE_NMI_WATCHDOG
depends on HAVE_NMI
bool
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 96d2c275f0f7..01dc56d8f31b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -63,8 +63,7 @@ config ARM
select HAVE_KRETPROBES if (HAVE_KPROBES)
select HAVE_MEMBLOCK
select HAVE_MOD_ARCH_SPECIFIC
- select HAVE_NMI if (!CPU_V7M)
- select NEED_PRINTK_NMI if (CPU_V7M)
+ select HAVE_NMI
select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
select HAVE_OPTPROBES if !THUMB2_KERNEL
select HAVE_PERF_EVENTS
diff --git a/init/Kconfig b/init/Kconfig
index 61cfd96a3c96..abf79f3b1a55 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1481,7 +1481,7 @@ config PRINTK
config PRINTK_NMI
def_bool y
depends on PRINTK
- depends on HAVE_NMI || NEED_PRINTK_NMI
+ depends on HAVE_NMI

config BUG
bool "BUG() support" if EXPERT
--
1.8.5.6

2015-12-18 17:00:23

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On 18/12/15 14:52, Petr Mladek wrote:
> On Fri 2015-12-18 10:18:08, Daniel Thompson wrote:
>> On 11/12/15 23:26, Jiri Kosina wrote:
>>> On Fri, 11 Dec 2015, Russell King - ARM Linux wrote:
>>>
>>>> I'm personally happy with the existing code, and I've been wondering why
>>>> there's this effort to apply further cleanups - to me, the changelogs
>>>> don't seem to make that much sense, unless we want to start using
>>>> printk() extensively in NMI functions - using the generic nmi backtrace
>>>> code surely gets us something that works across all architectures...
>>>
>>> It is already being used extensively, and not only for all-CPU backtraces.
>>> For starters, please consider
>>>
>>> - WARN_ON(in_nmi())
>>> - BUG_ON(in_nmi())
>>
>> Sorry to join in so late but...
>>
>> Today we risk deadlock when we try to issue these diagnostic errors
>> directly from NMI context.
>>
>> After this change we will still risk deadlock, because that's what
>> the diagnostic code is trying to tell us, *and* we delay actually
>> reporting the error until, and only if, the NMI handler completes.
>
> I think that NMI messages about a possible deadlock are the ones
> from
>
> kernel/locking/rtmutex.c
> kernel/irq_work.c
> include/linux/hardirq.h
>
> You are right that if the deadlock happens, this patch set lowers the
> chance to see the message.
>
> On the other hand, all the other printk's in NMI seems to be non-fatal
> warnings. In this case, this patch set increases the chance to see
> them.

Maybe for a WARN_ON() the trade off is worth it but I don't think a
BUG_ON() trace would ever make it out.


> A compromise might be to explicitly call printk_nmi_flush() in the few
> fatal cases. Alternatively we could force the messages on the
> early_console when available.
>
>
>>> - anything being printed out from MCE handlers
>>
>> The MCE handlers should only call printk() when they decide to panic
>> and *after* busting the spinlocks. At this point deferring printk()
>> until it is safe is not very helpful.
>>
>> When we bust the spinlocks we should probably restore the normal
>> printk() function to give best chance of the failure messages making
>> it out.
>
> The problem is that we do not know what locks need to be busted. There
> are too many consoles and too many locks involved. Also busting locks
> open another can of worms.

Yes, I agree that busting the spinlocks doesn't avoid all risk of deadlock.

Probably I've been placing too much weight on the importance of getting
messages out when dying. You're right that surviving far enough through
a panic to trigger kdump or reset is equally (or more) important in many
scenarios than getting a failure message out.

However on a system with nothing but "while(1) {}" hooked up to panic()
then its worth risking a lock up. In this case restoring normal printk()
behavior and dumping the NMI buffers would be worthwhile.


Daniel.

2015-12-18 23:03:24

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On Fri, 18 Dec 2015 13:11:41 +0100 Peter Zijlstra <[email protected]> wrote:

> On Fri, Dec 18, 2015 at 12:29:02PM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 18, 2015 at 10:18:08AM +0000, Daniel Thompson wrote:
> > > I'm not entirely sure that this is an improvement.
> >
> > What I do these days is delete everything in vprintk_emit() and simply
> > call early_printk().
>
> On that, whoever made the device model use vprintk_emit() broke the
> debugger (KGDB/KDB) printk intercept, and the whole vprintk_func
> redirection scheme.

crap, we have a whole set of interfaces which are broken this way.
printk_emit(), vprintk(), vprintk_emit().


commit 7ff9554bb578ba02166071d2d487b7fc7d860d62
Author: Kay Sievers <[email protected]>
AuthorDate: Thu May 3 02:29:13 2012 +0200
Commit: Greg Kroah-Hartman <[email protected]>
CommitDate: Mon May 7 16:53:02 2012 -0700

printk: convert byte-buffer to variable-length record buffer

2016-03-01 14:05:12

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] printk/nmi: Increase the size of NMI buffer and make it configurable

On 18/12/15 17:00, Daniel Thompson wrote:
>>> The MCE handlers should only call printk() when they decide to panic
>>> and *after* busting the spinlocks. At this point deferring printk()
>>> until it is safe is not very helpful.
>>>
>>> When we bust the spinlocks we should probably restore the normal
>>> printk() function to give best chance of the failure messages making
>>> it out.
>>
>> The problem is that we do not know what locks need to be busted. There
>> are too many consoles and too many locks involved. Also busting locks
>> open another can of worms.
>
> Yes, I agree that busting the spinlocks doesn't avoid all risk of deadlock.
>
> Probably I've been placing too much weight on the importance of getting
> messages out when dying. You're right that surviving far enough through
> a panic to trigger kdump or reset is equally (or more) important in many
> scenarios than getting a failure message out.
>
> However on a system with nothing but "while(1) {}" hooked up to panic()
> then its worth risking a lock up. In this case restoring normal printk()
> behavior and dumping the NMI buffers would be worthwhile.

An a (much) later thread[1] Andrew Morton described this comment as
non-committal. Sorry for that.

I don't object to the overall approach taken by Petr, merely that I
think there are cases where the current patchset doesn't put in quite
enough effort to issue messages.

Panic triggered during NMI is the only case I can think of and that, I
think, could be addressed by adding an extra call to printk_nmi_flush()
during panic(). It should probably only cover cases where we *don't*
call kdump and the panic handler doesn't restart the machine... so just
after the pr_emerg("...end kernel panic") would be OK for me.


Daniel.


[1]: http://thread.gmane.org/gmane.linux.ports.arm.kernel/482845