2015-05-14 11:42:48

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 00/17][request for stable 3.10 inclusion] x86/nmi: Print all cpu stacks from NMI safely

This is my backport patch series to Fix the problem(backport to 3.10):
"
When trigger_all_cpu_backtrace() is called on x86, it will trigger an
NMI on each CPU and call show_regs(). But this can lead to a hard lock
up if the NMI comes in on another printk().
"
The solution is described in commit "a9edc88093287183ac934be44f295f183b2c62dd":
when the NMI triggers, it switches the printk routine for that CPU to call
a NMI safe printk function that records the printk in a per_cpu seq_buf
descriptor. After all NMIs have finished recording its data, the trace_
seqs are printed in a safe context.

The solution use "switch printk routine" and "seq_buf" infrastructures, but the
3.10 stable have no both of them.

The patch 1-13 backport the "seq_buf" infrastructures. in detail, patch 1, 2
and 6 only backport "seq_buf" related code.

The patch 14-15 backport the "switch printk routine".

The patch 16-17 is the patch to print all cpu stacks from NMI safely

as discussed in https://lkml.org/lkml/2015/5/13/497, in 3.10 stable, this is
the only way to solve the problem and the backport code is a bit more.

Any thoughts?

Sasha Levin (1):
x86/nmi: Fix use of unallocated cpumask_var_t

Steven Rostedt (Red Hat) (16):
tracing: Create seq_buf layer in trace_seq
tracing: Convert seq_buf_path() to be like seq_path()
tracing: Convert seq_buf fields to be like seq_file fields
tracing: Add a seq_buf_clear() helper and clear len and readpos in
init
seq_buf: Create seq_buf_used() to find out how much was written
tracing: Use trace_seq_used() and seq_buf_used() instead of len
seq_buf: Add seq_buf_can_fit() helper function
tracing: Have seq_buf use full buffer
tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions
seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF
seq_buf: Move the seq_buf code to lib/
seq_buf: Fix seq_buf_vprintf() truncation
seq_buf: Fix seq_buf_bprintf() truncation
printk: Add per_cpu printk func to allow printk to be diverted
printk/percpu: Define printk_func when printk is not defined
x86/nmi: Perform a safe NMI stack trace on all CPUs

arch/x86/kernel/apic/hw_nmi.c | 86 +++++++++-
include/linux/percpu.h | 4 +
include/linux/printk.h | 2 +
include/linux/seq_buf.h | 136 ++++++++++++++++
kernel/printk.c | 41 +++--
lib/Makefile | 2 +-
lib/seq_buf.c | 359 ++++++++++++++++++++++++++++++++++++++++++
7 files changed, 617 insertions(+), 13 deletions(-)
create mode 100644 include/linux/seq_buf.h
create mode 100644 lib/seq_buf.c

--
1.8.3.4


2015-05-14 11:39:34

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 01/17] tracing: Create seq_buf layer in trace_seq

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 3a161d99c43ce74c76aecff309be4c3ba455e823 upstream.

Create a seq_buf layer that trace_seq sits on. The seq_buf will not
be limited to page size. This will allow other usages of seq_buf
instead of a hard set PAGE_SIZE one that trace_seq has.

Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]

Tested-by: Jiri Kosina <[email protected]>
Acked-by: Jiri Kosina <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable
- only backport seq_buf related
- adjust context
]
Singed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/seq_buf.h | 81 ++++++++++++
kernel/trace/Makefile | 1 +
kernel/trace/seq_buf.c | 341 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 423 insertions(+)
create mode 100644 include/linux/seq_buf.h
create mode 100644 kernel/trace/seq_buf.c

diff --git a/include/linux/seq_buf.h b/include/linux/seq_buf.h
new file mode 100644
index 0000000..4f7a96a
--- /dev/null
+++ b/include/linux/seq_buf.h
@@ -0,0 +1,81 @@
+#ifndef _LINUX_SEQ_BUF_H
+#define _LINUX_SEQ_BUF_H
+
+#include <linux/fs.h>
+
+/*
+ * Trace sequences are used to allow a function to call several other functions
+ * to create a string of data to use.
+ */
+
+/**
+ * seq_buf - seq buffer structure
+ * @buffer: pointer to the buffer
+ * @size: size of the buffer
+ * @len: the amount of data inside the buffer
+ * @readpos: The next position to read in the buffer.
+ */
+struct seq_buf {
+ unsigned char *buffer;
+ unsigned int size;
+ unsigned int len;
+ unsigned int readpos;
+};
+
+static inline void
+seq_buf_init(struct seq_buf *s, unsigned char *buf, unsigned int size)
+{
+ s->buffer = buf;
+ s->size = size;
+ s->len = 0;
+ s->readpos = 0;
+}
+
+/*
+ * seq_buf have a buffer that might overflow. When this happens
+ * the len and size are set to be equal.
+ */
+static inline bool
+seq_buf_has_overflowed(struct seq_buf *s)
+{
+ return s->len == s->size;
+}
+
+static inline void
+seq_buf_set_overflow(struct seq_buf *s)
+{
+ s->len = s->size;
+}
+
+/*
+ * How much buffer is left on the seq_buf?
+ */
+static inline unsigned int
+seq_buf_buffer_left(struct seq_buf *s)
+{
+ if (seq_buf_has_overflowed(s))
+ return 0;
+
+ return (s->size - 1) - s->len;
+}
+
+extern __printf(2, 3)
+int seq_buf_printf(struct seq_buf *s, const char *fmt, ...);
+extern __printf(2, 0)
+int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args);
+extern int
+seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary);
+extern int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s);
+extern int seq_buf_to_user(struct seq_buf *s, char __user *ubuf,
+ int cnt);
+extern int seq_buf_puts(struct seq_buf *s, const char *str);
+extern int seq_buf_putc(struct seq_buf *s, unsigned char c);
+extern int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len);
+extern int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
+ unsigned int len);
+extern int seq_buf_path(struct seq_buf *s, const struct path *path);
+
+extern int seq_buf_bitmask(struct seq_buf *s, const unsigned long *maskp,
+ int nmaskbits);
+
+#endif /* _LINUX_SEQ_BUF_H */
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index d7e2068..cfb86f1 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -27,6 +27,7 @@ obj-$(CONFIG_RING_BUFFER_BENCHMARK) += ring_buffer_benchmark.o

obj-$(CONFIG_TRACING) += trace.o
obj-$(CONFIG_TRACING) += trace_output.o
+obj-$(CONFIG_TRACING) += seq_buf.o
obj-$(CONFIG_TRACING) += trace_stat.o
obj-$(CONFIG_TRACING) += trace_printk.o
obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
diff --git a/kernel/trace/seq_buf.c b/kernel/trace/seq_buf.c
new file mode 100644
index 0000000..e9a7861
--- /dev/null
+++ b/kernel/trace/seq_buf.c
@@ -0,0 +1,341 @@
+/*
+ * seq_buf.c
+ *
+ * Copyright (C) 2014 Red Hat Inc, Steven Rostedt <[email protected]>
+ *
+ * The seq_buf is a handy tool that allows you to pass a descriptor around
+ * to a buffer that other functions can write to. It is similar to the
+ * seq_file functionality but has some differences.
+ *
+ * To use it, the seq_buf must be initialized with seq_buf_init().
+ * This will set up the counters within the descriptor. You can call
+ * seq_buf_init() more than once to reset the seq_buf to start
+ * from scratch.
+ */
+#include <linux/uaccess.h>
+#include <linux/seq_file.h>
+#include <linux/seq_buf.h>
+
+/* How much buffer is written? */
+#define SEQ_BUF_USED(s) min((s)->len, (s)->size - 1)
+
+/**
+ * seq_buf_print_seq - move the contents of seq_buf into a seq_file
+ * @m: the seq_file descriptor that is the destination
+ * @s: the seq_buf descriptor that is the source.
+ *
+ * Returns zero on success, non zero otherwise
+ */
+int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s)
+{
+ unsigned int len = SEQ_BUF_USED(s);
+
+ return seq_write(m, s->buffer, len);
+}
+
+/**
+ * seq_buf_vprintf - sequence printing of information.
+ * @s: seq_buf descriptor
+ * @fmt: printf format string
+ * @args: va_list of arguments from a printf() type function
+ *
+ * Writes a vnprintf() format into the sequencce buffer.
+ *
+ * Returns zero on success, -1 on overflow.
+ */
+int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args)
+{
+ int len;
+
+ WARN_ON(s->size == 0);
+
+ if (s->len < s->size) {
+ len = vsnprintf(s->buffer + s->len, s->size - s->len, fmt, args);
+ if (s->len + len < s->size) {
+ s->len += len;
+ return 0;
+ }
+ }
+ seq_buf_set_overflow(s);
+ return -1;
+}
+
+/**
+ * seq_buf_printf - sequence printing of information
+ * @s: seq_buf descriptor
+ * @fmt: printf format string
+ *
+ * Writes a printf() format into the sequence buffer.
+ *
+ * Returns zero on success, -1 on overflow.
+ */
+int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
+{
+ va_list ap;
+ int ret;
+
+ va_start(ap, fmt);
+ ret = seq_buf_vprintf(s, fmt, ap);
+ va_end(ap);
+
+ return ret;
+}
+
+/**
+ * seq_buf_bitmask - write a bitmask array in its ASCII representation
+ * @s: seq_buf descriptor
+ * @maskp: points to an array of unsigned longs that represent a bitmask
+ * @nmaskbits: The number of bits that are valid in @maskp
+ *
+ * Writes a ASCII representation of a bitmask string into @s.
+ *
+ * Returns zero on success, -1 on overflow.
+ */
+int seq_buf_bitmask(struct seq_buf *s, const unsigned long *maskp,
+ int nmaskbits)
+{
+ unsigned int len = seq_buf_buffer_left(s);
+ int ret;
+
+ WARN_ON(s->size == 0);
+
+ /*
+ * The last byte of the buffer is used to determine if we
+ * overflowed or not.
+ */
+ if (len > 1) {
+ ret = bitmap_scnprintf(s->buffer + s->len, len, maskp, nmaskbits);
+ if (ret < len) {
+ s->len += ret;
+ return 0;
+ }
+ }
+ seq_buf_set_overflow(s);
+ return -1;
+}
+
+/**
+ * seq_buf_bprintf - Write the printf string from binary arguments
+ * @s: seq_buf descriptor
+ * @fmt: The format string for the @binary arguments
+ * @binary: The binary arguments for @fmt.
+ *
+ * When recording in a fast path, a printf may be recorded with just
+ * saving the format and the arguments as they were passed to the
+ * function, instead of wasting cycles converting the arguments into
+ * ASCII characters. Instead, the arguments are saved in a 32 bit
+ * word array that is defined by the format string constraints.
+ *
+ * This function will take the format and the binary array and finish
+ * the conversion into the ASCII string within the buffer.
+ *
+ * Returns zero on success, -1 on overflow.
+ */
+int seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary)
+{
+ unsigned int len = seq_buf_buffer_left(s);
+ int ret;
+
+ WARN_ON(s->size == 0);
+
+ if (s->len < s->size) {
+ ret = bstr_printf(s->buffer + s->len, len, fmt, binary);
+ if (s->len + ret < s->size) {
+ s->len += ret;
+ return 0;
+ }
+ }
+ seq_buf_set_overflow(s);
+ return -1;
+}
+
+/**
+ * seq_buf_puts - sequence printing of simple string
+ * @s: seq_buf descriptor
+ * @str: simple string to record
+ *
+ * Copy a simple string into the sequence buffer.
+ *
+ * Returns zero on success, -1 on overflow
+ */
+int seq_buf_puts(struct seq_buf *s, const char *str)
+{
+ unsigned int len = strlen(str);
+
+ WARN_ON(s->size == 0);
+
+ if (s->len + len < s->size) {
+ memcpy(s->buffer + s->len, str, len);
+ s->len += len;
+ return 0;
+ }
+ seq_buf_set_overflow(s);
+ return -1;
+}
+
+/**
+ * seq_buf_putc - sequence printing of simple character
+ * @s: seq_buf descriptor
+ * @c: simple character to record
+ *
+ * Copy a single character into the sequence buffer.
+ *
+ * Returns zero on success, -1 on overflow
+ */
+int seq_buf_putc(struct seq_buf *s, unsigned char c)
+{
+ WARN_ON(s->size == 0);
+
+ if (s->len + 1 < s->size) {
+ s->buffer[s->len++] = c;
+ return 0;
+ }
+ seq_buf_set_overflow(s);
+ return -1;
+}
+
+/**
+ * seq_buf_putmem - write raw data into the sequenc buffer
+ * @s: seq_buf descriptor
+ * @mem: The raw memory to copy into the buffer
+ * @len: The length of the raw memory to copy (in bytes)
+ *
+ * There may be cases where raw memory needs to be written into the
+ * buffer and a strcpy() would not work. Using this function allows
+ * for such cases.
+ *
+ * Returns zero on success, -1 on overflow
+ */
+int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len)
+{
+ WARN_ON(s->size == 0);
+
+ if (s->len + len < s->size) {
+ memcpy(s->buffer + s->len, mem, len);
+ s->len += len;
+ return 0;
+ }
+ seq_buf_set_overflow(s);
+ return -1;
+}
+
+#define MAX_MEMHEX_BYTES 8U
+#define HEX_CHARS (MAX_MEMHEX_BYTES*2 + 1)
+
+/**
+ * seq_buf_putmem_hex - write raw memory into the buffer in ASCII hex
+ * @s: seq_buf descriptor
+ * @mem: The raw memory to write its hex ASCII representation of
+ * @len: The length of the raw memory to copy (in bytes)
+ *
+ * This is similar to seq_buf_putmem() except instead of just copying the
+ * raw memory into the buffer it writes its ASCII representation of it
+ * in hex characters.
+ *
+ * Returns zero on success, -1 on overflow
+ */
+int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
+ unsigned int len)
+{
+ unsigned char hex[HEX_CHARS];
+ const unsigned char *data = mem;
+ unsigned int start_len;
+ int i, j;
+
+ WARN_ON(s->size == 0);
+
+ while (len) {
+ start_len = min(len, HEX_CHARS - 1);
+#ifdef __BIG_ENDIAN
+ for (i = 0, j = 0; i < start_len; i++) {
+#else
+ for (i = start_len-1, j = 0; i >= 0; i--) {
+#endif
+ hex[j++] = hex_asc_hi(data[i]);
+ hex[j++] = hex_asc_lo(data[i]);
+ }
+ if (WARN_ON_ONCE(j == 0 || j/2 > len))
+ break;
+
+ /* j increments twice per loop */
+ len -= j / 2;
+ hex[j++] = ' ';
+
+ seq_buf_putmem(s, hex, j);
+ if (seq_buf_has_overflowed(s))
+ return -1;
+ }
+ return 0;
+}
+
+/**
+ * seq_buf_path - copy a path into the sequence buffer
+ * @s: seq_buf descriptor
+ * @path: path to write into the sequence buffer.
+ *
+ * Write a path name into the sequence buffer.
+ *
+ * Returns zero on success, -1 on overflow
+ */
+int seq_buf_path(struct seq_buf *s, const struct path *path)
+{
+ unsigned int len = seq_buf_buffer_left(s);
+ unsigned char *p;
+
+ WARN_ON(s->size == 0);
+
+ p = d_path(path, s->buffer + s->len, len);
+ if (!IS_ERR(p)) {
+ p = mangle_path(s->buffer + s->len, p, "\n");
+ if (p) {
+ s->len = p - s->buffer;
+ return 0;
+ }
+ }
+ seq_buf_set_overflow(s);
+ return -1;
+}
+
+/**
+ * seq_buf_to_user - copy the squence buffer to user space
+ * @s: seq_buf descriptor
+ * @ubuf: The userspace memory location to copy to
+ * @cnt: The amount to copy
+ *
+ * Copies the sequence buffer into the userspace memory pointed to
+ * by @ubuf. It starts from the last read position (@s->readpos)
+ * and writes up to @cnt characters or till it reaches the end of
+ * the content in the buffer (@s->len), which ever comes first.
+ *
+ * On success, it returns a positive number of the number of bytes
+ * it copied.
+ *
+ * On failure it returns -EBUSY if all of the content in the
+ * sequence has been already read, which includes nothing in the
+ * sequence (@s->len == @s->readpos).
+ *
+ * Returns -EFAULT if the copy to userspace fails.
+ */
+int seq_buf_to_user(struct seq_buf *s, char __user *ubuf, int cnt)
+{
+ int len;
+ int ret;
+
+ if (!cnt)
+ return 0;
+
+ if (s->len <= s->readpos)
+ return -EBUSY;
+
+ len = s->len - s->readpos;
+ if (cnt > len)
+ cnt = len;
+ ret = copy_to_user(ubuf, s->buffer + s->readpos, cnt);
+ if (ret == cnt)
+ return -EFAULT;
+
+ cnt -= ret;
+
+ s->readpos += cnt;
+ return cnt;
+}
--
1.8.3.4

2015-05-14 11:41:54

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 02/17] tracing: Convert seq_buf_path() to be like seq_path()

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit dd23180aacf4b27d48f40b27249f1e58c8df03be upstream.

Rewrite seq_buf_path() like it is done in seq_path() and allow
it to accept any escape character instead of just "\n".

Making seq_buf_path() like seq_path() will help prevent problems
when converting seq_file to use the seq_buf logic.

Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]

Tested-by: Jiri Kosina <[email protected]>
Acked-by: Jiri Kosina <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable
- only backport seq_buf related
]
Singed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/seq_buf.h | 2 +-
kernel/trace/seq_buf.c | 28 ++++++++++++++++------------
2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/linux/seq_buf.h b/include/linux/seq_buf.h
index 4f7a96a..3877068 100644
--- a/include/linux/seq_buf.h
+++ b/include/linux/seq_buf.h
@@ -73,7 +73,7 @@ extern int seq_buf_putc(struct seq_buf *s, unsigned char c);
extern int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len);
extern int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
unsigned int len);
-extern int seq_buf_path(struct seq_buf *s, const struct path *path);
+extern int seq_buf_path(struct seq_buf *s, const struct path *path, const char *esc);

extern int seq_buf_bitmask(struct seq_buf *s, const unsigned long *maskp,
int nmaskbits);
diff --git a/kernel/trace/seq_buf.c b/kernel/trace/seq_buf.c
index e9a7861..7dac34d 100644
--- a/kernel/trace/seq_buf.c
+++ b/kernel/trace/seq_buf.c
@@ -272,28 +272,32 @@ int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
* seq_buf_path - copy a path into the sequence buffer
* @s: seq_buf descriptor
* @path: path to write into the sequence buffer.
+ * @esc: set of characters to escape in the output
*
* Write a path name into the sequence buffer.
*
- * Returns zero on success, -1 on overflow
+ * Returns the number of written bytes on success, -1 on overflow
*/
-int seq_buf_path(struct seq_buf *s, const struct path *path)
+int seq_buf_path(struct seq_buf *s, const struct path *path, const char *esc)
{
- unsigned int len = seq_buf_buffer_left(s);
- unsigned char *p;
+ char *buf = s->buffer + s->len;
+ size_t size = seq_buf_buffer_left(s);
+ int res = -1;

WARN_ON(s->size == 0);

- p = d_path(path, s->buffer + s->len, len);
- if (!IS_ERR(p)) {
- p = mangle_path(s->buffer + s->len, p, "\n");
- if (p) {
- s->len = p - s->buffer;
- return 0;
+ if (size) {
+ char *p = d_path(path, buf, size);
+ if (!IS_ERR(p)) {
+ char *end = mangle_path(buf, p, esc);
+ if (end)
+ res = end - buf;
}
}
- seq_buf_set_overflow(s);
- return -1;
+ if (res > 0)
+ s->len += res;
+
+ return res;
}

/**
--
1.8.3.4

2015-05-14 11:40:34

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 03/17] tracing: Convert seq_buf fields to be like seq_file fields

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 9a7777935c34b9192e28ef3d232a4b6b5414a657 upstream.

In facilitating the conversion of seq_file to use seq_buf,
have the seq_buf fields match the types used by seq_file.

Link: http://lkml.kernel.org/r/[email protected]

Tested-by: Jiri Kosina <[email protected]>
Acked-by: Jiri Kosina <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/seq_buf.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/seq_buf.h b/include/linux/seq_buf.h
index 3877068..d14dc90 100644
--- a/include/linux/seq_buf.h
+++ b/include/linux/seq_buf.h
@@ -16,10 +16,10 @@
* @readpos: The next position to read in the buffer.
*/
struct seq_buf {
- unsigned char *buffer;
- unsigned int size;
- unsigned int len;
- unsigned int readpos;
+ char *buffer;
+ size_t size;
+ size_t len;
+ loff_t readpos;
};

static inline void
--
1.8.3.4

2015-05-14 11:39:30

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 04/17] tracing: Add a seq_buf_clear() helper and clear len and readpos in init

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 0736c033a81547b1cdc5120fc8dd60e26a00fd28 upstream.

Add a helper function seq_buf_clear() that resets the len and readpos
fields of a seq_buf. Currently it is only used in the seq_buf_init()
but will be used later when updating the seq_file code.

Link: http://lkml.kernel.org/r/[email protected]

Tested-by: Jiri Kosina <[email protected]>
Acked-by: Jiri Kosina <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/seq_buf.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/seq_buf.h b/include/linux/seq_buf.h
index d14dc90..5d91262 100644
--- a/include/linux/seq_buf.h
+++ b/include/linux/seq_buf.h
@@ -22,13 +22,18 @@ struct seq_buf {
loff_t readpos;
};

+static inline void seq_buf_clear(struct seq_buf *s)
+{
+ s->len = 0;
+ s->readpos = 0;
+}
+
static inline void
seq_buf_init(struct seq_buf *s, unsigned char *buf, unsigned int size)
{
s->buffer = buf;
s->size = size;
- s->len = 0;
- s->readpos = 0;
+ seq_buf_clear(s);
}

/*
--
1.8.3.4

2015-05-14 11:40:57

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 05/17] seq_buf: Create seq_buf_used() to find out how much was written

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit eeab98154dc0b49afd398afdd71c464a8af5911f upstream.

Add a helper function seq_buf_used() that replaces the SEQ_BUF_USED()
private macro to let callers have a method to know how much of the
seq_buf was written to.

Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]

Reviewed-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/seq_buf.h | 6 ++++++
kernel/trace/seq_buf.c | 5 +----
2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/linux/seq_buf.h b/include/linux/seq_buf.h
index 5d91262..93718e5 100644
--- a/include/linux/seq_buf.h
+++ b/include/linux/seq_buf.h
@@ -64,6 +64,12 @@ seq_buf_buffer_left(struct seq_buf *s)
return (s->size - 1) - s->len;
}

+/* How much buffer was written? */
+static inline unsigned int seq_buf_used(struct seq_buf *s)
+{
+ return min(s->len, s->size);
+}
+
extern __printf(2, 3)
int seq_buf_printf(struct seq_buf *s, const char *fmt, ...);
extern __printf(2, 0)
diff --git a/kernel/trace/seq_buf.c b/kernel/trace/seq_buf.c
index 7dac34d..9ec5305 100644
--- a/kernel/trace/seq_buf.c
+++ b/kernel/trace/seq_buf.c
@@ -16,9 +16,6 @@
#include <linux/seq_file.h>
#include <linux/seq_buf.h>

-/* How much buffer is written? */
-#define SEQ_BUF_USED(s) min((s)->len, (s)->size - 1)
-
/**
* seq_buf_print_seq - move the contents of seq_buf into a seq_file
* @m: the seq_file descriptor that is the destination
@@ -28,7 +25,7 @@
*/
int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s)
{
- unsigned int len = SEQ_BUF_USED(s);
+ unsigned int len = seq_buf_used(s);

return seq_write(m, s->buffer, len);
}
--
1.8.3.4

2015-05-14 11:40:59

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 06/17] tracing: Use trace_seq_used() and seq_buf_used() instead of len

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 5ac48378414dccca735897c4d7f4e19987c8977c upstream.

As the seq_buf->len will soon be +1 size when there's an overflow, we
must use trace_seq_used() or seq_buf_used() methods to get the real
length. This will prevent buffer overflow issues if just the len
of the seq_buf descriptor is used to copy memory.

Link: http://lkml.kernel.org/r/[email protected]

Reported-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable
- only backport seq_buf related
]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
kernel/trace/seq_buf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/seq_buf.c b/kernel/trace/seq_buf.c
index 9ec5305..ce17f65 100644
--- a/kernel/trace/seq_buf.c
+++ b/kernel/trace/seq_buf.c
@@ -328,7 +328,7 @@ int seq_buf_to_user(struct seq_buf *s, char __user *ubuf, int cnt)
if (s->len <= s->readpos)
return -EBUSY;

- len = s->len - s->readpos;
+ len = seq_buf_used(s) - s->readpos;
if (cnt > len)
cnt = len;
ret = copy_to_user(ubuf, s->buffer + s->readpos, cnt);
--
1.8.3.4

2015-05-14 11:39:40

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 07/17] seq_buf: Add seq_buf_can_fit() helper function

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 9b77215382b42ef9c5b34293ad3a95332e5b71ef upsteam.

Add a seq_buf_can_fit() helper function that removes the possible mistakes
of comparing the seq_buf length plus added data compared to the size of
the buffer.

Link: http://lkml.kernel.org/r/[email protected]

Reviewed-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
kernel/trace/seq_buf.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/seq_buf.c b/kernel/trace/seq_buf.c
index ce17f65..6fc9d02 100644
--- a/kernel/trace/seq_buf.c
+++ b/kernel/trace/seq_buf.c
@@ -17,6 +17,19 @@
#include <linux/seq_buf.h>

/**
+ * seq_buf_can_fit - can the new data fit in the current buffer?
+ * @s: the seq_buf descriptor
+ * @len: The length to see if it can fit in the current buffer
+ *
+ * Returns true if there's enough unused space in the seq_buf buffer
+ * to fit the amount of new data according to @len.
+ */
+static bool seq_buf_can_fit(struct seq_buf *s, size_t len)
+{
+ return s->len + len < s->size;
+}
+
+/**
* seq_buf_print_seq - move the contents of seq_buf into a seq_file
* @m: the seq_file descriptor that is the destination
* @s: the seq_buf descriptor that is the source.
@@ -48,7 +61,7 @@ int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args)

if (s->len < s->size) {
len = vsnprintf(s->buffer + s->len, s->size - s->len, fmt, args);
- if (s->len + len < s->size) {
+ if (seq_buf_can_fit(s, len)) {
s->len += len;
return 0;
}
@@ -137,7 +150,7 @@ int seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary)

if (s->len < s->size) {
ret = bstr_printf(s->buffer + s->len, len, fmt, binary);
- if (s->len + ret < s->size) {
+ if (seq_buf_can_fit(s, ret)) {
s->len += ret;
return 0;
}
@@ -161,7 +174,7 @@ int seq_buf_puts(struct seq_buf *s, const char *str)

WARN_ON(s->size == 0);

- if (s->len + len < s->size) {
+ if (seq_buf_can_fit(s, len)) {
memcpy(s->buffer + s->len, str, len);
s->len += len;
return 0;
@@ -183,7 +196,7 @@ int seq_buf_putc(struct seq_buf *s, unsigned char c)
{
WARN_ON(s->size == 0);

- if (s->len + 1 < s->size) {
+ if (seq_buf_can_fit(s, 1)) {
s->buffer[s->len++] = c;
return 0;
}
@@ -207,7 +220,7 @@ int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len)
{
WARN_ON(s->size == 0);

- if (s->len + len < s->size) {
+ if (seq_buf_can_fit(s, len)) {
memcpy(s->buffer + s->len, mem, len);
s->len += len;
return 0;
--
1.8.3.4

2015-05-14 11:39:27

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 08/17] tracing: Have seq_buf use full buffer

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 8cd709ae7658a7fd7f6630699e3229188c2591e4 upstream.

Currently seq_buf is full when all but one byte of the buffer is
filled. Change it so that the seq_buf is full when all of the
buffer is filled.

Some of the functions would fill the buffer completely and report
everything was fine. This was inconsistent with the max of size - 1.
Changing this to be max of size makes all functions consistent.

Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]

Tested-by: Jiri Kosina <[email protected]>
Acked-by: Jiri Kosina <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/seq_buf.h | 6 +++---
kernel/trace/seq_buf.c | 9 ++++++---
2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/linux/seq_buf.h b/include/linux/seq_buf.h
index 93718e5..0800a24 100644
--- a/include/linux/seq_buf.h
+++ b/include/linux/seq_buf.h
@@ -43,13 +43,13 @@ seq_buf_init(struct seq_buf *s, unsigned char *buf, unsigned int size)
static inline bool
seq_buf_has_overflowed(struct seq_buf *s)
{
- return s->len == s->size;
+ return s->len > s->size;
}

static inline void
seq_buf_set_overflow(struct seq_buf *s)
{
- s->len = s->size;
+ s->len = s->size + 1;
}

/*
@@ -61,7 +61,7 @@ seq_buf_buffer_left(struct seq_buf *s)
if (seq_buf_has_overflowed(s))
return 0;

- return (s->size - 1) - s->len;
+ return s->size - s->len;
}

/* How much buffer was written? */
diff --git a/kernel/trace/seq_buf.c b/kernel/trace/seq_buf.c
index 6fc9d02..c53f1d5 100644
--- a/kernel/trace/seq_buf.c
+++ b/kernel/trace/seq_buf.c
@@ -26,7 +26,7 @@
*/
static bool seq_buf_can_fit(struct seq_buf *s, size_t len)
{
- return s->len + len < s->size;
+ return s->len + len <= s->size;
}

/**
@@ -110,8 +110,11 @@ int seq_buf_bitmask(struct seq_buf *s, const unsigned long *maskp,
WARN_ON(s->size == 0);

/*
- * The last byte of the buffer is used to determine if we
- * overflowed or not.
+ * Note, because bitmap_scnprintf() only returns the number of bytes
+ * written and not the number that would be written, we use the last
+ * byte of the buffer to let us know if we overflowed. There's a small
+ * chance that the bitmap could have fit exactly inside the buffer, but
+ * it's not that critical if that does happen.
*/
if (len > 1) {
ret = bitmap_scnprintf(s->buffer + s->len, len, maskp, nmaskbits);
--
1.8.3.4

2015-05-14 11:42:54

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 09/17] tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 01cb06a4c229908d239149017049fdd1fca1dd51 upstream.

Add two helper functions; seq_buf_get_buf() and seq_buf_commit() that
are used by seq_buf_path(). This makes the code similar to the
seq_file: seq_path() function, and will help to be able to consolidate
the functions between seq_file and trace_seq.

Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]

Tested-by: Jiri Kosina <[email protected]>
Acked-by: Jiri Kosina <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/seq_buf.h | 41 +++++++++++++++++++++++++++++++++++++++++
kernel/trace/seq_buf.c | 7 +++----
2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/include/linux/seq_buf.h b/include/linux/seq_buf.h
index 0800a24..12c6428 100644
--- a/include/linux/seq_buf.h
+++ b/include/linux/seq_buf.h
@@ -70,6 +70,47 @@ static inline unsigned int seq_buf_used(struct seq_buf *s)
return min(s->len, s->size);
}

+/**
+ * seq_buf_get_buf - get buffer to write arbitrary data to
+ * @s: the seq_buf handle
+ * @bufp: the beginning of the buffer is stored here
+ *
+ * Return the number of bytes available in the buffer, or zero if
+ * there's no space.
+ */
+static inline size_t seq_buf_get_buf(struct seq_buf *s, char **bufp)
+{
+ WARN_ON(s->len > s->size + 1);
+
+ if (s->len < s->size) {
+ *bufp = s->buffer + s->len;
+ return s->size - s->len;
+ }
+
+ *bufp = NULL;
+ return 0;
+}
+
+/**
+ * seq_buf_commit - commit data to the buffer
+ * @s: the seq_buf handle
+ * @num: the number of bytes to commit
+ *
+ * Commit @num bytes of data written to a buffer previously acquired
+ * by seq_buf_get. To signal an error condition, or that the data
+ * didn't fit in the available space, pass a negative @num value.
+ */
+static inline void seq_buf_commit(struct seq_buf *s, int num)
+{
+ if (num < 0) {
+ seq_buf_set_overflow(s);
+ } else {
+ /* num must be negative on overflow */
+ BUG_ON(s->len + num > s->size);
+ s->len += num;
+ }
+}
+
extern __printf(2, 3)
int seq_buf_printf(struct seq_buf *s, const char *fmt, ...);
extern __printf(2, 0)
diff --git a/kernel/trace/seq_buf.c b/kernel/trace/seq_buf.c
index c53f1d5..086f594 100644
--- a/kernel/trace/seq_buf.c
+++ b/kernel/trace/seq_buf.c
@@ -293,8 +293,8 @@ int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
*/
int seq_buf_path(struct seq_buf *s, const struct path *path, const char *esc)
{
- char *buf = s->buffer + s->len;
- size_t size = seq_buf_buffer_left(s);
+ char *buf;
+ size_t size = seq_buf_get_buf(s, &buf);
int res = -1;

WARN_ON(s->size == 0);
@@ -307,8 +307,7 @@ int seq_buf_path(struct seq_buf *s, const struct path *path, const char *esc)
res = end - buf;
}
}
- if (res > 0)
- s->len += res;
+ seq_buf_commit(s, res);

return res;
}
--
1.8.3.4

2015-05-14 11:39:46

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 10/17] seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 2448913ed2aa7a7424d9b9ca79861d13c746a3f1 upstream.

The function bstr_printf() from lib/vsprnintf.c is only available if
CONFIG_BINARY_PRINTF is defined. This is due to the only user currently
being the tracing infrastructure, which needs to select this config
when tracing is configured. Until there is another user of the binary
printf formats, this will continue to be the case.

Since seq_buf.c is now lives in lib/ and is compiled even without
tracing, it must encompass its use of bstr_printf() which is used
by seq_buf_printf(). This too is only used by the tracing infrastructure
and is still encapsulated by the CONFIG_BINARY_PRINTF.

Link: http://lkml.kernel.org/r/[email protected]

Tested-by: Jiri Kosina <[email protected]>
Acked-by: Jiri Kosina <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/seq_buf.h | 7 +++++--
kernel/trace/seq_buf.c | 2 ++
2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/seq_buf.h b/include/linux/seq_buf.h
index 12c6428..9aafe0e 100644
--- a/include/linux/seq_buf.h
+++ b/include/linux/seq_buf.h
@@ -115,8 +115,6 @@ extern __printf(2, 3)
int seq_buf_printf(struct seq_buf *s, const char *fmt, ...);
extern __printf(2, 0)
int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args);
-extern int
-seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary);
extern int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s);
extern int seq_buf_to_user(struct seq_buf *s, char __user *ubuf,
int cnt);
@@ -130,4 +128,9 @@ extern int seq_buf_path(struct seq_buf *s, const struct path *path, const char *
extern int seq_buf_bitmask(struct seq_buf *s, const unsigned long *maskp,
int nmaskbits);

+#ifdef CONFIG_BINARY_PRINTF
+extern int
+seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary);
+#endif
+
#endif /* _LINUX_SEQ_BUF_H */
diff --git a/kernel/trace/seq_buf.c b/kernel/trace/seq_buf.c
index 086f594..4eedfed 100644
--- a/kernel/trace/seq_buf.c
+++ b/kernel/trace/seq_buf.c
@@ -127,6 +127,7 @@ int seq_buf_bitmask(struct seq_buf *s, const unsigned long *maskp,
return -1;
}

+#ifdef CONFIG_BINARY_PRINTF
/**
* seq_buf_bprintf - Write the printf string from binary arguments
* @s: seq_buf descriptor
@@ -161,6 +162,7 @@ int seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary)
seq_buf_set_overflow(s);
return -1;
}
+#endif /* CONFIG_BINARY_PRINTF */

/**
* seq_buf_puts - sequence printing of simple string
--
1.8.3.4

2015-05-14 11:39:36

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 11/17] seq_buf: Move the seq_buf code to lib/

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 8d58e99af5980d444948720977b0976455885391 upstream.

The seq_buf functions are rather useful outside of tracing. Instead
of having it be dependent on CONFIG_TRACING, move the code into lib/
and allow other users to have access to it even when tracing is not
configured.

The seq_buf utility is similar to the seq_file utility, but instead of
writing sending data back up to userland, it writes it into a buffer
defined at seq_buf_init(). This allows us to send a descriptor around
that writes printf() formatted strings into it that can be retrieved
later.

It is currently used by the tracing facility for such things like trace
events to convert its binary saved data in the ring buffer into an
ASCII human readable context to be displayed in /sys/kernel/debug/trace.

It can also be used for doing NMI prints safely from NMI context into
the seq_buf and retrieved later and dumped to printk() safely. Doing
printk() from an NMI context is dangerous because an NMI can preempt
a current printk() and deadlock on it.

Link: http://lkml.kernel.org/p/[email protected]

Tested-by: Jiri Kosina <[email protected]>
Acked-by: Jiri Kosina <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable
- adjust context
]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
kernel/trace/Makefile | 1 -
lib/Makefile | 2 +-
{kernel/trace => lib}/seq_buf.c | 0
3 files changed, 1 insertion(+), 2 deletions(-)
rename {kernel/trace => lib}/seq_buf.c (100%)

diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index cfb86f1..d7e2068 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -27,7 +27,6 @@ obj-$(CONFIG_RING_BUFFER_BENCHMARK) += ring_buffer_benchmark.o

obj-$(CONFIG_TRACING) += trace.o
obj-$(CONFIG_TRACING) += trace_output.o
-obj-$(CONFIG_TRACING) += seq_buf.o
obj-$(CONFIG_TRACING) += trace_stat.o
obj-$(CONFIG_TRACING) += trace_printk.o
obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
diff --git a/lib/Makefile b/lib/Makefile
index 9efe480..4ecd2e4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
is_single_threaded.o plist.o decompress.o kobject_uevent.o \
- earlycpio.o
+ earlycpio.o seq_buf.o

obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
lib-$(CONFIG_MMU) += ioremap.o
diff --git a/kernel/trace/seq_buf.c b/lib/seq_buf.c
similarity index 100%
rename from kernel/trace/seq_buf.c
rename to lib/seq_buf.c
--
1.8.3.4

2015-05-14 11:42:58

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 12/17] seq_buf: Fix seq_buf_vprintf() truncation

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 4a8fe4e1811c96ad0ad9f4083f2fe4fb43b2988d upstream.

In seq_buf_vprintf(), vsnprintf() is used to copy the format into the
buffer remaining in the seq_buf structure. The return of vsnprintf()
is the amount of characters written to the buffer excluding the '\0',
unless the line was truncated!

If the line copied does not fit, it is truncated, and a '\0' is added
to the end of the buffer. But in this case, '\0' is included in the length
of the line written. To know if the buffer had overflowed, the return
length will be the same as the length of the buffer passed in.

The check in seq_buf_vprintf() only checked if the length returned from
vsnprintf() would fit in the buffer, as the seq_buf_vprintf() is only
to be an all or nothing command. It either writes all the string into
the seq_buf, or none of it. If the string is truncated, the pointers
inside the seq_buf must be reset to what they were when the function was
called. This is not the case. On overflow, it copies only part of the string.

The fix is to change the overflow check to see if the length returned from
vsnprintf() is less than the length remaining in the seq_buf buffer, and not
if it is less than or equal to as it currently does. Then seq_buf_vprintf()
will know if the write from vsnpritnf() was truncated or not.

Cc: [email protected]
[wanglong: backport to 3.10 stable]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
lib/seq_buf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 4eedfed..795dd94 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -61,7 +61,7 @@ int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args)

if (s->len < s->size) {
len = vsnprintf(s->buffer + s->len, s->size - s->len, fmt, args);
- if (seq_buf_can_fit(s, len)) {
+ if (s->len + len < s->size) {
s->len += len;
return 0;
}
--
1.8.3.4

2015-05-14 11:39:53

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 13/17] seq_buf: Fix seq_buf_bprintf() truncation

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 4d4eb4d4fbd9403682e2b75117b6b895531d8e01 upstream.

In seq_buf_bprintf(), bstr_printf() is used to copy the format into the
buffer remaining in the seq_buf structure. The return of bstr_printf()
is the amount of characters written to the buffer excluding the '\0',
unless the line was truncated!

If the line copied does not fit, it is truncated, and a '\0' is added
to the end of the buffer. But in this case, '\0' is included in the length
of the line written. To know if the buffer had overflowed, the return
length will be the same or greater than the length of the buffer passed in.

The check in seq_buf_bprintf() only checked if the length returned from
bstr_printf() would fit in the buffer, as the seq_buf_bprintf() is only
to be an all or nothing command. It either writes all the string into
the seq_buf, or none of it. If the string is truncated, the pointers
inside the seq_buf must be reset to what they were when the function was
called. This is not the case. On overflow, it copies only part of the string.

The fix is to change the overflow check to see if the length returned from
bstr_printf() is less than the length remaining in the seq_buf buffer, and not
if it is less than or equal to as it currently does. Then seq_buf_bprintf()
will know if the write from bstr_printf() was truncated or not.

Link: http://lkml.kernel.org/r/[email protected]

Cc: [email protected]
Reported-by: Joe Perches <[email protected]>
[wanglong: backport to 3.10 stable]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
lib/seq_buf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 795dd94..f25c33b 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -154,7 +154,7 @@ int seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary)

if (s->len < s->size) {
ret = bstr_printf(s->buffer + s->len, len, fmt, binary);
- if (seq_buf_can_fit(s, ret)) {
+ if (s->len + ret < s->size) {
s->len += ret;
return 0;
}
--
1.8.3.4

2015-05-14 11:41:18

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 14/17] printk: Add per_cpu printk func to allow printk to be diverted

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit afdc34a3d3b823a12a93b822ee1efb566f884032 upstream.

Being able to divert printk to call another function besides the normal
logging is useful for such things like NMI handling. If some functions
are to be called from NMI that does printk() it is possible to lock up
the box if the nmi handler triggers when another printk is happening.

One example of this use is to perform a stack trace on all CPUs via NMI.
But if the NMI is to do the printk() it can cause the system to lock up.
By allowing the printk to be diverted to another function that can safely
record the printk output and then print it when it in a safe context
then NMIs will be safe to call these functions like show_regs().

Link: http://lkml.kernel.org/p/[email protected]

Tested-by: Jiri Kosina <[email protected]>
Acked-by: Jiri Kosina <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable
- adjust context
]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/percpu.h | 3 +++
include/linux/printk.h | 2 ++
kernel/printk.c | 38 +++++++++++++++++++++++++++++---------
3 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index cc88172..bb1d29c 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -755,4 +755,7 @@ do { \
__pcpu_double_call_return_bool(__this_cpu_cmpxchg_double_, (pcp1), (pcp2), (oval1), (oval2), (nval1), (nval2))
#endif

+/* To avoid include hell, as printk can not declare this, we declare it here */
+DECLARE_PER_CPU(printk_func_t, printk_func);
+
#endif /* __LINUX_PERCPU_H */
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 708b8a8..ebc7f3e 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -144,6 +144,8 @@ extern int kptr_restrict;

extern void wake_up_klogd(void);

+typedef int(*printk_func_t)(const char *fmt, va_list args);
+
void log_buf_kexec_setup(void);
void __init setup_log_buf(int early);
void dump_stack_set_arch_desc(const char *fmt, ...);
diff --git a/kernel/printk.c b/kernel/printk.c
index fd0154a..d3d929d 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1657,6 +1657,30 @@ asmlinkage int printk_emit(int facility, int level,
}
EXPORT_SYMBOL(printk_emit);

+int vprintk_default(const char *fmt, va_list args)
+{
+ int r;
+
+#ifdef CONFIG_KGDB_KDB
+ if (unlikely(kdb_trap_printk)) {
+ r = vkdb_printf(fmt, args);
+ return r;
+ }
+#endif
+ r = vprintk_emit(0, -1, NULL, 0, fmt, args);
+
+ return r;
+}
+EXPORT_SYMBOL_GPL(vprintk_default);
+
+/*
+ * This allows printk to be diverted to another function per cpu.
+ * This is useful for calling printk functions from within NMI
+ * without worrying about race conditions that can lock up the
+ * box.
+ */
+DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
+
/**
* printk - print a kernel message
* @fmt: format string
@@ -1680,19 +1704,15 @@ EXPORT_SYMBOL(printk_emit);
*/
asmlinkage int printk(const char *fmt, ...)
{
+ printk_func_t vprintk_func;
va_list args;
int r;

-#ifdef CONFIG_KGDB_KDB
- if (unlikely(kdb_trap_printk)) {
- va_start(args, fmt);
- r = vkdb_printf(fmt, args);
- va_end(args);
- return r;
- }
-#endif
va_start(args, fmt);
- r = vprintk_emit(0, -1, NULL, 0, fmt, args);
+ preempt_disable();
+ vprintk_func = this_cpu_read(printk_func);
+ r = vprintk_func(fmt, args);
+ preempt_enable();
va_end(args);

return r;
--
1.8.3.4

2015-05-14 11:41:25

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 15/17] printk/percpu: Define printk_func when printk is not defined

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit 04b74b27c2941e5d62120f6fee3a0a9388a30613 upstream.

To avoid include hell, the per_cpu variable printk_func was declared
in percpu.h. But it is only defined if printk is defined.

As users of printk may also use the printk_func variable, it needs to
be defined even if CONFIG_PRINTK is not.

Also add a printk.h include in percpu.h just to be safe.

Link: http://lkml.kernel.org/r/[email protected]

Reported-by: Stephen Rothwell <[email protected]>
[wanglong: backport to 3.10 stable
- adjust context
]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/percpu.h | 1 +
include/linux/printk.h | 4 ++--
kernel/printk.c | 3 +++
3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index bb1d29c..60e425e 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -4,6 +4,7 @@
#include <linux/preempt.h>
#include <linux/smp.h>
#include <linux/cpumask.h>
+#include <linux/printk.h>
#include <linux/pfn.h>
#include <linux/init.h>

diff --git a/include/linux/printk.h b/include/linux/printk.h
index ebc7f3e..14b4098 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -106,6 +106,8 @@ static inline __printf(1, 2) __cold
void early_printk(const char *s, ...) { }
#endif

+typedef int(*printk_func_t)(const char *fmt, va_list args);
+
#ifdef CONFIG_PRINTK
asmlinkage __printf(5, 0)
int vprintk_emit(int facility, int level,
@@ -144,8 +146,6 @@ extern int kptr_restrict;

extern void wake_up_klogd(void);

-typedef int(*printk_func_t)(const char *fmt, va_list args);
-
void log_buf_kexec_setup(void);
void __init setup_log_buf(int early);
void dump_stack_set_arch_desc(const char *fmt, ...);
diff --git a/kernel/printk.c b/kernel/printk.c
index d3d929d..7109b38 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1746,6 +1746,9 @@ static size_t msg_print_text(const struct log *msg, enum log_flags prev,
bool syslog, char *buf, size_t size) { return 0; }
static size_t cont_print_text(char *text, size_t size) { return 0; }

+/* Still needs to be defined for users */
+DEFINE_PER_CPU(printk_func_t, printk_func);
+
#endif /* CONFIG_PRINTK */

#ifdef CONFIG_EARLY_PRINTK
--
1.8.3.4

2015-05-14 11:42:50

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 16/17] x86/nmi: Perform a safe NMI stack trace on all CPUs

From: "Steven Rostedt (Red Hat)" <[email protected]>

commit a9edc88093287183ac934be44f295f183b2c62dd upstream.

When trigger_all_cpu_backtrace() is called on x86, it will trigger an
NMI on each CPU and call show_regs(). But this can lead to a hard lock
up if the NMI comes in on another printk().

In order to avoid this, when the NMI triggers, it switches the printk
routine for that CPU to call a NMI safe printk function that records the
printk in a per_cpu seq_buf descriptor. After all NMIs have finished
recording its data, the seq_bufs are printed in a safe context.

Link: http://lkml.kernel.org/p/[email protected]
Link: http://lkml.kernel.org/r/[email protected]

Tested-by: Jiri Kosina <[email protected]>
Acked-by: Jiri Kosina <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
[wanglong: backport to 3.10 stable
- adjust context
]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
arch/x86/kernel/apic/hw_nmi.c | 86 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 83 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index a698d71..1eb5f90 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -18,6 +18,7 @@
#include <linux/nmi.h>
#include <linux/module.h>
#include <linux/delay.h>
+#include <linux/seq_buf.h>

#ifdef CONFIG_HARDLOCKUP_DETECTOR
u64 hw_nmi_get_sample_period(int watchdog_thresh)
@@ -29,12 +30,33 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
#ifdef arch_trigger_all_cpu_backtrace
/* For reliability, we're prepared to waste bits here. */
static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
+static cpumask_var_t printtrace_mask;
+
+#define NMI_BUF_SIZE 4096
+
+struct nmi_seq_buf {
+ unsigned char buffer[NMI_BUF_SIZE];
+ struct seq_buf seq;
+};
+
+/* Safe printing in NMI context */
+static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);

/* "in progress" flag of arch_trigger_all_cpu_backtrace */
static unsigned long backtrace_flag;

+static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
+{
+ const char *buf = s->buffer + start;
+
+ printk("%.*s", (end - start) + 1, buf);
+}
+
void arch_trigger_all_cpu_backtrace(void)
{
+ struct nmi_seq_buf *s;
+ int len;
+ int cpu;
int i;

if (test_and_set_bit(0, &backtrace_flag))
@@ -45,6 +67,15 @@ void arch_trigger_all_cpu_backtrace(void)
return;

cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
+ cpumask_copy(printtrace_mask, to_cpumask(backtrace_mask));
+ /*
+ * Set up per_cpu seq_buf buffers that the NMIs running on the other
+ * CPUs will write to.
+ */
+ for_each_cpu(cpu, to_cpumask(backtrace_mask)) {
+ s = &per_cpu(nmi_print_seq, cpu);
+ seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
+ }

printk(KERN_INFO "sending NMI to all CPUs:\n");
apic->send_IPI_all(NMI_VECTOR);
@@ -56,10 +87,57 @@ void arch_trigger_all_cpu_backtrace(void)
mdelay(1);
}

+ /*
+ * Now that all the NMIs have triggered, we can dump out their
+ * back traces safely to the console.
+ */
+ for_each_cpu(cpu, printtrace_mask) {
+ int last_i = 0;
+
+ s = &per_cpu(nmi_print_seq, cpu);
+ len = seq_buf_used(&s->seq);
+ if (!len)
+ continue;
+
+ /* Print line by line. */
+ for (i = 0; i < len; i++) {
+ if (s->buffer[i] == '\n') {
+ print_seq_line(s, last_i, i);
+ last_i = i + 1;
+ }
+ }
+ /* Check if there was a partial line. */
+ if (last_i < len) {
+ print_seq_line(s, last_i, len - 1);
+ pr_cont("\n");
+ }
+ }
+
clear_bit(0, &backtrace_flag);
smp_mb__after_clear_bit();
}

+/*
+ * It is not safe to call printk() directly from NMI handlers.
+ * It may be fine if the NMI detected a lock up and we have no choice
+ * but to do so, but doing a NMI on all other CPUs to get a back trace
+ * can be done with a sysrq-l. We don't want that to lock up, which
+ * can happen if the NMI interrupts a printk in progress.
+ *
+ * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
+ * the content into a per cpu seq_buf buffer. Then when the NMIs are
+ * all done, we can safely dump the contents of the seq_buf to a printk()
+ * from a non NMI context.
+ */
+static int nmi_vprintk(const char *fmt, va_list args)
+{
+ struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
+ unsigned int len = seq_buf_used(&s->seq);
+
+ seq_buf_vprintf(&s->seq, fmt, args);
+ return seq_buf_used(&s->seq) - len;
+}
+
static int __kprobes
arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
{
@@ -68,12 +146,14 @@ arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
cpu = smp_processor_id();

if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
- static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
+ printk_func_t printk_func_save = this_cpu_read(printk_func);

- arch_spin_lock(&lock);
+ /* Replace printk to write into the NMI seq */
+ this_cpu_write(printk_func, nmi_vprintk);
printk(KERN_WARNING "NMI backtrace for cpu %d\n", cpu);
show_regs(regs);
- arch_spin_unlock(&lock);
+ this_cpu_write(printk_func, printk_func_save);
+
cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
return NMI_HANDLED;
}
--
1.8.3.4

2015-05-14 11:40:08

by Wang Long

[permalink] [raw]
Subject: [RFC PATCH 17/17] x86/nmi: Fix use of unallocated cpumask_var_t

From: Sasha Levin <[email protected]>

commit db0865543739b3edb2ee9bf340380cf4986b58ff upstream.

Commit "x86/nmi: Perform a safe NMI stack trace on all CPUs" has introduced
a cpumask_var_t variable:

+static cpumask_var_t printtrace_mask;

But never allocated it before using it, which caused a NULL ptr deref when
trying to print the stack trace:

[ 1110.296154] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1110.296169] IP: __memcpy (arch/x86/lib/memcpy_64.S:151)
[ 1110.296178] PGD 4c34b3067 PUD 4c351b067 PMD 0
[ 1110.296186] Oops: 0002 [#1] PREEMPT SMP KASAN
[ 1110.296234] Dumping ftrace buffer:
[ 1110.296330] (ftrace buffer empty)
[ 1110.296339] Modules linked in:
[ 1110.296345] CPU: 1 PID: 10538 Comm: trinity-c99 Not tainted 3.18.0-rc5-next-20141124-sasha-00058-ge2a8c09-dirty #1499
[ 1110.296348] task: ffff880152650000 ti: ffff8804c3560000 task.ti: ffff8804c3560000
[ 1110.296357] RIP: __memcpy (arch/x86/lib/memcpy_64.S:151)
[ 1110.296360] RSP: 0000:ffff8804c3563870 EFLAGS: 00010246
[ 1110.296363] RAX: 0000000000000000 RBX: ffffe8fff3c4a809 RCX: 0000000000000000
[ 1110.296366] RDX: 0000000000000008 RSI: ffffffff9e254040 RDI: 0000000000000000
[ 1110.296369] RBP: ffff8804c3563908 R08: 0000000000ffffff R09: 0000000000ffffff
[ 1110.296371] R10: 0000000000000000 R11: 0000000000000006 R12: 0000000000000000
[ 1110.296375] R13: 0000000000000000 R14: ffffffff9e254040 R15: ffffe8fff3c4a809
[ 1110.296379] FS: 00007f9e43b0b700(0000) GS:ffff880107e00000(0000) knlGS:0000000000000000
[ 1110.296382] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1110.296385] CR2: 0000000000000000 CR3: 00000004e4334000 CR4: 00000000000006a0
[ 1110.296400] Stack:
[ 1110.296406] ffffffff81b1e46c 0000000000000000 ffff880107e03fb8 000000000000000b
[ 1110.296413] ffff880107dfffc0 ffff880107e03fc0 0000000000000008 ffffffff93f2e9c8
[ 1110.296419] 0000000000000000 ffffda0020fc07f7 0000000000000008 ffff8804c3563901
[ 1110.296420] Call Trace:
[ 1110.296429] ? memcpy (mm/kasan/kasan.c:275)
[ 1110.296437] ? arch_trigger_all_cpu_backtrace (include/linux/bitmap.h:215 include/linux/cpumask.h:506 arch/x86/kernel/apic/hw_nmi.c:76)
[ 1110.296444] arch_trigger_all_cpu_backtrace (include/linux/bitmap.h:215 include/linux/cpumask.h:506 arch/x86/kernel/apic/hw_nmi.c:76)
[ 1110.296451] ? dump_stack (./arch/x86/include/asm/preempt.h:95 lib/dump_stack.c:55)
[ 1110.296458] do_raw_spin_lock (./arch/x86/include/asm/spinlock.h:86 kernel/locking/spinlock_debug.c:130 kernel/locking/spinlock_debug.c:137)
[ 1110.296468] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 1110.296474] ? __page_check_address (include/linux/spinlock.h:309 mm/rmap.c:630)
[ 1110.296481] __page_check_address (include/linux/spinlock.h:309 mm/rmap.c:630)
[ 1110.296487] ? preempt_count_sub (kernel/sched/core.c:2615)
[ 1110.296493] try_to_unmap_one (include/linux/rmap.h:202 mm/rmap.c:1146)
[ 1110.296504] ? anon_vma_interval_tree_iter_next (mm/interval_tree.c:72 mm/interval_tree.c:103)
[ 1110.296514] rmap_walk (mm/rmap.c:1653 mm/rmap.c:1725)
[ 1110.296521] ? page_get_anon_vma (include/linux/rcupdate.h:423 include/linux/rcupdate.h:935 mm/rmap.c:435)
[ 1110.296530] try_to_unmap (mm/rmap.c:1545)
[ 1110.296536] ? page_get_anon_vma (mm/rmap.c:437)
[ 1110.296545] ? try_to_unmap_nonlinear (mm/rmap.c:1138)
[ 1110.296551] ? SyS_msync (mm/rmap.c:1501)
[ 1110.296558] ? page_remove_rmap (mm/rmap.c:1409)
[ 1110.296565] ? page_get_anon_vma (mm/rmap.c:448)
[ 1110.296571] ? anon_vma_ctor (mm/rmap.c:1496)
[ 1110.296579] migrate_pages (mm/migrate.c:913 mm/migrate.c:956 mm/migrate.c:1136)
[ 1110.296586] ? _raw_spin_unlock_irq (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:169 kernel/locking/spinlock.c:199)
[ 1110.296593] ? buffer_migrate_lock_buffers (mm/migrate.c:1584)
[ 1110.296601] ? handle_mm_fault (mm/memory.c:3163 mm/memory.c:3223 mm/memory.c:3336 mm/memory.c:3365)
[ 1110.296607] migrate_misplaced_page (mm/migrate.c:1738)
[ 1110.296614] handle_mm_fault (mm/memory.c:3170 mm/memory.c:3223 mm/memory.c:3336 mm/memory.c:3365)
[ 1110.296623] __do_page_fault (arch/x86/mm/fault.c:1246)
[ 1110.296630] ? vtime_account_user (kernel/sched/cputime.c:701)
[ 1110.296638] ? get_parent_ip (kernel/sched/core.c:2559)
[ 1110.296646] ? context_tracking_user_exit (kernel/context_tracking.c:144)
[ 1110.296656] trace_do_page_fault (arch/x86/mm/fault.c:1329 include/linux/jump_label.h:114 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1330)
[ 1110.296664] do_async_page_fault (arch/x86/kernel/kvm.c:280)
[ 1110.296670] async_page_fault (arch/x86/kernel/entry_64.S:1285)
[ 1110.296755] Code: 08 4c 8b 54 16 f0 4c 8b 5c 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 90 83 fa 08 72 1b 4c 8b 06 4c 8b 4c 16 f8 <4c> 89 07 4c 89 4c 17 f8 c3 66 2e 0f 1f 84 00 00 00 00 00 83 fa
All code
========
0: 08 4c 8b 54 or %cl,0x54(%rbx,%rcx,4)
4: 16 (bad)
5: f0 4c 8b 5c 16 f8 lock mov -0x8(%rsi,%rdx,1),%r11
b: 4c 89 07 mov %r8,(%rdi)
e: 4c 89 4f 08 mov %r9,0x8(%rdi)
12: 4c 89 54 17 f0 mov %r10,-0x10(%rdi,%rdx,1)
17: 4c 89 5c 17 f8 mov %r11,-0x8(%rdi,%rdx,1)
1c: c3 retq
1d: 90 nop
1e: 83 fa 08 cmp $0x8,%edx
21: 72 1b jb 0x3e
23: 4c 8b 06 mov (%rsi),%r8
26: 4c 8b 4c 16 f8 mov -0x8(%rsi,%rdx,1),%r9
2b:* 4c 89 07 mov %r8,(%rdi) <-- trapping instruction
2e: 4c 89 4c 17 f8 mov %r9,-0x8(%rdi,%rdx,1)
33: c3 retq
34: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
3b: 00 00 00
3e: 83 fa 00 cmp $0x0,%edx

Code starting with the faulting instruction
===========================================
0: 4c 89 07 mov %r8,(%rdi)
3: 4c 89 4c 17 f8 mov %r9,-0x8(%rdi,%rdx,1)
8: c3 retq
9: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
10: 00 00 00
13: 83 fa 00 cmp $0x0,%edx
[ 1110.296760] RIP __memcpy (arch/x86/lib/memcpy_64.S:151)
[ 1110.296763] RSP <ffff8804c3563870>
[ 1110.296765] CR2: 0000000000000000

Link: http://lkml.kernel.org/r/[email protected]

Signed-off-by: Sasha Levin <[email protected]>
[wanglong: backport to 3.10 stable
- adjust context
]
Signed-off-by: Wang Long <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
arch/x86/kernel/apic/hw_nmi.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 1eb5f90..c168e8a 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -30,7 +30,7 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
#ifdef arch_trigger_all_cpu_backtrace
/* For reliability, we're prepared to waste bits here. */
static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
-static cpumask_var_t printtrace_mask;
+static cpumask_t printtrace_mask;

#define NMI_BUF_SIZE 4096

@@ -67,7 +67,7 @@ void arch_trigger_all_cpu_backtrace(void)
return;

cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
- cpumask_copy(printtrace_mask, to_cpumask(backtrace_mask));
+ cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));
/*
* Set up per_cpu seq_buf buffers that the NMIs running on the other
* CPUs will write to.
@@ -91,7 +91,7 @@ void arch_trigger_all_cpu_backtrace(void)
* Now that all the NMIs have triggered, we can dump out their
* back traces safely to the console.
*/
- for_each_cpu(cpu, printtrace_mask) {
+ for_each_cpu(cpu, &printtrace_mask) {
int last_i = 0;

s = &per_cpu(nmi_print_seq, cpu);
--
1.8.3.4

2015-05-14 13:55:14

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC PATCH 00/17][request for stable 3.10 inclusion] x86/nmi: Print all cpu stacks from NMI safely

On Thu, 14 May 2015 11:34:47 +0000
Wang Long <[email protected]> wrote:

> The patch 1-13 backport the "seq_buf" infrastructures. in detail, patch 1, 2
> and 6 only backport "seq_buf" related code.
>

Ah, so basically you just backported the seq_buf.c code without
modifying the trace_seq code. That's a good approach. I don't have much
time to look at these but I'll try to skim them to see if I find
anything broken.

I may pull all of them into a test branch and run my tests to make sure
they don't break anything else.

-- Steve


> arch/x86/kernel/apic/hw_nmi.c | 86 +++++++++-
> include/linux/percpu.h | 4 +
> include/linux/printk.h | 2 +
> include/linux/seq_buf.h | 136 ++++++++++++++++
> kernel/printk.c | 41 +++--
> lib/Makefile | 2 +-
> lib/seq_buf.c | 359 ++++++++++++++++++++++++++++++++++++++++++
> 7 files changed, 617 insertions(+), 13 deletions(-)
> create mode 100644 include/linux/seq_buf.h
> create mode 100644 lib/seq_buf.c
>

2015-05-18 03:55:06

by Wang Long

[permalink] [raw]
Subject: Re: [RFC PATCH 00/17][request for stable 3.10 inclusion] x86/nmi: Print all cpu stacks from NMI safely

On 2015/5/14 21:55, Steven Rostedt wrote:
> On Thu, 14 May 2015 11:34:47 +0000
> Wang Long <[email protected]> wrote:
>
>> The patch 1-13 backport the "seq_buf" infrastructures. in detail, patch 1, 2
>> and 6 only backport "seq_buf" related code.
>>
>
> Ah, so basically you just backported the seq_buf.c code without
> modifying the trace_seq code. That's a good approach. I don't have much
> time to look at these but I'll try to skim them to see if I find
> anything broken.
>
> I may pull all of them into a test branch and run my tests to make sure
> they don't break anything else.
>
> -- Steve
>
Hi Steve,

Thank you for your review and test. Does your testcases run OK with this
series patches?

Best Regards
Wang Long
>
>> arch/x86/kernel/apic/hw_nmi.c | 86 +++++++++-
>> include/linux/percpu.h | 4 +
>> include/linux/printk.h | 2 +
>> include/linux/seq_buf.h | 136 ++++++++++++++++
>> kernel/printk.c | 41 +++--
>> lib/Makefile | 2 +-
>> lib/seq_buf.c | 359 ++++++++++++++++++++++++++++++++++++++++++
>> 7 files changed, 617 insertions(+), 13 deletions(-)
>> create mode 100644 include/linux/seq_buf.h
>> create mode 100644 lib/seq_buf.c
>>
>
>
> .
>

2015-05-18 13:53:38

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH 14/17] printk: Add per_cpu printk func to allow printk to be diverted

On Thu 2015-05-14 11:35:01, Wang Long wrote:
> From: "Steven Rostedt (Red Hat)" <[email protected]>
>
> commit afdc34a3d3b823a12a93b822ee1efb566f884032 upstream.
>
> Being able to divert printk to call another function besides the normal
> logging is useful for such things like NMI handling. If some functions
> are to be called from NMI that does printk() it is possible to lock up
> the box if the nmi handler triggers when another printk is happening.
>
> One example of this use is to perform a stack trace on all CPUs via NMI.
> But if the NMI is to do the printk() it can cause the system to lock up.
> By allowing the printk to be diverted to another function that can safely
> record the printk output and then print it when it in a safe context
> then NMIs will be safe to call these functions like show_regs().
>
> Link: http://lkml.kernel.org/p/[email protected]
>
> Tested-by: Jiri Kosina <[email protected]>
> Acked-by: Jiri Kosina <[email protected]>
> Acked-by: Paul E. McKenney <[email protected]>
> Reviewed-by: Petr Mladek <[email protected]>
> [wanglong: backport to 3.10 stable
> - adjust context
> ]
> Signed-off-by: Wang Long <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> include/linux/percpu.h | 3 +++
> include/linux/printk.h | 2 ++
> kernel/printk.c | 38 +++++++++++++++++++++++++++++---------
> 3 files changed, 34 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> index cc88172..bb1d29c 100644
> --- a/include/linux/percpu.h
> +++ b/include/linux/percpu.h
> @@ -755,4 +755,7 @@ do { \
> __pcpu_double_call_return_bool(__this_cpu_cmpxchg_double_, (pcp1), (pcp2), (oval1), (oval2), (nval1), (nval2))
> #endif
>
> +/* To avoid include hell, as printk can not declare this, we declare it here */
> +DECLARE_PER_CPU(printk_func_t, printk_func);
> +
> #endif /* __LINUX_PERCPU_H */
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 708b8a8..ebc7f3e 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -144,6 +144,8 @@ extern int kptr_restrict;
>
> extern void wake_up_klogd(void);
>
> +typedef int(*printk_func_t)(const char *fmt, va_list args);
> +
> void log_buf_kexec_setup(void);
> void __init setup_log_buf(int early);
> void dump_stack_set_arch_desc(const char *fmt, ...);
> diff --git a/kernel/printk.c b/kernel/printk.c
> index fd0154a..d3d929d 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -1657,6 +1657,30 @@ asmlinkage int printk_emit(int facility, int level,
> }
> EXPORT_SYMBOL(printk_emit);
>
> +int vprintk_default(const char *fmt, va_list args)
> +{
> + int r;
> +
> +#ifdef CONFIG_KGDB_KDB
> + if (unlikely(kdb_trap_printk)) {
> + r = vkdb_printf(fmt, args);
> + return r;
> + }
> +#endif
> + r = vprintk_emit(0, -1, NULL, 0, fmt, args);
> +
> + return r;

There is broken indentation in this hunk.

The rest of the patch looks fine.

Best Regards,
Petr

2015-05-18 14:15:13

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH 16/17] x86/nmi: Perform a safe NMI stack trace on all CPUs

On Thu 2015-05-14 11:35:03, Wang Long wrote:
> From: "Steven Rostedt (Red Hat)" <[email protected]>
>
> commit a9edc88093287183ac934be44f295f183b2c62dd upstream.
>
> When trigger_all_cpu_backtrace() is called on x86, it will trigger an
> NMI on each CPU and call show_regs(). But this can lead to a hard lock
> up if the NMI comes in on another printk().
>
> In order to avoid this, when the NMI triggers, it switches the printk
> routine for that CPU to call a NMI safe printk function that records the
> printk in a per_cpu seq_buf descriptor. After all NMIs have finished
> recording its data, the seq_bufs are printed in a safe context.
>
> Link: http://lkml.kernel.org/p/[email protected]
> Link: http://lkml.kernel.org/r/[email protected]
>
> Tested-by: Jiri Kosina <[email protected]>
> Acked-by: Jiri Kosina <[email protected]>
> Acked-by: Paul E. McKenney <[email protected]>
> Reviewed-by: Petr Mladek <[email protected]>
> [wanglong: backport to 3.10 stable
> - adjust context
> ]
> Signed-off-by: Wang Long <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> arch/x86/kernel/apic/hw_nmi.c | 86 +++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 83 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
> index a698d71..1eb5f90 100644
> --- a/arch/x86/kernel/apic/hw_nmi.c
> +++ b/arch/x86/kernel/apic/hw_nmi.c
> @@ -18,6 +18,7 @@
> #include <linux/nmi.h>
> #include <linux/module.h>
> #include <linux/delay.h>
> +#include <linux/seq_buf.h>
>
> #ifdef CONFIG_HARDLOCKUP_DETECTOR
> u64 hw_nmi_get_sample_period(int watchdog_thresh)
> @@ -29,12 +30,33 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
> #ifdef arch_trigger_all_cpu_backtrace
> /* For reliability, we're prepared to waste bits here. */
> static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
> +static cpumask_var_t printtrace_mask;
> +
> +#define NMI_BUF_SIZE 4096

Please, replace spaces with tabs.

In fact, the indentation is broken in this whole patch.

The content looks fine, though.

Best Regards,
Petr

> +
> +struct nmi_seq_buf {
> + unsigned char buffer[NMI_BUF_SIZE];
> + struct seq_buf seq;
> +};
> +
> +/* Safe printing in NMI context */
> +static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
>
> /* "in progress" flag of arch_trigger_all_cpu_backtrace */
> static unsigned long backtrace_flag;
>
> +static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
> +{
> + const char *buf = s->buffer + start;
> +
> + printk("%.*s", (end - start) + 1, buf);

Also th

> +}
> +
> void arch_trigger_all_cpu_backtrace(void)
> {
> + struct nmi_seq_buf *s;
> + int len;
> + int cpu;
> int i;
>
> if (test_and_set_bit(0, &backtrace_flag))
> @@ -45,6 +67,15 @@ void arch_trigger_all_cpu_backtrace(void)
> return;
>
> cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
> + cpumask_copy(printtrace_mask, to_cpumask(backtrace_mask));
> + /*
> + * Set up per_cpu seq_buf buffers that the NMIs running on the other
> + * CPUs will write to.
> + */
> + for_each_cpu(cpu, to_cpumask(backtrace_mask)) {
> + s = &per_cpu(nmi_print_seq, cpu);
> + seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);


> + }
>
> printk(KERN_INFO "sending NMI to all CPUs:\n");
> apic->send_IPI_all(NMI_VECTOR);
> @@ -56,10 +87,57 @@ void arch_trigger_all_cpu_backtrace(void)
> mdelay(1);
> }
>
> + /*
> + * Now that all the NMIs have triggered, we can dump out their
> + * back traces safely to the console.
> + */
> + for_each_cpu(cpu, printtrace_mask) {
> + int last_i = 0;
> +
> + s = &per_cpu(nmi_print_seq, cpu);
> + len = seq_buf_used(&s->seq);
> + if (!len)
> + continue;
> +
> + /* Print line by line. */
> + for (i = 0; i < len; i++) {
> + if (s->buffer[i] == '\n') {
> + print_seq_line(s, last_i, i);
> + last_i = i + 1;
> + }
> + }
> + /* Check if there was a partial line. */
> + if (last_i < len) {
> + print_seq_line(s, last_i, len - 1);
> + pr_cont("\n");
> + }

Same here.

> + }
> +
> clear_bit(0, &backtrace_flag);
> smp_mb__after_clear_bit();
> }
>
> +/*
> + * It is not safe to call printk() directly from NMI handlers.
> + * It may be fine if the NMI detected a lock up and we have no choice
> + * but to do so, but doing a NMI on all other CPUs to get a back trace
> + * can be done with a sysrq-l. We don't want that to lock up, which
> + * can happen if the NMI interrupts a printk in progress.
> + *
> + * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
> + * the content into a per cpu seq_buf buffer. Then when the NMIs are
> + * all done, we can safely dump the contents of the seq_buf to a printk()
> + * from a non NMI context.
> + */
> +static int nmi_vprintk(const char *fmt, va_list args)
> +{
> + struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
> + unsigned int len = seq_buf_used(&s->seq);
> +
> + seq_buf_vprintf(&s->seq, fmt, args);
> + return seq_buf_used(&s->seq) - len;
> +}
> +
> static int __kprobes
> arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
> {
> @@ -68,12 +146,14 @@ arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
> cpu = smp_processor_id();
>
> if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
> - static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
> + printk_func_t printk_func_save = this_cpu_read(printk_func);
>
> - arch_spin_lock(&lock);
> + /* Replace printk to write into the NMI seq */
> + this_cpu_write(printk_func, nmi_vprintk);
> printk(KERN_WARNING "NMI backtrace for cpu %d\n", cpu);
> show_regs(regs);
> - arch_spin_unlock(&lock);
> + this_cpu_write(printk_func, printk_func_save);
> +
> cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
> return NMI_HANDLED;
> }
> --
> 1.8.3.4
>

2015-05-19 08:56:56

by Wang Long

[permalink] [raw]
Subject: Re: [RFC PATCH 16/17] x86/nmi: Perform a safe NMI stack trace on all CPUs

On 2015/5/18 22:17, Petr Mladek wrote:
> On Thu 2015-05-14 11:35:03, Wang Long wrote:
>> From: "Steven Rostedt (Red Hat)" <[email protected]>
>>
>> commit a9edc88093287183ac934be44f295f183b2c62dd upstream.
>>
>> When trigger_all_cpu_backtrace() is called on x86, it will trigger an
>> NMI on each CPU and call show_regs(). But this can lead to a hard lock
>> up if the NMI comes in on another printk().
>>
>> In order to avoid this, when the NMI triggers, it switches the printk
>> routine for that CPU to call a NMI safe printk function that records the
>> printk in a per_cpu seq_buf descriptor. After all NMIs have finished
>> recording its data, the seq_bufs are printed in a safe context.
>>
>> Link: http://lkml.kernel.org/p/[email protected]
>> Link: http://lkml.kernel.org/r/[email protected]
>>
>> Tested-by: Jiri Kosina <[email protected]>
>> Acked-by: Jiri Kosina <[email protected]>
>> Acked-by: Paul E. McKenney <[email protected]>
>> Reviewed-by: Petr Mladek <[email protected]>
>> [wanglong: backport to 3.10 stable
>> - adjust context
>> ]
>> Signed-off-by: Wang Long <[email protected]>
>> Signed-off-by: Steven Rostedt <[email protected]>
>> ---
>> arch/x86/kernel/apic/hw_nmi.c | 86 +++++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 83 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
>> index a698d71..1eb5f90 100644
>> --- a/arch/x86/kernel/apic/hw_nmi.c
>> +++ b/arch/x86/kernel/apic/hw_nmi.c
>> @@ -18,6 +18,7 @@
>> #include <linux/nmi.h>
>> #include <linux/module.h>
>> #include <linux/delay.h>
>> +#include <linux/seq_buf.h>
>>
>> #ifdef CONFIG_HARDLOCKUP_DETECTOR
>> u64 hw_nmi_get_sample_period(int watchdog_thresh)
>> @@ -29,12 +30,33 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
>> #ifdef arch_trigger_all_cpu_backtrace
>> /* For reliability, we're prepared to waste bits here. */
>> static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
>> +static cpumask_var_t printtrace_mask;
>> +
>> +#define NMI_BUF_SIZE 4096
>
> Please, replace spaces with tabs.
>
> In fact, the indentation is broken in this whole patch.
>
> The content looks fine, though.
>
> Best Regards,
> Petr
>
Hi Petr,

Thank you for your review. Sorry for the indentation broken.

I will send the patch v2.

Best Regards
Wang Long