Subject: [PATCH] oprofile, x86: Fix nmi-unsafe callgraph support in, 2.6.27 and later

On 07.06.11 10:00:41, John Lumby wrote:
> But please do send your fix along and I'll give it a try. I now have
> capability for capturing un-syslog'd console messages such as these via
> serial console so if it happens again I should be able to provide more
> information.

See the fix below.

-Robert


>From c73bebe2e281ae089499d7cbc3b44a0869c8daf8 Mon Sep 17 00:00:00 2001
From: Robert Richter <[email protected]>
Date: Fri, 3 Jun 2011 16:37:47 +0200
Subject: [PATCH] oprofile, x86: Fix nmi-unsafe callgraph support

Current oprofile's x86 callgraph support may trigger page faults
throwing the BUG_ON(in_nmi()) message below. This patch fixes this by
using the same nmi-safe copy-from-user code as in perf.

------------[ cut here ]------------
kernel BUG at .../arch/x86/kernel/traps.c:436!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:07:00.0/0000:08:04.0/net/eth0/broadcast
CPU 5
Modules linked in:

Pid: 8611, comm: opcontrol Not tainted 2.6.39-00007-gfe47ae7 #1 Advanced Micro Device Anaheim/Anaheim
RIP: 0010:[<ffffffff813e8e35>] [<ffffffff813e8e35>] do_nmi+0x22/0x1ee
RSP: 0000:ffff88042fd47f28 EFLAGS: 00010002
RAX: ffff88042c0a7fd8 RBX: 0000000000000001 RCX: 00000000c0000101
RDX: 00000000ffff8804 RSI: ffffffffffffffff RDI: ffff88042fd47f58
RBP: ffff88042fd47f48 R08: 0000000000000004 R09: 0000000000001484
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88042fd47f58
R13: 0000000000000000 R14: ffff88042fd47d98 R15: 0000000000000020
FS: 00007fca25e56700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000074 CR3: 000000042d28b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process opcontrol (pid: 8611, threadinfo ffff88042c0a6000, task ffff88042c532310)
Stack:
0000000000000000 0000000000000001 ffff88042c0a7fd8 0000000000000000
ffff88042fd47de8 ffffffff813e897a 0000000000000020 ffff88042fd47d98
0000000000000000 ffff88042c0a7fd8 ffff88042fd47de8 0000000000000074
Call Trace:
<NMI>
[<ffffffff813e897a>] nmi+0x1a/0x20
[<ffffffff813f08ab>] ? bad_to_user+0x25/0x771
<<EOE>>
Code: ff 59 5b 41 5c 41 5d c9 c3 55 65 48 8b 04 25 88 b5 00 00 48 89 e5 41 55 41 54 49 89 fc 53 48 83 ec 08 f6 80 47 e0 ff ff 04 74 04 <0f> 0b eb fe 81 80 44 e0 ff ff 00 00 01 04 65 ff 04 25 c4 0f 01
RIP [<ffffffff813e8e35>] do_nmi+0x22/0x1ee
RSP <ffff88042fd47f28>
---[ end trace ed6752185092104b ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 8611, comm: opcontrol Tainted: G D 2.6.39-00007-gfe47ae7 #1
Call Trace:
<NMI> [<ffffffff813e5e0a>] panic+0x8c/0x188
[<ffffffff813e915c>] oops_end+0x81/0x8e
[<ffffffff8100403d>] die+0x55/0x5e
[<ffffffff813e8c45>] do_trap+0x11c/0x12b
[<ffffffff810023c8>] do_invalid_op+0x91/0x9a
[<ffffffff813e8e35>] ? do_nmi+0x22/0x1ee
[<ffffffff8131e6fa>] ? oprofile_add_sample+0x83/0x95
[<ffffffff81321670>] ? op_amd_check_ctrs+0x4f/0x2cf
[<ffffffff813ee4d5>] invalid_op+0x15/0x20
[<ffffffff813e8e35>] ? do_nmi+0x22/0x1ee
[<ffffffff813e8e7a>] ? do_nmi+0x67/0x1ee
[<ffffffff813e897a>] nmi+0x1a/0x20
[<ffffffff813f08ab>] ? bad_to_user+0x25/0x771
<<EOE>>

Cc: John Lumby <[email protected]>
Cc: Maynard Johnson <[email protected]>
Cc: <[email protected]> # .37+
Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/oprofile/backtrace.c | 56 ++++++++++++++++++++++++++++++++++------
1 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index 2d49d4e..88e856e 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -11,10 +11,12 @@
#include <linux/oprofile.h>
#include <linux/sched.h>
#include <linux/mm.h>
+#include <linux/compat.h>
+#include <linux/highmem.h>
+
#include <asm/ptrace.h>
#include <asm/uaccess.h>
#include <asm/stacktrace.h>
-#include <linux/compat.h>

static void backtrace_warning_symbol(void *data, char *msg,
unsigned long symbol)
@@ -49,17 +51,53 @@ static struct stacktrace_ops backtrace_ops = {
.walk_stack = print_context_stack,
};

+/* from arch/x86/kernel/cpu/perf_event.c: */
+
+/*
+ * best effort, GUP based copy_from_user() that assumes IRQ or NMI context
+ */
+static unsigned long
+copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
+{
+ unsigned long offset, addr = (unsigned long)from;
+ unsigned long size, len = 0;
+ struct page *page;
+ void *map;
+ int ret;
+
+ do {
+ ret = __get_user_pages_fast(addr, 1, 0, &page);
+ if (!ret)
+ break;
+
+ offset = addr & (PAGE_SIZE - 1);
+ size = min(PAGE_SIZE - offset, n - len);
+
+ map = kmap_atomic(page);
+ memcpy(to, map+offset, size);
+ kunmap_atomic(map);
+ put_page(page);
+
+ len += size;
+ to += size;
+ addr += size;
+
+ } while (len < n);
+
+ return len;
+}
+
#ifdef CONFIG_COMPAT
static struct stack_frame_ia32 *
dump_user_backtrace_32(struct stack_frame_ia32 *head)
{
+ /* Also check accessibility of one struct frame_head beyond: */
struct stack_frame_ia32 bufhead[2];
struct stack_frame_ia32 *fp;
+ unsigned long bytes;

- /* Also check accessibility of one struct frame_head beyond */
- if (!access_ok(VERIFY_READ, head, sizeof(bufhead)))
- return NULL;
- if (__copy_from_user_inatomic(bufhead, head, sizeof(bufhead)))
+ bytes = copy_from_user_nmi(bufhead, head, sizeof(bufhead));
+ if (bytes != sizeof(bufhead))
return NULL;

fp = (struct stack_frame_ia32 *) compat_ptr(bufhead[0].next_frame);
@@ -100,12 +138,12 @@ x86_backtrace_32(struct pt_regs * const regs, unsigned int depth)

static struct stack_frame *dump_user_backtrace(struct stack_frame *head)
{
+ /* Also check accessibility of one struct frame_head beyond: */
struct stack_frame bufhead[2];
+ unsigned long bytes;

- /* Also check accessibility of one struct stack_frame beyond */
- if (!access_ok(VERIFY_READ, head, sizeof(bufhead)))
- return NULL;
- if (__copy_from_user_inatomic(bufhead, head, sizeof(bufhead)))
+ bytes = copy_from_user_nmi(bufhead, head, sizeof(bufhead));
+ if (bytes != sizeof(bufhead))
return NULL;

oprofile_add_trace(bufhead[0].return_address);
--
1.7.5.rc3



--
Advanced Micro Devices, Inc.
Operating System Research Center


Subject: [RFC] [PATCH] x86: Make copy_from_user_nmi() a library function

On 07.06.11 19:18:22, Robert Richter wrote:
> From c73bebe2e281ae089499d7cbc3b44a0869c8daf8 Mon Sep 17 00:00:00 2001
> From: Robert Richter <[email protected]>
> Date: Fri, 3 Jun 2011 16:37:47 +0200
> Subject: [PATCH] oprofile, x86: Fix nmi-unsafe callgraph support

While this patch was a stable fix which introduced duplicate code for
perf and oprofile I suggest merging the code, see the patch below.

-Robert


>From 3652d1b07a0cfce5b26ad186594c1bf8f9ca5898 Mon Sep 17 00:00:00 2001
From: Robert Richter <[email protected]>
Date: Tue, 7 Jun 2011 11:49:55 +0200
Subject: [PATCH] x86: Make copy_from_user_nmi() a library function

copy_from_user_nmi() is used in oprofile and perf. Moving it to other
library functions like copy_from_user(). As this is x86 code for 32
and 64 bits, create a new file usercopy.c for unified code.

Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/include/asm/uaccess.h | 3 ++
arch/x86/kernel/cpu/perf_event.c | 35 ------------------------------
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/usercopy.c | 43 ++++++++++++++++++++++++++++++++++++++
arch/x86/oprofile/backtrace.c | 39 +---------------------------------
5 files changed, 48 insertions(+), 74 deletions(-)
create mode 100644 arch/x86/lib/usercopy.c

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index abd3e0e..c0236e9 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -556,6 +556,9 @@ struct __large_struct { unsigned long buf[100]; };

#endif /* CONFIG_X86_WP_WORKS_OK */

+extern unsigned long
+copy_from_user_nmi(void *to, const void __user *from, unsigned long n);
+
/*
* movsl can be slow when source and dest are not both 8-byte aligned
*/
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index e638689..46bf36d 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -22,7 +22,6 @@
#include <linux/sched.h>
#include <linux/uaccess.h>
#include <linux/slab.h>
-#include <linux/highmem.h>
#include <linux/cpu.h>
#include <linux/bitops.h>

@@ -43,40 +42,6 @@ do { \
} while (0)
#endif

-/*
- * best effort, GUP based copy_from_user() that assumes IRQ or NMI context
- */
-static unsigned long
-copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
-{
- unsigned long offset, addr = (unsigned long)from;
- unsigned long size, len = 0;
- struct page *page;
- void *map;
- int ret;
-
- do {
- ret = __get_user_pages_fast(addr, 1, 0, &page);
- if (!ret)
- break;
-
- offset = addr & (PAGE_SIZE - 1);
- size = min(PAGE_SIZE - offset, n - len);
-
- map = kmap_atomic(page);
- memcpy(to, map+offset, size);
- kunmap_atomic(map);
- put_page(page);
-
- len += size;
- to += size;
- addr += size;
-
- } while (len < n);
-
- return len;
-}
-
struct event_constraint {
union {
unsigned long idxmsk[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index f2479f1..6ba4773 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -18,7 +18,7 @@ obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o

lib-y := delay.o
lib-y += thunk_$(BITS).o
-lib-y += usercopy_$(BITS).o getuser.o putuser.o
+lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
lib-y += memcpy_$(BITS).o
lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o

diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
new file mode 100644
index 0000000..4ef83d3
--- /dev/null
+++ b/arch/x86/lib/usercopy.c
@@ -0,0 +1,43 @@
+/*
+ * User address space access functions.
+ *
+ * For licencing details see kernel-base/COPYING
+ */
+
+#include <linux/highmem.h>
+#include <linux/module.h>
+
+/*
+ * best effort, GUP based copy_from_user() that assumes IRQ or NMI context
+ */
+unsigned long
+copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
+{
+ unsigned long offset, addr = (unsigned long)from;
+ unsigned long size, len = 0;
+ struct page *page;
+ void *map;
+ int ret;
+
+ do {
+ ret = __get_user_pages_fast(addr, 1, 0, &page);
+ if (!ret)
+ break;
+
+ offset = addr & (PAGE_SIZE - 1);
+ size = min(PAGE_SIZE - offset, n - len);
+
+ map = kmap_atomic(page);
+ memcpy(to, map+offset, size);
+ kunmap_atomic(map);
+ put_page(page);
+
+ len += size;
+ to += size;
+ addr += size;
+
+ } while (len < n);
+
+ return len;
+}
+EXPORT_SYMBOL(copy_from_user_nmi);
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index 88e856e..087233f 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -12,10 +12,9 @@
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/compat.h>
-#include <linux/highmem.h>
+#include <linux/uaccess.h>

#include <asm/ptrace.h>
-#include <asm/uaccess.h>
#include <asm/stacktrace.h>

static void backtrace_warning_symbol(void *data, char *msg,
@@ -51,42 +50,6 @@ static struct stacktrace_ops backtrace_ops = {
.walk_stack = print_context_stack,
};

-/* from arch/x86/kernel/cpu/perf_event.c: */
-
-/*
- * best effort, GUP based copy_from_user() that assumes IRQ or NMI context
- */
-static unsigned long
-copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
-{
- unsigned long offset, addr = (unsigned long)from;
- unsigned long size, len = 0;
- struct page *page;
- void *map;
- int ret;
-
- do {
- ret = __get_user_pages_fast(addr, 1, 0, &page);
- if (!ret)
- break;
-
- offset = addr & (PAGE_SIZE - 1);
- size = min(PAGE_SIZE - offset, n - len);
-
- map = kmap_atomic(page);
- memcpy(to, map+offset, size);
- kunmap_atomic(map);
- put_page(page);
-
- len += size;
- to += size;
- addr += size;
-
- } while (len < n);
-
- return len;
-}
-
#ifdef CONFIG_COMPAT
static struct stack_frame_ia32 *
dump_user_backtrace_32(struct stack_frame_ia32 *head)
--
1.7.5.rc3



--
Advanced Micro Devices, Inc.
Operating System Research Center

2011-06-07 20:46:14

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [RFC] [PATCH] x86: Make copy_from_user_nmi() a library function

On Tue, Jun 07, 2011 at 07:24:13PM +0200, Robert Richter wrote:
... (unrelated a bit) ...
> +
> + offset = addr & (PAGE_SIZE - 1);
> + size = min(PAGE_SIZE - offset, n - len);
> +
...

Robert, would not offset = addr & ~PAGE_MASK be somehow more
readable? Just a thought.

Cyrill

Subject: Re: [PATCH] oprofile, x86: Fix nmi-unsafe callgraph support in, 2.6.27 and later

On 08.06.11 13:43:56, John Lumby wrote:
> I am also a bit confused by the kernel's statement that it's a NULL
> pointer deref. I had previously found that the pointer contains
> 0x0000000a, not NULL. Does the kernel call it NULL for any invalid
> value? Or am I misunderstanding it. Anyway ...

0x0000000a is also considered a NULL pointer access, typically this
happens if a member of a struct which points to NULL is accessed.

>
> Have you or anyone tried this on an intel x86? I'm just curious
> whether it's everyone or only me.

>From your other mail:

"EIP is at print_context_stack=0x45/0xb0

and from a machine-code listing, I found that that offset corresponds
to the line

addr = *stack;

in arch/x86/kernel/dumpstack.c"

Actually this should not happen, because of checking the stack pointer
in valid_stack_ptr(). So could you apply the change below and test if
this throws a bug message?

Thanks,

-Robert


diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index e2a3f06..37693f5 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -73,6 +73,8 @@ static inline int valid_stack_ptr(struct thread_info *tinfo,
void *p, unsigned int size, void *end)
{
void *t = tinfo;
+
+ BUG_ON(p < (void *)THREAD_SIZE);
if (end) {
if (p < end && p >= (end-THREAD_SIZE))
return 1;


--
Advanced Micro Devices, Inc.
Operating System Research Center