2013-07-20 13:12:47

by Fengguang Wu

[permalink] [raw]
Subject: [x86] Kernel panic - not syncing: Fatal exception in interrupt

Greetings,

I got the below dmesg and the first bad commit is

commit 51b2c07b22261f19188d9a9071943d60a067481c
Author: Jiri Kosina <[email protected]>
Date: Fri Jul 12 11:22:09 2013 +0200

x86: Make jump_label use int3-based patching

Make jump labels use text_poke_bp() for text patching instead of
text_poke_smp(), avoiding the need for stop_machine().

Reviewed-by: Steven Rostedt <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>


Parent commit not clean. Look out for wrong bisect!

BUG: kernel boot crashed

/kernel/x86_64-randconfig-c05-0718/fd4363fff3d96795d3feb1b3fb48ce590f186bdd/dmesg-kvm-xbm-7912-20130720142415-3.11.0-rc1-00166-g1faabf2-146

[ 0.212429] devtmpfs: initialized
[ 0.236027] int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 0.237157] Modules linked in:
[ 0.237765] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
[ 0.239129] task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
[ 0.240000] RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
[ 0.240000] RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006
[ 0.240000] RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
[ 0.240000] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
[ 0.240000] RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
[ 0.240000] R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
[ 0.240000] R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
[ 0.240000] FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
[ 0.240000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 0.240000] CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
[ 0.240000] Stack:
[ 0.240000] ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
[ 0.240000] ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
[ 0.240000] ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
[ 0.240000] Call Trace:
[ 0.240000] <IRQ>
[ 0.240000] [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
[ 0.240000] [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
[ 0.240000] [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
[ 0.240000] [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
[ 0.240000] [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
[ 0.240000] <EOI>
[ 0.240000] [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
[ 0.240000] [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
[ 0.240000] [<ffffffff81015f10>] default_idle+0x147/0x282
[ 0.240000] [<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d
[ 0.240000] [<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db
[ 0.240000] [<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84
[ 0.240000] [<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5
[ 0.240000] Code: 5c 5d c3 e8 d7 0f 63 00 55 48 ff 05 1f 5d 1e 01 48 89 e5 41 55 49 89 f5 41 54 53 48 89 fb e8 8d fe ff ff 48 ff 05 0d 5d 1e 01 cc <1f> 44 00 00 31 c0 eb 0c 48 ff 05 05 5d 1e 01 b8 01 00 00 00 48
[ 0.240000] RIP [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
[ 0.240000] RSP <ffff88000dd03f10>
[ 0.240000] ---[ end trace 0d3288a047152a17 ]---

git bisect start 04bf57606b28bfa283bd118e4ea032151b02f6df 61f98b0fca802d7e0191072606519e2230a6226d --
git bisect good 4b6cd6e8fdab82632dff52f495ea60c712d3ce0e # 11:12 55+ kvm tools: Update README
git bisect good c56875099528aa54eb28b5feb11935c4069b5835 # 11:46 55+ kvm tools: Fix 'lkvm stop' when guest is pasued.
git bisect good 3b1938c28d3f6e28d7ecf2e30324e80a5a7fff35 # 11:58 55+ kvm tools: move kvm_config into struct kvm
git bisect good 7c6238bf263d8488878d17377ae0f255a93f6927 # 12:22 55+ kvm tools: remove unneeded checks in qcow code
git bisect good 7160bbc1d66580a9eb7ee9251e4b58d34d264baa # 12:58 55+ kvm tools: ioport: allow ioport devices to generate fdt nodes
git bisect bad 54396a651817c510d8d5405cf2ddfce9d3a7108d # 13:07 12- Merge branch 'x86/urgent'
git bisect good b8e7b7b26a0082c461d1be51fbe411a0d1cf196f # 13:27 55+ Merge branch 'x86/asm'
git bisect bad 63a1d00d179f794012001c50afacadc98b2c4b81 # 13:32 12- Merge branch 'x86/jumplabel'
git bisect good 237d1548543312fcc8c99d302ab68fbf8ef6f97f # 13:44 75+ x86: Fix override new_cpu_data.x86 with 486
git bisect bad 51b2c07b22261f19188d9a9071943d60a067481c # 13:53 65- x86: Make jump_label use int3-based patching
git bisect good fd4363fff3d96795d3feb1b3fb48ce590f186bdd # 14:18 340+ x86: Introduce int3 (breakpoint)-based instruction patching
git bisect good fd4363fff3d96795d3feb1b3fb48ce590f186bdd # 14:24 1020+ x86: Introduce int3 (breakpoint)-based instruction patching
git bisect bad 04bf57606b28bfa283bd118e4ea032151b02f6df # 14:25 0- Merge branch 'sched/urgent'
git bisect good a1fab3cce9730d3ec8d5caf2e6acb0a0ff8dd240 # 14:39 1020+ Revert "x86: Make jump_label use int3-based patching"
git bisect good d471ce53b1fab60110e4e9f647a345cea31752de # 17:31 1020+ Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
git bisect bad c1f631b9a68251007a6353041ae90f9f7dca771c # 17:31 0- Add linux-next specific files for 20130719
git bisect bad 5a69985392e60abcaeb2243faf040488330a6ae8 # 17:34 22- Merge branch 'perf/core'

Thanks,
Fengguang


Attachments:
(No filename) (5.54 kB)
dmesg-kvm-snb-16313-20130718213435-3.11.0-rc1-00026-g0ec5f3a-58 (25.31 kB)
bisect-04bf57606b28bfa283bd118e4ea032151b02f6df-x86_64-randconfig-c05-0718-Kernel-panic---not-syncing:-Fatal-exception-in-interrupt-81850.log (30.16 kB)
config-3.11.0-rc1-01429-g04bf576 (69.35 kB)
Download all attachments

2013-07-21 00:24:02

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [x86] Kernel panic - not syncing: Fatal exception in interrupt

On 07/20/2013 06:12 AM, Fengguang Wu wrote:
> Greetings,
>
> I got the below dmesg and the first bad commit is
>
> commit 51b2c07b22261f19188d9a9071943d60a067481c
> Author: Jiri Kosina <[email protected]>
> Date: Fri Jul 12 11:22:09 2013 +0200
>
> x86: Make jump_label use int3-based patching
>
> Make jump labels use text_poke_bp() for text patching instead of
> text_poke_smp(), avoiding the need for stop_machine().
>
> Reviewed-by: Steven Rostedt <[email protected]>
> Reviewed-by: Masami Hiramatsu <[email protected]>
> Signed-off-by: Jiri Kosina <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: H. Peter Anvin <[email protected]>
>
>
> Parent commit not clean. Look out for wrong bisect!
>
> BUG: kernel boot crashed
>
> /kernel/x86_64-randconfig-c05-0718/fd4363fff3d96795d3feb1b3fb48ce590f186bdd/dmesg-kvm-xbm-7912-20130720142415-3.11.0-rc1-00166-g1faabf2-146
>
> [ 0.212429] devtmpfs: initialized
> [ 0.236027] int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 0.237157] Modules linked in:
> [ 0.237765] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
> [ 0.239129] task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
> [ 0.240000] RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
> [ 0.240000] RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006
> [ 0.240000] RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
> [ 0.240000] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
> [ 0.240000] RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
> [ 0.240000] R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
> [ 0.240000] R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
> [ 0.240000] FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
> [ 0.240000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 0.240000] CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
> [ 0.240000] Stack:
> [ 0.240000] ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
> [ 0.240000] ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
> [ 0.240000] ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
> [ 0.240000] Call Trace:
> [ 0.240000] <IRQ>
> [ 0.240000] [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
> [ 0.240000] [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
> [ 0.240000] [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
> [ 0.240000] [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
> [ 0.240000] [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
> [ 0.240000] <EOI>
> [ 0.240000] [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
> [ 0.240000] [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16

Well, it is definitely easy to see what happened here.

We took a breakpoint fault that the kernel didn't expect. This
shouldn't happen... the breakpoint handler should have said "oh, this is
an instruction being patched" and resumed, but that didn't happen.

Jiri, I'm wondering if by any chance we have more than one CPU inside
text_poke_bp() at the same time. The global variables in text_poke_bp()
don't seem to be protected against reentrancy at all.

-hpa

P.S. the sync_core() in do_sync_core() should be unnecessary, as IRET is
a synchronizing instruction.



2013-07-21 08:30:57

by Jiri Kosina

[permalink] [raw]
Subject: Re: [x86] Kernel panic - not syncing: Fatal exception in interrupt

On Sat, 20 Jul 2013, H. Peter Anvin wrote:

> > [ 0.212429] devtmpfs: initialized
> > [ 0.236027] int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [ 0.237157] Modules linked in:
> > [ 0.237765] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
> > [ 0.239129] task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
> > [ 0.240000] RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
> > [ 0.240000] RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006
> > [ 0.240000] RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
> > [ 0.240000] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
> > [ 0.240000] RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
> > [ 0.240000] R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
> > [ 0.240000] R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
> > [ 0.240000] FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
> > [ 0.240000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 0.240000] CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
> > [ 0.240000] Stack:
> > [ 0.240000] ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
> > [ 0.240000] ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
> > [ 0.240000] ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
> > [ 0.240000] Call Trace:
> > [ 0.240000] <IRQ>
> > [ 0.240000] [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
> > [ 0.240000] [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
> > [ 0.240000] [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
> > [ 0.240000] [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
> > [ 0.240000] [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
> > [ 0.240000] <EOI>
> > [ 0.240000] [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
> > [ 0.240000] [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
>
> Well, it is definitely easy to see what happened here.
>
> We took a breakpoint fault that the kernel didn't expect. This
> shouldn't happen... the breakpoint handler should have said "oh, this is
> an instruction being patched" and resumed, but that didn't happen.
>
> Jiri, I'm wondering if by any chance we have more than one CPU inside
> text_poke_bp() at the same time. The global variables in text_poke_bp()
> don't seem to be protected against reentrancy at all.

That shouldn't happen, because:


- text_poke_bp() should always be called under text_mutex (and
arch_jump_label_transform() does that properly)

- correctness between int3_notify() and texp_poke_bp() wrt. global
variables is achieved through barrier

So we should be safe here afaics.

What I am however wondering whether can't be case here is that the jump
label was used before int3_notifier has been registered.
I am thinking about ways around this, but we'll probably have to do the
same ftrace is doing, i.e. hook into do_int3() directly instead of relying
on the notifier to be registered in time.

Fengguang, as I am not able to reproduce this bug locally, could you do me
a favor and test whether the patch below works the problem around, just
for the sake of testing the hypothesis?

Thanks.


From: Jiri Kosina <[email protected]>
Subject: [PATCH] x86: call out into int3 handler directly instead of using notifier

---
arch/x86/include/asm/alternative.h | 2 ++
arch/x86/kernel/alternative.c | 22 +++++++++++++++++++++-
arch/x86/kernel/traps.c | 4 ++++
3 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 3abf8dd..c22a41d 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -5,6 +5,7 @@
#include <linux/stddef.h>
#include <linux/stringify.h>
#include <asm/asm.h>
+#include <asm/ptrace.h>

/*
* Alternative inline assembly for SMP.
@@ -232,6 +233,7 @@ struct text_poke_param {
size_t len;
};

+extern int poke_bp_int3_handler(struct pt_regs *regs);
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 0ab4936..e1088f2 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -605,6 +605,24 @@ static void do_sync_core(void *info)
static bool bp_patching_in_progress;
static void *bp_int3_handler, *bp_int3_addr;

+int poke_bp_int3_handler(struct pt_regs *regs)
+{
+ /* bp_patching_in_progress */
+ smp_rmb();
+
+ if (likely(!bp_patching_in_progress))
+ return 0;
+
+ if (user_mode_vm(regs) || regs->ip != (unsigned long)bp_int3_addr)
+ return 0;
+
+ /* set up the specified breakpoint handler */
+ regs->ip = (unsigned long) bp_int3_handler;
+
+ return 1;
+
+}
+
static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
{
struct die_args *args = data;
@@ -689,6 +707,7 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
return addr;
}

+#if 0
/* this one needs to run before anything else handles it as a
* regular exception */
static struct notifier_block int3_nb = {
@@ -700,8 +719,9 @@ static int __init int3_init(void)
{
return register_die_notifier(&int3_nb);
}
-
arch_initcall(int3_init);
+#endif
+
/*
* Cross-modifying kernel text with stop_machine().
* This code originally comes from immediate value.
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 772e2a8..e464764 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -58,6 +58,7 @@
#include <asm/mce.h>
#include <asm/fixmap.h>
#include <asm/mach_traps.h>
+#include <asm/alternative.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -324,6 +325,9 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
ftrace_int3_handler(regs))
return;
#endif
+ if (poke_bp_int3_handler(regs))
+ return;
+
prev_state = exception_enter();
#ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,

--
Jiri Kosina
SUSE Labs

2013-07-21 16:21:21

by Fengguang Wu

[permalink] [raw]
Subject: Re: [x86] Kernel panic - not syncing: Fatal exception in interrupt

Hi Jiri,

> Fengguang, as I am not able to reproduce this bug locally, could you do me
> a favor and test whether the patch below works the problem around, just
> for the sake of testing the hypothesis?

Sure. I just created a branch with this patch on top of the first bad
commit, and queued the branch for boot tests.

Thanks,
Fengguang

> From: Jiri Kosina <[email protected]>
> Subject: [PATCH] x86: call out into int3 handler directly instead of using notifier
>
> ---
> arch/x86/include/asm/alternative.h | 2 ++
> arch/x86/kernel/alternative.c | 22 +++++++++++++++++++++-
> arch/x86/kernel/traps.c | 4 ++++
> 3 files changed, 27 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 3abf8dd..c22a41d 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -5,6 +5,7 @@
> #include <linux/stddef.h>
> #include <linux/stringify.h>
> #include <asm/asm.h>
> +#include <asm/ptrace.h>
>
> /*
> * Alternative inline assembly for SMP.
> @@ -232,6 +233,7 @@ struct text_poke_param {
> size_t len;
> };
>
> +extern int poke_bp_int3_handler(struct pt_regs *regs);
> extern void *text_poke(void *addr, const void *opcode, size_t len);
> extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
> extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 0ab4936..e1088f2 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -605,6 +605,24 @@ static void do_sync_core(void *info)
> static bool bp_patching_in_progress;
> static void *bp_int3_handler, *bp_int3_addr;
>
> +int poke_bp_int3_handler(struct pt_regs *regs)
> +{
> + /* bp_patching_in_progress */
> + smp_rmb();
> +
> + if (likely(!bp_patching_in_progress))
> + return 0;
> +
> + if (user_mode_vm(regs) || regs->ip != (unsigned long)bp_int3_addr)
> + return 0;
> +
> + /* set up the specified breakpoint handler */
> + regs->ip = (unsigned long) bp_int3_handler;
> +
> + return 1;
> +
> +}
> +
> static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
> {
> struct die_args *args = data;
> @@ -689,6 +707,7 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> return addr;
> }
>
> +#if 0
> /* this one needs to run before anything else handles it as a
> * regular exception */
> static struct notifier_block int3_nb = {
> @@ -700,8 +719,9 @@ static int __init int3_init(void)
> {
> return register_die_notifier(&int3_nb);
> }
> -
> arch_initcall(int3_init);
> +#endif
> +
> /*
> * Cross-modifying kernel text with stop_machine().
> * This code originally comes from immediate value.
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 772e2a8..e464764 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -58,6 +58,7 @@
> #include <asm/mce.h>
> #include <asm/fixmap.h>
> #include <asm/mach_traps.h>
> +#include <asm/alternative.h>
>
> #ifdef CONFIG_X86_64
> #include <asm/x86_init.h>
> @@ -324,6 +325,9 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
> ftrace_int3_handler(regs))
> return;
> #endif
> + if (poke_bp_int3_handler(regs))
> + return;
> +
> prev_state = exception_enter();
> #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
> if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
>
> --
> Jiri Kosina
> SUSE Labs

2013-07-21 17:21:27

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [x86] Kernel panic - not syncing: Fatal exception in interrupt

Honestly, why don't we make the patch list (rbtree, whatever) a permanent part of the default breakpoint handler. It only applies to kernel space anyway and the kernel doesn't have any permanent breakpoints so there should be no performance reason not to.

Jiri Kosina <[email protected]> wrote:
>On Sat, 20 Jul 2013, H. Peter Anvin wrote:
>
>> > [ 0.212429] devtmpfs: initialized
>> > [ 0.236027] int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> > [ 0.237157] Modules linked in:
>> > [ 0.237765] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>3.11.0-rc1-01429-g04bf576 #8
>> > [ 0.239129] task: ffff88000da1b040 ti: ffff88000da1c000 task.ti:
>ffff88000da1c000
>> > [ 0.240000] RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>]
>ttwu_do_wakeup+0x28/0x225
>> > [ 0.240000] RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006
>> > [ 0.240000] RAX: 0000000000000000 RBX: ffff88000dd12940 RCX:
>ffffffff81769c40
>> > [ 0.240000] RDX: 0000000000000002 RSI: 0000000000000000 RDI:
>0000000000000001
>> > [ 0.240000] RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09:
>0000000000000002
>> > [ 0.240000] R10: ffffffff810ff484 R11: ffff88000dd129e8 R12:
>ffff88000dbc90c0
>> > [ 0.240000] R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15:
>ffff88000da1dfd8
>> > [ 0.240000] FS: 0000000000000000(0000)
>GS:ffff88000dd00000(0000) knlGS:0000000000000000
>> > [ 0.240000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> > [ 0.240000] CR2: 00000000ffffffff CR3: 0000000001c88000 CR4:
>00000000000006e0
>> > [ 0.240000] Stack:
>> > [ 0.240000] ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8
>ffff88000dd03f48
>> > [ 0.240000] ffffffff81109e2b ffff88000dd12940 0000000000000000
>ffff88000dd03f68
>> > [ 0.240000] ffffffff81109e9e 0000000000000000 0000000000012940
>ffff88000dd03f98
>> > [ 0.240000] Call Trace:
>> > [ 0.240000] <IRQ>
>> > [ 0.240000] [<ffffffff81109e2b>]
>ttwu_do_activate.constprop.56+0x6d/0x79
>> > [ 0.240000] [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
>> > [ 0.240000] [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
>> > [ 0.240000] [<ffffffff8104dfb4>]
>smp_reschedule_interrupt+0x38/0x41
>> > [ 0.240000] [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
>> > [ 0.240000] <EOI>
>> > [ 0.240000] [<ffffffff810ff484>] ?
>__atomic_notifier_call_chain+0x5/0xc1
>> > [ 0.240000] [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
>>
>> Well, it is definitely easy to see what happened here.
>>
>> We took a breakpoint fault that the kernel didn't expect. This
>> shouldn't happen... the breakpoint handler should have said "oh, this
>is
>> an instruction being patched" and resumed, but that didn't happen.
>>
>> Jiri, I'm wondering if by any chance we have more than one CPU inside
>> text_poke_bp() at the same time. The global variables in
>text_poke_bp()
>> don't seem to be protected against reentrancy at all.
>
>That shouldn't happen, because:
>
>
>- text_poke_bp() should always be called under text_mutex (and
> arch_jump_label_transform() does that properly)
>
>- correctness between int3_notify() and texp_poke_bp() wrt. global
> variables is achieved through barrier
>
>So we should be safe here afaics.
>
>What I am however wondering whether can't be case here is that the jump
>
>label was used before int3_notifier has been registered.
>I am thinking about ways around this, but we'll probably have to do the
>
>same ftrace is doing, i.e. hook into do_int3() directly instead of
>relying
>on the notifier to be registered in time.
>
>Fengguang, as I am not able to reproduce this bug locally, could you do
>me
>a favor and test whether the patch below works the problem around, just
>
>for the sake of testing the hypothesis?
>
>Thanks.
>
>
>From: Jiri Kosina <[email protected]>
>Subject: [PATCH] x86: call out into int3 handler directly instead of
>using notifier
>
>---
> arch/x86/include/asm/alternative.h | 2 ++
> arch/x86/kernel/alternative.c | 22 +++++++++++++++++++++-
> arch/x86/kernel/traps.c | 4 ++++
> 3 files changed, 27 insertions(+), 1 deletions(-)
>
>diff --git a/arch/x86/include/asm/alternative.h
>b/arch/x86/include/asm/alternative.h
>index 3abf8dd..c22a41d 100644
>--- a/arch/x86/include/asm/alternative.h
>+++ b/arch/x86/include/asm/alternative.h
>@@ -5,6 +5,7 @@
> #include <linux/stddef.h>
> #include <linux/stringify.h>
> #include <asm/asm.h>
>+#include <asm/ptrace.h>
>
> /*
> * Alternative inline assembly for SMP.
>@@ -232,6 +233,7 @@ struct text_poke_param {
> size_t len;
> };
>
>+extern int poke_bp_int3_handler(struct pt_regs *regs);
> extern void *text_poke(void *addr, const void *opcode, size_t len);
>extern void *text_poke_bp(void *addr, const void *opcode, size_t len,
>void *handler);
>extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
>diff --git a/arch/x86/kernel/alternative.c
>b/arch/x86/kernel/alternative.c
>index 0ab4936..e1088f2 100644
>--- a/arch/x86/kernel/alternative.c
>+++ b/arch/x86/kernel/alternative.c
>@@ -605,6 +605,24 @@ static void do_sync_core(void *info)
> static bool bp_patching_in_progress;
> static void *bp_int3_handler, *bp_int3_addr;
>
>+int poke_bp_int3_handler(struct pt_regs *regs)
>+{
>+ /* bp_patching_in_progress */
>+ smp_rmb();
>+
>+ if (likely(!bp_patching_in_progress))
>+ return 0;
>+
>+ if (user_mode_vm(regs) || regs->ip != (unsigned long)bp_int3_addr)
>+ return 0;
>+
>+ /* set up the specified breakpoint handler */
>+ regs->ip = (unsigned long) bp_int3_handler;
>+
>+ return 1;
>+
>+}
>+
>static int int3_notify(struct notifier_block *self, unsigned long val,
>void *data)
> {
> struct die_args *args = data;
>@@ -689,6 +707,7 @@ void *text_poke_bp(void *addr, const void *opcode,
>size_t len, void *handler)
> return addr;
> }
>
>+#if 0
> /* this one needs to run before anything else handles it as a
> * regular exception */
> static struct notifier_block int3_nb = {
>@@ -700,8 +719,9 @@ static int __init int3_init(void)
> {
> return register_die_notifier(&int3_nb);
> }
>-
> arch_initcall(int3_init);
>+#endif
>+
> /*
> * Cross-modifying kernel text with stop_machine().
> * This code originally comes from immediate value.
>diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>index 772e2a8..e464764 100644
>--- a/arch/x86/kernel/traps.c
>+++ b/arch/x86/kernel/traps.c
>@@ -58,6 +58,7 @@
> #include <asm/mce.h>
> #include <asm/fixmap.h>
> #include <asm/mach_traps.h>
>+#include <asm/alternative.h>
>
> #ifdef CONFIG_X86_64
> #include <asm/x86_init.h>
>@@ -324,6 +325,9 @@ dotraplinkage void __kprobes notrace do_int3(struct
>pt_regs *regs, long error_co
> ftrace_int3_handler(regs))
> return;
> #endif
>+ if (poke_bp_int3_handler(regs))
>+ return;
>+
> prev_state = exception_enter();
> #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
> if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-07-22 01:08:00

by Fengguang Wu

[permalink] [raw]
Subject: Re: [x86] Kernel panic - not syncing: Fatal exception in interrupt

Hi Jiri,

> What I am however wondering whether can't be case here is that the jump
> label was used before int3_notifier has been registered.
> I am thinking about ways around this, but we'll probably have to do the
> same ftrace is doing, i.e. hook into do_int3() directly instead of relying
> on the notifier to be registered in time.
>
> Fengguang, as I am not able to reproduce this bug locally, could you do me
> a favor and test whether the patch below works the problem around, just
> for the sake of testing the hypothesis?

I tested 1000 boots with the patch and find no more boot problem.

Thanks,
Fengguang

> From: Jiri Kosina <[email protected]>
> Subject: [PATCH] x86: call out into int3 handler directly instead of using notifier
>
> ---
> arch/x86/include/asm/alternative.h | 2 ++
> arch/x86/kernel/alternative.c | 22 +++++++++++++++++++++-
> arch/x86/kernel/traps.c | 4 ++++
> 3 files changed, 27 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 3abf8dd..c22a41d 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -5,6 +5,7 @@
> #include <linux/stddef.h>
> #include <linux/stringify.h>
> #include <asm/asm.h>
> +#include <asm/ptrace.h>
>
> /*
> * Alternative inline assembly for SMP.
> @@ -232,6 +233,7 @@ struct text_poke_param {
> size_t len;
> };
>
> +extern int poke_bp_int3_handler(struct pt_regs *regs);
> extern void *text_poke(void *addr, const void *opcode, size_t len);
> extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
> extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 0ab4936..e1088f2 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -605,6 +605,24 @@ static void do_sync_core(void *info)
> static bool bp_patching_in_progress;
> static void *bp_int3_handler, *bp_int3_addr;
>
> +int poke_bp_int3_handler(struct pt_regs *regs)
> +{
> + /* bp_patching_in_progress */
> + smp_rmb();
> +
> + if (likely(!bp_patching_in_progress))
> + return 0;
> +
> + if (user_mode_vm(regs) || regs->ip != (unsigned long)bp_int3_addr)
> + return 0;
> +
> + /* set up the specified breakpoint handler */
> + regs->ip = (unsigned long) bp_int3_handler;
> +
> + return 1;
> +
> +}
> +
> static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
> {
> struct die_args *args = data;
> @@ -689,6 +707,7 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> return addr;
> }
>
> +#if 0
> /* this one needs to run before anything else handles it as a
> * regular exception */
> static struct notifier_block int3_nb = {
> @@ -700,8 +719,9 @@ static int __init int3_init(void)
> {
> return register_die_notifier(&int3_nb);
> }
> -
> arch_initcall(int3_init);
> +#endif
> +
> /*
> * Cross-modifying kernel text with stop_machine().
> * This code originally comes from immediate value.
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 772e2a8..e464764 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -58,6 +58,7 @@
> #include <asm/mce.h>
> #include <asm/fixmap.h>
> #include <asm/mach_traps.h>
> +#include <asm/alternative.h>
>
> #ifdef CONFIG_X86_64
> #include <asm/x86_init.h>
> @@ -324,6 +325,9 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
> ftrace_int3_handler(regs))
> return;
> #endif
> + if (poke_bp_int3_handler(regs))
> + return;
> +
> prev_state = exception_enter();
> #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
> if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
>
> --
> Jiri Kosina
> SUSE Labs

2013-07-22 11:14:23

by Jiri Kosina

[permalink] [raw]
Subject: [PATCH -tip/x86/jumplabel] x86: call out into int3 handler directly instead of using notifier

In fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based instruction
patching"), the mechanism that was introduced for notifying alternatives
code from int3 exception handler that and exception occured was
die_notifier.

This is however problematic, as early code might be using jump labels even
before the notifier registration has been performed, which will then lead
to an oops due to unhandled exception. One of such occurences has been
encountered by Fengguang:

int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006
RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
Stack:
ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
Call Trace:
<IRQ>
[<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
[<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
[<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
[<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
[<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
<EOI>
[<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
[<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
[<ffffffff81015f10>] default_idle+0x147/0x282
[<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d
[<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db
[<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84
[<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5
Code: 5c 5d c3 e8 d7 0f 63 00 55 48 ff 05 1f 5d 1e 01 48 89 e5 41 55 49 89 f5 41 54 53 48 89 fb e8 8d fe ff ff 48
01 cc <1f> 44 00 00 31 c0 eb 0c 48 ff 05 05 5d 1e 01 b8 01 00 00 00 48
RIP [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
RSP <ffff88000dd03f10>
---[ end trace 0d3288a047152a17 ]---

Fix this by directly calling poke_int3_handler() from the int3 exception
handler (analogically to what ftrace has been doing already), instead of
relying on notifier, registration of which might not have yet been
finalized by the time of the first trap.

Reported-and-tested-by: Fengguang Wu <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
---
arch/x86/include/asm/alternative.h | 2 ++
arch/x86/kernel/alternative.c | 31 ++++++++-----------------------
arch/x86/kernel/traps.c | 4 ++++
kernel/kprobes.c | 2 +-
4 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 3abf8dd..4df44c2 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -5,6 +5,7 @@
#include <linux/stddef.h>
#include <linux/stringify.h>
#include <asm/asm.h>
+#include <asm/ptrace.h>

/*
* Alternative inline assembly for SMP.
@@ -232,6 +233,7 @@ struct text_poke_param {
size_t len;
};

+extern int poke_int3_handler(struct pt_regs *regs);
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 0ab4936..7f6351f 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -605,26 +605,24 @@ static void do_sync_core(void *info)
static bool bp_patching_in_progress;
static void *bp_int3_handler, *bp_int3_addr;

-static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
+int poke_int3_handler(struct pt_regs *regs)
{
- struct die_args *args = data;
-
/* bp_patching_in_progress */
smp_rmb();

if (likely(!bp_patching_in_progress))
- return NOTIFY_DONE;
+ return 0;

- /* we are not interested in non-int3 faults and ring > 0 faults */
- if (val != DIE_INT3 || !args->regs || user_mode_vm(args->regs)
- || args->regs->ip != (unsigned long)bp_int3_addr)
- return NOTIFY_DONE;
+ if (user_mode_vm(regs) || regs->ip != (unsigned long)bp_int3_addr)
+ return 0;

/* set up the specified breakpoint handler */
- args->regs->ip = (unsigned long) bp_int3_handler;
+ regs->ip = (unsigned long) bp_int3_handler;
+
+ return 1;

- return NOTIFY_STOP;
}
+
/**
* text_poke_bp() -- update instructions on live kernel on SMP
* @addr: address to patch
@@ -689,19 +687,6 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
return addr;
}

-/* this one needs to run before anything else handles it as a
- * regular exception */
-static struct notifier_block int3_nb = {
- .priority = 0x7fffffff,
- .notifier_call = int3_notify
-};
-
-static int __init int3_init(void)
-{
- return register_die_notifier(&int3_nb);
-}
-
-arch_initcall(int3_init);
/*
* Cross-modifying kernel text with stop_machine().
* This code originally comes from immediate value.
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 772e2a8..2694486 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -58,6 +58,7 @@
#include <asm/mce.h>
#include <asm/fixmap.h>
#include <asm/mach_traps.h>
+#include <asm/alternative.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -324,6 +325,9 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
ftrace_int3_handler(regs))
return;
#endif
+ if (poke_int3_handler(regs))
+ return;
+
prev_state = exception_enter();
#ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index d6db7bd..bddf3b2 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);

static struct notifier_block kprobe_exceptions_nb = {
.notifier_call = kprobe_exceptions_notify,
- .priority = 0x7ffffff0 /* High priority, but not first. */
+ .priority = 0x7fffffff /* we need to be notified first */
};

unsigned long __weak arch_deref_entry_point(void *entry)
--
1.7.8.3

2013-07-22 11:18:22

by Jiri Kosina

[permalink] [raw]
Subject: Re: [x86] Kernel panic - not syncing: Fatal exception in interrupt

On Sun, 21 Jul 2013, H. Peter Anvin wrote:

> Honestly, why don't we make the patch list (rbtree, whatever) a
> permanent part of the default breakpoint handler. It only applies to
> kernel space anyway and the kernel doesn't have any permanent
> breakpoints so there should be no performance reason not to.

That is actually quite a nice idea, and I am putting it on my TODO list
for ftrace conversion.

Thanks,

--
Jiri Kosina
SUSE Labs

Subject: Re: [PATCH -tip/x86/jumplabel] x86: call out into int3 handler directly instead of using notifier

(2013/07/22 20:14), Jiri Kosina wrote:
> In fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based instruction
> patching"), the mechanism that was introduced for notifying alternatives
> code from int3 exception handler that and exception occured was
> die_notifier.
>
> This is however problematic, as early code might be using jump labels even
> before the notifier registration has been performed, which will then lead
> to an oops due to unhandled exception. One of such occurences has been
> encountered by Fengguang:

Ah, right!

>
> int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> Modules linked in:
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
> task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
> RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
> RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006
> RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
> RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
> RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
> R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
> R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
> FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
> Stack:
> ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
> ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
> ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
> Call Trace:
> <IRQ>
> [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
> [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
> [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
> [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
> [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
> <EOI>
> [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
> [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
> [<ffffffff81015f10>] default_idle+0x147/0x282
> [<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d
> [<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db
> [<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84
> [<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5
> Code: 5c 5d c3 e8 d7 0f 63 00 55 48 ff 05 1f 5d 1e 01 48 89 e5 41 55 49 89 f5 41 54 53 48 89 fb e8 8d fe ff ff 48
> 01 cc <1f> 44 00 00 31 c0 eb 0c 48 ff 05 05 5d 1e 01 b8 01 00 00 00 48
> RIP [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
> RSP <ffff88000dd03f10>
> ---[ end trace 0d3288a047152a17 ]---
>
> Fix this by directly calling poke_int3_handler() from the int3 exception
> handler (analogically to what ftrace has been doing already), instead of
> relying on notifier, registration of which might not have yet been
> finalized by the time of the first trap.

This seems OK for me.

Reviewed-by: Masami Hiramatsu <[email protected]>

Thanks!

>
> Reported-and-tested-by: Fengguang Wu <[email protected]>
> Signed-off-by: Jiri Kosina <[email protected]>
> ---
> arch/x86/include/asm/alternative.h | 2 ++
> arch/x86/kernel/alternative.c | 31 ++++++++-----------------------
> arch/x86/kernel/traps.c | 4 ++++
> kernel/kprobes.c | 2 +-
> 4 files changed, 15 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 3abf8dd..4df44c2 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -5,6 +5,7 @@
> #include <linux/stddef.h>
> #include <linux/stringify.h>
> #include <asm/asm.h>
> +#include <asm/ptrace.h>
>
> /*
> * Alternative inline assembly for SMP.
> @@ -232,6 +233,7 @@ struct text_poke_param {
> size_t len;
> };
>
> +extern int poke_int3_handler(struct pt_regs *regs);
> extern void *text_poke(void *addr, const void *opcode, size_t len);
> extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
> extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 0ab4936..7f6351f 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -605,26 +605,24 @@ static void do_sync_core(void *info)
> static bool bp_patching_in_progress;
> static void *bp_int3_handler, *bp_int3_addr;
>
> -static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
> +int poke_int3_handler(struct pt_regs *regs)
> {
> - struct die_args *args = data;
> -
> /* bp_patching_in_progress */
> smp_rmb();
>
> if (likely(!bp_patching_in_progress))
> - return NOTIFY_DONE;
> + return 0;
>
> - /* we are not interested in non-int3 faults and ring > 0 faults */
> - if (val != DIE_INT3 || !args->regs || user_mode_vm(args->regs)
> - || args->regs->ip != (unsigned long)bp_int3_addr)
> - return NOTIFY_DONE;
> + if (user_mode_vm(regs) || regs->ip != (unsigned long)bp_int3_addr)
> + return 0;
>
> /* set up the specified breakpoint handler */
> - args->regs->ip = (unsigned long) bp_int3_handler;
> + regs->ip = (unsigned long) bp_int3_handler;
> +
> + return 1;
>
> - return NOTIFY_STOP;
> }
> +
> /**
> * text_poke_bp() -- update instructions on live kernel on SMP
> * @addr: address to patch
> @@ -689,19 +687,6 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> return addr;
> }
>
> -/* this one needs to run before anything else handles it as a
> - * regular exception */
> -static struct notifier_block int3_nb = {
> - .priority = 0x7fffffff,
> - .notifier_call = int3_notify
> -};
> -
> -static int __init int3_init(void)
> -{
> - return register_die_notifier(&int3_nb);
> -}
> -
> -arch_initcall(int3_init);
> /*
> * Cross-modifying kernel text with stop_machine().
> * This code originally comes from immediate value.
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 772e2a8..2694486 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -58,6 +58,7 @@
> #include <asm/mce.h>
> #include <asm/fixmap.h>
> #include <asm/mach_traps.h>
> +#include <asm/alternative.h>
>
> #ifdef CONFIG_X86_64
> #include <asm/x86_init.h>
> @@ -324,6 +325,9 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
> ftrace_int3_handler(regs))
> return;
> #endif
> + if (poke_int3_handler(regs))
> + return;
> +
> prev_state = exception_enter();
> #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
> if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index d6db7bd..bddf3b2 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);
>
> static struct notifier_block kprobe_exceptions_nb = {
> .notifier_call = kprobe_exceptions_notify,
> - .priority = 0x7ffffff0 /* High priority, but not first. */
> + .priority = 0x7fffffff /* we need to be notified first */
> };
>
> unsigned long __weak arch_deref_entry_point(void *entry)
>


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-07-22 20:53:22

by H. Peter Anvin

[permalink] [raw]

2013-07-22 21:00:57

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH -tip/x86/jumplabel] x86: call out into int3 handler directly instead of using notifier

On Mon, 22 Jul 2013, H. Peter Anvin wrote:

> What is the baseline for this patch?

Hi!

it's x86/jumplabel branch, as a followup to commit
fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based instruction
patching") sitting there.

This branch however seems to be broken by Ingo's mismerge; this e-mail
from earlier today:

https://lkml.org/lkml/2013/7/22/147

implied that he fixed this ("all is fine"), however x86/jumplabel still
contains just 3/3 of Masami's series, and therefore is broken; 3/3 is not
enough, it needs also

kprobes/x86: Remove an incorrect comment about int3 in NMI/MCE
kprobes/x86: Use text_poke_bp() instead of text_poke_smp*()

included or

kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions

dropped to be in consistent state again.

I pinged Ingo about this:

https://lkml.org/lkml/2013/7/22/281

but the branch still seems to be in an odd state, containing last patch of
Masami's series.

Thanks in advance for fixing this and applying mi fix as well,

--
Jiri Kosina
SUSE Labs

Subject: Re: Re: [PATCH -tip/x86/jumplabel] x86: call out into int3 handler directly instead of using notifier

(2013/07/23 6:00), Jiri Kosina wrote:
> On Mon, 22 Jul 2013, H. Peter Anvin wrote:
>
>> What is the baseline for this patch?
>
> Hi!
>
> it's x86/jumplabel branch, as a followup to commit
> fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based instruction
> patching") sitting there.

AFAICS, tip/master already merged Jiri's works on tip/x86/jumplabel.
Thus both branches need this. (especially, since this is actual
bugfix, it should go into tip/master.)


> This branch however seems to be broken by Ingo's mismerge; this e-mail
> from earlier today:
>
> https://lkml.org/lkml/2013/7/22/147
>
> implied that he fixed this ("all is fine"), however x86/jumplabel still
> contains just 3/3 of Masami's series, and therefore is broken; 3/3 is not
> enough, it needs also
>
> kprobes/x86: Remove an incorrect comment about int3 in NMI/MCE
> kprobes/x86: Use text_poke_bp() instead of text_poke_smp*()
>
> included or
>
> kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions
>
> dropped to be in consistent state again.

I hope to pull entire of this series in :)

>
> I pinged Ingo about this:
>
> https://lkml.org/lkml/2013/7/22/281
>
> but the branch still seems to be in an odd state, containing last patch of
> Masami's series.
>
> Thanks in advance for fixing this and applying mi fix as well,
>

Thanks!

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-07-23 07:45:11

by Ingo Molnar

[permalink] [raw]
Subject: Re: Re: [PATCH -tip/x86/jumplabel] x86: call out into int3 handler directly instead of using notifier


* Masami Hiramatsu <[email protected]> wrote:

> (2013/07/23 6:00), Jiri Kosina wrote:
> > On Mon, 22 Jul 2013, H. Peter Anvin wrote:
> >
> >> What is the baseline for this patch?
> >
> > Hi!
> >
> > it's x86/jumplabel branch, as a followup to commit
> > fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based instruction
> > patching") sitting there.
>
> AFAICS, tip/master already merged Jiri's works on tip/x86/jumplabel.
> Thus both branches need this. (especially, since this is actual
> bugfix, it should go into tip/master.)

I resolved it yesterday by keeping the kprobes patches in perf/core - just
forgot to push it all out.

I've pushed it out now, so perf/core should be a baseline: it merges in
x86/jumplabel and then applies the kprobes patches (historically in
perf/core).

Thanks,

Ingo

2013-07-23 08:10:21

by Jiri Kosina

[permalink] [raw]
Subject: Re: Re: [PATCH -tip/x86/jumplabel] x86: call out into int3 handler directly instead of using notifier

On Tue, 23 Jul 2013, Ingo Molnar wrote:

> > > it's x86/jumplabel branch, as a followup to commit
> > > fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based instruction
> > > patching") sitting there.
> >
> > AFAICS, tip/master already merged Jiri's works on tip/x86/jumplabel.
> > Thus both branches need this. (especially, since this is actual
> > bugfix, it should go into tip/master.)
>
> I resolved it yesterday by keeping the kprobes patches in perf/core - just
> forgot to push it all out.
>
> I've pushed it out now, so perf/core should be a baseline: it merges in
> x86/jumplabel and then applies the kprobes patches (historically in
> perf/core).

Ok, thanks Ingo. Rebased fix below.




From: Jiri Kosina <[email protected]>
Subject: [PATCH] x86: call out into int3 handler directly instead of using notifier

In fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based instruction
patching"), the mechanism that was introduced for notifying alternatives
code from int3 exception handler that and exception occured was
die_notifier.

This is however problematic, as early code might be using jump labels even
before the notifier registration has been performed, which will then lead
to an oops due to unhandled exception. One of such occurences has been
encountered by Fengguang:

int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006
RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
Stack:
ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
Call Trace:
<IRQ>
[<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
[<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
[<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
[<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
[<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
<EOI>
[<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
[<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
[<ffffffff81015f10>] default_idle+0x147/0x282
[<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d
[<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db
[<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84
[<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5
Code: 5c 5d c3 e8 d7 0f 63 00 55 48 ff 05 1f 5d 1e 01 48 89 e5 41 55 49 89 f5 41 54 53 48 89 fb e8 8d fe ff ff 48
01 cc <1f> 44 00 00 31 c0 eb 0c 48 ff 05 05 5d 1e 01 b8 01 00 00 00 48
RIP [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
RSP <ffff88000dd03f10>
---[ end trace 0d3288a047152a17 ]---

Fix this by directly calling poke_int3_handler() from the int3 exception
handler (analogically to what ftrace has been doing already), instead of
relying on notifier, registration of which might not have yet been
finalized by the time of the first trap.

Reported-and-tested-by: Fengguang Wu <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
---
arch/x86/include/asm/alternative.h | 2 ++
arch/x86/kernel/alternative.c | 31 ++++++++-----------------------
arch/x86/kernel/traps.c | 4 ++++
kernel/kprobes.c | 2 +-
4 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 4daf8c5..0a3f9c9 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -5,6 +5,7 @@
#include <linux/stddef.h>
#include <linux/stringify.h>
#include <asm/asm.h>
+#include <asm/ptrace.h>

/*
* Alternative inline assembly for SMP.
@@ -224,6 +225,7 @@ extern void *text_poke_early(void *addr, const void *opcode, size_t len);
* inconsistent instruction while you patch.
*/
extern void *text_poke(void *addr, const void *opcode, size_t len);
+extern int poke_int3_handler(struct pt_regs *regs);
extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);

#endif /* _ASM_X86_ALTERNATIVE_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5d8782e..15e8563 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -605,26 +605,24 @@ static void do_sync_core(void *info)
static bool bp_patching_in_progress;
static void *bp_int3_handler, *bp_int3_addr;

-static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
+int poke_int3_handler(struct pt_regs *regs)
{
- struct die_args *args = data;
-
/* bp_patching_in_progress */
smp_rmb();

if (likely(!bp_patching_in_progress))
- return NOTIFY_DONE;
+ return 0;

- /* we are not interested in non-int3 faults and ring > 0 faults */
- if (val != DIE_INT3 || !args->regs || user_mode_vm(args->regs)
- || args->regs->ip != (unsigned long)bp_int3_addr)
- return NOTIFY_DONE;
+ if (user_mode_vm(regs) || regs->ip != (unsigned long)bp_int3_addr)
+ return 0;

/* set up the specified breakpoint handler */
- args->regs->ip = (unsigned long) bp_int3_handler;
+ regs->ip = (unsigned long) bp_int3_handler;
+
+ return 1;

- return NOTIFY_STOP;
}
+
/**
* text_poke_bp() -- update instructions on live kernel on SMP
* @addr: address to patch
@@ -689,16 +687,3 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
return addr;
}

-/* this one needs to run before anything else handles it as a
- * regular exception */
-static struct notifier_block int3_nb = {
- .priority = 0x7fffffff,
- .notifier_call = int3_notify
-};
-
-static int __init int3_init(void)
-{
- return register_die_notifier(&int3_nb);
-}
-
-arch_initcall(int3_init);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 1b23a1c..8c8093b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -58,6 +58,7 @@
#include <asm/mce.h>
#include <asm/fixmap.h>
#include <asm/mach_traps.h>
+#include <asm/alternative.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -327,6 +328,9 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
ftrace_int3_handler(regs))
return;
#endif
+ if (poke_int3_handler(regs))
+ return;
+
prev_state = exception_enter();
#ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index b58b490..6e33498 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);

static struct notifier_block kprobe_exceptions_nb = {
.notifier_call = kprobe_exceptions_notify,
- .priority = 0x7ffffff0 /* High priority, but not first. */
+ .priority = 0x7fffffff /* we need to be notified first */
};

unsigned long __weak arch_deref_entry_point(void *entry)

--
Jiri Kosina
SUSE Labs

Subject: [tip:perf/core] kprobes/x86: Call out into INT3 handler directly instead of using notifier

Commit-ID: 17f41571bb2c4a398785452ac2718a6c5d77180e
Gitweb: http://git.kernel.org/tip/17f41571bb2c4a398785452ac2718a6c5d77180e
Author: Jiri Kosina <[email protected]>
AuthorDate: Tue, 23 Jul 2013 10:09:28 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 23 Jul 2013 10:12:57 +0200

kprobes/x86: Call out into INT3 handler directly instead of using notifier

In fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based
instruction patching"), the mechanism that was introduced for
notifying alternatives code from int3 exception handler that and
exception occured was die_notifier.

This is however problematic, as early code might be using jump
labels even before the notifier registration has been performed,
which will then lead to an oops due to unhandled exception. One
of such occurences has been encountered by Fengguang:

int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006
RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
Stack:
ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
Call Trace:
<IRQ>
[<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
[<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
[<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
[<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
[<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
<EOI>
[<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
[<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
[<ffffffff81015f10>] default_idle+0x147/0x282
[<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d
[<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db
[<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84
[<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5
[...]

Fix this by directly calling poke_int3_handler() from the int3
exception handler (analogically to what ftrace has been doing
already), instead of relying on notifier, registration of which
might not have yet been finalized by the time of the first trap.

Reported-and-tested-by: Fengguang Wu <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/alternative.h | 2 ++
arch/x86/kernel/alternative.c | 31 ++++++++-----------------------
arch/x86/kernel/traps.c | 4 ++++
kernel/kprobes.c | 2 +-
4 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 4daf8c5..0a3f9c9 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -5,6 +5,7 @@
#include <linux/stddef.h>
#include <linux/stringify.h>
#include <asm/asm.h>
+#include <asm/ptrace.h>

/*
* Alternative inline assembly for SMP.
@@ -224,6 +225,7 @@ extern void *text_poke_early(void *addr, const void *opcode, size_t len);
* inconsistent instruction while you patch.
*/
extern void *text_poke(void *addr, const void *opcode, size_t len);
+extern int poke_int3_handler(struct pt_regs *regs);
extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);

#endif /* _ASM_X86_ALTERNATIVE_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5d8782e..15e8563 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -605,26 +605,24 @@ static void do_sync_core(void *info)
static bool bp_patching_in_progress;
static void *bp_int3_handler, *bp_int3_addr;

-static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
+int poke_int3_handler(struct pt_regs *regs)
{
- struct die_args *args = data;
-
/* bp_patching_in_progress */
smp_rmb();

if (likely(!bp_patching_in_progress))
- return NOTIFY_DONE;
+ return 0;

- /* we are not interested in non-int3 faults and ring > 0 faults */
- if (val != DIE_INT3 || !args->regs || user_mode_vm(args->regs)
- || args->regs->ip != (unsigned long)bp_int3_addr)
- return NOTIFY_DONE;
+ if (user_mode_vm(regs) || regs->ip != (unsigned long)bp_int3_addr)
+ return 0;

/* set up the specified breakpoint handler */
- args->regs->ip = (unsigned long) bp_int3_handler;
+ regs->ip = (unsigned long) bp_int3_handler;
+
+ return 1;

- return NOTIFY_STOP;
}
+
/**
* text_poke_bp() -- update instructions on live kernel on SMP
* @addr: address to patch
@@ -689,16 +687,3 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
return addr;
}

-/* this one needs to run before anything else handles it as a
- * regular exception */
-static struct notifier_block int3_nb = {
- .priority = 0x7fffffff,
- .notifier_call = int3_notify
-};
-
-static int __init int3_init(void)
-{
- return register_die_notifier(&int3_nb);
-}
-
-arch_initcall(int3_init);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 1b23a1c..8c8093b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -58,6 +58,7 @@
#include <asm/mce.h>
#include <asm/fixmap.h>
#include <asm/mach_traps.h>
+#include <asm/alternative.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -327,6 +328,9 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
ftrace_int3_handler(regs))
return;
#endif
+ if (poke_int3_handler(regs))
+ return;
+
prev_state = exception_enter();
#ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index b58b490..6e33498 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);

static struct notifier_block kprobe_exceptions_nb = {
.notifier_call = kprobe_exceptions_notify,
- .priority = 0x7ffffff0 /* High priority, but not first. */
+ .priority = 0x7fffffff /* we need to be notified first */
};

unsigned long __weak arch_deref_entry_point(void *entry)

2013-07-24 14:34:24

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:perf/core] kprobes/x86: Call out into INT3 handler directly instead of using notifier

On 07/23/2013 08:55 PM, tip-bot for Jiri Kosina wrote:
> Commit-ID: 17f41571bb2c4a398785452ac2718a6c5d77180e
> Gitweb: http://git.kernel.org/tip/17f41571bb2c4a398785452ac2718a6c5d77180e
> Author: Jiri Kosina <[email protected]>
> AuthorDate: Tue, 23 Jul 2013 10:09:28 +0200
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Tue, 23 Jul 2013 10:12:57 +0200
>
> kprobes/x86: Call out into INT3 handler directly instead of using notifier
>
> In fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based
> instruction patching"), the mechanism that was introduced for
> notifying alternatives code from int3 exception handler that and
> exception occured was die_notifier.
>
> This is however problematic, as early code might be using jump
> labels even before the notifier registration has been performed,
> which will then lead to an oops due to unhandled exception. One
> of such occurences has been encountered by Fengguang:
>

Please note that putting this patch in perf/core means x86/jumplabel is
completely nonfunctional by itself.

-hpa

2013-07-29 09:07:01

by Jiri Kosina

[permalink] [raw]
Subject: Re: [tip:perf/core] kprobes/x86: Call out into INT3 handler directly instead of using notifier

On Wed, 24 Jul 2013, H. Peter Anvin wrote:

> > Commit-ID: 17f41571bb2c4a398785452ac2718a6c5d77180e
> > Gitweb: http://git.kernel.org/tip/17f41571bb2c4a398785452ac2718a6c5d77180e
> > Author: Jiri Kosina <[email protected]>
> > AuthorDate: Tue, 23 Jul 2013 10:09:28 +0200
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Tue, 23 Jul 2013 10:12:57 +0200
> >
> > kprobes/x86: Call out into INT3 handler directly instead of using notifier
> >
> > In fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based
> > instruction patching"), the mechanism that was introduced for
> > notifying alternatives code from int3 exception handler that and
> > exception occured was die_notifier.
> >
> > This is however problematic, as early code might be using jump
> > labels even before the notifier registration has been performed,
> > which will then lead to an oops due to unhandled exception. One
> > of such occurences has been encountered by Fengguang:
> >
>
> Please note that putting this patch in perf/core means x86/jumplabel is
> completely nonfunctional by itself.

I'd like to point out that this is still the case ... TIA,

--
Jiri Kosina
SUSE Labs

2013-07-29 09:15:10

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:perf/core] kprobes/x86: Call out into INT3 handler directly instead of using notifier

On 07/29/2013 02:06 AM, Jiri Kosina wrote:
>>
>> Please note that putting this patch in perf/core means x86/jumplabel is
>> completely nonfunctional by itself.
>
> I'd like to point out that this is still the case ... TIA,
>

What is?

-hpa

2013-07-29 09:18:57

by Jiri Kosina

[permalink] [raw]
Subject: Re: [tip:perf/core] kprobes/x86: Call out into INT3 handler directly instead of using notifier

On Mon, 29 Jul 2013, H. Peter Anvin wrote:

> >> Please note that putting this patch in perf/core means x86/jumplabel is
> >> completely nonfunctional by itself.
> >
> > I'd like to point out that this is still the case ... TIA,
>
> What is?

17f41571bb2c4a is only in perf/core, x86/jumplabel is missing it. I am not
sure about the tip workflow, but as it is now, x86/jumplabel is buggy, so
it should be either fixed (by pulling 17f41571bb2c4a from perf/core) or
removed, or somehow marked as broken ... ?

Not a big deal for me as long as perf/core is fine, but as you were
mentioning it ...

Thanks,

--
Jiri Kosina
SUSE Labs

2013-07-29 09:24:04

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:perf/core] kprobes/x86: Call out into INT3 handler directly instead of using notifier

On 07/29/2013 02:18 AM, Jiri Kosina wrote:
> On Mon, 29 Jul 2013, H. Peter Anvin wrote:
>
>>>> Please note that putting this patch in perf/core means x86/jumplabel is
>>>> completely nonfunctional by itself.
>>>
>>> I'd like to point out that this is still the case ... TIA,
>>
>> What is?
>
> 17f41571bb2c4a is only in perf/core, x86/jumplabel is missing it. I am not
> sure about the tip workflow, but as it is now, x86/jumplabel is buggy, so
> it should be either fixed (by pulling 17f41571bb2c4a from perf/core) or
> removed, or somehow marked as broken ... ?
>
> Not a big deal for me as long as perf/core is fine, but as you were
> mentioning it ...
>

Yes, we know. In effect, x86/jumplabel not really a working standalone
topic branch at the moment. It is probably not worth fixing it for this
kernel cycle, as the number of patches involved is relatively small, but
it is perhaps a bit unfortunate.

-hpa