2013-07-10 20:25:52

by Jiri Kosina

Subject: [RFC] [PATCH 0/2] x86: make jump labels use int3-based breakpoint instead of stop_machine()

Hi,

this is a resurrection of an idea from a few years back: have jump labels
use synchronization based on an int3 breakpoint rather than relying on
stop_machine() with all its consequences.

ftrace has been doing exactly this kind of patching for years, since
08d636b6 ("ftrace/x86: Have arch x86_64 use breakpoints instead of stop
machine").

This patchset first introduces generic text_poke_bp() that provides means
to perform this method of patching in parallel to text_poke_smp(), and
then converts x86 jump label code to use it.

If this is merged, I'll do a followup patch converting ftrace to use this
infrastructure as well, as it's doing the same thing in principle already.

Comments welcome.

--
Jiri Kosina
SUSE Labs


2013-07-10 20:26:00

by Jiri Kosina

Subject: [RFC] [PATCH 1/2] x86: introduce int3-based instruction patching

Introduce a method for run-time instruction patching on a live SMP kernel
based on an int3 breakpoint, completely avoiding the need for stop_machine().

The way this is achieved:

- add an int3 trap to the address that will be patched
- sync cores
- update all but the first byte of the patched range
- sync cores
- replace the first byte (int3) with the first byte of
the replacing opcode
- sync cores

If any CPU's instruction execution were to collide with the patching,
it would be trapped by the int3 breakpoint and redirected to the provided
"handler" (which typically means just skipping over the patched region,
acting as if a "nop" had been there, in the case of nop -> jump
and jump -> nop transitions).

Ftrace has been using this very technique since 08d636b ("ftrace/x86:
Have arch x86_64 use breakpoints instead of stop machine") for ages
already, and jump labels are another obvious potential user of this.

Based on activities of Masami Hiramatsu <[email protected]>
a few years ago.

Signed-off-by: Jiri Kosina <[email protected]>
---
arch/x86/include/asm/alternative.h | 1 +
arch/x86/kernel/alternative.c | 100 ++++++++++++++++++++++++++++++++++++
kernel/kprobes.c | 2 +-
3 files changed, 102 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 58ed6d9..3abf8dd 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -233,6 +233,7 @@ struct text_poke_param {
};

extern void *text_poke(void *addr, const void *opcode, size_t len);
+extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
extern void text_poke_smp_batch(struct text_poke_param *params, int n);

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c15cf9a..f61242a 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -11,6 +11,7 @@
#include <linux/memory.h>
#include <linux/stop_machine.h>
#include <linux/slab.h>
+#include <linux/kdebug.h>
#include <asm/alternative.h>
#include <asm/sections.h>
#include <asm/pgtable.h>
@@ -596,6 +597,105 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
return addr;
}

+static void do_sync_core(void *info)
+{
+ sync_core();
+}
+
+static bool bp_patching_in_progress;
+static void *bp_int3_handler, *bp_int3_addr;
+
+static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
+{
+ struct die_args *args = data;
+ struct pt_regs *regs = args->regs;
+
+ /* bp_patching_in_progress */
+ smp_rmb();
+
+ if (likely(!bp_patching_in_progress))
+ return NOTIFY_DONE;
+
+ /* we are not interested in non-int3 faults and ring > 0 faults */
+ if (val != DIE_INT3 || !regs || user_mode_vm(regs)
+ || (unsigned long) bp_int3_addr != regs->ip)
+ return NOTIFY_DONE;
+
+ /* set up the specified breakpoint handler */
+ args->regs->ip = (unsigned long) bp_int3_handler;
+
+ return NOTIFY_STOP;
+}
+/*
+ * text_poke_bp() -- update instructions on live kernel on SMP
+ * @addr: address to patch
+ * @opcode: opcode of new instruction
+ * @len: length to copy
+ * @fixup: address to jump to when the temporary breakpoint is hit
+ *
+
+ * Modify multi-byte instruction by using int3 breakpoint on SMP.
+ * In contrary to text_poke_smp(), we completely avoid stop_machine() here,
+ * and achieve the synchronization using int3 breakpoint.
+ *
+ * The way it is done:
+ * - add an int3 trap to the address that will be patched
+ * - sync cores
+ * - update all but the first byte of the patched range
+ * - sync cores
+ * - replace the first byte (int3) with the first byte of
+ * the replacing opcode
+ * - sync cores
+ *
+ * Note: must be called under text_mutex.
+ */
+void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
+{
+ unsigned char int3 = 0xcc;
+
+ bp_int3_handler = handler;
+ bp_int3_addr = (u8 *)addr + sizeof(int3);
+ bp_patching_in_progress = true;
+ /* corresponding read barrier in int3 notifier for
+ * making sure the in_progress flags is correctly ordered wrt.
+ * patching */
+ smp_wmb();
+
+ text_poke(addr, &int3, sizeof(int3));
+
+ if (len - sizeof(int3) > 0) {
+ /* patch all but the first byte */
+ text_poke((char *)addr + sizeof(int3),
+ (const char *) opcode + sizeof(int3),
+ len - sizeof(int3));
+
+ on_each_cpu(do_sync_core, NULL, 1);
+ }
+
+ /* patch the first byte */
+ text_poke(addr, opcode, sizeof(int3));
+
+ on_each_cpu(do_sync_core, NULL, 1);
+
+ bp_patching_in_progress = false;
+ smp_wmb();
+
+ return addr;
+}
+
+/* this one needs to run before anything else handles it as a
+ * regular exception */
+static struct notifier_block int3_nb = {
+ .priority = 0x7fffffff,
+ .notifier_call = int3_notify
+};
+
+static int __init int3_init (void)
+{
+ return register_die_notifier(&int3_nb);
+}
+
+arch_initcall(int3_init);
/*
* Cross-modifying kernel text with stop_machine().
* This code originally comes from immediate value.
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index bddf3b2..d6db7bd 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);

static struct notifier_block kprobe_exceptions_nb = {
.notifier_call = kprobe_exceptions_notify,
- .priority = 0x7fffffff /* we need to be notified first */
+ .priority = 0x7ffffff0 /* High priority, but not first. */
};

unsigned long __weak arch_deref_entry_point(void *entry)
--
1.7.3.1

2013-07-10 20:26:03

by Jiri Kosina

Subject: [RFC] [PATCH 2/2] x86: make jump_label use int3-based patching

Make jump labels use text_poke_bp() for text patching instead of
text_poke_smp(), avoiding the need for stop_machine().

Signed-off-by: Jiri Kosina <[email protected]>
---
arch/x86/kernel/jump_label.c | 16 ++++++++++++++--
1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 2889b3d..460f5d9 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -37,7 +37,19 @@ static void __jump_label_transform(struct jump_entry *entry,
} else
memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);

- (*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+ /*
+ * Make text_poke_bp() the default fallback poker.
+ *
+ * At the time the change is being done, just ignore whether we
+ * are doing a nop -> jump or jump -> nop transition, and always
+ * assume nop to be the 'currently valid' instruction.
+ *
+ */
+ if (poker)
+ (*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+ else
+ text_poke_bp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE,
+ (void *)entry->code + JUMP_LABEL_NOP_SIZE);
}

void arch_jump_label_transform(struct jump_entry *entry,
@@ -45,7 +57,7 @@ void arch_jump_label_transform(struct jump_entry *entry,
{
get_online_cpus();
mutex_lock(&text_mutex);
- __jump_label_transform(entry, type, text_poke_smp);
+ __jump_label_transform(entry, type, NULL);
mutex_unlock(&text_mutex);
put_online_cpus();
}
--
1.7.3.1

2013-07-10 21:31:41

by Jiri Kosina

Subject: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

Introduce a method for run-time instrucntion patching on a live SMP kernel
based on int3 breakpoint, completely avoiding the need for stop_machine().

The way this is achieved:

- add a int3 trap to the address that will be patched
- sync cores
- update all but the first byte of the patched range
- sync cores
- replalace the first byte (int3) by the first byte of
replacing opcode
- sync cores

If any CPU instruction execution would collide with the patching,
it'd be trapped by the int3 breakpoint and redirected to the provided
"handler" (which would typically mean just skipping over the patched
region, acting as "nop" has been there, in case we are doing nop -> jump
and jump -> nop transitions).

Ftrace has been using this very technique since 08d636b ("ftrace/x86:
Have arch x86_64 use breakpoints instead of stop machine") for ages
already, and jump labels are another obvious potential user of this.

Based on activities of Masami Hiramatsu <[email protected]>
a few years ago.

Signed-off-by: Jiri Kosina <[email protected]>
---
Changes:

v1 -> v2:
+ fixed kerneldoc
+ fixed checkpatch errors (reported by Borislav)

arch/x86/include/asm/alternative.h | 1 +
arch/x86/kernel/alternative.c | 101 ++++++++++++++++++++++++++++++++++++
kernel/kprobes.c | 2 +-
3 files changed, 103 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 58ed6d9..3abf8dd 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -233,6 +233,7 @@ struct text_poke_param {
};

extern void *text_poke(void *addr, const void *opcode, size_t len);
+extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
extern void text_poke_smp_batch(struct text_poke_param *params, int n);

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c15cf9a..ee1f51c 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -11,6 +11,7 @@
#include <linux/memory.h>
#include <linux/stop_machine.h>
#include <linux/slab.h>
+#include <linux/kdebug.h>
#include <asm/alternative.h>
#include <asm/sections.h>
#include <asm/pgtable.h>
@@ -596,6 +597,106 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
return addr;
}

+static void do_sync_core(void *info)
+{
+ sync_core();
+}
+
+static bool bp_patching_in_progress;
+static void *bp_int3_handler, *bp_int3_addr;
+
+static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
+{
+ struct die_args *args = data;
+ struct pt_regs *regs = args->regs;
+
+ /* bp_patching_in_progress */
+ smp_rmb();
+
+ if (likely(!bp_patching_in_progress))
+ return NOTIFY_DONE;
+
+ /* we are not interested in non-int3 faults and ring > 0 faults */
+ if (val != DIE_INT3 || !regs || user_mode_vm(regs)
+ || (unsigned long) bp_int3_addr != regs->ip)
+ return NOTIFY_DONE;
+
+ /* set up the specified breakpoint handler */
+ args->regs->ip = (unsigned long) bp_int3_handler;
+
+ return NOTIFY_STOP;
+}
+/*
+ * text_poke_bp() -- update instructions on live kernel on SMP
+ * @addr: address to patch
+ * @opcode: opcode of new instruction
+ * @len: length to copy
+ * @handler: address to jump to when the temporary breakpoint is hit
+ *
+
+ * Modify multi-byte instruction by using int3 breakpoint on SMP.
+ * In contrary to text_poke_smp(), we completely avoid stop_machine() here,
+ * and achieve the synchronization using int3 breakpoint.
+ *
+ * The way it is done:
+ * - add a int3 trap to the address that will be patched
+ * - sync cores
+ * - update all but the first byte of the patched range
+ * - sync cores
+ * - replalace the first byte (int3) by the first byte of
+ * replacing opcode
+ * - sync cores
+ *
+ * Note: must be called under text_mutex.
+ */
+void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
+{
+ unsigned char int3 = 0xcc;
+
+ bp_int3_handler = handler;
+ bp_int3_addr = (u8 *)addr + sizeof(int3);
+ bp_patching_in_progress = true;
+ /*
+ * corresponding read barrier in int3 notifier for
+ * making sure the in_progress flags is correctly ordered wrt.
+ * patching */
+ smp_wmb();
+
+ text_poke(addr, &int3, sizeof(int3));
+
+ if (len - sizeof(int3) > 0) {
+ /* patch all but the first byte */
+ text_poke((char *)addr + sizeof(int3),
+ (const char *) opcode + sizeof(int3),
+ len - sizeof(int3));
+
+ on_each_cpu(do_sync_core, NULL, 1);
+ }
+
+ /* patch the first byte */
+ text_poke(addr, opcode, sizeof(int3));
+
+ on_each_cpu(do_sync_core, NULL, 1);
+
+ bp_patching_in_progress = false;
+ smp_wmb();
+
+ return addr;
+}
+
+/* this one needs to run before anything else handles it as a
+ * regular exception */
+static struct notifier_block int3_nb = {
+ .priority = 0x7fffffff,
+ .notifier_call = int3_notify
+};
+
+static int __init int3_init(void)
+{
+ return register_die_notifier(&int3_nb);
+}
+
+arch_initcall(int3_init);
/*
* Cross-modifying kernel text with stop_machine().
* This code originally comes from immediate value.
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index bddf3b2..d6db7bd 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);

static struct notifier_block kprobe_exceptions_nb = {
.notifier_call = kprobe_exceptions_notify,
- .priority = 0x7fffffff /* we need to be notified first */
+ .priority = 0x7ffffff0 /* High priority, but not first. */
};

unsigned long __weak arch_deref_entry_point(void *entry)
--
1.7.3.1

2013-07-10 21:36:44

by H. Peter Anvin

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On 07/10/2013 02:31 PM, Jiri Kosina wrote:
>
> If any CPU instruction execution would collide with the patching,
> it'd be trapped by the int3 breakpoint and redirected to the provided
> "handler" (which would typically mean just skipping over the patched
> region, acting as "nop" has been there, in case we are doing nop -> jump
> and jump -> nop transitions).
>

I'm wondering if it would be easier/more general to just return to the
instruction. The "more general" bit would allow this to be used for
other things, like alternatives, and perhaps eventually dynamic call
patching.

Returning to the instruction will, in effect, be a busy-wait for the
faulted CPU until the patch is complete; more or less what stop_machine
would do, but only for a CPU which actually strays into the affected region.

-hpa

2013-07-10 21:47:01

by Joe Perches

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Wed, 2013-07-10 at 23:31 +0200, Jiri Kosina wrote:
> Introduce a method for run-time instrucntion patching on a live SMP kernel
> based on int3 breakpoint, completely avoiding the need for stop_machine().

Yet more trivia:

instruction typo

> The way this is achieved:
>
> - add a int3 trap to the address that will be patched
> - sync cores
> - update all but the first byte of the patched range
> - sync cores
> - replalace the first byte (int3) by the first byte of

replace typo

> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
[]
> +static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
> +{
> + struct die_args *args = data;
> + struct pt_regs *regs = args->regs;
> +
> + /* bp_patching_in_progress */
> + smp_rmb();
> +
> + if (likely(!bp_patching_in_progress))
> + return NOTIFY_DONE;
> +
> + /* we are not interested in non-int3 faults and ring > 0 faults */
> + if (val != DIE_INT3 || !regs || user_mode_vm(regs)
> + || (unsigned long) bp_int3_addr != regs->ip)
> + return NOTIFY_DONE;
> +
> + /* set up the specified breakpoint handler */
> + args->regs->ip = (unsigned long) bp_int3_handler;

Probably better to use regs->ip as that's what's used
in the test above.

I'd also change the test to order the regs->ip first

if (val != DIE_INT3 || !regs || user_mode_vm(regs) ||
regs->ip != (unsigned long) bp_int3_addr)
return NOTIFY_DONE;

regs->ip = (unsigned long) bp_int3_handler;

> +/*
> + * text_poke_bp() -- update instructions on live kernel on SMP
> + * @addr: address to patch
> + * @opcode: opcode of new instruction
> + * @len: length to copy
> + * @handler: address to jump to when the temporary breakpoint is hit
> + *

kernel-doc?

> +
> + * Modify multi-byte instruction by using int3 breakpoint on SMP.
> + * In contrary to text_poke_smp(), we completely avoid stop_machine() here,
> + * and achieve the synchronization using int3 breakpoint.
> + *
> + * The way it is done:
> + * - add a int3 trap to the address that will be patched
> + * - sync cores
> + * - update all but the first byte of the patched range
> + * - sync cores
> + * - replalace the first byte (int3) by the first byte of

same typo

> + /*
> + * corresponding read barrier in int3 notifier for
> + * making sure the in_progress flags is correctly ordered wrt.
> + * patching */

Some might care about the comment style.
/*
* foo
*/


2013-07-10 21:49:08

by Borislav Petkov

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Wed, Jul 10, 2013 at 02:36:41PM -0700, H. Peter Anvin wrote:
> I'm wondering if it would be easier/more general to just return to the
> instruction. The "more general" bit would allow this to be used for
> other things, like alternatives, and perhaps eventually dynamic call
> patching.

Well, the aspect of not using stop_machine in alternatives is a don't
care because there we do text_poke_early on the BSP anyway. However,
there we toggle interrupts so it would probably be interesting to see
whether a non-interrupt-disabling variant would be faster.

> Returning to the instruction will, in effect, be a busy-wait for
> the faulted CPU until the patch is complete; more or less what
> stop_machine would do, but only for a CPU which actually strays into
> the affected region.

Oh, something like we patch in a two-byte jump first:

1:
jmp 1b

until we finish patching the rest? Ha, interesting :).

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-10 21:56:39

by H. Peter Anvin

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On 07/10/2013 02:48 PM, Borislav Petkov wrote:
> On Wed, Jul 10, 2013 at 02:36:41PM -0700, H. Peter Anvin wrote:
>> I'm wondering if it would be easier/more general to just return to the
>> instruction. The "more general" bit would allow this to be used for
>> other things, like alternatives, and perhaps eventually dynamic call
>> patching.
>
> Well, the aspect of not using stop_machine in alternatives is a don't
> care because there we do text_poke_early on the BSP anyway. However,
> there we toggle interrupts so it would probably be interesting to see
> whether a non-interrupt-disabling variant would be faster.
>
>> Returning to the instruction will, in effect, be a busy-wait for
>> the faulted CPU until the patch is complete; more or less what
>> stop_machine would do, but only for a CPU which actually strays into
>> the affected region.
>
> Oh, something like we patch in a two-byte jump first:
>
> 1:
> jmp 1b
>
> until we finish patching the rest? Ha, interesting :).
>

No, the idea is that the affected CPU will simply execute int3 -> iret
ad nauseam until the first byte is repatched, at that point execution
will resume normally.

-hpa

2013-07-10 22:14:35

by Borislav Petkov

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Wed, Jul 10, 2013 at 02:56:36PM -0700, H. Peter Anvin wrote:
> No, the idea is that the affected CPU will simply execute int3 -> iret
> ad nauseam until the first byte is repatched, at that point execution
> will resume normally.

Ok, that sounds simple enough. I just hope we don't unearth some silly
cross-modifying code snafus in some CPUs with it. :-)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-10 22:27:01

by Jason Baron

Subject: Re: [RFC] [PATCH 0/2] x86: make jump labels use int3-based breakpoint instead of stop_machine()

On 07/10/2013 04:25 PM, Jiri Kosina wrote:
> Hi,
>
> this is a resurrection of a few years old idea to have jump labels use
> synchronization based on int3 breakpoint rather than relying on
> stop_machine() with all the consequences.
>
> ftrace has been doing exactly this kind of patching for year since
> 08d636b6 ("ftrace/x86: Have arch x86_64 use breakpoints instead of stop
> machine").
>
> This patchset first introduces generic text_poke_bp() that provides means
> to perform this method of patching in parallel to text_poke_smp(), and
> then converts x86 jump label code to use it.
>
> If this is merged, I'll do a followup patch converting ftrace to use this
> infrastructure as well, as it's doing the same thing in principle already.
>
> Comments welcome.
>

Cool. This is definitely an area I've wanted to improve with jump labels.

Perhaps ftrace should be considered at this point to make sure the
interface is suitable for both callers?

Also, I wonder if it's worth batching up updates. For example, right now
we simply update each call-site one at a time even if it's associated
with the same control variable.

Thanks,

-Jason

2013-07-10 22:39:13

by Jiri Kosina

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Wed, 10 Jul 2013, H. Peter Anvin wrote:

> > If any CPU instruction execution would collide with the patching,
> > it'd be trapped by the int3 breakpoint and redirected to the provided
> > "handler" (which would typically mean just skipping over the patched
> > region, acting as "nop" has been there, in case we are doing nop -> jump
> > and jump -> nop transitions).
> >
>
> I'm wondering if it would be easier/more general to just return to the
> instruction. The "more general" bit would allow this to be used for
> other things, like alternatives, and perhaps eventually dynamic call
> patching.

Interesting idea ... This should be very easily done by just setting the
"handler" to the exact address that is being patched, and it'll work
exactly the way you are proposing, no?

> Returning to the instruction will, in effect, be a busy-wait for the
> faulted CPU until the patch is complete; more or less what stop_machine
> would do, but only for a CPU which actually strays into the affected region.

Exactly ... so the special case I am introducing for jump labels in 2/2
(i.e. implicitly behaving like there was a nop) is an optimized one, but
can be easily turned into a busy loop by just redirecting the "handler" one
instruction back in cases where it'd be desirable.

--
Jiri Kosina
SUSE Labs

2013-07-11 00:04:35

by Jiri Kosina

Subject: Re: [RFC] [PATCH 0/2] x86: make jump labels use int3-based breakpoint instead of stop_machine()

On Wed, 10 Jul 2013, Jason Baron wrote:

> > this is a resurrection of a few years old idea to have jump labels use
> > synchronization based on int3 breakpoint rather than relying on
> > stop_machine() with all the consequences.
> >
> > ftrace has been doing exactly this kind of patching for year since
> > 08d636b6 ("ftrace/x86: Have arch x86_64 use breakpoints instead of stop
> > machine").
> >
> > This patchset first introduces generic text_poke_bp() that provides means
> > to perform this method of patching in parallel to text_poke_smp(), and
> > then converts x86 jump label code to use it.
> >
> > If this is merged, I'll do a followup patch converting ftrace to use this
> > infrastructure as well, as it's doing the same thing in principle already.
> >
> > Comments welcome.
> >
>
> Cool. This definitely an area I've wanted to improve with jump labels.
>
> Perhaps, ftrace should be considered at this point to make sure the
> interface is suitable for both callers?

Yeah, Steven is CCed. From my understanding of the code, ftrace is
actually doing exactly what I have done for jump labels in 2/2, i.e. in
case the breakpoint is triggered, ftrace implicitly behaves as if "nop"
were the instruction that has been executed.

I even have a preliminary (completely untested) patch, but would like to
have this merged/acked in the first round before proceeding with porting
ftrace to the new interface.

> Also, I wonder if its worth batching up updates. For example, right now
> we simply update each call-site one at a time even if its associated
> with the same control variable.

That does seem to make sense indeed, but it's not really closely tied to
this patchset, is it?

Thanks,

--
Jiri Kosina
SUSE Labs

Subject: Re: [RFC] [PATCH 0/2] x86: make jump labels use int3-based breakpoint instead of stop_machine()

(2013/07/11 5:25), Jiri Kosina wrote:
> Hi,
>
> this is a resurrection of a few years old idea to have jump labels use
> synchronization based on int3 breakpoint rather than relying on
> stop_machine() with all the consequences.
>
> ftrace has been doing exactly this kind of patching for year since
> 08d636b6 ("ftrace/x86: Have arch x86_64 use breakpoints instead of stop
> machine").
>
> This patchset first introduces generic text_poke_bp() that provides means
> to perform this method of patching in parallel to text_poke_smp(), and
> then converts x86 jump label code to use it.
>
> If this is merged, I'll do a followup patch converting ftrace to use this
> infrastructure as well, as it's doing the same thing in principle already.

Hi Jiri,

Thank you for taking it over! :)
If yours is merged, I can move optprobe on that too ;)

Thank you again!!

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

(2013/07/11 6:36), H. Peter Anvin wrote:
> On 07/10/2013 02:31 PM, Jiri Kosina wrote:
>>
>> If any CPU instruction execution would collide with the patching,
>> it'd be trapped by the int3 breakpoint and redirected to the provided
>> "handler" (which would typically mean just skipping over the patched
>> region, acting as "nop" has been there, in case we are doing nop -> jump
>> and jump -> nop transitions).
>>
>
> I'm wondering if it would be easier/more general to just return to the
> instruction. The "more general" bit would allow this to be used for
> other things, like alternatives, and perhaps eventually dynamic call
> patching.
>
> Returning to the instruction will, in effect, be a busy-wait for the
> faulted CPU until the patch is complete; more or less what stop_machine
> would do, but only for a CPU which actually strays into the affected region.

Sounds like a good idea :)
It may minimize the interface, and the implementation will be
self-contained.

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-07-11 10:09:26

by Jiri Kosina

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Wed, 10 Jul 2013, H. Peter Anvin wrote:

> > If any CPU instruction execution would collide with the patching,
> > it'd be trapped by the int3 breakpoint and redirected to the provided
> > "handler" (which would typically mean just skipping over the patched
> > region, acting as "nop" has been there, in case we are doing nop -> jump
> > and jump -> nop transitions).
> >
>
> I'm wondering if it would be easier/more general to just return to the
> instruction. The "more general" bit would allow this to be used for
> other things, like alternatives,

As Boris already pointed out, this is not really that interesting, as it's
being done through text_poke_early(), which is rather a different story
anyway.

> and perhaps eventually dynamic call patching.

Umm ... could you please elaborate on either what exactly you mean by
that, or why it can't currently be used as-is?

> Returning to the instruction will, in effect, be a busy-wait for the
> faulted CPU until the patch is complete; more or less what stop_machine
> would do, but only for a CPU which actually strays into the affected
> region.

To be honest, I fail to see a clear advantage ... we don't avoid any extra
IPI by it, and wrt. "correctness", the end result is the same.

Thanks,

--
Jiri Kosina
SUSE Labs

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

(2013/07/11 6:31), Jiri Kosina wrote:
> +/*
> + * text_poke_bp() -- update instructions on live kernel on SMP
> + * @addr: address to patch
> + * @opcode: opcode of new instruction
> + * @len: length to copy
> + * @handler: address to jump to when the temporary breakpoint is hit
> + *
> +
> + * Modify multi-byte instruction by using int3 breakpoint on SMP.
> + * In contrary to text_poke_smp(), we completely avoid stop_machine() here,
> + * and achieve the synchronization using int3 breakpoint.
> + *
> + * The way it is done:
> + * - add a int3 trap to the address that will be patched
> + * - sync cores

You don't need this "sync cores". (and your code didn't) :)

> + * - update all but the first byte of the patched range
> + * - sync cores
> + * - replalace the first byte (int3) by the first byte of
> + * replacing opcode
> + * - sync cores
> + *
> + * Note: must be called under text_mutex.
> + */
> +void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> +{
> + unsigned char int3 = 0xcc;
> +

Here, you have to protect this code from others, since bp_* are
global.

> + bp_int3_handler = handler;
> + bp_int3_addr = (u8 *)addr + sizeof(int3);
> + bp_patching_in_progress = true;
> + /*
> + * corresponding read barrier in int3 notifier for
> + * making sure the in_progress flags is correctly ordered wrt.
> + * patching */
> + smp_wmb();
> +
> + text_poke(addr, &int3, sizeof(int3));
> +
> + if (len - sizeof(int3) > 0) {
> + /* patch all but the first byte */
> + text_poke((char *)addr + sizeof(int3),
> + (const char *) opcode + sizeof(int3),
> + len - sizeof(int3));
> +
> + on_each_cpu(do_sync_core, NULL, 1);
> + }
> +
> + /* patch the first byte */
> + text_poke(addr, opcode, sizeof(int3));
> +
> + on_each_cpu(do_sync_core, NULL, 1);
> +
> + bp_patching_in_progress = false;
> + smp_wmb();
> +
> + return addr;
> +}
> +
> +/* this one needs to run before anything else handles it as a
> + * regular exception */
> +static struct notifier_block int3_nb = {
> + .priority = 0x7fffffff,
> + .notifier_call = int3_notify
> +};
> +
> +static int __init int3_init(void)
> +{
> + return register_die_notifier(&int3_nb);
> +}
> +
> +arch_initcall(int3_init);
> /*
> * Cross-modifying kernel text with stop_machine().
> * This code originally comes from immediate value.
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index bddf3b2..d6db7bd 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);
>
> static struct notifier_block kprobe_exceptions_nb = {
> .notifier_call = kprobe_exceptions_notify,
> - .priority = 0x7fffffff /* we need to be notified first */
> + .priority = 0x7ffffff0 /* High priority, but not first. */
> };
>
> unsigned long __weak arch_deref_entry_point(void *entry)
>

Thanks,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-07-11 10:51:15

by Jiri Kosina

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Thu, 11 Jul 2013, Masami Hiramatsu wrote:

> > + * text_poke_bp() -- update instructions on live kernel on SMP
> > + * @addr: address to patch
> > + * @opcode: opcode of new instruction
> > + * @len: length to copy
> > + * @handler: address to jump to when the temporary breakpoint is hit
> > + *
> > +
> > + * Modify multi-byte instruction by using int3 breakpoint on SMP.
> > + * In contrast to text_poke_smp(), we completely avoid stop_machine() here,
> > + * and achieve the synchronization using int3 breakpoint.
> > + *
> > + * The way it is done:
> > + * - add a int3 trap to the address that will be patched
> > + * - sync cores
>
> You don't need this "sync cores". (and your code didn't) :)

Right, my code originally did, but then I found discussion between you and
hpa from 2009, where this was discussed and adjusted the code accordingly,
but forgot to update the comment. Will do in v3.

> > + * - update all but the first byte of the patched range
> > + * - sync cores
> > + * - replace the first byte (int3) by the first byte of
> > + * replacing opcode
> > + * - sync cores
> > + *
> > + * Note: must be called under text_mutex.
> > + */
> > +void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> > +{
> > + unsigned char int3 = 0xcc;
> > +
>
> Here, you have to protect this code from others, since bp_* are
> global.

Caller is responsible for holding the text_mutex, so text_poke_bp() can't
race with itself. And the proper consistency between text_poke_bp() and
the notifier is achieved by the memory barriers.

So what exact scenario do you have in mind here, please?

> > + bp_int3_handler = handler;
> > + bp_int3_addr = (u8 *)addr + sizeof(int3);
> > + bp_patching_in_progress = true;
> > + /*
> > + * corresponding read barrier in int3 notifier for
> > + * making sure the in_progress flags is correctly ordered wrt.
> > + * patching */
> > + smp_wmb();
> > +
> > + text_poke(addr, &int3, sizeof(int3));

Thanks for the review,

--
Jiri Kosina
SUSE Labs

2013-07-11 10:54:14

by Jiri Kosina

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Thu, 11 Jul 2013, Jiri Kosina wrote:

> > Returning to the instruction will, in effect, be a busy-wait for the
> > faulted CPU until the patch is complete; more or less what stop_machine
> > would do, but only for a CPU which actually strays into the affected
> > region.
>
> To be honest, I fail to see a clear advantage ... we don't avoid any extra
> IPI by it, and wrt. "correctness", the end result is the same.

Ok, after some offline discussion, my understanding is that with this
proposal you are willing to make this usable in a more general way than
just simple nop -> jump -> nop patching.

That makes sense, but I'd propose to have a different independent
interface for this if needed (text_poke_bp_busy() ... ?) in parallel to
the text_poke_bp() as is in my patchset, due to the obvious extra cycles
it's bringing.

Thanks,

--
Jiri Kosina
SUSE Labs

2013-07-11 14:35:51

by Steven Rostedt

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Wed, 2013-07-10 at 14:36 -0700, H. Peter Anvin wrote:
> On 07/10/2013 02:31 PM, Jiri Kosina wrote:
> >
> > If any CPU instruction execution would collide with the patching,
> > it'd be trapped by the int3 breakpoint and redirected to the provided
> > "handler" (which would typically mean just skipping over the patched
> > region, acting as "nop" has been there, in case we are doing nop -> jump
> > and jump -> nop transitions).
> >
>
> I'm wondering if it would be easier/more general to just return to the
> instruction. The "more general" bit would allow this to be used for
> other things, like alternatives, and perhaps eventually dynamic call
> patching.
>
> Returning to the instruction will, in effect, be a busy-wait for the
> faulted CPU until the patch is complete; more or less what stop_machine
> would do, but only for a CPU which actually strays into the affected region.
>

Won't work for ftrace, as it patches all functions; it even patches the
functions used to do the changes. Thus, it would cause a deadlock if a
breakpoint were to spin till the changes were finished.

-- Steve

2013-07-11 14:47:12

by Jason Baron

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On 07/11/2013 10:35 AM, Steven Rostedt wrote:
> On Wed, 2013-07-10 at 14:36 -0700, H. Peter Anvin wrote:
>> On 07/10/2013 02:31 PM, Jiri Kosina wrote:
>>> If any CPU instruction execution would collide with the patching,
>>> it'd be trapped by the int3 breakpoint and redirected to the provided
>>> "handler" (which would typically mean just skipping over the patched
>>> region, acting as "nop" has been there, in case we are doing nop -> jump
>>> and jump -> nop transitions).
>>>
>> I'm wondering if it would be easier/more general to just return to the
>> instruction. The "more general" bit would allow this to be used for
>> other things, like alternatives, and perhaps eventually dynamic call
>> patching.
>>
>> Returning to the instruction will, in effect, be a busy-wait for the
>> faulted CPU until the patch is complete; more or less what stop_machine
>> would do, but only for a CPU which actually strays into the affected region.
>>
> Won't work for ftrace, as it patches all functions; it even patches the
> functions used to do the changes. Thus, it would cause a deadlock if a
> breakpoint were to spin till the changes were finished.
>
> -- Steve
>
>

I'm not sure this works for jump labels either. Some tracepoints (which
use jump_labels) have interrupts disabled across them. Thus, they will
spin with interrupts disabled, while we are trying to issue an IPI.

Thanks,

-Jason

2013-07-11 15:57:09

by Steven Rostedt

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Wed, 2013-07-10 at 23:31 +0200, Jiri Kosina wrote:
> Changes:
>
> v1 -> v2:
> + fixed kerneldoc
> + fixed checkpatch errors (reported by Borislav)
>
> arch/x86/include/asm/alternative.h | 1 +
> arch/x86/kernel/alternative.c | 101 ++++++++++++++++++++++++++++++++++++
> kernel/kprobes.c | 2 +-
> 3 files changed, 103 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 58ed6d9..3abf8dd 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -233,6 +233,7 @@ struct text_poke_param {
> };
>
> extern void *text_poke(void *addr, const void *opcode, size_t len);
> +extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
> extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
> extern void text_poke_smp_batch(struct text_poke_param *params, int n);
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index c15cf9a..ee1f51c 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -11,6 +11,7 @@
> #include <linux/memory.h>
> #include <linux/stop_machine.h>
> #include <linux/slab.h>
> +#include <linux/kdebug.h>
> #include <asm/alternative.h>
> #include <asm/sections.h>
> #include <asm/pgtable.h>
> @@ -596,6 +597,106 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> return addr;
> }
>
> +static void do_sync_core(void *info)
> +{
> + sync_core();
> +}
> +
> +static bool bp_patching_in_progress;
> +static void *bp_int3_handler, *bp_int3_addr;
> +
> +static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
> +{
> + struct die_args *args = data;
> + struct pt_regs *regs = args->regs;
> +
> + /* bp_patching_in_progress */
> + smp_rmb();
> +
> + if (likely(!bp_patching_in_progress))
> + return NOTIFY_DONE;
> +
> + /* we are not interested in non-int3 faults and ring > 0 faults */
> + if (val != DIE_INT3 || !regs || user_mode_vm(regs)
> + || (unsigned long) bp_int3_addr != regs->ip)
> + return NOTIFY_DONE;
> +
> + /* set up the specified breakpoint handler */
> + args->regs->ip = (unsigned long) bp_int3_handler;
> +
> + return NOTIFY_STOP;
> +}
> +/*
> + * text_poke_bp() -- update instructions on live kernel on SMP
> + * @addr: address to patch
> + * @opcode: opcode of new instruction
> + * @len: length to copy
> + * @handler: address to jump to when the temporary breakpoint is hit
> + *
> +
> + * Modify multi-byte instruction by using int3 breakpoint on SMP.
> > + * In contrast to text_poke_smp(), we completely avoid stop_machine() here,
> + * and achieve the synchronization using int3 breakpoint.
> + *
> + * The way it is done:
> + * - add a int3 trap to the address that will be patched
> + * - sync cores
> + * - update all but the first byte of the patched range
> + * - sync cores
> > + * - replace the first byte (int3) by the first byte of
> + * replacing opcode
> + * - sync cores
> + *
> + * Note: must be called under text_mutex.
> + */
> +void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> +{
> + unsigned char int3 = 0xcc;
> +
> + bp_int3_handler = handler;
> + bp_int3_addr = (u8 *)addr + sizeof(int3);
> + bp_patching_in_progress = true;
> + /*
> + * corresponding read barrier in int3 notifier for
> + * making sure the in_progress flags is correctly ordered wrt.
> + * patching */

Nitpick, but this should be:

	/*
	 * Corresponding read barrier in int3 notifier for
	 * making sure the in_progress flags is correctly ordered wrt.
	 * patching.
	 */

> + smp_wmb();
> +
> + text_poke(addr, &int3, sizeof(int3));
> +
> + if (len - sizeof(int3) > 0) {

I believe we need a sync here. Otherwise, if the instruction crosses
cache lines, the original first byte could have been pulled in, and then
after the text_poke() below, it gets the updated version, causing a
crash on that CPU.

on_each_cpu(do_sync_core, NULL, 1);

-- Steve

> + /* patch all but the first byte */
> + text_poke((char *)addr + sizeof(int3),
> + (const char *) opcode + sizeof(int3),
> + len - sizeof(int3));
> +
> + on_each_cpu(do_sync_core, NULL, 1);
> + }
> +
> + /* patch the first byte */
> + text_poke(addr, opcode, sizeof(int3));
> +
> + on_each_cpu(do_sync_core, NULL, 1);
> +
> + bp_patching_in_progress = false;
> + smp_wmb();
> +
> + return addr;
> +}
> +
> +/* this one needs to run before anything else handles it as a
> + * regular exception */
> +static struct notifier_block int3_nb = {
> + .priority = 0x7fffffff,
> + .notifier_call = int3_notify
> +};
> +
> +static int __init int3_init(void)
> +{
> + return register_die_notifier(&int3_nb);
> +}
> +
> +arch_initcall(int3_init);
> /*
> * Cross-modifying kernel text with stop_machine().
> * This code originally comes from immediate value.
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index bddf3b2..d6db7bd 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);
>
> static struct notifier_block kprobe_exceptions_nb = {
> .notifier_call = kprobe_exceptions_notify,
> - .priority = 0x7fffffff /* we need to be notified first */
> + .priority = 0x7ffffff0 /* High priority, but not first. */
> };
>
> unsigned long __weak arch_deref_entry_point(void *entry)

2013-07-11 16:12:37

by H. Peter Anvin

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On 07/11/2013 03:23 AM, Masami Hiramatsu wrote:
>> + *
>> + * The way it is done:
>> + * - add a int3 trap to the address that will be patched
>> + * - sync cores
>
> You don't need this "sync cores". (and your code didn't) :)
>

I believe you do, lest you get "Frankenstructions". I believe you don't
need the second one, however. I should dig up my notes on this.

>> + * - update all but the first byte of the patched range
>> + * - sync cores
>> + * - replalace the first byte (int3) by the first byte of
>> + * replacing opcode
>> + * - sync cores
>> + *

2013-07-11 16:32:15

by H. Peter Anvin

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On 07/11/2013 03:09 AM, Jiri Kosina wrote:
>>
>> I'm wondering if it would be easier/more general to just return to the
>> instruction. The "more general" bit would allow this to be used for
>> other things, like alternatives,
>
> As Boris already pointed out, this is not really that interesting, as it's
> being done through text_poke_early(), which is rather a different story
> anyway.
>
>> and perhaps eventually dynamic call patching.
>
> Umm ... could you please elaborate either what exactly do you mean by
> that, or why it can't be used currently as-is?

Dynamic call patching would be changing a CALL instruction *emitted by
the compiler* (and therefore lacking any metadata annotation) from one
target function to another. Because it lacks metadata annotations, we
can't do this as a "big bang" (all at the same time) but rather would
have to do it on demand (the original CALL would point to a "patch me"
routine.) This means a lot of patching cycles; stop_machine() is a
total nonstarter, even IPIs might be too expensive.

There is an alternative, which is postprocessing the executable to
generate metadata, but that has its own trickiness.

>> Returning to the instruction will, in effect, be a busy-wait for the
>> faulted CPU until the patch is complete; more or less what stop_machine
>> would do, but only for a CPU which actually strays into the affected
>> region.
>
> To be honest, I fail to see a clear advantage ... we don't avoid any extra
> IPI by it, and wrt. "correctness", the end result is the same.
>

The current code assumes that one of the two code sequences is a NOP,
and therefore that jumping over the region is legal. This does not
allow for transitioning one active code sequence to another.

-hpa

2013-07-11 16:42:27

by Steven Rostedt

Subject: Re: [RFC] [PATCH 0/2] x86: make jump labels use int3-based breakpoint instead of stop_machine()

On Thu, 2013-07-11 at 02:04 +0200, Jiri Kosina wrote:

> I even have preliminary (completely untested) patch, but would like to
> have this merged/acked in the first round before proceeding with porting
> ftrace to the new interface.
>
> > Also, I wonder if its worth batching up updates. For example, right now
> > we simply update each call-site one at a time even if its associated
> > with the same control variable.
>
> That does seem to make sense indeed, but it's not really closely tied to
> this patchset, is it?

If you want to have ftrace use this interface, then we need to support
batch processing. And you will need to do it with an iterator as well.
We can not allocate 30,000 locations to run this on. Ftrace has its own
table, and uses the ftrace iterator to traverse it.

Thus you would need to do something like:

int text_poke_iterator(struct text_poke_iter *iterator)
{
	struct text_poke_iter_instance *iter;
	int ret;

	for (iter = text_poke_iterator_start(iterator);
	     iter;
	     iter = text_poke_iterator_next(iterator)) {
		ret = add_breakpoints(iter->addr, iter->old);
		if (!ret)
			goto failed;
	}

	on_each_cpu(do_sync_core, NULL, 1);
	/* and this doesn't even handle the issue with ftrace
	   start up code */

	for (iter = text_poke_iterator_start(iterator);
	     iter;
	     iter = text_poke_iterator_next(iterator)) {
		ret = add_update(iter->addr, iter->old, iter->new);
		if (!ret)
			goto failed;
	}

	on_each_cpu(do_sync_core, NULL, 1);

	for (iter = text_poke_iterator_start(iterator);
	     iter;
	     iter = text_poke_iterator_next(iterator)) {
		ret = remove_breakpoints(iter->addr, iter->new);
		if (!ret)
			goto failed;
	}

	on_each_cpu(do_sync_core, NULL, 1);

	return 0;
failed:
	for (iter = text_poke_iterator_start(iterator);
	     iter;
	     iter = text_poke_iterator_next(iterator)) {
		ret = remove_breakpoints_fail(iter->addr, iter->old, iter->new);
	}
	return ret;
}

-- Steve

2013-07-11 16:46:19

by Steven Rostedt

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Thu, 2013-07-11 at 09:31 -0700, H. Peter Anvin wrote:

> The current code assumes that one of the two code sequences is a NOP,
> and therefore that jumping over the region is legal. This does not
> allow for transitioning one active code sequence to another.

Correct, and I think we should keep the two changes separate, as the NOP
case is trivial. No need to complicate the trivial and common updates
(jump_labels and ftrace). But for things like kprobes, we could do a bit
more complex code, but it should probably be separate.

Perhaps call this text_poke_nop_bp()?

-- Steve

2013-07-11 19:21:18

by Jiri Kosina

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Thu, 11 Jul 2013, Steven Rostedt wrote:

> > The current code assumes that one of the two code sequences is a NOP,
> > and therefore that jumping over the region is legal. This does not
> > allow for transitioning one active code sequence to another.
>
> Correct, and I think we should keep the two changes separate, as the NOP
> case is trivial. No need to complicate the trivial and common updates
> (jump_labels and ftrace). But for things like kprobes, we could do a bit
> more complex code, but it should probably be separate.
>
> Perhaps call this text_poke_nop_bp()?

Hmm ... I don't think this is exactly precise, at least as long as the
implementation in the patchset I have submitted is concerned.

Yes, most use cases (jump labels, perhaps ftrace) will simply be skipping
over the patched region, pretending that NOP has been there; but the
handler provided to text_poke_bp() is completely free to do any other kind
of trickery.

The one that jump label provides in PATCH 2/2 really just skips over the
region, yes. But the interface potentially allows for more.

--
Jiri Kosina
SUSE Labs

2013-07-11 19:23:22

by Jiri Kosina

Subject: Re: [RFC] [PATCH 0/2] x86: make jump labels use int3-based breakpoint instead of stop_machine()

On Thu, 11 Jul 2013, Steven Rostedt wrote:

> On Thu, 2013-07-11 at 02:04 +0200, Jiri Kosina wrote:
>
> > I even have preliminary (completely untested) patch, but would like to
> > have this merged/acked in the first round before proceeding with porting
> > ftrace to the new interface.
> >
> > > Also, I wonder if its worth batching up updates. For example, right now
> > > we simply update each call-site one at a time even if its associated
> > > with the same control variable.
> >
> > That does seem to make sense indeed, but it's not really closely tied to
> > this patchset, is it?
>
> If you want to have ftrace use this interface, then we need to support
> batch processing. And you will need to do it with an iterator as well.
> We can not allocate 30,000 locations to run this on. Ftrace has its own
> table, and uses the ftrace iterator to traverse it.
>
> Thus you would need to do something like:

Yup, I have been looking at the ftrace implementation and came to this
conclusion; thanks for confirmation.
That's exactly why I wanted to postpone converting ftrace until agreement
on text_poke_bp() is reached and jump labels are converted.

Thanks,

--
Jiri Kosina
SUSE Labs

2013-07-11 19:29:22

by Jiri Kosina

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Thu, 11 Jul 2013, H. Peter Anvin wrote:

> >> + * The way it is done:
> >> + * - add a int3 trap to the address that will be patched
> >> + * - sync cores
> >
> > You don't need this "sync cores". (and your code didn't) :)
>
> I believe you do, lest you get "Frankenstructions". I believe you don't
> need the second one, however. I should dig up my notes on this.

I found this post from 2010 from you:

http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html

If it's still valid and you guys at Intel haven't discovered any reason
why that procedure would be invalid, I'll send out a v3 that'd be using
exactly this ordering of syncing of the cores.

Thanks,

--
Jiri Kosina
SUSE Labs

2013-07-11 19:43:45

by Jiri Kosina

[permalink] [raw]
Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Thu, 11 Jul 2013, Steven Rostedt wrote:

[ .. snip .. ]
> > + smp_wmb();
> > +
> > + text_poke(addr, &int3, sizeof(int3));
> > +
> > + if (len - sizeof(int3) > 0) {
>
> I believe we need a sync here. Otherwise, if the instruction crosses
> cache lines, the original first byte could have been pulled in, and then
> after the text_poke() below, it gets the updated version, causing a
> crash on that CPU.
>
> on_each_cpu(do_sync_core, NULL, 1);

Right you are.

OTOH we apparently don't need the one after the text_poke() below, as
syncing the cores right after patching the first byte provides a safe
enough guard (at least according to hpa's words back in 2010 :) ).

Will change it for next respin of the patchset, thanks for review.

--
Jiri Kosina
SUSE Labs

2013-07-11 19:52:32

by Steven Rostedt

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On Thu, 2013-07-11 at 21:43 +0200, Jiri Kosina wrote:

> OTOH we apparently don't need the one after the text_poke() below, as
> syncing the cores right after patching the first byte provides a safe
> enough guard (at least according to hpa's words back in 2010 :) ).

Right, but I'm paranoid enough to keep it anyway ;-)

-- Steve

2013-07-11 19:55:00

by Steven Rostedt

Subject: Re: [RFC] [PATCH 0/2] x86: make jump labels use int3-based breakpoint instead of stop_machine()

On Thu, 2013-07-11 at 21:23 +0200, Jiri Kosina wrote:

> > Thus you would need to do something like:
>
> Yup, I have been looking at the ftrace implementation and came to this
> conclusion; thanks for confirmation.
> That's exactly why I wanted to postpone converting ftrace before agreement
> on text_poke_bp() is reached and jump labels are converted.

Note, there's also debug output that ftrace prints when things go wrong.
As really strange things can happen when things don't work, those debug
prints are crucial, and I don't want to lose them.

But we'll discuss that after we get this first round in.

-- Steve

2013-07-11 20:26:27

by Jiri Kosina

Subject: [PATCH 2/2 v3] x86: make jump_label use int3-based patching

Make jump labels use text_poke_bp() for text patching instead of
text_poke_smp(), avoiding the need for stop_machine().

Signed-off-by: Jiri Kosina <[email protected]>
---
Changes:

- unchanged since v1, patch 1/2 is the one being updated

arch/x86/kernel/jump_label.c | 16 ++++++++++++++--
1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 2889b3d..460f5d9 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -37,7 +37,19 @@ static void __jump_label_transform(struct jump_entry *entry,
 	} else
 		memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);

-	(*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+	/*
+	 * Make text_poke_bp() a default fallback poker.
+	 *
+	 * At the time the change is being done, just ignore whether we
+	 * are doing a nop -> jump or jump -> nop transition, and always
+	 * assume nop is the 'currently valid' instruction.
+	 */
+	if (poker)
+		(*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+	else
+		text_poke_bp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE,
+			     (void *)entry->code + JUMP_LABEL_NOP_SIZE);
 }

void arch_jump_label_transform(struct jump_entry *entry,
@@ -45,7 +57,7 @@ void arch_jump_label_transform(struct jump_entry *entry,
 {
 	get_online_cpus();
 	mutex_lock(&text_mutex);
-	__jump_label_transform(entry, type, text_poke_smp);
+	__jump_label_transform(entry, type, NULL);
 	mutex_unlock(&text_mutex);
 	put_online_cpus();
 }
--
1.7.3.1

2013-07-11 20:26:39

by Jiri Kosina

Subject: [PATCH 1/2 v3] x86: introduce int3-based instruction patching

Introduce a method for run-time instruction patching on a live SMP kernel
based on an int3 breakpoint, completely avoiding the need for stop_machine().

The way this is achieved:

	- add an int3 trap to the address that will be patched
	- sync cores
	- update all but the first byte of the patched range
	- sync cores
	- replace the first byte (int3) by the first byte of
	  replacing opcode
	- sync cores

According to

http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html

synchronization after replacing "all but first" instructions should not
be necessary (on Intel hardware), as the syncing after the subsequent
patching of the first byte provides enough safety.
But there's not only Intel HW out there, and we'd rather be on a safe
side.

If any CPU instruction execution would collide with the patching,
it'd be trapped by the int3 breakpoint and redirected to the provided
"handler" (which would typically mean just skipping over the patched
region, acting as if a "nop" had been there, in case we are doing
nop -> jump and jump -> nop transitions).

Ftrace has been using this very technique since 08d636b ("ftrace/x86:
Have arch x86_64 use breakpoints instead of stop machine") for ages
already, and jump labels are another obvious potential user of this.

Based on activities of Masami Hiramatsu <[email protected]>
a few years ago.

Signed-off-by: Jiri Kosina <[email protected]>
---
Changes:

v1 -> v2:
+ fixed kerneldoc
+ fixed checkpatch errors (reported by Borislav)

v2 -> v3:
+ fixed few typos (Joe Perches)
+ extended changelog with pointer to Intel's statement regarding minimal
necessary synchronization points to achieve correctness
+ added even the synchronization that might not be needed according to
Intel, to be on a safe side (and do what has been proven to work by
ftrace implementation already) (Steven Rostedt)
+ Fix formatting of comments (Steven Rostedt, Borislav Petkov, Joe
Perches)

arch/x86/include/asm/alternative.h | 1 +
arch/x86/kernel/alternative.c | 107 ++++++++++++++++++++++++++++++++++++
kernel/kprobes.c | 2 +-
3 files changed, 109 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 58ed6d9..3abf8dd 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -233,6 +233,7 @@ struct text_poke_param {
};

extern void *text_poke(void *addr, const void *opcode, size_t len);
+extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
extern void text_poke_smp_batch(struct text_poke_param *params, int n);

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c15cf9a..ed2377d 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -11,6 +11,7 @@
#include <linux/memory.h>
#include <linux/stop_machine.h>
#include <linux/slab.h>
+#include <linux/kdebug.h>
#include <asm/alternative.h>
#include <asm/sections.h>
#include <asm/pgtable.h>
@@ -596,6 +597,112 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
 	return addr;
 }

+static void do_sync_core(void *info)
+{
+	sync_core();
+}
+
+static bool bp_patching_in_progress;
+static void *bp_int3_handler, *bp_int3_addr;
+
+static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
+{
+	struct die_args *args = data;
+
+	/* bp_patching_in_progress */
+	smp_rmb();
+
+	if (likely(!bp_patching_in_progress))
+		return NOTIFY_DONE;
+
+	/* we are not interested in non-int3 faults and ring > 0 faults */
+	if (val != DIE_INT3 || !args->regs || user_mode_vm(args->regs)
+	    || (unsigned long) bp_int3_addr != args->regs->ip)
+		return NOTIFY_DONE;
+
+	/* set up the specified breakpoint handler */
+	args->regs->ip = (unsigned long) bp_int3_handler;
+
+	return NOTIFY_STOP;
+}
+
+/**
+ * text_poke_bp() -- update instructions on live kernel on SMP
+ * @addr: address to patch
+ * @opcode: opcode of new instruction
+ * @len: length to copy
+ * @handler: address to jump to when the temporary breakpoint is hit
+ *
+ * Modify multi-byte instruction by using int3 breakpoint on SMP.
+ * In contrast to text_poke_smp(), we completely avoid stop_machine() here,
+ * and achieve the synchronization using int3 breakpoint.
+ *
+ * The way it is done:
+ *	- add an int3 trap to the address that will be patched
+ *	- sync cores
+ *	- update all but the first byte of the patched range
+ *	- sync cores
+ *	- replace the first byte (int3) by the first byte of
+ *	  replacing opcode
+ *	- sync cores
+ *
+ * Note: must be called under text_mutex.
+ */
+void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
+{
+	unsigned char int3 = 0xcc;
+
+	bp_int3_handler = handler;
+	bp_int3_addr = (u8 *)addr + sizeof(int3);
+	bp_patching_in_progress = true;
+	/*
+	 * Corresponding read barrier in int3 notifier for
+	 * making sure the in_progress flag is correctly ordered wrt.
+	 * patching.
+	 */
+	smp_wmb();
+
+	text_poke(addr, &int3, sizeof(int3));
+
+	on_each_cpu(do_sync_core, NULL, 1);
+
+	if (len - sizeof(int3) > 0) {
+		/* patch all but the first byte */
+		text_poke((char *)addr + sizeof(int3),
+			  (const char *) opcode + sizeof(int3),
+			  len - sizeof(int3));
+		/*
+		 * According to Intel, this core syncing is very likely
+		 * not necessary and we'd be safe even without it. But
+		 * better safe than sorry (plus there's not only Intel).
+		 */
+		on_each_cpu(do_sync_core, NULL, 1);
+	}
+
+	/* patch the first byte */
+	text_poke(addr, opcode, sizeof(int3));
+
+	on_each_cpu(do_sync_core, NULL, 1);
+
+	bp_patching_in_progress = false;
+	smp_wmb();
+
+	return addr;
+}
+
+/*
+ * This one needs to run before anything else handles it as a
+ * regular exception.
+ */
+static struct notifier_block int3_nb = {
+	.priority = 0x7fffffff,
+	.notifier_call = int3_notify
+};
+
+static int __init int3_init(void)
+{
+	return register_die_notifier(&int3_nb);
+}
+
+arch_initcall(int3_init);
/*
* Cross-modifying kernel text with stop_machine().
* This code originally comes from immediate value.
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index bddf3b2..d6db7bd 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);

 static struct notifier_block kprobe_exceptions_nb = {
 	.notifier_call = kprobe_exceptions_notify,
-	.priority = 0x7fffffff /* we need to be notified first */
+	.priority = 0x7ffffff0 /* High priority, but not first. */
 };

unsigned long __weak arch_deref_entry_point(void *entry)
--
1.7.3.1

2013-07-11 20:49:48

by H. Peter Anvin

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On 07/11/2013 12:29 PM, Jiri Kosina wrote:
> On Thu, 11 Jul 2013, H. Peter Anvin wrote:
>
>>>> + * The way it is done:
>>>> + * - add a int3 trap to the address that will be patched
>>>> + * - sync cores
>>>
>>> You don't need this "sync cores". (and your code didn't) :)
>>
>> I believe you do, lest you get "Frankenstructions". I believe you don't
>> need the second one, however. I should dig up my notes on this.
>
> I found this post from 2010 from you:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html
>
> If it's still valid and you guys at Intel haven't discovered any reason
> why that procedure would be invalid, I'll send out a v3 that'd be using
> exactly this ordering of syncing of the cores.
>

That is still the latest word as far as I know.

-hpa

2013-07-11 20:51:08

by H. Peter Anvin

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

On 07/11/2013 12:29 PM, Jiri Kosina wrote:
> On Thu, 11 Jul 2013, H. Peter Anvin wrote:
>
>>>> + * The way it is done:
>>>> + * - add a int3 trap to the address that will be patched
>>>> + * - sync cores
>>>
>>> You don't need this "sync cores". (and your code didn't) :)
>>
>> I believe you do, lest you get "Frankenstructions". I believe you don't
>> need the second one, however. I should dig up my notes on this.
>
> I found this post from 2010 from you:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html
>
> If it's still valid and you guys at Intel haven't discovered any reason
> why that procedure would be invalid, I'll send out a v3 that'd be using
> exactly this ordering of syncing of the cores.
>

Just a note on that: in that post "In fact, if a suitable int3 handler
is left permanently in place then step 5 is unnecessary as well" should
obviously have been "the synchronization in step 4" rather than "step 5".

-hpa

2013-07-11 20:53:36

by H. Peter Anvin

Subject: Re: [PATCH 1/2 v3] x86: introduce int3-based instruction patching

On 07/11/2013 01:26 PM, Jiri Kosina wrote:
>
> synchronization after replacing "all but first" instructions should not
> be necessary (on Intel hardware), as the syncing after the subsequent
> patching of the first byte provides enough safety.
> But there's not only Intel HW out there, and we'd rather be on a safe
> side.
>

Has anyone talked to AMD or VIA about this at all? Did anyone else ever
make SMP-capable x86?

-hpa

2013-07-11 21:05:14

by Borislav Petkov

Subject: Re: [PATCH 1/2 v3] x86: introduce int3-based instruction patching

On Thu, Jul 11, 2013 at 01:53:16PM -0700, H. Peter Anvin wrote:
> Has anyone talked to AMD or VIA about this at all?

I guess I can try to take care of the AMD part. Just to confirm, is this
the exact sequence we're interested in:

1. Setup int3 handler for fixup.

2. Put a breakpoint (int3) on the first byte of modifying region, and
synchronize code on all CPUs.

3. Modify other bytes of modifying region.

4. Modify the first byte of modifying region, and synchronize code on
all CPUs.

5. Clear int3 handler.

If a suitable int3 handler is left permanently in place then the
synchronization in step 4 is unnecessary.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-11 21:36:41

by H. Peter Anvin

Subject: Re: [PATCH 1/2 v3] x86: introduce int3-based instruction patching

On 07/11/2013 02:04 PM, Borislav Petkov wrote:
> On Thu, Jul 11, 2013 at 01:53:16PM -0700, H. Peter Anvin wrote:
>> Has anyone talked to AMD or VIA about this at all?
>
> I guess I can try to take care of the AMD part. Just to confirm, is this
> the exact sequence we're interested in:
>
> 1. Setup int3 handler for fixup.
>
> 2. Put a breakpoint (int3) on the first byte of modifying region, and
> synchronize code on all CPUs.
>
> 3. Modify other bytes of modifying region.
>
> 4. Modify the first byte of modifying region, and synchronize code on
> all CPUs.
>
> 5. Clear int3 handler.
>
> If a suitable int3 handler is left permanently in place then the
> synchronization in step 4 is unnecessary.
>

Correct.

-hpa

2013-07-11 22:31:56

by Jiri Kosina

Subject: Re: [PATCH 1/2 v3] x86: introduce int3-based instruction patching

On Thu, 11 Jul 2013, H. Peter Anvin wrote:

> > synchronization after replacing "all but first" instructions should not
> > be necessary (on Intel hardware), as the syncing after the subsequent
> > patching of the first byte provides enough safety.
> > But there's not only Intel HW out there, and we'd rather be on a safe
> > side.
>
> Has anyone talked to AMD or VIA about this at all? Did anyone else ever
> make SMP-capable x86?

If Boris can verify for AMD, that'd be good; we could then just remove one
extra syncing of the cores as a followup (can be done any time later, both
for alternative.c and ftrace in fact).

With the "extra" sync, the procedure is already verified to work properly
by ftace.

Thanks,

--
Jiri Kosina
SUSE Labs

2013-07-11 23:01:32

by Joe Perches

Subject: Re: [PATCH 1/2 v3] x86: introduce int3-based instruction patching

On Thu, 2013-07-11 at 22:26 +0200, Jiri Kosina wrote:
> Introduce a method for run-time instrucntion patching on a live SMP kernel

Some of the typos anyway.

s/instrunction/instruction

[]

> v2 -> v3:
> + fixed few typos (Joe Perches)
[]
> + /* we are not interested in non-int3 faults and ring > 0 faults */
> + if (val != DIE_INT3 || !args->regs || user_mode_vm(args->regs)
> + || (unsigned long) bp_int3_addr != args->regs->ip)
> + return NOTIFY_DONE;
> +
> + /* set up the specified breakpoint handler */
> + args->regs->ip = (unsigned long) bp_int3_handler;

Still think the test and set should be in the same order.

> +/*
> + * text_poke_bp() -- update instructions on live kernel on SMP
> + * @addr: address to patch
> + * @opcode: opcode of new instruction
> + * @len: length to copy
> + * @handler: address to jump to when the temporary breakpoint is hit
> + *

This sure looks like kernel-doc but misses the leading /**


Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

(2013/07/11 19:51), Jiri Kosina wrote:
>>> + * - update all but the first byte of the patched range
>>> + * - sync cores
>>> + * - replalace the first byte (int3) by the first byte of
>>> + * replacing opcode
>>> + * - sync cores
>>> + *
>>> + * Note: must be called under text_mutex.
>>> + */
>>> +void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
>>> +{
>>> + unsigned char int3 = 0xcc;
>>> +
>>
>> Here, you have to protect this code from others, since bp_* are
>> global.
>
> Caller is responsible for holding the text_mutex, so text_poke_bp() can't
> race with itself. And the proper consistency between text_poke_bp() and
> the notifier is achieved by the memory barriers.

Oops, right. I missed your "Note" line.

>
> So what exact scenario do you have in mind here, please?

No, never mind...


Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [RFC] [PATCH 1/2 v2] x86: introduce int3-based instruction patching

(2013/07/12 1:46), Steven Rostedt wrote:
> On Thu, 2013-07-11 at 09:31 -0700, H. Peter Anvin wrote:
>
>> The current code assumes that one of the two code sequences is a NOP,
>> and therefore that jumping over the region is legal. This does not
>> allow for transitioning one active code sequence to another.
>
> Correct, and I think we should keep the two changes separate, as the NOP
> case is trivial. No need to complicate the trivial and common updates
> (jump_labels and ftrace). But for things like kprobes, we could do a bit
> more complex code, but it should probably be separate.

No worries; the kprobes optimization code prepares the destination code
buffer to jump to before patching. Thus, we just need to give the
buffer address to text_poke_bp().

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-07-12 02:07:30

by Steven Rostedt

Subject: Re: [PATCH 1/2 v3] x86: introduce int3-based instruction patching

On Thu, 2013-07-11 at 22:26 +0200, Jiri Kosina wrote:

Nitpick, and Joe Perches mentioned this earlier too. The below should be
in kerneldoc format. That is:

/**
* text_poke_bp() -- update instructions on live kernel on SMP


> +/*
> + * text_poke_bp() -- update instructions on live kernel on SMP
> + * @addr: address to patch
> + * @opcode: opcode of new instruction
> + * @len: length to copy
> + * @handler: address to jump to when the temporary breakpoint is hit
> + *
> +
> + * Modify multi-byte instruction by using int3 breakpoint on SMP.

Also, you have a missing asterisk.

See Documentation/kernel-doc-nano-HOWTO.txt for more info.


But other than that, you can add my:

Reviewed-by: Steven Rostedt <[email protected]>

-- Steve

> + * In contrary to text_poke_smp(), we completely avoid stop_machine() here,
> + * and achieve the synchronization using int3 breakpoint.
> + *
> + * The way it is done:
> + * - add a int3 trap to the address that will be patched
> + * - sync cores
> + * - update all but the first byte of the patched range
> + * - sync cores
> + * - replace the first byte (int3) by the first byte of
> + * replacing opcode
> + * - sync cores
> + *
> + * Note: must be called under text_mutex.
> + */
> +void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> +{
> + unsigned char int3 = 0xcc;
> +
> + bp_int3_handler = handler;
> + bp_int3_addr = (u8 *)addr + sizeof(int3);
> + bp_patching_in_progress = true;
> + /*
> + * Corresponding read barrier in int3 notifier for
> + * making sure the in_progress flags is correctly ordered wrt.
> + * patching
> + */
> + smp_wmb();
> +
> + text_poke(addr, &int3, sizeof(int3));
> +
> + on_each_cpu(do_sync_core, NULL, 1);
> +
> + if (len - sizeof(int3) > 0) {
> + /* patch all but the first byte */
> + text_poke((char *)addr + sizeof(int3),
> + (const char *) opcode + sizeof(int3),
> + len - sizeof(int3));
> + /*
> + * According to Intel, this core syncing is very likely
> + * not necessary and we'd be safe even without it. But
> + * better safe than sorry (plus there's not only Intel).
> + */
> + on_each_cpu(do_sync_core, NULL, 1);
> + }
> +
> + /* patch the first byte */
> + text_poke(addr, opcode, sizeof(int3));
> +
> + on_each_cpu(do_sync_core, NULL, 1);
> +
> + bp_patching_in_progress = false;
> + smp_wmb();
> +
> + return addr;
> +}
> +
> +/* this one needs to run before anything else handles it as a
> + * regular exception */
> +static struct notifier_block int3_nb = {
> + .priority = 0x7fffffff,
> + .notifier_call = int3_notify
> +};
> +
> +static int __init int3_init(void)
> +{
> + return register_die_notifier(&int3_nb);
> +}
> +
> +arch_initcall(int3_init);
> /*
> * Cross-modifying kernel text with stop_machine().
> * This code originally comes from immediate value.
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index bddf3b2..d6db7bd 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);
>
> static struct notifier_block kprobe_exceptions_nb = {
> .notifier_call = kprobe_exceptions_notify,
> - .priority = 0x7fffffff /* we need to be notified first */
> + .priority = 0x7ffffff0 /* High priority, but not first. */
> };
>
> unsigned long __weak arch_deref_entry_point(void *entry)

2013-07-12 02:09:11

by Steven Rostedt

Subject: Re: [PATCH 1/2 v3] x86: introduce int3-based instruction patching

On Fri, 2013-07-12 at 00:31 +0200, Jiri Kosina wrote:
> On Thu, 11 Jul 2013, H. Peter Anvin wrote:
>
> > > synchronization after replacing "all but first" instructions should not
> > > be necessary (on Intel hardware), as the syncing after the subsequent
> > > patching of the first byte provides enough safety.
> > > But there's not only Intel HW out there, and we'd rather be on a safe
> > > side.
> >
> > Has anyone talked to AMD or VIA about this at all? Did anyone else ever
> > make SMP-capable x86?
>
> If Boris can verify for AMD, that'd be good; we could then just remove one
> extra syncing of the cores as a followup (can be done any time later, both
> for alternative.c and ftrace in fact).
>
> With the "extra" sync, the procedure is already verified to work properly
> by ftace.
>

I'd like to err on the side of caution. The extra sync really doesn't
hurt. Let's keep it in for a kernel release cycle to make sure
everything else works properly, then we can look at optimizing it.

-- Steve

2013-07-12 02:12:09

by Steven Rostedt

Subject: Re: [PATCH 2/2 v3] x86: make jump_label use int3-based patching

On Thu, 2013-07-11 at 22:26 +0200, Jiri Kosina wrote:
> Make jump labels use text_poke_bp() for text patching instead of
> text_poke_smp(), avoiding the need for stop_machine().
>
> Signed-off-by: Jiri Kosina <[email protected]>
> ---
> Changes:
>
> - unchanged since v1, patch 1/2 is the one being updated
>
> arch/x86/kernel/jump_label.c | 16 ++++++++++++++--
> 1 files changed, 14 insertions(+), 2 deletions(-)
>

Reviewed-by: Steven Rostedt <[email protected]>

-- Steve

> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index 2889b3d..460f5d9 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -37,7 +37,19 @@ static void __jump_label_transform(struct jump_entry *entry,
> } else
> memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
>
> - (*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
> + /*
> + * Make text_poke_bp() a default fallback poker.
> + *
> + * At the time the change is being done, just ignore whether we
> + * are doing nop -> jump or jump -> nop transition, and assume
> + * always nop being the 'currently valid' instruction
> + *
> + */
> + if (poker)
> + (*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
> + else
> + text_poke_bp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE,
> + (void *)entry->code + JUMP_LABEL_NOP_SIZE);
> }
>
> void arch_jump_label_transform(struct jump_entry *entry,
> @@ -45,7 +57,7 @@ void arch_jump_label_transform(struct jump_entry *entry,
> {
> get_online_cpus();
> mutex_lock(&text_mutex);
> - __jump_label_transform(entry, type, text_poke_smp);
> + __jump_label_transform(entry, type, NULL);
> mutex_unlock(&text_mutex);
> put_online_cpus();
> }

Subject: Re: [PATCH 1/2 v3] x86: introduce int3-based instruction patching

(2013/07/12 5:26), Jiri Kosina wrote:
> Introduce a method for run-time instrucntion patching on a live SMP kernel
> based on int3 breakpoint, completely avoiding the need for stop_machine().
>
> The way this is achieved:
>
> - add a int3 trap to the address that will be patched
> - sync cores
> - update all but the first byte of the patched range
> - sync cores
> - replace the first byte (int3) by the first byte of
> replacing opcode
> - sync cores
>
> According to
>
> http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html
>
> synchronization after replacing "all but first" instructions should not
> be necessary (on Intel hardware), as the syncing after the subsequent
> patching of the first byte provides enough safety.
> But there's not only Intel HW out there, and we'd rather be on a safe
> side.
>
> If any CPU instruction execution would collide with the patching,
> it'd be trapped by the int3 breakpoint and redirected to the provided
> "handler" (which would typically mean just skipping over the patched
> region, acting as "nop" has been there, in case we are doing nop -> jump
> and jump -> nop transitions).
>
> Ftrace has been using this very technique since 08d636b ("ftrace/x86:
> Have arch x86_64 use breakpoints instead of stop machine") for ages
> already, and jump labels are another obvious potential user of this.
>
> Based on activities of Masami Hiramatsu <[email protected]>
> a few years ago.
>
> Signed-off-by: Jiri Kosina <[email protected]>

Looks good for me. I'd like to use this for optprobe too :)

Reviewed-by: Masami Hiramatsu <[email protected]>

Thank you!

> ---
> Changes:
>
> v1 -> v2:
> + fixed kerneldoc
> + fixed checkpatch errors (reported by Borislav)
>
> v2 -> v3:
> + fixed few typos (Joe Perches)
> + extended changelog with pointer to Intel's statement regarding minimal
> necessary synchronization points to achieve correctness
> + added even the synchronization that might not be needed according to
> Intel, to be on a safe side (and do what has been proven to work by
> ftrace implementation already) (Steven Rostedt)
> + Fix formatting of comments (Steven Rostedt, Borislav Petkov, Joe
> Perches)
>
> arch/x86/include/asm/alternative.h | 1 +
> arch/x86/kernel/alternative.c | 107 ++++++++++++++++++++++++++++++++++++
> kernel/kprobes.c | 2 +-
> 3 files changed, 109 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 58ed6d9..3abf8dd 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -233,6 +233,7 @@ struct text_poke_param {
> };
>
> extern void *text_poke(void *addr, const void *opcode, size_t len);
> +extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
> extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
> extern void text_poke_smp_batch(struct text_poke_param *params, int n);
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index c15cf9a..ed2377d 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -11,6 +11,7 @@
> #include <linux/memory.h>
> #include <linux/stop_machine.h>
> #include <linux/slab.h>
> +#include <linux/kdebug.h>
> #include <asm/alternative.h>
> #include <asm/sections.h>
> #include <asm/pgtable.h>
> @@ -596,6 +597,112 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> return addr;
> }
>
> +static void do_sync_core(void *info)
> +{
> + sync_core();
> +}
> +
> +static bool bp_patching_in_progress;
> +static void *bp_int3_handler, *bp_int3_addr;
> +
> +static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
> +{
> + struct die_args *args = data;
> +
> + /* bp_patching_in_progress */
> + smp_rmb();
> +
> + if (likely(!bp_patching_in_progress))
> + return NOTIFY_DONE;
> +
> + /* we are not interested in non-int3 faults and ring > 0 faults */
> + if (val != DIE_INT3 || !args->regs || user_mode_vm(args->regs)
> + || (unsigned long) bp_int3_addr != args->regs->ip)
> + return NOTIFY_DONE;
> +
> + /* set up the specified breakpoint handler */
> + args->regs->ip = (unsigned long) bp_int3_handler;
> +
> + return NOTIFY_STOP;
> +}
> +/*
> + * text_poke_bp() -- update instructions on live kernel on SMP
> + * @addr: address to patch
> + * @opcode: opcode of new instruction
> + * @len: length to copy
> + * @handler: address to jump to when the temporary breakpoint is hit
> + *
> +
> + * Modify multi-byte instruction by using int3 breakpoint on SMP.
> + * In contrary to text_poke_smp(), we completely avoid stop_machine() here,
> + * and achieve the synchronization using int3 breakpoint.
> + *
> + * The way it is done:
> + * - add a int3 trap to the address that will be patched
> + * - sync cores
> + * - update all but the first byte of the patched range
> + * - sync cores
> + * - replace the first byte (int3) by the first byte of
> + * replacing opcode
> + * - sync cores
> + *
> + * Note: must be called under text_mutex.
> + */
> +void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> +{
> + unsigned char int3 = 0xcc;
> +
> + bp_int3_handler = handler;
> + bp_int3_addr = (u8 *)addr + sizeof(int3);
> + bp_patching_in_progress = true;
> + /*
> + * Corresponding read barrier in int3 notifier for
> + * making sure the in_progress flags is correctly ordered wrt.
> + * patching
> + */
> + smp_wmb();
> +
> + text_poke(addr, &int3, sizeof(int3));
> +
> + on_each_cpu(do_sync_core, NULL, 1);
> +
> + if (len - sizeof(int3) > 0) {
> + /* patch all but the first byte */
> + text_poke((char *)addr + sizeof(int3),
> + (const char *) opcode + sizeof(int3),
> + len - sizeof(int3));
> + /*
> + * According to Intel, this core syncing is very likely
> + * not necessary and we'd be safe even without it. But
> + * better safe than sorry (plus there's not only Intel).
> + */
> + on_each_cpu(do_sync_core, NULL, 1);
> + }
> +
> + /* patch the first byte */
> + text_poke(addr, opcode, sizeof(int3));
> +
> + on_each_cpu(do_sync_core, NULL, 1);
> +
> + bp_patching_in_progress = false;
> + smp_wmb();
> +
> + return addr;
> +}
> +
> +/* this one needs to run before anything else handles it as a
> + * regular exception */
> +static struct notifier_block int3_nb = {
> + .priority = 0x7fffffff,
> + .notifier_call = int3_notify
> +};
> +
> +static int __init int3_init(void)
> +{
> + return register_die_notifier(&int3_nb);
> +}
> +
> +arch_initcall(int3_init);
> /*
> * Cross-modifying kernel text with stop_machine().
> * This code originally comes from immediate value.
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index bddf3b2..d6db7bd 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);
>
> static struct notifier_block kprobe_exceptions_nb = {
> .notifier_call = kprobe_exceptions_notify,
> - .priority = 0x7fffffff /* we need to be notified first */
> + .priority = 0x7ffffff0 /* High priority, but not first. */
> };
>
> unsigned long __weak arch_deref_entry_point(void *entry)
>


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [PATCH 2/2 v3] x86: make jump_label use int3-based patching

(2013/07/12 5:26), Jiri Kosina wrote:
> Make jump labels use text_poke_bp() for text patching instead of
> text_poke_smp(), avoiding the need for stop_machine().
>
> Signed-off-by: Jiri Kosina <[email protected]>
> ---
> Changes:
>
> - unchanged since v1, patch 1/2 is the one being updated
>
> arch/x86/kernel/jump_label.c | 16 ++++++++++++++--
> 1 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index 2889b3d..460f5d9 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -37,7 +37,19 @@ static void __jump_label_transform(struct jump_entry *entry,
> } else
> memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
>
> - (*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
> + /*
> + * Make text_poke_bp() a default fallback poker.
> + *
> + * At the time the change is being done, just ignore whether we
> + * are doing nop -> jump or jump -> nop transition, and assume
> + * always nop being the 'currently valid' instruction
> + *
> + */
> + if (poker)
> + (*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
> + else
> + text_poke_bp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE,
> + (void *)entry->code + JUMP_LABEL_NOP_SIZE);

BTW, if the poker is NULL or text_poke_early, I think we don't need to
pass it to __jump_label_transform, do we?

Thank you,
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-07-12 07:58:00

by Borislav Petkov

Subject: Re: [PATCH 1/2 v3] x86: introduce int3-based instruction patching

On Thu, Jul 11, 2013 at 02:36:38PM -0700, H. Peter Anvin wrote:
> On 07/11/2013 02:04 PM, Borislav Petkov wrote:
> > On Thu, Jul 11, 2013 at 01:53:16PM -0700, H. Peter Anvin wrote:
> >> Has anyone talked to AMD or VIA about this at all?
> >
> > I guess I can try to take care of the AMD part. Just to confirm, is this
> > the exact sequence we're interested in:
> >
> > 1. Setup int3 handler for fixup.
> >
> > 2. Put a breakpoint (int3) on the first byte of modifying region, and
> > synchronize code on all CPUs.
> >
> > 3. Modify other bytes of modifying region.
> >
> > 4. Modify the first byte of modifying region, and synchronize code on
> > all CPUs.
> >
> > 5. Clear int3 handler.
> >
> > If a suitable int3 handler is left permanently in place then the
> > synchronization in step 4 is unnecessary.
> >
>
> Correct.

Right, above sequence would work on AMD.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-12 09:21:55

by Jiri Kosina

Subject: [PATCH 1/2 v4] x86: introduce int3-based instruction patching

Introduce a method for run-time instruction patching on a live SMP kernel
based on int3 breakpoint, completely avoiding the need for stop_machine().

The way this is achieved:

- add an int3 trap to the address that will be patched
- sync cores
- update all but the first byte of the patched range
- sync cores
- replace the first byte (int3) by the first byte of
replacing opcode
- sync cores

According to

http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html

synchronization after replacing "all but first" instructions should not
be necessary (on Intel hardware), as the syncing after the subsequent
patching of the first byte provides enough safety.
But there's not only Intel HW out there, and we'd rather be on a safe
side.

If any CPU instruction execution would collide with the patching,
it'd be trapped by the int3 breakpoint and redirected to the provided
"handler" (which would typically mean just skipping over the patched
region, acting as if a "nop" had been there, in case we are doing
nop -> jump and jump -> nop transitions).

Ftrace has been using this very technique since 08d636b ("ftrace/x86:
Have arch x86_64 use breakpoints instead of stop machine") for ages
already, and jump labels are another obvious potential user of this.

Based on activities of Masami Hiramatsu <[email protected]>
a few years ago.

Reviewed-by: Steven Rostedt <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
---
Changes:
v1 -> v2:
+ fixed kerneldoc
+ fixed checkpatch errors (reported by Borislav)

v2 -> v3:
+ fixed few typos (Joe Perches)
+ extended changelog with pointer to Intel's statement regarding minimal
necessary synchronization points to achieve correctness
+ added even the synchronization that might not be needed according to
Intel, to be on a safe side (and do what has been proven to work by
ftrace implementation already) (Steven Rostedt)
+ Fix formatting of comments (Steven Rostedt, Borislav Petkov, Joe
Perches)
v3 -> v4:
+ fixed kerneldoc (Joe Perches, Steven Rostedt)
+ a few trivial code cleanups (Joe Perches)
+ gathered Acks, not RFC anymore

arch/x86/include/asm/alternative.h | 1 +
arch/x86/kernel/alternative.c | 106 ++++++++++++++++++++++++++++++++++++
kernel/kprobes.c | 2 +-
3 files changed, 108 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 58ed6d9..3abf8dd 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -233,6 +233,7 @@ struct text_poke_param {
};

extern void *text_poke(void *addr, const void *opcode, size_t len);
+extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
extern void text_poke_smp_batch(struct text_poke_param *params, int n);

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c15cf9a..0ab4936 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -11,6 +11,7 @@
#include <linux/memory.h>
#include <linux/stop_machine.h>
#include <linux/slab.h>
+#include <linux/kdebug.h>
#include <asm/alternative.h>
#include <asm/sections.h>
#include <asm/pgtable.h>
@@ -596,6 +597,111 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
return addr;
}

+static void do_sync_core(void *info)
+{
+ sync_core();
+}
+
+static bool bp_patching_in_progress;
+static void *bp_int3_handler, *bp_int3_addr;
+
+static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
+{
+ struct die_args *args = data;
+
+ /* bp_patching_in_progress */
+ smp_rmb();
+
+ if (likely(!bp_patching_in_progress))
+ return NOTIFY_DONE;
+
+ /* we are not interested in non-int3 faults and ring > 0 faults */
+ if (val != DIE_INT3 || !args->regs || user_mode_vm(args->regs)
+ || args->regs->ip != (unsigned long)bp_int3_addr)
+ return NOTIFY_DONE;
+
+ /* set up the specified breakpoint handler */
+ args->regs->ip = (unsigned long) bp_int3_handler;
+
+ return NOTIFY_STOP;
+}
+/**
+ * text_poke_bp() -- update instructions on live kernel on SMP
+ * @addr: address to patch
+ * @opcode: opcode of new instruction
+ * @len: length to copy
+ * @handler: address to jump to when the temporary breakpoint is hit
+ *
+ * Modify multi-byte instruction by using int3 breakpoint on SMP.
+ * In contrast to text_poke_smp(), we completely avoid stop_machine() here,
+ * and achieve the synchronization using int3 breakpoint.
+ *
+ * The way it is done:
+ * - add a int3 trap to the address that will be patched
+ * - sync cores
+ * - update all but the first byte of the patched range
+ * - sync cores
+ * - replace the first byte (int3) by the first byte of
+ * replacing opcode
+ * - sync cores
+ *
+ * Note: must be called under text_mutex.
+ */
+void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
+{
+ unsigned char int3 = 0xcc;
+
+ bp_int3_handler = handler;
+ bp_int3_addr = (u8 *)addr + sizeof(int3);
+ bp_patching_in_progress = true;
+ /*
+ * Corresponding read barrier in int3 notifier for
+ * making sure the in_progress flag is correctly ordered wrt.
+ * patching
+ */
+ smp_wmb();
+
+ text_poke(addr, &int3, sizeof(int3));
+
+ on_each_cpu(do_sync_core, NULL, 1);
+
+ if (len - sizeof(int3) > 0) {
+ /* patch all but the first byte */
+ text_poke((char *)addr + sizeof(int3),
+ (const char *) opcode + sizeof(int3),
+ len - sizeof(int3));
+ /*
+ * According to Intel, this core syncing is very likely
+ * not necessary and we'd be safe even without it. But
+ * better safe than sorry (plus there's not only Intel).
+ */
+ on_each_cpu(do_sync_core, NULL, 1);
+ }
+
+ /* patch the first byte */
+ text_poke(addr, opcode, sizeof(int3));
+
+ on_each_cpu(do_sync_core, NULL, 1);
+
+ bp_patching_in_progress = false;
+ smp_wmb();
+
+ return addr;
+}
+
+/* this one needs to run before anything else handles it as a
+ * regular exception */
+static struct notifier_block int3_nb = {
+ .priority = 0x7fffffff,
+ .notifier_call = int3_notify
+};
+
+static int __init int3_init(void)
+{
+ return register_die_notifier(&int3_nb);
+}
+
+arch_initcall(int3_init);
/*
* Cross-modifying kernel text with stop_machine().
* This code originally comes from immediate value.
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index bddf3b2..d6db7bd 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);

static struct notifier_block kprobe_exceptions_nb = {
.notifier_call = kprobe_exceptions_notify,
- .priority = 0x7fffffff /* we need to be notified first */
+ .priority = 0x7ffffff0 /* High priority, but not first. */
};

unsigned long __weak arch_deref_entry_point(void *entry)
--
1.7.3.1

2013-07-12 09:22:12

by Jiri Kosina

Subject: [PATCH 2/2 v4] x86: make jump_label use int3-based patching

Make jump labels use text_poke_bp() for text patching instead of
text_poke_smp(), avoiding the need for stop_machine().

Reviewed-by: Steven Rostedt <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
---
arch/x86/kernel/jump_label.c | 16 ++++++++++++++--
1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 2889b3d..460f5d9 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -37,7 +37,19 @@ static void __jump_label_transform(struct jump_entry *entry,
} else
memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);

- (*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+ /*
+ * Make text_poke_bp() a default fallback poker.
+ *
+ * At the time the change is being done, just ignore whether we
+ * are doing nop -> jump or jump -> nop transition, and assume
+ * always nop being the 'currently valid' instruction
+ *
+ */
+ if (poker)
+ (*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+ else
+ text_poke_bp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE,
+ (void *)entry->code + JUMP_LABEL_NOP_SIZE);
}

void arch_jump_label_transform(struct jump_entry *entry,
@@ -45,7 +57,7 @@ void arch_jump_label_transform(struct jump_entry *entry,
{
get_online_cpus();
mutex_lock(&text_mutex);
- __jump_label_transform(entry, type, text_poke_smp);
+ __jump_label_transform(entry, type, NULL);
mutex_unlock(&text_mutex);
put_online_cpus();
}
--
1.7.3.1

Subject: [tip:x86/jumplabel] x86: Introduce int3 (breakpoint) -based instruction patching

Commit-ID: fd4363fff3d96795d3feb1b3fb48ce590f186bdd
Gitweb: http://git.kernel.org/tip/fd4363fff3d96795d3feb1b3fb48ce590f186bdd
Author: Jiri Kosina <[email protected]>
AuthorDate: Fri, 12 Jul 2013 11:21:48 +0200
Committer: H. Peter Anvin <[email protected]>
CommitDate: Tue, 16 Jul 2013 17:55:29 -0700

x86: Introduce int3 (breakpoint)-based instruction patching

Introduce a method for run-time instruction patching on a live SMP kernel
based on an int3 breakpoint, completely avoiding the need for stop_machine().

The way this is achieved:

- add an int3 trap to the address that will be patched
- sync cores
- update all but the first byte of the patched range
- sync cores
- replace the first byte (int3) with the first byte of the
replacement opcode
- sync cores

According to

http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html

synchronization after replacing "all but first" instructions should not
be necessary (on Intel hardware), as the syncing after the subsequent
patching of the first byte provides enough safety.
But Intel is not the only HW out there, and we'd rather be on the
safe side.

If any CPU's instruction execution collided with the patching, it would
be trapped by the int3 breakpoint and redirected to the provided
"handler" (which typically means just skipping over the patched region,
acting as if a "nop" had been there, in the case of nop -> jump and
jump -> nop transitions).

Ftrace has been using this very technique since 08d636b ("ftrace/x86:
Have arch x86_64 use breakpoints instead of stop machine") for ages
already, and jump labels are another obvious potential user of this.

Based on activities of Masami Hiramatsu <[email protected]>
a few years ago.

Reviewed-by: Steven Rostedt <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/alternative.h | 1 +
arch/x86/kernel/alternative.c | 106 +++++++++++++++++++++++++++++++++++++
kernel/kprobes.c | 2 +-
3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 58ed6d9..3abf8dd 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -233,6 +233,7 @@ struct text_poke_param {
};

extern void *text_poke(void *addr, const void *opcode, size_t len);
+extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
extern void text_poke_smp_batch(struct text_poke_param *params, int n);

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c15cf9a..0ab4936 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -11,6 +11,7 @@
#include <linux/memory.h>
#include <linux/stop_machine.h>
#include <linux/slab.h>
+#include <linux/kdebug.h>
#include <asm/alternative.h>
#include <asm/sections.h>
#include <asm/pgtable.h>
@@ -596,6 +597,111 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
return addr;
}

+static void do_sync_core(void *info)
+{
+ sync_core();
+}
+
+static bool bp_patching_in_progress;
+static void *bp_int3_handler, *bp_int3_addr;
+
+static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
+{
+ struct die_args *args = data;
+
+ /* bp_patching_in_progress */
+ smp_rmb();
+
+ if (likely(!bp_patching_in_progress))
+ return NOTIFY_DONE;
+
+ /* we are not interested in non-int3 faults and ring > 0 faults */
+ if (val != DIE_INT3 || !args->regs || user_mode_vm(args->regs)
+ || args->regs->ip != (unsigned long)bp_int3_addr)
+ return NOTIFY_DONE;
+
+ /* set up the specified breakpoint handler */
+ args->regs->ip = (unsigned long) bp_int3_handler;
+
+ return NOTIFY_STOP;
+}
+/**
+ * text_poke_bp() -- update instructions on live kernel on SMP
+ * @addr: address to patch
+ * @opcode: opcode of new instruction
+ * @len: length to copy
+ * @handler: address to jump to when the temporary breakpoint is hit
+ *
+ * Modify multi-byte instruction by using int3 breakpoint on SMP.
+ * In contrary to text_poke_smp(), we completely avoid stop_machine() here,
+ * and achieve the synchronization using int3 breakpoint.
+ *
+ * The way it is done:
+ * - add a int3 trap to the address that will be patched
+ * - sync cores
+ * - update all but the first byte of the patched range
+ * - sync cores
+ * - replace the first byte (int3) by the first byte of
+ * replacing opcode
+ * - sync cores
+ *
+ * Note: must be called under text_mutex.
+ */
+void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
+{
+ unsigned char int3 = 0xcc;
+
+ bp_int3_handler = handler;
+ bp_int3_addr = (u8 *)addr + sizeof(int3);
+ bp_patching_in_progress = true;
+ /*
+ * Corresponding read barrier in int3 notifier for
+ * making sure the in_progress flags is correctly ordered wrt.
+ * patching
+ */
+ smp_wmb();
+
+ text_poke(addr, &int3, sizeof(int3));
+
+ on_each_cpu(do_sync_core, NULL, 1);
+
+ if (len - sizeof(int3) > 0) {
+ /* patch all but the first byte */
+ text_poke((char *)addr + sizeof(int3),
+ (const char *) opcode + sizeof(int3),
+ len - sizeof(int3));
+ /*
+ * According to Intel, this core syncing is very likely
+ * not necessary and we'd be safe even without it. But
+ * better safe than sorry (plus there's not only Intel).
+ */
+ on_each_cpu(do_sync_core, NULL, 1);
+ }
+
+ /* patch the first byte */
+ text_poke(addr, opcode, sizeof(int3));
+
+ on_each_cpu(do_sync_core, NULL, 1);
+
+ bp_patching_in_progress = false;
+ smp_wmb();
+
+ return addr;
+}
+
+/* this one needs to run before anything else handles it as a
+ * regular exception */
+static struct notifier_block int3_nb = {
+ .priority = 0x7fffffff,
+ .notifier_call = int3_notify
+};
+
+static int __init int3_init(void)
+{
+ return register_die_notifier(&int3_nb);
+}
+
+arch_initcall(int3_init);
/*
* Cross-modifying kernel text with stop_machine().
* This code originally comes from immediate value.
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 6e33498..b58b490 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);

static struct notifier_block kprobe_exceptions_nb = {
.notifier_call = kprobe_exceptions_notify,
- .priority = 0x7fffffff /* we need to be notified first */
+ .priority = 0x7ffffff0 /* High priority, but not first. */
};

unsigned long __weak arch_deref_entry_point(void *entry)

Subject: [tip:x86/jumplabel] x86: Make jump_label use int3-based patching

Commit-ID: 51b2c07b22261f19188d9a9071943d60a067481c
Gitweb: http://git.kernel.org/tip/51b2c07b22261f19188d9a9071943d60a067481c
Author: Jiri Kosina <[email protected]>
AuthorDate: Fri, 12 Jul 2013 11:22:09 +0200
Committer: H. Peter Anvin <[email protected]>
CommitDate: Tue, 16 Jul 2013 17:55:37 -0700

x86: Make jump_label use int3-based patching

Make jump labels use text_poke_bp() for text patching instead of
text_poke_smp(), avoiding the need for stop_machine().

Reviewed-by: Steven Rostedt <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/kernel/jump_label.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 2889b3d..460f5d9 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -37,7 +37,19 @@ static void __jump_label_transform(struct jump_entry *entry,
} else
memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);

- (*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+ /*
+ * Make text_poke_bp() a default fallback poker.
+ *
+ * At the time the change is being done, just ignore whether we
+ * are doing nop -> jump or jump -> nop transition, and assume
+ * always nop being the 'currently valid' instruction
+ *
+ */
+ if (poker)
+ (*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+ else
+ text_poke_bp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE,
+ (void *)entry->code + JUMP_LABEL_NOP_SIZE);
}

void arch_jump_label_transform(struct jump_entry *entry,
@@ -45,7 +57,7 @@ void arch_jump_label_transform(struct jump_entry *entry,
{
get_online_cpus();
mutex_lock(&text_mutex);
- __jump_label_transform(entry, type, text_poke_smp);
+ __jump_label_transform(entry, type, NULL);
mutex_unlock(&text_mutex);
put_online_cpus();
}

2013-07-17 03:59:49

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 1/2 v3] x86: introduce int3-based instruction patching

On 07/12/2013 12:57 AM, Borislav Petkov wrote:
>
> Right, above sequence would work on AMD.
>

Awesome.

-hpa