Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756354Ab3GLCkS (ORCPT ); Thu, 11 Jul 2013 22:40:18 -0400 Received: from mail4.hitachi.co.jp ([133.145.228.5]:34852 "EHLO mail4.hitachi.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753048Ab3GLCkQ (ORCPT ); Thu, 11 Jul 2013 22:40:16 -0400 Message-ID: <51DF6C8D.1080903@hitachi.com> Date: Fri, 12 Jul 2013 11:40:13 +0900 From: Masami Hiramatsu Organization: Hitachi, Ltd., Japan User-Agent: Mozilla/5.0 (Windows NT 5.2; rv:13.0) Gecko/20120614 Thunderbird/13.0.1 MIME-Version: 1.0 To: Jiri Kosina Cc: Steven Rostedt , Jason Baron , "H. Peter Anvin" , Borislav Petkov , Joe Perches , linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/2 v3] x86: introduce int3-based instruction patching References: In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7691 Lines: 232 (2013/07/12 5:26), Jiri Kosina wrote: > Introduce a method for run-time instrucntion patching on a live SMP kernel > based on int3 breakpoint, completely avoiding the need for stop_machine(). > > The way this is achieved: > > - add a int3 trap to the address that will be patched > - sync cores > - update all but the first byte of the patched range > - sync cores > - replace the first byte (int3) by the first byte of > replacing opcode > - sync cores > > According to > > http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html > > synchronization after replacing "all but first" instructions should not > be necessary (on Intel hardware), as the syncing after the subsequent > patching of the first byte provides enough safety. > But there's not only Intel HW out there, and we'd rather be on a safe > side. > > If any CPU instruction execution would collide with the patching, > it'd be trapped by the int3 breakpoint and redirected to the provided > "handler" (which would typically mean just skipping over the patched > region, acting as "nop" has been there, in case we are doing nop -> jump > and jump -> nop transitions). > > Ftrace has been using this very technique since 08d636b ("ftrace/x86: > Have arch x86_64 use breakpoints instead of stop machine") for ages > already, and jump labels are another obvious potential user of this. > > Based on activities of Masami Hiramatsu > a few years ago. > > Signed-off-by: Jiri Kosina Looks good for me. I'd like to use this for optprobe too :) Reviewed-by: Masami Hiramatsu Thank you! > --- > Changes: > > v1 -> v2: > + fixed kerneldoc > + fixed checkpatch errors (reported by Borislav) > > v2 -> v3: > + fixed few typos (Joe Perches) > + extended changelog with pointer to Intel's statement regarding minimal > necessary synchronization points to achieve correctness > + added even the synchronization that might not be needed according to > Intel, to be on a safe side (and do what has been proven to work by > ftrace implementation already) (Steven Rostedt) > + Fix formatting of comments (Steven Rostedt, Borislav Petkov, Joe > Perches) > > arch/x86/include/asm/alternative.h | 1 + > arch/x86/kernel/alternative.c | 107 ++++++++++++++++++++++++++++++++++++ > kernel/kprobes.c | 2 +- > 3 files changed, 109 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h > index 58ed6d9..3abf8dd 100644 > --- a/arch/x86/include/asm/alternative.h > +++ b/arch/x86/include/asm/alternative.h > @@ -233,6 +233,7 @@ struct text_poke_param { > }; > > extern void *text_poke(void *addr, const void *opcode, size_t len); > +extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler); > extern void *text_poke_smp(void *addr, const void *opcode, size_t len); > extern void text_poke_smp_batch(struct text_poke_param *params, int n); > > diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c > index c15cf9a..ed2377d 100644 > --- a/arch/x86/kernel/alternative.c > +++ b/arch/x86/kernel/alternative.c > @@ -11,6 +11,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -596,6 +597,112 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len) > return addr; > } > > +static void do_sync_core(void *info) > +{ > + sync_core(); > +} > + > +static bool bp_patching_in_progress; > +static void *bp_int3_handler, *bp_int3_addr; > + > +static int int3_notify(struct notifier_block *self, unsigned long val, void *data) > +{ > + struct die_args *args = data; > + > + /* bp_patching_in_progress */ > + smp_rmb(); > + > + if (likely(!bp_patching_in_progress)) > + return NOTIFY_DONE; > + > + /* we are not interested in non-int3 faults and ring > 0 faults */ > + if (val != DIE_INT3 || !args->regs || user_mode_vm(args->regs) > + || (unsigned long) bp_int3_addr != args->regs->ip) > + return NOTIFY_DONE; > + > + /* set up the specified breakpoint handler */ > + args->regs->ip = (unsigned long) bp_int3_handler; > + > + return NOTIFY_STOP; > +} > +/* > + * text_poke_bp() -- update instructions on live kernel on SMP > + * @addr: address to patch > + * @opcode: opcode of new instruction > + * @len: length to copy > + * @handler: address to jump to when the temporary breakpoint is hit > + * > + > + * Modify multi-byte instruction by using int3 breakpoint on SMP. > + * In contrary to text_poke_smp(), we completely avoid stop_machine() here, > + * and achieve the synchronization using int3 breakpoint. > + * > + * The way it is done: > + * - add a int3 trap to the address that will be patched > + * - sync cores > + * - update all but the first byte of the patched range > + * - sync cores > + * - replace the first byte (int3) by the first byte of > + * replacing opcode > + * - sync cores > + * > + * Note: must be called under text_mutex. > + */ > +void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler) > +{ > + unsigned char int3 = 0xcc; > + > + bp_int3_handler = handler; > + bp_int3_addr = (u8 *)addr + sizeof(int3); > + bp_patching_in_progress = true; > + /* > + * Corresponding read barrier in int3 notifier for > + * making sure the in_progress flags is correctly ordered wrt. > + * patching > + */ > + smp_wmb(); > + > + text_poke(addr, &int3, sizeof(int3)); > + > + on_each_cpu(do_sync_core, NULL, 1); > + > + if (len - sizeof(int3) > 0) { > + /* patch all but the first byte */ > + text_poke((char *)addr + sizeof(int3), > + (const char *) opcode + sizeof(int3), > + len - sizeof(int3)); > + /* > + * According to Intel, this core syncing is very likely > + * not necessary and we'd be safe even without it. But > + * better safe than sorry (plus there's not only Intel). > + */ > + on_each_cpu(do_sync_core, NULL, 1); > + } > + > + /* patch the first byte */ > + text_poke(addr, opcode, sizeof(int3)); > + > + on_each_cpu(do_sync_core, NULL, 1); > + > + bp_patching_in_progress = false; > + smp_wmb(); > + > + return addr; > +} > + > +/* this one needs to run before anything else handles it as a > + * regular exception */ > +static struct notifier_block int3_nb = { > + .priority = 0x7fffffff, > + .notifier_call = int3_notify > +}; > + > +static int __init int3_init(void) > +{ > + return register_die_notifier(&int3_nb); > +} > + > +arch_initcall(int3_init); > /* > * Cross-modifying kernel text with stop_machine(). > * This code originally comes from immediate value. > diff --git a/kernel/kprobes.c b/kernel/kprobes.c > index bddf3b2..d6db7bd 100644 > --- a/kernel/kprobes.c > +++ b/kernel/kprobes.c > @@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes); > > static struct notifier_block kprobe_exceptions_nb = { > .notifier_call = kprobe_exceptions_notify, > - .priority = 0x7fffffff /* we need to be notified first */ > + .priority = 0x7ffffff0 /* High priority, but not first. */ > }; > > unsigned long __weak arch_deref_entry_point(void *entry) > -- Masami HIRAMATSU IT Management Research Dept. Linux Technology Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: masami.hiramatsu.pt@hitachi.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/