Date: Wed, 18 Nov 2009 19:28:26 -0500
From: Mathieu Desnoyers
To: Jason Baron <jbaron@redhat.com>, mingo@elte.hu, mhiramat@redhat.com,
	"Paul E. McKenney"
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, tglx@linutronix.de,
	rostedt@goodmis.org, andi@firstfloor.org, roland@redhat.com,
	rth@redhat.com
Subject: Re: [RFC PATCH 2/6] jump label v3 - x86: Introduce generic jump patching without stop_machine
Message-ID: <20091119002826.GB28962@Krystal>
In-Reply-To: <37e397b27509c378f93b9a30f1158791d1b99be7.1258580048.git.jbaron@redhat.com>

* Jason Baron (jbaron@redhat.com) wrote:
> Add text_poke_fixup(), which takes a fixup address to which a processor
> jumps if it hits the region while it is being modified.
> text_poke_fixup() performs the following steps for this purpose:
>
> 1. Set up an int3 handler for the fixup.
> 2.
Put a breakpoint (int3) on the first byte of the region being modified,
> and synchronize code on all CPUs.
> 3. Modify the other bytes of the region, and synchronize code on all CPUs.
> 4. Modify the first byte of the region, and synchronize code on all CPUs.
> 5. Clear the int3 handler.

Hi Masami,

I like the approach and the API is clean. I have interspersed comments below.

Ingo: I raise a question about text_mutex vs. on_each_cpu hangs I experienced
recently in the message below. It might be worth having a look; I suspect this
might have caused the hangs Paul McKenney had with his past TREE RCU callback
migration. I think he took a mutex in the cpu hotplug callbacks and might have
used IPIs within that same lock.

> Thus, if some other processor executes the modified address during steps 2
> to 4, it will jump to the fixup code.
>
> This still has many limitations for modifying multiple instructions at once.
> However, it is enough for patching a 5-byte nop with a jump, because:
> - The replaced instruction is just one instruction, which is executed
>   atomically.
> - The replacing instruction is a jump, so we can set the fixup address to
>   where the jump goes.
>
> Signed-off-by: Masami Hiramatsu
> ---
>  arch/x86/include/asm/alternative.h |   12 ++++
>  arch/x86/include/asm/kprobes.h     |    1 +
>  arch/x86/kernel/alternative.c      |  120 ++++++++++++++++++++++++++++++++++++
>  kernel/kprobes.c                   |    2 +-
>  4 files changed, 134 insertions(+), 1 deletions(-)

[snip snip]

> index de7353c..af47f12 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -4,6 +4,7 @@
>  #include <...>
>  #include <...>
>  #include <...>
> +#include <...>
>  #include <...>
>  #include <...>
>  #include <...>
> @@ -552,3 +553,122 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
>  	local_irq_restore(flags);
>  	return addr;
>  }
> +
> +/*
> + * On the Pentium series, unsynchronized cross-modifying code
> + * operations can cause unexpected instruction execution results.
> + * So after the code is modified, we should synchronize it on each processor.
> + */
> +static void __kprobes __local_sync_core(void *info)
> +{
> +	sync_core();
> +}
> +
> +void __kprobes sync_core_all(void)
> +{
> +	on_each_cpu(__local_sync_core, NULL, 1);

OK, so you rely on the fact that on_each_cpu has memory barriers and executes
the remote "__local_sync_core" with the appropriate memory barriers
underneath, am I correct ?

> +}
> +
> +/* Safely cross-modify code with a fixup address */
> +static void *patch_fixup_from;
> +static void *patch_fixup_addr;
> +
> +static int __kprobes patch_exceptions_notify(struct notifier_block *self,
> +					     unsigned long val, void *data)
> +{
> +	struct die_args *args = data;
> +	struct pt_regs *regs = args->regs;
> +
> +	if (likely(!patch_fixup_from))
> +		return NOTIFY_DONE;
> +
> +	if (val != DIE_INT3 || !regs || user_mode_vm(regs) ||
> +	    (unsigned long)patch_fixup_from != regs->ip)
> +		return NOTIFY_DONE;
> +
> +	args->regs->ip = (unsigned long)patch_fixup_addr;
> +	return NOTIFY_STOP;
> +}
> +
> +/**
> + * text_poke_fixup() -- cross-modifying kernel text with fixup address.
> + * @addr:	Address being modified.
> + * @opcode:	New instruction.
> + * @len:	Length of the modified region, in bytes.
> + * @fixup:	Fixup address.
> + *
> + * Note: You must back up the replaced instructions before calling this,
> + * if you need to recover them.
> + * Note: Must be called under text_mutex.
> + */
> +void *__kprobes text_poke_fixup(void *addr, const void *opcode, size_t len,
> +				void *fixup)
> +{
> +	static const unsigned char int3_insn = BREAKPOINT_INSTRUCTION;
> +	static const int int3_size = sizeof(int3_insn);
> +
> +	/* Replacing 1 byte can be done atomically. */

I'm sure it can be done atomically, but can it be done safely though ?
(disclaimer: we're still waiting for the official answer on this statement):
assuming instruction trace cache effects of SMP cross-code modification, and
that only int3 would be safe to patch, then even changing a single byte could
only be done by going through an intermediate int3 and synchronizing all
other cores.

> +	if (unlikely(len <= 1))
> +		return text_poke(addr, opcode, len);
> +
> +	/* Preparing */
> +	patch_fixup_addr = fixup;
> +	wmb();

hrm, missing comment ?

> +	patch_fixup_from = (u8 *)addr + int3_size; /* IP address after int3 */
> +
> +	/* Cap by an int3 */
> +	text_poke(addr, &int3_insn, int3_size);
> +	sync_core_all();
> +
> +	/* Replace tail bytes */
> +	text_poke((char *)addr + int3_size, (const char *)opcode + int3_size,
> +		  len - int3_size);
> +	sync_core_all();
> +
> +	/* Replace int3 with head byte */
> +	text_poke(addr, opcode, int3_size);
> +	sync_core_all();
> +
> +	/* Cleanup */
> +	patch_fixup_from = NULL;
> +	wmb();

missing comment here too.

> +	return addr;

Little quiz question: when patch_fixup_from is set to NULL, what ensures that
the int3 handlers have completed their execution ?

I think it's probably OK, because the int3 is an interrupt gate, which
therefore disables interrupts as soon as it runs, and executes the notifier
while irqs are off. When we run sync_core_all() after replacing the int3 with
the new first byte, we only return when all other cores have executed an
interrupt, which implies that all previously running int3 handlers have
ended. Is that right ? It looks to me as if this 3rd sync_core_all() is only
needed because of that. Adding a comment would probably be good.
Another thing: I've recently noticed that the following locking seems to hang
the system when stress-testing concurrently with cpu hotplug/hotunplug:

	mutex_lock(&text_mutex);
	on_each_cpu(something, NULL, 1);

The hang seems to be caused by the fact that alternative.c has, within cpu
hotplug (cpu hotplug lock held):

	mutex_lock(&text_mutex);

It might also be caused by the interaction with the stop_machine() performed
within the cpu hotplug lock. I did not find the root cause of the problem,
but this probably calls for lockdep improvements.

In any case, given you are dealing with the exact same locking scenario here,
I would recommend the following stress-test:

	in a loop, use text_poke_jump()
	in a loop, hotunplug all cpus, then hotplug all cpus

I had to fix this temporarily by taking get/put_online_cpus() around the
text_mutex.

[snip snip]

> +static struct notifier_block patch_exceptions_nb = {
> +	.notifier_call = patch_exceptions_notify,
> +	.priority = 0x7fffffff /* we need to be notified first */
> +};
> +
> +static int __init patch_init(void)
> +{
> +	return register_die_notifier(&patch_exceptions_nb);
> +}
> +
> +arch_initcall(patch_init);
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index e5342a3..43a30d8 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -898,7 +898,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);
>
>  static struct notifier_block kprobe_exceptions_nb = {
>  	.notifier_call = kprobe_exceptions_notify,
> -	.priority = 0x7fffffff /* we need to be notified first */
> +	.priority = 0x7ffffff0 /* High priority, but not first. */

It would be good to keep all these priorities in a centralized header.
Thanks,

Mathieu

> };
>
> unsigned long __weak arch_deref_entry_point(void *entry)
> --
> 1.6.5.1

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68