From: Masami Hiramatsu
Subject: [PATCH -tip v8 0/9] kprobes: Kprobes jump optimization support
To: Frederic Weisbecker, Ingo Molnar, Ananth N Mavinakayanahalli, lkml
Cc: Ananth N Mavinakayanahalli, Ingo Molnar, Jim Keniston,
    Srikar Dronamraju, Christoph Hellwig, Steven Rostedt,
    Frederic Weisbecker, "H. Peter Anvin", Anders Kaseorg, Tim Abbott,
    Andi Kleen, Jason Baron, Mathieu Desnoyers, systemtap, DLE
Date: Fri, 22 Jan 2010 13:54:50 -0500
Message-ID: <20100122185450.9022.87506.stgit@dhcp-100-2-132.bos.redhat.com>

Hi,

Here is v8 of the kprobes jump optimization patchset (a.k.a. Djprobe).
This version just moves the series onto 2.6.33-rc4-tip.

Ingo, I assume this is a good time to push this code onto the -tip tree
(maybe a development branch?), since people can test it with perf-probe.

I've decided to make a separate series of jump optimization patches
which uses text_poke_smp(), which is 'officially' supported on Intel's
processors. So, this version of the patches is just updated against the
latest tip/master; no other updates are included.

I know that the int3-bypassing method (text_poke_fixup()) is currently
unofficially believed to be safe. But we need to get more official
answers from x86 vendors.
Moreover, we need to tweak entry_*.S to prevent recursive NMIs, because
an int3 inside the NMI handler will unblock NMI blocking. I'd like to
push that after this series of patches is merged.

Anyway, thanks Mathieu and Peter, for helping me to implement it and
organizing discussion points about int3-bypass XMC!

These patches can be applied on the latest -tip.

Changes in v8:
- Update patches against the latest tip/master.
- Drop text_poke_fixup() related patches.
- Update benchmark results and add jprobes and kprobe(post-handler)
  results.

And the kprobe stress test didn't find any regressions - from kprobes,
under kvm/x86.

TODO:
- Support NMI-safe int3-bypassing text_poke.
- Support preemptive kernel (by stack unwinding and checking address).


Jump Optimized Kprobes
======================
o Concept
 Kprobes uses the int3 breakpoint instruction on x86 for instrumenting
probes into a running kernel. Jump optimization allows kprobes to
replace the breakpoint with a jump instruction, which reduces probing
overhead drastically.

o Performance
 An optimized kprobe is more than 5 times faster than a normal kprobe.

 Optimizing probes improves their performance. Usually, a kprobe hit
takes 0.5 to 1.0 microseconds to process. On the other hand, a jump
optimized probe hit takes less than 0.1 microseconds (the actual number
depends on the processor). Here are some sample overheads.

Intel(R) Xeon(R) CPU E5410  @ 2.33GHz
(without debugging options, with text_poke_smp patch, 2.6.33-rc4-tip+)

                        x86-32  x86-64
kprobe:                 0.80us  0.99us
kprobe+booster:         0.33us  0.43us
kprobe+optimized:       0.05us  0.06us
kprobe(post-handler):   0.81us  1.00us

kretprobe:              1.10us  1.24us
kretprobe+booster:      0.61us  0.68us
kretprobe+optimized:    0.33us  0.30us

jprobe:                 1.37us  1.67us
jprobe+booster:         0.80us  1.10us

(booster skips single-stepping, kprobe with post handler
 isn't boosted/optimized, and jprobe isn't optimized.)

 Note that jump optimization also consumes more memory, but not so
much. It uses ~200 bytes per probe, so even if you use ~10,000 probes,
it consumes just a few MB.
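
To make the Concept above a bit more concrete, here is a minimal
user-space sketch of how the 5-byte x86 near jump (opcode 0xe9 followed
by a 32-bit displacement) that replaces the breakpoint could be encoded.
This is not the code from the patches: encode_rel_jump() and
RELATIVEJUMP_SIZE are names assumed here for illustration only, and the
real kernel code writes the bytes through text_poke_smp() instead of a
plain buffer.

```c
#include <stdint.h>

#define RELATIVEJUMP_OPCODE 0xe9 /* x86 "jmp rel32" opcode */
#define RELATIVEJUMP_SIZE   5    /* assumed name: 1 opcode + 4 displacement bytes */

/*
 * Hypothetical helper: encode a near jump placed at address 'from' that
 * transfers control to address 'to'. The displacement is measured from
 * the end of the jump instruction. Returns 0 on success, or -1 if the
 * target is not reachable with a signed 32-bit displacement.
 */
static int encode_rel_jump(uint8_t buf[RELATIVEJUMP_SIZE],
                           uint64_t from, uint64_t to)
{
    int64_t disp = (int64_t)(to - (from + RELATIVEJUMP_SIZE));
    uint32_t rel;

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    rel = (uint32_t)(int32_t)disp;
    buf[0] = RELATIVEJUMP_OPCODE;
    /* x86 stores the displacement in little-endian byte order. */
    buf[1] = (uint8_t)(rel & 0xff);
    buf[2] = (uint8_t)((rel >> 8) & 0xff);
    buf[3] = (uint8_t)((rel >> 16) & 0xff);
    buf[4] = (uint8_t)((rel >> 24) & 0xff);
    return 0;
}
```

Since the jump needs 5 bytes while int3 needs only 1, the jump may cover
more than one original instruction, which is why the length and safety
checks are needed before a probe can be optimized.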

o Usage
 Set CONFIG_OPTPROBES=y when building a kernel, then all *probes will be
optimized if possible.

 Kprobes decodes the probed function and checks whether the target
instructions can be optimized (replaced with a jump) safely. If they
can't be, Kprobes just doesn't optimize the probe.

o Optimization
 Before preparing optimization, Kprobes inserts the original
(user-defined) kprobe at the specified address. So, even if the kprobe
cannot be optimized, it just works as a normal kprobe.

- Safety check
 First, Kprobes gets the address of the probed function and checks
whether the optimized region, which will be replaced by a jump
instruction, does NOT straddle the function boundary, because if the
optimized region reached into the next function, callers of that
function would get unexpected results.
 Next, Kprobes decodes the whole body of the probed function and checks
that there is NO indirect jump, NO instruction which can cause an
exception (found by checking exception_tables; such an exception jumps
to fixup code, and the fixup code jumps back into the same function
body), and NO near jump which jumps into the optimized region (except
to its 1st byte), because if some jump instruction jumps into the
middle of another instruction, it causes unexpected results too.
 Kprobes also measures the length of the instructions which will be
replaced by a jump instruction. Because a jump instruction is longer
than 1 byte, it may replace multiple instructions, so Kprobes checks
whether those instructions can be executed out-of-line.

- Preparing detour code
 Then, Kprobes prepares a "detour" buffer, which contains exception
emulating code (push/pop registers, call handler), the copied
instructions (Kprobes copies the instructions which will be replaced by
a jump to the detour buffer), and a jump which jumps back to the
original execution path.

- Pre-optimization
 After preparing the detour code, Kprobes enqueues the kprobe to the
optimizing list and kicks the kprobe-optimizer workqueue to optimize
it. To wait for other probes to be queued for optimization, the
kprobe-optimizer delays its work.
 When the optimized-kprobe is hit before optimization, its handler
changes the IP (instruction pointer) to the copied code and exits. So,
those copied instructions are executed on the detour buffer.

- Optimization
 The kprobe-optimizer doesn't start replacing instructions right away;
it waits for synchronize_sched() for safety, because some processors
may have been interrupted in the middle of the instruction series (on
the 2nd or Nth instruction) which will be replaced by a jump
instruction(*).
 As you know, synchronize_sched() can ensure that all interrupts which
were being handled when synchronize_sched() was called are done, but
only if CONFIG_PREEMPT=n. So, this version supports only kernels with
CONFIG_PREEMPT=n.(**)
 After that, the kprobe-optimizer calls stop_machine() to replace the
probed instructions with a jump instruction by using text_poke_smp().

- Unoptimization
 When a kprobe is unregistered, disabled, or blocked by another kprobe,
an optimized-kprobe will be unoptimized. If this happens before the
kprobe-optimizer runs, the kprobe is just dequeued from the optimizing
list. If the optimization has already been done, the jump is replaced
with the int3 breakpoint and the original code by using
text_poke_smp().

(*)Please imagine that the 2nd instruction is interrupted and the
optimizer replaces the 2nd instruction with the jump *address* while
the interrupt handler is running. When the interrupt returns to the
original address, there is no valid instruction there and it causes an
unexpected result.

(**)This optimization-safety checking may be replaced with the
stop-machine method which ksplice uses, in order to support
CONFIG_PREEMPT=y kernels.
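
The hazard in footnote (*) and the near-jump rule in the safety check
come down to the same question: does some address fall inside the
replaced region, other than on its 1st byte? Below is a rough user-space
sketch of that test, together with the length measurement. It is only an
illustration under assumptions: replaced_region_len(),
lands_inside_region() and JUMP_SIZE are hypothetical names, the
instruction lengths are passed in as plain numbers, and the real patches
use the kernel's x86 instruction decoder instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define JUMP_SIZE 5 /* assumed: bytes overwritten by an x86 jmp rel32 */

/*
 * Hypothetical helper: given the lengths of the instructions starting
 * at the probe address, return how many bytes whole instructions must
 * cover so that at least JUMP_SIZE bytes can be replaced.
 * Returns 0 if the given instructions are too short in total.
 */
static size_t replaced_region_len(const uint8_t *insn_len, size_t n)
{
    size_t len = 0;

    for (size_t i = 0; i < n && len < JUMP_SIZE; i++)
        len += insn_len[i];
    return len >= JUMP_SIZE ? len : 0;
}

/*
 * Hypothetical helper: true if 'addr' is inside the region starting at
 * 'start' but not on its 1st byte. A near jump target or an interrupted
 * return address for which this is true would land in the middle of
 * the new jump instruction (or in the leftover bytes behind it).
 */
static bool lands_inside_region(uint64_t addr, uint64_t start, size_t len)
{
    return addr > start && addr < start + len;
}
```

For example, with instruction lengths 1, 3 and 2 at the probe address,
the region is 6 bytes long; an interrupt that returns to the 2nd
instruction (start + 1) lands inside it, while start itself and
start + 6 do not.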

Thank you,

---

Masami Hiramatsu (9):
      kprobes: Add documents of jump optimization
      kprobes/x86: Support kprobes jump optimization on x86
      x86: Add text_poke_smp for SMP cross modifying code
      kprobes/x86: Cleanup save/restore registers
      kprobes/x86: Boost probes when reentering
      kprobes: Jump optimization sysctl interface
      kprobes: Introduce kprobes jump optimization
      kprobes: Introduce generic insn_slot framework
      kprobes/x86: Cleanup RELATIVEJUMP_INSTRUCTION to RELATIVEJUMP_OPCODE


 Documentation/kprobes.txt          |  191 ++++++++++-
 arch/Kconfig                       |   13 +
 arch/x86/Kconfig                   |    1
 arch/x86/include/asm/alternative.h |    4
 arch/x86/include/asm/kprobes.h     |   31 ++
 arch/x86/kernel/alternative.c      |   60 +++
 arch/x86/kernel/kprobes.c          |  596 ++++++++++++++++++++++++++++------
 include/linux/kprobes.h            |   44 +++
 kernel/kprobes.c                   |  626 +++++++++++++++++++++++++++++++-----
 kernel/sysctl.c                    |   12 +
 10 files changed, 1373 insertions(+), 205 deletions(-)

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com