2005-10-31 11:04:09

by Masami Hiramatsu

Subject: [RFC][PATCH 0/3]Djprobe (Direct Jump Probe) for 2.6.14-rc5-mm1

Djprobe Documentation
authors: Satoshi Oshima ([email protected])
Masami Hiramatsu ([email protected])

INDEX

1. Djprobe concepts
2. How djprobe works
3. Further Considerations
4. Djprobe Features
5. Architectures Supported
6. Configuring Djprobe
7. API Reference
8. TODO
9. FAQ

1. Djprobe concepts

The basic idea of Djprobe is to dynamically hook into almost any kernel
function entry point and collect debugging or performance analysis
information non-disruptively. Djprobe is functionally very similar to Kprobes
and Jprobes. What distinguishes djprobe is that it uses a jump instruction
instead of a breakpoint instruction, which reduces the overhead of each
probe.

Developers can trap at almost any kernel function entry point, specifying
a handler routine to be invoked when the jump instruction is executed.


2. How Djprobe works

A breakpoint instruction is easy to insert on most architectures.
For example, the breakpoint instruction on i386 and x86_64 is 1 byte long,
so it can be written in a single step, and replacing an instruction's first
byte with a breakpoint instruction is guaranteed to be SMP safe.

A jump instruction, on the other hand, is not easy to insert. A jump
instruction on i386 is 5 bytes long, and a 5-byte replacement cannot be
performed in a single step. Beyond that, dynamic code modification is
subject to several complicated restrictions.

To explain the djprobe mechanism, we introduce some terminology.
Imagine a code sequence starting at the insertion address that consists of
a 2-byte instruction, a 2-byte instruction and a 3-byte instruction:

        IA
         |
[-2][-1][0][1][2][3][4][5][6][7]
        [ins1][ins2][ ins3  ]
        [<-      DCR      ->]
           [<- JTPR ->]

ins1: 1st Instruction
ins2: 2nd Instruction
ins3: 3rd Instruction
IA: Insertion Address
JTPR: Jump Target Prohibition Region
DCR: Detoured Code Region


Djprobe replaces the code in 6 steps:

(1) copying the instruction(s) in the DCR
(2) putting a breakpoint instruction at the IA
(3) scheduling a work on each CPU
(4) executing the CPU safety check in each work
(5) replacing the original instruction(s) with the jump instruction,
    except for its first byte, and serializing the code
(6) replacing the breakpoint instruction with the first byte of the jump
    instruction

Further explanation is given below.

(1) copying instruction(s) in DCR

Djprobe copies the instruction(s) to be replaced into a region that djprobe
allocates. The replaced instructions must include the instruction that
contains the byte at IA+4, so the DCR must be 5 bytes or larger. The DCR
size must be given by the djprobe user.
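As an illustration of how the DCR size follows from the IA+4 rule, the sketch below sums whole instruction lengths until at least 5 bytes are covered. It is a user-space sketch, not part of the patch: the helper name dcr_size() is hypothetical, and it assumes the instruction lengths are already known (in practice they would come from disassembling the target, since djprobe leaves the DCR size to the user).

```c
#include <assert.h>
#include <stddef.h>

/* Length of an i386 "jmp rel32" (opcode 0xE9 + 4-byte displacement);
 * the DCR must cover at least this many bytes from IA. */
#define JMP_REL32_SIZE 5

/*
 * Hypothetical helper: given the lengths of the consecutive instructions
 * starting at the insertion address (IA), return the size of the smallest
 * DCR that contains the byte at IA+4, i.e. the sum of whole instruction
 * lengths until at least JMP_REL32_SIZE bytes are covered.
 * Returns 0 if the listed instructions are too short to cover IA+4.
 */
static size_t dcr_size(const size_t *insn_len, size_t n_insns)
{
	size_t size = 0;
	size_t i;

	for (i = 0; i < n_insns && size < JMP_REL32_SIZE; i++) {
		assert(insn_len[i] > 0);	/* every instruction has a length */
		size += insn_len[i];
	}
	return size >= JMP_REL32_SIZE ? size : 0;
}
```

For the layout in the diagram above (2-, 2- and 3-byte instructions), dcr_size() returns 7, which is the size the user would pass to register_djprobe().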

(2) putting break point instruction at IA

Djprobe puts a breakpoint instruction at the Insertion Address (by
registering a kprobe there). After this replacement, the djprobe acts like
a kprobe.

(3) scheduling works on each CPU

Djprobe schedules a work that executes the CPU safety check on each CPU, and
waits until all of those works have finished.

(4) executing CPU safety check on each work

The current djprobe assumes that a context switch must NOT occur while an
interrupt is being handled; that is, every interrupt handler must return
before a context switch is executed.

Therefore, the execution of the scheduled work is itself proof that no
interrupt stack (and no process stack) contains an address in the JTPR.

The last CPU that executes the safety check work wakes up the waiting
process.
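The countdown in step (4) can be modeled in user space with POSIX threads, as a sketch under the assumption that each thread stands in for one CPU's work item. The names check_safety() and safety_check_work() are hypothetical; the patch itself uses schedule_delayed_work_on(), atomic_dec_and_test() and wake_up_all() in __check_safety() and __work_check_safety().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/*
 * User-space model (an assumption, not kernel code) of the safety check:
 * one "work" per simulated CPU decrements a shared counter, and the last
 * one to finish wakes the registering thread, mirroring the
 * atomic_dec_and_test()/wake_up_all() pattern in the patch.
 */
#define MAX_WORKERS 64

static atomic_int pending;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_done = PTHREAD_COND_INITIALIZER;

static void *safety_check_work(void *arg)
{
	(void)arg;
	/* atomic_fetch_sub() returns the old value: 1 means we were last. */
	if (atomic_fetch_sub(&pending, 1) == 1) {
		pthread_mutex_lock(&lock);
		pthread_cond_broadcast(&all_done);
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

/* Run one work per simulated CPU and wait until every one has executed. */
static int check_safety(int nworkers)
{
	pthread_t tid[MAX_WORKERS];
	int i;

	assert(nworkers > 0 && nworkers <= MAX_WORKERS);
	atomic_store(&pending, nworkers);
	for (i = 0; i < nworkers; i++) {
		if (pthread_create(&tid[i], NULL, safety_check_work, NULL) != 0) {
			while (i-- > 0)		/* reap the threads already started */
				pthread_join(tid[i], NULL);
			return -1;
		}
	}

	/* Checking under the lock avoids a lost wakeup from the last worker. */
	pthread_mutex_lock(&lock);
	while (atomic_load(&pending) != 0)
		pthread_cond_wait(&all_done, &lock);
	pthread_mutex_unlock(&lock);

	for (i = 0; i < nworkers; i++)
		pthread_join(tid[i], NULL);
	return 0;
}
```

check_safety(3) returns 0 only after all three simulated works have run. Unlike this model, the kernel code starts its counter at num_online_cpus() - 1 and skips the CPU that performs the registration.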

(5) replacing original instruction(s) with jump instruction without first
byte and serializing code

After all safety check works have run, djprobe can safely replace the code
in the JTPR, because no CPU is executing the JTPR any more. Even if a CPU
tries to execute the instructions at the top of the DCR again, it is trapped
by the kprobe and redirected to execute the copied instructions. So no CPU
touches the instructions in the JTPR.

Djprobe replaces the bytes from IA+1 to IA+4 with the jump instruction minus
its first byte, and serializes the replaced code on every CPU.

(6) replacing breakpoint instruction with first byte of jump instruction

Djprobe replaces the breakpoint instruction that it put at the IA with the
first byte of the jump instruction.
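Steps (2), (5) and (6) can be sketched on an ordinary byte buffer to show the write order and the encoding of the 5-byte jump (opcode 0xE9 followed by a little-endian displacement relative to IA+5). This is a user-space model only: encode_jmp() and patch_in_jump() are hypothetical names, and the safety check and the per-CPU serialization that the real code performs between the writes are reduced to comments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE      0xCC	/* i386 breakpoint instruction */
#define JMP_REL32_OPCODE 0xE9	/* i386 "jmp rel32" */
#define JMP_REL32_SIZE   5

/*
 * Hypothetical helper: encode a "jmp rel32" located at address `from`
 * that transfers control to `to`. The displacement is relative to the
 * next instruction (from + 5) and is stored little-endian.
 */
static void encode_jmp(uint8_t insn[JMP_REL32_SIZE], uint32_t from, uint32_t to)
{
	uint32_t rel = to - (from + JMP_REL32_SIZE);	/* wraps modulo 2^32 */
	int i;

	insn[0] = JMP_REL32_OPCODE;
	for (i = 0; i < 4; i++)
		insn[1 + i] = (uint8_t)(rel >> (8 * i));
}

/*
 * Model of the write order on a plain buffer `code` whose first byte is
 * the insertion address `ia`; `stub` is the stub code buffer's address.
 */
static void patch_in_jump(uint8_t *code, uint32_t ia, uint32_t stub)
{
	uint8_t jmp[JMP_REL32_SIZE];

	assert(code != NULL);
	encode_jmp(jmp, ia, stub);

	/* Step (2): a 1-byte int3 at IA diverts every CPU to the kprobe path. */
	code[0] = INT3_OPCODE;
	/* Steps (3)-(4): the kernel waits here until no CPU is in the JTPR. */

	/* Step (5): write IA+1..IA+4; the kernel then serializes every CPU. */
	memcpy(code + 1, jmp + 1, JMP_REL32_SIZE - 1);

	/* Step (6): replace int3 with the jump opcode in a single byte store. */
	code[0] = jmp[0];
}
```

With IA = 0x1000 and the stub at 0x2000, the buffer starts with E9 FB 0F 00 00 afterwards. Any remaining DCR bytes past IA+4 are left as they were; they are no longer reached, because the stub jumps back to the instruction following the DCR.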


3. Further Considerations

There are many difficulties in implementing djprobe. In this section, we
discuss the restrictions of djprobe in order to explain these difficulties.

3.1 Confirming the safety of the DCR (dynamic analysis)

Djprobe replaces code that spans one or more instructions. This replacement
usually changes the instruction boundaries. Therefore djprobe must ensure
that no other CPU is executing the DCR and that no stack contains an
address in the JTPR.

3.2 Confirming the safety of the DCR (static analysis)

Djprobe must also ensure that no jump or call instruction targets an
address in the JTPR. In general, this is extremely difficult to verify.
However, some locations, such as function entry points, can be expected to
have no jump or call targets inside the JTPR (because a function entry
point contains a fixed prologue that follows the calling convention).
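A minimal sketch of the condition in 3.2, assuming the set of jump and call targets has already been collected by some separate analysis (which is the hard part and is not shown). jtpr_is_targeted() is a hypothetical helper; it treats a target equal to IA as harmless, since IA remains the start of an instruction throughout the replacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_SIZE 5	/* the JTPR is IA+1 .. IA+4 */

/*
 * Hypothetical static check: return 1 if any known jump/call target lies
 * in the JTPR, i.e. in [ia + 1, ia + JMP_REL32_SIZE - 1], else 0.
 * A target equal to ia itself is allowed.
 */
static int jtpr_is_targeted(uintptr_t ia, const uintptr_t *targets, size_t n)
{
	size_t i;

	assert(targets != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (targets[i] > ia && targets[i] < ia + JMP_REL32_SIZE)
			return 1;
	return 0;
}
```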

4. Djprobe Features

- Djprobe can probe the entries of almost all functions without raising a breakpoint exception.

5. Architectures Supported

- i386


6. Configuring Djprobe
When configuring the kernel using make menuconfig/xconfig/oldconfig, ensure
that CONFIG_DJPROBE is set to "y". Under "Instrumentation Support",
look for "Direct Jump probe". You may also need to enable "Kprobes" and
*DISABLE* "Preemptible Kernel".

7. API Reference
The Djprobe API includes the register_djprobe() and unregister_djprobe()
functions. Here are the specifications for these functions and the
associated probe handlers.

7.1 register_djprobe

#include <linux/djprobe.h>
int register_djprobe(struct djprobe *djp, void *addr, int size);

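Inserts a jump instruction at the address addr. size is the size of the
DCR in bytes, that is, the total length of the whole instructions that the
jump instruction will overwrite (5 bytes or more on i386). When the jump is
hit, Djprobe calls djp->handler.

register_djprobe() returns 0 on success, or a negative errno otherwise:
for example, -EINVAL if size is out of range, -EEXIST if the region
conflicts with another djprobe or a kprobe, or -ENOMEM if memory
allocation fails.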

User's probe handler (djp->handler):
#include <linux/djprobe.h>
#include <linux/ptrace.h>
void handler(struct djprobe *djp, struct pt_regs *regs);

Called with djp pointing to the djprobe associated with the probe point,
and regs pointing to the struct containing the registers saved when
the probe point was hit.

7.2 unregister_djprobe

#include <linux/djprobe.h>
void unregister_djprobe(struct djprobe *djp);

Removes the specified probe. The unregister function can be called
at any time after the probe has been registered.


8. TODO

(1) support an architecture-transparent interface
(Djprobe interface emulated by kprobes)
(2) bulk registration interface support
(3) kprobe interoperability (coexistence at the same address)
(4) support for other architectures

9. FAQ
Direct Jump Probe Q&A

Q: What is the Direct Jump Probe (Djprobe)?
A: Djprobe is a low-overhead probe method for the Linux kernel.

Q: What is different from Kprobes?
A: The most important difference is that djprobe uses a jump instruction
instead of a breakpoint instruction. This reduces probing overhead,
especially when the probes are hit frequently.

Q: How does the djprobe work?
A: First, djprobe copies the instructions that will be overwritten by a jump
instruction into the middle of a stub code buffer. Next, it overwrites those
instructions with a jump instruction whose destination is the top of the
stub code buffer. At the top of the stub code buffer there is a call
instruction that calls a probe function, and at the bottom there is a jump
instruction whose destination is the instruction following the modified
instructions.
Kprobes, on the other hand, copies only the one instruction that will be
overwritten by the breakpoint instruction, and overwrites it with a
breakpoint instruction. When handling the breakpoint exception, it executes
the copied instruction with the trap flag set. When handling the following
trap exception, it corrects the IP(*) so that execution returns to the
kernel code.
So djprobe's work sequence is "jump", "probe", "execute copies" and
"jump", whereas kprobes' sequence is "break", "probe", "execute copies"
and "trap".

(*) Instruction Pointer
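To make the stub layout described above concrete, the sketch below builds a simplified stub in a byte buffer: a call to the probe function, then the copied DCR, then a jump back to IA + DCR size. This is an assumption-laden illustration, not the i386 stub from patch 3/3: it omits saving and restoring registers (which a real stub would need in order to pass a struct pt_regs to the handler), and build_stub() and put_rel32() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CALL_REL32_OPCODE 0xE8	/* i386 "call rel32" */
#define JMP_REL32_OPCODE  0xE9	/* i386 "jmp rel32" */
#define REL32_INSN_SIZE   5

/* Write a 5-byte call/jmp rel32 into buf, which will live at address `at`. */
static void put_rel32(uint8_t *buf, uint8_t opcode, uint32_t at, uint32_t to)
{
	uint32_t rel = to - (at + REL32_INSN_SIZE);	/* relative to next insn */
	int i;

	buf[0] = opcode;
	for (i = 0; i < 4; i++)
		buf[1 + i] = (uint8_t)(rel >> (8 * i));
}

/*
 * Hypothetical, simplified stub layout (register save/restore omitted):
 *   stub + 0            : call handler
 *   stub + 5            : copy of the DCR (dcr_size bytes)
 *   stub + 5 + dcr_size : jmp ia + dcr_size   (back to the original code)
 * Returns the number of bytes written to stub_buf.
 */
static size_t build_stub(uint8_t *stub_buf, uint32_t stub_addr,
			 const uint8_t *dcr, size_t dcr_size,
			 uint32_t ia, uint32_t handler)
{
	size_t off = 0;

	assert(dcr_size >= REL32_INSN_SIZE);
	put_rel32(stub_buf + off, CALL_REL32_OPCODE,
		  stub_addr + (uint32_t)off, handler);
	off += REL32_INSN_SIZE;
	memcpy(stub_buf + off, dcr, dcr_size);	/* must not be IP-relative */
	off += dcr_size;
	put_rel32(stub_buf + off, JMP_REL32_OPCODE,
		  stub_addr + (uint32_t)off, ia + (uint32_t)dcr_size);
	off += REL32_INSN_SIZE;
	return off;
}
```

Because the copied instructions execute at a different address, this layout only works if the DCR contains no IP-relative instructions, which is assumption ii) below.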

Q: Does the djprobe need to modify kernel source code?
A: No. Djprobe is a dynamic probe; it can be inserted into a running
kernel.

Q: Can djprobe work with CPU-hotplug?
A: Yes. Djprobe locks CPU hotplug during its critical section.

Q: Where can the djprobe be inserted in?
A: Djprobe can be inserted into almost any kernel code, including the entry
of most kernel functions. The insertion area must satisfy the assumptions
described below.

(On the i386 architecture)
        IA
         |
[-2][-1][0][1][2][3][4][5][6][7]
        [ins1][ins2][ ins3  ]
        [<-      DCR      ->]
           [<- JTPR ->]

ins1: 1st Instruction
ins2: 2nd Instruction
ins3: 3rd Instruction
IA: Insertion Address
DCR (Detoured Code Region): The region containing the instructions whose
first byte lies within 5 bytes (the size of the jump instruction) of the
insertion address. These instructions are copied into the middle of a stub
code buffer.
JTPR (Jump Target Prohibition Region): The bytes overwritten by the jump
instruction, except the first byte (IA+1 to IA+4).

Assumptions:
i) The insertion address points to the first byte of an instruction.
This avoids a bad instruction exception.
ii) There are no instructions in the DCR that refer to the IP (e.g. relative
jumps).
The EIP is different when the copied instruction is executed in the stub.
iii) There are no instructions in the DCR that cause a context switch (e.g.
a call to schedule()).
If a context switch occurs in the DCR, the address of the next instruction
(e.g. the address of "ins2") is stored on the call stack of the switched-out
thread. After that, djprobe overwrites that instruction with the jump
instruction. When the thread is switched back in, it resumes execution from
the stored address, which is now inside the jump instruction, so it will
cause a bad instruction exception.
iv) No jump or call instruction targets an address in the JTPR.
This also avoids a bad instruction exception.

Q: Can several djprobes be inserted in the same address?
A: Yes. Djprobes inserted at the same address are aggregated and share one
instance.
NOTE: When a new djprobe's insertion address is in another djprobe's JTPR
(described above), or another djprobe's insertion address is in the new
djprobe's JTPR, register_djprobe() fails to register the new djprobe and
returns -EEXIST.
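The registration decision can be sketched with a half-open range overlap test of the same form as the one __get_djprobe_instance() in patch 2/3 applies to each candidate instance. This sketch assumes the size of an existing instance is its DCR size (DJPI_ARCH_SIZE is defined in the i386 part, which is not shown here); struct probe_range and classify_new_probe() are hypothetical, and the kernel looks candidates up in a hash table rather than scanning an array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of an installed djprobe instance. */
struct probe_range {
	uintptr_t addr;	/* insertion address */
	size_t size;	/* assumed: DCR size of the instance */
};

/*
 * Decide what registering a new probe covering [addr, addr + size) would do:
 *   returns  1       -> same insertion address: aggregate into that instance
 *   returns  0       -> no overlap: create a new instance
 *   returns -EEXIST  -> overlaps an instance at a different address
 */
static int classify_new_probe(const struct probe_range *existing, size_t n,
			      uintptr_t addr, size_t size)
{
	size_t i;

	assert(size > 0);
	for (i = 0; i < n; i++) {
		const struct probe_range *p = &existing[i];

		/* Two half-open ranges overlap iff each starts before the other ends. */
		if (addr < p->addr + p->size && p->addr < addr + size)
			return p->addr == addr ? 1 : -EEXIST;
	}
	return 0;
}
```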

Q: Can djprobe be used with kprobes in same address?
A: No, djprobe currently cannot coexist with kprobes at the same address,
but we will support this feature as soon as possible.

Q: Should the jump instruction be kept within a page boundary to avoid
access violations and page faults?
A: No. x86 processors handle unaligned instructions correctly, and there
are many unaligned instructions in the kernel binary. In addition, kernel
code does not page-fault, because kernel code pages are always mapped in
the kernel page table.
So there is no need to care about page boundaries on the x86 architecture.

Q: How does djprobe solve the problems of self-/cross-modifying code?
On the Pentium series, unsynchronized cross-modifying code operations on
bytes other than the first byte of an instruction can cause unexpected
instruction execution results.
A: Djprobe uses a trick to solve these problems. It modifies the
instructions as follows:
1) Register a special handler as a kprobe handler. (kprobes writes a
breakpoint instruction to the first byte at the insertion address.)
2) Check safety (this is described in the next answer).
3) Write only the destination address part of the jump instruction into
the kernel code. (This operation is not synchronized.)
4) Execute "cpuid" on each processor for synchronization.
5) Write the first byte of the jump instruction. (This operation is
synchronized automatically.)

Q: How does djprobe guarantee that no threads and no processors are
executing the area being modified? An IP within that area may be stored on
the stacks of those threads.
A: The problem can arise for three reasons:
i) Problems caused by multiprocessor systems
Another processor may be executing the area that is overwritten by the jump
instruction. Djprobe must guarantee that no processor is executing those
instructions while it modifies them.
ii) Problems caused by interrupts
An interrupt may have occurred in the area that is going to be overwritten
by the jump instruction. Djprobe must guarantee that all interrupts that
occurred in the area have finished.
iii) Problems caused by the fully preemptible kernel
Problem (iii) is described in the next answer.

Djprobe uses the workqueue to solve problems (i) and (ii), as follows:
1) Copy the entire DCR (described above) into the middle of a stub code
buffer.
2) Register a special handler as a kprobe handler. This special handler
changes kprobe's resume point to the stub code buffer.
3) Clear the safety flags of all processors.
4) Queue a safety checking work on the workqueue of each processor, and
wait until all of those works have run.
5) When the keventd thread is scheduled on a processor, it executes the
work. At that point, this processor is not executing the area that will be
overwritten by the jump instruction, and it has finished handling all
interrupts, because in a voluntary-preemption or non-preemptive kernel no
context switch occurs while an interrupt is being handled.
6) Once all the works have run, djprobe writes the jump instruction.

Q: Can the djprobe work with kernel full preemption?
A: No, but you can still use the djprobe interface. When full kernel
preemption is enabled, we cannot ensure that no thread is executing the
modified area, because that address may be stored on the stack of a
preempted thread. In this case, the djprobe interface is emulated by using
kprobes.
The latest Linux kernel supports not only full preemption but also
voluntary preemption. With voluntary preemption, threads are scheduled only
from a limited set of locations, so it is easy to check that preemption
cannot occur in the modified area.


Attachments:
djprobe.txt (13.88 kB)

2005-10-31 11:07:48

by Masami Hiramatsu

Subject: [RFC][PATCH 1/3]Djprobe (Direct Jump Probe) for 2.6.14-rc5-mm1

Hi,

This patch enables get_insn_slot() to handle slots of different
sizes.
Djprobe requires this patch in order to work on machines that
support the "NX bit".

---
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: [email protected]

Signed-off-by: Masami Hiramatsu <[email protected]>

include/linux/kprobes.h | 5 ++++
kernel/kprobes.c | 58 +++++++++++++++++++++++++++++++-----------------
2 files changed, 43 insertions(+), 20 deletions(-)
diff -Narup linux-2.6.14-rc5-mm1/include/linux/kprobes.h linux-2.6.14-rc5-mm1.djp.1/include/linux/kprobes.h
--- linux-2.6.14-rc5-mm1/include/linux/kprobes.h 2005-10-25 11:29:02.000000000 +0900
+++ linux-2.6.14-rc5-mm1.djp.1/include/linux/kprobes.h 2005-10-25 13:11:26.000000000 +0900
@@ -147,6 +147,11 @@ struct kretprobe_instance {
struct task_struct *task;
};

+struct kprobe_insn_page_list {
+ struct hlist_head list;
+ int insn_size; /* size of an instruction slot */
+};
+
#ifdef CONFIG_KPROBES
extern spinlock_t kretprobe_lock;
extern int arch_prepare_kprobe(struct kprobe *p);
diff -Narup linux-2.6.14-rc5-mm1/kernel/kprobes.c linux-2.6.14-rc5-mm1.djp.1/kernel/kprobes.c
--- linux-2.6.14-rc5-mm1/kernel/kprobes.c 2005-10-25 11:29:02.000000000 +0900
+++ linux-2.6.14-rc5-mm1.djp.1/kernel/kprobes.c 2005-10-25 13:13:58.000000000 +0900
@@ -58,44 +58,50 @@ static DEFINE_PER_CPU(struct kprobe *, k
* stepping on the instruction on a vmalloced/kmalloced/data page
* is a recipe for disaster
*/
-#define INSNS_PER_PAGE (PAGE_SIZE/(MAX_INSN_SIZE * sizeof(kprobe_opcode_t)))
+#define INSNS_PER_PAGE(size) (PAGE_SIZE/((size) * sizeof(kprobe_opcode_t)))

struct kprobe_insn_page {
struct hlist_node hlist;
kprobe_opcode_t *insns; /* Page of instruction slots */
- char slot_used[INSNS_PER_PAGE];
int nused;
+ char slot_used[1];
};

-static struct hlist_head kprobe_insn_pages;
+static struct kprobe_insn_page_list kprobe_insn_pages = {
+ HLIST_HEAD_INIT, MAX_INSN_SIZE
+};

/**
- * get_insn_slot() - Find a slot on an executable page for an instruction.
+ * __get_insn_slot() - Find a slot on an executable page for an instruction.
* We allocate an executable page if there's no room on existing ones.
*/
-kprobe_opcode_t __kprobes *get_insn_slot(void)
+kprobe_opcode_t
+ __kprobes * __get_insn_slot(struct kprobe_insn_page_list *pages)
{
struct kprobe_insn_page *kip;
struct hlist_node *pos;
+ int ninsns = INSNS_PER_PAGE(pages->insn_size);

- hlist_for_each(pos, &kprobe_insn_pages) {
+ hlist_for_each(pos, &pages->list) {
kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
- if (kip->nused < INSNS_PER_PAGE) {
+ if (kip->nused < ninsns) {
int i;
- for (i = 0; i < INSNS_PER_PAGE; i++) {
+ for (i = 0; i < ninsns; i++) {
if (!kip->slot_used[i]) {
kip->slot_used[i] = 1;
kip->nused++;
- return kip->insns + (i * MAX_INSN_SIZE);
+ return kip->insns +
+ (i * pages->insn_size);
}
}
/* Surprise! No unused slots. Fix kip->nused. */
- kip->nused = INSNS_PER_PAGE;
+ kip->nused = ninsns;
}
}

- /* All out of space. Need to allocate a new page. Use slot 0.*/
- kip = kmalloc(sizeof(struct kprobe_insn_page), GFP_KERNEL);
+ /* All out of space. Need to allocate a new page. Use slot 0. */
+ kip = kmalloc(sizeof(struct kprobe_insn_page) +
+ sizeof(char) * (ninsns - 1), GFP_ATOMIC);
if (!kip) {
return NULL;
}
@@ -111,23 +117,25 @@ kprobe_opcode_t __kprobes *get_insn_slot
return NULL;
}
INIT_HLIST_NODE(&kip->hlist);
- hlist_add_head(&kip->hlist, &kprobe_insn_pages);
- memset(kip->slot_used, 0, INSNS_PER_PAGE);
+ hlist_add_head(&kip->hlist, &pages->list);
+ memset(kip->slot_used, 0, ninsns);
kip->slot_used[0] = 1;
kip->nused = 1;
return kip->insns;
}

-void __kprobes free_insn_slot(kprobe_opcode_t *slot)
+void __kprobes __free_insn_slot(struct kprobe_insn_page_list *pages,
+ kprobe_opcode_t * slot)
{
struct kprobe_insn_page *kip;
struct hlist_node *pos;
+ int ninsns = INSNS_PER_PAGE(pages->insn_size);

- hlist_for_each(pos, &kprobe_insn_pages) {
+ hlist_for_each(pos, &pages->list) {
kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
if (kip->insns <= slot &&
- slot < kip->insns + (INSNS_PER_PAGE * MAX_INSN_SIZE)) {
- int i = (slot - kip->insns) / MAX_INSN_SIZE;
+ slot < kip->insns + (ninsns * pages->insn_size)) {
+ int i = (slot - kip->insns) / pages->insn_size;
kip->slot_used[i] = 0;
kip->nused--;
if (kip->nused == 0) {
@@ -138,10 +146,10 @@ void __kprobes free_insn_slot(kprobe_opc
* next time somebody inserts a probe.
*/
hlist_del(&kip->hlist);
- if (hlist_empty(&kprobe_insn_pages)) {
+ if (hlist_empty(&pages->list)) {
INIT_HLIST_NODE(&kip->hlist);
hlist_add_head(&kip->hlist,
- &kprobe_insn_pages);
+ &pages->list);
} else {
module_free(NULL, kip->insns);
kfree(kip);
@@ -152,6 +160,16 @@ void __kprobes free_insn_slot(kprobe_opc
}
}

+kprobe_opcode_t __kprobes *get_insn_slot(void)
+{
+ return __get_insn_slot(&kprobe_insn_pages);
+}
+
+void __kprobes free_insn_slot(kprobe_opcode_t * slot)
+{
+ __free_insn_slot(&kprobe_insn_pages, slot);
+}
+
/* We have preemption disabled.. so it is safe to use __ versions */
static inline void set_kprobe_instance(struct kprobe *kp)
{

2005-10-31 11:08:54

by Masami Hiramatsu

Subject: [RFC][PATCH 2/3]Djprobe (Direct Jump Probe) for 2.6.14-rc5-mm1

Hi,

This patch is the architecture-independent part of djprobe.
Djprobe replaces kernel code (the target code) in order to insert
a jump instruction.
However, the target code may be running on other processors, so djprobe
must ensure that no other processor is executing the target code.
First, djprobe makes a bypass route from a copy of the target code,
and it inserts a kprobe at the top address of the target code. Thus
other processors can detour around the target code by using the bypass
route.
Next, djprobe runs a work on each of the other processors and waits until
all of those works have run. After that, it can be sure that no other
processor is executing the target code.

So it can safely replace the target code with a jump instruction.

---
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: [email protected]

Signed-off-by: Masami Hiramatsu <[email protected]>

include/linux/djprobe.h | 80 +++++++++++++++
include/linux/kprobes.h | 4
kernel/Makefile | 1
kernel/djprobe.c | 253 ++++++++++++++++++++++++++++++++++++++++++++++++
kernel/kprobes.c | 8 +
5 files changed, 345 insertions(+), 1 deletion(-)
diff -Narup linux-2.6.14-rc5-mm1.djp.1/include/linux/djprobe.h linux-2.6.14-rc5-mm1.djp.2/include/linux/djprobe.h
--- linux-2.6.14-rc5-mm1.djp.1/include/linux/djprobe.h 1970-01-01 09:00:00.000000000 +0900
+++ linux-2.6.14-rc5-mm1.djp.2/include/linux/djprobe.h 2005-10-26 15:52:22.000000000 +0900
@@ -0,0 +1,80 @@
+#ifndef _LINUX_DJPROBE_H
+#define _LINUX_DJPROBE_H
+/*
+ * Kernel Direct Jump Probe (Djprobe)
+ * include/linux/djprobe.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) Hitachi, Ltd. 2005
+ *
+ * 2005-Aug Created by Masami HIRAMATSU <[email protected]>
+ * Initial implementation of Direct jump probe (djprobe)
+ * to reduce overhead.
+ */
+#include <linux/config.h>
+#include <linux/list.h>
+#include <linux/smp.h>
+#include <linux/kprobes.h>
+#include <asm/djprobe.h>
+
+struct djprobe;
+/* djprobe's instance (internal use)*/
+struct djprobe_instance {
+ struct list_head plist; /* list of djprobes for multiprobe support */
+ struct arch_djprobe_stub stub;
+ struct kprobe kp;
+ struct hlist_node hlist; /* list of djprobe_instances */
+};
+#define DJPI_EMPTY(djpi) (list_empty(&djpi->plist))
+
+struct djprobe;
+typedef void (*djprobe_handler_t) (struct djprobe *, struct pt_regs *);
+/*
+ * Direct Jump probe interface structure
+ */
+struct djprobe {
+ /* list of djprobes */
+ struct list_head plist;
+
+ /* probing handler (pre-executed) */
+ djprobe_handler_t handler;
+
+ /* pointer for instance */
+ struct djprobe_instance *inst;
+};
+
+#ifdef CONFIG_DJPROBE
+extern int arch_prepare_djprobe_instance(struct djprobe_instance *djpi,
+ unsigned long size);
+extern int djprobe_pre_handler(struct kprobe *, struct pt_regs *);
+extern void djprobe_post_handler(struct kprobe *, struct pt_regs *,
+ unsigned long);
+extern void arch_install_djprobe_instance(struct djprobe_instance *djpi);
+extern void arch_uninstall_djprobe_instance(struct djprobe_instance *djpi);
+struct djprobe_instance *__kprobes get_djprobe_instance(void *addr, int size);
+
+int register_djprobe(struct djprobe *p, void *addr, int size);
+void unregister_djprobe(struct djprobe *p);
+#else /* CONFIG_DJPROBE */
+static inline int register_djprobe(struct djprobe *p, void *addr, int size)
+{
+ return -ENOSYS;
+}
+static inline void unregister_djprobe(struct djprobe *p)
+{
+}
+#endif /* CONFIG_DJPROBE */
+#endif /* _LINUX_DJPROBE_H */
diff -Narup linux-2.6.14-rc5-mm1.djp.1/include/linux/kprobes.h linux-2.6.14-rc5-mm1.djp.2/include/linux/kprobes.h
--- linux-2.6.14-rc5-mm1.djp.1/include/linux/kprobes.h 2005-10-25 13:11:26.000000000 +0900
+++ linux-2.6.14-rc5-mm1.djp.2/include/linux/kprobes.h 2005-10-25 13:32:59.000000000 +0900
@@ -163,10 +163,14 @@ extern int arch_init_kprobes(void);
extern void show_registers(struct pt_regs *regs);
extern kprobe_opcode_t *get_insn_slot(void);
extern void free_insn_slot(kprobe_opcode_t *slot);
+extern kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_page_list *pages);
+extern void __free_insn_slot(struct kprobe_insn_page_list *pages,
+ kprobe_opcode_t * slot);

/* Get the kprobe at this addr (if any) - called under a rcu_read_lock() */
struct kprobe *get_kprobe(void *addr);
struct hlist_head * kretprobe_inst_table_head(struct task_struct *tsk);
+int in_kprobes_functions(unsigned long addr);

/* kprobe_running() will just return the current_kprobe on this CPU */
static inline struct kprobe *kprobe_running(void)
diff -Narup linux-2.6.14-rc5-mm1.djp.1/kernel/djprobe.c linux-2.6.14-rc5-mm1.djp.2/kernel/djprobe.c
--- linux-2.6.14-rc5-mm1.djp.1/kernel/djprobe.c 1970-01-01 09:00:00.000000000 +0900
+++ linux-2.6.14-rc5-mm1.djp.2/kernel/djprobe.c 2005-10-27 11:59:10.000000000 +0900
@@ -0,0 +1,253 @@
+/*
+ * Kernel Direct Jump Probe (Djprobe)
+ * kernel/djprobe.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) Hitachi, Ltd. 2005
+ *
+ * 2005-Aug Created by Masami HIRAMATSU <[email protected]>
+ * Initial implementation of Direct jump probe (djprobe)
+ * to reduce overhead.
+ */
+#include <linux/djprobe.h>
+#include <linux/hash.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/moduleloader.h>
+#include <asm-generic/sections.h>
+#include <asm/cacheflush.h>
+#include <asm/errno.h>
+
+#include <linux/cpu.h>
+#include <linux/percpu.h>
+#include <asm/semaphore.h>
+
+/*
+ * Djprobe does not refer to the instance list when a probe function is called.
+ * This list is operated on registering and unregistering djprobe.
+ */
+#define DJPROBE_BLOCK_BITS 6
+#define DJPROBE_BLOCK_SIZE (1 << DJPROBE_BLOCK_BITS)
+#define DJPROBE_HASH_BITS 8
+#define DJPROBE_TABLE_SIZE (1 << DJPROBE_HASH_BITS)
+#define DJPROBE_TABLE_MASK (DJPROBE_TABLE_SIZE - 1)
+
+/* djprobe instance hash table */
+static struct hlist_head djprobe_inst_table[DJPROBE_TABLE_SIZE];
+
+#define hash_djprobe(key) \
+ (((unsigned long)(key) >> DJPROBE_BLOCK_BITS) & DJPROBE_TABLE_MASK)
+
+static DECLARE_MUTEX(djprobe_mutex);
+static DEFINE_PER_CPU(struct work_struct, djprobe_works);
+static DECLARE_WAIT_QUEUE_HEAD(djprobe_wqh);
+static atomic_t djprobe_count = ATOMIC_INIT(0);
+
+/* Instruction pages for djprobe's stub code */
+static struct kprobe_insn_page_list djprobe_insn_pages = {
+ HLIST_HEAD_INIT, 0
+};
+
+static inline void __free_djprobe_instance(struct djprobe_instance *djpi)
+{
+ hlist_del(&djpi->hlist);
+ if (djpi->kp.addr) {
+ unregister_kprobe(&(djpi->kp));
+ }
+ if (djpi->stub.insn)
+ __free_insn_slot(&djprobe_insn_pages, djpi->stub.insn);
+ kfree(djpi);
+}
+
+static inline
+ struct djprobe_instance *__create_djprobe_instance(struct djprobe *djp,
+ void *addr, int size)
+{
+ struct djprobe_instance *djpi;
+ /* allocate a new instance */
+ djpi = kcalloc(1, sizeof(struct djprobe_instance), GFP_ATOMIC);
+ if (djpi == NULL) {
+ goto out;
+ }
+ /* allocate stub */
+ djpi->stub.insn = __get_insn_slot(&djprobe_insn_pages);
+ if (djpi->stub.insn == NULL) {
+ __free_djprobe_instance(djpi);
+ djpi = NULL;
+ goto out;
+ }
+
+ /* attach */
+ djp->inst = djpi;
+ INIT_LIST_HEAD(&djpi->plist);
+ list_add_rcu(&djp->plist, &djpi->plist);
+ djpi->kp.addr = addr;
+ djpi->kp.pre_handler = djprobe_pre_handler;
+ djpi->kp.post_handler = djprobe_post_handler;
+ arch_prepare_djprobe_instance(djpi, size);
+
+ INIT_HLIST_NODE(&djpi->hlist);
+ hlist_add_head(&djpi->hlist, &djprobe_inst_table[hash_djprobe(addr)]);
+ out:
+ return djpi;
+}
+
+static struct djprobe_instance *__kprobes __get_djprobe_instance(void *addr,
+ int size)
+{
+ struct djprobe_instance *djpi;
+ struct hlist_node *node;
+ unsigned long idx, eidx;
+
+ idx = hash_djprobe(addr - ARCH_STUB_INSN_MAX);
+ eidx = ((hash_djprobe(addr + size) + 1) & DJPROBE_TABLE_MASK);
+ do {
+ hlist_for_each_entry(djpi, node, &djprobe_inst_table[idx],
+ hlist) {
+ if (((long)addr <
+ (long)djpi->kp.addr + DJPI_ARCH_SIZE(djpi))
+ && ((long)djpi->kp.addr < (long)addr + size)) {
+ return djpi;
+ }
+ }
+ idx = ((idx + 1) & DJPROBE_TABLE_MASK);
+ }while (idx != eidx);
+
+ return NULL;
+}
+
+struct djprobe_instance *__kprobes get_djprobe_instance(void *addr, int size)
+{
+ struct djprobe_instance *djpi;
+ down(&djprobe_mutex);
+ djpi = __get_djprobe_instance(addr, size);
+ up(&djprobe_mutex);
+ return djpi;
+}
+
+/* This work function invoked while djprobe_mutex is locked. */
+static void __kprobes __work_check_safety(void *data)
+{
+ if (atomic_dec_and_test(&djprobe_count)) {
+ wake_up_all(&djprobe_wqh);
+ }
+}
+
+static void __kprobes __check_safety(void)
+{
+ int cpu;
+ struct work_struct *wk;
+ lock_cpu_hotplug();
+ atomic_set(&djprobe_count, num_online_cpus() - 1);
+ for_each_online_cpu(cpu) {
+ if (cpu == smp_processor_id())
+ continue;
+ wk = &per_cpu(djprobe_works, cpu);
+ INIT_WORK(wk, __work_check_safety, NULL);
+ schedule_delayed_work_on(cpu, wk, 0);
+ }
+ wait_event(djprobe_wqh, (atomic_read(&djprobe_count) == 0));
+ unlock_cpu_hotplug();
+}
+
+int __kprobes register_djprobe(struct djprobe *djp, void *addr, int size)
+{
+ struct djprobe_instance *djpi;
+ struct kprobe *kp;
+ int ret = 0, i;
+
+ BUG_ON(in_interrupt());
+
+ if (size > ARCH_STUB_INSN_MAX || size < ARCH_STUB_INSN_MIN)
+ return -EINVAL;
+
+ if ((ret = in_kprobes_functions((unsigned long)addr)) != 0)
+ return ret;
+
+ down(&djprobe_mutex);
+ INIT_LIST_HEAD(&djp->plist);
+ /* check confliction with other djprobes */
+ djpi = __get_djprobe_instance(addr, size);
+ if (djpi) {
+ if (djpi->kp.addr == addr) {
+ djp->inst = djpi; /* add to another instance */
+ list_add_rcu(&djp->plist, &djpi->plist);
+ } else {
+ ret = -EEXIST; /* other djprobes were inserted */
+ }
+ goto out;
+ }
+ djpi = __create_djprobe_instance(djp, addr, size);
+ if (djpi == NULL) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ /* check confliction with kprobes */
+ for (i = 0; i < size; i++) {
+ kp = get_kprobe((void *)((long)addr + i));
+ if (kp != NULL) {
+ ret = -EEXIST; /* a kprobes were inserted */
+ goto fail;
+ }
+ }
+ ret = register_kprobe(&djpi->kp);
+ if (ret < 0) {
+ fail:
+ djpi->kp.addr = NULL;
+ djp->inst = NULL;
+ list_del_rcu(&djp->plist);
+ __free_djprobe_instance(djpi);
+ } else {
+ __check_safety();
+ arch_install_djprobe_instance(djpi);
+ }
+ out:
+ up(&djprobe_mutex);
+ return ret;
+}
+
+void __kprobes unregister_djprobe(struct djprobe *djp)
+{
+ struct djprobe_instance *djpi;
+
+ BUG_ON(in_interrupt());
+
+ down(&djprobe_mutex);
+ djpi = djp->inst;
+ if (djp->plist.next == djp->plist.prev) {
+ arch_uninstall_djprobe_instance(djpi); /* this requires irq enabled */
+ list_del_rcu(&djp->plist);
+ djp->inst = NULL;
+ __check_safety();
+ __free_djprobe_instance(djpi);
+ } else {
+ list_del_rcu(&djp->plist);
+ djp->inst = NULL;
+ }
+ up(&djprobe_mutex);
+}
+
+static int __init init_djprobe(void)
+{
+ djprobe_insn_pages.insn_size = ARCH_STUB_SIZE;
+ return 0;
+}
+
+__initcall(init_djprobe);
+
+EXPORT_SYMBOL_GPL(register_djprobe);
+EXPORT_SYMBOL_GPL(unregister_djprobe);
diff -Narup linux-2.6.14-rc5-mm1.djp.1/kernel/kprobes.c linux-2.6.14-rc5-mm1.djp.2/kernel/kprobes.c
--- linux-2.6.14-rc5-mm1.djp.1/kernel/kprobes.c 2005-10-25 13:13:58.000000000 +0900
+++ linux-2.6.14-rc5-mm1.djp.2/kernel/kprobes.c 2005-10-26 15:53:05.000000000 +0900
@@ -37,6 +37,7 @@
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/moduleloader.h>
+#include <linux/djprobe.h>
#include <asm-generic/sections.h>
#include <asm/cacheflush.h>
#include <asm/errno.h>
@@ -467,7 +468,7 @@ static inline void cleanup_aggr_kprobe(s
spin_unlock_irqrestore(&kprobe_lock, flags);
}

-static int __kprobes in_kprobes_functions(unsigned long addr)
+int __kprobes in_kprobes_functions(unsigned long addr)
{
if (addr >= (unsigned long)__kprobes_text_start
&& addr < (unsigned long)__kprobes_text_end)
@@ -483,6 +484,11 @@ int __kprobes register_kprobe(struct kpr

if ((ret = in_kprobes_functions((unsigned long) p->addr)) != 0)
return ret;
+#ifdef CONFIG_DJPROBE
+ if (p->pre_handler != djprobe_pre_handler &&
+ get_djprobe_instance(p->addr, 1) != NULL)
+ return -EEXIST;
+#endif /* CONFIG_DJPROBE */
if ((ret = arch_prepare_kprobe(p)) != 0)
goto rm_kprobe;

diff -Narup linux-2.6.14-rc5-mm1.djp.1/kernel/Makefile linux-2.6.14-rc5-mm1.djp.2/kernel/Makefile
--- linux-2.6.14-rc5-mm1.djp.1/kernel/Makefile 2005-10-25 11:29:02.000000000 +0900
+++ linux-2.6.14-rc5-mm1.djp.2/kernel/Makefile 2005-10-25 13:22:27.000000000 +0900
@@ -27,6 +27,7 @@ obj-$(CONFIG_STOP_MACHINE) += stop_machi
obj-$(CONFIG_AUDIT) += audit.o
obj-$(CONFIG_AUDITSYSCALL) += auditsc.o
obj-$(CONFIG_KPROBES) += kprobes.o
+obj-$(CONFIG_DJPROBE) += djprobe.o
obj-$(CONFIG_SYSFS) += ksysfs.o
obj-$(CONFIG_DETECT_SOFTLOCKUP) += softlockup.o
obj-$(CONFIG_GENERIC_HARDIRQS) += irq/

2005-10-31 11:10:55

by Masami Hiramatsu

Subject: [RFC][PATCH 3/3]Djprobe (Direct Jump Probe) for 2.6.14-rc5-mm1

Hi,

This patch contains the i386 architecture-dependent code of djprobe.
I heard that we need to synchronize each processor when we execute
self-modifying code on i386.
So this patch serializes every processor by executing CPUID on each of
them via smp_call_function.

---
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: [email protected]

Signed-off-by: Masami Hiramatsu <[email protected]>

arch/i386/Kconfig | 8 +
arch/i386/kernel/Makefile | 1
arch/i386/kernel/djprobe.c | 172 ++++++++++++++++++++++++++++++++++++++++
arch/i386/kernel/stub_djprobe.S | 77 +++++++++++++++++
include/asm-i386/djprobe.h | 56 +++++++++++++
5 files changed, 314 insertions(+)
diff -Narup linux-2.6.14-rc5-mm1.djp.2/arch/i386/Kconfig linux-2.6.14-rc5-mm1.djp.3/arch/i386/Kconfig
--- linux-2.6.14-rc5-mm1.djp.2/arch/i386/Kconfig 2005-10-25 11:28:49.000000000 +0900
+++ linux-2.6.14-rc5-mm1.djp.3/arch/i386/Kconfig 2005-10-27 11:26:55.000000000 +0900
@@ -1317,6 +1317,14 @@ config KPROBES
a probepoint and specifies the callback. Kprobes is useful
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".
+
+config DJPROBE
+ bool "Direct Jump probe"
+ depends on KPROBES && !PREEMPT
+ help
+ Djprobe allows you to dynamically hook any kernel function
+ entry point and collect debugging or performance analysis
+ information non-disruptively.
endmenu

source "arch/i386/Kconfig.debug"
diff -Narup linux-2.6.14-rc5-mm1.djp.2/arch/i386/kernel/Makefile linux-2.6.14-rc5-mm1.djp.3/arch/i386/kernel/Makefile
--- linux-2.6.14-rc5-mm1.djp.2/arch/i386/kernel/Makefile 2005-10-25 11:28:49.000000000 +0900
+++ linux-2.6.14-rc5-mm1.djp.3/arch/i386/kernel/Makefile 2005-10-25 14:39:12.000000000 +0900
@@ -29,6 +29,7 @@ obj-$(CONFIG_KEXEC) += machine_kexec.o
obj-$(CONFIG_X86_NUMAQ) += numaq.o
obj-$(CONFIG_X86_SUMMIT_NUMA) += summit.o
obj-$(CONFIG_KPROBES) += kprobes.o
+obj-$(CONFIG_DJPROBE) += stub_djprobe.o djprobe.o
obj-$(CONFIG_MODULES) += module.o
obj-y += sysenter.o vsyscall.o
obj-$(CONFIG_ACPI_SRAT) += srat.o
diff -Narup linux-2.6.14-rc5-mm1.djp.2/arch/i386/kernel/djprobe.c linux-2.6.14-rc5-mm1.djp.3/arch/i386/kernel/djprobe.c
--- linux-2.6.14-rc5-mm1.djp.2/arch/i386/kernel/djprobe.c 1970-01-01 09:00:00.000000000 +0900
+++ linux-2.6.14-rc5-mm1.djp.3/arch/i386/kernel/djprobe.c 2005-10-28 17:52:29.000000000 +0900
@@ -0,0 +1,172 @@
+/*
+ * Kernel Direct Jump Probe (Djprobes)
+ * arch/i386/kernel/djprobe.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) Hitachi, Ltd. 2005
+ *
+ * 2005-Aug Created by Masami HIRAMATSU <[email protected]>
+ * Initial implementation of Direct jump probe (djprobe)
+ * to reduce overhead.
+ */
+
+#include <linux/config.h>
+#include <linux/djprobe.h>
+#include <linux/ptrace.h>
+#include <linux/spinlock.h>
+#include <linux/preempt.h>
+#include <asm/cacheflush.h>
+#include <asm/kdebug.h>
+#include <asm/desc.h>
+#include <asm/processor.h>
+
+
+/*
+ * When kernel full preemption is enabled, we can't ensure that no threads
+ * are executing the modified code. It may be stored in the stack of the
+ * threads. In this case, the djprobe interfaces are emulated by using
+ * kprobe.
+ * When kernel full preemption is disabled, threads are scheduled
+ * from only limited addresses. So it is easy to check whether the
+ * preemption can occur in the modified code.
+ */
+
+/*
+ * On the Pentium series, unsynchronized cross-modifying code
+ * operations can cause unexpected instruction execution results.
+ * So after modifying code, we must serialize each processor.
+ */
+static void __local_serialize_cpu(void * info)
+{
+ serialize_cpu();
+}
+
+static inline void smp_serialize_cpus(void)
+{
+ on_each_cpu(__local_serialize_cpu, NULL, 1,1);
+}
+
+/* jmp code manipulators */
+struct __arch_jmp_op {
+ char op;
+ long raddr;
+} __attribute__((packed));
+/* insert jmp code */
+static inline void __set_jmp_op(void *from, void *to, int sync)
+{
+ struct __arch_jmp_op *jop;
+ jop = (struct __arch_jmp_op *)from;
+ jop->raddr=(long)(to) - ((long)(from) + 5);
+ mb();
+ if (sync) smp_serialize_cpus();
+ jop->op = RELATIVEJUMP_INSTRUCTION;
+}
+/* switch back to the kprobe */
+static inline void __set_breakpoint_op(void *dest, void *orig)
+{
+ struct __arch_jmp_op *jop = (struct __arch_jmp_op *)dest,
+ *jop2 = (struct __arch_jmp_op *)orig;
+
+ jop->op = BREAKPOINT_INSTRUCTION;
+ jop->raddr = jop2->raddr;
+ mb();
+ smp_serialize_cpus();
+}
+
+/* djprobe call back function: called from stub code */
+static void asmlinkage djprobe_callback(struct djprobe_instance * djpi,
+ struct pt_regs *regs)
+{
+ struct djprobe *djp;
+ rcu_read_lock();
+ list_for_each_entry_rcu(djp, &djpi->plist, plist) {
+ if (djp->handler)
+ djp->handler(djp, regs);
+ }
+ rcu_read_unlock();
+}
+
+/*
+ * Copy post processing instructions
+ * Target instructions MUST be relocatable.
+ */
+int __kprobes arch_prepare_djprobe_instance(struct djprobe_instance *djpi,
+ unsigned long size)
+{
+ kprobe_opcode_t *stub;
+ stub = djpi->stub.insn;
+ djpi->stub.size = size;
+
+ /* copy arch-dep-instance from template */
+ memcpy((void*)stub, (void*)&arch_tmpl_stub_entry, ARCH_STUB_SIZE);
+
+ /* set probe information */
+ *((long*)(stub + ARCH_STUB_VAL_IDX)) = (long)djpi;
+ /* set probe function */
+ *((long*)(stub + ARCH_STUB_CALL_IDX)) = (long)djprobe_callback;
+
+ /* copy instructions into the middle of the djprobe instance */
+ memcpy((void*)(stub + ARCH_STUB_INST_IDX),
+ (void*)djpi->kp.addr, size);
+
+ /* set the return jmp instruction at the tail of the djprobe instance */
+ __set_jmp_op(stub + ARCH_STUB_INST_IDX + size,
+ (void*)((long)djpi->kp.addr + size), 0);
+
+ return 0;
+}
+
+/* Insert "jmp" instruction into the probing point. */
+void __kprobes arch_install_djprobe_instance(struct djprobe_instance *djpi)
+{
+ __set_jmp_op((void*)djpi->kp.addr, (void*)djpi->stub.insn, 1);
+}
+
+/* Write back original instructions & kprobe */
+void __kprobes arch_uninstall_djprobe_instance(struct djprobe_instance *djpi)
+{
+ kprobe_opcode_t *stub;
+ stub = &djpi->stub.insn[ARCH_STUB_INST_IDX];
+ __set_breakpoint_op((void*)djpi->kp.addr, (void*)stub);
+}
+
+static DEFINE_SPINLOCK(djprobe_handler_lock);
+
+/* djprobe handler : switch to a bypass code */
+int __kprobes djprobe_pre_handler(struct kprobe * kp, struct pt_regs * regs)
+{
+ struct djprobe_instance *djpi =
+ container_of(kp,struct djprobe_instance, kp);
+ kprobe_opcode_t *stub = djpi->stub.insn;
+
+ spin_lock(&djprobe_handler_lock);
+ if (DJPI_EMPTY(djpi)) {
+ kp->ainsn.insn[0] = kp->opcode;
+ return 0;
+ } else {
+ regs->eip = (unsigned long)stub;
+ regs->eflags |= TF_MASK;
+ regs->eflags &= ~IF_MASK;
+ kp->ainsn.insn[0] = RETURN_INSTRUCTION;
+ return 1; /* already prepared */
+ }
+}
+
+void __kprobes djprobe_post_handler(struct kprobe * kp, struct pt_regs * regs,
+ unsigned long flags)
+{
+ spin_unlock(&djprobe_handler_lock);
+}
diff -Narup linux-2.6.14-rc5-mm1.djp.2/arch/i386/kernel/stub_djprobe.S linux-2.6.14-rc5-mm1.djp.3/arch/i386/kernel/stub_djprobe.S
--- linux-2.6.14-rc5-mm1.djp.2/arch/i386/kernel/stub_djprobe.S 1970-01-01 09:00:00.000000000 +0900
+++ linux-2.6.14-rc5-mm1.djp.3/arch/i386/kernel/stub_djprobe.S 2005-10-25 14:39:12.000000000 +0900
@@ -0,0 +1,77 @@
+/*
+ * linux/arch/i386/kernel/stub_djprobe.S
+ *
+ * Copyright (C) HITACHI,LTD. 2005
+ * Created by Masami Hiramatsu <[email protected]>
+ */
+
+#include <linux/config.h>
+
+# jmp into this function from other functions.
+.global arch_tmpl_stub_entry
+arch_tmpl_stub_entry:
+ nop
+ subl $8, %esp #skip xss and esp (esp is filled in below).
+ pushf
+ subl $20, %esp #skip xcs, eip, orig_eax, xes and xds.
+ pushl %eax
+ pushl %ebp
+ pushl %edi
+ pushl %esi
+ pushl %edx
+ pushl %ecx
+ pushl %ebx
+
+ movl %esp, %eax
+ pushl %eax
+ addl $60, %eax
+ movl %eax, 56(%esp)
+.global arch_tmpl_stub_val
+arch_tmpl_stub_val:
+ movl $0xffffffff, %eax
+ pushl %eax
+.global arch_tmpl_stub_call
+arch_tmpl_stub_call:
+ movl $0xffffffff, %eax
+ call *%eax
+ addl $8, %esp
+
+ popl %ebx
+ popl %ecx
+ popl %edx
+ popl %esi
+ popl %edi
+ popl %ebp
+ popl %eax
+ addl $20, %esp
+ popf
+ addl $8, %esp
+.global arch_tmpl_stub_inst
+arch_tmpl_stub_inst:
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+ nop
+.global arch_tmpl_stub_end
+arch_tmpl_stub_end:
diff -Narup linux-2.6.14-rc5-mm1.djp.2/include/asm-i386/djprobe.h linux-2.6.14-rc5-mm1.djp.3/include/asm-i386/djprobe.h
--- linux-2.6.14-rc5-mm1.djp.2/include/asm-i386/djprobe.h 1970-01-01 09:00:00.000000000 +0900
+++ linux-2.6.14-rc5-mm1.djp.3/include/asm-i386/djprobe.h 2005-10-25 14:39:12.000000000 +0900
@@ -0,0 +1,56 @@
+#ifndef _ASM_DJPROBE_H
+#define _ASM_DJPROBE_H
+/*
+ * Kernel Direct Jump Probe (Djprobe)
+ * include/asm-i386/djprobe.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) Hitachi, Ltd. 2005
+ *
+ * 2005-Aug Created by Masami HIRAMATSU <[email protected]>
+ * Initial implementation of Direct jump probe (djprobe)
+ * to reduce overhead.
+ */
+
+#define RELATIVEJUMP_INSTRUCTION 0xe9
+#define RETURN_INSTRUCTION 0xc3
+
+#ifndef CONFIG_PREEMPT
+#define ARCH_SUPPORTS_DJPROBES
+#endif /* CONFIG_PREEMPT */
+
+/* stub template code */
+extern kprobe_opcode_t arch_tmpl_stub_entry;
+extern kprobe_opcode_t arch_tmpl_stub_val;
+extern kprobe_opcode_t arch_tmpl_stub_call;
+extern kprobe_opcode_t arch_tmpl_stub_inst;
+extern kprobe_opcode_t arch_tmpl_stub_end;
+
+#define ARCH_STUB_VAL_IDX ((long)&arch_tmpl_stub_val - (long)&arch_tmpl_stub_entry + 1)
+#define ARCH_STUB_CALL_IDX ((long)&arch_tmpl_stub_call - (long)&arch_tmpl_stub_entry + 1)
+#define ARCH_STUB_INST_IDX ((long)&arch_tmpl_stub_inst - (long)&arch_tmpl_stub_entry)
+#define ARCH_STUB_SIZE ((long)&arch_tmpl_stub_end - (long)&arch_tmpl_stub_entry)
+
+#define ARCH_STUB_INSN_MAX 20
+#define ARCH_STUB_INSN_MIN 5
+
+struct arch_djprobe_stub {
+ kprobe_opcode_t *insn;
+ int size;
+};
+#define DJPI_ARCH_SIZE(djpi) (djpi->stub.size)
+
+#endif /* _ASM_DJPROBE_H */


2005-12-12 11:15:52

by Hiro Yoshioka

Subject: Re: [RFC][PATCH 0/3]Djprobe (Direct Jump Probe) for 2.6.14-rc5-mm1

Hi,

My HTML-formatted mail was rejected, so I am resending it.

I was lucky enough to attend Hiramatsu-san's presentation at a
kernel reading party held by YLUG (Yokohama Linux Users Group):
http://ylug.jp/

It is a really cool idea and I like it :-)

Regards,
Hiro

On 10/31/05, Masami Hiramatsu <[email protected]> wrote:
> Hello,
>
> I would like to propose djprobe (Direct Jump Probe) for low-overhead
> probing.
> Djprobe is useful for performance analysis and for kernel
> flight recording, which constantly traces events in the kernel.
> For these uses, the impact on performance must be kept as small as
> possible.
> Djprobe is a kind of in-kernel probe, like kprobes.
> It has the following features:
> - Jump-instruction-based probing, which is very fast.
> - No interruption (no breakpoint trap is taken).
> - Safe code insertion on SMP.
> - Lockless probing once registered.
> I have attached detailed documentation of djprobe to this mail. If you
> need more information, please see it.
>
> Djprobe is NOT a replacement for kprobes. Djprobe and kprobes
> have complementary qualities (e.g., djprobe's overhead is low, while
> kprobes can be inserted anywhere).
> You can use both kprobes and djprobe as the situation demands.
>
> I measured the overhead of djprobe on a Pentium 4 3.06GHz PC
> using gtodbench (*). The result was about 100ns. From a
> performance standpoint, I think djprobe is the best probing method.
> What do you think about this?
>
> (*) gtodbench is a microbenchmark included in the published
> djprobe source package. You can download it from LKST's web site:
> http://prdownloads.sourceforge.net/lkst/djprobe-20050713.tar.bz2
>
> The following three patches introduce djprobe (Direct Jump Probe)
> to linux-2.6.14-rc5-mm1.
> patch 1: Introduce an instruction slot management structure to
> handle different size slots. (a patch for kprobes)
> patch 2: Djprobe core (arch-independent) patch.
> patch 3: Djprobe i386 (arch-dependent) patch.
>
> Please try to use djprobe.
>
> Any comments or suggestions are welcome.
>
> Best regards,
>
> --
> Masami HIRAMATSU
> 2nd Research Dept.
> Hitachi, Ltd., Systems Development Laboratory
> E-mail: [email protected]
>
>
> Djprobe Documentation
> authors: Satoshi Oshima ([email protected])
> Masami Hiramatsu ([email protected])
>
> INDEX
>
> 1. Djprobe concepts
> 2. How djprobe works
> 3. Further Considerations
> 4. Djprobe Features
> 5. Architectures Supported
> 6. Configuring Djprobe
> 7. API Reference
> 8. TODO
> 9. FAQ
>
> 1. Djprobe concepts
>
> The basic idea of Djprobe is to dynamically hook any kernel function
> entry point and collect debugging or performance analysis information
> non-disruptively. The functionality of djprobe is very similar to Kprobe
> or Jprobe. What distinguishes djprobe is that it uses a jump instruction
> instead of a breakpoint instruction. This reduces the overhead of each
> probe.
>
> Developers can trap at almost any kernel function entry point, specifying
> a handler routine to be invoked when the jump instruction is executed.
>
>
> 2. How Djprobe works
>
> A breakpoint instruction is easily inserted on most architectures.
> For example, the binary size of the breakpoint instruction on the i386
> and x86_64 architectures is 1 byte. A 1-byte replacement takes place in
> a single step, and replacement with a breakpoint instruction is
> guaranteed to be SMP safe.
>
> On the other hand, a jump instruction is not easily inserted. The binary
> size of a jump instruction on i386 is 5 bytes, and a 5-byte replacement
> cannot be executed in a single step. Beyond that, dynamic code
> modification has some complicated restrictions.
>
> To explain the djprobe mechanism, we introduce some terminology.
> Imagine a code sequence consisting of a 2-byte instruction, a
> 2-byte instruction and a 3-byte instruction.
>
>         IA
>         |
> [-2][-1][0][1][2][3][4][5][6][7]
>         [ins1][ins2][  ins3 ]
>         [<-     DCR       ->]
>            [<- JTPR ->]
>
> ins1: 1st Instruction
> ins2: 2nd Instruction
> ins3: 3rd Instruction
> IA: Insertion Address
> JTPR: Jump Target Prohibition Region
> DCR: Detoured Code Region
>
>
> The replacement procedure of djprobe consists of 6 steps:
>
> (1) copying instruction(s) in DCR
> (2) putting a breakpoint instruction at IA
> (3) scheduling work on each CPU
> (4) executing the CPU safety check in each work
> (5) replacing the original instruction(s) with the jump instruction,
> except its first byte, and serializing the code
> (6) replacing the breakpoint instruction with the first byte of the jump
> instruction
>
> Further explanation is given below.
>
> (1) copying instruction(s) in DCR
>
> Djprobe copies the instruction(s) to be replaced into a region that
> djprobe allocates. The replaced instructions must include the instruction
> containing the byte at IA+4. Therefore the size of DCR must be 5 bytes or
> more. The size of DCR must be given by the djprobe user.
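A minimal user-space sketch of the size rule above, assuming the caller already knows the length of each instruction starting at IA (djprobe itself does not decode instructions; the user passes the size). The helper name dcr_size() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the i386 relative jump: opcode 0xE9 plus a 32-bit displacement. */
#define JMP_SIZE 5

/*
 * Hypothetical helper: given the lengths of the instructions that start at
 * the insertion address (IA), return the smallest DCR size that contains
 * every instruction whose first byte lies in [IA, IA + JMP_SIZE).
 * Returns -1 if the supplied instructions do not cover 5 bytes.
 */
int dcr_size(const int *insn_len, size_t n)
{
	int size = 0;
	size_t i;

	/* keep adding whole instructions until the jump fits */
	for (i = 0; i < n && size < JMP_SIZE; i++) {
		assert(insn_len[i] > 0);
		size += insn_len[i];
	}
	return size >= JMP_SIZE ? size : -1;
}
```

For the 2/2/3-byte layout in the diagram, dcr_size() returns 7, matching the DCR span. The i386 header in patch 3/3 also defines ARCH_STUB_INSN_MIN (5) and ARCH_STUB_INSN_MAX (20), which presumably bound the size a user may pass.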
>
> (2) putting a breakpoint instruction at IA
>
> Djprobe puts a breakpoint instruction at the insertion address (IA).
> After this replacement, djprobe acts like kprobes.
>
> (3) scheduling work on each CPU
>
> Djprobe schedules work that executes the CPU safety check on each CPU,
> and waits until all of that work has finished.
>
> (4) executing the CPU safety check in each work
>
> The current djprobe assumes that a context switch does NOT occur in the
> extension of an interrupt, which means that every interrupt handler
> returns before a context switch is executed.
>
> Therefore, the execution of the scheduled work itself proves that no
> interrupt stack (and no process stack) contains any address in JTPR.
>
> The last CPU to execute the safety check work wakes up the waiting
> process.
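The "last CPU wakes the waiter" rule in step (4) amounts to a shared countdown. The sketch below models it with C11 atomics in user space; the struct and function names are hypothetical, and the kernel-side details (how __check_safety() waits and is woken) are not shown in these patches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model: the registering task sets 'pending' to the number
 * of online CPUs, and each CPU's safety-check work calls
 * safety_check_done() exactly once.
 */
struct safety_check {
	atomic_int pending;
};

void safety_check_init(struct safety_check *sc, int ncpus)
{
	assert(ncpus > 0);
	atomic_init(&sc->pending, ncpus);
}

/*
 * Returns true only for the caller that completes the last check; in a
 * kernel implementation this is where the waiting process would be woken.
 */
bool safety_check_done(struct safety_check *sc)
{
	/* atomic_fetch_sub() returns the value before the decrement */
	return atomic_fetch_sub(&sc->pending, 1) == 1;
}
```

Because the decrement is atomic, exactly one CPU observes the transition from 1 to 0, no matter in which order the works run.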
>
> (5) replacing the original instruction(s) with the jump instruction,
> except its first byte, and serializing the code
>
> After all the safety check works have run, djprobe can safely replace
> the code in JTPR, because no CPU is executing JTPR any more. Even if a
> CPU tries to execute the instructions at the top of DCR again, it is
> interrupted by kprobe and led to execute the copied instructions, so no
> CPU touches the instructions in JTPR.
>
> Djprobe replaces the bytes from IA+1 to IA+4 with the jump instruction
> minus its first byte, and then serializes the replaced code on every CPU.
>
> (6) replacing the breakpoint instruction with the first byte of the jump
> instruction
>
> Djprobe replaces the breakpoint instruction it put at IA with the first
> byte of the jump instruction.
>
>
> 3. Further considerations
>
> There are many difficulties in implementing djprobe. In this section,
> we discuss the restrictions of djprobe in order to understand these
> difficulties.
>
> 3.1 Confirming the safety of DCR (dynamic analysis)
>
> Djprobe tries to replace code that includes one or more instructions.
> This replacement usually changes the boundaries of instructions.
> Therefore djprobe must ensure that no other CPU is executing DCR and that
> no stack contains an address in JTPR.
>
> 3.2 Confirming the safety of DCR (static analysis)
>
> Djprobe must also ensure that JTPR is not targeted by any jump or call
> instruction. In general, this is extremely difficult to guarantee.
> However, some points, such as function entry points, can be expected not
> to be the target of any jump or call instruction, because a function
> entry point has a fixed form dictated by the calling convention.
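Both 3.1 and 3.2 come down to the same range test: no saved instruction pointer and no branch target may fall inside JTPR, i.e. [IA+1, IA+5) on i386. The following hypothetical helper only expresses that property; djprobe does not scan stacks or branch targets itself, it relies on the scheduling argument in section 2 and on the choice of insertion point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_SIZE 5

/* JTPR = bytes overwritten by the jump except the first: [IA+1, IA+5). */
static bool in_jtpr(uintptr_t addr, uintptr_t ia)
{
	return addr > ia && addr < ia + JMP_SIZE;
}

/*
 * Hypothetical helper: return true if none of the given addresses
 * (saved instruction pointers found on stacks, or known jump/call
 * targets) falls inside the JTPR of a probe at 'ia'.
 */
bool jtpr_is_clear(uintptr_t ia, const uintptr_t *addrs, size_t n)
{
	size_t i;

	assert(addrs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (in_jtpr(addrs[i], ia))
			return false;
	return true;
}
```

IA itself is deliberately allowed: a jump to IA simply lands on the new jmp instruction.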
>
> 4. Djprobe Features
>
> - Djprobe can probe entries of almost all functions without any interruption.
>
> 5. Architectures Supported
>
> - i386
>
>
> 6. Configuring Djprobe
> When configuring the kernel using make menuconfig/xconfig/oldconfig, ensure
> that CONFIG_DJPROBE is set to "y". Under "Instrumentation Support",
> look for "Direct Jump probe". You must enable "Kprobes" and *DISABLE*
> "Preemptible Kernel", because CONFIG_DJPROBE depends on KPROBES && !PREEMPT.
>
> 7. API Reference
> The Djprobe API includes "register_djprobe" function and
> "unregister_djprobe" function. Here are specifications for these functions
> and the associated probe handlers.
>
> 7.1 register_djprobe
>
> #include <linux/djprobe.h>
> int register_djprobe(struct djprobe *djp, void *addr, int size);
>
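> Inserts a jump instruction at the address addr. size is the size of DCR
> in bytes (5 or more on i386; see section 2). When the jump is hit,
> Djprobe calls djp->handler.
>
> register_djprobe() returns 0 on success, or a negative errno otherwise
> (for example, -EEXIST if the region conflicts with another probe, or
> -ENOMEM).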
>
> User's probe handler (djp->handler):
> #include <linux/djprobe.h>
> #include <linux/ptrace.h>
> void handler(struct djprobe *djp, struct pt_regs *regs);
>
> Called with djp pointing to the djprobe associated with the probe point,
> and regs pointing to the struct containing the registers saved when
> the probe point was hit.
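When several djprobes share one instance, the stub calls djprobe_callback() in patch 3/3, which walks the instance's RCU list and calls each non-NULL handler. The sketch below imitates that loop in user space with simplified, hypothetical stand-in structures (a plain singly linked list instead of RCU, and an opaque struct pt_regs), together with an example handler that counts hits.

```c
#include <assert.h>
#include <stddef.h>

struct pt_regs;	/* opaque here; the real type is architecture-specific */

/* Hypothetical stand-in for struct djprobe (not the real layout). */
struct sim_djprobe {
	void (*handler)(struct sim_djprobe *djp, struct pt_regs *regs);
	struct sim_djprobe *next;	/* singly linked instead of an RCU list */
	int hits;			/* hypothetical per-probe counter */
};

/* Hypothetical stand-in for struct djprobe_instance. */
struct sim_instance {
	struct sim_djprobe *probes;	/* all probes sharing this address */
};

/* Mirrors the loop in djprobe_callback(): call every registered handler. */
void sim_dispatch(struct sim_instance *djpi, struct pt_regs *regs)
{
	struct sim_djprobe *djp;

	assert(djpi != NULL);
	for (djp = djpi->probes; djp != NULL; djp = djp->next)
		if (djp->handler)
			djp->handler(djp, regs);
}

/* Example handler: count how many times the probe point was hit. */
void count_hit(struct sim_djprobe *djp, struct pt_regs *regs)
{
	(void)regs;
	djp->hits++;
}
```

A real handler receives the registers saved by the stub, so it can inspect arguments in regs; this sketch ignores them.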
>
> 7.2 unregister_djprobe
>
> #include <linux/djprobe.h>
> void unregister_djprobe(struct djprobe *djp);
>
> Removes the specified probe. The unregister function can be called
> at any time after the probe has been registered, but not from interrupt
> context.
>
>
> 8. TODO
>
> (1) support an architecture-transparent interface
> (djprobe interface emulated by kprobes)
> (2) bulk registration interface support
> (3) kprobe interoperability (coexistence at the same address)
> (4) support for other architectures
>
> 9. FAQ
> Direct Jump Probe Q&A
>
> Q: What is the Direct Jump Probe (Djprobe)?
> A: Djprobe is a low-overhead probing method for the Linux kernel.
>
> Q: How is it different from kprobes?
> A: The main difference is that djprobe uses a jump instruction
> instead of a breakpoint instruction. This can reduce the overhead of
> probing, especially when the probes are executed frequently.
>
> Q: How does djprobe work?
> A: First, djprobe copies the instructions that will be overwritten by a
> jump instruction into the middle of a stub code buffer. Next, it
> overwrites those instructions with a jump instruction whose destination
> is the top of the stub code buffer. At the top of the stub code buffer
> there is a call instruction which calls a probe function, and at the
> bottom of the stub code buffer there is a jump instruction whose
> destination is the instruction following the modified instructions.
> Kprobes, on the other hand, copies only the one instruction that will be
> overwritten by the breakpoint instruction, and overwrites it with the
> breakpoint instruction. While handling the breakpoint exception, it
> executes the copied instruction with the trap flag set. While handling
> the resulting trap exception, it corrects the IP(*) to return to the
> kernel code.
> So djprobe's work sequence is "jump", "probe", "execute copies" and
> "jump", whereas kprobes' sequence is "break", "probe", "execute copies",
> and "trap".
>
> (*)Instruction Pointer
>
> Q: Does djprobe require modifying the kernel source code?
> A: No. Djprobe is a dynamic probe; it can be inserted into a running
> kernel.
>
> Q: Can djprobe work with CPU hotplug?
> A: Yes, djprobe locks CPU hotplug in the critical section.
>
> Q: Where can djprobe be inserted?
> A: Djprobe can be inserted almost anywhere in kernel code, including the
> entry of most kernel functions. The insertion area must satisfy the
> assumptions described below.
>
> (In i386 architecture)
>         IA
>         |
> [-2][-1][0][1][2][3][4][5][6][7]
>         [ins1][ins2][  ins3 ]
>         [<-     DCR       ->]
>            [<- JTPR ->]
>
> ins1: 1st Instruction
> ins2: 2nd Instruction
> ins3: 3rd Instruction
> IA: Insertion Address
> DCR (Detoured Code Region): The area containing the instructions whose
> first byte lies within 5 bytes (the size of a jump instruction) of the
> insertion address. These instructions are copied into the middle of a
> stub code buffer.
> JTPR (Jump Target Prohibition Region): The bytes rewritten by djprobe's
> jump instruction, excluding the first byte (IA+1 to IA+4).
>
> Assumptions:
> i) The insertion address points to the first byte of an instruction.
> This avoids a bad instruction exception.
> ii) There are no instructions that refer to the IP (e.g., relative jmp)
> in DCR, because the EIP differs when the copied instruction is executed.
> iii) There are no instructions that can cause a context switch (e.g.,
> call schedule()) in DCR.
> If a context switch occurs in DCR, the address of the next instruction
> (e.g., the address of "ins2") is stored in the call stack of the previous
> thread. After that, djprobe overwrites the instruction with the jump
> instruction. When the previous thread is switched back in, it resumes
> execution from the stored address, which causes a bad instruction
> exception.
> iv) No jump or call has its destination in JTPR.
> This also avoids a bad instruction exception.
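Assumption ii) can be partially checked once instruction boundaries are known. The following hypothetical helper flags a few common IP-relative branch opcodes (call/jmp rel, jcc, loop/jecxz); it ignores prefixes and every other IP-dependent encoding, so it can only reject a DCR, never prove it safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, deliberately incomplete check for assumption ii):
 * returns true if any instruction in DCR starts with a common
 * IP-relative branch opcode. Instruction boundaries must already be
 * known (insn_len[]); prefixes and other IP-dependent forms are NOT
 * handled, so "false" does not prove the DCR is safe.
 */
bool dcr_has_rel_branch(const uint8_t *dcr, const int *insn_len, size_t n)
{
	int off = 0;
	size_t i;

	for (i = 0; i < n; off += insn_len[i], i++) {
		uint8_t op = dcr[off];

		assert(insn_len[i] > 0);
		if (op == 0xe8 || op == 0xe9 || op == 0xeb)	/* call/jmp rel */
			return true;
		if (op >= 0x70 && op <= 0x7f)			/* jcc rel8 */
			return true;
		if (op >= 0xe0 && op <= 0xe3)			/* loop*, jecxz */
			return true;
		if (op == 0x0f && insn_len[i] > 1 &&
		    dcr[off + 1] >= 0x80 && dcr[off + 1] <= 0x8f)
			return true;				/* jcc rel32 */
	}
	return false;
}
```

A typical i386 prologue such as push %ebp; mov %esp,%ebp; sub $0x8,%esp passes this check, while a DCR containing a call rel32 (which would also violate assumption iii) if it reaches schedule()) is rejected.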
>
> Q: Can several djprobes be inserted in the same address?
> A: Yes. Several djprobes inserted at the same address are
> aggregated and share one instance.
> NOTE: When a new djprobe's insertion address is in another djprobe's JTPR
> (described above), or another djprobe's insertion address is in the new
> djprobe's JTPR, register_djprobe() fails to register the new djprobe and
> returns the -EEXIST error code.
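The NOTE above can be written as a small decision function. This sketch follows the FAQ rule literally, with JTPR taken as [IA+1, IA+5); the real check in patch 2/3 goes through __get_djprobe_instance(addr, size), whose exact overlap test is not shown here, so treat this as an approximation. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JMP_SIZE 5

/* JTPR of a probe at 'ia': [ia + 1, ia + JMP_SIZE). */
static bool in_jtpr(uintptr_t addr, uintptr_t ia)
{
	return addr > ia && addr < ia + JMP_SIZE;
}

enum djp_placement { DJP_NEW, DJP_SHARE, DJP_CONFLICT };

/* Hypothetical check of a new probe at new_ia against an existing one. */
enum djp_placement place_djprobe(uintptr_t new_ia, uintptr_t old_ia)
{
	if (new_ia == old_ia)
		return DJP_SHARE;	/* aggregate into the existing instance */
	if (in_jtpr(new_ia, old_ia) || in_jtpr(old_ia, new_ia))
		return DJP_CONFLICT;	/* register_djprobe() returns -EEXIST */
	return DJP_NEW;
}
```

Checking both directions matters: a new probe placed a few bytes before an existing one would overwrite the existing probe's jump even though its own IA is outside the old JTPR.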
>
> Q: Can djprobe be used with kprobes at the same address?
> A: No, currently djprobe cannot coexist with kprobes at the same address.
> But we will support this feature as soon as possible.
>
> Q: Should the jump instruction be within a page boundary to avoid access
> violations and page faults?
> A: No. x86 processors handle non-aligned instructions correctly, and we
> can see many non-aligned instructions in the kernel binary. Also, in
> kernel space, kernel code pages are always mapped in the kernel page
> table, so no page fault occurs.
> So it is not necessary to care about page boundaries on the x86
> architecture.
>
> Q: How does djprobe resolve the problems of self/cross-modifying code?
> On the Pentium series, unsynchronized cross-modifying code operations,
> except on the first byte of an instruction, can cause unexpected
> instruction execution results.
> A: Djprobe uses a trick to resolve these problems. It modifies the
> instructions as follows.
> 1) Register a special handler as a kprobe handler. (A breakpoint
> instruction is written on the first byte of the insertion address by
> kprobes.)
> 2) Check safety (this is described in the next question's answer).
> 3) Write only the destination address part of the jump instruction into
> the kernel code. (This operation is not synchronized.)
> 4) Call "cpuid" on each processor for synchronization.
> 5) Write the first byte of the jump instruction. (This operation is
> synchronized automatically.)
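The write ordering of steps 1) to 5) can be modelled on an ordinary byte buffer. In this hypothetical sketch the serialization step is a callback standing in for smp_serialize_cpus() (CPUID on every processor) in patch 3/3; the real __set_jmp_op() also issues mb() before serializing and writes through a packed struct rather than byte by byte. Step 2) (the safety check) is assumed to have completed before install_jmp() is called.

```c
#include <assert.h>
#include <stdint.h>

#define BREAKPOINT_INSTRUCTION 0xcc	/* i386 int3 */
#define RELATIVEJUMP_INSTRUCTION 0xe9
#define JMP_SIZE 5

typedef void (*serialize_fn)(const uint8_t *code, void *ctx);

/*
 * User-space model of the write ordering. 'code' points at the insertion
 * address, which already holds int3 (step 1). The displacement bytes are
 * written while the int3 still guards IA (step 3), every CPU is
 * serialized (step 4, modelled by a callback), and only then is the
 * first byte replaced by the jmp opcode (step 5).
 */
void install_jmp(uint8_t *code, uint32_t from, uint32_t to,
		 serialize_fn serialize, void *ctx)
{
	uint32_t raddr = to - (from + JMP_SIZE);
	int i;

	assert(code[0] == BREAKPOINT_INSTRUCTION);
	for (i = 0; i < 4; i++)
		code[1 + i] = (uint8_t)(raddr >> (8 * i));
	serialize(code, ctx);
	code[0] = RELATIVEJUMP_INSTRUCTION;
}

/* Example callback: remember what byte 0 held at serialization time. */
void record_first_byte(const uint8_t *code, void *ctx)
{
	*(uint8_t *)ctx = code[0];
}
```

The point of the ordering is that any CPU fetching IA before step 5) still sees int3 and is diverted by kprobes, so it never executes the half-written displacement bytes.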
>
> Q: How does djprobe guarantee that no threads and no processors are
> executing the area being modified? The IP of that area may be stored in
> the stack memory of those threads.
> A: The problem can arise for three reasons:
> i) Problem caused by multiprocessor systems
> Another processor may be executing the area which is overwritten by the
> jump instruction. Djprobe should guarantee that no processor is
> executing those instructions while modifying them.
> ii) Problem caused by interrupts
> An interrupt might have occurred in the area which is going to be
> overwritten by the jump instruction. Djprobe should guarantee that all
> interrupts which occurred in the area have finished.
> iii) Problem caused by the fully preemptible kernel
> Problem (iii) is described in the next question's answer.
>
> Djprobe uses the workqueue to resolve problems (i) and (ii). The
> solution is described below:
> 1) Copy the entire DCR (described above) into the middle of a stub
> code buffer.
> 2) Register a special handler as a kprobe handler. This special handler
> changes kprobe's resume point to the stub code buffer.
> 3) Clear the safety flags of all processors.
> 4) Register a safety checking work in the workqueue of each processor,
> and wait until those works are scheduled.
> 5) When the keventd thread is scheduled on a processor, it executes the
> work. At this time, the processor is not executing the area which is
> overwritten by the jump instruction, and it has also finished all
> interrupts, because in a voluntary-preemption or non-preemptive kernel,
> context switches do not occur in the extension of an interrupt.
> 6) Once all the works have been scheduled, djprobe writes the jump
> instruction.
>
> Q: Can djprobe work with kernel full preemption?
> A: No, but you can use djprobe's interface. When kernel full preemption
> is enabled, we can't ensure that no threads are executing the modified
> area, because its address may be stored in the stacks of threads. In
> this case, the djprobe interfaces are emulated by using kprobes.
> The latest Linux kernel supports not only full preemption but also
> voluntary preemption. In the case of voluntary preemption, threads
> are scheduled only from limited addresses, so it is easy to check that
> preemption cannot occur in the modified area.
>



--
Hiro Yoshioka
mailto:hyoshiok at miraclelinux.com