2009-06-02 22:43:55

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 00/12] hw-breakpoints: new hardware breakpoints API

Hi Ingo,

The new hardware breakpoint API has benefit from deep reviews since
its first iteration several months ago, and Prasad has addressed
most (all?) of them.

After a small reorg and few fixes that were done recently, it seems
ready for integration on -tip.

I've tested it successfully for both kernel (ftrace) and user (gdb)
hardware breakpoints.

Thanks!

Frederic.

The following changes since commit 43bd1236234cacbc18d1476a9b57e7a306efddf5:
Frederic Weisbecker (1):
tracing/stat: remove unappropriate safe walk on list

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git hw-breakpoints

K.Prasad (12):
hw-breakpoints: prepare the code for Hardware Breakpoint interfaces
hw-breakpoints: introducing generic hardware breakpoint handler interfaces
hw-breakpoints: x86 architecture implementation of Hardware Breakpoint interfaces
hw-breakpoints: modifying generic debug exception to use thread-specific debug registers
hw-breakpoints: use wrapper routines around debug registers in processor related functions
hw-breakpoints: use the new wrapper routines to access debug registers in process/thread code
hw-breakpoints: modify signal handling code to refrain from re-enabling HW Breakpoints
hw-breakpoints: modify Ptrace routines to access breakpoint registers
hw-breakpoints: cleanup HW Breakpoint registers before kexec
hw-breakpoints: sample HW breakpoint over kernel data address
hw-breakpoints: ftrace plugin for kernel symbol tracing using HW Breakpoint interfaces
hw-breakpoints: reset bits in dr6 after the corresponding exception is handled

arch/Kconfig | 4 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/a.out-core.h | 8 +-
arch/x86/include/asm/debugreg.h | 29 ++
arch/x86/include/asm/hw_breakpoint.h | 55 ++++
arch/x86/include/asm/processor.h | 8 +-
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/hw_breakpoint.c | 391 +++++++++++++++++++++++
arch/x86/kernel/kgdb.c | 6 +
arch/x86/kernel/kprobes.c | 9 +-
arch/x86/kernel/machine_kexec_32.c | 2 +
arch/x86/kernel/machine_kexec_64.c | 2 +
arch/x86/kernel/process.c | 22 +-
arch/x86/kernel/process_32.c | 28 ++
arch/x86/kernel/process_64.c | 31 ++
arch/x86/kernel/ptrace.c | 231 +++++++++------
arch/x86/kernel/signal.c | 9 -
arch/x86/kernel/smpboot.c | 3 +
arch/x86/kernel/traps.c | 71 ++---
arch/x86/mm/kmmio.c | 8 +-
arch/x86/power/cpu_32.c | 13 +-
arch/x86/power/cpu_64.c | 12 +-
include/asm-generic/hw_breakpoint.h | 139 ++++++++
kernel/Makefile | 1 +
kernel/hw_breakpoint.c | 378 ++++++++++++++++++++++
kernel/trace/Kconfig | 21 ++
kernel/trace/Makefile | 1 +
kernel/trace/trace.h | 23 ++
kernel/trace/trace_ksym.c | 525 +++++++++++++++++++++++++++++++
kernel/trace/trace_selftest.c | 53 +++
samples/Kconfig | 6 +
samples/Makefile | 3 +-
samples/hw_breakpoint/Makefile | 1 +
samples/hw_breakpoint/data_breakpoint.c | 83 +++++
34 files changed, 1987 insertions(+), 192 deletions(-)
create mode 100644 arch/x86/include/asm/hw_breakpoint.h
create mode 100644 arch/x86/kernel/hw_breakpoint.c
create mode 100644 include/asm-generic/hw_breakpoint.h
create mode 100644 kernel/hw_breakpoint.c
create mode 100644 kernel/trace/trace_ksym.c
create mode 100644 samples/hw_breakpoint/Makefile
create mode 100644 samples/hw_breakpoint/data_breakpoint.c


2009-06-02 22:44:13

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 01/12] hw-breakpoints: prepare the code for Hardware Breakpoint interfaces

From: K.Prasad <[email protected]>

The generic hardware breakpoint interface provides an abstraction of
hardware breakpoints in front of specific arch implementations for both kernel
and user side breakpoints.
This includes execution breakpoints and read/write breakpoints, also known as
"watchpoints".

This patch introduces header files containing constants, structure definitions
and declaration of functions used by the hardware breakpoint core and x86
specific code.
It also introduces an array based storage for the debug-register values in
'struct thread_struct', while modifying all users of debugreg<n> member in the
structure.

[ Impact: add headers for new hardware breakpoint interface ]

Original-patch-by: Alan Stern <[email protected]>
Signed-off-by: K.Prasad <[email protected]>
Reviewed-by: Alan Stern <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/include/asm/a.out-core.h | 8 +-
arch/x86/include/asm/debugreg.h | 29 +++++++
arch/x86/include/asm/hw_breakpoint.h | 55 +++++++++++++
arch/x86/include/asm/processor.h | 8 +-
arch/x86/kernel/process.c | 16 ++--
arch/x86/kernel/ptrace.c | 16 ++--
arch/x86/power/cpu_32.c | 8 +-
arch/x86/power/cpu_64.c | 8 +-
include/asm-generic/hw_breakpoint.h | 139 ++++++++++++++++++++++++++++++++++
9 files changed, 255 insertions(+), 32 deletions(-)
create mode 100644 arch/x86/include/asm/hw_breakpoint.h
create mode 100644 include/asm-generic/hw_breakpoint.h

diff --git a/arch/x86/include/asm/a.out-core.h b/arch/x86/include/asm/a.out-core.h
index bb70e39..fc4685d 100644
--- a/arch/x86/include/asm/a.out-core.h
+++ b/arch/x86/include/asm/a.out-core.h
@@ -32,10 +32,10 @@ static inline void aout_dump_thread(struct pt_regs *regs, struct user *dump)
>> PAGE_SHIFT;
dump->u_dsize -= dump->u_tsize;
dump->u_ssize = 0;
- dump->u_debugreg[0] = current->thread.debugreg0;
- dump->u_debugreg[1] = current->thread.debugreg1;
- dump->u_debugreg[2] = current->thread.debugreg2;
- dump->u_debugreg[3] = current->thread.debugreg3;
+ dump->u_debugreg[0] = current->thread.debugreg[0];
+ dump->u_debugreg[1] = current->thread.debugreg[1];
+ dump->u_debugreg[2] = current->thread.debugreg[2];
+ dump->u_debugreg[3] = current->thread.debugreg[3];
dump->u_debugreg[4] = 0;
dump->u_debugreg[5] = 0;
dump->u_debugreg[6] = current->thread.debugreg6;
diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h
index 3ea6f37..23439fb 100644
--- a/arch/x86/include/asm/debugreg.h
+++ b/arch/x86/include/asm/debugreg.h
@@ -18,6 +18,7 @@
#define DR_TRAP1 (0x2) /* db1 */
#define DR_TRAP2 (0x4) /* db2 */
#define DR_TRAP3 (0x8) /* db3 */
+#define DR_TRAP_BITS (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)

#define DR_STEP (0x4000) /* single-step */
#define DR_SWITCH (0x8000) /* task switch */
@@ -49,6 +50,8 @@

#define DR_LOCAL_ENABLE_SHIFT 0 /* Extra shift to the local enable bit */
#define DR_GLOBAL_ENABLE_SHIFT 1 /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1) /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2) /* Global enable for reg 0 */
#define DR_ENABLE_SIZE 2 /* 2 enable bits per register */

#define DR_LOCAL_ENABLE_MASK (0x55) /* Set local bits for all 4 regs */
@@ -67,4 +70,30 @@
#define DR_LOCAL_SLOWDOWN (0x100) /* Local slow the pipeline */
#define DR_GLOBAL_SLOWDOWN (0x200) /* Global slow the pipeline */

+/*
+ * HW breakpoint additions
+ */
+#ifdef __KERNEL__
+
+/* For process management */
+extern void flush_thread_hw_breakpoint(struct task_struct *tsk);
+extern int copy_thread_hw_breakpoint(struct task_struct *tsk,
+ struct task_struct *child, unsigned long clone_flags);
+
+/* For CPU management */
+extern void load_debug_registers(void);
+static inline void hw_breakpoint_disable(void)
+{
+ /* Zero the control register for HW Breakpoint */
+ set_debugreg(0UL, 7);
+
+ /* Zero-out the individual HW breakpoint address registers */
+ set_debugreg(0UL, 0);
+ set_debugreg(0UL, 1);
+ set_debugreg(0UL, 2);
+ set_debugreg(0UL, 3);
+}
+
+#endif /* __KERNEL__ */
+
#endif /* _ASM_X86_DEBUGREG_H */
diff --git a/arch/x86/include/asm/hw_breakpoint.h b/arch/x86/include/asm/hw_breakpoint.h
new file mode 100644
index 0000000..1acb4d4
--- /dev/null
+++ b/arch/x86/include/asm/hw_breakpoint.h
@@ -0,0 +1,55 @@
+#ifndef _I386_HW_BREAKPOINT_H
+#define _I386_HW_BREAKPOINT_H
+
+#ifdef __KERNEL__
+#define __ARCH_HW_BREAKPOINT_H
+
+struct arch_hw_breakpoint {
+ char *name; /* Contains name of the symbol to set bkpt */
+ unsigned long address;
+ u8 len;
+ u8 type;
+};
+
+#include <linux/kdebug.h>
+#include <asm-generic/hw_breakpoint.h>
+
+/* Available HW breakpoint length encodings */
+#define HW_BREAKPOINT_LEN_1 0x40
+#define HW_BREAKPOINT_LEN_2 0x44
+#define HW_BREAKPOINT_LEN_4 0x4c
+#define HW_BREAKPOINT_LEN_EXECUTE 0x40
+
+#ifdef CONFIG_X86_64
+#define HW_BREAKPOINT_LEN_8 0x48
+#endif
+
+/* Available HW breakpoint type encodings */
+
+/* trigger on instruction execute */
+#define HW_BREAKPOINT_EXECUTE 0x80
+/* trigger on memory write */
+#define HW_BREAKPOINT_WRITE 0x81
+/* trigger on memory read or write */
+#define HW_BREAKPOINT_RW 0x83
+
+/* Total number of available HW breakpoint registers */
+#define HBP_NUM 4
+
+extern struct hw_breakpoint *hbp_kernel[HBP_NUM];
+DECLARE_PER_CPU(struct hw_breakpoint*, this_hbp_kernel[HBP_NUM]);
+extern unsigned int hbp_user_refcount[HBP_NUM];
+
+extern void arch_install_thread_hw_breakpoint(struct task_struct *tsk);
+extern void arch_uninstall_thread_hw_breakpoint(void);
+extern int arch_check_va_in_userspace(unsigned long va, u8 hbp_len);
+extern int arch_validate_hwbkpt_settings(struct hw_breakpoint *bp,
+ struct task_struct *tsk);
+extern void arch_update_user_hw_breakpoint(int pos, struct task_struct *tsk);
+extern void arch_flush_thread_hw_breakpoint(struct task_struct *tsk);
+extern void arch_update_kernel_hw_breakpoint(void *);
+extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
+ unsigned long val, void *data);
+#endif /* __KERNEL__ */
+#endif /* _I386_HW_BREAKPOINT_H */
+
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0b2fab0..448b34a 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -29,6 +29,7 @@ struct mm_struct;
#include <linux/threads.h>
#include <linux/init.h>

+#define HBP_NUM 4
/*
* Default implementation of macro that returns current
* instruction pointer ("program counter").
@@ -431,12 +432,11 @@ struct thread_struct {
unsigned long fs;
unsigned long gs;
/* Hardware debugging registers: */
- unsigned long debugreg0;
- unsigned long debugreg1;
- unsigned long debugreg2;
- unsigned long debugreg3;
+ unsigned long debugreg[HBP_NUM];
unsigned long debugreg6;
unsigned long debugreg7;
+ /* Hardware breakpoint info */
+ struct hw_breakpoint *hbp[HBP_NUM];
/* Fault info: */
unsigned long cr2;
unsigned long trap_no;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index fb5dfb8..291527c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -106,10 +106,10 @@ void flush_thread(void)

clear_tsk_thread_flag(tsk, TIF_DEBUG);

- tsk->thread.debugreg0 = 0;
- tsk->thread.debugreg1 = 0;
- tsk->thread.debugreg2 = 0;
- tsk->thread.debugreg3 = 0;
+ tsk->thread.debugreg[0] = 0;
+ tsk->thread.debugreg[1] = 0;
+ tsk->thread.debugreg[2] = 0;
+ tsk->thread.debugreg[3] = 0;
tsk->thread.debugreg6 = 0;
tsk->thread.debugreg7 = 0;
memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
@@ -194,10 +194,10 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
update_debugctlmsr(next->debugctlmsr);

if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
- set_debugreg(next->debugreg0, 0);
- set_debugreg(next->debugreg1, 1);
- set_debugreg(next->debugreg2, 2);
- set_debugreg(next->debugreg3, 3);
+ set_debugreg(next->debugreg[0], 0);
+ set_debugreg(next->debugreg[1], 1);
+ set_debugreg(next->debugreg[2], 2);
+ set_debugreg(next->debugreg[3], 3);
/* no 4 and 5 */
set_debugreg(next->debugreg6, 6);
set_debugreg(next->debugreg7, 7);
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 09ecbde..313be40 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -471,10 +471,10 @@ static int genregs_set(struct task_struct *target,
static unsigned long ptrace_get_debugreg(struct task_struct *child, int n)
{
switch (n) {
- case 0: return child->thread.debugreg0;
- case 1: return child->thread.debugreg1;
- case 2: return child->thread.debugreg2;
- case 3: return child->thread.debugreg3;
+ case 0: return child->thread.debugreg[0];
+ case 1: return child->thread.debugreg[1];
+ case 2: return child->thread.debugreg[2];
+ case 3: return child->thread.debugreg[3];
case 6: return child->thread.debugreg6;
case 7: return child->thread.debugreg7;
}
@@ -493,10 +493,10 @@ static int ptrace_set_debugreg(struct task_struct *child,
return -EIO;

switch (n) {
- case 0: child->thread.debugreg0 = data; break;
- case 1: child->thread.debugreg1 = data; break;
- case 2: child->thread.debugreg2 = data; break;
- case 3: child->thread.debugreg3 = data; break;
+ case 0: child->thread.debugreg[0] = data; break;
+ case 1: child->thread.debugreg[1] = data; break;
+ case 2: child->thread.debugreg[2] = data; break;
+ case 3: child->thread.debugreg[3] = data; break;

case 6:
if ((data & ~0xffffffffUL) != 0)
diff --git a/arch/x86/power/cpu_32.c b/arch/x86/power/cpu_32.c
index ce702c5..5199139 100644
--- a/arch/x86/power/cpu_32.c
+++ b/arch/x86/power/cpu_32.c
@@ -84,10 +84,10 @@ static void fix_processor_context(void)
* Now maybe reload the debug registers
*/
if (current->thread.debugreg7) {
- set_debugreg(current->thread.debugreg0, 0);
- set_debugreg(current->thread.debugreg1, 1);
- set_debugreg(current->thread.debugreg2, 2);
- set_debugreg(current->thread.debugreg3, 3);
+ set_debugreg(current->thread.debugreg[0], 0);
+ set_debugreg(current->thread.debugreg[1], 1);
+ set_debugreg(current->thread.debugreg[2], 2);
+ set_debugreg(current->thread.debugreg[3], 3);
/* no 4 and 5 */
set_debugreg(current->thread.debugreg6, 6);
set_debugreg(current->thread.debugreg7, 7);
diff --git a/arch/x86/power/cpu_64.c b/arch/x86/power/cpu_64.c
index 5343540..1e3bdcc 100644
--- a/arch/x86/power/cpu_64.c
+++ b/arch/x86/power/cpu_64.c
@@ -163,10 +163,10 @@ static void fix_processor_context(void)
* Now maybe reload the debug registers
*/
if (current->thread.debugreg7){
- loaddebug(&current->thread, 0);
- loaddebug(&current->thread, 1);
- loaddebug(&current->thread, 2);
- loaddebug(&current->thread, 3);
+ set_debugreg(current->thread.debugreg[0], 0);
+ set_debugreg(current->thread.debugreg[1], 1);
+ set_debugreg(current->thread.debugreg[2], 2);
+ set_debugreg(current->thread.debugreg[3], 3);
/* no 4 and 5 */
loaddebug(&current->thread, 6);
loaddebug(&current->thread, 7);
diff --git a/include/asm-generic/hw_breakpoint.h b/include/asm-generic/hw_breakpoint.h
new file mode 100644
index 0000000..9bf2d12
--- /dev/null
+++ b/include/asm-generic/hw_breakpoint.h
@@ -0,0 +1,139 @@
+#ifndef _ASM_GENERIC_HW_BREAKPOINT_H
+#define _ASM_GENERIC_HW_BREAKPOINT_H
+
+#ifndef __ARCH_HW_BREAKPOINT_H
+#error "Please don't include this file directly"
+#endif
+
+#ifdef __KERNEL__
+#include <linux/list.h>
+#include <linux/types.h>
+#include <linux/kallsyms.h>
+
+/**
+ * struct hw_breakpoint - unified kernel/user-space hardware breakpoint
+ * @triggered: callback invoked after target address access
+ * @info: arch-specific breakpoint info (address, length, and type)
+ *
+ * %hw_breakpoint structures are the kernel's way of representing
+ * hardware breakpoints. These are data breakpoints
+ * (also known as "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The breakpoint's address, length, and type are highly
+ * architecture-specific. The values are encoded in the @info field; you
+ * specify them when registering the breakpoint. To examine the encoded
+ * values use hw_breakpoint_get_{kaddress,uaddress,len,type}(), declared
+ * below.
+ *
+ * The address is specified as a regular kernel pointer (for kernel-space
+ * breakponts) or as an %__user pointer (for user-space breakpoints).
+ * With register_user_hw_breakpoint(), the address must refer to a
+ * location in user space. The breakpoint will be active only while the
+ * requested task is running. Conversely with
+ * register_kernel_hw_breakpoint(), the address must refer to a location
+ * in kernel space, and the breakpoint will be active on all CPUs
+ * regardless of the current task.
+ *
+ * The length is the breakpoint's extent in bytes, which is subject to
+ * certain limitations. include/asm/hw_breakpoint.h contains macros
+ * defining the available lengths for a specific architecture. Note that
+ * the address's alignment must match the length. The breakpoint will
+ * catch accesses to any byte in the range from address to address +
+ * (length - 1).
+ *
+ * The breakpoint's type indicates the sort of access that will cause it
+ * to trigger. Possible values may include:
+ *
+ * %HW_BREAKPOINT_RW (triggered on read or write access),
+ * %HW_BREAKPOINT_WRITE (triggered on write access), and
+ * %HW_BREAKPOINT_READ (triggered on read access).
+ *
+ * Appropriate macros are defined in include/asm/hw_breakpoint.h; not all
+ * possibilities are available on all architectures. Execute breakpoints
+ * must have length equal to the special value %HW_BREAKPOINT_LEN_EXECUTE.
+ *
+ * When a breakpoint gets hit, the @triggered callback is
+ * invoked in_interrupt with a pointer to the %hw_breakpoint structure and the
+ * processor registers.
+ * Data breakpoints occur after the memory access has taken place.
+ * Breakpoints are disabled during execution @triggered, to avoid
+ * recursive traps and allow unhindered access to breakpointed memory.
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable. Note that it is not portable
+ * as written, because not all architectures support HW_BREAKPOINT_LEN_4.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hw_breakpoint.h>
+ *
+ * struct hw_breakpoint my_bp;
+ *
+ * static void my_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * printk(KERN_DEBUG "Inside triggered routine of breakpoint exception\n");
+ * dump_stack();
+ * .......<more debugging output>........
+ * }
+ *
+ * static struct hw_breakpoint my_bp;
+ *
+ * static int init_module(void)
+ * {
+ * ..........<do anything>............
+ * my_bp.info.type = HW_BREAKPOINT_WRITE;
+ * my_bp.info.len = HW_BREAKPOINT_LEN_4;
+ *
+ * my_bp.installed = (void *)my_bp_installed;
+ *
+ * rc = register_kernel_hw_breakpoint(&my_bp);
+ * ..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ * ..........<do anything>............
+ * unregister_kernel_hw_breakpoint(&my_bp);
+ * ..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ */
+struct hw_breakpoint {
+ void (*triggered)(struct hw_breakpoint *, struct pt_regs *);
+ struct arch_hw_breakpoint info;
+};
+
+/*
+ * len and type values are defined in include/asm/hw_breakpoint.h.
+ * Available values vary according to the architecture. On i386 the
+ * possibilities are:
+ *
+ * HW_BREAKPOINT_LEN_1
+ * HW_BREAKPOINT_LEN_2
+ * HW_BREAKPOINT_LEN_4
+ * HW_BREAKPOINT_RW
+ * HW_BREAKPOINT_READ
+ *
+ * On other architectures HW_BREAKPOINT_LEN_8 may be available, and the
+ * 1-, 2-, and 4-byte lengths may be unavailable. There also may be
+ * HW_BREAKPOINT_WRITE. You can use #ifdef to check at compile time.
+ */
+
+extern int register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+extern int modify_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+extern void unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+extern int register_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+extern void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+
+extern unsigned int hbp_kernel_pos;
+
+#endif /* __KERNEL__ */
+#endif /* _ASM_GENERIC_HW_BREAKPOINT_H */
--
1.6.2.3

2009-06-02 22:44:28

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 02/12] hw-breakpoints: introducing generic hardware breakpoint handler interfaces

From: K.Prasad <[email protected]>

This patch introduces the generic Hardware Breakpoint interfaces for both user
and kernel space requests.
This core Api handles the hardware breakpoints through new helpers. It
handles the user-space breakpoints and kernel breakpoints in front of
arch implementation.

One can choose kernel wide breakpoints using the following helpers
and passing them a generic struct hw_breakpoint:

- register_kernel_hw_breakpoint()
- unregister_kernel_hw_breakpoint()
- modify_kernel_hw_breakpoint()

On the other side, you can choose per task breakpoints.

- register_user_hw_breakpoint()
- unregister_user_hw_breakpoint()
- modify_user_hw_breakpoint()

[ [email protected]: fix conflict against perfcounter ]

Original-patch-by: Alan Stern <[email protected]>
Signed-off-by: K.Prasad <[email protected]>
Reviewed-by: Alan Stern <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/Kconfig | 4 +
kernel/Makefile | 1 +
kernel/hw_breakpoint.c | 378 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 383 insertions(+), 0 deletions(-)
create mode 100644 kernel/hw_breakpoint.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 78a35e9..1adf2d0 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -112,3 +112,7 @@ config HAVE_DMA_API_DEBUG

config HAVE_DEFAULT_NO_SPIN_MUTEXES
bool
+
+config HAVE_HW_BREAKPOINT
+ bool
+
diff --git a/kernel/Makefile b/kernel/Makefile
index a35eee3..18ad111 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -96,6 +96,7 @@ obj-$(CONFIG_TRACING) += trace/
obj-$(CONFIG_X86_DS) += trace/
obj-$(CONFIG_SMP) += sched_cpupri.o
obj-$(CONFIG_SLOW_WORK) += slow-work.o
+obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o

ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
# According to Alan Modra <[email protected]>, the -fno-omit-frame-pointer is
diff --git a/kernel/hw_breakpoint.c b/kernel/hw_breakpoint.c
new file mode 100644
index 0000000..c1f64e6
--- /dev/null
+++ b/kernel/hw_breakpoint.c
@@ -0,0 +1,378 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ * Copyright (C) IBM Corporation, 2009
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ * This file contains the arch-independent routines.
+ */
+
+#include <linux/irqflags.h>
+#include <linux/kallsyms.h>
+#include <linux/notifier.h>
+#include <linux/kprobes.h>
+#include <linux/kdebug.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/smp.h>
+
+#include <asm/hw_breakpoint.h>
+#include <asm/processor.h>
+
+#ifdef CONFIG_X86
+#include <asm/debugreg.h>
+#endif
+/*
+ * Spinlock that protects all (un)register operations over kernel/user-space
+ * breakpoint requests
+ */
+static DEFINE_SPINLOCK(hw_breakpoint_lock);
+
+/* Array of kernel-space breakpoint structures */
+struct hw_breakpoint *hbp_kernel[HBP_NUM];
+
+/*
+ * Per-processor copy of hbp_kernel[]. Used only when hbp_kernel is being
+ * modified but we need the older copy to handle any hbp exceptions. It will
+ * sync with hbp_kernel[] value after updation is done through IPIs.
+ */
+DEFINE_PER_CPU(struct hw_breakpoint*, this_hbp_kernel[HBP_NUM]);
+
+/*
+ * Kernel breakpoints grow downwards, starting from HBP_NUM
+ * 'hbp_kernel_pos' denotes lowest numbered breakpoint register occupied for
+ * kernel-space request. We will initialise it here and not in an __init
+ * routine because load_debug_registers(), which uses this variable can be
+ * called very early during CPU initialisation.
+ */
+unsigned int hbp_kernel_pos = HBP_NUM;
+
+/*
+ * An array containing refcount of threads using a given bkpt register
+ * Accesses are synchronised by acquiring hw_breakpoint_lock
+ */
+unsigned int hbp_user_refcount[HBP_NUM];
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+ unsigned long flags;
+ struct task_struct *tsk = current;
+
+ spin_lock_bh(&hw_breakpoint_lock);
+
+ /* Prevent IPIs for new kernel breakpoint updates */
+ local_irq_save(flags);
+ arch_update_kernel_hw_breakpoint(NULL);
+ local_irq_restore(flags);
+
+ if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+ arch_install_thread_hw_breakpoint(tsk);
+
+ spin_unlock_bh(&hw_breakpoint_lock);
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ *
+ * If tsk != current then tsk must not be usable (for example, a
+ * child being cleaned up from a failed fork).
+ */
+void flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+ int i;
+ struct thread_struct *thread = &(tsk->thread);
+
+ spin_lock_bh(&hw_breakpoint_lock);
+
+ /* The thread no longer has any breakpoints associated with it */
+ clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ for (i = 0; i < HBP_NUM; i++) {
+ if (thread->hbp[i]) {
+ hbp_user_refcount[i]--;
+ kfree(thread->hbp[i]);
+ thread->hbp[i] = NULL;
+ }
+ }
+
+ arch_flush_thread_hw_breakpoint(tsk);
+
+ /* Actually uninstall the breakpoints if necessary */
+ if (tsk == current)
+ arch_uninstall_thread_hw_breakpoint();
+ spin_unlock_bh(&hw_breakpoint_lock);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+ struct task_struct *child, unsigned long clone_flags)
+{
+ /*
+ * We will assume that breakpoint settings are not inherited
+ * and the child starts out with no debug registers set.
+ * But what about CLONE_PTRACE?
+ */
+ clear_tsk_thread_flag(child, TIF_DEBUG);
+
+ /* We will call flush routine since the debugregs are not inherited */
+ arch_flush_thread_hw_breakpoint(child);
+
+ return 0;
+}
+
+static int __register_user_hw_breakpoint(int pos, struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ struct thread_struct *thread = &(tsk->thread);
+ int rc;
+
+ /* Do not overcommit. Fail if kernel has used the hbp registers */
+ if (pos >= hbp_kernel_pos)
+ return -ENOSPC;
+
+ rc = arch_validate_hwbkpt_settings(bp, tsk);
+ if (rc)
+ return rc;
+
+ thread->hbp[pos] = bp;
+ hbp_user_refcount[pos]++;
+
+ arch_update_user_hw_breakpoint(pos, tsk);
+ /*
+ * Does it need to be installed right now?
+ * Otherwise it will get installed the next time tsk runs
+ */
+ if (tsk == current)
+ arch_install_thread_hw_breakpoint(tsk);
+
+ return rc;
+}
+
+/*
+ * Modify the address of a hbp register already in use by the task
+ * Do not invoke this in-lieu of a __unregister_user_hw_breakpoint()
+ */
+static int __modify_user_hw_breakpoint(int pos, struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ struct thread_struct *thread = &(tsk->thread);
+
+ if ((pos >= hbp_kernel_pos) || (arch_validate_hwbkpt_settings(bp, tsk)))
+ return -EINVAL;
+
+ if (thread->hbp[pos] == NULL)
+ return -EINVAL;
+
+ thread->hbp[pos] = bp;
+ /*
+ * 'pos' must be that of a hbp register already used by 'tsk'
+ * Otherwise arch_modify_user_hw_breakpoint() will fail
+ */
+ arch_update_user_hw_breakpoint(pos, tsk);
+
+ if (tsk == current)
+ arch_install_thread_hw_breakpoint(tsk);
+
+ return 0;
+}
+
+static void __unregister_user_hw_breakpoint(int pos, struct task_struct *tsk)
+{
+ hbp_user_refcount[pos]--;
+ tsk->thread.hbp[pos] = NULL;
+
+ arch_update_user_hw_breakpoint(pos, tsk);
+
+ if (tsk == current)
+ arch_install_thread_hw_breakpoint(tsk);
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @tsk: pointer to 'task_struct' of the process to which the address belongs
+ * @bp: the breakpoint structure to register
+ *
+ * @bp.info->name or @bp.info->address, @bp.info->len, @bp.info->type and
+ * @bp->triggered must be set properly before invocation
+ *
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ struct thread_struct *thread = &(tsk->thread);
+ int i, rc = -ENOSPC;
+
+ spin_lock_bh(&hw_breakpoint_lock);
+
+ for (i = 0; i < hbp_kernel_pos; i++) {
+ if (!thread->hbp[i]) {
+ rc = __register_user_hw_breakpoint(i, tsk, bp);
+ break;
+ }
+ }
+ if (!rc)
+ set_tsk_thread_flag(tsk, TIF_DEBUG);
+
+ spin_unlock_bh(&hw_breakpoint_lock);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(register_user_hw_breakpoint);
+
+/**
+ * modify_user_hw_breakpoint - modify a user-space hardware breakpoint
+ * @tsk: pointer to 'task_struct' of the process to which the address belongs
+ * @bp: the breakpoint structure to unregister
+ *
+ */
+int modify_user_hw_breakpoint(struct task_struct *tsk, struct hw_breakpoint *bp)
+{
+ struct thread_struct *thread = &(tsk->thread);
+ int i, ret = -ENOENT;
+
+ spin_lock_bh(&hw_breakpoint_lock);
+ for (i = 0; i < hbp_kernel_pos; i++) {
+ if (bp == thread->hbp[i]) {
+ ret = __modify_user_hw_breakpoint(i, tsk, bp);
+ break;
+ }
+ }
+ spin_unlock_bh(&hw_breakpoint_lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(modify_user_hw_breakpoint);
+
+/**
+ * unregister_user_hw_breakpoint - unregister a user-space hardware breakpoint
+ * @tsk: pointer to 'task_struct' of the process to which the address belongs
+ * @bp: the breakpoint structure to unregister
+ *
+ */
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ struct thread_struct *thread = &(tsk->thread);
+ int i, pos = -1, hbp_counter = 0;
+
+ spin_lock_bh(&hw_breakpoint_lock);
+ for (i = 0; i < hbp_kernel_pos; i++) {
+ if (thread->hbp[i])
+ hbp_counter++;
+ if (bp == thread->hbp[i])
+ pos = i;
+ }
+ if (pos >= 0) {
+ __unregister_user_hw_breakpoint(pos, tsk);
+ hbp_counter--;
+ }
+ if (!hbp_counter)
+ clear_tsk_thread_flag(tsk, TIF_DEBUG);
+
+ spin_unlock_bh(&hw_breakpoint_lock);
+}
+EXPORT_SYMBOL_GPL(unregister_user_hw_breakpoint);
+
+/**
+ * register_kernel_hw_breakpoint - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ *
+ * @bp.info->name or @bp.info->address, @bp.info->len, @bp.info->type and
+ * @bp->triggered must be set properly before invocation
+ *
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+ int rc;
+
+ rc = arch_validate_hwbkpt_settings(bp, NULL);
+ if (rc)
+ return rc;
+
+ spin_lock_bh(&hw_breakpoint_lock);
+
+ rc = -ENOSPC;
+ /* Check if we are over-committing */
+ if ((hbp_kernel_pos > 0) && (!hbp_user_refcount[hbp_kernel_pos-1])) {
+ hbp_kernel_pos--;
+ hbp_kernel[hbp_kernel_pos] = bp;
+ on_each_cpu(arch_update_kernel_hw_breakpoint, NULL, 1);
+ rc = 0;
+ }
+
+ spin_unlock_bh(&hw_breakpoint_lock);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
+
+/**
+ * unregister_kernel_hw_breakpoint - unregister a HW breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+ int i, j;
+
+ spin_lock_bh(&hw_breakpoint_lock);
+
+ /* Find the 'bp' in our list of breakpoints for kernel */
+ for (i = hbp_kernel_pos; i < HBP_NUM; i++)
+ if (bp == hbp_kernel[i])
+ break;
+
+ /* Check if we did not find a match for 'bp'. If so return early */
+ if (i == HBP_NUM) {
+ spin_unlock_bh(&hw_breakpoint_lock);
+ return;
+ }
+
+ /*
+ * We'll shift the breakpoints one-level above to compact if
+ * unregistration creates a hole
+ */
+ for (j = i; j > hbp_kernel_pos; j--)
+ hbp_kernel[j] = hbp_kernel[j-1];
+
+ hbp_kernel[hbp_kernel_pos] = NULL;
+ on_each_cpu(arch_update_kernel_hw_breakpoint, NULL, 1);
+ hbp_kernel_pos++;
+
+ spin_unlock_bh(&hw_breakpoint_lock);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hw_breakpoint);
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+ .notifier_call = hw_breakpoint_exceptions_notify,
+ /* we need to be notified first */
+ .priority = 0x7fffffff
+};
+
+static int __init init_hw_breakpoint(void)
+{
+ return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
--
1.6.2.3

2009-06-02 22:44:39

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 03/12] hw-breakpoints: x86 architecture implementation of Hardware Breakpoint interfaces

From: K.Prasad <[email protected]>

This patch introduces the arch-specific implementation of the generic
hardware breakpoints in kernel/hw_breakpoint.c inside x86 specific directories.
It contains functions which help to validate and serve requests using
Hardware Breakpoint registers on x86 processors.

[ [email protected]: fix conflict against kmemcheck ]

Original-patch-by: Alan Stern <[email protected]>
Signed-off-by: K.Prasad <[email protected]>
Reviewed-by: Alan Stern <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/Kconfig | 1 +
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/hw_breakpoint.c | 382 +++++++++++++++++++++++++++++++++++++++
3 files changed, 384 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/kernel/hw_breakpoint.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index df9e885..3033375 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -46,6 +46,7 @@ config X86
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_BZIP2
select HAVE_KERNEL_LZMA
+ select HAVE_HW_BREAKPOINT

config ARCH_DEFCONFIG
string
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 77df4d6..cbc7818 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -36,7 +36,7 @@ obj-$(CONFIG_X86_64) += sys_x86_64.o x8664_ksyms_64.o
obj-$(CONFIG_X86_64) += syscall_64.o vsyscall_64.o
obj-y += bootflag.o e820.o
obj-y += pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
-obj-y += alternative.o i8253.o pci-nommu.o
+obj-y += alternative.o i8253.o pci-nommu.o hw_breakpoint.o
obj-y += tsc.o io_delay.o rtc.o

obj-$(CONFIG_X86_TRAMPOLINE) += trampoline.o
diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
new file mode 100644
index 0000000..4867c9f
--- /dev/null
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -0,0 +1,382 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ * Copyright (C) 2009 IBM Corporation
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+#include <linux/irqflags.h>
+#include <linux/notifier.h>
+#include <linux/kallsyms.h>
+#include <linux/kprobes.h>
+#include <linux/percpu.h>
+#include <linux/kdebug.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/smp.h>
+
+#include <asm/hw_breakpoint.h>
+#include <asm/processor.h>
+#include <asm/debugreg.h>
+
+/* Unmasked kernel DR7 value */
+static unsigned long kdr7;
+
+/*
+ * Masks for the bits corresponding to registers DR0 - DR3 in DR7 register.
+ * Used to clear and verify the status of bits corresponding to DR0 - DR3
+ */
+static const unsigned long dr7_masks[HBP_NUM] = {
+ 0x000f0003, /* LEN0, R/W0, G0, L0 */
+ 0x00f0000c, /* LEN1, R/W1, G1, L1 */
+ 0x0f000030, /* LEN2, R/W2, G2, L2 */
+ 0xf00000c0 /* LEN3, R/W3, G3, L3 */
+};
+
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static unsigned long encode_dr7(int drnum, unsigned int len, unsigned int type)
+{
+ unsigned long bp_info;
+
+ bp_info = (len | type) & 0xf;
+ bp_info <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+ bp_info |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+ DR_GLOBAL_SLOWDOWN;
+ return bp_info;
+}
+
+void arch_update_kernel_hw_breakpoint(void *unused)
+{
+ struct hw_breakpoint *bp;
+ int i, cpu = get_cpu();
+ unsigned long temp_kdr7 = 0;
+
+ /* Don't allow debug exceptions while we update the registers */
+ set_debugreg(0UL, 7);
+
+ for (i = hbp_kernel_pos; i < HBP_NUM; i++) {
+ per_cpu(this_hbp_kernel[i], cpu) = bp = hbp_kernel[i];
+ if (bp) {
+ temp_kdr7 |= encode_dr7(i, bp->info.len, bp->info.type);
+ set_debugreg(bp->info.address, i);
+ }
+ }
+
+ /* No need to set DR6. Update the debug registers with kernel-space
+ * breakpoint values from kdr7 and user-space requests from the
+ * current process
+ */
+ kdr7 = temp_kdr7;
+ set_debugreg(kdr7 | current->thread.debugreg7, 7);
+ put_cpu_no_resched();
+}
+
+/*
+ * Install the thread breakpoints in their debug registers.
+ */
+void arch_install_thread_hw_breakpoint(struct task_struct *tsk)
+{
+ struct thread_struct *thread = &(tsk->thread);
+
+ switch (hbp_kernel_pos) {
+ case 4:
+ set_debugreg(thread->debugreg[3], 3);
+ case 3:
+ set_debugreg(thread->debugreg[2], 2);
+ case 2:
+ set_debugreg(thread->debugreg[1], 1);
+ case 1:
+ set_debugreg(thread->debugreg[0], 0);
+ default:
+ break;
+ }
+
+ /* No need to set DR6 */
+ set_debugreg((kdr7 | thread->debugreg7), 7);
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+void arch_uninstall_thread_hw_breakpoint()
+{
+ /* Clear the user-space portion of debugreg7 by setting only kdr7 */
+ set_debugreg(kdr7, 7);
+
+}
+
+static int get_hbp_len(u8 hbp_len)
+{
+ unsigned int len_in_bytes = 0;
+
+ switch (hbp_len) {
+ case HW_BREAKPOINT_LEN_1:
+ len_in_bytes = 1;
+ break;
+ case HW_BREAKPOINT_LEN_2:
+ len_in_bytes = 2;
+ break;
+ case HW_BREAKPOINT_LEN_4:
+ len_in_bytes = 4;
+ break;
+#ifdef CONFIG_X86_64
+ case HW_BREAKPOINT_LEN_8:
+ len_in_bytes = 8;
+ break;
+#endif
+ }
+ return len_in_bytes;
+}
+
+/*
+ * Check for virtual address in user space.
+ */
+int arch_check_va_in_userspace(unsigned long va, u8 hbp_len)
+{
+ unsigned int len;
+
+ len = get_hbp_len(hbp_len);
+
+ return (va <= TASK_SIZE - len);
+}
+
+/*
+ * Check for virtual address in kernel space.
+ */
+int arch_check_va_in_kernelspace(unsigned long va, u8 hbp_len)
+{
+ unsigned int len;
+
+ len = get_hbp_len(hbp_len);
+
+ return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE);
+}
+
+/*
+ * Store a breakpoint's encoded address, length, and type.
+ */
+static int arch_store_info(struct hw_breakpoint *bp, struct task_struct *tsk)
+{
+ /*
+ * User-space requests will always have the address field populated
+ * Symbol names from user-space are rejected
+ */
+ if (tsk && bp->info.name)
+ return -EINVAL;
+ /*
+ * For kernel-addresses, either the address or symbol name can be
+ * specified.
+ */
+ if (bp->info.name)
+ bp->info.address = (unsigned long)
+ kallsyms_lookup_name(bp->info.name);
+ if (bp->info.address)
+ return 0;
+ return -EINVAL;
+}
+
+/*
+ * Validate the arch-specific HW Breakpoint register settings
+ */
+int arch_validate_hwbkpt_settings(struct hw_breakpoint *bp,
+ struct task_struct *tsk)
+{
+ unsigned int align;
+ int ret = -EINVAL;
+
+ switch (bp->info.type) {
+ /*
+ * Ptrace-refactoring code
+ * For now, we'll allow instruction breakpoint only for user-space
+ * addresses
+ */
+ case HW_BREAKPOINT_EXECUTE:
+ if ((!arch_check_va_in_userspace(bp->info.address,
+ bp->info.len)) &&
+ bp->info.len != HW_BREAKPOINT_LEN_EXECUTE)
+ return ret;
+ break;
+ case HW_BREAKPOINT_WRITE:
+ break;
+ case HW_BREAKPOINT_RW:
+ break;
+ default:
+ return ret;
+ }
+
+ switch (bp->info.len) {
+ case HW_BREAKPOINT_LEN_1:
+ align = 0;
+ break;
+ case HW_BREAKPOINT_LEN_2:
+ align = 1;
+ break;
+ case HW_BREAKPOINT_LEN_4:
+ align = 3;
+ break;
+#ifdef CONFIG_X86_64
+ case HW_BREAKPOINT_LEN_8:
+ align = 7;
+ break;
+#endif
+ default:
+ return ret;
+ }
+
+ if (bp->triggered)
+ ret = arch_store_info(bp, tsk);
+
+ if (ret < 0)
+ return ret;
+ /*
+ * Check that the low-order bits of the address are appropriate
+ * for the alignment implied by len.
+ */
+ if (bp->info.address & align)
+ return -EINVAL;
+
+ /* Check that the virtual address is in the proper range */
+ if (tsk) {
+ if (!arch_check_va_in_userspace(bp->info.address, bp->info.len))
+ return -EFAULT;
+ } else {
+ if (!arch_check_va_in_kernelspace(bp->info.address,
+ bp->info.len))
+ return -EFAULT;
+ }
+ return 0;
+}
+
+void arch_update_user_hw_breakpoint(int pos, struct task_struct *tsk)
+{
+ struct thread_struct *thread = &(tsk->thread);
+ struct hw_breakpoint *bp = thread->hbp[pos];
+
+ thread->debugreg7 &= ~dr7_masks[pos];
+ if (bp) {
+ thread->debugreg[pos] = bp->info.address;
+ thread->debugreg7 |= encode_dr7(pos, bp->info.len,
+ bp->info.type);
+ } else
+ thread->debugreg[pos] = 0;
+}
+
+void arch_flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+ int i;
+ struct thread_struct *thread = &(tsk->thread);
+
+ thread->debugreg7 = 0;
+ for (i = 0; i < HBP_NUM; i++)
+ thread->debugreg[i] = 0;
+}
+
+/*
+ * Handle debug exception notifications.
+ *
+ * Return value is either NOTIFY_STOP or NOTIFY_DONE as explained below.
+ *
+ * NOTIFY_DONE returned if one of the following conditions is true.
+ * i) When the causative address is from user-space and the exception
+ * is a valid one, i.e. not triggered as a result of lazy debug register
+ * switching
+ * ii) When there are more bits than trap<n> set in DR6 register (such
+ * as BD, BS or BT) indicating that more than one debug condition is
+ * met and requires some more action in do_debug().
+ *
+ * NOTIFY_STOP returned for all other cases
+ *
+ */
+int __kprobes hw_breakpoint_handler(struct die_args *args)
+{
+ int i, cpu, rc = NOTIFY_STOP;
+ struct hw_breakpoint *bp;
+ /* The DR6 value is stored in args->err */
+ unsigned long dr7, dr6 = args->err;
+
+ /* Do an early return if no trap bits are set in DR6 */
+ if ((dr6 & DR_TRAP_BITS) == 0)
+ return NOTIFY_DONE;
+
+ /* Lazy debug register switching */
+ if (!test_tsk_thread_flag(current, TIF_DEBUG))
+ arch_uninstall_thread_hw_breakpoint();
+
+ get_debugreg(dr7, 7);
+ /* Disable breakpoints during exception handling */
+ set_debugreg(0UL, 7);
+ /*
+ * Assert that local interrupts are disabled
+ * Reset the DRn bits in the virtualized register value.
+ * The ptrace trigger routine will add in whatever is needed.
+ */
+ current->thread.debugreg6 &= ~DR_TRAP_BITS;
+ cpu = get_cpu();
+
+ /* Handle all the breakpoints that were triggered */
+ for (i = 0; i < HBP_NUM; ++i) {
+ if (likely(!(dr6 & (DR_TRAP0 << i))))
+ continue;
+ /*
+ * Find the corresponding hw_breakpoint structure and
+ * invoke its triggered callback.
+ */
+ if (i >= hbp_kernel_pos)
+ bp = per_cpu(this_hbp_kernel[i], cpu);
+ else {
+ bp = current->thread.hbp[i];
+ if (bp)
+ rc = NOTIFY_DONE;
+ }
+ /*
+ * bp can be NULL due to lazy debug register switching
+ * or due to the delay between updates of hbp_kernel_pos
+ * and this_hbp_kernel.
+ */
+ if (!bp)
+ continue;
+
+ (bp->triggered)(bp, args->regs);
+ }
+ if (dr6 & (~DR_TRAP_BITS))
+ rc = NOTIFY_DONE;
+
+ set_debugreg(dr7, 7);
+ put_cpu_no_resched();
+ return rc;
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+int __kprobes hw_breakpoint_exceptions_notify(
+ struct notifier_block *unused, unsigned long val, void *data)
+{
+ if (val != DIE_DEBUG)
+ return NOTIFY_DONE;
+
+ return hw_breakpoint_handler(data);
+}
--
1.6.2.3

2009-06-02 22:44:52

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 04/12] hw-breakpoints: modifying generic debug exception to use thread-specific debug registers

From: K.Prasad <[email protected]>

This patch modifies the breakpoint exception handler code to use the new
abstract debug register names.

[ [email protected]: fix conflict against kmemcheck ]

[ Impact: refactor and cleanup x86 debug exception handler ]

Original-patch-by: Alan Stern <[email protected]>
Signed-off-by: K.Prasad <[email protected]>
Reviewed-by: Alan Stern <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/traps.c | 69 ++++++++++++++++------------------------------
1 files changed, 24 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a1d2883..de99132 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -529,73 +529,52 @@ asmlinkage __kprobes struct pt_regs *sync_regs(struct pt_regs *eregs)
dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
{
struct task_struct *tsk = current;
- unsigned long condition;
+ unsigned long dr6;
int si_code;

- get_debugreg(condition, 6);
+ get_debugreg(dr6, 6);

+ /* DR6 may or may not be cleared by the CPU */
+ set_debugreg(0, 6);
/*
* The processor cleared BTF, so don't mark that we need it set.
*/
clear_tsk_thread_flag(tsk, TIF_DEBUGCTLMSR);
tsk->thread.debugctlmsr = 0;

- if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
+ /* Store the virtualized DR6 value */
+ tsk->thread.debugreg6 = dr6;
+
+ if (notify_die(DIE_DEBUG, "debug", regs, dr6, error_code,
SIGTRAP) == NOTIFY_STOP)
return;

/* It's safe to allow irq's after DR6 has been saved */
preempt_conditional_sti(regs);

- /* Mask out spurious debug traps due to lazy DR7 setting */
- if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
- if (!tsk->thread.debugreg7)
- goto clear_dr7;
+ if (regs->flags & X86_VM_MASK) {
+ handle_vm86_trap((struct kernel_vm86_regs *) regs,
+ error_code, 1);
+ return;
}

-#ifdef CONFIG_X86_32
- if (regs->flags & X86_VM_MASK)
- goto debug_vm86;
-#endif
-
- /* Save debug status register where ptrace can see it */
- tsk->thread.debugreg6 = condition;
-
/*
- * Single-stepping through TF: make sure we ignore any events in
- * kernel space (but re-enable TF when returning to user mode).
+ * Single-stepping through system calls: ignore any exceptions in
+ * kernel space, but re-enable TF when returning to user mode.
+ *
+ * We already checked v86 mode above, so we can check for kernel mode
+ * by just checking the CPL of CS.
*/
- if (condition & DR_STEP) {
- if (!user_mode(regs))
- goto clear_TF_reenable;
+ if ((dr6 & DR_STEP) && !user_mode(regs)) {
+ tsk->thread.debugreg6 &= ~DR_STEP;
+ set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+ regs->flags &= ~X86_EFLAGS_TF;
}
-
- si_code = get_si_code(condition);
- /* Ok, finally something we can handle */
- send_sigtrap(tsk, regs, error_code, si_code);
-
- /*
- * Disable additional traps. They'll be re-enabled when
- * the signal is delivered.
- */
-clear_dr7:
- set_debugreg(0, 7);
+ si_code = get_si_code(tsk->thread.debugreg6);
+ if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
+ send_sigtrap(tsk, regs, error_code, si_code);
preempt_conditional_cli(regs);
- return;

-#ifdef CONFIG_X86_32
-debug_vm86:
- /* reenable preemption: handle_vm86_trap() might sleep */
- dec_preempt_count();
- handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
- conditional_cli(regs);
- return;
-#endif
-
-clear_TF_reenable:
- set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
- regs->flags &= ~X86_EFLAGS_TF;
- preempt_conditional_cli(regs);
return;
}

--
1.6.2.3

2009-06-02 22:45:07

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 06/12] hw-breakpoints: use the new wrapper routines to access debug registers in process/thread code

From: K.Prasad <[email protected]>

This patch enables the use of abstract debug registers in
process-handling routines, according to the new hardware breakpoint
Api.

[ Impact: adapt thread breakpoints handling code to the new breakpoint Api ]

Original-patch-by: Alan Stern <[email protected]>
Signed-off-by: K.Prasad <[email protected]>
Reviewed-by: Alan Stern <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/process.c | 22 ++++++----------------
arch/x86/kernel/process_32.c | 28 ++++++++++++++++++++++++++++
arch/x86/kernel/process_64.c | 31 +++++++++++++++++++++++++++++++
3 files changed, 65 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 291527c..19a686c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -15,6 +15,8 @@
#include <asm/uaccess.h>
#include <asm/i387.h>
#include <asm/ds.h>
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>

unsigned long idle_halt;
EXPORT_SYMBOL(idle_halt);
@@ -46,6 +48,8 @@ void free_thread_xstate(struct task_struct *tsk)
kmem_cache_free(task_xstate_cachep, tsk->thread.xstate);
tsk->thread.xstate = NULL;
}
+ if (unlikely(test_tsk_thread_flag(tsk, TIF_DEBUG)))
+ flush_thread_hw_breakpoint(tsk);

WARN(tsk->thread.ds_ctx, "leaking DS context\n");
}
@@ -106,12 +110,8 @@ void flush_thread(void)

clear_tsk_thread_flag(tsk, TIF_DEBUG);

- tsk->thread.debugreg[0] = 0;
- tsk->thread.debugreg[1] = 0;
- tsk->thread.debugreg[2] = 0;
- tsk->thread.debugreg[3] = 0;
- tsk->thread.debugreg6 = 0;
- tsk->thread.debugreg7 = 0;
+ if (unlikely(test_tsk_thread_flag(tsk, TIF_DEBUG)))
+ flush_thread_hw_breakpoint(tsk);
memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
/*
* Forget coprocessor state..
@@ -193,16 +193,6 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
else if (next->debugctlmsr != prev->debugctlmsr)
update_debugctlmsr(next->debugctlmsr);

- if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
- set_debugreg(next->debugreg[0], 0);
- set_debugreg(next->debugreg[1], 1);
- set_debugreg(next->debugreg[2], 2);
- set_debugreg(next->debugreg[3], 3);
- /* no 4 and 5 */
- set_debugreg(next->debugreg6, 6);
- set_debugreg(next->debugreg7, 7);
- }
-
if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
test_tsk_thread_flag(next_p, TIF_NOTSC)) {
/* prev and next are different */
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index b5e4bfe..297ffff 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -61,6 +61,8 @@
#include <asm/idle.h>
#include <asm/syscalls.h>
#include <asm/ds.h>
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>

asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");

@@ -265,7 +267,13 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,

task_user_gs(p) = get_user_gs(regs);

+ p->thread.io_bitmap_ptr = NULL;
tsk = current;
+ err = -ENOMEM;
+ if (unlikely(test_tsk_thread_flag(tsk, TIF_DEBUG)))
+ if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
+ goto out;
+
if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
IO_BITMAP_BYTES, GFP_KERNEL);
@@ -285,10 +293,13 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
err = do_set_thread_area(p, -1,
(struct user_desc __user *)childregs->si, 0);

+out:
if (err && p->thread.io_bitmap_ptr) {
kfree(p->thread.io_bitmap_ptr);
p->thread.io_bitmap_max = 0;
}
+ if (err)
+ flush_thread_hw_breakpoint(p);

clear_tsk_thread_flag(p, TIF_DS_AREA_MSR);
p->thread.ds_ctx = NULL;
@@ -427,6 +438,23 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
lazy_load_gs(next->gs);

percpu_write(current_task, next_p);
+ /*
+ * There's a problem with moving the arch_install_thread_hw_breakpoint()
+ * call before current is updated. Suppose a kernel breakpoint is
+ * triggered in between the two, the hw-breakpoint handler will see that
+ * the 'current' task does not have TIF_DEBUG flag set and will think it
+ * is leftover from an old task (lazy switching) and will erase it. Then
+ * until the next context switch, no user-breakpoints will be installed.
+ *
+ * The real problem is that it's impossible to update both current and
+ * physical debug registers at the same instant, so there will always be
+ * a window in which they disagree and a breakpoint might get triggered.
+ * Since we use lazy switching, we are forced to assume that a
+ * disagreement means that current is correct and the exception is due
+ * to lazy debug register switching.
+ */
+ if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+ arch_install_thread_hw_breakpoint(next_p);

return prev_p;
}
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 5a1a1de..f7b276d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -55,6 +55,8 @@
#include <asm/idle.h>
#include <asm/syscalls.h>
#include <asm/ds.h>
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>

asmlinkage extern void ret_from_fork(void);

@@ -248,6 +250,8 @@ void release_thread(struct task_struct *dead_task)
BUG();
}
}
+ if (unlikely(dead_task->thread.debugreg7))
+ flush_thread_hw_breakpoint(dead_task);
}

static inline void set_32bit_tls(struct task_struct *t, int tls, u32 addr)
@@ -303,12 +307,18 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,

p->thread.fs = me->thread.fs;
p->thread.gs = me->thread.gs;
+ p->thread.io_bitmap_ptr = NULL;

savesegment(gs, p->thread.gsindex);
savesegment(fs, p->thread.fsindex);
savesegment(es, p->thread.es);
savesegment(ds, p->thread.ds);

+ err = -ENOMEM;
+ if (unlikely(test_tsk_thread_flag(me, TIF_DEBUG)))
+ if (copy_thread_hw_breakpoint(me, p, clone_flags))
+ goto out;
+
if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
p->thread.io_bitmap_ptr = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
if (!p->thread.io_bitmap_ptr) {
@@ -347,6 +357,9 @@ out:
kfree(p->thread.io_bitmap_ptr);
p->thread.io_bitmap_max = 0;
}
+ if (err)
+ flush_thread_hw_breakpoint(p);
+
return err;
}

@@ -492,6 +505,24 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
*/
if (tsk_used_math(next_p) && next_p->fpu_counter > 5)
math_state_restore();
+ /*
+ * There's a problem with moving the arch_install_thread_hw_breakpoint()
+ * call before current is updated. Suppose a kernel breakpoint is
+ * triggered in between the two, the hw-breakpoint handler will see that
+ * the 'current' task does not have TIF_DEBUG flag set and will think it
+ * is leftover from an old task (lazy switching) and will erase it. Then
+ * until the next context switch, no user-breakpoints will be installed.
+ *
+ * The real problem is that it's impossible to update both current and
+ * physical debug registers at the same instant, so there will always be
+ * a window in which they disagree and a breakpoint might get triggered.
+ * Since we use lazy switching, we are forced to assume that a
+ * disagreement means that current is correct and the exception is due
+ * to lazy debug register switching.
+ */
+ if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+ arch_install_thread_hw_breakpoint(next_p);
+
return prev_p;
}

--
1.6.2.3

2009-06-02 22:45:29

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 05/12] hw-breakpoints: use wrapper routines around debug registers in processor related functions

From: K.Prasad <[email protected]>

This patch enables the use of wrapper routines to access the debug/breakpoint
registers on cpu management.

The hardcoded debug registers save and restore operations for threads
breakpoints are replaced by wrappers.

And now that we handle the kernel breakpoints too, we also need to handle them
on cpu hotplug operations.

[ Impact: adapt new hardware breakpoint api to cpu hotplug ]

Original-patch-by: Alan Stern <[email protected]>
Signed-off-by: K.Prasad <[email protected]>
Reviewed-by: Alan Stern <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/smpboot.c | 3 +++
arch/x86/power/cpu_32.c | 13 +++----------
arch/x86/power/cpu_64.c | 12 +++---------
3 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 58d24ef..2b2652d 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -63,6 +63,7 @@
#include <asm/apic.h>
#include <asm/setup.h>
#include <asm/uv/uv.h>
+#include <asm/debugreg.h>
#include <linux/mc146818rtc.h>

#include <asm/smpboot_hooks.h>
@@ -326,6 +327,7 @@ notrace static void __cpuinit start_secondary(void *unused)
setup_secondary_clock();

wmb();
+ load_debug_registers();
cpu_idle();
}

@@ -1250,6 +1252,7 @@ void cpu_disable_common(void)
remove_cpu_from_maps(cpu);
unlock_vector_lock();
fixup_irqs();
+ hw_breakpoint_disable();
}

int native_cpu_disable(void)
diff --git a/arch/x86/power/cpu_32.c b/arch/x86/power/cpu_32.c
index 5199139..2bc3b01 100644
--- a/arch/x86/power/cpu_32.c
+++ b/arch/x86/power/cpu_32.c
@@ -13,6 +13,7 @@
#include <asm/mce.h>
#include <asm/xcr.h>
#include <asm/suspend.h>
+#include <asm/debugreg.h>

static struct saved_context saved_context;

@@ -48,6 +49,7 @@ static void __save_processor_state(struct saved_context *ctxt)
ctxt->cr2 = read_cr2();
ctxt->cr3 = read_cr3();
ctxt->cr4 = read_cr4_safe();
+ hw_breakpoint_disable();
}

/* Needed by apm.c */
@@ -83,16 +85,7 @@ static void fix_processor_context(void)
/*
* Now maybe reload the debug registers
*/
- if (current->thread.debugreg7) {
- set_debugreg(current->thread.debugreg[0], 0);
- set_debugreg(current->thread.debugreg[1], 1);
- set_debugreg(current->thread.debugreg[2], 2);
- set_debugreg(current->thread.debugreg[3], 3);
- /* no 4 and 5 */
- set_debugreg(current->thread.debugreg6, 6);
- set_debugreg(current->thread.debugreg7, 7);
- }
-
+ load_debug_registers();
}

static void __restore_processor_state(struct saved_context *ctxt)
diff --git a/arch/x86/power/cpu_64.c b/arch/x86/power/cpu_64.c
index 1e3bdcc..46866a1 100644
--- a/arch/x86/power/cpu_64.c
+++ b/arch/x86/power/cpu_64.c
@@ -16,6 +16,7 @@
#include <asm/mtrr.h>
#include <asm/xcr.h>
#include <asm/suspend.h>
+#include <asm/debugreg.h>

static void fix_processor_context(void);

@@ -71,6 +72,7 @@ static void __save_processor_state(struct saved_context *ctxt)
ctxt->cr3 = read_cr3();
ctxt->cr4 = read_cr4();
ctxt->cr8 = read_cr8();
+ hw_breakpoint_disable();
}

void save_processor_state(void)
@@ -162,13 +164,5 @@ static void fix_processor_context(void)
/*
* Now maybe reload the debug registers
*/
- if (current->thread.debugreg7){
- set_debugreg(current->thread.debugreg[0], 0);
- set_debugreg(current->thread.debugreg[1], 1);
- set_debugreg(current->thread.debugreg[2], 2);
- set_debugreg(current->thread.debugreg[3], 3);
- /* no 4 and 5 */
- loaddebug(&current->thread, 6);
- loaddebug(&current->thread, 7);
- }
+ load_debug_registers();
}
--
1.6.2.3

2009-06-02 22:45:44

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 07/12] hw-breakpoints: modify signal handling code to refrain from re-enabling HW Breakpoints

From: K.Prasad <[email protected]>

This patch disables re-enabling of Hardware Breakpoint registers through
the signal handling code. This is now done during from hw_breakpoint_handler().

Original-patch-by: Alan Stern <[email protected]>
Signed-off-by: K.Prasad <[email protected]>
Reviewed-by: Alan Stern <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/signal.c | 9 ---------
1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 1442516..f33d2e0 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -800,15 +800,6 @@ static void do_signal(struct pt_regs *regs)

signr = get_signal_to_deliver(&info, &ka, regs, NULL);
if (signr > 0) {
- /*
- * Re-enable any watchpoints before delivering the
- * signal to user space. The processor register will
- * have been cleared if the watchpoint triggered
- * inside the kernel.
- */
- if (current->thread.debugreg7)
- set_debugreg(current->thread.debugreg7, 7);
-
/* Whee! Actually deliver the signal. */
if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
/*
--
1.6.2.3

2009-06-02 22:45:56

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 08/12] hw-breakpoints: modify Ptrace routines to access breakpoint registers

From: K.Prasad <[email protected]>

This patch modifies the ptrace code to use the new wrapper routines around the
debug/breakpoint registers.

[ Impact: adapt x86 ptrace to the new breakpoint Api ]

Original-patch-by: Alan Stern <[email protected]>
Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Maneesh Soni <[email protected]>
Reviewed-by: Alan Stern <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/ptrace.c | 231 ++++++++++++++++++++++++++++------------------
1 files changed, 141 insertions(+), 90 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 313be40..b457f78 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -34,6 +34,7 @@
#include <asm/prctl.h>
#include <asm/proto.h>
#include <asm/ds.h>
+#include <asm/hw_breakpoint.h>

#include <trace/syscall.h>

@@ -136,11 +137,6 @@ static int set_segment_reg(struct task_struct *task,
return 0;
}

-static unsigned long debugreg_addr_limit(struct task_struct *task)
-{
- return TASK_SIZE - 3;
-}
-
#else /* CONFIG_X86_64 */

#define FLAG_MASK (FLAG_MASK_32 | X86_EFLAGS_NT)
@@ -265,15 +261,6 @@ static int set_segment_reg(struct task_struct *task,
return 0;
}

-static unsigned long debugreg_addr_limit(struct task_struct *task)
-{
-#ifdef CONFIG_IA32_EMULATION
- if (test_tsk_thread_flag(task, TIF_IA32))
- return IA32_PAGE_OFFSET - 3;
-#endif
- return TASK_SIZE_MAX - 7;
-}
-
#endif /* CONFIG_X86_32 */

static unsigned long get_flags(struct task_struct *task)
@@ -464,95 +451,159 @@ static int genregs_set(struct task_struct *target,
}

/*
- * This function is trivial and will be inlined by the compiler.
- * Having it separates the implementation details of debug
- * registers from the interface details of ptrace.
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7. Return the "enabled" status.
*/
-static unsigned long ptrace_get_debugreg(struct task_struct *child, int n)
+static int decode_dr7(unsigned long dr7, int bpnum, unsigned *len,
+ unsigned *type)
{
- switch (n) {
- case 0: return child->thread.debugreg[0];
- case 1: return child->thread.debugreg[1];
- case 2: return child->thread.debugreg[2];
- case 3: return child->thread.debugreg[3];
- case 6: return child->thread.debugreg6;
- case 7: return child->thread.debugreg7;
- }
- return 0;
+ int bp_info = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+
+ *len = (bp_info & 0xc) | 0x40;
+ *type = (bp_info & 0x3) | 0x80;
+ return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
}

-static int ptrace_set_debugreg(struct task_struct *child,
- int n, unsigned long data)
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
{
+ struct thread_struct *thread = &(current->thread);
int i;

- if (unlikely(n == 4 || n == 5))
- return -EIO;
+ /*
+ * Store in the virtual DR6 register the fact that the breakpoint
+ * was hit so the thread's debugger will see it.
+ */
+ for (i = 0; i < hbp_kernel_pos; i++)
+ /*
+ * We will check bp->info.address against the address stored in
+ * thread's hbp structure and not debugreg[i]. This is to ensure
+ * that the corresponding bit for 'i' in DR7 register is enabled
+ */
+ if (bp->info.address == thread->hbp[i]->info.address)
+ break;

- if (n < 4 && unlikely(data >= debugreg_addr_limit(child)))
- return -EIO;
+ thread->debugreg6 |= (DR_TRAP0 << i);
+}

- switch (n) {
- case 0: child->thread.debugreg[0] = data; break;
- case 1: child->thread.debugreg[1] = data; break;
- case 2: child->thread.debugreg[2] = data; break;
- case 3: child->thread.debugreg[3] = data; break;
+/*
+ * Handle ptrace writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
+{
+ struct thread_struct *thread = &(tsk->thread);
+ unsigned long old_dr7 = thread->debugreg7;
+ int i, orig_ret = 0, rc = 0;
+ int enabled, second_pass = 0;
+ unsigned len, type;
+ struct hw_breakpoint *bp;
+
+ data &= ~DR_CONTROL_RESERVED;
+restore:
+ /*
+ * Loop through all the hardware breakpoints, making the
+ * appropriate changes to each.
+ */
+ for (i = 0; i < HBP_NUM; i++) {
+ enabled = decode_dr7(data, i, &len, &type);
+ bp = thread->hbp[i];
+
+ if (!enabled) {
+ if (bp) {
+ /* Don't unregister the breakpoints right-away,
+ * unless all register_user_hw_breakpoint()
+ * requests have succeeded. This prevents
+ * any window of opportunity for debug
+ * register grabbing by other users.
+ */
+ if (!second_pass)
+ continue;
+ unregister_user_hw_breakpoint(tsk, bp);
+ kfree(bp);
+ }
+ continue;
+ }
+ if (!bp) {
+ rc = -ENOMEM;
+ bp = kzalloc(sizeof(struct hw_breakpoint), GFP_KERNEL);
+ if (bp) {
+ bp->info.address = thread->debugreg[i];
+ bp->triggered = ptrace_triggered;
+ bp->info.len = len;
+ bp->info.type = type;
+ rc = register_user_hw_breakpoint(tsk, bp);
+ if (rc)
+ kfree(bp);
+ }
+ } else
+ rc = modify_user_hw_breakpoint(tsk, bp);
+ if (rc)
+ break;
+ }
+ /*
+ * Make a second pass to free the remaining unused breakpoints
+ * or to restore the original breakpoints if an error occurred.
+ */
+ if (!second_pass) {
+ second_pass = 1;
+ if (rc < 0) {
+ orig_ret = rc;
+ data = old_dr7;
+ }
+ goto restore;
+ }
+ return ((orig_ret < 0) ? orig_ret : rc);
+}

- case 6:
- if ((data & ~0xffffffffUL) != 0)
- return -EIO;
- child->thread.debugreg6 = data;
- break;
+/*
+ * Handle PTRACE_PEEKUSR calls for the debug register area.
+ */
+unsigned long ptrace_get_debugreg(struct task_struct *tsk, int n)
+{
+ struct thread_struct *thread = &(tsk->thread);
+ unsigned long val = 0;
+
+ if (n < HBP_NUM)
+ val = thread->debugreg[n];
+ else if (n == 6)
+ val = thread->debugreg6;
+ else if (n == 7)
+ val = thread->debugreg7;
+ return val;
+}

- case 7:
- /*
- * Sanity-check data. Take one half-byte at once with
- * check = (val >> (16 + 4*i)) & 0xf. It contains the
- * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
- * 2 and 3 are LENi. Given a list of invalid values,
- * we do mask |= 1 << invalid_value, so that
- * (mask >> check) & 1 is a correct test for invalid
- * values.
- *
- * R/Wi contains the type of the breakpoint /
- * watchpoint, LENi contains the length of the watched
- * data in the watchpoint case.
- *
- * The invalid values are:
- * - LENi == 0x10 (undefined), so mask |= 0x0f00. [32-bit]
- * - R/Wi == 0x10 (break on I/O reads or writes), so
- * mask |= 0x4444.
- * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
- * 0x1110.
- *
- * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
- *
- * See the Intel Manual "System Programming Guide",
- * 15.2.4
- *
- * Note that LENi == 0x10 is defined on x86_64 in long
- * mode (i.e. even for 32-bit userspace software, but
- * 64-bit kernel), so the x86_64 mask value is 0x5454.
- * See the AMD manual no. 24593 (AMD64 System Programming)
- */
-#ifdef CONFIG_X86_32
-#define DR7_MASK 0x5f54
-#else
-#define DR7_MASK 0x5554
-#endif
- data &= ~DR_CONTROL_RESERVED;
- for (i = 0; i < 4; i++)
- if ((DR7_MASK >> ((data >> (16 + 4*i)) & 0xf)) & 1)
- return -EIO;
- child->thread.debugreg7 = data;
- if (data)
- set_tsk_thread_flag(child, TIF_DEBUG);
- else
- clear_tsk_thread_flag(child, TIF_DEBUG);
- break;
+/*
+ * Handle PTRACE_POKEUSR calls for the debug register area.
+ */
+int ptrace_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
+{
+ struct thread_struct *thread = &(tsk->thread);
+ int rc = 0;
+
+ /* There are no DR4 or DR5 registers */
+ if (n == 4 || n == 5)
+ return -EIO;
+
+ if (n == 6) {
+ tsk->thread.debugreg6 = val;
+ goto ret_path;
}
+ if (n < HBP_NUM) {
+ if (thread->hbp[n]) {
+ if (arch_check_va_in_userspace(val,
+ thread->hbp[n]->info.len) == 0) {
+ rc = -EIO;
+ goto ret_path;
+ }
+ thread->hbp[n]->info.address = val;
+ }
+ thread->debugreg[n] = val;
+ }
+ /* All that's left is DR7 */
+ if (n == 7)
+ rc = ptrace_write_dr7(tsk, val);

- return 0;
+ret_path:
+ return rc;
}

/*
--
1.6.2.3

2009-06-02 22:46:15

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 09/12] hw-breakpoints: cleanup HW Breakpoint registers before kexec

From: K.Prasad <[email protected]>

This patch disables Hardware breakpoints before doing a 'kexec' on the machine
so that the cpu doesn't keep debug registers values which would be out of
sync for the new image.

Original-patch-by: Alan Stern <[email protected]>
Signed-off-by: K.Prasad <[email protected]>
Reviewed-by: Alan Stern <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/machine_kexec_32.c | 2 ++
arch/x86/kernel/machine_kexec_64.c | 2 ++
2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
index c1c429d..c843f84 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -25,6 +25,7 @@
#include <asm/desc.h>
#include <asm/system.h>
#include <asm/cacheflush.h>
+#include <asm/debugreg.h>

static void set_idt(void *newidt, __u16 limit)
{
@@ -202,6 +203,7 @@ void machine_kexec(struct kimage *image)

/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
+ hw_breakpoint_disable();

if (image->preserve_context) {
#ifdef CONFIG_X86_IO_APIC
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 84c3bf2..4a8bb82 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -18,6 +18,7 @@
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
+#include <asm/debugreg.h>

static int init_one_level2_page(struct kimage *image, pgd_t *pgd,
unsigned long addr)
@@ -282,6 +283,7 @@ void machine_kexec(struct kimage *image)

/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
+ hw_breakpoint_disable();

if (image->preserve_context) {
#ifdef CONFIG_X86_IO_APIC
--
1.6.2.3

2009-06-02 22:46:28

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 11/12] hw-breakpoints: ftrace plugin for kernel symbol tracing using HW Breakpoint interfaces

From: K.Prasad <[email protected]>

This patch adds an ftrace plugin to detect and profile memory access over kernel
variables. It uses HW Breakpoint interfaces to 'watch memory addresses.

Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/trace/Kconfig | 21 ++
kernel/trace/Makefile | 1 +
kernel/trace/trace.h | 23 ++
kernel/trace/trace_ksym.c | 525 +++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace_selftest.c | 53 ++++
5 files changed, 623 insertions(+), 0 deletions(-)
create mode 100644 kernel/trace/trace_ksym.c

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index a508b9d..d7f01e6 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -314,6 +314,27 @@ config POWER_TRACER
power management decisions, specifically the C-state and P-state
behavior.

+config KSYM_TRACER
+ bool "Trace read and write access on kernel memory locations"
+ depends on HAVE_HW_BREAKPOINT
+ select TRACING
+ help
+ This tracer helps find read and write operations on any given kernel
+ symbol i.e. /proc/kallsyms.
+
+config PROFILE_KSYM_TRACER
+ bool "Profile all kernel memory accesses on 'watched' variables"
+ depends on KSYM_TRACER
+ help
+ This tracer profiles kernel accesses on variables watched through the
+ ksym tracer ftrace plugin. Depending upon the hardware, all read
+ and write operations on kernel variables can be monitored for
+ accesses.
+
+ The results will be displayed in:
+ /debugfs/tracing/profile_ksym
+
+ Say N if unsure.

config STACK_TRACER
bool "Trace max stack"
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 06b8585..658aace 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -51,5 +51,6 @@ obj-$(CONFIG_EVENT_TRACING) += trace_export.o
obj-$(CONFIG_FTRACE_SYSCALLS) += trace_syscalls.o
obj-$(CONFIG_EVENT_PROFILE) += trace_event_profile.o
obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
+obj-$(CONFIG_KSYM_TRACER) += trace_ksym.o

libftrace-y := ftrace.o
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 6e735d4..7d5cc37 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -15,6 +15,10 @@
#include <linux/trace_seq.h>
#include <linux/ftrace_event.h>

+#ifdef CONFIG_KSYM_TRACER
+#include <asm/hw_breakpoint.h>
+#endif
+
enum trace_type {
__TRACE_FIRST_TYPE = 0,

@@ -40,6 +44,7 @@ enum trace_type {
TRACE_KMEM_FREE,
TRACE_POWER,
TRACE_BLK,
+ TRACE_KSYM,

__TRACE_LAST_TYPE,
};
@@ -207,6 +212,21 @@ struct syscall_trace_exit {
unsigned long ret;
};

+#define KSYM_SELFTEST_ENTRY "ksym_selftest_dummy"
+extern int process_new_ksym_entry(char *ksymname, int op, unsigned long addr);
+
+struct trace_ksym {
+ struct trace_entry ent;
+ struct hw_breakpoint *ksym_hbp;
+ unsigned long ksym_addr;
+ unsigned long ip;
+#ifdef CONFIG_PROFILE_KSYM_TRACER
+ unsigned long counter;
+#endif
+ struct hlist_node ksym_hlist;
+ char ksym_name[KSYM_NAME_LEN];
+ char p_name[TASK_COMM_LEN];
+};

/*
* trace_flag_type is an enumeration that holds different
@@ -323,6 +343,7 @@ extern void __ftrace_bad_type(void);
TRACE_SYSCALL_ENTER); \
IF_ASSIGN(var, ent, struct syscall_trace_exit, \
TRACE_SYSCALL_EXIT); \
+ IF_ASSIGN(var, ent, struct trace_ksym, TRACE_KSYM); \
__ftrace_bad_type(); \
} while (0)

@@ -540,6 +561,8 @@ extern int trace_selftest_startup_branch(struct tracer *trace,
struct trace_array *tr);
extern int trace_selftest_startup_hw_branches(struct tracer *trace,
struct trace_array *tr);
+extern int trace_selftest_startup_ksym(struct tracer *trace,
+ struct trace_array *tr);
#endif /* CONFIG_FTRACE_STARTUP_TEST */

extern void *head_page(struct trace_array_cpu *data);
diff --git a/kernel/trace/trace_ksym.c b/kernel/trace/trace_ksym.c
new file mode 100644
index 0000000..11c74f6
--- /dev/null
+++ b/kernel/trace/trace_ksym.c
@@ -0,0 +1,525 @@
+/*
+ * trace_ksym.c - Kernel Symbol Tracer
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2009
+ */
+
+#include <linux/kallsyms.h>
+#include <linux/uaccess.h>
+#include <linux/debugfs.h>
+#include <linux/ftrace.h>
+#include <linux/module.h>
+#include <linux/fs.h>
+
+#include "trace_output.h"
+#include "trace_stat.h"
+#include "trace.h"
+
+/* For now, let us restrict the no. of symbols traced simultaneously to number
+ * of available hardware breakpoint registers.
+ */
+#define KSYM_TRACER_MAX HBP_NUM
+
+#define KSYM_TRACER_OP_LEN 3 /* rw- */
+#define KSYM_FILTER_ENTRY_LEN (KSYM_NAME_LEN + KSYM_TRACER_OP_LEN + 1)
+
+static struct trace_array *ksym_trace_array;
+
+static unsigned int ksym_filter_entry_count;
+static unsigned int ksym_tracing_enabled;
+
+static HLIST_HEAD(ksym_filter_head);
+
+#ifdef CONFIG_PROFILE_KSYM_TRACER
+
+#define MAX_UL_INT 0xffffffff
+
+static DEFINE_MUTEX(ksym_tracer_mutex);
+
+void ksym_collect_stats(unsigned long hbp_hit_addr)
+{
+ struct hlist_node *node;
+ struct trace_ksym *entry;
+
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(entry, node, &ksym_filter_head, ksym_hlist) {
+ if ((entry->ksym_addr == hbp_hit_addr) &&
+ (entry->counter <= MAX_UL_INT)) {
+ entry->counter++;
+ break;
+ }
+ }
+ rcu_read_unlock();
+}
+#endif /* CONFIG_PROFILE_KSYM_TRACER */
+
+void ksym_hbp_handler(struct hw_breakpoint *hbp, struct pt_regs *regs)
+{
+ struct ring_buffer_event *event;
+ struct trace_array *tr;
+ struct trace_ksym *entry;
+ int pc;
+
+ if (!ksym_tracing_enabled)
+ return;
+
+ tr = ksym_trace_array;
+ pc = preempt_count();
+
+ event = trace_buffer_lock_reserve(tr, TRACE_KSYM,
+ sizeof(*entry), 0, pc);
+ if (!event)
+ return;
+
+ entry = ring_buffer_event_data(event);
+ strlcpy(entry->ksym_name, hbp->info.name, KSYM_SYMBOL_LEN);
+ entry->ksym_hbp = hbp;
+ entry->ip = instruction_pointer(regs);
+ strlcpy(entry->p_name, current->comm, TASK_COMM_LEN);
+#ifdef CONFIG_PROFILE_KSYM_TRACER
+ ksym_collect_stats(hbp->info.address);
+#endif /* CONFIG_PROFILE_KSYM_TRACER */
+
+ trace_buffer_unlock_commit(tr, event, 0, pc);
+}
+
+/* Valid access types are represented as
+ *
+ * rw- : Set Read/Write Access Breakpoint
+ * -w- : Set Write Access Breakpoint
+ * --- : Clear Breakpoints
+ * --x : Set Execution Break points (Not available yet)
+ *
+ */
+static int ksym_trace_get_access_type(char *access_str)
+{
+ int pos, access = 0;
+
+ for (pos = 0; pos < KSYM_TRACER_OP_LEN; pos++) {
+ switch (access_str[pos]) {
+ case 'r':
+ access += (pos == 0) ? 4 : -1;
+ break;
+ case 'w':
+ access += (pos == 1) ? 2 : -1;
+ break;
+ case '-':
+ break;
+ default:
+ return -EINVAL;
+ }
+ }
+
+ switch (access) {
+ case 6:
+ access = HW_BREAKPOINT_RW;
+ break;
+ case 2:
+ access = HW_BREAKPOINT_WRITE;
+ break;
+ case 0:
+ access = 0;
+ }
+
+ return access;
+}
+
+/*
+ * There can be several possible malformed requests and we attempt to capture
+ * all of them. We enumerate some of the rules
+ * 1. We will not allow kernel symbols with ':' since it is used as a delimiter.
+ * i.e. multiple ':' symbols disallowed. Possible uses are of the form
+ * <module>:<ksym_name>:<op>.
+ * 2. No delimiter symbol ':' in the input string
+ * 3. Spurious operator symbols or symbols not in their respective positions
+ * 4. <ksym_name>:--- i.e. clear breakpoint request when ksym_name not in file
+ * 5. Kernel symbol not a part of /proc/kallsyms
+ * 6. Duplicate requests
+ */
+static int parse_ksym_trace_str(char *input_string, char **ksymname,
+ unsigned long *addr)
+{
+ char *delimiter = ":";
+ int ret;
+
+ ret = -EINVAL;
+ *ksymname = strsep(&input_string, delimiter);
+ *addr = kallsyms_lookup_name(*ksymname);
+
+ /* Check for malformed request: (2), (1) and (5) */
+ if ((!input_string) ||
+ (strlen(input_string) != (KSYM_TRACER_OP_LEN + 1)) ||
+ (*addr == 0))
+ goto return_code;
+ ret = ksym_trace_get_access_type(input_string);
+
+return_code:
+ return ret;
+}
+
+int process_new_ksym_entry(char *ksymname, int op, unsigned long addr)
+{
+ struct trace_ksym *entry;
+ int ret;
+
+ if (ksym_filter_entry_count >= KSYM_TRACER_MAX) {
+ printk(KERN_ERR "ksym_tracer: Maximum limit:(%d) reached. No"
+ " new requests for tracing can be accepted now.\n",
+ KSYM_TRACER_MAX);
+ return -ENOSPC;
+ }
+
+ entry = kzalloc(sizeof(struct trace_ksym), GFP_KERNEL);
+ if (!entry)
+ return -ENOMEM;
+
+ entry->ksym_hbp = kzalloc(sizeof(struct hw_breakpoint), GFP_KERNEL);
+ if (!entry->ksym_hbp) {
+ kfree(entry);
+ return -ENOMEM;
+ }
+
+ entry->ksym_hbp->info.name = ksymname;
+ entry->ksym_hbp->info.type = op;
+ entry->ksym_addr = entry->ksym_hbp->info.address = addr;
+#ifdef CONFIG_X86
+ entry->ksym_hbp->info.len = HW_BREAKPOINT_LEN_4;
+#endif
+ entry->ksym_hbp->triggered = (void *)ksym_hbp_handler;
+
+ ret = register_kernel_hw_breakpoint(entry->ksym_hbp);
+ if (ret < 0) {
+ printk(KERN_INFO "ksym_tracer request failed. Try again"
+ " later!!\n");
+ kfree(entry->ksym_hbp);
+ kfree(entry);
+ return -EAGAIN;
+ }
+ hlist_add_head_rcu(&(entry->ksym_hlist), &ksym_filter_head);
+ ksym_filter_entry_count++;
+
+ return 0;
+}
+
+static ssize_t ksym_trace_filter_read(struct file *filp, char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+ struct trace_ksym *entry;
+ struct hlist_node *node;
+ char buf[KSYM_FILTER_ENTRY_LEN * KSYM_TRACER_MAX];
+ ssize_t ret, cnt = 0;
+
+ mutex_lock(&ksym_tracer_mutex);
+
+ hlist_for_each_entry(entry, node, &ksym_filter_head, ksym_hlist) {
+ cnt += snprintf(&buf[cnt], KSYM_FILTER_ENTRY_LEN - cnt, "%s:",
+ entry->ksym_hbp->info.name);
+ if (entry->ksym_hbp->info.type == HW_BREAKPOINT_WRITE)
+ cnt += snprintf(&buf[cnt], KSYM_FILTER_ENTRY_LEN - cnt,
+ "-w-\n");
+ else if (entry->ksym_hbp->info.type == HW_BREAKPOINT_RW)
+ cnt += snprintf(&buf[cnt], KSYM_FILTER_ENTRY_LEN - cnt,
+ "rw-\n");
+ }
+ ret = simple_read_from_buffer(ubuf, count, ppos, buf, strlen(buf));
+ mutex_unlock(&ksym_tracer_mutex);
+
+ return ret;
+}
+
+static ssize_t ksym_trace_filter_write(struct file *file,
+ const char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ struct trace_ksym *entry;
+ struct hlist_node *node;
+ char *input_string, *ksymname = NULL;
+ unsigned long ksym_addr = 0;
+ int ret, op, changed = 0;
+
+ /* Ignore echo "" > ksym_trace_filter */
+ if (count == 0)
+ return 0;
+
+ input_string = kzalloc(count, GFP_KERNEL);
+ if (!input_string)
+ return -ENOMEM;
+
+ if (copy_from_user(input_string, buffer, count)) {
+ kfree(input_string);
+ return -EFAULT;
+ }
+
+ ret = op = parse_ksym_trace_str(input_string, &ksymname, &ksym_addr);
+ if (ret < 0) {
+ kfree(input_string);
+ return ret;
+ }
+
+ mutex_lock(&ksym_tracer_mutex);
+
+ ret = -EINVAL;
+ hlist_for_each_entry(entry, node, &ksym_filter_head, ksym_hlist) {
+ if (entry->ksym_addr == ksym_addr) {
+ /* Check for malformed request: (6) */
+ if (entry->ksym_hbp->info.type != op)
+ changed = 1;
+ else
+ goto err_ret;
+ break;
+ }
+ }
+ if (changed) {
+ unregister_kernel_hw_breakpoint(entry->ksym_hbp);
+ entry->ksym_hbp->info.type = op;
+ if (op > 0) {
+ ret = register_kernel_hw_breakpoint(entry->ksym_hbp);
+ if (ret == 0) {
+ ret = count;
+ goto unlock_ret_path;
+ }
+ }
+ ksym_filter_entry_count--;
+ hlist_del_rcu(&(entry->ksym_hlist));
+ synchronize_rcu();
+ kfree(entry->ksym_hbp);
+ kfree(entry);
+ ret = count;
+ goto err_ret;
+ } else {
+ /* Check for malformed request: (4) */
+ if (op == 0)
+ goto err_ret;
+ ret = process_new_ksym_entry(ksymname, op, ksym_addr);
+ if (ret)
+ goto err_ret;
+ }
+ ret = count;
+ goto unlock_ret_path;
+
+err_ret:
+ kfree(input_string);
+
+unlock_ret_path:
+ mutex_unlock(&ksym_tracer_mutex);
+ return ret;
+}
+
+static const struct file_operations ksym_tracing_fops = {
+ .open = tracing_open_generic,
+ .read = ksym_trace_filter_read,
+ .write = ksym_trace_filter_write,
+};
+
+static void ksym_trace_reset(struct trace_array *tr)
+{
+ struct trace_ksym *entry;
+ struct hlist_node *node, *node1;
+
+ ksym_tracing_enabled = 0;
+
+ mutex_lock(&ksym_tracer_mutex);
+ hlist_for_each_entry_safe(entry, node, node1, &ksym_filter_head,
+ ksym_hlist) {
+ unregister_kernel_hw_breakpoint(entry->ksym_hbp);
+ ksym_filter_entry_count--;
+ hlist_del_rcu(&(entry->ksym_hlist));
+ synchronize_rcu();
+ /* Free the 'input_string' only if reset
+ * after startup self-test
+ */
+#ifdef CONFIG_FTRACE_SELFTEST
+ if (strncmp(entry->ksym_hbp->info.name, KSYM_SELFTEST_ENTRY,
+ strlen(KSYM_SELFTEST_ENTRY)) != 0)
+#endif /* CONFIG_FTRACE_SELFTEST*/
+ kfree(entry->ksym_hbp->info.name);
+ kfree(entry->ksym_hbp);
+ kfree(entry);
+ }
+ mutex_unlock(&ksym_tracer_mutex);
+}
+
+static int ksym_trace_init(struct trace_array *tr)
+{
+ int cpu, ret = 0;
+
+ for_each_online_cpu(cpu)
+ tracing_reset(tr, cpu);
+ ksym_tracing_enabled = 1;
+ ksym_trace_array = tr;
+
+ return ret;
+}
+
+static void ksym_trace_print_header(struct seq_file *m)
+{
+
+ seq_puts(m,
+ "# TASK-PID CPU# Symbol Type "
+ "Function \n");
+ seq_puts(m,
+ "# | | | | "
+ "| \n");
+}
+
+static enum print_line_t ksym_trace_output(struct trace_iterator *iter)
+{
+ struct trace_entry *entry = iter->ent;
+ struct trace_seq *s = &iter->seq;
+ struct trace_ksym *field;
+ char str[KSYM_SYMBOL_LEN];
+ int ret;
+
+ if (entry->type != TRACE_KSYM)
+ return TRACE_TYPE_UNHANDLED;
+
+ trace_assign_type(field, entry);
+
+ ret = trace_seq_printf(s, "%-15s %-5d %-3d %-20s ", field->p_name,
+ entry->pid, iter->cpu, field->ksym_name);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ switch (field->ksym_hbp->info.type) {
+ case HW_BREAKPOINT_WRITE:
+ ret = trace_seq_printf(s, " W ");
+ break;
+ case HW_BREAKPOINT_RW:
+ ret = trace_seq_printf(s, " RW ");
+ break;
+ default:
+ return TRACE_TYPE_PARTIAL_LINE;
+ }
+
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ sprint_symbol(str, field->ip);
+ ret = trace_seq_printf(s, "%-20s\n", str);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ return TRACE_TYPE_HANDLED;
+}
+
+struct tracer ksym_tracer __read_mostly =
+{
+ .name = "ksym_tracer",
+ .init = ksym_trace_init,
+ .reset = ksym_trace_reset,
+#ifdef CONFIG_FTRACE_SELFTEST
+ .selftest = trace_selftest_startup_ksym,
+#endif
+ .print_header = ksym_trace_print_header,
+ .print_line = ksym_trace_output
+};
+
+__init static int init_ksym_trace(void)
+{
+ struct dentry *d_tracer;
+ struct dentry *entry;
+
+ d_tracer = tracing_init_dentry();
+ ksym_filter_entry_count = 0;
+
+ entry = debugfs_create_file("ksym_trace_filter", 0644, d_tracer,
+ NULL, &ksym_tracing_fops);
+ if (!entry)
+ pr_warning("Could not create debugfs "
+ "'ksym_trace_filter' file\n");
+
+ return register_tracer(&ksym_tracer);
+}
+device_initcall(init_ksym_trace);
+
+
+#ifdef CONFIG_PROFILE_KSYM_TRACER
+static int ksym_tracer_stat_headers(struct seq_file *m)
+{
+ seq_printf(m, " Access type ");
+ seq_printf(m, " Symbol Counter \n");
+ return 0;
+}
+
+static int ksym_tracer_stat_show(struct seq_file *m, void *v)
+{
+ struct hlist_node *stat = v;
+ struct trace_ksym *entry;
+ int access_type = 0;
+ char fn_name[KSYM_NAME_LEN];
+
+ entry = hlist_entry(stat, struct trace_ksym, ksym_hlist);
+
+ if (entry->ksym_hbp)
+ access_type = entry->ksym_hbp->info.type;
+
+ switch (access_type) {
+ case HW_BREAKPOINT_WRITE:
+ seq_printf(m, " W ");
+ break;
+ case HW_BREAKPOINT_RW:
+ seq_printf(m, " RW ");
+ break;
+ default:
+ seq_printf(m, " NA ");
+ }
+
+ if (lookup_symbol_name(entry->ksym_addr, fn_name) >= 0)
+ seq_printf(m, " %s ", fn_name);
+ else
+ seq_printf(m, " <NA> ");
+
+ seq_printf(m, "%15lu\n", entry->counter);
+ return 0;
+}
+
+static void *ksym_tracer_stat_start(struct tracer_stat *trace)
+{
+ return &(ksym_filter_head.first);
+}
+
+static void *
+ksym_tracer_stat_next(void *v, int idx)
+{
+ struct hlist_node *stat = v;
+
+ return stat->next;
+}
+
+static struct tracer_stat ksym_tracer_stats = {
+ .name = "ksym_tracer",
+ .stat_start = ksym_tracer_stat_start,
+ .stat_next = ksym_tracer_stat_next,
+ .stat_headers = ksym_tracer_stat_headers,
+ .stat_show = ksym_tracer_stat_show
+};
+
+__init static int ksym_tracer_stat_init(void)
+{
+ int ret;
+
+ ret = register_stat_tracer(&ksym_tracer_stats);
+ if (ret) {
+ printk(KERN_WARNING "Warning: could not register "
+ "ksym tracer stats\n");
+ return 1;
+ }
+
+ return 0;
+}
+fs_initcall(ksym_tracer_stat_init);
+#endif /* CONFIG_PROFILE_KSYM_TRACER */
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 00dd648..71f2edb 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -17,6 +17,7 @@ static inline int trace_valid_entry(struct trace_entry *entry)
case TRACE_GRAPH_ENT:
case TRACE_GRAPH_RET:
case TRACE_HW_BRANCHES:
+ case TRACE_KSYM:
return 1;
}
return 0;
@@ -807,3 +808,55 @@ trace_selftest_startup_hw_branches(struct tracer *trace,
return ret;
}
#endif /* CONFIG_HW_BRANCH_TRACER */
+
+#ifdef CONFIG_KSYM_TRACER
+static int ksym_selftest_dummy;
+
+int
+trace_selftest_startup_ksym(struct tracer *trace, struct trace_array *tr)
+{
+ unsigned long count;
+ int ret;
+
+ /* start the tracing */
+ ret = tracer_init(trace, tr);
+ if (ret) {
+ warn_failed_init_tracer(trace, ret);
+ return ret;
+ }
+
+ ksym_selftest_dummy = 0;
+ /* Register the read-write tracing request */
+ ret = process_new_ksym_entry(KSYM_SELFTEST_ENTRY, HW_BREAKPOINT_RW,
+ (unsigned long)(&ksym_selftest_dummy));
+
+ if (ret < 0) {
+ printk(KERN_CONT "ksym_trace read-write startup test failed\n");
+ goto ret_path;
+ }
+ /* Perform a read and a write operation over the dummy variable to
+ * trigger the tracer
+ */
+ if (ksym_selftest_dummy == 0)
+ ksym_selftest_dummy++;
+
+ /* stop the tracing. */
+ tracing_stop();
+ /* check the trace buffer */
+ ret = trace_test_buffer(tr, &count);
+ trace->reset(tr);
+ tracing_start();
+
+ /* read & write operations - one each is performed on the dummy variable
+ * triggering two entries in the trace buffer
+ */
+ if (!ret && count != 2) {
+ printk(KERN_CONT "Ksym tracer startup test failed");
+ ret = -1;
+ }
+
+ret_path:
+ return ret;
+}
+#endif /* CONFIG_KSYM_TRACER */
+
--
1.6.2.3

2009-06-02 22:46:42

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 10/12] hw-breakpoints: sample HW breakpoint over kernel data address

From: K.Prasad <[email protected]>

This patch introduces a sample kernel module to demonstrate the use of Hardware
Breakpoint feature. It places a breakpoint over the kernel variable 'pid_max'
to monitor all write operations and emits a function-backtrace when done.

Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
samples/Kconfig | 6 ++
samples/Makefile | 3 +-
samples/hw_breakpoint/Makefile | 1 +
samples/hw_breakpoint/data_breakpoint.c | 83 +++++++++++++++++++++++++++++++
4 files changed, 92 insertions(+), 1 deletions(-)
create mode 100644 samples/hw_breakpoint/Makefile
create mode 100644 samples/hw_breakpoint/data_breakpoint.c

diff --git a/samples/Kconfig b/samples/Kconfig
index b75d28c..8458516 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -45,5 +45,11 @@ config SAMPLE_KRETPROBES
default m
depends on SAMPLE_KPROBES && KRETPROBES

+config SAMPLE_HW_BREAKPOINT
+ tristate "Build kernel hardware breakpoint examples -- loadable module only"
+ depends on HAVE_HW_BREAKPOINT && m
+ help
+ This builds kernel hardware breakpoint example modules.
+
endif # SAMPLES

diff --git a/samples/Makefile b/samples/Makefile
index 13e4b47..42e1755 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -1,3 +1,4 @@
# Makefile for Linux samples code

-obj-$(CONFIG_SAMPLES) += markers/ kobject/ kprobes/ tracepoints/ trace_events/
+obj-$(CONFIG_SAMPLES) += markers/ kobject/ kprobes/ tracepoints/ \
+ trace_events/ hw_breakpoint/
diff --git a/samples/hw_breakpoint/Makefile b/samples/hw_breakpoint/Makefile
new file mode 100644
index 0000000..0f5c31c
--- /dev/null
+++ b/samples/hw_breakpoint/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_SAMPLE_HW_BREAKPOINT) += data_breakpoint.o
diff --git a/samples/hw_breakpoint/data_breakpoint.c b/samples/hw_breakpoint/data_breakpoint.c
new file mode 100644
index 0000000..9cbdbb8
--- /dev/null
+++ b/samples/hw_breakpoint/data_breakpoint.c
@@ -0,0 +1,83 @@
+/*
+ * data_breakpoint.c - Sample HW Breakpoint file to watch kernel data address
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * usage: insmod data_breakpoint.ko ksym=<ksym_name>
+ *
+ * This file is a kernel module that places a breakpoint over ksym_name kernel
+ * variable using Hardware Breakpoint register. The corresponding handler which
+ * prints a backtrace is invoked everytime a write operation is performed on
+ * that variable.
+ *
+ * Copyright (C) IBM Corporation, 2009
+ */
+#include <linux/module.h> /* Needed by all modules */
+#include <linux/kernel.h> /* Needed for KERN_INFO */
+#include <linux/init.h> /* Needed for the macros */
+
+#include <asm/hw_breakpoint.h>
+
+struct hw_breakpoint sample_hbp;
+
+static char ksym_name[KSYM_NAME_LEN] = "pid_max";
+module_param_string(ksym, ksym_name, KSYM_NAME_LEN, S_IRUGO);
+MODULE_PARM_DESC(ksym, "Kernel symbol to monitor; this module will report any"
+ " write operations on the kernel symbol");
+
+void sample_hbp_handler(struct hw_breakpoint *temp, struct pt_regs
+ *temp_regs)
+{
+ printk(KERN_INFO "%s value is changed\n", ksym_name);
+ dump_stack();
+ printk(KERN_INFO "Dump stack from sample_hbp_handler\n");
+}
+
+static int __init hw_break_module_init(void)
+{
+ int ret;
+
+#ifdef CONFIG_X86
+ sample_hbp.info.name = ksym_name;
+ sample_hbp.info.type = HW_BREAKPOINT_WRITE;
+ sample_hbp.info.len = HW_BREAKPOINT_LEN_4;
+#endif /* CONFIG_X86 */
+
+ sample_hbp.triggered = (void *)sample_hbp_handler;
+
+ ret = register_kernel_hw_breakpoint(&sample_hbp);
+
+ if (ret < 0) {
+ printk(KERN_INFO "Breakpoint registration failed\n");
+ return ret;
+ } else
+ printk(KERN_INFO "HW Breakpoint for %s write installed\n",
+ ksym_name);
+
+ return 0;
+}
+
+static void __exit hw_break_module_exit(void)
+{
+ unregister_kernel_hw_breakpoint(&sample_hbp);
+ printk(KERN_INFO "HW Breakpoint for %s write uninstalled\n", ksym_name);
+}
+
+module_init(hw_break_module_init);
+module_exit(hw_break_module_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("K.Prasad");
+MODULE_DESCRIPTION("ksym breakpoint");
--
1.6.2.3

2009-06-02 22:46:54

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 12/12] hw-breakpoints: reset bits in dr6 after the corresponding exception is handled

From: K.Prasad <[email protected]>

This patch resets the bit in dr6 after the corresponding exception is
handled in code, so that we keep a clean track of the current virtual debug
status register.

[ Impact: keep track of breakpoints triggering completion ]

Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/hw_breakpoint.c | 13 +++++++++++--
arch/x86/kernel/kgdb.c | 6 ++++++
arch/x86/kernel/kprobes.c | 9 ++++++++-
arch/x86/kernel/traps.c | 4 ++--
arch/x86/mm/kmmio.c | 8 +++++++-
5 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index 4867c9f..6945147 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -314,8 +314,12 @@ int __kprobes hw_breakpoint_handler(struct die_args *args)
{
int i, cpu, rc = NOTIFY_STOP;
struct hw_breakpoint *bp;
- /* The DR6 value is stored in args->err */
- unsigned long dr7, dr6 = args->err;
+ unsigned long dr7, dr6;
+ unsigned long *dr6_p;
+
+ /* The DR6 value is pointed by args->err */
+ dr6_p = (unsigned long *)ERR_PTR(args->err);
+ dr6 = *dr6_p;

/* Do an early return if no trap bits are set in DR6 */
if ((dr6 & DR_TRAP_BITS) == 0)
@@ -352,6 +356,11 @@ int __kprobes hw_breakpoint_handler(struct die_args *args)
rc = NOTIFY_DONE;
}
/*
+ * Reset the 'i'th TRAP bit in dr6 to denote completion of
+ * exception handling
+ */
+ (*dr6_p) &= ~(DR_TRAP0 << i);
+ /*
* bp can be NULL due to lazy debug register switching
* or due to the delay between updates of hbp_kernel_pos
* and this_hbp_kernel.
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index b1f4dff..f820b73 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -43,6 +43,7 @@
#include <linux/smp.h>
#include <linux/nmi.h>

+#include <asm/debugreg.h>
#include <asm/apicdef.h>
#include <asm/system.h>

@@ -434,6 +435,11 @@ single_step_cont(struct pt_regs *regs, struct die_args *args)
"resuming...\n");
kgdb_arch_handle_exception(args->trapnr, args->signr,
args->err, "c", "", regs);
+ /*
+ * Reset the BS bit in dr6 (pointed by args->err) to
+ * denote completion of processing
+ */
+ (*(unsigned long *)ERR_PTR(args->err)) &= ~DR_STEP;

return NOTIFY_STOP;
}
diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 7b5169d..b5b1848 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -54,6 +54,7 @@
#include <asm/pgtable.h>
#include <asm/uaccess.h>
#include <asm/alternative.h>
+#include <asm/debugreg.h>

void jprobe_return_end(void);

@@ -967,8 +968,14 @@ int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
ret = NOTIFY_STOP;
break;
case DIE_DEBUG:
- if (post_kprobe_handler(args->regs))
+ if (post_kprobe_handler(args->regs)) {
+ /*
+ * Reset the BS bit in dr6 (pointed by args->err) to
+ * denote completion of processing
+ */
+ (*(unsigned long *)ERR_PTR(args->err)) &= ~DR_STEP;
ret = NOTIFY_STOP;
+ }
break;
case DIE_GPF:
/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index de99132..124a4d5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -545,8 +545,8 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
/* Store the virtualized DR6 value */
tsk->thread.debugreg6 = dr6;

- if (notify_die(DIE_DEBUG, "debug", regs, dr6, error_code,
- SIGTRAP) == NOTIFY_STOP)
+ if (notify_die(DIE_DEBUG, "debug", regs, PTR_ERR(&dr6), error_code,
+ SIGTRAP) == NOTIFY_STOP)
return;

/* It's safe to allow irq's after DR6 has been saved */
diff --git a/arch/x86/mm/kmmio.c b/arch/x86/mm/kmmio.c
index 16ccbd7..11a4ad4 100644
--- a/arch/x86/mm/kmmio.c
+++ b/arch/x86/mm/kmmio.c
@@ -540,8 +540,14 @@ kmmio_die_notifier(struct notifier_block *nb, unsigned long val, void *args)
struct die_args *arg = args;

if (val == DIE_DEBUG && (arg->err & DR_STEP))
- if (post_kmmio_handler(arg->err, arg->regs) == 1)
+ if (post_kmmio_handler(arg->err, arg->regs) == 1) {
+ /*
+ * Reset the BS bit in dr6 (pointed by args->err) to
+ * denote completion of processing
+ */
+ (*(unsigned long *)ERR_PTR(arg->err)) &= ~DR_STEP;
return NOTIFY_STOP;
+ }

return NOTIFY_DONE;
}
--
1.6.2.3

2009-06-02 23:13:25

by David Daney

[permalink] [raw]
Subject: Re: [PATCH 11/12] hw-breakpoints: ftrace plugin for kernel symbol tracing using HW Breakpoint interfaces

Frederic Weisbecker wrote:
> From: K.Prasad <[email protected]>
>
> This patch adds an ftrace plugin to detect and profile memory access over kernel
> variables. It uses HW Breakpoint interfaces to 'watch memory addresses.
>
> Signed-off-by: K.Prasad <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> ---
> kernel/trace/Kconfig | 21 ++
> kernel/trace/Makefile | 1 +
> kernel/trace/trace.h | 23 ++
> kernel/trace/trace_ksym.c | 525 +++++++++++++++++++++++++++++++++++++++++
> kernel/trace/trace_selftest.c | 53 ++++
> 5 files changed, 623 insertions(+), 0 deletions(-)
> create mode 100644 kernel/trace/trace_ksym.c
[...]
> + entry->ksym_hbp->info.name = ksymname;
> + entry->ksym_hbp->info.type = op;
> + entry->ksym_addr = entry->ksym_hbp->info.address = addr;
> +#ifdef CONFIG_X86
> + entry->ksym_hbp->info.len = HW_BREAKPOINT_LEN_4;
> +#endif

What if the symbol referred to an object of size other than 4? This
would clearly be incorrect in that case.


> + entry->ksym_hbp->triggered = (void *)ksym_hbp_handler;
> +
> + ret = register_kernel_hw_breakpoint(entry->ksym_hbp);

I hate to sound like a broken record, but could some one explain to me
again why it is a good idea to design a new API that requires processor
specific #ifdefs to be sprinkled all around generic kernel code?

Back in:
http://lkml.org/lkml/2008/12/4/329
and
http://lkml.org/lkml/2009/5/21/189

I raised doubts about this hw-breakpoint thing being generic and the
responses made think that the processor specific portions would be
isolated in the processor specific parts of the kernel. I now see that
I was wrong.

When we add sparc, MIPS, ppc... Support it would be nice to not have to
add all our own #ifdefs to this, but instead have a generic interface
that will not need changes.

David Daney

2009-06-02 23:38:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 00/12] hw-breakpoints: new hardware breakpoints API


had a quick testrun with this and triggered a build failure:

kernel/trace/trace_ksym.c:226: error: ksym_tracer_mutex undeclared (first use in this function)

Ingo

2009-06-03 00:38:22

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 11/12] hw-breakpoints: ftrace plugin for kernel symbol tracing using HW Breakpoint interfaces

On Tue, Jun 02, 2009 at 04:12:08PM -0700, David Daney wrote:
> Frederic Weisbecker wrote:
>> From: K.Prasad <[email protected]>
>>
>> This patch adds an ftrace plugin to detect and profile memory access over kernel
>> variables. It uses HW Breakpoint interfaces to 'watch memory addresses.
>>
>> Signed-off-by: K.Prasad <[email protected]>
>> Signed-off-by: Frederic Weisbecker <[email protected]>
>> ---
>> kernel/trace/Kconfig | 21 ++
>> kernel/trace/Makefile | 1 +
>> kernel/trace/trace.h | 23 ++
>> kernel/trace/trace_ksym.c | 525 +++++++++++++++++++++++++++++++++++++++++
>> kernel/trace/trace_selftest.c | 53 ++++
>> 5 files changed, 623 insertions(+), 0 deletions(-)
>> create mode 100644 kernel/trace/trace_ksym.c
> [...]
>> + entry->ksym_hbp->info.name = ksymname;
>> + entry->ksym_hbp->info.type = op;
>> + entry->ksym_addr = entry->ksym_hbp->info.address = addr;
>> +#ifdef CONFIG_X86
>> + entry->ksym_hbp->info.len = HW_BREAKPOINT_LEN_4;
>> +#endif
>
> What if the symbol referred to an object of size other than 4? This
> would clearly be incorrect in that case.
>
>
>> + entry->ksym_hbp->triggered = (void *)ksym_hbp_handler;
>> +
>> + ret = register_kernel_hw_breakpoint(entry->ksym_hbp);
>
> I hate to sound like a broken record, but could some one explain to me
> again why it is a good idea to design a new API that requires processor
> specific #ifdefs to be sprinkled all around generic kernel code?
>
> Back in:
> http://lkml.org/lkml/2008/12/4/329
> and
> http://lkml.org/lkml/2009/5/21/189
>
> I raised doubts about this hw-breakpoint thing being generic and the
> responses made think that the processor specific portions would be
> isolated in the processor specific parts of the kernel. I now see that
> I was wrong.
>
> When we add sparc, MIPS, ppc... Support it would be nice to not have to
> add all our own #ifdefs to this, but instead have a generic interface
> that will not need changes.
>
> David Daney

I was discussing about it with Prasad few hours ago :)

The fact is that archs support the hardware breakpoints in
very different ways each.
Some of them support read breakpoint, others not (x86).
Some support addresses range, others (x86).

But still it would be nice to gather the most common
breakpoints operations through a real generic wrapper
that relies on arch specific implmentation in
background.

Such as setting very simple x/w/r breakpoints...

Well Prasad and Alan Stern could tell more about it,
I wait for their answer.

Anyway it's a fairly new Api that can still evolve.
The basis are set but can still be improved and more high level
and generic things can still be implemented.

2009-06-03 16:04:30

by David Daney

[permalink] [raw]
Subject: Re: [PATCH 11/12] hw-breakpoints: ftrace plugin for kernel symbol tracing using HW Breakpoint interfaces

Frederic Weisbecker wrote:
> On Tue, Jun 02, 2009 at 04:12:08PM -0700, David Daney wrote:
>> Frederic Weisbecker wrote:
>>> From: K.Prasad <[email protected]>
>>>
>>> This patch adds an ftrace plugin to detect and profile memory access over kernel
>>> variables. It uses HW Breakpoint interfaces to 'watch memory addresses.
>>>
>>> Signed-off-by: K.Prasad <[email protected]>
>>> Signed-off-by: Frederic Weisbecker <[email protected]>
>>> ---
>>> kernel/trace/Kconfig | 21 ++
>>> kernel/trace/Makefile | 1 +
>>> kernel/trace/trace.h | 23 ++
>>> kernel/trace/trace_ksym.c | 525 +++++++++++++++++++++++++++++++++++++++++
>>> kernel/trace/trace_selftest.c | 53 ++++
>>> 5 files changed, 623 insertions(+), 0 deletions(-)
>>> create mode 100644 kernel/trace/trace_ksym.c
>> [...]
>>> + entry->ksym_hbp->info.name = ksymname;
>>> + entry->ksym_hbp->info.type = op;
>>> + entry->ksym_addr = entry->ksym_hbp->info.address = addr;
>>> +#ifdef CONFIG_X86
>>> + entry->ksym_hbp->info.len = HW_BREAKPOINT_LEN_4;
>>> +#endif
>> What if the symbol referred to an object of size other than 4? This
>> would clearly be incorrect in that case.
>>
>>
>>> + entry->ksym_hbp->triggered = (void *)ksym_hbp_handler;
>>> +
>>> + ret = register_kernel_hw_breakpoint(entry->ksym_hbp);
>> I hate to sound like a broken record, but could some one explain to me
>> again why it is a good idea to design a new API that requires processor
>> specific #ifdefs to be sprinkled all around generic kernel code?
>>
>> Back in:
>> http://lkml.org/lkml/2008/12/4/329
>> and
>> http://lkml.org/lkml/2009/5/21/189
>>
>> I raised doubts about this hw-breakpoint thing being generic and the
>> responses made think that the processor specific portions would be
>> isolated in the processor specific parts of the kernel. I now see that
>> I was wrong.
>>
>> When we add sparc, MIPS, ppc... Support it would be nice to not have to
>> add all our own #ifdefs to this, but instead have a generic interface
>> that will not need changes.
>>
>> David Daney
>
> I was discussing about it with Prasad few hours ago :)
>
> The fact is that archs support the hardware breakpoints in
> very different ways each.
> Some of them support read breakpoint, others not (x86).
> Some support addresses range, others (x86).
>
> But still it would be nice to gather the most common
> breakpoints operations through a real generic wrapper
> that relies on arch specific implmentation in
> background.
>

Thanks for taking the time to think about it. I do think this patch set
will be useful.

One thing I was thinking was that when I am using gdb, all I have to do
is say 'watch symbol' and the differences in hardware breakpoint
facilities are abstracted away. Having similar abstraction in this
kernel API should be possible as well.

David Daney

2009-06-04 15:37:17

by K.Prasad

[permalink] [raw]
Subject: Re: [PATCH 11/12] hw-breakpoints: ftrace plugin for kernel symbol tracing using HW Breakpoint interfaces

On Tue, Jun 02, 2009 at 04:12:08PM -0700, David Daney wrote:
> Frederic Weisbecker wrote:
>> From: K.Prasad <[email protected]>
>>
>> This patch adds an ftrace plugin to detect and profile memory access over kernel
>> variables. It uses HW Breakpoint interfaces to 'watch memory addresses.
>>
>> Signed-off-by: K.Prasad <[email protected]>
>> Signed-off-by: Frederic Weisbecker <[email protected]>
>> ---
>> kernel/trace/Kconfig | 21 ++
>> kernel/trace/Makefile | 1 +
>> kernel/trace/trace.h | 23 ++
>> kernel/trace/trace_ksym.c | 525 +++++++++++++++++++++++++++++++++++++++++
>> kernel/trace/trace_selftest.c | 53 ++++
>> 5 files changed, 623 insertions(+), 0 deletions(-)
>> create mode 100644 kernel/trace/trace_ksym.c
> [...]
>> + entry->ksym_hbp->info.name = ksymname;
>> + entry->ksym_hbp->info.type = op;
>> + entry->ksym_addr = entry->ksym_hbp->info.address = addr;
>> +#ifdef CONFIG_X86
>> + entry->ksym_hbp->info.len = HW_BREAKPOINT_LEN_4;
>> +#endif
>
> What if the symbol referred to an object of size other than 4? This
> would clearly be incorrect in that case.
>

I don't see a way in which we could automatically detect the size of a
variable using the symbol name and use that as the 'len' (as this would
be the ideal case). However, in this case we can circumvent this problem by
using the least of the possible lengths (HW_BREAKPOINT_LEN_1 for x86)
as it would be valid for symbols of all sizes. I will change this.
Thanks for pointing it out!

>> + entry->ksym_hbp->triggered = (void *)ksym_hbp_handler;
>> +
>> + ret = register_kernel_hw_breakpoint(entry->ksym_hbp);
>
> I hate to sound like a broken record, but could some one explain to me
> again why it is a good idea to design a new API that requires processor
> specific #ifdefs to be sprinkled all around generic kernel code?
>
> Back in:
> http://lkml.org/lkml/2008/12/4/329
> and
> http://lkml.org/lkml/2009/5/21/189
>
> I raised doubts about this hw-breakpoint thing being generic and the
> responses made think that the processor specific portions would be
> isolated in the processor specific parts of the kernel. I now see that
> I was wrong.
>
> When we add sparc, MIPS, ppc... Support it would be nice to not have to
> add all our own #ifdefs to this, but instead have a generic interface
> that will not need changes.
>
> David Daney

One of the ways in which we could do this for 'len' field is to allow
numeric lengths to be specified (as suggested by David Gibson in another
mail thread), say "entry->ksym_hbp->info.len = 4;".

The length encoding based on register layout can be done in
arch-specific code. I think this would make the code more usable and I
should be implementing it in a patch soon.

As far as the 'type' information is concerned, some of the common
breakpoint types (such as HW_BREAKPOINT_READ, HW_BREAKPOINT_WRITE,
HW_BREAKPOINT_EXECUTE) can be defined in generic files and as NULL
for those architectures that don't implement them.

However it might be a source of confusion to the end-user about
supported breakpoint types on a given architecture.

Thanks,
K.Prasad

2009-06-04 15:45:37

by K.Prasad

[permalink] [raw]
Subject: Re: [PATCH 11/12] hw-breakpoints: ftrace plugin for kernel symbol tracing using HW Breakpoint interfaces

On Wed, Jun 03, 2009 at 02:38:12AM +0200, Frederic Weisbecker wrote:
> On Tue, Jun 02, 2009 at 04:12:08PM -0700, David Daney wrote:
> > Frederic Weisbecker wrote:
> >> From: K.Prasad <[email protected]>

> > I hate to sound like a broken record, but could some one explain to me
> > again why it is a good idea to design a new API that requires processor
> > specific #ifdefs to be sprinkled all around generic kernel code?
> >
> > Back in:
> > http://lkml.org/lkml/2008/12/4/329
> > and
> > http://lkml.org/lkml/2009/5/21/189
> >
> > I raised doubts about this hw-breakpoint thing being generic and the
> > responses made think that the processor specific portions would be
> > isolated in the processor specific parts of the kernel. I now see that
> > I was wrong.
> >
> > When we add sparc, MIPS, ppc... Support it would be nice to not have to
> > add all our own #ifdefs to this, but instead have a generic interface
> > that will not need changes.
> >
> > David Daney
>
> I was discussing about it with Prasad few hours ago :)
>
> The fact is that archs support the hardware breakpoints in
> very different ways each.
> Some of them support read breakpoint, others not (x86).
> Some support addresses range, others (x86).
>
> But still it would be nice to gather the most common
> breakpoints operations through a real generic wrapper
> that relies on arch specific implmentation in
> background.
>
> Such as setting very simple x/w/r breakpoints...
>
> Well Prasad and Alan Stern could tell more about it,
> I wait for their answer.
>
> Anyway it's a fairly new Api that can still evolve.
> The basis are set but can still be improved and more high level
> and generic things can still be implemented.
>

I think this concern can be partially addressed, atleast as far as the
breakpoint length is concerned. I've added my comments in the response
to David Daney here: http://lkml.org/lkml/2009/6/4/303.

Hope that the changes proposed there is acceptable to the community.

Thanks,
K.Prasad

2009-06-05 11:37:16

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 11/12] hw-breakpoints: ftrace plugin for kernel symbol tracing using HW Breakpoint interfaces

On Thu, Jun 04, 2009 at 09:06:58PM +0530, K.Prasad wrote:
> On Tue, Jun 02, 2009 at 04:12:08PM -0700, David Daney wrote:
> > Frederic Weisbecker wrote:
> >> From: K.Prasad <[email protected]>
> >>
> >> This patch adds an ftrace plugin to detect and profile memory access over kernel
> >> variables. It uses HW Breakpoint interfaces to 'watch memory addresses.
> >>
> >> Signed-off-by: K.Prasad <[email protected]>
> >> Signed-off-by: Frederic Weisbecker <[email protected]>
> >> ---
> >> kernel/trace/Kconfig | 21 ++
> >> kernel/trace/Makefile | 1 +
> >> kernel/trace/trace.h | 23 ++
> >> kernel/trace/trace_ksym.c | 525 +++++++++++++++++++++++++++++++++++++++++
> >> kernel/trace/trace_selftest.c | 53 ++++
> >> 5 files changed, 623 insertions(+), 0 deletions(-)
> >> create mode 100644 kernel/trace/trace_ksym.c
> > [...]
> >> + entry->ksym_hbp->info.name = ksymname;
> >> + entry->ksym_hbp->info.type = op;
> >> + entry->ksym_addr = entry->ksym_hbp->info.address = addr;
> >> +#ifdef CONFIG_X86
> >> + entry->ksym_hbp->info.len = HW_BREAKPOINT_LEN_4;
> >> +#endif
> >
> > What if the symbol referred to an object of size other than 4? This
> > would clearly be incorrect in that case.
> >
>
> I don't see a way in which we could automatically detect the size of a
> variable using the symbol name and use that as the 'len' (as this would
> be the ideal case). However, in this case we can circumvent this problem by
> using the least of the possible lengths (HW_BREAKPOINT_LEN_1 for x86)
> as it would be valid for symbols of all sizes. I will change this.
> Thanks for pointing it out!


Indeed, that looks a good solution.



> >> + entry->ksym_hbp->triggered = (void *)ksym_hbp_handler;
> >> +
> >> + ret = register_kernel_hw_breakpoint(entry->ksym_hbp);
> >
> > I hate to sound like a broken record, but could some one explain to me
> > again why it is a good idea to design a new API that requires processor
> > specific #ifdefs to be sprinkled all around generic kernel code?
> >
> > Back in:
> > http://lkml.org/lkml/2008/12/4/329
> > and
> > http://lkml.org/lkml/2009/5/21/189
> >
> > I raised doubts about this hw-breakpoint thing being generic and the
> > responses made think that the processor specific portions would be
> > isolated in the processor specific parts of the kernel. I now see that
> > I was wrong.
> >
> > When we add sparc, MIPS, ppc... Support it would be nice to not have to
> > add all our own #ifdefs to this, but instead have a generic interface
> > that will not need changes.
> >
> > David Daney
>
> One of the ways in which we could do this for 'len' field is to allow
> numeric lengths to be specified (as suggested by David Gibson in another
> mail thread), say "entry->ksym_hbp->info.len = 4;".
>
> The length encoding based on register layout can be done in
> arch-specific code. I think this would make the code more usable and I
> should be implementing it in a patch soon.



Yeah, but are you sure every arch support length encoding based breakpoints?
May be such information should be passed in register_breakpoint instead and
let the arch answer whether it supports or not this operation...



>
> As far as the 'type' information is concerned, some of the common
> breakpoint types (such as HW_BREAKPOINT_READ, HW_BREAKPOINT_WRITE,
> HW_BREAKPOINT_EXECUTE) can be defined in generic files and as NULL
> for those architectures that don't implement them.
>
> However it might be a source of confusion to the end-user about
> supported breakpoint types on a given architecture.


Indeed but I think it's still worth. The Api documentation could
be thought with a section on which we can find the operations
that are supported by every archs.


>
> Thanks,
> K.Prasad
>

2009-06-05 11:44:45

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 11/12] hw-breakpoints: ftrace plugin for kernel symbol tracing using HW Breakpoint interfaces

On Thu, Jun 04, 2009 at 09:15:00PM +0530, K.Prasad wrote:
> On Wed, Jun 03, 2009 at 02:38:12AM +0200, Frederic Weisbecker wrote:
> > On Tue, Jun 02, 2009 at 04:12:08PM -0700, David Daney wrote:
> > > Frederic Weisbecker wrote:
> > >> From: K.Prasad <[email protected]>
>
> > > I hate to sound like a broken record, but could some one explain to me
> > > again why it is a good idea to design a new API that requires processor
> > > specific #ifdefs to be sprinkled all around generic kernel code?
> > >
> > > Back in:
> > > http://lkml.org/lkml/2008/12/4/329
> > > and
> > > http://lkml.org/lkml/2009/5/21/189
> > >
> > > I raised doubts about this hw-breakpoint thing being generic and the
> > > responses made think that the processor specific portions would be
> > > isolated in the processor specific parts of the kernel. I now see that
> > > I was wrong.
> > >
> > > When we add sparc, MIPS, ppc... Support it would be nice to not have to
> > > add all our own #ifdefs to this, but instead have a generic interface
> > > that will not need changes.
> > >
> > > David Daney
> >
> > I was discussing about it with Prasad few hours ago :)
> >
> > The fact is that archs support the hardware breakpoints in
> > very different ways each.
> > Some of them support read breakpoint, others not (x86).
> > Some support addresses range, others (x86).
> >
> > But still it would be nice to gather the most common
> > breakpoints operations through a real generic wrapper
> > that relies on arch specific implmentation in
> > background.
> >
> > Such as setting very simple x/w/r breakpoints...
> >
> > Well Prasad and Alan Stern could tell more about it,
> > I wait for their answer.
> >
> > Anyway it's a fairly new Api that can still evolve.
> > The basis are set but can still be improved and more high level
> > and generic things can still be implemented.
> >
>
> I think this concern can be partially addressed, atleast as far as the
> breakpoint length is concerned. I've added my comments in the response
> to David Daney here: http://lkml.org/lkml/2009/6/4/303.
>
> Hope that the changes proposed there is acceptable to the community.


Yeah, I think it's worth trying it.

Btw, I see there is no easy way to implement read breapoints in x86.
I guess the only solution would be to use the instruction decoder sent
recently by Masami for the kprobe tracer, coupled with RW breakpoints.


> Thanks,
> K.Prasad
>