Hi All,
I'm posting a new revision of the patches for using HW
Breakpoint registers (first version posted y'day:
http://lkml.org/lkml/2008/10/7/127).
This patch includes changes which were agreed upon, following the
comments from Alan Stern. The patches which are still marked as RFC
(pending much larger discussion and interest) is based on 2.6.27-rc9.
Thanks,
K.Prasad
This patch introduces two new files hw_breakpoint.[ch] which defines the
generic interfaces to use hardware breakpoint infrastructure of the system.
Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
---
include/asm-generic/hw_breakpoint.h | 263 ++++++++++++
kernel/hw_breakpoint.c | 731 ++++++++++++++++++++++++++++++++++++
2 files changed, 994 insertions(+)
Index: linux-bkpt-lkml-27-rc9/include/asm-generic/hw_breakpoint.h
===================================================================
--- /dev/null
+++ linux-bkpt-lkml-27-rc9/include/asm-generic/hw_breakpoint.h
@@ -0,0 +1,263 @@
+#ifndef _ASM_GENERIC_HW_BREAKPOINT_H
+#define _ASM_GENERIC_HW_BREAKPOINT_H
+
+#ifndef __ARCH_HW_BREAKPOINT_H
+#error "Please don't include this file directly"
+#endif
+
+#ifdef __KERNEL__
+#include <linux/list.h>
+#include <linux/types.h>
+
+/**
+ * struct hw_breakpoint - unified kernel/user-space hardware breakpoint
+ * @node: internal linked-list management
+ * @pre_handler: callback invoked before target address access
+ * @post_handler: callback invoked after target address access
+ * @installed: callback invoked when the breakpoint is installed
+ * @uninstalled: callback invoked when the breakpoint is uninstalled
+ * @info: arch-specific breakpoint info (address, length, and type)
+ * @priority: requested priority level
+ * @status: current registration/installation status
+ *
+ * %hw_breakpoint structures are the kernel's way of representing
+ * hardware breakpoints. These can be either execute breakpoints
+ * (triggered before instruction execution, and a post handler executed after
+ * instruction execution through use of single-stepping) or data breakpoints
+ * (also known as "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The breakpoint's address, length, and type are highly
+ * architecture-specific. The values are encoded in the @info field; you
+ * specify them when registering the breakpoint. To examine the encoded
+ * values use hw_breakpoint_get_{kaddress,uaddress,len,type}(), declared
+ * below.
+ *
+ * The address is specified as a regular kernel pointer (for kernel-space
+ * breakponts) or as an %__user pointer (for user-space breakpoints).
+ * With register_user_hw_breakpoint(), the address must refer to a
+ * location in user space. The breakpoint will be active only while the
+ * requested task is running. Conversely with
+ * register_kernel_hw_breakpoint(), the address must refer to a location
+ * in kernel space, and the breakpoint will be active on all CPUs
+ * regardless of the current task.
+ *
+ * The length is the breakpoint's extent in bytes, which is subject to
+ * certain limitations. include/asm/hw_breakpoint.h contains macros
+ * defining the available lengths for a specific architecture. Note that
+ * the address's alignment must match the length. The breakpoint will
+ * catch accesses to any byte in the range from address to address +
+ * (length - 1).
+ *
+ * The breakpoint's type indicates the sort of access that will cause it
+ * to trigger. Possible values may include:
+ *
+ * %HW_BREAKPOINT_EXECUTE (triggered on instruction execution),
+ * %HW_BREAKPOINT_RW (triggered on read or write access),
+ * %HW_BREAKPOINT_WRITE (triggered on write access), and
+ * %HW_BREAKPOINT_READ (triggered on read access).
+ *
+ * Appropriate macros are defined in include/asm/hw_breakpoint.h; not all
+ * possibilities are available on all architectures. Execute breakpoints
+ * must have length equal to the special value %HW_BREAKPOINT_LEN_EXECUTE.
+ *
+ * When a breakpoint gets hit, the @pre_handler or @post_handler (whichever or
+ * both depending upon architecture support and assignment by user) callback is
+ * invoked in_interrupt with a pointer to the %hw_breakpoint structure and the
+ * processor registers. Execute-breakpoint traps occur before the
+ * breakpointed instruction runs; when the callback returns the
+ * instruction is restarted (this time without a debug exception). All
+ * other types of trap occur after the memory access has taken place.
+ * Breakpoints are disabled during execution @pre_handler begins execution and
+ * is enabled only after completion of execution of @post_handler runs, to avoid
+ * recursive traps and allow unhindered access to breakpointed memory.
+ *
+ * Hardware breakpoints are implemented using the CPU's debug registers,
+ * which are a limited hardware resource. Requests to register a
+ * breakpoint will always succeed provided the parameters are valid,
+ * but the breakpoint may not be installed in a debug register right
+ * away. Physical debug registers are allocated based on the priority
+ * level stored in @priority (higher values indicate higher priority).
+ * User-space breakpoints within a single thread compete with one
+ * another, and all user-space breakpoints compete with all kernel-space
+ * breakpoints; however user-space breakpoints in different threads do
+ * not compete. %HW_BREAKPOINT_PRIO_PTRACE is the level used for ptrace
+ * requests; an unobtrusive kernel-space breakpoint will use
+ * %HW_BREAKPOINT_PRIO_NORMAL to avoid disturbing user programs. A
+ * kernel-space breakpoint that always wants to be installed and doesn't
+ * care about disrupting user debugging sessions can specify
+ * %HW_BREAKPOINT_PRIO_HIGH.
+ *
+ * A particular breakpoint may be allocated (installed in) a debug
+ * register or deallocated (uninstalled) from its debug register at any
+ * time, as other breakpoints are registered and unregistered. The
+ * @installed and @uninstalled callbacks are invoked in_atomic when these
+ * events occur. It is legal for @installed or @uninstalled to be %NULL,
+ * but one of the handlers - @pre_handler or @post_handler must be populated
+ * (please check support for your architecture using pre_ and
+ * post_handler_supported() routines to check for support. Note that it is not
+ * possible to register or unregister a user-space breakpoint from within a
+ * callback routine, since doing so requires a process context. Note that for
+ * user breakpoints, while in @installed and @uninstalled the thread may be
+ * context switched. Hence it may not be safe to call printk().
+ *
+ * For kernel-space breakpoints, @installed is invoked after the
+ * breakpoint is actually installed and @uninstalled is invoked before
+ * the breakpoint is actually uninstalled. As a result the @pre_ and
+ * @post_handler() routines may be invoked when not expected, but this way you
+ * will know that during the time interval from @installed to @uninstalled, all
+ * events are faithfully reported. (It is not possible to do any better than
+ * this in general, because on SMP systems there is no way to set a debug
+ * register simultaneously on all CPUs.) The same isn't always true with
+ * user-space breakpoints, but the differences should not be visible to a
+ * user process.
+ *
+ * If you need to know whether your kernel-space breakpoint was installed
+ * immediately upon registration, you can check the return value from
+ * register_kernel_hw_breakpoint(). If the value is not > 0, you can
+ * give up and unregister the breakpoint right away.
+ *
+ * @node and @status are intended for internal use. However @status
+ * may be read to determine whether or not the breakpoint is currently
+ * installed. (The value is not reliable unless local interrupts are
+ * disabled.)
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable. Note that it is not portable
+ * as written, because not all architectures support HW_BREAKPOINT_LEN_4.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hw_breakpoint.h>
+ *
+ * static void my_pre_handler(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * printk(KERN_DEBUG "Inside pre_handler of breakpoint exception\n");
+ * dump_stack();
+ * .......<more debugging output>........
+ * }
+ *
+ * static void my_post_handler(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * printk(KERN_DEBUG "Inside post_handler of breakpoint exception\n");
+ * dump_stack();
+ * .......<more debugging output>........
+ * }
+ *
+ * static struct hw_breakpoint my_bp;
+ *
+ * static int init_module(void)
+ * {
+ * ..........<do anything>............
+ * int bkpt_type = HW_BREAKPOINT_WRITE;
+ *
+ * if (pre_handler_supported(bkpt_type)
+ * my_bp.pre_handler = my_pre_handler;
+ * if (post_handler_supported(bkpt_type)
+ * my_bp.post_handler = my_post_handler;
+ * my_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
+ * rc = register_kernel_hw_breakpoint(&my_bp, &pid_max,
+ * HW_BREAKPOINT_LEN_4, bkpt_type);
+ * ..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ * ..........<do anything>............
+ * unregister_kernel_hw_breakpoint(&my_bp);
+ * ..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ *
+ */
+struct hw_breakpoint {
+ struct list_head node;
+ void (*pre_handler)(struct hw_breakpoint *,
+ struct pt_regs *);
+ void (*post_handler)(struct hw_breakpoint *,
+ struct pt_regs *);
+ void (*installed)(struct hw_breakpoint *);
+ void (*uninstalled)(struct hw_breakpoint *);
+ struct arch_hw_breakpoint info;
+ u8 priority;
+ u8 status;
+};
+
+/*
+ * Inline accessor routines to retrieve the arch-specific parts of
+ * a breakpoint structure:
+ */
+static const void *hw_breakpoint_get_kaddress(struct hw_breakpoint *bp);
+static const void __user *hw_breakpoint_get_uaddress(struct hw_breakpoint *bp);
+static unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp);
+static unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp);
+
+/*
+ * len and type values are defined in include/asm/hw_breakpoint.h.
+ * Available values vary according to the architecture. On i386 the
+ * possibilities are:
+ *
+ * HW_BREAKPOINT_LEN_1
+ * HW_BREAKPOINT_LEN_2
+ * HW_BREAKPOINT_LEN_4
+ * HW_BREAKPOINT_LEN_EXECUTE
+ * HW_BREAKPOINT_RW
+ * HW_BREAKPOINT_READ
+ * HW_BREAKPOINT_EXECUTE
+ *
+ * On other architectures HW_BREAKPOINT_LEN_8 may be available, and the
+ * 1-, 2-, and 4-byte lengths may be unavailable. There also may be
+ * HW_BREAKPOINT_WRITE. You can use #ifdef to check at compile time.
+ */
+
+/* Standard HW breakpoint priority levels (higher value = higher priority) */
+#define HW_BREAKPOINT_PRIO_NORMAL 25
+#define HW_BREAKPOINT_PRIO_PTRACE 50
+#define HW_BREAKPOINT_PRIO_HIGH 75
+
+/* HW breakpoint status values (0 = not registered) */
+#define HW_BREAKPOINT_REGISTERED 1
+#define HW_BREAKPOINT_INSTALLED 2
+
+/*
+ * The following two routines are meant to be called only from within
+ * the ptrace or utrace subsystems. The tsk argument will usually be a
+ * process being debugged by the current task, although it is also legal
+ * for tsk to be the current task. In any case it must be guaranteed
+ * that tsk will not start running in user mode while its breakpoints are
+ * being modified.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp,
+ const void __user *address, unsigned len, unsigned type);
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp,
+ const void *address, unsigned len, unsigned type);
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+
+/**
+ * pre_handler_supported - function that indicates if pre_handler is supported
+ * @type - integer that denotes that type of breakpoint requested
+ *
+ * Return value 1 - indicates that pre_handler is supported on the host machine
+ * Return value 0 - indicates that pre_handler is unsupported
+ */
+int pre_handler_supported(unsigned type);
+
+/**
+ * post_handler_supported - function that indicates if post_handler is supported
+ * @type - integer that denotes that type of breakpoint requested
+ *
+ * Return value 1 - indicates that post_handler is supported on the host machine
+ * Return value 0 - indicates that post_handler is unsupported
+ */
+int post_handler_supported(unsigned type);
+
+#endif /* __KERNEL__ */
+#endif /* _ASM_GENERIC_HW_BREAKPOINT_H */
Index: linux-bkpt-lkml-27-rc9/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ linux-bkpt-lkml-27-rc9/kernel/hw_breakpoint.c
@@ -0,0 +1,731 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ *
+ * This file contains the arch-independent routines. It is not meant
+ * to be compiled as a standalone source file; rather it should be
+ * #include'd by the arch-specific implementation.
+ */
+
+
+/*
+ * Install the debug register values for a new thread.
+ */
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ struct cpu_hw_breakpoint *chbi;
+ struct kernel_bp_data *thr_kbpdata;
+
+ /* This routine is on the hot path; it gets called for every
+ * context switch into a task with active breakpoints. We
+ * must make sure that the common case executes as quickly as
+ * possible.
+ */
+ chbi = &per_cpu(cpu_info, get_cpu());
+ chbi->bp_task = tsk;
+
+ /* Use RCU to synchronize with external updates */
+ rcu_read_lock();
+
+ /* Other CPUs might be making updates to the list of kernel
+ * breakpoints at this time. If they are, they will modify
+ * the other entry in kbpdata[] -- the one not pointed to
+ * by chbi->cur_kbpdata. So the update itself won't affect
+ * us directly.
+ *
+ * However when the update is finished, an IPI will arrive
+ * telling this CPU to change chbi->cur_kbpdata. We need
+ * to use a single consistent kbpdata[] entry, the present one.
+ * So we'll copy the pointer to a local variable, thr_kbpdata,
+ * and we must prevent the compiler from aliasing the two
+ * pointers. Only a compiler barrier is required, not a full
+ * memory barrier, because everything takes place on a single CPU.
+ */
+ restart:
+ thr_kbpdata = chbi->cur_kbpdata;
+ barrier();
+
+ /* Normally we can keep the same debug register settings as the
+ * last time this task ran. But if the kernel breakpoints have
+ * changed or any user breakpoints have been registered or
+ * unregistered, we need to handle the updates and possibly
+ * send out some notifications.
+ */
+ if (unlikely(thbi->gennum != thr_kbpdata->gennum)) {
+ struct hw_breakpoint *bp;
+ int i;
+ int num;
+
+ thbi->gennum = thr_kbpdata->gennum;
+ arch_update_thbi(thbi, thr_kbpdata);
+ num = thr_kbpdata->num_kbps;
+
+ /* This code can be invoked while a debugger is actively
+ * updating the thread's breakpoint list. We use RCU to
+ * protect our access to the list pointers. */
+ thbi->num_installed = 0;
+ i = HB_NUM;
+ list_for_each_entry_rcu(bp, &thbi->thread_bps, node) {
+
+ /* If this register is allocated for kernel bps,
+ * don't install. Otherwise do. */
+ if (--i < num) {
+ if (bp->status == HW_BREAKPOINT_INSTALLED) {
+ if (bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = HW_BREAKPOINT_REGISTERED;
+ }
+ } else {
+ ++thbi->num_installed;
+ if (bp->status != HW_BREAKPOINT_INSTALLED) {
+ bp->status = HW_BREAKPOINT_INSTALLED;
+ if (bp->installed)
+ (bp->installed)(bp);
+ }
+ }
+ }
+ }
+
+ /* Set the debug register */
+ arch_install_thbi(thbi);
+
+ /* Were there any kernel breakpoint changes while we were running? */
+ if (unlikely(chbi->cur_kbpdata != thr_kbpdata)) {
+
+ /* Some debug registers now be assigned to kernel bps and
+ * we might have messed them up. Reload all the kernel bps
+ * and then reload the thread bps.
+ */
+ arch_install_chbi(chbi);
+ goto restart;
+ }
+
+ rcu_read_unlock();
+ put_cpu_no_resched();
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void switch_to_none_hw_breakpoint(void)
+{
+ struct cpu_hw_breakpoint *chbi;
+
+ chbi = &per_cpu(cpu_info, get_cpu());
+ chbi->bp_task = NULL;
+
+ /* This routine gets called from only two places. In one
+ * the caller holds the hw_breakpoint_mutex; in the other
+ * interrupts are disabled. In either case, no kernel
+ * breakpoint updates can arrive while the routine runs.
+ * So we don't need to use RCU.
+ */
+ arch_install_none(chbi);
+ put_cpu_no_resched();
+}
+
+/*
+ * Update the debug registers on this CPU.
+ */
+static void update_this_cpu(void *unused)
+{
+ struct cpu_hw_breakpoint *chbi;
+ struct task_struct *tsk = current;
+
+ chbi = &per_cpu(cpu_info, get_cpu());
+
+ /* Install both the kernel and the user breakpoints */
+ arch_install_chbi(chbi);
+ if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+ switch_to_thread_hw_breakpoint(tsk);
+
+ put_cpu_no_resched();
+}
+
+/*
+ * Tell all CPUs to update their debug registers.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void update_all_cpus(void)
+{
+ /* We don't need to use any sort of memory barrier. The IPI
+ * carried out by on_each_cpu() includes its own barriers.
+ */
+ on_each_cpu(update_this_cpu, NULL, 0);
+ synchronize_rcu();
+}
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+ unsigned long flags;
+
+ /* Prevent IPIs for new kernel breakpoint updates */
+ local_irq_save(flags);
+
+ rcu_read_lock();
+ update_this_cpu(NULL);
+ rcu_read_unlock();
+
+ local_irq_restore(flags);
+}
+
+/*
+ * Take the 4 highest-priority breakpoints in a thread and accumulate
+ * their priorities in tprio. Highest-priority entry is in tprio[3].
+ */
+static void accum_thread_tprio(struct thread_hw_breakpoint *thbi)
+{
+ int i;
+
+ for (i = HB_NUM - 1; i >= 0 && thbi->bps[i]; --i)
+ tprio[i] = max(tprio[i], thbi->bps[i]->priority);
+}
+
+/*
+ * Recalculate the value of the tprio array, the maximum priority levels
+ * requested by user breakpoints in all threads.
+ *
+ * Each thread has a list of registered breakpoints, kept in order of
+ * decreasing priority. We'll set tprio[0] to the maximum priority of
+ * the first entries in all the lists, tprio[1] to the maximum priority
+ * of the second entries in all the lists, etc. In the end, we'll know
+ * that no thread requires breakpoints with priorities higher than the
+ * values in tprio.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void recalc_tprio(void)
+{
+ struct thread_hw_breakpoint *thbi;
+
+ memset(tprio, 0, sizeof tprio);
+
+ /* Loop through all threads having registered breakpoints
+ * and accumulate the maximum priority levels in tprio.
+ */
+ list_for_each_entry(thbi, &thread_list, node)
+ accum_thread_tprio(thbi);
+}
+
+/*
+ * Decide how many debug registers will be allocated to kernel breakpoints
+ * and consequently, how many remain available for user breakpoints.
+ *
+ * The priorities of the entries in the list of registered kernel bps
+ * are compared against the priorities stored in tprio[]. The 4 highest
+ * winners overall get to be installed in a debug register; num_kpbs
+ * keeps track of how many of those winners come from the kernel list.
+ *
+ * If num_kbps changes, or if a kernel bp changes its installation status,
+ * then call update_all_cpus() so that the debug registers will be set
+ * correctly on every CPU. If neither condition holds then the set of
+ * kernel bps hasn't changed, and nothing more needs to be done.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void balance_kernel_vs_user(void)
+{
+ int k, u;
+ int changed = 0;
+ struct hw_breakpoint *bp;
+ struct kernel_bp_data *new_kbpdata;
+
+ /* Determine how many debug registers are available for kernel
+ * breakpoints as opposed to user breakpoints, based on the
+ * priorities. Ties are resolved in favor of user bps.
+ */
+ k = 0; /* Next kernel bp to allocate */
+ u = HB_NUM - 1; /* Next user bp to allocate */
+ bp = list_entry(kernel_bps.next, struct hw_breakpoint, node);
+ while (k <= u) {
+ if (&bp->node == &kernel_bps || tprio[u] >= bp->priority)
+ --u; /* User bps win a slot */
+ else {
+ ++k; /* Kernel bp wins a slot */
+ if (bp->status != HW_BREAKPOINT_INSTALLED)
+ changed = 1;
+ bp = list_entry(bp->node.next, struct hw_breakpoint,
+ node);
+ }
+ }
+ if (k != cur_kbpdata->num_kbps)
+ changed = 1;
+
+ /* Notify the remaining kernel breakpoints that they are about
+ * to be uninstalled.
+ */
+ list_for_each_entry_from(bp, &kernel_bps, node) {
+ if (bp->status == HW_BREAKPOINT_INSTALLED) {
+ if (bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = HW_BREAKPOINT_REGISTERED;
+ changed = 1;
+ }
+ }
+
+ if (changed) {
+ cur_kbpindex ^= 1;
+ new_kbpdata = &kbpdata[cur_kbpindex];
+ new_kbpdata->gennum = cur_kbpdata->gennum + 1;
+ new_kbpdata->num_kbps = k;
+ arch_new_kbpdata(new_kbpdata);
+ u = 0;
+ list_for_each_entry(bp, &kernel_bps, node) {
+ if (u >= k)
+ break;
+ new_kbpdata->bps[u] = bp;
+ ++u;
+ }
+ rcu_assign_pointer(cur_kbpdata, new_kbpdata);
+
+ /* Tell all the CPUs to update their debug registers */
+ update_all_cpus();
+
+ /* Notify the breakpoints that just got installed */
+ for (u = 0; u < k; ++u) {
+ bp = new_kbpdata->bps[u];
+ if (bp->status != HW_BREAKPOINT_INSTALLED) {
+ bp->status = HW_BREAKPOINT_INSTALLED;
+ if (bp->installed)
+ (bp->installed)(bp);
+ }
+ }
+ }
+}
+
+/*
+ * Return the pointer to a thread's hw_breakpoint info area,
+ * and try to allocate one if it doesn't exist.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+ struct task_struct *tsk)
+{
+ if (!tsk->thread.hw_breakpoint_info && !(tsk->flags & PF_EXITING)) {
+ struct thread_hw_breakpoint *thbi;
+
+ thbi = kzalloc(sizeof(struct thread_hw_breakpoint),
+ GFP_KERNEL);
+ if (thbi) {
+ INIT_LIST_HEAD(&thbi->node);
+ INIT_LIST_HEAD(&thbi->thread_bps);
+
+ /* Force an update the next time tsk runs */
+ thbi->gennum = cur_kbpdata->gennum - 2;
+ tsk->thread.hw_breakpoint_info = thbi;
+ }
+ }
+ return tsk->thread.hw_breakpoint_info;
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ *
+ * If tsk != current then tsk must not be usable (for example, a
+ * child being cleaned up from a failed fork).
+ */
+void flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ struct hw_breakpoint *bp;
+
+ if (!thbi)
+ return;
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* Let the breakpoints know they are being uninstalled */
+ list_for_each_entry(bp, &thbi->thread_bps, node) {
+ if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = 0;
+ }
+
+ /* Remove tsk from the list of all threads with registered bps */
+ list_del(&thbi->node);
+
+ /* The thread no longer has any breakpoints associated with it */
+ clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ tsk->thread.hw_breakpoint_info = NULL;
+ kfree(thbi);
+
+ /* Recalculate and rebalance the kernel-vs-user priorities */
+ recalc_tprio();
+ balance_kernel_vs_user();
+
+ /* Actually uninstall the breakpoints if necessary */
+ if (tsk == current)
+ switch_to_none_hw_breakpoint();
+ mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+ struct task_struct *child, unsigned long clone_flags)
+{
+ /* We will assume that breakpoint settings are not inherited
+ * and the child starts out with no debug registers set.
+ * But what about CLONE_PTRACE?
+ */
+ clear_tsk_thread_flag(child, TIF_DEBUG);
+ return 0;
+}
+
+/*
+ * Store the highest-priority thread breakpoint entries in an array.
+ */
+static void store_thread_bp_array(struct thread_hw_breakpoint *thbi)
+{
+ struct hw_breakpoint *bp;
+ int i;
+
+ i = HB_NUM - 1;
+ list_for_each_entry(bp, &thbi->thread_bps, node) {
+ thbi->bps[i] = bp;
+ arch_store_thread_bp_array(thbi, bp, i);
+ if (--i < 0)
+ break;
+ }
+ while (i >= 0)
+ thbi->bps[i--] = NULL;
+
+ /* Force an update the next time this task runs */
+ thbi->gennum = cur_kbpdata->gennum - 2;
+}
+
+/*
+ * Insert a new breakpoint in a priority-sorted list.
+ * Return the bp's index in the list.
+ *
+ * Thread invariants:
+ * tsk_thread_flag(tsk, TIF_DEBUG) set implies
+ * tsk->thread.hw_breakpoint_info is not NULL.
+ * tsk_thread_flag(tsk, TIF_DEBUG) set iff thbi->thread_bps is non-empty
+ * iff thbi->node is on thread_list.
+ */
+static int insert_bp_in_list(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+ struct list_head *head;
+ int pos;
+ struct hw_breakpoint *temp_bp;
+
+ /* tsk and thbi are NULL for kernel bps, non-NULL for user bps */
+ if (tsk)
+ head = &thbi->thread_bps;
+ else
+ head = &kernel_bps;
+
+ /* Equal-priority breakpoints get listed first-come-first-served */
+ pos = 0;
+ list_for_each_entry(temp_bp, head, node) {
+ if (bp->priority > temp_bp->priority)
+ break;
+ ++pos;
+ }
+ bp->status = HW_BREAKPOINT_REGISTERED;
+ list_add_tail(&bp->node, &temp_bp->node);
+
+ if (tsk) {
+ store_thread_bp_array(thbi);
+
+ /* Is this the thread's first registered breakpoint? */
+ if (list_empty(&thbi->node)) {
+ set_tsk_thread_flag(tsk, TIF_DEBUG);
+ list_add(&thbi->node, &thread_list);
+ }
+ }
+ return pos;
+}
+
+/*
+ * Remove a breakpoint from its priority-sorted list.
+ *
+ * See the invariants mentioned above.
+ */
+static void remove_bp_from_list(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+ /* Remove bp from the thread's/kernel's list. If the list is now
+ * empty we must clear the TIF_DEBUG flag. But keep the
+ * thread_hw_breakpoint structure, so that the virtualized debug
+ * register values will remain valid.
+ */
+ list_del(&bp->node);
+ if (tsk) {
+ store_thread_bp_array(thbi);
+
+ if (list_empty(&thbi->thread_bps)) {
+ list_del_init(&thbi->node);
+ clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ }
+ }
+
+ /* Tell the breakpoint it is being uninstalled */
+ if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = 0;
+}
+
+/*
+ * Validate the settings in a hw_breakpoint structure.
+ */
+static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk,
+ unsigned long address, unsigned len, unsigned int type)
+{
+ int ret;
+ unsigned int align;
+
+ ret = arch_validate_hwbkpt_settings(bp, address, len, type, &align);
+ if (ret < 0)
+ goto err;
+
+ /* Check that the low-order bits of the address are appropriate
+ * for the alignment implied by len.
+ */
+ if (address & align)
+ return -EINVAL;
+
+ /* Check that the virtual address is in the proper range */
+ if (tsk) {
+ if (!arch_check_va_in_userspace(address, tsk))
+ return -EFAULT;
+ } else {
+ if (!arch_check_va_in_kernelspace(address))
+ return -EFAULT;
+ }
+ err:
+ return ret;
+}
+
+/*
+ * Actual implementation of register_user_hw_breakpoint.
+ */
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp,
+ unsigned long address, unsigned len, unsigned type)
+{
+ int rc;
+ struct thread_hw_breakpoint *thbi;
+ int pos;
+
+ bp->status = 0;
+ rc = validate_settings(bp, tsk, address, len, type);
+ if (rc)
+ return rc;
+
+ thbi = alloc_thread_hw_breakpoint(tsk);
+ if (!thbi)
+ return -ENOMEM;
+
+ /* Insert bp in the thread's list */
+ pos = insert_bp_in_list(bp, thbi, tsk);
+ arch_register_user_hw_breakpoint(bp, thbi);
+
+ /* Update and rebalance the priorities. We don't need to go through
+ * the list of all threads; adding a breakpoint can only cause the
+ * priorities for this thread to increase.
+ */
+ accum_thread_tprio(thbi);
+ balance_kernel_vs_user();
+
+ /* Did bp get allocated to a debug register? We can tell from its
+ * position in the list. The number of registers allocated to
+ * kernel breakpoints is num_kbps; all the others are available for
+ * user breakpoints. If bp's position in the priority-ordered list
+ * is low enough, it will get a register.
+ */
+ if (pos < HB_NUM - cur_kbpdata->num_kbps) {
+ rc = 1;
+
+ /* Does it need to be installed right now? */
+ if (tsk == current)
+ switch_to_thread_hw_breakpoint(tsk);
+ /* Otherwise it will get installed the next time tsk runs */
+ }
+
+ return rc;
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint will be set
+ * @bp: the breakpoint structure to register
+ * @address: location (virtual address) of the breakpoint
+ * @len: encoded extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read-write, or execute)
+ *
+ * This routine registers a breakpoint to be associated with @tsk's
+ * memory space and active only while @tsk is running. It does not
+ * guarantee that the breakpoint will be allocated to a debug register
+ * immediately; there may be other higher-priority breakpoints registered
+ * which require the use of all the debug registers.
+ *
+ * @tsk will normally be a process being debugged by the current process,
+ * but it may also be the current process.
+ *
+ * @address, @len, and @type are checked for validity and stored in
+ * encoded form in @bp. @bp->triggered and @bp->priority must be set
+ * properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp,
+ const void __user *address, unsigned len, unsigned type)
+{
+ int rc;
+
+ mutex_lock(&hw_breakpoint_mutex);
+ rc = __register_user_hw_breakpoint(tsk, bp,
+ (unsigned long) address, len, type);
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+
+/*
+ * Actual implementation of unregister_user_hw_breakpoint.
+ */
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+
+ if (!bp->status)
+ return; /* Not registered */
+
+ /* Remove bp from the thread's list */
+ remove_bp_from_list(bp, thbi, tsk);
+ arch_unregister_user_hw_breakpoint(bp, thbi);
+
+ /* Recalculate and rebalance the kernel-vs-user priorities,
+ * and actually uninstall bp if necessary.
+ */
+ recalc_tprio();
+ balance_kernel_vs_user();
+ if (tsk == current)
+ switch_to_thread_hw_breakpoint(tsk);
+}
+
+/**
+ * unregister_user_hw_breakpoint - unregister a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ mutex_lock(&hw_breakpoint_mutex);
+ __unregister_user_hw_breakpoint(tsk, bp);
+ mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/**
+ * register_kernel_hw_breakpoint - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ * @address: location (virtual address) of the breakpoint
+ * @len: encoded extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read-write, or execute)
+ *
+ * This routine registers a breakpoint to be active at all times. It
+ * does not guarantee that the breakpoint will be allocated to a debug
+ * register immediately; there may be other higher-priority breakpoints
+ * registered which require the use of all the debug registers.
+ *
+ * @address, @len, and @type are checked for validity and stored in
+ * encoded form in @bp. @bp->triggered and @bp->priority must be set
+ * properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp,
+ const void *address, unsigned len, unsigned type)
+{
+ int rc;
+ int pos;
+
+ bp->status = 0;
+ rc = validate_settings(bp, NULL, (unsigned long) address, len, type);
+ if (rc)
+ return rc;
+
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* Insert bp in the kernel's list */
+ pos = insert_bp_in_list(bp, NULL, NULL);
+ arch_register_kernel_hw_breakpoint(bp);
+
+ /* Rebalance the priorities. This will install bp if it
+ * was allocated a debug register.
+ */
+ balance_kernel_vs_user();
+
+ /* Did bp get allocated to a debug register? We can tell from its
+ * position in the list. The number of registers allocated to
+ * kernel breakpoints is num_kbps; all the others are available for
+ * user breakpoints. If bp's position in the priority-ordered list
+ * is low enough, it will get a register.
+ */
+ if (pos < cur_kbpdata->num_kbps)
+ rc = 1;
+
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
+
+/**
+ * unregister_kernel_hw_breakpoint - unregister a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+ if (!bp->status)
+ return; /* Not registered */
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* Remove bp from the kernel's list */
+ remove_bp_from_list(bp, NULL, NULL);
+ arch_unregister_kernel_hw_breakpoint(bp);
+
+ /* Rebalance the priorities. This will uninstall bp if it
+ * was allocated a debug register.
+ */
+ balance_kernel_vs_user();
+
+ mutex_unlock(&hw_breakpoint_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hw_breakpoint);
This patch introduces two new files named hw_breakpoint.[ch] inside x86 specific
directories. They contain functions which help validate and serve requests for
using Hardware Breakpoint registers on x86 processors.
Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
---
arch/x86/kernel/Makefile | 2
arch/x86/kernel/hw_breakpoint.c | 684 ++++++++++++++++++++++++++++++++++++++++
include/asm-x86/hw_breakpoint.h | 121 +++++++
3 files changed, 806 insertions(+), 1 deletion(-)
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/hw_breakpoint.c
@@ -0,0 +1,684 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ * Copyright (C) 2008 IBM Corporation
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kdebug.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/rculist.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+#include <linux/percpu.h>
+
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>
+#include <asm/processor.h>
+
+DEFINE_PER_CPU(unsigned int, sstep_reason) = 0;
+
+/* Arch-specific hook routines */
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void arch_install_chbi(struct cpu_hw_breakpoint *chbi)
+{
+ struct hw_breakpoint **bps;
+
+ /* Don't allow debug exceptions while we update the registers */
+ set_debugreg(0UL, 7);
+ chbi->cur_kbpdata = rcu_dereference(cur_kbpdata);
+
+ /* Kernel breakpoints are stored starting in DR0 and going up */
+ bps = chbi->cur_kbpdata->bps;
+ switch (chbi->cur_kbpdata->num_kbps) {
+ case 4:
+ set_debugreg(bps[3]->info.address, 3);
+ case 3:
+ set_debugreg(bps[2]->info.address, 2);
+ case 2:
+ set_debugreg(bps[1]->info.address, 1);
+ case 1:
+ set_debugreg(bps[0]->info.address, 0);
+ }
+ /* No need to set DR6 */
+ set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Update an out-of-date thread hw_breakpoint info structure.
+ */
+static void arch_update_thbi(struct thread_hw_breakpoint *thbi,
+ struct kernel_bp_data *thr_kbpdata)
+{
+ int num = thr_kbpdata->num_kbps;
+
+ thbi->tkdr7 = thr_kbpdata->mkdr7 | (thbi->tdr7 & ~kdr7_masks[num]);
+}
+
+/*
+ * Install the thread breakpoints in their debug registers.
+ */
+static void arch_install_thbi(struct thread_hw_breakpoint *thbi)
+{
+ /* Install the user breakpoints. Kernel breakpoints are stored
+ * starting in DR0 and going up; there are num_kbps of them.
+ * User breakpoints are stored starting in DR3 and going down,
+ * as many as we have room for.
+ */
+ switch (thbi->num_installed) {
+ case 4:
+ set_debugreg(thbi->tdr[0], 0);
+ case 3:
+ set_debugreg(thbi->tdr[1], 1);
+ case 2:
+ set_debugreg(thbi->tdr[2], 2);
+ case 1:
+ set_debugreg(thbi->tdr[3], 3);
+ }
+ /* No need to set DR6 */
+ set_debugreg(thbi->tkdr7, 7);
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void arch_install_none(struct cpu_hw_breakpoint *chbi)
+{
+ set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Create a new kbpdata entry.
+ */
+static void arch_new_kbpdata(struct kernel_bp_data *new_kbpdata)
+{
+ int num = new_kbpdata->num_kbps;
+
+ new_kbpdata->mkdr7 = kdr7 & (kdr7_masks[num] | DR_GLOBAL_SLOWDOWN);
+}
+
+/*
+ * Store a thread breakpoint array entry's address
+ */
+static void arch_store_thread_bp_array(struct thread_hw_breakpoint *thbi,
+ struct hw_breakpoint *bp, int i)
+{
+ thbi->tdr[i] = bp->info.address;
+}
+
+int pre_handler_supported(unsigned type)
+{
+ if (type == HW_BREAKPOINT_EXECUTE)
+ return 1;
+ else
+ return 0;
+}
+
+int post_handler_supported(unsigned type)
+{
+ /* We can have a post handler for all types of breakpoints */
+ return 1;
+}
+
+/*
+ * Store a breakpoint's encoded address, length, and type.
+ */
+static void arch_store_info(struct hw_breakpoint *bp,
+ unsigned long address, unsigned len, unsigned type)
+{
+ bp->info.address = address;
+ bp->info.len = len;
+ bp->info.type = type;
+}
+
+/*
+ * Validate the arch-specific HW Breakpoint register settings
+ */
+static int arch_validate_hwbkpt_settings(struct hw_breakpoint *bp,
+ unsigned long address, unsigned len, unsigned int type,
+ unsigned int *align)
+{
+ int ret = -EINVAL;
+
+ switch (type) {
+ case HW_BREAKPOINT_EXECUTE:
+ if (len != HW_BREAKPOINT_LEN_EXECUTE)
+ return ret;
+ break;
+ case HW_BREAKPOINT_WRITE:
+ break;
+ case HW_BREAKPOINT_IO:
+ break;
+ case HW_BREAKPOINT_RW:
+ break;
+ default:
+ return ret;
+ }
+
+ switch (len) {
+ case HW_BREAKPOINT_LEN_1:
+ *align = 0;
+ break;
+ case HW_BREAKPOINT_LEN_2:
+ *align = 1;
+ break;
+ case HW_BREAKPOINT_LEN_4:
+ *align = 3;
+ break;
+ default:
+ return ret;
+ }
+
+ if ((pre_handler_supported(type) && (bp->pre_handler)) ||
+ (post_handler_supported(type) && (bp->post_handler))) {
+ ret = 0;
+ arch_store_info(bp, address, len, type);
+ }
+ return ret;
+}
+
+/*
+ * Check for virtual address in user space.
+ */
+static int arch_check_va_in_userspace(unsigned long va,
+ struct task_struct *tsk)
+{
+#ifndef CONFIG_X86_64
+#define TASK_SIZE_OF(t) TASK_SIZE
+#endif
+ return (va < TASK_SIZE_OF(tsk));
+}
+
+/*
+ * Check for virtual address in kernel space.
+ */
+static int arch_check_va_in_kernelspace(unsigned long va)
+{
+#ifndef CONFIG_X86_64
+#define TASK_SIZE64 TASK_SIZE
+#endif
+ return (va >= TASK_SIZE64);
+}
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static unsigned long encode_dr7(int drnum, unsigned len, unsigned type)
+{
+ unsigned long temp;
+
+ temp = (len | type) & 0xf;
+ temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+ temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+ DR_GLOBAL_SLOWDOWN;
+ return temp;
+}
+
+/*
+ * Calculate the DR7 value for a list of kernel or user breakpoints.
+ */
+static unsigned long calculate_dr7(struct thread_hw_breakpoint *thbi)
+{
+ int is_user;
+ struct list_head *bp_list;
+ struct hw_breakpoint *bp;
+ int i;
+ int drnum;
+ unsigned long dr7;
+
+ if (thbi) {
+ is_user = 1;
+ bp_list = &thbi->thread_bps;
+ drnum = HB_NUM - 1;
+ } else {
+ is_user = 0;
+ bp_list = &kernel_bps;
+ drnum = 0;
+ }
+
+ /* Kernel bps are assigned from DR0 on up, and user bps are assigned
+ * from DR3 on down. Accumulate all 4 bps; the kernel DR7 mask will
+ * select the appropriate bits later.
+ */
+ dr7 = 0;
+ i = 0;
+ list_for_each_entry(bp, bp_list, node) {
+
+ /* Get the debug register number and accumulate the bits */
+ dr7 |= encode_dr7(drnum, bp->info.len, bp->info.type);
+ if (++i >= HB_NUM)
+ break;
+ if (is_user)
+ --drnum;
+ else
+ ++drnum;
+ }
+ return dr7;
+}
+
+/*
+ * Register a new user breakpoint structure.
+ */
+static void arch_register_user_hw_breakpoint(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi)
+{
+ thbi->tdr7 = calculate_dr7(thbi);
+
+ /* If this is an execution breakpoint for the current PC address,
+ * we should clear the task's RF so that the bp will be certain
+ * to trigger.
+ *
+ * FIXME: It's not so easy to get hold of the task's PC as a linear
+ * address! ptrace.c does this already...
+ */
+}
+
+/*
+ * Unregister a user breakpoint structure.
+ */
+static void arch_unregister_user_hw_breakpoint(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi)
+{
+ thbi->tdr7 = calculate_dr7(thbi);
+}
+
+/*
+ * Register a kernel breakpoint structure.
+ */
+static void arch_register_kernel_hw_breakpoint(
+ struct hw_breakpoint *bp)
+{
+ kdr7 = calculate_dr7(NULL);
+}
+
+/*
+ * Unregister a kernel breakpoint structure.
+ */
+static void arch_unregister_kernel_hw_breakpoint(
+ struct hw_breakpoint *bp)
+{
+ kdr7 = calculate_dr7(NULL);
+}
+
+
+/* End of arch-specific hook routines */
+
+
+/*
+ * Copy out the debug register information for a core dump.
+ *
+ * tsk must be equal to current.
+ */
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ int i;
+
+ memset(u_debugreg, 0, sizeof u_debugreg);
+ if (thbi) {
+ for (i = 0; i < HB_NUM; ++i)
+ u_debugreg[i] = thbi->vdr_bps[i].info.address;
+ u_debugreg[7] = thbi->vdr7;
+ }
+ u_debugreg[6] = tsk->thread.vdr6;
+}
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+ struct task_struct *tsk);
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp,
+ unsigned long address, unsigned len, unsigned type);
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+ struct task_struct *tsk = current;
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ int i;
+
+ /* Store in the virtual DR6 register the fact that the breakpoint
+ * was hit so the thread's debugger will see it.
+ */
+ if (thbi) {
+ i = bp - thbi->vdr_bps;
+ tsk->thread.vdr6 |= (DR_TRAP0 << i);
+ }
+}
+
+/*
+ * Handle PTRACE_PEEKUSR calls for the debug register area.
+ */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n)
+{
+ struct thread_hw_breakpoint *thbi;
+ unsigned long val = 0;
+
+ mutex_lock(&hw_breakpoint_mutex);
+ thbi = tsk->thread.hw_breakpoint_info;
+ if (n < HB_NUM) {
+ if (thbi)
+ val = thbi->vdr_bps[n].info.address;
+ } else if (n == 6) {
+ val = tsk->thread.vdr6;
+ } else if (n == 7) {
+ if (thbi)
+ val = thbi->vdr7;
+ }
+ mutex_unlock(&hw_breakpoint_mutex);
+ return val;
+}
+
+/*
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7. Return the "enabled" status.
+ */
+static int decode_dr7(unsigned long dr7, int bpnum, unsigned *len,
+ unsigned *type)
+{
+ int temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+
+ *len = (temp & 0xc) | 0x40;
+ *type = (temp & 0x3) | 0x80;
+ return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
+}
+
+/*
+ * Handle ptrace writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk,
+ struct thread_hw_breakpoint *thbi, unsigned long data)
+{
+ struct hw_breakpoint *bp;
+ int i;
+ int rc = 0;
+ unsigned long old_dr7 = thbi->vdr7;
+
+ data &= ~DR_CONTROL_RESERVED;
+
+ /* Loop through all the hardware breakpoints, making the
+ * appropriate changes to each.
+ */
+ restore_settings:
+ thbi->vdr7 = data;
+ bp = &thbi->vdr_bps[0];
+ for (i = 0; i < HB_NUM; (++i, ++bp)) {
+ int enabled;
+ unsigned len, type;
+
+ enabled = decode_dr7(data, i, &len, &type);
+
+ /* Unregister the breakpoint before trying to change it */
+ if (bp->status)
+ __unregister_user_hw_breakpoint(tsk, bp);
+
+ /* Now register the breakpoint if it should be enabled.
+ * New invalid entries will raise an error here.
+ */
+ if (enabled) {
+ /* TODO: Change this portion of the code to use pre_
+ * and post_ handler_supported() routine
+ */
+ if (type == HW_BREAKPOINT_EXECUTE)
+ bp->pre_handler = ptrace_triggered;
+ else
+ bp->post_handler = ptrace_triggered;
+
+ bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+ if (rc == 0 && __register_user_hw_breakpoint(tsk, bp,
+ bp->info.address, len, type) < 0)
+ break;
+ }
+ }
+
+ /* If anything above failed, restore the original settings */
+ if (i < HB_NUM) {
+ rc = -EIO;
+ data = old_dr7;
+ goto restore_settings;
+ }
+ return rc;
+}
+
+/*
+ * Handle PTRACE_POKEUSR calls for the debug register area.
+ */
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
+{
+ struct thread_hw_breakpoint *thbi;
+ int rc = -EIO;
+
+ /* We have to hold this lock the entire time, to prevent thbi
+ * from being deallocated out from under us.
+ */
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* There are no DR4 or DR5 registers */
+ if (n == 4 || n == 5)
+ ;
+
+ /* Writes to DR6 modify the virtualized value */
+ else if (n == 6) {
+ tsk->thread.vdr6 = val;
+ rc = 0;
+ }
+
+ else if (!tsk->thread.hw_breakpoint_info && val == 0)
+ rc = 0; /* Minor optimization */
+
+ else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+ rc = -ENOMEM;
+
+ /* Writes to DR0 - DR3 change a breakpoint address */
+ else if (n < HB_NUM) {
+ struct hw_breakpoint *bp = &thbi->vdr_bps[n];
+
+ /* If the breakpoint is registered then unregister it,
+ * change it, and re-register it. Revert to the original
+ * address if an error occurs.
+ */
+ if (bp->status) {
+ unsigned long old_addr = bp->info.address;
+
+ __unregister_user_hw_breakpoint(tsk, bp);
+ rc = __register_user_hw_breakpoint(tsk, bp,
+ val, bp->info.len, bp->info.type);
+ if (rc < 0) {
+ __register_user_hw_breakpoint(tsk, bp,
+ old_addr,
+ bp->info.len, bp->info.type);
+ }
+ } else {
+ bp->info.address = val;
+ rc = 0;
+ }
+ }
+
+ /* All that's left is DR7 */
+ else
+ rc = ptrace_write_dr7(tsk, thbi, val);
+
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+
+
+/*
+ * Handle debug exception notifications.
+ */
+
+static void switch_to_none_hw_breakpoint(void);
+static struct hw_breakpoint *last_hit_bp;
+static struct thread_hw_breakpoint *last_hit_thbi;
+
+static int __kprobes hw_breakpoint_handler(struct die_args *args)
+{
+ struct cpu_hw_breakpoint *chbi;
+ int i;
+ struct hw_breakpoint *bp;
+ struct thread_hw_breakpoint *thbi = NULL;
+ unsigned int *ssr = &(__get_cpu_var(sstep_reason));
+
+ /* The DR6 value is stored in args->err */
+#define DR6 (args->err)
+
+ chbi = &per_cpu(cpu_info, get_cpu());
+
+ /* Disable all breakpoints so that the callbacks can run without
+ * triggering recursive debug exceptions.
+ */
+ set_debugreg(0UL, 7);
+
+ if ((DR6 & DR_STEP) && !((*ssr) & SSTEP_HWBKPT))
+ return NOTIFY_DONE;
+
+ /* If none of the breakpoint detection flags are enabled, it means we
+ * have arrived here due to single-stepping of a target instruction
+ */
+ if ((!(DR6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))) &&
+ (DR6 & DR_STEP)) {
+ args->regs->flags &= ~X86_EFLAGS_TF;
+ (*ssr) &= ~SSTEP_HWBKPT;
+ (last_hit_bp->post_handler)(last_hit_bp, args->regs);
+ /* Re-enable the breakpoints */
+ set_debugreg(last_hit_thbi ? last_hit_thbi->tkdr7 :
+ chbi->cur_kbpdata->mkdr7, 7);
+ put_cpu_no_resched();
+ current->thread.vdr6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
+
+ return NOTIFY_STOP;
+ }
+
+ /* Assert that local interrupts are disabled */
+ /* Reset the DRn bits in the virtualized register value.
+ * The ptrace trigger routine will add in whatever is needed.
+ */
+ current->thread.vdr6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
+
+ /* Are we a victim of lazy debug-register switching? */
+ if (!chbi->bp_task)
+ ;
+ else if (chbi->bp_task != current) {
+
+ /* No user breakpoints are valid. Perform the belated
+ * debug-register switch.
+ */
+ switch_to_none_hw_breakpoint();
+ } else {
+ thbi = chbi->bp_task->thread.hw_breakpoint_info;
+ }
+
+ /* Handle all the breakpoints that were triggered */
+ for (i = 0; i < HB_NUM; ++i) {
+ if (likely(!(DR6 & (DR_TRAP0 << i))))
+ continue;
+
+ /* Find the corresponding hw_breakpoint structure and
+ * invoke its triggered callback.
+ */
+ if (i < chbi->cur_kbpdata->num_kbps)
+ bp = chbi->cur_kbpdata->bps[i];
+ else if (thbi)
+ bp = thbi->bps[i];
+ else /* False alarm due to lazy DR switching */
+ continue;
+ if (bp) {
+ (*ssr) = 0;
+ /* Enable single-stepping over the watched insn */
+ switch (bp->info.type) {
+ case HW_BREAKPOINT_EXECUTE:
+ if (bp->pre_handler)
+ (bp->pre_handler)(bp, args->regs);
+
+ if (bp->post_handler) {
+ (*ssr) |= SSTEP_HWBKPT;
+ args->regs->flags |=
+ (X86_EFLAGS_TF | X86_EFLAGS_RF);
+ last_hit_bp = bp;
+ last_hit_thbi = thbi;
+ return NOTIFY_DONE;
+ }
+ break;
+ case HW_BREAKPOINT_WRITE:
+ case HW_BREAKPOINT_RW:
+ if (bp->post_handler)
+ (bp->post_handler)(bp, args->regs);
+ /* Re-enable the breakpoints */
+ set_debugreg(thbi ? thbi->tkdr7 :
+ chbi->cur_kbpdata->mkdr7, 7);
+ put_cpu_no_resched();
+
+ return NOTIFY_DONE;
+ }
+ }
+ }
+ /* Stop processing further if the exception is a stray one */
+ if (!(DR6 & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+ return NOTIFY_STOP;
+
+ return NOTIFY_DONE;
+#undef DR6
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_exceptions_notify(
+ struct notifier_block *unused, unsigned long val, void *data)
+{
+int ret;
+
+ if (val != DIE_DEBUG)
+ return NOTIFY_DONE;
+ ret = hw_breakpoint_handler(data);
+ return ret;
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+ .notifier_call = hw_breakpoint_exceptions_notify,
+ .priority = 0x00
+};
+
+static int __init init_hw_breakpoint(void)
+{
+ load_debug_registers();
+ return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
+
+
+/* Grab the arch-independent code */
+#include "../../../kernel/hw_breakpoint.c"
+
Index: linux-bkpt-lkml-27-rc9/include/asm-x86/hw_breakpoint.h
===================================================================
--- /dev/null
+++ linux-bkpt-lkml-27-rc9/include/asm-x86/hw_breakpoint.h
@@ -0,0 +1,121 @@
+#ifndef _I386_HW_BREAKPOINT_H
+#define _I386_HW_BREAKPOINT_H
+
+#ifdef __KERNEL__
+#define __ARCH_HW_BREAKPOINT_H
+
+struct arch_hw_breakpoint {
+ unsigned long address;
+ u8 len;
+ u8 type;
+} __attribute__((packed));
+
+#include <asm-generic/hw_breakpoint.h>
+
+/* HW breakpoint accessor routines */
+static inline const void *hw_breakpoint_get_kaddress(struct hw_breakpoint *bp)
+{
+ return (const void *) bp->info.address;
+}
+
+static inline const void __user *hw_breakpoint_get_uaddress
+ (struct hw_breakpoint *bp)
+{
+ return (const void __user *) bp->info.address;
+}
+
+static inline unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp)
+{
+ return bp->info.len;
+}
+
+static inline unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp)
+{
+ return bp->info.type;
+}
+
+/* Available HW breakpoint length encodings */
+#define HW_BREAKPOINT_LEN_1 0x40
+#define HW_BREAKPOINT_LEN_2 0x44
+#define HW_BREAKPOINT_LEN_4 0x4c
+#define HW_BREAKPOINT_LEN_EXECUTE 0x40
+
+/* Available HW breakpoint type encodings */
+#define HW_BREAKPOINT_EXECUTE 0x80 /* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE 0x81 /* trigger on memory write */
+#define HW_BREAKPOINT_IO 0x82 /* trigger on I/O reads or writes */
+#define HW_BREAKPOINT_RW 0x83 /* trigger on memory read or write */
+
+#define HB_NUM 4
+
+/* Per-thread HW breakpoint and debug register info */
+struct thread_hw_breakpoint {
+
+ /* utrace support */
+ struct list_head node; /* Entry in thread list */
+ struct list_head thread_bps; /* Thread's breakpoints */
+ struct hw_breakpoint *bps[HB_NUM]; /* Highest-priority bps */
+ unsigned long tdr[HB_NUM]; /* and their addresses */
+ int num_installed; /* Number of installed bps */
+ unsigned gennum; /* update-generation number */
+
+ /* Only the portions below are arch-specific */
+
+ /* ptrace support -- Note that vdr6 is stored directly in the
+ * thread_struct so that it is always available.
+ */
+ unsigned long vdr7; /* Virtualized DR7 */
+ struct hw_breakpoint vdr_bps[HB_NUM]; /* Breakpoints
+ representing virtualized debug registers 0 - 3 */
+ unsigned long tdr7; /* Thread's DR7 value */
+ unsigned long tkdr7; /* Thread + kernel DR7 value */
+};
+
+/* Kernel-space breakpoint data */
+struct kernel_bp_data {
+ unsigned gennum; /* Generation number */
+ int num_kbps; /* Number of kernel bps */
+ struct hw_breakpoint *bps[HB_NUM]; /* Loaded breakpoints */
+
+ /* Only the portions below are arch-specific */
+ unsigned long mkdr7; /* Masked kernel DR7 value */
+};
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+ struct kernel_bp_data *cur_kbpdata; /* Current kbpdata[] entry */
+ struct task_struct *bp_task; /* The thread whose bps
+ are currently loaded in the debug registers */
+};
+
+/* Global info */
+static struct kernel_bp_data kbpdata[2]; /* Old and new settings */
+static int cur_kbpindex; /* Alternates 0, 1, ... */
+static struct kernel_bp_data *cur_kbpdata = &kbpdata[0];
+ /* Always equal to &kbpdata[cur_kbpindex] */
+
+static u8 tprio[HB_NUM]; /* Thread bp max priorities */
+static LIST_HEAD(kernel_bps); /* Kernel breakpoint list */
+static LIST_HEAD(thread_list); /* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex); /* Protects everything */
+
+/* Only the portions below are arch-specific */
+
+static unsigned long kdr7; /* Unmasked kernel DR7 value */
+
+/* Masks for the bits in DR7 related to kernel breakpoints, for various
+ * values of num_kbps. Entry n is the mask for when there are n kernel
+ * breakpoints, in debug registers 0 - (n-1). The DR_GLOBAL_SLOWDOWN bit
+ * (GE) is handled specially.
+ */
+static const unsigned long kdr7_masks[HB_NUM + 1] = {
+ 0x00000000,
+ 0x000f0003, /* LEN0, R/W0, G0, L0 */
+ 0x00ff000f, /* Same for 0,1 */
+ 0x0fff003f, /* Same for 0,1,2 */
+ 0xffff00ff /* Same for 0,1,2,3 */
+};
+
+#endif /* __KERNEL__ */
+#endif /* _I386_HW_BREAKPOINT_H */
+
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/Makefile
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/kernel/Makefile
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/Makefile
@@ -33,7 +33,7 @@ obj-$(CONFIG_X86_64) += sys_x86_64.o x86
obj-$(CONFIG_X86_64) += syscall_64.o vsyscall_64.o
obj-y += bootflag.o e820.o
obj-y += pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
-obj-y += alternative.o i8253.o pci-nommu.o
+obj-y += alternative.o i8253.o pci-nommu.o hw_breakpoint.o
obj-y += tsc.o io_delay.o rtc.o
obj-$(CONFIG_X86_TRAMPOLINE) += trampoline.o
This patch modifies the breakpoint exception handler code to use the abstract
register names.
Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
---
arch/x86/kernel/traps_32.c | 67 ++++++++++++++++-----------------------------
arch/x86/kernel/traps_64.c | 64 ++++++++++++++++++++----------------------
2 files changed, 54 insertions(+), 77 deletions(-)
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/traps_32.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/kernel/traps_32.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/traps_32.c
@@ -890,11 +890,12 @@ void __kprobes do_int3(struct pt_regs *r
void __kprobes do_debug(struct pt_regs *regs, long error_code)
{
struct task_struct *tsk = current;
- unsigned int condition;
+ unsigned long dr6;
trace_hardirqs_fixup();
- get_debugreg(condition, 6);
+ get_debugreg(dr6, 6);
+ set_debugreg(0, 6); /* DR6 may or may not be cleared by the CPU */
/*
* The processor cleared BTF, so don't mark that we need it set.
@@ -902,60 +903,40 @@ void __kprobes do_debug(struct pt_regs *
clear_tsk_thread_flag(tsk, TIF_DEBUGCTLMSR);
tsk->thread.debugctlmsr = 0;
- if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
- SIGTRAP) == NOTIFY_STOP)
+ /* Store the virtualized DR6 value */
+ tsk->thread.vdr6 = dr6;
+
+ if (notify_die(DIE_DEBUG, "debug", regs, dr6, error_code,
+ SIGTRAP) == NOTIFY_STOP)
return;
/* It's safe to allow irq's after DR6 has been saved */
if (regs->flags & X86_EFLAGS_IF)
local_irq_enable();
- /* Mask out spurious debug traps due to lazy DR7 setting */
- if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
- if (!tsk->thread.debugreg7)
- goto clear_dr7;
+ if (regs->flags & X86_VM_MASK) {
+ handle_vm86_trap((struct kernel_vm86_regs *) regs,
+ error_code, 1);
+ return;
}
- if (regs->flags & X86_VM_MASK)
- goto debug_vm86;
-
- /* Save debug status register where ptrace can see it */
- tsk->thread.debugreg6 = condition;
-
/*
- * Single-stepping through TF: make sure we ignore any events in
- * kernel space (but re-enable TF when returning to user mode).
+ * Single-stepping through system calls: ignore any exceptions in
+ * kernel space, but re-enable TF when returning to user mode.
+ *
+ * We already checked v86 mode above, so we can check for kernel mode
+ * by just checking the CPL of CS.
*/
- if (condition & DR_STEP) {
- /*
- * We already checked v86 mode above, so we can
- * check for kernel mode by just checking the CPL
- * of CS.
- */
- if (!user_mode(regs))
- goto clear_TF_reenable;
+ if ((dr6 & DR_STEP) && !user_mode(regs)) {
+ tsk->thread.vdr6 &= ~DR_STEP;
+ set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+ regs->flags &= ~X86_EFLAGS_TF;
}
- /* Ok, finally something we can handle */
- send_sigtrap(tsk, regs, error_code);
-
- /*
- * Disable additional traps. They'll be re-enabled when
- * the signal is delivered.
- */
-clear_dr7:
- set_debugreg(0, 7);
- return;
-
-debug_vm86:
- handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
- return;
-
-clear_TF_reenable:
- set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
- regs->flags &= ~X86_EFLAGS_TF;
- return;
+ if (tsk->thread.vdr6 & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+ send_sigtrap(tsk, regs, error_code);
}
+
/*
* Note that we play around with the 'TS' bit in an attempt to get
* the correct behaviour even in the presence of the asynchronous
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/traps_64.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/kernel/traps_64.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/traps_64.c
@@ -894,13 +894,14 @@ asmlinkage __kprobes struct pt_regs *syn
asmlinkage void __kprobes do_debug(struct pt_regs * regs,
unsigned long error_code)
{
+ unsigned long dr6;;
struct task_struct *tsk = current;
- unsigned long condition;
siginfo_t info;
trace_hardirqs_fixup();
- get_debugreg(condition, 6);
+ get_debugreg(dr6, 6);
+ set_debugreg(0, 6); /* DR6 may or may not be cleared by the CPU */
/*
* The processor cleared BTF, so don't mark that we need it set.
@@ -908,48 +909,43 @@ asmlinkage void __kprobes do_debug(struc
clear_tsk_thread_flag(tsk, TIF_DEBUGCTLMSR);
tsk->thread.debugctlmsr = 0;
- if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
- SIGTRAP) == NOTIFY_STOP)
+ /* Store the virtualized DR6 value */
+ tsk->thread.vdr6 = dr6;
+
+ if (notify_die(DIE_DEBUG, "debug", regs, dr6, error_code,
+ SIGTRAP) == NOTIFY_STOP)
return;
preempt_conditional_sti(regs);
- /* Mask out spurious debug traps due to lazy DR7 setting */
- if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
- if (!tsk->thread.debugreg7)
- goto clear_dr7;
+ if (regs->flags & X86_VM_MASK) {
+ handle_vm86_trap((struct kernel_vm86_regs *) regs,
+ error_code, 1);
+ return;
}
- tsk->thread.debugreg6 = condition;
-
/*
- * Single-stepping through TF: make sure we ignore any events in
- * kernel space (but re-enable TF when returning to user mode).
+ * Single-stepping through system calls: ignore any exceptions in
+ * kernel space, but re-enable TF when returning to user mode.
+ *
+ * We already checked v86 mode above, so we can check for kernel mode
+ * by just checking the CPL of CS.
*/
- if (condition & DR_STEP) {
- if (!user_mode(regs))
- goto clear_TF_reenable;
+ if ((dr6 & DR_STEP) && !user_mode(regs)) {
+ tsk->thread.vdr6 &= ~DR_STEP;
+ set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+ regs->flags &= ~X86_EFLAGS_TF;
}
- /* Ok, finally something we can handle */
- tsk->thread.trap_no = 1;
- tsk->thread.error_code = error_code;
- info.si_signo = SIGTRAP;
- info.si_errno = 0;
- info.si_code = TRAP_BRKPT;
- info.si_addr = user_mode(regs) ? (void __user *)regs->ip : NULL;
- force_sig_info(SIGTRAP, &info, tsk);
-
-clear_dr7:
- set_debugreg(0, 7);
- preempt_conditional_cli(regs);
- return;
-
-clear_TF_reenable:
- set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
- regs->flags &= ~X86_EFLAGS_TF;
- preempt_conditional_cli(regs);
- return;
+ if (tsk->thread.vdr6 & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
+ tsk->thread.trap_no = 1;
+ tsk->thread.error_code = error_code;
+ info.si_signo = SIGTRAP;
+ info.si_errno = 0;
+ info.si_code = TRAP_BRKPT;
+ info.si_addr = user_mode(regs) ? (void __user *)regs->ip : NULL;
+ force_sig_info(SIGTRAP, &info, tsk);
+ }
}
static int kernel_math_error(struct pt_regs *regs, const char *str, int trapnr)
This patch modifies the kprobe handler to help it recognise single-stepping by
the HW Breakpoint exception code. A per-cpu variable called 'sstep_reason' to
distinguish the source of single-step exceptions.
Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
---
arch/x86/kernel/kprobes.c | 18 ++++++++++++++++--
include/asm-x86/kprobes.h | 8 ++++++++
2 files changed, 24 insertions(+), 2 deletions(-)
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/kprobes.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/kernel/kprobes.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/kprobes.c
@@ -54,6 +54,7 @@
#include <asm/pgtable.h>
#include <asm/uaccess.h>
#include <asm/alternative.h>
+#include <asm/debugreg.h>
void jprobe_return_end(void);
@@ -517,6 +518,7 @@ static int __kprobes kprobe_handler(stru
kprobe_opcode_t *addr;
struct kprobe *p;
struct kprobe_ctlblk *kcb;
+ unsigned int *ssr;
addr = (kprobe_opcode_t *)(regs->ip - sizeof(kprobe_opcode_t));
if (*addr != BREAKPOINT_INSTRUCTION) {
@@ -541,6 +543,7 @@ static int __kprobes kprobe_handler(stru
*/
preempt_disable();
+ ssr = &(__get_cpu_var(sstep_reason));
kcb = get_kprobe_ctlblk();
p = get_kprobe(addr);
@@ -560,14 +563,17 @@ static int __kprobes kprobe_handler(stru
* for jprobe processing, so get out doing nothing
* more here.
*/
- if (!p->pre_handler || !p->pre_handler(p, regs))
+ if (!p->pre_handler || !p->pre_handler(p, regs)) {
setup_singlestep(p, regs, kcb);
+ (*ssr) |= SSTEP_KPROBE;
+ }
return 1;
}
} else if (kprobe_running()) {
p = __get_cpu_var(current_kprobe);
if (p->break_handler && p->break_handler(p, regs)) {
setup_singlestep(p, regs, kcb);
+ (*ssr) |= SSTEP_KPROBE;
return 1;
}
} /* else: not a kprobe fault; let the kernel handle it */
@@ -952,6 +958,7 @@ int __kprobes kprobe_exceptions_notify(s
{
struct die_args *args = data;
int ret = NOTIFY_DONE;
+ unsigned int *ssr = &(__get_cpu_var(sstep_reason));
if (args->regs && user_mode_vm(args->regs))
return ret;
@@ -962,8 +969,15 @@ int __kprobes kprobe_exceptions_notify(s
ret = NOTIFY_STOP;
break;
case DIE_DEBUG:
- if (post_kprobe_handler(args->regs))
+ /* We could be here due to single-stepping after a pre-handler
+ * execution of HW Breakpoint or kprobes. We determine the cause
+ * using the bitmask flag 'sstep_reason'.
+ */
+ if (((*ssr) & SSTEP_KPROBE) &&
+ post_kprobe_handler(args->regs)) {
+ current->thread.vdr6 &= ~DR_STEP;
ret = NOTIFY_STOP;
+ }
break;
case DIE_GPF:
/*
Index: linux-bkpt-lkml-27-rc9/include/asm-x86/kprobes.h
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/include/asm-x86/kprobes.h
+++ linux-bkpt-lkml-27-rc9/include/asm-x86/kprobes.h
@@ -30,6 +30,14 @@
struct pt_regs;
struct kprobe;
+/* Single stepping can be initiated for kprobes post handler or following HW
+ * Breakpoint exception. The bitmask below is used to identify the cause.
+ */
+#define SSTEP_KPROBE 1
+#define SSTEP_HWBKPT 2
+
+DECLARE_PER_CPU(unsigned int, sstep_reason);
+
typedef u8 kprobe_opcode_t;
#define BREAKPOINT_INSTRUCTION 0xcc
#define RELATIVEJUMP_INSTRUCTION 0xe9
This patch enables the use of wrapper routines to access the debug/breakpoint
registers.
Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
---
arch/x86/kernel/smpboot.c | 3 +++
arch/x86/power/cpu_32.c | 16 +++-------------
arch/x86/power/cpu_64.c | 15 +++------------
include/asm-x86/debugreg.h | 29 +++++++++++++++++++++++++++++
include/asm-x86/processor.h | 10 +++-------
5 files changed, 41 insertions(+), 32 deletions(-)
Index: linux-bkpt-lkml-27-rc9/arch/x86/power/cpu_32.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/power/cpu_32.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/power/cpu_32.c
@@ -11,6 +11,7 @@
#include <linux/suspend.h>
#include <asm/mtrr.h>
#include <asm/mce.h>
+#include <asm/debugreg.h>
static struct saved_context saved_context;
@@ -46,6 +47,7 @@ static void __save_processor_state(struc
ctxt->cr2 = read_cr2();
ctxt->cr3 = read_cr3();
ctxt->cr4 = read_cr4_safe();
+ disable_debug_registers();
}
/* Needed by apm.c */
@@ -78,19 +80,7 @@ static void fix_processor_context(void)
load_TR_desc(); /* This does ltr */
load_LDT(¤t->active_mm->context); /* This does lldt */
- /*
- * Now maybe reload the debug registers
- */
- if (current->thread.debugreg7) {
- set_debugreg(current->thread.debugreg0, 0);
- set_debugreg(current->thread.debugreg1, 1);
- set_debugreg(current->thread.debugreg2, 2);
- set_debugreg(current->thread.debugreg3, 3);
- /* no 4 and 5 */
- set_debugreg(current->thread.debugreg6, 6);
- set_debugreg(current->thread.debugreg7, 7);
- }
-
+ load_debug_registers();
}
static void __restore_processor_state(struct saved_context *ctxt)
Index: linux-bkpt-lkml-27-rc9/arch/x86/power/cpu_64.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/power/cpu_64.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/power/cpu_64.c
@@ -14,6 +14,7 @@
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/mtrr.h>
+#include <asm/debugreg.h>
static void fix_processor_context(void);
@@ -69,6 +70,7 @@ static void __save_processor_state(struc
ctxt->cr3 = read_cr3();
ctxt->cr4 = read_cr4();
ctxt->cr8 = read_cr8();
+ disable_debug_registers();
}
void save_processor_state(void)
@@ -151,16 +153,5 @@ static void fix_processor_context(void)
load_TR_desc(); /* This does ltr */
load_LDT(¤t->active_mm->context); /* This does lldt */
- /*
- * Now maybe reload the debug registers
- */
- if (current->thread.debugreg7){
- loaddebug(¤t->thread, 0);
- loaddebug(¤t->thread, 1);
- loaddebug(¤t->thread, 2);
- loaddebug(¤t->thread, 3);
- /* no 4 and 5 */
- loaddebug(¤t->thread, 6);
- loaddebug(¤t->thread, 7);
- }
+ load_debug_registers();
}
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/smpboot.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/kernel/smpboot.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/smpboot.c
@@ -61,6 +61,7 @@
#include <asm/mtrr.h>
#include <asm/vmi.h>
#include <asm/genapic.h>
+#include <asm/debugreg.h>
#include <linux/mc146818rtc.h>
#include <mach_apic.h>
@@ -342,6 +343,7 @@ static void __cpuinit start_secondary(vo
setup_secondary_clock();
wmb();
+ load_debug_registers();
cpu_idle();
}
@@ -1382,6 +1384,7 @@ int __cpu_disable(void)
remove_cpu_from_maps(cpu);
unlock_vector_lock();
fixup_irqs(cpu_online_map);
+ disable_debug_registers();
return 0;
}
Index: linux-bkpt-lkml-27-rc9/include/asm-x86/debugreg.h
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/include/asm-x86/debugreg.h
+++ linux-bkpt-lkml-27-rc9/include/asm-x86/debugreg.h
@@ -49,6 +49,8 @@
#define DR_LOCAL_ENABLE_SHIFT 0 /* Extra shift to the local enable bit */
#define DR_GLOBAL_ENABLE_SHIFT 1 /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1) /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2) /* Global enable for reg 0 */
#define DR_ENABLE_SIZE 2 /* 2 enable bits per register */
#define DR_LOCAL_ENABLE_MASK (0x55) /* Set local bits for all 4 regs */
@@ -67,4 +69,31 @@
#define DR_LOCAL_SLOWDOWN (0x100) /* Local slow the pipeline */
#define DR_GLOBAL_SLOWDOWN (0x200) /* Global slow the pipeline */
+/*
+ * HW breakpoint additions
+ */
+#ifdef __KERNEL__
+
+#define HB_NUM 4 /* Number of hardware breakpoints */
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+ struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+static inline void disable_debug_registers(void)
+{
+ set_debugreg(0UL, 7);
+}
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
+
+#endif /* __KERNEL__ */
+
#endif
Index: linux-bkpt-lkml-27-rc9/include/asm-x86/processor.h
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/include/asm-x86/processor.h
+++ linux-bkpt-lkml-27-rc9/include/asm-x86/processor.h
@@ -381,13 +381,9 @@ struct thread_struct {
unsigned long ip;
unsigned long fs;
unsigned long gs;
- /* Hardware debugging registers: */
- unsigned long debugreg0;
- unsigned long debugreg1;
- unsigned long debugreg2;
- unsigned long debugreg3;
- unsigned long debugreg6;
- unsigned long debugreg7;
+/* Hardware breakpoint info */
+ unsigned long vdr6;
+ struct thread_hw_breakpoint *hw_breakpoint_info;
/* Fault info: */
unsigned long cr2;
unsigned long trap_no;
This patch enables the use of abstract/virtual debug registers in
process-handling routines.
Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
---
arch/x86/kernel/process_32.c | 45 +++++++++++++++++++++++++------------------
arch/x86/kernel/process_64.c | 42 +++++++++++++++++++++++-----------------
2 files changed, 52 insertions(+), 35 deletions(-)
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/process_32.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/kernel/process_32.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/process_32.c
@@ -56,6 +56,8 @@
#include <asm/cpu.h>
#include <asm/kdebug.h>
#include <asm/idle.h>
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>
asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
@@ -257,6 +259,8 @@ EXPORT_SYMBOL(kernel_thread);
*/
void exit_thread(void)
{
+ struct task_struct *tsk = current;
+
/* The process may have allocated an io port bitmap... nuke it. */
if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
struct task_struct *tsk = current;
@@ -277,20 +281,17 @@ void exit_thread(void)
tss->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
put_cpu();
}
+ if (unlikely(tsk->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(tsk);
}
void flush_thread(void)
{
struct task_struct *tsk = current;
- tsk->thread.debugreg0 = 0;
- tsk->thread.debugreg1 = 0;
- tsk->thread.debugreg2 = 0;
- tsk->thread.debugreg3 = 0;
- tsk->thread.debugreg6 = 0;
- tsk->thread.debugreg7 = 0;
- memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
- clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+ if (unlikely(tsk->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(tsk);
/*
* Forget coprocessor state..
*/
@@ -334,7 +335,15 @@ int copy_thread(int nr, unsigned long cl
savesegment(gs, p->thread.gs);
+ p->thread.hw_breakpoint_info = NULL;
+ p->thread.io_bitmap_ptr = NULL;
+
tsk = current;
+ err = -ENOMEM;
+ if (unlikely(tsk->thread.hw_breakpoint_info)) {
+ if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
+ goto out;
+ }
if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
IO_BITMAP_BYTES, GFP_KERNEL);
@@ -354,10 +363,14 @@ int copy_thread(int nr, unsigned long cl
err = do_set_thread_area(p, -1,
(struct user_desc __user *)childregs->si, 0);
+out:
if (err && p->thread.io_bitmap_ptr) {
kfree(p->thread.io_bitmap_ptr);
p->thread.io_bitmap_max = 0;
}
+ if (err)
+ flush_thread_hw_breakpoint(p);
+
return err;
}
@@ -460,16 +473,6 @@ __switch_to_xtra(struct task_struct *pre
if (next->debugctlmsr != debugctl)
update_debugctlmsr(next->debugctlmsr);
- if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
- set_debugreg(next->debugreg0, 0);
- set_debugreg(next->debugreg1, 1);
- set_debugreg(next->debugreg2, 2);
- set_debugreg(next->debugreg3, 3);
- /* no 4 and 5 */
- set_debugreg(next->debugreg6, 6);
- set_debugreg(next->debugreg7, 7);
- }
-
if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
test_tsk_thread_flag(next_p, TIF_NOTSC)) {
/* prev and next are different */
@@ -625,6 +628,12 @@ struct task_struct * __switch_to(struct
loadsegment(gs, next->gs);
x86_write_percpu(current_task, next_p);
+ /*
+ * Handle debug registers. This must be done _after_ current
+ * is updated.
+ */
+ if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+ switch_to_thread_hw_breakpoint(next_p);
return prev_p;
}
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/process_64.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/kernel/process_64.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/process_64.c
@@ -51,6 +51,8 @@
#include <asm/proto.h>
#include <asm/ia32.h>
#include <asm/idle.h>
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>
asmlinkage extern void ret_from_fork(void);
@@ -240,6 +242,8 @@ void exit_thread(void)
t->io_bitmap_max = 0;
put_cpu();
}
+ if (unlikely(me->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(me);
}
void flush_thread(void)
@@ -257,13 +261,9 @@ void flush_thread(void)
}
clear_tsk_thread_flag(tsk, TIF_DEBUG);
- tsk->thread.debugreg0 = 0;
- tsk->thread.debugreg1 = 0;
- tsk->thread.debugreg2 = 0;
- tsk->thread.debugreg3 = 0;
- tsk->thread.debugreg6 = 0;
- tsk->thread.debugreg7 = 0;
memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+ if (unlikely(tsk->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(tsk);
/*
* Forget coprocessor state..
*/
@@ -338,13 +338,21 @@ int copy_thread(int nr, unsigned long cl
p->thread.fs = me->thread.fs;
p->thread.gs = me->thread.gs;
+ p->thread.hw_breakpoint_info = NULL;
+ p->thread.io_bitmap_ptr = NULL;
savesegment(gs, p->thread.gsindex);
savesegment(fs, p->thread.fsindex);
savesegment(es, p->thread.es);
savesegment(ds, p->thread.ds);
- if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
+ err = -ENOMEM;
+ if (unlikely(me->thread.hw_breakpoint_info)) {
+ if (copy_thread_hw_breakpoint(me, p, clone_flags))
+ goto out;
+ }
+
+if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
p->thread.io_bitmap_ptr = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
if (!p->thread.io_bitmap_ptr) {
p->thread.io_bitmap_max = 0;
@@ -375,6 +383,9 @@ out:
kfree(p->thread.io_bitmap_ptr);
p->thread.io_bitmap_max = 0;
}
+ if (err)
+ flush_thread_hw_breakpoint(p);
+
return err;
}
@@ -484,16 +495,6 @@ static inline void __switch_to_xtra(stru
if (next->debugctlmsr != debugctl)
update_debugctlmsr(next->debugctlmsr);
- if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
- loaddebug(next, 0);
- loaddebug(next, 1);
- loaddebug(next, 2);
- loaddebug(next, 3);
- /* no 4 and 5 */
- loaddebug(next, 6);
- loaddebug(next, 7);
- }
-
if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
test_tsk_thread_flag(next_p, TIF_NOTSC)) {
/* prev and next are different */
@@ -524,6 +525,13 @@ static inline void __switch_to_xtra(stru
if (test_tsk_thread_flag(next_p, TIF_BTS_TRACE_TS))
ptrace_bts_take_timestamp(next_p, BTS_TASK_ARRIVES);
#endif
+
+/*
+ * Handle debug registers. This must be done _after_ current
+ * is updated.
+ */
+ if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+ switch_to_thread_hw_breakpoint(next_p);
}
/*
This patch disables re-enabling of Hardware Breakpoint registers through the
signal handling code. This is now during hw_breakpoint_handler().
Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
---
arch/x86/kernel/signal_32.c | 9 ---------
arch/x86/kernel/signal_64.c | 8 --------
2 files changed, 17 deletions(-)
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/signal_32.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/kernel/signal_32.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/signal_32.c
@@ -600,15 +600,6 @@ static void do_signal(struct pt_regs *re
signr = get_signal_to_deliver(&info, &ka, regs, NULL);
if (signr > 0) {
- /*
- * Re-enable any watchpoints before delivering the
- * signal to user space. The processor register will
- * have been cleared if the watchpoint triggered
- * inside the kernel.
- */
- if (current->thread.debugreg7)
- set_debugreg(current->thread.debugreg7, 7);
-
/* Whee! Actually deliver the signal. */
if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
/*
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/signal_64.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/kernel/signal_64.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/signal_64.c
@@ -496,14 +496,6 @@ static void do_signal(struct pt_regs *re
signr = get_signal_to_deliver(&info, &ka, regs, NULL);
if (signr > 0) {
- /* Re-enable any watchpoints before delivering the
- * signal to user space. The processor register will
- * have been cleared if the watchpoint triggered
- * inside the kernel.
- */
- if (current->thread.debugreg7)
- set_debugreg(current->thread.debugreg7, 7);
-
/* Whee! Actually deliver the signal. */
if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
/*
This patch modifies the ptrace code to use the new wrapper routines around the
debug/breakpoint registers.
Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
---
arch/x86/kernel/ptrace.c | 101 +----------------------------------------------
1 file changed, 4 insertions(+), 97 deletions(-)
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/ptrace.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/kernel/ptrace.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/ptrace.c
@@ -462,98 +462,6 @@ static int genregs_set(struct task_struc
return ret;
}
-/*
- * This function is trivial and will be inlined by the compiler.
- * Having it separates the implementation details of debug
- * registers from the interface details of ptrace.
- */
-static unsigned long ptrace_get_debugreg(struct task_struct *child, int n)
-{
- switch (n) {
- case 0: return child->thread.debugreg0;
- case 1: return child->thread.debugreg1;
- case 2: return child->thread.debugreg2;
- case 3: return child->thread.debugreg3;
- case 6: return child->thread.debugreg6;
- case 7: return child->thread.debugreg7;
- }
- return 0;
-}
-
-static int ptrace_set_debugreg(struct task_struct *child,
- int n, unsigned long data)
-{
- int i;
-
- if (unlikely(n == 4 || n == 5))
- return -EIO;
-
- if (n < 4 && unlikely(data >= debugreg_addr_limit(child)))
- return -EIO;
-
- switch (n) {
- case 0: child->thread.debugreg0 = data; break;
- case 1: child->thread.debugreg1 = data; break;
- case 2: child->thread.debugreg2 = data; break;
- case 3: child->thread.debugreg3 = data; break;
-
- case 6:
- if ((data & ~0xffffffffUL) != 0)
- return -EIO;
- child->thread.debugreg6 = data;
- break;
-
- case 7:
- /*
- * Sanity-check data. Take one half-byte at once with
- * check = (val >> (16 + 4*i)) & 0xf. It contains the
- * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
- * 2 and 3 are LENi. Given a list of invalid values,
- * we do mask |= 1 << invalid_value, so that
- * (mask >> check) & 1 is a correct test for invalid
- * values.
- *
- * R/Wi contains the type of the breakpoint /
- * watchpoint, LENi contains the length of the watched
- * data in the watchpoint case.
- *
- * The invalid values are:
- * - LENi == 0x10 (undefined), so mask |= 0x0f00. [32-bit]
- * - R/Wi == 0x10 (break on I/O reads or writes), so
- * mask |= 0x4444.
- * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
- * 0x1110.
- *
- * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
- *
- * See the Intel Manual "System Programming Guide",
- * 15.2.4
- *
- * Note that LENi == 0x10 is defined on x86_64 in long
- * mode (i.e. even for 32-bit userspace software, but
- * 64-bit kernel), so the x86_64 mask value is 0x5454.
- * See the AMD manual no. 24593 (AMD64 System Programming)
- */
-#ifdef CONFIG_X86_32
-#define DR7_MASK 0x5f54
-#else
-#define DR7_MASK 0x5554
-#endif
- data &= ~DR_CONTROL_RESERVED;
- for (i = 0; i < 4; i++)
- if ((DR7_MASK >> ((data >> (16 + 4*i)) & 0xf)) & 1)
- return -EIO;
- child->thread.debugreg7 = data;
- if (data)
- set_tsk_thread_flag(child, TIF_DEBUG);
- else
- clear_tsk_thread_flag(child, TIF_DEBUG);
- break;
- }
-
- return 0;
-}
-
#ifdef X86_BTS
static int ptrace_bts_get_size(struct task_struct *child)
@@ -888,7 +796,7 @@ long arch_ptrace(struct task_struct *chi
else if (addr >= offsetof(struct user, u_debugreg[0]) &&
addr <= offsetof(struct user, u_debugreg[7])) {
addr -= offsetof(struct user, u_debugreg[0]);
- tmp = ptrace_get_debugreg(child, addr / sizeof(data));
+ tmp = thread_get_debugreg(child, addr/sizeof(data));
}
ret = put_user(tmp, datap);
break;
@@ -905,8 +813,7 @@ long arch_ptrace(struct task_struct *chi
else if (addr >= offsetof(struct user, u_debugreg[0]) &&
addr <= offsetof(struct user, u_debugreg[7])) {
addr -= offsetof(struct user, u_debugreg[0]);
- ret = ptrace_set_debugreg(child,
- addr / sizeof(data), data);
+ ret = thread_set_debugreg(child, addr/sizeof(data), data);
}
break;
@@ -1073,7 +980,7 @@ static int putreg32(struct task_struct *
case offsetof(struct user32, u_debugreg[0]) ...
offsetof(struct user32, u_debugreg[7]):
regno -= offsetof(struct user32, u_debugreg[0]);
- return ptrace_set_debugreg(child, regno / 4, value);
+ return thread_set_debugreg(child, regno / 4, value);
default:
if (regno > sizeof(struct user32) || (regno & 3))
@@ -1132,7 +1039,7 @@ static int getreg32(struct task_struct *
case offsetof(struct user32, u_debugreg[0]) ...
offsetof(struct user32, u_debugreg[7]):
regno -= offsetof(struct user32, u_debugreg[0]);
- *val = ptrace_get_debugreg(child, regno / 4);
+ *val = thread_get_debugreg(child, regno / 4);
break;
default:
This patch disables Hardware breakpoints before doing a 'kexec' on the machine.
Signed-off-by: K.Prasad <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
---
arch/x86/kernel/machine_kexec_32.c | 2 ++
arch/x86/kernel/machine_kexec_64.c | 2 ++
2 files changed, 4 insertions(+)
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/machine_kexec_32.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/kernel/machine_kexec_32.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/machine_kexec_32.c
@@ -24,6 +24,7 @@
#include <asm/desc.h>
#include <asm/system.h>
#include <asm/cacheflush.h>
+#include <asm/debugreg.h>
#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -131,6 +132,7 @@ void machine_kexec(struct kimage *image)
/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
+ disable_debug_registers();
if (image->preserve_context) {
#ifdef CONFIG_X86_IO_APIC
Index: linux-bkpt-lkml-27-rc9/arch/x86/kernel/machine_kexec_64.c
===================================================================
--- linux-bkpt-lkml-27-rc9.orig/arch/x86/kernel/machine_kexec_64.c
+++ linux-bkpt-lkml-27-rc9/arch/x86/kernel/machine_kexec_64.c
@@ -17,6 +17,7 @@
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
#include <asm/io.h>
+#include <asm/debugreg.h>
#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
static u64 kexec_pgd[512] PAGE_ALIGNED;
@@ -190,6 +191,7 @@ void machine_kexec(struct kimage *image)
/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
+ disable_debug_registers();
control_page = page_address(image->control_code_page) + PAGE_SIZE;
memcpy(control_page, relocate_kernel, PAGE_SIZE);
You need to redo this (and the whole set) for the post-2.6.27 tree.
e.g. traps.c has been unified.
+ /* Store the virtualized DR6 value */
+ tsk->thread.vdr6 = dr6;
+
+ if (notify_die(DIE_DEBUG, "debug", regs, dr6, error_code,
+ SIGTRAP) == NOTIFY_STOP)
return;
I'm not sure you should change vdr6 when notify_die returns NOTIFY_STOP.
Maybe Alan and I hashed out the logic of this before, I don't recall.
If the notifier is eating the event, then it should not affect the
thread-virtualized view of %db6. That would be consistent with the
existing code, where ->thread.debugreg6 is only set later when all the
intercepted or spurious exceptions have been filtered out.
Thanks,
Roland
Just make the new functions in hw_breakpoint code use the same old ptrace_*
names. Those are the proper names for this. It really is just compatibility
for the old ptrace interface, not any other reason for "thread" debugregs.
Then this patch need do nothing but remove the static ptrace.c functions.
Thanks,
Roland
+/*
+ * Handle debug registers. This must be done _after_ current
+ * is updated.
+ */
+ if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+ switch_to_thread_hw_breakpoint(next_p);
It would be good if we could arrange that this works before current changes.
That way it says in __switch_to_xtra, which is off the hot path.
If you are moving it, then you need to match that by removing _TIF_DEBUG
from _TIF_WORK_CTXSW_NEXT.
Thanks,
Roland
The "sstep_reason" hair in the x86 patch is the perfect example of exactly
what I wanted to avoid by keeping the hw_breakpoint layer simpler. (Also I
strongly doubt that the x86 patch's single-step magic is correct in all the
corner cases. But that discussion is the very rat hole that I want to
defer, is my point in this here discussion.)
I think the role of the hw_breakpoint layer ought to be an arch-neutral API
for accessing and multiplexing the breakpoint functionality of the hardware.
I don't think its role ought to include enhancing the facilities beyond
what the actual hardware has (just the multiplexing).
AFAIK no hardware's facility delivers two separate exceptions for a single
hit of a single breakpoint slot. There is just one hit, and it's either
before the instruction or after it. So you only need one handler function,
and one pointer slot for it.
Each given type of breakpoint on a given arch is either a fire-before or a
fire-after type. The *_supported() functions are enough to make it clear.
Those really should be inlines in an arch header, since they will be
constants or type==CONST tests that are probably entirely optimized away
(and let a big dead fork of the caller's code get optimized away).
For fire-before types, there are two flavors. On powerpc, data breakpoints
are fire-before, and to actually complete the triggering load/store you
have to clear the dabr (disable the breakpoint) before resuming (and then
potentially deal with step-then-reenable, etc). On x86, instruction
breakpoints are fire-before, but there is the option of using the magic RF
bit to suppress the hit without disabling the breakpoint. So you need
another inline a la *_supported() to indicate what's possible. (Possibly
the handler should be able to control this with its return value,
i.e. allow returning with RF clear for some reason.)
The sequence of suppress-and-single-step (x86 insn) or
disable-and-single-step-and-reenable (powerpc data) will of course be
desireable for high-level uses. But that doesn't mean they need to be
built into hw_breakpoint. Make the hw_breakpoint layer clearly expose what
the options are on the hardware, and the rest can be built up from
hw_breakpoint along with other components managing the stepping part.
(For user-mode stepping, the building blocks are already there. For the
kernel side, we can attack that windmill another day.)
Thanks,
Roland
I'm leaving aside the single-step stuff, which I just posted about.
I'm also concerned about getting RF right here. We need to make sure that
every place that warps ->ip away somewhere also clears RF. In case of
instruction breakpoints in user space, we might also want to mask RF from
being seen/changed via ptrace/user_regset (a bit like is done for TF).
I'm a little inclined to do a first x86 version without instruction
breakpoint support and hash out the rest of the code for a bit. Then add
instruction breakpoints in a later patch. (Not that it has to be a lot
later or anything. Just to table the detailed review of RF fiddling until
after most of the meaty stuff is pretty baked.)
Thanks,
Roland
Hi Roland,
Thanks for the review of the patches and your detailed comments.
My responses to your comments are placed inline....
On Wed, Oct 15, 2008 at 07:49:56PM -0700, Roland McGrath wrote:
> The "sstep_reason" hair in the x86 patch is the perfect example of exactly
> what I wanted to avoid by keeping the hw_breakpoint layer simpler. (Also I
> strongly doubt that the x86 patch's single-step magic is correct in all the
> corner cases. But that discussion is the very rat hole that I want to
> defer, is my point in this here discussion.)
>
> I think the role of the hw_breakpoint layer ought to be an arch-neutral API
> for accessing and multiplexing the breakpoint functionality of the hardware.
> I don't think its role ought to include enhancing the facilities beyond
> what the actual hardware has (just the multiplexing).
I agree with you that single-stepping and enablement of a post_handler
for instruction breakpoints in x86 provides a 'feature' over what the
underlying processor provides. There may be corner-case issues to it
(and I'd be glad to hear if somebody has discovered any).
However, I'm afraid if the addition of such a feature in a layer above
the hw_breakpoint can be done entirely without modifying code that is
much closer to the hardware - say do_debug().
Without sufficient support in the proposed hw_breakpoint layer, I'm
afraid that future support for a post_handler() through use of
single-stepping for instructions in x86 may involve plenty of changes in
both the code underlying hw_breakpoint layer and itself. This is more
pronounced due to the fact that we now have multiple users of processor
single-stepping functionality - kprobes, KGDB and IABR (for PowerPC
which does not have an equivalent of RF bit).
"sstep_reason" - in its future avatars may be made to point to several
other reasons like SSTEP_KGDB (as pointed out by me earlier, integration
with KGDB is still pending and will require some re-work on the
patches).
Yet, given that there's sufficient interest to see hw_breakpoint layer
as one that provides pure wrappers around processor functionality i.e.
devoid of the overloading of single-stepping function, I will submit my
next iteration of patches with a single pointer called triggered() as a
callback function.
I would submit a new set of patches with a pre_ and post_handlers made
available (for instructions, that is) after the discussions on
hw_breakpoint itself have settled down.
I will respond to your suggestions to my individual patches as separate
emails.
>
> AFAIK no hardware's facility delivers two separate exceptions for a single
> hit of a single breakpoint slot. There is just one hit, and it's either
> before the instruction or after it. So you only need one handler function,
> and one pointer slot for it.
Yes, a triggered() callback would suffice if we are not going to
overload the single-stepping functionality. I've discussed this above.
>
> Each given type of breakpoint on a given arch is either a fire-before or a
> fire-after type. The *_supported() functions are enough to make it clear.
> Those really should be inlines in an arch header, since they will be
> constants or type==CONST tests that are probably entirely optimized away
> (and let a big dead fork of the caller's code get optimized away).
>
> For fire-before types, there are two flavors. On powerpc, data breakpoints
> are fire-before, and to actually complete the triggering load/store you
It's my understanding that Data Breakpoints, set by DABR fire the Data
Storage Interrupt in PPC only after the data access operation is done
(except for store operations on atomic writes). Exceptions resulting due
to a match with IABR (for instructions) is trigger-before.
> have to clear the dabr (disable the breakpoint) before resuming (and then
> potentially deal with step-then-reenable, etc). On x86, instruction
> breakpoints are fire-before, but there is the option of using the magic RF
> bit to suppress the hit without disabling the breakpoint. So you need
> another inline a la *_supported() to indicate what's possible. (Possibly
> the handler should be able to control this with its return value,
> i.e. allow returning with RF clear for some reason.)
Are you suggesting that the post_handler_supported() in i386/x86_64 be
modified to return, say UNSUPPORTED_DUE_TO_RF instead of just
UNSUPPORTED to indicate that post_handler() may be supported if
single-stepping is used to step-over the instructions rather than use
RF bit?
>
> The sequence of suppress-and-single-step (x86 insn) or
> disable-and-single-step-and-reenable (powerpc data) will of course be
> desireable for high-level uses. But that doesn't mean they need to be
> built into hw_breakpoint. Make the hw_breakpoint layer clearly expose what
> the options are on the hardware, and the rest can be built up from
> hw_breakpoint along with other components managing the stepping part.
> (For user-mode stepping, the building blocks are already there. For the
> kernel side, we can attack that windmill another day.)
>
>
> Thanks,
> Roland
Thanks,
K.Prasad
On Wed, 15 Oct 2008, Roland McGrath wrote:
> You need to redo this (and the whole set) for the post-2.6.27 tree.
> e.g. traps.c has been unified.
>
> + /* Store the virtualized DR6 value */
> + tsk->thread.vdr6 = dr6;
> +
> + if (notify_die(DIE_DEBUG, "debug", regs, dr6, error_code,
> + SIGTRAP) == NOTIFY_STOP)
> return;
>
> I'm not sure you should change vdr6 when notify_die returns NOTIFY_STOP.
> Maybe Alan and I hashed out the logic of this before, I don't recall.
> If the notifier is eating the event, then it should not affect the
> thread-virtualized view of %db6. That would be consistent with the
> existing code, where ->thread.debugreg6 is only set later when all the
> intercepted or spurious exceptions have been filtered out.
I think we have to leave the code as it is. As each notifier routine
processes the event, it will turn off the corresponding bits in vdr6.
We don't want those bits to get turned back on again by overwriting
vdr6 after the notifier chain has finished.
Alan Stern
On Wed, 15 Oct 2008, Roland McGrath wrote:
> +/*
> + * Handle debug registers. This must be done _after_ current
> + * is updated.
> + */
> + if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
> + switch_to_thread_hw_breakpoint(next_p);
>
> It would be good if we could arrange that this works before current changes.
> That way it says in __switch_to_xtra, which is off the hot path.
There's a problem with moving the switch_to_thread_hw_breakpoint() call
before current is updated. Suppose a kernel breakpoint is triggered in
between the two. The hw-breakpoint handler will see that current is
different from the task pointer stored in the chbi area, so it will
think the task pointer is leftover from an old task (lazy switching)
and will erase it. Then until the next context switch, no
user-breakpoints will be installed.
The real problem is that it's impossible to update both current and
chbi->bp_task at the same instant, so there will always be a window in
which they disagree and a breakpoint might get triggered. Since we use
lazy switching, we are forced to assume that a disagreement means that
current is correct an chbi->bp_task is old. But if you move the code
above then you'll create a window in which current is old and
chbi->bp_task is correct.
Alan Stern
On Wed, 15 Oct 2008, Roland McGrath wrote:
> AFAIK no hardware's facility delivers two separate exceptions for a single
> hit of a single breakpoint slot. There is just one hit, and it's either
> before the instruction or after it. So you only need one handler function,
> and one pointer slot for it.
Hmm... What happens on x86 if you have both an instruction breakpoint
and a data breakpoint triggered for the same instruction? My old 386
manual seems to imply that you'll get two exceptions: a fault and a
trap.
But I guess this would count as two separate breakpoint slots, with two
different handlers registered. So it shouldn't pose a problem.
> For fire-before types, there are two flavors. On powerpc, data breakpoints
> are fire-before, and to actually complete the triggering load/store you
> have to clear the dabr (disable the breakpoint) before resuming (and then
> potentially deal with step-then-reenable, etc). On x86, instruction
> breakpoints are fire-before, but there is the option of using the magic RF
> bit to suppress the hit without disabling the breakpoint. So you need
> another inline a la *_supported() to indicate what's possible. (Possibly
> the handler should be able to control this with its return value,
> i.e. allow returning with RF clear for some reason.)
There's another RF-related issue which the patch currently ignores.
Namely, what should happen if a new user breakpoint is registered at
the current PC address? We should force the RF flag to 0 so that the
breakpoint will be triggered when execution resumes. The problem is
that it's not easy to tell whether the current PC corresponds to the
same linear address as that registered for the breakpoint -- i.e., I
don't know how the code should go about translating from virtual
addresses to linear addresses. Would this in fact always be the
identity mapping? Presumably not if we're in VM86 mode.
Alan Stern
> > I'm not sure you should change vdr6 when notify_die returns NOTIFY_STOP.
> > Maybe Alan and I hashed out the logic of this before, I don't recall.
> > If the notifier is eating the event, then it should not affect the
> > thread-virtualized view of %db6. That would be consistent with the
> > existing code, where ->thread.debugreg6 is only set later when all the
> > intercepted or spurious exceptions have been filtered out.
>
> I think we have to leave the code as it is. As each notifier routine
> processes the event, it will turn off the corresponding bits in vdr6.
> We don't want those bits to get turned back on again by overwriting
> vdr6 after the notifier chain has finished.
Ah, I now almost remember discussing this before. We went around with the
notifier getting a pointer to "condition" to clear its bits, etc. So there
is a special requirement for DIE_DEBUG notifiers to fiddle the vdr6 bits.
That ought to be documented somewhere or other, as well as clearly
commented in do_debug itself.
And what does it mean exactly? The bits that should be left in thread.vdr6
are the virtualized bits, not the raw hardware bits. That is, the low four
bits might need to be reordered from the actual hardware order.
And what about this scenario?
1. do_debug hits for %db3, being used for a ptrace user watchpoint
-> vdr6 = hardware %db6 = 0x...08
-> send SIGTRAP
2. do_debug hits for %db0, being used for an in-kernel hw_breakpoint
-> hardware clears the low four bits before setting one of them
-> vdr6 = hardware %db6 = 0x...01
-> vdr6 &= ~1 = 0x...00
-> run hw_breakpoint callback
3. try to return to user, deliver SIGTRAP to ptrace'ing debugger
4. debugger does PTRACE_PEEKUSR u_debugreg[6]
-> read vdr6 = 0x...00
-> wtf? where is my db3 hit?
To get this scenario correct, the virtual db6 bits for ptrace need to
remain set when there is a later do_debug that ptrace won't see.
Thanks,
Roland
On Thu, 16 Oct 2008, Roland McGrath wrote:
> Ah, I now almost remember discussing this before. We went around with the
> notifier getting a pointer to "condition" to clear its bits, etc. So there
> is a special requirement for DIE_DEBUG notifiers to fiddle the vdr6 bits.
> That ought to be documented somewhere or other, as well as clearly
> commented in do_debug itself.
>
> And what does it mean exactly? The bits that should be left in thread.vdr6
> are the virtualized bits, not the raw hardware bits. That is, the low four
> bits might need to be reordered from the actual hardware order.
>
> And what about this scenario?
>
> 1. do_debug hits for %db3, being used for a ptrace user watchpoint
> -> vdr6 = hardware %db6 = 0x...08
> -> send SIGTRAP
> 2. do_debug hits for %db0, being used for an in-kernel hw_breakpoint
> -> hardware clears the low four bits before setting one of them
> -> vdr6 = hardware %db6 = 0x...01
> -> vdr6 &= ~1 = 0x...00
> -> run hw_breakpoint callback
> 3. try to return to user, deliver SIGTRAP to ptrace'ing debugger
> 4. debugger does PTRACE_PEEKUSR u_debugreg[6]
> -> read vdr6 = 0x...00
> -> wtf? where is my db3 hit?
>
> To get this scenario correct, the virtual db6 bits for ptrace need to
> remain set when there is a later do_debug that ptrace won't see.
I seem to recall doing some experiments to find out which bits in DR6
the CPU would set and which it would clear. According to the 386
manual the processor never clears any of the bits, but this was changed
in later models. And I seem to recall that the experiments showed that
gdb would not clear the bits either; it relied on the CPU to clear
them.
A kernel breakpoint shouldn't affect the task's virtualized view of
DR6. But only the hw-breakpoint notifier handler knows whether a
particular exception was caused by a kernel breakpoint. Conversely,
that handler knows _only_ about the 4 low-order bits in DR6; it can't
virtualize any of the other bits.
So this means that do_debug shouldn't modify the four low bits in vdr6.
Instead the hw-breakpoint handler should do whatever is needed to set
them properly: Leave them alone when a kernel breakpoint occurs,
otherwise clear them and set the bit corresponding to the user
breakpoint. In fact, the new ptrace_triggered() routine does set the
appropriate bit in vdr6.
I don't know how we should handle the BT (debug trap) and BS
(single-step exception) bits. Maybe the kprobes code can take care of
them.
Alan Stern
> I seem to recall doing some experiments to find out which bits in DR6
> the CPU would set and which it would clear. According to the 386
> manual the processor never clears any of the bits, but this was changed
> in later models. And I seem to recall that the experiments showed that
> gdb would not clear the bits either; it relied on the CPU to clear
> them.
Current Intel manuals say, "Certain debug exceptions may clear bits 0-3.
The remaining cntents of the DR6 register are never cleared by the processor."
Your experiments told us that "certain debug exceptions" includes at least
the data breakpoint hits. I assume that what it really means is all the
exceptions that set one of those four bits, i.e. ones due to DR[0-3] use.
Perhaps someone from Intel can clarify exactly what it means.
> So this means that do_debug shouldn't modify the four low bits in vdr6.
[...]
Right.
> I don't know how we should handle the BT (debug trap) and BS
> (single-step exception) bits. Maybe the kprobes code can take care of
> them.
BT is for task switch with TSS.T set. I don't think that can ever happen
in Linux, since we don't use hardware task-switching. If at all, maybe in
vm86 mode. I don't think there's a way to do it just from user_ldt.
I think BS (DR_STEP) should get set in vdr6 only when a SIGTRAP is
generated for the exception. It should never get cleared by the system,
only by PTRACE_POKEUSR. That is consistent with what we get now, AFAICT.
I don't think kprobes should "take care of" DR_STEP. It should eat a
DR_STEP that it's responsible for, and leave any others alone. i.e.,
CONFIG_KPROBE=n must not break the normal bookkeeping.
IIRC there can be one do_debug trap that's for both a breakpoint register
hit and a single-step (TF), with DR_STEP plus DR_TRAPn both set at once.
To handle that too, I think this will work:
do_debug does:
get_debugreg(condition, 6);
set_debugreg(0, 6);
Make sure the hw_breakpoint notifier is before the kprobes notifier.
hw_breakpoint is responsible for the low 4 bits of vdr6 and leaves its
other bits alone. It returns NOTIFY_STOP iff both this hit is not a ptrace
hit and hardware %db6 (args->err) has no other nonreserved bits set.
kprobes stays as it, returns NOTIFY_STOP iff it's swallowing the step.
do_debug stays mostly the same, replace:
tsk->thread.debugreg6 = condition;
with:
tsk->thread.vdr6 &= DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3;
tsk->thread.vdr6 |= condition & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
Thanks,
Roland
> kprobes stays as it, returns NOTIFY_STOP iff it's swallowing the step.
Oops, I think this breaks if there was also a ptrace db[0-3] hit in the
same exception. In that case, kprobes would need to not return NOTIFY_STOP
when it otherwise would, if thread.vdr6 has low bits set.
Thanks,
Roland
> Hmm... What happens on x86 if you have both an instruction breakpoint
> and a data breakpoint triggered for the same instruction? My old 386
> manual seems to imply that you'll get two exceptions: a fault and a
> trap.
Yes, those are not really "the same instruction". The instruction
breakpoint gives you a fault-type debug exception, which means the
instruction hasn't actually been executed yet. You then might not execute
it at all, if you change the PC in the trap frame. To execute it, you have
to either set RF or disable that breakpoint, and then execute it. If it
actually completes (doesn't have some other normal fault first like a page
fault, or an external interrupt first, etc), then you get a single-step
trap after it's executed. As far as the hardware is concerned these two
events are entirely separate.
> There's another RF-related issue which the patch currently ignores.
> Namely, what should happen if a new user breakpoint is registered at
> the current PC address? We should force the RF flag to 0 so that the
> breakpoint will be triggered when execution resumes. The problem is
> that it's not easy to tell whether the current PC corresponds to the
> same linear address as that registered for the breakpoint -- i.e., I
> don't know how the code should go about translating from virtual
> addresses to linear addresses. Would this in fact always be the
> identity mapping? Presumably not if we're in VM86 mode.
Um, punt? There is the hair in step.c:convert_ip_to_linear, but, bah.
Well, I can see doing it, and it would integrate with the whole handling of
changes to ->ip, etc. But here we have another fine example of the new can
of worms involved in getting RF handling correct. I'm thoroughly convinced
we should drop instruction breakpoint support from the initial version of
the x86 patch and hash out this whole RF picture separately after we've got
the rest of hw_breakpoint somewhat more settled.
Thanks,
Roland
> There's a problem with moving the switch_to_thread_hw_breakpoint() call
> before current is updated. [...]
I got this. I'm just saying let's look for ways to do this better.
Hmm. The 64-bit version of __switch_to does the current change much
earlier, before __switch_to_xtra and math_state_restore. I wonder if the
32-bit version could change to match. I can't see what in __switch_to_xtra
would care either way, though I may be overlooking something. Ingo?
If this does not change, then __switch_to_xtra can pass prev,next to
switch_hw_breakpoint. You can always do some slightly different
bookkeeping in there to know that prev->next is the switch happening,
so if there is a hit when current still == prev you can filter it out.
Thanks,
Roland
> I agree with you that single-stepping and enablement of a post_handler
> for instruction breakpoints in x86 provides a 'feature' over what the
> underlying processor provides. There may be corner-case issues to it
> (and I'd be glad to hear if somebody has discovered any).
I'm pretty well positive I can come up with several. But my point is that
the whole subject is hairy and that I want to separate the concerns
properly and deal with all that later, rather than clutter up the initial
review and integration of hw_breakpoint with all that.
> However, I'm afraid if the addition of such a feature in a layer above
> the hw_breakpoint can be done entirely without modifying code that is
> much closer to the hardware - say do_debug().
I didn't say that sorting out single-step wouldn't involve changes to
do_debug. It probably will. But that is still quite separate from
hw_breakpoint and I don't agree at all that such later work will have to
(or want to) be rolled into the hw_breakpoint API itself.
> Yet, given that there's sufficient interest to see hw_breakpoint layer
> as one that provides pure wrappers around processor functionality i.e.
> devoid of the overloading of single-stepping function, I will submit my
> next iteration of patches with a single pointer called triggered() as a
> callback function.
Great.
> It's my understanding that Data Breakpoints, set by DABR fire the Data
> Storage Interrupt in PPC only after the data access operation is done
> (except for store operations on atomic writes). Exceptions resulting due
> to a match with IABR (for instructions) is trigger-before.
Ok, I wasn't really trying to be right about PPC at the moment. The point
is just to expose the true behavior of the hardware in a coherent way.
When working on each arch port, one of course has to be careful to know the
correct details.
> Are you suggesting that the post_handler_supported() in i386/x86_64 be
> modified to return, say UNSUPPORTED_DUE_TO_RF instead of just
> UNSUPPORTED to indicate that post_handler() may be supported if
> single-stepping is used to step-over the instructions rather than use
> RF bit?
I wasn't proposing the exact API details. I think the API that makes sense
to describe the true hardware functionality uses somewhat different terms,
so "post_handler_supported" doesn't especially make sense. What I think
makes sense is to say a given hw_breakpoint type "triggers before" or
"triggers after" (since there really is only one event to possibly have a
handler, not two). The further possibilities of what kind of "triggers
before" it is are that the handler can request completing the instruction
without a re-trigger, or it can't (and just has to disable the breakpoint).
Probably a couple of Boolean-valued inlines covers it most clearly.
Thanks,
Roland
On Fri, 17 Oct 2008, Roland McGrath wrote:
> Current Intel manuals say, "Certain debug exceptions may clear bits 0-3.
> The remaining cntents of the DR6 register are never cleared by the processor."
>
> Your experiments told us that "certain debug exceptions" includes at least
> the data breakpoint hits. I assume that what it really means is all the
> exceptions that set one of those four bits, i.e. ones due to DR[0-3] use.
> Perhaps someone from Intel can clarify exactly what it means.
>
> > So this means that do_debug shouldn't modify the four low bits in vdr6.
> [...]
>
> Right.
>
> > I don't know how we should handle the BT (debug trap) and BS
> > (single-step exception) bits. Maybe the kprobes code can take care of
> > them.
>
> BT is for task switch with TSS.T set. I don't think that can ever happen
> in Linux, since we don't use hardware task-switching. If at all, maybe in
> vm86 mode. I don't think there's a way to do it just from user_ldt.
>
> I think BS (DR_STEP) should get set in vdr6 only when a SIGTRAP is
> generated for the exception. It should never get cleared by the system,
> only by PTRACE_POKEUSR. That is consistent with what we get now, AFAICT.
>
> I don't think kprobes should "take care of" DR_STEP. It should eat a
> DR_STEP that it's responsible for, and leave any others alone. i.e.,
> CONFIG_KPROBE=n must not break the normal bookkeeping.
>
> IIRC there can be one do_debug trap that's for both a breakpoint register
> hit and a single-step (TF), with DR_STEP plus DR_TRAPn both set at once.
> To handle that too, I think this will work:
>
> do_debug does:
>
> get_debugreg(condition, 6);
> set_debugreg(0, 6);
>
> Make sure the hw_breakpoint notifier is before the kprobes notifier.
> hw_breakpoint is responsible for the low 4 bits of vdr6 and leaves its
> other bits alone. It returns NOTIFY_STOP iff both this hit is not a ptrace
> hit and hardware %db6 (args->err) has no other nonreserved bits set.
Ah yes, it's coming back to me now. The handler routines see the
original hardware DR6 contents in args->err. They want to turn off the
bits corresponding to events they take care of, leaving the remaining
bits intact. When the notifier chain is finished, any bits still left
in args->err have to be acted on by do_debug, by or'ing them into vdr6.
The problem is that, owing to the way the code is structured, this
can't be done. args->err is local to notify_die, so any changes made
to its value are not available in do_debug.
> kprobes stays as it, returns NOTIFY_STOP iff it's swallowing the step.
> do_debug stays mostly the same, replace:
>
> tsk->thread.debugreg6 = condition;
>
> with:
>
> tsk->thread.vdr6 &= DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3;
> tsk->thread.vdr6 |= condition & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
No, this can't be right. Or if it is, it's just by coincidence. What
we really want to do is:
tsk->thread.vdr6 |= args->err;
after notify_die() returns. Unfortunately this is impossible unless we
change things around. For example, instead of passing condition as an
argument to notify_die(), we could pass (long) &condition and change
the notifier routines to use (* (unsigned *) (args->err)) instead of
args->err.
> > kprobes stays as it, returns NOTIFY_STOP iff it's swallowing the step.
>
> Oops, I think this breaks if there was also a ptrace db[0-3] hit in the
> same exception. In that case, kprobes would need to not return NOTIFY_STOP
> when it otherwise would, if thread.vdr6 has low bits set.
What should happen is kprobes returns NOTIFY_STOP if there are no
unreserved bits still set in args->err -- or (* (unsigned *)
(args->err)) -- when it is ready to return.
Alan Stern
On Fri, 17 Oct 2008, Roland McGrath wrote:
> > There's another RF-related issue which the patch currently ignores.
> > Namely, what should happen if a new user breakpoint is registered at
> > the current PC address? We should force the RF flag to 0 so that the
> > breakpoint will be triggered when execution resumes. The problem is
> > that it's not easy to tell whether the current PC corresponds to the
> > same linear address as that registered for the breakpoint -- i.e., I
> > don't know how the code should go about translating from virtual
> > addresses to linear addresses. Would this in fact always be the
> > identity mapping? Presumably not if we're in VM86 mode.
>
> Um, punt? There is the hair in step.c:convert_ip_to_linear, but, bah.
Punting is exactly what the patch does now. Together with a FIXME
comment.
Alan Stern
On Fri, 17 Oct 2008, Roland McGrath wrote:
> > There's a problem with moving the switch_to_thread_hw_breakpoint() call
> > before current is updated. [...]
>
> I got this. I'm just saying let's look for ways to do this better.
>
> Hmm. The 64-bit version of __switch_to does the current change much
> earlier, before __switch_to_xtra and math_state_restore. I wonder if the
> 32-bit version could change to match. I can't see what in __switch_to_xtra
> would care either way, though I may be overlooking something. Ingo?
Would it be better to move __switch_to_xtra down below the change to
current, rather than moving the change to current up above
__switch_to_xtra?
> If this does not change, then __switch_to_xtra can pass prev,next to
> switch_hw_breakpoint. You can always do some slightly different
> bookkeeping in there to know that prev->next is the switch happening,
> so if there is a hit when current still == prev you can filter it out.
The cpu_hw_breakpoint structure would have to keep track of two tasks:
the one whose breakpoints are currently loaded and the task that
preceded it. When a breakpoint occurs and current is not equal to the
first task, we would skip clearing the user-breakpoint registers if
current was equal to the second.
But then what happens when the system alternates between running two
tasks, one of which is being debugged and the other isn't? The task
not being debugged would constantly encounter breakpoints meant for the
other task, because the breakpoint registers wouldn't get cleared.
Alan Stern
I'd just like to point out that not all hardware breakpoints
involve addresses. For example:
The OLPC XO uses an x86 processor called the Geode.
It supports a breakpoint that traps based on the opcode.
For example, one could set it to trap on arpl instructions.
On Mon, Oct 20, 2008 at 05:33:40AM -0400, Albert Cahalan wrote:
> I'd just like to point out that not all hardware breakpoints
> involve addresses. For example:
>
> The OLPC XO uses an x86 processor called the Geode.
> It supports a breakpoint that traps based on the opcode.
> For example, one could set it to trap on arpl instructions.
We should be able to handle such unique requirements by introducing a
new type of breakpoints (similar to HW_BREAKPOINT_EXECUTE,
HW)BREAKPOINT_WRITE, etc).
Thanks,
K.Prasad
> > Hmm. The 64-bit version of __switch_to does the current change much
> > earlier, before __switch_to_xtra and math_state_restore. I wonder if the
> > 32-bit version could change to match. I can't see what in __switch_to_xtra
> > would care either way, though I may be overlooking something. Ingo?
>
> Would it be better to move __switch_to_xtra down below the change to
> current, rather than moving the change to current up above
> __switch_to_xtra?
I can't see that anything else in __switch_to_xtra cares either way.
Thanks,
Roland
> > Hmm. The 64-bit version of __switch_to does the current change much
> > earlier, before __switch_to_xtra and math_state_restore. I wonder if the
> > 32-bit version could change to match. I can't see what in __switch_to_xtra
> > would care either way, though I may be overlooking something. Ingo?
>
> Would it be better to move __switch_to_xtra down below the change to
> current, rather than moving the change to current up above
> __switch_to_xtra?
I can't see that anything else in __switch_to_xtra cares either way.
Thanks,
Roland
On Sat, Oct 18, 2008 at 11:21:53AM -0400, Alan Stern wrote:
> On Fri, 17 Oct 2008, Roland McGrath wrote:
>
> > Current Intel manuals say, "Certain debug exceptions may clear bits 0-3.
> > The remaining cntents of the DR6 register are never cleared by the processor."
> >
> > Your experiments told us that "certain debug exceptions" includes at least
> > the data breakpoint hits. I assume that what it really means is all the
> > exceptions that set one of those four bits, i.e. ones due to DR[0-3] use.
> > Perhaps someone from Intel can clarify exactly what it means.
> >
> > > So this means that do_debug shouldn't modify the four low bits in vdr6.
> > [...]
> >
> > Right.
> >
> > > I don't know how we should handle the BT (debug trap) and BS
> > > (single-step exception) bits. Maybe the kprobes code can take care of
> > > them.
> >
> > BT is for task switch with TSS.T set. I don't think that can ever happen
> > in Linux, since we don't use hardware task-switching. If at all, maybe in
> > vm86 mode. I don't think there's a way to do it just from user_ldt.
> >
> > I think BS (DR_STEP) should get set in vdr6 only when a SIGTRAP is
> > generated for the exception. It should never get cleared by the system,
> > only by PTRACE_POKEUSR. That is consistent with what we get now, AFAICT.
> >
> > I don't think kprobes should "take care of" DR_STEP. It should eat a
> > DR_STEP that it's responsible for, and leave any others alone. i.e.,
> > CONFIG_KPROBE=n must not break the normal bookkeeping.
> >
> > IIRC there can be one do_debug trap that's for both a breakpoint register
> > hit and a single-step (TF), with DR_STEP plus DR_TRAPn both set at once.
> > To handle that too, I think this will work:
> >
> > do_debug does:
> >
> > get_debugreg(condition, 6);
> > set_debugreg(0, 6);
> >
> > Make sure the hw_breakpoint notifier is before the kprobes notifier.
> > hw_breakpoint is responsible for the low 4 bits of vdr6 and leaves its
> > other bits alone. It returns NOTIFY_STOP iff both this hit is not a ptrace
> > hit and hardware %db6 (args->err) has no other nonreserved bits set.
>
> Ah yes, it's coming back to me now. The handler routines see the
> original hardware DR6 contents in args->err. They want to turn off the
> bits corresponding to events they take care of, leaving the remaining
> bits intact. When the notifier chain is finished, any bits still left
> in args->err have to be acted on by do_debug, by or'ing them into vdr6.
>
> The problem is that, owing to the way the code is structured, this
> can't be done. args->err is local to notify_die, so any changes made
> to its value are not available in do_debug.
>
> > kprobes stays as it, returns NOTIFY_STOP iff it's swallowing the step.
> > do_debug stays mostly the same, replace:
> >
> > tsk->thread.debugreg6 = condition;
> >
> > with:
> >
> > tsk->thread.vdr6 &= DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3;
> > tsk->thread.vdr6 |= condition & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
>
> No, this can't be right. Or if it is, it's just by coincidence. What
> we really want to do is:
>
> tsk->thread.vdr6 |= args->err;
>
> after notify_die() returns. Unfortunately this is impossible unless we
> change things around. For example, instead of passing condition as an
> argument to notify_die(), we could pass (long) &condition and change
> the notifier routines to use (* (unsigned *) (args->err)) instead of
> args->err.
>
> > > kprobes stays as it, returns NOTIFY_STOP iff it's swallowing the step.
> >
> > Oops, I think this breaks if there was also a ptrace db[0-3] hit in the
> > same exception. In that case, kprobes would need to not return NOTIFY_STOP
> > when it otherwise would, if thread.vdr6 has low bits set.
>
> What should happen is kprobes returns NOTIFY_STOP if there are no
> unreserved bits still set in args->err -- or (* (unsigned *)
> (args->err)) -- when it is ready to return.
>
> Alan Stern
>
Given that kprobes and HW Breakpoint exceptions multiplex only the
DIE_DEBUG notifier, kprobes tries to handle and eat-up the exception
with a NOTIFY_STOP only if it is in the context of kprobes i.e. when
kprobe_running() is true; which leaves no room for interference with a
user-space single-stepping request.
Thanks,
K.Prasad
On Wed, Dec 03, 2008 at 05:05:59PM -0800, Roland McGrath wrote:
> > > Hmm. The 64-bit version of __switch_to does the current change much
> > > earlier, before __switch_to_xtra and math_state_restore. I wonder if the
> > > 32-bit version could change to match. I can't see what in __switch_to_xtra
> > > would care either way, though I may be overlooking something. Ingo?
> >
> > Would it be better to move __switch_to_xtra down below the change to
> > current, rather than moving the change to current up above
> > __switch_to_xtra?
>
> I can't see that anything else in __switch_to_xtra cares either way.
>
>
> Thanks,
> Roland
>
That the grouse against the placement of
switch_to_thread_hw_breakpoint() is about additional code in the
hot-path, can we deal with this separately through a different patch?
I now have the patchset which provides only data breakpoint facility on
x86 (and x86_64) and has addressed your comments, ported against
2.6.28-rc7 which will be sent shortly. I'm thinking if the suggested
changes to the context-switching code can be handled later.
Thanks,
K.Prasad
On Wed, Oct 15, 2008 at 06:44:46PM -0700, Roland McGrath wrote:
> Just make the new functions in hw_breakpoint code use the same old ptrace_*
> names. Those are the proper names for this. It really is just compatibility
> for the old ptrace interface, not any other reason for "thread" debugregs.
> Then this patch need do nothing but remove the static ptrace.c functions.
>
>
> Thanks,
> Roland
>
I will rename the thread_<set/get>_debugreg() functions to ptrace_* and
move the related functions to arch/x86/kernel/ptrace.c
Thanks,
K.Prasad