2016-10-20 05:48:25

by Pratyush Anand

[permalink] [raw]
Subject: [PATCH V2 0/5] ARM64: More flexible HW watchpoint

Currently, we do not support all the byte select option provided by ARM64
specs for a HW watchpoint.

This patch set will help user to instrument a watchpoint with all possible
byte select options.

Changes since v1:
Introduced a new patch 3/5 where it takes care of the situation when HW
does not report a watchpoint hit with the address that matches one of the
watchpoints set.
Added corresponding test case to test that functionality.

Pavel Labath (1):
arm64: hw_breakpoint: Handle inexact watchpoint addresses

Pratyush Anand (4):
hw_breakpoint: Allow watchpoint of length 3,5,6 and 7
arm64: Allow hw watchpoint at varied offset from base address
arm64: Allow hw watchpoint of length 3,5,6 and 7
selftests: arm64: add test for unaligned/inexact watchpoint handling

arch/arm64/include/asm/hw_breakpoint.h | 6 +-
arch/arm64/kernel/hw_breakpoint.c | 149 +++++++++----
arch/arm64/kernel/ptrace.c | 5 +-
include/uapi/linux/hw_breakpoint.h | 4 +
tools/include/uapi/linux/hw_breakpoint.h | 4 +
tools/testing/selftests/breakpoints/Makefile | 5 +-
.../selftests/breakpoints/breakpoint_test_arm64.c | 236 +++++++++++++++++++++
7 files changed, 366 insertions(+), 43 deletions(-)
create mode 100644 tools/testing/selftests/breakpoints/breakpoint_test_arm64.c

--
2.7.4


2016-10-20 05:48:30

by Pratyush Anand

[permalink] [raw]
Subject: [PATCH V2 1/5] hw_breakpoint: Allow watchpoint of length 3,5,6 and 7

We only support breakpoint/watchpoint of length 1, 2, 4 and 8. If we can
support other length as well, then user may watch more data with less
number of watchpoints (provided hardware supports it). For example: if we
have to watch only 4th, 5th and 6th byte from a 64 bit aligned address, we
will have to use two slots to implement it currently. One slot will watch a
half word at offset 4 and other a byte at offset 6. If we can have a
watchpoint of length 3 then we can watch it with single slot as well.

ARM64 hardware does support such functionality, therefore adding these new
definitions in generic layer.

Signed-off-by: Pratyush Anand <[email protected]>
---
include/uapi/linux/hw_breakpoint.h | 4 ++++
tools/include/uapi/linux/hw_breakpoint.h | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/include/uapi/linux/hw_breakpoint.h b/include/uapi/linux/hw_breakpoint.h
index b04000a2296a..2b65efd19a46 100644
--- a/include/uapi/linux/hw_breakpoint.h
+++ b/include/uapi/linux/hw_breakpoint.h
@@ -4,7 +4,11 @@
enum {
HW_BREAKPOINT_LEN_1 = 1,
HW_BREAKPOINT_LEN_2 = 2,
+ HW_BREAKPOINT_LEN_3 = 3,
HW_BREAKPOINT_LEN_4 = 4,
+ HW_BREAKPOINT_LEN_5 = 5,
+ HW_BREAKPOINT_LEN_6 = 6,
+ HW_BREAKPOINT_LEN_7 = 7,
HW_BREAKPOINT_LEN_8 = 8,
};

diff --git a/tools/include/uapi/linux/hw_breakpoint.h b/tools/include/uapi/linux/hw_breakpoint.h
index b04000a2296a..2b65efd19a46 100644
--- a/tools/include/uapi/linux/hw_breakpoint.h
+++ b/tools/include/uapi/linux/hw_breakpoint.h
@@ -4,7 +4,11 @@
enum {
HW_BREAKPOINT_LEN_1 = 1,
HW_BREAKPOINT_LEN_2 = 2,
+ HW_BREAKPOINT_LEN_3 = 3,
HW_BREAKPOINT_LEN_4 = 4,
+ HW_BREAKPOINT_LEN_5 = 5,
+ HW_BREAKPOINT_LEN_6 = 6,
+ HW_BREAKPOINT_LEN_7 = 7,
HW_BREAKPOINT_LEN_8 = 8,
};

--
2.7.4

2016-10-20 05:48:42

by Pratyush Anand

[permalink] [raw]
Subject: [PATCH V2 3/5] arm64: hw_breakpoint: Handle inexact watchpoint addresses

From: Pavel Labath <[email protected]>

Arm64 hardware does not always report a watchpoint hit address that
matches one of the watchpoints set. It can also report an address
"near" the watchpoint if a single instruction access both watched and
unwatched addresses. There is no straight-forward way, short of
disassembling the offending instruction, to map that address back to
the watchpoint.

Previously, when the hardware reported a watchpoint hit on an address
that did not match our watchpoint (this happens in case of instructions
which access large chunks of memory such as "stp") the process would
enter a loop where we would be continually resuming it (because we did
not recognise that watchpoint hit) and it would keep hitting the
watchpoint again and again. The tracing process would never get
notified of the watchpoint hit.

This commit fixes the problem by looking at the watchpoints near the
address reported by the hardware. If the address does not exactly match
one of the watchpoints we have set, it attributes the hit to the
nearest watchpoint we have. This heuristic is a bit dodgy, but I don't
think we can do much more, given the hardware limitations.

[panand: reworked to rebase on his patches]

Signed-off-by: Pavel Labath <[email protected]>
Signed-off-by: Pratyush Anand <[email protected]>
---
arch/arm64/kernel/hw_breakpoint.c | 94 +++++++++++++++++++++++++++------------
1 file changed, 66 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 3c2b96803eba..c57bc90b8286 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -662,11 +662,46 @@ unlock:
}
NOKPROBE_SYMBOL(breakpoint_handler);

+/*
+ * Arm64 hardware does not always report a watchpoint hit address that matches
+ * one of the watchpoints set. It can also report an address "near" the
+ * watchpoint if a single instruction access both watched and unwatched
+ * addresses. There is no straight-forward way, short of disassembling the
+ * offending instruction, to map that address back to the watchpoint. This
+ * function computes the distance of the memory access from the watchpoint as a
+ * heuristic for the likelyhood that a given access triggered the watchpoint.
+ *
+ * See Section D2.10.5 "Determining the memory location that caused a Watchpoint
+ * exception" of ARMv8 Architecture Reference Manual for details.
+ *
+ * The function returns the distance of the address from the bytes watched by
+ * the watchpoint. In case of an exact match, it returns 0.
+ */
+static u64 get_distance_from_watchpoint(unsigned long addr, u64 val,
+ struct arch_hw_breakpoint_ctrl *ctrl)
+{
+ u64 wp_low, wp_high;
+ u32 lens, lene;
+
+ lens = ffs(ctrl->len) - 1;
+ lene = fls(ctrl->len) - 1;
+
+ wp_low = val + lens;
+ wp_high = val + lene;
+ if (addr < wp_low)
+ return wp_low - addr;
+ else if (addr > wp_high)
+ return addr - wp_high;
+ else
+ return 0;
+}
+
static int watchpoint_handler(unsigned long addr, unsigned int esr,
struct pt_regs *regs)
{
- int i, step = 0, *kernel_step, access;
- u32 ctrl_reg, lens, lene;
+ int i, step = 0, *kernel_step, access, closest_match = 0;
+ u64 min_dist = -1, dist;
+ u32 ctrl_reg;
u64 val;
struct perf_event *wp, **slots;
struct debug_info *debug_info;
@@ -676,31 +711,15 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
slots = this_cpu_ptr(wp_on_reg);
debug_info = &current->thread.debug;

+ /*
+ * Find all watchpoints that match the reported address. If no exact
+ * match is found. Attribute the hit to the closest watchpoint.
+ */
+ rcu_read_lock();
for (i = 0; i < core_num_wrps; ++i) {
- rcu_read_lock();
-
wp = slots[i];
-
if (wp == NULL)
- goto unlock;
-
- info = counter_arch_bp(wp);
-
- /* Check if the watchpoint value and byte select match. */
- val = read_wb_reg(AARCH64_DBG_REG_WVR, i);
- ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i);
- decode_ctrl_reg(ctrl_reg, &ctrl);
- lens = ffs(ctrl.len) - 1;
- lene = fls(ctrl.len) - 1;
- /*
- * FIXME: reported address can be anywhere between "the
- * lowest address accessed by the memory access that
- * triggered the watchpoint" and "the highest watchpointed
- * address accessed by the memory access". So, it may not
- * lie in the interval of watchpoint address range.
- */
- if (addr < val + lens || addr > val + lene)
- goto unlock;
+ continue;

/*
* Check that the access type matches.
@@ -709,18 +728,37 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
access = (esr & AARCH64_ESR_ACCESS_MASK) ? HW_BREAKPOINT_W :
HW_BREAKPOINT_R;
if (!(access & hw_breakpoint_type(wp)))
- goto unlock;
+ continue;
+
+ /* Check if the watchpoint value and byte select match. */
+ val = read_wb_reg(AARCH64_DBG_REG_WVR, i);
+ ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i);
+ decode_ctrl_reg(ctrl_reg, &ctrl);
+ dist = get_distance_from_watchpoint(addr, val, &ctrl);
+ if (dist < min_dist) {
+ min_dist = dist;
+ closest_match = i;
+ }
+ /* Is this an exact match? */
+ if (dist != 0)
+ continue;

+ info = counter_arch_bp(wp);
info->trigger = addr;
perf_bp_event(wp, regs);

/* Do we need to handle the stepping? */
if (is_default_overflow_handler(wp))
step = 1;
-
-unlock:
- rcu_read_unlock();
}
+ if (min_dist > 0 && min_dist != -1) {
+ /* No exact match found. */
+ wp = slots[closest_match];
+ info = counter_arch_bp(wp);
+ info->trigger = addr;
+ perf_bp_event(wp, regs);
+ }
+ rcu_read_unlock();

if (!step)
return 0;
--
2.7.4

2016-10-20 05:48:46

by Pratyush Anand

[permalink] [raw]
Subject: [PATCH V2 4/5] arm64: Allow hw watchpoint of length 3,5,6 and 7

Since, arm64 can support all offset within a double word limit. Therefore,
now support other lengths within that range as well.

Signed-off-by: Pratyush Anand <[email protected]>
---
arch/arm64/include/asm/hw_breakpoint.h | 4 ++++
arch/arm64/kernel/hw_breakpoint.c | 36 ++++++++++++++++++++++++++++++++++
2 files changed, 40 insertions(+)

diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h
index 4f4e58bee9bc..7a18c8520588 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -76,7 +76,11 @@ static inline void decode_ctrl_reg(u32 reg,
/* Lengths */
#define ARM_BREAKPOINT_LEN_1 0x1
#define ARM_BREAKPOINT_LEN_2 0x3
+#define ARM_BREAKPOINT_LEN_3 0x7
#define ARM_BREAKPOINT_LEN_4 0xf
+#define ARM_BREAKPOINT_LEN_5 0x1f
+#define ARM_BREAKPOINT_LEN_6 0x3f
+#define ARM_BREAKPOINT_LEN_7 0x7f
#define ARM_BREAKPOINT_LEN_8 0xff

/* Kernel stepping */
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index c57bc90b8286..4125c2152e85 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -317,9 +317,21 @@ static int get_hbp_len(u8 hbp_len)
case ARM_BREAKPOINT_LEN_2:
len_in_bytes = 2;
break;
+ case ARM_BREAKPOINT_LEN_3:
+ len_in_bytes = 3;
+ break;
case ARM_BREAKPOINT_LEN_4:
len_in_bytes = 4;
break;
+ case ARM_BREAKPOINT_LEN_5:
+ len_in_bytes = 5;
+ break;
+ case ARM_BREAKPOINT_LEN_6:
+ len_in_bytes = 6;
+ break;
+ case ARM_BREAKPOINT_LEN_7:
+ len_in_bytes = 7;
+ break;
case ARM_BREAKPOINT_LEN_8:
len_in_bytes = 8;
break;
@@ -379,9 +391,21 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
case ARM_BREAKPOINT_LEN_2:
*gen_len = HW_BREAKPOINT_LEN_2;
break;
+ case ARM_BREAKPOINT_LEN_3:
+ *gen_len = HW_BREAKPOINT_LEN_3;
+ break;
case ARM_BREAKPOINT_LEN_4:
*gen_len = HW_BREAKPOINT_LEN_4;
break;
+ case ARM_BREAKPOINT_LEN_5:
+ *gen_len = HW_BREAKPOINT_LEN_5;
+ break;
+ case ARM_BREAKPOINT_LEN_6:
+ *gen_len = HW_BREAKPOINT_LEN_6;
+ break;
+ case ARM_BREAKPOINT_LEN_7:
+ *gen_len = HW_BREAKPOINT_LEN_7;
+ break;
case ARM_BREAKPOINT_LEN_8:
*gen_len = HW_BREAKPOINT_LEN_8;
break;
@@ -425,9 +449,21 @@ static int arch_build_bp_info(struct perf_event *bp)
case HW_BREAKPOINT_LEN_2:
info->ctrl.len = ARM_BREAKPOINT_LEN_2;
break;
+ case HW_BREAKPOINT_LEN_3:
+ info->ctrl.len = ARM_BREAKPOINT_LEN_3;
+ break;
case HW_BREAKPOINT_LEN_4:
info->ctrl.len = ARM_BREAKPOINT_LEN_4;
break;
+ case HW_BREAKPOINT_LEN_5:
+ info->ctrl.len = ARM_BREAKPOINT_LEN_5;
+ break;
+ case HW_BREAKPOINT_LEN_6:
+ info->ctrl.len = ARM_BREAKPOINT_LEN_6;
+ break;
+ case HW_BREAKPOINT_LEN_7:
+ info->ctrl.len = ARM_BREAKPOINT_LEN_7;
+ break;
case HW_BREAKPOINT_LEN_8:
info->ctrl.len = ARM_BREAKPOINT_LEN_8;
break;
--
2.7.4

2016-10-20 05:48:56

by Pratyush Anand

[permalink] [raw]
Subject: [PATCH V2 5/5] selftests: arm64: add test for unaligned/inexact watchpoint handling

ARM64 hardware expects 64bit aligned address for watchpoint invocation.
However, it provides byte selection method to select any number of
consecutive byte set within the range of 1-8.

This patch adds support to test all such byte selection option for
different memory write sizes.

Patch also adds a test for handling the case when the cpu does not
report an address which exactly matches one of the regions we have
been watching (which is a situation permitted by the spec if an
instruction accesses both watched and unwatched regions). The test
was failing on a MSM8996pro before this patch series and is
passing now.

Signed-off-by: Pavel Labath <[email protected]>
Signed-off-by: Pratyush Anand <[email protected]>
---
tools/testing/selftests/breakpoints/Makefile | 5 +-
.../selftests/breakpoints/breakpoint_test_arm64.c | 236 +++++++++++++++++++++
2 files changed, 240 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/breakpoints/breakpoint_test_arm64.c

diff --git a/tools/testing/selftests/breakpoints/Makefile b/tools/testing/selftests/breakpoints/Makefile
index 74e533fd4bc5..61b79e8df1f4 100644
--- a/tools/testing/selftests/breakpoints/Makefile
+++ b/tools/testing/selftests/breakpoints/Makefile
@@ -5,6 +5,9 @@ ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
ifeq ($(ARCH),x86)
TEST_PROGS := breakpoint_test
endif
+ifeq ($(ARCH),aarch64)
+TEST_PROGS := breakpoint_test_arm64
+endif

TEST_PROGS += step_after_suspend_test

@@ -13,4 +16,4 @@ all: $(TEST_PROGS)
include ../lib.mk

clean:
- rm -fr breakpoint_test step_after_suspend_test
+ rm -fr breakpoint_test breakpoint_test_arm64 step_after_suspend_test
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c b/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
new file mode 100644
index 000000000000..3897e996541e
--- /dev/null
+++ b/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
@@ -0,0 +1,236 @@
+/*
+ * Copyright (C) 2016 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Original Code by Pavel Labath <[email protected]>
+ *
+ * Code modified by Pratyush Anand <[email protected]>
+ * for testing different byte select for each access size.
+ *
+ */
+
+#define _GNU_SOURCE
+
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/ptrace.h>
+#include <sys/param.h>
+#include <sys/uio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <elf.h>
+#include <errno.h>
+#include <signal.h>
+
+#include "../kselftest.h"
+
+static volatile uint8_t var[96] __attribute__((__aligned__(32)));
+
+static void child(int size, int wr)
+{
+ volatile uint8_t *addr = &var[32 + wr];
+
+ if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0) {
+ perror("ptrace(PTRACE_TRACEME) failed");
+ _exit(1);
+ }
+
+ if (raise(SIGSTOP) != 0) {
+ perror("raise(SIGSTOP) failed");
+ _exit(1);
+ }
+
+ if ((uintptr_t) addr % size) {
+ perror("Wrong address write for the given size\n");
+ _exit(1);
+ }
+ switch (size) {
+ case 1:
+ *addr = 47;
+ break;
+ case 2:
+ *(uint16_t *)addr = 47;
+ break;
+ case 4:
+ *(uint32_t *)addr = 47;
+ break;
+ case 8:
+ *(uint64_t *)addr = 47;
+ break;
+ case 16:
+ __asm__ volatile ("stp x29, x30, %0" : "=m" (addr[0]));
+ break;
+ case 32:
+ __asm__ volatile ("stp q29, q30, %0" : "=m" (addr[0]));
+ break;
+ }
+
+ _exit(0);
+}
+
+static bool set_watchpoint(pid_t pid, int size, int wp)
+{
+ const volatile uint8_t *addr = &var[32 + wp];
+ const int offset = (uintptr_t)addr % 8;
+ const unsigned int byte_mask = ((1 << size) - 1) << offset;
+ const unsigned int type = 2; /* Write */
+ const unsigned int enable = 1;
+ const unsigned int control = byte_mask << 5 | type << 3 | enable;
+ struct user_hwdebug_state dreg_state;
+ struct iovec iov;
+
+ memset(&dreg_state, 0, sizeof(dreg_state));
+ dreg_state.dbg_regs[0].addr = (uintptr_t)(addr - offset);
+ dreg_state.dbg_regs[0].ctrl = control;
+ iov.iov_base = &dreg_state;
+ iov.iov_len = offsetof(struct user_hwdebug_state, dbg_regs) +
+ sizeof(dreg_state.dbg_regs[0]);
+ if (ptrace(PTRACE_SETREGSET, pid, NT_ARM_HW_WATCH, &iov) == 0)
+ return true;
+
+ if (errno == EIO) {
+ printf("ptrace(PTRACE_SETREGSET, NT_ARM_HW_WATCH) "
+ "not supported on this hardware\n");
+ ksft_exit_skip();
+ }
+ perror("ptrace(PTRACE_SETREGSET, NT_ARM_HW_WATCH) failed");
+ return false;
+}
+
+static bool run_test(int wr_size, int wp_size, int wr, int wp)
+{
+ int status;
+ siginfo_t siginfo;
+ pid_t pid = fork();
+ pid_t wpid;
+
+ if (pid < 0) {
+ perror("fork() failed");
+ return false;
+ }
+ if (pid == 0)
+ child(wr_size, wr);
+
+ wpid = waitpid(pid, &status, __WALL);
+ if (wpid != pid) {
+ perror("waitpid() failed");
+ return false;
+ }
+ if (!WIFSTOPPED(status)) {
+ printf("child did not stop\n");
+ return false;
+ }
+ if (WSTOPSIG(status) != SIGSTOP) {
+ printf("child did not stop with SIGSTOP\n");
+ return false;
+ }
+
+ if (!set_watchpoint(pid, wp_size, wp))
+ return false;
+
+ if (ptrace(PTRACE_CONT, pid, NULL, NULL) < 0) {
+ perror("ptrace(PTRACE_SINGLESTEP) failed");
+ return false;
+ }
+
+ alarm(3);
+ wpid = waitpid(pid, &status, __WALL);
+ if (wpid != pid) {
+ perror("waitpid() failed");
+ return false;
+ }
+ alarm(0);
+ if (WIFEXITED(status)) {
+ printf("child did not single-step\t");
+ return false;
+ }
+ if (!WIFSTOPPED(status)) {
+ printf("child did not stop\n");
+ return false;
+ }
+ if (WSTOPSIG(status) != SIGTRAP) {
+ printf("child did not stop with SIGTRAP\n");
+ return false;
+ }
+ if (ptrace(PTRACE_GETSIGINFO, pid, NULL, &siginfo) != 0) {
+ perror("ptrace(PTRACE_GETSIGINFO)");
+ return false;
+ }
+ if (siginfo.si_code != TRAP_HWBKPT) {
+ printf("Unexpected si_code %d\n", siginfo.si_code);
+ return false;
+ }
+
+ kill(pid, SIGKILL);
+ wpid = waitpid(pid, &status, 0);
+ if (wpid != pid) {
+ perror("waitpid() failed");
+ return false;
+ }
+ return true;
+}
+
+static void sigalrm(int sig)
+{
+}
+
+int main(int argc, char **argv)
+{
+ int opt;
+ bool succeeded = true;
+ struct sigaction act;
+ int wr, wp, size;
+ bool result;
+
+ act.sa_handler = sigalrm;
+ sigemptyset(&act.sa_mask);
+ act.sa_flags = 0;
+ sigaction(SIGALRM, &act, NULL);
+ for (size = 1; size <= 32; size = size*2) {
+ for (wr = 0; wr <= 32; wr = wr + size) {
+ for (wp = wr - size; wp <= wr + size; wp = wp + size) {
+ printf("Test size = %d write offset = %d watchpoint offset = %d\t", size, wr, wp);
+ result = run_test(size, MIN(size, 8), wr, wp);
+ if ((result && wr == wp) || (!result && wr != wp)) {
+ printf("[OK]\n");
+ ksft_inc_pass_cnt();
+ } else {
+ printf("[FAILED]\n");
+ ksft_inc_fail_cnt();
+ succeeded = false;
+ }
+ }
+ }
+ }
+
+ for (size = 1; size <= 32; size = size*2) {
+ printf("Test size = %d write offset = %d watchpoint offset = -8\t", size, -size);
+
+ if (run_test(size, 8, -size, -8)) {
+ printf("[OK]\n");
+ ksft_inc_pass_cnt();
+ } else {
+ printf("[FAILED]\n");
+ ksft_inc_fail_cnt();
+ succeeded = false;
+ }
+ }
+
+ ksft_print_cnts();
+ if (succeeded)
+ ksft_exit_pass();
+ else
+ ksft_exit_fail();
+}
--
2.7.4

2016-10-20 05:48:40

by Pratyush Anand

[permalink] [raw]
Subject: [PATCH V2 2/5] arm64: Allow hw watchpoint at varied offset from base address

ARM64 hardware supports watchpoint at any double word aligned address.
However, it can select any consecutive bytes from offset 0 to 7 from that
base address. For example, if base address is programmed as 0x420030 and
byte select is 0x1C, then access of 0x420032,0x420033 and 0x420034 will
generate a watchpoint exception.

Currently, we do not have such modularity. We can only program byte,
halfword, word and double word access exception from any base address.

This patch adds support to overcome above limitations.

Signed-off-by: Pratyush Anand <[email protected]>
---
arch/arm64/include/asm/hw_breakpoint.h | 2 +-
arch/arm64/kernel/hw_breakpoint.c | 45 ++++++++++++++++------------------
arch/arm64/kernel/ptrace.c | 5 ++--
3 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h
index 115ea2a64520..4f4e58bee9bc 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -118,7 +118,7 @@ struct perf_event;
struct pmu;

extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
- int *gen_len, int *gen_type);
+ int *gen_len, int *gen_type, int *offset);
extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 26a6bf77d272..3c2b96803eba 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -349,7 +349,7 @@ int arch_check_bp_in_kernelspace(struct perf_event *bp)
* to generic breakpoint descriptions.
*/
int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
- int *gen_len, int *gen_type)
+ int *gen_len, int *gen_type, int *offset)
{
/* Type */
switch (ctrl.type) {
@@ -369,8 +369,10 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
return -EINVAL;
}

+ *offset = ffs(ctrl.len) - 1;
+
/* Len */
- switch (ctrl.len) {
+ switch (ctrl.len >> *offset) {
case ARM_BREAKPOINT_LEN_1:
*gen_len = HW_BREAKPOINT_LEN_1;
break;
@@ -517,18 +519,17 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
default:
return -EINVAL;
}
-
- info->address &= ~alignment_mask;
- info->ctrl.len <<= offset;
} else {
if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE)
alignment_mask = 0x3;
else
alignment_mask = 0x7;
- if (info->address & alignment_mask)
- return -EINVAL;
+ offset = info->address & alignment_mask;
}

+ info->address &= ~alignment_mask;
+ info->ctrl.len <<= offset;
+
/*
* Disallow per-task kernel breakpoints since these would
* complicate the stepping code.
@@ -665,8 +666,8 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
struct pt_regs *regs)
{
int i, step = 0, *kernel_step, access;
- u32 ctrl_reg;
- u64 val, alignment_mask;
+ u32 ctrl_reg, lens, lene;
+ u64 val;
struct perf_event *wp, **slots;
struct debug_info *debug_info;
struct arch_hw_breakpoint *info;
@@ -684,25 +685,21 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
goto unlock;

info = counter_arch_bp(wp);
- /* AArch32 watchpoints are either 4 or 8 bytes aligned. */
- if (is_compat_task()) {
- if (info->ctrl.len == ARM_BREAKPOINT_LEN_8)
- alignment_mask = 0x7;
- else
- alignment_mask = 0x3;
- } else {
- alignment_mask = 0x7;
- }

- /* Check if the watchpoint value matches. */
+ /* Check if the watchpoint value and byte select match. */
val = read_wb_reg(AARCH64_DBG_REG_WVR, i);
- if (val != (addr & ~alignment_mask))
- goto unlock;
-
- /* Possible match, check the byte address select to confirm. */
ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i);
decode_ctrl_reg(ctrl_reg, &ctrl);
- if (!((1 << (addr & alignment_mask)) & ctrl.len))
+ lens = ffs(ctrl.len) - 1;
+ lene = fls(ctrl.len) - 1;
+ /*
+ * FIXME: reported address can be anywhere between "the
+ * lowest address accessed by the memory access that
+ * triggered the watchpoint" and "the highest watchpointed
+ * address accessed by the memory access". So, it may not
+ * lie in the interval of watchpoint address range.
+ */
+ if (addr < val + lens || addr > val + lene)
goto unlock;

/*
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index e0c81da60f76..0eb366a94382 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -327,13 +327,13 @@ static int ptrace_hbp_fill_attr_ctrl(unsigned int note_type,
struct arch_hw_breakpoint_ctrl ctrl,
struct perf_event_attr *attr)
{
- int err, len, type, disabled = !ctrl.enabled;
+ int err, len, type, offset, disabled = !ctrl.enabled;

attr->disabled = disabled;
if (disabled)
return 0;

- err = arch_bp_generic_fields(ctrl, &len, &type);
+ err = arch_bp_generic_fields(ctrl, &len, &type, &offset);
if (err)
return err;

@@ -352,6 +352,7 @@ static int ptrace_hbp_fill_attr_ctrl(unsigned int note_type,

attr->bp_len = len;
attr->bp_type = type;
+ attr->bp_addr += offset;

return 0;
}
--
2.7.4

2016-11-08 03:26:58

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH V2 2/5] arm64: Allow hw watchpoint at varied offset from base address

Hi Pratyush,

On Thu, Oct 20, 2016 at 11:18:14AM +0530, Pratyush Anand wrote:
> ARM64 hardware supports watchpoint at any double word aligned address.
> However, it can select any consecutive bytes from offset 0 to 7 from that
> base address. For example, if base address is programmed as 0x420030 and
> byte select is 0x1C, then access of 0x420032,0x420033 and 0x420034 will
> generate a watchpoint exception.
>
> Currently, we do not have such modularity. We can only program byte,
> halfword, word and double word access exception from any base address.
>
> This patch adds support to overcome above limitations.
>
> Signed-off-by: Pratyush Anand <[email protected]>
> ---
> arch/arm64/include/asm/hw_breakpoint.h | 2 +-
> arch/arm64/kernel/hw_breakpoint.c | 45 ++++++++++++++++------------------
> arch/arm64/kernel/ptrace.c | 5 ++--
> 3 files changed, 25 insertions(+), 27 deletions(-)
>
> diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h
> index 115ea2a64520..4f4e58bee9bc 100644
> --- a/arch/arm64/include/asm/hw_breakpoint.h
> +++ b/arch/arm64/include/asm/hw_breakpoint.h
> @@ -118,7 +118,7 @@ struct perf_event;
> struct pmu;
>
> extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
> - int *gen_len, int *gen_type);
> + int *gen_len, int *gen_type, int *offset);
> extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
> extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
> extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
> diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
> index 26a6bf77d272..3c2b96803eba 100644
> --- a/arch/arm64/kernel/hw_breakpoint.c
> +++ b/arch/arm64/kernel/hw_breakpoint.c
> @@ -349,7 +349,7 @@ int arch_check_bp_in_kernelspace(struct perf_event *bp)
> * to generic breakpoint descriptions.
> */
> int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
> - int *gen_len, int *gen_type)
> + int *gen_len, int *gen_type, int *offset)
> {
> /* Type */
> switch (ctrl.type) {
> @@ -369,8 +369,10 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
> return -EINVAL;
> }
>
> + *offset = ffs(ctrl.len) - 1;

Do we want to fail the length == 0 case early? If you do add that check,
then you can use __ffs here instead.

> /* Len */
> - switch (ctrl.len) {
> + switch (ctrl.len >> *offset) {
> case ARM_BREAKPOINT_LEN_1:
> *gen_len = HW_BREAKPOINT_LEN_1;
> break;
> @@ -517,18 +519,17 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
> default:
> return -EINVAL;
> }
> -
> - info->address &= ~alignment_mask;
> - info->ctrl.len <<= offset;
> } else {
> if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE)
> alignment_mask = 0x3;
> else
> alignment_mask = 0x7;
> - if (info->address & alignment_mask)
> - return -EINVAL;
> + offset = info->address & alignment_mask;
> }
>
> + info->address &= ~alignment_mask;
> + info->ctrl.len <<= offset;

Urgh, I really hate all this converting to and fro to keep perf happy.
FWIW, I'm at the point where I would seriously consider ripping out the
hw_breakpoint code and replacing it with a ptrace-specific backend so we
just drop all this crap. The only people that seem to use the perf interface
are those running the (failing) selftests. Given that we have to have
a bloody thread switch notifier *anyway*, all perf seems to give us is
complexity and restrictions.

> /*
> * Disallow per-task kernel breakpoints since these would
> * complicate the stepping code.
> @@ -665,8 +666,8 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
> struct pt_regs *regs)
> {
> int i, step = 0, *kernel_step, access;
> - u32 ctrl_reg;
> - u64 val, alignment_mask;
> + u32 ctrl_reg, lens, lene;
> + u64 val;
> struct perf_event *wp, **slots;
> struct debug_info *debug_info;
> struct arch_hw_breakpoint *info;
> @@ -684,25 +685,21 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
> goto unlock;
>
> info = counter_arch_bp(wp);
> - /* AArch32 watchpoints are either 4 or 8 bytes aligned. */
> - if (is_compat_task()) {
> - if (info->ctrl.len == ARM_BREAKPOINT_LEN_8)
> - alignment_mask = 0x7;
> - else
> - alignment_mask = 0x3;
> - } else {
> - alignment_mask = 0x7;
> - }
>
> - /* Check if the watchpoint value matches. */
> + /* Check if the watchpoint value and byte select match. */
> val = read_wb_reg(AARCH64_DBG_REG_WVR, i);
> - if (val != (addr & ~alignment_mask))
> - goto unlock;
> -
> - /* Possible match, check the byte address select to confirm. */
> ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i);
> decode_ctrl_reg(ctrl_reg, &ctrl);
> - if (!((1 << (addr & alignment_mask)) & ctrl.len))
> + lens = ffs(ctrl.len) - 1;
> + lene = fls(ctrl.len) - 1;

Again, you can use the '__' variants to avoid the '- 1'.

> + /*
> + * FIXME: reported address can be anywhere between "the
> + * lowest address accessed by the memory access that
> + * triggered the watchpoint" and "the highest watchpointed
> + * address accessed by the memory access". So, it may not
> + * lie in the interval of watchpoint address range.
> + */
> + if (addr < val + lens || addr > val + lene)
> goto unlock;
>
> /*
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index e0c81da60f76..0eb366a94382 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -327,13 +327,13 @@ static int ptrace_hbp_fill_attr_ctrl(unsigned int note_type,
> struct arch_hw_breakpoint_ctrl ctrl,
> struct perf_event_attr *attr)
> {
> - int err, len, type, disabled = !ctrl.enabled;
> + int err, len, type, offset, disabled = !ctrl.enabled;
>
> attr->disabled = disabled;
> if (disabled)
> return 0;
>
> - err = arch_bp_generic_fields(ctrl, &len, &type);
> + err = arch_bp_generic_fields(ctrl, &len, &type, &offset);
> if (err)
> return err;
>
> @@ -352,6 +352,7 @@ static int ptrace_hbp_fill_attr_ctrl(unsigned int note_type,
>
> attr->bp_len = len;
> attr->bp_type = type;
> + attr->bp_addr += offset;

That's going to look pretty bizarre from userspace if it decides to read
back the address registers to find that they've mysteriously changed.

Perhaps ptrace_hbp_get_addr needs to retrieve the address from the
arch_hw_breakpoint, like we do for the ctrl register. What do you think?

Will

2016-11-08 03:29:42

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] arm64: hw_breakpoint: Handle inexact watchpoint addresses

On Thu, Oct 20, 2016 at 11:18:15AM +0530, Pratyush Anand wrote:
> From: Pavel Labath <[email protected]>
>
> Arm64 hardware does not always report a watchpoint hit address that
> matches one of the watchpoints set. It can also report an address
> "near" the watchpoint if a single instruction access both watched and
> unwatched addresses. There is no straight-forward way, short of
> disassembling the offending instruction, to map that address back to
> the watchpoint.
>
> Previously, when the hardware reported a watchpoint hit on an address
> that did not match our watchpoint (this happens in case of instructions
> which access large chunks of memory such as "stp") the process would
> enter a loop where we would be continually resuming it (because we did
> not recognise that watchpoint hit) and it would keep hitting the
> watchpoint again and again. The tracing process would never get
> notified of the watchpoint hit.
>
> This commit fixes the problem by looking at the watchpoints near the
> address reported by the hardware. If the address does not exactly match
> one of the watchpoints we have set, it attributes the hit to the
> nearest watchpoint we have. This heuristic is a bit dodgy, but I don't
> think we can do much more, given the hardware limitations.
>
> [panand: reworked to rebase on his patches]
>
> Signed-off-by: Pavel Labath <[email protected]>
> Signed-off-by: Pratyush Anand <[email protected]>
> ---
> arch/arm64/kernel/hw_breakpoint.c | 94 +++++++++++++++++++++++++++------------
> 1 file changed, 66 insertions(+), 28 deletions(-)
>
> diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
> index 3c2b96803eba..c57bc90b8286 100644
> --- a/arch/arm64/kernel/hw_breakpoint.c
> +++ b/arch/arm64/kernel/hw_breakpoint.c
> @@ -662,11 +662,46 @@ unlock:
> }
> NOKPROBE_SYMBOL(breakpoint_handler);
>
> +/*
> + * Arm64 hardware does not always report a watchpoint hit address that matches
> + * one of the watchpoints set. It can also report an address "near" the
> + * watchpoint if a single instruction access both watched and unwatched
> + * addresses. There is no straight-forward way, short of disassembling the
> + * offending instruction, to map that address back to the watchpoint. This
> + * function computes the distance of the memory access from the watchpoint as a
> + * heuristic for the likelyhood that a given access triggered the watchpoint.
> + *
> + * See Section D2.10.5 "Determining the memory location that caused a Watchpoint
> + * exception" of ARMv8 Architecture Reference Manual for details.
> + *
> + * The function returns the distance of the address from the bytes watched by
> + * the watchpoint. In case of an exact match, it returns 0.
> + */
> +static u64 get_distance_from_watchpoint(unsigned long addr, u64 val,
> + struct arch_hw_breakpoint_ctrl *ctrl)
> +{
> + u64 wp_low, wp_high;
> + u32 lens, lene;
> +
> + lens = ffs(ctrl->len) - 1;
> + lene = fls(ctrl->len) - 1;
> +
> + wp_low = val + lens;
> + wp_high = val + lene;
> + if (addr < wp_low)
> + return wp_low - addr;
> + else if (addr > wp_high)
> + return addr - wp_high;
> + else
> + return 0;
> +}
> +
> static int watchpoint_handler(unsigned long addr, unsigned int esr,
> struct pt_regs *regs)
> {
> - int i, step = 0, *kernel_step, access;
> - u32 ctrl_reg, lens, lene;
> + int i, step = 0, *kernel_step, access, closest_match = 0;
> + u64 min_dist = -1, dist;
> + u32 ctrl_reg;
> u64 val;
> struct perf_event *wp, **slots;
> struct debug_info *debug_info;
> @@ -676,31 +711,15 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
> slots = this_cpu_ptr(wp_on_reg);
> debug_info = &current->thread.debug;
>
> + /*
> + * Find all watchpoints that match the reported address. If no exact
> + * match is found. Attribute the hit to the closest watchpoint.
> + */
> + rcu_read_lock();
> for (i = 0; i < core_num_wrps; ++i) {
> - rcu_read_lock();
> -
> wp = slots[i];
> -
> if (wp == NULL)
> - goto unlock;
> -
> - info = counter_arch_bp(wp);
> -
> - /* Check if the watchpoint value and byte select match. */
> - val = read_wb_reg(AARCH64_DBG_REG_WVR, i);
> - ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i);
> - decode_ctrl_reg(ctrl_reg, &ctrl);
> - lens = ffs(ctrl.len) - 1;
> - lene = fls(ctrl.len) - 1;
> - /*
> - * FIXME: reported address can be anywhere between "the
> - * lowest address accessed by the memory access that
> - * triggered the watchpoint" and "the highest watchpointed
> - * address accessed by the memory access". So, it may not
> - * lie in the interval of watchpoint address range.
> - */
> - if (addr < val + lens || addr > val + lene)
> - goto unlock;
> + continue;
>
> /*
> * Check that the access type matches.
> @@ -709,18 +728,37 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
> access = (esr & AARCH64_ESR_ACCESS_MASK) ? HW_BREAKPOINT_W :
> HW_BREAKPOINT_R;
> if (!(access & hw_breakpoint_type(wp)))
> - goto unlock;
> + continue;
> +
> + /* Check if the watchpoint value and byte select match. */
> + val = read_wb_reg(AARCH64_DBG_REG_WVR, i);
> + ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i);
> + decode_ctrl_reg(ctrl_reg, &ctrl);
> + dist = get_distance_from_watchpoint(addr, val, &ctrl);
> + if (dist < min_dist) {
> + min_dist = dist;
> + closest_match = i;
> + }
> + /* Is this an exact match? */
> + if (dist != 0)
> + continue;
>
> + info = counter_arch_bp(wp);
> info->trigger = addr;
> perf_bp_event(wp, regs);
>
> /* Do we need to handle the stepping? */
> if (is_default_overflow_handler(wp))
> step = 1;
> -
> -unlock:
> - rcu_read_unlock();
> }
> + if (min_dist > 0 && min_dist != -1) {
> + /* No exact match found. */
> + wp = slots[closest_match];
> + info = counter_arch_bp(wp);
> + info->trigger = addr;
> + perf_bp_event(wp, regs);
> + }

Why don't we need to bother with the stepping in the case of a non-exact
match?

Will

2016-11-08 12:07:12

by Pavel Labath

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] arm64: hw_breakpoint: Handle inexact watchpoint addresses

>>
>> /* Do we need to handle the stepping? */
>> if (is_default_overflow_handler(wp))
>> step = 1;
>> -
>> -unlock:
>> - rcu_read_unlock();
>> }
>> + if (min_dist > 0 && min_dist != -1) {
>> + /* No exact match found. */
>> + wp = slots[closest_match];
>> + info = counter_arch_bp(wp);
>> + info->trigger = addr;
>> + perf_bp_event(wp, regs);
>> + }
>
> Why don't we need to bother with the stepping in the case of a non-exact
> match?

Good catch. I think we do. I must have dropped that part somehow.

Pratyush, could you include the attached fixup in the next batch?

regards,
pavel


Attachments:
fixup.diff (493.00 B)

2016-11-09 17:38:45

by Pratyush Anand

[permalink] [raw]
Subject: Re: [PATCH V2 2/5] arm64: Allow hw watchpoint at varied offset from base address

Hi Will,

Thanks a lot for your review.

On Tuesday 08 November 2016 08:56 AM, Will Deacon wrote:
> Hi Pratyush,
>
> On Thu, Oct 20, 2016 at 11:18:14AM +0530, Pratyush Anand wrote:
>> ARM64 hardware supports watchpoint at any double word aligned address.
>> However, it can select any consecutive bytes from offset 0 to 7 from that
>> base address. For example, if base address is programmed as 0x420030 and
>> byte select is 0x1C, then access of 0x420032,0x420033 and 0x420034 will
>> generate a watchpoint exception.
>>
>> Currently, we do not have such modularity. We can only program byte,
>> halfword, word and double word access exception from any base address.
>>
>> This patch adds support to overcome above limitations.
>>
>> Signed-off-by: Pratyush Anand <[email protected]>
>> ---
>> arch/arm64/include/asm/hw_breakpoint.h | 2 +-
>> arch/arm64/kernel/hw_breakpoint.c | 45 ++++++++++++++++------------------
>> arch/arm64/kernel/ptrace.c | 5 ++--
>> 3 files changed, 25 insertions(+), 27 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h
>> index 115ea2a64520..4f4e58bee9bc 100644
>> --- a/arch/arm64/include/asm/hw_breakpoint.h
>> +++ b/arch/arm64/include/asm/hw_breakpoint.h
>> @@ -118,7 +118,7 @@ struct perf_event;
>> struct pmu;
>>
>> extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
>> - int *gen_len, int *gen_type);
>> + int *gen_len, int *gen_type, int *offset);
>> extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
>> extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
>> extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
>> diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
>> index 26a6bf77d272..3c2b96803eba 100644
>> --- a/arch/arm64/kernel/hw_breakpoint.c
>> +++ b/arch/arm64/kernel/hw_breakpoint.c
>> @@ -349,7 +349,7 @@ int arch_check_bp_in_kernelspace(struct perf_event *bp)
>> * to generic breakpoint descriptions.
>> */
>> int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
>> - int *gen_len, int *gen_type)
>> + int *gen_len, int *gen_type, int *offset)
>> {
>> /* Type */
>> switch (ctrl.type) {
>> @@ -369,8 +369,10 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
>> return -EINVAL;
>> }
>>
>> + *offset = ffs(ctrl.len) - 1;
>
> Do we want to fail the length == 0 case early? If you do add that check,
> then you can use __ffs here instead.

Yes, I think, your point can be taken.

>
>> /* Len */
>> - switch (ctrl.len) {
>> + switch (ctrl.len >> *offset) {
>> case ARM_BREAKPOINT_LEN_1:
>> *gen_len = HW_BREAKPOINT_LEN_1;
>> break;
>> @@ -517,18 +519,17 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
>> default:
>> return -EINVAL;
>> }
>> -
>> - info->address &= ~alignment_mask;
>> - info->ctrl.len <<= offset;
>> } else {
>> if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE)
>> alignment_mask = 0x3;
>> else
>> alignment_mask = 0x7;
>> - if (info->address & alignment_mask)
>> - return -EINVAL;
>> + offset = info->address & alignment_mask;
>> }
>>
>> + info->address &= ~alignment_mask;
>> + info->ctrl.len <<= offset;
>
> Urgh, I really hate all this converting to and fro to keep perf happy.
> FWIW, I'm at the point where I would seriously consider ripping out the
> hw_breakpoint code and replacing it with a ptrace-specific backend so we
> just drop all this crap. The only people that seem to use the perf interface
> are those running the (failing) selftests. Given that we have to have
> a bloody thread switch notifier *anyway*, all perf seems to give us is
> complexity and restrictions.

Yes, I agree.

>
>> /*
>> * Disallow per-task kernel breakpoints since these would
>> * complicate the stepping code.
>> @@ -665,8 +666,8 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
>> struct pt_regs *regs)
>> {
>> int i, step = 0, *kernel_step, access;
>> - u32 ctrl_reg;
>> - u64 val, alignment_mask;
>> + u32 ctrl_reg, lens, lene;
>> + u64 val;
>> struct perf_event *wp, **slots;
>> struct debug_info *debug_info;
>> struct arch_hw_breakpoint *info;
>> @@ -684,25 +685,21 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
>> goto unlock;
>>
>> info = counter_arch_bp(wp);
>> - /* AArch32 watchpoints are either 4 or 8 bytes aligned. */
>> - if (is_compat_task()) {
>> - if (info->ctrl.len == ARM_BREAKPOINT_LEN_8)
>> - alignment_mask = 0x7;
>> - else
>> - alignment_mask = 0x3;
>> - } else {
>> - alignment_mask = 0x7;
>> - }
>>
>> - /* Check if the watchpoint value matches. */
>> + /* Check if the watchpoint value and byte select match. */
>> val = read_wb_reg(AARCH64_DBG_REG_WVR, i);
>> - if (val != (addr & ~alignment_mask))
>> - goto unlock;
>> -
>> - /* Possible match, check the byte address select to confirm. */
>> ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i);
>> decode_ctrl_reg(ctrl_reg, &ctrl);
>> - if (!((1 << (addr & alignment_mask)) & ctrl.len))
>> + lens = ffs(ctrl.len) - 1;
>> + lene = fls(ctrl.len) - 1;
>
> Again, you can use the '__' variants to avoid the '- 1'.

Ok.

>
>> + /*
>> + * FIXME: reported address can be anywhere between "the
>> + * lowest address accessed by the memory access that
>> + * triggered the watchpoint" and "the highest watchpointed
>> + * address accessed by the memory access". So, it may not
>> + * lie in the interval of watchpoint address range.
>> + */
>> + if (addr < val + lens || addr > val + lene)
>> goto unlock;
>>
>> /*
>> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
>> index e0c81da60f76..0eb366a94382 100644
>> --- a/arch/arm64/kernel/ptrace.c
>> +++ b/arch/arm64/kernel/ptrace.c
>> @@ -327,13 +327,13 @@ static int ptrace_hbp_fill_attr_ctrl(unsigned int note_type,
>> struct arch_hw_breakpoint_ctrl ctrl,
>> struct perf_event_attr *attr)
>> {
>> - int err, len, type, disabled = !ctrl.enabled;
>> + int err, len, type, offset, disabled = !ctrl.enabled;
>>
>> attr->disabled = disabled;
>> if (disabled)
>> return 0;
>>
>> - err = arch_bp_generic_fields(ctrl, &len, &type);
>> + err = arch_bp_generic_fields(ctrl, &len, &type, &offset);
>> if (err)
>> return err;
>>
>> @@ -352,6 +352,7 @@ static int ptrace_hbp_fill_attr_ctrl(unsigned int note_type,
>>
>> attr->bp_len = len;
>> attr->bp_type = type;
>> + attr->bp_addr += offset;
>
> That's going to look pretty bizarre from userspace if it decides to read
> back the address registers to find that they've mysteriously changed.
>
> Perhaps ptrace_hbp_get_addr needs to retrieve the address from the
> arch_hw_breakpoint, like we do for the ctrl register. What do you think?

..and that should help...I will give it a try and will test before next
revision.

~Pratyush

2016-11-09 17:39:19

by Pratyush Anand

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] arm64: hw_breakpoint: Handle inexact watchpoint addresses



On Tuesday 08 November 2016 05:28 PM, Pavel Labath wrote:
>>> + if (min_dist > 0 && min_dist != -1) {
>>> >> + /* No exact match found. */
>>> >> + wp = slots[closest_match];
>>> >> + info = counter_arch_bp(wp);
>>> >> + info->trigger = addr;
>>> >> + perf_bp_event(wp, regs);
>>> >> + }
>> >
>> > Why don't we need to bother with the stepping in the case of a non-exact
>> > match?
> Good catch. I think we do. I must have dropped that part somehow.
>
> Pratyush, could you include the attached fixup in the next batch?

Ok, will do.

~Pratyush