The following patchset adds address masks to existing perf hardware
breakpoint mechanism to allow trapping on an address range (currently
only single address) on supported architectures.
perf uapi is updated, x86 AMD implementation (for AMD Family 16h and
beyond) is provided, and perf tool has been extended to do:
$ perf stat -e mem:0x1000/0xf:w a.out
^^^
"don't care" bit mask
which will count writes to [0x1000 ~ 0x1010)
V2:
Per Oleg's suggestions:
* Moved testing of bp_addr_mask to validate_hw_breakpoint()
* Changed perf tool syntax to mem:<addr>[/addr_mask][:access]
Jacob Shin (2):
perf: Add hardware breakpoint address mask
perf/x86/amd: AMD implementation for hardware breakpoint address mask
Suravee Suthikulpanit (2):
perf tools: Add hardware breakpoint address mask event parser
perf tools: Add hardware breakpoint address mask test cases
arch/x86/include/asm/cpufeature.h | 2 ++
arch/x86/include/asm/debugreg.h | 5 ++++
arch/x86/include/asm/hw_breakpoint.h | 1 +
arch/x86/include/uapi/asm/msr-index.h | 6 ++++
arch/x86/kernel/cpu/amd.c | 19 +++++++++++++
arch/x86/kernel/hw_breakpoint.c | 16 +++++++++++
include/uapi/linux/perf_event.h | 5 +++-
kernel/events/hw_breakpoint.c | 9 ++++++
tools/perf/Documentation/perf-record.txt | 14 +++++++---
tools/perf/tests/parse-events.c | 45 ++++++++++++++++++++++++++++++
tools/perf/util/parse-events.c | 5 ++--
tools/perf/util/parse-events.h | 2 +-
tools/perf/util/parse-events.l | 1 +
tools/perf/util/parse-events.y | 24 ++++++++++++++--
14 files changed, 144 insertions(+), 10 deletions(-)
--
1.7.9.5
From: Suravee Suthikulpanit <[email protected]>
Allow perf tool to pass in breakpoint address mask to match an address
range, i.e.:
$ perf stat -e mem:0x1000/0xf:w a.out
Will count writes to [0x1000 ~ 0x1010)
Signed-off-by: Suravee Suthikulpanit <[email protected]>
Signed-off-by: Jacob Shin <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 14 ++++++++++----
tools/perf/util/parse-events.c | 5 +++--
tools/perf/util/parse-events.h | 2 +-
tools/perf/util/parse-events.l | 1 +
tools/perf/util/parse-events.y | 24 ++++++++++++++++++++++--
5 files changed, 37 insertions(+), 9 deletions(-)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index d4da111..c516320 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -33,13 +33,19 @@ OPTIONS
- a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
hexadecimal event descriptor.
- - a hardware breakpoint event in the form of '\mem:addr[:access]'
- where addr is the address in memory you want to break in.
- Access is the memory access type (read, write, execute) it can
- be passed as follows: '\mem:addr[:[r][w][x]]'.
+ - a hardware breakpoint event in the form of
+ '\mem:addr[/addr_mask][:access]' where addr is the address in
+ memory you want to break in. Access is the memory access type (read,
+ write, execute) it can be passed as follows: '\mem:addr[:[r][w][x]]'.
+ addr_mask is the "don't care" bit mask to further qualify the given
+ addr, to break in on accesses to an address range.
+
If you want to profile read-write accesses in 0x1000, just set
'mem:0x1000:rw'.
+ If you want to profile write accesses in [0x1000 ~ 0x1010), just set
+ 'mem:0x1000/0xf:w'. (Only supported on some hardware)
+
--filter=<filter>::
Event filter.
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 6c8bb0f..0ca518f 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -508,12 +508,13 @@ do { \
}
int parse_events_add_breakpoint(struct list_head **list, int *idx,
- void *ptr, char *type)
+ void *ptr, char *type, void *msk)
{
struct perf_event_attr attr;
memset(&attr, 0, sizeof(attr));
attr.bp_addr = (unsigned long) ptr;
+ attr.bp_addr_mask = (unsigned long) msk;
if (parse_breakpoint_type(type, &attr))
return -EINVAL;
@@ -1147,7 +1148,7 @@ void print_events(const char *event_glob, bool name_only)
printf("\n");
printf(" %-50s [%s]\n",
- "mem:<addr>[:access]",
+ "mem:<addr>[/addr_mask][:access]",
event_type_descriptors[PERF_TYPE_BREAKPOINT]);
printf("\n");
}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 8a48593..83920cc 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -92,7 +92,7 @@ int parse_events_add_numeric(struct list_head **list, int *idx,
int parse_events_add_cache(struct list_head **list, int *idx,
char *type, char *op_result1, char *op_result2);
int parse_events_add_breakpoint(struct list_head **list, int *idx,
- void *ptr, char *type);
+ void *ptr, char *type, void *msk);
int parse_events_add_pmu(struct list_head **list, int *idx,
char *pmu , struct list_head *head_config);
void parse_events__set_leader(char *name, struct list_head *list);
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index e9d1134..15b3efc 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -191,6 +191,7 @@ r{num_raw_hex} { return raw(yyscanner); }
<mem>{
{modifier_bp} { return str(yyscanner, PE_MODIFIER_BP); }
: { return ':'; }
+"/" { return '/'; }
{num_dec} { return value(yyscanner, 10); }
{num_hex} { return value(yyscanner, 16); }
/*
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index afc44c1..8b17979 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -263,13 +263,33 @@ PE_NAME_CACHE_TYPE
}
event_legacy_mem:
+PE_PREFIX_MEM PE_VALUE '/' PE_VALUE ':' PE_MODIFIER_BP sep_dc
+{
+ struct parse_events_evlist *data = _data;
+ struct list_head *list = NULL;
+
+ ABORT_ON(parse_events_add_breakpoint(&list, &data->idx,
+ (void *) $2, $6, (void *) $4));
+ $$ = list;
+}
+|
+PE_PREFIX_MEM PE_VALUE '/' PE_VALUE sep_dc
+{
+ struct parse_events_evlist *data = _data;
+ struct list_head *list = NULL;
+
+ ABORT_ON(parse_events_add_breakpoint(&list, &data->idx,
+ (void *) $2, NULL, (void *) $4));
+ $$ = list;
+}
+|
PE_PREFIX_MEM PE_VALUE ':' PE_MODIFIER_BP sep_dc
{
struct parse_events_evlist *data = _data;
struct list_head *list = NULL;
ABORT_ON(parse_events_add_breakpoint(&list, &data->idx,
- (void *) $2, $4));
+ (void *) $2, $4, NULL));
$$ = list;
}
|
@@ -279,7 +299,7 @@ PE_PREFIX_MEM PE_VALUE sep_dc
struct list_head *list = NULL;
ABORT_ON(parse_events_add_breakpoint(&list, &data->idx,
- (void *) $2, NULL));
+ (void *) $2, NULL, NULL));
$$ = list;
}
--
1.7.9.5
From: Suravee Suthikulpanit <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
Signed-off-by: Jacob Shin <[email protected]>
---
tools/perf/tests/parse-events.c | 45 +++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 88e2f44..f9c7666 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -372,6 +372,38 @@ static int test__checkevent_breakpoint_rw_modifier(struct perf_evlist *evlist)
return test__checkevent_breakpoint_rw(evlist);
}
+static int test__checkevent_breakpoint_addrmask(struct perf_evlist *evlist)
+{
+ struct perf_evsel *evsel = perf_evlist__first(evlist);
+
+ TEST_ASSERT_VAL("wrong bp_addr_mask", 1 == evsel->attr.bp_addr_mask);
+
+ return test__checkevent_breakpoint(evlist);
+}
+
+static int test__checkevent_breakpoint_addrmask_x(struct perf_evlist *evlist)
+{
+ struct perf_evsel *evsel = perf_evlist__first(evlist);
+
+ TEST_ASSERT_VAL("wrong bp_addr_mask", 1 == evsel->attr.bp_addr_mask);
+
+ return test__checkevent_breakpoint_x(evlist);
+}
+
+static int
+test__checkevent_breakpoint_addrmask_rw_modifier(struct perf_evlist *evlist)
+{
+ struct perf_evsel *evsel = perf_evlist__first(evlist);
+
+ TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
+ TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
+ TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+ TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+ TEST_ASSERT_VAL("wrong bp_addr_mask", 1 == evsel->attr.bp_addr_mask);
+
+ return test__checkevent_breakpoint_rw(evlist);
+}
+
static int test__checkevent_pmu(struct perf_evlist *evlist)
{
@@ -1187,6 +1219,19 @@ static struct evlist_test test__events[] = {
.name = "{cycles:G,cache-misses:H}:uG",
.check = test__group_gh4,
},
+ [38] = {
+ .name = "mem:0/1",
+ .check = test__checkevent_breakpoint_addrmask,
+ },
+
+ [39] = {
+ .name = "mem:0/1:x",
+ .check = test__checkevent_breakpoint_addrmask_x,
+ },
+ [40] = {
+ .name = "mem:0/1:rw:uk",
+ .check = test__checkevent_breakpoint_addrmask_rw_modifier,
+ },
};
static struct evlist_test test__events_pmu[] = {
--
1.7.9.5
Some architectures (for us, AMD Family 16h) allow for "don't care" bit
mask to further qualify a hardware breakpoint address, in order to
trap on range of addresses. Update perf uapi to add bp_addr_mask field.
Signed-off-by: Jacob Shin <[email protected]>
---
include/uapi/linux/perf_event.h | 5 ++++-
kernel/events/hw_breakpoint.c | 9 +++++++++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index fb104e5..e22e1d1 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -286,7 +286,10 @@ struct perf_event_attr {
__u64 config1; /* extension of config */
};
union {
- __u64 bp_len;
+ struct {
+ __u32 bp_len;
+ __u32 bp_addr_mask;
+ };
__u64 config2; /* extension of config1 */
};
__u64 branch_sample_type; /* enum perf_branch_sample_type */
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index a64f8ae..c454880 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -385,6 +385,11 @@ int dbg_release_bp_slot(struct perf_event *bp)
return 0;
}
+__weak int arch_validate_hwbkpt_addr_mask(struct perf_event *bp)
+{
+ return bp->attr.bp_addr_mask == 0;
+}
+
static int validate_hw_breakpoint(struct perf_event *bp)
{
int ret;
@@ -393,6 +398,10 @@ static int validate_hw_breakpoint(struct perf_event *bp)
if (ret)
return ret;
+ ret = arch_validate_hwbkpt_addr_mask(bp);
+ if (ret)
+ return ret;
+
if (arch_check_bp_in_kernelspace(bp)) {
if (bp->attr.exclude_kernel)
return -EINVAL;
--
1.7.9.5
Implement hardware breakpoint address mask for AMD Family 16h (and
any other future) processors. CPUID feature bit indicates the hardware
support for DRn_ADDR_MASK MSRs.
Signed-off-by: Jacob Shin <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 2 ++
arch/x86/include/asm/debugreg.h | 5 +++++
arch/x86/include/asm/hw_breakpoint.h | 1 +
arch/x86/include/uapi/asm/msr-index.h | 6 ++++++
arch/x86/kernel/cpu/amd.c | 19 +++++++++++++++++++
arch/x86/kernel/hw_breakpoint.c | 16 ++++++++++++++++
6 files changed, 49 insertions(+)
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index ac10df7..bee6994 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -168,6 +168,7 @@
#define X86_FEATURE_TOPOEXT (6*32+22) /* topology extensions CPUID leafs */
#define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
#define X86_FEATURE_PERFCTR_NB (6*32+24) /* NB performance counter extensions */
+#define X86_FEATURE_BPEXT (6*32+26) /* data breakpoint extension */
#define X86_FEATURE_PERFCTR_L2 (6*32+28) /* L2 performance counter extensions */
/*
@@ -317,6 +318,7 @@ extern const char * const x86_power_flags[32];
#define cpu_has_cx16 boot_cpu_has(X86_FEATURE_CX16)
#define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU)
#define cpu_has_topoext boot_cpu_has(X86_FEATURE_TOPOEXT)
+#define cpu_has_bpext boot_cpu_has(X86_FEATURE_BPEXT)
#ifdef CONFIG_X86_64
diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h
index 4b528a9..9b38750 100644
--- a/arch/x86/include/asm/debugreg.h
+++ b/arch/x86/include/asm/debugreg.h
@@ -114,5 +114,10 @@ static inline void debug_stack_usage_inc(void) { }
static inline void debug_stack_usage_dec(void) { }
#endif /* X86_64 */
+#ifdef CONFIG_CPU_SUP_AMD
+extern void set_dr_addr_mask(u32 mask, int dr);
+#else
+static inline void set_dr_addr_mask(u32 mask, int dr) { }
+#endif
#endif /* _ASM_X86_DEBUGREG_H */
diff --git a/arch/x86/include/asm/hw_breakpoint.h b/arch/x86/include/asm/hw_breakpoint.h
index ef1c4d2..c0b89d8 100644
--- a/arch/x86/include/asm/hw_breakpoint.h
+++ b/arch/x86/include/asm/hw_breakpoint.h
@@ -14,6 +14,7 @@ struct arch_hw_breakpoint {
unsigned long address;
u8 len;
u8 type;
+ u32 mask;
};
#include <linux/kdebug.h>
diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index b575788..ea7c98f 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -200,6 +200,12 @@
#define MSR_F16H_L2I_PERF_CTL 0xc0010230
#define MSR_F16H_L2I_PERF_CTR 0xc0010231
+/* Fam 16h MSRs */
+#define MSR_F16H_DR0_ADDR_MASK 0xc0011027
+#define MSR_F16H_DR1_ADDR_MASK 0xc0011019
+#define MSR_F16H_DR2_ADDR_MASK 0xc001101a
+#define MSR_F16H_DR3_ADDR_MASK 0xc001101b
+
/* Fam 15h MSRs */
#define MSR_F15H_PERF_CTL 0xc0010200
#define MSR_F15H_PERF_CTR 0xc0010201
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index fa96eb0..aadc499 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -910,3 +910,22 @@ bool cpu_has_amd_erratum(const int *erratum)
}
EXPORT_SYMBOL_GPL(cpu_has_amd_erratum);
+
+void set_dr_addr_mask(u32 mask, int dr)
+{
+ if (!cpu_has_bpext)
+ return;
+
+ switch (dr) {
+ case 0:
+ wrmsr(MSR_F16H_DR0_ADDR_MASK, mask, 0);
+ break;
+ case 1:
+ case 2:
+ case 3:
+ wrmsr(MSR_F16H_DR1_ADDR_MASK - 1 + dr, mask, 0);
+ break;
+ default:
+ break;
+ }
+}
diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index 02f0763..58a1b80 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -121,6 +121,8 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
if (WARN_ONCE(i == HBP_NUM, "Can't find any breakpoint slot"))
return -EBUSY;
+ set_dr_addr_mask(info->mask, i);
+
set_debugreg(info->address, i);
__this_cpu_write(cpu_debugreg[i], info->address);
@@ -163,6 +165,8 @@ void arch_uninstall_hw_breakpoint(struct perf_event *bp)
*dr7 &= ~__encode_dr7(i, info->len, info->type);
set_debugreg(*dr7, 7);
+
+ set_dr_addr_mask(0, i);
}
static int get_hbp_len(u8 hbp_len)
@@ -254,6 +258,7 @@ static int arch_build_bp_info(struct perf_event *bp)
struct arch_hw_breakpoint *info = counter_arch_bp(bp);
info->address = bp->attr.bp_addr;
+ info->mask = bp->attr.bp_addr_mask;
/* Type */
switch (bp->attr.bp_type) {
@@ -345,6 +350,17 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
return 0;
}
+int arch_validate_hwbkpt_addr_mask(struct perf_event *bp)
+{
+ if (!cpu_has_bpext)
+ return -EOPNOTSUPP;
+
+ if (bp->attr.bp_addr & bp->attr.bp_addr_mask)
+ return -EINVAL;
+
+ return 0;
+}
+
/*
* Dump the debug register contents to the user.
* We can't dump our per cpu values because it
--
1.7.9.5
Hi Jacob,
On Tue, Apr 23, 2013 at 08:57:02AM +0100, Jacob Shin wrote:
> Some architectures (for us, AMD Family 16h) allow for "don't care" bit
> mask to further qualify a hardware breakpoint address, in order to
> trap on range of addresses. Update perf uapi to add bp_addr_mask field.
arm and arm64 have a similar feature to this, whereby we currently have to
translate the bp_len field into a mask, which is all the hardware
understands. Unlike what you describe, our mask indicates the bytes we *are*
interested in, but I think we could make use of the same functionality that
you're introducing here.
There are some funky restrictions on the alignment of the base address, but
we can detect those and tell userspace where to go if it tries any funny
stuff.
Can you see a problem if I simply invert the mask?
Will
On 04/23, Jacob Shin wrote:
>
> +__weak int arch_validate_hwbkpt_addr_mask(struct perf_event *bp)
> +{
> + return bp->attr.bp_addr_mask == 0;
> +}
> +
> static int validate_hw_breakpoint(struct perf_event *bp)
> {
> int ret;
> @@ -393,6 +398,10 @@ static int validate_hw_breakpoint(struct perf_event *bp)
> if (ret)
> return ret;
>
> + ret = arch_validate_hwbkpt_addr_mask(bp);
> + if (ret)
> + return ret;
Well, this looks obviously wrong?
arch_validate_hwbkpt_addr_mask() fails if bp_addr_mask == 0? and returns
"1" as the error code.
Either it should returns something like "bp_addr_mask ? -ENOTSUPP : 0"
or the caller should do "if (!validate_hw_breakpoint()) return -ERR".
Oleg.
On 04/23, Jacob Shin wrote:
>
> +int arch_validate_hwbkpt_addr_mask(struct perf_event *bp)
> +{
> + if (!cpu_has_bpext)
> + return -EOPNOTSUPP;
I must have missed something...
But it shouldn't fail if bp_addr_mask == 0?
Oleg.
On Tue, Apr 23, 2013 at 03:18:44PM +0200, Oleg Nesterov wrote:
> On 04/23, Jacob Shin wrote:
> >
> > +__weak int arch_validate_hwbkpt_addr_mask(struct perf_event *bp)
> > +{
> > + return bp->attr.bp_addr_mask == 0;
> > +}
> > +
> > static int validate_hw_breakpoint(struct perf_event *bp)
> > {
> > int ret;
> > @@ -393,6 +398,10 @@ static int validate_hw_breakpoint(struct perf_event *bp)
> > if (ret)
> > return ret;
> >
> > + ret = arch_validate_hwbkpt_addr_mask(bp);
> > + if (ret)
> > + return ret;
>
> Well, this looks obviously wrong?
>
> arch_validate_hwbkpt_addr_mask() fails if bp_addr_mask == 0? and returns
> "1" as the error code.
>
> Either it should returns something like "bp_addr_mask ? -ENOTSUPP : 0"
> or the caller should do "if (!validate_hw_breakpoint()) return -ERR".
You are absolutely right. I have mixed up "does this arch support address
masks?" vs "is this a valid address mask?" So sorry about that.
Here is the corrected patch:
>From f2054e8af979f024aa2da93f80646db18492479e Mon Sep 17 00:00:00 2001
From: Jacob Shin <[email protected]>
Date: Mon, 22 Apr 2013 17:02:37 -0500
Subject: [PATCH 1/4] perf: Add hardware breakpoint address mask
Some architectures (for us, AMD Family 16h) allow for "don't care" bit
mask to further qualify a hardware breakpoint address, in order to
trap on range of addresses. Update perf uapi to add bp_addr_mask field.
Signed-off-by: Jacob Shin <[email protected]>
---
include/uapi/linux/perf_event.h | 5 ++++-
kernel/events/hw_breakpoint.c | 9 +++++++++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index fb104e5..e22e1d1 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -286,7 +286,10 @@ struct perf_event_attr {
__u64 config1; /* extension of config */
};
union {
- __u64 bp_len;
+ struct {
+ __u32 bp_len;
+ __u32 bp_addr_mask;
+ };
__u64 config2; /* extension of config1 */
};
__u64 branch_sample_type; /* enum perf_branch_sample_type */
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index a64f8ae..dde8273 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -385,6 +385,11 @@ int dbg_release_bp_slot(struct perf_event *bp)
return 0;
}
+__weak int arch_validate_hwbkpt_addr_mask(struct perf_event *bp)
+{
+ return bp->attr.bp_addr_mask ? -EOPNOTSUPP : 0;
+}
+
static int validate_hw_breakpoint(struct perf_event *bp)
{
int ret;
@@ -393,6 +398,10 @@ static int validate_hw_breakpoint(struct perf_event *bp)
if (ret)
return ret;
+ ret = arch_validate_hwbkpt_addr_mask(bp);
+ if (ret)
+ return ret;
+
if (arch_check_bp_in_kernelspace(bp)) {
if (bp->attr.exclude_kernel)
return -EINVAL;
--
1.7.9.5
On Tue, Apr 23, 2013 at 10:54:37AM +0100, Will Deacon wrote:
> Hi Jacob,
>
> On Tue, Apr 23, 2013 at 08:57:02AM +0100, Jacob Shin wrote:
> > Some architectures (for us, AMD Family 16h) allow for "don't care" bit
> > mask to further qualify a hardware breakpoint address, in order to
> > trap on range of addresses. Update perf uapi to add bp_addr_mask field.
>
> arm and arm64 have a similar feature to this, whereby we currently have to
> translate the bp_len field into a mask, which is all the hardware
> understands. Unlike what you describe, our mask indicates the bytes we *are*
> interested in, but I think we could make use of the same functionality that
> you're introducing here.
>
> There are some funky restrictions on the alignment of the base address, but
> we can detect those and tell userspace where to go if it tries any funny
> stuff.
>
> Can you see a problem if I simply invert the mask?
Hi,
That's great! No, I don't see a problem at all.
I guess now it can be debated if the mask coming in from userland should
be include or exclude mask. But I think exclude makes syntax easier:
To count writes to [0x1000 ~ 0x1010)
Include mask (my current patchset):
perf stat -e mem:0x1000/0xf:w a.out
Exclude mask:
perf stat -e mem:0x1000/0xfff0:w a.out
Thanks!
On Tue, Apr 23, 2013 at 09:34:23AM -0500, Jacob Shin wrote:
> On Tue, Apr 23, 2013 at 10:54:37AM +0100, Will Deacon wrote:
> > Hi Jacob,
> >
> > On Tue, Apr 23, 2013 at 08:57:02AM +0100, Jacob Shin wrote:
> > > Some architectures (for us, AMD Family 16h) allow for "don't care" bit
> > > mask to further qualify a hardware breakpoint address, in order to
> > > trap on range of addresses. Update perf uapi to add bp_addr_mask field.
> >
> > arm and arm64 have a similar feature to this, whereby we currently have to
> > translate the bp_len field into a mask, which is all the hardware
> > understands. Unlike what you describe, our mask indicates the bytes we *are*
> > interested in, but I think we could make use of the same functionality that
> > you're introducing here.
> >
> > There are some funky restrictions on the alignment of the base address, but
> > we can detect those and tell userspace where to go if it tries any funny
> > stuff.
> >
> > Can you see a problem if I simply invert the mask?
>
> Hi,
>
> That's great! No, I don't see a problem at all.
>
> I guess now it can be debated if the mask coming in from userland should
> be include or exclude mask. But I think exclude makes syntax easier:
>
> To count writes to [0x1000 ~ 0x1010)
>
> Include mask (my current patchset):
^^^^^^^
Exclude (I mean ..)
>
> perf stat -e mem:0x1000/0xf:w a.out
>
> Exclude mask:
^^^^^^^
Include
>
> perf stat -e mem:0x1000/0xfff0:w a.out
>
> Thanks!
On Tue, Apr 23, 2013 at 03:40:57PM +0100, Jacob Shin wrote:
> On Tue, Apr 23, 2013 at 09:34:23AM -0500, Jacob Shin wrote:
> > On Tue, Apr 23, 2013 at 10:54:37AM +0100, Will Deacon wrote:
> > > Can you see a problem if I simply invert the mask?
> >
> > That's great! No, I don't see a problem at all.
Great! The GDB folks have been asking for this, so I can finally make them
go away now :)
> > I guess now it can be debated if the mask coming in from userland should
> > be include or exclude mask. But I think exclude makes syntax easier:
> >
> > To count writes to [0x1000 ~ 0x1010)
> >
> > Include mask (my current patchset):
> ^^^^^^^
> Exclude (I mean ..)
> >
> > perf stat -e mem:0x1000/0xf:w a.out
Are you saying that this command would count any write to:
0x1000
0x1001
...
0x100e
0x100f
?
If so, that differs from the ARM debug architecture in that the mask is called
`byte-address-select', so a mask of 0b1001 would count accesses at +0 bytes
and +3 bytes from the base address. Is that possible to describe with your
masking scheme and a single watchpoint?
A mask of 0xf, would count +0, +1, +2 and +3 (essentially bp_len == 4).
Unfortunately, that means I can't just invert the mask like I originally
thought.
Will
On Tue, Apr 23, 2013 at 04:02:40PM +0100, Will Deacon wrote:
> On Tue, Apr 23, 2013 at 03:40:57PM +0100, Jacob Shin wrote:
> > On Tue, Apr 23, 2013 at 09:34:23AM -0500, Jacob Shin wrote:
> > > On Tue, Apr 23, 2013 at 10:54:37AM +0100, Will Deacon wrote:
> > > > Can you see a problem if I simply invert the mask?
> > >
> > > That's great! No, I don't see a problem at all.
>
> Great! The GDB folks have been asking for this, so I can finally make them
> go away now :)
>
> > > I guess now it can be debated if the mask coming in from userland should
> > > be include or exclude mask. But I think exclude makes syntax easier:
> > >
> > > To count writes to [0x1000 ~ 0x1010)
> > >
> > > Include mask (my current patchset):
> > ^^^^^^^
> > Exclude (I mean ..)
> > >
> > > perf stat -e mem:0x1000/0xf:w a.out
>
> Are you saying that this command would count any write to:
>
> 0x1000
> 0x1001
> ...
> 0x100e
> 0x100f
>
> ?
>
> If so, that differs from the ARM debug architecture in that the mask is called
> `byte-address-select', so a mask of 0b1001 would count accesses at +0 bytes
> and +3 bytes from the base address. Is that possible to describe with your
> masking scheme and a single watchpoint?
>
> A mask of 0xf, would count +0, +1, +2 and +3 (essentially bp_len == 4).
>
> Unfortunately, that means I can't just invert the mask like I originally
> thought.
Ah, .. that is different .
Our hardware matches on the breakpoint if:
(physical_address & ~bp_addr_mask) == (bp_addr & ~bp_addr_mask)
In other words, the mask says which of the bp_addr bits hardware should
ignore when matching.
.. it would be great if we can come up with userland interface that works
for both archs. I'm coming up empty at the moment ..
On Tue, Apr 23, 2013 at 04:18:46PM +0100, Jacob Shin wrote:
> On Tue, Apr 23, 2013 at 04:02:40PM +0100, Will Deacon wrote:
> > On Tue, Apr 23, 2013 at 03:40:57PM +0100, Jacob Shin wrote:
> > > > perf stat -e mem:0x1000/0xf:w a.out
> >
> > Are you saying that this command would count any write to:
> >
> > 0x1000
> > 0x1001
> > ...
> > 0x100e
> > 0x100f
> >
> > ?
> >
> > If so, that differs from the ARM debug architecture in that the mask is called
> > `byte-address-select', so a mask of 0b1001 would count accesses at +0 bytes
> > and +3 bytes from the base address. Is that possible to describe with your
> > masking scheme and a single watchpoint?
> >
> > A mask of 0xf, would count +0, +1, +2 and +3 (essentially bp_len == 4).
> >
> > Unfortunately, that means I can't just invert the mask like I originally
> > thought.
>
> Ah, .. that is different .
>
> Our hardware matches on the breakpoint if:
>
> (physical_address & ~bp_addr_mask) == (bp_addr & ~bp_addr_mask)
>
> In other words, the mask says which of the bp_addr bits hardware should
> ignore when matching.
>
> .. it would be great if we can come up with userland interface that works
> for both archs. I'm coming up empty at the moment ..
After a bit of thought, I *think* the ARM mechanism is more expressive, and
you could describe your masking in terms of byte-address-select. The
downside is that we'd end up with much larger masks, which could be argued
as counter-intuitive from a user's point of view.
Will
On Wed, Apr 24, 2013 at 10:48:53AM +0100, Will Deacon wrote:
> On Tue, Apr 23, 2013 at 04:18:46PM +0100, Jacob Shin wrote:
> > On Tue, Apr 23, 2013 at 04:02:40PM +0100, Will Deacon wrote:
> > > On Tue, Apr 23, 2013 at 03:40:57PM +0100, Jacob Shin wrote:
> > > > > perf stat -e mem:0x1000/0xf:w a.out
> > >
> > > Are you saying that this command would count any write to:
> > >
> > > 0x1000
> > > 0x1001
> > > ...
> > > 0x100e
> > > 0x100f
> > >
> > > ?
> > >
> > > If so, that differs from the ARM debug architecture in that the mask is called
> > > `byte-address-select', so a mask of 0b1001 would count accesses at +0 bytes
> > > and +3 bytes from the base address. Is that possible to describe with your
> > > masking scheme and a single watchpoint?
> > >
> > > A mask of 0xf, would count +0, +1, +2 and +3 (essentially bp_len == 4).
> > >
> > > Unfortunately, that means I can't just invert the mask like I originally
> > > thought.
> >
> > Ah, .. that is different .
> >
> > Our hardware matches on the breakpoint if:
> >
> > (physical_address & ~bp_addr_mask) == (bp_addr & ~bp_addr_mask)
> >
> > In other words, the mask says which of the bp_addr bits hardware should
> > ignore when matching.
> >
> > .. it would be great if we can come up with userland interface that works
> > for both archs. I'm coming up empty at the moment ..
>
> After a bit of thought, I *think* the ARM mechanism is more expressive, and
> you could describe your masking in terms of byte-address-select. The
> downside is that we'd end up with much larger masks, which could be argued
> as counter-intuitive from a user's point of view.
Hm .. Oleg, any thoughts on this?
bp_addr is encoded into attr.config1 and bp_len in attr.config2. I'm not
sure if attr.config is being used or not, but if not, we could say that
architecture specific "stuff" goes into attr.config.
And in perf tools:
perf stat -e mem:0x1000/0xf:w a.out
Will just translate into setting attr.config = 0xf. And only x86
hw_breakpoint will know what .config = 0xf means for x86 and do the right
thing. For ARM, 0xf will mean something different.
The downside is that in userland perf tool we need differing documentation
on what the mask syntax means for each architecture.
2013/4/23 Jacob Shin <[email protected]>:
> From: Jacob Shin <[email protected]>
> Date: Mon, 22 Apr 2013 17:02:37 -0500
> Subject: [PATCH 1/4] perf: Add hardware breakpoint address mask
>
> Some architectures (for us, AMD Family 16h) allow for "don't care" bit
> mask to further qualify a hardware breakpoint address, in order to
> trap on range of addresses. Update perf uapi to add bp_addr_mask field.
It would be nice to describe a bit what this "don't care" bit mask is
about. Is it a range of address/bitmask to ignore in the middle of the
range we want to breakpoint in? I mean, that's confusing.
>
> Signed-off-by: Jacob Shin <[email protected]>
> ---
> include/uapi/linux/perf_event.h | 5 ++++-
> kernel/events/hw_breakpoint.c | 9 +++++++++
> 2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index fb104e5..e22e1d1 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -286,7 +286,10 @@ struct perf_event_attr {
> __u64 config1; /* extension of config */
> };
> union {
> - __u64 bp_len;
> + struct {
> + __u32 bp_len;
> + __u32 bp_addr_mask;
> + };
Do we need len and mask to work at the same time? I can't think of a
situation when len and mask mix up together in a useful way to define
a range.
Thanks.
On 04/25, Frederic Weisbecker wrote:
>
> 2013/4/23 Jacob Shin <[email protected]>:
> > @@ -286,7 +286,10 @@ struct perf_event_attr {
> > __u64 config1; /* extension of config */
> > };
> > union {
> > - __u64 bp_len;
> > + struct {
> > + __u32 bp_len;
> > + __u32 bp_addr_mask;
> > + };
>
> Do we need len and mask to work at the same time? I can't think of a
> situation when len and mask mix up together in a useful way to define
> a range.
And it would be nice (I think) if we could simply turn bp_len into
bp_mask. It is already the mask actually, bp_addr should be aligned.
But I do not see how we can do this, so I guess we need another field.
Well. Another option is to extend bp_len. Fortunately HW_BREAKPOINT_LEN_*
match the length, so we can simply allow any 2^n length and amd.c can
translate it into the mask.
Of course, this doesn't allow to use, say, mask=0xF0. But perhaps this
is not really useful?
I dunno. I leave this to you and Jacob ;)
Oleg.
On 04/25, Oleg Nesterov wrote:
>
> On 04/25, Frederic Weisbecker wrote:
> >
> > 2013/4/23 Jacob Shin <[email protected]>:
> > > @@ -286,7 +286,10 @@ struct perf_event_attr {
> > > __u64 config1; /* extension of config */
> > > };
> > > union {
> > > - __u64 bp_len;
> > > + struct {
> > > + __u32 bp_len;
> > > + __u32 bp_addr_mask;
> > > + };
> >
> > Do we need len and mask to work at the same time? I can't think of a
> > situation when len and mask mix up together in a useful way to define
> > a range.
>
> And it would be nice (I think) if we could simply turn bp_len into
> bp_mask. It is already the mask actually, bp_addr should be aligned.
>
> But I do not see how we can do this, so I guess we need another field.
>
> Well. Another option is to extend bp_len. Fortunately HW_BREAKPOINT_LEN_*
> match the length, so we can simply allow any 2^n length and amd.c can
> translate it into the mask.
>
> Of course, this doesn't allow to use, say, mask=0xF0. But perhaps this
> is not really useful?
>
> I dunno. I leave this to you and Jacob ;)
IOW, I meant something like the incomplete patch below. But let me
repeat I am not sure this is the good idea.
Hmm. And note the change arch_check_bp_in_kernelspace(). Whatever we
do it should be updated if we have a mask...
And this reminds me that arch_check_bp_in_kernelspace() is buggy but
my trivial fix was ignored ;) see http://marc.info/?t=135248593000001
Oleg.
--- x/arch/x86/include/asm/hw_breakpoint.h
+++ x/arch/x86/include/asm/hw_breakpoint.h
@@ -12,6 +12,7 @@
*/
struct arch_hw_breakpoint {
unsigned long address;
+ unsigned long mask;
u8 len;
u8 type;
};
--- x/arch/x86/kernel/hw_breakpoint.c
+++ x/arch/x86/kernel/hw_breakpoint.c
@@ -128,6 +128,8 @@ int arch_install_hw_breakpoint(struct pe
*dr7 |= encode_dr7(i, info->len, info->type);
set_debugreg(*dr7, 7);
+ if (info->mask)
+ set_dr_addr_mask(...);
return 0;
}
@@ -165,29 +167,6 @@ void arch_uninstall_hw_breakpoint(struct
set_debugreg(*dr7, 7);
}
-static int get_hbp_len(u8 hbp_len)
-{
- unsigned int len_in_bytes = 0;
-
- switch (hbp_len) {
- case X86_BREAKPOINT_LEN_1:
- len_in_bytes = 1;
- break;
- case X86_BREAKPOINT_LEN_2:
- len_in_bytes = 2;
- break;
- case X86_BREAKPOINT_LEN_4:
- len_in_bytes = 4;
- break;
-#ifdef CONFIG_X86_64
- case X86_BREAKPOINT_LEN_8:
- len_in_bytes = 8;
- break;
-#endif
- }
- return len_in_bytes;
-}
-
/*
* Check for virtual address in kernel space.
*/
@@ -198,7 +177,7 @@ int arch_check_bp_in_kernelspace(struct
struct arch_hw_breakpoint *info = counter_arch_bp(bp);
va = info->address;
- len = get_hbp_len(info->len);
+ len = bp->attr.bp_len;
return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE);
}
@@ -278,7 +257,8 @@ static int arch_build_bp_info(struct per
return -EINVAL;
}
- /* Len */
+ /* info->len == info->mask == 0 */
+
switch (bp->attr.bp_len) {
case HW_BREAKPOINT_LEN_1:
info->len = X86_BREAKPOINT_LEN_1;
@@ -295,7 +275,10 @@ static int arch_build_bp_info(struct per
break;
#endif
default:
- return -EINVAL;
+ if (!is_power_of_2(bp->attr.bp_len))
+ return -EINVAL;
+ info->mask = bp->attr.bp_len - 1;
+ info->len = X86_BREAKPOINT_LEN_1;
}
return 0;
@@ -314,11 +297,15 @@ int arch_validate_hwbkpt_settings(struct
if (ret)
return ret;
- ret = -EINVAL;
-
switch (info->len) {
case X86_BREAKPOINT_LEN_1:
align = 0;
+ if (info->mask) {
+ align = info->mask;
+ ret = arch_validate_hwbkpt_addr_mask(...);
+ if (ret)
+ return ret;
+ }
break;
case X86_BREAKPOINT_LEN_2:
align = 1;
@@ -332,7 +319,7 @@ int arch_validate_hwbkpt_settings(struct
break;
#endif
default:
- return ret;
+ BUG();
}
/*
On Thu, Apr 25, 2013 at 05:10:35PM +0200, Oleg Nesterov wrote:
> On 04/25, Frederic Weisbecker wrote:
> >
> > 2013/4/23 Jacob Shin <[email protected]>:
> > > @@ -286,7 +286,10 @@ struct perf_event_attr {
> > > __u64 config1; /* extension of config */
> > > };
> > > union {
> > > - __u64 bp_len;
> > > + struct {
> > > + __u32 bp_len;
> > > + __u32 bp_addr_mask;
> > > + };
> >
> > Do we need len and mask to work at the same time? I can't think of a
> > situation when len and mask mix up together in a useful way to define
> > a range.
Okay, we can make it:
union {
__u64 bp_len;
__u64 bp_addr_mask;
__config2;
};
And in x86, bp_len != HW_BREAKPOINT_LEN_1,2,4,8 will be interpreted as
bp_addr_mask.
>
> And it would be nice (I think) if we could simply turn bp_len into
> bp_mask. It is already the mask actually, bp_addr should be aligned.
>
> But I do not see how we can do this, so I guess we need another field.
>
> Well. Another option is to extend bp_len. Fortunately HW_BREAKPOINT_LEN_*
> match the length, so we can simply allow any 2^n length and amd.c can
> translate it into the mask.
Okay, this is nice because we can just ride on top of what already exits,
but ...
>
> Of course, this doesn't allow to use, say, mask=0xF0. But perhaps this
> is not really useful?
Exactly .. Right I think most of the time we are trying to trap on range
of contiguous addresses, but .. mask of 0xf0 allows us to trap on 16 byte
aligned addresses:
addr of 0x1000 and mask of 0xf0 will count accesses to:
0x1000, 0x1010, 0x1020, .. 0x10e0, 0x10f0
Maybe there is some big blob of data and user wants to see how many times
16 byte aligned addresses get hit. This might be not as common, but it is
plausible no?
On 04/24, Jacob Shin wrote:
>
> And only x86
> hw_breakpoint will know what .config = 0xf means for x86 and do the right
> thing. For ARM, 0xf will mean something different.
>
> The downside is that in userland perf tool we need differing documentation
> on what the mask syntax means for each architecture.
Personally I think this is acceptable.
But I am new to this code, so...
Oleg.
On 04/25/2013 10:06 AM, Oleg Nesterov wrote:
>>
>> The downside is that in userland perf tool we need differing documentation
>> on what the mask syntax means for each architecture.
>
> Personally I think this is acceptable.
>
> But I am new to this code, so...
>
That would seem really, really awkward. Yes, perf has a bunch of
low-level stuff, but it would seem highly undesirable to force the user
to deal with something like that.
It would be good to have a user-friendly syntax that covers most of what
users may want to do and perhaps a longer form that can express
everything including ARM's byte selects; if the system can't honor the
request it should return an error.
-hpa
On 04/25, Jacob Shin wrote:
>
> On Thu, Apr 25, 2013 at 05:10:35PM +0200, Oleg Nesterov wrote:
> > On 04/25, Frederic Weisbecker wrote:
> > >
> > > Do we need len and mask to work at the same time? I can't think of a
> > > situation when len and mask mix up together in a useful way to define
> > > a range.
>
> Okay, we can make it:
>
> union {
> __u64 bp_len;
> __u64 bp_addr_mask;
> __config2;
> };
>
> And in x86, bp_len != HW_BREAKPOINT_LEN_1,2,4,8 will be interpreted as
> bp_addr_mask.
I think this can work too. And this needs almost the same changes as
extending ->bp_len.
> > Well. Another option is to extend bp_len. Fortunately HW_BREAKPOINT_LEN_*
> > match the length, so we can simply allow any 2^n length and amd.c can
> > translate it into the mask.
>
> Okay, this is nice because we can just ride on top of what already exits,
> but ...
Yes, yes, I agree with your "but". As I said from the very beginning
I am not sure about this idea.
> addr of 0x1000 and mask of 0xf0 will count accesses to:
>
> 0x1000, 0x1010, 0x1020, .. 0x10e0, 0x10f0
>
> Maybe there is some big blob of data and user wants to see how many times
> 16 byte aligned addresses get hit. This might be not as common, but it is
> plausible no?
I'd say this is certainly uncommon ;)
But in any case we should not limit a user, so I agree.
Oleg.
On Thu, Apr 25, 2013 at 10:17:35AM -0700, H. Peter Anvin wrote:
> On 04/25/2013 10:06 AM, Oleg Nesterov wrote:
> >>
> >> The downside is that in userland perf tool we need differing documentation
> >> on what the mask syntax means for each architecture.
> >
> > Personally I think this is acceptable.
> >
> > But I am new to this code, so...
> >
>
> That would seem really, really awkward. Yes, perf has a bunch of
> low-level stuff, but it would seem highly undesirable to force the user
> to deal with something like that.
>
> It would be good to have a user-friendly syntax that covers most of what
> users may want to do and perhaps a longer form that can express
> everything including ARM's byte selects; if the system can't honor the
> request it should return an error.
Okay,
If arch specific masks are a no go, then I think I'm convinced that
Oleg's idea of using bp_len is the right thing to do. Right now perf
userland tool hard codes bp_len to 4, so I need to modify it to allow
user to override the length if desired.
Oleg, Frederic, et al.
Which syntax do you prefer?
If we want to set bp_len to 16:
$ perf stat -e mem:0x1000:rw:16
Or
$ perf stat -e mem:0x1000:16
Or
$ perf stat -e mem:0x1000/16
If no bp_len value is specified, it will still default to 4 as it did
before.
Hi Jacob,
On Fri, Apr 26, 2013 at 12:19:11AM +0100, Jacob Shin wrote:
> On Thu, Apr 25, 2013 at 10:17:35AM -0700, H. Peter Anvin wrote:
> > On 04/25/2013 10:06 AM, Oleg Nesterov wrote:
> > >>
> > >> The downside is that in userland perf tool we need differing documentation
> > >> on what the mask syntax means for each architecture.
> > >
> > > Personally I think this is acceptable.
> > >
> > > But I am new to this code, so...
> > >
> >
> > That would seem really, really awkward. Yes, perf has a bunch of
> > low-level stuff, but it would seem highly undesirable to force the user
> > to deal with something like that.
> >
> > It would be good to have a user-friendly syntax that covers most of what
> > users may want to do and perhaps a longer form that can express
> > everything including ARM's byte selects; if the system can't honor the
> > request it should return an error.
>
> Okay,
>
> If arch specific masks are a no go, then I think I'm convinced that
> Oleg's idea of using bp_len is the right thing to do. Right now perf
> userland tool hard codes bp_len to 4, so I need to modify it to allow
> user to override the length if desired.
So what value ends up in the bp_len field: a length or a mask? I just want
to make sure it's flexible enough that, if we add another user interface for
the byte-select stuff, we don't need to butcher the attr in another way.
> Oleg, Frederic, et al.
>
> Which syntax do you prefer?
>
> If we want to set bp_len to 16:
>
> $ perf stat -e mem:0x1000:rw:16
>
> Or
>
> $ perf stat -e mem:0x1000:16
>
> Or
>
> $ perf stat -e mem:0x1000/16
>
> If no bp_len value is specified, it will still default to 4 as it did
> before.
I certainly like the ability to change the length of an execute breakpoint,
as that helps for halfword instructions on ARM (not sure if your first
syntax precludes that, but just checking...).
Will
On Fri, Apr 26, 2013 at 05:20:44PM +0100, Will Deacon wrote:
> Hi Jacob,
>
> On Fri, Apr 26, 2013 at 12:19:11AM +0100, Jacob Shin wrote:
> > On Thu, Apr 25, 2013 at 10:17:35AM -0700, H. Peter Anvin wrote:
> > > On 04/25/2013 10:06 AM, Oleg Nesterov wrote:
> > > >>
> > > >> The downside is that in userland perf tool we need differing documentation
> > > >> on what the mask syntax means for each architecture.
> > > >
> > > > Personally I think this is acceptable.
> > > >
> > > > But I am new to this code, so...
> > > >
> > >
> > > That would seem really, really awkward. Yes, perf has a bunch of
> > > low-level stuff, but it would seem highly undesirable to force the user
> > > to deal with something like that.
> > >
> > > It would be good to have a user-friendly syntax that covers most of what
> > > users may want to do and perhaps a longer form that can express
> > > everything including ARM's byte selects; if the system can't honor the
> > > request it should return an error.
> >
> > Okay,
> >
> > If arch specific masks are a no go, then I think I'm convinced that
> > Oleg's idea of using bp_len is the right thing to do. Right now perf
> > userland tool hard codes bp_len to 4, so I need to modify it to allow
> > user to override the length if desired.
>
> So what value ends up in the bp_len field: a length or a mask? I just want
> to make sure it's flexible enough that, if we add another user interface for
> the byte-select stuff, we don't need to butcher the attr in another way.
It is length. Just as it is today, userland API does not change at all.
>
> > Oleg, Frederic, et al.
> >
> > Which syntax do you prefer?
> >
> > If we want to set bp_len to 16:
> >
> > $ perf stat -e mem:0x1000:rw:16
> >
> > Or
> >
> > $ perf stat -e mem:0x1000:16
> >
> > Or
> >
> > $ perf stat -e mem:0x1000/16
> >
> > If no bp_len value is specified, it will still default to 4 as it did
> > before.
>
> I certainly like the ability to change the length of an execute breakpoint,
> as that helps for halfword instructions on ARM (not sure if your first
> syntax precludes that, but just checking...).
All three are equivalent, I just need to pick one and implement/document.
-Jacob
On 04/25, Jacob Shin wrote:
>
> Right now perf
> userland tool hard codes bp_len to 4, so I need to modify it to allow
> user to override the length if desired.
imho this itself looks like a good change...
> Oleg, Frederic, et al.
>
> Which syntax do you prefer?
>
> $ perf stat -e mem:0x1000/16
Personally I like this a bit more, but I won't argue in any case.
Oleg.