Subject: [PATCH 0/7] perf/x86-ibs and tools: Add support for AMD IBS

This patch set adds perf tool support for AMD IBS. Mostly perf tool
patches, but

perf/x86-ibs: Add support for IBS pseudo events

includes kernel changes.

It basically implements:

* General perf tool support for dynamically allocated pmus,
* Support for IBS pseudo events,
* perf-script support for IBS (updated version).

With this patches there is the possibility to add pmu handler code for
a specific pmu. If a pmu with handler code is listed in sysfs, it is
activated and can be used with perf tools. E.g. perf-list shows
available events of this pmu and the event parser of perf-record is
able to setup profiling for that particular pmu. This mechanism is
used to implement pmus for ibs (ibs_op/ibs_fetch).

IBS pseudo events are derived from an IBS sample and determined
through a combination of one or more IBS event flags or values.
E.g. one could profile mispredicted branches with ibs using such
pseudo events:

# perf record -a -e ibs_op:MISPREDICTED_BRANCH ...

This is implemented with kernel side filtering of ibs samples by
passing filter settings to the kernel (attr.config1/config2).

I also attached an updated version of my previously posted patch that
implements perf-script support for ibs. This is usefull for a user to
dump raw ibs samples. The script can be used by the user as a basis
for a script that post-processes ibs samples.

Changes can be pulled from kernel.org and are available in the git
repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git perf-ibs

-Robert


Robert Richter (7):
perf tools: Fix generation of pmu list
perf tools: Add basic dynamic PMU support
perf tools: Add parser for dynamic PMU events
perf/x86-ibs: Add support for IBS pseudo events
perf, tools: Add raw event support for dynamic allocated pmus
perf tools: Add pmu mappings to header information
perf script: Add script to collect and display IBS samples

arch/x86/kernel/cpu/perf_event_amd_ibs.c | 83 ++++++-
tools/perf/Makefile | 1 +
tools/perf/scripts/perl/bin/ibs-record | 23 ++
tools/perf/scripts/perl/bin/ibs-report | 6 +
tools/perf/scripts/perl/ibs.pl | 47 ++++
tools/perf/util/header.c | 78 ++++++
tools/perf/util/header.h | 1 +
tools/perf/util/parse-events.c | 45 +++-
tools/perf/util/parse-events.h | 4 +-
tools/perf/util/parse-events.y | 11 +-
tools/perf/util/pmu-ibs.c | 419 ++++++++++++++++++++++++++++++
tools/perf/util/pmu.c | 102 +++++++-
tools/perf/util/pmu.h | 19 ++
13 files changed, 828 insertions(+), 11 deletions(-)
create mode 100644 tools/perf/scripts/perl/bin/ibs-record
create mode 100644 tools/perf/scripts/perl/bin/ibs-report
create mode 100644 tools/perf/scripts/perl/ibs.pl
create mode 100644 tools/perf/util/pmu-ibs.c

--
1.7.8.4


Subject: [PATCH 7/7] perf script: Add script to collect and display IBS samples

This patch adds a script to collect and display IBS samples.
There are the following options:

perf script ibs [ibs_op|ibs_fetch] [-c period] <command>

Examples for usage:

perf script ibs ibs_op <command>
perf script ibs ibs_fetch <command>
perf script record ibs ibs_op -c 500000 <command>
perf script report ibs
perf script record ibs ibs_op -c 500000 <command> | perf script report ibs

V2:
* fix cpu number output and whitespaces in ibs.pl

Signed-off-by: Robert Richter <[email protected]>
---
tools/perf/scripts/perl/bin/ibs-record | 23 +++++++++++++++
tools/perf/scripts/perl/bin/ibs-report | 6 ++++
tools/perf/scripts/perl/ibs.pl | 47 ++++++++++++++++++++++++++++++++
3 files changed, 76 insertions(+), 0 deletions(-)
create mode 100644 tools/perf/scripts/perl/bin/ibs-record
create mode 100644 tools/perf/scripts/perl/bin/ibs-report
create mode 100644 tools/perf/scripts/perl/ibs.pl

diff --git a/tools/perf/scripts/perl/bin/ibs-record b/tools/perf/scripts/perl/bin/ibs-record
new file mode 100644
index 0000000..dc5f4d2
--- /dev/null
+++ b/tools/perf/scripts/perl/bin/ibs-record
@@ -0,0 +1,23 @@
+#! /bin/bash
+
+while [ "${1+defined}" ]; do
+ case $1 in
+ ibs_op|ibs_fetch)
+ EVENT=$1
+ shift
+ break
+ ;;
+ -*)
+ REC_OPT+=($1)
+ shift
+ ;;
+ *)
+ echo $0 "$@" >&2
+ echo "Invalid option: $1" >&2
+ echo "perf script ibs [ibs_op|ibs_fetch] [-c <period>]" >&2
+ exit 1
+ ;;
+ esac
+done
+
+perf record -e ${EVENT:-ibs_op}:r0 -c 100000 -R -a "${REC_OPT[@]}" "$@"
diff --git a/tools/perf/scripts/perl/bin/ibs-report b/tools/perf/scripts/perl/bin/ibs-report
new file mode 100644
index 0000000..f44e69d
--- /dev/null
+++ b/tools/perf/scripts/perl/bin/ibs-report
@@ -0,0 +1,6 @@
+#! /bin/bash
+
+# description: collect and display AMD IBS samples
+# args: [ibs_op|ibs_fetch] [-c period]
+
+perf script -s "$PERF_EXEC_PATH"/scripts/perl/ibs.pl "$@"
diff --git a/tools/perf/scripts/perl/ibs.pl b/tools/perf/scripts/perl/ibs.pl
new file mode 100644
index 0000000..1fca03f
--- /dev/null
+++ b/tools/perf/scripts/perl/ibs.pl
@@ -0,0 +1,47 @@
+#
+# ibs.pl - perf script for AMD Instruction Based Sampling
+#
+# Copyright (C) 2011 Advanced Micro Devices, Inc., Robert Richter
+#
+# For licencing details see kernel-base/COPYING
+#
+# description: collect and display AMD IBS samples
+# args: [ibs_op|ibs_fetch] [-c period]
+#
+# examples:
+#
+# perf script ibs ibs_op <command>
+# perf script ibs ibs_fetch <command>
+# perf script record ibs ibs_op -c 500000 <command>
+# perf script report ibs
+# perf script record ibs ibs_op -c 500000 <command> | perf script report ibs
+#
+
+# Packed byte string args of process_event():
+#
+# $event: union perf_event util/event.h
+# $attr: struct perf_event_attr linux/perf_event.h
+# $sample: struct perf_sample util/event.h
+# $raw_data: perf_sample->raw_data util/event.h
+
+sub process_event
+{
+ my ($event, $attr, $sample, $raw_data) = @_;
+
+ my ($type) = (unpack("LSS", $event))[0];
+ my ($sample_type) = (unpack("LLQQQQQLLQQ", $attr))[4];
+ my ($cpu, $raw_size) = (unpack("QLLQQQQQLL", $sample))[8, 9];
+ my ($caps, @ibs_data) = unpack("LQ*", $raw_data);
+
+ return if (!$raw_size); # no raw data
+
+ if (scalar(@ibs_data) == 3) {
+ printf("IBS_FETCH sample on cpu%d\tIBS0: 0x%016x IBS1: 0x%016x IBS2: 0x%016x\n",
+ $cpu, @ibs_data);
+ } else {
+ printf("IBS_OP sample on cpu%d\t" .
+ "\t IBS0: 0x%016x IBS1: 0x%016x IBS2: 0x%016x\n" .
+ "\tIBS3: 0x%016x IBS4: 0x%016x IBS5: 0x%016x IBS6: 0x%016x\n",
+ $cpu, @ibs_data);
+ }
+}
--
1.7.8.4

Subject: [PATCH 6/7] perf tools: Add pmu mappings to header information

With dynamic pmu allocation there are also dynamically assigned pmu
ids. These ids are used in event->attr.type to describe the pmu to be
used for that event. The information is available in sysfs, e.g:

/sys/bus/event_source/devices/breakpoint/type: 5
/sys/bus/event_source/devices/cpu/type: 4
/sys/bus/event_source/devices/ibs_fetch/type: 6
/sys/bus/event_source/devices/ibs_op/type: 7
/sys/bus/event_source/devices/software/type: 1
/sys/bus/event_source/devices/tracepoint/type: 2

These mappings are needed to know which samples belong to which pmu.
If a pmu is added dynamically like for ibs_fetch or ibs_op the type
value may vary.

Now, when decoding samples from perf.data this information in sysfs
might be no longer available or may have changed. We need to store it
in perf.data. Using the header for this. The header information
created with perf report would contain an additional section looking
like this:

# pmu mappings: ibs_op = 7, ibs_fetch = 6, cpu = 4, breakpoint = 5, tracepoint = 2, software = 1

V2:
* improve error handling in print_pmu_mappings()
* use perf_pmu__scan() to iterate over pmus

Signed-off-by: Robert Richter <[email protected]>
---
tools/perf/util/header.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/header.h | 1 +
2 files changed, 79 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 4c7c2d7..c139da7 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -20,6 +20,7 @@
#include "symbol.h"
#include "debug.h"
#include "cpumap.h"
+#include "pmu.h"

static bool no_buildid_cache = false;

@@ -1000,6 +1001,45 @@ done:
}

/*
+ * File format:
+ *
+ * struct pmu_mappings {
+ * u32 pmu_num;
+ * struct pmu_map {
+ * u32 type;
+ * char name[];
+ * }[pmu_num];
+ * };
+ */
+
+static int write_pmu_mappings(int fd, struct perf_header *h __used,
+ struct perf_evlist *evlist __used)
+{
+ struct perf_pmu *pmu = NULL;
+ off_t offset = lseek(fd, 0, SEEK_CUR);
+ __u32 pmu_num = 0;
+
+ /* write real pmu_num later */
+ do_write(fd, &pmu_num, sizeof(pmu_num));
+
+ while ((pmu = perf_pmu__scan(pmu))) {
+ if (!pmu->name)
+ continue;
+ pmu_num++;
+ do_write(fd, &pmu->type, sizeof(pmu->type));
+ do_write_string(fd, pmu->name);
+ }
+
+ if (pwrite(fd, &pmu_num, sizeof(pmu_num), offset) != sizeof(pmu_num)) {
+ /* discard all */
+ lseek(fd, offset, SEEK_SET);
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
* default get_cpuid(): nothing gets recorded
* actual implementation must be in arch/$(ARCH)/util/header.c
*/
@@ -1327,6 +1367,43 @@ static void print_branch_stack(struct perf_header *ph __used, int fd __used,
fprintf(fp, "# contains samples with branch stack\n");
}

+static void print_pmu_mappings(struct perf_header *ph, int fd, FILE *fp)
+{
+ const char *delimiter = "# pmu mappings: ";
+ char *name;
+ int ret;
+ u32 pmu_num;
+ u32 type;
+
+ ret = read(fd, &pmu_num, sizeof(pmu_num));
+ if (ret != sizeof(pmu_num))
+ goto error;
+
+ if (!pmu_num) {
+ fprintf(fp, "# pmu mappings: not available\n");
+ return;
+ }
+
+ while (pmu_num) {
+ if (read(fd, &type, sizeof(type)) != sizeof(type))
+ break;
+ name = do_read_string(fd, ph);
+ if (!name)
+ break;
+ pmu_num--;
+ fprintf(fp, "%s%s = %" PRIu32, delimiter, name, type);
+ free(name);
+ delimiter = ", ";
+ }
+
+ fprintf(fp, "\n");
+
+ if (!pmu_num)
+ return;
+error:
+ fprintf(fp, "# pmu mappings: unable to read\n");
+}
+
static int __event_process_build_id(struct build_id_event *bev,
char *filename,
struct perf_session *session)
@@ -1532,6 +1609,7 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
FEAT_OPF(HEADER_CPU_TOPOLOGY, cpu_topology),
FEAT_OPF(HEADER_NUMA_TOPOLOGY, numa_topology),
FEAT_OPA(HEADER_BRANCH_STACK, branch_stack),
+ FEAT_OPA(HEADER_PMU_MAPPINGS, pmu_mappings),
};

struct header_print_data {
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 21a6be0..166dba6 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -28,6 +28,7 @@ enum {
HEADER_CPU_TOPOLOGY,
HEADER_NUMA_TOPOLOGY,
HEADER_BRANCH_STACK,
+ HEADER_PMU_MAPPINGS,
HEADER_LAST_FEATURE,
HEADER_FEAT_BITS = 256,
};
--
1.7.8.4

Subject: [PATCH 5/7] perf, tools: Add raw event support for dynamic allocated pmus

This patch extends the event parser to pass raw config values to a
dynamic allocated pmu. The following event syntax is supported now:

<pmu_name>:<raw_config_value>:<modifier>

Example for the ibs pmu:

# perf record -a -e ibs_op:r000 ...

Signed-off-by: Robert Richter <[email protected]>
---
tools/perf/util/parse-events.c | 20 +++++++++++++++-----
tools/perf/util/parse-events.h | 2 +-
tools/perf/util/parse-events.y | 7 ++++++-
tools/perf/util/pmu-ibs.c | 4 ++++
4 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index d572204..8ad536c 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -524,6 +524,9 @@ static int parse_events_add_tracepoint(struct list_head *list, int *idx,
{
int ret;

+ if (!event)
+ return -ENOENT;
+
ret = debugfs_valid_mountpoint(tracing_events_path);
if (ret)
return ret;
@@ -533,8 +536,9 @@ static int parse_events_add_tracepoint(struct list_head *list, int *idx,
add_tracepoint(list, idx, sys, event);
}

-static int __parse_events_add_generic_event(struct list_head *list, int *idx,
- char *sys, char *event)
+static int
+__parse_events_add_generic_event(struct list_head *list, int *idx, char *sys,
+ char *event, unsigned long config)
{
struct perf_event_attr attr;
char name[MAX_NAME_LEN];
@@ -545,12 +549,18 @@ static int __parse_events_add_generic_event(struct list_head *list, int *idx,
if (ret)
return ret;

- snprintf(name, MAX_NAME_LEN, "%s:%s", sys, event);
+ if (event)
+ snprintf(name, MAX_NAME_LEN, "%s:%s", sys, event);
+ else
+ snprintf(name, MAX_NAME_LEN, "%s:r%lx", sys, config);
+
+ attr.config |= config;
+
return add_event(list, idx, &attr, name);
}

int parse_events_add_generic_event(struct list_head *list, int *idx,
- char *sys, char *event)
+ char *sys, char *event, unsigned long config)
{
int ret1, ret2;

@@ -558,7 +568,7 @@ int parse_events_add_generic_event(struct list_head *list, int *idx,
if (!ret1)
return 0;

- ret2 = __parse_events_add_generic_event(list, idx, sys, event);
+ ret2 = __parse_events_add_generic_event(list, idx, sys, event, config);
if (!ret2)
return 0;

diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index e9df741..88950d8 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -63,7 +63,7 @@ int parse_events__new_term(struct parse_events__term **term, int type,
void parse_events__free_terms(struct list_head *terms);
int parse_events_modifier(struct list_head *list __used, char *str __used);
int parse_events_add_generic_event(struct list_head *list, int *idx,
- char *sys, char *event);
+ char *sys, char *event, unsigned long config);
int parse_events_add_raw(struct perf_evlist *evlist, unsigned long config,
unsigned long config1, unsigned long config2,
char *mod);
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 659b5e8..d07fddd 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -134,7 +134,12 @@ PE_PREFIX_MEM PE_VALUE sep_dc
event_legacy_generic:
PE_NAME ':' PE_NAME
{
- ABORT_ON(parse_events_add_generic_event(list_event, idx, $1, $3));
+ ABORT_ON(parse_events_add_generic_event(list_event, idx, $1, $3, 0));
+}
+|
+PE_NAME ':' PE_RAW
+{
+ ABORT_ON(parse_events_add_generic_event(list_event, idx, $1, NULL, $3));
}

event_legacy_numeric:
diff --git a/tools/perf/util/pmu-ibs.c b/tools/perf/util/pmu-ibs.c
index 604cb8c..a5468c8 100644
--- a/tools/perf/util/pmu-ibs.c
+++ b/tools/perf/util/pmu-ibs.c
@@ -375,6 +375,9 @@ static int ibs_parse_event(struct perf_event_attr *attr, char *sys, char *name)
if (strcmp("ibs_op", sys) && strcmp("ibs_fetch", sys))
return -ENOENT;

+ if (!name)
+ goto out; /* raw event if name is NULL */
+
for (event = events; event->id; event++) {
if (!strcmp(event->name + strlen(sys) + 1, name))
goto match;
@@ -384,6 +387,7 @@ static int ibs_parse_event(struct perf_event_attr *attr, char *sys, char *name)
match:
/* pseudo event found */
attr->config1 = event->config;
+out:
attr->sample_type = PERF_SAMPLE_CPU;

return 0;
--
1.7.8.4

Subject: [PATCH 3/7] perf tools: Add parser for dynamic PMU events

This patch adds support for pmu specific event parsers by extending
the pmu handler. The event syntax is the same as for tracepoints:

<subsys>:<name>:<modifier>

In case of dynamically allocated pmus the sub-system's name is the
name of the pmu. For the IBS pmu we have events like the following:

ibs_op:ALL [PMU event: ibs_op]
ibs_op:ALL_LOAD_STORE [PMU event: ibs_op]
ibs_op:BANK_CONF_LOAD [PMU event: ibs_op]
ibs_op:BANK_CONF_STORE [PMU event: ibs_op]
ibs_op:BRANCH_RETIRED [PMU event: ibs_op]
ibs_op:CANCELLED [PMU event: ibs_op]
ibs_op:COMP_TO_RET [PMU event: ibs_op]
...

The parser for pmu events is implemented in the .parse_event()
function of the handler.

Since we share the same syntax with tracepoints, tracepoints are
preferred in case of name collisions. Thus, this implementation is
backward compatible.

Signed-off-by: Robert Richter <[email protected]>
---
tools/perf/util/parse-events.c | 34 +++++++++++++++++++++++++++++++++-
tools/perf/util/parse-events.h | 4 ++--
tools/perf/util/parse-events.y | 6 +++---
tools/perf/util/pmu-ibs.c | 22 ++++++++++++++++++++++
tools/perf/util/pmu.c | 13 +++++++++++++
tools/perf/util/pmu.h | 4 ++++
6 files changed, 77 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 8962544..d572204 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -519,7 +519,7 @@ static int add_tracepoint_multi(struct list_head *list, int *idx,
return ret;
}

-int parse_events_add_tracepoint(struct list_head *list, int *idx,
+static int parse_events_add_tracepoint(struct list_head *list, int *idx,
char *sys, char *event)
{
int ret;
@@ -533,6 +533,38 @@ int parse_events_add_tracepoint(struct list_head *list, int *idx,
add_tracepoint(list, idx, sys, event);
}

+static int __parse_events_add_generic_event(struct list_head *list, int *idx,
+ char *sys, char *event)
+{
+ struct perf_event_attr attr;
+ char name[MAX_NAME_LEN];
+ int ret;
+
+ memset(&attr, 0, sizeof(attr));
+ ret = perf_pmu__parse_event(&attr, sys, event);
+ if (ret)
+ return ret;
+
+ snprintf(name, MAX_NAME_LEN, "%s:%s", sys, event);
+ return add_event(list, idx, &attr, name);
+}
+
+int parse_events_add_generic_event(struct list_head *list, int *idx,
+ char *sys, char *event)
+{
+ int ret1, ret2;
+
+ ret1 = parse_events_add_tracepoint(list, idx, sys, event);
+ if (!ret1)
+ return 0;
+
+ ret2 = __parse_events_add_generic_event(list, idx, sys, event);
+ if (!ret2)
+ return 0;
+
+ return ret1 == -ENOENT ? ret2 : ret1;
+}
+
static int
parse_breakpoint_type(const char *type, struct perf_event_attr *attr)
{
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index ca069f8..e9df741 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -62,8 +62,8 @@ int parse_events__new_term(struct parse_events__term **term, int type,
char *config, char *str, long num);
void parse_events__free_terms(struct list_head *terms);
int parse_events_modifier(struct list_head *list __used, char *str __used);
-int parse_events_add_tracepoint(struct list_head *list, int *idx,
- char *sys, char *event);
+int parse_events_add_generic_event(struct list_head *list, int *idx,
+ char *sys, char *event);
int parse_events_add_raw(struct perf_evlist *evlist, unsigned long config,
unsigned long config1, unsigned long config2,
char *mod);
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index d9637da..659b5e8 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -75,7 +75,7 @@ event_def: event_pmu |
event_legacy_symbol |
event_legacy_cache sep_dc |
event_legacy_mem |
- event_legacy_tracepoint sep_dc |
+ event_legacy_generic sep_dc |
event_legacy_numeric sep_dc |
event_legacy_raw sep_dc

@@ -131,10 +131,10 @@ PE_PREFIX_MEM PE_VALUE sep_dc
ABORT_ON(parse_events_add_breakpoint(list_event, idx, (void *) $2, NULL));
}

-event_legacy_tracepoint:
+event_legacy_generic:
PE_NAME ':' PE_NAME
{
- ABORT_ON(parse_events_add_tracepoint(list_event, idx, $1, $3));
+ ABORT_ON(parse_events_add_generic_event(list_event, idx, $1, $3));
}

event_legacy_numeric:
diff --git a/tools/perf/util/pmu-ibs.c b/tools/perf/util/pmu-ibs.c
index 5cf8601..07acb82 100644
--- a/tools/perf/util/pmu-ibs.c
+++ b/tools/perf/util/pmu-ibs.c
@@ -8,6 +8,7 @@

#include <stdio.h>
#include <string.h>
+#include <errno.h>
#include <linux/compiler.h>
#include "pmu.h"

@@ -75,6 +76,25 @@ static const char *events[] = {
NULL
};

+static int ibs_parse_event(struct perf_event_attr *attr, char *sys, char *name)
+{
+ const char **event;
+
+ if (strcmp("ibs_op", sys) && strcmp("ibs_fetch", sys))
+ return -ENOENT;
+
+ for (event = events; *event; event++) {
+ if (!strcmp(*event + strlen(sys) + 1, name))
+ goto match;
+ }
+
+ return -EINVAL;
+match:
+ attr->sample_type = PERF_SAMPLE_CPU;
+
+ return 0;
+}
+
static void ibs_print_events(const char *sys)
{
const char **event;
@@ -89,10 +109,12 @@ static void ibs_print_events(const char *sys)

struct pmu_handler pmu_ibs_fetch = {
.name = "ibs_fetch",
+ .parse_event = ibs_parse_event,
.print_events = ibs_print_events,
};

struct pmu_handler pmu_ibs_op = {
.name = "ibs_op",
+ .parse_event = ibs_parse_event,
.print_events = ibs_print_events,
};
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 7bfaba1..5767a9c 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -29,6 +29,19 @@ static struct pmu_handler *pmu_handlers[] = {
NULL /* terminator */
};

+int perf_pmu__parse_event(struct perf_event_attr *attr,
+ char *sys, char *event)
+{
+ struct perf_pmu *pmu = perf_pmu__find(sys);
+
+ if (pmu && pmu->handler) {
+ attr->type = pmu->type;
+ return pmu->handler->parse_event(attr, sys, event);
+ }
+
+ return -ENOENT;
+}
+
void perf_pmu__print_events(const char *sys)
{
struct perf_pmu *pmu = NULL;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index e5788aa..20b4e39 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -22,6 +22,8 @@ struct perf_pmu__format {
struct pmu_handler {
const char *name;

+ int (*parse_event)(struct perf_event_attr *attr,
+ char *sys, char *event);
void(*print_events)(const char *sys);
};

@@ -47,6 +49,8 @@ void perf_pmu__set_format(unsigned long *bits, long from, long to);
int perf_pmu__test(void);

struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu);
+int perf_pmu__parse_event(struct perf_event_attr *attr,
+ char *sys, char *event);
void perf_pmu__print_events(const char *sys);

/* supported pmus: */
--
1.7.8.4

Subject: [PATCH 2/7] perf tools: Add basic dynamic PMU support

This patch implements support for dynamically allocated pmus. This
happens if a driver registers a pmu with the perf_pmu_register()
function. The pmu is then identified by a fixed string describing the
pmu, e.g. "ibs_op". The type value of those pmus is freely assigned by
the system and may vary. The pmus are listed in sysfs, e.g.:

/sys/bus/event_source/devices/breakpoint/type: 5
/sys/bus/event_source/devices/cpu/type: 4
/sys/bus/event_source/devices/ibs_fetch/type: 6
/sys/bus/event_source/devices/ibs_op/type: 7
/sys/bus/event_source/devices/software/type: 1
/sys/bus/event_source/devices/tracepoint/type: 2

The idea is, that dynamically added pmus are detected by the perf tool
by parsing sysfs. There are also pmu handlers registered in the perf
tool. Both are identified by its unique names. If a pmu is supported
by the perf tool which means that a handler exists, the handler is
attached to a new detected pmu if both names match.

The handler may implement functions to print, parse and process
events, which are called if necessary, e.g. while printing a list of
events with perf-list, when parsing the event options or while
processing events.

This patch set first implements only printing of events, thus
perf-list provides a list of pmu events, e.g.:

# perf list ibs_op | cat

List of pre-defined events (to be used in -e):

ibs_op:ALL [PMU event: ibs_op]
ibs_op:ALL_LOAD_STORE [PMU event: ibs_op]
ibs_op:BANK_CONF_LOAD [PMU event: ibs_op]
ibs_op:BANK_CONF_STORE [PMU event: ibs_op]
ibs_op:BRANCH_RETIRED [PMU event: ibs_op]
ibs_op:CANCELLED [PMU event: ibs_op]
ibs_op:COMP_TO_RET [PMU event: ibs_op]
...

Code to parse such pmu events is added in the next patch.

Signed-off-by: Robert Richter <[email protected]>
---
tools/perf/Makefile | 1 +
tools/perf/util/parse-events.c | 1 +
tools/perf/util/pmu-ibs.c | 98 ++++++++++++++++++++++++++++++++++++++++
tools/perf/util/pmu.c | 86 ++++++++++++++++++++++++++++++++++-
tools/perf/util/pmu.h | 15 ++++++
5 files changed, 199 insertions(+), 2 deletions(-)
create mode 100644 tools/perf/util/pmu-ibs.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index e98e14c..8c372dc 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -310,6 +310,7 @@ LIB_OBJS += $(OUTPUT)util/ctype.o
LIB_OBJS += $(OUTPUT)util/debugfs.o
LIB_OBJS += $(OUTPUT)util/sysfs.o
LIB_OBJS += $(OUTPUT)util/pmu.o
+LIB_OBJS += $(OUTPUT)util/pmu-ibs.o
LIB_OBJS += $(OUTPUT)util/environment.o
LIB_OBJS += $(OUTPUT)util/event.o
LIB_OBJS += $(OUTPUT)util/evlist.o
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 5b3a0ef..8962544 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -991,6 +991,7 @@ void print_events(const char *event_glob)
printf("\n");
}
print_hwcache_events(event_glob);
+ perf_pmu__print_events(event_glob);

if (event_glob != NULL)
return;
diff --git a/tools/perf/util/pmu-ibs.c b/tools/perf/util/pmu-ibs.c
new file mode 100644
index 0000000..5cf8601
--- /dev/null
+++ b/tools/perf/util/pmu-ibs.c
@@ -0,0 +1,98 @@
+/*
+ * Performance events - AMD IBS
+ *
+ * Copyright (C) 2012 Advanced Micro Devices, Inc., Robert Richter
+ *
+ * For licencing details see kernel-base/COPYING
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <linux/compiler.h>
+#include "pmu.h"
+
+static const char *events[] = {
+ "ibs_fetch:2M_PAGE",
+ "ibs_fetch:4K_PAGE",
+ "ibs_fetch:ABORTED",
+ "ibs_fetch:ALL",
+ "ibs_fetch:ATTEMPTED",
+ "ibs_fetch:COMPLETED",
+ "ibs_fetch:ICACHE_HITS",
+ "ibs_fetch:ICACHE_MISSES",
+ "ibs_fetch:ITLB_HITS",
+ "ibs_fetch:KILLED",
+ "ibs_fetch:L1_ITLB_MISSES_L2_ITLB_HITS",
+ "ibs_fetch:L1_ITLB_MISSES_L2_ITLB_MISSES",
+ "ibs_fetch:LATENCY",
+ "ibs_op:ALL",
+ "ibs_op:ALL_LOAD_STORE",
+ "ibs_op:BANK_CONF_LOAD",
+ "ibs_op:BANK_CONF_STORE",
+ "ibs_op:BRANCH_RETIRED",
+ "ibs_op:CANCELLED",
+ "ibs_op:COMP_TO_RET",
+ "ibs_op:DATA_CACHE_MISS",
+ "ibs_op:DATA_HITS",
+ "ibs_op:DC_LOAD_LAT",
+ "ibs_op:DCUC_MEM_ACC",
+ "ibs_op:DCWC_MEM_ACC",
+ "ibs_op:FORWARD",
+ "ibs_op:L1_DTLB_1G",
+ "ibs_op:L1_DTLB_2M",
+ "ibs_op:L1_DTLB_4K",
+ "ibs_op:L1_DTLB_HITS",
+ "ibs_op:L1_DTLB_MISS_L2_DTLB_HIT",
+ "ibs_op:L1_L2_DTLB_MISS",
+ "ibs_op:L2_DTLB_1G",
+ "ibs_op:L2_DTLB_2M",
+ "ibs_op:L2_DTLB_4K",
+ "ibs_op:LOAD",
+ "ibs_op:LOCKED",
+ "ibs_op:MAB_HIT",
+ "ibs_op:MISALIGNED_DATA_ACC",
+ "ibs_op:MISPREDICTED_BRANCH",
+ "ibs_op:MISPREDICTED_BRANCH_TAKEN",
+ "ibs_op:MISPREDICTED_RETURNS",
+ "ibs_op:NB_CACHE_MODIFIED",
+ "ibs_op:NB_CACHE_OWNED",
+ "ibs_op:NB_LOCAL_CACHE",
+ "ibs_op:NB_LOCAL_CACHE_LAT",
+ "ibs_op:NB_LOCAL_DRAM",
+ "ibs_op:NB_LOCAL_L3",
+ "ibs_op:NB_LOCAL_ONLY",
+ "ibs_op:NB_LOCAL_OTHER",
+ "ibs_op:NB_REMOTE_CACHE",
+ "ibs_op:NB_REMOTE_CACHE_LAT",
+ "ibs_op:NB_REMOTE_DRAM",
+ "ibs_op:NB_REMOTE_ONLY",
+ "ibs_op:NB_REMOTE_OTHER",
+ "ibs_op:RESYNC",
+ "ibs_op:RETURNS",
+ "ibs_op:STORE",
+ "ibs_op:TAG_TO_RETIRE",
+ "ibs_op:TAKEN_BRANCH",
+ NULL
+};
+
+static void ibs_print_events(const char *sys)
+{
+ const char **event;
+
+ printf("\n");
+
+ for (event = events; *event; event++) {
+ if (!strncmp(sys, *event, strlen(sys)))
+ printf(" %-50s [PMU event: %s]\n", *event, sys);
+ }
+}
+
+struct pmu_handler pmu_ibs_fetch = {
+ .name = "ibs_fetch",
+ .print_events = ibs_print_events,
+};
+
+struct pmu_handler pmu_ibs_op = {
+ .name = "ibs_op",
+ .print_events = ibs_print_events,
+};
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 480c448..7bfaba1 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -5,17 +5,44 @@
#include <unistd.h>
#include <stdio.h>
#include <dirent.h>
+#include <stddef.h>
#include "sysfs.h"
#include "util.h"
#include "pmu.h"
#include "parse-events.h"

+#define EVENT_SOURCE_DEVICE_PATH "/bus/event_source/devices/"
+
int perf_pmu_parse(struct list_head *list, char *name);
extern FILE *perf_pmu_in;

static LIST_HEAD(pmus);

/*
+ * Dynamic PMU support
+ *
+ */
+
+static struct pmu_handler *pmu_handlers[] = {
+ &pmu_ibs_fetch,
+ &pmu_ibs_op,
+ NULL /* terminator */
+};
+
+void perf_pmu__print_events(const char *sys)
+{
+ struct perf_pmu *pmu = NULL;
+
+ while ((pmu = perf_pmu__scan(pmu))) {
+ if (!pmu->handler)
+ continue;
+ if (sys && strncmp(pmu->name, sys, strlen(sys)))
+ continue;
+ pmu->handler->print_events(pmu->name);
+ }
+}
+
+/*
* Parse & process all the sysfs attributes located under
* the directory specified in 'dir' parameter.
*/
@@ -69,7 +96,7 @@ static int pmu_format(char *name, struct list_head *format)
return -1;

snprintf(path, PATH_MAX,
- "%s/bus/event_source/devices/%s/format", sysfs, name);
+ "%s" EVENT_SOURCE_DEVICE_PATH "%s/format", sysfs, name);

if (stat(path, &st) < 0)
return 0; /* no error if format does not exist */
@@ -98,7 +125,7 @@ static int pmu_type(char *name, __u32 *type)
return -1;

snprintf(path, PATH_MAX,
- "%s/bus/event_source/devices/%s/type", sysfs, name);
+ "%s" EVENT_SOURCE_DEVICE_PATH "%s/type", sysfs, name);

if (stat(path, &st) < 0)
return -1;
@@ -114,6 +141,45 @@ static int pmu_type(char *name, __u32 *type)
return ret;
}

+/* Add all pmus in sysfs to pmu list: */
+static void pmu_read_sysfs(void)
+{
+ char path[PATH_MAX];
+ const char *sysfs;
+ DIR *dir;
+ struct dirent *dent;
+
+ sysfs = sysfs_find_mountpoint();
+ if (!sysfs)
+ return;
+
+ snprintf(path, PATH_MAX,
+ "%s" EVENT_SOURCE_DEVICE_PATH, sysfs);
+
+ dir = opendir(path);
+ if (!dir)
+ return;
+
+ while ((dent = readdir(dir))) {
+ if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
+ continue;
+ perf_pmu__find(dent->d_name);
+ }
+
+ closedir(dir);
+}
+
+static struct pmu_handler *get_pmu_handler(char *name)
+{
+ struct pmu_handler **handler;
+
+ for (handler = pmu_handlers; *handler; handler++)
+ if (!strcmp((*handler)->name, name))
+ return *handler;
+
+ return NULL;
+}
+
static struct perf_pmu *pmu_lookup(char *name)
{
struct perf_pmu *pmu;
@@ -139,6 +205,7 @@ static struct perf_pmu *pmu_lookup(char *name)
list_splice(&format, &pmu->format);
pmu->name = strdup(name);
pmu->type = type;
+ pmu->handler = get_pmu_handler(name);
list_add_tail(&pmu->list, &pmus);
return pmu;
}
@@ -154,6 +221,21 @@ static struct perf_pmu *pmu_find(char *name)
return NULL;
}

+struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu)
+{
+ /*
+ * pmu iterator: If pmu is NULL, we start at the begin,
+ * otherwise return the next pmu. Returns NULL on end.
+ */
+ if (!pmu) {
+ pmu_read_sysfs();
+ pmu = list_prepare_entry(pmu, &pmus, list);
+ }
+ list_for_each_entry_continue(pmu, &pmus, list)
+ return pmu;
+ return NULL;
+}
+
struct perf_pmu *perf_pmu__find(char *name)
{
struct perf_pmu *pmu;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 68c0db9..e5788aa 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -19,11 +19,18 @@ struct perf_pmu__format {
struct list_head list;
};

+struct pmu_handler {
+ const char *name;
+
+ void(*print_events)(const char *sys);
+};
+
struct perf_pmu {
char *name;
__u32 type;
struct list_head format;
struct list_head list;
+ struct pmu_handler *handler;
};

struct perf_pmu *perf_pmu__find(char *name);
@@ -38,4 +45,12 @@ int perf_pmu__new_format(struct list_head *list, char *name,
void perf_pmu__set_format(unsigned long *bits, long from, long to);

int perf_pmu__test(void);
+
+struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu);
+void perf_pmu__print_events(const char *sys);
+
+/* supported pmus: */
+struct pmu_handler pmu_ibs_fetch;
+struct pmu_handler pmu_ibs_op;
+
#endif /* __PMU_H */
--
1.7.8.4

Subject: [PATCH 1/7] perf tools: Fix generation of pmu list

The internal pmu list was never used. With each perf_pmu__find() call
the pmu structure was created new by parsing sysfs. Beside this it
caused memory leaks. We now keep all pmus by adding them to the list.

Also, pmu_lookup() should return pmus that do not expose the format
specifier in sysfs.

We need a valid internal pmu list in a later patch to iterate over all
pmus that exist in the system.

Signed-off-by: Robert Richter <[email protected]>
---
tools/perf/util/pmu.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index cb08a11..480c448 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -72,7 +72,7 @@ static int pmu_format(char *name, struct list_head *format)
"%s/bus/event_source/devices/%s/format", sysfs, name);

if (stat(path, &st) < 0)
- return -1;
+ return 0; /* no error if format does not exist */

if (pmu_format_parse(path, format))
return -1;
@@ -139,6 +139,7 @@ static struct perf_pmu *pmu_lookup(char *name)
list_splice(&format, &pmu->format);
pmu->name = strdup(name);
pmu->type = type;
+ list_add_tail(&pmu->list, &pmus);
return pmu;
}

--
1.7.8.4

Subject: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

This patch implements support for IBS pseudo events. Pseudo events are
derived from an IBS sample and determined through a combination of one
or more IBS event flags or values. See here for a full description:

Software Optimization Guide for AMD Family 15h Processors
Appendix F Guide to Instruction-Based Sampling on AMD Family 15h Processors
Advanced Micro Devices, Inc.
Publication No. 47414, Revision 3.06
January 2012
http://support.amd.com/us/Processor_TechDocs/47414_15h_sw_opt_guide.pdf

The list of supported events is provided by perf-list. A pseudo event
can be set up like this:

# perf record -a -e ibs_op:MISPREDICTED_BRANCH ...

The filter rules for IBS samples depending on a pseudo event are also
described in the document above. The filter is setup in the perf tool
pmu handler and passed to the kernel via config1/config2 attr values.
The interface is extendable to pass the pseudo events directly to the
kernel.

There are some pseudo events capable to count latencies or other
values. Counting values of such events is not yet supported.

This patch includes kernel and userland changes.

Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event_amd_ibs.c | 83 ++++++-
tools/perf/util/pmu-ibs.c | 433 +++++++++++++++++++++++++-----
2 files changed, 445 insertions(+), 71 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 03743ad..1675479 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -478,6 +478,81 @@ static struct perf_ibs perf_ibs_op = {
.get_count = get_ibs_op_count,
};

+enum ibs_filter_type {
+ IBS_NO_FILTER = 0,
+ IBS_MATCH_FILTER = 1,
+ IBS_ANY_SET_FILTER = 2,
+ IBS_PSEUDO_EVENT = 0x0F,
+};
+
+struct ibs_filter {
+ struct {
+ u16 idx : 8;
+ u16 reserved : 4;
+ u16 type : 4;
+ };
+ union {
+ struct {
+ u8 mask;
+ u8 match;
+ };
+ u16 any;
+ };
+};
+
+static bool
+__perf_ibs_sample_matches(struct ibs_filter *filter, void *data, int size)
+{
+ int left = size;
+
+ switch (filter->type) {
+ case IBS_MATCH_FILTER:
+ left -= sizeof(u8);
+ break;
+ case IBS_ANY_SET_FILTER:
+ left -= sizeof(u16);
+ break;
+ default:
+ return false;
+ }
+
+ left -= filter->idx;
+ if (left < 0)
+ return false;
+
+ switch (filter->type) {
+ case IBS_MATCH_FILTER:
+ return ((*(u8*)(data + filter->idx)) & filter->mask) == filter->match;
+ case IBS_ANY_SET_FILTER:
+ return (*(u16*)(data + filter->idx)) & filter->any;
+ };
+
+ return false;
+}
+
+static bool perf_ibs_sample_matches(struct perf_event *event,
+ struct perf_ibs_data *data)
+{
+ int i;
+ union {
+ struct ibs_filter filter[4];
+ u64 config[2];
+ } f;
+ struct ibs_filter *filter = f.filter;
+
+ f.config[0] = event->attr.config1;
+ f.config[1] = event->attr.config2;
+
+ for (i = 0; i < 4; i++, filter++) {
+ if (filter->type == IBS_NO_FILTER)
+ break;
+ if (!__perf_ibs_sample_matches(filter, data->regs, data->size))
+ return false;
+ }
+
+ return true;
+}
+
static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
{
struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
@@ -487,7 +562,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
struct perf_raw_record raw;
struct pt_regs regs;
struct perf_ibs_data ibs_data;
- int offset, size, check_rip, offset_max, throttle = 0;
+ int offset, size, check_rip, filter, offset_max, throttle = 0;
unsigned int msr;
u64 *buf, *config, period;

@@ -517,7 +592,8 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
size = 1;
offset = 1;
check_rip = (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_RIPINVALIDCHK));
- if (event->attr.sample_type & PERF_SAMPLE_RAW)
+ filter = (event->attr.config1 != 0);
+ if (filter || (event->attr.sample_type & PERF_SAMPLE_RAW))
offset_max = perf_ibs->offset_max;
else if (check_rip)
offset_max = 2;
@@ -532,6 +608,9 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
} while (offset < offset_max);
ibs_data.size = sizeof(u64) * size;

+ if (filter && !perf_ibs_sample_matches(event, &ibs_data))
+ goto out;
+
regs = *iregs;
if (check_rip && (ibs_data.regs[2] & IBS_RIP_INVALID)) {
regs.flags &= ~PERF_EFLAGS_EXACT;
diff --git a/tools/perf/util/pmu-ibs.c b/tools/perf/util/pmu-ibs.c
index 07acb82..604cb8c 100644
--- a/tools/perf/util/pmu-ibs.c
+++ b/tools/perf/util/pmu-ibs.c
@@ -12,84 +12,378 @@
#include <linux/compiler.h>
#include "pmu.h"

-static const char *events[] = {
- "ibs_fetch:2M_PAGE",
- "ibs_fetch:4K_PAGE",
- "ibs_fetch:ABORTED",
- "ibs_fetch:ALL",
- "ibs_fetch:ATTEMPTED",
- "ibs_fetch:COMPLETED",
- "ibs_fetch:ICACHE_HITS",
- "ibs_fetch:ICACHE_MISSES",
- "ibs_fetch:ITLB_HITS",
- "ibs_fetch:KILLED",
- "ibs_fetch:L1_ITLB_MISSES_L2_ITLB_HITS",
- "ibs_fetch:L1_ITLB_MISSES_L2_ITLB_MISSES",
- "ibs_fetch:LATENCY",
- "ibs_op:ALL",
- "ibs_op:ALL_LOAD_STORE",
- "ibs_op:BANK_CONF_LOAD",
- "ibs_op:BANK_CONF_STORE",
- "ibs_op:BRANCH_RETIRED",
- "ibs_op:CANCELLED",
- "ibs_op:COMP_TO_RET",
- "ibs_op:DATA_CACHE_MISS",
- "ibs_op:DATA_HITS",
- "ibs_op:DC_LOAD_LAT",
- "ibs_op:DCUC_MEM_ACC",
- "ibs_op:DCWC_MEM_ACC",
- "ibs_op:FORWARD",
- "ibs_op:L1_DTLB_1G",
- "ibs_op:L1_DTLB_2M",
- "ibs_op:L1_DTLB_4K",
- "ibs_op:L1_DTLB_HITS",
- "ibs_op:L1_DTLB_MISS_L2_DTLB_HIT",
- "ibs_op:L1_L2_DTLB_MISS",
- "ibs_op:L2_DTLB_1G",
- "ibs_op:L2_DTLB_2M",
- "ibs_op:L2_DTLB_4K",
- "ibs_op:LOAD",
- "ibs_op:LOCKED",
- "ibs_op:MAB_HIT",
- "ibs_op:MISALIGNED_DATA_ACC",
- "ibs_op:MISPREDICTED_BRANCH",
- "ibs_op:MISPREDICTED_BRANCH_TAKEN",
- "ibs_op:MISPREDICTED_RETURNS",
- "ibs_op:NB_CACHE_MODIFIED",
- "ibs_op:NB_CACHE_OWNED",
- "ibs_op:NB_LOCAL_CACHE",
- "ibs_op:NB_LOCAL_CACHE_LAT",
- "ibs_op:NB_LOCAL_DRAM",
- "ibs_op:NB_LOCAL_L3",
- "ibs_op:NB_LOCAL_ONLY",
- "ibs_op:NB_LOCAL_OTHER",
- "ibs_op:NB_REMOTE_CACHE",
- "ibs_op:NB_REMOTE_CACHE_LAT",
- "ibs_op:NB_REMOTE_DRAM",
- "ibs_op:NB_REMOTE_ONLY",
- "ibs_op:NB_REMOTE_OTHER",
- "ibs_op:RESYNC",
- "ibs_op:RETURNS",
- "ibs_op:STORE",
- "ibs_op:TAG_TO_RETIRE",
- "ibs_op:TAKEN_BRANCH",
- NULL
+enum ibs_filter_type {
+ IBS_NO_FILTER = 0,
+ IBS_MATCH_FILTER = 1,
+ IBS_ANY_SET_FILTER = 2,
+ IBS_PSEUDO_EVENT = 0x0F,
+};
+
+struct ibs_filter {
+ struct {
+ __u16 idx : 8;
+ __u16 reserved : 4;
+ __u16 type : 4;
+ };
+ union {
+ struct {
+ __u8 mask;
+ __u8 match;
+ };
+ __u16 any;
+ };
+};
+
+struct ibs_event {
+ __u16 id;
+ const char *name;
+ const char *desc;
+ union {
+ __u16 pseudo_event;
+ __u64 config;
+ struct ibs_filter filter[2];
+ };
+};
+
+#define IBS_FETCH_CTL 0
+#define IBS_OP_DATA 2
+#define IBS_OP_DATA2 3
+#define IBS_OP_DATA3 4
+
+#define IBS_IDX(reg, bit) ((reg)<<3)+((bit)>>3)
+#define IBS_MASK(bit, m) (0xFF&m)
+#define IBS_MASK16(bit, m) (0xFFFF&m)
+#define IBS_FILTER_MATCH_ANY() { { 0, 0, 0 }, { .any = 0 } }
+
+#define IBS_FILTER_ANY_SET(reg, bit, m) \
+ { \
+ { \
+ .type = IBS_ANY_SET_FILTER, \
+ .idx = IBS_IDX(reg, bit), \
+ .reserved = 0, \
+ },{ \
+ .any = IBS_MASK16(bit, m), \
+ } \
+ },
+
+#define IBS_FILTER_ALL_CLEAR(reg, bit, m) \
+ { \
+ { \
+ .type = IBS_MATCH_FILTER, \
+ .idx = IBS_IDX(reg, bit), \
+ .reserved = 0, \
+ },{{ \
+ .mask = IBS_MASK(bit, m), \
+ .match = 0, \
+ }} \
+ }, \
+ { \
+ { \
+ .type = IBS_MATCH_FILTER, \
+ .idx = IBS_IDX(reg, (bit) + 8), \
+ .reserved = 0, \
+ },{{ \
+ .mask = IBS_MASK(bit, (m) >> 8), \
+ .match = 0, \
+ }} \
+ },
+
+#define IBS_FILTER_ALL_SET(reg, bit, m) \
+ { \
+ { \
+ .type = IBS_MATCH_FILTER, \
+ .idx = IBS_IDX(reg, bit), \
+ .reserved = 0, \
+ },{{ \
+ .mask = IBS_MASK(bit, m), \
+ .match = IBS_MASK(bit, m), \
+ }} \
+ },
+
+#define IBS_FILTER_MATCH(reg, bit, m, v) \
+ { \
+ { \
+ .type = IBS_MATCH_FILTER, \
+ .idx = IBS_IDX(reg, bit), \
+ .reserved = 0, \
+ },{{ \
+ .mask = IBS_MASK(bit, m), \
+ .match = IBS_MASK(bit, v), \
+ }} \
+ },
+
+#define IBS_FILTER_MATCH2(reg, reg2, bit, bit2, m, m2, v, v2) \
+ { \
+ { \
+ .type = IBS_MATCH_FILTER, \
+ .idx = IBS_IDX(reg, bit), \
+ .reserved = 0, \
+ },{{ \
+ .mask = IBS_MASK(bit, m), \
+ .match = IBS_MASK(bit, v), \
+ }} \
+ }, \
+ { \
+ { \
+ .type = IBS_MATCH_FILTER, \
+ .idx = IBS_IDX(reg2, bit2), \
+ .reserved = 0, \
+ },{{ \
+ .mask = IBS_MASK(bit2, m2), \
+ .match = IBS_MASK(bit2, v2), \
+ }} \
+ }
+
+#define IBS_EVENT(i, n, d) \
+ { \
+ .id = (i), \
+ .name = (n), \
+ .desc = (d), \
+ { .filter = { IBS_FILTER_##i } }, \
+ }
+#define IBS_FILTER(type, args...) IBS_FILTER_##type(args)
+
+/*
+ * ID Name Derivation
+ *
+ * F000 IBS fetch samples Number of all IBS fetch samples
+ * F001 IBS fetch killed Number of killed IBS fetch samples
+ * F002 IBS fetch attempted Number of non-killed IBS fetch samples
+ * F003 IBS fetch completed IbsFetchComp
+ * F004 IBS fetch aborted ~IbsFetchComp
+ * F005 IBS L1 ITLB hit ~IbsL1TlbMiss & IbsPhyAddrValid
+ * F006 IBS L1 ITLB miss, L2 ITLB hit IbsL1TlbMiss & ~IbsL2TlbMiss
+ * F007 IBS L1 ITLB miss, L2 ITLB miss IbsL1TlbMiss & IbsL2TlbMiss
+ * F008 IBS instruction cache miss IbsIcMiss
+ * F009 IBS instruction cache hit IbsFetchComp & ~IbsIcMiss
+ * F00A IBS 4K page translation IbsL1TlbPgSz=0 & IbsPhyAddrValid
+ * F00B IBS 2M page translation IbsL1TlbPgSz=1 & IbsPhyAddrValid
+ * F00C IBS 1G page translation IbsL1TlbPgSz=2 & IbsPhyAddrValid
+ * F00D Reserved
+ * F00E IBS fetch latency IbsfetchLat
+ */
+#define IBS_FILTER_0xf000 IBS_FILTER(MATCH_ANY)
+#define IBS_FILTER_0xf001 IBS_FILTER(ALL_CLEAR, IBS_FETCH_CTL, 48, 0x019c)
+#define IBS_FILTER_0xf002 IBS_FILTER(ANY_SET, IBS_FETCH_CTL, 48, 0x019c)
+#define IBS_FILTER_0xf003 IBS_FILTER(MATCH, IBS_FETCH_CTL, 48, 0x04, 0x04)
+#define IBS_FILTER_0xf004 IBS_FILTER(MATCH, IBS_FETCH_CTL, 48, 0x04, 0x00)
+#define IBS_FILTER_0xf005 IBS_FILTER(MATCH, IBS_FETCH_CTL, 48, 0x90, 0x10)
+#define IBS_FILTER_0xf006 IBS_FILTER(MATCH2, IBS_FETCH_CTL, IBS_FETCH_CTL, 56, 48, 0x01, 0x80, 0x00, 0x80)
+#define IBS_FILTER_0xf007 IBS_FILTER(MATCH2, IBS_FETCH_CTL, IBS_FETCH_CTL, 56, 48, 0x01, 0x80, 0x01, 0x80)
+#define IBS_FILTER_0xf008 IBS_FILTER(MATCH, IBS_FETCH_CTL, 48, 0x08, 0x08)
+#define IBS_FILTER_0xf009 IBS_FILTER(MATCH, IBS_FETCH_CTL, 48, 0x0C, 0x04)
+#define IBS_FILTER_0xf00a IBS_FILTER(MATCH, IBS_FETCH_CTL, 48, 0x70, 0x10)
+#define IBS_FILTER_0xf00b IBS_FILTER(MATCH, IBS_FETCH_CTL, 48, 0x70, 0x30)
+#define IBS_FILTER_0xf00c IBS_FILTER(MATCH, IBS_FETCH_CTL, 48, 0x70, 0x60)
+#if 0
+#define IBS_FILTER_0xf00e IBS_FILTER(COUNT, IBS_FETCH_CTL, 32, 0xffff)
+#endif
+
+/*
+ * ID Name Derivation
+ *
+ * F100 IBS all op samples Number of all IBS op samples
+ * F101 IBS tag to retire cycles Sum of all tag to retire cycles
+ * F102 ibs completion to retire cycles Sum of all completion to retire cycles
+ * F103 IBS branch op IbsOpBrnRet
+ * F104 IBS mispredicted branch op IbsOpBrnRet & IbsOpBrnMisp
+ * F105 IBS taken branch op IbsOpBrnRet & IbsOpBrnTaken
+ * F106 IBS mispredicted taken branch op IbsOpBrnRet & IbsOpBrnTaken & IbsOpBrnMisp
+ * F107 IBS return op IbsOpReturn
+ * F108 IBS mispredicted return op IbsOpReturn & IbsOpMispReturn
+ * F109 IBS resync op IbsOpBrnResync
+ */
+#define IBS_FILTER_0xf100 IBS_FILTER(MATCH_ANY)
+#if 0
+#define IBS_FILTER_0xf101 IBS_FILTER(COUNT, IBS_OP_DATA, 16, 0xffff)
+#define IBS_FILTER_0xf102 IBS_FILTER(COUNT, IBS_OP_DATA, 0, 0xffff)
+#endif
+#define IBS_FILTER_0xf103 IBS_FILTER(MATCH, IBS_OP_DATA, 32, 0x20, 0x20)
+#define IBS_FILTER_0xf104 IBS_FILTER(MATCH, IBS_OP_DATA, 32, 0x30, 0x30)
+#define IBS_FILTER_0xf105 IBS_FILTER(MATCH, IBS_OP_DATA, 32, 0x28, 0x28)
+#define IBS_FILTER_0xf106 IBS_FILTER(MATCH, IBS_OP_DATA, 32, 0x38, 0x38)
+#define IBS_FILTER_0xf107 IBS_FILTER(MATCH, IBS_OP_DATA, 32, 0x04, 0x04)
+#define IBS_FILTER_0xf108 IBS_FILTER(MATCH, IBS_OP_DATA, 32, 0x06, 0x06)
+#define IBS_FILTER_0xf109 IBS_FILTER(MATCH, IBS_OP_DATA, 32, 0x01, 0x01)
+
+/*
+ * ID Name Derivation
+ *
+ * F200 IBS All Load/Store Ops IbsLdOp | IbsStOp
+ * F201 IBS Load Ops IbsLdOp
+ * F202 IBS Store Ops IbsStOp
+ * F203 IBS L1 DTLB Hit ~IbsDcL1tlbMiss & IbsDcLinAddrValid
+ * F204 IBS L1 DTLB Miss L2 DTLB Hit IbsDcL1tlbMiss & ~IbsDcL2tlbMiss
+ * F205 IBS L1 DTLB Miss L2 DTLB Miss IbsDcL1tlbMiss & IbsDcL2tlbMiss
+ * F206 IBS DC Miss IbsDcMiss
+ * F207 IBS DC Hit ~IbsDcMiss
+ * F208 IBS Misaligned Access IbsDcMisAcc
+ * F209 IBS Bank Conflict On Load Op IbsDcLdBnkCon
+ * F20A Reserved
+ * F20B IBS Store to Load Forwarded IbsDcStToLdFwd
+ * F20C IBSStore to Load Forwarding Cancelled IbsDcStToLdCan
+ * F20D IBS UC memory access IbsDcUcMemAcc
+ * F20E IBS WC memory access IbsDcWcMemAcc
+ * F20F IBS locked operation IbsDcLockedOp
+ * F210 IBS MAB hit IbsDcMabHit
+ * F211 IBS L1 DTLB 4K page ~IbsDcL1tlbHit2M & ~IbsDcL1tlbHit1G &
+ * IbsDcLinAddrValid
+ * F212 IBS L1 DTLB 2M page IbsDcL1tlbHit2M & IbsDcLinAddrValid
+ * F213 IBS L1 DTLB 1G page IbsDcL1tlbHit1G & IbsDcLinAddrValid
+ * F214 Reserved
+ * F215 IBS L2 DTLB 4K page ~IbsDcL2tlbMiss & IbsDcL1tlbMiss &
+ * ~IbsDcL1tlbHit2M & lbsDcLinAddrValid
+ * F216 IBS L2 DTLB 2M page ~IbsDcL2tlbMiss & IbsDcL1tlbMiss &
+ * IbsDcL1tlbHit2M & lbsDcLinAddrValid
+ * F217 Reserved
+ * F218 Reserved
+ * F219 IBS DC miss load latency IbsDcMissLat when IbsLdOp & IbsDcMiss
+ */
+#define IBS_FILTER_0xf200 IBS_FILTER(ANY_SET, IBS_OP_DATA3, 0, 0x0003)
+#define IBS_FILTER_0xf201 IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x01, 0x01)
+#define IBS_FILTER_0xf202 IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x02, 0x02)
+#define IBS_FILTER_0xf203 IBS_FILTER(MATCH2, IBS_OP_DATA3, IBS_OP_DATA3, 16, 0, 0x02, 0x04, 0x02, 0x00)
+#define IBS_FILTER_0xf204 IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x0C, 0x08)
+#define IBS_FILTER_0xf205 IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x0C, 0x0C)
+#define IBS_FILTER_0xf206 IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x80, 0x80)
+#define IBS_FILTER_0xf207 IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x80, 0x00) IBS_FILTER(ANY_SET, IBS_OP_DATA3, 0, 0x0003)
+#define IBS_FILTER_0xf208 IBS_FILTER(MATCH, IBS_OP_DATA3, 8, 0x01, 0x01)
+#define IBS_FILTER_0xf209 IBS_FILTER(MATCH, IBS_OP_DATA3, 8, 0x02, 0x02)
+#define IBS_FILTER_0xf20b IBS_FILTER(MATCH, IBS_OP_DATA3, 8, 0x08, 0x08)
+#define IBS_FILTER_0xf20c IBS_FILTER(MATCH, IBS_OP_DATA3, 8, 0x10, 0x10)
+#define IBS_FILTER_0xf20d IBS_FILTER(MATCH, IBS_OP_DATA3, 8, 0x40, 0x40)
+#define IBS_FILTER_0xf20e IBS_FILTER(MATCH, IBS_OP_DATA3, 8, 0x20, 0x20)
+#define IBS_FILTER_0xf20f IBS_FILTER(MATCH, IBS_OP_DATA3, 8, 0x80, 0x80)
+#define IBS_FILTER_0xf210 IBS_FILTER(MATCH, IBS_OP_DATA3, 16, 0x01, 0x01)
+#define IBS_FILTER_0xf211 IBS_FILTER(MATCH2, IBS_OP_DATA3, IBS_OP_DATA3, 16, 0, 0x02, 0x30, 0x02, 0x00)
+#define IBS_FILTER_0xf212 IBS_FILTER(MATCH2, IBS_OP_DATA3, IBS_OP_DATA3, 16, 0, 0x02, 0x10, 0x02, 0x10)
+#define IBS_FILTER_0xf213 IBS_FILTER(MATCH2, IBS_OP_DATA3, IBS_OP_DATA3, 16, 0, 0x02, 0x20, 0x02, 0x20)
+#define IBS_FILTER_0xf215 IBS_FILTER(MATCH2, IBS_OP_DATA3, IBS_OP_DATA3, 16, 0, 0x02, 0x1C, 0x02, 0x04)
+#define IBS_FILTER_0xf216 IBS_FILTER(MATCH2, IBS_OP_DATA3, IBS_OP_DATA3, 16, 0, 0x02, 0x1C, 0x02, 0x14)
+#if 0
+#define IBS_FILTER_0xf219 IBS_FILTER(COUNT, IBS_OP_DATA3, 32, 0x00FFFF, 0x00FFFF)
+#endif
+
+/*
+ * ID Name Derivation
+ *
+ * F240 IBS NB local ~NbIbsReqDstProc
+ * F241 IBS NB remote NbIbsReqDstProc
+ * F242 IBS NB local L3 NbIbsReqSrc=0x1 & ~NbIbsReqDstProc
+ * F243 IBS NB local L1/L2 (intercore) NbIbsReqSrc=0x2 & ~NbIbsReqDstProc
+ * F244 IBS NB remote L1/L2/L3 cache NbIbsReqSrc=0x2 & NbIbsReqDstProc
+ * F245 IBS NB local DRAM NbIbsReqSrc=0x3 & ~NbIbsReqDstProc
+ * F246 IBS NB remote DRAM NbIbsReqSrc=0x3 & NbIbsReqDstProc
+ * F247 IBS NB local other NbIbsReqSrc=0x7 & ~NbIbsReqDstProc
+ * F248 IBS NB remote other NbIbsReqSrc=0x7 & NbIbsReqDstProc
+ * F249 IBS NB cache M state NbIbsReqSrc=0x2 & ~NbIbsReqCacheHitSt
+ * F24A IBS NB cache O state NbIbsReqSrc=0x2 & NbIbsReqCacheHitSt
+ * F24B IBS NB local latency IbsDcMissLat when ~NbIbsReqDstProc
+ * F24C IBS NB remote latency IbsDcMissLat when NbIbsReqDstProc
+ */
+#define IBS_FILTER_0xf240 IBS_FILTER(MATCH, IBS_OP_DATA2, 0, 0x10, 0x00) IBS_FILTER(ANY_SET, IBS_OP_DATA2, 0, 0x0007)
+#define IBS_FILTER_0xf241 IBS_FILTER(MATCH, IBS_OP_DATA2, 0, 0x10, 0x10) IBS_FILTER(ANY_SET, IBS_OP_DATA2, 0, 0x0007)
+#define IBS_FILTER_0xf242 IBS_FILTER(MATCH, IBS_OP_DATA2, 0, 0x17, 0x01) IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x81, 0x81)
+#define IBS_FILTER_0xf243 IBS_FILTER(MATCH, IBS_OP_DATA2, 0, 0x17, 0x02) IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x81, 0x81)
+#define IBS_FILTER_0xf244 IBS_FILTER(MATCH, IBS_OP_DATA2, 0, 0x17, 0x12) IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x81, 0x81)
+#define IBS_FILTER_0xf245 IBS_FILTER(MATCH, IBS_OP_DATA2, 0, 0x17, 0x03) IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x81, 0x81)
+#define IBS_FILTER_0xf246 IBS_FILTER(MATCH, IBS_OP_DATA2, 0, 0x17, 0x13) IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x81, 0x81)
+#define IBS_FILTER_0xf247 IBS_FILTER(MATCH, IBS_OP_DATA2, 0, 0x17, 0x07) IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x81, 0x81)
+#define IBS_FILTER_0xf248 IBS_FILTER(MATCH, IBS_OP_DATA2, 0, 0x17, 0x17) IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x81, 0x81)
+#define IBS_FILTER_0xf249 IBS_FILTER(MATCH, IBS_OP_DATA2, 0, 0x27, 0x02) IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x81, 0x81)
+#define IBS_FILTER_0xf24a IBS_FILTER(MATCH, IBS_OP_DATA2, 0, 0x27, 0x22) IBS_FILTER(MATCH, IBS_OP_DATA3, 0, 0x81, 0x81)
+
+static struct ibs_event events[] = {
+ IBS_EVENT(0xf000, "ibs_fetch:ALL", "All IBS fetch samples"),
+ IBS_EVENT(0xf001, "ibs_fetch:KILLED", "IBS fetch killed"),
+ IBS_EVENT(0xf002, "ibs_fetch:ATTEMPTED", "IBS fetch attempted"),
+ IBS_EVENT(0xf003, "ibs_fetch:COMPLETED", "IBS fetch completed"),
+ IBS_EVENT(0xf004, "ibs_fetch:ABORTED", "IBS fetch aborted"),
+ IBS_EVENT(0xf005, "ibs_fetch:ITLB_HITS", "IBS ITLB hit"),
+ IBS_EVENT(0xf006, "ibs_fetch:L1_ITLB_MISSES_L2_ITLB_HITS", "IBS L1 ITLB misses (and L2 ITLB hits)"),
+ IBS_EVENT(0xf007, "ibs_fetch:L1_ITLB_MISSES_L2_ITLB_MISSES", "IBS L1 L2 ITLB miss"),
+ IBS_EVENT(0xf008, "ibs_fetch:ICACHE_MISSES", "IBS instruction cache misses"),
+ IBS_EVENT(0xf009, "ibs_fetch:ICACHE_HITS", "IBS instruction cache hit"),
+ IBS_EVENT(0xf00a, "ibs_fetch:4K_PAGE", "IBS 4K page translation"),
+ IBS_EVENT(0xf00b, "ibs_fetch:2M_PAGE", "IBS 2M page translation"),
+ IBS_EVENT(0xf00c, "ibs_fetch:1G_PAGE", "IBS 1G page translation"),
+#if 0
+ IBS_EVENT(0xf00e, "ibs_fetch:LATENCY", "IBS fetch latency"),
+#endif
+ IBS_EVENT(0xf100, "ibs_op:ALL", "All IBS op samples"),
+#if 0
+ IBS_EVENT(0xf101, "ibs_op:TAG_TO_RETIRE", "IBS tag-to-retire cycles"),
+ IBS_EVENT(0xf102, "ibs_op:COMP_TO_RET", "IBS completion-to-retire cycles"),
+#endif
+ IBS_EVENT(0xf103, "ibs_op:BRANCH_RETIRED", "IBS branch op"),
+ IBS_EVENT(0xf104, "ibs_op:MISPREDICTED_BRANCH", "IBS mispredicted branch op"),
+ IBS_EVENT(0xf105, "ibs_op:TAKEN_BRANCH", "IBS taken branch op"),
+ IBS_EVENT(0xf106, "ibs_op:MISPREDICTED_BRANCH_TAKEN", "IBS mispredicted taken branch op"),
+ IBS_EVENT(0xf107, "ibs_op:RETURNS", "IBS return op"),
+ IBS_EVENT(0xf108, "ibs_op:MISPREDICTED_RETURNS", "IBS mispredicted return op"),
+ IBS_EVENT(0xf109, "ibs_op:RESYNC", "IBS resync op"),
+ IBS_EVENT(0xf200, "ibs_op:ALL_LOAD_STORE", "IBS all load store ops"),
+ IBS_EVENT(0xf201, "ibs_op:LOAD", "IBS load ops"),
+ IBS_EVENT(0xf202, "ibs_op:STORE", "IBS store ops"),
+ IBS_EVENT(0xf203, "ibs_op:L1_DTLB_HITS", "IBS L1 DTLB hit"),
+ IBS_EVENT(0xf204, "ibs_op:L1_DTLB_MISS_L2_DTLB_HIT", "IBS L1 DTLB misses L2 hits"),
+ IBS_EVENT(0xf205, "ibs_op:L1_L2_DTLB_MISS", "IBS L1 and L2 DTLB misses"),
+ IBS_EVENT(0xf206, "ibs_op:DATA_CACHE_MISS", "IBS data cache misses"),
+ IBS_EVENT(0xf207, "ibs_op:DATA_HITS", "IBS data cache hits"),
+ IBS_EVENT(0xf208, "ibs_op:MISALIGNED_DATA_ACC", "IBS misaligned data access"),
+ IBS_EVENT(0xf209, "ibs_op:BANK_CONF_LOAD", "IBS bank conflict on load op"),
+#if 0
+ IBS_EVENT(0xf20a, "ibs_op:BANK_CONF_STORE", "IBS bank conflict on store op"),
+#endif
+ IBS_EVENT(0xf20b, "ibs_op:FORWARD", "IBS store-to-load forwarded"),
+ IBS_EVENT(0xf20c, "ibs_op:CANCELLED", "IBS store-to-load cancelled"),
+ IBS_EVENT(0xf20d, "ibs_op:DCUC_MEM_ACC", "IBS UC memory access"),
+ IBS_EVENT(0xf20e, "ibs_op:DCWC_MEM_ACC", "IBS WC memory access"),
+ IBS_EVENT(0xf20f, "ibs_op:LOCKED", "IBS locked operation"),
+ IBS_EVENT(0xf210, "ibs_op:MAB_HIT", "IBS MAB hit"),
+ IBS_EVENT(0xf211, "ibs_op:L1_DTLB_4K", "IBS L1 DTLB 4K page"),
+ IBS_EVENT(0xf212, "ibs_op:L1_DTLB_2M", "IBS L1 DTLB 2M page"),
+ IBS_EVENT(0xf213, "ibs_op:L1_DTLB_1G", "IBS L1 DTLB 1G page"),
+ IBS_EVENT(0xf215, "ibs_op:L2_DTLB_4K", "IBS L2 DTLB 4K page"),
+ IBS_EVENT(0xf216, "ibs_op:L2_DTLB_2M", "IBS L2 DTLB 2M page"),
+#if 0
+ IBS_EVENT(0xf217, "ibs_op:L2_DTLB_1G", "IBS L2 DTLB 1G page"),
+ IBS_EVENT(0xf219, "ibs_op:DC_LOAD_LAT", "IBS data cache miss load latency"),
+#endif
+ IBS_EVENT(0xf240, "ibs_op:NB_LOCAL_ONLY", "IBS Northbridge local"),
+ IBS_EVENT(0xf241, "ibs_op:NB_REMOTE_ONLY", "IBS Northbridge remote"),
+ IBS_EVENT(0xf242, "ibs_op:NB_LOCAL_L3", "IBS Northbridge local L3"),
+ IBS_EVENT(0xf243, "ibs_op:NB_LOCAL_CACHE", "IBS Northbridge local core L1 or L2 cache"),
+ IBS_EVENT(0xf244, "ibs_op:NB_REMOTE_CACHE", "IBS Northbridge local core L1, L2, L3 cache"),
+ IBS_EVENT(0xf245, "ibs_op:NB_LOCAL_DRAM", "IBS Northbridge local DRAM"),
+ IBS_EVENT(0xf246, "ibs_op:NB_REMOTE_DRAM", "IBS Northbridge remote DRAM"),
+ IBS_EVENT(0xf247, "ibs_op:NB_LOCAL_OTHER", "IBS Northbridge local APIC MMIO Config PCI"),
+ IBS_EVENT(0xf248, "ibs_op:NB_REMOTE_OTHER", "IBS Northbridge remote APIC MMIO Config PCI"),
+ IBS_EVENT(0xf249, "ibs_op:NB_CACHE_MODIFIED", "IBS Northbridge cache modified state"),
+ IBS_EVENT(0xf24a, "ibs_op:NB_CACHE_OWNED", "IBS Northbridge cache owned state"),
+#if 0
+ IBS_EVENT(0xf24b, "ibs_op:NB_LOCAL_CACHE_LAT", "IBS Northbridge local cache latency"),
+ IBS_EVENT(0xf24c, "ibs_op:NB_REMOTE_CACHE_LAT", "IBS Northbridge remote cache latency"),
+#endif
+ { 0, NULL, NULL, { .filter = { IBS_FILTER_MATCH_ANY() } } }
};

static int ibs_parse_event(struct perf_event_attr *attr, char *sys, char *name)
{
- const char **event;
+ struct ibs_event *event;

if (strcmp("ibs_op", sys) && strcmp("ibs_fetch", sys))
return -ENOENT;

- for (event = events; *event; event++) {
- if (!strcmp(*event + strlen(sys) + 1, name))
+ for (event = events; event->id; event++) {
+ if (!strcmp(event->name + strlen(sys) + 1, name))
goto match;
}

return -EINVAL;
match:
+ /* pseudo event found */
+ attr->config1 = event->config;
attr->sample_type = PERF_SAMPLE_CPU;

return 0;
@@ -97,13 +391,14 @@ match:

static void ibs_print_events(const char *sys)
{
- const char **event;
+ struct ibs_event *event;

printf("\n");

- for (event = events; *event; event++) {
- if (!strncmp(sys, *event, strlen(sys)))
- printf(" %-50s [PMU event: %s]\n", *event, sys);
+ for (event = events; event->id; event++) {
+ if (!strncmp(sys, event->name, strlen(sys)))
+ printf(" %-50s [PMU event: %s, id:0x%x]\n",
+ event->name, sys, event->id);
}
}

--
1.7.8.4

2012-05-07 11:01:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

On Wed, 2012-05-02 at 20:26 +0200, Robert Richter wrote:
> +enum ibs_filter_type {
> + IBS_NO_FILTER = 0,
> + IBS_MATCH_FILTER = 1,
> + IBS_ANY_SET_FILTER = 2,
> + IBS_PSEUDO_EVENT = 0x0F,
> +};
> +
> +struct ibs_filter {
> + struct {
> + u16 idx : 8;
> + u16 reserved : 4;
> + u16 type : 4;
> + };
> + union {
> + struct {
> + u8 mask;
> + u8 match;
> + };
> + u16 any;
> + };
> +};
> +
> +static bool
> +__perf_ibs_sample_matches(struct ibs_filter *filter, void *data, int size)
> +{
> + int left = size;
> +
> + switch (filter->type) {
> + case IBS_MATCH_FILTER:
> + left -= sizeof(u8);
> + break;
> + case IBS_ANY_SET_FILTER:
> + left -= sizeof(u16);
> + break;
> + default:
> + return false;
> + }
> +
> + left -= filter->idx;
> + if (left < 0)
> + return false;
> +
> + switch (filter->type) {
> + case IBS_MATCH_FILTER:
> + return ((*(u8*)(data + filter->idx)) & filter->mask) == filter->match;
> + case IBS_ANY_SET_FILTER:
> + return (*(u16*)(data + filter->idx)) & filter->any;
> + };
> +
> + return false;
> +}
> +
> +static bool perf_ibs_sample_matches(struct perf_event *event,
> + struct perf_ibs_data *data)
> +{
> + int i;
> + union {
> + struct ibs_filter filter[4];
> + u64 config[2];
> + } f;
> + struct ibs_filter *filter = f.filter;
> +
> + f.config[0] = event->attr.config1;
> + f.config[1] = event->attr.config2;
> +
> + for (i = 0; i < 4; i++, filter++) {
> + if (filter->type == IBS_NO_FILTER)
> + break;
> + if (!__perf_ibs_sample_matches(filter, data->regs, data->size))
> + return false;
> + }
> +
> + return true;
> +}

Who again wasn't decoding anything in perf_event_attr:config* ?

2012-05-07 11:01:08

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/7] perf tools: Add parser for dynamic PMU events

On Wed, 2012-05-02 at 20:26 +0200, Robert Richter wrote:
> This patch adds support for pmu specific event parsers by extending
> the pmu handler. The event syntax is the same as for tracepoints:
>
> <subsys>:<name>:<modifier>

I really don't like this

> - event_legacy_tracepoint sep_dc |
> + event_legacy_generic sep_dc |

That 'legacy' in there is a good hint you shouldn't be using this.

Ideally I'd get rid of all legacy formats eventually, we really
shouldn't be adding new ones.

2012-05-07 12:09:03

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

On Wed, 2012-05-02 at 20:26 +0200, Robert Richter wrote:

> This patch includes kernel and userland changes.
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event_amd_ibs.c | 83 ++++++-
> tools/perf/util/pmu-ibs.c | 433 +++++++++++++++++++++++++-----

I think its best to separate this.

> 2 files changed, 445 insertions(+), 71 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
> index 03743ad..1675479 100644
> --- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
> +++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
> @@ -478,6 +478,81 @@ static struct perf_ibs perf_ibs_op = {
> .get_count = get_ibs_op_count,
> };
>
> +enum ibs_filter_type {
> + IBS_NO_FILTER = 0,
> + IBS_MATCH_FILTER = 1,
> + IBS_ANY_SET_FILTER = 2,
> + IBS_PSEUDO_EVENT = 0x0F,
> +};
> +
> +struct ibs_filter {
> + struct {
> + u16 idx : 8;
> + u16 reserved : 4;
> + u16 type : 4;
> + };
> + union {
> + struct {
> + u8 mask;
> + u8 match;
> + };
> + u16 any;
> + };
> +};

> diff --git a/tools/perf/util/pmu-ibs.c b/tools/perf/util/pmu-ibs.c
> index 07acb82..604cb8c 100644
> --- a/tools/perf/util/pmu-ibs.c
> +++ b/tools/perf/util/pmu-ibs.c


> +enum ibs_filter_type {
> + IBS_NO_FILTER = 0,
> + IBS_MATCH_FILTER = 1,
> + IBS_ANY_SET_FILTER = 2,
> + IBS_PSEUDO_EVENT = 0x0F,
> +};
> +
> +struct ibs_filter {
> + struct {
> + __u16 idx : 8;
> + __u16 reserved : 4;
> + __u16 type : 4;
> + };
> + union {
> + struct {
> + __u8 mask;
> + __u8 match;
> + };
> + __u16 any;
> + };
> +};

Having two copies of this just stinks, its only matter of time before
one receives changes the other doesn't.

2012-05-07 13:03:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

On Wed, 2012-05-02 at 20:26 +0200, Robert Richter wrote:
> +enum ibs_filter_type {
> + IBS_NO_FILTER = 0,
> + IBS_MATCH_FILTER = 1,
> + IBS_ANY_SET_FILTER = 2,
> + IBS_PSEUDO_EVENT = 0x0F,
> +};

I don't get how those pseudo events work, AFAIKT IBS_PSEUDO_EVENT causes
one to loose all events since it does have a filter set but fails the
filter and thus we skip the call to perf_event_overflow().

Furthermore, I think this filter stuff should accumulate the period so
that PERF_SAMPLE_PERIOD still works correctly and reflects the number of
counts since the last sample.

Also, what's the point of filtering in-kernel as opposed to doing all
this in userspace? Is the amount of data 'saved' significant enough?


2012-05-07 13:05:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 5/7] perf, tools: Add raw event support for dynamic allocated pmus

On Wed, 2012-05-02 at 20:26 +0200, Robert Richter wrote:
> This patch extends the event parser to pass raw config values to a
> dynamic allocated pmu. The following event syntax is supported now:
>
> <pmu_name>:<raw_config_value>:<modifier>
>
> Example for the ibs pmu:
>
> # perf record -a -e ibs_op:r000 ...

Yeah, clearly not going to happen...

If you'd used the existing stuff you would have been able to do:

ibs_ops/field=0x000/


2012-05-07 14:20:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 5/7] perf, tools: Add raw event support for dynamic allocated pmus

On Mon, 2012-05-07 at 15:05 +0200, Peter Zijlstra wrote:
> On Wed, 2012-05-02 at 20:26 +0200, Robert Richter wrote:
> > This patch extends the event parser to pass raw config values to a
> > dynamic allocated pmu. The following event syntax is supported now:
> >
> > <pmu_name>:<raw_config_value>:<modifier>
> >
> > Example for the ibs pmu:
> >
> > # perf record -a -e ibs_op:r000 ...
>
> Yeah, clearly not going to happen...
>
> If you'd used the existing stuff you would have been able to do:
>
> ibs_ops/field=0x000/

in fact: ibs_ops/config=0x000,config1=0x111,config2=0x222/ should
already work.

Subject: Re: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

On 07.05.12 13:00:54, Peter Zijlstra wrote:
> Who again wasn't decoding anything in perf_event_attr:config* ?

attr:config is one of the ibs control msrs comparable with perfctr's
evntsel msr:

MSRC001_1030 IBS Fetch Control Register (IbsFetchCtl)
MSRC001_1033 IBS Execution Control Register (IbsOpCtl)

There are some options (randomisation, cycle/micro-op counting) but
usually it is null since the period is encoded in attr:period. But ibs
could be setup by an application using attr:config only which then
passes the value directly to the ctl msr.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

2012-05-07 15:15:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

On Mon, 2012-05-07 at 16:47 +0200, Robert Richter wrote:
> > Who again wasn't decoding anything in perf_event_attr:config* ?
>
> attr:config is one of the ibs control msrs comparable with perfctr's
> evntsel msr:
>
> MSRC001_1030 IBS Fetch Control Register (IbsFetchCtl)
> MSRC001_1033 IBS Execution Control Register (IbsOpCtl)

You missed reading a '*', even so:

> There are some options (randomisation, cycle/micro-op counting) but
> usually it is null since the period is encoded in attr:period. But ibs
> could be setup by an application using attr:config only which then
> passes the value directly to the ctl msr.

PMU_FORMAT_ATTR(IbsFetchMaxCnt, "config:0-15" );
PMU_FORMAT_ATTR(IbsFetchCnt, "config:16-31" );
PMU_FORMAT_ATTR(IbsFetchVal, "config:49" );
PMU_FORMAT_ATTR(IbsRandEn, "config:57" );

and

PMU_FORMAT_ATTR(IbsOpMaxCnt, "config:0-15" );
PMU_FORMAT_ATTR(IbsOpVal, "config:18" );
PMU_FORMAT_ATTR(IbsOpCntCtl, "config:19" ); /* subject to ibs_caps */

Are the writable bitfields of those two MSRs resp.

This patch adds:

PMU_FORMAT_ATTR(IbsFilter0Idx, "config1:0-7" );
PMU_FORMAT_ATTR(IbsFilter0Type, "config1:12-15" );
PMU_FORMAT_ATTR(IbsFilter0Mask, "config1:16-23" );
PMU_FORMAT_ATTR(IbsFilter0Match,"config1:24-31" );
PMU_FORMAT_ATTR(IbsFilter0Any, "config1:16-31" );

PMU_FORMAT_ATTR(IbsFilter1Idx, "config1:32-39" );
PMU_FORMAT_ATTR(IbsFilter1Type, "config1:44-47" );
PMU_FORMAT_ATTR(IbsFilter1Mask, "config1:48-55" );
PMU_FORMAT_ATTR(IbsFilter1Match,"config1:56-63" );
PMU_FORMAT_ATTR(IbsFilter1Any, "config1:48-63" );

PMU_FORMAT_ATTR(IbsFilter2Idx, "config2:0-7" );
PMU_FORMAT_ATTR(IbsFilter2Type, "config2:12-15" );
PMU_FORMAT_ATTR(IbsFilter2Mask, "config2:16-23" );
PMU_FORMAT_ATTR(IbsFilter2Match,"config2:24-31" );
PMU_FORMAT_ATTR(IbsFilter2Any, "config2:16-31" );

PMU_FORMAT_ATTR(IbsFilter3Idx, "config2:32-39" );
PMU_FORMAT_ATTR(IbsFilter3Type, "config2:44-47" );
PMU_FORMAT_ATTR(IbsFilter3Mask, "config2:48-55" );
PMU_FORMAT_ATTR(IbsFilter3Match,"config2:56-63" );
PMU_FORMAT_ATTR(IbsFilter3Any, "config2:48-63" );

And you can write your events like:

ibs_fetch/IbsFilter0Type=1,IbsFilter0Idx=48,IbsFilter0Mask=0x4,IbsFilter0Match=0x4/

No need to duplicate the struct and you're free to re-arrange the actual
bitfields if there ever is a need.

Even more, if you were to expose these events through sysfs
ibs_fetch/events/$foo you could modify the lot and it'd still all work.

No need to query cpuid to figure out if you're fam 10h+, no need to read
ibs_caps in userspace to figure out if config:19 is available, and no
need to duplicate that struct.

And don't tell me your config[12] fields are spec'ed somewhere..




2012-05-07 15:21:57

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

Robert,

There is something I don't quite understand with those pseudo-events.
Is it the case that by construction, it means you can only measure
on pseudo-event at a time? Supposed I want to look at cache-misses.
For each miss, I'd like to know where it missed, any TLB impacts. All
of that in one run with no multiplexing. Can I do this with your pseudo-events?



On Wed, May 2, 2012 at 8:26 PM, Robert Richter <[email protected]> wrote:
> This patch implements support for IBS pseudo events. Pseudo events are
> derived from an IBS sample and determined through a combination of one
> or more IBS event flags or values. See here for a full description:
>
>  Software Optimization Guide for AMD Family 15h Processors
>  Appendix F Guide to Instruction-Based Sampling on AMD Family 15h Processors
>  Advanced Micro Devices, Inc.
>  Publication No. 47414, Revision 3.06
>  January 2012
>  http://support.amd.com/us/Processor_TechDocs/47414_15h_sw_opt_guide.pdf
>
> The list of supported events is provided by perf-list. A pseudo event
> can be set up like this:
>
>  # perf record -a -e ibs_op:MISPREDICTED_BRANCH ...
>
> The filter rules for IBS samples depending on a pseudo event are also
> described in the document above. The filter is setup in the perf tool
> pmu handler and passed to the kernel via config1/config2 attr values.
> The interface is extendable to pass the pseudo events directly to the
> kernel.
>
> There are some pseudo events capable to count latencies or other
> values. Counting values of such events is not yet supported.
>
> This patch includes kernel and userland changes.
>
> Signed-off-by: Robert Richter <[email protected]>
> ---
>  arch/x86/kernel/cpu/perf_event_amd_ibs.c |   83 ++++++-
>  tools/perf/util/pmu-ibs.c                |  433 +++++++++++++++++++++++++-----
>  2 files changed, 445 insertions(+), 71 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
> index 03743ad..1675479 100644
> --- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
> +++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
> @@ -478,6 +478,81 @@ static struct perf_ibs perf_ibs_op = {
>        .get_count              = get_ibs_op_count,
>  };
>
> +enum ibs_filter_type {
> +       IBS_NO_FILTER           = 0,
> +       IBS_MATCH_FILTER        = 1,
> +       IBS_ANY_SET_FILTER      = 2,
> +       IBS_PSEUDO_EVENT        = 0x0F,
> +};
> +
> +struct ibs_filter {
> +       struct {
> +               u16             idx             : 8;
> +               u16             reserved        : 4;
> +               u16             type            : 4;
> +       };
> +       union {
> +               struct {
> +                       u8      mask;
> +                       u8      match;
> +               };
> +               u16             any;
> +       };
> +};
> +
> +static bool
> +__perf_ibs_sample_matches(struct ibs_filter *filter, void *data, int size)
> +{
> +       int left = size;
> +
> +       switch (filter->type) {
> +       case IBS_MATCH_FILTER:
> +               left -= sizeof(u8);
> +               break;
> +       case IBS_ANY_SET_FILTER:
> +               left -= sizeof(u16);
> +               break;
> +       default:
> +               return false;
> +       }
> +
> +       left -= filter->idx;
> +       if (left < 0)
> +               return false;
> +
> +       switch (filter->type) {
> +       case IBS_MATCH_FILTER:
> +               return ((*(u8*)(data + filter->idx)) & filter->mask) == filter->match;
> +       case IBS_ANY_SET_FILTER:
> +               return (*(u16*)(data + filter->idx)) & filter->any;
> +       };
> +
> +       return false;
> +}
> +
> +static bool perf_ibs_sample_matches(struct perf_event *event,
> +                                   struct perf_ibs_data *data)
> +{
> +       int i;
> +       union {
> +               struct ibs_filter filter[4];
> +               u64     config[2];
> +       } f;
> +       struct ibs_filter *filter = f.filter;
> +
> +       f.config[0] = event->attr.config1;
> +       f.config[1] = event->attr.config2;
> +
> +       for (i = 0; i < 4; i++, filter++) {
> +               if (filter->type == IBS_NO_FILTER)
> +                       break;
> +               if (!__perf_ibs_sample_matches(filter, data->regs, data->size))
> +                       return false;
> +       }
> +
> +       return true;
> +}
> +
>  static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>  {
>        struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
> @@ -487,7 +562,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>        struct perf_raw_record raw;
>        struct pt_regs regs;
>        struct perf_ibs_data ibs_data;
> -       int offset, size, check_rip, offset_max, throttle = 0;
> +       int offset, size, check_rip, filter, offset_max, throttle = 0;
>        unsigned int msr;
>        u64 *buf, *config, period;
>
> @@ -517,7 +592,8 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>        size = 1;
>        offset = 1;
>        check_rip = (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_RIPINVALIDCHK));
> -       if (event->attr.sample_type & PERF_SAMPLE_RAW)
> +       filter = (event->attr.config1 != 0);
> +       if (filter || (event->attr.sample_type & PERF_SAMPLE_RAW))
>                offset_max = perf_ibs->offset_max;
>        else if (check_rip)
>                offset_max = 2;
> @@ -532,6 +608,9 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>        } while (offset < offset_max);
>        ibs_data.size = sizeof(u64) * size;
>
> +       if (filter && !perf_ibs_sample_matches(event, &ibs_data))
> +               goto out;
> +
>        regs = *iregs;
>        if (check_rip && (ibs_data.regs[2] & IBS_RIP_INVALID)) {
>                regs.flags &= ~PERF_EFLAGS_EXACT;
> diff --git a/tools/perf/util/pmu-ibs.c b/tools/perf/util/pmu-ibs.c
> index 07acb82..604cb8c 100644
> --- a/tools/perf/util/pmu-ibs.c
> +++ b/tools/perf/util/pmu-ibs.c
> @@ -12,84 +12,378 @@
>  #include <linux/compiler.h>
>  #include "pmu.h"
>
> -static const char *events[] = {
> -       "ibs_fetch:2M_PAGE",
> -       "ibs_fetch:4K_PAGE",
> -       "ibs_fetch:ABORTED",
> -       "ibs_fetch:ALL",
> -       "ibs_fetch:ATTEMPTED",
> -       "ibs_fetch:COMPLETED",
> -       "ibs_fetch:ICACHE_HITS",
> -       "ibs_fetch:ICACHE_MISSES",
> -       "ibs_fetch:ITLB_HITS",
> -       "ibs_fetch:KILLED",
> -       "ibs_fetch:L1_ITLB_MISSES_L2_ITLB_HITS",
> -       "ibs_fetch:L1_ITLB_MISSES_L2_ITLB_MISSES",
> -       "ibs_fetch:LATENCY",
> -       "ibs_op:ALL",
> -       "ibs_op:ALL_LOAD_STORE",
> -       "ibs_op:BANK_CONF_LOAD",
> -       "ibs_op:BANK_CONF_STORE",
> -       "ibs_op:BRANCH_RETIRED",
> -       "ibs_op:CANCELLED",
> -       "ibs_op:COMP_TO_RET",
> -       "ibs_op:DATA_CACHE_MISS",
> -       "ibs_op:DATA_HITS",
> -       "ibs_op:DC_LOAD_LAT",
> -       "ibs_op:DCUC_MEM_ACC",
> -       "ibs_op:DCWC_MEM_ACC",
> -       "ibs_op:FORWARD",
> -       "ibs_op:L1_DTLB_1G",
> -       "ibs_op:L1_DTLB_2M",
> -       "ibs_op:L1_DTLB_4K",
> -       "ibs_op:L1_DTLB_HITS",
> -       "ibs_op:L1_DTLB_MISS_L2_DTLB_HIT",
> -       "ibs_op:L1_L2_DTLB_MISS",
> -       "ibs_op:L2_DTLB_1G",
> -       "ibs_op:L2_DTLB_2M",
> -       "ibs_op:L2_DTLB_4K",
> -       "ibs_op:LOAD",
> -       "ibs_op:LOCKED",
> -       "ibs_op:MAB_HIT",
> -       "ibs_op:MISALIGNED_DATA_ACC",
> -       "ibs_op:MISPREDICTED_BRANCH",
> -       "ibs_op:MISPREDICTED_BRANCH_TAKEN",
> -       "ibs_op:MISPREDICTED_RETURNS",
> -       "ibs_op:NB_CACHE_MODIFIED",
> -       "ibs_op:NB_CACHE_OWNED",
> -       "ibs_op:NB_LOCAL_CACHE",
> -       "ibs_op:NB_LOCAL_CACHE_LAT",
> -       "ibs_op:NB_LOCAL_DRAM",
> -       "ibs_op:NB_LOCAL_L3",
> -       "ibs_op:NB_LOCAL_ONLY",
> -       "ibs_op:NB_LOCAL_OTHER",
> -       "ibs_op:NB_REMOTE_CACHE",
> -       "ibs_op:NB_REMOTE_CACHE_LAT",
> -       "ibs_op:NB_REMOTE_DRAM",
> -       "ibs_op:NB_REMOTE_ONLY",
> -       "ibs_op:NB_REMOTE_OTHER",
> -       "ibs_op:RESYNC",
> -       "ibs_op:RETURNS",
> -       "ibs_op:STORE",
> -       "ibs_op:TAG_TO_RETIRE",
> -       "ibs_op:TAKEN_BRANCH",
> -       NULL
> +enum ibs_filter_type {
> +       IBS_NO_FILTER           = 0,
> +       IBS_MATCH_FILTER        = 1,
> +       IBS_ANY_SET_FILTER      = 2,
> +       IBS_PSEUDO_EVENT        = 0x0F,
> +};
> +
> +struct ibs_filter {
> +       struct {
> +               __u16           idx             : 8;
> +               __u16           reserved        : 4;
> +               __u16           type            : 4;
> +       };
> +       union {
> +               struct {
> +                       __u8    mask;
> +                       __u8    match;
> +               };
> +               __u16           any;
> +       };
> +};
> +
> +struct ibs_event {
> +       __u16                   id;
> +       const char              *name;
> +       const char              *desc;
> +       union {
> +               __u16           pseudo_event;
> +               __u64           config;
> +               struct ibs_filter filter[2];
> +       };
> +};
> +
> +#define IBS_FETCH_CTL          0
> +#define IBS_OP_DATA            2
> +#define IBS_OP_DATA2           3
> +#define IBS_OP_DATA3           4
> +
> +#define IBS_IDX(reg, bit)      ((reg)<<3)+((bit)>>3)
> +#define IBS_MASK(bit, m)       (0xFF&m)
> +#define IBS_MASK16(bit, m)     (0xFFFF&m)
> +#define IBS_FILTER_MATCH_ANY() { { 0, 0, 0 }, { .any = 0 } }
> +
> +#define IBS_FILTER_ANY_SET(reg, bit, m)                                \
> +       {                                                       \
> +               {                                               \
> +               .type   = IBS_ANY_SET_FILTER,                   \
> +               .idx    = IBS_IDX(reg, bit),                    \
> +               .reserved = 0,                                  \
> +               },{                                             \
> +               .any    = IBS_MASK16(bit, m),                   \
> +               }                                               \
> +       },
> +
> +#define IBS_FILTER_ALL_CLEAR(reg, bit, m)                      \
> +       {                                                       \
> +               {                                               \
> +               .type   = IBS_MATCH_FILTER,                     \
> +               .idx    = IBS_IDX(reg, bit),                    \
> +               .reserved = 0,                                  \
> +               },{{                                            \
> +               .mask   = IBS_MASK(bit, m),                     \
> +               .match = 0,                                     \
> +               }}                                              \
> +       },                                                      \
> +       {                                                       \
> +               {                                               \
> +               .type   = IBS_MATCH_FILTER,                     \
> +               .idx    = IBS_IDX(reg, (bit) + 8),              \
> +               .reserved = 0,                                  \
> +               },{{                                            \
> +               .mask   = IBS_MASK(bit, (m) >> 8),              \
> +               .match  = 0,                                    \
> +               }}                                              \
> +       },
> +
> +#define IBS_FILTER_ALL_SET(reg, bit, m)                                \
> +       {                                                       \
> +               {                                               \
> +               .type   = IBS_MATCH_FILTER,                     \
> +               .idx    = IBS_IDX(reg, bit),                    \
> +               .reserved = 0,                                  \
> +               },{{                                            \
> +               .mask   = IBS_MASK(bit, m),                     \
> +               .match  = IBS_MASK(bit, m),                     \
> +               }}                                              \
> +       },
> +
> +#define IBS_FILTER_MATCH(reg, bit, m, v)                       \
> +       {                                                       \
> +               {                                               \
> +               .type   = IBS_MATCH_FILTER,                     \
> +               .idx    = IBS_IDX(reg, bit),                    \
> +               .reserved = 0,                                  \
> +               },{{                                            \
> +               .mask   = IBS_MASK(bit, m),                     \
> +               .match  = IBS_MASK(bit, v),                     \
> +               }}                                              \
> +       },
> +
> +#define IBS_FILTER_MATCH2(reg, reg2, bit, bit2, m, m2, v, v2)  \
> +       {                                                       \
> +               {                                               \
> +               .type   = IBS_MATCH_FILTER,                     \
> +               .idx    = IBS_IDX(reg, bit),                    \
> +               .reserved = 0,                                  \
> +               },{{                                            \
> +               .mask   = IBS_MASK(bit, m),                     \
> +               .match  = IBS_MASK(bit, v),                     \
> +               }}                                              \
> +       },                                                      \
> +       {                                                       \
> +               {                                               \
> +               .type   = IBS_MATCH_FILTER,                     \
> +               .idx    = IBS_IDX(reg2, bit2),                  \
> +               .reserved = 0,                                  \
> +               },{{                                            \
> +               .mask   = IBS_MASK(bit2, m2),                   \
> +               .match  = IBS_MASK(bit2, v2),                   \
> +               }}                                              \
> +       }
> +
> +#define IBS_EVENT(i, n, d)                                     \
> +       {                                                       \
> +               .id     = (i),                                  \
> +               .name   = (n),                                  \
> +               .desc   = (d),                                  \
> +               { .filter = { IBS_FILTER_##i } },               \
> +       }
> +#define IBS_FILTER(type, args...)              IBS_FILTER_##type(args)
> +
> +/*
> + * ID   Name                             Derivation
> + *
> + * F000 IBS fetch samples                Number of all IBS fetch samples
> + * F001 IBS fetch killed                 Number of killed IBS fetch samples
> + * F002 IBS fetch attempted              Number of non-killed IBS fetch samples
> + * F003 IBS fetch completed              IbsFetchComp
> + * F004 IBS fetch aborted                ~IbsFetchComp
> + * F005 IBS L1 ITLB hit                  ~IbsL1TlbMiss & IbsPhyAddrValid
> + * F006 IBS L1 ITLB miss, L2 ITLB hit    IbsL1TlbMiss & ~IbsL2TlbMiss
> + * F007 IBS L1 ITLB miss, L2 ITLB miss   IbsL1TlbMiss & IbsL2TlbMiss
> + * F008 IBS instruction cache miss       IbsIcMiss
> + * F009 IBS instruction cache hit        IbsFetchComp & ~IbsIcMiss
> + * F00A IBS 4K page translation          IbsL1TlbPgSz=0 & IbsPhyAddrValid
> + * F00B IBS 2M page translation          IbsL1TlbPgSz=1 & IbsPhyAddrValid
> + * F00C IBS 1G page translation          IbsL1TlbPgSz=2 & IbsPhyAddrValid
> + * F00D Reserved
> + * F00E IBS fetch latency                IbsfetchLat
> + */
> +#define IBS_FILTER_0xf000 IBS_FILTER(MATCH_ANY)
> +#define IBS_FILTER_0xf001 IBS_FILTER(ALL_CLEAR,        IBS_FETCH_CTL,  48, 0x019c)
> +#define IBS_FILTER_0xf002 IBS_FILTER(ANY_SET,  IBS_FETCH_CTL,  48, 0x019c)
> +#define IBS_FILTER_0xf003 IBS_FILTER(MATCH,    IBS_FETCH_CTL,  48, 0x04, 0x04)
> +#define IBS_FILTER_0xf004 IBS_FILTER(MATCH,    IBS_FETCH_CTL,  48, 0x04, 0x00)
> +#define IBS_FILTER_0xf005 IBS_FILTER(MATCH,    IBS_FETCH_CTL,  48, 0x90, 0x10)
> +#define IBS_FILTER_0xf006 IBS_FILTER(MATCH2,   IBS_FETCH_CTL, IBS_FETCH_CTL, 56, 48, 0x01, 0x80, 0x00, 0x80)
> +#define IBS_FILTER_0xf007 IBS_FILTER(MATCH2,   IBS_FETCH_CTL, IBS_FETCH_CTL, 56, 48, 0x01, 0x80, 0x01, 0x80)
> +#define IBS_FILTER_0xf008 IBS_FILTER(MATCH,    IBS_FETCH_CTL,  48, 0x08, 0x08)
> +#define IBS_FILTER_0xf009 IBS_FILTER(MATCH,    IBS_FETCH_CTL,  48, 0x0C, 0x04)
> +#define IBS_FILTER_0xf00a IBS_FILTER(MATCH,    IBS_FETCH_CTL,  48, 0x70, 0x10)
> +#define IBS_FILTER_0xf00b IBS_FILTER(MATCH,    IBS_FETCH_CTL,  48, 0x70, 0x30)
> +#define IBS_FILTER_0xf00c IBS_FILTER(MATCH,    IBS_FETCH_CTL,  48, 0x70, 0x60)
> +#if 0
> +#define IBS_FILTER_0xf00e IBS_FILTER(COUNT,    IBS_FETCH_CTL,  32, 0xffff)
> +#endif
> +
> +/*
> + * ID   Name                             Derivation
> + *
> + * F100 IBS all op samples               Number of all IBS op samples
> + * F101 IBS tag to retire cycles         Sum of all tag to retire cycles
> + * F102 ibs completion to retire cycles  Sum of all completion to retire cycles
> + * F103 IBS branch op                    IbsOpBrnRet
> + * F104 IBS mispredicted branch op       IbsOpBrnRet & IbsOpBrnMisp
> + * F105 IBS taken branch op              IbsOpBrnRet & IbsOpBrnTaken
> + * F106 IBS mispredicted taken branch op IbsOpBrnRet & IbsOpBrnTaken & IbsOpBrnMisp
> + * F107 IBS return op                    IbsOpReturn
> + * F108 IBS mispredicted return op       IbsOpReturn & IbsOpMispReturn
> + * F109 IBS resync op                    IbsOpBrnResync
> + */
> +#define IBS_FILTER_0xf100 IBS_FILTER(MATCH_ANY)
> +#if 0
> +#define IBS_FILTER_0xf101 IBS_FILTER(COUNT,    IBS_OP_DATA,    16, 0xffff)
> +#define IBS_FILTER_0xf102 IBS_FILTER(COUNT,    IBS_OP_DATA,     0, 0xffff)
> +#endif
> +#define IBS_FILTER_0xf103 IBS_FILTER(MATCH,    IBS_OP_DATA,    32, 0x20, 0x20)
> +#define IBS_FILTER_0xf104 IBS_FILTER(MATCH,    IBS_OP_DATA,    32, 0x30, 0x30)
> +#define IBS_FILTER_0xf105 IBS_FILTER(MATCH,    IBS_OP_DATA,    32, 0x28, 0x28)
> +#define IBS_FILTER_0xf106 IBS_FILTER(MATCH,    IBS_OP_DATA,    32, 0x38, 0x38)
> +#define IBS_FILTER_0xf107 IBS_FILTER(MATCH,    IBS_OP_DATA,    32, 0x04, 0x04)
> +#define IBS_FILTER_0xf108 IBS_FILTER(MATCH,    IBS_OP_DATA,    32, 0x06, 0x06)
> +#define IBS_FILTER_0xf109 IBS_FILTER(MATCH,    IBS_OP_DATA,    32, 0x01, 0x01)
> +
> +/*
> + * ID   Name                             Derivation
> + *
> + * F200 IBS All Load/Store Ops           IbsLdOp | IbsStOp
> + * F201 IBS Load Ops                     IbsLdOp
> + * F202 IBS Store Ops                    IbsStOp
> + * F203 IBS L1 DTLB Hit                  ~IbsDcL1tlbMiss & IbsDcLinAddrValid
> + * F204 IBS L1 DTLB Miss L2 DTLB Hit     IbsDcL1tlbMiss & ~IbsDcL2tlbMiss
> + * F205 IBS L1 DTLB Miss L2 DTLB Miss    IbsDcL1tlbMiss & IbsDcL2tlbMiss
> + * F206 IBS DC Miss                      IbsDcMiss
> + * F207 IBS DC Hit                       ~IbsDcMiss
> + * F208 IBS Misaligned Access            IbsDcMisAcc
> + * F209 IBS Bank Conflict On Load Op     IbsDcLdBnkCon
> + * F20A Reserved
> + * F20B IBS Store to Load Forwarded      IbsDcStToLdFwd
> + * F20C IBSStore to Load Forwarding Cancelled IbsDcStToLdCan
> + * F20D IBS UC memory access             IbsDcUcMemAcc
> + * F20E IBS WC memory access             IbsDcWcMemAcc
> + * F20F IBS locked operation             IbsDcLockedOp
> + * F210 IBS MAB hit                      IbsDcMabHit
> + * F211 IBS L1 DTLB 4K page              ~IbsDcL1tlbHit2M & ~IbsDcL1tlbHit1G &
> + *                                       IbsDcLinAddrValid
> + * F212 IBS L1 DTLB 2M page              IbsDcL1tlbHit2M & IbsDcLinAddrValid
> + * F213 IBS L1 DTLB 1G page              IbsDcL1tlbHit1G & IbsDcLinAddrValid
> + * F214 Reserved
> + * F215 IBS L2 DTLB 4K page              ~IbsDcL2tlbMiss & IbsDcL1tlbMiss &
> + *                                       ~IbsDcL1tlbHit2M & lbsDcLinAddrValid
> + * F216 IBS L2 DTLB 2M page              ~IbsDcL2tlbMiss & IbsDcL1tlbMiss &
> + *                                       IbsDcL1tlbHit2M & lbsDcLinAddrValid
> + * F217 Reserved
> + * F218 Reserved
> + * F219 IBS DC miss load latency         IbsDcMissLat when IbsLdOp & IbsDcMiss
> + */
> +#define IBS_FILTER_0xf200 IBS_FILTER(ANY_SET,  IBS_OP_DATA3,    0, 0x0003)
> +#define IBS_FILTER_0xf201 IBS_FILTER(MATCH,    IBS_OP_DATA3,    0, 0x01, 0x01)
> +#define IBS_FILTER_0xf202 IBS_FILTER(MATCH,    IBS_OP_DATA3,    0, 0x02, 0x02)
> +#define IBS_FILTER_0xf203 IBS_FILTER(MATCH2,   IBS_OP_DATA3, IBS_OP_DATA3, 16, 0, 0x02, 0x04, 0x02, 0x00)
> +#define IBS_FILTER_0xf204 IBS_FILTER(MATCH,    IBS_OP_DATA3,    0, 0x0C, 0x08)
> +#define IBS_FILTER_0xf205 IBS_FILTER(MATCH,    IBS_OP_DATA3,    0, 0x0C, 0x0C)
> +#define IBS_FILTER_0xf206 IBS_FILTER(MATCH,    IBS_OP_DATA3,    0, 0x80, 0x80)
> +#define IBS_FILTER_0xf207 IBS_FILTER(MATCH,    IBS_OP_DATA3,    0, 0x80, 0x00) IBS_FILTER(ANY_SET,   IBS_OP_DATA3,    0, 0x0003)
> +#define IBS_FILTER_0xf208 IBS_FILTER(MATCH,    IBS_OP_DATA3,    8, 0x01, 0x01)
> +#define IBS_FILTER_0xf209 IBS_FILTER(MATCH,    IBS_OP_DATA3,    8, 0x02, 0x02)
> +#define IBS_FILTER_0xf20b IBS_FILTER(MATCH,    IBS_OP_DATA3,    8, 0x08, 0x08)
> +#define IBS_FILTER_0xf20c IBS_FILTER(MATCH,    IBS_OP_DATA3,    8, 0x10, 0x10)
> +#define IBS_FILTER_0xf20d IBS_FILTER(MATCH,    IBS_OP_DATA3,    8, 0x40, 0x40)
> +#define IBS_FILTER_0xf20e IBS_FILTER(MATCH,    IBS_OP_DATA3,    8, 0x20, 0x20)
> +#define IBS_FILTER_0xf20f IBS_FILTER(MATCH,    IBS_OP_DATA3,    8, 0x80, 0x80)
> +#define IBS_FILTER_0xf210 IBS_FILTER(MATCH,    IBS_OP_DATA3,   16, 0x01, 0x01)
> +#define IBS_FILTER_0xf211 IBS_FILTER(MATCH2,   IBS_OP_DATA3, IBS_OP_DATA3, 16, 0, 0x02, 0x30, 0x02, 0x00)
> +#define IBS_FILTER_0xf212 IBS_FILTER(MATCH2,   IBS_OP_DATA3, IBS_OP_DATA3, 16, 0, 0x02, 0x10, 0x02, 0x10)
> +#define IBS_FILTER_0xf213 IBS_FILTER(MATCH2,   IBS_OP_DATA3, IBS_OP_DATA3, 16, 0, 0x02, 0x20, 0x02, 0x20)
> +#define IBS_FILTER_0xf215 IBS_FILTER(MATCH2,   IBS_OP_DATA3, IBS_OP_DATA3, 16, 0, 0x02, 0x1C, 0x02, 0x04)
> +#define IBS_FILTER_0xf216 IBS_FILTER(MATCH2,   IBS_OP_DATA3, IBS_OP_DATA3, 16, 0, 0x02, 0x1C, 0x02, 0x14)
> +#if 0
> +#define IBS_FILTER_0xf219 IBS_FILTER(COUNT,    IBS_OP_DATA3,   32, 0x00FFFF, 0x00FFFF)
> +#endif
> +
> +/*
> + * ID   Name                             Derivation
> + *
> + * F240 IBS NB local                     ~NbIbsReqDstProc
> + * F241 IBS NB remote                    NbIbsReqDstProc
> + * F242 IBS NB local L3                  NbIbsReqSrc=0x1 & ~NbIbsReqDstProc
> + * F243 IBS NB local L1/L2 (intercore)   NbIbsReqSrc=0x2 & ~NbIbsReqDstProc
> + * F244 IBS NB remote L1/L2/L3 cache     NbIbsReqSrc=0x2 & NbIbsReqDstProc
> + * F245 IBS NB local DRAM                NbIbsReqSrc=0x3 & ~NbIbsReqDstProc
> + * F246 IBS NB remote DRAM               NbIbsReqSrc=0x3 & NbIbsReqDstProc
> + * F247 IBS NB local other               NbIbsReqSrc=0x7 & ~NbIbsReqDstProc
> + * F248 IBS NB remote other              NbIbsReqSrc=0x7 & NbIbsReqDstProc
> + * F249 IBS NB cache M state             NbIbsReqSrc=0x2 & ~NbIbsReqCacheHitSt
> + * F24A IBS NB cache O state             NbIbsReqSrc=0x2 & NbIbsReqCacheHitSt
> + * F24B IBS NB local latency             IbsDcMissLat when ~NbIbsReqDstProc
> + * F24C IBS NB remote latency            IbsDcMissLat when NbIbsReqDstProc
> + */
> +#define IBS_FILTER_0xf240 IBS_FILTER(MATCH,    IBS_OP_DATA2,    0, 0x10, 0x00) IBS_FILTER(ANY_SET,     IBS_OP_DATA2,   0, 0x0007)
> +#define IBS_FILTER_0xf241 IBS_FILTER(MATCH,    IBS_OP_DATA2,    0, 0x10, 0x10) IBS_FILTER(ANY_SET,     IBS_OP_DATA2,   0, 0x0007)
> +#define IBS_FILTER_0xf242 IBS_FILTER(MATCH,    IBS_OP_DATA2,    0, 0x17, 0x01) IBS_FILTER(MATCH,       IBS_OP_DATA3,   0, 0x81, 0x81)
> +#define IBS_FILTER_0xf243 IBS_FILTER(MATCH,    IBS_OP_DATA2,    0, 0x17, 0x02) IBS_FILTER(MATCH,       IBS_OP_DATA3,   0, 0x81, 0x81)
> +#define IBS_FILTER_0xf244 IBS_FILTER(MATCH,    IBS_OP_DATA2,    0, 0x17, 0x12) IBS_FILTER(MATCH,       IBS_OP_DATA3,   0, 0x81, 0x81)
> +#define IBS_FILTER_0xf245 IBS_FILTER(MATCH,    IBS_OP_DATA2,    0, 0x17, 0x03) IBS_FILTER(MATCH,       IBS_OP_DATA3,   0, 0x81, 0x81)
> +#define IBS_FILTER_0xf246 IBS_FILTER(MATCH,    IBS_OP_DATA2,    0, 0x17, 0x13) IBS_FILTER(MATCH,       IBS_OP_DATA3,   0, 0x81, 0x81)
> +#define IBS_FILTER_0xf247 IBS_FILTER(MATCH,    IBS_OP_DATA2,    0, 0x17, 0x07) IBS_FILTER(MATCH,       IBS_OP_DATA3,   0, 0x81, 0x81)
> +#define IBS_FILTER_0xf248 IBS_FILTER(MATCH,    IBS_OP_DATA2,    0, 0x17, 0x17) IBS_FILTER(MATCH,       IBS_OP_DATA3,   0, 0x81, 0x81)
> +#define IBS_FILTER_0xf249 IBS_FILTER(MATCH,    IBS_OP_DATA2,    0, 0x27, 0x02) IBS_FILTER(MATCH,       IBS_OP_DATA3,   0, 0x81, 0x81)
> +#define IBS_FILTER_0xf24a IBS_FILTER(MATCH,    IBS_OP_DATA2,    0, 0x27, 0x22) IBS_FILTER(MATCH,       IBS_OP_DATA3,   0, 0x81, 0x81)
> +
> +static struct ibs_event events[] = {
> +       IBS_EVENT(0xf000, "ibs_fetch:ALL", "All IBS fetch samples"),
> +       IBS_EVENT(0xf001, "ibs_fetch:KILLED", "IBS fetch killed"),
> +       IBS_EVENT(0xf002, "ibs_fetch:ATTEMPTED", "IBS fetch attempted"),
> +       IBS_EVENT(0xf003, "ibs_fetch:COMPLETED", "IBS fetch completed"),
> +       IBS_EVENT(0xf004, "ibs_fetch:ABORTED", "IBS fetch aborted"),
> +       IBS_EVENT(0xf005, "ibs_fetch:ITLB_HITS", "IBS ITLB hit"),
> +       IBS_EVENT(0xf006, "ibs_fetch:L1_ITLB_MISSES_L2_ITLB_HITS", "IBS L1 ITLB misses (and L2 ITLB hits)"),
> +       IBS_EVENT(0xf007, "ibs_fetch:L1_ITLB_MISSES_L2_ITLB_MISSES", "IBS L1 L2 ITLB miss"),
> +       IBS_EVENT(0xf008, "ibs_fetch:ICACHE_MISSES", "IBS instruction cache misses"),
> +       IBS_EVENT(0xf009, "ibs_fetch:ICACHE_HITS", "IBS instruction cache hit"),
> +       IBS_EVENT(0xf00a, "ibs_fetch:4K_PAGE", "IBS 4K page translation"),
> +       IBS_EVENT(0xf00b, "ibs_fetch:2M_PAGE", "IBS 2M page translation"),
> +       IBS_EVENT(0xf00c, "ibs_fetch:1G_PAGE", "IBS 1G page translation"),
> +#if 0
> +       IBS_EVENT(0xf00e, "ibs_fetch:LATENCY", "IBS fetch latency"),
> +#endif
> +       IBS_EVENT(0xf100, "ibs_op:ALL", "All IBS op samples"),
> +#if 0
> +       IBS_EVENT(0xf101, "ibs_op:TAG_TO_RETIRE", "IBS tag-to-retire cycles"),
> +       IBS_EVENT(0xf102, "ibs_op:COMP_TO_RET", "IBS completion-to-retire cycles"),
> +#endif
> +       IBS_EVENT(0xf103, "ibs_op:BRANCH_RETIRED", "IBS branch op"),
> +       IBS_EVENT(0xf104, "ibs_op:MISPREDICTED_BRANCH", "IBS mispredicted branch op"),
> +       IBS_EVENT(0xf105, "ibs_op:TAKEN_BRANCH", "IBS taken branch op"),
> +       IBS_EVENT(0xf106, "ibs_op:MISPREDICTED_BRANCH_TAKEN", "IBS mispredicted taken branch op"),
> +       IBS_EVENT(0xf107, "ibs_op:RETURNS", "IBS return op"),
> +       IBS_EVENT(0xf108, "ibs_op:MISPREDICTED_RETURNS", "IBS mispredicted return op"),
> +       IBS_EVENT(0xf109, "ibs_op:RESYNC", "IBS resync op"),
> +       IBS_EVENT(0xf200, "ibs_op:ALL_LOAD_STORE", "IBS all load store ops"),
> +       IBS_EVENT(0xf201, "ibs_op:LOAD", "IBS load ops"),
> +       IBS_EVENT(0xf202, "ibs_op:STORE", "IBS store ops"),
> +       IBS_EVENT(0xf203, "ibs_op:L1_DTLB_HITS", "IBS L1 DTLB hit"),
> +       IBS_EVENT(0xf204, "ibs_op:L1_DTLB_MISS_L2_DTLB_HIT", "IBS L1 DTLB misses L2 hits"),
> +       IBS_EVENT(0xf205, "ibs_op:L1_L2_DTLB_MISS", "IBS L1 and L2 DTLB misses"),
> +       IBS_EVENT(0xf206, "ibs_op:DATA_CACHE_MISS", "IBS data cache misses"),
> +       IBS_EVENT(0xf207, "ibs_op:DATA_HITS", "IBS data cache hits"),
> +       IBS_EVENT(0xf208, "ibs_op:MISALIGNED_DATA_ACC", "IBS misaligned data access"),
> +       IBS_EVENT(0xf209, "ibs_op:BANK_CONF_LOAD", "IBS bank conflict on load op"),
> +#if 0
> +       IBS_EVENT(0xf20a, "ibs_op:BANK_CONF_STORE", "IBS bank conflict on store op"),
> +#endif
> +       IBS_EVENT(0xf20b, "ibs_op:FORWARD", "IBS store-to-load forwarded"),
> +       IBS_EVENT(0xf20c, "ibs_op:CANCELLED", "IBS store-to-load cancelled"),
> +       IBS_EVENT(0xf20d, "ibs_op:DCUC_MEM_ACC", "IBS UC memory access"),
> +       IBS_EVENT(0xf20e, "ibs_op:DCWC_MEM_ACC", "IBS WC memory access"),
> +       IBS_EVENT(0xf20f, "ibs_op:LOCKED", "IBS locked operation"),
> +       IBS_EVENT(0xf210, "ibs_op:MAB_HIT", "IBS MAB hit"),
> +       IBS_EVENT(0xf211, "ibs_op:L1_DTLB_4K", "IBS L1 DTLB 4K page"),
> +       IBS_EVENT(0xf212, "ibs_op:L1_DTLB_2M", "IBS L1 DTLB 2M page"),
> +       IBS_EVENT(0xf213, "ibs_op:L1_DTLB_1G", "IBS L1 DTLB 1G page"),
> +       IBS_EVENT(0xf215, "ibs_op:L2_DTLB_4K", "IBS L2 DTLB 4K page"),
> +       IBS_EVENT(0xf216, "ibs_op:L2_DTLB_2M", "IBS L2 DTLB 2M page"),
> +#if 0
> +       IBS_EVENT(0xf217, "ibs_op:L2_DTLB_1G", "IBS L2 DTLB 1G page"),
> +       IBS_EVENT(0xf219, "ibs_op:DC_LOAD_LAT", "IBS data cache miss load latency"),
> +#endif
> +       IBS_EVENT(0xf240, "ibs_op:NB_LOCAL_ONLY", "IBS Northbridge local"),
> +       IBS_EVENT(0xf241, "ibs_op:NB_REMOTE_ONLY", "IBS Northbridge remote"),
> +       IBS_EVENT(0xf242, "ibs_op:NB_LOCAL_L3", "IBS Northbridge local L3"),
> +       IBS_EVENT(0xf243, "ibs_op:NB_LOCAL_CACHE", "IBS Northbridge local core L1 or L2 cache"),
> +       IBS_EVENT(0xf244, "ibs_op:NB_REMOTE_CACHE", "IBS Northbridge local core L1, L2, L3 cache"),
> +       IBS_EVENT(0xf245, "ibs_op:NB_LOCAL_DRAM", "IBS Northbridge local DRAM"),
> +       IBS_EVENT(0xf246, "ibs_op:NB_REMOTE_DRAM", "IBS Northbridge remote DRAM"),
> +       IBS_EVENT(0xf247, "ibs_op:NB_LOCAL_OTHER", "IBS Northbridge local APIC MMIO Config PCI"),
> +       IBS_EVENT(0xf248, "ibs_op:NB_REMOTE_OTHER", "IBS Northbridge remote APIC MMIO Config PCI"),
> +       IBS_EVENT(0xf249, "ibs_op:NB_CACHE_MODIFIED", "IBS Northbridge cache modified state"),
> +       IBS_EVENT(0xf24a, "ibs_op:NB_CACHE_OWNED", "IBS Northbridge cache owned state"),
> +#if 0
> +       IBS_EVENT(0xf24b, "ibs_op:NB_LOCAL_CACHE_LAT", "IBS Northbridge local cache latency"),
> +       IBS_EVENT(0xf24c, "ibs_op:NB_REMOTE_CACHE_LAT", "IBS Northbridge remote cache latency"),
> +#endif
> +       { 0, NULL, NULL, { .filter = { IBS_FILTER_MATCH_ANY() } } }
>  };
>
>  static int ibs_parse_event(struct perf_event_attr *attr, char *sys, char *name)
>  {
> -       const char **event;
> +       struct ibs_event *event;
>
>        if (strcmp("ibs_op", sys) && strcmp("ibs_fetch", sys))
>                return -ENOENT;
>
> -       for (event = events; *event; event++) {
> -               if (!strcmp(*event + strlen(sys) + 1, name))
> +       for (event = events; event->id; event++) {
> +               if (!strcmp(event->name + strlen(sys) + 1, name))
>                        goto match;
>        }
>
>        return -EINVAL;
>  match:
> +       /* pseudo event found */
> +       attr->config1 = event->config;
>        attr->sample_type = PERF_SAMPLE_CPU;
>
>        return 0;
> @@ -97,13 +391,14 @@ match:
>
>  static void ibs_print_events(const char *sys)
>  {
> -       const char **event;
> +       struct ibs_event *event;
>
>        printf("\n");
>
> -       for (event = events; *event; event++) {
> -               if (!strncmp(sys, *event, strlen(sys)))
> -                       printf("  %-50s [PMU event: %s]\n", *event, sys);
> +       for (event = events; event->id; event++) {
> +               if (!strncmp(sys, event->name, strlen(sys)))
> +                       printf("  %-50s [PMU event: %s, id:0x%x]\n",
> +                              event->name, sys, event->id);
>        }
>  }
>
> --
> 1.7.8.4
>
>
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2012-05-07 15:29:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

On Mon, 2012-05-07 at 17:21 +0200, Stephane Eranian wrote:
> There is something I don't quite understand with those pseudo-events.
> Is it the case that by construction, it means you can only measure
> on pseudo-event at a time? Supposed I want to look at cache-misses.
> For each miss, I'd like to know where it missed, any TLB impacts. All
> of that in one run with no multiplexing. Can I do this with your
> pseudo-events?
>
Not from how I read the patch, a pseudo event will count as a full event
and since the IBS things only have a single thing its full.

If you want multiple of these you'd have to get the raw stream and demux
in userspace.


Subject: Re: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

On 07.05.12 15:03:02, Peter Zijlstra wrote:
> On Wed, 2012-05-02 at 20:26 +0200, Robert Richter wrote:
> > +enum ibs_filter_type {
> > + IBS_NO_FILTER = 0,
> > + IBS_MATCH_FILTER = 1,
> > + IBS_ANY_SET_FILTER = 2,
> > + IBS_PSEUDO_EVENT = 0x0F,
> > +};
>
> I don't get how those pseudo events work, AFAIKT IBS_PSEUDO_EVENT causes
> one to loose all events since it does have a filter set but fails the
> filter and thus we skip the call to perf_event_overflow().

You periodically (fix clk cycles or number of micro-ops) trigger IBS
samples and afterwards analyses the samples for certain filter rules
(see rule description in pmu-ibs.c).

> Furthermore, I think this filter stuff should accumulate the period so
> that PERF_SAMPLE_PERIOD still works correctly and reflects the number of
> counts since the last sample.

Yes, simply dropping the sample is not enough. Have to check if the
period is correctly maintained.

> Also, what's the point of filtering in-kernel as opposed to doing all
> this in userspace? Is the amount of data 'saved' significant enough?

In-kernel filtering has drawbacks esp. if you want to handle multiple
pseudo-events and also when post-processing a perf.data file. But for
a proof-of-concept and a demonstration it was the easiest
implementation since there is a lack of userland filtering. I guess I
will look at how to implement this.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

2012-05-07 15:52:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

On Mon, 2012-05-07 at 17:44 +0200, Robert Richter wrote:
> On 07.05.12 15:03:02, Peter Zijlstra wrote:
> > On Wed, 2012-05-02 at 20:26 +0200, Robert Richter wrote:
> > > +enum ibs_filter_type {
> > > + IBS_NO_FILTER = 0,
> > > + IBS_MATCH_FILTER = 1,
> > > + IBS_ANY_SET_FILTER = 2,
> > > + IBS_PSEUDO_EVENT = 0x0F,
> > > +};
> >
> > I don't get how those pseudo events work, AFAIKT IBS_PSEUDO_EVENT causes
> > one to loose all events since it does have a filter set but fails the
> > filter and thus we skip the call to perf_event_overflow().
>
> You periodically (fix clk cycles or number of micro-ops) trigger IBS
> samples and afterwards analyses the samples for certain filter rules
> (see rule description in pmu-ibs.c).

But IBS_PSEUDO_EVENT will fail all filter test and you'll end up with
exactly 0 samples. Still somewhat confused..

Subject: Re: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

On 07.05.12 17:29:17, Peter Zijlstra wrote:
> On Mon, 2012-05-07 at 17:21 +0200, Stephane Eranian wrote:
> > There is something I don't quite understand with those pseudo-events.
> > Is it the case that by construction, it means you can only measure
> > on pseudo-event at a time? Supposed I want to look at cache-misses.
> > For each miss, I'd like to know where it missed, any TLB impacts. All
> > of that in one run with no multiplexing. Can I do this with your
> > pseudo-events?
> >
> Not from how I read the patch, a pseudo event will count as a full event
> and since the IBS things only have a single thing its full.
>
> If you want multiple of these you'd have to get the raw stream and demux
> in userspace.

As said in my previous mail, this is a drawback. And as with
precise-rip it is better to implement this in userland.

The question here is how to implement it. You need the event option
string (-e) not only to setup the syscall but also for userspace
post-processing. Any hint for something similar already done in perf?

Thanks,

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

Subject: Re: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

On 07.05.12 17:51:51, Peter Zijlstra wrote:
> On Mon, 2012-05-07 at 17:44 +0200, Robert Richter wrote:
> > On 07.05.12 15:03:02, Peter Zijlstra wrote:
> > > On Wed, 2012-05-02 at 20:26 +0200, Robert Richter wrote:
> > > > +enum ibs_filter_type {
> > > > + IBS_NO_FILTER = 0,
> > > > + IBS_MATCH_FILTER = 1,
> > > > + IBS_ANY_SET_FILTER = 2,
> > > > + IBS_PSEUDO_EVENT = 0x0F,
> > > > +};
> > >
> > > I don't get how those pseudo events work, AFAIKT IBS_PSEUDO_EVENT causes
> > > one to loose all events since it does have a filter set but fails the
> > > filter and thus we skip the call to perf_event_overflow().
> >
> > You periodically (fix clk cycles or number of micro-ops) trigger IBS
> > samples and afterwards analyses the samples for certain filter rules
> > (see rule description in pmu-ibs.c).
>
> But IBS_PSEUDO_EVENT will fail all filter test and you'll end up with
> exactly 0 samples. Still somewhat confused..

Ahh, got your question. This is a place holder for writing the pseudo
event id's (0xF000 mask) directly in config[12]. But this is not yet
implemented. And since we probably all prefer userland filtering we
just don't need this at all.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

Subject: Re: [PATCH 4/7] perf/x86-ibs: Add support for IBS pseudo events

On 07.05.12 17:15:18, Peter Zijlstra wrote:
> On Mon, 2012-05-07 at 16:47 +0200, Robert Richter wrote:
> > > Who again wasn't decoding anything in perf_event_attr:config* ?
> >
> > attr:config is one of the ibs control msrs comparable with perfctr's
> > evntsel msr:
> >
> > MSRC001_1030 IBS Fetch Control Register (IbsFetchCtl)
> > MSRC001_1033 IBS Execution Control Register (IbsOpCtl)
>
> You missed reading a '*', even so:
>
> > There are some options (randomisation, cycle/micro-op counting) but
> > usually it is null since the period is encoded in attr:period. But ibs
> > could be setup by an application using attr:config only which then
> > passes the value directly to the ctl msr.
>
> PMU_FORMAT_ATTR(IbsFetchMaxCnt, "config:0-15" );
> PMU_FORMAT_ATTR(IbsFetchCnt, "config:16-31" );
> PMU_FORMAT_ATTR(IbsFetchVal, "config:49" );
> PMU_FORMAT_ATTR(IbsRandEn, "config:57" );
>
> and
>
> PMU_FORMAT_ATTR(IbsOpMaxCnt, "config:0-15" );
> PMU_FORMAT_ATTR(IbsOpVal, "config:18" );
> PMU_FORMAT_ATTR(IbsOpCntCtl, "config:19" ); /* subject to ibs_caps */
>
> Are the writable bitfields of those two MSRs resp.
>
> This patch adds:
>
> PMU_FORMAT_ATTR(IbsFilter0Idx, "config1:0-7" );
> PMU_FORMAT_ATTR(IbsFilter0Type, "config1:12-15" );
> PMU_FORMAT_ATTR(IbsFilter0Mask, "config1:16-23" );
> PMU_FORMAT_ATTR(IbsFilter0Match,"config1:24-31" );
> PMU_FORMAT_ATTR(IbsFilter0Any, "config1:16-31" );
>

Yes, let's take this format specification for config and drop kernel
side filtering in config[12].

Passing options like IbsRandEn was another open item. This is a nice
solution for this.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

Subject: Re: [PATCH 3/7] perf tools: Add parser for dynamic PMU events

On 07.05.12 13:01:03, Peter Zijlstra wrote:
> On Wed, 2012-05-02 at 20:26 +0200, Robert Richter wrote:
> > This patch adds support for pmu specific event parsers by extending
> > the pmu handler. The event syntax is the same as for tracepoints:
> >
> > <subsys>:<name>:<modifier>
>
> I really don't like this
>
> > - event_legacy_tracepoint sep_dc |
> > + event_legacy_generic sep_dc |
>
> That 'legacy' in there is a good hint you shouldn't be using this.
>
> Ideally I'd get rid of all legacy formats eventually, we really
> shouldn't be adding new ones.

Anyway, we need something like this to specify event names (pseudo
event names in case of ibs) and userland filter options. Though, it
must not necessarily this syntax but something else. The one above
looks esthetic to me while some other event syntax is close to a perl
regex. ;)

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

2012-05-07 17:11:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/7] perf tools: Add parser for dynamic PMU events

On Mon, 2012-05-07 at 19:05 +0200, Robert Richter wrote:
> On 07.05.12 13:01:03, Peter Zijlstra wrote:
> > On Wed, 2012-05-02 at 20:26 +0200, Robert Richter wrote:
> > > This patch adds support for pmu specific event parsers by extending
> > > the pmu handler. The event syntax is the same as for tracepoints:
> > >
> > > <subsys>:<name>:<modifier>
> >
> > I really don't like this
> >
> > > - event_legacy_tracepoint sep_dc |
> > > + event_legacy_generic sep_dc |
> >
> > That 'legacy' in there is a good hint you shouldn't be using this.
> >
> > Ideally I'd get rid of all legacy formats eventually, we really
> > shouldn't be adding new ones.
>
> Anyway, we need something like this to specify event names (pseudo
> event names in case of ibs) and userland filter options. Though, it
> must not necessarily this syntax but something else. The one above
> looks esthetic to me while some other event syntax is close to a perl
> regex. ;)

Jiri and Yan Zheng are working on something that would allow:

ibs_op/foo/

Their solution will initially get foo from sysfs (ibs_op/events/foo),
but an external lookup should also get there. Stephane also wants/needs
something like that to plug the Intel event names in.


2012-05-31 15:24:31

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 0/7] perf/x86-ibs and tools: Add support for AMD IBS

Robert,

I think the script ibs.pl would be more useful if it were to offer the
option to decrypt
each IBS register. I am not a Perl programmer, so I had a hard time
understanding
the way one can use pack()/unpack() to extract bitfields from a u64 value.

On Wed, May 2, 2012 at 8:26 PM, Robert Richter <[email protected]> wrote:
> This patch set adds perf tool support for AMD IBS. Mostly perf tool
> patches, but
>
>  perf/x86-ibs: Add support for IBS pseudo events
>
> includes kernel changes.
>
> It basically implements:
>
>  * General perf tool support for dynamically allocated pmus,
>  * Support for IBS pseudo events,
>  * perf-script support for IBS (updated version).
>
> With this patches there is the possibility to add pmu handler code for
> a specific pmu. If a pmu with handler code is listed in sysfs, it is
> activated and can be used with perf tools. E.g. perf-list shows
> available events of this pmu and the event parser of perf-record is
> able to setup profiling for that particular pmu. This mechanism is
> used to implement pmus for ibs (ibs_op/ibs_fetch).
>
> IBS pseudo events are derived from an IBS sample and determined
> through a combination of one or more IBS event flags or values.
> E.g. one could profile mispredicted branches with ibs using such
> pseudo events:
>
>  # perf record -a -e ibs_op:MISPREDICTED_BRANCH ...
>
> This is implemented with kernel side filtering of ibs samples by
> passing filter settings to the kernel (attr.config1/config2).
>
> I also attached an updated version of my previously posted patch that
> implements perf-script support for ibs. This is usefull for a user to
> dump raw ibs samples. The script can be used by the user as a basis
> for a script that post-processes ibs samples.
>
> Changes can be pulled from kernel.org and are available in the git
> repository at:
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git perf-ibs
>
> -Robert
>
>
> Robert Richter (7):
>  perf tools: Fix generation of pmu list
>  perf tools: Add basic dynamic PMU support
>  perf tools: Add parser for dynamic PMU events
>  perf/x86-ibs: Add support for IBS pseudo events
>  perf, tools: Add raw event support for dynamic allocated pmus
>  perf tools: Add pmu mappings to header information
>  perf script: Add script to collect and display IBS samples
>
>  arch/x86/kernel/cpu/perf_event_amd_ibs.c |   83 ++++++-
>  tools/perf/Makefile                      |    1 +
>  tools/perf/scripts/perl/bin/ibs-record   |   23 ++
>  tools/perf/scripts/perl/bin/ibs-report   |    6 +
>  tools/perf/scripts/perl/ibs.pl           |   47 ++++
>  tools/perf/util/header.c                 |   78 ++++++
>  tools/perf/util/header.h                 |    1 +
>  tools/perf/util/parse-events.c           |   45 +++-
>  tools/perf/util/parse-events.h           |    4 +-
>  tools/perf/util/parse-events.y           |   11 +-
>  tools/perf/util/pmu-ibs.c                |  419 ++++++++++++++++++++++++++++++
>  tools/perf/util/pmu.c                    |  102 +++++++-
>  tools/perf/util/pmu.h                    |   19 ++
>  13 files changed, 828 insertions(+), 11 deletions(-)
>  create mode 100644 tools/perf/scripts/perl/bin/ibs-record
>  create mode 100644 tools/perf/scripts/perl/bin/ibs-report
>  create mode 100644 tools/perf/scripts/perl/ibs.pl
>  create mode 100644 tools/perf/util/pmu-ibs.c
>
> --
> 1.7.8.4
>
>

Subject: Re: [PATCH 0/7] perf/x86-ibs and tools: Add support for AMD IBS

On 31.05.12 17:24:25, Stephane Eranian wrote:

> I think the script ibs.pl would be more useful if it were to offer the
> option to decrypt
> each IBS register. I am not a Perl programmer, so I had a hard time
> understanding
> the way one can use pack()/unpack() to extract bitfields from a u64 value.

Stephane, I didn't spent the effort yet to implement a complete IBS
sample parser in userland. This could be done with a later update. My
intention was to give the perl script out anyway as it helps to dump
the raw register values. It can easy be extended or grep'ed directly
for bit combinations of certain registers.

For the implementation of an ibs sample parser I was thinking of
extending perf-report, but so far I don't have something. Another
option I plan to implement is userland sample filtering. Pseudo events
could be extracted from the raw data which is then equivalent to a
sample parser.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center