2015-07-22 10:39:09

by Matt Fleming

[permalink] [raw]
Subject: [RFC][PATCH] perf tests: Add Intel CQM and arch tests

From: Matt Fleming <[email protected]>

Peter reports that it's possible to trigger a WARN_ON_ONCE() in the
Intel CQM code by combining a hardware event and an Intel CQM (software)
event into a group. Unfortunately, the perf tools are not able to create
this bundle and we need to manually construct a test case.

For posterity, record Peter's proof of concept test case in tools/perf
so that it presents a model for how we can perform architecture-specific
tests, or "arch tests", in perf in the future.

The particular issue triggered in the test case is that when the counter
for the hardware event overflows and triggers a PMI we'll read both the
hardware event and the software event counters. Unfortunately, for CQM
that involves performing an IPI to read the CQM event counters on all
sockets, which in NMI context triggers the WARN_ON_ONCE().

This patch is marked as RFC because I'd really like to solicit opinions
on this approach and hear feedback on whether this is the correct way to
structure these arch tests. I realise that we've already got tests for
the TSC, etc that are x86-specific but I didn't want to change the order
of the tests (say, by moving test__perf_time_to_tsc() into ARCH_TESTS)
in case that broke some kind of ABI.

Cc: Peter Zijlstra <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matt Fleming <[email protected]>
---
tools/perf/arch/x86/Build | 2 +-
tools/perf/arch/x86/include/arch-tests.h | 13 ++++
tools/perf/arch/x86/tests/Build | 6 +-
tools/perf/arch/x86/tests/intel-cqm.c | 124 +++++++++++++++++++++++++++++++
tools/perf/tests/builtin-test.c | 7 ++
5 files changed, 149 insertions(+), 3 deletions(-)
create mode 100644 tools/perf/arch/x86/include/arch-tests.h
create mode 100644 tools/perf/arch/x86/tests/intel-cqm.c

diff --git a/tools/perf/arch/x86/Build b/tools/perf/arch/x86/Build
index 41bf61da476a..db52fa22d3a1 100644
--- a/tools/perf/arch/x86/Build
+++ b/tools/perf/arch/x86/Build
@@ -1,2 +1,2 @@
libperf-y += util/
-libperf-$(CONFIG_DWARF_UNWIND) += tests/
+libperf-y += tests/
diff --git a/tools/perf/arch/x86/include/arch-tests.h b/tools/perf/arch/x86/include/arch-tests.h
new file mode 100644
index 000000000000..9d43f759e014
--- /dev/null
+++ b/tools/perf/arch/x86/include/arch-tests.h
@@ -0,0 +1,13 @@
+#ifndef ARCH_TESTS_H
+#define ARCH_TESTS_H
+
+/* Tests */
+int test__intel_cqm_count_nmi_context(void);
+
+#define ARCH_TESTS \
+ { \
+ .desc = "Test intel cqm nmi context read", \
+ .func = test__intel_cqm_count_nmi_context, \
+ },
+
+#endif
diff --git a/tools/perf/arch/x86/tests/Build b/tools/perf/arch/x86/tests/Build
index b30eff9bcc83..58469b91b49b 100644
--- a/tools/perf/arch/x86/tests/Build
+++ b/tools/perf/arch/x86/tests/Build
@@ -1,2 +1,4 @@
-libperf-y += regs_load.o
-libperf-y += dwarf-unwind.o
+libperf-$(CONFIG_DWARF_UNWIND) += regs_load.o
+libperf-$(CONFIG_DWARF_UNWIND) += dwarf-unwind.o
+
+libperf-y += intel-cqm.o
diff --git a/tools/perf/arch/x86/tests/intel-cqm.c b/tools/perf/arch/x86/tests/intel-cqm.c
new file mode 100644
index 000000000000..d28c1b6a3b54
--- /dev/null
+++ b/tools/perf/arch/x86/tests/intel-cqm.c
@@ -0,0 +1,124 @@
+#include "tests/tests.h"
+#include "perf.h"
+#include "cloexec.h"
+#include "debug.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "arch-tests.h"
+
+#include <sys/mman.h>
+#include <string.h>
+
+static pid_t spawn(void)
+{
+ pid_t pid;
+
+ pid = fork();
+ if (pid)
+ return pid;
+
+ while(1);
+ sleep(5);
+ return 0;
+}
+
+/*
+ * Create an event group that contains both a sampled hardware
+ * (cpu-cycles) and software (intel_cqm/llc_occupancy/) event. We then
+ * wait for the hardware perf counter to overflow and generate a PMI,
+ * which triggers an event read for both of the events in the group.
+ *
+ * Since reading Intel CQM event counters requires sending SMP IPIs, the
+ * CQM pmu needs to handle the above situation gracefully, and return
+ * the last read counter value to avoid triggering a WARN_ON_ONCE() in
+ * smp_call_function_many() caused by sending IPIs from NMI context.
+ */
+int test__intel_cqm_count_nmi_context(void)
+{
+ struct perf_evlist *evlist = NULL;
+ struct perf_evsel *evsel = NULL;
+ struct perf_event_attr pe;
+ int i, fd[2], flag, ret;
+ size_t mmap_len;
+ void *event;
+ pid_t pid;
+ int err = TEST_FAIL;
+
+ flag = perf_event_open_cloexec_flag();
+
+ evlist = perf_evlist__new();
+ if (!evlist) {
+ pr_debug("perf_evlist__new failed\n");
+ return TEST_FAIL;
+ }
+
+ ret = parse_events(evlist, "intel_cqm/llc_occupancy/", NULL);
+ if (ret) {
+ pr_debug("parse_events failed\n");
+ err = TEST_SKIP;
+ goto out;
+ }
+
+ evsel = perf_evlist__first(evlist);
+ if (!evsel) {
+ pr_debug("perf_evlist__first failed\n");
+ goto out;
+ }
+
+ memset(&pe, 0, sizeof(pe));
+ pe.size = sizeof(pe);
+
+ pe.type = PERF_TYPE_HARDWARE;
+ pe.config = PERF_COUNT_HW_CPU_CYCLES;
+ pe.read_format = PERF_FORMAT_GROUP;
+
+ pe.sample_period = 128;
+ pe.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_READ;
+
+ pid = spawn();
+
+ fd[0] = sys_perf_event_open(&pe, pid, -1, -1, flag);
+ if (fd[0] < 0) {
+ pr_debug("failed to open event\n");
+ goto out;
+ }
+
+ memset(&pe, 0, sizeof(pe));
+ pe.size = sizeof(pe);
+
+ pe.type = evsel->attr.type;
+ pe.config = evsel->attr.config;
+
+ fd[1] = sys_perf_event_open(&pe, pid, -1, fd[0], flag);
+ if (fd[1] < 0) {
+ pr_debug("failed to open event\n");
+ goto out;
+ }
+
+ /*
+ * Pick a power-of-two number of pages + 1 for the meta-data
+ * page (struct perf_event_mmap_page). See tools/perf/design.txt.
+ */
+ mmap_len = page_size * 65;
+
+ event = mmap(NULL, mmap_len, PROT_READ, MAP_SHARED, fd[0], 0);
+ if (event == (void *)(-1)) {
+ pr_debug("failed to mmap %d\n", errno);
+ goto out;
+ }
+
+ sleep(1);
+
+ err = TEST_OK;
+
+ munmap(event, mmap_len);
+
+ for (i = 0; i < 2; i++)
+ close(fd[i]);
+
+ kill(pid, SIGKILL);
+ wait(NULL);
+out:
+ perf_evlist__delete(evlist);
+ return err;
+}
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index c1dde733c3a6..82e433340fda 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -14,6 +14,10 @@
#include "parse-options.h"
#include "symbol.h"

+#if defined(__x86_64__)
+#include "arch-tests.h"
+#endif
+
static struct test {
const char *desc;
int (*func)(void);
@@ -174,6 +178,9 @@ static struct test {
.desc = "Test thread map",
.func = test__thread_map,
},
+#if defined(__x86_64__)
+ ARCH_TESTS
+#endif
{
.func = NULL,
},
--
2.1.0


2015-07-22 13:24:54

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC][PATCH] perf tests: Add Intel CQM and arch tests

On Wed, Jul 22, 2015 at 11:38:59AM +0100, Matt Fleming wrote:
> From: Matt Fleming <[email protected]>
>
> Peter reports that it's possible to trigger a WARN_ON_ONCE() in the
> Intel CQM code by combining a hardware event and an Intel CQM (software)
> event into a group. Unfortunately, the perf tools are not able to create
> this bundle and we need to manually construct a test case.
>
> For posterity, record Peter's proof of concept test case in tools/perf
> so that it presents a model for how we can perform architecture-specific
> tests, or "arch tests", in perf in the future.
>
> The particular issue triggered in the test case is that when the counter
> for the hardware event overflows and triggers a PMI we'll read both the
> hardware event and the software event counters. Unfortunately, for CQM
> that involves performing an IPI to read the CQM event counters on all
> sockets, which in NMI context triggers the WARN_ON_ONCE().
>
> This patch is marked as RFC because I'd really like to solicit opinions
> on this approach and hear feedback on whether this is the correct way to
> structure these arch tests. I realise that we've already got tests for
> the TSC, etc that are x86-specific but I didn't want to change the order
> of the tests (say, by moving test__perf_time_to_tsc() into ARCH_TESTS)
> in case that broke some kind of ABI.

I wouldn't consider the order of tests being ABI,
let's break it and watch ;-)

SNIP

> diff --git a/tools/perf/arch/x86/include/arch-tests.h b/tools/perf/arch/x86/include/arch-tests.h
> new file mode 100644
> index 000000000000..9d43f759e014
> --- /dev/null
> +++ b/tools/perf/arch/x86/include/arch-tests.h
> @@ -0,0 +1,13 @@
> +#ifndef ARCH_TESTS_H
> +#define ARCH_TESTS_H
> +
> +/* Tests */
> +int test__intel_cqm_count_nmi_context(void);
> +
> +#define ARCH_TESTS \
> + { \
> + .desc = "Test intel cqm nmi context read", \
> + .func = test__intel_cqm_count_nmi_context, \
> + },
> +

hum, I dont like much this being stuffed in macro,
but dont have any technical reason against ;-)

maybe we could add 'struct test arch_tests[]' array, that'd be
initialized by each arch and executed in addition to the current
'struct test tests[]'

jirka

2015-07-22 13:51:20

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC][PATCH] perf tests: Add Intel CQM and arch tests

Em Wed, Jul 22, 2015 at 03:24:44PM +0200, Jiri Olsa escreveu:
> On Wed, Jul 22, 2015 at 11:38:59AM +0100, Matt Fleming wrote:
> > This patch is marked as RFC because I'd really like to solicit opinions
> > on this approach and hear feedback on whether this is the correct way to
> > structure these arch tests. I realise that we've already got tests for
> > the TSC, etc that are x86-specific but I didn't want to change the order
> > of the tests (say, by moving test__perf_time_to_tsc() into ARCH_TESTS)
> > in case that broke some kind of ABI.

> I wouldn't consider the order of tests being ABI,
> let's break it and watch ;-)

yeah

> SNIP

> > diff --git a/tools/perf/arch/x86/include/arch-tests.h b/tools/perf/arch/x86/include/arch-tests.h
> > +#define ARCH_TESTS \
> > + { \
> > + .desc = "Test intel cqm nmi context read", \
> > + .func = test__intel_cqm_count_nmi_context, \
> > + },
> > +
>
> hum, I dont like much this being stuffed in macro,
> but dont have any technical reason against ;-)
>
> maybe we could add 'struct test arch_tests[]' array, that'd be
> initialized by each arch and executed in addition to the current
> 'struct test tests[]'

Agreed, that would be cleaner, and we need something like that anyway,
i.e. some way to group tests that run only if certain requirements are
met, i.e. tests that require root permission, arch specific ones, etc.

- Arnaldo

2015-07-22 22:18:56

by Matt Fleming

[permalink] [raw]
Subject: Re: [RFC][PATCH] perf tests: Add Intel CQM and arch tests

On Wed, 22 Jul, at 03:24:44PM, Jiri Olsa wrote:
> On Wed, Jul 22, 2015 at 11:38:59AM +0100, Matt Fleming wrote:
> > From: Matt Fleming <[email protected]>
> >
> > Peter reports that it's possible to trigger a WARN_ON_ONCE() in the
> > Intel CQM code by combining a hardware event and an Intel CQM (software)
> > event into a group. Unfortunately, the perf tools are not able to create
> > this bundle and we need to manually construct a test case.
> >
> > For posterity, record Peter's proof of concept test case in tools/perf
> > so that it presents a model for how we can perform architecture-specific
> > tests, or "arch tests", in perf in the future.
> >
> > The particular issue triggered in the test case is that when the counter
> > for the hardware event overflows and triggers a PMI we'll read both the
> > hardware event and the software event counters. Unfortunately, for CQM
> > that involves performing an IPI to read the CQM event counters on all
> > sockets, which in NMI context triggers the WARN_ON_ONCE().
> >
> > This patch is marked as RFC because I'd really like to solicit opinions
> > on this approach and hear feedback on whether this is the correct way to
> > structure these arch tests. I realise that we've already got tests for
> > the TSC, etc that are x86-specific but I didn't want to change the order
> > of the tests (say, by moving test__perf_time_to_tsc() into ARCH_TESTS)
> > in case that broke some kind of ABI.
>
> I wouldn't consider the order of tests being ABI,
> let's break it and watch ;-)

OK, that at least allows me to perform a litte more surgery on the
current tests.

> > diff --git a/tools/perf/arch/x86/include/arch-tests.h b/tools/perf/arch/x86/include/arch-tests.h
> > new file mode 100644
> > index 000000000000..9d43f759e014
> > --- /dev/null
> > +++ b/tools/perf/arch/x86/include/arch-tests.h
> > @@ -0,0 +1,13 @@
> > +#ifndef ARCH_TESTS_H
> > +#define ARCH_TESTS_H
> > +
> > +/* Tests */
> > +int test__intel_cqm_count_nmi_context(void);
> > +
> > +#define ARCH_TESTS \
> > + { \
> > + .desc = "Test intel cqm nmi context read", \
> > + .func = test__intel_cqm_count_nmi_context, \
> > + },
> > +
>
> hum, I dont like much this being stuffed in macro,
> but dont have any technical reason against ;-)
>
> maybe we could add 'struct test arch_tests[]' array, that'd be
> initialized by each arch and executed in addition to the current
> 'struct test tests[]'

Makes sense to me. I'll take that approach. Thanks.

--
Matt Fleming, Intel Open Source Technology Center