2009-06-03 09:41:01

by Paul Mackerras

Subject: [PATCH 1/2] perf_counter: powerpc: Fix event alternative code generation on POWER5/5+

Commit ef923214 ("perf_counter: powerpc: use u64 for event codes
internally") introduced a bug where the return value from function
find_alternative_bdecode gets put into a u64 variable and later tested
to see if it is < 0. The effect is that we get extra, bogus event
code alternatives on POWER5 and POWER5+, leading to error messages
such as "oops compute_mmcr failed" being printed and counters not
counting properly.

This fixes it by using s64 for the return type of find_alternative_bdecode
and for the local variable that the caller puts the value in. It also
makes the event argument a u64 on POWER5+ for consistency.
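
Since a u64 is unsigned, it can never compare less than zero: the -1
returned by find_alternative_bdecode becomes 0xffffffffffffffff once it
lands in a u64 variable, so the < 0 test is always false and the bogus
value is kept as an "alternative". A minimal user-space sketch of the
pitfall, with a hypothetical lookup_alternative() standing in for the
real function:

#include <stdio.h>
#include <stdint.h>

typedef uint64_t u64;
typedef int64_t s64;

/* Hypothetical stand-in for find_alternative_bdecode():
 * returns -1 when the event has no alternative encoding. */
static s64 lookup_alternative(u64 event)
{
        (void)event;
        return -1;
}

int main(void)
{
        u64 ae_u64 = lookup_alternative(0);     /* -1 wraps to ~0 */
        s64 ae_s64 = lookup_alternative(0);

        if (ae_u64 < 0)         /* always false: u64 is unsigned */
                printf("u64: rejected\n");
        else
                printf("u64: bogus alternative 0x%llx kept\n",
                       (unsigned long long)ae_u64);

        if (ae_s64 < 0)         /* works: the sign survives */
                printf("s64: correctly rejected\n");
        return 0;
}

Changing both the return type and the caller's local variable to s64, as
the patch does, preserves the sign so the check behaves as intended.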

Signed-off-by: Paul Mackerras <[email protected]>
---
arch/powerpc/kernel/power5+-pmu.c | 4 ++--
arch/powerpc/kernel/power5-pmu.c  | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/power5+-pmu.c b/arch/powerpc/kernel/power5+-pmu.c
index c6cdfc1..8471e3c 100644
--- a/arch/powerpc/kernel/power5+-pmu.c
+++ b/arch/powerpc/kernel/power5+-pmu.c
@@ -242,7 +242,7 @@ static const unsigned char bytedecode_alternatives[4][4] = {
* event code for those that do, or -1 otherwise. This also handles
* alternative PCMSEL values for add events.
*/
-static int find_alternative_bdecode(unsigned int event)
+static s64 find_alternative_bdecode(u64 event)
{
int pmc, altpmc, pp, j;

@@ -277,7 +277,7 @@ static int power5p_get_alternatives(u64 event, unsigned int flags, u64 alt[])
{
int i, j, nalt = 1;
int nlim;
- u64 ae;
+ s64 ae;

alt[0] = event;
nalt = 1;
diff --git a/arch/powerpc/kernel/power5-pmu.c b/arch/powerpc/kernel/power5-pmu.c
index d534496..1b44c5f 100644
--- a/arch/powerpc/kernel/power5-pmu.c
+++ b/arch/powerpc/kernel/power5-pmu.c
@@ -250,7 +250,7 @@ static const unsigned char bytedecode_alternatives[4][4] = {
* PMCSEL values on other counters. This returns the alternative
* event code for those that do, or -1 otherwise.
*/
-static u64 find_alternative_bdecode(u64 event)
+static s64 find_alternative_bdecode(u64 event)
{
int pmc, altpmc, pp, j;

@@ -272,7 +272,7 @@ static u64 find_alternative_bdecode(u64 event)
static int power5_get_alternatives(u64 event, unsigned int flags, u64 alt[])
{
int i, j, nalt = 1;
- u64 ae;
+ s64 ae;

alt[0] = event;
nalt = 1;
--
1.6.0.4


2009-06-03 09:40:49

by Paul Mackerras

Subject: [PATCH 2/2] perf_counter: powerpc: Fix race causing "oops trying to read PMC0" errors

When using interrupting counters and limited (non-interrupting) counters
at the same time, it's possible that we get an interrupt in write_mmcr0()
after writing MMCR0 but before we have set up the counters using limited
PMCs. What happens then is that we get into perf_counter_interrupt()
with counter->hw.idx = 0 for the limited counters, leading to the "oops
trying to read PMC0" error message being printed.

This fixes the problem by making perf_counter_interrupt() robust against
counter->hw.idx being zero (the counter is just ignored in that case)
and also by changing write_mmcr0() to write MMCR0 initially with the
counter overflow interrupt enable bits masked (set to 0). If the MMCR0
value requested by the caller has either of those bits set, we write
MMCR0 again with the requested value of those bits after setting up the
limited counters properly.
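
The ordering below sketches that mask-then-restore sequence in plain C;
write_reg() and setup_limited_counters() are hypothetical stand-ins for
the kernel's mtspr(SPRN_MMCR0, ...) and the limited-counter freeze/thaw,
while the MMCR0 bit values match the powerpc reg.h definitions:

#include <stdio.h>

#define MMCR0_PMC1CE    0x00008000UL    /* PMC1 overflow interrupt enable */
#define MMCR0_PMCjCE    0x00004000UL    /* PMCj (j > 1) overflow interrupt enable */

/* Hypothetical stand-in for mtspr(SPRN_MMCR0, val). */
static void write_reg(unsigned long val)
{
        printf("MMCR0 <- 0x%lx\n", val);
}

/* Hypothetical stand-in for freeze/thaw_limited_counters(). */
static void setup_limited_counters(void)
{
        printf("limited counters set up\n");
}

static void write_mmcr0_sketch(unsigned long mmcr0)
{
        /* Write MMCR0 with the overflow interrupt enables masked off,
         * so no PMU interrupt can arrive while the limited counters
         * are still in an inconsistent state. */
        write_reg(mmcr0 & ~(MMCR0_PMC1CE | MMCR0_PMCjCE));

        setup_limited_counters();

        /* Only now that the counters are consistent, rewrite MMCR0
         * with the interrupt enable bits the caller asked for. */
        if (mmcr0 & (MMCR0_PMC1CE | MMCR0_PMCjCE))
                write_reg(mmcr0);
}

int main(void)
{
        write_mmcr0_sketch(MMCR0_PMC1CE | MMCR0_PMCjCE);
        return 0;
}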

Signed-off-by: Paul Mackerras <[email protected]>
---
arch/powerpc/kernel/perf_counter.c | 16 ++++++++++++++--
1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/perf_counter.c b/arch/powerpc/kernel/perf_counter.c
index ea54686..4cc4ac5 100644
--- a/arch/powerpc/kernel/perf_counter.c
+++ b/arch/powerpc/kernel/perf_counter.c
@@ -372,16 +372,28 @@ static void write_mmcr0(struct cpu_hw_counters *cpuhw, unsigned long mmcr0)

/*
* Write MMCR0, then read PMC5 and PMC6 immediately.
+ * To ensure we don't get a performance monitor interrupt
+ * between writing MMCR0 and freezing/thawing the limited
+ * counters, we first write MMCR0 with the counter overflow
+ * interrupt enable bits turned off.
*/
asm volatile("mtspr %3,%2; mfspr %0,%4; mfspr %1,%5"
: "=&r" (pmc5), "=&r" (pmc6)
- : "r" (mmcr0), "i" (SPRN_MMCR0),
+ : "r" (mmcr0 & ~(MMCR0_PMC1CE | MMCR0_PMCjCE)),
+ "i" (SPRN_MMCR0),
"i" (SPRN_PMC5), "i" (SPRN_PMC6));

if (mmcr0 & MMCR0_FC)
freeze_limited_counters(cpuhw, pmc5, pmc6);
else
thaw_limited_counters(cpuhw, pmc5, pmc6);
+
+ /*
+ * Write the full MMCR0 including the counter overflow interrupt
+ * enable bits, if necessary.
+ */
+ if (mmcr0 & (MMCR0_PMC1CE | MMCR0_PMCjCE))
+ mtspr(SPRN_MMCR0, mmcr0);
}

/*
@@ -1108,7 +1120,7 @@ static void perf_counter_interrupt(struct pt_regs *regs)

for (i = 0; i < cpuhw->n_counters; ++i) {
counter = cpuhw->counter[i];
- if (is_limited_pmc(counter->hw.idx))
+ if (!counter->hw.idx || is_limited_pmc(counter->hw.idx))
continue;
val = read_pmc(counter->hw.idx);
if ((int)val < 0) {
--
1.6.0.4

2009-06-03 09:52:19

by Paul Mackerras

Subject: [tip:perfcounters/core] perf_counter: powerpc: Fix event alternative code generation on POWER5/5+

Commit-ID: 6984efb692e97ce5f75f26e595685c04c2061bac
Gitweb: http://git.kernel.org/tip/6984efb692e97ce5f75f26e595685c04c2061bac
Author: Paul Mackerras <[email protected]>
AuthorDate: Wed, 3 Jun 2009 19:38:58 +1000
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 3 Jun 2009 11:49:52 +0200

perf_counter: powerpc: Fix event alternative code generation on POWER5/5+

Commit ef923214 ("perf_counter: powerpc: use u64 for event
codes internally") introduced a bug where the return value from
function find_alternative_bdecode gets put into a u64 variable
and later tested to see if it is < 0. The effect is that we
get extra, bogus event code alternatives on POWER5 and POWER5+,
leading to error messages such as "oops compute_mmcr failed"
being printed and counters not counting properly.

This fixes it by using s64 for the return type of
find_alternative_bdecode and for the local variable that the
caller puts the value in. It also makes the event argument a
u64 on POWER5+ for consistency.

Signed-off-by: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: John Kacur <[email protected]>
Cc: Stephane Eranian <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/powerpc/kernel/power5+-pmu.c | 4 ++--
arch/powerpc/kernel/power5-pmu.c  | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/power5+-pmu.c b/arch/powerpc/kernel/power5+-pmu.c
index c6cdfc1..8471e3c 100644
--- a/arch/powerpc/kernel/power5+-pmu.c
+++ b/arch/powerpc/kernel/power5+-pmu.c
@@ -242,7 +242,7 @@ static const unsigned char bytedecode_alternatives[4][4] = {
* event code for those that do, or -1 otherwise. This also handles
* alternative PCMSEL values for add events.
*/
-static int find_alternative_bdecode(unsigned int event)
+static s64 find_alternative_bdecode(u64 event)
{
int pmc, altpmc, pp, j;

@@ -277,7 +277,7 @@ static int power5p_get_alternatives(u64 event, unsigned int flags, u64 alt[])
{
int i, j, nalt = 1;
int nlim;
- u64 ae;
+ s64 ae;

alt[0] = event;
nalt = 1;
diff --git a/arch/powerpc/kernel/power5-pmu.c b/arch/powerpc/kernel/power5-pmu.c
index d534496..1b44c5f 100644
--- a/arch/powerpc/kernel/power5-pmu.c
+++ b/arch/powerpc/kernel/power5-pmu.c
@@ -250,7 +250,7 @@ static const unsigned char bytedecode_alternatives[4][4] = {
* PMCSEL values on other counters. This returns the alternative
* event code for those that do, or -1 otherwise.
*/
-static u64 find_alternative_bdecode(u64 event)
+static s64 find_alternative_bdecode(u64 event)
{
int pmc, altpmc, pp, j;

@@ -272,7 +272,7 @@ static u64 find_alternative_bdecode(u64 event)
static int power5_get_alternatives(u64 event, unsigned int flags, u64 alt[])
{
int i, j, nalt = 1;
- u64 ae;
+ s64 ae;

alt[0] = event;
nalt = 1;

2009-06-03 09:52:30

by Paul Mackerras

Subject: [tip:perfcounters/core] perf_counter: powerpc: Fix race causing "oops trying to read PMC0" errors

Commit-ID: dcd945e0d8a6d654e3e1de51faea9f98f1504aa5
Gitweb: http://git.kernel.org/tip/dcd945e0d8a6d654e3e1de51faea9f98f1504aa5
Author: Paul Mackerras <[email protected]>
AuthorDate: Wed, 3 Jun 2009 19:40:36 +1000
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 3 Jun 2009 11:49:53 +0200

perf_counter: powerpc: Fix race causing "oops trying to read PMC0" errors

When using interrupting counters and limited (non-interrupting)
counters at the same time, it's possible that we get an
interrupt in write_mmcr0() after writing MMCR0 but before we
have set up the counters using limited PMCs. What happens then
is that we get into perf_counter_interrupt() with
counter->hw.idx = 0 for the limited counters, leading to the
"oops trying to read PMC0" error message being printed.

This fixes the problem by making perf_counter_interrupt()
robust against counter->hw.idx being zero (the counter is just
ignored in that case) and also by changing write_mmcr0() to
write MMCR0 initially with the counter overflow interrupt
enable bits masked (set to 0). If the MMCR0 value requested by
the caller has either of those bits set, we write MMCR0 again
with the requested value of those bits after setting up the
limited counters properly.

Signed-off-by: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: John Kacur <[email protected]>
Cc: Stephane Eranian <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/powerpc/kernel/perf_counter.c | 16 ++++++++++++++--
1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/perf_counter.c b/arch/powerpc/kernel/perf_counter.c
index ea54686..4cc4ac5 100644
--- a/arch/powerpc/kernel/perf_counter.c
+++ b/arch/powerpc/kernel/perf_counter.c
@@ -372,16 +372,28 @@ static void write_mmcr0(struct cpu_hw_counters *cpuhw, unsigned long mmcr0)

/*
* Write MMCR0, then read PMC5 and PMC6 immediately.
+ * To ensure we don't get a performance monitor interrupt
+ * between writing MMCR0 and freezing/thawing the limited
+ * counters, we first write MMCR0 with the counter overflow
+ * interrupt enable bits turned off.
*/
asm volatile("mtspr %3,%2; mfspr %0,%4; mfspr %1,%5"
: "=&r" (pmc5), "=&r" (pmc6)
- : "r" (mmcr0), "i" (SPRN_MMCR0),
+ : "r" (mmcr0 & ~(MMCR0_PMC1CE | MMCR0_PMCjCE)),
+ "i" (SPRN_MMCR0),
"i" (SPRN_PMC5), "i" (SPRN_PMC6));

if (mmcr0 & MMCR0_FC)
freeze_limited_counters(cpuhw, pmc5, pmc6);
else
thaw_limited_counters(cpuhw, pmc5, pmc6);
+
+ /*
+ * Write the full MMCR0 including the counter overflow interrupt
+ * enable bits, if necessary.
+ */
+ if (mmcr0 & (MMCR0_PMC1CE | MMCR0_PMCjCE))
+ mtspr(SPRN_MMCR0, mmcr0);
}

/*
@@ -1108,7 +1120,7 @@ static void perf_counter_interrupt(struct pt_regs *regs)

for (i = 0; i < cpuhw->n_counters; ++i) {
counter = cpuhw->counter[i];
- if (is_limited_pmc(counter->hw.idx))
+ if (!counter->hw.idx || is_limited_pmc(counter->hw.idx))
continue;
val = read_pmc(counter->hw.idx);
if ((int)val < 0) {