2015-06-11 09:59:31

by Hendrik Brueckner

[permalink] [raw]
Subject: [PATCH 1/2] perf: return ENOENT instead of ENOTSUPP

The ENOTSUPP (which actually should be EOPNOTSUPP for user space) does not
trigger a fallback event selection, for example, by perf record.
If hardware support for the cycles perf event is available, but the hardware
does not provide interrupts, returning ENOTSUPP causes perf to end. Returning
ENOENT causes the perf tool to fallback to a software-based cycle PMU that
supports interrupts.

The commit 53b25335dd ("perf: Disable sampled events if no PMU interrupt")
introduced that incompatible change.

Reported-by: Michael Holzheu <[email protected]>
Signed-off-by: Hendrik Brueckner <[email protected]>
---
kernel/events/core.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index eddf1ed..4c66465 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7964,7 +7964,7 @@ SYSCALL_DEFINE5(perf_event_open,

if (is_sampling_event(event)) {
if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
- err = -ENOTSUPP;
+ err = -ENOENT;
goto err_alloc;
}
}
--
1.7.1


2015-06-11 09:59:48

by Hendrik Brueckner

[permalink] [raw]
Subject: [PATCH 2/2] perf: correct event accounting imbalance on error path

If the perf_event_open() syscall is called for sampling (perf record), but
the selected PMU does not support the sampling, the event is freed. This
particular free includes a decrement of the perf_sched_events jump label.

However, the accounting which actually increase perf_sched_events is not
yet called and, therefore, triggers a warning in the jump_label code.

On s390, this ends in this warning:

[ 16.633195] ------------[ cut here ]------------
[ 16.633196] WARNING: at ../kernel/jump_label.c:82
[ 16.633197] Modules linked in: eadm_sch nfsd auth_rpcgss oid_registry nfs_acl lockd dm_multipath grace dm_mod sunrpc scsi_dh autofs4
[ 16.633204] CPU: 0 PID: 539 Comm: perf Not tainted 3.18.3-20150126.0.fdf02cc.31d6da9.fc20.s390xperformance #1
[ 16.633206] task: 000000000afecec0 ti: 0000000004a94000 task.ti: 0000000004a94000
[ 16.633207] Krnl PSW : 0704e00180000000 000000000024c6c2 (__static_key_slow_dec+0xfa/0x100)
[ 16.633214] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
Krnl GPRS: 0000000000000086 0000000000e39e14 000000000000001b 0000000000000000
[ 16.633216] 000000000024c6be 0000000000000000 0000000000000000 0000000000000000
[ 16.633218] 0000000000000000 0000000005d13480 fffffffffffffdf4 0000000000ccac18
[ 16.633219] 0000000000000064 0000000000ccabf8 000000000024c6be 0000000004a97d58
[ 16.633227] Krnl Code: 000000000024c6b2: c0200041fd3e larl %r2,a8c12e
000000000024c6b8: c0e5003292c4 brasl %r14,89ec40
#000000000024c6be: a7f40001 brc 15,24c6c0
>000000000024c6c2: a7f4ffb2 brc 15,24c626
000000000024c6c6: 0707 bcr 0,%r7
000000000024c6c8: c0f40000000c brcl 15,24c6e0
000000000024c6ce: c0100055c8ed larl %r1,d058a8
000000000024c6d4: c0e50033065c brasl %r14,8ad38c
[ 16.633236] Call Trace:
[ 16.633238] ([<000000000024c6be>] __static_key_slow_dec+0xf6/0x100)
[ 16.633240] [<000000000024087c>] _free_event+0x15c/0x198
[ 16.633241] [<0000000000246e7a>] SyS_perf_event_open+0x432/0xa70
[ 16.633245] [<00000000008ac5f2>] system_call+0xd6/0x258
[ 16.633246] [<000003ffaaa784e2>] 0x3ffaaa784e2
[ 16.633247] Last Breaking-Event-Address:
[ 16.633249] [<000000000024c6be>] __static_key_slow_dec+0xf6/0x100
[ 16.633250] ---[ end trace cd0b0e85985e8baa ]---

To solve this problem, just free the event without taking care of the
accounting.

Reported-by: Michael Holzheu <[email protected]>
Signed-off-by: Hendrik Brueckner <[email protected]>
---
kernel/events/core.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4c66465..d9051e0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7965,7 +7965,8 @@ SYSCALL_DEFINE5(perf_event_open,
if (is_sampling_event(event)) {
if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
err = -ENOENT;
- goto err_alloc;
+ __free_event(event);
+ goto err_cpus;
}
}

--
1.7.1

2015-06-11 10:25:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: return ENOENT instead of ENOTSUPP

On Thu, 2015-06-11 at 11:59 +0200, Hendrik Brueckner wrote:
> The ENOTSUPP (which actually should be EOPNOTSUPP for user space) does not
> trigger a fallback event selection, for example, by perf record.
> If hardware support for the cycles perf event is available, but the hardware
> does not provide interrupts, returning ENOTSUPP causes perf to end. Returning
> ENOENT causes the perf tool to fallback to a software-based cycle PMU that
> supports interrupts.
>
> The commit 53b25335dd ("perf: Disable sampled events if no PMU interrupt")
> introduced that incompatible change.

That's 3.16

> if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
> - err = -ENOTSUPP;
> + err = -ENOENT;
> goto err_alloc;
> }
> }

And now you would be changing an API that's been around for at least 4
releases.

Also, I really think -ENOENT is the wrong return here, you're asking for
things that's not supported, not for something that's not there.

2015-06-11 13:03:11

by Hendrik Brueckner

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: return ENOENT instead of ENOTSUPP

On Thu, Jun 11, 2015 at 12:25:01PM +0200, Peter Zijlstra wrote:
> On Thu, 2015-06-11 at 11:59 +0200, Hendrik Brueckner wrote:
> > The ENOTSUPP (which actually should be EOPNOTSUPP for user space) does not
> > trigger a fallback event selection, for example, by perf record.
> > If hardware support for the cycles perf event is available, but the hardware
> > does not provide interrupts, returning ENOTSUPP causes perf to end. Returning
> > ENOENT causes the perf tool to fallback to a software-based cycle PMU that
> > supports interrupts.
> >
> > The commit 53b25335dd ("perf: Disable sampled events if no PMU interrupt")
> > introduced that incompatible change.
>
> That's 3.16

Correct... I recently encountered the problem.

>
> > if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
> > - err = -ENOTSUPP;
> > + err = -ENOENT;
> > goto err_alloc;
> > }
> > }
>
> And now you would be changing an API that's been around for at least 4
> releases.

Well... the behavior before 53b25335dd was differently in this regard. Of
course, the API changed 4 releases ago. The question here is rather was
this desired or not. In my mind I considered this problem as a regression.

>
> Also, I really think -ENOENT is the wrong return here, you're asking for
> things that's not supported, not for something that's not there.

So looks like -ENOTSUPP is the desired API now. So the problem I'd like
to solve is that there are two different hardware PMUs that support the
"cycles" event. Just one of them supports sampling of cycles, the other not.

In the past (prior to 3.16), the perf tool tried several PMUs if -ENOENT
was returned. With 3.16, -ENOTSUPP is returned (which actually should be
-EOPNOTSUPP but different story) and the perf tool exits.

So the question is: what is the desired behavior?

A solution towards the "fallback-behavior" would be to change
perf_init_event() and consider the sampling/non-sampling criteria (in general
pmu->capabilities) when looking for a matching PMU to serve the event?

Thanks and kind regards,
Hendrik

--
Hendrik Brueckner
[email protected] | IBM Deutschland Research & Development GmbH
Linux on System z Development | Schoenaicher Str. 220, 71032 Boeblingen


IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

2015-06-11 13:28:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] perf: return ENOENT instead of ENOTSUPP

On Thu, Jun 11, 2015 at 03:02:55PM +0200, Hendrik Brueckner wrote:
> On Thu, Jun 11, 2015 at 12:25:01PM +0200, Peter Zijlstra wrote:
> > On Thu, 2015-06-11 at 11:59 +0200, Hendrik Brueckner wrote:
> > > The ENOTSUPP (which actually should be EOPNOTSUPP for user space) does not
> > > trigger a fallback event selection, for example, by perf record.
> > > If hardware support for the cycles perf event is available, but the hardware
> > > does not provide interrupts, returning ENOTSUPP causes perf to end. Returning
> > > ENOENT causes the perf tool to fallback to a software-based cycle PMU that
> > > supports interrupts.
> > >
> > > The commit 53b25335dd ("perf: Disable sampled events if no PMU interrupt")
> > > introduced that incompatible change.
> >
> > That's 3.16
>
> Correct... I recently encountered the problem.
>
> >
> > > if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
> > > - err = -ENOTSUPP;
> > > + err = -ENOENT;
> > > goto err_alloc;
> > > }
> > > }
> >
> > And now you would be changing an API that's been around for at least 4
> > releases.
>
> Well... the behavior before 53b25335dd was differently in this regard. Of
> course, the API changed 4 releases ago. The question here is rather was
> this desired or not. In my mind I considered this problem as a regression.

Ah, I see what you mean:

97b1198fece0 ("s390, perf: Use common PMU interrupt disabled code")

> > Also, I really think -ENOENT is the wrong return here, you're asking for
> > things that's not supported, not for something that's not there.
>
> So looks like -ENOTSUPP is the desired API now. So the problem I'd like
> to solve is that there are two different hardware PMUs that support the
> "cycles" event. Just one of them supports sampling of cycles, the other not.

perf_event_attr::type will uniquely identify the pmu.

if (event->attr.type != pmu->type)
return -ENOENT;

/* event is for this pmu, any fail hereafter should be fatal */

if (is_sampling_event(event))
return -EOPNOTSUPP;

> In the past (prior to 3.16), the perf tool tried several PMUs if -ENOENT
> was returned. With 3.16, -ENOTSUPP is returned (which actually should be
> -EOPNOTSUPP but different story) and the perf tool exits.

So cpumf_pmu_event_init() should not have returned -ENOENT to start
with.

It should have first ascertained that this event was indeed for that
pmu, if not, -ENOENT would indeed be the correct return. However once it
finds its an event for this pmu, which requests sampling, which this pmu
cannot deliver, it should have returned a fatal error (!-ENOENT).

-EOPNOTSUPP would have been a good one there.

> So the question is: what is the desired behavior?

I think desired would be EOPNOTSUPP, but it would mean yet another
change to the API.

Then again, seeing how this isn't actually working no, that might not be
too bad.