2019-01-10 17:37:27

by Vince Weaver

Subject: perf: rdpmc bug when viewing all procs on remote cpu

Hello

I think this is a bug turned up by PAPI. I've been trying to track down
where this happens in the perf_event code myself, but it might be faster
to just report it.

If you create a per-process event attached to a CPU:
perf_event_open(attr, 0, X, -1, 0);
the mmap event index is set to "0" (not available) on all cores but the
current one, so the rdpmc read code can properly fall back to read().

However, if you create an all-process event attached to a CPU:
perf_event_open(attr, -1, X, -1, 0);
the mmap event index is set as if this were a valid event, and so the rdpmc
succeeds even though it shouldn't (we're trying to read an event value
on a remote CPU with a local rdpmc).

so I think somehow in the perf_event_open pid=-1 case rdpmc is not getting
blocked properly...
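
For reference, the userspace read sequence looks roughly like this (a minimal
sketch along the lines of the perf_event_open(2) man page example, not the
actual PAPI code; x86 only):

#include <linux/perf_event.h>
#include <stdint.h>
#include <unistd.h>

static inline uint64_t rdpmc(uint32_t counter)
{
	uint32_t low, high;

	__asm__ volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (counter));
	return low | ((uint64_t)high << 32);
}

/* pc points at the mmap()ed first page of the perf_event fd */
static uint64_t read_event(int fd, struct perf_event_mmap_page *pc)
{
	uint32_t seq, idx;
	uint64_t count;

	do {
		seq = pc->lock;
		__sync_synchronize();

		idx = pc->index;	/* 0 is supposed to mean "rdpmc not usable" */
		count = pc->offset;
		if (idx)
			count += rdpmc(idx - 1);

		__sync_synchronize();
	} while (pc->lock != seq);

	if (!idx)			/* fall back to the syscall */
		read(fd, &count, sizeof(count));

	return count;
}

In the pid=-1 case above, pc->index stays non-zero even when we're on the
wrong CPU, so the rdpmc path gets taken and returns a plausible-looking but
wrong value.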

Vince



2019-01-10 20:13:32

by Vince Weaver

Subject: Re: perf: rdpmc bug when viewing all procs on remote cpu

On Thu, 10 Jan 2019, Vince Weaver wrote:

> However, if you create an all-process event attached to a CPU:
> perf_event_open(attr, -1, X, -1, 0);
> the mmap event index is set as if this were a valid event, and so the rdpmc
> succeeds even though it shouldn't (we're trying to read an event value
> on a remote CPU with a local rdpmc).

For a test case, try the
tests/rdpmc/rdpmc_attach_other_cpu
test found in my perf_event_tests suite
git clone https://github.com/deater/perf_event_tests

I can trigger it with current git on an Intel machine, but not on an AMD
machine. Possibly because it is defaulting to one of the fixed counter
slots?

Vince

2019-01-10 21:15:09

by Vince Weaver

Subject: Re: perf: rdpmc bug when viewing all procs on remote cpu

On Thu, 10 Jan 2019, Vince Weaver wrote:

> On Thu, 10 Jan 2019, Vince Weaver wrote:
>
> > However, if you create an all-process event attached to a CPU:
> > perf_event_open(attr, -1, X, -1, 0);
> > the mmap event index is set as if this were a valid event, and so the rdpmc
> > succeeds even though it shouldn't (we're trying to read an event value
> > on a remote CPU with a local rdpmc).
>
> For a test case, try the
> tests/rdpmc/rdpmc_attach_other_cpu
> test found in my perf_event_tests suite
> git clone https://github.com/deater/perf_event_tests

and that was a cut-and-paste error, I meant
tests/rdpmc/rdpmc_attach_global_cpu
and I was wrong, it does affect AMD machines too.

Vince

2019-01-11 21:54:56

by Vince Weaver

Subject: Re: perf: rdpmc bug when viewing all procs on remote cpu

On Thu, 10 Jan 2019, Vince Weaver wrote:

> On Thu, 10 Jan 2019, Vince Weaver wrote:
>
> > On Thu, 10 Jan 2019, Vince Weaver wrote:
> >
> > > However, if you create an all-process event attached to a CPU:
> > > perf_event_open(attr, -1, X, -1, 0);
> > > the mmap event index is set as if this were a valid event, and so the rdpmc
> > > succeeds even though it shouldn't (we're trying to read an event value
> > > on a remote CPU with a local rdpmc).

so on further looking at the code, it doesn't appear that rdpmc events are
explicitly marked as unavailable in the attach-cpu or attach-pid case,
it's just by luck the check for PERF_EVENT_STATE_ACTIVE catches most of
the cases?

should an explicit check be added to zero out userpg->index in cases where
the event being measured is running on a different core?

Vince

2019-01-18 12:04:45

by Peter Zijlstra

Subject: Re: perf: rdpmc bug when viewing all procs on remote cpu

On Fri, Jan 11, 2019 at 04:52:22PM -0500, Vince Weaver wrote:
> On Thu, 10 Jan 2019, Vince Weaver wrote:
>
> > On Thu, 10 Jan 2019, Vince Weaver wrote:
> >
> > > On Thu, 10 Jan 2019, Vince Weaver wrote:
> > >
> > > > However, if you create an all-process event attached to a CPU:
> > > > perf_event_open(attr, -1, X, -1, 0);
> > > > the mmap event index is set as if this were a valid event, and so the rdpmc
> > > > succeeds even though it shouldn't (we're trying to read an event value
> > > > on a remote CPU with a local rdpmc).
>
> so on further looking at the code, it doesn't appear that rdpmc events are
> explicitly marked as unavailable in the attach-cpu or attach-pid case,
> it's just by luck the check for PERF_EVENT_STATE_ACTIVE catches most of
> the cases?
>
> should an explicit check be added to zero out userpg->index in cases where
> the event being measured is running on a different core?

And how would we know? We don't know what CPU will be observing the
mmap().

RDPMC fundamentally only makes sense on 'self' (either task or CPU).

2019-01-18 14:10:57

by Vince Weaver

Subject: Re: perf: rdpmc bug when viewing all procs on remote cpu

On Fri, 18 Jan 2019, Peter Zijlstra wrote:

> On Fri, Jan 11, 2019 at 04:52:22PM -0500, Vince Weaver wrote:
> > On Thu, 10 Jan 2019, Vince Weaver wrote:
> >
> > > On Thu, 10 Jan 2019, Vince Weaver wrote:
> > >
> > > > On Thu, 10 Jan 2019, Vince Weaver wrote:
> > > >
> > > > > However, if you create an all-process event attached to a CPU:
> > > > > perf_event_open(attr, -1, X, -1, 0);
> > > > > the mmap event index is set as if this were a valid event, and so the rdpmc
> > > > > succeeds even though it shouldn't (we're trying to read an event value
> > > > > on a remote CPU with a local rdpmc).
> >
> > so on further looking at the code, it doesn't appear that rdpmc events are
> > explicitly marked as unavailable in the attach-cpu or attach-pid case,
> > it's just by luck the check for PERF_EVENT_STATE_ACTIVE catches most of
> > the cases?
> >
> > should an explicit check be added to zero out userpg->index in cases where
> > the event being measured is running on a different core?
>
> And how would we know? We don't know what CPU will be observing the
> mmap().
>
> RDPMC fundamentally only makes sense on 'self' (either task or CPU).

so is this a "don't do that then" thing and I should have PAPI
userspace avoid using rdpmc() whenever a proc/cpu was attached to?

Or is there a way to have the kernel indicate this? Does the kernel track
the current CPU and the original CPU of the mmap, and could it zero out the
index field in this case? Or would that add too much overhead?

Vince

2019-01-18 16:11:49

by Peter Zijlstra

Subject: Re: perf: rdpmc bug when viewing all procs on remote cpu

On Fri, Jan 18, 2019 at 09:09:04AM -0500, Vince Weaver wrote:
> On Fri, 18 Jan 2019, Peter Zijlstra wrote:
>
> > On Fri, Jan 11, 2019 at 04:52:22PM -0500, Vince Weaver wrote:
> > > On Thu, 10 Jan 2019, Vince Weaver wrote:
> > >
> > > > On Thu, 10 Jan 2019, Vince Weaver wrote:
> > > >
> > > > > On Thu, 10 Jan 2019, Vince Weaver wrote:
> > > > >
> > > > > > However, if you create an all-process event attached to a CPU:
> > > > > > perf_event_open(attr, -1, X, -1, 0);
> > > > > > the mmap event index is set as if this were a valid event, and so the rdpmc
> > > > > > succeeds even though it shouldn't (we're trying to read an event value
> > > > > > on a remote CPU with a local rdpmc).
> > >
> > > so on further looking at the code, it doesn't appear that rdpmc events are
> > > explicitly marked as unavailable in the attach-cpu or attach-pid case,
> > > it's just by luck the check for PERF_EVENT_STATE_ACTIVE catches most of
> > > the cases?
> > >
> > > should an explicit check be added to zero out userpg->index in cases where
> > > the event being measured is running on a different core?
> >
> > And how would we know? We don't know what CPU will be observing the
> > mmap().
> >
> > RDPMC fundamentally only makes sense on 'self' (either task or CPU).
>
> so is this a "don't do that then" thing and I should have PAPI
> userspace avoid using rdpmc() whenever a proc/cpu was attached to?

You can actually use rdpmc when you attach to a CPU, but you have to
ensure that the userspace component is guaranteed to run on that very
CPU (sched_setaffinity(2) comes to mind).
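
Something like this sketch (hypothetical helper; saving and restoring the
previous affinity mask is left out):

#define _GNU_SOURCE
#include <sched.h>

/* pin the calling thread to the event's target CPU so that a local
 * rdpmc actually reads that CPU's counter */
static int bind_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	return sched_setaffinity(0, sizeof(set), &set);	/* 0 == calling thread */
}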

> Or is there a way to have the kernel indicate this? Does the kernel track
> the current CPU and the original CPU of the mmap, and could it zero out the
> index field in this case? Or would that add too much overhead?

Impossible, I'm afraid. Memory is not associated with a CPU; its
contents are the same whatever CPU reads from it.

The best we could possibly do is put the (target, not current) cpu
number in the mmap page; but userspace should already know this, since it
created the event in the first place.

2019-01-18 17:26:13

by Vince Weaver

Subject: Re: perf: rdpmc bug when viewing all procs on remote cpu

On Fri, 18 Jan 2019, Peter Zijlstra wrote:
>
> You can actually use rdpmc when you attach to a CPU, but you have to
> ensure that the userspace component is guaranteed to run on that very
> CPU (sched_setaffinity(2) comes to mind).

unfortunately the HPC people using PAPI would probably be annoyed if we
started binding their threads to cores out from under them.

> The best we could possibly do is put the (target, not current) cpu
> number in the mmap page; but userspace should already know this, since it
> created the event in the first place.

one other thing the kernel could do is just disable rdpmc (setting index
to 0) in the case where the original perf_event_open() cpu parameter != -1

though that would stop the case where we were on the same CPU from
working.

The issue is that currently, if you're not careful, the rdpmc() interface
will sometimes return plausible (but wrong) results for a cross-CPU rdpmc()
call, even if you are properly falling back to read() on ->index being 0.
It's a bit surprising and it looks like it will take a decent amount of
userspace code to work around the issue, which cuts into the low-overhead
nature of rdpmc.

If the answer is simply that this is the way the kernel is going to do it,
that's fine; I just have to add workarounds to PAPI and then get the
perf_event_open() manpage updated to make sure people are aware of the
issue.

Vince



2019-01-18 20:13:23

by Peter Zijlstra

Subject: Re: perf: rdpmc bug when viewing all procs on remote cpu

On Fri, Jan 18, 2019 at 12:24:20PM -0500, Vince Weaver wrote:
> On Fri, 18 Jan 2019, Peter Zijlstra wrote:
> >
> > You can actually use rdpmc when you attach to a CPU, but you have to
> > ensure that the userspace component is guaranteed to run on that very
> > CPU (sched_setaffinity(2) comes to mind).
>
> unfortunately the HPC people using PAPI would probably be annoyed if we
> started binding their threads to cores out from under them.

Quite.. :-)

> > The best we could possibly do is put the (target, not current) cpu
> > number in the mmap page; but userspace should already know this, since it
> > created the event in the first place.
>
> one other thing the kernel could do is just disable rdpmc (setting index
> to 0) in the case where the original perf_event_open() cpu parameter != -1
>
> though that would stop the case where we were on the same CPU from
> working.

Indeed.

> The issue is that currently, if you're not careful, the rdpmc() interface
> will sometimes return plausible (but wrong) results for a cross-CPU rdpmc()
> call, even if you are properly falling back to read() on ->index being 0.
> It's a bit surprising and it looks like it will take a decent amount of
> userspace code to work around the issue, which cuts into the low-overhead
> nature of rdpmc.
>
> If the answer is simply that this is the way the kernel is going to do it,
> that's fine; I just have to add workarounds to PAPI and then get the
> perf_event_open() manpage updated to make sure people are aware of the
> issue.

I'm not sure there really is anything the kernel can do to help here...
One thing you could look at is using rseq together with adding a CPU
number to the userspace descriptor, and if rseq gives you a matching CPU
number, use rdpmc; otherwise, or on rseq abort, use read().
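
Roughly like this sketch (using sched_getcpu() as a stand-in for the rseq
CPU check, so it is still racy against a migration between the check and the
rdpmc; a real version would wrap both in an rseq critical section whose abort
path takes the read() fallback, and the seqlock loop around the mmap page is
omitted for brevity):

#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sched.h>
#include <stdint.h>
#include <unistd.h>

/* target_cpu is the cpu argument that was given to perf_event_open() */
static uint64_t read_counter(int fd, struct perf_event_mmap_page *pc,
			     int target_cpu)
{
	uint32_t low, high, idx = pc->index;
	uint64_t count;

	if (idx && sched_getcpu() == target_cpu) {
		/* fast path: we are on the event's CPU, rdpmc is meaningful */
		__asm__ volatile("rdpmc" : "=a" (low), "=d" (high)
				 : "c" (idx - 1));
		count = pc->offset + (low | ((uint64_t)high << 32));
	} else {
		/* fallback: remote CPU (or rdpmc unavailable), use the syscall */
		read(fd, &count, sizeof(count));
	}

	return count;
}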