2017-03-13 13:58:15

by Vince Weaver

Subject: perf: race with automatic rdpmc() disabling

Hello

I've been trying to track this issue down for a few days and haven't been
able to isolate it. So maybe someone who understands low-level perf mmap
reference counting can help here.

As you might recall, commit 7911d3f7af14a614617e38245fedf98a724e46a9
introduced automatic disabling of userspace rdpmc when no perf_events
were running.

I've run into a problem with PAPI when using rdpmc. If you have PAPI
measuring events in multiple pthread threads, sometimes (but not always)
the program will GPF because CR4/rdpmc gets turned off while events are
still active.

I've been trying to put together a reproducible test case but haven't been
able to manage. I have a PAPI test that will show the problem about
50% of the time but I can't seem to isolate the problem.

Any ideas?

If you really want to try to reproduce it, get the current git version of
PAPI
git clone https://bitbucket.org/icl/papi.git
edit src/components/perf_event/perf_event.c
so that #define PERF_USE_RDPMC 1
in src, run ./configure, then make
then run the ./ctests/zero_pthreads test a few times. It will GPF and I'm
relatively (though not entirely) sure it's not a PAPI issue.
The problem does go away if you set /sys/devices/cpu/rdpmc to 2.

Vince


2017-03-13 16:46:35

by Andy Lutomirski

Subject: Re: perf: race with automatic rdpmc() disabling

On Mon, Mar 13, 2017 at 6:58 AM, Vince Weaver <[email protected]> wrote:
> Hello
>
> I've been trying to track this issue down for a few days and haven't been
> able to isolate it. So maybe someone who understands low-level perf mmap
> reference counting can help here.
>
> As you might recall, 7911d3f7af14a614617e38245fedf98a724e46a9
> introduced automatic disabling of userspace rdpmc when no perf_events
> were running.
>
> I've run into a problem with PAPI when using rdpmc. If you have PAPI
> measuring events in multiple pthread threads, sometimes (but not always)
> the program will GPF because CR4/rdpmc gets turned off while events are
> still active.
>
> I've been trying to put together a reproducible test case but haven't been
> able to manage. I have a PAPI test that will show the problem about
> 50% of the time but I can't seem to isolate the problem.
>
> Any ideas?
>
> If you really want to try to reproduce it, get the current git version of
> PAPI
> git clone https://bitbucket.org/icl/papi.git
> edit src/components/perf_event/perf_event.c
> so that #define PERF_USE_RDPMC 1
> in src run ./configure , make
> then run the ./ctests/zero_pthreads test a few times. It will GPF and I'm
> relatively (though not entirely) sure it's not a PAPI issue.
> The problem does go away if you set /sys/devices/cpu/rdpmc to 2

Hmm

static void x86_pmu_event_mapped(struct perf_event *event)
{
	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
		return;

	if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)

	<-- thread 1 stalls here

		on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
}

Suppose you start with perf_rdpmc_allowed == 0. Thread 1 runs
x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
marked. Then thread 2 runs the whole function, does *not* update CR4,
returns to userspace, and GPFs.

The big hammer solution is to stick a per-mm mutex around it. Let me
ponder whether a smaller hammer is available.

2017-03-13 16:55:23

by Peter Zijlstra

Subject: Re: perf: race with automatic rdpmc() disabling

On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
> static void x86_pmu_event_mapped(struct perf_event *event)
> {
> 	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
> 		return;
>
> 	if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
>
> 	<-- thread 1 stalls here
>
> 		on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
> }
>
> Suppose you start with perf_rdpmc_allowed == 0. Thread 1 runs
> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
> marked. Then thread 2 runs the whole function, does *not* update CR4,
> returns to userspace, and GPFs.
>
> The big hammer solution is to stick a per-mm mutex around it. Let me
> ponder whether a smaller hammer is available.

Reminds me a bit of what we ended up with in kernel/jump_label.c:static_key_slow_inc().


2017-03-13 21:05:59

by Andy Lutomirski

Subject: Re: perf: race with automatic rdpmc() disabling

On Mon, Mar 13, 2017 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
> On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
>> static void x86_pmu_event_mapped(struct perf_event *event)
>> {
>> if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>> return;
>>
>> if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
>>
>> <-- thread 1 stalls here
>>
>> on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
>> }
>>
>> Suppose you start with perf_rdpmc_allowed == 0. Thread 1 runs
>> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
>> marked. Then thread 2 runs the whole function, does *not* update CR4,
>> returns to userspace, and GPFs.
>>
>> The big hammer solution is to stick a per-mm mutex around it. Let me
>> ponder whether a smaller hammer is available.
>
> Reminds me a bit of what we ended up with in kernel/jump_label.c:static_key_slow_inc().
>
>

One thing I don't get: isn't mmap_sem held for write the whole time?

2017-03-14 15:59:07

by Andy Lutomirski

Subject: Re: perf: race with automatic rdpmc() disabling

On Mon, Mar 13, 2017 at 2:05 PM, Andy Lutomirski <[email protected]> wrote:
> On Mon, Mar 13, 2017 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
>> On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
>>> static void x86_pmu_event_mapped(struct perf_event *event)
>>> {
>>> if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>>> return;
>>>
>>> if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
>>>
>>> <-- thread 1 stalls here
>>>
>>> on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
>>> }
>>>
>>> Suppose you start with perf_rdpmc_allowed == 0. Thread 1 runs
>>> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
>>> marked. Then thread 2 runs the whole function, does *not* update CR4,
>>> returns to userspace, and GPFs.
>>>
>>> The big hammer solution is to stick a per-mm mutex around it. Let me
>>> ponder whether a smaller hammer is available.
>>
>> Reminds me a bit of what we ended up with in kernel/jump_label.c:static_key_slow_inc().
>>
>>
>
> One thing I don't get: isn't mmap_sem held for write the whole time?

mmap_sem is indeed held, so my theory is wrong. I can reproduce it,
but I don't see the bug yet...

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC

2017-03-14 16:45:59

by Vince Weaver

Subject: Re: perf: race with automatic rdpmc() disabling

On Tue, 14 Mar 2017, Andy Lutomirski wrote:

> On Mon, Mar 13, 2017 at 2:05 PM, Andy Lutomirski <[email protected]> wrote:
> > On Mon, Mar 13, 2017 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
> >> On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
> >>> static void x86_pmu_event_mapped(struct perf_event *event)
> >>> {
> >>> if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
> >>> return;
> >>>
> >>> if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
> >>>
> >>> <-- thread 1 stalls here
> >>>
> >>> on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
> >>> }
> >>>
> >>> Suppose you start with perf_rdpmc_allowed == 0. Thread 1 runs
> >>> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
> >>> marked. Then thread 2 runs the whole function, does *not* update CR4,
> >>> returns to userspace, and GPFs.
> >>>
> >>> The big hammer solution is to stick a per-mm mutex around it. Let me
> >>> ponder whether a smaller hammer is available.
> >>
> >> Reminds me a bit of what we ended up with in kernel/jump_label.c:static_key_slow_inc().
> >>
> >>
> >
> > One thing I don't get: isn't mmap_sem held for write the whole time?
>
> mmap_sem is indeed held, so my theory is wrong. I can reproduce it,
> but I don't see the bug yet...

It could still be a PAPI bug, as I'm having absolutely no luck trying to
come up with a plain perf_event reproducer.

Let me dig through the PAPI code again and make sure I'm not missing
something.

Vince

2017-03-15 20:16:30

by Andy Lutomirski

Subject: Re: perf: race with automatic rdpmc() disabling

On Tue, Mar 14, 2017 at 9:45 AM, Vince Weaver <[email protected]> wrote:
> On Tue, 14 Mar 2017, Andy Lutomirski wrote:
>
>> On Mon, Mar 13, 2017 at 2:05 PM, Andy Lutomirski <[email protected]> wrote:
>> > On Mon, Mar 13, 2017 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
>> >> On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
>> >>> static void x86_pmu_event_mapped(struct perf_event *event)
>> >>> {
>> >>> if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>> >>> return;
>> >>>
>> >>> if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
>> >>>
>> >>> <-- thread 1 stalls here
>> >>>
>> >>> on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
>> >>> }
>> >>>
>> >>> Suppose you start with perf_rdpmc_allowed == 0. Thread 1 runs
>> >>> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
>> >>> marked. Then thread 2 runs the whole function, does *not* update CR4,
>> >>> returns to userspace, and GPFs.
>> >>>
>> >>> The big hammer solution is to stick a per-mm mutex around it. Let me
>> >>> ponder whether a smaller hammer is available.
>> >>
>> >> Reminds me a bit of what we ended up with in kernel/jump_label.c:static_key_slow_inc().
>> >>
>> >>
>> >
>> > One thing I don't get: isn't mmap_sem held for write the whole time?
>>
>> mmap_sem is indeed held, so my theory is wrong. I can reproduce it,
>> but I don't see the bug yet...
>
> It could still be a PAPI bug, as I'm having absolutely no luck trying to
> come up with a plain perf_event reproducer.
>
> Let me dig through the PAPI code again and make sure I'm not missing
> something.

Can you give this a try:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=9edb8154863ba1a7f6f1f15ffe6aecf3cf32bf21

(The link doesn't work yet but it should in a minute or two.)

--Andy

2017-03-16 05:43:50

by Vince Weaver

Subject: Re: perf: race with automatic rdpmc() disabling

On Wed, 15 Mar 2017, Andy Lutomirski wrote:

> Can you give this a try:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=9edb8154863ba1a7f6f1f15ffe6aecf3cf32bf21
>
> (The link doesn't work yet but it should in a minute or two.)

I've tested it and I am unable to reproduce the problem with the patch
applied.

Vince