LinuxLists.cc - perf: fuzzer triggered warning in intel_pmu_drain_pebs

[permalink] [raw]

Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Thu, Jul 02, 2015 at 11:18:10AM -0400, Vince Weaver wrote:
> This is current linus-git on a Haswell machine with peterz's patch to fix
> the aux buffer spinlock recursion (I can still crash the kernel if that
> patch is not applied).

Argh, thanks for reminding me of that patch.

2015-07-03 13:13:51

[permalink] [raw]

Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Thu, Jul 02, 2015 at 11:18:10AM -0400, Vince Weaver wrote:
>
> So sad to say the lack of fuzzer reports was because I was out of town for
> a bit, not due to the kernel suddenly getting amazingly better.
>
> In any case I am running against current git and getting a lot of
> warnings, but most of them seem to be old ones. This following one looks
> new though.
>
> This is current linus-git on a Haswell machine with peterz's patch to fix
> the aux buffer spinlock recursion (I can still crash the kernel if that
> patch is not applied).
>
> It corresponds to:
>
> WARN_ON_ONCE(!event->attr.precise_ip);
>
> [ 584.352324] WARNING: CPU: 2 PID: 18924 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()

I've not yet tried to reproduce, but the below could explain things.

On disabling an event we first clear our cpuc->pebs_enabled bits, only
to then check them to see if there are any set, and if so, drain the
buffer.

If we just cleared the last bit, we'll fail to drain the buffer.

If we then program another event on that counter and another PEBS event,
we can hit the above WARN with the 'stale' entries left over from the
previous event.

---
arch/x86/kernel/cpu/perf_event_intel_ds.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 71fc40238843..041a30ba5654 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -548,7 +548,7 @@ int intel_pmu_drain_bts_buffer(void)

static inline void intel_pmu_drain_pebs_buffer(void)
{
- struct pt_regs regs;
+ struct pt_regs regs; /* SAMPLE_REGS_INTR must not be set for FREERUNNING */

x86_pmu.drain_pebs(&regs);
}
@@ -755,13 +755,6 @@ void intel_pmu_pebs_disable(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
struct debug_store *ds = cpuc->ds;

- cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
-
- if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
- cpuc->pebs_enabled &= ~(1ULL << (hwc->idx + 32));
- else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
- cpuc->pebs_enabled &= ~(1ULL << 63);
-
if (ds->pebs_interrupt_threshold >
ds->pebs_buffer_base + x86_pmu.pebs_record_size) {
intel_pmu_drain_pebs_buffer();
@@ -769,6 +762,13 @@ void intel_pmu_pebs_disable(struct perf_event *event)
perf_sched_cb_dec(event->ctx->pmu);
}

+ cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
+
+ if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
+ cpuc->pebs_enabled &= ~(1ULL << (hwc->idx + 32));
+ else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
+ cpuc->pebs_enabled &= ~(1ULL << 63);
+
if (cpuc->enabled)
wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);

2015-07-03 18:57:17

[permalink] [raw]

Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

Peter,

On Fri, Jul 3, 2015 at 3:13 PM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Jul 02, 2015 at 11:18:10AM -0400, Vince Weaver wrote:
>>
>> So sad to say the lack of fuzzer reports was because I was out of town for
>> a bit, not due to the kernel suddenly getting amazingly better.
>>
>> In any case I am running against current git and getting a lot of
>> warnings, but most of them seem to be old ones. This following one looks
>> new though.
>>
>> This is current linus-git on a Haswell machine with peterz's patch to fix
>> the aux buffer spinlock recursion (I can still crash the kernel if that
>> patch is not applied).
>>
>> It corresponds to:
>>
>> WARN_ON_ONCE(!event->attr.precise_ip);
>>
>> [ 584.352324] WARNING: CPU: 2 PID: 18924 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
>
> I've not yet tried to reproduce, but the below could explain things.
>
> On disabling an event we first clear our cpuc->pebs_enabled bits, only
> to then check them to see if there are any set, and if so, drain the
> buffer.
>
> If we just cleared the last bit, we'll fail to drain the buffer.
>
Where do you see that we use cpuc->pebs_enabled after clearing it
in pebs_disable() to check for overflow or active in drain_pebs()?
I only see it used in get_next_pebs_record_by_bit()?

> If we then program another event on that counter and another PEBS event,
> we can hit the above WARN with the 'stale' entries left over from the
> previous event.
>
> ---
> arch/x86/kernel/cpu/perf_event_intel_ds.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> index 71fc40238843..041a30ba5654 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> @@ -548,7 +548,7 @@ int intel_pmu_drain_bts_buffer(void)
>
> static inline void intel_pmu_drain_pebs_buffer(void)
> {
> - struct pt_regs regs;
> + struct pt_regs regs; /* SAMPLE_REGS_INTR must not be set for FREERUNNING */
>
> x86_pmu.drain_pebs(&regs);
> }
> @@ -755,13 +755,6 @@ void intel_pmu_pebs_disable(struct perf_event *event)
> struct hw_perf_event *hwc = &event->hw;
> struct debug_store *ds = cpuc->ds;
>
> - cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
> -
> - if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
> - cpuc->pebs_enabled &= ~(1ULL << (hwc->idx + 32));
> - else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
> - cpuc->pebs_enabled &= ~(1ULL << 63);
> -
> if (ds->pebs_interrupt_threshold >
> ds->pebs_buffer_base + x86_pmu.pebs_record_size) {
> intel_pmu_drain_pebs_buffer();
> @@ -769,6 +762,13 @@ void intel_pmu_pebs_disable(struct perf_event *event)
> perf_sched_cb_dec(event->ctx->pmu);
> }
>
> + cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
> +
> + if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
> + cpuc->pebs_enabled &= ~(1ULL << (hwc->idx + 32));
> + else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
> + cpuc->pebs_enabled &= ~(1ULL << 63);
> +
> if (cpuc->enabled)
> wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
>

2015-07-03 18:57:04

[permalink] [raw]

Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Fri, 3 Jul 2015, Peter Zijlstra wrote:

> On Thu, Jul 02, 2015 at 11:18:10AM -0400, Vince Weaver wrote:
> >
> > So sad to say the lack of fuzzer reports was because I was out of town for
> > a bit, not due to the kernel suddenly getting amazingly better.
> >
> > In any case I am running against current git and getting a lot of
> > warnings, but most of them seem to be old ones. This following one looks
> > new though.
> >
> > This is current linus-git on a Haswell machine with peterz's patch to fix
> > the aux buffer spinlock recursion (I can still crash the kernel if that
> > patch is not applied).
> >
> > It corresponds to:
> >
> > WARN_ON_ONCE(!event->attr.precise_ip);
> >
> > [ 584.352324] WARNING: CPU: 2 PID: 18924 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
>
> I've not yet tried to reproduce, but the below could explain things.
>
> On disabling an event we first clear our cpuc->pebs_enabled bits, only
> to then check them to see if there are any set, and if so, drain the
> buffer.
>
> If we just cleared the last bit, we'll fail to drain the buffer.
>
> If we then program another event on that counter and another PEBS event,
> we can hit the above WARN with the 'stale' entries left over from the
> previous event.

with that patch applied I still managed to hit this:

WARN_ON_ONCE(!event->attr.precise_ip);

I'll let it run some more and see if the watchdog still gets triggered.

[ 2217.544901] ------------[ cut here ]------------
[ 2217.550351] WARNING: CPU: 2 PID: 9136 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
[ 2217.563534] Modules linked in: fuse snd_hda_codec_hdmi i915 x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel psmouse hmac drbg evdev serio_raw ansi_cprng snd_hda_codec_realtek drm_kms_helper snd_hda_codec_generic ppdev iTCO_wdt iTCO_vendor_support pcspkr drm i2c_algo_bit aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper snd_hda_intel cryptd mei_me mei snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer tpm_tis tpm wmi button processor video battery i2c_i801 parport_pc parport snd lpc_ich mfd_core soundcore sg sr_mod sd_mod cdrom ehci_pci ehci_hcd ahci libahci xhci_pci xhci_hcd e1000e libata ptp crc32c_intel scsi_mod pps_core usbcore usb_common fan thermal thermal_sys
[ 2217.640998] CPU: 2 PID: 9136 Comm: perf_fuzzer Tainted: G W 4.1.0+ #163
[ 2217.649810] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 2217.658281] ffffffff81a105a0 ffff88011ea85b10 ffffffff8169f823 0000000000000000
[ 2217.666818] 0000000000000000 ffff88011ea85b50 ffffffff8106ec8a ffff88011ea85ba0
[ 2217.675329] 0000000000000002 0000000000000001 ffff88011ea8bd80 ffff8801190400c0
[ 2217.683821] Call Trace:
[ 2217.686960] <NMI> [<ffffffff8169f823>] dump_stack+0x45/0x57
[ 2217.693638] [<ffffffff8106ec8a>] warn_slowpath_common+0x8a/0xc0
[ 2217.700549] [<ffffffff8106ed7a>] warn_slowpath_null+0x1a/0x20
[ 2217.707296] [<ffffffff8102f783>] intel_pmu_drain_pebs_nhm+0x283/0x2e0
[ 2217.714775] [<ffffffff81031834>] ? intel_pmu_disable_event+0xa4/0x130
[ 2217.722216] [<ffffffff81032235>] intel_pmu_handle_irq+0x255/0x440
[ 2217.729339] [<ffffffff8115413e>] ? perf_event_ctx_lock_nested+0x5e/0xf0
[ 2217.737026] [<ffffffff81028e76>] perf_event_nmi_handler+0x26/0x40
[ 2217.744070] [<ffffffff810181ad>] nmi_handle+0x9d/0x140
[ 2217.750160] [<ffffffff81018115>] ? nmi_handle+0x5/0x140
[ 2217.756290] [<ffffffff8101843a>] default_do_nmi+0x4a/0x120
[ 2217.762688] [<ffffffff8101859d>] do_nmi+0x8d/0xc0
[ 2217.768280] [<ffffffff816a979f>] end_repeat_nmi+0x1e/0x2e
[ 2217.774627] [<ffffffff810309ba>] ? __intel_pmu_enable_all+0x5a/0xc0
[ 2217.781894] [<ffffffff810309ba>] ? __intel_pmu_enable_all+0x5a/0xc0
[ 2217.789153] [<ffffffff810309ba>] ? __intel_pmu_enable_all+0x5a/0xc0
[ 2217.796415] <<EOE>> <IRQ> [<ffffffff81030a30>] intel_pmu_enable_all+0x10/0x20
[ 2217.804847] [<ffffffff8102a95c>] x86_pmu_enable+0x25c/0x2e0
[ 2217.811383] [<ffffffff811560e2>] perf_pmu_enable+0x22/0x30
[ 2217.817837] [<ffffffff81157a80>] perf_mux_hrtimer_handler+0x120/0x1f0
[ 2217.825316] [<ffffffff81157960>] ? perf_event_context_sched_in+0x150/0x150
[ 2217.833239] [<ffffffff810dcf43>] __hrtimer_run_queues+0xd3/0x260
[ 2217.840239] [<ffffffff810dd4bb>] hrtimer_interrupt+0xab/0x1b0
[ 2217.846930] [<ffffffff8104b32c>] local_apic_timer_interrupt+0x3c/0x70
[ 2217.854367] [<ffffffff816aa1a1>] smp_apic_timer_interrupt+0x41/0x60
[ 2217.861630] [<ffffffff816a83eb>] apic_timer_interrupt+0x6b/0x70
[ 2217.868540] <EOI>
[ 2217.870633] ---[ end trace 3a31b4d07b4f3450 ]---
[ 2353.824071] Uhhuh. NMI received for unknown reason 31 on CPU 1.
[ 2353.831238] Do you have a strange power saving mode enabled?
[ 2353.838120] Dazed and confused, but trying to continue

2015-07-03 19:04:37

[permalink] [raw]

Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Fri, Jul 03, 2015 at 08:56:57PM +0200, Stephane Eranian wrote:
> Where do you see that we use cpuc->pebs_enabled after clearing it
> in pebs_disable() to check for overflow or active in drain_pebs()?
> I only see it used in get_next_pebs_record_by_bit()?

| static inline bool pebs_is_enabled(struct cpu_hw_events *cpuc)
| {
| return (cpuc->pebs_enabled & ((1ULL << MAX_PEBS_EVENTS) - 1));
| }

| void intel_pmu_pebs_disable(struct perf_event *event)
| {
| struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
| struct hw_perf_event *hwc = &event->hw;
| struct debug_store *ds = cpuc->ds;
|
| cpuc->pebs_enabled &= ~(1ULL << hwc->idx);

Here we clear it...

| if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
| cpuc->pebs_enabled &= ~(1ULL << (hwc->idx + 32));
| else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
| cpuc->pebs_enabled &= ~(1ULL << 63);
|
| if (ds->pebs_interrupt_threshold >
| ds->pebs_buffer_base + x86_pmu.pebs_record_size) {
| intel_pmu_drain_pebs_buffer();

Here we drain the buffer; which uses ->pebs_enabled

| if (!pebs_is_enabled(cpuc))

And here we test it

| perf_sched_cb_dec(event->ctx->pmu);
| }
|
| if (cpuc->enabled)
| wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
|
| hwc->config |= ARCH_PERFMON_EVENTSEL_INT;
| }

That said, its far too warm and I might just not be making sense.

2015-07-03 19:43:37

[permalink] [raw]

Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Fri, 3 Jul 2015, Peter Zijlstra wrote:

> That said, its far too warm and I might just not be making sense.

you need to come visit Maine! Although I am not sure the cooler weather
necessarily improves my kernel debugging skills.

I managed to lock the machine (again this is with the patch applied).

[ 299.366027] ------------[ cut here ]------------
[ 299.370985] WARNING: CPU: 2 PID: 8241 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
[ 299.456929] CPU: 2 PID: 8241 Comm: perf_fuzzer Tainted: G W 4.1.0+ #164
[ 299.465750] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 299.474274] ffffffff81a105a0 ffff88011ea85b10 ffffffff8169f823 0000000000000000
[ 299.482864] 0000000000000000 ffff88011ea85b50 ffffffff8106ec8a ffff88011ea85ba0
[ 299.491488] 0000000000000000 0000000000000001 ffff88011ea8bd80 ffff8801190400c0
[ 299.500029] Call Trace:
[ 299.503190] <NMI> [<ffffffff8169f823>] dump_stack+0x45/0x57
[ 299.509936] [<ffffffff8106ec8a>] warn_slowpath_common+0x8a/0xc0
[ 299.516901] [<ffffffff8106ed7a>] warn_slowpath_null+0x1a/0x20
[ 299.523715] [<ffffffff8102f783>] intel_pmu_drain_pebs_nhm+0x283/0x2e0
[ 299.531268] [<ffffffff81032235>] intel_pmu_handle_irq+0x255/0x440
[ 299.538487] [<ffffffff81028e76>] perf_event_nmi_handler+0x26/0x40
[ 299.545638] [<ffffffff810181ad>] nmi_handle+0x9d/0x140
[ 299.551772] [<ffffffff81018115>] ? nmi_handle+0x5/0x140
[ 299.558013] [<ffffffff8101843a>] default_do_nmi+0x4a/0x120
[ 299.564527] [<ffffffff8101859d>] do_nmi+0x8d/0xc0
[ 299.570185] [<ffffffff816a979f>] end_repeat_nmi+0x1e/0x2e
[ 299.576580] [<ffffffff811bc9d2>] ? check_poison_obj+0x92/0x230
[ 299.583390] [<ffffffff811bc9d2>] ? check_poison_obj+0x92/0x230
[ 299.590163] [<ffffffff811bc9d2>] ? check_poison_obj+0x92/0x230
[ 299.596922] <<EOE>> [<ffffffff8115bea8>] ? perf_event_alloc+0x58/0x680
[ 299.604594] [<ffffffff811bcf7d>] cache_alloc_debugcheck_after.isra.51+0x1cd/0x250
[ 299.613140] [<ffffffff811c08b6>] kmem_cache_alloc_trace+0xa6/0x510
[ 299.620330] [<ffffffff8115bea8>] ? perf_event_alloc+0x58/0x680
[ 299.627088] [<ffffffff8106ee48>] ? get_online_cpus+0x58/0x70
[ 299.633688] [<ffffffff8115bea8>] perf_event_alloc+0x58/0x680
[ 299.640319] [<ffffffff8115c897>] SYSC_perf_event_open+0x3c7/0xd40
[ 299.647353] [<ffffffff8105f86b>] ? __do_page_fault+0x1ab/0x3f0
[ 299.654172] [<ffffffff8115d689>] SyS_perf_event_open+0x9/0x10
[ 299.660871] [<ffffffff816a7572>] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 299.668236] ---[ end trace 3356c74581c13f1d ]---
[ 299.673648] Uhhuh. NMI received for unknown reason 31 on CPU 2.
[ 299.680427] Do you have a strange power saving mode enabled?
[ 299.686963] Dazed and confused, but trying to continue
[ 299.692904] Uhhuh. NMI received for unknown reason 31 on CPU 2.
[ 299.699748] Do you have a strange power saving mode enabled?
[ 299.706227] Dazed and confused, but trying to continue
[ 299.712172] Uhhuh. NMI received for unknown reason 31 on CPU 2.
[ 299.718946] Do you have a strange power saving mode enabled?
[ 299.725446] Dazed and confused, but trying to continue
[ 299.731419] Uhhuh. NMI received for unknown reason 31 on CPU 2.
[ 299.738235] Do you have a strange power saving mode enabled?
[ 299.744740] Dazed and confused, but trying to continue
[ 299.750660] Uhhuh. NMI received for unknown reason 21 on CPU 2.
[ 299.757398] Do you have a strange power saving mode enabled?
[ 299.763862] Dazed and confused, but trying to continue

(machine eventually locks up after lots of these messages)

2015-07-03 20:08:41

by Liang, Kan

[permalink] [raw]

Subject: RE: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

>
> I've not yet tried to reproduce, but the below could explain things.
>
> On disabling an event we first clear our cpuc->pebs_enabled bits, only to
> then check them to see if there are any set, and if so, drain the buffer.
>
> If we just cleared the last bit, we'll fail to drain the buffer.
>
> If we then program another event on that counter and another PEBS event,
> we can hit the above WARN with the 'stale' entries left over from the
> previous event.
>
> ---
> arch/x86/kernel/cpu/perf_event_intel_ds.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> index 71fc40238843..041a30ba5654 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> @@ -548,7 +548,7 @@ int intel_pmu_drain_bts_buffer(void)
>
> static inline void intel_pmu_drain_pebs_buffer(void) {
> - struct pt_regs regs;
> + struct pt_regs regs; /* SAMPLE_REGS_INTR must not be set for
> +FREERUNNING */
>
> x86_pmu.drain_pebs(&regs);
> }
> @@ -755,13 +755,6 @@ void intel_pmu_pebs_disable(struct perf_event
> *event)
> struct hw_perf_event *hwc = &event->hw;
> struct debug_store *ds = cpuc->ds;
>
> - cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
> -
> - if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
> - cpuc->pebs_enabled &= ~(1ULL << (hwc->idx + 32));
> - else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
> - cpuc->pebs_enabled &= ~(1ULL << 63);
> -
> if (ds->pebs_interrupt_threshold >
> ds->pebs_buffer_base + x86_pmu.pebs_record_size) {
> intel_pmu_drain_pebs_buffer();
> @@ -769,6 +762,13 @@ void intel_pmu_pebs_disable(struct perf_event
> *event)
> perf_sched_cb_dec(event->ctx->pmu);
> }
>
> + cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
> +
> + if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
> + cpuc->pebs_enabled &= ~(1ULL << (hwc->idx + 32));
> + else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
> + cpuc->pebs_enabled &= ~(1ULL << 63);
> +
> if (cpuc->enabled)
> wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
>

If we cleared the last bit, we not only drain the buffer but also decrease
the event->ctx->pmu, which is used to flush the PEBS buffer during
context switches.
We need to disable cpuc->pebs_enabled before changing
event->ctx->pmu as below.

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 71fc402..76285c1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -754,6 +754,11 @@ void intel_pmu_pebs_disable(struct
perf_event *event)
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
struct debug_store *ds = cpuc->ds;
+ bool large_pebs = ds->pebs_interrupt_threshold >
+ ds->pebs_buffer_base + x86_pmu.pebs_record_size;
+
+ if (large_pebs)
+ intel_pmu_drain_pebs_buffer();

cpuc->pebs_enabled &= ~(1ULL << hwc->idx);

@@ -762,12 +767,8 @@ void intel_pmu_pebs_disable(struct
perf_event *event)
else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
cpuc->pebs_enabled &= ~(1ULL << 63);

- if (ds->pebs_interrupt_threshold >
- ds->pebs_buffer_base + x86_pmu.pebs_record_size) {
- intel_pmu_drain_pebs_buffer();
- if (!pebs_is_enabled(cpuc))
- perf_sched_cb_dec(event->ctx->pmu);
- }
+ if (large_pebs && !pebs_is_enabled(cpuc))
+ perf_sched_cb_dec(event->ctx->pmu);

if (cpuc->enabled)
wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);

Thanks,
Kan

2015-07-06 10:55:27

[permalink] [raw]

Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Fri, Jul 03, 2015 at 08:08:27PM +0000, Liang, Kan wrote:
> If we cleared the last bit, we not only drain the buffer but also decrease
> the event->ctx->pmu, which is used to flush the PEBS buffer during
> context switches.
> We need to disable cpuc->pebs_enabled before changing
> event->ctx->pmu as below.
>

Indeed, mind sending a proper patch so I can press 'A' on it?

> diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> index 71fc402..76285c1 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> @@ -754,6 +754,11 @@ void intel_pmu_pebs_disable(struct
> perf_event *event)
> struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> struct hw_perf_event *hwc = &event->hw;
> struct debug_store *ds = cpuc->ds;
> + bool large_pebs = ds->pebs_interrupt_threshold >
> + ds->pebs_buffer_base + x86_pmu.pebs_record_size;
> +
> + if (large_pebs)
> + intel_pmu_drain_pebs_buffer();
>
> cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
>
> @@ -762,12 +767,8 @@ void intel_pmu_pebs_disable(struct
> perf_event *event)
> else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
> cpuc->pebs_enabled &= ~(1ULL << 63);
>
> - if (ds->pebs_interrupt_threshold >
> - ds->pebs_buffer_base + x86_pmu.pebs_record_size) {
> - intel_pmu_drain_pebs_buffer();
> - if (!pebs_is_enabled(cpuc))
> - perf_sched_cb_dec(event->ctx->pmu);
> - }
> + if (large_pebs && !pebs_is_enabled(cpuc))
> + perf_sched_cb_dec(event->ctx->pmu);
>
> if (cpuc->enabled)
> wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
>
>
>
> Thanks,
> Kan

2015-07-06 13:48:07

by Liang, Kan

[permalink] [raw]

Subject: RE: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

> On Fri, Jul 03, 2015 at 08:08:27PM +0000, Liang, Kan wrote:
> > If we cleared the last bit, we not only drain the buffer but also
> > decrease the event->ctx->pmu, which is used to flush the PEBS buffer
> > during context switches.
> > We need to disable cpuc->pebs_enabled before changing
> > event->ctx->pmu as below.
> >
>
> Indeed, mind sending a proper patch so I can press 'A' on it?

Sure, I will do that.
But I didn't verify the patch, since I cannot reproduce the issue.

Vince, would you mind testing the patch?
If the issue is gone, I will send a proper patch then.

Thanks,
Kan

>
> > diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> > b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> > index 71fc402..76285c1 100644
> > --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> > +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> > @@ -754,6 +754,11 @@ void intel_pmu_pebs_disable(struct perf_event
> > *event)
> > struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > struct hw_perf_event *hwc = &event->hw;
> > struct debug_store *ds = cpuc->ds;
> > + bool large_pebs = ds->pebs_interrupt_threshold >
> > + ds->pebs_buffer_base +
> > + x86_pmu.pebs_record_size;
> > +
> > + if (large_pebs)
> > + intel_pmu_drain_pebs_buffer();
> >
> > cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
> >
> > @@ -762,12 +767,8 @@ void intel_pmu_pebs_disable(struct perf_event
> > *event)
> > else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
> > cpuc->pebs_enabled &= ~(1ULL << 63);
> >
> > - if (ds->pebs_interrupt_threshold >
> > - ds->pebs_buffer_base + x86_pmu.pebs_record_size) {
> > - intel_pmu_drain_pebs_buffer();
> > - if (!pebs_is_enabled(cpuc))
> > - perf_sched_cb_dec(event->ctx->pmu);
> > - }
> > + if (large_pebs && !pebs_is_enabled(cpuc))
> > + perf_sched_cb_dec(event->ctx->pmu);
> >
> > if (cpuc->enabled)
> > wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
> >
> >
> >
> > Thanks,
> > Kan

2015-07-06 16:16:09

[permalink] [raw]

Subject: RE: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Mon, 6 Jul 2015, Liang, Kan wrote:

>
> > On Fri, Jul 03, 2015 at 08:08:27PM +0000, Liang, Kan wrote:
> > > If we cleared the last bit, we not only drain the buffer but also
> > > decrease the event->ctx->pmu, which is used to flush the PEBS buffer
> > > during context switches.
> > > We need to disable cpuc->pebs_enabled before changing
> > > event->ctx->pmu as below.
> > >
> >
> > Indeed, mind sending a proper patch so I can press 'A' on it?
>
> Sure, I will do that.
> But I didn't verify the patch, since I cannot reproduce the issue.
>
> Vince, would you mind testing the patch?
> If the issue is gone, I will send a proper patch then.

I've got too many patches floating around so I forget what this one is
trying to fix. The pebs related lockup? Or the warning?

Was this patch meant to be in addition to PeterZ's, or standalone?

Also please send proper patches in the future, this one was whitespace
damaged and a pain to get applied.

With just this patch applied (without PeterZ's) I still managed to trigger
the following warning.

[ 1328.103920] ------------[ cut here ]------------
[ 1328.109367] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1199 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
[ 1328.199193] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.2.0-rc1+ #166
[ 1328.207955] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 1328.216461] ffffffff81a10f88 ffff88011ea05b10 ffffffff816a10a3 0000000000000000
[ 1328.225043] 0000000000000000 ffff88011ea05b50 ffffffff8106ec8a ffff88011ea05ba0
[ 1328.233654] 0000000000000000 0000000000000001 ffff88011ea0bd80 ffff8801190000c0
[ 1328.242260] Call Trace:
[ 1328.245452] <NMI> [<ffffffff816a10a3>] dump_stack+0x45/0x57
[ 1328.252243] [<ffffffff8106ec8a>] warn_slowpath_common+0x8a/0xc0
[ 1328.259212] [<ffffffff8106ed7a>] warn_slowpath_null+0x1a/0x20
[ 1328.266000] [<ffffffff8102f783>] intel_pmu_drain_pebs_nhm+0x283/0x2e0
[ 1328.273551] [<ffffffff81032235>] intel_pmu_handle_irq+0x255/0x440
[ 1328.280742] [<ffffffff811574bf>] ? __perf_event_enable+0x1f/0x2a0
[ 1328.287887] [<ffffffff8114b19d>] ? irq_work_queue+0x5d/0x80
[ 1328.294500] [<ffffffff81028e76>] perf_event_nmi_handler+0x26/0x40
[ 1328.301659] [<ffffffff810181ad>] nmi_handle+0x9d/0x140
[ 1328.307783] [<ffffffff81018115>] ? nmi_handle+0x5/0x140
[ 1328.314010] [<ffffffff8101843a>] default_do_nmi+0x4a/0x120
[ 1328.320495] [<ffffffff8101859d>] do_nmi+0x8d/0xc0
[ 1328.326053] [<ffffffff816ab01f>] end_repeat_nmi+0x1e/0x2e
[ 1328.332405] [<ffffffff810309ba>] ? __intel_pmu_enable_all+0x5a/0xc0
[ 1328.339627] [<ffffffff810309ba>] ? __intel_pmu_enable_all+0x5a/0xc0
[ 1328.346894] [<ffffffff810309ba>] ? __intel_pmu_enable_all+0x5a/0xc0
[ 1328.354130] <<EOE>> <IRQ> [<ffffffff81030a30>] intel_pmu_enable_all+0x10/0x20
[ 1328.362556] [<ffffffff8102a95c>] x86_pmu_enable+0x25c/0x2e0
[ 1328.369079] [<ffffffff81156202>] perf_pmu_enable+0x22/0x30
[ 1328.375447] [<ffffffff81157da1>] __perf_install_in_context+0x131/0x1d0
[ 1328.382936] [<ffffffff811533b0>] ? cpu_clock_event_start+0x40/0x40
[ 1328.390106] [<ffffffff811533f2>] remote_function+0x42/0x50
[ 1328.396525] [<ffffffff810f0e9b>] flush_smp_call_function_queue+0x7b/0x170
[ 1328.404360] [<ffffffff810f1883>] generic_smp_call_function_single_interrupt+0x13/0x60
[ 1328.413345] [<ffffffff81049217>] smp_call_function_single_interrupt+0x27/0x40
[ 1328.421544] [<ffffffff816aa1ab>] call_function_single_interrupt+0x6b/0x70
[ 1328.429358] <EOI> [<ffffffff81545454>] ? cpuidle_enter_state+0xf4/0x220
[ 1328.437120] [<ffffffff81545430>] ? cpuidle_enter_state+0xd0/0x220
[ 1328.444210] [<ffffffff815455b7>] cpuidle_enter+0x17/0x20
[ 1328.450491] [<ffffffff810b06eb>] call_cpuidle+0x3b/0x70
[ 1328.456680] [<ffffffff81545593>] ? cpuidle_select+0x13/0x20
[ 1328.463196] [<ffffffff810b0965>] cpu_startup_entry+0x245/0x310
[ 1328.469992] [<ffffffff81695b3b>] rest_init+0xbb/0xd0
[ 1328.475860] [<ffffffff81d4af8b>] start_kernel+0x460/0x46d
[ 1328.482154] [<ffffffff81d4a120>] ? early_idt_handler_array+0x120/0x120
[ 1328.489661] [<ffffffff81d4a4d7>] x86_64_start_reservations+0x2a/0x2c
[ 1328.496974] [<ffffffff81d4a614>] x86_64_start_kernel+0x13b/0x14a
[ 1328.503905] ---[ end trace a75b257dea18211b ]---

2015-07-06 16:24:04

by Liang, Kan

[permalink] [raw]

Subject: RE: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

> On Mon, 6 Jul 2015, Liang, Kan wrote:
>
> >
> > > On Fri, Jul 03, 2015 at 08:08:27PM +0000, Liang, Kan wrote:
> > > > If we cleared the last bit, we not only drain the buffer but also
> > > > decrease the event->ctx->pmu, which is used to flush the PEBS
> > > > buffer during context switches.
> > > > We need to disable cpuc->pebs_enabled before changing
> > > > event->ctx->pmu as below.
> > > >
> > >
> > > Indeed, mind sending a proper patch so I can press 'A' on it?
> >
> > Sure, I will do that.
> > But I didn't verify the patch, since I cannot reproduce the issue.
> >
> > Vince, would you mind testing the patch?
> > If the issue is gone, I will send a proper patch then.
>
> I've got too many patches floating around so I forget what this one is trying
> to fix. The pebs related lockup? Or the warning?
>

It's trying to fix the warning issue as below.
For the test result, it looks the patch doesn't help, does it?

> > WARN_ON_ONCE(!event->attr.precise_ip);
> >
> > [ 584.352324] WARNING: CPU: 2 PID: 18924 at
> > arch/x86/kernel/cpu/perf_event_intel_ds.c:1198
> > intel_pmu_drain_pebs_nhm+0x283/0x2e0()

> Was this patch meant to be in addition to PeterZ's, or standalone?
>

Standalone.

> Also please send proper patches in the future, this one was whitespace
> damaged and a pain to get applied.
>
> With just this patch applied (without PeterZ's) I still managed to trigger the
> following warning.

Thanks for the test.

>
> [ 1328.103920] ------------[ cut here ]------------ [ 1328.109367] WARNING:
> CPU: 0 PID: 0 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1199
> intel_pmu_drain_pebs_nhm+0x283/0x2e0()
> [ 1328.199193] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.2.0-
> rc1+ #166
> [ 1328.207955] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS
> FBKT72AUS 01/26/2014 [ 1328.216461] ffffffff81a10f88 ffff88011ea05b10
> ffffffff816a10a3 0000000000000000 [ 1328.225043] 0000000000000000
> ffff88011ea05b50 ffffffff8106ec8a ffff88011ea05ba0 [ 1328.233654]
> 0000000000000000 0000000000000001 ffff88011ea0bd80 ffff8801190000c0
> [ 1328.242260] Call Trace:
> [ 1328.245452] <NMI> [<ffffffff816a10a3>] dump_stack+0x45/0x57
> [ 1328.252243] [<ffffffff8106ec8a>] warn_slowpath_common+0x8a/0xc0
> [ 1328.259212] [<ffffffff8106ed7a>] warn_slowpath_null+0x1a/0x20
> [ 1328.266000] [<ffffffff8102f783>]
> intel_pmu_drain_pebs_nhm+0x283/0x2e0
> [ 1328.273551] [<ffffffff81032235>] intel_pmu_handle_irq+0x255/0x440
> [ 1328.280742] [<ffffffff811574bf>] ? __perf_event_enable+0x1f/0x2a0
> [ 1328.287887] [<ffffffff8114b19d>] ? irq_work_queue+0x5d/0x80
> [ 1328.294500] [<ffffffff81028e76>] perf_event_nmi_handler+0x26/0x40
> [ 1328.301659] [<ffffffff810181ad>] nmi_handle+0x9d/0x140 [ 1328.307783]
> [<ffffffff81018115>] ? nmi_handle+0x5/0x140 [ 1328.314010]
> [<ffffffff8101843a>] default_do_nmi+0x4a/0x120 [ 1328.320495]
> [<ffffffff8101859d>] do_nmi+0x8d/0xc0 [ 1328.326053] [<ffffffff816ab01f>]
> end_repeat_nmi+0x1e/0x2e [ 1328.332405] [<ffffffff810309ba>] ?
> __intel_pmu_enable_all+0x5a/0xc0 [ 1328.339627] [<ffffffff810309ba>] ?
> __intel_pmu_enable_all+0x5a/0xc0 [ 1328.346894] [<ffffffff810309ba>] ?
> __intel_pmu_enable_all+0x5a/0xc0 [ 1328.354130] <<EOE>> <IRQ>
> [<ffffffff81030a30>] intel_pmu_enable_all+0x10/0x20 [ 1328.362556]
> [<ffffffff8102a95c>] x86_pmu_enable+0x25c/0x2e0 [ 1328.369079]
> [<ffffffff81156202>] perf_pmu_enable+0x22/0x30 [ 1328.375447]
> [<ffffffff81157da1>] __perf_install_in_context+0x131/0x1d0
> [ 1328.382936] [<ffffffff811533b0>] ? cpu_clock_event_start+0x40/0x40
> [ 1328.390106] [<ffffffff811533f2>] remote_function+0x42/0x50
> [ 1328.396525] [<ffffffff810f0e9b>]
> flush_smp_call_function_queue+0x7b/0x170
> [ 1328.404360] [<ffffffff810f1883>]
> generic_smp_call_function_single_interrupt+0x13/0x60
> [ 1328.413345] [<ffffffff81049217>]
> smp_call_function_single_interrupt+0x27/0x40
> [ 1328.421544] [<ffffffff816aa1ab>]
> call_function_single_interrupt+0x6b/0x70
> [ 1328.429358] <EOI> [<ffffffff81545454>] ?
> cpuidle_enter_state+0xf4/0x220 [ 1328.437120] [<ffffffff81545430>] ?
> cpuidle_enter_state+0xd0/0x220 [ 1328.444210] [<ffffffff815455b7>]
> cpuidle_enter+0x17/0x20 [ 1328.450491] [<ffffffff810b06eb>]
> call_cpuidle+0x3b/0x70 [ 1328.456680] [<ffffffff81545593>] ?
> cpuidle_select+0x13/0x20 [ 1328.463196] [<ffffffff810b0965>]
> cpu_startup_entry+0x245/0x310 [ 1328.469992] [<ffffffff81695b3b>]
> rest_init+0xbb/0xd0 [ 1328.475860] [<ffffffff81d4af8b>]
> start_kernel+0x460/0x46d [ 1328.482154] [<ffffffff81d4a120>] ?
> early_idt_handler_array+0x120/0x120
> [ 1328.489661] [<ffffffff81d4a4d7>] x86_64_start_reservations+0x2a/0x2c
> [ 1328.496974] [<ffffffff81d4a614>] x86_64_start_kernel+0x13b/0x14a
> [ 1328.503905] ---[ end trace a75b257dea18211b ]---

2015-07-06 16:45:27

[permalink] [raw]

Subject: RE: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Mon, 6 Jul 2015, Liang, Kan wrote:

>
> > On Mon, 6 Jul 2015, Liang, Kan wrote:
> >
> > >
> > > > On Fri, Jul 03, 2015 at 08:08:27PM +0000, Liang, Kan wrote:
> > > > > If we cleared the last bit, we not only drain the buffer but also
> > > > > decrease the event->ctx->pmu, which is used to flush the PEBS
> > > > > buffer during context switches.
> > > > > We need to disable cpuc->pebs_enabled before changing
> > > > > event->ctx->pmu as below.
> > > > >
> > > >
> > > > Indeed, mind sending a proper patch so I can press 'A' on it?
> > >
> > > Sure, I will do that.
> > > But I didn't verify the patch, since I cannot reproduce the issue.
> > >
> > > Vince, would you mind testing the patch?
> > > If the issue is gone, I will send a proper patch then.
> >
> > I've got too many patches floating around so I forget what this one is trying
> > to fix. The pebs related lockup? Or the warning?
> >
>
> It's trying to fix the warning issue as below.
> For the test result, it looks the patch doesn't help, does it?
>
> > > WARN_ON_ONCE(!event->attr.precise_ip);
> > >
> > > [ 584.352324] WARNING: CPU: 2 PID: 18924 at
> > > arch/x86/kernel/cpu/perf_event_intel_ds.c:1198
> > > intel_pmu_drain_pebs_nhm+0x283/0x2e0()
>
>
> > Was this patch meant to be in addition to PeterZ's, or standalone?
> >
>
> Standalone.
>
> > Also please send proper patches in the future, this one was whitespace
> > damaged and a pain to get applied.
> >
> > With just this patch applied (without PeterZ's) I still managed to trigger the
> > following warning.
>
> Thanks for the test.

The machine also crashed a few minutes later.

[ 2972.105858] INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 7, t=5482 jiffies, g=338012, c=338011, q=205)
[ 2972.118544] Task dump for CPU 3:
[ 2972.122706] perf_fuzzer R running task 0 9409 2404 0x0000000c
[ 2972.131021] 0000000000000092 ffffffff81030dfb 0000000300000005 0000000400000004
[ 2972.139762] ffff88011eacc4a0 0000000000000005 0000000000000092 0000000500000002
[ 2972.148507] ffffffff81030e6b ffff8801197b2800 ffff8801197b2800 ffff88011eacbd80
[ 2972.157225] Call Trace:
[ 2972.160547] [<ffffffff81030dfb>] ? intel_start_scheduling+0x4b/0x70
[ 2972.168098] [<ffffffff81030e6b>] ? intel_stop_scheduling+0x4b/0x70
[ 2972.175513] [<ffffffff816a853b>] ? _raw_spin_unlock+0x2b/0x40
[ 2972.182502] [<ffffffff81030e6b>] ? intel_stop_scheduling+0x4b/0x70
[ 2972.189920] [<ffffffff81029f32>] ? x86_schedule_events+0x1e2/0x260
[ 2972.197340] [<ffffffff810b78f6>] ? __lock_acquire.isra.31+0x3a6/0xf90
[ 2972.204998] [<ffffffff810b78f6>] ? __lock_acquire.isra.31+0x3a6/0xf90
[ 2972.212670] [<ffffffff811585b2>] ? perf_event_update_userpage+0x102/0x170
[ 2972.220680] [<ffffffff811585ca>] ? perf_event_update_userpage+0x11a/0x170
[ 2972.228699] [<ffffffff811584b0>] ? perf_event_task_disable+0xd0/0xd0
[ 2972.236281] [<ffffffff81031d7b>] ? intel_pmu_enable_event+0xfb/0x210
[ 2972.243882] [<ffffffff810303a4>] ? intel_pmu_pebs_enable_all+0x34/0x40
[ 2972.251652] [<ffffffff810309ed>] ? __intel_pmu_enable_all+0x8d/0xc0
[ 2972.259115] [<ffffffff81030a30>] ? intel_pmu_enable_all+0x10/0x20
[ 2972.266402] [<ffffffff8102a95c>] ? x86_pmu_enable+0x25c/0x2e0
[ 2972.273316] [<ffffffff81156202>] ? perf_pmu_enable+0x22/0x30
[ 2972.280152] [<ffffffff81157da1>] ? __perf_install_in_context+0x131/0x1d0
[ 2972.288087] [<ffffffff811533f2>] ? remote_function+0x42/0x50
[ 2972.294856] [<ffffffff810f1046>] ? generic_exec_single+0xb6/0x120
[ 2972.302156] [<ffffffff8115d13a>] ? SYSC_perf_event_open+0xb4a/0xd40
[ 2972.309584] [<ffffffff811533b0>] ? cpu_clock_event_start+0x40/0x40
[ 2972.316895] [<ffffffff810f1160>] ? smp_call_function_single+0xb0/0x110
[ 2972.324617] [<ffffffff81152484>] ? task_function_call+0x44/0x50
[ 2972.331698] [<ffffffff81157c70>] ? perf_mux_hrtimer_handler+0x1f0/0x1f0
[ 2972.339490] [<ffffffff81152843>] ? perf_install_in_context+0x83/0xf0
[ 2972.347014] [<ffffffff8115d171>] ? SYSC_perf_event_open+0xb81/0xd40
[ 2972.354436] [<ffffffff8115d7a9>] ? SyS_perf_event_open+0x9/0x10
[ 2972.361469] [<ffffffff816a8df2>] ? entry_SYSCALL_64_fastpath+0x16/0x7a
[ 2973.026502] ------------[ cut here ]------------
[ 2973.031822] WARNING: CPU: 3 PID: 9409 at kernel/watchdog.c:311 watchdog_overflow_callback+0x84/0xa0()
[ 2973.042038] Watchdog detected hard LOCKUP on cpu 3
[ 2973.124350] CPU: 3 PID: 9409 Comm: perf_fuzzer Tainted: G W 4.2.0-rc1+ #166
[ 2973.133447] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 2973.141844] ffffffff81a28ae2 ffff88011eac5af0 ffffffff816a10a3 0000000000000000
[ 2973.150325] ffff88011eac5b40 ffff88011eac5b30 ffffffff8106ec8a ffff88011eac5c40
[ 2973.158787] ffff880119133800 0000000000000001 ffff88011eac5c40 ffff88011eac5ef8
[ 2973.167248] Call Trace:
[ 2973.170317] <NMI> [<ffffffff816a10a3>] dump_stack+0x45/0x57
[ 2973.176960] [<ffffffff8106ec8a>] warn_slowpath_common+0x8a/0xc0
[ 2973.183818] [<ffffffff8106ed06>] warn_slowpath_fmt+0x46/0x50
[ 2973.190416] [<ffffffff8102f676>] ? intel_pmu_drain_pebs_nhm+0x176/0x2e0
[ 2973.198013] [<ffffffff8111b784>] watchdog_overflow_callback+0x84/0xa0
[ 2973.205422] [<ffffffff8115af7c>] __perf_event_overflow+0x8c/0x1c0
[ 2973.212460] [<ffffffff8115bae4>] perf_event_overflow+0x14/0x20
[ 2973.219226] [<ffffffff810321b4>] intel_pmu_handle_irq+0x1d4/0x440
[ 2973.226340] [<ffffffff81028e76>] perf_event_nmi_handler+0x26/0x40
[ 2973.233400] [<ffffffff810181ad>] nmi_handle+0x9d/0x140
[ 2973.239427] [<ffffffff81018115>] ? nmi_handle+0x5/0x140
[ 2973.245540] [<ffffffff810184b9>] default_do_nmi+0xc9/0x120
[ 2973.251932] [<ffffffff8101859d>] do_nmi+0x8d/0xc0
[ 2973.257507] [<ffffffff816ab01f>] end_repeat_nmi+0x1e/0x2e
[ 2973.263846] [<ffffffff81035d76>] ? intel_bts_enable_local+0x26/0x40
[ 2973.271087] [<ffffffff81035d76>] ? intel_bts_enable_local+0x26/0x40
[ 2973.278330] [<ffffffff81035d76>] ? intel_bts_enable_local+0x26/0x40
[ 2973.285556] <<EOE>> [<ffffffff810309ed>] ? __intel_pmu_enable_all+0x8d/0xc0
[ 2973.293646] [<ffffffff81030a30>] intel_pmu_enable_all+0x10/0x20
[ 2973.300538] [<ffffffff8102a95c>] x86_pmu_enable+0x25c/0x2e0
[ 2973.307048] [<ffffffff81156202>] perf_pmu_enable+0x22/0x30
[ 2973.313470] [<ffffffff81157da1>] __perf_install_in_context+0x131/0x1d0
[ 2973.320973] [<ffffffff811533f2>] remote_function+0x42/0x50
[ 2973.327406] [<ffffffff810f1046>] generic_exec_single+0xb6/0x120
[ 2973.334300] [<ffffffff8115d13a>] ? SYSC_perf_event_open+0xb4a/0xd40
[ 2973.341513] [<ffffffff811533b0>] ? cpu_clock_event_start+0x40/0x40
[ 2973.348676] [<ffffffff810f1160>] smp_call_function_single+0xb0/0x110
[ 2973.355976] [<ffffffff81152484>] task_function_call+0x44/0x50
[ 2973.362705] [<ffffffff81157c70>] ? perf_mux_hrtimer_handler+0x1f0/0x1f0
[ 2973.370291] [<ffffffff81152843>] perf_install_in_context+0x83/0xf0
[ 2973.377478] [<ffffffff8115d171>] SYSC_perf_event_open+0xb81/0xd40
[ 2973.384552] [<ffffffff8115d7a9>] SyS_perf_event_open+0x9/0x10
[ 2973.391242] [<ffffffff816a8df2>] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 2973.398575] ---[ end trace a75b257dea18211c ]---

2015-07-15 06:42:53

[permalink] [raw]

Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Fri, Jul 3, 2015 at 9:49 PM, Vince Weaver <[email protected]> wrote:
> On Fri, 3 Jul 2015, Peter Zijlstra wrote:
>
>> That said, its far too warm and I might just not be making sense.
>
> you need to come visit Maine! Although I am not sure the cooler weather
> necessarily improves my kernel debugging skills.
>
> I managed to lock the machine (again this is with the patch applied).
>
I can reproduce the problem on my HSW running the fuzzer.

I can see why this could be happening if you are mixing PEBS and non PEBS events
in the bottom 4 counters. I suspect:
for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
if ((counts[bit] == 0) && (error[bit] == 0))
continue;

This test is not correct when you have non-PEBS events mixed with PEBS
events and
they overflow at the same time. They will have counts[i] != 0 but
error[i] == 0, and thus
you fall thru the loop and hit the assert. Or it is something along those lines.

> [ 299.366027] ------------[ cut here ]------------
> [ 299.370985] WARNING: CPU: 2 PID: 8241 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
> [ 299.456929] CPU: 2 PID: 8241 Comm: perf_fuzzer Tainted: G W 4.1.0+ #164
> [ 299.465750] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [ 299.474274] ffffffff81a105a0 ffff88011ea85b10 ffffffff8169f823 0000000000000000
> [ 299.482864] 0000000000000000 ffff88011ea85b50 ffffffff8106ec8a ffff88011ea85ba0
> [ 299.491488] 0000000000000000 0000000000000001 ffff88011ea8bd80 ffff8801190400c0
> [ 299.500029] Call Trace:
> [ 299.503190] <NMI> [<ffffffff8169f823>] dump_stack+0x45/0x57
> [ 299.509936] [<ffffffff8106ec8a>] warn_slowpath_common+0x8a/0xc0
> [ 299.516901] [<ffffffff8106ed7a>] warn_slowpath_null+0x1a/0x20
> [ 299.523715] [<ffffffff8102f783>] intel_pmu_drain_pebs_nhm+0x283/0x2e0
> [ 299.531268] [<ffffffff81032235>] intel_pmu_handle_irq+0x255/0x440
> [ 299.538487] [<ffffffff81028e76>] perf_event_nmi_handler+0x26/0x40
> [ 299.545638] [<ffffffff810181ad>] nmi_handle+0x9d/0x140
> [ 299.551772] [<ffffffff81018115>] ? nmi_handle+0x5/0x140
> [ 299.558013] [<ffffffff8101843a>] default_do_nmi+0x4a/0x120
> [ 299.564527] [<ffffffff8101859d>] do_nmi+0x8d/0xc0
> [ 299.570185] [<ffffffff816a979f>] end_repeat_nmi+0x1e/0x2e
> [ 299.576580] [<ffffffff811bc9d2>] ? check_poison_obj+0x92/0x230
> [ 299.583390] [<ffffffff811bc9d2>] ? check_poison_obj+0x92/0x230
> [ 299.590163] [<ffffffff811bc9d2>] ? check_poison_obj+0x92/0x230
> [ 299.596922] <<EOE>> [<ffffffff8115bea8>] ? perf_event_alloc+0x58/0x680
> [ 299.604594] [<ffffffff811bcf7d>] cache_alloc_debugcheck_after.isra.51+0x1cd/0x250
> [ 299.613140] [<ffffffff811c08b6>] kmem_cache_alloc_trace+0xa6/0x510
> [ 299.620330] [<ffffffff8115bea8>] ? perf_event_alloc+0x58/0x680
> [ 299.627088] [<ffffffff8106ee48>] ? get_online_cpus+0x58/0x70
> [ 299.633688] [<ffffffff8115bea8>] perf_event_alloc+0x58/0x680
> [ 299.640319] [<ffffffff8115c897>] SYSC_perf_event_open+0x3c7/0xd40
> [ 299.647353] [<ffffffff8105f86b>] ? __do_page_fault+0x1ab/0x3f0
> [ 299.654172] [<ffffffff8115d689>] SyS_perf_event_open+0x9/0x10
> [ 299.660871] [<ffffffff816a7572>] entry_SYSCALL_64_fastpath+0x16/0x7a
> [ 299.668236] ---[ end trace 3356c74581c13f1d ]---
> [ 299.673648] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 299.680427] Do you have a strange power saving mode enabled?
> [ 299.686963] Dazed and confused, but trying to continue
> [ 299.692904] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 299.699748] Do you have a strange power saving mode enabled?
> [ 299.706227] Dazed and confused, but trying to continue
> [ 299.712172] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 299.718946] Do you have a strange power saving mode enabled?
> [ 299.725446] Dazed and confused, but trying to continue
> [ 299.731419] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 299.738235] Do you have a strange power saving mode enabled?
> [ 299.744740] Dazed and confused, but trying to continue
> [ 299.750660] Uhhuh. NMI received for unknown reason 21 on CPU 2.
> [ 299.757398] Do you have a strange power saving mode enabled?
> [ 299.763862] Dazed and confused, but trying to continue
>
> (machine eventually locks up after lots of these messages)
>

2015-07-15 12:35:55

[permalink] [raw]

Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Wed, Jul 15, 2015 at 08:42:50AM +0200, Stephane Eranian wrote:
> On Fri, Jul 3, 2015 at 9:49 PM, Vince Weaver <[email protected]> wrote:
> > On Fri, 3 Jul 2015, Peter Zijlstra wrote:
> >
> >> That said, its far too warm and I might just not be making sense.
> >
> > you need to come visit Maine! Although I am not sure the cooler weather
> > necessarily improves my kernel debugging skills.
> >
> > I managed to lock the machine (again this is with the patch applied).
> >
> I can reproduce the problem on my HSW running the fuzzer.
>
> I can see why this could be happening if you are mixing PEBS and non PEBS events
> in the bottom 4 counters. I suspect:
> for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
> if ((counts[bit] == 0) && (error[bit] == 0))
> continue;
>
> This test is not correct when you have non-PEBS events mixed with PEBS
> events and
> they overflow at the same time. They will have counts[i] != 0 but
> error[i] == 0, and thus
> you fall thru the loop and hit the assert. Or it is something along those lines.
>

The only way I can make this work is if ->status only has !PEBS events
set, because if it has both set we'll take that slow path which masks
out the !PEBS bits.

After masking there are 3 options:

- there is one bit set, and its @bit, we increment counts[bit].
- there are multiple bits set, we increment error[] for each set bit,
we do not increment counts[].
- there are no bits set, we do nothing.

The intent was to never increment counts[] for !PEBS events.

Now if we start out with only a single !PEBS event set, we'll pass the
test and increment counts[] for a !PEBS and hit the warn.

The below patch modifies the code such that it can deal with that
particular issue. Can you try?

---
arch/x86/kernel/cpu/perf_event_intel_ds.c | 29 +++++++++++++----------------
1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 71fc40238843..68d0ced1d229 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -1142,6 +1142,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)

for (at = base; at < top; at += x86_pmu.pebs_record_size) {
struct pebs_record_nhm *p = at;
+ u64 pebs_status;

/* PEBS v3 has accurate status bits */
if (x86_pmu.intel_cap.pebs_format >= 3) {
@@ -1152,12 +1153,14 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
continue;
}

- bit = find_first_bit((unsigned long *)&p->status,
+ pebs_status = p->status & cpuc->pebs_enabled;
+ pebs_status &= (1ULL << x86_pmu.max_pebs_events) - 1;
+
+ bit = find_first_bit((unsigned long *)&pebs_status,
x86_pmu.max_pebs_events);
if (bit >= x86_pmu.max_pebs_events)
continue;
- if (!test_bit(bit, cpuc->active_mask))
- continue;
+
/*
* The PEBS hardware does not deal well with the situation
* when events happen near to each other and multiple bits
@@ -1172,27 +1175,21 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
* one, and it's not possible to reconstruct all events
* that caused the PEBS record. It's called collision.
* If collision happened, the record will be dropped.
- *
*/
- if (p->status != (1 << bit)) {
- u64 pebs_status;
-
- /* slow path */
- pebs_status = p->status & cpuc->pebs_enabled;
- pebs_status &= (1ULL << MAX_PEBS_EVENTS) - 1;
- if (pebs_status != (1 << bit)) {
- for_each_set_bit(i, (unsigned long *)&pebs_status,
- MAX_PEBS_EVENTS)
- error[i]++;
- continue;
- }
+ if (p->status != (1ULL << bit)) {
+ for_each_set_bit(i, (unsigned long *)&pebs_status,
+ x86_pmu.max_pebs_events)
+ error[i]++;
+ continue;
}
+
counts[bit]++;
}

for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
if ((counts[bit] == 0) && (error[bit] == 0))
continue;
+
event = cpuc->events[bit];
WARN_ON_ONCE(!event);
WARN_ON_ONCE(!event->attr.precise_ip);

2015-07-16 06:02:20

[permalink] [raw]

Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Wed, Jul 15, 2015 at 2:35 PM, Peter Zijlstra <[email protected]> wrote:
> On Wed, Jul 15, 2015 at 08:42:50AM +0200, Stephane Eranian wrote:
>> On Fri, Jul 3, 2015 at 9:49 PM, Vince Weaver <[email protected]> wrote:
>> > On Fri, 3 Jul 2015, Peter Zijlstra wrote:
>> >
>> >> That said, its far too warm and I might just not be making sense.
>> >
>> > you need to come visit Maine! Although I am not sure the cooler weather
>> > necessarily improves my kernel debugging skills.
>> >
>> > I managed to lock the machine (again this is with the patch applied).
>> >
>> I can reproduce the problem on my HSW running the fuzzer.
>>
>> I can see why this could be happening if you are mixing PEBS and non PEBS events
>> in the bottom 4 counters. I suspect:
>> for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
>> if ((counts[bit] == 0) && (error[bit] == 0))
>> continue;
>>
>> This test is not correct when you have non-PEBS events mixed with PEBS
>> events and
>> they overflow at the same time. They will have counts[i] != 0 but
>> error[i] == 0, and thus
>> you fall thru the loop and hit the assert. Or it is something along those lines.
>>
>
> The only way I can make this work is if ->status only has !PEBS events
> set, because if it has both set we'll take that slow path which masks
> out the !PEBS bits.
>
> After masking there are 3 options:
>
> - there is one bit set, and its @bit, we increment counts[bit].
> - there are multiple bits set, we increment error[] for each set bit,
> we do not increment counts[].
> - there are no bits set, we do nothing.
>
> The intent was to never increment counts[] for !PEBS events.
>
> Now if we start out with only a single !PEBS event set, we'll pass the
> test and increment counts[] for a !PEBS and hit the warn.
>
> The below patch modifies the code such that it can deal with that
> particular issue. Can you try?
>
Been running it for a couple of hours, so far so good. I will let it
run all night.
Testing it on HSW and NHM, my SNB is broken at the moment.

Don't know if the fuzzer has already reproduced the test case.
Thanks.

> ---
> arch/x86/kernel/cpu/perf_event_intel_ds.c | 29 +++++++++++++----------------
> 1 file changed, 13 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> index 71fc40238843..68d0ced1d229 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> @@ -1142,6 +1142,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
>
> for (at = base; at < top; at += x86_pmu.pebs_record_size) {
> struct pebs_record_nhm *p = at;
> + u64 pebs_status;
>
> /* PEBS v3 has accurate status bits */
> if (x86_pmu.intel_cap.pebs_format >= 3) {
> @@ -1152,12 +1153,14 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
> continue;
> }
>
> - bit = find_first_bit((unsigned long *)&p->status,
> + pebs_status = p->status & cpuc->pebs_enabled;
> + pebs_status &= (1ULL << x86_pmu.max_pebs_events) - 1;
> +
> + bit = find_first_bit((unsigned long *)&pebs_status,
> x86_pmu.max_pebs_events);
> if (bit >= x86_pmu.max_pebs_events)
> continue;
> - if (!test_bit(bit, cpuc->active_mask))
> - continue;
> +
> /*
> * The PEBS hardware does not deal well with the situation
> * when events happen near to each other and multiple bits
> @@ -1172,27 +1175,21 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
> * one, and it's not possible to reconstruct all events
> * that caused the PEBS record. It's called collision.
> * If collision happened, the record will be dropped.
> - *
> */
> - if (p->status != (1 << bit)) {
> - u64 pebs_status;
> -
> - /* slow path */
> - pebs_status = p->status & cpuc->pebs_enabled;
> - pebs_status &= (1ULL << MAX_PEBS_EVENTS) - 1;
> - if (pebs_status != (1 << bit)) {
> - for_each_set_bit(i, (unsigned long *)&pebs_status,
> - MAX_PEBS_EVENTS)
> - error[i]++;
> - continue;
> - }
> + if (p->status != (1ULL << bit)) {
> + for_each_set_bit(i, (unsigned long *)&pebs_status,
> + x86_pmu.max_pebs_events)
> + error[i]++;
> + continue;
> }
> +
> counts[bit]++;
> }
>
> for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
> if ((counts[bit] == 0) && (error[bit] == 0))
> continue;
> +
> event = cpuc->events[bit];
> WARN_ON_ONCE(!event);
> WARN_ON_ONCE(!event->attr.precise_ip);

2015-07-16 07:15:47

[permalink] [raw]

Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Thu, Jul 16, 2015 at 08:02:03AM +0200, Stephane Eranian wrote:
> Been running it for a couple of hours, so far so good. I will let it
> run all night.

Thanks!

> > ---
> > arch/x86/kernel/cpu/perf_event_intel_ds.c | 29 +++++++++++++----------------
> > 1 file changed, 13 insertions(+), 16 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> > index 71fc40238843..68d0ced1d229 100644
> > --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> > +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> > @@ -1142,6 +1142,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
> >
> > for (at = base; at < top; at += x86_pmu.pebs_record_size) {
> > struct pebs_record_nhm *p = at;
> > + u64 pebs_status;
> >
> > /* PEBS v3 has accurate status bits */
> > if (x86_pmu.intel_cap.pebs_format >= 3) {
> > @@ -1152,12 +1153,14 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
> > continue;
> > }
> >
> > - bit = find_first_bit((unsigned long *)&p->status,
> > + pebs_status = p->status & cpuc->pebs_enabled;
> > + pebs_status &= (1ULL << x86_pmu.max_pebs_events) - 1;
> > +
> > + bit = find_first_bit((unsigned long *)&pebs_status,
> > x86_pmu.max_pebs_events);
> > if (bit >= x86_pmu.max_pebs_events)
> > continue;

Maybe we should WARN in this case? A PEBS entry without any PEBS bits
set in the status field would be 'weird', right?

Maybe something like:

if (WARN(bit >= x86_pmu.max_pebs_events,
"PEBS record without PEBS event! status=%Lx pebs_enabled=%Lx active_mask=%Lx",
p->status, cpuc->pebs_enabled, cpuc->active_mask))
continue;

If that triggers we at least get more info.

2015-07-16 07:30:21

[permalink] [raw]

Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Thu, Jul 16, 2015 at 12:15 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Jul 16, 2015 at 08:02:03AM +0200, Stephane Eranian wrote:
>> Been running it for a couple of hours, so far so good. I will let it
>> run all night.
>
> Thanks!
>
Well, it died on NHM in the same function despite your patch. Need to
look at the exact warning.\
So more work is needed. But then I also saw the irq loop stuck message
before that.

>> > ---
>> > arch/x86/kernel/cpu/perf_event_intel_ds.c | 29 +++++++++++++----------------
>> > 1 file changed, 13 insertions(+), 16 deletions(-)
>> >
>> > diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
>> > index 71fc40238843..68d0ced1d229 100644
>> > --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
>> > +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
>> > @@ -1142,6 +1142,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
>> >
>> > for (at = base; at < top; at += x86_pmu.pebs_record_size) {
>> > struct pebs_record_nhm *p = at;
>> > + u64 pebs_status;
>> >
>> > /* PEBS v3 has accurate status bits */
>> > if (x86_pmu.intel_cap.pebs_format >= 3) {
>> > @@ -1152,12 +1153,14 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
>> > continue;
>> > }
>> >
>> > - bit = find_first_bit((unsigned long *)&p->status,
>> > + pebs_status = p->status & cpuc->pebs_enabled;
>> > + pebs_status &= (1ULL << x86_pmu.max_pebs_events) - 1;
>> > +
>> > + bit = find_first_bit((unsigned long *)&pebs_status,
>> > x86_pmu.max_pebs_events);
>> > if (bit >= x86_pmu.max_pebs_events)
>> > continue;
>
> Maybe we should WARN in this case? A PEBS entry without any PEBS bits
> set in the status field would be 'weird', right?
>
> Maybe something like:
>
> if (WARN(bit >= x86_pmu.max_pebs_events,
> "PEBS record without PEBS event! status=%Lx pebs_enabled=%Lx active_mask=%Lx",
> p->status, cpuc->pebs_enabled, cpuc->active_mask))
> continue;
>
> If that triggers we at least get more info.

2015-07-16 21:12:13