2018-02-12 22:23:16

by Liang, Kan

Subject: [PATCH V4 0/5] bugs fix for auto-reload mmap read and rdpmc read

From: Kan Liang <[email protected]>

------

Changes since V3:
- Apply Peter's patch to fix event update for auto-reload event
- Based on Peter's patch, specially handle case A.
- Introduce specific read function for auto-reload event, not just
large PEBS.

Changes since V2:
- Refined the changelog
- Introduced specific read function for large PEBS.
The previous generic PEBS read function is confusing.
Disabled PMU in pmu::read() path for large PEBS.
Handled the corner case when reload_times == 0.
- Modified the parameter of intel_pmu_save_and_restart_reload()
Discarded local64_cmpxchg
- Added fixes tag
- Added WARN to handle reload_times == 0 || reload_val == 0

Changes since V1:
- Check PERF_X86_EVENT_AUTO_RELOAD before calling
intel_pmu_save_and_restart()
- Introduce a special purpose intel_pmu_save_and_restart()
just for AUTO_RELOAD.
- New patch to disable userspace RDPMC usage if large PEBS is enabled.

------

There is a bug when reading event->count via mmap with large PEBS enabled.
Here is an example.
#./read_count
0x71f0
0x122c0
0x1000000001c54
0x100000001257d
0x200000000bdc5

The bug is caused by two issues.
- In x86_perf_event_update, the calculation of event->count does not
take the auto-reload values into account.
- In x86_pmu_read, the undrained values in the large PEBS buffer are
not counted.

The first issue was introduced when the auto-reload mechanism was
enabled by commit 851559e35fd5 ("perf/x86/intel: Use the PEBS auto
reload mechanism when possible").

Patch 1 fixed the issue in x86_perf_event_update.

The second issue was introduced by commit b8241d20699e
("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS
interrupt threshold)").

Patch 2-4 fixed the issue in x86_pmu_read.

Besides these two issues, userspace RDPMC usage is also broken for
large PEBS.
The RDPMC issue was likewise introduced by commit b8241d20699e
("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS
interrupt threshold)").

Patch 5 fixed the RDPMC issue.

The source code of read_count is as below.

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long pagesize;

struct cpu {
	int fd;
	struct perf_event_mmap_page *buf;
};

int perf_open(struct cpu *ctx, int cpu)
{
	struct perf_event_attr attr = {
		.type = PERF_TYPE_HARDWARE,
		.size = sizeof(struct perf_event_attr),
		.sample_period = 100000,
		.config = 0,
		.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
			       PERF_SAMPLE_TIME | PERF_SAMPLE_CPU,
		.precise_ip = 3,
		.mmap = 1,
		.comm = 1,
		.task = 1,
		.mmap2 = 1,
		.sample_id_all = 1,
		.comm_exec = 1,
	};
	ctx->buf = NULL;
	ctx->fd = syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
	if (ctx->fd < 0) {
		perror("perf_event_open");
		return -1;
	}
	return 0;
}

void perf_close(struct cpu *ctx)
{
	close(ctx->fd);
	if (ctx->buf)
		munmap(ctx->buf, pagesize);
}

int main(int ac, char **av)
{
	struct cpu ctx;
	uint64_t count;

	pagesize = sysconf(_SC_PAGESIZE);
	if (perf_open(&ctx, 0))
		return 1;

	while (1) {
		sleep(5);

		if (read(ctx.fd, &count, 8) != 8) {
			perror("counter read");
			break;
		}
		printf("0x%llx\n", (unsigned long long)count);
	}
	perf_close(&ctx);
	return 0;
}


Kan Liang (5):
perf/x86/intel: Fix event update for auto-reload
perf/x86: Introduce read function for x86_pmu
perf/x86/intel/ds: Introduce read function for auto-reload event
perf/x86/intel: Fix pmu read for auto-reload
perf/x86: Fix: disable userspace RDPMC usage for large PEBS

arch/x86/events/core.c | 20 ++++-----
arch/x86/events/intel/core.c | 9 +++++
arch/x86/events/intel/ds.c | 96 ++++++++++++++++++++++++++++++++++++++++++--
arch/x86/events/perf_event.h | 3 ++
4 files changed, 115 insertions(+), 13 deletions(-)

--
2.7.4



2018-02-12 22:23:30

by Liang, Kan

Subject: [PATCH V4 5/5] perf/x86: Fix: disable userspace RDPMC usage for large PEBS

From: Kan Liang <[email protected]>

Userspace RDPMC has never worked with large PEBS since large PEBS was
introduced by
commit b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt
handling (large PEBS interrupt threshold)")

When the PEBS interrupt threshold is larger than one, there is no way
for userspace RDPMC to get the exact number of auto-reloads and the
reload value.
Disable userspace RDPMC usage when large PEBS is enabled.

When the PEBS interrupt threshold equals one, userspace RDPMC works
well even for an auto-reload event, so it does not need to be disabled.

Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt
handling (large PEBS interrupt threshold)")
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 00a6251..9c86e10 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2117,7 +2117,8 @@ static int x86_pmu_event_init(struct perf_event *event)
event->destroy(event);
}

- if (READ_ONCE(x86_pmu.attr_rdpmc))
+ if (READ_ONCE(x86_pmu.attr_rdpmc) &&
+ !(event->hw.flags & PERF_X86_EVENT_FREERUNNING))
event->hw.flags |= PERF_X86_EVENT_RDPMC_ALLOWED;

return err;
--
2.7.4


2018-02-12 22:23:45

by Liang, Kan

Subject: [PATCH V4 1/5] perf/x86/intel: Fix event update for auto-reload

From: Kan Liang <[email protected]>

There is a bug when reading event->count via mmap with large PEBS enabled.
Here is an example.
#./read_count
0x71f0
0x122c0
0x1000000001c54
0x100000001257d
0x200000000bdc5

In fixed period mode, the auto-reload mechanism can be enabled for
PEBS events, but the calculation of event->count does not take the
auto-reload values into account. Anyone who reads event->count gets a
wrong result, e.g. x86_pmu_read().

The issue was introduced when the auto-reload mechanism was enabled by
commit 851559e35fd5 ("perf/x86/intel: Use the PEBS auto reload
mechanism when possible").

Introduce intel_pmu_save_and_restart_reload() to calculate
event->count for auto-reload only.
Since the counter increments a negative counter value and overflows on
the sign switch, giving the interval:
[-period, 0]
the difference between two consecutive reads is:
A) value2 - value1;
when no overflows have happened in between,
B) (0 - value1) + (value2 - (-period));
when one overflow happened in between,
C) (0 - value1) + (n - 1) * (period) + (value2 - (-period));
when @n overflows happened in between.
Here A) is the obvious difference, B) is the extension to the discrete
interval, where the first term is to the top of the interval and the
second term is from the bottom of the next interval, and C) is the
extension to multiple intervals, where the middle term is the whole
intervals covered.
The equation for all cases is:
value2 - value1 + n * period

Previously, event->count was updated right before the sample output.
But for case A there is no PEBS record ready, so it needs special
handling.

Remove the auto-reload code from x86_perf_event_set_period(). It is no
longer needed.

Fixes: 851559e35fd5 ("perf/x86/intel: Use the PEBS auto reload mechanism
when possible")
Based-on-code-from: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/core.c | 15 ++++----
arch/x86/events/intel/ds.c | 87 ++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 90 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 140d332..5a3ccd1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1156,16 +1156,13 @@ int x86_perf_event_set_period(struct perf_event *event)

per_cpu(pmc_prev_left[idx], smp_processor_id()) = left;

- if (!(hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) ||
- local64_read(&hwc->prev_count) != (u64)-left) {
- /*
- * The hw event starts counting from this event offset,
- * mark it to be able to extra future deltas:
- */
- local64_set(&hwc->prev_count, (u64)-left);
+ /*
+ * The hw event starts counting from this event offset,
+ * mark it to be able to extra future deltas:
+ */
+ local64_set(&hwc->prev_count, (u64)-left);

- wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);
- }
+ wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);

/*
* Due to erratum on certan cpu we need
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 8156e47..f519ebc 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1303,17 +1303,84 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
return NULL;
}

+/*
+ * Special variant of intel_pmu_save_and_restart() for auto-reload.
+ */
+static int
+intel_pmu_save_and_restart_reload(struct perf_event *event, int count)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ int shift = 64 - x86_pmu.cntval_bits;
+ u64 period = hwc->sample_period;
+ u64 prev_raw_count, new_raw_count;
+ s64 new, old;
+
+ WARN_ON(!period);
+
+ /*
+ * drain_pebs() only happens when the PMU is disabled.
+ */
+ WARN_ON(this_cpu_read(cpu_hw_events.enabled));
+
+ prev_raw_count = local64_read(&hwc->prev_count);
+ rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+ local64_set(&hwc->prev_count, new_raw_count);
+
+ /*
+ * Since the counter increments a negative counter value and
+ * overflows on the sign switch, giving the interval:
+ *
+ * [-period, 0]
+ *
+ * the difference between two consecutive reads is:
+ *
+ * A) value2 - value1;
+ * when no overflows have happened in between,
+ *
+ * B) (0 - value1) + (value2 - (-period));
+ * when one overflow happened in between,
+ *
+ * C) (0 - value1) + (n - 1) * (period) + (value2 - (-period));
+ * when @n overflows happened in between.
+ *
+ * Here A) is the obvious difference, B) is the extension to the
+ * discrete interval, where the first term is to the top of the
+ * interval and the second term is from the bottom of the next
+ * interval and C) the extension to multiple intervals, where the
+ * middle term is the whole intervals covered.
+ *
+ * An equivalent of C, by reduction, is:
+ *
+ * value2 - value1 + n * period
+ */
+ new = ((s64)(new_raw_count << shift) >> shift);
+ old = ((s64)(prev_raw_count << shift) >> shift);
+ local64_add(new - old + count * period, &event->count);
+
+ perf_event_update_userpage(event);
+
+ return 0;
+}
+
static void __intel_pmu_pebs_event(struct perf_event *event,
struct pt_regs *iregs,
void *base, void *top,
int bit, int count)
{
+ struct hw_perf_event *hwc = &event->hw;
struct perf_sample_data data;
struct pt_regs regs;
void *at = get_next_pebs_record_by_bit(base, top, bit);

- if (!intel_pmu_save_and_restart(event) &&
- !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
+ if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
+ /*
+ * Now, auto-reload is only enabled in fixed period mode.
+ * The reload value is always hwc->sample_period.
+ * This may need to change if auto-reload is enabled in
+ * freq mode later.
+ */
+ intel_pmu_save_and_restart_reload(event, count);
+ } else if (!intel_pmu_save_and_restart(event))
return;

while (count > 1) {
@@ -1389,8 +1456,22 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)

ds->pebs_index = ds->pebs_buffer_base;

- if (unlikely(base >= top))
+ if (unlikely(base >= top)) {
+ /*
+ * drain_pebs() could be called twice in a short period
+ * for an auto-reload event in pmu::read(). No overflows
+ * have happened in between.
+ * intel_pmu_save_and_restart_reload() needs to be called
+ * to update the event->count for this case.
+ */
+ for_each_set_bit(bit, (unsigned long *)&cpuc->pebs_enabled,
+ x86_pmu.max_pebs_events) {
+ event = cpuc->events[bit];
+ if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
+ intel_pmu_save_and_restart_reload(event, 0);
+ }
return;
+ }

for (at = base; at < top; at += x86_pmu.pebs_record_size) {
struct pebs_record_nhm *p = at;
--
2.7.4


2018-02-12 22:23:58

by Liang, Kan

Subject: [PATCH V4 2/5] perf/x86: Introduce read function for x86_pmu

From: Kan Liang <[email protected]>

Auto-reload needs special handling when reading the event count.

Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/core.c | 2 ++
arch/x86/events/perf_event.h | 1 +
2 files changed, 3 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5a3ccd1..00a6251 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1881,6 +1881,8 @@ early_initcall(init_hw_perf_events);

static inline void x86_pmu_read(struct perf_event *event)
{
+ if (x86_pmu.read)
+ return x86_pmu.read(event);
x86_perf_event_update(event);
}

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 8e4ea143..805400b 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -519,6 +519,7 @@ struct x86_pmu {
void (*disable)(struct perf_event *);
void (*add)(struct perf_event *);
void (*del)(struct perf_event *);
+ void (*read)(struct perf_event *event);
int (*hw_config)(struct perf_event *event);
int (*schedule_events)(struct cpu_hw_events *cpuc, int n, int *assign);
unsigned eventsel;
--
2.7.4


2018-02-12 22:24:39

by Liang, Kan

Subject: [PATCH V4 3/5] perf/x86/intel/ds: Introduce read function for auto-reload event

From: Kan Liang <[email protected]>

There is no way to get the exact auto-reload times and values needed
for the event update without flushing the PEBS buffer.

Introduce intel_pmu_auto_reload_read() to drain the PEBS buffer for an
auto-reload event.
To prevent races, drain_pebs() is only called when the PMU is
disabled.

Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/ds.c | 9 +++++++++
arch/x86/events/perf_event.h | 2 ++
2 files changed, 11 insertions(+)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index f519ebc..406f3ba 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1303,6 +1303,15 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
return NULL;
}

+void intel_pmu_auto_reload_read(struct perf_event *event)
+{
+ WARN_ON(!(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD));
+
+ perf_pmu_disable(event->pmu);
+ intel_pmu_drain_pebs_buffer();
+ perf_pmu_enable(event->pmu);
+}
+
/*
* Special variant of intel_pmu_save_and_restart() for auto-reload.
*/
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 805400b..f4720a9 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -923,6 +923,8 @@ void intel_pmu_pebs_disable_all(void);

void intel_pmu_pebs_sched_task(struct perf_event_context *ctx, bool sched_in);

+void intel_pmu_auto_reload_read(struct perf_event *event);
+
void intel_ds_init(void);

void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in);
--
2.7.4


2018-02-12 22:24:57

by Liang, Kan

Subject: [PATCH V4 4/5] perf/x86/intel: Fix pmu read for auto-reload

From: Kan Liang <[email protected]>

Auto-reload events need special handling when reading the event count.

Auto-reload is only available for intel_pmu.

Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt
handling (large PEBS interrupt threshold)")
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/core.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 731153a..6461a4a 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2060,6 +2060,14 @@ static void intel_pmu_del_event(struct perf_event *event)
intel_pmu_pebs_del(event);
}

+static void intel_pmu_read_event(struct perf_event *event)
+{
+ if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
+ intel_pmu_auto_reload_read(event);
+ else
+ x86_perf_event_update(event);
+}
+
static void intel_pmu_enable_fixed(struct hw_perf_event *hwc)
{
int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
@@ -3495,6 +3503,7 @@ static __initconst const struct x86_pmu intel_pmu = {
.disable = intel_pmu_disable_event,
.add = intel_pmu_add_event,
.del = intel_pmu_del_event,
+ .read = intel_pmu_read_event,
.hw_config = intel_pmu_hw_config,
.schedule_events = x86_schedule_events,
.eventsel = MSR_ARCH_PERFMON_EVENTSEL0,
--
2.7.4


2018-02-17 06:22:55

by Fengguang Wu

Subject: [perf/x86/intel] 41e062cd2e: WARNING:at_arch/x86/events/intel/ds.c:#intel_pmu_save_and_restart_reload

FYI, we noticed the following commit (built with gcc-7):

commit: 41e062cd2eca8ad3f98159b87fb6cd4409ec8e68 ("perf/x86/intel: Fix event update for auto-reload")
url: https://github.com/0day-ci/linux/commits/kan-liang-linux-intel-com/bugs-fix-for-auto-reload-mmap-read-and-rdpmc-read/20180216-060737


in testcase: netperf
with following parameters:

ip: ipv4
runtime: 300s
nr_threads: 200%
cluster: cs-localhost
test: TCP_SENDFILE
cpufreq_governor: performance

test-description: Netperf is a benchmark that can be used to measure various aspects of networking performance.
test-url: http://www.netperf.org/netperf/


on test machine: 4 threads Intel(R) Core(TM) i5-3317U CPU @ 1.70GHz with 4G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------------------------------------------------------+-------+------------+
| | v4.15 | 41e062cd2e |
+------------------------------------------------------------------------------------------+-------+------------+
| boot_successes | 3410 | 0 |
| boot_failures | 1077 | 7 |
| invoked_oom-killer:gfp_mask=0x | 841 | 4 |
| Mem-Info | 848 | 4 |
| Out_of_memory:Kill_process | 4 | |
| IP-Config:Auto-configuration_of_network_failed | 136 | |
| RIP:poll_idle | 19 | |
| page_allocation_failure:order:#,mode:#(GFP_ATOMIC|__GFP_COMP|__GFP_ZERO),nodemask=(null) | 7 | |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/rwsem.c | 2 | |
| WARNING:at_kernel/workqueue.c:#destroy_workqueue | 1 | |
| RIP:destroy_workqueue | 1 | |
| RIP:cpuidle_enter_state | 16 | |
| RIP:copy_page | 1 | |
| RIP:SyS_rmdir | 1 | |
| WARNING:at_net/sched/sch_generic.c:#dev_watchdog | 1 | |
| RIP:dev_watchdog | 1 | |
| End_of_test:RCU_HOTPLUG | 5 | |
| WARNING:at_arch/x86/events/intel/core.c:#intel_pmu_handle_irq | 2 | |
| RIP:intel_pmu_handle_irq | 2 | |
| RIP:native_write_msr | 2 | 3 |
| BUG:unable_to_handle_kernel | 1 | |
| Oops:#[##] | 1 | |
| RIP:xhci_ring_trb_show | 1 | |
| Kernel_panic-not_syncing:Fatal_exception | 2 | |
| WARNING:at_fs/iomap.c:#iomap_dio_complete | 3 | |
| RIP:iomap_dio_complete | 3 | |
| WARNING:at_fs/direct-io.c:#dio_complete | 19 | |
| RIP:dio_complete | 19 | |
| WARNING:at_fs/btrfs/disk-io.c:#free_fs_root[btrfs] | 1 | |
| RIP:free_fs_root[btrfs] | 1 | |
| WARNING:at_fs/btrfs/extent-tree.c:#btrfs_put_block_group[btrfs] | 1 | |
| RIP:btrfs_free_block_groups[btrfs] | 1 | |
| WARNING:at_fs/btrfs/inode.c:#cow_file_range[btrfs] | 2 | |
| RIP:cow_file_range[btrfs] | 2 | |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 837 | 4 |
| Assertion_failed | 1 | |
| kernel_BUG_at_fs/xfs/xfs_message.c | 1 | |
| invalid_opcode:#[##] | 1 | |
| RIP:assfail[xfs] | 1 | |
| BUG:kernel_hang_in_test_stage | 16 | |
| BUG:kernel_reboot-without-warning_in_test_stage | 5 | |
| BUG:kernel_hang_in_boot_stage | 1 | |
| WARNING:at_arch/x86/events/intel/ds.c:#intel_pmu_save_and_restart_reload | 0 | 3 |
| RIP:intel_pmu_save_and_restart_reload | 0 | 3 |
| RIP:copy_user_enhanced_fast_string | 0 | 3 |
| RIP:syscall_return_via_sysret | 0 | 3 |
| RIP:__radix_tree_lookup | 0 | 3 |
| RIP:tcp_v4_send_check | 0 | 2 |
| RIP:entry_SYSCALL_64_after_hwframe | 0 | 3 |
| RIP:ip_rcv | 0 | 3 |
| RIP:skb_release_head_state | 0 | 3 |
| RIP:do_sendfile | 0 | 3 |
| RIP:tcp_established_options | 0 | 3 |
| RIP:current_kernel_time64 | 0 | 2 |
| RIP:___might_sleep | 0 | 3 |
| RIP:exit_to_usermode_loop | 0 | 2 |
| RIP:netif_rx_internal | 0 | 2 |
| RIP:do_tcp_sendpages | 0 | 3 |
| RIP:selinux_file_permission | 0 | 3 |
| RIP:tcp_transmit_skb | 0 | 3 |
| RIP:sock_sendpage | 0 | 3 |
| RIP:generic_file_read_iter | 0 | 3 |
| RIP:__schedule | 0 | 3 |
| RIP:_raw_spin_lock_bh | 0 | 3 |
| RIP:__tcp_v4_send_check | 0 | 3 |
| RIP:tcp_v4_rcv | 0 | 2 |
| RIP:copy_page_to_iter | 0 | 3 |
| RIP:skb_try_coalesce | 0 | 1 |
| RIP:pipe_wait | 0 | 2 |
| RIP:tcp_send_mss | 0 | 2 |
| RIP:fsnotify | 0 | 3 |
| RIP:tcp_sendpage_locked | 0 | 2 |
| RIP:find_get_entry | 0 | 2 |
| RIP:release_sock | 0 | 3 |
| RIP:select_task_rq_fair | 0 | 3 |
| RIP:_find_next_bit | 0 | 3 |
| RIP:avc_has_perm | 0 | 3 |
| RIP:splice_direct_to_actor | 0 | 2 |
| RIP:pipe_write | 0 | 2 |
| RIP:__sk_mem_raise_allocated | 0 | 1 |
| RIP:__switch_to | 0 | 3 |
| RIP:enqueue_to_backlog | 0 | 2 |
| RIP:lock_sock_nested | 0 | 2 |
| RIP:tcp_rcv_established | 0 | 2 |
| RIP:__dev_queue_xmit | 0 | 3 |
| RIP:rcu_all_qs | 0 | 2 |
| RIP:update_cfs_group | 0 | 3 |
| RIP:pipe_to_sendpage | 0 | 3 |
| RIP:kernel_sendpage | 0 | 3 |
| RIP:tcp_tso_segs | 0 | 3 |
| RIP:mod_timer | 0 | 3 |
| RIP:__fget_light | 0 | 3 |
| RIP:entry_SYSCALL_64_fastpath | 0 | 3 |
| RIP:tcp_send_delayed_ack | 0 | 2 |
| RIP:rb_erase | 0 | 2 |
| RIP:splice_from_pipe | 0 | 3 |
| RIP:SyS_sendfile64 | 0 | 2 |
| RIP:__x86_indirect_thunk_rax | 0 | 3 |
| RIP:pick_next_task_fair | 0 | 3 |
| RIP:file_has_perm | 0 | 2 |
| RIP:security_sock_rcv_skb | 0 | 3 |
| RIP:ipv4_mtu | 0 | 3 |
| RIP:tcp_wfree | 0 | 3 |
| RIP:_cond_resched | 0 | 2 |
| RIP:rb_erase_cached | 0 | 3 |
| RIP:__splice_from_pipe | 0 | 3 |
| RIP:tcp_v4_md5_lookup | 0 | 2 |
| RIP:sanity | 0 | 2 |
| RIP:finish_task_switch | 0 | 2 |
| RIP:ksize | 0 | 3 |
| RIP:inet_sendpage | 0 | 3 |
| RIP:touch_atime | 0 | 3 |
| RIP:anon_pipe_buf_release | 0 | 2 |
| RIP:tcp_sendpage | 0 | 3 |
| RIP:sock_def_readable | 0 | 2 |
| RIP:rb_first | 0 | 2 |
| RIP:page_cache_pipe_buf_release | 0 | 3 |
| RIP:skb_copy_datagram_iter | 0 | 3 |
| RIP:_raw_spin_lock_irqsave | 0 | 2 |
| RIP:validate_xmit_skb | 0 | 3 |
| RIP:tcp_current_mss | 0 | 3 |
| RIP:dequeue_task_fair | 0 | 3 |
| RIP:select_idle_sibling | 0 | 3 |
| RIP:copy_page_from_iter | 0 | 2 |
| RIP:__wake_up_common_lock | 0 | 2 |
| RIP:mark_page_accessed | 0 | 3 |
| RIP:update_curr | 0 | 2 |
| RIP:direct_splice_actor | 0 | 2 |
| RIP:tcp_md5_do_lookup | 0 | 2 |
| RIP:ip_rcv_finish | 0 | 2 |
| RIP:pagecache_get_page | 0 | 2 |
| RIP:vfs_write | 0 | 2 |
| RIP:page_cache_pipe_buf_confirm | 0 | 3 |
| RIP:tcp_v4_do_rcv | 0 | 2 |
| RIP:tcp_ack | 0 | 3 |
| RIP:entry_SYSCALL_64_stage2 | 0 | 2 |
| RIP:copy_user_generic_unrolled | 0 | 2 |
| RIP:enqueue_entity | 0 | 3 |
| RIP:scheduler_tick | 0 | 1 |
| RIP:tcp_rcv_space_adjust | 0 | 2 |
| RIP:load_new_mm_cr3 | 0 | 3 |
| RIP:tcp_rate_check_app_limited | 0 | 3 |
| RIP:__netif_receive_skb_core | 0 | 3 |
| RIP:__might_sleep | 0 | 3 |
| RIP:do_splice_to | 0 | 3 |
| RIP:jiffies_to_usecs | 0 | 2 |
| RIP:check_preempt_wakeup | 0 | 2 |
| RIP:pipe_read | 0 | 2 |
| RIP:kmalloc_slab | 0 | 2 |
| RIP:tcp_event_new_data_sent | 0 | 3 |
| RIP:io_serial_in | 0 | 1 |
| RIP:tcp_schedule_loss_probe | 0 | 2 |
| RIP:io_serial_out | 0 | 1 |
| RIP:cfb_imageblit | 0 | 1 |
| RIP:sched_clock_cpu | 0 | 1 |
| RIP:lapic_next_deadline | 0 | 1 |
| RIP:native_queued_spin_lock_slowpath | 0 | 1 |
| RIP:switch_mm_irqs_off | 0 | 2 |
| RIP:kmem_cache_alloc_node | 0 | 2 |
| RIP:sk_filter_trim_cap | 0 | 2 |
| RIP:nf_hook_slow | 0 | 1 |
| RIP:tcp_recvmsg | 0 | 2 |
| RIP:_raw_spin_lock | 0 | 1 |
| RIP:unmap_page_range | 0 | 1 |
| RIP:tcp_release_cb | 0 | 2 |
| RIP:ktime_get_with_offset | 0 | 2 |
| RIP:sk_wait_data | 0 | 1 |
| RIP:__clear_user | 0 | 1 |
| RIP:update_rq_clock | 0 | 1 |
| RIP:copyout | 0 | 1 |
| RIP:ktime_get_update_offsets_now | 0 | 1 |
| RIP:set_next_entity | 0 | 1 |
| RIP:__wake_up_common | 0 | 1 |
| RIP:__list_add_valid | 0 | 1 |
| RIP:wakeup_preempt_entity | 0 | 1 |
| RIP:security_file_permission | 0 | 2 |
| RIP:__do_softirq | 0 | 2 |
| RIP:reweight_entity | 0 | 1 |
| RIP:netif_skb_features | 0 | 1 |
| RIP:schedule_timeout | 0 | 1 |
| RIP:try_to_wake_up | 0 | 1 |
| RIP:rw_verify_area | 0 | 2 |
| RIP:tcp_rate_skb_delivered | 0 | 1 |
| RIP:___perf_sw_event | 0 | 2 |
| RIP:ip_finish_output2 | 0 | 2 |
| RIP:tcp_event_data_recv | 0 | 1 |
| RIP:tcp_v4_inbound_md5_hash | 0 | 1 |
| RIP:__kfree_skb_flush | 0 | 1 |
| RIP:net_rx_action | 0 | 2 |
| RIP:ip_output | 0 | 2 |
| RIP:__netif_receive_skb | 0 | 1 |
| RIP:process_backlog | 0 | 2 |
| RIP:__calc_delta | 0 | 2 |
| RIP:tcp_check_space | 0 | 1 |
| RIP:native_sched_clock | 0 | 1 |
| RIP:selinux_ip_postroute | 0 | 1 |
| RIP:filemap_map_pages | 0 | 1 |
| RIP:clear_buddies | 0 | 1 |
| RIP:netdev_pick_tx | 0 | 1 |
| RIP:tcp_write_xmit | 0 | 2 |
| RIP:skb_release_data | 0 | 1 |
| RIP:ipv4_dst_check | 0 | 1 |
| RIP:vmacache_find | 0 | 1 |
| RIP:iov_iter_pipe | 0 | 1 |
| RIP:tcp_clean_rtx_queue | 0 | 2 |
| RIP:dst_release | 0 | 1 |
| RIP:loopback_xmit | 0 | 1 |
| RIP:read_tsc | 0 | 1 |
| RIP:__ip_local_out | 0 | 1 |
| RIP:wait_woken | 0 | 2 |
| RIP:native_flush_tlb_single | 0 | 1 |
| RIP:detach_if_pending | 0 | 1 |
| RIP:__skb_clone | 0 | 1 |
| RIP:update_load_avg | 0 | 1 |
| RIP:__inet_lookup_established | 0 | 2 |
| RIP:__sb_start_write | 0 | 1 |
| RIP:enqueue_task_fair | 0 | 1 |
| RIP:__update_load_avg_se | 0 | 2 |
| RIP:put_prev_entity | 0 | 2 |
| RIP:vfs_read | 0 | 1 |
| RIP:finish_wait | 0 | 1 |
| RIP:rb_next | 0 | 2 |
| RIP:set_next_buddy | 0 | 1 |
| RIP:tcp_chrono_start | 0 | 1 |
| RIP:tcp_v4_fill_cb | 0 | 1 |
| RIP:__local_bh_enable_ip | 0 | 2 |
| RIP:__vfs_read | 0 | 1 |
| RIP:kfree | 0 | 2 |
| RIP:dequeue_entity | 0 | 1 |
| RIP:__alloc_skb | 0 | 2 |
| RIP:skb_network_protocol | 0 | 1 |
| RIP:account_entity_enqueue | 0 | 2 |
| RIP:selinux_parse_skb | 0 | 2 |
| RIP:account_entity_dequeue | 0 | 1 |
| RIP:selinux_netlbl_sock_rcv_skb | 0 | 1 |
| RIP:idle_cpu | 0 | 2 |
| RIP:tcp_send_ack | 0 | 1 |
| RIP:generic_file_splice_read | 0 | 1 |
| RIP:__inode_security_revalidate | 0 | 1 |
| RIP:kmem_cache_free | 0 | 1 |
| RIP:cpumask_next | 0 | 1 |
| RIP:perf_exclude_event | 0 | 1 |
| RIP:netlbl_enabled | 0 | 2 |
| RIP:tcp_ack_update_rtt | 0 | 2 |
| RIP:__mark_inode_dirty | 0 | 1 |
| RIP:hrtick_update | 0 | 1 |
| RIP:kill_fasync | 0 | 1 |
| RIP:_raw_spin_unlock_irqrestore | 0 | 2 |
| RIP:do_softirq | 0 | 1 |
| RIP:pick_next_entity | 0 | 2 |
| RIP:selinux_socket_sock_rcv_skb | 0 | 2 |
| RIP:minmax_subwin_update | 0 | 2 |
| RIP:__vfs_write | 0 | 1 |
| RIP:schedule | 0 | 1 |
| RIP:ip_local_deliver_finish | 0 | 1 |
| RIP:rb_insert_color_cached | 0 | 1 |
| RIP:ttwu_do_wakeup | 0 | 1 |
| RIP:cpumask_next_wrap | 0 | 2 |
| RIP:tcp_options_write | 0 | 1 |
| RIP:tcp_tx_timestamp | 0 | 1 |
| RIP:tcp_filter | 0 | 1 |
| RIP:__kmalloc_node_track_caller | 0 | 2 |
| RIP:__usecs_to_jiffies | 0 | 1 |
| RIP:find_next_bit | 0 | 1 |
| RIP:resched_curr | 0 | 1 |
| RIP:SyS_write | 0 | 1 |
| RIP:tcp_grow_window | 0 | 2 |
| RIP:bictcp_acked | 0 | 1 |
| RIP:syscall_return_slowpath | 0 | 2 |
| RIP:prepare_to_wait | 0 | 1 |
| RIP:__fsnotify_parent | 0 | 1 |
| RIP:tcp_update_pacing_rate | 0 | 2 |
| RIP:tcp_push | 0 | 2 |
| RIP:generic_splice_sendpage | 0 | 1 |
| RIP:update_min_vruntime | 0 | 1 |
| RIP:__hrtimer_run_queues | 0 | 1 |
| RIP:__might_fault | 0 | 2 |
| RIP:dev_hard_start_xmit | 0 | 2 |
| RIP:__list_del_entry_valid | 0 | 1 |
| RIP:__tcp_select_window | 0 | 2 |
| RIP:return_from_SYSCALL_64 | 0 | 1 |
| RIP:__switch_to_asm | 0 | 1 |
| RIP:sk_free | 0 | 1 |
| RIP:mutex_lock | 0 | 1 |
| RIP:memcpy_erms | 0 | 1 |
| RIP:current_time | 0 | 1 |
| RIP:raw_local_deliver | 0 | 1 |
| RIP:selinux_sock_rcv_skb_compat | 0 | 1 |
| RIP:ip_queue_xmit | 0 | 2 |
| RIP:update_wall_time | 0 | 1 |
| RIP:__wake_up_sync_key | 0 | 1 |
| RIP:deactivate_task | 0 | 1 |
| RIP:tcp_recv_timestamp | 0 | 1 |
| RIP:timespec_trunc | 0 | 1 |
| RIP:sockfd_lookup_light | 0 | 2 |
| RIP:ip_send_check | 0 | 1 |
| RIP:tcp_small_queue_check | 0 | 1 |
| RIP:tcp_rbtree_insert | 0 | 1 |
| RIP:SYSC_recvfrom | 0 | 1 |
| RIP:__tcp_ack_snd_check | 0 | 1 |
| RIP:run_timer_softirq | 0 | 1 |
| RIP:selinux_ipv4_postroute | 0 | 1 |
| RIP:__mnt_want_write | 0 | 1 |
| RIP:ns_to_timespec64 | 0 | 2 |
| RIP:woken_wake_function | 0 | 2 |
| RIP:default_wake_function | 0 | 1 |
| RIP:tcp_queue_rcv | 0 | 1 |
| RIP:__sb_end_write | 0 | 2 |
| RIP:iov_iter_init | 0 | 1 |
| RIP:inet_ehashfn | 0 | 2 |
| RIP:__xfrm_policy_check2 | 0 | 1 |
| RIP:mutex_unlock | 0 | 1 |
| RIP:tcp_chrono_stop | 0 | 2 |
| RIP:file_update_time | 0 | 1 |
| RIP:skb_entail | 0 | 1 |
| RIP:__atime_needs_update | 0 | 2 |
| RIP:pipe_unlock | 0 | 1 |
| RIP:tcp_cleanup_rbuf | 0 | 1 |
| RIP:perf_iterate_sb | 0 | 1 |
| RIP:tcp_rearm_rto | 0 | 2 |
| RIP:skb_clone_tx_timestamp | 0 | 1 |
| RIP:pipe_lock | 0 | 1 |
| RIP:rcu_note_context_switch | 0 | 1 |
| RIP:format_decode | 0 | 1 |
| RIP:ip_finish_output | 0 | 2 |
| RIP:skb_clone | 0 | 1 |
| RIP:hrtimer_forward | 0 | 1 |
| RIP:sock_rfree | 0 | 2 |
| RIP:check_cfs_rq_runtime | 0 | 1 |
| RIP:number | 0 | 1 |
| RIP:autoremove_wake_function | 0 | 1 |
| RIP:sock_put | 0 | 2 |
| RIP:tcp_stream_memory_free | 0 | 1 |
| RIP:__handle_mm_fault | 0 | 1 |
| RIP:devkmsg_read | 0 | 1 |
| RIP:free_pgd_range | 0 | 1 |
| RIP:__x86_indirect_thunk_r8 | 0 | 1 |
| RIP:__x86_indirect_thunk_r10 | 0 | 1 |
| RIP:sock_has_perm | 0 | 1 |
| RIP:remove_wait_queue | 0 | 2 |
| RIP:sched_clock | 0 | 1 |
| RIP:skb_release_all | 0 | 1 |
| RIP:tcp_rate_skb_sent | 0 | 1 |
| RIP:SyS_read | 0 | 1 |
| RIP:copyin | 0 | 1 |
| RIP:bictcp_cong_avoid | 0 | 1 |
| RIP:skb_csum_hwoffload_help | 0 | 1 |
| RIP:check_preempt_curr | 0 | 1 |
| RIP:tcp_add_backlog | 0 | 1 |
| RIP:decay_load | 0 | 1 |
| RIP:__release_sock | 0 | 2 |
| RIP:_copy_from_user | 0 | 1 |
| RIP:page_fault | 0 | 1 |
| RIP:vsnprintf | 0 | 1 |
| RIP:native_load_tls | 0 | 1 |
| RIP:__sock_wfree | 0 | 1 |
| RIP:ip_local_out | 0 | 1 |
| RIP:perf_swevent_event | 0 | 1 |
| RIP:apic_timer_interrupt | 0 | 1 |
| RIP:security_socket_recvmsg | 0 | 2 |
| RIP:tick_sched_timer | 0 | 1 |
| RIP:import_single_range | 0 | 2 |
| RIP:clockevents_program_event | 0 | 1 |
| RIP:activate_task | 0 | 1 |
| RIP:selinux_xfrm_postroute_last | 0 | 2 |
| RIP:__enqueue_entity | 0 | 1 |
| RIP:radix_tree_lookup_slot | 0 | 1 |
| RIP:acct_account_cputime | 0 | 1 |
| RIP:tcp_rate_gen | 0 | 1 |
| RIP:SyS_access | 0 | 1 |
| RIP:kfree_skbmem | 0 | 1 |
| RIP:get_nohz_timer_target | 0 | 1 |
| RIP:__sk_dst_check | 0 | 1 |
| RIP:cap_bprm_set_creds | 0 | 1 |
| RIP:inet_recvmsg | 0 | 2 |
| RIP:sock_recvmsg | 0 | 1 |
| RIP:cpu_load_update_active | 0 | 1 |
| RIP:PageHuge | 0 | 1 |
| RIP:netif_rx | 0 | 1 |
| RIP:__fdget_pos | 0 | 1 |
| RIP:selinux_xfrm_sock_rcv_skb | 0 | 2 |
| RIP:osq_lock | 0 | 1 |
| RIP:seq_write | 0 | 1 |
| RIP:ret_from_intr | 0 | 1 |
| RIP:rcu_bh_qs | 0 | 1 |
| RIP:find_vma | 0 | 1 |
| RIP:calc_wheel_index | 0 | 1 |
| RIP:do_softirq_own_stack | 0 | 2 |
| RIP:sk_reset_timer | 0 | 2 |
| RIP:vma_compute_subtree_gap | 0 | 1 |
| RIP:ip_local_deliver | 0 | 2 |
| RIP:__rb_insert_augmented | 0 | 1 |
| RIP:bictcp_cwnd_event | 0 | 1 |
| RIP:smp_apic_timer_interrupt | 0 | 1 |
| RIP:rcu_check_callbacks | 0 | 1 |
| RIP:ttwu_do_activate | 0 | 1 |
| RIP:generic_pipe_buf_confirm | 0 | 1 |
| RIP:mutex_spin_on_owner | 0 | 1 |
| RIP:__alloc_pages_nodemask | 0 | 1 |
| RIP:selinux_ip_postroute_compat | 0 | 2 |
| RIP:get_seconds | 0 | 1 |
| RIP:__put_user_8 | 0 | 1 |
| RIP:__fdget | 0 | 1 |
| RIP:vma_merge | 0 | 1 |
| RIP:rcu_implicit_dynticks_qs | 0 | 1 |
| RIP:native_smp_send_reschedule | 0 | 1 |
| RIP:avc_has_perm_noaudit | 0 | 1 |
| RIP:cpu_load_update | 0 | 1 |
| RIP:unfreeze_partials | 0 | 1 |
| RIP:worker_enter_idle | 0 | 1 |
| RIP:__printk_safe_exit | 0 | 1 |
| RIP:sync_regs | 0 | 1 |
| RIP:path_openat | 0 | 1 |
| RIP:_raw_spin_unlock_bh | 0 | 2 |
| RIP:eth_type_trans | 0 | 2 |
| RIP:selinux_ipv4_output | 0 | 1 |
| RIP:__slab_free | 0 | 1 |
| RIP:smp_call_function_single | 0 | 1 |
| RIP:__x86_indirect_thunk_r13 | 0 | 1 |
| RIP:perf_event_task_tick | 0 | 1 |
| RIP:ktime_get | 0 | 1 |
| RIP:minmax_running_min | 0 | 1 |
+------------------------------------------------------------------------------------------+-------+------------+



[ 242.731381] WARNING: CPU: 3 PID: 1107 at arch/x86/events/intel/ds.c:1326 intel_pmu_save_and_restart_reload+0x87/0x90
[ 242.731382] Modules linked in: netconsole snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc wmi_bmof aesni_intel crypto_simd glue_helper i915 cryptd snd_hda_intel snd_hda_codec uvcvideo videobuf2_vmalloc videobuf2_memops pcspkr serio_raw ahci drm_kms_helper libahci syscopyarea sysfillrect snd_hda_core videobuf2_v4l2 sysimgblt fb_sys_fops videobuf2_core snd_hwdep snd_pcm snd_timer bcma snd libata soundcore videodev drm shpchp ideapad_laptop wmi sparse_keymap rfkill video ip_tables
[ 242.731417] CPU: 3 PID: 1107 Comm: netserver Not tainted 4.15.0-00001-g41e062c #1
[ 242.731418] Hardware name: LENOVO IdeaPad U410 /Lenovo , BIOS 65CN15WW 06/05/2012
[ 242.731422] RIP: 0010:intel_pmu_save_and_restart_reload+0x87/0x90
[ 242.731423] RSP: 0018:fffffe000008c8d0 EFLAGS: 00010002
[ 242.731425] RAX: 0000000000000001 RBX: ffff88007d069800 RCX: 0000000000000000
[ 242.731426] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88007d069800
[ 242.731427] RBP: 0000000000000010 R08: 0000000000000001 R09: 0000000000000001
[ 242.731428] R10: 00000000000000b0 R11: 0000000000003000 R12: 00000000000f4243
[ 242.731429] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
[ 242.731431] FS: 00007f1501639700(0000) GS:ffff880112ac0000(0000) knlGS:0000000000000000
[ 242.731432] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 242.731433] CR2: 00007f65a1394d68 CR3: 000000007f62a006 CR4: 00000000001606e0
[ 242.731434] Call Trace:
[ 242.731438] <NMI>
[ 242.731443] __intel_pmu_pebs_event+0xc8/0x260
[ 242.731452] ? intel_pmu_drain_pebs_nhm+0x211/0x2f0
[ 242.731454] intel_pmu_drain_pebs_nhm+0x211/0x2f0
[ 242.731457] intel_pmu_handle_irq+0x12d/0x4b0
[ 242.731464] ? perf_event_nmi_handler+0x2d/0x50
[ 242.731466] perf_event_nmi_handler+0x2d/0x50
[ 242.731470] nmi_handle+0x6a/0x130
[ 242.731473] default_do_nmi+0x4e/0x110
[ 242.731475] do_nmi+0xe5/0x140
[ 242.731479] end_repeat_nmi+0x1a/0x54
[ 242.731483] RIP: 0010:__dev_queue_xmit+0x47/0x780
[ 242.731484] RSP: 0018:ffffc9000141fb70 EFLAGS: 00000206
[ 242.731486] RAX: 00000000000000ee RBX: ffff88010d501000 RCX: ffff88007eb60cfc
[ 242.731487] RDX: ffff88007eb60c00 RSI: 0000000000000000 RDI: ffff88007d5c4400
[ 242.731488] RBP: ffffc9000141fbe0 R08: 0000000000000000 R09: 0000000000000000
[ 242.731489] R10: ffffc9000141fc00 R11: 0000000000003000 R12: ffff88007d5c4400
[ 242.731490] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88007d5c4400
[ 242.731495] ? __dev_queue_xmit+0x47/0x780
[ 242.731497] ? __dev_queue_xmit+0x47/0x780
[ 242.731498] </NMI>
[ 242.731502] ? selinux_ip_postroute+0x160/0x3b0
[ 242.731506] ? pick_next_task_fair+0x2e9/0x5d0
[ 242.731509] ? ip_finish_output2+0x26f/0x380
[ 242.731510] ip_finish_output2+0x26f/0x380
[ 242.731513] ? ip_output+0x5c/0xe0
[ 242.731515] ip_output+0x5c/0xe0
[ 242.731517] ? ip_fragment+0x80/0x80
[ 242.731519] ip_queue_xmit+0x147/0x3c0
[ 242.731522] tcp_transmit_skb+0x51d/0x9f0
[ 242.731525] tcp_recvmsg+0x2ff/0x9f0
[ 242.731530] inet_recvmsg+0x40/0xb0
[ 242.731533] SYSC_recvfrom+0xb0/0x110
[ 242.731538] entry_SYSCALL_64_fastpath+0x20/0x83
[ 242.731541] RIP: 0033:0x7f1500c5d04d
[ 242.731542] RSP: 002b:00007ffd9cde9748 EFLAGS: 00000246
[ 242.731544] Code: e6 48 d3 f8 49 d3 fe 4c 29 f0 4c 01 e0 48 01 83 90 00 00 00 48 89 df e8 a8 6b 18 00 31 c0 5b 5d 41 5c 41 5d 41 5e c3 0f ff eb a3 <0f> ff eb aa 0f 1f 44 00 00 0f 1f 44 00 00 41 56 41 55 48 c7 c2
[ 242.731572] ---[ end trace 27afcd40ecad57ab ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml



Thanks,
lkp


Attachments:
(No filename) (52.63 kB)
config-4.15.0-00001-g41e062c (167.24 kB)
job-script (7.22 kB)
dmesg.xz (137.79 kB)
job.yaml (4.71 kB)

2018-02-19 12:47:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [perf/x86/intel] 41e062cd2e: WARNING:at_arch/x86/events/intel/ds.c:#intel_pmu_save_and_restart_reload

On Sat, Feb 17, 2018 at 02:21:19PM +0800, kernel test robot wrote:
> [ 242.731381] WARNING: CPU: 3 PID: 1107 at arch/x86/events/intel/ds.c:1326 intel_pmu_save_and_restart_reload+0x87/0x90

That's the one asserting the PMU is in fact disabled.

> [ 242.731417] CPU: 3 PID: 1107 Comm: netserver Not tainted 4.15.0-00001-g41e062c #1
> [ 242.731418] Hardware name: LENOVO IdeaPad U410 /Lenovo , BIOS 65CN15WW 06/05/2012
> [ 242.731422] RIP: 0010:intel_pmu_save_and_restart_reload+0x87/0x90
> [ 242.731423] RSP: 0018:fffffe000008c8d0 EFLAGS: 00010002
> [ 242.731425] RAX: 0000000000000001 RBX: ffff88007d069800 RCX: 0000000000000000
> [ 242.731426] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88007d069800
> [ 242.731427] RBP: 0000000000000010 R08: 0000000000000001 R09: 0000000000000001
> [ 242.731428] R10: 00000000000000b0 R11: 0000000000003000 R12: 00000000000f4243
> [ 242.731429] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
> [ 242.731431] FS: 00007f1501639700(0000) GS:ffff880112ac0000(0000) knlGS:0000000000000000
> [ 242.731432] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 242.731433] CR2: 00007f65a1394d68 CR3: 000000007f62a006 CR4: 00000000001606e0
> [ 242.731434] Call Trace:
> [ 242.731438] <NMI>
> [ 242.731443] __intel_pmu_pebs_event+0xc8/0x260
> [ 242.731452] ? intel_pmu_drain_pebs_nhm+0x211/0x2f0
> [ 242.731454] intel_pmu_drain_pebs_nhm+0x211/0x2f0
> [ 242.731457] intel_pmu_handle_irq+0x12d/0x4b0
> [ 242.731464] ? perf_event_nmi_handler+0x2d/0x50
> [ 242.731466] perf_event_nmi_handler+0x2d/0x50
> [ 242.731470] nmi_handle+0x6a/0x130
> [ 242.731473] default_do_nmi+0x4e/0x110
> [ 242.731475] do_nmi+0xe5/0x140
> [ 242.731479] end_repeat_nmi+0x1a/0x54

And this should have shown up in any testing, I think.

The problem appears to be that intel_pmu_handle_irq() uses
__intel_pmu_disable_all() which 'forgets' to clear cpuc->enabled as per
x86_pmu_disable().



2018-02-20 19:00:04

by Liang, Kan

[permalink] [raw]
Subject: Re: [perf/x86/intel] 41e062cd2e: WARNING:at_arch/x86/events/intel/ds.c:#intel_pmu_save_and_restart_reload



On 2/19/2018 7:44 AM, Peter Zijlstra wrote:
> On Sat, Feb 17, 2018 at 02:21:19PM +0800, kernel test robot wrote:
>> [ 242.731381] WARNING: CPU: 3 PID: 1107 at arch/x86/events/intel/ds.c:1326 intel_pmu_save_and_restart_reload+0x87/0x90
>
> That's the one asserting the PMU is in fact disabled.
>
>> [ 242.731417] CPU: 3 PID: 1107 Comm: netserver Not tainted 4.15.0-00001-g41e062c #1
>> [ 242.731418] Hardware name: LENOVO IdeaPad U410 /Lenovo , BIOS 65CN15WW 06/05/2012
>> [ 242.731422] RIP: 0010:intel_pmu_save_and_restart_reload+0x87/0x90
>> [ 242.731423] RSP: 0018:fffffe000008c8d0 EFLAGS: 00010002
>> [ 242.731425] RAX: 0000000000000001 RBX: ffff88007d069800 RCX: 0000000000000000
>> [ 242.731426] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88007d069800
>> [ 242.731427] RBP: 0000000000000010 R08: 0000000000000001 R09: 0000000000000001
>> [ 242.731428] R10: 00000000000000b0 R11: 0000000000003000 R12: 00000000000f4243
>> [ 242.731429] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
>> [ 242.731431] FS: 00007f1501639700(0000) GS:ffff880112ac0000(0000) knlGS:0000000000000000
>> [ 242.731432] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 242.731433] CR2: 00007f65a1394d68 CR3: 000000007f62a006 CR4: 00000000001606e0
>> [ 242.731434] Call Trace:
>> [ 242.731438] <NMI>
>> [ 242.731443] __intel_pmu_pebs_event+0xc8/0x260
>> [ 242.731452] ? intel_pmu_drain_pebs_nhm+0x211/0x2f0
>> [ 242.731454] intel_pmu_drain_pebs_nhm+0x211/0x2f0
>> [ 242.731457] intel_pmu_handle_irq+0x12d/0x4b0
>> [ 242.731464] ? perf_event_nmi_handler+0x2d/0x50
>> [ 242.731466] perf_event_nmi_handler+0x2d/0x50
>> [ 242.731470] nmi_handle+0x6a/0x130
>> [ 242.731473] default_do_nmi+0x4e/0x110
>> [ 242.731475] do_nmi+0xe5/0x140
>> [ 242.731479] end_repeat_nmi+0x1a/0x54
>
> And this should have shown with any testing I think.
>
> The problem appears to be that intel_pmu_handle_irq() uses
> __intel_pmu_disable_all() which 'forgets' to clear cpuc->enabled as per
> x86_pmu_disable().
>
>

Yes, cpuc->enabled is not updated accordingly in the NMI handler.
The patch below should fix it.

Thanks,
Kan
------

From 4d07d81e3406a6a9958cfbb34c1deb87b77721a9 Mon Sep 17 00:00:00 2001
From: Kan Liang <[email protected]>
Date: Tue, 20 Feb 2018 02:11:50 -0800
Subject: [PATCH] perf/x86/intel: Update the PMU state in NMI handler

The Intel PMU is disabled in the NMI handler, but cpuc->enabled is not
updated accordingly. This does not trigger any problems in the current
code, because nothing checks the flag. However, it will cause problems
once code starts checking the PMU state. For example, drain_pebs() will
be modified to fix an auto-reload issue, and the new code will check the
PMU state.

The old PMU state must be saved when entering the NMI, because it is
needed to restore the PMU state when leaving the NMI.

Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/core.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 6461a4a..80dfaae 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2209,16 +2209,23 @@ static int intel_pmu_handle_irq(struct pt_regs
*regs)
int bit, loops;
u64 status;
int handled;
+ int pmu_enabled;

cpuc = this_cpu_ptr(&cpu_hw_events);

/*
+ * Save the PMU state.
+ * It needs to be restored when leaving the handler.
+ */
+ pmu_enabled = cpuc->enabled;
+ /*
* No known reason to not always do late ACK,
* but just in case do it opt-in.
*/
if (!x86_pmu.late_ack)
apic_write(APIC_LVTPC, APIC_DM_NMI);
intel_bts_disable_local();
+ cpuc->enabled = 0;
__intel_pmu_disable_all();
handled = intel_pmu_drain_bts_buffer();
handled += intel_bts_interrupt();
@@ -2328,7 +2335,8 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)

done:
/* Only restore PMU state when it's active. See x86_pmu_disable(). */
- if (cpuc->enabled)
+ cpuc->enabled = pmu_enabled;
+ if (pmu_enabled)
__intel_pmu_enable_all(0, true);
intel_bts_enable_local();

--
2.7.4


2018-02-21 10:35:07

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V4 1/5] perf/x86/intel: Fix event update for auto-reload

On Mon, Feb 12, 2018 at 02:20:31PM -0800, [email protected] wrote:
> @@ -1389,8 +1456,22 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
>
> ds->pebs_index = ds->pebs_buffer_base;
>
> - if (unlikely(base >= top))
> + if (unlikely(base >= top)) {
> + /*
> + * The drain_pebs() could be called twice in a short period
> + * for auto-reload event in pmu::read(). There are no
> + * overflows have happened in between.
> + * It needs to call intel_pmu_save_and_restart_reload() to
> + * update the event->count for this case.
> + */
> + for_each_set_bit(bit, (unsigned long *)&cpuc->pebs_enabled,
> + x86_pmu.max_pebs_events) {
> + event = cpuc->events[bit];
> + if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
> + intel_pmu_save_and_restart_reload(event, 0);
> + }
> return;
> + }
>
> for (at = base; at < top; at += x86_pmu.pebs_record_size) {
> struct pebs_record_nhm *p = at;

Is there a reason you didn't do intel_pmu_drain_pebs_core() ?


--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1435,8 +1435,11 @@ static void intel_pmu_drain_pebs_core(st
return;

n = top - at;
- if (n <= 0)
+ if (n <= 0) {
+ if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
+ intel_pmu_save_and_restart_reload(event, 0);
return;
+ }

__intel_pmu_pebs_event(event, iregs, at, top, 0, n);
}

2018-02-21 13:56:41

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V4 1/5] perf/x86/intel: Fix event update for auto-reload



On 2/21/2018 5:32 AM, Peter Zijlstra wrote:
> On Mon, Feb 12, 2018 at 02:20:31PM -0800, [email protected] wrote:
>> @@ -1389,8 +1456,22 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
>>
>> ds->pebs_index = ds->pebs_buffer_base;
>>
>> - if (unlikely(base >= top))
>> + if (unlikely(base >= top)) {
>> + /*
>> + * The drain_pebs() could be called twice in a short period
>> + * for auto-reload event in pmu::read(). There are no
>> + * overflows have happened in between.
>> + * It needs to call intel_pmu_save_and_restart_reload() to
>> + * update the event->count for this case.
>> + */
>> + for_each_set_bit(bit, (unsigned long *)&cpuc->pebs_enabled,
>> + x86_pmu.max_pebs_events) {
>> + event = cpuc->events[bit];
>> + if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
>> + intel_pmu_save_and_restart_reload(event, 0);
>> + }
>> return;
>> + }
>>
>> for (at = base; at < top; at += x86_pmu.pebs_record_size) {
>> struct pebs_record_nhm *p = at;
>
> Is there a reason you didn't do intel_pmu_drain_pebs_core() ?
>

Right, I forgot that the drain_pebs_core() also has auto_reload support.
Sorry for that.
I will re-send all patches.

Thanks,
Kan

>
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1435,8 +1435,11 @@ static void intel_pmu_drain_pebs_core(st
> return;
>
> n = top - at;
> - if (n <= 0)
> + if (n <= 0) {
> + if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
> + intel_pmu_save_and_restart_reload(event, 0);
> return;
> + }
>
> __intel_pmu_pebs_event(event, iregs, at, top, 0, n);
> }
>

2018-02-21 13:57:28

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V4 1/5] perf/x86/intel: Fix event update for auto-reload

On Wed, Feb 21, 2018 at 08:43:47AM -0500, Liang, Kan wrote:
> Right, I forgot that the drain_pebs_core() also has auto_reload support.
> Sorry for that.
> I will re-send all patches.

No need, already collected the lot.

Thanks!

Subject: [tip:perf/core] perf/x86/intel: Properly save/restore the PMU state in the NMI handler

Commit-ID: 82d71ed0277efc45360828af8c4e4d40e1b45352
Gitweb: https://git.kernel.org/tip/82d71ed0277efc45360828af8c4e4d40e1b45352
Author: Kan Liang <[email protected]>
AuthorDate: Tue, 20 Feb 2018 02:11:50 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 9 Mar 2018 08:22:18 +0100

perf/x86/intel: Properly save/restore the PMU state in the NMI handler

The PMU is disabled in intel_pmu_handle_irq(), but cpuc->enabled is not updated
accordingly.

This is fine in current usage because no one checks it, but fix it
for future code: for example, the drain_pebs() will be modified to
fix an auto-reload bug.

Properly save/restore the old PMU state.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Cc: kernel test robot <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/events/intel/core.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 6b6c1717787d..1ba7ca7b675d 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2201,9 +2201,15 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
int bit, loops;
u64 status;
int handled;
+ int pmu_enabled;

cpuc = this_cpu_ptr(&cpu_hw_events);

+ /*
+ * Save the PMU state.
+ * It needs to be restored when leaving the handler.
+ */
+ pmu_enabled = cpuc->enabled;
/*
* No known reason to not always do late ACK,
* but just in case do it opt-in.
@@ -2211,6 +2217,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
if (!x86_pmu.late_ack)
apic_write(APIC_LVTPC, APIC_DM_NMI);
intel_bts_disable_local();
+ cpuc->enabled = 0;
__intel_pmu_disable_all();
handled = intel_pmu_drain_bts_buffer();
handled += intel_bts_interrupt();
@@ -2320,7 +2327,8 @@ again:

done:
/* Only restore PMU state when it's active. See x86_pmu_disable(). */
- if (cpuc->enabled)
+ cpuc->enabled = pmu_enabled;
+ if (pmu_enabled)
__intel_pmu_enable_all(0, true);
intel_bts_enable_local();


Subject: [tip:perf/core] perf/x86/intel: Fix event update for auto-reload

Commit-ID: d31fc13fdcb20e1c317f9a7dd6273c18fbd58308
Gitweb: https://git.kernel.org/tip/d31fc13fdcb20e1c317f9a7dd6273c18fbd58308
Author: Kan Liang <[email protected]>
AuthorDate: Mon, 12 Feb 2018 14:20:31 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 9 Mar 2018 08:22:19 +0100

perf/x86/intel: Fix event update for auto-reload

There is a bug when reading event->count with large PEBS enabled.

Here is an example:

# ./read_count
0x71f0
0x122c0
0x1000000001c54
0x100000001257d
0x200000000bdc5

In fixed period mode, the auto-reload mechanism could be enabled for
PEBS events, but the calculation of event->count does not take the
auto-reload values into account.

Anyone who reads event->count will get the wrong result, e.g. x86_pmu_read().

This bug was introduced with the auto-reload mechanism enabled since
commit:

851559e35fd5 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")

Introduce intel_pmu_save_and_restart_reload() to calculate the
event->count only for auto-reload.

Since the counter increments a negative counter value and overflows on
the sign switch, giving the interval:

[-period, 0]

the difference between two consecutive reads is:

A) value2 - value1;
when no overflows have happened in between,
B) (0 - value1) + (value2 - (-period));
when one overflow happened in between,
C) (0 - value1) + (n - 1) * (period) + (value2 - (-period));
when @n overflows happened in between.

Here A) is the obvious difference, B) is the extension to the discrete
interval, where the first term is to the top of the interval and the
second term is from the bottom of the next interval and C) the extension
to multiple intervals, where the middle term is the whole intervals
covered.

The equation for all cases is:

value2 - value1 + n * period

Previously, event->count was updated right before the sample output.
But for case A there is no PEBS record ready, so it needs to be
handled specially.

Remove the auto-reload code from x86_perf_event_set_period() since
we will no longer call that function in this case.

Based-on-code-from: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Fixes: 851559e35fd5 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/events/core.c | 15 +++-----
arch/x86/events/intel/ds.c | 92 ++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 94 insertions(+), 13 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 140d33288e78..5a3ccd1715e2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1156,16 +1156,13 @@ int x86_perf_event_set_period(struct perf_event *event)

per_cpu(pmc_prev_left[idx], smp_processor_id()) = left;

- if (!(hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) ||
- local64_read(&hwc->prev_count) != (u64)-left) {
- /*
- * The hw event starts counting from this event offset,
- * mark it to be able to extra future deltas:
- */
- local64_set(&hwc->prev_count, (u64)-left);
+ /*
+ * The hw event starts counting from this event offset,
+ * mark it to be able to extra future deltas:
+ */
+ local64_set(&hwc->prev_count, (u64)-left);

- wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);
- }
+ wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);

/*
* Due to erratum on certan cpu we need
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 18c25ab28557..f39a4df3a7bd 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1306,17 +1306,84 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
return NULL;
}

+/*
+ * Special variant of intel_pmu_save_and_restart() for auto-reload.
+ */
+static int
+intel_pmu_save_and_restart_reload(struct perf_event *event, int count)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ int shift = 64 - x86_pmu.cntval_bits;
+ u64 period = hwc->sample_period;
+ u64 prev_raw_count, new_raw_count;
+ s64 new, old;
+
+ WARN_ON(!period);
+
+ /*
+ * drain_pebs() only happens when the PMU is disabled.
+ */
+ WARN_ON(this_cpu_read(cpu_hw_events.enabled));
+
+ prev_raw_count = local64_read(&hwc->prev_count);
+ rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+ local64_set(&hwc->prev_count, new_raw_count);
+
+ /*
+ * Since the counter increments a negative counter value and
+ * overflows on the sign switch, giving the interval:
+ *
+ * [-period, 0]
+ *
+ * the difference between two consecutive reads is:
+ *
+ * A) value2 - value1;
+ * when no overflows have happened in between,
+ *
+ * B) (0 - value1) + (value2 - (-period));
+ * when one overflow happened in between,
+ *
+ * C) (0 - value1) + (n - 1) * (period) + (value2 - (-period));
+ * when @n overflows happened in between.
+ *
+ * Here A) is the obvious difference, B) is the extension to the
+ * discrete interval, where the first term is to the top of the
+ * interval and the second term is from the bottom of the next
+ * interval and C) the extension to multiple intervals, where the
+ * middle term is the whole intervals covered.
+ *
+ * An equivalent of C, by reduction, is:
+ *
+ * value2 - value1 + n * period
+ */
+ new = ((s64)(new_raw_count << shift) >> shift);
+ old = ((s64)(prev_raw_count << shift) >> shift);
+ local64_add(new - old + count * period, &event->count);
+
+ perf_event_update_userpage(event);
+
+ return 0;
+}
+
static void __intel_pmu_pebs_event(struct perf_event *event,
struct pt_regs *iregs,
void *base, void *top,
int bit, int count)
{
+ struct hw_perf_event *hwc = &event->hw;
struct perf_sample_data data;
struct pt_regs regs;
void *at = get_next_pebs_record_by_bit(base, top, bit);

- if (!intel_pmu_save_and_restart(event) &&
- !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
+ if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
+ /*
+ * Now, auto-reload is only enabled in fixed period mode.
+ * The reload value is always hwc->sample_period.
+ * May need to change it, if auto-reload is enabled in
+ * freq mode later.
+ */
+ intel_pmu_save_and_restart_reload(event, count);
+ } else if (!intel_pmu_save_and_restart(event))
return;

while (count > 1) {
@@ -1368,8 +1435,11 @@ static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
return;

n = top - at;
- if (n <= 0)
+ if (n <= 0) {
+ if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
+ intel_pmu_save_and_restart_reload(event, 0);
return;
+ }

__intel_pmu_pebs_event(event, iregs, at, top, 0, n);
}
@@ -1392,8 +1462,22 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)

ds->pebs_index = ds->pebs_buffer_base;

- if (unlikely(base >= top))
+ if (unlikely(base >= top)) {
+ /*
+ * drain_pebs() can be called twice in a short period
+ * for an auto-reload event in pmu::read(), with no
+ * overflows having happened in between.
+ * intel_pmu_save_and_restart_reload() must be called
+ * to update event->count for this case.
+ */
+ for_each_set_bit(bit, (unsigned long *)&cpuc->pebs_enabled,
+ x86_pmu.max_pebs_events) {
+ event = cpuc->events[bit];
+ if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
+ intel_pmu_save_and_restart_reload(event, 0);
+ }
return;
+ }

for (at = base; at < top; at += x86_pmu.pebs_record_size) {
struct pebs_record_nhm *p = at;

Subject: [tip:perf/core] perf/x86: Introduce a ->read() callback in 'struct x86_pmu'

Commit-ID: bcfbe5c41d630ce6b74da45134cea484248b515a
Gitweb: https://git.kernel.org/tip/bcfbe5c41d630ce6b74da45134cea484248b515a
Author: Kan Liang <[email protected]>
AuthorDate: Mon, 12 Feb 2018 14:20:32 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 9 Mar 2018 08:22:20 +0100

perf/x86: Introduce a ->read() callback in 'struct x86_pmu'

Auto-reload needs to be specially handled when reading event counts.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/events/core.c | 2 ++
arch/x86/events/perf_event.h | 1 +
2 files changed, 3 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5a3ccd1715e2..00a6251981d2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1881,6 +1881,8 @@ early_initcall(init_hw_perf_events);

static inline void x86_pmu_read(struct perf_event *event)
{
+ if (x86_pmu.read)
+ return x86_pmu.read(event);
x86_perf_event_update(event);
}

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 6495ffd57e3e..d445f0026989 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -520,6 +520,7 @@ struct x86_pmu {
void (*disable)(struct perf_event *);
void (*add)(struct perf_event *);
void (*del)(struct perf_event *);
+ void (*read)(struct perf_event *event);
int (*hw_config)(struct perf_event *event);
int (*schedule_events)(struct cpu_hw_events *cpuc, int n, int *assign);
unsigned eventsel;

Subject: [tip:perf/core] perf/x86/intel: Disable userspace RDPMC usage for large PEBS

Commit-ID: 1af22eba248efe2de25658041a80a3d40fb3e92e
Gitweb: https://git.kernel.org/tip/1af22eba248efe2de25658041a80a3d40fb3e92e
Author: Kan Liang <[email protected]>
AuthorDate: Mon, 12 Feb 2018 14:20:35 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 9 Mar 2018 08:22:23 +0100

perf/x86/intel: Disable userspace RDPMC usage for large PEBS

Userspace RDPMC cannot possibly work for large PEBS, which was introduced in:

b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")

When the PEBS interrupt threshold is larger than one, there is no way
to get the exact auto-reload times and values for userspace RDPMC.
Disable userspace RDPMC usage when large PEBS is enabled.

The only exception is when the PEBS interrupt threshold is 1, in which
case user-space RDPMC works well even with auto-reload events.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/events/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 00a6251981d2..9c86e10f1196 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2117,7 +2117,8 @@ static int x86_pmu_event_init(struct perf_event *event)
event->destroy(event);
}

- if (READ_ONCE(x86_pmu.attr_rdpmc))
+ if (READ_ONCE(x86_pmu.attr_rdpmc) &&
+ !(event->hw.flags & PERF_X86_EVENT_FREERUNNING))
event->hw.flags |= PERF_X86_EVENT_RDPMC_ALLOWED;

return err;

Subject: [tip:perf/core] perf/x86/intel/ds: Introduce ->read() function for auto-reload events and flush the PEBS buffer there

Commit-ID: 5bee2cc69d986e20808c93c46f7b6aef51edd827
Gitweb: https://git.kernel.org/tip/5bee2cc69d986e20808c93c46f7b6aef51edd827
Author: Kan Liang <[email protected]>
AuthorDate: Mon, 12 Feb 2018 14:20:33 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 9 Mar 2018 08:22:21 +0100

perf/x86/intel/ds: Introduce ->read() function for auto-reload events and flush the PEBS buffer there

There is no way to get exact auto-reload times and values which are needed
for event updates unless we flush the PEBS buffer.

Introduce intel_pmu_auto_reload_read() to drain the PEBS buffer for
auto-reload events. To prevent races with the hardware, drain_pebs()
can only be called when the PMU is disabled.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/events/intel/ds.c | 9 +++++++++
arch/x86/events/perf_event.h | 2 ++
2 files changed, 11 insertions(+)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index f39a4df3a7bd..73844025adaf 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1306,6 +1306,15 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
return NULL;
}

+void intel_pmu_auto_reload_read(struct perf_event *event)
+{
+ WARN_ON(!(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD));
+
+ perf_pmu_disable(event->pmu);
+ intel_pmu_drain_pebs_buffer();
+ perf_pmu_enable(event->pmu);
+}
+
/*
* Special variant of intel_pmu_save_and_restart() for auto-reload.
*/
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index d445f0026989..91643472f385 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -924,6 +924,8 @@ void intel_pmu_pebs_disable_all(void);

void intel_pmu_pebs_sched_task(struct perf_event_context *ctx, bool sched_in);

+void intel_pmu_auto_reload_read(struct perf_event *event);
+
void intel_ds_init(void);

void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in);

Subject: [tip:perf/core] perf/x86/intel: Fix PMU read for auto-reload

Commit-ID: ceb90d9e0248947839a0ff4bee98cf28695a6020
Gitweb: https://git.kernel.org/tip/ceb90d9e0248947839a0ff4bee98cf28695a6020
Author: Kan Liang <[email protected]>
AuthorDate: Mon, 12 Feb 2018 14:20:34 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 9 Mar 2018 08:22:22 +0100

perf/x86/intel: Fix PMU read for auto-reload

Auto-reload events need to be handled specially when reading the event count.

Auto-reload is only available for intel_pmu.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/events/intel/core.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 1ba7ca7b675d..41c68d337e84 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2060,6 +2060,14 @@ static void intel_pmu_del_event(struct perf_event *event)
intel_pmu_pebs_del(event);
}

+static void intel_pmu_read_event(struct perf_event *event)
+{
+ if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
+ intel_pmu_auto_reload_read(event);
+ else
+ x86_perf_event_update(event);
+}
+
static void intel_pmu_enable_fixed(struct hw_perf_event *hwc)
{
int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
@@ -3503,6 +3511,7 @@ static __initconst const struct x86_pmu intel_pmu = {
.disable = intel_pmu_disable_event,
.add = intel_pmu_add_event,
.del = intel_pmu_del_event,
+ .read = intel_pmu_read_event,
.hw_config = intel_pmu_hw_config,
.schedule_events = x86_schedule_events,
.eventsel = MSR_ARCH_PERFMON_EVENTSEL0,

2018-03-09 14:32:32

by Vince Weaver

Subject: Re: [tip:perf/core] perf/x86/intel: Disable userspace RDPMC usage for large PEBS

On Fri, 9 Mar 2018, tip-bot for Kan Liang wrote:

> Commit-ID: 1af22eba248efe2de25658041a80a3d40fb3e92e
> Gitweb: https://git.kernel.org/tip/1af22eba248efe2de25658041a80a3d40fb3e92e
> Author: Kan Liang <[email protected]>
> AuthorDate: Mon, 12 Feb 2018 14:20:35 -0800
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Fri, 9 Mar 2018 08:22:23 +0100
>
> perf/x86/intel: Disable userspace RDPMC usage for large PEBS
>


So this whole commit log is about disabling RDPMC usage for "large PEBS"
but the actual change disables RDPMC if "PERF_X86_EVENT_FREERUNNING"

Either the commit log is really misleading, or else a poor name was chosen
for this feature.

Vince



> Userspace RDPMC cannot possibly work for large PEBS, which was introduced in:
>
> b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
>
> When the PEBS interrupt threshold is larger than one, there is no way
> to get exact auto-reload times and value for userspace RDPMC. Disable
> the userspace RDPMC usage when large PEBS is enabled.
>
> The only exception is when the PEBS interrupt threshold is 1, in which
> case user-space RDPMC works well even with auto-reload events.
>
> Signed-off-by: Kan Liang <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Cc: Alexander Shishkin <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Stephane Eranian <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Vince Weaver <[email protected]>
> Cc: [email protected]
> Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> arch/x86/events/core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 00a6251981d2..9c86e10f1196 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2117,7 +2117,8 @@ static int x86_pmu_event_init(struct perf_event *event)
> event->destroy(event);
> }
>
> - if (READ_ONCE(x86_pmu.attr_rdpmc))
> + if (READ_ONCE(x86_pmu.attr_rdpmc) &&
> + !(event->hw.flags & PERF_X86_EVENT_FREERUNNING))
> event->hw.flags |= PERF_X86_EVENT_RDPMC_ALLOWED;
>
> return err;
>


2018-03-09 17:43:58

by Peter Zijlstra

Subject: Re: [tip:perf/core] perf/x86/intel: Disable userspace RDPMC usage for large PEBS

On Fri, Mar 09, 2018 at 09:31:11AM -0500, Vince Weaver wrote:
> On Fri, 9 Mar 2018, tip-bot for Kan Liang wrote:
>
> > Commit-ID: 1af22eba248efe2de25658041a80a3d40fb3e92e
> > Gitweb: https://git.kernel.org/tip/1af22eba248efe2de25658041a80a3d40fb3e92e
> > Author: Kan Liang <[email protected]>
> > AuthorDate: Mon, 12 Feb 2018 14:20:35 -0800
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Fri, 9 Mar 2018 08:22:23 +0100
> >
> > perf/x86/intel: Disable userspace RDPMC usage for large PEBS
> >
>
>
> So this whole commit log is about disabling RDPMC usage for "large PEBS"
> but the actual change disables RDPMC if "PERF_X86_EVENT_FREERUNNING"
>
> Either the commit log is really misleading, or else a poor name was chosen
> for this feature.

It's the same thing, and yes, that might want renaming I suppose.

2018-03-09 18:55:15

by Liang, Kan

Subject: Re: [tip:perf/core] perf/x86/intel: Disable userspace RDPMC usage for large PEBS



On 3/9/2018 12:42 PM, Peter Zijlstra wrote:
> On Fri, Mar 09, 2018 at 09:31:11AM -0500, Vince Weaver wrote:
>> On Fri, 9 Mar 2018, tip-bot for Kan Liang wrote:
>>
>>> Commit-ID: 1af22eba248efe2de25658041a80a3d40fb3e92e
>>> Gitweb: https://git.kernel.org/tip/1af22eba248efe2de25658041a80a3d40fb3e92e
>>> Author: Kan Liang <[email protected]>
>>> AuthorDate: Mon, 12 Feb 2018 14:20:35 -0800
>>> Committer: Ingo Molnar <[email protected]>
>>> CommitDate: Fri, 9 Mar 2018 08:22:23 +0100
>>>
>>> perf/x86/intel: Disable userspace RDPMC usage for large PEBS
>>>
>>
>>
>> So this whole commit log is about disabling RDPMC usage for "large PEBS"
>> but the actual change disables RDPMC if "PERF_X86_EVENT_FREERUNNING"
>>
>> Either the commit log is really misleading, or else a poor name was chosen
>> for this feature.
>
> Its the same thing, and yes that might want renaming I suppose.
>

Yes, I will send a patch to rename "FREERUNNING" to "LARGE_PEBS" and
fix the confusion.

Thanks,
Kan


2018-03-09 19:11:48

by Vince Weaver

Subject: Re: [tip:perf/core] perf/x86/intel: Disable userspace RDPMC usage for large PEBS

On Fri, 9 Mar 2018, Peter Zijlstra wrote:

> On Fri, Mar 09, 2018 at 09:31:11AM -0500, Vince Weaver wrote:
> > On Fri, 9 Mar 2018, tip-bot for Kan Liang wrote:
> >
> > > Commit-ID: 1af22eba248efe2de25658041a80a3d40fb3e92e
> > > Gitweb: https://git.kernel.org/tip/1af22eba248efe2de25658041a80a3d40fb3e92e
> > > Author: Kan Liang <[email protected]>
> > > AuthorDate: Mon, 12 Feb 2018 14:20:35 -0800
> > > Committer: Ingo Molnar <[email protected]>
> > > CommitDate: Fri, 9 Mar 2018 08:22:23 +0100
> > >
> > > perf/x86/intel: Disable userspace RDPMC usage for large PEBS
> > >
> >
> >
> > So this whole commit log is about disabling RDPMC usage for "large PEBS"
> > but the actual change disables RDPMC if "PERF_X86_EVENT_FREERUNNING"
> >
> > Either the commit log is really misleading, or else a poor name was chosen
> > for this feature.
>
> Its the same thing, and yes that might want renaming I suppose.

I apologize for noticing these things so late in the game, but I haven't
had time to keep up with a full lkml feed recently so I only see these
things once I'm CC'd on them.

So to summarize this: rdpmc is only disabled on a per-event basis, and
only if that event is doing multi-pebs sampling?

If that's true, then I don't think I have an issue with this.

We finally got rdpmc support in a released PAPI, and it is a massive
improvement when self-monitoring (even more so if KPTI is enabled), so I was
just trying to make sure this wouldn't suddenly disable rdpmc out from
under us.

Vince

2018-03-12 14:09:22

by Liang, Kan

Subject: Re: [tip:perf/core] perf/x86/intel: Disable userspace RDPMC usage for large PEBS



On 3/9/2018 2:10 PM, Vince Weaver wrote:
> On Fri, 9 Mar 2018, Peter Zijlstra wrote:
>
>> On Fri, Mar 09, 2018 at 09:31:11AM -0500, Vince Weaver wrote:
>>> On Fri, 9 Mar 2018, tip-bot for Kan Liang wrote:
>>>
>>>> Commit-ID: 1af22eba248efe2de25658041a80a3d40fb3e92e
>>>> Gitweb: https://git.kernel.org/tip/1af22eba248efe2de25658041a80a3d40fb3e92e
>>>> Author: Kan Liang <[email protected]>
>>>> AuthorDate: Mon, 12 Feb 2018 14:20:35 -0800
>>>> Committer: Ingo Molnar <[email protected]>
>>>> CommitDate: Fri, 9 Mar 2018 08:22:23 +0100
>>>>
>>>> perf/x86/intel: Disable userspace RDPMC usage for large PEBS
>>>>
>>>
>>>
>>> So this whole commit log is about disabling RDPMC usage for "large PEBS"
>>> but the actual change disables RDPMC if "PERF_X86_EVENT_FREERUNNING"
>>>
>>> Either the commit log is really misleading, or else a poor name was chosen
>>> for this feature.
>>
>> Its the same thing, and yes that might want renaming I suppose.
>
> I apologize for noticing these things so late in the game, but I haven't
> had time to keep up with a full lkml feed recently so I only see these
> things once I'm CC'd on them.
>
> So to summarize this: rdpmc is only disabled on a per-event basis, and
> only if that event is doing multi-pebs sampling?
>

If an event can do multi-PEBS sampling, RDPMC will be disabled for it.
Other events, which cannot do multi-PEBS, are not impacted.

To enable multi-PEBS sampling for an event, a fixed period is required.
Call graphs are not supported, and on older platforms (before Skylake)
timestamps are not supported either.

Thanks,
Kan

> If that's true, then I don't think I have an issue with this.
>
> We finally got rdpmc support in a released PAPI, and it is a massive
> improvement when self-monitoring (even moreso if KPTI is enabled) so I was
> just trying to make sure this wouldn't suddenly disable rdpmc out from
> under us.
>
> Vince
>

Subject: [tip:perf/urgent] perf/x86/intel: Disable userspace RDPMC usage for large PEBS

Commit-ID: 2c2a9bbe7fecb2ad4981b6f4a56cacbfb849f848
Gitweb: https://git.kernel.org/tip/2c2a9bbe7fecb2ad4981b6f4a56cacbfb849f848
Author: Kan Liang <[email protected]>
AuthorDate: Mon, 12 Feb 2018 14:20:35 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 20 Mar 2018 08:52:58 +0100

perf/x86/intel: Disable userspace RDPMC usage for large PEBS

Userspace RDPMC cannot possibly work for large PEBS, which was introduced in:

b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")

When the PEBS interrupt threshold is larger than one, there is no way
to get exact auto-reload times and value for userspace RDPMC. Disable
the userspace RDPMC usage when large PEBS is enabled.

The only exception is when the PEBS interrupt threshold is 1, in which
case user-space RDPMC works well even with auto-reload events.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
(cherry picked from commit 1af22eba248efe2de25658041a80a3d40fb3e92e)
---
arch/x86/events/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 140d33288e78..3d24edfef3e4 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2118,7 +2118,8 @@ static int x86_pmu_event_init(struct perf_event *event)
event->destroy(event);
}

- if (READ_ONCE(x86_pmu.attr_rdpmc))
+ if (READ_ONCE(x86_pmu.attr_rdpmc) &&
+ !(event->hw.flags & PERF_X86_EVENT_FREERUNNING))
event->hw.flags |= PERF_X86_EVENT_RDPMC_ALLOWED;

return err;