2012-06-06 16:17:33

by Vince Weaver

Subject: [tip:perf/core] perf/x86: Use rdpmc() rather than rdmsr() when possible in the kernel

Commit-ID: c48b60538c3ba05a7a2713c4791b25405525431b
Gitweb: http://git.kernel.org/tip/c48b60538c3ba05a7a2713c4791b25405525431b
Author: Vince Weaver <[email protected]>
AuthorDate: Thu, 1 Mar 2012 17:28:14 -0500
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 6 Jun 2012 17:23:35 +0200

perf/x86: Use rdpmc() rather than rdmsr() when possible in the kernel

The rdpmc instruction is faster than the equivalent rdmsr call,
so use it when possible in the kernel.

The perfctr kernel patches did this, after extensive testing showed
rdpmc to always be faster (see etc/costs in the perfctr-2.6
package for a historical list of the overhead).

I have done some tests on a 3.2 kernel, the kernel module I used
was included in the first posting of this patch:

               rdmsr          rdpmc
Core2 T9900:   203.9 cycles    30.9 cycles
AMD fam0fh:     56.2 cycles     9.8 cycles
Atom 6/28/2:   129.7 cycles    50.6 cycles

The speedup of using rdpmc is large.
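
The per-read costs above were measured with a small kernel module. As a rough userspace analogue of the methodology (an assumption, not the original module: rdpmc itself faults in user mode unless CR4.PCE is set, so this sketch times rdtsc instead), one can average many back-to-back reads:

```c
#include <stdint.h>

/* Userspace sketch of the timing methodology (hypothetical; rdpmc
 * traps in user mode unless CR4.PCE is enabled, so rdtsc stands in
 * for the counter read here). */
static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

/* Average the cost of 'iter' back-to-back reads, in cycles. */
static uint64_t avg_read_cost(unsigned iter)
{
	uint64_t start = rdtsc();
	for (unsigned i = 0; i < iter; i++)
		__asm__ __volatile__("rdtsc" ::: "eax", "edx");
	return (rdtsc() - start) / iter;
}
```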

[ It's probably possible (and desirable) to do this without
requiring a new field in the hw_perf_event structure, but
the fixed events make this tricky. ]

Signed-off-by: Vince Weaver <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 4 +++-
include/linux/perf_event.h | 1 +
2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 43c2017..000a474 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -75,7 +75,7 @@ u64 x86_perf_event_update(struct perf_event *event)
*/
again:
prev_raw_count = local64_read(&hwc->prev_count);
- rdmsrl(hwc->event_base, new_raw_count);
+ rdpmcl(hwc->event_base_rdpmc, new_raw_count);

if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
new_raw_count) != prev_raw_count)
@@ -819,9 +819,11 @@ static inline void x86_assign_hw_event(struct perf_event *event,
} else if (hwc->idx >= X86_PMC_IDX_FIXED) {
hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 + (hwc->idx - X86_PMC_IDX_FIXED);
+ hwc->event_base_rdpmc = (hwc->idx - X86_PMC_IDX_FIXED) | 1<<30;
} else {
hwc->config_base = x86_pmu_config_addr(hwc->idx);
hwc->event_base = x86_pmu_event_addr(hwc->idx);
+ hwc->event_base_rdpmc = x86_pmu_addr_offset(hwc->idx);
}
}

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 45db49f..1ce887a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -677,6 +677,7 @@ struct hw_perf_event {
u64 last_tag;
unsigned long config_base;
unsigned long event_base;
+ int event_base_rdpmc;
int idx;
int last_cpu;
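
The new event_base_rdpmc field caches the counter index loaded into ECX for rdpmc: general-purpose counters use their plain index, while fixed-function counters set bit 30 (per the Intel SDM). A standalone sketch of that mapping, with hypothetical names mirroring x86_assign_hw_event():

```c
#define X86_PMC_IDX_FIXED	32		/* first fixed-counter slot in perf */
#define RDPMC_FIXED_BASE	(1 << 30)	/* ECX bit 30 = fixed-counter bank */

/* Hypothetical standalone mirror of the index computation above:
 * fixed counters are renumbered from 0 and tagged with bit 30,
 * general-purpose counters pass through unchanged. */
static int rdpmc_index(int idx)
{
	if (idx >= X86_PMC_IDX_FIXED)
		return (idx - X86_PMC_IDX_FIXED) | RDPMC_FIXED_BASE;
	return idx;
}
```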


Subject: [PATCH] perf, amd: Fix rdpmc index calculation for AMD family 15h

On 06.06.12 09:17:02, tip-bot for Vince Weaver wrote:
> Commit-ID: c48b60538c3ba05a7a2713c4791b25405525431b
> Gitweb: http://git.kernel.org/tip/c48b60538c3ba05a7a2713c4791b25405525431b
> Author: Vince Weaver <[email protected]>
> AuthorDate: Thu, 1 Mar 2012 17:28:14 -0500
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Wed, 6 Jun 2012 17:23:35 +0200
> [...]

This crashes on AMD family 15h. Fix below.

-Robert


From cf346c12eb0878046d8326a003d4b86aafc8f923 Mon Sep 17 00:00:00 2001
From: Robert Richter <[email protected]>
Date: Fri, 15 Jun 2012 19:06:44 +0200
Subject: [PATCH] perf, amd: Fix rdpmc index calculation for AMD family 15h

The rdpmc index calculation is wrong for AMD family 15h
(X86_FEATURE_PERFCTR_CORE set). This leads to a #GP fault when
accessing the counter. While the MSR address offset is (index << 1),
we must use the plain index to select the correct counter via rdpmc.

Pid: 2237, comm: syslog-ng Not tainted 3.5.0-rc1-perf-x86_64-standard-g130ff90 #135 AMD Pike/Pike
RIP: 0010:[<ffffffff8100dc33>] [<ffffffff8100dc33>] x86_perf_event_update+0x27/0x66
RSP: 0000:ffff88042ec07d90 EFLAGS: 00010083
RAX: 0000000000000005 RBX: 0000000000000005 RCX: 000000000000000a
RDX: 0000000000000000 RSI: ffff8803f4a68160 RDI: ffff8803f4a68000
RBP: ffff88042ec07d98 R08: ffffffffffff6c21 R09: 0000000000000030
R10: ffff88041d33f000 R11: 000000000000001f R12: 0000000000000004
R13: ffff88042ec0b790 R14: ffff88042ec0b998 R15: ffff8803f4a68000
FS: 00007f109a313700(0000) GS:ffff88042ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 00000003f485f000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process syslog-ng (pid: 2237, threadinfo ffff8803f4972000, task ffff8803f4ed9850)
Stack:
0000000000000005 ffff88042ec07e48 ffffffff8100f5b9 ffff88042ec07ef8
ffff88042ec0b990 00000000000001c7 00007f1099445f4d 000008bd000008bd
0000004835d3bc7a 0000000000000000 0000000000000051 000000000000002a
Call Trace:
<NMI>
[<ffffffff8100f5b9>] x86_pmu_handle_irq+0xc6/0x16a
[<ffffffff8145db68>] perf_event_nmi_handler+0x19/0x1b
[<ffffffff8145d4b2>] nmi_handle+0x4d/0x71
[<ffffffff8145d57e>] do_nmi+0xa8/0x2b9
[<ffffffff8145cce2>] end_repeat_nmi+0x1a/0x1e
<<EOE>>
Code: c9 c3 90 90 31 d2 83 bf 1c 01 00 00 30 55 44 8b 0d 83 c4 6f 00 48 89 e5 53 74 49 48 8d b7 60 01 00 00 4c 8b 06 8b 8f 18 01 00 00 <0f> 33 48 c1 e2 20 89 c0 48 09 c2 4c 89 c0 48 0f b1 16 4c 39 c0
RIP [<ffffffff8100dc33>] x86_perf_event_update+0x27/0x66
RSP <ffff88042ec07d90>
---[ end trace 69da2f3afd06ff37 ]---
Kernel panic - not syncing: Fatal exception in interrupt

Cc: Vince Weaver <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 000a474..e7540c8 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -823,7 +823,7 @@ static inline void x86_assign_hw_event(struct perf_event *event,
} else {
hwc->config_base = x86_pmu_config_addr(hwc->idx);
hwc->event_base = x86_pmu_event_addr(hwc->idx);
- hwc->event_base_rdpmc = x86_pmu_addr_offset(hwc->idx);
+ hwc->event_base_rdpmc = hwc->idx;
}
}

--
1.7.8.4
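
The distinction the fix relies on can be sketched standalone (hypothetical helpers, assuming family 15h's interleaved PERF_CTL/PERF_CTR MSR layout): the MSR address offset doubles the index, but rdpmc takes the plain index.

```c
/* Hypothetical sketch of the family-15h distinction: the counter MSRs
 * are interleaved with the event-select MSRs, so the MSR address
 * offset is twice the counter index... */
static int fam15h_msr_offset(int idx)
{
	return idx << 1;	/* what x86_pmu_addr_offset() yields */
}

/* ...but the rdpmc instruction addresses counters by plain index. */
static int fam15h_rdpmc_index(int idx)
{
	return idx;
}
```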



--
Advanced Micro Devices, Inc.
Operating System Research Center