Commit:
1aed267 perf kvm: Do guest-only counting by default
introduced a bug on AMD systems whereby simple commands:
$ perf stat ls
Performance counter stats for 'ls':
0 cycles # 0.000 GHz
0.003704596 seconds time elapsed
would not count anything anymore. Same results for perf record.
I tracked it down to guest mode exclusion being enabled
by default leading to attr->exclude_guest = 1. When
not operating under any sort of virtualization, this
causes the PMU not to count anything.
The fix disables guest exclusion by default.
Signed-off-by: Stephane Eranian <[email protected]>
---
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 8109a90..c1017b3 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -6,7 +6,7 @@
* XXX We need to find a better place for these things...
*/
bool perf_host = true;
-bool perf_guest = false;
+bool perf_guest = true;
void event_attr_init(struct perf_event_attr *attr)
{
On Mon, Feb 27, 2012 at 06:33:51PM +0100, Stephane Eranian wrote:
> Commit:
>
> 1aed267 perf kvm: Do guest-only counting by default
>
> introduced a bug on AMD systems whereby simple commands:
>
> $ perf stat ls
> Performance counter stats for 'ls':
> 0 cycles # 0.000 GHz
> 0.003704596 seconds time elapsed
>
> would not count anything anymore. Same results for perf record.
>
> I tracked it down to guest mode exclusion being enabled
> by default leading to attr->exclude_guest = 1. When
> not operating under any sort of virtualization, this
> causes the PMU not to count anything.
CCing Joerg. I think he fixed this or a similar problem recently.

>
> The fix disables guest exclusion by default.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
>
> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> index 8109a90..c1017b3 100644
> --- a/tools/perf/util/util.c
> +++ b/tools/perf/util/util.c
> @@ -6,7 +6,7 @@
> * XXX We need to find a better place for these things...
> */
> bool perf_host = true;
> -bool perf_guest = false;
> +bool perf_guest = true;
>
> void event_attr_init(struct perf_event_attr *attr)
> {
--
Gleb.
On 2/27/12 10:33 AM, Stephane Eranian wrote:
> Commit:
>
> 1aed267 perf kvm: Do guest-only counting by default
>
> introduced a bug on AMD systems whereby simple commands:
>
> $ perf stat ls
> Performance counter stats for 'ls':
> 0 cycles # 0.000 GHz
> 0.003704596 seconds time elapsed
>
> would not count anything anymore. Same results for perf record.
>
> I tracked it down to guest mode exclusion being enabled
> by default leading to attr->exclude_guest = 1. When
> not operating under any sort of virtualization, this
> causes the PMU not to count anything.
>
> The fix disables guest exclusion by default.
>
> Signed-off-by: Stephane Eranian<[email protected]>
> ---
>
> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> index 8109a90..c1017b3 100644
> --- a/tools/perf/util/util.c
> +++ b/tools/perf/util/util.c
> @@ -6,7 +6,7 @@
> * XXX We need to find a better place for these things...
> */
> bool perf_host = true;
> -bool perf_guest = false;
> +bool perf_guest = true;
This was recently reverted to false by
c4a7dca92bbb9881a5d678720f1d0c2153499749
See: https://lkml.org/lkml/2012/2/8/234
David
>
> void event_attr_init(struct perf_event_attr *attr)
> {
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On Mon, Feb 27, 2012 at 6:47 PM, David Ahern <[email protected]> wrote:
> On 2/27/12 10:33 AM, Stephane Eranian wrote:
>>
>> Commit:
>>
>> 1aed267 perf kvm: Do guest-only counting by default
>>
>> introduced a bug on AMD systems whereby simple commands:
>>
>> $ perf stat ls
>> Performance counter stats for 'ls':
>> 0 cycles # 0.000 GHz
>> 0.003704596 seconds time elapsed
>>
>> would not count anything anymore. Same results for perf record.
>>
>> I tracked it down to guest mode exclusion being enabled
>> by default leading to attr->exclude_guest = 1. When
>> not operating under any sort of virtualization, this
>> causes the PMU not to count anything.
>>
>> The fix disables guest exclusion by default.
>>
>> Signed-off-by: Stephane Eranian<[email protected]>
>> ---
>>
>> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
>> index 8109a90..c1017b3 100644
>> --- a/tools/perf/util/util.c
>> +++ b/tools/perf/util/util.c
>> @@ -6,7 +6,7 @@
>> * XXX We need to find a better place for these things...
>> */
>> bool perf_host = true;
>> -bool perf_guest = false;
>> +bool perf_guest = true;
>
>
> This was recently reverted to false by
> c4a7dca92bbb9881a5d678720f1d0c2153499749
>
> See: https://lkml.org/lkml/2012/2/8/234
>
Yeah, but that causes simple commands such as "perf stat -e cycles ls"
to return 0 count.
So either you get a segfault or you get zero count. There is something
else going on here...
> David
>
>>
>> void event_attr_init(struct perf_event_attr *attr)
>> {
>
>
On 2/27/12 10:52 AM, Stephane Eranian wrote:
> On Mon, Feb 27, 2012 at 6:47 PM, David Ahern<[email protected]> wrote:
>> On 2/27/12 10:33 AM, Stephane Eranian wrote:
>>>
>>> Commit:
>>>
>>> 1aed267 perf kvm: Do guest-only counting by default
>>>
>>> introduced a bug on AMD systems whereby simple commands:
>>>
>>> $ perf stat ls
>>> Performance counter stats for 'ls':
>>> 0 cycles # 0.000 GHz
>>> 0.003704596 seconds time elapsed
>>>
>>> would not count anything anymore. Same results for perf record.
>>>
>>> I tracked it down to guest mode exclusion being enabled
>>> by default leading to attr->exclude_guest = 1. When
>>> not operating under any sort of virtualization, this
>>> causes the PMU not to count anything.
>>>
>>> The fix disables guest exclusion by default.
>>>
>>> Signed-off-by: Stephane Eranian<[email protected]>
>>> ---
>>>
>>> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
>>> index 8109a90..c1017b3 100644
>>> --- a/tools/perf/util/util.c
>>> +++ b/tools/perf/util/util.c
>>> @@ -6,7 +6,7 @@
>>> * XXX We need to find a better place for these things...
>>> */
>>> bool perf_host = true;
>>> -bool perf_guest = false;
>>> +bool perf_guest = true;
>>
>>
>> This was recently reverted to false by
>> c4a7dca92bbb9881a5d678720f1d0c2153499749
>>
>> See: https://lkml.org/lkml/2012/2/8/234
>>
> Yeah, but that causes simple commands such as "perf stat -e cycles ls"
> to return 0 count.
>
> So either you get a segfault or you get zero count. There is something
> else going on here...
Agreed. Did you try reverting the exclude_guest default?
On Mon, Feb 27, 2012 at 06:52:25PM +0100, Stephane Eranian wrote:
> Yeah, but that causes simple commands such as "perf stat -e cycles ls"
> to return 0 count.
>
> So either you get a segfault or you get zero count. There is something
> else going on here...
Hmm, looks like the counters will not do any counting when EFER.SVME==0
and HO==1 or GO==1 is set in the counter control registers :(
A solution would be to mask out these bits in the perf-ctrl register
when EFER.SVME=0
I'll try to come up with a fix soon.
Joerg
It turned out that a performance counter on AMD does not
count at all when the GO or HO bit is set in the control
register and SVM is disabled in EFER.
This patch works around this issue by masking out the HO bit
in the performance counter control register when SVM is not
enabled.
The GO bit is not touched because it is only set when the
user wants to count in guest-mode only. So when SVM is
disabled the counter should not run at all, and not counting
is the intended behaviour.
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Avi Kivity <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Gleb Natapov <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: [email protected] # 3.2
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/perf_event.h | 5 +++++
arch/x86/kernel/cpu/perf_event.c | 30 ++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/perf_event.h | 6 +++++-
arch/x86/kernel/cpu/perf_event_amd.c | 6 ++++--
arch/x86/kvm/svm.c | 5 +++++
5 files changed, 49 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 096c975e..e0a4ad4 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -227,6 +227,8 @@ struct perf_guest_switch_msr {
extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
+extern void x86_pmu_enable_virt(void);
+extern void x86_pmu_disable_virt(void);
#else
static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
{
@@ -240,6 +242,9 @@ static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
}
static inline void perf_events_lapic_init(void) { }
+
+static inline void x86_pmu_enable_virt(void) { }
+static inline void x86_pmu_disable_virt(void) { }
#endif
#endif /* _ASM_X86_PERF_EVENT_H */
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 5adce10..f1ba9bf 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -477,6 +477,36 @@ void x86_pmu_enable_all(int added)
}
}
+void x86_pmu_enable_virt(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+ cpuc->perf_ctr_virt_mask = 0;
+
+ /* Reload all events */
+ x86_pmu_disable_all();
+ x86_pmu_enable_all(0);
+}
+EXPORT_SYMBOL_GPL(x86_pmu_enable_virt);
+
+void x86_pmu_disable_virt(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+ /*
+ * We only mask out the Host-only bit so that host-only counting works
+ * when SVM is disabled. If someone sets up a guest-only counter when
+ * SVM is disabled the Guest-only bit still gets set and the counter
+ * will not count anything.
+ */
+ cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;
+
+ /* Reload all events */
+ x86_pmu_disable_all();
+ x86_pmu_enable_all(0);
+}
+EXPORT_SYMBOL_GPL(x86_pmu_disable_virt);
+
static struct pmu pmu;
static inline int is_x86_event(struct perf_event *event)
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 8944062..2c581b9 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -148,6 +148,8 @@ struct cpu_hw_events {
* AMD specific bits
*/
struct amd_nb *amd_nb;
+ /* Inverted mask of bits to clear in the perf_ctr ctrl registers */
+ u64 perf_ctr_virt_mask;
void *kfree_on_online;
};
@@ -417,9 +419,11 @@ void x86_pmu_disable_all(void);
static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
u64 enable_mask)
{
+ u64 disable_mask = __this_cpu_read(cpu_hw_events.perf_ctr_virt_mask);
+
if (hwc->extra_reg.reg)
wrmsrl(hwc->extra_reg.reg, hwc->extra_reg.config);
- wrmsrl(hwc->config_base, hwc->config | enable_mask);
+ wrmsrl(hwc->config_base, (hwc->config | enable_mask) & ~disable_mask);
}
void x86_pmu_enable_all(int added);
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 0397b23..0487b12 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -357,7 +357,9 @@ static void amd_pmu_cpu_starting(int cpu)
struct amd_nb *nb;
int i, nb_id;
- if (boot_cpu_data.x86_max_cores < 2)
+ cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;
+
+ if (boot_cpu_data.x86_max_cores < 2 || boot_cpu_data.x86 == 0x15)
return;
nb_id = amd_get_nb_id(cpu);
@@ -587,9 +589,9 @@ static __initconst const struct x86_pmu amd_pmu_f15h = {
.put_event_constraints = amd_put_event_constraints,
.cpu_prepare = amd_pmu_cpu_prepare,
- .cpu_starting = amd_pmu_cpu_starting,
.cpu_dead = amd_pmu_cpu_dead,
#endif
+ .cpu_starting = amd_pmu_cpu_starting,
};
__init int amd_pmu_init(void)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5fa553b..773fee2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -29,6 +29,7 @@
#include <linux/ftrace_event.h>
#include <linux/slab.h>
+#include <asm/perf_event.h>
#include <asm/tlbflush.h>
#include <asm/desc.h>
#include <asm/kvm_para.h>
@@ -575,6 +576,8 @@ static void svm_hardware_disable(void *garbage)
wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
cpu_svm_disable();
+
+ x86_pmu_disable_virt();
}
static int svm_hardware_enable(void *garbage)
@@ -622,6 +625,8 @@ static int svm_hardware_enable(void *garbage)
svm_init_erratum_383();
+ x86_pmu_enable_virt();
+
return 0;
}
--
1.7.5.4
On 02/28/2012 05:55 PM, Joerg Roedel wrote:
>
> __init int amd_pmu_init(void)
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 5fa553b..773fee2 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -29,6 +29,7 @@
> #include <linux/ftrace_event.h>
> #include <linux/slab.h>
>
> +#include <asm/perf_event.h>
> #include <asm/tlbflush.h>
> #include <asm/desc.h>
> #include <asm/kvm_para.h>
> @@ -575,6 +576,8 @@ static void svm_hardware_disable(void *garbage)
> wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
>
> cpu_svm_disable();
> +
> + x86_pmu_disable_virt();
> }
>
> static int svm_hardware_enable(void *garbage)
> @@ -622,6 +625,8 @@ static int svm_hardware_enable(void *garbage)
>
> svm_init_erratum_383();
>
> + x86_pmu_enable_virt();
> +
> return 0;
> }
>
These should go into x86.c. If the functions later gain meaning on
Intel, we want them to be called (and nothing in the name suggests
they're AMD specific).
--
error compiling committee.c: too many arguments to function
On 2/28/12 10:24 AM, Avi Kivity wrote:
> On 02/28/2012 05:55 PM, Joerg Roedel wrote:
>>
>> __init int amd_pmu_init(void)
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 5fa553b..773fee2 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -29,6 +29,7 @@
>> #include<linux/ftrace_event.h>
>> #include<linux/slab.h>
>>
>> +#include<asm/perf_event.h>
>> #include<asm/tlbflush.h>
>> #include<asm/desc.h>
>> #include<asm/kvm_para.h>
>> @@ -575,6 +576,8 @@ static void svm_hardware_disable(void *garbage)
>> wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
>>
>> cpu_svm_disable();
>> +
>> + x86_pmu_disable_virt();
>> }
>>
>> static int svm_hardware_enable(void *garbage)
>> @@ -622,6 +625,8 @@ static int svm_hardware_enable(void *garbage)
>>
>> svm_init_erratum_383();
>>
>> + x86_pmu_enable_virt();
>> +
>> return 0;
>> }
>>
>
> These should go into x86.c. If the functions later gain meaning on
> Intel, we want them to be called (and nothing in the name suggests
> they're AMD specific).
>
I was going to suggest the reverse: since this patch addresses an AMD bug, why
not push those functions into perf_event_amd.c and make them dependent
on CONFIG_CPU_SUP_AMD as well.
David
On 02/28/2012 07:36 PM, David Ahern wrote:
> On 2/28/12 10:24 AM, Avi Kivity wrote:
>> On 02/28/2012 05:55 PM, Joerg Roedel wrote:
>>>
>>> __init int amd_pmu_init(void)
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index 5fa553b..773fee2 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -29,6 +29,7 @@
>>> #include<linux/ftrace_event.h>
>>> #include<linux/slab.h>
>>>
>>> +#include<asm/perf_event.h>
>>> #include<asm/tlbflush.h>
>>> #include<asm/desc.h>
>>> #include<asm/kvm_para.h>
>>> @@ -575,6 +576,8 @@ static void svm_hardware_disable(void *garbage)
>>> wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
>>>
>>> cpu_svm_disable();
>>> +
>>> + x86_pmu_disable_virt();
>>> }
>>>
>>> static int svm_hardware_enable(void *garbage)
>>> @@ -622,6 +625,8 @@ static int svm_hardware_enable(void *garbage)
>>>
>>> svm_init_erratum_383();
>>>
>>> + x86_pmu_enable_virt();
>>> +
>>> return 0;
>>> }
>>>
>>
>> These should go into x86.c. If the functions later gain meaning on
>> Intel, we want them to be called (and nothing in the name suggests
>> they're AMD specific).
>>
>
> I was going to suggest the reverse: since this patch addresses an AMD bug,
> why not push those functions into perf_event_amd.c and make them
> dependent on CONFIG_CPU_SUP_AMD as well.
It depends on which direction you expect the code to grow. These hooks
seem reasonable, so I think they should be generic.
--
error compiling committee.c: too many arguments to function
On Tue, Feb 28, 2012 at 07:38:34PM +0200, Avi Kivity wrote:
> On 02/28/2012 07:36 PM, David Ahern wrote:
> > I was going to suggest the reverse: since this patch addresses an AMD bug,
> > why not push those functions into perf_event_amd.c and make them
> > dependent on CONFIG_CPU_SUP_AMD as well.
>
> It depends on which direction you expect the code to grow. These hooks
> seem reasonable, so I think they should be generic.
Okay, I am fine with both directions. But since this patch is a fix in
the first place, I'll make it more AMD-specific for now. If we need this
as a generic feature it can easily be changed back.
Joerg
--
AMD Operating System Research Center
Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
It turned out that a performance counter on AMD does not
count at all when the GO or HO bit is set in the control
register and SVM is disabled in EFER.
This patch works around this issue by masking out the HO bit
in the performance counter control register when SVM is not
enabled.
The GO bit is not touched because it is only set when the
user wants to count in guest-mode only. So when SVM is
disabled the counter should not run at all, and not counting
is the intended behaviour.
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Avi Kivity <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Gleb Natapov <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: [email protected] # 3.2
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/perf_event.h | 8 +++++++
arch/x86/kernel/cpu/perf_event.h | 6 ++++-
arch/x86/kernel/cpu/perf_event_amd.c | 37 ++++++++++++++++++++++++++++++++-
arch/x86/kvm/svm.c | 5 ++++
4 files changed, 53 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 096c975e..ffad310 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -242,4 +242,12 @@ static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
static inline void perf_events_lapic_init(void) { }
#endif
+#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
+extern void amd_pmu_enable_virt(void);
+extern void amd_pmu_disable_virt(void);
+#else /* CONFIG_PERF_EVENTS && CONFIG_CPU_SUP_AMD */
+static inline void amd_pmu_enable_virt(void) { }
+static inline void amd_pmu_disable_virt(void) { }
+#endif /* CONFIG_PERF_EVENTS && CONFIG_CPU_SUP_AMD */
+
#endif /* _ASM_X86_PERF_EVENT_H */
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 8944062..2c581b9 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -148,6 +148,8 @@ struct cpu_hw_events {
* AMD specific bits
*/
struct amd_nb *amd_nb;
+ /* Inverted mask of bits to clear in the perf_ctr ctrl registers */
+ u64 perf_ctr_virt_mask;
void *kfree_on_online;
};
@@ -417,9 +419,11 @@ void x86_pmu_disable_all(void);
static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
u64 enable_mask)
{
+ u64 disable_mask = __this_cpu_read(cpu_hw_events.perf_ctr_virt_mask);
+
if (hwc->extra_reg.reg)
wrmsrl(hwc->extra_reg.reg, hwc->extra_reg.config);
- wrmsrl(hwc->config_base, hwc->config | enable_mask);
+ wrmsrl(hwc->config_base, (hwc->config | enable_mask) & ~disable_mask);
}
void x86_pmu_enable_all(int added);
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 0397b23..67250a5 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -1,4 +1,5 @@
#include <linux/perf_event.h>
+#include <linux/export.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/slab.h>
@@ -357,7 +358,9 @@ static void amd_pmu_cpu_starting(int cpu)
struct amd_nb *nb;
int i, nb_id;
- if (boot_cpu_data.x86_max_cores < 2)
+ cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;
+
+ if (boot_cpu_data.x86_max_cores < 2 || boot_cpu_data.x86 == 0x15)
return;
nb_id = amd_get_nb_id(cpu);
@@ -587,9 +590,9 @@ static __initconst const struct x86_pmu amd_pmu_f15h = {
.put_event_constraints = amd_put_event_constraints,
.cpu_prepare = amd_pmu_cpu_prepare,
- .cpu_starting = amd_pmu_cpu_starting,
.cpu_dead = amd_pmu_cpu_dead,
#endif
+ .cpu_starting = amd_pmu_cpu_starting,
};
__init int amd_pmu_init(void)
@@ -621,3 +624,33 @@ __init int amd_pmu_init(void)
return 0;
}
+
+void amd_pmu_enable_virt(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+ cpuc->perf_ctr_virt_mask = 0;
+
+ /* Reload all events */
+ x86_pmu_disable_all();
+ x86_pmu_enable_all(0);
+}
+EXPORT_SYMBOL_GPL(amd_pmu_enable_virt);
+
+void amd_pmu_disable_virt(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+ /*
+ * We only mask out the Host-only bit so that host-only counting works
+ * when SVM is disabled. If someone sets up a guest-only counter when
+ * SVM is disabled the Guest-only bit still gets set and the counter
+ * will not count anything.
+ */
+ cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;
+
+ /* Reload all events */
+ x86_pmu_disable_all();
+ x86_pmu_enable_all(0);
+}
+EXPORT_SYMBOL_GPL(amd_pmu_disable_virt);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5fa553b..e385214 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -29,6 +29,7 @@
#include <linux/ftrace_event.h>
#include <linux/slab.h>
+#include <asm/perf_event.h>
#include <asm/tlbflush.h>
#include <asm/desc.h>
#include <asm/kvm_para.h>
@@ -575,6 +576,8 @@ static void svm_hardware_disable(void *garbage)
wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
cpu_svm_disable();
+
+ amd_pmu_disable_virt();
}
static int svm_hardware_enable(void *garbage)
@@ -622,6 +625,8 @@ static int svm_hardware_enable(void *garbage)
svm_init_erratum_383();
+ amd_pmu_enable_virt();
+
return 0;
}
--
1.7.5.4
On 02/29/2012 03:57 PM, Joerg Roedel wrote:
> It turned out that a performance counter on AMD does not
> count at all when the GO or HO bit is set in the control
> register and SVM is disabled in EFER.
>
> This patch works around this issue by masking out the HO bit
> in the performance counter control register when SVM is not
> enabled.
>
> The GO bit is not touched because it is only set when the
> user wants to count in guest-mode only. So when SVM is
> disabled the counter should not run at all and the
> not-counting is the intended behaviour.
>
> diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> index 0397b23..67250a5 100644
> --- a/arch/x86/kernel/cpu/perf_event_amd.c
> +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> @@ -1,4 +1,5 @@
> #include <linux/perf_event.h>
> +#include <linux/export.h>
> #include <linux/types.h>
> #include <linux/init.h>
> #include <linux/slab.h>
> @@ -357,7 +358,9 @@ static void amd_pmu_cpu_starting(int cpu)
> struct amd_nb *nb;
> int i, nb_id;
>
> - if (boot_cpu_data.x86_max_cores < 2)
> + cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;
> +
> + if (boot_cpu_data.x86_max_cores < 2 || boot_cpu_data.x86 == 0x15)
> return;
Why this (boot_cpu_data.x86 == 0x15) change?
--
error compiling committee.c: too many arguments to function
On Wed, Feb 29, 2012 at 07:00:09PM +0200, Avi Kivity wrote:
> On 02/29/2012 03:57 PM, Joerg Roedel wrote:
> > It turned out that a performance counter on AMD does not
> > count at all when the GO or HO bit is set in the control
> > register and SVM is disabled in EFER.
> >
> > This patch works around this issue by masking out the HO bit
> > in the performance counter control register when SVM is not
> > enabled.
> >
> > The GO bit is not touched because it is only set when the
> > user wants to count in guest-mode only. So when SVM is
> > disabled the counter should not run at all and the
> > not-counting is the intended behaviour.
> >
> > diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> > index 0397b23..67250a5 100644
> > --- a/arch/x86/kernel/cpu/perf_event_amd.c
> > +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> > @@ -1,4 +1,5 @@
> > #include <linux/perf_event.h>
> > +#include <linux/export.h>
> > #include <linux/types.h>
> > #include <linux/init.h>
> > #include <linux/slab.h>
> > @@ -357,7 +358,9 @@ static void amd_pmu_cpu_starting(int cpu)
> > struct amd_nb *nb;
> > int i, nb_id;
> >
> > - if (boot_cpu_data.x86_max_cores < 2)
> > + cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;
> > +
> > + if (boot_cpu_data.x86_max_cores < 2 || boot_cpu_data.x86 == 0x15)
> > return;
>
> Why this (boot_cpu_data.x86 == 0x15) change?
>
I think it is related to the .cpu_starting callback now being set in
amd_pmu_f15h, where previously it wasn't.
--
Gleb.
On Wed, Feb 29, 2012 at 07:00:09PM +0200, Avi Kivity wrote:
> On 02/29/2012 03:57 PM, Joerg Roedel wrote:
> > It turned out that a performance counter on AMD does not
> > count at all when the GO or HO bit is set in the control
> > register and SVM is disabled in EFER.
> >
> > This patch works around this issue by masking out the HO bit
> > in the performance counter control register when SVM is not
> > enabled.
> >
> > The GO bit is not touched because it is only set when the
> > user wants to count in guest-mode only. So when SVM is
> > disabled the counter should not run at all and the
> > not-counting is the intended behaviour.
> >
> > diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> > index 0397b23..67250a5 100644
> > --- a/arch/x86/kernel/cpu/perf_event_amd.c
> > +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> > @@ -1,4 +1,5 @@
> > #include <linux/perf_event.h>
> > +#include <linux/export.h>
> > #include <linux/types.h>
> > #include <linux/init.h>
> > #include <linux/slab.h>
> > @@ -357,7 +358,9 @@ static void amd_pmu_cpu_starting(int cpu)
> > struct amd_nb *nb;
> > int i, nb_id;
> >
> > - if (boot_cpu_data.x86_max_cores < 2)
> > + cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;
> > +
> > + if (boot_cpu_data.x86_max_cores < 2 || boot_cpu_data.x86 == 0x15)
> > return;
>
> Why this (boot_cpu_data.x86 == 0x15) change?
This is because this function did not run on Fam15h before, but now it
has to so that cpuc->perf_ctr_virt_mask is initialized. The rest of the
work done in this function is setup for the northbridge counters. These
are not yet implemented for Fam15h CPUs, so this setup must not run on
those CPUs. Therefore the check was added.
Once northbridge counters are implemented for Fam15h this check can go
away again.
Joerg
On 02/29/2012 07:05 PM, Joerg Roedel wrote:
> On Wed, Feb 29, 2012 at 07:00:09PM +0200, Avi Kivity wrote:
> > On 02/29/2012 03:57 PM, Joerg Roedel wrote:
> > > It turned out that a performance counter on AMD does not
> > > count at all when the GO or HO bit is set in the control
> > > register and SVM is disabled in EFER.
> > >
> > > This patch works around this issue by masking out the HO bit
> > > in the performance counter control register when SVM is not
> > > enabled.
> > >
> > > The GO bit is not touched because it is only set when the
> > > user wants to count in guest-mode only. So when SVM is
> > > disabled the counter should not run at all and the
> > > not-counting is the intended behaviour.
> > >
> > > diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> > > index 0397b23..67250a5 100644
> > > --- a/arch/x86/kernel/cpu/perf_event_amd.c
> > > +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> > > @@ -1,4 +1,5 @@
> > > #include <linux/perf_event.h>
> > > +#include <linux/export.h>
> > > #include <linux/types.h>
> > > #include <linux/init.h>
> > > #include <linux/slab.h>
> > > @@ -357,7 +358,9 @@ static void amd_pmu_cpu_starting(int cpu)
> > > struct amd_nb *nb;
> > > int i, nb_id;
> > >
> > > - if (boot_cpu_data.x86_max_cores < 2)
> > > + cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;
> > > +
> > > + if (boot_cpu_data.x86_max_cores < 2 || boot_cpu_data.x86 == 0x15)
> > > return;
> >
> > Why this (boot_cpu_data.x86 == 0x15) change?
>
> This is because this function did not run on Fam15h before but now it
> has to so that cpuc->perf_ctr_virt_mask is initialized. The other stuff
> done in this function is setup for northbridge counter. These are not
> yet implemented for Fam15h CPUs so this setup must not run on those
> CPUs. Therefore the check was added.
> Once northbridge counters are implemented for Fam15h this check can go
> away again.
>
Thanks. Ingo, this had better go through tip.git since most of the
changes are perf related.
--
error compiling committee.c: too many arguments to function
On Wed, 2012-02-29 at 18:05 +0100, Joerg Roedel wrote:
> Once northbridge counters are implemented for Fam15h this check can go
> away again.
Nope, the northbridge on fam15 is completely disjoint from the regular pmu
and should thus be a separate driver.
The only reason the old amd driver has them both is because of how they
shared the registers, so there is a shared resource to manage.
On Wed, 2012-02-29 at 14:57 +0100, Joerg Roedel wrote:
> diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
> index 8944062..2c581b9 100644
> --- a/arch/x86/kernel/cpu/perf_event.h
> +++ b/arch/x86/kernel/cpu/perf_event.h
> @@ -148,6 +148,8 @@ struct cpu_hw_events {
> * AMD specific bits
> */
> struct amd_nb *amd_nb;
> + /* Inverted mask of bits to clear in the perf_ctr ctrl registers */
> + u64 perf_ctr_virt_mask;
>
> void *kfree_on_online;
> };
> @@ -417,9 +419,11 @@ void x86_pmu_disable_all(void);
> static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
> u64 enable_mask)
> {
> + u64 disable_mask = __this_cpu_read(cpu_hw_events.perf_ctr_virt_mask);
> +
> if (hwc->extra_reg.reg)
> wrmsrl(hwc->extra_reg.reg, hwc->extra_reg.config);
> - wrmsrl(hwc->config_base, hwc->config | enable_mask);
> + wrmsrl(hwc->config_base, (hwc->config | enable_mask) & ~disable_mask);
> }
It's starting to look like we should kill this helper: the extra_reg muck
is Intel-only and the disable_mask is AMD-only.
Queued it for now, though.