2021-03-09 17:12:14

by Sean Christopherson

Subject: [PATCH v2] x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case

Initialize x86_pmu.guest_get_msrs to return 0/NULL to handle the "nop"
case. Patching in perf_guest_get_msrs_nop() during setup does not work
if there is no PMU, as setup bails before updating the static calls,
leaving x86_pmu.guest_get_msrs NULL and thus a complete nop. Ultimately,
this causes VMX abort on VM-Exit due to KVM putting random garbage from
the stack into the MSR load list.

Add a comment in KVM to note that nr_msrs is valid if and only if the
return value is non-NULL.

Fixes: abd562df94d1 ("x86/perf: Use static_call for x86_pmu.guest_get_msrs")
Cc: Like Xu <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Jim Mattson <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Reported-by: [email protected]
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---

v2:
- Use __static_call_return0 to return NULL instead of manually checking
the hook at invocation. [Peter]
- Rebase to tip/sched/core, commit 4117cebf1a9f ("psi: Optimize task
switch inside shared cgroups").


arch/x86/events/core.c | 16 +++++-----------
arch/x86/kvm/vmx/vmx.c | 2 +-
2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 6ddeed3cd2ac..7bb056151ecc 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -81,7 +81,11 @@ DEFINE_STATIC_CALL_NULL(x86_pmu_swap_task_ctx, *x86_pmu.swap_task_ctx);
DEFINE_STATIC_CALL_NULL(x86_pmu_drain_pebs, *x86_pmu.drain_pebs);
DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, *x86_pmu.pebs_aliases);

-DEFINE_STATIC_CALL_NULL(x86_pmu_guest_get_msrs, *x86_pmu.guest_get_msrs);
+/*
+ * This one is magic, it will get called even when PMU init fails (because
+ * there is no PMU), in which case it should simply return NULL.
+ */
+DEFINE_STATIC_CALL_RET0(x86_pmu_guest_get_msrs, *x86_pmu.guest_get_msrs);

u64 __read_mostly hw_cache_event_ids
[PERF_COUNT_HW_CACHE_MAX]
@@ -1944,13 +1948,6 @@ static void _x86_pmu_read(struct perf_event *event)
x86_perf_event_update(event);
}

-static inline struct perf_guest_switch_msr *
-perf_guest_get_msrs_nop(int *nr)
-{
- *nr = 0;
- return NULL;
-}
-
static int __init init_hw_perf_events(void)
{
struct x86_pmu_quirk *quirk;
@@ -2024,9 +2021,6 @@ static int __init init_hw_perf_events(void)
if (!x86_pmu.read)
x86_pmu.read = _x86_pmu_read;

- if (!x86_pmu.guest_get_msrs)
- x86_pmu.guest_get_msrs = perf_guest_get_msrs_nop;
-
x86_pmu_static_call_update();

/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 50810d471462..32cf8287d4a7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6580,8 +6580,8 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
int i, nr_msrs;
struct perf_guest_switch_msr *msrs;

+ /* Note, nr_msrs may be garbage if perf_guest_get_msrs() returns NULL. */
msrs = perf_guest_get_msrs(&nr_msrs);
-
if (!msrs)
return;

--
2.30.1.766.gb4fecdf3b7-goog


2021-03-09 18:28:34

by Jim Mattson

Subject: Re: [PATCH v2] x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case

On Tue, Mar 9, 2021 at 9:10 AM Sean Christopherson <[email protected]> wrote:
>
> Initialize x86_pmu.guest_get_msrs to return 0/NULL to handle the "nop"
> case. Patching in perf_guest_get_msrs_nop() during setup does not work
> if there is no PMU, as setup bails before updating the static calls,
> leaving x86_pmu.guest_get_msrs NULL and thus a complete nop. Ultimately,
> this causes VMX abort on VM-Exit due to KVM putting random garbage from
> the stack into the MSR load list.
>
> Add a comment in KVM to note that nr_msrs is valid if and only if the
> return value is non-NULL.
>
> Fixes: abd562df94d1 ("x86/perf: Use static_call for x86_pmu.guest_get_msrs")
> Cc: Like Xu <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Jim Mattson <[email protected]>
> Reported-by: Dmitry Vyukov <[email protected]>
> Reported-by: [email protected]
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
>
> v2:
> - Use __static_call_return0 to return NULL instead of manually checking
> the hook at invocation. [Peter]
> - Rebase to tip/sched/core, commit 4117cebf1a9f ("psi: Optimize task
> switch inside shared cgroups").
>
...
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 50810d471462..32cf8287d4a7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6580,8 +6580,8 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> int i, nr_msrs;
> struct perf_guest_switch_msr *msrs;
>
> + /* Note, nr_msrs may be garbage if perf_guest_get_msrs() returns NULL. */

You could drop the scary comment with a profligate initialization of
nr_msrs to 0.

[Apologies to those seeing this twice. I blame gmail.]

2021-03-09 19:48:22

by Sean Christopherson

Subject: Re: [PATCH v2] x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case

On Tue, Mar 09, 2021, Jim Mattson wrote:
> On Tue, Mar 9, 2021 at 9:10 AM Sean Christopherson <[email protected]> wrote:
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 50810d471462..32cf8287d4a7 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6580,8 +6580,8 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> > int i, nr_msrs;
> > struct perf_guest_switch_msr *msrs;
> >
> > + /* Note, nr_msrs may be garbage if perf_guest_get_msrs() returns NULL. */
> >
>
> You could drop the scary comment with a profligate initialization of
> nr_msrs to 0.

Yeah, I considered that as well. I opted for the scary comment because I
wanted to dissuade future patches from modifying this code without taking into
account the non-obvious behavior.
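
For readers following along, the contract under discussion can be sketched in
plain userspace C. This is only a model, not the kernel code: struct
switch_msr, get_msrs_ret0(), get_msrs_demo(), count_switched() and the MSR
values are hypothetical stand-ins. It shows a getter that returns NULL without
ever writing *nr (as the RET0 default does), and a caller that therefore must
test the pointer before it reads the count.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct perf_guest_switch_msr. */
struct switch_msr {
	unsigned int msr;
	unsigned long long host, guest;
};

typedef struct switch_msr *(*get_msrs_fn)(int *nr);

/* Models the RET0 default: returns NULL and never stores to *nr. */
static struct switch_msr *get_msrs_ret0(int *nr)
{
	(void)nr;
	return NULL;
}

/* Models a real PMU hook: fills in *nr and returns a valid array. */
static struct switch_msr demo_msrs[2] = {
	{ 0x38f, 0x7, 0x0 },	/* host and guest values differ */
	{ 0x3f1, 0x0, 0x0 },	/* host and guest values match */
};

static struct switch_msr *get_msrs_demo(int *nr)
{
	*nr = 2;
	return demo_msrs;
}

/* Counts the entries whose host and guest values differ. */
static int count_switched(get_msrs_fn get)
{
	int i, nr_msrs;		/* left uninitialized on purpose */
	int switched = 0;
	struct switch_msr *msrs;

	/* nr_msrs is only meaningful if the getter returns non-NULL. */
	msrs = get(&nr_msrs);
	if (!msrs)
		return 0;

	for (i = 0; i < nr_msrs; i++)
		if (msrs[i].host != msrs[i].guest)
			switched++;

	return switched;
}
```

Initializing nr_msrs to 0 would make the NULL case harmless as well; the sketch
keeps it uninitialized to mirror the choice explained above, where the pointer
check is the only guard.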

2021-03-10 08:20:54

by Peter Zijlstra

Subject: Re: [PATCH v2] x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case

On Tue, Mar 09, 2021 at 09:10:19AM -0800, Sean Christopherson wrote:

> @@ -2024,9 +2021,6 @@ static int __init init_hw_perf_events(void)
> if (!x86_pmu.read)
> x86_pmu.read = _x86_pmu_read;
>
> - if (!x86_pmu.guest_get_msrs)
> - x86_pmu.guest_get_msrs = perf_guest_get_msrs_nop;

I suspect I might've been over eager here and we're now in trouble when
*_pmu_init() clears x86_pmu.guest_get_msrs (like for instance on AMD).

When that happens we need to restore __static_call_return0, otherwise
the following static_call_update() will patch in a NOP and RAX will be
garbage again.

So I've taken the liberty to update the patch as below.

---

Subject: x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case
From: Sean Christopherson <[email protected]>
Date: Tue, 9 Mar 2021 09:10:19 -0800

Initialize x86_pmu.guest_get_msrs to return 0/NULL to handle the "nop"
case. Patching in perf_guest_get_msrs_nop() during setup does not work
if there is no PMU, as setup bails before updating the static calls,
leaving x86_pmu.guest_get_msrs NULL and thus a complete nop. Ultimately,
this causes VMX abort on VM-Exit due to KVM putting random garbage from
the stack into the MSR load list.

Add a comment in KVM to note that nr_msrs is valid if and only if the
return value is non-NULL.

Fixes: abd562df94d1 ("x86/perf: Use static_call for x86_pmu.guest_get_msrs")
Reported-by: Dmitry Vyukov <[email protected]>
Reported-by: [email protected]
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---

v2:
- Use __static_call_return0 to return NULL instead of manually checking
the hook at invocation. [Peter]
- Rebase to tip/sched/core, commit 4117cebf1a9f ("psi: Optimize task
switch inside shared cgroups").

arch/x86/events/core.c | 15 ++++++---------
arch/x86/kvm/vmx/vmx.c | 2 +-
2 files changed, 7 insertions(+), 10 deletions(-)

--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -81,7 +81,11 @@ DEFINE_STATIC_CALL_NULL(x86_pmu_swap_tas
DEFINE_STATIC_CALL_NULL(x86_pmu_drain_pebs, *x86_pmu.drain_pebs);
DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, *x86_pmu.pebs_aliases);

-DEFINE_STATIC_CALL_NULL(x86_pmu_guest_get_msrs, *x86_pmu.guest_get_msrs);
+/*
+ * This one is magic, it will get called even when PMU init fails (because
+ * there is no PMU), in which case it should simply return NULL.
+ */
+DEFINE_STATIC_CALL_RET0(x86_pmu_guest_get_msrs, *x86_pmu.guest_get_msrs);

u64 __read_mostly hw_cache_event_ids
[PERF_COUNT_HW_CACHE_MAX]
@@ -1944,13 +1948,6 @@ static void _x86_pmu_read(struct perf_ev
x86_perf_event_update(event);
}

-static inline struct perf_guest_switch_msr *
-perf_guest_get_msrs_nop(int *nr)
-{
- *nr = 0;
- return NULL;
-}
-
static int __init init_hw_perf_events(void)
{
struct x86_pmu_quirk *quirk;
@@ -2025,7 +2022,7 @@ static int __init init_hw_perf_events(vo
x86_pmu.read = _x86_pmu_read;

if (!x86_pmu.guest_get_msrs)
- x86_pmu.guest_get_msrs = perf_guest_get_msrs_nop;
+ x86_pmu.guest_get_msrs = (void *)&__static_call_return0;

x86_pmu_static_call_update();

--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6580,8 +6580,8 @@ static void atomic_switch_perf_msrs(stru
int i, nr_msrs;
struct perf_guest_switch_msr *msrs;

+ /* Note, nr_msrs may be garbage if perf_guest_get_msrs() returns NULL. */
msrs = perf_guest_get_msrs(&nr_msrs);
-
if (!msrs)
return;

2021-03-10 18:20:16

by Sean Christopherson

Subject: Re: [PATCH v2] x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case

On Wed, Mar 10, 2021, Peter Zijlstra wrote:
> On Tue, Mar 09, 2021 at 09:10:19AM -0800, Sean Christopherson wrote:
>
> > @@ -2024,9 +2021,6 @@ static int __init init_hw_perf_events(void)
> > if (!x86_pmu.read)
> > x86_pmu.read = _x86_pmu_read;
> >
> > - if (!x86_pmu.guest_get_msrs)
> > - x86_pmu.guest_get_msrs = perf_guest_get_msrs_nop;
>
> I suspect I might've been over eager here and we're now in trouble when
> *_pmu_init() clears x86_pmu.guest_get_msrs (like for instance on AMD).
>
> When that happens we need to restore __static_call_return0, otherwise
> the following static_call_update() will patch in a NOP and RAX will be
> garbage again.
>
> So I've taken the liberty to update the patch as below.

Doh, I managed to forget about that between v1 and v2, too. Thanks much!
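
The problem Peter describes can be modelled in userspace C, under loud
assumptions: struct pmu_model, ret0(), call_target, vendor_init_clears_hook()
and finish_init() are hypothetical names, and an ordinary function pointer
stands in for the static call. In the kernel, a NULL target is patched in as a
NOP and leaves RAX undefined; in this model a NULL target would simply crash.
In both cases the remedy is the same: put the return-0 default back before the
target is published.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct perf_guest_switch_msr. */
struct switch_msr {
	unsigned int msr;
	unsigned long long host, guest;
};

typedef struct switch_msr *(*get_msrs_fn)(int *nr);

/* Hypothetical model of the one x86_pmu field that matters here. */
struct pmu_model {
	get_msrs_fn guest_get_msrs;
};

/* Models __static_call_return0: returns NULL and never touches *nr. */
static struct switch_msr *ret0(int *nr)
{
	(void)nr;
	return NULL;
}

/* The call site's target, which starts out as the return-0 default. */
static get_msrs_fn call_target = ret0;

/* Models a vendor init that has no guest MSR hook and clears the field. */
static void vendor_init_clears_hook(struct pmu_model *pmu)
{
	pmu->guest_get_msrs = NULL;
}

/* Models the tail of init: restore the default, then publish the target. */
static void finish_init(struct pmu_model *pmu)
{
	if (!pmu->guest_get_msrs)
		pmu->guest_get_msrs = ret0;

	call_target = pmu->guest_get_msrs;
}

/* Models the call made on behalf of KVM. */
static struct switch_msr *guest_get_msrs(int *nr)
{
	return call_target(nr);
}
```

Without the check in finish_init(), the cleared hook would be published as the
call target, which is the situation the updated patch guards against.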

Subject: [tip: perf/urgent] x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: c8e2fe13d1d1f3a02842b7b909d4e4846a4b6a2c
Gitweb: https://git.kernel.org/tip/c8e2fe13d1d1f3a02842b7b909d4e4846a4b6a2c
Author: Sean Christopherson <[email protected]>
AuthorDate: Tue, 09 Mar 2021 09:10:19 -08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 10 Mar 2021 16:45:09 +01:00

x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case

Initialize x86_pmu.guest_get_msrs to return 0/NULL to handle the "nop"
case. Patching in perf_guest_get_msrs_nop() during setup does not work
if there is no PMU, as setup bails before updating the static calls,
leaving x86_pmu.guest_get_msrs NULL and thus a complete nop. Ultimately,
this causes VMX abort on VM-Exit due to KVM putting random garbage from
the stack into the MSR load list.

Add a comment in KVM to note that nr_msrs is valid if and only if the
return value is non-NULL.

Fixes: abd562df94d1 ("x86/perf: Use static_call for x86_pmu.guest_get_msrs")
Reported-by: Dmitry Vyukov <[email protected]>
Reported-by: [email protected]
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/events/core.c | 15 ++++++---------
arch/x86/kvm/vmx/vmx.c | 2 +-
2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 6ddeed3..18df171 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -81,7 +81,11 @@ DEFINE_STATIC_CALL_NULL(x86_pmu_swap_task_ctx, *x86_pmu.swap_task_ctx);
DEFINE_STATIC_CALL_NULL(x86_pmu_drain_pebs, *x86_pmu.drain_pebs);
DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, *x86_pmu.pebs_aliases);

-DEFINE_STATIC_CALL_NULL(x86_pmu_guest_get_msrs, *x86_pmu.guest_get_msrs);
+/*
+ * This one is magic, it will get called even when PMU init fails (because
+ * there is no PMU), in which case it should simply return NULL.
+ */
+DEFINE_STATIC_CALL_RET0(x86_pmu_guest_get_msrs, *x86_pmu.guest_get_msrs);

u64 __read_mostly hw_cache_event_ids
[PERF_COUNT_HW_CACHE_MAX]
@@ -1944,13 +1948,6 @@ static void _x86_pmu_read(struct perf_event *event)
x86_perf_event_update(event);
}

-static inline struct perf_guest_switch_msr *
-perf_guest_get_msrs_nop(int *nr)
-{
- *nr = 0;
- return NULL;
-}
-
static int __init init_hw_perf_events(void)
{
struct x86_pmu_quirk *quirk;
@@ -2025,7 +2022,7 @@ static int __init init_hw_perf_events(void)
x86_pmu.read = _x86_pmu_read;

if (!x86_pmu.guest_get_msrs)
- x86_pmu.guest_get_msrs = perf_guest_get_msrs_nop;
+ x86_pmu.guest_get_msrs = (void *)&__static_call_return0;

x86_pmu_static_call_update();

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 50810d4..32cf828 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6580,8 +6580,8 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
int i, nr_msrs;
struct perf_guest_switch_msr *msrs;

+ /* Note, nr_msrs may be garbage if perf_guest_get_msrs() returns NULL. */
msrs = perf_guest_get_msrs(&nr_msrs);
-
if (!msrs)
return;