2020-08-21 10:54:48

by Paolo Bonzini

Subject: [PATCH v2] x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM

From: Sean Christopherson <[email protected]>

Don't use RDPID in the paranoid entry flow, as it can consume a KVM
guest's MSR_TSC_AUX value if an NMI arrives during KVM's run loop.

In general, the kernel does not need TSC_AUX because it can just use
__this_cpu_read(cpu_number) to read the current processor id. It can
also just block preemption and thread migration at will, therefore
it has no need for the atomic rdtsc+vgetcpu provided by RDTSCP. For this
reason, as a performance optimization, KVM loads the guest's TSC_AUX when
a CPU first enters its run loop. On AMD's SVM, it doesn't restore the
host's value until the CPU exits the run loop; VMX is even more aggressive
and defers restoring the host's value until the CPU returns to userspace.

This optimization obviously relies on the kernel not consuming TSC_AUX,
which falls apart if an NMI arrives during the run loop and uses RDPID.
Removing it would be painful, as both SVM and VMX would need to context
switch the MSR on every VM-Enter (for a cost of 2x WRMSR), whereas using
LSL instead of RDPID is a minor blip.

Both SAVE_AND_SET_GSBASE and GET_PERCPU_BASE are only used in paranoid entry,
therefore the patch can just remove the RDPID alternative.

Fixes: eaad981291ee3 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
Cc: Dave Hansen <[email protected]>
Cc: Chang Seok Bae <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: [email protected]
Reported-by: Tom Lendacky <[email protected]>
Debugged-by: Tom Lendacky <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/entry/calling.h | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 98e4d8886f11..ae9b0d4615b3 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -374,12 +374,14 @@ For 32-bit we have the following conventions - kernel is built with
* Fetch the per-CPU GSBASE value for this processor and put it in @reg.
* We normally use %gs for accessing per-CPU data, but we are setting up
* %gs here and obviously can not use %gs itself to access per-CPU data.
+ *
+ * Do not use RDPID, because KVM loads guest's TSC_AUX on vm-entry and
+ * may not restore the host's value until the CPU returns to userspace.
+ * Thus the kernel would consume a guest's TSC_AUX if an NMI arrives
+ * while running KVM's run loop.
*/
.macro GET_PERCPU_BASE reg:req
- ALTERNATIVE \
- "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
- "RDPID \reg", \
- X86_FEATURE_RDPID
+ LOAD_CPU_AND_NODE_SEG_LIMIT \reg
andq $VDSO_CPUNODE_MASK, \reg
movq __per_cpu_offset(, \reg, 8), \reg
.endm
--
2.26.2
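
For readers following along, the two instructions left in GET_PERCPU_BASE after the LSL can be modelled in plain C. This is only a userspace sketch: the mask and bit width are believed to match the definitions in arch/x86/include/asm/segment.h at the time of this patch, while the offset table, its values, and all function names below are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror arch/x86/include/asm/segment.h; treat as an assumption. */
#define VDSO_CPUNODE_BITS 12
#define VDSO_CPUNODE_MASK 0xfff

/* Hypothetical stand-in for the kernel's __per_cpu_offset[] array. */
#define MODEL_NR_CPUS 4
static const uint64_t model_per_cpu_offset[MODEL_NR_CPUS] = {
	0x1000, 0x2000, 0x3000, 0x4000
};

/* Pack CPU and node numbers the way the CPU/node segment limit holds them. */
static uint64_t encode_cpunode(unsigned int cpu, unsigned int node)
{
	return ((uint64_t)node << VDSO_CPUNODE_BITS) | cpu;
}

/* Model of the macro body once LSL has placed the segment limit in the register. */
static uint64_t get_percpu_base_model(uint64_t seg_limit)
{
	uint64_t cpu = seg_limit & VDSO_CPUNODE_MASK; /* andq $VDSO_CPUNODE_MASK, \reg */

	assert(cpu < MODEL_NR_CPUS);
	return model_per_cpu_offset[cpu]; /* movq __per_cpu_offset(, \reg, 8), \reg */
}
```

Calling get_percpu_base_model(encode_cpunode(2, 1)) in this model masks off the node bits and selects the table entry for CPU 2.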


2020-08-21 14:19:12

by Brian Gerst

Subject: Re: [PATCH v2] x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM

On Fri, Aug 21, 2020 at 6:56 AM Paolo Bonzini <[email protected]> wrote:
>
> .macro GET_PERCPU_BASE reg:req
> - ALTERNATIVE \
> - "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
> - "RDPID \reg", \
> - X86_FEATURE_RDPID
> + LOAD_CPU_AND_NODE_SEG_LIMIT \reg
> andq $VDSO_CPUNODE_MASK, \reg
> movq __per_cpu_offset(, \reg, 8), \reg
> .endm

LOAD_CPU_AND_NODE_SEG_LIMIT can be merged into this, as its only
purpose was to work around using CPP macros in an alternative.

--
Brian Gerst

2020-08-21 14:22:59

by Sean Christopherson

Subject: Re: [PATCH v2] x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM

On Fri, Aug 21, 2020 at 06:52:29AM -0400, Paolo Bonzini wrote:
> .macro GET_PERCPU_BASE reg:req
> - ALTERNATIVE \
> - "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
> - "RDPID \reg", \

This was the only user of the RDPID macro, I assume we want to yank that out
as well?

> - X86_FEATURE_RDPID
> + LOAD_CPU_AND_NODE_SEG_LIMIT \reg
> andq $VDSO_CPUNODE_MASK, \reg
> movq __per_cpu_offset(, \reg, 8), \reg
> .endm
> --
> 2.26.2
>

Subject: [tip: x86/urgent] x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 6a3ea3e68b8a8a26c4aaac03432ed92269c9a14e
Gitweb: https://git.kernel.org/tip/6a3ea3e68b8a8a26c4aaac03432ed92269c9a14e
Author: Sean Christopherson <[email protected]>
AuthorDate: Fri, 21 Aug 2020 06:52:29 -04:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Fri, 21 Aug 2020 16:15:27 +02:00

x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM

KVM has an optimization to avoid expensive MSR reads/writes on
VMENTER/EXIT. It caches the MSR values and restores them either when
leaving the run loop, on preemption or when going out to user space.

The affected MSRs are not required for kernel context operations. This
changed with the recently introduced mechanism to handle FSGSBASE in the
paranoid entry code which has to retrieve the kernel GSBASE value by
accessing per CPU memory. The mechanism needs to retrieve the CPU number
and uses either LSL or RDPID if the processor supports it.

Unfortunately RDPID uses MSR_TSC_AUX which is in the list of cached and
lazily restored MSRs, which means between the point where the guest value
is written and the point of restore, MSR_TSC_AUX contains a random number.

If an NMI or any other exception which uses the paranoid entry path happens
in such a context, then RDPID returns the random guest MSR_TSC_AUX value.

As a consequence this reads from the wrong memory location to retrieve the
kernel GSBASE value. Kernel GS is used for all regular this_cpu_*()
operations. If the GSBASE in the exception handler points to the per CPU
memory of a different CPU then this has the obvious consequences of data
corruption and crashes.

As the paranoid entry path is the only place which accesses MSR_TSC_AUX
(via RDPID) and the fallback via LSL is not significantly slower, remove
the RDPID alternative from the entry path and always use LSL.

The alternative would be to write MSR_TSC_AUX on every VMENTER and VMEXIT
which would be inflicting massive overhead on that code path.

[ tglx: Rewrote changelog ]

Fixes: eaad981291ee3 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
Reported-by: Tom Lendacky <[email protected]>
Debugged-by: Tom Lendacky <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/entry/calling.h | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 98e4d88..ae9b0d4 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -374,12 +374,14 @@ For 32-bit we have the following conventions - kernel is built with
* Fetch the per-CPU GSBASE value for this processor and put it in @reg.
* We normally use %gs for accessing per-CPU data, but we are setting up
* %gs here and obviously can not use %gs itself to access per-CPU data.
+ *
+ * Do not use RDPID, because KVM loads guest's TSC_AUX on vm-entry and
+ * may not restore the host's value until the CPU returns to userspace.
+ * Thus the kernel would consume a guest's TSC_AUX if an NMI arrives
+ * while running KVM's run loop.
*/
.macro GET_PERCPU_BASE reg:req
- ALTERNATIVE \
- "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
- "RDPID \reg", \
- X86_FEATURE_RDPID
+ LOAD_CPU_AND_NODE_SEG_LIMIT \reg
andq $VDSO_CPUNODE_MASK, \reg
movq __per_cpu_offset(, \reg, 8), \reg
.endm
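
The failure mode described in the changelog can be illustrated with a small userspace model. It is a sketch under stated assumptions, not kernel code: the struct, the function names, and the idea of storing the two values side by side are all hypothetical. It only shows that the RDPID path reads whatever is currently in TSC_AUX, while the LSL path reads a segment limit that KVM never touches.

```c
#include <assert.h>
#include <stdint.h>

#define VDSO_CPUNODE_MASK 0xfff	/* assumed to match asm/segment.h */
#define MODEL_NR_CPUS 4		/* hypothetical CPU count for the model */

/* Hypothetical model of the two per-CPU sources of the CPU number. */
struct model_cpu {
	uint64_t tsc_aux;	/* value RDPID would return (MSR_TSC_AUX) */
	uint64_t seg_limit;	/* value LSL would return for the CPU/node segment */
};

/* Host setup: both sources hold the CPU number (node 0 for simplicity). */
static void model_cpu_init(struct model_cpu *c, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	c->tsc_aux = cpu;
	c->seg_limit = cpu;
}

/* KVM's lazy switch: the guest value stays loaded until a later restore. */
static void model_kvm_load_guest_tsc_aux(struct model_cpu *c, uint64_t guest)
{
	c->tsc_aux = guest;
}

/* What a paranoid entry would compute with the removed RDPID alternative. */
static unsigned int model_cpu_via_rdpid(const struct model_cpu *c)
{
	return (unsigned int)(c->tsc_aux & VDSO_CPUNODE_MASK);
}

/* What a paranoid entry computes with LSL, which this patch always uses. */
static unsigned int model_cpu_via_lsl(const struct model_cpu *c)
{
	return (unsigned int)(c->seg_limit & VDSO_CPUNODE_MASK);
}
```

In this model, initializing CPU 1 and then loading a guest value of 3 makes model_cpu_via_rdpid() report CPU 3 while model_cpu_via_lsl() still reports CPU 1, which corresponds to the wrong per-CPU base being selected in the NMI case.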

2020-08-21 15:37:45

by Brian Gerst

Subject: Re: [PATCH v2] x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM

On Fri, Aug 21, 2020 at 10:22 AM Sean Christopherson
<[email protected]> wrote:
>
> On Fri, Aug 21, 2020 at 06:52:29AM -0400, Paolo Bonzini wrote:
> > .macro GET_PERCPU_BASE reg:req
> > - ALTERNATIVE \
> > - "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
> > - "RDPID \reg", \
>
> This was the only user of the RDPID macro, I assume we want to yank that out
> as well?

No. That one should be kept until the minimum binutils version is
raised to one that supports the RDPID opcode.

--
Brian Gerst

2020-08-25 12:49:48

by Thomas Gleixner

Subject: Re: [PATCH v2] x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM

On Fri, Aug 21 2020 at 11:35, Brian Gerst wrote:
> On Fri, Aug 21, 2020 at 10:22 AM Sean Christopherson
>> > .macro GET_PERCPU_BASE reg:req
>> > - ALTERNATIVE \
>> > - "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
>> > - "RDPID \reg", \
>>
>> This was the only user of the RDPID macro, I assume we want to yank that out
>> as well?
>
> No. That one should be kept until the minimum binutils version is
> raised to one that supports the RDPID opcode.

The macro is unused and nothing in the kernel can use RDPID as we just
established.

Thanks,

tglx

2020-08-25 12:54:35

by Brian Gerst

Subject: Re: [PATCH v2] x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM

On Tue, Aug 25, 2020 at 6:44 AM Thomas Gleixner <[email protected]> wrote:
>
> On Fri, Aug 21 2020 at 11:35, Brian Gerst wrote:
> > On Fri, Aug 21, 2020 at 10:22 AM Sean Christopherson
> >> > .macro GET_PERCPU_BASE reg:req
> >> > - ALTERNATIVE \
> >> > - "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
> >> > - "RDPID \reg", \
> >>
> >> This was the only user of the RDPID macro, I assume we want to yank that out
> >> as well?
> >
> > No. That one should be kept until the minimum binutils version is
> > raised to one that supports the RDPID opcode.
>
> The macro is unused and nothing in the kernel can use RDPID as we just
> established.

It is open-coded in vdso_read_cpunode(), but the RDPID macro can't be
used there. So you are correct, it can be removed.

--
Brian Gerst