2023-10-25 20:55:35

by Pawan Gupta

Subject: [PATCH v3 0/6] Delay VERW

v3:
- Use .entry.text section for VERW memory operand. (Andrew/PeterZ)
- Fix the duplicate header inclusion. (Chao)

v2: https://lore.kernel.org/r/[email protected]
- Removed the extra EXEC_VERW macro layers. (Sean)
- Move NOPL before VERW. (Sean)
- s/USER_CLEAR_CPU_BUFFERS/CLEAR_CPU_BUFFERS/. (Josh/Dave)
- Removed the comments before CLEAR_CPU_BUFFERS. (Josh)
- Remove CLEAR_CPU_BUFFERS from NMI returning to kernel and document the
reason. (Josh/Dave)
- Reformat comment in md_clear_update_mitigation(). (Josh)
- Squash "x86/bugs: Cleanup mds_user_clear" patch. (Nikolay)
- s/GUEST_CLEAR_CPU_BUFFERS/CLEAR_CPU_BUFFERS/. (Josh)
- Added a patch from Sean to use EFLAGS.CF for VMLAUNCH/VMRESUME
selection. This facilitates a single CLEAR_CPU_BUFFERS location for both
VMLAUNCH and VMRESUME. (Sean)

v1: https://lore.kernel.org/r/[email protected]

Hi,

The legacy VERW instruction was overloaded by some processors to clear
micro-architectural CPU buffers as a mitigation for CPU bugs. This series
moves VERW execution to a later point in the exit-to-user path. This is
needed because in some cases it may be possible for kernel data to be
accessed after VERW in arch_exit_to_user_mode(). Such accesses may put
data into MDS-affected CPU buffers, for example:

1. Kernel data accessed by an NMI between VERW and return-to-user can
remain in CPU buffers (since NMI returning to kernel does not
execute VERW to clear CPU buffers).
2. Alyssa reported that after VERW is executed,
CONFIG_GCC_PLUGIN_STACKLEAK=y scrubs the stack used by a system
call. Memory accesses during stack scrubbing can move kernel stack
contents into CPU buffers.
3. When caller-saved registers are restored after a return from the
function executing VERW, the kernel stack accesses can remain in
CPU buffers (since they occur after VERW).

Although these cases are harder to exploit in practice, moving VERW closer
to the ring transition reduces the attack surface.
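
For illustration, the end state on the 64-bit SYSCALL return path looks
roughly like the sketch below (see patch 2 for the exact hunk);
CLEAR_CPU_BUFFERS becomes the last thing executed before the ring
transition:

SYM_INNER_LABEL(entry_SYSRETQ_unsafe_stack, SYM_L_GLOBAL)
	ANNOTATE_NOENDBR
	swapgs
	CLEAR_CPU_BUFFERS	/* VERW with a memory operand, clobbers EFLAGS.ZF */
	sysretq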

Overview of the series:

Patch 1: Prepares VERW macros for use in asm.
Patch 2: Adds macros to 64-bit entry/exit points.
Patch 3: Adds macros to 32-bit entry/exit points.
Patch 4: Enables the new macros.
Patch 5: Uses CFLAGS.CF for VMLAUNCH/VMRESUME selection.
Patch 6: Adds macro to VMenter.

Below is some performance data collected with v1 on a Skylake client,
compared with the previous implementation:

Baseline: v6.6-rc5

| Test | Configuration | Relative |
| ------------------ | ---------------------- | -------- |
| build-linux-kernel | defconfig | 1.00 |
| hackbench | 32 - Process | 1.02 |
| nginx | Short Connection - 500 | 1.01 |

Signed-off-by: Pawan Gupta <[email protected]>
---
Pawan Gupta (5):
x86/bugs: Add asm helpers for executing VERW
x86/entry_64: Add VERW just before userspace transition
x86/entry_32: Add VERW just before userspace transition
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
KVM: VMX: Move VERW closer to VMentry for MDS mitigation

Sean Christopherson (1):
KVM: VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH

Documentation/arch/x86/mds.rst | 39 ++++++++++++++++++++++++++----------
arch/x86/entry/entry.S | 16 +++++++++++++++
arch/x86/entry/entry_32.S | 3 +++
arch/x86/entry/entry_64.S | 11 ++++++++++
arch/x86/entry/entry_64_compat.S | 1 +
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/entry-common.h | 1 -
arch/x86/include/asm/nospec-branch.h | 27 ++++++++++++++-----------
arch/x86/kernel/cpu/bugs.c | 15 ++++++--------
arch/x86/kernel/nmi.c | 2 --
arch/x86/kvm/vmx/run_flags.h | 7 +++++--
arch/x86/kvm/vmx/vmenter.S | 9 ++++++---
arch/x86/kvm/vmx/vmx.c | 10 ++++++---
13 files changed, 99 insertions(+), 44 deletions(-)
---
base-commit: 05d3ef8bba77c1b5f98d941d8b2d4aeab8118ef1
change-id: 20231011-delay-verw-d0474986b2c3

Best regards,
--
Thanks,
Pawan



2023-10-25 20:55:40

by Pawan Gupta

Subject: [PATCH v3 2/6] x86/entry_64: Add VERW just before userspace transition

The mitigation for MDS is to use the VERW instruction to clear any secrets
in CPU buffers. Data from memory accesses made after VERW executes can
still remain in CPU buffers, so it is safer to execute VERW late in the
return-to-user path to minimize the window in which kernel data can end up
in CPU buffers. There are not many kernel secrets to be had after
SWITCH_TO_USER_CR3.

Add support for deploying the VERW mitigation after user register state is
restored. This helps minimize the chances of kernel data ending up in
CPU buffers after executing VERW.

Note that the mitigation at the new location is not yet enabled.

Corner case not handled
=======================
Interrupts returning to kernel don't clear CPU buffers, since the
exit-to-user path is expected to do that anyway. But there could be
a case when an NMI is generated in the kernel after the exit-to-user path
has cleared the buffers. This case is not handled, and NMIs returning to
kernel don't clear CPU buffers, because:

1. It is rare to get an NMI after VERW, but before returning to userspace.
2. For an unprivileged user, there is no known way to make that NMI
less rare or target it.
3. It would take a large number of these precisely-timed NMIs to mount
an actual attack. There's presumably not enough bandwidth.
4. The NMI in question occurs after a VERW, i.e. when user state is
restored and most interesting data is already scrubbed. What's left
is only the data that the NMI touches, and that may or may not be of
any interest.

Suggested-by: Dave Hansen <[email protected]>
Signed-off-by: Pawan Gupta <[email protected]>
---
arch/x86/entry/entry_64.S | 11 +++++++++++
arch/x86/entry/entry_64_compat.S | 1 +
2 files changed, 12 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 43606de22511..9f97a8bd11e8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -223,6 +223,7 @@ syscall_return_via_sysret:
SYM_INNER_LABEL(entry_SYSRETQ_unsafe_stack, SYM_L_GLOBAL)
ANNOTATE_NOENDBR
swapgs
+ CLEAR_CPU_BUFFERS
sysretq
SYM_INNER_LABEL(entry_SYSRETQ_end, SYM_L_GLOBAL)
ANNOTATE_NOENDBR
@@ -663,6 +664,7 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
/* Restore RDI. */
popq %rdi
swapgs
+ CLEAR_CPU_BUFFERS
jmp .Lnative_iret


@@ -774,6 +776,8 @@ native_irq_return_ldt:
*/
popq %rax /* Restore user RAX */

+ CLEAR_CPU_BUFFERS
+
/*
* RSP now points to an ordinary IRET frame, except that the page
* is read-only and RSP[31:16] are preloaded with the userspace
@@ -1502,6 +1506,12 @@ nmi_restore:
std
movq $0, 5*8(%rsp) /* clear "NMI executing" */

+ /*
+ * Skip CLEAR_CPU_BUFFERS here, since it only helps in rare cases like
+ * NMI in kernel after user state is restored. For an unprivileged user
+ * these conditions are hard to meet.
+ */
+
/*
* iretq reads the "iret" frame and exits the NMI stack in a
* single instruction. We are returning to kernel mode, so this
@@ -1520,6 +1530,7 @@ SYM_CODE_START(ignore_sysret)
UNWIND_HINT_END_OF_STACK
ENDBR
mov $-ENOSYS, %eax
+ CLEAR_CPU_BUFFERS
sysretl
SYM_CODE_END(ignore_sysret)
#endif
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 70150298f8bd..245697eb8485 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -271,6 +271,7 @@ SYM_INNER_LABEL(entry_SYSRETL_compat_unsafe_stack, SYM_L_GLOBAL)
xorl %r9d, %r9d
xorl %r10d, %r10d
swapgs
+ CLEAR_CPU_BUFFERS
sysretl
SYM_INNER_LABEL(entry_SYSRETL_compat_end, SYM_L_GLOBAL)
ANNOTATE_NOENDBR

--
2.34.1


2023-10-25 20:55:42

by Pawan Gupta

Subject: [PATCH v3 3/6] x86/entry_32: Add VERW just before userspace transition

As done for entry_64, add support for executing VERW late in the
exit-to-user path for 32-bit mode.

Signed-off-by: Pawan Gupta <[email protected]>
---
arch/x86/entry/entry_32.S | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 6e6af42e044a..74a4358c7f45 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -885,6 +885,7 @@ SYM_FUNC_START(entry_SYSENTER_32)
BUG_IF_WRONG_CR3 no_user_check=1
popfl
popl %eax
+ CLEAR_CPU_BUFFERS

/*
* Return back to the vDSO, which will pop ecx and edx.
@@ -954,6 +955,7 @@ restore_all_switch_stack:

/* Restore user state */
RESTORE_REGS pop=4 # skip orig_eax/error_code
+ CLEAR_CPU_BUFFERS
.Lirq_return:
/*
* ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
@@ -1146,6 +1148,7 @@ SYM_CODE_START(asm_exc_nmi)

/* Not on SYSENTER stack. */
call exc_nmi
+ CLEAR_CPU_BUFFERS
jmp .Lnmi_return

.Lnmi_from_sysenter_stack:

--
2.34.1


2023-10-25 20:55:49

by Pawan Gupta

Subject: [PATCH v3 1/6] x86/bugs: Add asm helpers for executing VERW

The MDS mitigation requires clearing the CPU buffers before returning to
user space. This needs to be done late in the exit-to-user path. The
current location of VERW leaves a possibility of kernel data ending up in
CPU buffers from memory accesses done after VERW, such as:

1. Kernel data accessed by an NMI between VERW and return-to-user can
remain in CPU buffers (since an NMI returning to kernel does not
execute VERW to clear CPU buffers).
2. Alyssa reported that after VERW is executed,
CONFIG_GCC_PLUGIN_STACKLEAK=y scrubs the stack used by a system
call. Memory accesses during stack scrubbing can move kernel stack
contents into CPU buffers.
3. When caller-saved registers are restored after a return from the
function executing VERW, the kernel stack accesses can remain in
CPU buffers (since they occur after VERW).

To fix this, VERW needs to be moved very late in the exit-to-user path.

In preparation for moving VERW to entry/exit asm code, create macros
that can be used in asm. Also make them depend on a new feature flag
X86_FEATURE_CLEAR_CPU_BUF.

Reported-by: Alyssa Milburn <[email protected]>
Suggested-by: Andrew Cooper <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Pawan Gupta <[email protected]>
---
arch/x86/entry/entry.S | 16 ++++++++++++++++
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 15 +++++++++++++++
3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index bfb7bcb362bc..f8ba0c0b6e60 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -6,6 +6,9 @@
#include <linux/linkage.h>
#include <asm/export.h>
#include <asm/msr-index.h>
+#include <asm/unwind_hints.h>
+#include <asm/segment.h>
+#include <asm/cache.h>

.pushsection .noinstr.text, "ax"

@@ -20,3 +23,16 @@ SYM_FUNC_END(entry_ibpb)
EXPORT_SYMBOL_GPL(entry_ibpb);

.popsection
+
+.pushsection .entry.text, "ax"
+
+.align L1_CACHE_BYTES, 0xcc
+SYM_CODE_START_NOALIGN(mds_verw_sel)
+ UNWIND_HINT_UNDEFINED
+ ANNOTATE_NOENDBR
+ .word __KERNEL_DS
+SYM_CODE_END(mds_verw_sel);
+/* For KVM */
+EXPORT_SYMBOL_GPL(mds_verw_sel);
+
+.popsection
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 58cb9495e40f..f21fc0f12737 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,10 +308,10 @@
#define X86_FEATURE_SMBA (11*32+21) /* "" Slow Memory Bandwidth Allocation */
#define X86_FEATURE_BMEC (11*32+22) /* "" Bandwidth Monitoring Event Configuration */
#define X86_FEATURE_USER_SHSTK (11*32+23) /* Shadow stack support for user mode applications */
-
#define X86_FEATURE_SRSO (11*32+24) /* "" AMD BTB untrain RETs */
#define X86_FEATURE_SRSO_ALIAS (11*32+25) /* "" AMD BTB untrain RETs through aliasing */
#define X86_FEATURE_IBPB_ON_VMEXIT (11*32+26) /* "" Issue an IBPB only on VMEXIT */
+#define X86_FEATURE_CLEAR_CPU_BUF (11*32+27) /* "" Clear CPU buffers */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index c55cc243592e..005e69f93115 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -329,6 +329,21 @@
#endif
.endm

+/*
+ * Macros to execute VERW instruction that mitigate transient data sampling
+ * attacks such as MDS. On affected systems a microcode update overloaded VERW
+ * instruction to also clear the CPU buffers. VERW clobbers EFLAGS.ZF.
+ *
+ * Note: Only the memory operand variant of VERW clears the CPU buffers.
+ */
+.macro EXEC_VERW
+ verw _ASM_RIP(mds_verw_sel)
+.endm
+
+.macro CLEAR_CPU_BUFFERS
+ ALTERNATIVE "", __stringify(EXEC_VERW), X86_FEATURE_CLEAR_CPU_BUF
+.endm
+
#else /* __ASSEMBLY__ */

#define ANNOTATE_RETPOLINE_SAFE \

--
2.34.1


2023-10-25 20:55:58

by Pawan Gupta

Subject: [PATCH v3 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation

During VMentry VERW is executed to mitigate MDS. After VERW, any memory
access like register push onto stack may put host data in MDS affected
CPU buffers. A guest can then use MDS to sample host data.

Although the likelihood of secrets surviving in registers at the current
VERW callsite is low, it can't be ruled out. Harden the MDS mitigation
by moving the VERW mitigation late in the VMentry path.

Note that VERW for MMIO Stale Data mitigation is unchanged because of
the complexity of per-guest conditional VERW which is not easy to handle
that late in asm with no GPRs available. If the CPU is also affected by
MDS, VERW is unconditionally executed late in asm regardless of guest
having MMIO access.

Signed-off-by: Pawan Gupta <[email protected]>
---
arch/x86/kvm/vmx/vmenter.S | 3 +++
arch/x86/kvm/vmx/vmx.c | 10 +++++++---
2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index b3b13ec04bac..139960deb736 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -161,6 +161,9 @@ SYM_FUNC_START(__vmx_vcpu_run)
/* Load guest RAX. This kills the @regs pointer! */
mov VCPU_RAX(%_ASM_AX), %_ASM_AX

+ /* Clobbers EFLAGS.ZF */
+ CLEAR_CPU_BUFFERS
+
/* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */
jnc .Lvmlaunch

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 24e8694b83fc..2d149589cf5b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7226,13 +7226,17 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,

guest_state_enter_irqoff();

- /* L1D Flush includes CPU buffer clear to mitigate MDS */
+ /*
+ * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
+ * mitigation for MDS is done late in VMentry and is still
+ * executed inspite of L1D Flush. This is because an extra VERW
+ * should not matter much after the big hammer L1D Flush.
+ */
if (static_branch_unlikely(&vmx_l1d_should_flush))
vmx_l1d_flush(vcpu);
- else if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF))
- mds_clear_cpu_buffers();
else if (static_branch_unlikely(&mmio_stale_data_clear) &&
kvm_arch_has_assigned_device(vcpu->kvm))
+ /* MMIO mitigation is mutually exclusive with MDS mitigation later in asm */
mds_clear_cpu_buffers();

vmx_disable_fb_clear(vmx);

--
2.34.1


2023-10-25 20:56:00

by Pawan Gupta

Subject: [PATCH v3 5/6] KVM: VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH

From: Sean Christopherson <[email protected]>

Use EFLAGS.CF instead of EFLAGS.ZF to track whether to use VMRESUME versus
VMLAUNCH. Freeing up EFLAGS.ZF will allow doing VERW, which clobbers ZF,
for MDS mitigations as late as possible without needing to duplicate VERW
for both paths.

Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Pawan Gupta <[email protected]>
---
arch/x86/kvm/vmx/run_flags.h | 7 +++++--
arch/x86/kvm/vmx/vmenter.S | 6 +++---
2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/run_flags.h b/arch/x86/kvm/vmx/run_flags.h
index edc3f16cc189..6a9bfdfbb6e5 100644
--- a/arch/x86/kvm/vmx/run_flags.h
+++ b/arch/x86/kvm/vmx/run_flags.h
@@ -2,7 +2,10 @@
#ifndef __KVM_X86_VMX_RUN_FLAGS_H
#define __KVM_X86_VMX_RUN_FLAGS_H

-#define VMX_RUN_VMRESUME (1 << 0)
-#define VMX_RUN_SAVE_SPEC_CTRL (1 << 1)
+#define VMX_RUN_VMRESUME_SHIFT 0
+#define VMX_RUN_SAVE_SPEC_CTRL_SHIFT 1
+
+#define VMX_RUN_VMRESUME BIT(VMX_RUN_VMRESUME_SHIFT)
+#define VMX_RUN_SAVE_SPEC_CTRL BIT(VMX_RUN_SAVE_SPEC_CTRL_SHIFT)

#endif /* __KVM_X86_VMX_RUN_FLAGS_H */
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index be275a0410a8..b3b13ec04bac 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -139,7 +139,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
mov (%_ASM_SP), %_ASM_AX

/* Check if vmlaunch or vmresume is needed */
- test $VMX_RUN_VMRESUME, %ebx
+ bt $VMX_RUN_VMRESUME_SHIFT, %ebx

/* Load guest registers. Don't clobber flags. */
mov VCPU_RCX(%_ASM_AX), %_ASM_CX
@@ -161,8 +161,8 @@ SYM_FUNC_START(__vmx_vcpu_run)
/* Load guest RAX. This kills the @regs pointer! */
mov VCPU_RAX(%_ASM_AX), %_ASM_AX

- /* Check EFLAGS.ZF from 'test VMX_RUN_VMRESUME' above */
- jz .Lvmlaunch
+ /* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */
+ jnc .Lvmlaunch

/*
* After a successful VMRESUME/VMLAUNCH, control flow "magically"

--
2.34.1


2023-10-25 21:10:52

by Andrew Cooper

Subject: Re: [PATCH v3 1/6] x86/bugs: Add asm helpers for executing VERW

On 25/10/2023 9:52 pm, Pawan Gupta wrote:
> diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
> index bfb7bcb362bc..f8ba0c0b6e60 100644
> --- a/arch/x86/entry/entry.S
> +++ b/arch/x86/entry/entry.S
> @@ -20,3 +23,16 @@ SYM_FUNC_END(entry_ibpb)
> EXPORT_SYMBOL_GPL(entry_ibpb);
>
> .popsection
> +
> +.pushsection .entry.text, "ax"
> +
> +.align L1_CACHE_BYTES, 0xcc
> +SYM_CODE_START_NOALIGN(mds_verw_sel)
> + UNWIND_HINT_UNDEFINED
> + ANNOTATE_NOENDBR
> + .word __KERNEL_DS

You need another .align here.  Otherwise subsequent code will still
start in this cacheline and defeat the purpose of trying to keep it
separate.

> +SYM_CODE_END(mds_verw_sel);

Thinking about it, should this really be CODE and not a data entry?

It lives in .entry.text but it really is data and objtool shouldn't be
writing ORC data for it at all.

(Not to mention that if it's marked as STT_OBJECT, objdump -d will do
the sensible thing and not even try to disassemble it).

~Andrew

P.S. Please CC on the full series.  Far less effort than fishing the
rest off lore.

2023-10-25 21:28:37

by Josh Poimboeuf

Subject: Re: [PATCH v3 1/6] x86/bugs: Add asm helpers for executing VERW

On Wed, Oct 25, 2023 at 10:10:41PM +0100, Andrew Cooper wrote:
> On 25/10/2023 9:52 pm, Pawan Gupta wrote:
> > diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
> > index bfb7bcb362bc..f8ba0c0b6e60 100644
> > --- a/arch/x86/entry/entry.S
> > +++ b/arch/x86/entry/entry.S
> > @@ -20,3 +23,16 @@ SYM_FUNC_END(entry_ibpb)
> > EXPORT_SYMBOL_GPL(entry_ibpb);
> >
> > .popsection
> > +
> > +.pushsection .entry.text, "ax"
> > +
> > +.align L1_CACHE_BYTES, 0xcc
> > +SYM_CODE_START_NOALIGN(mds_verw_sel)
> > + UNWIND_HINT_UNDEFINED
> > + ANNOTATE_NOENDBR
> > + .word __KERNEL_DS
>
> You need another .align here.  Otherwise subsequent code will still
> start in this cacheline and defeat the purpose of trying to keep it
> separate.
>
> > +SYM_CODE_END(mds_verw_sel);
>
> Thinking about it, should this really be CODE and not a data entry?
>
> It lives in .entry.text but it really is data and objtool shouldn't be
> writing ORC data for it at all.
>
> (Not to mention that if it's marked as STT_OBJECT, objdump -d will do
> the sensible thing and not even try to disassemble it).
>
> ~Andrew
>
> P.S. Please CC on the full series.  Far less effort than fishing the
> rest off lore.

+1 to putting it in .rodata or so.

--
Josh

2023-10-25 21:31:40

by Andrew Cooper

Subject: Re: [PATCH v3 1/6] x86/bugs: Add asm helpers for executing VERW

On 25/10/2023 10:28 pm, Josh Poimboeuf wrote:
> On Wed, Oct 25, 2023 at 10:10:41PM +0100, Andrew Cooper wrote:
>> On 25/10/2023 9:52 pm, Pawan Gupta wrote:
>>> diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
>>> index bfb7bcb362bc..f8ba0c0b6e60 100644
>>> --- a/arch/x86/entry/entry.S
>>> +++ b/arch/x86/entry/entry.S
>>> @@ -20,3 +23,16 @@ SYM_FUNC_END(entry_ibpb)
>>> EXPORT_SYMBOL_GPL(entry_ibpb);
>>>
>>> .popsection
>>> +
>>> +.pushsection .entry.text, "ax"
>>> +
>>> +.align L1_CACHE_BYTES, 0xcc
>>> +SYM_CODE_START_NOALIGN(mds_verw_sel)
>>> + UNWIND_HINT_UNDEFINED
>>> + ANNOTATE_NOENDBR
>>> + .word __KERNEL_DS
>> You need another .align here.  Otherwise subsequent code will still
>> start in this cacheline and defeat the purpose of trying to keep it
>> separate.
>>
>>> +SYM_CODE_END(mds_verw_sel);
>> Thinking about it, should this really be CODE and not a data entry?
>>
>> It lives in .entry.text but it really is data and objtool shouldn't be
>> writing ORC data for it at all.
>>
>> (Not to mention that if it's marked as STT_OBJECT, objdump -d will do
>> the sensible thing and not even try to disassemble it).
>>
>> ~Andrew
>>
>> P.S. Please CC on the full series.  Far less effort than fishing the
>> rest off lore.
> +1 to putting it in .rodata or so.

It's necessarily in .entry.text so it doesn't explode with KPTI active.

~Andrew

2023-10-25 21:49:26

by Josh Poimboeuf

Subject: Re: [PATCH v3 1/6] x86/bugs: Add asm helpers for executing VERW

On Wed, Oct 25, 2023 at 10:30:52PM +0100, Andrew Cooper wrote:
> On 25/10/2023 10:28 pm, Josh Poimboeuf wrote:
> > On Wed, Oct 25, 2023 at 10:10:41PM +0100, Andrew Cooper wrote:
> >> On 25/10/2023 9:52 pm, Pawan Gupta wrote:
> >>> diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
> >>> index bfb7bcb362bc..f8ba0c0b6e60 100644
> >>> --- a/arch/x86/entry/entry.S
> >>> +++ b/arch/x86/entry/entry.S
> >>> @@ -20,3 +23,16 @@ SYM_FUNC_END(entry_ibpb)
> >>> EXPORT_SYMBOL_GPL(entry_ibpb);
> >>>
> >>> .popsection
> >>> +
> >>> +.pushsection .entry.text, "ax"
> >>> +
> >>> +.align L1_CACHE_BYTES, 0xcc
> >>> +SYM_CODE_START_NOALIGN(mds_verw_sel)
> >>> + UNWIND_HINT_UNDEFINED
> >>> + ANNOTATE_NOENDBR
> >>> + .word __KERNEL_DS
> >> You need another .align here.  Otherwise subsequent code will still
> >> start in this cacheline and defeat the purpose of trying to keep it
> >> separate.
> >>
> >>> +SYM_CODE_END(mds_verw_sel);
> >> Thinking about it, should this really be CODE and not a data entry?
> >>
> >> It lives in .entry.text but it really is data and objtool shouldn't be
> >> writing ORC data for it at all.
> >>
> >> (Not to mention that if it's marked as STT_OBJECT, objdump -d will do
> >> the sensible thing and not even try to disassemble it).
> >>
> >> ~Andrew
> >>
> >> P.S. Please CC on the full series.  Far less effort than fishing the
> >> rest off lore.
> > +1 to putting it in .rodata or so.
>
> It's necessarily in .entry.text so it doesn't explode with KPTI active.

Ah, right. In general tooling doesn't take too kindly to putting data
in a text section. But it might be ok.

--
Josh

2023-10-25 22:07:58

by Pawan Gupta

Subject: Re: [PATCH v3 1/6] x86/bugs: Add asm helpers for executing VERW

On Wed, Oct 25, 2023 at 10:10:41PM +0100, Andrew Cooper wrote:
> > +.align L1_CACHE_BYTES, 0xcc
> > +SYM_CODE_START_NOALIGN(mds_verw_sel)
> > + UNWIND_HINT_UNDEFINED
> > + ANNOTATE_NOENDBR
> > + .word __KERNEL_DS
>
> You need another .align here.  Otherwise subsequent code will still
> start in this cacheline and defeat the purpose of trying to keep it
> separate.

Right.

> > +SYM_CODE_END(mds_verw_sel);
>
> Thinking about it, should this really be CODE and not a data entry?

Would that require adding a data equivalent of .entry.text and update
KPTI to keep it mapped? Or is there an easier option?

> P.S. Please CC on the full series.  Far less effort than fishing the
> rest off lore.

I didn't realize get_maintainer.pl isn't doing that already. Proposing
below update to MAINTAINERS:

---
From: Pawan Gupta <[email protected]>
Date: Wed, 25 Oct 2023 14:50:41 -0700
Subject: [PATCH] MAINTAINERS: Update entry for X86 HARDWARE VULNERABILITIES

Add Andrew Cooper to maintainers of hardware vulnerabilities
mitigations.

Signed-off-by: Pawan Gupta <[email protected]>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2894f0777537..bf8c8707b8f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23382,6 +23382,7 @@ M: Thomas Gleixner <[email protected]>
M: Borislav Petkov <[email protected]>
M: Peter Zijlstra <[email protected]>
M: Josh Poimboeuf <[email protected]>
+M: Andrew Cooper <[email protected]>
R: Pawan Gupta <[email protected]>
S: Maintained
F: Documentation/admin-guide/hw-vuln/
--
2.34.1

2023-10-25 22:14:07

by Andrew Cooper

Subject: Re: [PATCH v3 1/6] x86/bugs: Add asm helpers for executing VERW

On 25/10/2023 11:07 pm, Pawan Gupta wrote:
> On Wed, Oct 25, 2023 at 10:10:41PM +0100, Andrew Cooper wrote:
>>> +.align L1_CACHE_BYTES, 0xcc
>>> +SYM_CODE_START_NOALIGN(mds_verw_sel)
>>> + UNWIND_HINT_UNDEFINED
>>> + ANNOTATE_NOENDBR
>>> + .word __KERNEL_DS
>> You need another .align here.  Otherwise subsequent code will still
>> start in this cacheline and defeat the purpose of trying to keep it
>> separate.
> Right.
>
>>> +SYM_CODE_END(mds_verw_sel);
>> Thinking about it, should this really be CODE and not a data entry?
> Would that require adding a data equivalent of .entry.text and update
> KPTI to keep it mapped? Or is there an easier option?

Leave it right here in .entry.text , but try using SYM_DATA() and
friends.  See whether objtool vomits over the result or not.

And if objtool does vomit over the result, then leaving it as it is in
this patch with SYM_CODE() is good enough.
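
Untested sketch of what I have in mind - same .entry.text placement and
alignment, just with the data annotations (and a trailing .align so
nothing else shares the cacheline):

    .pushsection .entry.text, "ax"

    .align L1_CACHE_BYTES, 0xcc
    SYM_DATA_START(mds_verw_sel)
    	.word __KERNEL_DS
    	.align L1_CACHE_BYTES, 0xcc
    SYM_DATA_END(mds_verw_sel)
    /* For KVM */
    EXPORT_SYMBOL_GPL(mds_verw_sel);

    .popsection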

>
>> P.S. Please CC on the full series.  Far less effort than fishing the
>> rest off lore.
> I didn't realize get_maintainer.pl isn't doing that already. Proposing
> below update to MAINTAINERS:
>
> ---
> From: Pawan Gupta <[email protected]>
> Date: Wed, 25 Oct 2023 14:50:41 -0700
> Subject: [PATCH] MAINTAINERS: Update entry for X86 HARDWARE VULNERABILITIES
>
> Add Andrew Cooper to maintainers of hardware vulnerabilities
> mitigations.
>
> Signed-off-by: Pawan Gupta <[email protected]>
> ---
> MAINTAINERS | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2894f0777537..bf8c8707b8f8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -23382,6 +23382,7 @@ M: Thomas Gleixner <[email protected]>
> M: Borislav Petkov <[email protected]>
> M: Peter Zijlstra <[email protected]>
> M: Josh Poimboeuf <[email protected]>
> +M: Andrew Cooper <[email protected]>

Oh, right.  Perhaps R rather than M seeing as I can't make any time
commitments, but sure.

~Andrew

2023-10-26 13:46:17

by Nikolay Borisov

Subject: Re: [PATCH v3 1/6] x86/bugs: Add asm helpers for executing VERW



<snip>
> +
> +.pushsection .entry.text, "ax"
> +
> +.align L1_CACHE_BYTES, 0xcc
> +SYM_CODE_START_NOALIGN(mds_verw_sel)
> + UNWIND_HINT_UNDEFINED
> + ANNOTATE_NOENDBR
> + .word __KERNEL_DS
> +SYM_CODE_END(mds_verw_sel);
> +/* For KVM */
> +EXPORT_SYMBOL_GPL(mds_verw_sel);
> +
> +.popsection

<snip>

> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index c55cc243592e..005e69f93115 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -329,6 +329,21 @@
> #endif
> .endm
>
> +/*
> + * Macros to execute VERW instruction that mitigate transient data sampling
> + * attacks such as MDS. On affected systems a microcode update overloaded VERW
> + * instruction to also clear the CPU buffers. VERW clobbers EFLAGS.ZF.
> + *
> + * Note: Only the memory operand variant of VERW clears the CPU buffers.
> + */
> +.macro EXEC_VERW
> + verw _ASM_RIP(mds_verw_sel)
> +.endm
> +
> +.macro CLEAR_CPU_BUFFERS
> + ALTERNATIVE "", __stringify(EXEC_VERW), X86_FEATURE_CLEAR_CPU_BUF
> +.endm


What happened with the first 5 bytes of a 7 byte nop being complemented
by __KERNEL_DS in order to handle VERW being executed after user
registers are restored and having its memory operand ?

> +
> #else /* __ASSEMBLY__ */
>
> #define ANNOTATE_RETPOLINE_SAFE \
>

2023-10-26 13:59:10

by Andrew Cooper

Subject: Re: [PATCH v3 1/6] x86/bugs: Add asm helpers for executing VERW

On 26/10/2023 2:44 pm, Nikolay Borisov wrote:
>
>
> <snip>
>> +
>> +.pushsection .entry.text, "ax"
>> +
>> +.align L1_CACHE_BYTES, 0xcc
>> +SYM_CODE_START_NOALIGN(mds_verw_sel)
>> +    UNWIND_HINT_UNDEFINED
>> +    ANNOTATE_NOENDBR
>> +    .word __KERNEL_DS
>> +SYM_CODE_END(mds_verw_sel);
>> +/* For KVM */
>> +EXPORT_SYMBOL_GPL(mds_verw_sel);
>> +
>> +.popsection
>
> <snip>
>
>> diff --git a/arch/x86/include/asm/nospec-branch.h
>> b/arch/x86/include/asm/nospec-branch.h
>> index c55cc243592e..005e69f93115 100644
>> --- a/arch/x86/include/asm/nospec-branch.h
>> +++ b/arch/x86/include/asm/nospec-branch.h
>> @@ -329,6 +329,21 @@
>>   #endif
>>   .endm
>>   +/*
>> + * Macros to execute VERW instruction that mitigate transient data
>> sampling
>> + * attacks such as MDS. On affected systems a microcode update
>> overloaded VERW
>> + * instruction to also clear the CPU buffers. VERW clobbers EFLAGS.ZF.
>> + *
>> + * Note: Only the memory operand variant of VERW clears the CPU
>> buffers.
>> + */
>> +.macro EXEC_VERW
>> +    verw _ASM_RIP(mds_verw_sel)
>> +.endm
>> +
>> +.macro CLEAR_CPU_BUFFERS
>> +    ALTERNATIVE "", __stringify(EXEC_VERW), X86_FEATURE_CLEAR_CPU_BUF
>> +.endm
>
>
> What happened with the first 5 bytes of a 7 byte nop being
> complemented by __KERNEL_DS in order to handle VERW being executed
> after user registers are restored and having its memory operand ?

It was moved out of line (so no need to hide a constant in a nop),
deduped, and renamed to mds_verw_sel.

verw _ASM_RIP(mds_verw_sel) *is* the memory form.
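
i.e. on 64-bit it expands to roughly:

    verw mds_verw_sel(%rip)	/* memory operand form - clears CPU buffers */

as opposed to the register form (e.g. "verw %cx"), which does not clear
the CPU buffers.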

~Andrew

2023-10-26 16:15:00

by Nikolay Borisov

Subject: Re: [PATCH v3 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation



On 25.10.23 г. 23:53 ч., Pawan Gupta wrote:
> During VMentry VERW is executed to mitigate MDS. After VERW, any memory
> access like register push onto stack may put host data in MDS affected
> CPU buffers. A guest can then use MDS to sample host data.
>
> Although the likelihood of secrets surviving in registers at the current
> VERW callsite is low, it can't be ruled out. Harden the MDS mitigation
> by moving the VERW mitigation late in the VMentry path.
>
> Note that VERW for MMIO Stale Data mitigation is unchanged because of
> the complexity of per-guest conditional VERW which is not easy to handle
> that late in asm with no GPRs available. If the CPU is also affected by
> MDS, VERW is unconditionally executed late in asm regardless of guest
> having MMIO access.
>
> Signed-off-by: Pawan Gupta <[email protected]>
> ---
> arch/x86/kvm/vmx/vmenter.S | 3 +++
> arch/x86/kvm/vmx/vmx.c | 10 +++++++---
> 2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index b3b13ec04bac..139960deb736 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -161,6 +161,9 @@ SYM_FUNC_START(__vmx_vcpu_run)
> /* Load guest RAX. This kills the @regs pointer! */
> mov VCPU_RAX(%_ASM_AX), %_ASM_AX
>
> + /* Clobbers EFLAGS.ZF */
> + CLEAR_CPU_BUFFERS
> +
> /* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */
> jnc .Lvmlaunch
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 24e8694b83fc..2d149589cf5b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7226,13 +7226,17 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>
> guest_state_enter_irqoff();
>
> - /* L1D Flush includes CPU buffer clear to mitigate MDS */
> + /*
> + * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
> + * mitigation for MDS is done late in VMentry and is still
> + * executed inspite of L1D Flush. This is because an extra VERW
> + * should not matter much after the big hammer L1D Flush.
> + */
> if (static_branch_unlikely(&vmx_l1d_should_flush))
> vmx_l1d_flush(vcpu);
> - else if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF))
> - mds_clear_cpu_buffers();
> else if (static_branch_unlikely(&mmio_stale_data_clear) &&
> kvm_arch_has_assigned_device(vcpu->kvm))
> + /* MMIO mitigation is mutually exclusive with MDS mitigation later in asm */

Mutually exclusive implies that you have one or the other but not both,
whilst I think the right formulation here is redundant? Because if mmio
is enabled mds_clear_cpu_buffers() will clear the buffers here and
later they'll be cleared again, no ? Alternatively you might augment
this check to only execute iff X86_FEATURE_CLEAR_CPU_BUF is not set?

> mds_clear_cpu_buffers();
>
> vmx_disable_fb_clear(vmx);
>

2023-10-26 16:26:52

by Nikolay Borisov

Subject: Re: [PATCH v3 2/6] x86/entry_64: Add VERW just before userspace transition



On 25.10.23 г. 23:52 ч., Pawan Gupta wrote:

<snip>

> @@ -1520,6 +1530,7 @@ SYM_CODE_START(ignore_sysret)
> UNWIND_HINT_END_OF_STACK
> ENDBR
> mov $-ENOSYS, %eax
> + CLEAR_CPU_BUFFERS

nit: Just out of curiosity is it really needed in this case or it's
doesn for the sake of uniformity so that all ring3 transitions are
indeed covered??

> sysretl
> SYM_CODE_END(ignore_sysret)
> #endif
> diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
> index 70150298f8bd..245697eb8485 100644
> --- a/arch/x86/entry/entry_64_compat.S
> +++ b/arch/x86/entry/entry_64_compat.S
> @@ -271,6 +271,7 @@ SYM_INNER_LABEL(entry_SYSRETL_compat_unsafe_stack, SYM_L_GLOBAL)
> xorl %r9d, %r9d
> xorl %r10d, %r10d
> swapgs
> + CLEAR_CPU_BUFFERS
> sysretl
> SYM_INNER_LABEL(entry_SYSRETL_compat_end, SYM_L_GLOBAL)
> ANNOTATE_NOENDBR
>

2023-10-26 19:08:41

by Pawan Gupta

Subject: Re: [PATCH v3 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation

On Thu, Oct 26, 2023 at 07:14:18PM +0300, Nikolay Borisov wrote:
> > if (static_branch_unlikely(&vmx_l1d_should_flush))
> > vmx_l1d_flush(vcpu);
> > - else if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF))
> > - mds_clear_cpu_buffers();
> > else if (static_branch_unlikely(&mmio_stale_data_clear) &&
> > kvm_arch_has_assigned_device(vcpu->kvm))
> > + /* MMIO mitigation is mutually exclusive with MDS mitigation later in asm */
>
> Mutually exclusive implies that you have one or the other but not both,
> whilst I think the right formulation here is redundant? Because if mmio is
> enabled mds_clear_cpu_buffers() will clear the buffers here and later
> they'll be cleared again, no ?

No, because when mmio_stale_data_clear is enabled,
X86_FEATURE_CLEAR_CPU_BUF will not be set because of how mitigation is
selected in mmio_select_mitigation():

mmio_select_mitigation()
{
...
/*
* Enable CPU buffer clear mitigation for host and VMM if also affected
* by MDS or TAA. Otherwise, enable mitigation for VMM only.
*/
if (boot_cpu_has_bug(X86_BUG_MDS) || (boot_cpu_has_bug(X86_BUG_TAA) &&
boot_cpu_has(X86_FEATURE_RTM)))
setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
else
static_branch_enable(&mmio_stale_data_clear);

> Alternatively you might augment this check to only execute iff
> X86_FEATURE_CLEAR_CPU_BUF is not set?

It already is like that due to the logic above. That is what the
comment:

/* MMIO mitigation is mutually exclusive with MDS mitigation later in asm */

... is trying to convey. Suggestions welcome to improve the comment.

2023-10-26 19:30:27

by Pawan Gupta

Subject: Re: [PATCH v3 2/6] x86/entry_64: Add VERW just before userspace transition

On Thu, Oct 26, 2023 at 07:25:27PM +0300, Nikolay Borisov wrote:
>
>
> On 25.10.23 г. 23:52 ч., Pawan Gupta wrote:
>
> <snip>
>
> > @@ -1520,6 +1530,7 @@ SYM_CODE_START(ignore_sysret)
> > UNWIND_HINT_END_OF_STACK
> > ENDBR
> > mov $-ENOSYS, %eax
> > + CLEAR_CPU_BUFFERS
>
> nit: Just out of curiosity is it really needed in this case or it's doesn
> for the sake of uniformity so that all ring3 transitions are indeed
> covered??

Interrupts returning to kernel don't clear the CPU buffers. I believe
interrupts will be enabled here, and getting an interrupt here could
leak the data that interrupt touched.

2023-10-26 19:31:19

by Sean Christopherson

Subject: Re: [PATCH v3 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation

On Wed, Oct 25, 2023, Pawan Gupta wrote:
> During VMentry VERW is executed to mitigate MDS. After VERW, any memory
> access like register push onto stack may put host data in MDS affected
> CPU buffers. A guest can then use MDS to sample host data.
>
> Although the likelihood of secrets surviving in registers at the current
> VERW callsite is low, it can't be ruled out. Harden the MDS mitigation
> by moving the VERW mitigation late in the VMentry path.
>
> Note that VERW for MMIO Stale Data mitigation is unchanged because of
> the complexity of per-guest conditional VERW which is not easy to handle
> that late in asm with no GPRs available. If the CPU is also affected by
> MDS, VERW is unconditionally executed late in asm regardless of guest
> having MMIO access.
>
> Signed-off-by: Pawan Gupta <[email protected]>
> ---
> arch/x86/kvm/vmx/vmenter.S | 3 +++
> arch/x86/kvm/vmx/vmx.c | 10 +++++++---
> 2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index b3b13ec04bac..139960deb736 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -161,6 +161,9 @@ SYM_FUNC_START(__vmx_vcpu_run)
> /* Load guest RAX. This kills the @regs pointer! */
> mov VCPU_RAX(%_ASM_AX), %_ASM_AX
>
> + /* Clobbers EFLAGS.ZF */
> + CLEAR_CPU_BUFFERS
> +
> /* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */
> jnc .Lvmlaunch
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 24e8694b83fc..2d149589cf5b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7226,13 +7226,17 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>
> guest_state_enter_irqoff();
>
> - /* L1D Flush includes CPU buffer clear to mitigate MDS */
> + /*
> + * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
> + * mitigation for MDS is done late in VMentry and is still
> + * executed inspite of L1D Flush. This is because an extra VERW

in spite

> + * should not matter much after the big hammer L1D Flush.
> + */
> if (static_branch_unlikely(&vmx_l1d_should_flush))
> vmx_l1d_flush(vcpu);

There's an existing bug here. vmx_1ld_flush() is not guaranteed to do a flush in
"conditional mode", and is not guaranteed to do a ucode-based flush (though I can't
tell if it's possible for the VERW magic to exist without X86_FEATURE_FLUSH_L1D).

If we care, something like the diff at the bottom is probably needed.

> - else if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF))
> - mds_clear_cpu_buffers();
> else if (static_branch_unlikely(&mmio_stale_data_clear) &&
> kvm_arch_has_assigned_device(vcpu->kvm))
> + /* MMIO mitigation is mutually exclusive with MDS mitigation later in asm */

Please don't put comments inside an if/elif without curly braces (and I don't
want to add curly braces). Though I think that's a moot point if we first fix
the conditional L1D flush issue. E.g. when the dust settles we can end up with:

/*
* Note, a ucode-based L1D flush also flushes CPU buffers, i.e. the
* manual VERW in __vmx_vcpu_run() to mitigate MDS *may* be redundant.
* But an L1D Flush is not guaranteed for "conditional mode", and the
* cost of an extra VERW after a full L1D flush is negligible.
*/
if (static_branch_unlikely(&vmx_l1d_should_flush))
cpu_buffers_flushed = vmx_l1d_flush(vcpu);

/*
* The MMIO stale data vulnerability is a subset of the general MDS
* vulnerability, i.e. this is mutually exclusive with the VERW that's
* done just before VM-Enter. The vulnerability requires the attacker,
* i.e. the guest, to do MMIO, so this "clear" can be done earlier.
*/
if (static_branch_unlikely(&mmio_stale_data_clear) &&
!cpu_buffers_flushed && kvm_arch_has_assigned_device(vcpu->kvm))
mds_clear_cpu_buffers();

> mds_clear_cpu_buffers();
>
> vmx_disable_fb_clear(vmx);

LOL, nice. IIUC, setting FB_CLEAR_DIS is mutually exclusive with doing a late
VERW, as KVM will never set FB_CLEAR_DIS if the CPU is susceptible to X86_BUG_MDS.
But the checks aren't identical, which makes this _look_ sketchy.

Can you do something like this to ensure we don't accidentally neuter the late VERW?

static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
{
vmx->disable_fb_clear = (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
!boot_cpu_has_bug(X86_BUG_MDS) &&
!boot_cpu_has_bug(X86_BUG_TAA);

if (vmx->disable_fb_clear &&
WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF)))
vmx->disable_fb_clear = false;

...
}

--
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6e502ba93141..cf6e06bb8310 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6606,8 +6606,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
* is not exactly LRU. This could be sized at runtime via topology
* information but as all relevant affected CPUs have 32KiB L1D cache size
* there is no point in doing so.
+ *
+ * Returns %true if CPU buffers were cleared, i.e. if a microcode-based L1D
+ * flush was executed (which also clears CPU buffers).
*/
-static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
+static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
{
int size = PAGE_SIZE << L1D_CACHE_ORDER;

@@ -6634,14 +6637,14 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
kvm_clear_cpu_l1tf_flush_l1d();

if (!flush_l1d)
- return;
+ return false;
}

vcpu->stat.l1d_flush++;

if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
native_wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
- return;
+ return true;
}

asm volatile(
@@ -6665,6 +6668,8 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
:: [flush_pages] "r" (vmx_l1d_flush_pages),
[size] "r" (size)
: "eax", "ebx", "ecx", "edx");
+
+ return false;
}

static void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -7222,16 +7227,17 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
unsigned int flags)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ bool cpu_buffers_flushed = false;

guest_state_enter_irqoff();

- /* L1D Flush includes CPU buffer clear to mitigate MDS */
if (static_branch_unlikely(&vmx_l1d_should_flush))
- vmx_l1d_flush(vcpu);
- else if (static_branch_unlikely(&mds_user_clear))
- mds_clear_cpu_buffers();
- else if (static_branch_unlikely(&mmio_stale_data_clear) &&
- kvm_arch_has_assigned_device(vcpu->kvm))
+ cpu_buffers_flushed = vmx_l1d_flush(vcpu);
+
+ if ((static_branch_unlikely(&mds_user_clear) ||
+ (static_branch_unlikely(&mmio_stale_data_clear) &&
+ kvm_arch_has_assigned_device(vcpu->kvm))) &&
+ !cpu_buffers_flushed)
mds_clear_cpu_buffers();

vmx_disable_fb_clear(vmx);

2023-10-26 19:41:13

by Dave Hansen

Subject: Re: [PATCH v3 2/6] x86/entry_64: Add VERW just before userspace transition

On 10/26/23 12:29, Pawan Gupta wrote:
> On Thu, Oct 26, 2023 at 07:25:27PM +0300, Nikolay Borisov wrote:
>> On 25.10.23 г. 23:52 ч., Pawan Gupta wrote:
>>> @@ -1520,6 +1530,7 @@ SYM_CODE_START(ignore_sysret)
>>> UNWIND_HINT_END_OF_STACK
>>> ENDBR
>>> mov $-ENOSYS, %eax
>>> + CLEAR_CPU_BUFFERS
>> nit: Just out of curiosity is it really needed in this case or it's doesn
>> for the sake of uniformity so that all ring3 transitions are indeed
>> covered??
> Interrupts returning to kernel don't clear the CPU buffers. I believe
> interrupts will be enabled here, and getting an interrupt here could
> leak the data that interrupt touched.

Specifically NMIs, right?

X86_EFLAGS_IF should be clear here.

2023-10-26 20:18:06

by Sean Christopherson

Subject: Re: [PATCH v3 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation

On Thu, Oct 26, 2023, Sean Christopherson wrote:
> On Wed, Oct 25, 2023, Pawan Gupta wrote:
> > vmx_disable_fb_clear(vmx);
>
> LOL, nice. IIUC, setting FB_CLEAR_DIS is mutually exclusive with doing a late
> VERW, as KVM will never set FB_CLEAR_DIS if the CPU is susceptible to X86_BUG_MDS.
> But the checks aren't identical, which makes this _look_ sketchy.
>
> Can you do something like this to ensure we don't accidentally neuter the late VERW?
>
> static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
> {
> vmx->disable_fb_clear = (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
> !boot_cpu_has_bug(X86_BUG_MDS) &&
> !boot_cpu_has_bug(X86_BUG_TAA);
>
> if (vmx->disable_fb_clear &&
> WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF)))
> vmx->disable_fb_clear = false;
>
> ...
> }

Alternatively, and maybe even preferably, this would make it more obvious that
the two are mutually exclusive and would also be a (very, very) small perf win
when the mitigation is enabled.

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0936516cb93b..592103df1754 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7236,7 +7236,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
kvm_arch_has_assigned_device(vcpu->kvm))
mds_clear_cpu_buffers();

- vmx_disable_fb_clear(vmx);
+ if (!cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF))
+ vmx_disable_fb_clear(vmx);

if (vcpu->arch.cr2 != native_read_cr2())
native_write_cr2(vcpu->arch.cr2);
@@ -7249,7 +7250,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,

vmx->idt_vectoring_info = 0;

- vmx_enable_fb_clear(vmx);
+ if (!cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF))
+ vmx_enable_fb_clear(vmx);

if (unlikely(vmx->fail)) {
vmx->exit_reason.full = 0xdead;

2023-10-26 20:48:34

by Pawan Gupta

Subject: Re: [PATCH v3 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation

On Thu, Oct 26, 2023 at 12:30:55PM -0700, Sean Christopherson wrote:
> > - /* L1D Flush includes CPU buffer clear to mitigate MDS */
> > + /*
> > + * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
> > + * mitigation for MDS is done late in VMentry and is still
> > + * executed inspite of L1D Flush. This is because an extra VERW
>
> in spite

Ok.

> > + * should not matter much after the big hammer L1D Flush.
> > + */
> > if (static_branch_unlikely(&vmx_l1d_should_flush))
> > vmx_l1d_flush(vcpu);
>
> There's an existing bug here. vmx_1ld_flush() is not guaranteed to do a flush in
> "conditional mode", and is not guaranteed to do a ucode-based flush

AFAICT, it is based on the condition whether after a VMexit any
sensitive data could have been touched or not. If L1TF mitigation
doesn't consider certain data sensitive and skips L1D flush, executing
VERW isn't giving any protection, since that data can anyways be leaked
from L1D using L1TF.

> (though I can't tell if it's possible for the VERW magic to exist
> without X86_FEATURE_FLUSH_L1D).

Likely not; ucode that adds VERW should also have X86_FEATURE_FLUSH_L1D,
as the L1TF mitigation predates the MDS one.

> If we care, something like the diff at the bottom is probably needed.
>
> > - else if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF))
> > - mds_clear_cpu_buffers();
> > else if (static_branch_unlikely(&mmio_stale_data_clear) &&
> > kvm_arch_has_assigned_device(vcpu->kvm))
> > + /* MMIO mitigation is mutually exclusive with MDS mitigation later in asm */
>
> Please don't put comments inside an if/elif without curly braces (and I don't
> want to add curly braces). Though I think that's a moot point if we first fix
> the conditional L1D flush issue. E.g. when the dust settles we can end up with:

Ok.

> /*
> * Note, a ucode-based L1D flush also flushes CPU buffers, i.e. the
> * manual VERW in __vmx_vcpu_run() to mitigate MDS *may* be redundant.
> * But an L1D Flush is not guaranteed for "conditional mode", and the
> * cost of an extra VERW after a full L1D flush is negligible.
> */
> if (static_branch_unlikely(&vmx_l1d_should_flush))
> cpu_buffers_flushed = vmx_l1d_flush(vcpu);
>
> /*
> * The MMIO stale data vulnerability is a subset of the general MDS
> * vulnerability, i.e. this is mutually exclusive with the VERW that's
> * done just before VM-Enter. The vulnerability requires the attacker,
> * i.e. the guest, to do MMIO, so this "clear" can be done earlier.
> */
> if (static_branch_unlikely(&mmio_stale_data_clear) &&
> !cpu_buffers_flushed && kvm_arch_has_assigned_device(vcpu->kvm))
> mds_clear_cpu_buffers();

This is certainly better, but I don't know what scenario is this helping with.

> > mds_clear_cpu_buffers();
> >
> > vmx_disable_fb_clear(vmx);
>
> LOL, nice. IIUC, setting FB_CLEAR_DIS is mutually exclusive with doing a late
> VERW, as KVM will never set FB_CLEAR_DIS if the CPU is susceptible to X86_BUG_MDS.
> But the checks aren't identical, which makes this _look_ sketchy.
>
> Can you do something like this to ensure we don't accidentally neuter the late VERW?
>
> static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
> {
> vmx->disable_fb_clear = (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
> !boot_cpu_has_bug(X86_BUG_MDS) &&
> !boot_cpu_has_bug(X86_BUG_TAA);
>
> if (vmx->disable_fb_clear &&
> WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF)))
> vmx->disable_fb_clear = false;

Will do, this makes a lot of sense.

2023-10-26 21:16:12

by Pawan Gupta

Subject: Re: [PATCH v3 2/6] x86/entry_64: Add VERW just before userspace transition

On Thu, Oct 26, 2023 at 12:40:49PM -0700, Dave Hansen wrote:
> On 10/26/23 12:29, Pawan Gupta wrote:
> > On Thu, Oct 26, 2023 at 07:25:27PM +0300, Nikolay Borisov wrote:
> >> On 25.10.23 г. 23:52 ч., Pawan Gupta wrote:
> >>> @@ -1520,6 +1530,7 @@ SYM_CODE_START(ignore_sysret)
> >>> UNWIND_HINT_END_OF_STACK
> >>> ENDBR
> >>> mov $-ENOSYS, %eax
> >>> + CLEAR_CPU_BUFFERS
> >> nit: Just out of curiosity is it really needed in this case or it's doesn
> >> for the sake of uniformity so that all ring3 transitions are indeed
> >> covered??
> > Interrupts returning to kernel don't clear the CPU buffers. I believe
> > interrupts will be enabled here, and getting an interrupt here could
> > leak the data that interrupt touched.
>
> Specifically NMIs, right?

Yes, and VERW can be omitted for the same reason as NMI returning to
kernel.

> X86_EFLAGS_IF should be clear here.

I see that SYSCALL has a configuration for IF, but I didn't see it for
SYSENTER in the code. But looking at the SDM, it clears IF by default.

syscall_init()
{
...
#else
wrmsrl_cstar((unsigned long)ignore_sysret);
wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
#endif

/*
* Flags to clear on syscall; clear as much as possible
* to minimize user space-kernel interference.
*/
wrmsrl(MSR_SYSCALL_MASK,
X86_EFLAGS_CF|X86_EFLAGS_PF|X86_EFLAGS_AF|
X86_EFLAGS_ZF|X86_EFLAGS_SF|X86_EFLAGS_TF|
X86_EFLAGS_IF|X86_EFLAGS_DF|X86_EFLAGS_OF|
X86_EFLAGS_IOPL|X86_EFLAGS_NT|X86_EFLAGS_RF|
X86_EFLAGS_AC|X86_EFLAGS_ID);

2023-10-26 21:23:18

by Sean Christopherson

Subject: Re: [PATCH v3 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation

On Thu, Oct 26, 2023, Pawan Gupta wrote:
> On Thu, Oct 26, 2023 at 12:30:55PM -0700, Sean Christopherson wrote:
> > > if (static_branch_unlikely(&vmx_l1d_should_flush))
> > > vmx_l1d_flush(vcpu);
> >
> > There's an existing bug here. vmx_1ld_flush() is not guaranteed to do a flush in
> > "conditional mode", and is not guaranteed to do a ucode-based flush
>
> AFAICT, it is based on the condition whether after a VMexit any
> sensitive data could have been touched or not. If L1TF mitigation
> doesn't consider certain data sensitive and skips L1D flush, executing
> VERW isn't giving any protection, since that data can anyways be leaked
> from L1D using L1TF.

That assumes vcpu->arch.l1tf_flush_l1d is 100% precise and accurate, which is most
definitely not the case. You're also preventing the admin from choosing between
being super paranoid (always flush L1D) and mostly paranoid (conditionally flush
L1D, always flush CPU buffers).

AIUI, flushing the L1D is crazy expensive compared to flushing the CPU buffers,
so it's entirely plausible for someone to want to choose the mostly paranoid
option.

Side topic, isn't the NMI path missing a call to kvm_set_cpu_l1tf_flush_l1d()?

> > /*
> > * The MMIO stale data vulnerability is a subset of the general MDS
> > * vulnerability, i.e. this is mutually exclusive with the VERW that's
> > * done just before VM-Enter. The vulnerability requires the attacker,
> > * i.e. the guest, to do MMIO, so this "clear" can be done earlier.
> > */
> > if (static_branch_unlikely(&mmio_stale_data_clear) &&
> > !cpu_buffers_flushed && kvm_arch_has_assigned_device(vcpu->kvm))
> > mds_clear_cpu_buffers();
>
> This is certainly better, but I don't know what scenario is this helping with.

Heh, that's how I feel about moving VERW to just before VM-Enter. I have a hard
time believing there's meaningful sensitive data that's accessed in __vmx_vcpu_run().
The closest thing is probably CR2, but that's a very dubious vector since CR2 will
hold a guest value for most VM-Enters.

I'm not against moving VERW close to VM-Enter because it's relatively straightforward,
but if we're going to be super paranoid, why not go all the way and not have to
worry about what ifs?

2023-10-26 21:28:08

by Pawan Gupta

Subject: Re: [PATCH v3 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation

On Thu, Oct 26, 2023 at 01:17:47PM -0700, Sean Christopherson wrote:
> Alternatively, and maybe even preferably, this would make it more obvious that
> the two are mutually exclusive and would also be a (very, very) small perf win
> when the mitigation is enabled.

Agree.

> - vmx_disable_fb_clear(vmx);
> + if (!cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF))
> + vmx_disable_fb_clear(vmx);

2023-10-26 22:04:04

by Pawan Gupta

Subject: Re: [PATCH v3 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation

On Thu, Oct 26, 2023 at 02:22:58PM -0700, Sean Christopherson wrote:
> On Thu, Oct 26, 2023, Pawan Gupta wrote:
> > On Thu, Oct 26, 2023 at 12:30:55PM -0700, Sean Christopherson wrote:
> > > > if (static_branch_unlikely(&vmx_l1d_should_flush))
> > > > vmx_l1d_flush(vcpu);
> > >
> > > There's an existing bug here. vmx_1ld_flush() is not guaranteed to do a flush in
> > > "conditional mode", and is not guaranteed to do a ucode-based flush
> >
> > AFAICT, it is based on the condition whether after a VMexit any
> > sensitive data could have been touched or not. If L1TF mitigation
> > doesn't consider certain data sensitive and skips L1D flush, executing
> > VERW isn't giving any protection, since that data can anyways be leaked
> > from L1D using L1TF.
>
> That assumes vcpu->arch.l1tf_flush_l1d is 100% precise and accurate, which is most
> definitely not the case. You're also preventing the admin from choosing between
> being super paranoid (always flush L1D) and mostly paranoid (conditionally flush
> L1D, always flush CPU buffers).
> AIUI, flushing the L1D is crazy expensive compared to flushing the CPU buffers,
> so it's entirely plausible for someone to want to choose the mostly paranoid
> option.

Sure, if it helps an admin. I was asking about the problematic scenario
out of curiosity. BTW, the changes you suggested are definitely worth
doing.

> Side topic, isn't the NMI path missing a call to kvm_set_cpu_l1tf_flush_l1d()?

Yes, it is missing. Not sure if it was omitted intentionally.

> > This is certainly better, but I don't know what scenario is this helping with.
>
> Heh, that's how I feel about moving VERW to just before VM-Enter. I have a hard
> time believing there's meaningful sensitive data that's accessed in __vmx_vcpu_run().
> The closest thing is probably CR2, but that's a very dubious vector since CR2 will
> hold a guest value for most VM-Enters.

Yes, kernel->user case has a better chance of leaking anything.

> I'm not against moving VERW close to VM-Enter because it's relatively straightforward,
> but if we're going to be super paranoid, why not go all the way and not have to
> worry about what ifs?

Right. The VMenter changes are mostly done to be consistent with what is being
done for kernel->user.

2023-10-26 22:14:09

by Pawan Gupta

Subject: Re: [PATCH v3 2/6] x86/entry_64: Add VERW just before userspace transition

On Thu, Oct 26, 2023 at 02:15:11PM -0700, Pawan Gupta wrote:
> On Thu, Oct 26, 2023 at 12:40:49PM -0700, Dave Hansen wrote:
> > On 10/26/23 12:29, Pawan Gupta wrote:
> > > On Thu, Oct 26, 2023 at 07:25:27PM +0300, Nikolay Borisov wrote:
> > >> On 25.10.23 г. 23:52 ч., Pawan Gupta wrote:
> > >>> @@ -1520,6 +1530,7 @@ SYM_CODE_START(ignore_sysret)
> > >>> UNWIND_HINT_END_OF_STACK
> > >>> ENDBR
> > >>> mov $-ENOSYS, %eax
> > >>> + CLEAR_CPU_BUFFERS
> > >> nit: Just out of curiosity is it really needed in this case or it's doesn
> > >> for the sake of uniformity so that all ring3 transitions are indeed
> > >> covered??
> > > Interrupts returning to kernel don't clear the CPU buffers. I believe
> > > interrupts will be enabled here, and getting an interrupt here could
> > > leak the data that interrupt touched.
> >
> > Specifically NMIs, right?
>
> Yes, and VERW can be omitted for the same reason as NMI returning to
> kernel.

Thinking more on this, we should not omit verw here, as this spot is way
easier to target NMIs. A user executing SYSENTER in a loop has much
higher chances of causing an NMI to return to kernel, and skip verw.

2023-10-26 22:18:20

by Dave Hansen

Subject: Re: [PATCH v3 2/6] x86/entry_64: Add VERW just before userspace transition

On 10/26/23 15:13, Pawan Gupta wrote:
>>>> Interrupts returning to kernel don't clear the CPU buffers. I believe
>>>> interrupts will be enabled here, and getting an interrupt here could
>>>> leak the data that interrupt touched.
>>> Specifically NMIs, right?
>> Yes, and VERW can omitted for the same reason as NMI returning to
>> kernel.
> Thinking more on this, we should not omit verw here, as this spot is way
> easier to target NMIs. A user executing SYSENTER in a loop has much
> higher chances of causing an NMI to return to kernel, and skip verw.

Right.

This is also a path where we care *ZERO* about performance. It's
basically all upside to _add_ VERW and all downside (increased attack
surface) to skip it.

2023-10-27 13:48:47

by Pawan Gupta

Subject: Re: [PATCH v3 1/6] x86/bugs: Add asm helpers for executing VERW

On Wed, Oct 25, 2023 at 11:13:46PM +0100, Andrew Cooper wrote:
> On 25/10/2023 11:07 pm, Pawan Gupta wrote:
> > On Wed, Oct 25, 2023 at 10:10:41PM +0100, Andrew Cooper wrote:
> >>> +.align L1_CACHE_BYTES, 0xcc
> >>> +SYM_CODE_START_NOALIGN(mds_verw_sel)
> >>> + UNWIND_HINT_UNDEFINED
> >>> + ANNOTATE_NOENDBR
> >>> + .word __KERNEL_DS
> >> You need another .align here.  Otherwise subsequent code will still
> >> start in this cacheline and defeat the purpose of trying to keep it
> >> separate.
> > Right.
> >
> >>> +SYM_CODE_END(mds_verw_sel);
> >> Thinking about it, should this really be CODE and not a data entry?
> > Would that require adding a data equivalent of .entry.text and update
> > KPTI to keep it mapped? Or is there an easier option?
>
> Leave it right here in .entry.text , but try using SYM_DATA() and
> friends.  See whether objtool vomits over the result or not.

objtool still complains when using SYM_DATA*() without the annotations:

vmlinux.o: warning: objtool: mds_verw_sel+0x0: unreachable instruction
vmlinux.o: warning: objtool: .altinstr_replacement+0x2c: relocation to !ENDBR: mds_verw_sel+0x0

> And if objtool does vomit over the result, then leaving it as it is in
> this patch with SYM_CODE() is good enough.

Settling with SYM_CODE().

On the bright-side, I am seeing even better perf with VERW operand
out-of-line:

Baseline: v6.6-rc5

| Test | Configuration | v1 | v3 |
| ------------------ | ---------------------- | ---- | ---- |
| build-linux-kernel | defconfig | 1.00 | 1.00 |
| hackbench | 32 - Process | 1.02 | 1.06 |
| nginx | Short Connection - 500 | 1.01 | 1.04 |

Disclaimer: These are collected by a stupid dev who knows nothing about
perf, please take this with a grain of salt.

I will be sending v4 soon.

2023-10-27 14:13:06

by Andrew Cooper

Subject: Re: [PATCH v3 1/6] x86/bugs: Add asm helpers for executing VERW

On 27/10/2023 2:48 pm, Pawan Gupta wrote:
> On the bright-side, I am seeing even better perf with VERW operand
> out-of-line:
>
> Baseline: v6.6-rc5
>
> | Test | Configuration | v1 | v3 |
> | ------------------ | ---------------------- | ---- | ---- |
> | build-linux-kernel | defconfig | 1.00 | 1.00 |
> | hackbench | 32 - Process | 1.02 | 1.06 |
> | nginx | Short Connection - 500 | 1.01 | 1.04 |
>
> Disclaimer: These are collected by a stupid dev who knows nothing about
> perf, please take this with a grain of salt.

:)

Almost as if it's a good idea to follow the advice of the Optimisation
Guide on mixing code and data, which is "don't".

~Andrew

2023-10-27 14:24:28

by Pawan Gupta

Subject: Re: [PATCH v3 1/6] x86/bugs: Add asm helpers for executing VERW

On Fri, Oct 27, 2023 at 03:12:45PM +0100, Andrew Cooper wrote:
> Almost as if it's a good idea to follow the advice of the Optimisation
> Guide on mixing code and data, which is "don't".

Thanks a lot Andrew and Peter for shepherding me this way.