2021-02-06 03:47:13

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC v1 09/26] x86/tdx: Handle CPUID via #VE

On Fri, Feb 5, 2021 at 3:39 PM Kuppuswamy Sathyanarayanan
<[email protected]> wrote:
>
> From: "Kirill A. Shutemov" <[email protected]>
>
> TDX has three classes of CPUID leaves: some CPUID leaves
> are always handled by the CPU, others are handled by the TDX module,
> and some others are handled by the VMM. Since the VMM cannot directly
> intercept the instruction these are reflected with a #VE exception
> to the guest, which then converts it into a TDCALL to the VMM,
> or handled directly.
>
> The TDX module EAS has a full list of CPUID leaves which are handled
> natively or by the TDX module in 16.2. Only unknown CPUIDs are handled by
> the #VE method. In practice this typically only applies to the
> hypervisor specific CPUIDs unknown to the native CPU.
>
> Therefore there is no risk of causing this in early CPUID code which
> runs before the #VE handler is set up because it will never access
> those exotic CPUID leaves.
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
> ---
> arch/x86/kernel/tdx.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index 5d961263601e..e98058c048b5 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -172,6 +172,35 @@ static int tdx_write_msr_safe(unsigned int msr, unsigned int low,
> return ret || r10 ? -EIO : 0;
> }
>
> +static void tdx_handle_cpuid(struct pt_regs *regs)
> +{
> + register long r10 asm("r10") = TDVMCALL_STANDARD;
> + register long r11 asm("r11") = EXIT_REASON_CPUID;
> + register long r12 asm("r12") = regs->ax;
> + register long r13 asm("r13") = regs->cx;
> + register long r14 asm("r14");
> + register long r15 asm("r15");
> + register long rcx asm("rcx");
> + long ret;
> +
> + /* Allow to pass R10, R11, R12, R13, R14 and R15 down to the VMM */
> + rcx = BIT(10) | BIT(11) | BIT(12) | BIT(13) | BIT(14) | BIT(15);
> +
> + asm volatile(TDCALL
> + : "=a"(ret), "=r"(r10), "=r"(r11), "=r"(r12), "=r"(r13),
> + "=r"(r14), "=r"(r15)
> + : "a"(TDVMCALL), "r"(rcx), "r"(r10), "r"(r11), "r"(r12),
> + "r"(r13)
> + : );

Some "+" constraints would make this simpler. But I think you should
factor the TDCALL helper out into its own function.


2021-02-07 14:16:22

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC v1 09/26] x86/tdx: Handle CPUID via #VE

On Fri, Feb 05, 2021 at 03:42:01PM -0800, Andy Lutomirski wrote:
> On Fri, Feb 5, 2021 at 3:39 PM Kuppuswamy Sathyanarayanan
> <[email protected]> wrote:
> >
> > From: "Kirill A. Shutemov" <[email protected]>
> >
> > TDX has three classes of CPUID leaves: some CPUID leaves
> > are always handled by the CPU, others are handled by the TDX module,
> > and some others are handled by the VMM. Since the VMM cannot directly
> > intercept the instruction these are reflected with a #VE exception
> > to the guest, which then converts it into a TDCALL to the VMM,
> > or handled directly.
> >
> > The TDX module EAS has a full list of CPUID leaves which are handled
> > natively or by the TDX module in 16.2. Only unknown CPUIDs are handled by
> > the #VE method. In practice this typically only applies to the
> > hypervisor specific CPUIDs unknown to the native CPU.
> >
> > Therefore there is no risk of causing this in early CPUID code which
> > runs before the #VE handler is set up because it will never access
> > those exotic CPUID leaves.
> >
> > Signed-off-by: Kirill A. Shutemov <[email protected]>
> > Reviewed-by: Andi Kleen <[email protected]>
> > Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
> > ---
> > arch/x86/kernel/tdx.c | 32 ++++++++++++++++++++++++++++++++
> > 1 file changed, 32 insertions(+)
> >
> > diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> > index 5d961263601e..e98058c048b5 100644
> > --- a/arch/x86/kernel/tdx.c
> > +++ b/arch/x86/kernel/tdx.c
> > @@ -172,6 +172,35 @@ static int tdx_write_msr_safe(unsigned int msr, unsigned int low,
> > return ret || r10 ? -EIO : 0;
> > }
> >
> > +static void tdx_handle_cpuid(struct pt_regs *regs)
> > +{
> > + register long r10 asm("r10") = TDVMCALL_STANDARD;
> > + register long r11 asm("r11") = EXIT_REASON_CPUID;
> > + register long r12 asm("r12") = regs->ax;
> > + register long r13 asm("r13") = regs->cx;
> > + register long r14 asm("r14");
> > + register long r15 asm("r15");
> > + register long rcx asm("rcx");
> > + long ret;
> > +
> > + /* Allow to pass R10, R11, R12, R13, R14 and R15 down to the VMM */
> > + rcx = BIT(10) | BIT(11) | BIT(12) | BIT(13) | BIT(14) | BIT(15);
> > +
> > + asm volatile(TDCALL
> > + : "=a"(ret), "=r"(r10), "=r"(r11), "=r"(r12), "=r"(r13),
> > + "=r"(r14), "=r"(r15)
> > + : "a"(TDVMCALL), "r"(rcx), "r"(r10), "r"(r11), "r"(r12),
> > + "r"(r13)
> > + : );
>
> Some "+" constraints would make this simpler. But I think you should
> factor the TDCALL helper out into its own function.

Factor out TDCALL into a helper is tricky: different TDCALLs have
different list of registers passed to VMM.

--
Kirill A. Shutemov

2021-02-07 16:03:28

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC v1 09/26] x86/tdx: Handle CPUID via #VE

On 2/7/21 6:13 AM, Kirill A. Shutemov wrote:
>>> + /* Allow to pass R10, R11, R12, R13, R14 and R15 down to the VMM */
>>> + rcx = BIT(10) | BIT(11) | BIT(12) | BIT(13) | BIT(14) | BIT(15);
>>> +
>>> + asm volatile(TDCALL
>>> + : "=a"(ret), "=r"(r10), "=r"(r11), "=r"(r12), "=r"(r13),
>>> + "=r"(r14), "=r"(r15)
>>> + : "a"(TDVMCALL), "r"(rcx), "r"(r10), "r"(r11), "r"(r12),
>>> + "r"(r13)
>>> + : );
>> Some "+" constraints would make this simpler. But I think you should
>> factor the TDCALL helper out into its own function.
> Factor out TDCALL into a helper is tricky: different TDCALLs have
> different list of registers passed to VMM.

Couldn't you just have one big helper that takes *all* the registers
that get used in any TDVMCALL and sets all the rcx bits? The users
could just pass 0's for the things they don't use.

Then you've got the ugly inline asm in one place. It also makes it
harder to screw up the 'rcx' mask and end up passing registers you
didn't want into a malicious VMM.

2021-02-07 20:31:16

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFC v1 09/26] x86/tdx: Handle CPUID via #VE

On Sun, Feb 07, 2021 at 08:01:50AM -0800, Dave Hansen wrote:
> On 2/7/21 6:13 AM, Kirill A. Shutemov wrote:
> >>> + /* Allow to pass R10, R11, R12, R13, R14 and R15 down to the VMM */
> >>> + rcx = BIT(10) | BIT(11) | BIT(12) | BIT(13) | BIT(14) | BIT(15);
> >>> +
> >>> + asm volatile(TDCALL
> >>> + : "=a"(ret), "=r"(r10), "=r"(r11), "=r"(r12), "=r"(r13),
> >>> + "=r"(r14), "=r"(r15)
> >>> + : "a"(TDVMCALL), "r"(rcx), "r"(r10), "r"(r11), "r"(r12),
> >>> + "r"(r13)
> >>> + : );
> >> Some "+" constraints would make this simpler. But I think you should
> >> factor the TDCALL helper out into its own function.
> > Factor out TDCALL into a helper is tricky: different TDCALLs have
> > different list of registers passed to VMM.
>
> Couldn't you just have one big helper that takes *all* the registers
> that get used in any TDVMCALL and sets all the rcx bits? The users
> could just pass 0's for the things they don't use.
>
> Then you've got the ugly inline asm in one place. It also makes it
> harder to screw up the 'rcx' mask and end up passing registers you
> didn't want into a malicious VMM.

For now we only pass down R10-R15, but the interface allows to pass down
much wider set of registers, including XMM. How far do we want to get it?

--
Kirill A. Shutemov

2021-02-07 22:33:55

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC v1 09/26] x86/tdx: Handle CPUID via #VE

On 2/7/21 12:29 PM, Kirill A. Shutemov wrote:
>> Couldn't you just have one big helper that takes *all* the registers
>> that get used in any TDVMCALL and sets all the rcx bits? The users
>> could just pass 0's for the things they don't use.
>>
>> Then you've got the ugly inline asm in one place. It also makes it
>> harder to screw up the 'rcx' mask and end up passing registers you
>> didn't want into a malicious VMM.
> For now we only pass down R10-R15, but the interface allows to pass down
> much wider set of registers, including XMM. How far do we want to get it?

Just do what we immediately need: R10-R15.

2021-02-07 22:48:49

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC v1 09/26] x86/tdx: Handle CPUID via #VE


> On Feb 7, 2021, at 2:31 PM, Dave Hansen <[email protected]> wrote:
>
> On 2/7/21 12:29 PM, Kirill A. Shutemov wrote:
>>> Couldn't you just have one big helper that takes *all* the registers
>>> that get used in any TDVMCALL and sets all the rcx bits? The users
>>> could just pass 0's for the things they don't use.
>>>
>>> Then you've got the ugly inline asm in one place. It also makes it
>>> harder to screw up the 'rcx' mask and end up passing registers you
>>> didn't want into a malicious VMM.
>> For now we only pass down R10-R15, but the interface allows to pass down
>> much wider set of registers, including XMM. How far do we want to get it?
>
> Just do what we immediately need: R10-R15
> .
>

How much of the register state is revealed to the VMM when we do a TDVMCALL? Presumably we should fully sanitize all register state that shows up in cleartext on the other end, and we should treat all regs that can be modified by the VMM as clobbered.

2021-02-08 18:58:44

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC v1 09/26] x86/tdx: Handle CPUID via #VE

On Sun, Feb 07, 2021, Andy Lutomirski wrote:
>
> > On Feb 7, 2021, at 2:31 PM, Dave Hansen <[email protected]> wrote:
> >
> > On 2/7/21 12:29 PM, Kirill A. Shutemov wrote:
> >>> Couldn't you just have one big helper that takes *all* the registers
> >>> that get used in any TDVMCALL and sets all the rcx bits? The users
> >>> could just pass 0's for the things they don't use.

IIRC, having exactly one helper is a big mess (my original code used a single
main helper). CPUID has a large number of outputs, and because outputs are
handled as pointers, the assembly routine needs to check output params for NULL.

And if we want to write up port I/O directly to TDVMCALL to avoid the #VE, IN
and OUT need separate helpers to implement a non-standard register ABI in order
to play nice with ALTERANTIVES.

This also has my vote, mainly because gcc doesn't allow directly specifying
r8-r15 as register constraints to inline asm. That creates a nasty hole where
a register can get corrupted if code is inserted between writing the local
variable and passing it to the inline asm.

Case in point, patch 08 has this exact bug.

> +static u64 tdx_read_msr_safe(unsigned int msr, int *err)
> +{
> +       register long r10 asm("r10") = TDVMCALL_STANDARD;
> +       register long r11 asm("r11") = EXIT_REASON_MSR_READ;
> +       register long r12 asm("r12") = msr;
> +       register long rcx asm("rcx");
> +       long ret;
> +
> +       WARN_ON_ONCE(tdx_is_context_switched_msr(msr));

This can corrupt r10, r11 and/or r12, e.g. if tdx_is_context_switched_msr() is
not inlined, it can use r10 and r11 as scratch registers, and gcc isn't smart
enough to know it needs to save registers before the call.

Even if the code as committed is guaranteed to work, IMO this approach is
hostile toward future debuggers/developers, e.g. adding a printk() in here to
debug can introduce a completely different failure.

> +
> +       if (msr == MSR_CSTAR)
> +               return 0;
> +
> +       /* Allow to pass R10, R11 and R12 down to the VMM */
> +       rcx = BIT(10) | BIT(11) | BIT(12);
> +
> +       asm volatile(TDCALL
> +                       : "=a"(ret), "=r"(r10), "=r"(r11), "=r"(r12)
> +                       : "a"(TDVMCALL), "r"(rcx), "r"(r10), "r"(r11), "r"(r12)
> +                       : );
> +
> +       /* XXX: Better error handling needed? */
> +       *err = (ret || r10) ? -EIO : 0;
> +
> +       return r11;
> +}

> >>> Then you've got the ugly inline asm in one place. It also makes it
> >>> harder to screw up the 'rcx' mask and end up passing registers you
> >>> didn't want into a malicious VMM.
> >> For now we only pass down R10-R15, but the interface allows to pass down
> >> much wider set of registers, including XMM. How far do we want to get it?
> >
> > Just do what we immediately need: R10-R15
> > .
> >
>
> How much of the register state is revealed to the VMM when we do a TDVMCALL?
> Presumably we should fully sanitize all register state that shows up in
> cleartext on the other end, and we should treat all regs that can be modified
> by the VMM as clobbered.

The guest gets to choose, with a few restrictions. RSP cannot be exposed to the
host. RAX, RCX, R10, and R11 are always exposed as they hold mandatory info
about the TDVMCALL (TDCALL fn, GPR mask, GHCI vs. vendor, and TDVMCALL fn). All
other GPRs are exposed and clobbered if their bit in RCX is set, otherwise they
are saved/restored by the TDX-Module.

I agree with Dave, pass everything required by the GHCI in the main routine, and
sanitize and save/restore all such GPRs.

2021-02-08 19:06:05

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC v1 09/26] x86/tdx: Handle CPUID via #VE

On Mon, Feb 8, 2021 at 9:11 AM Sean Christopherson <[email protected]> wrote:
>
> On Sun, Feb 07, 2021, Andy Lutomirski wrote:
> >

> > How much of the register state is revealed to the VMM when we do a TDVMCALL?
> > Presumably we should fully sanitize all register state that shows up in
> > cleartext on the other end, and we should treat all regs that can be modified
> > by the VMM as clobbered.
>
> The guest gets to choose, with a few restrictions. RSP cannot be exposed to the
> host. RAX, RCX, R10, and R11 are always exposed as they hold mandatory info
> about the TDVMCALL (TDCALL fn, GPR mask, GHCI vs. vendor, and TDVMCALL fn). All
> other GPRs are exposed and clobbered if their bit in RCX is set, otherwise they
> are saved/restored by the TDX-Module.
>
> I agree with Dave, pass everything required by the GHCI in the main routine, and
> sanitize and save/restore all such GPRs.

Sounds okay to me.

2021-02-08 19:09:51

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC v1 09/26] x86/tdx: Handle CPUID via #VE

On Mon, Feb 08, 2021, Andy Lutomirski wrote:
> On Mon, Feb 8, 2021 at 9:11 AM Sean Christopherson <[email protected]> wrote:
> >
> > On Sun, Feb 07, 2021, Andy Lutomirski wrote:
> > >
>
> > > How much of the register state is revealed to the VMM when we do a TDVMCALL?
> > > Presumably we should fully sanitize all register state that shows up in
> > > cleartext on the other end, and we should treat all regs that can be modified
> > > by the VMM as clobbered.
> >
> > The guest gets to choose, with a few restrictions. RSP cannot be exposed to the
> > host. RAX, RCX, R10, and R11 are always exposed as they hold mandatory info
> > about the TDVMCALL (TDCALL fn, GPR mask, GHCI vs. vendor, and TDVMCALL fn). All
> > other GPRs are exposed and clobbered if their bit in RCX is set, otherwise they
> > are saved/restored by the TDX-Module.
> >
> > I agree with Dave, pass everything required by the GHCI in the main routine, and
> > sanitize and save/restore all such GPRs.
>
> Sounds okay to me.

One clarification: only non-volatile GPRs (R12-R15) need to be saved/restored.

And I think it makes sense to sanitize any exposed GPRs (that don't hold an
output value) after TDVMCALL to avoid speculating with a host-controlled value.

Subject: [PATCH v1 1/1] x86/tdx: Add tdcall() and tdvmcall() helper functions

Implement common helper functions to communicate with
the TDX Module and VMM (using TDCALL instruction).

tdvmcall() function can be used to request services
from VMM.

tdcall() function can be used to communicate with the
TDX Module.

Using common helper functions makes the code more readable
and less error prone compared to distributed and use case
specific inline assembly code. Only downside in using this
approach is, it adds a few extra instructions for every
TDCALL use case when compared to distributed checks. Although
it's a bit less efficient, it's worth it to make the code more
readable.

Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---

Hi All,

As you have suggested, I have created common helper functions
for all tdcall() and tdvmcall() use cases. It uses inline
assembly and passes GPRs R8-15 and r[a-c]x registers to TDX
Module/VMM. Please take a look at it and let me know your
comments. If you agree with the design, I can re-submit the
patchset with changes related to using these new APIs. Please
let me know.

arch/x86/include/asm/tdx.h | 27 ++++++++++++++++++++
arch/x86/kernel/tdx.c | 52 ++++++++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 0b9d571b1f95..311252a90cfb 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -3,8 +3,27 @@
#ifndef _ASM_X86_TDX_H
#define _ASM_X86_TDX_H

+#include <linux/types.h>
+
#define TDX_CPUID_LEAF_ID 0x21

+#define TDVMCALL 0
+
+/* TDVMCALL R10 Input */
+#define TDVMCALL_STANDARD 0
+
+/*
+ * TDCALL instruction is newly added in TDX architecture,
+ * used by TD for requesting the host VMM to provide
+ * (untrusted) services. Supported in Binutils >= 2.36
+ */
+#define TDCALL ".byte 0x66,0x0f,0x01,0xcc"
+
+struct tdcall_regs {
+ u64 rax, rcx, rdx;
+ u64 r8, r9, r10, r11, r12, r13, r14, r15;
+};
+
#ifdef CONFIG_INTEL_TDX_GUEST

/* Common API to check TDX support in decompression and common kernel code. */
@@ -12,6 +31,10 @@ bool is_tdx_guest(void);

void __init tdx_early_init(void);

+void tdcall(u64 leafid, struct tdcall_regs *regs);
+
+void tdvmcall(u64 subid, struct tdcall_regs *regs);
+
#else // !CONFIG_INTEL_TDX_GUEST

static inline bool is_tdx_guest(void)
@@ -21,6 +44,10 @@ static inline bool is_tdx_guest(void)

static inline void tdx_early_init(void) { };

+static inline void tdcall(u64 leafid, struct tdcall_regs *regs) { };
+
+static inline void tdvmcall(u64 subid, struct tdcall_regs *regs) { };
+
#endif /* CONFIG_INTEL_TDX_GUEST */

#endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index e44e55d1e519..7ae1d25e272b 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -4,6 +4,58 @@
#include <asm/tdx.h>
#include <asm/cpufeature.h>

+void tdcall(u64 leafid, struct tdcall_regs *regs)
+{
+ asm volatile(
+ /* RAX = leafid (TDCALL LEAF ID) */
+ " movq %0, %%rax;"
+ /* Move regs->r[*] data to regs r[a-c]x, r8-r5 */
+ " movq 8(%1), %%rcx;"
+ " movq 16(%1), %%rdx;"
+ " movq 24(%1), %%r8;"
+ " movq 32(%1), %%r9;"
+ " movq 40(%1), %%r10;"
+ " movq 48(%1), %%r11;"
+ " movq 56(%1), %%r12;"
+ " movq 64(%1), %%r13;"
+ " movq 72(%1), %%r14;"
+ " movq 80(%1), %%r15;"
+ TDCALL ";"
+ /* Save TDCALL success/failure to regs->rax */
+ " movq %%rax, (%1);"
+ /* Save rcx and rdx contents to regs->r[c-d]x */
+ " movq %%rcx, 8(%1);"
+ " movq %%rdx, 16(%1);"
+ /* Move content of registers R8-R15 regs->r[8-15] */
+ " movq %%r8, 24(%1);"
+ " movq %%r9, 32(%1);"
+ " movq %%r10, 40(%1);"
+ " movq %%r11, 48(%1);"
+ " movq %%r12, 56(%1);"
+ " movq %%r13, 64(%1);"
+ " movq %%r14, 72(%1);"
+ " movq %%r15, 80(%1);"
+
+ :
+ : "r" (leafid), "r" (regs)
+ : "memory", "rax", "rbx", "rcx", "rdx", "r8",
+ "r9", "r10", "r11", "r12", "r13", "r14", "r15"
+ );
+
+}
+
+void tdvmcall(u64 subid, struct tdcall_regs *regs)
+{
+ /* Expose GPRs R8-R15 to VMM */
+ regs->rcx = 0xff00;
+ /* R10 = 0 (standard TDVMCALL) */
+ regs->r10 = TDVMCALL_STANDARD;
+ /* Save subid to r11 register */
+ regs->r11 = subid;
+
+ tdcall(TDVMCALL, regs);
+}
+
static inline bool cpuid_has_tdx_guest(void)
{
u32 eax, signature[3];
--
2.25.1

2021-03-19 16:57:34

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v1 1/1] x86/tdx: Add tdcall() and tdvmcall() helper functions

On Thu, Mar 18, 2021, Kuppuswamy Sathyanarayanan wrote:
> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index e44e55d1e519..7ae1d25e272b 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -4,6 +4,58 @@
> #include <asm/tdx.h>
> #include <asm/cpufeature.h>
>
> +void tdcall(u64 leafid, struct tdcall_regs *regs)
> +{
> + asm volatile(
> + /* RAX = leafid (TDCALL LEAF ID) */
> + " movq %0, %%rax;"
> + /* Move regs->r[*] data to regs r[a-c]x, r8-r5 */
> + " movq 8(%1), %%rcx;"

I am super duper opposed to using inline asm. Large blocks are hard to read,
and even harder to maintain. E.g. the %1 usage falls apart if an output
constraint is added; that can be avoided by defining a local const/imm (I forget
what they're called), but it doesn't help readability.

> + " movq 16(%1), %%rdx;"
> + " movq 24(%1), %%r8;"
> + " movq 32(%1), %%r9;"
> + " movq 40(%1), %%r10;"
> + " movq 48(%1), %%r11;"
> + " movq 56(%1), %%r12;"
> + " movq 64(%1), %%r13;"
> + " movq 72(%1), %%r14;"
> + " movq 80(%1), %%r15;"

This is extremely unsafe, and wasteful. Putting the onus on the caller to zero
out unused registers, with no mechanism to enforce/encourage doing so, makes it
likely that the kernel will leak information to the VMM, e.g. in the form of
stack data due to a partially initialized "regs".

And although TDVMCALL is anything but speedy, requiring multiple memory
operations just to set a single register is unnecessary. Not to mention
several of these registers are never used in the GHCI-defined TDVMCALLs. And,
since the caller defines the mask (which I also dislike), it's possible/likely
that many of these memory operations are wasteful even for registers that are
used by _some_ TDVMCALLs. Unnecessary accesses are inevitable if we want a
common helper, but this is too much.

> + TDCALL ";"
> + /* Save TDCALL success/failure to regs->rax */
> + " movq %%rax, (%1);"
> + /* Save rcx and rdx contents to regs->r[c-d]x */
> + " movq %%rcx, 8(%1);"
> + " movq %%rdx, 16(%1);"
> + /* Move content of registers R8-R15 regs->r[8-15] */
> + " movq %%r8, 24(%1);"
> + " movq %%r9, 32(%1);"
> + " movq %%r10, 40(%1);"
> + " movq %%r11, 48(%1);"
> + " movq %%r12, 56(%1);"
> + " movq %%r13, 64(%1);"
> + " movq %%r14, 72(%1);"
> + " movq %%r15, 80(%1);"
> +
> + :
> + : "r" (leafid), "r" (regs)
> + : "memory", "rax", "rbx", "rcx", "rdx", "r8",
> + "r9", "r10", "r11", "r12", "r13", "r14", "r15"

All these clobbers mean even more memory operations...

> + );
> +
> +}
> +
> +void tdvmcall(u64 subid, struct tdcall_regs *regs)
> +{
> + /* Expose GPRs R8-R15 to VMM */
> + regs->rcx = 0xff00;
> + /* R10 = 0 (standard TDVMCALL) */
> + regs->r10 = TDVMCALL_STANDARD;
> + /* Save subid to r11 register */
> + regs->r11 = subid;
> +
> + tdcall(TDVMCALL, regs);

This implies the caller is responsible for _all_ error checking. The base
TDCALL should never fail; if it does, something is horribly wrong with TDX-Module
and panicking is absolutely the best option.

The users of this are going to be difficult to read as well since the parameters
are stuff into a struct instead of being passed to a function.

IMO, throwing the bulk of the code in a proper asm subroutine and handling only
the GHCI-defined TDVMCALLs is the way to go. If/when a VMM comes along that
wants to enlighten Linux guests to work with non-GCHI TDVMCALLs, enhancing this
madness can be their problem.

Completely untested...

struct tdvmcall_output {
u64 r12;
u64 r13;
u64 r14;
u64 r15;
}

u64 __tdvmcall(u64 fn, u64 p0, u64 p1, u64 p2, u64 p3,
struct tdvmcall_output *out);

/* Offset for fields in tdvmcall_output */
OFFSET(TDVMCALL_r12, tdvmcall_output, r13);
OFFSET(TDVMCALL_r13, tdvmcall_output, r13);
OFFSET(TDVMCALL_r14, tdvmcall_output, r14);
OFFSET(TDVMCALL_r15, tdvmcall_output, r15);

SYM_FUNC_START(__tdvmcall)
FRAME_BEGIN

/* Save/restore non-volatile GPRs that are exposed to the VMM. */
push %r15
push %r14
push %r13
push %r12

/*
* 0 => RAX = TDCALL leaf
* 0 => R10 = standard vs. vendor
* RDI => R11 = TDVMCALL function, e.g. exit reason
* RSI => R12 = input param 0
* RDX => R13 = input param 1
* RCX => R14 = input param 2
* R8 => R15 = input param 3
* MASK => RCX = TDVMCALL register behavior
* R9 => N/A = output struct
*/
xor %eax, %eax
xor %r10d, %r10d
mov %rdi, %r11
mov %rsi, %r12
mov %rdx, %r13
mov %rcx, %r14
mov %r8, %r15

/*
* Expose R10 - R15, i.e. all GPRs that may be used by TDVMCALLs
* defined in the GHCI. Note, RAX and RCX are consumed, but only by
* TDX-Module and so don't need to be listed in the mask.
*/
movl $0xfc00, %ecx

tdcall

/* Panic if TDCALL reports failure. */
test %rax, %rax
jnz 2f

/* Propagate TDVMCALL success/failure to return value. */
mov %r10, %rax

/*
* On success, propagate TDVMCALL outputs values to the output struct,
* if an output struct is provided.
*/
test %rax, %rax
jnz 1f
test %r9, %r9
jz 1f

movq %r12, $TDVMCALL_r12(%r9)
movq %r13, $TDVMCALL_r13(%r9)
movq %r14, $TDVMCALL_r14(%r9)
movq %r15, $TDVMCALL_r15(%r9)
1:
/*
* Zero out registers exposed to the VMM to avoid speculative execution
* with VMM-controlled values.
*/
xor %r10d, %r10d
xor %r11d, %r11d
xor %r12d, %r12d
xor %r13d, %r13d
xor %r14d, %r14d
xor %r15d, %r15d

pop %r12
pop %r13
pop %r14
pop %r15

FRAME_END
ret
2:
ud2
SYM_FUNC_END(__tdvmcall)

/*
* Wrapper for the semi-common case where errors are fatal and there is a
* single output value.
*/
static inline u64 tdvmcall(u64 fn, u64 p0, u64 p1, u64 p2, u64 p3,
struct tdvmcall_output *out)
{
struct tdvmcall_output out;
u64 err;

err = __tdvmcall(fn, p0, p1, p2, p3, &out);
BUG_ON(err);

return out.r11;
}

static void tdx_handle_cpuid(struct pt_regs *regs)
{
struct tdvmcall_output out;
u64 err;

err = __tdvmcall(EXIT_REASON_CPUID, regs->ax, regs->cx, 0, 0, &out);
BUG_ON(err);

regs->ax = out.r11;
regs->bx = out.r12;
regs->cx = out.r13;
regs->dx = out.r14;
}

#define REG_MASK(size) ((1ULL << ((size) * 8)) - 1)

static void tdx_handle_io(struct pt_regs *regs, u32 exit_qual)
{
u8 out = (exit_qual & 8) ? 0 : 1;
u8 size = (exit_qual & 7) + 1;
u16 port = exit_qual >> 16;
u64 val;

/* I/O strings ops are unrolled at build time. */
BUG_ON(exit_qual & 0x10);

if (!tdx_allowed_port(port))
return;

if (out)
val = regs->ax & REG_MASK(size);
else
val = 0;

val = tdvmcall(EXIT_REASON_IO_INSTRUCTION, port, size, out, val);
if (!out) {
/* The upper bits of *AX are preserved for 2 and 1 byte I/O. */
if (size < 4)
val |= (regs->ax & ~REG_MASK(size));
regs->ax = val;
}4
}

static u64 tdx_read_msr_safe(unsigned int msr, int *ret)
{
struct tdvmcall_output out;
u64 err;

WARN_ON_ONCE(tdx_is_context_switched_msr(msr));

if (msr == MSR_CSTAR) {
*ret = 0;
return 0;
}

err = __tdvmcall(EXIT_REASON_MSR_READ, regs->ax, regs->cx, 0, 0, &out);
if (err) {
*ret -EIO;
return 0;
}
return out.r11;
}

Subject: Re: [PATCH v1 1/1] x86/tdx: Add tdcall() and tdvmcall() helper functions

Hi Sean,

Thanks for the review.

On 3/19/21 9:55 AM, Sean Christopherson wrote:
> On Thu, Mar 18, 2021, Kuppuswamy Sathyanarayanan wrote:
>> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
>> index e44e55d1e519..7ae1d25e272b 100644
>> --- a/arch/x86/kernel/tdx.c
>> +++ b/arch/x86/kernel/tdx.c
>> @@ -4,6 +4,58 @@
>> #include <asm/tdx.h>
>> #include <asm/cpufeature.h>
>>
>> +void tdcall(u64 leafid, struct tdcall_regs *regs)
>> +{
>> + asm volatile(
>> + /* RAX = leafid (TDCALL LEAF ID) */
>> + " movq %0, %%rax;"
>> + /* Move regs->r[*] data to regs r[a-c]x, r8-r5 */
>> + " movq 8(%1), %%rcx;"
>
> I am super duper opposed to using inline asm. Large blocks are hard to read,
I think this point is arguable. Based on the review comments I received so far,
people prefer inline assembly compared to asm sub functions.
> and even harder to maintain. E.g. the %1 usage falls apart if an output
> constraint is added; that can be avoided by defining a local const/imm (I forget
> what they're called), but it doesn't help readability.
we can use OFFSET() calls to improve the readability and avoid this issue. Also IMO,
any one adding constraints should know how this would affect the asm code.
>
>> + " movq 16(%1), %%rdx;"
>> + " movq 24(%1), %%r8;"
>> + " movq 32(%1), %%r9;"
>> + " movq 40(%1), %%r10;"
>> + " movq 48(%1), %%r11;"
>> + " movq 56(%1), %%r12;"
>> + " movq 64(%1), %%r13;"
>> + " movq 72(%1), %%r14;"
>> + " movq 80(%1), %%r15;"
>
> This is extremely unsafe, and wasteful. Putting the onus on the caller to zero
> out unused registers, with no mechanism to enforce/encourage doing so,
For encouragement, we can add a comment to this function about callers responsibility.
makes it
> likely that the kernel will leak information to the VMM, e.g. in the form of
> stack data due to a partially initialized "regs".
Unless you create sub-functions for each use cases, callers cannot avoid this
responsibility.
>
> And although TDVMCALL is anything but speedy, requiring multiple memory
> operations just to set a single register is unnecessary. Not to mention
> several of these registers are never used in the GHCI-defined TDVMCALLs.
This function is common between TDCALL and TDVMCALL. Extra registers you
mentioned are related to other TDCALL usecases.
And,
> since the caller defines the mask (which I also dislike), it's possible/likely
> that many of these memory operations are wasteful even for registers that are
> used by _some_ TDVMCALLs. Unnecessary accesses are inevitable if we want a
> common helper, but this is too much.
using single function makes it easy to maintain, readable and less error prone.
But I agree there are many unnecessary accesses for many users.
>
>> + TDCALL ";"
>> + /* Save TDCALL success/failure to regs->rax */
>> + " movq %%rax, (%1);"
>> + /* Save rcx and rdx contents to regs->r[c-d]x */
>> + " movq %%rcx, 8(%1);"
>> + " movq %%rdx, 16(%1);"
>> + /* Move content of registers R8-R15 regs->r[8-15] */
>> + " movq %%r8, 24(%1);"
>> + " movq %%r9, 32(%1);"
>> + " movq %%r10, 40(%1);"
>> + " movq %%r11, 48(%1);"
>> + " movq %%r12, 56(%1);"
>> + " movq %%r13, 64(%1);"
>> + " movq %%r14, 72(%1);"
>> + " movq %%r15, 80(%1);"
>> +
>> + :
>> + : "r" (leafid), "r" (regs)
>> + : "memory", "rax", "rbx", "rcx", "rdx", "r8",
>> + "r9", "r10", "r11", "r12", "r13", "r14", "r15"
>
> All these clobbers mean even more memory operations...
>
>> + );
>> +
>> +}
>> +
>> +void tdvmcall(u64 subid, struct tdcall_regs *regs)
>> +{
>> + /* Expose GPRs R8-R15 to VMM */
>> + regs->rcx = 0xff00;
>> + /* R10 = 0 (standard TDVMCALL) */
>> + regs->r10 = TDVMCALL_STANDARD;
>> + /* Save subid to r11 register */
>> + regs->r11 = subid;
>> +
>> + tdcall(TDVMCALL, regs);
>
> This implies the caller is responsible for _all_ error checking. The base
> TDCALL should never fail; if it does, something is horribly wrong with TDX-Module
> and panicking is absolutely the best option.
I haven't added error checking to common function because some use cases like
MSR and IO access does not need to panic with TDVMCALL failures.

To improve this, may be we can create sub-functions (similar to your code) like,
1. tdvmcall() //with BUG_ON(regs.rax)
2. _tdvmcall() // without error checks
>
> The users of this are going to be difficult to read as well since the parameters
> are stuff into a struct instead of being passed to a function.
I think regs.rx = xx code format is more easier to read compared to passing
parameters to the function.
>
> IMO, throwing the bulk of the code in a proper asm subroutine and handling only
> the GHCI-defined TDVMCALLs is the way to go. If/when a VMM comes along that
> wants to enlighten Linux guests to work with non-GCHI TDVMCALLs, enhancing this
> madness can be their problem.
>
> Completely untested...
>
> struct tdvmcall_output {
> u64 r12;
> u64 r13;
> u64 r14;
> u64 r15;
> }
>
> u64 __tdvmcall(u64 fn, u64 p0, u64 p1, u64 p2, u64 p3,
> struct tdvmcall_output *out);
This function is only for tdvmcall. If you want to create
common function for all tdcall use cases, you would end
up using struct for input as well.
>
> /* Offset for fields in tdvmcall_output */
> OFFSET(TDVMCALL_r12, tdvmcall_output, r13);
> OFFSET(TDVMCALL_r13, tdvmcall_output, r13);
> OFFSET(TDVMCALL_r14, tdvmcall_output, r14);
> OFFSET(TDVMCALL_r15, tdvmcall_output, r15);
>
> SYM_FUNC_START(__tdvmcall)
> FRAME_BEGIN
>
> /* Save/restore non-volatile GPRs that are exposed to the VMM. */
> push %r15
> push %r14
> push %r13
> push %r12
>
> /*
> * 0 => RAX = TDCALL leaf
> * 0 => R10 = standard vs. vendor
> * RDI => R11 = TDVMCALL function, e.g. exit reason
> * RSI => R12 = input param 0
> * RDX => R13 = input param 1
> * RCX => R14 = input param 2
> * R8 => R15 = input param 3
> * MASK => RCX = TDVMCALL register behavior
> * R9 => N/A = output struct
> */
> xor %eax, %eax
> xor %r10d, %r10d
> mov %rdi, %r11
> mov %rsi, %r12
> mov %rdx, %r13
> mov %rcx, %r14
> mov %r8, %r15
>
> /*
> * Expose R10 - R15, i.e. all GPRs that may be used by TDVMCALLs
> * defined in the GHCI. Note, RAX and RCX are consumed, but only by
> * TDX-Module and so don't need to be listed in the mask.
> */
> movl $0xfc00, %ecx
>
> tdcall
>
> /* Panic if TDCALL reports failure. */
> test %rax, %rax
> jnz 2f
>
> /* Propagate TDVMCALL success/failure to return value. */
> mov %r10, %rax
>
> /*
> * On success, propagate TDVMCALL outputs values to the output struct,
> * if an output struct is provided.
> */
> test %rax, %rax
> jnz 1f
> test %r9, %r9
> jz 1f
>
> movq %r12, $TDVMCALL_r12(%r9)
> movq %r13, $TDVMCALL_r13(%r9)
> movq %r14, $TDVMCALL_r14(%r9)
> movq %r15, $TDVMCALL_r15(%r9)
> 1:
> /*
> * Zero out registers exposed to the VMM to avoid speculative execution
> * with VMM-controlled values.
> */
> xor %r10d, %r10d
> xor %r11d, %r11d
> xor %r12d, %r12d
> xor %r13d, %r13d
> xor %r14d, %r14d
> xor %r15d, %r15d
>
> pop %r12
> pop %r13
> pop %r14
> pop %r15
>
> FRAME_END
> ret
> 2:
> ud2
> SYM_FUNC_END(__tdvmcall)
>
> /*
> * Wrapper for the semi-common case where errors are fatal and there is a
> * single output value.
> */
> static inline u64 tdvmcall(u64 fn, u64 p0, u64 p1, u64 p2, u64 p3,
> struct tdvmcall_output *out)
> {
> struct tdvmcall_output out;
> u64 err;
>
> err = __tdvmcall(fn, p0, p1, p2, p3, &out);
> BUG_ON(err);
>
> return out.r11;
> }
>
> static void tdx_handle_cpuid(struct pt_regs *regs)
> {
> struct tdvmcall_output out;
> u64 err;
>
> err = __tdvmcall(EXIT_REASON_CPUID, regs->ax, regs->cx, 0, 0, &out);
> BUG_ON(err);
>
> regs->ax = out.r11;
> regs->bx = out.r12;
> regs->cx = out.r13;
> regs->dx = out.r14;
> }
>
> #define REG_MASK(size) ((1ULL << ((size) * 8)) - 1)
>
> static void tdx_handle_io(struct pt_regs *regs, u32 exit_qual)
> {
> u8 out = (exit_qual & 8) ? 0 : 1;
> u8 size = (exit_qual & 7) + 1;
> u16 port = exit_qual >> 16;
> u64 val;
>
> /* I/O strings ops are unrolled at build time. */
> BUG_ON(exit_qual & 0x10);
>
> if (!tdx_allowed_port(port))
> return;
>
> if (out)
> val = regs->ax & REG_MASK(size);
> else
> val = 0;
>
> val = tdvmcall(EXIT_REASON_IO_INSTRUCTION, port, size, out, val);
> if (!out) {
> /* The upper bits of *AX are preserved for 2 and 1 byte I/O. */
> if (size < 4)
> val |= (regs->ax & ~REG_MASK(size));
> regs->ax = val;
> }4
> }
>
> static u64 tdx_read_msr_safe(unsigned int msr, int *ret)
> {
> struct tdvmcall_output out;
> u64 err;
>
> WARN_ON_ONCE(tdx_is_context_switched_msr(msr));
>
> if (msr == MSR_CSTAR) {
> *ret = 0;
> return 0;
> }
>
> err = __tdvmcall(EXIT_REASON_MSR_READ, regs->ax, regs->cx, 0, 0, &out);
> if (err) {
> *ret -EIO;
> return 0;
> }
> return out.r11;
> }
>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2021-03-19 18:24:21

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v1 1/1] x86/tdx: Add tdcall() and tdvmcall() helper functions

On 3/19/21 10:42 AM, Kuppuswamy, Sathyanarayanan wrote:
>>> @@ -4,6 +4,58 @@
>>>   #include <asm/tdx.h>
>>>   #include <asm/cpufeature.h>
>>>   +void tdcall(u64 leafid, struct tdcall_regs *regs)
>>> +{
>>> +    asm volatile(
>>> +            /* RAX = leafid (TDCALL LEAF ID) */
>>> +            "  movq %0, %%rax;"
>>> +            /* Move regs->r[*] data to regs r[a-c]x,  r8-r5 */
>>> +            "  movq 8(%1), %%rcx;"
>>
>> I am super duper opposed to using inline asm. Large blocks are
>> hard to read,
> I think this point is arguable. Based on the review comments I
> received so far, people prefer inline assembly compared to asm sub
> functions.

It's arguable, but Sean makes a pretty compelling case.

I actually think inline assembly is a monstrosity. It's insanely arcane
and, as I hope you have noted, does not scale nicely beyond one or two
instructions.

>> and even harder to maintain. E.g. the %1 usage falls apart if an
>> output constraint is added; that can be avoided by defining a local
>> const/imm (I forget what they're called), but it doesn't help
>> readability.
> we can use OFFSET() calls to improve the readability and avoid this
> issue. Also IMO, any one adding constraints should know how this
> would affect the asm code.

This is about *maintainability*. How _easily_ can someone change this
code in the future? Sean's arguing that it's *hard* to correctly add a
constraint. Unfortunately, our supply of omnipotent kernel developers
is a bit short.

>>> +            "  movq 16(%1), %%rdx;"
>>> +            "  movq 24(%1), %%r8;"
>>> +            "  movq 32(%1), %%r9;"
>>> +            "  movq 40(%1), %%r10;"
>>> +            "  movq 48(%1), %%r11;"
>>> +            "  movq 56(%1), %%r12;"
>>> +            "  movq 64(%1), %%r13;"
>>> +            "  movq 72(%1), %%r14;"
>>> +            "  movq 80(%1), %%r15;"
>>
>> This is extremely unsafe, and wasteful. Putting the onus on the
>> caller to zero out unused registers, with no mechanism to
>> enforce/encourage doing so,
> For encouragement, we can add a comment to this function about
> callers responsibility. makes it
>> likely that the kernel will leak information to the VMM, e.g. in
>> the form of stack data due to a partially initialized "regs".
> Unless you create sub-functions for each use cases, callers cannot
> avoid this responsibility.

I don't think we're quite at the point where we throw up our hands.

It would be pretty simple to have an initializer that zeros the
registers out, or looks at the argument mask and does it more precisely.
Surely we can do *something*.

>>     /* Offset for fields in tdvmcall_output */
>>     OFFSET(TDVMCALL_r12, tdvmcall_output, r13);
>>     OFFSET(TDVMCALL_r13, tdvmcall_output, r13);
>>     OFFSET(TDVMCALL_r14, tdvmcall_output, r14);
>>     OFFSET(TDVMCALL_r15, tdvmcall_output, r15);
>>
>> SYM_FUNC_START(__tdvmcall)
>>     FRAME_BEGIN
>>
>>     /* Save/restore non-volatile GPRs that are exposed to the VMM. */
>>          push %r15
>>          push %r14
>>          push %r13
>>          push %r12

I might have some tweaks for the assembly once someone puts a real patch
together. But, that looks a lot more sane than the inline assembly to me.

Subject: Re: [PATCH v1 1/1] x86/tdx: Add tdcall() and tdvmcall() helper functions



On 3/19/21 11:22 AM, Dave Hansen wrote:
> On 3/19/21 10:42 AM, Kuppuswamy, Sathyanarayanan wrote:
>>>> @@ -4,6 +4,58 @@
>>>>   #include <asm/tdx.h>
>>>>   #include <asm/cpufeature.h>
>>>>   +void tdcall(u64 leafid, struct tdcall_regs *regs)
>>>> +{
>>>> +    asm volatile(
>>>> +            /* RAX = leafid (TDCALL LEAF ID) */
>>>> +            "  movq %0, %%rax;"
>>>> +            /* Move regs->r[*] data to regs r[a-c]x,  r8-r5 */
>>>> +            "  movq 8(%1), %%rcx;"
>>>
>>> I am super duper opposed to using inline asm. Large blocks are
>>> hard to read,
>> I think this point is arguable. Based on the review comments I
>> received so far, people prefer inline assembly compared to asm sub
>> functions.
>
> It's arguable, but Sean makes a pretty compelling case.
>
> I actually think inline assembly is a monstrosity. It's insanely arcane
> and, as I hope you have noted, does not scale nicely beyond one or two
> instructions.
>
>>> and even harder to maintain. E.g. the %1 usage falls apart if an
>>> output constraint is added; that can be avoided by defining a local
>>> const/imm (I forget what they're called), but it doesn't help
>>> readability.
>> we can use OFFSET() calls to improve the readability and avoid this
>> issue. Also IMO, any one adding constraints should know how this
>> would affect the asm code.
>
> This is about *maintainability*. How _easily_ can someone change this
> code in the future? Sean's arguing that it's *hard* to correctly add a
> constraint. Unfortunately, our supply of omnipotent kernel developers
> is a bit short.
>
>>>> +            "  movq 16(%1), %%rdx;"
>>>> +            "  movq 24(%1), %%r8;"
>>>> +            "  movq 32(%1), %%r9;"
>>>> +            "  movq 40(%1), %%r10;"
>>>> +            "  movq 48(%1), %%r11;"
>>>> +            "  movq 56(%1), %%r12;"
>>>> +            "  movq 64(%1), %%r13;"
>>>> +            "  movq 72(%1), %%r14;"
>>>> +            "  movq 80(%1), %%r15;"
>>>
>>> This is extremely unsafe, and wasteful. Putting the onus on the
>>> caller to zero out unused registers, with no mechanism to
>>> enforce/encourage doing so,
>> For encouragement, we can add a comment to this function about
>> callers responsibility. makes it
>>> likely that the kernel will leak information to the VMM, e.g. in
>>> the form of stack data due to a partially initialized "regs".
>> Unless you create sub-functions for each use cases, callers cannot
>> avoid this responsibility.
>
> I don't think we're quite at the point where we throw up our hands.
>
> It would be pretty simple to have an initializer that zeros the
> registers out, or looks at the argument mask and does it more precisely.
> Surely we can do *something*.
>
>>>     /* Offset for fields in tdvmcall_output */
>>>     OFFSET(TDVMCALL_r12, tdvmcall_output, r13);
>>>     OFFSET(TDVMCALL_r13, tdvmcall_output, r13);
>>>     OFFSET(TDVMCALL_r14, tdvmcall_output, r14);
>>>     OFFSET(TDVMCALL_r15, tdvmcall_output, r15);
>>>
>>> SYM_FUNC_START(__tdvmcall)
>>>     FRAME_BEGIN
>>>
>>>     /* Save/restore non-volatile GPRs that are exposed to the VMM. */
>>>          push %r15
>>>          push %r14
>>>          push %r13
>>>          push %r12
>
> I might have some tweaks for the assembly once someone puts a real patch
> together. But, that looks a lot more sane than the inline assembly to me.
Ok. Let me use Sean's proposal and submit tested version of this patch.

Also, any thoughts on whether you want to use single function for
tdcall and tdvmcall?
>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

Subject: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

Implement common helper functions to communicate with
the TDX Module and VMM (using TDCALL instruction).

__tdvmcall() function can be used to request services
from VMM.

__tdcall() function can be used to communicate with the
TDX Module.

Using common helper functions makes the code more readable
and less error prone compared to distributed and use case
specific inline assembly code. Only downside in using this
approach is, it adds a few extra instructions for every
TDCALL use case when compared to distributed checks. Although
it's a bit less efficient, it's worth it to make the code more
readable.

Originally-by: Sean Christopherson <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---

Hi All,

Please let me know your review comments. If you agree with this patch
and want to see the use of these APIs in rest of the patches, I will
re-send the patch series with updated code. Please let me know.

Changes since v1:
* Implemented tdvmcall and tdcall helper functions as assembly code.
* Followed suggestion provided by Sean & Dave.

arch/x86/include/asm/tdx.h | 23 +++++
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/asm-offsets.c | 22 +++++
arch/x86/kernel/tdcall.S | 163 ++++++++++++++++++++++++++++++++++
arch/x86/kernel/tdx.c | 30 +++++++
5 files changed, 239 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kernel/tdcall.S

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 69af72d08d3d..ce6212ce5f45 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -8,12 +8,35 @@
#ifdef CONFIG_INTEL_TDX_GUEST

#include <asm/cpufeature.h>
+#include <linux/types.h>
+
+struct tdcall_output {
+ u64 rcx;
+ u64 rdx;
+ u64 r8;
+ u64 r9;
+ u64 r10;
+ u64 r11;
+};
+
+struct tdvmcall_output {
+ u64 r11;
+ u64 r12;
+ u64 r13;
+ u64 r14;
+ u64 r15;
+};

/* Common API to check TDX support in decompression and common kernel code. */
bool is_tdx_guest(void);

void __init tdx_early_init(void);

+u64 __tdcall(u64 fn, u64 rcx, u64 rdx, struct tdcall_output *out);
+
+u64 __tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15,
+ struct tdvmcall_output *out);
+
#else // !CONFIG_INTEL_TDX_GUEST

static inline bool is_tdx_guest(void)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index ea111bf50691..7966c10ea8d1 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -127,7 +127,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o
obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o

obj-$(CONFIG_JAILHOUSE_GUEST) += jailhouse.o
-obj-$(CONFIG_INTEL_TDX_GUEST) += tdx.o
+obj-$(CONFIG_INTEL_TDX_GUEST) += tdcall.o tdx.o

obj-$(CONFIG_EISA) += eisa.o
obj-$(CONFIG_PCSPKR_PLATFORM) += pcspeaker.o
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 60b9f42ce3c1..72de0b49467e 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -23,6 +23,10 @@
#include <xen/interface/xen.h>
#endif

+#ifdef CONFIG_INTEL_TDX_GUEST
+#include <asm/tdx.h>
+#endif
+
#ifdef CONFIG_X86_32
# include "asm-offsets_32.c"
#else
@@ -75,6 +79,24 @@ static void __used common(void)
OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2);
#endif

+#ifdef CONFIG_INTEL_TDX_GUEST
+ BLANK();
+ /* Offset for fields in tdcall_output */
+ OFFSET(TDCALL_rcx, tdcall_output, rcx);
+ OFFSET(TDCALL_rdx, tdcall_output, rdx);
+ OFFSET(TDCALL_r8, tdcall_output, r8);
+ OFFSET(TDCALL_r9, tdcall_output, r9);
+ OFFSET(TDCALL_r10, tdcall_output, r10);
+ OFFSET(TDCALL_r11, tdcall_output, r11);
+
+ /* Offset for fields in tdvmcall_output */
+ OFFSET(TDVMCALL_r11, tdvmcall_output, r11);
+ OFFSET(TDVMCALL_r12, tdvmcall_output, r12);
+ OFFSET(TDVMCALL_r13, tdvmcall_output, r13);
+ OFFSET(TDVMCALL_r14, tdvmcall_output, r14);
+ OFFSET(TDVMCALL_r15, tdvmcall_output, r15);
+#endif
+
BLANK();
OFFSET(BP_scratch, boot_params, scratch);
OFFSET(BP_secure_boot, boot_params, secure_boot);
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
new file mode 100644
index 000000000000..a73b67c0b407
--- /dev/null
+++ b/arch/x86/kernel/tdcall.S
@@ -0,0 +1,163 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <asm/asm-offsets.h>
+#include <asm/asm.h>
+#include <asm/frame.h>
+#include <asm/unwind_hints.h>
+
+#include <linux/linkage.h>
+
+#define TDVMCALL_EXPOSE_REGS_MASK 0xfc00
+
+/*
+ * TDCALL instruction is newly added in TDX architecture,
+ * used by TD for requesting the host VMM to provide
+ * (untrusted) services. Supported in Binutils >= 2.36
+ */
+#define tdcall .byte 0x66,0x0f,0x01,0xcc
+
+/* Only for non TDVMCALL use cases */
+SYM_FUNC_START(__tdcall)
+ FRAME_BEGIN
+
+ /* Save/restore non-volatile GPRs that are exposed to the VMM. */
+ push %r15
+ push %r14
+ push %r13
+ push %r12
+
+ /*
+ * RDI => RAX = TDCALL leaf
+ * RSI => RCX = input param 1
+ * RDX => RDX = input param 2
+ * RCX => N/A = output struct
+ */
+
+ /* Save output pointer to R12 */
+ mov %rcx, %r12
+ /* Move TDCALL Leaf ID to RAX */
+ mov %rdi, %rax
+ /* Move input param 1 to rcx*/
+ mov %rsi, %rcx
+
+ tdcall
+
+ /*
+ * On success, propagate TDCALL outputs values to the output struct,
+ * if an output struct is provided.
+ */
+ test %rax, %rax
+ jnz 1f
+ test %r12, %r12
+ jz 1f
+
+ movq %rcx, TDCALL_rcx(%r12)
+ movq %rdx, TDCALL_rdx(%r12)
+ movq %r8, TDCALL_r8(%r12)
+ movq %r9, TDCALL_r9(%r12)
+ movq %r10, TDCALL_r10(%r12)
+ movq %r11, TDCALL_r11(%r12)
+1:
+ /*
+ * Zero out registers exposed to the VMM to avoid speculative execution
+ * with VMM-controlled values.
+ */
+ xor %rcx, %rcx
+ xor %rdx, %rdx
+ xor %r8d, %r8d
+ xor %r9d, %r9d
+ xor %r10d, %r10d
+ xor %r11d, %r11d
+
+ pop %r12
+ pop %r13
+ pop %r14
+ pop %r15
+
+ FRAME_END
+ ret
+SYM_FUNC_END(__tdcall)
+
+.macro tdvmcall_core
+ FRAME_BEGIN
+
+ /* Save/restore non-volatile GPRs that are exposed to the VMM. */
+ push %r15
+ push %r14
+ push %r13
+ push %r12
+
+ /*
+ * 0 => RAX = TDCALL leaf
+ * RDI => R11 = TDVMCALL function, e.g. exit reason
+ * RSI => R12 = input param 0
+ * RDX => R13 = input param 1
+ * RCX => R14 = input param 2
+ * R8 => R15 = input param 3
+ * MASK => RCX = TDVMCALL register behavior
+ * R9 => R9 = output struct
+ */
+
+ xor %eax, %eax
+ mov %rdi, %r11
+ mov %rsi, %r12
+ mov %rdx, %r13
+ mov %rcx, %r14
+ mov %r8, %r15
+
+ /*
+ * Expose R10 - R15, i.e. all GPRs that may be used by TDVMCALLs
+ * defined in the GHCI. Note, RAX and RCX are consumed, but only by
+ * TDX-Module and so don't need to be listed in the mask.
+ */
+ movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
+
+ tdcall
+
+ /* Panic if TDCALL reports failure. */
+ test %rax, %rax
+ jnz 2f
+
+ /* Propagate TDVMCALL success/failure to return value. */
+ mov %r10, %rax
+
+ /*
+ * On success, propagate TDVMCALL outputs values to the output struct,
+ * if an output struct is provided.
+ */
+ test %rax, %rax
+ jnz 1f
+ test %r9, %r9
+ jz 1f
+
+ movq %r11, TDVMCALL_r11(%r9)
+ movq %r12, TDVMCALL_r12(%r9)
+ movq %r13, TDVMCALL_r13(%r9)
+ movq %r14, TDVMCALL_r14(%r9)
+ movq %r15, TDVMCALL_r15(%r9)
+1:
+ /*
+ * Zero out registers exposed to the VMM to avoid speculative execution
+ * with VMM-controlled values.
+ */
+ xor %r10d, %r10d
+ xor %r11d, %r11d
+ xor %r12d, %r12d
+ xor %r13d, %r13d
+ xor %r14d, %r14d
+ xor %r15d, %r15d
+
+ pop %r12
+ pop %r13
+ pop %r14
+ pop %r15
+
+ FRAME_END
+ ret
+2:
+ ud2
+.endm
+
+SYM_FUNC_START(__tdvmcall)
+ xor %r10, %r10
+ tdvmcall_core
+SYM_FUNC_END(__tdvmcall)
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 0d00dd50a6ff..1147e7e765d6 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -3,6 +3,36 @@

#include <asm/tdx.h>

+/*
+ * Wrapper for the common case with standard output value (R10).
+ */
+static inline u64 tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
+{
+ u64 err;
+
+ err = __tdvmcall(fn, r12, r13, r14, r15, NULL);
+
+ WARN_ON(err);
+
+ return err;
+}
+
+/*
+ * Wrapper for the semi-common case where we need single output value (R11).
+ */
+static inline u64 tdvmcall_out_r11(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
+{
+
+ struct tdvmcall_output out = {0};
+ u64 err;
+
+ err = __tdvmcall(fn, r12, r13, r14, r15, &out);
+
+ WARN_ON(err);
+
+ return out.r11;
+}
+
static inline bool cpuid_has_tdx_guest(void)
{
u32 eax, signature[3];
--
2.25.1

2021-04-20 17:40:26

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

On 3/26/21 4:38 PM, Kuppuswamy Sathyanarayanan wrote:
> Implement common helper functions to communicate with
> the TDX Module and VMM (using TDCALL instruction).

This is missing any kind of background. I'd say:

Guests communicate with VMMs with hypercalls. Historically, these are
implemented using instructions that are known to cause VMEXITs like
<examples here>. However, with TDX, VMEXITs no longer expose guest
state from the host. This prevents the old hypercall mechanisms from
working....

... and then go on to talk about what you are introducing, why there are
two of them and so forth.

> __tdvmcall() function can be used to request services
> from VMM.

^ "from a VMM" or "from the VMM", please

> __tdcall() function can be used to communicate with the
> TDX Module.
>
> Using common helper functions makes the code more readable
> and less error prone compared to distributed and use case
> specific inline assembly code. Only downside in using this

^ "The only downside..."

> approach is, it adds a few extra instructions for every
> TDCALL use case when compared to distributed checks. Although
> it's a bit less efficient, it's worth it to make the code more
> readable.

What's a "distributed check"?

This also doesn't talk at all about why this approach was chosen versus
inline assembly. You're going to be asked "why not use inline asm?"

> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -8,12 +8,35 @@
> #ifdef CONFIG_INTEL_TDX_GUEST
>
> #include <asm/cpufeature.h>
> +#include <linux/types.h>
> +
> +struct tdcall_output {
> + u64 rcx;
> + u64 rdx;
> + u64 r8;
> + u64 r9;
> + u64 r10;
> + u64 r11;
> +};
> +
> +struct tdvmcall_output {
> + u64 r11;
> + u64 r12;
> + u64 r13;
> + u64 r14;
> + u64 r15;
> +};
>
> /* Common API to check TDX support in decompression and common kernel code. */
> bool is_tdx_guest(void);
>
> void __init tdx_early_init(void);
>
> +u64 __tdcall(u64 fn, u64 rcx, u64 rdx, struct tdcall_output *out);
> +
> +u64 __tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15,
> + struct tdvmcall_output *out);

Some one-liner comments about what these do would be nice.

> #else // !CONFIG_INTEL_TDX_GUEST
>
> static inline bool is_tdx_guest(void)
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index ea111bf50691..7966c10ea8d1 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -127,7 +127,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o
> obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
>
> obj-$(CONFIG_JAILHOUSE_GUEST) += jailhouse.o
> -obj-$(CONFIG_INTEL_TDX_GUEST) += tdx.o
> +obj-$(CONFIG_INTEL_TDX_GUEST) += tdcall.o tdx.o
>
> obj-$(CONFIG_EISA) += eisa.o
> obj-$(CONFIG_PCSPKR_PLATFORM) += pcspeaker.o
> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> index 60b9f42ce3c1..72de0b49467e 100644
> --- a/arch/x86/kernel/asm-offsets.c
> +++ b/arch/x86/kernel/asm-offsets.c
> @@ -23,6 +23,10 @@
> #include <xen/interface/xen.h>
> #endif
>
> +#ifdef CONFIG_INTEL_TDX_GUEST
> +#include <asm/tdx.h>
> +#endif
> +
> #ifdef CONFIG_X86_32
> # include "asm-offsets_32.c"
> #else
> @@ -75,6 +79,24 @@ static void __used common(void)
> OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2);
> #endif
>
> +#ifdef CONFIG_INTEL_TDX_GUEST
> + BLANK();
> + /* Offset for fields in tdcall_output */
> + OFFSET(TDCALL_rcx, tdcall_output, rcx);
> + OFFSET(TDCALL_rdx, tdcall_output, rdx);
> + OFFSET(TDCALL_r8, tdcall_output, r8);
> + OFFSET(TDCALL_r9, tdcall_output, r9);
> + OFFSET(TDCALL_r10, tdcall_output, r10);
> + OFFSET(TDCALL_r11, tdcall_output, r11);

^ vertically align this

> + /* Offset for fields in tdvmcall_output */
> + OFFSET(TDVMCALL_r11, tdvmcall_output, r11);
> + OFFSET(TDVMCALL_r12, tdvmcall_output, r12);
> + OFFSET(TDVMCALL_r13, tdvmcall_output, r13);
> + OFFSET(TDVMCALL_r14, tdvmcall_output, r14);
> + OFFSET(TDVMCALL_r15, tdvmcall_output, r15);
> +#endif
> +
> BLANK();
> OFFSET(BP_scratch, boot_params, scratch);
> OFFSET(BP_secure_boot, boot_params, secure_boot);
> diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
> new file mode 100644
> index 000000000000..a73b67c0b407
> --- /dev/null
> +++ b/arch/x86/kernel/tdcall.S
> @@ -0,0 +1,163 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#include <asm/asm-offsets.h>
> +#include <asm/asm.h>
> +#include <asm/frame.h>
> +#include <asm/unwind_hints.h>
> +
> +#include <linux/linkage.h>
> +
> +#define TDVMCALL_EXPOSE_REGS_MASK 0xfc00

This looks like an undocumented magic number.

> +/*
> + * TDCALL instruction is newly added in TDX architecture,
> + * used by TD for requesting the host VMM to provide
> + * (untrusted) services. Supported in Binutils >= 2.36
> + */

Host VMM *AND* TD-module, right?

> +#define tdcall .byte 0x66,0x0f,0x01,0xcc

How well will the "newly added" comment age?

"host VMM" is redundant.

/*
* TDX guests use the TDCALL instruction to make
* hypercalls to the VMM. ...


> +/* Only for non TDVMCALL use cases */
> +SYM_FUNC_START(__tdcall)
> + FRAME_BEGIN
> +
> + /* Save/restore non-volatile GPRs that are exposed to the VMM. */
> + push %r15
> + push %r14
> + push %r13
> + push %r12

How is this restoring GPRs?

> + /*
> + * RDI => RAX = TDCALL leaf
> + * RSI => RCX = input param 1
> + * RDX => RDX = input param 2
> + * RCX => N/A = output struct
> + */

I don't like this block comment. These should be interspersed with the
instructions. It's actually redundant with what's below.

> + /* Save output pointer to R12 */
> + mov %rcx, %r12

Is this a "save" or a "move"? Isn't this moving the function argument
"%rcx" to the TDCALL register argument "%r12"?

> + /* Move TDCALL Leaf ID to RAX */
> + mov %rdi, %rax
> + /* Move input param 1 to rcx*/
> + mov %rsi, %rcx

This needs a comment:

/* Leave the third function argument (%RDX) in place */

> + tdcall
> +
> + /*
> + * On success, propagate TDCALL outputs values to the output struct,
> + * if an output struct is provided.
> + */

Again, I don't like the comment separated from the instructions. This
should be:


/* Check for TDCALL success: */
> + test %rax, %rax
> + jnz 1f

/* Check for a TDCALL output struct */
> + test %r12, %r12
> + jz 1f

/* Copy TDCALL result registers to output struct: */
> + movq %rcx, TDCALL_rcx(%r12)
> + movq %rdx, TDCALL_rdx(%r12)
> + movq %r8, TDCALL_r8(%r12)
> + movq %r9, TDCALL_r9(%r12)
> + movq %r10, TDCALL_r10(%r12)
> + movq %r11, TDCALL_r11(%r12)

^ Vertically align this

> +1:
> + /*
> + * Zero out registers exposed to the VMM to avoid speculative execution
> + * with VMM-controlled values.
> + */
> + xor %rcx, %rcx
> + xor %rdx, %rdx
> + xor %r8d, %r8d
> + xor %r9d, %r9d
> + xor %r10d, %r10d
> + xor %r11d, %r11d

This has tabs-versus-space problems.

Also, is this the architectural list of *POSSIBLE* registers to which
the VMM can write?

> + pop %r12
> + pop %r13
> + pop %r14
> + pop %r15
> +
> + FRAME_END
> + ret
> +SYM_FUNC_END(__tdcall)
> +
> +.macro tdvmcall_core
> + FRAME_BEGIN
> +
> + /* Save/restore non-volatile GPRs that are exposed to the VMM. */

Again, where's the "restore"?

> + push %r15
> + push %r14
> + push %r13
> + push %r12
> +
> + /*
> + * 0 => RAX = TDCALL leaf
> + * RDI => R11 = TDVMCALL function, e.g. exit reason
> + * RSI => R12 = input param 0
> + * RDX => R13 = input param 1
> + * RCX => R14 = input param 2
> + * R8 => R15 = input param 3
> + * MASK => RCX = TDVMCALL register behavior
> + * R9 => R9 = output struct
> + */
> +
> + xor %eax, %eax
> + mov %rdi, %r11
> + mov %rsi, %r12
> + mov %rdx, %r13
> + mov %rcx, %r14
> + mov %r8, %r15
> +
> + /*
> + * Expose R10 - R15, i.e. all GPRs that may be used by TDVMCALLs
> + * defined in the GHCI. Note, RAX and RCX are consumed, but only by
> + * TDX-Module and so don't need to be listed in the mask.
> + */

"GCHI" is out of the blue here. So is "TDX-Module". There needs to be
more context.

> + movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
> +
> + tdcall
> +
> + /* Panic if TDCALL reports failure. */
> + test %rax, %rax
> + jnz 2f

Why panic?

Also, do you *REALLY* need to do this from assembly? Can't it be done
in the C wrapper?

> + /* Propagate TDVMCALL success/failure to return value. */
> + mov %r10, %rax

You just said it panic's on failure. How can this propagate failure?

> + /*
> + * On success, propagate TDVMCALL outputs values to the output struct,
> + * if an output struct is provided.
> + */
> + test %rax, %rax
> + jnz 1f
> + test %r9, %r9
> + jz 1f
> +
> + movq %r11, TDVMCALL_r11(%r9)
> + movq %r12, TDVMCALL_r12(%r9)
> + movq %r13, TDVMCALL_r13(%r9)
> + movq %r14, TDVMCALL_r14(%r9)
> + movq %r15, TDVMCALL_r15(%r9)
> +1:
> + /*
> + * Zero out registers exposed to the VMM to avoid speculative execution
> + * with VMM-controlled values.
> + */

Please evenly split the comment across those two lines. (Do this
everywhere in the series).

> + xor %r10d, %r10d
> + xor %r11d, %r11d
> + xor %r12d, %r12d
> + xor %r13d, %r13d
> + xor %r14d, %r14d
> + xor %r15d, %r15d
> +
> + pop %r12
> + pop %r13
> + pop %r14
> + pop %r15
> +
> + FRAME_END
> + ret
> +2:
> + ud2
> +.endm
> +
> +SYM_FUNC_START(__tdvmcall)
> + xor %r10, %r10
> + tdvmcall_core
> +SYM_FUNC_END(__tdvmcall)
> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index 0d00dd50a6ff..1147e7e765d6 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -3,6 +3,36 @@
>
> #include <asm/tdx.h>
>
> +/*
> + * Wrapper for the common case with standard output value (R10).
> + */

... and oddly enough there is no explicit mention of R10 anywhere. Why?

> +static inline u64 tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
> +{
> + u64 err;
> +
> + err = __tdvmcall(fn, r12, r13, r14, r15, NULL);
> +
> + WARN_ON(err);
> +
> + return err;
> +}

Are there really *ZERO* reasons for a TDVMCALL to return an error?
Won't this let a malicious VMM spew endless warnings into the guest console?

> +/*
> + * Wrapper for the semi-common case where we need single output value (R11).
> + */
> +static inline u64 tdvmcall_out_r11(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
> +{
> +
> + struct tdvmcall_output out = {0};
> + u64 err;
> +
> + err = __tdvmcall(fn, r12, r13, r14, r15, &out);
> +
> + WARN_ON(err);
> +
> + return out.r11;
> +}
> +

But you introduced __tdvmcall and __tdcall assembly functions. Why
aren't both of them used?

Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions



On 4/20/21 10:36 AM, Dave Hansen wrote:
> On 3/26/21 4:38 PM, Kuppuswamy Sathyanarayanan wrote:
>> Implement common helper functions to communicate with
>> the TDX Module and VMM (using TDCALL instruction).
>
> This is missing any kind of background. I'd say:
>
> Guests communicate with VMMs with hypercalls. Historically, these are
> implemented using instructions that are known to cause VMEXITs like
> <examples here>. However, with TDX, VMEXITs no longer expose guest
> state from the host. This prevents the old hypercall mechanisms from
> working....
>
> ... and then go on to talk about what you are introducing, why there are
> two of them and so forth.
Ok. I will add it.
>
>> __tdvmcall() function can be used to request services
>> from VMM.
>
> ^ "from a VMM" or "from the VMM", please
>
will use "from the VMM".
>> __tdcall() function can be used to communicate with the
>> TDX Module.
>>
>> Using common helper functions makes the code more readable
>> and less error prone compared to distributed and use case
>> specific inline assembly code. Only downside in using this
>
> ^ "The only downside..."
will fix it.
>
>> approach is, it adds a few extra instructions for every
>> TDCALL use case when compared to distributed checks. Although
>> it's a bit less efficient, it's worth it to make the code more
>> readable.
>
> What's a "distributed check"?

It should be "distributed TDVMCALL/TDCALL inline assembly calls"
>
> This also doesn't talk at all about why this approach was chosen versus
> inline assembly. You're going to be asked "why not use inline asm?"
"To make the core more readable and less error prone." I have added this info
in above paragraph. Do you think we need more argument to justify our approach?
>
>> --- a/arch/x86/include/asm/tdx.h
>> +++ b/arch/x86/include/asm/tdx.h
>> @@ -8,12 +8,35 @@
>> #ifdef CONFIG_INTEL_TDX_GUEST
>>
>> #include <asm/cpufeature.h>
>> +#include <linux/types.h>
>> +
>> +struct tdcall_output {
>> + u64 rcx;
>> + u64 rdx;
>> + u64 r8;
>> + u64 r9;
>> + u64 r10;
>> + u64 r11;
>> +};
>> +
>> +struct tdvmcall_output {
>> + u64 r11;
>> + u64 r12;
>> + u64 r13;
>> + u64 r14;
>> + u64 r15;
>> +};
>>
>> /* Common API to check TDX support in decompression and common kernel code. */
>> bool is_tdx_guest(void);
>>
>> void __init tdx_early_init(void);
>>
>> +u64 __tdcall(u64 fn, u64 rcx, u64 rdx, struct tdcall_output *out);
>> +
>> +u64 __tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15,
>> + struct tdvmcall_output *out);
>
> Some one-liner comments about what these do would be nice.
will do.
>
>> #else // !CONFIG_INTEL_TDX_GUEST
>>
>> static inline bool is_tdx_guest(void)
>> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
>> index ea111bf50691..7966c10ea8d1 100644
>> --- a/arch/x86/kernel/Makefile
>> +++ b/arch/x86/kernel/Makefile
>> @@ -127,7 +127,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o
>> obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
>>
>> obj-$(CONFIG_JAILHOUSE_GUEST) += jailhouse.o
>> -obj-$(CONFIG_INTEL_TDX_GUEST) += tdx.o
>> +obj-$(CONFIG_INTEL_TDX_GUEST) += tdcall.o tdx.o
>>
>> obj-$(CONFIG_EISA) += eisa.o
>> obj-$(CONFIG_PCSPKR_PLATFORM) += pcspeaker.o
>> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
>> index 60b9f42ce3c1..72de0b49467e 100644
>> --- a/arch/x86/kernel/asm-offsets.c
>> +++ b/arch/x86/kernel/asm-offsets.c
>> @@ -23,6 +23,10 @@
>> #include <xen/interface/xen.h>
>> #endif
>>
>> +#ifdef CONFIG_INTEL_TDX_GUEST
>> +#include <asm/tdx.h>
>> +#endif
>> +
>> #ifdef CONFIG_X86_32
>> # include "asm-offsets_32.c"
>> #else
>> @@ -75,6 +79,24 @@ static void __used common(void)
>> OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2);
>> #endif
>>
>> +#ifdef CONFIG_INTEL_TDX_GUEST
>> + BLANK();
>> + /* Offset for fields in tdcall_output */
>> + OFFSET(TDCALL_rcx, tdcall_output, rcx);
>> + OFFSET(TDCALL_rdx, tdcall_output, rdx);
>> + OFFSET(TDCALL_r8, tdcall_output, r8);
>> + OFFSET(TDCALL_r9, tdcall_output, r9);
>> + OFFSET(TDCALL_r10, tdcall_output, r10);
>> + OFFSET(TDCALL_r11, tdcall_output, r11);
>
> ^ vertically align this
>
will fix it.
>> + /* Offset for fields in tdvmcall_output */
>> + OFFSET(TDVMCALL_r11, tdvmcall_output, r11);
>> + OFFSET(TDVMCALL_r12, tdvmcall_output, r12);
>> + OFFSET(TDVMCALL_r13, tdvmcall_output, r13);
>> + OFFSET(TDVMCALL_r14, tdvmcall_output, r14);
>> + OFFSET(TDVMCALL_r15, tdvmcall_output, r15);
>> +#endif
>> +
>> BLANK();
>> OFFSET(BP_scratch, boot_params, scratch);
>> OFFSET(BP_secure_boot, boot_params, secure_boot);
>> diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
>> new file mode 100644
>> index 000000000000..a73b67c0b407
>> --- /dev/null
>> +++ b/arch/x86/kernel/tdcall.S
>> @@ -0,0 +1,163 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#include <asm/asm-offsets.h>
>> +#include <asm/asm.h>
>> +#include <asm/frame.h>
>> +#include <asm/unwind_hints.h>
>> +
>> +#include <linux/linkage.h>
>> +
>> +#define TDVMCALL_EXPOSE_REGS_MASK 0xfc00
>
> This looks like an undocumented magic number.
>
>> +/*
>> + * TDCALL instruction is newly added in TDX architecture,
>> + * used by TD for requesting the host VMM to provide
>> + * (untrusted) services. Supported in Binutils >= 2.36
>> + */
>
> Host VMM *AND* TD-module, right?
Yes, you are correct. I will fix it.
>
>> +#define tdcall .byte 0x66,0x0f,0x01,0xcc
>
> How well will the "newly added" comment age?
>
> "host VMM" is redundant.
>
> /*
> * TDX guests use the TDCALL instruction to make
> * hypercalls to the VMM. ...
will use it.
>
>
>> +/* Only for non TDVMCALL use cases */
>> +SYM_FUNC_START(__tdcall)
>> + FRAME_BEGIN
>> +
>> + /* Save/restore non-volatile GPRs that are exposed to the VMM. */
>> + push %r15
>> + push %r14
>> + push %r13
>> + push %r12
>
> How is this restoring GPRs?
I have used the same comment for both push/pop combinations. Will remove
restore from above comment.
>
>> + /*
>> + * RDI => RAX = TDCALL leaf
>> + * RSI => RCX = input param 1
>> + * RDX => RDX = input param 2
>> + * RCX => N/A = output struct
>> + */
>
> I don't like this block comment. These should be interspersed with the
> instructions. It's actually redundant with what's below.
I just want to show register mapping details in one place (similar to C
function comments). But I am fine with instruction specific comments.
I will fix it in next version.
>
>> + /* Save output pointer to R12 */
>> + mov %rcx, %r12
>
> Is this a "save" or a "move"? Isn't this moving the function argument
> "%rcx" to the TDCALL register argument "%r12"?
>
>> + /* Move TDCALL Leaf ID to RAX */
>> + mov %rdi, %rax
>> + /* Move input param 1 to rcx*/
>> + mov %rsi, %rcx
>
> This needs a comment:
>
> /* Leave the third function argument (%RDX) in place */
>
Ok.
>> + tdcall
>> +
>> + /*
>> + * On success, propagate TDCALL outputs values to the output struct,
>> + * if an output struct is provided.
>> + */
>
> Again, I don't like the comment separated from the instructions. This
> should be:
will use instruction specific comments.
>
>
> /* Check for TDCALL success: */
>> + test %rax, %rax
>> + jnz 1f
>
> /* Check for a TDCALL output struct */
>> + test %r12, %r12
>> + jz 1f
>
> /* Copy TDCALL result registers to output struct: */
>> + movq %rcx, TDCALL_rcx(%r12)
>> + movq %rdx, TDCALL_rdx(%r12)
>> + movq %r8, TDCALL_r8(%r12)
>> + movq %r9, TDCALL_r9(%r12)
>> + movq %r10, TDCALL_r10(%r12)
>> + movq %r11, TDCALL_r11(%r12)
>
> ^ Vertically align this
will do.
>
>> +1:
>> + /*
>> + * Zero out registers exposed to the VMM to avoid speculative execution
>> + * with VMM-controlled values.
>> + */
>> + xor %rcx, %rcx
>> + xor %rdx, %rdx
>> + xor %r8d, %r8d
>> + xor %r9d, %r9d
>> + xor %r10d, %r10d
>> + xor %r11d, %r11d
>
> This has tabs-versus-space problems.
>
> Also, is this the architectural list of *POSSIBLE* registers to which
> the VMM can write?
>
>> + pop %r12
>> + pop %r13
>> + pop %r14
>> + pop %r15
>> +
>> + FRAME_END
>> + ret
>> +SYM_FUNC_END(__tdcall)
>> +
>> +.macro tdvmcall_core
>> + FRAME_BEGIN
>> +
>> + /* Save/restore non-volatile GPRs that are exposed to the VMM. */
>
> Again, where's the "restore"?
I have used the same comment for both push/pop combinations. Will remove
restore from above comment.
>
>> + push %r15
>> + push %r14
>> + push %r13
>> + push %r12
>> +
>> + /*
>> + * 0 => RAX = TDCALL leaf
>> + * RDI => R11 = TDVMCALL function, e.g. exit reason
>> + * RSI => R12 = input param 0
>> + * RDX => R13 = input param 1
>> + * RCX => R14 = input param 2
>> + * R8 => R15 = input param 3
>> + * MASK => RCX = TDVMCALL register behavior
>> + * R9 => R9 = output struct
>> + */
>> +
>> + xor %eax, %eax
>> + mov %rdi, %r11
>> + mov %rsi, %r12
>> + mov %rdx, %r13
>> + mov %rcx, %r14
>> + mov %r8, %r15
>> +
>> + /*
>> + * Expose R10 - R15, i.e. all GPRs that may be used by TDVMCALLs
>> + * defined in the GHCI. Note, RAX and RCX are consumed, but only by
>> + * TDX-Module and so don't need to be listed in the mask.
>> + */
>
> "GCHI" is out of the blue here. So is "TDX-Module". There needs to be
> more context.
ok. will add it. Do you want GHCI spec reference with section id here?
>
>> + movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>> +
>> + tdcall
>> +
>> + /* Panic if TDCALL reports failure. */
>> + test %rax, %rax
>> + jnz 2f
>
> Why panic?
As per spec, TDCALL should never fail. Note that it has nothing to do
with TDVMCALL function specific failure (which is reported via R10).
>
> Also, do you *REALLY* need to do this from assembly? Can't it be done
> in the C wrapper?
Its common for all use cases of TDVMCALL (vendor specific, in/out, etc). so added
it here.
>
>> + /* Propagate TDVMCALL success/failure to return value. */
>> + mov %r10, %rax
>
> You just said it panic's on failure. How can this propagate failure?
we use panic for TDCALL failure. But, R10 content used to identify whether given
TDVMCALL function operation is successful or not.
>
>> + /*
>> + * On success, propagate TDVMCALL outputs values to the output struct,
>> + * if an output struct is provided.
>> + */
>> + test %rax, %rax
>> + jnz 1f
>> + test %r9, %r9
>> + jz 1f
>> +
>> + movq %r11, TDVMCALL_r11(%r9)
>> + movq %r12, TDVMCALL_r12(%r9)
>> + movq %r13, TDVMCALL_r13(%r9)
>> + movq %r14, TDVMCALL_r14(%r9)
>> + movq %r15, TDVMCALL_r15(%r9)
>> +1:
>> + /*
>> + * Zero out registers exposed to the VMM to avoid speculative execution
>> + * with VMM-controlled values.
>> + */
>
> Please evenly split the comment across those two lines. (Do this
> everywhere in the series).
ok.
>
>> + xor %r10d, %r10d
>> + xor %r11d, %r11d
>> + xor %r12d, %r12d
>> + xor %r13d, %r13d
>> + xor %r14d, %r14d
>> + xor %r15d, %r15d
>> +
>> + pop %r12
>> + pop %r13
>> + pop %r14
>> + pop %r15
>> +
>> + FRAME_END
>> + ret
>> +2:
>> + ud2
>> +.endm
>> +
>> +SYM_FUNC_START(__tdvmcall)
>> + xor %r10, %r10
>> + tdvmcall_core
>> +SYM_FUNC_END(__tdvmcall)
>> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
>> index 0d00dd50a6ff..1147e7e765d6 100644
>> --- a/arch/x86/kernel/tdx.c
>> +++ b/arch/x86/kernel/tdx.c
>> @@ -3,6 +3,36 @@
>>
>> #include <asm/tdx.h>
>>
>> +/*
>> + * Wrapper for the common case with standard output value (R10).
>> + */
>
> ... and oddly enough there is no explicit mention of R10 anywhere. Why?
For Guest to Host call -> R10 holds TDCALL function id (which is 0 for TDVMCALL). so
we don't need special argument.
After TDVMCALL execution, R10 value is returned via RAX.
>
>> +static inline u64 tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
>> +{
>> + u64 err;
>> +
>> + err = __tdvmcall(fn, r12, r13, r14, r15, NULL);
>> +
>> + WARN_ON(err);
>> +
>> + return err;
>> +}
>
> Are there really *ZERO* reasons for a TDVMCALL to return an error?
No. Its useful for debugging TDVMCALL failures.
> Won't this let a malicious VMM spew endless warnings into the guest console?
As per GHCI spec, R10 will hold error code details which can be used to determine
the type of TDVMCALL failure. More warnings at-least show that we are working
with malicious VMM.
>
>> +/*
>> + * Wrapper for the semi-common case where we need single output value (R11).
>> + */
>> +static inline u64 tdvmcall_out_r11(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
>> +{
>> +
>> + struct tdvmcall_output out = {0};
>> + u64 err;
>> +
>> + err = __tdvmcall(fn, r12, r13, r14, r15, &out);
>> +
>> + WARN_ON(err);
>> +
>> + return out.r11;
>> +}
>> +
>
> But you introduced __tdvmcall and __tdcall assembly functions. Why
> aren't both of them used?
This patch just adds helper functions. Its used by other patches in the
series. __tdvmcall is used in this patch because we need to add more
wrappers for it.
>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2021-04-20 20:01:06

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

On 4/20/21 12:20 PM, Kuppuswamy, Sathyanarayanan wrote:
>>> approach is, it adds a few extra instructions for every
>>> TDCALL use case when compared to distributed checks. Although
>>> it's a bit less efficient, it's worth it to make the code more
>>> readable.
>>
>> What's a "distributed check"?
>
> It should be "distributed TDVMCALL/TDCALL inline assembly calls"

It's still not clear to what that refers.

>> This also doesn't talk at all about why this approach was chosen versus
>> inline assembly.  You're going to be asked "why not use inline asm?"
> "To make the core more readable and less error prone." I have added this
> info in above paragraph. Do you think we need more argument to
> justify our approach?

Yes, you need much more justification. That's pretty generic and
non-specific.

>>> +    /*
>>> +     * Expose R10 - R15, i.e. all GPRs that may be used by TDVMCALLs
>>> +     * defined in the GHCI.  Note, RAX and RCX are consumed, but
>>> only by
>>> +     * TDX-Module and so don't need to be listed in the mask.
>>> +     */
>>
>> "GCHI" is out of the blue here.  So is "TDX-Module".  There needs to be
>> more context.
> ok. will add it. Do you want GHCI spec reference with section id here?

Absolutely not. I dislike all of the section references as-is. Doesn't
a comment like this say what you said above and also add context?

Expose every register currently used in the
guest-to-host communication interface (GHCI).

>>> +    movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>>> +
>>> +    tdcall
>>> +
>>> +    /* Panic if TDCALL reports failure. */
>>> +    test %rax, %rax
>>> +    jnz 2f
>>
>> Why panic?
> As per spec, TDCALL should never fail. Note that it has nothing to do
> with TDVMCALL function specific failure (which is reported via R10).

You've introduced two concepts here, without differentiating them. You
need to work to differentiate these two kinds of failure somewhere. You
can't simply refer to both as "failure".

>> Also, do you *REALLY* need to do this from assembly?  Can't it be done
>> in the C wrapper?
> Its common for all use cases of TDVMCALL (vendor specific, in/out, etc).
> so added
> it here.

That's not a good reason. You could just as easily have a C wrapper
which all uses of TDVMCALL go through.

>>> +    /* Propagate TDVMCALL success/failure to return value. */
>>> +    mov %r10, %rax
>>
>> You just said it panic's on failure.  How can this propagate failure?
> we use panic for TDCALL failure. But, R10 content used to identify
> whether given
> TDVMCALL function operation is successful or not.

As I said above, please endeavor to differentiate the two classes of
failures.

Also, if the spec is violated, do you *REALLY* want to blow up the whole
world with a panic? I guess you can argue that it could have been
something security-related that failed. But, either way, you owe a
description of why panic'ing is a good idea, not just blindly deferring
to "the spec says this can't happen".

>>> +    xor %r10d, %r10d
>>> +    xor %r11d, %r11d
>>> +    xor %r12d, %r12d
>>> +    xor %r13d, %r13d
>>> +    xor %r14d, %r14d
>>> +    xor %r15d, %r15d
>>> +
>>> +    pop %r12
>>> +    pop %r13
>>> +    pop %r14
>>> +    pop %r15
>>> +
>>> +    FRAME_END
>>> +    ret
>>> +2:
>>> +    ud2
>>> +.endm
>>> +
>>> +SYM_FUNC_START(__tdvmcall)
>>> +    xor %r10, %r10
>>> +    tdvmcall_core
>>> +SYM_FUNC_END(__tdvmcall)
>>> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
>>> index 0d00dd50a6ff..1147e7e765d6 100644
>>> --- a/arch/x86/kernel/tdx.c
>>> +++ b/arch/x86/kernel/tdx.c
>>> @@ -3,6 +3,36 @@
>>>     #include <asm/tdx.h>
>>>   +/*
>>> + * Wrapper for the common case with standard output value (R10).
>>> + */
>>
>> ... and oddly enough there is no explicit mention of R10 anywhere.  Why?
> For Guest to Host call -> R10 holds TDCALL function id (which is 0 for
> TDVMCALL). so
> we don't need special argument.
> After TDVMCALL execution, R10 value is returned via RAX.

OK... so this is how it works. But why mention R10 in the comment? Why
is *THAT* worth mentioning?

>>> +static inline u64 tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
>>> +{
>>> +    u64 err;
>>> +
>>> +    err = __tdvmcall(fn, r12, r13, r14, r15, NULL);
>>> +
>>> +    WARN_ON(err);
>>> +
>>> +    return err;
>>> +}
>>
>> Are there really *ZERO* reasons for a TDVMCALL to return an error?
> No. Its useful for debugging TDVMCALL failures.
>> Won't this let a malicious VMM spew endless warnings into the guest
>> console?
> As per GHCI spec, R10 will hold error code details which can be used to
> determine
> the type of TDVMCALL failure.

I would encourage you to stop citing the GCCI spec. In all of these
conversations, you seem to rely on it without considering the underlying
reasons. The fact that R10 is the error code is 100% irrelevant for
this conversation.

It's also entirely possible that the host would have bugs, or forget to
clear a bit somewhere, even if the spec says, "don't do it".

> More warnings at-least show that we are working
> with malicious VMM.

That argument does not hold water for me.

You can argue that a counter can be kept, or that a WARN_ON_ONCE() is
appropriate, or that a printk_ratelimited() is nice. But, allowing an
untrusted software component to write unlimited warnings to the kernel
console is utterly nonsensical.

By the same argument, any userspace exploit attempts could spew warnings
to the console also so that we can tell we are working with malicious
userspace.

>>> +/*
>>> + * Wrapper for the semi-common case where we need single output
>>> value (R11).
>>> + */
>>> +static inline u64 tdvmcall_out_r11(u64 fn, u64 r12, u64 r13, u64
>>> r14, u64 r15)
>>> +{
>>> +
>>> +    struct tdvmcall_output out = {0};
>>> +    u64 err;
>>> +
>>> +    err = __tdvmcall(fn, r12, r13, r14, r15, &out);
>>> +
>>> +    WARN_ON(err);
>>> +
>>> +    return out.r11;
>>> +}
>>> +
>>
>> But you introduced __tdvmcall and __tdcall assembly functions.  Why
>> aren't both of them used?
> This patch just adds helper functions. Its used by other patches in the
> series. __tdvmcall is used in this patch because we need to add more
> wrappers for it.

That needs to be mentioned in the changelog at least.

Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions



On 4/20/21 12:59 PM, Dave Hansen wrote:
> On 4/20/21 12:20 PM, Kuppuswamy, Sathyanarayanan wrote:
>>>> approach is, it adds a few extra instructions for every
>>>> TDCALL use case when compared to distributed checks. Although
>>>> it's a bit less efficient, it's worth it to make the code more
>>>> readable.
>>>
>>> What's a "distributed check"?
>>
>> It should be "distributed TDVMCALL/TDCALL inline assembly calls"
>
> It's still not clear to what that refers.

I am just comparing the performance cost of using generic TDCALL()/TDVMCALL()
function implementation with "usage specific" (GetQuote,MapGPA, HLT,etc) custom
TDCALL()/TDVMCALL() inline assembly implementation.
>
>>> This also doesn't talk at all about why this approach was chosen versus
>>> inline assembly.  You're going to be asked "why not use inline asm?"
>> "To make the core more readable and less error prone." I have added this
>> info in above paragraph. Do you think we need more argument to
>> justify our approach?
>
> Yes, you need much more justification. That's pretty generic and
> non-specific.
readability is one of the main motivation for not choosing inline
assembly. Since number of lines of instructions (with comments) are over
70, using inline assembly made it hard to read. Another reason is, since we
are using many registers (R8-R15, R[A-D]X)) in TDVMCAL/TDCALL operation, we are
not sure whether some older compiler can follow our specified inline assembly
constraints.
>
>>>> +    /*
>>>> +     * Expose R10 - R15, i.e. all GPRs that may be used by TDVMCALLs
>>>> +     * defined in the GHCI.  Note, RAX and RCX are consumed, but
>>>> only by
>>>> +     * TDX-Module and so don't need to be listed in the mask.
>>>> +     */
>>>
>>> "GCHI" is out of the blue here.  So is "TDX-Module".  There needs to be
>>> more context.
>> ok. will add it. Do you want GHCI spec reference with section id here?
>
> Absolutely not. I dislike all of the section references as-is. Doesn't
> a comment like this say what you said above and also add context?
>
> Expose every register currently used in the
> guest-to-host communication interface (GHCI).
ok.
>
>>>> +    movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>>>> +
>>>> +    tdcall
>>>> +
>>>> +    /* Panic if TDCALL reports failure. */
>>>> +    test %rax, %rax
>>>> +    jnz 2f
>>>
>>> Why panic?
>> As per spec, TDCALL should never fail. Note that it has nothing to do
>> with TDVMCALL function specific failure (which is reported via R10).
>
> You've introduced two concepts here, without differentiating them. You
> need to work to differentiate these two kinds of failure somewhere. You
> can't simply refer to both as "failure".
will clarify it. I have assumed that once the user reads the spec, its easier
to understand.
>
>>> Also, do you *REALLY* need to do this from assembly?  Can't it be done
>>> in the C wrapper?
>> Its common for all use cases of TDVMCALL (vendor specific, in/out, etc).
>> so added
>> it here.
>
> That's not a good reason. You could just as easily have a C wrapper
> which all uses of TDVMCALL go through.
Any reason for not preferring it in assembly code?
Also, using wrapper will add more complication for in/out instruction
substitution use case. please check the use case in following patch.
https://github.com/intel/tdx/commit/1b73f60aa5bb93554f3b15cd786a9b10b53c1543
>
>>>> +    /* Propagate TDVMCALL success/failure to return value. */
>>>> +    mov %r10, %rax
>>>
>>> You just said it panic's on failure.  How can this propagate failure?
>> we use panic for TDCALL failure. But, R10 content used to identify
>> whether given
>> TDVMCALL function operation is successful or not.
>
> As I said above, please endeavor to differentiate the two classes of
> failures.
>
> Also, if the spec is violated, do you *REALLY* want to blow up the whole
> world with a panic? I guess you can argue that it could have been
> something security-related that failed. But, either way, you owe a
> description of why panic'ing is a good idea, not just blindly deferring
> to "the spec says this can't happen".
ok. will add some comments justifying our case.
>
>>>> +    xor %r10d, %r10d
>>>> +    xor %r11d, %r11d
>>>> +    xor %r12d, %r12d
>>>> +    xor %r13d, %r13d
>>>> +    xor %r14d, %r14d
>>>> +    xor %r15d, %r15d
>>>> +
>>>> +    pop %r12
>>>> +    pop %r13
>>>> +    pop %r14
>>>> +    pop %r15
>>>> +
>>>> +    FRAME_END
>>>> +    ret
>>>> +2:
>>>> +    ud2
>>>> +.endm
>>>> +
>>>> +SYM_FUNC_START(__tdvmcall)
>>>> +    xor %r10, %r10
>>>> +    tdvmcall_core
>>>> +SYM_FUNC_END(__tdvmcall)
>>>> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
>>>> index 0d00dd50a6ff..1147e7e765d6 100644
>>>> --- a/arch/x86/kernel/tdx.c
>>>> +++ b/arch/x86/kernel/tdx.c
>>>> @@ -3,6 +3,36 @@
>>>>     #include <asm/tdx.h>
>>>>   +/*
>>>> + * Wrapper for the common case with standard output value (R10).
>>>> + */
>>>
>>> ... and oddly enough there is no explicit mention of R10 anywhere.  Why?
>> For Guest to Host call -> R10 holds TDCALL function id (which is 0 for
>> TDVMCALL). so
>> we don't need special argument.
>> After TDVMCALL execution, R10 value is returned via RAX.
>
> OK... so this is how it works. But why mention R10 in the comment? Why
> is *THAT* worth mentioning?
its not needed. will remove it.
>
>>>> +static inline u64 tdvmcall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
>>>> +{
>>>> +    u64 err;
>>>> +
>>>> +    err = __tdvmcall(fn, r12, r13, r14, r15, NULL);
>>>> +
>>>> +    WARN_ON(err);
>>>> +
>>>> +    return err;
>>>> +}
>>>
>>> Are there really *ZERO* reasons for a TDVMCALL to return an error?
>> No. Its useful for debugging TDVMCALL failures.
>>> Won't this let a malicious VMM spew endless warnings into the guest
>>> console?
>> As per GHCI spec, R10 will hold error code details which can be used to
>> determine
>> the type of TDVMCALL failure.
>
> I would encourage you to stop citing the GCCI spec. In all of these
> conversations, you seem to rely on it without considering the underlying
> reasons. The fact that R10 is the error code is 100% irrelevant for
> this conversation.
>
> It's also entirely possible that the host would have bugs, or forget to
> clear a bit somewhere, even if the spec says, "don't do it".
>
>> More warnings at-least show that we are working
>> with malicious VMM.
>
> That argument does not hold water for me.
>
> You can argue that a counter can be kept, or that a WARN_ON_ONCE() is
> appropriate, or that a printk_ratelimited() is nice. But, allowing an
> untrusted software component to write unlimited warnings to the kernel
> console is utterly nonsensical.
>
> By the same argument, any userspace exploit attempts could spew warnings
> to the console also so that we can tell we are working with malicious
> userspace.
In our case, we will get WARN() output only if guest triggers TDCALL()/TDVMCALL()
right? So getting WARN() message for failure of guest triggered call is
justifiable right?
>
>>>> +/*
>>>> + * Wrapper for the semi-common case where we need single output
>>>> value (R11).
>>>> + */
>>>> +static inline u64 tdvmcall_out_r11(u64 fn, u64 r12, u64 r13, u64
>>>> r14, u64 r15)
>>>> +{
>>>> +
>>>> +    struct tdvmcall_output out = {0};
>>>> +    u64 err;
>>>> +
>>>> +    err = __tdvmcall(fn, r12, r13, r14, r15, &out);
>>>> +
>>>> +    WARN_ON(err);
>>>> +
>>>> +    return out.r11;
>>>> +}
>>>> +
>>>
>>> But you introduced __tdvmcall and __tdcall assembly functions.  Why
>>> aren't both of them used?
>> This patch just adds helper functions. Its used by other patches in the
>> series. __tdvmcall is used in this patch because we need to add more
>> wrappers for it.
>
> That needs to be mentioned in the changelog at least.
ok will do it.
>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2021-04-20 23:46:42

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

On 4/20/21 4:12 PM, Kuppuswamy, Sathyanarayanan wrote:
> On 4/20/21 12:59 PM, Dave Hansen wrote:
>> On 4/20/21 12:20 PM, Kuppuswamy, Sathyanarayanan wrote:
>>>>> approach is, it adds a few extra instructions for every
>>>>> TDCALL use case when compared to distributed checks. Although
>>>>> it's a bit less efficient, it's worth it to make the code more
>>>>> readable.
>>>>
>>>> What's a "distributed check"?
>>>
>>> It should be "distributed TDVMCALL/TDCALL inline assembly calls"
>>
>> It's still not clear to what that refers.
>
> I am just comparing the performance cost of using generic
> TDCALL()/TDVMCALL() function implementation with "usage specific"
> (GetQuote,MapGPA, HLT,etc) custom TDCALL()/TDVMCALL() inline assembly
> implementation.

So, I actually had an idea what you were talking about, but I have
*ZERO* idea what "distributed" means in this context.

I think you are trying to say something along the lines of:

Just like syscalls, not all TDVMCALL/TDCALLs use the same set
of argument registers. The implementation here picks the
current worst-case scenario for TDCALL (4 registers). For
TDCALLs with fewer than 4 arguments, there will end up being
a few superfluous (cheap) instructions. But, this approach
maximizes code reuse.


>>>> This also doesn't talk at all about why this approach was
>>>> chosen versus inline assembly. You're going to be asked "why
>>>> not use inline asm?"
>>> "To make the core more readable and less error prone." I have
>>> added this info in above paragraph. Do you think we need more
>>> argument to justify our approach?
>>
>> Yes, you need much more justification. That's pretty generic and
>> non-specific.
> readability is one of the main motivation for not choosing inline

I'm curious. Is there a reason you are not choosing to use
capitalization in your replies? I personally use capitalization as a
visual clue for where a reply starts.

I'm not sure whether this indicates that your keyboard is not
functioning properly, or that these replies are simply not important
enough to warrant the use of the Shift key. Or, is it simply an
oversight? Or, maybe I'm just being overly picky because I've been
working on these exact things with my third-grader a bit too much lately.

Either way, I personally would appreciate your attention to detail in
crafting writing that is easy to parse, since I'm the one that's going
to have to parse it. Details show that you care about the content you
produce. Even if you don't mean it, a lack of attention to detail (even
capital letters) can be perceived to mean that you do not care about
what you write. If you don't care about it, why should the reader?

> assembly. Since number of lines of instructions (with comments) are
> over 70, using inline assembly made it hard to read. Another reason
> is, since we
> are using many registers (R8-R15, R[A-D]X)) in TDVMCAL/TDCALL
> operation, we are not sure whether some older compiler can follow
> our specified inline assembly constraints.

As for the justification, that's much improved. Please include that,
along with some careful review of the grammar.

>>>>> +    movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>>>>> +
>>>>> +    tdcall
>>>>> +
>>>>> +    /* Panic if TDCALL reports failure. */
>>>>> +    test %rax, %rax
>>>>> +    jnz 2f
>>>>
>>>> Why panic?
>>> As per spec, TDCALL should never fail. Note that it has nothing to do
>>> with TDVMCALL function specific failure (which is reported via R10).
>>
>> You've introduced two concepts here, without differentiating them.  You
>> need to work to differentiate these two kinds of failure somewhere.  You
>> can't simply refer to both as "failure".
> will clarify it. I have assumed that once the user reads the spec, its
> easier
> to understand.

Your code should be 100% self-supporting without the spec. The spec can
be there in a supportive role to help resolve ambiguity or add fine
detail. But, I think this is a major, repeated problem with this patch
set: it relies too much on reviewers spending quality time with the spec.

>>>> Also, do you *REALLY* need to do this from assembly?  Can't it be done
>>>> in the C wrapper?
>>> Its common for all use cases of TDVMCALL (vendor specific, in/out, etc).
>>> so added
>>> it here.
>>
>> That's not a good reason.  You could just as easily have a C wrapper
>> which all uses of TDVMCALL go through.
> Any reason for not preferring it in assembly code?

Assembly is a last resort. It should only be used for things that
simply can't be written in C or are horrific to understand and manage
when written in C. A single statement like:

BUG_ON(something);

does not qualify in my book as something that's horrific to write in C.

> Also, using wrapper will add more complication for in/out instruction
> substitution use case. please check the use case in following patch.
> https://github.com/intel/tdx/commit/1b73f60aa5bb93554f3b15cd786a9b10b53c1543

I'm seeing a repeated theme here. The approach in this patch series,
and in this email thread in general appears to be one where the patch
submitter is doing as little work as possible and trying to make the
reviewer do as much work as possible.

This is a 300-line diff with all kinds of stuff going on in it. I'm not
sure to what you are referring. You haven't made it easy to figure out.

It would make it a lot easier if you pointed to a specific line, or
copied-and-pasted the code to which you refer. I would really encourage
you to try to make your content easier for reviewers to digest:
Capitalize the start of sentences. Make unambiguous references to code.
Don't blindly cite the spec. Fully express your thoughts.

You'll end up with happier reviewers instead of grumpy ones.

...
>>> More warnings at-least show that we are working
>>> with malicious VMM.
>>
>> That argument does not hold water for me.
>>
>> You can argue that a counter can be kept, or that a WARN_ON_ONCE() is
>> appropriate, or that a printk_ratelimited() is nice.  But, allowing an
>> untrusted software component to write unlimited warnings to the kernel
>> console is utterly nonsensical.
>>
>> By the same argument, any userspace exploit attempts could spew warnings
>> to the console also so that we can tell we are working with malicious
>> userspace.
> In our case, we will get WARN() output only if guest triggers
> TDCALL()/TDVMCALL()
> right? So getting WARN() message for failure of guest triggered call is
> justifiable right?

The output of these calls and thus the error code comes from another
piece of software, either the SEAM module or the VMM.

The error can be from one of several reasons:
1. Guest error/bug where the guest provides a bad value. This is
probably the most likely scenario. But, it's impossible to
differentiate from the other cases because it's a guest bug.
2. SEAM error/bug. If the spec says "SEAM will not do this", then you
can probably justify a WARN_ON_ONCE(). If the call is security-
sensitve, like part of attestation, then you can't meaningfully
recover from it and it probably deserves a BUG_ON().
3. VMM error/bug/malice. Again, you might be able to justify a
WARN_ON_ONCE(). We do that for userspace that might be attacking
the kernel. These are *NEVER* fatal and should be rate-limited.

I don't see *ANYWHERE* in this list where an unbounded, unratelimited
WARN() is appropriate. But, that's just my $0.02.

2021-04-20 23:55:17

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

On Tue, Apr 20, 2021 at 4:12 PM Kuppuswamy, Sathyanarayanan
<[email protected]> wrote:
[..]
> >>> Also, do you *REALLY* need to do this from assembly? Can't it be done
> >>> in the C wrapper?
> >> Its common for all use cases of TDVMCALL (vendor specific, in/out, etc).
> >> so added
> >> it here.
> >

Can I ask a favor?

Please put a line break between quoted lines and your reply.

> > That's not a good reason. You could just as easily have a C wrapper
> > which all uses of TDVMCALL go through.

...because this runs together when reading otherwise.

> Any reason for not preferring it in assembly code?
> Also, using wrapper will add more complication for in/out instruction
> substitution use case. please check the use case in following patch.
> https://github.com/intel/tdx/commit/1b73f60aa5bb93554f3b15cd786a9b10b53c1543

This commit still has open coded assembly for the TDVMCALL? I thought
we talked about it being unified with the common definition, or has
this patch not been reworked with that feedback yet? I expect there is
no performance reason why in/out need to get their own custom coded
TDVMCALL implementation. It should also be the case the failure should
behave the same as native in/out failure i.e. all ones on read
failure, and silent drops on write failure.

Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions



On 4/20/21 4:53 PM, Dan Williams wrote:
> On Tue, Apr 20, 2021 at 4:12 PM Kuppuswamy, Sathyanarayanan
> <[email protected]> wrote:
> [..]
>>>>> Also, do you *REALLY* need to do this from assembly? Can't it be done
>>>>> in the C wrapper?
>>>> Its common for all use cases of TDVMCALL (vendor specific, in/out, etc).
>>>> so added
>>>> it here.
>>>
>
> Can I ask a favor?
>
> Please put a line break between quoted lines and your reply.

will do

>
>>> That's not a good reason. You could just as easily have a C wrapper
>>> which all uses of TDVMCALL go through.
>
> ...because this runs together when reading otherwise.
>
>> Any reason for not preferring it in assembly code?
>> Also, using wrapper will add more complication for in/out instruction
>> substitution use case. please check the use case in following patch.
>> https://github.com/intel/tdx/commit/1b73f60aa5bb93554f3b15cd786a9b10b53c1543
>
> This commit still has open coded assembly for the TDVMCALL? I thought
> we talked about it being unified with the common definition, or has
> this patch not been reworked with that feedback yet? I expect there is
> no performance reason why in/out need to get their own custom coded
> TDVMCALL implementation. It should also be the case the failure should
> behave the same as native in/out failure i.e. all ones on read
> failure, and silent drops on write failure.
>

That link is for older version. My next version addresses your review
comments (re-uses TDVMCALL() function). Although the patch is ready, I am
waiting to fix other review comments before sending the next version. I
have just shared that link to explain about the use case.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions



On 4/20/21 4:42 PM, Dave Hansen wrote:
> On 4/20/21 4:12 PM, Kuppuswamy, Sathyanarayanan wrote:
>> On 4/20/21 12:59 PM, Dave Hansen wrote:
>>> On 4/20/21 12:20 PM, Kuppuswamy, Sathyanarayanan wrote:
>>>>>> approach is, it adds a few extra instructions for every
>>>>>> TDCALL use case when compared to distributed checks. Although
>>>>>> it's a bit less efficient, it's worth it to make the code more
>>>>>> readable.
>>>>>
>>>>> What's a "distributed check"?
>>>>
>>>> It should be "distributed TDVMCALL/TDCALL inline assembly calls"
>>>
>>> It's still not clear to what that refers.
>>
>> I am just comparing the performance cost of using generic
>> TDCALL()/TDVMCALL() function implementation with "usage specific"
>> (GetQuote,MapGPA, HLT,etc) custom TDCALL()/TDVMCALL() inline assembly
>> implementation.
>
> So, I actually had an idea what you were talking about, but I have
> *ZERO* idea what "distributed" means in this context.
>
> I think you are trying to say something along the lines of:
>
> Just like syscalls, not all TDVMCALL/TDCALLs use the same set
> of argument registers. The implementation here picks the
> current worst-case scenario for TDCALL (4 registers). For
> TDCALLs with fewer than 4 arguments, there will end up being
> a few superfluous (cheap) instructions. But, this approach
> maximizes code reuse.
>

Yes, you are correct. I will word it better in my next version.

>
>>>>> This also doesn't talk at all about why this approach was
>>>>> chosen versus inline assembly. You're going to be asked "why
>>>>> not use inline asm?"
>>>> "To make the core more readable and less error prone." I have
>>>> added this info in above paragraph. Do you think we need more
>>>> argument to justify our approach?
>>>
>>> Yes, you need much more justification. That's pretty generic and
>>> non-specific.
>> readability is one of the main motivation for not choosing inline
>
> I'm curious. Is there a reason you are not choosing to use
> capitalization in your replies? I personally use capitalization as a
> visual clue for where a reply starts.
>
> I'm not sure whether this indicates that your keyboard is not
> functioning properly, or that these replies are simply not important
> enough to warrant the use of the Shift key. Or, is it simply an
> oversight? Or, maybe I'm just being overly picky because I've been
> working on these exact things with my third-grader a bit too much lately.
>
> Either way, I personally would appreciate your attention to detail in
> crafting writing that is easy to parse, since I'm the one that's going
> to have to parse it. Details show that you care about the content you
> produce. Even if you don't mean it, a lack of attention to detail (even
> capital letters) can be perceived to mean that you do not care about
> what you write. If you don't care about it, why should the reader?
>
>> assembly. Since number of lines of instructions (with comments) are
>> over 70, using inline assembly made it hard to read. Another reason
>> is, since we
>> are using many registers (R8-R15, R[A-D]X)) in TDVMCAL/TDCALL
>> operation, we are not sure whether some older compiler can follow
>> our specified inline assembly constraints.
>
> As for the justification, that's much improved. Please include that,
> along with some careful review of the grammar.

It's an oversight from my end. I will keep it in mind in my future
replies.


>
>>>>>> +    movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx


>>>
>>> You've introduced two concepts here, without differentiating them.  You
>>> need to work to differentiate these two kinds of failure somewhere.  You
>>> can't simply refer to both as "failure".
>> will clarify it. I have assumed that once the user reads the spec, its
>> easier
>> to understand.
>
> Your code should be 100% self-supporting without the spec. The spec can
> be there in a supportive role to help resolve ambiguity or add fine
> detail. But, I think this is a major, repeated problem with this patch
> set: it relies too much on reviewers spending quality time with the spec.
>

I will review the patch set again and add necessary comments to fix this gap.

>>>>> Also, do you *REALLY* need to do this from assembly?  Can't it be done
>>>>> in the C wrapper?
>>>> Its common for all use cases of TDVMCALL (vendor specific, in/out, etc).
>>>> so added
>>>> it here.
>>>
>>> That's not a good reason.  You could just as easily have a C wrapper
>>> which all uses of TDVMCALL go through.
>> Any reason for not preferring it in assembly code?
>
> Assembly is a last resort. It should only be used for things that
> simply can't be written in C or are horrific to understand and manage
> when written in C. A single statement like:
>
> BUG_ON(something);
>
> does not qualify in my book as something that's horrific to write in C.
>
>> Also, using wrapper will add more complication for in/out instruction
>> substitution use case. please check the use case in following patch.
>> https://github.com/intel/tdx/commit/1b73f60aa5bb93554f3b15cd786a9b10b53c1543
>
> I'm seeing a repeated theme here. The approach in this patch series,
> and in this email thread in general appears to be one where the patch
> submitter is doing as little work as possible and trying to make the
> reviewer do as much work as possible.
>
> This is a 300-line diff with all kinds of stuff going on in it. I'm not
> sure to what you are referring. You haven't made it easy to figure out.

I have pointed that patch to give reference to how in/out instructions
are substituted with tdvmcalls(). Specific implementation is spread across
multiple lines/files in that patch. So I did not include specific line
numbers.

But let me try to explain it here. What I meant by complication is,
for in/out instruction, we use alternative_io() to substitute in/out
instructions with tdg_in()/tdg_out() assembly calls. So we have to ensure
that we don't corrupt registers or stack from the substituted instructions

If you check the implementation of tdg_in()/tdg_out(), you will notice
that we have added code to preserve the caller registers. So, if we use
C wrapper for this use case, there is a chance that it might mess
the caller registers or stack.

alternative_io("in" #bwl " %w2, %" #bw "0", \
"call tdg_in" #bwl, X86_FEATURE_TDX_GUEST, \
"=a"(value), "d"(port))

>
> It would make it a lot easier if you pointed to a specific line, or
> copied-and-pasted the code to which you refer. I would really encourage
> you to try to make your content easier for reviewers to digest:
> Capitalize the start of sentences. Make unambiguous references to code.
> Don't blindly cite the spec. Fully express your thoughts.
>
> You'll end up with happier reviewers instead of grumpy ones.

Got it. I will try to keep your suggestion in mind for future
communications.

>
> ...
>>>> More warnings at-least show that we are working
>>>> with malicious VMM.
>>>

>> In our case, we will get WARN() output only if guest triggers
>> TDCALL()/TDVMCALL()
>> right? So getting WARN() message for failure of guest triggered call is
>> justifiable right?
>
> The output of these calls and thus the error code comes from another
> piece of software, either the SEAM module or the VMM.
>
> The error can be from one of several reasons:
> 1. Guest error/bug where the guest provides a bad value. This is
> probably the most likely scenario. But, it's impossible to
> differentiate from the other cases because it's a guest bug.
> 2. SEAM error/bug. If the spec says "SEAM will not do this", then you
> can probably justify a WARN_ON_ONCE(). If the call is security-
> sensitve, like part of attestation, then you can't meaningfully
> recover from it and it probably deserves a BUG_ON().
> 3. VMM error/bug/malice. Again, you might be able to justify a
> WARN_ON_ONCE(). We do that for userspace that might be attacking
> the kernel. These are *NEVER* fatal and should be rate-limited.
>
> I don't see *ANYWHERE* in this list where an unbounded, unratelimited
> WARN() is appropriate. But, that's just my $0.02.

WARN_ON_ONCE() will not work for our use case. Since tdvmcall()/tdcall()
can be triggered for multiple use cases. So we can't print errors only
once.

I will go with pr_warn_ratelimited() for this use case.

>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2021-04-23 01:21:53

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

On 4/22/21 6:09 PM, Kuppuswamy, Sathyanarayanan wrote:
> But let me try to explain it here. What I meant by complication is,
> for in/out instruction, we use alternative_io() to substitute in/out
> instructions with tdg_in()/tdg_out() assembly calls. So we have to ensure
> that we don't corrupt registers or stack from the substituted instructions
>
> If you check the implementation of tdg_in()/tdg_out(), you will notice
> that we have added code to preserve the caller registers. So, if we use
> C wrapper for this use case, there is a chance that it might mess
> the caller registers or stack.
>
>     alternative_io("in" #bwl " %w2, %" #bw "0",            \
>             "call tdg_in" #bwl, X86_FEATURE_TDX_GUEST,    \
>             "=a"(value), "d"(port))

Are you saying that calling C functions from inline assembly might
corrupt the stack or registers? Are you suggesting that you simply
can't call C functions from inline assembly? Or, that you can't express
the register clobbers of a function call in inline assembly?

You might want to check around the kernel to see how other folks do it.

2021-04-23 01:36:45

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

On Thu, Apr 22, 2021 at 06:21:07PM -0700, Dave Hansen wrote:
> On 4/22/21 6:09 PM, Kuppuswamy, Sathyanarayanan wrote:
> > But let me try to explain it here. What I meant by complication is,
> > for in/out instruction, we use alternative_io() to substitute in/out
> > instructions with tdg_in()/tdg_out() assembly calls. So we have to ensure
> > that we don't corrupt registers or stack from the substituted instructions
> >
> > If you check the implementation of tdg_in()/tdg_out(), you will notice
> > that we have added code to preserve the caller registers. So, if we use
> > C wrapper for this use case, there is a chance that it might mess
> > the caller registers or stack.
> >
> > ????alternative_io("in" #bwl " %w2, %" #bw "0",??????????? \
> > ??????????? "call tdg_in" #bwl, X86_FEATURE_TDX_GUEST,??? \
> > ??????????? "=a"(value), "d"(port))
>
> Are you saying that calling C functions from inline assembly might
> corrupt the stack or registers? Are you suggesting that you simply

It's possible, but you would need to mark a lot more registers clobbered
(the x86-64 ABI allows to clobber many registers)

I don't think the stack would be messed up, but there might be problems
with writing the correct unwind information (which tends to be tricky)

Usually it's better to avoid it.

-Andi


> can't call C functions from inline assembly? Or, that you can't express
> the register clobbers of a function call in inline assembly?
>
> You might want to check around the kernel to see how other folks do it.

2021-04-23 15:19:17

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

On Thu, Apr 22, 2021, Andi Kleen wrote:
> On Thu, Apr 22, 2021 at 06:21:07PM -0700, Dave Hansen wrote:
> > On 4/22/21 6:09 PM, Kuppuswamy, Sathyanarayanan wrote:
> > > But let me try to explain it here. What I meant by complication is,
> > > for in/out instruction, we use alternative_io() to substitute in/out
> > > instructions with tdg_in()/tdg_out() assembly calls. So we have to ensure
> > > that we don't corrupt registers or stack from the substituted instructions
> > >
> > > If you check the implementation of tdg_in()/tdg_out(), you will notice
> > > that we have added code to preserve the caller registers. So, if we use
> > > C wrapper for this use case, there is a chance that it might mess
> > > the caller registers or stack.
> > >
> > > ????alternative_io("in" #bwl " %w2, %" #bw "0",??????????? \
> > > ??????????? "call tdg_in" #bwl, X86_FEATURE_TDX_GUEST,??? \

Has Intel "officially" switched to "tdg" as the acronym for TDX guest? As much
as I dislike having to juggle "TDX host" vs "TDX guest" concepts, tdx_ vs tdg_
isn't any better IMO. The latter looks an awful lot like a typo, grepping for
"tdx" to find relevant code will get fail (sometimes), and confusion seems
inevitable as keeping "TDX" out of guest code/comments/documentation will be
nigh impossible.

If we do decide to go with "tdg" for the guest stuff, then _all_ of the guest
stuff, file names included, should use tdg. Maybe X86_FEATURE_TDX_GUEST could
be left as a breadcrumb for translating TDX->TDG.

> > > ??????????? "=a"(value), "d"(port))
> >
> > Are you saying that calling C functions from inline assembly might
> > corrupt the stack or registers? Are you suggesting that you simply
>
> It's possible, but you would need to mark a lot more registers clobbered
> (the x86-64 ABI allows to clobber many registers)
>
> I don't think the stack would be messed up, but there might be problems
> with writing the correct unwind information (which tends to be tricky)
>
> Usually it's better to avoid it.

For me, the more important justification is that, if calling from alternative_io,
the input parameters will be in the wrong registers. The OUT wrapper would be
especially gross as RAX (the value to write) isn't an input param, i.e. shifting
via "ignored" params wouldn't work.

But to Dave's point, that justfication needs to be in the changelog.

2021-04-23 15:31:12

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

On Fri, Apr 23, 2021 at 8:15 AM Sean Christopherson <[email protected]> wrote:
>
> On Thu, Apr 22, 2021, Andi Kleen wrote:
> > On Thu, Apr 22, 2021 at 06:21:07PM -0700, Dave Hansen wrote:
> > > On 4/22/21 6:09 PM, Kuppuswamy, Sathyanarayanan wrote:
> > > > But let me try to explain it here. What I meant by complication is,
> > > > for in/out instruction, we use alternative_io() to substitute in/out
> > > > instructions with tdg_in()/tdg_out() assembly calls. So we have to ensure
> > > > that we don't corrupt registers or stack from the substituted instructions
> > > >
> > > > If you check the implementation of tdg_in()/tdg_out(), you will notice
> > > > that we have added code to preserve the caller registers. So, if we use
> > > > C wrapper for this use case, there is a chance that it might mess
> > > > the caller registers or stack.
> > > >
> > > > alternative_io("in" #bwl " %w2, %" #bw "0", \
> > > > "call tdg_in" #bwl, X86_FEATURE_TDX_GUEST, \
>
> Has Intel "officially" switched to "tdg" as the acronym for TDX guest? As much
> as I dislike having to juggle "TDX host" vs "TDX guest" concepts, tdx_ vs tdg_
> isn't any better IMO. The latter looks an awful lot like a typo, grepping for
> "tdx" to find relevant code will get fail (sometimes), and confusion seems
> inevitable as keeping "TDX" out of guest code/comments/documentation will be
> nigh impossible.
>
> If we do decide to go with "tdg" for the guest stuff, then _all_ of the guest
> stuff, file names included, should use tdg. Maybe X86_FEATURE_TDX_GUEST could
> be left as a breadcrumb for translating TDX->TDG.
>
> > > > "=a"(value), "d"(port))
> > >
> > > Are you saying that calling C functions from inline assembly might
> > > corrupt the stack or registers? Are you suggesting that you simply
> >
> > It's possible, but you would need to mark a lot more registers clobbered
> > (the x86-64 ABI allows to clobber many registers)
> >
> > I don't think the stack would be messed up, but there might be problems
> > with writing the correct unwind information (which tends to be tricky)
> >
> > Usually it's better to avoid it.
>
> For me, the more important justification is that, if calling from alternative_io,
> the input parameters will be in the wrong registers. The OUT wrapper would be
> especially gross as RAX (the value to write) isn't an input param, i.e. shifting
> via "ignored" params wouldn't work.
>
> But to Dave's point, that justfication needs to be in the changelog.

It's not clear to me that in()/out() need to be inline asm with an
alternative vs out-of-line function calls with a static branch?

2021-04-23 15:39:01

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

On Fri, Apr 23, 2021 at 08:28:45AM -0700, Dan Williams wrote:
> On Fri, Apr 23, 2021 at 8:15 AM Sean Christopherson <[email protected]> wrote:
> >
> > On Thu, Apr 22, 2021, Andi Kleen wrote:
> > > On Thu, Apr 22, 2021 at 06:21:07PM -0700, Dave Hansen wrote:
> > > > On 4/22/21 6:09 PM, Kuppuswamy, Sathyanarayanan wrote:
> > > > > But let me try to explain it here. What I meant by complication is,
> > > > > for in/out instruction, we use alternative_io() to substitute in/out
> > > > > instructions with tdg_in()/tdg_out() assembly calls. So we have to ensure
> > > > > that we don't corrupt registers or stack from the substituted instructions
> > > > >
> > > > > If you check the implementation of tdg_in()/tdg_out(), you will notice
> > > > > that we have added code to preserve the caller registers. So, if we use
> > > > > C wrapper for this use case, there is a chance that it might mess
> > > > > the caller registers or stack.
> > > > >
> > > > > alternative_io("in" #bwl " %w2, %" #bw "0", \
> > > > > "call tdg_in" #bwl, X86_FEATURE_TDX_GUEST, \
> >
> > Has Intel "officially" switched to "tdg" as the acronym for TDX guest? As much
> > as I dislike having to juggle "TDX host" vs "TDX guest" concepts, tdx_ vs tdg_
> > isn't any better IMO. The latter looks an awful lot like a typo, grepping for
> > "tdx" to find relevant code will get fail (sometimes), and confusion seems
> > inevitable as keeping "TDX" out of guest code/comments/documentation will be
> > nigh impossible.
> >
> > If we do decide to go with "tdg" for the guest stuff, then _all_ of the guest
> > stuff, file names included, should use tdg. Maybe X86_FEATURE_TDX_GUEST could
> > be left as a breadcrumb for translating TDX->TDG.
> >
> > > > > "=a"(value), "d"(port))
> > > >
> > > > Are you saying that calling C functions from inline assembly might
> > > > corrupt the stack or registers? Are you suggesting that you simply
> > >
> > > It's possible, but you would need to mark a lot more registers clobbered
> > > (the x86-64 ABI allows to clobber many registers)
> > >
> > > I don't think the stack would be messed up, but there might be problems
> > > with writing the correct unwind information (which tends to be tricky)
> > >
> > > Usually it's better to avoid it.
> >
> > For me, the more important justification is that, if calling from alternative_io,
> > the input parameters will be in the wrong registers. The OUT wrapper would be
> > especially gross as RAX (the value to write) isn't an input param, i.e. shifting
> > via "ignored" params wouldn't work.
> >
> > But to Dave's point, that justfication needs to be in the changelog.
>
> It's not clear to me that in()/out() need to be inline asm with an
> alternative vs out-of-line function calls with a static branch?

I doubt it matters at all on a modern machine (the cost of a IO port
access is many orders of magnitudes greater than a call), but it might
have mattered on really old systems, like 486 class. Maybe if someone
is still running those moving it out of line could be a problem.

But likely it's fine.

I think actually for the main kernel we could just rely on #VE here
and drop it all.
Doing it without #VE only really matters for the old boot code, where
performance doesn't really matter.

-Andi

2021-04-23 15:48:11

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

>
> Has Intel "officially" switched to "tdg" as the acronym for TDX guest? As much

Just for the global symbols to avoid conflicts with the tdx host code.

-Andi

2021-04-23 15:51:40

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

On Fri, Apr 23, 2021, Dan Williams wrote:
> On Fri, Apr 23, 2021 at 8:15 AM Sean Christopherson <[email protected]> wrote:
> >
> > On Thu, Apr 22, 2021, Andi Kleen wrote:
> > > On Thu, Apr 22, 2021 at 06:21:07PM -0700, Dave Hansen wrote:
> > > > On 4/22/21 6:09 PM, Kuppuswamy, Sathyanarayanan wrote:
> > > > > But let me try to explain it here. What I meant by complication is,
> > > > > for in/out instruction, we use alternative_io() to substitute in/out
> > > > > instructions with tdg_in()/tdg_out() assembly calls. So we have to ensure
> > > > > that we don't corrupt registers or stack from the substituted instructions
> > > > >
> > > > > If you check the implementation of tdg_in()/tdg_out(), you will notice
> > > > > that we have added code to preserve the caller registers. So, if we use
> > > > > C wrapper for this use case, there is a chance that it might mess
> > > > > the caller registers or stack.
> > > > >
> > > > > alternative_io("in" #bwl " %w2, %" #bw "0", \
> > > > > "call tdg_in" #bwl, X86_FEATURE_TDX_GUEST, \
> >
> > Has Intel "officially" switched to "tdg" as the acronym for TDX guest? As much
> > as I dislike having to juggle "TDX host" vs "TDX guest" concepts, tdx_ vs tdg_
> > isn't any better IMO. The latter looks an awful lot like a typo, grepping for
> > "tdx" to find relevant code will get fail (sometimes), and confusion seems
> > inevitable as keeping "TDX" out of guest code/comments/documentation will be
> > nigh impossible.
> >
> > If we do decide to go with "tdg" for the guest stuff, then _all_ of the guest
> > stuff, file names included, should use tdg. Maybe X86_FEATURE_TDX_GUEST could
> > be left as a breadcrumb for translating TDX->TDG.
> >
> > > > > "=a"(value), "d"(port))
> > > >
> > > > Are you saying that calling C functions from inline assembly might
> > > > corrupt the stack or registers? Are you suggesting that you simply
> > >
> > > It's possible, but you would need to mark a lot more registers clobbered
> > > (the x86-64 ABI allows to clobber many registers)
> > >
> > > I don't think the stack would be messed up, but there might be problems
> > > with writing the correct unwind information (which tends to be tricky)
> > >
> > > Usually it's better to avoid it.
> >
> > For me, the more important justification is that, if calling from alternative_io,
> > the input parameters will be in the wrong registers. The OUT wrapper would be
> > especially gross as RAX (the value to write) isn't an input param, i.e. shifting
> > via "ignored" params wouldn't work.
> >
> > But to Dave's point, that justfication needs to be in the changelog.
>
> It's not clear to me that in()/out() need to be inline asm with an
> alternative vs out-of-line function calls with a static branch?

Code footprint is the main argument, especially since Intel presumably is hoping
most distros will build their generic kernels with CONFIG_TDX_GUEST=y. IIRC, a
carefully crafted assembly helper with a custom ABI can keep the overhead to
three bytes (or maybe even one byte?) versus raw IN and OUT. Using a static
branch would incur significantly more overhead since the register prologues would
be different.

Or are you saying make the inb/outb() helpers themselves functions? That would
be quite brutal, too.

On a related topic, this changelog also needs to justify changing "Nd" to "d".
Maybe no one cares too much about imm8 port I/O and code footprint, but
effectively killing off "IN AL, imm8" should at least be called out.

Subject: Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions



On 4/23/21 8:15 AM, Sean Christopherson wrote:
> Has Intel "officially" switched to "tdg" as the acronym for TDX guest? As much
> as I dislike having to juggle "TDX host" vs "TDX guest" concepts, tdx_ vs tdg_
> isn't any better IMO.

When we merged both host and guest kernel into the same code base, we hit some
name conflicts (due to using tdx_ prefix in both host/guest code). So in order to
avoid such issues in future we decided to go with tdg/tdh combination. we thought
its good enough for kernel function/variable names.

The latter looks an awful lot like a typo, grepping for
> "tdx" to find relevant code will get fail (sometimes), and confusion seems
> inevitable as keeping "TDX" out of guest code/comments/documentation will be
> nigh impossible.

tdg/tdh combination is only used within kernel code. But in sections which are
visible to users (kernel config and command line option), we still use
tdx_guest/tdx_host combination.


>
> If we do decide to go with "tdg" for the guest stuff, then_all_ of the guest
> stuff, file names included, should use tdg. Maybe X86_FEATURE_TDX_GUEST could
> be left as a breadcrumb for translating TDX->TDG.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer