2022-05-18 03:50:41

by Kirill A. Shutemov

Subject: [PATCH] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page

load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
The unwanted loads are typically harmless. But, they might be made to
totally unrelated or even unmapped memory. load_unaligned_zeropad()
relies on exception fixup (#PF, #GP and now #VE) to recover from these
unwanted loads.

In a TDX guest, the second page can be a shared page, and the VMM may
configure it to trigger #VE.

The kernel assumes that a #VE on a shared page is an MMIO access and tries
to decode the instruction to handle it. In the case of
load_unaligned_zeropad(), this may result in confusion, as it is not an
MMIO access.

Check the fixup table before trying to handle MMIO.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/coco/tdx/tdx.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 03deb4d6920d..5fbdda2f2b86 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -11,6 +11,8 @@
#include <asm/insn.h>
#include <asm/insn-eval.h>
#include <asm/pgtable.h>
+#include <asm/trapnr.h>
+#include <asm/extable.h>

/* TDX module Call Leaf IDs */
#define TDX_GET_INFO 1
@@ -296,6 +298,26 @@ static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 	if (WARN_ON_ONCE(user_mode(regs)))
 		return false;

+	/*
+	 * load_unaligned_zeropad() relies on exception fixups in case of the
+	 * word being a page-crosser and the second page is not accessible.
+	 *
+	 * In TDX guest the second page can be shared page and VMM may
+	 * configure it to trigger #VE.
+	 *
+	 * Kernel assumes that #VE on a shared page is MMIO access and tries to
+	 * decode instruction to handle it. In case of load_unaligned_zeropad()
+	 * it may result in confusion as it is not MMIO access.
+	 *
+	 * Check fixup table before trying to handle MMIO.
+	 */
+	if (fixup_exception(regs, X86_TRAP_VE, 0, ve->gla)) {
+		/* regs->ip is adjusted by fixup_exception() */
+		ve->instr_len = 0;
+
+		return true;
+	}
+
 	if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE))
 		return false;

--
2.35.1
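
For readers unfamiliar with the helper, the sketch below models in user space what load_unaligned_zeropad() is expected to return on little-endian x86-64 once its exception fixup has run: the bytes that are still inside the accessible word end up in the low bytes of the result, and everything that would have come from the inaccessible next page reads as zero. zeropad_model() is a hypothetical name used only for illustration; it is not the kernel implementation and performs no faulting access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * User-space model (hypothetical helper, not kernel code) of the value
 * load_unaligned_zeropad() yields on little-endian x86-64 after its
 * fixup runs. 'word' is the aligned 8-byte word at the end of the
 * accessible page, 'offset' is where the unaligned load started in it.
 */
static uint64_t zeropad_model(const unsigned char word[8], size_t offset)
{
	uint64_t val = 0;
	size_t i;

	assert(offset < 8);

	/*
	 * Assemble the surviving bytes in little-endian order. Bytes past
	 * the end of the word (the next page) contribute nothing, so the
	 * high bytes of the result stay zero.
	 */
	for (i = offset; i < 8; i++)
		val |= (uint64_t)word[i] << (8 * (i - offset));

	return val;
}
```

With a word holding bytes 0x11..0x88 and an offset of 5, only the last three bytes survive and the upper five bytes of the result are zero padding.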



2022-05-18 03:58:42

by Kirill A. Shutemov

Subject: Re: [PATCH] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page

On Tue, May 17, 2022 at 11:14:13AM -0700, Dave Hansen wrote:
> On 5/17/22 10:40, Kirill A. Shutemov wrote:
> >>
> >> ve_info is a software structure. Why not just add a:
> >>
> >> bool ip_adjusted;
> >>
> >> which defaults to false, then we have:
> >>
> >> /*
> >> * Adjust RIP if the exception was handled
> >> * but RIP was not adjusted.
> >> */
> >> if (!ret && !ve_info->ip_adjusted)
> >> regs->ip += ve_info->instr_len;
> >>
> >> One other oddity I just stumbled upon:
> >>
> >> static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
> >> {
> >> ...
> >> ve->instr_len = insn.length;
> >>
> >> Why does that need to override 've->instr_len'? What was wrong with the
> >> gunk in r10 that came out of TDX_GET_VEINFO?
> > TDX module doesn't decode MMIO instruction and does not provide valid size
> > of it. We had to do it manually, based on decoding.
>
> That's worth a comment, don't you think? I'd add one both in where the
> ve_info is filled and where ve->instr_len is adjusted.

Okay. Will do.

> > Given that we had to adjust IP in handle_mmio() anyway, do you still think
> > "ve->instr_len = 0;" is wrong? I dislike ip_adjusted more.
>
> Something is wrong about it.
>
> You could call it 've->instr_bytes_to_handle' or something. Then it
> makes actual logical sense when you handle it to zero it out. I just
> want it to be more explicit when the upper levels need to do something.
>
> Does ve->instr_len==0 both when the TDX module isn't providing
> instruction sizes *and* when no handling is necessary? That seems like
> an unfortunate logical multiplexing of 0.

For an EPT violation, ve->instr_len has *something* (not zero) that doesn't
match the actual instruction size. I dug out that it is filled with data
from VMREAD(0x440C), but I don't know the ultimate origin of the data.

I don't understand the virtualization side of things well enough.

Maybe someone who knows virtualization could comment here. Sean?

--
Kirill A. Shutemov

2022-05-18 04:09:23

by Sean Christopherson

Subject: Re: [PATCH] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page

On Tue, May 17, 2022, Sean Christopherson wrote:
> On Tue, May 17, 2022, Dave Hansen wrote:
> > On 5/17/22 13:17, Kirill A. Shutemov wrote:
> > >>> Given that we had to adjust IP in handle_mmio() anyway, do you still think
> > >>> "ve->instr_len = 0;" is wrong? I dislike ip_adjusted more.
> > >> Something is wrong about it.
> > >>
> > >> You could call it 've->instr_bytes_to_handle' or something. Then it
> > >> makes actual logical sense when you handle it to zero it out. I just
> > >> want it to be more explicit when the upper levels need to do something.
> > >>
> > >> Does ve->instr_len==0 both when the TDX module isn't providing
> > >> instruction sizes *and* when no handling is necessary? That seems like
> > >> an unfortunate logical multiplexing of 0.
> > > For EPT violation, ve->instr_len has *something* (not zero) that doesn't
> > > match the actual instruction size. I dig out that it is filled with data
> > > from VMREAD(0x440C), but I don't know where is the ultimate origin of the
> > > data.
> >
> > The SDM has a breakdown:
> >
> > 27.2.5 Information for VM Exits Due to Instruction Execution
> >
> > I didn't realize it came from VMREAD. I guess I assumed it came from
> > some TDX module magic. Silly me.
> >
> > The SDM makes it sound like we should be more judicious about using
> > 've->instr_len' though. "All VM exits other than those listed in the
> > above items leave this field undefined." Looking over
> > virt_exception_kernel(), we've got five cases from CPU instructions that
> > cause unconditional VMEXITs:

Ideally, what the SDM says wouldn't matter at all. The TDX module spec really
should be the authoritative source in this case, but it just punts to the SDM:

The 32-bit value that would have been saved into the VMCS as VM-exit instruction
length if a legacy VM exit had occurred instead of the virtualization exception.

Even if the TDX spec wants to punt to the SDM, it would save a lot of headache and
SDM reading if it also said something to the effect of:

The INSTRUCTION_LENGTH and INSTRUCTION_INFORMATION fields are valid for all
#VEs injected by the Intel TDX Module. The fields are undefined for #VEs
injected by the CPU due to EPT Violations.

2022-05-18 04:15:55

by Dave Hansen

Subject: Re: [PATCH] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page

On 5/17/22 08:30, Kirill A. Shutemov wrote:
> load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
> The unwanted loads are typically harmless. But, they might be made to
> totally unrelated or even unmapped memory. load_unaligned_zeropad()
> relies on exception fixup (#PF, #GP and now #VE) to recover from these
> unwanted loads.
>
> In TDX guest the second page can be shared page and VMM may configure it
> to trigger #VE.
>
> Kernel assumes that #VE on a shared page is MMIO access and tries to
> decode instruction to handle it. In case of load_unaligned_zeropad() it
> may result in confusion as it is not MMIO access.
>
> Check fixup table before trying to handle MMIO.

Is this a theoretical problem or was it found in practice?

> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 03deb4d6920d..5fbdda2f2b86 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -11,6 +11,8 @@
> #include <asm/insn.h>
> #include <asm/insn-eval.h>
> #include <asm/pgtable.h>
> +#include <asm/trapnr.h>
> +#include <asm/extable.h>
>
> /* TDX module Call Leaf IDs */
> #define TDX_GET_INFO 1
> @@ -296,6 +298,26 @@ static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
> if (WARN_ON_ONCE(user_mode(regs)))
> return false;
>
> + /*
> + * load_unaligned_zeropad() relies on exception fixups in case of the
> + * word being a page-crosser and the second page is not accessible.
> + *
> + * In TDX guest the second page can be shared page and VMM may

In TDX guests,

> + * configure it to trigger #VE.
> + *
> + * Kernel assumes that #VE on a shared page is MMIO access and tries to
> + * decode instruction to handle it. In case of load_unaligned_zeropad()
> + * it may result in confusion as it is not MMIO access.
> + *
> + * Check fixup table before trying to handle MMIO.
> + */
> + if (fixup_exception(regs, X86_TRAP_VE, 0, ve->gla)) {
> + /* regs->ip is adjusted by fixup_exception() */
> + ve->instr_len = 0;
> +
> + return true;
> + }

This 've->instr_len = ' stuff is just a hack.

ve_info is a software structure. Why not just add a:

	bool ip_adjusted;

which defaults to false, then we have:

	/*
	 * Adjust RIP if the exception was handled
	 * but RIP was not adjusted.
	 */
	if (!ret && !ve_info->ip_adjusted)
		regs->ip += ve_info->instr_len;

One other oddity I just stumbled upon:

static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
{
	...
	ve->instr_len = insn.length;

Why does that need to override 've->instr_len'? What was wrong with the
gunk in r10 that came out of TDX_GET_VEINFO?
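
A minimal user-space sketch of the ip_adjusted idea, to make the control flow concrete. Everything here is a stand-in: pt_regs_model, ve_info_adj, handle_with_fixup() and finish_ve() are hypothetical names, and the sketch assumes that a true return value means "handled" (the snippet above tests !ret, so the real convention may differ).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel structures; only the fields used here. */
struct pt_regs_model {
	uint64_t ip;
};

struct ve_info_adj {
	uint32_t instr_len;	/* length reported for the faulting instruction */
	bool ip_adjusted;	/* set by a handler that already moved ip */
};

/*
 * Hypothetical handler: pretend an exception-table fixup matched and
 * redirected execution to 'fixup_ip'. It records that ip was moved
 * instead of zeroing instr_len.
 */
static bool handle_with_fixup(struct pt_regs_model *regs,
			      struct ve_info_adj *ve, uint64_t fixup_ip)
{
	assert(regs && ve);

	regs->ip = fixup_ip;
	ve->ip_adjusted = true;
	return true;
}

/* Common tail: skip the instruction only if no handler moved ip yet. */
static void finish_ve(struct pt_regs_model *regs,
		      const struct ve_info_adj *ve, bool handled)
{
	if (handled && !ve->ip_adjusted)
		regs->ip += ve->instr_len;
}
```

The point of the extra flag is that instr_len keeps a single meaning (the instruction length), while "nothing left to skip" is expressed separately.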

2022-05-18 08:40:09

by David Laight

Subject: RE: [PATCH] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page

From: Kirill A. Shutemov
> Sent: 17 May 2022 16:30
>
> load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
> The unwanted loads are typically harmless. But, they might be made to
> totally unrelated or even unmapped memory. load_unaligned_zeropad()
> relies on exception fixup (#PF, #GP and now #VE) to recover from these
> unwanted loads.
>
> In TDX guest the second page can be shared page and VMM may configure it
> to trigger #VE.
>
> Kernel assumes that #VE on a shared page is MMIO access and tries to
> decode instruction to handle it. In case of load_unaligned_zeropad() it
> may result in confusion as it is not MMIO access.
>
> Check fixup table before trying to handle MMIO.

Would it be best to avoid all of this by never mapping
'normal memory' below anything that isn't normal memory?

Even on a normal system it is potentially possible that the
second page might be MMIO and reference a target that doesn't
want to see non-word-sized reads.
(Or the first location might be a FIFO and the read consumes
some data.)

In that case the cpu won't fault the access, but the hardware
access might have rather unexpected side effects.

Now the way MMIO pages are allocated probably makes that
impossible - but load_unaligned_zeropad() relies on
it not happening or not breaking anything.

David



2022-05-18 12:24:39

by Kirill A. Shutemov

Subject: Re: [PATCH] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page

On Wed, May 18, 2022 at 08:39:45AM +0000, David Laight wrote:
> From: Kirill A. Shutemov
> > Sent: 17 May 2022 16:30
> >
> > load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
> > The unwanted loads are typically harmless. But, they might be made to
> > totally unrelated or even unmapped memory. load_unaligned_zeropad()
> > relies on exception fixup (#PF, #GP and now #VE) to recover from these
> > unwanted loads.
> >
> > In TDX guest the second page can be shared page and VMM may configure it
> > to trigger #VE.
> >
> > Kernel assumes that #VE on a shared page is MMIO access and tries to
> > decode instruction to handle it. In case of load_unaligned_zeropad() it
> > may result in confusion as it is not MMIO access.
> >
> > Check fixup table before trying to handle MMIO.
>
> Is it best to avoid that all happening by avoiding mapping
> 'normal memory' below anything that isn't normal memory.
>
> Even on a normal system it is potentially possible that the
> second page might be MMIO and reference a target that doesn't
> want to see non-word sized reads.
> (Or the first location might be a fifo and the read consumes
> some data.)
>
> In that case the cpu won't fault the access, but the hardware
> access might have rather unexpected side effects.
>
> Now the way MMIO pages are allocated probably makes that
> impossible - but load_unaligned_zeropad() relies on
> it not happening or not breaking anything.

Normally MMIO mappings come from ioremap() and do not land next to
normal pages in virtual memory. So I don't think there's a high risk of
MMIO being a problem on normal machines.

What makes TDX (and other confidential computing platforms) different is
the security model: the host and VMM are considered hostile and we need to
protect against them. In the TDX case, the VMM can make any shared memory
(such as DMA buffers) trigger a #VE that the kernel interprets as an MMIO
access. We need to make sure the host cannot exploit this.

--
Kirill A. Shutemov

2022-05-20 09:00:18

by Kirill A. Shutemov

Subject: Re: [PATCH] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page

On Tue, May 17, 2022 at 10:52:48PM +0000, Sean Christopherson wrote:
> On Tue, May 17, 2022, Sean Christopherson wrote:
> > On Tue, May 17, 2022, Dave Hansen wrote:
> > > On 5/17/22 13:17, Kirill A. Shutemov wrote:
> > > >>> Given that we had to adjust IP in handle_mmio() anyway, do you still think
> > > >>> "ve->instr_len = 0;" is wrong? I dislike ip_adjusted more.
> > > >> Something is wrong about it.
> > > >>
> > > >> You could call it 've->instr_bytes_to_handle' or something. Then it
> > > >> makes actual logical sense when you handle it to zero it out. I just
> > > >> want it to be more explicit when the upper levels need to do something.
> > > >>
> > > >> Does ve->instr_len==0 both when the TDX module isn't providing
> > > >> instruction sizes *and* when no handling is necessary? That seems like
> > > >> an unfortunate logical multiplexing of 0.
> > > > For EPT violation, ve->instr_len has *something* (not zero) that doesn't
> > > > match the actual instruction size. I dig out that it is filled with data
> > > > from VMREAD(0x440C), but I don't know where is the ultimate origin of the
> > > > data.
> > >
> > > The SDM has a breakdown:
> > >
> > > 27.2.5 Information for VM Exits Due to Instruction Execution
> > >
> > > I didn't realize it came from VMREAD. I guess I assumed it came from
> > > some TDX module magic. Silly me.
> > >
> > > The SDM makes it sound like we should be more judicious about using
> > > 've->instr_len' though. "All VM exits other than those listed in the
> > > above items leave this field undefined." Looking over
> > > virt_exception_kernel(), we've got five cases from CPU instructions that
> > > cause unconditional VMEXITs:
>
> Ideally, what the SDM says wouldn't matter at all. The TDX module spec really
> should be the authoritative source in this case, but it just punts to the SDM:
>
> The 32-bit value that would have been saved into the VMCS as VM-exit instruction
> length if a legacy VM exit had occurred instead of the virtualization exception.
>
> Even if the TDX spec wants to punt to the SDM, it would save a lot of headache and
> SDM reading if it also said something to the effect of:
>
> The INSTRUCTION_LENGTH and INSTRUCTION_INFORMATION fields are valid for all
> #VEs injected by the Intel TDX Module. The fields are undefined for #VEs
> injected by the CPU due to EPT Violations.

I initiated an update to the spec, but it will take time.

--
Kirill A. Shutemov

2022-05-20 14:52:54

by Dave Hansen

Subject: Re: [PATCH] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page

On 5/19/22 11:19, Kirill A. Shutemov wrote:
>>>> The SDM has a breakdown:
>>>>
>>>> 27.2.5 Information for VM Exits Due to Instruction Execution
>>>>
>>>> I didn't realize it came from VMREAD. I guess I assumed it came from
>>>> some TDX module magic. Silly me.
>>>>
>>>> The SDM makes it sound like we should be more judicious about using
>>>> 've->instr_len' though. "All VM exits other than those listed in the
>>>> above items leave this field undefined." Looking over
>>>> virt_exception_kernel(), we've got five cases from CPU instructions that
>>>> cause unconditional VMEXITs:
>> Ideally, what the SDM says wouldn't matter at all. The TDX module spec really
>> should be the authoritative source in this case, but it just punts to the SDM:
>>
>> The 32-bit value that would have been saved into the VMCS as VM-exit instruction
>> length if a legacy VM exit had occurred instead of the virtualization exception.
>>
>> Even if the TDX spec wants to punt to the SDM, it would save a lot of headache and
>> SDM reading if it also said something to the effect of:
>>
>> The INSTRUCTION_LENGTH and INSTRUCTION_INFORMATION fields are valid for all
>> #VEs injected by the Intel TDX Module. The fields are undefined for #VEs
>> injected by the CPU due to EPT Violations.
> I initiated update to the spec, but it will take time.

Understood, and thanks for doing that.

For now, let's just declare what we *expect* the spec will say and show
it to the folks doing the spec itself. They will then have a chance to
balk at our interpretation if we got something wrong.
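
One way to write that expectation down is as a small helper that only trusts the instruction length for the instruction-triggered exit reasons and treats it as undefined for EPT violations. The sketch below is a user-space model under that assumption: ve_info_len, ve_instr_len_is_valid() and ve_instr_len_model() are hypothetical names, and the exit-reason numbers are the VM-exit basic reasons from the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VM-exit basic exit reasons, numbered as in the Intel SDM. */
#define EXIT_REASON_CPUID		10
#define EXIT_REASON_HLT			12
#define EXIT_REASON_IO_INSTRUCTION	30
#define EXIT_REASON_MSR_READ		31
#define EXIT_REASON_MSR_WRITE		32
#define EXIT_REASON_EPT_VIOLATION	48

/* Stand-in for the kernel's ve_info; only the fields used here. */
struct ve_info_len {
	uint64_t exit_reason;
	uint32_t instr_len;
};

/*
 * Assumed rule: instr_len is meaningful for #VEs the TDX module injects
 * on behalf of an instruction, and undefined for anything else,
 * including CPU-injected #VEs caused by EPT violations.
 */
static bool ve_instr_len_is_valid(const struct ve_info_len *ve)
{
	assert(ve);

	switch (ve->exit_reason) {
	case EXIT_REASON_CPUID:
	case EXIT_REASON_HLT:
	case EXIT_REASON_IO_INSTRUCTION:
	case EXIT_REASON_MSR_READ:
	case EXIT_REASON_MSR_WRITE:
		return true;
	default:
		return false;
	}
}

/* Return the reported length only when it can be trusted, else 0. */
static uint32_t ve_instr_len_model(const struct ve_info_len *ve)
{
	return ve_instr_len_is_valid(ve) ? ve->instr_len : 0;
}
```

For an EPT violation the caller would then have to obtain the length some other way, for example from its own instruction decode as handle_mmio() already does.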