On Tue, May 17, 2022 at 09:36:03AM -0700, Dave Hansen wrote:
> On 5/17/22 08:30, Kirill A. Shutemov wrote:
> > load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
> > The unwanted loads are typically harmless. But, they might be made to
> > totally unrelated or even unmapped memory. load_unaligned_zeropad()
> > relies on exception fixup (#PF, #GP and now #VE) to recover from these
> > unwanted loads.
> >
> > In TDX guest the second page can be shared page and VMM may configure it
> > to trigger #VE.
> >
> > Kernel assumes that #VE on a shared page is MMIO access and tries to
> > decode instruction to handle it. In case of load_unaligned_zeropad() it
> > may result in confusion as it is not MMIO access.
> >
> > Check fixup table before trying to handle MMIO.
>
> Is this a theoretical problem or was it found in practice?
It was found by analysis.

The problem was found in practice for private pages (see the patch in the
unaccepted memory support patchset), but not for shared.

For shared I had to do some tricks to get it triggered. Shared pages that
are configured to trigger MMIO normally come from ioremap() and they are
not mapped next to normally allocated pages. I had to force this situation.

But there are normally allocated pages that we make shared, like the
SWIOTLB buffer. These pages usually do not trigger #VE, but a malicious
host can configure them to trigger it at any point.

Even after I forced the situation, insn_decode_mmio() worked fine as
load_unaligned_zeropad() uses a flavour of MOV that insn_decode_mmio() can
decode. But that makes the situation worse: we ask the host to handle MMIO
for the address in the private page, and that allows the host to override
the part of the word that comes from the private page. :/
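
To make the expected result concrete, here is a minimal user-space sketch of
what the fixup path is supposed to produce for a page-crossing load. It is
not the kernel implementation: model_load_zeropad(), MODEL_PAGE_SIZE and the
second_page_ok flag are hypothetical names made up for illustration, and a
little-endian word layout is assumed. Bytes from the first (private) page
are kept and bytes that would come from the inaccessible page read as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4K pages */

/*
 * Hypothetical model: 'mem' holds two adjacent pages, 'off' is the byte
 * offset of the load within the first page, and 'second_page_ok' says
 * whether the second page may be read.
 */
uint64_t model_load_zeropad(const uint8_t *mem, size_t off, int second_page_ok)
{
	uint64_t word = 0;
	size_t i;

	assert(off < MODEL_PAGE_SIZE);

	for (i = 0; i < sizeof(word); i++) {
		size_t pos = off + i;

		/* Bytes from the faulting page are left as zero. */
		if (pos >= MODEL_PAGE_SIZE && !second_page_ok)
			break;

		/* Assemble the word in little-endian order. */
		word |= (uint64_t)mem[pos] << (8 * i);
	}

	return word;
}
```

If the #VE is instead forwarded to the host as an MMIO read, the host
supplies the whole word, including the low bytes that this model takes from
the private page.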
> > diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> > index 03deb4d6920d..5fbdda2f2b86 100644
> > --- a/arch/x86/coco/tdx/tdx.c
> > +++ b/arch/x86/coco/tdx/tdx.c
> > @@ -11,6 +11,8 @@
> > #include <asm/insn.h>
> > #include <asm/insn-eval.h>
> > #include <asm/pgtable.h>
> > +#include <asm/trapnr.h>
> > +#include <asm/extable.h>
> >
> > /* TDX module Call Leaf IDs */
> > #define TDX_GET_INFO 1
> > @@ -296,6 +298,26 @@ static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
> > 	if (WARN_ON_ONCE(user_mode(regs)))
> > 		return false;
> >
> > +	/*
> > +	 * load_unaligned_zeropad() relies on exception fixups in case of the
> > +	 * word being a page-crosser and the second page is not accessible.
> > +	 *
> > +	 * In TDX guest the second page can be shared page and VMM may
>
> In TDX guests,
>
> > +	 * configure it to trigger #VE.
> > +	 *
> > +	 * Kernel assumes that #VE on a shared page is MMIO access and tries to
> > +	 * decode instruction to handle it. In case of load_unaligned_zeropad()
> > +	 * it may result in confusion as it is not MMIO access.
> > +	 *
> > +	 * Check fixup table before trying to handle MMIO.
> > +	 */
> > +	if (fixup_exception(regs, X86_TRAP_VE, 0, ve->gla)) {
> > +		/* regs->ip is adjusted by fixup_exception() */
> > +		ve->instr_len = 0;
> > +
> > +		return true;
> > +	}
>
> This 've->instr_len = 0' stuff is just a hack.
>
> ve_info is a software structure. Why not just add a:
>
> 	bool ip_adjusted;
>
> which defaults to false, then we have:
>
> 	/*
> 	 * Adjust RIP if the exception was handled
> 	 * but RIP was not adjusted.
> 	 */
> 	if (!ret && !ve_info->ip_adjusted)
> 		regs->ip += ve_info->instr_len;
>
> One other oddity I just stumbled upon:
>
> static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
> {
> 	...
> 	ve->instr_len = insn.length;
>
> Why does that need to override 've->instr_len'? What was wrong with the
> gunk in r10 that came out of TDX_GET_VEINFO?
The TDX module doesn't decode the MMIO instruction and does not provide a
valid size for it. We had to do it manually, based on decoding.
Given that we had to adjust IP in handle_mmio() anyway, do you still think
"ve->instr_len = 0;" is wrong? I dislike ip_adjusted more.
--
Kirill A. Shutemov
On 5/17/22 10:40, Kirill A. Shutemov wrote:
>>
>> ve_info is a software structure. Why not just add a:
>>
>> 	bool ip_adjusted;
>>
>> which defaults to false, then we have:
>>
>> 	/*
>> 	 * Adjust RIP if the exception was handled
>> 	 * but RIP was not adjusted.
>> 	 */
>> 	if (!ret && !ve_info->ip_adjusted)
>> 		regs->ip += ve_info->instr_len;
>>
>> One other oddity I just stumbled upon:
>>
>> static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
>> {
>> 	...
>> 	ve->instr_len = insn.length;
>>
>> Why does that need to override 've->instr_len'? What was wrong with the
>> gunk in r10 that came out of TDX_GET_VEINFO?
> TDX module doesn't decode MMIO instruction and does not provide valid size
> of it. We had to do it manually, based on decoding.
That's worth a comment, don't you think? I'd add one both where ve_info
is filled in and where ve->instr_len is adjusted.
> Given that we had to adjust IP in handle_mmio() anyway, do you still think
> "ve->instr_len = 0;" is wrong? I dislike ip_adjusted more.
Something is wrong about it.
You could call it 've->instr_bytes_to_handle' or something. Then it
makes actual logical sense when you handle it to zero it out. I just
want it to be more explicit when the upper levels need to do something.
Does ve->instr_len==0 both when the TDX module isn't providing
instruction sizes *and* when no handling is necessary? That seems like
an unfortunate logical multiplexing of 0.
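
For comparison, a minimal user-space sketch of the explicit-flag variant
discussed above. It only models the control flow and is not the kernel
code: struct model_regs, struct model_ve_info, model_handle_fixup() and
model_finish_ve() are hypothetical stand-ins, and a bool 'handled' replaces
the 'ret' from the quoted snippet. The point is that instr_len keeps its
meaning (the instruction length) while ip_adjusted alone says whether the
caller still has to advance RIP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for pt_regs and ve_info. */
struct model_regs {
	uint64_t ip;
};

struct model_ve_info {
	uint32_t instr_len;	/* instruction length, never overloaded */
	bool ip_adjusted;	/* set by a handler that moved RIP itself */
};

/* Models a handler that redirects RIP itself, as an exception fixup would. */
bool model_handle_fixup(struct model_regs *regs, struct model_ve_info *ve,
			uint64_t fixup_ip)
{
	assert(regs != NULL && ve != NULL);

	regs->ip = fixup_ip;
	ve->ip_adjusted = true;

	return true;
}

/* Models the common exit path: advance RIP only if nobody else did. */
void model_finish_ve(struct model_regs *regs, const struct model_ve_info *ve,
		     bool handled)
{
	assert(regs != NULL && ve != NULL);

	if (handled && !ve->ip_adjusted)
		regs->ip += ve->instr_len;
}
```

With this shape a zero instr_len has no special meaning, so "the TDX module
did not provide a length" and "no adjustment is needed" are no longer
encoded by the same value.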