2021-11-16 00:27:48

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

+arm64 KVM folks

On Mon, Nov 15, 2021, Marc Orr wrote:
> On Mon, Nov 15, 2021 at 10:26 AM Sean Christopherson <[email protected]> wrote:
> >
> > On Mon, Nov 15, 2021, Dr. David Alan Gilbert wrote:
> > > * Sean Christopherson ([email protected]) wrote:
> > > > On Fri, Nov 12, 2021, Borislav Petkov wrote:
> > > > > On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
> > > > > > Or, is there some mechanism that prevent guest-private memory from being
> > > > > > accessed in random host kernel code?
> > > >
> > > > Or random host userspace code...
> > > >
> > > > > So I'm currently under the impression that random host->guest accesses
> > > > > should not happen if not previously agreed upon by both.
> > > >
> > > > Key word "should".
> > > >
> > > > > Because, as explained on IRC, if host touches a private guest page,
> > > > > whatever the host does to that page, the next time the guest runs, it'll
> > > > > get a #VC where it will see that that page doesn't belong to it anymore
> > > > > and then, out of paranoia, it will simply terminate to protect itself.
> > > > >
> > > > > So cloud providers should have an interest to prevent such random stray
> > > > > accesses if they wanna have guests. :)
> > > >
> > > > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
> > >
> > > Would it necessarily have been a host bug? A guest telling the host a
> > > bad GPA to DMA into would trigger this wouldn't it?
> >
> > No, because as Andy pointed out, host userspace must already guard against a bad
> > GPA, i.e. this is just a variant of the guest telling the host to DMA to a GPA
> > that is completely bogus. The shared vs. private behavior just means that when
> > host userspace is doing a GPA=>HVA lookup, it needs to incorporate the "shared"
> > state of the GPA. If the host goes and DMAs into the completely wrong HVA=>PFN,
> > then that is a host bug; that the bug happened to be exploited by a buggy/malicious
> > guest doesn't change the fact that the host messed up.
>
> "If the host goes and DMAs into the completely wrong HVA=>PFN, then
> that is a host bug; that the bug happened to be exploited by a
> buggy/malicious guest doesn't change the fact that the host messed
> up."
> ^^^
> Again, I'm flabbergasted that you are arguing that it's OK for a guest
> to exploit a host bug to take down host-side processes or the host
> itself, either of which could bring down all other VMs on the machine.
>
> I'm going to repeat -- this is not OK! Period.

Huh? At which point did I suggest it's ok to ship software with bugs? Of course
it's not ok to introduce host bugs that let the guest crash the host (or host
processes). But _if_ someone does ship buggy host software, it's not like we can
wave a magic wand and stop the guest from exploiting the bug. That's why they're
such a big deal.

Yes, in this case a very specific flavor of host userspace bug could be morphed
into a guest exception, but as mentioned ad nauseum, _if_ host userspace has bug
where it does not properly validate a GPA=>HVA, then any such bug exists and is
exploitable today irrespective of SNP.

> Again, if the community wants to layer some orchestration scheme
> between host userspace, host kernel, and guest, on top of the code to
> inject the #VC into the guest, that's fine. This proposal is not
> stopping that. In fact, the two approaches are completely orthogonal
> and compatible.
>
> But so far I have heard zero reasons why injecting a #VC into the
> guest is wrong. Other than just stating that it's wrong.

It creates a new attack surface, e.g. if the guest mishandles the #VC and does
PVALIDATE on memory that it previously accepted, then userspace can attack the
guest by accessing guest private memory to coerce the guest into consuming corrupted
data.

> Again, the guest must be able to detect buggy and malicious host-side
> writes to private memory. Or else "confidential computing" doesn't
> work.

That assertion assumes the host _hypervisor_ is untrusted, which does not hold true
for all use cases. The Cc'd arm64 folks are working on a protected VM model where
the host kernel at large is untrusted, but the "hypervisor" (KVM plus a few other
bits), is still trusted by the guest. Because the hypervisor is trusted, the guest
doesn't need to be hardened against event injection attacks from the host.

Note, SNP already has a similar concept in it's VMPLs. VMPL3 runs a confidential VM
that is not hardened in any way, and fully trusts VMPL0 to not inject bogus faults.

And along the lines of arm64's pKVM, I would very much like to get KVM to a point
where it can remove host userspace from the guest's TCB without relying on hardware.
Anything that can be in hardware absolutely can be done in the kernel, and likely
can be done with significantly less performance overhead. Confidential computing is
not a binary thing where the only valid use case is removing the host kernel from
the TCB and trusting only hardware. There are undoubtedly use cases where trusting
the host kernel but not host userspace brings tangible value, but the overhead of
TDX/SNP to get the host kernel out of the TCB is the wrong tradeoff for performance
vs. security.

> Assuming that's not true is not a valid argument to dismiss
> injecting a #VC exception into the guest.


2021-11-16 05:41:48

by Marc Orr

[permalink] [raw]
Subject: Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

On Mon, Nov 15, 2021 at 11:15 AM Sean Christopherson <[email protected]> wrote:
>
> +arm64 KVM folks
>
> On Mon, Nov 15, 2021, Marc Orr wrote:
> > On Mon, Nov 15, 2021 at 10:26 AM Sean Christopherson <[email protected]> wrote:
> > >
> > > On Mon, Nov 15, 2021, Dr. David Alan Gilbert wrote:
> > > > * Sean Christopherson ([email protected]) wrote:
> > > > > On Fri, Nov 12, 2021, Borislav Petkov wrote:
> > > > > > On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
> > > > > > > Or, is there some mechanism that prevent guest-private memory from being
> > > > > > > accessed in random host kernel code?
> > > > >
> > > > > Or random host userspace code...
> > > > >
> > > > > > So I'm currently under the impression that random host->guest accesses
> > > > > > should not happen if not previously agreed upon by both.
> > > > >
> > > > > Key word "should".
> > > > >
> > > > > > Because, as explained on IRC, if host touches a private guest page,
> > > > > > whatever the host does to that page, the next time the guest runs, it'll
> > > > > > get a #VC where it will see that that page doesn't belong to it anymore
> > > > > > and then, out of paranoia, it will simply terminate to protect itself.
> > > > > >
> > > > > > So cloud providers should have an interest to prevent such random stray
> > > > > > accesses if they wanna have guests. :)
> > > > >
> > > > > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
> > > >
> > > > Would it necessarily have been a host bug? A guest telling the host a
> > > > bad GPA to DMA into would trigger this wouldn't it?
> > >
> > > No, because as Andy pointed out, host userspace must already guard against a bad
> > > GPA, i.e. this is just a variant of the guest telling the host to DMA to a GPA
> > > that is completely bogus. The shared vs. private behavior just means that when
> > > host userspace is doing a GPA=>HVA lookup, it needs to incorporate the "shared"
> > > state of the GPA. If the host goes and DMAs into the completely wrong HVA=>PFN,
> > > then that is a host bug; that the bug happened to be exploited by a buggy/malicious
> > > guest doesn't change the fact that the host messed up.
> >
> > "If the host goes and DMAs into the completely wrong HVA=>PFN, then
> > that is a host bug; that the bug happened to be exploited by a
> > buggy/malicious guest doesn't change the fact that the host messed
> > up."
> > ^^^
> > Again, I'm flabbergasted that you are arguing that it's OK for a guest
> > to exploit a host bug to take down host-side processes or the host
> > itself, either of which could bring down all other VMs on the machine.
> >
> > I'm going to repeat -- this is not OK! Period.
>
> Huh? At which point did I suggest it's ok to ship software with bugs? Of course
> it's not ok to introduce host bugs that let the guest crash the host (or host
> processes). But _if_ someone does ship buggy host software, it's not like we can
> wave a magic wand and stop the guest from exploiting the bug. That's why they're
> such a big deal.
>
> Yes, in this case a very specific flavor of host userspace bug could be morphed
> into a guest exception, but as mentioned ad nauseum, _if_ host userspace has bug
> where it does not properly validate a GPA=>HVA, then any such bug exists and is
> exploitable today irrespective of SNP.

If I'm understanding you correctly, you're saying that we will never
get into the host's page fault handler due to an RMP violation if we
implement the unmapping guest private memory proposal (without bugs).

However, bugs do happen. And the host-side page fault handler will
have code to react to an RMP violation (even if it's supposedly
impossible to hit). I'm saying that the host-side page fault handler
should NOT handle an RMP violation by killing host-side processes or
the kernel itself. This is detrimental to host reliability.

There are two ways to handle this. (1) Convert the private page
causing the RMP violation to shared, (2) Kill the guest.

Converting the private page to shared is a good solution in SNP's
threat model. And overall, it's better for debuggability than
immediately terminating the guest.

> > Again, if the community wants to layer some orchestration scheme
> > between host userspace, host kernel, and guest, on top of the code to
> > inject the #VC into the guest, that's fine. This proposal is not
> > stopping that. In fact, the two approaches are completely orthogonal
> > and compatible.
> >
> > But so far I have heard zero reasons why injecting a #VC into the
> > guest is wrong. Other than just stating that it's wrong.
>
> It creates a new attack surface, e.g. if the guest mishandles the #VC and does
> PVALIDATE on memory that it previously accepted, then userspace can attack the
> guest by accessing guest private memory to coerce the guest into consuming corrupted
> data.

We should handle RMP violations as best possible from within the
host-side page fault handler, independent of the proposal to unmap
private guest memory for all CVM architectures. Otherwise, if someone
figures out how to trigger an RMP violation by writing guest private
memory (despite unmapping guest private memory's goal to make this
impossible), the attack surface has now increased. Because now we're
either killing host processes or the kernel. Which is worse than
killing the single guest.

Second, I don't think it's correct to say that the host-side
implementation changes the attack surface. The guest must already be
hardened against host-side bugs and attacks. From the guest's
perspective, the attack surface is the same. Also, unmapping private
guest memory is going to require coordination across host userspace,
host kernel, and guest. That's a lot of code. And typically more code
means more attack surface. That being said, if all that code is
implemented perfectly, you might be right, that unmapping guest
private memory makes it harder for the host to attack the guest. But I
think it's a moot point. Because in the end, converting the page from
private to shared is entirely reasonable to solve this problem for
SNP, and we should be doing it irregardless, even if we do get
unmapping private memory working on SNP.

2021-11-16 13:30:26

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

On Mon, Nov 15, 2021 at 07:15:07PM +0000, Sean Christopherson wrote:
> It creates a new attack surface, e.g. if the guest mishandles the #VC and does
> PVALIDATE on memory that it previously accepted, then userspace can attack the
> guest by accessing guest private memory to coerce the guest into consuming corrupted
> data.

If a guest can be tricked into a double PVALIDATE or otherwise
misbehaves on a #VC exception, then it is a guest bug and needs to be
fixed there.

It is a core requirement to the #VC handler that it can not be tricked
that way.

Regards,

--
J?rg R?del
[email protected]

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 N?rnberg
Germany

(HRB 36809, AG N?rnberg)
Gesch?ftsf?hrer: Ivo Totev