LinuxLists.cc - Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH

2024-02-06 23:43:34

Subject: Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command

On Fri, Feb 2, 2024 at 11:55 PM Sean Christopherson <[email protected]> wrote:
> > > > + struct kvm_sev_snp_launch_update {
> > > > + __u64 start_gfn; /* Guest page number to start from. */
> > > > + __u64 uaddr; /* userspace address need to be encrypted */
> > >
> > > Huh? Why is KVM taking a userspace address? IIUC, the address unconditionally
> > > gets translated into a gfn, so why not pass a gfn?
> > >
> > > And speaking of gfns, AFAICT start_gfn is never used.
> >
> > I think having both the uaddr and start_gfn parameters makes sense. I
> > think it's only awkward because how I'm using the memslot to translate
> > the uaddr to a GFN in the current implementation,
>
> Yes.
>
> > > Oof, reading more of the code, this *requires* an effective in-place copy-and-convert
> > > of guest memory.
> >
> > So that's how it's done here, KVM_SNP_LAUNCH_UPDATE copies the pages into
> > gmem, then passes those pages on to firmware for encryption. Then the
> > VMM is expected to mark the GFN range as private via
> > KVM_SET_MEMORY_ATTRIBUTES, since the VMM understands what constitutes
> > initial private/encrypted payload. I should document that better in
> > KVM_SNP_LAUNCH_UPDATE docs however.
>
> That's fine. As above, my complaints are that the unencrypted source *must* be
> covered by a memslot. That's beyond ugly.

Yes, if there's one field that has to go it's uaddr, and then you'll
have a non-in-place encrypt (any copy performed by KVM it is hidden).

> > > > + kvaddr = pfn_to_kaddr(pfns[i]);
> > > > + if (!virt_addr_valid(kvaddr)) {
> > >
> > > I really, really don't like that this assume guest_memfd is backed by struct page.
> >
> > There are similar enforcements in the SEV/SEV-ES code. There was some
> > initial discussion about relaxing this for SNP so we could support
> > things like /dev/mem-mapped guest memory, but then guest_memfd came
> > along and made that to be an unlikely use-case in the near-term given
> > that it relies on alloc_pages() currently and explicitly guards against
> > mmap()'ing pages in userspace.
> >
> > I think it makes to keep the current tightened restrictions in-place
> > until such a use-case comes along, since otherwise we are relaxing a
> > bunch of currently-useful sanity checks that span all throughout the code

What sanity is being checked for, in other words why are they useful?
If all you get for breaking the promise is a KVM_BUG_ON, for example,
that's par for the course. If instead you get an oops, then we have a
problem.

I may be a bit less draconian than Sean, but the assumptions need to
be documented and explained because they _are_ going to go away.

> > > (b) Why are KVM's memory attributes never consulted?
> >
> > It doesn't really matter if the attributes are set before or after
> > KVM_SNP_LAUNCH_UPDATE, only that by the time the guest actually launches
> > they pages get set to private so they get faulted in from gmem. We could
> > document our expectations and enforce them here if that's preferable
> > however. Maybe requiring KVM_SET_MEMORY_ATTRIBUTES(private) in advance
> > would make it easier to enforce that userspace does the right thing.
> > I'll see how that looks if there are no objections.
>
> Userspace owns whether a page is PRIVATE or SHARED, full stop. If KVM can't
> honor that, then we need to come up with better uAPI.

Can you explain more verbosely what you mean?

> > > > + * When invalid CPUID function entries are detected, the firmware
> > > > + * corrects these entries for debugging purpose and leaves the
> > > > + * page unencrypted so it can be provided users for debugging
> > > > + * and error-reporting.
> > >
> > > Why? IIUC, this is basically backdooring reads/writes into guest_memfd to avoid
> > > having to add proper mmap() support.
>
> Yes, I am specifically complaining about writing guest memory on failure, which is
> all kinds of weird.

It is weird but I am not sure if you are complaining about firmware
behavior or something else.

Paolo

2024-02-07 02:44:01

by Sean Christopherson

[permalink] [raw]

Subject: Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command

On Wed, Feb 07, 2024, Paolo Bonzini wrote:
> On Fri, Feb 2, 2024 at 11:55 PM Sean Christopherson <[email protected]> wrote:
> > > It doesn't really matter if the attributes are set before or after
> > > KVM_SNP_LAUNCH_UPDATE, only that by the time the guest actually launches
> > > they pages get set to private so they get faulted in from gmem. We could
> > > document our expectations and enforce them here if that's preferable
> > > however. Maybe requiring KVM_SET_MEMORY_ATTRIBUTES(private) in advance
> > > would make it easier to enforce that userspace does the right thing.
> > > I'll see how that looks if there are no objections.
> >
> > Userspace owns whether a page is PRIVATE or SHARED, full stop. If KVM can't
> > honor that, then we need to come up with better uAPI.
>
> Can you explain more verbosely what you mean?

As proposed, snp_launch_update_gfn_handler() doesn't verify the state of the
gfns' attributes. But that's a minor problem and probably not a sticking point.

My overarching complaint is that the code is to be wildly unsafe, or at the very
least brittle. Without guest_memfd's knowledge, and without holding any locks
beyond kvm->lock, it

1) checks if a pfn is shared in the RMP
2) copies data to the page
3) converts the page to private in the RMP
4) does PSP stuff
5) on failure, converts the page back to shared in RMP
6) conditionally on failure, writes to the page via a gfn

I'm not at all confident that 1-4 isn't riddled with TOCTOU bugs, and that's
before KVM gains support for intrahost migration, i.e. before KVM allows multiple
VM instances to bind to a single guest_memfd.

But I _think_ we mostly sorted this out at PUCK. IIRC, the plan is to have guest_memfd
provide (kernel) APIs to allow arch/vendor code to initialize a guest_memfd range.
That will give guest_memfd complete control over the state of a given page, will
allow guest_memfd to take the appropriate locks, and if we're lucky, will be reusable
by other CoCo flavors beyond SNP.

> > > > > + * When invalid CPUID function entries are detected, the firmware
> > > > > + * corrects these entries for debugging purpose and leaves the
> > > > > + * page unencrypted so it can be provided users for debugging
> > > > > + * and error-reporting.
> > > >
> > > > Why? IIUC, this is basically backdooring reads/writes into guest_memfd to avoid
> > > > having to add proper mmap() support.
> >
> > Yes, I am specifically complaining about writing guest memory on failure, which is
> > all kinds of weird.
>
> It is weird but I am not sure if you are complaining about firmware
> behavior or something else.

This proposed KVM code:

+ host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+ ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+ if (ret)
+ pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
+ ret);

I have no objection to propagating error/debug information back to userspace,
but it needs to be routed through the source page (or I suppose some dedicated
error page, but that seems like overkill). Shoving the error information into
guest memory is gross.

But this should naturally go away when the requirement that the source be
covered by the same memslot also goes away.

2024-02-07 08:03:39

by Paolo Bonzini

[permalink] [raw]

Subject: Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command

On Wed, Feb 7, 2024 at 3:43 AM Sean Christopherson <[email protected]> wrote:
> > > Userspace owns whether a page is PRIVATE or SHARED, full stop. If KVM can't
> > > honor that, then we need to come up with better uAPI.
> >
> > Can you explain more verbosely what you mean?
>
> As proposed, snp_launch_update_gfn_handler() doesn't verify the state of the
> gfns' attributes. But that's a minor problem and probably not a sticking point.
>
> My overarching complaint is that the code is to be wildly unsafe, or at the very
> least brittle. Without guest_memfd's knowledge, and without holding any locks
> beyond kvm->lock, it
>
> 1) checks if a pfn is shared in the RMP
> 2) copies data to the page
> 3) converts the page to private in the RMP
> 4) does PSP stuff
> 5) on failure, converts the page back to shared in RMP
> 6) conditionally on failure, writes to the page via a gfn
>
> I'm not at all confident that 1-4 isn't riddled with TOCTOU bugs, and that's
> before KVM gains support for intrahost migration, i.e. before KVM allows multiple
> VM instances to bind to a single guest_memfd.

Absolutely.

> But I _think_ we mostly sorted this out at PUCK. IIRC, the plan is to have guest_memfd
> provide (kernel) APIs to allow arch/vendor code to initialize a guest_memfd range.
> That will give guest_memfd complete control over the state of a given page, will
> allow guest_memfd to take the appropriate locks, and if we're lucky, will be reusable
> by other CoCo flavors beyond SNP.

Ok, either way it's clear that guest_memfd needs to be in charge.
Whether it's MEMORY_ENCRYPT_OP that calls into guest_memfd or vice
versa, that only matters so much.

> > > Yes, I am specifically complaining about writing guest memory on failure, which is
> > > all kinds of weird.
> >
> > It is weird but I am not sure if you are complaining about firmware
> > behavior or something else.
>
> This proposed KVM code:
>
> + host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
> +
> + ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
> + if (ret)
> + pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
> + ret);
>
>
> I have no objection to propagating error/debug information back to userspace,
> but it needs to be routed through the source page (or I suppose some dedicated
> error page, but that seems like overkill). Shoving the error information into
> guest memory is gross.

Yes, but it should be just a consequence of not actually using
start_gfn. Having to copy back remains weird, but what can you do.

Paolo

2024-02-09 01:52:44

by Michael Roth

[permalink] [raw]

Subject: Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command

On Wed, Feb 07, 2024 at 12:43:02AM +0100, Paolo Bonzini wrote:
> On Fri, Feb 2, 2024 at 11:55 PM Sean Christopherson <[email protected]> wrote:
> > > > > + struct kvm_sev_snp_launch_update {
> > > > > + __u64 start_gfn; /* Guest page number to start from. */
> > > > > + __u64 uaddr; /* userspace address need to be encrypted */
> > > >
> > > > Huh? Why is KVM taking a userspace address? IIUC, the address unconditionally
> > > > gets translated into a gfn, so why not pass a gfn?
> > > >
> > > > And speaking of gfns, AFAICT start_gfn is never used.
> > >
> > > I think having both the uaddr and start_gfn parameters makes sense. I
> > > think it's only awkward because how I'm using the memslot to translate
> > > the uaddr to a GFN in the current implementation,
> >
> > Yes.
> >
> > > > Oof, reading more of the code, this *requires* an effective in-place copy-and-convert
> > > > of guest memory.
> > >
> > > So that's how it's done here, KVM_SNP_LAUNCH_UPDATE copies the pages into
> > > gmem, then passes those pages on to firmware for encryption. Then the
> > > VMM is expected to mark the GFN range as private via
> > > KVM_SET_MEMORY_ATTRIBUTES, since the VMM understands what constitutes
> > > initial private/encrypted payload. I should document that better in
> > > KVM_SNP_LAUNCH_UPDATE docs however.
> >
> > That's fine. As above, my complaints are that the unencrypted source *must* be
> > covered by a memslot. That's beyond ugly.
>
> Yes, if there's one field that has to go it's uaddr, and then you'll
> have a non-in-place encrypt (any copy performed by KVM it is hidden).
>
> > > > > + kvaddr = pfn_to_kaddr(pfns[i]);
> > > > > + if (!virt_addr_valid(kvaddr)) {
> > > >
> > > > I really, really don't like that this assume guest_memfd is backed by struct page.
> > >
> > > There are similar enforcements in the SEV/SEV-ES code. There was some
> > > initial discussion about relaxing this for SNP so we could support
> > > things like /dev/mem-mapped guest memory, but then guest_memfd came
> > > along and made that to be an unlikely use-case in the near-term given
> > > that it relies on alloc_pages() currently and explicitly guards against
> > > mmap()'ing pages in userspace.
> > >
> > > I think it makes to keep the current tightened restrictions in-place
> > > until such a use-case comes along, since otherwise we are relaxing a
> > > bunch of currently-useful sanity checks that span all throughout the code
>
> What sanity is being checked for, in other words why are they useful?
> If all you get for breaking the promise is a KVM_BUG_ON, for example,
> that's par for the course. If instead you get an oops, then we have a
> problem.
>
> I may be a bit less draconian than Sean, but the assumptions need to
> be documented and explained because they _are_ going to go away.

Maybe in this case sanity-check isn't the right word, but for instance
the occurance Sean objected to:

kvaddr = pfn_to_kaddr(pfns[i]);
if (!virt_addr_valid(kvaddr)) {
...
ret = -EINVAL;

where there are pfn_valid() checks underneath the covers that provide
some assurance this is normal struct-page-backed/kernel-tracked memory
that has a mapping in the directmap we can use here. Dropping that
assumption means we need to create temporary mappings to access the PFN,
which complicates the code for a potential use-case that doesn't yet
exist. But if the maintainers are telling me this will change then I
have no objection to making those changes :) That was just my thinking
at the time.

And yes, if we move more of this sort of functionality closer to gmem
then those assumptions become reality and we can keep the code more
closely in sync will how memory is actually allocated.

I'll rework this to something closer to what Sean mentioned during the
PUCK call: a gmem interface that can be called to handle populating
initial gmem pages, and drop remaining any assumptions about
struct-backed/directmapped in PFNs in the code that remains afterward.

I'm hoping if we do move to a unified KVM API that a similar approach
will work in that case too. It may be a bit tricky with how TDX does
a lot of this through KVM MMU / SecureEPT hooks, this may complicate
locking expectations and not necessarily fit nicely into the same flow
as SNP, but we'll see how it goes.

-Mike

2024-02-09 14:38:54

by Sean Christopherson

[permalink] [raw]

Subject: Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command

On Thu, Feb 08, 2024, Michael Roth wrote:
> On Wed, Feb 07, 2024 at 12:43:02AM +0100, Paolo Bonzini wrote:
> > On Fri, Feb 2, 2024 at 11:55 PM Sean Christopherson <[email protected]> wrote:
> > What sanity is being checked for, in other words why are they useful?
> > If all you get for breaking the promise is a KVM_BUG_ON, for example,
> > that's par for the course. If instead you get an oops, then we have a
> > problem.
> >
> > I may be a bit less draconian than Sean, but the assumptions need to
> > be documented and explained because they _are_ going to go away.
>
> Maybe in this case sanity-check isn't the right word, but for instance
> the occurance Sean objected to:
>
> kvaddr = pfn_to_kaddr(pfns[i]);
> if (!virt_addr_valid(kvaddr)) {
> ...
> ret = -EINVAL;
>
> where there are pfn_valid() checks underneath the covers that provide
> some assurance this is normal struct-page-backed/kernel-tracked memory
> that has a mapping in the directmap we can use here. Dropping that
> assumption means we need to create temporary mappings to access the PFN,

No, you don't. kvm_vcpu_map() does all of the lifting for you, with the small
caveat that it obviously needs a vCPU. But that's trivial to solve with a minor
refactoring, *if* we need to solve that problem (it's not clear to me whether or
not the APIs for copying data into guest_memfd will be VM-scoped or vCPU-scoped).