2022-10-13 21:15:29

by Sean Christopherson

Subject: [PATCH v2 00/16] KVM: x86: gfn_to_pfn_cache fixes and cleanups

The highlights are two fixes for bugs where "destroying" and "initializing"
a gfn=>pfn cache while it is being accessed results in various forms of
badness, e.g. re-initialization of an in-use lock, consuming a NULL pointer,
potential memory corruption, etc...

Everything else is cleanup to make the gpc APIs easier to use and harder
to use incorrectly.
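
For reference, a rough sketch of what a gpc user looks like once the
renames and the immutable-property patches have landed (simplified from
the Xen/pvclock code touched in this series; later patches also drop
@gpa from the exported check() and refresh() helpers, and error handling
is trimmed):

  kvm_gpc_init(gpc, kvm, vcpu, KVM_HOST_USES_PFN, sizeof(struct vcpu_info));
  kvm_gpc_activate(gpc, gpa);

  read_lock_irqsave(&gpc->lock, flags);
  while (!kvm_gpc_check(gpc, gpc->gpa)) {
          read_unlock_irqrestore(&gpc->lock, flags);
          if (kvm_gpc_refresh(gpc, gpc->gpa))
                  return;
          read_lock_irqsave(&gpc->lock, flags);
  }
  /* ... access the mapping through gpc->khva ... */
  read_unlock_irqrestore(&gpc->lock, flags);

  kvm_gpc_deactivate(gpc);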

David, patch 3 (KVM: x86: Always use non-compat vcpu_runstate_info size...)
in particular needs your eyeballs. I'm pretty sure it's ok, but
confirmation from someone who actually uses KVM Xen would be nice.

v2:
- Fix active vs. valid race by rejecting refresh() if the cache is
currently invalid.
- Tweak shortlogs to be "KVM" only. Even though x86 is the only user
of the caches, I think it makes sense to tag the changes as "full"
KVM for future readers.
- Add back the selftest (from the RFC) that triggers the race conditions.
- Always use non-compat size for checking+refreshing runstate_info and
make @len truly immutable.
- Drop unmap() for the moment. I started adding code to prevent "bad"
usage, but without any user I couldn't figure out exactly what
restrictions need to be in place.
- Do more cleanup.

v1: https://lore.kernel.org/all/[email protected]

Michal Luczaj (9):
KVM: Initialize gfn_to_pfn_cache locks in dedicated helper
KVM: Shorten gfn_to_pfn_cache function names
KVM: x86: Remove unused argument in gpc_unmap_khva()
KVM: Store immutable gfn_to_pfn_cache properties
KVM: Store gfn_to_pfn_cache length as an immutable property
KVM: Use gfn_to_pfn_cache's immutable "kvm" in kvm_gpc_check()
KVM: Clean up hva_to_pfn_retry()
KVM: Use gfn_to_pfn_cache's immutable "kvm" in kvm_gpc_refresh()
KVM: selftests: Add tests in xen_shinfo_test to detect lock races

Sean Christopherson (7):
KVM: Reject attempts to consume or refresh inactive gfn_to_pfn_cache
KVM: x86: Always use non-compat vcpu_runstate_info size for gfn=>pfn
cache
KVM: Drop KVM's API to allow temporarily unmapping gfn=>pfn cache
KVM: Do not partially reinitialize gfn=>pfn cache during activation
KVM: Drop @gpa from exported gfn=>pfn cache check() and refresh()
helpers
KVM: Skip unnecessary "unmap" if gpc is already valid during refresh
KVM: selftests: Mark "guest_saw_irq" as volatile in xen_shinfo_test

arch/x86/kvm/x86.c | 24 +--
arch/x86/kvm/xen.c | 84 ++++-----
include/linux/kvm_host.h | 73 ++++----
include/linux/kvm_types.h | 2 +
.../selftests/kvm/x86_64/xen_shinfo_test.c | 142 ++++++++++++++-
virt/kvm/pfncache.c | 170 ++++++++++--------
6 files changed, 320 insertions(+), 175 deletions(-)


base-commit: e18d6152ff0f41b7f01f9817372022df04e0d354
--
2.38.0.413.g74048e4d9e-goog


2022-10-13 21:30:44

by Sean Christopherson

Subject: [PATCH v2 05/16] KVM: x86: Remove unused argument in gpc_unmap_khva()

From: Michal Luczaj <[email protected]>

Remove the unused @kvm argument from gpc_unmap_khva().

Signed-off-by: Michal Luczaj <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/pfncache.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 23180f1d9c1c..32ccf168361b 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -98,7 +98,7 @@ bool kvm_gpc_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa,
}
EXPORT_SYMBOL_GPL(kvm_gpc_check);

-static void gpc_unmap_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
+static void gpc_unmap_khva(kvm_pfn_t pfn, void *khva)
{
/* Unmap the old pfn/page if it was mapped before. */
if (!is_error_noslot_pfn(pfn) && khva) {
@@ -177,7 +177,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
* the existing mapping and didn't create a new one.
*/
if (new_khva != old_khva)
- gpc_unmap_khva(kvm, new_pfn, new_khva);
+ gpc_unmap_khva(new_pfn, new_khva);

kvm_release_pfn_clean(new_pfn);

@@ -324,7 +324,7 @@ int kvm_gpc_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa,
mutex_unlock(&gpc->refresh_lock);

if (unmap_old)
- gpc_unmap_khva(kvm, old_pfn, old_khva);
+ gpc_unmap_khva(old_pfn, old_khva);

return ret;
}
@@ -353,7 +353,7 @@ void kvm_gpc_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
write_unlock_irq(&gpc->lock);
mutex_unlock(&gpc->refresh_lock);

- gpc_unmap_khva(kvm, old_pfn, old_khva);
+ gpc_unmap_khva(old_pfn, old_khva);
}
EXPORT_SYMBOL_GPL(kvm_gpc_unmap);

--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:31:31

by Sean Christopherson

Subject: [PATCH v2 12/16] KVM: Do not partially reinitialize gfn=>pfn cache during activation

Don't partially reinitialize a gfn=>pfn cache when activating the cache,
and instead assert that the cache is not valid during activation. Bug
the VM if the assertion fails, as use-after-free and/or data corruption
is all but guaranteed if KVM ends up with a valid-but-inactive cache.

Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/pfncache.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 62b47feed36c..2d5b417e50ac 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -342,6 +342,9 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
gpc->vcpu = vcpu;
gpc->usage = usage;
gpc->len = len;
+ gpc->pfn = KVM_PFN_ERR_FAULT;
+ gpc->uhva = KVM_HVA_ERR_BAD;
+
}
EXPORT_SYMBOL_GPL(kvm_gpc_init);

@@ -350,10 +353,8 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa)
struct kvm *kvm = gpc->kvm;

if (!gpc->active) {
- gpc->khva = NULL;
- gpc->pfn = KVM_PFN_ERR_FAULT;
- gpc->uhva = KVM_HVA_ERR_BAD;
- gpc->valid = false;
+ if (KVM_BUG_ON(gpc->valid, kvm))
+ return -EIO;

spin_lock(&kvm->gpc_lock);
list_add(&gpc->list, &kvm->gpc_list);
--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:31:39

by Sean Christopherson

Subject: [PATCH v2 03/16] KVM: x86: Always use non-compat vcpu_runstate_info size for gfn=>pfn cache

Always use the size of Xen's non-compat vcpu_runstate_info struct when
checking that the GPA+size doesn't cross a page boundary. Conceptually,
using the current mode is more correct, but KVM isn't consistent with
itself as kvm_xen_vcpu_set_attr() unconditionally uses the "full" size
when activating the cache. More importantly, prior to the introduction
of the gfn_to_pfn_cache, KVM _always_ used the full size, i.e. allowing
the guest (userspace?) to use a poorly aligned GPA in 32-bit mode but not
64-bit mode is more of a bug than a feature, and fixing the bug doesn't
break KVM's historical ABI.

Always using the non-compat size will allow for future gfn_to_pfn_cache
cleanups as this is (was) the only case where KVM uses a different size
at check()+refresh() than at activate(). E.g. the length/size of the
cache can be made immutable and dropped from check()+refresh(), which
yields a cleaner set of APIs and avoids potential bugs that could occur
if check() were invoked with a different size than refresh().
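
To illustrate the hazard this avoids, here is a deliberately broken
sketch (modeled on the runstate retry loop; "big_len" and "small_len"
are made-up names, this is not real KVM code): if check() is handed a
larger size than refresh(), refresh() can succeed while check() keeps
failing the page-boundary test, and the loop never terminates.

  read_lock_irqsave(&gpc->lock, flags);
  while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa, big_len)) {
          read_unlock_irqrestore(&gpc->lock, flags);
          /* refresh() with the smaller size can succeed... */
          if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa, small_len))
                  return;
          /* ...yet the next check() with big_len fails again, forever. */
          read_lock_irqsave(&gpc->lock, flags);
  }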

Fixes: a795cd43c5b5 ("KVM: x86/xen: Use gfn_to_pfn_cache for runstate area")
Cc: David Woodhouse <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/xen.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index b2be60c6efa4..9e79ef2cca99 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -212,10 +212,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
if (!vx->runstate_cache.active)
return;

- if (IS_ENABLED(CONFIG_64BIT) && v->kvm->arch.xen.long_mode)
- user_len = sizeof(struct vcpu_runstate_info);
- else
- user_len = sizeof(struct compat_vcpu_runstate_info);
+ user_len = sizeof(struct vcpu_runstate_info);

read_lock_irqsave(&gpc->lock, flags);
while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa,
--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:32:51

by Sean Christopherson

Subject: [PATCH v2 16/16] KVM: selftests: Mark "guest_saw_irq" as volatile in xen_shinfo_test

Tag "guest_saw_irq" as "volatile" to ensure that the compiler will never
optimize away lookups. Relying on the compiler thinking that the flag
is global and thus might change also works, but it's subtle, less robust,
and looks like a bug at first glance, e.g. risks being "fixed" and
breaking the test.

Make the flag "static" as well since convincing the compiler it's global
is no longer necessary.

Alternatively, the flag could be accessed with {READ,WRITE}_ONCE(), but
literally every access would need the wrappers, and eking out performance
isn't exactly top priority for selftests.
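
For comparison, a rough sketch of what the wrapper-based alternative
would look like (illustrative only, not the actual test code):

  static bool guest_saw_irq;

  /* Every read... */
  while (!READ_ONCE(guest_saw_irq))
          ;
  /* ...and every write would need a wrapper. */
  WRITE_ONCE(guest_saw_irq, true);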

Signed-off-by: Sean Christopherson <[email protected]>
---
tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
index caa3f5ab9e10..2a5727188c8d 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -132,7 +132,7 @@ struct {
struct kvm_irq_routing_entry entries[2];
} irq_routes;

-bool guest_saw_irq;
+static volatile bool guest_saw_irq;

static void evtchn_handler(struct ex_regs *regs)
{
--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:33:19

by Sean Christopherson

Subject: [PATCH v2 06/16] KVM: Store immutable gfn_to_pfn_cache properties

From: Michal Luczaj <[email protected]>

Move the assignment of the immutable properties @kvm, @vcpu, and @usage
to the initializer, and make _activate() and _deactivate() use the
stored values.

Note, @len is also effectively immutable, but less obviously so. Leave
@len as is for now; it will be addressed in a future patch.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Michal Luczaj <[email protected]>
[sean: handle @len in a separate patch]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 14 +++++-------
arch/x86/kvm/xen.c | 47 ++++++++++++++++++---------------------
include/linux/kvm_host.h | 37 +++++++++++++++---------------
include/linux/kvm_types.h | 1 +
virt/kvm/pfncache.c | 22 +++++++++++-------
5 files changed, 61 insertions(+), 60 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd00e6a33203..9c68050672de 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2314,13 +2314,11 @@ static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time,
kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);

/* we verify if the enable bit is set... */
- if (system_time & 1) {
- kvm_gpc_activate(vcpu->kvm, &vcpu->arch.pv_time, vcpu,
- KVM_HOST_USES_PFN, system_time & ~1ULL,
+ if (system_time & 1)
+ kvm_gpc_activate(&vcpu->arch.pv_time, system_time & ~1ULL,
sizeof(struct pvclock_vcpu_time_info));
- } else {
- kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.pv_time);
- }
+ else
+ kvm_gpc_deactivate(&vcpu->arch.pv_time);

return;
}
@@ -3388,7 +3386,7 @@ static int kvm_pv_enable_async_pf_int(struct kvm_vcpu *vcpu, u64 data)

static void kvmclock_reset(struct kvm_vcpu *vcpu)
{
- kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.pv_time);
+ kvm_gpc_deactivate(&vcpu->arch.pv_time);
vcpu->arch.time = 0;
}

@@ -11757,7 +11755,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.regs_avail = ~0;
vcpu->arch.regs_dirty = ~0;

- kvm_gpc_init(&vcpu->arch.pv_time);
+ kvm_gpc_init(&vcpu->arch.pv_time, vcpu->kvm, vcpu, KVM_HOST_USES_PFN);

if (!irqchip_in_kernel(vcpu->kvm) || kvm_vcpu_is_reset_bsp(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 74d9f4985f93..55b7195d69d6 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -42,13 +42,12 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
int idx = srcu_read_lock(&kvm->srcu);

if (gfn == GPA_INVALID) {
- kvm_gpc_deactivate(kvm, gpc);
+ kvm_gpc_deactivate(gpc);
goto out;
}

do {
- ret = kvm_gpc_activate(kvm, gpc, NULL, KVM_HOST_USES_PFN, gpa,
- PAGE_SIZE);
+ ret = kvm_gpc_activate(gpc, gpa, PAGE_SIZE);
if (ret)
goto out;

@@ -550,15 +549,13 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
offsetof(struct compat_vcpu_info, time));

if (data->u.gpa == GPA_INVALID) {
- kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.xen.vcpu_info_cache);
+ kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
r = 0;
break;
}

- r = kvm_gpc_activate(vcpu->kvm,
- &vcpu->arch.xen.vcpu_info_cache, NULL,
- KVM_HOST_USES_PFN, data->u.gpa,
- sizeof(struct vcpu_info));
+ r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_info_cache,
+ data->u.gpa, sizeof(struct vcpu_info));
if (!r)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);

@@ -566,15 +563,13 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)

case KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO:
if (data->u.gpa == GPA_INVALID) {
- kvm_gpc_deactivate(vcpu->kvm,
- &vcpu->arch.xen.vcpu_time_info_cache);
+ kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_time_info_cache);
r = 0;
break;
}

- r = kvm_gpc_activate(vcpu->kvm,
- &vcpu->arch.xen.vcpu_time_info_cache,
- NULL, KVM_HOST_USES_PFN, data->u.gpa,
+ r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_time_info_cache,
+ data->u.gpa,
sizeof(struct pvclock_vcpu_time_info));
if (!r)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
@@ -586,14 +581,13 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
break;
}
if (data->u.gpa == GPA_INVALID) {
- kvm_gpc_deactivate(vcpu->kvm,
- &vcpu->arch.xen.runstate_cache);
+ kvm_gpc_deactivate(&vcpu->arch.xen.runstate_cache);
r = 0;
break;
}

- r = kvm_gpc_activate(vcpu->kvm, &vcpu->arch.xen.runstate_cache,
- NULL, KVM_HOST_USES_PFN, data->u.gpa,
+ r = kvm_gpc_activate(&vcpu->arch.xen.runstate_cache,
+ data->u.gpa,
sizeof(struct vcpu_runstate_info));
break;

@@ -1814,9 +1808,12 @@ void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu)

timer_setup(&vcpu->arch.xen.poll_timer, cancel_evtchn_poll, 0);

- kvm_gpc_init(&vcpu->arch.xen.runstate_cache);
- kvm_gpc_init(&vcpu->arch.xen.vcpu_info_cache);
- kvm_gpc_init(&vcpu->arch.xen.vcpu_time_info_cache);
+ kvm_gpc_init(&vcpu->arch.xen.runstate_cache, vcpu->kvm, NULL,
+ KVM_HOST_USES_PFN);
+ kvm_gpc_init(&vcpu->arch.xen.vcpu_info_cache, vcpu->kvm, NULL,
+ KVM_HOST_USES_PFN);
+ kvm_gpc_init(&vcpu->arch.xen.vcpu_time_info_cache, vcpu->kvm, NULL,
+ KVM_HOST_USES_PFN);
}

void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
@@ -1824,9 +1821,9 @@ void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
if (kvm_xen_timer_enabled(vcpu))
kvm_xen_stop_timer(vcpu);

- kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.xen.runstate_cache);
- kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.xen.vcpu_info_cache);
- kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.xen.vcpu_time_info_cache);
+ kvm_gpc_deactivate(&vcpu->arch.xen.runstate_cache);
+ kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
+ kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_time_info_cache);

del_timer_sync(&vcpu->arch.xen.poll_timer);
}
@@ -1834,7 +1831,7 @@ void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
void kvm_xen_init_vm(struct kvm *kvm)
{
idr_init(&kvm->arch.xen.evtchn_ports);
- kvm_gpc_init(&kvm->arch.xen.shinfo_cache);
+ kvm_gpc_init(&kvm->arch.xen.shinfo_cache, kvm, NULL, KVM_HOST_USES_PFN);
}

void kvm_xen_destroy_vm(struct kvm *kvm)
@@ -1842,7 +1839,7 @@ void kvm_xen_destroy_vm(struct kvm *kvm)
struct evtchnfd *evtchnfd;
int i;

- kvm_gpc_deactivate(kvm, &kvm->arch.xen.shinfo_cache);
+ kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);

idr_for_each_entry(&kvm->arch.xen.evtchn_ports, evtchnfd, i) {
if (!evtchnfd->deliver.port.port)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bb020ee3b2fe..e5e70607a5ef 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1243,18 +1243,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
* kvm_gpc_init - initialize gfn_to_pfn_cache.
*
* @gpc: struct gfn_to_pfn_cache object.
- *
- * This sets up a gfn_to_pfn_cache by initializing locks. Note, the cache must
- * be zero-allocated (or zeroed by the caller before init).
- */
-void kvm_gpc_init(struct gfn_to_pfn_cache *gpc);
-
-/**
- * kvm_gpc_activate - prepare a cached kernel mapping and HPA for a given guest
- * physical address.
- *
* @kvm: pointer to kvm instance.
- * @gpc: struct gfn_to_pfn_cache object.
* @vcpu: vCPU to be used for marking pages dirty and to be woken on
* invalidation.
* @usage: indicates if the resulting host physical PFN is used while
@@ -1263,20 +1252,31 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc);
* changes!---will also force @vcpu to exit the guest and
* refresh the cache); and/or if the PFN used directly
* by KVM (and thus needs a kernel virtual mapping).
+ *
+ * This sets up a gfn_to_pfn_cache by initializing locks and assigning the
+ * immutable attributes. Note, the cache must be zero-allocated (or zeroed by
+ * the caller before init).
+ */
+void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
+ struct kvm_vcpu *vcpu, enum pfn_cache_usage usage);
+
+/**
+ * kvm_gpc_activate - prepare a cached kernel mapping and HPA for a given guest
+ * physical address.
+ *
+ * @gpc: struct gfn_to_pfn_cache object.
* @gpa: guest physical address to map.
* @len: sanity check; the range being access must fit a single page.
*
* @return: 0 for success.
* -EINVAL for a mapping which would cross a page boundary.
- * -EFAULT for an untranslatable guest physical address.
+ * -EFAULT for an untranslatable guest physical address.
*
- * This primes a gfn_to_pfn_cache and links it into the @kvm's list for
+ * This primes a gfn_to_pfn_cache and links it into the @gpc->kvm's list for
* invalidations to be processed. Callers are required to use kvm_gpc_check()
* to ensure that the cache is valid before accessing the target page.
*/
-int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
- struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
- gpa_t gpa, unsigned long len);
+int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len);

/**
* kvm_gpc_check - check validity of a gfn_to_pfn_cache.
@@ -1335,13 +1335,12 @@ void kvm_gpc_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);
/**
* kvm_gpc_deactivate - deactivate and unlink a gfn_to_pfn_cache.
*
- * @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
*
- * This removes a cache from the @kvm's list to be processed on MMU notifier
+ * This removes a cache from the VM's list to be processed on MMU notifier
* invocation.
*/
-void kvm_gpc_deactivate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);
+void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);

void kvm_sigset_activate(struct kvm_vcpu *vcpu);
void kvm_sigset_deactivate(struct kvm_vcpu *vcpu);
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 3ca3db020e0e..76de36e56cdf 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -67,6 +67,7 @@ struct gfn_to_pfn_cache {
gpa_t gpa;
unsigned long uhva;
struct kvm_memory_slot *memslot;
+ struct kvm *kvm;
struct kvm_vcpu *vcpu;
struct list_head list;
rwlock_t lock;
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 32ccf168361b..6756dfa60d5a 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -357,25 +357,29 @@ void kvm_gpc_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
}
EXPORT_SYMBOL_GPL(kvm_gpc_unmap);

-void kvm_gpc_init(struct gfn_to_pfn_cache *gpc)
+void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
+ struct kvm_vcpu *vcpu, enum pfn_cache_usage usage)
{
+ WARN_ON_ONCE(!usage || (usage & KVM_GUEST_AND_HOST_USE_PFN) != usage);
+ WARN_ON_ONCE((usage & KVM_GUEST_USES_PFN) && !vcpu);
+
rwlock_init(&gpc->lock);
mutex_init(&gpc->refresh_lock);
+
+ gpc->kvm = kvm;
+ gpc->vcpu = vcpu;
+ gpc->usage = usage;
}
EXPORT_SYMBOL_GPL(kvm_gpc_init);

-int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
- struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
- gpa_t gpa, unsigned long len)
+int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
{
- WARN_ON_ONCE(!usage || (usage & KVM_GUEST_AND_HOST_USE_PFN) != usage);
+ struct kvm *kvm = gpc->kvm;

if (!gpc->active) {
gpc->khva = NULL;
gpc->pfn = KVM_PFN_ERR_FAULT;
gpc->uhva = KVM_HVA_ERR_BAD;
- gpc->vcpu = vcpu;
- gpc->usage = usage;
gpc->valid = false;

spin_lock(&kvm->gpc_lock);
@@ -395,8 +399,10 @@ int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
}
EXPORT_SYMBOL_GPL(kvm_gpc_activate);

-void kvm_gpc_deactivate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
+void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
{
+ struct kvm *kvm = gpc->kvm;
+
if (gpc->active) {
/*
* Deactivate the cache before removing it from the list, KVM
--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:33:21

by Sean Christopherson

Subject: [PATCH v2 01/16] KVM: Initialize gfn_to_pfn_cache locks in dedicated helper

From: Michal Luczaj <[email protected]>

Move the gfn_to_pfn_cache lock initialization to a dedicated helper and
call the new helper during VM/vCPU creation. Race conditions are
possible because kvm_gfn_to_pfn_cache_init() can re-initialize the
cache's locks while the cache is being accessed.

For example: a race between ioctl(KVM_XEN_HVM_EVTCHN_SEND) and
kvm_gfn_to_pfn_cache_init() leads to a corrupted shinfo gpc lock.

(thread 1) | (thread 2)
|
kvm_xen_set_evtchn_fast |
read_lock_irqsave(&gpc->lock, ...) |
| kvm_gfn_to_pfn_cache_init
| rwlock_init(&gpc->lock)
read_unlock_irqrestore(&gpc->lock, ...) |

Rename "cache_init" and "cache_destroy" to activate+deactivate to
avoid implying that the cache really is destroyed/freed.

Note, there are more races in the newly named kvm_gpc_activate() that
will be addressed separately.
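
In other words, the intended lifecycle after this patch is roughly
(sketch only, using the signatures introduced below):

  /* VM/vCPU creation: one-time lock initialization. */
  kvm_gpc_init(gpc);

  /* ioctl paths: may run any number of times, never re-init the locks. */
  kvm_gpc_activate(kvm, gpc, vcpu, KVM_HOST_USES_PFN, gpa, len);
  /* ... use the cache via check()/refresh() ... */
  kvm_gpc_deactivate(kvm, gpc);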

Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Cc: [email protected]
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Michal Luczaj <[email protected]>
[sean: call out that this is a bug fix]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 12 +++++----
arch/x86/kvm/xen.c | 57 +++++++++++++++++++++-------------------
include/linux/kvm_host.h | 24 ++++++++++++-----
virt/kvm/pfncache.c | 21 ++++++++-------
4 files changed, 66 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4bd5f8a751de..943f039564e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2315,11 +2315,11 @@ static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time,

/* we verify if the enable bit is set... */
if (system_time & 1) {
- kvm_gfn_to_pfn_cache_init(vcpu->kvm, &vcpu->arch.pv_time, vcpu,
- KVM_HOST_USES_PFN, system_time & ~1ULL,
- sizeof(struct pvclock_vcpu_time_info));
+ kvm_gpc_activate(vcpu->kvm, &vcpu->arch.pv_time, vcpu,
+ KVM_HOST_USES_PFN, system_time & ~1ULL,
+ sizeof(struct pvclock_vcpu_time_info));
} else {
- kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.pv_time);
+ kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.pv_time);
}

return;
@@ -3388,7 +3388,7 @@ static int kvm_pv_enable_async_pf_int(struct kvm_vcpu *vcpu, u64 data)

static void kvmclock_reset(struct kvm_vcpu *vcpu)
{
- kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.pv_time);
+ kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.pv_time);
vcpu->arch.time = 0;
}

@@ -11757,6 +11757,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.regs_avail = ~0;
vcpu->arch.regs_dirty = ~0;

+ kvm_gpc_init(&vcpu->arch.pv_time);
+
if (!irqchip_in_kernel(vcpu->kvm) || kvm_vcpu_is_reset_bsp(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 93c628d3e3a9..b2be60c6efa4 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -42,13 +42,13 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
int idx = srcu_read_lock(&kvm->srcu);

if (gfn == GPA_INVALID) {
- kvm_gfn_to_pfn_cache_destroy(kvm, gpc);
+ kvm_gpc_deactivate(kvm, gpc);
goto out;
}

do {
- ret = kvm_gfn_to_pfn_cache_init(kvm, gpc, NULL, KVM_HOST_USES_PFN,
- gpa, PAGE_SIZE);
+ ret = kvm_gpc_activate(kvm, gpc, NULL, KVM_HOST_USES_PFN, gpa,
+ PAGE_SIZE);
if (ret)
goto out;

@@ -554,15 +554,15 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
offsetof(struct compat_vcpu_info, time));

if (data->u.gpa == GPA_INVALID) {
- kvm_gfn_to_pfn_cache_destroy(vcpu->kvm, &vcpu->arch.xen.vcpu_info_cache);
+ kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.xen.vcpu_info_cache);
r = 0;
break;
}

- r = kvm_gfn_to_pfn_cache_init(vcpu->kvm,
- &vcpu->arch.xen.vcpu_info_cache,
- NULL, KVM_HOST_USES_PFN, data->u.gpa,
- sizeof(struct vcpu_info));
+ r = kvm_gpc_activate(vcpu->kvm,
+ &vcpu->arch.xen.vcpu_info_cache, NULL,
+ KVM_HOST_USES_PFN, data->u.gpa,
+ sizeof(struct vcpu_info));
if (!r)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);

@@ -570,16 +570,16 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)

case KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO:
if (data->u.gpa == GPA_INVALID) {
- kvm_gfn_to_pfn_cache_destroy(vcpu->kvm,
- &vcpu->arch.xen.vcpu_time_info_cache);
+ kvm_gpc_deactivate(vcpu->kvm,
+ &vcpu->arch.xen.vcpu_time_info_cache);
r = 0;
break;
}

- r = kvm_gfn_to_pfn_cache_init(vcpu->kvm,
- &vcpu->arch.xen.vcpu_time_info_cache,
- NULL, KVM_HOST_USES_PFN, data->u.gpa,
- sizeof(struct pvclock_vcpu_time_info));
+ r = kvm_gpc_activate(vcpu->kvm,
+ &vcpu->arch.xen.vcpu_time_info_cache,
+ NULL, KVM_HOST_USES_PFN, data->u.gpa,
+ sizeof(struct pvclock_vcpu_time_info));
if (!r)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
break;
@@ -590,16 +590,15 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
break;
}
if (data->u.gpa == GPA_INVALID) {
- kvm_gfn_to_pfn_cache_destroy(vcpu->kvm,
- &vcpu->arch.xen.runstate_cache);
+ kvm_gpc_deactivate(vcpu->kvm,
+ &vcpu->arch.xen.runstate_cache);
r = 0;
break;
}

- r = kvm_gfn_to_pfn_cache_init(vcpu->kvm,
- &vcpu->arch.xen.runstate_cache,
- NULL, KVM_HOST_USES_PFN, data->u.gpa,
- sizeof(struct vcpu_runstate_info));
+ r = kvm_gpc_activate(vcpu->kvm, &vcpu->arch.xen.runstate_cache,
+ NULL, KVM_HOST_USES_PFN, data->u.gpa,
+ sizeof(struct vcpu_runstate_info));
break;

case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT:
@@ -1816,7 +1815,12 @@ void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu)
{
vcpu->arch.xen.vcpu_id = vcpu->vcpu_idx;
vcpu->arch.xen.poll_evtchn = 0;
+
timer_setup(&vcpu->arch.xen.poll_timer, cancel_evtchn_poll, 0);
+
+ kvm_gpc_init(&vcpu->arch.xen.runstate_cache);
+ kvm_gpc_init(&vcpu->arch.xen.vcpu_info_cache);
+ kvm_gpc_init(&vcpu->arch.xen.vcpu_time_info_cache);
}

void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
@@ -1824,18 +1828,17 @@ void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
if (kvm_xen_timer_enabled(vcpu))
kvm_xen_stop_timer(vcpu);

- kvm_gfn_to_pfn_cache_destroy(vcpu->kvm,
- &vcpu->arch.xen.runstate_cache);
- kvm_gfn_to_pfn_cache_destroy(vcpu->kvm,
- &vcpu->arch.xen.vcpu_info_cache);
- kvm_gfn_to_pfn_cache_destroy(vcpu->kvm,
- &vcpu->arch.xen.vcpu_time_info_cache);
+ kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.xen.runstate_cache);
+ kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.xen.vcpu_info_cache);
+ kvm_gpc_deactivate(vcpu->kvm, &vcpu->arch.xen.vcpu_time_info_cache);
+
del_timer_sync(&vcpu->arch.xen.poll_timer);
}

void kvm_xen_init_vm(struct kvm *kvm)
{
idr_init(&kvm->arch.xen.evtchn_ports);
+ kvm_gpc_init(&kvm->arch.xen.shinfo_cache);
}

void kvm_xen_destroy_vm(struct kvm *kvm)
@@ -1843,7 +1846,7 @@ void kvm_xen_destroy_vm(struct kvm *kvm)
struct evtchnfd *evtchnfd;
int i;

- kvm_gfn_to_pfn_cache_destroy(kvm, &kvm->arch.xen.shinfo_cache);
+ kvm_gpc_deactivate(kvm, &kvm->arch.xen.shinfo_cache);

idr_for_each_entry(&kvm->arch.xen.evtchn_ports, evtchnfd, i) {
if (!evtchnfd->deliver.port.port)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 32f259fa5801..694c4cb6caf4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1240,8 +1240,18 @@ int kvm_vcpu_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data,
void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);

/**
- * kvm_gfn_to_pfn_cache_init - prepare a cached kernel mapping and HPA for a
- * given guest physical address.
+ * kvm_gpc_init - initialize gfn_to_pfn_cache.
+ *
+ * @gpc: struct gfn_to_pfn_cache object.
+ *
+ * This sets up a gfn_to_pfn_cache by initializing locks. Note, the cache must
+ * be zero-allocated (or zeroed by the caller before init).
+ */
+void kvm_gpc_init(struct gfn_to_pfn_cache *gpc);
+
+/**
+ * kvm_gpc_activate - prepare a cached kernel mapping and HPA for a given guest
+ * physical address.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
@@ -1265,9 +1275,9 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
* kvm_gfn_to_pfn_cache_check() to ensure that the cache is valid before
* accessing the target page.
*/
-int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
- struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
- gpa_t gpa, unsigned long len);
+int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
+ struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
+ gpa_t gpa, unsigned long len);

/**
* kvm_gfn_to_pfn_cache_check - check validity of a gfn_to_pfn_cache.
@@ -1324,7 +1334,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);

/**
- * kvm_gfn_to_pfn_cache_destroy - destroy and unlink a gfn_to_pfn_cache.
+ * kvm_gpc_deactivate - deactivate and unlink a gfn_to_pfn_cache.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
@@ -1332,7 +1342,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);
* This removes a cache from the @kvm's list to be processed on MMU notifier
* invocation.
*/
-void kvm_gfn_to_pfn_cache_destroy(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);
+void kvm_gpc_deactivate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);

void kvm_sigset_activate(struct kvm_vcpu *vcpu);
void kvm_sigset_deactivate(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 68ff41d39545..08f97cf97264 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -346,17 +346,20 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
}
EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_unmap);

+void kvm_gpc_init(struct gfn_to_pfn_cache *gpc)
+{
+ rwlock_init(&gpc->lock);
+ mutex_init(&gpc->refresh_lock);
+}
+EXPORT_SYMBOL_GPL(kvm_gpc_init);

-int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
- struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
- gpa_t gpa, unsigned long len)
+int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
+ struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
+ gpa_t gpa, unsigned long len)
{
WARN_ON_ONCE(!usage || (usage & KVM_GUEST_AND_HOST_USE_PFN) != usage);

if (!gpc->active) {
- rwlock_init(&gpc->lock);
- mutex_init(&gpc->refresh_lock);
-
gpc->khva = NULL;
gpc->pfn = KVM_PFN_ERR_FAULT;
gpc->uhva = KVM_HVA_ERR_BAD;
@@ -371,9 +374,9 @@ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
}
return kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpa, len);
}
-EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_init);
+EXPORT_SYMBOL_GPL(kvm_gpc_activate);

-void kvm_gfn_to_pfn_cache_destroy(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
+void kvm_gpc_deactivate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
{
if (gpc->active) {
spin_lock(&kvm->gpc_lock);
@@ -384,4 +387,4 @@ void kvm_gfn_to_pfn_cache_destroy(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
gpc->active = false;
}
}
-EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_destroy);
+EXPORT_SYMBOL_GPL(kvm_gpc_deactivate);
--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:33:43

by Sean Christopherson

Subject: [PATCH v2 07/16] KVM: Store gfn_to_pfn_cache length as an immutable property

From: Michal Luczaj <[email protected]>

Make the length of a gfn=>pfn cache an immutable property of the cache
to clean up the APIs and avoid potential bugs, e.g. calling check() with
a larger size than refresh() could put KVM into an infinite loop.

All current (and anticipated future) users access the cache with a
predetermined size, which isn't a coincidence, as using a dedicated
cache really only makes sense when the access pattern is "fixed".
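
Concretely, the usage change is roughly as follows (adapted from the
vcpu_info cache below; other callers follow the same pattern):

  /* Before: the size is repeated at every call site. */
  kvm_gpc_activate(gpc, gpa, sizeof(struct vcpu_info));
  kvm_gpc_check(kvm, gpc, gpc->gpa, sizeof(struct vcpu_info));

  /* After: the size is fixed once, at init. */
  kvm_gpc_init(gpc, kvm, NULL, KVM_HOST_USES_PFN, sizeof(struct vcpu_info));
  kvm_gpc_activate(gpc, gpa);
  kvm_gpc_check(kvm, gpc, gpc->gpa);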

Add a WARN in kvm_setup_guest_pvclock() to assert that the offset+size
matches the length of the cache, both to make it more obvious that the
length really is immutable in that case, and to detect future bugs.

No functional change intended.

Signed-off-by: Michal Luczaj <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 14 ++++++------
arch/x86/kvm/xen.c | 46 ++++++++++++++++-----------------------
include/linux/kvm_host.h | 14 +++++-------
include/linux/kvm_types.h | 1 +
virt/kvm/pfncache.c | 18 +++++++--------
5 files changed, 42 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9c68050672de..0b4fa3455f3a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2315,8 +2315,7 @@ static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time,

/* we verify if the enable bit is set... */
if (system_time & 1)
- kvm_gpc_activate(&vcpu->arch.pv_time, system_time & ~1ULL,
- sizeof(struct pvclock_vcpu_time_info));
+ kvm_gpc_activate(&vcpu->arch.pv_time, system_time & ~1ULL);
else
kvm_gpc_deactivate(&vcpu->arch.pv_time);

@@ -3031,13 +3030,13 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
struct pvclock_vcpu_time_info *guest_hv_clock;
unsigned long flags;

+ WARN_ON_ONCE(gpc->len != offset + sizeof(*guest_hv_clock));
+
read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa,
- offset + sizeof(*guest_hv_clock))) {
+ while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa)) {
read_unlock_irqrestore(&gpc->lock, flags);

- if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa,
- offset + sizeof(*guest_hv_clock)))
+ if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa))
return;

read_lock_irqsave(&gpc->lock, flags);
@@ -11755,7 +11754,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.regs_avail = ~0;
vcpu->arch.regs_dirty = ~0;

- kvm_gpc_init(&vcpu->arch.pv_time, vcpu->kvm, vcpu, KVM_HOST_USES_PFN);
+ kvm_gpc_init(&vcpu->arch.pv_time, vcpu->kvm, vcpu, KVM_HOST_USES_PFN,
+ sizeof(struct pvclock_vcpu_time_info));

if (!irqchip_in_kernel(vcpu->kvm) || kvm_vcpu_is_reset_bsp(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 55b7195d69d6..6f5a5507392e 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -47,7 +47,7 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
}

do {
- ret = kvm_gpc_activate(gpc, gpa, PAGE_SIZE);
+ ret = kvm_gpc_activate(gpc, gpa);
if (ret)
goto out;

@@ -203,7 +203,6 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
struct gfn_to_pfn_cache *gpc = &vx->runstate_cache;
uint64_t *user_times;
unsigned long flags;
- size_t user_len;
int *user_state;

kvm_xen_update_runstate(v, state);
@@ -211,17 +210,15 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
if (!vx->runstate_cache.active)
return;

- user_len = sizeof(struct vcpu_runstate_info);
-
read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa, user_len)) {
+ while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa)) {
read_unlock_irqrestore(&gpc->lock, flags);

/* When invoked from kvm_sched_out() we cannot sleep */
if (state == RUNSTATE_runnable)
return;

- if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa, user_len))
+ if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa))
return;

read_lock_irqsave(&gpc->lock, flags);
@@ -347,12 +344,10 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
* little more honest about it.
*/
read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa,
- sizeof(struct vcpu_info))) {
+ while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa)) {
read_unlock_irqrestore(&gpc->lock, flags);

- if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa,
- sizeof(struct vcpu_info)))
+ if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa))
return;

read_lock_irqsave(&gpc->lock, flags);
@@ -412,8 +407,7 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
sizeof_field(struct compat_vcpu_info, evtchn_upcall_pending));

read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa,
- sizeof(struct vcpu_info))) {
+ while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa)) {
read_unlock_irqrestore(&gpc->lock, flags);

/*
@@ -427,8 +421,7 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
if (in_atomic() || !task_is_running(current))
return 1;

- if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa,
- sizeof(struct vcpu_info))) {
+ if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa)) {
/*
* If this failed, userspace has screwed up the
* vcpu_info mapping. No interrupts for you.
@@ -555,7 +548,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
}

r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_info_cache,
- data->u.gpa, sizeof(struct vcpu_info));
+ data->u.gpa);
if (!r)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);

@@ -569,8 +562,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
}

r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_time_info_cache,
- data->u.gpa,
- sizeof(struct pvclock_vcpu_time_info));
+ data->u.gpa);
if (!r)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
break;
@@ -587,8 +579,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
}

r = kvm_gpc_activate(&vcpu->arch.xen.runstate_cache,
- data->u.gpa,
- sizeof(struct vcpu_runstate_info));
+ data->u.gpa);
break;

case KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT:
@@ -956,7 +947,7 @@ static bool wait_pending_event(struct kvm_vcpu *vcpu, int nr_ports,

read_lock_irqsave(&gpc->lock, flags);
idx = srcu_read_lock(&kvm->srcu);
- if (!kvm_gpc_check(kvm, gpc, gpc->gpa, PAGE_SIZE))
+ if (!kvm_gpc_check(kvm, gpc, gpc->gpa))
goto out_rcu;

ret = false;
@@ -1347,7 +1338,7 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
idx = srcu_read_lock(&kvm->srcu);

read_lock_irqsave(&gpc->lock, flags);
- if (!kvm_gpc_check(kvm, gpc, gpc->gpa, PAGE_SIZE))
+ if (!kvm_gpc_check(kvm, gpc, gpc->gpa))
goto out_rcu;

if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
@@ -1381,7 +1372,7 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
gpc = &vcpu->arch.xen.vcpu_info_cache;

read_lock_irqsave(&gpc->lock, flags);
- if (!kvm_gpc_check(kvm, gpc, gpc->gpa, sizeof(struct vcpu_info))) {
+ if (!kvm_gpc_check(kvm, gpc, gpc->gpa)) {
/*
* Could not access the vcpu_info. Set the bit in-kernel
* and prod the vCPU to deliver it for itself.
@@ -1479,7 +1470,7 @@ static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm)
break;

idx = srcu_read_lock(&kvm->srcu);
- rc = kvm_gpc_refresh(kvm, gpc, gpc->gpa, PAGE_SIZE);
+ rc = kvm_gpc_refresh(kvm, gpc, gpc->gpa);
srcu_read_unlock(&kvm->srcu, idx);
} while(!rc);

@@ -1809,11 +1800,11 @@ void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu)
timer_setup(&vcpu->arch.xen.poll_timer, cancel_evtchn_poll, 0);

kvm_gpc_init(&vcpu->arch.xen.runstate_cache, vcpu->kvm, NULL,
- KVM_HOST_USES_PFN);
+ KVM_HOST_USES_PFN, sizeof(struct vcpu_runstate_info));
kvm_gpc_init(&vcpu->arch.xen.vcpu_info_cache, vcpu->kvm, NULL,
- KVM_HOST_USES_PFN);
+ KVM_HOST_USES_PFN, sizeof(struct vcpu_info));
kvm_gpc_init(&vcpu->arch.xen.vcpu_time_info_cache, vcpu->kvm, NULL,
- KVM_HOST_USES_PFN);
+ KVM_HOST_USES_PFN, sizeof(struct pvclock_vcpu_time_info));
}

void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
@@ -1831,7 +1822,8 @@ void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
void kvm_xen_init_vm(struct kvm *kvm)
{
idr_init(&kvm->arch.xen.evtchn_ports);
- kvm_gpc_init(&kvm->arch.xen.shinfo_cache, kvm, NULL, KVM_HOST_USES_PFN);
+ kvm_gpc_init(&kvm->arch.xen.shinfo_cache, kvm, NULL, KVM_HOST_USES_PFN,
+ PAGE_SIZE);
}

void kvm_xen_destroy_vm(struct kvm *kvm)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e5e70607a5ef..c79f2e122ac8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1252,13 +1252,15 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
* changes!---will also force @vcpu to exit the guest and
* refresh the cache); and/or if the PFN used directly
* by KVM (and thus needs a kernel virtual mapping).
+ * @len: sanity check; the range being access must fit a single page.
*
* This sets up a gfn_to_pfn_cache by initializing locks and assigning the
* immutable attributes. Note, the cache must be zero-allocated (or zeroed by
* the caller before init).
*/
void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
- struct kvm_vcpu *vcpu, enum pfn_cache_usage usage);
+ struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
+ unsigned long len);

/**
* kvm_gpc_activate - prepare a cached kernel mapping and HPA for a given guest
@@ -1266,7 +1268,6 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
*
* @gpc: struct gfn_to_pfn_cache object.
* @gpa: guest physical address to map.
- * @len: sanity check; the range being access must fit a single page.
*
* @return: 0 for success.
* -EINVAL for a mapping which would cross a page boundary.
@@ -1276,7 +1277,7 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
* invalidations to be processed. Callers are required to use kvm_gpc_check()
* to ensure that the cache is valid before accessing the target page.
*/
-int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len);
+int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa);

/**
* kvm_gpc_check - check validity of a gfn_to_pfn_cache.
@@ -1284,7 +1285,6 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
* @gpa: current guest physical address to map.
- * @len: sanity check; the range being access must fit a single page.
*
* @return: %true if the cache is still valid and the address matches.
* %false if the cache is not valid.
@@ -1296,8 +1296,7 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
* Callers in IN_GUEST_MODE may do so without locking, although they should
* still hold a read lock on kvm->scru for the memslot checks.
*/
-bool kvm_gpc_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa,
- unsigned long len);
+bool kvm_gpc_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa);

/**
* kvm_gpc_refresh - update a previously initialized cache.
@@ -1317,8 +1316,7 @@ bool kvm_gpc_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa,
* still lock and check the cache status, as this function does not return
* with the lock still held to permit access.
*/
-int kvm_gpc_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa,
- unsigned long len);
+int kvm_gpc_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa);

/**
* kvm_gpc_unmap - temporarily unmap a gfn_to_pfn_cache.
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 76de36e56cdf..d66b276d29e0 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -74,6 +74,7 @@ struct gfn_to_pfn_cache {
struct mutex refresh_lock;
void *khva;
kvm_pfn_t pfn;
+ unsigned long len;
enum pfn_cache_usage usage;
bool active;
bool valid;
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 6756dfa60d5a..34883ad12536 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -76,15 +76,14 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
}
}

-bool kvm_gpc_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa,
- unsigned long len)
+bool kvm_gpc_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa)
{
struct kvm_memslots *slots = kvm_memslots(kvm);

if (!gpc->active)
return false;

- if ((gpa & ~PAGE_MASK) + len > PAGE_SIZE)
+ if ((gpa & ~PAGE_MASK) + gpc->len > PAGE_SIZE)
return false;

if (gpc->gpa != gpa || gpc->generation != slots->generation ||
@@ -238,8 +237,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
return -EFAULT;
}

-int kvm_gpc_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa,
- unsigned long len)
+int kvm_gpc_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa)
{
struct kvm_memslots *slots = kvm_memslots(kvm);
unsigned long page_offset = gpa & ~PAGE_MASK;
@@ -253,7 +251,7 @@ int kvm_gpc_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa,
* If must fit within a single page. The 'len' argument is
* only to enforce that.
*/
- if (page_offset + len > PAGE_SIZE)
+ if (page_offset + gpc->len > PAGE_SIZE)
return -EINVAL;

/*
@@ -358,7 +356,8 @@ void kvm_gpc_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
EXPORT_SYMBOL_GPL(kvm_gpc_unmap);

void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
- struct kvm_vcpu *vcpu, enum pfn_cache_usage usage)
+ struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
+ unsigned long len)
{
WARN_ON_ONCE(!usage || (usage & KVM_GUEST_AND_HOST_USE_PFN) != usage);
WARN_ON_ONCE((usage & KVM_GUEST_USES_PFN) && !vcpu);
@@ -369,10 +368,11 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
gpc->kvm = kvm;
gpc->vcpu = vcpu;
gpc->usage = usage;
+ gpc->len = len;
}
EXPORT_SYMBOL_GPL(kvm_gpc_init);

-int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
+int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa)
{
struct kvm *kvm = gpc->kvm;

@@ -395,7 +395,7 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
gpc->active = true;
write_unlock_irq(&gpc->lock);
}
- return kvm_gpc_refresh(kvm, gpc, gpa, len);
+ return kvm_gpc_refresh(kvm, gpc, gpa);
}
EXPORT_SYMBOL_GPL(kvm_gpc_activate);

--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:34:28

by Sean Christopherson

Subject: [PATCH v2 09/16] KVM: Clean up hva_to_pfn_retry()

From: Michal Luczaj <[email protected]>

Make hva_to_pfn_retry() use the kvm instance cached in the
gfn_to_pfn_cache.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Michal Luczaj <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/pfncache.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 6fe76fb4d228..ef7ac1666847 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -138,7 +138,7 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
return kvm->mmu_invalidate_seq != mmu_seq;
}

-static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
+static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
{
/* Note, the new page offset may be different than the old! */
void *old_khva = gpc->khva - offset_in_page(gpc->khva);
@@ -158,7 +158,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
gpc->valid = false;

do {
- mmu_seq = kvm->mmu_invalidate_seq;
+ mmu_seq = gpc->kvm->mmu_invalidate_seq;
smp_rmb();

write_unlock_irq(&gpc->lock);
@@ -216,7 +216,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
* attempting to refresh.
*/
WARN_ON_ONCE(gpc->valid);
- } while (mmu_notifier_retry_cache(kvm, mmu_seq));
+ } while (mmu_notifier_retry_cache(gpc->kvm, mmu_seq));

gpc->valid = true;
gpc->pfn = new_pfn;
@@ -293,7 +293,7 @@ int kvm_gpc_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa)
* drop the lock and do the HVA to PFN lookup again.
*/
if (!gpc->valid || old_uhva != gpc->uhva) {
- ret = hva_to_pfn_retry(kvm, gpc);
+ ret = hva_to_pfn_retry(gpc);
} else {
/* If the HVA→PFN mapping was already valid, don't unmap it. */
old_pfn = KVM_PFN_ERR_FAULT;
--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:35:28

by Sean Christopherson

Subject: [PATCH v2 11/16] KVM: Drop KVM's API to allow temporarily unmapping gfn=>pfn cache

Drop kvm_gpc_unmap() as it has no users and unclear requirements. The
API was added as part of the original gfn_to_pfn_cache support, but its
sole usage[*] was never merged. Fold the guts of kvm_gpc_unmap() into
the deactivate path and drop the API. Omit acquiring refresh_lock, as
concurrent calls to kvm_gpc_deactivate() are not allowed (this is not
enforced, e.g. via lockdep, due to it being called during vCPU
destruction).

If/when temporary unmapping makes a comeback, the desirable behavior is
likely to restrict temporary unmapping to vCPU-exclusive mappings and
require that the vcpu->mutex be held to serialize unmap. Use of the
refresh_lock to protect unmapping was somewhat speculatively added by
commit 93984f19e7bc ("KVM: Fully serialize gfn=>pfn cache refresh via
mutex") to guard against concurrent unmaps, but the primary use case of
the temporary unmap, nested virtualization[*], doesn't actually need or
want concurrent unmaps.

[*] https://lore.kernel.org/all/[email protected]

Signed-off-by: Sean Christopherson <[email protected]>
---
include/linux/kvm_host.h | 12 -----------
virt/kvm/pfncache.c | 44 +++++++++++++++-------------------------
2 files changed, 16 insertions(+), 40 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b63d2abbef56..22cf43389954 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1315,18 +1315,6 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, gpa_t gpa);
*/
int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa);

-/**
- * kvm_gpc_unmap - temporarily unmap a gfn_to_pfn_cache.
- *
- * @kvm: pointer to kvm instance.
- * @gpc: struct gfn_to_pfn_cache object.
- *
- * This unmaps the referenced page. The cache is left in the invalid state
- * but at least the mapping from GPA to userspace HVA will remain cached
- * and can be reused on a subsequent refresh.
- */
-void kvm_gpc_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);
-
/**
* kvm_gpc_deactivate - deactivate and unlink a gfn_to_pfn_cache.
*
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 432b150bd9f1..62b47feed36c 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -328,33 +328,6 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa)
}
EXPORT_SYMBOL_GPL(kvm_gpc_refresh);

-void kvm_gpc_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
-{
- void *old_khva;
- kvm_pfn_t old_pfn;
-
- mutex_lock(&gpc->refresh_lock);
- write_lock_irq(&gpc->lock);
-
- gpc->valid = false;
-
- old_khva = gpc->khva - offset_in_page(gpc->khva);
- old_pfn = gpc->pfn;
-
- /*
- * We can leave the GPA → uHVA map cache intact but the PFN
- * lookup will need to be redone even for the same page.
- */
- gpc->khva = NULL;
- gpc->pfn = KVM_PFN_ERR_FAULT;
-
- write_unlock_irq(&gpc->lock);
- mutex_unlock(&gpc->refresh_lock);
-
- gpc_unmap_khva(old_pfn, old_khva);
-}
-EXPORT_SYMBOL_GPL(kvm_gpc_unmap);
-
void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
unsigned long len)
@@ -402,6 +375,8 @@ EXPORT_SYMBOL_GPL(kvm_gpc_activate);
void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
{
struct kvm *kvm = gpc->kvm;
+ kvm_pfn_t old_pfn;
+ void *old_khva;

if (gpc->active) {
/*
@@ -411,13 +386,26 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
*/
write_lock_irq(&gpc->lock);
gpc->active = false;
+ gpc->valid = false;
+
+ /*
+ * Leave the GPA => uHVA cache intact, it's protected by the
+ * memslot generation. The PFN lookup needs to be redone every
+ * time as mmu_notifier protection is lost when the cache is
+ * removed from the VM's gpc_list.
+ */
+ old_khva = gpc->khva - offset_in_page(gpc->khva);
+ gpc->khva = NULL;
+
+ old_pfn = gpc->pfn;
+ gpc->pfn = KVM_PFN_ERR_FAULT;
write_unlock_irq(&gpc->lock);

spin_lock(&kvm->gpc_lock);
list_del(&gpc->list);
spin_unlock(&kvm->gpc_lock);

- kvm_gpc_unmap(kvm, gpc);
+ gpc_unmap_khva(old_pfn, old_khva);
}
}
EXPORT_SYMBOL_GPL(kvm_gpc_deactivate);
--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:36:58

by Sean Christopherson

Subject: [PATCH v2 10/16] KVM: Use gfn_to_pfn_cache's immutable "kvm" in kvm_gpc_refresh()

From: Michal Luczaj <[email protected]>

Make kvm_gpc_refresh() use the kvm instance cached in the
gfn_to_pfn_cache.

No functional change intended.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Michal Luczaj <[email protected]>
[sean: leave kvm_gpc_unmap() as-is]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 2 +-
arch/x86/kvm/xen.c | 8 ++++----
include/linux/kvm_host.h | 8 +++-----
virt/kvm/pfncache.c | 6 +++---
4 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b357a84f8c49..d370d06bb07a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3036,7 +3036,7 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
while (!kvm_gpc_check(gpc, gpc->gpa)) {
read_unlock_irqrestore(&gpc->lock, flags);

- if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa))
+ if (kvm_gpc_refresh(gpc, gpc->gpa))
return;

read_lock_irqsave(&gpc->lock, flags);
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index c7304f37c438..920ba5ca3016 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -218,7 +218,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
if (state == RUNSTATE_runnable)
return;

- if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa))
+ if (kvm_gpc_refresh(gpc, gpc->gpa))
return;

read_lock_irqsave(&gpc->lock, flags);
@@ -347,7 +347,7 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
while (!kvm_gpc_check(gpc, gpc->gpa)) {
read_unlock_irqrestore(&gpc->lock, flags);

- if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa))
+ if (kvm_gpc_refresh(gpc, gpc->gpa))
return;

read_lock_irqsave(&gpc->lock, flags);
@@ -421,7 +421,7 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
if (in_atomic() || !task_is_running(current))
return 1;

- if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa)) {
+ if (kvm_gpc_refresh(gpc, gpc->gpa)) {
/*
* If this failed, userspace has screwed up the
* vcpu_info mapping. No interrupts for you.
@@ -1470,7 +1470,7 @@ static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm)
break;

idx = srcu_read_lock(&kvm->srcu);
- rc = kvm_gpc_refresh(kvm, gpc, gpc->gpa);
+ rc = kvm_gpc_refresh(gpc, gpc->gpa);
srcu_read_unlock(&kvm->srcu, idx);
} while(!rc);

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ad8ef7f2d705..b63d2abbef56 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1300,22 +1300,20 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, gpa_t gpa);
/**
* kvm_gpc_refresh - update a previously initialized cache.
*
- * @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
* @gpa: updated guest physical address to map.
- * @len: sanity check; the range being access must fit a single page.
*
* @return: 0 for success.
* -EINVAL for a mapping which would cross a page boundary.
- * -EFAULT for an untranslatable guest physical address.
+ * -EFAULT for an untranslatable guest physical address.
*
* This will attempt to refresh a gfn_to_pfn_cache. Note that a successful
- * returm from this function does not mean the page can be immediately
+ * return from this function does not mean the page can be immediately
* accessed because it may have raced with an invalidation. Callers must
* still lock and check the cache status, as this function does not return
* with the lock still held to permit access.
*/
-int kvm_gpc_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa);
+int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa);

/**
* kvm_gpc_unmap - temporarily unmap a gfn_to_pfn_cache.
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index ef7ac1666847..432b150bd9f1 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -237,9 +237,9 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
return -EFAULT;
}

-int kvm_gpc_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa)
+int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa)
{
- struct kvm_memslots *slots = kvm_memslots(kvm);
+ struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
unsigned long page_offset = gpa & ~PAGE_MASK;
bool unmap_old = false;
unsigned long old_uhva;
@@ -395,7 +395,7 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa)
gpc->active = true;
write_unlock_irq(&gpc->lock);
}
- return kvm_gpc_refresh(kvm, gpc, gpa);
+ return kvm_gpc_refresh(gpc, gpa);
}
EXPORT_SYMBOL_GPL(kvm_gpc_activate);

--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:39:15

by Sean Christopherson

Subject: [PATCH v2 04/16] KVM: Shorten gfn_to_pfn_cache function names

From: Michal Luczaj <[email protected]>

Formalize "gpc" as the acronym and use it in function names.

No functional change intended.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Michal Luczaj <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 8 ++++----
arch/x86/kvm/xen.c | 29 ++++++++++++++---------------
include/linux/kvm_host.h | 21 ++++++++++-----------
virt/kvm/pfncache.c | 20 ++++++++++----------
4 files changed, 38 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 943f039564e7..fd00e6a33203 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3034,12 +3034,12 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
unsigned long flags;

read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa,
- offset + sizeof(*guest_hv_clock))) {
+ while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa,
+ offset + sizeof(*guest_hv_clock))) {
read_unlock_irqrestore(&gpc->lock, flags);

- if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa,
- offset + sizeof(*guest_hv_clock)))
+ if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa,
+ offset + sizeof(*guest_hv_clock)))
return;

read_lock_irqsave(&gpc->lock, flags);
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 9e79ef2cca99..74d9f4985f93 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -215,15 +215,14 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
user_len = sizeof(struct vcpu_runstate_info);

read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa,
- user_len)) {
+ while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa, user_len)) {
read_unlock_irqrestore(&gpc->lock, flags);

/* When invoked from kvm_sched_out() we cannot sleep */
if (state == RUNSTATE_runnable)
return;

- if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa, user_len))
+ if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa, user_len))
return;

read_lock_irqsave(&gpc->lock, flags);
@@ -349,12 +348,12 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
* little more honest about it.
*/
read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa,
- sizeof(struct vcpu_info))) {
+ while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa,
+ sizeof(struct vcpu_info))) {
read_unlock_irqrestore(&gpc->lock, flags);

- if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa,
- sizeof(struct vcpu_info)))
+ if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa,
+ sizeof(struct vcpu_info)))
return;

read_lock_irqsave(&gpc->lock, flags);
@@ -414,8 +413,8 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
sizeof_field(struct compat_vcpu_info, evtchn_upcall_pending));

read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa,
- sizeof(struct vcpu_info))) {
+ while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa,
+ sizeof(struct vcpu_info))) {
read_unlock_irqrestore(&gpc->lock, flags);

/*
@@ -429,8 +428,8 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
if (in_atomic() || !task_is_running(current))
return 1;

- if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa,
- sizeof(struct vcpu_info))) {
+ if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa,
+ sizeof(struct vcpu_info))) {
/*
* If this failed, userspace has screwed up the
* vcpu_info mapping. No interrupts for you.
@@ -963,7 +962,7 @@ static bool wait_pending_event(struct kvm_vcpu *vcpu, int nr_ports,

read_lock_irqsave(&gpc->lock, flags);
idx = srcu_read_lock(&kvm->srcu);
- if (!kvm_gfn_to_pfn_cache_check(kvm, gpc, gpc->gpa, PAGE_SIZE))
+ if (!kvm_gpc_check(kvm, gpc, gpc->gpa, PAGE_SIZE))
goto out_rcu;

ret = false;
@@ -1354,7 +1353,7 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
idx = srcu_read_lock(&kvm->srcu);

read_lock_irqsave(&gpc->lock, flags);
- if (!kvm_gfn_to_pfn_cache_check(kvm, gpc, gpc->gpa, PAGE_SIZE))
+ if (!kvm_gpc_check(kvm, gpc, gpc->gpa, PAGE_SIZE))
goto out_rcu;

if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
@@ -1388,7 +1387,7 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
gpc = &vcpu->arch.xen.vcpu_info_cache;

read_lock_irqsave(&gpc->lock, flags);
- if (!kvm_gfn_to_pfn_cache_check(kvm, gpc, gpc->gpa, sizeof(struct vcpu_info))) {
+ if (!kvm_gpc_check(kvm, gpc, gpc->gpa, sizeof(struct vcpu_info))) {
/*
* Could not access the vcpu_info. Set the bit in-kernel
* and prod the vCPU to deliver it for itself.
@@ -1486,7 +1485,7 @@ static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm)
break;

idx = srcu_read_lock(&kvm->srcu);
- rc = kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpc->gpa, PAGE_SIZE);
+ rc = kvm_gpc_refresh(kvm, gpc, gpc->gpa, PAGE_SIZE);
srcu_read_unlock(&kvm->srcu, idx);
} while(!rc);

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 694c4cb6caf4..bb020ee3b2fe 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1271,16 +1271,15 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc);
* -EFAULT for an untranslatable guest physical address.
*
* This primes a gfn_to_pfn_cache and links it into the @kvm's list for
- * invalidations to be processed. Callers are required to use
- * kvm_gfn_to_pfn_cache_check() to ensure that the cache is valid before
- * accessing the target page.
+ * invalidations to be processed. Callers are required to use kvm_gpc_check()
+ * to ensure that the cache is valid before accessing the target page.
*/
int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
gpa_t gpa, unsigned long len);

/**
- * kvm_gfn_to_pfn_cache_check - check validity of a gfn_to_pfn_cache.
+ * kvm_gpc_check - check validity of a gfn_to_pfn_cache.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
@@ -1297,11 +1296,11 @@ int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
* Callers in IN_GUEST_MODE may do so without locking, although they should
* still hold a read lock on kvm->scru for the memslot checks.
*/
-bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
- gpa_t gpa, unsigned long len);
+bool kvm_gpc_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa,
+ unsigned long len);

/**
- * kvm_gfn_to_pfn_cache_refresh - update a previously initialized cache.
+ * kvm_gpc_refresh - update a previously initialized cache.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
@@ -1318,11 +1317,11 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
* still lock and check the cache status, as this function does not return
* with the lock still held to permit access.
*/
-int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
- gpa_t gpa, unsigned long len);
+int kvm_gpc_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa,
+ unsigned long len);

/**
- * kvm_gfn_to_pfn_cache_unmap - temporarily unmap a gfn_to_pfn_cache.
+ * kvm_gpc_unmap - temporarily unmap a gfn_to_pfn_cache.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
@@ -1331,7 +1330,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
* but at least the mapping from GPA to userspace HVA will remain cached
* and can be reused on a subsequent refresh.
*/
-void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);
+void kvm_gpc_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);

/**
* kvm_gpc_deactivate - deactivate and unlink a gfn_to_pfn_cache.
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 346e47f15572..23180f1d9c1c 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -76,8 +76,8 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
}
}

-bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
- gpa_t gpa, unsigned long len)
+bool kvm_gpc_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa,
+ unsigned long len)
{
struct kvm_memslots *slots = kvm_memslots(kvm);

@@ -96,7 +96,7 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,

return true;
}
-EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_check);
+EXPORT_SYMBOL_GPL(kvm_gpc_check);

static void gpc_unmap_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
{
@@ -238,8 +238,8 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
return -EFAULT;
}

-int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
- gpa_t gpa, unsigned long len)
+int kvm_gpc_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa,
+ unsigned long len)
{
struct kvm_memslots *slots = kvm_memslots(kvm);
unsigned long page_offset = gpa & ~PAGE_MASK;
@@ -328,9 +328,9 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,

return ret;
}
-EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_refresh);
+EXPORT_SYMBOL_GPL(kvm_gpc_refresh);

-void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
+void kvm_gpc_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
{
void *old_khva;
kvm_pfn_t old_pfn;
@@ -355,7 +355,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)

gpc_unmap_khva(kvm, old_pfn, old_khva);
}
-EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_unmap);
+EXPORT_SYMBOL_GPL(kvm_gpc_unmap);

void kvm_gpc_init(struct gfn_to_pfn_cache *gpc)
{
@@ -391,7 +391,7 @@ int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
gpc->active = true;
write_unlock_irq(&gpc->lock);
}
- return kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpa, len);
+ return kvm_gpc_refresh(kvm, gpc, gpa, len);
}
EXPORT_SYMBOL_GPL(kvm_gpc_activate);

@@ -411,7 +411,7 @@ void kvm_gpc_deactivate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
list_del(&gpc->list);
spin_unlock(&kvm->gpc_lock);

- kvm_gfn_to_pfn_cache_unmap(kvm, gpc);
+ kvm_gpc_unmap(kvm, gpc);
}
}
EXPORT_SYMBOL_GPL(kvm_gpc_deactivate);
--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:40:14

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 14/16] KVM: Skip unnecessary "unmap" if gpc is already valid during refresh

When refreshing a gfn=>pfn cache, skip straight to unlocking if the cache
is already valid instead of stuffing the "old" variables to turn the
unmapping outro into a nop.

Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/pfncache.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index f211c878788b..57d47f06637d 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -293,9 +293,8 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa)
ret = hva_to_pfn_retry(gpc);
} else {
/* If the HVA→PFN mapping was already valid, don't unmap it. */
- old_pfn = KVM_PFN_ERR_FAULT;
- old_khva = NULL;
ret = 0;
+ goto out_unlock;
}

out:
--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:40:42

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 13/16] KVM: Drop @gpa from exported gfn=>pfn cache check() and refresh() helpers

Drop the @gpa param from the exported check()+refresh() helpers and limit
changing the cache's GPA to the activate path. All external users just
feed in gpc->gpa, i.e. this is a fancy nop.

Allowing users to change the GPA at check()+refresh() is dangerous as
those helpers explicitly allow concurrent calls, e.g. KVM could get into
a livelock scenario. It's also unclear as to what the expected behavior
should be if multiple tasks attempt to refresh with different GPAs.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 4 ++--
arch/x86/kvm/xen.c | 20 ++++++++++----------
include/linux/kvm_host.h | 6 ++----
virt/kvm/pfncache.c | 16 +++++++++-------
4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d370d06bb07a..2db8515d38dd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3033,10 +3033,10 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
WARN_ON_ONCE(gpc->len != offset + sizeof(*guest_hv_clock));

read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gpc_check(gpc, gpc->gpa)) {
+ while (!kvm_gpc_check(gpc)) {
read_unlock_irqrestore(&gpc->lock, flags);

- if (kvm_gpc_refresh(gpc, gpc->gpa))
+ if (kvm_gpc_refresh(gpc))
return;

read_lock_irqsave(&gpc->lock, flags);
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 920ba5ca3016..529d3f4c1b9d 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -211,14 +211,14 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
return;

read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gpc_check(gpc, gpc->gpa)) {
+ while (!kvm_gpc_check(gpc)) {
read_unlock_irqrestore(&gpc->lock, flags);

/* When invoked from kvm_sched_out() we cannot sleep */
if (state == RUNSTATE_runnable)
return;

- if (kvm_gpc_refresh(gpc, gpc->gpa))
+ if (kvm_gpc_refresh(gpc))
return;

read_lock_irqsave(&gpc->lock, flags);
@@ -344,10 +344,10 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
* little more honest about it.
*/
read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gpc_check(gpc, gpc->gpa)) {
+ while (!kvm_gpc_check(gpc)) {
read_unlock_irqrestore(&gpc->lock, flags);

- if (kvm_gpc_refresh(gpc, gpc->gpa))
+ if (kvm_gpc_refresh(gpc))
return;

read_lock_irqsave(&gpc->lock, flags);
@@ -407,7 +407,7 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
sizeof_field(struct compat_vcpu_info, evtchn_upcall_pending));

read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gpc_check(gpc, gpc->gpa)) {
+ while (!kvm_gpc_check(gpc)) {
read_unlock_irqrestore(&gpc->lock, flags);

/*
@@ -421,7 +421,7 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
if (in_atomic() || !task_is_running(current))
return 1;

- if (kvm_gpc_refresh(gpc, gpc->gpa)) {
+ if (kvm_gpc_refresh(gpc)) {
/*
* If this failed, userspace has screwed up the
* vcpu_info mapping. No interrupts for you.
@@ -947,7 +947,7 @@ static bool wait_pending_event(struct kvm_vcpu *vcpu, int nr_ports,

read_lock_irqsave(&gpc->lock, flags);
idx = srcu_read_lock(&kvm->srcu);
- if (!kvm_gpc_check(gpc, gpc->gpa))
+ if (!kvm_gpc_check(gpc))
goto out_rcu;

ret = false;
@@ -1338,7 +1338,7 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
idx = srcu_read_lock(&kvm->srcu);

read_lock_irqsave(&gpc->lock, flags);
- if (!kvm_gpc_check(gpc, gpc->gpa))
+ if (!kvm_gpc_check(gpc))
goto out_rcu;

if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
@@ -1372,7 +1372,7 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
gpc = &vcpu->arch.xen.vcpu_info_cache;

read_lock_irqsave(&gpc->lock, flags);
- if (!kvm_gpc_check(gpc, gpc->gpa)) {
+ if (!kvm_gpc_check(gpc)) {
/*
* Could not access the vcpu_info. Set the bit in-kernel
* and prod the vCPU to deliver it for itself.
@@ -1470,7 +1470,7 @@ static int kvm_xen_set_evtchn(struct kvm_xen_evtchn *xe, struct kvm *kvm)
break;

idx = srcu_read_lock(&kvm->srcu);
- rc = kvm_gpc_refresh(gpc, gpc->gpa);
+ rc = kvm_gpc_refresh(gpc);
srcu_read_unlock(&kvm->srcu, idx);
} while(!rc);

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 22cf43389954..92cf0be21974 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1283,7 +1283,6 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa);
* kvm_gpc_check - check validity of a gfn_to_pfn_cache.
*
* @gpc: struct gfn_to_pfn_cache object.
- * @gpa: current guest physical address to map.
*
* @return: %true if the cache is still valid and the address matches.
* %false if the cache is not valid.
@@ -1295,13 +1294,12 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa);
* Callers in IN_GUEST_MODE may do so without locking, although they should
* still hold a read lock on kvm->scru for the memslot checks.
*/
-bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, gpa_t gpa);
+bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc);

/**
* kvm_gpc_refresh - update a previously initialized cache.
*
* @gpc: struct gfn_to_pfn_cache object.
- * @gpa: updated guest physical address to map.
*
* @return: 0 for success.
* -EINVAL for a mapping which would cross a page boundary.
@@ -1313,7 +1311,7 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, gpa_t gpa);
* still lock and check the cache status, as this function does not return
* with the lock still held to permit access.
*/
-int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa);
+int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc);

/**
* kvm_gpc_deactivate - deactivate and unlink a gfn_to_pfn_cache.
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 2d5b417e50ac..f211c878788b 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -76,17 +76,14 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
}
}

-bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, gpa_t gpa)
+bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc)
{
struct kvm_memslots *slots = kvm_memslots(gpc->kvm);

if (!gpc->active)
return false;

- if ((gpa & ~PAGE_MASK) + gpc->len > PAGE_SIZE)
- return false;
-
- if (gpc->gpa != gpa || gpc->generation != slots->generation ||
+ if (gpc->generation != slots->generation ||
kvm_is_error_hva(gpc->uhva))
return false;

@@ -237,7 +234,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
return -EFAULT;
}

-int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa)
+static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa)
{
struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
unsigned long page_offset = gpa & ~PAGE_MASK;
@@ -326,6 +323,11 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa)

return ret;
}
+
+int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc)
+{
+ return __kvm_gpc_refresh(gpc, gpc->gpa);
+}
EXPORT_SYMBOL_GPL(kvm_gpc_refresh);

void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
@@ -369,7 +371,7 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa)
gpc->active = true;
write_unlock_irq(&gpc->lock);
}
- return kvm_gpc_refresh(gpc, gpa);
+ return __kvm_gpc_refresh(gpc, gpa);
}
EXPORT_SYMBOL_GPL(kvm_gpc_activate);

--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:41:24

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 08/16] KVM: Use gfn_to_pfn_cache's immutable "kvm" in kvm_gpc_check()

From: Michal Luczaj <[email protected]>

Make kvm_gpc_check() use kvm instance cached in gfn_to_pfn_cache.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Michal Luczaj <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 2 +-
arch/x86/kvm/xen.c | 12 ++++++------
include/linux/kvm_host.h | 3 +--
virt/kvm/pfncache.c | 4 ++--
4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0b4fa3455f3a..b357a84f8c49 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3033,7 +3033,7 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
WARN_ON_ONCE(gpc->len != offset + sizeof(*guest_hv_clock));

read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa)) {
+ while (!kvm_gpc_check(gpc, gpc->gpa)) {
read_unlock_irqrestore(&gpc->lock, flags);

if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa))
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 6f5a5507392e..c7304f37c438 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -211,7 +211,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
return;

read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa)) {
+ while (!kvm_gpc_check(gpc, gpc->gpa)) {
read_unlock_irqrestore(&gpc->lock, flags);

/* When invoked from kvm_sched_out() we cannot sleep */
@@ -344,7 +344,7 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
* little more honest about it.
*/
read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa)) {
+ while (!kvm_gpc_check(gpc, gpc->gpa)) {
read_unlock_irqrestore(&gpc->lock, flags);

if (kvm_gpc_refresh(v->kvm, gpc, gpc->gpa))
@@ -407,7 +407,7 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
sizeof_field(struct compat_vcpu_info, evtchn_upcall_pending));

read_lock_irqsave(&gpc->lock, flags);
- while (!kvm_gpc_check(v->kvm, gpc, gpc->gpa)) {
+ while (!kvm_gpc_check(gpc, gpc->gpa)) {
read_unlock_irqrestore(&gpc->lock, flags);

/*
@@ -947,7 +947,7 @@ static bool wait_pending_event(struct kvm_vcpu *vcpu, int nr_ports,

read_lock_irqsave(&gpc->lock, flags);
idx = srcu_read_lock(&kvm->srcu);
- if (!kvm_gpc_check(kvm, gpc, gpc->gpa))
+ if (!kvm_gpc_check(gpc, gpc->gpa))
goto out_rcu;

ret = false;
@@ -1338,7 +1338,7 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
idx = srcu_read_lock(&kvm->srcu);

read_lock_irqsave(&gpc->lock, flags);
- if (!kvm_gpc_check(kvm, gpc, gpc->gpa))
+ if (!kvm_gpc_check(gpc, gpc->gpa))
goto out_rcu;

if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
@@ -1372,7 +1372,7 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
gpc = &vcpu->arch.xen.vcpu_info_cache;

read_lock_irqsave(&gpc->lock, flags);
- if (!kvm_gpc_check(kvm, gpc, gpc->gpa)) {
+ if (!kvm_gpc_check(gpc, gpc->gpa)) {
/*
* Could not access the vcpu_info. Set the bit in-kernel
* and prod the vCPU to deliver it for itself.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c79f2e122ac8..ad8ef7f2d705 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1282,7 +1282,6 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa);
/**
* kvm_gpc_check - check validity of a gfn_to_pfn_cache.
*
- * @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
* @gpa: current guest physical address to map.
*
@@ -1296,7 +1295,7 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa);
* Callers in IN_GUEST_MODE may do so without locking, although they should
* still hold a read lock on kvm->scru for the memslot checks.
*/
-bool kvm_gpc_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa);
+bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, gpa_t gpa);

/**
* kvm_gpc_refresh - update a previously initialized cache.
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 34883ad12536..6fe76fb4d228 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -76,9 +76,9 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
}
}

-bool kvm_gpc_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa)
+bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, gpa_t gpa)
{
- struct kvm_memslots *slots = kvm_memslots(kvm);
+ struct kvm_memslots *slots = kvm_memslots(gpc->kvm);

if (!gpc->active)
return false;
--
2.38.0.413.g74048e4d9e-goog

2022-10-13 21:43:06

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v2 15/16] KVM: selftests: Add tests in xen_shinfo_test to detect lock races

From: Michal Luczaj <[email protected]>

Tests for races between shinfo_cache (de)activation and hypercall+ioctl()
processing. KVM has had bugs where activating the shared info cache
multiple times and/or with concurrent users results in lock corruption,
NULL pointer dereferences, and other fun.

For the timer injection testcase (#22), re-arm the timer until the IRQ
is successfully injected. If the timer expires while the shared info
is deactivated (invalid), KVM will drop the event.

Signed-off-by: Michal Luczaj <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/x86_64/xen_shinfo_test.c | 140 ++++++++++++++++++
1 file changed, 140 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
index 8a5cb800f50e..caa3f5ab9e10 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -15,9 +15,13 @@
#include <time.h>
#include <sched.h>
#include <signal.h>
+#include <pthread.h>

#include <sys/eventfd.h>

+/* Defined in include/linux/kvm_types.h */
+#define GPA_INVALID (~(ulong)0)
+
#define SHINFO_REGION_GVA 0xc0000000ULL
#define SHINFO_REGION_GPA 0xc0000000ULL
#define SHINFO_REGION_SLOT 10
@@ -44,6 +48,8 @@

#define MIN_STEAL_TIME 50000

+#define SHINFO_RACE_TIMEOUT 2 /* seconds */
+
#define __HYPERVISOR_set_timer_op 15
#define __HYPERVISOR_sched_op 29
#define __HYPERVISOR_event_channel_op 32
@@ -148,6 +154,7 @@ static void guest_wait_for_irq(void)
static void guest_code(void)
{
struct vcpu_runstate_info *rs = (void *)RUNSTATE_VADDR;
+ int i;

__asm__ __volatile__(
"sti\n"
@@ -325,6 +332,49 @@ static void guest_code(void)
guest_wait_for_irq();

GUEST_SYNC(21);
+ /* Racing host ioctls */
+
+ guest_wait_for_irq();
+
+ GUEST_SYNC(22);
+ /* Racing vmcall against host ioctl */
+
+ ports[0] = 0;
+
+ p = (struct sched_poll) {
+ .ports = ports,
+ .nr_ports = 1,
+ .timeout = 0
+ };
+
+wait_for_timer:
+ /*
+ * Poll for a timer wake event while the worker thread is mucking with
+ * the shared info. KVM XEN drops timer IRQs if the shared info is
+ * invalid when the timer expires. Arbitrarily poll 100 times before
+ * giving up and asking the VMM to re-arm the timer. 100 polls should
+ * consume enough time to beat on KVM without taking too long if the
+ * timer IRQ is dropped due to an invalid event channel.
+ */
+ for (i = 0; i < 100 && !guest_saw_irq; i++)
+ asm volatile("vmcall"
+ : "=a" (rax)
+ : "a" (__HYPERVISOR_sched_op),
+ "D" (SCHEDOP_poll),
+ "S" (&p)
+ : "memory");
+
+ /*
+ * Re-send the timer IRQ if it was (likely) dropped due to the timer
+ * expiring while the event channel was invalid.
+ */
+ if (!guest_saw_irq) {
+ GUEST_SYNC(23);
+ goto wait_for_timer;
+ }
+ guest_saw_irq = false;
+
+ GUEST_SYNC(24);
}

static int cmp_timespec(struct timespec *a, struct timespec *b)
@@ -352,11 +402,36 @@ static void handle_alrm(int sig)
TEST_FAIL("IRQ delivery timed out");
}

+static void *juggle_shinfo_state(void *arg)
+{
+ struct kvm_vm *vm = (struct kvm_vm *)arg;
+
+ struct kvm_xen_hvm_attr cache_init = {
+ .type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
+ .u.shared_info.gfn = SHINFO_REGION_GPA / PAGE_SIZE
+ };
+
+ struct kvm_xen_hvm_attr cache_destroy = {
+ .type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
+ .u.shared_info.gfn = GPA_INVALID
+ };
+
+ for (;;) {
+ __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_init);
+ __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_destroy);
+ pthread_testcancel();
+ };
+
+ return NULL;
+}
+
int main(int argc, char *argv[])
{
struct timespec min_ts, max_ts, vm_ts;
struct kvm_vm *vm;
+ pthread_t thread;
bool verbose;
+ int ret;

verbose = argc > 1 && (!strncmp(argv[1], "-v", 3) ||
!strncmp(argv[1], "--verbose", 10));
@@ -785,6 +860,71 @@ int main(int argc, char *argv[])
case 21:
TEST_ASSERT(!evtchn_irq_expected,
"Expected event channel IRQ but it didn't happen");
+ alarm(0);
+
+ if (verbose)
+ printf("Testing shinfo lock corruption (KVM_XEN_HVM_EVTCHN_SEND)\n");
+
+ ret = pthread_create(&thread, NULL, &juggle_shinfo_state, (void *)vm);
+ TEST_ASSERT(ret == 0, "pthread_create() failed: %s", strerror(ret));
+
+ struct kvm_irq_routing_xen_evtchn uxe = {
+ .port = 1,
+ .vcpu = vcpu->id,
+ .priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL
+ };
+
+ evtchn_irq_expected = true;
+ for (time_t t = time(NULL) + SHINFO_RACE_TIMEOUT; time(NULL) < t;)
+ __vm_ioctl(vm, KVM_XEN_HVM_EVTCHN_SEND, &uxe);
+ break;
+
+ case 22:
+ TEST_ASSERT(!evtchn_irq_expected,
+ "Expected event channel IRQ but it didn't happen");
+
+ if (verbose)
+ printf("Testing shinfo lock corruption (SCHEDOP_poll)\n");
+
+ shinfo->evtchn_pending[0] = 1;
+
+ evtchn_irq_expected = true;
+ tmr.u.timer.expires_ns = rs->state_entry_time +
+ SHINFO_RACE_TIMEOUT * 1000000000ULL;
+ vcpu_ioctl(vcpu, KVM_XEN_VCPU_SET_ATTR, &tmr);
+ break;
+
+ case 23:
+ /*
+ * Optional and possibly repeated sync point.
+ * Injecting the timer IRQ may fail if the
+ * shinfo is invalid when the timer expires.
+ * If the timer has expired but the IRQ hasn't
+ * been delivered, rearm the timer and retry.
+ */
+ vcpu_ioctl(vcpu, KVM_XEN_VCPU_GET_ATTR, &tmr);
+
+ /* Resume the guest if the timer is still pending. */
+ if (tmr.u.timer.expires_ns)
+ break;
+
+ /* All done if the IRQ was delivered. */
+ if (!evtchn_irq_expected)
+ break;
+
+ tmr.u.timer.expires_ns = rs->state_entry_time +
+ SHINFO_RACE_TIMEOUT * 1000000000ULL;
+ vcpu_ioctl(vcpu, KVM_XEN_VCPU_SET_ATTR, &tmr);
+ break;
+ case 24:
+ TEST_ASSERT(!evtchn_irq_expected,
+ "Expected event channel IRQ but it didn't happen");
+
+ ret = pthread_cancel(thread);
+ TEST_ASSERT(ret == 0, "pthread_cancel() failed: %s", strerror(ret));
+
+ ret = pthread_join(thread, 0);
+ TEST_ASSERT(ret == 0, "pthread_join() failed: %s", strerror(ret));
goto done;

case 0x20:
--
2.38.0.413.g74048e4d9e-goog

2022-10-27 12:11:15

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2 03/16] KVM: x86: Always use non-compat vcpu_runstate_info size for gfn=>pfn cache

On 10/13/22 23:12, Sean Christopherson wrote:
> Always use the size of Xen's non-compat vcpu_runstate_info struct when
> checking that the GPA+size doesn't cross a page boundary. Conceptually,
> using the current mode is more correct, but KVM isn't consistent with
> itself as kvm_xen_vcpu_set_attr() unconditionally uses the "full" size
> when activating the cache. More importantly, prior to the introduction
> of the gfn_to_pfn_cache, KVM _always_ used the full size, i.e. allowing
> the guest (userspace?) to use a poorly aligned GPA in 32-bit mode but not
> 64-bit mode is more of a bug than a feature, and fixing the bug doesn't
> break KVM's historical ABI.

I'd rather not introduce additional restrictions in KVM, mostly because
it's actually easy to avoid this patch by instead enforcing that
attributes are set in a sensible order:

- long mode cannot be changed after the shared info page is enabled
(which makes sense because the shared info page also has a compat version)

- the caches must be activated after the shared info page (which
enforces that the vCPU attributes are set after the VM attributes)

This is technically a userspace API break, but nobody is really using
this API outside Amazon so... Patches coming after I finish testing.

Paolo


> Always using the non-compat size will allow for future gfn_to_pfn_cache
> cleanups as this is (was) the only case where KVM uses a different size at
> check()+refresh() than at activate(). E.g. the length/size of the cache
> can be made immutable and dropped from check()+refresh(), which yields a
> cleaner set of APIs and avoids potential bugs that could occur if check()
> were invoked with a different size than refresh().
>
> Fixes: a795cd43c5b5 ("KVM: x86/xen: Use gfn_to_pfn_cache for runstate area")
> Cc: David Woodhouse <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/xen.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index b2be60c6efa4..9e79ef2cca99 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -212,10 +212,7 @@ void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, int state)
> if (!vx->runstate_cache.active)
> return;
>
> - if (IS_ENABLED(CONFIG_64BIT) && v->kvm->arch.xen.long_mode)
> - user_len = sizeof(struct vcpu_runstate_info);
> - else
> - user_len = sizeof(struct compat_vcpu_runstate_info);
> + user_len = sizeof(struct vcpu_runstate_info);
>
> read_lock_irqsave(&gpc->lock, flags);
> while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa,


2022-10-27 14:59:27

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 03/16] KVM: x86: Always use non-compat vcpu_runstate_info size for gfn=>pfn cache

On Thu, Oct 27, 2022, Paolo Bonzini wrote:
> On 10/13/22 23:12, Sean Christopherson wrote:
> > Always use the size of Xen's non-compat vcpu_runstate_info struct when
> > checking that the GPA+size doesn't cross a page boundary. Conceptually,
> > using the current mode is more correct, but KVM isn't consistent with
> > itself as kvm_xen_vcpu_set_attr() unconditionally uses the "full" size
> > when activating the cache. More importantly, prior to the introduction
> > of the gfn_to_pfn_cache, KVM _always_ used the full size, i.e. allowing
> > the guest (userspace?) to use a poorly aligned GPA in 32-bit mode but not
> > 64-bit mode is more of a bug than a feature, and fixing the bug doesn't
> > break KVM's historical ABI.
>
> I'd rather not introduce additional restrictions in KVM,

But KVM already has this restriction. "struct vcpu_info" is always checked for
the non-compat size, and as above, "struct vcpu_runstate_info" is checked for the
non-compat size during its initialization.

> actually easy to avoid this patch by instead enforcing that attributes are
> set in a sensible order:

I don't care about fixing XEN support, I care about forcing "length" to be immutable
in order to simplify the gfn_to_pfn_cache implementation. Avoiding this patch
prevents doing that later in this series.

> - long mode cannot be changed after the shared info page is enabled (which
> makes sense because the shared info page also has a compat version)

How is this not introducing an additional restriction? This seems way more
onerous than what is effectively a revert.

> - the caches must be activated after the shared info page (which enforces
> that the vCPU attributes are set after the VM attributes)
>
> This is technically a userspace API break, but nobody is really using this
> API outside Amazon so... Patches coming after I finish testing.

It's not just a userspace break, it affects the guest ABI as well. arch.xen.long_mode
isn't set just by userspace, it's also snapshotted when the guest changes the hypercall
page. Maybe there's something in the XEN ABI that says the hypercall page can never
be changed, but barring that I don't see how to prevent ending up with a misaligned
cache due to the guest enabling long mode.

int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
{
struct kvm *kvm = vcpu->kvm;
u32 page_num = data & ~PAGE_MASK;
u64 page_addr = data & PAGE_MASK;
bool lm = is_long_mode(vcpu);

/* Latch long_mode for shared_info pages etc. */
vcpu->kvm->arch.xen.long_mode = lm;

2022-10-27 15:09:12

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2 03/16] KVM: x86: Always use non-compat vcpu_runstate_info size for gfn=>pfn cache

On 10/27/22 16:44, Sean Christopherson wrote:
>> - long mode cannot be changed after the shared info page is enabled (which
>> makes sense because the shared info page also has a compat version)
>
> How is this not introducing an additional restriction? This seems way more
> onerous than what is effectively a revert.
>
>> - the caches must be activated after the shared info page (which enforces
>> that the vCPU attributes are set after the VM attributes)
>>
>> This is technically a userspace API break, but nobody is really using this
>> API outside Amazon so... Patches coming after I finish testing.
>
> It's not just a userspace break, it affects the guest ABI as well.

Yes, I was talking of the VMM here; additional restrictions are fine there.

The guests however should be compatible with Xen, so you also need to
re-activate the cache after the hypercall page is written, but that's
two lines of code.

Paolo


2022-10-27 15:50:16

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 03/16] KVM: x86: Always use non-compat vcpu_runstate_info size for gfn=>pfn cache

On Thu, Oct 27, 2022, Paolo Bonzini wrote:
> On 10/27/22 16:44, Sean Christopherson wrote:
> > > - long mode cannot be changed after the shared info page is enabled (which
> > > makes sense because the shared info page also has a compat version)
> >
> > How is this not introducing an additional restriction? This seems way more
> > onerous than what is effectively a revert.
> >
> > > - the caches must be activated after the shared info page (which enforces
> > > that the vCPU attributes are set after the VM attributes)
> > >
> > > This is technically a userspace API break, but nobody is really using this
> > > API outside Amazon so... Patches coming after I finish testing.
> >
> > It's not just a userspace break, it affects the guest ABI as well.
>
> Yes, I was talking of the VMM here; additional restrictions are fine there.

Additional restrictions are fine where?

> The guests however should be compatible with Xen, so you also need to
> re-activate the cache after the hypercall page is written, but that's two
> lines of code.

And do what if the guest transitions from 32-bit => 64-bit and the cache isn't
aligned for 64-bit? E.g. kvm_xen_set_evtchn() will silently drop events no matter
what KVM does. In other words, I don't see how KVM can provide a sane ABI without
forcing the cached pages to be aligned for the largest possible size.

2022-10-27 17:48:09

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 03/16] KVM: x86: Always use non-compat vcpu_runstate_info size for gfn=>pfn cache

On Thu, Oct 27, 2022, Sean Christopherson wrote:
> On Thu, Oct 27, 2022, Paolo Bonzini wrote:
> > On 10/13/22 23:12, Sean Christopherson wrote:
> > > Always use the size of Xen's non-compat vcpu_runstate_info struct when
> > > checking that the GPA+size doesn't cross a page boundary. Conceptually,
> > > using the current mode is more correct, but KVM isn't consistent with
> > > itself as kvm_xen_vcpu_set_attr() unconditionally uses the "full" size
> > > when activating the cache. More importantly, prior to the introduction
> > > of the gfn_to_pfn_cache, KVM _always_ used the full size, i.e. allowing
> > > the guest (userspace?) to use a poorly aligned GPA in 32-bit mode but not
> > > 64-bit mode is more of a bug than a feature, and fixing the bug doesn't
> > > break KVM's historical ABI.
> >
> > I'd rather not introduce additional restrictions in KVM,
>
> But KVM already has this restriction. "struct vcpu_info" is always checked for
> the non-compat size, and as above, "struct vcpu_runstate_info" is checked for the
> non-compat size during its initialization.

Ah, I forgot those are the same size:

BUILD_BUG_ON(sizeof(struct vcpu_info) !=
sizeof(struct compat_vcpu_info));

2022-11-21 14:57:00

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v2 07/16] KVM: Store gfn_to_pfn_cache length as an immutable property

On Thu, 2022-10-13 at 21:12 +0000, Sean Christopherson wrote:
> From: Michal Luczaj <[email protected]>
>
> Make the length of a gfn=>pfn cache an immutable property of the cache
> to cleanup the APIs and avoid potential bugs, e.g calling check() with a
> larger size than refresh() could put KVM into an infinite loop.

Hm, that's a strange hypothetical bug to be worried about, given the
pattern is usually to have the check() and refresh() within a few lines
of each other with just atomicity/locking stuff in between them.

I won't fight for it, but I quite liked the idea that each user of a
GPC would know how much space *it* is going to access, and provide that
length as a required parameter. I do note you've added a WARN_ON to one
such user, and that's great — but overall, this patch makes that
checking *optional* instead of mandatory.

> All current (and anticipated future) users access the cache with a
> predetermined size, which isn't a coincidence as using a dedicated cache
> really only make sense when the access pattern is "fixed".

In fixing up the runstate area, I've made that not true. Not only does
the runstate area change size at runtime if the guest changes between
32-bit and 64-bit mode, but it also now uses *two* GPCs to cope with a
region that crosses a page boundary, and the size of the first
therefore changes according to how much fits on the tail of the page.
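
(Illustratively, and not the exact math from the runstate patches: the
length of the first cache in such a split is simply however much of the
area fits on the tail of its page, e.g.

	len_1 = min(user_len, PAGE_SIZE - (gpa & ~PAGE_MASK));
	len_2 = user_len - len_1;	/* 0 if it all fits on one page */

with len_1/len_2 being hypothetical names, so the first length really
does vary at runtime.)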

> Add a WARN in kvm_setup_guest_pvclock() to assert that the offset+size
> matches the length of the cache, both to make it more obvious that the
> length really is immutable in that case, and to detect future bugs.
...
> @@ -3031,13 +3030,13 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> struct pvclock_vcpu_time_info *guest_hv_clock;
> unsigned long flags;
>
> + WARN_ON_ONCE(gpc->len != offset + sizeof(*guest_hv_clock));
> +

That ought to be 'gpc->len < offset + sizeof(*guest_hv_clock)' I think?

In the case where we are writing a clock *within* a mapped Xen
vcpu_info structure, it doesn't have to be at the *end* of that
structure. I think the xen_shinfo_test should have caught that?




2022-11-21 19:20:18

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 07/16] KVM: Store gfn_to_pfn_cache length as an immutable property

On Mon, Nov 21, 2022, David Woodhouse wrote:
> On Thu, 2022-10-13 at 21:12 +0000, Sean Christopherson wrote:
> > From: Michal Luczaj <[email protected]>
> >
> > Make the length of a gfn=>pfn cache an immutable property of the cache
> > to cleanup the APIs and avoid potential bugs, e.g calling check() with a
> > larger size than refresh() could put KVM into an infinite loop.
>
> Hm, that's a strange hypothetical bug to be worried about, given the
> pattern is usually to have the check() and refresh() within a few lines
> of each other with just atomicity/locking stuff in between them.

Why do you say it's strange to be worried about? The GPC and Xen code is all quite
complex and has had multiple bugs, several of which are not exactly edge cases.
I don't think it's at all strange to want to make it difficult to introduce a bug
that would in many ways be worse than panicking the kernel.

But as Paolo said, the APIs themselves are to blame[*], check() and refresh()
shouldn't be split for the common case, i.e. this particular concern should largely
be a non-issue in the long run.

[*] https://lore.kernel.org/all/[email protected]

> I won't fight for it, but I quite liked the idea that each user of a
> GPC would know how much space *it* is going to access, and provide that
> length as a required parameter. I do note you've added a WARN_ON to one
> such user, and that's great — but overall, this patch makes that
> checking *optional* instead of mandatory.

I honestly don't see a meaningful difference in this case. The only practical
difference is that shoving @len into the cache makes the check a one-time thing.
The "mandatory" check at use time still relies on a human to not make a mistake.
If the check were derived from the actual access, a la get_user(), then I would
feel differently.
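
To illustrate the get_user() analogy, a purely hypothetical accessor
(this is not an existing KVM API) would derive the length from the type
of the destination, so the check can't drift from the actual access:

#define kvm_gpc_read(gpc, dst, offset)					\
({									\
	int __ret = 0;							\
	if (WARN_ON_ONCE((offset) + sizeof(*(dst)) > (gpc)->len))	\
		__ret = -EINVAL;					\
	else								\
		memcpy((dst), (gpc)->khva + (offset), sizeof(*(dst)));	\
	__ret;								\
})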

Case in point, the mandatory check didn't prevent KVM from screwing up bounds
checking in kvm_xen_schedop_poll(). The PAGE_SIZE passed in for @len is in no
way tied to actual access that's being performed, the code is simply regurgitating
the size of the cache.

> > All current (and anticipated future) users access the cache with a
> > predetermined size, which isn't a coincidence as using a dedicated cache
> > really only make sense when the access pattern is "fixed".
>
> In fixing up the runstate area, I've made that not true. Not only does
> the runstate area change size at runtime if the guest changes between
> 32-bit and 64-bit mode, but it also now uses *two* GPCs to cope with a
> region that crosses a page boundary, and the size of the first
> therefore changes according to how much fits on the tail of the page.
>
> > Add a WARN in kvm_setup_guest_pvclock() to assert that the offset+size
> > matches the length of the cache, both to make it more obvious that the
> > length really is immutable in that case, and to detect future bugs.
> ...
> > @@ -3031,13 +3030,13 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> > struct pvclock_vcpu_time_info *guest_hv_clock;
> > unsigned long flags;
> >
> > + WARN_ON_ONCE(gpc->len != offset + sizeof(*guest_hv_clock));
> > +
>
> That ought to be 'gpc->len < offset + sizeof(*guest_hv_clock)' I think?
>
> In the case where we are writing a clock *within* a mapped Xen
> vcpu_info structure, it doesn't have to be at the *end* of that
> structure. I think the xen_shinfo_test should have caught that?

The WARN doesn't trigger a false positive because "struct pvclock_vcpu_time_info" is
placed at the end of both "struct vcpu_info" and "struct compat_vcpu_info".

I don't have a strong opinion on whether it's "!=" or "<", my goal in adding the
WARN was primarily to show that @len really is immutable in this case. Guarding
against future overrun bugs was a bonus.

2022-11-21 20:41:08

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v2 07/16] KVM: Store gfn_to_pfn_cache length as an immutable property

On Mon, 2022-11-21 at 19:11 +0000, Sean Christopherson wrote:
> On Mon, Nov 21, 2022, David Woodhouse wrote:
> > On Thu, 2022-10-13 at 21:12 +0000, Sean Christopherson wrote:
> > > From: Michal Luczaj <[email protected]>
> > >
> > > Make the length of a gfn=>pfn cache an immutable property of the cache
> > > to cleanup the APIs and avoid potential bugs, e.g calling check() with a
> > > larger size than refresh() could put KVM into an infinite loop.
> >
> > Hm, that's a strange hypothetical bug to be worried about, given the
> > pattern is usually to have the check() and refresh() within a few lines
> > of each other with just atomicity/locking stuff in between them.
>
> Why do you say it's strange to be worried about? The GPC and Xen code is all quite
> complex and has had multiple bugs, several of which are not exactly edge cases.
> I don't think it's at all strange to want to make it difficult to introduce a bug
> that would in many ways be worse than panicking the kernel.

The check() and refresh() calls are within a few lines of each other,
and it'd be really strange for them to have a *different* idea about
what the length is, surely?

> But as Paolo said, the APIs themselves are to blame[*], check() and refresh()
> shouldn't be split for the common case, i.e. this particular concern should largely
> be a non-issue in the long run.
>
> [*] https://lore.kernel.org/all/[email protected]

Yeah. As I said to Paul, I've been tempted by that. I've so far not
done it because although they look broadly similar, a bunch of the
sites do end up with *different* code between the check() and the
refresh(), for various locking and atomicity reasons.
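
For reference, the pattern in question (condensed from the call sites in
the diffs above, not a verbatim copy of any one of them) is roughly:

	read_lock_irqsave(&gpc->lock, flags);
	while (!kvm_gpc_check(gpc)) {
		read_unlock_irqrestore(&gpc->lock, flags);

		if (kvm_gpc_refresh(gpc))
			return;		/* error handling varies per site */

		read_lock_irqsave(&gpc->lock, flags);
	}

	/* ... access the mapping via gpc->khva ... */

	read_unlock_irqrestore(&gpc->lock, flags);

and it's the bits in and around that loop that differ between users.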

> > I won't fight for it, but I quite liked the idea that each user of a
> > GPC would know how much space *it* is going to access, and provide that
> > length as a required parameter. I do note you've added a WARN_ON to one
> > such user, and that's great — but overall, this patch makes that
> > checking *optional* instead of mandatory.
>
> I honestly don't see a meaningful difference in this case. The only practical
> difference is that shoving @len into the cache makes the check a one-time thing.
> The "mandatory" check at use time still relies on a human to not make a mistake.
> If the check were derived from the actual access, a la get_user(), then I would
> feel differently.
>
> Case in point, the mandatory check didn't prevent KVM from screwing up bounds
> checking in kvm_xen_schedop_poll(). The PAGE_SIZE passed in for @len is in no
> way tied to actual access that's being performed, the code is simply regurgitating
> the size of the cache.

True, but that's a different class of bug, and the human needs to make
a more *egregious* mistake.

If the function itself writes outside the size that *it* thinks *it's*
going to write, right there and then in that function, that's utterly
hosed (and the SCHEDOP_poll thing was indeed so hosed).

The mandatory check *did* save us from configuring a 32-bit runstate
area at the end of a page, then *writing* to it in 64-bit mode (where
it's larger) and running off the end of the page.

It saved us from "knowing", a few seconds ago under different
circumstances, what the size of the runstate area was... and then it
actually being different when it's written.

But that's not the common case, so again, I won't fight for it.

I've reworked the unapplied parts of this series on top of the poll and
runstate fixes in my tree, *except* for this one making the length
immutable, and I'm running some tests.

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/gpc-fixes

I'm happy to reinstate the immutable length thing in some form on top.

Given that the runstate code already calculates for itself how many
bytes it can fit onto the first page, it really doesn't care about the
length field in the GPC. As a nasty hack, the runstate code could
probably even get away with setting 'len' to zero. That's kind of
awful, but maybe we could introduce a __kvm_gpc_activate() which does
take a new length, leaving kvm_gpc_activate() without it?
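
A minimal sketch of that shape (hypothetical, not from any posted patch)
would be an inner helper that takes the length, with the existing
kvm_gpc_activate() keeping whatever was set at init:

int __kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
		       unsigned long len)
{
	if ((gpa & ~PAGE_MASK) + len > PAGE_SIZE)
		return -EINVAL;

	gpc->len = len;
	return kvm_gpc_activate(gpc, gpa);
}

so only the runstate code would ever pass an explicit length.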

> > > All current (and anticipated future) users access the cache with a
> > > predetermined size, which isn't a coincidence as using a dedicated cache
> > > really only make sense when the access pattern is "fixed".
> >
> > In fixing up the runstate area, I've made that not true. Not only does
> > the runstate area change size at runtime if the guest changes between
> > 32-bit and 64-bit mode, but it also now uses *two* GPCs to cope with a
> > region that crosses a page boundary, and the size of the first
> > therefore changes according to how much fits on the tail of the page.
> >
> > > Add a WARN in kvm_setup_guest_pvclock() to assert that the offset+size
> > > matches the length of the cache, both to make it more obvious that the
> > > length really is immutable in that case, and to detect future bugs.
> >
> > ...
> > > @@ -3031,13 +3030,13 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
> > > struct pvclock_vcpu_time_info *guest_hv_clock;
> > > unsigned long flags;
> > >
> > > + WARN_ON_ONCE(gpc->len != offset + sizeof(*guest_hv_clock));
> > > +
> >
> > That ought to be 'gpc->len < offset + sizeof(*guest_hv_clock)' I think?
> >
> > In the case where we are writing a clock *within* a mapped Xen
> > vcpu_info structure, it doesn't have to be at the *end* of that
> > structure. I think the xen_shinfo_test should have caught that?
>
> The WARN doesn't get false positive because "struct pvclock_vcpu_time_info" is
> placed at the end of "struct vcpu_info" and "struct compat_vcpu_info".
>
> I don't have a strong opinion on whether it's "!=" or "<", my goal in adding the
> WARN was primarily to show that @len really is immutable in this case. Guarding
> against future overrun bugs was a bonus.

Ah right, I think I was looking at the pvclock_wall_clock field in the
shared_info, not the time field in the vcpu_info.



2022-11-22 19:32:14

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 07/16] KVM: Store gfn_to_pfn_cache length as an immutable property

On Mon, Nov 21, 2022, David Woodhouse wrote:
> On Mon, 2022-11-21 at 19:11 +0000, Sean Christopherson wrote:
> > On Mon, Nov 21, 2022, David Woodhouse wrote:
> > > I won't fight for it, but I quite liked the idea that each user of a
> > > GPC would know how much space *it* is going to access, and provide that
> > > length as a required parameter. I do note you've added a WARN_ON to one
> > > such user, and that's great — but overall, this patch makes that
> > > checking *optional* instead of mandatory.
> >
> > I honestly don't see a meaningful difference in this case. The only practical
> > difference is that shoving @len into the cache makes the check a one-time thing.
> > The "mandatory" check at use time still relies on a human to not make a mistake.
> > If the check were derived from the actual access, a la get_user(), then I would
> > feel differently.
> >
> > Case in point, the mandatory check didn't prevent KVM from screwing up bounds
> > checking in kvm_xen_schedop_poll(). The PAGE_SIZE passed in for @len is in no
> > way tied to actual access that's being performed, the code is simply regurgitating
> > the size of the cache.
>
> True, but that's a different class of bug, and the human needs to make
> a more *egregious* mistake.
>
> If the function itself writes outside the size that *it* thinks *it's*
> going to write, right there and then in that function, that's utterly
> hosed (and the SCHEDOP_poll thing was indeed so hosed).

Yes, such mistakes are more egregious in the sense they are harder to find and
have more severe consequences, but I don't think the mistakes are necessarily
harder to make. Bugs in simple usage patterns are easy to spot, and at the same
time those patterns are also less likely to be buggy because they're simpler.

> The mandatory check *did* save us from configuring a 32-bit runstate
> area at the end of a page, then *writing* to it in 64-bit mode (where
> it's larger) and running off the end of the page.

Only because the length/capacity wasn't immutable, i.e. that particular bug couldn't
have been introduced in the first place if kvm_gpc_activate() were the only "public"
API that allowed "changing" the length.

That's really what I dislike. I have no objection to adding a sanity check, what
I think is broken and dangerous is allowing a gpc->gpa to effectively become valid
by refreshing with a smaller length.
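
Concretely (the placement below is illustrative): a compat-sized
runstate area at the very end of a page passes the boundary check when
it's refreshed with the compat size,

	gpa = page_base + PAGE_SIZE - sizeof(struct compat_vcpu_runstate_info);
	kvm_gpc_refresh(kvm, gpc, gpa, sizeof(struct compat_vcpu_runstate_info));

but a later 64-bit-mode update that assumes the full vcpu_runstate_info
size writes past the end of the page even though the cache is "valid".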

The gfn_to_hva_cache APIs have the same problem, but they get away with it because
they don't support concurrent usage and don't have to deal with invalidation events.

Lastly, if we keep "length" then we also need to keep "gpa", otherwise the resulting
API is all kinds of funky.

E.g. I'd be totally ok with something like this that would allow users to opt-in
to sanity checking their usage.

int __kvm_gpc_lock(struct gfn_to_pfn_cache *gpc)
{
int r;

read_lock_irqsave(&gpc->lock, gpc->flags);

while (kvm_gpc_check(gpc)) {
read_unlock_irqrestore(&gpc->lock, gpc->flags);

r = kvm_gpc_refresh(gpc);
if (r)
return r;

read_lock_irqsave(&gpc->lock, gpc->flags);
}

return 0;
}

int kvm_gpc_lock(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
{
	if (WARN_ON_ONCE(gpa < gpc->gpa ||
			 ((gpa & ~PAGE_MASK) + len > PAGE_SIZE) ||
			 ((gpa & PAGE_MASK) != (gpc->gpa & PAGE_MASK))))
return -EINVAL;

return __kvm_gpc_lock(gpc);
}
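
A caller would then presumably look something like this (again untested,
just to show the shape):

	if (kvm_gpc_lock(gpc, gpc->gpa, sizeof(struct vcpu_info)))
		return;

	/* ... access gpc->khva while holding gpc->lock ... */

	read_unlock_irqrestore(&gpc->lock, gpc->flags);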

2022-12-02 10:38:10

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH v2 05/16] KVM: x86: Remove unused argument in gpc_unmap_khva()

On 14/10/2022 5:12 am, Sean Christopherson wrote:
> Remove the unused @kvm argument from gpc_unmap_khva().

Nit: the caller kvm_gpc_unmap() can also get rid of the unused @kvm argument.

2022-12-02 13:10:40

by Michal Luczaj

[permalink] [raw]
Subject: Re: [PATCH v2 05/16] KVM: x86: Remove unused argument in gpc_unmap_khva()

On 12/2/22 10:28, Like Xu wrote:
> On 14/10/2022 5:12 am, Sean Christopherson wrote:
>> Remove the unused @kvm argument from gpc_unmap_khva().
>
> Nit: the caller kvm_gpc_unmap() can also get rid of the unused @kvm argument.

Right, the initial series cleaned up kvm_gpc_unmap() in a separate patch.
Current iteration removes kvm_gpc_unmap() later in the series:
https://lore.kernel.org/kvm/[email protected]/

Michal

2022-12-02 14:30:32

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v2 05/16] KVM: x86: Remove unused argument in gpc_unmap_khva()

On Fri, 2022-12-02 at 11:57 +0100, Michal Luczaj wrote:
> On 12/2/22 10:28, Like Xu wrote:
> > On 14/10/2022 5:12 am, Sean Christopherson wrote:
> > > Remove the unused @kvm argument from gpc_unmap_khva().
> >
> > Nit: the caller kvm_gpc_unmap() can also get rid of the unused @kvm argument.
>
> Right, the initial series cleaned up kvm_gpc_unmap() in a separate patch.
> Current iteration removes kvm_gpc_unmap() later in the series:
> https://lore.kernel.org/kvm/[email protected]/

I have been keeping that series up to date in
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/gpc-fixes

Now that the dust has settled on the Xen runstate area, I may post it
as v3 of the series.

Or I may attempt to resolve the gpc->len immutability thing first. I'm
still not really convinced Sean has won me round on that; I'm still
quite attached to the TOCTOU benefit of checking the length right there
at the moment you're going to use the pointer — especially given that
it *doesn't* have bounds checks like get_user() does, as Sean points
out.



2022-12-02 17:21:30

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 05/16] KVM: x86: Remove unused argument in gpc_unmap_khva()

On Fri, Dec 02, 2022, David Woodhouse wrote:
> On Fri, 2022-12-02 at 11:57 +0100, Michal Luczaj wrote:
> > On 12/2/22 10:28, Like Xu wrote:
> > > On 14/10/2022 5:12 am, Sean Christopherson wrote:
> > > > Remove the unused @kvm argument from gpc_unmap_khva().
> > >
> > > Nit: the caller kvm_gpc_unmap() can also get rid of the unused @kvm argument.
> >
> > Right, the initial series cleaned up kvm_gpc_unmap() in a separate patch.
> > Current iteration removes kvm_gpc_unmap() later in the series:
> > https://lore.kernel.org/kvm/[email protected]/
>
> I have been keeping that series up to date in
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/gpc-fixes
>
> Now that the dust has settled on the Xen runstate area, I may post it
> as v3 of the series.
>
> Or I may attempt to resolve the gpc->len immutability thing first. I'm
> still not really convinced Sean has won me round on that;

Ya, I agree that storing "len" is undesirable, but so is storing "gpa" instead of
"gfn".

> I'm still quite attached to the TOCTOU benefit of checking the length right
> there at the moment you're going to use the pointer — especially given that
> it *doesn't* have bounds checks like get_user() does, as Sean points out.

I'm in favor of keeping the length checks if we modify the cache to store the
gfn, not the gpa, and require the gpa (or maybe just offset?) in the "get a kernel
pointer" API.

So, how about this for a set of APIs? Obviously not tested whatsoever, but I
think they address the Xen use cases, and will fit the nested virt cases too
(which want to stuff a pfn into a VMCS/VMCB).

void *kvm_gpc_get_kmap(struct gfn_to_pfn_cache *gpc, gpa_t offset,
unsigned long len, bool atomic)
{
<lock + refresh>

return gpc->khva + offset;
}
EXPORT_SYMBOL_GPL(kvm_gpc_get_kmap);

kvm_pfn_t kvm_gpc_get_pfn(struct gfn_to_pfn_cache *gpc, bool atomic)
{
<lock + refresh of full page>

return gpc->pfn;
}
EXPORT_SYMBOL_GPL(kvm_gpc_get_pfn);

void kvm_gpc_put(struct gfn_to_pfn_cache *gpc)
{
<unlock>
}
EXPORT_SYMBOL_GPL(kvm_gpc_put);

int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc)
{
return __kvm_gpc_refresh(gpc, gfn_to_gpa(gpc->gfn), PAGE_SIZE);
}
EXPORT_SYMBOL_GPL(kvm_gpc_refresh);
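
E.g. a runstate-style user would become roughly this (purely illustrative;
"runstate_off" is whatever page offset the caller tracks, "now" is a local,
and this assumes get_kmap() reports a failed refresh via ERR_PTR()):

	struct vcpu_runstate_info *rs;

	rs = kvm_gpc_get_kmap(gpc, runstate_off, sizeof(*rs), false);
	if (IS_ERR(rs))
		return PTR_ERR(rs);

	/* Write directly through the cached kernel mapping. */
	rs->state = RUNSTATE_running;
	rs->state_entry_time = now;

	kvm_gpc_put(gpc);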


And then __kvm_gpc_refresh() would do something like:

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 2d6aba677830..b2dd2eda4b56 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -236,22 +236,19 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
return -EFAULT;
}

-static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
+static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
unsigned long len)
{
struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
- unsigned long page_offset = gpa & ~PAGE_MASK;
bool unmap_old = false;
unsigned long old_uhva;
kvm_pfn_t old_pfn;
void *old_khva;
+ gfn_t gfn;
int ret;

- /*
- * If must fit within a single page. The 'len' argument is
- * only to enforce that.
- */
- if (page_offset + len > PAGE_SIZE)
+ /* An individual cache doesn't support page splits. */
+ if ((gpa & ~PAGE_MASK) + len > PAGE_SIZE)
return -EINVAL;

/*
@@ -268,16 +265,16 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
goto out_unlock;
}

+ gfn = gpa_to_gfn(gpa);
+
old_pfn = gpc->pfn;
- old_khva = gpc->khva - offset_in_page(gpc->khva);
+ old_khva = gpc->khva;
old_uhva = gpc->uhva;

/* If the userspace HVA is invalid, refresh that first */
- if (gpc->gpa != gpa || gpc->generation != slots->generation ||
+ if (gpc->gfn != gfn || gpc->generation != slots->generation ||
kvm_is_error_hva(gpc->uhva)) {
- gfn_t gfn = gpa_to_gfn(gpa);
-
- gpc->gpa = gpa;
+ gpc->gfn = gfn;
gpc->generation = slots->generation;
gpc->memslot = __gfn_to_memslot(slots, gfn);
gpc->uhva = gfn_to_hva_memslot(gpc->memslot, gfn);
@@ -295,12 +292,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
if (!gpc->valid || old_uhva != gpc->uhva) {
ret = hva_to_pfn_retry(gpc);
} else {
- /*
- * If the HVA→PFN mapping was already valid, don't unmap it.
- * But do update gpc->khva because the offset within the page
- * may have changed.
- */
- gpc->khva = old_khva + page_offset;
+ /* If the HVA→PFN mapping was already valid, don't unmap it. */
ret = 0;
goto out_unlock;
}