2022-08-11 21:25:23

by Paolo Bonzini

Subject: [PATCH v2 2/9] KVM: x86: remove return value of kvm_vcpu_block

The return value of kvm_vcpu_block will be repurposed soon to return
the state of KVM_REQ_UNHALT. In preparation for that, get rid of the
current return value. It is only used by kvm_vcpu_halt to decide whether
the call resulted in a wait, but the same effect can be obtained with
a single round of polling.

No functional change intended, apart from practically indistinguishable
changes to the polling behavior.

Signed-off-by: Paolo Bonzini <[email protected]>
---
 include/linux/kvm_host.h |  2 +-
 virt/kvm/kvm_main.c      | 45 +++++++++++++++++-----------------------
 2 files changed, 20 insertions(+), 27 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1c480b1821e1..e7bd48d15db8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1339,7 +1339,7 @@ void kvm_sigset_activate(struct kvm_vcpu *vcpu);
 void kvm_sigset_deactivate(struct kvm_vcpu *vcpu);
 
 void kvm_vcpu_halt(struct kvm_vcpu *vcpu);
-bool kvm_vcpu_block(struct kvm_vcpu *vcpu);
+void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
 bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 515dfe9d3bcf..1f049c1d01b4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3429,10 +3429,9 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
  * pending. This is mostly used when halting a vCPU, but may also be used
  * directly for other vCPU non-runnable states, e.g. x86's Wait-For-SIPI.
  */
-bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
+void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 {
 	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
-	bool waited = false;
 
 	vcpu->stat.generic.blocking = 1;
 
@@ -3447,7 +3446,6 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		if (kvm_vcpu_check_block(vcpu) < 0)
 			break;
 
-		waited = true;
 		schedule();
 	}
 
@@ -3457,8 +3455,6 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	preempt_enable();
 
 	vcpu->stat.generic.blocking = 0;
-
-	return waited;
 }
 
 static inline void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
@@ -3493,35 +3489,32 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 {
 	bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
 	bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
-	ktime_t start, cur, poll_end;
+	ktime_t start, cur, poll_end, stop;
 	bool waited = false;
 	u64 halt_ns;
 
 	start = cur = poll_end = ktime_get();
-	if (do_halt_poll) {
-		ktime_t stop = ktime_add_ns(start, vcpu->halt_poll_ns);
+	stop = do_halt_poll ? start : ktime_add_ns(start, vcpu->halt_poll_ns);
 
-		do {
-			/*
-			 * This sets KVM_REQ_UNHALT if an interrupt
-			 * arrives.
-			 */
-			if (kvm_vcpu_check_block(vcpu) < 0)
-				goto out;
-			cpu_relax();
-			poll_end = cur = ktime_get();
-		} while (kvm_vcpu_can_poll(cur, stop));
-	}
+	do {
+		/*
+		 * This sets KVM_REQ_UNHALT if an interrupt
+		 * arrives.
+		 */
+		if (kvm_vcpu_check_block(vcpu) < 0)
+			goto out;
+		cpu_relax();
+		poll_end = cur = ktime_get();
+	} while (kvm_vcpu_can_poll(cur, stop));
 
-	waited = kvm_vcpu_block(vcpu);
+	waited = true;
+	kvm_vcpu_block(vcpu);
 
 	cur = ktime_get();
-	if (waited) {
-		vcpu->stat.generic.halt_wait_ns +=
-			ktime_to_ns(cur) - ktime_to_ns(poll_end);
-		KVM_STATS_LOG_HIST_UPDATE(vcpu->stat.generic.halt_wait_hist,
-					  ktime_to_ns(cur) - ktime_to_ns(poll_end));
-	}
+	vcpu->stat.generic.halt_wait_ns +=
+		ktime_to_ns(cur) - ktime_to_ns(poll_end);
+	KVM_STATS_LOG_HIST_UPDATE(vcpu->stat.generic.halt_wait_hist,
+				  ktime_to_ns(cur) - ktime_to_ns(poll_end));
 out:
 	/* The total time the vCPU was "halted", including polling time. */
 	halt_ns = ktime_to_ns(cur) - ktime_to_ns(start);
--
2.31.1



2022-08-16 23:58:52

by Sean Christopherson

Subject: Re: [PATCH v2 2/9] KVM: x86: remove return value of kvm_vcpu_block

On Thu, Aug 11, 2022, Paolo Bonzini wrote:
> The return value of kvm_vcpu_block will be repurposed soon to return

kvm_vcpu_block()

> the state of KVM_REQ_UNHALT. In preparation for that, get rid of the
> current return value. It is only used by kvm_vcpu_halt to decide whether

kvm_vcpu_halt()

> the call resulted in a wait, but the same effect can be obtained with
> a single round of polling.
>
> No functional change intended, apart from practically indistinguishable
> changes to the polling behavior.

This "breaks" update_halt_poll_stats(). At the very least, it breaks the comment
that effectively says "waited" is set if and only if schedule() is called.

	/*
	 * Note, halt-polling is considered successful so long as the vCPU was
	 * never actually scheduled out, i.e. even if the wake event arrived
	 * after of the halt-polling loop itself, but before the full wait.
	 */
	if (do_halt_poll)
		update_halt_poll_stats(vcpu, start, poll_end, !waited);
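
Concretely, the patch replaces "waited = kvm_vcpu_block(vcpu);" with an
unconditional "waited = true;" ahead of the call, i.e. (a sketch of the two
versions pulled from the diff, not a compilable excerpt):

	/* Before: waited is true iff kvm_vcpu_block() actually schedule()d. */
	waited = kvm_vcpu_block(vcpu);

	/*
	 * After: waited is true whenever halt-polling fails, even if the wake
	 * event arrives before schedule() ever runs, so "!waited" above no
	 * longer means "the poll itself was successful".
	 */
	waited = true;
	kvm_vcpu_block(vcpu);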

> @@ -3493,35 +3489,32 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>  {
>  	bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
>  	bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
> -	ktime_t start, cur, poll_end;
> +	ktime_t start, cur, poll_end, stop;
>  	bool waited = false;
>  	u64 halt_ns;
> 
>  	start = cur = poll_end = ktime_get();
> -	if (do_halt_poll) {
> -		ktime_t stop = ktime_add_ns(start, vcpu->halt_poll_ns);
> +	stop = do_halt_poll ? start : ktime_add_ns(start, vcpu->halt_poll_ns);

This is inverted, the loop should terminate after the first iteration (stop==start)
if halt-polling is _not_ allowed, i.e. do_halt_poll is false.

Overall, I don't like this patch. I don't necessarily hate it, but I definitely
don't like it.

Isn't freeing up the return from kvm_vcpu_check_block() unnecessary? Can't we
just do:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9f11b505cbee..ccb9f8bdeb18 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10633,7 +10633,7 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
 		if (hv_timer)
 			kvm_lapic_switch_to_hv_timer(vcpu);
 
-		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
+		if (!kvm_arch_vcpu_runnable(vcpu))
 			return 1;
 	}


which IMO is more intuitive and doesn't require reworking halt-polling (again).

I don't see why KVM cares if a vCPU becomes runnable after detecting that something
else caused kvm_vcpu_check_block() to bail. E.g. a pending signal doesn't invalidate
a pending vCPU event, and either way KVM is bailing from the run loop.
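
For orientation, the call chain under discussion, paraphrased from
virt/kvm/kvm_main.c and arch/x86/kvm/x86.c of that era (details elided):

	vcpu_block()                  /* x86: halt/block, then decide whether
	                               * to stay in the run loop or bail out */
	  kvm_vcpu_halt()             /* halt-poll, then wait */
	    kvm_vcpu_block()          /* schedule() until check_block() < 0 */
	      kvm_vcpu_check_block()  /* < 0 if runnable (sets KVM_REQ_UNHALT),
	                               * pending timer, or pending signal */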

2022-08-17 14:12:27

by Maxim Levitsky

Subject: Re: [PATCH v2 2/9] KVM: x86: remove return value of kvm_vcpu_block

On Tue, 2022-08-16 at 23:34 +0000, Sean Christopherson wrote:
> On Thu, Aug 11, 2022, Paolo Bonzini wrote:
> > The return value of kvm_vcpu_block will be repurposed soon to return
>
> kvm_vcpu_block()
>
> > the state of KVM_REQ_UNHALT.  In preparation for that, get rid of the
> > current return value.  It is only used by kvm_vcpu_halt to decide whether
>
> kvm_vcpu_halt()
>
> > the call resulted in a wait, but the same effect can be obtained with
> > a single round of polling.
> >
> > No functional change intended, apart from practically indistinguishable
> > changes to the polling behavior.
>
> This "breaks" update_halt_poll_stats().  At the very least, it breaks the comment
> that effectively says "waited" is set if and only if schedule() is called.
>
>         /*
>          * Note, halt-polling is considered successful so long as the vCPU was
>          * never actually scheduled out, i.e. even if the wake event arrived
>          * after of the halt-polling loop itself, but before the full wait.
>          */
>         if (do_halt_poll)
>                 update_halt_poll_stats(vcpu, start, poll_end, !waited);
>
> > @@ -3493,35 +3489,32 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> >  {
> >         bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
> >         bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
> > -       ktime_t start, cur, poll_end;
> > +       ktime_t start, cur, poll_end, stop;
> >         bool waited = false;
> >         u64 halt_ns;
> >  
> >         start = cur = poll_end = ktime_get();
> > -       if (do_halt_poll) {
> > -               ktime_t stop = ktime_add_ns(start, vcpu->halt_poll_ns);
> > +       stop = do_halt_poll ? start : ktime_add_ns(start, vcpu->halt_poll_ns);
>
> This is inverted, the loop should terminate after the first iteration (stop==start)
> if halt-polling is _not_ allowed, i.e. do_halt_poll is false.
>
> Overall, I don't like this patch.  I don't necessarily hate it, but I definitely
> don't like it.

Roughly the same opinion here.

>
> Isn't freeing up the return from kvm_vcpu_check_block() unnecessary?  Can't we
> just do:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9f11b505cbee..ccb9f8bdeb18 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10633,7 +10633,7 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
>                 if (hv_timer)
>                         kvm_lapic_switch_to_hv_timer(vcpu);
>
> -               if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
> +               if (!kvm_arch_vcpu_runnable(vcpu))


kvm_vcpu_block() returns when kvm_vcpu_check_block() returns a negative number,
which can also happen due to a pending APIC timer or a pending signal. However,
KVM_REQ_UNHALT is set only when kvm_arch_vcpu_runnable() == true, so the above
does make sense.
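
For reference, a paraphrase of kvm_vcpu_check_block() as it looked at the
time (from virt/kvm/kvm_main.c; the exact set of checks is an assumption
and may differ by a line or two):

	static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
	{
		int ret = -EINTR;
		int idx = srcu_read_lock(&vcpu->kvm->srcu);

		/* KVM_REQ_UNHALT is set only on this path... */
		if (kvm_arch_vcpu_runnable(vcpu)) {
			kvm_make_request(KVM_REQ_UNHALT, vcpu);
			goto out;
		}

		/* ...while these bail without setting it. */
		if (kvm_cpu_has_pending_timer(vcpu))
			goto out;
		if (signal_pending(current))
			goto out;

		ret = 0;
	out:
		srcu_read_unlock(&vcpu->kvm->srcu, idx);
		return ret;
	}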


Best regards,
Maxim Levitsky


>                         return 1;
>         }
>
>
> which IMO is more intuitive and doesn't require reworking halt-polling (again).
>
> I don't see why KVM cares if a vCPU becomes runnable after detecting that something
> else caused kvm_vcpu_check_block() to bail.  E.g. a pending signal doesn't invalidate
> a pending vCPU event, and either way KVM is bailing from the run loop.
>


2022-08-17 15:49:49

by Paolo Bonzini

Subject: Re: [PATCH v2 2/9] KVM: x86: remove return value of kvm_vcpu_block

On 8/17/22 01:34, Sean Christopherson wrote:
> Isn't freeing up the return from kvm_vcpu_check_block() unnecessary? Can't we
> just do:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9f11b505cbee..ccb9f8bdeb18 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10633,7 +10633,7 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
>  		if (hv_timer)
>  			kvm_lapic_switch_to_hv_timer(vcpu);
> 
> -		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
> +		if (!kvm_arch_vcpu_runnable(vcpu))
>  			return 1;
>  	}
>
>
> which IMO is more intuitive and doesn't require reworking halt-polling (again).

This was my first idea indeed. However I didn't like calling
kvm_arch_vcpu_runnable() again and "did it schedule()" seemed to be a
less interesting result from kvm_vcpu_block() (and in fact
kvm_vcpu_halt() does not bother passing it up the return chain).

Paolo

2022-08-17 16:46:42

by Sean Christopherson

Subject: Re: [PATCH v2 2/9] KVM: x86: remove return value of kvm_vcpu_block

On Wed, Aug 17, 2022, Paolo Bonzini wrote:
> On 8/17/22 01:34, Sean Christopherson wrote:
> > Isn't freeing up the return from kvm_vcpu_check_block() unnecessary? Can't we
> > just do:
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9f11b505cbee..ccb9f8bdeb18 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10633,7 +10633,7 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
> >  		if (hv_timer)
> >  			kvm_lapic_switch_to_hv_timer(vcpu);
> >
> > -		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
> > +		if (!kvm_arch_vcpu_runnable(vcpu))
> >  			return 1;
> >  	}
> >
> >
> > which IMO is more intuitive and doesn't require reworking halt-polling (again).
>
> This was my first idea indeed. However I didn't like calling
> kvm_arch_vcpu_runnable() again and "did it schedule()" seemed to be a less
> interesting result from kvm_vcpu_block() (and in fact kvm_vcpu_halt() does
> not bother passing it up the return chain).

The flip side of calling kvm_arch_vcpu_runnable() again is that KVM will immediately
wake the vCPU if it becomes runnable after kvm_vcpu_check_block(). The edge cases
where the vCPU becomes runnable late are unlikely to truly matter in practice, but
on the other hand manually re-checking kvm_arch_vcpu_runnable() means KVM gets both
cases "right" (waited=true iff vCPU actually waited, vCPU awakened ASAP), whereas
squishing the information into the return of kvm_vcpu_check_block() means KVM gets
both cases "wrong" (waited=true even if schedule() was never called, vCPU left in
a non-running state even though it's runnable).

My only hesitation with calling kvm_arch_vcpu_runnable() again is that it could be
problematic if KVM somehow managed to consume the event that caused kvm_vcpu_has_events()
to return true, but I don't see how that could happen without it being a KVM bug.

Ah, and if we do go with the explicit check, it should come with a comment to call
out that KVM needs to return up the stack, e.g.

	/*
	 * If KVM stopped blocking the vCPU but the vCPU is still not
	 * runnable, then there is a pending host event of some kind,
	 * e.g. a pending signal. Return up the stack to service the
	 * pending event without changing the vCPU's activity state.
	 */
	if (!kvm_arch_vcpu_runnable(vcpu))
		return 1;

so that we don't end up combining the checks into:

	while (!kvm_arch_vcpu_runnable(vcpu)) {
		/*
		 * Switch to the software timer before halt-polling/blocking as
		 * the guest's timer may be a break event for the vCPU, and the
		 * hypervisor timer runs only when the CPU is in guest mode.
		 * Switch before halt-polling so that KVM recognizes an expired
		 * timer before blocking.
		 */
		hv_timer = kvm_lapic_hv_timer_in_use(vcpu);
		if (hv_timer)
			kvm_lapic_switch_to_sw_timer(vcpu);

		kvm_vcpu_srcu_read_unlock(vcpu);
		if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
			kvm_vcpu_halt(vcpu);
		else
			kvm_vcpu_block(vcpu);
		kvm_vcpu_srcu_read_lock(vcpu);

		if (hv_timer)
			kvm_lapic_switch_to_hv_timer(vcpu);
	}

which looks sane, but would mean KVM never bounces out to handle whatever event
needs handling: with a pending signal and a still-not-runnable vCPU,
kvm_vcpu_block() would return immediately and the loop would spin without ever
getting back to userspace.

Side topic, usage of kvm_apic_accept_events() appears to be broken (though nothing
can trigger the bug). If kvm_apic_accept_events() were to return an -errno, then
kvm_arch_vcpu_ioctl_run() would return '0' to userspace without updating
vcpu->run->exit_reason. I think an easy fix is to drop the return value entirely
and then WARN if kvm_check_nested_events() returns something other than -EBUSY.

	if (is_guest_mode(vcpu)) {
		r = kvm_check_nested_events(vcpu);
		if (r < 0) {
			WARN_ON_ONCE(r != -EBUSY);
			return;
		}


2022-08-17 17:43:01

by Paolo Bonzini

Subject: Re: [PATCH v2 2/9] KVM: x86: remove return value of kvm_vcpu_block

On 8/17/22 18:41, Sean Christopherson wrote:
> On Wed, Aug 17, 2022, Paolo Bonzini wrote:
>> On 8/17/22 01:34, Sean Christopherson wrote:
>>> Isn't freeing up the return from kvm_vcpu_check_block() unnecessary? Can't we
>>> just do:
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 9f11b505cbee..ccb9f8bdeb18 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -10633,7 +10633,7 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
>>>  		if (hv_timer)
>>>  			kvm_lapic_switch_to_hv_timer(vcpu);
>>>
>>> -		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
>>> +		if (!kvm_arch_vcpu_runnable(vcpu))
>>>  			return 1;
>>>  	}
>>>
>>>
>>> which IMO is more intuitive and doesn't require reworking halt-polling (again).
>>
>> This was my first idea indeed. However I didn't like calling
>> kvm_arch_vcpu_runnable() again and "did it schedule()" seemed to be a less
>> interesting result from kvm_vcpu_block() (and in fact kvm_vcpu_halt() does
>> not bother passing it up the return chain).
>
> The flip side of calling kvm_arch_vcpu_runnable() again is that KVM will immediately
> wake the vCPU if it becomes runnable after kvm_vcpu_check_block(). The edge cases
> where the vCPU becomes runnable late are unlikely to truly matter in practice, but
> on the other hand manually re-checking kvm_arch_vcpu_runnable() means KVM gets both
> cases "right" (waited=true iff vCPU actually waited, vCPU awakened ASAP), whereas
> squishing the information into the return of kvm_vcpu_check_block() means KVM gets
> both cases "wrong" (waited=true even if schedule() was never called, vCPU left in
> a non-running state even though it's runnable).
>
> My only hesitation with calling kvm_arch_vcpu_runnable() again is that it could be
> problematic if KVM somehow managed to consume the event that caused kvm_vcpu_has_events()
> to return true, but I don't see how that could happen without it being a KVM bug.

No, I agree that it cannot happen, and especially so after getting rid
of the kvm_check_nested_events() call in kvm_arch_vcpu_runnable().

I'll reorder the patches and apply your suggestion.

Paolo

2022-09-20 00:49:45

by Sean Christopherson

Subject: Re: [PATCH v2 2/9] KVM: x86: remove return value of kvm_vcpu_block

On Wed, Aug 17, 2022, Sean Christopherson wrote:
> Side topic, usage of kvm_apic_accept_events() appears to be broken (though nothing
> can trigger the bug). If kvm_apic_accept_events() were to return an -errno, then
> kvm_arch_vcpu_ioctl_run() would return '0' to userspace without updating
> vcpu->run->exit_reason. I think an easy fix is to drop the return value entirely
> and then WARN if kvm_check_nested_events() returns something other than -EBUSY.
>
> 	if (is_guest_mode(vcpu)) {
> 		r = kvm_check_nested_events(vcpu);
> 		if (r < 0) {
> 			WARN_ON_ONCE(r != -EBUSY);
> 			return;
> 		}

For posterity, I was wrong. Way down the stack, vmx_complete_nested_posted_interrupt()
can return -ENXIO after filling vcpu->run->exit_reason via kvm_handle_memory_failure().
That's the entire reason why negative values from kvm_check_nested_events() and
kvm_apic_accept_events() are morphed to '0', i.e. to "exit to userspace".
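
A sketch of the pattern being described (hedged; the exact call site and
variable names in arch/x86/kvm/x86.c may differ):

	r = kvm_apic_accept_events(vcpu);
	if (r < 0) {
		/*
		 * A negative value here can originate way down the stack,
		 * e.g. -ENXIO from vmx_complete_nested_posted_interrupt(),
		 * after kvm_handle_memory_failure() has already filled
		 * vcpu->run->exit_reason. Morph the error to 0 so KVM exits
		 * to userspace with that exit_reason intact.
		 */
		r = 0;
		goto out;
	}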