2021-10-19 08:16:10

by Wanpeng Li

Subject: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU

From: Wanpeng Li <[email protected]>

A vCPU kick sometimes follows a pending request even when @vcpu is the
running vCPU. Such a kick pays for both rcuwait_wake_up(), with its RCU
and memory barrier operations, and cmpxchg(). Check vcpu->wait before
calling rcuwait_wake_up(), and check whether @vcpu is the running vCPU
before the cmpxchg(), to cut this overhead.

Evaluating kvm-unit-tests/vmexit.flat on an Intel ICX box, most of the
scores improve by ~600 CPU cycles, especially when APICv is disabled.

tscdeadline_immed
tscdeadline
self_ipi_sti_nop
..............
x2apic_self_ipi_tpr_sti_hlt

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
v2 -> v3:
* use kvm_arch_vcpu_get_wait()
v1 -> v2:
* move checking running vCPU logic to kvm_vcpu_kick
* check rcuwait_active(&vcpu->wait) etc

virt/kvm/kvm_main.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7851f3a1b5f7..1bc52eab0a7d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3314,8 +3314,15 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
{
int me, cpu;

- if (kvm_vcpu_wake_up(vcpu))
- return;
+ me = get_cpu();
+
+ if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu))
+ goto out;
+
+ if (vcpu == __this_cpu_read(kvm_running_vcpu)) {
+ WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE);
+ goto out;
+ }

/*
* Note, the vCPU could get migrated to a different pCPU at any point
@@ -3324,12 +3331,12 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
* IPI is to force the vCPU to leave IN_GUEST_MODE, and migrating the
* vCPU also requires it to leave IN_GUEST_MODE.
*/
- me = get_cpu();
if (kvm_arch_vcpu_should_kick(vcpu)) {
cpu = READ_ONCE(vcpu->cpu);
if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
smp_send_reschedule(cpu);
}
+out:
put_cpu();
}
EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
--
2.25.1


2021-10-19 17:01:24

by Paolo Bonzini

Subject: Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU

On 19/10/21 10:12, Wanpeng Li wrote:
> - if (kvm_vcpu_wake_up(vcpu))
> - return;
> + me = get_cpu();
> +
> + if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu))
> + goto out;

This is racy. You are basically doing the same check that
rcuwait_wake_up does, but without the memory barrier before.

Also here:

> + if (vcpu == __this_cpu_read(kvm_running_vcpu)) {
> + WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE);

it's better to do

if (vcpu == ... && !WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE))
goto out;

so that if the bug happens you do get a smp_send_reschedule() and fail
safely.

Paolo

> + goto out;
> + }

2021-10-19 17:37:04

by Sean Christopherson

Subject: Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU

On Tue, Oct 19, 2021, Paolo Bonzini wrote:
> On 19/10/21 10:12, Wanpeng Li wrote:
> > - if (kvm_vcpu_wake_up(vcpu))
> > - return;
> > + me = get_cpu();
> > +
> > + if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu))
> > + goto out;
>
> This is racy. You are basically doing the same check that rcuwait_wake_up
> does, but without the memory barrier before.

I was worried that was the case[*], but I didn't have the two hours it would have
taken me to verify there was indeed a problem :-)

The intent of the extra check was to avoid the locked instruction that comes with
disabling preemption via rcu_read_lock(). But thinking more, the extra op should
be little more than a basic arithmetic operation in the grand scheme on modern x86
since the cache line is going to be locked and written no matter what, either
immediately before or immediately after.

So with Paolo's other comment, maybe just this? And if this doesn't provide the
desired performance boost, changes to the rcuwait behavior should go in separate
patch.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9ec99f5b972c..ebc6d4f2fbfa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3333,11 +3333,22 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
* vCPU also requires it to leave IN_GUEST_MODE.
*/
me = get_cpu();
+
+ /*
+ * Avoid the moderately expensive "should kick" operation if this pCPU
+ * is currently running the target vCPU, in which case it's a KVM bug
+ * if the vCPU is in the inner run loop.
+ */
+ if (vcpu == __this_cpu_read(kvm_running_vcpu) &&
+ !WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE))
+ goto out;
+
if (kvm_arch_vcpu_should_kick(vcpu)) {
cpu = READ_ONCE(vcpu->cpu);
if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
smp_send_reschedule(cpu);
}
+out:
put_cpu();
}
EXPORT_SYMBOL_GPL(kvm_vcpu_kick);

[*] https://lkml.kernel.org/r/[email protected]

2021-10-20 02:51:38

by Wanpeng Li

Subject: Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU

On Wed, 20 Oct 2021 at 01:34, Sean Christopherson <[email protected]> wrote:
>
> On Tue, Oct 19, 2021, Paolo Bonzini wrote:
> > On 19/10/21 10:12, Wanpeng Li wrote:
> > > - if (kvm_vcpu_wake_up(vcpu))
> > > - return;
> > > + me = get_cpu();
> > > +
> > > + if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu))
> > > + goto out;
> >
> > This is racy. You are basically doing the same check that rcuwait_wake_up
> > does, but without the memory barrier before.
>
> I was worried that was the case[*], but I didn't have the two hours it would have
> taken me to verify there was indeed a problem :-)
>
> The intent of the extra check was to avoid the locked instruction that comes with
> disabling preemption via rcu_read_lock(). But thinking more, the extra op should
> be little more than a basic arithmetic operation in the grand scheme on modern x86
> since the cache line is going to be locked and written no matter what, either
> immediately before or immediately after.

I observe the main overhead of rcuwait_wake_up() is from rcu
operations, especially rcu_read_lock/unlock().

>
> So with Paolo's other comment, maybe just this? And if this doesn't provide the
> desired performance boost, changes to the rcuwait behavior should go in separate
> patch.

Ok.

Wanpeng

2021-10-20 06:48:32

by Paolo Bonzini

Subject: Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU

On 20/10/21 04:49, Wanpeng Li wrote:
>> The intent of the extra check was to avoid the locked instruction that comes with
>> disabling preemption via rcu_read_lock(). But thinking more, the extra op should
>> be little more than a basic arithmetic operation in the grand scheme on modern x86
>> since the cache line is going to be locked and written no matter what, either
>> immediately before or immediately after.
>
> I observe the main overhead of rcuwait_wake_up() is from rcu
> operations, especially rcu_read_lock/unlock().

Do you have CONFIG_PREEMPT_RCU set? If so, maybe something like this would help:

diff --git a/kernel/exit.c b/kernel/exit.c
index fd1c04193e18..ca1e60a1234d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -235,8 +235,6 @@ int rcuwait_wake_up(struct rcuwait *w)
int ret = 0;
struct task_struct *task;

- rcu_read_lock();
-
/*
* Order condition vs @task, such that everything prior to the load
* of @task is visible. This is the condition as to why the user called
@@ -250,6 +248,14 @@ int rcuwait_wake_up(struct rcuwait *w)
*/
smp_mb(); /* (B) */

+#ifdef CONFIG_PREEMPT_RCU
+ /* The cost of rcu_read_lock() is nontrivial for preemptable RCU. */
+ if (!rcuwait_active(w))
+ return ret;
+#endif
+
+ rcu_read_lock();
+
task = rcu_dereference(w->task);
if (task)
ret = wake_up_process(task);

(If you don't, rcu_read_lock is essentially preempt_disable() and it
should not have a large overhead). You still need the memory barrier
though, in order to avoid missed wakeups; shameless plug for my
article at https://lwn.net/Articles/847481/.

Paolo

>> So with Paolo's other comment, maybe just this? And if this doesn't provide the
>> desired performance boost, changes to the rcuwait behavior should go in separate
>> patch.
> Ok.

2021-10-20 10:04:11

by Wanpeng Li

Subject: Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU

On Wed, 20 Oct 2021 at 14:47, Paolo Bonzini <[email protected]> wrote:
>
> On 20/10/21 04:49, Wanpeng Li wrote:
> >> The intent of the extra check was to avoid the locked instruction that comes with
> >> disabling preemption via rcu_read_lock(). But thinking more, the extra op should
> >> be little more than a basic arithmetic operation in the grand scheme on modern x86
> >> since the cache line is going to be locked and written no matter what, either
> >> immediately before or immediately after.
> >
> > I observe the main overhead of rcuwait_wake_up() is from rcu
> > operations, especially rcu_read_lock/unlock().
>
> Do you have CONFIG_PREEMPT_RCU set? If so, maybe something like this would help:

Yes.

>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index fd1c04193e18..ca1e60a1234d 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -235,8 +235,6 @@ int rcuwait_wake_up(struct rcuwait *w)
> int ret = 0;
> struct task_struct *task;
>
> - rcu_read_lock();
> -
> /*
> * Order condition vs @task, such that everything prior to the load
> * of @task is visible. This is the condition as to why the user called
> @@ -250,6 +248,14 @@ int rcuwait_wake_up(struct rcuwait *w)
> */
> smp_mb(); /* (B) */
>
> +#ifdef CONFIG_PREEMPT_RCU
> + /* The cost of rcu_read_lock() is nontrivial for preemptable RCU. */
> + if (!rcuwait_active(w))
> + return ret;
> +#endif
> +
> + rcu_read_lock();
> +
> task = rcu_dereference(w->task);
> if (task)
> ret = wake_up_process(task);
>
> (If you don't, rcu_read_lock is essentially preempt_disable() and it
> should not have a large overhead). You still need the memory barrier
> though, in order to avoid missed wakeups; shameless plug for my
> article at https://lwn.net/Articles/847481/.

You are right, the cost of rcu_read_lock() for preemptible RCU
introduces too much overhead. Do you want to send a separate patch?

Wanpeng

2021-10-20 10:34:49

by Paolo Bonzini

Subject: Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU

On 19/10/21 19:34, Sean Christopherson wrote:
> The intent of the extra check was to avoid the locked instruction that comes with
> disabling preemption via rcu_read_lock(). But thinking more, the extra op should
> be little more than a basic arithmetic operation in the grand scheme on modern x86
> since the cache line is going to be locked and written no matter what, either
> immediately before or immediately after.

There should be no locked instructions unless you're using
PREEMPT_RT/PREEMPT_RCU, no? The preempt_disable count is in a percpu
variable.

>
> + /*
> + * Avoid the moderately expensive "should kick" operation if this pCPU
> + * is currently running the target vCPU, in which case it's a KVM bug
> + * if the vCPU is in the inner run loop.
> + */
> + if (vcpu == __this_cpu_read(kvm_running_vcpu) &&
> + !WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE))
> + goto out;
> +

It should not even be a problem if vcpu->mode == IN_GUEST_MODE: you can
just set it to EXITING_GUEST_MODE without even needing atomic_cmpxchg.
I'll send a few patches out, since I think I found some related issues.

Paolo

2021-10-20 10:38:46

by Paolo Bonzini

Subject: Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU

On 20/10/21 12:02, Wanpeng Li wrote:
>> +#ifdef CONFIG_PREEMPT_RCU
>> + /* The cost of rcu_read_lock() is nontrivial for preemptable RCU. */
>> + if (!rcuwait_active(w))
>> + return ret;
>> +#endif
>> +
>> + rcu_read_lock();
>> +
>> task = rcu_dereference(w->task);
>> if (task)
>> ret = wake_up_process(task);
>>
>> (If you don't, rcu_read_lock is essentially preempt_disable() and it
>> should not have a large overhead). You still need the memory barrier
>> though, in order to avoid missed wakeups; shameless plug for my
>> article at https://lwn.net/Articles/847481/.
> You are right, the cost of rcu_read_lock() for preemptible RCU
> introduces too much overhead. Do you want to send a separate patch?

Yes, I'll take care of this. Thanks!

Paolo