2021-05-13 02:01:27

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-committed scenarios

From: Wanpeng Li <[email protected]>

In under-committed scenarios, a vCPU can get scheduled easily, so
kvm_vcpu_yield_to() adds extra overhead; we observe many races where
vcpu->ready is true yet the yield fails because p->state is already
TASK_RUNNING. Let's bail out in such scenarios by checking the length
of the current CPU's runqueue; this is a hint that the host is
under-committed rather than a guarantee of accuracy.

Signed-off-by: Wanpeng Li <[email protected]>
---
v1 -> v2:
* move the check after attempted counting
* update patch description

arch/x86/kvm/x86.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b6bca6..dfb7c32 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8360,6 +8360,9 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)

vcpu->stat.directed_yield_attempted++;

+ if (single_task_running())
+ goto no_yield;
+
rcu_read_lock();
map = rcu_dereference(vcpu->kvm->arch.apic_map);

--
2.7.4
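
For context, a simplified sketch of how the check slots into the
directed-yield path; the APIC-map lookup that resolves dest_id is
elided and the surrounding structure is paraphrased from
arch/x86/kvm/x86.c rather than copied verbatim:

static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
{
	struct kvm_vcpu *target = NULL;

	vcpu->stat.directed_yield_attempted++;

	/*
	 * Heuristic: if this physical CPU's runqueue only contains the
	 * current task, the host is under-committed and the destination
	 * vCPU is most likely already running, so a directed yield would
	 * almost certainly fail anyway.
	 */
	if (single_task_running())
		goto no_yield;

	/* ... look up the target vCPU for dest_id via the APIC map ... */

	if (!target || !READ_ONCE(target->ready))
		goto no_yield;

	if (kvm_vcpu_yield_to(target) <= 0)
		goto no_yield;

	vcpu->stat.directed_yield_successful++;

no_yield:
	return;
}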


2021-05-13 02:03:10

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v2 4/4] KVM: x86: hyper-v: Take srcu lock when accessing kvm_memslots()

From: Wanpeng Li <[email protected]>

WARNING: suspicious RCU usage
5.13.0-rc1 #4 Not tainted
-----------------------------
./include/linux/kvm_host.h:710 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by hyperv_clock/8318:
#0: ffffb6b8cb05a7d8 (&hv->hv_lock){+.+.}-{3:3}, at: kvm_hv_invalidate_tsc_page+0x3e/0xa0 [kvm]

stack backtrace:
CPU: 3 PID: 8318 Comm: hyperv_clock Not tainted 5.13.0-rc1 #4
Call Trace:
dump_stack+0x87/0xb7
lockdep_rcu_suspicious+0xce/0xf0
kvm_write_guest_page+0x1c1/0x1d0 [kvm]
kvm_write_guest+0x50/0x90 [kvm]
kvm_hv_invalidate_tsc_page+0x79/0xa0 [kvm]
kvm_gen_update_masterclock+0x1d/0x110 [kvm]
kvm_arch_vm_ioctl+0x2a7/0xc50 [kvm]
kvm_vm_ioctl+0x123/0x11d0 [kvm]
__x64_sys_ioctl+0x3ed/0x9d0
do_syscall_64+0x3d/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae

kvm_memslots() will be called by kvm_write_guest(), so we should take the srcu lock.

Fixes: e880c6ea5 ("KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs")
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/hyperv.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index f98370a3..f00830e 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1172,6 +1172,7 @@ void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
{
struct kvm_hv *hv = to_kvm_hv(kvm);
u64 gfn;
+ int idx;

if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET ||
@@ -1190,9 +1191,16 @@ void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;

hv->tsc_ref.tsc_sequence = 0;
+
+ /*
+ * Take the srcu lock as memslots will be accessed to check the gfn
+ * cache generation against the memslots generation.
+ */
+ idx = srcu_read_lock(&kvm->srcu);
if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
&hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)))
hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
+ srcu_read_unlock(&kvm->srcu, idx);

out_unlock:
mutex_unlock(&hv->hv_lock);
--
2.7.4
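
The general pattern the patch applies is the SRCU read-side critical
section that KVM requires around any memslot access. A minimal sketch,
using a hypothetical wrapper name (write_guest_with_srcu() is not an
upstream helper), assuming the caller is not already inside an SRCU
read-side section:

/*
 * Hypothetical wrapper for illustration only: any path that reaches
 * kvm_memslots() -- e.g. through kvm_write_guest() -- without already
 * being inside a kvm->srcu (or vcpu->srcu) read-side section must take
 * the SRCU lock itself.
 */
static int write_guest_with_srcu(struct kvm *kvm, gpa_t gpa,
				 const void *data, unsigned long len)
{
	int idx, ret;

	idx = srcu_read_lock(&kvm->srcu);
	ret = kvm_write_guest(kvm, gpa, data, len);
	srcu_read_unlock(&kvm->srcu, idx);

	return ret;
}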

2021-05-13 02:03:39

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v2 3/4] KVM: X86: Fix vCPU preempted state from guest's point of view

From: Wanpeng Li <[email protected]>

Commit 66570e966dd9 ("kvm: x86: only provide PV features if enabled in
guest's CPUID") avoids accessing the PV TLB shootdown host-side logic
when this PV feature is not exposed to the guest. However,
kvm_steal_time.preempted is not only leveraged by the PV TLB shootdown
logic but also mitigates the lock holder preemption issue. From the
guest's point of view, the vCPU appears permanently preempted, because
we lose the reset of kvm_steal_time.preempted before VM-entry when the
PV TLB shootdown feature is not exposed. This patch fixes it by
clearing kvm_steal_time.preempted before VM-entry.

Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Reviewed-by: Sean Christopherson <[email protected]>
Cc: [email protected]
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/x86.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dfb7c32..bed7b53 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3105,6 +3105,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
st->preempted & KVM_VCPU_FLUSH_TLB);
if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
kvm_vcpu_flush_tlb_guest(vcpu);
+ } else {
+ st->preempted = 0;
}

vcpu->arch.st.preempted = 0;
--
2.7.4
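
For context, the guest consumes this flag through its
vcpu_is_preempted() implementation; roughly (a simplified paraphrase of
the guest-side PV code in arch/x86/kernel/kvm.c, not the exact upstream
function), it boils down to:

/*
 * Guest side, simplified: a stale KVM_VCPU_PREEMPTED bit in the steal
 * time area makes every call report the vCPU as preempted, which is
 * why the host must keep clearing it before VM-entry.
 */
bool kvm_vcpu_is_preempted(int cpu)
{
	struct kvm_steal_time *src = &per_cpu(steal_time, cpu);

	return !!(READ_ONCE(src->preempted) & KVM_VCPU_PREEMPTED);
}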

2021-05-15 10:38:25

by David Matlack

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-committed scenarios

On Wed, May 12, 2021 at 7:01 PM Wanpeng Li <[email protected]> wrote:
>
> From: Wanpeng Li <[email protected]>
>
> In under-committed scenarios, a vCPU can get scheduled easily, so
> kvm_vcpu_yield_to() adds extra overhead; we observe many races where
> vcpu->ready is true yet the yield fails because p->state is already
> TASK_RUNNING. Let's bail out in such scenarios by checking the length
> of the current CPU's runqueue; this is a hint that the host is
> under-committed rather than a guarantee of accuracy.
>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> v1 -> v2:
> * move the check after attempted counting
> * update patch description
>
> arch/x86/kvm/x86.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b6bca6..dfb7c32 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8360,6 +8360,9 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
>
> vcpu->stat.directed_yield_attempted++;
>
> + if (single_task_running())
> + goto no_yield;

Since this is a heuristic, do you have any experimental or real world
results that show the benefit?

> +
> rcu_read_lock();
> map = rcu_dereference(vcpu->kvm->arch.apic_map);
>
> --
> 2.7.4
>

2021-05-17 02:02:45

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-committed scenarios

On Sat, 15 May 2021 at 05:33, David Matlack <[email protected]> wrote:
>
> On Wed, May 12, 2021 at 7:01 PM Wanpeng Li <[email protected]> wrote:
> >
> > From: Wanpeng Li <[email protected]>
> >
> > In under-committed scenarios, a vCPU can get scheduled easily, so
> > kvm_vcpu_yield_to() adds extra overhead; we observe many races where
> > vcpu->ready is true yet the yield fails because p->state is already
> > TASK_RUNNING. Let's bail out in such scenarios by checking the length
> > of the current CPU's runqueue; this is a hint that the host is
> > under-committed rather than a guarantee of accuracy.
> >
> > Signed-off-by: Wanpeng Li <[email protected]>
> > ---
> > v1 -> v2:
> > * move the check after attempted counting
> > * update patch description
> >
> > arch/x86/kvm/x86.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9b6bca6..dfb7c32 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -8360,6 +8360,9 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
> >
> > vcpu->stat.directed_yield_attempted++;
> >
> > + if (single_task_running())
> > + goto no_yield;
>
> Since this is a heuristic, do you have any experimental or real world
> results that show the benefit?

I observe the directed_yield_successful/directed_yield_attempted
ratio; it improves from 50%+ to 80%+ in the under-committed
scenario.

Wanpeng
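
A small user-space sketch of how such a ratio can be sampled, assuming
the aggregate per-stat debugfs files exist at the paths below (the
exact debugfs layout differs across kernel versions and may instead
require summing per-VM directories):

#include <stdio.h>

/* Read a single integer counter from a debugfs stat file; -1 on error. */
static long read_stat(const char *path)
{
	long val = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(void)
{
	/* Hypothetical paths; adjust to the local debugfs layout. */
	long attempted = read_stat("/sys/kernel/debug/kvm/directed_yield_attempted");
	long successful = read_stat("/sys/kernel/debug/kvm/directed_yield_successful");

	if (attempted > 0 && successful >= 0)
		printf("directed yield success ratio: %.1f%%\n",
		       100.0 * successful / attempted);
	else
		fprintf(stderr, "stats unavailable\n");
	return 0;
}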