From: Wanpeng Li
Date: Tue, 14 May 2019 18:56:04 +0800
Subject: Re: [PATCH 3/3] KVM: LAPIC: Optimize timer latency further
To: Sean Christopherson
Cc: LKML, kvm, Paolo Bonzini, Radim Krčmář, Liran Alon
References: <1557401361-3828-1-git-send-email-wanpengli@tencent.com>
 <1557401361-3828-4-git-send-email-wanpengli@tencent.com>
 <20190513195417.GM28561@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 14 May 2019 at 09:45, Wanpeng Li wrote:
>
> On Tue, 14 May 2019 at 03:54, Sean Christopherson wrote:
> >
> > On Thu, May 09, 2019 at 07:29:21PM +0800, Wanpeng Li wrote:
> > > From: Wanpeng Li
> > >
> > > The advance LAPIC timer tries to hide the hypervisor overhead between
> > > the host timer firing and the guest becoming aware that the timer has
> > > fired. However, it only hides the time between
> > > apic_timer_fn/handle_preemption_timer -> wait_lapic_expire, not the
> > > time up to the real position of vmentry, which is mentioned in the
> > > original commit d0659d946be0 ("KVM: x86: add option to advance
> > > tscdeadline hrtimer expiration"). There are 700+ CPU cycles between
> > > the end of wait_lapic_expire and the world switch on my Haswell
> > > desktop, and 2400+ cycles if vmentry_l1d_flush is set to always.
> > >
> > > This patch tries to narrow the last gap. It measures the time between
> > > the end of wait_lapic_expire and the world switch and takes this time
> > > into account when busy waiting; otherwise, the guest still sees the
> > > latency between wait_lapic_expire and the world switch. We also
> > > consider this when adaptively tuning the timer advancement. The patch
> > > can reduce latency by 50% (~1600+ cycles to ~800+ cycles on a Haswell
> > > desktop) for kvm-unit-tests/tscdeadline_latency when testing busy
> > > waits.
> > >
> > > Cc: Paolo Bonzini
> > > Cc: Radim Krčmář
> > > Cc: Sean Christopherson
> > > Cc: Liran Alon
> > > Signed-off-by: Wanpeng Li
> > > ---
> > >  arch/x86/kvm/lapic.c   | 23 +++++++++++++++++++++--
> > >  arch/x86/kvm/lapic.h   |  8 ++++++++
> > >  arch/x86/kvm/vmx/vmx.c |  2 ++
> > >  3 files changed, 31 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index e7a0660..01d3a87 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -1545,13 +1545,19 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
> > >
> > >  	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
> > >  	apic->lapic_timer.expired_tscdeadline = 0;
> > > -	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> > > +	guest_tsc = kvm_read_l1_tsc(vcpu, (apic->lapic_timer.measure_delay_done == 2) ?
> > > +		rdtsc() + apic->lapic_timer.vmentry_delay : rdtsc());
> > >  	trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);
> > >
> > >  	if (guest_tsc < tsc_deadline)
> > >  		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
> > >
> > >  	adaptive_tune_timer_advancement(vcpu, guest_tsc, tsc_deadline);
> > > +
> > > +	if (!apic->lapic_timer.measure_delay_done) {
> > > +		apic->lapic_timer.measure_delay_done = 1;
> > > +		apic->lapic_timer.vmentry_delay = rdtsc();
> > > +	}
> > >  }
> > >
> > >  static void start_sw_tscdeadline(struct kvm_lapic *apic)
> > > @@ -1837,6 +1843,18 @@ static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
> > >  	}
> > >  }
> > >
> > > +void kvm_lapic_measure_vmentry_delay(struct kvm_vcpu *vcpu)
> > > +{
> > > +	struct kvm_timer *ktimer = &vcpu->arch.apic->lapic_timer;
> >
> > This will #GP if the APIC is not in-kernel, i.e. @apic is NULL.
> >
> > > +
> > > +	if (ktimer->measure_delay_done == 1) {
> > > +		ktimer->vmentry_delay = rdtsc() -
> > > +			ktimer->vmentry_delay;
> > > +		ktimer->measure_delay_done = 2;
> >
> > Measuring the delay a single time is bound to result in random outliers,
> > e.g. if an NMI happens to occur after wait_lapic_expire().
> >
> > Rather than reinvent the wheel, can we simply move the call to
> > wait_lapic_expire() into vmx.c and svm.c?  For VMX we'd probably want to
> > support the advancement if enable_unrestricted_guest=true so that we avoid
> > the emulation_required case, but other than that I don't see anything that
> > requires wait_lapic_expire() to be called where it is.
>
> I also considered moving wait_lapic_expire() into vmx.c and svm.c
> before. What do you think, Paolo, Radim?

However, guest_enter_irqoff() also prevents this: if wait_lapic_expire()
ran after it, we would account the busy-wait time as guest time. How
about sampling several times and taking the average, or a conservative
minimum, to address Sean's concern?

Regards,
Wanpeng Li
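
The sampling idea above could look roughly like the following. This is only a userspace sketch under stated assumptions: `struct delay_sampler`, `DELAY_SAMPLES` and the `delay_sampler_*` helpers are hypothetical names for illustration and do not exist in the posted patch or in KVM, and the start/end values stand in for the two rdtsc() readings taken at the end of wait_lapic_expire() and just before the world switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sample count; not taken from the patch. */
#define DELAY_SAMPLES 8

/*
 * Hypothetical state. The posted patch keeps a single sample in
 * struct kvm_timer; this sketch keeps a running minimum and sum instead.
 */
struct delay_sampler {
	uint64_t min;		/* smallest delay seen so far, in cycles */
	uint64_t sum;		/* running total used for the average */
	unsigned int count;	/* samples recorded so far */
};

static void delay_sampler_init(struct delay_sampler *s)
{
	s->min = UINT64_MAX;
	s->sum = 0;
	s->count = 0;
}

/*
 * Record one measured delay between two timestamps.
 * Returns true once DELAY_SAMPLES samples have been collected;
 * further samples are ignored after that point.
 */
static bool delay_sampler_add(struct delay_sampler *s,
			      uint64_t start, uint64_t end)
{
	uint64_t delta;

	assert(end >= start);
	if (s->count >= DELAY_SAMPLES)
		return true;

	delta = end - start;
	if (delta < s->min)
		s->min = delta;
	s->sum += delta;
	s->count++;

	return s->count >= DELAY_SAMPLES;
}

/* Average delay over the recorded samples, or 0 if there are none. */
static uint64_t delay_sampler_avg(const struct delay_sampler *s)
{
	return s->count ? s->sum / s->count : 0;
}

/* Conservative (minimum) delay, or 0 if there are no samples. */
static uint64_t delay_sampler_min(const struct delay_sampler *s)
{
	return s->count ? s->min : 0;
}
```

With seven samples of 700 cycles and one 2400-cycle outlier (say, an NMI landing after wait_lapic_expire()), the minimum stays at 700 while the average rises to 912. The minimum is therefore the safer value to add to rdtsc() when deciding how long to busy wait, since over-estimating the delay would deliver the timer early.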