Received: by 2002:ac0:a582:0:0:0:0:0 with SMTP id m2-v6csp3426095imm; Mon, 8 Oct 2018 03:59:05 -0700 (PDT) X-Google-Smtp-Source: ACcGV63RbxxV3UZgfivWdNLbGHaAbKHREc/vJD9Qw8Xexvv8Z9eqnhiViZqUV4a+3/jMaDSwxP6I X-Received: by 2002:a62:5982:: with SMTP id k2-v6mr14117115pfj.180.1538996345117; Mon, 08 Oct 2018 03:59:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1538996345; cv=none; d=google.com; s=arc-20160816; b=bunJBoHKB43Ci6njazrJ//mJuqsX1IaDXrAoxsGGLKi9xQCoi/XcpSEYxlCjpUW5Xn PhVcNF2QRMxJONE5K7fOmFIvnh2BquILMAeuX4PPYRaz6M48hW76INwWTVABilKzRnLy 8nWB7jz2dvH8mkvJcKorXqqSdAlcceIzWjNwhmYggz84ZoEPiGsIhTVWfO9rqr19KTx3 2o0WeHA8FjypY8m3aaxx3NPm09c7MVebu40U2Sn8yA4VyiHlhu86IR0RmiDjfUtL/i9r Hs+bb79wI+aqfF4kCJ78NF/gbfEkczgJeE645Ens29GZPmLUnArkspRevigcItl2k2Xs 8MVg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=HYauraNz349rvrX33twpDLpNOIEeG2x7NtAWEK5ACoE=; b=G2e2SD4QItLbGzaf4jOYW0NGgXhwGqOjA97enpAASogRmEGY/5k8ery1vGAC2DzRIm 0VmSsJCQ+Yag1SXtZrVB1lI092IbQd7JdirJwdjNmhneZlyIHHRnjGJXQ4l+lGHg941Q 9n1i04RAdA63n/qS3Otn/a38pPzK//VlRcIFpPfJozK8Th/UmK/NG1wOW/XfAhJJbyt2 JAHCtuKBmULRlS0H5SzOi6xeS36FZQOU3vet7Qqi4KckGeoln8oBOmZj5dz1GZ3BgxtX HH21E+sD6+LIAIR6iuVq+LL2i/Ga9ydM+y+A7z9q3/688VtwfbkbUhpllz1P8l4egD+D qnWA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=Hqc5VWKn; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b127-v6si16879680pga.153.2018.10.08.03.58.50; Mon, 08 Oct 2018 03:59:05 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=Hqc5VWKn; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727658AbeJHSJt (ORCPT + 99 others); Mon, 8 Oct 2018 14:09:49 -0400 Received: from mail-ot1-f67.google.com ([209.85.210.67]:46867 "EHLO mail-ot1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726656AbeJHSJs (ORCPT ); Mon, 8 Oct 2018 14:09:48 -0400 Received: by mail-ot1-f67.google.com with SMTP id o21so18871901otb.13; Mon, 08 Oct 2018 03:58:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=HYauraNz349rvrX33twpDLpNOIEeG2x7NtAWEK5ACoE=; b=Hqc5VWKn6DcGDQjrP4DfN1qEmn3WvCPH8APi3MImhY/LN3U3q1FD+vWlqlbpIidmEn CQExPoCFLtZ5btnBcCYDRwHGDe9J8Nik6OAfW05eRaOYfnZ+M4JYHM0bZOlejUxnT498 dfUSZ2NdQEHbuquyI3gWsiIj+uJwgZoW/69IHBWVa7B5CwslwSeIqSFQkF1oiD3WNouz VBTC2urb4duMPXGZOQs3ev2MuLDYd5+o0vOFGwJs19ZqVs7KnDcrFHNuQI/V1J/h+alV 6TPZDU2VpXKWx79FFo/yozhQIeJsL20i7Ow7ILUzNlfl5jxhDzkFkIfDNRl7Rp8nj1gQ OiaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=HYauraNz349rvrX33twpDLpNOIEeG2x7NtAWEK5ACoE=; b=lvvRh4nlDMfWJCvQb/TxTXqtBLr9xRo+5iKOr3MPdVaeCdIJ9HJOsREgRPq+r+Dd1H ih03Mxs8JxtMPJe8qMOJ76CPAX80N7q9ueivDzTY8yJvvV+pAjVHCcMd7a6mnN44FZoF WaI+P98GF0OoSnedNOZbx1sebPLwWAYrZg3NCUuDIV2/CGhsvqcRENcAEQC6sdRzRGqK AHKk15g5CqiqmfN0kUAPJKYd+bbisOaRvdE2O44xeLhxdbQVR1DIPwqnmFm8PfuUoZr7 i3dtD7uqDSOIwPmIJoi7/kWWzdxvWshRb9SqJh52aFWlvvgkoSRdmhpTnKoihpTBanar /J5w== X-Gm-Message-State: ABuFfoj04RpcKVaDoqLPXTN5mLuZqodl2jkXqLB0OmlY0LjAaok98Omd lU0C7bbsj+M9o59hfLXjxBFU5nrEPenpIcfADhc= X-Received: by 2002:a9d:4492:: with SMTP id v18mr14527415ote.282.1538996319050; Mon, 08 Oct 2018 03:58:39 -0700 (PDT) MIME-Version: 1.0 References: <1538115136-20092-1-git-send-email-wanpengli@tencent.com> <8847332B-9759-4FF2-B6D8-02334AE11015@oracle.com> In-Reply-To: <8847332B-9759-4FF2-B6D8-02334AE11015@oracle.com> From: Wanpeng Li Date: Mon, 8 Oct 2018 18:59:22 +0800 Message-ID: Subject: Re: [PATCH] KVM: LAPIC: Tune lapic_timer_advance_ns automatically To: Liran Alon Cc: LKML , kvm , Paolo Bonzini , Radim Krcmar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 8 Oct 2018 at 05:02, Liran Alon wrote: > > > > > On 28 Sep 2018, at 9:12, Wanpeng Li wrote: > > > > From: Wanpeng Li > > > > In cloud environment, lapic_timer_advance_ns is needed to be tuned for = every CPU > > generations, and every host kernel versions(the kvm-unit-tests/tscdeadl= ine_latency.flat > > is 5700 cycles for upstream kernel and 9600 cycles for our 3.10 product= kernel, > > both preemption_timer=3DN, Skylake server). > > > > This patch adds the capability to automatically tune lapic_timer_advanc= e_ns > > step by step, the initial value is 1000ns as d0659d946be05 (KVM: x86: a= dd > > option to advance tscdeadline hrtimer expiration) recommended, it will = be > > reduced when it is too early, and increased when it is too late. The gu= est_tsc > > and tsc_deadline are hard to equal, so we assume we are done when the d= elta > > is within a small scope e.g. 100 cycles. This patch reduces latency > > (kvm-unit-tests/tscdeadline_latency, busy waits, preemption_timer enabl= ed) > > from ~2600 cyles to ~1200 cyles on our Skylake server. > > > > Cc: Paolo Bonzini > > Cc: Radim Kr=C4=8Dm=C3=A1=C5=99 > > Signed-off-by: Wanpeng Li > > --- > > arch/x86/kvm/lapic.c | 7 +++++++ > > arch/x86/kvm/x86.c | 2 +- > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > > index fbb0e6d..b756f12 100644 > > --- a/arch/x86/kvm/lapic.c > > +++ b/arch/x86/kvm/lapic.c > > @@ -70,6 +70,8 @@ > > #define APIC_BROADCAST 0xFF > > #define X2APIC_BROADCAST 0xFFFFFFFFul > > > > +static bool __read_mostly lapic_timer_advance_adjust_done =3D false; > > + > > static inline int apic_test_vector(int vec, void *bitmap) > > { > > return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); > > @@ -1492,6 +1494,11 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) > > if (guest_tsc < tsc_deadline) > > __delay(min(tsc_deadline - guest_tsc, > > nsec_to_cycles(vcpu, lapic_timer_advance_ns))); > > + if (!lapic_timer_advance_adjust_done) { > > + lapic_timer_advance_ns +=3D (s64)(guest_tsc - tsc_deadlin= e) / 8; > > I don=E2=80=99t understand how this =E2=80=9C/ 8=E2=80=9D converts betwee= n guest TSC units to host nanoseconds. Oh, I miss it. In addition, /8 here I mean adjust lapic_timer_advance_ns step by step. I can observe big fluctuated value between early and late when running real guest os like linux instead of kvm-unit-tests. After more testing, I saw lapic_timer_advance_ns can be overflow since the delta between guest_tsc and tsc_deadline is too huge. > > I think that instead you should do something like: > s64 ns =3D (s64)(guest_tsc - tsc_deadline) * 1000000ULL; > do_div(ns, vcpu->arch.virtual_tsc_khz); > lapic_timer_advance_ns +=3D ns; > > > + if (abs(guest_tsc - tsc_deadline) < 100) > > I would put this =E2=80=9C100=E2=80=9D hard-coded value as some =E2=80=9C= #define=E2=80=9D to make code more clear. How about something like below: diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index fbb0e6d..354eb13c 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -70,6 +70,9 @@ #define APIC_BROADCAST 0xFF #define X2APIC_BROADCAST 0xFFFFFFFFul +static bool __read_mostly lapic_timer_advance_adjust_done =3D false; +#define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100 + static inline int apic_test_vector(int vec, void *bitmap) { return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); @@ -1472,7 +1475,7 @@ static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu) void wait_lapic_expire(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic =3D vcpu->arch.apic; - u64 guest_tsc, tsc_deadline; + u64 guest_tsc, tsc_deadline, ns; if (!lapic_in_kernel(vcpu)) return; @@ -1492,6 +1495,19 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) if (guest_tsc < tsc_deadline) __delay(min(tsc_deadline - guest_tsc, nsec_to_cycles(vcpu, lapic_timer_advance_ns))); + if (!lapic_timer_advance_adjust_done) { + if (guest_tsc < tsc_deadline) { + ns =3D (tsc_deadline - guest_tsc) * 1000000ULL; + do_div(ns, vcpu->arch.virtual_tsc_khz); + lapic_timer_advance_ns -=3D min((unsigned int)ns, lapic_timer_advance_ns / 8); + } else { + ns =3D (guest_tsc - tsc_deadline) * 1000000ULL; + do_div(ns, vcpu->arch.virtual_tsc_khz); + lapic_timer_advance_ns +=3D min((unsigned int)ns, lapic_timer_advance_ns / 8); + } + if (ns < LAPIC_TIMER_ADVANCE_ADJUST_DONE) + lapic_timer_advance_adjust_done =3D true; + } } static void start_sw_tscdeadline(struct kvm_lapic *apic) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ca71773..1f3f955 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -136,7 +136,7 @@ static u32 __read_mostly tsc_tolerance_ppm =3D 250; module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR); /* lapic timer advance (tscdeadline mode only) in nanoseconds */ -unsigned int __read_mostly lapic_timer_advance_ns =3D 0; +unsigned int __read_mostly lapic_timer_advance_ns =3D 1000; module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR); EXPORT_SYMBOL_GPL(lapic_timer_advance_ns);