Received: by 2002:ac0:a582:0:0:0:0:0 with SMTP id m2-v6csp3492816imm; Mon, 8 Oct 2018 05:05:19 -0700 (PDT) X-Google-Smtp-Source: ACcGV6271PZmYJffSGl0uP293FTcYL/Fyzrci5Kzhs6QVyXKqEBiSaYSSn+2vGJZkOi2pMtGSdce X-Received: by 2002:a63:c642:: with SMTP id x2-v6mr19824797pgg.16.1539000319085; Mon, 08 Oct 2018 05:05:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1539000319; cv=none; d=google.com; s=arc-20160816; b=j8yhxevfKstZuQuqP7tKoiCPJR+Jyw9ruBayRVOethdgi1+9sC6wBq8rdU5WwM5eVy cPKlHdR4quLqvl6YrmSh5C5vp1TGv5OVyhcY2Y35CgICvU9eRTH0/1fPo0arqzkS4Xdh hfR5l88naJzr68ycKJNaBULvS8OLNdR5hHFy1wKTHOhCf40PCRU8X3p7nxWXU4rjirMP iP3RK5gZqbSOQQ9bFcuZKz0eyuBwvRPHQpg6mlrNRJhHDVJv6NEH0yHNt4HOVOOmttSl wGbAPB6N7WBF+kmRcl0cEhAne36YmYsi2pkzjDE/ws8mEbO37JpZIKeNIo+lWhiSCJXf a/mg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version:dkim-signature; bh=aBna1xgjxE65z0twGtMv8p2gO2AARvY6+fn3bpevcOI=; b=TJMWr6avBREADHlz4R5nr6rngsWerGc/hThyOnWiZompC7UIKO0sVqXjh9BppEuKHk 93BTL5zqElMNEvPFEs0O0AJRl8Q64VJLQLxyTBie5RMtZR02AL390ikPBefg6LNt6ltL tIUiGMK38czGohjsXHACqDJYCZRCkFqGEXB2AtrNFUBrJzkisaKTivk433vECOCuoVIJ ZlXbjShOcn0qrZGPQeOVCRT/fai8TNExJfblRMoJQpc4FQKgfop8IPande7JgxFzLtSS Hmjx0JHmGyBM+yiXASPmaoy+psoKZVvhhO/u2w2hY1K9LiVZM0SrFPUVNXsaoBUFKlzh sf+Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2018-07-02 header.b=kNu6xOl0; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z7-v6si16761002plk.215.2018.10.08.05.05.03; Mon, 08 Oct 2018 05:05:19 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2018-07-02 header.b=kNu6xOl0; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727813AbeJHTQW (ORCPT + 99 others); Mon, 8 Oct 2018 15:16:22 -0400 Received: from aserp2120.oracle.com ([141.146.126.78]:43330 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726995AbeJHTQW (ORCPT ); Mon, 8 Oct 2018 15:16:22 -0400 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w98C4CMr012748; Mon, 8 Oct 2018 12:04:54 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=content-type : mime-version : subject : from : in-reply-to : date : cc : content-transfer-encoding : message-id : references : to; s=corp-2018-07-02; bh=aBna1xgjxE65z0twGtMv8p2gO2AARvY6+fn3bpevcOI=; b=kNu6xOl0gfRXq3J1EppkZ67++qPINyI3CO8ED2p8s30sO4cltKpx+G7rw6R0/YPwaUig u+uAyv9ZlJVpdZAArbwpeEj++2TRdLCpNiQlC82aH4N54yjDR4sXmr9YBXAP5wK0laUM jpix0YHyyXE0/OD0WgLBIcy/TinYJmIeHg4AjE+ghV/oPt0bOQ0OhYxOEPDBqH0EIXv6 5wMV/yIWOjjqTMVnV5eXAXCoy+/QMIA1wGaSgVSfFyS6oMQ8Qzh/RGKA2xk3eYWaEeY5 jTpLHj2Xz4kd7q7MSg9edLrq4dYkFQsGIidb2nj0UQoIjMUIc9yhvcvW9QmhG2U6ijQf qA== Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by aserp2120.oracle.com with ESMTP id 2mxn0pp34y-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 08 Oct 2018 12:04:54 +0000 Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by aserv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w98C4sMc010122 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 8 Oct 2018 12:04:54 GMT Received: from abhmp0016.oracle.com (abhmp0016.oracle.com [141.146.116.22]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id w98C4r3A018556; Mon, 8 Oct 2018 12:04:54 GMT Received: from [192.168.14.112] (/79.178.223.28) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 08 Oct 2018 12:04:53 +0000 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 11.1 \(3445.4.7\)) Subject: Re: [PATCH] KVM: LAPIC: Tune lapic_timer_advance_ns automatically From: Liran Alon In-Reply-To: Date: Mon, 8 Oct 2018 15:04:50 +0300 Cc: LKML , kvm , Paolo Bonzini , Radim Krcmar Content-Transfer-Encoding: quoted-printable Message-Id: <5A4E140E-BC3D-46E4-AF7B-E8053FD4C683@oracle.com> References: <1538115136-20092-1-git-send-email-wanpengli@tencent.com> <8847332B-9759-4FF2-B6D8-02334AE11015@oracle.com> To: Wanpeng Li X-Mailer: Apple Mail (2.3445.4.7) X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9039 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1807170000 definitions=main-1810080120 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On 8 Oct 2018, at 13:59, Wanpeng Li wrote: >=20 > On Mon, 8 Oct 2018 at 05:02, Liran Alon wrote: >>=20 >>=20 >>=20 >>> On 28 Sep 2018, at 9:12, Wanpeng Li wrote: >>>=20 >>> From: Wanpeng Li >>>=20 >>> In cloud environment, lapic_timer_advance_ns is needed to be tuned = for every CPU >>> generations, and every host kernel versions(the = kvm-unit-tests/tscdeadline_latency.flat >>> is 5700 cycles for upstream kernel and 9600 cycles for our 3.10 = product kernel, >>> both preemption_timer=3DN, Skylake server). >>>=20 >>> This patch adds the capability to automatically tune = lapic_timer_advance_ns >>> step by step, the initial value is 1000ns as d0659d946be05 (KVM: = x86: add >>> option to advance tscdeadline hrtimer expiration) recommended, it = will be >>> reduced when it is too early, and increased when it is too late. The = guest_tsc >>> and tsc_deadline are hard to equal, so we assume we are done when = the delta >>> is within a small scope e.g. 100 cycles. This patch reduces latency >>> (kvm-unit-tests/tscdeadline_latency, busy waits, preemption_timer = enabled) >>> from ~2600 cyles to ~1200 cyles on our Skylake server. >>>=20 >>> Cc: Paolo Bonzini >>> Cc: Radim Kr=C4=8Dm=C3=A1=C5=99 >>> Signed-off-by: Wanpeng Li >>> --- >>> arch/x86/kvm/lapic.c | 7 +++++++ >>> arch/x86/kvm/x86.c | 2 +- >>> 2 files changed, 8 insertions(+), 1 deletion(-) >>>=20 >>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >>> index fbb0e6d..b756f12 100644 >>> --- a/arch/x86/kvm/lapic.c >>> +++ b/arch/x86/kvm/lapic.c >>> @@ -70,6 +70,8 @@ >>> #define APIC_BROADCAST 0xFF >>> #define X2APIC_BROADCAST 0xFFFFFFFFul >>>=20 >>> +static bool __read_mostly lapic_timer_advance_adjust_done =3D = false; >>> + >>> static inline int apic_test_vector(int vec, void *bitmap) >>> { >>> return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); >>> @@ -1492,6 +1494,11 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) >>> if (guest_tsc < tsc_deadline) >>> __delay(min(tsc_deadline - guest_tsc, >>> nsec_to_cycles(vcpu, lapic_timer_advance_ns))); >>> + if (!lapic_timer_advance_adjust_done) { >>> + lapic_timer_advance_ns +=3D (s64)(guest_tsc - = tsc_deadline) / 8; >>=20 >> I don=E2=80=99t understand how this =E2=80=9C/ 8=E2=80=9D converts = between guest TSC units to host nanoseconds. >=20 > Oh, I miss it. In addition, /8 here I mean adjust > lapic_timer_advance_ns step by step. I can observe big fluctuated If that=E2=80=99s the case, I would also put the =E2=80=9C8=E2=80=9D as = a #define to make it more clear of it=E2=80=99s purpose. > value between early and late when running real guest os like linux > instead of kvm-unit-tests. After more testing, I saw > lapic_timer_advance_ns can be overflow since the delta between > guest_tsc and tsc_deadline is too huge. >=20 >>=20 >> I think that instead you should do something like: >> s64 ns =3D (s64)(guest_tsc - tsc_deadline) * 1000000ULL; >> do_div(ns, vcpu->arch.virtual_tsc_khz); >> lapic_timer_advance_ns +=3D ns; >>=20 >>> + if (abs(guest_tsc - tsc_deadline) < 100) >>=20 >> I would put this =E2=80=9C100=E2=80=9D hard-coded value as some = =E2=80=9C#define=E2=80=9D to make code more clear. >=20 > How about something like below: >=20 > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index fbb0e6d..354eb13c 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -70,6 +70,9 @@ > #define APIC_BROADCAST 0xFF > #define X2APIC_BROADCAST 0xFFFFFFFFul >=20 > +static bool __read_mostly lapic_timer_advance_adjust_done =3D false; > +#define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100 > + > static inline int apic_test_vector(int vec, void *bitmap) > { > return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); > @@ -1472,7 +1475,7 @@ static bool lapic_timer_int_injected(struct > kvm_vcpu *vcpu) > void wait_lapic_expire(struct kvm_vcpu *vcpu) > { > struct kvm_lapic *apic =3D vcpu->arch.apic; > - u64 guest_tsc, tsc_deadline; > + u64 guest_tsc, tsc_deadline, ns; >=20 > if (!lapic_in_kernel(vcpu)) > return; > @@ -1492,6 +1495,19 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) > if (guest_tsc < tsc_deadline) > __delay(min(tsc_deadline - guest_tsc, > nsec_to_cycles(vcpu, lapic_timer_advance_ns))); > + if (!lapic_timer_advance_adjust_done) { > + if (guest_tsc < tsc_deadline) { > + ns =3D (tsc_deadline - guest_tsc) * 1000000ULL; > + do_div(ns, vcpu->arch.virtual_tsc_khz); > + lapic_timer_advance_ns -=3D min((unsigned int)ns, > lapic_timer_advance_ns / 8); > + } else { > + ns =3D (guest_tsc - tsc_deadline) * 1000000ULL; > + do_div(ns, vcpu->arch.virtual_tsc_khz); > + lapic_timer_advance_ns +=3D min((unsigned int)ns, > lapic_timer_advance_ns / 8); > + } > + if (ns < LAPIC_TIMER_ADVANCE_ADJUST_DONE) Didn=E2=80=99t you meant to compare here that abs(guest_tsc - = tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE? This is also a good example on why I would also rename = LAPIC_TIMER_ADVANCE_ADJUST_DONE to something which indicates it represents a number in guest TSC units. > + lapic_timer_advance_adjust_done =3D true; > + } > } >=20 > static void start_sw_tscdeadline(struct kvm_lapic *apic) > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index ca71773..1f3f955 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -136,7 +136,7 @@ static u32 __read_mostly tsc_tolerance_ppm =3D = 250; > module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR); >=20 > /* lapic timer advance (tscdeadline mode only) in nanoseconds */ > -unsigned int __read_mostly lapic_timer_advance_ns =3D 0; > +unsigned int __read_mostly lapic_timer_advance_ns =3D 1000; > module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR); > EXPORT_SYMBOL_GPL(lapic_timer_advance_ns);