Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp3821306ybe; Mon, 16 Sep 2019 01:44:31 -0700 (PDT) X-Google-Smtp-Source: APXvYqwU63MhBbjycwh/WPlTqzgPwOc97/i7TYriDW06kTjcOw2IReC2pycyl1vQ+8fVsqH8J0v0 X-Received: by 2002:a17:906:6818:: with SMTP id k24mr49564195ejr.271.1568623470937; Mon, 16 Sep 2019 01:44:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1568623470; cv=none; d=google.com; s=arc-20160816; b=YNSxJGvVAcJ5m8Ve/OyC/3imNQq+a0vm6ZicDvQXPMZiwTeGxMATksLnlxVCSrOG6I bJORQQRAZSmAY+hOU6aJZaWIKeU8YXdgX0UZV91nBb7h3AcepU+eDj95XtQghv6lMIJL UyhSh9C7zcSXF0HUNL7/FkeeuJyZZIY0tGnKzyjAhpN6ZJOj6531o7dt8dP/CAUIic3z GmRIiOATGtklKXN62OwIqjgAdHh/ofeOC8RkscoP0W+Ss1fkEKchZ9gGi/icfl+MDzwX TywMzQpJuV2DQbf0mpqTLDLZY4J5wtLuJ0/a25NzGYDz+Wu9amLYSm0YMf41Ag5JLgB0 1xvg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=MO99i9mXaEOJiKYYZONOtDCqaLIAO0yEzXnTE3xFd9w=; b=lVnoCnWzBcIjFPvV5KX0edaBselERnX/ZtLVbsq4FB08Hln/2EcslxhrQJg3feqx79 ZIjC3/jbRzGL1/2BT2c/288/g54WSHeMrAeQeivZM6yHJcjVVdSn5RXR28b7MvIwPyqc Dwl7LTjlrVkGL5RQfV8vSzJ1q/WyIBu6aDmAXJeYMHR6Dwz5COxJXTq/+xQioBHAdYVM WIfdgPH6jpAhR53FASiHljnDI34E5ihPphJKkYAsZIux2496GhpuKNVQyNmk0fbrCauy 503ohg9Q6l7P6crNzmxHrp3uOpVy1HC3gbj9vIB1voKyR+ONXHBWHriBCHLWLHrdDXut 8Fqw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=axVUIv6t; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l8si16533312ejq.352.2019.09.16.01.44.07; Mon, 16 Sep 2019 01:44:30 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=axVUIv6t; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726055AbfIPEDC (ORCPT + 99 others); Mon, 16 Sep 2019 00:03:02 -0400 Received: from mail-ot1-f66.google.com ([209.85.210.66]:42637 "EHLO mail-ot1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725730AbfIPEDC (ORCPT ); Mon, 16 Sep 2019 00:03:02 -0400 Received: by mail-ot1-f66.google.com with SMTP id c10so34357754otd.9; Sun, 15 Sep 2019 21:03:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=MO99i9mXaEOJiKYYZONOtDCqaLIAO0yEzXnTE3xFd9w=; b=axVUIv6tfRpmCQypOp3Nufj7NSNtRNUeb9NufEPHAGxJ8SgNj6UdRWuWFtLsXSFNnT 7ZaiW+4NvLUY2+hdrgMM1tJRxLt18fJhxmDyBBHMtfsHe8YAxRAJNRCDfSZxVFfLF1Di /iWmIyQd7QdGuj5kaQKG9c816QfBhnpuM1PcovlSNK+rCG/Zxm5bdViHQEuVGlP5NjaK 6AuUq2ILZfd9YWFQaKOVhsiDYKCgNeUTrKFjokX4hqhS0z5s+yjz6bCKV2HvJPfO9jIN mgnmyWcsxgksSx71IbPiCPWQTHk1yw8pYzz2/M9JBJv0gwBX17Uzk76vci9c+p2/Gcqp kqqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=MO99i9mXaEOJiKYYZONOtDCqaLIAO0yEzXnTE3xFd9w=; b=S5VgOErPQ8OswcVGdo6++c4dYPYtL+JGeAaAPFM5nw6Zyr5x9fFUQJ+Z3kICIEFf27 83WQ9u8hhA4lt+m9/ZNF1Z9HMf9SLscII2jErUXbtOtHYuvmNnXlNr4Wa5eY1yn5P0hJ 2krGtDFDcYXS/id7HwpzAEurwdLy0eH/4+QDjVeKAxpQFugCbVywc3CU7YWUchxFKUKX eoizv4qU5o3igRhscUvL9C1t2kzsacFPalnQxy4SZOCvhHubcMwyN1n3dysTE0Xy0Snq Om74K+XTUDKixzYzrxs5A6sjXsNOCetUFrU3guQ50ajqjSfvS9VriIl9GsrThOFlwIuI FL2g== X-Gm-Message-State: APjAAAVDN26kJUgUdfNQZUAfrhjGE4pSqbeZsQDnGECZ/3obpMSCmn5I NGkze3xVjXr9HUxwbBZfWBrqCuCJ7eBssKg/90w= X-Received: by 2002:a9d:4597:: with SMTP id x23mr31647946ote.185.1568606580868; Sun, 15 Sep 2019 21:03:00 -0700 (PDT) MIME-Version: 1.0 References: <1566980342-22045-1-git-send-email-wanpengli@tencent.com> <8054e73d-1e09-0f98-4beb-3caa501f2ac7@redhat.com> In-Reply-To: <8054e73d-1e09-0f98-4beb-3caa501f2ac7@redhat.com> From: Wanpeng Li Date: Mon, 16 Sep 2019 12:02:49 +0800 Message-ID: Subject: Re: [PATCH v3 1/2] KVM: LAPIC: Tune lapic_timer_advance_ns smoothly To: Paolo Bonzini Cc: Wanpeng Li , "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 12 Sep 2019 at 20:37, Paolo Bonzini wrote: > > On 12/09/19 02:34, Wanpeng Li wrote: > >>> - timer_advance_ns -= min((u32)ns, > >>> - timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP); > >>> + timer_advance_ns -= ns; > > Looking more closely, this assignment... > > >>> } else { > >>> /* too late */ > >>> ns = advance_expire_delta * 1000000ULL; > >>> do_div(ns, vcpu->arch.virtual_tsc_khz); > >>> - timer_advance_ns += min((u32)ns, > >>> - timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP); > >>> + timer_advance_ns += ns; > > ... and this one are dead code now. However... > > >>> } > >>> > >>> + timer_advance_ns = (apic->lapic_timer.timer_advance_ns * > >>> + (LAPIC_TIMER_ADVANCE_ADJUST_STEP - 1) + advance_expire_delta) / > >>> + LAPIC_TIMER_ADVANCE_ADJUST_STEP; > > ... you should instead remove this new assignment and just make the > assignments above just > > timer_advance -= ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP; > > and > > timer_advance -= ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP; > > In fact this whole last assignment is buggy, since advance_expire_delta > is in TSC units rather than nanoseconds. > > >>> if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) > >>> apic->lapic_timer.timer_advance_adjust_done = true; > >>> if (unlikely(timer_advance_ns > 5000)) { > >> This looks great. But instead of patch 2, why not remove > >> timer_advance_adjust_done altogether? > > It can fluctuate w/o stop. > > Possibly because of the wrong calculation of timer_advance_ns? Hi Paolo, Something like below? It still fluctuate when running complex guest os like linux. Removing timer_advance_adjust_done will hinder introduce patch v3 2/2 since there is no adjust done flag in each round evaluation. I have two questions here, best-effort tune always as below or periodically revaluate to get conservative value and get best-effort value in each round evaluation as patch v3 2/2, which one do you prefer? The former one can wast time to wait sometimes and the later one can not get the best latency. In addition, can the adaptive tune algorithm be using in all the scenarios contain RT/over-subscribe? ---8<--- diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 685d17c..895735b 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -69,6 +69,7 @@ #define LAPIC_TIMER_ADVANCE_ADJUST_INIT 1000 /* step-by-step approximation to mitigate fluctuation */ #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8 +#define LAPIC_TIMER_ADVANCE_FILTER 5000 static inline int apic_test_vector(int vec, void *bitmap) { @@ -1479,29 +1480,28 @@ static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu, s64 advance_expire_delta) { struct kvm_lapic *apic = vcpu->arch.apic; - u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns; - u64 ns; + u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns, ns; + + if (abs(advance_expire_delta) > LAPIC_TIMER_ADVANCE_FILTER || + abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) { + /* filter out random fluctuations */ + return; + } /* too early */ if (advance_expire_delta < 0) { ns = -advance_expire_delta * 1000000ULL; do_div(ns, vcpu->arch.virtual_tsc_khz); - timer_advance_ns -= min((u32)ns, - timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP); + timer_advance_ns -= ns/LAPIC_TIMER_ADVANCE_ADJUST_STEP; } else { /* too late */ ns = advance_expire_delta * 1000000ULL; do_div(ns, vcpu->arch.virtual_tsc_khz); - timer_advance_ns += min((u32)ns, - timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP); + timer_advance_ns += ns/LAPIC_TIMER_ADVANCE_ADJUST_STEP; } - if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) - apic->lapic_timer.timer_advance_adjust_done = true; - if (unlikely(timer_advance_ns > 5000)) { + if (unlikely(timer_advance_ns > 5000)) timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT; - apic->lapic_timer.timer_advance_adjust_done = false; - } apic->lapic_timer.timer_advance_ns = timer_advance_ns; } @@ -1521,7 +1521,7 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu) if (guest_tsc < tsc_deadline) __wait_lapic_expire(vcpu, tsc_deadline - guest_tsc); - if (unlikely(!apic->lapic_timer.timer_advance_adjust_done)) + if (lapic_timer_advance_ns == -1) adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta); } @@ -2298,10 +2298,8 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns) apic->lapic_timer.timer.function = apic_timer_fn; if (timer_advance_ns == -1) { apic->lapic_timer.timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT; - apic->lapic_timer.timer_advance_adjust_done = false; } else { apic->lapic_timer.timer_advance_ns = timer_advance_ns; - apic->lapic_timer.timer_advance_adjust_done = true; } diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 50053d2..2aad7e2 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -35,7 +35,6 @@ struct kvm_timer { s64 advance_expire_delta; atomic_t pending; /* accumulated triggered timers */ bool hv_timer_in_use; - bool timer_advance_adjust_done; }; struct kvm_lapic { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 93b0bd4..4f65ef1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -141,7 +141,7 @@ * advancement entirely. Any other value is used as-is and disables adaptive * tuning, i.e. allows priveleged userspace to set an exact advancement time. */ -static int __read_mostly lapic_timer_advance_ns = -1; +int __read_mostly lapic_timer_advance_ns = -1; module_param(lapic_timer_advance_ns, int, S_IRUGO | S_IWUSR); static bool __read_mostly vector_hashing = true; diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 6594020..2c6ba86 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -301,6 +301,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2, extern bool enable_vmware_backdoor; +extern int lapic_timer_advance_ns; extern int pi_inject_timer; extern struct static_key kvm_no_apic_vcpu;