From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Liran Alon, Wanpeng Li, Sean Christopherson, Paolo Bonzini
Subject: [PATCH 5.0 023/122] KVM: lapic: Allow user to disable adaptive tuning of timer advancement
Date: Mon, 6 May 2019 16:31:21 +0200
Message-Id: <20190506143056.836498988@linuxfoundation.org>
In-Reply-To: <20190506143054.670334917@linuxfoundation.org>
References: <20190506143054.670334917@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson

commit c3941d9e0ccd48920e4811f133235b3597e5310b upstream.

The introduction of adaptive tuning of lapic timer advancement did not
allow for the scenario where userspace would want to disable adaptive
tuning but still employ timer advancement, e.g. for testing purposes or
to handle a use case where adaptive tuning is unable to settle on a
suitable time.  This is especially pertinent now that KVM places a hard
threshold on the maximum advancement time.

Rework the timer semantics to accept signed values, with a value of '-1'
being interpreted as "use adaptive tuning with KVM's internal default",
and any other value being used as an explicit advancement time, e.g. a
time of '0' effectively disables advancement.

Note, this does not completely restore the original behavior of
lapic_timer_advance_ns.  Prior to tracking the advancement per vCPU,
which is necessary to support autotuning, userspace could adjust
lapic_timer_advance_ns for a *running* vCPU.  With per-vCPU tracking,
the module params are snapshotted at vCPU creation, i.e. applying a new
advancement effectively requires restarting a VM.

Dynamically updating a running vCPU is possible, e.g. a helper could be
added to retrieve the desired delay, choosing between the global module
param and the per-vCPU value depending on whether or not auto-tuning is
(globally) enabled, but doing so introduces a great deal of complexity.
The wrapper itself is not complex, but understanding and documenting the
effects of dynamically toggling auto-tuning and/or adjusting the timer
advancement is nigh impossible since the behavior would be dependent on
KVM's implementation as well as compiler optimizations.  In other words,
providing stable behavior would require extremely careful consideration
now and in the future.

Given that the expected use of a manually-tuned timer advancement is to
"tune once, run many", use the vastly simpler approach of recognizing
changes to the module params only when creating a new vCPU.

Cc: Liran Alon
Cc: Wanpeng Li
Reviewed-by: Liran Alon
Cc: stable@vger.kernel.org
Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
Signed-off-by: Sean Christopherson
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman
---
 arch/x86/kvm/lapic.c |   11 +++++++++--
 arch/x86/kvm/lapic.h |    2 +-
 arch/x86/kvm/x86.c   |    9 +++++++--
 3 files changed, 17 insertions(+), 5 deletions(-)

--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2260,7 +2260,7 @@ static enum hrtimer_restart apic_timer_f
 	return HRTIMER_NORESTART;
 }
 
-int kvm_create_lapic(struct kvm_vcpu *vcpu, u32 timer_advance_ns)
+int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 {
 	struct kvm_lapic *apic;
 
@@ -2284,7 +2284,14 @@ int kvm_create_lapic(struct kvm_vcpu *vc
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS_PINNED);
 	apic->lapic_timer.timer.function = apic_timer_fn;
-	apic->lapic_timer.timer_advance_ns = timer_advance_ns;
+	if (timer_advance_ns == -1) {
+		apic->lapic_timer.timer_advance_ns = 1000;
+		apic->lapic_timer.timer_advance_adjust_done = false;
+	} else {
+		apic->lapic_timer.timer_advance_ns = timer_advance_ns;
+		apic->lapic_timer.timer_advance_adjust_done = true;
+	}
+
 
 	/*
 	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -64,7 +64,7 @@ struct kvm_lapic {
 
 struct dest_map;
 
-int kvm_create_lapic(struct kvm_vcpu *vcpu, u32 timer_advance_ns);
+int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns);
 void kvm_free_lapic(struct kvm_vcpu *vcpu);
 
 int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -136,8 +136,13 @@ EXPORT_SYMBOL_GPL(kvm_default_tsc_scalin
 static u32 __read_mostly tsc_tolerance_ppm = 250;
 module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
 
-/* lapic timer advance (tscdeadline mode only) in nanoseconds */
-static u32 __read_mostly lapic_timer_advance_ns = 1000;
+/*
+ * lapic timer advance (tscdeadline mode only) in nanoseconds.  '-1' enables
+ * adaptive tuning starting from default advancment of 1000ns.  '0' disables
+ * advancement entirely.  Any other value is used as-is and disables adaptive
+ * tuning, i.e. allows priveleged userspace to set an exact advancement time.
+ */
+static int __read_mostly lapic_timer_advance_ns = -1;
 module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR);
 
 static bool __read_mostly vector_hashing = true;
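
For readers who want to see the new parameter semantics in isolation, here is a minimal userspace sketch of the decision made in kvm_create_lapic() and of the "snapshot at vCPU creation" behavior described in the changelog. It is not kernel code: struct timer_advance_state, struct fake_vcpu, resolve_timer_advance(), create_fake_vcpu() and module_param_value are hypothetical names invented for this illustration. Only the '-1' sentinel, the 1000ns default and the two field assignments come from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two lapic timer fields the patch sets. */
struct timer_advance_state {
	unsigned int advance_ns;	/* advancement in effect for this vCPU */
	bool adjust_done;		/* true: adaptive tuning is disabled */
};

/* Hypothetical stand-in for a vCPU; it holds only the snapshotted state. */
struct fake_vcpu {
	struct timer_advance_state timer;
};

/* Stand-in for the module parameter; -1 is the new default in the patch. */
int module_param_value = -1;

/* Same branch structure as the if/else the patch adds to kvm_create_lapic(). */
struct timer_advance_state resolve_timer_advance(int param)
{
	struct timer_advance_state s;

	if (param == -1) {
		/* Adaptive tuning, starting from the 1000ns default. */
		s.advance_ns = 1000;
		s.adjust_done = false;
	} else {
		/* Explicit value (0 disables advancement); no adaptive tuning. */
		s.advance_ns = (unsigned int)param;
		s.adjust_done = true;
	}
	return s;
}

/* The parameter is read once, at creation; later writes are not seen. */
struct fake_vcpu create_fake_vcpu(void)
{
	struct fake_vcpu v;

	v.timer = resolve_timer_advance(module_param_value);
	return v;
}
```

A vCPU created while module_param_value is -1 keeps tuning adaptively even if the parameter is later written with, say, 500; only vCPUs created after that write pick up the fixed 500ns advancement. The sketch takes the value as a signed int, matching the new variable type. The x86.c hunk leaves the module_param() type string as uint, which the sketch does not model.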