From: Wanpeng Li
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini, Radim Krčmář
Subject: [PATCH 1/3] KVM: X86: Provide userspace with a capability to not intercept MWAIT
Date: Thu, 1 Mar 2018 17:49:40 +0800
Message-Id: <1519897782-8124-1-git-send-email-wanpengli@tencent.com>
X-Mailer: git-send-email 2.7.4
X-Mailing-List: linux-kernel@vger.kernel.org

From: Wanpeng Li

Allowing a guest to execute MWAIT without interception lets the guest
put a (physical) CPU into a power-saving state that takes longer to
return from than the host may want. Don't give a guest that power over
a host by default, especially since nothing prevents a guest from using
MWAIT even when it is not advertised via CPUID.

Cc: Paolo Bonzini
Cc: Radim Krčmář
Signed-off-by: Wanpeng Li
---
 Documentation/virtual/kvm/api.txt | 26 ++++++++++++++++----------
 arch/x86/include/asm/kvm_host.h   |  2 ++
 arch/x86/kvm/svm.c                |  2 +-
 arch/x86/kvm/vmx.c                |  9 +++++----
 arch/x86/kvm/x86.c                | 24 ++++++++++++++++++++----
 arch/x86/kvm/x86.h                | 10 +++++-----
 include/uapi/linux/kvm.h          |  2 +-
 tools/include/uapi/linux/kvm.h    |  2 +-
 8 files changed, 51 insertions(+), 26 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index da3b958..4df35c0 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1041,7 +1041,8 @@ On systems that do not support this ioctl, it always fails. On systems that
 do support it, it only works for extensions that are supported for enablement.
 
 To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
-be used.
+be used. Blindly passing the KVM_CHECK_EXTENSION result to KVM_ENABLE_CAP is
+a valid thing to do when vCPUs are associated to dedicated physical CPUs.
 
 struct kvm_enable_cap {
        /* in */
@@ -4358,6 +4359,20 @@ enables QEMU to build error log and branch to guest kernel registered
 machine check handling routine. Without this capability KVM will
 branch to guests' 0x200 interrupt vector.
 
+7.13 KVM_CAP_X86_DISABLE_EXITS
+
+Architectures: x86
+Parameters: args[0] defines which exits are disabled
+Returns: 0 on success, -EINVAL when args[0] contains invalid exits
+
+Valid exits in args[0] are
+
+#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
+
+Enabling this capability on a VM provides userspace with a way to no
+longer intercept some instructions for improved latency in some
+workloads. Do not enable KVM_FEATURE_PV_UNHALT if you block HLT.
+
 8. Other capabilities.
 ----------------------
 
@@ -4470,15 +4485,6 @@ reserved.
 Both registers and addresses are 64-bits wide.
 It will be possible to run 64-bit or 32-bit guest code.
 
-8.8 KVM_CAP_X86_GUEST_MWAIT
-
-Architectures: x86
-
-This capability indicates that guest using memory monotoring instructions
-(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit. As such time
-spent while virtual CPU is halted in this way will then be accounted for as
-guest running time on the host (as opposed to e.g. HLT).
-
 8.9 KVM_CAP_ARM_USER_IRQ
 
 Architectures: arm, arm64
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index df6720f..6bd754f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -807,6 +807,8 @@ struct kvm_arch {
 
 	gpa_t wall_clock;
 
+	bool mwait_in_guest;
+
 	bool ept_identity_pagetable_done;
 	gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 312f33f..dff3a5d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1389,7 +1389,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_intercept(svm, INTERCEPT_XSETBV);
 	set_intercept(svm, INTERCEPT_RSM);
 
-	if (!kvm_mwait_in_guest()) {
+	if (!kvm_mwait_in_guest(svm->vcpu.kvm)) {
 		set_intercept(svm, INTERCEPT_MONITOR);
 		set_intercept(svm, INTERCEPT_MWAIT);
 	}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2cdbea7..b551067 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3713,13 +3713,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	      CPU_BASED_UNCOND_IO_EXITING |
 	      CPU_BASED_MOV_DR_EXITING |
 	      CPU_BASED_USE_TSC_OFFSETING |
+	      CPU_BASED_MWAIT_EXITING |
+	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING |
 	      CPU_BASED_RDPMC_EXITING;
 
-	if (!kvm_mwait_in_guest())
-		min |= CPU_BASED_MWAIT_EXITING |
-			CPU_BASED_MONITOR_EXITING;
-
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
 	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
@@ -5503,6 +5501,9 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
 		exec_control |= CPU_BASED_CR3_STORE_EXITING |
 				CPU_BASED_CR3_LOAD_EXITING |
 				CPU_BASED_INVLPG_EXITING;
+	if (kvm_mwait_in_guest(vmx->vcpu.kvm))
+		exec_control &= ~(CPU_BASED_MWAIT_EXITING |
+				CPU_BASED_MONITOR_EXITING);
 	return exec_control;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5c93cbc..c1d9bbb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2780,9 +2780,15 @@ static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
 	return r;
 }
 
+static inline bool kvm_mwait_can_in_guest(void)
+{
+	return boot_cpu_has(X86_FEATURE_MWAIT) &&
+		!boot_cpu_has_bug(X86_BUG_MONITOR);
+}
+
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
-	int r;
+	int r = 0;
 
 	switch (ext) {
 	case KVM_CAP_IRQCHIP:
@@ -2838,8 +2844,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ADJUST_CLOCK:
 		r = KVM_CLOCK_TSC_STABLE;
 		break;
-	case KVM_CAP_X86_GUEST_MWAIT:
-		r = kvm_mwait_in_guest();
+	case KVM_CAP_X86_DISABLE_EXITS:
+		if (kvm_mwait_can_in_guest())
+			r |= KVM_X86_DISABLE_EXITS_MWAIT;
 		break;
 	case KVM_CAP_X86_SMM:
 		/* SMBASE is usually relocated above 1M on modern chipsets,
@@ -2880,7 +2887,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = KVM_X2APIC_API_VALID_FLAGS;
 		break;
 	default:
-		r = 0;
 		break;
 	}
 	return r;
@@ -4185,6 +4191,16 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 
 		r = 0;
 		break;
+	case KVM_CAP_X86_DISABLE_EXITS:
+		r = -EINVAL;
+		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
+			break;
+
+		if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) &&
+			kvm_mwait_can_in_guest())
+			kvm->arch.mwait_in_guest = true;
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index b91215d..cd1215e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -2,8 +2,6 @@
 #ifndef ARCH_X86_KVM_X86_H
 #define ARCH_X86_KVM_X86_H
 
-#include
-#include
 #include
 #include
 #include "kvm_cache_regs.h"
@@ -264,10 +262,12 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
 	__rem; \
 })
 
-static inline bool kvm_mwait_in_guest(void)
+#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
+#define KVM_X86_DISABLE_VALID_EXITS (KVM_X86_DISABLE_EXITS_MWAIT)
+
+static inline bool kvm_mwait_in_guest(struct kvm *kvm)
 {
-	return boot_cpu_has(X86_FEATURE_MWAIT) &&
-		!boot_cpu_has_bug(X86_BUG_MONITOR);
+	return kvm->arch.mwait_in_guest;
 }
 
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 088c2c9..1065006 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -929,7 +929,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_GS 140
 #define KVM_CAP_S390_AIS 141
 #define KVM_CAP_SPAPR_TCE_VFIO 142
-#define KVM_CAP_X86_GUEST_MWAIT 143
+#define KVM_CAP_X86_DISABLE_EXITS 143
 #define KVM_CAP_ARM_USER_IRQ 144
 #define KVM_CAP_S390_CMMA_MIGRATION 145
 #define KVM_CAP_PPC_FWNMI 146
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index 8fb90a0..a81df22 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -924,7 +924,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_GS 140
 #define KVM_CAP_S390_AIS 141
 #define KVM_CAP_SPAPR_TCE_VFIO 142
-#define KVM_CAP_X86_GUEST_MWAIT 143
+#define KVM_CAP_X86_DISABLE_EXITS 143
 #define KVM_CAP_ARM_USER_IRQ 144
 #define KVM_CAP_S390_CMMA_MIGRATION 145
 #define KVM_CAP_PPC_FWNMI 146
-- 
2.7.4
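
For reference, a userspace caller might drive the KVM_CAP_X86_DISABLE_EXITS
flow documented in section 7.13 roughly as follows. This is only a sketch
and is not part of the patch: pick_disable_exits() and disable_mwait_exits()
are hypothetical names, vm_fd is assumed to come from KVM_CREATE_VM, and the
two constants are copied from this patch under #ifndef guards in case the
installed headers do not export them. Because the patch reads
kvm->arch.mwait_in_guest in init_vmcb() and vmx_exec_control(), the sketch
assumes the capability is enabled before any vCPU is created.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Values copied from this patch; guarded in case the headers define them. */
#ifndef KVM_CAP_X86_DISABLE_EXITS
#define KVM_CAP_X86_DISABLE_EXITS 143
#endif
#ifndef KVM_X86_DISABLE_EXITS_MWAIT
#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
#endif

/* Hypothetical helper: keep only exits that are both supported and wanted. */
uint64_t pick_disable_exits(uint64_t supported, uint64_t wanted)
{
	return supported & wanted;
}

/*
 * Hypothetical helper: ask KVM which exits can be disabled, then request
 * that MWAIT/MONITOR no longer be intercepted for this VM.
 * Returns 0 on success or a negative errno value on failure.
 */
int disable_mwait_exits(int vm_fd)
{
	struct kvm_enable_cap cap;
	int supported;

	/* With this patch, the result is a bitmask of exits that may be disabled. */
	supported = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_X86_DISABLE_EXITS);
	if (supported < 0)
		return -errno;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_X86_DISABLE_EXITS;
	cap.args[0] = pick_disable_exits((uint64_t)supported,
					 KVM_X86_DISABLE_EXITS_MWAIT);
	if (!cap.args[0])
		return -ENOTSUP;	/* host cannot run MWAIT in the guest */

	/* The kernel side returns -EINVAL if args[0] has unknown bits set. */
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return -errno;

	return 0;
}
```

Masking the KVM_CHECK_EXTENSION result rather than passing it through
unchanged keeps the caller from opting in to exits it did not ask for if
more bits are added later; per the api.txt text in this patch, passing the
result blindly is also valid when vCPUs are pinned to dedicated physical
CPUs.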