Received: by 10.223.176.5 with SMTP id f5csp3479861wra; Mon, 29 Jan 2018 13:53:22 -0800 (PST) X-Google-Smtp-Source: AH8x227MjXPVYOmaVx5XSvduehuC4CNzlC3k2wrHC8+E6Sfcb1aAjtFa/R4B6oP4No7Yxc4swqlk X-Received: by 10.98.161.16 with SMTP id b16mr28348723pff.34.1517262801971; Mon, 29 Jan 2018 13:53:21 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1517262801; cv=none; d=google.com; s=arc-20160816; b=yvEtR3u402Vdqxe4f8vEqKzBb2wYse18gcDFeIMPVmygkjeQmRGBur1SskjakArx3V 2o9t5gf5NCc6eTc/kn+6lARzJAGn6Qu16aoFhPZK4VrA6Qr1/y7wAo6fnr9alDnOiC1Q AjfoG1XWLLOuCg7MCkmQUdlDZlFXXFgqICIM9b8GbNyMxV8RIAVm2Rg41GS62HuRNXHn lGvVHtpPB4ybfE1qt0b592BZzSxTBaORt/66qDFOg9gA4BLooZ2pVnhY5Q1VlWCVVEhf rEAV6Av6ekB/Crg9q1x1xfwF5b62xTNpDzCNSBfiwbu3aFCQznOnE9KRVBE4BWI8YUxQ oD4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature:arc-authentication-results; bh=oVL7HqjRGeUWaY+Mo900ryXXrOmwUdops4W1o57vjcg=; b=epKGDfkrtFSZsbfAs+vbsbDR8dTJbbo0KAZk/iTLyyTOo8h4QvAnL6UxXAkIDFp5b6 awFidZnJ/ygPgkozuyCt9wgtwHZxkMIBASFNYCvk6jL3MalQwJIh6VR0L4M8w8ywteIT G1NyklxI/Z+2RSgVhXYULndBcsVhxMTyPr/rWNG4U0A+Rv1N4LqmYYqor2nXdNYVaqqr kUKwE1gs7IXPYnZfrQb0l+Xp2SG1fh9T7FRYFnoPuzELo+D6di5sWJ7t85v+/mq/Lqeb dKr7rBaZtl4QJwfyC/7ou2TOEcseaRmZ1/rnZLcG3BxokM4D86jHOV35sRzmIHHvuSfz 0ihw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2017-10-26 header.b=oC/u1+2U; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 33-v6si10016931plg.793.2018.01.29.13.53.07; Mon, 29 Jan 2018 13:53:21 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2017-10-26 header.b=oC/u1+2U; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752155AbeA2Vvl (ORCPT + 99 others); Mon, 29 Jan 2018 16:51:41 -0500 Received: from userp2120.oracle.com ([156.151.31.85]:59618 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751436AbeA2Vvi (ORCPT ); Mon, 29 Jan 2018 16:51:38 -0500 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w0TLkn3S146016; Mon, 29 Jan 2018 21:50:03 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=date : from : to : cc : subject : message-id : references : mime-version : content-type : in-reply-to; s=corp-2017-10-26; bh=oVL7HqjRGeUWaY+Mo900ryXXrOmwUdops4W1o57vjcg=; b=oC/u1+2UJoHYNPcQ9slpuM2yDFJErxY8Ew2+hX2bPI37y8xuES7kOBrpCg0WIxSyFyTx SAg8iHwfhLhTe99vzDdAGI0lEEKSnO4xPnxUlsfKwa60CO6penyRLAOrnIj/YMXuvrFM pVAJbyrjwLbPaNbLMA7TO+DUH5cI+5lVrHRPxt7poaGxVrjLzfalH5kZh6qsBEXsV/JC X5G6+L2AwzaruhHRo1LXp4e/Obt+hHz4G7vuDxhjWu9Hebl5OfjdmUHP9XbWlj51Knw4 Jl8/3miFQboOTWuSy46K5P43ecMPtR4TkwS4YnO5z8Lv/W3L+fXEzj1K6XY06zWp25JB cw== Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by userp2120.oracle.com with ESMTP id 2ftbnag38a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 29 Jan 2018 21:50:03 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w0TLo27M030455 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Mon, 29 Jan 2018 21:50:02 GMT Received: from abhmp0015.oracle.com (abhmp0015.oracle.com [141.146.116.21]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w0TLo0Ru010725; Mon, 29 Jan 2018 21:50:00 GMT Received: from olila.local.net-space.pl (/10.175.182.146) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 29 Jan 2018 13:49:59 -0800 Date: Mon, 29 Jan 2018 22:49:52 +0100 From: Daniel Kiper To: Konrad Rzeszutek Wilk Cc: David Woodhouse , Mihai Carabas , Liran Alon , luto@kernel.org, tglx@linutronix.de, torvalds@linux-foundation.org, gregkh@linuxfoundation.org, asit.k.mallick@intel.com, dave.hansen@intel.com, karahmed@amazon.de, jun.nakajima@intel.com, dan.j.williams@intel.com, ashok.raj@intel.com, arjan.van.de.ven@intel.com, tim.c.chen@linux.intel.com, pbonzini@redhat.com, linux-kernel@vger.kernel.org, ak@linux.intel.com, kvm@vger.kernel.org, aarcange@redhat.com Subject: Re: [PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL Message-ID: <20180129214952.GI16863@olila.local.net-space.pl> References: <6b9a1ec2-5ebd-4624-a825-3f31db5cefb5@default> <1517215563.6624.118.camel@infradead.org> <20180129173113.GO22045@char.us.oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180129173113.GO22045@char.us.oracle.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8789 signatures=668655 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1801290278 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jan 29, 2018 at 12:31:13PM -0500, Konrad Rzeszutek Wilk wrote: > On Mon, Jan 29, 2018 at 08:46:03AM +0000, David Woodhouse wrote: > > On Sun, 2018-01-28 at 16:39 -0800, Liran Alon wrote: > > > > > > Windows use IBRS and Microsoft don't have any plans to switch to retpoline. > > > Running a Windows guest should be a pretty common use-case no? > > > > > > In addition, your handle of the first WRMSR intercept could be different. > > > It could signal you to start doing the following: > > > 1. Disable intercept on SPEC_CTRL MSR. > > > 2. On VMEntry, Write vCPU SPEC_CTRL value into physical MSR. > > > 3. On VMExit, read physical MSR into vCPU SPEC_CTRL value. > > > (And if IBRS is used at host, also set physical SPEC_CTRL MSR here to 1) > > > > > > That way, you will both have fastest option as long as guest don't use IBRS > > > and also won't have the 3% performance hit compared to Konrad's proposal. > > > > > > Am I missing something? > > > > Reads from the SPEC_CTRL MSR are strangely slow. I suspect a large part > > of the 3% speedup you observe is because in the above, the vmentry path > > doesn't need to *read* the host's value and store it; the host is > > expected to restore it for itself anyway? > > Yes for at least the purpose of correctness. That is based on what I have > heard is that you when you transition to a higher ring you have to write 1, then write zero > when you transition back to lower rings. That is it is like a knob. > > But then I heard that on some CPUs it is more like reset button and > just writting 1 to IBRS is fine. But again, correctness here. > > > > > I'd actually quite like to repeat the benchmark on the new fixed > > microcode, if anyone has it yet, to see if that read/swap slowness is > > still quite as excessive. I'm certainly not ruling this out, but I'm > > just a little wary of premature optimisation, and I'd like to make sure > > we have everything *else* in the KVM patches right first. > > > > The fact that the save-and-restrict macros I have in the tip of my > > working tree at the moment are horrid and causing 0-day nastygrams, > > probably doesn't help persuade me to favour the approach ;) > > > > ... hm, the CPU actually has separate MSR save/restore lists for > > entry/exit, doesn't it? Is there any way to sanely make use of that and > > do the restoration manually on vmentry but let it be automatic on > > vmexit, by having it *only* in the guest's MSR-store area to be saved > > on exit and restored on exit, but *not* in the host's MSR-store area? s/on exit and restored on exit/on exit and restored on entry/? Additionally, AIUI there is no "host's MSR-store area". > Oh. That sounds sounds interesting That is possible but you have to split add_atomic_switch_msr() accordingly. > > Reading the code and comparing with the SDM, I can't see where we're > > ever setting VM_EXIT_MSR_STORE_{ADDR,COUNT} except in the nested > > case... > > Right. We (well Daniel Kiper, CC-ed) implemented it for this and > that is where we got the numbers. > > Daniel, you OK posting it here? Granted with the caveats thta it won't even > compile against upstream as it was based on a distro kernel. Please look below... diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index aa9bc4f..e7c0f8b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -162,8 +162,10 @@ module_param(ple_window_max, int, S_IRUGO); extern const ulong vmx_return; -#define NR_AUTOLOAD_MSRS 8 -#define VMCS02_POOL_SIZE 1 +#define NR_AUTOLOAD_MSRS 8 +#define NR_AUTOSTORE_MSRS NR_AUTOLOAD_MSRS + +#define VMCS02_POOL_SIZE 1 struct vmcs { u32 revision_id; @@ -504,6 +506,10 @@ struct vcpu_vmx { struct vmx_msr_entry guest[NR_AUTOLOAD_MSRS]; struct vmx_msr_entry host[NR_AUTOLOAD_MSRS]; } msr_autoload; + struct msr_autostore { + unsigned nr; + struct vmx_msr_entry guest[NR_AUTOSTORE_MSRS]; + } msr_autostore; struct { int loaded; u16 fs_sel, gs_sel, ldt_sel; @@ -1704,6 +1710,37 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, m->host[i].value = host_val; } +static void add_atomic_store_msr(struct vcpu_vmx *vmx, unsigned msr) +{ + unsigned i; + struct msr_autostore *m = &vmx->msr_autostore; + + for (i = 0; i < m->nr; ++i) + if (m->guest[i].index == msr) + return; + + if (i == NR_AUTOSTORE_MSRS) { + pr_err("Not enough msr store entries. Can't add msr %x\n", msr); + BUG(); + } + + m->guest[i].index = msr; + vmcs_write32(VM_EXIT_MSR_STORE_COUNT, ++m->nr); +} + +static u64 get_msr_vmcs_store(struct vcpu_vmx *vmx, unsigned msr) +{ + unsigned i; + struct msr_autostore *m = &vmx->msr_autostore; + + for (i = 0; i < m->nr; ++i) + if (m->guest[i].index == msr) + return m->guest[i].value; + + pr_err("Can't find msr %x in VMCS store\n", msr); + BUG(); +} + static void reload_tss(void) { /* @@ -4716,6 +4753,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) #endif vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 0); + vmcs_write64(VM_EXIT_MSR_STORE_ADDR, __pa(vmx->msr_autostore.guest)); vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, 0); vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host)); vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 0); @@ -8192,8 +8230,11 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) vmx->__launched = vmx->loaded_vmcs->launched; - if (ibrs_inuse) - wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); + if (ibrs_inuse) { + add_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, vmx->spec_ctrl, + SPEC_CTRL_FEATURE_ENABLE_IBRS); + add_atomic_store_msr(vmx, MSR_IA32_SPEC_CTRL); + } asm( /* Store host registers */ @@ -8317,12 +8358,11 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) #endif ); - if (ibrs_inuse) { - rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); - wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS); - } stuff_RSB(); + if (ibrs_inuse) + vmx->spec_ctrl = get_msr_vmcs_store(vmx, MSR_IA32_SPEC_CTRL); + /* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */ if (debugctlmsr) update_debugctlmsr(debugctlmsr); By the way, I saw in one version of patches that you use own rdmsrl()/wrmsrl() functions to save/restore IBRS instead of using common ones (I mean native_rdmsrl()/native_wrmsrl(), safe__rdmsrl()/safe_wrmsrl(), etc.) available in the kernel. Could you explain why do you do that? Daniel