Received: by 10.223.176.5 with SMTP id f5csp3546725wra; Mon, 29 Jan 2018 15:02:51 -0800 (PST) X-Google-Smtp-Source: AH8x225wW/h3Qr9lD48PhqxkWvRPNJkodroyv+c/wxgs6JJDl178OIjPOK1KaGFrU2mHFoJTG9h2 X-Received: by 2002:a17:902:7887:: with SMTP id q7-v6mr177431pll.385.1517266971120; Mon, 29 Jan 2018 15:02:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1517266971; cv=none; d=google.com; s=arc-20160816; b=loylbZRFC39yw3EMSekEyP1Ze2mWNP31Uqm0Yi09+gq4Xyxg2ZoshwdHqthtWajl15 OFflc3YdvAq8RS3BriobRm5rNxITeibVBZmlprxEKLFuj92JFpm9aOa1az9MLLkVSlB1 KBn8ttXwVU62b4aAbjpcqUgtqUqb7StACjq+CL6WT8ziaKHJH9xoFUPsCxMskXq8PvGH 0WC5GVXn8RjQPozUPuCLQ8/+X1uUGOWmOof2tqCEJ7Bz56li2V5RoMBsIYHFquWActML qILXLw9QQw8fhrG+vfS5/DfxIh1gmZF9yAjqGD+7hl6pzluS+RWbY93Hat/ApuCwfzbO cd9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=GXhYsaD7IfckB8FHAi709uSS6UDB+9NpQQcoQbp+yRs=; b=TGuaPlL2BOSggDARKv0vjUrFDO/WM30svT104JsEA4c+cfXlelhckqXkTAJQZ7//O6 raGugzqzpPHm4yFSdgzz4TaYe7TNdcEKm4ZlLPM6QvdXWW8mMygXtgB8CCBo5SbmZqc2 z2ZyXsyB09rnFBz//Eq370AEuCofGDQ7zubxDa3QpPUG0S4+gu/Wld+BPgbajtD/qPn3 hJlCZTNkjEVLr+WCBI77TcwmZkiggRtO88LCyelxahoRFP0VAMFQqbo/9JLEP9HjIdqT sKP6L32PKWo1X6k9hbxdhDAeOr+FnMBWDcaRwZ/2iRQ0Uf0+Yzi2/WoNkoVk2uWYjPo3 gkPA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=ZEhKbdpP; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e13-v6si909778plo.242.2018.01.29.15.02.36; Mon, 29 Jan 2018 15:02:51 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=ZEhKbdpP; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751837AbeA2XBP (ORCPT + 99 others); Mon, 29 Jan 2018 18:01:15 -0500 Received: from mail-ua0-f194.google.com ([209.85.217.194]:42068 "EHLO mail-ua0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751296AbeA2XBN (ORCPT ); Mon, 29 Jan 2018 18:01:13 -0500 Received: by mail-ua0-f194.google.com with SMTP id n2so5852876uak.9 for ; Mon, 29 Jan 2018 15:01:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=GXhYsaD7IfckB8FHAi709uSS6UDB+9NpQQcoQbp+yRs=; b=ZEhKbdpPoDepXJdMJtbwP/GTHAGsvt4g4IjZaki+FWtVs4iA+KY/BLaT/4WjTxaXl3 IJLkvGc8xsBB1+/QsuQKxASIvGOXsPO2EJdLnakvH7ppbqa1pVSAQdPipoHRJKPGIgnO vJaUZ/Bl9oKVIuwouVBkAazTs/Ozeue4+5psg6ZqNUaZanoZuVf/h3gBUFGu8i/wZUO1 4ZrVggI1jICmkGjD7SO+bTTfuve3PWY69dG4jKYmHg9AclfUKGVE4m/ly5Ifs0S5emTg 7IyFNTCIouGnuuynHcXNUDZrJNenhqoelsP6QBg20QIi17V5myNv+Xp7538svgpjEGQi 0gUg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=GXhYsaD7IfckB8FHAi709uSS6UDB+9NpQQcoQbp+yRs=; b=BBRQG0PDOhCCSW3pPyvo19Irb8jLtAs/hkJ9OUVSVkOVBZa7V67bTTxzMSkzxnip3S 4ubhXCFrJia+GCXnryX3cIc5SiZjSRQKp1Y4naHwFugVUE/bvkhG2cUsJTFgHd4TYSbG 7uzrFXmPof/zHZ81KoSe9HgkADgHvHQlCMoAcYDRGKrdMGCSeX7Yo70ibzysM1J7dlUP goTFr7uw9CkOopZIw6Vzceye78G9sCetFK2j2fd2E7SaPSTvaBYv+byDhfdmakAQt3H6 B1ihBia+1izsWWV276D+V6+SOh59SUZ7vVGi30GOqGrLXCVku8heHhQi8Hx9g5SimkPy 9T8A== X-Gm-Message-State: AKwxyteGEVqr+hAJ4lHULVBruw2UurNjnQ+3qy0huxX78BQyimYShkA+ qyNkk1gFkdPeF34yxcbODfz+g/8+Jchul0LuPd7ykQ== X-Received: by 10.176.16.15 with SMTP id f15mr20547216uab.77.1517266871723; Mon, 29 Jan 2018 15:01:11 -0800 (PST) MIME-Version: 1.0 Received: by 10.103.124.200 with HTTP; Mon, 29 Jan 2018 15:01:10 -0800 (PST) In-Reply-To: <20180129214952.GI16863@olila.local.net-space.pl> References: <6b9a1ec2-5ebd-4624-a825-3f31db5cefb5@default> <1517215563.6624.118.camel@infradead.org> <20180129173113.GO22045@char.us.oracle.com> <20180129214952.GI16863@olila.local.net-space.pl> From: Jim Mattson Date: Mon, 29 Jan 2018 15:01:10 -0800 Message-ID: Subject: Re: [PATCH] x86: vmx: Allow direct access to MSR_IA32_SPEC_CTRL To: Daniel Kiper Cc: Konrad Rzeszutek Wilk , David Woodhouse , Mihai Carabas , Liran Alon , Andy Lutomirski , Thomas Gleixner , Linus Torvalds , Greg Kroah-Hartman , Asit Mallick , Dave Hansen , KarimAllah Ahmed , Jun Nakajima , Dan Williams , Ashok Raj , "Van De Ven, Arjan" , Tim Chen , Paolo Bonzini , LKML , Andi Kleen , kvm list , Andrea Arcangeli Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jan 29, 2018 at 1:49 PM, Daniel Kiper wrote: > On Mon, Jan 29, 2018 at 12:31:13PM -0500, Konrad Rzeszutek Wilk wrote: >> On Mon, Jan 29, 2018 at 08:46:03AM +0000, David Woodhouse wrote: >> > On Sun, 2018-01-28 at 16:39 -0800, Liran Alon wrote: >> > > >> > > Windows use IBRS and Microsoft don't have any plans to switch to retpoline. >> > > Running a Windows guest should be a pretty common use-case no? >> > > >> > > In addition, your handle of the first WRMSR intercept could be different. >> > > It could signal you to start doing the following: >> > > 1. Disable intercept on SPEC_CTRL MSR. >> > > 2. On VMEntry, Write vCPU SPEC_CTRL value into physical MSR. >> > > 3. On VMExit, read physical MSR into vCPU SPEC_CTRL value. >> > > (And if IBRS is used at host, also set physical SPEC_CTRL MSR here to 1) >> > > >> > > That way, you will both have fastest option as long as guest don't use IBRS >> > > and also won't have the 3% performance hit compared to Konrad's proposal. >> > > >> > > Am I missing something? >> > >> > Reads from the SPEC_CTRL MSR are strangely slow. I suspect a large part >> > of the 3% speedup you observe is because in the above, the vmentry path >> > doesn't need to *read* the host's value and store it; the host is >> > expected to restore it for itself anyway? >> >> Yes for at least the purpose of correctness. That is based on what I have >> heard is that you when you transition to a higher ring you have to write 1, then write zero >> when you transition back to lower rings. That is it is like a knob. >> >> But then I heard that on some CPUs it is more like reset button and >> just writting 1 to IBRS is fine. But again, correctness here. >> >> > >> > I'd actually quite like to repeat the benchmark on the new fixed >> > microcode, if anyone has it yet, to see if that read/swap slowness is >> > still quite as excessive. I'm certainly not ruling this out, but I'm >> > just a little wary of premature optimisation, and I'd like to make sure >> > we have everything *else* in the KVM patches right first. >> > >> > The fact that the save-and-restrict macros I have in the tip of my >> > working tree at the moment are horrid and causing 0-day nastygrams, >> > probably doesn't help persuade me to favour the approach ;) >> > >> > ... hm, the CPU actually has separate MSR save/restore lists for >> > entry/exit, doesn't it? Is there any way to sanely make use of that and >> > do the restoration manually on vmentry but let it be automatic on >> > vmexit, by having it *only* in the guest's MSR-store area to be saved >> > on exit and restored on exit, but *not* in the host's MSR-store area? > > s/on exit and restored on exit/on exit and restored on entry/? > > Additionally, AIUI there is no "host's MSR-store area". > >> Oh. That sounds sounds interesting > > That is possible but you have to split add_atomic_switch_msr() accordingly. > >> > Reading the code and comparing with the SDM, I can't see where we're >> > ever setting VM_EXIT_MSR_STORE_{ADDR,COUNT} except in the nested >> > case... >> >> Right. We (well Daniel Kiper, CC-ed) implemented it for this and >> that is where we got the numbers. >> >> Daniel, you OK posting it here? Granted with the caveats thta it won't even >> compile against upstream as it was based on a distro kernel. > > Please look below... > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index aa9bc4f..e7c0f8b 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -162,8 +162,10 @@ module_param(ple_window_max, int, S_IRUGO); > > extern const ulong vmx_return; > > -#define NR_AUTOLOAD_MSRS 8 > -#define VMCS02_POOL_SIZE 1 > +#define NR_AUTOLOAD_MSRS 8 > +#define NR_AUTOSTORE_MSRS NR_AUTOLOAD_MSRS > + > +#define VMCS02_POOL_SIZE 1 I think you accidentally resurrected VMCS02_POOL_SIZE. > struct vmcs { > u32 revision_id; > @@ -504,6 +506,10 @@ struct vcpu_vmx { > struct vmx_msr_entry guest[NR_AUTOLOAD_MSRS]; > struct vmx_msr_entry host[NR_AUTOLOAD_MSRS]; > } msr_autoload; > + struct msr_autostore { > + unsigned nr; > + struct vmx_msr_entry guest[NR_AUTOSTORE_MSRS]; > + } msr_autostore; > struct { > int loaded; > u16 fs_sel, gs_sel, ldt_sel; > @@ -1704,6 +1710,37 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, > m->host[i].value = host_val; > } > > +static void add_atomic_store_msr(struct vcpu_vmx *vmx, unsigned msr) > +{ > + unsigned i; > + struct msr_autostore *m = &vmx->msr_autostore; > + > + for (i = 0; i < m->nr; ++i) > + if (m->guest[i].index == msr) > + return; > + > + if (i == NR_AUTOSTORE_MSRS) { > + pr_err("Not enough msr store entries. Can't add msr %x\n", msr); > + BUG(); I wouldn't panic the host for this. add_atomic_switch_msr() just emits a warning for the equivalent condition. (Resetting the vCPU might be better in both cases.) > + } > + > + m->guest[i].index = msr; > + vmcs_write32(VM_EXIT_MSR_STORE_COUNT, ++m->nr); > +} > + > +static u64 get_msr_vmcs_store(struct vcpu_vmx *vmx, unsigned msr) > +{ > + unsigned i; > + struct msr_autostore *m = &vmx->msr_autostore; > + > + for (i = 0; i < m->nr; ++i) > + if (m->guest[i].index == msr) > + return m->guest[i].value; > + > + pr_err("Can't find msr %x in VMCS store\n", msr); > + BUG(); Same comment as above. > +} > + > static void reload_tss(void) > { > /* > @@ -4716,6 +4753,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) > #endif > > vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 0); > + vmcs_write64(VM_EXIT_MSR_STORE_ADDR, __pa(vmx->msr_autostore.guest)); > vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, 0); > vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host)); > vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 0); > @@ -8192,8 +8230,11 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) > > vmx->__launched = vmx->loaded_vmcs->launched; > > - if (ibrs_inuse) > - wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); > + if (ibrs_inuse) { > + add_atomic_switch_msr(vmx, MSR_IA32_SPEC_CTRL, vmx->spec_ctrl, > + SPEC_CTRL_FEATURE_ENABLE_IBRS); > + add_atomic_store_msr(vmx, MSR_IA32_SPEC_CTRL); > + } If ibrs_inuse can be cleared dynamically, perhaps there should be: } else { clear_autostore_msr(vmx, MSR_IA32_SPEC_CTRL); } > > asm( > /* Store host registers */ > @@ -8317,12 +8358,11 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) > #endif > ); > > - if (ibrs_inuse) { > - rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); > - wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS); > - } > stuff_RSB(); > > + if (ibrs_inuse) > + vmx->spec_ctrl = get_msr_vmcs_store(vmx, MSR_IA32_SPEC_CTRL); > + > /* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */ > if (debugctlmsr) > update_debugctlmsr(debugctlmsr); > > By the way, I saw in one version of patches that you use own rdmsrl()/wrmsrl() > functions to save/restore IBRS instead of using common ones (I mean > native_rdmsrl()/native_wrmsrl(), safe__rdmsrl()/safe_wrmsrl(), etc.) available > in the kernel. Could you explain why do you do that? > > Daniel What about vmcs02? If ibrs_inuse can be set dynamically, you may need the following in nested_vmx_vmexit: vmcs_write32(VM_EXIT_MSR_STORE_COUNT, vmx->msr_autostore.nr);