From: Thomas Gleixner
To: Nadav Amit, Peter Zijlstra
Cc: Paolo Bonzini, Christoph Hellwig, Steven Rostedt, LKML, Sean Christopherson, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, x86@kernel.org, kenny@panix.com, jeyu@kernel.org, rasmus.villemoes@prevas.dk, fenghua.yu@intel.com, xiaoyao.li@intel.com, thellstrom@vmware.com, tony.luck@intel.com, gregkh@linuxfoundation.org, jannh@google.com, keescook@chromium.org, David.Laight@aculab.com, dcovelli@vmware.com, mhiramat@kernel.org
Subject: Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
Date: Thu, 09 Apr 2020 23:13:22 +0200
Message-ID: <87imi8pdl9.fsf@nanos.tec.linutronix.de>

Nadav Amit writes:
>> On Apr 9, 2020, at 1:56 AM, Peter Zijlstra wrote:
>> Speaking with my virt ignorance hat on, how impossible is it to provide
>> generic/useful VMLAUNCH/VMRESUME wrappers?
>>
>> Because a lot of what happens around VMEXIT/VMENTER is very much like
>> the userspace entry crud, as per that series from Thomas that fixes all
>> that. And surely we don't need various broken copies of that in all the
>> out-of-tree hypervisors.
>>
>> Also, I suppose if you have this, we no longer need to exempt CR2.
>
> It depends on what you mean by “VMLAUNCH/VMRESUME”. If you only consider the
> instructions themselves, as Sean did in vmx_vmenter() and vmx_vmexit(),
> there is no problem. Even if you consider saving the general purpose
> registers as done in __vmx_vcpu_run() - that’s relatively easy.

__vmx_vcpu_run() is roughly the scope, but that won't work.

Looking at the vmmon source:

Task_Switch()

  1) Mask all APIC LVTs which have NMI delivery mode enabled, e.g. PERF

  2) Disable interrupts

  3) Disable PEBS

  4) Disable PT

  5) Load a magic IDT

     According to the comments, these are stubs to catch any exception
     which happens while switching over.

  6) Write CR0 and CR4 directly, which is "safe" as the IDT is
     redirected to the monitor stubs.
  7) VMXON()

  8) Invoke the monitor on some magic page, which switches CR3 and GDT
     and clears CR4.PCIDE (at least that's what the comments claim)

The monitor code is loaded from a binary-only blob, and that does the
actual vmlaunch/vmresume ... And as this runs with a completely different
CR3, sharing that code is impossible.

When returning, the above is undone in reverse order and any caught
exceptions/interrupts are replayed via "int $NR".

So it's pretty much the same mess as with vbox, just different and binary.

Oh well...

The "good" news is that it's not involved in any of the context tracking
stuff, so RCU won't ever be affected when a vmware vCPU runs.

It's not pretty, but TBH I don't care.

Thanks,

        tglx