Date: Thu, 9 Apr 2020 18:18:23 -0400
From: Steven Rostedt
To: Thomas Gleixner
Cc: Nadav Amit, Peter Zijlstra, Paolo Bonzini, Christoph Hellwig, LKML, Sean Christopherson, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, x86@kernel.org, kenny@panix.com, jeyu@kernel.org, rasmus.villemoes@prevas.dk, fenghua.yu@intel.com, xiaoyao.li@intel.com, thellstrom@vmware.com, tony.luck@intel.com, gregkh@linuxfoundation.org, jannh@google.com, keescook@chromium.org, David.Laight@aculab.com, dcovelli@vmware.com, mhiramat@kernel.org
Subject: Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
Message-ID: <20200409181823.00bcd14a@gandalf.local.home>
In-Reply-To: <87imi8pdl9.fsf@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 09 Apr 2020 23:13:22 +0200
Thomas Gleixner wrote:

> Nadav Amit writes:
> >> On Apr 9, 2020, at 1:56 AM, Peter Zijlstra wrote:
> >> Speaking with my virt ignorance hat on, how impossible is it to provide
> >> generic/useful VMLAUNCH/VMRESUME wrappers?
> >>
> >> Because a lot of what happens around VMEXIT/VMENTER is very much like
> >> the userspace entry crud, as per that series from Thomas that fixes all
> >> that. And surely we don't need various broken copies of that in all the
> >> out-of-tree hypervisors.
> >>
> >> Also, I suppose if you have this, we no longer need to exempt CR2.
> >
> > It depends on what you mean by “VMLAUNCH/VMRESUME”. If you only consider the
> > instructions themselves, as Sean did in vmx_vmenter() and vmx_vmexit(),
> > there is no problem. Even if you consider saving the general purpose
> > registers as done in __vmx_vcpu_run() - that’s relatively easy.
>
> __vmx_vcpu_run() is roughly the scope, but that won't work.
>
> Looking at the vmmon source:
>
> Task_Switch()
>
> 1) Mask all APIC LVTs which have NMI delivery mode enabled, e.g. PERF
>
> 2) Disable interrupts
>
> 3) Disable PEBS
>
> 4) Disable PT
>
> 5) Load a magic IDT
>
> According to comments these are stubs to catch any exception which
> happens while switching over.
>
> 6) Write CR0 and CR4 directly which is "safe" as the IDT is
> redirected to the monitor stubs.
>
> 7) VMXON()
>
> 8) Invoke monitor on some magic page which switches CR3 and GDT and
> clears CR4.PCIDE (at least that's what the comments claim)
>
> The monitor code is loaded from a binary only blob and that does
> the actual vmlaunch/vmresume ...

From what I understand (I've never looked at the code), this binary blob
is the same for Windows and Apple. It's basically its own operating
system that does all the work, and vmmon is the way to switch to and
from it. When this blob gets an interrupt that it doesn't know about, it
assumes the interrupt belongs to the operating system it's sharing the
machine with and exits back to it, whether that's Linux, Windows or OSX.

It's not too unlike what Jailhouse does with its hypervisor: take over
the machine and place the running Linux into its own "cell", except that
this blob switches full control of the machine back to Linux.

-- Steve

>
> And as this runs with a completely different CR3 sharing that
> code is impossible.
>
> When returning the above is undone in reverse order and any caught
> exceptions / interrupts are replayed via "int $NR".
>
> So it's pretty much the same mess as with vbox just different and
> binary. Oh well...
>
> The "good" news is that it's not involved in any of the context tracking
> stuff so RCU won't ever be affected when a vmware vCPU runs. It's not
> pretty, but TBH I don't care.
>
> Thanks,
>
> tglx
>