From: Linus Torvalds
Date: Mon, 20 Aug 2018 14:48:51 -0700
Subject: Re: Redoing eXclusive Page Frame Ownership (XPFO) with isolated CPUs in mind (for KVM to isolate its guests per CPU)
To: Konrad Rzeszutek Wilk
Cc: Kernel Hardening, Liran Alon, deepa.srinivasan@oracle.com, linux-mm,
    juerg.haefliger@hpe.com, Khalid Aziz, chris.hyser@oracle.com,
    Tyler Hicks, David Woodhouse, Kees Cook, Andrew Cooper, Jon Masters,
    Boris Ostrovsky, kanth.ghatraju@oracle.com, joao.m.martins@oracle.com,
    Jim Mattson, pradeep.vincent@oracle.com, Andi Kleen, John Haxby,
    jsteckli@os.inf.tu-dresden.de, Linux Kernel Mailing List,
    Thomas Gleixner
References: <20180820212556.GC2230@char.us.oracle.com>
In-Reply-To: <20180820212556.GC2230@char.us.oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 20, 2018 at 2:26 PM Konrad Rzeszutek Wilk wrote:
>
> See eXclusive Page Frame Ownership (https://lwn.net/Articles/700606/) which was posted
> way back in 2016..

Ok, so my gut feel is that the above was reasonable within the context
of 2016, but that the XPFO model is completely pointless and wrong in
the post-Meltdown world we now live in.

Why?

Because with the Meltdown patches, we ALREADY HAVE the isolated page
tables that XPFO tries to do.

They are just the normal user page tables.

So don't go around doing other crazy things.

All you need to do is to literally:

 - before you enter VMX mode, switch to the user page tables

 - when you exit, switch back to the kernel page tables

don't do anything else. You're done.

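As a toy illustration of those two steps, the flow would look something
like the sketch below. This is a user-space model, not kernel code:
`struct cpu_model`, `switch_pgtables()`, `model_run_guest()` and
`model_vcpu_run()` are all made-up names for illustration, and nothing
here touches real page tables or VMX.

```c
/* Toy user-space model of the idea above. NOT kernel code.
 * Every name in this block is hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum pgtable_set { PGT_KERNEL, PGT_USER };

struct cpu_model {
    enum pgtable_set active;  /* which set of page tables is loaded */
    bool in_guest;            /* true while the model is "in VMX mode" */
    unsigned int switches;    /* how many page-table switches happened */
};

/* Load the requested page tables; count only real switches. */
static void switch_pgtables(struct cpu_model *cpu, enum pgtable_set to)
{
    if (cpu->active != to) {
        cpu->active = to;
        cpu->switches++;
    }
}

/* Stand-in for vmenter ... vmexit. The guest is only ever allowed to
 * run while the user page tables are the active ones. */
static void model_run_guest(struct cpu_model *cpu)
{
    assert(cpu->active == PGT_USER);
    cpu->in_guest = true;
    /* the guest would execute here */
    cpu->in_guest = false;
}

/* The whole proposal: switch, run, switch back. Nothing else. */
static void model_vcpu_run(struct cpu_model *cpu)
{
    switch_pgtables(cpu, PGT_USER);    /* before you enter VMX mode */
    model_run_guest(cpu);
    switch_pgtables(cpu, PGT_KERNEL);  /* when you exit */
}
```

Calling `model_vcpu_run()` on a model that starts out on the kernel page
tables performs exactly two switches and ends up back on the kernel page
tables, which is the entire point: no new VM machinery.
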
Now, this is complicated a bit by the fact that in order to enter VMX
mode with the user page tables, you do need to add the VMX state
itself to those user page tables (and add the actual trampoline code
to the vmenter too).

So it does imply we need to slightly extend the user mapping with a
few new patches, but that doesn't sound bad.

In fact, it sounds absolutely trivial to me.

The other thing you want to do is the trivial optimization of "hey,
we exited VMX mode due to a host interrupt", which would look like
this:

 * switch to user page tables in order to do vmenter
 * vmenter
 * host interrupt happens
   - switch to kernel page tables to handle irq
   - do_IRQ etc
   - switch back to user page tables
   - iret
 * switch to kernel page tables because the vmenter returned

so you want to have some trivial short-circuiting of that last "switch
to user page tables and back" dance. It may actually be that we don't
even need it, because the irq code may just be looking at what *mode*
we were in, not what page tables we were in. I looked at that code
back in the meltdown days, but that's already so last-year now that we
have all these _other_ CPU bugs we handled.

But other than small details like that, doesn't this "use our Meltdown
user page table" sound like the right thing to do?

And note: no new VM code or complexity. None. We already have the
"isolated KVM context with only pages for the KVM process" case
handled.

Of course, after the long (and entirely unrelated) discussion about
the TLB flushing bug we had, I'm starting to worry about my own
competence, and maybe I'm missing something really fundamental, and
the XPFO patches do something else than what I think they do, or my
"hey, let's use our Meltdown code" idea has some fundamental weakness
that I'm missing.

              Linus