From: Andy Lutomirski
Date: Fri, 12 Jul 2019 07:36:24 -0700
Subject: Re: [RFC v2 00/27] Kernel Address Space Isolation
To: Alexandre Chartre
Cc: Peter Zijlstra, Thomas Gleixner, Dave Hansen, Paolo Bonzini,
    Radim Krcmar, Ingo Molnar, Borislav Petkov, "H. Peter Anvin",
    Dave Hansen, Andrew Lutomirski, kvm list, X86 ML, Linux-MM, LKML,
    Konrad Rzeszutek Wilk, jan.setjeeilers@oracle.com, Liran Alon,
    Jonathan Adams, Alexander Graf, Mike Rapoport, Paul Turner
References: <1562855138-19507-1-git-send-email-alexandre.chartre@oracle.com>
    <5cab2a0e-1034-8748-fcbe-a17cf4fa2cd4@intel.com>
    <61d5851e-a8bf-e25c-e673-b71c8b83042c@oracle.com>
    <20190712125059.GP3419@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 12, 2019 at 6:45 AM Alexandre Chartre wrote:
>
>
> On 7/12/19 2:50 PM, Peter Zijlstra wrote:
> > On Fri, Jul 12, 2019 at 01:56:44PM +0200, Alexandre Chartre wrote:
> >
> > > I think that's precisely what makes ASI and PTI different and independent.
> > > PTI is just about switching between userland and kernel page-tables, while
> > > ASI is about switching page-table inside the kernel. You can have ASI without
> > > having PTI. You can also use ASI for kernel threads so for code that won't
> > > be triggered from userland and so which won't involve PTI.
> >
> > PTI is not mapping kernel space to avoid speculation crap (meltdown).
> > ASI is not mapping part of kernel space to avoid (different) speculation crap (MDS).
> >
> > See how very similar they are?
> >
> >
> > Furthermore, to recover SMT for userspace (under MDS) we not only need
> > core-scheduling but core-scheduling per address space. And ASI was
> > specifically designed to help mitigate the trainwreck just described.
> >
> > By explicitly exposing (hopefully harmless) part of the kernel to MDS,
> > we reduce the part that needs core-scheduling and thus reduce the rate
> > the SMT siblings need to sync up/schedule.
> >
> > But looking at it that way, it makes no sense to retain 3 address
> > spaces, namely:
> >
> >   user / kernel exposed / kernel private.
> >
> > Specifically, it makes no sense to expose part of the kernel through MDS
> > but not through Meltdown. Therefore we can merge the user and kernel
> > exposed address spaces.
>
> The goal of ASI is to provide a reduced address space which excludes sensitive
> data. A user process (for example a database daemon, a web server, or a vmm
> like qemu) will likely have sensitive data mapped in its user address space.
> Such data shouldn't be mapped with ASI because it can potentially leak to the
> sibling hyperthread. For example, if a hyperthread is running a VM then the
> VM could potentially access user sensitive data if it is mapped on the
> sibling hyperthread with ASI.

So I've proposed the following slightly hackish thing:

Add a mechanism (call it /dev/xpfo).  When you open /dev/xpfo and
fallocate it to some size, you allocate that amount of memory and kick
it out of the kernel direct map.  (And pay the IPI cost unless there
were already cached non-direct-mapped pages ready.)  Then you map
*that* into your VMs.  Now, for a dedicated VM host, you map *all* the
VM private memory from /dev/xpfo.  Pretend it's SEV if you want to
determine which pages can be set up like this.

Does this get enough of the benefit at a negligible fraction of the
code complexity cost?  (This plus core scheduling, anyway.)

--Andy
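
For illustration only, the userspace side of the open / fallocate / map
sequence described above might look roughly like the sketch below.
/dev/xpfo does not exist; it is only a proposal here.  The helper name
allocate_guest_memory is hypothetical, and an ordinary temporary file
stands in for the device.  Python exposes posix_fallocate(3) rather than
fallocate(2), so that call is used as an approximation.  Nothing in the
sketch removes pages from the kernel direct map; that part would have to
live in the kernel.

```python
import mmap
import os
import tempfile


def allocate_guest_memory(path, size):
    """Open `path`, reserve `size` bytes, and return a shared mapping.

    Hypothetical helper: with the proposed /dev/xpfo, the reservation
    step is where the kernel would allocate the pages and drop them
    from its direct map. With a regular file it only reserves space.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # Reserve the backing storage (stand-in for fallocate on /dev/xpfo).
        os.posix_fallocate(fd, 0, size)
        # Shared mapping: this is the region a VMM would hand to the guest.
        return mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        # mmap keeps its own duplicate of the descriptor on Unix.
        os.close(fd)


if __name__ == "__main__":
    # A temporary file stands in for the proposed device node.
    with tempfile.NamedTemporaryFile() as backing:
        region = allocate_guest_memory(backing.name, 4 * mmap.PAGESIZE)
        region[:5] = b"guest"
        print(len(region), bytes(region[:5]))
        region.close()
```

On a dedicated VM host, a VMM would make one such call per guest and use
the returned region as that guest's private memory.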