Date: Fri, 12 Jul 2019 17:16:58 +0200 (CEST)
From: Thomas Gleixner
To: Peter Zijlstra
cc: Alexandre Chartre, Dave Hansen, pbonzini@redhat.com, rkrcmar@redhat.com,
    mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
    dave.hansen@linux.intel.com, luto@kernel.org, kvm@vger.kernel.org,
    x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    konrad.wilk@oracle.com, jan.setjeeilers@oracle.com,
    liran.alon@oracle.com, jwadams@google.com, graf@amazon.de,
    rppt@linux.vnet.ibm.com, Paul Turner
Subject: Re: [RFC v2 00/27] Kernel Address Space Isolation
In-Reply-To: <20190712125059.GP3419@hirez.programming.kicks-ass.net>
References: <1562855138-19507-1-git-send-email-alexandre.chartre@oracle.com>
    <5cab2a0e-1034-8748-fcbe-a17cf4fa2cd4@intel.com>
    <61d5851e-a8bf-e25c-e673-b71c8b83042c@oracle.com>
    <20190712125059.GP3419@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 12 Jul 2019, Peter Zijlstra wrote:
> On Fri, Jul 12, 2019 at 01:56:44PM +0200, Alexandre Chartre wrote:
>
> > I think that's precisely what makes ASI and PTI different and independent.
> > PTI is just about switching between userland and kernel page-tables, while
> > ASI is about switching page-table inside the kernel. You can have ASI without
> > having PTI. You can also use ASI for kernel threads so for code that won't
> > be triggered from userland and so which won't involve PTI.
>
> PTI is not mapping         kernel space to avoid             speculation crap (meltdown).
> ASI is not mapping part of kernel space to avoid (different) speculation crap (MDS).
>
> See how very similar they are?
>
> Furthermore, to recover SMT for userspace (under MDS) we not only need
> core-scheduling but core-scheduling per address space. And ASI was
> specifically designed to help mitigate the trainwreck just described.
>
> By explicitly exposing (hopefully harmless) part of the kernel to MDS,
> we reduce the part that needs core-scheduling and thus reduce the rate
> the SMT siblings need to sync up/schedule.
>
> But looking at it that way, it makes no sense to retain 3 address
> spaces, namely:
>
>   user / kernel exposed / kernel private.
>
> Specifically, it makes no sense to expose part of the kernel through MDS
> but not through Meltdown. Therefore we can merge the user and kernel
> exposed address spaces.
>
> And then we've fully replaced PTI.
>
> So no, they're not orthogonal.

Right. If we decide to expose more parts of the kernel mappings then that's
just adding more stuff to the existing user (PTI) map mechanics.
As a consequence the CR3 switching points become different or can be
consolidated and that can be handled right at those switching points
depending on static keys or alternatives as we do today with PTI and other
mitigations.

All of that can do without that obscure "state machine" which is solely
there to duct-tape the complete lack of design. The same applies to that
mapping thing. Just mapping randomly selected parts by sticking them into
an array is a non-maintainable approach. This needs proper separation of
text and data sections, so violations of the mapping constraints can be
statically analyzed. Depending solely on the page fault at run time for
analysis is just bound to lead to hard to diagnose failures in the field.

TBH we all know already that this can be done and that this will solve some
of the issues caused by the speculation mess, so just writing some hastily
cobbled together POC code which explodes just by looking at it, does not
lead to anything else than time waste on all ends.

This first needs a clear definition of protection scope. That scope clearly
defines the required mappings and consequently the transition requirements
which provide the necessary transition points for flipping CR3.

If we have agreed on that, then we can think about the implementation
details.

Thanks,

	tglx
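
The point above about checking mapping constraints statically, rather than
waiting for a page fault at run time, can be illustrated with a small
user-space model. This is only a sketch and is not taken from the RFC series
or from any kernel code: the section classes, the two address spaces (after
merging "user" and "kernel exposed" as Peter suggests), and every symbol name
in the example are hypothetical. It assumes some build step has already
produced a list of cross-references tagged with the section of the target.

```python
from dataclasses import dataclass
from enum import Enum


class Section(Enum):
    """Hypothetical section classes; real section names are not implied."""
    USER = "user"
    KERNEL_EXPOSED = "kernel exposed"
    KERNEL_PRIVATE = "kernel private"


class AddressSpace(Enum):
    """Two address spaces, assuming user and kernel-exposed are merged."""
    EXPOSED = "exposed"
    FULL = "full"


# The protection scope: which sections each address space maps.
MAPPINGS = {
    AddressSpace.EXPOSED: {Section.USER, Section.KERNEL_EXPOSED},
    AddressSpace.FULL: set(Section),
}


@dataclass(frozen=True)
class Reference:
    source: str                 # symbol making the reference (made-up names)
    runs_in: AddressSpace       # address space active when source executes
    target: str                 # symbol being referenced
    target_section: Section     # section the target was placed in


def find_violations(refs):
    """Return references whose target is unmapped where the source runs."""
    return [r for r in refs if r.target_section not in MAPPINGS[r.runs_in]]


# Made-up cross-reference data standing in for real build output.
refs = [
    Reference("entry_stub", AddressSpace.EXPOSED,
              "exposed_helper", Section.KERNEL_EXPOSED),
    Reference("entry_stub", AddressSpace.EXPOSED,
              "private_table", Section.KERNEL_PRIVATE),
    Reference("full_kernel_path", AddressSpace.FULL,
              "private_table", Section.KERNEL_PRIVATE),
]

violations = find_violations(refs)
for v in violations:
    print(f"{v.source} ({v.runs_in.value}) -> {v.target} "
          f"[{v.target_section.value}] is not mapped")
```

In this model each reported violation is either a placement bug (the target
belongs in an exposed section) or a place where a CR3 flip would be required
before the access, which is the kind of question a defined protection scope
lets one answer before any code runs.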