Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751425AbeAECpi (ORCPT + 1 other);
        Thu, 4 Jan 2018 21:45:38 -0500
Received: from bombadil.infradead.org ([65.50.211.133]:58304 "EHLO
        bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751208AbeAECph (ORCPT);
        Thu, 4 Jan 2018 21:45:37 -0500
Subject: Re: [PATCH] [v2] x86/doc: add PTI description
To: Dave Hansen, linux-kernel@vger.kernel.org
Cc: x86@kernel.org, keescook@chromium.org, moritz.lipp@iaik.tugraz.at,
        daniel.gruss@iaik.tugraz.at, michael.schwarz@iaik.tugraz.at,
        richard.fellner@student.tugraz.at, luto@kernel.org,
        torvalds@linux-foundation.org, hughd@google.com
References: <20180105002428.19A01A83@viggo.jf.intel.com>
From: Randy Dunlap
Message-ID: <80238ada-e085-9bc9-2120-64e33249bafd@infradead.org>
Date: Thu, 4 Jan 2018 18:45:18 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
        Thunderbird/52.5.2
MIME-Version: 1.0
In-Reply-To: <20180105002428.19A01A83@viggo.jf.intel.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/04/2018 04:24 PM, Dave Hansen wrote:
> Changes from v1:
> * update kernel-parameters.txt to clarify that the pti= option
>   is not just for disabling.  Also describe what 'pti=auto' does
>   and why
> * Add a note about the presence of NX in the user portion of the
>   kernel page tables
> * Clarify _additional_ 4k of PGD space
> * Add a note about the runtime overhead of PCID without INVPCID
>
> ---
>
> From: Dave Hansen
>
> Add some details about how PTI works, what some of the downsides
> are, and how to debug it when things go wrong.
>
> Also document the kernel parameter: 'nopti'.
>
> Signed-off-by: Dave Hansen
> Reviewed-by: Kees Cook
> Cc: Moritz Lipp
> Cc: Daniel Gruss
> Cc: Michael Schwarz
> Cc: Richard Fellner
> Cc: Andy Lutomirski
> Cc: Linus Torvalds
> Cc: Hugh Dickins
> Cc: x86@kernel.org
> ---
>
>  b/Documentation/admin-guide/kernel-parameters.txt |   22 +-
>  b/Documentation/x86/pti.txt                       |  185 ++++++++++++++++++++++
>  2 files changed, 200 insertions(+), 7 deletions(-)
>
> diff -puN /dev/null Documentation/x86/pti.txt
> --- /dev/null	2017-12-15 13:48:30.454245127 -0800
> +++ b/Documentation/x86/pti.txt	2018-01-04 16:23:40.870819409 -0800
> @@ -0,0 +1,185 @@

> +The userspace copy is used when running userspace and mirrors the
> +mapping of userspace present in the kernel copy.  It maps a only

                                                      drop:  a

> +the kernel data needed to enter and exit the kernel.  This data
> +is entirely contained in the 'struct cpu_entry_area' structure
> +which is placed in the fixmap and thus each CPU's copy of the
> +area has a compile-time-fixed virtual address.
> +

> +2. Runtime Cost
> +  a. CR3 manipulation to switch between the page table copies
> +     must be done at interrupt, syscall, and exception entry
> +     and exit (it can be skipped when the kernel is interrupted,
> +     though.)  Moves to CR3 are on the order of a hundred
> +     cycles, and are required every at entry and every at exit.

                                   at every entry and at every exit.

> +  d. Global pages are disabled for all kernel structures not
> +     mapped in both to kernel and userspace page tables.  This

             into both kernel and userspace page tables.

> +     feature of the MMU allows different processes to share TLB
> +     entries mapping the kernel.  Losing the feature means more
> +     TLB misses after a context switch.  The actual loss of
> +     performance is very small, however, never exceeding 1%.

> +  f. In addition to the fork()-time copying, there must also
> +     be an update to the userspace PGD any time a set_pgd() is done
> +     on a PGD used to map userspace.  This ensures that the kernel
> +     and userspace copies always map the same userspace
> +     memory.
> +  g. On systems without PCID support, each CR3 write flushes
> +     the entire TLB.  That means that each syscall, interrupt
> +     or exception flushes the TLB.
> +  h. On systems without INVPCID support, addresses can only be

This is the first mention of INVPCID.  Probably needs more info about
what it is.

> +     flushed from the TLB for the current PCID.  When flushing
> +     a kernel address, we need to flush all PCIDs, so a single
> +     kernel address flush will require a TLB-flushing CR3 write
> +     upon the next use of every PCID.
> +
> +Possible Future Work
> +====================
> +1. We can be more careful about not actually writing to CR3
> +   unless its value is actually changed.
> +2. Allow PTI to enabled/disabled at runtime in addition to the

                to be

> +   boot-time switching.
> +
> +Testing
> +========

-- 
~Randy