From: Juerg Haefliger
Date: Mon, 20 Nov 2017 17:02:23 +0100
Subject: Re: [PATCH 00/30] [v3] KAISER: unmap most of the kernel from userspace page tables
To: Dave Hansen
Cc: LKML, linux-mm@kvack.org, moritz.lipp@iaik.tugraz.at,
    daniel.gruss@iaik.tugraz.at, michael.schwarz@iaik.tugraz.at,
    richard.fellner@student.tugraz.at, luto@kernel.org, Linus Torvalds,
    keescook@google.com, hughd@google.com, x86@kernel.org, jgross@suse.com
In-Reply-To: <20171110193058.BECA7D88@viggo.jf.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 10, 2017 at 8:30 PM, Dave Hansen wrote:
> Thanks, everyone for all the reviews thus far. I hope I managed to
> address all the feedback given so far, except for the TODOs of
> course. This is a pretty minor update compared to v1->v2.
>
> These patches are all on top of Andy's entry changes here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/entry_consolidation
>
> Changes from v2:
> * Reword documentation removing "we"
> * Fix some whitespace damage
> * Fix up MAX ASID values off-by-one noted by Peter Z
> * Change CodingStyle stuff from Borislav comments
> * Always use _KERNPG_TABLE for pmd_populate_kernel().
>
> Changes from v1:
> * Updated to be on top of Andy L's new entry code
> * Allow global pages again, and use them for pages mapped into
>   userspace page tables.
> * Use trampoline stack instead of process stack at entry so no
>   longer need to map process stack (big win in fork() speed)
> * Made the page table walking less generic by restricting it
>   to kernel addresses and !_PAGE_USER pages.
> * Added a debugfs file to enable/disable CR3 switching at
>   runtime. This does not remove all the KAISER overhead, but
>   it removes the largest source.
> * Use runtime disable with Xen to permit Xen-PV guests with
>   KAISER=y.
> * Moved assembly code from "core" to "prepare assembly" patch
> * Pass full register name to asm macros
> * Remove double stack switch in entry_SYSENTER_compat
> * Disable vsyscall native case when KAISER=y
> * Separate PER_CPU_USER_MAPPED generic definitions from use
>   by arch/x86/.
>
> TODO:
> * Allow dumping the shadow page tables with the ptdump code
> * Put LDT at top of userspace
> * Create separate tlb flushing functions for user and kernel
> * Chase down the source of the new !CR4.PGE warning that 0day
>   found with i386
>
> ---
>
> tl;dr:
>
> KAISER makes it harder to defeat KASLR, but makes syscalls and
> interrupts slower. These patches are based on work from a team at
> Graz University of Technology posted here[1]. The major addition is
> support for Intel PCIDs which builds on top of Andy Lutomirski's PCID
> work merged for 4.14. PCIDs make KAISER's overhead very reasonable
> for a wide variety of use cases.
>
> Full Description:
>
> KAISER is a countermeasure against attacks on kernel address
> information. There are at least three existing, published,
> approaches using the shared user/kernel mapping and hardware features
> to defeat KASLR. One approach referenced in the paper locates the
> kernel by observing differences in page fault timing between
> present-but-inaccessible kernel pages and non-present pages.
>
> KAISER addresses this by unmapping (most of) the kernel when
> userspace runs. It leaves the existing page tables largely alone and
> refers to them as "kernel page tables". For running userspace, a new
> "shadow" copy of the page tables is allocated for each process. The
> shadow page tables map all the same user memory as the "kernel" copy,
> but only map a minimal set of kernel memory.
>
> When we enter the kernel via syscalls, interrupts or exceptions,
> page tables are switched to the full "kernel" copy. When the system
> switches back to user mode, the "shadow" copy is used. Process
> Context IDentifiers (PCIDs) are used to ensure that the TLB is not
> flushed when switching between page tables, which makes syscalls
> roughly 2x faster than without it. PCIDs are usable on Haswell and
> newer CPUs (the ones with "v4", or called fourth-generation Core).
>
> The minimal kernel page tables try to map only what is needed to
> enter/exit the kernel such as the entry/exit functions, interrupt
> descriptors (IDT) and the kernel trampoline stacks. This minimal set
> of data can still reveal the kernel's ASLR base address. But, this
> minimal kernel data is all trusted, which makes it harder to exploit
> than data in the kernel direct map which contains loads of
> user-controlled data.
>
> KAISER will affect performance for anything that does system calls or
> interrupts: everything. Just the new instructions (CR3 manipulation)
> add a few hundred cycles to a syscall or interrupt. Most workloads
> that we have run show single-digit regressions. 5% is a good round
> number for what is typical. The worst we have seen is a roughly 30%
> regression on a loopback networking test that did a ton of syscalls
> and context switches. More details about possible performance
> impacts are in the new Documentation/ file.
>
> This code is based on a version I downloaded from
> (https://github.com/IAIK/KAISER). It has been heavily modified.
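
For readers following the description above, the kernel/shadow page-table
switch with PCIDs can be modelled in a few lines of userspace C. This is
only a sketch under stated assumptions, not code from the series: the two
architectural facts are that CR3 bits 11:0 carry the PCID when CR4.PCIDE
is set and that bit 63 of the value written to CR3 suppresses the TLB
flush. Everything else (the function names, the shadow PGD sitting in the
4 KiB page right after the kernel PGD, and the user PCID being the kernel
PCID with bit 11 set) is a hypothetical layout chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of a KAISER-style CR3 switch. Illustration only: the
 * helper names and the SHADOW_PGD_BIT / USER_PCID_BIT layout are
 * assumptions, not taken from the patches.
 */
#define CR3_PCID_MASK   0xFFFULL      /* architectural: CR3[11:0] is the PCID when CR4.PCIDE=1 */
#define CR3_NOFLUSH     (1ULL << 63)  /* architectural: bit 63 of the written value skips the flush */
#define SHADOW_PGD_BIT  (1ULL << 12)  /* assumed: shadow PGD is the 4 KiB page after the kernel PGD */
#define USER_PCID_BIT   (1ULL << 11)  /* assumed: user PCID is the kernel PCID with bit 11 set */

/* Build the kernel-mode CR3 value from an 8 KiB-aligned PGD address. */
static uint64_t kernel_cr3(uint64_t pgd_phys, uint16_t pcid)
{
	/* The PGD pair must be 8 KiB aligned so that bit 12 selects the copy. */
	assert((pgd_phys & (SHADOW_PGD_BIT | CR3_PCID_MASK)) == 0);
	/* The kernel PCID must leave bit 11 free for the user variant. */
	assert((pcid & ~(USER_PCID_BIT - 1)) == 0);
	return pgd_phys | pcid | CR3_NOFLUSH;
}

/* Kernel exit: select the shadow page tables and the user PCID. */
static uint64_t user_cr3(uint64_t kcr3)
{
	return kcr3 | SHADOW_PGD_BIT | USER_PCID_BIT;
}

/* Kernel entry: select the full page tables and the kernel PCID again. */
static uint64_t kernel_cr3_from_user(uint64_t ucr3)
{
	return ucr3 & ~(SHADOW_PGD_BIT | USER_PCID_BIT);
}
```

In this model each direction of the switch is a single bit operation on
the CR3 value, and because the two copies use different PCIDs with the
no-flush bit set, neither direction needs to throw away TLB entries.
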
>
> The approach is described in detail in a paper[2]. However, there is
> some incorrect information in the paper, both on how Linux and
> the hardware works. For instance, I do not share the opinion that
> KAISER has "runtime overhead of only 0.28%". Please rely on this
> patch series as the canonical source of information about this
> submission.
>
> Here is one example of how the kernel image grows with CONFIG_KAISER
> on and off. Most of the size increase is presumably from additional
> alignment requirements for mapping entry/exit code and structures.
>
>     text    data     bss      dec filename
> 11786064 7356724 2928640 22071428 vmlinux-nokaiser
> 11798203 7371704 2928640 22098547 vmlinux-kaiser
>   +12139  +14980       0   +27119
>
> To give folks an idea what the performance impact is like, I took
> the following test and ran it single-threaded:
>
> https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c
>
> It's a pretty quick syscall so this shows how much KAISER slows
> down syscalls (and how much PCIDs help). The units here are
> lseeks/second:
>
>     no kaiser: 5.2M
> kaiser+  pcid: 3.0M
> kaiser+nopcid: 2.2M
>
> "nopcid" is literally with the "nopcid" command-line option which
> turns PCIDs off entirely.
>
> Thanks to:
> The original KAISER team at Graz University of Technology.
> Andy Lutomirski for all the help with the entry code.
> Kirill Shutemov for a helpful review of the code.
>
> 1. https://github.com/IAIK/KAISER
> 2. https://gruss.cc/files/kaiser.pdf
>
> --
>
> The code is available here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/
>
>  Documentation/x86/kaiser.txt                | 160 +++++
>  arch/x86/Kconfig                            |   8 +
>  arch/x86/entry/calling.h                    |  89 +++
>  arch/x86/entry/entry_64.S                   |  44 +-
>  arch/x86/entry/entry_64_compat.S            |   8 +
>  arch/x86/events/intel/ds.c                  |  49 +-
>  arch/x86/include/asm/cpufeatures.h          |   1 +
>  arch/x86/include/asm/desc.h                 |   2 +-
>  arch/x86/include/asm/kaiser.h               |  62 ++
>  arch/x86/include/asm/mmu_context.h          |  29 +-
>  arch/x86/include/asm/pgalloc.h              |  37 +-
>  arch/x86/include/asm/pgtable.h              |  20 +-
>  arch/x86/include/asm/pgtable_64.h           | 135 +++++
>  arch/x86/include/asm/pgtable_types.h        |  25 +-
>  arch/x86/include/asm/processor.h            |   2 +-
>  arch/x86/include/asm/tlbflush.h             | 232 +++++++-
>  arch/x86/include/uapi/asm/processor-flags.h |   3 +-
>  arch/x86/kernel/cpu/common.c                |  21 +-
>  arch/x86/kernel/espfix_64.c                 |  27 +-
>  arch/x86/kernel/head_64.S                   |  30 +-
>  arch/x86/kernel/ldt.c                       |  25 +-
>  arch/x86/kernel/process.c                   |   2 +-
>  arch/x86/kernel/process_64.c                |   2 +-
>  arch/x86/kernel/traps.c                     |  46 +-
>  arch/x86/kvm/x86.c                          |   3 +-
>  arch/x86/mm/Makefile                        |   1 +
>  arch/x86/mm/init.c                          |  75 ++-
>  arch/x86/mm/kaiser.c                        | 627 ++++++++++++++++++++
>  arch/x86/mm/pageattr.c                      |  18 +-
>  arch/x86/mm/pgtable.c                       |  16 +-
>  arch/x86/mm/tlb.c                           | 105 +++-
>  include/asm-generic/vmlinux.lds.h           |  17 +
>  include/linux/kaiser.h                      |  34 ++
>  include/linux/percpu-defs.h                 |  30 +
>  init/main.c                                 |   3 +
>  kernel/fork.c                               |   1 +
>  security/Kconfig                            |  10 +
>  37 files changed, 1851 insertions(+), 148 deletions(-)
>
> Cc: Moritz Lipp
> Cc: Daniel Gruss
> Cc: Michael Schwarz
> Cc: Richard Fellner
> Cc: Andy Lutomirski
> Cc: Linus Torvalds
> Cc: Kees Cook
> Cc: Hugh Dickins
> Cc: x86@kernel.org
> Cc: Juergen Gross

I get a compilation error with:
CONFIG_RANDOMIZE_BASE=y

  OBJCOPY arch/x86/boot/compressed/vmlinux.bin
  RELOCS  arch/x86/boot/compressed/vmlinux.relocs
  CC      arch/x86/boot/compressed/early_serial_console.o
  CC      arch/x86/boot/compressed/kaslr.o
  CC      arch/x86/boot/compressed/pagetable.o
  CC      arch/x86/boot/compressed/misc.o
  GZIP    arch/x86/boot/compressed/vmlinux.bin.gz
  MKPIGGY arch/x86/boot/compressed/piggy.S
  AS      arch/x86/boot/compressed/piggy.o
  DATAREL arch/x86/boot/compressed/vmlinux
  LD      arch/x86/boot/compressed/vmlinux
arch/x86/boot/compressed/pagetable.o: In function `kernel_ident_mapping_init':
pagetable.c:(.text+0x31b): undefined reference to `kaiser_enabled'
arch/x86/boot/compressed/Makefile:106: recipe for target 'arch/x86/boot/compressed/vmlinux' failed
make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
arch/x86/boot/Makefile:112: recipe for target 'arch/x86/boot/compressed/vmlinux' failed
make[1]: *** [arch/x86/boot/compressed/vmlinux] Error 2
arch/x86/Makefile:295: recipe for target 'bzImage' failed
make: *** [bzImage] Error 2

Compiles fine with:
# CONFIG_RANDOMIZE_BASE is not set

...Juerg