Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752351AbdDIMtW (ORCPT ); Sun, 9 Apr 2017 08:49:22 -0400
Received: from r00tworld.com ([212.85.137.150]:53892 "EHLO r00tworld.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752069AbdDIMtN (ORCPT ); Sun, 9 Apr 2017 08:49:13 -0400
From: "PaX Team"
To: Andy Lutomirski
Date: Sun, 09 Apr 2017 14:47:20 +0200
MIME-Version: 1.0
Subject: Re: [kernel-hardening] Re: [RFC v2][PATCH 04/11] x86: Implement __arch_rare_write_begin/unmap()
Reply-to: pageexec@freemail.hu
CC: Mathias Krause, Andy Lutomirski, Thomas Gleixner, Kees Cook,
	"kernel-hardening@lists.openwall.com", Mark Rutland, Hoeun Ryu,
	Emese Revfy, Russell King, X86 ML, "linux-kernel@vger.kernel.org",
	"linux-arm-kernel@lists.infradead.org", Peter Zijlstra
Message-ID: <58EA2D58.17782.6ADE22BD@pageexec.freemail.hu>
In-reply-to:
References: <1490811363-93944-1-git-send-email-keescook@chromium.org>,
	<58E7EF70.30766.621C4F44@pageexec.freemail.hu>,
X-mailer: Pegasus Mail for Windows (4.72.572)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Content-description: Mail message body
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.1.12
	(r00tworld.com [212.85.137.150]); Sun, 09 Apr 2017 14:47:20 +0200 (CEST)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2156
Lines: 39

On 7 Apr 2017 at 21:58, Andy Lutomirski wrote:

> On Fri, Apr 7, 2017 at 12:58 PM, PaX Team wrote:
> > On 7 Apr 2017 at 9:14, Andy Lutomirski wrote:
> >> Then someone who cares about performance can benchmark the CR0.WP
> >> approach against it and try to argue that it's a good idea. This
> >> benchmark should wait until I'm done with my PCID work, because PCID
> >> is going to make use_mm() a whole heck of a lot faster.
> >
> > in my measurements switching PCID hovers around 230 cycles for snb-ivb
> > and 200-220 for hsw-skl whereas cr0 writes are around 230-240 cycles. there's
> > of course a whole lot more impact for switching address spaces so it'll never
> > be fast enough to beat cr0.wp.
> >
>
> If I'm reading this right, you're saying that a non-flushing CR3 write
> is about the same cost as a CR0.WP write. If so, then why should CR0
> be preferred over the (arch-neutral) CR3 approach?

cr3 (page table switching) isn't arch neutral at all ;). you probably meant
the higher level primitives except they're not enough to implement the scheme
as discussed before since the enter/exit paths are very much arch dependent.

on x86 the cost of the pax_open/close_kernel primitives comes from the cr0
writes and nothing else, use_mm suffers not only from the cr3 writes but
also locking/atomic ops and cr4 writes on its path and the inevitable TLB
entry costs. and if cpu vendors cared enough, they could make toggling cr0.wp
a fast path in the microcode and reduce its overhead by an order of magnitude.

> And why would switching address spaces obviously be much slower?
> There'll be a very small number of TLB fills needed for the actual
> protected access.

you'll be duplicating TLB entries in the alternative PCID for both code
and data, where they will accumulate (=take room away from the normal PCID
and expose unwanted memory for access) unless you also flush them when
switching back (which then will cost even more cycles).

also i'm not sure that processors implement all the 12 PCID bits so depending
on how many PCIDs you plan to use, you could be causing even more unnecessary
TLB replacements.
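
The cost argument in the message can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic on the cycle figures quoted in the thread; it is not a benchmark and not kernel code. The names `pair_cost` and `extra_budget` are hypothetical helpers introduced for illustration, and it assumes one open/close pair costs two register writes. The extra overhead on the use_mm path (locking/atomic ops, cr4 writes, TLB fills) was not measured in the thread, so it appears only as a budget.

```python
# Per-write cycle ranges quoted in the message (the author's own measurements).
CR0_WRITE = (230, 240)    # cr0 write
PCID_SWITCH = (200, 230)  # non-flushing PCID switch: 200-220 on hsw-skl, ~230 on snb-ivb

PCID_BITS = 12            # width of the PCID field, as mentioned in the message
MAX_PCIDS = 1 << PCID_BITS


def pair_cost(write_cycles, extra_cycles=0):
    """Cycles for one open/close pair: two register writes plus any extra work.

    Assumption for illustration: entering and leaving each cost one write.
    """
    return 2 * write_cycles + extra_cycles


def extra_budget(cr0_write, cr3_write):
    """Extra cycles the cr3 path may spend per pair before it is slower than cr0.wp."""
    return pair_cost(cr0_write) - pair_cost(cr3_write)


if __name__ == "__main__":
    # Most favourable case for the cr3 approach: slowest cr0 write, fastest PCID switch.
    budget = extra_budget(CR0_WRITE[1], PCID_SWITCH[0])
    print("cr0.wp pair, worst case:", pair_cost(CR0_WRITE[1]), "cycles")
    print("PCID pair, best case, no extras:", pair_cost(PCID_SWITCH[0]), "cycles")
    print("budget for locking, cr4 writes and TLB fills:", budget, "cycles")
    print("PCIDs available if all", PCID_BITS, "bits are implemented:", MAX_PCIDS)
```

Even in the case most favourable to the address-space switch, the budget left for everything other than the two cr3 writes is small compared with a single register write, which is the point the message makes about use_mm's additional costs.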