Date: Wed, 10 Jan 2018 12:54:19 +0100
From: Andrea Arcangeli
To: David Woodhouse
Cc: Peter Zijlstra, Dave Hansen, Thomas Gleixner, LKML, Linus Torvalds,
    x86@kernel.org, Borislav Petkov, Tim Chen, Andi Kleen, Greg KH,
    Andy Lutomirski, Arjan Van De Ven
Subject: Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code
Message-ID: <20180110115419.GA9706@redhat.com>
References: <20180110010652.404145126@linutronix.de>
    <20180110011350.855878109@linutronix.de>
    <20180110092234.GY29822@worktop.programming.kicks-ass.net>
    <1515576479.22302.81.camel@infradead.org>
In-Reply-To: <1515576479.22302.81.camel@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 10, 2018 at 09:27:59AM +0000, David Woodhouse wrote:
> I don't know why you're calling that 'IBRS=2'; are you getting confused
> by Andrea's distro horridness?

Eh, yes, he got confused. ibrs_enabled 2 simply means leaving IBRS set
in SPEC_CTRL 100% of the time, except in guest mode. IBRS is the value
"1" written to the SPEC_CTRL MSR; SPEC_CTRL = 2 is not IBRS.
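To make the distinction between the tunable value and the MSR value concrete, here is a minimal sketch that decodes a raw SPEC_CTRL value into bit names. It is not taken from the patch set. The bit positions (IBRS in bit 0, STIBP in bit 1) follow Intel's published layout of the IA32_SPEC_CTRL MSR; STIBP is not mentioned in this mail, and the function name is made up for illustration.

```python
# Bit layout of the IA32_SPEC_CTRL MSR as published by Intel.
# STIBP is an addition for illustration; the mail only says that
# the value 2 is "not IBRS".
SPEC_CTRL_IBRS = 1 << 0   # Indirect Branch Restricted Speculation
SPEC_CTRL_STIBP = 1 << 1  # Single Thread Indirect Branch Predictors


def describe_spec_ctrl(value):
    """Return the names of the known bits set in a raw SPEC_CTRL value."""
    names = []
    if value & SPEC_CTRL_IBRS:
        names.append("IBRS")
    if value & SPEC_CTRL_STIBP:
        names.append("STIBP")
    return names


# Hypothetical helper usage: print the decoding of a few raw values.
for raw in (0, 1, 2, 3):
    print(raw, describe_spec_ctrl(raw))
```

Under that layout a raw value of 2 decodes to STIBP alone, which is why "ibrs_enabled 2" (a tunable setting) must not be read as "SPEC_CTRL = 2" (an MSR value).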
"ibrs_enabled ibpb_enabled" semantics are practically the only thing we didn't change from the code that we've been presented with (aside from adding ibrs_enable 0 ibpb_enabled 2 new mode to fix spectre variant#2 on CPUs with no SPEC_CTRL/IBRS and only with IBPB_SUPPORT, that otherwise would have no kernel protection at all [of course modulo HT/SMT but that's not the PoC and ibrs_enabled 1 ibpb_enabled 1 also won't protect guest/user vs guest/user for HT/SMT and CPU pinning/isolation or ibrs 2 or HT/SMT disablement is needed for that]). NOTE: we moved ibrs_enabled ibpb_enable to debugfs, they were in more problematic location before we moved them there. We could have done something entirely different, but ibrs_enabled ibpb_enabled semantics as they were presented, looked as good as it could get to be able to achieve all the below on 0day: 1) ship a full fix for all CPUs (including those CPUs that cannot be fixed with reptolines) that protected the host kernel memory 100% from spectre variant#2 by default 2) to be able to get as much performance back as possible for all those cases where these attacks are irrelevant and performance is critical by setting ibrs_enabled 0 ibpb_enabled 0 (pti_enabled 0 is available too in the same debug/x86 location as well in fact). If you see complains on ibrs_enabled 1 slowing down ffmpeg or something compared to ibrs_enabled 0, well it's because we shipped those tunables in the first place that precisely allows so easy comparison, and that makes it so easy to set ibrs_enabled 0 and run at full CPU power in kernel mode too if spectre variant#2 is not a concern. Compare that to having to reboot the system and modify grub every time before running a test... that would have been truly horrid as a solution to get the all performance back where spectre variant#2 is no concern. 
   If you're rendering a movie or transcoding with ffmpeg and nothing
   else, and you're behind several levels of firewalls or even
   disconnected from the network, it can all be disabled (including
   setting pti_enabled 0 if that's the only app in the guest or host,
   e.g. a single memcached microservice running in a KVM guest).

3) Optionally make it impossible for KVM guest userland to read KVM
   guest kernel memory by attacking qemu host userland, simply by
   setting ibrs_enabled 2 ibpb_enabled 1, for the subset of customers
   that may be especially concerned about that theoretical issue. The
   same ibrs_enabled 2 ibpb_enabled 1 also fixes other theoretical
   HT/SMT attacks affecting guest/user mode vs guest/user mode. We
   don't know if those issues are practical, but ibrs_enabled 2
   provides a mathematical guarantee for those as well.

   For the KVM guest userland attack on qemu userland not even SMEP
   helps, as kindly confirmed by Dave. For that issue ibrs_enabled 0
   ibpb_enabled 2 on the host, or even only in the guest, is enough
   too. It would likely perform worse than ibrs_enabled 2
   ibpb_enabled 1, but that depends on the CPU, which is again why
   tunables are so handy: you can optimize for the best tradeoff.

I'm not exactly sure how you could have accommodated all 3 points above
in an even more flexible way by removing those three debugfs tunables.

If you don't want to deal with them because it's very complex to
understand exactly what they do, there's a simple solution too: ignore
them, knowing it boots secure by default if either SPEC_CTRL or
IBPB_SUPPORT is present in the boot "dmesg" logs, and to disable it all
and get the performance back:

	echo 0 >ibrs_enabled
	echo 0 >ibpb_enabled

Even Dave just said that even with the retpoline optimization patched
in by default at boot, he'd still like to be able to enable the
equivalent of ibrs_enabled 1, let alone on those CPUs where retpolines
can't fix spectre variant#2.
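The tunable combinations named above can be collected in one place. This is a rough sketch that writes a chosen pair of values to the ibrs_enabled and ibpb_enabled files. The directory /sys/kernel/debug/x86 is an assumption (the mail only says debugfs and "debug/x86"), and the mode labels and the set_mode() helper are invented here for illustration; only the numeric pairs come from the mail.

```python
import os

# Assumed location: the mail only says "debugfs" and "debug/x86".
DEFAULT_DIR = "/sys/kernel/debug/x86"

# (ibrs_enabled, ibpb_enabled) pairs named in the mail.
# The dictionary keys are hypothetical labels, not kernel names.
MODES = {
    "off": (0, 0),        # everything disabled, full performance
    "kernel": (1, 1),     # the "ibrs_enabled 1 ibpb_enabled 1" mode
    "always": (2, 1),     # IBRS left set all the time, except in guest mode
    "ibpb-only": (0, 2),  # for CPUs with IBPB_SUPPORT but no SPEC_CTRL/IBRS
}


def set_mode(name, directory=DEFAULT_DIR):
    """Write the tunable pair for `name` into `directory`; return the pair."""
    ibrs, ibpb = MODES[name]
    for filename, value in (("ibrs_enabled", ibrs), ("ibpb_enabled", ibpb)):
        # Same effect as: echo <value> > <directory>/<filename>
        with open(os.path.join(directory, filename), "w") as tunable:
            tunable.write(f"{value}\n")
    return ibrs, ibpb
```

Run as root, set_mode("off") would do the same as the two echo commands, provided the files really live under the assumed directory. Passing a different directory keeps the sketch usable for dry runs against a scratch location.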
Furthermore, ibrs_enabled 2 is not obsoleted at all by kernel
retpolines unless you rebuild the whole userland, including all qemu
dependencies and glibc, with retpolines, which would have been
impossible to achieve on 0day no matter what.

Retpolines are an optimization for later, for a subset of CPUs, so that
we can stop setting IBRS in SPEC_CTRL while in kernel mode. On some
CPUs IBRS, or in general disabling the IBP, is so fast that I wouldn't
even bother to enable retpolines at boot if they don't even work safely
to begin with.

Thanks,
Andrea