Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753480AbeADQZo (ORCPT + 1 other);
        Thu, 4 Jan 2018 11:25:44 -0500
Received: from mx1.redhat.com ([209.132.183.28]:37316 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752344AbeADQZn (ORCPT );
        Thu, 4 Jan 2018 11:25:43 -0500
Date: Thu, 4 Jan 2018 17:25:41 +0100
From: Andrea Arcangeli
To: Paolo Bonzini
Cc: Andrew Cooper, "Woodhouse, David", "pavel@ucw.cz",
        "tim.c.chen@linux.intel.com", "linux-kernel@vger.kernel.org",
        "torvalds@linux-foundation.org", "tglx@linutronix.de",
        "andi@firstfloor.org", "gnomes@lxorguk.ukuu.org.uk",
        "dave.hansen@intel.com", "gregkh@linux-foundation.org"
Subject: Re: Avoid speculative indirect calls in kernel
Message-ID: <20180104162541.GD13348@redhat.com>
References: <20180103230934.15788-1-andi@firstfloor.org>
        <20180104114231.GB1702@amd>
        <1515066469.12987.112.camel@amazon.co.uk>
        <94b12025-b27c-04d2-8726-c07a3af6b265@redhat.com>
        <7a3584c6-0c00-d807-5130-13d1f4b34102@citrix.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.9.2 (2017-12-15)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
        (mx1.redhat.com [10.5.110.32]); Thu, 04 Jan 2018 16:25:43 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

On Thu, Jan 04, 2018 at 04:32:01PM +0100, Paolo Bonzini wrote:
> On 04/01/2018 15:51, Andrew Cooper wrote:
> > Where have you got this idea from?  Using IBPB on every mode switch
> > would be an insane overhead to take, and isn't necessary.

It's only on kernel entry and vmexit.

> IIRC it started as a paranoia mode for AMD, but then we found out it was
> actually faster than IBRS on some Intel processor where IBRS performance
> was horrible.  But I don't remember the details of the performance
> testing, sorry.

Yes, which one is faster depends on the workload.

ibrs 0 ibpb 2 can in fact also be used on CPUs with SPEC_CTRL. It's
only where SPEC_CTRL is missing and only IBPB_SUPPORT is available
that ibrs 0 ibpb 2 is the only option to fix variant#2 for good.

If you run lots of syscalls, ibrs 1 ibpb 1 is much faster.

If you do infrequent syscalls that compute a lot in the kernel, like
I/O with large buffers getting copied, ibrs 0 ibpb 2 is much faster
than ibrs 1 ibpb 1 (on those microcodes where ibrs 1 reduces
performance a lot; not all microcodes implementing SPEC_CTRL are
inefficient like that).

If SPEC_CTRL is available, ibrs 1 ibpb 1 should be preferred even if
it may not be faster in every workload.

The AMD website says:

https://www.amd.com/en/corporate/speculative-execution

"Differences in AMD architecture mean there is a near zero risk of
exploitation of this variant."

ibrs 0 ibpb 2 brings the probability down to zero even when SPEC_CTRL
is missing and only IBPB_SUPPORT is available in microcode, if you
need that kind of peace of mind.

What exactly would be the point of shipping fixes for variant#2 if we
left spectre variant#2 unfixed even in cases where we could have fixed
it?

The problem is, it's very unlikely, but if by accident somebody can
mount and set up such an attack, then spectre variant#2 becomes a
problem almost as bad as spectre variant#1 is, and your hypervisor
guest/host isolation is fully compromised.

It's not up to us to decide whether to leave something with "near zero
risk" unfixed by default, so for now we provided a fix that brings the
probability of such a spectre variant#2 attack to zero whenever
possible, so that the attack becomes impossible (not just "near zero
risk").

Of course we made sure the performance comes back at runtime, no
matter what, after running this:

   echo 0 >/sys/kernel/debug/x86/ibpb_enabled
   echo 0 >/sys/kernel/debug/x86/ibrs_enabled

Or, if you prefer, at boot time with "noibrs noibpb".
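
[Editor's sketch, not part of the original mail.] The two tunables
above can also be read back to see which mode is currently engaged.
The following is a minimal sketch under stated assumptions: the
function names describe_mode and report_mode and the X86_DEBUG_DIR
variable are hypothetical, the directory is the one used by the echo
commands above, and the meaning attached to each value pair is taken
only from the text of this mail, not from kernel documentation.

```shell
#!/bin/sh
# Sketch only: map the ibrs_enabled/ibpb_enabled values named in this
# mail to a short description. All names here are hypothetical helpers.

# Assumed default location, taken from the echo commands in the mail.
X86_DEBUG_DIR="${X86_DEBUG_DIR:-/sys/kernel/debug/x86}"

describe_mode() {
    # $1 = ibrs_enabled value, $2 = ibpb_enabled value
    case "$1 $2" in
        "0 0") echo "variant#2 fix off" ;;
        "1 1") echo "ibrs 1 ibpb 1" ;;
        "0 2") echo "ibrs 0 ibpb 2" ;;
        *)     echo "other: ibrs $1 ibpb $2" ;;
    esac
}

report_mode() {
    # Optional $1 overrides the directory holding the two tunables.
    dir="${1:-$X86_DEBUG_DIR}"
    ibrs=$(cat "$dir/ibrs_enabled") || return 1
    ibpb=$(cat "$dir/ibpb_enabled") || return 1
    describe_mode "$ibrs" "$ibpb"
}
```

Calling report_mode with no argument reads the real tunables, which
requires debugfs to be mounted and usually root; passing a directory
lets the mapping be tried against ordinary files.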

Not everyone will necessarily care about that kind of variant#2
attack, of course.

NOTE: if those two tunables both read as 0, it means the fix for
variant#2 isn't activated by the running kernel and you need to
contact your CPU manufacturer for a microcode update providing
SPEC_CTRL or at least IBPB_SUPPORT (in the latter case the fix will
generally tend to perform worse and ibrs 0 ibpb 2 mode will
auto-engage).

For meltdown variant#3 it's the same thing: if you want to disable the
fix at runtime because it's a guest kernel running a single
microservice with a single app (similar to a unikernel) or something
like that, you can with "nopti" or:

   echo 0 >/sys/kernel/debug/x86/pti_enabled

The same applies if it's a bare metal host running a single app that
doesn't store secure data in kernel space, etc...

There's always an option to disable the fixes. Only the spectre
variant#1 fix is always on, as there's no performance overhead to it.

By default it boots in the most secure setting possible, so that
spectre variant#1, spectre variant#2 and meltdown variant#3 are all
fixed.

Thanks,
Andrea