Date: Thu, 4 Jan 2018 17:25:41 +0100
From: Andrea Arcangeli <aarcange@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
        "Woodhouse, David" <dwmw@amazon.co.uk>,
        "pavel@ucw.cz" <pavel@ucw.cz>,
        "tim.c.chen@linux.intel.com" <tim.c.chen@linux.intel.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
        "tglx@linutronix.de" <tglx@linutronix.de>,
        "andi@firstfloor.org" <andi@firstfloor.org>,
        "gnomes@lxorguk.ukuu.org.uk" <gnomes@lxorguk.ukuu.org.uk>,
        "dave.hansen@intel.com" <dave.hansen@intel.com>,
        "gregkh@linux-foundation.org" <gregkh@linux-foundation.org>
Subject: Re: Avoid speculative indirect calls in kernel
Message-ID: <20180104162541.GD13348@redhat.com>
References: <20180103230934.15788-1-andi@firstfloor.org>
 <CA+55aFzCocoK+4kxAUEhaxxba4RTv3ewBmhiZ8Osc9iDkBtCEQ@mail.gmail.com>
 <e64025e6-9c4f-703d-cb69-29eeb9cc16bd@redhat.com>
 <20180104114231.GB1702@amd>
 <1515066469.12987.112.camel@amazon.co.uk>
 <94b12025-b27c-04d2-8726-c07a3af6b265@redhat.com>
 <7a3584c6-0c00-d807-5130-13d1f4b34102@citrix.com>
 <ad2e4bc5-26cb-c225-e869-0621fd0c9a7c@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <ad2e4bc5-26cb-c225-e869-0621fd0c9a7c@redhat.com>
User-Agent: Mutt/1.9.2 (2017-12-15)
Sender: linux-kernel-owner@vger.kernel.org

Hello,

On Thu, Jan 04, 2018 at 04:32:01PM +0100, Paolo Bonzini wrote:
> On 04/01/2018 15:51, Andrew Cooper wrote:
> > Where have you got this idea from?? Using IBPB on every mode switch
> > would be an insane overhead to take, and isn't necessary.

It's only on kernel entry and vmexit.

> IIRC it started as a paranoia mode for AMD, but then we found out it was
> actually faster than IBRS on some Intel processor where IBRS performance
> was horrible.  But I don't remember the details of the performance
> testing, sorry.

Yes, it depends on the workload what is faster. ibrs 0 ibpb 2 is
possible to use on CPUs with SPEC_CTRL too in fact.

It's only where SPEC_CTRL is missing and only IBPB_SUPPORT is
available, that ibrs 0 ibpb 2 is the only option to fix variant#2 for
good.

If you run lots of syscalls ibrs 1 ibpb 1 is much faster. If you do
infrequent syscalls computing a lot in kernel like I/O with large
buffers getting copied, ibrs 0 ibpb 2 is much faster than ibrs 1 ibpb
1 (on those microcodes where ibrs 1 reduces performance a lot, not all
microcodes implementing SPEC_CTRL are inefficient like that).

If SPEC_CTRL is available ibrs 1 ibpb 1 should be preferred even if it
may not always be faster in every workload.

AMD website says
https://www.amd.com/en/corporate/speculative-execution

"Differences in AMD architecture mean there is a near zero risk of
exploitation of this variant."

ibrs 0 ibpb 2 brings the probability down to zero even when SPEC_CTRL
is missing and only IBPB_SUPPORT is available in microcode, if you
need that kind of piece of mind.

What exactly would be the point of shipping fixes for variant#2 if we
leave spectre variant#2 unfixed also in cases where we could have
fixed it?

The problem is, it's very unlikely, but if by accident somebody can
mount and setup such an attack, then spectre variant#2 becomes a
problem almost as bad as spectre variant#1 is and your hypervisor
guest/host isolation is fully compromised.

It's not up to us to decide if to leave something with "near zero
risk" unfixed by default, so for now we provided a fix that brings the
probability of such spectre variant#2 attack to zero whenever
possible so that such a spectre varaint#2 attack becomes impossible
(not just "near zero risk"").

Of course we made sure the performance comes back at runtime no matter
what after running this:

    echo 0 >/sys/kernel/debug/x86/ibpb_enabled
    echo 0 >/sys/kernel/debug/x86/ibrs_enabled

Or if you prefer at boot time with "noibrs noibpb". Not everyone
will necessarily care about that kind of variant#2 attacks of course.

NOTE: if those two tunables both read as 0 it means the fix for
variant#2 isn't activated by the running kernel and you need to
contact your CPU manufacturer for a microcode update providing
SPEC_CTRL or at least IBPB_SUPPORT (in the latter case the fix will
generally tend to perform worse and ibrs 0 ibpb 2 mode will
auto-engage).

For meltdown variant#3 same thing: if you want to disable the fix at
runtime because it's a guest kernel and it's running a single
microservice with a single app (similar to unikernel) or something
like that, you can with "nopti" or:

    echo 0 >/sys/kernel/debug/x86/pti_enabled

Same issue if it's a bare metal host and it's running a single app and
it doesn't store secure data in kernel space etc... There's always an
option to disable the fixes.

Only spectre variant#1 fix is always on, as there's no performance
overhead to it.

By default it boots in the most secure setting possible so that all
spectre variant#1 and variant2 and meltdown variant#3 are fixed.

Thanks,
Andrea