Message-ID: <6c16fc37-bdf2-4925-8114-14f5a08c07e3@default>
Date: Tue, 23 Jan 2018 03:13:05 -0800 (PST)
From: Liran Alon
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
X-Mailing-List: linux-kernel@vger.kernel.org

----- dwmw2@infradead.org wrote:

> On Sun, 2018-01-21 at 14:27 -0800, Linus Torvalds wrote:
> > On Sun, Jan 21, 2018 at 2:00 PM, David Woodhouse wrote:
> > >>
> > >> The patches do things like add the garbage MSR writes to the kernel
> > >> entry/exit points. That's insane. That says "we're trying to protect
> > >> the kernel". We already have retpoline there, with less overhead.
> > >
> > > You're looking at IBRS usage, not IBPB. They are different things.
> >
> > Ehh. Odd intel naming detail.
> >
> > If you look at this series, it very much does that kernel entry/exit
> > stuff. It was patch 10/10, iirc.
> > In fact, the patch I was replying to was explicitly setting that
> > garbage up.
> >
> > And I really don't want to see these garbage patches just mindlessly
> > sent around.
>
> I think we've covered the technical part of this now, not that you like
> it -- not that any of us *like* it. But since the peanut gallery is
> paying lots of attention it's probably worth explaining it a little
> more for their benefit.
>
> This is all about Spectre variant 2, where the CPU can be tricked into
> mispredicting the target of an indirect branch. And I'm specifically
> looking at what we can do on *current* hardware, where we're limited to
> the hacks they can manage to add in the microcode.
>
> The new microcode from Intel and AMD adds three new features.
>
> One new feature (IBPB) is a complete barrier for branch prediction.
> After frobbing this, no branch targets learned earlier are going to be
> used. It's kind of expensive (order of magnitude ~4000 cycles).
>
> The second (STIBP) protects a hyperthread sibling from following branch
> predictions which were learned on another sibling. You *might* want
> this when running unrelated processes in userspace, for example. Or
> different VM guests running on HT siblings.
>
> The third feature (IBRS) is more complicated. It's designed to be
> set when you enter a more privileged execution mode (i.e. the kernel).
> It prevents branch targets learned in a less-privileged execution mode,
> BEFORE IT WAS MOST RECENTLY SET, from taking effect. But it's not just
> a 'set-and-forget' feature, it also has barrier-like semantics and
> needs to be set on *each* entry into the kernel (from userspace or a VM
> guest). It's *also* expensive. And a vile hack, but for a while it was
> the only option we had.
>
> Even with IBRS, the CPU cannot tell the difference between different
> userspace processes, and between different VM guests.
> So in addition to IBRS to protect the kernel, we need the full IBPB
> barrier on context switch and vmexit. And maybe STIBP while they're
> running.
>
> Then along came Paul with the cunning plan of "oh, indirect branches
> can be exploited? Screw it, let's not have any of *those* then", which
> is retpoline. And it's a *lot* faster than frobbing IBRS on every entry
> into the kernel. It's a massive performance win.
>
> So now we *mostly* don't need IBRS. We build with retpoline, use IBPB
> on context switches/vmexit (which is in the first part of this patch
> series before IBRS is added), and we're safe. We even refactored the
> patch series to put retpoline first.
>
> But wait, why did I say "mostly"? Well, not everyone has a retpoline
> compiler yet... but OK, screw them; they need to update.
>
> Then there's Skylake, and that generation of CPU cores. For complicated
> reasons they actually end up being vulnerable not just on indirect
> branches, but also on a 'ret' in some circumstances (such as 16+ CALLs
> in a deep chain).
>
> The IBRS solution, ugly though it is, did address that. Retpoline
> doesn't. There are patches being floated to detect and prevent deep
> stacks, and deal with some of the other special cases that bite on SKL,
> but those are icky too. And in fact IBRS performance isn't anywhere
> near as bad on this generation of CPUs as it is on earlier CPUs
> *anyway*, which makes it not quite so insane to *contemplate* using it
> as Intel proposed.
>
> That's why my initial idea, as implemented in this RFC patchset, was to
> stick with IBRS on Skylake, and use retpoline everywhere else. I'll
> give you "garbage patches", but they weren't being "just mindlessly
> sent around". If we're going to drop IBRS support and accept the
> caveats, then let's do it as a conscious decision having seen what it
> would look like, not just drop it quietly because poor Davey is too
> scared that Linus might shout at him again.
> :)
>
> I have seen *hand-wavy* analyses of the Skylake thing that mean I'm not
> actually lying awake at night fretting about it, but nothing concrete
> that really says it's OK.
>
> If you view retpoline as a performance optimisation, which is how it
> first arrived, then it's rather unconventional to say "well, it only
> opens a *little* bit of a security hole but it does go nice and fast so
> let's do it".
>
> But fine, I'm content with ditching the use of IBRS to protect the
> kernel, and I'm not even surprised. There's a *reason* we put it last
> in the series, as both the most contentious and most dispensable part.
> I'd be *happier* with a coherent analysis showing Skylake is still OK,
> but hey-ho, screw Skylake.
>
> The early part of the series adds the new feature bits and detects when
> it can turn KPTI off on non-Meltdown-vulnerable Intel CPUs, and also
> supports the IBPB barrier that we need to make retpoline complete. That
> much I think we definitely *do* want. There have been a bunch of us
> working on this behind the scenes; one of us will probably post that
> bit in the next day or so.
>
> I think we also want to expose IBRS to VM guests, even if we don't use
> it ourselves. Because Windows guests (and RHEL guests; yay!) do use it.
>
> If we can be done with the shouty part, I'd actually quite like to have
> a sensible discussion about when, if ever, we do IBPB on context switch
> (ptraceability and dumpable have both been suggested) and when, if
> ever, we set STIBP in userspace.

It is also important to note that current solutions, as I understand
them, still have info-leak issues.

If retpoline is being used, user-mode code can leak RSB entries created
while the CPU was in kernel-mode, therefore breaking KASLR.
In order to handle this, every exit from kernel-mode to user-mode should
stuff the RSB.
In addition, this stuffing of the RSB may need to be done from a fixed
address to avoid leaking the address of the RSB stuffing itself.
The same concept applies to VMEntry into guests. The hypervisor should
stuff the RSB just before VMEntry, otherwise the guest will be able to
leak RSB entries, which reveal hypervisor addresses.

If IBRS is being used, things seem to be even worse.
IBRS prevents BTB entries created at a lower prediction-mode from being
used by higher prediction-mode code.
However, nothing seems to prevent lower prediction-mode code from using
BTB entries of higher prediction-mode code. This means that user-mode
code could leak BTB entries in order to break KASLR, and guests could
leak the host's BTB entries to reveal hypervisor addresses. This seems
to be an issue even with future CPUs that will have the "IBRS
all-the-time" feature.

Note that this issue is not theoretical. This is exactly what Google's
Project-Zero KVM PoC did. They leaked the host's BTB entries to reveal
kvm-intel.ko, kvm.ko & vmlinux addresses.

It seems that the correct way to really handle this scenario should be
to tag every BTB entry with its prediction-mode and make the CPU only
use BTB entries tagged with the current prediction-mode, therefore
entirely separating the BTB entries between prediction-modes. That, in
my opinion, should replace the IBRS-feature.

-Liran
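
To illustrate the asymmetry described above, here is a toy model. It is
only a sketch and does not describe how any real CPU implements its BTB:
the ToyBTB class, the "ibrs" and "tagged" policy names, the index and the
addresses are all made up for illustration. A real leak would also be
observed through a timing side channel, not by reading a predicted
target directly.

```python
# Toy model of a branch target buffer (BTB). Hypothetical throughout:
# real BTBs are hardware structures indexed by partial address bits.

USER, KERNEL = 0, 1  # prediction modes, ordered by privilege


class ToyBTB:
    def __init__(self, policy):
        if policy not in ("ibrs", "tagged"):
            raise ValueError("policy must be 'ibrs' or 'tagged'")
        self.policy = policy
        self.entries = {}  # BTB index -> (mode that trained it, target)

    def train(self, mode, index, target):
        """Record the target of an indirect branch executed in `mode`."""
        self.entries[index] = (mode, target)

    def predict(self, mode, index):
        """Return the target the predictor would use in `mode`, or None."""
        entry = self.entries.get(index)
        if entry is None:
            return None
        trained_mode, target = entry
        if self.policy == "ibrs":
            # Only entries from a *lower* mode are ignored; entries from
            # a higher mode stay visible to less-privileged code.
            usable = trained_mode >= mode
        else:
            # Tagged: an entry is usable only in the mode that created it.
            usable = trained_mode == mode
        return target if usable else None


def user_can_observe_kernel_target(policy):
    """Train an entry in kernel mode, then predict from user mode."""
    btb = ToyBTB(policy)
    kernel_target = 0xFFFFFFFF81000000  # made-up kernel address
    btb.train(KERNEL, 0x40, kernel_target)
    return btb.predict(USER, 0x40) == kernel_target
```

In this model user_can_observe_kernel_target("ibrs") returns True, since
the kernel-trained target is still used for a user-mode prediction, while
user_can_observe_kernel_target("tagged") returns False.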