From: Paul Turner
Date: Thu, 4 Jan 2018 01:24:41 -0800
Subject: Re: [RFC] Retpoline: Binary mitigation for branch-target-injection (aka "Spectre")
To: LKML, Linus Torvalds, Greg Kroah-Hartman, "Woodhouse, David", Tim Chen, Dave Hansen, tglx@linuxtronix.de, Kees Cook, Rik van Riel, Peter Zijlstra, Andy Lutomirski, Jiri Kosina, gnomes@lxorguk.ukuu.org.uk
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 4, 2018 at 1:10 AM, Paul Turner wrote:
> Apologies for the discombobulation around today's disclosure. Obviously the
> original goal was to communicate this a little more coherently, but the
> unscheduled advances in the disclosure disrupted the efforts to pull this
> together more cleanly.
>
> I wanted to open discussion of the "retpoline" approach and define its
> requirements so that we can separate the core details from questions
> regarding any particular implementation thereof.
>
> As a starting point, a full write-up describing the approach is available at:
> https://support.google.com/faqs/answer/7625886
>
> The 30 second version is:
> Returns are a special type of indirect branch. As function returns are intended
> to pair with function calls, processors often implement dedicated return stack
> predictors.
> Choosing this form of branch prediction allows us to generate an
> indirect branch in which speculative execution is intentionally redirected into
> a controlled location by a return stack target that we control, preventing
> branch target injections (also known as "Spectre") against these binaries.
>
> On the targets (Intel Xeon) we have measured so far, the cost is within cycles
> of a "native" indirect branch for which branch prediction hardware has been
> disabled. This is unfortunately measurable -- from 3 cycles on average to
> about 30. However, the cost is largely mitigated for many workloads since the
> kernel uses comparatively few indirect branches (versus say, a C++ binary).
> With some effort we have the average overall overhead within the 0-1.5% range
> for our internal workloads, including some particularly high packet processing
> engines.
>
> There are several components, the majority of which are independent of kernel
> modifications:
>
> (1) A compiler supporting retpoline transformations.

An implementation for LLVM is available at:
https://reviews.llvm.org/D41723

> (1a) Optionally: annotations for hand-coded indirect jmps, so that they may be
>      made compatible with (1).
>      [ Note: The only known indirect jmp which is not safe to convert is the
>      early virtual address check in head entry. ]
> (2) Kernel modifications for preventing return-stack underflow (see document
>     above).
>     The key points where this occurs are:
>     - Context switches (into protected targets)
>     - interrupt return (we return into potentially unwinding execution)
>     - sleep state exit (flushes caches)
>     - guest exit.
>     (These can be run-time gated; a full refill costs 30-45 cycles.)
> (3) Optional: Optimizations so that direct branches can be used for hot kernel
>     indirects. While, as discussed above, kernel execution generally depends on
>     fewer indirect branches, there are a few places (in particular, the
>     networking stack) where we have chained sequences of indirects on hot paths.
> (4) More general support for guarding against RSB underflow in an affected
>     target. While this is harder to exploit and may not be required for many
>     users, the approaches we have used here are not generally applicable.
>     Further discussion is required.
>
> With respect to what these deltas mean for an unmodified kernel:

Sorry, this should have been: a kernel that does not care about this protection.

It has been a long day :-).

> (1a) At minimum, annotation only. More complicated config and
>      run-time gated options are also possible.
> (2) Trivially run-time & config gated.
> (3) The de-virtualizing of these branches improves performance in both the
>     retpoline and non-retpoline cases.
>
> For an out of the box kernel that is reasonably protected, (1)-(3) are required.
>
> I apologize that this does not come with a clean set of patches, merging the
> things that we and Intel have looked at here. That was one of the original
> goals for this week. Strictly speaking, I think that Andi, David, and I have
> a fair amount of merging and clean-up to do here. This is an attempt
> to keep discussion of the fundamentals at least independent of that.
>
> I'm trying to keep the above reasonably compact/dense. I'm happy to expand on
> any details in sub-threads. I'll also link back some of the other compiler work
> which is landing for (1).
>
> Thanks,
>
> - Paul
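
For reference, the idea behind (3) can be sketched in plain C. The mail does
not say how the direct branches are produced, so this is only one possible
shape, under stated assumptions: compare the function pointer against the
target expected to be hot and, on a match, call that function directly. The
names handler_fn, hot_handler, cold_handler and call_devirt are hypothetical
and are not taken from the kernel or from any patch in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type, for illustration only. */
typedef int (*handler_fn)(int);

/* Assumed to be the common ("hot") target on this path. */
int hot_handler(int x)
{
    return x + 1;
}

/* Assumed to be a rare target that still has to work. */
int cold_handler(int x)
{
    return x * 2;
}

/*
 * Call fn(arg). When fn is the expected hot target, the call is written
 * as a direct call, so that path needs no indirect branch (and so no
 * retpoline). Any other target falls back to the ordinary indirect call.
 */
int call_devirt(handler_fn fn, int arg)
{
    assert(fn != NULL);

    if (fn == hot_handler)
        return hot_handler(arg);   /* direct call, target known at compile time */

    return fn(arg);                /* fallback: indirect call */
}
```

Both paths return the same result as calling fn(arg) directly; the only
difference is which kind of branch the hot path uses. Whether the extra
compare pays off depends on how often the guess is right, which is why this
only makes sense for the few hot call sites mentioned in (3).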