Date: Sun, 4 Feb 2018 19:43:50 +0100 (CET)
From: Thomas Gleixner
To: Ingo Molnar
Cc: David Woodhouse, Linus Torvalds, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H. Peter Anvin, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers
Subject: Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

On Tue, 23 Jan 2018, Ingo Molnar wrote:
> * David Woodhouse wrote:
> > > On SkyLake this would add an overhead of maybe 2-3 cycles per function
> > > call and obviously all this code and data would be very cache hot. Given
> > > that the average number of function calls per system call is around a
> > > dozen, this would be _much_ faster than any microcode/MSR based approach.
> >
> > That's kind of neat, except you don't want it at the top of the
> > function; you want it at the bottom.
> >
> > If you could hijack the *return* site, then you could check for
> > underflow and stuff the RSB right there. But in __fentry__ there's not
> > a lot you can do other than complain that something bad is going to
> > happen in the future. You know that a string of 16+ rets is going to
> > happen, but you've got no gadget in *there* to deal with it when it
> > does.
>
> No, it can be done with the existing CALL instrumentation callback that
> CONFIG_DYNAMIC_FTRACE=y provides, by pushing a RET trampoline on the stack
> from the CALL trampoline - see my previous email.
>
> > HJ did have patches to turn 'ret' into a form of retpoline, which I
> > don't think ever even got performance-tested.
>
> Return instrumentation is possible as well, but there are two major
> drawbacks:
>
>  - GCC support for it is not as widely available and return instrumentation
>    is less tested in Linux kernel contexts
>
>  - a major point of my suggestion is that CONFIG_DYNAMIC_FTRACE=y is
>    already enabled in distros here and today, so the runtime overhead to
>    non-SkyLake CPUs would be literally zero, while still allowing to fix
>    the RSB vulnerability on SkyLake.

I played around with that a bit during the week and it turns out to be less
simple than you thought.

1) Injecting a trampoline return only works for functions which have all
   arguments in registers. For functions with arguments on the stack, like
   all vararg functions, this breaks because the function won't find its
   arguments anymore.

   I have not yet found a way to reliably figure out which functions have
   arguments on the stack. Simply ignoring them might be an option.

   The workaround is to replace the original return address on the stack
   with the trampoline and store the original return address on a per-thread
   stack, which I implemented. But this sucks badly performance wise.

2) Doing the whole dance on function entry has a real downside because the
   RSB is refilled on every 15th return no matter whether it's required or
   not.
   That gives a very prominent performance hit.

An alternative idea is to do the following (not yet implemented):

__fentry__:
	incl	PER_CPU_VAR(call_depth)
	retq

and use -mfunction-return=thunk-extern, which is available on retpoline
enabled compilers. That's a reasonable requirement because w/o retpoline
the whole SKL magic is pointless anyway.

-mfunction-return=thunk-extern issues a

	jmp	__x86_return_thunk

instead of ret. In the thunk we can do the whole shebang of mitigation.
That jump can be identified at build time and it can be patched into a ret
for unaffected CPUs.

Ideally we do the patching at build time and only patch the jump in when
SKL is detected or paranoia requests it. We could actually look into that
for tracing as well. The only reason why we don't do that is to select the
ideal nop for the CPU the kernel runs on, which obviously cannot be known
at build time.

__x86_return_thunk would look like this:

__x86_return_thunk:
	testl	$0xf, PER_CPU_VAR(call_depth)
	jnz	1f
	stuff_rsb
1:
	decl	PER_CPU_VAR(call_depth)
	ret

The call_depth variable would be reset on context switch.

Though that has another problem: tail calls. A tail call will invoke the
__fentry__ call of the tail-called function, which makes the call_depth
counter unbalanced. Tail calls can be prevented by using
-fno-optimize-sibling-calls, but that probably sucks as well.

Yet another possibility is to avoid the function entry and accounting magic
altogether and use the generic gcc return thunk:

__x86_return_thunk:
	call	L2
L1:
	pause
	lfence
	jmp	L1
L2:
	lea	8(%rsp), %rsp|lea 4(%esp), %esp
	ret

which basically refills the RSB on every return. That can be inline or
extern, but in both cases we should be able to patch it out.

I have no idea how that affects performance, but it might be worthwhile to
experiment with that.

If nobody beats me to it, I'll play around with that some more after
vacation.

Thanks,

	tglx