Date: Mon, 8 Jan 2018 11:02:51 +0100
From: Andrea Arcangeli
To: Thomas Gleixner
Cc: Alexei Starovoitov, Dan Williams, Alan Cox, Linus Torvalds, Linux Kernel Mailing List, linux-arch@vger.kernel.org, Andi Kleen, Arnd Bergmann, Greg Kroah-Hartman, Peter Zijlstra, Netdev, Ingo Molnar, "H. Peter Anvin"
Subject: Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
Message-ID: <20180108100251.GJ25546@redhat.com>

On Sat, Jan 06, 2018 at 08:41:34PM +0100, Thomas Gleixner wrote:
> optimized argumentation. We need to make sure that we have a solution which
> kills the problem safely and then take it from there. Correctness first,
> optimization later is the rule for this. Better safe than sorry.

Agreed, assuming the objective here is to achieve a complete spectre fix fast.

Also note there's a whole set of stuff to do in addition to IBRS: IBPB, stuff_RSB() and the register hygiene in kernel entry points and vmexits, which alters the whole syscall stackframe to be able to clear callee saved registers. That register hygiene was one of the most tedious pieces to get right, along with the PTI "rep movsb" (no C) stack trampoline, which never calls into C code with zero stack available because it's very bad to do so, considering C is free to use some stack for register spillage.

I suggest we discuss how important register hygiene is on top of IBRS, IBPB and stuff_RSB() to fix spectre, not future optimizations that only matter for old CPUs and are irrelevant for future silicon.

I also suggest we discuss how to automate the other part of variant#1, the lfence/mfence (depending on arch) across the bound checks, with an open source scanner, or whether to pretend developers will think about it like we think about mb() (except no regression test will ever notice a missing bounds check speculation memory barrier).

Retpolines alone leave a whole set of stuff unfixed: register hygiene is still missing, bios/firmware calls still require IBRS, all asm has to be audited by hand as there's no sure asm scanner I know of (grep can get you somewhere though), the gcc dependency isn't very flexible to begin with, they don't help with lfence/mfence across bound checks, and they still require IBPB and stuff_RSB() to avoid guest/user mode against guest/user mode spectre variant#2 attacks.

I don't see why we should talk about pure performance optimization at this point instead of focusing on the above.

Not to mention the case where you want to guarantee mathematically that guest userland cannot read the guest kernel memory by starting a spectre variant#2 attack from guest userland to host userland (David Gilbert's new attack discovery).
For that you'll have to set ibrs_enabled 2 ibpb_enabled 1 mode or ibrs_enabled 0 ibpb_enabled 2 mode in the host kernel, or alternatively ibrs_enabled 0 ibpb_enabled 2 in the guest kernel. ibrs 2 ibpb 1 will prevent qemu userland from using the IBP, so guest userland cannot probe it. ibrs 0 ibpb 2 will flush the IBP state at vmexit, so qemu userland won't be affected by it. ibrs 0 ibpb 2 in the guest will flush the IBP state at kernel entry, so guest userland won't be able to affect anything.

Of course such an attack from guest user -> guest kernel -> host kernel -> host user -> host kernel -> guest kernel -> guest user and probing IBP (RSB is fixed for good with unconditional stuff_RSB in vmexit even when SMEP is set, precisely because SMEP won't stop guest ring 3 from probing host ring 3 RSB, and the same goes for ring 0) is far fetched, but retpolines alone cannot solve it unless you also build qemu userland with retpolines (which then means the whole userland has to be built with retpolines, because the qemu dependency chain is endless, includes glibc etc.).

As a reminder (for lkml): if you use KVM, spectre variant#2 is the only attack that can affect guest/host memory isolation. spectre variant#1 and meltdown (aka variant#3) have always been impossible through KVM guest/user isolation.

spectre variant#2 is the one that is hardest to fix and it's the most theoretical of them all. It may be impossible to mount as an attack, depending on the host kernel code, which has to play against itself to achieve it. The setup for such an attack is very tedious, takes from half an hour to several hours depending on the amount of memory, and you may already have to know accurately which kernel is running on the host. As opposed to spectre variant#1 and meltdown (aka variant#3), it's very unlikely anybody gets attacked through spectre variant#2. It's also the side channel with the lowest bandwidth in kbytes/sec, if it can be mounted successfully in the first place.
However, if it can be mounted successfully it becomes almost as much of a concern as the other two variants, which is why it needs fixing too.

Thanks,
Andrea
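
As a footnote to the variant#1 point above, here is a minimal sketch of what an lfence across a bounds check looks like in C. The names speculation_barrier() and read_bounded() are hypothetical, chosen for illustration and not taken from the patch series. The sketch assumes gcc or clang inline asm, and on non-x86-64 builds the macro degrades to a plain compiler barrier, which is not a real speculation barrier.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a kernel API. */
#if defined(__x86_64__)
/* lfence: later instructions do not execute until the bounds check
 * branch above it has resolved. */
#define speculation_barrier() __asm__ __volatile__("lfence" ::: "memory")
#else
/* Assumption: placeholder only. This stops compiler reordering but
 * does NOT stop CPU speculation on other architectures. */
#define speculation_barrier() __asm__ __volatile__("" ::: "memory")
#endif

/* Return table[idx] if idx is in bounds, 0 otherwise. */
static uint8_t read_bounded(const uint8_t *table, size_t len, size_t idx)
{
	if (idx >= len)
		return 0;

	/* Without this, the load below may be issued speculatively with an
	 * out-of-bounds idx while the branch is still unresolved. */
	speculation_barrier();

	return table[idx];
}
```

The barrier sits after the check and before the dependent load. Functionally the code behaves the same with or without it, which is why no regression test will notice one missing.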