Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934190AbbDUQ1i (ORCPT ); Tue, 21 Apr 2015 12:27:38 -0400 Received: from mx1.redhat.com ([209.132.183.28]:40982 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932613AbbDUQ1e (ORCPT ); Tue, 21 Apr 2015 12:27:34 -0400 From: Denys Vlasenko To: Ingo Molnar Cc: Denys Vlasenko , Linus Torvalds , Steven Rostedt , Borislav Petkov , "H. Peter Anvin" , Andy Lutomirski , Oleg Nesterov , Frederic Weisbecker , Alexei Starovoitov , Will Drewry , Kees Cook , x86@kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH] x86/asm/entry/64: better check for canonical address Date: Tue, 21 Apr 2015 18:27:29 +0200 Message-Id: <1429633649-20169-1-git-send-email-dvlasenk@redhat.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4694 Lines: 124 This change makes the check exact (no more false positives on "negative" addresses). It isn't really important to be fully correct here - almost all addresses we'll ever see will be userspace ones, but OTOH it looks to be cheap enough: the new code uses two more ALU ops but preserves %rcx, allowing to not reload it from pt_regs->cx again. On disassembly level, the changes are: cmp %rcx,0x80(%rsp) -> mov 0x80(%rsp),%r11; cmp %rcx,%r11 shr $0x2f,%rcx -> shl $0x10,%rcx; sar $0x10,%rcx; cmp %rcx,%r11 mov 0x58(%rsp),%rcx -> (eliminated) On 03/26/2015 07:45 PM, Andy Lutomirski wrote: > I suspect that the two added ALU ops are free for all practical > purposes, and the performance of this path isn't *that* critical. > > If anyone is running with vsyscall=native because they need the > performance, then this would be a big win. Otherwise I don't have a > real preference. Anyone else have any thoughts here? > > Let me just run through the math quickly to make sure I believe all the numbers: > > Canonical addresses either start with 17 zeros or 17 ones. > > In the old code, we checked that the top (64-47) = 17 bits were all > zero. We did this by shifting right by 47 bits and making sure that > nothing was left. > > In the new code, we're shifting left by (64 - 48) = 16 bits and then > signed shifting right by the same amount, this propagating the 17th > highest bit to all positions to its left. If we get the same value we > started with, then we're good to go. > > So it looks okay to me. > > IOW, the new code extends the optimization correctly to one more case > (native vsyscalls or the really weird corner case of returns to > emulated vsyscalls, although that should basically never happen) at > the cost of two probably-free ALU ops. Signed-off-by: Denys Vlasenko CC: Linus Torvalds CC: Steven Rostedt CC: Ingo Molnar CC: Borislav Petkov CC: "H. Peter Anvin" CC: Andy Lutomirski CC: Oleg Nesterov CC: Frederic Weisbecker CC: Alexei Starovoitov CC: Will Drewry CC: Kees Cook CC: x86@kernel.org CC: linux-kernel@vger.kernel.org --- Changes since last submission: expanded commit message with Andy's reply as requested by Ingo. arch/x86/kernel/entry_64.S | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index 3bdfdcd..e952f6b 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -410,26 +410,27 @@ syscall_return: * a completely clean 64-bit userspace context. */ movq RCX(%rsp),%rcx - cmpq %rcx,RIP(%rsp) /* RCX == RIP */ + movq RIP(%rsp),%r11 + cmpq %rcx,%r11 /* RCX == RIP */ jne opportunistic_sysret_failed /* * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP * in kernel space. This essentially lets the user take over - * the kernel, since userspace controls RSP. It's not worth - * testing for canonicalness exactly -- this check detects any - * of the 17 high bits set, which is true for non-canonical - * or kernel addresses. (This will pessimize vsyscall=native. - * Big deal.) + * the kernel, since userspace controls RSP. * - * If virtual addresses ever become wider, this will need + * If width of "canonical tail" ever becomes variable, this will need * to be updated to remain correct on both old and new CPUs. */ .ifne __VIRTUAL_MASK_SHIFT - 47 .error "virtual address width changed -- SYSRET checks need update" .endif - shr $__VIRTUAL_MASK_SHIFT, %rcx - jnz opportunistic_sysret_failed + /* Change top 16 bits to be the sign-extension of 47th bit */ + shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx + sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx + /* If this changed %rcx, it was not canonical */ + cmpq %rcx, %r11 + jne opportunistic_sysret_failed cmpq $__USER_CS,CS(%rsp) /* CS must match SYSRET */ jne opportunistic_sysret_failed @@ -466,8 +467,8 @@ syscall_return: */ syscall_return_via_sysret: CFI_REMEMBER_STATE - /* r11 is already restored (see code above) */ - RESTORE_C_REGS_EXCEPT_R11 + /* rcx and r11 are already restored (see code above) */ + RESTORE_C_REGS_EXCEPT_RCX_R11 movq RSP(%rsp),%rsp USERGS_SYSRET64 CFI_RESTORE_STATE -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/