2015-04-21 16:27:38

by Denys Vlasenko

Subject: [PATCH] x86/asm/entry/64: better check for canonical address

This change makes the check exact (no more false positives
on "negative" addresses).

Being fully correct isn't really important here -
almost all addresses we'll ever see will be userspace ones -
but the exact check turns out to be cheap enough:
the new code uses two more ALU ops but preserves %rcx,
which lets us avoid reloading it from pt_regs->cx.
On disassembly level, the changes are:

cmp %rcx,0x80(%rsp) -> mov 0x80(%rsp),%r11; cmp %rcx,%r11
shr $0x2f,%rcx -> shl $0x10,%rcx; sar $0x10,%rcx; cmp %rcx,%r11
mov 0x58(%rsp),%rcx -> (eliminated)

On 03/26/2015 07:45 PM, Andy Lutomirski wrote:
> I suspect that the two added ALU ops are free for all practical
> purposes, and the performance of this path isn't *that* critical.
>
> If anyone is running with vsyscall=native because they need the
> performance, then this would be a big win. Otherwise I don't have a
> real preference. Anyone else have any thoughts here?
>
> Let me just run through the math quickly to make sure I believe all the numbers:
>
> Canonical addresses either start with 17 zeros or 17 ones.
>
> In the old code, we checked that the top (64-47) = 17 bits were all
> zero. We did this by shifting right by 47 bits and making sure that
> nothing was left.
>
> In the new code, we're shifting left by (64 - 48) = 16 bits and then
> signed shifting right by the same amount, thus propagating the 17th
> highest bit to all positions to its left. If we get the same value we
> started with, then we're good to go.
>
> So it looks okay to me.
>
> IOW, the new code extends the optimization correctly to one more case
> (native vsyscalls or the really weird corner case of returns to
> emulated vsyscalls, although that should basically never happen) at
> the cost of two probably-free ALU ops.
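
For readers who want to poke at the arithmetic, here is a minimal userspace
C sketch of the two checks described above - illustration only, not the
kernel code. It assumes the 48-bit virtual address width
(__VIRTUAL_MASK_SHIFT == 47) and relies on arithmetic right shifts of
signed values, which GCC and Clang provide:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VIRTUAL_MASK_SHIFT 47

/* Old check: shr $47; jnz -> rejects anything with a high bit set,
 * including perfectly canonical "negative" (kernel-half) addresses. */
static bool old_check_ok(uint64_t rip)
{
        return (rip >> VIRTUAL_MASK_SHIFT) == 0;
}

/* New check: shl $16; sar $16; cmp -> accepts an address iff
 * sign-extending bit 47 reproduces the original value, i.e. iff
 * the address is canonical. */
static bool new_check_ok(uint64_t rip)
{
        int shift = 64 - (VIRTUAL_MASK_SHIFT + 1);
        int64_t sext = (int64_t)(rip << shift) >> shift;

        return (uint64_t)sext == rip;
}

int main(void)
{
        uint64_t samples[] = {
                0x0000400000001000ULL, /* typical userspace address: both pass   */
                0xffff800000001000ULL, /* canonical "negative": old check fails  */
                0x0000800000000000ULL, /* non-canonical: both reject             */
        };

        for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
                printf("%#018llx  old:%d  new:%d\n",
                       (unsigned long long)samples[i],
                       old_check_ok(samples[i]), new_check_ok(samples[i]));
        return 0;
}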

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---

Changes since last submission: expanded commit message with Andy's reply
as requested by Ingo.

arch/x86/kernel/entry_64.S | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 3bdfdcd..e952f6b 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -410,26 +410,27 @@ syscall_return:
* a completely clean 64-bit userspace context.
*/
movq RCX(%rsp),%rcx
- cmpq %rcx,RIP(%rsp) /* RCX == RIP */
+ movq RIP(%rsp),%r11
+ cmpq %rcx,%r11 /* RCX == RIP */
jne opportunistic_sysret_failed

/*
* On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
* in kernel space. This essentially lets the user take over
- * the kernel, since userspace controls RSP. It's not worth
- * testing for canonicalness exactly -- this check detects any
- * of the 17 high bits set, which is true for non-canonical
- * or kernel addresses. (This will pessimize vsyscall=native.
- * Big deal.)
+ * the kernel, since userspace controls RSP.
*
- * If virtual addresses ever become wider, this will need
+ * If width of "canonical tail" ever becomes variable, this will need
* to be updated to remain correct on both old and new CPUs.
*/
.ifne __VIRTUAL_MASK_SHIFT - 47
.error "virtual address width changed -- SYSRET checks need update"
.endif
- shr $__VIRTUAL_MASK_SHIFT, %rcx
- jnz opportunistic_sysret_failed
+ /* Change top 16 bits to be the sign-extension of 47th bit */
+ shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
+ sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
+ /* If this changed %rcx, it was not canonical */
+ cmpq %rcx, %r11
+ jne opportunistic_sysret_failed

cmpq $__USER_CS,CS(%rsp) /* CS must match SYSRET */
jne opportunistic_sysret_failed
@@ -466,8 +467,8 @@ syscall_return:
*/
syscall_return_via_sysret:
CFI_REMEMBER_STATE
- /* r11 is already restored (see code above) */
- RESTORE_C_REGS_EXCEPT_R11
+ /* rcx and r11 are already restored (see code above) */
+ RESTORE_C_REGS_EXCEPT_RCX_R11
movq RSP(%rsp),%rsp
USERGS_SYSRET64
CFI_RESTORE_STATE
--
1.8.1.4


2015-04-21 18:09:09

by Andy Lutomirski

Subject: Re: [PATCH] x86/asm/entry/64: better check for canonical address

On Tue, Apr 21, 2015 at 9:27 AM, Denys Vlasenko <[email protected]> wrote:
> This change makes the check exact (no more false positives
> on "negative" addresses).

Acked-by: Andy Lutomirski <[email protected]>

I'll take a full implementation of what Intel says over probably
unmeasurable performance. If anyone in the AMD camp really cared, we
could add X86_BUG_SYSRET_NEEDS_CANONICAL_RCX and use alternatives to
patch this out on AMD. I doubt this would buy us much.

--Andy

Subject: [tip:x86/asm] x86/asm/entry/64: Implement better check for canonical addresses

Commit-ID: 17be0aec74fb036eb4eb32c2268f3420a034762b
Gitweb: http://git.kernel.org/tip/17be0aec74fb036eb4eb32c2268f3420a034762b
Author: Denys Vlasenko <[email protected]>
AuthorDate: Tue, 21 Apr 2015 18:27:29 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 22 Apr 2015 08:37:56 +0200

x86/asm/entry/64: Implement better check for canonical addresses

This change makes the check exact (no more false positives
on "negative" addresses).

Andy explains:

"Canonical addresses either start with 17 zeros or 17 ones.

In the old code, we checked that the top (64-47) = 17 bits were all
zero. We did this by shifting right by 47 bits and making sure that
nothing was left.

In the new code, we're shifting left by (64 - 48) = 16 bits and then
signed shifting right by the same amount, thus propagating the 17th
highest bit to all positions to its left. If we get the same value we
started with, then we're good to go."

Being fully correct isn't really important here -
almost all addresses we'll ever see will be userspace ones -
but the exact check turns out to be cheap enough: the new code
uses two more ALU ops but preserves %rcx, which lets us avoid
reloading it from pt_regs->cx.

On disassembly level, the changes are:

cmp %rcx,0x80(%rsp) -> mov 0x80(%rsp),%r11; cmp %rcx,%r11
shr $0x2f,%rcx -> shl $0x10,%rcx; sar $0x10,%rcx; cmp %rcx,%r11
mov 0x58(%rsp),%rcx -> (eliminated)

Signed-off-by: Denys Vlasenko <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Drewry <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Changelog massage. ]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/entry_64.S | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index c7b2384..3c78a15 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -410,26 +410,27 @@ syscall_return:
* a completely clean 64-bit userspace context.
*/
movq RCX(%rsp),%rcx
- cmpq %rcx,RIP(%rsp) /* RCX == RIP */
+ movq RIP(%rsp),%r11
+ cmpq %rcx,%r11 /* RCX == RIP */
jne opportunistic_sysret_failed

/*
* On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
* in kernel space. This essentially lets the user take over
- * the kernel, since userspace controls RSP. It's not worth
- * testing for canonicalness exactly -- this check detects any
- * of the 17 high bits set, which is true for non-canonical
- * or kernel addresses. (This will pessimize vsyscall=native.
- * Big deal.)
+ * the kernel, since userspace controls RSP.
*
- * If virtual addresses ever become wider, this will need
+ * If width of "canonical tail" ever becomes variable, this will need
* to be updated to remain correct on both old and new CPUs.
*/
.ifne __VIRTUAL_MASK_SHIFT - 47
.error "virtual address width changed -- SYSRET checks need update"
.endif
- shr $__VIRTUAL_MASK_SHIFT, %rcx
- jnz opportunistic_sysret_failed
+ /* Change top 16 bits to be the sign-extension of 47th bit */
+ shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
+ sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
+ /* If this changed %rcx, it was not canonical */
+ cmpq %rcx, %r11
+ jne opportunistic_sysret_failed

cmpq $__USER_CS,CS(%rsp) /* CS must match SYSRET */
jne opportunistic_sysret_failed
@@ -466,8 +467,8 @@ syscall_return:
*/
syscall_return_via_sysret:
CFI_REMEMBER_STATE
- /* r11 is already restored (see code above) */
- RESTORE_C_REGS_EXCEPT_R11
+ /* rcx and r11 are already restored (see code above) */
+ RESTORE_C_REGS_EXCEPT_RCX_R11
movq RSP(%rsp),%rsp
USERGS_SYSRET64
CFI_RESTORE_STATE

2015-04-23 15:11:00

by Borislav Petkov

Subject: Re: [PATCH] x86/asm/entry/64: better check for canonical address

On Tue, Apr 21, 2015 at 11:08:42AM -0700, Andy Lutomirski wrote:
> I'll take a full implementation of what Intel says over probably
> unmeasurable performance. If anyone in the AMD camp really cared, we
> could add X86_BUG_SYSRET_NEEDS_CANONICAL_RCX and use alternatives to
> patch this out on AMD. I doubt this would buy us much.

Err, why do we care if RCX is canonical when executing SYSRET?

The RIP canonicalness test is being done anyway and we read RIP from
RCX. What am I missing?

Oh, and then there's this:

http://lists.xen.org/archives/html/xen-announce/2012-06/msg00001.html

and more specifically:

"AMD have issued the following statement:

AMD processors' SYSRET behavior is such that a non-canonical address
in RCX does not generate a #GP while in CPL0. We have verified this
with our architecture team, with our design team, and have performed
tests that verified this on silicon. Therefore, this privilege
escalation exposure is not applicable to any AMD processor.
"

Oh, and look at one of the Xen fixes, hahaha! (at the end). Intel faults
in ring0 due to a non-canonical RCX, but with the user RSP already loaded. Lovely.

---
x86_64: Do not execute sysret with a non-canonical return address

Check for non-canonical guest RIP before attempting to execute sysret.
If sysret is executed with a non-canonical value in RCX, Intel CPUs
take the fault in ring0, but we will necessarily already have switched
to the user's stack pointer.

This is a security vulnerability, XSA-7 / CVE-2012-0217.

Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Ian Campbell <[email protected]>
Signed-off-by: Ian Jackson <[email protected]>
Tested-by: Ian Campbell <[email protected]>
Acked-by: Keir Fraser <[email protected]>
Committed-by: Ian Jackson <[email protected]>

diff -r 340062faf298 -r ad87903fdca1 xen/arch/x86/x86_64/entry.S
--- a/xen/arch/x86/x86_64/entry.S Wed May 23 11:06:49 2012 +0100
+++ b/xen/arch/x86/x86_64/entry.S Thu May 24 11:02:35 2012 +0100
@@ -40,6 +40,13 @@ restore_all_guest:
testw $TRAP_syscall,4(%rsp)
jz iret_exit_to_guest

+ /* Don't use SYSRET path if the return address is not canonical. */
+ movq 8(%rsp),%rcx
+ sarq $47,%rcx
+ incl %ecx
+ cmpl $1,%ecx
+ ja .Lforce_iret
+
addq $8,%rsp
popq %rcx # RIP
popq %r11 # CS
@@ -50,6 +57,10 @@ restore_all_guest:
sysretq
1: sysretl

+.Lforce_iret:
+ /* Mimic SYSRET behavior. */
+ movq 8(%rsp),%rcx # RIP
+ movq 24(%rsp),%r11 # RFLAGS
ALIGN
/* No special register assumptions. */
iret_exit_to_guest:
---
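
The Xen check above is the same canonicality test phrased differently:
after an arithmetic shift right by 47, a canonical address collapses to 0
(user half) or -1 (kernel half), and the inc/cmp/ja sequence rejects
everything else. A small userspace C sketch of that logic (again,
illustration only):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Equivalent of "sarq $47; incl; cmpl $1; ja": canonical iff the top
 * 17 bits are all zeros or all ones, i.e. iff the shifted value plus
 * one is 0 or 1. */
static bool xen_style_canonical(uint64_t rip)
{
        return (uint64_t)(((int64_t)rip >> 47) + 1) <= 1;
}

int main(void)
{
        printf("%d %d %d\n",
               xen_style_canonical(0x00007fffffffe000ULL),  /* user, canonical   */
               xen_style_canonical(0xffff880000000000ULL),  /* kernel, canonical */
               xen_style_canonical(0x0000800000000000ULL)); /* non-canonical     */
        return 0;
}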

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-23 15:41:57

by Andy Lutomirski

Subject: Re: [PATCH] x86/asm/entry/64: better check for canonical address

On Thu, Apr 23, 2015 at 8:10 AM, Borislav Petkov <[email protected]> wrote:
> On Tue, Apr 21, 2015 at 11:08:42AM -0700, Andy Lutomirski wrote:
>> I'll take a full implementation of what Intel says over probably
>> unmeasurable performance. If anyone in the AMD camp really cared, we
>> could add X86_BUG_SYSRET_NEEDS_CANONICAL_RCX and use alternatives to
>> patch this out on AMD. I doubt this would buy us much.
>
> Err, why do we care if RCX is canonical when executing SYSRET?
>
> The RIP canonicalness test is being done anyway and we read RIP from
> RCX. What am I missing?

I was rather vague there. Let me try again:

If anyone in the AMD camp really cared, we could add a new bug flag
X86_BUG_SYSRET_NEEDS_CANONICAL_RCX and set it on Intel chips only, so
we could use alternatives to patch out the check when running on
sensible AMD hardware. This would speed the slow path up by a couple
of cycles on AMD chips.
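
As a rough userspace illustration of that idea (the bug-flag name is only
the one proposed in this thread; real kernel code would use the
X86_BUG/alternatives machinery rather than a runtime branch), the gist is:

#include <cpuid.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decide once, from the CPUID vendor string, whether this CPU #GPs in
 * ring 0 on SYSRET with a non-canonical RCX and therefore needs the
 * software check before using SYSRET. */
static bool cpu_is_intel(void)
{
        unsigned int eax, ebx, ecx, edx;
        char vendor[13] = { 0 };

        if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
                return false;
        memcpy(vendor + 0, &ebx, 4);
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
        return strcmp(vendor, "GenuineIntel") == 0;
}

static bool canonical(uint64_t rcx)
{
        return (uint64_t)((int64_t)(rcx << 16) >> 16) == rcx;
}

int main(void)
{
        bool sysret_needs_canonical_rcx = cpu_is_intel(); /* the proposed "bug" bit */
        uint64_t rcx = 0x0000800000000000ULL;             /* non-canonical example  */

        if (sysret_needs_canonical_rcx && !canonical(rcx))
                printf("this CPU would have to fall back to IRET here\n");
        else
                printf("this CPU could use SYSRET here\n");
        return 0;
}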

Does that make more sense? We could call it
X86_BUG_SYSRET_NEEDS_CANONICAL_RIP if that makes more sense.

--Andy

2015-04-23 15:49:37

by Borislav Petkov

Subject: Re: [PATCH] x86/asm/entry/64: better check for canonical address

On Thu, Apr 23, 2015 at 08:41:15AM -0700, Andy Lutomirski wrote:
> I was rather vague there. Let me try again:
>
> If anyone in the AMD camp really cared, we could add a new bug flag
> X86_BUG_SYSRET_NEEDS_CANONICAL_RCX and set it on Intel chips only, so
> we could use alternatives to patch out the check when running on
> sensible AMD hardware. This would speed the slow path up by a couple
> of cycles on AMD chips.
>
> Does that make more sense? We could call it
> X86_BUG_SYSRET_NEEDS_CANONICAL_RIP if that makes more sense.

Actually "...NEEDS_CANONICAL_RCX" makes more sense as this is what we're
going to patch out eventually, if it makes sense - the RIP canonicalness
test is being done as part of SYSRET, just RCX is not being tested.

Tell you what - how about I perf stat this first by commenting out that
couple of instructions on AMD to see whether it brings anything.

Got an idea for a workload other than a kernel build? :-)

Although a kernel build should do a lot of syscalls too...

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-23 15:52:39

by Andy Lutomirski

Subject: Re: [PATCH] x86/asm/entry/64: better check for canonical address

On Thu, Apr 23, 2015 at 8:49 AM, Borislav Petkov <[email protected]> wrote:
> On Thu, Apr 23, 2015 at 08:41:15AM -0700, Andy Lutomirski wrote:
>> I was rather vague there. Let me try again:
>>
>> If anyone in the AMD camp really cared, we could add a new bug flag
>> X86_BUG_SYSRET_NEEDS_CANONICAL_RCX and set it on Intel chips only, so
>> we could use alternatives to patch out the check when running on
>> sensible AMD hardware. This would speed the slow path up by a couple
>> of cycles on AMD chips.
>>
>> Does that make more sense? We could call it
>> X86_BUG_SYSRET_NEEDS_CANONICAL_RIP if that makes more sense.
>
> Actually "...NEEDS_CANONICAL_RCX" makes more sense as this is what we're
> going to patch out eventually, if it makes sense - the RIP canonicalness
> test is being done as part of SYSRET, just RCX is not being tested.
>
> Tell you what - how about I perf stat this first by commenting out that
> couple of instructions on AMD to see whether it brings anything.
>
> Got an idea for a workload other than a kernel build? :-)
>
> Although a kernel build should do a lot of syscalls too...

Kernel build should be fine. Or "timing_test_64 10 sys_enosys 1" or
"perf_self_monitor" (warning: WIP). Make sure you either have context
tracking forced on or something else (audit?) that forces the slow
path, though, or you won't see it at all.

https://git.kernel.org/cgit/linux/kernel/git/luto/misc-tests.git/
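
For a self-contained measurement along these lines, a minimal C
microbenchmark in the spirit of the sys_enosys test mentioned above (not
the actual timing_test_64 tool) could look like this; Andy's caveat still
applies, i.e. force the slow path (context tracking, audit) or the
difference won't show up at all:

#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* syscall(-1) fails with -ENOSYS almost immediately, so the loop mostly
 * measures kernel entry/exit overhead.  Run under "perf stat" on the
 * kernels being compared. */
int main(void)
{
        const long iters = 10 * 1000 * 1000;
        struct timespec start, end;
        double ns;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iters; i++)
                syscall(-1);    /* no such syscall */
        clock_gettime(CLOCK_MONOTONIC, &end);

        ns = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
        printf("%.1f ns per syscall\n", ns / iters);
        return 0;
}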

--Andy




--
Andy Lutomirski
AMA Capital Management, LLC