2014-07-20 21:43:04

by Sven Wegener

[permalink] [raw]
Subject: [PATCH] x86_32, entry: store badsys error code in %eax

Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
(CVE-2014-4508)") introduced a subtle regression in the x86_32 syscall
entry code, resulting in syscall() not returning proper errors for
non-existing syscalls on CPUs not supporting the sysenter feature.

The following code:

> int result = syscall(666);
> printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));

results in:

> result=666 errno=0 error=Success

Obviously, the syscall return value is the called syscall number, but it
should have been an ENOSYS error. When run under ptrace it behaves
correctly, which makes it hard to debug in the wild:

> result=-1 errno=38 error=Function not implemented

The %eax register is the return value register. For debugging via ptrace
the syscall entry code stores the complete register context on the
stack. The badsys handlers only store the ENOSYS error code in the
ptrace register set and do not set %eax like a regular syscall handler
would. The old resume_userspace call chain contains code that clobbers
%eax and it restores %eax from the ptrace registers afterwards. The same
goes for the ptrace-enabled call chain. When ptrace is not used, the
syscall return value is the passed-in syscall number from the
%eax register.

Use %eax as the return value register in syscall_badsys and
sysenter_badsys, like a real syscall handler does, and have the caller
push the value onto the stack for ptrace access.

Signed-off-by: Sven Wegener <[email protected]>
---
arch/x86/kernel/entry_32.S | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index dbaa23e..0d0c9d4 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -425,8 +425,8 @@ sysenter_do_call:
cmpl $(NR_syscalls), %eax
jae sysenter_badsys
call *sys_call_table(,%eax,4)
- movl %eax,PT_EAX(%esp)
sysenter_after_call:
+ movl %eax,PT_EAX(%esp)
LOCKDEP_SYS_EXIT
DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_OFF
@@ -502,6 +502,7 @@ ENTRY(system_call)
jae syscall_badsys
syscall_call:
call *sys_call_table(,%eax,4)
+syscall_after_call:
movl %eax,PT_EAX(%esp) # store the return value
syscall_exit:
LOCKDEP_SYS_EXIT
@@ -675,12 +676,12 @@ syscall_fault:
END(syscall_fault)

syscall_badsys:
- movl $-ENOSYS,PT_EAX(%esp)
- jmp syscall_exit
+ movl $-ENOSYS,%eax
+ jmp syscall_after_call
END(syscall_badsys)

sysenter_badsys:
- movl $-ENOSYS,PT_EAX(%esp)
+ movl $-ENOSYS,%eax
jmp sysenter_after_call
END(syscall_badsys)
CFI_ENDPROC


2014-07-20 22:09:06

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86_32, entry: store badsys error code in %eax

This is not a subtle regression at all. It is in fact a very very serious one.

On July 20, 2014 2:33:50 PM PDT, Sven Wegener <[email protected]> wrote:
>Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
>(CVE-2014-4508)") introduced a subtle regression in the x86_32 syscall
>entry code, resulting in syscall() not returning proper errors for
>non-existing syscalls on CPUs not supporting the sysenter feature.
>
>The following code:
>
>> int result = syscall(666);
>> printf("result=%d errno=%d error=%s\n", result, errno,
>strerror(errno));
>
>results in:
>
>> result=666 errno=0 error=Success
>
>Obviously, the syscall return value is the called syscall number, but
>it
>should have been an ENOSYS error. When run under ptrace it behaves
>correctly, which makes it hard to debug in the wild:
>
>> result=-1 errno=38 error=Function not implemented
>
>The %eax register is the return value register. For debugging via
>ptrace
>the syscall entry code stores the complete register context on the
>stack. The badsys handlers only store the ENOSYS error code in the
>ptrace register set and do not set %eax like a regular syscall handler
>would. The old resume_userspace call chain contains code that clobbers
>%eax and it restores %eax from the ptrace registers afterwards. The
>same
>goes for the ptrace-enabled call chain. When ptrace is not used, the
>syscall return value is the passed-in syscall number from the
>%eax register.
>
>Use %eax as the return value register in syscall_badsys and
>sysenter_badsys, like a real syscall handler does, and have the caller
>push the value onto the stack for ptrace access.
>
>Signed-off-by: Sven Wegener <[email protected]>
>---
> arch/x86/kernel/entry_32.S | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
>diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
>index dbaa23e..0d0c9d4 100644
>--- a/arch/x86/kernel/entry_32.S
>+++ b/arch/x86/kernel/entry_32.S
>@@ -425,8 +425,8 @@ sysenter_do_call:
> cmpl $(NR_syscalls), %eax
> jae sysenter_badsys
> call *sys_call_table(,%eax,4)
>- movl %eax,PT_EAX(%esp)
> sysenter_after_call:
>+ movl %eax,PT_EAX(%esp)
> LOCKDEP_SYS_EXIT
> DISABLE_INTERRUPTS(CLBR_ANY)
> TRACE_IRQS_OFF
>@@ -502,6 +502,7 @@ ENTRY(system_call)
> jae syscall_badsys
> syscall_call:
> call *sys_call_table(,%eax,4)
>+syscall_after_call:
> movl %eax,PT_EAX(%esp) # store the return value
> syscall_exit:
> LOCKDEP_SYS_EXIT
>@@ -675,12 +676,12 @@ syscall_fault:
> END(syscall_fault)
>
> syscall_badsys:
>- movl $-ENOSYS,PT_EAX(%esp)
>- jmp syscall_exit
>+ movl $-ENOSYS,%eax
>+ jmp syscall_after_call
> END(syscall_badsys)
>
> sysenter_badsys:
>- movl $-ENOSYS,PT_EAX(%esp)
>+ movl $-ENOSYS,%eax
> jmp sysenter_after_call
> END(syscall_badsys)
> CFI_ENDPROC

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

2014-07-21 16:54:08

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86_32, entry: store badsys error code in %eax

On Sun, Jul 20, 2014 at 3:07 PM, H. Peter Anvin <[email protected]> wrote:
> This is not a subtle regression at all. It is in fact a very very serious one.

Indeed.

Does this really work correctly on sysenter? It sure looks like the
sysenter path has the same problem.

Grr. I'm not set up to properly test 32-bit kernels. The patch looks
correct, but I'm obviously not an infallible authority on "looking
correct" when it comes to entry_32.S.

--Andy

>
> On July 20, 2014 2:33:50 PM PDT, Sven Wegener <[email protected]> wrote:
>>Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
>>(CVE-2014-4508)") introduced a subtle regression in the x86_32 syscall
>>entry code, resulting in syscall() not returning proper errors for
>>non-existing syscalls on CPUs not supporting the sysenter feature.
>>
>>The following code:
>>
>>> int result = syscall(666);
>>> printf("result=%d errno=%d error=%s\n", result, errno,
>>strerror(errno));
>>
>>results in:
>>
>>> result=666 errno=0 error=Success
>>
>>Obviously, the syscall return value is the called syscall number, but
>>it
>>should have been an ENOSYS error. When run under ptrace it behaves
>>correctly, which makes it hard to debug in the wild:
>>
>>> result=-1 errno=38 error=Function not implemented
>>
>>The %eax register is the return value register. For debugging via
>>ptrace
>>the syscall entry code stores the complete register context on the
>>stack. The badsys handlers only store the ENOSYS error code in the
>>ptrace register set and do not set %eax like a regular syscall handler
>>would. The old resume_userspace call chain contains code that clobbers
>>%eax and it restores %eax from the ptrace registers afterwards. The
>>same
>>goes for the ptrace-enabled call chain. When ptrace is not used, the
>>syscall return value is the passed-in syscall number from the
>>%eax register.
>>
>>Use %eax as the return value register in syscall_badsys and
>>sysenter_badsys, like a real syscall handler does, and have the caller
>>push the value onto the stack for ptrace access.
>>
>>Signed-off-by: Sven Wegener <[email protected]>
>>---
>> arch/x86/kernel/entry_32.S | 9 +++++----
>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>
>>diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
>>index dbaa23e..0d0c9d4 100644
>>--- a/arch/x86/kernel/entry_32.S
>>+++ b/arch/x86/kernel/entry_32.S
>>@@ -425,8 +425,8 @@ sysenter_do_call:
>> cmpl $(NR_syscalls), %eax
>> jae sysenter_badsys
>> call *sys_call_table(,%eax,4)
>>- movl %eax,PT_EAX(%esp)
>> sysenter_after_call:
>>+ movl %eax,PT_EAX(%esp)
>> LOCKDEP_SYS_EXIT
>> DISABLE_INTERRUPTS(CLBR_ANY)
>> TRACE_IRQS_OFF
>>@@ -502,6 +502,7 @@ ENTRY(system_call)
>> jae syscall_badsys
>> syscall_call:
>> call *sys_call_table(,%eax,4)
>>+syscall_after_call:
>> movl %eax,PT_EAX(%esp) # store the return value
>> syscall_exit:
>> LOCKDEP_SYS_EXIT
>>@@ -675,12 +676,12 @@ syscall_fault:
>> END(syscall_fault)
>>
>> syscall_badsys:
>>- movl $-ENOSYS,PT_EAX(%esp)
>>- jmp syscall_exit
>>+ movl $-ENOSYS,%eax
>>+ jmp syscall_after_call
>> END(syscall_badsys)
>>
>> sysenter_badsys:
>>- movl $-ENOSYS,PT_EAX(%esp)
>>+ movl $-ENOSYS,%eax
>> jmp sysenter_after_call
>> END(syscall_badsys)
>> CFI_ENDPROC
>
> --
> Sent from my mobile phone. Please pardon brevity and lack of formatting.



--
Andy Lutomirski
AMA Capital Management, LLC

2014-07-21 17:23:49

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86_32, entry: store badsys error code in %eax

On Mon, Jul 21, 2014 at 10:20 AM, H. Peter Anvin <[email protected]> wrote:
> On 07/21/2014 09:53 AM, Andy Lutomirski wrote:
>> On Sun, Jul 20, 2014 at 3:07 PM, H. Peter Anvin <[email protected]> wrote:
>>> This is not a subtle regression at all. It is in fact a very very serious one.
>>
>> Indeed.
>>
>> Does this really work correctly on sysenter? It sure looks like the
>> sysenter path has the same problem.
>>
>> Grr. I'm not set up to properly test 32-bit kernels. The patch looks
>> correct, but I'm obviously not an infallible authority on "looking
>> correct" when it comes to entry_32.S.
>>
>> --Andy
>>
>
> KVM is great for that sort of things.

This may finally inspire me to teach virtme how to boot an alternate
root. I got sick of dealing with VM images awhile ago.

--Andy

>
> -hpa
>
>



--
Andy Lutomirski
AMA Capital Management, LLC

2014-07-21 17:24:04

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86_32, entry: store badsys error code in %eax

On 07/21/2014 09:53 AM, Andy Lutomirski wrote:
> On Sun, Jul 20, 2014 at 3:07 PM, H. Peter Anvin <[email protected]> wrote:
>> This is not a subtle regression at all. It is in fact a very very serious one.
>
> Indeed.
>
> Does this really work correctly on sysenter? It sure looks like the
> sysenter path has the same problem.
>
> Grr. I'm not set up to properly test 32-bit kernels. The patch looks
> correct, but I'm obviously not an infallible authority on "looking
> correct" when it comes to entry_32.S.
>
> --Andy
>

KVM is great for that sort of things.

-hpa

2014-07-21 22:00:21

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86_32, entry: store badsys error code in %eax

On Sun, Jul 20, 2014 at 2:33 PM, Sven Wegener <[email protected]> wrote:
> Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
> (CVE-2014-4508)") introduced a subtle regression in the x86_32 syscall
> entry code, resulting in syscall() not returning proper errors for
> non-existing syscalls on CPUs not supporting the sysenter feature.

s/not supporting/supporting/

That means that this is IMO much worse than the other way around: all
newish 32-bit systems are affected.

Other than the typo and the missing Cc: stable:

Reviewed-and-tested-by: Andy Lutomirski <[email protected]>

--Andy

2014-07-22 06:58:04

by Sven Wegener

[permalink] [raw]
Subject: Re: [PATCH] x86_32, entry: store badsys error code in %eax

On Mon, 21 Jul 2014, Andy Lutomirski wrote:

> On Sun, Jul 20, 2014 at 2:33 PM, Sven Wegener <[email protected]> wrote:
> > Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
> > (CVE-2014-4508)") introduced a subtle regression in the x86_32 syscall
> > entry code, resulting in syscall() not returning proper errors for
> > non-existing syscalls on CPUs not supporting the sysenter feature.
>
> s/not supporting/supporting/

Looks like I mixed the sep vs. syscall CPU flag. Initially I encountered
the issue on real hardware (Celeron) having the sep but not the syscall
flag. During testing it worked on an emulated CPU missing the sep and
having the syscall flag and broke on an emulated CPU having the sep and
missing the syscall flag. I only looked at the syscall flag, which is
completly invariant for this issue, and assumed it stands for sysenter
support, completly ignoring the sep flag.

Sven