2008-02-20 14:26:32

by Arne Georg Gleditsch

[permalink] [raw]
Subject: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls

Hi,

I'm looking at 2.6.25-rc2. vsyscall_sysctl_change contains code to NOP
out the actual system call instructions of the vsyscall page when
vsyscall64 is enabled. This seems to interact badly with the fallback
code in do_vgettimeofday which tries to call gettimeofday if the
configured clock source does not support vread. (In effect,
gettimeofday() becomes a nop and time() always returns 0. Not very
useful.)

Is there a good reason to keep this? Aren't the instructions in
question avoided (or invoked) according to the vsyscall64 flag by the
surrounding logic anyway?

--
Arne.


2008-02-20 18:09:05

by Thomas Gleixner

[permalink] [raw]
Subject: Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls

Arne,

On Wed, 20 Feb 2008, Arne Georg Gleditsch wrote:
> I'm looking at 2.6.25-rc2. vsyscall_sysctl_change contains code to NOP
> out the actual system call instructions of the vsyscall page when
> vsyscall64 is enabled. This seems to interact badly with the fallback
> code in do_vgettimeofday which tries to call gettimeofday if the
> configured clock source does not support vread. (In effect,
> gettimeofday() becomes a nop and time() always returns 0. Not very
> useful.)

Indeed.

> Is there a good reason to keep this? Aren't the instructions in
> question avoided (or invoked) according to the vsyscall64 flag by the
> surrounding logic anyway?

The initial intent of this was to make it harder for malicious code to
attack via the vsysall area.

But you are right, the code is indeed fundamentally unsafe in various
aspects:

1) the patching code runs without synchronizing other CPUs

2) it inserts NOPs even if there is no clock source which provides
vread

3) when the clock source changes to one without vread we run in
exactly the same problem as in #2

The correct solution is randomizing that area, but that's definitely
not an ad hoc fix.

Thanks for pointing this out. I'm looking into fixing this ASAP.

tglx

2008-02-21 16:02:57

by Andi Kleen

[permalink] [raw]
Subject: Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls

On Wed, Feb 20, 2008 at 02:57:34PM +0100, Arne Georg Gleditsch wrote:
> Hi,
>
> I'm looking at 2.6.25-rc2. vsyscall_sysctl_change contains code to NOP
> out the actual system call instructions of the vsyscall page when
> vsyscall64 is enabled. This seems to interact badly with the fallback
> code in do_vgettimeofday which tries to call gettimeofday if the
> configured clock source does not support vread. (In effect,
> gettimeofday() becomes a nop and time() always returns 0. Not very
> useful.)
>
> Is there a good reason to keep this? Aren't the instructions in
> question avoided (or invoked) according to the vsyscall64 flag by the
> surrounding logic anyway?

Yes they are. But a system call sequence at a known fixed address
is potentially useful to exploits. That is why it is nop'ed out when
it is not needed.

-Andi

2008-02-21 19:47:36

by Thomas Gleixner

[permalink] [raw]
Subject: Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls

On Thu, 21 Feb 2008, Andi Kleen wrote:

> On Wed, Feb 20, 2008 at 02:57:34PM +0100, Arne Georg Gleditsch wrote:
> > Hi,
> >
> > I'm looking at 2.6.25-rc2. vsyscall_sysctl_change contains code to NOP
> > out the actual system call instructions of the vsyscall page when
> > vsyscall64 is enabled. This seems to interact badly with the fallback
> > code in do_vgettimeofday which tries to call gettimeofday if the
> > configured clock source does not support vread. (In effect,
> > gettimeofday() becomes a nop and time() always returns 0. Not very
> > useful.)
> >
> > Is there a good reason to keep this? Aren't the instructions in
> > question avoided (or invoked) according to the vsyscall64 flag by the
> > surrounding logic anyway?
>
> Yes they are. But a system call sequence at a known fixed address
> is potentially useful to exploits. That is why it is nop'ed out when
> it is not needed.

That's a nice intent, but the reality is that this code is broken as
hell:

1) the patching code runs without synchronizing other CPUs

2) it inserts NOPs even if there is no clock source which provides
vread

3) when the clock source changes to one without vread we run in
exactly the same problem as in #2

4) if nobody toggles the proc entry from 1 to 0 and to 1 again, then
the syscall is not patched out contrary to your claim.

Thanks,

tglx

2008-02-21 20:03:47

by john stultz

[permalink] [raw]
Subject: Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls


On Thu, 2008-02-21 at 20:45 +0100, Thomas Gleixner wrote:
> On Thu, 21 Feb 2008, Andi Kleen wrote:
>
> > On Wed, Feb 20, 2008 at 02:57:34PM +0100, Arne Georg Gleditsch wrote:
> > > Hi,
> > >
> > > I'm looking at 2.6.25-rc2. vsyscall_sysctl_change contains code to NOP
> > > out the actual system call instructions of the vsyscall page when
> > > vsyscall64 is enabled. This seems to interact badly with the fallback
> > > code in do_vgettimeofday which tries to call gettimeofday if the
> > > configured clock source does not support vread. (In effect,
> > > gettimeofday() becomes a nop and time() always returns 0. Not very
> > > useful.)
> > >
> > > Is there a good reason to keep this? Aren't the instructions in
> > > question avoided (or invoked) according to the vsyscall64 flag by the
> > > surrounding logic anyway?
> >
> > Yes they are. But a system call sequence at a known fixed address
> > is potentially useful to exploits. That is why it is nop'ed out when
> > it is not needed.
>
> That's a nice intent, but the reality is that this code is broken as
> hell:
>
> 1) the patching code runs without synchronizing other CPUs
>
> 2) it inserts NOPs even if there is no clock source which provides
> vread
>
> 3) when the clock source changes to one without vread we run in
> exactly the same problem as in #2

I've not looked deeply here, but it seems to resolve #2 and #3, we need
some way for glibc to know if it should jump to the vsyscall or make the
syscall itself.

If it always jumps to the vsyscall address, it seems we have to have
some way of falling back to syscall inside the vsyscall.

That or we need to do the NOP/un-NOP part in the update_vsyscall code
dependent on if the current clocksource supports vread, instead of on
the /proc entry access.

-john


2008-02-21 20:53:56

by Thomas Gleixner

[permalink] [raw]
Subject: Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls

On Thu, 21 Feb 2008, john stultz wrote:
> > > Yes they are. But a system call sequence at a known fixed address
> > > is potentially useful to exploits. That is why it is nop'ed out when
> > > it is not needed.
> >
> > That's a nice intent, but the reality is that this code is broken as
> > hell:
> >
> > 1) the patching code runs without synchronizing other CPUs
> >
> > 2) it inserts NOPs even if there is no clock source which provides
> > vread
> >
> > 3) when the clock source changes to one without vread we run in
> > exactly the same problem as in #2
>
> I've not looked deeply here, but it seems to resolve #2 and #3, we need
> some way for glibc to know if it should jump to the vsyscall or make the
> syscall itself.

Does glibc have an existing fallback, when we return -ENOSYS (or
whatever) from the vsyscall path ? If not, we can not do this.

> If it always jumps to the vsyscall address, it seems we have to have
> some way of falling back to syscall inside the vsyscall.
>
> That or we need to do the NOP/un-NOP part in the update_vsyscall code
> dependent on if the current clocksource supports vread, instead of on
> the /proc entry access.

That won't fly. We need to sychronize the CPUs when we patch the code,
which is not possible from update_wall_time with interrupts disabled.

I fear we have to kill the broken code for .25 and implement vdso
randomizing to fix that. Another short term solution to prevent
attacks via the syscall instruction would be to sanity check syscalls
when the call originates from the vsyscall address range, but that
might be frowned upon as it adds two assembler instructions into the
syscall hotpath :)

Thanks,

tglx

2008-02-21 21:01:58

by Thomas Gleixner

[permalink] [raw]
Subject: Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls

On Thu, 21 Feb 2008, Thomas Gleixner wrote:
> > That or we need to do the NOP/un-NOP part in the update_vsyscall code
> > dependent on if the current clocksource supports vread, instead of on
> > the /proc entry access.
>
> That won't fly. We need to sychronize the CPUs when we patch the code,
> which is not possible from update_wall_time with interrupts disabled.

Also this is utterly stupid as we keep the syscall in cases where we
do not have vread anyway, so we keep the attack point open for a lot
of existing machines due to TSC wreckage and HPET unavailability.

Thanks,

tglx

2008-02-22 10:37:28

by Arne Georg Gleditsch

[permalink] [raw]
Subject: Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls

Andi Kleen <[email protected]> writes:
> On Wed, Feb 20, 2008 at 02:57:34PM +0100, Arne Georg Gleditsch wrote:
>> Hi,
>>
>> I'm looking at 2.6.25-rc2. vsyscall_sysctl_change contains code to NOP
>> out the actual system call instructions of the vsyscall page when
>> vsyscall64 is enabled. This seems to interact badly with the fallback
>> code in do_vgettimeofday which tries to call gettimeofday if the
>> configured clock source does not support vread. (In effect,
>> gettimeofday() becomes a nop and time() always returns 0. Not very
>> useful.)
>>
>> Is there a good reason to keep this? Aren't the instructions in
>> question avoided (or invoked) according to the vsyscall64 flag by the
>> surrounding logic anyway?
>
> Yes they are. But a system call sequence at a known fixed address
> is potentially useful to exploits. That is why it is nop'ed out when
> it is not needed.

Reasonable enough, as long as it can be determined to be not needed.
Still, isn't the __vsyscall_gtod_data structure part of the same page?
Wouldn't that give you access to any 2-byte opcode you want every 64k
seconds? You'd need to time your attack, of course, but that could be
done prior to actually launching the exploit...

--
Arne.

2008-02-22 12:50:00

by Andi Kleen

[permalink] [raw]
Subject: Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls

On Thu, Feb 21, 2008 at 08:45:53PM +0100, Thomas Gleixner wrote:
> On Thu, 21 Feb 2008, Andi Kleen wrote:
>
> > On Wed, Feb 20, 2008 at 02:57:34PM +0100, Arne Georg Gleditsch wrote:
> > > Hi,
> > >
> > > I'm looking at 2.6.25-rc2. vsyscall_sysctl_change contains code to NOP
> > > out the actual system call instructions of the vsyscall page when
> > > vsyscall64 is enabled. This seems to interact badly with the fallback
> > > code in do_vgettimeofday which tries to call gettimeofday if the
> > > configured clock source does not support vread. (In effect,
> > > gettimeofday() becomes a nop and time() always returns 0. Not very
> > > useful.)
> > >
> > > Is there a good reason to keep this? Aren't the instructions in
> > > question avoided (or invoked) according to the vsyscall64 flag by the
> > > surrounding logic anyway?
> >
> > Yes they are. But a system call sequence at a known fixed address
> > is potentially useful to exploits. That is why it is nop'ed out when
> > it is not needed.
>
> That's a nice intent, but the reality is that this code is broken as
> hell:

Well it worked when I wrote it, but it's quite possible it didn't survive
the clocksource conversion completely.

-Andi

2008-02-22 12:55:16

by Andi Kleen

[permalink] [raw]
Subject: Re: arch/x86/kernel/vsyscall_64.c: overeager NOP of syscalls

On Thu, Feb 21, 2008 at 09:59:24PM +0100, Thomas Gleixner wrote:
> On Thu, 21 Feb 2008, Thomas Gleixner wrote:
> > > That or we need to do the NOP/un-NOP part in the update_vsyscall code
> > > dependent on if the current clocksource supports vread, instead of on
> > > the /proc entry access.
> >
> > That won't fly. We need to sychronize the CPUs when we patch the code,
> > which is not possible from update_wall_time with interrupts disabled.
>
> Also this is utterly stupid as we keep the syscall in cases where we
> do not have vread anyway, so we keep the attack point open for a lot
> of existing machines due to TSC wreckage and HPET unavailability.

Yes that is true, but I didn't find a cheap way around it without
breaking binary compatibility.

I found an expensive way (essentially just putting
a trampoline to a variable mapped vsyscall page on the old static
address), but since it would have added quite a lot of complexity
(vsyscall is inside the kernel mapping and would need to be rewritten
at context switch) and memory overhead in page tables and one more
page per process I didn't do that.

I also considered doing boot time randomization only (which
would avoid a lot of these problems), but it didn't seem worth it.

Also one must say I don't consider it a big security improvement on most
systems. That is because most programs are not PIC, but linked at a fixed
address and usually if you grep the larger binary for the few bytes
needed for a system call you'll find it at some known offset as
part of another instruction sequence.

-Andi