2007-08-13 22:58:52

by Hajime Inoue

Subject: System call interposition/unprotecting the table

I have a question about changing the page attributes of the
system call table.

I am writing a kernel module that does some system call interposition.
This works fine on my debian system, but apparently the stock Fedora
kernel (2.6.22) has the system call table write protected. So I would like
the module to add write permissions to the system call table before
it modifies it.

This is the code in my init_module that is problematic:

// Storing the original call
orig_kill = sys_call_table[__NR_kill];

// Change to write
pg = virt_to_page(sys_call_table);
change_page_attr(pg, 1, PAGE_KERNEL);
global_flush_tlb();

// Test write, should change nothing, but oopses instead
sys_call_table[__NR_kill] = (void*)orig_kill;

I imagine that I'm doing something obviously wrong; I've only been looking
at kernel code for a couple weeks. Can someone please explain what my
error is?
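
A userspace analogue may help picture what the snippet above is attempting. The sketch below is not kernel code and does not explain the oops. It only mirrors the pattern (keep the original entry, make the page writable, write, re-protect) using mmap() and mprotect(). All names in it (handler_fn, orig_handler, hooked_handler, make_protected_table, hook_slot0) are hypothetical and exist only for this illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef long (*handler_fn)(long);

/* Hypothetical stand-ins for an original and a replacement handler. */
static long orig_handler(long x)   { return x; }
static long hooked_handler(long x) { return x + 1000; }

/* Allocate one page to act as a "table", fill slot 0, then make the
 * page read-only. Returns NULL on failure. */
static handler_fn *make_protected_table(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    handler_fn *table = mmap(NULL, page, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (table == MAP_FAILED)
        return NULL;
    table[0] = orig_handler;
    if (mprotect(table, page, PROT_READ) != 0) {
        munmap(table, page);
        return NULL;
    }
    return table;
}

/* Replace slot 0: unprotect the page, write, protect it again.
 * Returns 0 on success, -1 on failure. */
static int hook_slot0(handler_fn *table, handler_fn replacement)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(table != NULL && replacement != NULL);
    if (mprotect(table, page, PROT_READ | PROT_WRITE) != 0)
        return -1;
    table[0] = replacement;
    return mprotect(table, page, PROT_READ);
}
```

Calling make_protected_table() and then hook_slot0(table, hooked_handler) swaps the entry. Writing to table[0] directly, without the mprotect() call, would instead fault, which is the userspace counterpart of the protection discussed in this thread.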

-Hajime Inoue


2007-08-13 23:02:52

by Alan

Subject: Re: System call interposition/unprotecting the table

On Mon, 13 Aug 2007 18:05:35 -0400
Hajime Inoue wrote:

> I have a question about changing the page attributes of the
> system call table.

Please don't do that.

> This works fine on my debian system, but apparently the stock Fedora
> kernel (2.6.22) has the system call table write protected.

This is to protect it from being changed by an attacker or someone trying
to do strange and bogus things to the kernel.

What are you actually trying to achieve ?

2007-08-14 05:13:04

by Avi Kivity

Subject: Re: System call interposition/unprotecting the table

Alan Cox wrote:
> This is to protect it from being changed by an attacker or someone trying
> to do strange and bogus things to the kernel.
>

Someone who apparently hasn't heard of vmap(), etc.

It's a debug aid at best.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-08-14 11:27:19

by Alan

Subject: Re: System call interposition/unprotecting the table

On Tue, 14 Aug 2007 08:12:29 +0300
Avi Kivity wrote:

> Alan Cox wrote:
> > This is to protect it from being changed by an attacker or someone trying
> > to do strange and bogus things to the kernel.
> >
>
> Someone who apparently hasn't heard of vmap(), etc.
>
> It's a debug aid at best.

You assume the average attacker is smart. Fortunately this isn't true,
although some certainly are. For guests however we can push this further
as a hypervisor can implement 'irrevocably read only' pages for a guest.
Something we don't currently do but should.

Alan

2007-08-14 14:22:56

by James Morris

Subject: Re: System call interposition/unprotecting the table

On Tue, 14 Aug 2007, Avi Kivity wrote:

> Alan Cox wrote:
> > This is to protect it from being changed by an attacker or someone trying
> > to do strange and bogus things to the kernel.
> >
>
> Someone who apparently hasn't heard of vmap(), etc.
>
> It's a debug aid at best.

It clarifies to all developers that the syscall table should not be messed
with.


- James
--
James Morris

2007-08-14 17:28:14

by Hajime Inoue

Subject: Re: System call interposition/unprotecting the table

Thanks for your comments.

Alan Cox wrote:
> This is to protect it from being changed by an attacker or someone trying
> to do strange and bogus things to the kernel.
>
> What are you actually trying to achieve ?

I am trying to emulate an attacker. I'm helping develop a system that
detects stealthy malware. To that end, we need to test the system in
an environment we completely understand.

Just protecting the table does not stop rootkits. A highly referenced
phrack article explains how to bypass the table. Enyelkm and mood-nt
are both compatible with a protected system call table (I tested them
against the latest Fedora stock kernel). I'm trying to simulate a
rootkit less capable than those publicly available.

Why isn't the rest of the kernel code protected along with the table?
Your response leads to the inverse of my question. How would I protect
the system call table (and other areas) in systems, without recompiling,
that do not protect them?

Finally, system call interposition is used in several interesting
systems, most notably, systrace. It's unclear to me how one would
implement something like systrace without modifying the table or doing
other rootkit-like antics.

If anyone has problems explaining this publicly, please contact me
privately. If anyone doubts my motivation, read my home page
(http://www.ccsl.carleton.ca/~hinoue/), or google my name.

-Hajime Inoue

2007-08-14 17:41:24

by Alan

Subject: Re: System call interposition/unprotecting the table

> Just protecting the table does not stop rootkits. A highly referenced
> phrack article explains how to bypass the table.

But most people are not capable of following that article - or they
wouldn't be asking here whatever their intention.

> Why isn't the rest of the kernel code protected along with the table?

That's in progress actually and hopefully then in the hypervisor case
implementing irrevocably read-only pages.

> Finally, system call interposition is used in several interesting
> systems, most notably, systrace. It's unclear to me how one would
> implement something like systrace without modifying the table or doing
> other rootkit-like antics.

Always wrongly. You can't be sure the table format will not change, you
can't reliably restore the table and it's virtually impossible to do any
kind of trace reliably this way as you end up with two copies of the data
from user space which may vary (and leads to bad problems - see BSD
recently).
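
The "two copies of the data from user space" problem can be shown with a deterministic userspace simulation. This is a sketch under stated assumptions, not how any real tracer is written: the wrapper checks its own copy of an argument, and the original handler then fetches the argument again from the same buffer. Every name here (interposed_open, real_open, policy_allows, race_fn, swap_to_shadow, PATH_MAX_DEMO) is hypothetical, and the callback merely stands in for a second thread that rewrites the buffer between the two fetches.

```c
#include <assert.h>
#include <string.h>

#define PATH_MAX_DEMO 64

/* Hypothetical callback standing in for another thread that rewrites
 * the user buffer between the two reads. */
typedef void (*race_fn)(char *user_buf);

/* Demo policy: deny any path under /etc/. Returns 1 if allowed. */
static int policy_allows(const char *path)
{
    return strncmp(path, "/etc/", 5) != 0;
}

/* Stand-in for the original handler: it copies the path from "user
 * space" itself and records what it actually acted on. */
static void real_open(const char *user_buf, char *acted_on)
{
    strncpy(acted_on, user_buf, PATH_MAX_DEMO - 1);
    acted_on[PATH_MAX_DEMO - 1] = '\0';
}

/* Interposed wrapper: checks its own copy, then lets the original
 * handler fetch the argument a second time. Returns 1 if allowed. */
static int interposed_open(char *user_buf, race_fn race, char *acted_on)
{
    char checked[PATH_MAX_DEMO];

    assert(user_buf != NULL && acted_on != NULL);
    strncpy(checked, user_buf, PATH_MAX_DEMO - 1);   /* first fetch */
    checked[PATH_MAX_DEMO - 1] = '\0';
    if (!policy_allows(checked))
        return 0;
    if (race)
        race(user_buf);              /* buffer changes after the check */
    real_open(user_buf, acted_on);   /* second fetch */
    return 1;
}

/* Simulated racing writer: swaps in a path the policy would deny. */
static void swap_to_shadow(char *user_buf)
{
    strcpy(user_buf, "/etc/shadow");
}
```

With a buffer that initially holds "/tmp/harmless", interposed_open(buf, swap_to_shadow, acted_on) reports the call as allowed, yet acted_on ends up holding a path the policy would have denied. The check and the use saw different data.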

2007-08-14 18:01:32

by Arjan van de Ven

Subject: Re: System call interposition/unprotecting the table


> Just protecting the table does not stop rootkits.

not in general no; however it requires more rights than before (eg code
execute rather than just "write to memory")

> Why isn't the rest of the kernel code protected along with the table?

it is if you set the right config options.


2007-08-14 18:56:25

by Andi Kleen

Subject: Re: System call interposition/unprotecting the table

Hajime Inoue writes:

> Just protecting the table does not stop rootkits. A highly referenced
> phrack article explains how to bypass the table.

During .23-pre for some time the kernel text was protected too (that
would have likely stopped that particular attack), but it was
removed because it caused too many problems.

Ultimately it is useless for security anyways because the page
tables cannot be protected (because there are valid reasons to change
them). If they're not protected any protection can be undone by
changing them or simply creating an alias mapping. Also the Linux
kernel has function pointers in read-write data structures which could
also be used to inject code.
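
The alias-mapping point has a simple userspace counterpart, sketched here as an illustration only. It does not touch kernel page tables. It maps the same page twice, once read-only and once read-write, and shows that the read-only view offers no protection once a writable alias exists. The names alias_pair and make_alias are hypothetical, and the use of tmpfile() as backing store is an assumption made for portability.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

struct alias_pair {
    const int *ro;   /* read-only view of the page */
    int *rw;         /* writable alias of the same page */
};

/* Map one page of an unlinked temporary file twice: once read-only,
 * once read-write. Returns 0 on success, -1 on failure. */
static int make_alias(struct alias_pair *out)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    FILE *f;
    int fd;
    void *ro, *rw;

    assert(out != NULL);
    f = tmpfile();
    if (!f)
        return -1;
    fd = fileno(f);
    if (ftruncate(fd, (off_t)page) != 0) {
        fclose(f);
        return -1;
    }

    ro = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    rw = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(f);   /* the mappings stay valid after the file is closed */
    if (ro == MAP_FAILED || rw == MAP_FAILED)
        return -1;

    out->ro = ro;
    out->rw = rw;
    return 0;
}
```

After make_alias(&p) succeeds, a store through p.rw[0] is visible through p.ro[0], even though any direct write through the read-only address would fault.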

So even with Alan's hypervisor support the whole thing would be still
quite holey. The argument of raising the bar also doesn't seem very
convincing to me, because attackers reuse code too and it's enough
when someone publishes such code once, then they can cut'n'paste
it into any exploits forever.

In general the .data protection is only considered a debugging
feature. I don't know why Fedora enables it in their production
kernels.

BTW I tested your test case and it works for me on 2.6.23rc3 with
DEBUG_RODATA enabled on i386/PAE. Without DEBUG_RODATA it BUGs,
but that's because the c_p_a interface is somewhat clumsy
and expects balanced changes.

-Andi

2007-08-14 21:10:12

by Jan Engelhardt

Subject: Re: System call interposition/unprotecting the table


On Aug 14 2007 21:50, Andi Kleen wrote:
>Hajime Inoue writes:
>
>> Just protecting the table does not stop rootkits. A highly referenced
>> phrack article explains how to bypass the table.
>
>During .23-pre for some time the kernel text was protected too (that
>would have likely stopped that particular attack), but it was
>removed because it caused too many problems.
>
>Ultimately it is useless for security anyways because the page
>tables cannot be protected (because there are valid reasons to change
>them).

But with DEBUG_RODATA (does that also apply to .text?) enabled,
accidental writes to it should cause a fault rather than doing silent
changes, shouldn't they?


Jan
--

2007-08-14 22:35:26

by Alan

Subject: Re: System call interposition/unprotecting the table

> So even with Alan's hypervisor support the whole thing would be still
> quite holey. The argument of raising the bar also doesn't seem very

It's materially harder, especially with the hypervisor.

> convincing to me, because attackers reuse code too and it's enough
> when someone publishes such code once, then they can cut'n'paste
> it into any exploits forever.

Then you fix the specific case and the game continues.

> In general the .data protection is only considered a debugging
> feature. I don't know why Fedora enables it in their production
> kernels.

That would be because we think you are wrong 8)

Alan

2007-08-14 22:48:53

by Andi Kleen

Subject: Re: System call interposition/unprotecting the table

> Then you fix the specific case and the game continues.

If they intercept netdev->hard_start_xmit there is nothing
to fix. Or inode->i_ops or any other virtual method pointer
that is called often..

Putting i_ops into const memory doesn't help either -- they
can just copy them and use their own and replace the pointer
in any accessed inode.
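
A minimal userspace sketch of that copy-and-redirect trick follows. It assumes nothing about real kernel structures: demo_inode, demo_ops, real_ops, shadow_ops and hijack are hypothetical names that only mimic the shape of an object carrying a pointer to a const method table.

```c
#include <assert.h>
#include <stddef.h>

struct demo_inode;

/* Hypothetical method table, similar in shape to a kernel ops struct. */
struct demo_ops {
    int (*lookup)(const struct demo_inode *);
    int (*permission)(const struct demo_inode *);
};

struct demo_inode {
    const struct demo_ops *ops;  /* the table is const, the pointer is not */
    int id;
};

static int real_lookup(const struct demo_inode *i)     { return i->id; }
static int real_permission(const struct demo_inode *i) { (void)i; return 0; }
static int evil_lookup(const struct demo_inode *i)     { return -i->id; }

/* The legitimate table lives in const storage. */
static const struct demo_ops real_ops = { real_lookup, real_permission };

/* Attacker-owned, writable copy of the table. */
static struct demo_ops shadow_ops;

/* Copy the const table, patch one method in the copy, and point the
 * object at the copy. The const table itself is never written. */
static void hijack(struct demo_inode *inode)
{
    assert(inode != NULL && inode->ops != NULL);
    shadow_ops = *inode->ops;
    shadow_ops.lookup = evil_lookup;
    inode->ops = &shadow_ops;
}
```

After hijack(&ino), calls through ino.ops->lookup reach evil_lookup while real_ops is unchanged, so write protecting the table alone does not prevent the redirection.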

It's also not as if doing this is rocket science. Anybody
barely skilled in computer architecture should be able
to figure this out.

Ok the only thing that could help is IA64/PPC64 style smart pointer
checking that could prevent foreign code from being
executed, but you won't get that on x86 or most other
architectures any time soon.

And that would also only work if you disable module loading
or implement a code signing scheme that is likely impractical or
incompatible with free software
(and Vista has just shown that these don't work anyways)

> > In general the .data protection is only considered a debugging
> > feature. I don't know why Fedora enables it in their production
> > kernels.
>
> That would be because we think you are wrong 8)

Well, it might at best buy you a few weeks/months in
terms of the exploit arms race, but thrash your user's TLBs
forever.

-Andi

2007-08-17 14:19:21

by Dave Jones

Subject: Re: System call interposition/unprotecting the table

On Wed, Aug 15, 2007 at 12:48:35AM +0200, Andi Kleen wrote:

> > > In general the .data protection is only considered a debugging
> > > feature. I don't know why Fedora enables it in their production
> > > kernels.
> >
> > That would be because we think you are wrong 8)
>
> Well, it might at best buy you a few weeks/months in
> terms of the exploit arms race, but thrash your user's TLBs
> forever.

Show me a single situation where this matters.
When we first enabled it, we tried both benchmarks and real-world
loads, and it didn't matter at all. Unless something fundamental
has changed since then, the story should still be the same.

Dave

--
http://www.codemonkey.org.uk

2007-08-18 10:37:33

by Andi Kleen

Subject: Re: System call interposition/unprotecting the table

On Fri, Aug 17, 2007 at 10:19:00AM -0400, Dave Jones wrote:
> On Wed, Aug 15, 2007 at 12:48:35AM +0200, Andi Kleen wrote:
>
> > > > In general the .data protection is only considered a debugging
> > > > feature. I don't know why Fedora enables it in their production
> > > > kernels.
> > >
> > > That would be because we think you are wrong 8)
> >
> > Well, it might at best buy you a few weeks/months in
> > terms of the exploit arms race, but thrash your user's TLBs
> > forever.
>
> Show me a single situation where this matters.

We had a couple of benchmarks where compiled in vs external
4K mapped drivers made a noticeable difference.

> When we first enabled, we tried both benchmarks and real-world
> loads, and it didn't matter at all. Unless something fundamental

It also depends on the CPU -- the sizes of the TLBs vary widely.
On some older CPUs using 2/4MB pages was indeed a bad idea
because the number of large TLB entries was very small.

Also there are sometimes effects where the CPU splits the
TLBs internally so even with 2MB pages you effectively
get 4K TLB use. You could have run into one of those.

> has changed since then, the story should still be the same.

Well if you believe it is that hyper useful you should try to convince
Linus then to re-add the text protection for DEBUG_RODATA and a working text_poke()
that handles it correctly. The last version was nearly there, unfortunately the
time allowed for a new feature to be buggy and getting fixed before it
is reverted is very short these days[1].

Even if you say "my dumb attacker can patch sys_call_table but not
root_inode->i_ops in memory" it is still much harder to say
"my dumb attacker can patch sys_call_table but not a jump into *sys_read".
So without text protection your scheme is really double plus useless.

Modules could be also protected early BTW if you waste enough memory
to make .data page aligned [so likely raising minimal module size
from one page to two]

-Andi

[1] unless you name it pci sysinfo -- then anything is allowed.