LinuxLists.cc - Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering

2011-05-26 10:39:16

Subject: Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering

* Gleb Natapov <[email protected]> wrote:

> On Thu, May 26, 2011 at 11:57:51AM +0300, Pekka Enberg wrote:
> > Hi Avi,
> >
> > On Thu, May 26, 2011 at 11:49 AM, Avi Kivity <[email protected]> wrote:
> >
> > > You mean each thread will have a different security context? I
> > > don't see the point. All threads share all of memory so it
> > > would be trivial for one thread to exploit another and gain all
> > > of its privileges.
> >
> > So how would that happen? I'm assuming that once the security
> > context has been set up for a thread, you're not able to change
> > it after that. You'd be able to exploit other threads through
> > shared memory but how would you gain privileges?
>
> By tricking other threads to execute code for you. Just replace
> return address on the other's thread stack.

That kind of exploit is not possible if the worker pool consists of
processes - which would be rather easy to achieve with tools/kvm/.

In that model each process has its own stack, not accessible to other
worker processes. They'd only share the guest RAM image and some
(minimal) global state.

This way the individual devices are (optionally) isolated from each
other. In a way this is a microkernel done right ;-)

Thanks,

Ingo
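
A minimal sketch of the process-based worker pool described above, not taken from tools/kvm/. It assumes a Unix host and uses Python for brevity; the names run_isolated_workers and GUEST_RAM_SIZE are hypothetical. Each forked worker gets its own copy-on-write stack and heap, and the only memory the workers genuinely share is one anonymous mapping standing in for the guest RAM image.

```python
import mmap
import os

GUEST_RAM_SIZE = 4096  # hypothetical; a real guest RAM image is far larger


def run_isolated_workers(n_workers):
    """Fork n_workers processes that share one mapping and nothing else."""
    # An anonymous mmap is MAP_SHARED by default on Unix, so it stays
    # shared across fork(). All other memory becomes copy-on-write.
    guest_ram = mmap.mmap(-1, GUEST_RAM_SIZE)
    private_state = ["parent"]  # ordinary heap memory, private after fork

    pids = []
    for i in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Child: stands in for one device worker.
            private_state[0] = "worker %d" % i  # lands in the child's copy only
            guest_ram[i] = i + 1                # visible to every process
            os._exit(0)
        pids.append(pid)

    for pid in pids:
        os.waitpid(pid, 0)

    # The shared mapping carries the workers' writes; private_state does not.
    return bytes(guest_ram[:n_workers]), private_state[0]
```

Calling run_isolated_workers(3) returns the first three bytes of the shared mapping together with the parent's private state. The workers' writes to the mapping are visible to the parent, while their writes to private_state never leave their own address space, which is why one worker cannot overwrite a return address on another worker's stack.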

2011-05-26 10:47:06

by Avi Kivity

Subject: Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering

On 05/26/2011 01:38 PM, Ingo Molnar wrote:
> * Gleb Natapov<[email protected]> wrote:
>
> > On Thu, May 26, 2011 at 11:57:51AM +0300, Pekka Enberg wrote:
> > > Hi Avi,
> > >
> > > On Thu, May 26, 2011 at 11:49 AM, Avi Kivity<[email protected]> wrote:
> > >
> > > > You mean each thread will have a different security context? I
> > > > don't see the point. All threads share all of memory so it
> > > > would be trivial for one thread to exploit another and gain all
> > > > of its privileges.
> > >
> > > So how would that happen? I'm assuming that once the security
> > > context has been set up for a thread, you're not able to change
> > > it after that. You'd be able to exploit other threads through
> > > shared memory but how would you gain privileges?
> >
> > By tricking other threads to execute code for you. Just replace
> > return address on the other's thread stack.
>
> That kind of exploit is not possible if the worker pool consists of
> processes - which would be rather easy to achieve with tools/kvm/.
>
> In that model each process has its own stack, not accessible to other
> worker processes. They'd only share the guest RAM image and some
> (minimal) global state.
>
> This way the individual devices are (optionally) isolated from each
> other. In a way this is a microkernel done right ;-)

It's really hard to achieve, since devices have global interactions.
For example a PCI device can change the memory layout when a BAR is
programmed. So you would have a lot of message passing going on (not at
runtime, so no huge impact on performance). The programming model is
very different.

Note that message passing is in fact quite a good way to model hardware,
since what different devices actually do is pass messages to each
other. I expect if done this way, the device model would be better than
what we have today. But it's not an easy step away from threads.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
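
A rough sketch of the message-passing model described above, assuming one coordinator process owns the memory layout and each device worker reports a BAR write instead of changing global state itself. None of this comes from an existing device model: the wire format BAR_UPDATE, the functions device_worker and collect_bar_updates, and the addresses are all hypothetical.

```python
import os
import socket
import struct

# Hypothetical wire format: device id, BAR index, new base address, size.
BAR_UPDATE = struct.Struct("<HHQQ")


def device_worker(chan, dev_id):
    """Runs in the child: send a BAR update rather than touch shared state."""
    base = 0xFE000000 + dev_id * 0x1000  # made-up guest-physical address
    chan.sendall(BAR_UPDATE.pack(dev_id, 0, base, 0x1000))
    chan.close()


def collect_bar_updates(dev_ids):
    """Runs in the parent: the only process that holds the memory layout."""
    layout = {}  # (dev_id, bar) -> (base, size)
    for dev_id in dev_ids:
        parent_end, child_end = socket.socketpair()
        pid = os.fork()
        if pid == 0:
            parent_end.close()
            device_worker(child_end, dev_id)
            os._exit(0)
        child_end.close()
        # Block until the whole fixed-size message has arrived.
        msg = parent_end.recv(BAR_UPDATE.size, socket.MSG_WAITALL)
        parent_end.close()
        os.waitpid(pid, 0)
        dev, bar, base, size = BAR_UPDATE.unpack(msg)
        layout[(dev, bar)] = (base, size)
    return layout
```

The layout lives in exactly one process here, so a device can only influence it by sending a well-formed message. That is the change in programming model: a direct update of shared state in the threaded design becomes an explicit request that the owner can validate.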

2011-05-26 10:47:12

by Gleb Natapov

Subject: Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering

On Thu, May 26, 2011 at 12:38:36PM +0200, Ingo Molnar wrote:
>
> * Gleb Natapov <[email protected]> wrote:
>
> > On Thu, May 26, 2011 at 11:57:51AM +0300, Pekka Enberg wrote:
> > > Hi Avi,
> > >
> > > On Thu, May 26, 2011 at 11:49 AM, Avi Kivity <[email protected]> wrote:
> > >
> > > > You mean each thread will have a different security context? I
> > > > don't see the point. All threads share all of memory so it
> > > > would be trivial for one thread to exploit another and gain all
> > > > of its privileges.
> > >
> > > So how would that happen? I'm assuming that once the security
> > > context has been set up for a thread, you're not able to change
> > > it after that. You'd be able to exploit other threads through
> > > shared memory but how would you gain privileges?
> >
> > By tricking other threads to execute code for you. Just replace
> > return address on the other's thread stack.
>
> That kind of exploit is not possible if the worker pool consists of
> processes - which would be rather easy to achieve with tools/kvm/.
>
Well, of course. The original question was about threads.

> In that model each process has its own stack, not accessible to other
> worker processes. They'd only share the guest RAM image and some
> (minimal) global state.
>
> This way the individual devices are (optionally) isolated from each
> other. In a way this is a microkernel done right ;-)
>
But doesn't this design suffer from the same problem as a microkernel?
Namely a lot of slow IPCs?

--
Gleb.

2011-05-26 11:12:06

by Ingo Molnar

Subject: Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering

* Gleb Natapov <[email protected]> wrote:

> > In that model each process has its own stack, not accessible to
> > other worker processes. They'd only share the guest RAM image and
> > some (minimal) global state.
> >
> > This way the individual devices are (optionally) isolated from
> > each other. In a way this is a microkernel done right ;-)
>
> But doesn't this design suffer the same problem as microkernel?
> Namely a lot of slow IPCs?

Most of the IPCs we do already, to keep the devices separated from
each other. So the most common type of IPC comes 'for free' in that
model - and this is specific to virtualization so i'd not extend the
claim to the host kernel.

virtio is an IPC mechanism to begin with.

It's certainly not entirely free though so if this is implemented in
tools/kvm/ it should be configurable.

Thanks,

Ingo