Hi Andrew,
Would you be willing to carry this series? Andy Lutomirski appears
happy with it now. (Thanks again for all the feedback Andy!) If so, it
has a relatively small merge conflict with the bpf changes living in
net-next. Would you prefer I rebase against net-next, let sfr handle
it, get carried in net-next, or some other option?
Thanks!
-Kees
On Thu, May 22, 2014 at 4:05 PM, Kees Cook <[email protected]> wrote:
> This adds the ability for threads to request seccomp filter
> synchronization across their thread group (either at filter attach time
> or later). (For example, for Chrome to make sure graphic driver threads
> are fully confined after seccomp filters have been attached.)
>
> To support this, seccomp locking on writes is introduced, along with
> refactoring of no_new_privs. Races with thread creation are handled via
> the tasklist_lock.
>
> I think all the concerns raised during the discussion[1] of the first
> version of this patch have been addressed. However, the races involved
> have tricked me before. :)
>
> Thanks!
>
> -Kees
>
> [1] https://lkml.org/lkml/2014/1/13/795
>
> v5:
> - move includes around (drysdale)
> - drop set_nnp return value (luto)
> - use smp_load_acquire/store_release (luto)
> - merge nnp changes to seccomp always, fewer ifdef (luto)
> v4:
> - cleaned up locking further, as noticed by David Drysdale
> v3:
> - added SECCOMP_EXT_ACT_FILTER for new filter install options
> v2:
> - reworked to avoid clone races
>
--
Kees Cook
Chrome OS Security
On Mon, Jun 2, 2014 at 12:47 PM, Kees Cook <[email protected]> wrote:
> Hi Andrew,
>
> Would you be willing to carry this series? Andy Lutomirski appears
> happy with it now. (Thanks again for all the feedback Andy!) If so, it
> has a relatively small merge conflict with the bpf changes living in
> net-next. Would you prefer I rebase against net-next, let sfr handle
> it, get carried in net-next, or some other option?
Well, I'm still not entirely convinced that we want to have this much
multiplexing in a prctl, and I'm still a bit unconvinced that the code
wouldn't be better off if it were completely atomic in the sense that
it would either work or fail without doing anything.
--Andy
On Mon, Jun 2, 2014 at 12:59 PM, Andy Lutomirski <[email protected]> wrote:
> Well, I'm still not entirely convinced that we want to have this much
> multiplexing in a prctl, and I'm still a bit unconvinced that the code
I don't want to get caught without interface argument flexibility
again, so that's why the prctl interface is being set up that way.
> wouldn't be better off if it were completely atomic in the sense that
> it would either work or fail without doing anything.
Getting perfect atomic operation looks extremely hard given task
locking. If this could get fixed in the future, it would have no
impact on the interface. At present, the corner case of the racing
thread is small enough that just catching the race failure is
sufficient. If task locking is improved in the future, it could just
simply never lose a race. Userspace still needs to handle errors no
matter what, since the non-race failure conditions (mode 1 or forked
filter) still exist.
-Kees
--
Kees Cook
Chrome OS Security
On Mon, Jun 2, 2014 at 1:06 PM, Kees Cook <[email protected]> wrote:
> On Mon, Jun 2, 2014 at 12:59 PM, Andy Lutomirski <[email protected]> wrote:
>> Well, I'm still not entirely convinced that we want to have this much
>> multiplexing in a prctl, and I'm still a bit unconvinced that the code
>
> I don't want to get caught without interface argument flexibility
> again, so that's why the prctl interface is being set up that way.
I was thinking that a syscall might be a lot prettier. It may pay to
cc linux-api, too.
I'll offer you a deal: if you try to come up with a nice, clean
syscall, I'll try to write a fast(er) path for x86_64 to reduce
overhead. I bet I can save 90-100ns per syscall. :)
>
>> wouldn't be better off if it were completely atomic in the sense that
>> it would either work or fail without doing anything.
>
> Getting perfect atomic operation looks extremely hard given task
> locking. If this could get fixed in the future, it would have no
> impact on the interface. At present, the corner case of the racing
> thread is small enough that just catching the race failure is
> sufficient. If task locking is improved in the future, it could just
> simply never lose a race. Userspace still needs to handle errors no
> matter what, since the non-race failure conditions (mode 1 or forked
> filter) still exist.
>
I think it's doable -- I just replied to the other thread.
> -Kees
>
> --
> Kees Cook
> Chrome OS Security
--
Andy Lutomirski
AMA Capital Management, LLC
On Mon, Jun 2, 2014 at 2:17 PM, Andy Lutomirski <[email protected]> wrote:
> On Mon, Jun 2, 2014 at 1:06 PM, Kees Cook <[email protected]> wrote:
>> On Mon, Jun 2, 2014 at 12:59 PM, Andy Lutomirski <[email protected]> wrote:
>>> Well, I'm still not entirely convinced that we want to have this much
>>> multiplexing in a prctl, and I'm still a bit unconvinced that the code
>>
>> I don't want to get caught without interface argument flexibility
>> again, so that's why the prctl interface is being set up that way.
>
> I was thinking that a syscall might be a lot prettier. It may pay to
> cc linux-api, too.
>
> I'll offer you a deal: if you try to come up with a nice, clean
> syscall, I'll try to write a fast(er) path for x86_64 to reduce
> overhead. I bet I can save 90-100ns per syscall. :)
Now added to the Cc.
Which path do you mean to improve? Neither the prctl nor a syscall for
this would need to be fast at all.
I don't want to go in circles on this. I've been there before on my
VFS link hardening series, and my module restriction series. I would
like consensus from more than just one person. :)
I'd like to hear from other folks on this (akpm?). My instinct is to
continue using prctl since that is already where mediation for seccomp
happens. I don't see why prctl vs a new syscall makes a difference
here, frankly.
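
For reference, here is a minimal sketch of how a filter is attached
through the existing prctl interface today. The helper name
install_allow_all_filter() is hypothetical and the filter simply allows
every syscall; it is only meant to illustrate the current entry point,
not anything from this series.

```c
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>

/* Hypothetical helper (not part of the series): attach a seccomp filter
 * that allows every syscall, using the existing prctl entry point.
 * Returns 0 on success, -1 with errno set on failure. */
static int install_allow_all_filter(void)
{
	struct sock_filter insns[] = {
		/* Single instruction: return SECCOMP_RET_ALLOW. */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
		.filter = insns,
	};

	/* Unprivileged callers must set no_new_privs before attaching. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;

	/* Without the proposed sync option, this affects only the calling
	 * thread (and anything it creates afterwards). */
	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
		return -1;
	return 0;
}
```

Threads that already exist when this runs are left untouched, which is
the gap the thread-sync option is meant to close.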
-Kees
--
Kees Cook
Chrome OS Security
On Mon, Jun 2, 2014 at 4:05 PM, Kees Cook <[email protected]> wrote:
> On Mon, Jun 2, 2014 at 2:17 PM, Andy Lutomirski <[email protected]> wrote:
>> On Mon, Jun 2, 2014 at 1:06 PM, Kees Cook <[email protected]> wrote:
>>> On Mon, Jun 2, 2014 at 12:59 PM, Andy Lutomirski <[email protected]> wrote:
>>>> Well, I'm still not entirely convinced that we want to have this much
>>>> multiplexing in a prctl, and I'm still a bit unconvinced that the code
>>>
>>> I don't want to get caught without interface argument flexibility
>>> again, so that's why the prctl interface is being set up that way.
>>
>> I was thinking that a syscall might be a lot prettier. It may pay to
>> cc linux-api, too.
>>
>> I'll offer you a deal: if you try to come up with a nice, clean
>> syscall, I'll try to write a fast(er) path for x86_64 to reduce
>> overhead. I bet I can save 90-100ns per syscall. :)
>
> Now added to the Cc.
>
> Which path do you mean to improve? Neither the prctl nor a syscall for
> this would need to be fast at all.
Non-seccomp-related syscalls when seccomp is enabled.
>
> I don't want to go in circles on this. I've been there before on my
> VFS link hardening series, and my module restriction series. I would
> like consensus from more than just one person. :)
I can't offer you anyone else's review, unfortunately :-/
>
> I'd like to hear from other folks on this (akpm?). My instinct is to
> continue using prctl since that is already where mediation for seccomp
> happens. I don't see why prctl vs a new syscall makes a difference
> here, frankly.
Aesthetics? There's a tendency for people to get annoyed at big
multiplexed APIs, and your patches will be doubly multiplexed.
TBH, I care more about the atomicity thing than about the actual form
of the API.
--Andy
>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security
--
Andy Lutomirski
AMA Capital Management, LLC
[Kees, thank you for CCing linux-api]
On Tue, Jun 3, 2014 at 1:08 AM, Andy Lutomirski <[email protected]> wrote:
> On Mon, Jun 2, 2014 at 4:05 PM, Kees Cook <[email protected]> wrote:
>> I'd like to hear from other folks on this (akpm?). My instinct is to
>> continue using prctl since that is already where mediation for seccomp
>> happens. I don't see why prctl vs a new syscall makes a difference
>> here, frankly.
>
> Aesthetics? There's a tendency for people to get annoyed at big
> multiplexed APIs, and your patches will be doubly multiplexed.
prctl() is already a Franken-interface that provides a mass of
different, mostly completely unrelated, functionality. So, I wonder if
it would be better not to make the situation worse. Furthermore, the
very fact that the existing prctl seccomp API is being extended and
multiplexed suggests that other extensions might be desirable further
down the line, which also hints that a separate syscall would be a
good idea. (Or do we have to wait until the prctl seccomp API is
extended one more time, before we realize that a new system call would
have been a good idea...)
> TBH, I care more about the atomicity thing than about the actual form
> of the API.
User-space does not necessarily thank you for that perspective, Andy
;-). The atomicity thing is presumably fixable, regardless of the API.
On the other hand, APIs are things that kernel developers design once
and forget about, and user-space has to live with forever.
Cheers,
Michael
On Tue, Jun 3, 2014 at 3:12 AM, Michael Kerrisk <[email protected]> wrote:
> [Kees, thank you for CCing linux-api]
>
> On Tue, Jun 3, 2014 at 1:08 AM, Andy Lutomirski <[email protected]> wrote:
>> On Mon, Jun 2, 2014 at 4:05 PM, Kees Cook <[email protected]> wrote:
>
>>> I'd like to hear from other folks on this (akpm?). My instinct is to
>>> continue using prctl since that is already where mediation for seccomp
>>> happens. I don't see why prctl vs a new syscall makes a difference
>>> here, frankly.
>>
>> Aesthetics? There's a tendency for people to get annoyed at big
>> multiplexed APIs, and your patches will be doubly multiplexed.
>
> prctl() is already a Franken-interface that provides a mass of
> different, mostly completely unrelated, functionality. So, I wonder if
> it would be better not to make the situation worse. Furthermore, the
> very fact that the existing prctl seccomp API is being extended and
> multiplexed suggests that other extensions might be desirable further
> down the line, which also hints that a separate syscall would be a
> good idea. (Or do we have to wait until the prctl seccomp API is
> extended one more time, before we realize that a new system call would
> have been a good idea...)
>
>> TBH, I care more about the atomicity thing than about the actual form
>> of the API.
>
> User-space does not necessarily thank you for that perspective, Andy
> ;-). The atomicity thing is presumably fixable, regardless of the API.
> On the other hand, APIs are things that kernel developers design once
> and forget about, and user-space has to live with forever.
Well, maybe the history of it being a prctl() should count for something.
Most likely, userland will need to test for whether or not these
features are present in the kernel for years to come. With a syscall,
it would now require a syscall (unlikely to be in older headers for a
while, so will require using syscall(3) for a bit) as well as a call
to prctl() to test for seccomp mode 2 (without thread sync) in the
fallback path. It'll be a little odd. As the person who will make this
work in Chromium, I do not feel strongly either way, it's a detail, so
feel free to disregard this point.
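
As an illustration of the fallback probe mentioned above, here is a
rough sketch of testing for seccomp mode 2 through prctl(). The function
name is made up for this example, and the EFAULT-versus-EINVAL
distinction is an assumption about how the kernel orders its checks
(it matches the behaviour I have seen, but treat it as a sketch).

```c
#include <errno.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>

/* Hypothetical probe: returns 1 if the kernel appears to support
 * seccomp mode 2 (filter), 0 otherwise. It never installs a filter. */
static int kernel_supports_seccomp_filter(void)
{
	/* A NULL filter pointer can never be installed. A kernel that
	 * knows SECCOMP_MODE_FILTER is expected to fail with EFAULT when
	 * it tries to read the program; a kernel without mode 2 is
	 * expected to reject the mode itself with EINVAL. */
	errno = 0;
	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, NULL, 0, 0) == -1 &&
	    errno == EFAULT)
		return 1;
	return 0;
}
```

With a separate syscall for the new features, a probe like this would
still be needed next to the syscall check, which is the oddity described
above.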
But I'm eagerly waiting for:
- Not having to test for the presence of threads at run-time (which
requires a very ugly busy loop with exponential back-off watching for
/proc/self/task/ to drop to 1 directory entry).
- Being able to engage the sandbox after third-party libraries have
started threads.
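
To make the first point concrete, the run-time check looks roughly like
the sketch below. This is not the actual Chromium code; the function
names, the 1 ms starting delay and the retry limit are assumptions made
for illustration.

```c
#include <dirent.h>
#include <time.h>

/* Count the entries in /proc/self/task (one per thread).
 * Returns -1 if the directory cannot be read. */
static int count_threads(void)
{
	DIR *dir = opendir("/proc/self/task");
	struct dirent *ent;
	int n = 0;

	if (!dir)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		if (ent->d_name[0] != '.')	/* skip "." and ".." */
			n++;
	}
	closedir(dir);
	return n;
}

/* Poll until only one thread remains, doubling the sleep each round.
 * Returns 0 once single-threaded, -1 on error or after max_attempts. */
static int wait_until_single_threaded(int max_attempts)
{
	struct timespec delay = { 0, 1000000L };	/* start at 1 ms */
	int i;

	for (i = 0; i < max_attempts; i++) {
		int n = count_threads();

		if (n == 1)
			return 0;
		if (n < 0)
			return -1;
		nanosleep(&delay, NULL);
		/* Exponential back-off; keep tv_nsec below one second. */
		if (delay.tv_nsec < 500000000L)
			delay.tv_nsec *= 2;
	}
	return -1;
}
```

Even when this returns 0, nothing prevents a library from starting a new
thread right afterwards, so the check is inherently racy.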
Julien