2012-05-21 18:22:00

by Eric Paris

[permalink] [raw]
Subject: seccomp and ptrace. what is the correct order?

Viro ask me a question today and I didn't have a good answer.

Lets assume I set a seccomp filter that will allow read and will
deny/kill ioctl. If something else is tracing me I could call read.
The read will pass the seccomp hook and move onto the ptrace hook.
The tracer could then change the syscall number to ioctl and I would
then actually perform an ioctl.

Is that what we want? Do we want to do the permission check based on
what a process ask at syscall enter or do we want to do the permission
check based on what the kernel is actually going to do on behalf of
the process?

Does the question make sense?

-Eric


2012-05-21 18:25:27

by Roland McGrath

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

>From a security perspective I think the natural expectation would be that
the seccomp check is on the values that will actually be used, without an
intervening opportunity to change anything.

2012-05-21 18:47:20

by Richard Weinberger

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On Mon, May 21, 2012 at 8:21 PM, Eric Paris <[email protected]> wrote:
> Is that what we want? ?Do we want to do the permission check based on
> what a process ask at syscall enter or do we want to do the permission
> check based on what the kernel is actually going to do on behalf of
> the process?

I think we want the latter.
A system call emulator like UserModeLinux would benefit from that.

--
Thanks,
//richard

2012-05-21 19:15:07

by H. Peter Anvin

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On 05/21/2012 11:47 AM, richard -rw- weinberger wrote:
> On Mon, May 21, 2012 at 8:21 PM, Eric Paris <[email protected]> wrote:
>> Is that what we want? Do we want to do the permission check based on
>> what a process ask at syscall enter or do we want to do the permission
>> check based on what the kernel is actually going to do on behalf of
>> the process?
>
> I think we want the latter.
> A system call emulator like UserModeLinux would benefit from that.
>

Are you sure? This would mean that a seccomp program used by the
process to intercept its own system calls via SIGSYS would give
completely different results under UML than under native...

-hpa

2012-05-21 19:39:50

by Indan Zupancic

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On Mon, May 21, 2012 20:25, Roland McGrath wrote:
> From a security perspective I think the natural expectation would be that
> the seccomp check is on the values that will actually be used, without an
> intervening opportunity to change anything.

Actually, considering a tracer has full control over a traced process,
it would make most sense from a security perspective to check both the
traced task's seccomp filter, as well as the one for the ptracer for
modified system calls (calls where any register poking at all was done).
Otherwise a task could bypass its own seccomp filter by ptracing a hapless
victim.

I mentioned this before, but I forgot why this option was dismissed.
Probably because ptrace shouldn't have been allowed by the filter in
the first place.

The current patch does the seccomp check first and ignores any changes
made via ptrace, just like the old seccomp did. So in that sense nothing
changed.

Originally the seccomp filter check was in the fast path, so doing it
after ptrace was tricky. But now it has been moved to the slow tracehook
path it can easily be checked after the ptrace notification. That would
change the behaviour SECCOMP_MODE=1 though, but probably nobody cares,
as it can be argued that that was a security hole anyway (except if
ptracing a seccomped task was disallowed, in which case moving it to
the end doesn't change anything anyway).

Another argument for moving it to the end is that it makes debugging
seccomped tasks a lot easier, because the debugger sees the denied
system call. With the current patch the tasks would silently die.

Greetings,

Indan

2012-05-22 16:23:11

by Will Drewry

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On Mon, May 21, 2012 at 2:20 PM, Indan Zupancic <[email protected]> wrote:
> On Mon, May 21, 2012 20:25, Roland McGrath wrote:
>> From a security perspective I think the natural expectation would be that
>> the seccomp check is on the values that will actually be used, without an
>> intervening opportunity to change anything.
>
> Actually, considering a tracer has full control over a traced process,
> it would make most sense from a security perspective to check both the
> traced task's seccomp filter, as well as the one for the ptracer for
> modified system calls (calls where any register poking at all was done).
> Otherwise a task could bypass its own seccomp filter by ptracing a hapless
> victim.
>
> I mentioned this before, but I forgot why this option was dismissed.
> Probably because ptrace shouldn't have been allowed by the filter in
> the first place.
>
> The current patch does the seccomp check first and ignores any changes
> made via ptrace, just like the old seccomp did. So in that sense nothing
> changed.
>
> Originally the seccomp filter check was in the fast path, so doing it
> after ptrace was tricky. But now it has been moved to the slow tracehook
> path it can easily be checked after the ptrace notification. That would
> change the behaviour SECCOMP_MODE=1 though, but probably nobody cares,
> as it can be argued that that was a security hole anyway (except if
> ptracing a seccomped task was disallowed, in which case moving it to
> the end doesn't change anything anyway).
>
> Another argument for moving it to the end is that it makes debugging
> seccomped tasks a lot easier, because the debugger sees the denied
> system call. With the current patch the tasks would silently die.

I don't see this as a strict bypass because an successful attack would require:
- the ability to fork/clone _and_ call ptrace (which would otherwise
be blocked by the BPF if the user of the bpf cares)
- the ability to compromise a process with ptrace abilities for a
sandboxed process -- which means that a privilege escalation (in a
word) has already occurred.

However(!), if we did move secure_computing() to after ptrace _and_
added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
ensure the system call was not changed by the tracer. This would give
strong assurances that whatever system call is executed was explicitly
allowed by seccomp policy is the one that was executed.

I'm open to either leaving things alone (since it isn't horrible) or
making the change to tighten things up. Is the mode=1 behavior change
acceptable? I assumed it wouldn't be, but perhaps I shouldn't have
made that assumption.

Regardless, I will go ahead and make patches and test them for both,
so they are on-hand regardless.

thanks!
will

2012-05-22 16:26:46

by Will Drewry

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On Tue, May 22, 2012 at 11:23 AM, Will Drewry <[email protected]> wrote:
> On Mon, May 21, 2012 at 2:20 PM, Indan Zupancic <[email protected]> wrote:
>> On Mon, May 21, 2012 20:25, Roland McGrath wrote:
>>> From a security perspective I think the natural expectation would be that
>>> the seccomp check is on the values that will actually be used, without an
>>> intervening opportunity to change anything.
>>
>> Actually, considering a tracer has full control over a traced process,
>> it would make most sense from a security perspective to check both the
>> traced task's seccomp filter, as well as the one for the ptracer for
>> modified system calls (calls where any register poking at all was done).
>> Otherwise a task could bypass its own seccomp filter by ptracing a hapless
>> victim.
>>
>> I mentioned this before, but I forgot why this option was dismissed.
>> Probably because ptrace shouldn't have been allowed by the filter in
>> the first place.
>>
>> The current patch does the seccomp check first and ignores any changes
>> made via ptrace, just like the old seccomp did. So in that sense nothing
>> changed.
>>
>> Originally the seccomp filter check was in the fast path, so doing it
>> after ptrace was tricky. But now it has been moved to the slow tracehook
>> path it can easily be checked after the ptrace notification. That would
>> change the behaviour SECCOMP_MODE=1 though, but probably nobody cares,
>> as it can be argued that that was a security hole anyway (except if
>> ptracing a seccomped task was disallowed, in which case moving it to
>> the end doesn't change anything anyway).
>>
>> Another argument for moving it to the end is that it makes debugging
>> seccomped tasks a lot easier, because the debugger sees the denied
>> system call. With the current patch the tasks would silently die.
>
> I don't see this as a strict bypass because an successful attack would require:
> - the ability to fork/clone _and_ call ptrace (which would otherwise
> be blocked by the BPF if the user of the bpf cares)
> - the ability to compromise a process with ptrace abilities for a
> sandboxed process -- which means that a privilege escalation (in a
> word) has already occurred.
>
> However(!), if we did move secure_computing() to after ptrace _and_
> added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
> ensure the system call was not changed by the tracer. ?This would give
> strong assurances that whatever system call is executed was explicitly
> allowed by seccomp policy is the one that was executed.
>
> I'm open to either leaving things alone (since it isn't horrible) or

Let me rephrase :) The current behavior makes sense for an inwardly
focused system. seccomp mode=1 and mode=2 are about the behavior of
the task they apply to. In that regard, both modes block (or can
block) ptrace, which means that there is no way for them to exploit
this -- they are operating correctly.

The discussion we're having now is if that is the desirable behavior.
Do we want seccomp to be a system call contract between userspace and
the kernel for the task no matter what, or do we want it to be a
system call contract between the specific task's code and the kernel
(thereby allowing a tracer to make things act totally differently).

> making the change to tighten things up. Is the mode=1 behavior change
> acceptable? ?I assumed it wouldn't be, but perhaps I shouldn't have
> made that assumption.
>
> Regardless, I will go ahead and make patches and test them for both,
> so they are on-hand regardless.
>
> thanks!
> will

2012-05-22 17:40:00

by Al Viro

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On Tue, May 22, 2012 at 11:23:06AM -0500, Will Drewry wrote:

> However(!), if we did move secure_computing() to after ptrace _and_
> added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
> ensure the system call was not changed by the tracer. This would give
> strong assurances that whatever system call is executed was explicitly
> allowed by seccomp policy is the one that was executed.

BTW, after grepping around a bit, I have to say that some callers of those
hooks make very little sense

Exhibit A: sh32 has in do_syscall_trace_enter(regs)
secure_computing(regs->regs[0]);
Syscall number in r0, right?
[usual PTRACE_SYSCALL bits]
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->regs[0]);
Ditto
audit_syscall_entry(audit_arch(), regs->regs[3],
regs->regs[4], regs->regs[5],
regs->regs[6], regs->regs[7]);
Oops - that one says syscall number in r3, first 4 arguments in r4..r7
return ret ?: regs->regs[0];

and the caller of that sucker is
syscall_trace_entry:
! Yes it is traced.
mov r15, r4
mov.l 7f, r11 ! Call do_syscall_trace_enter which notifies
jsr @r11 ! superior (will chomp R[0-7])
nop
mov.l r0, @(OFF_R0,r15) ! Save return value
! Reload R0-R4 from kernel stack, where the
! parent may have modified them using
! ptrace(POKEUSR). (Note that R0-R2 are
! used by the system call handler directly
! from the kernel stack anyway, so don't need
! to be reloaded here.) This allows the parent
! to rewrite system calls and args on the fly.
mov.l @(OFF_R4,r15), r4 ! arg0
mov.l @(OFF_R5,r15), r5
mov.l @(OFF_R6,r15), r6
mov.l @(OFF_R7,r15), r7 ! arg3
mov.l @(OFF_R3,r15), r3 ! syscall_nr
!
mov.l 2f, r10 ! Number of syscalls
cmp/hs r10, r3
bf syscall_call
[...]
7: .long do_syscall_trace_enter

... and syscall_call very clearly picks an index in syscall table from r3.
Incidentally, r0 is the fifth syscall argument... So what we have is
* b0rken hookups for seccomp and tracepoint
* b0rken cancelling of syscalls by ptrace (replacing syscall number
with -1 would've worked; doing that to the 5th argument - not really)

Exhibit B: sh64; makes even less sense, but there the assembler glue had
been too dense for me to get through. At the very least, seccomp and
tracepoint are assuming that syscall number is in r9, while audit is
assuming that it's in r1. I'm not too inclined to trust audit in this
case, though. The _really_ interesting part is that by the look of
their pt_regs syscall number is stored separately - not in regs->regs[]
at all. And the caller
* shoves the return value of do_syscall_trace_enter() into regs->regs[2]
* picks syscall number from regs->syscall_nr and uses that as index
in sys_call_table
* seems to imply that arguments are in regs->regs[2..7]
* code around the (presumable) path leading there seems to imply
that syscall number comes from the trap number and isn't in regs->regs[]
at all. But I might be misreading that assembler. Easily.

Exhibit C:
mips is consistent these days, but it has no tracepoint hookup *and* it does
open-code tracehook_report_syscall_entry(), except for its return value...
Used to pass the wrong register to seccomp, IIRC.

We really ought to look into merging those suckers. It's a source of PITA
that keeps coming back; what we need is
regs_syscall_number(struct pt_regs *)
regs_syscall_arg1(struct pt_regs *)
...
regs_syscall_arg6(struct pt_regs *)
in addition to existing
regs_return_value(struct pt_regs *)
added on all platforms and made mandatory for new ones. With that we
could go a long way towards merging these implementations...

2012-05-22 20:26:27

by Will Drewry

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On Tue, May 22, 2012 at 12:39 PM, Al Viro <[email protected]> wrote:
> On Tue, May 22, 2012 at 11:23:06AM -0500, Will Drewry wrote:
>
>> However(!), if we did move secure_computing() to after ptrace _and_
>> added a check after SECCOMP_RET_TRACE's ptrace_event call, we could
>> ensure the system call was not changed by the tracer. ?This would give
>> strong assurances that whatever system call is executed was explicitly
>> allowed by seccomp policy is the one that was executed.
>
> BTW, after grepping around a bit, I have to say that some callers of those
> hooks make very little sense

Yeah - the arch specific seccomp and ptrace code is in dire need of
attention on a number of platforms.

> Exhibit A: sh32 has in do_syscall_trace_enter(regs)
> ? ? ? ?secure_computing(regs->regs[0]);
> Syscall number in r0, right?
> ? ? ? ?[usual PTRACE_SYSCALL bits]
> ? ? ? ?if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
> ? ? ? ? ? ? ? ?trace_sys_enter(regs, regs->regs[0]);
> Ditto
> ? ? ? ?audit_syscall_entry(audit_arch(), regs->regs[3],
> ? ? ? ? ? ? ? ? ? ? ? ? ? ?regs->regs[4], regs->regs[5],
> ? ? ? ? ? ? ? ? ? ? ? ? ? ?regs->regs[6], regs->regs[7]);
> Oops - that one says syscall number in r3, first 4 arguments in r4..r7
> ? ? ? ?return ret ?: regs->regs[0];
>
> and the caller of that sucker is
> syscall_trace_entry:
> ? ? ? ?! ? ? ? ? ? ? ? ? ? ? ? Yes it is traced.
> ? ? ? ?mov ? ? r15, r4
> ? ? ? ?mov.l ? 7f, r11 ? ? ? ? ! Call do_syscall_trace_enter which notifies
> ? ? ? ?jsr ? ? @r11 ? ? ? ? ? ?! superior (will chomp R[0-7])
> ? ? ? ? nop
> ? ? ? ?mov.l ? r0, @(OFF_R0,r15) ? ? ? ! Save return value
> ? ? ? ?! ? ? ? ? ? ? ? ? ? ? ? Reload R0-R4 from kernel stack, where the
> ? ? ? ?! ? ? ? ? ? ? ? ? ? ? ? parent may have modified them using
> ? ? ? ?! ? ? ? ? ? ? ? ? ? ? ? ptrace(POKEUSR). ?(Note that R0-R2 are
> ? ? ? ?! ? ? ? ? ? ? ? ? ? ? ? used by the system call handler directly
> ? ? ? ?! ? ? ? ? ? ? ? ? ? ? ? from the kernel stack anyway, so don't need
> ? ? ? ?! ? ? ? ? ? ? ? ? ? ? ? to be reloaded here.) ?This allows the parent
> ? ? ? ?! ? ? ? ? ? ? ? ? ? ? ? to rewrite system calls and args on the fly.
> ? ? ? ?mov.l ? @(OFF_R4,r15), r4 ? ! arg0
> ? ? ? ?mov.l ? @(OFF_R5,r15), r5
> ? ? ? ?mov.l ? @(OFF_R6,r15), r6
> ? ? ? ?mov.l ? @(OFF_R7,r15), r7 ? ! arg3
> ? ? ? ?mov.l ? @(OFF_R3,r15), r3 ? ! syscall_nr
> ? ? ? ?!
> ? ? ? ?mov.l ? 2f, r10 ? ? ? ? ? ? ? ? ! Number of syscalls
> ? ? ? ?cmp/hs ?r10, r3
> ? ? ? ?bf ? ? ?syscall_call
> [...]
> 7: ? ? ?.long ? do_syscall_trace_enter
>
> ... and syscall_call very clearly picks an index in syscall table from r3.
> Incidentally, r0 is the fifth syscall argument... ?So what we have is
> ? ? ? ?* b0rken hookups for seccomp and tracepoint
> ? ? ? ?* b0rken cancelling of syscalls by ptrace (replacing syscall number
> with -1 would've worked; doing that to the 5th argument - not really)
>
> Exhibit B: sh64; makes even less sense, but there the assembler glue had
> been too dense for me to get through. ?At the very least, seccomp and
> tracepoint are assuming that syscall number is in r9, while audit is
> assuming that it's in r1. ?I'm not too inclined to trust audit in this
> case, though. ?The _really_ interesting part is that by the look of
> their pt_regs syscall number is stored separately - not in regs->regs[]
> at all. ?And the caller
> ? ? ? ?* shoves the return value of do_syscall_trace_enter() into regs->regs[2]
> ? ? ? ?* picks syscall number from regs->syscall_nr and uses that as index
> in sys_call_table
> ? ? ? ?* seems to imply that arguments are in regs->regs[2..7]
> ? ? ? ?* code around the (presumable) path leading there seems to imply
> that syscall number comes from the trap number and isn't in regs->regs[]
> at all. ?But I might be misreading that assembler. ?Easily.
>
> Exhibit C:
> mips is consistent these days, but it has no tracepoint hookup *and* it does
> open-code tracehook_report_syscall_entry(), except for its return value...
> Used to pass the wrong register to seccomp, IIRC.
>
> We really ought to look into merging those suckers. ?It's a source of PITA
> that keeps coming back; what we need is
> ? ? ? ?regs_syscall_number(struct pt_regs *)
> ? ? ? ?regs_syscall_arg1(struct pt_regs *)
> ? ? ? ?...
> ? ? ? ?regs_syscall_arg6(struct pt_regs *)
> in addition to existing
> ? ? ? ?regs_return_value(struct pt_regs *)
> added on all platforms and made mandatory for new ones. ?With that we
> could go a long way towards merging these implementations...

For fun, I took a stab at that to see how it'd play out for x86.
(Attached instead of inline since it's just a first cut.)

I do think much of the ptrace/seccomp could be merged, but I don't
know how quickly given all the changes that each arch will need.

What makes sense for the current seccomp work? I'm not displeased
with the in-tree behavior, but the alternative approach would change
the ptrace+seccomp interactions (minimally) just by switching the
ordering. It's not hard to change all the existing arches that call
secure_computing calls (I have a tentative patch set), but it wouldn't
fix their general brokenness.

Anyway, I'd like to see seccomp work properly on all arches, but if
there are strong opinions about the correct expectations between
seccomp and ptrace, I'm happy to fix that /now/ for whatever arches it
makes sense for.

thanks!
will


Attachments:
0001-arch-x86-asm-generic-add-syscall_regs.h.patch.txt (8.67 kB)

2012-05-22 20:40:15

by H. Peter Anvin

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

The proposed patch seems to duplicate the functionality in
<asm/syscall.h>. Those macros also try to do the right thing in the
presence of compat.

-hpa

2012-05-22 20:48:44

by Will Drewry

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On Tue, May 22, 2012 at 3:34 PM, H. Peter Anvin <[email protected]> wrote:
> The proposed patch seems to duplicate the functionality in
> <asm/syscall.h>. ?Those macros also try to do the right thing in the
> presence of compat.

That was my first thought too, so I ran a few simple tests. gcc isn't
smart enough to not add ~344 bytes of code to get the number and
arguments for the x86/kernel/ptrace.c case I included (in the
naive-est of integrations). But I don't know that it justifies the
extra patchwork or enforcing shared code across arches.

Regardless, the syscall entry + trace code can use some attention
across the architectures. I don't know that
one-more-layer-of-abstraction is the right answer (rather than just
fixing the code). The biggest benefit would be having one-true
syscall_trace_entry slow path. That said, the fast paths will be
forever divergent so the opportunity for bugs like the ones pointed
out will still be there.

cheers!
will

2012-05-22 21:07:21

by Al Viro

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On Tue, May 22, 2012 at 03:48:40PM -0500, Will Drewry wrote:
> On Tue, May 22, 2012 at 3:34 PM, H. Peter Anvin <[email protected]> wrote:
> > The proposed patch seems to duplicate the functionality in
> > <asm/syscall.h>. ?Those macros also try to do the right thing in the
> > presence of compat.
>
> That was my first thought too, so I ran a few simple tests. gcc isn't
> smart enough to not add ~344 bytes of code to get the number and
> arguments for the x86/kernel/ptrace.c case I included (in the
> naive-est of integrations). But I don't know that it justifies the
> extra patchwork or enforcing shared code across arches.
>
> Regardless, the syscall entry + trace code can use some attention
> across the architectures. I don't know that
> one-more-layer-of-abstraction is the right answer (rather than just
> fixing the code). The biggest benefit would be having one-true
> syscall_trace_entry slow path. That said, the fast paths will be
> forever divergent so the opportunity for bugs like the ones pointed
> out will still be there.

FWIW, I'd prefer to have all that done inside __audit_syscall_entry(),
with
context->arch = syscall_get_arch(current, regs);
context->major = syscall_get_nr(current, regs);
syscall_get_arguments(current, regs, 0, 4, context->argv);
done instead of initializations from arguments we are doing there now.
I seriously doubt that it would lead to worse code than what we currently
have. If nothing else, we won't be passing that pile of arguments around.

And yes, asm/syscall.h stuff is probably the right approach here. For
biarch ones syscall_get_arguments() is saner than doing them one by one...

2012-05-22 21:09:54

by H. Peter Anvin

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On 05/22/2012 01:48 PM, Will Drewry wrote:
>
> That was my first thought too, so I ran a few simple tests. gcc isn't
> smart enough to not add ~344 bytes of code to get the number and
> arguments for the x86/kernel/ptrace.c case I included (in the
> naive-est of integrations). But I don't know that it justifies the
> extra patchwork or enforcing shared code across arches.
>

I suspect the construction of those inlines can be improved.

-hpa

2012-05-22 21:14:14

by Will Drewry

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On Tue, May 22, 2012 at 4:09 PM, H. Peter Anvin <[email protected]> wrote:
> On 05/22/2012 01:48 PM, Will Drewry wrote:
>>
>> That was my first thought too, so I ran a few simple tests. ?gcc isn't
>> smart enough to not add ~344 bytes of code to get the number and
>> arguments for the x86/kernel/ptrace.c case I included (in the
>> naive-est of integrations). ?But I don't know that it justifies the
>> extra patchwork or enforcing shared code across arches.
>>
>
> I suspect the construction of those inlines can be improved.

Seems likely - or just my use of them :)

2012-05-22 21:17:28

by Roland McGrath

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On Tue, May 22, 2012 at 2:07 PM, Al Viro <[email protected]> wrote:
> FWIW, I'd prefer to have all that done inside __audit_syscall_entry(),
> with
> ? ? ? ?context->arch ? ? ? = syscall_get_arch(current, regs);
> ? ? ? ?context->major ? ? ?= syscall_get_nr(current, regs);
> ? ? ? ?syscall_get_arguments(current, regs, 0, 4, context->argv);
> done instead of initializations from arguments we are doing there now.
> I seriously doubt that it would lead to worse code than what we currently
> have. ?If nothing else, we won't be passing that pile of arguments around.

I always felt the same way about the audit code. (As a bonus, if the audit
folks ever decide they want all six syscall arguments instead of just four,
they wouldn't have to touch every arch.) But it will certainly produce
drastically worse code for ia64. (Not that anybody cares about ia64.)


Thanks,
Roland

2012-05-22 21:19:23

by H. Peter Anvin

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On 05/22/2012 02:17 PM, Roland McGrath wrote:
> (As a bonus, if the audit folks ever decide they want all six syscall
> arguments instead of just four

Is there anything about audit which *isn't* a gigantic facepalm?

-hpa

2012-05-22 21:38:41

by H. Peter Anvin

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On 05/22/2012 02:14 PM, Will Drewry wrote:
>>
>> I suspect the construction of those inlines can be improved.
>
> Seems likely - or just my use of them :)
>

One thing that could make the code worse is if you are in a flow where
the state of the TS_COMPAT flag is known but gcc doesn't know that. I
don't have an easy answer for that.

-hpa

2012-05-22 22:20:59

by Al Viro

[permalink] [raw]
Subject: Re: seccomp and ptrace. what is the correct order?

On Tue, May 22, 2012 at 02:17:03PM -0700, Roland McGrath wrote:
> I always felt the same way about the audit code. (As a bonus, if the audit
> folks ever decide they want all six syscall arguments instead of just four,
> they wouldn't have to touch every arch.) But it will certainly produce
> drastically worse code for ia64. (Not that anybody cares about ia64.)

FWIW, here's more or less what I had in mind; it's completely untested,
definitely breaks build on mips (no asm/syscall.h there) and I'm very
sceptical about ia32 compat syscall glue on x86 (I remember that there
had been some very evil games played with what's in which register at
which point, but I can't recall the details and I would rather leave that
to somebody whose eyes are not bleeding from staring at asm glue for more
than a month, TYVM...

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index c334a23..c31d82e 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -8,6 +8,8 @@
#define _ASM_ARM_SYSCALL_H

#include <linux/err.h>
+#include <asm/elf.h>
+#include <linux/audit.h>

extern const unsigned long sys_call_table[];

@@ -90,4 +92,10 @@ static inline void syscall_set_arguments(struct task_struct *task,
memcpy(&regs->ARM_r0 + i, args, n * sizeof(args[0]));
}

+static inline int syscall_get_arch(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return AUDIT_ARCH_ARM;
+}
+
#endif /* _ASM_ARM_SYSCALL_H */
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 14e3826..2e39596 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -911,17 +911,16 @@ asmlinkage int syscall_trace(int why, struct pt_regs *regs, int scno)
{
unsigned long ip;

+ current_thread_info()->syscall = scno;
+
if (why)
audit_syscall_exit(regs);
else
- audit_syscall_entry(AUDIT_ARCH_ARM, scno, regs->ARM_r0,
- regs->ARM_r1, regs->ARM_r2, regs->ARM_r3);
+ audit_syscall_entry(regs);

if (!test_thread_flag(TIF_SYSCALL_TRACE))
return scno;

- current_thread_info()->syscall = scno;
-
/*
* IP is used to denote syscall entry/exit:
* IP = 0 -> entry, =1 -> exit
diff --git a/arch/ia64/include/asm/syscall.h b/arch/ia64/include/asm/syscall.h
index a7ff1c6..12f36fb 100644
--- a/arch/ia64/include/asm/syscall.h
+++ b/arch/ia64/include/asm/syscall.h
@@ -14,6 +14,7 @@
#define _ASM_SYSCALL_H 1

#include <linux/sched.h>
+#include <linux/audit.h>
#include <linux/err.h>

static inline long syscall_get_nr(struct task_struct *task,
@@ -79,4 +80,10 @@ static inline void syscall_set_arguments(struct task_struct *task,

ia64_syscall_get_set_arguments(task, regs, i, n, args, 1);
}
+
+static inline int syscall_get_arch(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return AUDIT_ARCH_IA64;
+}
#endif /* _ASM_SYSCALL_H */
diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index 4265ff6..319d001 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -1246,7 +1246,7 @@ syscall_trace_enter (long arg0, long arg1, long arg2, long arg3,
ia64_sync_krbs();


- audit_syscall_entry(AUDIT_ARCH_IA64, regs.r15, arg0, arg1, arg2, arg3);
+ audit_syscall_entry(&regs);

return 0;
}
diff --git a/arch/microblaze/include/asm/syscall.h b/arch/microblaze/include/asm/syscall.h
index 9bc4317..d8d963b 100644
--- a/arch/microblaze/include/asm/syscall.h
+++ b/arch/microblaze/include/asm/syscall.h
@@ -3,6 +3,7 @@

#include <linux/kernel.h>
#include <linux/sched.h>
+#include <linux/audit.h>
#include <asm/ptrace.h>

/* The system call number is given by the user in R12 */
@@ -96,6 +97,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
microblaze_set_syscall_arg(regs, i++, *args++);
}

+static inline int syscall_get_arch(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return EM_MICROBLAZE;
+}
+
asmlinkage long do_syscall_trace_enter(struct pt_regs *regs);
asmlinkage void do_syscall_trace_leave(struct pt_regs *regs);

diff --git a/arch/microblaze/kernel/ptrace.c b/arch/microblaze/kernel/ptrace.c
index ab1b9db..4953058 100644
--- a/arch/microblaze/kernel/ptrace.c
+++ b/arch/microblaze/kernel/ptrace.c
@@ -147,8 +147,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
*/
ret = -1L;

- audit_syscall_entry(EM_MICROBLAZE, regs->r12, regs->r5, regs->r6,
- regs->r7, regs->r8);
+ audit_syscall_entry(regs);

return ret ?: regs->r12;
}
diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
index b54b2ad..3a6d01f 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -14,6 +14,7 @@
#define _ASM_SYSCALL_H 1

#include <linux/sched.h>
+#include <linux/audit.h>

/* ftrace syscalls requires exporting the sys_call_table */
#ifdef CONFIG_FTRACE_SYSCALLS
@@ -86,4 +87,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
memcpy(&regs->gpr[3 + i], args, n * sizeof(args[0]));
}

+static inline int syscall_get_arch(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return test_tsk_thread_flag(task, TIF_32BIT) ?
+ AUDIT_ARCH_PPC :
+ AUDIT_ARCH_PPC64;
+}
+
#endif /* _ASM_SYSCALL_H */
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index dd5e214..3f959f4 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -1724,20 +1724,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->gpr[0]);

-#ifdef CONFIG_PPC64
- if (!is_32bit_task())
- audit_syscall_entry(AUDIT_ARCH_PPC64,
- regs->gpr[0],
- regs->gpr[3], regs->gpr[4],
- regs->gpr[5], regs->gpr[6]);
- else
-#endif
- audit_syscall_entry(AUDIT_ARCH_PPC,
- regs->gpr[0],
- regs->gpr[3] & 0xffffffff,
- regs->gpr[4] & 0xffffffff,
- regs->gpr[5] & 0xffffffff,
- regs->gpr[6] & 0xffffffff);
+ audit_syscall_entry(regs);

return ret ?: regs->gpr[0];
}
diff --git a/arch/s390/include/asm/syscall.h b/arch/s390/include/asm/syscall.h
index fb214dd..b481b11 100644
--- a/arch/s390/include/asm/syscall.h
+++ b/arch/s390/include/asm/syscall.h
@@ -14,6 +14,7 @@

#include <linux/sched.h>
#include <linux/err.h>
+#include <linux/audit.h>
#include <asm/ptrace.h>

/*
@@ -87,4 +88,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
regs->orig_gpr2 = args[0];
}

+static inline int syscall_get_arch(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return test_tsk_thread_flag(task, TIF_31BIT) ?
+ AUDIT_ARCH_S390 :
+ AUDIT_ARCH_S390X;
+}
+
#endif /* _ASM_SYSCALL_H */
diff --git a/arch/s390/kernel/ptrace.c b/arch/s390/kernel/ptrace.c
index 4993e68..4222adb 100644
--- a/arch/s390/kernel/ptrace.c
+++ b/arch/s390/kernel/ptrace.c
@@ -740,11 +740,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->gprs[2]);

- audit_syscall_entry(is_compat_task() ?
- AUDIT_ARCH_S390 : AUDIT_ARCH_S390X,
- regs->gprs[2], regs->orig_gpr2,
- regs->gprs[3], regs->gprs[4],
- regs->gprs[5]);
+ audit_syscall_entry(regs);
return ret ?: regs->gprs[2];
}

diff --git a/arch/sh/include/asm/syscall_32.h b/arch/sh/include/asm/syscall_32.h
index 7d80df4..db1eb5d 100644
--- a/arch/sh/include/asm/syscall_32.h
+++ b/arch/sh/include/asm/syscall_32.h
@@ -4,6 +4,7 @@
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/err.h>
+#include <linux/audit.h>
#include <asm/ptrace.h>

/* The system call number is given by the user in R3 */
@@ -93,4 +94,16 @@ static inline void syscall_set_arguments(struct task_struct *task,
}
}

+static inline int syscall_get_arch(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ int arch = EM_SH;
+
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+ arch |= __AUDIT_ARCH_LE;
+#endif
+
+ return arch;
+}
+
#endif /* __ASM_SH_SYSCALL_32_H */
diff --git a/arch/sh/include/asm/syscall_64.h b/arch/sh/include/asm/syscall_64.h
index c3561ca..f3a5913 100644
--- a/arch/sh/include/asm/syscall_64.h
+++ b/arch/sh/include/asm/syscall_64.h
@@ -3,6 +3,7 @@

#include <linux/kernel.h>
#include <linux/sched.h>
+#include <linux/audit.h>
#include <asm/ptrace.h>

/* The system call number is given by the user in R9 */
@@ -61,4 +62,19 @@ static inline void syscall_set_arguments(struct task_struct *task,
memcpy(&regs->regs[2 + i], args, n * sizeof(args[0]));
}

+static inline int syscall_get_arch(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ int arch = EM_SH;
+
+#ifdef CONFIG_64BIT
+ arch |= __AUDIT_ARCH_64BIT;
+#endif
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+ arch |= __AUDIT_ARCH_LE;
+#endif
+
+ return arch;
+}
+
#endif /* __ASM_SH_SYSCALL_64_H */
diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 81f999a..0173bfd 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -488,17 +488,6 @@ long arch_ptrace(struct task_struct *child, long request,
return ret;
}

-static inline int audit_arch(void)
-{
- int arch = EM_SH;
-
-#ifdef CONFIG_CPU_LITTLE_ENDIAN
- arch |= __AUDIT_ARCH_LE;
-#endif
-
- return arch;
-}
-
asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
{
long ret = 0;
@@ -517,9 +506,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->regs[0]);

- audit_syscall_entry(audit_arch(), regs->regs[3],
- regs->regs[4], regs->regs[5],
- regs->regs[6], regs->regs[7]);
+ audit_syscall_entry(regs);

return ret ?: regs->regs[0];
}
diff --git a/arch/sh/kernel/ptrace_64.c b/arch/sh/kernel/ptrace_64.c
index af90339..5e21c79 100644
--- a/arch/sh/kernel/ptrace_64.c
+++ b/arch/sh/kernel/ptrace_64.c
@@ -504,20 +504,6 @@ asmlinkage int sh64_ptrace(long request, long pid,
return sys_ptrace(request, pid, addr, data);
}

-static inline int audit_arch(void)
-{
- int arch = EM_SH;
-
-#ifdef CONFIG_64BIT
- arch |= __AUDIT_ARCH_64BIT;
-#endif
-#ifdef CONFIG_CPU_LITTLE_ENDIAN
- arch |= __AUDIT_ARCH_LE;
-#endif
-
- return arch;
-}
-
asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
{
long long ret = 0;
@@ -536,10 +522,7 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->regs[9]);

- audit_syscall_entry(audit_arch(), regs->regs[1],
- regs->regs[2], regs->regs[3],
- regs->regs[4], regs->regs[5]);
-
+ audit_syscall_entry(regs);
return ret ?: regs->regs[9];
}

diff --git a/arch/sparc/include/asm/syscall.h b/arch/sparc/include/asm/syscall.h
index 025a02a..dc94ae8 100644
--- a/arch/sparc/include/asm/syscall.h
+++ b/arch/sparc/include/asm/syscall.h
@@ -3,6 +3,7 @@

#include <linux/kernel.h>
#include <linux/sched.h>
+#include <linux/audit.h>
#include <asm/ptrace.h>

/*
@@ -124,4 +125,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
regs->u_regs[UREG_I0 + i + j] = args[j];
}

+static inline int syscall_get_arch(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return test_tsk_thread_flag(TIF_32BIT) ?
+ AUDIT_ARCH_SPARC :
+ AUDIT_ARCH_SPARC64;
+}
+
#endif /* __ASM_SPARC_SYSCALL_H */
diff --git a/arch/sparc/kernel/ptrace_64.c b/arch/sparc/kernel/ptrace_64.c
index 484daba..7deee07 100644
--- a/arch/sparc/kernel/ptrace_64.c
+++ b/arch/sparc/kernel/ptrace_64.c
@@ -1070,15 +1070,7 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->u_regs[UREG_G1]);

- audit_syscall_entry((test_thread_flag(TIF_32BIT) ?
- AUDIT_ARCH_SPARC :
- AUDIT_ARCH_SPARC64),
- regs->u_regs[UREG_G1],
- regs->u_regs[UREG_I0],
- regs->u_regs[UREG_I1],
- regs->u_regs[UREG_I2],
- regs->u_regs[UREG_I3]);
-
+ audit_syscall_entry(regs);
return ret;
}

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index e3e7340..aa5d1ff 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -184,17 +184,11 @@ sysexit_from_sys_call:

#ifdef CONFIG_AUDITSYSCALL
.macro auditsys_entry_common
- movl %esi,%r9d /* 6th arg: 4th syscall arg */
- movl %edx,%r8d /* 5th arg: 3rd syscall arg */
- /* (already in %ecx) 4th arg: 2nd syscall arg */
- movl %ebx,%edx /* 3rd arg: 1st syscall arg */
- movl %eax,%esi /* 2nd arg: syscall number */
- movl $AUDIT_ARCH_I386,%edi /* 1st arg: audit arch */
+ movq %rsp,%rdi
call __audit_syscall_entry
movl RAX-ARGOFFSET(%rsp),%eax /* reload syscall number */
cmpq $(IA32_NR_syscalls-1),%rax
ja ia32_badsys
- movl %ebx,%edi /* reload 1st syscall arg */
movl RCX-ARGOFFSET(%rsp),%esi /* reload 2nd syscall arg */
movl RDX-ARGOFFSET(%rsp),%edx /* reload 3rd syscall arg */
movl RSI-ARGOFFSET(%rsp),%ecx /* reload 4th syscall arg */
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 1ace47b..7d69c71 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -13,8 +13,8 @@
#ifndef _ASM_X86_SYSCALL_H
#define _ASM_X86_SYSCALL_H

-#include <linux/audit.h>
#include <linux/sched.h>
+#include <linux/audit.h>
#include <linux/err.h>
#include <asm/asm-offsets.h> /* For NR_syscalls */
#include <asm/thread_info.h> /* for TS_COMPAT */
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 7b784f4..42169b9 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -449,16 +449,8 @@ sysenter_exit:
sysenter_audit:
testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
jnz syscall_trace_entry
- addl $4,%esp
- CFI_ADJUST_CFA_OFFSET -4
- /* %esi already in 8(%esp) 6th arg: 4th syscall arg */
- /* %edx already in 4(%esp) 5th arg: 3rd syscall arg */
- /* %ecx already in 0(%esp) 4th arg: 2nd syscall arg */
- movl %ebx,%ecx /* 3rd arg: 1st syscall arg */
- movl %eax,%edx /* 2nd arg: syscall number */
- movl $AUDIT_ARCH_I386,%eax /* 1st arg: audit arch */
+ movl %esp, %eax
call __audit_syscall_entry
- pushl_cfi %ebx
movl PT_EAX(%esp),%eax /* reload syscall number */
jmp sysenter_do_call

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index cdc79b5..38fc889 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -557,12 +557,7 @@ badsys:
* jump back to the normal fast path.
*/
auditsys:
- movq %r10,%r9 /* 6th arg: 4th syscall arg */
- movq %rdx,%r8 /* 5th arg: 3rd syscall arg */
- movq %rsi,%rcx /* 4th arg: 2nd syscall arg */
- movq %rdi,%rdx /* 3rd arg: 1st syscall arg */
- movq %rax,%rsi /* 2nd arg: syscall number */
- movl $AUDIT_ARCH_X86_64,%edi /* 1st arg: audit arch */
+ movq %rsp,%rdi
call __audit_syscall_entry
LOAD_ARGS 0 /* reload call-clobbered registers */
jmp system_call_fastpath
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 13b1990..916abfd 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1496,19 +1496,7 @@ long syscall_trace_enter(struct pt_regs *regs)
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->orig_ax);

- if (IS_IA32)
- audit_syscall_entry(AUDIT_ARCH_I386,
- regs->orig_ax,
- regs->bx, regs->cx,
- regs->dx, regs->si);
-#ifdef CONFIG_X86_64
- else
- audit_syscall_entry(AUDIT_ARCH_X86_64,
- regs->orig_ax,
- regs->di, regs->si,
- regs->dx, regs->r10);
-#endif
-
+ audit_syscall_entry(regs);
out:
return ret ?: regs->orig_ax;
}
diff --git a/include/linux/audit.h b/include/linux/audit.h
index 22f292a..3d1f963 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -454,9 +454,7 @@ extern int audit_classify_arch(int arch);
/* Public API */
extern int audit_alloc(struct task_struct *task);
extern void __audit_free(struct task_struct *task);
-extern void __audit_syscall_entry(int arch,
- int major, unsigned long a0, unsigned long a1,
- unsigned long a2, unsigned long a3);
+extern void __audit_syscall_entry(struct pt_regs *regs);
extern void __audit_syscall_exit(int ret_success, long ret_value);
extern void __audit_getname(const char *name);
extern void audit_putname(const char *name);
@@ -476,14 +474,12 @@ static inline void audit_free(struct task_struct *task)
if (unlikely(task->audit_context))
__audit_free(task);
}
-static inline void audit_syscall_entry(int arch, int major, unsigned long a0,
- unsigned long a1, unsigned long a2,
- unsigned long a3)
+static inline void audit_syscall_entry(struct pt_regs *regs)
{
if (unlikely(!audit_dummy_context()))
- __audit_syscall_entry(arch, major, a0, a1, a2, a3);
+ __audit_syscall_entry(regs);
}
-static inline void audit_syscall_exit(void *pt_regs)
+static inline void audit_syscall_exit(struct pt_regs *pt_regs)
{
if (unlikely(current->audit_context)) {
int success = is_syscall_success(pt_regs);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 4b96415..9075c27 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -68,6 +68,7 @@
#include <linux/capability.h>
#include <linux/fs_struct.h>
#include <linux/compat.h>
+#include <asm/syscall.h>

#include "audit.h"

@@ -1787,13 +1788,12 @@ void __audit_free(struct task_struct *tsk)
* will only be written if another part of the kernel requests that it
* be written).
*/
-void __audit_syscall_entry(int arch, int major,
- unsigned long a1, unsigned long a2,
- unsigned long a3, unsigned long a4)
+void __audit_syscall_entry(struct pt_regs *regs)
{
struct task_struct *tsk = current;
struct audit_context *context = tsk->audit_context;
enum audit_state state;
+ int major;

if (!context)
return;
@@ -1812,6 +1812,7 @@ void __audit_syscall_entry(int arch, int major,
* This also happens with vm86 emulation in a non-nested manner
* (entries without exits), so this case must be caught.
*/
+ major = syscall_get_nr(tsk, regs);
if (context->in_syscall) {
struct audit_context *newctx;

@@ -1839,12 +1840,9 @@ void __audit_syscall_entry(int arch, int major,
if (!audit_enabled)
return;

- context->arch = arch;
+ context->arch = syscall_get_arch(tsk, regs);
context->major = major;
- context->argv[0] = a1;
- context->argv[1] = a2;
- context->argv[2] = a3;
- context->argv[3] = a4;
+ syscall_get_arguments(tsk, regs, 0, 4, context->argv);

state = context->state;
context->dummy = !audit_n_rules;

2012-05-24 16:08:41

by Will Drewry

[permalink] [raw]
Subject: [RFC PATCH 0/3] move the secure_computing call

This is an RFC based on the comments from Al Viro and Eric Paris
regarding ptrace()rs being able to change the system call the kernel
sees after the seccomp enforcement has occurred (for mode 1 or 2).

With this series applied, a (p)tracer of a process with seccomp enabled
will be unable to change the tracee's system call number after the
secure computing check has been performed.

The x86 change is tested, as is the seccomp.c change. For other arches,
it is not (RFC :). Given that there are other inconsistencies in this
code across architectures, I'm not sure if it makes sense to attempt to
fix them all at once or to roll through as I attempt to add seccomp
filter support.

As is, the biggest benefit of this change is just setting consistent
expectations in what the ptrace/seccomp interactions should be. The
current ability for ptrace to "bypass" secure computing (by remapping
allowed system calls) is not necessarily a problem, but it is not
necessarily intuitive behavior.

Thoughts, comments, flames will be appreciated!


Will Drewry (3):
seccomp: Don't allow tracers to abuse RET_TRACE
arch/x86: move secure_computing after ptrace
arch/*: move secure_computing after trace

arch/arm/kernel/entry-common.S | 7 +------
arch/arm/kernel/ptrace.c | 42 +++++++++++++++++++--------------------
arch/microblaze/kernel/ptrace.c | 4 ++--
arch/mips/kernel/ptrace.c | 16 ++++++---------
arch/powerpc/kernel/ptrace.c | 5 +++--
arch/s390/kernel/ptrace.c | 6 +++---
arch/sh/kernel/ptrace_32.c | 5 +++--
arch/sh/kernel/ptrace_64.c | 5 +++--
arch/sparc/kernel/ptrace_64.c | 7 ++++---
arch/x86/kernel/ptrace.c | 13 ++++++------
kernel/seccomp.c | 4 ++++
11 files changed, 56 insertions(+), 58 deletions(-)

--
1.7.9.5

2012-05-24 16:08:46

by Will Drewry

[permalink] [raw]
Subject: [RFC PATCH 2/3] arch/x86: move secure_computing after ptrace

At present, seccomp modes 1 and 2 may have their
behavior changed by a ptrace()ing task. The ptracer
cannot change blocked/disallowed system calls, but it can
change allowed system calls to calls that would otherwise
not be allowed by the seccomp policy.

Signed-off-by: Will Drewry <[email protected]>
---
arch/x86/kernel/ptrace.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 13b1990..ad649a6 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1479,13 +1479,6 @@ long syscall_trace_enter(struct pt_regs *regs)
if (test_thread_flag(TIF_SINGLESTEP))
regs->flags |= X86_EFLAGS_TF;

- /* do the secure computing check first */
- if (secure_computing(regs->orig_ax)) {
- /* seccomp failures shouldn't expose any additional code. */
- ret = -1L;
- goto out;
- }
-
if (unlikely(test_thread_flag(TIF_SYSCALL_EMU)))
ret = -1L;

@@ -1493,6 +1486,12 @@ long syscall_trace_enter(struct pt_regs *regs)
tracehook_report_syscall_entry(regs))
ret = -1L;

+ /* check secure computing after userspace can't change the syscall. */
+ if (!ret && secure_computing(regs->orig_ax)) {
+ ret = -1L;
+ goto out;
+ }
+
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->orig_ax);

--
1.7.9.5

2012-05-24 16:09:01

by Will Drewry

[permalink] [raw]
Subject: [RFC PATCH 3/3] arch/*: move secure_computing after trace

At present, any ptracer can bypass the seccomp system call
policy by changing the system call after secure_computing has
been called. This change ensures that secure_computing
occurs after all other userspace applications may have changed the
system call value.

(Note, ARM was the most affected in this patch as the call was moved
into the syscall_trace path and reordering after ptrace was
non-trivial.)

This change is wildly untested.

Signed-off-by: Will Drewry <[email protected]>
---
arch/arm/kernel/entry-common.S | 7 +------
arch/arm/kernel/ptrace.c | 42 +++++++++++++++++++--------------------
arch/microblaze/kernel/ptrace.c | 4 ++--
arch/mips/kernel/ptrace.c | 16 ++++++---------
arch/powerpc/kernel/ptrace.c | 5 +++--
arch/s390/kernel/ptrace.c | 6 +++---
arch/sh/kernel/ptrace_32.c | 5 +++--
arch/sh/kernel/ptrace_64.c | 5 +++--
arch/sparc/kernel/ptrace_64.c | 7 ++++---
9 files changed, 46 insertions(+), 51 deletions(-)

diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 7bd2d3c..57a5bde 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -416,12 +416,7 @@ ENTRY(vector_swi)

#ifdef CONFIG_SECCOMP
tst r10, #_TIF_SECCOMP
- beq 1f
- mov r0, scno
- bl __secure_computing
- add r0, sp, #S_R0 + S_OFF @ pointer to regs
- ldmia r0, {r0 - r3} @ have to reload r0 - r3
-1:
+ bne __sys_trace @ are we in seccomp mode?
#endif

tst r10, #_TIF_SYSCALL_WORK @ are we tracing syscalls?
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 14e3826..1654071 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -909,32 +909,32 @@ long arch_ptrace(struct task_struct *child, long request,

asmlinkage int syscall_trace(int why, struct pt_regs *regs, int scno)
{
- unsigned long ip;
-
- if (why)
- audit_syscall_exit(regs);
- else
- audit_syscall_entry(AUDIT_ARCH_ARM, scno, regs->ARM_r0,
- regs->ARM_r1, regs->ARM_r2, regs->ARM_r3);
-
- if (!test_thread_flag(TIF_SYSCALL_TRACE))
- return scno;
-
- current_thread_info()->syscall = scno;
-
/*
* IP is used to denote syscall entry/exit:
* IP = 0 -> entry, =1 -> exit
*/
- ip = regs->ARM_ip;
- regs->ARM_ip = why;
+ unsigned long ip = regs->ARM_ip;

- if (why)
- tracehook_report_syscall_exit(regs, 0);
- else if (tracehook_report_syscall_entry(regs))
- current_thread_info()->syscall = -1;
-
- regs->ARM_ip = ip;
+ current_thread_info()->syscall = scno;
+ if (why) { /* exit */
+ if (test_thread_flag(TIF_SYSCALL_TRACE)) {
+ regs->ARM_ip = why;
+ tracehook_report_syscall_exit(regs, 0);
+ regs->ARM_ip = ip;
+ }
+ audit_syscall_exit(regs);
+ } else { /* entry */
+ if (test_thread_flag(TIF_SYSCALL_TRACE)) {
+ regs->ARM_ip = why;
+ if (tracehook_report_syscall_entry(regs))
+ current_thread_info()->syscall = -1;
+ regs->ARM_ip = ip;
+ }
+ /* Call secure_computing after any userspace changes are done */
+ secure_computing_strict(current_thread_info()->syscall);
+ audit_syscall_entry(AUDIT_ARCH_ARM, scno, regs->ARM_r0,
+ regs->ARM_r1, regs->ARM_r2, regs->ARM_r3);
+ }

return current_thread_info()->syscall;
}
diff --git a/arch/microblaze/kernel/ptrace.c b/arch/microblaze/kernel/ptrace.c
index ab1b9db..9a917a7 100644
--- a/arch/microblaze/kernel/ptrace.c
+++ b/arch/microblaze/kernel/ptrace.c
@@ -136,8 +136,6 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
{
long ret = 0;

- secure_computing_strict(regs->r12);
-
if (test_thread_flag(TIF_SYSCALL_TRACE) &&
tracehook_report_syscall_entry(regs))
/*
@@ -147,6 +145,8 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
*/
ret = -1L;

+ secure_computing_strict(regs->r12);
+
audit_syscall_entry(EM_MICROBLAZE, regs->r12, regs->r5, regs->r6,
regs->r7, regs->r8);

diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
index 4812c6d..2b1f5f3 100644
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -534,19 +534,16 @@ static inline int audit_arch(void)
*/
asmlinkage void syscall_trace_enter(struct pt_regs *regs)
{
- /* do the secure computing check first */
- secure_computing_strict(regs->regs[2]);
-
- if (!(current->ptrace & PT_PTRACED))
- goto out;
-
- if (!test_thread_flag(TIF_SYSCALL_TRACE))
- goto out;
-
+ if ((current->ptrace & PT_PTRACED) &&
+ test_thread_flag(TIF_SYSCALL_TRACE)) {
/* The 0x80 provides a way for the tracing parent to distinguish
between a syscall stop and SIGTRAP delivery */
ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) ?
0x80 : 0));
+ }
+
+ /* do the secure computing check after the system calls are final. */
+ secure_computing_strict(regs->regs[2]);

/*
* this isn't the same as continuing with a signal, but it will do
@@ -558,7 +555,6 @@ asmlinkage void syscall_trace_enter(struct pt_regs *regs)
current->exit_code = 0;
}

-out:
audit_syscall_entry(audit_arch(), regs->regs[2],
regs->regs[4], regs->regs[5],
regs->regs[6], regs->regs[7]);
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index dd5e214..4b725ed 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -1710,8 +1710,6 @@ long do_syscall_trace_enter(struct pt_regs *regs)
{
long ret = 0;

- secure_computing_strict(regs->gpr[0]);
-
if (test_thread_flag(TIF_SYSCALL_TRACE) &&
tracehook_report_syscall_entry(regs))
/*
@@ -1721,6 +1719,9 @@ long do_syscall_trace_enter(struct pt_regs *regs)
*/
ret = -1L;

+ if (!ret)
+ secure_computing_strict(regs->gpr[0]);
+
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->gpr[0]);

diff --git a/arch/s390/kernel/ptrace.c b/arch/s390/kernel/ptrace.c
index 4993e68..b95c0ac 100644
--- a/arch/s390/kernel/ptrace.c
+++ b/arch/s390/kernel/ptrace.c
@@ -718,9 +718,6 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
{
long ret = 0;

- /* Do the secure computing check first. */
- secure_computing_strict(regs->gprs[2]);
-
/*
* The sysc_tracesys code in entry.S stored the system
* call number to gprs[2].
@@ -737,6 +734,9 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
ret = -1;
}

+ /* Do the secure computing check with a final syscall set. */
+ secure_computing_strict(regs->gprs[2]);
+
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->gprs[2]);

diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 81f999a..2500ce1 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -503,8 +503,6 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
{
long ret = 0;

- secure_computing_strict(regs->regs[0]);
-
if (test_thread_flag(TIF_SYSCALL_TRACE) &&
tracehook_report_syscall_entry(regs))
/*
@@ -514,6 +512,9 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
*/
ret = -1L;

+ if (!ret)
+ secure_computing_strict(regs->regs[0]);
+
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->regs[0]);

diff --git a/arch/sh/kernel/ptrace_64.c b/arch/sh/kernel/ptrace_64.c
index af90339..cc030e0 100644
--- a/arch/sh/kernel/ptrace_64.c
+++ b/arch/sh/kernel/ptrace_64.c
@@ -522,8 +522,6 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
{
long long ret = 0;

- secure_computing_strict(regs->regs[9]);
-
if (test_thread_flag(TIF_SYSCALL_TRACE) &&
tracehook_report_syscall_entry(regs))
/*
@@ -533,6 +531,9 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
*/
ret = -1LL;

+ if (!ret)
+ secure_computing_strict(regs->regs[9]);
+
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->regs[9]);

diff --git a/arch/sparc/kernel/ptrace_64.c b/arch/sparc/kernel/ptrace_64.c
index 484daba..31fff51 100644
--- a/arch/sparc/kernel/ptrace_64.c
+++ b/arch/sparc/kernel/ptrace_64.c
@@ -1061,12 +1061,13 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
{
int ret = 0;

- /* do the secure computing check first */
- secure_computing_strict(regs->u_regs[UREG_G1]);
-
if (test_thread_flag(TIF_SYSCALL_TRACE))
ret = tracehook_report_syscall_entry(regs);

+ /* do the secure computing check with the final syscall */
+ if (!ret)
+ secure_computing_strict(regs->u_regs[UREG_G1]);
+
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->u_regs[UREG_G1]);

--
1.7.9.5

2012-05-24 16:09:49

by Will Drewry

[permalink] [raw]
Subject: [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE

Ensure that consumers of the PTRACE_EVENT_SECCOMP notification
cannot change the system call number for the traced task
without it resulting in the system call being skipped.

Traditionally, tracers will set the system call number to
-1 to skip the system call. This behavior will work as expected
but the tracer will be unable to remap the system call to a valid
system call after the seccomp policy has been evaluated.

Signed-off-by: Will Drewry <[email protected]>
---
kernel/seccomp.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index ee376be..33f0ad6 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -425,6 +425,10 @@ int __secure_computing(int this_syscall)
*/
if (fatal_signal_pending(current))
break;
+ /* Skip the system call if the tracer changed it. */
+ if (this_syscall !=
+ syscall_get_nr(current, task_pt_regs(current)))
+ goto skip;
return 0;
case SECCOMP_RET_ALLOW:
return 0;
--
1.7.9.5

2012-05-24 16:15:05

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On 05/24/2012 09:07 AM, Will Drewry wrote:
> This is an RFC based on the comments from Al Viro and Eric Paris
> regarding ptrace()rs being able to change the system call the kernel
> sees after the seccomp enforcement has occurred (for mode 1 or 2).
>
> With this series applied, a (p)tracer of a process with seccomp enabled
> will be unable to change the tracee's system call number after the
> secure computing check has been performed.
>
> The x86 change is tested, as is the seccomp.c change. For other arches,
> it is not (RFC :). Given that there are other inconsistencies in this
> code across architectures, I'm not sure if it makes sense to attempt to
> fix them all at once or to roll through as I attempt to add seccomp
> filter support.
>
> As is, the biggest benefit of this change is just setting consistent
> expectations in what the ptrace/seccomp interactions should be. The
> current ability for ptrace to "bypass" secure computing (by remapping
> allowed system calls) is not necessarily a problem, but it is not
> necessarily intuitive behavior.
>
> Thoughts, comments, flames will be appreciated!

I think this really screws with using seccomp for self-interception. I
wouldn't inherently be opposed to the following flow:

seccomp -> ptrace -> seccomp

... i.e. if ptrace is enabled and we enable something, run it through
seccomp again, but there are bunch of use cases (mostly involving
SIGSYS) where doing ptrace before seccomp is just bizarre.

-hpa

2012-05-24 17:55:11

by Indan Zupancic

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE

On Thu, May 24, 2012 18:07, Will Drewry wrote:
> Ensure that consumers of the PTRACE_EVENT_SECCOMP notification
> cannot change the system call number for the traced task
> without it resulting in the system call being skipped.
>
> Traditionally, tracers will set the system call number to
> -1 to skip the system call. This behavior will work as expected
> but the tracer will be unable to remap the system call to a valid
> system call after the seccomp policy has been evaluated.
>
> Signed-off-by: Will Drewry <[email protected]>
> ---
> kernel/seccomp.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index ee376be..33f0ad6 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -425,6 +425,10 @@ int __secure_computing(int this_syscall)
> */
> if (fatal_signal_pending(current))
> break;
> + /* Skip the system call if the tracer changed it. */
> + if (this_syscall !=
> + syscall_get_nr(current, task_pt_regs(current)))
> + goto skip;
> return 0;
> case SECCOMP_RET_ALLOW:
> return 0;
> --

This patch doesn't make any sense whatsoever. You can't know why a system
call was blocked by a seccomp filter, assuming it's always because of the
system call number is wrong.

Also, you don't check if an allowed system call is changed into a denied
one, so this doesn't protect against ptracers bypassing seccomp filters.

And one of the main points of PTRACE_EVENT_SECCOMP events was that it's
useful for cases that can't be handled or decided by the seccomp filter.
Then taking away the ability to change the syscall number makes it a lot
less useful.

Either do the seccomp test before or after ptrace, or both, but please
don't introduce ad hoc checks like this.

Greetings,

Indan

2012-05-24 18:07:55

by Roland McGrath

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On Thu, May 24, 2012 at 9:13 AM, H. Peter Anvin <[email protected]> wrote:
> I think this really screws with using seccomp for self-interception. ?I
> wouldn't inherently be opposed to the following flow:
>
> ? ? ? ?seccomp -> ptrace -> seccomp
>
> ... i.e. if ptrace is enabled and we enable something, run it through
> seccomp again, but there are bunch of use cases (mostly involving
> SIGSYS) where doing ptrace before seccomp is just bizarre.

Are you sure? This is ptrace syscall tracing going first.
If seccomp generates a SIGSYS, then ptrace will still get its opportunity
to intercept the signal and change the register state however it likes.


Thanks,
Roland

2012-05-24 18:24:41

by Will Drewry

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE

On Thu, May 24, 2012 at 12:54 PM, Indan Zupancic <[email protected]> wrote:
> On Thu, May 24, 2012 18:07, Will Drewry wrote:
>> Ensure that consumers of the PTRACE_EVENT_SECCOMP notification
>> cannot change the system call number for the traced task
>> without it resulting in the system call being skipped.
>>
>> Traditionally, tracers will set the system call number to
>> -1 to skip the system call. This behavior will work as expected
>> but the tracer will be unable to remap the system call to a valid
>> system call after the seccomp policy has been evaluated.
>>
>> Signed-off-by: Will Drewry <[email protected]>
>> ---
>> ?kernel/seccomp.c | ? ?4 ++++
>> ?1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> index ee376be..33f0ad6 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -425,6 +425,10 @@ int __secure_computing(int this_syscall)
>> ? ? ? ? ? ? ? ? ? ? ? ?*/
>> ? ? ? ? ? ? ? ? ? ? ? if (fatal_signal_pending(current))
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? break;
>> + ? ? ? ? ? ? ? ? ? ? /* Skip the system call if the tracer changed it. */
>> + ? ? ? ? ? ? ? ? ? ? if (this_syscall !=
>> + ? ? ? ? ? ? ? ? ? ? ? ? syscall_get_nr(current, task_pt_regs(current)))
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto skip;
>> ? ? ? ? ? ? ? ? ? ? ? return 0;
>> ? ? ? ? ? ? ? case SECCOMP_RET_ALLOW:
>> ? ? ? ? ? ? ? ? ? ? ? return 0;
>> --
>
> This patch doesn't make any sense whatsoever. You can't know why a system
> call was blocked by a seccomp filter, assuming it's always because of the
> system call number is wrong.

All this does is assert that the tracer can't change the syscall
number without it skipping the call. If seccomp returned
SECCOMP_RET_TRACE because the argument to open was O_RDWR, then
everything is fine.

> Also, you don't check if an allowed system call is changed into a denied
> one, so this doesn't protect against ptracers bypassing seccomp filters.

This enforces that the system call that is going to be executed is the
one that triggered SECCOMP_RET_TRACE. That means seccomp is
delegating the go/no-go decision to the tracer. I don't understand
your assertion here. This code doesn't affect the PTRACE_SYSCALL
case.

> And one of the main points of PTRACE_EVENT_SECCOMP events was that it's
> useful for cases that can't be handled or decided by the seccomp filter.
> Then taking away the ability to change the syscall number makes it a lot
> less useful.

Do you have a valid case where you want to remap one system call to
another without the ability to also handle the syscall exit path and
do any fixups? I've mostly just seen skip, allow, update arguments -
not swapping the entire syscall. That said, it's possible. you could
do all sorts of weird things with ptrace if you want :)

> Either do the seccomp test before or after ptrace, or both, but please
> don't introduce ad hoc checks like this.

I don't feel strongly about this RFC, but I don't believe that
expectations are being changed dramatically.

thanks!
will

2012-05-24 18:27:48

by Indan Zupancic

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On Thu, May 24, 2012 20:07, Roland McGrath wrote:
> On Thu, May 24, 2012 at 9:13 AM, H. Peter Anvin <[email protected]> wrote:
>> I think this really screws with using seccomp for self-interception. I
>> wouldn't inherently be opposed to the following flow:
>>
>> seccomp -> ptrace -> seccomp
>>
>> ... i.e. if ptrace is enabled and we enable something, run it through
>> seccomp again, but there are bunch of use cases (mostly involving
>> SIGSYS) where doing ptrace before seccomp is just bizarre.
>
> Are you sure? This is ptrace syscall tracing going first.
> If seccomp generates a SIGSYS, then ptrace will still get its opportunity
> to intercept the signal and change the register state however it likes.

If so, then the seccomp check needs to be redone after any ptrace
changes, or we should give up and just do the seccomp check first,
instead of possibly looping forever. PTRACE_EVENT_SECCOMP has the
same problem.

If a seccomp filtered task can do ptrace(), it can easily bypass
the seccomp filter by ptracing any task not under the same filter
but from the same user. And then it can puppeteer the victim into
doing anything it wishes. So pretending seccomp can make a ptracer
secure is futile, I think. Perhaps it's better to keep it simple and
always do the seccomp test first and ignore ptrace changes, however
sad that may seem. Seccomp had the power to stop ptrace(). It didn't,
so it shouldn't try to do it afterwards either.

It's a bit fuzzy though, only reason why doing seccomp first is more
convenient is because seccomp can generate ptrace events. I don't
think it will make a difference in practice because ptrace(2) won't
be allowed by seccomp filters anyway, so it's a bit of a theoretical
problem.

Greetings,

Indan

2012-05-24 18:46:37

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On 05/24/2012 11:27 AM, Indan Zupancic wrote:
>
> If so, then the seccomp check needs to be redone after any ptrace
> changes, or we should give up and just do the seccomp check first,
> instead of possibly looping forever. PTRACE_EVENT_SECCOMP has the
> same problem.
>
> If a seccomp filtered task can do ptrace(), it can easily bypass
> the seccomp filter by ptracing any task not under the same filter
> but from the same user. And then it can puppeteer the victim into
> doing anything it wishes. So pretending seccomp can make a ptracer
> secure is futile, I think. Perhaps it's better to keep it simple and
> always do the seccomp test first and ignore ptrace changes, however
> sad that may seem. Seccomp had the power to stop ptrace(). It didn't,
> so it shouldn't try to do it afterwards either.
>
> It's a bit fuzzy though, only reason why doing seccomp first is more
> convenient is because seccomp can generate ptrace events. I don't
> think it will make a difference in practice because ptrace(2) won't
> be allowed by seccomp filters anyway, so it's a bit of a theoretical
> problem.
>

No, that's not the reason to do seccomp first. The reason to do seccomp
first is that a seccomp filter can be part of the process execution and
can completely transform the system call picture.

Consider UML, for example. It uses ptrace to capture system calls and
execute them on the behalf of the process. It needs to know what system
calls *actually* are done by the virtual process.

(Note: that being said, UML might very well be better done using seccomp
filters *instead* of ptrace, but that's another matter.)

I agree with you, if the process is traceable it is rather questionable
to claim any kind of security; more likely consider that a debugging
mode and tell people to lock out ptrace for real sandboxing.

-hpa

2012-05-24 19:39:24

by Indan Zupancic

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On Thu, May 24, 2012 20:45, H. Peter Anvin wrote:
> On 05/24/2012 11:27 AM, Indan Zupancic wrote:
>>
>> If so, then the seccomp check needs to be redone after any ptrace
>> changes, or we should give up and just do the seccomp check first,
>> instead of possibly looping forever. PTRACE_EVENT_SECCOMP has the
>> same problem.
>>
>> If a seccomp filtered task can do ptrace(), it can easily bypass
>> the seccomp filter by ptracing any task not under the same filter
>> but from the same user. And then it can puppeteer the victim into
>> doing anything it wishes. So pretending seccomp can make a ptracer
>> secure is futile, I think. Perhaps it's better to keep it simple and
>> always do the seccomp test first and ignore ptrace changes, however
>> sad that may seem. Seccomp had the power to stop ptrace(). It didn't,
>> so it shouldn't try to do it afterwards either.
>>
>> It's a bit fuzzy though, only reason why doing seccomp first is more
>> convenient is because seccomp can generate ptrace events. I don't
>> think it will make a difference in practice because ptrace(2) won't
>> be allowed by seccomp filters anyway, so it's a bit of a theoretical
>> problem.
>>
>
> No, that's not the reason to do seccomp first. The reason to do seccomp
> first is that a seccomp filter can be part of the process execution and
> can completely transform the system call picture.

How? All that seccomp can do is deny system calls and kill the task.
What you describe sounds more like ptrace.

>
> Consider UML, for example. It uses ptrace to capture system calls and
> execute them on the behalf of the process. It needs to know what system
> calls *actually* are done by the virtual process.

Seccomp can't change system calls, only prevent them, so I miss how it
can change anything for UML in that way.

>
> (Note: that being said, UML might very well be better done using seccomp
> filters *instead* of ptrace, but that's another matter.)

Well, obviously it will use seccomp filters instead of ptrace when possible
and do the more complicated stuff via PTRACE_SECCOMP_EVENT when seccomp isn't
sufficient. But mainly for performance reasons.

>
> I agree with you, if the process is traceable it is rather questionable
> to claim any kind of security; more likely consider that a debugging
> mode and tell people to lock out ptrace for real sandboxing.

"process is traceable" should be "process is able to trace".

Greetings,

Indan

2012-05-24 20:17:12

by Indan Zupancic

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] seccomp: Don't allow tracers to abuse RET_TRACE

On Thu, May 24, 2012 20:24, Will Drewry wrote:
> On Thu, May 24, 2012 at 12:54 PM, Indan Zupancic <[email protected]> wrote:
>> This patch doesn't make any sense whatsoever. You can't know why a system
>> call was blocked by a seccomp filter, assuming it's always because of the
>> system call number is wrong.
>
> All this does is assert that the tracer can't change the syscall
> number without it skipping the call.

Why wouldn't it be allowed to change the system call number?

And try answering that question in a way that doesn't apply to syscall
argument values too.

> If seccomp returned
> SECCOMP_RET_TRACE because the argument to open was O_RDWR, then
> everything is fine.

No it's not fine, because it's inconsistent and arbitrary.

>
>> Also, you don't check if an allowed system call is changed into a denied
>> one, so this doesn't protect against ptracers bypassing seccomp filters.
>
> This enforces that the system call that is going to be executed is the
> one that triggered SECCOMP_RET_TRACE. That means seccomp is
> delegating the go/no-go decision to the tracer. I don't understand
> your assertion here. This code doesn't affect the PTRACE_SYSCALL
> case.

It still gives normal ptracers the ability to bypass seccomp by changing
allowed system calls into system calls that would have been denied. So
considering ptrace can still be used to execute arbitrary system calls,
why add this special case restriction to SECCOMP_RET_TRACE?

>
>> And one of the main points of PTRACE_EVENT_SECCOMP events was that it's
>> useful for cases that can't be handled or decided by the seccomp filter.
>> Then taking away the ability to change the syscall number makes it a lot
>> less useful.
>
> Do you have a valid case where you want to remap one system call to
> another without the ability to also handle the syscall exit path and
> do any fixups? I've mostly just seen skip, allow, update arguments -
> not swapping the entire syscall. That said, it's possible. you could
> do all sorts of weird things with ptrace if you want :)

One case is for system call injection where an arbitrary syscall is
hijacked for the jailer's purposes. But that needs the exit path too.
Another one is replacing fork() with clone(), but that's only necessary
on 2.4. Another one is to let the process call exit() to get rid of it.

But it's easy to get the exit notification by resuming with PTRACE_SYSCALL
after getting a PTRACE_EVENT_SECCOMP event, so I don't see how it matters
if we're interested in the exit path or not.

In the jailer the first system call after an execve is replaced with a
call to mmap2() to get a read-only shared mapping which is used to avoid
file path modifications and similar races. And when using seccomp all
events are PTRACE_EVENT_SECCOMP ones, and those are exactly the ones
where we need the shared read-only mapping. We could work around your
ad hoc restriction, but the main thing is if seccomp filters is just
about syscall numbers anyway, then why use BPF instead of a bitmask?

The whole point of PTRACE_EVENT_SECCOMP is to delegate to ptrace, it
means "we can't decide in the filter, ask the ptracer". This implies
that the ptracer is trusted, so doing any checks afterwards is a bit
pointless. But if you want to do it anyway, at least do it properly
and re-run the filter after ptrace. To avoid loops you need to allow
the syscall if you get another PTRACE_EVENT_SECCOMP.

>
>> Either do the seccomp test before or after ptrace, or both, but please
>> don't introduce ad hoc checks like this.
>
> I don't feel strongly about this RFC, but I don't believe that
> expectations are being changed dramatically.

As seccomp can generate ptrace events, the only thing that makes sense is
to either do seccomp first and ignore ptrace changes, or rerun filters after
any ptrace changes. But as I said in my other mail, if a process can call
ptrace(), it will pretty much avoid all secomp filters anyway, seccomp
filters which allow ptrace() are pretty much guaranteed to be insecure.
We're talking about a non-seccomped process ptracing a seccomped process,
both with the same UID. I don't think it matters in practice what you do
in this case, from a security point of view.

Greetings,

Indan

2012-05-24 22:01:01

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On Thu, 24 May 2012 11:07:58 -0500
Will Drewry <[email protected]> wrote:

> This is an RFC based on the comments from Al Viro and Eric Paris
> regarding ptrace()rs being able to change the system call the kernel
> sees after the seccomp enforcement has occurred (for mode 1 or 2).

Perhaps you could repeat those comments in this changelog.

> With this series applied, a (p)tracer of a process with seccomp enabled
> will be unable to change the tracee's system call number after the
> secure computing check has been performed.
>
> The x86 change is tested, as is the seccomp.c change. For other arches,
> it is not (RFC :). Given that there are other inconsistencies in this
> code across architectures, I'm not sure if it makes sense to attempt to
> fix them all at once or to roll through as I attempt to add seccomp
> filter support.
>
> As is, the biggest benefit of this change is just setting consistent
> expectations in what the ptrace/seccomp interactions should be. The
> current ability for ptrace to "bypass" secure computing (by remapping
> allowed system calls) is not necessarily a problem, but it is not
> necessarily intuitive behavior.
>

Because my take on the above reasoning is "why did you bother writing
these patches"!

2012-05-24 23:41:11

by James Morris

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On Thu, 24 May 2012, Will Drewry wrote:

> As is, the biggest benefit of this change is just setting consistent
> expectations in what the ptrace/seccomp interactions should be. The
> current ability for ptrace to "bypass" secure computing (by remapping
> allowed system calls) is not necessarily a problem, but it is not
> necessarily intuitive behavior.

Indeed -- while the purpose of seccomp is to reduce the attack surface of
the syscall interface, if a user allows ptrace, attackers will definitely
see that as an attack vector, if it allows them to increase that attack
surface.

It at least needs to be well-documented.

--
James Morris
<[email protected]>

2012-05-24 23:43:53

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On Thu, May 24, 2012 at 4:40 PM, James Morris <[email protected]> wrote:
> On Thu, 24 May 2012, Will Drewry wrote:
>
>> As is, the biggest benefit of this change is just setting consistent
>> expectations in what the ptrace/seccomp interactions should be. ?The
>> current ability for ptrace to "bypass" secure computing (by remapping
>> allowed system calls) is not necessarily a problem, but it is not
>> necessarily intuitive behavior.
>
> Indeed -- while the purpose of seccomp is to reduce the attack surface of
> the syscall interface, if a user allows ptrace, attackers will definitely
> see that as an attack vector, if it allows them to increase that attack
> surface.
>
> It at least needs to be well-documented.

IMO the behavior should change. Alternatively, a post-ptrace syscall
should have to pass the *tracer's* seccomp filter, but that seems
overcomplicated and confusing.

OTOH, allowing ptrace in a seccomp filter is asking for trouble anyway
-- if you can ptrace something outside the sandbox, you've escaped.

--Andy

2012-05-24 23:57:49

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On 05/24/2012 04:43 PM, Andrew Lutomirski wrote:
>
> IMO the behavior should change. Alternatively, a post-ptrace syscall
> should have to pass the *tracer's* seccomp filter, but that seems
> overcomplicated and confusing.
>
> OTOH, allowing ptrace in a seccomp filter is asking for trouble anyway
> -- if you can ptrace something outside the sandbox, you've escaped.
>

This is my suggestion: if there is demand, make it possible to install a
*second* seccomp filter program which is run on the result of the
ptrace. I.e.:

Untraced: process -> seccomp1 -> kernel

Traced: process -> seccomp1 -> ptrace -> seccomp2 -> kernel

This is something we could add later if there is demand.

-hpa

2012-05-25 00:27:17

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On Thu, May 24, 2012 at 4:56 PM, H. Peter Anvin <[email protected]> wrote:
> On 05/24/2012 04:43 PM, Andrew Lutomirski wrote:
>>
>> IMO the behavior should change. ?Alternatively, a post-ptrace syscall
>> should have to pass the *tracer's* seccomp filter, but that seems
>> overcomplicated and confusing.
>>
>> OTOH, allowing ptrace in a seccomp filter is asking for trouble anyway
>> -- if you can ptrace something outside the sandbox, you've escaped.
>>
>
> This is my suggestion: if there is demand, make it possible to install a
> *second* seccomp filter program which is run on the result of the
> ptrace. ?I.e.:
>
> Untraced: ? ? ? process -> seccomp1 -> kernel
>
> Traced: ? ? ? ? process -> seccomp1 -> ptrace -> seccomp2 -> kernel

Just to clarify: are you suggesting that, for now, the traced behavior
should be:

process -> seccomp -> ptrace -> kernel?

If so, I think the man page or something should have a big fat warning
that seccomp filters should *never* allow ptrace (even PTRACE_TRACEME)
unless they fully understand the issue.

In any case, I think that the UML interaction is missing the point.
UML will *emulate* the seccomp filter. If it chooses to use host
seccomp filters for some business, that's its business.

--Andy

2012-05-25 00:39:10

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On 05/24/2012 05:26 PM, Andrew Lutomirski wrote:
>
> Just to clarify: are you suggesting that, for now, the traced behavior
> should be:
>
> process -> seccomp -> ptrace -> kernel?
>
> If so, I think the man page or something should have a big fat warning
> that seccomp filters should *never* allow ptrace (even PTRACE_TRACEME)
> unless they fully understand the issue.
>

Yes, and yes.

> In any case, I think that the UML interaction is missing the point.
> UML will *emulate* the seccomp filter. If it chooses to use host
> seccomp filters for some business, that's its business.

I don't see why UML should have to emulate the seccomp filter. With the
proposed order, then it can simply use the seccomp filter provided by
the host. Furthermore, with this sequencing UML can actually *use*
seccomp to provide the simulation.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-05-25 00:55:24

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On Thu, May 24, 2012 at 5:38 PM, H. Peter Anvin <[email protected]> wrote:
> On 05/24/2012 05:26 PM, Andrew Lutomirski wrote:
>>
>> Just to clarify: are you suggesting that, for now, the traced behavior
>> should be:
>>
>> process -> seccomp -> ptrace -> kernel?
>>
>> If so, I think the man page or something should have a big fat warning
>> that seccomp filters should *never* allow ptrace (even PTRACE_TRACEME)
>> unless they fully understand the issue.
>>
>
> Yes, and yes.
>
>> In any case, I think that the UML interaction is missing the point.
>> UML will *emulate* the seccomp filter. ?If it chooses to use host
>> seccomp filters for some business, that's its business.
>
> I don't see why UML should have to emulate the seccomp filter. ?With the
> proposed order, then it can simply use the seccomp filter provided by
> the host. ?Furthermore, with this sequencing UML can actually *use*
> seccomp to provide the simulation.

Hmm. I guess I agree. I'll shut up now :)

--Andy

2012-05-25 01:55:14

by Will Drewry

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] move the secure_computing call

On Thu, May 24, 2012 at 5:00 PM, Andrew Morton
<[email protected]> wrote:
> On Thu, 24 May 2012 11:07:58 -0500
> Will Drewry <[email protected]> wrote:
>
>> This is an RFC based on the comments from Al Viro and Eric Paris
>> regarding ptrace()rs being able to change the system call the kernel
>> sees after the seccomp enforcement has occurred (for mode 1 or 2).
>
> Perhaps you could repeat those comments in this changelog.

Oops :) Here's the context -- https://lkml.org/lkml/2012/5/21/326

I doubt there's need for a repost though.

>> With this series applied, a (p)tracer of a process with seccomp enabled
>> will be unable to change the tracee's system call number after the
>> secure computing check has been performed.
>>
>> The x86 change is tested, as is the seccomp.c change. ?For other arches,
>> it is not (RFC :). ?Given that there are other inconsistencies in this
>> code across architectures, I'm not sure if it makes sense to attempt to
>> fix them all at once or to roll through as I attempt to add seccomp
>> filter support.
>>
>> As is, the biggest benefit of this change is just setting consistent
>> expectations in what the ptrace/seccomp interactions should be. ?The
>> current ability for ptrace to "bypass" secure computing (by remapping
>> allowed system calls) is not necessarily a problem, but it is not
>> necessarily intuitive behavior.
>>
>
> Because my take on the above reasoning is "why did you bother writing
> these patches"!

Just to be thorough -- I wanted to make sure the discussion was framed
against actual code just in case I was missing something. Otherwise,
I'd be happy to see these patches disappear into the annals of the
wayback machine.

thanks!