2012-05-23 16:01:55

by Will Drewry

[permalink] [raw]
Subject: New ARM asm/syscall.h incompatible? (commit bf2c9f9866928df60157bc4f1ab39f93a32c754e)

Hi Wade and Steven,

I don't believe the syscall_get_arguments/syscall_set_arguments
implementation that landed in 3.4 is correct or safe. I didn't see it
get pulled in - sorry for not mailing sooner! :(

The current implementation allows for _7_ arguments and allows the 0th
index to be the ARM_ORIG_r0 instead of starting with ARM_r0 == 0. In
the global description of syscall_*_arguments it says:

* It's only valid to call this when @task is stopped for tracing on
* entry to a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
* It's invalid to call this with @i + @n > 6; we only support system calls
* taking up to 6 arguments.

This means that the current implementation is broken when matching
system call arguments for ftrace (unless there is an arch specific
hack in there) and it breaks internal kernel API for any other
consumers without arch knowledge (like seccomp mode=2). Is there a
reason to expose ARM_ORIG_r0 this way? Am I misreading?

My understanding of the arch register usage at syscall time is something like:
- ORIG_r0 gets the syscall number
- r0 becomes the first system call argument
- system call proceeds
- on return, r0 is the return value

Right now, anyone who asks for the first argument will get the system
call number (syscall_get_nr) instead of the first argument. The
attached patch fixes this, but I'm curious why this is and how it
didn't break ftrace! Am I missing something?

Even audit_syscall_entry() uses ARM_r0 for the first argument which
means that any future consumers doing syscall_get_arguments(..., 0, 6)
would get the wrong first argument.

I'm also curious why the system call argument getter/setters allow for
invalid requests instead of BUG_ON()ing? All code that exposes
syscall arguments to userspace should be limiting the number to a
maximum of 6, and any other badness is a definite kernel bug. Am I
just really confused?

Any insights will be appreciated - thanks!
will

-----
>From af693829bc883b2cb2c2050119839983505a01c3 Mon Sep 17 00:00:00 2001
From: Will Drewry <[email protected]>
Date: Wed, 23 May 2012 10:54:28 -0500
Subject: [PATCH] arm: syscall.h: fix up syscall_set/get_arguments

syscall_{get,set}_arguments currently allow up to 7 arguments and make
the system call number a virtual argument. This breaks the internal
asm/syscall.h ABI and makes the implementation incompatible with
existing implementations.

This change converts those functions to behave as they do on other
arches: 6 arguments and leave the syscall to syscall_get_nr and
syscall_rollback.

Signed-off-by: Will Drewry <[email protected]>
---
arch/arm/include/asm/syscall.h | 33 +++------------------------------
1 file changed, 3 insertions(+), 30 deletions(-)

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index c334a23..c30035a 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -43,29 +43,14 @@ static inline void syscall_set_return_value(struct
task_struct *task,
regs->ARM_r0 = (long) error ? error : val;
}

-#define SYSCALL_MAX_ARGS 7
+#define SYSCALL_MAX_ARGS 6

static inline void syscall_get_arguments(struct task_struct *task,
struct pt_regs *regs,
unsigned int i, unsigned int n,
unsigned long *args)
{
- if (i + n > SYSCALL_MAX_ARGS) {
- unsigned long *args_bad = args + SYSCALL_MAX_ARGS - i;
- unsigned int n_bad = n + i - SYSCALL_MAX_ARGS;
- pr_warning("%s called with max args %d, handling only %d\n",
- __func__, i + n, SYSCALL_MAX_ARGS);
- memset(args_bad, 0, n_bad * sizeof(args[0]));
- n = SYSCALL_MAX_ARGS - i;
- }
-
- if (i == 0) {
- args[0] = regs->ARM_ORIG_r0;
- args++;
- i++;
- n--;
- }
-
+ BUG_ON(i + n > SYSCALL_MAX_ARGS);
memcpy(args, &regs->ARM_r0 + i, n * sizeof(args[0]));
}

@@ -74,19 +59,7 @@ static inline void syscall_set_arguments(struct
task_struct *task,
unsigned int i, unsigned int n,
const unsigned long *args)
{
- if (i + n > SYSCALL_MAX_ARGS) {
- pr_warning("%s called with max args %d, handling only %d\n",
- __func__, i + n, SYSCALL_MAX_ARGS);
- n = SYSCALL_MAX_ARGS - i;
- }
-
- if (i == 0) {
- regs->ARM_ORIG_r0 = args[0];
- args++;
- i++;
- n--;
- }
-
+ BUG_ON(i + n > SYSCALL_MAX_ARGS);
memcpy(&regs->ARM_r0 + i, args, n * sizeof(args[0]));
}

--
1.7.9.5


2012-05-23 16:45:30

by Will Deacon

[permalink] [raw]
Subject: Re: New ARM asm/syscall.h incompatible? (commit bf2c9f9866928df60157bc4f1ab39f93a32c754e)

On Wed, May 23, 2012 at 05:01:50PM +0100, Will Drewry wrote:
> Hi Wade and Steven,

Hi Will,

> I don't believe the syscall_get_arguments/syscall_set_arguments
> implementation that landed in 3.4 is correct or safe. I didn't see it
> get pulled in - sorry for not mailing sooner! :(

Well, thanks for shouting anyway. It only went in during this merge window,
so hopefully we can fix it before 3.5.

> The current implementation allows for _7_ arguments and allows the 0th
> index to be the ARM_ORIG_r0 instead of starting with ARM_r0 == 0. In
> the global description of syscall_*_arguments it says:
>
> * It's only valid to call this when @task is stopped for tracing on
> * entry to a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
> * It's invalid to call this with @i + @n > 6; we only support system calls
> * taking up to 6 arguments.

Hmm, where does this `6' come from? I see it's an open-coded constant in
some of the trace code, but as Russell pointed out:

http://lists.infradead.org/pipermail/linux-arm-kernel/2012-February/085509.html

we have the unusual case of requiring 7 arguments for a particular syscall.
Perhaps we could let the arch configure this?

> This means that the current implementation is broken when matching
> system call arguments for ftrace (unless there is an arch specific
> hack in there) and it breaks internal kernel API for any other
> consumers without arch knowledge (like seccomp mode=2). Is there a
> reason to expose ARM_ORIG_r0 this way? Am I misreading?
>
> My understanding of the arch register usage at syscall time is something like:
> - ORIG_r0 gets the syscall number
> - r0 becomes the first system call argument
> - system call proceeds
> - on return, r0 is the return value

Aha, you're slightly off here. For ARM, ORIG_r0 is the first argument to the
system call (we keep it here as part of the syscall restarting code, since
the result of the interrupted syscall will sit in r0). The system call
number is either in r7 or encoded as part of the instruction but when
tracing we save it into the thread_info so we don't need to worry about
where it came from.

> Even audit_syscall_entry() uses ARM_r0 for the first argument which
> means that any future consumers doing syscall_get_arguments(..., 0, 6)
> would get the wrong first argument.

Thinking about it, I can't think of a scenario where the tracing code should
see a different value for ARM_r0 and ARM_ORIG_r0 on the entry path. Maybe we
should use ARM_r0 consistently for clarity.

> I'm also curious why the system call argument getter/setters allow for
> invalid requests instead of BUG_ON()ing? All code that exposes
> syscall arguments to userspace should be limiting the number to a
> maximum of 6, and any other badness is a definite kernel bug. Am I
> just really confused?

See the post from Russell linked above.

So I think the conclusion is that the 7-argument syscall (sys_syscall with
OABI) will have a truncated argument list when tracing.

Hope that helps,

Will

2012-05-23 18:02:59

by Will Drewry

[permalink] [raw]
Subject: Re: New ARM asm/syscall.h incompatible? (commit bf2c9f9866928df60157bc4f1ab39f93a32c754e)

On Wed, May 23, 2012 at 11:45 AM, Will Deacon <[email protected]> wrote:
> On Wed, May 23, 2012 at 05:01:50PM +0100, Will Drewry wrote:
>> Hi Wade and Steven,
>
> Hi Will,
>
>> I don't believe the syscall_get_arguments/syscall_set_arguments
>> implementation that landed in 3.4 is correct or safe. ?I didn't see it
>> get pulled in - sorry for not mailing sooner! :(
>
> Well, thanks for shouting anyway. It only went in during this merge window,
> so hopefully we can fix it before 3.5.

Ah cool - so I can't read doubly :)

>> The current implementation allows for _7_ arguments and allows the 0th
>> index to be the ARM_ORIG_r0 instead of starting with ARM_r0 == 0. ?In
>> the global description of syscall_*_arguments it says:
>>
>> ?* It's only valid to call this when @task is stopped for tracing on
>> ?* entry to a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
>> ?* It's invalid to call this with @i + @n > 6; we only support system calls
>> ?* taking up to 6 arguments.
>
> Hmm, where does this `6' come from? I see it's an open-coded constant in
> some of the trace code, but as Russell pointed out:

I don't know the origins of the "6" argument maximum, but it is what
is specified in the asm-generic/syscall.h describing the expected
values for the calls. I believe the glibc wrappers for syscall used to
enforce it with its _syscall helpers.

I've seen it elsewhere in the kernel too (and modeled the seccomp
filter code based on that explicit expectation).

> http://lists.infradead.org/pipermail/linux-arm-kernel/2012-February/085509.html
>
> we have the unusual case of requiring 7 arguments for a particular syscall.
> Perhaps we could let the arch configure this?

It seems like it's a bit of a trap (pun not really intended). The
goal of the tracehook and asm/syscall.h code is to create a portable
ABI on arch-specific details. If arch-specific behavior gets encoded
there, then the encapsulation is broken.

I don't think all is lost though :) I'll explain what I'm thinking
below and you can tell me it's been thought of and written off :)


>> This means that the current implementation is broken when matching
>> system call arguments for ftrace (unless there is an arch specific
>> hack in there) and it breaks internal kernel API for any other
>> consumers without arch knowledge (like seccomp mode=2). ?Is there a
>> reason to expose ARM_ORIG_r0 this way? ?Am I misreading?
>>
>> My understanding of the arch register usage at syscall time is something like:
>> - ORIG_r0 gets the syscall number
>> - r0 becomes the first system call argument
>> - system call proceeds
>> - on return, r0 is the return value
>
> Aha, you're slightly off here. For ARM, ORIG_r0 is the first argument to the
> system call (we keep it here as part of the syscall restarting code, since
> the result of the interrupted syscall will sit in r0). The system call
> number is either in r7 or encoded as part of the instruction but when
> tracing we save it into the thread_info so we don't need to worry about
> where it came from.

That makes more sense :)

>> Even audit_syscall_entry() uses ARM_r0 for the first argument which
>> means that any future consumers doing syscall_get_arguments(..., 0, 6)
>> would get the wrong first argument.
>
> Thinking about it, I can't think of a scenario where the tracing code should
> see a different value for ARM_r0 and ARM_ORIG_r0 on the entry path. Maybe we
> should use ARM_r0 consistently for clarity.

I can't see the benefit of the ARM_ORIG_r0 since the syscall is cached
and I haven't seen any code that expects syscall_get_arguments(...,0,
1) to return the original arg 0. I don't really know which is better
behavior -- showing the original values or showing the actual register
contents at that point.

>> I'm also curious why the system call argument getter/setters allow for
>> invalid requests instead of BUG_ON()ing? ?All code that exposes
>> syscall arguments to userspace should be limiting the number to a
>> maximum of 6, and any other badness is a definite kernel bug. ?Am I
>> just really confused?
>
> See the post from Russell linked above.
>
> So I think the conclusion is that the 7-argument syscall (sys_syscall with
> OABI) will have a truncated argument list when tracing.

greatly. sys_syscall seems to be a very special case. I'm curious
why sys_syscall needs to bubble up during tracing. Did you consider
just doing the sys_syscall remapping (as it is done in entry-common.S)
again in the syscall_trace why==0 code? In particular, sys_syscall
looks like:

@ r0 = syscall number
@ r8 = syscall table
sys_syscall:
>------->-------bic>----scno, r0, #__NR_OABI_SYSCALL_BASE
>------->-------cmp>----scno, #__NR_syscall - __NR_SYSCALL_BASE
>------->-------cmpne>--scno, #NR_syscalls>-----@ check range
>------->-------stmloia>sp, {r5, r6}>--->-------@ shuffle args
>------->-------movlo>--r0, r1
>------->-------movlo>--r1, r2
>------->-------movlo>--r2, r3
>------->-------movlo>--r3, r4
>------->-------ldrlo>--pc, [tbl, scno, lsl #2]
>------->-------b>------sys_ni_syscall
ENDPROC(sys_syscall)

Would it make sense to just pre-handle the sys_syscall case when
entering syscall_trace (it's already the slow path)? Trace will then
see the syscall that _is_ executed, so will ptrace, etc. The only
downside is that strace and friends won't see that sys_syscall was the
originator. The rest of ptrace, ftrace, and seccomp behavior would
work beautifully though ;)

Is that just more insanity? I'd really love to see
syscall_*_arguments behavior stay uniform across the various
architectures. That said, having the optional 7th argument doesn't
really affect the other arches if they always assume it maxes at 6, it
just means that that 7th argument will be routinely unreachable in
cross-arch code.

Thanks for the enlightening response!
will

2012-05-23 18:56:22

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: New ARM asm/syscall.h incompatible? (commit bf2c9f9866928df60157bc4f1ab39f93a32c754e)

On Wed, May 23, 2012 at 11:01:50AM -0500, Will Drewry wrote:
> Hi Wade and Steven,
>
> I don't believe the syscall_get_arguments/syscall_set_arguments
> implementation that landed in 3.4 is correct or safe. I didn't see it
> get pulled in - sorry for not mailing sooner! :(
>
> The current implementation allows for _7_ arguments and allows the 0th
> index to be the ARM_ORIG_r0 instead of starting with ARM_r0 == 0. In
> the global description of syscall_*_arguments it says:
>
> * It's only valid to call this when @task is stopped for tracing on
> * entry to a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
> * It's invalid to call this with @i + @n > 6; we only support system calls
> * taking up to 6 arguments.
>
> This means that the current implementation is broken when matching
> system call arguments for ftrace (unless there is an arch specific
> hack in there) and it breaks internal kernel API for any other
> consumers without arch knowledge (like seccomp mode=2). Is there a
> reason to expose ARM_ORIG_r0 this way? Am I misreading?
>
> My understanding of the arch register usage at syscall time is something like:
> - ORIG_r0 gets the syscall number
> - r0 becomes the first system call argument
> - system call proceeds
> - on return, r0 is the return value

Wrong. You're far too used to the x86 way of doing things.

For ARM, on entry to a system call, r0 _and_ orig_r0 are the first
syscall argument. For other exceptions, orig_r0 will be -1 (but you
can't rely upon that meaning anything, because a syscall can take -1
as the first argument.)

On exit from a system call, r0 will be the return value, and orig_r0
will _still_ be the first system call argument.

2012-05-23 19:04:27

by Will Drewry

[permalink] [raw]
Subject: Re: New ARM asm/syscall.h incompatible? (commit bf2c9f9866928df60157bc4f1ab39f93a32c754e)

On Wed, May 23, 2012 at 1:56 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Wed, May 23, 2012 at 11:01:50AM -0500, Will Drewry wrote:
>> Hi Wade and Steven,
>>
>> I don't believe the syscall_get_arguments/syscall_set_arguments
>> implementation that landed in 3.4 is correct or safe. ?I didn't see it
>> get pulled in - sorry for not mailing sooner! :(
>>
>> The current implementation allows for _7_ arguments and allows the 0th
>> index to be the ARM_ORIG_r0 instead of starting with ARM_r0 == 0. ?In
>> the global description of syscall_*_arguments it says:
>>
>> ?* It's only valid to call this when @task is stopped for tracing on
>> ?* entry to a system call, due to %TIF_SYSCALL_TRACE or %TIF_SYSCALL_AUDIT.
>> ?* It's invalid to call this with @i + @n > 6; we only support system calls
>> ?* taking up to 6 arguments.
>>
>> This means that the current implementation is broken when matching
>> system call arguments for ftrace (unless there is an arch specific
>> hack in there) and it breaks internal kernel API for any other
>> consumers without arch knowledge (like seccomp mode=2). ?Is there a
>> reason to expose ARM_ORIG_r0 this way? ?Am I misreading?
>>
>> My understanding of the arch register usage at syscall time is something like:
>> - ORIG_r0 gets the syscall number
>> - r0 becomes the first system call argument
>> - system call proceeds
>> - on return, r0 is the return value
>
> Wrong. ?You're far too used to the x86 way of doing things.
>
> For ARM, on entry to a system call, r0 _and_ orig_r0 are the first
> syscall argument. ?For other exceptions, orig_r0 will be -1 (but you
> can't rely upon that meaning anything, because a syscall can take -1
> as the first argument.)
>
> On exit from a system call, r0 will be the return value, and orig_r0
> will _still_ be the first system call argument.

Thanks - as usual, I can't keep them straight without the asm in front of me.

I'm still curious if it wouldn't make more sense to handle the
sys_syscall special case prior to any cross-arch (slowpath) code
involvement rather than truncating the 7th parameter making
sys_syscall a second class citizen for those cross-arch paths.
Perhaps that's not acceptable for ptrace/tracehook, but it seems like
it would make sense for ftrace and seccomp.

cheers!
will

2012-05-23 19:15:21

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: New ARM asm/syscall.h incompatible? (commit bf2c9f9866928df60157bc4f1ab39f93a32c754e)

On Wed, May 23, 2012 at 02:04:20PM -0500, Will Drewry wrote:
> I'm still curious if it wouldn't make more sense to handle the
> sys_syscall special case prior to any cross-arch (slowpath) code
> involvement rather than truncating the 7th parameter making
> sys_syscall a second class citizen for those cross-arch paths.

It would mean making sys_syscall an explicit special case in the fast
path of syscall entry, which we really don't want to do. It _is_ a
standard syscall, it just happens to have 7 arguments which are
rewritten back to what the syscall actually expects.

As I say, the alternative would be to explicitly test for the syscall
number in the fast path of system call entry and branch away to deal
with it. Adding unnecessary instructions to this fast path for such
a special case when there's already a perfectly reasonable alternative
solution doesn't fill me with any joy.

2012-05-24 15:39:43

by Will Drewry

[permalink] [raw]
Subject: Re: New ARM asm/syscall.h incompatible? (commit bf2c9f9866928df60157bc4f1ab39f93a32c754e)

On Wed, May 23, 2012 at 2:14 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Wed, May 23, 2012 at 02:04:20PM -0500, Will Drewry wrote:
>> I'm still curious if it wouldn't make more sense to handle the
>> sys_syscall special case prior to any cross-arch (slowpath) code
>> involvement rather than truncating the 7th parameter making
>> sys_syscall a second class citizen for those cross-arch paths.
>
> It would mean making sys_syscall an explicit special case in the fast
> path of syscall entry, which we really don't want to do. ?It _is_ a
> standard syscall, it just happens to have 7 arguments which are
> rewritten back to what the syscall actually expects.
>
> As I say, the alternative would be to explicitly test for the syscall
> number in the fast path of system call entry and branch away to deal
> with it. ?Adding unnecessary instructions to this fast path for such
> a special case when there's already a perfectly reasonable alternative
> solution doesn't fill me with any joy.

I'd been picturing this as being done exclusively after the slow-path
is triggered, in __sys_trace or syscall_trace(), but perhaps I'm
missing something that makes that untenable.

thanks!
will

2012-05-24 16:11:44

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: New ARM asm/syscall.h incompatible? (commit bf2c9f9866928df60157bc4f1ab39f93a32c754e)

On Thu, May 24, 2012 at 10:39:38AM -0500, Will Drewry wrote:
> On Wed, May 23, 2012 at 2:14 PM, Russell King - ARM Linux
> <[email protected]> wrote:
> > On Wed, May 23, 2012 at 02:04:20PM -0500, Will Drewry wrote:
> >> I'm still curious if it wouldn't make more sense to handle the
> >> sys_syscall special case prior to any cross-arch (slowpath) code
> >> involvement rather than truncating the 7th parameter making
> >> sys_syscall a second class citizen for those cross-arch paths.
> >
> > It would mean making sys_syscall an explicit special case in the fast
> > path of syscall entry, which we really don't want to do. ?It _is_ a
> > standard syscall, it just happens to have 7 arguments which are
> > rewritten back to what the syscall actually expects.
> >
> > As I say, the alternative would be to explicitly test for the syscall
> > number in the fast path of system call entry and branch away to deal
> > with it. ?Adding unnecessary instructions to this fast path for such
> > a special case when there's already a perfectly reasonable alternative
> > solution doesn't fill me with any joy.
>
> I'd been picturing this as being done exclusively after the slow-path
> is triggered, in __sys_trace or syscall_trace(), but perhaps I'm
> missing something that makes that untenable.

Well, I think the first thing which can be done is to move the seccomp
stuff out of the fast path, and combine it with the tracing code - like
this:

diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 0f04d84..5006374 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -148,24 +148,25 @@ extern int vfp_restore_user_hwstate(struct user_vfp __user *,
#define TIF_NOTIFY_RESUME 2 /* callback before returning to user */
#define TIF_SYSCALL_TRACE 8
#define TIF_SYSCALL_AUDIT 9
+#define TIF_SECCOMP 10
#define TIF_POLLING_NRFLAG 16
#define TIF_USING_IWMMXT 17
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
#define TIF_RESTORE_SIGMASK 20
-#define TIF_SECCOMP 21

#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
+#define _TIF_SECCOMP (1 << TIF_SECCOMP)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_USING_IWMMXT (1 << TIF_USING_IWMMXT)
#define _TIF_RESTORE_SIGMASK (1 << TIF_RESTORE_SIGMASK)
-#define _TIF_SECCOMP (1 << TIF_SECCOMP)

/* Checks for any syscall work in entry-common.S */
-#define _TIF_SYSCALL_WORK (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT)
+#define _TIF_SYSCALL_WORK (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT)
+#define _TIF_SYSCALL_SLOW (_TIF_SYSCALL_WORK | _TIF_SECCOMP)

/*
* Change these and you break ASM code in entry-common.S
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 54ee265..2fa4e85 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -442,19 +442,10 @@ ENTRY(vector_swi)
ldr r10, [tsk, #TI_FLAGS] @ check for syscall tracing
stmdb sp!, {r4, r5} @ push fifth and sixth args

-#ifdef CONFIG_SECCOMP
- tst r10, #_TIF_SECCOMP
- beq 1f
- mov r0, scno
- bl __secure_computing
- add r0, sp, #S_R0 + S_OFF @ pointer to regs
- ldmia r0, {r0 - r3} @ have to reload r0 - r3
-1:
-#endif
-
- tst r10, #_TIF_SYSCALL_WORK @ are we tracing syscalls?
- bne __sys_trace
+ tst r10, #_TIF_SYSCALL_SLOW
+ bne __sys_slow @ slow path for syscalls

+__sys_fast:
cmp scno, #NR_syscalls @ check upper syscall limit
adr lr, BSYM(ret_fast_syscall) @ return address
ldrcc pc, [tbl, scno, lsl #2] @ call sys_* routine
@@ -467,6 +458,18 @@ ENTRY(vector_swi)
b sys_ni_syscall @ not private func
ENDPROC(vector_swi)

+
+__sys_slow:
+ tst r10, #_TIF_SECCOMP
+ beq __sys_trace
+
+ bl __secure_computing
+ add r0, sp, #S_R0 + S_OFF @ pointer to regs
+ ldmia r0, {r0 - r3}
+
+ tst r10, #_TIF_SYSCALL_WORK
+ beq __sys_fast
+
/*
* This is the really slow path. We're going to be doing
* context switches, and waiting for our parent to respond.


Now, we could add code just after __sys_slow to check for __NR_syscall,
but to do that we need to first check whether it was an OABI syscall.
(EABI doesn't use __NR_syscall at all). We currently don't have access
to that at this point in the processing.

Then we'd need to check whether scno told is that it was __NR_syscall.
And if it was, then we can read the system call number from r0 for
seccomp. Great, that allows OABI programs to use sys_syscall() to
call the permitted seccomp syscalls. (Was anyone complaining this
doesn't work?)

Now, at this point, we need to consider what to do about the the
syscall tracing code. As you've realised, they read the registers
directly off the kernel stack as they were saved at entry. To
"fix" this for __sys_syscall.

You couldn't pass a pt_regs pointer shifted by the appropriate amount,
because that makes things like ORIG_r0 and sp, lr, pc all wrong. So,
you'd have to read all the existing stacked register state, and re-stack
it to create a faked pt_regs structure for the syscall.

You'd then have to undo all that when you exit the syscall, before you
start thinking about doing any signal handling...

And now consider what state a debugger is going to read through ptrace,
which would continue to accesses the as-stacked-at-entry register set,
not the hacked up set. Things like ptrace will have lost the information
that it's a sys_syscall call to know that they have to shuffle the args.

So, all round, this is really not nice.

I really don't see the point of adding all this complexity to the kernel,
which makes the existing code paths _much_ more complex just to give
ftrace/tracehook an easier time.

2012-05-24 18:58:38

by Roland McGrath

[permalink] [raw]
Subject: Re: New ARM asm/syscall.h incompatible? (commit bf2c9f9866928df60157bc4f1ab39f93a32c754e)

When I first added the asm/syscall.h interface, I took six arguments
to be a universal limit
in the kernel. I had the impression that the limit was encoded in
many places, but I really
don't recall any details now. For x86-32, it is a hard limit of the
register argument passing
convention, as six arguments plus syscall number and stack pointer is
all the registers there are.
So no generic syscall will ever have more than six arguments, since
x86-32 couldn't support it.

If there are machine-specific syscalls that allow more arguments, then
I think it would be reasonable
to make a machine-specfiic argument limit part of the asm/syscall.h
interface. That is, define some
macro like MAX_SYSCALL_ARGS in asm/syscall.h. If you do that, please
put a definition for it in
asm-generic/syscall.h and update the kerneldoc comments there to
mention that macro instead of
the explicit "6" and "0..5" there now.


Thanks,
Roland

2012-05-29 00:47:11

by Al Viro

[permalink] [raw]
Subject: Re: New ARM asm/syscall.h incompatible? (commit bf2c9f9866928df60157bc4f1ab39f93a32c754e)

On Thu, May 24, 2012 at 05:11:30PM +0100, Russell King - ARM Linux wrote:
> +__sys_slow:
> + tst r10, #_TIF_SECCOMP
> + beq __sys_trace
> +
> + bl __secure_computing
> + add r0, sp, #S_R0 + S_OFF @ pointer to regs
> + ldmia r0, {r0 - r3}
> +
> + tst r10, #_TIF_SYSCALL_WORK
> + beq __sys_fast

What is it doing outside of syscall_trace(), anyway?