tl;dr: after 27d6b4d14f5c3ab21c4aef87dd04055a2d7adf14 ptracer
modifications to orig_ax in a syscall entry trace stop are not honored
and this breaks our code.
rr, a userspace record and replay debugger[0], redirects syscalls of
its ptracee through an in-process LD_PRELOAD-injected solib. To do
this, it ptraces the tracee to a syscall entry event, and then, if the
syscall instruction is not our redirected syscall instruction, it
examines the tracee's code and pattern matches against a set of
syscall invocations that it knows how to rewrite. If that succeeds, rr
hijacks[1] the current syscall entry by setting orig_ax to something
innocuous like SYS_gettid, runs the hijacked syscall, and then
restores program state to before the syscall entry trace event and
allows the tracee to execute forwards, through the newly patched code
and into our injected solib.
Before 27d6b4d14f5c3ab21c4aef87dd04055a2d7adf14 modifications to
orig_ax were honored by x86's syscall_enter_trace[2]. The generic arch
code however does not honor any modifications to the syscall number[3]
(presumably because on most architectures syscall results clobber the
first argument and not the syscall number, so there is no equivalent
to orig_rax).
Note that the above is just one example of when rr changes the syscall
number this way. This is done in many places in our code and rr is
largely broken on 5.9-rc1 at the moment because of this bug.
- Kyle
[0] https://rr-project.org/
[1] https://github.com/mozilla/rr/blob/cd61ba22ccc05b426691312784674c0eb8e654ef/src/Task.cc#L872
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/entry/common.c?h=v5.8&id=bcf876870b95592b52519ed4aafcf9d95999bc9c#n204
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/entry/common.c?h=v5.8&id=27d6b4d14f5c3ab21c4aef87dd04055a2d7adf14#n44
On Wed, Aug 19 2020 at 10:14, Kyle Huey wrote:
> tl;dr: after 27d6b4d14f5c3ab21c4aef87dd04055a2d7adf14 ptracer
> modifications to orig_ax in a syscall entry trace stop are not honored
> and this breaks our code.
My fault and I have no idead why none of the silly test cases
noticed. Fix below.
Thanks,
tglx
---
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 9852e0d62d95..fcae019158ca 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -65,7 +65,8 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
syscall_enter_audit(regs, syscall);
- return ret ? : syscall;
+ /* The above might have changed the syscall number */
+ return ret ? : syscall_get_nr(current, regs);
}
noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
On Wed, Aug 19, 2020 at 12:44 PM Thomas Gleixner <[email protected]> wrote:
>
> On Wed, Aug 19 2020 at 10:14, Kyle Huey wrote:
> > tl;dr: after 27d6b4d14f5c3ab21c4aef87dd04055a2d7adf14 ptracer
> > modifications to orig_ax in a syscall entry trace stop are not honored
> > and this breaks our code.
>
> My fault and I have no idead why none of the silly test cases
> noticed. Fix below.
That'll do it, thanks.
- Kyle
> Thanks,
>
> tglx
> ---
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 9852e0d62d95..fcae019158ca 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -65,7 +65,8 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
>
> syscall_enter_audit(regs, syscall);
>
> - return ret ? : syscall;
> + /* The above might have changed the syscall number */
> + return ret ? : syscall_get_nr(current, regs);
> }
>
> noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
On Wed, Aug 19, 2020 at 09:44:39PM +0200, Thomas Gleixner wrote:
> On Wed, Aug 19 2020 at 10:14, Kyle Huey wrote:
> > tl;dr: after 27d6b4d14f5c3ab21c4aef87dd04055a2d7adf14 ptracer
> > modifications to orig_ax in a syscall entry trace stop are not honored
> > and this breaks our code.
>
> My fault and I have no idead why none of the silly test cases
> noticed. Fix below.
Hmm, which were you trying? Looking just now, I see that the seccomp
selftests were failing for all their syscall-changing tests.
Regardless, I can confirm both the failure and the fix.
Reported-by: Kyle Huey <[email protected]>
Tested-by: Kees Cook <[email protected]>
Acked-by: Kees Cook <[email protected]>
kernelci.org is *so* close to having the kernel selftests actually
running with their builds. :)
https://github.com/kernelci/kernelci-core/issues/331
-Kees
>
> Thanks,
>
> tglx
> ---
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 9852e0d62d95..fcae019158ca 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -65,7 +65,8 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
>
> syscall_enter_audit(regs, syscall);
>
> - return ret ? : syscall;
> + /* The above might have changed the syscall number */
> + return ret ? : syscall_get_nr(current, regs);
> }
>
> noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
--
Kees Cook
On Thu, Aug 20 2020 at 14:09, Kees Cook wrote:
> On Wed, Aug 19, 2020 at 09:44:39PM +0200, Thomas Gleixner wrote:
>> My fault and I have no idead why none of the silly test cases
>> noticed. Fix below.
>
> Hmm, which were you trying? Looking just now, I see that the seccomp
> selftests were failing for all their syscall-changing tests.
/me feels stupid
It's probably all caused by the heat wave which made my brain operate
outside of the specified temperature range.
Thanks,
tglx
The following commit has been merged into the core/urgent branch of tip:
Commit-ID: d88d59b64ca35abae208e2781fdb45e69cbed56c
Gitweb: https://git.kernel.org/tip/d88d59b64ca35abae208e2781fdb45e69cbed56c
Author: Thomas Gleixner <[email protected]>
AuthorDate: Wed, 19 Aug 2020 21:44:39 +02:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Fri, 21 Aug 2020 16:17:29 +02:00
core/entry: Respect syscall number rewrites
The transcript of the x86 entry code to the generic version failed to
reload the syscall number from ptregs after ptrace and seccomp have run,
which both can modify the syscall number in ptregs. It returns the original
syscall number instead which is obviously not the right thing to do.
Reload the syscall number to fix that.
Fixes: 142781e108b1 ("entry: Provide generic syscall entry functionality")
Reported-by: Kyle Huey <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Kyle Huey <[email protected]>
Tested-by: Kees Cook <[email protected]>
Acked-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/entry/common.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 9852e0d..fcae019 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -65,7 +65,8 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
syscall_enter_audit(regs, syscall);
- return ret ? : syscall;
+ /* The above might have changed the syscall number */
+ return ret ? : syscall_get_nr(current, regs);
}
noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
On Wed, Sep 09, 2020 at 11:53:42PM +1000, Michael Ellerman wrote:
> Hi Thomas,
>
> Sorry if this was discussed already somewhere, but I didn't see anything ...
>
> Thomas Gleixner <[email protected]> writes:
> > On Wed, Aug 19 2020 at 10:14, Kyle Huey wrote:
> >> tl;dr: after 27d6b4d14f5c3ab21c4aef87dd04055a2d7adf14 ptracer
> >> modifications to orig_ax in a syscall entry trace stop are not honored
> >> and this breaks our code.
> ...
> > diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> > index 9852e0d62d95..fcae019158ca 100644
> > --- a/kernel/entry/common.c
> > +++ b/kernel/entry/common.c
> > @@ -65,7 +65,8 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
>
> Adding context:
>
> /* Do seccomp after ptrace, to catch any tracer changes. */
> if (ti_work & _TIF_SECCOMP) {
> ret = __secure_computing(NULL);
> if (ret == -1L)
> return ret;
> }
>
> if (unlikely(ti_work & _TIF_SYSCALL_TRACEPOINT))
> trace_sys_enter(regs, syscall);
>
> > syscall_enter_audit(regs, syscall);
> >
> > - return ret ? : syscall;
> > + /* The above might have changed the syscall number */
> > + return ret ? : syscall_get_nr(current, regs);
> > }
> >
> > noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
>
> I noticed if the syscall number is changed by seccomp/ptrace, the
> original syscall number is still passed to trace_sys_enter() and audit.
>
> The old code used regs->orig_ax, so any change to the syscall number
> would be seen by the tracepoint and audit.
Ah! That's no good.
> I can observe the difference between v5.8 and mainline, using the
> raw_syscall trace event and running the seccomp_bpf selftest which turns
> a getpid (39) into a getppid (110).
>
> With v5.8 we see getppid on entry and exit:
>
> seccomp_bpf-1307 [000] .... 22974.874393: sys_enter: NR 110 (7ffff22c46e0, 40a350, 4, fffffffffffff7ab, 7fa6ee0d4010, 0)
> seccomp_bpf-1307 [000] .N.. 22974.874401: sys_exit: NR 110 = 1304
>
> Whereas on mainline we see an enter for getpid and an exit for getppid:
>
> seccomp_bpf-1030 [000] .... 21.806766: sys_enter: NR 39 (7ffe2f6d1ad0, 40a350, 7ffe2f6d1ad0, 0, 0, 407299)
> seccomp_bpf-1030 [000] .... 21.806767: sys_exit: NR 110 = 1027
>
>
> I don't know audit that well, but I think it saves the syscall number on
> entry eg. in __audit_syscall_entry(). So it will record the wrong
> syscall happening in this case I think.
>
> Seems like we should reload the syscall number before calling
> trace_sys_enter() & audit ?
Agreed. I wonder what the best way to build a regression test for this
is... hmmm.
--
Kees Cook
On Wed, Sep 09, 2020 at 11:53:42PM +1000, Michael Ellerman wrote:
> I can observe the difference between v5.8 and mainline, using the
> raw_syscall trace event and running the seccomp_bpf selftest which turns
> a getpid (39) into a getppid (110).
>
> With v5.8 we see getppid on entry and exit:
>
> seccomp_bpf-1307 [000] .... 22974.874393: sys_enter: NR 110 (7ffff22c46e0, 40a350, 4, fffffffffffff7ab, 7fa6ee0d4010, 0)
> seccomp_bpf-1307 [000] .N.. 22974.874401: sys_exit: NR 110 = 1304
>
> Whereas on mainline we see an enter for getpid and an exit for getppid:
>
> seccomp_bpf-1030 [000] .... 21.806766: sys_enter: NR 39 (7ffe2f6d1ad0, 40a350, 7ffe2f6d1ad0, 0, 0, 407299)
> seccomp_bpf-1030 [000] .... 21.806767: sys_exit: NR 110 = 1027
For my own notes, this is how I reproduced it:
# ./perf-$VER record -e raw_syscalls:sys_enter -e raw_syscalls:sys_exit &
# ./seccomp_bpf
# fg
ctrl-c
# ./perf-$VER script | grep seccomp_bpf | awk '{print $7}' | sort | uniq -c > $VER.log
*repeat*
# diff -u old.log new.log
...
(Is there an easier way to get those results?)
I will go see if I can figure out the best way to correct this.
--
Kees Cook
Kees Cook <[email protected]> writes:
> On Wed, Sep 09, 2020 at 11:53:42PM +1000, Michael Ellerman wrote:
>> I can observe the difference between v5.8 and mainline, using the
>> raw_syscall trace event and running the seccomp_bpf selftest which turns
>> a getpid (39) into a getppid (110).
>>
>> With v5.8 we see getppid on entry and exit:
>>
>> seccomp_bpf-1307 [000] .... 22974.874393: sys_enter: NR 110 (7ffff22c46e0, 40a350, 4, fffffffffffff7ab, 7fa6ee0d4010, 0)
>> seccomp_bpf-1307 [000] .N.. 22974.874401: sys_exit: NR 110 = 1304
>>
>> Whereas on mainline we see an enter for getpid and an exit for getppid:
>>
>> seccomp_bpf-1030 [000] .... 21.806766: sys_enter: NR 39 (7ffe2f6d1ad0, 40a350, 7ffe2f6d1ad0, 0, 0, 407299)
>> seccomp_bpf-1030 [000] .... 21.806767: sys_exit: NR 110 = 1027
>
> For my own notes, this is how I reproduced it:
>
> # ./perf-$VER record -e raw_syscalls:sys_enter -e raw_syscalls:sys_exit &
> # ./seccomp_bpf
> # fg
> ctrl-c
> # ./perf-$VER script | grep seccomp_bpf | awk '{print $7}' | sort | uniq -c > $VER.log
> *repeat*
> # diff -u old.log new.log
> ...
>
> (Is there an easier way to get those results?)
I did more or less the same thing, except I ran the trace event manually
(via debugfs), which is no better really.
I think the right way to test it would be to have a test that modifies
the syscall via seccomp and also monitors the trace event using perf
events. But that wouldn't be easier :)
> I will go see if I can figure out the best way to correct this.
I think this works?
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 18683598edbc..901361e2f8ea 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -60,13 +60,15 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
return ret;
}
+ syscall = syscall_get_nr(current, regs);
+
if (unlikely(ti_work & _TIF_SYSCALL_TRACEPOINT))
trace_sys_enter(regs, syscall);
syscall_enter_audit(regs, syscall);
/* The above might have changed the syscall number */
- return ret ? : syscall_get_nr(current, regs);
+ return ret ? : syscall;
}
static __always_inline long
cheers
On Sun, Sep 13 2020 at 17:44, Michael Ellerman wrote:
> Kees Cook <[email protected]> writes:
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 18683598edbc..901361e2f8ea 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -60,13 +60,15 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
> return ret;
> }
>
> + syscall = syscall_get_nr(current, regs);
> +
> if (unlikely(ti_work & _TIF_SYSCALL_TRACEPOINT))
> trace_sys_enter(regs, syscall);
>
> syscall_enter_audit(regs, syscall);
>
> /* The above might have changed the syscall number */
> - return ret ? : syscall_get_nr(current, regs);
> + return ret ? : syscall;
> }
Yup, this looks right. Can you please send a proper patch?
Thanks,
tglx
On Sun, Sep 13, 2020 at 08:27:23PM +0200, Thomas Gleixner wrote:
> On Sun, Sep 13 2020 at 17:44, Michael Ellerman wrote:
> > Kees Cook <[email protected]> writes:
> > diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> > index 18683598edbc..901361e2f8ea 100644
> > --- a/kernel/entry/common.c
> > +++ b/kernel/entry/common.c
> > @@ -60,13 +60,15 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
> > return ret;
> > }
> >
> > + syscall = syscall_get_nr(current, regs);
> > +
> > if (unlikely(ti_work & _TIF_SYSCALL_TRACEPOINT))
> > trace_sys_enter(regs, syscall);
> >
> > syscall_enter_audit(regs, syscall);
> >
> > /* The above might have changed the syscall number */
> > - return ret ? : syscall_get_nr(current, regs);
> > + return ret ? : syscall;
> > }
>
> Yup, this looks right. Can you please send a proper patch?
I already did on Friday:
https://lore.kernel.org/lkml/[email protected]/
--
Kees Cook
Kees Cook <[email protected]> writes:
> On Sun, Sep 13, 2020 at 08:27:23PM +0200, Thomas Gleixner wrote:
>> On Sun, Sep 13 2020 at 17:44, Michael Ellerman wrote:
>> > Kees Cook <[email protected]> writes:
>> > diff --git a/kernel/entry/common.c b/kernel/entry/common.c
>> > index 18683598edbc..901361e2f8ea 100644
>> > --- a/kernel/entry/common.c
>> > +++ b/kernel/entry/common.c
>> > @@ -60,13 +60,15 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
>> > return ret;
>> > }
>> >
>> > + syscall = syscall_get_nr(current, regs);
>> > +
>> > if (unlikely(ti_work & _TIF_SYSCALL_TRACEPOINT))
>> > trace_sys_enter(regs, syscall);
>> >
>> > syscall_enter_audit(regs, syscall);
>> >
>> > /* The above might have changed the syscall number */
>> > - return ret ? : syscall_get_nr(current, regs);
>> > + return ret ? : syscall;
>> > }
>>
>> Yup, this looks right. Can you please send a proper patch?
>
> I already did on Friday:
> https://lore.kernel.org/lkml/[email protected]/
Thanks.
cheers