2006-05-26 03:25:11

by Jeff Dike

Subject: [RFC] [PATCH] Double syscall exit traces on x86_64

We are seeing double ptrace notifications of system call returns on recent
x86_64 kernels. This breaks UML and at least one other app.

The patch below appears to fix the problem. The bug is caused by
syscall_trace and int_very_careful both calling syscall_trace_leave,
with the system call tracing path also going through int_very_careful.
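
The double call described above can be illustrated with a small user-space model in C. This is only a sketch under stated assumptions, not kernel code: the flag values, the EXIT_WORK_MASK name and the exit_notifications function are made up for illustration, and the model ignores seccomp, signals and rescheduling.

```c
#include <assert.h>

/* Hypothetical flag bits for this model only; the kernel's values differ. */
enum {
    TIF_SYSCALL_TRACE = 1 << 0,
    TIF_SYSCALL_AUDIT = 1 << 1,
    TIF_SINGLESTEP    = 1 << 2
};

/* The set of flags that int_very_careful tests before the patch. */
#define EXIT_WORK_MASK \
    (TIF_SYSCALL_TRACE | TIF_SYSCALL_AUDIT | TIF_SINGLESTEP)

/*
 * Returns how many times syscall_trace_leave would run for one system
 * call made with the given thread flags set for its whole duration.
 */
int exit_notifications(unsigned flags)
{
    int leave_calls = 0;

    /* Precondition: only the three modeled bits may be set. */
    assert((flags & ~(unsigned)EXIT_WORK_MASK) == 0);

    /* Tracing path: taken when the task is traced or audited at entry,
     * and it calls syscall_trace_leave itself after the system call. */
    if (flags & (TIF_SYSCALL_TRACE | TIF_SYSCALL_AUDIT))
        leave_calls++;

    /* Exit path (int_very_careful): tests the flags again and calls
     * syscall_trace_leave a second time if any of them is set. */
    if (flags & EXIT_WORK_MASK)
        leave_calls++;

    return leave_calls;
}
```

In this model a traced task gets two exit notifications per system call, a task that is only being singlestepped gets one, and an untraced task gets none.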

I would have liked to get rid of one or the other call to
syscall_trace_leave. However, the syscall_trace path looks like it
can exit to userspace without going through int_very_careful, and
int_very_careful does things other than system call tracing.

So, instead, I took _TIF_SYSCALL_TRACE and _TIF_SYSCALL_AUDIT out of
the flags test on the grounds that they had already been checked in
syscall_trace. A preemption and a call to schedule are possible
between syscall_trace and int_very_careful, so if a tracer could attach
at that point, the first return would be missed. However, I think
that ptrace attachment requires a stopped child, not just one that has
been preempted.

I don't see signal delivery between syscall_trace and
int_very_careful, so I don't see that there can be a ptrace attach
followed by int_very_careful missing the first return.
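
The effect of narrowing the flags test, and the attach window discussed above, can be sketched with a small model in C. Everything here is an assumption made for illustration: the flag values, the mask names and the leave_calls function are hypothetical, and the model treats the tracing path as always calling syscall_trace_leave once it has been entered.

```c
#include <assert.h>

/* Hypothetical flag bits for this model only; the kernel's values differ. */
enum {
    TIF_SYSCALL_TRACE = 1 << 0,
    TIF_SYSCALL_AUDIT = 1 << 1,
    TIF_SINGLESTEP    = 1 << 2
};

/* Mask tested in int_very_careful before and after the RFC patch. */
#define MASK_ORIGINAL \
    (TIF_SYSCALL_TRACE | TIF_SYSCALL_AUDIT | TIF_SINGLESTEP)
#define MASK_NARROWED (TIF_SINGLESTEP)

/*
 * Counts syscall_trace_leave calls for one system call.
 *   entry_flags: thread flags when the system call is entered
 *   exit_flags:  thread flags when int_very_careful is reached
 *   exit_mask:   flags that int_very_careful tests
 */
int leave_calls(unsigned entry_flags, unsigned exit_flags,
                unsigned exit_mask)
{
    int calls = 0;

    /* Both flag words must contain only modeled bits. */
    assert(((entry_flags | exit_flags) & ~(unsigned)MASK_ORIGINAL) == 0);

    /* Tracing path: entered only if traced or audited at entry. */
    if (entry_flags & (TIF_SYSCALL_TRACE | TIF_SYSCALL_AUDIT))
        calls++;

    /* int_very_careful: "testl $mask,%edx; jz int_signal". */
    if (exit_flags & exit_mask)
        calls++;

    return calls;
}
```

With MASK_NARROWED a task traced throughout the call gets one notification instead of two, and singlestepping is still reported. The hypothetical window shows up as leave_calls(0, TIF_SYSCALL_TRACE, MASK_NARROWED) returning 0, which is the missed return that the reasoning above argues cannot happen in practice.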

This is an RFC - if it turns out to be actually correct, some comments
need fixing before this goes anywhere.

UML works with this applied, and it doesn't seem to break
singlestepping, either on normal instructions or across system calls,
which looks like the next most vulnerable thing.

Jeff


Index: linux-2.6.16.x86_64/arch/x86_64/kernel/entry.S
===================================================================
--- linux-2.6.16.x86_64.orig/arch/x86_64/kernel/entry.S
+++ linux-2.6.16.x86_64/arch/x86_64/kernel/entry.S
@@ -345,7 +345,7 @@ int_very_careful:
 	sti
 	SAVE_REST
 	/* Check for syscall exit trace */
-	testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP),%edx
+	testl $(_TIF_SINGLESTEP),%edx
 	jz int_signal
 	pushq %rdi
 	CFI_ADJUST_CFA_OFFSET 8
@@ -353,7 +353,7 @@ int_very_careful:
 	call syscall_trace_leave
 	popq %rdi
 	CFI_ADJUST_CFA_OFFSET -8
-	andl $~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP),%edi
+	andl $~(_TIF_SINGLESTEP),%edi
 	cli
 	jmp int_restore_rest


2006-05-26 10:36:41

by Andi Kleen

Subject: Re: [discuss] [RFC] [PATCH] Double syscall exit traces on x86_64

On Friday 26 May 2006 05:24, Jeff Dike wrote:
> We are seeing double ptrace notifications of system call returns on recent
> x86_64 kernels. This breaks UML and at least one other app.

I believe this patch is the correct fix. Can you confirm it works for you?

-Andi

Don't do syscall exit tracing twice

int_ret_from_sys_call already does syscall exit tracing, so
there is no need to do it again in the caller.

This caused problems for UML and some other special programs doing
syscall interception.

Signed-off-by: Andi Kleen <[email protected]>

Index: linux/arch/x86_64/kernel/entry.S
===================================================================
--- linux.orig/arch/x86_64/kernel/entry.S
+++ linux/arch/x86_64/kernel/entry.S
@@ -282,12 +282,7 @@ tracesys:
 	ja 1f
 	movq %r10,%rcx /* fixup for C */
 	call *sys_call_table(,%rax,8)
-	movq %rax,RAX-ARGOFFSET(%rsp)
-1:	SAVE_REST
-	movq %rsp,%rdi
-	call syscall_trace_leave
-	RESTORE_TOP_OF_STACK %rbx
-	RESTORE_REST
+1:	movq %rax,RAX-ARGOFFSET(%rsp)
 	/* Use IRET because user could have changed frame */
 	jmp int_ret_from_sys_call
 	CFI_ENDPROC
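
As a rough illustration of what this patch changes, the following C model counts exit notifications when the tracing path only stores the return value and leaves exit tracing to int_ret_from_sys_call. It is a sketch under assumptions, not kernel code: the flag values, EXIT_WORK_MASK and leave_calls_tracesys_fixed are hypothetical names.

```c
#include <assert.h>

/* Hypothetical flag bits for this model only; the kernel's values differ. */
enum {
    TIF_SYSCALL_TRACE = 1 << 0,
    TIF_SYSCALL_AUDIT = 1 << 1,
    TIF_SINGLESTEP    = 1 << 2
};

/* Flags tested on the exit path; this patch leaves the test unchanged. */
#define EXIT_WORK_MASK \
    (TIF_SYSCALL_TRACE | TIF_SYSCALL_AUDIT | TIF_SINGLESTEP)

/*
 * Counts syscall_trace_leave calls for one system call once the call
 * has been removed from the tracing path.
 *   entry_flags: thread flags when the system call is entered
 *   exit_flags:  thread flags when the exit path tests them
 */
int leave_calls_tracesys_fixed(unsigned entry_flags, unsigned exit_flags)
{
    int calls = 0;

    /* Both flag words must contain only modeled bits. */
    assert(((entry_flags | exit_flags) & ~(unsigned)EXIT_WORK_MASK) == 0);

    /* Tracing path: saves the return value and jumps to
     * int_ret_from_sys_call without calling syscall_trace_leave,
     * so the flags at entry no longer add a notification. */
    (void)entry_flags;

    /* Exit path: the single remaining place that reports the exit. */
    if (exit_flags & EXIT_WORK_MASK)
        calls++;

    return calls;
}
```

In this model every combination of flags yields at most one notification, because the count depends only on the flags seen on the exit path.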

2006-05-26 14:13:48

by Jeff Dike

Subject: Re: [discuss] [RFC] [PATCH] Double syscall exit traces on x86_64

On Fri, May 26, 2006 at 12:36:26PM +0200, Andi Kleen wrote:
> I believe this patch is the correct fix. Can you confirm it works for you?

Looks good, thanks.

Jeff