LinuxLists.cc - [RFC PATCH] x86 single-step (TF) vs system calls & traps

2004-06-29 01:55:35

Subject: [RFC PATCH] x86 single-step (TF) vs system calls & traps

Andrew Cagney discovered this problem while working on GDB. I suspect this
bug has always been there, but I've only actually tested current 2.6 kernels.

When you single-step into a trap instruction, you actually don't get a
SIGTRAP until the instruction after the trap instruction has also executed.
I have demonstrated this in three cases: `into' generating a SIGSEGV that
is suppressed via ptrace; an `int $0x80' system call entry; and a
`sysenter' system call entry via the vsyscall entry point.

In https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=126699 you can find
a working test program and full details on reproducing the problem using
gdb. I can post all that here if people prefer. If there were a program
that set the TF bit in the processor flags itself and handled the signals,
I'm sure that would see the same problem, without ptrace being involved.

>From reading the code and the x86 specs on traps, it makes sense why it
happens. The trap flag causes a single-step trap after the execution of
the instruction that sets the trap flag. For instructions that generate
their own traps, TF is cleared on the way into the kernel, and it's the
normal iret that is restoring the flag on the way back to user mode. As
advertised, that executes the next instruction, i.e. whatever the restored
user PC is at, and then traps. But from the userland perspective, this is
highly unexpected: the user executed one instruction like 'into' or 'int'
or `sysenter', and expects execution to stop after that "instruction" is
done. To the user, everything the kernel does in response to the trap is
part of the execution of that one instruction.

So, I submit that the sensible behavior in these cases, when TF is set, is
to stop immediately with a simulated single-step trap, before running the
next user instruction. Below is a patch that implements this on x86, and I
believe it covers all the cases. I would like some verification from
others on my reading of which cases there are--that is, cases where a
kernel entry is with the saved PC sitting after an instruction that's
already executed to cause that entry. From the x86 specs, I see only the
following exception of "trap" type:

#1: some kind of debug traps. These don't need any help because this is
the same trap as the single-step trap anyway and the kernel behavior
(SIGTRAP and its details) are identical already.

#3: int3. Probably the same story here, but my fix does treat it like `into'.
Since gdb uses this for its own purposes, you can't demonstrate the
same anomaly that's easy to see for `into'.

#4: into. This is a problem case I've demonstrated and that my fix addresses.

Software exceptions like the `int' instruction generates are not classified
the same in that table, but `int $0x80' system call entry behaves the same
way in effect. My patch handles the system call entry case.

sysenter is not a trap, but the way it and sysexit work seems to have the
same effect.

Are there any cases of this nature that I have overlooked?

Unfortunately I just don't see a way to address this without inserting into
the trap and system call paths checks for TF being set in the user-mode
flags word.

The good news is that the fast `sysenter' path does not need any such
check, because of the way `sysenter' interacts with TF: instead the
single-step trap handler that already detects entry in kernel mode via the
`sysenter' case is where the checking happens, so the new overhead is only
in the case where TF was in use. Something else I would like verification
on is that the `sysenter' entry point is the only case where the
single-step trap will be seen in kernel mode. My patch assumes this. I
can't see how else it could come up (except for the KGDB patch, which also
seems to make the same assumption).

I have demonstrated the problem behavior in 32-bit x86 code (see the
bugzilla reference above) both on x86 and on x86-64. I haven't yet
attacked the x86-64 kernel's 32-bit support to make the same change.

However, in a native 64-bit system call on x86-64, I do not see the same
problem. This bewilders me. From the AMD64 docs I have, I would expect
syscall/sysret to behave the same way as we observe sysenter/sysexit doing
in 32-bit mode. I don't see any difference in the behavior of TF specified
for x86-64 from x86. Can anyone help me understand what's going on there?

I have only investigated x86 and x86-64 in detail (those are the boxes I
have handy for kernel hacking, and the machines I know well enough off hand
to suggest specific fixes like those below). Andrew has tested for the
same issue on other architectures, and I believe the same or similar
problem hits powerpc, ia64, and perhaps s390. I'm not sure if any other
machines have been tested and seen not to have any such isse.

Comments?

Thanks,
Roland

Index: linux-2.6/arch/i386/kernel/entry.S
===================================================================
RCS file: /home/roland/redhat/bkcvs/linux-2.5/arch/i386/kernel/entry.S,v
retrieving revision 1.86
diff -b -p -u -r1.86 entry.S
--- linux-2.6/arch/i386/kernel/entry.S 23 May 2004 05:03:15 -0000 1.86
+++ linux-2.6/arch/i386/kernel/entry.S 24 Jun 2004 06:53:37 -0000
@@ -255,6 +255,13 @@ sysenter_past_esp:
pushl %eax
SAVE_ALL
GET_THREAD_INFO(%ebp)
+ /*
+ * We don't need to check for TF here as in system_call, below.
+ * Kernel entry by `sysenter' doesn't clear TF and so we just took
+ * the single-step trap in kernel mode and set TIF_SINGLESTEP to
+ * restore TF when we return to user mode. That will also handle
+ * simulating the single-step trap as syscall_singlestep arranges.
+ */
cmpl $(nr_syscalls), %eax
jae syscall_badsys

@@ -278,6 +285,9 @@ ENTRY(system_call)
pushl %eax # save orig_eax
SAVE_ALL
GET_THREAD_INFO(%ebp)
+ testb $(TF_MASK >> 8),EFLAGS+1(%esp)
+ jnz syscall_singlestep
+syscall_check_nr:
cmpl $(nr_syscalls), %eax
jae syscall_badsys
# system call tracing in operation
@@ -370,6 +380,14 @@ syscall_badsys:
movl $-ENOSYS,EAX(%esp)
jmp resume_userspace

+ ALIGN
+syscall_singlestep:
+#ifdef CONFIG_SMP
+ lock
+#endif
+ btsl $TIF_SINGLESTEP_TRAP,TI_flags(%ebp)
+ jmp syscall_check_nr
+
/*
* Build the entry stubs and pointer table with
* some assembler magic.
Index: linux-2.6/arch/i386/kernel/signal.c
===================================================================
RCS file: /home/roland/redhat/bkcvs/linux-2.5/arch/i386/kernel/signal.c,v
retrieving revision 1.40
diff -b -p -u -r1.40 signal.c
--- linux-2.6/arch/i386/kernel/signal.c 18 Jun 2004 20:35:31 -0000 1.40
+++ linux-2.6/arch/i386/kernel/signal.c 24 Jun 2004 06:54:51 -0000
@@ -613,10 +613,47 @@ void do_notify_resume(struct pt_regs *re
if (thread_info_flags & _TIF_SINGLESTEP) {
regs->eflags |= TF_MASK;
clear_thread_flag(TIF_SINGLESTEP);
+ /*
+ * TIF_SINGLESTEP is only ever set for a `sysenter' entry
+ * to the kernel when TF was set in the user-mode flags.
+ */
+ thread_info_flags |= _TIF_SINGLESTEP_TRAP;
}
/* deal with pending signal delivery */
if (thread_info_flags & _TIF_SIGPENDING)
do_signal(regs,oldset);

+ /*
+ * Check if we should simulate a single-step trap.
+ * Note this happens after handling other pending signals,
+ * including the one that set the flag for us.
+ */
+ if (thread_info_flags & _TIF_SINGLESTEP_TRAP) {
+ clear_thread_flag(TIF_SINGLESTEP_TRAP);
+ if (regs->eflags & X86_EFLAGS_TF) {
+ /*
+ * We single-stepped into an instruction that
+ * caused a trap, leaving the PC past that instruction.
+ * If we return now with TF set, the next instruction
+ * will be executed before we get a single-step trap.
+ * But the user just single-stepped and expects a
+ * stop after the trapping instruction, not the next.
+ * So fake a single-step trap.
+ */
+ siginfo_t info;
+ info.si_signo = SIGTRAP;
+ info.si_errno = 0;
+ info.si_code = TRAP_BRKPT;
+ info.si_addr = (void *)regs->eip;
+ force_sig_info(SIGTRAP, &info, current);
+ /*
+ * Now handle that signal immediately,
+ * since on return to assembly code it
+ * won't check _TIF_WORK_MASK again.
+ */
+ do_signal(regs,oldset);
+ }
+ }
+
clear_thread_flag(TIF_IRET);
}
Index: linux-2.6/arch/i386/kernel/traps.c
===================================================================
RCS file: /home/roland/redhat/bkcvs/linux-2.5/arch/i386/kernel/traps.c,v
retrieving revision 1.74
diff -b -p -u -r1.74 traps.c
--- linux-2.6/arch/i386/kernel/traps.c 18 Jun 2004 15:16:11 -0000 1.74
+++ linux-2.6/arch/i386/kernel/traps.c 24 Jun 2004 06:33:13 -0000
@@ -348,7 +348,8 @@ static inline unsigned long get_cr2(void
}

static inline void do_trap(int trapnr, int signr, char *str, int vm86,
- struct pt_regs * regs, long error_code, siginfo_t *info)
+ struct pt_regs * regs, long error_code,
+ siginfo_t *info, int flag_if_tf)
{
if (regs->eflags & VM_MASK) {
if (vm86)
@@ -367,6 +368,8 @@ static inline void do_trap(int trapnr, i
force_sig_info(signr, info, tsk);
else
force_sig(signr, tsk);
+ if (flag_if_tf && (regs->eflags & X86_EFLAGS_TF))
+ set_thread_flag(flag_if_tf);
return;
}

@@ -386,7 +389,7 @@ static inline void do_trap(int trapnr, i
#define DO_ERROR(trapnr, signr, str, name) \
asmlinkage void do_##name(struct pt_regs * regs, long error_code) \
{ \
- do_trap(trapnr, signr, str, 0, regs, error_code, NULL); \
+ do_trap(trapnr, signr, str, 0, regs, error_code, NULL, 0); \
}

#define DO_ERROR_INFO(trapnr, signr, str, name, sicode, siaddr) \
@@ -397,13 +400,14 @@ asmlinkage void do_##name(struct pt_regs
info.si_errno = 0; \
info.si_code = sicode; \
info.si_addr = (void *)siaddr; \
- do_trap(trapnr, signr, str, 0, regs, error_code, &info); \
+ do_trap(trapnr, signr, str, 0, regs, error_code, &info, 0); \
}

#define DO_VM86_ERROR(trapnr, signr, str, name) \
asmlinkage void do_##name(struct pt_regs * regs, long error_code) \
{ \
- do_trap(trapnr, signr, str, 1, regs, error_code, NULL); \
+ do_trap(trapnr, signr, str, 1, regs, error_code, NULL, \
+ TIF_SINGLESTEP_TRAP); \
}

#define DO_VM86_ERROR_INFO(trapnr, signr, str, name, sicode, siaddr) \
@@ -414,7 +418,7 @@ asmlinkage void do_##name(struct pt_regs
info.si_errno = 0; \
info.si_code = sicode; \
info.si_addr = (void *)siaddr; \
- do_trap(trapnr, signr, str, 1, regs, error_code, &info); \
+ do_trap(trapnr, signr, str, 1, regs, error_code, &info, 0); \
}

DO_VM86_ERROR_INFO( 0, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->eip)
Index: linux-2.6/include/asm-i386/thread_info.h
===================================================================
RCS file: /home/roland/redhat/bkcvs/linux-2.5/include/asm-i386/thread_info.h,v
retrieving revision 1.20
diff -b -p -u -r1.20 thread_info.h
--- linux-2.6/include/asm-i386/thread_info.h 19 May 2004 23:34:45 -0000 1.20
+++ linux-2.6/include/asm-i386/thread_info.h 23 Jun 2004 03:08:57 -0000
@@ -143,6 +143,7 @@ static inline unsigned long current_stac
#define TIF_NEED_RESCHED 3 /* rescheduling necessary */
#define TIF_SINGLESTEP 4 /* restore singlestep on return to user mode */
#define TIF_IRET 5 /* return with iret */
+#define TIF_SINGLESTEP_TRAP 6 /* fake singlestep on return to user */
#define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
#define TIF_POLLING_NRFLAG 16 /* true if poll_idle() is polling TIF_NEED_RESCHED */

@@ -152,6 +153,7 @@ static inline unsigned long current_stac
#define _TIF_NEED_RESCHED (1<<TIF_NEED_RESCHED)
#define _TIF_SINGLESTEP (1<<TIF_SINGLESTEP)
#define _TIF_IRET (1<<TIF_IRET)
+#define _TIF_SINGLESTEP_TRAP (1<<TIF_SINGLESTEP_TRAP)
#define _TIF_SYSCALL_AUDIT (1<<TIF_SYSCALL_AUDIT)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)

2004-06-29 02:06:11

by Davide Libenzi

[permalink] [raw]

Subject: Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps

On Mon, 28 Jun 2004, Roland McGrath wrote:

> Andrew Cagney discovered this problem while working on GDB. I suspect this
> bug has always been there, but I've only actually tested current 2.6 kernels.

Roland, did you try with -mm kernels?

- Davide

2004-06-29 03:42:46

by Linus Torvalds

[permalink] [raw]

Subject: Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps

On Mon, 28 Jun 2004, Roland McGrath wrote:
>
> When you single-step into a trap instruction, you actually don't get a
> SIGTRAP until the instruction after the trap instruction has also executed.

Yes. This is documented Intel behaviour. It also guarantees that there is
forward progress in some strange circumstances, if I remember correctly.

And I refuse to make the fast-path slower just because of this. Not only
has Linux always worked like this, as far as I know all other x86 OS's
also tend to just do the Intel behaviour thing.

Linus

2004-06-29 03:46:50

by Roland McGrath

[permalink] [raw]

Subject: Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps

> And I refuse to make the fast-path slower just because of this.

You are talking about the int $0x80 system call path here?
That is the only non-exception path touched by my changes.

> Not only has Linux always worked like this, as far as I know all other
> x86 OS's also tend to just do the Intel behaviour thing.

The only other one I have at hand to test is NetBSD 1.6.1, which does
indeed behave the same way for its int $0x80 system calls.

Thanks,
Roland

2004-06-29 03:55:46

by Linus Torvalds

[permalink] [raw]

Subject: Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps

On Mon, 28 Jun 2004, Roland McGrath wrote:
>
> > And I refuse to make the fast-path slower just because of this.
>
> You are talking about the int $0x80 system call path here?
> That is the only non-exception path touched by my changes.

That's still the fast path on any machine where this matters.

If the system uses sysenter, it won't matter because we reload TF by hand.
And if it uses "int 0x80" through the trampoline, it won't matter, because
the "lost" instruction is part of the trampoline, and not "important" for
the debuggee. I think it's a "ret" instruction that gets lost, so the only
thing that happens is that the person "magically" returns to the caller.

If that's a problem (for "finish" or other logic), I'd be ok with adding a
"nop" to the vsyscall thing. That would have less of an impact than adding
the test to the kernel path..

So the _only_ case your patch matters is for old-style binaries that use
"int 0x80" inline, and there that path is indeed the hot-path.

> > Not only has Linux always worked like this, as far as I know all other
> > x86 OS's also tend to just do the Intel behaviour thing.
>
> The only other one I have at hand to test is NetBSD 1.6.1, which does
> indeed behave the same way for its int $0x80 system calls.

I bet that if you really search, you cna probably find _some_ OS out there
that considered the Intel behaviour a bug, and fixed it with something
like your patch. But I bet it's not just Linux and BSD that use the Intel
behaviour, just because it's such a pain _not_ to.

Linus

2004-06-29 04:17:10

by Andrew Morton

[permalink] [raw]

Subject: Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps

Linus Torvalds <[email protected]> wrote:
>
> On Mon, 28 Jun 2004, Roland McGrath wrote:
> >
> > > And I refuse to make the fast-path slower just because of this.
> >
> > You are talking about the int $0x80 system call path here?
> > That is the only non-exception path touched by my changes.
>
> That's still the fast path on any machine where this matters.

Davide's patch (which has been in -mm for 6-7 weeks) doesn't add
fastpath overhead.

diff -puN arch/i386/kernel/entry.S~really-ptrace-single-step-2 arch/i386/kernel/entry.S
--- 25/arch/i386/kernel/entry.S~really-ptrace-single-step-2 2004-06-24 13:19:51.721958784 -0700
+++ 25-akpm/arch/i386/kernel/entry.S 2004-06-24 13:19:51.728957720 -0700
@@ -375,7 +375,7 @@ syscall_trace_entry:
# perform syscall exit tracing
ALIGN
syscall_exit_work:
- testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT), %cl
+ testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP), %cl
jz work_pending
sti # could let do_syscall_trace() call
# schedule() instead
diff -puN arch/i386/kernel/ptrace.c~really-ptrace-single-step-2 arch/i386/kernel/ptrace.c
--- 25/arch/i386/kernel/ptrace.c~really-ptrace-single-step-2 2004-06-24 13:19:51.723958480 -0700
+++ 25-akpm/arch/i386/kernel/ptrace.c 2004-06-24 13:19:51.729957568 -0700
@@ -147,6 +147,7 @@ void ptrace_disable(struct task_struct *
{
long tmp;

+ clear_tsk_thread_flag(child, TIF_SINGLESTEP);
tmp = get_stack_long(child, EFL_OFFSET) & ~TRAP_FLAG;
put_stack_long(child, EFL_OFFSET, tmp);
}
@@ -370,6 +371,7 @@ asmlinkage int sys_ptrace(long request,
else {
clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
}
+ clear_tsk_thread_flag(child, TIF_SINGLESTEP);
child->exit_code = data;
/* make sure the single step bit is not set. */
tmp = get_stack_long(child, EFL_OFFSET) & ~TRAP_FLAG;
@@ -391,6 +393,7 @@ asmlinkage int sys_ptrace(long request,
if (child->state == TASK_ZOMBIE) /* already dead */
break;
child->exit_code = SIGKILL;
+ clear_tsk_thread_flag(child, TIF_SINGLESTEP);
/* make sure the single step bit is not set. */
tmp = get_stack_long(child, EFL_OFFSET) & ~TRAP_FLAG;
put_stack_long(child, EFL_OFFSET, tmp);
@@ -411,6 +414,7 @@ asmlinkage int sys_ptrace(long request,
}
tmp = get_stack_long(child, EFL_OFFSET) | TRAP_FLAG;
put_stack_long(child, EFL_OFFSET, tmp);
+ set_tsk_thread_flag(child, TIF_SINGLESTEP);
child->exit_code = data;
/* give it a chance to run. */
wake_up_process(child);
@@ -535,7 +539,8 @@ void do_syscall_trace(struct pt_regs *re
audit_syscall_exit(current, regs->eax);
}

- if (!test_thread_flag(TIF_SYSCALL_TRACE))
+ if (!test_thread_flag(TIF_SYSCALL_TRACE) &&
+ !test_thread_flag(TIF_SINGLESTEP))
return;
if (!(current->ptrace & PT_PTRACED))
return;
diff -puN include/asm-i386/thread_info.h~really-ptrace-single-step-2 include/asm-i386/thread_info.h
--- 25/include/asm-i386/thread_info.h~really-ptrace-single-step-2 2004-06-24 13:19:51.724958328 -0700
+++ 25-akpm/include/asm-i386/thread_info.h 2004-06-24 13:19:51.729957568 -0700
@@ -157,7 +157,7 @@ static inline unsigned long current_stac

/* work to do on interrupt/exception return */
#define _TIF_WORK_MASK \
- (0x0000FFFF & ~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT))
+ (0x0000FFFF & ~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP))
#define _TIF_ALLWORK_MASK 0x0000FFFF /* work to do on any return to u-space */

/*
_

2004-06-29 04:33:02

by Roland McGrath

[permalink] [raw]

Subject: Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps

> > You are talking about the int $0x80 system call path here?
> > That is the only non-exception path touched by my changes.
>
> That's still the fast path on any machine where this matters.

My question was whether this is the only path that you were referring to
when you said you refused to add any more instructions there. I will seek
alternate solutions that avoid introducing any new instructions into fast
paths, but I want to be quite sure which you insist not to be changed.

> So the _only_ case your patch matters is for old-style binaries that use
> "int 0x80" inline, and there that path is indeed the hot-path.

Hmm, it really sounds like you didn't read my explanation about the
sysenter case, or read the patch where I added code to handle it.

> If the system uses sysenter, it won't matter because we reload TF by hand.

The issue does indeed arise using sysenter, as I explained and is easily
demonstrated by trying it. I'm not sure what you mean here when you say,
"by hand". The TF trap taken in kernel mode upon sysenter entry causes the
kernel to return using iret, which restores the TF flag in exactly the same
way as returning from other kinds of traps, and likewise executes the
following user-mode instruction. As I've mentioned, there is no change in
the main path for sysenter entries required. My patch changes the trap
handler for the single-step trap (in arch/i386/kernel/traps.c:do_debug) to
set the flag requesting a simulated trap before returning to user mode.

> And if it uses "int 0x80" through the trampoline, it won't matter, because
> the "lost" instruction is part of the trampoline, and not "important" for
> the debuggee. I think it's a "ret" instruction that gets lost, so the only
> thing that happens is that the person "magically" returns to the caller.

Are you referring to the signal trampoline for returning from signal
handlers? If you are referring to some other trampoline, please clear up
my confusion. The signal trampoline calls sigreturn, and the PC after that
system call instruction is then abandoned entirely. What happens in the
case of single-stepping into the sigreturn system call is that (at least)
the first instruction of the restored state where the signal hit is
executed. In practice, the sigreturn call also clears the TF bit in the
restored state unless the saved flags word has been diddled on the stack.
I submit that when using PTRACE_SINGLESTEP to enter the sigreturn system
call, the most useful thing is to get the next SIGTRAP with exactly the
state restored by the sigreturn call and no more instructions executed yet.
My patch accomplishes that, since sigreturn's clearing of TF in the
restored state doesn't clear the TIF_SINGLESTEP_TRAP kernel flag set on the
way in.

> But I bet it's not just Linux and BSD that use the Intel
> behaviour, just because it's such a pain _not_ to.

I don't think it's really so hard to make this work sanely. It's just an
arcane thing that's easy to overlook in the initial implementation.

2004-06-29 04:37:48

by Roland McGrath

[permalink] [raw]

Subject: Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps

> Davide's patch (which has been in -mm for 6-7 weeks) doesn't add
> fastpath overhead.

I am also dubious about exactly what it does. That patch seems a bizarre
obfuscation of the code to me. TIF_SINGLESTEP is really there to handle
the lazy TF clearing for sysenter entry, and that's all. I don't think
that patch handles user-mode setting TF properly, unusual though that case
is. How does that patch interact with PT_TRACESYSGOOD? It appears to me
that PTRACE_SINGLESTEP will now generate a syscall trap instead of a
single-step trap, which is an undesireable change in behavior I would say.

I don't really care about user-mode setting of TF before executing int
$0x80. If poeple have programs that use TF in user mode, they have never
complained about the issue before. For PTRACE_SINGLESTEP, Davide's
approach of setting the kernel-work flag directly when PTRACE_SINGLESTEP
sets TF in the user flags word is the obvious way to avoid the test in the
fast path. I am inclined to combine that approeach with what my patch
does, i.e. just take out the system call fast-path test and set
TIF_SINGLESTEP_TRAP in PTRACE_SINGLESTEP. I think the way Davide's patch
uses TIF_SINGLESTEP is pretty questionable.

Thanks,
Roland

2004-06-29 05:16:03

by Linus Torvalds

[permalink] [raw]

Subject: Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps

On Mon, 28 Jun 2004, Roland McGrath wrote:
>
> The issue does indeed arise using sysenter, as I explained and is easily
> demonstrated by trying it. I'm not sure what you mean here when you say,
> "by hand". The TF trap taken in kernel mode upon sysenter entry causes the
> kernel to return using iret, which restores the TF flag in exactly the same
> way as returning from other kinds of traps, and likewise executes the
> following user-mode instruction.

They are "user-mode" only in theory.

They are really kernel instructions set up by the kernel, and user-mode
only in the sense that yes, they run in ring3. No actual user-compiled
code executed anywhere.

At least to me, that vsyscall trampoline is all kernel code. But yes,
you're right, we no longer do the eflags save/restore in user mode, that
was too slow (some of my first versions just cleared TF unconditionally,
and the trampoline was responsible for re-enabling it).

> Are you referring to the signal trampoline for returning from signal
> handlers? If you are referring to some other trampoline, please clear up
> my confusion.

I was talking about the vsyscall code.

Linus

2004-06-29 07:01:17

by Davide Libenzi

[permalink] [raw]

Subject: Re: [RFC PATCH] x86 single-step (TF) vs system calls & traps

On Mon, 28 Jun 2004, Roland McGrath wrote:

> > Davide's patch (which has been in -mm for 6-7 weeks) doesn't add
> > fastpath overhead.
>
> I am also dubious about exactly what it does. That patch seems a bizarre
> obfuscation of the code to me. TIF_SINGLESTEP is really there to handle
> the lazy TF clearing for sysenter entry, and that's all. I don't think
> that patch handles user-mode setting TF properly, unusual though that case
> is. How does that patch interact with PT_TRACESYSGOOD? It appears to me
> that PTRACE_SINGLESTEP will now generate a syscall trap instead of a
> single-step trap, which is an undesireable change in behavior I would say.
>
> I don't really care about user-mode setting of TF before executing int
> $0x80. If poeple have programs that use TF in user mode, they have never
> complained about the issue before. For PTRACE_SINGLESTEP, Davide's
> approach of setting the kernel-work flag directly when PTRACE_SINGLESTEP
> sets TF in the user flags word is the obvious way to avoid the test in the
> fast path. I am inclined to combine that approeach with what my patch
> does, i.e. just take out the system call fast-path test and set
> TIF_SINGLESTEP_TRAP in PTRACE_SINGLESTEP. I think the way Davide's patch
> uses TIF_SINGLESTEP is pretty questionable.

Roland, I don't think (pretty sure actually ;) we can handle the case
where TF is set from userspace and, at the same time, the user uses
PTRACE_SINGLESTEP. The ptrace infrastructure uses the hw TF flag to work.
The PTRACE_SINGLESTEP gives you the SYSGOOD behaviour, if you set it. And
sends a SIGTRAP notification to the ptrace'ing parent process.

- Davide