2009-12-18 01:02:52

by Oleg Nesterov

[permalink] [raw]
Subject: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

Hi.

do_debug() is obviously wrong wrt PTRACE_SINGLESTEP/TIF_SINGLESTEP, no?

Afaics this was broken by

hw-breakpoints: modifying generic debug exception to use thread-specific debug registers
commit 08d68323d1f0c34452e614263b212ca556dae47f

To verify, the "patch" below fixes the stepping for me, not sure what
is the proper fix...

Oleg.

--- arch/x86/kernel/traps.c~ 2009-12-18 00:20:49.000000000 +0100
+++ arch/x86/kernel/traps.c 2009-12-18 01:44:05.000000000 +0100
@@ -575,7 +575,7 @@ dotraplinkage void __kprobes do_debug(st
regs->flags &= ~X86_EFLAGS_TF;
}
si_code = get_si_code(tsk->thread.debugreg6);
- if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
+// if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
send_sigtrap(tsk, regs, error_code, si_code);
preempt_conditional_cli(regs);


2009-12-18 01:40:34

by Roland McGrath

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

Comparing to the old (2.6.32) logic, I think it might be this (untested).
I also note this is the sole use of get_si_code, seems like it should
just be rolled in here.


Thanks,
Roland


diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3339917..16a88f5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -530,7 +530,6 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
{
struct task_struct *tsk = current;
unsigned long dr6;
- int si_code;

get_debugreg(dr6, 6);

@@ -569,14 +568,15 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
* We already checked v86 mode above, so we can check for kernel mode
* by just checking the CPL of CS.
*/
+ dr6 = tsk->thread.debugreg6;
if ((dr6 & DR_STEP) && !user_mode(regs)) {
tsk->thread.debugreg6 &= ~DR_STEP;
set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
regs->flags &= ~X86_EFLAGS_TF;
+ } else if (dr6 & (DR_STEP | DR_TRAP_BITS)) {
+ send_sigtrap(tsk, regs, error_code, get_si_code(dr6));
}
- si_code = get_si_code(tsk->thread.debugreg6);
- if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
- send_sigtrap(tsk, regs, error_code, si_code);
+
preempt_conditional_cli(regs);

return;

2009-12-18 02:16:30

by Oleg Nesterov

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

On 12/17, Roland McGrath wrote:
>
> Comparing to the old (2.6.32) logic, I think it might be this (untested).
> I also note this is the sole use of get_si_code, seems like it should
> just be rolled in here.

Well, it is too late for me to even try to read this patch ;)

but...

> @@ -569,14 +568,15 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
> * We already checked v86 mode above, so we can check for kernel mode
> * by just checking the CPL of CS.
> */
> + dr6 = tsk->thread.debugreg6;

why? we have "tsk->thread.debugreg6 = dr6" above

> if ((dr6 & DR_STEP) && !user_mode(regs)) {
> tsk->thread.debugreg6 &= ~DR_STEP;
> set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
> regs->flags &= ~X86_EFLAGS_TF;

this looks strange... we set TIF_SINGLESTEP but clear X86_EFLAGS_TF

> + } else if (dr6 & (DR_STEP | DR_TRAP_BITS)) {
> + send_sigtrap(tsk, regs, error_code, get_si_code(dr6));
> }
> - si_code = get_si_code(tsk->thread.debugreg6);
> - if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
> - send_sigtrap(tsk, regs, error_code, si_code);
> +

can't understand how this change can fix the problem. We should always
send SIGTRAP if the task returns to user-mode with X86_EFLAGS_TF?

OK. I blindly applied this patch, step-simple still fails.

Oleg.

2009-12-18 02:58:15

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

On Fri, Dec 18, 2009 at 03:10:42AM +0100, Oleg Nesterov wrote:
> On 12/17, Roland McGrath wrote:
> >
> > Comparing to the old (2.6.32) logic, I think it might be this (untested).
> > I also note this is the sole use of get_si_code, seems like it should
> > just be rolled in here.
>
> Well, it is too late for me to even try to read this patch ;)
>
> but...
>
> > @@ -569,14 +568,15 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
> > * We already checked v86 mode above, so we can check for kernel mode
> > * by just checking the CPL of CS.
> > */
> > + dr6 = tsk->thread.debugreg6;
>
> why? we have "tsk->thread.debugreg6 = dr6" above


Yeah.



> > if ((dr6 & DR_STEP) && !user_mode(regs)) {
> > tsk->thread.debugreg6 &= ~DR_STEP;
> > set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
> > regs->flags &= ~X86_EFLAGS_TF;
>
> this looks strange... we set TIF_SINGLESTEP but clear X86_EFLAGS_TF


Yep, I don't understand what happens here either. This logic
was there before the refactoring and the comment indicates we want
to ignore traps from kernel. Why do we set this flag in a random
thread?



> > + } else if (dr6 & (DR_STEP | DR_TRAP_BITS)) {
> > + send_sigtrap(tsk, regs, error_code, get_si_code(dr6));
> > }
> > - si_code = get_si_code(tsk->thread.debugreg6);
> > - if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
> > - send_sigtrap(tsk, regs, error_code, si_code);
> > +
>
> can't understand how this change can fix the problem. We should always
> send SIGTRAP if the task returns to user-mode with X86_EFLAGS_TF?
>
> OK. I blindly applied this patch, step-simple still fails.


Yep, that doesn't fix your problem but this patch makes sense
in that if we were not in user mode while the step occured,
we shouldn't send the signal.

2009-12-18 03:06:12

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

On Fri, Dec 18, 2009 at 01:56:50AM +0100, Oleg Nesterov wrote:
> Hi.
>
> do_debug() is obviously wrong wrt PTRACE_SINGLESTEP/TIF_SINGLESTEP, no?
>
> Afaics this was broken by
>
> hw-breakpoints: modifying generic debug exception to use thread-specific debug registers
> commit 08d68323d1f0c34452e614263b212ca556dae47f
>
> To verify, the "patch" below fixes the stepping for me, not sure what
> is the proper fix...
>
> Oleg.
>
> --- arch/x86/kernel/traps.c~ 2009-12-18 00:20:49.000000000 +0100
> +++ arch/x86/kernel/traps.c 2009-12-18 01:44:05.000000000 +0100
> @@ -575,7 +575,7 @@ dotraplinkage void __kprobes do_debug(st
> regs->flags &= ~X86_EFLAGS_TF;
> }
> si_code = get_si_code(tsk->thread.debugreg6);
> - if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
> +// if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
> send_sigtrap(tsk, regs, error_code, si_code);



But I don't understand why it is broken with the check.
If we are in a singlestep exception, dr6 should have its
DR_STEP bit set...

Single stepping works well for me, after a quick check on
gdb. How did you trigger the bug?

Thanks.

2009-12-18 03:09:22

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

On Fri, Dec 18, 2009 at 03:58:09AM +0100, Frederic Weisbecker wrote:
> Yep, that doesn't fix your problem but this patch makes sense
> in that if we were not in user mode while the step occured,
> we shouldn't send the signal.


That wouldn't happen actually, as we clear the STEP flag before.

2009-12-18 03:54:27

by Roland McGrath

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

> > + dr6 = tsk->thread.debugreg6;
>
> why? we have "tsk->thread.debugreg6 = dr6" above

Yeah, it's a little superfluous. Except that the existing code uses
tsk->thread.debugreg6 and dr6 inconsistently. It only matters either way
if some notifier function might change thread.debugreg6, which I wasn't
sure that none might. i.e., does/should hw_breakpoint hide/remap the
watchpoint-fired bits when they are not for the same-numbered,
ptrace-installed virtual debugreg? And does/should kprobes, kgdb, and
whatnot, hide DR_STEP from thread.debugreg6 for a step that's not from
user_enable_single_step?

> > if ((dr6 & DR_STEP) && !user_mode(regs)) {
> > tsk->thread.debugreg6 &= ~DR_STEP;
> > set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
> > regs->flags &= ~X86_EFLAGS_TF;
>
> this looks strange... we set TIF_SINGLESTEP but clear X86_EFLAGS_TF

This was the original purpose of TIF_SINGLESTEP from long, long ago. This
happens when TF was set in user mode and then it did syscall/sysenter so TF
is still set at the first instruction in kernel mode. TF is cleared from
the interrupted kernel registers so the kernel can resume normally. In the
original logic, TIF_SINGLESTEP served just to make it turn TF back on when
going to user mode. Since then we grew the complicated step.c stuff and
it all fits together slightly differently than it did when the original
traps.c path was written.

> can't understand how this change can fix the problem. We should always
> send SIGTRAP if the task returns to user-mode with X86_EFLAGS_TF?

If the debug exception happened in user mode, then we should send SIGTRAP.

In the old (2.6.32) code with its goto-heavy logic the !user_mode(regs)
was goto clear_TF_reenable; and that is:

clear_TF_reenable:
set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
regs->flags &= ~X86_EFLAGS_TF;
preempt_conditional_cli(regs);
return;

I thought the new logic was falling through to the send_sigtrap case after
"if ((dr6 & DR_STEP) && !user_mode(regs))". But now I see that the subtle
use of dr6 vs tsk->thread.debugreg6 (without comments about it!) meant
that DR_STEP is cleared from tsk->thread.debugreg6 before we test it.

So I guess the idea there is that the !user_mode case would swallow the
step indication but still leave some DR_TRAP_BITS set and so you'd generate
a user SIGTRAP in honor of those (i.e. watchpoint hits). But I thought the
hardware behavior was that a step will set DR_STEP in DR6 but not clear any
DR_TRAP_BITS set from before, so I'm not sure this can't sometimes send a
SIGTRAP twice for a combination of a watchpoint hit and a delayed step.

> OK. I blindly applied this patch, step-simple still fails.

Yeah, it was a quick reaction to the funny-looking control flow.
But I didn't really investigate what is actually happening.


Thanks,
Roland

2009-12-18 17:34:13

by Oleg Nesterov

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

On 12/18, Frederic Weisbecker wrote:
>
> On Fri, Dec 18, 2009 at 01:56:50AM +0100, Oleg Nesterov wrote:
> > Hi.
> >
> > do_debug() is obviously wrong wrt PTRACE_SINGLESTEP/TIF_SINGLESTEP, no?
> >
> > Afaics this was broken by
> >
> > hw-breakpoints: modifying generic debug exception to use thread-specific debug registers
> > commit 08d68323d1f0c34452e614263b212ca556dae47f
> >
> > To verify, the "patch" below fixes the stepping for me, not sure what
> > is the proper fix...
> >
> > Oleg.
> >
> > --- arch/x86/kernel/traps.c~ 2009-12-18 00:20:49.000000000 +0100
> > +++ arch/x86/kernel/traps.c 2009-12-18 01:44:05.000000000 +0100
> > @@ -575,7 +575,7 @@ dotraplinkage void __kprobes do_debug(st
> > regs->flags &= ~X86_EFLAGS_TF;
> > }
> > si_code = get_si_code(tsk->thread.debugreg6);
> > - if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
> > +// if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
> > send_sigtrap(tsk, regs, error_code, si_code);
>
>
>
> But I don't understand why it is broken with the check.
> If we are in a singlestep exception, dr6 should have its
> DR_STEP bit set...
>
> Single stepping works well for me, after a quick check on
> gdb. How did you trigger the bug?

Please find the trivial test-case below. It hangs, because
PTRACE_SINGLESTEP doesn't trigger the trap.

(not sure this matters, but I did the testing under kvm)

Oleg.

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>

int main(void)
{
int pid, status, i;

pid = fork();
if (!pid)
for (;;);

sleep(1);
assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);

assert(pid == wait(&status));
assert(WIFSTOPPED(status));

for (i = 0; i < 10; ++i) {
assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);

printf("wait %d ...\n", i);
assert(pid == wait(&status));

assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
}

kill(pid, SIGKILL);
return 0;
}

2009-12-18 17:37:19

by K.Prasad

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

On Fri, Dec 18, 2009 at 01:56:50AM +0100, Oleg Nesterov wrote:
> Hi.
>
> do_debug() is obviously wrong wrt PTRACE_SINGLESTEP/TIF_SINGLESTEP, no?
>
> Afaics this was broken by
>
> hw-breakpoints: modifying generic debug exception to use thread-specific debug registers
> commit 08d68323d1f0c34452e614263b212ca556dae47f
>
> To verify, the "patch" below fixes the stepping for me, not sure what
> is the proper fix...
>
> Oleg.
>
> --- arch/x86/kernel/traps.c~ 2009-12-18 00:20:49.000000000 +0100
> +++ arch/x86/kernel/traps.c 2009-12-18 01:44:05.000000000 +0100
> @@ -575,7 +575,7 @@ dotraplinkage void __kprobes do_debug(st
> regs->flags &= ~X86_EFLAGS_TF;
> }
> si_code = get_si_code(tsk->thread.debugreg6);
> - if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
> +// if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS))
> send_sigtrap(tsk, regs, error_code, si_code);
> preempt_conditional_cli(regs);
>

The cause for such a behaviour isn't apparent to me and like others, I'm
unable to recreate it (Single-stepping using gdb over a tiny program
running on x86, latest -tip works fine).

Did you try to narrow down the causative piece of code, among the
several hooks in do_debug()?

A separate 'dr6' and 'thread.debugreg6' was desired by the community (refer:
[email protected] and
[email protected]) then.
'dr6' and 'thread.deebugreg6' would contain the value of the DR6 status
register and every exception handler would reset the bits in them
corresponding to which action has been taken. The difference in them being
that 'thread.debugreg6' would be eventually processed by code interested
in user-space while 'dr6' was restricted to those hooks in do_debug().

Thanks,
K.Prasad

2009-12-18 17:58:55

by K.Prasad

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

On Fri, Dec 18, 2009 at 06:27:47PM +0100, Oleg Nesterov wrote:
> On 12/18, Frederic Weisbecker wrote:
> >
> > On Fri, Dec 18, 2009 at 01:56:50AM +0100, Oleg Nesterov wrote:
> > > Hi.

<snipped>

> > Single stepping works well for me, after a quick check on
> > gdb. How did you trigger the bug?
>
> Please find the trivial test-case below. It hangs, because
> PTRACE_SINGLESTEP doesn't trigger the trap.
>

aah...my other mail just criss-crossed yours.

I quickly ran on the said x86 box, loaded with -tip (commit
7818b3d0fc68f5c2a85fed86d9fa37131c5a3068) and it runs fine.

[root@llm05 prasadkr]# cat oleg.c
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>

int main(void)
{
int pid, status, i;

pid = fork();
if (!pid)
for (;;);

sleep(1);
assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);

assert(pid == wait(&status));
assert(WIFSTOPPED(status));

for (i = 0; i < 10; ++i) {
assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);

printf("wait %d ...\n", i);
assert(pid == wait(&status));

assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
}

kill(pid, SIGKILL);
return 0;
}

[root@llm05 prasadkr]# gcc -o oleg oleg.c -g -Wall
[root@llm05 prasadkr]# ./oleg
wait 0 ...
wait 1 ...
wait 2 ...
wait 3 ...
wait 4 ...
wait 5 ...
wait 6 ...
wait 7 ...
wait 8 ...
wait 9 ...
[root@llm05 prasadkr]#

Am I missing something here?

Thanks,
K.Prasad

2009-12-18 18:32:07

by Oleg Nesterov

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

On 12/18, K.Prasad wrote:
>
> On Fri, Dec 18, 2009 at 06:27:47PM +0100, Oleg Nesterov wrote:
> > On 12/18, Frederic Weisbecker wrote:
> > >
> > > On Fri, Dec 18, 2009 at 01:56:50AM +0100, Oleg Nesterov wrote:
> > > > Hi.
>
> <snipped>
>
> > > Single stepping works well for me, after a quick check on
> > > gdb. How did you trigger the bug?
> >
> > Please find the trivial test-case below. It hangs, because
> > PTRACE_SINGLESTEP doesn't trigger the trap.
> >
>
> aah...my other mail just criss-crossed yours.
>
> I quickly ran on the said x86 box, loaded with -tip (commit
> 7818b3d0fc68f5c2a85fed86d9fa37131c5a3068) and it runs fine.

Hmm. Just re-tested 2.6.33-rc1 under kvm, it hangs...

Oleg.

> [root@llm05 prasadkr]# cat oleg.c
> #include <stdio.h>
> #include <unistd.h>
> #include <signal.h>
> #include <sys/ptrace.h>
> #include <sys/wait.h>
> #include <assert.h>
>
> int main(void)
> {
> int pid, status, i;
>
> pid = fork();
> if (!pid)
> for (;;);
>
> sleep(1);
> assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
>
> assert(pid == wait(&status));
> assert(WIFSTOPPED(status));
>
> for (i = 0; i < 10; ++i) {
> assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
>
> printf("wait %d ...\n", i);
> assert(pid == wait(&status));
>
> assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
> }
>
> kill(pid, SIGKILL);
> return 0;
> }
>
> [root@llm05 prasadkr]# gcc -o oleg oleg.c -g -Wall
> [root@llm05 prasadkr]# ./oleg
> wait 0 ...
> wait 1 ...
> wait 2 ...
> wait 3 ...
> wait 4 ...
> wait 5 ...
> wait 6 ...
> wait 7 ...
> wait 8 ...
> wait 9 ...
> [root@llm05 prasadkr]#
>
> Am I missing something here?
>
> Thanks,
> K.Prasad
>

2009-12-18 20:26:15

by Roland McGrath

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

> Please find the trivial test-case below. It hangs, because
> PTRACE_SINGLESTEP doesn't trigger the trap.

2.6.33-rc1 x86-64 works for me with either -m64 or -m32 version of that test.

> (not sure this matters, but I did the testing under kvm)

Apparently it does. You should hack some printks into do_debug() and see
how kvm is differing from real hardware. (Actually you can probably do
this with a notifier added by a module, not that you are shy about
recompiling!)

Probably kvm's emulation of the hardware behavior wrt the DR6 bits is not
sufficiently faithful. Conceivably, kvm is being consistent with some
older hardware and we have encoded assumptions that only newer hardware
meets. But I'd guess it's just a plain kvm bug.


Thanks,
Roland

2009-12-18 22:38:45

by Oleg Nesterov

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

On 12/18, Roland McGrath wrote:
>
> > Please find the trivial test-case below. It hangs, because
> > PTRACE_SINGLESTEP doesn't trigger the trap.
>
> 2.6.33-rc1 x86-64 works for me with either -m64 or -m32 version of that test.
>
> > (not sure this matters, but I did the testing under kvm)
>
> Apparently it does. You should hack some printks into do_debug() and see
> how kvm is differing from real hardware. (Actually you can probably do
> this with a notifier added by a module, not that you are shy about
> recompiling!)
>
> Probably kvm's emulation of the hardware behavior wrt the DR6 bits is not
> sufficiently faithful. Conceivably, kvm is being consistent with some
> older hardware and we have encoded assumptions that only newer hardware
> meets. But I'd guess it's just a plain kvm bug.

OK, thanks.

Hmm. Now I see how wrong I was when I said this code is "obviously wrong" ;)

I'll add the debugging printk's and report the output. Sorry for delay,
can't do this today.

Oleg.

2009-12-18 23:15:53

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

On Fri, Dec 18, 2009 at 12:05:03PM -0800, Roland McGrath wrote:
> > Please find the trivial test-case below. It hangs, because
> > PTRACE_SINGLESTEP doesn't trigger the trap.
>
> 2.6.33-rc1 x86-64 works for me with either -m64 or -m32 version of that test.
>
> > (not sure this matters, but I did the testing under kvm)
>
> Apparently it does. You should hack some printks into do_debug() and see
> how kvm is differing from real hardware. (Actually you can probably do
> this with a notifier added by a module, not that you are shy about
> recompiling!)
>
> Probably kvm's emulation of the hardware behavior wrt the DR6 bits is not
> sufficiently faithful. Conceivably, kvm is being consistent with some
> older hardware and we have encoded assumptions that only newer hardware
> meets. But I'd guess it's just a plain kvm bug.


It looks like in kvm, before entering the guest, we restore its
debug registers:

vcpu_enter_guest():
if (unlikely(vcpu->arch.switch_db_regs)) {
set_debugreg(0, 7);
set_debugreg(vcpu->arch.eff_db[0], 0);
set_debugreg(vcpu->arch.eff_db[1], 1);
set_debugreg(vcpu->arch.eff_db[2], 2);
set_debugreg(vcpu->arch.eff_db[3], 3);
}


But what happens to dr6, I don't know.

Adding Avi and Jan in Cc.

2009-12-20 08:32:12

by Avi Kivity

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

On 12/19/2009 01:15 AM, Frederic Weisbecker wrote:
>
>> Apparently it does. You should hack some printks into do_debug() and see
>> how kvm is differing from real hardware. (Actually you can probably do
>> this with a notifier added by a module, not that you are shy about
>> recompiling!)
>>
>> Probably kvm's emulation of the hardware behavior wrt the DR6 bits is not
>> sufficiently faithful. Conceivably, kvm is being consistent with some
>> older hardware and we have encoded assumptions that only newer hardware
>> meets. But I'd guess it's just a plain kvm bug.
>>
>

A kvm bug is most likely.

> It looks like in kvm, before entering the guest, we restore its
> debug registers:
>
> vcpu_enter_guest():
> if (unlikely(vcpu->arch.switch_db_regs)) {
> set_debugreg(0, 7);
> set_debugreg(vcpu->arch.eff_db[0], 0);
> set_debugreg(vcpu->arch.eff_db[1], 1);
> set_debugreg(vcpu->arch.eff_db[2], 2);
> set_debugreg(vcpu->arch.eff_db[3], 3);
> }
>
>
> But what happens to dr6, I don't know.
>

That's done later, in vmx.c:vmx_vcpu_run():

if (vcpu->arch.switch_db_regs)
set_debugreg(vcpu->arch.dr6, 6);

Can you describe the failure? I'll try to construct a test case
reproducer and work with Jan to fix it.

--
error compiling committee.c: too many arguments to function

2009-12-21 10:23:07

by Jan Kiszka

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

Oleg Nesterov wrote:
> On 12/18, Roland McGrath wrote:
>>> Please find the trivial test-case below. It hangs, because
>>> PTRACE_SINGLESTEP doesn't trigger the trap.
>> 2.6.33-rc1 x86-64 works for me with either -m64 or -m32 version of that test.
>>
>>> (not sure this matters, but I did the testing under kvm)
>> Apparently it does. You should hack some printks into do_debug() and see
>> how kvm is differing from real hardware. (Actually you can probably do
>> this with a notifier added by a module, not that you are shy about
>> recompiling!)
>>
>> Probably kvm's emulation of the hardware behavior wrt the DR6 bits is not
>> sufficiently faithful. Conceivably, kvm is being consistent with some
>> older hardware and we have encoded assumptions that only newer hardware
>> meets. But I'd guess it's just a plain kvm bug.
>
> OK, thanks.
>
> Hmm. Now I see how wrong I was when I said this code is "obviously wrong" ;)
>
> I'll add the debugging printk's and report the output. Sorry for delay,
> can't do this today.

Can't reproduce, runs fine here with with 2.6.33-rc1 as both host&guest
and qemu-kvm latest git. Host uses kvm-intel.

Can you specify your setup in more details? Which host kernel did you
use, which qemu-kvm version? Are you on AMD or Intel? Any specific guest
kernel config switch that may influence this?

Jan


Attachments:
signature.asc (257.00 B)
OpenPGP digital signature

2009-12-21 16:49:09

by Jan Kiszka

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

Oleg Nesterov wrote:
> On 12/21, Jan Kiszka wrote:
>> Oleg Nesterov wrote:
>>> Hmm. Now I see how wrong I was when I said this code is "obviously wrong" ;)
>
> Yes, it is easy to blame the code you don't understand.
>
> My apologies to all.
>
>>> I'll add the debugging printk's and report the output. Sorry for delay,
>>> can't do this today.
>> Can't reproduce, runs fine here with with 2.6.33-rc1 as both host&guest
>> and qemu-kvm latest git. Host uses kvm-intel.
>
> Everything runs fine under 2.6.32 as a _host_ kernel. Previously I did
> the testing under 2.6.26.5-45.fc9.

Makes sense: that kernel (most probably) predates any debug register
virtualization in kvm.

>
> Sorry for noise, thanks all for your help.

Never mind.

Jan


Attachments:
signature.asc (257.00 B)
OpenPGP digital signature

2009-12-21 15:59:19

by Oleg Nesterov

[permalink] [raw]
Subject: Re: x86: do_debug && PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f

On 12/21, Jan Kiszka wrote:
>
> Oleg Nesterov wrote:
> >
> > Hmm. Now I see how wrong I was when I said this code is "obviously wrong" ;)

Yes, it is easy to blame the code you don't understand.

My apologies to all.

> > I'll add the debugging printk's and report the output. Sorry for delay,
> > can't do this today.
>
> Can't reproduce, runs fine here with with 2.6.33-rc1 as both host&guest
> and qemu-kvm latest git. Host uses kvm-intel.

Everything runs fine under 2.6.32 as a _host_ kernel. Previously I did
the testing under 2.6.26.5-45.fc9.

Sorry for noise, thanks all for your help.

Oleg.