Hello.
This is the new iteration of Roland's utrace patch, this time
with "rewrite-ptrace-via-utrace" + cleanups in utrace core.
1-7 are already in -mm tree, I am sending them to simplify the
review.
8-12 don not change the behaviour, simple preparations.
13-14 add utrace-ptrace and utrace
Please review.
Oleg.
On Tue, Nov 24, 2009 at 09:01:27PM +0100, Oleg Nesterov wrote:
> Hello.
>
> This is the new iteration of Roland's utrace patch, this time
> with "rewrite-ptrace-via-utrace" + cleanups in utrace core.
>
> 1-7 are already in -mm tree, I am sending them to simplify the
> review.
>
> 8-12 don not change the behaviour, simple preparations.
>
> 13-14 add utrace-ptrace and utrace
Oleg,
I ran the ptrace-tests testsuite [1] on powerpc on the vanilla ptrace
and then with ptrace/utrace. The results for ptrace/utrace look better
:-)
All tests are 'make check xcheck'.
Ananth
[1] cvs?-d?:pserver:anoncvs:[email protected]:/cvs/systemtap?co?ptrace-tests
---------
Vanilla ptrace:
PASS: ptrace-on-job-control-stopped
PASS: attach-wait-on-stopped
PASS: detach-can-signal
PASS: attach-into-signal
PASS: attach-sigcont-wait
PASS: sa-resethand-on-cont-signal
PASS: ptrace_cont-defeats-sigblock
PASS: ptrace-cont-sigstop-detach
PASS: ptrace_event_clone
PASS: tif-syscall-trace-after-detach
PASS: event-exit-proc-maps
PASS: event-exit-proc-environ
SKIP: x86_64-ia32-gs
SKIP: x86_64-gsbase
PASS: powerpc-altivec
PASS: peekpokeusr
PASS: watchpoint
PASS: block-step
PASS: step-jump-cont
SKIP: step-jump-cont-strict
PASS: ppc-dabr-race
PASS: signal-loss
PASS: step-into-handler
SKIP: user-area-access
PASS: user-regs-peekpoke
PASS: erestartsys
SKIP: erestart-debugger
SKIP: step-to-breakpoint
errno 14 (Bad address)
syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 38' failed.
unexpected child status 67f
FAIL: syscall-reset
PASS: reparent-zombie
PASS: step-simple
SKIP: step-through-sigret
PASS: stop-attach-then-wait
FAIL: detach-stopped
PASS: detach-stopped-rhel5
PASS: clone-multi-ptrace
PASS: clone-ptrace
PASS: o_tracevfork
PASS: o_tracevforkdone
PASS: detach-parting-signal
PASS: detach-sigkill-race
PASS: waitpid-double-report
PASS: o_tracevfork-parent
PASS: stopped-detach-sleeping
FAIL: stopped-attach-transparency
SKIP: erestartsys-trap
SKIP: highmem-debugger
PASS: sigint-before-syscall-exit
SKIP: syscall-from-clone
step-from-clone: step-from-clone.c:195: main: Assertion `(status >> 8) == 5' failed.
step-from-clone: step-from-clone.c:119: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 19825 Aborted ${dir}$tst
FAIL: step-from-clone
step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 19832 Aborted ${dir}$tst
FAIL: step-fork
========================================
5 of 41 tests failed
(10 tests were not run)
Please report to [email protected]
========================================
make[3]: *** [check-TESTS] Error 1
make[3]: Leaving directory `/home/ananth/ptrace-tests/tests'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/home/ananth/ptrace-tests/tests'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/ananth/ptrace-tests/tests'
make: *** [check-recursive] Error 1
---------
ptrace over utrace:
PASS: ptrace-on-job-control-stopped
PASS: attach-wait-on-stopped
PASS: detach-can-signal
PASS: attach-into-signal
PASS: attach-sigcont-wait
PASS: sa-resethand-on-cont-signal
PASS: ptrace_cont-defeats-sigblock
PASS: ptrace-cont-sigstop-detach
PASS: ptrace_event_clone
PASS: tif-syscall-trace-after-detach
PASS: event-exit-proc-maps
PASS: event-exit-proc-environ
SKIP: x86_64-ia32-gs
SKIP: x86_64-gsbase
PASS: powerpc-altivec
PASS: peekpokeusr
PASS: watchpoint
PASS: block-step
PASS: step-jump-cont
SKIP: step-jump-cont-strict
PASS: ppc-dabr-race
PASS: signal-loss
PASS: step-into-handler
SKIP: user-area-access
PASS: user-regs-peekpoke
PASS: erestartsys
SKIP: erestart-debugger
SKIP: step-to-breakpoint
errno 14 (Bad address)
syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 38' failed.
unexpected child status 67f
FAIL: syscall-reset
PASS: reparent-zombie
PASS: step-simple
SKIP: step-through-sigret
PASS: stop-attach-then-wait
PASS: detach-stopped
PASS: detach-stopped-rhel5
PASS: clone-multi-ptrace
PASS: clone-ptrace
PASS: o_tracevfork
PASS: o_tracevforkdone
PASS: detach-parting-signal
PASS: detach-sigkill-race
PASS: waitpid-double-report
PASS: o_tracevfork-parent
PASS: stopped-detach-sleeping
PASS: stopped-attach-transparency
SKIP: erestartsys-trap
SKIP: highmem-debugger
PASS: sigint-before-syscall-exit
SKIP: syscall-from-clone
SKIP: step-from-clone
step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 24803 Aborted ${dir}$tst
FAIL: step-fork
========================================
2 of 40 tests failed
(11 tests were not run)
Please report to [email protected]
========================================
make[3]: *** [check-TESTS] Error 1
make[3]: Leaving directory `/home/ananth/ptrace-tests/tests'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/home/ananth/ptrace-tests/tests'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/ananth/ptrace-tests/tests'
make: *** [check-recursive] Error 1
On 11/25, Ananth N Mavinakayanahalli wrote:
>
> I ran the ptrace-tests testsuite [1] on powerpc on the vanilla ptrace
> and then with ptrace/utrace. The results for ptrace/utrace look better
> :-)
Great! thanks a lot Ananth for doing this.
ptrace-utrace still fails 2 tests,
> FAIL: syscall-reset
I'll take a look later. Since unpatched kernel fails this test too
I am not going to worry right now. I think this is ppc specific, x86
passes this test.
> step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
> /bin/sh: line 5: 24803 Aborted ${dir}$tst
> FAIL: step-fork
This is expected. Should be fixed by
ptrace-copy_process-should-disable-stepping.patch
in -mm tree. (I am attaching this patch below just in case)
I din't mention this patch in this series because this bug
is "ortogonal" to utrace/ptrace.
Oleg.
------------------------------------------------------
If the tracee calls fork() after PTRACE_SINGLESTEP, the forked child
starts with TIF_SINGLESTEP/X86_EFLAGS_TF bits copied from ptraced parent.
This is not right, especially when the new child is not auto-attaced: in
this case it is killed by SIGTRAP.
Change copy_process() to call user_disable_single_step(). Tested on x86.
Test-case:
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>
int main(void)
{
int pid, status;
if (!(pid = fork())) {
assert(ptrace(PTRACE_TRACEME) == 0);
kill(getpid(), SIGSTOP);
if (!fork()) {
/* kernel bug: this child will be killed by SIGTRAP */
printf("Hello world\n");
return 43;
}
wait(&status);
return WEXITSTATUS(status);
}
for (;;) {
assert(pid == wait(&status));
if (WIFEXITED(status))
break;
assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
}
assert(WEXITSTATUS(status) == 43);
return 0;
}
Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Roland McGrath <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
diff -puN kernel/fork.c~ptrace-copy_process-should-disable-stepping kernel/fork.c
--- a/kernel/fork.c~ptrace-copy_process-should-disable-stepping
+++ a/kernel/fork.c
@@ -1203,9 +1203,10 @@ static struct task_struct *copy_process(
p->sas_ss_sp = p->sas_ss_size = 0;
/*
- * Syscall tracing should be turned off in the child regardless
- * of CLONE_PTRACE.
+ * Syscall tracing and stepping should be turned off in the
+ * child regardless of CLONE_PTRACE.
*/
+ user_disable_single_step(p);
clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
#ifdef TIF_SYSCALL_EMU
clear_tsk_thread_flag(p, TIF_SYSCALL_EMU);
On Tue, Nov 24, 2009 at 09:01:27PM +0100, Oleg Nesterov wrote:
> Hello.
>
> This is the new iteration of Roland's utrace patch, this time
> with "rewrite-ptrace-via-utrace" + cleanups in utrace core.
>
> 1-7 are already in -mm tree, I am sending them to simplify the
> review.
>
> 8-12 don not change the behaviour, simple preparations.
>
> 13-14 add utrace-ptrace and utrace
Skipped over it very, very briefly. One thing I really hate about this
is that it introduces two ptrace implementation by adding the new one
without removing the old one. Given that's it's pretty much too later
for the 2.6.33 cycle anyway I'd suggest you make sure the remaining
two major architectures (arm and mips) get converted, and if the
remaining minor architectures don't manage to get their homework done
they're left without ptrace.
The other thing is that this patchset really doesn't quite justify
utrace. It's growing a lot more code without actually growing any
useful functionality. What about all those other utrace killer
features that have been promised for a long time?
On 11/25, Christoph Hellwig wrote:
>
> On Tue, Nov 24, 2009 at 09:01:27PM +0100, Oleg Nesterov wrote:
> > Hello.
> >
> > This is the new iteration of Roland's utrace patch, this time
> > with "rewrite-ptrace-via-utrace" + cleanups in utrace core.
> >
> > 1-7 are already in -mm tree, I am sending them to simplify the
> > review.
> >
> > 8-12 don not change the behaviour, simple preparations.
> >
> > 13-14 add utrace-ptrace and utrace
>
> Skipped over it very, very briefly. One thing I really hate about this
> is that it introduces two ptrace implementation by adding the new one
> without removing the old one.
Yes, we obviously need the old one when CONFIG_UTRACE is not enabled.
So, I'd like to try to restate: one thing we all really hate is that
CONFIG_UTRACE exists.
> Given that's it's pretty much too later
> for the 2.6.33 cycle anyway I'd suggest you make sure the remaining
> two major architectures (arm and mips) get converted, and if the
> remaining minor architectures don't manage to get their homework done
> they're left without ptrace.
Well, I can't comment this. I mean, I can't judge.
> The other thing is that this patchset really doesn't quite justify
> utrace. It's growing a lot more code without actually growing any
> useful functionality.
This should be clarified. I don't think ptrace-utrace adds a lot more
code compared to the old ptrace. Note that we can kill a lot of old
code once CONFIG_UTRACE goes away. ptrace_signal(), ptrace_notify(),
even task_struct->almost_all_ptrace_related can go away.
kernel/utrace.c does add 12280 bytes (on my machine), yes.
> What about all those other utrace killer
> features that have been promised for a long time?
It is not clear how we can expect the new "killer" modules/applications
which use utrace before we merge it.
We already have some users, say, systemtap. But I don not know
what can be counted as a "really killer" application of utrace.
Oleg.
Hi Christoph,
>
> The other thing is that this patchset really doesn't quite justify
> utrace. It's growing a lot more code without actually growing any
> useful functionality. What about all those other utrace killer
> features that have been promised for a long time?
>
We are working on in-kernel gdbstub which was one of the features that
you had asked for. gdbstub does pass unit tests; but we are looking at
some way to hack the GDB testsuite to run its regression tests. Once we
are able to run the GDB testsuite and utrace is part of some upstream
tree, we plan to post these patches to LKML for comments. gdbstub uses
utrace and uprobes underneath. Uprobes was rewritten to remove issues
that LKML developers had opposed. Uprobes also has its own ftrace plugin
to use uprobes.
Currently in-kernel gdbstub is hosted by Frank Ch. Eigler over here:
git://web.elastic.org/~fche/utrace-ext.git
branch name utrace-gdbstub-uprobes
--
Regards
Srikar
On Wed, Nov 25, 2009 at 04:40:52PM +0100, Oleg Nesterov wrote:
> On 11/25, Ananth N Mavinakayanahalli wrote:
> >
> > I ran the ptrace-tests testsuite [1] on powerpc on the vanilla ptrace
> > and then with ptrace/utrace. The results for ptrace/utrace look better
> > :-)
>
> Great! thanks a lot Ananth for doing this.
>
> ptrace-utrace still fails 2 tests,
>
> > FAIL: syscall-reset
>
> I'll take a look later. Since unpatched kernel fails this test too
> I am not going to worry right now. I think this is ppc specific, x86
> passes this test.
>
> > step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
> > /bin/sh: line 5: 24803 Aborted ${dir}$tst
> > FAIL: step-fork
>
> This is expected. Should be fixed by
>
> ptrace-copy_process-should-disable-stepping.patch
>
> in -mm tree. (I am attaching this patch below just in case)
> I din't mention this patch in this series because this bug
> is "ortogonal" to utrace/ptrace.
Oleg,
The patch doesn't seem to fix the issue on powerpc:
step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
/bin/sh: line 5: 17325 Aborted ${dir}$tst
FAIL: step-fork
Ananth
* Christoph Hellwig <[email protected]> wrote:
> [...] Given that's it's pretty much too later for the 2.6.33 cycle
> anyway I'd suggest you make sure the remaining two major architectures
> (arm and mips) get converted, and if the remaining minor architectures
> don't manage to get their homework done they're left without ptrace.
I suspect the opinion of the ptrace maintainers matters heavily whether
it's appropriate for v2.6.33. You are not going to maintain this, they
are.
Regarding porting it to even more architectures - that's pretty much the
worst idea possible. It increases maintenance and testing overhead by
exploding the test matrix, while giving little to end result. Plus the
worst effect of it is that it becomes even more intrusive and even
harder (and riskier) to merge.
So dont do that.
The best strategy is to concentrate on just one or two well-tested
architectures, and then grow to other architectures gradually. Like
we've done it for basically all big kernel features in the past 10 years
(that had non-trivial arch dependencies), with no exception that i can
remember.
Thanks,
Ingo
On Thu, Nov 26, 2009 at 10:10:52AM +0100, Ingo Molnar wrote:
> > [...] Given that's it's pretty much too later for the 2.6.33 cycle
> > anyway I'd suggest you make sure the remaining two major architectures
> > (arm and mips) get converted, and if the remaining minor architectures
> > don't manage to get their homework done they're left without ptrace.
>
> I suspect the opinion of the ptrace maintainers matters heavily whether
> it's appropriate for v2.6.33. You are not going to maintain this, they
> are.
I am whoever like many others going to use it. And throwing in new code
a few days before the merge window closes and thus not getting any of
the broad -next test coverage is a pretty bad idea. In the end
it will be the maintainers ruling but that doesn't make it a good idea
from the engineering point of view.
> Regarding porting it to even more architectures - that's pretty much the
> worst idea possible. It increases maintenance and testing overhead by
> exploding the test matrix, while giving little to end result. Plus the
> worst effect of it is that it becomes even more intrusive and even
> harder (and riskier) to merge.
But it doesn't. Take a look at what these patches actually do, they
basically introduce a new utrace layer, and (conditionally) rewrite
ptrace to use it. The arch support isn't actually part of these patches
directly but rather the cleanup of the underlying arch ptrace code to
use regsets, tracehooks and co so that the new ptrace code can use.
What the patches in the current form do is to introduce two different
ptrace implementations, with one used on the architectures getting most
testing and another secondary one for left over embedded or dead
architectures with horrible results. So removing the old one is much
better. The arm ptrace rewrite has already been posted by Roland, btw
including some feedback from Russell, but nothing really happened to it.
* Christoph Hellwig <[email protected]> wrote:
> On Thu, Nov 26, 2009 at 10:10:52AM +0100, Ingo Molnar wrote:
> > > [...] Given that's it's pretty much too later for the 2.6.33 cycle
> > > anyway I'd suggest you make sure the remaining two major architectures
> > > (arm and mips) get converted, and if the remaining minor architectures
> > > don't manage to get their homework done they're left without ptrace.
> >
> > I suspect the opinion of the ptrace maintainers matters heavily whether
> > it's appropriate for v2.6.33. You are not going to maintain this, they
> > are.
>
> I am whoever like many others going to use it. And throwing in new
> code a few days before the merge window closes [...]
FYI, the merge window has not opened yet, so it cannot close in a few
days.
> [...] and thus not getting any of the broad -next test coverage is a
> pretty bad idea. In the end it will be the maintainers ruling but
> that doesn't make it a good idea from the engineering point of view.
FYI, it's been in -mm, that's where it's maintained.
> > Regarding porting it to even more architectures - that's pretty much
> > the worst idea possible. It increases maintenance and testing
> > overhead by exploding the test matrix, while giving little to end
> > result. Plus the worst effect of it is that it becomes even more
> > intrusive and even harder (and riskier) to merge.
>
> But it doesn't. Take a look at what these patches actually do, they
> basically introduce a new utrace layer, and (conditionally) rewrite
> ptrace to use it. The arch support isn't actually part of these
> patches directly but rather the cleanup of the underlying arch ptrace
> code to use regsets, tracehooks and co so that the new ptrace code can
> use.
( I am aware of its design, i merged the original tracehook patches for
x86. )
> What the patches in the current form do is to introduce two different
> ptrace implementations, with one used on the architectures getting
> most testing and another secondary one for left over embedded or dead
> architectures with horrible results. So removing the old one is much
> better. The arm ptrace rewrite has already been posted by Roland, btw
> including some feedback from Russell, but nothing really happened to
> it.
Yes. Which is a further argument to not do it like that but to do one
arch at a time. Trying to do too much at once is bad engineering.
Ingo
On Thu, 2009-11-26 at 12:37 +0530, Srikar Dronamraju wrote:
> Hi Christoph,
>
> >
> > The other thing is that this patchset really doesn't quite justify
> > utrace. It's growing a lot more code without actually growing any
> > useful functionality. What about all those other utrace killer
> > features that have been promised for a long time?
> >
>
> We are working on in-kernel gdbstub which was one of the features that
> you had asked for. gdbstub does pass unit tests; but we are looking at
> some way to hack the GDB testsuite to run its regression tests. Once we
> are able to run the GDB testsuite and utrace is part of some upstream
> tree, we plan to post these patches to LKML for comments. gdbstub uses
> utrace and uprobes underneath. Uprobes was rewritten to remove issues
> that LKML developers had opposed. Uprobes also has its own ftrace plugin
> to use uprobes.
>
> Currently in-kernel gdbstub is hosted by Frank Ch. Eigler over here:
> git://web.elastic.org/~fche/utrace-ext.git
> branch name utrace-gdbstub-uprobes
If its anywhere near functioning it would have made sense to send it out
as an RFC patch-set right along with the utrace one.
On 11/26, Christoph Hellwig wrote:
>
> What the patches in the current form do is to introduce two different
> ptrace implementations, with one used on the architectures getting most
> testing and another secondary one for left over embedded or dead
> architectures with horrible results.
Yes, nobody likes 2 implementations. I guess Roland and me hate
CONFIG_UTRACE much more than anybody else.
> So removing the old one is much
> better.
I am in no position to discuss this option. It is very easy to remove
the old code and break !HAVE_ARCH_TRACEHOOK architectures. Although
personally I am not sure this is practical.
If we merge utrace, perhaps we will get more attention from maintainers,
the old code will be "officially" deprecated/obsolete. I sent some
trivial initial changes in arch/um/ a long ago, the patch was silently
ignored.
Even if I was able to fix arch/xxx myself, I don't understand how can
I send the patches to maintainers until utrace is already merged in
-mm at least.
Oleg.
I changed the subject. This bug has nothing to do with utrace,
the kernel fails with or without these changes.
On 11/26, Ananth N Mavinakayanahalli wrote:
>
> On Wed, Nov 25, 2009 at 04:40:52PM +0100, Oleg Nesterov wrote:
> > On 11/25, Ananth N Mavinakayanahalli wrote:
> > >
> > > step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
> > > /bin/sh: line 5: 24803 Aborted ${dir}$tst
> > > FAIL: step-fork
> >
> > This is expected. Should be fixed by
> >
> > ptrace-copy_process-should-disable-stepping.patch
> >
> > in -mm tree. (I am attaching this patch below just in case)
> > I din't mention this patch in this series because this bug
> > is "ortogonal" to utrace/ptrace.
>
> The patch doesn't seem to fix the issue on powerpc:
>
> step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
> /bin/sh: line 5: 17325 Aborted ${dir}$tst
> FAIL: step-fork
Good to know, thanks again Ananth.
I'll take a look. Since I know nothing about powerpc, I can't
promise the quick fix ;)
The bug was found by code inspection, but the fix is not trivial
because it depends on arch/, and it turns out the arch-independent
fix in
ptrace-copy_process-should-disable-stepping.patch
http://marc.info/?l=linux-mm-commits&m=125789789322573
doesn't work.
Ananth, could you please run the test-case from the changelog
below ? I do not really expect this can help, but just in case.
Oleg.
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>
int main(void)
{
int pid, status;
if (!(pid = fork())) {
assert(ptrace(PTRACE_TRACEME) == 0);
kill(getpid(), SIGSTOP);
if (!fork()) {
/* kernel bug: this child will be killed by SIGTRAP */
printf("Hello world\n");
return 43;
}
wait(&status);
return WEXITSTATUS(status);
}
for (;;) {
assert(pid == wait(&status));
if (WIFEXITED(status))
break;
assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
}
assert(WEXITSTATUS(status) == 43);
return 0;
}
On 11/26, Oleg Nesterov wrote:
>
> On 11/26, Ananth N Mavinakayanahalli wrote:
> >
> > step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
> > /bin/sh: line 5: 17325 Aborted ${dir}$tst
> > FAIL: step-fork
>
> Good to know, thanks again Ananth.
>
> I'll take a look. Since I know nothing about powerpc, I can't
> promise the quick fix ;)
>
> The bug was found by code inspection, but the fix is not trivial
> because it depends on arch/, and it turns out the arch-independent
> fix in
>
> ptrace-copy_process-should-disable-stepping.patch
> http://marc.info/?l=linux-mm-commits&m=125789789322573
>
> doesn't work.
Just noticed the test-case fails in handler_fail(). Most probably
this means it is killed by SIGALRM because either parent or child
hang in wait(). Perhaps we have another (ppc specific?) bug, but
currently I do not understand how this is possible, this should
not be arch-dependent.
Oleg.
On Thu, Nov 26, 2009 at 06:25:24PM +0100, Oleg Nesterov wrote:
> On 11/26, Oleg Nesterov wrote:
> >
> > On 11/26, Ananth N Mavinakayanahalli wrote:
> > >
> > > step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
> > > /bin/sh: line 5: 17325 Aborted ${dir}$tst
> > > FAIL: step-fork
> >
> > Good to know, thanks again Ananth.
> >
> > I'll take a look. Since I know nothing about powerpc, I can't
> > promise the quick fix ;)
> >
> > The bug was found by code inspection, but the fix is not trivial
> > because it depends on arch/, and it turns out the arch-independent
> > fix in
> >
> > ptrace-copy_process-should-disable-stepping.patch
> > http://marc.info/?l=linux-mm-commits&m=125789789322573
> >
> > doesn't work.
>
> Just noticed the test-case fails in handler_fail(). Most probably
> this means it is killed by SIGALRM because either parent or child
> hang in wait(). Perhaps we have another (ppc specific?) bug, but
> currently I do not understand how this is possible, this should
> not be arch-dependent.
I can confirm that we have another bug on ppc arch. The test case below
is spinning forever,
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>
int main(void)
{
int pid, status;
if (!(pid = fork())) {
assert(ptrace(PTRACE_TRACEME) == 0);
kill(getpid(), SIGSTOP);
if (!fork())
return 0;
printf("fork passed..\n");
return 0;
}
for (;;) {
assert(pid == wait(&status));
if (WIFEXITED(status))
break;
assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
}
printf("Parent exit.\n");
return 0;
}
it doesn't hang, the parent is spinning around for, the test case
isn't printing anything. Seems like fork() can't complete under
PTRACE_SINGLESTEP.
--
Veaceslav
Veaceslav doesn't have the time to continue, but he gave me
access to rhts machine ;)
The kernel is 2.6.31.6 btw.
On 11/26, Veaceslav Falico wrote:
>
> > Just noticed the test-case fails in handler_fail(). Most probably
> > this means it is killed by SIGALRM because either parent or child
> > hang in wait(). Perhaps we have another (ppc specific?) bug, but
> > currently I do not understand how this is possible, this should
> > not be arch-dependent.
>
> I can confirm that we have another bug on ppc arch. The test case below
> is spinning forever,
>
> [...]
>
> it doesn't hang, the parent is spinning around for, the test case
> isn't printing anything. Seems like fork() can't complete under
> PTRACE_SINGLESTEP.
Yep, thanks a lot Veaceslav.
I modified this test-case to print si_addr:
int main(void)
{
int pid, status;
if (!(pid = fork())) {
assert(ptrace(PTRACE_TRACEME) == 0);
kill(getpid(), SIGSTOP);
if (!fork())
return 0;
printf("fork passed..\n");
return 0;
}
for (;;) {
siginfo_t info;
assert(pid == wait(&status));
assert(status = 0x57f);
assert(ptrace(PTRACE_GETSIGINFO, pid, 0,&info) == 0);
printf("%p\n", info.si_addr);
if (WIFEXITED(status))
break;
assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
}
printf("Parent exit.\n");
return 0;
}
the output is:
...
0xfedf880
0xfedf884
...
0xfedf96c
0xfedf970
this is fork which calls __GI__IO_list_lock
Dump of assembler code for function fork:
0x0fedf880 <fork+0>: mflr r0
...
0x0fedf96c <fork+236>: li r28,0
0x0fedf970 <fork+240>: bl 0xfeacce0 <__GI__IO_list_lock>
Then it loops inside __GI__IO_list_lock
...
0xfeacd24
0xfeacd28
0xfeacd2c
0xfeacd30
0xfeacd34
0xfeacd24
0xfeacd28
0xfeacd2c
0xfeacd30
0xfeacd34
0xfeacd24
0xfeacd28
0xfeacd2c
0xfeacd30
0xfeacd34
...
and so on forever,
Dump of assembler code for function __GI__IO_list_lock:
0x0feacce0 <__GI__IO_list_lock+0>: mflr r0
0x0feacce4 <__GI__IO_list_lock+4>: stwu r1,-32(r1)
0x0feacce8 <__GI__IO_list_lock+8>: li r11,0
0x0feaccec <__GI__IO_list_lock+12>: bcl- 20,4*cr7+so,0xfeaccf0 <__GI__IO_list_lock+16>
0x0feaccf0 <__GI__IO_list_lock+16>: li r9,1
0x0feaccf4 <__GI__IO_list_lock+20>: stw r0,36(r1)
0x0feaccf8 <__GI__IO_list_lock+24>: stw r30,24(r1)
0x0feaccfc <__GI__IO_list_lock+28>: mflr r30
0x0feacd00 <__GI__IO_list_lock+32>: stw r31,28(r1)
0x0feacd04 <__GI__IO_list_lock+36>: stw r29,20(r1)
0x0feacd08 <__GI__IO_list_lock+40>: addi r29,r2,-29824
0x0feacd0c <__GI__IO_list_lock+44>: addis r30,r30,16
0x0feacd10 <__GI__IO_list_lock+48>: addi r30,r30,13060
0x0feacd14 <__GI__IO_list_lock+52>: lwz r31,-6436(r30)
0x0feacd18 <__GI__IO_list_lock+56>: lwz r0,8(r31)
0x0feacd1c <__GI__IO_list_lock+60>: cmpw cr7,r0,r29
0x0feacd20 <__GI__IO_list_lock+64>: beq- cr7,0xfeacd4c <__GI__IO_list_lock+108>
beg-> 0x0feacd24 <__GI__IO_list_lock+68>: lwarx r0,0,r31
0x0feacd28 <__GI__IO_list_lock+72>: cmpw r0,r11
0x0feacd2c <__GI__IO_list_lock+76>: bne- 0xfeacd38 <__GI__IO_list_lock+88>
0x0feacd30 <__GI__IO_list_lock+80>: stwcx. r9,0,r31
end-> 0x0feacd34 <__GI__IO_list_lock+84>: bne+ 0xfeacd24 <__GI__IO_list_lock+68>
I don't even know whether this is user-space bug or kernel bug,
the asm above is the black magic for me.
Anyone who knows something about powerpc can give me a hint?
Oleg.
On 11/26, Oleg Nesterov wrote:
>
> Then it loops inside __GI__IO_list_lock
>
> 0xfeacd24
> 0xfeacd28
> 0xfeacd2c
> 0xfeacd30
> 0xfeacd34
> ...
>
> and so on forever,
>
> Dump of assembler code for function __GI__IO_list_lock:
> 0x0feacce0 <__GI__IO_list_lock+0>: mflr r0
> 0x0feacce4 <__GI__IO_list_lock+4>: stwu r1,-32(r1)
> 0x0feacce8 <__GI__IO_list_lock+8>: li r11,0
> 0x0feaccec <__GI__IO_list_lock+12>: bcl- 20,4*cr7+so,0xfeaccf0 <__GI__IO_list_lock+16>
> 0x0feaccf0 <__GI__IO_list_lock+16>: li r9,1
> 0x0feaccf4 <__GI__IO_list_lock+20>: stw r0,36(r1)
> 0x0feaccf8 <__GI__IO_list_lock+24>: stw r30,24(r1)
> 0x0feaccfc <__GI__IO_list_lock+28>: mflr r30
> 0x0feacd00 <__GI__IO_list_lock+32>: stw r31,28(r1)
> 0x0feacd04 <__GI__IO_list_lock+36>: stw r29,20(r1)
> 0x0feacd08 <__GI__IO_list_lock+40>: addi r29,r2,-29824
> 0x0feacd0c <__GI__IO_list_lock+44>: addis r30,r30,16
> 0x0feacd10 <__GI__IO_list_lock+48>: addi r30,r30,13060
> 0x0feacd14 <__GI__IO_list_lock+52>: lwz r31,-6436(r30)
> 0x0feacd18 <__GI__IO_list_lock+56>: lwz r0,8(r31)
> 0x0feacd1c <__GI__IO_list_lock+60>: cmpw cr7,r0,r29
> 0x0feacd20 <__GI__IO_list_lock+64>: beq- cr7,0xfeacd4c <__GI__IO_list_lock+108>
>
> beg-> 0x0feacd24 <__GI__IO_list_lock+68>: lwarx r0,0,r31
> 0x0feacd28 <__GI__IO_list_lock+72>: cmpw r0,r11
> 0x0feacd2c <__GI__IO_list_lock+76>: bne- 0xfeacd38 <__GI__IO_list_lock+88>
> 0x0feacd30 <__GI__IO_list_lock+80>: stwcx. r9,0,r31
> end-> 0x0feacd34 <__GI__IO_list_lock+84>: bne+ 0xfeacd24 <__GI__IO_list_lock+68>
>
> I don't even know whether this is user-space bug or kernel bug,
> the asm above is the black magic for me.
When I use gdb to step over __GI__IO_list_lock(), it doesn't loop.
I straced gdb and noticed that when the trace reaches
0x0feacd24: lwarx r0,0,r31
gdb does PTRACE_CONT, not PTRACE_SINGLESTEP. After that the child
stops at 0x0feacd38, the next insn (isync).
> Anyone who knows something about powerpc can give me a hint?
Please ;)
Oleg.
Oleg Nesterov writes:
> 0xfeacd24
> 0xfeacd28
> 0xfeacd2c
> 0xfeacd30
> 0xfeacd34
> ...
>
> and so on forever,
...
> beg-> 0x0feacd24 <__GI__IO_list_lock+68>: lwarx r0,0,r31
> 0x0feacd28 <__GI__IO_list_lock+72>: cmpw r0,r11
> 0x0feacd2c <__GI__IO_list_lock+76>: bne- 0xfeacd38 <__GI__IO_list_lock+88>
> 0x0feacd30 <__GI__IO_list_lock+80>: stwcx. r9,0,r31
> end-> 0x0feacd34 <__GI__IO_list_lock+84>: bne+ 0xfeacd24 <__GI__IO_list_lock+68>
>
> I don't even know whether this is user-space bug or kernel bug,
> the asm above is the black magic for me.
The lwarx and stwcx. work together to do an atomic update to the word
whose address is in r31. They are like LL (load-linked) and SC
(store-conditional) on other architectures such as alpha. Basically
the lwarx creates an internal "reservation" on the word pointed to by
r31 and loads its value into r0. The stwcx. stores into that word but
only if the reservation still exists. The reservation gets cleared
(in hardware) if any other cpu writes to that word in the meantime.
If the reservation did get cleared, the bne (branch if not equal)
instruction will be taken and we loop around to try again.
There is a difficulty when single-stepping through such a sequence
because the process of taking the single-step exception and returning
will clear the reservation. Thus if you single-step through that
sequence it will never succeed. I believe gdb has code to recognize
this kind of sequence and run through it without stopping until after
the bne, precisely to avoid this problem.
Paul.
On 11/27, Paul Mackerras wrote:
>
> Oleg Nesterov writes:
>
> > 0xfeacd24
> > 0xfeacd28
> > 0xfeacd2c
> > 0xfeacd30
> > 0xfeacd34
> > ...
> >
> > and so on forever,
> ...
> > beg-> 0x0feacd24 <__GI__IO_list_lock+68>: lwarx r0,0,r31
> > 0x0feacd28 <__GI__IO_list_lock+72>: cmpw r0,r11
> > 0x0feacd2c <__GI__IO_list_lock+76>: bne- 0xfeacd38 <__GI__IO_list_lock+88>
> > 0x0feacd30 <__GI__IO_list_lock+80>: stwcx. r9,0,r31
> > end-> 0x0feacd34 <__GI__IO_list_lock+84>: bne+ 0xfeacd24 <__GI__IO_list_lock+68>
> >
> > I don't even know whether this is user-space bug or kernel bug,
> > the asm above is the black magic for me.
>
> The lwarx and stwcx. work together to do an atomic update to the word
> whose address is in r31. They are like LL (load-linked) and SC
> (store-conditional) on other architectures such as alpha. Basically
> the lwarx creates an internal "reservation" on the word pointed to by
> r31 and loads its value into r0. The stwcx. stores into that word but
> only if the reservation still exists. The reservation gets cleared
> (in hardware) if any other cpu writes to that word in the meantime.
> If the reservation did get cleared, the bne (branch if not equal)
> instruction will be taken and we loop around to try again.
>
> There is a difficulty when single-stepping through such a sequence
> because the process of taking the single-step exception and returning
> will clear the reservation. Thus if you single-step through that
> sequence it will never succeed. I believe gdb has code to recognize
> this kind of sequence and run through it without stopping until after
> the bne, precisely to avoid this problem.
Thanks! This explains everything, I think.
Could you look at this
ptrace-copy_process-should-disable-stepping.patch
http://marc.info/?l=linux-mm-commits&m=125789789322573
patch? It is not clear to me how we can modify the test-case to
verify it fixes the original problem for powerpc.
At least, do you think this patch is good for powerpc ?
Oleg.
Paul Mackerras <[email protected]> writes:
> I believe gdb has code to recognize this kind of sequence and run
> through it without stopping until after the bne, precisely to avoid
> this problem.
See gdb/rs6000-tdep.c:ppc_deal_with_atomic_sequence.
Andreas.
--
Andreas Schwab, [email protected]
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
On Thu, Nov 26, 2009 at 03:50:51PM +0100, Oleg Nesterov wrote:
> I changed the subject. This bug has nothing to do with utrace,
> the kernel fails with or without these changes.
>
> On 11/26, Ananth N Mavinakayanahalli wrote:
> >
> > On Wed, Nov 25, 2009 at 04:40:52PM +0100, Oleg Nesterov wrote:
> > > On 11/25, Ananth N Mavinakayanahalli wrote:
> > > >
> > > > step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
> > > > /bin/sh: line 5: 24803 Aborted ${dir}$tst
> > > > FAIL: step-fork
> > >
> > > This is expected. Should be fixed by
> > >
> > > ptrace-copy_process-should-disable-stepping.patch
> > >
> > > in -mm tree. (I am attaching this patch below just in case)
> > > I din't mention this patch in this series because this bug
> > > is "ortogonal" to utrace/ptrace.
> >
> > The patch doesn't seem to fix the issue on powerpc:
> >
> > step-fork: step-fork.c:56: handler_fail: Assertion `0' failed.
> > /bin/sh: line 5: 17325 Aborted ${dir}$tst
> > FAIL: step-fork
>
> Good to know, thanks again Ananth.
>
> I'll take a look. Since I know nothing about powerpc, I can't
> promise the quick fix ;)
>
> The bug was found by code inspection, but the fix is not trivial
> because it depends on arch/, and it turns out the arch-independent
> fix in
>
> ptrace-copy_process-should-disable-stepping.patch
> http://marc.info/?l=linux-mm-commits&m=125789789322573
>
> doesn't work.
>
> Ananth, could you please run the test-case from the changelog
> below ? I do not really expect this can help, but just in case.
Right, it doesn't help :-(
GDB shows that the parent is forever struck at wait().
Ananth
On Thu, Nov 26, 2009 at 01:24:41PM +0100, Ingo Molnar wrote:
> FYI, the merge window has not opened yet, so it cannot close in a few
> days.
subsystems merged window, not Linus'.
>
> > [...] and thus not getting any of the broad -next test coverage is a
> > pretty bad idea. In the end it will be the maintainers ruling but
> > that doesn't make it a good idea from the engineering point of view.
>
> FYI, it's been in -mm, that's where it's maintained.
None of the recent mm snapshots has anything utrace related in there,
just a few ptrace patches from Oleg (which are in this series but a very
small part of it) and certainly not all this new code that is pretty
recent (take a look at the utrace list for the development).
> Yes. Which is a further argument to not do it like that but to do one
> arch at a time. Trying to do too much at once is bad engineering.
I'm not sure why you're trying to pick fights here, but no one has said
about doing it all in once. The point I'm trying to make is that it's
pretty bad to keep parallel ptrace implementations, and we should settle
on one. A pre-requisite of using the new once genericly is to have the
architecture ptrace code updated. I think arm and mips are the two
only relevant ones still missing, so updating them and killing the other
ones is easy.
If you think keeping the two ptrace implementations is fine argue for
that directly, but please stick to the technical points instead of just
fighting for fightings sake.
On 11/27, Christoph Hellwig wrote:
>
> On Thu, Nov 26, 2009 at 01:24:41PM +0100, Ingo Molnar wrote:
> >
> > FYI, it's been in -mm, that's where it's maintained.
>
> None of the recent mm snapshots has anything utrace related in there,
Well, not that I think this is important, but...
Two weeks ago we asked Andrew do drop utrace-core.patch from -mm,
it should be replaced by this updated version.
Oleg.
On 11/27, Ananth N Mavinakayanahalli wrote:
>
> On Thu, Nov 26, 2009 at 03:50:51PM +0100, Oleg Nesterov wrote:
>
> > Ananth, could you please run the test-case from the changelog
> > below ? I do not really expect this can help, but just in case.
>
> Right, it doesn't help :-(
>
> GDB shows that the parent is forever struck at wait().
Now this is interesting. Could you please double check the parent hangs
in wait() ?
This doesn't match the testing we did on powerpc machine with Veaceslav,
and I hoped the problem was already resolved?
Please see other emails in this thread.
Hmm. Fortunately I still have the access to the testing machine.
Yes, according to gdb it looks as if it "hangs" in wait(). This
is not true. You can strace gdb itself, or look at xxx_ctxt_switches
in /proc/pid_of_parent/status.
Better yet, do not use gdb at all. Just strace (without -f) the parent,
you should see it continues to trace the child and loops forever.
Oleg.
On Thu, Nov 26, 2009 at 11:37:03PM +0100, Oleg Nesterov wrote:
>
> Could you look at this
>
> ptrace-copy_process-should-disable-stepping.patch
> http://marc.info/?l=linux-mm-commits&m=125789789322573
>
> patch? It is not clear to me how we can modify the test-case to
> verify it fixes the original problem for powerpc.
I modified the test-case, it confirms that
ptrace-copy_process-should-disable-stepping.patch fixes the
problem with TIF_SINGLESTEP copied by fork() on powerpc.
Probably we need a similar fix for step-fork.c in ptrace-tests.
Modified the original testcase to call fork via syscall(__NR_fork),
to avoid the looping inside libc's fork() on powerpc.
The parent singlesteps until he sees that the child has forked, after
that the parent PTRACE_CONTs until the child exits.
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>
#include <sys/syscall.h>
int main(void)
{
void *addr_after_fork = &&after_fork;
int pid, status;
if (!(pid = fork())) {
assert(ptrace(PTRACE_TRACEME) == 0);
kill(getpid(), SIGSTOP);
if (!syscall(__NR_fork)) {
/* kernel bug: this child will be killed by SIGTRAP */
printf("Hello world\n");
return 43;
}
after_fork: wait(&status);
return WEXITSTATUS(status);
}
for (;;) {
siginfo_t info;
assert(pid == wait(&status));
assert(ptrace(PTRACE_GETSIGINFO, pid, 0,&info) == 0);
if (info.si_addr == addr_after_fork)
break;
assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
}
for (;;) {
if (WIFEXITED(status))
break;
assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
assert(pid == wait(&status));
}
assert(WEXITSTATUS(status) == 43);
return 0;
}
--
Veaceslav
* Christoph Hellwig <[email protected]> wrote:
> > Yes. Which is a further argument to not do it like that but to do
> > one arch at a time. Trying to do too much at once is bad
> > engineering.
>
> I'm not sure why you're trying to pick fights here, [...]
I am advocating proper engineering practices - not sure why you
characterise it as 'picking fights'.
> [...] but no one has said about doing it all in once. [...]
To quote your mail in (<[email protected]>):
> > [...] I'd suggest you make sure the remaining two major
> > architectures (arm and mips) get converted [...]
Spreading the impact of these changes to even more architectures is bad,
for the reasons i outlined in my previous mail.
Thanks,
Ingo
On Fri, Nov 27, 2009 at 04:05:31PM +0100, Oleg Nesterov wrote:
> On 11/27, Ananth N Mavinakayanahalli wrote:
> >
> > On Thu, Nov 26, 2009 at 03:50:51PM +0100, Oleg Nesterov wrote:
> >
> > > Ananth, could you please run the test-case from the changelog
> > > below ? I do not really expect this can help, but just in case.
> >
> > Right, it doesn't help :-(
> >
> > GDB shows that the parent is forever struck at wait().
>
> Now this is interesting. Could you please double check the parent hangs
> in wait() ?
>
> This doesn't match the testing we did on powerpc machine with Veaceslav,
> and I hoped the problem was already resolved?
>
> Please see other emails in this thread.
>
>
> Hmm. Fortunately I still have the access to the testing machine.
> Yes, according to gdb it looks as if it "hangs" in wait(). This
> is not true. You can strace gdb itself, or look at xxx_ctxt_switches
> in /proc/pid_of_parent/status.
>
> Better yet, do not use gdb at all. Just strace (without -f) the parent,
> you should see it continues to trace the child and loops forever.
Yes, I can see it looping.
However, Veaceslav's modified testcase in
http://lkml.org/lkml/2009/11/27/215 passes on the machine with
ptrace-copy_process-should-disable-stepping.patch.
Ananth
On Fri, Nov 27, 2009 at 06:46:27PM +0100, Veaceslav Falico wrote:
> On Thu, Nov 26, 2009 at 11:37:03PM +0100, Oleg Nesterov wrote:
> >
> > Could you look at this
> >
> > ptrace-copy_process-should-disable-stepping.patch
> > http://marc.info/?l=linux-mm-commits&m=125789789322573
> >
> > patch? It is not clear to me how we can modify the test-case to
> > verify it fixes the original problem for powerpc.
>
> I modified the test-case, it confirms that
> ptrace-copy_process-should-disable-stepping.patch fixes the
> problem with TIF_SINGLESTEP copied by fork() on powerpc.
>
> Probably we need a similar fix for step-fork.c in ptrace-tests.
>
> Modified the original testcase to call fork via syscall(__NR_fork),
> to avoid the looping inside libc's fork() on powerpc.
> The parent singlesteps until he sees that the child has forked, after
> that the parent PTRACE_CONTs until the child exits.
Thanks Veaceslav. This works:
Index: ptrace-tests/tests/step-fork.c
===================================================================
--- ptrace-tests.orig/tests/step-fork.c
+++ ptrace-tests/tests/step-fork.c
@@ -29,6 +29,7 @@
#include <unistd.h>
#include <sys/wait.h>
#include <string.h>
+#include <sys/syscall.h>
#include <signal.h>
#ifndef PTRACE_SINGLESTEP
@@ -78,7 +79,7 @@ main (int argc, char **argv)
sigprocmask (SIG_BLOCK, &mask, NULL);
ptrace (PTRACE_TRACEME);
raise (SIGUSR1);
- if (fork () == 0)
+ if (syscall(__NR_fork) == 0)
{
read (-1, NULL, 0);
_exit (22);
Oleg,
With the above patch applied, syscall-reset is the only failure I see on
powerpc:
errno 14 (Bad address)
syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location
()) == 38' failed.
unexpected child status 67f
FAIL: syscall-reset
...
========================================
1 of 40 tests failed
(11 tests were not run)
Please report to [email protected]
========================================
Ananth
On Wed 2009-11-25 16:48:18, Christoph Hellwig wrote:
> On Tue, Nov 24, 2009 at 09:01:27PM +0100, Oleg Nesterov wrote:
> > Hello.
> >
> > This is the new iteration of Roland's utrace patch, this time
> > with "rewrite-ptrace-via-utrace" + cleanups in utrace core.
> >
> > 1-7 are already in -mm tree, I am sending them to simplify the
> > review.
> >
> > 8-12 don not change the behaviour, simple preparations.
> >
> > 13-14 add utrace-ptrace and utrace
>
> Skipped over it very, very briefly. One thing I really hate about this
> is that it introduces two ptrace implementation by adding the new one
> without removing the old one. Given that's it's pretty much too later
> for the 2.6.33 cycle anyway I'd suggest you make sure the remaining
> two major architectures (arm and mips) get converted, and if the
> remaining minor architectures don't manage to get their homework done
> they're left without ptrace.
I don't think introducing regressions to force people to rewrite code
is a good way to go...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On 11/28, Ananth N Mavinakayanahalli wrote:
>
> syscall-reset is the only failure I see on
> powerpc:
>
> errno 14 (Bad address)
> syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location
> ()) == 38' failed.
> unexpected child status 67f
> FAIL: syscall-reset
(to remind, it also fails without utrace)
Once again, I know nothing about powerc, perhaps I misread the code,
but I believe this test-case is just wrong on powerpc and should be
fixed.
On powerpc, syscall_get_nr() returns regs->gpr[0], this means this
register is used to pass the syscall number.
This matches do_syscall_trace_enter(), it returns regs->gpr[0] as a
(possibly changed by tracer) syscall nr.
arch/powerpc/kernel/entry_64.S does
syscall_dotrace:
bl .do_syscall_trace_enter
mr r0,r3 // I guess, r3 = r0 ?
...
b syscall_dotrace_cont
syscall_dotrace_cont:
syscall_dotrace_cont:
cmpldi 0,r0,NR_syscalls
bge- syscall_enosys
syscall_enosys:
li r3,-ENOSYS
b syscall_exit
Now return to the test-case, syscall-reset.c. The tracee does
l = syscall (-23, 1, 2, 3) and stops.
The tracer does
#define RETREG offsetof(struct pt_regs, gpr[0])
#define NEWVAL ((long) ENOTTY)
l = ptrace(PTRACE_PEEKUSER, child, RETREG, 0l);
l == -23, this is correct, note syscall(-23) above.
l = ptrace(PTRACE_POKEUSER, child, RETREG, NEWVAL);
And expects the tracee will see NEWVAL==ENOTTY after return from
the systame call.
Of course this can't happen. We changed the syscall number, the
new value is ENOTTY == 25 == __NR_stime, sys_stime() correctly
returns -EFAULT.
-----------------------------------------------------------------
If I change the test-case to use NEWVAL == 1000 (or any other value
greater than NR_syscalls), then the tracee sees ENOSYS and this is
correct too.
But I do not see how it is possible to change the retcode on powerpc.
Unlike x86, powepc doesn't set -ENOSYS "in advance", before doing
do_syscall_trace_enter() logic. This means that if the tracer "cancels"
syscall, r3 will be overwritten by syscall_enosys.
This probably means the kernel should be fixed too, but I am not
brave enough to change the asm which I can't understand ;)
Oleg.
On Sun, 2009-11-29 at 22:07 +0100, Oleg Nesterov wrote:
> On 11/28, Ananth N Mavinakayanahalli wrote:
> >
> > syscall-reset is the only failure I see on
> > powerpc:
> >
> > errno 14 (Bad address)
> > syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location
> > ()) == 38' failed.
> > unexpected child status 67f
> > FAIL: syscall-reset
>
> (to remind, it also fails without utrace)
>
> Once again, I know nothing about powerc, perhaps I misread the code,
> but I believe this test-case is just wrong on powerpc and should be
> fixed.
>
> On powerpc, syscall_get_nr() returns regs->gpr[0], this means this
> register is used to pass the syscall number.
Correct.
> This matches do_syscall_trace_enter(), it returns regs->gpr[0] as a
> (possibly changed by tracer) syscall nr.
>
> arch/powerpc/kernel/entry_64.S does
>
> syscall_dotrace:
>
> bl .do_syscall_trace_enter
> mr r0,r3 // I guess, r3 = r0 ?
r3 is the return value from a function so this replaces the
syscall number
> ...
> b syscall_dotrace_cont
>
> syscall_dotrace_cont:
>
> syscall_dotrace_cont:
>
> cmpldi 0,r0,NR_syscalls
> bge- syscall_enosys
>
> syscall_enosys:
>
> li r3,-ENOSYS
> b syscall_exit
>
>
> Now return to the test-case, syscall-reset.c. The tracee does
> l = syscall (-23, 1, 2, 3) and stops.
>
> The tracer does
>
> #define RETREG offsetof(struct pt_regs, gpr[0])
> #define NEWVAL ((long) ENOTTY)
>
> l = ptrace(PTRACE_PEEKUSER, child, RETREG, 0l);
>
> l == -23, this is correct, note syscall(-23) above.
>
> l = ptrace(PTRACE_POKEUSER, child, RETREG, NEWVAL);
>
> And expects the tracee will see NEWVAL==ENOTTY after return from
> the systame call.
>
> Of course this can't happen. We changed the syscall number, the
> new value is ENOTTY == 25 == __NR_stime, sys_stime() correctly
> returns -EFAULT.
>
> -----------------------------------------------------------------
>
> If I change the test-case to use NEWVAL == 1000 (or any other value
> greater than NR_syscalls), then the tracee sees ENOSYS and this is
> correct too.
>
> But I do not see how it is possible to change the retcode on powerpc.
> Unlike x86, powepc doesn't set -ENOSYS "in advance", before doing
> do_syscall_trace_enter() logic. This means that if the tracer "cancels"
> syscall, r3 will be overwritten by syscall_enosys.
>
> This probably means the kernel should be fixed too, but I am not
> brave enough to change the asm which I can't understand ;)
Yes, the asm should be changed. I suppose we could check if the result
of do_syscall_trace_enter is negative, and if it is, branch to the exit
path using r3 as the error code. Would that be ok ?
Something like this:
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1175a85..7a88c88 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -419,6 +419,9 @@ syscall_dotrace:
stw r0,_TRAP(r1)
addi r3,r1,STACK_FRAME_OVERHEAD
bl do_syscall_trace_enter
+ cmpwi cr0,r3,0
+ blt ret_from_syscall
+
/*
* Restore argument registers possibly just changed.
* We use the return value of do_syscall_trace_enter
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 9763267..ec709a7 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -240,6 +240,9 @@ syscall_dotrace:
bl .save_nvgprs
addi r3,r1,STACK_FRAME_OVERHEAD
bl .do_syscall_trace_enter
+ cmpdi cr0,r3,0
+ blt syscall_exit
+
/*
* Restore argument registers possibly just changed.
* We use the return value of do_syscall_trace_enter
Cheers,
Ben.
On Mon, 2009-11-30 at 10:15 +1100, Benjamin Herrenschmidt wrote:
> Yes, the asm should be changed. I suppose we could check if the result
> of do_syscall_trace_enter is negative, and if it is, branch to the exit
> path using r3 as the error code. Would that be ok ?
>
> Something like this:
Note however that there's a trace exit too and that's normally the right
place to alter the result don't you think ?
Cheers,
Ben.
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 1175a85..7a88c88 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -419,6 +419,9 @@ syscall_dotrace:
> stw r0,_TRAP(r1)
> addi r3,r1,STACK_FRAME_OVERHEAD
> bl do_syscall_trace_enter
> + cmpwi cr0,r3,0
> + blt ret_from_syscall
> +
> /*
> * Restore argument registers possibly just changed.
> * We use the return value of do_syscall_trace_enter
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 9763267..ec709a7 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -240,6 +240,9 @@ syscall_dotrace:
> bl .save_nvgprs
> addi r3,r1,STACK_FRAME_OVERHEAD
> bl .do_syscall_trace_enter
> + cmpdi cr0,r3,0
> + blt syscall_exit
> +
> /*
> * Restore argument registers possibly just changed.
> * We use the return value of do_syscall_trace_enter
>
>
> Cheers,
> Ben.
On 11/30, Benjamin Herrenschmidt wrote:
>
> On Mon, 2009-11-30 at 10:15 +1100, Benjamin Herrenschmidt wrote:
>
> > Yes, the asm should be changed. I suppose we could check if the result
> > of do_syscall_trace_enter is negative, and if it is, branch to the exit
> > path using r3 as the error code. Would that be ok ?
> >
> > Something like this:
>
> Note however that there's a trace exit too and that's normally the right
> place to alter the result don't you think ?
Yes, the result can be changed when the tracee reports syscall-exit.
Should powerpc allow this on syscall-entry? I do not know. x86 does,
and we have this test-case which assumes powerpc should allow too.
But when it comes to ptrace I can almost never know what was the
supposed behaviour/api.
Jan, Roland, what do you think?
Oleg.
On 11/30, Benjamin Herrenschmidt wrote:
>
> On Sun, 2009-11-29 at 22:07 +0100, Oleg Nesterov wrote:
> >
> > If I change the test-case to use NEWVAL == 1000 (or any other value
> > greater than NR_syscalls), then the tracee sees ENOSYS and this is
> > correct too.
> >
> > But I do not see how it is possible to change the retcode on powerpc.
> > Unlike x86, powepc doesn't set -ENOSYS "in advance", before doing
> > do_syscall_trace_enter() logic. This means that if the tracer "cancels"
> > syscall, r3 will be overwritten by syscall_enosys.
> >
> > This probably means the kernel should be fixed too, but I am not
> > brave enough to change the asm which I can't understand ;)
>
> Yes, the asm should be changed. I suppose we could check if the result
> of do_syscall_trace_enter is negative, and if it is, branch to the exit
> path using r3 as the error code. Would that be ok ?
>
> Something like this:
>
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -240,6 +240,9 @@ syscall_dotrace:
> bl .save_nvgprs
> addi r3,r1,STACK_FRAME_OVERHEAD
> bl .do_syscall_trace_enter
> + cmpdi cr0,r3,0
> + blt syscall_exit
> +
Yes, but this doesn't allow to a) cancel this syscall and b) make it
return a non-negative result to the tracee.
Perhaps poweprc should set pt_regs->result = -ENOSYS before calling
do_syscall_trace_enter() like x86 does ? (in this case syscall_exit()
shouldn't change RESULT(r1)). This way the tracer can change both
pt_regs->result and gpr[0] independently.
Oleg.
We don't really intend to impose any new requirements on the arch behavior
here. It's up to the arch folks to decide. It does simplify life somewhat
on x86 that you can change the registers at the syscall-entry stop and then
after skipping the syscall, those registers will be unchanged from what you
set. But it's never been that way on every other arch, and it doesn't need
to be. The arch requirement on the tracehook_report_syscall_entry() return
value handling is that it make the syscall not be run, and that the
register state then left be compatible with using syscall_rollback() at the
tracehook_report_syscall_exit() spot.
As to what you get from ptrace explicitly fiddling with registers at
syscall entry, that has always been arch-specific before and we haven't
done anything to change that situation. On every arch, there is some way
to change the syscall number to be run, and changing it to a known-bogus
number (e.g. -1) makes sure no syscall runs. On every arch, it's possible
at the tracehook_report_syscall_exit() spot to change the registers to
exactly whatever you want userland to see. That's enough as it stands to
make it possible to do whatever you want, some way or other.
If the powerpc maintainers want to change the behavior here, that is fine
by me. But there is no need for that just to satisfy general ptrace
cleanups (or utrace). Normal concerns require that no such change break
the ptrace behavior that userland could have relied on in the past.
So off hand I don't see a reason to change at all. If every arch were to
change so that registers changed at syscall-entry were left unmolested by
aborting the syscall, then that might be a new consistency worth having.
But short of that, I don't really see a benefit.
All this implies that the ptrace-tests case relating to this needs to be
tailored differently for powerpc and each other arch so it expects and
verifies exactly the arch-specific behavior that's been seen in the past.
Thanks,
Roland
On Tue, 2009-12-01 at 11:27 -0800, Roland McGrath wrote:
>
> If the powerpc maintainers want to change the behavior here, that is fine
> by me. But there is no need for that just to satisfy general ptrace
> cleanups (or utrace). Normal concerns require that no such change break
> the ptrace behavior that userland could have relied on in the past.
>
> So off hand I don't see a reason to change at all. If every arch were to
> change so that registers changed at syscall-entry were left unmolested by
> aborting the syscall, then that might be a new consistency worth having.
> But short of that, I don't really see a benefit.
>
> All this implies that the ptrace-tests case relating to this needs to be
> tailored differently for powerpc and each other arch so it expects and
> verifies exactly the arch-specific behavior that's been seen in the past.
Ok thanks. I'm happy to not change it then, the risk of breaking some
existing assumption is too high in my book.
Cheers,
Ben.
Note to all: I'm on the road this week (having had a holiday last week)
and will be somewhat slow in replying on these threads, but I will be
sure to get to them all.
> Yes, nobody likes 2 implementations. I guess Roland and me hate
> CONFIG_UTRACE much more than anybody else.
:-) We both hate maintaining the old ptrace implementation, that is true!
The other thing we hate is having our work delayed arbitrarily again and
again to wait for the arch maintainers of arch's we don't use ourselves.
Oleg and I have been trying to follow the advice we get on how to get this
work merged in. When multiple gate-keepers give conflicting advice, we
really don't know what to do next. Our interest is in whatever path
smooths the merging of our work. We'd greatly prefer to spend our time
hacking on our new code to make it better or different in cool and useful
ways than to spend more time guessing what order of patches and rejuggling
of the work will fit the changing whims of the next round of review.
We don't want to rush anyone, like uninterested arch maintainers. We don't
want to break anyone who doesn't care about our work (we do test for ptrace
regressions but of course new code will always have new bugs so some
instances of instability in the transition to a new ptrace implementation
have to be expected no matter how hard we try). We just don't want to keep
working out of tree.
The advantage of making the new code optional is, obviously, that you can
turn it off. That is, lagging arch's won't be broken, just unable to turn
it on. And, anyone who doesn't want to try anything new yet can just turn
it off and not be exposed to new code.
The advantage of making the new code nonoptional is, obviously, that you
can't turn it off. The old implementation will be gone and we won't have
to maintain it any more (outside some -stable streams for a while). The
kernel source will be cleaner with no optional old cruft code duplicating
functionality. All ptrace users will be testing the new code, and so we'll
find its bugs and gain confidence that it works solidly.
Like I've said, we want to do whatever the people want. My concern about
what Christoph has suggested is that it really seems like an open-ended
delay depending on some arch people who are not even in the conversation.
For anyone either who likes utrace or who is concerned about its details,
the sooner we are working in-tree the sooner we can really hash it out
thoroughly and get to having merged code that works well. As long as there
is anything unfinished like arch work that we've decided is a gating event,
then the review and hashing out of the new code itself does not seem to get
very serious. (To wit, this thread is still talking about merge order and
such, another long thread was about incidental trivia, and we've only just
started to see a tiny bit of actual code review today.)
Thanks,
Roland