I'm sending this mail again, because unfortunately I didn't receive
any reply. It was sent the first time at April, 5th.
Regards, Bodo
Hi,
currently I'm porting UML to s390 31-bit.
A first 2.6.11 UML system already is running in UML-SKAS0 mode,
which normally should run on an unpatched host (no skas3-patch).
To make UML build and run on s390, I needed to do these two little
changes (the patches are copy and paste. I hope that doesn't hurt,
since they are very small):
1) UML includes some of the subarch's (s390) headers. I had to
change one of them with the following one-liner, to make this
compile. AFAICS, this change doesn't break compilation of s390
itself.
==============================================================================
--- linux-2.6.11.orig/include/asm-s390/user.h 2004-12-09 18:45:02.000000000 +0100
+++ linux-2.6.11/include/asm-s390/user.h 2004-12-09 18:48:11.000000000 +0100
@@ -10,7 +10,7 @@
#define _S390_USER_H
#include <asm/page.h>
-#include <linux/ptrace.h>
+#include <asm/ptrace.h>
/* Core file format: The core file is written in such a way that gdb
can understand it and provide useful information to the user (under
linux we use the 'trad-core' bfd). There are quite a number of
==============================================================================
2) UML needs to intercept syscalls via ptrace to invalidate the syscall,
read syscall's parameters and write the result with the result of
UML's syscall processing. Also, UML needs to make sure, that the host
does no syscall restart processing. On i386 for example, this can be
done by writing -1 to orig_eax on the 2nd syscall interception
(orig_eax is the syscall number, which after the interception is used
as a "interrupt was a syscall" flag only.
Unfortunately, s390 holds syscall number and syscall result in gpr2 and
its "interrupt was a syscall" flag (trap) is unreachable via ptrace.
So I changed the host to set trap to -1, if the syscall number is written
to a negative value on the first syscall interception.
I hope, this adds what UML needs without changing ptrace visibly to other
ptrace users.
(This isn't tested on a 2.6 host yet, because my host still is a 2.4.21 SuSE.
But I've adapted this change to 2.4 and tested, it works.)
==============================================================================
--- linux-2.6.11.orig/arch/s390/kernel/ptrace.c 2005-04-04 18:57:38.000000000 +0200
+++ linux-2.6.11/arch/s390/kernel/ptrace.c 2005-04-04 19:01:51.000000000 +0200
@@ -726,6 +726,13 @@
? 0x80 : 0));
/*
+ * If debugger has set an invalid syscall number,
+ * we prepare to skip syscall restart handling
+ */
+ if (!entryexit && (long )regs->gprs[2] < 0 )
+ regs->trap = -1;
+
+ /*
* this isn't the same as continuing with a signal, but it will do
* for normal use. strace only continues with a signal if the
* stopping signal is not SIGTRAP. -brl
==============================================================================
It would be very helpful for me, if these changes could go into s390 mainline.
If there is something wrong with them, please help me find a better solution.
Regards, Bodo
Bodo Stroesser <[email protected]> wrote on 04/27/2005
10:21:58 PM:
> I'm sending this mail again, because unfortunately I didn't receive
> any reply. It was sent the first time at April, 5th.
Sorry, I put it on the to-do pile and promptly forgot about it.
> currently I'm porting UML to s390 31-bit.
Nice...
> To make UML build and run on s390, I needed to do these two little
> changes (the patches are copy and paste. I hope that doesn't hurt,
> since they are very small):
>
> 1) UML includes some of the subarch's (s390) headers. I had to
> change one of them with the following one-liner, to make this
> compile. AFAICS, this change doesn't break compilation of s390
> itself.
This one isn't a problem. I'll add it to the repository.
> 2) UML needs to intercept syscalls via ptrace to invalidate the syscall,
> read syscall's parameters and write the result with the result of
> UML's syscall processing. Also, UML needs to make sure, that the host
> does no syscall restart processing. On i386 for example, this can be
> done by writing -1 to orig_eax on the 2nd syscall interception
> (orig_eax is the syscall number, which after the interception is used
> as a "interrupt was a syscall" flag only.
> Unfortunately, s390 holds syscall number and syscall result in gpr2 and
> its "interrupt was a syscall" flag (trap) is unreachable via ptrace.
> So I changed the host to set trap to -1, if the syscall number is written
> to a negative value on the first syscall interception.
> I hope, this adds what UML needs without changing ptrace visibly to other
> ptrace users.
> (This isn't tested on a 2.6 host yet, because my host still is a 2.4.21 SuSE.
> But I've adapted this change to 2.4 and tested, it works.)
>
>
> ==============================================================================
> --- linux-2.6.11.orig/arch/s390/kernel/ptrace.c 2005-04-04 18:57:
> 38.000000000 +0200
> +++ linux-2.6.11/arch/s390/kernel/ptrace.c 2005-04-04 19:01:51.
> 000000000 +0200
> @@ -726,6 +726,13 @@
> ? 0x80 : 0));
>
> /*
> + * If debugger has set an invalid syscall number,
> + * we prepare to skip syscall restart handling
> + */
> + if (!entryexit && (long )regs->gprs[2] < 0 )
> + regs->trap = -1;
> +
> + /*
> * this isn't the same as continuing with a signal, but it will do
> * for normal use. strace only continues with a signal if the
> * stopping signal is not SIGTRAP. -brl
> ==============================================================================
This patch is not good. !entryexit indicates that you want to change the trap
indication on the first of the two calls of syscall_trace for a system call. The
second condition is gprs[2] < 0 but that can be true for a normal system call as
well, like sys_exit(-1). It might even be true for user addresses if we really
extent the virtual address space to full 64 bit one day (and the hardware can do
it with a 5 level paging table). To change regs->trap to -1 with the current
condition is definitly wrong.
Independent from that it do not understand why you need it at all. If the
uml host intercepted and invalidated the guest system call the system restart
indication bit _TIF_RESTART_SVC shouldn't be set because the guest didn't
execute a system call.
blue skies,
Martin
Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH
Martin Schwidefsky wrote:
> Bodo Stroesser <[email protected]> wrote on 04/27/2005
> 10:21:58 PM:
>>1) UML includes some of the subarch's (s390) headers. I had to
>> change one of them with the following one-liner, to make this
>> compile. AFAICS, this change doesn't break compilation of s390
>> itself.
>
>
> This one isn't a problem. I'll add it to the repository.
Thank you!
>
>
>>==============================================================================
>>--- linux-2.6.11.orig/arch/s390/kernel/ptrace.c 2005-04-04 18:57:
>>38.000000000 +0200
>>+++ linux-2.6.11/arch/s390/kernel/ptrace.c 2005-04-04 19:01:51.
>>000000000 +0200
>>@@ -726,6 +726,13 @@
>> ? 0x80 : 0));
>>
>> /*
>>+ * If debugger has set an invalid syscall number,
>>+ * we prepare to skip syscall restart handling
>>+ */
>>+ if (!entryexit && (long )regs->gprs[2] < 0 )
>>+ regs->trap = -1;
>>+
>>+ /*
>> * this isn't the same as continuing with a signal, but it will do
>> * for normal use. strace only continues with a signal if the
>> * stopping signal is not SIGTRAP. -brl
>>==============================================================================
>
>
> This patch is not good. !entryexit indicates that you want to change the trap
> indication on the first of the two calls of syscall_trace for a system call. The
> second condition is gprs[2] < 0 but that can be true for a normal system call as
> well, like sys_exit(-1).
Sorry, that's not right. At that point, gprs[2] holds the syscall number, while the
first argument of the syscall is in origgpr2. If the debugger sets the syscall number
to -1, which is an invalid syscall, changing trap to -1 will result in a changed
behavior only in case, that the debugger on the second syscall interception also sets
the syscall result to ERESTARTXXXXX (This again is modifying gprs[2]). ERESTARTXXXXX
normally would/could be handled by do_signal(), but with the patch it no longer will.
So, I think the patch doesn't hurt in normal cases, but does the trick for UML.
> It might even be true for user addresses if we really
> extent the virtual address space to full 64 bit one day (and the hardware can do
> it with a 5 level paging table). To change regs->trap to -1 with the current
> condition is definitly wrong.
> Independent from that it do not understand why you need it at all. If the
> uml host intercepted and invalidated the guest system call the system restart
> indication bit _TIF_RESTART_SVC shouldn't be set because the guest didn't
> execute a system call.
Let my explain a bit more. UML invalidates UML-user's syscalls on the host, processes
the syscall itself and inserts the result into gprs[2] on the second syscall
interception. For nearly all syscalls ERESTARTXXXXX is a result not returned to user,
but handled in UML kernel internally. But that's not true for sys_(rt_)sigreturn.
The "result" of those is the original contents of gpr2 of the interrupted routine,
which accidentally also might be ERESTARTXXXXXXX (BTW, that's the reason for
sys_(rt_)sigreturn setting trap to -1 also). We skip UML's syscall restart handling
in this case, but we need to skip it in the host, too.
>
> blue skies,
> Martin
>
> Martin Schwidefsky
> Linux for zSeries Development & Services
> IBM Deutschland Entwicklung GmbH
Regards, Bodo
Bodo Stroesser <[email protected]> wrote on 04/28/2005
11:54:17 AM:
> > This patch is not good. !entryexit indicates that you want to change
the trap
> > indication on the first of the two calls of syscall_trace for a system
call. The
> > second condition is gprs[2] < 0 but that can be true for a normal
system call as
> > well, like sys_exit(-1).
> Sorry, that's not right. At that point, gprs[2] holds the syscall number,
while the
> first argument of the syscall is in origgpr2. If the debugger sets the
syscall number
> to -1, which is an invalid syscall, changing trap to -1 will result in a
changed
> behavior only in case, that the debugger on the second syscall
interception also sets
> the syscall result to ERESTARTXXXXX (This again is modifying gprs[2]).
ERESTARTXXXXX
> normally would/could be handled by do_signal(), but with the patch it no
longer will.
> So, I think the patch doesn't hurt in normal cases, but does the trick
for UML.
Yes, your are right. gprs[2] holds the syscall number for the debugger to
change.
So (!entryexit & regs->gprs[2] < 0) translates to the debugger changed the
guest
system call to something illegal on the first of the two ptrace calls. So
the
patch doesn't hurt for normal, non-ptraced operation but it might hurt
other
users of ptrace.
> > Independent from that it do not understand why you need it at all. If
the
> > uml host intercepted and invalidated the guest system call the system
restart
> > indication bit _TIF_RESTART_SVC shouldn't be set because the guest
didn't
> > execute a system call.
> Let my explain a bit more. UML invalidates UML-user's syscalls on the
host, processes
> the syscall itself and inserts the result into gprs[2] on the second
syscall
> interception. For nearly all syscalls ERESTARTXXXXX is a result not
returned to user,
> but handled in UML kernel internally. But that's not true for
sys_(rt_)sigreturn.
> The "result" of those is the original contents of gpr2 of the interrupted
routine,
> which accidentally also might be ERESTARTXXXXXXX (BTW, that's the reason
for
> sys_(rt_)sigreturn setting trap to -1 also). We skip UML's syscall
restart handling
> in this case, but we need to skip it in the host, too.
Ok, I think I've understood the problem now. What you are basically have is
a process running in a UML guest that happens to have -ERESTARTXXX in grp2
when it gets interrupted. A signal is delivered and on return from that
signal
with sys_(rt_)sigreturn >another< signal might be pending and then
do_signal
gets confused because of -ERESTARTXXX in grp2. For normal, non-uml
operation
restore_sigregs resets regs->trap to -1 which avoids the confusion. With
UML
the host intercepts sys_rt_sigreturn and does whatever needs to be done for
the guest >except< resetting regs->trap to -1. So the problem seems to be
that you need a ptrace interface to do that. I don't think it is a good
idea
to kludge syscall_trace to reset regs->trap under some conditions.
blue skies,
Martin
Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH
Martin Schwidefsky wrote:
> So (!entryexit & regs->gprs[2] < 0) translates to the debugger changed the
> guest
> system call to something illegal on the first of the two ptrace calls. So
> the
> patch doesn't hurt for normal, non-ptraced operation but it might hurt
> other
> users of ptrace.
I don't think, it hurts. If a debugger willingly sets the syscall number
to -1, what would happen without the patch?
The kernel will set the result -ENOSYS into grps[2]. So, even if trap
still indicates a syscall and a signal is pending, no syscall restarting
will be done.
With the patch, a debugger would observe changed behavior of the kernel
*only*, if it writes the syscall number to -1 on the first syscall
interception and then writes the result to ERESTARTXXXXX on the second,
while at the same time a signal is pending for the debugged process.
I assumed, that non of the current users of ptrace exactly does this.
If I'm wrong here, the patch *really* is bad.
> Ok, I think I've understood the problem now. What you are basically have is
> a process running in a UML guest that happens to have -ERESTARTXXX in grp2
> when it gets interrupted. A signal is delivered and on return from that
> signal
> with sys_(rt_)sigreturn >another< signal might be pending and then
> do_signal
> gets confused because of -ERESTARTXXX in grp2.
This other signal must be pending on the *host*, in UML, this might be
SIGVTALRM.
> For normal, non-uml operation
> restore_sigregs resets regs->trap to -1 which avoids the confusion. With
> UML
> the host intercepts sys_rt_sigreturn and does whatever needs to be done for
> the guest >except< resetting regs->trap to -1. So the problem seems to be
> that you need a ptrace interface to do that. I don't think it is a good
> idea
> to kludge syscall_trace to reset regs->trap under some conditions.
My idea was to enable the existing ptrace interface to do what UML
needs, without changing it in a way observable to other users of ptrace.
I expected my patch to exactly do that, but maybe I missed something.
Any better idea is welcome.
Regards, Bodo
Bodo Stroesser wrote:
> Martin Schwidefsky wrote:
>
>> So (!entryexit & regs->gprs[2] < 0) translates to the debugger changed
>> the
>> guest
>> system call to something illegal on the first of the two ptrace calls. So
>> the
>> patch doesn't hurt for normal, non-ptraced operation but it might hurt
>> other
>> users of ptrace.
>
> I don't think, it hurts. If a debugger willingly sets the syscall number
> to -1, what would happen without the patch?
> The kernel will set the result -ENOSYS into grps[2]. So, even if trap
> still indicates a syscall and a signal is pending, no syscall restarting
> will be done.
> With the patch, a debugger would observe changed behavior of the kernel
> *only*, if it writes the syscall number to -1 on the first syscall
> interception and then writes the result to ERESTARTXXXXX on the second,
> while at the same time a signal is pending for the debugged process.
>
> I assumed, that non of the current users of ptrace exactly does this.
> If I'm wrong here, the patch *really* is bad.
Addendum:
To avoid any conflicts as far as possible, the -1 written and checked
as the syscall number to reset trap could be replaced by some magic
value, which then should defined in asm/ptrace.h
In terms of performance, any method, that allows to reset trap
without an additional ptrace call, is fine.
Bodo
Bodo Stroesser <[email protected]> wrote on 04/28/2005
03:41:39 PM:
> Martin Schwidefsky wrote:
> > So (!entryexit & regs->gprs[2] < 0) translates to the debugger changed
the guest
> > system call to something illegal on the first of the two ptrace calls.
So the
> > patch doesn't hurt for normal, non-ptraced operation but it might hurt
other
> > users of ptrace.
> I don't think, it hurts. If a debugger willingly sets the syscall number
> to -1, what would happen without the patch?
> The kernel will set the result -ENOSYS into grps[2]. So, even if trap
> still indicates a syscall and a signal is pending, no syscall restarting
> will be done.
Why should the kernel set the result to -ENOSYS? If the debugger
invalidated
the system call, entry.S will just skip the system call and gprs[2] will
keep
the value that the debugger put there. But if regs->traps still indicates a
system call and there is another signal pending do_signal will try to
restart
the system call. In case of sys_(rt_)return there is no system call anymore
because the debugger = UML took care of it. That's why UML wants to have
access to regs->traps.
> With the patch, a debugger would observe changed behavior of the kernel
> *only*, if it writes the syscall number to -1 on the first syscall
> interception and then writes the result to ERESTARTXXXXX on the second,
> while at the same time a signal is pending for the debugged process.
In theory the debugger may want to intercept specific system calls and
do magic things instead but still want to have the system call restarting
to intercept it again after the (guest) signal has been delivered. You
just don't know.
> I assumed, that non of the current users of ptrace exactly does this.
> If I'm wrong here, the patch *really* is bad.
At least strace and gdb won't do it.
> > Ok, I think I've understood the problem now. What you are basically
have is
> > a process running in a UML guest that happens to have -ERESTARTXXX in
grp2
> > when it gets interrupted. A signal is delivered and on return from that
signal
> > with sys_(rt_)sigreturn >another< signal might be pending and then
do_signal
> > gets confused because of -ERESTARTXXX in grp2.
> This other signal must be pending on the *host*, in UML, this might be
> SIGVTALRM.
Why on the host ?!? Now I'm really confused. I thought the problem is the
regs->trap value in the guest system, how can a signal pending in the host
have an effect on the guest?
> > For normal, non-uml operation
> > restore_sigregs resets regs->trap to -1 which avoids the confusion.
With UML
> > the host intercepts sys_rt_sigreturn and does whatever needs to be done
for
> > the guest >except< resetting regs->trap to -1. So the problem seems to
be
> > that you need a ptrace interface to do that. I don't think it is a good
idea
> > to kludge syscall_trace to reset regs->trap under some conditions.
> My idea was to enable the existing ptrace interface to do what UML
> needs, without changing it in a way observable to other users of ptrace.
> I expected my patch to exactly do that, but maybe I missed something.
> Any better idea is welcome.
The patch does a very specific thing that UML needs, hardcoded into the
ptrace
interface. Who knows what kind of things other user of ptrace might need?
I don't claim to know, and that is why I don't like to see this done in the
syscall_ptrace function. Perhaps via peekusr/pokeuser interface but then
trap should be a member of struct user.
blue skies,
Martin
Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH
Martin Schwidefsky wrote:
> Bodo Stroesser <[email protected]> wrote on 04/28/2005
> 03:41:39 PM:
>
>
>>I don't think, it hurts. If a debugger willingly sets the syscall number
>>to -1, what would happen without the patch?
>>The kernel will set the result -ENOSYS into grps[2]. So, even if trap
>>still indicates a syscall and a signal is pending, no syscall restarting
>>will be done.
>
>
> Why should the kernel set the result to -ENOSYS? If the debugger
> invalidated
> the system call, entry.S will just skip the system call and gprs[2] will
> keep
> the value that the debugger put there.
You are right. I mixed it with i386, that has a separate field for the
result, which is preloaded with -ENOSYS. On s390, the -1 written as the
syscall number leads to entry.S calling nothing. If a signal is pending,
do_signal() will interprete gprs[2] as syscall's result. Since -1 is
-EPERM, no syscall restarting will be done, regardless the value of trap.
The debugger would have to write gprs[2] with -1 on the first syscall
interception *and* with ERESTARTXXXXX at the second syscall interception
to see any changed behavior.
> But if regs->traps still indicates a
> system call and there is another signal pending do_signal will try to
> restart
> the system call. In case of sys_(rt_)return there is no system call anymore
> because the debugger = UML took care of it. That's why UML wants to have
> access to regs->traps.
Yeah.
>>>A signal is delivered and on return from that
>>>signal
>>>with sys_(rt_)sigreturn >another< signal might be pending and then
>>>do_signal
>>>gets confused because of -ERESTARTXXX in grp2.
>>
>>This other signal must be pending on the *host*, in UML, this might be
>>SIGVTALRM.
>
>
> Why on the host ?!? Now I'm really confused. I thought the problem is the
> regs->trap value in the guest system, how can a signal pending in the host
> have an effect on the guest?
I think, your confusion results from the unclear terminology.
UML is a full Linux, running as some user-space processes on the host.
So, there can be signals pending in UML for the processes running in
UML, but the host won't even know about that.
On the other hand, there can be signals pending "in the host" for the
user-space processes running on the host, that make up the UML.
For example, UML intensely uses SIG(VT)ALRM as the analogue to host's
timer interrupt. UML users never will "see" these signals.
To make occur the problem, that results from host's unwanted syscall
restarting, some condition must be met:
1) A process running in UML and having ERESTARTXXXXX in its GPR2 must be
interrupted by a signal. As GPR2 never is ERESTARTXXXXX at the end of
a "normal" syscall, this can happen only if a "interrupt" to UML
(SIGVTALRM, SIGIO from the host to UML) hits this situation.
2) If interrupt processing in UML results in sending a signal in UML
to the interrupted process (e.g. alarm timeout triggered) or if
a signal already is pending in UML, UML starts the handler for the
signal in the interrupted process.
3) The sighandler returns by calling sys_(rt_)sigreturn.
4) The ptracing process of UML intercepts that syscall and invalidates it.
Then it does one further PTRACE_SYSCALL and waits, until the child
reaches the second syscall interception.
5) UML runs its own sys_(rt_)sigreturn for the process, skips its own
syscall restart processing and writes the registers of the child with
the values, that result from sys_(rt_)sigreturn processing. Now GPR2
is loaded with ERESTARTXXXXXX again.
6) The child is resumed with PTRACE_SYSCALL. If a further signal is
pending in the host for that child, the host runs do_signal().
As regs->trap still is set for a syscall, syscall restarting is
processed in the host, the process in UML will fail.
Obviously, this is a rare case. On i386, the syscall number is used as
trap, so -1 can be written to it at the second interception to skip
syscall restarting. Some months ago, UML/i386 did not yet use this, so
I wrote a litte program, that made the problem happen.
>
> The patch does a very specific thing that UML needs, hardcoded into the
> ptrace
> interface. Who knows what kind of things other user of ptrace might need?
I don't *know* this, too. But I still believe, that no current user of
ptrace will see the difference, as very specific ptrace operations have
to be done to trigger the change.
If there really would be no user of ptrace doing things conflicting with
the patch yet, there would be no reason to not insert the patch. AFAICS,
the patch doesn't make any useful operation impossible. So future conflicts
can be avoided by future users of ptrace. Additionally, the -1 could be
replaced by -ENOSYS or even a special magic number.
> I don't claim to know, and that is why I don't like to see this done in the
> syscall_ptrace function. Perhaps via peekusr/pokeuser interface but then
> trap should be a member of struct user.
As trap could be added to struct user at the end of struct user only, this
would result in an additional ptrace call in UML :-(
Is it safe to increase size of struct user? What about software being
recompiled partly (e.g. using a private lib which isn't recompiled; or the
lib is recompiled, while the program isn't).
So maybe an additional ptrace operation (PTRACE_SETTRAP?) would be better,
but still we would need one more syscall in UML.
Regards, Bodo
Bodo Stroesser <[email protected]> wrote on 04/28/2005
08:50:44 PM:
> 5) UML runs its own sys_(rt_)sigreturn for the process, skips its own
> syscall restart processing and writes the registers of the child with
> the values, that result from sys_(rt_)sigreturn processing. Now GPR2
> is loaded with ERESTARTXXXXXX again.
>
> 6) The child is resumed with PTRACE_SYSCALL. If a further signal is
> pending in the host for that child, the host runs do_signal().
> As regs->trap still is set for a syscall, syscall restarting is
> processed in the host, the process in UML will fail.
That is what I was after, the additional signal that causes the problem
is pending for the child, not for the ptrace father process.
> Obviously, this is a rare case. On i386, the syscall number is used as
> trap, so -1 can be written to it at the second interception to skip
> syscall restarting. Some months ago, UML/i386 did not yet use this, so
> I wrote a litte program, that made the problem happen.
The rare cases are always the most complicated ones. To make UML work
reliably this needs to get fixed.
> > I don't claim to know, and that is why I don't like to see this done in the
> > syscall_ptrace function. Perhaps via peekusr/pokeuser interface but then
> > trap should be a member of struct user.
> As trap could be added to struct user at the end of struct user only, this
> would result in an additional ptrace call in UML :-(
>
> Is it safe to increase size of struct user? What about software being
> recompiled partly (e.g. using a private lib which isn't recompiled; or the
> lib is recompiled, while the program isn't).
> So maybe an additional ptrace operation (PTRACE_SETTRAP?) would be better,
> but still we would need one more syscall in UML.
Yes, it is not a really good idea to add something to struct user. That will
affect the dump format and debugging tools. So it would be an additional ptrace
command like PTRACE_SETTRAP/PTRACE_GETTRAP. The only other solution I can think
of is to be more specific about what the debugger can indicate to the debuggee
what needs to be done after the first syscall_trace invocation. At the moment
it is either
1) a valid system call number, execute the new syscall, or
2) an invalid system call number, skip the system call but don't change
regs->traps and do system call restarting if another signal is pending
If we use more specific error codes instead of just any invalid syscall number
we could have e.g. this:
1) a vaild system call number, execute the new syscall,
2) -Exxx, skip the system call, store -1 to regs->trap and then continue
with restarting system calls if another system call is pending.
3) -Eyyy, skip the system call but leave regs->trap intact so that a pending
signal will restart the system call.
But we really have to be very careful not to break either strace or gdb if
we do this change. Probably it is much easier to introduce PTRACE_SET/GET_TRAP.
blue skies,
Martin
Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH
Martin Schwidefsky wrote:
> Yes, it is not a really good idea to add something to struct user. That will
> affect the dump format and debugging tools. So it would be an additional ptrace
> command like PTRACE_SETTRAP/PTRACE_GETTRAP. The only other solution I can think
> of is to be more specific about what the debugger can indicate to the debuggee
> what needs to be done after the first syscall_trace invocation. At the moment
> it is either
> 1) a valid system call number, execute the new syscall, or
> 2) an invalid system call number, skip the system call but don't change
> regs->traps and do system call restarting if another signal is pending
> If we use more specific error codes instead of just any invalid syscall number
> we could have e.g. this:
> 1) a vaild system call number, execute the new syscall,
> 2) -Exxx, skip the system call, store -1 to regs->trap and then continue
> with restarting system calls if another system call is pending.
Typo with<->without?
Yes. That's what I suggested as a "special magic number". Only if that magic
is written as syscall number at the first interception, syscall_trace() would
modify regs->trap to -1.
Currently my patch uses -1 as the magic number, but there might be better
choices.
> 3) -Eyyy, skip the system call but leave regs->trap intact so that a pending
> signal will restart the system call.
Not only -Eyyy, but all values unequal to "special magic number" could leave
regs->trap intact.
>
> But we really have to be very careful not to break either strace or gdb if
> we do this change. Probably it is much easier to introduce PTRACE_SET/GET_TRAP.
It's easier for s390-kernel, but from UML's point of view, the magic number
solution would be better.
Anyway, if you decide not to allow the magic number, we have to find a way
to use PTRACE_SETTRAP in UML without having to call it too often (Performance).
Because of UML's splitting in kernel-obj and user-obj, this might be a bit
tricky.
BTW: I see no reason to implement PTRACE_GETTRAP, as
PTRACE_SETOPTIONS/PTRACE_TRACESYSGOOD give us a way to distinguish between
syscall interceptions and other SIGTRAPs.
Regards, Bodo
> Yes. That's what I suggested as a "special magic number". Only if that magic
> is written as syscall number at the first interception, syscall_trace() would
> modify regs->trap to -1.
> Currently my patch uses -1 as the magic number, but there might be better
> choices.
>
> > 3) -Eyyy, skip the system call but leave regs->trap intact so that a pending
> > signal will restart the system call.
> Not only -Eyyy, but all values unequal to "special magic number" could leave
> regs->trap intact.
>
> >
> > But we really have to be very careful not to break either strace or gdb if
> > we do this change. Probably it is much easier to introduce PTRACE_SET/GET_TRAP.
> It's easier for s390-kernel, but from UML's point of view, the magic number
> solution would be better.
> Anyway, if you decide not to allow the magic number, we have to find a way
> to use PTRACE_SETTRAP in UML without having to call it too often (Performance).
> Because of UML's splitting in kernel-obj and user-obj, this might be a bit
> tricky.
>
> BTW: I see no reason to implement PTRACE_GETTRAP, as
> PTRACE_SETOPTIONS/PTRACE_TRACESYSGOOD give us a way to distinguish between
> syscall interceptions and other SIGTRAPs.
I talked with Uli about the problem and we came up with a more
elegant solution. We already have a debugger specific code in
do_signal that sets up the registers for system call restarting
BEFORE calling the debugger. Only if the debugger did not change
the restart psw and the return value still indicates
-ERESTARTNOHAND or -ERESTARTSYS we undo this change. In the case
that the debugger did change the psw or the return value we do
not want to restart a system call if another signal is pending.
This is in fact a bug in the signal delivery code. To fix it we
have to set regs->traps to something != __LC_SVC_OLD_PSW while
the debugger has control. regs->traps is reset to the value
indicating a system call if the system call is not restarted
after all.
Will that make UML happy?
blue skies,
Martin.
---
Index: arch/s390/kernel/signal.c
===================================================================
RCS file: /home/cvs/linux-2.5/arch/s390/kernel/signal.c,v
retrieving revision 1.22
diff -u -p -r1.22 signal.c
--- arch/s390/kernel/signal.c 23 Mar 2005 08:30:01 -0000 1.22
+++ arch/s390/kernel/signal.c 4 May 2005 14:51:31 -0000
@@ -482,6 +482,7 @@ int do_signal(struct pt_regs *regs, sigs
} else if (retval == -ERESTART_RESTARTBLOCK) {
regs->gprs[2] = -EINTR;
}
+ regs->trap = -1;
}
/* Get signal to deliver. When running under ptrace, at this point
@@ -497,6 +498,7 @@ int do_signal(struct pt_regs *regs, sigs
& SA_RESTART))) {
regs->gprs[2] = -EINTR;
regs->psw.addr = continue_addr;
+ regs->trap = __LC_SVC_OLD_PSW;
}
}
Martin Schwidefsky wrote:
> I talked with Uli about the problem and we came up with a more
> elegant solution. We already have a debugger specific code in
> do_signal that sets up the registers for system call restarting
> BEFORE calling the debugger. Only if the debugger did not change
> the restart psw and the return value still indicates
> -ERESTARTNOHAND or -ERESTARTSYS we undo this change. In the case
> that the debugger did change the psw or the return value we do
> not want to restart a system call if another signal is pending.
> This is in fact a bug in the signal delivery code. To fix it we
> have to set regs->traps to something != __LC_SVC_OLD_PSW while
> the debugger has control. regs->traps is reset to the value
> indicating a system call if the system call is not restarted
> after all.
>
> Will that make UML happy?
Unfortunately, I guess this will not help. But maybe I'm missing
something, as I don't even understand, what the effect of the
attached patch should be. AFAICS, after each call to do_signal(),
entry.S will return to user without regs->trap being checked again.
do_signal() is the only place, where regs->trap is checked, and
it will be called on return to user exactly once.
What does UML need? It needs a way to prohibit host's do_signal()
from doing *any* changes in regs->psw or regs->gprs[2], no matter
if the kernel was entered by a syscall or what value is in gprs[2].
You know, the problem is a sigreturn syscall done by a process
that runs inside of UML. UML intercepts the syscall and executes
its own sys_(rt_)sigreturn. At the second syscall interception it
restores the registers to the values that were saved in
sigcontext. Maybe, regs->gprs[2] now is -ERESTARTXXXXX. Next, UML
resumes the process with ptrace(PTRACE_SYSCALL).
UML can't know, whether the host will or will not call do_signal().
Whether a signal is pending or not in most cases isn't related to
the syscall.
So a practical solution should allow to reset regs->trap while the
child is on the first or second syscall interception.
This could be "PTRACE_SETTRAP" or the "magic syscall number".
The latter I would prefer.
BTW:
There would be a very nasty way to work around all this. At the
second syscall interception UML could write a "safe" dummy value
(e.g. 0) as the result of sys_(rt_)sigreturn to regs->gprs[2].
Then, it could send an additional signal to the child, to make
sure, that do_signal *will* be called by the host. Next, it
resumes the child with ptrace(PTRACE_SYSCALL) and waits for the
interception of signal delivery. Now it writes the *real* result
and resumes the child while suppressing delivery of the additional
signal.
Other signals might be pending at the same time and might be
intercepted before the additional one. They need to be suppressed
also, but have to be send again, after the additional one was
nullified. This really is nasty and slow, but would not need any
change in host.
I hope, I will not have to use it ;-)
Regards,
Bodo
>
> blue skies,
> Martin.
>
> ---
>
> Index: arch/s390/kernel/signal.c
> ===================================================================
> RCS file: /home/cvs/linux-2.5/arch/s390/kernel/signal.c,v
> retrieving revision 1.22
> diff -u -p -r1.22 signal.c
> --- arch/s390/kernel/signal.c 23 Mar 2005 08:30:01 -0000 1.22
> +++ arch/s390/kernel/signal.c 4 May 2005 14:51:31 -0000
> @@ -482,6 +482,7 @@ int do_signal(struct pt_regs *regs, sigs
> } else if (retval == -ERESTART_RESTARTBLOCK) {
> regs->gprs[2] = -EINTR;
> }
> + regs->trap = -1;
> }
>
> /* Get signal to deliver. When running under ptrace, at this point
> @@ -497,6 +498,7 @@ int do_signal(struct pt_regs *regs, sigs
> & SA_RESTART))) {
> regs->gprs[2] = -EINTR;
> regs->psw.addr = continue_addr;
> + regs->trap = __LC_SVC_OLD_PSW;
> }
> }
>
Bodo Stroesser wrote:
>Unfortunately, I guess this will not help. But maybe I'm missing
>something, as I don't even understand, what the effect of the
>attached patch should be.
Have you tried it?
>AFAICS, after each call to do_signal(),
>entry.S will return to user without regs->trap being checked again.
>do_signal() is the only place, where regs->trap is checked, and
>it will be called on return to user exactly once.
It will be called multiple times if *multiple* signals are pending,
and this is exactly the situation in your problem case (some other
signal is pending after the ptrace intercept SIGTRAP was delievered).
>So a practical solution should allow to reset regs->trap while the
>child is on the first or second syscall interception.
This is exactly what this patch is supposed to do: whenever during
a ptrace intercept the PSW is changed (as it presumably is by your
sigreturn implementation), regs->trap is automatically reset.
Bye,
Ulrich
--
Dr. Ulrich Weigand
Linux on zSeries Development
[email protected]
Ulrich Weigand wrote:
> Bodo Stroesser wrote:
>
>
>>Unfortunately, I guess this will not help. But maybe I'm missing
>>something, as I don't even understand, what the effect of the
>>attached patch should be.
>
> Have you tried it?
>
>
>>AFAICS, after each call to do_signal(),
>>entry.S will return to user without regs->trap being checked again.
>>do_signal() is the only place, where regs->trap is checked, and
>>it will be called on return to user exactly once.
>
> It will be called multiple times if *multiple* signals are pending,
> and this is exactly the situation in your problem case (some other
> signal is pending after the ptrace intercept SIGTRAP was delievered).
No, that's not the situation, we talk about.
UML runs its child with ptrace(PTRACE_SYSCALL).
The syscall-interceptions do not use *real* signals. Instead, before
and after it calls the syscall-handler, entry.S calls syscall_trace(),
which again uses ptrace_notify() to inform the father.
The father will see an event similar to the child receiving SIGTRAP or
(SIGTRAP|0x80), but there will be no signal queued and do_signal() will
not be called.
UML does all changes to its child on these two interceptions. It reads
syscall-number and register contents from the first syscall-interception,
writes a dummy number to the syscall-number, restarts the child with
ptrace(PTRACE_SYSCALL) and waits until the second interception for the
syscall happens. Next it internally executes its syscall-handler for the
original syscall-number and writes the resulting register contents to
the child. Now syscall-handling in UML is finished and the child is
resumed with ptrace(PTRACE_SYSCALL). Host's do_signal() is not called
while doing all this.
UML does not know, whether a signal is pending or not. It would not
even help, if there would be a way to retrieve this information. A
signal still could come in between retrieving the info and the child
being scheduled after ptrace(PTRACE_SYSCALL).
If there is a signal pending for the child, entry.S now jumps to
sysc_return, which again jumps to sysc_work, which calls do_signal()
exactly once. As trap still indicates a syscall, do_signal() possibly
modifies psw and gpr2, which makes UML fail.
The signal is not related to the syscall. UML does not know, if it is
delivered while returning from syscall, with do_signal() changing
registers, or later, without changes from do_signal(). So UML can't
undo the changes done by do_signal().
To UML the signal is an interrupt, and normally when returning from
interrupt it doesn't want to modify child's psw or gprs. So UML
normally does not modify psw or gprs on signal interceptions.
Having said all this, unfortunately I don't see a way to satisfy UML's
need with the current host implementation.
Regards,
Bodo
>
>
>>So a practical solution should allow to reset regs->trap while the
>>child is on the first or second syscall interception.
>
> This is exactly what this patch is supposed to do: whenever during
> a ptrace intercept the PSW is changed (as it presumably is by your
> sigreturn implementation), regs->trap is automatically reset.
>
> Bye,
> Ulrich
>
/*
* This is a tool to test syscall invalidation via ptrace on s390.
* It is based on arch/um/os-Linux/start_up.c
*/
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sched.h>
#include <errno.h>
#include <stdarg.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <asm/unistd.h>
#include <asm/page.h>
#include <linux/ptrace.h>
#include <stddef.h>
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#define ERESTARTNOINTR 513
static int ptrace_child(void *arg)
{
int ret;
int pid = getpid();
int sc_result;
/* Child wants to be ptraced */
if(ptrace(PTRACE_TRACEME, 0, 0, 0) < 0){
perror("ptrace");
kill(pid, SIGKILL);
}
/* Child stops itself */
kill(pid, SIGSTOP);
/* The following part is run under PTRACE_SYSCALL */
/* Here we have "svc __NR_getpid" twice. Father will invalidate the
* first and skip the second by adding 6 to PSWADDR.
* If the host does the unwanted syscall restarting, the second svc
* will be done and the result will be child's pid instead of
* -ERESTARTNOINTR
*/
__asm__ __volatile__ (
" svc %b1\n"
" .long 0\n"
" svc %b1\n"
" lr %0,2"
: "=d" (sc_result)
: "i" (__NR_getpid)
: "2" );
/* Here we are back running PTRACE_CONT */
/* Now we check the result of the syscall */
if (sc_result == -ERESTARTNOINTR)
ret = 0; /* Expected result: syscall was invalidated, no
syscall restart is done */
else if (sc_result == pid)
ret = 1; /* This is wrong, as it is the normal result of
getpid(). Probably host did a syscall restart! */
else
ret = 2; /* We don't know, what happened. There may be a bug in
this test tool */
/* Give father a status indicating success or failure */
exit(ret);
}
static void errout(char *str, int error)
{
printf(str, error);
putchar('\n');
exit(1);
}
static int start_ptraced_child(void **stack_out)
{
void *stack;
unsigned long sp;
int pid, n, status;
stack = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if(stack == MAP_FAILED)
errout("start_ptraced_child : mmap failed, errno = %d", errno);
sp = (unsigned long) stack + PAGE_SIZE - sizeof(void *);
pid = __clone(ptrace_child, (void *) sp, SIGCHLD, NULL);
if(pid < 0)
errout("start_ptraced_child : clone failed, errno = %d", errno);
n = waitpid(pid, &status, WUNTRACED);
if(n < 0)
errout("start_ptraced_child : wait failed, errno = %d", errno);
if(!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGSTOP))
errout("start_ptraced_child : expected SIGSTOP, "
"got status = 0x%x", status);
*stack_out = stack;
return(pid);
}
static int stop_ptraced_child(int pid, void *stack)
{
int status, n;
/* We resume our child and let it check it's result */
if(ptrace(PTRACE_CONT, pid, 0, 0) < 0)
errout("stop_ptraced_child : ptrace failed, errno = %d", errno);
/* Now, we wait for the child to exit */
n = waitpid(pid, &status, 0);
if(!WIFEXITED(status))
errout("\nstop_ptraced_child: error: child didn't exit,"
" status 0x%x\n", status);
if(munmap(stack, PAGE_SIZE) < 0)
errout("stop_ptraced_child : munmap failed, errno = %d", errno);
/* Return child's exit status */
return WEXITSTATUS(status);
}
int main(void)
{
void *stack;
int pid, syscall, n, status;
unsigned long addr;
printf("Checking if syscall restart handling in host can be skipped...");
fflush(stdout);
/* First create a child and wait, until it stops itself */
pid = start_ptraced_child(&stack);
/* Now resume the child */
if(ptrace(PTRACE_SYSCALL, pid, 0, 0) < 0)
errout("check_restart_skip : ptrace failed, "
"errno = %d", errno);
/* wait, until child does a syscall */
n = waitpid(pid, &status, WUNTRACED);
if(n < 0)
errout("check_restart_skip : wait failed, "
"errno = %d", errno);
if(!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGTRAP))
errout("check_restart_skip : expected "
"SIGTRAP, got status = %d", status);
/* Check, if syscall is __NR_getpid */
syscall = ptrace(PTRACE_PEEKUSR, pid, PT_GPR2, 0);
if(syscall != __NR_getpid)
errout("check_restart_skip: unexpected syscall %d\n", syscall);
/* Modify syscall number to -1 */
n = ptrace(PTRACE_POKEUSR, pid, PT_GPR2, -1);
if(n < 0)
errout("check_restart_skip : failed to "
"modify system call, errno = %d", errno);
/* Resume child and wait for second syscall interception */
if(ptrace(PTRACE_SYSCALL, pid, 0, 0) < 0)
errout("check_restart_skip : ptrace failed, "
"errno = %d", errno);
n = waitpid(pid, &status, WUNTRACED);
if(n < 0)
errout("check_restart_skip : wait failed, "
"errno = %d", errno);
if(!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGTRAP))
errout("check_restart_skip : expected "
"SIGTRAP, got status = %d", status);
/* Now, modify PSW_ADDR to skip second syscall */
addr = ptrace(PTRACE_PEEKUSR, pid, PT_PSWADDR, 0);
n = ptrace(PTRACE_POKEUSR, pid, PT_PSWADDR, addr+6);
if(n < 0)
errout("check_restart_skip : failed to modify PSWADDR,"
" errno = %d", errno);
/* Set syscall result to -ERESTARTNOINTR */
n = ptrace(PTRACE_POKEUSR, pid, PT_GPR2, -ERESTARTNOINTR);
if(n < 0)
errout("check_restart_skip : failed to modify system "
"call result, errno = %d", errno);
/* Here "accidentally" a signal is queued for the child */
kill(pid, SIGALRM);
/* We resume the child again and wait for next interception */
if(ptrace(PTRACE_SYSCALL, pid, 0, 0) < 0)
errout("check_restart_skip : ptrace failed, "
"errno = %d", errno);
n = waitpid(pid, &status, WUNTRACED);
if(n < 0)
errout("check_restart_skip : wait failed, "
"errno = %d", errno);
/* The interception must be for the signal, not for a syscall
Here, UML would do some interrupt processing */
if(!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGALRM))
errout("check_restart_skip : expected "
"SIGALRM, got status = %d", status);
/* At the end of interrupt processing, UML would resume the child
* doing ptrace(PTRACE_SYSCALL), but without modifying the regs.
* Here we call stop_ptraced_child, which will resume the child
* with ptrace(PTRACE_CONT). Then the child will check the "result"
* and will exit with
* 0 if the result is -ERESTARTNOINTR
* 1 if the result is child's pid (host did syscall restart)
* 2 if we have an unexpected result
*/
n = stop_ptraced_child(pid, stack);
if (n)
printf("failed, result = %d\n", n);
else
printf("OK\n");
return n;
}
> Meanwhile I've tried.
>
> Your patch absolutely doesn't change host's behavior in the situation,
> that is relevant to UML.
And as I understand that is because the SIGTRAP is not delivered
by the normal signal mechanism.
> I've prepared and attached a small program that easily can reproduce
> the problem. I hope this will help to find a viable solution.
That is cool, thanks. Will certainly speed up debugging on my side.
blue skies,
Martin
Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH
Martin Schwidefsky wrote:
>>Meanwhile I've tried.
>>
>>Your patch absolutely doesn't change host's behavior in the situation,
>>that is relevant to UML.
>
>
> And as I understand that is because the SIGTRAP is not delivered
> by the normal signal mechanism.
Yes.
BTW: I still can't see any loop in the kernel, that could call
do_signal() multiple times without returning to user in between.
For my understanding: do I miss something? If so, where is the loop?
Regards
Bodo
>
>
>>I've prepared and attached a small program that easily can reproduce
>>the problem. I hope this will help to find a viable solution.
>
>
> That is cool, thanks. Will certainly speed up debugging on my side.
>
> blue skies,
> Martin
>
> Martin Schwidefsky
> Linux for zSeries Development & Services
> IBM Deutschland Entwicklung GmbH
>
> BTW: I still can't see any loop in the kernel, that could call
> do_signal() multiple times without returning to user in between.
> For my understanding: do I miss something? If so, where is the loop?
do_signal sets up a signal frame for the user. It returns from it
by an svc and then another signal could be pending. It's not really
a loop. For every signal frame you end up at least once in user space
because we avoid creating multiple signal frames on the user stack.
blue skies,
Martin
Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH
Martin Schwidefsky wrote:
>>BTW: I still can't see any loop in the kernel, that could call
>>do_signal() multiple times without returning to user in between.
>>For my understanding: do I miss something? If so, where is the loop?
>
>
> do_signal sets up a signal frame for the user. It returns from it
> by an svc and then another signal could be pending. It's not really
> a loop. For every signal frame you end up at least once in user space
> because we avoid creating multiple signal frames on the user stack.
Yeah. That's what I saw.
Each time when the kernel is entered again and a signal is pending,
do_signal() will be called on return to user with regs->trap setup
freshly. So, I still believe the patch doesn't have *any* effect.
Regards
Bodo
>
> blue skies,
> Martin
>
> Martin Schwidefsky
> Linux for zSeries Development & Services
> IBM Deutschland Entwicklung GmbH
>
> Each time when the kernel is entered again and a signal is pending,
> do_signal() will be called on return to user with regs->trap setup
> freshly. So, I still believe the patch doesn't have *any* effect.
Oh, the patch does have an effect for the debugger. If the debugger
stopped on the sys_sig_return system call and does e.g. an inferior
function call, then the kernel might want to restart a system call
that isn't there because the debugger did a "jump" but could not
change regs->trap.
blue skies,
Martin
Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH
Martin Schwidefsky wrote:
>>Each time when the kernel is entered again and a signal is pending,
>>do_signal() will be called on return to user with regs->trap setup
>>freshly. So, I still believe the patch doesn't have *any* effect.
>
>
> Oh, the patch does have an effect for the debugger. If the debugger
> stopped on the sys_sig_return system call and does e.g. an inferior
> function call, then the kernel might want to restart a system call
> that isn't there because the debugger did a "jump" but could not
> change regs->trap.
AFAICS, it not even has an effect for the debugger.
do_signal() is the only routine, that examines regs->trap. On each
kernel-entry, regs->trap is set freshly. What will be the effect of
changing it *after* it had been examined?
The only exceptions are sys_(rt_)sigsuspend. Here do_signal() might
be called twice, while *and* after processing the syscall. But even
here the patch has no effect, as regs->gprs[2] contains -EINTR, if
so_signal is called by sys_(rt_)sigreturn.
Regards
Bodo
>
> blue skies,
> Martin
>
> Martin Schwidefsky
> Linux for zSeries Development & Services
> IBM Deutschland Entwicklung GmbH
>
/*
* This is a tool to test syscall invalidation via ptrace on s390.
* It is based on arch/um/os-Linux/start_up.c
*/
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sched.h>
#include <errno.h>
#include <stdarg.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <asm/unistd.h>
#include <asm/page.h>
#include <linux/ptrace.h>
#include <stddef.h>
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#define ERESTARTNOINTR 513
static int ptrace_child(void *arg)
{
int ret;
int pid = getpid();
int sc_result;
/* Child wants to be ptraced */
if(ptrace(PTRACE_TRACEME, 0, 0, 0) < 0){
perror("ptrace");
kill(pid, SIGKILL);
}
/* Child stops itself */
kill(pid, SIGSTOP);
/* The following part is run under PTRACE_SYSCALL */
__asm__ __volatile__ (
" svc %b1\n"
" lr %0,2"
: "=d" (sc_result)
: "i" (__NR_getpid)
: "2" );
/* Here we are back running PTRACE_CONT */
/* Now we check the result of the syscall */
if (sc_result == -ERESTARTNOINTR)
ret = 0; /* Expected result: syscall was invalidated, no
syscall restart is done */
else if (sc_result == pid)
ret = 1; /* This is wrong, as it is the normal result of
getpid(). Probably host did a syscall restart! */
else
ret = 2; /* We don't know, what happened. There may be a bug in
this test tool */
/* Give father a status indicating success or failure */
exit(ret);
}
static void errout(char *str, int error)
{
printf(str, error);
putchar('\n');
exit(1);
}
static int start_ptraced_child(void **stack_out)
{
void *stack;
unsigned long sp;
int pid, n, status;
stack = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if(stack == MAP_FAILED)
errout("start_ptraced_child : mmap failed, errno = %d", errno);
sp = (unsigned long) stack + PAGE_SIZE - sizeof(void *);
pid = __clone(ptrace_child, (void *) sp, SIGCHLD, NULL);
if(pid < 0)
errout("start_ptraced_child : clone failed, errno = %d", errno);
n = waitpid(pid, &status, WUNTRACED);
if(n < 0)
errout("start_ptraced_child : wait failed, errno = %d", errno);
if(!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGSTOP))
errout("start_ptraced_child : expected SIGSTOP, "
"got status = 0x%x", status);
*stack_out = stack;
return(pid);
}
static int stop_ptraced_child(int pid, void *stack)
{
int status, n;
/* We resume our child and let it check it's result */
if(ptrace(PTRACE_CONT, pid, 0, 0) < 0)
errout("stop_ptraced_child : ptrace failed, errno = %d", errno);
/* Now, we wait for the child to exit */
n = waitpid(pid, &status, 0);
if(!WIFEXITED(status))
errout("\nstop_ptraced_child: error: child didn't exit,"
" status 0x%x\n", status);
if(munmap(stack, PAGE_SIZE) < 0)
errout("stop_ptraced_child : munmap failed, errno = %d", errno);
/* Return child's exit status */
return WEXITSTATUS(status);
}
int main(void)
{
void *stack;
int pid, syscall, n, status;
unsigned long addr;
printf("Checking if syscall restart handling in host can be skipped...");
fflush(stdout);
/* First create a child and wait, until it stops itself */
pid = start_ptraced_child(&stack);
/* Now resume the child */
if(ptrace(PTRACE_SYSCALL, pid, 0, 0) < 0)
errout("check_restart_skip : ptrace failed, "
"errno = %d", errno);
/* wait, until child does a syscall */
n = waitpid(pid, &status, WUNTRACED);
if(n < 0)
errout("check_restart_skip : wait failed, "
"errno = %d", errno);
if(!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGTRAP))
errout("check_restart_skip : expected "
"SIGTRAP, got status = %d", status);
/* Check, if syscall is __NR_getpid */
syscall = ptrace(PTRACE_PEEKUSR, pid, PT_GPR2, 0);
if(syscall != __NR_getpid)
errout("check_restart_skip: unexpected syscall %d\n", syscall);
/* Modify syscall number to -1 */
n = ptrace(PTRACE_POKEUSR, pid, PT_GPR2, -1);
if(n < 0)
errout("check_restart_skip : failed to "
"modify system call, errno = %d", errno);
/* Resume child and wait for second syscall interception */
if(ptrace(PTRACE_SYSCALL, pid, 0, 0) < 0)
errout("check_restart_skip : ptrace failed, "
"errno = %d", errno);
n = waitpid(pid, &status, WUNTRACED);
if(n < 0)
errout("check_restart_skip : wait failed, "
"errno = %d", errno);
if(!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGTRAP))
errout("check_restart_skip : expected "
"SIGTRAP, got status = %d", status);
/* Set syscall result to -ERESTARTNOINTR */
n = ptrace(PTRACE_POKEUSR, pid, PT_GPR2, -ERESTARTNOINTR);
if(n < 0)
errout("check_restart_skip : failed to modify system "
"call result, errno = %d", errno);
/* Here "accidentally" a signal is queued for the child */
kill(pid, SIGALRM);
/* We resume the child again and wait for next interception */
if(ptrace(PTRACE_SYSCALL, pid, 0, 0) < 0)
errout("check_restart_skip : ptrace failed, "
"errno = %d", errno);
n = waitpid(pid, &status, WUNTRACED);
if(n < 0)
errout("check_restart_skip : wait failed, "
"errno = %d", errno);
/* The interception must be for the signal, not for a syscall
Here, UML would do some interrupt processing */
if(!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGALRM))
errout("check_restart_skip : expected "
"SIGALRM, got status = %d", status);
/* At the end of interrupt processing, UML would resume the child
* doing ptrace(PTRACE_SYSCALL), but without modifying the regs.
* Here we call stop_ptraced_child, which will resume the child
* with ptrace(PTRACE_CONT). Then the child will check the "result"
* and will exit with
* 0 if the result is -ERESTARTNOINTR
* 1 if the result is child's pid (host did syscall restart)
* 2 if we have an unexpected result
*/
n = stop_ptraced_child(pid, stack);
if (n)
printf("failed, result = %d\n", n);
else
printf("OK\n");
return n;
}
> > I've prepared and attached a small program that easily can reproduce
> > the problem. I hope this will help to find a viable solution.
>
> Here is a slightly modified version of my testtool. The new version
> covers the fact, that in certain situations UML must avoid syscall
> restarting, even if PSWADDR is not modified.
Ok, Uli convinced me that the original patch to clear regs->traps if
the system call has been canceled on the first call to syscall_trace
is the correct thing to do. If the tracer chooses to invalidate the
system call of the traced process then the complete handling of the
function executed for the system call is done in the tracer. That
includes system call restarting in the case that another system call
is involved to implement the function. The point is that the traced
process did not execute a system call, ergo no system call restarting
may take place.
So after a long discussion I'll just use a slightly modified version
of the original patch:
Index: ptrace.c
===================================================================
RCS file: /home/cvs/linux-2.5/arch/s390/kernel/ptrace.c,v
retrieving revision 1.35
diff -u -r1.35 ptrace.c
--- ptrace.c 6 May 2005 18:59:13 -0000 1.35
+++ ptrace.c 31 May 2005 16:50:50 -0000
@@ -39,6 +39,7 @@
#include <asm/pgalloc.h>
#include <asm/system.h>
#include <asm/uaccess.h>
+#include <asm/unistd.h>
#ifdef CONFIG_S390_SUPPORT
#include "compat_ptrace.h"
@@ -762,6 +763,13 @@
return;
ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD)
? 0x80 : 0));
+
+ /*
+ * If the debuffer has set an invalid system call number,
+ * we prepare to skip the system call restart handling.
+ */
+ if (!entryexit && regs->gprs[2] >= NR_syscalls)
+ regs->trap = -1;
/*
* this isn't the same as continuing with a signal, but it will do
===================================================================
regs->trap should be reset for any invalid system call, not just for
negative system call numbers.
blue skies,
Martin
Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH