2009-04-23 12:15:38

by Denys Vlasenko

[permalink] [raw]
Subject: ptrace(PTRACE_SYSCALL/CONT/DETACH, ..., SIGSTOP) does not work

Hi Oleg,

Bringing the discussion to lkml per your request.

Lets focus on ptrace() API, not the actual kernel behavior (which may be
different due to bugs). In other words, how ptrace should work?

ptrace(PTRACE_SYSCALL/CONT/DETACH, ..., sig) API is needed in order to continue
the process after ptrace stop, and let it see and handle the signal.

For example, imagine that you are stracing cat process. It is blocked on read
syscall:

# { sleep 10; echo Hello; } | strace cat
...
read(0,

What strace is doing at this point? It is in wait4(-1, &status, __WALL, NULL).

Now someone sends a signal. strace sees this:

wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == <sig>}], __WALL, NULL) = <PID>

strace will inform user with "--- SIGxxxx ---", and what strace needs to do
now? Right! It needs to deliver the signal to <PID>, and wait again:

ptrace(PTRACE_SYSCALL, <PID>, 0x1, <sig>);
wait4(-1, &status, __WALL, NULL);

And now, this signal should NOT show up in wait4 (because tracer already saw
it, we don't want infinite cycle here :), but should act as normal.


Example 1: cat is signaled with a signal with default action "nop". Strace does
this (irrelevant syscalls removed):

wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGWINCH}], __WALL, NULL) = 31027
write(2, "--- SIGWINCH (Window changed) @ "..., 42) = 42
ptrace(PTRACE_SYSCALL, 31027, 0x1, SIGWINCH) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == 133}], __WALL, NULL) = 31027
...

Example 2: cat is signaled with a signal with default action "die". Strace does
this:

wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTERM}], __WALL, NULL) = 32654
write(2, "--- SIGTERM (Terminated) @ 0 (0)"..., 37) = 37
ptrace(PTRACE_SYSCALL, 32654, 0x1, SIGTERM) = 0
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], __WALL, NULL) = 32654
write(2, "+++ killed by SIGTERM +++\n", 26) = 26
wait4(-1, 0x7fff02e98f7c, WNOHANG|__WALL, NULL) = -1 ECHILD (No child
processes)

There two examples works as expected. In particular, user-observed cat's
behavior is not changed by the fact that it is straced.


Now, the bug:

Example 3: cat is signaled with SIGSTOP. Strace does this:

wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 30989
write(2, "--- SIGSTOP (Stopped (signal)) @"..., 43) = 43
ptrace(PTRACE_SYSCALL, 30989, 0x1, SIGSTOP) = 0

Note: traced process is NOT stopped here as it should be!
Somehow, we get another SIGSTOP notification:

wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 30989

strace is confused (it thinks it's another SIGSTOP).
it injects SIGSTOP again:

write(2, "--- SIGSTOP (Stopped (signal)) @"..., 43) = 43
ptrace(PTRACE_SYSCALL, 30989, 0x1, SIGSTOP) = 0

thankfully, this doesn't loop forever, but tracee is still not stopped!
we are immediately getting notification that it entered read syscall:

wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == 133}], __WALL, NULL) = 30989
ptrace(PTRACE_PEEKUSER, 30989, 8*ORIG_RAX, [0]) = 0
ptrace(PTRACE_PEEKUSER, 30989, 8*CS, [0x33]) = 0
ptrace(PTRACE_PEEKUSER, 30989, 8*RAX, [0xffffffffffffffda]) = 0
ptrace(PTRACE_PEEKUSER, 30989, 8*RDI, [0]) = 0
ptrace(PTRACE_PEEKUSER, 30989, 8*RSI, [0x77a000]) = 0
ptrace(PTRACE_PEEKUSER, 30989, 8*RDX, [0x1000]) = 0
write(2, "read(0, ", 8) = 8
ptrace(PTRACE_SYSCALL, 30989, 0x1, SIG_0) = 0
wait4(-1, ..., __WALL, NULL) = ....

This does not look correct to me. Straced process is not behaving in the same
way as it would without strace.


If you disagree with me, let me know what, in your opinion, strace should do to
properly emulate process behavior?


BTW, same happens with other stopping signal, TSTP:

wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTSTP}], __WALL, NULL) = 32682
write(2, "--- SIGTSTP (Stopped) @ 0 (0) --"..., 34) = 34
ptrace(PTRACE_SYSCALL, 32682, 0x1, SIGTSTP) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTSTP}], __WALL, NULL) = 32682
write(2, "--- SIGTSTP (Stopped) @ 0 (0) --"..., 34) = 34
ptrace(PTRACE_SYSCALL, 32682, 0x1, SIGTSTP) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == 133}], __WALL, NULL) = 32682
ptrace(PTRACE_PEEKUSER, 32682, 8*ORIG_RAX, [0]) = 0
ptrace(PTRACE_PEEKUSER, 32682, 8*CS, [0x33]) = 0
ptrace(PTRACE_PEEKUSER, 32682, 8*RAX, [0xffffffffffffffda]) = 0
ptrace(PTRACE_PEEKUSER, 32682, 8*RDI, [0]) = 0
ptrace(PTRACE_PEEKUSER, 32682, 8*RSI, [0x2483000]) = 0
ptrace(PTRACE_PEEKUSER, 32682, 8*RDX, [0x1000]) = 0
write(2, "read(0, ", 8) = 8
wait4(-1, ...., __WALL, NULL) = ...


>From Oleg Nesterov 2009-04-22 19:22:29 EDT
> > Now, the bug:
> >
> > Example 3: cat is signaled with SIGSTOP. Strace does this:
> >
> > wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 30989
>
> The tracee dequeues SIGSTOP, does get_signal_to_deliver()->ptrace_stop(),
> and do not really handle SIGSTOP.
>
> > write(2, "--- SIGSTOP (Stopped (signal)) @"..., 43) = 43
> > ptrace(PTRACE_SYSCALL, 30989, 0x1, SIGSTOP) = 0
> >
> > Note: traced process is NOT stopped here as it should be!
> > Somehow, we get another SIGSTOP notification:
> >
> > wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 30989
>
> strace does ptrace(PTRACE_SYSCALL, SIGSTOP), this sets ->exit_code = SIGSTOP.
> The tracee sees the debugger wants SIGSTOP to be handled and calls
> do_signal_stop().
> (we have some complications with SIGNAL_STOP_DEQUEUED, but lets ignore them).
>
> finish_stop() notifies ->parent == tracer about jctl stop, strace does
> do_wait()
> and gets WSTOPSIG(s) == SIGSTOP.
>
> What is wrong?

It's wrong that single SIGSTOP gets reported twice, yet fails to act even once.

You are replying from the point of view of kernel's current implementation.
Stop thinking about implementation. Think about he API.
Does kernel fulfil what API promises? It does not look like it does.

What strace told kernel to do? strace said:

Kernel, please make traced process act as if it received <sig>:
* ignore <sig> if <sig> is blocked
(and keep it pending in pending signal mask);
* jump to handler if handler is registered;
* ignore <sig> if it is SIG_IGNed, or if default action is no-op;
* make process die if default handler is to die;
* make process stop if default handler is to stop.

IOW: strace does NOT want to see this signal reported back to strace -
it already saw that, what's the point in seeing it again?

All of the above is working correctly, except for the last line:
"make process stop if default handler is to stop". This one does not work.
Instead it acts really weird, as shown in my SIGSTOP and SIGTSTP
examples above.

I think this is a bug.
--
vda


2009-04-23 14:30:40

by Oleg Nesterov

[permalink] [raw]
Subject: SIGSTOP && ptrace (Was: ptrace(PTRACE_SYSCALL/CONT/DETACH, ..., SIGSTOP) does not work)

On 04/23, Denys Vlasenko wrote:
>
> Bringing the discussion to lkml per your request.

You forgot to cc maintainer ;) Add Roland.

> From Oleg Nesterov 2009-04-22 19:22:29 EDT
> > > Now, the bug:
> > >
> > > Example 3: cat is signaled with SIGSTOP. Strace does this:
> > >
> > > wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 30989
> >
> > The tracee dequeues SIGSTOP, does get_signal_to_deliver()->ptrace_stop(),
> > and do not really handle SIGSTOP.
> >
> > > write(2, "--- SIGSTOP (Stopped (signal)) @"..., 43) = 43
> > > ptrace(PTRACE_SYSCALL, 30989, 0x1, SIGSTOP) = 0
> > >
> > > Note: traced process is NOT stopped here as it should be!
> > > Somehow, we get another SIGSTOP notification:
> > >
> > > wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 30989
> >
> > strace does ptrace(PTRACE_SYSCALL, SIGSTOP), this sets ->exit_code = SIGSTOP.
> > The tracee sees the debugger wants SIGSTOP to be handled and calls
> > do_signal_stop().
> > (we have some complications with SIGNAL_STOP_DEQUEUED, but lets ignore them).
> >
> > finish_stop() notifies ->parent == tracer about jctl stop, strace does
> > do_wait()
> > and gets WSTOPSIG(s) == SIGSTOP.
> >
> > What is wrong?
>
> It's wrong that single SIGSTOP gets reported twice, yet fails to act even once.

I'd say this is correct. The second time the tracee reports about jctl stop,
given that the tracer did ptrace(PTRACE_SYSCALL, SIGSTOP) this looks right.
The debugger explicitely tells the tracee "proceed with SIGSTOP", what else
can we expect?

> You are replying from the point of view of kernel's current implementation.

Yes,

> Stop thinking about implementation. Think about he API.
> Does kernel fulfil what API promises? It does not look like it does.

I don't really know. More or less, I understand the process-management
part of ptrace, but I never knew what user-space actually expects from
API.

But the behaviour above looks very natural to me.

> What strace told kernel to do? strace said:
>
> Kernel, please make traced process act as if it received <sig>:
> * ignore <sig> if <sig> is blocked
> (and keep it pending in pending signal mask);
> * jump to handler if handler is registered;
> * ignore <sig> if it is SIG_IGNed, or if default action is no-op;
> * make process die if default handler is to die;
> * make process stop if default handler is to stop.
>
> IOW: strace does NOT want to see this signal reported back to strace -
> it already saw that, what's the point in seeing it again?

As I said many times, the second report is not about the signal, it
reports that the group-stop is completed.

> All of the above is working correctly, except for the last line:
> "make process stop if default handler is to stop". This one does not work.

This does work. The tracee stops and reports. OK, OK, unless I missed
something of course ;)

> Instead it acts really weird, as shown in my SIGSTOP and SIGTSTP
> examples above.

I guess, you don't like the fact that finish_stop() always clear ->exit_code
after wake up. This is why the next ptrace(PTRACE_SYSCALL, SIGSTOP) (called
when the tracee sleeps in finish_stop) "loses" SIGSTOP.

I can't say if this really right or not, but I guess it is too late to
change this behaviour. I may be wrong.

But anyway, I'd say in this case strace should not do ptrace(SIGSTOP),
this looks just wrong to me.

> I think this is a bug.

I don't. At least, I strongly disagree with the subject, and I think you
misunderstand what really happens with this example. Please correct me.


Perhaps, you should ask how strace can distinguish between "SIGSTOP
recieved" and "group-stop completed". I am not 100% sure, but at first
glance this looks possible.

The tracee reports "SIGSTOP recieved". It also sets ->last_siginfo = info.

The tracer can change info->si_signo before ptrace(SYSCALL, SIGSTOP).

This info->si_signo will be used as ->exit_code in status instead of
SIGTOP when the tracee actually stops.


I didn't check this though. And again, I am in no position to tell
you what should be done here. Fortunately, we have knowledgeable
people in cc.

Oleg.

2009-04-23 15:20:32

by Oleg Nesterov

[permalink] [raw]
Subject: Q: ptrace_signal() && PTRACE_SETSIGINFO (Was: SIGSTOP && ptrace)

ptrace_signal:

signr = current->exit_code;

/* Update the siginfo structure if the signal has
changed. If the debugger wanted something
specific in the siginfo structure then it should
have updated *info via PTRACE_SETSIGINFO. */

Yes. PTRACE_SETSIGINFO can change *info if debugger wants something
special. But then we do:

if (signr != info->si_signo) {
info->si_signo = signr;
info->si_errno = 0;
info->si_code = SI_USER;
info->si_pid = task_pid_vnr(current->parent);
info->si_uid = task_uid(current->parent);
}

Why? If the tracer changes ->exit_code it should know what it does.
Why do we reset *info?

But the real question, how can PTRACE_SETSIGINFO change ->si_signo
(for example, for do_signal_stop(si_signo)) if this in fact is not
allowed?

Oleg.

2009-04-24 06:24:17

by Roland McGrath

[permalink] [raw]
Subject: Re: SIGSTOP && ptrace (Was: ptrace(PTRACE_SYSCALL/CONT/DETACH, ..., SIGSTOP) does not work)

> > IOW: strace does NOT want to see this signal reported back to strace -
> > it already saw that, what's the point in seeing it again?
>
> As I said many times, the second report is not about the signal, it
> reports that the group-stop is completed.

Oleg is correct. The "second" report, is not "WIFSTOPPED means a ptrace
stop". It is "WIFSTOPPED means a job control stop". This is not meant as
a notification to tracer, but a notification to parent. In case of
PTRACE_ATTACH, this is "unnatural" because the tracer steals the parent's
notifications, but that's just the known crime against nature that is
ptrace semantics.

> I guess, you don't like the fact that finish_stop() always clear ->exit_code
> after wake up. This is why the next ptrace(PTRACE_SYSCALL, SIGSTOP) (called
> when the tracee sleeps in finish_stop) "loses" SIGSTOP.
>
> I can't say if this really right or not, but I guess it is too late to
> change this behaviour. I may be wrong.

I agree it's unfortunate. But TBH it's more unfortunate that PTRACE_CONT
et al are allowed here at all. It's a job control stop, and if the world
were right then it would only be broken by SIGCONT. But Oleg is right,
it's too late to change it now. Them's the semantics.

> Perhaps, you should ask how strace can distinguish between "SIGSTOP
> recieved" and "group-stop completed". I am not 100% sure, but at first
> glance this looks possible.

It is, but it's easier than the hack you suggest. PTRACE_GETSIGINFO only
works for a ptrace stop, not a job control stop. If wait reported SIGSTOP,
PTRACE_GETSIGINFO will fail with EINVAL for a job control stop but will
succeed for a ptrace stop.


Thanks,
Roland

2009-04-24 06:32:20

by Roland McGrath

[permalink] [raw]
Subject: Re: Q: ptrace_signal() && PTRACE_SETSIGINFO (Was: SIGSTOP && ptrace)

> Yes. PTRACE_SETSIGINFO can change *info if debugger wants something
> special. But then we do:
>
> if (signr != info->si_signo) {
> info->si_signo = signr;
[...]
> Why? If the tracer changes ->exit_code it should know what it does.

If it uses PTRACE_SETSIGINFO it should know what it does, and update
the siginfo_t to match the signal it passes to PTRACE_CONT et al.

> Why do we reset *info?

PTRACE_SETSIGINFO did not always exist, and even now might not be used by a
simple-minded application. If the user is sophisticated, it calls
PTRACE_SETSIGINFO and then passes the signal number to match. If not, it
never calls PTRACE_SETSIGINFO at all, but expects the signal number it
chose to pass in PTRACE_CONT to behave "normally" in the tracee.

We reset the siginfo_t the tracee will see to match what a kill() from the
debugger would have looked like. Otherwise the tracee could be confused by
the siginfo_t values that don't make sense for the signal number delivered.
(The simple-minded debugger's ptrace code could even predate SA_SIGINFO
handlers and tracees that could see the siginfo_t.)

> But the real question, how can PTRACE_SETSIGINFO change ->si_signo
> (for example, for do_signal_stop(si_signo)) if this in fact is not
> allowed?

It's allowed. You just have to pass the same value you set in si_signo as
the argument to PTRACE_CONT after you do PTRACE_SETSIGINFO.


Thanks,
Roland

2009-04-24 17:45:31

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Q: ptrace_signal() && PTRACE_SETSIGINFO (Was: SIGSTOP && ptrace)

On 04/23, Roland McGrath wrote:
>
> > Yes. PTRACE_SETSIGINFO can change *info if debugger wants something
> > special. But then we do:
> >
> > if (signr != info->si_signo) {
> > info->si_signo = signr;
> [...]
> > Why? If the tracer changes ->exit_code it should know what it does.
>
> If it uses PTRACE_SETSIGINFO it should know what it does, and update
> the siginfo_t to match the signal it passes to PTRACE_CONT et al.
>
> > Why do we reset *info?
>
> PTRACE_SETSIGINFO did not always exist, and even now might not be used by a
> simple-minded application. If the user is sophisticated, it calls
> PTRACE_SETSIGINFO and then passes the signal number to match. If not, it
> never calls PTRACE_SETSIGINFO at all, but expects the signal number it
> chose to pass in PTRACE_CONT to behave "normally" in the tracee.

OK, understand.

> > But the real question, how can PTRACE_SETSIGINFO change ->si_signo
> > (for example, for do_signal_stop(si_signo)) if this in fact is not
> > allowed?
>
> It's allowed. You just have to pass the same value you set in si_signo as
> the argument to PTRACE_CONT after you do PTRACE_SETSIGINFO.

Yes, yes, I see. I meant "the tracer can not use signr != ->si_signo",
but now I don't see the reason it should.

Oleg.

2009-04-24 21:14:23

by Denys Vlasenko

[permalink] [raw]
Subject: Re: SIGSTOP && ptrace (Was: ptrace(PTRACE_SYSCALL/CONT/DETACH, ..., SIGSTOP) does not work)

On Friday 24 April 2009 08:23, Roland McGrath wrote:
> > Perhaps, you should ask how strace can distinguish between "SIGSTOP
> > recieved" and "group-stop completed". I am not 100% sure, but at first
> > glance this looks possible.
>
> It is, but it's easier than the hack you suggest. PTRACE_GETSIGINFO only
> works for a ptrace stop, not a job control stop. If wait reported SIGSTOP,
> PTRACE_GETSIGINFO will fail with EINVAL for a job control stop but will
> succeed for a ptrace stop.

I already tested it and it works.
--
vda