2013-07-23 10:05:23

by Mike Galbraith

Subject: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck

I received a report that glibc:elf/pldd hangs occasionally, and indeed..

for i in `seq 1 1000`; do taskset -c 3 pldd $$ > /dev/null 2>&1; done

..will do so. Rummage.....

ptrace(PTRACE_DETACH) returns -ESRCH when the trap hasn't happened yet,
which can happen because pldd doesn't wait() before ptrace(PTRACE_DETACH).

pldd source:

  if (ptrace (PTRACE_ATTACH, tid, NULL, NULL) != 0)
    {
      /* There might be a race between reading the directory and
         threads terminating. Ignore errors attaching to unknown
         threads unless this is the main thread. */
      if (errno == ESRCH && tid != pid)
        continue;

      error (EXIT_FAILURE, errno, gettext ("cannot attach to process %lu"),
             tid);
    }

  struct thread_list *newp = alloca (sizeof (*newp));
  newp->tid = tid;
  newp->next = thread_list;
  thread_list = newp;
}

closedir (dir);

int status = get_process_info (dfd, pid);

assert (thread_list != NULL);
do
  {
    ptrace (PTRACE_DETACH, thread_list->tid, NULL, NULL);
    thread_list = thread_list->next;
  }
while (thread_list != NULL);

It seems this usually works only because the cycles expended between attach
and detach are usually enough to let the trap happen, so the tracee can set
its state to TASK_TRACED as PTRACE_DETACH expects.
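A minimal C sketch of the racy sequence described above, assuming Linux and glibc. `spawn_sleeper()`, `attach_then_detach_racy()` and `reap()` are hypothetical helpers written for illustration; they are not pldd code. Because the outcome depends on whether the tracee reaches its ptrace-stop before PTRACE_DETACH runs, either 0 or ESRCH can come back:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that just sleeps until it is killed. */
static pid_t spawn_sleeper(void)
{
    pid_t pid = fork();
    if (pid == 0)
        for (;;)
            pause();
    return pid;   /* child pid, or -1 if fork() failed */
}

/* The pldd pattern: PTRACE_ATTACH immediately followed by PTRACE_DETACH,
 * with no waitpid() in between.  Returns 0 if the detach happened to
 * succeed, otherwise the errno value (ESRCH if the tracee had not yet
 * reached its ptrace-stop, in which case it is still traced by us). */
static int attach_then_detach_racy(pid_t pid)
{
    assert(pid > 0);
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) != 0)
        return errno;
    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) != 0)
        return errno;
    return 0;
}

/* Kill and reap the child.  If it is still traced, waitpid() may report
 * a stop before the death, so loop until it has really gone. */
static void reap(pid_t pid)
{
    int status;

    kill(pid, SIGKILL);
    while (waitpid(pid, &status, 0) == pid
           && !WIFEXITED(status) && !WIFSIGNALED(status))
        ;
}
```

When ESRCH comes back, the child is still attached to the caller and will stop on the SIGSTOP that PTRACE_ATTACH sent; here reap() simply kills it, whereas pldd just exits.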

Is this expected behavior? It looks a bit like "Doctor Doctor..".

-Mike


2013-07-23 16:04:04

by Oleg Nesterov

Subject: Re: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck

On 07/23, Mike Galbraith wrote:
>
> I received a report that glibc:elf/pldd hangs occasionally, and indeed..
>
> for i in `seq 1 1000`; do taskset -c 3 pldd $$ > /dev/null 2>&1; done
>
> ..will do so. Rummage.....
>
> ptrace(PTRACE_DETACH) returns -ESRCH when the trap hasn't happened yet,
> which happens because pldd doesn't wait() before ptrace(PTRACE_DETACH).
>
> pldd source:
>
[...snip...]
>
> Seems this usually works only because cycles expended between attach and
> detach is usually enough to let trap happen so tracee can set its state
> to TASK_TRACED as PTRACE_DETACH expects it to be.
>
> Is this expected behavior?

Yes, this is expected. PTRACE_ATTACH + PTRACE_DETACH is not correct without
a wait() in between.

PTRACE_DETACH, like (almost) any other ptrace request, needs a stopped
tracee. Otherwise, for example, ptrace_disable() or
flush_ptrace_hw_breakpoint() are not safe.
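To illustrate the attach/wait/detach ordering, here is a minimal sketch, assuming Linux and glibc; `attach_wait_detach()` is a hypothetical name. The waitpid() loop is what guarantees the tracee is stopped before PTRACE_DETACH is issued:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach to pid, wait until it is really in ptrace-stop, then detach.
 * Returns 0 on success or an errno value. */
static int attach_wait_detach(pid_t pid)
{
    int status;

    assert(pid > 0);
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) != 0)
        return errno;

    /* Wait for the stop caused by the SIGSTOP that PTRACE_ATTACH sent.
     * __WALL makes this work for non-leader threads too.  If some other
     * signal stops the tracee first, pass it on and keep waiting. */
    for (;;) {
        if (waitpid(pid, &status, __WALL) != pid)
            return errno;
        if (!WIFSTOPPED(status))
            return ESRCH;            /* tracee died instead of stopping */
        if (WSTOPSIG(status) == SIGSTOP)
            break;
        if (ptrace(PTRACE_CONT, pid, NULL,
                   (void *)(long)WSTOPSIG(status)) != 0)
            return errno;
    }

    /* The tracee is stopped here, so other ptrace requests are safe. */

    /* data == 0: resume without re-delivering that SIGSTOP. */
    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) != 0)
        return errno;
    return 0;
}
```

Passing other signals through with PTRACE_CONT is just one possible policy, chosen so that an unrelated signal arriving first is not silently dropped.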

We could probably add PTRACE_UNTRACE, which would only do __ptrace_unlink()
etc., like an exiting tracer does. (In particular, it could help to detach
a zombie.)

But note that even PTRACE_ATTACH + PTRACE_UNTRACE wouldn't be really correct.
PTRACE_ATTACH sends SIGSTOP, so without sys_wait() in between, the tracee
can end up stopped in TASK_STOPPED.
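The TASK_TRACED vs. TASK_STOPPED distinction can be observed from user space via the state field of /proc/<pid>/stat, which kernels since 2.6.33 report as 't' for a ptrace-stop and 'T' for a job-control stop (older kernels showed 'T' for both). A small sketch, assuming Linux; `proc_state()` is a hypothetical helper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the one-letter state field from /proc/<pid>/stat, or '?'.
 * 't' means a ptrace-stop (TASK_TRACED), 'T' a plain job-control
 * stop (TASK_STOPPED). */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    FILE *f;
    size_t n;
    char *p;

    assert(pid > 0);
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return '?';
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 is "(comm)" and comm may contain spaces or ')', so the
     * state is the character after the last ") ". */
    p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

After PTRACE_ATTACH plus waitpid() the child should show 't'; after a plain SIGSTOP (no tracer) it shows 'T'.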

Oleg.

2013-07-23 16:43:12

by Oleg Nesterov

Subject: Re: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck

On 07/23, Oleg Nesterov wrote:
>
> On 07/23, Mike Galbraith wrote:
> >
> > I received a report that glibc:elf/pldd hangs occasionally, and indeed..
> >
> > for i in `seq 1 1000`; do taskset -c 3 pldd $$ > /dev/null 2>&1; done
> >
> > ..will do so. Rummage.....
> >
> > ptrace(PTRACE_DETACH) returns -ESRCH when the trap hasn't happened yet,
> > which happens because pldd doesn't wait() before ptrace(PTRACE_DETACH).
> >
> > pldd source:
> >
> [...snip...]
> >
> > Seems this usually works only because cycles expended between attach and
> > detach is usually enough to let trap happen so tracee can set its state
> > to TASK_TRACED as PTRACE_DETACH expects it to be.
> >
> > Is this expected behavior?
>
> Yes. PTRACE_ATTACH + PTRACE_DETACH is not correct without wait() in
> between, this is expected.
>
> PTRACE_DETACH like (almost) any other ptrace request needs the stopped
> tracee. Otherwise, say, ptrace_disable() or flush_ptrace_hw_breakpoint()
> are not safe.

I have found the pldd.c source. It seems that it has another reason
for waitpid():

/* Stop all threads since otherwise the list of loaded modules might
change while we are reading it. */

Yes, but without waitpid() we can't know if it was actually stopped.
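For the thread-stopping use case in pldd.c, a sketch of attaching to every thread and waiting for each one, assuming Linux and glibc. `stop_all_threads()` and `detach_all()` are hypothetical names; the sketch assumes the first stop reported for each thread is the one caused by PTRACE_ATTACH's SIGSTOP, and it ignores threads created after the directory scan:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Detach from every thread previously stopped by stop_all_threads(). */
static void detach_all(const pid_t *tids, int n)
{
    for (int i = 0; i < n; i++)
        ptrace(PTRACE_DETACH, tids[i], NULL, NULL);
}

/* Attach to each thread listed in /proc/<pid>/task and wait until it is
 * actually stopped.  Stores up to max tids; returns how many were stopped,
 * or -1 (with nothing left attached) if the main thread can't be traced. */
static int stop_all_threads(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    struct dirent *d;
    DIR *dir;
    int n = 0;

    assert(pid > 0 && max > 0);
    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while (n < max && (d = readdir(dir)) != NULL) {
        pid_t tid;
        int status;

        if (d->d_name[0] == '.')
            continue;                      /* skip "." and ".." */
        tid = (pid_t)strtol(d->d_name, NULL, 10);

        if (ptrace(PTRACE_ATTACH, tid, NULL, NULL) != 0) {
            if (errno == ESRCH && tid != pid)
                continue;                  /* thread exited meanwhile */
            detach_all(tids, n);
            n = -1;
            break;
        }
        /* Without this wait we cannot know the thread really stopped. */
        if (waitpid(tid, &status, __WALL) != tid || !WIFSTOPPED(status))
            continue;                      /* thread exited instead */
        tids[n++] = tid;
    }
    closedir(dir);
    return n;
}
```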

OTOH, in this particular case pldd.c doesn't really need PTRACE_DETACH;
it can simply exit.

Oleg.

2013-07-23 16:48:10

by Oleg Nesterov

Subject: Re: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck

Damn. Sorry for the noise, Mike.

On 07/23, Oleg Nesterov wrote:
>
> OTOH, in this particular case pldd.c doesn't really need PTRACE_DETACH,
> it can simply exit.

No, it can't: I forgot that exit_ptrace() doesn't (and can't) clear
->exit_code. And this is another reason why PTRACE_DETACH needs a
stopped tracee.

Oleg.

2013-07-24 02:21:34

by Mike Galbraith

Subject: Re: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck

On Tue, 2013-07-23 at 17:58 +0200, Oleg Nesterov wrote:
> On 07/23, Mike Galbraith wrote:
> >
> > I received a report that glibc:elf/pldd hangs occasionally, and indeed..
> >
> > for i in `seq 1 1000`; do taskset -c 3 pldd $$ > /dev/null 2>&1; done
> >
> > ..will do so. Rummage.....
> >
> > ptrace(PTRACE_DETACH) returns -ESRCH when the trap hasn't happened yet,
> > which happens because pldd doesn't wait() before ptrace(PTRACE_DETACH).
> >
> > pldd source:
> >
> [...snip...]
> >
> > Seems this usually works only because cycles expended between attach and
> > detach is usually enough to let trap happen so tracee can set its state
> > to TASK_TRACED as PTRACE_DETACH expects it to be.
> >
> > Is this expected behavior?
>
> Yes. PTRACE_ATTACH + PTRACE_DETACH is not correct without wait() in
> between, this is expected.

Thanks for the confirmation. The man page was pretty clear that -ESRCH
was expected (I read it after slogging through source/traces; oh well,
educational ;), but I wanted to be sure about the tracee's state thereafter.

-Mike