2006-08-31 21:05:52

by Andreas Hobein

[permalink] [raw]
Subject: Trouble with ptrace self-attach rule since kernel > 2.6.14

Hi !

I have some trouble with the restriction of the ptrace functionality assumably
introduced into the linux kernel ?with the patch from 9. 11.2006
1105_2_ptrace-self-attach.patch.

My multithreaded application tries to write callstacks of all threads (some
sort of built-in mini debugger) in case of abnormal situations or failure.
With the newer linux kernel (> 2.6.14) self-attaching to processes of the
same thread group does not work any longer. Any call to ptrace results in a
EPERM result.

I have worked around this problem by first forking the process, than creating
the callstack output in the forked child process - which works without the
above mentioned problem - and terminating the child process just after this
operation.

Anyway this solution is somehow dirty and I would prefer the way it was
implemented before. My question is: Why may a sibling thread not
ptrace_attach another process of the same thread group, while at the same
time a forked child process of the same thread is allowed to do this
operation? Is there any replacement like pthread_suspend, which is available
on other Unixes?

(A short program for the demonstration of this effect is attached. Use Option
-f to enable forking)

Best regards,

????????Andreas


Attachments:
(No filename) (1.23 kB)
trace.c (1.77 kB)
Download all attachments

2006-09-01 07:36:44

by Andreas Hobein

[permalink] [raw]
Subject: Re: Trouble with ptrace self-attach rule since kernel > 2.6.14

On Friday 01 September 2006 03:39, Andrew Morton wrote:
> I'm unable to identify what patch you're referring to here. Please be more
> specific so we can ask the person who developed it.

I assume the attached patch from Linus Torvalds causes my problem, since the
condition was changed from "if (task == current)" to "if (task->tgid ==
current->tgid)" it breaks my application code. There may be other parts of
the ptrace() kernel code that where changed accordingly that I'm not aware.

There is also a reply from Roland McGrath (see
http://lkml.org/lkml/2005/11/9/460) who mentioned that there may occur some
problems in "some real programs out there". May be I'm the first one who is
affected by this new behaviour.

To summarise my questions:
- Why should a thread not be allowed to ptrace_attach to a sibling thread
- while a forked child of this thread may do this ?
- Is there any other way to suspend sibling threads at arbitrary points like
phread_suspend_np() does for example on AIX?

Thanks, Andreas

---------------------------
>From [email protected] Wed Nov 9 12:04:07 2005
Date: Wed, 9 Nov 2005 11:37:57 -0800 (PST)
From: Linus Torvalds <[email protected]>
Subject: Fix ptrace self-attach rule

Before we did CLONE_THREAD, the way to check whether we were attaching
to ourselves was to just check "current == task", but with CLONE_THREAD
we should check that the thread group ID matches instead.

Signed-off-by: Linus Torvalds <[email protected]>
---
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 5b8dd98..b88d418 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -155,7 +155,7 @@ int ptrace_attach(struct task_struct *ta
retval = -EPERM;
if (task->pid <= 1)
goto bad;
- if (task == current)
+ if (task->tgid == current->tgid)
goto bad;
/* the same process cannot be attached many times */
if (task->ptrace & PT_PTRACED)

2006-09-01 07:49:34

by Andrew Morton

[permalink] [raw]
Subject: Re: Trouble with ptrace self-attach rule since kernel > 2.6.14

On Fri, 1 Sep 2006 09:36:38 +0200
Andreas Hobein <[email protected]> wrote:

> On Friday 01 September 2006 03:39, Andrew Morton wrote:
> > I'm unable to identify what patch you're referring to here. Please be more
> > specific so we can ask the person who developed it.
>
> I assume the attached patch from Linus Torvalds causes my problem, since the
> condition was changed from "if (task == current)" to "if (task->tgid ==
> current->tgid)" it breaks my application code. There may be other parts of
> the ptrace() kernel code that where changed accordingly that I'm not aware.
>
> There is also a reply from Roland McGrath (see
> http://lkml.org/lkml/2005/11/9/460) who mentioned that there may occur some
> problems in "some real programs out there". May be I'm the first one who is
> affected by this new behaviour.

When you have names, please cc them..

> To summarise my questions:
> - Why should a thread not be allowed to ptrace_attach to a sibling thread
> - while a forked child of this thread may do this ?
> - Is there any other way to suspend sibling threads at arbitrary points like
> phread_suspend_np() does for example on AIX?
>
> Thanks, Andreas
>
> ---------------------------
> >From [email protected] Wed Nov 9 12:04:07 2005
> Date: Wed, 9 Nov 2005 11:37:57 -0800 (PST)
> From: Linus Torvalds <[email protected]>
> Subject: Fix ptrace self-attach rule
>
> Before we did CLONE_THREAD, the way to check whether we were attaching
> to ourselves was to just check "current == task", but with CLONE_THREAD
> we should check that the thread group ID matches instead.
>
> Signed-off-by: Linus Torvalds <[email protected]>
> ---
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 5b8dd98..b88d418 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -155,7 +155,7 @@ int ptrace_attach(struct task_struct *ta
> retval = -EPERM;
> if (task->pid <= 1)
> goto bad;
> - if (task == current)
> + if (task->tgid == current->tgid)
> goto bad;
> /* the same process cannot be attached many times */
> if (task->ptrace & PT_PTRACED)

2006-09-01 18:29:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: Trouble with ptrace self-attach rule since kernel > 2.6.14



On Fri, 1 Sep 2006, Andrew Morton wrote:
> On Fri, 1 Sep 2006 09:36:38 +0200
> Andreas Hobein <[email protected]> wrote:
>
> > There is also a reply from Roland McGrath (see
> > http://lkml.org/lkml/2005/11/9/460) who mentioned that there may occur some
> > problems in "some real programs out there". May be I'm the first one who is
> > affected by this new behaviour.
>
> When you have names, please cc them..

Andreas isn't the first, but in the almost-a-year that the patch has been
part of the kernel, he's the second.

And for the first one I found a reasonable way to avoid the problem: the
debugging thread can do a "vfork()" (or, if vfork() does something bad in
libc, do the direct "clone(CLONE_VFORK|CLONE_MM)" thing) to have a new
thread that is in a _different_ thread group, but is able to ptrace and
also is "synchronized" with the VM, simply because it shares it with all
the other threads it might want to debug.

That "new" (last November) check isn't likely going away. It solved _so_
many problems (both security and stability), and considering that

(a) in a year, only two people have ever even _noticed_
(b) there's a work-around as per above that isn't horribly invasive

I have to say that in order to actually go back to the old behaviour, we'd
have to have somebody who cares _deeply_, go back and check every single
special case, deadlock, and race.

Not allowing ptracing to send signals and be part of the same thread group
really got rid of a _lot_ of complexity (and bugs - I don't mind
complexity per se, but complexity that was known broken and nobody could
really see a good solution for, and had both security and stability
implications is a bad bad thing).

But if somebody does feel that "deep caring feeling", I'll try to help
them. Maybe Roland and Oleg Nesterov might lend a hand too (Oleg in
particular by now probably knows all the races in that area, including
pthread parents sending signals to its own thread and de-parenting etc).

Hint, hint, Andreas. But I think it's a rats' nest, and you'd be better
off trying the CLONE_MM|CLONE_VFORK approach.

Oleg (now added to the Cc) in particular may answer authoritatively
whether maybe his fixes since last year have fixed some of the problems,
or whether they do indeed depend even more on the fact that the ptrace
"fake parent" cannot be in the same thread group.

(Even if we could make it work, I personally much prefer the fact that a
ptrace parent is forced to behave more like a "real parent" - who also
cannot be in the same thread group).

Linus

2006-09-02 17:03:27

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Trouble with ptrace self-attach rule since kernel > 2.6.14

On 09/01, Linus Torvalds wrote:
>
> On Fri, 1 Sep 2006, Andrew Morton wrote:
> > On Fri, 1 Sep 2006 09:36:38 +0200
> > Andreas Hobein <[email protected]> wrote:
> >
> > > There is also a reply from Roland McGrath (see
> > > http://lkml.org/lkml/2005/11/9/460) who mentioned that there may occur some
> > > problems in "some real programs out there". May be I'm the first one who is
> > > affected by this new behaviour.
> >
> > When you have names, please cc them..
>
> Andreas isn't the first, but in the almost-a-year that the patch has been
> part of the kernel, he's the second.
>
> And for the first one I found a reasonable way to avoid the problem: the
> debugging thread can do a "vfork()" (or, if vfork() does something bad in
> libc, do the direct "clone(CLONE_VFORK|CLONE_MM)" thing) to have a new
> thread that is in a _different_ thread group, but is able to ptrace and
> also is "synchronized" with the VM, simply because it shares it with all
> the other threads it might want to debug.
>
> That "new" (last November) check isn't likely going away. It solved _so_
> many problems (both security and stability), and considering that
>
> (a) in a year, only two people have ever even _noticed_
> (b) there's a work-around as per above that isn't horribly invasive
>
> I have to say that in order to actually go back to the old behaviour, we'd
> have to have somebody who cares _deeply_, go back and check every single
> special case, deadlock, and race.

I can't say whether it is safe to restore an old behaviour. This all is
tricky, and there were so many changes in that area since then.

At least we need to remove 'current->tgid != p->tgid' check in eligible_child().
Currently this is a dead code. But if we restore an old behaviour, this
allows to release ->group_leader too early and crash the kernel.

I personally do not see other problems right now, but this doesn't mean
they are not present...

On the other hand, if we are not going to go back, we can remove subtle
'tsk->parent->signal->flags & SIGNAL_GROUP_EXIT' check in exit_notify(),
and a similar one in may_ptrace_stop().

Oleg.


--
VGER BF report: H 0

2006-09-02 17:22:17

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] eligible_child: remove an obsolete ->tgid check

It is not possible to find a sub-thread in ->children/->ptrace_children
lists, ptrace_attach() does not allow to attach to sub-threads.

Even if it was possible to ptrace the task from the same thread group,
we can't allow to release ->group_leader while there are others (ptracer)
threads in the same group.

Signed-off-by: Oleg Nesterov <[email protected]>

--- 2.6.18-rc4/kernel/exit.c~ 2006-09-02 21:08:59.000000000 +0400
+++ 2.6.18-rc4/kernel/exit.c 2006-09-02 21:09:12.000000000 +0400
@@ -1051,7 +1051,7 @@ static int eligible_child(pid_t pid, int
* Do not consider thread group leaders that are
* in a non-empty thread group:
*/
- if (current->tgid != p->tgid && delay_group_leader(p))
+ if (delay_group_leader(p))
return 2;

if (security_task_wait(p))


--
VGER BF report: U 0.499788

2006-09-04 12:16:10

by Andreas Hobein

[permalink] [raw]
Subject: Re: Trouble with ptrace self-attach rule since kernel > 2.6.14

On Saturday 02 September 2006 19:03, Oleg Nesterov wrote:
> On 09/01, Linus Torvalds wrote:
> > On Fri, 1 Sep 2006, Andrew Morton wrote:
> > > On Fri, 1 Sep 2006 09:36:38 +0200
> > >
> > > Andreas Hobein <[email protected]> wrote:
> > > > There is also a reply from Roland McGrath (see
> > > > http://lkml.org/lkml/2005/11/9/460) who mentioned that there may
> > > > occur some problems in "some real programs out there". May be I'm the
> > > > first one who is affected by this new behaviour.
> > >
> > > When you have names, please cc them..
> >
> > Andreas isn't the first, but in the almost-a-year that the patch has been
> > part of the kernel, he's the second.
> >
> > And for the first one I found a reasonable way to avoid the problem: the
> > debugging thread can do a "vfork()" (or, if vfork() does something bad in
> > libc, do the direct "clone(CLONE_VFORK|CLONE_MM)" thing) to have a new
> > thread that is in a _different_ thread group, but is able to ptrace and
> > also is "synchronized" with the VM, simply because it shares it with all
> > the other threads it might want to debug.
> >
> > That "new" (last November) check isn't likely going away. It solved _so_
> > many problems (both security and stability), and considering that
> >
> > (a) in a year, only two people have ever even _noticed_
> > (b) there's a work-around as per above that isn't horribly invasive
> >
> > I have to say that in order to actually go back to the old behaviour,
> > we'd have to have somebody who cares _deeply_, go back and check every
> > single special case, deadlock, and race.
>
> I can't say whether it is safe to restore an old behaviour. This all is
> tricky, and there were so many changes in that area since then.
>
> At least we need to remove 'current->tgid != p->tgid' check in
> eligible_child(). Currently this is a dead code. But if we restore an old
> behaviour, this allows to release ->group_leader too early and crash the
> kernel.
>
> I personally do not see other problems right now, but this doesn't mean
> they are not present...
>
> On the other hand, if we are not going to go back, we can remove subtle
> 'tsk->parent->signal->flags & SIGNAL_GROUP_EXIT' check in exit_notify(),
> and a similar one in may_ptrace_stop().

Thank you all for your kind assistance. It turned out that using vfork() or
clone() would make a considerable redesign of my code necessary. While the
added overhead from a "real" fork plus communication of the result over pipes
is still acceptable, I currently have a lack of time to restructure my
application to work with vfork or clone and its intrinsic restrictions. Also
some more non-portable code would be added, which discourages me a bit also.

On the other hand I understand that this kernel change helped to fix a number
of problems while the now forbidden feature of self attaching to threads is
rarely used and could be achieved by other means like forking before
attaching.

Since I'm rather clueless with regard to the kernel internals I'm afraid I
can't add more value to this discussion other than to prove there is at least
a second application out there being affected by this patch.

Thank you once again and best regards,

Andreas

--
VGER BF report: U 0.5

2006-09-04 15:23:13

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Trouble with ptrace self-attach rule since kernel > 2.6.14

On 09/04, Andreas Hobein wrote:
>
> Thank you all for your kind assistance. It turned out that using vfork() or
> clone() would make a considerable redesign of my code necessary. While the
> added overhead from a "real" fork plus communication of the result over pipes
> is still acceptable, I currently have a lack of time to restructure my
> application to work with vfork or clone and its intrinsic restrictions. Also
> some more non-portable code would be added, which discourages me a bit also.

Could you test your application with 2.6.18-rc6 and this change

- if (task == current)
+ if (task->tgid == current->tgid)

reverted? I think any report, positive or negative, would be useful.

It would be nice if your test covers different conditions, such as
'main thread debugs sub-thread' and vice versa. Exec under ptrace is
also interesting.

> Since I'm rather clueless with regard to the kernel internals I'm afraid I
> can't add more value to this discussion other than to prove there is at least
> a second application out there being affected by this patch.

It's a pity to disappoint you, but you may be the 3rd :) Found this
unanswered message:

http://marc.theaimsgroup.com/?l=linux-kernel&m=114073955827139

(the author cc'ed)

Oleg.

2006-09-04 15:56:22

by Andreas Hobein

[permalink] [raw]
Subject: Re: Trouble with ptrace self-attach rule since kernel > 2.6.14

On Monday 04 September 2006 17:23, you wrote:
> Could you test your application with 2.6.18-rc6 and this change
>
> - if (task == current)
> + if (task->tgid == current->tgid)
>
> reverted? I think any report, positive or negative, would be useful.

In fact I applied exactly this change before posting to this mailing list to
kernel-2.6.17-1.2174_FC5 (Source rpm from fedora core 5) _without_ success.
Thats why I thought there were also some other changes with similar effects
in the kernel source at the same time.

> It would be nice if your test covers different conditions, such as
> 'main thread debugs sub-thread' and vice versa. Exec under ptrace is
> also interesting.

In my application a child thread debugs sibling threads and the parent.
Neither works for newer kernels.

I will try a 2.6.18-rc6 vanilla kernel with the above patch applied at home
and give you some feedback wether there is a different result as with the
patched 2.6.17-1.2174_FC5 kernel.

> It's a pity to disappoint you, but you may be the 3rd :) Found this
> unanswered message:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114073955827139

May be there are more advocates of self-debugging than expected ...

Andreas

2006-09-04 20:08:04

by Markus Gutschke

[permalink] [raw]
Subject: Re: Trouble with ptrace self-attach rule since kernel > 2.6.14

Oleg Nesterov wrote:
> It's a pity to disappoint you, but you may be the 3rd :) Found this
> unanswered message:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114073955827139
>
> (the author cc'ed)


I think, I would be the second one rather than the third one. Linus
replied to me personally, and that is probably the reason why the
archive shows the question as unanswered.

For the record (i.e. the mailing list archives), yes, I was able to
change my application to use clone(CLONE_VM) followed by ptrace(),
instead of ptrace()'ing from one of the threads in my application.

There were a few minor obstacles that I had to overcome, though. E.g.
some versions of glibc find the location of "errno" by looking at the
current stack pointer and masking off some bits. Since my code should be
portable across a large range of different glibc and kernel versions, I
had to accommodate this behavior by either 1) allocating the new
thread's stack within the old thread's stack, effectively sharing
"errno", or 2) making direct system calls and avoiding all functions
that access "errno".

The former approach is preferable when using CLONE_VFORK, but if that is
not an option than the second approach will work OK.

Overall, it turned out to be a few weeks worth of work making these
changes, but (as usual) most of the time was spent validating that the
new code works on all platforms and in all usage scenarios. As a result,
the new code is actually better than the old one. There definitely
seemed to be problems with the old approach and some older kernels, too.

Using clone(), makes the code slightly less portable, but all of this
code is already pretty Linux-specific anyway.

I'd be happy to answer any questions about working around various bugs
in historic kernel and glibc versions.


Markus

2006-09-04 21:42:45

by Andreas Hobein

[permalink] [raw]
Subject: Re: Trouble with ptrace self-attach rule since kernel > 2.6.14

> On Monday 04 September 2006 17:23, you wrote:
> > Could you test your application with 2.6.18-rc6 and this change
> >
> > - if (task == current)
> > + if (task->tgid == current->tgid)
> >
> > reverted? I think any report, positive or negative, would be useful.

Reverting this change in kernel 2.6.18-rc6 was successful, that means my
application has the old behaviour as before 2.6.15. I don't know why patching
the fedora kernel didn't lead to the same result.

I've tested tracing child threads from the parent thread as well as tracing
siblings and parent threads from a child. All tests where successful when
reverting the above mentioned changes.

I cannot judge wether reverting back to "if (task == current)" would cause
trouble in other cases. On the other hand, I'm quite happy with my
fork()/pipe() solution that works fine in my application so there is no need
to go back from my point of view.

I started this discussion mainly to get more background information, why
ptracing sibling threads is forbidden since kernel 2.6.15. I hope I
understood correctly that this change was a trade-off between removing the
self attach feature and elliminating many serious bugs. Thus I would
understand if the change won't be reverted in future kernel versions.

Andreas

2006-09-04 22:00:53

by Linus Torvalds

[permalink] [raw]
Subject: Re: Trouble with ptrace self-attach rule since kernel > 2.6.14



On Mon, 4 Sep 2006, Andreas Hobein wrote:
>
> I've tested tracing child threads from the parent thread as well as tracing
> siblings and parent threads from a child. All tests where successful when
> reverting the above mentioned changes.

The problems tend to happen when the thread leader exits while one of the
sub-threads is being traced, and the tracer thread ends up being
re-parented to be the child of the traced thread (or something like that -
I forget the exact details).

There was also some problem with the tracer doing an exit() without
detaching, or something.

We may have fixed most of the problems since - Oleg has certainly been
cleaning up some of this, and it's possible that the problems we had are
ok now.

Even back when it was broken, _normal_ use never showed the problem (ie no
well-behaved ptrace use would cause anything bad to happen). But the
breakage was a local DoS attack, where you could either force an oops or a
unkillable process, I forget which.

There was an exploit for at least one of the exploits, so maybe somebody
could test that exploit together with the one-line revert.

That said, it sounds like both of the people who ever noticed this are
reasonably happy with their work-arounds, so I'm hoping we can simply
decide not to care, and just keep doing the simpler "you can't ptrace your
own thread group" thing. That rule simply avoids a lot of special cases.

Linus