Zero is written at clear_tid_address when the process exits, and a futex
wake-up is issued on that address. This functionality is used by pthread_join().
sys_set_tid_address() changes this address for the current task.
Before this patch clear_tid_address could not be retrieved from user space.
I want to dump the full state of a task, so I need this address.
I also think it may be useful for debugging multithreaded programs.
I am not sure that ptrace is the suitable place. It could be added to prctl,
but I think that would be a bit useless and strange. I can't imagine a real
situation (apart from checkpointing) in which a thread would want to get its own
clear_tid_address from itself; this address is usually used by the parent.
Signed-off-by: Andrew Vagin <[email protected]>
---
include/linux/ptrace.h | 3 +++
kernel/ptrace.c | 3 +++
2 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index c2f1f6a..79b84a3 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -51,6 +51,9 @@
#define PTRACE_INTERRUPT 0x4207
#define PTRACE_LISTEN 0x4208
+/* Get clear_child_tid address */
+#define PTRACE_GET_TID_ADDRESS 0x4209
+
/* flags in @data for PTRACE_SEIZE */
#define PTRACE_SEIZE_DEVEL 0x80000000 /* temp flag for development */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 00ab2ca..ed7fbe7 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -845,6 +845,9 @@ int ptrace_request(struct task_struct *child, long request,
break;
}
#endif
+ case PTRACE_GET_TID_ADDRESS:
+ return put_user(child->clear_child_tid, (int __user **) data);
+
default:
break;
}
--
1.7.1
On Fri, Feb 03, 2012 at 02:11:23PM +0300, Andrew Vagin wrote:
Ummm... this really doesn't fit in ptrace. Cyrill, why not put it
together with other params you're exporting?
Thanks.
--
tejun
On Fri, Feb 03, 2012 at 08:25:19AM -0800, Tejun Heo wrote:
> Ummm... this really doesn't fit in ptrace. Cyrill, why not put it
> together with other params you're exporting?
>
We could add it to /proc/pid/stat (but I fear Andrew Morton would shoot me then ;)
Cyrill
On 02/03/2012 08:25 PM, Tejun Heo wrote:
> Ummm... this really doesn't fit in ptrace. Cyrill, why not put it
> together with other params you're exporting?
Because there's no need for current to get this value for itself, but it can
be useful for e.g. gdb. But we don't insist. A prctl extension is just fine.
> Thanks.
>
On 02/03, Cyrill Gorcunov wrote:
>
> On Fri, Feb 03, 2012 at 08:25:19AM -0800, Tejun Heo wrote:
>
> > Ummm... this really doesn't fit in ptrace. Cyrill, why not put it
> > together with other params you're exporting?
I agree. And to me this looks strange: some info is exported via /proc,
some things need ptrace.
> We could add it to /proc/pid/stat (but I fear Andrew Morton would shoot me then ;)
Maybe you can add /proc/pid/misc_ugly^Wrandom_things_for_cr which
you can extend as needed ;)
Oleg.
On Fri, Feb 03, 2012 at 05:41:29PM +0100, Oleg Nesterov wrote:
> On 02/03, Cyrill Gorcunov wrote:
> >
> > On Fri, Feb 03, 2012 at 08:25:19AM -0800, Tejun Heo wrote:
> >
> > > Ummm... this really doesn't fit in ptrace. Cyrill, why not put it
> > > together with other params you're exporting?
>
> I agree. And to me this looks strange: some info is exported via /proc,
> some things need ptrace.
>
> > We could add it to /proc/pid/stat (but I fear Andrew Morton would shoot me then ;)
>
> Maybe you can add /proc/pid/misc_ugly^Wrandom_things_for_cr which
> you can extend as needed ;)
>
I personally don't mind having a /proc/pid/cr directory where we could
export all the data we might ever need. Still, some of the things added
(such as /proc/pid/children) could be useful for other purposes as well,
not only c/r.
Cyrill
On 02/03, Pavel Emelyanov wrote:
>
> On 02/03/2012 08:25 PM, Tejun Heo wrote:
> > Ummm... this really doesn't fit in ptrace. Cyrill, why not put it
> > together with other params you're exporting?
>
> Because there's no need for current to get this value for itself, but it can
> be useful for e.g. gdb.
OK, perhaps this makes sense, I do not know.
Jan, Pedro, do you think gdb can use PTRACE_GET_TID_ADDRESS (returns
tracee->clear_child_tid) ?
Oleg.
On 02/03/2012 04:51 PM, Oleg Nesterov wrote:
> On 02/03, Pavel Emelyanov wrote:
>> Because there's no need for current to get this value for itself, but it can
>> be useful for e.g. gdb.
>
> OK, perhaps this makes sense, I do not know.
>
> Jan, Pedro, do you think gdb can use PTRACE_GET_TID_ADDRESS (returns
> tracee->clear_child_tid) ?
Off hand, I'm not picturing a use. But that may well just mean I'm lacking
imagination. Andrew, Pavel, did you have a particular idea in mind when you
said it may be useful for debugging a multithreading program / gdb?
--
Pedro Alves
On Tue, Feb 07, 2012 at 08:07:38PM +0000, Pedro Alves wrote:
> On 02/03/2012 04:51 PM, Oleg Nesterov wrote:
> > On 02/03, Pavel Emelyanov wrote:
>
> >> Because there's no need for current to get this value for itself, but it can
> >> be useful for e.g. gdb.
> >
> > OK, perhaps this makes sense, I do not know.
> >
> > Jan, Pedro, do you think gdb can use PTRACE_GET_TID_ADDRESS (returns
> > tracee->clear_child_tid) ?
>
> Off hand, I'm not picturing a use. But that may well just mean I'm lacking
> imagination. Andrew, Pavel, did you have a particular idea in mind when you
> said it may be useful for debugging a multithreading program / gdb?
>
Couldn't we set up a hw watchpoint on this address and get an interrupt
before pthread_join() sees the word cleared? (To be fair, I'm not sure such
a trick will work; I didn't test it ;)
Cyrill
On 02/07/2012 08:56 PM, Cyrill Gorcunov wrote:
> Couldn't we set up a hw watchpoint on this address and get an interrupt
> before pthread_join() sees the word cleared? (To be fair, I'm not sure such
> a trick will work; I didn't test it ;)
For a debugger wanting to know when a pthread_join was about to return?
Might be simpler to put a breakpoint (or stap probe, or some such) inside
pthread_join.
It's the kernel that writes to this address, so I've no
idea if the watchpoint trap ends up visible to userspace. Nor do I have
any idea which thread it would be reported to, given that this is
cleared when the child is gone.
--
Pedro Alves
On Tue, Feb 07, 2012 at 09:15:07PM +0000, Pedro Alves wrote:
> On 02/07/2012 08:56 PM, Cyrill Gorcunov wrote:
> > Couldn't we set up a hw watchpoint on this address and get an interrupt
> > before pthread_join() sees the word cleared? (To be fair, I'm not sure such
> > a trick will work; I didn't test it ;)
>
> For a debugger wanting to know when a pthread_join was about to return?
> Might be simpler to put a breakpoint (or stap probe, or some such) inside
> pthread_join.
Yes, it could be, but it means you have to install the pthread debug libs, right?
(I have no idea actually, since I personally use debug printing instead of
breakpoints.)
> It's the kernel that writes to this address, so I've no
> idea if the watchpoint trap ends up visible to userspace. Nor do I have
> any idea which thread it would be reported to, given that this is
> cleared when the child is gone.
Yeah, we need some help from someone who wrote the hw-breakpoint support in
the kernel (I don't remember the details).
Cyrill
On 02/07/2012 09:51 PM, Cyrill Gorcunov wrote:
> On Tue, Feb 07, 2012 at 09:15:07PM +0000, Pedro Alves wrote:
>> On 02/07/2012 08:56 PM, Cyrill Gorcunov wrote:
>>> Couldn't we set up a hw watchpoint on this address and get an interrupt
>>> before pthread_join() sees the word cleared? (To be fair, I'm not sure such
>>> a trick will work; I didn't test it ;)
>>
>> For a debugger wanting to know when a pthread_join was about to return?
>> Might be simpler to put a breakpoint (or stap probe, or some such) inside
>> pthread_join.
>
> Yes, it could be, but it means you have to install the pthread debug libs, right?
> (I have no idea actually, since I personally use debug printing instead of
> breakpoints.)
Not really more than what we need today: just an exported function name
in the ELF symbol table. Assuming that the program uses the clear_child_tid
address for pthread_join the way glibc does may not be a good idea; it's doing
things at the wrong layer. Also, hardware watchpoints are a scarce
resource.
>> It's the kernel that writes to this address, so I've no
>> idea if the watchpoint trap ends up visible to userspace. Nor do I have
>> any idea which thread it would be reported to, given that this is
>> cleared when the child is gone.
>
> Yeah, we need some help from someone who wrote the hw-breakpoint support in
> the kernel (I don't remember the details).
I just tried it. This is &pthread->tid in glibc/libpthread, so with debug
info it's easy to figure out where to set the watchpoint manually with gdb
without asking the kernel. Doesn't work. ptrace doesn't show any trap
for the kernel writes.
--
Pedro Alves
On Wed, Feb 08, 2012 at 12:30:47PM +0000, Pedro Alves wrote:
> >
> > Yes, it could be, but it means you have to install the pthread debug libs, right?
> > (I have no idea actually, since I personally use debug printing instead of
> > breakpoints.)
>
> Not really more than what we need today: just an exported function name
> in the ELF symbol table. Assuming that the program uses the clear_child_tid
> address for pthread_join the way glibc does may not be a good idea; it's doing
> things at the wrong layer. Also, hardware watchpoints are a scarce
> resource.
It's a pretty precious resource, but still incredibly useful. OK, I see
what you mean, thanks.
>
> >> It's the kernel that writes to this address, so I've no
> >> idea if the watchpoint trap ends up visible to userspace. Nor do I have
> >> any idea which thread it would be reported to, given that this is
> >> cleared when the child is gone.
> >
> > Yeah, we need some help from someone who wrote the hw-breakpoint support in
> > the kernel (I don't remember the details).
>
> I just tried it. This is &pthread->tid in glibc/libpthread, so with debug
> info it's easy to figure out where to set the watchpoint manually with gdb
> without asking the kernel. Doesn't work. ptrace doesn't show any trap
> for the kernel writes.
>
Thanks for the info.
Cyrill
Hi All,
> Off hand, I'm not picturing a use. But that may well just mean I'm lacking
> imagination. Andrew, Pavel, did you have a particular idea in mind when you
> said it may be useful for debugging a multithreading program / gdb?
For example:
pthread_join() waits on a futex (child_tid_address), but the thread is
already dead.
Where is the problem? First I would want to check that
pthread_join() uses the correct futex address. The simple way is
to attach to this process with gdb and read it.
I want to say that when you have trouble with child_tid_address, you
may want to get it.
If the code is yours, you may get it via prctl, but if it's not, what
will you do?
Yes, we have lived without this for a long time, so it may be that
everything I am saying is useless.
But there is one more reason why I want to add this to ptrace:
I want to use this functionality to dump a process's state.
Currently we perform a few actions in the context of the process we want
to dump, by injecting parasite code into it. This is dangerous,
because it may affect the target process (e.g. due to a bug in the
parasite). We do it this way because we want to add minimal functionality
to the kernel. Our current goal is to build a fully functional prototype,
and once everyone understands that this project works and is
useful, we will make improvements. I hope that in the future we will save
the state of processes without a parasite.
For this reason I avoid adding new actions to the parasite code, but
prctl() acts on the calling task, so it could only be executed from
parasite code.
If after all this you are still not sure that this functionality belongs in
ptrace, I will add it to prctl; it's not a problem.
Thanks.
On 02/08, Pedro Alves wrote:
>
> I just tried it. This is &pthread->tid in glibc/libpthread, so with debug
> info it's easy to figure out where to set the watchpoint manually with gdb
> without asking the kernel. Doesn't work. ptrace doesn't show any trap
> for the kernel writes.
The tracee simply can't report this trap: it is already dead ;) and the
hw breakpoint (used by ptrace) is "pinned" to the thread.
Oleg.
On 02/08/2012 05:31 PM, Oleg Nesterov wrote:
> On 02/08, Pedro Alves wrote:
>>
>> I just tried it. This is &pthread->tid in glibc/libpthread, so with debug
>> info it's easy to figure out where to set the watchpoint manually with gdb
>> without asking the kernel. Doesn't work. ptrace doesn't show any trap
>> for the kernel writes.
>
> The tracee simply can't report this trap: it is already dead ;) and the
> hw breakpoint (used by ptrace) is "pinned" to the thread.
Right, as I said. :-) I saw that a watchpoint trap isn't reported either
for the CLONE_CHILD_SETTID case (that is, within clone, when the kernel
writes the tid to the memory address passed in to the clone syscall).
I wouldn't have been surprised to see the trap in userspace in either
the parent or the child, though I'm not really surprised to not
see it either.
--
Pedro Alves
On Wed, 08 Feb 2012 14:41:45 +0100, Andrew Vagin wrote:
> I want to say that when you have trouble with child_tid_address,
> you may want to get it.
> If the code is yours, you may get it via prctl, but if it's not, what
> will you do?
GDB does PTRACE_ARCH_PRCTL (amd64-linux-nat.c) without inserting any code
into the debuggee. How can it be simplified further?
Jan
On 02/08, Pedro Alves wrote:
>
> On 02/08/2012 05:31 PM, Oleg Nesterov wrote:
> > On 02/08, Pedro Alves wrote:
> >>
> >> I just tried it. This is &pthread->tid in glibc/libpthread, so with debug
> >> info it's easy to figure out where to set the watchpoint manually with gdb
> >> without asking the kernel. Doesn't work. ptrace doesn't show any trap
> >> for the kernel writes.
> >
> > The tracee simply can't report this trap: it is already dead ;) and the
> > hw breakpoint (used by ptrace) is "pinned" to the thread.
>
> Right, as I said. :-) I saw that a watchpoint trap isn't reported either
> for the CLONE_CHILD_SETTID case (that is, within clone, when the kernel
> writes the tid to the memory address passed in to the clone syscall).
Yes. But in this case the new thread has no bps even if it is auto-
attached.
IOW, I think that hw bp can detect the write from the kernel space,
but I didn't check.
> I wouldn't have been surprised to see the trap in userspace in either
> the parent
It would simply be wrong. Please note that it is the child, not the parent,
that does the write.
If only I understood why we need CLONE_CHILD_SETTID... at least
I certainly do not understand why glibc translates fork() into
clone(CLONE_CHILD_SETTID) on my system. The child writes into its own
memory, so the parent can't see this change. IIRC, initially
CLONE_CHILD_SETTID wrote child->pid into the parent's memory, and
even before the child was actually created.
Oleg.
On Wed, Feb 08, 2012 at 08:02:50PM +0100, Oleg Nesterov wrote:
> > Right, as I said. :-) I saw that a watchpoint trap isn't reported either
> > for the CLONE_CHILD_SETTID case (that is, within clone, when the kernel
> > writes the tid to the memory address passed in to the clone syscall).
>
> Yes. But in this case the new thread has no bps even if it is auto-
> attached.
>
> IOW, I think that hw bp can detect the write from the kernel space,
> but I didn't check.
Yes, that's how kgdb works (unless I'm missing something obvious).
Cyrill
On Wed, 08 Feb 2012 20:02:50 +0100, Oleg Nesterov wrote:
> If only I understood why we need CLONE_CHILD_SETTID... at least
> I certainly do not understand why glibc translates fork() into
> clone(CLONE_CHILD_SETTID) on my system. The child writes into its own
> memory, so the parent can't see this change. IIRC, initially
> CLONE_CHILD_SETTID wrote child->pid into the parent's memory, and
> even before the child was actually created.
If I understand your question correctly, it is because if you PTRACE_SYSCALL
SYS_fork (and therefore SYS_clone) twice (so that you stop at the fork/clone
syscall exit), you should have valid struct pthread contents for iterating
over and examining the thread structures via libthread_db.
This cannot be achieved by any userland code.
Regards,
Jan
On 02/08/2012 07:02 PM, Oleg Nesterov wrote:
> On 02/08, Pedro Alves wrote:
>> Right, as I said. :-) I saw that a watchpoint trap isn't reported either
>> for the CLONE_CHILD_SETTID case (that is, within clone, when the kernel
>> writes the tid to the memory address passed in to the clone syscall).
>
> Yes. But in this case the new thread has no bps even if it is auto-
> attached.
Ah, right. It used to be that the kernel copied the debug registers from
parent to child, but they're always cleared in the child nowadays
(since 72f674d203cd230426437cdcf7dd6f681dad8b0d).
>> I wouldn't have been surprised to see the trap in userspace in either
>> the parent
>
> It would simply be wrong. Please note that it is the child, not the parent,
> that does the write.
Okay, I didn't know which it was that touched the memory,
hence the "either". Thanks. Paired with the
we-now-clear-debug-regs-on-clone thing, it makes sense.
--
Pedro Alves
Jan Kratochvil wrote:
> On Wed, 08 Feb 2012 14:41:45 +0100, Andrew Vagin wrote:
> > I want to say that when you have trouble with child_tid_address,
> > you may want to get it.
> > If the code is yours, you may get it via prctl, but if it's not, what
> > will you do?
>
> GDB does PTRACE_ARCH_PRCTL (amd64-linux-nat.c) without inserting any code
> into the debuggee. How can it be simplified further?
That's x86-64 only (although, bizarrely, it looks like 32-bit tasks can
use it too, if running on a 64-bit kernel), whereas child_tid_address is
for all architectures.
We've already got a whole syscall, get_robust_list(), which fetches
that address from a remote process if you have ptrace permission, and
you don't even need to be ptracing it! get_robust_list() is quite
large, and most of it is permission checks. How often is that useful?
I don't see why getting the child_tid_address should be a big deal.
On the other hand, lots of little bits of info would be handy to a
ptracer sometimes, such as the current signal mask and prctl settings.
Maybe what's needed is a generic PTRACE_PRCTL, allowed to call a
subset of prctl() functions from outside, or maybe even all of them.
Then add child_tid_address as a prctl.
( Then again maybe just skip the messing around that everyone does with
"parasite" code, and go straight for PTRACE_CALL_SYSCALL. It'd save a
lot of bother :-) )
All the best,
-- Jamie