2011-02-23 13:55:17

by Stas Sergeev

[permalink] [raw]
Subject: [path][rfc] add PR_DETACH prctl command

Hi.

The attched patch adds the PR_DETACH prctl command.
It is needed for those rare but unfortunate cases, where
you can't daemonize your process before creating a thread.
The effect of this command is similar to the fork() and then
exit() on parent, except that:
1. PID does not change
2. Threads are not destroyed

It would be nice to know what people think about such an
approach.

Signed-off-by: [email protected]
CC: Oleg Nesterov <[email protected]>


Attachments:
pr_detach.diff (5.38 kB)

2011-02-23 19:23:04

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

On 02/23, Stas Sergeev wrote:
>
> Hi.
>
> The attched patch adds the PR_DETACH prctl command.

Hi. The patch doesn't look right at first glance, but to me
this is not the main problem.

> It is needed for those rare but unfortunate cases, where
> you can't daemonize your process before creating a thread.
> The effect of this command is similar to the fork() and then
> exit() on parent, except that:
> 1. PID does not change
> 2. Threads are not destroyed
>
> It would be nice to know what people think about such an
> approach.

Well. You should somehow convince people we need this ;) This is
the main problem.

I am not going to discuss this, I never know when it comes to the
new feautures. And you need the authoritative ack, probably you
can ask Linus + Roland directly.

As for the patch itself,

> +static int wait_task_detached(struct wait_opts *wo, struct task_struct *p)
> +{
> + int retval = 0;
> + pid_t pid = task_pid_vnr(p);
> + uid_t uid = __task_cred(p)->uid;
> +
> + get_task_struct(p);
> + if (unlikely(wo->wo_flags & WNOWAIT)) {
> + read_unlock(&tasklist_lock);
> + return wait_noreap_copyout(wo, p, pid, uid, CLD_DETACHED,
> + p->exit_code >> 8);
> + }
> +
> + p->flags &= ~PF_DETACH;

Only current can change its ->flags, this is racy

> + if (!ptrace_reparented(p))
> + p->parent = init_pid_ns.child_reaper;
> + p->real_parent = init_pid_ns.child_reaper;
> + p->exit_signal = SIGCHLD;
> + list_move_tail(&p->sibling, &p->real_parent->children);

No, we can't do this under read_lock(tasklist). And you forgot about
threads, they also have ->real_parent == old_parent.

The usage of ->exit_code doesn't look right, espeicaily if it is traced.

And other problems afaics....

> @@ -1549,6 +1581,9 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
> if (p->exit_state == EXIT_DEAD)
> return 0;
>
> + if (p->flags & PF_DETACH)
> + return wait_task_detached(wo, p);

What if it is already dead? We are goint to reparent it, but init
won't notice the new zombie.

And what if do_wait() was called without WEXITED? say, the old parent
does waitpid(WSTOPPED).

> @@ -1450,10 +1450,10 @@ int do_notify_parent(struct task_struct *tsk, int sig)
>
> BUG_ON(sig == -1);
>
> - /* do_notify_parent_cldstop should have been called instead. */
> - BUG_ON(task_is_stopped_or_traced(tsk));
> + /* do_notify_parent_cldstop should have been called instead. */
> + BUG_ON(task_is_stopped_or_traced(tsk));
>
> - BUG_ON(!task_ptrace(tsk) &&
> + BUG_ON(!task_ptrace(tsk) && (tsk->flags & PF_EXITING) &&
> (tsk->group_leader != tsk || !thread_group_empty(tsk)));

Afaics, you are trying to hide the problem.... The code below can make
tsk detached if real_parent ignores SIGCHLD.

> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1736,6 +1736,22 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> else
> error = PR_MCE_KILL_DEFAULT;
> break;
> + case PR_DETACH:
> + error = -EPERM;
> + /* if parent is init, or not a group leader - bail */
> + if (me->real_parent == init_pid_ns.child_reaper)

This is not exactly right. What if the child of init's sub-thread
does PR_DETACH?

Oleg.

2011-02-23 20:40:05

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

23.02.2011 22:14, Oleg Nesterov wrote:
> The attched patch adds the PR_DETACH prctl command.
> Hi. The patch doesn't look right at first glance,
I know, thanks for the review. That's basically an RFC material.

> Well. You should somehow convince people we need this ;)
I don't have to: I need this patch, and that's the main motivation
for me. :) Though of course I'll offer it for inclusion when its ready,
but I am developing it mostly for my project.
google reveals that many people were confused by the fact that
daemon() silently drops all the threads, and the man page says
nothing about that nasty habit. And I really think there is no other
way to daemonize the process with threads, than to use something
like this patch, or is there?

> Only current can change its ->flags, this is racy
Oh my, add a new lock only for that? :((
Add another thread_struct member only for that?
Abuse ->exit_state only for that?
Nothing looks good...

>> + if (!ptrace_reparented(p))
>> + p->parent = init_pid_ns.child_reaper;
>> + p->real_parent = init_pid_ns.child_reaper;
>> + p->exit_signal = SIGCHLD;
>> + list_move_tail(&p->sibling,&p->real_parent->children);
> No, we can't do this under read_lock(tasklist). And you forgot about
> threads, they also have ->real_parent == old_parent.
Thanks, will fix.

> The usage of ->exit_code doesn't look right, espeicaily if it is traced.
Could you please elaborate on that? I am using the
->exit_code to pass the (fake) exit code to the parent.
The argument of my PR_DETACH is an exit code to pass.
What is a problem with that?

> What if it is already dead? We are goint to reparent it, but init
> won't notice the new zombie.
>
> And what if do_wait() was called without WEXITED? say, the old parent
> does waitpid(WSTOPPED).
Will fix.

>> @@ -1450,10 +1450,10 @@ int do_notify_parent(struct task_struct *tsk, int sig)
>>
>> BUG_ON(sig == -1);
>>
>> - /* do_notify_parent_cldstop should have been called instead. */
>> - BUG_ON(task_is_stopped_or_traced(tsk));
>> + /* do_notify_parent_cldstop should have been called instead. */
>> + BUG_ON(task_is_stopped_or_traced(tsk));
>>
>> - BUG_ON(!task_ptrace(tsk)&&
>> + BUG_ON(!task_ptrace(tsk)&& (tsk->flags& PF_EXITING)&&
>> (tsk->group_leader != tsk || !thread_group_empty(tsk)));
> Afaics, you are trying to hide the problem.... The code below can make
> tsk detached if real_parent ignores SIGCHLD.
Will fix the problem with parent ignoring SIGCHLD, thanks.
Though could you please clarify whether or not you see the
above hunk wrong? It is there just because the group is not
empty when the leader does PR_DETACH, so I adjusted the
sanity check.

>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
>> @@ -1736,6 +1736,22 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>> else
>> error = PR_MCE_KILL_DEFAULT;
>> break;
>> + case PR_DETACH:
>> + error = -EPERM;
>> + /* if parent is init, or not a group leader - bail */
>> + if (me->real_parent == init_pid_ns.child_reaper)
> This is not exactly right. What if the child of init's sub-thread
> does PR_DETACH?
Will fix.

Thanks for your review! I'll update the patch.

2011-02-24 13:37:30

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

On 02/23, Stas Sergeev wrote:
>
> 23.02.2011 22:14, Oleg Nesterov wrote:
>
>> Well. You should somehow convince people we need this ;)
> I don't have to: I need this patch, and that's the main motivation
> for me. :)

OK,

> Though of course I'll offer it for inclusion when its ready,
> but I am developing it mostly for my project.
> google reveals that many people were confused by the fact that
> daemon() silently drops all the threads, and the man page says
> nothing about that nasty habit.

I think it does... from man daemon

This function forks, and if the fork(2) succeeds,
the parent calls _exit(2),

and of course fork() doesn't clone the sub-threads.

> And I really think there is no other
> way to daemonize the process with threads, than to use something
> like this patch, or is there?

Depends on what "daemonize" means. Even with this patch, setsid()
after PR_DETACH can fail because we do not change the pids and
the caller can still be pgrp leader.

And. What if the parent of PR_DETACH caller blocks or ignores
SIGCHLD or simply doesn't call do_wait()? The caller can run with
PR_DETACH set without any effect "forever".

There are other problems. For example, if the caller is ptraced
it can silently disappear from parent's radar. If you fix this, then
the actual reparenting can be delayed unpredictably whatever the
real parent does.

So, to be honest, I do not think this idea will be accepted, and I don't
really understand your motivation. But once again, I never argue with the
"we need this feature" requests, no need to convince me.

>> Only current can change its ->flags, this is racy
> Oh my, add a new lock only for that? :((
> Add another thread_struct member only for that?
> Abuse ->exit_state only for that?
> Nothing looks good...

Yep. But this is minor, if nothing else you can use signal->flags.
In fact, I do not understand why only the group leader can do
prctl(PR_DETACH) and why this flag is per-thread.

>> The usage of ->exit_code doesn't look right, espeicaily if it is traced.
> Could you please elaborate on that? I am using the
> ->exit_code to pass the (fake) exit code to the parent.
> The argument of my PR_DETACH is an exit code to pass.
> What is a problem with that?

The problem is that ptrace uses this ->exit_code member as well.
Suppose that the (ptraced) task calls PR_DETACH and, say, recieves
a signal after that. See ptrace_signal().

>>> @@ -1450,10 +1450,10 @@ int do_notify_parent(struct task_struct *tsk, int sig)
>>>
>>> BUG_ON(sig == -1);
>>>
>>> - /* do_notify_parent_cldstop should have been called instead. */
>>> - BUG_ON(task_is_stopped_or_traced(tsk));
>>> + /* do_notify_parent_cldstop should have been called instead. */
>>> + BUG_ON(task_is_stopped_or_traced(tsk));
>>>
>>> - BUG_ON(!task_ptrace(tsk)&&
>>> + BUG_ON(!task_ptrace(tsk)&& (tsk->flags& PF_EXITING)&&
>>> (tsk->group_leader != tsk || !thread_group_empty(tsk)));
>> Afaics, you are trying to hide the problem.... The code below can make
>> tsk detached if real_parent ignores SIGCHLD.
> Will fix the problem with parent ignoring SIGCHLD, thanks.
> Though could you please clarify whether or not you see the
> above hunk wrong? It is there just because the group is not
> empty when the leader does PR_DETACH, so I adjusted the
> sanity check.

I understand why you added PF_EXITING. And, once again, this is not
right afaics. The current condition is more or less "random" and mostly
historical, although correct. If we want to take PF_EXITING into account,
we should just add BUG_ON(!(tsk->flags & PF_EXITING)). IOW, it is just
wrong to call this function unless this tsk exits.

Oleg.

2011-02-24 15:14:00

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

24.02.2011 16:29, Oleg Nesterov wrote:
>> And I really think there is no other
>> way to daemonize the process with threads, than to use something
>> like this patch, or is there?
> Depends on what "daemonize" means. Even with this patch, setsid()
> after PR_DETACH can fail because we do not change the pids and
> the caller can still be pgrp leader.
Yes, I am using TIOCNOTTY ioctl instead.
It doesn't detach the entire group from tty though, but the plan
is to implement also TIOCNOTTY_GRP, in case PR_DETACH is done.

> And. What if the parent of PR_DETACH caller blocks or ignores
> SIGCHLD or simply doesn't call do_wait()? The caller can run with
> PR_DETACH set without any effect "forever".
I am currently rewriting the patch to solve this all.
What I am trying to do now, is to reparent directly in prctl(),
but delay the list_move_tail(&p->sibling, &p->real_parent->children);
to the wait() call. If this is a feasible solution, I'll post the new patch.

> So, to be honest, I do not think this idea will be accepted, and I don't
> really understand your motivation. But once again, I never argue with the
> "we need this feature" requests, no need to convince me.
Lets see if the clean implementation is possible first. :)

> The problem is that ptrace uses this ->exit_code member as well.
> Suppose that the (ptraced) task calls PR_DETACH and, say, recieves
> a signal after that. See ptrace_signal().
Also do_signal_stop() seems to alter it.
Do you mean right now it can't happen that multiple events
alter the exit_code, and the parent notices only the last one?
In this case I need to add a separate variable.

> I understand why you added PF_EXITING. And, once again, this is not
> right afaics. The current condition is more or less "random" and mostly
> historical, although correct. If we want to take PF_EXITING into account,
> we should just add BUG_ON(!(tsk->flags& PF_EXITING)). IOW, it is just
> wrong to call this function unless this tsk exits.
OK, I'll address also this.

Thanks for your time, I am hoping to post the patch that addresses
the pointed problems.

2011-02-24 15:40:42

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

On 02/24, Stas Sergeev wrote:
>
> I am currently rewriting the patch to solve this all.
> What I am trying to do now, is to reparent directly in prctl(),
> but delay the list_move_tail(&p->sibling, &p->real_parent->children);
> to the wait() call.

Ooh... good luck ;) but I expect this won't be simple. In fact,
I do not understand how it is possible to do this correctly.

>> The problem is that ptrace uses this ->exit_code member as well.
>> Suppose that the (ptraced) task calls PR_DETACH and, say, recieves
>> a signal after that. See ptrace_signal().
> Also do_signal_stop() seems to alter it.

Yes. It is not immediately obvious but this is in fact for ptrace.
Even if this thread is not traced, the tracer can attach later.

> Do you mean right now it can't happen that multiple events
> alter the exit_code, and the parent notices only the last one?

Yes.

> In this case I need to add a separate variable.

I'd suggest you to always report 0 as the exit status. Much simpler.
Then you can do another patch if you really want to report arg2.

Oleg.

2011-03-31 16:10:56

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

Hi Oleg.

I found some time to get back to that patch and
to address all of the problems you pointed.
What do you think about the attached patch?
I didn't expect it would became that big.


Attachments:
pr_detach.diff (18.83 kB)

2011-03-31 17:03:06

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

Hi Stas,

On 03/31, Stas Sergeev wrote:
>
> I found some time to get back to that patch and
> to address all of the problems you pointed.
> What do you think about the attached patch?
> I didn't expect it would became that big.

fs/proc/array.c | 7 -
include/asm-generic/siginfo.h | 3
include/linux/init_task.h | 2
include/linux/prctl.h | 2
include/linux/sched.h | 21 +++-
kernel/exit.c | 200 +++++++++++++++++++++++++++++++++++-------
kernel/fork.c | 4
kernel/signal.c | 59 +++++++-----
kernel/sys.c | 45 +++++++++
9 files changed, 281 insertions(+), 62 deletions(-)

Eek! Not only it is big. It is complex and changes a lot of core
kernel code.

Sorry Stas, I am not going to try to review it carefully. As I said,
you need to convince lkml we need this feature first. And iirc you
are not going to suggest this change for everyone.

I guess, the main complication is that you are trying to ensure the
old parent can do wait() without -ECHLD... This complicates everything
soooooooooo much. I _feel_ this can be simplified.... but in any case
we need the nasty complications. And for what?


I only looked at sys_prctl() code, and almost every line looks wrong.
Hmm... in fact, the changes in exit.c look wrong too, but I didn't really
try to understand them.

> @@ -1736,6 +1737,50 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> else
> error = PR_MCE_KILL_DEFAULT;
> break;
> + case PR_DETACH: {
> + struct task_struct *p, *old_parent;
> + int notif = DEATH_REAP;
> + error = -EPERM;
> + /* not detaching from init */
> + if (me->real_parent == init_pid_ns.child_reaper)

2 problems. You shouldn't use init_pid_ns, you need the task's namespace.
Also, the task can be the child of /sbin/init's sub-thread.

> + write_lock_irq(&tasklist_lock);
> + old_parent = me->real_parent;
> + me->detach_code = arg2 << 8;
> + if (!task_detached(me))
> + notif = do_signal_parent(me, me->exit_signal,
> + CLD_DETACHED, arg2);

This is simply wrong. We reparent the whole thread group, we should
always notify the old parent. Or never. but this shouldn't depend on
the thread.

> + if (notif != DEATH_REAP) {
> + list_add_tail(&me->detached_sibling,
> + &me->real_parent->detached_children);
> + me->exit_state = EXIT_DETACHED;

No, no, we can't set ->exit_state != 0. This means the task is dead.

> + if (!ptrace_reparented(me))
> + me->parent = init_pid_ns.child_reaper;

Again, this shouldn't use init_pid_ns.child_reaper. But the main problem,
you can't trust ptrace_reparented(). What if the old parent ptraces this
task?

> + /* detaching makes us a group leader */
> + me->group_leader = me;

How? Now, we can't change ->group_leader, this is simply not possible
and very wrong. If nothing else, think about tid/tgid, but there are
a lot more problems.

> + while_each_thread(me, p) {
> + if (p->real_parent != old_parent)
> + continue;
> + if (!ptrace_reparented(p))
> + p->parent = init_pid_ns.child_reaper;
> + p->real_parent = init_pid_ns.child_reaper;

The same problems as above, pluse "p->real_parent != old_parent" looks
bogus.


Well. Once again, I never argue with new features, but you need to
convince lkml. Probably it is simple to implement PR_DETACH so that
the task just "disappears" from the old_parent's radar. Otherwise
we need more complications, but I'd rather add the fake TASK_ZOMBIE
task_struct for that. This will be much, much simply although not
pretty anyway.

Oleg.

2011-03-31 17:47:12

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

31.03.2011 21:02, Oleg Nesterov wrote:
> I only looked at sys_prctl() code, and almost every line looks wrong.
Well, the lines you pointed, were also in the
previous patch, and, as you didn't complained
back then, I thought they were fine. :)

>> + if (notif != DEATH_REAP) {
>> + list_add_tail(&me->detached_sibling,
>> + &me->real_parent->detached_children);
>> + me->exit_state = EXIT_DETACHED;
> No, no, we can't set ->exit_state != 0. This means the task is dead.
Does this really break things? I mean, there are
probably no checks like "if (task->exit_state)", every
time the exact value is checked. And EXIT_DETACHED
is only set for the short period: the wait() from parent
(either new or old) will remove it.

>> + /* detaching makes us a group leader */
>> + me->group_leader = me;
> How? Now, we can't change ->group_leader, this is simply not possible
> and very wrong. If nothing else, think about tid/tgid, but there are
> a lot more problems.
OK, thats why in the previous patch I was allowing only
the group leader to detach, but you complained. Should
I return that back?

>> + while_each_thread(me, p) {
>> + if (p->real_parent != old_parent)
>> + continue;
>> + if (!ptrace_reparented(p))
>> + p->parent = init_pid_ns.child_reaper;
>> + p->real_parent = init_pid_ns.child_reaper;
> The same problems as above, pluse "p->real_parent != old_parent" looks
> bogus.
Just an extra care. Should I just remove that check?

> Well. Once again, I never argue with new features, but you need to
> convince lkml. Probably it is simple to implement PR_DETACH so that
> the task just "disappears" from the old_parent's radar.
Yes, that worked for me too:
https://lkml.org/lkml/2010/12/25/37
Yes, I know there are bugs too. :) But, at least the patch
is just few lines.

> Otherwise
> we need more complications, but I'd rather add the fake TASK_ZOMBIE
> task_struct for that. This will be much, much simply although not
> pretty anyway.
Well, maybe the patch looks more complex than it actually is.
How it works:
- num_waiters is set to 1 by fork(). Then PR_DETACH may increment
it if the old parent does not ignore the SIGCHLD. Both the
wait_task_zombie() and wait_task_detached() do decrement that
counter, and when it is zero, the task is reaped. Also, if the old
parent terminates without wait(), it decrements that counter, and,
if needed, reaps the task.
- exit_state is set to EXIT_DETACHED if the parent doesn't ignore
SIGCHLD. Then, if old parent wait()s, exit_state gets reset to 0. But
if the process exits, exit_state gets set to EXIT_ZOMBIE, and then,
by the use of the num_waiters counter, I make sure both parents
waited, before releasing the task.
There were some rearrangements in the exit.c code, that are
not directly related to the new feature. I can split them to the
separate patches, if that will help.

As for convincing LKML... Well, when the code is right, maybe. :))

2011-03-31 18:18:38

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

On 03/31, Stas Sergeev wrote:
>
> 31.03.2011 21:02, Oleg Nesterov wrote:
>> I only looked at sys_prctl() code, and almost every line looks wrong.
> Well, the lines you pointed, were also in the
> previous patch, and, as you didn't complained
> back then, I thought they were fine. :)

Of course, I forgot completely what the previous patch did ;)
Perhaps I missed them before, or perhaps I stopped looking after
I noticed other problems.

>>> + if (notif != DEATH_REAP) {
>>> + list_add_tail(&me->detached_sibling,
>>> + &me->real_parent->detached_children);
>>> + me->exit_state = EXIT_DETACHED;
>> No, no, we can't set ->exit_state != 0. This means the task is dead.
> Does this really break things? I mean, there are
> probably no checks like "if (task->exit_state)",

There are. zap_other_threads() for example. More importantly, we
can have more of them. exit_state != 0 means: this task has passed
exit_notify(), we shouldn't break this.

>>> + /* detaching makes us a group leader */
>>> + me->group_leader = me;
>> How? Now, we can't change ->group_leader, this is simply not possible
>> and very wrong. If nothing else, think about tid/tgid, but there are
>> a lot more problems.
> OK, thats why in the previous patch I was allowing only
> the group leader to detach, but you complained. Should
> I return that back?

Ah, I _seem_ to recall... Yes, it seems strange that only group leader
can do PR_DETACH. If we implement this, I think any thread should be
allowed to call prctl(DETACH). But why do we need to change the leader?

>> Well. Once again, I never argue with new features, but you need to
>> convince lkml. Probably it is simple to implement PR_DETACH so that
>> the task just "disappears" from the old_parent's radar.
> Yes, that worked for me too:
> https://lkml.org/lkml/2010/12/25/37
> Yes, I know there are bugs too. :) But, at least the patch
> is just few lines.

I didn't look at the patch above, but probably it makes more sense.
At least for the start.

>> Otherwise
>> we need more complications, but I'd rather add the fake TASK_ZOMBIE
>> task_struct for that. This will be much, much simply although not
>> pretty anyway.
> Well, maybe the patch looks more complex than it actually is.
> How it works:

I didn't mean it is very hard to understand the intent. What is
really hard (at least to me) to see if it is correct or not.

And in any case it adds a lot of the code, and complicates the things.

So, just in case, I have to admit: yes, personally I dislike this
new feature ;) But, again, I am not going to argue.

> There were some rearrangements in the exit.c code, that are
> not directly related to the new feature. I can split them to the
> separate patches, if that will help.

Yes, this always help.

> As for convincing LKML... Well, when the code is right, maybe. :))

Maybe ;)

But, otoh. It is quite possible you will get the quick nack...

Oleg.

2011-03-31 20:58:57

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

31.03.2011 22:18, Oleg Nesterov wrote:
> > + if (me->real_parent == init_pid_ns.child_reaper)
> Also, the task can be the child of /sbin/init's sub-thread.
Hmm, how to check then? Should I add the "exact_parent" just
for that? Or traverse the sibling list? How bad. :(


> There are. zap_other_threads() for example. More importantly, we
> can have more of them. exit_state != 0 means: this task has passed
> exit_notify(), we shouldn't break this.
OK.

> Ah, I _seem_ to recall... Yes, it seems strange that only group leader
> can do PR_DETACH. If we implement this, I think any thread should be
> allowed to call prctl(DETACH). But why do we need to change the leader?
OK, I didn't know it is safe to leave it unchanged.

> I didn't look at the patch above, but probably it makes more sense.
> At least for the start.
IIRC strace didn't like the fact that wait() fails, and that's
why I started to add the complexity.

> I didn't mean it is very hard to understand the intent. What is
> really hard (at least to me) to see if it is correct or not.
OK, I'll try to break it into a small chunks then.
But the fact that such a seemingly simple functionality
requires so many changes, is IMHO bad by itself. And it
is also non-trivial to implement mostly because of the
ptrace hooks that are everywhere. Some cleanups are
certainly needed in that area. :)

> So, just in case, I have to admit: yes, personally I dislike this
> new feature ;)
Do you dislike a feature by itself, or by its implementation?
The feature by itself does nothing but to allow the daemon()
to work with threads, which can't be bad IMO, and, in some
cases, the inability to do so is difficult to work around. Of course,
these cases have nothing to do with the good design, but rather
with the vendor-provided poorly-coded drivers and libs. In all
the better cases you have the trivial workarounds, but why to
search for the workarounds at all, in the first place? Why not
to have the daemon() to "just work" with threads? :)

> > As for convincing LKML... Well, when the code is right, maybe. :))
> Maybe ;)
>
> But, otoh. It is quite possible you will get the quick nack...
Well, with such a rate of additions into a thread_struct, I'll
probably nack it myself soon, but again, I am surprised that
such a simple task requires so many untrivial changes.

2011-04-01 17:02:43

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

Hi Oleg.

Here are the splitted patches.
What do you think?
I admit that I haven't found yet the solutions to all
the problems you pointed yesterday, namely to the
check of "real_parent == init" and ptrace_reparented,
so ignore these 2 for now.
But probably now you can have a look into the exit.c part?
The first 2 patches are just the rearrangements and
should not incur any functional changes. The third one
is an implementation of pr_detach.
This time, the child is allowed to disappear from parent's
radar in case the old parent was slow to wait(), and the
process have exited and the new parent have wait()ed.
This is probably fine and not worth the complications,
what do you think?

Thanks for your time!


Attachments:
01_sigpar.diff (3.56 kB)
02_cwaittsk.diff (3.04 kB)
03_prdetach.diff (6.39 kB)
Download all attachments

2011-04-02 13:55:45

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

On 04/01, Stas Sergeev wrote:
>
> 31.03.2011 22:18, Oleg Nesterov wrote:
>> > + if (me->real_parent == init_pid_ns.child_reaper)
>> Also, the task can be the child of /sbin/init's sub-thread.
> Hmm, how to check then? Should I add the "exact_parent" just
> for that? Or traverse the sibling list? How bad. :(

You can check same_thread_group(real_parent, ns->child_reaper)

> > I didn't look at the patch above, but probably it makes more sense.
> > At least for the start.
> IIRC strace didn't like the fact that wait() fails, and that's
> why I started to add the complexity.

Of course, the task should never escape ptrace, but this is
another issue?

And yes, it is not good if the child simply disappears, but the
semantics is "strange" in any case.

> OK, I'll try to break it into a small chunks then.
> But the fact that such a seemingly simple functionality
> requires so many changes, is IMHO bad by itself.

Indeed. You are trying to do something unnatural ;)

> And it
> is also non-trivial to implement mostly because of the
> ptrace hooks that are everywhere. Some cleanups are
> certainly needed in that area. :)

Not sure ptrace is the main problem... To me, it is the problem,
yes, but mostly from the semantics pov.

>> So, just in case, I have to admit: yes, personally I dislike this
>> new feature ;)
> Do you dislike a feature by itself, or by its implementation?

Both ;)

> The feature by itself does nothing but to allow the daemon()
> to work with threads, which can't be bad IMO,

Well. It is bad even if the patch was correct. This complicates
and slows down the code. The code should be maintained. And for
what?

> Of course,
> these cases have nothing to do with the good design,

Indeed. You suggest the exotic and non-portable feature, almost
nobody will use it. But every user should pay for that. This is
not fair.

> but why to
> search for the workarounds at all, in the first place? Why not
> to have the daemon() to "just work" with threads? :)

The kernel is already huge. I simply can't understand why do
you think this is good idea to add more bloat for this feature.

And this is not enough to daemonize anyway. setsid() won't work.



Stas, please do not try to convince me, you can't. OTOH, let me
repeat, you do not need to convince me. I'd suggest you to discuss
this with Linus. If he agrees - then we can look at your patch
seriously and discuss the implementation details. Everything can
be implemented.

Oleg.

2011-04-02 14:06:50

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

On 04/01, Stas Sergeev wrote:
> Hi Oleg.
>
> This time, the child is allowed to disappear from parent's
> radar in case the old parent was slow to wait(),

Hmm... Again, I didn't really read the patch... But iiuc, the
detached child can exit and init can reap it, after that the
old parent gets ECHLD?

You know, even if the patch was correct and this feature was
acked, I'd try to nack this ;) do_wait() from parent should
always work or it should always return ECHLD, but it should
not depend on /dev/random. This is really weird, imho.

OK, even ignoring prctl() code he patch is buggy. __unhash_process()
does list_del_init(&p->detached_sibling). This is too late. If the
old parent has already called wait_task_detached() and cleared
->detaching, this child will add the unnecessary cost to every
subsequent do_wait() call. If the old parent exits, list_del_init()
can crash the kernel.

I didn't read the patch further.

Oleg.

2011-04-02 18:20:47

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

02.04.2011 17:55, Oleg Nesterov wrote:
>> OK, I'll try to break it into a small chunks then.
>> But the fact that such a seemingly simple functionality
>> requires so many changes, is IMHO bad by itself.
> Indeed. You are trying to do something unnatural ;)
Well, the fact that fork/exit was always used to detach
from parent, doesn't mean that it was natural: a single
syscall looks more natural. :)

> Indeed. You suggest the exotic and non-portable feature, almost
> nobody will use it. But every user should pay for that. This is
> not fair.
If that is true, then of course this should not be ever applied.
But why do you think so? What gets slowed down? wait()?
But if the detached_sibling list is empty, then why?

> The kernel is already huge. I simply can't understand why do
> you think this is good idea to add more bloat for this feature.
That depends on the final code (if it will ever be produced).
Maybe it can be very simple? :) I don't want to add bloat, I
want a small and simple patch.

> And this is not enough to daemonize anyway. setsid() won't work.
But TIOCNOTTY will.

> Stas, please do not try to convince me, you can't. OTOH, let me
> repeat, you do not need to convince me. I'd suggest you to discuss
> this with Linus. If he agrees - then we can look at your patch
Well, if the patch is bloated or buggy, there is no reason to
discuss it with Linus. :) If the patch is small and simple, that
alone will give it lots of credits.

> Hmm... Again, I didn't really read the patch... But iiuc, the
> detached child can exit and init can reap it, after that the
> old parent gets ECHLD?
Yes. Because making sure that both parents wait()ed, probably
cannot be reliably implemented without the write_lock_irq(&tasklist_lock),
while the current code uses only read lock. Or you really need
a fake task_struct...

> do_wait() from parent should
> always work or it should always return ECHLD, but it should
> not depend on /dev/random. This is really weird, imho.
OK, in this case, I guess, the fake task_struct is the last chance.

> does list_del_init(&p->detached_sibling). This is too late. If the
> old parent has already called wait_task_detached() and cleared
> ->detaching, this child will add the unnecessary cost to every
> subsequent do_wait() call.
OK, will add the forgotten "if (->detaching)" check before calling
wait_task_detached(), thanks.

> If the old parent exits, list_del_init()
> can crash the kernel.
Why?

Yes, I can read Russian, as you asked in another e-mail. :)
Let me clarify once again: I don't want to add neither bloat, nor
the slowdown on the common path. But I haven't yet convinced myself
that this is unavoidable. If you haven't noticed, I shrunk the
patch 3 times, and fixed most of the bugs already. :)) And I am not
even trying to convince you (or Linus) in anything, before the pach
is made simple. If it can't be made simple, then I'll leave it to
my project only. That was the intention anyway. :)

2011-04-02 22:00:49

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

02.04.2011 17:55, Oleg Nesterov wrote:
>
>>>> + if (me->real_parent == init_pid_ns.child_reaper)
>>> Also, the task can be the child of /sbin/init's sub-thread.
>> Hmm, how to check then? Should I add the "exact_parent" just
>> for that? Or traverse the sibling list? How bad. :(
> You can check same_thread_group(real_parent, ns->child_reaper)
But real_parent==ns->child_reaper in our case, so what does
this check give?

> acked, I'd try to nack this;) do_wait() from parent should
> always work or it should always return ECHLD, but it should
> not depend on /dev/random. This is really weird, imho.
OK, this can be fixed by delaying the wait() from init till
the old parent wait()s or die. Will fix.

2011-04-04 14:34:19

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

Hi Oleg.

Here's the patch that addresses your concerns
about the late deleting from list.
Also, the patch is shrunk twice.
I think it is about to be trivial this time.
I still haven't solved the problems with checking
parent and checking ptrace, so ignore them for
now (or give me the hints:)
Do we still have other bugs here?


Attachments:
01_sigpar.diff (3.56 kB)
pr_detach2.diff (4.96 kB)
Download all attachments

2011-04-04 16:04:15

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

Hi Stas,

On 04/04, Stas Sergeev wrote:
>
> Here's the patch that addresses your concerns
> about the late deleting from list.
> Also, the patch is shrunk twice.
> I think it is about to be trivial this time.

But it is very wrong at first glance...

> I still haven't solved the problems with checking
> parent and checking ptrace, so ignore them for
> now (or give me the hints:)

Not sure I understand your question...

But, once again. You should not use ptrace_reparented(). Reparanted or
not, the ptraced thread must not change its ->parent.

Sorry Stas, I have no time to read the patch, just one thing...

> + me->detach_code = arg2 << 8;
> ...
> + if (notif != DEATH_REAP)
> + me->detaching = 1;
> + else
> + list_move_tail(&me->sibling,
> + &me->real_parent->children);

This is simply wrong unless the caller is the group_leader. Probably
there was some confusion, I thought we already discussed this. But this
is minor...

> + while_each_thread(me, p) {
> + if (!ptrace_reparented(p))
> + p->parent = pid_ns->child_reaper;
> + p->real_parent = pid_ns->child_reaper;

Eek. Even ignoring ptrace, this is weird. We change parent/real_parent,
but we do not do list_move_tail(sibling) until wait_task_detached() !
No, I think we should not do this even if this was correct. I'll try
to nack this in any case, even if there were no immediate problems ;)
IMHO, this is insane.

But this is wrong. Well. Suppose that the caller of PR_DETACH exits
before the old parent does do_wait(). What /sbin/init (who is the new
parent) can do after it gets SIGCHLD? If can't see this zombie. Nor
the old parent can release this task due to ->detaching. Eventually
/sbin/init can reap it if it does, say, waitpid(-1), but still this
is wrong.

Or. Suppose that the old parent exits after its child does PR_DETACH.
You changed forget_original_parent() but this is not enough, note that
find_new_reaper() can pick the live sub-thread. In this case the child
will be moved to init's ->children list, and after that we are changing
->real_parent back.

Oleg.

2011-04-04 20:06:28

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

04.04.2011 20:03, Oleg Nesterov wrote:
>> I still haven't solved the problems with checking
>> parent and checking ptrace, so ignore them for
>> now (or give me the hints:)
> Not sure I understand your question...
I wonder how to check whether the child was
reparented to init, and not is a child on init's
subthread. You suggested to check
same_thread_group(real_parent, ns->child_reaper),
but I don't understand that suggestion because in
our case real_parent==ns->child_reaper, yet it is not
enough, as it could be of CLONE_PARENT (I suppose).

>> + while_each_thread(me, p) {
>> + if (!ptrace_reparented(p))
>> + p->parent = pid_ns->child_reaper;
>> + p->real_parent = pid_ns->child_reaper;
> Eek. Even ignoring ptrace, this is weird. We change parent/real_parent,
> but we do not do list_move_tail(sibling) until wait_task_detached() !
> No, I think we should not do this even if this was correct. I'll try
> to nack this in any case, even if there were no immediate problems ;)
> IMHO, this is insane.
>
> But this is wrong. Well. Suppose that the caller of PR_DETACH exits
> before the old parent does do_wait(). What /sbin/init (who is the new
> parent) can do after it gets SIGCHLD? If can't see this zombie. Nor
> the old parent can release this task due to ->detaching. Eventually
> /sbin/init can reap it if it does, say, waitpid(-1), but still this
> is wrong.
No, the idea was like that: the old parent either wait()s or
exits, then init became a "real" parent of that process, and
reaps it immediately. I think that's natural: the very same will happen
if the child didn't do pr_detach. If it exits, its current parent
have to either wait(), or exit. If it doesn't do so - zombie.
I decided that if I still allow wait() for the old parent, I'd better
leave that logic untouched. So: until the old parent wait()s
or exits, we have zombie, and that's what I supposed to implement.

> Or. Suppose that the old parent exits after its child does PR_DETACH.
> You changed forget_original_parent() but this is not enough, note that
> find_new_reaper() can pick the live sub-thread. In this case the child
> will be moved to init's ->children list, and after that we are changing
> ->real_parent back.
How? I think I prevented that with this:
---

+ p->detaching = 0;
+ continue;
---
So it should not go down to get reparented back.
Or am I missing something?

Thanks for your time!

2011-04-05 15:16:16

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

On 04/05, Stas Sergeev wrote:
>
> 04.04.2011 20:03, Oleg Nesterov wrote:
>>> I still haven't solved the problems with checking
>>> parent and checking ptrace, so ignore them for
>>> now (or give me the hints:)
>> Not sure I understand your question...
> I wonder how to check whether the child was
> reparented to init, and not is a child on init's
> subthread.

I don't understand the "and not is a child on init's subthread".
If the child was created by init's sub-thread, it is the child
of the whole thread group.

->real_parent points to thread, yes. But the parent is the whole
process, not thread. The only reason for this oddity is __WNOTHREAD.

> You suggested to check
> same_thread_group(real_parent, ns->child_reaper),

Yes. This means: our parent is the init process.

>>> + while_each_thread(me, p) {
>>> + if (!ptrace_reparented(p))
>>> + p->parent = pid_ns->child_reaper;
>>> + p->real_parent = pid_ns->child_reaper;
>> Eek. Even ignoring ptrace, this is weird. We change parent/real_parent,
>> but we do not do list_move_tail(sibling) until wait_task_detached() !
>> No, I think we should not do this even if this was correct. I'll try
>> to nack this in any case, even if there were no immediate problems ;)
>> IMHO, this is insane.
>>
>> But this is wrong. Well. Suppose that the caller of PR_DETACH exits
>> before the old parent does do_wait(). What /sbin/init (who is the new
>> parent) can do after it gets SIGCHLD? If can't see this zombie. Nor
>> the old parent can release this task due to ->detaching. Eventually
>> /sbin/init can reap it if it does, say, waitpid(-1), but still this
>> is wrong.
> No, the idea was like that: the old parent either wait()s or
> exits, then init became a "real" parent of that process, and
> reaps it immediately.

I understand this, but

> I think that's natural:

I strongly disagree, this is not natural. I mean, ->real_parent
and the head of ->sibling list should match each other.

For example. Ignoring other problems, you are doing list_move() in
do_wait() pathes. This happens to work now because everything is
protected by the global tasklist. But we are going to change the
locking, it should be per-process.

> If it exits,

If it exits, it notifies the new ->real_parent == init. init receives
SIGCHLD, but do_wait() can't see this process on ->children lists.

> its current parent
> have to either wait(), or exit. If it doesn't do so - zombie.

Indeed. And, if the old parent does wait(), this simply does
list_move_tail(p->sibling), a zombie won't be reaped. And nobody will
reap it, until init does waitpid(-1). This was my point, this is wrong.

>> Or. Suppose that the old parent exits after its child does PR_DETACH.
>> You changed forget_original_parent() but this is not enough, note that
>> find_new_reaper() can pick the live sub-thread. In this case the child
>> will be moved to init's ->children list, and after that we are changing
>> ->real_parent back.
> How? I think I prevented that with this:
> ---
>
> + p->detaching = 0;
> + continue;

Yes, thanks, I didn't notice "continue". But then this is wrong again.
This can race with wait_task_detached() called by our sub-thread, it
can clear ->detaching before we check it.

Oleg.

2011-04-05 16:25:16

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

05.04.2011 19:15, Oleg Nesterov wrote:
> I don't understand the "and not is a child on init's subthread".
> If the child was created by init's sub-thread, it is the child
> of the whole thread group.
>
> ->real_parent points to thread, yes.
OK, sorry, I thought you mean the CLONE_PARENT case, where
the real_parent points _not_ to thread. That's why the confusion.

> But the parent is the whole
> process, not thread. The only reason for this oddity is __WNOTHREAD.
OK, now I see what problem you are pointing to.
Will fix.

>> How? I think I prevented that with this:
>> ---
>>
>> + p->detaching = 0;
>> + continue;
> Yes, thanks, I didn't notice "continue". But then this is wrong again.
> This can race with wait_task_detached() called by our sub-thread, it
> can clear ->detaching before we check it.
But the above code is under write_lock_irq(&tasklist_lock), so why
would it race?

2011-04-05 16:46:21

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

On 04/05, Stas Sergeev wrote:
>
> 05.04.2011 19:15, Oleg Nesterov wrote:
>>>
>>> + p->detaching = 0;
>>> + continue;
>> Yes, thanks, I didn't notice "continue". But then this is wrong again.
>> This can race with wait_task_detached() called by our sub-thread, it
>> can clear ->detaching before we check it.
> But the above code is under write_lock_irq(&tasklist_lock), so why
> would it race?

wait_task_detached() clears detaching, then it drops tasklist and takes
it for writing. forget_original_parent() can come in between.

Oleg.

2011-04-05 17:51:44

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

Hi Oleg, here's the patch that should address
the mentioned problems. Or does it add more? :)
I try to delay the notification of init till the detaching
is complete.


Attachments:
a.diff (7.70 kB)

2011-04-08 10:51:11

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

Hi Oleg.

I updated patch to fix the race between wait_task_detached()
and wait_task_zombie() by using is_detaching flag.
Here's the patch.
What problems do remain here?


Attachments:
01_sigpar.diff (3.56 kB)
02_cwaittsk.diff (3.04 kB)
pr_detach3.diff (7.97 kB)
Download all attachments

2011-04-08 18:14:09

by Bryan Donlan

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

On Wed, Feb 23, 2011 at 08:50, Stas Sergeev <[email protected]> wrote:
> Hi.
>
> The attched patch adds the PR_DETACH prctl command.
> It is needed for those rare but unfortunate cases, where
> you can't daemonize your process before creating a thread.
> The effect of this command is similar to the fork() and then
> exit() on parent, except that:
> 1. PID does not change
> 2. Threads are not destroyed
>
> It would be nice to know what people think about such an
> approach.

I can't comment on the patch itself, but, if your application knows it
might have to daemonize after spinning up threads, why not simply
fork() immediately on startup, and have the parent simply wait forever
for either the child to die or for a daemonize signal from the child?
If done early enough it shouldn't tie up too much memory, and it
wouldn't require any of these invasive kernel changes. As an added
bonus, it's portable to all unixen :)

2011-04-08 18:56:01

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

On 04/08, Stas Sergeev wrote:
>
> I updated patch to fix the race between wait_task_detached()
> and wait_task_zombie() by using is_detaching flag.
> Here's the patch.
> What problems do remain here?

Stas, sorry, I simply have no time to read this patch. Not to mention,
you know that I dislike it in any case.

I only quickly looked at the changes in sys_prctl and they are definitely
wrong.

> + me->real_parent = pid_ns->child_reaper;
> + if (thread_group_leader(me)) {
> + list_move_tail(&me->sibling,
> + &me->real_parent->children);
> + /* reparent threads */
> + p = me;
> + while_each_thread(me, p) {
> + if (!task_ptrace(p))
> + p->parent = pid_ns->child_reaper;
> + p->real_parent = pid_ns->child_reaper;

Why do you check thread_group_leader() ???? What if the caller is not
the leader?

I told this many times, and I got lost. If you reparent, you should
always reparent the whole group.

Also, you are doing do_signal_parent(me). I do not know what it does
(once again, I didn't read the patch), but this looks suspicious.
A sub-thread shouldn't notify the parent.

Oleg.

2011-04-08 20:17:02

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

08.04.2011 22:55, Oleg Nesterov wrote:
> Why do you check thread_group_leader() ???? What if the caller is not
> the leader?
>
> I told this many times, and I got lost. If you reparent, you should
> always reparent the whole group.
Arrgh, OK, hope I realize that now, sorry! There were lots of other
problems, so I was constantly forgetting about this one. Or,
maybe, I couldn't get away with the fact that any sub-thread
can reparent the whole group. OK, will definitely fix this time. :))
Thanks!

2011-04-08 20:27:14

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

08.04.2011 22:13, Bryan Donlan wrote:
> I can't comment on the patch itself, but, if your application knows it
> might have to daemonize after spinning up threads, why not simply
> fork() immediately on startup, and have the parent simply wait forever
> for either the child to die or for a daemonize signal from the child?
> If done early enough it shouldn't tie up too much memory, and it
> wouldn't require any of these invasive kernel changes. As an added
> bonus, it's portable to all unixen :)
Yes, thats almost always true. Except when you deal with
the vendor-provided poorly-coded drivers and libs, where
you get the drivers initialized only via the lib. And the
initialization process must be finished before the boot-up
can continue, but that's not the whole story: only the
process that initialized that lib, can then work with it.
And of course that lib creates a dozen of threads...
OK, you've got the idea. :)
But anyway. Yes, not everyone have to deal with such a
nightmare, this is very rare. But people being confused
by the fact that daemon() silently loses threads, are not
rare. So I just wonder: even if the workaround is simple,
why searching for the workaround at all? If you need
2 syscalls to detach from parent, and you loose the threads
in a process, then why not to have a single call, that
detaches without loosing threads?
But in any case, I am mostly inclinced to leave that patch
for my project only.

2011-04-08 20:52:47

by Bryan Donlan

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

2011/4/8 Stas Sergeev <[email protected]>:
> 08.04.2011 22:13, Bryan Donlan wrote:
>>
>> I can't comment on the patch itself, but, if your application knows it
>> might have to daemonize after spinning up threads, why not simply
>> fork() immediately on startup, and have the parent simply wait forever
>> for either the child to die or for a daemonize signal from the child?
>> If done early enough it shouldn't tie up too much memory, and it
>> wouldn't require any of these invasive kernel changes. As an added
>> bonus, it's portable to all unixen :)
>
> Yes, thats almost always true. Except when you deal with
> the vendor-provided poorly-coded drivers and libs, where
> you get the drivers initialized only via the lib. And the
> initialization process must be finished before the boot-up
> can continue, but that's not the whole story: only the
> process that initialized that lib, can then work with it.
> And of course that lib creates a dozen of threads...
> OK, you've got the idea. :)

Still, you can workaround this by either:
a) Load the vendor library via dlopen()
b) Use a separate launcher executable to handle the fork-and-wait

Either of these approaches are far simpler than patching the kernel,
more portable, and much, much easier to get right. In fact, here's a
quick and dirty implementation of the latter:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void sigusr1_handler(int unused)
{
(void)unused;

_exit(0);
}

int main(int argc, char **argv)
{
int waitstatus;
pid_t child, parent = getpid();

(void)argc;

signal(SIGUSR1, sigusr1_handler);

child = fork();
if (child < 0) {
perror("Fork failed");
return EXIT_FAILURE;
}
if (child == 0) {
char envbuf[20];
sprintf(envbuf, "%d", (int)parent);
setenv("LAUNCHER_PID", envbuf, 1);

execvp(argv[1], &argv[1]);
perror("Launching child failed");
_exit(EXIT_FAILURE);
}

wait(&waitstatus);

if (WIFEXITED(waitstatus))
_exit(WEXITSTATUS(waitstatus));

_exit(256);
}

Just kill(atoi(getenv("LAUNCHER_PID")), SIGUSR1) to detach. Much
easier than doing some very racy things in the kernel, no? It's
certainly more obvious that this ought to be correct in the face of
races with its parent :)

2011-04-08 21:15:36

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

09.04.2011 00:52, Bryan Donlan wrote:
> Still, you can workaround this by either:
Yes, sure.

> a) Load the vendor library via dlopen()
Too much to dlsym(), and by the way, what does this give?
I _have to_ init that lib early at bootup, so I don't see how
dlopen will help, could you clarify?

> b) Use a separate launcher executable to handle the fork-and-wait
Yes, that's certainly possible, except that I was trying the
semaphore instead of a signal, as in your example, but its
the same.

> Just kill(atoi(getenv("LAUNCHER_PID")), SIGUSR1) to detach. Much
> easier than doing some very racy things in the kernel, no? It's
> certainly more obvious that this ought to be correct in the face of
> races with its parent :)
But what races do you mean? Yes, the workaround is a
workaround, and it works. And is simpler to implement. :)
But what problems do you see with the PR_DETACH approach,
except for the bugs in my patch? :) I mean, the fact that
daemon() silently loses threads, sounds like a limitation,
after all.

2011-04-08 21:26:41

by Bryan Donlan

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

2011/4/8 Stas Sergeev <[email protected]>:
> 09.04.2011 00:52, Bryan Donlan wrote:
>>
>> Still, you can workaround this by either:
>
> Yes, sure.
>
>> a) Load the vendor library via dlopen()
>
> Too much to dlsym(), and by the way, what does this give?
> I _have to_ init that lib early at bootup, so I don't see how
> dlopen will help, could you clarify?

Using dlopen will allow you to perform the fork() prior to running the
library's initialization code.

>> Just kill(atoi(getenv("LAUNCHER_PID")), SIGUSR1) to detach. Much
>> easier than doing some very racy things in the kernel, no? It's
>> certainly more obvious that this ought to be correct in the face of
>> races with its parent :)
>
> But what races do you mean? Yes, the workaround is a
> workaround, and it works. And is simpler to implement. :)
> But what problems do you see with the PR_DETACH approach,
> except for the bugs in my patch? :) I mean, the fact that
> daemon() silently loses threads, sounds like a limitation,
> after all.

As I said, I can't comment on the patch - I'm not familiar with that
part of the kernel, so I don't know what kind of races may be lurking.
But based on Oleg's comments, it seems clear to me that implementing
your PR_DETACH is quite a complex thing to be doing. It is true that
being able to daemonize() without losing threads would be a nice thing
to have; I just have my doubts that it's worth the potential bugs that
such a change might introduce, when it's only needed in a somewhat
rare case, and when perfectly good workarounds exist in userspace.

2011-04-08 21:39:29

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

09.04.2011 01:25, Bryan Donlan wrote:
> Using dlopen will allow you to perform the fork() prior to running the
> library's initialization code.
But I don't even need that: I init it manually, there
is no automatic ctors there.

> As I said, I can't comment on the patch - I'm not familiar with that
> part of the kernel, so I don't know what kind of races may be lurking.
> But based on Oleg's comments, it seems clear to me that implementing
> your PR_DETACH is quite a complex thing to be doing.
Yes, that's true, but treat that as my personal tour to the
kernel internals. I wasn't realizing the complexity initially,
and now its almost done, so I wonder what the result will
look like, it will it worth all the troubles.

> It is true that
> being able to daemonize() without losing threads would be a nice thing
> to have; I just have my doubts that it's worth the potential bugs that
> such a change might introduce, when it's only needed in a somewhat
> rare case, and when perfectly good workarounds exist in userspace.
Yes, I share that concerns too. And it is very unlikely that
I will propose that patch for inclusion, but I think it is possible
to code that up small and simple, in which case maybe even
Oleg will not hate it all that much. ;))

2011-04-11 11:15:52

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command

Hi Oleg.

In the attached patch I finally reparent the whole group.
How does that look?


Attachments:
pr_detach4.diff (7.83 kB)

2011-04-19 14:44:46

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [1/3]

Reposting per Oleg's request.

The attached patch adds do_signal_parent() function.
It is the same as do_notify_parent(), but can be used
when the child is not terminating, but (for example)
detaches with PR_DETACH.


Attachments:
01_sigpar.diff (3.56 kB)

2011-04-19 14:50:05

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [2/3]

The attached patch adds the can_wait_task() and
can_wait_task_ptrace() functions by splitting out the
checks from wait_consider_task().

int ret = wait_consider_task(wo, 0, p);

gets then replaced by

ret = can_wait_task(wo, p);
if (!ret)
continue;
ret = wait_consider_task(wo, 0, p);


Attachments:
02_cwaittsk.diff (3.04 kB)

2011-04-19 14:54:37

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [3/3]

The attached patch implements the PR_DETACH prctl
command. It detaches the entire process group from
its parent, allowing the parent to still read the detach
code with normal wait(). If the process then exits, the
notification of the new parent is delayed till the old parent
does either wait() to read the detach code, or exits.


Attachments:
pr_detach4.diff (7.83 kB)

2011-04-19 14:57:40

by Alan

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [3/3]

On Tue, 19 Apr 2011 18:54:34 +0400
Stas Sergeev <[email protected]> wrote:

> The attached patch implements the PR_DETACH prctl
> command. It detaches the entire process group from
> its parent, allowing the parent to still read the detach
> code with normal wait(). If the process then exits, the
> notification of the new parent is delayed till the old parent
> does either wait() to read the detach code, or exits.

What is the use case for this and why won't existing process group
interfaces do the job ?

2011-04-19 15:08:57

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [3/3]

19.04.2011 18:58, Alan Cox wrote:
>> The attached patch implements the PR_DETACH prctl
>> command. It detaches the entire process group from
>> its parent, allowing the parent to still read the detach
>> code with normal wait(). If the process then exits, the
>> notification of the new parent is delayed till the old parent
>> does either wait() to read the detach code, or exits.
> What is the use case for this and why won't existing process group
> interfaces do the job ?
The use case is to daemonize the process with threads.
You first need to detach it from its parent, then use TIOCNOTTY
ioctl to detach from the tty (TIOCNOTTY_GRP doesn't seem
to exist too, though, but might be easy to implement).
There are a few workarounds to that that I am aware of,
but what exactly interfaces do you have in mind? I have
found nothing that allows to do the same without a workarounds
like this:
https://lkml.org/lkml/2011/4/8/278
The current way of detaching, which is a fork/exit combo,
loses all threads, so, when you call daemon() and you had
threads, you are a toast.

2011-04-19 15:53:35

by Alan

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [3/3]

> The use case is to daemonize the process with threads.
> You first need to detach it from its parent, then use TIOCNOTTY
> ioctl to detach from the tty (TIOCNOTTY_GRP doesn't seem
> to exist too, though, but might be easy to implement).
> There are a few workarounds to that that I am aware of,
> but what exactly interfaces do you have in mind? I have
> found nothing that allows to do the same without a workarounds
> like this:
> https://lkml.org/lkml/2011/4/8/278
> The current way of detaching, which is a fork/exit combo,
> loses all threads, so, when you call daemon() and you had
> threads, you are a toast.

Yes - you need to detach then create the threads.

The reason I ask is you appear to add overhead to various hot paths and
you add 48 bytes to each task struct if I read the code right. Thats half
a megabyte on a server running a pile of java gunge !

Alan

2011-04-19 16:13:51

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [3/3]

On 04/19, Alan Cox wrote:
>
> > The use case is to daemonize the process with threads.
> > You first need to detach it from its parent, then use TIOCNOTTY
> > ioctl to detach from the tty (TIOCNOTTY_GRP doesn't seem
> > to exist too, though, but might be easy to implement).
> > There are a few workarounds to that that I am aware of,
> > but what exactly interfaces do you have in mind? I have
> > found nothing that allows to do the same without a workarounds
> > like this:
> > https://lkml.org/lkml/2011/4/8/278
> > The current way of detaching, which is a fork/exit combo,
> > loses all threads, so, when you call daemon() and you had
> > threads, you are a toast.
>
> Yes - you need to detach then create the threads.
>
> The reason I ask is you appear to add overhead to various hot paths

... and complicates this code.

> and
> you add 48 bytes to each task struct if I read the code right. Thats half
> a megabyte on a server running a pile of java gunge !

... to add the non-portable (and _imho_ unneeded) feauture.

IOW, personally I do not like this change in any case. But I am not
going to argue if someone acks this new feauture.




I'll try to check these patches from the correctness pov tomorrow,
but to be honest I hope someone will nack them before I start ;)

Oleg.

2011-04-19 16:19:24

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [3/3]

19.04.2011 19:54, Alan Cox wrote:
>> The use case is to daemonize the process with threads.
>> You first need to detach it from its parent, then use TIOCNOTTY
>> ioctl to detach from the tty (TIOCNOTTY_GRP doesn't seem
>> to exist too, though, but might be easy to implement).
>> There are a few workarounds to that that I am aware of,
>> but what exactly interfaces do you have in mind? I have
>> found nothing that allows to do the same without a workarounds
>> like this:
>> https://lkml.org/lkml/2011/4/8/278
>> The current way of detaching, which is a fork/exit combo,
>> loses all threads, so, when you call daemon() and you had
>> threads, you are a toast.
> Yes - you need to detach then create the threads.
That's painful sometimes:
https://lkml.org/lkml/2011/4/8/266

> The reason I ask is you appear to add overhead to various hot paths and
> you add 48 bytes to each task struct if I read the code right. Thats half
> a megabyte on a server running a pile of java gunge !
I don't think there is a real code overhead: the new list is
always empty (unless you detach). But the memory overhead
is real. Can be shrunk though.
Well, there is always an option to leave that patch for my own
project if people dislike it too much.
But I'd like to know the margins, i.e. how much memory is acceptable?
Eg, if it only uses 8 extra bytes (one pointer), is it acceptable then,
or not in any case? What is the policy?

2011-04-19 16:29:59

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [3/3]

On 04/19, Oleg Nesterov wrote:
>
> I'll try to check these patches from the correctness pov tomorrow,
> but to be honest I hope someone will nack them before I start ;)

OK, I briefly looked at 3/3. It looks certainly wrong.

notif = do_signal_parent(...);
if (notif != DEATH_REAP) {
....

do_signal_parent() must not return DEATH_REAP (this means that
leader->exit_signal becomes -1), but this can happen and this is bug.

I can be wrong, but iirc this was already discussed, probably when
you sent the very first version which had the same problem. That is
why (in particular) do_notify_parent() does

BUG_ON(tsk->group_leader != tsk || !thread_group_empty(tsk))

but you simply removed this check to hide the problem.



Also. I didn't actually read the patch yet, but iiuc: if a task T does
PR_DETACH and then exits, init can't reap it until the old parent does
wait. This can confuse the poor admin, he can see the zombies with
ppid = 1 and there is no way to understand why.

Oleg.

2011-04-19 16:54:13

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [3/3]

19.04.2011 20:29, Oleg Nesterov wrote:
>> I'll try to check these patches from the correctness pov tomorrow,
>> but to be honest I hope someone will nack them before I start ;)
> OK, I briefly looked at 3/3. It looks certainly wrong.
>
> notif = do_signal_parent(...);
> if (notif != DEATH_REAP) {
> ....
>
> do_signal_parent() must not return DEATH_REAP (this means that
> leader->exit_signal becomes -1), but this can happen and this is bug.
>
Could you please clarify this a bit: according to the comments
in signal.c:
---
* We are exiting and our parent doesn't care. POSIX.1
* defines special semantics for setting SIGCHLD to SIG_IGN
* or setting the SA_NOCLDWAIT flag: we should be reaped
* automatically and not left for our parent's wait4 call.
---
That's how I understand it: if DEATH_REAP is returned, the
parent ignores SIGCHILD, and in this case I am not allowing
it to read the detach code with wait(). What is the bug?

> Also. I didn't actually read the patch yet, but iiuc: if a task T does
> PR_DETACH and then exits, init can't reap it until the old parent does
> wait. This can confuse the poor admin, he can see the zombies with
> ppid = 1 and there is no way to understand why.
Oh my. :)) I guess you are not going to suggest a solution to
that "problem", other than to rip the patch? :)

2011-04-19 17:21:19

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [3/3]

On 04/19, Stas Sergeev wrote:
>
> 19.04.2011 20:29, Oleg Nesterov wrote:
>>> I'll try to check these patches from the correctness pov tomorrow,
>>> but to be honest I hope someone will nack them before I start ;)
>> OK, I briefly looked at 3/3. It looks certainly wrong.
>>
>> notif = do_signal_parent(...);
>> if (notif != DEATH_REAP) {
>> ....
>>
>> do_signal_parent() must not return DEATH_REAP (this means that
>> leader->exit_signal becomes -1), but this can happen and this is bug.
>>
> Could you please clarify this a bit: according to the comments
> in signal.c:
> ---
> * We are exiting and our parent doesn't care. POSIX.1
> * defines special semantics for setting SIGCHLD to SIG_IGN
> * or setting the SA_NOCLDWAIT flag: we should be reaped
> * automatically and not left for our parent's wait4 call.
> ---
> That's how I understand it: if DEATH_REAP is returned, the
> parent ignores SIGCHILD, and in this case I am not allowing
> it to read the detach code with wait(). What is the bug?

Indeed. But, once again, that is why do_notify_parent() expects the dead
tsk! Please note that if it returns DEATH_REAP it sets ->exit_signal = -1.
And this is _only_ allowed if the leader is already dead and we are going
to reap it.

>> Also. I didn't actually read the patch yet, but iiuc: if a task T does
>> PR_DETACH and then exits, init can't reap it until the old parent does
>> wait. This can confuse the poor admin, he can see the zombies with
>> ppid = 1 and there is no way to understand why.
> Oh my. :)) I guess you are not going to suggest a solution to
> that "problem", other than to rip the patch? :)

Cough... well, the mentioned solution looks very nice to me ;)

Oleg.

2011-04-19 17:41:06

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [3/3]

19.04.2011 21:20, Oleg Nesterov wrote:

>>> do_signal_parent() must not return DEATH_REAP (this means that
>>> leader->exit_signal becomes -1), but this can happen and this is bug.
>>>
>> Could you please clarify this a bit: according to the comments
>> in signal.c:
>> ---
>> * We are exiting and our parent doesn't care. POSIX.1
>> * defines special semantics for setting SIGCHLD to SIG_IGN
>> * or setting the SA_NOCLDWAIT flag: we should be reaped
>> * automatically and not left for our parent's wait4 call.
>> ---
>> That's how I understand it: if DEATH_REAP is returned, the
>> parent ignores SIGCHILD, and in this case I am not allowing
>> it to read the detach code with wait(). What is the bug?
> Indeed. But, once again, that is why do_notify_parent() expects the dead
> tsk! Please note that if it returns DEATH_REAP it sets ->exit_signal = -1.
> And this is _only_ allowed if the leader is already dead and we are going
> to reap it.
Ah, so, by saying "do_signal_parent() must not return DEATH_REAP (this means that
leader->exit_signal becomes -1)", you actually meant
"do_signal_parent(), when returning DEATH_REAP, must not
set ->exit_signal = -1, because only do_notify_parent()
can do that"? In other words, do_signal_parent() can return
DEATH_REAP after all?
If so - will fix, thanks.

2011-04-19 18:18:17

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [3/3]

On 04/19, Stas Sergeev wrote:
>
> 19.04.2011 21:20, Oleg Nesterov wrote:
>
>>>> do_signal_parent() must not return DEATH_REAP (this means that
>>>> leader->exit_signal becomes -1), but this can happen and this is bug.
>>>>
>>> Could you please clarify this a bit: according to the comments
>>> in signal.c:
>>> ---
>>> * We are exiting and our parent doesn't care. POSIX.1
>>> * defines special semantics for setting SIGCHLD to SIG_IGN
>>> * or setting the SA_NOCLDWAIT flag: we should be reaped
>>> * automatically and not left for our parent's wait4 call.
>>> ---
>>> That's how I understand it: if DEATH_REAP is returned, the
>>> parent ignores SIGCHILD, and in this case I am not allowing
>>> it to read the detach code with wait(). What is the bug?
>> Indeed. But, once again, that is why do_notify_parent() expects the dead
>> tsk! Please note that if it returns DEATH_REAP it sets ->exit_signal = -1.
>> And this is _only_ allowed if the leader is already dead and we are going
>> to reap it.
> Ah, so, by saying "do_signal_parent() must not return DEATH_REAP (this means that
> leader->exit_signal becomes -1)", you actually meant
> "do_signal_parent(), when returning DEATH_REAP, must not
> set ->exit_signal = -1,

Yes.

> because only do_notify_parent()
> can do that"?

because we can only do this if we are going to reap the task

> If so - will fix, thanks.

Stas, please do not trim CC. I am very glad Alan looked at this patch,
I hope he will participate. Better yet, add Linus too as I already asked ;)

Oleg.

2011-04-20 13:12:32

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [1/2]

Hi Alan, Oleg.

I changed the patch the way you asked.
It now consumes much fewer per-thread bytes
(2 bytes + padding), and does much fewer suspicious
things on the hot path. And no zombies with ppid=1.

The attached patch adds do_signal_parent() function.
It is the same as do_notify_parent(), but can be used
when the child is not terminating, but (for example)
detaches with PR_DETACH.


Attachments:
01_sigpar.diff (3.60 kB)

2011-04-20 13:14:56

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [2/2]

The attached patch implements the PR_DETACH prctl
command. It detaches the entire process group from
its parent, allowing the parent to still read the detach
code with normal wait().
Can be used to daemonize process with threads.


Attachments:
pr_detach5.diff (6.03 kB)

2011-04-20 16:51:17

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [2/2]

On 04/20, Stas Sergeev wrote:
>
> The attached patch implements the PR_DETACH prctl
> command. It detaches the entire process group from
> its parent, allowing the parent to still read the detach
> code with normal wait().
> Can be used to daemonize process with threads.

Now I do not understand this patch at all.

Stas, I already asked you, please describe the semantics of PR_DETACH.
Looking at the code, I can only see it was changed again, but I have
no idea what is the supposed behaviour.

At first glance, this version does not reparent the caller to init,
but still PR_DETACH checks ->real_parent != /sbin/init for unknown
reason...

IIUC, PR_DETACH simply fools ->real_parent so that it think this child
exits, while the child in fact runs after that but do_wait() can't see it.

Why? I don't understand the point.

And. To hide the pr_detached task from do_wait(). you changed
do_notify_parent() to returnd DEATH_REAP. I guess you did this to ensure
exit_notify() paths will set EXIT_DEAD and thus do_wait() can't see this
child again. This looks very ugly I must admit. And not 100% correct, we
notify the parent twice if ->detaching != 0. Also, because of this logic,
once wait_task_detached() drops tasklist it can race with exit_notify()
doing release_task(). Yes, task_struct can't go away, but we can't trust
task_pid_vnr(p). Hmm, the change in reparent_leader doesn't look right,
at least in case we reparent to sub-thread.

And of course, this breaks ptrace _completely_. I didn't read the patch
further, perhaps there are more problems.



Stas, I am sorry, I am tired ;) You are sending more and more versions
with the different semantics and they all are buggy. Please add Linus
and ask him if he is going to accept the new feature, and please describe
exactly what you are trying to achieve. Until then I do not want to spend
my time trying to understand yet another implementation of the hack which
I personally dislike.

Oleg.

2011-04-20 18:45:09

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [2/2]

20.04.2011 20:50, Oleg Nesterov wrote:
>> The attached patch implements the PR_DETACH prctl
>> command. It detaches the entire process group from
>> its parent, allowing the parent to still read the detach
>> code with normal wait().
>> Can be used to daemonize process with threads.
> At first glance, this version does not reparent the caller to init,
> but still PR_DETACH checks ->real_parent != /sbin/init for unknown
> reason...
The reason is that the task could have been reparented
to ini by some other means (parent died) - in this case no
detaching.

> IIUC, PR_DETACH simply fools ->real_parent so that it think this child
> exits, while the child in fact runs after that but do_wait() can't see it.
It can check si_code to see what actually happened.

> Why? I don't understand the point.
It was the same also when reparenting was done: the
original parent was only able to read the detach code,
and nothing more. _Nothing_ was changed, the semantic
is completely the same! The implementation changed only.


> And. To hide the pr_detached task from do_wait(). you changed
> do_notify_parent() to returnd DEATH_REAP.
No, its hidden by the check in wait_consider_task().
do_notify_parent() was changed only to not allow the
second notification to the same parent. Again, this is
to completely preserve the old semantic, where's the
original parent was always notified only once.
So, the original parent gets only one notification on
PR_DETACH, and on exit() - it does not. Nothing was
changed.

> I guess you did this to ensure
> exit_notify() paths will set EXIT_DEAD and thus do_wait() can't see this
> child again. This looks very ugly I must admit. And not 100% correct, we
> notify the parent twice if ->detaching != 0.
No: the do_dignal_parent() is not called in that case too.

> task_pid_vnr(p). Hmm, the change in reparent_leader doesn't look right,
> at least in case we reparent to sub-thread.
Why not? Whereever we reparented, I allow reparenting again.
But, more importantly, I unhide that task in wait() and allow the
notifications again, so without this change it can't work.

> And of course, this breaks ptrace _completely_.
Why? What exactly breaks?

> Stas, I am sorry, I am tired ;) You are sending more and more versions
> with the different semantics and they all are buggy.
I am not that sure the last ones are very buggy.
Of course some bugs are possible, but at least that semantic
was a kind of "agreed on", and didn't change at all.

2011-04-20 19:33:59

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [2/2]

On 04/20, Stas Sergeev wrote:
>
> 20.04.2011 20:50, Oleg Nesterov wrote:
>>> The attached patch implements the PR_DETACH prctl
>>> command. It detaches the entire process group from
>>> its parent, allowing the parent to still read the detach
>>> code with normal wait().
>>> Can be used to daemonize process with threads.
>> At first glance, this version does not reparent the caller to init,
>> but still PR_DETACH checks ->real_parent != /sbin/init for unknown
>> reason...
> The reason is that the task could have been reparented
> to ini by some other means (parent died)

This is clear,

- in this case no
> detaching.

Why? Now that we do not reparent PR_DETACH could works in this case too.
Nevermind, this is minor and probably makes sense, forget.

I still do not understand the point of PR_DETACH. Why do you think it is
needed without reparenting? OK, I do not really care ;)

>> IIUC, PR_DETACH simply fools ->real_parent so that it think this child
>> exits, while the child in fact runs after that but do_wait() can't see it.
> It can check si_code to see what actually happened.
>
>> Why? I don't understand the point.
> It was the same also when reparenting was done: the
> original parent was only able to read the detach code,
> and nothing more. _Nothing_ was changed, the semantic
> is completely the same!

The same? what about, say, ppid?

>> And. To hide the pr_detached task from do_wait(). you changed
>> do_notify_parent() to returnd DEATH_REAP.
> No, its hidden by the check in wait_consider_task().
> do_notify_parent() was changed only to not allow the
> second notification to the same parent.

Not only. Please look at your own code ;) wait_consider_task() checks
exit_state == EXIT_ZOMBIE before p->pr_detached, and thus do_notify_parent()
haas to return DEATH_REAP so that the caller will set EXIT_DEAD. Otherwise
the old parent could see EXIT_ZOMBIE && pr_detached task again.

> Again, this is
> to completely preserve the old semantic, where's the
> original parent was always notified only once.

Of course, it should be notified only once. but afaics it can be notified
twice.

>> This looks very ugly I must admit. And not 100% correct, we
>> notify the parent twice if ->detaching != 0.
>
> No: the do_dignal_parent() is not called in that case too.

OK, I misread this code. In fact I missed something else, can't recall
what I meant. Probably I was wrong anyway.

But. What if the child stops after PR_DETACH? It will notify the parent
anyway, no? And this can even happen after wait(WEXITED) succeeded.

>> task_pid_vnr(p). Hmm, the change in reparent_leader doesn't look right,
>> at least in case we reparent to sub-thread.
> Why not? Whereever we reparented, I allow reparenting again.

Even if the child reparents to the original parent's sub-thread?

> But, more importantly, I unhide that task in wait() and allow the
> notifications again, so without this change it can't work.

This is clear but see above.

>> And of course, this breaks ptrace _completely_.
> Why? What exactly breaks?

Everything. ptrace relies on do_wait(). wait_consider_task() doesn't
work if pr_detached (except for WEXITED of course).

>> Stas, I am sorry, I am tired ;) You are sending more and more versions
>> with the different semantics and they all are buggy.
> I am not that sure the last ones are very buggy.

Well, I am not sure.

> but at least that semantic
> was a kind of "agreed on", and didn't change at all.

Cough. So far nobody except you agrees with this semantics ;)

Oleg.

2011-04-20 20:36:50

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [2/2]

20.04.2011 23:33, Oleg Nesterov wrote:

> I still do not understand the point of PR_DETACH. Why do you think it is
> needed without reparenting? OK, I do not really care ;)
Hmm, but that's really interesting, I wonder why do
you ask this. In my eyes, the reparenting was just a
"trick", which is now avoided for good. And I saved
a lot of per-thread memory by throwing that, and
shrunk the patch twice. Why do you think no-reparenting
changed something real? Could this change any use case?

>>> And. To hide the pr_detached task from do_wait(). you changed
>>> do_notify_parent() to returnd DEATH_REAP.
>> No, its hidden by the check in wait_consider_task().
>> do_notify_parent() was changed only to not allow the
>> second notification to the same parent.
> Not only. Please look at your own code ;) wait_consider_task() checks
> exit_state == EXIT_ZOMBIE before p->pr_detached, and thus do_notify_parent()
> haas to return DEATH_REAP so that the caller will set EXIT_DEAD. Otherwise
> the old parent could see EXIT_ZOMBIE&& pr_detached task again.
Yes, but that's not to hide from do_wait().
At least as far as I understand, exit_notify() does
release_task() in this case, so that's not hiding: I
literally terminate the child this way.
Or am I missing something?

> But. What if the child stops after PR_DETACH? It will notify the parent
> anyway, no?
Ah, so that should be disallowed too, will fix.

>>> task_pid_vnr(p). Hmm, the change in reparent_leader doesn't look right,
>>> at least in case we reparent to sub-thread.
>> Why not? Whereever we reparented, I allow reparenting again.
> Even if the child reparents to the original parent's sub-thread?
OK. :)

> Everything. ptrace relies on do_wait(). wait_consider_task() doesn't
> work if pr_detached (except for WEXITED of course).
OK.

>> I am not that sure the last ones are very buggy.
> Well, I am not sure.
OK, at least the above 2 bugs are trivial to fix... :))
Thanks!

2011-04-21 10:02:25

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [2/2]

Hi Oleg.

Attaching the fixed patch.
Changes:
- in wait_task_consider(), pr_detach hides the process
only in !ptrace case
- reparenting within the group is no longer re-enables detaching
- in do_notify_parent_cldstop(), do not allow notifications of
real_parent (still allow for ptrace parent)
- scared by your question about reparenting, I reserved the
flags argument for the future extensions. :) Reparenting may
depend on a flag.

The attached patch implements the PR_DETACH prctl
command. It detaches the entire process group from
its parent, allowing the parent to still read the detach
code with normal wait().
Can be used to daemonize process with threads.


Attachments:
pr_detach6.diff (7.22 kB)

2011-04-21 20:01:06

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [2/2]

On 04/21, Stas Sergeev wrote:
> 20.04.2011 23:33, Oleg Nesterov wrote:
>
>> I still do not understand the point of PR_DETACH. Why do you think it is
>> needed without reparenting? OK, I do not really care ;)
> Hmm, but that's really interesting, I wonder why do
> you ask this.

Because I do not understand why do we need this feature. And I strongly
dislike the complications this (wrong) patch adds.

> In my eyes, the reparenting was just a
> "trick",

Sure, I din't like the previous semantics too.


Once again, last time... No need to convince me. Please convince
someone who can ack this hack authoritatively. I can't anyway. And
I'm afraid nobody except Linus can decide whether we need it or not.

>From my side - nack. What can I do if I do not agree with you?

>>>> And. To hide the pr_detached task from do_wait(). you changed
>>>> do_notify_parent() to returnd DEATH_REAP.
>>> No, its hidden by the check in wait_consider_task().
>>> do_notify_parent() was changed only to not allow the
>>> second notification to the same parent.
>> Not only. Please look at your own code ;) wait_consider_task() checks
>> exit_state == EXIT_ZOMBIE before p->pr_detached, and thus do_notify_parent()
>> haas to return DEATH_REAP so that the caller will set EXIT_DEAD. Otherwise
>> the old parent could see EXIT_ZOMBIE&& pr_detached task again.
> Yes, but that's not to hide from do_wait().
> At least as far as I understand, exit_notify() does
> release_task() in this case, so that's not hiding: I
> literally terminate the child this way.

Yes. But there is a window before release_task() does this, we should
change task->state.

But I forgot where did we start... perhaps I missed something or meant
something else. Perhaps I disliked the fact wait_consider_task() checks
EXIT_ZOMBIE before pr_detached... Nevermind.

Oleg.

2011-04-21 20:12:24

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [2/2]

22.04.2011 00:00, Oleg Nesterov wrote:
> Once again, last time... No need to convince me.
Please, I did _never_ tried to convince you in anything.
The question was only why you asked "Why do you think it is
needed without reparenting?", as if without reparenting
it does something different. That was a question about
the semantic only.

2011-04-21 20:16:30

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [2/2]

On 04/21, Stas Sergeev wrote:
>
> Attaching the fixed patch.
> Changes:
> - in wait_task_consider(), pr_detach hides the process
> only in !ptrace case
> ...
> @@ -1555,6 +1600,14 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
> if (p->exit_state == EXIT_ZOMBIE && !delay_group_leader(p))
> return wait_task_zombie(wo, p);
>
> + if (unlikely(p->pr_detached)) {
> + if (p->detaching)
> + return wait_task_detached(wo, p);
> + /* pr_detached tasks are hidden from parent */
> + if (!ptrace)
> + return 0;
> + }

Hmm... I guess this should fix ptrace? How? I dont understand this at all.

Stas, I bet you didn't test your patch. If the old parent traces the child
WEXITED should not succeed until the child really exits. Otherwise it is
easy to escape ptrace.

Oh. And I bet there are other problems. Say, exec can change the leader...
Easy to fix, but note that we need more and more stupid pr_detached special
cases.

Stas, sorry. I am not going to looks at the next versions. Until you
convince lkml we need this feature.

Oleg.

2011-04-21 20:33:35

by Stas Sergeev

[permalink] [raw]
Subject: Re: [path][rfc] add PR_DETACH prctl command [2/2]

22.04.2011 00:15, Oleg Nesterov wrote:
> Stas, I bet you didn't test your patch.
Not with ptrace, yes. :(

> If the old parent traces the child
What is "old parent" without reparenting?

> Oh. And I bet there are other problems. Say, exec can change the leader...
> Easy to fix, but note that we need more and more stupid pr_detached special
> cases.
>
> Stas, sorry. I am not going to looks at the next versions. Until you
> convince lkml we need this feature.
OK, if you think there are really to much cases
that I missed, then this patch might be really more
intrusive than I suspected.
I got tired of it too, so fare well. :)
Thanks for your time.