Disable CLONE_PARENT for init
When global or container-init processes use CLONE_PARENT, they create a
multi-rooted process tree. Besides, if the siblings of init exit, the
SIGCHLD is not sent to init process resulting in the zombies sticking
around indefinitely. So disable CLONE_PARENT for init.
Lightly tested, RFC patch :-)
Changelog[v2]:
- Simplify patch description based on comments from Eric Biederman
and Oleg Nesterov.
- [Oleg Nesterov] Use SIGNAL_UNKILLABLE instead of is_global_init()
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Acked-by: Roland McGrath <[email protected]>
---
kernel/fork.c | 11 +++++++++++
1 file changed, 11 insertions(+)
Index: linux-mmotm/kernel/fork.c
===================================================================
--- linux-mmotm.orig/kernel/fork.c 2009-06-30 23:01:06.000000000 -0700
+++ linux-mmotm/kernel/fork.c 2009-06-30 23:13:53.000000000 -0700
@@ -974,6 +974,17 @@ static struct task_struct *copy_process(
if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
return ERR_PTR(-EINVAL);
+ /*
+ * Swapper process sets the handler for SIGCHLD to SIG_DFL. If init
+ * creates a sibling and the sibling exits, the SIGCHLD is sent to
+ * the swapper (since the swapper's handler for SIGCHLD is SIG_DFL).
+ * But since the swapper does not reap its children, the zombie will
+ * remain forever. So prevent init from using CLONE_PARENT.
+ */
+ if ((clone_flags & CLONE_PARENT) &&
+ current->signal->flags & SIGNAL_UNKILLABLE)
+ return ERR_PTR(-EINVAL);
+
retval = security_task_create(clone_flags);
if (retval)
goto fork_out;
> When global or container-init processes use CLONE_PARENT, they create a
> multi-rooted process tree.
I take this to be the real motivation for your change.
But you don't mention it in the code comment.
> + * Swapper process sets the handler for SIGCHLD to SIG_DFL. If init
> + * creates a sibling and the sibling exits, the SIGCHLD is sent to
> + * the swapper (since the swapper's handler for SIGCHLD is SIG_DFL).
> + * But since the swapper does not reap its children, the zombie will
> + * remain forever. So prevent init from using CLONE_PARENT.
This would be fixed by having swapper set its SIGCHLD to SIG_IGN instead,
so such children self-reap. That seems like the better fix for that.
If you want to make this change because of container-init issues, I think
you should just say so independent of this global-init case.
Thanks,
Roland
Roland McGrath [[email protected]] wrote:
| > When global or container-init processes use CLONE_PARENT, they create a
| > multi-rooted process tree.
|
| I take this to be the real motivation for your change.
| But you don't mention it in the code comment.
Well, it was - when I started. But my understanding of the comments was that
the constraint could be extended to global init as well for the following
reason.
|
| > + * Swapper process sets the handler for SIGCHLD to SIG_DFL. If init
| > + * creates a sibling and the sibling exits, the SIGCHLD is sent to
| > + * the swapper (since the swapper's handler for SIGCHLD is SIG_DFL).
| > + * But since the swapper does not reap its children, the zombie will
| > + * remain forever. So prevent init from using CLONE_PARENT.
|
| This would be fixed by having swapper set its SIGCHLD to SIG_IGN instead,
| so such children self-reap. That seems like the better fix for that.
Yes, that would fix the global init case.
|
| If you want to make this change because of container-init issues, I think
| you should just say so independent of this global-init case.
So can I leave the check for SIGNAL_UNKILLABLE but simplify the comments
to refer to the multi-rooted process tree ?
On 07/01, Roland McGrath wrote:
>
> > When global or container-init processes use CLONE_PARENT, they create a
> > multi-rooted process tree.
>
> I take this to be the real motivation for your change.
> But you don't mention it in the code comment.
>
> > + * Swapper process sets the handler for SIGCHLD to SIG_DFL. If init
> > + * creates a sibling and the sibling exits, the SIGCHLD is sent to
> > + * the swapper (since the swapper's handler for SIGCHLD is SIG_DFL).
> > + * But since the swapper does not reap its children, the zombie will
> > + * remain forever. So prevent init from using CLONE_PARENT.
>
> This would be fixed by having swapper set its SIGCHLD to SIG_IGN instead,
> so such children self-reap. That seems like the better fix for that.
This won't fix the problem. The child won't autoreap itself if ->exit_signal
!= SIGCHLD.
> If you want to make this change because of container-init issues, I think
> you should just say so independent of this global-init case.
Yes, agreed, the comment looks confusing.
Oleg
| This won't fix the problem. The child won't autoreap itself if ->exit_signal
| != SIGCHLD.
|
| > If you want to make this change because of container-init issues, I think
| > you should just say so independent of this global-init case.
|
| Yes, agreed, the comment looks confusing.
|
| Oleg
Here is an updated patch with comments fixed.
Roland pls ack again if this is better.
---
Disable CLONE_PARENT for init.
When global or container-init processes use CLONE_PARENT, they create a
multi-rooted process tree. Besides if the siblings of init exit, the
SIGCHLD is not sent to init process resulting in the zombies sticking
around indefinitely.
Changelog[v3]:
- [Roland, Oleg] Simplify comment describing the change
Changelog[v2]:
- Simplify patch description based on comments from Eric Biederman
and Oleg Nesterov.
- [Oleg Nesterov] Use SIGNAL_UNKILLABLE instead of is_global_init()
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
---
kernel/fork.c | 8 ++++++++
1 file changed, 8 insertions(+)
Index: linux-mmotm/kernel/fork.c
===================================================================
--- linux-mmotm.orig/kernel/fork.c 2009-06-30 23:01:06.000000000 -0700
+++ linux-mmotm/kernel/fork.c 2009-07-01 14:43:10.000000000 -0700
@@ -974,6 +974,14 @@ static struct task_struct *copy_process(
if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
return ERR_PTR(-EINVAL);
+ /*
+ * To avoid multi-rooted process-trees prevent global and container
+ * inits from creating siblings.
+ */
+ if ((clone_flags & CLONE_PARENT) &&
+ current->signal->flags & SIGNAL_UNKILLABLE)
+ return ERR_PTR(-EINVAL);
+
retval = security_task_create(clone_flags);
if (retval)
goto fork_out;
Yeah, that's fine. Since Oleg's pointed out that there is indeed no way to
avoid the leak in some global-init uses, it is fine to have a comment that
says that global init problems are part of the reason to outlaw this usage.
I just objected to a change that was really for container init but said it
was only to fix something different.
Thanks,
Roland
Sukadev Bhattiprolu <[email protected]> writes:
> | This won't fix the problem. The child won't autoreap itself if ->exit_signal
> | != SIGCHLD.
> |
> | > If you want to make this change because of container-init issues, I think
> | > you should just say so independent of this global-init case.
> |
> | Yes, agreed, the comment looks confusing.
> |
> | Oleg
>
> Here is an updated patch with comments fixed.
>
> Roland pls ack again if this is better.
>
> ---
>
> Disable CLONE_PARENT for init.
>
> When global or container-init processes use CLONE_PARENT, they create a
> multi-rooted process tree. Besides if the siblings of init exit, the
> SIGCHLD is not sent to init process resulting in the zombies sticking
> around indefinitely.
>
> Changelog[v3]:
> - [Roland, Oleg] Simplify comment describing the change
> Changelog[v2]:
> - Simplify patch description based on comments from Eric Biederman
> and Oleg Nesterov.
> - [Oleg Nesterov] Use SIGNAL_UNKILLABLE instead of is_global_init()
>
> Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
> ---
> kernel/fork.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> Index: linux-mmotm/kernel/fork.c
> ===================================================================
> --- linux-mmotm.orig/kernel/fork.c 2009-06-30 23:01:06.000000000 -0700
> +++ linux-mmotm/kernel/fork.c 2009-07-01 14:43:10.000000000 -0700
> @@ -974,6 +974,14 @@ static struct task_struct *copy_process(
> if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
> return ERR_PTR(-EINVAL);
>
> + /*
> + * To avoid multi-rooted process-trees prevent global and container
> + * inits from creating siblings.
> + */
> + if ((clone_flags & CLONE_PARENT) &&
> + current->signal->flags & SIGNAL_UNKILLABLE)
> + return ERR_PTR(-EINVAL);
> +
> retval = security_task_create(clone_flags);
> if (retval)
> goto fork_out;
Roland McGrath [[email protected]] wrote:
| Yeah, that's fine. Since Oleg's pointed out that there is indeed no way to
| avoid the leak in some global-init uses, it is fine to have a comment that
| says that global init problems are part of the reason to outlaw this usage.
| I just objected to a change that was really for container init but said it
| was only to fix something different.
Ok. How about this comment:
---
Disable CLONE_PARENT for init.
When global or container-init processes use CLONE_PARENT, they create a
multi-rooted process tree. Besides siblings of global init remain as
zombies on exit since they are not reaped by their parent (swapper). So
prevent global and container-inits from creating siblings.
Changelog[v3]:
- [Roland, Oleg] Simplify comment describing the change
Changelog[v2]:
- Simplify patch description based on comments from Eric Biederman
and Oleg Nesterov.
- [Oleg Nesterov] Use SIGNAL_UNKILLABLE instead of is_global_init()
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
---
kernel/fork.c | 10 ++++++++++
1 file changed, 10 insertions(+)
Index: linux-mmotm/kernel/fork.c
===================================================================
--- linux-mmotm.orig/kernel/fork.c 2009-06-30 23:01:06.000000000 -0700
+++ linux-mmotm/kernel/fork.c 2009-07-01 17:29:09.000000000 -0700
@@ -974,6 +974,16 @@ static struct task_struct *copy_process(
if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
return ERR_PTR(-EINVAL);
+ /*
+ * Siblings of global init remain as zombies on exit since they are
+ * not reaped by their parent (swapper). To solve this and to avoid
+ * multi-rooted process trees, prevent global and container-inits
+ * from creating siblings.
+ */
+ if ((clone_flags & CLONE_PARENT) &&
+ current->signal->flags & SIGNAL_UNKILLABLE)
+ return ERR_PTR(-EINVAL);
+
retval = security_task_create(clone_flags);
if (retval)
goto fork_out;
Looks good to me.
Acked-by: Roland McGrath <[email protected]>
On 07/01, Sukadev Bhattiprolu wrote:
>
> --- linux-mmotm.orig/kernel/fork.c 2009-06-30 23:01:06.000000000 -0700
> +++ linux-mmotm/kernel/fork.c 2009-07-01 17:29:09.000000000 -0700
> @@ -974,6 +974,16 @@ static struct task_struct *copy_process(
> if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
> return ERR_PTR(-EINVAL);
>
> + /*
> + * Siblings of global init remain as zombies on exit since they are
> + * not reaped by their parent (swapper). To solve this and to avoid
> + * multi-rooted process trees, prevent global and container-inits
> + * from creating siblings.
> + */
> + if ((clone_flags & CLONE_PARENT) &&
> + current->signal->flags & SIGNAL_UNKILLABLE)
> + return ERR_PTR(-EINVAL);
Acked-by: Oleg Nesterov <[email protected]>
On 07/02, Oleg Nesterov wrote:
>
> On 07/01, Sukadev Bhattiprolu wrote:
> >
> > --- linux-mmotm.orig/kernel/fork.c 2009-06-30 23:01:06.000000000 -0700
> > +++ linux-mmotm/kernel/fork.c 2009-07-01 17:29:09.000000000 -0700
> > @@ -974,6 +974,16 @@ static struct task_struct *copy_process(
> > if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
> > return ERR_PTR(-EINVAL);
> >
> > + /*
> > + * Siblings of global init remain as zombies on exit since they are
> > + * not reaped by their parent (swapper). To solve this and to avoid
> > + * multi-rooted process trees, prevent global and container-inits
> > + * from creating siblings.
> > + */
> > + if ((clone_flags & CLONE_PARENT) &&
> > + current->signal->flags & SIGNAL_UNKILLABLE)
> > + return ERR_PTR(-EINVAL);
>
> Acked-by: Oleg Nesterov <[email protected]>
Thinking more, perhaps it makes sense to disallow CLONE_VM too.
If init forks CLONE_VM task, this task can be killed by
sig_kernel_coredump signal. In that case init will be killed too
and the kernel will crash.
But this is minor, we can trust the global init.
Oleg.