2008-08-24 15:46:07

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing

zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.

Yes, we have already killed all tasks in this namespace, and sys_wait4()
doesn't see any child. But this doesn't mean ->children list is empty,
we may have EXIT_DEAD tasks which are not visible to do_wait(). In that
case the subsequent forget_original_parent() will crash the kernel because
it will try to re-parent these tasks to the NULL reaper.

Even if there are no childs, it is not good that forget_original_parent()
uses reaper == NULL.

Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
We could use pid_ns->parent->child_reaper as well, I think this does not
really matter. These EXIT_DEAD tasks are not visible to the new ->parent
after re-parenting, they will silently do release_task() eventually.

Note that we must change ->child_reaper, otherwise forget_original_parent()
will use reaper == father, and in that case we will hit the (correct)
BUG_ON(!list_empty(&father->children)).

Signed-off-by: Oleg Nesterov <[email protected]>

--- 2.6.27-rc4/kernel/pid_namespace.c~1_ZAP_DONT_CLEAR_REAPER 2008-07-30 13:12:49.000000000 +0400
+++ 2.6.27-rc4/kernel/pid_namespace.c 2008-08-24 17:22:59.000000000 +0400
@@ -179,9 +179,12 @@ void zap_pid_ns_processes(struct pid_nam
rc = sys_wait4(-1, NULL, __WALL, NULL);
} while (rc != -ECHILD);

-
- /* Child reaper for the pid namespace is going away */
- pid_ns->child_reaper = NULL;
+ /*
+ * We can not clear ->child_reaper or leave it alone.
+ * There may by stealth EXIT_DEAD tasks on ->children,
+ * forget_original_parent() must move them somewhere.
+ */
+ pid_ns->child_reaper = init_pid_ns.child_reaper;
acct_exit_ns(pid_ns);
return;
}


2008-08-26 21:25:40

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing

Quoting Oleg Nesterov ([email protected]):
> zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
>
> Yes, we have already killed all tasks in this namespace, and sys_wait4()
> doesn't see any child. But this doesn't mean ->children list is empty,
> we may have EXIT_DEAD tasks which are not visible to do_wait(). In that
> case the subsequent forget_original_parent() will crash the kernel because
> it will try to re-parent these tasks to the NULL reaper.
>
> Even if there are no childs, it is not good that forget_original_parent()
> uses reaper == NULL.
>
> Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
> We could use pid_ns->parent->child_reaper as well, I think this does not
> really matter. These EXIT_DEAD tasks are not visible to the new ->parent
> after re-parenting, they will silently do release_task() eventually.
>
> Note that we must change ->child_reaper, otherwise forget_original_parent()
> will use reaper == father, and in that case we will hit the (correct)
> BUG_ON(!list_empty(&father->children)).
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Well while it looked correct to me all along, I couldn't get the
testcase to cause an oops. But I see where do_exit() calls
zap_pid_ns_processes() (through group_exit()) before
forget_original_parent(), which can/should end up dereferencing
the null pid_ns, so clearly this is needed and correct.

Acked-by: Serge Hallyn <[email protected]>

Thanks, Oleg.

-serge

> --- 2.6.27-rc4/kernel/pid_namespace.c~1_ZAP_DONT_CLEAR_REAPER 2008-07-30 13:12:49.000000000 +0400
> +++ 2.6.27-rc4/kernel/pid_namespace.c 2008-08-24 17:22:59.000000000 +0400
> @@ -179,9 +179,12 @@ void zap_pid_ns_processes(struct pid_nam
> rc = sys_wait4(-1, NULL, __WALL, NULL);
> } while (rc != -ECHILD);
>
> -
> - /* Child reaper for the pid namespace is going away */
> - pid_ns->child_reaper = NULL;
> + /*
> + * We can not clear ->child_reaper or leave it alone.
> + * There may by stealth EXIT_DEAD tasks on ->children,
> + * forget_original_parent() must move them somewhere.

Actually this comment is a little bit misleading - the null
deref will happen regardless of whether there are children,
right?

> + */
> + pid_ns->child_reaper = init_pid_ns.child_reaper;
> acct_exit_ns(pid_ns);
> return;
> }

2008-08-27 01:47:23

by Sukadev Bhattiprolu

[permalink] [raw]
Subject: Re: [PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing

Oleg Nesterov [[email protected]] wrote:
| zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
|
| Yes, we have already killed all tasks in this namespace, and sys_wait4()
| doesn't see any child. But this doesn't mean ->children list is empty,
| we may have EXIT_DEAD tasks which are not visible to do_wait(). In that
| case the subsequent forget_original_parent() will crash the kernel because
| it will try to re-parent these tasks to the NULL reaper.
|
| Even if there are no childs, it is not good that forget_original_parent()
| uses reaper == NULL.
|
| Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
| We could use pid_ns->parent->child_reaper as well, I think this does not
| really matter. These EXIT_DEAD tasks are not visible to the new ->parent
| after re-parenting, they will silently do release_task() eventually.
|
| Note that we must change ->child_reaper, otherwise forget_original_parent()
| will use reaper == father, and in that case we will hit the (correct)
| BUG_ON(!list_empty(&father->children)).
|
| Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Sukadev Bhattiprolu <[email protected]>

|
| --- 2.6.27-rc4/kernel/pid_namespace.c~1_ZAP_DONT_CLEAR_REAPER 2008-07-30 13:12:49.000000000 +0400
| +++ 2.6.27-rc4/kernel/pid_namespace.c 2008-08-24 17:22:59.000000000 +0400
| @@ -179,9 +179,12 @@ void zap_pid_ns_processes(struct pid_nam
| rc = sys_wait4(-1, NULL, __WALL, NULL);
| } while (rc != -ECHILD);
|
| -
| - /* Child reaper for the pid namespace is going away */
| - pid_ns->child_reaper = NULL;
| + /*
| + * We can not clear ->child_reaper or leave it alone.
| + * There may by stealth EXIT_DEAD tasks on ->children,
| + * forget_original_parent() must move them somewhere.
| + */
| + pid_ns->child_reaper = init_pid_ns.child_reaper;
| acct_exit_ns(pid_ns);
| return;
| }
|
| --
| To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
| the body of a message to [email protected]
| More majordomo info at http://vger.kernel.org/majordomo-info.html
| Please read the FAQ at http://www.tux.org/lkml/

2008-08-27 16:32:16

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing

On 08/26, Serge E. Hallyn wrote:
>
> Quoting Oleg Nesterov ([email protected]):
> > zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
> > ...
> > Even if there are no childs, it is not good that forget_original_parent()
> > uses reaper == NULL.
> > ...
>
> Well while it looked correct to me all along, I couldn't get the
> testcase to cause an oops.

Perhaps you can start several instances or increase the delay, I
forgot that usleep() uses usecs, not msecs. In any case I agree,
the test-case is oversimplified, but since it worked for me I didn't
try to make it more "stubborn".

> Acked-by: Serge Hallyn <[email protected]>

Thanks!

> > + /*
> > + * We can not clear ->child_reaper or leave it alone.
> > + * There may by stealth EXIT_DEAD tasks on ->children,
> > + * forget_original_parent() must move them somewhere.
>
> Actually this comment is a little bit misleading - the null
> deref will happen regardless of whether there are children,
> right?

Yes, the comment could be better ;)

But no, we won't oops without children. Please note that
forget_original_parent() uses reaper inside the
list_for_each_entry_safe(father->children) loop. ->children is
empty, reaper is not used at all.

But please also note the "not good" part of the changelog above.


Thanks to all for review!

Oleg.