zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
Yes, we have already killed all tasks in this namespace, and sys_wait4()
doesn't see any child. But this doesn't mean ->children list is empty,
we may have EXIT_DEAD tasks which are not visible to do_wait(). In that
case the subsequent forget_original_parent() will crash the kernel because
it will try to re-parent these tasks to the NULL reaper.
Even if there are no childs, it is not good that forget_original_parent()
uses reaper == NULL.
Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
We could use pid_ns->parent->child_reaper as well, I think this does not
really matter. These EXIT_DEAD tasks are not visible to the new ->parent
after re-parenting, they will silently do release_task() eventually.
Note that we must change ->child_reaper, otherwise forget_original_parent()
will use reaper == father, and in that case we will hit the (correct)
BUG_ON(!list_empty(&father->children)).
Signed-off-by: Oleg Nesterov <[email protected]>
--- 2.6.27-rc4/kernel/pid_namespace.c~1_ZAP_DONT_CLEAR_REAPER 2008-07-30 13:12:49.000000000 +0400
+++ 2.6.27-rc4/kernel/pid_namespace.c 2008-08-24 17:22:59.000000000 +0400
@@ -179,9 +179,12 @@ void zap_pid_ns_processes(struct pid_nam
rc = sys_wait4(-1, NULL, __WALL, NULL);
} while (rc != -ECHILD);
-
- /* Child reaper for the pid namespace is going away */
- pid_ns->child_reaper = NULL;
+ /*
+ * We can not clear ->child_reaper or leave it alone.
+ * There may by stealth EXIT_DEAD tasks on ->children,
+ * forget_original_parent() must move them somewhere.
+ */
+ pid_ns->child_reaper = init_pid_ns.child_reaper;
acct_exit_ns(pid_ns);
return;
}
Quoting Oleg Nesterov ([email protected]):
> zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
>
> Yes, we have already killed all tasks in this namespace, and sys_wait4()
> doesn't see any child. But this doesn't mean ->children list is empty,
> we may have EXIT_DEAD tasks which are not visible to do_wait(). In that
> case the subsequent forget_original_parent() will crash the kernel because
> it will try to re-parent these tasks to the NULL reaper.
>
> Even if there are no childs, it is not good that forget_original_parent()
> uses reaper == NULL.
>
> Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
> We could use pid_ns->parent->child_reaper as well, I think this does not
> really matter. These EXIT_DEAD tasks are not visible to the new ->parent
> after re-parenting, they will silently do release_task() eventually.
>
> Note that we must change ->child_reaper, otherwise forget_original_parent()
> will use reaper == father, and in that case we will hit the (correct)
> BUG_ON(!list_empty(&father->children)).
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Well while it looked correct to me all along, I couldn't get the
testcase to cause an oops. But I see where do_exit() calls
zap_pid_ns_processes() (through group_exit()) before
forget_original_parent(), which can/should end up dereferencing
the null pid_ns, so clearly this is needed and correct.
Acked-by: Serge Hallyn <[email protected]>
Thanks, Oleg.
-serge
> --- 2.6.27-rc4/kernel/pid_namespace.c~1_ZAP_DONT_CLEAR_REAPER 2008-07-30 13:12:49.000000000 +0400
> +++ 2.6.27-rc4/kernel/pid_namespace.c 2008-08-24 17:22:59.000000000 +0400
> @@ -179,9 +179,12 @@ void zap_pid_ns_processes(struct pid_nam
> rc = sys_wait4(-1, NULL, __WALL, NULL);
> } while (rc != -ECHILD);
>
> -
> - /* Child reaper for the pid namespace is going away */
> - pid_ns->child_reaper = NULL;
> + /*
> + * We can not clear ->child_reaper or leave it alone.
> + * There may by stealth EXIT_DEAD tasks on ->children,
> + * forget_original_parent() must move them somewhere.
Actually this comment is a little bit misleading - the null
deref will happen regardless of whether there are children,
right?
> + */
> + pid_ns->child_reaper = init_pid_ns.child_reaper;
> acct_exit_ns(pid_ns);
> return;
> }
Oleg Nesterov [[email protected]] wrote:
| zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
|
| Yes, we have already killed all tasks in this namespace, and sys_wait4()
| doesn't see any child. But this doesn't mean ->children list is empty,
| we may have EXIT_DEAD tasks which are not visible to do_wait(). In that
| case the subsequent forget_original_parent() will crash the kernel because
| it will try to re-parent these tasks to the NULL reaper.
|
| Even if there are no childs, it is not good that forget_original_parent()
| uses reaper == NULL.
|
| Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
| We could use pid_ns->parent->child_reaper as well, I think this does not
| really matter. These EXIT_DEAD tasks are not visible to the new ->parent
| after re-parenting, they will silently do release_task() eventually.
|
| Note that we must change ->child_reaper, otherwise forget_original_parent()
| will use reaper == father, and in that case we will hit the (correct)
| BUG_ON(!list_empty(&father->children)).
|
| Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Sukadev Bhattiprolu <[email protected]>
|
| --- 2.6.27-rc4/kernel/pid_namespace.c~1_ZAP_DONT_CLEAR_REAPER 2008-07-30 13:12:49.000000000 +0400
| +++ 2.6.27-rc4/kernel/pid_namespace.c 2008-08-24 17:22:59.000000000 +0400
| @@ -179,9 +179,12 @@ void zap_pid_ns_processes(struct pid_nam
| rc = sys_wait4(-1, NULL, __WALL, NULL);
| } while (rc != -ECHILD);
|
| -
| - /* Child reaper for the pid namespace is going away */
| - pid_ns->child_reaper = NULL;
| + /*
| + * We can not clear ->child_reaper or leave it alone.
| + * There may by stealth EXIT_DEAD tasks on ->children,
| + * forget_original_parent() must move them somewhere.
| + */
| + pid_ns->child_reaper = init_pid_ns.child_reaper;
| acct_exit_ns(pid_ns);
| return;
| }
|
| --
| To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
| the body of a message to [email protected]
| More majordomo info at http://vger.kernel.org/majordomo-info.html
| Please read the FAQ at http://www.tux.org/lkml/
> Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
On 08/26, Serge E. Hallyn wrote:
>
> Quoting Oleg Nesterov ([email protected]):
> > zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
> > ...
> > Even if there are no childs, it is not good that forget_original_parent()
> > uses reaper == NULL.
> > ...
>
> Well while it looked correct to me all along, I couldn't get the
> testcase to cause an oops.
Perhaps you can start several instances or increase the delay, I
forgot that usleep() uses usecs, not msecs. In any case I agree,
the test-case is oversimplified, but since it worked for me I didn't
try to make it more "stubborn".
> Acked-by: Serge Hallyn <[email protected]>
Thanks!
> > + /*
> > + * We can not clear ->child_reaper or leave it alone.
> > + * There may by stealth EXIT_DEAD tasks on ->children,
> > + * forget_original_parent() must move them somewhere.
>
> Actually this comment is a little bit misleading - the null
> deref will happen regardless of whether there are children,
> right?
Yes, the comment could be better ;)
But no, we won't oops without children. Please note that
forget_original_parent() uses reaper inside the
list_for_each_entry_safe(father->children) loop. ->children is
empty, reaper is not used at all.
But please also note the "not good" part of the changelog above.
Thanks to all for review!
Oleg.