There is a rare case where current's nsproxy might be NULL but we are
required to check for credentials and capabilities. It sometimes happens
during an exit() syscall while destroying user's session (logging out).
My understanding is that while we have to use task_nsproxy() to get
task's nsproxy and check whether it's NULL, for the 'current' we don't
have to and it's expected not to be NULL. There is a code in the kernel
currently that does current->nsproxy->user_ns without any checks.
There seem to be no crash currently because of this, but with other LSM
modules or in future there might be. This is the backtrace:
0 smk_tskacc (task=0xffff88003b0b92e0, obj_known=0x2 <irq_stack_union+2>, mode=2, a=0xffff88003be53dd8) at security/smack/smack_access.c:261
1 0xffffffff8130e2aa in smk_curacc (obj_known=<optimized out>, mode=<optimized out>, a=<optimized out>) at security/smack/smack_access.c:318
2 0xffffffff8130a50d in smack_task_kill (p=0xffff88003b0b92e0, info=<optimized out>, sig=<optimized out>, secid=<optimized out>) at security/smack/smack_lsm.c:2071
3 0xffffffff812ea4f6 in security_task_kill (p=<optimized out>, info=<optimized out>, sig=<optimized out>, secid=<optimized out>) at security/security.c:952
4 0xffffffff8109ac80 in check_kill_permission (sig=15, info=0x0 <irq_stack_union>, t=0xffff88003b0b8000) at kernel/signal.c:796
5 0xffffffff8109d3ab in group_send_sig_info (sig=15, info=0x0 <irq_stack_union>, p=0xffff88003b0b8000) at kernel/signal.c:1296
6 0xffffffff8108e527 in forget_original_parent (father=<optimized out>) at kernel/exit.c:575
7 exit_notify (group_dead=<optimized out>, tsk=<optimized out>) at kernel/exit.c:606
8 do_exit (code=<optimized out>) at kernel/exit.c:775
9 0xffffffff8108ec0f in do_group_exit (exit_code=0) at kernel/exit.c:891
10 0xffffffff8108ec84 in SYSC_exit_group (error_code=<optimized out>) at kernel/exit.c:902
11 SyS_exit_group (error_code=<optimized out>) at kernel/exit.c:900
LSM task_kill() hook is triggered and current->nsproxy within is NULL.
This happens during an exit() syscall because exit_task_namespaces() is
called before the exit_notify(). This patch changes their order.
Signed-off-by: Lukasz Pawelczyk <[email protected]>
---
kernel/exit.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index e5c4668..ac4735c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -751,7 +751,6 @@ void do_exit(long code)
exit_fs(tsk);
if (group_dead)
disassociate_ctty(1);
- exit_task_namespaces(tsk);
exit_task_work(tsk);
exit_thread();
@@ -773,6 +772,13 @@ void do_exit(long code)
flush_ptrace_hw_breakpoint(tsk);
exit_notify(tsk, group_dead);
+
+ /*
+ * This should be after all things that pottentially require
+ * process's namespaces (e.g. capability checks).
+ */
+ exit_task_namespaces(tsk);
+
proc_exit_connector(tsk);
#ifdef CONFIG_NUMA
task_lock(tsk);
--
1.9.3
On 11/26, Lukasz Pawelczyk wrote:
>
> My understanding is that while we have to use task_nsproxy()
task_nsproxy() has already gone... probably this doesn't matter but which
kernel version ?
> task's nsproxy and check whether it's NULL, for the 'current' we don't
> have to and it's expected not to be NULL.
Well, unless exit_task_namespaces() was called ;)
> There seem to be no crash currently because of this, but with other LSM
> modules or in future there might be. This is the backtrace:
Confused... backtrace of what? did kernel crash or what?
> 0 smk_tskacc (task=0xffff88003b0b92e0, obj_known=0x2 <irq_stack_union+2>, mode=2, a=0xffff88003be53dd8) at security/smack/smack_access.c:261
> 1 0xffffffff8130e2aa in smk_curacc (obj_known=<optimized out>, mode=<optimized out>, a=<optimized out>) at security/smack/smack_access.c:318
> 2 0xffffffff8130a50d in smack_task_kill (p=0xffff88003b0b92e0, info=<optimized out>, sig=<optimized out>, secid=<optimized out>) at security/smack/smack_lsm.c:2071
I do not know this code, so could you please tell more? How/wher smk_tskacc()
uses ->nsproxy? smack_access.c:261 leads to the comment header above smk_curacc()
in my tree, so this tells me nothing.
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -751,7 +751,6 @@ void do_exit(long code)
> exit_fs(tsk);
> if (group_dead)
> disassociate_ctty(1);
> - exit_task_namespaces(tsk);
> exit_task_work(tsk);
> exit_thread();
>
> @@ -773,6 +772,13 @@ void do_exit(long code)
> flush_ptrace_hw_breakpoint(tsk);
>
> exit_notify(tsk, group_dead);
> +
> + /*
> + * This should be after all things that pottentially require
> + * process's namespaces (e.g. capability checks).
> + */
> + exit_task_namespaces(tsk);
> +
> proc_exit_connector(tsk);
Well, we can probably move exit_task_namespaces() down (perhaps we even
want to move it after exit_task_work).
But I am not sure about exit_notify(), in this case free_nsproxy() can
be called when the caller is already reaped.
In any case, please more details?
Oleg.
On Wed, 26 Nov 2014, Lukasz Pawelczyk wrote:
> There is a rare case where current's nsproxy might be NULL but we are
> required to check for credentials and capabilities. It sometimes happens
> during an exit() syscall while destroying user's session (logging out).
>
> My understanding is that while we have to use task_nsproxy() to get
> task's nsproxy and check whether it's NULL, for the 'current' we don't
> have to and it's expected not to be NULL. There is a code in the kernel
> currently that does current->nsproxy->user_ns without any checks.
>
> There seem to be no crash currently because of this, but with other LSM
> modules or in future there might be. This is the backtrace:
>
> 0 smk_tskacc (task=0xffff88003b0b92e0, obj_known=0x2 <irq_stack_union+2>, mode=2, a=0xffff88003be53dd8) at security/smack/smack_access.c:261
> 1 0xffffffff8130e2aa in smk_curacc (obj_known=<optimized out>, mode=<optimized out>, a=<optimized out>) at security/smack/smack_access.c:318
> 2 0xffffffff8130a50d in smack_task_kill (p=0xffff88003b0b92e0, info=<optimized out>, sig=<optimized out>, secid=<optimized out>) at security/smack/smack_lsm.c:2071
> 3 0xffffffff812ea4f6 in security_task_kill (p=<optimized out>, info=<optimized out>, sig=<optimized out>, secid=<optimized out>) at security/security.c:952
> 4 0xffffffff8109ac80 in check_kill_permission (sig=15, info=0x0 <irq_stack_union>, t=0xffff88003b0b8000) at kernel/signal.c:796
> 5 0xffffffff8109d3ab in group_send_sig_info (sig=15, info=0x0 <irq_stack_union>, p=0xffff88003b0b8000) at kernel/signal.c:1296
> 6 0xffffffff8108e527 in forget_original_parent (father=<optimized out>) at kernel/exit.c:575
> 7 exit_notify (group_dead=<optimized out>, tsk=<optimized out>) at kernel/exit.c:606
> 8 do_exit (code=<optimized out>) at kernel/exit.c:775
> 9 0xffffffff8108ec0f in do_group_exit (exit_code=0) at kernel/exit.c:891
> 10 0xffffffff8108ec84 in SYSC_exit_group (error_code=<optimized out>) at kernel/exit.c:902
> 11 SyS_exit_group (error_code=<optimized out>) at kernel/exit.c:900
>
> LSM task_kill() hook is triggered and current->nsproxy within is NULL.
>
> This happens during an exit() syscall because exit_task_namespaces() is
> called before the exit_notify(). This patch changes their order.
>
This is a classic case of a patch being proposed for a problem that only
occurs on kernels that include other patches that are not upstream. The
order that things are deconstructed in the exit path is complex and
carefully choreographed, changing it comes at significant risk. That risk
would be justified if a patch were being proposed for upstream that fixes
an upstream problem. It becomes too much of a maintenance nightmare to
try to address problems and keep issues from arising for non-upstream
patches. Thus, I don't think this is something that we want.
On śro, 2014-11-26 at 21:52 +0100, Oleg Nesterov wrote:
> On 11/26, Lukasz Pawelczyk wrote:
> >
> > My understanding is that while we have to use task_nsproxy()
>
> task_nsproxy() has already gone... probably this doesn't matter but which
> kernel version ?
Ah, yes, 728dba3a39c66b3d8ac889ddbe38b5b1c264aec3. But you're right, it
doesn't matter here. I've seen this change, just forgot about it.
> > task's nsproxy and check whether it's NULL, for the 'current' we don't
> > have to and it's expected not to be NULL.
>
> Well, unless exit_task_namespaces() was called ;)
That's the thing, should we ever get to a point that current's nsproxy
is NULL during an LSM check? That's why I mentioned code like:
current->nsproxy->some_ns
being in a kernel without a check. I think that current's nsproxy should
always be guaranteed to exist.
> > There seem to be no crash currently because of this, but with other LSM
> > modules or in future there might be. This is the backtrace:
>
> Confused... backtrace of what? did kernel crash or what?
Sorry, probably should have explained this a little better. Yes, this is
backtrace of crash, but one that will not happen in the exact form on
master. This is of a modified smack that makes use of nsproxy during LSM
checks.
But this really doesn't matter. I posted this backtrace to show that
there is an LSM check that happens after exit_task_namespaces() has been
called.
>
> > 0 smk_tskacc (task=0xffff88003b0b92e0, obj_known=0x2 <irq_stack_union+2>, mode=2, a=0xffff88003be53dd8) at security/smack/smack_access.c:261
> > 1 0xffffffff8130e2aa in smk_curacc (obj_known=<optimized out>, mode=<optimized out>, a=<optimized out>) at security/smack/smack_access.c:318
> > 2 0xffffffff8130a50d in smack_task_kill (p=0xffff88003b0b92e0, info=<optimized out>, sig=<optimized out>, secid=<optimized out>) at security/smack/smack_lsm.c:2071
>
> I do not know this code, so could you please tell more? How/wher smk_tskacc()
> uses ->nsproxy? smack_access.c:261 leads to the comment header above smk_curacc()
> in my tree, so this tells me nothing.
This is a code from my working tree. I'm working on Smack/LSM namespaces
that will make Smack use nsproxy during LSM checks. The code above is
not important at the moment. As I said previously this backtrace is just
a proof that an LSM check happens after exit_task_namespaces(), during
exit_notify().
> Well, we can probably move exit_task_namespaces() down (perhaps we even
> want to move it after exit_task_work).
>
> But I am not sure about exit_notify(), in this case free_nsproxy() can
> be called when the caller is already reaped.
>
> In any case, please more details?
The capability checks sometimes use nsproxy, e.g. user namespaces. Same
thing might be a case for Smack (or any other LSM module) when working
inside a namespace. I'm WIP on those changes and I will be trying to
upstream them to. This issue is a stopper for me so I though I'd try to
push it earlier.
I'm not expert on this code hence any suggestion would be welcome.
My understanding is that when we do an LSM hook there might be a
capability check inside. And this capability check might be using
ns_capable() instead of capable() if namespaced. And in the case
depicted above the nsproxy of the current process is already destroyed.
I think that this is a bug, even though it has no repercussions in the
kernel yet.
--
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics
On śro, 2014-11-26 at 13:32 -0800, David Rientjes wrote:
> On Wed, 26 Nov 2014, Lukasz Pawelczyk wrote:
> >
> > LSM task_kill() hook is triggered and current->nsproxy within is NULL.
> >
> > This happens during an exit() syscall because exit_task_namespaces() is
> > called before the exit_notify(). This patch changes their order.
> >
>
> This is a classic case of a patch being proposed for a problem that only
> occurs on kernels that include other patches that are not upstream. The
> order that things are deconstructed in the exit path is complex and
> carefully choreographed, changing it comes at significant risk. That risk
> would be justified if a patch were being proposed for upstream that fixes
> an upstream problem. It becomes too much of a maintenance nightmare to
> try to address problems and keep issues from arising for non-upstream
> patches. Thus, I don't think this is something that we want.
This is a problem for the change I'm working on and I will be
upstreaming it too at some point. Please see my other reply for more
details:
http://www.spinics.net/lists/kernel/msg1877152.html
The only thing I can do then is to post this patch together with the
other patches when the time comes. But since this issue is rather
separate I've decided to try to push it earlier.
--
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics
On Thu, 27 Nov 2014, Lukasz Pawelczyk wrote:
> On śro, 2014-11-26 at 13:32 -0800, David Rientjes wrote:
> > On Wed, 26 Nov 2014, Lukasz Pawelczyk wrote:
> > >
> > > LSM task_kill() hook is triggered and current->nsproxy within is NULL.
> > >
> > > This happens during an exit() syscall because exit_task_namespaces() is
> > > called before the exit_notify(). This patch changes their order.
> > >
> >
> > This is a classic case of a patch being proposed for a problem that only
> > occurs on kernels that include other patches that are not upstream. The
> > order that things are deconstructed in the exit path is complex and
> > carefully choreographed, changing it comes at significant risk. That risk
> > would be justified if a patch were being proposed for upstream that fixes
> > an upstream problem. It becomes too much of a maintenance nightmare to
> > try to address problems and keep issues from arising for non-upstream
> > patches. Thus, I don't think this is something that we want.
>
> This is a problem for the change I'm working on and I will be
> upstreaming it too at some point. Please see my other reply for more
> details:
>
> http://www.spinics.net/lists/kernel/msg1877152.html
>
> The only thing I can do then is to post this patch together with the
> other patches when the time comes. But since this issue is rather
> separate I've decided to try to push it earlier.
>
Yeah, it would be best to fold this into a series that needs
current->nsproxy to be valid at a sequence point in the exit path as part
of the same patch that requires it.