selinux_bprm_committed_creds:
rc = avc_has_perm()
if (rc) {
flush_signals(current);
This doesn't look right. If the task was SIGKILL'ed we must not proceed,
the task should die. The fix is simple, we should check SIGNAL_GROUP_EXIT
and do nothing in this case, the task will exit before return to user
space. If SIGNAL_GROUP_EXIT is set, it is just wrong to drop SIGKILL and
continue.
But, before fixing, I'd like to understand why we are doing
flush_signal_handlers(current, 1);
sigemptyset(¤t->blocked);
later. Could someone explain ? This looks unneeded.
Another question,
wake_up_interruptible(¤t->parent->signal->wait_chldexit);
Shouldn't we use ->real_parent ? Afaics, we shouldn't worry about the tracer
if current is ptraced, exec must not succeed if the tracer has no rights to
trace this task after cred changing. But we should notify ->real_parent which
is, well, real parent.
Also, we don't need _irq to take tasklist_lock, and we don't actually need
->siglock.
Oleg.
I am totally confused and almost sleeping, so another question ;)
What if eligible_child()->security_task_wait() returns the error?
wait_consider_task:
if (unlikely(ret < 0)) {
/*
* If we have not yet seen any eligible child,
* then let this error code replace -ECHILD.
* A permission error will give the user a clue
* to look for security policy problems, rather
* than for mysterious wait bugs.
*/
if (*notask_error)
*notask_error = ret;
}
But shouldn't we return 0 in this case ?
The current code proceeds and either reaps the child or clears notask_error.
Oleg.
On Wed, 29 Apr 2009, Oleg Nesterov wrote:
> selinux_bprm_committed_creds:
>
> rc = avc_has_perm()
> if (rc) {
> flush_signals(current);
>
> This doesn't look right. If the task was SIGKILL'ed we must not proceed,
> the task should die. The fix is simple, we should check SIGNAL_GROUP_EXIT
> and do nothing in this case, the task will exit before return to user
> space. If SIGNAL_GROUP_EXIT is set, it is just wrong to drop SIGKILL and
> continue.
I'm not quite sure what you're asking. This is a permission check to see
if the new task can inherit the signal state of the parent, and if not,
the new task's signal state is flushed.
Where does a consideration of SIGKILL arise?
> But, before fixing, I'd like to understand why we are doing
>
> flush_signal_handlers(current, 1);
> sigemptyset(¤t->blocked);
>
> later. Could someone explain ? This looks unneeded.
This is part of clearing all the signal state in the child.
>
>
> Another question,
>
> wake_up_interruptible(¤t->parent->signal->wait_chldexit);
>
> Shouldn't we use ->real_parent ? Afaics, we shouldn't worry about the tracer
> if current is ptraced, exec must not succeed if the tracer has no rights to
> trace this task after cred changing. But we should notify ->real_parent which
> is, well, real parent.
>
> Also, we don't need _irq to take tasklist_lock, and we don't actually need
> ->siglock.
>
> Oleg.
>
--
James Morris
<[email protected]>
On 04/29, James Morris wrote:
>
> On Wed, 29 Apr 2009, Oleg Nesterov wrote:
>
> > selinux_bprm_committed_creds:
> >
> > rc = avc_has_perm()
> > if (rc) {
> > flush_signals(current);
> >
> > This doesn't look right. If the task was SIGKILL'ed we must not proceed,
> > the task should die. The fix is simple, we should check SIGNAL_GROUP_EXIT
> > and do nothing in this case, the task will exit before return to user
> > space. If SIGNAL_GROUP_EXIT is set, it is just wrong to drop SIGKILL and
> > continue.
>
> I'm not quite sure what you're asking. This is a permission check to see
> if the new task can inherit the signal state of the parent,
we can flush the signal which was sent after we changed SID/cred and passed
the new permission checks,
> and if not,
> the new task's signal state is flushed.
>
> Where does a consideration of SIGKILL arise?
It is not possible to flush SIGKILL. Once SIGKILL (or another fatal signal)
is queued, it sets SIGNAL_GROUP_EXIT which can't be and must not be cleared.
But, there is no need to flush SIGKILL. The task will exit. If it was sent
before we changed SID, we can pretend the task has died before exec().
> > But, before fixing, I'd like to understand why we are doing
> >
> > flush_signal_handlers(current, 1);
> > sigemptyset(¤t->blocked);
> >
> > later. Could someone explain ? This looks unneeded.
>
> This is part of clearing all the signal state in the child.
This doesn't explain why we are doing this ;)
Why do we need to s/IGN/DFL/ and why do we clear ->blocked ? How this can
help from the security pov?
In fact this looks a bit wrong. The only way to ensure we can't lose the
signal during exec() is to block it beforehand, then install the handler
after exec(). s/IGN/DFL/ doesn't look good too.
But, if we really need this for security (selinux is a black magic to me),
then the above doesn't matter. Please help to understand.
Oleg.
Oleg Nesterov <[email protected]> wrote:
> we can flush the signal which was sent after we changed SID/cred and passed
> the new permission checks,
I think you mean to say, rather, that we can *lose* a signal that was sent,
because flush_signals() discards all pending signals unconditionally, and so
SIGKILL can be lost?
I suspect we should pass SIGKILL and possibly SIGSTOP through the flush.
David
On 04/29, David Howells wrote:
>
> Oleg Nesterov <[email protected]> wrote:
>
> > we can flush the signal which was sent after we changed SID/cred and passed
> > the new permission checks,
>
> I think you mean to say, rather, that we can *lose* a signal that was sent,
> because flush_signals() discards all pending signals unconditionally, and so
> SIGKILL can be lost?
Yes, thanks.
> I suspect we should pass SIGKILL
Or we can fliter out SIGKILLs, yes.
But this doesn't differ from "do nothing if SIGNAL_GROUP_EXIT", except needs
a bit more changes. If SIGNAL_GROUP_EXIT is true, we must have a pending
SIGKILL. Either way the task never returns to user-space.
> and possibly SIGSTOP through the flush.
Yes, perhaps... But I don't know if this is right from the selinux pov.
Perhaps it was queued before we changed SID.
And. It is possible that the task/user who sent SIGSTOP before changing
SID will not able to send SIGCONT later.
Oleg.
Oleg Nesterov <[email protected]> wrote:
> > I suspect we should pass SIGKILL
>
> Or we can fliter out SIGKILLs, yes.
How about the attached patch?
David
---
From: David Howells <[email protected]>
Subject: [PATCH] SELinux: Don't flush inherited SIGKILL during execve()
Don't flush inherited SIGKILL during execve() in SELinux's post cred commit
hook. This isn't really a security problem: if the SIGKILL came before the
credentials were changed, then we were right to receive it at the time, and
should honour it; if it came after the creds were changed, then we definitely
should honour it; and in any case, all that will happen is that the process
will be scrapped before it ever returns to userspace.
Signed-off-by: David Howells <[email protected]>
---
include/linux/sched.h | 1 +
kernel/signal.c | 11 ++++++++---
security/selinux/hooks.c | 11 +++++++----
3 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b4c38bc..3fa82b3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1885,6 +1885,7 @@ extern void sched_dead(struct task_struct *p);
extern void proc_caches_init(void);
extern void flush_signals(struct task_struct *);
+extern void __flush_signals(struct task_struct *);
extern void ignore_signals(struct task_struct *);
extern void flush_signal_handlers(struct task_struct *, int force_default);
extern int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info);
diff --git a/kernel/signal.c b/kernel/signal.c
index d803473..d2dd9cf 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -249,14 +249,19 @@ void flush_sigqueue(struct sigpending *queue)
/*
* Flush all pending signals for a task.
*/
+void __flush_signals(struct task_struct *t)
+{
+ clear_tsk_thread_flag(t, TIF_SIGPENDING);
+ flush_sigqueue(&t->pending);
+ flush_sigqueue(&t->signal->shared_pending);
+}
+
void flush_signals(struct task_struct *t)
{
unsigned long flags;
spin_lock_irqsave(&t->sighand->siglock, flags);
- clear_tsk_thread_flag(t, TIF_SIGPENDING);
- flush_sigqueue(&t->pending);
- flush_sigqueue(&t->signal->shared_pending);
+ __flush_signals(t);
spin_unlock_irqrestore(&t->sighand->siglock, flags);
}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index ba808ef..b3ff7fa 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2398,11 +2398,14 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
memset(&itimer, 0, sizeof itimer);
for (i = 0; i < 3; i++)
do_setitimer(i, &itimer, NULL);
- flush_signals(current);
spin_lock_irq(¤t->sighand->siglock);
- flush_signal_handlers(current, 1);
- sigemptyset(¤t->blocked);
- recalc_sigpending();
+ if (!sigismember(¤t->pending.signal, SIGKILL) &&
+ !sigismember(¤t->signal->shared_pending.signal,
+ SIGKILL)) {
+ __flush_signals(current);
+ flush_signal_handlers(current, 1);
+ sigemptyset(¤t->blocked);
+ }
spin_unlock_irq(¤t->sighand->siglock);
}
On 04/29, David Howells wrote:
>
> Oleg Nesterov <[email protected]> wrote:
>
> > > I suspect we should pass SIGKILL
> >
> > Or we can fliter out SIGKILLs, yes.
>
> How about the attached patch?
Heh. I did the very similar patch. It wasn't sent because I'd like to
understand flush_signal_handlers + sigemptyset first.
But,
> @@ -2398,11 +2398,14 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
> memset(&itimer, 0, sizeof itimer);
> for (i = 0; i < 3; i++)
> do_setitimer(i, &itimer, NULL);
> - flush_signals(current);
> spin_lock_irq(¤t->sighand->siglock);
> - flush_signal_handlers(current, 1);
> - sigemptyset(¤t->blocked);
> - recalc_sigpending();
> + if (!sigismember(¤t->pending.signal, SIGKILL) &&
> + !sigismember(¤t->signal->shared_pending.signal,
> + SIGKILL)) {
No, no. Just
if (!(current->signal->flags & SIGNAL_GROUP_EXIT))
__flush_signals();
is enough and more clean imho. The fact that we _really_ have the pending
SIGKILL is just the implementation detail (and perhaps this we be changed
eventually).
No need to check ->shared_pending + ->pending. We can't have SIGKILL
(shared or not) without SIGNAL_GROUP_EXIT.
Oleg.
On Wed, 2009-04-29 at 08:58 +0200, Oleg Nesterov wrote:
> On 04/29, James Morris wrote:
> >
> > On Wed, 29 Apr 2009, Oleg Nesterov wrote:
> >
> > > selinux_bprm_committed_creds:
> > >
> > > rc = avc_has_perm()
> > > if (rc) {
> > > flush_signals(current);
> > >
> > > This doesn't look right. If the task was SIGKILL'ed we must not proceed,
> > > the task should die. The fix is simple, we should check SIGNAL_GROUP_EXIT
> > > and do nothing in this case, the task will exit before return to user
> > > space. If SIGNAL_GROUP_EXIT is set, it is just wrong to drop SIGKILL and
> > > continue.
> >
> > I'm not quite sure what you're asking. This is a permission check to see
> > if the new task can inherit the signal state of the parent,
>
> we can flush the signal which was sent after we changed SID/cred and passed
> the new permission checks,
>
> > and if not,
> > the new task's signal state is flushed.
> >
> > Where does a consideration of SIGKILL arise?
>
> It is not possible to flush SIGKILL. Once SIGKILL (or another fatal signal)
> is queued, it sets SIGNAL_GROUP_EXIT which can't be and must not be cleared.
>
> But, there is no need to flush SIGKILL. The task will exit. If it was sent
> before we changed SID, we can pretend the task has died before exec().
>
> > > But, before fixing, I'd like to understand why we are doing
> > >
> > > flush_signal_handlers(current, 1);
> > > sigemptyset(¤t->blocked);
> > >
> > > later. Could someone explain ? This looks unneeded.
> >
> > This is part of clearing all the signal state in the child.
>
> This doesn't explain why we are doing this ;)
>
> Why do we need to s/IGN/DFL/ and why do we clear ->blocked ? How this can
> help from the security pov?
We don't want the caller to be able to arrange conditions that prevent
correct handling of signals (e.g. SIGHUP) by the callee. That was
motivated by a specific attack against newrole, but was a general issue
for any program that runs in a more trusted domain than its caller.
As I recall, I based the logic in part on existing logic in
call_usermodehelper().
> In fact this looks a bit wrong. The only way to ensure we can't lose the
> signal during exec() is to block it beforehand, then install the handler
> after exec(). s/IGN/DFL/ doesn't look good too.
>
> But, if we really need this for security (selinux is a black magic to me),
> then the above doesn't matter. Please help to understand.
>
> Oleg.
--
Stephen Smalley
National Security Agency
Oleg Nesterov <[email protected]> wrote:
> No need to check ->shared_pending + ->pending. We can't have SIGKILL
> (shared or not) without SIGNAL_GROUP_EXIT.
Okay, I didn't realise we did this now.
How about the attached patch then?
David
---
From: David Howells <[email protected]>
Subject: [PATCH] SELinux: Don't flush inherited SIGKILL during execve()
Don't flush inherited SIGKILL during execve() in SELinux's post cred commit
hook. This isn't really a security problem: if the SIGKILL came before the
credentials were changed, then we were right to receive it at the time, and
should honour it; if it came after the creds were changed, then we definitely
should honour it; and in any case, all that will happen is that the process
will be scrapped before it ever returns to userspace.
Signed-off-by: David Howells <[email protected]>
---
include/linux/sched.h | 1 +
kernel/signal.c | 11 ++++++++---
security/selinux/hooks.c | 10 +++++-----
3 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b4c38bc..3fa82b3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1885,6 +1885,7 @@ extern void sched_dead(struct task_struct *p);
extern void proc_caches_init(void);
extern void flush_signals(struct task_struct *);
+extern void __flush_signals(struct task_struct *);
extern void ignore_signals(struct task_struct *);
extern void flush_signal_handlers(struct task_struct *, int force_default);
extern int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info);
diff --git a/kernel/signal.c b/kernel/signal.c
index d803473..d2dd9cf 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -249,14 +249,19 @@ void flush_sigqueue(struct sigpending *queue)
/*
* Flush all pending signals for a task.
*/
+void __flush_signals(struct task_struct *t)
+{
+ clear_tsk_thread_flag(t, TIF_SIGPENDING);
+ flush_sigqueue(&t->pending);
+ flush_sigqueue(&t->signal->shared_pending);
+}
+
void flush_signals(struct task_struct *t)
{
unsigned long flags;
spin_lock_irqsave(&t->sighand->siglock, flags);
- clear_tsk_thread_flag(t, TIF_SIGPENDING);
- flush_sigqueue(&t->pending);
- flush_sigqueue(&t->signal->shared_pending);
+ __flush_signals(t);
spin_unlock_irqrestore(&t->sighand->siglock, flags);
}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 02c2647..76670e2 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2378,7 +2378,6 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
struct sighand_struct *psig;
u32 osid, sid;
int rc, i;
- unsigned long flags;
osid = tsec->osid;
sid = tsec->sid;
@@ -2398,11 +2397,12 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
memset(&itimer, 0, sizeof itimer);
for (i = 0; i < 3; i++)
do_setitimer(i, &itimer, NULL);
- flush_signals(current);
spin_lock_irq(¤t->sighand->siglock);
- flush_signal_handlers(current, 1);
- sigemptyset(¤t->blocked);
- recalc_sigpending();
+ if (!(current->signal->flags & SIGNAL_GROUP_EXIT)) {
+ __flush_signals(current);
+ flush_signal_handlers(current, 1);
+ sigemptyset(¤t->blocked);
+ }
spin_unlock_irq(¤t->sighand->siglock);
}
David Howells <[email protected]> wrote:
> > No need to check ->shared_pending + ->pending. We can't have SIGKILL
> > (shared or not) without SIGNAL_GROUP_EXIT.
>
> Okay, I didn't realise we did this now.
>
> How about the attached patch then?
Grrr... A bit leaked out of another patch into that one when I split them.
Try this attached instead.
David
---
From: David Howells <[email protected]>
Subject: [PATCH] SELinux: Don't flush inherited SIGKILL during execve()
Don't flush inherited SIGKILL during execve() in SELinux's post cred commit
hook. This isn't really a security problem: if the SIGKILL came before the
credentials were changed, then we were right to receive it at the time, and
should honour it; if it came after the creds were changed, then we definitely
should honour it; and in any case, all that will happen is that the process
will be scrapped before it ever returns to userspace.
Signed-off-by: David Howells <[email protected]>
---
include/linux/sched.h | 1 +
kernel/signal.c | 11 ++++++++---
security/selinux/hooks.c | 9 +++++----
3 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b4c38bc..3fa82b3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1885,6 +1885,7 @@ extern void sched_dead(struct task_struct *p);
extern void proc_caches_init(void);
extern void flush_signals(struct task_struct *);
+extern void __flush_signals(struct task_struct *);
extern void ignore_signals(struct task_struct *);
extern void flush_signal_handlers(struct task_struct *, int force_default);
extern int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info);
diff --git a/kernel/signal.c b/kernel/signal.c
index d803473..d2dd9cf 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -249,14 +249,19 @@ void flush_sigqueue(struct sigpending *queue)
/*
* Flush all pending signals for a task.
*/
+void __flush_signals(struct task_struct *t)
+{
+ clear_tsk_thread_flag(t, TIF_SIGPENDING);
+ flush_sigqueue(&t->pending);
+ flush_sigqueue(&t->signal->shared_pending);
+}
+
void flush_signals(struct task_struct *t)
{
unsigned long flags;
spin_lock_irqsave(&t->sighand->siglock, flags);
- clear_tsk_thread_flag(t, TIF_SIGPENDING);
- flush_sigqueue(&t->pending);
- flush_sigqueue(&t->signal->shared_pending);
+ __flush_signals(t);
spin_unlock_irqrestore(&t->sighand->siglock, flags);
}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 0702ba6..76670e2 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2397,11 +2397,12 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
memset(&itimer, 0, sizeof itimer);
for (i = 0; i < 3; i++)
do_setitimer(i, &itimer, NULL);
- flush_signals(current);
spin_lock_irq(¤t->sighand->siglock);
- flush_signal_handlers(current, 1);
- sigemptyset(¤t->blocked);
- recalc_sigpending();
+ if (!(current->signal->flags & SIGNAL_GROUP_EXIT)) {
+ __flush_signals(current);
+ flush_signal_handlers(current, 1);
+ sigemptyset(¤t->blocked);
+ }
spin_unlock_irq(¤t->sighand->siglock);
}
On 04/29, Stephen Smalley wrote:
>
> On Wed, 2009-04-29 at 08:58 +0200, Oleg Nesterov wrote:
> >
> > Why do we need to s/IGN/DFL/ and why do we clear ->blocked ? How this can
> > help from the security pov?
>
> We don't want the caller to be able to arrange conditions that prevent
> correct handling of signals (e.g. SIGHUP) by the callee. That was
> motivated by a specific attack against newrole, but was a general issue
> for any program that runs in a more trusted domain than its caller.
Still can't understand...
If the new image runs in a more trusted domain, then we should not change
SIG_IGN to SIG_DFL ?
For example, a user does "nohup setuid_app". Now, why should we change
SIG_IGN to SIG_DFL for SIGHUP? This makes setuid_app more "vulnerable"
to SIGHUP, not more "protected". Confused.
OK. Since I don't understand the security magic, you can just ignore me.
But I will appreciate any explanation for dummies ;)
> As I recall, I based the logic in part on existing logic in
> call_usermodehelper().
____call_usermodehelper() does this because we should not exec a user-space
application with SIGKILL/SIGSTOP ignored/blocked. We don't have this problem
when user-space execs.
Oleg.
On Wed, 2009-04-29 at 14:56 +0200, Oleg Nesterov wrote:
> On 04/29, Stephen Smalley wrote:
> >
> > On Wed, 2009-04-29 at 08:58 +0200, Oleg Nesterov wrote:
> > >
> > > Why do we need to s/IGN/DFL/ and why do we clear ->blocked ? How this can
> > > help from the security pov?
> >
> > We don't want the caller to be able to arrange conditions that prevent
> > correct handling of signals (e.g. SIGHUP) by the callee. That was
> > motivated by a specific attack against newrole, but was a general issue
> > for any program that runs in a more trusted domain than its caller.
>
> Still can't understand...
>
> If the new image runs in a more trusted domain, then we should not change
> SIG_IGN to SIG_DFL ?
>
> For example, a user does "nohup setuid_app". Now, why should we change
> SIG_IGN to SIG_DFL for SIGHUP? This makes setuid_app more "vulnerable"
> to SIGHUP, not more "protected". Confused.
Not if the app was depending on the default handler for SIGHUP to
correctly handle vhangup(). The point is that we don't necessarily
trust the caller to define the handling behavior for signals in the
callee. If we trust the caller to do so, then we can grant the
corresponding permission.
newrole scenario was to run it nohup, logout, wait for other user to
login on same tty, trigger termination of newrole'd child shell, and
have newrole relabel other user's tty to attacker's sid.
> OK. Since I don't understand the security magic, you can just ignore me.
> But I will appreciate any explanation for dummies ;)
>
> > As I recall, I based the logic in part on existing logic in
> > call_usermodehelper().
>
> ____call_usermodehelper() does this because we should not exec a user-space
> application with SIGKILL/SIGSTOP ignored/blocked. We don't have this problem
> when user-space execs.
But we still have the problem of the caller setting up the signal
handlers or blocked signal mask prior to exec'ing the privileged
program, right?
--
Stephen Smalley
National Security Agency
On Wed, 2009-04-29 at 00:30 +0200, Oleg Nesterov wrote:
> selinux_bprm_committed_creds:
>
> rc = avc_has_perm()
> if (rc) {
> flush_signals(current);
>
> This doesn't look right. If the task was SIGKILL'ed we must not proceed,
> the task should die. The fix is simple, we should check SIGNAL_GROUP_EXIT
> and do nothing in this case, the task will exit before return to user
> space. If SIGNAL_GROUP_EXIT is set, it is just wrong to drop SIGKILL and
> continue.
>
> But, before fixing, I'd like to understand why we are doing
>
> flush_signal_handlers(current, 1);
> sigemptyset(¤t->blocked);
>
> later. Could someone explain ? This looks unneeded.
>
>
> Another question,
>
> wake_up_interruptible(¤t->parent->signal->wait_chldexit);
>
> Shouldn't we use ->real_parent ? Afaics, we shouldn't worry about the tracer
> if current is ptraced, exec must not succeed if the tracer has no rights to
> trace this task after cred changing. But we should notify ->real_parent which
> is, well, real parent.
That makes sense to me - yes, s/parent/real_parent/.
--
Stephen Smalley
National Security Agency
On 04/29, David Howells wrote:
>
> From: David Howells <[email protected]>
> Subject: [PATCH] SELinux: Don't flush inherited SIGKILL during execve()
>
> Don't flush inherited SIGKILL during execve() in SELinux's post cred commit
> hook. This isn't really a security problem: if the SIGKILL came before the
> credentials were changed, then we were right to receive it at the time, and
> should honour it; if it came after the creds were changed, then we definitely
> should honour it; and in any case, all that will happen is that the process
> will be scrapped before it ever returns to userspace.
>
> Signed-off-by: David Howells <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
On 04/29, Stephen Smalley wrote:
>
> On Wed, 2009-04-29 at 00:30 +0200, Oleg Nesterov wrote:
> > selinux_bprm_committed_creds:
> >
> > rc = avc_has_perm()
> > if (rc) {
> > flush_signals(current);
> >
> > This doesn't look right. If the task was SIGKILL'ed we must not proceed,
> > the task should die. The fix is simple, we should check SIGNAL_GROUP_EXIT
> > and do nothing in this case, the task will exit before return to user
> > space. If SIGNAL_GROUP_EXIT is set, it is just wrong to drop SIGKILL and
> > continue.
> >
> > But, before fixing, I'd like to understand why we are doing
> >
> > flush_signal_handlers(current, 1);
> > sigemptyset(¤t->blocked);
> >
> > later. Could someone explain ? This looks unneeded.
> >
> >
> > Another question,
> >
> > wake_up_interruptible(¤t->parent->signal->wait_chldexit);
> >
> > Shouldn't we use ->real_parent ? Afaics, we shouldn't worry about the tracer
> > if current is ptraced, exec must not succeed if the tracer has no rights to
> > trace this task after cred changing. But we should notify ->real_parent which
> > is, well, real parent.
>
> That makes sense to me - yes, s/parent/real_parent/.
Great, thanks. Will send the patch soon.
Oleg.
On 04/29, Stephen Smalley wrote:
>
> On Wed, 2009-04-29 at 14:56 +0200, Oleg Nesterov wrote:
> > On 04/29, Stephen Smalley wrote:
> > >
> > > On Wed, 2009-04-29 at 08:58 +0200, Oleg Nesterov wrote:
> > > >
> > > > Why do we need to s/IGN/DFL/ and why do we clear ->blocked ? How this can
> > > > help from the security pov?
> > >
> > > We don't want the caller to be able to arrange conditions that prevent
> > > correct handling of signals (e.g. SIGHUP) by the callee. That was
> > > motivated by a specific attack against newrole, but was a general issue
> > > for any program that runs in a more trusted domain than its caller.
> >
> > Still can't understand...
> >
> > If the new image runs in a more trusted domain, then we should not change
> > SIG_IGN to SIG_DFL ?
> >
> > For example, a user does "nohup setuid_app". Now, why should we change
> > SIG_IGN to SIG_DFL for SIGHUP? This makes setuid_app more "vulnerable"
> > to SIGHUP, not more "protected". Confused.
>
> Not if the app was depending on the default handler for SIGHUP to
> correctly handle vhangup(). The point is that we don't necessarily
> trust the caller to define the handling behavior for signals in the
> callee. If we trust the caller to do so, then we can grant the
> corresponding permission.
>
> newrole scenario was to run it nohup, logout, wait for other user to
> login on same tty, trigger termination of newrole'd child shell, and
> have newrole relabel other user's tty to attacker's sid.
>
> > OK. Since I don't understand the security magic, you can just ignore me.
> > But I will appreciate any explanation for dummies ;)
> >
> > > As I recall, I based the logic in part on existing logic in
> > > call_usermodehelper().
> >
> > ____call_usermodehelper() does this because we should not exec a user-space
> > application with SIGKILL/SIGSTOP ignored/blocked. We don't have this problem
> > when user-space execs.
>
> But we still have the problem of the caller setting up the signal
> handlers or blocked signal mask prior to exec'ing the privileged
> program, right?
The callee can never setup the signal handler. Note that flush_old_exec()
does flush_signal_handlers() too. But it uses force_default == F.
OK, please forget this. I trust you even if can't understand ;)
My real concerns were SIGKILL and do_wait(), they were addressed.
Thanks!
Oleg.
On Wed, 2009-04-29 at 15:42 +0200, Oleg Nesterov wrote:
> On 04/29, Stephen Smalley wrote:
> >
> > On Wed, 2009-04-29 at 14:56 +0200, Oleg Nesterov wrote:
> > > On 04/29, Stephen Smalley wrote:
> > > >
> > > > On Wed, 2009-04-29 at 08:58 +0200, Oleg Nesterov wrote:
> > > > >
> > > > > Why do we need to s/IGN/DFL/ and why do we clear ->blocked ? How this can
> > > > > help from the security pov?
> > > >
> > > > We don't want the caller to be able to arrange conditions that prevent
> > > > correct handling of signals (e.g. SIGHUP) by the callee. That was
> > > > motivated by a specific attack against newrole, but was a general issue
> > > > for any program that runs in a more trusted domain than its caller.
> > >
> > > Still can't understand...
> > >
> > > If the new image runs in a more trusted domain, then we should not change
> > > SIG_IGN to SIG_DFL ?
> > >
> > > For example, a user does "nohup setuid_app". Now, why should we change
> > > SIG_IGN to SIG_DFL for SIGHUP? This makes setuid_app more "vulnerable"
> > > to SIGHUP, not more "protected". Confused.
> >
> > Not if the app was depending on the default handler for SIGHUP to
> > correctly handle vhangup(). The point is that we don't necessarily
> > trust the caller to define the handling behavior for signals in the
> > callee. If we trust the caller to do so, then we can grant the
> > corresponding permission.
> >
> > newrole scenario was to run it nohup, logout, wait for other user to
> > login on same tty, trigger termination of newrole'd child shell, and
> > have newrole relabel other user's tty to attacker's sid.
> >
> > > OK. Since I don't understand the security magic, you can just ignore me.
> > > But I will appreciate any explanation for dummies ;)
> > >
> > > > As I recall, I based the logic in part on existing logic in
> > > > call_usermodehelper().
> > >
> > > ____call_usermodehelper() does this because we should not exec a user-space
> > > application with SIGKILL/SIGSTOP ignored/blocked. We don't have this problem
> > > when user-space execs.
> >
> > But we still have the problem of the caller setting up the signal
> > handlers or blocked signal mask prior to exec'ing the privileged
> > program, right?
>
> The callee can never setup the signal handler. Note that flush_old_exec()
> does flush_signal_handlers() too. But it uses force_default == F.
Right, but it can set it to SIG_IGN, which was the problem in the
situation above.
> OK, please forget this. I trust you even if can't understand ;)
>
> My real concerns were SIGKILL and do_wait(), they were addressed.
>
> Thanks!
>
> Oleg.
--
Stephen Smalley
National Security Agency
(on top of David's "[PATCH] SELinux: Don't flush inherited SIGKILL during execve()",
but can be applied independently).
selinux_bprm_committed_creds() should wake up ->real_parent, not ->parent.
We shouldn't worry about the tracer if current is ptraced, exec() must not
succeed if the tracer has no rights to trace this task after cred changing.
But we should notify ->real_parent which is, well, real parent.
Also, we don't need _irq to take tasklist, and we don't need parent's
->siglock to wake_up_interruptible(real_parent->signal->wait_chldexit).
Since we hold tasklist, real_parent->signal must be stable. Otherwise
spin_lock(siglock) is not safe too and can't help anyway.
Signed-off-by: Oleg Nesterov <[email protected]>
--- PTRACE/security/selinux/hooks.c~SELINUX_PARENT 2009-04-29 15:45:28.000000000 +0200
+++ PTRACE/security/selinux/hooks.c 2009-04-29 15:48:41.000000000 +0200
@@ -2375,10 +2375,8 @@ static void selinux_bprm_committed_creds
{
const struct task_security_struct *tsec = current_security();
struct itimerval itimer;
- struct sighand_struct *psig;
u32 osid, sid;
int rc, i;
- unsigned long flags;
osid = tsec->osid;
sid = tsec->sid;
@@ -2409,12 +2407,9 @@ static void selinux_bprm_committed_creds
/* Wake up the parent if it is waiting so that it can recheck
* wait permission to the new task SID. */
- read_lock_irq(&tasklist_lock);
- psig = current->parent->sighand;
- spin_lock_irqsave(&psig->siglock, flags);
- wake_up_interruptible(¤t->parent->signal->wait_chldexit);
- spin_unlock_irqrestore(&psig->siglock, flags);
- read_unlock_irq(&tasklist_lock);
+ read_lock(&tasklist_lock);
+ wake_up_interruptible(¤t->real_parent->signal->wait_chldexit);
+ read_unlock(&tasklist_lock);
}
/* superblock security operations */
Damn. Forgot to add "[PATCH]" to the subject, hopefully not a problem.
On 04/29, Oleg Nesterov wrote:
>
> selinux_bprm_committed_creds() should wake up ->real_parent, not ->parent.
Afaics, with this patch the only user of ->parent outside of ptrace.c & co
is arch/ia64/kernel/mca.c:format_mca_init_stack(). Hopefully ->parent will
die soon.
Oleg.
> > For example, a user does "nohup setuid_app". Now, why should we change
> > SIG_IGN to SIG_DFL for SIGHUP? This makes setuid_app more "vulnerable"
> > to SIGHUP, not more "protected". Confused.
>
> Not if the app was depending on the default handler for SIGHUP to
> correctly handle vhangup(). The point is that we don't necessarily
> trust the caller to define the handling behavior for signals in the
> callee. If we trust the caller to do so, then we can grant the
> corresponding permission.
You are changing standards defined behaviour - so apps that rely on
correct behaviour maybe misbehave or become vulnerable in ways not
anticipated
A classic example here is SIGPIPE. What you are doing by dropping to
default signal behaviour is breaking various inetd style setups that know
the child will inherit SIGPIPE as SIG_IGN and therefore will cleanly
process network connection loss.
Instead the child will now suffer immediate termination and not
have a chance to clean up.
So mucking with signal masks is actually not improving security - it's
randomly changing things - and randomly changing things to disagree with
the spec is a very very bad idea IMHO as you will harm correctly written
code.
> But we still have the problem of the caller setting up the signal
> handlers or blocked signal mask prior to exec'ing the privileged
> program, right
No.
A signal handler function is cleared across a setuid exec so you can't
set an address and jump to it in the new binary. K&R thought of that
one ;)
A signal mask is inherited and this is *required to be so*. Changing it
breaks stuff and may itself introduce security problems.
Any setuid app has to be reasonably robust because it may get very
strange file handles, very strange signal masks, quotas, resource limits
and other stuff. Signals are just one trivial detail.
Clearing a lot of these is hard as you don't know what the resource
limits etc for the setuid target are and you don't want a pair of setuid
user A and user B apps co-operating to evade things.
The correct way to do most of this is not to use setuid apps run by the
user but to pass authority information to service daemons that run
services within a known non-user influencable environment in the first
place... the very daemons that randomly clearing the SIG_IGN on SIGPIPE
will break !
Alan
> But, before fixing, I'd like to understand why we are doing
>
> flush_signal_handlers(current, 1);
> sigemptyset(¤t->blocked);
Interesting - this appear to be introducing a security hole by clearing
things daemons running setuid apps are entitled to rely upon.
On Wed, 2009-04-29 at 15:47 +0100, Alan Cox wrote:
> > > For example, a user does "nohup setuid_app". Now, why should we change
> > > SIG_IGN to SIG_DFL for SIGHUP? This makes setuid_app more "vulnerable"
> > > to SIGHUP, not more "protected". Confused.
> >
> > Not if the app was depending on the default handler for SIGHUP to
> > correctly handle vhangup(). The point is that we don't necessarily
> > trust the caller to define the handling behavior for signals in the
> > callee. If we trust the caller to do so, then we can grant the
> > corresponding permission.
>
> You are changing standards defined behaviour - so apps that rely on
> correct behaviour maybe misbehave or become vulnerable in ways not
> anticipated
>
> A classic example here is SIGPIPE. What you are doing by dropping to
> default signal behaviour is breaking various inetd style setups that know
> the child will inherit SIGPIPE as SIG_IGN and therefore will cleanly
> process network connection loss.
>
> Instead the child will now suffer immediate termination and not
> have a chance to clean up.
>
> So mucking with signal masks is actually not improving security - it's
> randomly changing things - and randomly changing things to disagree with
> the spec is a very very bad idea IMHO as you will harm correctly written
> code.
>
> > But we still have the problem of the caller setting up the signal
> > handlers or blocked signal mask prior to exec'ing the privileged
> > program, right
>
> No.
>
> A signal handler function is cleared across a setuid exec so you can't
> set an address and jump to it in the new binary. K&R thought of that
> one ;)
Yes, I know - I was just referring to setting to SIG_IGN.
> A signal mask is inherited and this is *required to be so*. Changing it
> breaks stuff and may itself introduce security problems.
Which is why it is subject to a permission check, so it can in fact be
allowed when desired.
> Any setuid app has to be reasonably robust because it may get very
> strange file handles, very strange signal masks, quotas, resource limits
> and other stuff. Signals are just one trivial detail.
>
> Clearing a lot of these is hard as you don't know what the resource
> limits etc for the setuid target are and you don't want a pair of setuid
> user A and user B apps co-operating to evade things.
Control over inherited file descriptors was present in SELinux from the
beginning. Control over signal state and resource limits was introduced
a little later (2.6.1, 2.6.2), partly in response to a specific attack,
and based on consultations with Roland and others. There was a review
done of inherited state and what we could reasonably do to control it.
> The correct way to do most of this is not to use setuid apps run by the
> user but to pass authority information to service daemons that run
> services within a known non-user influencable environment in the first
> place... the very daemons that randomly clearing the SIG_IGN on SIGPIPE
> will break !
That has some advantages, yes, and in that situation you can allow the
inheritance of state to occur between the caller and the callee domains
in the policy. But it carries its own set of challenges as well, and
improving the boundaries in the exec-based model is nonetheless
worthwhile.
--
Stephen Smalley
National Security Agency
I was never able to understand what should we actually do when
security_task_wait() fails, but the current code doesn't look right.
If ->task_wait() returns the error, we update *notask_error correctly.
But then we either reap the child (despite the fact this was forbidden)
or clear *notask_error (and hide the securiy policy problems).
This patch assumes that "stolen by ptrace" doesn't matter. If selinux
denies the child we should ignore it but make sure we report -EACCESS
instead of -ECHLD if there are no other eligible children.
Signed-off-by: Oleg Nesterov <[email protected]>
--- PTRACE/kernel/exit.c~WAIT_SECURITY 2009-04-29 12:46:15.000000000 +0200
+++ PTRACE/kernel/exit.c 2009-04-29 16:19:40.000000000 +0200
@@ -1476,6 +1476,7 @@ static int wait_consider_task(struct tas
*/
if (*notask_error)
*notask_error = ret;
+ return 0;
}
if (likely(!ptrace) && unlikely(task_ptrace(p))) {
On Wed, 29 Apr 2009, Oleg Nesterov wrote:
> On 04/29, David Howells wrote:
> >
> > From: David Howells <[email protected]>
> > Subject: [PATCH] SELinux: Don't flush inherited SIGKILL during execve()
> >
> > Don't flush inherited SIGKILL during execve() in SELinux's post cred commit
> > hook. This isn't really a security problem: if the SIGKILL came before the
> > credentials were changed, then we were right to receive it at the time, and
> > should honour it; if it came after the creds were changed, then we definitely
> > should honour it; and in any case, all that will happen is that the process
> > will be scrapped before it ever returns to userspace.
> >
> > Signed-off-by: David Howells <[email protected]>
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6#next
--
James Morris
<[email protected]>
On Wed, 29 Apr 2009, Oleg Nesterov wrote:
> (on top of David's "[PATCH] SELinux: Don't flush inherited SIGKILL during execve()",
> but can be applied independently).
>
> selinux_bprm_committed_creds() should wake up ->real_parent, not ->parent.
>
> We shouldn't worry about the tracer if current is ptraced, exec() must not
> succeed if the tracer has no rights to trace this task after cred changing.
> But we should notify ->real_parent which is, well, real parent.
>
> Also, we don't need _irq to take tasklist, and we don't need parent's
> ->siglock to wake_up_interruptible(real_parent->signal->wait_chldexit).
> Since we hold tasklist, real_parent->signal must be stable. Otherwise
> spin_lock(siglock) is not safe too and can't help anyway.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6#next
--
James Morris
<[email protected]>
I'm pretty sure this was just an unintended change when we did the do_wait
reorganization in 2.6.27. Nobody noticed since the SELinux policy bugs
that made that error diagnosis relevant were fixed a long time before.
Acked-by: Roland McGrath <[email protected]>
Thanks,
Roland
Acked-by: Roland McGrath <[email protected]>
> Afaics, with this patch the only user of ->parent outside of ptrace.c & co
> is arch/ia64/kernel/mca.c:format_mca_init_stack(). Hopefully ->parent will
> die soon.
Woo! Nice work. That oddball ia64 case is obviously trivial, it just
wants to avoid uninitialized stuff in its task_struct for a
not-really-a-process. With your fixes, it seems certain that ->parent
should never be examined in those tasks. Even so, the extra initialization
doesn't hurt. You could clean it up today with:
tracehook_finish_clone(p, 0, 0);
at the end (in place of touching ->parent).
That will dtrt both now and later.
Thanks,
Roland
On Wed, 29 Apr 2009, Oleg Nesterov wrote:
> I was never able to understand what should we actually do when
> security_task_wait() fails, but the current code doesn't look right.
>
> If ->task_wait() returns the error, we update *notask_error correctly.
> But then we either reap the child (despite the fact this was forbidden)
> or clear *notask_error (and hide the securiy policy problems).
>
> This patch assumes that "stolen by ptrace" doesn't matter. If selinux
> denies the child we should ignore it but make sure we report -EACCESS
> instead of -ECHLD if there are no other eligible children.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Applied to:
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6#next
--
James Morris
<[email protected]>
The SIGNAL_GROUP_EXIT check is fine for now. But we should revisit this
area more thoroughly at some point. (That is a low-priority thing.)
There is a wrinkle here I don't like. The fatal signal is "committed to"
(sender approved by security modules, etc.) "before" the exec, but gets
delivered "after" the exec. e.g., acct_process() writes a record for the
killed process showing the uid/gid from after the setuid/setgid exec, when
security would not have allowed the killing signal to be sent after the
exec, only before. Obviously this is a minor quirk (not to be worried
about today), but it points to a deeper kind of "wrong" that troubles me.
I can't think of any other similar wrinkles that could be observable at
all (or matter more than that one), but there might be another.
This is related to the issue of racing stop signals lost by de_thread().
That is still on our back-burner list to think about more deeply one day.
We don't need to contemplate any of this much more right now, but I would
like to address the whole mess better later on.
I don't understand why install_exec_creds() is called as late as it is.
Can't we do that in flush_old_exec()--you know, where it says:
/* install the new credentials */
?
What I think would be best is if flush_old_exec() does all the "point of no
return" logic and that includes the credentials changes. Then we can
define this as the point of transition from "before exec" to "after exec".
It would do the final check for signals interrupting the exec, and if
flush_old_exec() returned 0, then any "new" signals are "after exec".
We'd reorganize things so the final creds switch is under siglock.
Then either a sender precedes the exec and any security module's flushing
logic wipes the pre-exec state as it wants to, or the sender follows the
exec and is subject to signal security checks based on the new credentials.
Thanks,
Roland
Roland McGrath <[email protected]> wrote:
> There is a wrinkle here I don't like. The fatal signal is "committed to"
> (sender approved by security modules, etc.) "before" the exec, but gets
> delivered "after" the exec.
That's because the signal is delivered between the system call function
returning to entry.S and userspace resuming. We could put a lot of
check...abort clauses in exec.c and the binfmts, but is it worth the hassle?
> I don't understand why install_exec_creds() is called as late as it is.
> Can't we do that in flush_old_exec()--you know, where it says:
> /* install the new credentials */
> ?
I believe it's something to do with the binfmt driver needing to access files
in the old security context between calling flush_old_exec() and calling
install_exec_creds() [compute_creds() as was]. It can't do some of the
accesses before calling flush_old_exec() because it has to do funky things
with mmap().
Actually, that comment should probably be removed. IIRC, at one time I was
trying to set all the credentials there, but was told I couldn't do that.
David
> I believe it's something to do with the binfmt driver needing to access files
> in the old security context between calling flush_old_exec() and calling
> install_exec_creds() [compute_creds() as was]. It can't do some of the
> accesses before calling flush_old_exec() because it has to do funky things
> with mmap().
This doesn't make too much sense to me off hand. These accesses must
already be specially magical for unreadable setuid (--s) files to work.
Thanks,
Roland