2004-10-21 01:37:38

by Chris Wright

[permalink] [raw]
Subject: [RFC][PATCH] delay rq_lock acquisition in setscheduler

Hi Ingo,

Doing access control checks with rq_lock held can cause deadlock
when audit messages are created (via printk or audit infrastructure),
as noted by both SELinux and SubDomain folks. This patch will let the
security checks happen w/out lock held, then re-sample the p->policy in
case it was raced. Original patch from John Johansen, with some updates
from me. What do you think?

From: John Johansen <[email protected]>
Signed-off-by: Chris Wright <[email protected]>

===== kernel/sched.c 1.367 vs edited =====
--- 1.367/kernel/sched.c 2004-10-18 22:26:52 -07:00
+++ edited/kernel/sched.c 2004-10-20 15:55:12 -07:00
@@ -3038,7 +3038,7 @@
{
struct sched_param lp;
int retval = -EINVAL;
- int oldprio;
+ int oldprio, oldpolicy = -1;
prio_array_t *array;
unsigned long flags;
runqueue_t *rq;
@@ -3060,23 +3060,17 @@

retval = -ESRCH;
if (!p)
- goto out_unlock_tasklist;
-
- /*
- * To be able to change p->policy safely, the apropriate
- * runqueue lock must be held.
- */
- rq = task_rq_lock(p, &flags);
+ goto out_unlock;

+ /* double check policy once rq lock held */
if (policy < 0)
- policy = p->policy;
+ policy = oldpolicy = p->policy;
else {
retval = -EINVAL;
if (policy != SCHED_FIFO && policy != SCHED_RR &&
policy != SCHED_NORMAL)
goto out_unlock;
}
-
/*
* Valid priorities for SCHED_FIFO and SCHED_RR are
* 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL is 0.
@@ -3098,7 +3092,15 @@
retval = security_task_setscheduler(p, policy, &lp);
if (retval)
goto out_unlock;
-
+ /*
+ * To be able to change p->policy safely, the apropriate
+ * runqueue lock must be held.
+ */
+ rq = task_rq_lock(p, &flags);
+ /* recheck policy now with rq lock held */
+ retval = -EPERM;
+ if (unlikely(oldpolicy != -1 && oldpolicy != p->policy))
+ goto out_unlock_rq;
array = p->array;
if (array)
deactivate_task(p, task_rq(p));
@@ -3118,12 +3120,10 @@
} else if (TASK_PREEMPTS_CURR(p, rq))
resched_task(rq->curr);
}
-
-out_unlock:
+out_unlock_rq:
task_rq_unlock(rq, &flags);
-out_unlock_tasklist:
+out_unlock:
read_unlock_irq(&tasklist_lock);
-
out_nounlock:
return retval;
}


2004-10-21 02:00:00

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [RFC][PATCH] delay rq_lock acquisition in setscheduler

On Wed, Oct 20, 2004 at 06:32:38PM -0700, Chris Wright wrote:
> + rq = task_rq_lock(p, &flags);
> + /* recheck policy now with rq lock held */
> + retval = -EPERM;
> + if (unlikely(oldpolicy != -1 && oldpolicy != p->policy))
> + goto out_unlock_rq;

to be really backwards compatible you should return 0 methinks, the only
case when this race can trigger is with non deterministic usage, and the
current kernel would never return -EPERM in such a non deterministic
usage. However the -EPERM will signal the non deterministic usage, but I
doubt it worth to return -EPERM there, since it makes it looks like the
other side that didn't get EPERM is safe while it's not, since the other
side isn't deterministic either.

2004-10-21 05:23:27

by Chris Wright

[permalink] [raw]
Subject: Re: [RFC][PATCH] delay rq_lock acquisition in setscheduler

* Andrea Arcangeli ([email protected]) wrote:
> On Wed, Oct 20, 2004 at 06:32:38PM -0700, Chris Wright wrote:
> > + rq = task_rq_lock(p, &flags);
> > + /* recheck policy now with rq lock held */
> > + retval = -EPERM;
> > + if (unlikely(oldpolicy != -1 && oldpolicy != p->policy))
> > + goto out_unlock_rq;
>
> to be really backwards compatible you should return 0 methinks, the only
> case when this race can trigger is with non deterministic usage, and the
> current kernel would never return -EPERM in such a non deterministic
> usage. However the -EPERM will signal the non deterministic usage, but I
> doubt it worth to return -EPERM there, since it makes it looks like the
> other side that didn't get EPERM is safe while it's not, since the other
> side isn't deterministic either.

true. another alternative is to drop rq_lock and do the checks over.
i didn't convince myself yet that there's no chance for livelock,
although it seems unlikely.

2004-10-21 08:07:06

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH] delay rq_lock acquisition in setscheduler


* Chris Wright <[email protected]> wrote:

> Hi Ingo,
>
> Doing access control checks with rq_lock held can cause deadlock when
> audit messages are created (via printk or audit infrastructure), as
> noted by both SELinux and SubDomain folks. This patch will let the
> security checks happen w/out lock held, then re-sample the p->policy
> in case it was raced. Original patch from John Johansen, with some
> updates from me. What do you think?

i dont this this patch is correct, because it changes semantics by
pushing a security-subsystem failure back into userspace. There's
nothing wrong with two tasks trying to change a third task's policy in
parallel - our API allows that.

I agree that this is a very special case of lock dependency and that the
security subsystem should not be burdened with double-buffering messages
just to avoid the runqueue lock on syslogd wakeup. Only this single
scheduling-related system-call is affected by this problem.

i think the right solution would be to retry the permission check if the
policy has changed (an unlikely event). It is livelockable the same way
seqlocks are livelockable so i'd not worry about it too much. It is also
preemptible with PREEMPT so not a big issue. Also, the check & repeat
code should possibly be #ifdef CONFIG_SECURITY.

Ingo

2004-10-21 12:56:31

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [RFC][PATCH] delay rq_lock acquisition in setscheduler

On Wed, Oct 20, 2004 at 10:16:32PM -0700, Chris Wright wrote:
> true. another alternative is to drop rq_lock and do the checks over.
> i didn't convince myself yet that there's no chance for livelock,
> although it seems unlikely.

yep, since the workload isn't deterministic if the race triggers I got
convinced the retry loop wasn't strictly needed. There should be no
livelock, however with the loop just like with the spinlocks there's no
fariness guarantee on the numa (especially old numa). (and fixing the
spinlocks is easier for the architecture by implementing a fair version
transparently). That's probably the only issue with the loops.

2004-10-21 17:33:05

by Chris Wright

[permalink] [raw]
Subject: Re: [RFC][PATCH] delay rq_lock acquisition in setscheduler

* Ingo Molnar ([email protected]) wrote:
> i dont this this patch is correct, because it changes semantics by
> pushing a security-subsystem failure back into userspace. There's
> nothing wrong with two tasks trying to change a third task's policy in
> parallel - our API allows that.

Andrea and John had similar concern.

> I agree that this is a very special case of lock dependency and that the
> security subsystem should not be burdened with double-buffering messages
> just to avoid the runqueue lock on syslogd wakeup. Only this single
> scheduling-related system-call is affected by this problem.
>
> i think the right solution would be to retry the permission check if the
> policy has changed (an unlikely event). It is livelockable the same way
> seqlocks are livelockable so i'd not worry about it too much. It is also
> preemptible with PREEMPT so not a big issue. Also, the check & repeat
> code should possibly be #ifdef CONFIG_SECURITY.

I think ifdef would start to look messy in that function. This one
simply rechecks permissions if the policy has changed. Look OK?

Doing access control checks with rq_lock held can cause deadlock when
audit messages are created (via printk or audit infrastructure) which
trigger a wakeup and deadlock, as noted by both SELinux and SubDomain
folks. This patch will let the security checks happen w/out lock held,
then re-sample the p->policy in case it was raced. Originally from
John Johansen <[email protected]>, bits redone by me.

From: John Johansen <[email protected]>
Signed-off-by: Chris Wright <[email protected]>

===== kernel/sched.c 1.367 vs edited =====
--- 1.367/kernel/sched.c 2004-10-18 22:26:52 -07:00
+++ edited/kernel/sched.c 2004-10-21 09:41:23 -07:00
@@ -3038,7 +3038,7 @@
{
struct sched_param lp;
int retval = -EINVAL;
- int oldprio;
+ int oldprio, oldpolicy = -1;
prio_array_t *array;
unsigned long flags;
runqueue_t *rq;
@@ -3060,23 +3060,17 @@

retval = -ESRCH;
if (!p)
- goto out_unlock_tasklist;
-
- /*
- * To be able to change p->policy safely, the apropriate
- * runqueue lock must be held.
- */
- rq = task_rq_lock(p, &flags);
-
+ goto out_unlock;
+recheck:
+ /* double check policy once rq lock held */
if (policy < 0)
- policy = p->policy;
+ policy = oldpolicy = p->policy;
else {
retval = -EINVAL;
if (policy != SCHED_FIFO && policy != SCHED_RR &&
policy != SCHED_NORMAL)
goto out_unlock;
}
-
/*
* Valid priorities for SCHED_FIFO and SCHED_RR are
* 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL is 0.
@@ -3098,7 +3092,17 @@
retval = security_task_setscheduler(p, policy, &lp);
if (retval)
goto out_unlock;
-
+ /*
+ * To be able to change p->policy safely, the apropriate
+ * runqueue lock must be held.
+ */
+ rq = task_rq_lock(p, &flags);
+ /* recheck policy now with rq lock held */
+ if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
+ policy = oldpolicy = -1;
+ task_rq_unlock(rq, &flags);
+ goto recheck;
+ }
array = p->array;
if (array)
deactivate_task(p, task_rq(p));
@@ -3118,12 +3122,9 @@
} else if (TASK_PREEMPTS_CURR(p, rq))
resched_task(rq->curr);
}
-
-out_unlock:
task_rq_unlock(rq, &flags);
-out_unlock_tasklist:
+out_unlock:
read_unlock_irq(&tasklist_lock);
-
out_nounlock:
return retval;
}