Andrew Morton wrote:
> On Mon, 10 May 2010 20:00:46 +0200
> Jiri Slaby <[email protected]> wrote:
> > @@ -1309,11 +1305,13 @@ int do_setrlimit(struct task_struct *tsk, unsigned int resource,
> >
> > old_rlim = tsk->signal->rlim + resource;
> > task_lock(tsk->group_leader);
> > - if ((new_rlim->rlim_max <= old_rlim->rlim_max) ||
> > - capable(CAP_SYS_RESOURCE))
> > - *old_rlim = *new_rlim;
> > - else
> > + if ((new_rlim->rlim_max > old_rlim->rlim_max) &&
> > + !capable(CAP_SYS_RESOURCE))
> > retval = -EPERM;
> > + if (!retval)
> > + retval = security_task_setrlimit(tsk, resource, new_rlim);
>
> I looked at the amount of code which exists under
> security_task_setrlimit() and nearly died. Please convince me that it
> is correct to do all that work under tasklist_lock, and that it is also
> maintainable.
When I wrote this code, I looked how it is done in the current code
and redid what others, including getioprio, setnice and task_kill
(this one even with task_lock write-locked) do. I need the lock
to protect task->signal (which is checked inside the security
function) from disappearing.
> > + if (!retval)
> > + *old_rlim = *new_rlim;
> > task_unlock(tsk->group_leader);
> >
> > if (retval || resource != RLIMIT_CPU)
>
> Yikes, so the locking around all that selinux code becomes even more
> brutal. How much rope are you tying around the selinux developers'
> hands here?
I agree with that and didn't find a better solution than is attached
here.
--
The code performed in security_task_setrlimit is very long when
CONFIG_SECURITY is enabled. Don't execute the function under
alloc_lock, unlock it temporarily and check whether the limits changed
in the meantime while the lock was not held. If yes, redo do check.
Signed-off-by: Jiri Slaby <[email protected]>
---
kernel/sys.c | 39 +++++++++++++++++++++++++++++++++++----
1 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/kernel/sys.c b/kernel/sys.c
index d7fcd4a..48cb21d 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1304,12 +1304,37 @@ static void rlim64_to_rlim(const struct rlimit64 *rlim64, struct rlimit *rlim)
rlim->rlim_max = (unsigned long)rlim64->rlim_max;
}
+/**
+ * check_security_task_setrlimit_unlocked - calls security_task_setrlimit
+ *
+ * But security_task_setrlimit is complex with much of code, so that we don't
+ * want to call it while holding the task_lock. We temporarily unlock the lock
+ * and after the check we test whether the limits changed in the meantime.
+ */
+static int check_security_task_setrlimit_unlocked(struct task_struct *tsk,
+ unsigned int resource, struct rlimit *new_rlim,
+ struct rlimit *old_rlim)
+{
+ struct rlimit rlim;
+ int ret;
+
+ memcpy(&rlim, old_rlim, sizeof(rlim));
+
+ task_unlock(tsk->group_leader);
+ ret = security_task_setrlimit(tsk, resource, new_rlim);
+ task_lock(tsk->group_leader);
+
+ if (!ret && memcmp(&rlim, old_rlim, sizeof(rlim)))
+ return -EAGAIN;
+ return ret;
+}
+
/* make sure you are allowed to change @tsk limits before calling this */
int do_prlimit(struct task_struct *tsk, unsigned int resource,
struct rlimit *new_rlim, struct rlimit *old_rlim)
{
struct rlimit *rlim;
- int retval = 0;
+ int retval;
if (resource >= RLIM_NLIMITS)
return -EINVAL;
@@ -1339,13 +1364,19 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
rlim = tsk->signal->rlim + resource;
task_lock(tsk->group_leader);
+again:
+ retval = 0;
if (new_rlim) {
if ((new_rlim->rlim_max > rlim->rlim_max) &&
!capable(CAP_SYS_RESOURCE))
retval = -EPERM;
- if (!retval)
- retval = security_task_setrlimit(tsk, resource,
- new_rlim);
+ if (!retval) {
+ retval = check_security_task_setrlimit_unlocked(tsk,
+ resource, new_rlim, rlim);
+ if (retval == -EAGAIN) {
+ goto again;
+ }
+ }
}
if (!retval) {
if (old_rlim)
--
1.7.1
(add selinux maintainers)
First of all, my apologies for the huge delay. And I still didn't
read the whole series, sorry.
On 06/06, Jiri Slaby wrote:
>
> +static int check_security_task_setrlimit_unlocked(struct task_struct *tsk,
> + unsigned int resource, struct rlimit *new_rlim,
> + struct rlimit *old_rlim)
> +{
> + struct rlimit rlim;
> + int ret;
> +
> + memcpy(&rlim, old_rlim, sizeof(rlim));
> +
> + task_unlock(tsk->group_leader);
> + ret = security_task_setrlimit(tsk, resource, new_rlim);
> + task_lock(tsk->group_leader);
> +
> + if (!ret && memcmp(&rlim, old_rlim, sizeof(rlim)))
> + return -EAGAIN;
> + return ret;
> +}
> +
> /* make sure you are allowed to change @tsk limits before calling this */
> int do_prlimit(struct task_struct *tsk, unsigned int resource,
> struct rlimit *new_rlim, struct rlimit *old_rlim)
> {
> struct rlimit *rlim;
> - int retval = 0;
> + int retval;
>
> if (resource >= RLIM_NLIMITS)
> return -EINVAL;
> @@ -1339,13 +1364,19 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
>
> rlim = tsk->signal->rlim + resource;
> task_lock(tsk->group_leader);
> +again:
> + retval = 0;
> if (new_rlim) {
> if ((new_rlim->rlim_max > rlim->rlim_max) &&
> !capable(CAP_SYS_RESOURCE))
> retval = -EPERM;
> - if (!retval)
> - retval = security_task_setrlimit(tsk, resource,
> - new_rlim);
> + if (!retval) {
> + retval = check_security_task_setrlimit_unlocked(tsk,
> + resource, new_rlim, rlim);
> + if (retval == -EAGAIN) {
> + goto again;
> + }
> + }
Oh. Can't we just ignore this (imho minor) race ? Or just verify/document that
current_has_perm() can be called under task_lock. Actually, I do not think
we have a race, selinux_task_setrlimit() only checks that the caller has
rights to change the rlimits.
And. Given that avc_has_perm() can be called from irq context (say,
selinux_file_send_sigiotask or selinux_task_kill), we can assume it is safe
to call it under task_lock() which is not irq-safe.
But. OTOH, if we are really worried about security_ ops, then we have another
reason to call this hook under task_lock(), and we probably want to modify
selinux_bprm_committing_creds() to take this lock too:
--- security/selinux/hooks.c
+++ security/selinux/hooks.c
@@ -2333,11 +2333,14 @@ static void selinux_bprm_committing_cred
rc = avc_has_perm(new_tsec->osid, new_tsec->sid, SECCLASS_PROCESS,
PROCESS__RLIMITINH, NULL);
if (rc) {
+ /* protects against do_prlimit() */
+ task_lock(current);
for (i = 0; i < RLIM_NLIMITS; i++) {
rlim = current->signal->rlim + i;
initrlim = init_task.signal->rlim + i;
rlim->rlim_cur = min(rlim->rlim_max, initrlim->rlim_cur);
}
+ task_unlock(current);
update_rlimit_cpu(current->signal->rlim[RLIMIT_CPU].rlim_cur);
}
}
Finally. selinux_task_setrlimit(p) uses __task_cred(p) for the check.
This looks a bit strange, different threads can have different creds
but obviously rlimits are per-process.
Perhaps it makes sense to do selinux_task_setrlimit(p->group_leader)?
At least in this case the result should be "consistent".
Oleg.
On 06/07/2010 08:08 PM, Oleg Nesterov wrote:
> First of all, my apologies for the huge delay. And I still didn't
> read the whole series, sorry.
Hi, never mind, my RTT of 2 weeks doesn't look like very short too :).
> On 06/06, Jiri Slaby wrote:
>> @@ -1339,13 +1364,19 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
>>
>> rlim = tsk->signal->rlim + resource;
>> task_lock(tsk->group_leader);
>> +again:
>> + retval = 0;
>> if (new_rlim) {
>> if ((new_rlim->rlim_max > rlim->rlim_max) &&
>> !capable(CAP_SYS_RESOURCE))
BTW this capable() has the exactly same problem with being called with
task_lock held. Is it OK to move it completely out of critical section?
I'm asking because it sets a current->flags SU bit used for accounting.
If I move it out of the section, it will set the bit always.
>> retval = -EPERM;
>> - if (!retval)
>> - retval = security_task_setrlimit(tsk, resource,
>> - new_rlim);
>> + if (!retval) {
>> + retval = check_security_task_setrlimit_unlocked(tsk,
>> + resource, new_rlim, rlim);
>> + if (retval == -EAGAIN) {
>> + goto again;
>> + }
>> + }
>
> Oh. Can't we just ignore this (imho minor) race ? Or just verify/document that
> current_has_perm() can be called under task_lock. Actually, I do not think
> we have a race, selinux_task_setrlimit() only checks that the caller has
> rights to change the rlimits.
But does so only if current limits are different to the new ones. My
opinion is that we can ignore it anyway.
> And. Given that avc_has_perm() can be called from irq context (say,
> selinux_file_send_sigiotask or selinux_task_kill), we can assume it is safe
> to call it under task_lock() which is not irq-safe.
>
> But. OTOH, if we are really worried about security_ ops, then we have another
> reason to call this hook under task_lock(), and we probably want to modify
> selinux_bprm_committing_creds() to take this lock too:
>
> --- security/selinux/hooks.c
> +++ security/selinux/hooks.c
> @@ -2333,11 +2333,14 @@ static void selinux_bprm_committing_cred
> rc = avc_has_perm(new_tsec->osid, new_tsec->sid, SECCLASS_PROCESS,
> PROCESS__RLIMITINH, NULL);
> if (rc) {
> + /* protects against do_prlimit() */
> + task_lock(current);
> for (i = 0; i < RLIM_NLIMITS; i++) {
> rlim = current->signal->rlim + i;
> initrlim = init_task.signal->rlim + i;
> rlim->rlim_cur = min(rlim->rlim_max, initrlim->rlim_cur);
> }
> + task_unlock(current);
> update_rlimit_cpu(current->signal->rlim[RLIMIT_CPU].rlim_cur);
> }
> }
Makes sense to me.
> Finally. selinux_task_setrlimit(p) uses __task_cred(p) for the check.
> This looks a bit strange, different threads can have different creds
> but obviously rlimits are per-process.
Sorry I can't see it. Could you point out in which function this is done?
thanks,
--
js
On 06/23, Jiri Slaby wrote:
>
> On 06/07/2010 08:08 PM, Oleg Nesterov wrote:
> > First of all, my apologies for the huge delay. And I still didn't
> > read the whole series, sorry.
>
> Hi, never mind, my RTT of 2 weeks doesn't look like very short too :).
>
> > On 06/06, Jiri Slaby wrote:
> >> @@ -1339,13 +1364,19 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
> >>
> >> rlim = tsk->signal->rlim + resource;
> >> task_lock(tsk->group_leader);
> >> +again:
> >> + retval = 0;
> >> if (new_rlim) {
> >> if ((new_rlim->rlim_max > rlim->rlim_max) &&
> >> !capable(CAP_SYS_RESOURCE))
>
> BTW this capable() has the exactly same problem with being called with
> task_lock held. Is it OK to move it completely out of critical section?
> I'm asking because it sets a current->flags SU bit used for accounting.
> If I move it out of the section, it will set the bit always.
Well, with all these delays I do not know what "exactly same problem"
means ;) Please explain?
> >> retval = -EPERM;
> >> - if (!retval)
> >> - retval = security_task_setrlimit(tsk, resource,
> >> - new_rlim);
> >> + if (!retval) {
> >> + retval = check_security_task_setrlimit_unlocked(tsk,
> >> + resource, new_rlim, rlim);
> >> + if (retval == -EAGAIN) {
> >> + goto again;
> >> + }
> >> + }
> >
> > Oh. Can't we just ignore this (imho minor) race ? Or just verify/document that
> > current_has_perm() can be called under task_lock. Actually, I do not think
> > we have a race, selinux_task_setrlimit() only checks that the caller has
> > rights to change the rlimits.
>
> But does so only if current limits are different to the new ones. My
> opinion is that we can ignore it anyway.
Or call it under task_lock(), see below
> > And. Given that avc_has_perm() can be called from irq context (say,
> > selinux_file_send_sigiotask or selinux_task_kill), we can assume it is safe
> > to call it under task_lock() which is not irq-safe.
> >
> > But. OTOH, if we are really worried about security_ ops, then we have another
> > reason to call this hook under task_lock(), and we probably want to modify
> > selinux_bprm_committing_creds() to take this lock too:
> >
> > --- security/selinux/hooks.c
> > +++ security/selinux/hooks.c
> > @@ -2333,11 +2333,14 @@ static void selinux_bprm_committing_cred
> > rc = avc_has_perm(new_tsec->osid, new_tsec->sid, SECCLASS_PROCESS,
> > PROCESS__RLIMITINH, NULL);
> > if (rc) {
> > + /* protects against do_prlimit() */
> > + task_lock(current);
> > for (i = 0; i < RLIM_NLIMITS; i++) {
> > rlim = current->signal->rlim + i;
> > initrlim = init_task.signal->rlim + i;
> > rlim->rlim_cur = min(rlim->rlim_max, initrlim->rlim_cur);
> > }
> > + task_unlock(current);
> > update_rlimit_cpu(current->signal->rlim[RLIMIT_CPU].rlim_cur);
> > }
> > }
>
> Makes sense to me.
see above ;)
> > Finally. selinux_task_setrlimit(p) uses __task_cred(p) for the check.
> > This looks a bit strange, different threads can have different creds
> > but obviously rlimits are per-process.
>
> Sorry I can't see it. Could you point out in which function this is done?
selinux_task_setrlimit()->current_has_perm()->current_sid()->current_cred()
Oleg.
On 06/23/2010 06:12 PM, Oleg Nesterov wrote:
> On 06/23, Jiri Slaby wrote:
>>
>> On 06/07/2010 08:08 PM, Oleg Nesterov wrote:
>>> On 06/06, Jiri Slaby wrote:
>>>> @@ -1339,13 +1364,19 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
>>>>
>>>> rlim = tsk->signal->rlim + resource;
>>>> task_lock(tsk->group_leader);
>>>> +again:
>>>> + retval = 0;
>>>> if (new_rlim) {
>>>> if ((new_rlim->rlim_max > rlim->rlim_max) &&
>>>> !capable(CAP_SYS_RESOURCE))
>>
>> BTW this capable() has the exactly same problem with being called with
>> task_lock held. Is it OK to move it completely out of critical section?
>> I'm asking because it sets a current->flags SU bit used for accounting.
>> If I move it out of the section, it will set the bit always.
>
> Well, with all these delays I do not know what "exactly same problem"
> means ;) Please explain?
As I wrote: that the capable() is called with task_lock held. With
security enabled, capable() goes through all the avc_has_perm_noaudit,
avc_audit and similar (in selinux), the same as security_task_setrlimit
which we were writing about -- Andrew complaining about doing very long
security checks while holding spinlocks.
I mean we should do either none of capable and selinux_task_setrlimit
under task_lock or both :).
>>> Finally. selinux_task_setrlimit(p) uses __task_cred(p) for the check.
>>> This looks a bit strange, different threads can have different creds
>>> but obviously rlimits are per-process.
>>
>> Sorry I can't see it. Could you point out in which function this is done?
>
> selinux_task_setrlimit()->current_has_perm()->current_sid()->current_cred()
I still see no way how this is wrong. We want to check whether current
thread has capabilities to change (someone else's) rlimits. Maybe I'm
missing something?
thanks,
--
js
On 06/23, Jiri Slaby wrote:
>
> On 06/23/2010 06:12 PM, Oleg Nesterov wrote:
> > On 06/23, Jiri Slaby wrote:
> >>
> >> BTW this capable() has the exactly same problem with being called with
> >> task_lock held. Is it OK to move it completely out of critical section?
> >> I'm asking because it sets a current->flags SU bit used for accounting.
> >> If I move it out of the section, it will set the bit always.
> >
> > Well, with all these delays I do not know what "exactly same problem"
> > means ;) Please explain?
>
> As I wrote: that the capable() is called with task_lock held.
Ah, got it, yes.
> > selinux_task_setrlimit()->current_has_perm()->current_sid()->current_cred()
I meant
selinux_task_setrlimit(p)->current_has_perm(p)->task_sid(p)->__task_cred(p)
> I still see no way how this is wrong. We want to check whether current
> thread has capabilities to change (someone else's) rlimits.
Yes. but what is "someone else" ?
IIRC, one of your patches (correctly) changes security_task_setrlimit()
to have the new argument, p == "someone else", correct?
Now, the result of security check depends on __task_cred(p) above, and
thus depends on which thread we choose to change rlimits.
I think it makes more sense to always pass ->group_leader as an argument
to security_task_setrlimit(p). But probably I missed something, I do not
remember what exactly other patches do.
Oleg.
On Wed, 2010-06-23 at 19:44 +0200, Jiri Slaby wrote:
> On 06/23/2010 06:12 PM, Oleg Nesterov wrote:
> > On 06/23, Jiri Slaby wrote:
> >>
> >> On 06/07/2010 08:08 PM, Oleg Nesterov wrote:
> >>> On 06/06, Jiri Slaby wrote:
> >>>> @@ -1339,13 +1364,19 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
> >>>>
> >>>> rlim = tsk->signal->rlim + resource;
> >>>> task_lock(tsk->group_leader);
> >>>> +again:
> >>>> + retval = 0;
> >>>> if (new_rlim) {
> >>>> if ((new_rlim->rlim_max > rlim->rlim_max) &&
> >>>> !capable(CAP_SYS_RESOURCE))
> >>
> >> BTW this capable() has the exactly same problem with being called with
> >> task_lock held. Is it OK to move it completely out of critical section?
> >> I'm asking because it sets a current->flags SU bit used for accounting.
> >> If I move it out of the section, it will set the bit always.
> >
> > Well, with all these delays I do not know what "exactly same problem"
> > means ;) Please explain?
>
> As I wrote: that the capable() is called with task_lock held. With
> security enabled, capable() goes through all the avc_has_perm_noaudit,
> avc_audit and similar (in selinux), the same as security_task_setrlimit
> which we were writing about -- Andrew complaining about doing very long
> security checks while holding spinlocks.
>
> I mean we should do either none of capable and selinux_task_setrlimit
> under task_lock or both :).
IMO, just do them both under task_lock. All of the core SELinux
permission checking code should be safe under task lock and the audit
path is the exceptional case (only upon denial).
> >>> Finally. selinux_task_setrlimit(p) uses __task_cred(p) for the check.
> >>> This looks a bit strange, different threads can have different creds
> >>> but obviously rlimits are per-process.
> >>
> >> Sorry I can't see it. Could you point out in which function this is done?
> >
> > selinux_task_setrlimit()->current_has_perm()->current_sid()->current_cred()
>
> I still see no way how this is wrong. We want to check whether current
> thread has capabilities to change (someone else's) rlimits. Maybe I'm
> missing something?
>
> thanks,
--
Stephen Smalley
National Security Agency
On 06/23/2010 07:56 PM, Oleg Nesterov wrote:
> On 06/23, Jiri Slaby wrote:
>> I still see no way how this is wrong. We want to check whether current
>> thread has capabilities to change (someone else's) rlimits.
You know, this is one of those sentences where you wonder what kind of
idiot wrote that. And then you find out it was you being totally off base.
> Yes. but what is "someone else" ?
I don't know what was I thinking of. Indeed you are correct. When I was
writing that I was somehow under the impression (dunno why) that 'p' is
current process and the code checks whether 'p' can do the change. But
we are indeed checking whether 'current' may change 'p's limits -- or
better, as you wrote, not of the thread 'p' itself, but of its whole
process.
I'll resend a new version with you all in CCs to see what we can do with
the patches, if anything.
thanks,
--
js