2002-08-08 14:55:04

by Dave McCracken

[permalink] [raw]
Subject: [PATCH 2.5.30+] Second attempt at a shared credentials patch



This patch allows tasks to share credentials via a flag to clone().

This version fixes the problem with exec() that Linus found. Tasks that
call exec() get their own copy of the credentials at that point.

The URL is here because it's too big to include in email:

http://www.ibm.com/linux/ltc/patches/misc/cred-2.5.30-3.diff.gz

The patch is against Linus' BK tree as of this morning.

Dave McCracken

======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
[email protected] T/L 678-3059


2002-08-08 15:28:35

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 2.5.30+] Second attempt at a shared credentials patch

>>>>> " " == Dave McCracken <[email protected]> writes:

> This patch allows tasks to share credentials via a flag to
> clone().

> This version fixes the problem with exec() that Linus found.
> Tasks that call exec() get their own copy of the credentials at
> that point.

> The URL is here because it's too big to include in email:

> http://www.ibm.com/linux/ltc/patches/misc/cred-2.5.30-3.diff.gz

What the hell is that change to fs/nfs/dir.c below all about? Try
mounting an NFSv2 partition with that applied...

Instead of doing this as one big unreadable monolithic patch and
risking getting things wrong like in the above case, it would be nice
if you could go via a set of wrapper functions:

#define get_current_uid() (current->uid)
#define set_current_uid(a) current->uid = a
.
.
.

...

That would allow you to make the changes to the lower level filesystem
code in smaller babysteps, and make the actual move to 'struct cred' a
trivial patch...


As I argued before when Ben first presented this, that will also allow
us the flexibility to change the structure at a later date. Several
filesystems could benefit from a shared *BSD-style 'struct ucred' to
replace the tuple current->{ fsuid, fsgid, groups }.

Cheers,
Trond

diff -Nru a/fs/nfs/dir.c b/fs/nfs/dir.c
--- a/fs/nfs/dir.c Wed Aug 7 09:08:23 2002
+++ b/fs/nfs/dir.c Wed Aug 7 09:08:23 2002
@@ -1237,9 +1237,6 @@

lock_kernel();

- if (!NFS_PROTO(inode)->access)
- goto out_notsup;
-
cred = rpcauth_lookupcred(NFS_CLIENT(inode)->cl_auth, 0);
if (cache->cred == cred
&& time_before(jiffies, cache->jiffies + NFS_ATTRTIMEO(inode))) {

2002-08-08 16:17:34

by Dave McCracken

[permalink] [raw]
Subject: Re: [PATCH 2.5.30+] Second attempt at a shared credentials patch


--On Thursday, August 08, 2002 05:32:05 PM +0200 Trond Myklebust
<[email protected]> wrote:

> What the hell is that change to fs/nfs/dir.c below all about? Try
> mounting an NFSv2 partition with that applied...

Oops. Looks like I managed to make a mistake in merging. Here's a fixed
version:

http://www.ibm.com/linux/ltc/patches/misc/cred-2.5.30-4.diff.gz

> Instead of doing this as one big unreadable monolithic patch and
> risking getting things wrong like in the above case, it would be nice
> if you could go via a set of wrapper functions:
>
># define get_current_uid() (current->uid)
># define set_current_uid(a) current->uid = a

I don't see this as a win. I *could* do a big monolithic patch to change
all references to current->*id to macros, then change the macros in a
separate patch. But then we'd be stuck with macros for all those
references forever, and they're not likely to change again any time soon.
I don't think we'd really want to have macros for all our structure
references on the off chance that someone might change it in the future.

Dave McCracken

======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
[email protected] T/L 678-3059

2002-08-08 16:50:56

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 2.5.30+] Second attempt at a shared credentials patch

>>>>> " " == Dave McCracken <[email protected]> writes:

>> Instead of doing this as one big unreadable monolithic patch
>> and risking getting things wrong like in the above case, it
>> would be nice if you could go via a set of wrapper functions:
>>
>> # define get_current_uid() (current->uid) define
>> # set_current_uid(a) current->uid = a

> I don't see this as a win. I *could* do a big monolithic patch
> to change all references to current->*id to macros, then change
> the macros in a separate patch. But then we'd be stuck with
> macros for all those references forever, and they're not likely
> to change again any time soon. I don't think we'd really want
> to have macros for all our structure references on the off
> chance that someone might change it in the future.

Why? Macros (and inlined functions) have the advantage that they
enforce good policy. Doing 'task->cred->uid = a' on tasks other than
'current' is in general not a very safe thing to do. This sort of
issue w.r.t. safe policies should in particular be worrying you when
you start adding CRED_CLONE...
There are good precedents for this sort of argument: see
'set_current_state()' & friends.

In addition, those macros would allow you to set up compatibility with
2.4.x and simplify patch backports.


As for changing the structure: As I said previously I'd like to unify
all those { fsuid, fsgid, group } things into a proper ucred, so that
we can share these objects around the VFS, and cache them...
Your 'struct cred' as it stands will not suffice to do all that since
it does not provide the necessary Copy On Write protection. (For
instance if some thread temporarily raises my process privileges, I
will *not* want all my already opened 'struct file's to suddenly gain
root access).

Cheers,
Trond

2002-08-08 18:02:41

by Dave McCracken

[permalink] [raw]
Subject: Re: [PATCH 2.5.30+] Second attempt at a shared credentials patch


--On Thursday, August 08, 2002 06:54:30 PM +0200 Trond Myklebust
<[email protected]> wrote:

> Why? Macros (and inlined functions) have the advantage that they
> enforce good policy. Doing 'task->cred->uid = a' on tasks other than
> 'current' is in general not a very safe thing to do. This sort of
> issue w.r.t. safe policies should in particular be worrying you when
> you start adding CRED_CLONE...
> There are good precedents for this sort of argument: see
> 'set_current_state()' & friends.

I don't really see the benefit. The macros you're talking about are only
there to provide different behavior for MP and UP. There aren't macros for
any of the other shareable structures hanging off the task struct.

> In addition, those macros would allow you to set up compatibility with
> 2.4.x and simplify patch backports.

I don't see this one either. A patch to change everything to macros + a
patch for my changes is no smaller than my current patch, and I don't see
this as something that'll need changing again, or at least not any more
likely than any of the other structures. Having macros for elements of a
structure should have more reason than to hide a dereference, in my opinion.

> As for changing the structure: As I said previously I'd like to unify
> all those { fsuid, fsgid, group } things into a proper ucred, so that
> we can share these objects around the VFS, and cache them...
> Your 'struct cred' as it stands will not suffice to do all that since
> it does not provide the necessary Copy On Write protection. (For
> instance if some thread temporarily raises my process privileges, I
> will *not* want all my already opened 'struct file's to suddenly gain
> root access).

I'm not opposed to enhancing the cred structure so it can be used like you
describe. It's not a job I want to tackle, but I'd think my change would
be a step in the right direction.

I'm confused by your example, though. If a thread makes a system call to
change its credentials, all other threads should see it. That's POSIX
behavior, and the whole point of the patch. If you're talking about kernel
code that assumes another identity under the covers, then yes, that's
interesting. And could be achieved by allocating a temporary cred
structure and attaching it to the task for the duration of the operation.

Dave McCracken

======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
[email protected] T/L 678-3059

2002-08-08 19:52:47

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 2.5.30+] Second attempt at a shared credentials patch

>>>>> " " == Dave McCracken <[email protected]> writes:

>> There are good precedents for this sort of argument: see
>> 'set_current_state()' & friends.

> I don't really see the benefit. The macros you're talking
> about are only there to provide different behavior for MP and
> UP.

... which begs the question: are you saying that there are no SMP
issues with CLONE_CRED and setting/reading the 'struct cred' members?

> There aren't macros for any of the other shareable structures
> hanging off the task struct.

Which other shareable structures? Are there other any that can get
changed at random places in the code?
Please read what I said. The macros help to enforce the idea that you
should not change ->state for anything other than the current task.

> I'm not opposed to enhancing the cred structure so it can be
> used like you describe. It's not a job I want to tackle, but
> I'd think my change would be a step in the right direction.

> I'm confused by your example, though. If a thread makes a
> system call to change its credentials, all other threads should
> see it. That's POSIX behavior, and the whole point of the
> patch. If you're talking about kernel code that assumes
> another identity under the covers, then yes, that's
> interesting. And could be achieved by allocating a temporary
> cred structure and attaching it to the task for the duration of
> the operation.

Authentication under UNIX usually requires you to check the process'
uid/gid/groups affiliation. As such, it is useful to be able to pass
that information around the kernel. Most OSes use some variation of
the BSD 'ucred' structure which is reference counted and obeys COW
(copy on write).

struct ucred {
atomic_t count;
uid_t uid; /* == fsuid if you like */
gid_t gid; /* == fsgid " " " */
int ngroups;
gid_t *groups;
};

This means that 'struct file', the underlying filesystems, whoever
else... can hold a reference to the above structure and be assured
that it will never change. Changing the fsuid etc. are extremely rare
operations compared to opening/closing a file, so the whole idea is
precisely to *avoid* having to copy the above information all the time
(which, given all the races that CLONE_CRED introduces, is a good
thing).

As for POSIX behaviour: it is quite compatible with the above. The
only change would be that your shared 'struct cred' would require a
reference to a struct ucred rather than including fsuid, fsgid, groups
as cred structure members.

Note: Given that Linux has adopted the 'capability' model on top of
the standard UNIX authentication model, it might perhaps be necessary
to move the capabilities into the ucred in order to make them COW too?

Cheers,
Trond

2002-08-08 20:07:56

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 2.5.30+] Second attempt at a shared credentials patch


BTW: if you'd like an example the sort of thing I'm talking about,
please take a quick look at

http://www.fys.uio.no/~trondmy/src/bsdcred/linux-2.5.1-pre11_cred.dif

That was a start at doing credentials POSIX threads. It doesn't have
the 'capabilities' stuff merged into the struct task (the patch itself
is incomplete, out of date, and just as monolithic as yours) but it
does illustrate what I mean about the relationship between the struct
cred and the ucred stuff, and also illustrates how one might share the
struct ucred with the struct file.

Cheers,
Trond

2002-08-08 21:46:05

by Dave McCracken

[permalink] [raw]
Subject: Re: [PATCH 2.5.30+] Second attempt at a shared credentials patch


--On Thursday, August 08, 2002 09:56:23 PM +0200 Trond Myklebust
<[email protected]> wrote:

> ... which begs the question: are you saying that there are no SMP
> issues with CLONE_CRED and setting/reading the 'struct cred' members?

Yes, I'm saying there are no SMP issues with the shared cred structure. I
looked for them and failed to find any. Credentials are not set
cross-task, and are always done via atomic ops. I also failed to find any
broader race conditions that would require a lock.

> Which other shareable structures? Are there other any that can get
> changed at random places in the code?
> Please read what I said. The macros help to enforce the idea that you
> should not change ->state for anything other than the current task.

Ahh, hmm. That might possibly be useful, though I'm not convinced it's
necessary. The benefit would have to outweigh the added obscurity of using
a macro, and I don't think it does.

> Authentication under UNIX usually requires you to check the process'
> uid/gid/groups affiliation. As such, it is useful to be able to pass
> that information around the kernel. Most OSes use some variation of
> the BSD 'ucred' structure which is reference counted and obeys COW
> (copy on write).
>
> struct ucred {
> atomic_t count;
> uid_t uid; /* == fsuid if you like */
> gid_t gid; /* == fsgid " " " */
> int ngroups;
> gid_t *groups;
> };
>
> This means that 'struct file', the underlying filesystems, whoever
> else... can hold a reference to the above structure and be assured
> that it will never change. Changing the fsuid etc. are extremely rare
> operations compared to opening/closing a file, so the whole idea is
> precisely to *avoid* having to copy the above information all the time
> (which, given all the races that CLONE_CRED introduces, is a good
> thing).
>
> As for POSIX behaviour: it is quite compatible with the above. The
> only change would be that your shared 'struct cred' would require a
> reference to a struct ucred rather than including fsuid, fsgid, groups
> as cred structure members.
>
> Note: Given that Linux has adopted the 'capability' model on top of
> the standard UNIX authentication model, it might perhaps be necessary
> to move the capabilities into the ucred in order to make them COW too?

Ahh, ok. I see what you're getting at now. It's an interesting idea, and
I think a good one. But it's not really related to the patch I did, and I
don't want to tie one to the other.

Dave

======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
[email protected] T/L 678-3059

2002-08-08 21:51:29

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 2.5.30+] Second attempt at a shared credentials patch

>>>>> " " == Dave McCracken <[email protected]> writes:

> --On Thursday, August 08, 2002 09:56:23 PM +0200 Trond
> Myklebust <[email protected]> wrote:

>> ... which begs the question: are you saying that there are no
>> SMP issues with CLONE_CRED and setting/reading the 'struct
>> cred' members?

> Yes, I'm saying there are no SMP issues with the shared cred
> structure. I looked for them and failed to find any.
> Credentials are not set cross-task, and are always done via
> atomic ops. I also failed to find any broader race conditions
> that would require a lock.

What if one thread is doing an RPC call while the other is changing
the 'groups' entry?

Given that the first thread wants both to copy and/or compare the
groups entry what prevents scenarios of the form?

Thread 1 Thread 2

change cred->ngroups
copy cred->ngroups
copy cred->groups
change cred->groups


Cheers,
Trond

2002-08-09 19:21:14

by Dave McCracken

[permalink] [raw]
Subject: [PATCH 2.5.30+] Fourth attempt at a shared credentials patch


--On Thursday, August 08, 2002 11:55:05 PM +0200 Trond Myklebust
<[email protected]> wrote:

> What if one thread is doing an RPC call while the other is changing
> the 'groups' entry?

Gah. Good point. Ok, I've added locking to the cred structure to handle
this. Here's my new patch with those changes made:

http://www.ibm.com/linux/ltc/patches/misc/cred-2.5.30-5.diff.gz

I've gone through all the code again, and don't see any other places where
locking is really necessary. Feel free to point them out to me if you see
any.

Dave McCracken

======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
[email protected] T/L 678-3059

2002-08-09 19:48:18

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 2.5.30+] Fourth attempt at a shared credentials patch

>>>>> " " == Dave McCracken <[email protected]> writes:

> --On Thursday, August 08, 2002 11:55:05 PM +0200 Trond
> Myklebust <[email protected]> wrote:

>> What if one thread is doing an RPC call while the other is
>> changing the 'groups' entry?

> Gah. Good point. Ok, I've added locking to the cred structure
> to handle this. Here's my new patch with those changes made:

> http://www.ibm.com/linux/ltc/patches/misc/cred-2.5.30-5.diff.gz

> I've gone through all the code again, and don't see any other
> places where locking is really necessary. Feel free to point
> them out to me if you see any.

Err... Well my original point about your changes to the sunrpc code
still stand: no spinlocking there AFAICS. In addition, you'll want to
talk to the Intermezzo people: they do allocation of buffers based on
the (volatile) value of cred->ngroups.

Finally, you also want all those reads and changes to more than one
value in the credential such as the stuff in security/capability.c, or
net/socket.c,... to be atomic. (Note: This is where 'struct ucred'
with COW gives you an efficiency gain).

Please also note that you only need spinlocking for the particular
case of tasks that have set CLONE_CRED. In all other cases, it adds a
rather nasty overhead...

Cheers,
Trond

2002-08-09 20:47:33

by Dave McCracken

[permalink] [raw]
Subject: Re: [PATCH 2.5.30+] Fourth attempt at a shared credentials patch


--On Friday, August 09, 2002 09:51:56 PM +0200 Trond Myklebust
<[email protected]> wrote:

> Err... Well my original point about your changes to the sunrpc code
> still stand: no spinlocking there AFAICS. In addition, you'll want to
> talk to the Intermezzo people: they do allocation of buffers based on
> the (volatile) value of cred->ngroups.

Oops. I missed the sunrpc case. This patch fixes it:

http://www.ibm.com/linux/ltc/patches/misc/cred-2.5.30-6.diff.gz

I think I mostly nailed the intermezzo case. I did go through it.

> Finally, you also want all those reads and changes to more than one
> value in the credential such as the stuff in security/capability.c, or
> net/socket.c,... to be atomic. (Note: This is where 'struct ucred'
> with COW gives you an efficiency gain).

I disagree. It won't generate bogus values of any of these fields. There
may be some cases where it'll pick up a combination of before and after
values, but I don't see where any of that is fatal.

> Please also note that you only need spinlocking for the particular
> case of tasks that have set CLONE_CRED. In all other cases, it adds a
> rather nasty overhead...

Spinlock isn't nasty overhead, if it's not contested. It seems to me that
checking whether it's shared is as much overhead as just taking the lock.

Dave McCracken

======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
[email protected] T/L 678-3059

2002-08-09 21:10:53

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2.5.30+] Fourth attempt at a shared credentials patch

In article <[email protected]>,
Dave McCracken <[email protected]> wrote:
>
>--On Thursday, August 08, 2002 11:55:05 PM +0200 Trond Myklebust
><[email protected]> wrote:
>
>> What if one thread is doing an RPC call while the other is changing
>> the 'groups' entry?
>
>Gah. Good point. Ok, I've added locking to the cred structure to handle
>this.

Please don't do this with locking, I really think the right thing to do
is to have a "duplicate()" function, and when you pass credentials off
to something, you dup them at that point.

If you start off as non-root, and then execve suid into root, a pending
NFS request should _not_ suddently have the credentials changed under
it. Yet clearly that kind of thing can't just be locked either.

Along with copy-on-write semantics, this should perform perfectly well
(ie "duplicate()" would only increment a count, and then setuid() would
have to have code soemthing like

if (cred->count > 1) {
newcred = alloc_cred();
copy_cred(newcred, cred);
for_each_cred_group(p) {
p->cred = newcred;
atomic_inc(&newcred->count);
putcred(cred);
}
}

instead.

Linus

2002-08-12 20:05:03

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 2.5.30+] Fourth attempt at a shared credentials patch

>>>>> " " == Dave McCracken <[email protected]> writes:

> I think I mostly nailed the intermezzo case. I did go through
> it.

Are you sure? AFAICS, the intermezzo code assumes that ngroups/groups
won't change while you inside the intermezzo layer itself. Look at the
way they stuff current->ngroups into the record 'size', then do the
actual copy in journal_log_prefix()...

>> Finally, you also want all those reads and changes to more than
>> one value in the credential such as the stuff in
>> security/capability.c, or net/socket.c,... to be atomic. (Note:
>> This is where 'struct ucred' with COW gives you an efficiency
>> gain).

> I disagree. It won't generate bogus values of any of these
> fields. There may be some cases where it'll pick up a
> combination of before and after values, but I don't see where
> any of that is fatal.

It doesn't have to be fatal in order to be a security risk. The kernel
simply does not know whether or not formula A (random combination of
uid/gid/groups) will be secure as opposed to the before/after states.

>> Please also note that you only need spinlocking for the
>> particular case of tasks that have set CLONE_CRED. In all other
>> cases, it adds a rather nasty overhead...

> Spinlock isn't nasty overhead, if it's not contested. It seems
> to me that checking whether it's shared is as much overhead as
> just taking the lock.

spinlocking is designed so as to invalidate the processor caches.

Cheers,
Trond