This message didn't seem to get through the first time, so here it is
again. The patch is available for review at:
http://www.kvack.org/~blah/v2.5.23-cred.diff
Several parts of the kernel need to obtain a reference to the
credentials of a process: aio, nfs, dirty mmap writebacks to
name a few. Additionally, POSIX threads need to be able to share
credentials between clones. The patch below moves the credentials
out of task_struct and into struct cred, which is reference
counted. It also adds support for CLONE_CRED which shares the
credentials between threads. Comments?
-ben
--
"You will be reincarnated as a toad; and you will be much happier."
On Wed, 2002-06-19 at 18:29, Benjamin LaHaise wrote:
> This message didn't seem to get through the first time, so here it is
> again. The patch is available for review at:
>
> http://www.kvack.org/~blah/v2.5.23-cred.diff
Yow, big patch... that is why it did not make it through. Vger
(silently) rejects email over ~20KB.
> Several parts of the kernel need to obtain a reference to the
> credentials of a process: aio, nfs, dirty mmap writebacks to
> name a few. Additionally, POSIX threads need to be able to share
> credentials between clones. The patch below moves the credentials
> out of task_struct and into struct cred, which is reference
> counted. It also adds support for CLONE_CRED which shares the
> credentials between threads. Comments?
Looks good. I am running it now in my 2.5 tree with no apparent issues
and I looked over the patch and it looked snazzy.
Question: what semantics would be broken if CLONE_CRED was implied by
CLONE_THREAD? Regardless of code that needs this, it would be nice to
just save the memory when using threads. Hell, as far as I am
concerned as long as the tasks still share a VM they could share
credentials - but I am sure that breaks something.
Next, now that all this data no longer belongs solely to current... you
need to be explicit about locking rules. There is a capability_lock
spinlock but the long semantics are not 100% respected. I tried to firm
them up in my capability.c cleanup one or two kernels ago... it should
be good enough and not be highly contended.
I guess the only downside would be the extra overhead in our
preciously-fast do_fork... another slab allocation etc. but that is
countered if enough stuff starts using CLONE_CRED.
Oh, and what is with the #if 0 over the set and getaffinity syscalls???
That needs to go!
Robert Love
On Wed, Jun 19, 2002 at 07:30:35PM -0700, Robert Love wrote:
> Yow, big patch... that is why it did not make it through. Vger
> (silently) rejects email over ~20KB.
Silent is definately not good. Oh well.
> Question: what semantics would be broken if CLONE_CRED was implied by
> CLONE_THREAD? Regardless of code that needs this, it would be nice to
> just save the memory when using threads. Hell, as far as I am
> concerned as long as the tasks still share a VM they could share
> credentials - but I am sure that breaks something.
Changing semantics like that is dangerous. We have no way of knowing if
any applications rely on the existing behaviour, and furthermore if
changing it will silently introduce security holes.
> Next, now that all this data no longer belongs solely to current... you
> need to be explicit about locking rules. There is a capability_lock
> spinlock but the long semantics are not 100% respected. I tried to firm
> them up in my capability.c cleanup one or two kernels ago... it should
> be good enough and not be highly contended.
Noted. Perhaps the current usage counts of the various limits should
be atomic types, or maybe the spinlock is enough. If Linus is actually
interested in the patch, this could be easily fixed up. Also, a few
parts of the kernel were suspiciously different in their use of euid/suid
instead of the plain old uid. It would be nice if someone could double
check that these places are correct.
> I guess the only downside would be the extra overhead in our
> preciously-fast do_fork... another slab allocation etc. but that is
> countered if enough stuff starts using CLONE_CRED.
I would hope that the per-cpu slab caches are fast enough.
> Oh, and what is with the #if 0 over the set and getaffinity syscalls???
> That needs to go!
Whoops, that was a dirty workaround for the breakage of 2.5.23.
-ben
--
"You will be reincarnated as a toad; and you will be much happier."
On Thu, 2002-06-20 at 09:28, Benjamin LaHaise wrote:
> On Wed, Jun 19, 2002 at 07:30:35PM -0700, Robert Love wrote:
>
> > Question: what semantics would be broken if CLONE_CRED was implied by
> > CLONE_THREAD? Regardless of code that needs this, it would be nice to
> > just save the memory when using threads. Hell, as far as I am
> > concerned as long as the tasks still share a VM they could share
> > credentials - but I am sure that breaks something.
>
> Changing semantics like that is dangerous. We have no way of knowing if
> any applications rely on the existing behaviour, and furthermore if
> changing it will silently introduce security holes.
OK, understood. I just see an opportunity to save memory and would love
to take advantage of it. We should definitely look into giving pthreads
(maybe via NGPT) the ability to specify CLONE_CRED. I suspect 90% of
the cases retain the same credentials anyhow. Copy-on-write? :)
> > Next, now that all this data no longer belongs solely to current... you
> > need to be explicit about locking rules. There is a capability_lock
> > spinlock but the long semantics are not 100% respected. I tried to firm
> > them up in my capability.c cleanup one or two kernels ago... it should
> > be good enough and not be highly contended.
>
> Noted. Perhaps the current usage counts of the various limits should
> be atomic types, or maybe the spinlock is enough. If Linus is actually
> interested in the patch, this could be easily fixed up. Also, a few
> parts of the kernel were suspiciously different in their use of euid/suid
> instead of the plain old uid. It would be nice if someone could double
> check that these places are correct.
I am not so much worried about the ref counting (I think it looks right)
but just the general behavior. I.e., you do not need to code anything
but you need to lay out right now what the locking rules need to be or
it will be a mess.
There is going to be code reading and writing p->cred data now
simultaneously and that should probably always be done under lock.
Another example: see fs/open.c :: sys_access(). I added that FIXME
there a couple kernel revisions ago...
> > Oh, and what is with the #if 0 over the set and getaffinity syscalls???
> > That needs to go!
>
> Whoops, that was a dirty workaround for the breakage of 2.5.23.
Oh, OK - I thought you hated my syscalls :)
There is a patch in Linus's BK for it, or you can define cpu_online_map
to 1 on UP.
Robert Love
>>>>> " " == Robert Love <[email protected]> writes:
> CLONE_CRED. I suspect 90% of the cases retain the same
> credentials anyhow. Copy-on-write? :)
Ben,
Making the credentials a monolithic block like you appear to be
doing just doesn't make sense. If you look at the way things like
fsuid/fsgid/groups[] are used, you will see that almost all those that
filesystems that care are making their own private copies.
It would be a lot more useful to split out fsuid/fsgid/groups[] as
per the *BSD ucred, and then allow filesystems to reference the
resulting struct instead (using COW semantics).
That way too, 'struct file' could finally contain a reference to a
full copy of the filesystem credentials, and we could get rid of
the 'struct file' crud in address_space_operations like readpage().
See
http://www.fys.uio.no/~trondmy/src/bsdcred/linux-2.5.1-pre11_cred.dif
for my earlier attempt at doing this sort of thing...
Cheers,
Trond
On Fri, Jun 21, 2002 at 01:12:59PM +0200, Trond Myklebust wrote:
> Making the credentials a monolithic block like you appear to be
> doing just doesn't make sense. If you look at the way things like
> fsuid/fsgid/groups[] are used, you will see that almost all those that
> filesystems that care are making their own private copies.
I'm not looking at things from the filesystem's point of view, so
much as for threads and aio, where rlimits and identificantion needs
to be shared between contexts.
-ben
--
"You will be reincarnated as a toad; and you will be much happier."
On Friday 21 June 2002 20:52, Benjamin LaHaise wrote:
> On Fri, Jun 21, 2002 at 01:12:59PM +0200, Trond Myklebust wrote:
> > Making the credentials a monolithic block like you appear to be
> > doing just doesn't make sense. If you look at the way things like
> > fsuid/fsgid/groups[] are used, you will see that almost all those that
> > filesystems that care are making their own private copies.
>
> I'm not looking at things from the filesystem's point of view, so
> much as for threads and aio, where rlimits and identificantion needs
> to be shared between contexts.
Fair enough, however by hard coding a lot of new pointers and links everywhere
you are making the task unnecessarily more difficult for the person who
*does* want to look at the filesystem point of view. Please could you at
least hide the details of the location of all these elements in
macros/inlined functions.
IOW: if you could write something like
x = current_fsuid();
set_current_euid(y);
instead of
x = current->cred->fsuid;
current->cred->euid = y;
then this would make a later transition to current->cred->ucred a lot easier.
In addition, it might also make it possible to share future code with 2.4.x
via a set of compatibility routines.
Cheers,
Trond