LinuxLists.cc - Holding ref to /proc/<pid> dentry prevents task being freed

2005-05-05 09:11:18

Subject: Holding ref to /proc/<pid> dentry prevents task being freed

[Please CC me on replies. I'm not subscribed to LKML]

Hi

I'm busy writing a security module that does some very basic ACL stuff
on a per-task basis. If my module obtains and holds a dentry for
/proc/<pid> (via path_lookup), then the task_free_security hook is
never called for that process. Since the module releases the dentry in
task_free_security, this creates a chicken-and-egg problem and neither
the task nor the dentry is ever released. A side-effect is that the
module refcount never drops to 0.

Perhaps the LSM framework needs a hook for a task exiting (transition
to zombie state), in addition to the task_free_security hook? That
would allow resources to be freed from zombies, including these types
of circular references.

Thanks
Bruce

[Please CC me on replies. I'm not subscribed to LKML]
--
/--------------------------------------------------------------------\
| Bruce Merry | bmerry at cs . uct . ac . za |
| Proud user of Gentoo GNU/Linux | http://www.cs.uct.ac.za/~bmerry |
| Despite the high cost of living, it remains popular. |
\--------------------------------------------------------------------/

2005-05-05 16:34:52

by Chris Wright

[permalink] [raw]

Subject: Re: Holding ref to /proc/<pid> dentry prevents task being freed

* [email protected] ([email protected]) wrote:
> I'm busy writing a security module that does some very basic ACL stuff
> on a per-task basis. If my module obtains and holds a dentry for
> /proc/<pid> (via path_lookup), then the task_free_security hook is
> never called for that process. Since the module releases the dentry in
> task_free_security, this creates a chicken-and-egg problem and neither
> the task nor the dentry is ever released. A side-effect is that the
> module refcount never drops to 0.

Why are you holding that dentry? Some more background please.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-05-06 08:17:31

by bmerry

[permalink] [raw]

Subject: Re: Holding ref to /proc/<pid> dentry prevents task being freed

On Thu, May 05, 2005 at 09:34:13AM -0700, Chris Wright wrote:
> * [email protected] ([email protected]) wrote:
> > I'm busy writing a security module that does some very basic ACL stuff
> > on a per-task basis. If my module obtains and holds a dentry for
> > /proc/<pid> (via path_lookup), then the task_free_security hook is
> > never called for that process. Since the module releases the dentry in
> > task_free_security, this creates a chicken-and-egg problem and neither
> > the task nor the dentry is ever released. A side-effect is that the
> > module refcount never drops to 0.
>
> Why are you holding that dentry? Some more background please.

Just realised that I never mentioned what kernel I've been using:
2.6.11.7.

The security module is for sandboxing. The basic idea is that a wrapper
process passes a bunch of paths to the module, then execs the program
that should be sandboxed. After the exec, the process should only be
allowed access to those paths and their subdirectories (actually there
are some flags passed too to say what permissions are granted, but
that isn't really relevant).

Rather than calling d_path on every access request and doing string
matching (sounds hideously slow), I use path_lookup to get a dentry/mnt
pair for each path passed in (once when it is passed in). Then the
inode_permission hook walks up the filesystem, matching dentries.

Some processes have a legitimate reason for accessing /proc/<pid> (pid
of that process). Java, for example, does readlink("/proc/self/exe") to
find the binary. So the wrapper passes /proc/<pid> to the module, which
looks up and holds the dentry for it. I don't want to give blanket
permission to /proc, since preventing the sandbox from getting
information about what else is happening on the system is fairly
important to my application.

At the moment I'm looking at hard-coding special behaviour for /proc
into the module, but I was wondering if there was a simpler way around
this problem.

Incidentally, is it intentional that vfsmount_lock is not exported to
modules? My walk-up-the-tree code is essentially d_path without
constructing the string, but I've had to remove the lock and unlock of
vfsmount_lock because I don't have the symbol.

Thanks
Bruce

[Please cc me on replies. I'm not subscribed to LKML]
--
/--------------------------------------------------------------------\
| Bruce Merry | bmerry at cs . uct . ac . za |
| Proud user of Gentoo GNU/Linux | http://www.cs.uct.ac.za/~bmerry |
| In the force if Yoda's so strong, construct a sentence |
| with words in the proper order then why can't he? |
\--------------------------------------------------------------------/

2005-05-06 17:48:06

by Chris Wright

[permalink] [raw]

Subject: Re: Holding ref to /proc/<pid> dentry prevents task being freed

* [email protected] ([email protected]) wrote:
> On Thu, May 05, 2005 at 09:34:13AM -0700, Chris Wright wrote:
> > * [email protected] ([email protected]) wrote:
> > > I'm busy writing a security module that does some very basic ACL stuff
> > > on a per-task basis. If my module obtains and holds a dentry for
> > > /proc/<pid> (via path_lookup), then the task_free_security hook is
> > > never called for that process. Since the module releases the dentry in
> > > task_free_security, this creates a chicken-and-egg problem and neither
> > > the task nor the dentry is ever released. A side-effect is that the
> > > module refcount never drops to 0.
> >
> > Why are you holding that dentry? Some more background please.
>
> Just realised that I never mentioned what kernel I've been using:
> 2.6.11.7.
>
> The security module is for sandboxing. The basic idea is that a wrapper
> process passes a bunch of paths to the module, then execs the program
> that should be sandboxed. After the exec, the process should only be
> allowed access to those paths and their subdirectories (actually there
> are some flags passed too to say what permissions are granted, but
> that isn't really relevant).
>
> Rather than calling d_path on every access request and doing string
> matching (sounds hideously slow), I use path_lookup to get a dentry/mnt
> pair for each path passed in (once when it is passed in). Then the
> inode_permission hook walks up the filesystem, matching dentries.

This can break with hard links, bind mounts, etc. Can you not label the
inode directly?

> Some processes have a legitimate reason for accessing /proc/<pid> (pid
> of that process). Java, for example, does readlink("/proc/self/exe") to
> find the binary. So the wrapper passes /proc/<pid> to the module, which
> looks up and holds the dentry for it. I don't want to give blanket
> permission to /proc, since preventing the sandbox from getting
> information about what else is happening on the system is fairly
> important to my application.

Did you look at security_task_to_inode? It's there to help you label the
task based proc entries' inodes directly.

> At the moment I'm looking at hard-coding special behaviour for /proc
> into the module, but I was wondering if there was a simpler way around
> this problem.

You'll likely regret hardcoding something like that.

> Incidentally, is it intentional that vfsmount_lock is not exported to
> modules? My walk-up-the-tree code is essentially d_path without
> constructing the string, but I've had to remove the lock and unlock of
> vfsmount_lock because I don't have the symbol.

It's on the grounds that you shouldn't be poking about vfsmounts as it's
very core to vfs. Right answer is to use helpers (or identify a
legimate need for a new helper). In this case, your code is now
hopelessly racy, and I think would be better served by dealing with the
inode directly.

Hope that helps,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-05-07 09:32:55

by bmerry

[permalink] [raw]

Subject: Re: Holding ref to /proc/<pid> dentry prevents task being freed

On Fri, May 06, 2005 at 10:47:41AM -0700, Chris Wright wrote:
> > Rather than calling d_path on every access request and doing string
> > matching (sounds hideously slow), I use path_lookup to get a dentry/mnt
> > pair for each path passed in (once when it is passed in). Then the
> > inode_permission hook walks up the filesystem, matching dentries.
>
> This can break with hard links, bind mounts, etc. Can you not label the
> inode directly?

I assume you mean using security attributes in the FS, ala SE Linux?
The problem with that is that each task will have a different ACL. So
I'd have to label the inodes with per-pid tags, make sure the tags got
removed when the task exited, worry about PID reuse, worry about a
crash leaving bogus tags on inodes etc. Or do you mean storing data in
the inode->i_security field? That then raises questions about memory
consumption, since I need to store data per (task, inode) pair.

I also really do want to grant permission to directory trees. For
example, I may want it to be possible to run Java apps, which means
granting access to the whole of /opt/sun-jdk-1.5.0.02. Perhaps I should
just have userspace walk that tree and pass all the files in it to the
security module? Somehow that seems unclean, in that files added to the
tree after the walk are not included, and those moved out of the tree
are not excluded. The walk could also take quite a while if I grant
access to a large tree (/usr maybe), only a small fraction of which is
ever accessed.

I'm aware that there may well be no canonical path to an inode.
However, the sandboxed processes have no ability to do mounts or to
make hard links, so I don't think this is exploitable.

BTW if I do go with storing a big list of inodes in each

> > Some processes have a legitimate reason for accessing /proc/<pid> (pid
> > of that process). Java, for example, does readlink("/proc/self/exe") to
> > find the binary. So the wrapper passes /proc/<pid> to the module, which
> > looks up and holds the dentry for it. I don't want to give blanket
> > permission to /proc, since preventing the sandbox from getting
> > information about what else is happening on the system is fairly
> > important to my application.
>
> Did you look at security_task_to_inode? It's there to help you label the
> task based proc entries' inodes directly.

Thanks, I wasn't aware of that. I'll look into it.

>
> > At the moment I'm looking at hard-coding special behaviour for /proc
> > into the module, but I was wondering if there was a simpler way around
> > this problem.
>
> You'll likely regret hardcoding something like that.

Quite. Hence I'm looking for a simpler way :-)

> > Incidentally, is it intentional that vfsmount_lock is not exported to
> > modules? My walk-up-the-tree code is essentially d_path without
> > constructing the string, but I've had to remove the lock and unlock of
> > vfsmount_lock because I don't have the symbol.

In fact it gets worse than this. Many of the LSM inode hooks get given
only a dentry, not a nameidata (i.e. no vfsmount). So I've hacked a
function to find a vfsmount from a superblock (again, without the
lock), but obviously bind mounts will mess that up totally.

> It's on the grounds that you shouldn't be poking about vfsmounts as it's
> very core to vfs. Right answer is to use helpers (or identify a
> legimate need for a new helper). In this case, your code is now
> hopelessly racy, and I think would be better served by dealing with the
> inode directly.

Ok, I'm pretty new to kernel hacking, so I'm not familiar with all the
helpers available. Would path_walk("..", &nd) do what I want (i.e. give
me the parent of a nameidata)? And what would be the best way to map a
super_block to a vfsmount? I know that isn't necessarily a well-defined
operation, but I'll be happy to get *an* answer (on the system I will
be using the module on, there will be no bind mounts and filesystems
like /proc will only be mounted once, so there will be a unique answer).

At the moment this module will only be deployed on a single UP machine
with preemption disabled (a computer olympiad handin server, that runs
and marks code that people submit over the web), so races hopefully
shouldn't matter. But yes, every time I look at that code I start
twitching.

> Hope that helps,
> -chris

Yes, thanks for all your help on this. Even if I don't agree with the
inode-based approach, you've given me ideas and things to think about.

Bruce
--
/--------------------------------------------------------------------\
| Bruce Merry | bmerry at cs . uct . ac . za |
| Proud user of Gentoo GNU/Linux | http://www.cs.uct.ac.za/~bmerry |
| "Oh dear, I think you'll find reality is on the blink again." |
| -- Marvin, the Hitchhikers Guide to the Galaxy. |
\--------------------------------------------------------------------/