LinuxLists.cc - [PATCH] namespace.c: fix bind mount from foreign namespace

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

On Fri, 2005-05-13 at 03:44, Miklos Szeredi wrote:
> Bind mount from a foreign namespace results in an un-removable mount.

i wonder, should we even allow the ability to bind mount
from a foreign namespace?

The only time this is allowed is while creating a new namespace.

RP

> The reason is that mnt->mnt_namespace is copied from the old mount in
> clone_mnt(). Because of this, check_mnt() in sys_umount() will fail.
>
> The solution is to set mnt->mnt_namespace to current->namespace.
>
> clone_mnt() is either called from do_loopback() or copy_tree().
> copy_tree() is called from do_loopback() or copy_namespace().
>
> When called (directly or indirectly) from do_loopback(), always
> current->namspace is being modified: check_mnt(nd->mnt). So setting
> mnt->mnt_namespace to current->namspace is the right thing to do.
>
> When called from copy_namespace(), the setting of mnt_namespace is
> irrelevant, since mnt_namespace is reset later in that function for
> all copied mounts.

>
> Signed-off-by: Miklos Szeredi <[email protected]>
>
> Index: linux/fs/namespace.c
> ===================================================================
> --- linux.orig/fs/namespace.c 2005-05-13 12:22:52.000000000 +0200
> +++ linux/fs/namespace.c 2005-05-13 12:32:36.000000000 +0200
> @@ -160,7 +160,7 @@ clone_mnt(struct vfsmount *old, struct d
> mnt->mnt_root = dget(root);
> mnt->mnt_mountpoint = mnt->mnt_root;
> mnt->mnt_parent = mnt;
> - mnt->mnt_namespace = old->mnt_namespace;
> + mnt->mnt_namespace = current->namespace;
>
> /* stick the duplicate mount on the same expiry list
> * as the original if that was on one */
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2005-05-13 17:05:52

by Al Viro

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

On Fri, May 13, 2005 at 12:44:03PM +0200, Miklos Szeredi wrote:
> Bind mount from a foreign namespace results in

... -EINVAL

2005-05-13 17:19:18

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

> > Bind mount from a foreign namespace results in
>
> ... -EINVAL

Wrong answer. Look again, you wrote the code, so you _should_ know ;)

Miklos

2005-05-13 17:25:44

by Al Viro

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

On Fri, May 13, 2005 at 07:17:39PM +0200, Miklos Szeredi wrote:
> > > Bind mount from a foreign namespace results in
> >
> > ... -EINVAL
>
> Wrong answer. Look again, you wrote the code, so you _should_ know ;)

static inline int check_mnt(struct vfsmount *mnt)
{
return mnt->mnt_namespace == current->namespace;
}

static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
{
struct nameidata old_nd;
struct vfsmount *mnt = NULL;
/* no changes of mnt */
err = -EINVAL;
if (check_mnt(nd->mnt) && ... ) {
/* assigns to mnt */
}
if (mnt) {
/* assigns to err */
}
up_write(&current->namespace->sem);
path_release(&old_nd);
return err;
}

Care to explain how that would not give -EINVAL?

2005-05-13 17:32:10

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

On Fri, 2005-05-13 at 10:17, Miklos Szeredi wrote:
> > > Bind mount from a foreign namespace results in
> >
> > ... -EINVAL
>
> Wrong answer. Look again, you wrote the code, so you _should_ know ;)

I guess Al agrees that bind mount from foreign namespace must be
disallowed.

Which means what Jamie pointed to was right. Attached the patch which
fixes it.

>
> Miklos
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Attachments:

bind.patch (478.00 B)

2005-05-13 17:34:59

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

> static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
> {
> struct nameidata old_nd;
> struct vfsmount *mnt = NULL;
> /* no changes of mnt */
> err = -EINVAL;
> if (check_mnt(nd->mnt) && ... ) {
> /* assigns to mnt */
> }
> if (mnt) {
> /* assigns to err */
> }
> up_write(&current->namespace->sem);
> path_release(&old_nd);
> return err;
> }
>
> Care to explain how that would not give -EINVAL?

Yeah, but that check_mnt() checks the _destination_ of the bind not
the source. The source is only checked for recursive mounts,
presumably because the source namespace is not locked, and so can
change.

Miklos

2005-05-13 18:42:22

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

> > > > Bind mount from a foreign namespace results in
> > >
> > > ... -EINVAL
> >
> > Wrong answer. Look again, you wrote the code, so you _should_ know ;)
>
> I guess Al agrees that bind mount from foreign namespace must be
> disallowed.
>
> Which means what Jamie pointed to was right. Attached the patch which
> fixes it.

You are very quick fixing things which are not broken :)

And BTW Jamie was saying, the checks should be removed, not that more
checks should be added (as your patch does).

Jamie Lokier wrote:
> I agree about the bug (and it's why I think the current->namespace
> checks in fs/namespace.c should be killed - the _only_ effect is to
> make un-removable mounts like the above, and the checks are completely
> redundant for "normal" namespace operations).

The checks are actually not redundant, but only because of locking
reasons, not because of security reasons. So I agree with Jamie, that
in the long run it makes sense to relax those checks.

Miklos

2005-05-14 06:12:39

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

(CC restored, because I think this is interesting to others as well)

> > > I dont get it...
> > >
> > > do you agree that bind mounts accross namespaces should be disallowed?
> >
> > No. I think it's a useful feature. It's actually been discussed at
> > length earlier, that it makes sense if the user can copy private
> > mounts from one session to the other (or even automate this with pam,
> > and a daemon). Why should they be disallowed?
>
> I understand that feature and the way to do it is through shared
> subtrees.

I agree fully, that shared subtrees would be very useful.

> I am against a mount in one namespace being bind mounted in
> another namespace(which Al Viro has implied in his mail).

I'd rather not speculate on what Al Viro was thinking, it may have
been just a misunderstanding.

> With shared subtree if a bind mount is done
> in one namespace, the bind happens within the same namespace.

Yes. But that's not fundamentally different from explicitly passing a
mount (through a file descriptor) to a process in a different
namespace, and allowing that process to bind mount it in it's native
namespace.

The end result can be exactly the same, only in the shared subtree
case the binding is implicit, while in the other case it's explicit on
both sides (which makes it perfectly secure even for unprivileged use)

Please explain why you think it's wrong to be able to bind mount from
a different namespace?

>
> However the operation is mirrored to other namespaces
> that has the same heridity link to this namespace.
>
> probably I can give an example:
>
> if namespace n1 has the following tree
> v11
> / \
> v12 v13
>
> v1 is mark shared. (mount --make-shared v1) [ for simplicity I vxy
> means yth vfsmount in xth namespace ]
>
> and than n2 is cloned out of n1, than in n2 we have
> v21
> / \
> v22 v23
>
> now a bind mount in n1
> mount --bind v12 v13
>
> will first change the tree in n1 as follows:
>
> v11
> / \
> v12 v13
> \
> v14
> where v14 is a bind mount of v12
>
>
> and than due to propogation the tree in n2 will also change to
> v21
> / \
> v22 v23
> \
> v24
> where v24 is a bind mount of v22
>
>
> Essentially there is no cross-contamination, as well as it meets
> the requirement of per-user-namespace.

What do you mean by cross contamination?

Thanks,
Miklos

2005-05-16 07:26:45

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

On Fri, 2005-05-13 at 23:11, Miklos Szeredi wrote:
> (CC restored, because I think this is interesting to others as well)
>
> > > > I dont get it...
> > > >
> > > > do you agree that bind mounts accross namespaces should be disallowed?
> > >
> > > No. I think it's a useful feature. It's actually been discussed at
> > > length earlier, that it makes sense if the user can copy private
> > > mounts from one session to the other (or even automate this with pam,
> > > and a daemon). Why should they be disallowed?
> >
> > I understand that feature and the way to do it is through shared
> > subtrees.
>
> I agree fully, that shared subtrees would be very useful.
>
> > I am against a mount in one namespace being bind mounted in
> > another namespace(which Al Viro has implied in his mail).
>
> I'd rather not speculate on what Al Viro was thinking, it may have
> been just a misunderstanding.

Can somebody who know internals of Al Viro's thinking help here?

>
> > With shared subtree if a bind mount is done
> > in one namespace, the bind happens within the same namespace.
>
> Yes. But that's not fundamentally different from explicitly passing a
> mount (through a file descriptor) to a process in a different
> namespace, and allowing that process to bind mount it in it's native
> namespace.
>
> The end result can be exactly the same, only in the shared subtree
> case the binding is implicit, while in the other case it's explicit on
> both sides (which makes it perfectly secure even for unprivileged use)
>
> Please explain why you think it's wrong to be able to bind mount from
> a different namespace?

If It is allowed, the concept of namespaces itself becomes
nebulous. one could bind mount the root vfsmount of all the other
namespace in their own namespace and then it all becomes one big tree
with all the other namespaces as a subtree.

why would we need this feature? what extra advantage would this feature
provide us? Is the advantage of this feature already discussed in this
thread? (maybe i missed it).

>
> >
> > However the operation is mirrored to other namespaces
> > that has the same heridity link to this namespace.
> >
> > probably I can give an example:
> >
> > if namespace n1 has the following tree
> > v11
> > / \
> > v12 v13
> >
> > v1 is mark shared. (mount --make-shared v1) [ for simplicity I vxy
> > means yth vfsmount in xth namespace ]
> >
> > and than n2 is cloned out of n1, than in n2 we have
> > v21
> > / \
> > v22 v23
> >
> > now a bind mount in n1
> > mount --bind v12 v13
> >
> > will first change the tree in n1 as follows:
> >
> > v11
> > / \
> > v12 v13
> > \
> > v14
> > where v14 is a bind mount of v12
> >
> >
> > and than due to propogation the tree in n2 will also change to
> > v21
> > / \
> > v22 v23
> > \
> > v24
> > where v24 is a bind mount of v22
> >
> >
> > Essentially there is no cross-contamination, as well as it meets
> > the requirement of per-user-namespace.
>
> What do you mean by cross contamination?

A vfsmount in one namespace bound to a mountpoint in another
namespace.

RP

>
> Thanks,
> Miklos

2005-05-16 08:49:23

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

> Can somebody who know internals of Al Viro's thinking help here?

There's only one person... :)

> > Please explain why you think it's wrong to be able to bind mount from
> > a different namespace?
>
>
> If It is allowed, the concept of namespaces itself becomes
> nebulous. one could bind mount the root vfsmount of all the other
> namespace in their own namespace and then it all becomes one big tree
> with all the other namespaces as a subtree.

1) you need not recursively bind the whole tree of the private
namespace. In fact you can only do that by hand, since the kernel
won't do it (!recurse || check_mnt(old_nd.mnt) in do_loopback).

2) you won't see changes made in other namespace, they are still
separate, they are just sharing some filesystems, just as after
clone, or just as after propagation within a shared subtree.

3) this is not automatic, so no filesystems in the other namespace are
visible unless there's an explicit action by processes both in the
originating namespace and the destination namespace.

4) in fact, the process in the originating namespace can single out a
mount and just send a file descriptor refering to that mount
(e.g. by binding it to a temporary directory, opening the root,
detaching from the mountpoint, and then sending the file descriptor
to the receiving process). This way the receiving process will see
no other mounts in the originating namespace, and can only bind
from that single mount.

> why would we need this feature? what extra advantage would this feature
> provide us? Is the advantage of this feature already discussed in this
> thread? (maybe i missed it).

http://lkml.org/lkml/2005/4/25/47

It was suggested as a way of sharing mounts between sessions (as
opposed to shared subtrees, which shares mounts between parent/child
process).

I'm not saying that that this is the way to do it. But it does seem
to me a useful feature.

And since it doesn't add _any_ complexity to the kernel, I think it
would be rather stupid to remove it.

> > What do you mean by cross contamination?
>
> A vfsmount in one namespace bound to a mountpoint in another
> namespace.

A vfsmount can only be in a single namespace at a time, since each
mount tree is rooted in a single namespace. So what you are saying is
impossible.

This feature is about binding a mount (copying a vfsmount in
clone_mnt()), which happens to be in a different namespace. After the
vfsmount is cloned, and is attached to the process's native namespace
it's in that namespace solely.

The bug being discussed, is an administrative error, of setting
mnt_namespace to the wrong value, which causes the mount to be
un-removable. So there was never a question of "cross contamination".

Miklos

2005-05-16 09:01:00

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

> A vfsmount can only be in a single namespace at a time, since each
> mount tree is rooted in a single namespace. So what you are saying is
> impossible.

To be more precise a vfsmount can only be in _at_ _most_ one
namespace. When it is detached from a mount tree, it's no longer in
_any_ namespace.

There even seems to be some confusion about that in namespace.c. I
think mnt_namespace should be set to NULL in detach_mnt() instead of
__put_namespace().

Miklos

2005-05-16 11:15:19

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

Ram wrote:
> > I'd rather not speculate on what Al Viro was thinking, it may have
> > been just a misunderstanding.
>
> Can somebody who know internals of Al Viro's thinking help here?

Presumably he wrote this line:

if (check_mnt(nd->mnt) && (!recurse || check_mnt(old_nd.mnt))) {

Which /explicitly/ permits bind mounts between namespaces if it's not
recursive. It's not accidental: that !recurse is blatantly making a
point of allowing it.

I take that to mean that /at least at one time/ Al chose to allow it.

Then again, he also wrote this:

> > Bind mount from a foreign namespace results in
>
> ... -EINVAL

Which means that /at another time/ Al thought he'd disallowed it.

This is a bit like arguing over what the Founding Fathers of the US
Constitution meant. Does it matter? We really should ask what
behaviour makes sense now. Should we add more explicit restrictions
to the code, making the concept of namespaces more restrictive? Or
remove the restrictions, on the grounds that they don't really add any
security, it'd be useful to relax them, and the code would be simpler?

-- Jamie

2005-05-16 11:27:19

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

Miklos Szeredi wrote:
> 1) you need not recursively bind the whole tree of the private
> namespace. In fact you can only do that by hand, since the kernel
> won't do it (!recurse || check_mnt(old_nd.mnt) in do_loopback).

That would be easy to change if it was desired though, by taking both
namespace semaphores when two namespaces are involved.

> 2) you won't see changes made in other namespace, they are still
> separate, they are just sharing some filesystems, just as after
> clone, or just as after propagation within a shared subtree.

That's true.

Let's not get confused between binding across namespaces, and
chroot/chdir into an fd supplied by a process from another namespace.

In the case of bind mount, that _won't_ see changes made in the other
namespace. The /dentry/ visible in the other namespace is simply
mounted here. The fact that it happens to come from another namespace
is irrelevant: the other namespace is not used for single dentry bind mounts.

In the case of chroot/chdir, that _will_ see changes made in the other
namespace. Effectively, that transitions into the other namespace as
a whole, which is exactly what we want in some cases (when userspace
policy determines that a per-session namespace is wanted).

> 4) in fact, the process in the originating namespace can single out a
> mount and just send a file descriptor refering to that mount
> (e.g. by binding it to a temporary directory, opening the root,
> detaching from the mountpoint, and then sending the file descriptor
> to the receiving process). This way the receiving process will see
> no other mounts in the originating namespace, and can only bind
> from that single mount.

Nice. The process in the originating namespace can also bind a small,
carefully selected tree of mounts to a tree in that temporary
directory before passing it, so the recipient can chroot/chdir into
the set of mounts and get only those explicitly authorised by the
originating process.

-- Jamie

2005-05-16 13:28:11

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

> > 1) you need not recursively bind the whole tree of the private
> > namespace. In fact you can only do that by hand, since the kernel
> > won't do it (!recurse || check_mnt(old_nd.mnt) in do_loopback).
>
> That would be easy to change if it was desired though, by taking both
> namespace semaphores when two namespaces are involved.

Yes.

The other check_mnt() calls could be removed by taking
nd.mnt->mnt_namespace->sem instead of current->namespace->sem in the
relevant functions.

It does make sense IMO, even if it won't be used very often, since
only very little extra complexity is involved.

> > 4) in fact, the process in the originating namespace can single out a
> > mount and just send a file descriptor refering to that mount
> > (e.g. by binding it to a temporary directory, opening the root,
> > detaching from the mountpoint, and then sending the file descriptor
> > to the receiving process). This way the receiving process will see
> > no other mounts in the originating namespace, and can only bind
> > from that single mount.
>
> Nice. The process in the originating namespace can also bind a small,
> carefully selected tree of mounts to a tree in that temporary
> directory before passing it, so the recipient can chroot/chdir into
> the set of mounts and get only those explicitly authorised by the
> originating process.

That won't work, since detach (umount -l) will break up the tree, and
the file descriptor will hold a reference to only one vfsmount/dentry.

Miklos

2005-05-16 19:51:04

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

On Mon, 2005-05-16 at 04:14, Jamie Lokier wrote:
> Ram wrote:
> > > I'd rather not speculate on what Al Viro was thinking, it may have
> > > been just a misunderstanding.
> >
> > Can somebody who know internals of Al Viro's thinking help here?
>
> Presumably he wrote this line:
>
> if (check_mnt(nd->mnt) && (!recurse || check_mnt(old_nd.mnt))) {
>
> Which /explicitly/ permits bind mounts between namespaces if it's not
> recursive. It's not accidental: that !recurse is blatantly making a
> point of allowing it.
>
> I take that to mean that /at least at one time/ Al chose to allow it.
>
> Then again, he also wrote this:
>
> > > Bind mount from a foreign namespace results in
> >
> > ... -EINVAL
>
> Which means that /at another time/ Al thought he'd disallowed it.
>
> This is a bit like arguing over what the Founding Fathers of the US
> Constitution meant. Does it matter? We really should ask what
> behaviour makes sense now. Should we add more explicit restrictions
> to the code, making the concept of namespaces more restrictive? Or
> remove the restrictions, on the grounds that they don't really add any
> security, it'd be useful to relax them, and the code would be simpler?
>

Ok. less restriction without compromising security is a good idea.

Under the premise that bind mounts across namespace should be allowed;
any insight why the "founding fathers" :) allowed only bind
and not recursive bind? What issue would that create? One can
easily workaround that restriction by manually binding recursively.
So does the recursive bind restriction serve any purpose?

I remember Miklos saying its not a security issue but a
implementation/locking issue. That can be fixed aswell.

RP

> -- Jamie

2005-05-16 20:23:23

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

> Ok. less restriction without compromising security is a good idea.
>
> Under the premise that bind mounts across namespace should be allowed;
> any insight why the "founding fathers" :) allowed only bind
> and not recursive bind? What issue would that create? One can
> easily workaround that restriction by manually binding recursively.
> So does the recursive bind restriction serve any purpose?
>
> I remember Miklos saying its not a security issue but a
> implementation/locking issue. That can be fixed aswell.

Yes, as pointed out by Jamie, both namespaces need to be locked for
this to work. Something like the attached should do it.

Miklos

Index: linux/fs/namespace.c
===================================================================
--- linux.orig/fs/namespace.c 2005-05-16 22:02:36.000000000 +0200
+++ linux/fs/namespace.c 2005-05-16 22:13:30.000000000 +0200
@@ -622,6 +622,8 @@ out_unlock:
static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
{
struct nameidata old_nd;
+ struct namespace *ns1 = current->namespace;
+ struct namespace *ns2 = NULL;
struct vfsmount *mnt = NULL;
int err = mount_is_safe(nd);
if (err)
@@ -632,15 +634,30 @@ static int do_loopback(struct nameidata
if (err)
return err;

- down_write(&current->namespace->sem);
err = -EINVAL;
- if (check_mnt(nd->mnt) && (!recurse || check_mnt(old_nd.mnt))) {
- err = -ENOMEM;
- if (recurse)
- mnt = copy_tree(old_nd.mnt, old_nd.dentry);
- else
- mnt = clone_mnt(old_nd.mnt, old_nd.dentry);
- }
+ if (!check_mnt(nd->mnt))
+ goto out_path_release;
+
+ if (recurse && old_nd.mnt->mnt_namespace != ns1) {
+ ns2 = old_nd.mnt->mnt_namespace;
+ if (ns1 < ns2) {
+ down_write(&ns1->sem);
+ down_write(&ns2->sem);
+ } else {
+ down_write(&ns2->sem);
+ down_write(&ns1->sem);
+ }
+ } else
+ down_write(&ns1->sem);
+
+ err = -ENOMEM;
+ if (recurse)
+ mnt = copy_tree(old_nd.mnt, old_nd.dentry);
+ else
+ mnt = clone_mnt(old_nd.mnt, old_nd.dentry);
+
+ if (ns2)
+ up_write(&ns2->sem);

if (mnt) {
/* stop bind mounts from expiring */
@@ -657,7 +674,8 @@ static int do_loopback(struct nameidata
mntput(mnt);
}

- up_write(&current->namespace->sem);
+ up_write(&ns1->sem);
+out_path_release:
path_release(&old_nd);
return err;
}

2005-05-17 00:00:43

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

Ram wrote:
> Under the premise that bind mounts across namespace should be allowed;
> any insight why the "founding fathers" :) allowed only bind
> and not recursive bind? What issue would that create?

Recursive bind traverses the subtree of vfsmnts rooted at the source
mount (following mnt->mnt_mounts, see copy_tree()). That requires the
source mount's namespace semaphore to be held.

> One can easily workaround that restriction by manually binding
> recursively.

Yes, if you know which mounts they are.

> I remember Miklos saying its not a security issue but a
> implementation/locking issue. That can be fixed aswell.

Yes, by taking the source namespace semaphore while traversing the
subtree. That involves taking _two_ semaphores, so they have to be
ordered to avoid deadlock (see double-locking elsewhere in the kernel).

- Jamie

2005-05-17 01:29:08

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

Miklos Szeredi wrote:
> + if (ns1 < ns2) {
> + down_write(&ns1->sem);
> + down_write(&ns2->sem);
> + } else {
> + down_write(&ns2->sem);
> + down_write(&ns1->sem);
> + }

That's a bit smaller (source and compiled) as:

if (ns2 < ns1)
down_write(&ns2->sem);
down_write(&ns1->sem);
if (ns2 > ns1)
down_write(&ns2->sem);

(And you'll notice that does the right thing if ns2==ns1 too, in case
that gives you any ideas.)

Otherwise, the patch looks convincing to me.

-- Jamie

2005-05-17 05:35:39

[permalink] [raw]

Subject: Re: [PATCH] namespace.c: fix bind mount from foreign namespace

> That's a bit smaller (source and compiled) as:
>
> if (ns2 < ns1)
> down_write(&ns2->sem);
> down_write(&ns1->sem);
> if (ns2 > ns1)
> down_write(&ns2->sem);
>
> (And you'll notice that does the right thing if ns2==ns1 too, in case
> that gives you any ideas.)

Nice.

> Otherwise, the patch looks convincing to me.

There's another problem with it: we don't hold a reference to
old_nd.mnt->mnt_namespace, and the namespace going away could race
with this function.

So first obtain the necessary reference:

spin_lock(&vfsmount_lock);
ns2 = old_nd.mnt->mnt_namespace;
if (ns2)
get_namespace(ns2);
spin_unlock(&vfsmount_lock);

Then take the semaphores.

Then recheck old_nd.mnt->mnt_namespace, because it might have been
detached between the spin_unlock() and the down_write(&ns2->sem).

It's starting to get a bit complex, and I'm wondering if it's worth it :)

Miklos

2005-05-17 18:48:55