2006-05-01 21:06:21

by Valdis Klētnieks

[permalink] [raw]
Subject: 2.6.17-rc3 - fs/namespace.c issue

There seems to have been a bug introduced in this changeset:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f6422f17d3a480f21917a3895e2a46b968f56a08

Am running 2.6.17-rc3-mm1. When this changeset is applied, 'mount --bind'
misbehaves:

> # mkdir /foo
> # mount -t tmpfs -o rw,nosuid,nodev,noexec,noatime,nodiratime none /foo
> # mkdir /foo/bar
> # mount --bind /foo/bar /foo
> # tail -2 /proc/mounts
> none /foo tmpfs rw,nosuid,nodev,noexec,noatime,nodiratime 0 0
> none /foo tmpfs rw 0 0

Reverting this changeset causes both mounts to have the same options.

(Thanks to Stephen Smalley for tracking down the changeset...)


Attachments:
(No filename) (226.00 B)

2006-05-01 21:31:21

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.17-rc3 - fs/namespace.c issue

[email protected] wrote:
>
> There seems to have been a bug introduced in this changeset:
>
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f6422f17d3a480f21917a3895e2a46b968f56a08
>
> Am running 2.6.17-rc3-mm1. When this changeset is applied, 'mount --bind'
> misbehaves:
>
> > # mkdir /foo
> > # mount -t tmpfs -o rw,nosuid,nodev,noexec,noatime,nodiratime none /foo
> > # mkdir /foo/bar
> > # mount --bind /foo/bar /foo
> > # tail -2 /proc/mounts
> > none /foo tmpfs rw,nosuid,nodev,noexec,noatime,nodiratime 0 0
> > none /foo tmpfs rw 0 0
>
> Reverting this changeset causes both mounts to have the same options.
>
> (Thanks to Stephen Smalley for tracking down the changeset...)
>

(cc's added)

2006-05-01 23:56:39

by Herbert Poetzl

[permalink] [raw]
Subject: Re: 2.6.17-rc3 - fs/namespace.c issue

On Mon, May 01, 2006 at 02:33:44PM -0700, Andrew Morton wrote:
> [email protected] wrote:
> >
> > There seems to have been a bug introduced in this changeset:
> >
> > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f6422f17d3a480f21917a3895e2a46b968f56a08
> >
> > Am running 2.6.17-rc3-mm1. When this changeset is applied, 'mount --bind'
> > misbehaves:
> >
> > > # mkdir /foo
> > > # mount -t tmpfs -o rw,nosuid,nodev,noexec,noatime,nodiratime none /foo
> > > # mkdir /foo/bar
> > > # mount --bind /foo/bar /foo
> > > # tail -2 /proc/mounts
> > > none /foo tmpfs rw,nosuid,nodev,noexec,noatime,nodiratime 0 0
> > > none /foo tmpfs rw 0 0
> >
> > Reverting this changeset causes both mounts to have the same options.
> >
> > (Thanks to Stephen Smalley for tracking down the changeset...)

well, IMHO there are several open questions regarding semantics

first, what do we expect from --bind mounts regarding
vfs (mount) level flags like noatime, noexec, nodev?

- should they be propagated from the original mfs/mount?
- should they only restrict the original set?
- should they allow to modify the existing flags?

IMHO, it makes perfect sense to mount something noatime
and change that rule later for a subtree like this:

mkdir /foo
mount -t tmpfs -o rw,noatime none /foo
mkdir /foo/bar
mount --bind -o atime /foo/bar /foo/bar

second, has the kernel to decide what flags userspace
can request and/or change, depending on the original?

and finally, how to handle --rbind mounts at a level
deeper than the top?

so I do not consider the example above a misbehaviour.
what I consider a misbehaviour is that mount (userspace)
blindly assumes that --bind mounts are independant, so
it does not check the existing flags, and thus, does not
preserve them (instead it replaces them with the default)

removing the mnt->mnt_flags = mnt_flags; assignment
is sufficient to _only_ allow the identical attributes
of the original mount, as they are copied in the
clone_mnt() operation, of course, this also makes it
impossible to have any flags/changes to the --bind mounts
over the original

as this patch was torn out of a much larger patch set
to allow for such attribute changes at --bind mount time
I'd sugegst the following untested 'fix'

best,
Herbert

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -937,7 +937,6 @@ static int do_loopback(struct nameidata
spin_unlock(&vfsmount_lock);
release_mounts(&umount_list);
}
- mnt->mnt_flags = mnt_flags;

out:
up_write(&namespace_sem);


> (cc's added)

tx

> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2006-05-02 06:56:53

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: 2.6.17-rc3 - fs/namespace.c issue

On Tue, 02 May 2006 01:56:37 +0200, Herbert Poetzl said:
> > > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f6422f17d3a480f21917a3895e2a46b968f56a08

> first, what do we expect from --bind mounts regarding
> vfs (mount) level flags like noatime, noexec, nodev?
>
> - should they be propagated from the original mfs/mount?

I tripped over this apparent regression when I hit a problem with some code
that expected this behavior. Given the documented behavior of the mount syscall
(see below), apparently propagating all flags intact and clearing all
flags are the only 2 options that don't break the documented API.

> - should they only restrict the original set?
> - should they allow to modify the existing flags?

Well, absent a '-o newflags' to modify it, propagating the originals
probably follows the Principle of Least Surprise. And whether mountflags
are permissible is an API change issue...

> IMHO, it makes perfect sense to mount something noatime
> and change that rule later for a subtree like this:
>
> mkdir /foo
> mount -t tmpfs -o rw,noatime none /foo
> mkdir /foo/bar
> mount --bind -o atime /foo/bar /foo/bar

Here, there's a -o parameter being passed.

> second, has the kernel to decide what flags userspace
> can request and/or change, depending on the original?

Can of worms, too complicated for 3AM. :)

> and finally, how to handle --rbind mounts at a level
> deeper than the top?

More worms. ;)

Note that any provision for changing the mountflags *IS* a break of
the documented API. 'man 2 mount' says specifically:

MS_BIND
(Linux 2.4 onwards) Perform a bind mount, making a file or a
directory subtree visible at another point within a file system.
Bind mounts may cross file system boundaries and span chroot(2)
jails. The filesystemtype, mountflags, and data arguments are
ignored.

I admit not knowing that whether POSIX or other standards specify that
mountflags be ignored.


Attachments:
(No filename) (226.00 B)

2006-05-02 16:48:53

by Herbert Poetzl

[permalink] [raw]
Subject: Re: 2.6.17-rc3 - fs/namespace.c issue

On Tue, May 02, 2006 at 02:56:23AM -0400, [email protected] wrote:
> On Tue, 02 May 2006 01:56:37 +0200, Herbert Poetzl said:
> > > > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f6422f17d3a480f21917a3895e2a46b968f56a08
>
> > first, what do we expect from --bind mounts regarding
> > vfs (mount) level flags like noatime, noexec, nodev?
> >
> > - should they be propagated from the original mfs/mount?
>
> I tripped over this apparent regression when I hit a problem with some
> code that expected this behavior. Given the documented behavior of the
> mount syscall (see below), apparently propagating all flags intact
> and clearing all flags are the only 2 options that don't break the
> documented API.
>
> > - should they only restrict the original set?
> > - should they allow to modify the existing flags?
>
> Well, absent a '-o newflags' to modify it, propagating the originals
> probably follows the Principle of Least Surprise. And whether mountflags
> are permissible is an API change issue...
>
> > IMHO, it makes perfect sense to mount something noatime
> > and change that rule later for a subtree like this:
> >
> > mkdir /foo
> > mount -t tmpfs -o rw,noatime none /foo
> > mkdir /foo/bar
> > mount --bind -o atime /foo/bar /foo/bar
>
> Here, there's a -o parameter being passed.

yes, but this information unfortunately cannot be passed
to the kernel, assuming that we 'preserve' the original
mount flags, as there simply is no 'atime' flag, just an
MS_NOATIME flag, which in this case is not set :)

> > second, has the kernel to decide what flags userspace
> > can request and/or change, depending on the original?
>
> Can of worms, too complicated for 3AM. :)
>
> > and finally, how to handle --rbind mounts at a level
> > deeper than the top?
>
> More worms. ;)

it's full of worms, maybe we should add a new option or
even a new mount type for this?

maybe we should allow remount on bind mounts, and keep
the original (copy all) behaviour intact for --bind and
--rbind mounts? (that would look most natural to me)

suggestions welcome!

best,
Herbert

> Note that any provision for changing the mountflags *IS* a break of
> the documented API. 'man 2 mount' says specifically:
>
> MS_BIND
> (Linux 2.4 onwards) Perform a bind mount, making a
> file or a directory subtree visible at another point
> within a file system. Bind mounts may cross file system
> boundaries and span chroot(2) jails. The filesystemtype,
> mountflags, and data arguments are ignored.
>
> I admit not knowing that whether POSIX or other standards specify that
> mountflags be ignored.


2006-05-12 17:32:51

by Stephen Smalley

[permalink] [raw]
Subject: Re: 2.6.17-rc3 - fs/namespace.c issue

On Mon, 2006-05-01 at 14:33 -0700, Andrew Morton wrote:
> [email protected] wrote:
> >
> > There seems to have been a bug introduced in this changeset:
> >
> > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f6422f17d3a480f21917a3895e2a46b968f56a08
> >
> > Am running 2.6.17-rc3-mm1. When this changeset is applied, 'mount --bind'
> > misbehaves:
> >
> > > # mkdir /foo
> > > # mount -t tmpfs -o rw,nosuid,nodev,noexec,noatime,nodiratime none /foo
> > > # mkdir /foo/bar
> > > # mount --bind /foo/bar /foo
> > > # tail -2 /proc/mounts
> > > none /foo tmpfs rw,nosuid,nodev,noexec,noatime,nodiratime 0 0
> > > none /foo tmpfs rw 0 0
> >
> > Reverting this changeset causes both mounts to have the same options.
> >
> > (Thanks to Stephen Smalley for tracking down the changeset...)
> >
>
> (cc's added)

What's the verdict on this change in user-visible behavior for bind
mounts? Is it a legitimate change and userland just needs to adapt to
it, or is it a change to the kernel's stable ABI that needs to be
reverted? It still appears to be present in -rc4.

--
Stephen Smalley
National Security Agency

2006-05-12 17:50:59

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.17-rc3 - fs/namespace.c issue

Stephen Smalley <[email protected]> wrote:
>
> On Mon, 2006-05-01 at 14:33 -0700, Andrew Morton wrote:
> > [email protected] wrote:
> > >
> > > There seems to have been a bug introduced in this changeset:
> > >
> > > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f6422f17d3a480f21917a3895e2a46b968f56a08
> > >
> > > Am running 2.6.17-rc3-mm1. When this changeset is applied, 'mount --bind'
> > > misbehaves:
> > >
> > > > # mkdir /foo
> > > > # mount -t tmpfs -o rw,nosuid,nodev,noexec,noatime,nodiratime none /foo
> > > > # mkdir /foo/bar
> > > > # mount --bind /foo/bar /foo
> > > > # tail -2 /proc/mounts
> > > > none /foo tmpfs rw,nosuid,nodev,noexec,noatime,nodiratime 0 0
> > > > none /foo tmpfs rw 0 0
> > >
> > > Reverting this changeset causes both mounts to have the same options.
> > >
> > > (Thanks to Stephen Smalley for tracking down the changeset...)
> > >
> >
> > (cc's added)
>
> What's the verdict on this change in user-visible behavior for bind
> mounts? Is it a legitimate change and userland just needs to adapt to
> it, or is it a change to the kernel's stable ABI that needs to be
> reverted? It still appears to be present in -rc4.
>

Well. We'd certainly prefer to not change user-visible behaviour without
excellent reasons - I don't htink any have been given, really.

AFACIT nobody tested Herbert's 'untested "fix"'. What was the verdict on
that?

2006-05-12 17:58:12

by Stephen Smalley

[permalink] [raw]
Subject: Re: 2.6.17-rc3 - fs/namespace.c issue

On Fri, 2006-05-12 at 10:53 -0700, Andrew Morton wrote:
> Stephen Smalley <[email protected]> wrote:
> >
> > On Mon, 2006-05-01 at 14:33 -0700, Andrew Morton wrote:
> > > [email protected] wrote:
> > > >
> > > > There seems to have been a bug introduced in this changeset:
> > > >
> > > > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f6422f17d3a480f21917a3895e2a46b968f56a08
> > > >
> > > > Am running 2.6.17-rc3-mm1. When this changeset is applied, 'mount --bind'
> > > > misbehaves:
> > > >
> > > > > # mkdir /foo
> > > > > # mount -t tmpfs -o rw,nosuid,nodev,noexec,noatime,nodiratime none /foo
> > > > > # mkdir /foo/bar
> > > > > # mount --bind /foo/bar /foo
> > > > > # tail -2 /proc/mounts
> > > > > none /foo tmpfs rw,nosuid,nodev,noexec,noatime,nodiratime 0 0
> > > > > none /foo tmpfs rw 0 0
> > > >
> > > > Reverting this changeset causes both mounts to have the same options.
> > > >
> > > > (Thanks to Stephen Smalley for tracking down the changeset...)
> > > >
> > >
> > > (cc's added)
> >
> > What's the verdict on this change in user-visible behavior for bind
> > mounts? Is it a legitimate change and userland just needs to adapt to
> > it, or is it a change to the kernel's stable ABI that needs to be
> > reverted? It still appears to be present in -rc4.
> >
>
> Well. We'd certainly prefer to not change user-visible behaviour without
> excellent reasons - I don't htink any have been given, really.
>
> AFACIT nobody tested Herbert's 'untested "fix"'. What was the verdict on
> that?

The untested 'fix' makes the rest of the patch pointless (no point in
passing the mnt_flags if we aren't going to use them). Might as well
just revert the patch altogether.

--
Stephen Smalley
National Security Agency

2006-05-12 18:12:14

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.17-rc3 - fs/namespace.c issue

Stephen Smalley <[email protected]> wrote:
>
> On Fri, 2006-05-12 at 10:53 -0700, Andrew Morton wrote:
> > Stephen Smalley <[email protected]> wrote:
> > >
> > > On Mon, 2006-05-01 at 14:33 -0700, Andrew Morton wrote:
> > > > [email protected] wrote:
> > > > >
> > > > > There seems to have been a bug introduced in this changeset:
> > > > >
> > > > > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f6422f17d3a480f21917a3895e2a46b968f56a08
> > > > >
> > > > > Am running 2.6.17-rc3-mm1. When this changeset is applied, 'mount --bind'
> > > > > misbehaves:
> > > > >
> > > > > > # mkdir /foo
> > > > > > # mount -t tmpfs -o rw,nosuid,nodev,noexec,noatime,nodiratime none /foo
> > > > > > # mkdir /foo/bar
> > > > > > # mount --bind /foo/bar /foo
> > > > > > # tail -2 /proc/mounts
> > > > > > none /foo tmpfs rw,nosuid,nodev,noexec,noatime,nodiratime 0 0
> > > > > > none /foo tmpfs rw 0 0
> > > > >
> > > > > Reverting this changeset causes both mounts to have the same options.
> > > > >
> > > > > (Thanks to Stephen Smalley for tracking down the changeset...)
> > > > >
> > > >
> > > > (cc's added)
> > >
> > > What's the verdict on this change in user-visible behavior for bind
> > > mounts? Is it a legitimate change and userland just needs to adapt to
> > > it, or is it a change to the kernel's stable ABI that needs to be
> > > reverted? It still appears to be present in -rc4.
> > >
> >
> > Well. We'd certainly prefer to not change user-visible behaviour without
> > excellent reasons - I don't htink any have been given, really.
> >
> > AFACIT nobody tested Herbert's 'untested "fix"'. What was the verdict on
> > that?
>
> The untested 'fix' makes the rest of the patch pointless (no point in
> passing the mnt_flags if we aren't going to use them). Might as well
> just revert the patch altogether.

OK, I queued a revert. It seems to be the sanest thing to do at this time
- Herbert can have another go later..

2006-05-13 13:54:47

by Herbert Poetzl

[permalink] [raw]
Subject: Re: 2.6.17-rc3 - fs/namespace.c issue

On Fri, May 12, 2006 at 11:09:03AM -0700, Andrew Morton wrote:
> Stephen Smalley <[email protected]> wrote:
> >
> > On Fri, 2006-05-12 at 10:53 -0700, Andrew Morton wrote:
> > > Stephen Smalley <[email protected]> wrote:
> > > >
> > > > On Mon, 2006-05-01 at 14:33 -0700, Andrew Morton wrote:
> > > > > [email protected] wrote:
> > > > > >
> > > > > > There seems to have been a bug introduced in this changeset:
> > > > > >
> > > > > > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f6422f17d3a480f21917a3895e2a46b968f56a08
> > > > > >
> > > > > > Am running 2.6.17-rc3-mm1. When this changeset is applied, 'mount --bind'
> > > > > > misbehaves:
> > > > > >
> > > > > > > # mkdir /foo
> > > > > > > # mount -t tmpfs -o rw,nosuid,nodev,noexec,noatime,nodiratime none /foo
> > > > > > > # mkdir /foo/bar
> > > > > > > # mount --bind /foo/bar /foo
> > > > > > > # tail -2 /proc/mounts
> > > > > > > none /foo tmpfs rw,nosuid,nodev,noexec,noatime,nodiratime 0 0
> > > > > > > none /foo tmpfs rw 0 0
> > > > > >
> > > > > > Reverting this changeset causes both mounts to have the same options.
> > > > > >
> > > > > > (Thanks to Stephen Smalley for tracking down the changeset...)
> > > > > >
> > > > >
> > > > > (cc's added)
> > > >
> > > > What's the verdict on this change in user-visible behavior for bind
> > > > mounts? Is it a legitimate change and userland just needs to adapt to
> > > > it, or is it a change to the kernel's stable ABI that needs to be
> > > > reverted? It still appears to be present in -rc4.
> > > >
> > >
> > > Well. We'd certainly prefer to not change user-visible behaviour without
> > > excellent reasons - I don't htink any have been given, really.
> > >
> > > AFACIT nobody tested Herbert's 'untested "fix"'. What was the verdict on
> > > that?
> >
> > The untested 'fix' makes the rest of the patch pointless (no point in
> > passing the mnt_flags if we aren't going to use them). Might as well
> > just revert the patch altogether.
>
> OK, I queued a revert. It seems to be the sanest thing to do at this time
> - Herbert can have another go later..

okay, probably the best choice right now, had a chat
with hch, and will try to achieve the new functionality
with remount instead of mount -o bind, which seems to
be a less intrusive way from the userspace PoV

best,
Herbert

2006-05-15 21:19:41

by Bill Davidsen

[permalink] [raw]
Subject: Re: 2.6.17-rc3 - fs/namespace.c issue

has other stuff Herbert Poetzl wrote:
> On Mon, May 01, 2006 at 02:33:44PM -0700, Andrew Morton wrote:
>> [email protected] wrote:
>>> There seems to have been a bug introduced in this changeset:
>>>
>>> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f6422f17d3a480f21917a3895e2a46b968f56a08
>>>
>>> Am running 2.6.17-rc3-mm1. When this changeset is applied, 'mount --bind'
>>> misbehaves:
>>>
>>>> # mkdir /foo
>>>> # mount -t tmpfs -o rw,nosuid,nodev,noexec,noatime,nodiratime none /foo
>>>> # mkdir /foo/bar
>>>> # mount --bind /foo/bar /foo
>>>> # tail -2 /proc/mounts
>>>> none /foo tmpfs rw,nosuid,nodev,noexec,noatime,nodiratime 0 0
>>>> none /foo tmpfs rw 0 0
>>> Reverting this changeset causes both mounts to have the same options.
>>>
>>> (Thanks to Stephen Smalley for tracking down the changeset...)
>
> well, IMHO there are several open questions regarding semantics
>
> first, what do we expect from --bind mounts regarding
> vfs (mount) level flags like noatime, noexec, nodev?
>
> - should they be propagated from the original mfs/mount?
> - should they only restrict the original set?
> - should they allow to modify the existing flags?

What does it mean if the flags can be modified? If I mount a tree ro, do
I want to open the can of worms from allowing part of it to be rw
elsewhere? And what checking is done, or should be done? If I do a ro
mount with something like NFS, what should happen if I mount part of it
rw? Substitute any of the other above flags, is there a security issue
here, and can I shoot myself in the foot?

Can I apply the "user" attribute in fstab to a bind mount? If I let a
user bind /foo/bar to /bazfaz/zot, what happens if I have the wrong
thing mounted on /foo? Or if /bazfaz is NFS exported read only?
>
> IMHO, it makes perfect sense to mount something noatime
> and change that rule later for a subtree like this:
>
> mkdir /foo
> mount -t tmpfs -o rw,noatime none /foo
> mkdir /foo/bar
> mount --bind -o atime /foo/bar /foo/bar
>
> second, has the kernel to decide what flags userspace
> can request and/or change, depending on the original?
>
> and finally, how to handle --rbind mounts at a level
> deeper than the top?
>
> so I do not consider the example above a misbehaviour.
> what I consider a misbehaviour is that mount (userspace)
> blindly assumes that --bind mounts are independant, so
> it does not check the existing flags, and thus, does not
> preserve them (instead it replaces them with the default)
>
> removing the mnt->mnt_flags = mnt_flags; assignment
> is sufficient to _only_ allow the identical attributes
> of the original mount, as they are copied in the
> clone_mnt() operation, of course, this also makes it
> impossible to have any flags/changes to the --bind mounts
> over the original

That certainly is a lot less likely to violate Plauger's law of least
astonishment.
>
> as this patch was torn out of a much larger patch set
> to allow for such attribute changes at --bind mount time
> I'd sugegst the following untested 'fix'
>
> best,
> Herbert

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me