2004-02-18 16:42:04

by Olaf Kirch

Subject: NFS suport block sharing

Hi,

yesterday, I ran into something that puzzled me quite a bit. I had an
NFS file system (my home directory) mounted in its normal location. For
testing purposes, I wanted to mount it a second time, but with a different
set of options (hard instead of soft, etc).

To my surprise, the second mount continued to act like the original
mount, as if it had been mounted with -o soft. /etc/mtab showed the
options I had specified, but /proc/mounts showed that it indeed used
soft retransmits.
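
One way to spot this kind of mismatch is to compare the option field for a mount point in /etc/mtab against the one in /proc/mounts. The following is only a rough sketch: the sample table contents are made up for illustration, and on a real system /proc/mounts lists many more options than mtab does, so the raw difference would be noisier than shown here.

```python
def parse_mount_table(text):
    """Map mount point -> set of options from mtab-style text."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        # Fields: device, mount point, fs type, comma-separated options, ...
        table[fields[1]] = set(fields[3].split(","))
    return table


def option_mismatch(mtab_text, proc_text, mount_point):
    """Return (only_in_mtab, only_in_proc) for one mount point."""
    requested = parse_mount_table(mtab_text)[mount_point]
    effective = parse_mount_table(proc_text)[mount_point]
    return requested - effective, effective - requested


# Hypothetical contents, invented for this example.
SAMPLE_MTAB = """\
server:/home /home nfs rw,soft 0 0
server:/home /mnt/test nfs rw,hard 0 0
"""
SAMPLE_PROC = """\
server:/home /home nfs rw,soft 0 0
server:/home /mnt/test nfs rw,soft 0 0
"""

if __name__ == "__main__":
    print(option_mismatch(SAMPLE_MTAB, SAMPLE_PROC, "/mnt/test"))
```

To try it against a live system, the two sample strings could be replaced with the contents of the real files.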

To make a long story short, I think _if_ we do NFS super block
sharing, we should make sure we don't reuse an existing
sb if it has different options. Attached is a patch that implements
this, at least for the most common set of options.
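
The idea can be modelled outside the kernel roughly as follows. This is a toy sketch, not the attached patch: the SuperBlock and Client classes, the compare_options switch, and the option names are all hypothetical, chosen only to contrast "always share" with "share only when the options match".

```python
class SuperBlock:
    """Toy stand-in for a per-export super block."""

    def __init__(self, export, options):
        self.export = export
        self.options = dict(options)


class Client:
    """Toy NFS client that keeps a list of super blocks."""

    def __init__(self, compare_options):
        # compare_options=False models unconditional sharing;
        # compare_options=True models the proposed check.
        self.compare_options = compare_options
        self.super_blocks = []

    def mount(self, export, options):
        for sb in self.super_blocks:
            if sb.export != export:
                continue
            if not self.compare_options or sb.options == options:
                return sb  # reuse: the caller's options are ignored
        sb = SuperBlock(export, options)
        self.super_blocks.append(sb)
        return sb


if __name__ == "__main__":
    sharing = Client(compare_options=False)
    first = sharing.mount("server:/home", {"soft": True})
    second = sharing.mount("server:/home", {"soft": False})
    print(second is first, second.options)

    checked = Client(compare_options=True)
    first = checked.mount("server:/home", {"soft": True})
    second = checked.mount("server:/home", {"soft": False})
    print(second is first, second.options)
```

In the first case the second mount silently inherits the options of the first; in the second case it gets its own super block, at the price of a second, independent cache for the same export.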

Comments?

Olaf
--
Olaf Kirch | Stop wasting entropy - start using predictable
[email protected] | tempfile names today!
---------------+


Attachments:
(No filename) (878.00 B)
nfs-super-reuse (548.00 B)

2004-02-18 20:23:01

by Lever, Charles

Subject: RE: NFS suport block sharing

i've also seen this rather puzzling behavior on 2.6.
it certainly doesn't behave as we have come to expect.

"me too"

the idea of the patch makes sense, but you also need
to check for some other flags like intr/nointr, noac,
tcp/udp, and sync, for example.

> -----Original Message-----
> From: Olaf Kirch [mailto:[email protected]]
> Sent: Wednesday, February 18, 2004 11:37 AM
> To: [email protected]
> Subject: [NFS] NFS suport block sharing
>
> Hi,
>
> yesterday, I ran into something that puzzled me quite a bit. I had an
> NFS file system (my home directory) mounted in its normal location. For
> testing purposes, I wanted to mount it a second time, but with a different
> set of options (hard instead of soft, etc).
>
> To my surprise, the second mount continued to act like the original
> mount, as if it had been mounted with -o soft. /etc/mtab showed the
> options I had specified, but /proc/mounts showed that it indeed used
> soft retransmits.
>
> To make a long story short, I think _if_ we do NFS super block
> sharing, we should make sure we don't reuse an existing
> sb if it has different options. Attached is a patch that implements
> this, at least for the most common set of options.
>
> Comments?
>
> Olaf
> --
> Olaf Kirch | Stop wasting entropy - start using predictable
> [email protected] | tempfile names today!
> ---------------+
>


-------------------------------------------------------
SF.Net is sponsored by: Speed Start Your Linux Apps Now.
Build and deploy apps & Web services for Linux with
a free DVD software kit from IBM. Click Now!
http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-02-19 00:11:32

by Trond Myklebust

Subject: Re: NFS suport block sharing

On Wed, 18/02/2004 at 08:37, Olaf Kirch wrote:
> To make a long story short, I think _if_ we do NFS super block
> sharing, we should make sure we don't reuse an existing
> sb if it has different options. Attached is a patch that implements
> this, at least for the most common set of options.
>
> Comments?

I'd prefer not.

We should aim only to have 1 super block per filesystem. Anything else
will lead to inode aliasing, and hence potential data corruption problems
when two processes try to write to the same file on different super
blocks.
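
The aliasing problem can be illustrated with a deliberately simplified model. Everything here is an assumption made for the sake of the example: a single cached page per super block, whole-page writeback, and no attribute revalidation between the two caches. A real client is more careful than this, but two caches that do not know about each other face the same basic hazard.

```python
PAGE_SIZE = 8


class Server:
    """Holds the authoritative copy of one small file."""

    def __init__(self, data):
        self.data = bytearray(data)


class CachedSuperBlock:
    """One client-side cache of the file, holding a single page."""

    def __init__(self, server):
        self.server = server
        self.page = None
        self.dirty = False

    def read(self):
        if self.page is None:  # fill the cache on first access only
            self.page = bytearray(self.server.data[:PAGE_SIZE])
        return bytes(self.page)

    def write(self, offset, payload):
        self.read()
        self.page[offset:offset + len(payload)] = payload
        self.dirty = True

    def flush(self):
        if self.dirty:  # assumed: the whole page is written back
            self.server.data[:PAGE_SIZE] = self.page
            self.dirty = False


def demo():
    server = Server(b"AAAABBBB")
    sb1, sb2 = CachedSuperBlock(server), CachedSuperBlock(server)
    sb1.read()
    sb2.read()
    sb1.write(0, b"1111")  # process 1 updates the first half
    sb2.write(4, b"2222")  # process 2 updates the second half
    sb1.flush()            # server now holds 1111BBBB
    sb2.flush()            # server now holds AAAA2222
    return bytes(server.data), sb1.read()


if __name__ == "__main__":
    print(demo())
```

After both flushes the write made through the first cache is gone from the server, while the first cache still serves its own, now stale, copy. With a single shared super block both writes would land in the same cached page.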

Is there really any sane application that requires you to have different
rsize/wsize for the same filesystem?

Cheers,
Trond






2004-02-19 02:33:18

by Lever, Charles

Subject: RE: NFS suport block sharing

> We should aim only to have 1 super block per filesystem. Anything else
> will lead to inode aliasing, and hence potential data corruption problems
> when two processes try to write to the same file on different super
> blocks.
>
> Is there really any sane application that requires you to have different
> rsize/wsize for the same filesystem?

an application suite may want to mount one file system with
forcedirectio (if such an option might exist), and another
normally. with the shared super block scheme, this is no
longer possible.

another typical use for having two unshared cached copies of
a file is to test NFS client caching (and yes, we do this
all the time).

so there are some "sane" cases i can think of where separate
caches and/or mount options for the same export make some
sense.

if nothing else, /proc/mounts and /etc/mtab should reflect
the true mount options in effect, and there should be some
explanation of this bizarre phenomenon in the man page. the
way it works now is totally confusing.



2004-02-19 04:44:38

by Trond Myklebust

Subject: RE: NFS suport block sharing

On Wed, 18/02/2004 at 18:27, Lever, Charles wrote:
> > Is there really any sane application that requires you to
> > have different
> > rsize/wsize for the same filesystem?
>
> an application suite may want to mount one file system with
> forcedirectio (if such an option might exist), and another
> normally. with the shared super block scheme, this is no
> longer possible.

Sure it is. Just add the equivalent of Olaf's patch for the
forcedirectio flag into the external patch that adds forcedirectio
support. "forcedirectio" does indeed not care about inode aliasing, but it
is the *only* such case.

> another typical use for having two unshared cached copies of
> a file is to test NFS client caching (and yes, we do this
> all the time).

<Cough, splutter>...

That sort of testing can be done using 2 real clients, or with your own
patch. Why force insane semantics on everyone else just in order to allow
for some doubtful testing convenience on behalf of the NFS
developers?

> if nothing else, /proc/mounts and /etc/mtab should reflect
> the true mount options in effect, and there should be some
> explanation of this bizarre phenomenon in the man page. the
> way it works now is totally confusing.

/proc/mounts will always reflect the true mount options. It should just
read that information directly from the superblock.

/etc/mtab is not under kernel control, and has always been broken w.r.t.
NFS options: try typing things like "mount -oremount,tcp /mnt" for
instance, or see the behaviour when you type insane values for
rsize/wsize.
Sure we can document this, but I don't see it as an argument for
breaking the kernel cache consistency by opening the door to inode aliasing.

Cheers,
Trond






2004-02-19 09:11:18

by Olaf Kirch

Subject: Re: NFS suport block sharing

On Thu, Feb 19, 2004 at 05:39:06AM +0100, Trond Myklebust wrote:
> Sure it is. Just add the equivalent of Olaf's patch for the
> forcedirectio flag into the external patch that adds forcedirectio
> support. "forcedirectio" does indeed not care about inode aliasing, but it
> is the *only* such case.

But it is non-intuitive. You mount something with one set of flags,
but get a totally different behavior. That is arguably a bug.

I concede that we're arguing a very rarely used setup here, so I'm not
going to be religious about it :-)

Let me state my point though: how many people actually do mount a file
system twice? And if they do, wouldn't that be exactly _because_ they
want different semantics on the two mounts?

In general I think sharing the super block itself is not a good idea,
even for block file systems, because flags such as ro and sync get
ignored as well. These flags, as well as the RPC transport stuff,
might be better placed in the vfsmount as they're really per-mount,
not per-filesystem.

Olaf
--
Olaf Kirch | Stop wasting entropy - start using predictable
[email protected] | tempfile names today!
---------------+



2004-02-19 14:57:38

by Lever, Charles

Subject: RE: NFS suport block sharing

> On Thu, Feb 19, 2004 at 05:39:06AM +0100, Trond Myklebust wrote:
> > Sure it is. Just add the equivalent of Olaf's patch for the
> > forcedirectio flag into the external patch that adds forcedirectio
> > support. "forcedirectio" does indeed not care about inode aliasing,
> > but it is the *only* such case.
>
> But it is non-intuitive. You mount something with one set of flags,
> but get a totally different behavior. That is arguably a bug.
>
> I concede that we're arguing a very rarely used setup here, so I'm not
> going to be religious about it :-)
>
> Let me state my point though: how many people actually do mount a file
> system twice? And if they do, wouldn't that be exactly _because_ they
> want different semantics on the two mounts?

exactly.

> In general I think sharing the super block itself is not a good idea,
> even for block file systems, because flags such as ro and sync get
> ignored as well. These flags, as well as the RPC transport stuff,
> might be better placed in the vfsmount as they're really per-mount,
> not per-filesystem.

i have to agree completely.

there are cases where the dentry/inode & page cache can/should be
shared for two mounted file systems, but that does not match the
set of cases where it is appropriate to share mount options (which
is almost never). the mount option behavior is just weird and
should be fixed. i can't think of another client implementation
that behaves this way.

i object to placing these options in vfsmount, though. the client
can't get to the vfsmount struct in any easy way. (we hit this
problem trying to implement the nfs client metrics patch -- no
way for the client to find out what export a given file system
is mounted on subsequent to the mount operation, for example).



2004-02-19 16:43:42

by Trond Myklebust

Subject: Re: NFS suport block sharing

On Thu, 19/02/2004 at 01:05, Olaf Kirch wrote:
> On Thu, Feb 19, 2004 at 05:39:06AM +0100, Trond Myklebust wrote:
> > Sure it is. Just add the equivalent of Olaf's patch for the
> > forcedirectio flag into the external patch that adds forcedirectio
> > support. "forcedirectio" does indeed not care about inode aliasing,
> > but it is the *only* such case.
>
> But it is non-intuitive. You mount something with one set of flags, but
> get a totally different behavior. That is arguably a bug.

So is data corruption. Arguably more so ;-)

> Let me state my point though: how many people actually do mount a file
> system twice? And if they do, wouldn't that be exactly _because_ they
> want different semantics on the two mounts?

Possibly, but will they want that given the above mentioned expense? The
last thing I want is a bunch of "but when I do this on Solaris, it doesn't
corrupt my file" type of whines. Better then to tell people straight
upfront that this won't work.

> In general I think sharing the super block itself is not a good idea,
> even for block file systems, because flags such as ro and sync get
> ignored as well. These flags, as well as the RPC transport stuff, might
> be better placed in the vfsmount as they're really per-mount, not
> per-filesystem.

Some things (rsize/wsize/acdir*/acreg* in particular come to mind) simply
do not make sense in the vfsmount structure because they are inherently
associated with the inode/data space rather than the name space.
"ro" and "sync" could possibly be implemented in a more sane manner, since
you can do the checks at file open time when you do have the full
namespace information. Most others, though (especially those parameters
related to attribute caching) have to be shared.
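
The split described above might look roughly like this in a toy model. The class names, the open_file helper, and the choice of fields are hypothetical and only illustrate the division: transfer sizes live with the shared super block, while a read-only flag lives with the individual mount and is checked when a file is opened.

```python
import errno


class SuperBlock:
    """Shared per-filesystem state: tied to the inode/data space."""

    def __init__(self, rsize, wsize):
        self.rsize = rsize
        self.wsize = wsize


class Mount:
    """Per-mount state: tied to one place in the name space."""

    def __init__(self, sb, path, read_only=False):
        self.sb = sb
        self.path = path
        self.read_only = read_only


def open_file(mount, name, for_write):
    # The mount is known at open time, so the per-mount flag can be
    # enforced here without giving each mount its own cache.
    if for_write and mount.read_only:
        raise OSError(errno.EROFS, "Read-only file system")
    return {
        "path": f"{mount.path}/{name}",
        "rsize": mount.sb.rsize,  # always taken from the shared sb
        "wsize": mount.sb.wsize,
    }


if __name__ == "__main__":
    sb = SuperBlock(rsize=8192, wsize=8192)
    rw = Mount(sb, "/home")
    ro = Mount(sb, "/mnt/test", read_only=True)
    print(open_file(rw, "notes.txt", for_write=True))
    print(open_file(ro, "notes.txt", for_write=False))
```

Both mounts see the same rsize/wsize because there is only one super block, yet writes through the second mount are refused.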

Note, though, that Linux has *never* previously supported mounting the
same filesystem twice precisely because of the problems of cache
consistency. When Al added in the current scheme, he changed that by
giving a second mount of the same filesystem the same properties as
"mount --bind". AFAICS that is the only sane scheme.

Cheers,
Trond






2004-02-20 13:32:08

by barrie_spence

Subject: RE: NFS suport block sharing




> Let me state my point though: how many people actually do mount a file
> system twice? And if they do, wouldn't that be exactly _because_ they
> want different semantics on the two mounts?

We use automount extensively. Yes, all the home directories would be
expected to share common mount options, but the remote filesystem may
contain more than home directories and we may mount separate parts
of the filesystems in different ways in different places.

Obviously, I wouldn't expect to use different rsize/wsize.

Barrie

