LinuxLists.cc - Union mounts, NFS, and locking

2009-07-14 17:50:05

Subject: Union mounts, NFS, and locking

For now, the union mount developers are assuming that the bottom layer
of a union mount is read-only. This avoids the directory
topology-related problems summarized in this thread:

http://lkml.org/lkml/2008/1/17/4

The problem is making a file system really really read-only and not
letting someone else come along and start sneakily modifying it
underneath us. My superblock readonly users patch
(http://lkml.org/lkml/2009/7/13/243) fixes that for local file
systems, but I'm not sure how to handle NFS mounts as the bottom
layer. The client can mount it read-only, but that puts no
restrictions on the server. However, the kernel can already deal with
things unexpectedly moving/going away on an NFS mount - ESTALE - and
the dentry/inode/etc. won't suddenly disappear in the way they can
with local file systems. Is this good enough for union mounts? Or do
we need to get the NFS server to promise that the exported file system
is really really read-only too?

(Yes, this depends on the actual concrete union mount locking scheme,
but I'm more interested in whether it can or cannot be solved in
principle.)

-VAL

2009-07-14 20:37:19

by Erez Zadok

Subject: Re: Union mounts, NFS, and locking

In message <20090714201940.GF27582@shell>, Valerie Aurora writes:
> On Tue, Jul 14, 2009 at 02:19:26PM -0400, Erez Zadok wrote:
> > In message <20090714174828.GE27582@shell>, Valerie Aurora writes:
[...]
> > IMHO it's not worth at this stage to try and solve that problem in an
> > end-to-end manner (client to server). For a unioning layer to have to worry
> > about every possible change in any of the layers below it, is no different
> > than for every possible network-filesystem client to be able to guarantee
> > that nothing ever changes on the server unexpectedly: they don't, so why
> > should you have to solve this problem now? Not that I don't think it's an
> > important problem---I just don't see why *you* should have to solve this and
> > not the network-filesystem community: whatever solution that can come up,
> > can be applicable to any unioning layer. In the mean time, do the best you
> > can (e.g., ESTALE, readonly superblocks, etc.).
>
> Okay, so my best idea for a solution is to introduce a new NFS mount
> option that means the server promises that the exported file system is
> read-only (using superblock read-only count scheme locally). E.g.:

How would the server be able to guarantee that? Are you planning to change
the protocol or implementation somehow? Are you assuming that the server
will be running linux w/ special r/o sb support? If so, it won't work on
other platforms (NFS is supposed to be interoperable in principle :-)

Without a protocol change, such an option (if I understood you), is at best
a server promise to "behave nice."

In dealing with Unionfs, I've already had to face some really annoying
cases related to this. For example, when the client mounts read-write, but
the server does *any* combination of these two:

1. export readonly or readwrite
2. the native f/s exported can be locally mounted r/o or r/w.

Turns out that servers in that case will return any of EROFS, EACCES, EPERM,
and even ESTALE. So this is annoying to have to detect: a true permission
error should be returned to the user in unionfs, but a readonly access
should result in a copyup. I don't believe this behavior was standardized
in v2/v3.
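
The distinction above (a real permission error goes back to the user, a
read-only lower layer triggers a copyup) could be sketched roughly as
follows. This is only an illustration, not Unionfs code: the enum, the
function name, and the known_readonly flag are all hypothetical, and
treating EACCES/EPERM as "read-only" when the layer is known to be
read-only is an assumption, as is revalidating on ESTALE.

```c
#include <errno.h>

/* Hypothetical actions a unioning layer might take; not Unionfs names. */
enum lower_write_action {
    LOWER_COPYUP,      /* lower layer is read-only: copy the file up */
    LOWER_RETURN_ERR,  /* genuine failure: hand the error to the user */
    LOWER_REVALIDATE   /* handle went stale: look the object up again */
};

/*
 * Classify the errno from a failed write-type operation on the lower layer.
 * known_readonly is nonzero when the layer is known (or promised) to be
 * read-only; that knowledge is what a mount option would have to supply.
 */
enum lower_write_action classify_lower_error(int err, int known_readonly)
{
    switch (err) {
    case EROFS:
        /* Unambiguous: the lower layer refused because it is read-only. */
        return LOWER_COPYUP;
    case EACCES:
    case EPERM:
        /*
         * Ambiguous over NFS: may be a true permission error or a
         * read-only export. Assumption: trust the read-only hint if set.
         */
        return known_readonly ? LOWER_COPYUP : LOWER_RETURN_ERR;
    case ESTALE:
        return LOWER_REVALIDATE;
    default:
        return LOWER_RETURN_ERR;
    }
}
```

Without the hint, EACCES and EPERM have to be passed through as real
errors, which is exactly the case that is annoying to get right.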

So, in retrospect, it would be *great* if I had a client-side mount option
that could guarantee that the server is exporting/mounting readonly. But I
feel that for such a client-side option to work, some sort of information
has to flow b/t the server and the client to validate this readonly
assertion.

> /etc/exports:
> /client_root_fs thin-client-*.local.domain(server_ro,no_root_squash)
>
> Trond, is this super-gross or totally reasonable? Seems like we add
> new NFS mount options at the drop of a hat.
>
> -VAL

Erez.

2009-07-14 22:56:35

by Myklebust, Trond

Subject: Re: Union mounts, NFS, and locking

On Tue, 2009-07-14 at 18:33 -0400, Erez Zadok wrote:
> How would the client detect that the server broke the promise? In theory,
> your client may never know b/c it'll never send the server any
> state-changing ops (e.g., creat, write, unlink). One really ugly idea might
> be for the client to try and create a dummy .nfsXXXXXX file on the server,
> and if that succeeds, or the error returned isn't EROFS, the client can guess
> that the server's misbehaving.

That still doesn't guarantee anything:

cat /etc/exports
/export 10.0.0.0/24(ro,sync) 10.0.1.1(rw,sync)
/export/home 10.0.0.0/24(sec=krb5i:krb5p,rw,sec=sys:krb5,ro)

Both of the above are liable to return EROFS to some clients, but not
others...

NFSv4.1 directory delegations can do the job of notifying you if the
directory contents change, but what should your unionfs do when it gets
told that this is the case?

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-07-14 18:19:54

by Erez Zadok

Subject: Re: Union mounts, NFS, and locking

In message <20090714174828.GE27582@shell>, Valerie Aurora writes:
[...]
> (Yes, this depends on the actual concrete union mount locking scheme,
> but I'm more interested in whether it can or cannot be solved in
> principle.)

Val,

To solve this "in principle" would require a semantic change to all
network-based file systems (not just NFS). You'll find yourself deep inside
age-old distributed systems and distributed locking issues---hard problems
(you've got plenty to worry about w/o having to redefine NFS semantics :-)

IMHO it's not worth at this stage to try and solve that problem in an
end-to-end manner (client to server). For a unioning layer to have to worry
about every possible change in any of the layers below it, is no different
than for every possible network-filesystem client to be able to guarantee
that nothing ever changes on the server unexpectedly: they don't, so why
should you have to solve this problem now? Not that I don't think it's an
important problem---I just don't see why *you* should have to solve this and
not the network-filesystem community: whatever solution that can come up,
can be applicable to any unioning layer. In the mean time, do the best you
can (e.g., ESTALE, readonly superblocks, etc.).

For example, I cannot see how this can be solved cleanly and *portably* for
established protocols such as nfsv2/3. For v4, it might be possible to
enforce it with new callbacks which can propagate from the v4 client all the
way up to the VFS (and thus Union Mounts). If you're going to try and
tackle the cache-coherency beast, I strongly suggest restricting the problem
to something as "simple" and manageable as a new NFSv4 CB.

Cheers,
Erez.

2009-07-14 20:20:03

by Valerie Aurora

Subject: Re: Union mounts, NFS, and locking

On Tue, Jul 14, 2009 at 02:19:26PM -0400, Erez Zadok wrote:
> In message <20090714174828.GE27582@shell>, Valerie Aurora writes:
> [...]
> > (Yes, this depends on the actual concrete union mount locking scheme,
> > but I'm more interested in whether it can or cannot be solved in
> > principle.)
>
> Val,
>
> To solve this "in principle" would require a semantic change to all
> network-based file systems (not just NFS). You'll find yourself deep inside
> age-old distributed systems and distributed locking issues---hard problems
> (you've got plenty to worry about w/o having to redefine NFS semantics :-)

Makes sense.

> IMHO it's not worth at this stage to try and solve that problem in an
> end-to-end manner (client to server). For a unioning layer to have to worry
> about every possible change in any of the layers below it, is no different
> than for every possible network-filesystem client to be able to guarantee
> that nothing ever changes on the server unexpectedly: they don't, so why
> should you have to solve this problem now? Not that I don't think it's an
> important problem---I just don't see why *you* should have to solve this and
> not the network-filesystem community: whatever solution that can come up,
> can be applicable to any unioning layer. In the mean time, do the best you
> can (e.g., ESTALE, readonly superblocks, etc.).

Okay, so my best idea for a solution is to introduce a new NFS mount
option that means the server promises that the exported file system is
read-only (using superblock read-only count scheme locally). E.g.:

/etc/exports:
/client_root_fs thin-client-*.local.domain(server_ro,no_root_squash)

Trond, is this super-gross or totally reasonable? Seems like we add
new NFS mount options at the drop of a hat.

-VAL

2009-07-14 22:05:34

by Valerie Aurora

Subject: Re: Union mounts, NFS, and locking

On Tue, Jul 14, 2009 at 04:36:40PM -0400, Erez Zadok wrote:
> In message <20090714201940.GF27582@shell>, Valerie Aurora writes:
>
> > Okay, so my best idea for a solution is to introduce a new NFS mount
> > option that means the server promises that the exported file system is
> > read-only (using superblock read-only count scheme locally). E.g.:
>
> How would the server be able to guarantee that? Are you planning to change
> the protocol or implementation somehow? Are you assuming that the server
> will be running linux w/ special r/o sb support? If so, it won't work on
> other platforms (NFS is supposed to be interoperable in principle :-)
>
> Without a protocol change, such an option (if I understood you), is at best
> a server promise to "behave nice."

Yeah, it's just a promise, one that the NFS server shouldn't make if
it can't implement it. The client's sole responsibility is to fail
gracefully if the server breaks its promise.

The "protocol change" can probably be limited to a new NFS mount
option and the error returned if the server can't implement this mount
option.

> In dealing with Unionfs, I've already had to face some really annoying
> cases related to this. For example, when the client mounts read-write, but
> the server does *any* combination of these two:
>
> 1. export readonly or readwrite
> 2. the native f/s exported can be locally mounted r/o or r/w.
>
> Turns out that servers in that case will return any of EROFS, EACCES, EPERM,
> and even ESTALE. So this is annoying to have to detect: a true permission
> error should be returned to the user in unionfs, but a readonly access
> should result in a copyup. I don't believe this behavior was standardized
> in v2/v3.
>
> So, in retrospect, it would be *great* if I had a client-side mount option
> that could guarantee that the server is exporting/mounting readonly. But I
> feel that for such a client-side option to work, some sort of information
> has to flow b/t the server and the client to validate this readonly
> assertion.

I'm glad it would be useful in other cases!

-VAL

2009-07-14 22:34:01

by Erez Zadok

Subject: Re: Union mounts, NFS, and locking

In message <20090714220515.GH27582@shell>, Valerie Aurora writes:
> On Tue, Jul 14, 2009 at 04:36:40PM -0400, Erez Zadok wrote:
> > In message <20090714201940.GF27582@shell>, Valerie Aurora writes:
> >
> > > Okay, so my best idea for a solution is to introduce a new NFS mount
> > > option that means the server promises that the exported file system is
> > > read-only (using superblock read-only count scheme locally). E.g.:
> >
> > How would the server be able to guarantee that? Are you planning to change
> > the protocol or implementation somehow? Are you assuming that the server
> > will be running linux w/ special r/o sb support? If so, it won't work on
> > other platforms (NFS is supposed to be interoperable in principle :-)
> >
> > Without a protocol change, such an option (if I understood you), is at best
> > a server promise to "behave nice."
>
> Yeah, it's just a promise, one that the NFS server shouldn't make if
> it can't implement it. The client's sole responsibility is to fail
> gracefully if the server breaks its promise.
[...]

How would the client detect that the server broke the promise? In theory,
your client may never know b/c it'll never send the server any
state-changing ops (e.g., creat, write, unlink). One really ugly idea might
be for the client to try and create a dummy .nfsXXXXXX file on the server,
and if that succeeds, or the error returned isn't EROFS, the client can guess
that the server's misbehaving.
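
A userspace sketch of that probe, assuming a POSIX client; the function
names, the enum, and the probe file name are made up for illustration,
and a real check would live in the NFS client rather than in a helper
like this. It is only a heuristic: the result describes what this one
client is allowed to do, not what the server permits for anyone else.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

enum probe_result {
    PROBE_READONLY,    /* create failed with EROFS */
    PROBE_WRITABLE,    /* create succeeded: the server accepted a write */
    PROBE_OTHER_ERROR  /* some other error; the promise is unconfirmed */
};

/* Map the outcome of the create attempt onto a probe result. */
enum probe_result interpret_probe(int fd, int err)
{
    if (fd >= 0)
        return PROBE_WRITABLE;
    return err == EROFS ? PROBE_READONLY : PROBE_OTHER_ERROR;
}

/* Try to create (and immediately remove) a dummy file in dir. */
enum probe_result probe_readonly(const char *dir)
{
    char path[4096];
    int n, fd, err;

    n = snprintf(path, sizeof(path), "%s/.nfs-ro-probe-%ld",
                 dir, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof(path))
        return PROBE_OTHER_ERROR;

    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    err = errno;           /* only meaningful when fd < 0 */
    if (fd >= 0) {
        close(fd);
        unlink(path);      /* clean up the dummy file */
    }
    return interpret_probe(fd, err);
}
```

Under the heuristic above, both PROBE_WRITABLE and PROBE_OTHER_ERROR
would count as "the server may be misbehaving".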

Erez.

2009-07-15 00:19:33

by Valerie Aurora

Subject: Re: Union mounts, NFS, and locking

On Tue, Jul 14, 2009 at 06:33:27PM -0400, Erez Zadok wrote:
> In message <20090714220515.GH27582@shell>, Valerie Aurora writes:
> > On Tue, Jul 14, 2009 at 04:36:40PM -0400, Erez Zadok wrote:
> > > In message <20090714201940.GF27582@shell>, Valerie Aurora writes:
> > >
> > > > Okay, so my best idea for a solution is to introduce a new NFS mount
> > > > option that means the server promises that the exported file system is
> > > > read-only (using superblock read-only count scheme locally). E.g.:
> > >
> > > How would the server be able to guarantee that? Are you planning to change
> > > the protocol or implementation somehow? Are you assuming that the server
> > > will be running linux w/ special r/o sb support? If so, it won't work on
> > > other platforms (NFS is supposed to be interoperable in principle :-)
> > >
> > > Without a protocol change, such an option (if I understood you), is at best
> > > a server promise to "behave nice."
> >
> > Yeah, it's just a promise, one that the NFS server shouldn't make if
> > it can't implement it. The client's sole responsibility is to fail
> > gracefully if the server breaks its promise.
> [...]
>
> How would the client detect that the server broke the promise?

The same way we detect other NFS server bugs - frequently by crashing
the client, but in the best case hitting code like:

fs/nfs/inode.c:

	if (!fattr->nlink) {
		printk("NFS: Buggy server - nlink == 0!\n");
		goto out_no_inode;
	}

If the server happily mounts with the server_ro option, and then
writes to the exported file system, it's buggy. Deal accordingly.

-VAL

2009-07-15 17:28:16

by J. Bruce Fields

Subject: Re: Union mounts, NFS, and locking

On Tue, Jul 14, 2009 at 06:05:16PM -0400, Valerie Aurora wrote:
> On Tue, Jul 14, 2009 at 04:36:40PM -0400, Erez Zadok wrote:
> > In message <20090714201940.GF27582@shell>, Valerie Aurora writes:
> >
> > > Okay, so my best idea for a solution is to introduce a new NFS mount
> > > option that means the server promises that the exported file system is
> > > read-only (using superblock read-only count scheme locally). E.g.:

Language nitpick: the term "read-only" is confusing. Files
(/proc/mounts) and filesystems (nfs) that are "read-only" can still
change.

I'd be happier with "unchanging" or "constant" or "static".

> >
> > How would the server be able to guarantee that? Are you planning to change
> > the protocol or implementation somehow? Are you assuming that the server
> > will be running linux w/ special r/o sb support? If so, it won't work on
> > other platforms (NFS is supposed to be interoperable in principle :-)
> >
> > Without a protocol change, such an option (if I understood you), is at best
> > a server promise to "behave nice."
>
> Yeah, it's just a promise, one that the NFS server shouldn't make if
> it can't implement it. The client's sole responsibility is to fail
> gracefully if the server breaks its promise.
>
> The "protocol change" can probably be limited to a new NFS mount
> option and the error returned if the server can't implement this mount
> option.

The mount options aren't really in the protocol--so it'd probably take
the form of a filesystem-granularity attribute that the client could
query (and then fail the mount if the client didn't like the answer).

But even then: the fact is that someone will want to update the
filesystem some day. And there's no way to force every client
administrator to remount. So we'd have to decide how to handle that
case.

--b.

2009-07-16 00:15:41

by Erez Zadok

Subject: Re: Union mounts, NFS, and locking

In message <1247612140.5332.11.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>, Trond Myklebust writes:
> On Tue, 2009-07-14 at 18:33 -0400, Erez Zadok wrote:
> > How would the client detect that the server broke the promise? In theory,
> > your client may never know b/c it'll never send the server any
> > state-changing ops (e.g., creat, write, unlink). One really ugly idea might
> > be for the client to try and create a dummy .nfsXXXXXX file on the server,
> > and if that succeeds, or the error returned isn't EROFS, the client can guess
> > that the server's misbehaving.
>
> That still doesn't guarantee anything:
>
> cat /etc/exports
> /export 10.0.0.0/24(ro,sync) 10.0.1.1(rw,sync)
> /export/home 10.0.0.0/24(sec=krb5i:krb5p,rw,sec=sys:krb5,ro)

Agreed. One more example why trying to enforce read-only-ness (or any other
term Bruce prefers :-) is going to be too hard to do at this stage. I think
we can live with a client-side "promise" via a mount option for now.

> Both of the above are liable to return EROFS to some clients, but not
> others...
>
> NFSv4.1 directory delegations can do the job of notifying you if the
> directory contents change, but what should your unionfs do when it gets
> told that this is the case?

Good to know about these delegations in 4.1. They could be useful for any
stackable layer including union mounts and even ecryptfs, to handle cache
coherency across the layers more gracefully. At the very least it could be
used to purge stale caches. Even w/ union mounts, some operations could be
expensive: e.g., directory name merging and duplicate elimination; hence
it's likely the result of such a merge would be cached somewhere. This
cache can get stale if a directory that's part of the name merge has changed
on the server. So a unioning solution might want to know this.
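
As a rough sketch of what "purge stale caches" could mean for a cached
name merge (everything here is hypothetical: the struct, its fields, and
both function names are invented for illustration and do not come from
union mounts or Unionfs):

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical cache of one merged directory listing. */
struct merged_dir_cache {
    char **names;        /* merged, de-duplicated entry names */
    size_t count;        /* number of entries in names */
    time_t lower_mtime;  /* lower directory mtime when the merge was built */
    bool valid;          /* cleared when the lower layer reports a change */
};

/* Called when the lower layer reports that the directory changed. */
void merged_dir_invalidate(struct merged_dir_cache *c)
{
    c->valid = false;
}

/*
 * Polling fallback for layers that send no notifications: the cache is
 * usable only if it was not invalidated and the lower directory's mtime
 * still matches the one recorded at merge time.
 */
bool merged_dir_is_fresh(const struct merged_dir_cache *c,
                         time_t lower_mtime_now)
{
    return c->valid && c->lower_mtime == lower_mtime_now;
}
```

A notification from the lower file system would call
merged_dir_invalidate(); without notifications the unioning layer is
left with the mtime comparison on every use.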

(Directory delegations may also be more efficient than the traditional
NFSv2/3 way of validating the attrcache; IIRC, the client has to check that
the parent dir mtime hasn't changed before using a cached attribute.)

BTW, will nfs4.1's directory delegations be able to inform about namespace
changes only, or would they also be able to inform the client about file
data which had changed?

> Cheers
> Trond
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com

Thanks,
Erez.

2009-07-16 17:26:06

by Valerie Aurora

Subject: Re: Union mounts, NFS, and locking

On Wed, Jul 15, 2009 at 01:27:59PM -0400, J. Bruce Fields wrote:
> On Tue, Jul 14, 2009 at 06:05:16PM -0400, Valerie Aurora wrote:
> > On Tue, Jul 14, 2009 at 04:36:40PM -0400, Erez Zadok wrote:
> > > In message <20090714201940.GF27582@shell>, Valerie Aurora writes:
> > >
> > > > Okay, so my best idea for a solution is to introduce a new NFS mount
> > > > option that means the server promises that the exported file system is
> > > > read-only (using superblock read-only count scheme locally). E.g.:
>
> Language nitpick: the term "read-only" is confusing. Files
> (/proc/mounts) and filesystems (nfs) that are "read-only" can still
> change.
>
> I'd be happier with "unchanging" or "constant" or "static".

What about "immutable"? It should be familiar from inode attributes.

> The mount options aren't really in the protocol--so it'd probably take
> the form of a filesystem-granularity attribute that the client could
> query (and then fail the mount if the client didn't like the answer).
>
> But even then: the fact is that someone will want to update the
> filesystem some day. And there's no way to force every client
> administrator to remount. So we'd have to decide how to handle that
> case.

Agreed. I'm no NFS expert, but I think that treating it as if the NFS
server had stopped responding might be a start?

-VAL

2009-07-16 22:14:57

by David P. Quigley

Subject: Re: Union mounts, NFS, and locking

On Wed, 2009-07-15 at 13:27 -0400, J. Bruce Fields wrote:
[snip]
> The mount options aren't really in the protocol--so it'd probably take
> the form of a filesystem-granularity attribute that the client could
> query (and then fail the mount if the client didn't like the answer).
>
> But even then: the fact is that someone will want to update the
> filesystem some day. And there's no way to force every client
> administrator to remount. So we'd have to decide how to handle that
> case.

So currently this is the case but at the last IETF meeting I proposed a
remount callback to handle the case of a mass file relabel on the
server. I think Beepy wrote it down on the possible 4.2 items. However I
wouldn't expect to see anything related to that for a while and that
assumes that someone picks up the ball and runs with it to begin with.

Dave

P.S. Note this is NFSv4 we are talking about; I don't have a solution for
v2 (does anyone even use it any more?) or v3.