2011-11-11 23:05:53

by Matthew Treinish

Subject: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

This patch series implements client side support for volatile file handle
recovery (RFC 3530 sections 4.2 and 4.3) with walk back using the dcache. To
test the client you either need a server that supports volatile file handles or
you can hard code the server to output NFS4ERR_FHEXPIRED instead of
NFSERR_STALE. (See the last patch in the series)

The approach used here for recovery is to perform a lookup for each file handle
that receives a FHEXPIRED error. If that lookup also fails with FHEXPIRED, the
client uses the dcache to recursively walk back to the root of the mount, and
recovers the root using get_root.

Simple testing has shown that this approach works and will correctly recover
from a FHEXPIRED error code. However, the current implementation uses
d_obtain_alias if an nfs4_proc function is only given an inode. When hardlinks
are involved, this results in getting a path, but not necessarily a path to
which the user has access, which might lead to permission issues.

Since the RFC does not mandate how to recover from FHEXPIRED, I would
assume this approach solves the majority of generic use cases, but I think
that some more discussion is needed on this topic. Also, considering that this
is my first kernel patch set, I think it definitely needs review.

Matthew Treinish (7):
New mount option for volatile filehandle recovery
Added support for FH_EXPIRE_TYPE attribute.
Add VFS objects from nfs4_proc calls into nfs4_exception.
Save root file handle in nfs_server.
Added VFH FHEXPIRED recovery functions.
Perform recovery on both inodes for rename.
Added error handling for NFS4ERR_FHEXPIRED

fs/nfs/client.c | 3 +
fs/nfs/getroot.c | 7 ++
fs/nfs/nfs4_fs.h | 2 +
fs/nfs/nfs4proc.c | 245 +++++++++++++++++++++++++++++++++++++++------
fs/nfs/nfs4xdr.c | 27 +++++
fs/nfs/super.c | 6 +
include/linux/nfs_fs_sb.h | 2 +
include/linux/nfs_mount.h | 1 +
include/linux/nfs_xdr.h | 1 +
9 files changed, 265 insertions(+), 29 deletions(-)

--
1.7.4.4
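
The walk-back described in the cover letter can be illustrated outside the kernel. The following is a minimal userspace sketch, not the code from the series: `sim_node`, `sim_lookup`, `sim_get_root` and `sim_recover` are hypothetical names, and the "server" is just a pair of integers per node. It assumes that a lookup fails with FHEXPIRED exactly when the parent's cached handle is stale, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch only. */
enum { SIM_OK = 0, SIM_FHEXPIRED = 10014 };

/* Toy stand-in for a dentry together with its cached filehandle. */
struct sim_node {
	struct sim_node *parent;  /* NULL for the root of the mount */
	const char *name;
	int client_fh;            /* handle the client has cached */
	int server_fh;            /* handle the server currently accepts */
};

static int sim_rpc_count;         /* number of simulated round trips */

/* Simulated get_root: refreshes the root handle unconditionally. */
static int sim_get_root(struct sim_node *root)
{
	sim_rpc_count++;
	root->client_fh = root->server_fh;
	return SIM_OK;
}

/* Simulated LOOKUP of a child by name in its parent directory.
 * Fails with FHEXPIRED if the parent's cached handle is stale. */
static int sim_lookup(struct sim_node *child)
{
	sim_rpc_count++;
	if (child->parent->client_fh != child->parent->server_fh)
		return -SIM_FHEXPIRED;
	child->client_fh = child->server_fh;
	return SIM_OK;
}

/* Recover node's handle, walking back towards the root as needed. */
static int sim_recover(struct sim_node *node)
{
	int err;

	if (node->parent == NULL)         /* end case: root of the mount */
		return sim_get_root(node);

	err = sim_lookup(node);
	if (err == -SIM_FHEXPIRED) {
		/* Parent handle expired too: recover it first, then retry. */
		err = sim_recover(node->parent);
		if (err == SIM_OK)
			err = sim_lookup(node);
	}
	return err;
}
```

With a three-level chain (root, directory, file) whose handles have all expired, `sim_recover()` on the file issues two failed lookups, one get_root, and two successful lookups. The sketch ignores credentials and execution context entirely, which is exactly what the review comments in this thread are concerned about.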



2011-11-12 17:16:55

by Myklebust, Trond

Subject: Re: [PATCH/RFC 5/7] Added VFH FHEXPIRED recovery functions.

On Fri, 2011-11-11 at 19:45 -0800, Malahal Naineni wrote:
> Trond Myklebust [[email protected]] wrote:
> > On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
> > > +static int nfs4_proc_vfh_lookup(struct rpc_clnt *clnt, struct inode *dir,
> > > + struct qstr *name, struct nfs_fh *fhandle, struct nfs_fattr *fattr)
> > > +{
> > > + struct nfs4_exception exception = { };
> > > + int err;
> > > + do {
> > > + int status;
> > > +
> > > + status = _nfs4_proc_lookup(clnt, dir, name, fhandle, fattr);
> > > + switch (status) {
> > > + case -NFS4ERR_BADNAME:
> > > + return -ENOENT;
> > > + case -NFS4ERR_MOVED:
> > > + err = nfs4_get_referral(dir, name, fattr, fhandle);
> > > + break;
> > > + case -NFS4ERR_FHEXPIRED:
> > > + return -NFS4ERR_FHEXPIRED;
> > > + case -NFS4ERR_WRONGSEC:
> > > + nfs_fixup_secinfo_attributes(fattr, fhandle);
> >
> > case -NFS4ERR_ACCESS:
> > ???????
> >
> > > + }
> > > + err = nfs4_handle_exception(NFS_SERVER(dir),
> > > + status, &exception);
> > > + } while (exception.retry);
> > > + return err;
> > > +}
> > > +
> >
> > What execution context is this function going to be running under and
> > what guarantees that it actually has the right file access credentials
> > to allow it to perform a lookup?
>
> I imagine, it is in the context of the process that received FHEXPIRED
> error. It may not have credentials to perform a lookup on parent
> directories. If it doesn't, that would end up with ESTALE with Matt's
> patches, right Matt?

My point is that if you don't have the ability to pass a credential as
an argument, then you won't be able to recover from something like an
OPEN, READ or WRITE, which all happen in the rpciod context, nor can you
recover from the state recovery thread context.

Note also that you are doing synchronous I/O, and so you will need a
recovery thread context anyway in order to recover from stuff running in
the rpciod context...

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-11-12 00:19:41

by Myklebust, Trond

Subject: Re: [PATCH/RFC 1/7] New mount option for volatile filehandle recovery

On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
> The new 'vfhretry' mount option will be used to enable the volatile filehandle
> recovery routines in the client. When a filehandle expires, the client will
> attempt to recover by performing a lookup on the name of the file.
>
> This mechanism of recovery isn't necessarily safe for a POSIX filesystem, so
> using the mount option will allow the user to enable this at their own risk.
> If the mount option is not turned on, the FHEXPIRED error will be converted to
> ESTALE.

Either we handle NFS4ERR_FHEXPIRED, or we don't... What is the
justification for wanting to turn this off on a per-mount basis?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-11-11 23:06:23

by Matthew Treinish

Subject: [PATCH/RFC 1/7] New mount option for volatile filehandle recovery

The new 'vfhretry' mount option will be used to enable the volatile filehandle
recovery routines in the client. When a filehandle expires, the client will
attempt to recover by performing a lookup on the name of the file.

This mechanism of recovery isn't necessarily safe for a POSIX filesystem, so
using the mount option will allow the user to enable this at their own risk.
If the mount option is not turned on, the FHEXPIRED error will be converted to
ESTALE.

Signed-off-by: Matthew Treinish <[email protected]>
---
fs/nfs/super.c | 6 ++++++
include/linux/nfs_mount.h | 1 +
2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 480b3b6..7eef204 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -87,6 +87,7 @@ enum {
Opt_sharecache, Opt_nosharecache,
Opt_resvport, Opt_noresvport,
Opt_fscache, Opt_nofscache,
+ Opt_vfhretry,

/* Mount options that take integer arguments */
Opt_port,
@@ -149,6 +150,7 @@ static const match_table_t nfs_mount_option_tokens = {
{ Opt_noresvport, "noresvport" },
{ Opt_fscache, "fsc" },
{ Opt_nofscache, "nofsc" },
+ { Opt_vfhretry, "vfhretry" },

{ Opt_port, "port=%s" },
{ Opt_rsize, "rsize=%s" },
@@ -650,6 +652,7 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
{ NFS_MOUNT_NORDIRPLUS, ",nordirplus", "" },
{ NFS_MOUNT_UNSHARED, ",nosharecache", "" },
{ NFS_MOUNT_NORESVPORT, ",noresvport", "" },
+ { NFS_MOUNT_VFHRETRY, ",vfhretry", ""},
{ 0, NULL, NULL }
};
const struct proc_nfs_info *nfs_infop;
@@ -1203,6 +1206,9 @@ static int nfs_parse_mount_options(char *raw,
kfree(mnt->fscache_uniq);
mnt->fscache_uniq = NULL;
break;
+ case Opt_vfhretry:
+ mnt->flags |= NFS_MOUNT_VFHRETRY;
+ break;

/*
* options that take numeric values
diff --git a/include/linux/nfs_mount.h b/include/linux/nfs_mount.h
index 576bddd..dba0e23 100644
--- a/include/linux/nfs_mount.h
+++ b/include/linux/nfs_mount.h
@@ -73,5 +73,6 @@ struct nfs_mount_data {

#define NFS_MOUNT_LOCAL_FLOCK 0x100000
#define NFS_MOUNT_LOCAL_FCNTL 0x200000
+#define NFS_MOUNT_VFHRETRY 0x400000

#endif
--
1.7.4.4
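
The behaviour this patch's description promises (recover when 'vfhretry' is set, otherwise convert FHEXPIRED to ESTALE) could be gated roughly as follows. This is a userspace sketch under stated assumptions: the error-handling patch (7/7) is not shown in this thread excerpt, so `sim_handle_fhexpired` and the `sim_recover_*` callbacks are hypothetical, and only the flag value 0x400000 and the error number 10014 are taken from the patch and the NFSv4 protocol respectively.

```c
#include <assert.h>
#include <errno.h>

#define SIM_MOUNT_VFHRETRY    0x400000  /* same bit as NFS_MOUNT_VFHRETRY in the patch */
#define SIM_NFS4ERR_FHEXPIRED 10014     /* NFSv4 protocol error number */

/* Hypothetical recovery callbacks used to exercise the sketch. */
static int sim_recover_ok(void)   { return 0; }
static int sim_recover_fail(void) { return -EIO; }

/*
 * Map an operation's status according to the mount flags:
 *  - anything other than FHEXPIRED is passed through unchanged;
 *  - without the vfhretry flag, FHEXPIRED becomes -ESTALE;
 *  - with the flag, recovery runs; success yields -EAGAIN so the
 *    caller retries the operation, failure yields the recovery error.
 */
static int sim_handle_fhexpired(unsigned long mount_flags, int status,
				int (*recover)(void))
{
	int err;

	if (status != -SIM_NFS4ERR_FHEXPIRED)
		return status;
	if (!(mount_flags & SIM_MOUNT_VFHRETRY))
		return -ESTALE;

	err = recover();
	return err ? err : -EAGAIN;
}
```

Returning -EAGAIN after a successful recovery mirrors what nfs4_fhexpired_recovery() does in patch 5/7 of this series; where exactly the ESTALE conversion happens in the real code is an assumption here.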


2011-11-14 09:10:00

by Mkrtchyan, Tigran

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Sun, Nov 13, 2011 at 7:06 PM, Matthew Treinish
<[email protected]> wrote:
> On Sun, Nov 13, 2011 at 02:54:00PM +1100, NeilBrown wrote:
>> On Sat, 12 Nov 2011 09:49:53 -0500 Christoph Hellwig <[email protected]>
>> wrote:
>>
>> > On Fri, Nov 11, 2011 at 07:13:29PM -0500, Trond Myklebust wrote:
>> > > On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
>> > > > This patch series implements client side support for volatile file handle
>> > > > recovery (RFC 3530 section 4.2 and 4.3) with walk back using the dcache. To
>> > > > test the client you either need a server that supports volatile file handles or
>> > > > you can hard code the server to output NFS4ERR_FHEXPIRED instead of
>> > > > NFSERR_STALE. (See the last patch in the series)
>> > >
>> > > WHY do we want to support this kind of "feature"? As you said, the RFC
>> > > doesn't actually help in figuring out how this crap is supposed to work
>> > > in practice, so why do we even consider starting to give a damn?
>> >
>> > *nod*. Pretending we handle it seems fairly dangerous.  I'd much prefer
>> > outright rejecting it.
>>
>> Hence the suggested mount option.
>>
>> A server might not be able to provide stable file handles, but can ensure
>> that files don't get renamed - for these filesystems, the name is a
>> reliable stable handle for the file (it just doesn't fit in the NFSv4 file
>> handle structure).
>>
>> So if you know the filesystem will only return FHEXPIRED for filehandles
>> belonging to files that cannot be renamed, then it is perfectly reasonable to
>> repeat the name lookup to re-access the file after the server forgets about
>> an old filehandle.  The mount option is how you communicate this knowledge,
>> because the RFC doesn't provide a way to communicate it.
>>
> This was one of 2 reasons for implementing this, and we actually run into this with
> certain z/OS systems, because the z/OS NFS server currently uses FHEXPIRED in this way.
>
> The other thought was that this could be used for migration/replication
> between file synced servers. So, if we wanted to switch/move to another server where
> the file names were the same but all the inode numbers were different you could use
> this to refresh the invalid file handles on the new server.

If my reading of the spec is correct, the spec does not require you to use
the same file handles on the redirected server. This is, of course, if you
want to return NFS4ERR_MOVED.

Tigran.

>
> -Matt Treinish
>
>

2011-11-13 18:06:38

by Matthew Treinish

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Sun, Nov 13, 2011 at 02:54:00PM +1100, NeilBrown wrote:
> On Sat, 12 Nov 2011 09:49:53 -0500 Christoph Hellwig <[email protected]>
> wrote:
>
> > On Fri, Nov 11, 2011 at 07:13:29PM -0500, Trond Myklebust wrote:
> > > On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
> > > > This patch series implements client side support for volatile file handle
> > > > recovery (RFC 3530 section 4.2 and 4.3) with walk back using the dcache. To
> > > > test the client you either need a server that supports volatile file handles or
> > > > you can hard code the server to output NFS4ERR_FHEXPIRED instead of
> > > > NFSERR_STALE. (See the last patch in the series)
> > >
> > > WHY do we want to support this kind of "feature"? As you said, the RFC
> > > doesn't actually help in figuring out how this crap is supposed to work
> > > in practice, so why do we even consider starting to give a damn?
> >
> > *nod*. Pretending we handle it seems fairly dangerous. I'd much prefer
> > outright rejecting it.
>
> Hence the suggested mount option.
>
> A server might not be able to provide stable file handles, but can ensure
> that files don't get renamed - for these filesystems, the name is a
> reliable stable handle for the file (it just doesn't fit in the NFSv4 file
> handle structure).
>
> So if you know the filesystem will only return FHEXPIRED for filehandles
> belonging to files that cannot be renamed, then it is perfectly reasonable to
> repeat the name lookup to re-access the file after the server forgets about
> an old filehandle. The mount option is how you communicate this knowledge,
> because the RFC doesn't provide a way to communicate it.
>
This was one of 2 reasons for implementing this, and we actually run into this with
certain z/OS systems, because the z/OS NFS server currently uses FHEXPIRED in this way.

The other thought was that this could be used for migration/replication
between file synced servers. So, if we wanted to switch/move to another server where
the file names were the same but all the inode numbers were different you could use
this to refresh the invalid file handles on the new server.

-Matt Treinish



2011-11-12 03:45:57

by Malahal Naineni

Subject: Re: [PATCH/RFC 5/7] Added VFH FHEXPIRED recovery functions.

Trond Myklebust [[email protected]] wrote:
> On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
> > +static int nfs4_proc_vfh_lookup(struct rpc_clnt *clnt, struct inode *dir,
> > + struct qstr *name, struct nfs_fh *fhandle, struct nfs_fattr *fattr)
> > +{
> > + struct nfs4_exception exception = { };
> > + int err;
> > + do {
> > + int status;
> > +
> > + status = _nfs4_proc_lookup(clnt, dir, name, fhandle, fattr);
> > + switch (status) {
> > + case -NFS4ERR_BADNAME:
> > + return -ENOENT;
> > + case -NFS4ERR_MOVED:
> > + err = nfs4_get_referral(dir, name, fattr, fhandle);
> > + break;
> > + case -NFS4ERR_FHEXPIRED:
> > + return -NFS4ERR_FHEXPIRED;
> > + case -NFS4ERR_WRONGSEC:
> > + nfs_fixup_secinfo_attributes(fattr, fhandle);
>
> case -NFS4ERR_ACCESS:
> ???????
>
> > + }
> > + err = nfs4_handle_exception(NFS_SERVER(dir),
> > + status, &exception);
> > + } while (exception.retry);
> > + return err;
> > +}
> > +
>
> What execution context is this function going to be running under and
> what guarantees that it actually has the right file access credentials
> to allow it to perform a lookup?

I imagine it is in the context of the process that received the FHEXPIRED
error. It may not have credentials to perform a lookup on parent
directories. If it doesn't, that would end up with ESTALE with Matt's
patches, right Matt?

--Malahal.


2011-11-11 23:06:18

by Matthew Treinish

Subject: [PATCH/RFC 5/7] Added VFH FHEXPIRED recovery functions.

The VFH recovery functions perform a recursive walk back on FHEXPIRED
errors. If an FHEXPIRED error is received during a recovery lookup,
this means that the directory's filehandle has also expired and
needs to be recovered.

The end case for the recursion is if the filehandle is the rootfh.
This means that we recursed back to the root of the export, and we
can just perform a rootfh lookup.

Also added a modified lookup for volatile file handle recovery. This
function will not use the exception handling on FHEXPIRED errors
and just return FHEXPIRED instead.

Signed-off-by: Matthew Treinish <[email protected]>
---
fs/nfs/nfs4proc.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 94 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 20b96cb..50bb823 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -74,6 +74,7 @@ static int _nfs4_recover_proc_open(struct nfs4_opendata *data);
static int nfs4_do_fsinfo(struct nfs_server *, struct nfs_fh *, struct nfs_fsinfo *);
static int nfs4_async_handle_error(struct rpc_task *, const struct nfs_server *, struct nfs4_state *);
static int _nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle, struct nfs_fattr *fattr);
+static int nfs4_fhexpired_recovery(struct nfs_server *server, struct nfs4_exception *exception);
static int nfs4_do_setattr(struct inode *inode, struct rpc_cred *cred,
struct nfs_fattr *fattr, struct iattr *sattr,
struct nfs4_state *state);
@@ -2494,6 +2495,99 @@ static int nfs4_proc_lookup(struct rpc_clnt *clnt, struct inode *dir, struct qst
return err;
}

+static int nfs4_proc_vfh_lookup(struct rpc_clnt *clnt, struct inode *dir,
+ struct qstr *name, struct nfs_fh *fhandle, struct nfs_fattr *fattr)
+{
+ struct nfs4_exception exception = { };
+ int err;
+ do {
+ int status;
+
+ status = _nfs4_proc_lookup(clnt, dir, name, fhandle, fattr);
+ switch (status) {
+ case -NFS4ERR_BADNAME:
+ return -ENOENT;
+ case -NFS4ERR_MOVED:
+ err = nfs4_get_referral(dir, name, fattr, fhandle);
+ break;
+ case -NFS4ERR_FHEXPIRED:
+ return -NFS4ERR_FHEXPIRED;
+ case -NFS4ERR_WRONGSEC:
+ nfs_fixup_secinfo_attributes(fattr, fhandle);
+ }
+ err = nfs4_handle_exception(NFS_SERVER(dir),
+ status, &exception);
+ } while (exception.retry);
+ return err;
+}
+
+static int _nfs4_fhexpired_recovery(struct nfs_server *server, struct dentry *d_parent, struct qstr *name, struct inode *inode)
+{
+ int err;
+ struct nfs_fh *fhandle = NFS_FH(inode);
+ struct nfs_fattr *fattr = nfs_alloc_fattr();
+ if (fattr == NULL)
+ return -ENOMEM;
+ fattr->fileid = 0;
+ if (!nfs_compare_fh(fhandle, server->rootfh)) {
+ struct nfs_fsinfo info = {
+ .fattr = fattr,
+ };
+ err = nfs4_proc_get_root(server, fhandle, &info);
+ if (!err) {
+ if (NFS_FILEID(inode) != info.fattr->fileid)
+ set_nfs_fileid(inode, info.fattr->fileid);
+ nfs_copy_fh(server->rootfh, fhandle);
+ /* Only needed if fsid changes on server */
+ memcpy(&server->fsid, &info.fattr->fsid,
+ sizeof(server->fsid));
+ }
+ nfs_free_fattr(fattr);
+ return err;
+ }
+ err = nfs4_proc_vfh_lookup(server->client, d_parent->d_inode, name,
+ fhandle, fattr);
+ if (!fattr->fileid && !err && fattr->fileid != NFS_FILEID(inode))
+ set_nfs_fileid(inode, fattr->fileid);
+ if (err == -NFS4ERR_FHEXPIRED) {
+ err = _nfs4_fhexpired_recovery(server, d_parent->d_parent,
+ &d_parent->d_name, d_parent->d_inode);
+ if (!err) {
+ err = nfs4_proc_vfh_lookup(server->client,
+ d_parent->d_inode, name,
+ fhandle, fattr);
+ if (!fattr->fileid && !err &&
+ fattr->fileid != NFS_FILEID(inode))
+ set_nfs_fileid(inode, fattr->fileid);
+ }
+ }
+ nfs_free_fattr(fattr);
+ return err;
+}
+
+static int nfs4_fhexpired_recovery(struct nfs_server *server, struct nfs4_exception *exception)
+{
+ int err;
+ struct dentry *dentry;
+ if (exception->dentry) {
+ dentry = exception->dentry;
+ err = _nfs4_fhexpired_recovery(server, dentry->d_parent,
+ &dentry->d_name, dentry->d_inode);
+ } else if (exception->inode) {
+ dentry = d_obtain_alias(exception->inode);
+ err = _nfs4_fhexpired_recovery(server, dentry->d_parent,
+ &dentry->d_name, exception->inode);
+ dput(dentry);
+ }
+ BUG_ON(!exception->inode && !exception->dentry); /*Recovery without
+ VFS objects, missed
+ a proc function*/
+ if (!err)
+ return -EAGAIN; /* Return EAGAIN so that the operation will
+ be performed again on successful recovery */
+ return err;
+}
+
static int _nfs4_proc_access(struct inode *inode, struct nfs_access_entry *entry)
{
struct nfs_server *server = NFS_SERVER(inode);
--
1.7.4.4
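
The key difference between nfs4_proc_vfh_lookup() and the ordinary lookup wrapper is that FHEXPIRED bypasses the generic exception handler and is returned straight to the recovery code. A stripped-down userspace sketch of that retry loop is shown below. All `sim_*` names are hypothetical, the generic handler is reduced to "retry on NFS4ERR_DELAY", and the scripted lookup exists only so the loop can be exercised without a server.

```c
#include <assert.h>
#include <errno.h>

/* NFSv4 protocol error numbers used by this sketch. */
#define SIM_NFS4ERR_DELAY     10008
#define SIM_NFS4ERR_FHEXPIRED 10014
#define SIM_NFS4ERR_BADNAME   10041

struct sim_exception {
	int retry;
};

/* Greatly simplified stand-in for the generic exception handler:
 * request a retry on DELAY, otherwise hand the status back. */
static int sim_handle_exception(int status, struct sim_exception *exc)
{
	exc->retry = 0;
	if (status == -SIM_NFS4ERR_DELAY) {
		exc->retry = 1;
		return 0;
	}
	return status;
}

/* Replays a fixed sequence of lookup results, one per call. */
struct sim_script {
	const int *results;
	int next;
};

static int sim_scripted_lookup(void *ctx)
{
	struct sim_script *s = ctx;

	return s->results[s->next++];
}

/* Lookup wrapper for recovery: FHEXPIRED is not handled here but
 * returned to the caller, which decides whether to walk back. */
static int sim_vfh_lookup(int (*do_lookup)(void *), void *ctx)
{
	struct sim_exception exc = { 0 };
	int err;

	do {
		int status = do_lookup(ctx);

		switch (status) {
		case -SIM_NFS4ERR_BADNAME:
			return -ENOENT;
		case -SIM_NFS4ERR_FHEXPIRED:
			return status;   /* bypass the generic handler */
		}
		err = sim_handle_exception(status, &exc);
	} while (exc.retry);
	return err;
}
```

Keeping FHEXPIRED out of the generic handler avoids re-entering recovery from inside recovery; the recursion is driven explicitly by the caller instead.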


2011-11-13 16:36:47

by J. Bruce Fields

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Sun, Nov 13, 2011 at 02:45:48PM +0100, Tigran Mkrtchyan wrote:
> I have a server which runs on top of hadoop. The problem with hadoop
> is that there is no way to have persistent file handles. I am
> currently working on a way to do that - either simulate them or add a
> support for unique file id to hadoop. If linux client will support
> volatile file handles then I can stop inventing some workarounds.

I might call that "fixing" rather than inventing workarounds.

Out of curiosity: if we really wanted to support such filesystems, what
would we need in the protocol? Just saying "filehandles aren't stable,
deal with it" seems insufficient.

Say there was some way for the client to indicate which filehandles it
currently has in use, and some way for the server to ask the client to
return in-use filehandles if there are too many (like DELEG_RECALL_ANY).
Then the server could at least place a limit on the number of
filehandles that it had to guarantee persistent.

And/or the client could get a callback on rename/link/unlink. Bah.

Would any of that actually be easier than implementing persistent file
handles?

--b.

2011-11-14 16:30:05

by Myklebust, Trond

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Sun, 2011-11-13 at 14:45 +0100, Tigran Mkrtchyan wrote:
> I have a server which runs on top of hadoop. The problem with hadoop
> is that there is no way to have persistent file handles. I am
> currently working on a way to do that - either simulate them or add a
> support for unique file id to hadoop. If linux client will support
> volatile file handles then I can stop inventing some workarounds.

So if we add a broken hack to the client then you can stop adding broken
hacks to the server? Sounds like you will still have a broken hacky
setup to me.

The idea of adding file ids to hadoop sounds like a better solution.

Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-11-14 17:27:10

by Myklebust, Trond

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Mon, 2011-11-14 at 08:07 +1100, NeilBrown wrote:
> On Sun, 13 Nov 2011 11:36:32 -0500 "J. Bruce Fields" <[email protected]>
> wrote:
>
> > On Sun, Nov 13, 2011 at 02:45:48PM +0100, Tigran Mkrtchyan wrote:
> > > I have a server which runs on top of hadoop. The problem with hadoop
> > > is that there is no way to have persistent file handles. I am
> > > currently working on a way to do that - either simulate them or add a
> > > support for unique file id to hadoop. If linux client will support
> > > volatile file handles then I can stop inventing some workarounds.
> >
> > I might call that "fixing" rather than inventing workarounds.
> >
> > Out of curiosity: if we really wanted to support such filesystems, what
> > would we need in the protocol? Just saying "filehandles aren't stable,
> > deal with it" seems insufficient.
>
> 1/ no guarantees if the file is not 'open'
> 2/ two possible responses to FHEXPIRED:

Question: Section 8.11 states that

When the server chooses to export multiple filehandles corresponding
to the same file object and returns different filehandles on two
different OPENs of the same file object, the server MUST NOT "OR"
together the access and deny bits and coalesce the two open files.
Instead the server must maintain separate OPENs with separate
stateids and will require separate CLOSEs to free them.

How does one reconcile the above paragraph with a case where the server
can expire a filehandle while the file is open? For one thing, it seems
to say that you cannot CLOSE (or unlock!) a file once the filehandle
expires...

> a/ perform a GETATTR and request the 'filehandle' attribute. Client then
> uses that filehandle instead.

??? GETATTR takes a filehandle argument and will presumably get an
automatic FHEXPIRED. If not, and if you can map one filehandle into
another, then why do you need the second filehandle?

If the issue is that the mapping is expensive then what stops you from
caching the first filehandle for the duration of the file being open?

> b/ perform LOOKUP on parent filehandle with same name as before, and use
> the resulting filehandle.
> Server specifies which somehow (different error code? magic attribute
> flag somewhere? doesn't really matter)

How do I know this is the same file?

> If a server has objects that are never renamed, it can easily use volatile
> file handles.

How do you deal with unlink("foo") followed by create("foo")? The spec
says that the server is free to return FHEXPIRED in this case too.

> If a server has objects which can be renamed and wants to use volatile file
> handles, then if such an object is open and is about to be renamed, it must
> first log to stable storage some mapping to allow it to access the file from
> the old volatile file handle. And of course it cannot allow renames during
> the grace period, but I think we already have that.
> Also, if the VFH is such that it will be lost on a reboot, the server must
> log it to stable storage before allowing an open.
>
> >
> > Say there was some way for the client to indicate which filehandles it
> > currently has in use, and some way for the server to ask the client to
> > return in-use filehandles if there are too many (like DELEG_RECALL_ANY).
> > Then the server could at least place a limit on the number of
> > filehandles that it had to guarantee persistent.
> >
> > And/or the client could get a callback on rename/link/unlink. Bah.
> >
> > Would any of that actually be easier than implementing persistent file
> > handles?
>
> Easier for whom? Should NFSv4 be designed to make life easier for filesystem
> implementers, or easier for NFS implementers :-?
>
> While I don't have concrete examples I would not be surprised if there were
> filesystems where implementing limited persistence was practical while
> implementing universal persistence was not.

The question is why would we need to support exporting such filesystems
over NFS?

The thing to note is that not everything in the NFSv4 spec is actually
useful. A lot of it is "it seemed like a good idea at the time" material
and is recognisably incompletely thought through (volatile filehandles
being a major case in point). That's why we need to

A. demand very concrete use-cases with very real reasons for why
there is no alternative
B. make sure that we work out the spec details before attempting
implementations.

Trond


--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com
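
The "How do I know this is the same file?" question above can be made concrete with a small check. The sketch below is only an illustration of one possible heuristic, not something proposed in this thread or done by the posted patches (patch 5/7 instead overwrites the cached fileid with the new one). It assumes, hypothetically, that the server keeps the fileid and object type stable across filehandle expiry; `sim_ident` and `sim_check_same_object` are made-up names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Identity the client remembered before the handle expired, or
 * received from the re-lookup after recovery. */
struct sim_ident {
	uint64_t fileid;    /* server-assigned file identifier */
	unsigned int type;  /* object type, e.g. 1 = regular, 2 = directory */
};

/*
 * Compare the cached identity with the one returned by the re-lookup.
 * Returns 0 if they match and -ESTALE if the name now appears to refer
 * to a different object (for example after unlink + create).
 */
static int sim_check_same_object(const struct sim_ident *cached,
				 const struct sim_ident *fresh)
{
	if (cached->fileid != fresh->fileid || cached->type != fresh->type)
		return -ESTALE;
	return 0;
}
```

A match does not prove the object is the same, since a server without persistent filehandles may also recycle fileids, so this narrows the problem rather than answering the question.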


2011-11-12 00:27:55

by Myklebust, Trond

Subject: Re: [PATCH/RFC 5/7] Added VFH FHEXPIRED recovery functions.

On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
> +static int nfs4_proc_vfh_lookup(struct rpc_clnt *clnt, struct inode *dir,
> + struct qstr *name, struct nfs_fh *fhandle, struct nfs_fattr *fattr)
> +{
> + struct nfs4_exception exception = { };
> + int err;
> + do {
> + int status;
> +
> + status = _nfs4_proc_lookup(clnt, dir, name, fhandle, fattr);
> + switch (status) {
> + case -NFS4ERR_BADNAME:
> + return -ENOENT;
> + case -NFS4ERR_MOVED:
> + err = nfs4_get_referral(dir, name, fattr, fhandle);
> + break;
> + case -NFS4ERR_FHEXPIRED:
> + return -NFS4ERR_FHEXPIRED;
> + case -NFS4ERR_WRONGSEC:
> + nfs_fixup_secinfo_attributes(fattr, fhandle);

case -NFS4ERR_ACCESS:
???????

> + }
> + err = nfs4_handle_exception(NFS_SERVER(dir),
> + status, &exception);
> + } while (exception.retry);
> + return err;
> +}
> +

What execution context is this function going to be running under and
what guarantees that it actually has the right file access credentials
to allow it to perform a lookup?
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-11-13 16:42:51

by J. Bruce Fields

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Sun, Nov 13, 2011 at 02:54:00PM +1100, NeilBrown wrote:
> So if you know the filesystem will only return FHEXPIRED for filehandles
> belonging to files that cannot be renamed, then it is perfectly reasonable to
> repeat the name lookup to re-access the file after the server forgets about
> an old filehandle. The mount option is how you communicate this knowledge,
> because the RFC doesn't provide a way to communicate it.

What about http://tools.ietf.org/html/rfc5661#section-11.11
STATUS4_FIXED?

--b.

2011-11-11 23:05:55

by Matthew Treinish

Subject: [PATCH/RFC 3/7] Add VFS objects from nfs4_proc calls into nfs4_exception.

To successfully recover an expired file handle we'll need either
an inode or preferably a dentry to enable us to recursively walk
back to the root of the export.

Signed-off-by: Matthew Treinish <[email protected]>
---
fs/nfs/nfs4_fs.h | 2 +
fs/nfs/nfs4proc.c | 115 ++++++++++++++++++++++++++++++++++++++++++-----------
2 files changed, 94 insertions(+), 23 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 3e93e9a..8fe81d8 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -215,6 +215,8 @@ struct nfs4_exception {
long timeout;
int retry;
struct nfs4_state *state;
+ struct dentry *dentry;
+ struct inode *inode;
};

struct nfs4_state_recovery_ops {
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 86c273e..20b96cb 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1239,7 +1239,10 @@ static int _nfs4_do_open_reclaim(struct nfs_open_context *ctx, struct nfs4_state
static int nfs4_do_open_reclaim(struct nfs_open_context *ctx, struct nfs4_state *state)
{
struct nfs_server *server = NFS_SERVER(state->inode);
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = state->inode,
+ .dentry = NULL,
+ };
int err;
do {
err = _nfs4_do_open_reclaim(ctx, state);
@@ -1281,7 +1284,10 @@ static int _nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs

int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state *state, const nfs4_stateid *stateid)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = state->inode,
+ .dentry = NULL,
+ };
struct nfs_server *server = NFS_SERVER(state->inode);
int err;
do {
@@ -1666,7 +1672,10 @@ static int _nfs4_open_expired(struct nfs_open_context *ctx, struct nfs4_state *s
static int nfs4_do_open_expired(struct nfs_open_context *ctx, struct nfs4_state *state)
{
struct nfs_server *server = NFS_SERVER(state->inode);
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = state->inode,
+ .dentry = NULL,
+ };
int err;

do {
@@ -1795,7 +1804,10 @@ out_err:

static struct nfs4_state *nfs4_do_open(struct inode *dir, struct dentry *dentry, fmode_t fmode, int flags, struct iattr *sattr, struct rpc_cred *cred)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .dentry = dentry,
+ .inode = NULL,
+ };
struct nfs4_state *res;
int status;

@@ -1886,7 +1898,10 @@ static int nfs4_do_setattr(struct inode *inode, struct rpc_cred *cred,
struct nfs4_state *state)
{
struct nfs_server *server = NFS_SERVER(inode);
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = inode,
+ .dentry = NULL,
+ };
int err;
do {
err = nfs4_handle_exception(server,
@@ -2455,7 +2470,10 @@ void nfs_fixup_secinfo_attributes(struct nfs_fattr *fattr, struct nfs_fh *fh)
static int nfs4_proc_lookup(struct rpc_clnt *clnt, struct inode *dir, struct qstr *name,
struct nfs_fh *fhandle, struct nfs_fattr *fattr)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = dir,
+ .dentry = NULL,
+ };
int err;
do {
int status;
@@ -2533,7 +2551,10 @@ static int _nfs4_proc_access(struct inode *inode, struct nfs_access_entry *entry

static int nfs4_proc_access(struct inode *inode, struct nfs_access_entry *entry)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = inode,
+ .dentry = NULL,
+ };
int err;
do {
err = nfs4_handle_exception(NFS_SERVER(inode),
@@ -2589,7 +2610,10 @@ static int _nfs4_proc_readlink(struct inode *inode, struct page *page,
static int nfs4_proc_readlink(struct inode *inode, struct page *page,
unsigned int pgbase, unsigned int pglen)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = inode,
+ .dentry = NULL,
+ };
int err;
do {
err = nfs4_handle_exception(NFS_SERVER(inode),
@@ -2681,7 +2705,10 @@ out:

static int nfs4_proc_remove(struct inode *dir, struct qstr *name)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = dir,
+ .dentry = NULL,
+ };
int err;
do {
err = nfs4_handle_exception(NFS_SERVER(dir),
@@ -2835,7 +2862,10 @@ out:

static int nfs4_proc_link(struct inode *inode, struct inode *dir, struct qstr *name)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = inode,
+ .dentry = NULL,
+ };
int err;
do {
err = nfs4_handle_exception(NFS_SERVER(inode),
@@ -2927,7 +2957,10 @@ out:
static int nfs4_proc_symlink(struct inode *dir, struct dentry *dentry,
struct page *page, unsigned int len, struct iattr *sattr)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .dentry = dentry,
+ .inode = NULL,
+ };
int err;
do {
err = nfs4_handle_exception(NFS_SERVER(dir),
@@ -2958,7 +2991,10 @@ out:
static int nfs4_proc_mkdir(struct inode *dir, struct dentry *dentry,
struct iattr *sattr)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .dentry = dentry,
+ .inode = NULL,
+ };
int err;

sattr->ia_mode &= ~current_umask();
@@ -3012,7 +3048,10 @@ static int _nfs4_proc_readdir(struct dentry *dentry, struct rpc_cred *cred,
static int nfs4_proc_readdir(struct dentry *dentry, struct rpc_cred *cred,
u64 cookie, struct page **pages, unsigned int count, int plus)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .dentry = dentry,
+ .inode = NULL,
+ };
int err;
do {
err = nfs4_handle_exception(NFS_SERVER(dentry->d_inode),
@@ -3060,7 +3099,10 @@ out:
static int nfs4_proc_mknod(struct inode *dir, struct dentry *dentry,
struct iattr *sattr, dev_t rdev)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .dentry = dentry,
+ .inode = NULL,
+ };
int err;

sattr->ia_mode &= ~current_umask();
@@ -3590,7 +3632,10 @@ out_free:

static ssize_t nfs4_get_acl_uncached(struct inode *inode, void *buf, size_t buflen)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = inode,
+ .dentry = NULL,
+ };
ssize_t ret;
do {
ret = __nfs4_get_acl_uncached(inode, buf, buflen);
@@ -3665,7 +3710,10 @@ static int __nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t bufl

static int nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t buflen)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = inode,
+ .dentry = NULL,
+ };
int err;
do {
err = nfs4_handle_exception(NFS_SERVER(inode),
@@ -3928,7 +3976,10 @@ out:
int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4_stateid *stateid, int issync)
{
struct nfs_server *server = NFS_SERVER(inode);
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = inode,
+ .dentry = NULL,
+ };
int err;
do {
err = _nfs4_proc_delegreturn(inode, cred, stateid, issync);
@@ -4002,7 +4053,10 @@ out:

static int nfs4_proc_getlk(struct nfs4_state *state, int cmd, struct file_lock *request)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = state->inode,
+ .dentry = NULL,
+ };
int err;

do {
@@ -4406,7 +4460,10 @@ static int _nfs4_do_setlk(struct nfs4_state *state, int cmd, struct file_lock *f
static int nfs4_lock_reclaim(struct nfs4_state *state, struct file_lock *request)
{
struct nfs_server *server = NFS_SERVER(state->inode);
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = state->inode,
+ .dentry = NULL,
+ };
int err;

do {
@@ -4424,7 +4481,10 @@ static int nfs4_lock_reclaim(struct nfs4_state *state, struct file_lock *request
static int nfs4_lock_expired(struct nfs4_state *state, struct file_lock *request)
{
struct nfs_server *server = NFS_SERVER(state->inode);
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = state->inode,
+ .dentry = NULL,
+ };
int err;

err = nfs4_set_lock_state(state, request);
@@ -4502,7 +4562,10 @@ out:

static int nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *request)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = state->inode,
+ .dentry = NULL,
+ };
int err;

do {
@@ -4562,7 +4625,10 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
int nfs4_lock_delegation_recall(struct nfs4_state *state, struct file_lock *fl)
{
struct nfs_server *server = NFS_SERVER(state->inode);
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = state->inode,
+ .dentry = NULL,
+ };
int err;

err = nfs4_set_lock_state(state, fl);
@@ -4769,7 +4835,10 @@ static int _nfs4_proc_secinfo(struct inode *dir, const struct qstr *name, struct

int nfs4_proc_secinfo(struct inode *dir, const struct qstr *name, struct nfs4_secinfo_flavors *flavors)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = dir,
+ .dentry = NULL,
+ };
int err;
do {
err = nfs4_handle_exception(NFS_SERVER(dir),
--
1.7.4.4


2011-11-11 23:06:15

by Matthew Treinish

Subject: [PATCH/RFC 6/7] Perform recovery on both inodes for rename.

Rename is a special case because it passes two file handles to the server. If
the server replies with NFS4ERR_FHEXPIRED we can't tell which of the two file
handles has expired.

To remedy this, on receiving FHEXPIRED, rename will first call
nfs4_fhexpired_recovery() with the old_dir inode. If this succeeds it will then
call nfs4_fhexpired_recovery() with the new_dir inode. This bypasses
nfs4_handle_exception() on FHEXPIRED errors for renames, which allows us to run
recovery on each of the inodes.

Signed-off-by: Matthew Treinish <[email protected]>
---
fs/nfs/nfs4proc.c | 23 ++++++++++++++++++-----
1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 50bb823..ebc5ee9 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2907,13 +2907,26 @@ out:
static int nfs4_proc_rename(struct inode *old_dir, struct qstr *old_name,
struct inode *new_dir, struct qstr *new_name)
{
- struct nfs4_exception exception = { };
+ struct nfs4_exception exception = {
+ .inode = old_dir,
+ .dentry = NULL,
+ };
int err;
do {
- err = nfs4_handle_exception(NFS_SERVER(old_dir),
- _nfs4_proc_rename(old_dir, old_name,
- new_dir, new_name),
- &exception);
+ err = _nfs4_proc_rename(old_dir, old_name, new_dir, new_name);
+ if (err == -NFS4ERR_FHEXPIRED) {
+ err = nfs4_fhexpired_recovery(NFS_SERVER(old_dir),
+ &exception);
+ if (err == -EAGAIN) {
+ exception.inode = new_dir;
+ err = nfs4_fhexpired_recovery(
+ NFS_SERVER(old_dir),
+ &exception);
+ }
+ return err;
+ }
+ err = nfs4_handle_exception(NFS_SERVER(old_dir), err,
+ &exception);
} while (exception.retry);
return err;
}
--
1.7.4.4
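
For readers following the control flow of patch 6/7, here is a minimal
user-space sketch of the "recover both directories" idea. It is not the kernel
code: struct ex_dir, ex_recover_fh(), ex_do_rename() and
ex_rename_with_recovery() are hypothetical stand-ins, and the sketch assumes
that -EAGAIN from recovery means "recovered, retry the operation". It also
retries the rename itself after recovery and caps the number of attempts, both
of which are assumptions rather than something the patch states.

```c
#include <errno.h>

/* Numeric value of NFS4ERR_FHEXPIRED as defined in RFC 3530. */
#define EX_NFS4ERR_FHEXPIRED 10014

/* Hypothetical stand-in for a directory inode and its file handle state. */
struct ex_dir {
	int fh_valid;	/* non-zero while the server still accepts the handle */
};

/* Pretend recovery: re-looks-up the handle and asks the caller to retry. */
static int ex_recover_fh(struct ex_dir *dir)
{
	dir->fh_valid = 1;
	return -EAGAIN;
}

/* Pretend RENAME: fails with FHEXPIRED if either handle has expired. */
static int ex_do_rename(const struct ex_dir *old_dir,
			const struct ex_dir *new_dir)
{
	if (!old_dir->fh_valid || !new_dir->fh_valid)
		return -EX_NFS4ERR_FHEXPIRED;
	return 0;
}

/*
 * The server does not say which handle expired, so recover old_dir first
 * and, only if that succeeded, new_dir as well. Then retry the rename.
 */
static int ex_rename_with_recovery(struct ex_dir *old_dir,
				   struct ex_dir *new_dir)
{
	int attempts = 0;
	int err;

	do {
		err = ex_do_rename(old_dir, new_dir);
		if (err != -EX_NFS4ERR_FHEXPIRED)
			break;
		err = ex_recover_fh(old_dir);
		if (err == -EAGAIN)
			err = ex_recover_fh(new_dir);
	} while (err == -EAGAIN && ++attempts < 3);
	return err;
}
```

Calling ex_rename_with_recovery() with one valid and one expired directory
recovers both handles and then completes the rename on the second pass.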


2011-11-15 22:38:37

by Matthew Treinish

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Tue, Nov 15, 2011 at 08:49:51AM +0200, Trond Myklebust wrote:
> On Sun, 2011-11-13 at 13:06 -0500, Matthew Treinish wrote:
> > On Sun, Nov 13, 2011 at 02:54:00PM +1100, NeilBrown wrote:
> > > On Sat, 12 Nov 2011 09:49:53 -0500 Christoph Hellwig <[email protected]>
> > > wrote:
> > >
> > > > On Fri, Nov 11, 2011 at 07:13:29PM -0500, Trond Myklebust wrote:
> > > > > On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
> > > > > > This patch series implements client side support for volatile file handle
> > > > > > recovery (RFC 3530 section 4.2 and 4.3) with walk back using the dcache. To
> > > > > > test the client you either need a server that supports volatile file handles or
> > > > > > you can hard code the server to output NFS4ERR_FHEXPIRED instead of
> > > > > > NFSERR_STALE. (See the last patch in the series)
> > > > >
> > > > > WHY do we want to support this kind of "feature"? As you said, the RFC
> > > > > doesn't actually help in figuring out how this crap is supposed to work
> > > > > in practice, so why do we even consider starting to give a damn?
> > > >
> > > > *nod*. Pretending we handle it seems fairly dangerous. I'd much prefer
> > > > outright rejecting it.
> > >
> > > Hence the suggested mount option.
> > >
> > > A server might not be able to provide stable file handles, but can ensure
> > > that files don't get renamed - for these filesystems, the name is a
> > > reliable stable handle for the file (it just doesn't fit in the NFSv4 file
> > > handle structure).
> > >
> > > So if you know the filesystem will only return FHEXPIRED for filehandles
> > > belonging to files that cannot be renamed, then it is perfectly reasonable to
> > > repeat the name lookup to re-access the file after the server forgets about
> > > an old filehandle. The mount option is how you communicate this knowledge,
> > > because the RFC doesn't provide a way to communicate it.
> > >
> > This was one of 2 reasons for implementing this, and we actually run into this with
> > certain z/OS systems, because the z/OS NFS server currently uses FHEXPIRED in this way.
>
> So you're both basically saying that 'we know that this is a bad idea,
> so let's punt it to the users and assume they will know those few
> exceptions when it is safe to use'?
> In that case, are you planning on documenting what constitutes safe
> usage? So far, I've seen nothing either in the discussion here or in the
> changelogs that explains precisely when you can safely enable this mount
> option.
>
> Note that just disabling renames is, as I stated yesterday, not a
> sufficient condition. You pretty much need a read-only filesystem
> situation, in which case you can easily devise persistent filehandle
> solutions that work just as well.
>
Yes, I agree that documenting the risks associated with the mount option is a
necessity, and it is something that I clearly overlooked. How about something like:

This option enables volatile filehandle recovery by re-lookup
on FHEXPIRED errors. Only use this mount option if the
filenames/paths on the server are not going to change from the
initial expiration until all the recovery operations complete.
Otherwise the validity of the files from the server cannot be
guaranteed. It can only truly be considered safe to use on a
Linux server if the filesystem is read-only.

> > The other thought was that this could be used for migration/replication
> > between file synced servers. So, if we wanted to switch/move to another server where
> > the file names were the same but all the inode numbers were different you could use
> > this to refresh the invalid file handles on the new server.
>
> This runs into the rename problem. How do you guarantee that the files
> haven't been renamed before the migration event occurred? How does the
> client identify that the file is the same one when it looks it up on the
> new server?
>

I don't think there is a way to guarantee that the files haven't been renamed
before the migration event. It would probably only be fully safe under the same
conditions as above.



2011-11-12 00:13:31

by Myklebust, Trond

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
> This patch series implements client side support for volatile file handle
> recovery (RFC 3530 section 4.2 and 4.3) with walk back using the dcache. To
> test the client you either need a server that supports volatile file handles or
> you can hard code the server to output NFS4ERR_FHEXPIRED instead of
> NFSERR_STALE. (See the last patch in the series)

WHY do we want to support this kind of "feature"? As you said, the RFC
doesn't actually help in figuring out how this crap is supposed to work
in practice, so why do we even consider starting to give a damn?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-11-13 18:25:36

by Matthew Treinish

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Sun, Nov 13, 2011 at 11:45:36AM -0500, J. Bruce Fields wrote:
>
> Also, could a re-looked-up file be considered sufficiently safe to use
> if all the attributes matched?
>
> (I guess not: inode numbers, change attributes, etc., could agree by
> coincidence, so it would never be completely reliable.)
>

We thought of doing something like that, but like you said it's not a
guarantee. That was part of the reason why we went with the mount option.
By using it, the user will assume that whatever files the re-look-ups return
are safe/correct.

-Matt Treinish


2011-11-11 23:05:54

by Matthew Treinish

Subject: [PATCH/RFC 2/7] Added support for FH_EXPIRE_TYPE attribute.

FH_EXPIRE_TYPE is used by the client to determine
what type of filehandle the server is providing for a
particular filesystem.

I added the bitmask to fsinfo since the bitmask is
set per filesystem.

Signed-off-by: Matthew Treinish <[email protected]>
---
fs/nfs/client.c | 2 ++
fs/nfs/nfs4proc.c | 3 ++-
fs/nfs/nfs4xdr.c | 27 +++++++++++++++++++++++++++
include/linux/nfs_fs_sb.h | 1 +
include/linux/nfs_xdr.h | 1 +
5 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 873bf00..44dedfe 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -957,6 +957,8 @@ static void nfs_server_set_fsinfo(struct nfs_server *server,

server->time_delta = fsinfo->time_delta;

+ server->fhexpiretype = fsinfo->fhexpiretype;
+
/* We're airborne Set socket buffersize */
rpc_setbufsize(server->client, server->wsize + 100, server->rsize + 100);
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index b60fddf..86c273e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -140,7 +140,8 @@ const u32 nfs4_pathconf_bitmap[2] = {
const u32 nfs4_fsinfo_bitmap[3] = { FATTR4_WORD0_MAXFILESIZE
| FATTR4_WORD0_MAXREAD
| FATTR4_WORD0_MAXWRITE
- | FATTR4_WORD0_LEASE_TIME,
+ | FATTR4_WORD0_LEASE_TIME
+ | FATTR4_WORD0_FH_EXPIRE_TYPE,
FATTR4_WORD1_TIME_DELTA
| FATTR4_WORD1_FS_LAYOUT_TYPES,
FATTR4_WORD2_LAYOUT_BLKSIZE
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index e6161b2..4d775e2 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -4508,6 +4508,30 @@ static int decode_attr_layout_blksize(struct xdr_stream *xdr, uint32_t *bitmap,
return 0;
}

+/*
+ * The VFH expire type bitmask
+ */
+
+static int decode_attr_fh_expire_type(struct xdr_stream *xdr, uint32_t *bitmap,
+ uint32_t *res)
+{
+ __be32 *p;
+
+ dprintk("%s: bitmap %x\n", __func__, bitmap[0]);
+ *res = 0;
+ if (bitmap[0] & FATTR4_WORD0_FH_EXPIRE_TYPE) {
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p)) {
+ print_overflow_msg(__func__, xdr);
+ return -EIO;
+ }
+ *res = be32_to_cpup(p);
+ bitmap[0] &= ~FATTR4_WORD0_FH_EXPIRE_TYPE;
+ }
+ return 0;
+
+}
+
static int decode_fsinfo(struct xdr_stream *xdr, struct nfs_fsinfo *fsinfo)
{
__be32 *savep;
@@ -4523,6 +4547,9 @@ static int decode_fsinfo(struct xdr_stream *xdr, struct nfs_fsinfo *fsinfo)

fsinfo->rtmult = fsinfo->wtmult = 512; /* ??? */

+ if ((status = decode_attr_fh_expire_type(xdr, bitmap, &fsinfo->fhexpiretype)) != 0)
+ goto xdr_error;
+
if ((status = decode_attr_lease_time(xdr, bitmap, &fsinfo->lease_time)) != 0)
goto xdr_error;
if ((status = decode_attr_maxfilesize(xdr, bitmap, &fsinfo->maxfilesize)) != 0)
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index b5479df..706c92b 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -112,6 +112,7 @@ struct nfs_server {
unsigned int dtsize; /* readdir size */
unsigned short port; /* "port=" setting */
unsigned int bsize; /* server block size */
+ unsigned int fhexpiretype; /* VFH attributes */
unsigned int acregmin; /* attr cache timeouts */
unsigned int acregmax;
unsigned int acdirmin;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index c74595b..72ba66b 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -123,6 +123,7 @@ struct nfs_fsinfo {
__u32 lease_time; /* in seconds */
__u32 layouttype; /* supported pnfs layout driver */
__u32 blksize; /* preferred pnfs io block size */
+ __u32 fhexpiretype; /* VFH attributes */
};

struct nfs_fsstat {
--
1.7.4.4
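
As an illustration of what the decoded fh_expire_type value carries, here is a
small user-space sketch. The bit values are the FH4_* constants from RFC 3530;
the EX_FH4_* names and the helpers ex_decode_be32() and ex_fh_may_expire() are
hypothetical and do not appear in the patch. ex_decode_be32() mirrors what
be32_to_cpup() does for the 4-byte XDR word.

```c
#include <stdint.h>

/* fh_expire_type bits, with the values given in RFC 3530. */
#define EX_FH4_PERSISTENT		0x00000000u
#define EX_FH4_NOEXPIRE_WITH_OPEN	0x00000001u
#define EX_FH4_VOLATILE_ANY		0x00000002u
#define EX_FH4_VOL_MIGRATION		0x00000004u
#define EX_FH4_VOL_RENAME		0x00000008u

/* XDR encodes a uint32 as four bytes in big-endian (network) order. */
static uint32_t ex_decode_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Non-zero if the server may expire file handles under some condition. */
static int ex_fh_may_expire(uint32_t fhexpiretype)
{
	return (fhexpiretype & (EX_FH4_VOLATILE_ANY |
				EX_FH4_VOL_MIGRATION |
				EX_FH4_VOL_RENAME)) != 0;
}
```

A client could consult a helper like ex_fh_may_expire() before deciding whether
an FHEXPIRED reply is plausible for a given filesystem; how the series uses the
stored value is defined by the later patches, not by this sketch.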


2011-11-14 01:26:20

by NeilBrown

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Sun, 13 Nov 2011 19:42:54 -0500 "J. Bruce Fields" <[email protected]>
wrote:

> On Mon, Nov 14, 2011 at 08:07:45AM +1100, NeilBrown wrote:
> > On Sun, 13 Nov 2011 11:36:32 -0500 "J. Bruce Fields" <[email protected]>
> > wrote:
> >
> > > On Sun, Nov 13, 2011 at 02:45:48PM +0100, Tigran Mkrtchyan wrote:
> > > > I have a server which runs on top of hadoop. The problem with hadoop
> > > > is that there is no way to have persistent file handles. I am
> > > > currently working on a way to do that - either simulate them or add a
> > > > support for unique file id to hadoop. If linux client will support
> > > > volatile file handles then I can stop inventing some workarounds.
> > >
> > > I might call that "fixing" rather than inventing workarounds.
> > >
> > > Out of curiosity: if we really wanted to support such filesystems, what
> > > would we need in the protocol? Just saying "filehandles aren't stable,
> > > deal with it" seems insufficient.
> >
> > 1/ no guarantees if the file is not 'open'
> > 2/ two possible responses to FHEXPIRED:
> > a/ perform a GETATTR and request the 'filehandle' attribute. Client then
> > uses that filehandle instead.
> > b/ perform LOOKUP on parent filehandle with same name as before, and use
> > the resulting filehandle.
> > Server specifies which somehow (different error code? magic attribute
> > flag somewhere? doesn't really matter)
> >
> > If a server has objects that are never renamed, it can easily use volatile
> > file handles.
> > If a server has objects which can be renamed and wants to use volatile file
> > handles, then if such an object is open and is about to be renamed, it must
> > first log to stable storage some mapping to allow it to access the file from
> > the old volatile file handle.
>
> I think then there's no limit to the lifetime of those log entries, or
> to the size of the log?

The lifetime of the log entry matches the lifetime of an 'open', just like
any state that the server holds on behalf of a client. They can be discarded
on last close, or when you haven't heard from the client for the lease-time.

The size of the log is bounded by the maximum number of allowed clients and
the maximum number of concurrent opens per client, i.e. it is of the same order
as the amount of open state that the server needs to store at any one time.

So the only really 'new' thing here is that more of the state needs to be on
stable storage - it isn't really substantially more state.

NeilBrown
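
The bound described above can be written down as a one-line calculation. This
is only a sketch: ex_rename_log_bound() and its three parameters are
hypothetical, and it assumes at most one log entry per open file, each of a
fixed size.

```c
#include <stdint.h>

/*
 * Worst-case stable-storage footprint of the rename log, assuming one
 * fixed-size entry per concurrently open file (an assumption of this
 * sketch, not a figure from the thread).
 */
static uint64_t ex_rename_log_bound(uint64_t max_clients,
				    uint64_t max_opens_per_client,
				    uint64_t bytes_per_entry)
{
	return max_clients * max_opens_per_client * bytes_per_entry;
}
```

For example, 1000 clients with 4096 opens each and 256-byte entries give a
bound of 1000 * 4096 * 256 bytes, i.e. about 1 GB in the very worst case.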



2011-11-11 23:06:17

by Matthew Treinish

Subject: [PATCH/RFC 4/7] Save root file handle in nfs_server.

Save each FSID's root directory file handle in the
export's local nfs_server structure on the client.
This FH can later be used by the migration recovery
logic.

NB: Saving the root FH is done only for NFSv4 mounts.

Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Matthew Treinish <[email protected]>
---
fs/nfs/client.c | 1 +
fs/nfs/getroot.c | 7 +++++++
include/linux/nfs_fs_sb.h | 1 +
3 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 44dedfe..641f69f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1108,6 +1108,7 @@ void nfs_free_server(struct nfs_server *server)
nfs_put_client(server->nfs_client);

nfs_free_iostats(server->io_stats);
+ nfs_free_fhandle(server->rootfh);
bdi_destroy(&server->backing_dev_info);
kfree(server);
nfs_release_automount_timer();
diff --git a/fs/nfs/getroot.c b/fs/nfs/getroot.c
index dcb6154..51ca63b 100644
--- a/fs/nfs/getroot.c
+++ b/fs/nfs/getroot.c
@@ -232,6 +232,13 @@ struct dentry *nfs4_get_root(struct super_block *sb, struct nfs_fh *mntfh,
ret = ERR_CAST(inode);
goto out;
}
+ server->rootfh = nfs_alloc_fhandle();
+ if (server->rootfh == NULL) {
+ dprintk("nfs_get_root: alloc rootfh failed\n");
+ ret = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+ nfs_copy_fh(server->rootfh, mntfh);

error = nfs_superblock_set_dummy_root(sb, inode);
if (error != 0) {
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 706c92b..5261fa1 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -157,6 +157,7 @@ struct nfs_server {
struct list_head layouts;
struct list_head delegations;
void (*destroy)(struct nfs_server *);
+ struct nfs_fh *rootfh;

atomic_t active; /* Keep trace of any activity to this server */

--
1.7.4.4


2011-11-13 03:54:12

by NeilBrown

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Sat, 12 Nov 2011 09:49:53 -0500 Christoph Hellwig <[email protected]>
wrote:

> On Fri, Nov 11, 2011 at 07:13:29PM -0500, Trond Myklebust wrote:
> > On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
> > > This patch series implements client side support for volatile file handle
> > > recovery (RFC 3530 section 4.2 and 4.3) with walk back using the dcache. To
> > > test the client you either need a server that supports volatile file handles or
> > > you can hard code the server to output NFS4ERR_FHEXPIRED instead of
> > > NFSERR_STALE. (See the last patch in the series)
> >
> > WHY do we want to support this kind of "feature"? As you said, the RFC
> > doesn't actually help in figuring out how this crap is supposed to work
> > in practice, so why do we even consider starting to give a damn?
>
> *nod*. Pretending we handle it seems fairly dangerous. I'd much prefer
> outright rejecting it.

Hence the suggested mount option.

A server might not be able to provide stable file handles, but can ensure
that files don't get renamed - for these filesystems, the name is a
reliable stable handle for the file (it just doesn't fit in the NFSv4 file
handle structure).

So if you know the filesystem will only return FHEXPIRED for filehandles
belonging to files that cannot be renamed, then it is perfectly reasonable to
repeat the name lookup to re-access the file after the server forgets about
an old filehandle. The mount option is how you communicate this knowledge,
because the RFC doesn't provide a way to communicate it.

NeilBrown




2011-11-15 06:34:03

by Myklebust, Trond

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Mon, 2011-11-14 at 08:07 +1100, NeilBrown wrote:

> If a server has objects that are never renamed, it can easily use volatile
> file handles.
> If a server has objects which can be renamed and wants to use volatile file
> handles, then if such an object is open and is about to be renamed, it must
> first log to stable storage some mapping to allow it to access the file from
> the old volatile file handle. And of course it cannot allow renames during
> the grace period, but I think we already have that.
> Also, if the VFH is such that it will be lost on a reboot, the server must
> log it to stable storage before allowing an open.

BTW: If the namespace is stable, then the server can easily implement
permanent filehandles. Use a hash of the pathname as the filehandle, and
set up a hidden directory ('/.filehandles') containing symlinks that map
said hash back to the correct pathname. No need for volatile
filehandles.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com
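
The pathname-hash idea above could look roughly like the following user-space
sketch. Everything here is an assumption made for illustration: the choice of
64-bit FNV-1a as the hash, the helper names ex_path_hash() and
ex_handle_link(), and the symlink naming under '/.filehandles'. A real server
would also have to deal with hash collisions, which this sketch ignores.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a over the pathname; the result serves as the file handle. */
static uint64_t ex_path_hash(const char *path)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */

	for (; *path; path++) {
		h ^= (unsigned char)*path;
		h *= 0x100000001b3ULL;		/* FNV prime */
	}
	return h;
}

/*
 * Build the name of the symlink that maps the handle back to the path,
 * e.g. "/.filehandles/<16 hex digits>". Returns what snprintf() returns.
 */
static int ex_handle_link(const char *path, char *buf, size_t len)
{
	return snprintf(buf, len, "/.filehandles/%016llx",
			(unsigned long long)ex_path_hash(path));
}
```

On a lookup by handle, the server would read the symlink named by
ex_handle_link() to get the pathname back; the handle stays valid for as long
as the namespace is stable.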


2011-11-14 00:43:07

by J. Bruce Fields

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Mon, Nov 14, 2011 at 08:07:45AM +1100, NeilBrown wrote:
> On Sun, 13 Nov 2011 11:36:32 -0500 "J. Bruce Fields" <[email protected]>
> wrote:
>
> > On Sun, Nov 13, 2011 at 02:45:48PM +0100, Tigran Mkrtchyan wrote:
> > > I have a server which runs on top of hadoop. The problem with hadoop
> > > is that there is no way to have persistent file handles. I am
> > > currently working on a way to do that - either simulate them or add a
> > > support for unique file id to hadoop. If linux client will support
> > > volatile file handles then I can stop inventing some workarounds.
> >
> > I might call that "fixing" rather than inventing workarounds.
> >
> > Out of curiosity: if we really wanted to support such filesystems, what
> > would we need in the protocol? Just saying "filehandles aren't stable,
> > deal with it" seems insufficient.
>
> 1/ no guarantees if the file is not 'open'
> 2/ two possible responses to FHEXPIRED:
> a/ perform a GETATTR and request the 'filehandle' attribute. Client then
> uses that filehandle instead.
> b/ perform LOOKUP on parent filehandle with same name as before, and use
> the resulting filehandle.
> Server specifies which somehow (different error code? magic attribute
> flag somewhere? doesn't really matter)
>
> If a server has objects that are never renamed, it can easily use volatile
> file handles.
> If a server has objects which can be renamed and wants to use volatile file
> handles, then if such an object is open and is about to be renamed, it must
> first log to stable storage some mapping to allow it to access the file from
> the old volatile file handle.

I think then there's no limit to the lifetime of those log entries, or
to the size of the log?

--b.

> And of course it cannot allow renames during
> the grace period, but I think we already have that.
> Also, if the VFH is such that it will be lost on a reboot, the server must
> log it to stable storage before allowing an open.



2011-11-14 21:12:49

by Matthew Treinish

Subject: Re: [PATCH/RFC 5/7] Added VFH FHEXPIRED recovery functions.

On Sat, Nov 12, 2011 at 12:16:38PM -0500, Trond Myklebust wrote:
> > > > +static int nfs4_proc_vfh_lookup(struct rpc_clnt *clnt, struct inode *dir,
> > > > + struct qstr *name, struct nfs_fh *fhandle, struct nfs_fattr *fattr)
> > > > +{
> > > > + struct nfs4_exception exception = { };
> > > > + int err;
> > > > + do {
> > > > + int status;
> > > > +
> > > > + status = _nfs4_proc_lookup(clnt, dir, name, fhandle, fattr);
> > > > + switch (status) {
> > > > + case -NFS4ERR_BADNAME:
> > > > + return -ENOENT;
> > > > + case -NFS4ERR_MOVED:
> > > > + err = nfs4_get_referral(dir, name, fattr, fhandle);
> > > > + break;
> > > > + case -NFS4ERR_FHEXPIRED:
> > > > + return -NFS4ERR_FHEXPIRED;
> > > > + case -NFS4ERR_WRONGSEC:
> > > > + nfs_fixup_secinfo_attributes(fattr, fhandle);
> > >
> > > case -NFS4ERR_ACCESS:
> > > ???????
> > >
> > > > + }
> > > > + err = nfs4_handle_exception(NFS_SERVER(dir),
> > > > + status, &exception);
> > > > + } while (exception.retry);
> > > > + return err;
> > > > +}
> > > > +
> > >
> > > What execution context is this function going to be running under and
> > > what guarantees that it actually has the right file access credentials
> > > to allow it to perform a lookup?
> >
> > I imagine, it is in the context of the process that received FHEXPIRED
> > error. It may not have credentials to perform a lookup on parent
> > directories. If it doesn't, that would end up with ESTALE with Matt's
> > patches, right Matt?
Yes, the current code should run in the context of the process that
received the FHEXPIRED. I think that if the server returns NFS4ERR_ACCESS
(or EACCES?), my patches will keep that error. The only time my patches
return ESTALE right now is if either the mount option is not used or the
server doesn't use NFS4_FH_VOLATILE_ANY. Any other errors received during
recovery are returned like they normally would be.

>
> My point is that if you don't have the ability to pass a credential as
> an argument, then you won't be able to recover from something like an
> OPEN, READ or WRITE, which all happen in the rpciod context, nor can you
> recover from the state recovery thread context.
>
> Note also that you are doing synchronous I/O, and so you will need a
> recovery thread context anyway in order to recover from stuff running in
> the rpciod context...
>

Hmm, I didn't even think about that when I wrote this...

So, how would you suggest I go about correcting this? Should I add an rpc_cred
argument to the vfh lookup and then try to switch the context on ACCESS errors?

Thanks,

Matt Treinish


2011-11-15 06:49:54

by Myklebust, Trond

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Sun, 2011-11-13 at 13:06 -0500, Matthew Treinish wrote:
> On Sun, Nov 13, 2011 at 02:54:00PM +1100, NeilBrown wrote:
> > On Sat, 12 Nov 2011 09:49:53 -0500 Christoph Hellwig <[email protected]>
> > wrote:
> >
> > > On Fri, Nov 11, 2011 at 07:13:29PM -0500, Trond Myklebust wrote:
> > > > On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
> > > > > This patch series implements client side support for volatile file handle
> > > > > recovery (RFC 3530 section 4.2 and 4.3) with walk back using the dcache. To
> > > > > test the client you either need a server that supports volatile file handles or
> > > > > you can hard code the server to output NFS4ERR_FHEXPIRED instead of
> > > > > NFSERR_STALE. (See the last patch in the series)
> > > >
> > > > WHY do we want to support this kind of "feature"? As you said, the RFC
> > > > doesn't actually help in figuring out how this crap is supposed to work
> > > > in practice, so why do we even consider starting to give a damn?
> > >
> > > *nod*. Pretending we handle it seems fairly dangerous. I'd much prefer
> > > outright rejecting it.
> >
> > Hence the suggested mount option.
> >
> > A server might not be able to provide stable file handles, but can ensure
> > that files don't get renamed - for these filesystems, the name is a
> > reliable stable handle for the file (it just doesn't fit in the NFSv4 file
> > handle structure).
> >
> > So if you know the filesystem will only return FHEXPIRED for filehandles
> > belonging to files that cannot be renamed, then it is perfectly reasonable to
> > repeat the name lookup to re-access the file after the server forgets about
> > an old filehandle. The mount option is how you communicate this knowledge,
> > because the RFC doesn't provide a way to communicate it.
> >
> This was one of 2 reasons for implementing this, and we actually run into this with
> certain z/OS systems, because the z/OS NFS server currently uses FHEXPIRED in this way.

So you're both basically saying that 'we know that this is a bad idea,
so let's punt it to the users and assume they will know those few
exceptions when it is safe to use'?
In that case, are you planning on documenting what constitutes safe
usage? So far, I've seen nothing either in the discussion here or in the
changelogs that explains precisely when you can safely enable this mount
option.

Note that just disabling renames is, as I stated yesterday, not a
sufficient condition. You pretty much need a read-only filesystem
situation, in which case you can easily devise persistent filehandle
solutions that work just as well.

> The other thought was that this could be used for migration/replication
> between file synced servers. So, if we wanted to switch/move to another server where
> the file names were the same but all the inode numbers were different you could use
> this to refresh the invalid file handles on the new server.

This runs into the rename problem. How do you guarantee that the files
haven't been renamed before the migration event occurred? How does the
client identify that the file is the same one when it looks it up on the
new server?

Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-11-11 23:06:14

by Matthew Treinish

Subject: [PATCH/RFC] Hard code testing on server <ONLY FOR TESTING>

This should only be used for testing volatile file handle support.

Force STALE errors to be FHEXPIRED and have the server say it uses
FH_VOLATILE_ANY.

To force an expiry, change the fsid to 0 from another value in /etc/exports on
the server and then restart the server. This would normally make all the
handles stale, which with this patch forces FHEXPIRED to be returned for all
putfh operations.

Signed-off-by: Matthew Treinish <[email protected]>

---
fs/nfsd/nfs4xdr.c | 2 +-
fs/nfsd/nfsd.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index c8bf405..9d4a768 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1963,7 +1963,7 @@ nfsd4_encode_fattr(struct svc_fh *fhp, struct svc_export *exp,
 		if ((buflen -= 4) < 0)
 			goto out_resource;
 		if (exp->ex_flags & NFSEXP_NOSUBTREECHECK)
-			WRITE32(NFS4_FH_PERSISTENT);
+			WRITE32(NFS4_FH_PERSISTENT|NFS4_FH_VOLATILE_ANY);
 		else
 			WRITE32(NFS4_FH_PERSISTENT|NFS4_FH_VOL_RENAME);
 	}
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 7ecfa24..959652c 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -124,7 +124,7 @@ void nfsd_lockd_shutdown(void);
#define nfserr_nametoolong cpu_to_be32(NFSERR_NAMETOOLONG)
#define nfserr_notempty cpu_to_be32(NFSERR_NOTEMPTY)
#define nfserr_dquot cpu_to_be32(NFSERR_DQUOT)
-#define nfserr_stale cpu_to_be32(NFSERR_STALE)
+#define nfserr_stale cpu_to_be32(NFS4ERR_FHEXPIRED)
#define nfserr_remote cpu_to_be32(NFSERR_REMOTE)
#define nfserr_wflush cpu_to_be32(NFSERR_WFLUSH)
#define nfserr_badhandle cpu_to_be32(NFSERR_BADHANDLE)
--
1.7.4.4


2011-11-14 21:47:25

by Matthew Treinish

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Mon, Nov 14, 2011 at 10:09:58AM +0100, Tigran Mkrtchyan wrote:
> > The other thought was that this could be used for migration/replication
> > between file synced servers. So, if we wanted to switch/move to another server where
> > the file names were the same but all the inode numbers were different you could use
> > this to refresh the invalid file handles on the new server.
>
> If my spec reading is correct, then the spec does not force you to use
> the same file handles on the redirected server. This is, of course, if you
> want to return NFS4ERR_MOVED.

What I was referring to was Section 6.4:

Filehandles for filesystems that are replicated or migrated generally
have the same semantics as for filesystems that are not replicated or
migrated. For example, if a filesystem has persistent filehandles
and it is migrated to another server, the filehandle values for the
filesystem will be valid at the new server.

So in the case where you sync the files between 2 servers using some type of
file transfer (for example, rsync) the file handles will be different because
all the inode numbers would be different between the machines. You can use
VFH to get around this, though; the same section states:

If the bit FH4_VOL_MIGRATION is set in the fh_expire_type attribute,
the client must treat the volatile filehandle as if the server had
returned the NFS4ERR_FHEXPIRED error. At the migration or replication
event in the presence of the FH4_VOL_MIGRATION bit, the client will not
present the original or old volatile filehandle to the new server.
The client will start its communication with the new server by
recovering its filehandles using the saved file names.

So assuming you synced the files so that the paths are the same on the 2 servers,
by having the servers just return FH4_VOL_MIGRATION for FH_EXPIRE_TYPE, you could
then use VFH recovery to look up the current file handles on the new server.

-Matt Treinish
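
The fh_expire_type bits discussed above can be tested with plain bit masks. The following is a minimal sketch and not part of the patch series: the FH4_* values are the constants defined in RFC 3530, while the two helper function names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* fh_expire_type bit values as defined in RFC 3530. */
#define FH4_PERSISTENT          0x00000000u
#define FH4_NOEXPIRE_WITH_OPEN  0x00000001u
#define FH4_VOLATILE_ANY        0x00000002u
#define FH4_VOL_MIGRATION       0x00000004u
#define FH4_VOL_RENAME          0x00000008u

/* Hypothetical helper: true if the server says handles may expire at any time. */
static bool fh_may_expire_any_time(uint32_t fh_expire_type)
{
	return (fh_expire_type & FH4_VOLATILE_ANY) != 0;
}

/*
 * Hypothetical helper: true if handles must be treated as expired after a
 * migration or replication event, so the client has to recover them from
 * the saved file names instead of presenting them to the new server.
 */
static bool fh_expires_on_migration(uint32_t fh_expire_type)
{
	return (fh_expire_type & FH4_VOL_MIGRATION) != 0;
}
```

In the rsync scenario described above, a client would consult something like fh_expires_on_migration() when it is redirected to the replica, and re-look-up its files by path if it returns true.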


2011-11-12 14:50:00

by Christoph Hellwig

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Fri, Nov 11, 2011 at 07:13:29PM -0500, Trond Myklebust wrote:
> On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
> > This patch series implements client side support for volatile file handle
> > recovery (RFC 3530 section 4.2 and 4.3) with walk back using the dcache. To
> > test the client you either need a server that supports volatile file handles or
> > you can hard code the server to output NFS4ERR_FHEXPIRED instead of
> > NFSERR_STALE. (See the last patch in the series)
>
> WHY do we want to support this kind of "feature"? As you said, the RFC
> doesn't actually help in figuring out how this crap is supposed to work
> in practice, so why do we even consider starting to give a damn?

*nod*. Pretending we handle it seems fairly dangerous. I'd much prefer
outright rejecting it.

2011-11-11 23:06:02

by Matthew Treinish

Subject: [PATCH/RFC 7/7] Added error handling for NFS4ERR_FHEXPIRED

Added checks in nfs4_handle_exception() for FHEXPIRED.
If FHEXPIRED is received from the server and the appropriate
attributes are enabled, then the client calls
nfs4_fhexpired_recovery() to perform the lookup operation to
try to recover the expired vfh.

If the mount option is not enabled or the server's FH_EXPIRE_TYPE
doesn't have VOLATILE_ANY, then the client will convert the
FHEXPIRED error into ESTALE since recovery isn't possible.

Signed-off-by: Matthew Treinish <[email protected]>
---
fs/nfs/nfs4proc.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ebc5ee9..8ae5c49 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -277,6 +277,16 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struc
 		case -NFS4ERR_STALE_CLIENTID:
 			nfs4_schedule_lease_recovery(clp);
 			goto wait_on_recovery;
+		case -NFS4ERR_FHEXPIRED:
+			if (server->flags & NFS_MOUNT_VFHRETRY)
+				if (server->fhexpiretype & NFS4_FH_VOLATILE_ANY) {
+					ret = nfs4_fhexpired_recovery(server, exception);
+					if (!ret)
+						ret = -EAGAIN;
+					break;
+				}
+			ret = -ESTALE;
+			break;
 #if defined(CONFIG_NFS_V4_1)
 		case -NFS4ERR_BADSESSION:
 		case -NFS4ERR_BADSLOT:
--
1.7.4.4
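
The decision made by the new FHEXPIRED case can be modelled outside the kernel. The sketch below is a standalone approximation, not the patch itself: the SKETCH_* bit values are placeholders chosen for illustration (the real NFS_MOUNT_VFHRETRY and NFS4_FH_VOLATILE_ANY definitions live in the kernel headers), and recovery_result stands in for the return value of nfs4_fhexpired_recovery().

```c
#include <errno.h>

/* Placeholder bit values for illustration only; not the kernel's values. */
#define SKETCH_MOUNT_VFHRETRY   0x1u
#define SKETCH_FH_VOLATILE_ANY  0x2u

/*
 * Model of the FHEXPIRED branch. recovery_result is assumed to be what the
 * recovery routine returned: 0 on success, a negative errno on failure.
 */
static int fhexpired_disposition(unsigned int mount_flags,
				 unsigned int fh_expire_type,
				 int recovery_result)
{
	if ((mount_flags & SKETCH_MOUNT_VFHRETRY) &&
	    (fh_expire_type & SKETCH_FH_VOLATILE_ANY)) {
		/* Recovery succeeded: ask the caller to retry the operation. */
		if (recovery_result == 0)
			return -EAGAIN;
		/* Recovery failed: propagate the recovery error. */
		return recovery_result;
	}
	/* Option not set or server does not advertise VOLATILE_ANY. */
	return -ESTALE;
}
```

Written this way, the two nested if statements of the hunk collapse into a single condition, which makes it easier to see that -ESTALE is returned whenever either check fails.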


2011-11-12 03:36:03

by Malahal Naineni

Subject: Re: [PATCH/RFC 1/7] New mount option for volatile filehandle recovery

Trond Myklebust [[email protected]] wrote:
> On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
> > The new 'vfhretry' mount option will be used to enable the volatile filehandle
> > recovery routines in the client. On an expired filehandle, the client
> > will attempt to recover by performing a lookup on the name of the file.
> >
> > This mechanism of recovery isn't necessarily safe for a posix filesystem so
> > using the mount option will allow the user to enable this at their own risk.
> > If the mount option is not turned on, the FHEXPIRED error will be converted to
> > ESTALE.
>
> Either we handle NFS4ERR_FHEXPIRED, or we don't... What is the
> justification for wanting to turn this off on a per-mount basis?

VFH should work with read-only file systems. Is there a way to find if
the exported file system is read-only? If there is, Matt should use that
instead of using this mount option.

--Malahal.


2011-11-13 21:08:09

by NeilBrown

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Sun, 13 Nov 2011 11:36:32 -0500 "J. Bruce Fields" <[email protected]>
wrote:

> On Sun, Nov 13, 2011 at 02:45:48PM +0100, Tigran Mkrtchyan wrote:
> > I have a server which runs on top of hadoop. The problem with hadoop
> > is that there is no way to have persistent file handles. I am
> > currently working on a way to do that - either simulate them or add
> > support for a unique file id to hadoop. If the linux client supports
> > volatile file handles, then I can stop inventing workarounds.
>
> I might call that "fixing" rather than inventing workarounds.
>
> Out of curiosity: if we really wanted to support such filesystems, what
> would we need in the protocol? Just saying "filehandles aren't stable,
> deal with it" seems insufficient.

1/ no guarantees if the file is not 'open'
2/ two possible responses to FHEXPIRED:
   a/ perform a GETATTR and request the 'filehandle' attribute. Client then
      uses that filehandle instead.
   b/ perform LOOKUP on parent filehandle with same name as before, and use
      the resulting filehandle.
   Server specifies which somehow (different error code? magic attribute
   flag somewhere? doesn't really matter)

If a server has objects that are never renamed, it can easily use volatile
file handles.
If a server has objects which can be renamed and wants to use volatile file
handles, then if such an object is open and is about to be renamed, it must
first log to stable storage some mapping to allow it to access the file from
the old volatile file handle. And of course it cannot allow renames during
the grace period, but I think we already have that.
Also, if the VFH is such that it will be lost on a reboot, the server must
log it to stable storage before allowing an open.

>
> Say there was some way for the client to indicate which filehandles it
> currently has in use, and some way for the server to ask the client to
> return in-use filehandles if there are too many (like DELEG_RECALL_ANY).
> Then the server could at least place a limit on the number of
> filehandles that it had to guarantee persistent.
>
> And/or the client could get a callback on rename/link/unlink. Bah.
>
> Would any of that actually be easier than implementing persistent file
> handles?

Easier for whom? Should NFSv4 be designed to make life easier for filesystem
implementers, or easier for NFS implementers :-?

While I don't have concrete examples I would not be surprised if there were
filesystems where implementing limited persistence was practical while
implementing universal persistence was not.


NeilBrown
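
Response 2b above (LOOKUP on the parent with the same name as before) can be sketched as a small recursive walk. This is a toy model under stated assumptions, not the series' nfs4_fhexpired_recovery(): struct node, recover_fh(), and the toy_* server callbacks are all hypothetical, filehandles are plain integers, and an fh_valid flag stands in for the client learning about expiry through an FHEXPIRED reply.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical client-side record: a name, a parent, and a cached handle. */
struct node {
	struct node *parent;	/* NULL for the root of the mount */
	const char *name;	/* name used in the original LOOKUP */
	int fh;			/* cached filehandle (toy integer) */
	int fh_valid;		/* 0 once the handle is known to be expired */
};

/* Hypothetical server operations; both return 0 on success. */
typedef int (*lookup_fn)(int parent_fh, const char *name, int *out_fh);
typedef int (*getroot_fn)(int *out_fh);

/*
 * Refresh n->fh by repeating the LOOKUP in its parent. If the parent's
 * handle is expired too, recover the parent first, walking back as far
 * as the root, which is refreshed with getroot().
 */
static int recover_fh(struct node *n, lookup_fn lookup, getroot_fn getroot)
{
	int err;

	if (n->parent == NULL) {
		err = getroot(&n->fh);
	} else {
		if (!n->parent->fh_valid) {
			err = recover_fh(n->parent, lookup, getroot);
			if (err)
				return err;
		}
		err = lookup(n->parent->fh, n->name, &n->fh);
	}
	if (!err)
		n->fh_valid = 1;
	return err;
}

/* Toy server used only to exercise the walk: handles derive from names. */
static int toy_getroot(int *out_fh)
{
	*out_fh = 1;
	return 0;
}

static int toy_lookup(int parent_fh, const char *name, int *out_fh)
{
	if (name[0] == '\0')
		return -1;	/* empty names cannot be looked up */
	*out_fh = parent_fh * 31 + (int)strlen(name);
	return 0;
}
```

Note that the walk only re-resolves names. It does nothing to prove that the object found is the same file as before, which is exactly the rename concern raised elsewhere in this thread.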



2011-11-13 16:45:41

by J. Bruce Fields

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Sun, Nov 13, 2011 at 11:42:40AM -0500, J. Bruce Fields wrote:
> On Sun, Nov 13, 2011 at 02:54:00PM +1100, NeilBrown wrote:
> > So if you know the filesystem will only return FHEXPIRED for filehandles
> > belonging to files that cannot be renamed, then it is perfectly reasonable to
> > repeat the name lookup to re-access the file after the server forgets about
> > an old filehandle. The mount option is how you communicate this knowledge,
> > because the RFC doesn't provide a way to communicate it.
>
> What about http://tools.ietf.org/html/rfc5661#section-11.11
> STATUS4_FIXED?

Also, could a re-looked-up file be considered sufficiently safe to use
if all the attributes matched?

(I guess not: inode numbers, change attributes, etc., could agree by
coincidence, so it would never be completely reliable.)

--b.

2011-11-13 13:45:49

by Mkrtchyan, Tigran

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

I have a server which runs on top of hadoop. The problem with hadoop
is that there is no way to have persistent file handles. I am
currently working on a way to do that - either simulate them or add
support for a unique file id to hadoop. If the linux client supports
volatile file handles, then I can stop inventing workarounds.

Tigran.

On Sun, Nov 13, 2011 at 4:54 AM, NeilBrown <[email protected]> wrote:
> On Sat, 12 Nov 2011 09:49:53 -0500 Christoph Hellwig <[email protected]>
> wrote:
>
>> On Fri, Nov 11, 2011 at 07:13:29PM -0500, Trond Myklebust wrote:
>> > On Fri, 2011-11-11 at 18:04 -0500, Matthew Treinish wrote:
>> > > This patch series implements client side support for volatile file handle
>> > > recovery (RFC 3530 section 4.2 and 4.3) with walk back using the dcache. To
>> > > test the client you either need a server that supports volatile file handles or
>> > > you can hard code the server to output NFS4ERR_FHEXPIRED instead of
>> > > NFSERR_STALE. (See the last patch in the series)
>> >
>> > WHY do we want to support this kind of "feature"? As you said, the RFC
>> > doesn't actually help in figuring out how this crap is supposed to work
>> > in practice, so why do we even consider starting to give a damn?
>>
>> *nod*. Pretending we handle it seems fairly dangerous.  I'd much prefer
>> outright rejecting it.
>
> Hence the suggested mount option.
>
> A server might not be able to provide stable file handles, but can ensure
> that files don't get renamed - for these filesystems, the name is a
> reliable stable handle for the file (it just doesn't fit in the NFSv4 file
> handle structure).
>
> So if you know the filesystem will only return FHEXPIRED for filehandles
> belonging to files that cannot be renamed, then it is perfectly reasonable to
> repeat the name lookup to re-access the file after the server forgets about
> an old filehandle.  The mount option is how you communicate this knowledge,
> because the RFC doesn't provide a way to communicate it.
>
> NeilBrown
>
>
>

2012-01-17 15:18:27

by J. Bruce Fields

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Mon, Jan 16, 2012 at 10:52:28AM -0600, Malahal Naineni wrote:
> J. Bruce Fields [[email protected]] wrote:
> > On Fri, Jan 13, 2012 at 11:09:14AM -0600, Malahal Naineni wrote:
> > > Trond Myklebust [[email protected]] wrote:
> > > > On Mon, 2011-11-14 at 08:07 +1100, NeilBrown wrote:
> > > >
> > > > > If a server has objects that are never renamed, it can easily use volatile
> > > > > file handles.
> > > > > If a server has objects which can be renamed and wants to use volatile file
> > > > > handles, then if such an object is open and is about to be renamed, it must
> > > > > first log to stable storage some mapping to allow it to access the file from
> > > > > the old volatile file handle. And of course it cannot allow renames during
> > > > > the grace period, but I think we already have that.
> > > > > Also, if the VFH is such that it will be lost on a reboot, the server must
> > > > > log it to stable storage before allowing an open.
> > > >
> > > > BTW: If the namespace is stable, then the server can easily implement
> > > > permanent filehandles. Use a hash of the pathname as the filehandle, and
> > > > set up a hidden directory ('/.filehandles') containing symlinks that map
> > > > said hash back to the correct pathname. No need for volatile
> > > > filehandles.
> > >
> > > Neil and Trond, one of our use cases is for a read only file system. The
> > > name space is stable and Volatile File Handle support should not have
> > > any issues under those conditions, correct?
> >
> > Dumb question: remind me which filesystem you're exporting that can't
> > already generate stable filehandles?
>
> Only answers can be dumb! Bruce, we have ext3/ext4 file systems on two
> separate servers. The file systems are mirrored using rsync as and when
> needed. We would like to use the servers as replicas.

And why aren't you rsync'ing the underlying filesystem image instead?
Is that too slow?

> Since the file systems are mirrored using "rsync", the NFS file handles
> each server exports would be different. We would like to use volatile
> file handles feature of NFSv4 for this.

In theory the hidden directory for reverse lookups would work, but it
seems like it would be complicated to get right:
- How do you generate the directory and keep it up to date?
- What happens if somebody breaks the rules and updates the
filesystem while it's being exported?

Somehow it feels like there should be a simpler solution.

Maybe there would be other applications for that kind of
filehandle->file mapping, though, I don't know.

--b.

2012-01-17 17:22:53

by Malahal Naineni

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

J. Bruce Fields [[email protected]] wrote:
> > Only answers can be dumb! Bruce, we have ext3/ext4 file systems on two
> > separate servers. The file systems are mirrored using rsync as and when
> > needed. We would like to use the servers as replicas.
>
> And why aren't you rsync'ing the underlying filesystem image instead?
> Is that too slow?

The file system image is too big to do entire image level rsync'ing.

> > Since the file systems are mirrored using "rsync", the NFS file handles
> > each server exports would be different. We would like to use volatile
> > file handles feature of NFSv4 for this.
>
> In theory the hidden directory for reverse lookups would work, but it
> seems like it would be complicated to get right:
> - How do you generate the directory and keep it up to date?
> - What happens if somebody breaks the rules and updates the
> filesystem while it's being exported?

I think so too. We think Volatile file handle support on the client side
is simpler for our use case (read only NFS file systems)

Regards, Malahal.


2012-01-17 18:47:43

by J. Bruce Fields

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Tue, Jan 17, 2012 at 11:22:31AM -0600, Malahal Naineni wrote:
> J. Bruce Fields [[email protected]] wrote:
> > > Only answers can be dumb! Bruce, we have ext3/ext4 file systems on two
> > > separate servers. The file systems are mirrored using rsync as and when
> > > needed. We would like to use the servers as replicas.
> >
> > And why aren't you rsync'ing the underlying filesystem image instead?
> > Is that too slow?
>
> The file system image is too big to do entire image level rsync'ing.

Is there any way you could get hints from the filesystem about which
blocks are actually used, that could make this as efficient?

> > > Since the file systems are mirrored using "rsync", the NFS file handles
> > > each server exports would be different. We would like to use volatile
> > > file handles feature of NFSv4 for this.
> >
> > In theory the hidden directory for reverse lookups would work, but it
> > seems like it would be complicated to get right:
> > - How do you generate the directory and keep it up to date?
> > - What happens if somebody breaks the rules and updates the
> > filesystem while it's being exported?
>
> I think so too. We think Volatile file handle support on the client side
> is simpler for our use case (read only NFS file systems)

Won't it be confusing to applications if inode numbers change on
migration?

--b.

2012-01-16 17:59:50

by Malahal Naineni

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

J. Bruce Fields [[email protected]] wrote:
> On Fri, Jan 13, 2012 at 11:09:14AM -0600, Malahal Naineni wrote:
> > Trond Myklebust [[email protected]] wrote:
> > > On Mon, 2011-11-14 at 08:07 +1100, NeilBrown wrote:
> > >
> > > > If a server has objects that are never renamed, it can easily use volatile
> > > > file handles.
> > > > If a server has objects which can be renamed and wants to use volatile file
> > > > handles, then if such an object is open and is about to be renamed, it must
> > > > first log to stable storage some mapping to allow it to access the file from
> > > > the old volatile file handle. And of course it cannot allow renames during
> > > > the grace period, but I think we already have that.
> > > > Also, if the VFH is such that it will be lost on a reboot, the server must
> > > > log it to stable storage before allowing an open.
> > >
> > > BTW: If the namespace is stable, then the server can easily implement
> > > permanent filehandles. Use a hash of the pathname as the filehandle, and
> > > set up a hidden directory ('/.filehandles') containing symlinks that map
> > > said hash back to the correct pathname. No need for volatile
> > > filehandles.
> >
> > Neil and Trond, one of our use cases is for a read only file system. The
> > name space is stable and Volatile File Handle support should not have
> > any issues under those conditions, correct?
>
> Dumb question: remind me which filesystem you're exporting that can't
> already generate stable filehandles?

Only answers can be dumb! Bruce, we have ext3/ext4 file systems on two
separate servers. The file systems are mirrored using rsync as and when
needed. We would like to use the servers as replicas.

Since the file systems are mirrored using "rsync", the NFS file handles
each server exports would be different. We would like to use volatile
file handles feature of NFSv4 for this.

Hope that helps.

Regards, Malahal.


2012-01-13 17:10:02

by Malahal Naineni

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

Trond Myklebust [[email protected]] wrote:
> On Mon, 2011-11-14 at 08:07 +1100, NeilBrown wrote:
>
> > If a server has objects that are never renamed, it can easily use volatile
> > file handles.
> > If a server has objects which can be renamed and wants to use volatile file
> > handles, then if such an object is open and is about to be renamed, it must
> > first log to stable storage some mapping to allow it to access the file from
> > the old volatile file handle. And of course it cannot allow renames during
> > the grace period, but I think we already have that.
> > Also, if the VFH is such that it will be lost on a reboot, the server must
> > log it to stable storage before allowing an open.
>
> BTW: If the namespace is stable, then the server can easily implement
> permanent filehandles. Use a hash of the pathname as the filehandle, and
> set up a hidden directory ('/.filehandles') containing symlinks that map
> said hash back to the correct pathname. No need for volatile
> filehandles.

Neil and Trond, one of our use cases is for a read only file system. The
name space is stable and Volatile File Handle support should not have
any issues under those conditions, correct?

Trond, trying to understand how we can make file handles permanent at
the servers with your ideas. We do use linux NFS server. Looks like the
server needs a new config parameter to hide '/.filehandles' and use it
for storing permanent file handles. Also, the solution needs a tool
that generates/populates the '/.filehandles' directory. I don't know how
best we should handle hash collisions at this point (shouldn't be an
issue once we agree to a method though). Anything else I am missing?

Any thoughts from linux NFS server community? Is this approach
acceptable?

Regards, Malahal.
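
The pathname-hash idea quoted above can be sketched in user space. This is only an illustration under assumptions: the thread does not name a hash function, so 64-bit FNV-1a is used here as a stand-in, and fh_symlink_path() is a hypothetical helper that merely builds the name a symlink under '/.filehandles' could have. Collision handling, which is raised as an open question above, is not addressed.

```c
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a; an assumed choice, not one made anywhere in the thread. */
static uint64_t fnv1a64(const char *s)
{
	uint64_t h = 14695981039346656037ULL;	/* FNV offset basis */

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 1099511628211ULL;		/* FNV prime */
	}
	return h;
}

/*
 * Build "/.filehandles/<16 hex digits>" for the given pathname. A tool
 * populating the hidden directory would create a symlink with this name
 * pointing back at pathname. Returns 0 on success, -1 if buf is too small.
 */
static int fh_symlink_path(const char *pathname, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/.filehandles/%016llx",
			 (unsigned long long)fnv1a64(pathname));

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The 64-bit hash value itself would serve as the persistent filehandle payload, and the server would resolve an incoming handle by reading the symlink of that name. This only stays correct while the namespace is stable, as the quoted proposal assumes.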


2012-01-14 01:38:54

by J. Bruce Fields

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

On Fri, Jan 13, 2012 at 11:09:14AM -0600, Malahal Naineni wrote:
> Trond Myklebust [[email protected]] wrote:
> > On Mon, 2011-11-14 at 08:07 +1100, NeilBrown wrote:
> >
> > > If a server has objects that are never renamed, it can easily use volatile
> > > file handles.
> > > If a server has objects which can be renamed and wants to use volatile file
> > > handles, then if such an object is open and is about to be renamed, it must
> > > first log to stable storage some mapping to allow it to access the file from
> > > the old volatile file handle. And of course it cannot allow renames during
> > > the grace period, but I think we already have that.
> > > Also, if the VFH is such that it will be lost on a reboot, the server must
> > > log it to stable storage before allowing an open.
> >
> > BTW: If the namespace is stable, then the server can easily implement
> > permanent filehandles. Use a hash of the pathname as the filehandle, and
> > set up a hidden directory ('/.filehandles') containing symlinks that map
> > said hash back to the correct pathname. No need for volatile
> > filehandles.
>
> Neil and Trond, one of our use cases is for a read only file system. The
> name space is stable and Volatile File Handle support should not have
> any issues under those conditions, correct?

Dumb question: remind me which filesystem you're exporting that can't
already generate stable filehandles?

--b.

>
> Trond, trying to understand how we can make file handles permanent at
> the servers with your ideas. We do use linux NFS server. Looks like the
> server needs a new config parameter to hide '/.filehandles' and use it
> for storing permanent file handles. Also, the solution needs a tool
> that generates/populates the '/.filehandles' directory. I don't know how
> best we should handle hash collisions at this point (shouldn't be an
> issue once we agree to a method though). Anything else I am missing?
>
> Any thoughts from linux NFS server community? Is this approach
> acceptable?
>
> Regards, Malahal.
>

2012-01-17 19:44:33

by Malahal Naineni

Subject: Re: [PATCH/RFC 0/7] Volatile Filehandle Client-side Support

J. Bruce Fields [[email protected]] wrote:
> > I think so too. We think Volatile file handle support on the client side
> > is simpler for our use case (read only NFS file systems)
>
> Won't it be confusing to applications if inode numbers change on
> migration?

Some applications do depend on stable inode numbers. We can't have VFH
and stable inode numbers! We don't have any such apps in our
environment. Mount option for VFH (use at your own risk) would work fine
for us.