2012-06-29 18:58:07

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 00/17] vfs: add the ability to retry on ESTALE to several syscalls

This patchset is the third version of the patchset to add ESTALE
handling to several syscalls. The basic idea is to allow the client to
gracefully retry the lookup and call when a NFS server returns ESTALE.

The previous version of this patchset is here:

http://lwn.net/Articles/496103/

When discussing my v2 patchset, Al pointed out that the problems with
audit go beyond just doing multiple getname() calls per syscall attempt.
While that is a problem, the most insidious issue is some very brittle
logic in audit_inode_child(). I believe I now have a solution to that
problem in the patchset that I posted earlier this week:

https://lkml.org/lkml/2012/6/26/306

The above patchset should make it safe to do retries of path-based
syscalls. The main caveat when doing that is that it's important to only
call getname() once per __user string. That ensures that you don't end
up with duplicate audit_names entries per syscall. As a side benefit, it
turns out that that's more efficient on a retry, but it does mean that
we can't use the nice user_path_* helpers in many cases.

I've tried to keep this patchset pretty granular. Some of these patches
could be combined if that's desirable. I'd like to see this go into 3.6,
but it obviously needs to go in after the above-referenced set of audit
related patches is merged.

Jeff Layton (17):
vfs: add a retry_estale helper function to handle retries on ESTALE
vfs: add a kern_path_at function
vfs: make fstatat retry on ESTALE errors from getattr call
vfs: fix readlinkat to retry on ESTALE
vfs: remove user_path_at_empty
vfs: turn "empty" arg in getname_flags into a bool
vfs: add new "reval" argument to kern_path_create
vfs: fix mknodat to retry on ESTALE errors
vfs: fix mkdir to retry on ESTALE errors
vfs: fix symlinkat to retry on ESTALE errors
vfs: fix linkat to retry on ESTALE errors
vfs: make rmdir retry on ESTALE errors
vfs: make do_unlinkat retry on ESTALE errors
vfs: fix renameat to retry on ESTALE errors
vfs: remove user_path_parent
vfs: have do_sys_truncate retry once on an ESTALE error
vfs: have faccessat retry once on an ESTALE error

drivers/base/devtmpfs.c | 7 +-
fs/namei.c | 407 ++++++++++++++++++++++++++++-------------------
fs/open.c | 162 +++++++++++--------
fs/stat.c | 44 ++++--
include/linux/fs.h | 24 +++-
include/linux/namei.h | 4 +-
net/unix/af_unix.c | 3 +-
7 files changed, 401 insertions(+), 250 deletions(-)

--
1.7.7.6



2012-06-29 18:58:18

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 07/17] vfs: add new "reval" argument to kern_path_create

...for now, all of the callers pass in "false". Eventually, we'll set
that to "true" when we retry the lookup after getting back an ESTALE on
a call.

While we're at it, change the is_dir arg to a bool since that's how it's
used currently.

Signed-off-by: Jeff Layton <[email protected]>
---
drivers/base/devtmpfs.c | 7 ++++---
fs/namei.c | 12 +++++++++---
include/linux/namei.h | 2 +-
net/unix/af_unix.c | 3 ++-
4 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 765c3a2..fdd1bff 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -148,7 +148,7 @@ static int dev_mkdir(const char *name, umode_t mode)
struct path path;
int err;

- dentry = kern_path_create(AT_FDCWD, name, &path, 1);
+ dentry = kern_path_create(AT_FDCWD, name, &path, true, false);
if (IS_ERR(dentry))
return PTR_ERR(dentry);

@@ -195,10 +195,11 @@ static int handle_create(const char *nodename, umode_t mode, struct device *dev)
struct path path;
int err;

- dentry = kern_path_create(AT_FDCWD, nodename, &path, 0);
+ dentry = kern_path_create(AT_FDCWD, nodename, &path, false, false);
if (dentry == ERR_PTR(-ENOENT)) {
create_path(nodename);
- dentry = kern_path_create(AT_FDCWD, nodename, &path, 0);
+ dentry = kern_path_create(AT_FDCWD, nodename, &path,
+ false, false);
}
if (IS_ERR(dentry))
return PTR_ERR(dentry);
diff --git a/fs/namei.c b/fs/namei.c
index a20faa5..86aa8ee 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2536,11 +2536,17 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
return file;
}

-struct dentry *kern_path_create(int dfd, const char *pathname, struct path *path, int is_dir)
+struct dentry *kern_path_create(int dfd, const char *pathname, struct path *path, bool is_dir, bool reval)
{
struct dentry *dentry = ERR_PTR(-EEXIST);
struct nameidata nd;
- int error = do_path_lookup(dfd, pathname, LOOKUP_PARENT, &nd);
+ int error;
+ unsigned int lookup_flags = LOOKUP_PARENT;
+
+ if (reval)
+ lookup_flags |= LOOKUP_REVAL;
+
+ error = do_path_lookup(dfd, pathname, lookup_flags, &nd);
if (error)
return ERR_PTR(error);

@@ -2594,7 +2600,7 @@ struct dentry *user_path_create(int dfd, const char __user *pathname, struct pat
struct dentry *res;
if (IS_ERR(tmp))
return ERR_CAST(tmp);
- res = kern_path_create(dfd, tmp, path, is_dir);
+ res = kern_path_create(dfd, tmp, path, (bool)is_dir, false);
putname(tmp);
return res;
}
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 1ea93d5..9e135e3 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -76,7 +76,7 @@ extern int user_path_at(int, const char __user *, unsigned, struct path *);

extern int kern_path(const char *, unsigned, struct path *);

-extern struct dentry *kern_path_create(int, const char *, struct path *, int);
+extern struct dentry *kern_path_create(int, const char *, struct path *, bool, bool);
extern struct dentry *user_path_create(int, const char __user *, struct path *, int);
extern int kern_path_parent(const char *, struct nameidata *);
extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 641f2e4..d94caa9 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -866,7 +866,8 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
* Get the parent directory, calculate the hash for last
* component.
*/
- dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
+ dentry = kern_path_create(AT_FDCWD, sun_path, &path,
+ false, false);
err = PTR_ERR(dentry);
if (IS_ERR(dentry))
goto out_mknod_parent;
--
1.7.7.6


2012-06-29 18:58:29

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 15/17] vfs: remove user_path_parent

...there are no more callers.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 18 ------------------
1 files changed, 0 insertions(+), 18 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b38d734..7c03af9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1941,24 +1941,6 @@ int user_path_at(int dfd, const char __user *name, unsigned flags,
return err;
}

-static int user_path_parent(int dfd, const char __user *path,
- struct nameidata *nd, char **name)
-{
- char *s = getname(path);
- int error;
-
- if (IS_ERR(s))
- return PTR_ERR(s);
-
- error = do_path_lookup(dfd, s, LOOKUP_PARENT, nd);
- if (error)
- putname(s);
- else
- *name = s;
-
- return error;
-}
-
/*
* It's inline, so penalty for filesystems that don't use sticky bit is
* minimal.
--
1.7.7.6


2012-06-29 18:58:24

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 12/17] vfs: make rmdir retry on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 82 +++++++++++++++++++++++++++++++++--------------------------
1 files changed, 46 insertions(+), 36 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e902770..7fccc1c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2845,52 +2845,62 @@ out:
static long do_rmdir(int dfd, const char __user *pathname)
{
int error = 0;
- char * name;
+ char *name;
struct dentry *dentry;
struct nameidata nd;
+ unsigned int try = 0;
+ unsigned int lookup_flags = LOOKUP_PARENT;

- error = user_path_parent(dfd, pathname, &nd, &name);
- if (error)
- return error;
+ name = getname(pathname);
+ if (IS_ERR(name))
+ return PTR_ERR(name);

- switch(nd.last_type) {
- case LAST_DOTDOT:
- error = -ENOTEMPTY;
- goto exit1;
- case LAST_DOT:
- error = -EINVAL;
- goto exit1;
- case LAST_ROOT:
- error = -EBUSY;
- goto exit1;
- }
+ do {
+ error = do_path_lookup(dfd, name, lookup_flags, &nd);
+ if (error)
+ break;

- nd.flags &= ~LOOKUP_PARENT;
+ switch (nd.last_type) {
+ case LAST_DOTDOT:
+ error = -ENOTEMPTY;
+ goto exit1;
+ case LAST_DOT:
+ error = -EINVAL;
+ goto exit1;
+ case LAST_ROOT:
+ error = -EBUSY;
+ goto exit1;
+ }

- mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
- dentry = lookup_hash(&nd);
- error = PTR_ERR(dentry);
- if (IS_ERR(dentry))
- goto exit2;
- if (!dentry->d_inode) {
- error = -ENOENT;
- goto exit3;
- }
- error = mnt_want_write(nd.path.mnt);
- if (error)
- goto exit3;
- error = security_path_rmdir(&nd.path, dentry);
- if (error)
- goto exit4;
- error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
+ nd.flags &= ~LOOKUP_PARENT;
+
+ mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex,
+ I_MUTEX_PARENT);
+ dentry = lookup_hash(&nd);
+ error = PTR_ERR(dentry);
+ if (IS_ERR(dentry))
+ goto exit2;
+ if (!dentry->d_inode) {
+ error = -ENOENT;
+ goto exit3;
+ }
+ error = mnt_want_write(nd.path.mnt);
+ if (error)
+ goto exit3;
+ error = security_path_rmdir(&nd.path, dentry);
+ if (error)
+ goto exit4;
+ error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
exit4:
- mnt_drop_write(nd.path.mnt);
+ mnt_drop_write(nd.path.mnt);
exit3:
- dput(dentry);
+ dput(dentry);
exit2:
- mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
+ mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
exit1:
- path_put(&nd.path);
+ path_put(&nd.path);
+ lookup_flags |= LOOKUP_REVAL;
+ } while (retry_estale(error, try++));
putname(name);
return error;
}
--
1.7.7.6


2012-06-29 18:58:19

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 08/17] vfs: fix mknodat to retry on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 68 +++++++++++++++++++++++++++++++++++------------------------
1 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 86aa8ee..ef3ccee 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2656,44 +2656,56 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
struct dentry *dentry;
struct path path;
int error;
+ char *name;
+ unsigned int try = 0;

if (S_ISDIR(mode))
return -EPERM;

- dentry = user_path_create(dfd, filename, &path, 0);
- if (IS_ERR(dentry))
- return PTR_ERR(dentry);
+ name = getname(filename);
+ if (IS_ERR(name))
+ return PTR_ERR(name);

- if (!IS_POSIXACL(path.dentry->d_inode))
- mode &= ~current_umask();
- error = may_mknod(mode);
- if (error)
- goto out_dput;
- error = mnt_want_write(path.mnt);
- if (error)
- goto out_dput;
- error = security_path_mknod(&path, dentry, mode, dev);
- if (error)
- goto out_drop_write;
- switch (mode & S_IFMT) {
- case 0: case S_IFREG:
- error = vfs_create(path.dentry->d_inode,dentry,mode,NULL);
+ do {
+ dentry = kern_path_create(dfd, name, &path, false, try);
+ if (IS_ERR(dentry))
+ return PTR_ERR(dentry);
+
+ if (!IS_POSIXACL(path.dentry->d_inode))
+ mode &= ~current_umask();
+ error = may_mknod(mode);
+ if (error)
+ goto out_dput;
+ error = mnt_want_write(path.mnt);
+ if (error)
+ goto out_dput;
+ error = security_path_mknod(&path, dentry, mode, dev);
+ if (error)
+ goto out_drop_write;
+ switch (mode & S_IFMT) {
+ case 0:
+ case S_IFREG:
+ error = vfs_create(path.dentry->d_inode, dentry,
+ mode, NULL);
break;
- case S_IFCHR: case S_IFBLK:
- error = vfs_mknod(path.dentry->d_inode,dentry,mode,
+ case S_IFCHR:
+ case S_IFBLK:
+ error = vfs_mknod(path.dentry->d_inode, dentry, mode,
new_decode_dev(dev));
break;
- case S_IFIFO: case S_IFSOCK:
- error = vfs_mknod(path.dentry->d_inode,dentry,mode,0);
- break;
- }
+ case S_IFIFO:
+ case S_IFSOCK:
+ error = vfs_mknod(path.dentry->d_inode, dentry,
+ mode, 0);
+ }
out_drop_write:
- mnt_drop_write(path.mnt);
+ mnt_drop_write(path.mnt);
out_dput:
- dput(dentry);
- mutex_unlock(&path.dentry->d_inode->i_mutex);
- path_put(&path);
-
+ dput(dentry);
+ mutex_unlock(&path.dentry->d_inode->i_mutex);
+ path_put(&path);
+ } while (retry_estale(error, try++));
+ putname(name);
return error;
}

--
1.7.7.6


2012-06-29 18:58:09

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 02/17] vfs: add a kern_path_at function

This will function like user_path_at, but does not do the getname and
putname, leaving that to the caller.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 27 +++++++++++++++++++--------
include/linux/namei.h | 1 +
2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 196f039..3f46b2f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1912,20 +1912,31 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
return __lookup_hash(&this, base, NULL);
}

+int kern_path_at(int dfd, const char *name, unsigned flags, struct path *path)
+{
+ struct nameidata nd;
+ int err;
+
+ BUG_ON(flags & LOOKUP_PARENT);
+
+ err = do_path_lookup(dfd, name, flags, &nd);
+ if (!err)
+ *path = nd.path;
+
+ return err;
+}
+
int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
struct path *path, int *empty)
{
- struct nameidata nd;
char *tmp = getname_flags(name, flags, empty);
- int err = PTR_ERR(tmp);
- if (!IS_ERR(tmp)) {
-
- BUG_ON(flags & LOOKUP_PARENT);
+ int err;

- err = do_path_lookup(dfd, tmp, flags, &nd);
+ if (IS_ERR(tmp)) {
+ err = PTR_ERR(tmp);
+ } else {
+ err = kern_path_at(dfd, tmp, flags, path);
putname(tmp);
- if (!err)
- *path = nd.path;
}
return err;
}
diff --git a/include/linux/namei.h b/include/linux/namei.h
index ffc0213..cd95fcb 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -66,6 +66,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
#define LOOKUP_ROOT 0x2000
#define LOOKUP_EMPTY 0x4000

+extern int kern_path_at(int, const char *, unsigned, struct path *);
extern int user_path_at(int, const char __user *, unsigned, struct path *);
extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);

--
1.7.7.6


2012-06-29 18:58:12

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 03/17] vfs: make fstatat retry on ESTALE errors from getattr call

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 2 +-
fs/stat.c | 21 ++++++++++++++++-----
include/linux/fs.h | 3 ++-
3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3f46b2f..44c7ccd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -117,7 +117,7 @@
* POSIX.1 2.4: an empty pathname is invalid (ENOENT).
* PATH_MAX includes the nul terminator --RR.
*/
-static char *getname_flags(const char __user *filename, int flags, int *empty)
+char *getname_flags(const char __user *filename, int flags, int *empty)
{
char *result = __getname(), *err;
int len;
diff --git a/fs/stat.c b/fs/stat.c
index b6ff118..5afeb37 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -72,9 +72,11 @@ EXPORT_SYMBOL(vfs_fstat);
int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
int flag)
{
+ char *name;
struct path path;
int error = -EINVAL;
- int lookup_flags = 0;
+ unsigned int try = 0;
+ unsigned int lookup_flags = 0;

if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
AT_EMPTY_PATH)) != 0)
@@ -85,12 +87,21 @@ int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
if (flag & AT_EMPTY_PATH)
lookup_flags |= LOOKUP_EMPTY;

- error = user_path_at(dfd, filename, lookup_flags, &path);
- if (error)
+ name = getname_flags(filename, lookup_flags, NULL);
+ if (IS_ERR(name)) {
+ error = PTR_ERR(name);
goto out;
+ }
+ do {
+ error = kern_path_at(dfd, name, lookup_flags, &path);
+ if (error)
+ break;

- error = vfs_getattr(path.mnt, path.dentry, stat);
- path_put(&path);
+ error = vfs_getattr(path.mnt, path.dentry, stat);
+ path_put(&path);
+ lookup_flags |= LOOKUP_REVAL;
+ } while (retry_estale(error, try++));
+ putname(name);
out:
return error;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4a2aa04..c7d60db 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2060,7 +2060,8 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *,
extern struct file * dentry_open(struct dentry *, struct vfsmount *, int,
const struct cred *);
extern int filp_close(struct file *, fl_owner_t id);
-extern char * getname(const char __user *);
+extern char *getname_flags(const char __user *, int, int *);
+extern char *getname(const char __user *);

/**
* retry_estale - determine whether the caller should retry an operation
--
1.7.7.6


2012-06-29 18:58:26

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 13/17] vfs: make do_unlinkat retry on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 12 +++++++++++-
1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7fccc1c..7fadc99 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2955,8 +2955,14 @@ static long do_unlinkat(int dfd, const char __user *pathname)
struct dentry *dentry;
struct nameidata nd;
struct inode *inode = NULL;
+ unsigned int try = 0;
+ unsigned int lookup_flags = LOOKUP_PARENT;

- error = user_path_parent(dfd, pathname, &nd, &name);
+ name = getname(pathname);
+ if (IS_ERR(name))
+ return PTR_ERR(name);
+retry:
+ error = do_path_lookup(dfd, name, lookup_flags, &nd);
if (error)
return error;

@@ -2994,6 +3000,10 @@ exit3:
iput(inode); /* truncate the inode here */
exit1:
path_put(&nd.path);
+ if (retry_estale(error, try++)) {
+ lookup_flags |= LOOKUP_REVAL;
+ goto retry;
+ }
putname(name);
return error;

--
1.7.7.6


2012-06-29 18:58:31

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 17/17] vfs: have faccessat retry once on an ESTALE error

Signed-off-by: Jeff Layton <[email protected]>
---
fs/open.c | 70 ++++++++++++++++++++++++++++++++++--------------------------
1 files changed, 40 insertions(+), 30 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index e813d90..2bdc531 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -313,6 +313,9 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
struct path path;
struct inode *inode;
int res;
+ unsigned int lookup_flags = LOOKUP_FOLLOW;
+ unsigned int try = 0;
+ char *name;

if (mode & ~S_IRWXO) /* where's F_OK, X_OK, W_OK, R_OK? */
return -EINVAL;
@@ -334,44 +337,51 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
override_cred->cap_permitted;
}

+ name = getname_flags(filename, lookup_flags, NULL);
+ if (IS_ERR(name))
+ return PTR_ERR(name);
+
old_cred = override_creds(override_cred);

- res = user_path_at(dfd, filename, LOOKUP_FOLLOW, &path);
- if (res)
- goto out;
+ do {
+ res = kern_path_at(dfd, name, lookup_flags, &path);
+ if (res)
+ break;

- inode = path.dentry->d_inode;
+ inode = path.dentry->d_inode;

- if ((mode & MAY_EXEC) && S_ISREG(inode->i_mode)) {
+ if ((mode & MAY_EXEC) && S_ISREG(inode->i_mode)) {
+ /*
+ * MAY_EXEC on regular files is denied if the fs is
+ * mounted with the "noexec" flag.
+ */
+ res = -EACCES;
+ if (path.mnt->mnt_flags & MNT_NOEXEC)
+ goto out_path_release;
+ }
+
+ res = inode_permission(inode, mode | MAY_ACCESS);
+ /* SuS v2 requires we report a read only fs too */
+ if (res || !(mode & S_IWOTH) || special_file(inode->i_mode))
+ goto out_path_release;
/*
- * MAY_EXEC on regular files is denied if the fs is mounted
- * with the "noexec" flag.
+ * This is a rare case where using __mnt_is_readonly()
+ * is OK without a mnt_want/drop_write() pair. Since
+ * no actual write to the fs is performed here, we do
+ * not need to telegraph to that to anyone.
+ *
+ * By doing this, we accept that this access is
+ * inherently racy and know that the fs may change
+ * state before we even see this result.
*/
- res = -EACCES;
- if (path.mnt->mnt_flags & MNT_NOEXEC)
- goto out_path_release;
- }
-
- res = inode_permission(inode, mode | MAY_ACCESS);
- /* SuS v2 requires we report a read only fs too */
- if (res || !(mode & S_IWOTH) || special_file(inode->i_mode))
- goto out_path_release;
- /*
- * This is a rare case where using __mnt_is_readonly()
- * is OK without a mnt_want/drop_write() pair. Since
- * no actual write to the fs is performed here, we do
- * not need to telegraph to that to anyone.
- *
- * By doing this, we accept that this access is
- * inherently racy and know that the fs may change
- * state before we even see this result.
- */
- if (__mnt_is_readonly(path.mnt))
- res = -EROFS;
+ if (__mnt_is_readonly(path.mnt))
+ res = -EROFS;

out_path_release:
- path_put(&path);
-out:
+ path_put(&path);
+ lookup_flags |= LOOKUP_REVAL;
+ } while (retry_estale(res, try++));
+ putname(name);
revert_creds(old_cred);
put_cred(override_cred);
return res;
--
1.7.7.6


2012-06-29 18:58:27

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 14/17] vfs: fix renameat to retry on ESTALE errors

...as always, rename is the messiest of the bunch. We have to track
whether to retry or not via a separate flag since the error handling
is already quite complex.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 31 ++++++++++++++++++++++++++-----
1 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7fadc99..b38d734 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3392,12 +3392,25 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
char *from;
char *to;
int error;
+ unsigned int try = 0;
+ bool should_retry = false;
+ unsigned int lookup_flags = LOOKUP_PARENT;

- error = user_path_parent(olddfd, oldname, &oldnd, &from);
- if (error)
+ from = getname(oldname);
+ if (IS_ERR(from))
+ return PTR_ERR(from);
+
+ to = getname(newname);
+ if (IS_ERR(to)) {
+ error = PTR_ERR(to);
goto exit;
+ }
+retry:
+ error = do_path_lookup(olddfd, from, lookup_flags, &oldnd);
+ if (error)
+ goto exit0;

- error = user_path_parent(newdfd, newname, &newnd, &to);
+ error = do_path_lookup(newdfd, to, lookup_flags, &newnd);
if (error)
goto exit1;

@@ -3466,13 +3479,21 @@ exit4:
dput(old_dentry);
exit3:
unlock_rename(new_dir, old_dir);
+ if (retry_estale(error, try++))
+ should_retry = true;
exit2:
path_put(&newnd.path);
- putname(to);
exit1:
path_put(&oldnd.path);
- putname(from);
+ if (should_retry) {
+ should_retry = false;
+ lookup_flags |= LOOKUP_REVAL;
+ goto retry;
+ }
+exit0:
+ putname(to);
exit:
+ putname(from);
return error;
}

--
1.7.7.6


2012-06-29 18:58:20

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 09/17] vfs: fix mkdir to retry on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 43 ++++++++++++++++++++++++++-----------------
1 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ef3ccee..dec5713 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2744,26 +2744,35 @@ SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
struct dentry *dentry;
struct path path;
int error;
+ char *name;
+ unsigned int try = 0;

- dentry = user_path_create(dfd, pathname, &path, 1);
- if (IS_ERR(dentry))
- return PTR_ERR(dentry);
-
- if (!IS_POSIXACL(path.dentry->d_inode))
- mode &= ~current_umask();
- error = mnt_want_write(path.mnt);
- if (error)
- goto out_dput;
- error = security_path_mkdir(&path, dentry, mode);
- if (error)
- goto out_drop_write;
- error = vfs_mkdir(path.dentry->d_inode, dentry, mode);
+ name = getname(pathname);
+ if (IS_ERR(name))
+ return PTR_ERR(name);
+ do {
+ dentry = kern_path_create(dfd, name, &path, true, try);
+ if (IS_ERR(dentry)) {
+ error = PTR_ERR(dentry);
+ break;
+ }
+ if (!IS_POSIXACL(path.dentry->d_inode))
+ mode &= ~current_umask();
+ error = mnt_want_write(path.mnt);
+ if (error)
+ goto out_dput;
+ error = security_path_mkdir(&path, dentry, mode);
+ if (error)
+ goto out_drop_write;
+ error = vfs_mkdir(path.dentry->d_inode, dentry, mode);
out_drop_write:
- mnt_drop_write(path.mnt);
+ mnt_drop_write(path.mnt);
out_dput:
- dput(dentry);
- mutex_unlock(&path.dentry->d_inode->i_mutex);
- path_put(&path);
+ dput(dentry);
+ mutex_unlock(&path.dentry->d_inode->i_mutex);
+ path_put(&path);
+ } while (retry_estale(error, try++));
+ putname(name);
return error;
}

--
1.7.7.6


2012-06-29 18:58:30

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 16/17] vfs: have do_sys_truncate retry once on an ESTALE error

Signed-off-by: Jeff Layton <[email protected]>
---
fs/open.c | 92 +++++++++++++++++++++++++++++++++---------------------------
1 files changed, 51 insertions(+), 41 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 5af0872..e813d90 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -66,62 +66,72 @@ static long do_sys_truncate(const char __user *pathname, loff_t length)
struct path path;
struct inode *inode;
int error;
+ unsigned int lookup_flags = LOOKUP_FOLLOW;
+ unsigned int try = 0;
+ char *name;

- error = -EINVAL;
if (length < 0) /* sorry, but loff_t says... */
- goto out;
+ return -EINVAL;

- error = user_path(pathname, &path);
- if (error)
- goto out;
- inode = path.dentry->d_inode;
+ name = getname_flags(pathname, lookup_flags, NULL);
+ if (IS_ERR(name))
+ return PTR_ERR(name);

- /* For directories it's -EISDIR, for other non-regulars - -EINVAL */
- error = -EISDIR;
- if (S_ISDIR(inode->i_mode))
- goto dput_and_out;
+ do {
+ error = kern_path_at(AT_FDCWD, name, lookup_flags, &path);
+ if (error)
+ break;
+ inode = path.dentry->d_inode;

- error = -EINVAL;
- if (!S_ISREG(inode->i_mode))
- goto dput_and_out;
+ /* For dirs, -EISDIR. For other non-regulars, -EINVAL */
+ error = -EISDIR;
+ if (S_ISDIR(inode->i_mode))
+ goto dput_and_out;

- error = mnt_want_write(path.mnt);
- if (error)
- goto dput_and_out;
+ error = -EINVAL;
+ if (!S_ISREG(inode->i_mode))
+ goto dput_and_out;

- error = inode_permission(inode, MAY_WRITE);
- if (error)
- goto mnt_drop_write_and_out;
+ error = mnt_want_write(path.mnt);
+ if (error)
+ goto dput_and_out;

- error = -EPERM;
- if (IS_APPEND(inode))
- goto mnt_drop_write_and_out;
+ error = inode_permission(inode, MAY_WRITE);
+ if (error)
+ goto mnt_drop_write_and_out;

- error = get_write_access(inode);
- if (error)
- goto mnt_drop_write_and_out;
+ error = -EPERM;
+ if (IS_APPEND(inode))
+ goto mnt_drop_write_and_out;

- /*
- * Make sure that there are no leases. get_write_access() protects
- * against the truncate racing with a lease-granting setlease().
- */
- error = break_lease(inode, O_WRONLY);
- if (error)
- goto put_write_and_out;
+ error = get_write_access(inode);
+ if (error)
+ goto mnt_drop_write_and_out;

- error = locks_verify_truncate(inode, NULL, length);
- if (!error)
- error = security_path_truncate(&path);
- if (!error)
- error = do_truncate(path.dentry, length, 0, NULL);
+ /*
+ * Make sure that there are no leases. get_write_access()
+ * protects against the truncate racing with a lease-granting
+ * setlease().
+ */
+ error = break_lease(inode, O_WRONLY);
+ if (error)
+ goto put_write_and_out;
+
+ error = locks_verify_truncate(inode, NULL, length);
+ if (!error)
+ error = security_path_truncate(&path);
+ if (!error)
+ error = do_truncate(path.dentry, length, 0, NULL);

put_write_and_out:
- put_write_access(inode);
+ put_write_access(inode);
mnt_drop_write_and_out:
- mnt_drop_write(path.mnt);
+ mnt_drop_write(path.mnt);
dput_and_out:
- path_put(&path);
-out:
+ path_put(&path);
+ lookup_flags |= LOOKUP_REVAL;
+ } while (retry_estale(error, try++));
+ putname(name);
return error;
}

--
1.7.7.6


2012-06-29 18:58:23

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 11/17] vfs: fix linkat to retry on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 67 ++++++++++++++++++++++++++++++++++++++---------------------
1 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a2d691a..e902770 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3138,6 +3138,8 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
struct path old_path, new_path;
int how = 0;
int error;
+ char *old, *new;
+ unsigned int try = 0;

if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0)
return -EINVAL;
@@ -3155,34 +3157,51 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
if (flags & AT_SYMLINK_FOLLOW)
how |= LOOKUP_FOLLOW;

- error = user_path_at(olddfd, oldname, how, &old_path);
- if (error)
- return error;
+ old = getname_flags(oldname, how, NULL);
+ if (IS_ERR(old))
+ return PTR_ERR(old);

- new_dentry = user_path_create(newdfd, newname, &new_path, 0);
- error = PTR_ERR(new_dentry);
- if (IS_ERR(new_dentry))
- goto out;
+ new = getname(newname);
+ if (IS_ERR(new)) {
+ putname(old);
+ return PTR_ERR(new);
+ }

- error = -EXDEV;
- if (old_path.mnt != new_path.mnt)
- goto out_dput;
- error = mnt_want_write(new_path.mnt);
- if (error)
- goto out_dput;
- error = security_path_link(old_path.dentry, &new_path, new_dentry);
- if (error)
- goto out_drop_write;
- error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry);
+ do {
+ error = kern_path_at(olddfd, old, how, &old_path);
+ if (error)
+ break;
+
+ new_dentry = kern_path_create(newdfd, new, &new_path, 0, try);
+ error = PTR_ERR(new_dentry);
+ if (IS_ERR(new_dentry)) {
+ path_put(&old_path);
+ break;
+ }
+
+ error = -EXDEV;
+ if (old_path.mnt != new_path.mnt)
+ goto out_dput;
+ error = mnt_want_write(new_path.mnt);
+ if (error)
+ goto out_dput;
+ error = security_path_link(old_path.dentry, &new_path,
+ new_dentry);
+ if (error)
+ goto out_drop_write;
+ error = vfs_link(old_path.dentry, new_path.dentry->d_inode,
+ new_dentry);
out_drop_write:
- mnt_drop_write(new_path.mnt);
+ mnt_drop_write(new_path.mnt);
out_dput:
- dput(new_dentry);
- mutex_unlock(&new_path.dentry->d_inode->i_mutex);
- path_put(&new_path);
-out:
- path_put(&old_path);
-
+ dput(new_dentry);
+ mutex_unlock(&new_path.dentry->d_inode->i_mutex);
+ path_put(&new_path);
+ path_put(&old_path);
+ how |= LOOKUP_REVAL;
+ } while (retry_estale(error, try++));
+ putname(new);
+ putname(old);
return error;
}

--
1.7.7.6


2012-06-29 18:58:15

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 05/17] vfs: remove user_path_at_empty

...there are no more callers.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 12 +++---------
include/linux/namei.h | 1 -
2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 44c7ccd..7945b19 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1926,10 +1926,10 @@ int kern_path_at(int dfd, const char *name, unsigned flags, struct path *path)
return err;
}

-int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
- struct path *path, int *empty)
+int user_path_at(int dfd, const char __user *name, unsigned flags,
+ struct path *path)
{
- char *tmp = getname_flags(name, flags, empty);
+ char *tmp = getname_flags(name, flags, NULL);
int err;

if (IS_ERR(tmp)) {
@@ -1941,12 +1941,6 @@ int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
return err;
}

-int user_path_at(int dfd, const char __user *name, unsigned flags,
- struct path *path)
-{
- return user_path_at_empty(dfd, name, flags, path, NULL);
-}
-
static int user_path_parent(int dfd, const char __user *path,
struct nameidata *nd, char **name)
{
diff --git a/include/linux/namei.h b/include/linux/namei.h
index cd95fcb..1ea93d5 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -68,7 +68,6 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};

extern int kern_path_at(int, const char *, unsigned, struct path *);
extern int user_path_at(int, const char __user *, unsigned, struct path *);
-extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);

#define user_path(name, path) user_path_at(AT_FDCWD, name, LOOKUP_FOLLOW, path)
#define user_lpath(name, path) user_path_at(AT_FDCWD, name, 0, path)
--
1.7.7.6


2012-06-29 18:58:16

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 06/17] vfs: turn "empty" arg in getname_flags into a bool

...it's just used as a flag.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 4 ++--
fs/stat.c | 2 +-
include/linux/fs.h | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7945b19..a20faa5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -117,7 +117,7 @@
* POSIX.1 2.4: an empty pathname is invalid (ENOENT).
* PATH_MAX includes the nul terminator --RR.
*/
-char *getname_flags(const char __user *filename, int flags, int *empty)
+char *getname_flags(const char __user *filename, int flags, bool *empty)
{
char *result = __getname(), *err;
int len;
@@ -133,7 +133,7 @@ char *getname_flags(const char __user *filename, int flags, int *empty)
/* The empty path is special. */
if (unlikely(!len)) {
if (empty)
- *empty = 1;
+ *empty = true;
err = ERR_PTR(-ENOENT);
if (!(flags & LOOKUP_EMPTY))
goto error;
diff --git a/fs/stat.c b/fs/stat.c
index c9d88f7..4f8b6bc 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -306,7 +306,7 @@ SYSCALL_DEFINE4(readlinkat, int, dfd, const char __user *, pathname,
{
struct path path;
int error;
- int empty = 0;
+ bool empty = false;
char *name;
unsigned int try = 0;
unsigned int lookup_flags = LOOKUP_EMPTY;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c7d60db..d044088 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2060,7 +2060,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *,
extern struct file * dentry_open(struct dentry *, struct vfsmount *, int,
const struct cred *);
extern int filp_close(struct file *, fl_owner_t id);
-extern char *getname_flags(const char __user *, int, int *);
+extern char *getname_flags(const char __user *, int, bool *);
extern char *getname(const char __user *);

/**
--
1.7.7.6


2012-06-29 18:58:14

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 04/17] vfs: fix readlinkat to retry on ESTALE

Signed-off-by: Jeff Layton <[email protected]>
---
fs/stat.c | 21 +++++++++++++++++----
1 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/stat.c b/fs/stat.c
index 5afeb37..c9d88f7 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -307,14 +307,25 @@ SYSCALL_DEFINE4(readlinkat, int, dfd, const char __user *, pathname,
struct path path;
int error;
int empty = 0;
+ char *name;
+ unsigned int try = 0;
+ unsigned int lookup_flags = LOOKUP_EMPTY;

if (bufsiz <= 0)
return -EINVAL;

- error = user_path_at_empty(dfd, pathname, LOOKUP_EMPTY, &path, &empty);
- if (!error) {
- struct inode *inode = path.dentry->d_inode;
+ name = getname_flags(pathname, lookup_flags, &empty);
+ if (IS_ERR(name))
+ return PTR_ERR(name);
+
+ do {
+ struct inode *inode;
+
+ error = kern_path_at(dfd, name, lookup_flags, &path);
+ if (error)
+ break;

+ inode = path.dentry->d_inode;
error = empty ? -ENOENT : -EINVAL;
if (inode->i_op->readlink) {
error = security_inode_readlink(path.dentry);
@@ -325,7 +336,9 @@ SYSCALL_DEFINE4(readlinkat, int, dfd, const char __user *, pathname,
}
}
path_put(&path);
- }
+ lookup_flags |= LOOKUP_REVAL;
+ } while (retry_estale(error, try++));
+ putname(name);
return error;
}

--
1.7.7.6


2012-06-29 18:58:08

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 01/17] vfs: add a retry_estale helper function to handle retries on ESTALE

This function is expected to be called from path-based syscalls to help
them decide whether to try the lookup and call again in the event that
they got an -ESTALE return back on an earier try.

Currently, we only retry the call once on an ESTALE error, but in the
event that we decide that that's not enough in the future, we should be
able to change the logic in this helper without too much effort.

Signed-off-by: Jeff Layton <[email protected]>
---
include/linux/fs.h | 21 +++++++++++++++++++++
1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 17fd887..4a2aa04 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2062,6 +2062,27 @@ extern struct file * dentry_open(struct dentry *, struct vfsmount *, int,
extern int filp_close(struct file *, fl_owner_t id);
extern char * getname(const char __user *);

+/**
+ * retry_estale - determine whether the caller should retry an operation
+ *
+ * @error: the error we'll be returning
+ * @try: number of retries already performed
+ *
+ * Check to see if the error code was -ESTALE, and then determine whether
+ * to retry the call based on the number of retries so far. Currently, we only
+ * retry the call once.
+ *
+ * Returns true if the caller should try again.
+ */
+static inline bool
+retry_estale(const long error, const unsigned int try)
+{
+ if (likely(error != -ESTALE))
+ return false;
+
+ return !try;
+}
+
/* fs/ioctl.c */

extern int ioctl_preallocate(struct file *filp, void __user *argp);
--
1.7.7.6


2012-06-29 18:58:22

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v3 10/17] vfs: fix symlinkat to retry on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 43 ++++++++++++++++++++++++++-----------------
1 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index dec5713..a2d691a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3033,33 +3033,42 @@ SYSCALL_DEFINE3(symlinkat, const char __user *, oldname,
int, newdfd, const char __user *, newname)
{
int error;
- char *from;
+ char *from, *to;
struct dentry *dentry;
struct path path;
+ unsigned int try = 0;

from = getname(oldname);
if (IS_ERR(from))
return PTR_ERR(from);

- dentry = user_path_create(newdfd, newname, &path, 0);
- error = PTR_ERR(dentry);
- if (IS_ERR(dentry))
- goto out_putname;
+ to = getname(newname);
+ if (IS_ERR(to)) {
+ putname(from);
+ return PTR_ERR(to);
+ }

- error = mnt_want_write(path.mnt);
- if (error)
- goto out_dput;
- error = security_path_symlink(&path, dentry, from);
- if (error)
- goto out_drop_write;
- error = vfs_symlink(path.dentry->d_inode, dentry, from);
+ do {
+ dentry = kern_path_create(newdfd, to, &path, 0, try);
+ error = PTR_ERR(dentry);
+ if (IS_ERR(dentry))
+ break;
+
+ error = mnt_want_write(path.mnt);
+ if (error)
+ goto out_dput;
+ error = security_path_symlink(&path, dentry, from);
+ if (error)
+ goto out_drop_write;
+ error = vfs_symlink(path.dentry->d_inode, dentry, from);
out_drop_write:
- mnt_drop_write(path.mnt);
+ mnt_drop_write(path.mnt);
out_dput:
- dput(dentry);
- mutex_unlock(&path.dentry->d_inode->i_mutex);
- path_put(&path);
-out_putname:
+ dput(dentry);
+ mutex_unlock(&path.dentry->d_inode->i_mutex);
+ path_put(&path);
+ } while (retry_estale(error, try++));
+ putname(to);
putname(from);
return error;
}
--
1.7.7.6


2012-07-11 18:44:16

by Steven J. Magnani

[permalink] [raw]
Subject: Re: [PATCH v3 00/17] vfs: add the ability to retry on ESTALE to several syscalls

Jeff -

On Fri, 2012-06-29 at 14:57 -0400, Jeff Layton wrote:
> This patchset is the third version of the patchset to add ESTALE
> handling to several syscalls. The basic idea is to allow the client to
> gracefully retry the lookup and call when a NFS server returns ESTALE.

I exercised this using 3.5-rc5 against a memory-starved server that
exports a FAT-backed filesystem. Where normally I see lots of ESTALE
errors due to inode eviction, with this patchset I see none. And, the
performance is much better than the only other way I know to eliminate
the errors, which is to mount with 'lookupcache=none'.

It's not an exhaustive test by any means, just a data point for you.
Thanks!

Lightly-tested-by: Steve Magnani <[email protected]>
------------------------------------------------------------------------
Steven J. Magnani "I claim this network for MARS!
http://www.digidescorp.com Earthling, return my space modulator!"

#include <standard.disclaimer>



2012-07-11 19:01:26

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v3 00/17] vfs: add the ability to retry on ESTALE to several syscalls

On Wed, 11 Jul 2012 13:42:23 -0500
"Steven J. Magnani" <[email protected]> wrote:

> Jeff -
>
> On Fri, 2012-06-29 at 14:57 -0400, Jeff Layton wrote:
> > This patchset is the third version of the patchset to add ESTALE
> > handling to several syscalls. The basic idea is to allow the client to
> > gracefully retry the lookup and call when a NFS server returns ESTALE.
>
> I exercised this using 3.5-rc5 against a memory-starved server that
> exports a FAT-backed filesystem. Where normally I see lots of ESTALE
> errors due to inode eviction, with this patchset I see none. And, the
> performance is much better than the only other way I know to eliminate
> the errors, which is to mount with 'lookupcache=none'.
>
> It's not an exhaustive test by any means, just a data point for you.
> Thanks!
>
> Lightly-tested-by: Steve Magnani <[email protected]>

Awesome -- thanks for testing it.

Yeah "lookupcache=none" has been the standard answer for people
suffering from ESTALE problems, but the perf hit can be quite nasty.
Avoiding that is one of the main drivers for handling ESTALE this way.

With luck, we'll see at least some of this set go in for 3.6.

--
Jeff Layton <[email protected]>