2010-01-29 20:19:22

by Myklebust, Trond

[permalink] [raw]
Subject: nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range()

From: Trond Myklebust <[email protected]>

Currently, the nfs server holds the inode->i_mutex across both the
filemap_write_and_wait() call and the fsync() call itself. However we know
that filemap_write_and_wait() is already safe against livelocks, so we only
need to hold the mutex across the fsync() call.

Fix this by reusing vfs_fsync(), which already does the right thing.
Also make sure that we use vfs_fsync_range() in the COMMIT operation, to
improve the efficiency for clients that do specify a range.

Signed-off-by: Trond Myklebust <[email protected]>
---

fs/nfsd/vfs.c | 58 +++++++++++++++++++++------------------------------------
1 files changed, 21 insertions(+), 37 deletions(-)


diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c194793..e4568d6 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -769,40 +769,17 @@ nfsd_close(struct file *filp)
}

/*
- * Sync a file
- * As this calls fsync (not fdatasync) there is no need for a write_inode
- * after it.
+ * Sync a directory
+ * returns 0 if the directory had no fsync method
*/
-static inline int nfsd_dosync(struct file *filp, struct dentry *dp,
- const struct file_operations *fop)
-{
- struct inode *inode = dp->d_inode;
- int (*fsync) (struct file *, struct dentry *, int);
- int err;
-
- err = filemap_write_and_wait(inode->i_mapping);
- if (err == 0 && fop && (fsync = fop->fsync))
- err = fsync(filp, dp, 0);
- return err;
-}
-
-static int
-nfsd_sync(struct file *filp)
-{
- int err;
- struct inode *inode = filp->f_path.dentry->d_inode;
- dprintk("nfsd: sync file %s\n", filp->f_path.dentry->d_name.name);
- mutex_lock(&inode->i_mutex);
- err=nfsd_dosync(filp, filp->f_path.dentry, filp->f_op);
- mutex_unlock(&inode->i_mutex);
-
- return err;
-}
-
int
nfsd_sync_dir(struct dentry *dp)
{
- return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
+ int err;
+
+ dprintk("nfsd: sync directory %s\n", dp->d_name.name);
+ err = vfs_fsync(NULL, dp, 0);
+ return (err != -EINVAL) ? err : 0;
}

/*
@@ -1008,7 +985,9 @@ static int wait_for_concurrent_writes(struct file *file)

if (inode->i_state & I_DIRTY) {
dprintk("nfsd: write sync %d\n", task_pid_nr(current));
- err = nfsd_sync(file);
+ err = vfs_fsync(file, file->f_path.dentry, 0);
+ if (err == -EINVAL)
+ err = 0;
}
last_ino = inode->i_ino;
last_dev = inode->i_sb->s_dev;
@@ -1156,8 +1135,6 @@ out:
#ifdef CONFIG_NFSD_V3
/*
* Commit all pending writes to stable storage.
- * Strictly speaking, we could sync just the indicated file region here,
- * but there's currently no way we can ask the VFS to do so.
*
* Unfortunately we cannot lock the file to make sure we return full WCC
* data to the client, as locking happens lower down in the filesystem.
@@ -1176,11 +1153,18 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (err)
return err;
if (EX_ISSYNC(fhp->fh_export)) {
- if (file->f_op && file->f_op->fsync) {
- err = nfserrno(nfsd_sync(file));
- } else {
+ loff_t end = LLONG_MAX;
+ int err2;
+
+ if (count != 0)
+ end = offset + count;
+ err2 = vfs_fsync_range(file, file->f_path.dentry,
+ offset, end, 0);
+
+ if (err2 != -EINVAL)
+ err = nfserrno(err2);
+ else
err = nfserr_notsupp;
- }
}

nfsd_close(file);



2010-01-29 20:35:22

by Trond Myklebust

[permalink] [raw]
Subject: Re: nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range()

On Fri, 2010-01-29 at 15:19 -0500, Trond Myklebust wrote:
> From: Trond Myklebust <[email protected]>
>
> Currently, the nfs server holds the inode->i_mutex across both the
> filemap_write_and_wait() call and the fsync() call itself. However we know
> that filemap_write_and_wait() is already safe against livelocks, so we only
> need to hold the mutex across the fsync() call.
>
> Fix this by reusing vfs_fsync(), which already does the right thing.
> Also make sure that we use vfs_fsync_range() in the COMMIT operation, to
> improve the efficiency for clients that do specify a range.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> fs/nfsd/vfs.c | 58 +++++++++++++++++++++------------------------------------
> 1 files changed, 21 insertions(+), 37 deletions(-)
>
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index c194793..e4568d6 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -769,40 +769,17 @@ nfsd_close(struct file *filp)
> }
>
> /*
> - * Sync a file
> - * As this calls fsync (not fdatasync) there is no need for a write_inode
> - * after it.
> + * Sync a directory
> + * returns 0 if the directory had no fsync method
> */
> -static inline int nfsd_dosync(struct file *filp, struct dentry *dp,
> - const struct file_operations *fop)
> -{
> - struct inode *inode = dp->d_inode;
> - int (*fsync) (struct file *, struct dentry *, int);
> - int err;
> -
> - err = filemap_write_and_wait(inode->i_mapping);
> - if (err == 0 && fop && (fsync = fop->fsync))
> - err = fsync(filp, dp, 0);
> - return err;
> -}
> -
> -static int
> -nfsd_sync(struct file *filp)
> -{
> - int err;
> - struct inode *inode = filp->f_path.dentry->d_inode;
> - dprintk("nfsd: sync file %s\n", filp->f_path.dentry->d_name.name);
> - mutex_lock(&inode->i_mutex);
> - err=nfsd_dosync(filp, filp->f_path.dentry, filp->f_op);
> - mutex_unlock(&inode->i_mutex);
> -
> - return err;
> -}
> -
> int
> nfsd_sync_dir(struct dentry *dp)
> {
> - return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
> + int err;
> +
> + dprintk("nfsd: sync directory %s\n", dp->d_name.name);
> + err = vfs_fsync(NULL, dp, 0);
> + return (err != -EINVAL) ? err : 0;
> }
>
> /*
> @@ -1008,7 +985,9 @@ static int wait_for_concurrent_writes(struct file *file)
>
> if (inode->i_state & I_DIRTY) {
> dprintk("nfsd: write sync %d\n", task_pid_nr(current));
> - err = nfsd_sync(file);
> + err = vfs_fsync(file, file->f_path.dentry, 0);
> + if (err == -EINVAL)
> + err = 0;
> }
> last_ino = inode->i_ino;
> last_dev = inode->i_sb->s_dev;
> @@ -1156,8 +1135,6 @@ out:
> #ifdef CONFIG_NFSD_V3
> /*
> * Commit all pending writes to stable storage.
> - * Strictly speaking, we could sync just the indicated file region here,
> - * but there's currently no way we can ask the VFS to do so.
> *
> * Unfortunately we cannot lock the file to make sure we return full WCC
> * data to the client, as locking happens lower down in the filesystem.
> @@ -1176,11 +1153,18 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (err)
> return err;
> if (EX_ISSYNC(fhp->fh_export)) {
> - if (file->f_op && file->f_op->fsync) {
> - err = nfserrno(nfsd_sync(file));
> - } else {
> + loff_t end = LLONG_MAX;
> + int err2;
> +
> + if (count != 0)
> + end = offset + count;
> + err2 = vfs_fsync_range(file, file->f_path.dentry,
> + offset, end, 0);
> +
Actually, we should probably check for offset < 0 || end < offset. I'll
add that...

Cheers
Trond


2010-01-29 20:44:28

by Myklebust, Trond

[permalink] [raw]
Subject: [PATCH v2] nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range()

From: Trond Myklebust <[email protected]>

Currently, the nfs server holds the inode->i_mutex across both the
filemap_write_and_wait() call and the fsync() call itself. However we know
that filemap_write_and_wait() is already safe against livelocks, so we only
need to hold the mutex across the fsync() call.

Fix this by reusing vfs_fsync(), which already does the right thing.
Also make sure that we use vfs_fsync_range() in the COMMIT operation, to
improve the efficiency for clients that do specify a range.

Signed-off-by: Trond Myklebust <[email protected]>
---

fs/nfsd/vfs.c | 59 ++++++++++++++++++++-------------------------------------
1 files changed, 21 insertions(+), 38 deletions(-)


diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c194793..26d0d2c 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -769,40 +769,17 @@ nfsd_close(struct file *filp)
}

/*
- * Sync a file
- * As this calls fsync (not fdatasync) there is no need for a write_inode
- * after it.
+ * Sync a directory
+ * returns 0 if the directory had no fsync method
*/
-static inline int nfsd_dosync(struct file *filp, struct dentry *dp,
- const struct file_operations *fop)
-{
- struct inode *inode = dp->d_inode;
- int (*fsync) (struct file *, struct dentry *, int);
- int err;
-
- err = filemap_write_and_wait(inode->i_mapping);
- if (err == 0 && fop && (fsync = fop->fsync))
- err = fsync(filp, dp, 0);
- return err;
-}
-
-static int
-nfsd_sync(struct file *filp)
-{
- int err;
- struct inode *inode = filp->f_path.dentry->d_inode;
- dprintk("nfsd: sync file %s\n", filp->f_path.dentry->d_name.name);
- mutex_lock(&inode->i_mutex);
- err=nfsd_dosync(filp, filp->f_path.dentry, filp->f_op);
- mutex_unlock(&inode->i_mutex);
-
- return err;
-}
-
int
nfsd_sync_dir(struct dentry *dp)
{
- return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
+ int err;
+
+ dprintk("nfsd: sync directory %s\n", dp->d_name.name);
+ err = vfs_fsync(NULL, dp, 0);
+ return (err != -EINVAL) ? err : 0;
}

/*
@@ -1008,7 +985,9 @@ static int wait_for_concurrent_writes(struct file *file)

if (inode->i_state & I_DIRTY) {
dprintk("nfsd: write sync %d\n", task_pid_nr(current));
- err = nfsd_sync(file);
+ err = vfs_fsync(file, file->f_path.dentry, 0);
+ if (err == -EINVAL)
+ err = 0;
}
last_ino = inode->i_ino;
last_dev = inode->i_sb->s_dev;
@@ -1156,8 +1135,6 @@ out:
#ifdef CONFIG_NFSD_V3
/*
* Commit all pending writes to stable storage.
- * Strictly speaking, we could sync just the indicated file region here,
- * but there's currently no way we can ask the VFS to do so.
*
* Unfortunately we cannot lock the file to make sure we return full WCC
* data to the client, as locking happens lower down in the filesystem.
@@ -1167,20 +1144,26 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
loff_t offset, unsigned long count)
{
struct file *file;
+ loff_t end = LLONG_MAX;
__be32 err;

- if ((u64)count > ~(u64)offset)
+ if (count != 0)
+ end = offset + count - 1;
+
+ if (offset < 0 || end < offset)
return nfserr_inval;

err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
if (err)
return err;
if (EX_ISSYNC(fhp->fh_export)) {
- if (file->f_op && file->f_op->fsync) {
- err = nfserrno(nfsd_sync(file));
- } else {
+ int err2 = vfs_fsync_range(file, file->f_path.dentry,
+ offset, end, 0);
+
+ if (err2 != -EINVAL)
+ err = nfserrno(err2);
+ else
err = nfserr_notsupp;
- }
}

nfsd_close(file);


2010-01-29 20:50:25

by Christoph Hellwig

[permalink] [raw]
Subject: Re: nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range()

On Fri, Jan 29, 2010 at 03:19:13PM -0500, Trond Myklebust wrote:
> From: Trond Myklebust <[email protected]>
>
> Currently, the nfs server holds the inode->i_mutex across both the
> filemap_write_and_wait() call and the fsync() call itself. However we know
> that filemap_write_and_wait() is already safe against livelocks, so we only
> need to hold the mutex across the fsync() call.
>
> Fix this by reusing vfs_fsync(), which already does the right thing.
> Also make sure that we use vfs_fsync_range() in the COMMIT operation, to
> improve the efficiency for clients that do specify a range.

I already sent a patch to replace nfsd_sync with it that should be
queued up. We can't use vfs_fsync for nfsd_sync_dir as we're already
holding i_mutex when calling it. The vfs_fsync_range optimizations
seems like something that should be applied ontop, though.


2010-01-29 20:57:15

by Myklebust, Trond

[permalink] [raw]
Subject: Re: nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range()

On Fri, 2010-01-29 at 15:50 -0500, Christoph Hellwig wrote:
> On Fri, Jan 29, 2010 at 03:19:13PM -0500, Trond Myklebust wrote:
> > From: Trond Myklebust <[email protected]>
> >
> > Currently, the nfs server holds the inode->i_mutex across both the
> > filemap_write_and_wait() call and the fsync() call itself. However we know
> > that filemap_write_and_wait() is already safe against livelocks, so we only
> > need to hold the mutex across the fsync() call.
> >
> > Fix this by reusing vfs_fsync(), which already does the right thing.
> > Also make sure that we use vfs_fsync_range() in the COMMIT operation, to
> > improve the efficiency for clients that do specify a range.
>
> I already sent a patch to replace nfsd_sync with it that should be
> queued up. We can't use vfs_fsync for nfsd_sync_dir as we're already
> holding i_mutex when calling it. The vfs_fsync_range optimizations
> seems like something that should be applied ontop, though.
>

OK. I missed your patch as it flew by on the list, but I assume it is
this one:

http://git.linux-nfs.org/?p=bfields/linux.git;a=commitdiff;h=6a68f89ee1f2d177af4a5410fa7a45734c975fd6;hp=de3cab793c6a5c8505d66bee111edcc7098380ba


I'll separate out the vfs_sync_range() changes and cobble up a patch on
top of the above changeset...

Cheers
Trond

2010-01-29 21:18:28

by Myklebust, Trond

[permalink] [raw]
Subject: nfsd: Use vfs_fsync_range() in nfsd_commit

On Fri, 2010-01-29 at 15:57 -0500, Trond Myklebust wrote:
> OK. I missed your patch as it flew by on the list, but I assume it is
> this one:
>
> http://git.linux-nfs.org/?p=bfields/linux.git;a=commitdiff;h=6a68f89ee1f2d177af4a5410fa7a45734c975fd6;hp=de3cab793c6a5c8505d66bee111edcc7098380ba
>
>
> I'll separate out the vfs_sync_range() changes and cobble up a patch on
> top of the above changeset...
>
> Cheers
> Trond

Here it is....

Cheers
Trond
-------------------------------------------------------------------------------------------------------------------
nfsd: Use vfs_fsync_range() in nfsd_commit

From: Trond Myklebust <[email protected]>

The NFS COMMIT operation allows the client to specify the exact byte range
that it wishes to sync to disk in order to optimise server performance.

Signed-off-by: Trond Myklebust <[email protected]>
---

fs/nfsd/vfs.c | 27 +++++++++++++++++----------
1 files changed, 17 insertions(+), 10 deletions(-)


diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 79d216f..cc046ca 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1141,8 +1141,6 @@ out:
#ifdef CONFIG_NFSD_V3
/*
* Commit all pending writes to stable storage.
- * Strictly speaking, we could sync just the indicated file region here,
- * but there's currently no way we can ask the VFS to do so.
*
* Unfortunately we cannot lock the file to make sure we return full WCC
* data to the client, as locking happens lower down in the filesystem.
@@ -1152,23 +1150,32 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
loff_t offset, unsigned long count)
{
struct file *file;
- __be32 err;
+ loff_t end = LLONG_MAX;
+ __be32 err = nfserr_inval;

- if ((u64)count > ~(u64)offset)
- return nfserr_inval;
+ if (offset < 0)
+ goto out;
+ if (count != 0) {
+ end = offset + (loff_t)count - 1;
+ if (end < offset)
+ goto out;
+ }

err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
if (err)
- return err;
+ goto out;
if (EX_ISSYNC(fhp->fh_export)) {
- if (file->f_op && file->f_op->fsync) {
- err = nfserrno(vfs_fsync(file, file->f_path.dentry, 0));
- } else {
+ int err2 = vfs_fsync_range(file, file->f_path.dentry,
+ offset, end, 0);
+
+ if (err2 != -EINVAL)
+ err = nfserrno(err2);
+ else
err = nfserr_notsupp;
- }
}

nfsd_close(file);
+out:
return err;
}
#endif /* CONFIG_NFSD_V3 */


2010-01-29 21:27:09

by Christoph Hellwig

[permalink] [raw]
Subject: Re: nfsd: Use vfs_fsync_range() in nfsd_commit

On Fri, Jan 29, 2010 at 04:18:26PM -0500, Trond Myklebust wrote:
> nfsd: Use vfs_fsync_range() in nfsd_commit
>
> From: Trond Myklebust <[email protected]>
>
> The NFS COMMIT operation allows the client to specify the exact byte range
> that it wishes to sync to disk in order to optimise server performance.
>
> Signed-off-by: Trond Myklebust <[email protected]>

Looks good to me,


Reviewed-by: Christoph Hellwig <[email protected]>

> * Commit all pending writes to stable storage.
> - * Strictly speaking, we could sync just the indicated file region here,
> - * but there's currently no way we can ask the VFS to do so.

The commit above could use some addition that the commit only happens
for the specified range.


And not actually touched by your patch, but that is the reason to
open/close the file if we don't actually do anything with it for an
async export?

2010-01-29 21:39:39

by Myklebust, Trond

[permalink] [raw]
Subject: Re: nfsd: Use vfs_fsync_range() in nfsd_commit

On Fri, 2010-01-29 at 16:27 -0500, Christoph Hellwig wrote:
> On Fri, Jan 29, 2010 at 04:18:26PM -0500, Trond Myklebust wrote:
> > nfsd: Use vfs_fsync_range() in nfsd_commit
> >
> > From: Trond Myklebust <[email protected]>
> >
> > The NFS COMMIT operation allows the client to specify the exact byte range
> > that it wishes to sync to disk in order to optimise server performance.
> >
> > Signed-off-by: Trond Myklebust <[email protected]>
>
> Looks good to me,
>
>
> Reviewed-by: Christoph Hellwig <[email protected]>

Thanks!

> > * Commit all pending writes to stable storage.
> > - * Strictly speaking, we could sync just the indicated file region here,
> > - * but there's currently no way we can ask the VFS to do so.
>
> The commit above could use some addition that the commit only happens
> for the specified range.

I'll add in a couple of words.

> And not actually touched by your patch, but that is the reason to
> open/close the file if we don't actually do anything with it for an
> async export?

I must admit that I was wondering about that too. I'm assuming that the
reason is to provide consistent behaviour w.r.t. access checks and
possibly also to ensure that NFSv4 delegations are revoked. Perhaps
Bruce could comment?

Cheers
Trond

2010-01-29 21:44:26

by Myklebust, Trond

[permalink] [raw]
Subject: [PATCH v4] nfsd: Use vfs_fsync_range() in nfsd_commit

nfsd: Use vfs_fsync_range() in nfsd_commit

From: Trond Myklebust <[email protected]>

The NFS COMMIT operation allows the client to specify the exact byte range
that it wishes to sync to disk in order to optimise server performance.

Signed-off-by: Trond Myklebust <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---

fs/nfsd/vfs.c | 30 ++++++++++++++++++++----------
1 files changed, 20 insertions(+), 10 deletions(-)


diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 79d216f..ed024d3 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1141,8 +1141,9 @@ out:
#ifdef CONFIG_NFSD_V3
/*
* Commit all pending writes to stable storage.
- * Strictly speaking, we could sync just the indicated file region here,
- * but there's currently no way we can ask the VFS to do so.
+ *
+ * Note: we only guarantee that data that lies within the range specified
+ * by the 'offset' and 'count' parameters will be synced.
*
* Unfortunately we cannot lock the file to make sure we return full WCC
* data to the client, as locking happens lower down in the filesystem.
@@ -1152,23 +1153,32 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
loff_t offset, unsigned long count)
{
struct file *file;
- __be32 err;
+ loff_t end = LLONG_MAX;
+ __be32 err = nfserr_inval;

- if ((u64)count > ~(u64)offset)
- return nfserr_inval;
+ if (offset < 0)
+ goto out;
+ if (count != 0) {
+ end = offset + (loff_t)count - 1;
+ if (end < offset)
+ goto out;
+ }

err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
if (err)
- return err;
+ goto out;
if (EX_ISSYNC(fhp->fh_export)) {
- if (file->f_op && file->f_op->fsync) {
- err = nfserrno(vfs_fsync(file, file->f_path.dentry, 0));
- } else {
+ int err2 = vfs_fsync_range(file, file->f_path.dentry,
+ offset, end, 0);
+
+ if (err2 != -EINVAL)
+ err = nfserrno(err2);
+ else
err = nfserr_notsupp;
- }
}

nfsd_close(file);
+out:
return err;
}
#endif /* CONFIG_NFSD_V3 */


2010-01-29 23:53:54

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v4] nfsd: Use vfs_fsync_range() in nfsd_commit

On Fri, Jan 29, 2010 at 04:44:25PM -0500, Trond Myklebust wrote:
> nfsd: Use vfs_fsync_range() in nfsd_commit
>
> From: Trond Myklebust <[email protected]>
>
> The NFS COMMIT operation allows the client to specify the exact byte range
> that it wishes to sync to disk in order to optimise server performance.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>

Applied, thanks!

--b.

> ---
>
> fs/nfsd/vfs.c | 30 ++++++++++++++++++++----------
> 1 files changed, 20 insertions(+), 10 deletions(-)
>
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 79d216f..ed024d3 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1141,8 +1141,9 @@ out:
> #ifdef CONFIG_NFSD_V3
> /*
> * Commit all pending writes to stable storage.
> - * Strictly speaking, we could sync just the indicated file region here,
> - * but there's currently no way we can ask the VFS to do so.
> + *
> + * Note: we only guarantee that data that lies within the range specified
> + * by the 'offset' and 'count' parameters will be synced.
> *
> * Unfortunately we cannot lock the file to make sure we return full WCC
> * data to the client, as locking happens lower down in the filesystem.
> @@ -1152,23 +1153,32 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
> loff_t offset, unsigned long count)
> {
> struct file *file;
> - __be32 err;
> + loff_t end = LLONG_MAX;
> + __be32 err = nfserr_inval;
>
> - if ((u64)count > ~(u64)offset)
> - return nfserr_inval;
> + if (offset < 0)
> + goto out;
> + if (count != 0) {
> + end = offset + (loff_t)count - 1;
> + if (end < offset)
> + goto out;
> + }
>
> err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
> if (err)
> - return err;
> + goto out;
> if (EX_ISSYNC(fhp->fh_export)) {
> - if (file->f_op && file->f_op->fsync) {
> - err = nfserrno(vfs_fsync(file, file->f_path.dentry, 0));
> - } else {
> + int err2 = vfs_fsync_range(file, file->f_path.dentry,
> + offset, end, 0);
> +
> + if (err2 != -EINVAL)
> + err = nfserrno(err2);
> + else
> err = nfserr_notsupp;
> - }
> }
>
> nfsd_close(file);
> +out:
> return err;
> }
> #endif /* CONFIG_NFSD_V3 */
>

2010-01-29 23:54:14

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v4] nfsd: Use vfs_fsync_range() in nfsd_commit

On Fri, Jan 29, 2010 at 06:53:54PM -0500, Dr. J. Bruce Fields wrote:
> On Fri, Jan 29, 2010 at 04:44:25PM -0500, Trond Myklebust wrote:
> > nfsd: Use vfs_fsync_range() in nfsd_commit
> >
> > From: Trond Myklebust <[email protected]>
> >
> > The NFS COMMIT operation allows the client to specify the exact byte range
> > that it wishes to sync to disk in order to optimise server performance.
> >
> > Signed-off-by: Trond Myklebust <[email protected]>
> > Reviewed-by: Christoph Hellwig <[email protected]>
>
> Applied, thanks!

Do you have any performance results, by the way?

--b.

>
> --b.
>
> > ---
> >
> > fs/nfsd/vfs.c | 30 ++++++++++++++++++++----------
> > 1 files changed, 20 insertions(+), 10 deletions(-)
> >
> >
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 79d216f..ed024d3 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1141,8 +1141,9 @@ out:
> > #ifdef CONFIG_NFSD_V3
> > /*
> > * Commit all pending writes to stable storage.
> > - * Strictly speaking, we could sync just the indicated file region here,
> > - * but there's currently no way we can ask the VFS to do so.
> > + *
> > + * Note: we only guarantee that data that lies within the range specified
> > + * by the 'offset' and 'count' parameters will be synced.
> > *
> > * Unfortunately we cannot lock the file to make sure we return full WCC
> > * data to the client, as locking happens lower down in the filesystem.
> > @@ -1152,23 +1153,32 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > loff_t offset, unsigned long count)
> > {
> > struct file *file;
> > - __be32 err;
> > + loff_t end = LLONG_MAX;
> > + __be32 err = nfserr_inval;
> >
> > - if ((u64)count > ~(u64)offset)
> > - return nfserr_inval;
> > + if (offset < 0)
> > + goto out;
> > + if (count != 0) {
> > + end = offset + (loff_t)count - 1;
> > + if (end < offset)
> > + goto out;
> > + }
> >
> > err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
> > if (err)
> > - return err;
> > + goto out;
> > if (EX_ISSYNC(fhp->fh_export)) {
> > - if (file->f_op && file->f_op->fsync) {
> > - err = nfserrno(vfs_fsync(file, file->f_path.dentry, 0));
> > - } else {
> > + int err2 = vfs_fsync_range(file, file->f_path.dentry,
> > + offset, end, 0);
> > +
> > + if (err2 != -EINVAL)
> > + err = nfserrno(err2);
> > + else
> > err = nfserr_notsupp;
> > - }
> > }
> >
> > nfsd_close(file);
> > +out:
> > return err;
> > }
> > #endif /* CONFIG_NFSD_V3 */
> >

2010-01-29 23:58:50

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfsd: Use vfs_fsync_range() in nfsd_commit

On Fri, Jan 29, 2010 at 04:39:11PM -0500, Trond Myklebust wrote:
> On Fri, 2010-01-29 at 16:27 -0500, Christoph Hellwig wrote:
> > And not actually touched by your patch, but that is the reason to
> > open/close the file if we don't actually do anything with it for an
> > async export?
>
> I must admit that I was wondering about that too. I'm assuming that the
> reason is to provide consistent behaviour w.r.t. access checks and
> possibly also to ensure that NFSv4 delegations are revoked. Perhaps
> Bruce could comment?

Do delegations need to be revoked on commit? (And do we care about
access checks?)

We could do both without the need for an actual open, but I don't know
that it matters much either way.

--b.

2010-03-17 19:08:36

by Jeff Layton

[permalink] [raw]
Subject: Re: nfsd: Use vfs_fsync_range() in nfsd_commit

On Fri, 29 Jan 2010 18:58:52 -0500
"Dr. J. Bruce Fields" <[email protected]> wrote:

> On Fri, Jan 29, 2010 at 04:39:11PM -0500, Trond Myklebust wrote:
> > On Fri, 2010-01-29 at 16:27 -0500, Christoph Hellwig wrote:
> > > And not actually touched by your patch, but that is the reason to
> > > open/close the file if we don't actually do anything with it for an
> > > async export?
> >
> > I must admit that I was wondering about that too. I'm assuming that the
> > reason is to provide consistent behaviour w.r.t. access checks and
> > possibly also to ensure that NFSv4 delegations are revoked. Perhaps
> > Bruce could comment?
>
> Do delegations need to be revoked on commit? (And do we care about
> access checks?)
>
> We could do both without the need for an actual open, but I don't know
> that it matters much either way.
>

Is there any reason that we need to revoke the delegation on a commit?

The only real effect is that writes would be flushed to stable storage.
The VM on the server could just decide the flush the data to stable
storage whenever it feels like it without revoking the lease. I don't
see why we'd need to revoke the lease just because a client asked for
the flush.

Bruce has already chimed in on it, but a deadlock of sorts has popped
up relating to this:

https://bugzilla.redhat.com/show_bug.cgi?id=551028#c10

...and it seems like not recalling the delegation on a COMMIT might
help resolve it.

--
Jeff Layton <[email protected]>

2010-03-17 19:36:11

by Myklebust, Trond

[permalink] [raw]
Subject: Re: nfsd: Use vfs_fsync_range() in nfsd_commit

On Wed, 2010-03-17 at 15:08 -0400, Jeff Layton wrote:
> Is there any reason that we need to revoke the delegation on a commit?
>
> The only real effect is that writes would be flushed to stable storage.
> The VM on the server could just decide the flush the data to stable
> storage whenever it feels like it without revoking the lease. I don't
> see why we'd need to revoke the lease just because a client asked for
> the flush.
>
> Bruce has already chimed in on it, but a deadlock of sorts has popped
> up relating to this:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=551028#c10
>
> ...and it seems like not recalling the delegation on a COMMIT might
> help resolve it.

I agree. A commit doesn't change the inode in any way, so it shouldn't
require you to revoke the delegation.

Furthermore, I think that the write access check there is incorrect too.
The posix definition of fsync() doesn't specify that the file descriptor
has be writeable, and since COMMIT semantics are supposed to be modelled
on fsync...

Cheers
Trond