2011-11-30 14:07:59

by Jeff Layton

[permalink] [raw]
Subject: [PATCH] nfs: only do COMMIT for range written with direct I/O

When given a range to write with unstable writes, the current code
always does a COMMIT of the entire file afterward. This is potentially
expensive on some servers and unnecessary. Instead, just do a COMMIT for
the offset and count that was written.

Khoa, who reported this bug, stated that this made a big difference in
performance in their environment, which I believe involves GPFS on the
server. He didn't pass along any hard numbers so I can't quantify the
gain, but it stands to reason that clustered filesystems might suffer
more contention issues when issuing a commit over the whole file.

Reported-by: Khoa Huynh <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/direct.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 1940f1a..33f2be7 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -77,6 +77,7 @@ struct nfs_direct_req {
/* completion state */
atomic_t io_count; /* i/os we're waiting for */
spinlock_t lock; /* protect completion state */
+ loff_t pos; /* offset into file */
ssize_t count, /* bytes actually processed */
error; /* any reported error */
struct completion completion; /* wait for i/o completion */
@@ -584,8 +585,8 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
data->cred = msg.rpc_cred;

data->args.fh = NFS_FH(data->inode);
- data->args.offset = 0;
- data->args.count = 0;
+ data->args.offset = dreq->pos;
+ data->args.count = dreq->count;
data->args.context = dreq->ctx;
data->args.lock_context = dreq->l_ctx;
data->res.count = 0;
@@ -877,6 +878,7 @@ static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,
sync = NFS_FILE_SYNC;

dreq->inode = inode;
+ dreq->pos = pos;
dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
dreq->l_ctx = nfs_get_lock_context(dreq->ctx);
if (dreq->l_ctx == NULL)
--
1.7.6.4



2011-11-30 19:25:29

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] nfs: only do COMMIT for range written with direct I/O

On Wed, 30 Nov 2011 08:56:57 -0600
Malahal Naineni <[email protected]> wrote:

> Jeff Layton [[email protected]] wrote:
> > When given a range to write with unstable writes, the current code
> > always does a COMMIT of the entire file afterward. This is potentially
> > expensive on some servers and unnecessary. Instead, just do a COMMIT for
> > the offset and count that was written.
> >
> > Khoa, who reported this bug, stated that this made a big difference in
> > performance in their environment, which I believe involves GPFS on the
> > server. He didn't pass along any hard numbers so I can't quantify the
> > gain, but it stands to reason that clustered filesystems might suffer
> > more contention issues when issuing a commit over the whole file.
> >
> > Reported-by: Khoa Huynh <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfs/direct.c | 6 ++++--
> > 1 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> > index 1940f1a..33f2be7 100644
> > --- a/fs/nfs/direct.c
> > +++ b/fs/nfs/direct.c
> > @@ -77,6 +77,7 @@ struct nfs_direct_req {
> > /* completion state */
> > atomic_t io_count; /* i/os we're waiting for */
> > spinlock_t lock; /* protect completion state */
> > + loff_t pos; /* offset into file */
>
> Why not call it "offset"?

Why should we call it "offset"? The rest of the code refers to this
value as "pos". Essentially this patch replaces a field that was
removed from this struct several years ago. At the time, that field was
also called "pos".

> Should we set this to zero in
> nfs_direct_req_alloc (just like count is set to zero there)?
>

Yes, that certainly wouldn't hurt. I'll plan to add that into the next
respin which I'll plan to send out once I've collected some more
feedback.

Thanks,
--
Jeff Layton <[email protected]>

2011-11-30 14:57:16

by Malahal Naineni

[permalink] [raw]
Subject: Re: [PATCH] nfs: only do COMMIT for range written with direct I/O

Jeff Layton [[email protected]] wrote:
> When given a range to write with unstable writes, the current code
> always does a COMMIT of the entire file afterward. This is potentially
> expensive on some servers and unnecessary. Instead, just do a COMMIT for
> the offset and count that was written.
>
> Khoa, who reported this bug, stated that this made a big difference in
> performance in their environment, which I believe involves GPFS on the
> server. He didn't pass along any hard numbers so I can't quantify the
> gain, but it stands to reason that clustered filesystems might suffer
> more contention issues when issuing a commit over the whole file.
>
> Reported-by: Khoa Huynh <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfs/direct.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 1940f1a..33f2be7 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -77,6 +77,7 @@ struct nfs_direct_req {
> /* completion state */
> atomic_t io_count; /* i/os we're waiting for */
> spinlock_t lock; /* protect completion state */
> + loff_t pos; /* offset into file */

Why not call it "offset"? Should we set this to zero in
nfs_direct_req_alloc (just like count is set to zero there)?

Thanks, Malahal.


2011-12-05 17:10:02

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] nfs: only do COMMIT for range written with direct I/O

On Wed, 30 Nov 2011 09:07:54 -0500
Jeff Layton <[email protected]> wrote:

> When given a range to write with unstable writes, the current code
> always does a COMMIT of the entire file afterward. This is potentially
> expensive on some servers and unnecessary. Instead, just do a COMMIT for
> the offset and count that was written.
>
> Khoa, who reported this bug, stated that this made a big difference in
> performance in their environment, which I believe involves GPFS on the
> server. He didn't pass along any hard numbers so I can't quantify the
> gain, but it stands to reason that clustered filesystems might suffer
> more contention issues when issuing a commit over the whole file.
>
> Reported-by: Khoa Huynh <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfs/direct.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 1940f1a..33f2be7 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -77,6 +77,7 @@ struct nfs_direct_req {
> /* completion state */
> atomic_t io_count; /* i/os we're waiting for */
> spinlock_t lock; /* protect completion state */
> + loff_t pos; /* offset into file */
> ssize_t count, /* bytes actually processed */
> error; /* any reported error */
> struct completion completion; /* wait for i/o completion */
> @@ -584,8 +585,8 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
> data->cred = msg.rpc_cred;
>
> data->args.fh = NFS_FH(data->inode);
> - data->args.offset = 0;
> - data->args.count = 0;
> + data->args.offset = dreq->pos;
> + data->args.count = dreq->count;
> data->args.context = dreq->ctx;
> data->args.lock_context = dreq->l_ctx;
> data->res.count = 0;
> @@ -877,6 +878,7 @@ static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,
> sync = NFS_FILE_SYNC;
>
> dreq->inode = inode;
> + dreq->pos = pos;
> dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
> dreq->l_ctx = nfs_get_lock_context(dreq->ctx);
> if (dreq->l_ctx == NULL)

Khoa found that he made a mistake when testing this originally, and any
benefit that the patch provides seems to be negligible. I still think
it's safe and reasonable to only issue a commit for the range that was
written, but there doesn't seem to be any compelling need for this
patch right now.

Trond, do you have an opinion here? Should we go ahead and commit this
patch or something like it, or leave well-enough alone?
--
Jeff Layton <[email protected]>

2011-12-13 15:03:03

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] nfs: only do COMMIT for range written with direct I/O

On Mon, 5 Dec 2011 10:06:49 -0800
"Myklebust, Trond" <[email protected]> wrote:

> > -----Original Message-----
> > From: Jeff Layton [mailto:[email protected]]
> > Sent: Monday, December 05, 2011 12:11 PM
> > To: Jeff Layton
> > Cc: Myklebust, Trond; [email protected]; [email protected]
> > Subject: Re: [PATCH] nfs: only do COMMIT for range written with direct
> I/O
> >
> > On Wed, 30 Nov 2011 09:07:54 -0500
> > Jeff Layton <[email protected]> wrote:
> >
> > > When given a range to write with unstable writes, the current code
> > > always does a COMMIT of the entire file afterward. This is
> potentially
> > > expensive on some servers and unnecessary. Instead, just do a COMMIT
> > > for the offset and count that was written.
> > >
> > > Khoa, who reported this bug, stated that this made a big difference
> in
> > > performance in their environment, which I believe involves GPFS on
> the
> > > server. He didn't pass along any hard numbers so I can't quantify
> the
> > > gain, but it stands to reason that clustered filesystems might
> suffer
> > > more contention issues when issuing a commit over the whole file.
> > >
> > > Reported-by: Khoa Huynh <[email protected]>
> > > Signed-off-by: Jeff Layton <[email protected]>
> >
> > Khoa found that he made a mistake when testing this originally, and
> any
> > benefit that the patch provides seems to be negligible. I still think
> it's safe
> > and reasonable to only issue a commit for the range that was written,
> but
> > there doesn't seem to be any compelling need for this patch right now.
> >
> > Trond, do you have an opinion here? Should we go ahead and commit this
> > patch or something like it, or leave well-enough alone?
>
> I'd prefer to wait until I see a tangible benefit. I know that recent
> kernels do have support for a COMMIT range on the Linux kernel server
> side, so maybe it is just a question of shooting up an ext4 or XFS based
> server and running a few tests with a large O_DIRECT writer on one
> client, and a smaller O_DIRECT writer on another...
>

Khoa went back and did his tests and they confirm that there was no
tangible benefit in his environment. I did some minimal testing on
other rigs too, but also didn't see any clear benefit.

Of course, this is highly server-dependent, so there may be places
where this matters more, but for now I don't know of any. At this
point, I suggest we drop this patch for now. If we can note some
tangible benefit later, we can resurrect it then.

Thanks,
--
Jeff Layton <[email protected]>

2011-12-05 18:06:52

by Myklebust, Trond

[permalink] [raw]
Subject: RE: [PATCH] nfs: only do COMMIT for range written with direct I/O

> -----Original Message-----
> From: Jeff Layton [mailto:[email protected]]
> Sent: Monday, December 05, 2011 12:11 PM
> To: Jeff Layton
> Cc: Myklebust, Trond; [email protected]; [email protected]
> Subject: Re: [PATCH] nfs: only do COMMIT for range written with direct
I/O
>
> On Wed, 30 Nov 2011 09:07:54 -0500
> Jeff Layton <[email protected]> wrote:
>
> > When given a range to write with unstable writes, the current code
> > always does a COMMIT of the entire file afterward. This is
potentially
> > expensive on some servers and unnecessary. Instead, just do a COMMIT
> > for the offset and count that was written.
> >
> > Khoa, who reported this bug, stated that this made a big difference
in
> > performance in their environment, which I believe involves GPFS on
the
> > server. He didn't pass along any hard numbers so I can't quantify
the
> > gain, but it stands to reason that clustered filesystems might
suffer
> > more contention issues when issuing a commit over the whole file.
> >
> > Reported-by: Khoa Huynh <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
>
> Khoa found that he made a mistake when testing this originally, and
any
> benefit that the patch provides seems to be negligible. I still think
it's safe
> and reasonable to only issue a commit for the range that was written,
but
> there doesn't seem to be any compelling need for this patch right now.
>
> Trond, do you have an opinion here? Should we go ahead and commit this
> patch or something like it, or leave well-enough alone?

I'd prefer to wait until I see a tangible benefit. I know that recent
kernels do have support for a COMMIT range on the Linux kernel server
side, so maybe it is just a question of shooting up an ext4 or XFS based
server and running a few tests with a large O_DIRECT writer on one
client, and a smaller O_DIRECT writer on another...

Cheers
Trond