2008-06-30 15:36:53

by J. Bruce Fields

Subject: Re: NFS performance degradation of local loopback FS.

On Mon, Jun 30, 2008 at 11:26:54AM -0400, Jeff Layton wrote:
> Recently I spent some time with others here at Red Hat looking
> at problems with nfs server performance. One thing we found was that
> there are some problems with multiple nfsd's. It seems like the I/O
> scheduling or something is fooled by the fact that sequential write
> calls are often handled by different nfsd's. This can negatively
> impact performance (I don't think we've tracked this down completely
> yet, however).

Yes, we've been trying to see how close to full network speed we can get
over a 10 gig network and have run into situations where increasing the
number of threads (without changing anything else) seems to decrease
performance of a simple sequential write.

And the hypothesis that the problem was randomized IO scheduling was the
first thing that came to mind. But I'm not sure what the easiest way
would be to really prove that that was the problem.

And then once we really are sure that's the problem, I'm not sure what
to do about it. I suppose it may depend partly on exactly where the
reordering is happening.
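
One crude way to check might be to watch the order in which writes are
actually dispatched to the exported device while the test runs, and
count how often back-to-back dispatches are not contiguous. Here's a
rough sketch of a post-processor for blkparse output (the field
positions below are an assumption based on the default blkparse line
format, so adjust if your blktrace prints something different):

#!/usr/bin/env python
# Count contiguous vs. non-contiguous write dispatches in blkparse output.
# Assumes the default format:
#   "maj,min cpu seq time pid act rwbs sector + nblocks [proc]"
import sys

last_end = None
contig = noncontig = 0

for line in sys.stdin:
    fields = line.split()
    if len(fields) < 10 or fields[5] != 'D':   # 'D' = dispatched to driver
        continue
    if 'W' not in fields[6]:                   # only count writes
        continue
    try:
        sector = int(fields[7])
        nsect = int(fields[9])
    except ValueError:
        continue
    if last_end is not None:
        if sector == last_end:
            contig += 1
        else:
            noncontig += 1
    last_end = sector + nsect

total = contig + noncontig
if total:
    print("contiguous: %d  non-contiguous: %d  (%.1f%% seeky)"
          % (contig, noncontig, 100.0 * noncontig / total))

Feed it something like "blktrace -d /dev/sdX -o - | blkparse -i -"
during the run; if the non-contiguous fraction climbs as you add nfsd
threads while the client stays a single sequential writer, that would
at least be consistent with the reordering theory.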

--b.

>
> Since you're just doing some single-threaded testing on the client
> side, it might be interesting to try running a single nfsd and testing
> performance with that. That might provide a useful data point.
>
> Some other thoughts of things to try:
>
> 1) run the tests against an exported tmpfs filesystem to eliminate
> underlying disk performance as a factor.
>
> 2) test nfsv4 -- with v2/v3, nfsd opens and closes the file for each
> read/write call. nfsv4 is stateful, however, so I don't believe it does
> that there.
>
> As others have pointed out though, testing with client and server on
> the same machine doesn't necessarily eliminate performance
> bottlenecks. You may want to test with dedicated clients and servers
> (maybe on a nice fast network or with a gigE crossover cable or
> something).


2008-06-30 16:02:11

by Chuck Lever

Subject: Re: NFS performance degradation of local loopback FS.

J. Bruce Fields wrote:
> On Mon, Jun 30, 2008 at 11:26:54AM -0400, Jeff Layton wrote:
>> Recently I spent some time with others here at Red Hat looking
>> at problems with nfs server performance. One thing we found was that
>> there are some problems with multiple nfsd's. It seems like the I/O
>> scheduling or something is fooled by the fact that sequential write
>> calls are often handled by different nfsd's. This can negatively
>> impact performance (I don't think we've tracked this down completely
>> yet, however).
>
> Yes, we've been trying to see how close to full network speed we can get
> over a 10 gig network and have run into situations where increasing the
> number of threads (without changing anything else) seems to decrease
> performance of a simple sequential write.
>
> And the hypothesis that the problem was randomized IO scheduling was the
> first thing that came to mind. But I'm not sure what the easiest way
> would be to really prove that that was the problem.

Here's an easy way for reads: instrument the VFS code that manages
read-ahead contexts. Probably not an issue for krkumar2, since the file
from one of the read tests is small enough to fit in the server's cache,
and the other read test involves only /dev/null.

I had always thought wdelay would mitigate write request re-ordering,
but I've never looked at how it's implemented in Linux's nfsd. Of
course, if the client is sending too many COMMIT requests, this will
negate the benefit of wdelay.
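
If it helps, that ratio is easy to eyeball from the server-side
counters. A rough sketch, assuming I'm reading the /proc/net/rpc/nfsd
format correctly (the "proc3" line should carry the 22 per-procedure
counts in NFSv3 procedure-number order, which puts WRITE at index 7 and
COMMIT at index 21):

#!/usr/bin/env python
# Print the server's NFSv3 WRITE and COMMIT counts and their ratio.
def v3_write_commit():
    with open('/proc/net/rpc/nfsd') as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == 'proc3':
                counts = [int(x) for x in fields[2:]]
                return counts[7], counts[21]   # WRITE, COMMIT
    return None, None

writes, commits = v3_write_commit()
if writes is not None:
    ratio = float(commits) / writes if writes else 0.0
    print("WRITE: %d  COMMIT: %d  (%.2f COMMITs per WRITE)"
          % (writes, commits, ratio))

Run it before and after a test and diff the numbers (or just watch
nfsstat -s); a COMMIT every few WRITEs would go a long way toward
defeating any write gathering on the server.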



2008-07-01 10:21:04

by Krishna Kumar2

Subject: Re: NFS performance degradation of local loopback FS.

"J. Bruce Fields" <[email protected]> wrote on 06/30/2008 09:05:41 PM:

> On Mon, Jun 30, 2008 at 11:26:54AM -0400, Jeff Layton wrote:
> > Recently I spent some time with others here at Red Hat looking
> > at problems with nfs server performance. One thing we found was that
> > there are some problems with multiple nfsd's. It seems like the I/O
> > scheduling or something is fooled by the fact that sequential write
> > calls are often handled by different nfsd's. This can negatively
> > impact performance (I don't think we've tracked this down completely
> > yet, however).
>
> Yes, we've been trying to see how close to full network speed we can get
> over a 10 gig network and have run into situations where increasing the
> number of threads (without changing anything else) seems to decrease
> performance of a simple sequential write.
>
> And the hypothesis that the problem was randomized IO scheduling was the
> first thing that came to mind. But I'm not sure what the easiest way
> would be to really prove that that was the problem.
>
> And then once we really are sure that's the problem, I'm not sure what
> to do about it. I suppose it may depend partly on exactly where the
> reordering is happening.

For 1 process, this theory seems to work:
1 testing process: /local: 39.11 MB/s
64 nfsd's: 29.63 MB/s
1 nfsd: 38.99 MB/s


However for 6 processes reading 6 different files:
6 parallel testing processes: /local: 70 MB/s
1 nfsd: 36 MB/s (49% drop)
2 nfsd's: 37.7 MB/s (46% drop)
4 nfsd's: 38.6 MB/s (44.9% drop)
4 nfsd's on different CPUs: 37.5 MB/s (46% drop)
32 nfsd's: 38.3 MB/s (45% drop)
64 nfsd's: 38.3 MB/s (45% drop)
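
For reference, the parallel numbers above are from 6 processes each
sequentially reading its own file. A minimal sketch of that sort of
harness, with placeholder paths rather than the exact tool used here:

#!/usr/bin/env python
# Spawn one sequential reader per file and report aggregate throughput.
# Paths are placeholders; drop caches first for cold-cache numbers.
import os, time
from multiprocessing import Process

FILES = ['/mnt/nfs/testfile%d' % i for i in range(6)]
BUFSZ = 1024 * 1024

def reader(path):
    with open(path, 'rb') as f:
        while f.read(BUFSZ):
            pass

if __name__ == '__main__':
    total = sum(os.path.getsize(p) for p in FILES)
    procs = [Process(target=reader, args=(p,)) for p in FILES]
    start = time.time()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    elapsed = time.time() - start
    print('%d bytes in %.2fs -> %.2f MB/s'
          % (total, elapsed, total / elapsed / (1024 * 1024)))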

Thanks,

- KK


2008-07-01 12:49:19

by Jeff Layton

Subject: Re: NFS performance degradation of local loopback FS.

On Tue, 1 Jul 2008 15:49:44 +0530
Krishna Kumar2 <[email protected]> wrote:

> "J. Bruce Fields" <[email protected]> wrote on 06/30/2008 09:05:41 PM:
>
> > On Mon, Jun 30, 2008 at 11:26:54AM -0400, Jeff Layton wrote:
> > > Recently I spent some time with others here at Red Hat looking
> > > at problems with nfs server performance. One thing we found was that
> > > there are some problems with multiple nfsd's. It seems like the I/O
> > > scheduling or something is fooled by the fact that sequential write
> > > calls are often handled by different nfsd's. This can negatively
> > > impact performance (I don't think we've tracked this down completely
> > > yet, however).
> >
> > Yes, we've been trying to see how close to full network speed we can get
> > over a 10 gig network and have run into situations where increasing the
> > number of threads (without changing anything else) seems to decrease
> > performance of a simple sequential write.
> >
> > And the hypothesis that the problem was randomized IO scheduling was the
> > first thing that came to mind. But I'm not sure what the easiest way
> > would be to really prove that that was the problem.
> >
> > And then once we really are sure that's the problem, I'm not sure what
> > to do about it. I suppose it may depend partly on exactly where the
> > reordering is happening.
>
> For 1 process, this theory seems to work:
> 1 testing process: /local: 39.11 MB/s
> 64 nfsd's: 29.63 MB/s
> 1 nfsd: 38.99 MB/s
>
>
> However for 6 processes reading 6 different files:
> 6 parallel testing processes: /local: 70 MB/s
> 1 nfsd: 36 MB/s (49% drop)
> 2 nfsd's: 37.7 MB/s (46% drop)
> 4 nfsd's: 38.6 MB/s (44.9% drop)
> 4 nfsd's on different CPUs: 37.5 MB/s (46% drop)
> 32 nfsd's: 38.3 MB/s (45% drop)
> 64 nfsd's: 38.3 MB/s (45% drop)
>

That makes some sense, I think...

What's happening is that the processes on the client doing the I/O are
being "masqueraded" behind the nfsd's. This throws off readahead (and
maybe other predictive I/O optimizations). Those optimizations help when
a single thread is doing I/O, but when a single process is feeding
multiple nfsd's, or multiple processes are spewing I/O to a single nfsd,
the access pattern the server sees degenerates into something close to
random I/O.
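
To make that concrete, here's a toy simulation (it has nothing to do
with the real nfsd or readahead code; it just shows how handing a
perfectly sequential stream to a pool of racing threads makes it look
non-sequential to a simple per-file "did this pick up where the last
request left off?" check):

#!/usr/bin/env python
# Model: a client issues offsets 0..N-1 in order, but up to `nthreads`
# requests are in flight at once and complete in arbitrary order.
# Count how many of the resulting submissions still look sequential.
import random

REQUESTS = 10000

def sequential_hits(nthreads):
    window = []
    submitted = []
    for off in range(REQUESTS):
        window.append(off)
        if len(window) >= nthreads:
            submitted.append(window.pop(random.randrange(len(window))))
    while window:
        submitted.append(window.pop(random.randrange(len(window))))
    hits, last = 0, None
    for off in submitted:
        if last is not None and off == last + 1:
            hits += 1
        last = off
    return hits

for n in (1, 2, 4, 8, 32, 64):
    print('%2d threads: %5d of %d submissions look sequential'
          % (n, sequential_hits(n), REQUESTS))

With one thread everything stays in order; with a few dozen, almost
nothing does, which is roughly the picture the readahead heuristics on
the server end up seeing.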

In the single nfsd case, you're also being bottlenecked by the fact
that all of the I/O is serialized. That's not a problem with a single
client-side process, but it may be a significant slowdown when there
are multiple writers on the client.

We have a RHBZ open on this for RHEL5:

https://bugzilla.redhat.com/show_bug.cgi?id=448130

...there is a partial workaround described there as well.

--
Jeff Layton <[email protected]>