2008-06-19 06:47:05

by Krishna Kumar2

Subject: NFS performance degradation of local loopback FS.


Hi,

I am running a 2.6.25 kernel on a [4 way, 3.2 x86_64, 4GB] system. The test
does I/O on a local ext3 filesystem and measures the bandwidth, then NFS
mounts the same filesystem loopback on the same system and repeats the
measurement. I have configured 64 nfsd's to run. The test script is attached
at the bottom.

My configuration is:
/dev/some-local-disk : /local
NFS mount /local : /nfs

The result is:
200 processes:
/local: 108000 KB/s
/nfs: 66000 KB/s: Drop of 40%

300 processes (KB/s):
/local: 112000 KB/s
/nfs: 57000 KB/s: Drop of 50%

I am not using any tuning, though I have tested with both
sunrpc.tcp_slot_table_entries=16 and 128.

Is a drop this big expected for a loopback NFS mount? Any
feedback/suggestions are much appreciated.

Thanks,

- KK

(See attached file: nfs)


Attachments:
nfs (865.00 B)

2008-06-27 17:44:33

by J. Bruce Fields

Subject: Re: NFS performance degradation of local loopback FS.

On Fri, Jun 27, 2008 at 02:34:24PM +0530, Krishna Kumar2 wrote:
> But if the file is being shared only with one client (and that too
> locally),
> isn't 25% too high?
>
> Will I get better results on NFSv4, and should I try delegation (that
> sounds
> automatic and not something that the user has to start)?

No, delegation couldn't possibly help in this case--more caching can't
help if you're only reading the file once.

--b.

2008-06-27 18:06:44

by Dean

Subject: Re: NFS performance degradation of local loopback FS.

One option might be to try using O_DIRECT if you are worried about
memory (although I would read/write in at least 1 MB at a time). I
would expect this to help at least a bit, especially on reads.

Also, check all the standard NFS tuning stuff: #nfsds, #rpc slots.
Since with a loopback you effectively have no latency, you want to
ensure that neither the #nfsds nor the #rpc slots is a bottleneck (if
either one is too low, you will have a problem). One way to reduce the
number of requests, and therefore get by with fewer nfsds/rpc slots, is
to use a larger wsize/rsize; 'cat /proc/mounts' shows what the mount is
currently using. Ensure your wsize/rsize is a decent size (~1MB).
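
A quick way to check all of these, assuming a typical Linux setup where the
nfsd filesystem is mounted under /proc/fs/nfsd (a rough sketch, adjust paths
as needed):

  # number of nfsd threads currently running
  cat /proc/fs/nfsd/threads

  # client-side RPC slot table sizes
  sysctl sunrpc.tcp_slot_table_entries sunrpc.udp_slot_table_entries

  # rsize/wsize actually in effect for the NFS mount
  grep nfs /proc/mounts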

Dean

Krishna Kumar2 wrote:
> Chuck Lever <[email protected]> wrote on 06/26/2008 11:12:58 PM:
>
>
>>> Local:
>>> Read: 69.5 MB/s
>>> Write: 70.0 MB/s
>>> NFS of same FS mounted loopback on same system:
>>> Read: 29.5 MB/s (57% drop)
>>> Write: 27.5 MB/s (60% drop)
>>>
>> You can look at client-side NFS and RPC performance metrics using some
>> prototype Python tools that were just added to nfs-utils. The scripts
>> themselves can be downloaded from:
>> http://oss.oracle.com/~cel/Linux-2.6/2.6.25
>> but unfortunately they are not fully documented yet so you will have
>> to approach them with an open mind and a sense of experimentation.
>>
>> You can also capture network traces on your loopback interface to see
>> if there is, for example, unexpected congestion or latency, or if
>> there are other problems.
>>
>> But for loopback, the problem is often that the client and server are
>> sharing the same physical memory for caching data. Analyzing your
>> test system's physical memory utilization might be revealing.
>>
>
> But loopback is better than actual network traffic. If my file size is
> less than half the available physical memory, then this should not be
> a problem, right? The server caches the file data (64K at a time), and
> sends to the client (on the same system) and the client has a local
> copy. I am testing today with that assumption.
>
> My system has 4GB memory, of which 3.4GB is free before running the test.
> I created a 1.46GB (so that double that size for server/client copies will
> not be more than 3GB) file by running:
> dd if=/dev/zero of=smaller_file bs=65536 count=24000
>
> To measure the time exactly for just the I/O part, I have a small program
> that
> reads data in chunks of 64K and discards it "while (read(fd, buf, 64K) >
> 0)",
> with a gettimeofday before and after it to measure bandwidth. For each run,
> the script does (psuedo): "umount /nfs, stop nfs server, umount /local,
> mount /local, start nfs server, and mount /nfs". The result is:
>
> Testing on /local
> Time: 38.4553 BW:39.01 MB/s
> Time: 38.3073 BW:39.16 MB/s
> Time: 38.3807 BW:39.08 MB/s
> Time: 38.3724 BW:39.09 MB/s
> Time: 38.3463 BW:39.12 MB/s
> Testing on /nfs
> Time: 52.4386 BW:28.60 MB/s
> Time: 50.7531 BW:29.55 MB/s
> Time: 50.8296 BW:29.51 MB/s
> Time: 48.2363 BW:31.10 MB/s
> Time: 51.1992 BW:29.30 MB/s
>
> Average bandwidth drop across 5 runs is 24.24%.
>
> Memory stats *before* and *after* one run for /local and /nfs is:
>
> ********** local.start ******
> MemFree: 3500700 kB
> Cached: 317076 kB
> Inactive: 249356 kB
>
> ********** local.end ********
> MemFree: 1961872 kB
> Cached: 1853100 kB
> Inactive: 1785028 kB
>
> ********** nfs.start ********
> MemFree: 3480456 kB
> Cached: 317072 kB
> Inactive: 252740 kB
>
> ********** nfs.end **********
> MemFree: 400892 kB
> Cached: 3389164 kB
> Inactive: 3324800 kB
>
> I don't know if this is useful but looking at ratios:
> Memfree increased almost 5 times from 1.78 (Memfree before / Memfree after)
> to 8.68 for /local and /nfs respectively. Inactive almost doubled from 7.15
> times to 13.15 times for /local and /nfs (Inactive after / Inactive
> before),
> and Cached also almost doubled from 5.84 times to 10.69 times (same for
> Cached).
>
>
>> Otherwise, you should always expect some performance degradation when
>> comparing NFS and local disk. 50% is not completely unheard of. It's
>> the price paid for being able to share your file data concurrently
>> among multiple clients.
>>
>
> But if the file is being shared only with one client (and that too
> locally),
> isn't 25% too high?
>
> Will I get better results on NFSv4, and should I try delegation (that
> sounds
> automatic and not something that the user has to start)?
>
> Thanks,
>
> - KK
>
>

2008-06-30 10:11:47

by Krishna Kumar2

Subject: Re: NFS performance degradation of local loopback FS.

Dean Hildebrand <[email protected]> wrote on 06/27/2008 11:36:28 PM:

> One option might be to try using O_DIRECT if you are worried about
> memory (although I would read/write in at least 1 MB at a time). I
> would expect this to help at least a bit especially on reads.
>
> Also, check all the standard nfs tuning stuff, #nfsds, #rpc slots.
> Since with a loopback you effectively have no latency, you would want to
> ensure that neither the #nfsds or #rpc slots is a bottleneck (if either
> one is too low, you will have a problem). One way to reduce the # of
> requests and therefore require fewer nfsds/rpc_slots is to 'cat
> /proc/mounts' to see your wsize/rsize. Ensure your wsize/rsize is a
> decent size (~ 1MB).

Number of nfsd: 64, and
sunrpc.transports = sunrpc.udp_slot_table_entries = 128
sunrpc.tcp_slot_table_entries = 128

I am using:

mount -o rw,bg,hard,nointr,proto=tcp,vers=3,rsize=65536,wsize=65536,timeo=600,noatime \
      localhost:/local /nfs

I have also tried with 1MB for both rsize/wsize and it didn't change the
bandwidth (other than minor variations).

thanks,

- KK


2008-06-30 15:28:39

by Jeff Layton

Subject: Re: NFS performance degradation of local loopback FS.

On Mon, 30 Jun 2008 15:40:30 +0530
Krishna Kumar2 <[email protected]> wrote:

> Dean Hildebrand <[email protected]> wrote on 06/27/2008 11:36:28 PM:
>
> > One option might be to try using O_DIRECT if you are worried about
> > memory (although I would read/write in at least 1 MB at a time). I
> > would expect this to help at least a bit especially on reads.
> >
> > Also, check all the standard nfs tuning stuff, #nfsds, #rpc slots.
> > Since with a loopback you effectively have no latency, you would want to
> > ensure that neither the #nfsds or #rpc slots is a bottleneck (if either
> > one is too low, you will have a problem). One way to reduce the # of
> > requests and therefore require fewer nfsds/rpc_slots is to 'cat
> > /proc/mounts' to see your wsize/rsize. Ensure your wsize/rsize is a
> > decent size (~ 1MB).
>
> Number of nfsd: 64, and
> sunrpc.transports = sunrpc.udp_slot_table_entries = 128
> sunrpc.tcp_slot_table_entries = 128
>
> I am using:
>
> mount -o
> rw,bg,hard,nointr,proto=tcp,vers=3,rsize=65536,wsize=65536,timeo=600,noatime
> localhost:/local /nfs
>
> I have also tried with 1MB for both rsize/wsize and it didn't change the BW
> (other than
> mini variations).
>
> thanks,
>
> - KK
>

Recently I spent some time with others here at Red Hat looking
at problems with nfs server performance. One thing we found was that
there are some problems with multiple nfsd's. It seems like the I/O
scheduling or something is fooled by the fact that sequential write
calls are often handled by different nfsd's. This can negatively
impact performance (I don't think we've tracked this down completely
yet, however).

Since you're just doing some single-threaded testing on the client
side, it might be interesting to try running a single nfsd and testing
performance with that. It might provide an interesting data point.

Some other thoughts of things to try:

1) run the tests against an exported tmpfs filesystem to eliminate
underlying disk performance as a factor.

2) test nfsv4 -- with v2/v3, nfsd opens and closes the file for each
read/write. nfsv4 is stateful, however, so I don't believe it does that
there.
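
A rough sketch of the single-nfsd and tmpfs suggestions above (mount points
and options are just examples, not a tested recipe):

  # drop to a single nfsd thread
  rpc.nfsd 1

  # export a tmpfs so the disk is out of the picture
  mkdir -p /export/tmpfs /mnt/nfs-tmpfs
  mount -t tmpfs -o size=2g tmpfs /export/tmpfs
  exportfs -o rw,no_root_squash localhost:/export/tmpfs
  mount -t nfs -o vers=3,proto=tcp localhost:/export/tmpfs /mnt/nfs-tmpfs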

As others have pointed out though, testing with client and server on
the same machine does not necessarily eliminate performance
bottlenecks. You may want to test with dedicated clients and servers
(maybe on a nice fast network or with a gigE crossover cable or
something).

--
Jeff Layton <[email protected]>

2008-06-30 15:30:10

by Chuck Lever III

Subject: Re: NFS performance degradation of local loopback FS.

On Mon, Jun 30, 2008 at 6:10 AM, Krishna Kumar2 <[email protected]> wrote:
> Dean Hildebrand <[email protected]> wrote on 06/27/2008 11:36:28 PM:
>
>> One option might be to try using O_DIRECT if you are worried about
>> memory (although I would read/write in at least 1 MB at a time). I
>> would expect this to help at least a bit especially on reads.
>>
>> Also, check all the standard nfs tuning stuff, #nfsds, #rpc slots.
>> Since with a loopback you effectively have no latency, you would want to
>> ensure that neither the #nfsds or #rpc slots is a bottleneck (if either
>> one is too low, you will have a problem). One way to reduce the # of
>> requests and therefore require fewer nfsds/rpc_slots is to 'cat
>> /proc/mounts' to see your wsize/rsize. Ensure your wsize/rsize is a
>> decent size (~ 1MB).
>
> Number of nfsd: 64, and
> sunrpc.transports = sunrpc.udp_slot_table_entries = 128
> sunrpc.tcp_slot_table_entries = 128

Interestingly, sometimes using a large number of slots can be
detrimental to performance over loopback. Have you tried 32 and 64 as
well as 128? Also, I seem to recall that you should have the same
number of, or fewer, slots on your clients than you have threads on
your server.
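
For example, something like the following (only a suggestion; remounting
should let a fresh transport pick up the new value, and /nfs here assumes
an fstab entry for the loopback mount):

  sysctl -w sunrpc.tcp_slot_table_entries=32
  umount /nfs && mount /nfs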

--
Chuck Lever

2008-06-22 08:35:39

by Benny Halevy

Subject: Re: NFS performance degradation of local loopback FS.

On Jun. 20, 2008, 12:21 +0300, Krishna Kumar2 <[email protected]> wrote:
> Benny Halevy <[email protected]> wrote on 06/19/2008 06:22:42 PM:
>
>>> Well, you aren't exactly comparing apples to apples. The NFS
>>> client does close-to-open semantics, meaning that it writes
>>> all modified data to the server on close. The dd commands run
>>> on the local file system do not. You might trying using
>>> something which does an fsync before closing so that you are
>>> making a closer comparison.
>> try dd conv=fsync ...
>
> I ran a single 'dd' with this option on /local and later on /nfs (same
> filesystem nfs mounted on the same system). The script is umounting and
> mounting local and nfs partitions between each 'dd'. Following are the
> file sizes for 20 and 60 second runs respectively:

According to dd's man page, the f{,date}sync options tell it to
"physically write output file data before finishing". If you kill it
before that, you end up with dirty data in the cache. What exactly are
you trying to measure, and what is the expected application workload?

> -rw-r--r-- 1 root root 1558056960 Jun 20 14:41 local.1
> -rw-r--r-- 1 root root 671834112 Jun 20 14:41 nfs.1 (56% drop)
> &
> -rw-r--r-- 1 root root 3845812224 Jun 20 14:42 local.1
> -rw-r--r-- 1 root root 2420342784 Jun 20 14:43 nfs.1 (37% drop)
>
> Since I am new to NFS, I am not sure if this much degradation is expected,
> or whether I need to tune something. Is there some code I can look at or
> hack into to find possible locations for the performance fall? At this time
> I cannot even tell whether the *possible* bug is in server or client code.

I'm not sure if there's any bug per se at all, although there seems to be
some room for improvement.

As another data point, I'm seeing about 20% worse write throughput on my
system with a single dd writing to a local file system vs. writing to the
same fs over a loopback-mounted nfs with a 2.6.26-rc6 based kernel (nfs 3
and 4 gave similar results).
Disk:
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: HDT722516DLA380, V43OA96A, max UDMA/133
ata3.00: 321672960 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133

ext3 mount options: noatime
nfs mount options: rsize=65536,wsize=65536
dd options: bs=64k count=10k conv=fsync

(write results average of 3 runs)
write local disk: 47.6 MB/s
write loopback nfsv3: 30.2 MB/s
write remote nfsv3: 29.0 MB/s
write loopback nfsv4: 37.5 MB/s
write remote nfsv4: 29.1 MB/s

read local disk: 50.8 MB/s
read loopback nfsv3: 27.2 MB/s
read remote nfsv3: 21.8 MB/s
read loopback nfsv4: 25.4 MB/s
read remote nfsv4: 21.4 MB/s

Benny

>
> Thanks,
>
> - KK
>


2008-06-23 08:12:04

by Krishna Kumar2

Subject: Re: NFS performance degradation of local loopback FS.

Hi Benny,

> According to dd's man page, the f{,date}sync options tell it to
> "physically write output file data before finishing"
> If you kill it before that you end up with dirty data in the cache.
> What exactly are you trying to measure, what is the expected application
> workload?

I changed my test to do what you were doing instead of killing
dd's, etc. The end application is DB2; it uses multiple processes,
and I wanted to simulate that with micro-benchmarks. The only
reliable way to benchmark bandwidth for multiple processes is to
kill the tests after running them for some time instead of letting
them run to completion.

> ext3 mount options: noatime
> nfs mount options: rsize=65536,wsize=65536
> dd options: bs=64k count=10k conv=fsync
>
> (write results average of 3 runs)
> write local disk: 47.6 MB/s
> write loopback nfsv3: 30.2 MB/s
> write remote nfsv3: 29.0 MB/s
> write loopback nfsv4: 37.5 MB/s
> write remote nfsv4: 29.1 MB/s
>
> read local disk: 50.8 MB/s
> read loopback nfsv3: 27.2 MB/s
> read remote nfsv3: 21.8 MB/s
> read loopback nfsv4: 25.4 MB/s
> read remote nfsv4: 21.4 MB/s

I used the exact same options you are using, and here are the results
averaged across 3 runs:

Write local disk 58.5 MB/s
Write loopback nfsv3: 29.42 MB/s (50% drop)

Reading (file created from /dev/urandom; somehow I am getting GB/s
while your results were comparable to the writes):
Read local disk: 2.77 GB/s
Read loopback nfsv3: 2.86 GB/s (higher for some reason)

Thanks,

- KK


2008-06-23 12:42:16

by Benny Halevy

Subject: Re: NFS performance degradation of local loopback FS.

On Jun. 23, 2008, 11:11 +0300, Krishna Kumar2 <[email protected]> wrote:
> Hi Benny,
>
>> According to dd's man page, the f{,date}sync options tell it to
>> "physically write output file data before finishing"
>> If you kill it before that you end up with dirty data in the cache.
>> What exactly are you trying to measure, what is the expected application
>> workload?
>
> I changed my test to do what you were doing instead of killing
> dd's, etc. The end application is DB2 and it is using multiple
> processes and I wanted to simulate that with micro-benchmarks.
> The only reliable way to benchmark bandwidth for multiple
> processes is to kill the tests after running them for some time
> instead of letting them run till conclusion.

BTW, iozone (http://www.iozone.org/) might be your friend if you're
looking for a reliable I/O benchmark (w/ -e and -c options to include
fsync and close).
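
For example, something along these lines (sizes are only illustrative):

  # sequential write then read of a 2GB file in 64KB records,
  # with fsync (-e) and close (-c) included in the timing
  iozone -e -c -i 0 -i 1 -r 64k -s 2g -f /nfs/iozone.tmp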

>
>> ext3 mount options: noatime
>> nfs mount options: rsize=65536,wsize=65536
>> dd options: bs=64k count=10k conv=fsync
>>
>> (write results average of 3 runs)
>> write local disk: 47.6 MB/s
>> write loopback nfsv3: 30.2 MB/s
>> write remote nfsv3: 29.0 MB/s
>> write loopback nfsv4: 37.5 MB/s
>> write remote nfsv4: 29.1 MB/s
>>
>> read local disk: 50.8 MB/s
>> read loopback nfsv3: 27.2 MB/s
>> read remote nfsv3: 21.8 MB/s
>> read loopback nfsv4: 25.4 MB/s
>> read remote nfsv4: 21.4 MB/s
>
> I used the exact same options you are using, and here is the results
> averaged across 3 runs:
>
> Write local disk 58.5 MB/s
> Write loopback nfsv3: 29.42 MB/s (50% drop)
>
> Reading (file created from /dev/urandom, somehow I am getting in GB/sec
> while your results were comparable to write's):

Apparently the file is cached. You needed to restart nfs
and remount the file system to make sure it isn't before reading it.
Or, you can create a file larger than your host's cache size so
when you write (or read) it sequentially, its tail evicts its head
out of the cache. This is a less reliable method, yet creating a
file about 25% larger than the host's memory size should work for you.
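
For example, with 4GB of RAM, something like this (the 5GB figure is just
"memory + 25%", and drop_caches is an alternative to restarting nfs and
remounting):

  # create a ~5GB file so a sequential pass cannot stay cached
  dd if=/dev/urandom of=/local/bigfile bs=1M count=5120

  # or explicitly drop the page cache between runs (2.6.16 and later)
  sync; echo 3 > /proc/sys/vm/drop_caches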

Benny

> Read local disk: 2.77 GB/s
> Read loopback nfsv3: 2.86 GB/s (higher for some reason)
>
> Thanks,
>
> - KK
>


2008-06-26 07:19:42

by Krishna Kumar2

Subject: Re: NFS performance degradation of local loopback FS.

Benny Halevy <[email protected]> wrote on 06/23/2008 06:10:40 PM:

> Apparently the file is cached. You needed to restart nfs
> and remount the file system to make sure it isn't before reading it.
> Or, you can create a file larger than your host's cache size so
> when you write (or read) it sequentially, its tail evicts its head
> out of the cache. This is a less reliable method, yet creating a
> file about 25% larger than the host's memory size should work for you.

I did a umount of all filesystems and restarted NFS before testing. Here
is the result:

Local:
Read: 69.5 MB/s
Write: 70.0 MB/s
NFS of same FS mounted loopback on same system:
Read: 29.5 MB/s (57% drop)
Write: 27.5 MB/s (60% drop)

The drops seem exceedingly high. How can I figure out the source of the
problem? Even an answer as general as "the problem is in the NFS client
code", "the problem is in the NFS server code", or "the problem can be
mitigated by tuning" would help :-)

Thanks,

- KK


2008-06-26 17:45:00

by Chuck Lever III

Subject: Re: NFS performance degradation of local loopback FS.

On Jun 26, 2008, at 3:19 AM, Krishna Kumar2 wrote:
> Benny Halevy <[email protected]> wrote on 06/23/2008 06:10:40 PM:
>
>> Apparently the file is cached. You needed to restart nfs
>> and remount the file system to make sure it isn't before reading it.
>> Or, you can create a file larger than your host's cache size so
>> when you write (or read) it sequentially, its tail evicts its head
>> out of the cache. This is a less reliable method, yet creating a
>> file about 25% larger than the host's memory size should work for
>> you.
>
> I did a umount of all filesystems and restart NFS before testing. Here
> is the result:
>
> Local:
> Read: 69.5 MB/s
> Write: 70.0 MB/s
> NFS of same FS mounted loopback on same system:
> Read: 29.5 MB/s (57% drop)
> Write: 27.5 MB/s (60% drop)
>
> The drops seems exceedingly high. How can I figure out the source of
> the
> problem? Even if it is as general as to be able to state: "Problem
> is in
> the NFS client code" or "Problem is in the NFS server code", or
> "Problem
> can be mitigated by tuning" :-)

It's hard to say what might be the problem just by looking at
performance results.

You can look at client-side NFS and RPC performance metrics using some
prototype Python tools that were just added to nfs-utils. The scripts
themselves can be downloaded from:

http://oss.oracle.com/~cel/Linux-2.6/2.6.25

but unfortunately they are not fully documented yet so you will have
to approach them with an open mind and a sense of experimentation.

You can also capture network traces on your loopback interface to see
if there is, for example, unexpected congestion or latency, or if
there are other problems.

But for loopback, the problem is often that the client and server are
sharing the same physical memory for caching data. Analyzing your
test system's physical memory utilization might be revealing.

Otherwise, you should always expect some performance degradation when
comparing NFS and local disk. 50% is not completely unheard of. It's
the price paid for being able to share your file data concurrently
among multiple clients.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2008-06-26 17:56:08

by J. Bruce Fields

Subject: Re: NFS performance degradation of local loopback FS.

On Thu, Jun 26, 2008 at 01:42:58PM -0400, Chuck Lever wrote:
> On Jun 26, 2008, at 3:19 AM, Krishna Kumar2 wrote:
>> Benny Halevy <[email protected]> wrote on 06/23/2008 06:10:40 PM:
>>
>>> Apparently the file is cached. You needed to restart nfs
>>> and remount the file system to make sure it isn't before reading it.
>>> Or, you can create a file larger than your host's cache size so
>>> when you write (or read) it sequentially, its tail evicts its head
>>> out of the cache. This is a less reliable method, yet creating a
>>> file about 25% larger than the host's memory size should work for
>>> you.
>>
>> I did a umount of all filesystems and restart NFS before testing. Here
>> is the result:
>>
>> Local:
>> Read: 69.5 MB/s
>> Write: 70.0 MB/s
>> NFS of same FS mounted loopback on same system:
>> Read: 29.5 MB/s (57% drop)
>> Write: 27.5 MB/s (60% drop)
>>
>> The drops seems exceedingly high. How can I figure out the source of
>> the
>> problem? Even if it is as general as to be able to state: "Problem is
>> in
>> the NFS client code" or "Problem is in the NFS server code", or
>> "Problem
>> can be mitigated by tuning" :-)
>
> It's hard to say what might be the problem just by looking at
> performance results.
>
> You can look at client-side NFS and RPC performance metrics using some
> prototype Python tools that were just added to nfs-utils. The scripts
> themselves can be downloaded from:
>
> http://oss.oracle.com/~cel/Linux-2.6/2.6.25
>
> but unfortunately they are not fully documented yet so you will have to
> approach them with an open mind and a sense of experimentation.
>
> You can also capture network traces on your loopback interface to see if
> there is, for example, unexpected congestion or latency, or if there are
> other problems.
>
> But for loopback, the problem is often that the client and server are
> sharing the same physical memory for caching data. Analyzing your test
> system's physical memory utilization might be revealing.

If he's just doing a single large read or write with cold caches (sounds
like that's probably the case), then memory probably doesn't matter
much, does it?

--b.

>
> Otherwise, you should always expect some performance degradation when
> comparing NFS and local disk. 50% is not completely unheard of. It's
> the price paid for being able to share your file data concurrently among
> multiple clients.
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com

2008-06-26 21:05:47

by Chuck Lever III

Subject: Re: NFS performance degradation of local loopback FS.

On Thu, Jun 26, 2008 at 1:55 PM, J. Bruce Fields <[email protected]> wrote:
> On Thu, Jun 26, 2008 at 01:42:58PM -0400, Chuck Lever wrote:
>> On Jun 26, 2008, at 3:19 AM, Krishna Kumar2 wrote:
>>> Benny Halevy <[email protected]> wrote on 06/23/2008 06:10:40 PM:
>>>
>>>> Apparently the file is cached. You needed to restart nfs
>>>> and remount the file system to make sure it isn't before reading it.
>>>> Or, you can create a file larger than your host's cache size so
>>>> when you write (or read) it sequentially, its tail evicts its head
>>>> out of the cache. This is a less reliable method, yet creating a
>>>> file about 25% larger than the host's memory size should work for
>>>> you.
>>>
>>> I did a umount of all filesystems and restart NFS before testing. Here
>>> is the result:
>>>
>>> Local:
>>> Read: 69.5 MB/s
>>> Write: 70.0 MB/s
>>> NFS of same FS mounted loopback on same system:
>>> Read: 29.5 MB/s (57% drop)
>>> Write: 27.5 MB/s (60% drop)
>>>
>>> The drops seems exceedingly high. How can I figure out the source of
>>> the
>>> problem? Even if it is as general as to be able to state: "Problem is
>>> in
>>> the NFS client code" or "Problem is in the NFS server code", or
>>> "Problem
>>> can be mitigated by tuning" :-)
>>
>> It's hard to say what might be the problem just by looking at
>> performance results.
>>
>> You can look at client-side NFS and RPC performance metrics using some
>> prototype Python tools that were just added to nfs-utils. The scripts
>> themselves can be downloaded from:
>>
>> http://oss.oracle.com/~cel/Linux-2.6/2.6.25
>>
>> but unfortunately they are not fully documented yet so you will have to
>> approach them with an open mind and a sense of experimentation.
>>
>> You can also capture network traces on your loopback interface to see if
>> there is, for example, unexpected congestion or latency, or if there are
>> other problems.
>>
>> But for loopback, the problem is often that the client and server are
>> sharing the same physical memory for caching data. Analyzing your test
>> system's physical memory utilization might be revealing.
>
> If he's just doing a single large read or write with cold caches (sounds
> like that's probably the case), then memory probably doesn't matter
> much, does it?

I expect it might.

The client and server would contend for available physical memory as
the file was first read in from the physical file system by the
server, and then a second copy was cached by the client.

A file as small as half the available physical memory on his system
could trigger this behavior.

On older 2.6 kernels (.18 or so), both the server's physical file
system and the client would trigger bdi congestion throttling.

--
Chuck Lever
chu ckl eve rat ora cle dot com

2008-06-27 09:05:11

by Krishna Kumar2

Subject: Re: NFS performance degradation of local loopback FS.

Chuck Lever <[email protected]> wrote on 06/26/2008 11:12:58 PM:

> > Local:
> > Read: 69.5 MB/s
> > Write: 70.0 MB/s
> > NFS of same FS mounted loopback on same system:
> > Read: 29.5 MB/s (57% drop)
> > Write: 27.5 MB/s (60% drop)
>
> You can look at client-side NFS and RPC performance metrics using some
> prototype Python tools that were just added to nfs-utils. The scripts
> themselves can be downloaded from:
> http://oss.oracle.com/~cel/Linux-2.6/2.6.25
> but unfortunately they are not fully documented yet so you will have
> to approach them with an open mind and a sense of experimentation.
>
> You can also capture network traces on your loopback interface to see
> if there is, for example, unexpected congestion or latency, or if
> there are other problems.
>
> But for loopback, the problem is often that the client and server are
> sharing the same physical memory for caching data. Analyzing your
> test system's physical memory utilization might be revealing.

But loopback is better than actual network traffic. If my file size is
less than half the available physical memory, then this should not be
a problem, right? The server caches the file data (64K at a time), and
sends to the client (on the same system) and the client has a local
copy. I am testing today with that assumption.

My system has 4GB memory, of which 3.4GB is free before running the test.
I created a 1.46GB file (so that double that size, for the server and
client copies, is not more than 3GB) by running:
dd if=/dev/zero of=smaller_file bs=65536 count=24000

To measure the time for just the I/O part, I have a small program that
reads data in chunks of 64K and discards it ("while (read(fd, buf, 64K) > 0)"),
with a gettimeofday before and after it to measure bandwidth. For each run,
the script does (pseudo): "umount /nfs, stop nfs server, umount /local,
mount /local, start nfs server, and mount /nfs". The result is:

Testing on /local
Time: 38.4553 BW:39.01 MB/s
Time: 38.3073 BW:39.16 MB/s
Time: 38.3807 BW:39.08 MB/s
Time: 38.3724 BW:39.09 MB/s
Time: 38.3463 BW:39.12 MB/s
Testing on /nfs
Time: 52.4386 BW:28.60 MB/s
Time: 50.7531 BW:29.55 MB/s
Time: 50.8296 BW:29.51 MB/s
Time: 48.2363 BW:31.10 MB/s
Time: 51.1992 BW:29.30 MB/s

Average bandwidth drop across 5 runs is 24.24%.
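
For reference, each run above amounts to roughly the following shell
sequence (the init script name is an assumption, the mount options follow
the ones given elsewhere in this thread, and the timed dd stands in for the
small C read loop):

  umount /nfs
  /etc/init.d/nfs stop
  umount /local
  mount /local                        # assumes an fstab entry for /local
  /etc/init.d/nfs start
  mount -t nfs -o vers=3,proto=tcp,rsize=65536,wsize=65536 \
        localhost:/local /nfs

  # cold-cache read, 64K at a time, data discarded
  time dd if=/nfs/smaller_file of=/dev/null bs=64k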

Memory stats *before* and *after* one run for /local and /nfs is:

********** local.start ******
MemFree: 3500700 kB
Cached: 317076 kB
Inactive: 249356 kB

********** local.end ********
MemFree: 1961872 kB
Cached: 1853100 kB
Inactive: 1785028 kB

********** nfs.start ********
MemFree: 3480456 kB
Cached: 317072 kB
Inactive: 252740 kB

********** nfs.end **********
MemFree: 400892 kB
Cached: 3389164 kB
Inactive: 3324800 kB

I don't know if this is useful, but looking at ratios: the MemFree ratio
(before/after) increased almost 5x, from 1.78 for /local to 8.68 for /nfs.
The Inactive ratio (after/before) almost doubled, from 7.15 for /local to
13.15 for /nfs, and the Cached ratio (after/before) also almost doubled,
from 5.84 to 10.69.

> Otherwise, you should always expect some performance degradation when
> comparing NFS and local disk. 50% is not completely unheard of. It's
> the price paid for being able to share your file data concurrently
> among multiple clients.

But if the file is being shared with only one client (and locally at
that), isn't 25% too high?

Will I get better results with NFSv4, and should I try delegation (that
sounds automatic and not something that the user has to start)?

Thanks,

- KK


2008-06-27 14:06:46

by Chuck Lever III

Subject: Re: NFS performance degradation of local loopback FS.

On Fri, Jun 27, 2008 at 5:04 AM, Krishna Kumar2 <[email protected]> wrote:
> Chuck Lever <[email protected]> wrote on 06/26/2008 11:12:58 PM:
>> > Local:
>> > Read: 69.5 MB/s
>> > Write: 70.0 MB/s
>> > NFS of same FS mounted loopback on same system:
>> > Read: 29.5 MB/s (57% drop)
>> > Write: 27.5 MB/s (60% drop)
>>
>> You can look at client-side NFS and RPC performance metrics using some
>> prototype Python tools that were just added to nfs-utils. The scripts
>> themselves can be downloaded from:
>> http://oss.oracle.com/~cel/Linux-2.6/2.6.25
>> but unfortunately they are not fully documented yet so you will have
>> to approach them with an open mind and a sense of experimentation.
>>
>> You can also capture network traces on your loopback interface to see
>> if there is, for example, unexpected congestion or latency, or if
>> there are other problems.
>>
>> But for loopback, the problem is often that the client and server are
>> sharing the same physical memory for caching data. Analyzing your
>> test system's physical memory utilization might be revealing.
>
> But loopback is better than actual network traffic.

What precisely do you mean by that?

You are testing with the client and server on the same machine. Is
the loopback mount over the lo interface, while for the "network" test
you mount the machine's actual IP address?

I would expect that in that case, loopback would perform better
because a memory copy is always faster than going through the network
stack and the NIC.

It would be interesting to compare a network-only performance test
(like iPerf) for loopback and for going through the NIC.
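
For instance (the 192.168.x address is just a stand-in for the machine's
real IP):

  iperf -s &                       # server
  iperf -c 127.0.0.1 -t 30         # loopback path
  iperf -c 192.168.1.10 -t 30      # same box, addressed via its real IP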

> If my file size is
> less than half the available physical memory, then this should not be
> a problem, right?

It is likely not a problem in that case, but you never know until you
have analyzed the network traffic carefully to see what's going on.

>> Otherwise, you should always expect some performance degradation when
>> comparing NFS and local disk. 50% is not completely unheard of. It's
>> the price paid for being able to share your file data concurrently
>> among multiple clients.
>
> But if the file is being shared only with one client (and that too
> locally), isn't 25% too high?

NFS always allows the possibility of sharing, so it doesn't matter how
many clients have mounted the server.

The distinction I'm drawing here is between something like iSCSI,
where only a single client ever mounts a LUN, and thus can cache
aggressively, versus NFS in the same environment, where the client has
to assume that any other client can access a file at any time, and
therefore must cache more conservatively.

You are doing cold cache tests, so this may not be at issue here either.

A 25% performance drop between a 'dd' directly on the server, and one
from an NFS client, is probably typical.

> Will I get better results on NFSv4, and should I try delegation (that
> sounds automatic and not something that the user has to start)?

It's hard to predict if NFSv4 will help because we don't understand
what is causing your performance drop yet.

Delegation is usually automatic if the client's mount command has
generated a plausible callback IP address, and the server is
successfully able to connect to it. However, I didn't think the
server hands out a delegation until the second OPEN... with a single
dd, the client opens the file only once.

--
Chuck Lever

2008-06-19 09:59:18

by Krishna Kumar2

Subject: Re: NFS performance degradation of local loopback FS.

> 200 processes:

By "200 processes", I meant 200 dd's, each reading from /dev/zero and
writing to a file on the filesystem. The script "nfs" was run twice, first
with
a local filesystem and the second time with the same filesystem NFS
mounted.
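
In outline the test amounts to something like this (a sketch, not the
actual attached script):

  DIR=/local            # or /nfs for the second pass
  for i in $(seq 1 200); do
      dd if=/dev/zero of=$DIR/file.$i bs=64k &
  done
  sleep 60              # let the writers run for a fixed interval
  killall dd            # then stop them
  # aggregate bandwidth = total size of $DIR/file.* / elapsed time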

Thanks,

- KK

[email protected] wrote on 06/19/2008 12:16:23 PM:

>
> Hi,
>
> I am running 2.6.25 kernel on a [4 way, 3.2 x86_64, 4GB] system. The test
> is doing I/O on a local ext3 filesystem, and measuring the bandwidth, and
> then NFS mounting the filesystem loopback on the same system. I have
> configured 64 nfsd's to run. The test script is attached at the bottom.
>
> My configuration is:
> /dev/some-local-disk : /local
> NFS mount /local : /nfs
>
> The result is:
> 200 processes:
> /local: 108000 KB/s
> /nfs: 66000 KB/s: Drop of 40%
>
> 300 processes (KB/s):
> /local: 112000 KB/s
> /nfs: 57000 KB/s: Drop of 50%
>
> I am not using any tuning, though I have tested with both
> sunrpc.tcp_slot_table_entries=16 & 128
>
> Is this big a drop expected for a loopback NFS mount? Any
> feedback/suggestions are very
> appreciated.
>
> Thanks,
>
> - KK
>
> (See attached file: nfs)[attachment "nfs" deleted by Krishna
Kumar2/India/IBM]


2008-06-19 12:05:21

by Peter Staubach

Subject: Re: NFS performance degradation of local loopback FS.

Krishna Kumar2 wrote:
>> 200 processes:
>>
>
> By "200 processes", I meant 200 dd's, each reading from /dev/zero and
> writing to a file on the filesystem. The script "nfs" was run twice, first
> with
> a local filesystem and the second time with the same filesystem NFS
> mounted.
>
>

Well, you aren't exactly comparing apples to apples. The NFS
client does close-to-open semantics, meaning that it writes
all modified data to the server on close. The dd commands run
on the local file system do not. You might try using
something which does an fsync before closing so that you are
making a closer comparison.

All that said, yes, one would expect a slowdown. How much is
debatable and varies from platform to platform and load to load.

I would also advise care when running NFS like that. It is
subject to deadlock and is not recommended.

ps

> Thanks,
>
> - KK
>
> [email protected] wrote on 06/19/2008 12:16:23 PM:
>
>
>> Hi,
>>
>> I am running 2.6.25 kernel on a [4 way, 3.2 x86_64, 4GB] system. The test
>> is doing I/O on a local ext3 filesystem, and measuring the bandwidth, and
>> then NFS mounting the filesystem loopback on the same system. I have
>> configured 64 nfsd's to run. The test script is attached at the bottom.
>>
>> My configuration is:
>> /dev/some-local-disk : /local
>> NFS mount /local : /nfs
>>
>> The result is:
>> 200 processes:
>> /local: 108000 KB/s
>> /nfs: 66000 KB/s: Drop of 40%
>>
>> 300 processes (KB/s):
>> /local: 112000 KB/s
>> /nfs: 57000 KB/s: Drop of 50%
>>
>> I am not using any tuning, though I have tested with both
>> sunrpc.tcp_slot_table_entries=16 & 128
>>
>> Is this big a drop expected for a loopback NFS mount? Any
>> feedback/suggestions are very
>> appreciated.
>>
>> Thanks,
>>
>> - KK
>>
>> (See attached file: nfs)[attachment "nfs" deleted by Krishna
>>
> Kumar2/India/IBM]
>
>


2008-06-19 12:58:37

by Benny Halevy

Subject: Re: NFS performance degradation of local loopback FS.

On Jun. 19, 2008, 15:04 +0300, Peter Staubach <[email protected]> wrote:
> Krishna Kumar2 wrote:
>>> 200 processes:
>>>
>> By "200 processes", I meant 200 dd's, each reading from /dev/zero and
>> writing to a file on the filesystem. The script "nfs" was run twice, first
>> with
>> a local filesystem and the second time with the same filesystem NFS
>> mounted.
>>
>>
>
> Well, you aren't exactly comparing apples to apples. The NFS
> client does close-to-open semantics, meaning that it writes
> all modified data to the server on close. The dd commands run
> on the local file system do not. You might trying using
> something which does an fsync before closing so that you are
> making a closer comparison.

try dd conv=fsync ...
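
For example (the path is illustrative; the block size and count just match
the 64K-block, conv=fsync usage elsewhere in this thread):

  dd if=/dev/zero of=/local/testfile bs=64k count=10k conv=fsync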

Benny

>
> All that said, yes, one would expect a slow down. How much is
> debatable and varies from platform to platform and load to load.
>
> I would also advise care when running NFS like that. It is
> subject to deadlock and is not recommended.
>
> ps
>
>> Thanks,
>>
>> - KK
>>
>> [email protected] wrote on 06/19/2008 12:16:23 PM:
>>
>>
>>> Hi,
>>>
>>> I am running 2.6.25 kernel on a [4 way, 3.2 x86_64, 4GB] system. The test
>>> is doing I/O on a local ext3 filesystem, and measuring the bandwidth, and
>>> then NFS mounting the filesystem loopback on the same system. I have
>>> configured 64 nfsd's to run. The test script is attached at the bottom.
>>>
>>> My configuration is:
>>> /dev/some-local-disk : /local
>>> NFS mount /local : /nfs
>>>
>>> The result is:
>>> 200 processes:
>>> /local: 108000 KB/s
>>> /nfs: 66000 KB/s: Drop of 40%
>>>
>>> 300 processes (KB/s):
>>> /local: 112000 KB/s
>>> /nfs: 57000 KB/s: Drop of 50%
>>>
>>> I am not using any tuning, though I have tested with both
>>> sunrpc.tcp_slot_table_entries=16 & 128
>>>
>>> Is this big a drop expected for a loopback NFS mount? Any
>>> feedback/suggestions are very
>>> appreciated.
>>>
>>> Thanks,
>>>
>>> - KK
>>>
>>> (See attached file: nfs)[attachment "nfs" deleted by Krishna
>>>
>> Kumar2/India/IBM]
>>


--
Benny Halevy
Software Architect
Tel/Fax: +972-3-647-8340
Mobile: +972-54-802-8340
[email protected]

Panasas, Inc.
The Leader in Parallel Storage
http://www.panasas.com

2008-06-20 06:40:42

by Krishna Kumar2

Subject: Re: NFS performance degradation of local loopback FS.

Thanks Peter for your explanation, and Benny for this option I was not
aware of. Let me run some tests with this option.

Regards,

- KK

[email protected] wrote on 06/19/2008 06:22:42 PM:

> On Jun. 19, 2008, 15:04 +0300, Peter Staubach <[email protected]>
wrote:
> > Krishna Kumar2 wrote:
> >>> 200 processes:
> >>>
> >> By "200 processes", I meant 200 dd's, each reading from /dev/zero and
> >> writing to a file on the filesystem. The script "nfs" was run twice,
first
> >> with
> >> a local filesystem and the second time with the same filesystem NFS
> >> mounted.
> >>
> >>
> >
> > Well, you aren't exactly comparing apples to apples. The NFS
> > client does close-to-open semantics, meaning that it writes
> > all modified data to the server on close. The dd commands run
> > on the local file system do not. You might trying using
> > something which does an fsync before closing so that you are
> > making a closer comparison.
>
> try dd conv=fsync ...
>
> Benny
>
> >
> > All that said, yes, one would expect a slow down. How much is
> > debatable and varies from platform to platform and load to load.
> >
> > I would also advise care when running NFS like that. It is
> > subject to deadlock and is not recommended.
> >
> > ps
> >
> >> Thanks,
> >>
> >> - KK
> >>
> >> [email protected] wrote on 06/19/2008 12:16:23 PM:
> >>
> >>
> >>> Hi,
> >>>
> >>> I am running 2.6.25 kernel on a [4 way, 3.2 x86_64, 4GB] system. The
test
> >>> is doing I/O on a local ext3 filesystem, and measuring the bandwidth,
and
> >>> then NFS mounting the filesystem loopback on the same system. I have
> >>> configured 64 nfsd's to run. The test script is attached at the
bottom.
> >>>
> >>> My configuration is:
> >>> /dev/some-local-disk : /local
> >>> NFS mount /local : /nfs
> >>>
> >>> The result is:
> >>> 200 processes:
> >>> /local: 108000 KB/s
> >>> /nfs: 66000 KB/s: Drop of 40%
> >>>
> >>> 300 processes (KB/s):
> >>> /local: 112000 KB/s
> >>> /nfs: 57000 KB/s: Drop of 50%
> >>>
> >>> I am not using any tuning, though I have tested with both
> >>> sunrpc.tcp_slot_table_entries=16 & 128
> >>>
> >>> Is this big a drop expected for a loopback NFS mount? Any
> >>> feedback/suggestions are very
> >>> appreciated.
> >>>
> >>> Thanks,
> >>>
> >>> - KK
> >>>
> >>> (See attached file: nfs)[attachment "nfs" deleted by Krishna
> >>>
> >> Kumar2/India/IBM]
> >>
>
>
> --
> Benny Halevy
> Software Architect
> Tel/Fax: +972-3-647-8340
> Mobile: +972-54-802-8340
> [email protected]
>
> Panasas, Inc.
> The Leader in Parallel Storage
> http://www.panasas.com


2008-06-20 09:22:19

by Krishna Kumar2

Subject: Re: NFS performance degradation of local loopback FS.

Benny Halevy <[email protected]> wrote on 06/19/2008 06:22:42 PM:

> > Well, you aren't exactly comparing apples to apples. The NFS
> > client does close-to-open semantics, meaning that it writes
> > all modified data to the server on close. The dd commands run
> > on the local file system do not. You might trying using
> > something which does an fsync before closing so that you are
> > making a closer comparison.
>
> try dd conv=fsync ...

I ran a single 'dd' with this option on /local and later on /nfs (the same
filesystem NFS mounted on the same system). The script umounts and remounts
the local and nfs partitions between each 'dd'. Following are the file
sizes for 20 and 60 second runs respectively:
-rw-r--r-- 1 root root 1558056960 Jun 20 14:41 local.1
-rw-r--r-- 1 root root 671834112 Jun 20 14:41 nfs.1 (56% drop)
&
-rw-r--r-- 1 root root 3845812224 Jun 20 14:42 local.1
-rw-r--r-- 1 root root 2420342784 Jun 20 14:43 nfs.1 (37% drop)

Since I am new to NFS, I am not sure if this much degradation is expected,
or whether I need to tune something. Is there some code I can look at or
hack into to find possible locations of the performance drop? At this time
I cannot even tell whether the *possible* bug is in server or client code.

Thanks,

- KK


2008-07-01 03:44:53

by Krishna Kumar2

Subject: Re: NFS performance degradation of local loopback FS.

Hi Chuck,

> As I understand it, "lo" is effectively a virtualized network device
> with point-to-point routing. Looping back through a real NIC can, in
> many cases, go all the way down to the network hardware and back, and
> is likely subject to routing decisions in your system's network layer.
> So I would expect them to be different in most cases.

At least in the Linux stack, if you address a local network device, the
kernel does a route lookup to figure out which interface to send the
packet out on, and this results in using lo.
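
This is easy to see with iproute2 (the address below is just an example of
one of the host's own IPs; output trimmed):

  $ ip route get 192.168.1.10
  local 192.168.1.10 dev lo  src 192.168.1.10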

Thanks,

- KK


2008-07-01 05:08:25

by Krishna Kumar2

Subject: Re: NFS performance degradation of local loopback FS.

Jeff Layton <[email protected]> wrote on 06/30/2008 08:56:54 PM:

> Recently I spent some time with others here at Red Hat looking
> at problems with nfs server performance. One thing we found was that
> there are some problems with multiple nfsd's. It seems like the I/O
> scheduling or something is fooled by the fact that sequential write
> calls are often handled by different nfsd's. This can negatively
> impact performance (I don't think we've tracked this down completely
> yet, however).
>
> Since you're just doing some single-threaded testing on the client
> side, it might be interesting to try running a single nfsd and testing
> performance with that. It might provide an interesting data point.

Works perfectly now!

With 64 nfsd's:
[root@localhost nfs]# ./perf
********** Testing on /nfs *************
Read Time: 50.6236 BW:29.63 MB/s
********** Testing on /local *************
Read Time: 38.3506 BW:39.11 MB/s

With 1 nfsd:
[root@localhost nfs]# ./perf
********** Testing on /nfs *************
Read Time: 38.4760 BW:38.99 MB/s
********** Testing on /local *************
Read Time: 38.4874 BW:38.97 MB/s

I will try your other suggestions too.

I have to see what happens when I increase the number of processes. The
real test is DB2 using 300 connections. I will update when I run some more
tests. Thanks for everyone's help so far.

Regards,

- KK