2005-09-06 22:40:29

by Dan Stromberg

Subject: Some code, and a question


OK, I know NFS isn't usually thought of as the fastest protocol under
the sun, but still, there are times when making NFS move along a little
faster can be worthwhile.

I've written a sort of NFS benchmark that I'm calling nfs-test. It
tries a largish number of rsize's, wsize's, tcp vs udp, and version 2
or 3 (4 would be very easy to add), to see what gives the best
performance. You can find it at
http://dcs.nac.uci.edu/~strombrg/nfs-test.html
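
Roughly speaking, it does something like the following (a simplified
sketch only, not the actual code; the export name, mount point and file
size are made up, and the mount syntax assumes a Linux-style mount(8)):

    #!/bin/sh
    # Sketch: for each combination of protocol, version and block size,
    # remount the export and time a large sequential write and read.
    SERVER=server:/export      # hypothetical export
    MNT=/mnt/nfs-test          # hypothetical mount point
    for proto in udp tcp; do
      for vers in 2 3; do
        for size in 4096 8192 16384 32768 65536; do
          umount "$MNT" 2>/dev/null
          mount -t nfs -o vers=$vers,proto=$proto,rsize=$size,wsize=$size \
              "$SERVER" "$MNT" || continue
          time dd if=/dev/zero of="$MNT/testfile" bs=1M count=1024
          time dd if="$MNT/testfile" of=/dev/null bs=1M
        done
      done
    done

(The real script also varies rsize and wsize independently, which is
what prompted the question below.)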

My question is, before diving into trying to determine this empirically,
is there any theoretical reason why it would be better to have
rsize==wsize, or should it be better to just pick whatever rsize gives
the best read performance and pick whatever wsize gives the best write
performance, and not worry about if rsize!=wsize?

Thanks!






2005-09-07 01:02:34

by Greg Banks

Subject: Re: Some code, and a question

On Tue, Sep 06, 2005 at 03:39:57PM -0700, Dan Stromberg wrote:
>
> OK, I know NFS isn't usually thought of as the fastest protocol under
> the sun,

Why would you think that? NFSv3 can be very efficient at moving
bits from point A to point B.

> My question is, before diving into trying to determine this empirically,
> is there any theoretical reason why it would be better to have
> rsize==wsize,

From a protocol point of view, no.

> or should it be better to just pick whatever rsize gives
> the best read performance and pick whatever wsize gives the best write
> performance, and not worry about if rsize!=wsize?

It will depend on the workload, but generally read and write throughput
get better as the block size gets larger, up to a point beyond what the
Linux kernel is able to support. I expect you will find your optimum
at rsize=wsize=32K.
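
For example, on a Linux client that would be a mount along the lines of
(server and mount point names are purely illustrative):

    mount -t nfs -o vers=3,proto=tcp,rsize=32768,wsize=32768 \
        server:/export /mnt/test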

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.



2005-09-07 14:37:36

by Dan Stromberg

Subject: Re: Some code, and a question

On Wed, 2005-09-07 at 11:02 +1000, Greg Banks wrote:
> On Tue, Sep 06, 2005 at 03:39:57PM -0700, Dan Stromberg wrote:
> >
> > OK, I know NFS isn't usually thought of as the fastest protocol under
> > the sun,
>
> Why would you think that? NFSv3 can be very efficient at moving
> bits from point A to point B.

You mean aside from the troublesome back-and-forthing on a high latency
network?

Perhaps this'll be less of an issue when NFS becomes extent-based.

> > My question is, before diving into trying to determine this empirically,
> > is there any theoretical reason why it would be better to have
> > rsize==wsize,
>
> From a protocol point of view, no.

That much is interesting. Thank you.

I'm also thinking about resource contention on the wire...

> > or should it be better to just pick whatever rsize gives
> > the best read performance and pick whatever wsize gives the best write
> > performance, and not worry about if rsize!=wsize?
>
> > It will depend on the workload, but generally read and write throughput
> > get better as the block size gets larger, up to a point beyond what the
> > Linux kernel is able to support. I expect you will find your optimum
> > at rsize=wsize=32K.

Here's the summary output from my script. You may find it surprising.
It may have bugs, but so far it seems to be coming up with results that
one might not expect. This was iterating rsize's and wsize's from 4K to
64K in steps of 1K. BTW, this is from an AIX 5.1 host to a Solaris 9
host, but the script should run on nearly any unix or linux:

======> Writing in isolation (read protocol!=write protocol, read version!=write version, rsize!=wsize)
Creating 5 pipes
popening echo Number of measurements: $(wc -l)
popening echo Average number of seconds: $(cut -d " " -f 4 | avg -i)
popening echo Average time: $(cut -d " " -f 4 | avg -i | modtime -i)
popening sleep 1; echo Best time: $(cut -d " " -f 4 | highest -s $(expr 1024 \* 1024) -r -n 1 | modtime)
popening sleep 2; echo Best numbers:; highest -s $(expr 1024 \* 1024) -r -f 2 -n 5
Number of measurements: 26
Average number of seconds: 703.932692308
Average time: 11 minutes 43 seconds
Best time: 9 minutes 43 seconds
Best numbers:
xfer-result-Writing-16384-3-udp:Write time: 583.82
xfer-result-Writing-8192-3-tcp:Write time: 638.06
xfer-result-Writing-9216-3-tcp:Write time: 649.62
xfer-result-Writing-16384-3-tcp:Write time: 653.30
xfer-result-Writing-13312-3-tcp:Write time: 654.96

======> Reading in isolation (read protocol!=write protocol, read version!=write version, rsize!=wsize)
Creating 5 pipes
popening echo Number of measurements: $(wc -l)
popening echo Average number of seconds: $(cut -d " " -f 4 | avg -i)
popening echo Average time: $(cut -d " " -f 4 | avg -i | modtime -i)
popening sleep 1; echo Best time: $(cut -d " " -f 4 | highest -s $(expr 1024 \* 1024) -r -n 1 | modtime)
popening sleep 2; echo Best numbers:; highest -s $(expr 1024 \* 1024) -r -f 2 -n 5
Number of measurements: 25
Average number of seconds: 389.25
Average time: 6 minutes 29 seconds
Best time: 4 minutes 18 seconds
Best numbers:
xfer-result-Reading-16384-3-tcp:Read time: 258.31
xfer-result-Reading-8192-3-tcp:Read time: 337.19
xfer-result-Reading-9216-3-tcp:Read time: 339.16
xfer-result-Reading-10240-3-tcp:Read time: 340.15
xfer-result-Reading-12288-3-tcp:Read time: 340.26

======> Best composite of read and write (read protocol==write protocol, read version==write version, rsize!=wsize)
tcp 3 rsize: 4096 readtime: 485.49 wsize: 8192 writetime: 638.06 composite: 714.345
tcp 3 rsize: 5120 readtime: 471.15 wsize: 8192 writetime: 638.06 composite: 721.515
tcp 3 rsize: 6144 readtime: 471.14 wsize: 8192 writetime: 638.06 composite: 721.520
tcp 3 rsize: 7168 readtime: 469.20 wsize: 8192 writetime: 638.06 composite: 722.490
tcp 3 rsize: 4096 readtime: 485.49 wsize: 9216 writetime: 649.62 composite: 731.685
/\/\/\
udp 3 rsize: 5120 readtime: 514.31 wsize: 16384 writetime: 583.82 composite: 618.575
udp 3 rsize: 7168 readtime: 481.18 wsize: 16384 writetime: 583.82 composite: 635.140
udp 3 rsize: 4096 readtime: 473.37 wsize: 16384 writetime: 583.82 composite: 639.045
udp 3 rsize: 6144 readtime: 466.38 wsize: 16384 writetime: 583.82 composite: 642.540
udp 3 rsize: 9216 readtime: 405.25 wsize: 16384 writetime: 583.82 composite: 673.105
/\/\/\

======> Best composite of read and write (read protocol==write protocol, read version==write version, rsize==wsize)
tcp 3 8192 both sizes: 8192 readtime: 337.19 writetime: 638.06 composite: 788.495
udp 3 9216 both sizes: 9216 readtime: 405.25 writetime: 664.46 composite: 794.065
tcp 3 9216 both sizes: 9216 readtime: 339.16 writetime: 649.62 composite: 804.850
tcp 3 13312 both sizes: 13312 readtime: 341.15 writetime: 654.96 composite: 811.865
tcp 3 14336 both sizes: 14336 readtime: 372.20 writetime: 665.83 composite: 812.645

Comments would be very welcome.

It almost seems like we might be able to improve performance by mounting the same filesystem three times onto a given system:
1) In a way that will optimize reads
2) In a way that will optimize writes
3) In a way that will optimize writing and then reading immediately afterward
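
Using the best numbers above purely as an illustration, that could look
something like this (the export and mount point names are made up):

    # read-optimized mount (reads were fastest at rsize=16384, v3, tcp):
    mount -o vers=3,proto=tcp,rsize=16384,wsize=16384 server:/export /mnt/read-opt
    # write-optimized mount (writes were fastest at wsize=16384, v3, udp):
    mount -o vers=3,proto=udp,rsize=16384,wsize=16384 server:/export /mnt/write-opt
    # mixed read/write mount (best rsize==wsize combination was 8192, v3, tcp):
    mount -o vers=3,proto=tcp,rsize=8192,wsize=8192 server:/export /mnt/mixed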

Thanks!





2005-09-07 15:00:51

by Peter Staubach

Subject: Re: Some code, and a question

Dan Stromberg wrote:

>
>Here's the summary output from my script. You may find it surprising.
>It may have bugs, but so far it seems to be coming up with results that
>one might not expect. This was iterating rsize's and wsize's from 4K to
>64K in steps of 1K. BTW, this is from an AIX 5.1 host to a Solaris 9
>host, but the script should run on nearly any unix or linux:
>

Presumably you have made the configuration changes at least on the Solaris
side, /etc/system or some such, to allow these systems to go all the way to
a 64K transfer size? Vanilla Solaris 9 won't do that.

What have you done to factor out the file system on the server?

Thanx...

ps



2005-09-07 15:34:28

by Lever, Charles

Subject: RE: Some code, and a question

> On Wed, 2005-09-07 at 11:02 +1000, Greg Banks wrote:
> > On Tue, Sep 06, 2005 at 03:39:57PM -0700, Dan Stromberg wrote:
> > >
> > > OK, I know NFS isn't usually thought of as the fastest
> > > protocol under the sun,
> >
> > Why would you think that? NFSv3 can be very efficient at moving
> > bits from point A to point B.
>
> You mean aside from the troublesome back-and-forthing on a
> high latency network?

NFSv4 delegation will help there.

but on a data-intensive workload (ie mostly reads and writes and very
few metadata operations) NFS can be quite as fast as the underlying
network and the server's file system will allow.



2005-09-07 17:01:19

by Dan Stromberg

Subject: Re: Some code, and a question

On Wed, 2005-09-07 at 10:55 -0400, Peter Staubach wrote:
> Dan Stromberg wrote:
>
> >
> >Here's the summary output from my script. You may find it surprising.
> >It may have bugs, but so far it seems to be coming up with results that
> >one might not expect. This was iterating rsize's and wsize's from 4K to
> >64K in steps of 1K. BTW, this is from an AIX 5.1 host to a Solaris 9
> >host, but the script should run on nearly any unix or linux:
> >
>
> Presumably you have made the configuration changes at least on the Solaris
> side, /etc/system or some such, to allow these systems to go all the way to
> a 64K transfer size? Vanilla Solaris 9 won't do that.

No I haven't - great lead. I'll see if I can google that up. Or if you
have the incantation at your fingertips...

> What have you done to factor out the file system on the server?

Nothing. Actually, I don't really want to in this case, because it's
the speed as seen by the enduser that I need to optimize, not the speed
of NFS alone. That is, if there's something specific to the combination
of NFS and the underlying QFS filesystem, I don't want my benchmarking
to miss that.

Thanks!





2005-09-07 17:03:04

by Dan Stromberg

Subject: RE: Some code, and a question

On Wed, 2005-09-07 at 08:34 -0700, Lever, Charles wrote:
> > On Wed, 2005-09-07 at 11:02 +1000, Greg Banks wrote:
> > > On Tue, Sep 06, 2005 at 03:39:57PM -0700, Dan Stromberg wrote:
> > > >
> > > > OK, I know NFS isn't usually thought of as the fastest
> > > > protocol under the sun,
> > >
> > > Why would you think that? NFSv3 can be very efficient at moving
> > > bits from point A to point B.
> >
> > You mean aside from the troublesome back-and-forthing on a
> > high latency network?
>
> NFSv4 delegation will help there.

Cool.

> but on a data-intensive workload (ie mostly reads and writes and very
> few metadata operations) NFS can be quite as fast as the underlying
> network and the server's file system will allow.

Agreed, on huge files, NFS isn't that lackluster.

On a related note, I heard a rumor that there's going to be some sort of
proxy released by some vendor (the guy who described it was under an NDA)
that should be able to greatly reduce back-and-forthing in NFS... It
sounded very much like what NX does for X11 if you set it up with a
proxy.

Thanks!







2005-09-07 17:19:10

by Peter Staubach

Subject: Re: Some code, and a question

Dan Stromberg wrote:

>On Wed, 2005-09-07 at 10:55 -0400, Peter Staubach wrote:
>
>
>>Dan Stromberg wrote:
>>
>>
>>
>>>Here's the summary output from my script. You may find it surprising.
>>>It may have bugs, but so far it seems to be coming up with results that
>>>one might not expect. This was iterating rsize's and wsize's from 4K to
>>>64K in steps of 1K. BTW, this is from an AIX 5.1 host to a Solaris 9
>>>host, but the script should run on nearly any unix or linux:
>>>
>>>
>>>
>>Presumably you have made the configuration changes at least on the Solaris
>>side, /etc/system or some such, to allow these systems to go all the way to
>>a 64K transfer size? Vanilla Solaris 9 won't do that.
>>
>>
>
>No I haven't - great lead. I'll see if I can google that up. Or if you
>have the incantation at your fingertips...
>
>
>

For Solaris, you might check out adding something like:

set nfs:nfs3_max_transfer_size=1048576
set nfs:nfs4_max_transfer_size=1048576

to /etc/system and then reboot the system. This will increase the maximum
size of a transfer to 1M. Alternately, you could use adb to patch a running
system. The command, "nfsstat -m", should tell you what the limits are for
currently mounted file systems. You will need to umount and mount any
existing NFS mounted file systems in order for them to be able to use the
new limits.
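
For instance, the remount-and-check sequence on the client would look
roughly like this (the export and mount point names are made up, and the
exact option spelling varies between clients):

    umount /mnt/data
    mount -o vers=3,proto=tcp,rsize=65536,wsize=65536 server:/export /mnt/data
    nfsstat -m    # confirm the rsize/wsize actually in effect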

>>What have you done to factor out the file system on the server?
>>
>>
>
>Nothing. Actually, I don't really want to in this case, because it's
>the speed as seen by the enduser that I need to optimize, not the speed
>of NFS alone. That is, if there's something specific to the combination
>of NFS and the underlying QFS filesystem, I don't want my benchmarking
>to miss that.
>

QFS, huh? I hope that you have some QFS expertise to know how to tune it
to match what you need. It is notoriously difficult to tune, with many,
many tunables. It can be very high speed, but can also be very _not_,
if you are not careful.

Thanx...

ps



2005-09-07 17:29:06

by Dan Stromberg

Subject: Re: Some code, and a question

On Wed, 2005-09-07 at 13:13 -0400, Peter Staubach wrote:
> Dan Stromberg wrote:
>
> >On Wed, 2005-09-07 at 10:55 -0400, Peter Staubach wrote:
> >
> >
> >>Dan Stromberg wrote:
> >>
> >>
> >>
> >>>Here's the summary output from my script. You may find it surprising.
> >>>It may have bugs, but so far it seems to be coming up with results that
> >>>one might not expect. This was iterating rsize's and wsize's from 4K to
> >>>64K in steps of 1K. BTW, this is from an AIX 5.1 host to a Solaris 9
> >>>host, but the script should run on nearly any unix or linux:
> >>>
> >>>
> >>>
> >>Presumably you have made the configuration changes at least on the Solaris
> >>side, /etc/system or some such, to allow these systems to go all the way to
> >>a 64K transfer size? Vanilla Solaris 9 won't do that.
> >>
> >>
> >
> >No I haven't - great lead. I'll see if I can google that up. Or if you
> >have the incantation at your fingertips...
> >
> >
> >
>
> For Solaris, you might check out adding something like:
>
> set nfs:nfs3_max_transfer_size=1048576
> set nfs:nfs4_max_transfer_size=1048576
>
> to /etc/system and then reboot the system. This will increase the maximum
> size of a transfer to 1M. Alternately, you could use adb to patch a running
> system. The command, "nfsstat -m", should tell you what the limits are for
> currently mounted file systems. You will need to umount and mount any
> existing NFS mounted file systems in order for them to be able to use the
> new limits.

I'm giving the /etc/system stuff a shot right now. I haven't adb'd a
Solaris kernel in ages - I'd rather just stick with /etc/system when
possible. Actually, the last time I tried to adb a Solaris kernel, the
syntax that used to work fine no longer did.

> >>What have you done to factor out the file system on the server?
> >>
> >>
> >
> >Nothing. Actually, I don't really want to in this case, because it's
> >the speed as seen by the enduser that I need to optimize, not the speed
> >of NFS alone. That is, if there's something specific to the combination
> >of NFS and the underlying QFS filesystem, I don't want my benchmarking
> >to miss that.
> >
>
> QFS, huh? I hope that you have some QFS expertise to know how to tune it
> to match what you need. It is notoriously difficult to tune, with many,
> many tunables. It can be very high speed, but can also be very _not_,
> if you are not careful.

I set up our QFS. I made the QFS transfer size the same as the stripe
size in the underlying RAID 5s. Is there more to it than that? I
studied some QFS notes that a QFS instructor shared with me, but I
didn't see anything more about QFS tuning than that.

Thanks!





2005-09-07 17:30:46

by Peter Staubach

Subject: Re: Some code, and a question

Peter Staubach wrote:

>>
>> No I haven't - great lead. I'll see if I can google that up. Or if you
>> have the incantation at your fingertips...
>>
>
> For Solaris, you might check out adding something like:
>
> set nfs:nfs3_max_transfer_size=1048576
> set nfs:nfs4_max_transfer_size=1048576
>
> to /etc/system and then reboot the system. This will increase the maximum
> size of a transfer to 1M. Alternately, you could use adb to patch a running
> system. The command, "nfsstat -m", should tell you what the limits are for
> currently mounted file systems. You will need to umount and mount any
> existing NFS mounted file systems in order for them to be able to use the
> new limits.


You will probably need to increase the NFS client side blocksize
as well. This is nfs[34]_bsize. So, something like:

set nfs:nfs3_bsize=1048576
set nfs:nfs4_bsize=1048576

although you will probably want it to match your rsize/wsize settings.

Please be aware that if the block size is bigger than the transfer
size, then the Solaris NFS client will break up the block-sized
transfers into multiple pieces and then do them sequentially.

So, there is a tie between the read transfer size and the write transfer
size, at least on a Solaris client.
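
Putting the pieces from this thread together, the /etc/system entries
would look like the following (1M is just the example value used above;
in practice you would pick sizes to match the rsize/wsize you mount
with):

    * /etc/system on a Solaris client: raise the transfer size ceiling
    * and the client block size together (comments here start with '*')
    set nfs:nfs3_max_transfer_size=1048576
    set nfs:nfs4_max_transfer_size=1048576
    set nfs:nfs3_bsize=1048576
    set nfs:nfs4_bsize=1048576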

Thanx...

ps



2005-09-07 17:32:34

by Peter Staubach

Subject: Re: Some code, and a question

Dan Stromberg wrote:

>
>I set up our QFS. I made the QFS transfer size the same as the stripe
>size in the underlying RAID 5s. Is there more to it than that? I
>studied some QFS notes that a QFS instructor shared with me, but I
>didn't see anything more about QFS tuning than that.
>

I don't know how to tune a QFS file system. I do know that it was
considered to be a black art and LSC/Sun consulting would often get
involved.

ps

