2008-06-23 15:00:13

by Adrian von Bidder

Subject: [NFS] NFS performance debugging

Hi,

Environment:

Several Debian-based clients (Debian etch and etchnhalf kernels, i.e.
2.6.18 or 2.6.24); Debian etch (2.6.18 kernel) NFS (v3) server. Network
seems basically ok ("ping -f -s 3000" works without losses, ifconfig and
switch monitoring show no errors) with no noticeable load. Disks seem to
have very little load either, NFS server has no other tasks.

Performance is sluggish :-( Basically works, though -- no spurious errors.

tcpdump shows many "reply ERR 1448" etc. messages whenever NFS activity is
going on (both stat-heavy, as with "find /home", and read/write with dd)

+++
16:49:24.778560 IP 10.0.1.2.2049 > 10.0.0.209.809066834: reply ERR 1448
16:49:24.790304 IP 10.0.1.2.2049 > 10.0.0.209.943279929: reply ERR 1448
16:49:24.801380 IP 10.0.1.2.2049 > 10.0.0.209.2001885801: reply ERR 1448
16:49:24.802173 IP 10.0.1.2.2049 > 10.0.0.209.860835666: reply ERR 1448
16:49:24.805286 IP 10.0.1.2.2049 > 10.0.0.209.1479697199: reply ERR 1332
16:49:24.807679 IP 10.0.1.2.2049 > 10.0.0.209.1096249460: reply ERR 1448
16:49:24.808358 IP 10.0.1.2.2049 > 10.0.0.209.2000902760: reply ERR 1332
16:49:24.809097 IP 10.0.1.2.2049 > 10.0.0.209.926298420: reply ERR 1448
16:49:24.809100 IP 10.0.1.2.2049 > 10.0.0.209.25105411: reply ERR 1332
16:49:24.817923 IP 10.0.1.2.2049 > 10.0.0.209.1366504235: reply ERR 1448
16:49:24.817927 IP 10.0.1.2.2049 > 10.0.0.209.352525071: reply ERR 1332
16:49:24.820397 IP 10.0.1.2.2049 > 10.0.0.209.269848846: reply ERR 1332
16:49:24.822097 IP 10.0.1.2.2049 > 10.0.0.209.1345540144: reply ERR 1448
16:49:24.822856 IP 10.0.1.2.2049 > 10.0.0.209.944780599: reply ERR 1448
16:49:24.825109 IP 10.0.1.2.2049 > 10.0.0.209.1395668559: reply ERR 1448
16:49:24.825112 IP 10.0.1.2.2049 > 10.0.0.209.1999335795: reply ERR 1332
16:49:24.827813 IP 10.0.1.2.2049 > 10.0.0.209.1685677906: reply ERR 1332
16:49:24.829439 IP 10.0.1.2.2049 > 10.0.0.209.1666084982: reply ERR 1448
16:49:24.829443 IP 10.0.1.2.2049 > 10.0.0.209.1415656037: reply ERR 1332
16:49:24.839013 IP 10.0.1.2.2049 > 10.0.0.209.911226680: reply ERR 1448
16:49:24.839017 IP 10.0.1.2.2049 > 10.0.0.209.1735414852: reply ERR 1332
16:49:24.841325 IP 10.0.1.2.2049 > 10.0.0.209.911358287: reply ERR 1332
16:49:24.842092 IP 10.0.1.2.2049 > 10.0.0.209.1364284211: reply ERR 1448
16:49:24.842800 IP 10.0.1.2.2049 > 10.0.0.209.258643250: reply ERR 1332
16:49:24.844256 IP 10.0.1.2.2049 > 10.0.0.209.1666017882: reply ERR 1448
16:49:24.844996 IP 10.0.1.2.2049 > 10.0.0.209.808595513: reply ERR 1448
16:49:24.845674 IP 10.0.1.2.2049 > 10.0.0.209.2000779112: reply ERR 1448
16:49:24.845677 IP 10.0.1.2.2049 > 10.0.0.209.1652175121: reply ERR 1332
16:49:24.847120 IP 10.0.1.2.2049 > 10.0.0.209.944722769: reply ERR 1448
16:49:24.847123 IP 10.0.1.2.2049 > 10.0.0.209.1682657874: reply ERR 1332
16:49:24.849334 IP 10.0.1.2.2049 > 10.0.0.209.944714835: reply ERR 1448
16:49:24.850873 IP 10.0.1.2.2049 > 10.0.0.209.1345861938: reply ERR 1448
16:49:24.918710 IP 10.0.1.2.2049 > 10.0.0.179.1936680564: reply ERR 1448
16:49:24.918719 IP 10.0.1.2.2049 > 10.0.0.179.1698508838: reply ERR 1448
16:49:24.921911 IP 10.0.1.2.2049 > 10.0.0.179.1633904741: reply ERR 1448
+++

Mount options: "rw,noatime,rsize=8192,wsize=8192,intr,hard,addr=10.0.1.2",
it seems to pick tcp by default. I had problems with UDP from some of the
clients due to a strangely buggy VDSL switch in the path, so I haven't
tried that again (I want to keep the DSL clients and the non-DSL clients
identical if this is at all possible, so I can switch equipment around
without reconfiguration.)
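
One way to confirm which transport and transfer sizes a client actually
negotiated is to read the option string the kernel reports in /proc/mounts.
The following is only a minimal sketch: the sample line is made up to
illustrate the format (the export path and the "proto=tcp" field are
assumptions, and an older kernel may not report every option).

```python
def nfs_mount_options(mounts_text):
    """Return {mount_point: {option: value}} for NFS entries in
    /proc/mounts-style text. Options without a value map to True."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for opt in fields[3].split(","):
            key, sep, value = opt.partition("=")
            options[key] = value if sep else True
        result[fields[1]] = options
    return result


# Hypothetical sample line, NOT taken from the systems discussed here.
SAMPLE_MOUNTS = (
    "10.0.1.2:/home /home nfs "
    "rw,noatime,rsize=8192,wsize=8192,hard,intr,proto=tcp,addr=10.0.1.2 0 0\n"
    "/dev/sda1 / ext3 rw 0 0\n"
)

if __name__ == "__main__":
    for mount_point, opts in nfs_mount_options(SAMPLE_MOUNTS).items():
        print(mount_point,
              "proto:", opts.get("proto", "not reported"),
              "rsize:", opts.get("rsize"),
              "wsize:", opts.get("wsize"))
```

On a real client, the same function can be fed the contents of
/proc/mounts instead of SAMPLE_MOUNTS.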

That performance is not optimal with today's desktop environments (tons of
small configuration files in both oo.org and kde) at login/program start on
cold caches is one thing, but performance

Now where do I start debugging this?

--
Development costs of average proprietary and free software don't differ
radically because the methods are pretty much the same. The huge
difference lies in the way the developers try to recoup their costs, not
in the costs they have to compensate.
-- Florian Weimer on debian-security



2008-06-23 15:15:51

by Trond Myklebust

Subject: Re: [NFS] NFS performance debugging

On Mon, 2008-06-23 at 16:59 +0200, Adrian von Bidder wrote:
> Hi,
>
> Environment:
>
> Several Debian-based clients (Debian etch and etchnhalf kernels, i.e.
> 2.6.18 or 2.6.24); Debian etch (2.6.18 kernel) NFS (v3) server. Network
> seems basically ok ("ping -f -s 3000" works without losses, ifconfig and
> switch monitoring show no errors) with no noticeable load. Disks seem to
> have very little load either, NFS server has no other tasks.
>
> Performance is sluggish :-( Basically works, though -- no spurious errors.
>
> tcpdump shows many "reply ERR 1448" etc. messages whenever NFS activity is
> going on (both stat-heavy, as with "find /home", and read/write with dd)
>
> +++
> 16:49:24.778560 IP 10.0.1.2.2049 > 10.0.0.209.809066834: reply ERR 1448
> 16:49:24.790304 IP 10.0.1.2.2049 > 10.0.0.209.943279929: reply ERR 1448
> [... 33 similar lines snipped ...]
> +++
>
> Mount options: "rw,noatime,rsize=8192,wsize=8192,intr,hard,addr=10.0.1.2",
> it seems to pick tcp by default. I had problems with UDP from some of the
> clients due to a strangely buggy VDSL switch in the path, so I haven't
> tried that again (I want to keep the DSL clients and the non-DSL clients
> identical if this is at all possible, so I can switch equipment around
> without reconfiguration.)
>
> That performance is not optimal with today's desktop environments (tons of
> small configuration files in both oo.org and kde) at login/program start on
> cold caches is one thing, but performance
>
> Now where do I start debugging this?

In the above dump 1448 is _not_ the error code, but rather the packet
length. You might therefore try using the tcpdump option '-vvv' to see
if you can obtain the actual error value (which should tell you why the
NFS server is rejecting your packets).
Alternatively, you might consider using wireshark/tshark, which can
display NFS packets in a much more readable fashion.
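
To illustrate the point about the trailing number being a length, here is a
minimal Python sketch that parses lines in the format shown in the dump and
tallies that field. The arithmetic in expected_tcp_payload() rests on
assumptions not stated in this thread: a standard 1500-byte Ethernet MTU,
20-byte IP and TCP headers without other options, and TCP timestamps
enabled (12 bytes of options). The three sample lines are copied from the
dump above.

```python
import re
from collections import Counter

# Matches lines such as:
#   16:49:24.778560 IP 10.0.1.2.2049 > 10.0.0.209.809066834: reply ERR 1448
LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<field>\d+): reply ERR (?P<length>\d+)$"
)


def parse_reply_lines(lines):
    """Return a list of (field_after_client_address, trailing_number)."""
    parsed = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            parsed.append((int(m.group("field")), int(m.group("length"))))
    return parsed


def tally_lengths(lines):
    """Count how often each trailing number occurs."""
    return Counter(length for _field, length in parse_reply_lines(lines))


def expected_tcp_payload(mtu=1500, ip_header=20, tcp_header=20, tcp_options=12):
    """Payload bytes of a full-sized TCP segment under the stated assumptions."""
    return mtu - ip_header - tcp_header - tcp_options


SAMPLE = [
    "16:49:24.778560 IP 10.0.1.2.2049 > 10.0.0.209.809066834: reply ERR 1448",
    "16:49:24.805286 IP 10.0.1.2.2049 > 10.0.0.209.1479697199: reply ERR 1332",
    "16:49:24.918710 IP 10.0.1.2.2049 > 10.0.0.179.1936680564: reply ERR 1448",
]

if __name__ == "__main__":
    print("trailing numbers:", dict(tally_lengths(SAMPLE)))
    print("full-segment payload under the assumptions:", expected_tcp_payload())
    for field, _length in parse_reply_lines(SAMPLE):
        # A TCP port cannot exceed 65535, so this field is not a port number.
        print(field, "fits in a port number:", field <= 65535)
```

Under those assumptions a full-sized segment carries 1448 bytes, which
matches the most common value in the dump. The number after the client
address is far too large to be a port; tcpdump's NFS printer shows an RPC
transaction id in that position.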

Cheers
Trond


-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
_______________________________________________
Please note that [email protected] is being discontinued.
Please subscribe to [email protected] instead.
http://vger.kernel.org/vger-lists.html#linux-nfs


2008-06-23 19:29:08

by Bruce Fields

Subject: Re: [NFS] NFS performance debugging

On Mon, Jun 23, 2008 at 04:59:57PM +0200, Adrian von Bidder wrote:
> Hi,
>
> Environment:
>
> Several Debian-based clients (Debian etch and etchnhalf kernels, i.e.
> 2.6.18 or 2.6.24); Debian etch (2.6.18 kernel) NFS (v3) server. Network
> seems basically ok ("ping -f -s 3000" works without losses, ifconfig and
> switch monitoring show no errors) with no noticeable load. Disks seem to
> have very little load either, NFS server has no other tasks.
>
> Performance is sluggish :-( Basically works, though -- no spurious errors.

In what way exactly is it sluggish?

> tcpdump shows many "reply ERR 1448" etc. messages whenever NFS activity is
> going on (both stat-heavy, as with "find /home", and read/write with dd)

I'm afraid I don't know how to read that tcpdump output.

--b.



2008-06-24 10:17:45

by Adrian von Bidder

Subject: Re: [NFS] NFS performance debugging

Hi again,

Thanks for your replies (you too, Trond).

On Monday 23 June 2008 21.28:36 you wrote:

[... NFS performance ...]

> In what way exactly is it sluggish?

Starting KDE, opening documents, and sometimes also closing oo.org and
saving documents take several seconds longer than on local disk.

Certainly network latency (especially with this silly number of small config
files) takes some time, but I'm still surprised. At the same time, I don't
have data to compare a "known good" NFS setup against ours, so perhaps NFS
really is that slow?
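
To get comparable numbers, one option is to time the same small-file
workload on a local directory and on an NFS-mounted one. The following is a
minimal sketch only: the function name and the 64 kB size limit are
arbitrary choices, and the figures mean little unless both runs start from
cold caches (for example right after mounting).

```python
import os
import sys
import time


def time_small_files(root, max_bytes=64 * 1024):
    """Walk root, stat every file and read those no larger than max_bytes.

    Returns (files_processed, bytes_read, elapsed_seconds). Files that
    cannot be stat'ed or opened are skipped.
    """
    files = 0
    total = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.stat(path).st_size
                if size <= max_bytes:
                    with open(path, "rb") as f:
                        total += len(f.read())
                files += 1
            except OSError:
                continue
    return files, total, time.perf_counter() - start


if __name__ == "__main__":
    # Pass one or more directories, e.g. a local one and an NFS-mounted one.
    for root in sys.argv[1:]:
        files, total, seconds = time_small_files(root)
        per_file_ms = 1000.0 * seconds / files if files else 0.0
        print("%s: %d files, %d bytes, %.3f s (%.2f ms per file)"
              % (root, files, total, seconds, per_file_ms))
```

If the per-file time on NFS comes out at a small multiple of the network
round-trip time, latency alone may explain most of the slowdown.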

>
> > tcpdump shows many "reply ERR 1448" etc. messages whenever NFS activity is
> > going on (both stat-heavy, as with "find /home", and read/write with dd)
>
> I'm afraid I don't know how to read that tcpdump output.

tcpdump "-vvv" doesn't give more information on these packets; at the same
time wireshark doesn't show anything suspicious except tons of wrong TCP
checksums caused (I hope...) by checksum offloading. I'll have to see whether
I can capture the raw traffic at the network switch to check this (but I think
that with 30% or more wrong TCP checksums, traffic would break down
completely, so I'm quite confident here.)


Slightly different topic: is there an NFS related mailing list I can
subscribe to? This one is apparently closed for new subscribers, and the
bounce instructs me to send mail to [email protected] which
bounces :-( Reading others' NFS postings might just give me more ideas on
where to look.


TODO today: play around with NFSv4 on the shaky assumption that NFSv3 is
actually working but network latency is killing my performance.


cheers
-- vbi



--
The typewriting machine, when played with expression, is no more
annoying than the piano when played by a sister or near relation.
-- Oscar Wilde

