Hello Trond,
hello all,
just to drop a note: I am experiencing a rather dramatic slowdown of the
nfs-server in kernel 2.4.20-pre10 in conjunction with nfs-clients running kernel
2.2.19. To be more specific, the server is an SMP machine and always runs the
latest 2.4.x kernels. Up to 2.4.20-pre9 everything was quite ok, but pre10
brought an incredible loss of performance. The setup did not change, only the
kernel on the server side. Nearly all nfs action is writing to the server;
reading from it is next to zero in this setup.
--
Regards,
Stephan
On Sunday October 13, [email protected] wrote:
> Hello Trond,
> hello all,
>
> just to drop a note: I am experiencing a rather dramatic slowdown of the
> nfs-server in kernel 2.4.20-pre10 in conjunction with nfs-clients running kernel
> 2.2.19. To be more specific, the server is an SMP machine and always runs the
> latest 2.4.x kernels. Up to 2.4.20-pre9 everything was quite ok, but pre10
> brought an incredible loss of performance. The setup did not change, only the
> kernel on the server side. Nearly all nfs action is writing to the server;
> reading from it is next to zero in this setup.
Very odd... There were no changes between pre9 and pre10 that
directly relate to the nfs server, and none that immediately jump out
at me that could cause a slowdown in NFS writes.
What architecture? PPC saw a lot of updates.
What filesystem? jfs saw one change
What storage device? IDE or SCSI?
Can you try going back to -pre9 and confirm that performance comes
back?
NeilBrown
On Mon, 14 Oct 2002 09:03:43 +1000
Neil Brown <[email protected]> wrote:
> On Sunday October 13, [email protected] wrote:
> > Hello Trond,
> > hello all,
> >
> > just to drop a note: I am experiencing a rather dramatic slowdown of the
> > nfs-server in kernel 2.4.20-pre10 in conjunction with nfs-clients running kernel
> > 2.2.19. To be more specific, the server is an SMP machine and always runs the
> > latest 2.4.x kernels. Up to 2.4.20-pre9 everything was quite ok, but pre10
> > brought an incredible loss of performance. The setup did not change, only the
> > kernel on the server side. Nearly all nfs action is writing to the server;
> > reading from it is next to zero in this setup.
>
> Very odd... There were no changes between pre9 and pre10 that
> directly relate to the nfs server, and none that immediately jump out
> at me that could cause a slowdown in NFS writes.
>
> What architecture? PPC saw a lot of updates.
i386, namely dual PIII 1GHz with 1 GB RAM
Are you sure it has nothing to do with the latest patch and SMP:
Trond Myklebust <[email protected]>:
o Workaround NFS hangs introduced in 2.4.20-pre
> What filesystem? jfs saw one change
reiserfs 3.6
> What storage device? IDE or SCSI?
IDE, PDC20268
> Can you try going back to -pre9 and confirm that performance comes
> back?
I will give it a second try tonight and report back with info tomorrow.
Thanks,
Stephan
> NeilBrown
On Monday October 14, [email protected] wrote:
> >
> > Very odd... There were no changes between pre9 and pre10 that
> > directly relate to the nfs server, and none that immediately jump out
> > at me that could cause a slowdown in NFS writes.
> >
> > What architecture? PPC saw a lot of updates.
>
> i386, namely dual PIII 1GHz with 1 GB RAM
> Are you sure it has nothing to do with the latest patch and SMP:
>
> Trond Myklebust <[email protected]>:
> o Workaround NFS hangs introduced in 2.4.20-pre
Nope. This is purely client side. The nfs server doesn't go anywhere
near this code.
>
> > What filesystem? jfs saw one change
>
> reiserfs 3.6
>
> > What storage device? IDE or SCSI?
>
> IDE, PDC20268
>
Well, I cannot see any changes between pre9 and pre10 that would have
any effect on an nfs server with your configuration...
> > Can you try going back to -pre9 and confirm that performance comes
> > back?
>
> I will give it a second try tonight and report back with
> info tomorrow.
Thanks. Hopefully that will shed some light.
NeilBrown
Stephan von Krawczynski <[email protected]> wrote:
> just to drop a note: I am experiencing a rather dramatic slowdown of
> the nfs-server in kernel 2.4.20-pre10 in conjunction with
> nfs-clients running kernel 2.2.19. To be more specific, the server is
> an SMP machine and always runs the latest 2.4.x kernels. Up to
> 2.4.20-pre9 everything was quite ok, but pre10 brought an incredible
> loss of performance. The setup did not change, only the kernel on the
> server side. Nearly all nfs action is writing to the server; reading
> from it is next to zero in this setup.
I also had an unexplained slowdown recently, and after changing my
buffer sizes (rsize/wsize) back to 1024 (they have always been 8192),
speed increased 10x.
After banging my head for an hour or so this message was what made me
try the lower sizes.
http://www.geocrawler.com/archives/3/789/2002/8/0/9379245/
Even though the message states upgrading to 2.4.19 fixed it for him,
I'm using 2.4.19 on both machines and it looks like the problem still
persists.
Willing to give any info to those that care.
--
Jeff Lightfoot -- [email protected] -- http://thefoots.com/
"And I'm not done and I won't be till my head falls off. Though it
may not be a long way off." -- TMBG
>>>>> " " == Stephan von Krawczynski <[email protected]> writes:
> Trond Myklebust <[email protected]>:
> o Workaround NFS hangs introduced in 2.4.20-pre
That's an NFS *client* change. It doesn't touch any of the server
code.
Cheers,
Trond
On Mon, 14 Oct 2002 13:38:32 +1000
Neil Brown <[email protected]> wrote:
> > > Can you try going back to -pre9 and confirm that performance comes
> > > back?
> >
> > I will give it a second try tonight and report back with
> > info tomorrow.
>
> Thanks. Hopefully that will shed some light.
Hello Neil,
hello Trond,
my second try shows the same result. With the exact same setup as last
night, the second run again gave very low performance. To put a number on it:
about 11 GB of data took an incredible 13.5 hours to write to the server over a
100 MBit FDX switch.
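For scale, here is a rough sketch of the arithmetic behind those numbers. It assumes 1 GB = 1024 MB and compares against the raw 100 Mbit/s line rate with no allowance for protocol overhead; both are assumptions, not measurements. Under them the transfer averages roughly 0.23 MB/s, on the order of 2% of what the link could carry.

```python
# Back-of-the-envelope throughput for the transfer described above.
# Assumptions (not from the original mail): 1 GB = 1024 MB, and the link's
# ceiling is taken as the raw 100 Mbit/s line rate without protocol overhead.

def throughput_mb_per_s(gigabytes: float, hours: float) -> float:
    """Average throughput in MB/s, with MB = 1024**2 bytes."""
    total_mb = gigabytes * 1024
    return total_mb / (hours * 3600)

observed = throughput_mb_per_s(11, 13.5)
line_rate = 100e6 / 8 / 1024**2  # 100 Mbit/s expressed in MB/s

print(f"observed:  {observed:.2f} MB/s")
print(f"line rate: {line_rate:.2f} MB/s")
print(f"fraction of line rate: {observed / line_rate:.1%}")
```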
Tonight I will try to reduce rsize/wsize from the current 8192 down to 1024
as suggested by Jeff.
--
Regards,
Stephan
On Mon, 14 Oct 2002 16:36:51 +0200
Stephan von Krawczynski <[email protected]> wrote:
> my second try shows the same result. With the exact same setup as
> last night, the second run again gave very low performance. To put a
> number on it: about 11 GB of data took an incredible 13.5 hours to
> write to the server over a 100 MBit FDX switch.
> Tonight I will try to reduce rsize/wsize from the current 8192
> down to 1024 as suggested by Jeff.
Small correction: reads improved with 1024 but writes dropped
dramatically. The set of options that works is rsize=1024,wsize=8192.
Try those and see how it works.
I'm wondering what changed although I do remember my nfs* packages
changing in Debian (Sid) recently (async, sync now having to be
specified). Hmmm.
--
Jeff Lightfoot -- [email protected] -- http://thefoots.com/
"so impressed with all you do ... tried so hard to be like you ...
flew too high and burnt the wing ... lost my faith in everything"
-- NIN
On Mon, 14 Oct 2002 13:38:32 +1000
Neil Brown <[email protected]> wrote:
Hello Neil,
hello Trond,
> Tonight I will try to reduce rsize/wsize from the current 8192 down to
> 1024 as suggested by Jeff.
Ok. The result is: it is again way slower. I was not even able to transfer 5
GB within 18 hours, at which point I shut the thing down.
Anything else I can test?
--
Regards,
Stephan
Stephan von Krawczynski <[email protected]> writes:
>On Mon, 14 Oct 2002 13:38:32 +1000
>Neil Brown <[email protected]> wrote:
>Hello Neil,
>hello Trond,
>> Tonight I will try to reduce rsize/wsize from the current 8192 down to
>> 1024 as suggested by Jeff.
>Ok. The result is: it is again way slower. I was not even able to transfer 5
>GB within 18 hours, at which point I shut the thing down.
>Anything else I can test?
nfs v2 or v3? tcp or udp? I assume nfs v3, udp and 100 Mbit switched
network between your hosts and no firewalls, routers or something like
this.
Could you post a small (some ten lines or so) tcpdump of a data transfer?
I had a hell of a time with a) a firewall dropping fragments; b) a
trunked network connection where one VLAN "pushed" another running the
NFS traffic off the trunk (and also vice versa, with 97 Mbit/sec of NFS
traffic pushing almost everything else off the trunk, but this is
obviously not your problem. :-) )
If lowering the blocksize speeds up your transfers, dropped fragments
could be the problem (shorter blocks result in fewer fragments per
packet and increase the chance that all fragments make it over the
connection).
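To make the fragment argument concrete, here is a small sketch. Nothing in it was measured on this setup: the 1500-byte MTU, the header sizes, the round 150 bytes of RPC/NFS overhead per write, and the 1% fragment loss rate are all assumptions chosen for illustration. Since a UDP datagram is lost if any one of its fragments is lost, the chance of a request arriving intact falls as the fragment count rises.

```python
import math

# Illustrative assumptions (none of these come from the thread):
MTU = 1500          # typical Ethernet MTU in bytes
IP_HEADER = 20      # IPv4 header without options
UDP_HEADER = 8
RPC_OVERHEAD = 150  # hypothetical round figure for RPC/NFS headers per WRITE

def fragments_per_request(wsize: int) -> int:
    """Number of IP fragments needed to carry one NFS write over UDP."""
    datagram = UDP_HEADER + RPC_OVERHEAD + wsize
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (MTU - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)

def delivery_probability(wsize: int, fragment_loss: float) -> float:
    """Chance that all fragments of one request arrive, assuming independent loss."""
    return (1.0 - fragment_loss) ** fragments_per_request(wsize)

for wsize in (1024, 8192):
    n = fragments_per_request(wsize)
    p = delivery_probability(wsize, 0.01)
    print(f"wsize={wsize}: {n} fragment(s), {p:.3f} chance of intact delivery")
```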
Could you watch the Ip: InDiscards ReasmTimeout ReasmReqds ReasmFails
and Udp: InErrors counters in /proc/net/snmp? Are any of them steadily
increasing?
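One possible way to watch those counters is sketched below. It relies on /proc/net/snmp being laid out as pairs of lines, a header line of counter names followed by a line of values. The helper names and the 10-second sampling interval are made up for this example.

```python
import time

# Counters named in the mail above, grouped by protocol line.
WATCHED = {
    "Ip": ["InDiscards", "ReasmTimeout", "ReasmReqds", "ReasmFails"],
    "Udp": ["InErrors"],
}

def parse_snmp(text: str) -> dict:
    """Parse /proc/net/snmp text into {protocol: {counter: value}}."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    stats = {}
    # Lines come in pairs: counter names first, then the matching values.
    for names, values in zip(lines[0::2], lines[1::2]):
        proto = names[0].rstrip(":")
        stats[proto] = dict(zip(names[1:], (int(v) for v in values[1:])))
    return stats

def watched_deltas(before: dict, after: dict) -> dict:
    """Difference of each watched counter between two snapshots."""
    deltas = {}
    for proto, counters in WATCHED.items():
        for name in counters:
            old = before.get(proto, {}).get(name, 0)
            new = after.get(proto, {}).get(name, 0)
            deltas[f"{proto}.{name}"] = new - old
    return deltas

def read_snmp(path: str = "/proc/net/snmp") -> dict:
    with open(path) as fh:
        return parse_snmp(fh.read())

def watch(samples: int = 6, interval: float = 10.0) -> None:
    """Print per-interval deltas; the interval is an arbitrary choice."""
    previous = read_snmp()
    for _ in range(samples):
        time.sleep(interval)
        current = read_snmp()
        print(watched_deltas(previous, current))
        previous = current
```

Calling `watch()` on the server while a transfer is running would show whether any of the reassembly or error counters keep climbing.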
Regards
Henning
--
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH [email protected]
Am Schwabachgrund 22 Fon.: 09131 / 50654-0 [email protected]
D-91054 Buckenhof Fax.: 09131 / 50654-20
On Tuesday October 15, [email protected] wrote:
> On Mon, 14 Oct 2002 13:38:32 +1000
> Neil Brown <[email protected]> wrote:
>
> Hello Neil,
> hello Trond,
>
> > Tonight I will try to reduce rsize/wsize from the current 8192 down to
> > 1024 as suggested by Jeff.
>
> Ok. The result is: it is again way slower. I was not even able to transfer 5
> GB within 18 hours, at which point I shut the thing down.
> Anything else I can test?
All I can suggest is a binary search among the patches that comprise
the difference between pre9 and pre10 to see when the problem comes
in.
There are about 40 patches, so about 6 test runs should find the
culprit.
I tried to extract them from bk and have put the 40 patches at:
http://www.cse.unsw.edu.au/~neilb/pre10/
There is a p-all.tgz that contains them all.
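As a sketch of how such a binary search could be organised (an illustration only, not a tested procedure): it assumes the patches apply cleanly in order on top of pre9, that plain pre9 is fast, and that the full set is slow. The `is_slow` callback is hypothetical and stands for "build pre9 plus the first n patches, run the NFS write test, report whether it is slow".

```python
from typing import Callable, Sequence, Tuple

def find_first_bad(patches: Sequence[str],
                   is_slow: Callable[[int], bool]) -> Tuple[str, int]:
    """Return (first patch that introduces the slowdown, number of test runs).

    is_slow(n) is a hypothetical callback: it should test a kernel made of
    pre9 plus the first n patches. Assumed: is_slow(0) is False and
    is_slow(len(patches)) is True.
    """
    good, bad = 0, len(patches)
    runs = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        runs += 1
        if is_slow(mid):
            bad = mid   # culprit is among the first `mid` patches
        else:
            good = mid  # culprit comes after patch number `mid`
    return patches[bad - 1], runs

# Simulated example with made-up patch names and a made-up culprit.
patches = [f"p{i:02d}" for i in range(1, 41)]
culprit_index = 27  # hypothetical: the 27th patch causes the slowdown
name, runs = find_first_bad(patches, lambda n: n >= culprit_index)
print(name, runs)
```

With 40 patches the loop never needs more than 6 test runs, matching the estimate above.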
NeilBrown