2003-10-14 12:25:20

by Roberto Sebastiano

Subject: gigabit, 2.4.22 and timeo

Hi,
I have this hardware setup:
1 dual PIII 1 GHz, Intel PRO/1000MT NIC, Linux 2.4.22 (NFS client)
1 single P4 2.53 GHz, Intel PRO/1000MT NIC, Linux 2.4.22, fast RAID5 SCSI disk subsystem (NFS server)
The two machines are cross-connected and both have jumbo frames enabled.
This setup will (hopefully) be used to share files between two
load-balanced web servers.

Now, to the problem.
I specified the mount options as follows (from /proc/mounts):
rw,v3,rsize=8192,wsize=8192,hard,intr,udp,lock,addr=192.168.2.1

When I copy a large file over the network, on the client side I see:
Oct 14 13:39:24 flanders kernel: nfs: server 192.168.2.1 not responding, still trying
Oct 14 13:39:24 flanders last message repeated 5 times
Oct 14 13:39:24 flanders kernel: nfs: server 192.168.2.1 OK
Oct 14 13:39:24 flanders last message repeated 5 times
Oct 14 13:39:25 flanders kernel: nfs: server 192.168.2.1 not responding, still trying
Oct 14 13:39:26 flanders last message repeated 11 times
Oct 14 13:39:26 flanders kernel: nfs: server 192.168.2.1 OK

The cp process itself completes fine, though.
Setting retrans to 5 or higher "solves" the problem (copying still works
and no errors are reported), but setting timeo to a larger value (60 or
so) doesn't seem to have any effect: only about two seconds pass between
the start of the copy and the first message.

I think the client is simply assuming too low a timeout value, so it
keeps retransmitting requests.
I found that this problem doesn't occur with NFS over TCP.
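For illustration, here is a minimal Python sketch of the fixed-timeout retransmission model described in the nfs(5) man page: timeo= is the initial timeout in tenths of a second, the timeout doubles after each retransmission up to a 60-second cap, and after retrans minor timeouts the client declares a major timeout and logs "server not responding". The function names are hypothetical, and the exact point at which the kernel declares a major timeout is simplified here (assumed to be after the original send plus retrans retransmissions have all timed out). Current nfs(5) documentation also notes that over UDP the client uses an adaptive, RTT-based timeout for frequent request types such as READ and WRITE and applies timeo only to infrequent ones; whether 2.4.22 behaves exactly that way is an assumption, but it would explain why raising timeo changed nothing here.

```python
"""Simplified model of NFS-over-UDP retransmission timeouts (fixed timeo path).

Assumptions (not taken from the kernel source): the initial timeout is timeo
deciseconds, it doubles after every retransmission, each single timeout is
capped at 60 s, and a major timeout ("server not responding") is declared
once the original send plus `retrans` retransmissions have all timed out.
"""

MAX_TIMEOUT_DS = 600  # 60-second cap on a single timeout, in deciseconds


def retry_schedule(timeo_ds: int, retrans: int) -> list[int]:
    """Return how long (in deciseconds) the client waits on each try."""
    if timeo_ds <= 0 or retrans < 0:
        raise ValueError("timeo must be positive and retrans non-negative")
    waits = []
    timeout = timeo_ds
    for _ in range(retrans + 1):  # original transmission + `retrans` retries
        waits.append(min(timeout, MAX_TIMEOUT_DS))
        timeout *= 2  # exponential backoff after each minor timeout
    return waits


def seconds_until_major_timeout(timeo_ds: int, retrans: int) -> float:
    """Total wait, in seconds, before the 'not responding' message."""
    return sum(retry_schedule(timeo_ds, retrans)) / 10


if __name__ == "__main__":
    for timeo, retrans in [(7, 3), (7, 5), (60, 3)]:
        print(f"timeo={timeo} retrans={retrans}: "
              f"{retry_schedule(timeo, retrans)} ds -> "
              f"{seconds_until_major_timeout(timeo, retrans)} s")
```

Under this model, timeo=60 with retrans=3 would hold off the first "not responding" message for about 90 seconds (6 + 12 + 24 + 48), far longer than the two seconds observed, which fits the explanation that timeo was not governing the WRITE requests issued by cp.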


Is there any way to solve this?

Thanks,
--
Roberto Sebastiano <[email protected]>



-------------------------------------------------------
This SF.net email is sponsored by: SF.net Giveback Program.
SourceForge.net hosts over 70,000 Open Source Projects.
See the people who have HELPED US provide better services:
Click here: http://sourceforge.net/supporters.php
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-10-14 14:39:50

by Trond Myklebust

Subject: Re: gigabit, 2.4.22 and timeo


> Is there any way to solve this?

There should be. Try applying patches number 02, 03 and 04 (in that
order) from

http://www.fys.uio.no/~trondmy/src/2.4.23-pre7

Those patches have already been sent to Marcelo but haven't yet
appeared in his BitKeeper tree...

Cheers,
Trond



2003-10-14 16:22:12

by Roberto Sebastiano

Subject: Re: gigabit, 2.4.22 and timeo

On Tue, 2003-10-14 at 16:39, Trond Myklebust wrote:
> > Is there any way to solve this?
>
> There should be. Try applying patches number 02, 03 and 04 (in that
> order) from
>
> http://www.fys.uio.no/~trondmy/src/2.4.23-pre7

Great!
I'm transferring 120 GB of data over the network with those patches
applied and it runs just fine.

Should I apply the other patches too for a production environment?

Thanks,
--
Roberto Sebastiano <[email protected]>




2003-10-14 16:27:06

by Trond Myklebust

Subject: Re: gigabit, 2.4.22 and timeo

>>>>> " " == Roberto Sebastiano <[email protected]> writes:

> On Tue, 2003-10-14 at 16:39, Trond Myklebust wrote:
>> > Is there any way to solve this?
>>
>> There should be. Try applying patches number 02, 03 and 04 (in
>> that order) from
>>
>> http://www.fys.uio.no/~trondmy/src/2.4.23-pre7

> Great! I'm transferring 120 GB of data over the network with those
> patches applied and it runs just fine.

> Should I apply the other patches too for a production
> environment?

I would recommend that you at least apply

linux-2.4.23-01-fix_deadlock.dif

and

linux-2.4.23-05-fix_readdir.dif

in addition, since they fix critical problems (a deadlock and a stack
overflow).

The other patches are not critical (indeed some of them are
feature-enhancements) so you can probably drop them.

Cheers,
Trond



2003-10-14 17:03:23

by Roberto Sebastiano

Subject: Re: gigabit, 2.4.22 and timeo

On Tue, 2003-10-14 at 18:26, Trond Myklebust wrote:
> >>>>> " " == Roberto Sebastiano <[email protected]> writes:
>
> > On Tue, 2003-10-14 at 16:39, Trond Myklebust wrote:
> >> > Is there any way to solve this?
> >>
> >> There should be. Try applying patches number 02, 03 and 04 (in
> >> that order) from
> >>
> >> http://www.fys.uio.no/~trondmy/src/2.4.23-pre7
>
> > Great! I'm transferring 120 GB of data over the network with those
> > patches applied and it runs just fine.
>
> > Should I apply the other patches too for a production
> > environment?
>
> I would recommend that you at least apply
>
> linux-2.4.23-01-fix_deadlock.dif
>
> and
>
> linux-2.4.23-05-fix_readdir.dif
>
> in addition, since they fix critical problems (a deadlock and a stack
> overflow).
>
I'll apply them when the copy finishes.

While the files are being transferred, "ls -laR" on the other NFS
client seems a little slow...

I'm worried that the 500-1000 Apache children will hang on reads from
the network. Is there any way to prevent this? Maybe more nfsd
instances on the server?



Thanks,
--
Roberto Sebastiano <[email protected]>




2003-10-14 18:31:16

by Trond Myklebust

Subject: Re: gigabit, 2.4.22 and timeo

>>>>> " " == Roberto Sebastiano <[email protected]> writes:

> While transferring the files it seems that "ls -laR" on the
> other NFS client is a little slow...

> I'm worried that the 500-1000 Apache children will hang on reads
> from the network. Is there any way to prevent this? Maybe more
> nfsd instances on the server?

Yes. Note that each NFS client will use at most 16 nfsd threads on
the server at any one time. Of course, that figure tails off as you
add more clients and start to hit scalability limits, but it is a good
rule of thumb if you have between 1 and 8 clients...
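The rule of thumb above can be turned into a quick sizing estimate. The sketch below is hypothetical: the function name, and the choice to refuse client counts outside the 1 to 8 range rather than extrapolate, are decisions made here and not part of the original advice. It simply multiplies the per-client ceiling of 16 concurrently used nfsd threads by the number of clients.

```python
"""Back-of-envelope nfsd thread count from the '16 threads per client' rule.

Hypothetical helper: the rule is only described as holding for roughly
1-8 clients, so other client counts are rejected rather than extrapolated.
"""

MAX_THREADS_PER_CLIENT = 16  # each NFS client uses at most 16 nfsd threads at once


def suggested_nfsd_threads(num_clients: int) -> int:
    """Return 16 * num_clients for 1 <= num_clients <= 8."""
    if not 1 <= num_clients <= 8:
        raise ValueError("rule of thumb only covers 1 to 8 clients")
    return MAX_THREADS_PER_CLIENT * num_clients


if __name__ == "__main__":
    # Two load-balanced web servers mounting the same export.
    print(suggested_nfsd_threads(2))
```

For the two web servers in this thread the estimate is 32 threads, which can be started on the server with `rpc.nfsd 32`; how the thread count is configured at boot varies by distribution.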

Cheers,
Trond

