2003-05-03 22:39:28

by Xose Vazquez Perez

Subject: nfs over TCP stable?

hi,

Red Hat 9-kernel 2.4.20-9 doesn't include nfs over TCP because they
said that "it doesn't work good enough in our testing....."
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=88583

Are there any problems with standard kernels, or only with
mega-patched RH kernels?

-thanks in advance-

regards,
--
Galicia neither forgives nor forgets. Government, resign!



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-05-03 23:30:25

by Greg Lindahl

Subject: Re: nfs over TCP stable?

On Sun, May 04, 2003 at 12:39:35AM +0200, Xose Vazquez Perez wrote:

> Red Hat 9-kernel 2.4.20-9 doesn't include nfs over TCP because they
> said that "it doesn't work good enough in our testing....."
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=88583

Note that they are talking about CONFIG_NFSD_TCP, i.e. the daemon side
of TCP support. Client support is much older, and Red Hat has had it
turned on for ages.

greg





2003-05-05 10:49:38

by Andreas Behnert

Subject: Re: nfs over TCP stable?

Xose Vazquez Perez wrote:
> hi,
>
> Red Hat 9-kernel 2.4.20-9 doesn't include nfs over TCP because they
> said that "it doesn't work good enough in our testing....."
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=88583
>
> Are there any problems with standard kernels, or only with
> mega-patched RH kernels?
>
> -thanks in advance-
>
> regards,

(Sorry 'bout my poor English)

Linux NFS server, UNIX clients:
I used 2.2.20 with NFSv3 over UDP for a long time and ran some tests
with 2.4.21-rc7 and NFSv3 over TCP. Unfortunately there were a lot of
problems. "Normal" stress tests (mounting, creating some very large
and many small files, copying, ...) showed no problems. However, the
clients run CATIA, and CATIA, at least 4.2.2/4.2.4, seems to use
some special fs access methods (sorry, I can't be more specific because
I simply don't know :) and relies heavily on statd/lockd.

With CATIA running, IRIX worked almost without a problem, AIX5L didn't
work at all, and HP-UX11 was (and still is) _extremely_ slow:
saving/loading a 90 MB model file takes forever. The AIX client
behaviour could be fixed with a "proto=udp" mount option. HP-UX11
hasn't been fixed so far, because it seems to ignore the "proto=udp"
mount option I put into /etc/fstab, and the manpages seem to differ
from the real behaviour. So apart from the NFSv3 over TCP problems
there seems to be a bug in HP-UX11, at least in the release that's
running here. The log files didn't say anything relevant, neither on
the server nor on the clients.

To be honest, at the moment I don't know what to do to get NFS over
TCP to work reliably and with good performance with all these client
systems. The next step will be a recompiled 2.4.21-rc7 kernel on the
server with TCP support for knfsd disabled. I hope that everything will
then work as expected again, because it did before with kernel 2.2.20
and UDP only.

The server runs Debian 3.0r1+security, with the distribution's nfs-utils
package replaced by a Debian package built from the current nfs-utils
1.0.3. The network is switched 100 Mbit.

Regards,
Andreas




2003-05-05 11:28:21

by Steve Dickson

Subject: Re: nfs over TCP stable?

The client works fairly well, as shown by the simple fact that
a number of people are using it in production today.

The server seems to have issues around flow control. When
a server is unable to send back replies due to an EAGAIN error,
it resets the connection. Clearly, IMHO, that is not the correct way
to handle this error.
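
As a user-space illustration only (this is not knfsd code, and the
function names are hypothetical), the sketch below shows the usual
alternative to resetting the connection: when a non-blocking send
reports EAGAIN, which Python surfaces as BlockingIOError, wait until
the socket is writable again and then continue.

```python
import select
import socket
import threading


def send_all_nonblocking(sock, data, timeout=5.0):
    """Send all of data on a non-blocking socket.

    When the send buffer is full (EAGAIN), wait for the socket to become
    writable instead of closing the connection.
    Returns how many times the send buffer was found full.
    """
    view = memoryview(data)
    stalls = 0
    while view:
        try:
            sent = sock.send(view)
            view = view[sent:]
        except BlockingIOError:
            stalls += 1
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("peer did not drain the connection in time")
    return stalls


def demo(size=4 * 1024 * 1024):
    """Push size bytes through a socket pair while a reader thread drains it."""
    sender, receiver = socket.socketpair()
    sender.setblocking(False)
    received = bytearray()

    def drain():
        while len(received) < size:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    stalls = send_all_nonblocking(sender, b"x" * size)
    reader.join()
    sender.close()
    receiver.close()
    return len(received), stalls
```

A kernel thread pool cannot simply wait like this without risking that
every thread ends up waiting, so treat this as a picture of the error
condition rather than a proposed fix.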

It has been suggested to me, and I think I agree, that the
I/O processing of the NFS server should be broken up into
two threads: an RX thread and a TX thread. The idea is that when the
TX thread starts to get backed up, it turns off the RX thread
(i.e. stops it from receiving), which in turn flow-controls the
sending client. This should allow the TX thread to
catch up...
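
A minimal user-space sketch of that RX/TX split, assuming a bounded
queue between the two threads (the names run_rx_tx, max_pending and
tx_delay are made up for illustration). When the TX side falls behind,
the queue fills and the RX side blocks, so it stops accepting new
requests; on a real TCP socket, not reading is what eventually closes
the client's send window.

```python
import queue
import threading
import time


def run_rx_tx(requests, max_pending=4, tx_delay=0.001):
    """Process requests with one RX thread and one TX thread.

    RX stops taking new requests while max_pending replies are waiting,
    because Queue.put() blocks on a full queue.
    Returns (replies_sent, highest_backlog_seen).
    """
    pending = queue.Queue(maxsize=max_pending)
    sent = []
    highest = 0

    def rx():
        nonlocal highest
        for req in requests:
            reply = req.upper()      # stand-in for real request processing
            pending.put(reply)       # blocks while TX is backed up
            highest = max(highest, pending.qsize())
        pending.put(None)            # sentinel: no more replies

    def tx():
        while True:
            reply = pending.get()
            if reply is None:
                break
            time.sleep(tx_delay)     # stand-in for a slow network send
            sent.append(reply)

    threads = [threading.Thread(target=rx), threading.Thread(target=tx)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent, highest
```

The backlog never grows past max_pending, which is the property the
two-thread proposal is after.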

The dual I/O threads might also be handy if NFSD decides
to use AIO....

SteveD.

Xose Vazquez Perez wrote:

>hi,
>
>Red Hat 9-kernel 2.4.20-9 doesn't include nfs over TCP because they
>said that "it doesn't work good enough in our testing....."
>https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=88583
>
>Are there any problems with standard kernels, or only with
>mega-patched RH kernels?
>
>-thanks in advance-
>
>regards,
>
>





2003-05-05 19:43:50

by Xose Vazquez Perez

Subject: Re: nfs over TCP stable?

Greg Lindahl wrote:

>Note that they are talking about CONFIG_NFSD_TCP, i.e. the daemon side
>of TCP support. Client support is much older, and Red Hat has had it
>turned on for ages.

Yes, I was talking about the kernel NFS server daemon.

regards,
--
Galicia neither forgives nor forgets. Government, resign!




2003-05-05 19:50:08

by Xose Vazquez Perez

Subject: Re: nfs over TCP stable?

Steve Dickson wrote:

>It has been suggested to me, and I think I agree, that the
>I/O processing of the NFS server should be broken up into
>two threads: an RX thread and a TX thread. The idea is that when the
>TX thread starts to get backed up, it turns off the RX thread
>(i.e. stops it from receiving), which in turn flow-controls the
>sending client. This should allow the TX thread to
>catch up...

Does 2.5.xx have the same problem?

-thanks-

regards,
--
Galicia neither forgives nor forgets. Government, resign!




2003-05-06 07:23:22

by Martin Spott

Subject: Re: nfs over TCP stable?

Xose Vazquez Perez <[email protected]> wrote:

> Red Hat 9-kernel 2.4.20-9 doesn't include nfs over TCP because they
> said that "it doesn't work good enough in our testing....."

Linux as an NFS client against Solaris8 works excellently for me with NFS V.3
over TCP (over ATM ;-), at least with stock 2.4.20 and 2.5.67 kernels.
Solaris as an NFS server even allows 32 KB read/write sizes, which makes the
whole thing pretty fast for large files.

Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------



2003-05-07 05:10:47

by NeilBrown

Subject: Re: nfs over TCP stable?

On Monday May 5, [email protected] wrote:
> The client works fairly well, as shown by the simple fact that
> a number of people are using it in production today.
>
> The server seems to have issues around flow control. When
> a server is unable to send back replies due to an EAGAIN error,
> it resets the connection. Clearly, IMHO, that is not the correct way
> to handle this error.
>
> It has been suggested to me, and I think I agree, that the
> I/O processing of the NFS server should be broken up into
> two threads: an RX thread and a TX thread. The idea is that when the
> TX thread starts to get backed up, it turns off the RX thread
> (i.e. stops it from receiving), which in turn flow-controls the
> sending client. This should allow the TX thread to
> catch up...

Well... there are lots of threads, and each one will either be
receiving or transmitting (or working) at any time. So maybe we
already have that.

Also, the reception of new requests is blocked when there is more than
some set amount of replies pending to be sent.

However the total size of reply buffers is scaled by the number of
nfsd threads that are run, and possibly this gets set too large so the
server runs out of kmallocable memory.
And what do you do when you have accepted a request, processed it, and
now cannot send a reply?
If you block, and every other thread blocks, you get a deadlock and
no-one ever releases their memory and nothing happens.
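
To make the memory pressure concrete, here is a back-of-the-envelope
sketch. It assumes each TCP connection's send buffer is sized so that
every nfsd thread can have one maximum-size reply queued on it; the
thread counts, reply size and connection count are hypothetical
numbers, not measurements.

```python
def worst_case_reply_buffer_bytes(nfsd_threads, max_reply_bytes, connections):
    """Upper bound on reply buffer space if every connection fills its buffer.

    Assumption: per-connection send buffer = nfsd_threads * max_reply_bytes.
    """
    per_connection = nfsd_threads * max_reply_bytes
    return per_connection * connections


MIB = 1024 * 1024

# Hypothetical setup: 32 KiB replies, 100 client connections.
small = worst_case_reply_buffer_bytes(8, 32 * 1024, 100)
large = worst_case_reply_buffer_bytes(64, 32 * 1024, 100)

print(small / MIB)  # 25.0 MiB with 8 threads
print(large / MIB)  # 200.0 MiB with 64 threads
```

Under that assumption the bound grows with threads times connections,
so raising the thread count on a busy server raises it quickly.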

You really have to drop the request and it is only fair when you do
that to also drop the tcp connection.

So maybe the problem is that the transmit buffers should be smaller so
the flow control hits earlier.
Currently every TCP connection has a big enough buffer that every
thread can be responding to a request at the same time. Maybe that
should be scaled back when there are more active connections.
Or maybe whenever a connection is closed due to flow control problems,
we reduce the size of buffers by half, and then slowly increase them
while everything is fine ... or maybe some other heuristic.
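
A rough sketch of that last heuristic, purely as an illustration: halve
the buffer size whenever a connection is dropped for flow control, and
grow it back by a small fixed step while things stay quiet. The class
name, the step size and the limits are invented for the example.

```python
class AdaptiveSendBuffer:
    """Halve on a flow-control close, creep back up while all is well."""

    def __init__(self, initial, minimum, maximum, step):
        self.size = initial
        self.minimum = minimum
        self.maximum = maximum
        self.step = step

    def on_flow_control_close(self):
        # A connection had to be dropped: back off sharply.
        self.size = max(self.minimum, self.size // 2)
        return self.size

    def on_quiet_interval(self):
        # No drops for a while: recover slowly.
        self.size = min(self.maximum, self.size + self.step)
        return self.size
```

For example, starting from 256 KiB with an 8 KiB step, two drops bring
the size down to 64 KiB, and one quiet interval brings it back to 72 KiB.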


>
> The dual I/O threads might also be handy if NFSD decides
> to use AIO....

I cannot see how AIO would really help NFSD. Having lots of threads
each doing one thing at a time seems to work quite well inside the
kernel.

But I'm willing to be educated.

NeilBrown

