2003-09-03 09:27:12

by Robert Heessels

[permalink] [raw]
Subject: 'random' diskless clients hangs

Hello,

We have a setup with a NFS V3 Fileserver and diskless clients all running
Red Hat 7.3.

All clients run fine for days at a time, but every now and then a client
hangs. Most of the time the process that is probably causing the hang is
either rhn_check (from Red Hat Network) or httpsd with user psaadm (from
Plesk 6.0.1).

Since we are new to NFS we're not sure how to track down this problem. Any
help would be VERY much appreciated!

Kind regards,

Robert Heessels
Alphamega




-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-09-03 18:01:07

by Steve Dickson

[permalink] [raw]
Subject: Re: 'random' diskless clients hangs

Hello,

Robert Heessels wrote:

>All clients run fine for days at a time, but every now and then a client
>hangs. Most of the time the process that is probably causing the hang is
>either rhn_check (from Red Hat Network) or httpsd with user psaadm (from
>Plesk 6.0.1).
>
This is simply not enough information to even try to guess what's
happening... Have you tried upgrading to a more recent kernel to see
if the problem goes away?

SteveD.




2003-09-03 19:55:17

by Bernd Schubert

[permalink] [raw]
Subject: Re: 'random' diskless clients hangs

On Wednesday 03 September 2003 11:33, [email protected] wrote:
> Hello,
>
> We have a setup with a NFS V3 Fileserver and diskless clients all running
> Red Hat 7.3.
>
> All clients run fine for days at a time, but every now and then a client
> hangs. Most of the time the process that is probably causing the hang is
> either rhn_check (from Red Hat Network) or httpsd with user psaadm (from
> Plesk 6.0.1).
>
> Since we are new to NFS we're not sure how to track down this problem. Any
> help would be VERY much appreciated!
>

Hello,

We are also using diskless clients but have never seen anything like this,
though none of your hanging programs are running on our systems.
Is there anything in the logs? Which kernel versions are you using? If you are
using a Red Hat kernel you should try a vanilla kernel (2.4.21 *NOT* 2.4.22).

Btw, how is your diskless setup working? Perhaps there's something wrong?

Regards,
Bernd




2003-09-03 21:31:10

by Marc Schmitt

[permalink] [raw]
Subject: Re: 'random' diskless clients hangs

Hi Bernd,

Bernd Schubert wrote:

>If you are
>using redhat kernel you should try a vanilla kernel (2.4.21 *NOT* 2.4.22).
>
>
What's wrong with 2.4.22?

Marc




2003-09-05 18:46:45

by nfsmailinglist

[permalink] [raw]
Subject: RE: 'random' diskless clients hangs

The clients still hang every few hours. There is nothing in the logs.
When we change the rsize and wsize to 2048, the clients no longer crash, but
clients and server become extremely slow.

Please help! We are willing to pay a fee, by the way, if someone is
interested in taking a good hard look at our problem.

This is our setup:

SERVER:
=======

Linux Red Hat 7.3 kernel 2.4.18-27.7.x
nfs-utils-1.0.5-1

netboot settings:
label linux
kernel vmlinuz
ipappend 1
append root=/dev/nfs nfsroot=192.168.1.112:/clients/%s/root,v3,nolock,intr,rsize=4096,wsize=4096

exports:
/clients/blue/root 192.168.1.160/255.255.255.255(rw,no_root_squash,sync)

CLIENTS:
========

Linux Red Hat 7.3 kernel 2.4.20-18.7custom
nfs-utils-1.0.5-1

fstab:
192.168.1.112:/clients/blue/root / nfs rw,nfsvers=3,hard,intr,udp,nolock 0 0
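
The mount options appear in two places in this setup: the nfsroot= string used at boot and the fstab entry. A minimal sketch, assuming nothing beyond the two option strings quoted above, that lists which options are given in only one of them. The helper names parse_opts and only_in are made up for illustration, and the comparison is purely textual, so "v3" and "nfsvers=3" show up as different keys even though they request the same protocol version.

```python
# Option strings copied from the netboot settings and the fstab entry above.
NFSROOT_OPTS = "v3,nolock,intr,rsize=4096,wsize=4096"
FSTAB_OPTS = "rw,nfsvers=3,hard,intr,udp,nolock"


def parse_opts(optstring):
    """Turn 'a,b=1' into {'a': True, 'b': '1'} (hypothetical helper)."""
    opts = {}
    for item in optstring.split(","):
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return opts


def only_in(first, second):
    """Sorted option names present in `first` but missing from `second`."""
    return sorted(key for key in first if key not in second)


nfsroot = parse_opts(NFSROOT_OPTS)
fstab = parse_opts(FSTAB_OPTS)

print("only in nfsroot=:", only_in(nfsroot, fstab))
print("only in fstab:   ", only_in(fstab, nfsroot))
```

This only makes the difference visible (for example, rsize and wsize are set in the nfsroot= string but not in fstab); it says nothing about which set of options the running root mount actually uses.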



2003-09-05 19:14:47

by nfsmailinglist

[permalink] [raw]
Subject: RE: 'random' diskless clients hangs

AMAZING: I switched the server from UDP to TCP... same crashes... :-(

There is only 1 server and 2 clients in the setup. All connected via eth1
100Mbps through a Cisco switch. No other traffic on this switch.
Here are the stats you requested... normal results?

[root@blue root]# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
1119581 5 0


[root@aqua pxelinux.cfg]# netstat -s
Ip:
42501380 total packets received
0 forwarded
0 incoming packets discarded
42501269 incoming packets delivered
29067258 requests sent out
5 reassemblies required
2 packets reassembled ok
468 fragments created
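
As a rough way to read the counters above, here is a small sketch that computes the RPC retransmission ratio and looks for a reassembly failure counter. It works on the text pasted in this message only. The function names are made up for illustration, and the way a failure line would be labelled in 'netstat -s' is an assumption (any label mentioning both "reassembl" and "fail"); no such line appears in the output above.

```python
import re

# Samples copied from the 'nfsstat -c' and 'netstat -s' output above.
NFSSTAT_SAMPLE = """\
Client rpc stats:
calls retrans authrefrsh
1119581 5 0
"""

NETSTAT_SAMPLE = """\
Ip:
42501380 total packets received
0 forwarded
0 incoming packets discarded
42501269 incoming packets delivered
29067258 requests sent out
5 reassemblies required
2 packets reassembled ok
468 fragments created
"""


def parse_rpc_stats(text):
    """Pair the 'calls retrans ...' header row with the numbers below it."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    for header, values in zip(rows, rows[1:]):
        if header[:2] == ["calls", "retrans"]:
            return dict(zip(header, (int(v) for v in values)))
    raise ValueError("no 'calls retrans ...' header found")


def parse_counters(text):
    """Map label -> value for lines of the form '<number> <label>'."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def reassembly_failures(counters):
    """Sum counters whose label looks like a reassembly failure.

    Assumption: such a label contains both 'reassembl' and 'fail'.
    A missing line is treated as zero failures.
    """
    return sum(v for k, v in counters.items()
               if "reassembl" in k and "fail" in k)


rpc = parse_rpc_stats(NFSSTAT_SAMPLE)
ip = parse_counters(NETSTAT_SAMPLE)
retrans_ratio = rpc["retrans"] / rpc["calls"]

print(f"retransmissions: {rpc['retrans']} of {rpc['calls']} calls "
      f"({retrans_ratio:.2e})")
print(f"reassembly failures reported: {reassembly_failures(ip)}")
```

The sketch only reports the numbers; whether a given ratio is acceptable is a judgment for the people who know the network.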


-----Original Message-----
From: Lever, Charles [mailto:[email protected]]
Sent: vrijdag 5 september 2003 21:02
To: [email protected]
Subject: RE: [NFS] 'random' diskless clients hangs


is your network clean? UDP is pretty sensitive to
networking problems -- look for reassembly errors
in 'netstat -s', and retransmission count in 'nfsstat -c'.


2003-09-05 19:50:53

by Trond Myklebust

[permalink] [raw]
Subject: Re: 'random' diskless clients hangs

>>>>> " " == nfsmailinglist <[email protected]> writes:

    > The clients still hang every few hours. There is nothing in the
    > logs. When we change the rsize and wsize to 2048, the clients
    > no longer crash but clients and server become extremely slow.

That is usually a sign of a networking problem. What kind of
NIC+driver are you using on the clients?

Also, what does that 'custom' mean on your client kernels? Have you
done anything other than to enable nfsroot?

Cheers,
Trond



2003-09-05 19:50:02

by Dennis, Richard

[permalink] [raw]
Subject: RE: 'random' diskless clients hangs

Anything in the logs? You might consider increasing the debugging output by
echoing into /proc/sys/sunrpc/*
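
A minimal sketch of what that could look like when scripted, assuming the four debug files and the value 32767 that are used in the echo commands posted later in this thread. The names set_debug and sunrpc_debug are made up for illustration. The point of the context manager is that the flags are written back to 0 afterwards, since full debugging floods the kernel log; it needs root on a real system, and the base directory is a parameter so the logic can be tried against a scratch directory first.

```python
import contextlib
from pathlib import Path

# Assumption: these are the debug files to touch, as named in the echo
# commands elsewhere in this thread.
SUNRPC_DIR = Path("/proc/sys/sunrpc")
DEBUG_FILES = ("rpc_debug", "nfs_debug", "nfsd_debug", "nlm_debug")


def set_debug(value, base=SUNRPC_DIR):
    """Write the same debug mask to every file in DEBUG_FILES."""
    for name in DEBUG_FILES:
        (Path(base) / name).write_text(f"{value}\n")


@contextlib.contextmanager
def sunrpc_debug(value=32767, base=SUNRPC_DIR):
    """Enable debugging for the duration of a 'with' block, then write 0."""
    set_debug(value, base)
    try:
        yield
    finally:
        # Always switch debugging off again, even if the block raised.
        set_debug(0, base)
```

Usage would be along the lines of `with sunrpc_debug(): time.sleep(60)` while the problem is being reproduced, then reading the kernel log.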


2003-09-06 06:44:46

by nfsmailinglist

[permalink] [raw]
Subject: RE: 'random' diskless clients hangs

This was in the log after I did
# echo 32767 > /proc/sys/sunrpc/rpc_debug
# echo 32767 > /proc/sys/sunrpc/nfs_debug
# echo 32767 > /proc/sys/sunrpc/nfsd_debug
# echo 32767 > /proc/sys/sunrpc/nlm_debug

Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 put into queue
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700(inet f6b80080), count=140,
busy=1
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: got len=112
Sep 6 08:37:07 aqua kernel: svc: svc_authenticate (1)
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: calling dispatcher
Sep 6 08:37:07 aqua kernel: nfsd_dispatch: vers 3 proc 4
Sep 6 08:37:07 aqua kernel: nfsd: ACCESS(3) 12: 00000001 03000800
00eb0097 00000000 00000000 00000000 0x2
Sep 6 08:37:07 aqua kernel: nfsd: fh_verify(12: 00000001 03000800 00eb0097
00000000 00000000 00000000)
Sep 6 08:37:07 aqua kernel: nfsd: ressize_check p f6564078 base f6564000
len 2304
Sep 6 08:37:07 aqua kernel: svc: service f660ea00, releasing skb f6c71880
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700 sendto([f6564000 120... ],
1, 120) = 120
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: server f660ea00 waiting for data (to =
30000)
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 dequeued, inuse=7
Sep 6 08:37:07 aqua kernel: svc: server f660ea00, socket f6a33700, inuse=8
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 put into queue
Sep 6 08:37:07 aqua kernel: svc: got len=4236
Sep 6 08:37:07 aqua kernel: svc: svc_authenticate (1)
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: calling dispatcher
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700(inet f6b80080), count=140,
busy=1
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: nfsd_dispatch: vers 3 proc 7
Sep 6 08:37:07 aqua kernel: nfsd: WRITE(3) 24: 02000001 03000800
00eb0097 01630077 44ef9777 01630070 4096 bytes at 0
Sep 6 08:37:07 aqua kernel: nfsd: fh_verify(24: 02000001 03000800 00eb0097
01630077 44ef9777 01630070)
Sep 6 08:37:07 aqua kernel: fh_verify: Inode 23265399, Bad count: 1 2 or
version 1156552573 1156552567
Sep 6 08:37:07 aqua kernel: nfsd: ressize_check p f6564024 base f6564000
len 2304
Sep 6 08:37:07 aqua kernel: svc: service f660ea00, releasing skb f6bdc480
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700 sendto([f6564000 36... ],
1, 36) = 36
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: server f660ea00 waiting for data (to =
30000)
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 dequeued, inuse=7
Sep 6 08:37:07 aqua kernel: svc: server f660ea00, socket f6a33700, inuse=8
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 put into queue
Sep 6 08:37:07 aqua kernel: svc: got len=132
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700(inet f6b80080), count=128,
busy=1
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: svc_authenticate (1)
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: calling dispatcher
Sep 6 08:37:07 aqua kernel: nfsd_dispatch: vers 3 proc 21
Sep 6 08:37:07 aqua kernel: nfsd: COMMIT(3) 24: 02000001 03000800
00eb0097 01630077 44ef9777 01630070 0@4096
Sep 6 08:37:07 aqua kernel: nfsd: fh_verify(24: 02000001 03000800 00eb0097
01630077 44ef9777 01630070)
Sep 6 08:37:07 aqua kernel: fh_verify: Inode 23265399, Bad count: 1 2 or
version 1156552573 1156552567
Sep 6 08:37:07 aqua kernel: nfsd: ressize_check p f6564024 base f6564000
len 2304
Sep 6 08:37:07 aqua kernel: svc: service f660ea00, releasing skb efc65e80
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700 sendto([f6564000 36... ],
1, 36) = 36
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: server f660ea00 waiting for data (to =
30000)
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 dequeued, inuse=7
Sep 6 08:37:07 aqua kernel: svc: server f660ea00, socket f6a33700, inuse=8
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 put into queue
Sep 6 08:37:07 aqua kernel: svc: got len=132
Sep 6 08:37:07 aqua kernel: svc: svc_authenticate (1)
Sep 6 08:37:07 aqua kernel: svc: calling dispatcher
Sep 6 08:37:07 aqua kernel: nfsd_dispatch: vers 3 proc 6
Sep 6 08:37:07 aqua kernel: nfsd: READ(3) 24: 02000001 03000800 00eb0097
01630078 44ef977e 01630070 4096 bytes at 262144
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700(inet f6b80080), count=140,
busy=1
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: nfsd: fh_verify(24: 02000001 03000800 00eb0097
01630078 44ef977e 01630070)
Sep 6 08:37:07 aqua kernel: nfsd: raparms 1 2 0 0 0
Sep 6 08:37:07 aqua kernel: nfsd: ressize_check p f6565080 base f6564000
len 2304
Sep 6 08:37:07 aqua kernel: svc: service f660ea00, releasing skb f6c71580
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700 sendto([f6564000
4224... ], 1, 4224) = 4224
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: server f660ea00 waiting for data (to =
30000)
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 dequeued, inuse=7
Sep 6 08:37:07 aqua kernel: svc: server f660ea00, socket f6a33700, inuse=8
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 put into queue
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:07 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:07 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080), count=128,
busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: got len=120
Sep 6 08:37:08 aqua kernel: svc: svc_authenticate (1)
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: calling dispatcher
Sep 6 08:37:08 aqua kernel: nfsd_dispatch: vers 3 proc 4
Sep 6 08:37:08 aqua kernel: nfsd: ACCESS(3) 20: 01000001 03000800
00eb0097 029a009c b1a5b3f8 00000000 0x2
Sep 6 08:37:08 aqua kernel: nfsd: fh_verify(20: 01000001 03000800 00eb0097
029a009c b1a5b3f8 00000000)
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f6564078 base f6564000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660ea00, releasing skb f6bdc580
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f6564000 120... ],
1, 120) = 120
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: server f660ea00 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 dequeued, inuse=7
Sep 6 08:37:08 aqua kernel: svc: server f660ea00, socket f6a33700, inuse=8
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 put into queue
Sep 6 08:37:08 aqua kernel: svc: got len=120
Sep 6 08:37:08 aqua kernel: svc: svc_authenticate (1)
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: calling dispatcher
Sep 6 08:37:08 aqua kernel: nfsd_dispatch: vers 3 proc 4
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: nfsd: ACCESS(3) 20: 01000001 03000800
00eb0097 03f5408e b1a5bd30 00000000 0x2
Sep 6 08:37:08 aqua kernel: nfsd: fh_verify(20: 01000001 03000800 00eb0097
03f5408e b1a5bd30 00000000)
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f6564078 base f6564000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660ea00, releasing skb f6d5b480
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f6564000 120... ],
1, 120) = 120
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: server f660ea00 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 dequeued, inuse=7
Sep 6 08:37:08 aqua kernel: svc: server f660ea00, socket f6a33700, inuse=8
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 put into queue
Sep 6 08:37:08 aqua kernel: svc: got len=132
Sep 6 08:37:08 aqua kernel: svc: svc_authenticate (1)
Sep 6 08:37:08 aqua kernel: svc: calling dispatcher
Sep 6 08:37:08 aqua kernel: nfsd_dispatch: vers 3 proc 6
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: nfsd: READ(3) 24: 02000001 03000800 00eb0097
01630078 44ef977e 01630070 4096 bytes at 274432
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: nfsd: fh_verify(24: 02000001 03000800 00eb0097
01630078 44ef977e 01630070)
Sep 6 08:37:08 aqua kernel: nfsd: raparms 1 2 0 0 0
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f6565080 base f6564000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660ea00, releasing skb f5010080
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f6564000
4224... ], 1, 4224) = 4224
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: server f660ea00 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 dequeued, inuse=7
Sep 6 08:37:08 aqua kernel: svc: server f660ea00, socket f6a33700, inuse=8
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 put into queue
Sep 6 08:37:08 aqua kernel: svc: got len=132
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: svc_authenticate (1)
Sep 6 08:37:08 aqua kernel: svc: calling dispatcher
Sep 6 08:37:08 aqua kernel: nfsd_dispatch: vers 3 proc 6
Sep 6 08:37:08 aqua kernel: nfsd: READ(3) 24: 02000001 03000800 00eb0097
01630078 44ef977e 01630070 4096 bytes at 278528
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: nfsd: fh_verify(24: 02000001 03000800 00eb0097
01630078 44ef977e 01630070)
Sep 6 08:37:08 aqua kernel: nfsd: raparms 1 2 0 0 0
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f6565080 base f6564000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660ea00, releasing skb f6ca5b80
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f6564000
4224... ], 1, 4224) = 4224
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: server f660ea00 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 dequeued, inuse=7
Sep 6 08:37:08 aqua kernel: svc: server f660ea00, socket f6a33700, inuse=8
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 put into queue
Sep 6 08:37:08 aqua kernel: svc: got len=120
Sep 6 08:37:08 aqua kernel: svc: svc_authenticate (1)
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: calling dispatcher
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: nfsd_dispatch: vers 3 proc 4
Sep 6 08:37:08 aqua kernel: nfsd: ACCESS(3) 20: 01000001 03000800
00eb0097 03ecc089 b1a5bad9 00000000 0x2
Sep 6 08:37:08 aqua kernel: nfsd: fh_verify(20: 01000001 03000800 00eb0097
03ecc089 b1a5bad9 00000000)
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f6564078 base f6564000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660ea00, releasing skb f6d5bc80
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f6564000 120... ],
1, 120) = 120
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: server f660ea00 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 dequeued, inuse=7
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: server f660ea00, socket f6a33700, inuse=8
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 put into queue
Sep 6 08:37:08 aqua kernel: svc: got len=132
Sep 6 08:37:08 aqua kernel: svc: svc_authenticate (1)
Sep 6 08:37:08 aqua kernel: svc: calling dispatcher
Sep 6 08:37:08 aqua kernel: nfsd_dispatch: vers 3 proc 6
Sep 6 08:37:08 aqua kernel: nfsd: READ(3) 24: 02000001 03000800 00eb0097
01630078 44ef977e 01630070 4096 bytes at 286720
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: nfsd: fh_verify(24: 02000001 03000800 00eb0097
01630078 44ef977e 01630070)
Sep 6 08:37:08 aqua kernel: nfsd: raparms 1 2 0 0 0
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f6565080 base f6564000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660ea00, releasing skb f6bfd080
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f6564000
4224... ], 1, 4224) = 4224
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: server f660ea00 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 dequeued, inuse=7
Sep 6 08:37:08 aqua kernel: svc: server f660ea00, socket f6a33700, inuse=8
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 put into queue
Sep 6 08:37:08 aqua kernel: svc: got len=120
Sep 6 08:37:08 aqua kernel: svc: svc_authenticate (1)
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: calling dispatcher
Sep 6 08:37:08 aqua kernel: nfsd_dispatch: vers 3 proc 4
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=1
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: nfsd: ACCESS(3) 20: 01000001 03000800
00eb0097 029a009c b1a5b3f8 00000000 0x2
Sep 6 08:37:08 aqua kernel: nfsd: fh_verify(20: 01000001 03000800 00eb0097
029a009c b1a5b3f8 00000000)
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f6564078 base f6564000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660ea00, releasing skb f6d5b580
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f6564000 120... ],
1, 120) = 120
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 busy, not enqueued
Sep 6 08:37:08 aqua kernel: svc: server f660ea00 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: svc: socket f6b80080 dequeued, inuse=7
Sep 6 08:37:08 aqua kernel: svc: server f660ea00, socket f6a33700, inuse=8
Sep 6 08:37:08 aqua kernel: svc: got len=-11
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=0
Sep 6 08:37:08 aqua kernel: svc: server f660ea00 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: nfsd: raparms 1 2 0 0 0
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f65a1080 base f65a0000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660ac00, releasing skb f57cc180
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f65a0000
4224... ], 1, 4224) = 4224
Sep 6 08:37:08 aqua kernel: svc: server f660ac00 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: nfsd: write complete err=512
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=0
Sep 6 08:37:08 aqua last message repeated 2 times
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f6568088 base f6568000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660e400, releasing skb f6d7b880
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f6568000 136... ],
1, 136) = 136
Sep 6 08:37:08 aqua kernel: svc: server f660e400 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: nfsd: write complete err=1728
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f6574088 base f6574000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660d800, releasing skb f6eb2980
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f6574000 136... ],
1, 136) = 136
Sep 6 08:37:08 aqua kernel: svc: server f660d800 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: nfsd: raparms 1 2 0 0 0
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f657d080 base f657c000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660d200, releasing skb f6eb2b80
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=0
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=0
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f657c000
4224... ], 1, 4224) = 4224
Sep 6 08:37:08 aqua kernel: svc: server f660d200 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=0
Sep 6 08:37:08 aqua last message repeated 2 times
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f65a8080 base f65a8000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660a600, releasing skb efa34680
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f65a8000 128... ],
1, 128) = 128
Sep 6 08:37:08 aqua kernel: svc: server f660a600 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f6570080 base f6570000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f660de00, releasing skb f6d5b080
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=0
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f6570000 128... ],
1, 128) = 128
Sep 6 08:37:08 aqua kernel: svc: server f660de00 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=0
Sep 6 08:37:08 aqua kernel: nfsd: ressize_check p f65b0104 base f65b0000
len 2304
Sep 6 08:37:08 aqua kernel: svc: service f78a2e00, releasing skb efa34180
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700 sendto([f65b0000 260... ],
1, 260) = 260
Sep 6 08:37:08 aqua kernel: svc: server f78a2e00 waiting for data (to =
30000)
Sep 6 08:37:08 aqua kernel: svc: socket f6a33700(inet f6b80080),
write_space busy=0
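
A debug log like the one above is easier to read once the nfsd_dispatch lines are reduced to per-procedure counts. The following is a minimal sketch, not part of the original post; the function name tally_nfs3_calls is made up for illustration, and the procedure table follows the NFSv3 numbering in RFC 1813 (proc 6 is READ and proc 4 is ACCESS, which matches the nfsd lines in the log).

```python
import re
from collections import Counter

# NFSv3 procedure numbers as assigned in RFC 1813 (index == procedure number).
NFS3_PROCS = [
    "NULL", "GETATTR", "SETATTR", "LOOKUP", "ACCESS", "READLINK",
    "READ", "WRITE", "CREATE", "MKDIR", "SYMLINK", "MKNOD",
    "REMOVE", "RMDIR", "RENAME", "LINK", "READDIR", "READDIRPLUS",
    "FSSTAT", "FSINFO", "PATHCONF", "COMMIT",
]

# Matches the dispatcher lines seen in the log, e.g.
# "kernel: nfsd_dispatch: vers 3 proc 6"
DISPATCH_RE = re.compile(r"nfsd_dispatch: vers (\d+) proc (\d+)")


def tally_nfs3_calls(log_text):
    """Count NFSv3 calls per procedure name in a syslog excerpt.

    Hypothetical helper: it only looks at nfsd_dispatch lines and
    ignores every other svc:/nfsd: message.
    """
    counts = Counter()
    for match in DISPATCH_RE.finditer(log_text):
        vers, proc = int(match.group(1)), int(match.group(2))
        if vers != 3:
            continue  # the table above is only valid for NFSv3
        if proc < len(NFS3_PROCS):
            counts[NFS3_PROCS[proc]] += 1
        else:
            counts["proc%d" % proc] += 1
    return counts
```

Feeding it the saved syslog text (for example the contents of /var/log/messages from the time of a hang) gives a quick picture of whether the traffic just before the hang was mostly READ, WRITE, or metadata calls.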





2003-09-06 07:26:01

by nfsmailinglist

[permalink] [raw]
Subject: RE: 'random' diskless clients hangs

NIC: Bus 2, device 8, function 0:
Ethernet controller: Intel Corp. 82801BD PRO/100 VE (LOM) Ethernet
Controller (rev 129).

Driver is the default from kernel linux-2.4.20-18.7:
CONFIG_EEPRO100=y

Kernel customisations:
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_ROOT_NFS=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_M686=y
All network drivers are built into the kernel.


-----Original Message-----
From: Trond Myklebust [mailto:[email protected]]
Sent: vrijdag 5 september 2003 21:51
To: [email protected]
Cc: [email protected]
Subject: Re: [NFS] 'random' diskless clients hangs


>>>>> " " == nfsmailinglist <[email protected]> writes:

> The clients still hang every few hours. There is nothing in the
> logs. When we change the rsize and wsize to 2048, the clients
> no longer crash but clients and server become extremely slow.

That is usually a sign of a networking problem. What kind of
NIC+driver are you using on the clients?

Also, what does that 'custom' mean on your client kernels? Have you
done anything other than to enable nfsroot?

Cheers,
Trond





2003-09-06 18:27:30

by Bogdan Costescu

[permalink] [raw]
Subject: RE: 'random' diskless clients hangs

On Sat, 6 Sep 2003 [email protected] wrote:

> Driver is the default from kernel linux-2.4.20-18.7:
> CONFIG_EEPRO100=y

Could you also try using the e100 driver? eepro100 seems to be
unmaintained lately while the guys at Intel actively support e100.

-

Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [email protected]




2003-09-09 04:16:23

by nfsmailinglist

[permalink] [raw]
Subject: RE: 'random' diskless clients hangs

Help!! We're close to giving up on NFS.

We have tried multiple NICs and drivers on both server and client (eepro100,
e100, e1000).
We have tried both UDP and TCP.
We have tried multiple versions of nfs-utils.
We have tried multiple versions of the kernel.
We have tried multiple clients.
We have tried multiple switches.

Clients keep crashing unless we use rsize and wsize <= 2048, but the load on
server and clients then gets very high.







2003-09-09 05:48:58

by Matt C

[permalink] [raw]
Subject: RE: 'random' diskless clients hangs

We can't have you give up on NFS, now can we? :)

1. check out the output of 'netstat -sw' on both the client and server.
Look for the 'packet reassemblies failed' counter. If it's incrementing at any
considerable rate, you're dropping packets somewhere in transit. This
generally indicates network issues, so I'd look into that.

2. try a network card with a different chipset. While I've had good luck
with the intel cards, it doesn't hurt to try a 3com instead. If nothing
else, if changing the card _type_ makes a difference in your problems, it
helps us narrow down the problem.

3. enable the NMI watchdog on the clients. This will help catch some
deadlock conditions, and may give us an OOPS instead of a lockup. To
enable this, you add 'nmi_watchdog=1' to your kernel commandline. Since
you're nfsroot booting, you'd probably add this to the append= line in
your /tftpboot/pxelinux.cfg/ config file.

4. try an older kernel on the client to see if older kernels are stable.
I'd recommend 2.4.18, since it has worked well for us in the past. It
certainly has its bugs, but it's largely stable.

5. enable remote syslog, and enable nfs/rpc debugging on the clients by
echoing 32767 into /proc/sys/sunrpc/{nfs,rpc}_debug. This will give you
better information about what NFS traffic is leading up to your lockup.
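
For step 1, "incrementing at any considerable rate" can be measured rather than eyeballed. The following is a minimal sketch under stated assumptions: it assumes that the reassembly-failure figure netstat prints comes from the ReasmFails field of the Ip: lines in /proc/net/snmp, and the function names are made up for illustration.

```python
import time


def ip_counters(snmp_text):
    """Parse the 'Ip:' header/value line pair of /proc/net/snmp-style text."""
    ip_lines = [line.split() for line in snmp_text.splitlines()
                if line.startswith("Ip:")]
    if len(ip_lines) < 2:
        raise ValueError("no 'Ip:' header/value pair found")
    names, values = ip_lines[0][1:], ip_lines[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def reasm_fail_rate(before_text, after_text, seconds):
    """Failed IP reassemblies per second between two snapshots."""
    before = ip_counters(before_text)
    after = ip_counters(after_text)
    return (after["ReasmFails"] - before["ReasmFails"]) / seconds


def sample_rate(path="/proc/net/snmp", interval=10.0):
    """Take two snapshots `interval` seconds apart on a live Linux box.

    Assumption: `path` is readable and has the usual Ip: line pair.
    """
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return reasm_fail_rate(before, after, interval)
```

Calling sample_rate() on both the client and the server while NFS traffic is flowing gives a number to compare; a rate that stays at zero makes lost fragments a less likely explanation.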

Hope this stuff helps you some. It's what I'd be doing to troubleshoot
issues like yours.

-matt






2003-09-09 15:53:00

by Bernd Schubert

[permalink] [raw]
Subject: Re: 'random' diskless clients hangs

On Monday 08 September 2003 20:37, [email protected] wrote:
> Help!! We're close to giving up on NFS.
>
> We have tried multiple NICs and drivers on both server and client
> (eepro100, e100, e1000).
> We have tried both UDP and TCP.
> We have tried multiple versions of nfs-utils.
> We have tried multiple versions of the kernel.
> We have tried multiple clients.
> We have tried multiple switches.
>

Have you already tried connecting the server and one client directly, using
a crossover cable with no switch in between?

Also, is your network really running full-duplex? You can check with mii-tool.

I would also test whether the same problems happen on clients that do not use
an NFS root, e.g. by exporting only /usr and /home from the server and
mounting those two directories on a client with a local Linux installation.
Btw, I think I saw in one of your mails that you are using 4096 bytes for rsize
and wsize by default. Setting this to 8192 should make the problem show up
sooner. I guess later on (when everything is working) you won't want to use
4096 bytes anyway.
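
One reason a larger rsize could make a network problem show up sooner, at least on UDP mounts, is that each READ reply is split into more IP fragments, and losing any one fragment loses the whole reply. The following is a rough sketch of that arithmetic, not a measurement. It assumes a 1500-byte Ethernet MTU, IPv4 without options, and 128 bytes of RPC/NFS reply overhead; that last figure is read off the server log earlier in this thread, where a 4096-byte READ appears to go out as a 4224-byte sendto.

```python
import math

IP_HEADER = 20      # bytes, IPv4 header without options
UDP_HEADER = 8      # bytes
RPC_OVERHEAD = 128  # assumption: 4224 - 4096, taken from the log in this thread


def udp_fragments(rsize, mtu=1500, rpc_overhead=RPC_OVERHEAD):
    """Estimate how many IP fragments one UDP READ reply needs."""
    datagram = UDP_HEADER + rpc_overhead + rsize
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)


if __name__ == "__main__":
    for rsize in (1024, 2048, 4096, 8192):
        print(rsize, udp_fragments(rsize))
```

Under these assumptions the estimate is 1 fragment for rsize=1024, 2 for 2048, 3 for 4096 and 6 for 8192. This does not apply to TCP mounts, where the stream is segmented instead of fragmented.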

Cheers,
Bernd


