2007-06-14 23:25:29

by mike

Subject: Trying to determine why my NFS connection goes away

THE ISSUE (easily reproducible by generating a lot of file I/O on the client with
nhfsstone; eventually my "normal" web load triggers it too)

I will dump as much information as possible... I really want to make
sure that my setup is as close to optimal as possible.

This is the output from dmesg that concerns me:

nfs: server raid01 not responding, still trying
nfs: server raid01 not responding, still trying
nfs: server raid01 not responding, still trying
nfs: server raid01 OK
nfs: server raid01 OK
nfs: server raid01 OK
nfs: server raid01 not responding, still trying
nfs: server raid01 not responding, still trying
nfs: server raid01 not responding, still trying
nfs: server raid01 OK
nfs: server raid01 OK

However, from everything I can see, all the right RPC services are running.

[root@web03 ~]# rpcinfo -u raid01 mount
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting

[root@web03 ~]# rpcinfo -u raid01 portmap
program 100000 version 2 ready and waiting

[root@web03 ~]# rpcinfo -u raid01 status
program 100024 version 1 ready and waiting

[root@web03 ~]# rpcinfo -p raid01
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100021 1 udp 32771 nlockmgr
100021 3 udp 32771 nlockmgr
100021 4 udp 32771 nlockmgr
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100021 1 tcp 44102 nlockmgr
100021 3 tcp 44102 nlockmgr
100021 4 tcp 44102 nlockmgr
100005 1 udp 32767 mountd
100005 1 tcp 32767 mountd
100005 2 udp 32767 mountd
100005 2 tcp 32767 mountd
100005 3 udp 32767 mountd
100005 3 tcp 32767 mountd
100024 1 udp 32765 status
100024 1 tcp 32765 status
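The rpcinfo listing above can also be checked mechanically. Below is a minimal Python sketch, assuming the five-column `rpcinfo -p` layout shown above; the helper name is illustrative, and the `needed` set is an assumption about what an NFSv3-over-TCP mount talks to, not an authoritative list. A complete listing only shows that the services are registered; it says nothing about how quickly they answer under load.

```python
# Sketch: parse `rpcinfo -p <server>` output and check for registrations.
def registered_services(rpcinfo_text):
    """Return a set of (name, version, proto) tuples from `rpcinfo -p` output."""
    services = set()
    for line in rpcinfo_text.splitlines():
        fields = line.split()
        # Data rows look like "100003 3 tcp 2049 nfs"; the header row is
        # skipped because its first field is not numeric.
        if len(fields) == 5 and fields[0].isdigit():
            _program, vers, proto, _port, name = fields
            services.add((name, int(vers), proto))
    return services


# Excerpt of the raid01 output above.
sample = """\
program vers proto port
100000 2 tcp 111 portmapper
100003 3 udp 2049 nfs
100003 3 tcp 2049 nfs
100005 3 tcp 32767 mountd
100021 4 tcp 44102 nlockmgr
"""

svc = registered_services(sample)
# Illustrative assumption: services an NFSv3-over-TCP mount talks to.
needed = {("portmapper", 2, "tcp"), ("mountd", 3, "tcp"), ("nfs", 3, "tcp")}
print("missing:", sorted(needed - svc))  # prints "missing: []"
```

On a real client the text would come from running `rpcinfo -p raid01`.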

I have even increased the number of nfsd threads on the server.

THE SPECS:

Client mounts server:
raid01:/home on /local/home type nfs
(rw,noatime,nfsvers=3,rsize=16384,wsize=16384,hard,intr,timeo=10,addr=192.168.1.151)

Server

/etc/default/nfs-kernel-server:

# Number of servers to start up
RPCNFSDCOUNT=24

# Runtime priority of server (see nice(1))
RPCNFSDPRIORITY=-15

# Options for rpc.mountd.
# If you have a port-based firewall, you might want to set up
# a fixed port here using the --port option. For more information,
# see rpc.mountd(8) or http://wiki.debian.org/?SecuringNFS
RPCMOUNTDOPTS="-p 32767"

/etc/default/nfs-common:
# Options for rpc.statd.
# Should rpc.statd listen on a specific port? This is especially useful
# when you have a port-based firewall. To use a fixed port, set
# this variable to a statd argument like: "--port 4000 --outgoing-port 4001".
# For more information, see rpc.statd(8) or
# http://wiki.debian.org/?SecuringNFS
STATDOPTS="--port 32765 --outgoing-port 32766"

# Some kernels need a separate lockd daemon; most don't. Set this if you
# want to force an explicit choice for some reason.
NEED_LOCKD=

# Do you want to start the idmapd daemon? It is only needed for NFSv4.
NEED_IDMAPD="no"

# Do you want to start the gssd daemon? It is required for Kerberos mounts.
NEED_GSSD="no"

Network info:
Server: MTU 1500 (current NIC won't support jumbo)
Client: MTU 6000
The entire LAN is gigabit

OS info:
Ubuntu Edgy Eft (all up to date) amd64
Linux kernel 2.6.21.5

Usage info:
Server serves 3 moderately busy web servers and a couple of other, less busy servers.

Hardware:
Client is a dual-core Opteron 2.2 GHz, 2 GB RAM, running only
PHP+FastCGI/web server. Broadcom Corporation NetXtreme BCM5704 Gigabit
Ethernet.

Server is a dual-core Xeon 3050 (2.2 GHz), 2 GB RAM, Areca RAID5 with 4x750 GB
SATA2, dedicated basically to NFS. Intel Corporation 82573E Gigabit
Ethernet Controller.

Is there any more information I could supply? I am willing to
try anything here. I have upgraded my kernels to the latest stable
release in case I was hitting a minor bug... no
such luck.

Thanks in advance!
- mike

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2007-06-15 02:35:56

by Trond Myklebust

Subject: Re: Trying to determine why my NFS connection goes away

On Thu, 2007-06-14 at 16:25 -0700, mike wrote:
> THE ISSUE (easily repeatable by doing a bunch of file I/O on the client - using
> nhfsstone, eventually my "normal" web load hits it too)
>
> I will dump as much information as possible... I really want to make
> sure that I have the most optimal setup.
>
> This is the output from dmesg that concerns me:
>
> nfs: server raid01 not responding, still trying
> nfs: server raid01 not responding, still trying
> nfs: server raid01 not responding, still trying
> nfs: server raid01 OK
> nfs: server raid01 OK
> nfs: server raid01 OK
> nfs: server raid01 not responding, still trying
> nfs: server raid01 not responding, still trying
> nfs: server raid01 not responding, still trying
> nfs: server raid01 OK
> nfs: server raid01 OK

It sounds as if you are using UDP mounts in a situation where you
probably should be using TCP mounts.

Cheers
Trond



2007-06-15 06:33:15

by mike

Subject: Re: Trying to determine why my NFS connection goes away

No, if you look at the mount parameters, I am explicitly stating TCP.
I have compiled kernels with TCP as well; I haven't used UDP in
forever. I do also have NFSv4 available, but I have had odd issues with it in the
past, and I don't know how stable it is for simple mounts now (I don't
need anything crazy, or thousands of client machines, etc.).

I should also mention that I applied the settings below to try to ensure I don't
have any leftover sockets, to cut down on the amount of TCP
overhead, etc. I was having the same NFS issues before this too
(at least the same messages), so it's not due to that (at least,
there's no reason to consider it the culprit).

This is the same sysctl config on both the client and the server.

[root@web03 ~]# cat /etc/sysctl.conf
# Uncomment the next line to enable TCP/IP SYN cookies
net.ipv4.tcp_syncookies=1

# others
vm.swappiness=10
net.ipv4.ip_local_port_range = 1024 65000

# Controls IP packet forwarding
net.ipv4.ip_forward = 1
net.ipv4.conf.default.forwarding=1

# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1

# suggested
net.ipv4.icmp_echo_ignore_broadcasts = 1
# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_source_route = 0
# Uncomment the next line to enable Spoof protection (reverse-path filter)
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.accept_redirects = 0

# my guess
net.ipv4.tcp_max_orphans = 1024
# Decrease the time default value for tcp_keepalive_time connection
net.ipv4.tcp_keepalive_time = 300
# Turn off the tcp_window_scaling
net.ipv4.tcp_window_scaling = 0
# Turn off the tcp_sack
net.ipv4.tcp_sack = 0
# Turn off the tcp_timestamps
net.ipv4.tcp_timestamps = 0

net.ipv4.tcp_rfc1337 = 1
net.core.rmem_default = 262144
net.core.rmem_max = 262144

# These ensure that TIME_WAIT ports either get reused or closed fast.
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_tw_recycle = 1

# TCP memory
net.core.rmem_max = 16777216
net.core.rmem_default = 16777216
net.core.netdev_max_backlog = 262144
net.core.somaxconn = 262144

net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2

# If you have a lot of large file uploads, increasing the receive
# buffers will help.
net.ipv4.tcp_rmem = 4096 87380 524288
net.core.rmem_max = 1048576

# Increasing the TCP send and receive buffers will increase the
# performance a lot if (and only if) you have a lot of large files to
# send.
net.ipv4.tcp_wmem = 4096 65536 524288
net.core.wmem_max = 1048576

# you shouldn't be using conntrack on a heavily loaded server anyway,
# but these are suitably high for our uses, ensuring that if conntrack
# gets turned on, the box doesn't die
net.ipv4.ip_conntrack_max = 1048576
net.nf_conntrack_max = 1048576
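In the file above, `net.core.rmem_max` is set three times and `net.core.rmem_default` twice. Assuming the file is applied top to bottom (as `sysctl -p` does), only the last assignment of each key takes effect. A minimal sketch that reports the effective values and the repeated keys; the helper name is illustrative:

```python
# Sketch: compute which value each key in a sysctl.conf ends up with.
def effective_sysctl(conf_text):
    """Return (values, repeated): last value per key, and keys set more than once."""
    values, repeated = {}, set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments ('#' or ';'), and lines without '='.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if key in values:
            repeated.add(key)
        values[key] = value  # a later line overwrites an earlier one
    return values, repeated


# The receive-buffer lines from the sysctl.conf above, in file order.
sample = """\
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.rmem_max = 16777216
net.core.rmem_default = 16777216
net.ipv4.tcp_rmem = 4096 87380 524288
net.core.rmem_max = 1048576
"""

values, repeated = effective_sysctl(sample)
print(values["net.core.rmem_max"], values["net.core.rmem_default"])
# prints "1048576 16777216"
print(sorted(repeated))
# prints "['net.core.rmem_default', 'net.core.rmem_max']"
```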



2007-06-15 09:41:46

by Bernd Schubert

Subject: Re: Trying to determine why my NFS connection goes away

On Friday 15 June 2007 08:33:17 mike wrote:
> no - if you look at the mount parameters i am explicitly stating TCP.

Sure?

On Friday 15 June 2007 01:25:31 mike wrote:
> THE SPECS:
>
> Client mounts server:
> raid01:/home on /local/home type nfs
> (rw,noatime,nfsvers=3,rsize=16384,wsize=16384,hard,intr,timeo=10,addr=192.168.1.151)
>

I don't see tcp here.


Cheers,
Bernd
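Bernd's check can be automated with a small helper. A minimal sketch, assuming options are comma-separated and name the transport either as a bare `tcp`/`udp` flag (as passed to mount) or as `proto=` (as shown in /proc/mounts); the helper name is illustrative. A result of `None` only means no transport was requested; which transport the client then uses depends on the kernel and mount versions, so it does not by itself prove the mount is on UDP.

```python
# Sketch: report which transport, if any, a mount option string names.
def transport(options):
    """Return 'tcp' or 'udp' if the options name a transport, else None."""
    for opt in options.strip().strip("()").split(","):
        if opt in ("tcp", "udp"):
            return opt  # mount-style flag
        if opt.startswith("proto="):
            return opt.split("=", 1)[1]  # /proc/mounts style
    return None


original = "(rw,noatime,nfsvers=3,rsize=16384,wsize=16384,hard,intr,timeo=10,addr=192.168.1.151)"
print(transport(original))  # prints "None": no transport was requested
```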


2007-06-15 09:56:57

by mike

Subject: Re: Trying to determine why my NFS connection goes away

Okay, wow. I guess this is left over from my NFSv4 setup, where I didn't have to
define it explicitly.

I tried lsof | grep TCP | grep raid01 (and the same with UDP) and did not see
anything, so maybe I don't know the simplest way to trace it, but I
will remount it with tcp and see if it happens again.

If it does... I will surely be embarrassed :)
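One likely reason the lsof search comes up empty: lsof finds sockets through process file descriptors, and the NFS client's transport socket is owned by the kernel rather than by a process, so (as far as I know) it does not appear there. `netstat -tn` or /proc/net/tcp should show it instead. A minimal sketch that scans /proc/net/tcp for connections to port 2049 (the nfs port in the rpcinfo output earlier); it assumes a little-endian host such as this amd64 client, the helper names are illustrative, and the sample client address 192.168.1.153 is made up:

```python
import socket
import struct

NFS_PORT = 2049  # the nfs port registered in the rpcinfo output


def decode(addr_port):
    """Decode '9701A8C0:0801' into ('192.168.1.151', 2049).

    Assumes a little-endian host: the kernel prints the IPv4 address as a
    native-endian 32-bit hex number and the port as plain hex.
    """
    hex_ip, hex_port = addr_port.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def nfs_connections(proc_net_tcp):
    """Yield (local, remote, state) for IPv4 TCP sockets to remote port 2049."""
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote = decode(fields[1]), decode(fields[2])
        if remote[1] == NFS_PORT:
            state = "ESTABLISHED" if fields[3] == "01" else "state " + fields[3]
            yield local, remote, state


# Hypothetical sample (trailing columns abbreviated): a made-up client
# 192.168.1.153 talking to raid01 (192.168.1.151), plus an unrelated socket.
sample = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 9901A8C0:0320 9701A8C0:0801 01 00000000:00000000 00:00000000 00000000     0        0 0 1
   1: 9901A8C0:0050 0201A8C0:C350 01 00000000:00000000 00:00000000 00000000     0        0 0 1
"""

for conn in nfs_connections(sample):
    print(conn)
# prints "(('192.168.1.153', 800), ('192.168.1.151', 2049), 'ESTABLISHED')"
```

On the client itself, pass it the contents of `/proc/net/tcp`, e.g. `open("/proc/net/tcp").read()`.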



2007-06-15 10:00:11

by mike

Subject: Re: Trying to determine why my NFS connection goes away

On 6/15/07, Bernd Schubert <[email protected]> wrote:
> Sure?

Well, bad news.

I just ran it. I cleared dmesg beforehand, and only a minute or so of
nhfsstone already triggered this many errors:

[root@web03 /]# dmesg -c
nfs: server raid01 not responding, still trying
nfs: server raid01 not responding, still trying
nfs: server raid01 not responding, still trying
nfs: server raid01 not responding, still trying
nfs: server raid01 not responding, still trying
nfs: server raid01 not responding, still trying
nfs: server raid01 not responding, still trying
nfs: server raid01 OK
nfs: server raid01 not responding, still trying
nfs: server raid01 OK
nfs: server raid01 OK
nfs: server raid01 OK
nfs: server raid01 OK
nfs: server raid01 OK
nfs: server raid01 OK
nfs: server raid01 OK


2007-06-15 13:49:26

by Trond Myklebust

Subject: Re: Trying to determine why my NFS connection goes away

On Thu, 2007-06-14 at 23:33 -0700, mike wrote:
> no - if you look at the mount parameters i am explicitly stating TCP.
> i have compiled kernels with TCP as well. i haven't used UDP in
> forever. also i do have NFSv4 available but have had odd issues in the
> past, i don't know how stable it is for simple mounts now (i don't
> need anything crazy, or thousands of client machines, etc)

Then why on earth are you using timeo=10? Use the default timeo=600 and
it will all work.

Using overly short timeouts on TCP is completely unnecessary: TCP
provides reliable delivery of data. Furthermore, a timeout forces the
client to keep disconnecting and reconnecting, and that is why you are
seeing those messages.

Trond
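Trond's point can be put into numbers with a rough model. Assuming the NFS-over-TCP behavior described in the current nfs(5) man page (`timeo` in tenths of a second, each retry waiting `timeo` longer than the last up to 600 seconds, and "server not responding" logged after `retrans` retries), timeo=10 gives the server only about 6 seconds before the message appears, against about 6 minutes with timeo=600. The exact arithmetic in the 2.6.21 client may differ, and the helper name is illustrative; treat this as a sketch.

```python
# Rough model of how long an NFS-over-TCP request waits before the client
# logs "server ... not responding", per the backoff described in nfs(5).
def seconds_until_not_responding(timeo, retrans):
    """timeo is in tenths of a second; retrans is the number of retries."""
    total = 0  # tenths of a second
    # The first transmission plus `retrans` retransmissions; each wait grows
    # linearly by `timeo`, capped at 600 seconds (6000 tenths).
    for attempt in range(1, retrans + 2):
        total += min(attempt * timeo, 6000)
    return total / 10


# Values from this thread (retrans=2 as shown in /proc/mounts).
print(seconds_until_not_responding(10, 2))   # prints 6.0
print(seconds_until_not_responding(600, 2))  # prints 360.0
```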



2007-06-15 22:32:45

by mike

Subject: Re: Trying to determine why my NFS connection goes away

On 6/15/07, Trond Myklebust <[email protected]> wrote:
> Then why on earth are you using timeo=10? Use the default timeo=600 and
> it will all work.
>
> Using overly short timeouts on TCP is completely unnecessary: TCP
> provides reliable delivery of data. Furthermore, a timeout forces the
> client to keep disconnecting and reconnecting, and that is why you are
> seeing those messages.

I think that was one of the suggestions (I've Googled a lot, tried
different things, etc.) when looking at tuning NFS and such.

I have now mounted it as such:

raid01:/home on /local/home type nfs
(rw,nodev,_netdev,noatime,nfsvers=3,tcp,rsize=16384,wsize=16384,hard,intr,nolock,timeo=600,addr=192.168.1.151)

I still consistently and easily get these messages:

[root@web03 ~]# dmesg -c
nfs: server raid01 not responding, still trying
nfs: server raid01 OK
nfs: server raid01 not responding, still trying
nfs: server raid01 OK

This happens after only a minute or two of nhfsstone (with default parameters).

Any other suggestions? Increasing/decreasing wsize/rsize? Changing
hard to soft? I'm willing to try anything.

Thanks.


2007-06-16 04:03:56

by Trond Myklebust

Subject: Re: Trying to determine why my NFS connection goes away

On Fri, 2007-06-15 at 15:32 -0700, mike wrote:
> On 6/15/07, Trond Myklebust <[email protected]> wrote:
> > Then why on earth are you using timeo=10? Use the default timeo=600 and
> > it will all work.
> >
> > Using overly short timeouts on TCP is completely unnecessary: TCP
> > provides reliable delivery of data. Furthermore, a timeout forces the
> > client to keep disconnecting and reconnecting, and that is why you are
> > seeing those messages.
>
> I think that was one of the suggestions (I've Googled a lot, tried
> different things, etc.) when looking at tuning NFS and such.
>
> I have now mounted it as such:
>
> raid01:/home on /local/home type nfs
> (rw,nodev,_netdev,noatime,nfsvers=3,tcp,rsize=16384,wsize=16384,hard,intr,nolock,timeo=600,addr=192.168.1.151)
>
> I still consistently and easily get the
>
> [root@web03 ~]# dmesg -c
> nfs: server raid01 not responding, still trying
> nfs: server raid01 OK
> nfs: server raid01 not responding, still trying
> nfs: server raid01 OK
>
> After only a minute or two of nhfsstone (with default parameters)
>
> Any other suggestions? Increasing/decreasing wsize/rsize? Changing
> hard to soft? I'm willing to try anything.

Did you check that the above parameters are indeed set in /proc/mounts?

Trond
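Trond's check can be scripted as a comparison between the options given to mount and what the kernel reports. A minimal sketch, using the remount options and the /proc/mounts line mike posts in this thread; only options spelled identically on both sides are compared (so aliases such as `nfsvers`/`vers` or `tcp`/`proto=tcp` are skipped), and `addr` is ignored by default because one side is an IP address and the other a hostname. The helper names are illustrative:

```python
# Sketch: compare the options requested at mount time with what the
# kernel reports in /proc/mounts.
def parse_options(opt_string):
    """Split 'rw,hard,timeo=10' into a dict; bare flags map to True."""
    opts = {}
    for item in opt_string.strip().strip("()").split(","):
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return opts


def mismatches(requested, effective, ignore=("addr",)):
    """Return {key: (requested, effective)} for same-named options that differ."""
    req, eff = parse_options(requested), parse_options(effective)
    return {
        key: (req[key], eff[key])
        for key in req
        if key in eff and key not in ignore and req[key] != eff[key]
    }


requested = ("(rw,nodev,_netdev,noatime,nfsvers=3,tcp,rsize=16384,wsize=16384,"
             "hard,intr,nolock,timeo=600,addr=192.168.1.151)")
effective = ("rw,nodev,noatime,vers=3,rsize=16384,wsize=16384,hard,intr,nolock,"
             "proto=tcp,timeo=10,retrans=2,sec=sys,addr=raid01")
print(mismatches(requested, effective))  # prints "{'timeo': ('600', '10')}"
```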



2007-06-16 04:28:16

by mike

Subject: Re: Trying to determine why my NFS connection goes away

[root@web03 ~]# cat /proc/mounts

raid01:/home /local/home nfs rw,nodev,noatime,vers=3,rsize=16384,wsize=16384,hard,intr,nolock,proto=tcp,timeo=10,retrans=2,sec=sys,addr=raid01 0 0

Okay, it did not accept the timeo... Do you see any other parameters
in there I should tune at the same time?

On another client (CentOS, 2.6.9 kernel), this is /proc/mounts:
raid01:/home /home nfs rw,noatime,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=raid01 0 0

It does not seem to suffer from this specific issue. I am not sure if
it is running as well as it can, but it definitely does not report
these "server not responding" messages. I am running nhfsstone on it right
now, and it is not reporting any NFS disconnections at all.

Other than the timeo= being left out (and, I guess, the default of 600
kicking in), do you see anything else I should do? I am open to any
suggestions for getting the most optimized setup I can.

Thanks again.
- mike


2007-06-16 06:57:01

by mike

Subject: Re: Trying to determine why my NFS connection goes away

On 6/15/07, Trond Myklebust <[email protected]> wrote:
> Did you check that the above parameters are indeed set in /proc/mounts?

Okay, I am having a weird issue.

It is totally ignoring the timeo=XXX parameter, both in fstab and when
mounting manually:

[root@web03 /]# mount -t nfs -o timeo=600 raid01:/home /home

[root@web03 /]# mount
raid01:/home on /local/home type nfs (rw,timeo=600,addr=192.168.1.151)

[root@web03 /]# cat /proc/mounts
raid01:/home /local/home nfs rw,vers=3,rsize=16384,wsize=16384,hard,intr,nolock,proto=tcp,timeo=10,retrans=2,sec=sys,addr=raid01 0 0

Is there some sort of weird caching or something I should be aware
of? If I do not have timeo defined in fstab, it still has
timeo=10; if I define it as 600, it still keeps it at 10. I didn't
even specify tcp on the command line above, and it kept the old
settings...

I suppose if I can figure out why it is ignoring my requests, I
can try to configure it with the longer timeouts, etc...


2007-06-16 15:06:13

by Trond Myklebust

Subject: Re: Trying to determine why my NFS connection goes away

On Fri, 2007-06-15 at 23:57 -0700, mike wrote:
> On 6/15/07, Trond Myklebust <[email protected]> wrote:
> > Did you check that the above parameters are indeed set in /proc/mounts?
>
> okay i am having a weird issue.
>
> it is totally ignoring the timeo=XXX parameter in fstab, and even
> manually mounting:
>
> [root@web03 /]# mount -t nfs -o timeo=600 raid01:/home /home
>
> [root@web03 /]# mount
> raid01:/home on /local/home type nfs (rw,timeo=600,addr=192.168.1.151)
>
> [root@web03 /]# cat /proc/mounts
> raid01:/home /local/home nfs
> rw,vers=3,rsize=16384,wsize=16384,hard,intr,nolock,proto=tcp,timeo=10,retrans=2,sec=sys,addr=raid01
> 0 0
>
> is there some sort of weird caching or something i should be aware
> about? if i do not have timeo defined in fstab, it still has the
> timeo=10, if i define it as 600, it still keeps it as 10. i didn't
> even define tcp on the command line above and it kept the old
> settings...
>
> i suppose if i can figure out why this is ignoring my requests then i
> can try to configure it to have the longer timeouts, etc...

You've got the same filesystem mounted somewhere else on the system and
so the NFS client will reuse the superblock from the other mount.

Trond
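A quick way to spot the situation Trond describes is to look for the same NFS device listed under more than one mount point in /proc/mounts. A minimal sketch; matching on the device string is only a heuristic (the kernel does not decide sharing by comparing these strings), the helper name is illustrative, and the second sample line is hypothetical. If a duplicate turns up, unmounting every copy before remounting should let the new options take effect.

```python
from collections import defaultdict


def shared_nfs_devices(proc_mounts):
    """Return {device: [mount points]} for NFS devices listed more than once."""
    by_device = defaultdict(list)
    for line in proc_mounts.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            by_device[fields[0]].append(fields[1])
    return {dev: mps for dev, mps in by_device.items() if len(mps) > 1}


# First line as posted in this thread; the second (/home) line is hypothetical.
sample = """\
raid01:/home /local/home nfs rw,vers=3,rsize=16384,wsize=16384,hard,intr,nolock,proto=tcp,timeo=10,retrans=2,sec=sys,addr=raid01 0 0
raid01:/home /home nfs rw,vers=3,rsize=16384,wsize=16384,hard,intr,nolock,proto=tcp,timeo=10,retrans=2,sec=sys,addr=raid01 0 0
"""

print(shared_nfs_devices(sample))
# prints "{'raid01:/home': ['/local/home', '/home']}"
```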



2007-06-16 23:32:01

by mike

Subject: Re: Trying to determine why my NFS connection goes away

Weird.

Well, I wound up trying NFSv4, and I have not been able to make it
report any errors yet.

from /proc/mounts:
raid01:/home /local/home nfs4 rw,vers=4,rsize=32768,wsize=32768,hard,intr,proto=tcp,timeo=600,retrans=3,sec=sys,addr=raid01 0 0

(I also upped the rsize/wsize to 32k...)

If you see anything weird with that, please let me know. Otherwise, for
now I'll just keep it on NFSv4 and hope for the best... if I have to go
back to NFSv3, though, I may be bringing it up again :)

- mike

