2003-08-23 12:04:40

by Hans-Peter Jansen

Subject: nfs errors clutter up logs after 2.4.20 -> 2.4.22-pre10

Hi,

after a kernel update, all my (diskless) systems clutter up the syslog
with:

Aug 23 13:41:42 yogi kernel: nfs: server shrek not responding, still trying
Aug 23 13:41:42 yogi last message repeated 8 times
Aug 23 13:41:44 yogi kernel: nfs: server xxx.yy.zz.102 not responding, still trying
Aug 23 13:41:45 yogi kernel: nfs: server xxx.yy.zz.102 OK
Aug 23 13:41:45 yogi kernel: nfs: server shrek OK
Aug 23 13:41:45 yogi last message repeated 8 times
Aug 23 13:43:17 yogi kernel: nfs: server shrek not responding, still trying
Aug 23 13:43:17 yogi last message repeated 8 times
Aug 23 13:43:19 yogi kernel: nfs: server xxx.yy.zz.102 not responding, still trying
Aug 23 13:43:20 yogi kernel: nfs: server xxx.yy.zz.102 OK
Aug 23 13:43:20 yogi kernel: nfs: server shrek OK
Aug 23 13:43:20 yogi last message repeated 8 times
Aug 23 13:44:53 yogi kernel: nfs: server shrek not responding, still trying
Aug 23 13:44:53 yogi last message repeated 15 times
Aug 23 13:44:54 yogi kernel: nfs: server xxx.yy.zz.102 not responding, still trying
Aug 23 13:44:54 yogi kernel: nfs: server shrek not responding, still trying
Aug 23 13:44:56 yogi kernel: nfs: server shrek OK
Aug 23 13:44:56 yogi kernel: nfs: server xxx.yy.zz.102 OK
Aug 23 13:44:56 yogi kernel: nfs: server shrek OK
Aug 23 13:44:56 yogi last message repeated 15 times

The funny part is the continuous switching between name and IP address.
I would rule out named problems, but who knows... It happens predictably
with 2.4.22-pre10, while it doesn't happen with 2.4.20.

Does anybody have an idea what's going on here?

TIA,
Pete


-------------------------------------------------------
This SF.net email is sponsored by: VM Ware
With VMware you can run multiple operating systems on a single machine.
WITHOUT REBOOTING! Mix Linux / Windows / Novell virtual machines
at the same time. Free trial click here:http://www.vmware.com/wl/offer/358/0
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-08-23 18:02:32

by Trond Myklebust

Subject: Re: nfs errors clutter up logs after 2.4.20 -> 2.4.22-pre10

>>>>> " " == Hans-Peter Jansen <[email protected]> writes:

> Hi, after kernel update all my (diskless) systems clutter up
> the syslog with:

> Aug 23 13:41:42 yogi kernel: nfs: server shrek not responding,

Does the following patch help?

Cheers,
Trond

--- linux-2.4.22-up/net/sunrpc/timer.c.orig 2002-08-14 17:52:52.000000000 -0700
+++ linux-2.4.22-up/net/sunrpc/timer.c 2003-08-23 10:26:36.000000000 -0700
@@ -8,7 +8,7 @@

#define RPC_RTO_MAX (60*HZ)
#define RPC_RTO_INIT (HZ/5)
-#define RPC_RTO_MIN (2)
+#define RPC_RTO_MIN (HZ/10)

void
rpc_init_rtt(struct rpc_rtt *rt, long timeo)
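
To put the changed constant in wall-clock terms, here is a minimal sketch, assuming HZ = 100 (the usual value on 2.4 i386 kernels; other architectures differ). The names RPC_RTO_MIN_OLD, RPC_RTO_MIN_NEW, jiffies_to_ms and print_rto_constants are made up for illustration and are not kernel identifiers.

```c
#include <stdio.h>

/* Assumption: HZ = 100 ticks per second; adjust for other builds. */
#define HZ 100

#define RPC_RTO_MAX     (60*HZ)
#define RPC_RTO_INIT    (HZ/5)
#define RPC_RTO_MIN_OLD (2)       /* value before the patch */
#define RPC_RTO_MIN_NEW (HZ/10)   /* value after the patch */

/* Convert a jiffies count to milliseconds for the assumed HZ. */
static long jiffies_to_ms(long j)
{
    return j * 1000L / HZ;
}

/* Hypothetical helper: print each constant in jiffies and milliseconds. */
static void print_rto_constants(void)
{
    printf("RPC_RTO_MAX       %5ld jiffies = %6ld ms\n",
           (long)RPC_RTO_MAX, jiffies_to_ms(RPC_RTO_MAX));
    printf("RPC_RTO_INIT      %5ld jiffies = %6ld ms\n",
           (long)RPC_RTO_INIT, jiffies_to_ms(RPC_RTO_INIT));
    printf("RPC_RTO_MIN (old) %5ld jiffies = %6ld ms\n",
           (long)RPC_RTO_MIN_OLD, jiffies_to_ms(RPC_RTO_MIN_OLD));
    printf("RPC_RTO_MIN (new) %5ld jiffies = %6ld ms\n",
           (long)RPC_RTO_MIN_NEW, jiffies_to_ms(RPC_RTO_MIN_NEW));
}
```

Under that assumption the patch raises the floor from 20 ms to 100 ms, while the initial value stays at 200 ms and the ceiling at 60 s.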



2003-08-24 00:12:54

by Bernd Schubert

Subject: Re: nfs errors clutter up logs after 2.4.20 -> 2.4.22-pre10

On Saturday 23 August 2003 19:28, Trond Myklebust wrote:
> >>>>> " " == Hans-Peter Jansen <[email protected]> writes:
> > Hi, after kernel update all my (diskless) systems clutter up
> > the syslog with:
> >
> > Aug 23 13:41:42 yogi kernel: nfs: server shrek not responding,
>
> Does the following patch help?
>
> Cheers,
> Trond
>
> --- linux-2.4.22-up/net/sunrpc/timer.c.orig 2002-08-14 17:52:52.000000000 -0700
> +++ linux-2.4.22-up/net/sunrpc/timer.c 2003-08-23 10:26:36.000000000 -0700
> @@ -8,7 +8,7 @@
>
> #define RPC_RTO_MAX (60*HZ)
> #define RPC_RTO_INIT (HZ/5)
> -#define RPC_RTO_MIN (2)
> +#define RPC_RTO_MIN (HZ/10)
>
> void
> rpc_init_rtt(struct rpc_rtt *rt, long timeo)
>


Hello,

I also just had this problem with 2.4.22-rc2 and your patch fixes it.

Thanks,
Bernd




2003-08-25 08:45:10

by Bernd Schubert

Subject: Re: [NFS] nfs errors clutter up logs after 2.4.20 -> 2.4.22-pre10

On Sunday 24 August 2003 02:12, Bernd Schubert wrote:
> On Saturday 23 August 2003 19:28, Trond Myklebust wrote:
> > >>>>> " " == Hans-Peter Jansen <[email protected]> writes:
> > > Hi, after kernel update all my (diskless) systems clutter up
> > > the syslog with:
> > >
> > > Aug 23 13:41:42 yogi kernel: nfs: server shrek not responding,
> >
> > Does the following patch help?
> >
> > Cheers,
> > Trond
> >
> > --- linux-2.4.22-up/net/sunrpc/timer.c.orig 2002-08-14 17:52:52.000000000 -0700
> > +++ linux-2.4.22-up/net/sunrpc/timer.c 2003-08-23 10:26:36.000000000 -0700
> > @@ -8,7 +8,7 @@
> >
> > #define RPC_RTO_MAX (60*HZ)
> > #define RPC_RTO_INIT (HZ/5)
> > -#define RPC_RTO_MIN (2)
> > +#define RPC_RTO_MIN (HZ/10)
> >
> > void
> > rpc_init_rtt(struct rpc_rtt *rt, long timeo)
>
> Hello,
>
> I also just had this problem with 2.4.22-rc2 and your patch fixes it.
>
> Thanks,
> Bernd
>

Since 2.4.22-rc4 has already been released today, shouldn't this patch be sent to
Marcelo as soon as possible?

Regards,
Bernd

2003-08-26 21:29:51

by Hans-Peter Jansen

Subject: Re: nfs errors clutter up logs after 2.4.20 -> 2.4.22-pre10

Hi Trond,

sorry 'bout the huge delay, but let me acknowledge: perfect cut.

Many thanks.

Since it's very late in 2.4.22 and crucial for all NFS users, I hope
it's fine to CC you, Marcelo: if you haven't noticed already, I highly
recommend including this fix before 2.4.22-final (at least it hasn't
found its way into your tree until now...). Be prepared for many
more of these requests if you don't ;-).

if (Tronds_fault || !Tronds_fault) /* silly test */
        Tronds_heroscore += 100;

Pete

On Saturday 23 August 2003 19:28, Trond Myklebust wrote:
> >>>>> " " == Hans-Peter Jansen <[email protected]> writes:
> > Hi, after kernel update all my (diskless) systems clutter up
> > the syslog with:
> >
> > Aug 23 13:41:42 yogi kernel: nfs: server shrek not
> > responding,
>
> Does the following patch help?
>
> Cheers,
> Trond
>
> --- linux-2.4.22-up/net/sunrpc/timer.c.orig 2002-08-14 17:52:52.000000000 -0700
> +++ linux-2.4.22-up/net/sunrpc/timer.c 2003-08-23 10:26:36.000000000 -0700
> @@ -8,7 +8,7 @@
>
> #define RPC_RTO_MAX (60*HZ)
> #define RPC_RTO_INIT (HZ/5)
> -#define RPC_RTO_MIN (2)
> +#define RPC_RTO_MIN (HZ/10)
>
> void
> rpc_init_rtt(struct rpc_rtt *rt, long timeo)
>




2003-08-27 10:50:14

by Hans-Peter Jansen

Subject: Re: nfs errors clutter up logs after 2.4.20 -> 2.4.22-pre10

Hi Trond,

unfortunately, it hasn't fixed the phenomenon completely, but now
it takes much longer to trigger it. After rebuilding and restarting
tonight at 00:52, the first occurrence was at 09:57, the next ones at
10:00 and 10:05.

Do you think it's worth playing with bigger RPC_RTO_MIN values?

Cheers,
Pete

On Tuesday 26 August 2003 23:28, Hans-Peter Jansen wrote:
> Hi Trond,
>
> sorry 'bout the huge delay, but let me acknowledge: perfect cut.
>
> Many thanks.
>
> Since it's very late in 2.4.22 and crucial for all NFS users, I

Sorry, haven't tracked lkml/bk changelog sufficiently. Let's hope
to get this right in .23.

> hope, it's fine to CC you, Marcelo: if you haven't noticed already,
> I highly recommend including this fix before 2.4.22-final (at least
> it hasn't found its way into your tree until now...). Be prepared
> for many more of these requests, if you don't ;-).
>
> if (Tronds_fault || !Tronds_fault) /* silly test */
> Tronds_heroscore += 100;
>
> Pete
>
> On Saturday 23 August 2003 19:28, Trond Myklebust wrote:
> > >>>>> " " == Hans-Peter Jansen <[email protected]> writes:
> > > Hi, after kernel update all my (diskless) systems clutter
> > > up the syslog with:
> > >
> > > Aug 23 13:41:42 yogi kernel: nfs: server shrek not
> > > responding,
> >
> > Does the following patch help?
> >
> > Cheers,
> > Trond
> >
> > --- linux-2.4.22-up/net/sunrpc/timer.c.orig 2002-08-14 17:52:52.000000000 -0700
> > +++ linux-2.4.22-up/net/sunrpc/timer.c 2003-08-23 10:26:36.000000000 -0700
> > @@ -8,7 +8,7 @@
> >
> > #define RPC_RTO_MAX (60*HZ)
> > #define RPC_RTO_INIT (HZ/5)
> > -#define RPC_RTO_MIN (2)
> > +#define RPC_RTO_MIN (HZ/10)
> >
> > void
> > rpc_init_rtt(struct rpc_rtt *rt, long timeo)



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2003-09-04 20:09:17

by Trond Myklebust

Subject: Re: nfs errors clutter up logs after 2.4.20 -> 2.4.22-pre10

>>>>> " " == Matt C <Matt> writes:

> Hi Trond- I applied this patch to one of my clients, and it
> seems to have helped. However, I'm still getting a lot of the:

> server x not responding, still trying
> server x OK
> server x not responding, still trying
> server x OK

> when the server is heavily loaded. what values would be
> reasonable for the #defines below in order to increase this
> timeout significantly? I tried changing RPC_RTO_MIN to (HZ/5)
> and RPC_RTO_INIT to (HZ/2), but that didn't seem to make a big
> difference. I'm having a hard time understanding the
> rpc_update_rtt() function.

The rpc_update_rtt() function is pretty standard. It is documented in
a paper by Van Jacobson from 1998. See:
http://www-nrg.ee.lbl.gov/nrg-papers.html

To summarize: that function is just measuring the round-trip-time
(rtt) for each request, and then using that to build up an estimate
for the old 'timeo' mount option. The estimate takes into account
random fluctuations by also maintaining an estimate of the error on
the rtt. RPC_RTO_MIN is just a minimum value for that estimated error.
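
The summary above can be sketched in a few lines of C. This is a simplified, stand-alone illustration of a Van Jacobson style estimator, not the code from net/sunrpc/timer.c; the struct and function names (rtt_est, rtt_init, rtt_update, rtt_timeout), the fixed-point scaling, the starting values and HZ = 100 are all assumptions made for the example.

```c
/* Assumption: HZ = 100; all times below are in jiffies. */
#define HZ 100
#define RTO_MIN (HZ/10)   /* floor for the estimated error */
#define RTO_MAX (60*HZ)   /* ceiling for the computed timeout */

/* Hypothetical estimator state; not the kernel's struct rpc_rtt. */
struct rtt_est {
    long srtt;    /* smoothed round-trip time, scaled by 8 */
    long sdrtt;   /* smoothed mean deviation of the rtt, scaled by 4 */
};

/* Start with no rtt history and the error estimate at its floor. */
static void rtt_init(struct rtt_est *e)
{
    e->srtt = 0;
    e->sdrtt = RTO_MIN;
}

/* Fold one measured round-trip time m (in jiffies) into the estimates. */
static void rtt_update(struct rtt_est *e, long m)
{
    if (m <= 0)
        m = 1;                 /* count anything shorter than a tick as one tick */

    m -= e->srtt >> 3;         /* m is now the error against the smoothed rtt */
    e->srtt += m;              /* move the smoothed rtt by 1/8 of the error */

    if (m < 0)
        m = -m;                /* the deviation uses the magnitude of the error */
    m -= e->sdrtt >> 2;
    e->sdrtt += m;             /* move the deviation by 1/4 of the difference */

    if (e->sdrtt < RTO_MIN)
        e->sdrtt = RTO_MIN;    /* never let the estimated error drop below the floor */
}

/* Timeout estimate: smoothed rtt plus the (scaled) estimated error. */
static long rtt_timeout(const struct rtt_est *e)
{
    long rto = (e->srtt >> 3) + e->sdrtt;

    return rto > RTO_MAX ? RTO_MAX : rto;
}
```

With a steady 8-jiffy round trip this sketch settles at a timeout of 18 jiffies (8 plus the floor of 10). A single 200-jiffy sample then raises it to 232, and a sample that large arrives long after an 18-jiffy timeout would have expired.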

Note: If it is getting the estimate wrong, then that indicates that a
graph of your round trip time will show large 'spikes' at certain
moments. I would suggest that you ought to look into why this is the
case.
Are you, for instance, running with enough NFS server threads? Are the
switches/routers between you and the server up to the task, or are
they perhaps dropping large numbers of packets?

Cheers,
Trond



2003-09-04 20:38:20

by Trond Myklebust

Subject: Re: nfs errors clutter up logs after 2.4.20 -> 2.4.22-pre10

>>>>> " " == Trond Myklebust <[email protected]> writes:

> The rpc_update_rtt() function is pretty standard. It is
> documented in a paper by Van Jacobson from 1998. See:
> http://www-nrg.ee.lbl.gov/nrg-papers.html

Errr... The above date should of course be 1988. Paper is called
"Congestion Avoidance and Control".

Cheers,
Trond



2003-09-04 18:33:28

by Matt C

Subject: Re: nfs errors clutter up logs after 2.4.20 -> 2.4.22-pre10

Hi Trond-

I applied this patch to one of my clients, and it seems to have helped.
However, I'm still getting a lot of the:

server x not responding, still trying
server x OK
server x not responding, still trying
server x OK

when the server is heavily loaded. what values would be reasonable for the
#defines below in order to increase this timeout significantly? I tried
changing RPC_RTO_MIN to (HZ/5) and RPC_RTO_INIT to (HZ/2), but that didn't
seem to make a big difference. I'm having a hard time understanding the
rpc_update_rtt() function.

thanks!

-matt



On 23 Aug 2003, Trond Myklebust wrote:

> >>>>> " " == Hans-Peter Jansen <[email protected]> writes:
>
> > Hi, after kernel update all my (diskless) systems clutter up
> > the syslog with:
>
> > Aug 23 13:41:42 yogi kernel: nfs: server shrek not responding,
>
> Does the following patch help?
>
> Cheers,
> Trond
>
> --- linux-2.4.22-up/net/sunrpc/timer.c.orig 2002-08-14 17:52:52.000000000 -0700
> +++ linux-2.4.22-up/net/sunrpc/timer.c 2003-08-23 10:26:36.000000000 -0700
> @@ -8,7 +8,7 @@
>
> #define RPC_RTO_MAX (60*HZ)
> #define RPC_RTO_INIT (HZ/5)
> -#define RPC_RTO_MIN (2)
> +#define RPC_RTO_MIN (HZ/10)
>
> void
> rpc_init_rtt(struct rpc_rtt *rt, long timeo)
>




2003-09-05 11:05:48

by Hans-Peter Jansen

Subject: Re: nfs errors clutter up logs after 2.4.20 -> 2.4.22-pre10

Hi Trond,

I've tried to #define RPC_RTO_MIN (HZ/8) also, but this doesn't
change the picture.

Here are a few more random data points (sorry for the large gaps,
I cannot keep pace with you guys _all_ the time ;-):

With 2.4.20 and before, I noticed this message only when taking
down the server, which is expected.

While beta-testing the current kernel of the upcoming SuSE 9.0 (2.4.21-59),
prepared for diskless use in my 8.2 environment yesterday, I found that it
doesn't show this problem either. It's a heavily tweaked 2.4.21 with many .22
fixes included. It happens for me with 2.4.22-pre10, so it smells like some
change in (late) 2.4.22 triggers it.

Pete

On Wednesday 27 August 2003 12:50, Hans-Peter Jansen wrote:
> Hi Trond,
>
> unfortunately, it hasn't fixed the phenomenon completely, but now
> it takes much longer to trigger it. After rebuilding and restarting
> tonight at 00:52, the first occurrence was at 09:57, the next ones
> at 10:00 and 10:05.
>
> Do you think it's worth playing with bigger RPC_RTO_MIN values?
>
> Cheers,
> Pete




2003-09-05 16:57:58

by Trond Myklebust

Subject: Re: nfs errors clutter up logs after 2.4.20 -> 2.4.22-pre10

>>>>> " " == Hans-Peter Jansen <[email protected]> writes:

> While beta-testing the current kernel of the upcoming SuSE 9.0
> (2.4.21-59), prepared for diskless use in my 8.2 environment
> yesterday, I found that it doesn't show this problem either. It's
> a heavily tweaked 2.4.21 with many .22 fixes included. It happens
> for me with 2.4.22-pre10, so it smells like some change in (late)
> 2.4.22 triggers it.


Hmm... Most of those changes were only supposed to affect the TCP
performance. Do you see the problem with 2.4.22-pre9? If so, what
about 2.4.22-pre3?

Also, does the following patch make any difference? It should tweak
the RTT timer updates slightly.

Cheers,
Trond


diff -u --recursive --new-file linux-2.4.22-rc3/include/linux/sunrpc/xprt.h linux-2.4.22-00-resends/include/linux/sunrpc/xprt.h
--- linux-2.4.22-rc3/include/linux/sunrpc/xprt.h 2003-08-23 14:39:24.000000000 -0400
+++ linux-2.4.22-00-resends/include/linux/sunrpc/xprt.h 2003-09-05 10:52:00.000000000 -0400
@@ -115,7 +115,7 @@

long rq_xtime; /* when transmitted */
int rq_ntimeo;
- int rq_nresend;
+ int rq_ntrans;
};
#define rq_svec rq_snd_buf.head
#define rq_slen rq_snd_buf.len
diff -u --recursive --new-file linux-2.4.22-rc3/net/sunrpc/xprt.c linux-2.4.22-00-resends/net/sunrpc/xprt.c
--- linux-2.4.22-rc3/net/sunrpc/xprt.c 2003-07-29 19:54:19.000000000 -0400
+++ linux-2.4.22-00-resends/net/sunrpc/xprt.c 2003-09-05 10:56:43.000000000 -0400
@@ -138,18 +138,21 @@
static int
__xprt_lock_write(struct rpc_xprt *xprt, struct rpc_task *task)
{
+ struct rpc_rqst *req = task->tk_rqstp;
if (!xprt->snd_task) {
if (xprt->nocong || __xprt_get_cong(xprt, task)) {
xprt->snd_task = task;
- if (task->tk_rqstp)
- task->tk_rqstp->rq_bytes_sent = 0;
+ if (req) {
+ req->rq_bytes_sent = 0;
+ req->rq_ntrans++;
+ }
}
}
if (xprt->snd_task != task) {
dprintk("RPC: %4d TCP write queue full\n", task->tk_pid);
task->tk_timeout = 0;
task->tk_status = -EAGAIN;
- if (task->tk_rqstp && task->tk_rqstp->rq_nresend)
+ if (req && req->rq_ntrans)
rpc_sleep_on(&xprt->resend, task, NULL, NULL);
else
rpc_sleep_on(&xprt->sending, task, NULL, NULL);
@@ -183,9 +186,12 @@
return;
}
if (xprt->nocong || __xprt_get_cong(xprt, task)) {
+ struct rpc_rqst *req = task->tk_rqstp;
xprt->snd_task = task;
- if (task->tk_rqstp)
- task->tk_rqstp->rq_bytes_sent = 0;
+ if (req) {
+ req->rq_bytes_sent = 0;
+ req->rq_ntrans++;
+ }
}
}

@@ -592,7 +598,7 @@
if (!xprt->nocong) {
xprt_adjust_cwnd(xprt, copied);
__xprt_put_cong(xprt, req);
- if (!req->rq_nresend) {
+ if (req->rq_ntrans == 1) {
int timer = rpcproc_timer(clnt, task->tk_msg.rpc_proc);
if (timer)
rpc_update_rtt(&clnt->cl_rtt, timer, (long)jiffies - req->rq_xtime);
@@ -1063,7 +1069,7 @@
goto out;

xprt_adjust_cwnd(req->rq_xprt, -ETIMEDOUT);
- req->rq_nresend++;
+ __xprt_put_cong(xprt, req);

dprintk("RPC: %4d xprt_timer (%s request)\n",
task->tk_pid, req ? "pending" : "backlogged");
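
Read on its own, the patch above appears to count every transmission of a request in rq_ntrans and to feed the rtt estimator only when the reply belongs to a request that was sent exactly once, since a reply to a retransmitted request cannot be matched to one particular transmission. A minimal stand-alone sketch of that rule follows; struct request and the helper names are made up for illustration and are not kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical, simplified request record; not the kernel's struct rpc_rqst. */
struct request {
    long xtime;    /* tick at which the most recent transmission went out */
    int  ntrans;   /* how many times the request has been transmitted */
};

/* Record a transmission or retransmission at tick `now`. */
static void request_transmit(struct request *r, long now)
{
    r->xtime = now;
    r->ntrans++;
}

/*
 * Called when a reply arrives at tick `now`. Stores the round-trip time
 * in *sample and returns true only if the request was sent exactly once;
 * otherwise *sample is left untouched and the reply yields no measurement.
 */
static bool request_rtt_sample(const struct request *r, long now, long *sample)
{
    if (r->ntrans != 1)
        return false;

    *sample = now - r->xtime;
    return true;
}
```

A caller would pass the sample on to the rtt estimator only when request_rtt_sample() returns true, so that ambiguous replies do not distort the timeout estimate.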

