2004-01-30 00:47:15

by NeilBrown

[permalink] [raw]
Subject: Re: [NFS] NFS rpc and stale handles on 2.6.x servers

On Sunday January 25, [email protected] wrote:
> The below is being reported, on and off, when hitting nfs-kernel-servers
> running on 2.6.0 and 2.6.1 Could someone tell me if this is smoe bug or
> what? Thanks
> RPC request reserved 0 but used 124
>
> Debian sarge
> nfs-kernel-server
> am-untils
> nfsv3 over tcp
>

stale file handles is a known bug that is fixed in the but BK and will
be in 2.6.3.

The "RPC request reserved 0 ..." is very odd. It does immediately
indicate a major problem, but it should be fixed, if only I could
figure out what was causing it.

I might put come more info into the message so future bug reports will
tell me more.

Thanks,
NeilBrown


2004-01-30 01:27:12

by Mike Fedyk

[permalink] [raw]
Subject: Re: [NFS] NFS rpc and stale handles on 2.6.x servers

On Fri, Jan 30, 2004 at 11:46:31AM +1100, Neil Brown wrote:
> On Sunday January 25, [email protected] wrote:
> > The below is being reported, on and off, when hitting nfs-kernel-servers
> > running on 2.6.0 and 2.6.1 Could someone tell me if this is smoe bug or
> > what? Thanks
> > RPC request reserved 0 but used 124
> >
> > Debian sarge
> > nfs-kernel-server
> > am-untils
> > nfsv3 over tcp
> >
>
> stale file handles is a known bug that is fixed in the but BK and will
> be in 2.6.3.

do you mean 2.6.2?

I've merged the nfsd stale file handles into 2.6.1-bk2 and it is working
fine on a nfs server here...

2004-01-30 11:55:42

by Roman Kagan

[permalink] [raw]
Subject: Re: [NFS] NFS rpc and stale handles on 2.6.x servers

On Fri, Jan 30, 2004 at 10:16:05AM +0000, Neil Brown wrote:
> The "RPC request reserved 0 ..." is very odd. It does immediately
> indicate a major problem, but it should be fixed, if only I could
> figure out what was causing it.

In case it helps: having enabled svcsock debugging by

echo $[0x0100] > /proc/sys/sunrpc/rpc_debug

I've noticed that those messages always appear in the same pattern:

svc: server c7a3d200 waiting for data (to = 3600000)
svc: socket c6dbfb00(inet c769f220), write_space busy=0
svc: socket c2a9fac0 TCP data ready (svsk c6dbf980)
svc: socket c2a9fac0 served by daemon c7a3d200
svc: socket c2a9fac0 TCP data ready (svsk c6dbf980)
svc: socket c2a9fac0 busy, not enqueued
svc: server c7a3d200, socket c6dbf980, inuse=1
svc: tcp_recv c6dbf980 data 1 conn 0 close 0
svc: socket c6dbf980 recvfrom(c6dbf9d8, 0) = 4
svc: TCP record, 2584 bytes
svc: socket c6dbf980 recvfrom(c6d06a18, 1512) = 2584
svc: TCP complete record (2584 bytes)
svc: socket c2a9fac0 served by daemon c6efe000
svc: got len=2584
svc: socket c2a9fac0 busy, not enqueued
svc: socket c6dbf980 sendto([c436b000 140... ], 140) = 140 (addr 43e17cc1)
svc: socket c2a9fac0 busy, not enqueued
svc: server c7a3d200 waiting for data (to = 3600000)
svc: server c7a3d200, socket c25007a0, inuse=1
svc: tcp_recv c25007a0 data 0 conn 0 close 1
svc: svc_delete_socket(c25007a0)
svc: server socket destroy delayed
svc: got len=0
RPC request reserved 0 but used 140
svc: releasing dead socket
svc: server c7a3d200 waiting for data (to = 3600000)


Note that "tcp_recv" with this set of parameters (data=0 conn=0 close=1)
is always correlated with "RPC request reserved ...", and also the
"used" request length matches the message length in "sendto" on the
seemingly unrelated socket.

Unfortunately I don't understand the code well enough to make a better
bug report, but feel free to ask me to test your patches if you can't
reproduce the problem in your setup.

Roman.