2007-09-07 15:49:55

by Wolfgang Walter

[permalink] [raw]
Subject: problems with lockd in 2.6.22.6

Hello,

we upgraded the kernel of an nfs-server from 2.6.17.11 to 2.6.22.6. Since then we get the message

lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from ^\\236^\É^D

1) These random characters in the second line are caused by a bug in svc_tcp_accept.
I already posted this patch on [email protected]:

Signed-off-by: Wolfgang Walter <[email protected]>
--- linux-2.6.22.6/net/sunrpc/svcsock.c	2007-08-27 18:10:14.000000000 +0200
+++ linux-2.6.22.6w/net/sunrpc/svcsock.c	2007-09-03 18:27:30.000000000 +0200
@@ -1090,7 +1090,7 @@
 				serv->sv_name);
 			printk(KERN_NOTICE
 			       "%s: last TCP connect from %s\n",
-			       serv->sv_name, buf);
+			       serv->sv_name, __svc_print_addr(sin, buf, sizeof(buf)));
 	}
 	/*
 	 * Always select the oldest socket. It's not fair,


with this patch applied one gets something like

lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from 10.11.0.12, port=784


2) The number of nfsd threads we are running on the machine is 1024, so this is not
the problem. It seems, though, that in the case of lockd svc_tcp_accept does not
check the number of nfsd threads but the number of lockd threads, which is one.
As soon as the number of open lockd sockets surpasses 80, this message gets logged.
This usually happens every evening when a lot of people shut down their workstations.

3) For unknown reasons these sockets then remain open. In the morning, when people
start their workstations again, we therefore not only get a lot of these messages
again, but often the nfs-server no longer works properly. Restarting the
nfs-daemon is a workaround.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


2007-09-07 16:19:45

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [NFS] problems with lockd in 2.6.22.6

On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> Hello,
>
> we upgraded the kernel of an nfs-server from 2.6.17.11 to 2.6.22.6. Since then we get the message
>
> lockd: too many open TCP sockets, consider increasing the number of nfsd threads
> lockd: last TCP connect from ^\\236^\É^D
>
> 1) These random characters in the second line are caused by a bug in svc_tcp_accept.
> I already posted this patch on [email protected]:

Thanks, I've applied that. (The bug is a little subtle: there are
actually two previous __svc_print_addr() calls which might have
initialized "buf" correctly, and it's not obvious that the second isn't
always called, since it's in a dprintk, which is a macro that expands
into a printk inside a conditional.)

> with this patch applied one gets something like
>
> lockd: too many open TCP sockets, consider increasing the number of
> nfsd threads
> lockd: last TCP connect from 10.11.0.12, port=784
>
>
> 2) The number of nfsd threads we are running on the machine is 1024.
> So this is not the problem. It seems, though, that in the case of
> lockd svc_tcp_accept does not check the number of nfsd threads but the
> number of lockd threads which is one. As soon as the number of open
> lockd sockets surpasses 80 this message gets logged. This usually
> happens every evening when a lot of people shut down their workstations.

So to be clear: there's not an actual problem here other than that the
logs are getting spammed? (Not that that isn't a problem in itself.)

> 3) For unknown reasons these sockets then remain open. In the morning
> when people start their workstations again we therefore not only get a
> lot of these messages again but often the nfs-server does not properly
> work any more. Restarting the nfs-daemon is a workaround.

Hm, thanks.

--b.

2007-09-07 18:05:32

by Wolfgang Walter

[permalink] [raw]
Subject: Re: problems with lockd in 2.6.22.6

On Friday, 7 September 2007 18:19, you wrote:
> On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > we upgraded the kernel of an nfs-server from 2.6.17.11 to 2.6.22.6. Since
> > then we get the message
> >
> > lockd: too many open TCP sockets, consider increasing the number of nfsd
> > threads
> > lockd: last TCP connect from ^\\236^\É^D
> >
> > 2) The number of nfsd threads we are running on the machine is 1024.
> > So this is not the problem. It seems, though, that in the case of
> > lockd svc_tcp_accept does not check the number of nfsd threads but the
> > number of lockd threads which is one. As soon as the number of open
> > lockd sockets surpasses 80 this message gets logged. This usually
> > happens every evening when a lot of people shut down their workstations.
>
> So to be clear: there's not an actual problem here other than that the
> logs are getting spammed? (Not that that isn't a problem in itself.)

When more than 80 nfs clients try to lock files at the same time, then it
probably would be.

> > 3) For unknown reasons these sockets then remain open. In the morning
> > when people start their workstations again we therefore not only get a
> > lot of these messages again but often the nfs-server does not properly
> > work any more. Restarting the nfs-daemon is a workaround.
>
> Hm, thanks.

I don't know if the lockd thing is the reason, though.

2.6.22.6 per se runs stably (no oops, no crash, etc.), but kernel nfs seems
to be a little bit unstable. 2.6.17.11 ran for months without any nfsd-related
problems, whereas in 2.6.22.6 nfs needs to be restarted almost every day.
Sometimes this fails with

lockd_down: lockd failed to exit, clearing pid
nfsd: last server has exited
nfsd: unexporting all filesystems
lockd_up: makesock failed, error=-98

after which the server must be rebooted.

I think there is something wrong with lockd, because there are no problems
over the day. It is in the morning, when a lot of people log into their
machines and start their desktops, that things go wrong (I think KDE locks
its config files when it reads them).

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

2007-09-07 20:39:16

by Steinar H. Gunderson

[permalink] [raw]
Subject: Re: problems with lockd in 2.6.22.6

On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> 2) The number of nfsd threads we are running on the machine is 1024. So this is not
> the problem.

Wow, 1024? How many clients do you have? I'd normally assume 16 or 32 or
something was a reasonable value...

/* Steinar */
--
Homepage: http://www.sesse.net/

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-09-08 01:01:06

by Wolfgang Walter

[permalink] [raw]
Subject: Re: problems with lockd in 2.6.22.6

On Friday 07 September 2007, Steinar H. Gunderson wrote:
> On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> > 2) The number of nfsd threads we are running on the machine is 1024. So this is not
> > the problem.
>
> Wow, 1024? How many clients do you have? I'd normally assume 16 or 32 or
> something was a reasonable value...

We have about 200 clients.

When we installed it several years ago (with kernel 2.4 and with udp), I
doubled the number of threads every two weeks until there were no stalls
(client side) in the morning any more. This was with 512 threads. I didn't
notice any disadvantages, so I doubled it again.

When switching to 2.6 and tcp I just left it unchanged.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


2007-09-08 18:20:52

by Wolfgang Walter

[permalink] [raw]
Subject: Re: [NFS] problems with lockd in 2.6.22.6

On Friday 07 September 2007, J. Bruce Fields wrote:
> On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > 3) For unknown reasons these sockets then remain open. In the morning
> > when people start their workstations again we therefore not only get a
> > lot of these messages again but often the nfs-server does not properly
> > work any more. Restarting the nfs-daemon is a workaround.
>

I wonder why these sockets remain open, by the way, even if they aren't used
for days. Such a socket only gets deleted when the 81st socket must be opened.

If I do not misunderstand the idea, then temporary sockets should be destroyed
by svc_age_temp_sockets after some time without activity.

Now I wonder how svc_age_temp_sockets works. Does it ever close and delete a
temporary socket at all?


static void
svc_age_temp_sockets(unsigned long closure)
{
	struct svc_serv *serv = (struct svc_serv *)closure;
	struct svc_sock *svsk;
	struct list_head *le, *next;
	LIST_HEAD(to_be_aged);

	dprintk("svc_age_temp_sockets\n");

	if (!spin_trylock_bh(&serv->sv_lock)) {
		/* busy, try again 1 sec later */
		dprintk("svc_age_temp_sockets: busy\n");
		mod_timer(&serv->sv_temptimer, jiffies + HZ);
		return;
	}

	list_for_each_safe(le, next, &serv->sv_tempsocks) {
		svsk = list_entry(le, struct svc_sock, sk_list);

		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
			continue;
		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
			continue;
####
Doesn't this mean that svsk->sk_inuse must be zero, which means that SK_DEAD
is set? And wouldn't that mean that svc_delete_socket has already been called
for that socket (and it probably is already closed)? And wouldn't that mean
that the svc_sock_enqueue called later does not make any sense (it checks for
SK_DEAD)?
####
		atomic_inc(&svsk->sk_inuse);
		list_move(le, &to_be_aged);
		set_bit(SK_CLOSE, &svsk->sk_flags);
		set_bit(SK_DETACHED, &svsk->sk_flags);
	}
	spin_unlock_bh(&serv->sv_lock);

	while (!list_empty(&to_be_aged)) {
		le = to_be_aged.next;
		/* fiddling the sk_list node is safe 'cos we're SK_DETACHED */
		list_del_init(le);
		svsk = list_entry(le, struct svc_sock, sk_list);

		dprintk("queuing svsk %p for closing, %lu seconds old\n",
			svsk, get_seconds() - svsk->sk_lastrecv);

		/* a thread will dequeue and close it soon */
		svc_sock_enqueue(svsk);
		svc_sock_put(svsk);
	}

	mod_timer(&serv->sv_temptimer, jiffies + svc_conn_age_period * HZ);
}

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts