Hello,
as already described old temporary sockets (client is gone) of lockd aren't
closed after some time. So, with enough clients and some time gone, there
are 80 open dangling sockets and you start getting messages of the form:
lockd: too many open TCP sockets, consider increasing the number of nfsd threads.
If I understand the code then the intention was that the server closes
temporary sockets after about 6 to 12 minutes:
a timer is started which calls svc_age_temp_sockets every 6 minutes.
svc_age_temp_sockets:
if a socket is marked OLD it gets closed.
sockets which are not marked as OLD are marked OLD
every time the sockets receives something OLD is cleared.
But svc_age_temp_sockets never closes any socket though because it only
closes sockets with svsk->sk_inuse == 0. This seems to be a bug.
Here is a patch against 2.6.22.6 which changes the test to
svsk->sk_inuse <= 0 which was probably meant. The patched kernel runs fine
here. Unused sockets get closed (after 6 to 12 minutes)
Signed-off-by: Wolfgang Walter <[email protected]>
--- ../linux-2.6.22.6/net/sunrpc/svcsock.c 2007-08-27 18:10:14.000000000 +0200
+++ net/sunrpc/svcsock.c 2007-09-11 11:07:13.000000000 +0200
@@ -1572,7 +1575,7 @@
if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
continue;
- if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
+ if (atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))
continue;
atomic_inc(&svsk->sk_inuse);
list_move(le, &to_be_aged);
As svc_age_temp_sockets did not do anything before this change may trigger
hidden bugs.
To be true I don't see why this check
(atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))
is needed at all (it can only be an optimation) as this fields change after
the check. In svc_tcp_accept there is no such check when a temporary socket
is closed.
Regards,
--
Wolfgang Walter
Studentenwerk M?nchen
Anstalt des ?ffentlichen Rechts
On Wed, Sep 12, 2007 at 02:07:10PM +0200, Wolfgang Walter wrote:
> as already described old temporary sockets (client is gone) of lockd aren't
> closed after some time. So, with enough clients and some time gone, there
> are 80 open dangling sockets and you start getting messages of the form:
>
> lockd: too many open TCP sockets, consider increasing the number of nfsd threads.
Thanks for working on this problem!
> If I understand the code then the intention was that the server closes
> temporary sockets after about 6 to 12 minutes:
>
> a timer is started which calls svc_age_temp_sockets every 6 minutes.
>
> svc_age_temp_sockets:
> if a socket is marked OLD it gets closed.
> sockets which are not marked as OLD are marked OLD
>
> every time the sockets receives something OLD is cleared.
>
> But svc_age_temp_sockets never closes any socket though because it only
> closes sockets with svsk->sk_inuse == 0. This seems to be a bug.
>
> Here is a patch against 2.6.22.6 which changes the test to
> svsk->sk_inuse <= 0 which was probably meant. The patched kernel runs fine
> here. Unused sockets get closed (after 6 to 12 minutes)
So the fact that this changes the behavior means that sk_inuse is taking
on negative values. This can't be right--how can something like
svc_sock_put() (which does an atomic_dec_and_test) work in that case?
I wish I had time today to figure out what's going on in this case. But
from a quick through svsock.c for sk_inuse, it looks odd; I'm suspicious
of anything without the stereotyped behavior--initializing to one,
atomic_inc()ing whenever someone takes a reference, and
atomic_dec_and_test()ing whenever someone drops it....
--b.
On Wednesday September 12, [email protected] wrote:
> Hello,
>
> as already described old temporary sockets (client is gone) of lockd aren't
> closed after some time. So, with enough clients and some time gone, there
> are 80 open dangling sockets and you start getting messages of the form:
>
> lockd: too many open TCP sockets, consider increasing the number of nfsd threads.
>
> If I understand the code then the intention was that the server closes
> temporary sockets after about 6 to 12 minutes:
>
> a timer is started which calls svc_age_temp_sockets every 6 minutes.
>
> svc_age_temp_sockets:
> if a socket is marked OLD it gets closed.
> sockets which are not marked as OLD are marked OLD
>
> every time the sockets receives something OLD is cleared.
>
> But svc_age_temp_sockets never closes any socket though because it only
> closes sockets with svsk->sk_inuse == 0. This seems to be a bug.
>
> Here is a patch against 2.6.22.6 which changes the test to
> svsk->sk_inuse <= 0 which was probably meant. The patched kernel runs fine
> here. Unused sockets get closed (after 6 to 12 minutes)
>
> Signed-off-by: Wolfgang Walter <[email protected]>
>
> --- ../linux-2.6.22.6/net/sunrpc/svcsock.c 2007-08-27 18:10:14.000000000 +0200
> +++ net/sunrpc/svcsock.c 2007-09-11 11:07:13.000000000 +0200
> @@ -1572,7 +1575,7 @@
>
> if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
> continue;
> - if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
> + if (atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))
> continue;
> atomic_inc(&svsk->sk_inuse);
> list_move(le, &to_be_aged);
>
>
> As svc_age_temp_sockets did not do anything before this change may trigger
> hidden bugs.
>
> To be true I don't see why this check
>
> (atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))
>
> is needed at all (it can only be an optimation) as this fields change after
> the check. In svc_tcp_accept there is no such check when a temporary socket
> is closed.
Thanks for looking into this.
I think the correct change is to test
if (atomic_read(&svsk->sk_inuse) > 1 || test_bit(SK_BUSY, &svsk->sk_flags))
or even
if (atomic_read(&svsk->sk_inuse) != 1 || test_bit(SK_BUSY, &svsk->sk_flags))
sk_inuse contains a bias of '1' until SK_DEAD is set. So a normal,
active socket will have an inuse count of 1 or more. If it is exactly
1, then either it is SK_DEAD (in which case there is nothing for this
code to do), or it has no users, in which case it is appropriate to
close the socket if it is old.
Note that this test is for "the socket should not be closed", so we
test if it is *not* 1, or > 1.
The tests are needed because we don't want to close a socket that
might be inuse elsewhere. The SK_BUSY bit combined with the sk_inuse
count combine to check if the socket is in use at all or not.
You change effectively disabled the test, as sk_inuse is never <= 0
(except when SK_DEAD is set).
This bug has been present since
commit aaf68cfbf2241d24d46583423f6bff5c47e088b3
Author: NeilBrown <[email protected]>
Date: Thu Feb 8 14:20:30 2007 -0800
(i.e. it is my fault).
So it is in 2.6.21 and later and should probably go to .stable for .21
and .22.
Bruce: for you :-)
---------------------------
Correctly close old nfsd/lockd sockets.
From: NeilBrown <[email protected]>
Commit aaf68cfbf2241d24d46583423f6bff5c47e088b3 added a bias
to sk_inuse, so this test for an unused socket now fails. So no
sockets gets closed because they are old (they might get closed
if the client closed them).
This bug has existed since 2.6.21-rc1.
Thanks to Wolfgang Walter for finding and reporting the bug.
Cc: Wolfgang Walter <[email protected]>
Signed-off-by: Neil Brown <[email protected]>
### Diffstat output
./net/sunrpc/svcsock.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c 2007-09-12 16:05:23.000000000 +0200
+++ ./net/sunrpc/svcsock.c 2007-09-12 16:06:01.000000000 +0200
@@ -1592,7 +1592,8 @@ svc_age_temp_sockets(unsigned long closu
if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
continue;
- if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
+ if (atomic_read(&svsk->sk_inuse) > 1
+ || test_bit(SK_BUSY, &svsk->sk_flags))
continue;
atomic_inc(&svsk->sk_inuse);
list_move(le, &to_be_aged);
On Wed, Sep 12, 2007 at 09:37:29AM -0400, bfields wrote:
> So the fact that this changes the behavior means that sk_inuse is taking
> on negative values.
Uh, no, I misread the tests, sorry. I'm not awake.--b.
Am Mittwoch, 12. September 2007 15:37 schrieb J. Bruce Fields:
> On Wed, Sep 12, 2007 at 02:07:10PM +0200, Wolfgang Walter wrote:
> > as already described old temporary sockets (client is gone) of lockd
> > aren't closed after some time. So, with enough clients and some time
> > gone, there are 80 open dangling sockets and you start getting messages
> > of the form:
> >
> > lockd: too many open TCP sockets, consider increasing the number of nfsd
> > threads.
>
> Thanks for working on this problem!
>
> > If I understand the code then the intention was that the server closes
> > temporary sockets after about 6 to 12 minutes:
> >
> > a timer is started which calls svc_age_temp_sockets every 6 minutes.
> >
> > svc_age_temp_sockets:
> > if a socket is marked OLD it gets closed.
> > sockets which are not marked as OLD are marked OLD
> >
> > every time the sockets receives something OLD is cleared.
> >
> > But svc_age_temp_sockets never closes any socket though because it only
> > closes sockets with svsk->sk_inuse == 0. This seems to be a bug.
> >
> > Here is a patch against 2.6.22.6 which changes the test to
> > svsk->sk_inuse <= 0 which was probably meant. The patched kernel runs
> > fine here. Unused sockets get closed (after 6 to 12 minutes)
>
> So the fact that this changes the behavior means that sk_inuse is taking
> on negative values. This can't be right--how can something like
> svc_sock_put() (which does an atomic_dec_and_test) work in that case?
You probably misread the code.
if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
continue;
This means: any socket where svsk->sk_inuse != 0 or SK_BUSY is set is ignored
by svc_age_temp_sockets: no attempt is made to close the svc.
This seems to be wrong: if svsk->sk_inuse is zero only if svc_delete_socket
has been called for it and will be deleted anyway (probably it is already
closed then).
But the intention of svc_age_temp_sockets is to close open temporary
sockets where no traffic has been received for more than 6 minutes. These
sockets have svsk->sk_inuse >= 1.
My patch does exactly this:
instead of
"skip sockets which are not already deleted or which are busy"
to
"skip sockets which are already deleted or which are busy"
>
> I wish I had time today to figure out what's going on in this case. But
> from a quick through svsock.c for sk_inuse, it looks odd; I'm suspicious
> of anything without the stereotyped behavior--initializing to one,
> atomic_inc()ing whenever someone takes a reference, and
> atomic_dec_and_test()ing whenever someone drops it....
>
Then svc_tcp_accept would be wrong, too (it closes sockets the same way just
without testing for sk_inuse and SK_BUSY).
I think this works because as long as a socket is in sv_tempsocks or
sv_permsocks svsk->sk_inuse can never reach zero. As svc_age_temp_sockets locks
the list nobody can bring svsk->sk_inuse to zero as long as
svc_age_temp_sockets holds the lock. As svc_age_temp_sockets calls
atomic_inc(&svsk->sk_inuse) when holding the lock there is no
problem. (the same is true for svc_tcp_accept).
This is the reason why I doubt that this check for svsk->sk_inuse in
svc_age_temp_sockets is usefull at all. It should be always false.
Regards,
--
Wolfgang Walter
Studentenwerk M?nchen
Anstalt des ?ffentlichen Rechts
On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
> So it is in 2.6.21 and later and should probably go to .stable for .21
> and .22.
>
> Bruce: for you :-)
OK, thanks! But, (as is alas often the case) I'm still confused:
> if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
> continue;
> - if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
> + if (atomic_read(&svsk->sk_inuse) > 1
> + || test_bit(SK_BUSY, &svsk->sk_flags))
> continue;
> atomic_inc(&svsk->sk_inuse);
> list_move(le, &to_be_aged);
What is it that ensures svsk->sk_inuse isn't incremented or SK_BUSY set
after that test? Not all the code that does either of those is under
the same serv->sv_lock lock that this code is.
--b.
On Wednesday 12 September 2007, J. Bruce Fields wrote:
> On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
> > So it is in 2.6.21 and later and should probably go to .stable for .21
> > and .22.
> >
> > Bruce: for you :-)
>
> OK, thanks! But, (as is alas often the case) I'm still confused:
>
> > if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
> > continue;
> > - if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
> > + if (atomic_read(&svsk->sk_inuse) > 1
> > + || test_bit(SK_BUSY, &svsk->sk_flags))
> > continue;
> > atomic_inc(&svsk->sk_inuse);
> > list_move(le, &to_be_aged);
>
> What is it that ensures svsk->sk_inuse isn't incremented or SK_BUSY set
> after that test? Not all the code that does either of those is under
> the same serv->sv_lock lock that this code is.
>
This should not matter - SK_CLOSED may be set at any time.
svc_age_temp_sockets only detaches the socket, sets SK_CLOSED and then
enqueues it. If SK_BUSY is set its already enqueued and svc_sock_enqueue
ensures that it is not enqueued twice.
Regards,
--
Wolfgang Walter
Studentenwerk M?nchen
Anstalt des ?ffentlichen Rechts
On Wed, Sep 12, 2007 at 09:40:57PM +0200, Wolfgang Walter wrote:
> On Wednesday 12 September 2007, J. Bruce Fields wrote:
> > On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
> > > So it is in 2.6.21 and later and should probably go to .stable for .21
> > > and .22.
> > >
> > > Bruce: for you :-)
> >
> > OK, thanks! But, (as is alas often the case) I'm still confused:
> >
> > > if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
> > > continue;
> > > - if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
> > > + if (atomic_read(&svsk->sk_inuse) > 1
> > > + || test_bit(SK_BUSY, &svsk->sk_flags))
> > > continue;
> > > atomic_inc(&svsk->sk_inuse);
> > > list_move(le, &to_be_aged);
> >
> > What is it that ensures svsk->sk_inuse isn't incremented or SK_BUSY set
> > after that test? Not all the code that does either of those is under
> > the same serv->sv_lock lock that this code is.
> >
>
> This should not matter - SK_CLOSED may be set at any time.
>
> svc_age_temp_sockets only detaches the socket, sets SK_CLOSED and then
> enqueues it. If SK_BUSY is set its already enqueued and svc_sock_enqueue
> ensures that it is not enqueued twice.
Oh, got it. And the list manipulation is safe thanks to sv_lock. Neat,
thanks. Can you verify that this solves your problem?
--b.
On Wednesday 12 September 2007, J. Bruce Fields wrote:
> On Wed, Sep 12, 2007 at 09:40:57PM +0200, Wolfgang Walter wrote:
> > On Wednesday 12 September 2007, J. Bruce Fields wrote:
> > > On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
> > > > So it is in 2.6.21 and later and should probably go to .stable for .21
> > > > and .22.
> > > >
> > > > Bruce: for you :-)
> > >
> > > OK, thanks! But, (as is alas often the case) I'm still confused:
> > >
> > > > if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
> > > > continue;
> > > > - if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY,
&svsk->sk_flags))
> > > > + if (atomic_read(&svsk->sk_inuse) > 1
> > > > + || test_bit(SK_BUSY, &svsk->sk_flags))
> > > > continue;
> > > > atomic_inc(&svsk->sk_inuse);
> > > > list_move(le, &to_be_aged);
> > >
> > > What is it that ensures svsk->sk_inuse isn't incremented or SK_BUSY set
> > > after that test? Not all the code that does either of those is under
> > > the same serv->sv_lock lock that this code is.
> > >
> >
> > This should not matter - SK_CLOSED may be set at any time.
> >
> > svc_age_temp_sockets only detaches the socket, sets SK_CLOSED and then
> > enqueues it. If SK_BUSY is set its already enqueued and svc_sock_enqueue
> > ensures that it is not enqueued twice.
>
> Oh, got it. And the list manipulation is safe thanks to sv_lock. Neat,
> thanks. Can you verify that this solves your problem?
>
I'll test it tomorrow. So friday morning I'll know and mail you for sure.
Regards,
--
Wolfgang Walter
Studentenwerk M?nchen
Anstalt des ?ffentlichen Rechts
Am Mittwoch, 12. September 2007 21:55 schrieb J. Bruce Fields:
> On Wed, Sep 12, 2007 at 09:40:57PM +0200, Wolfgang Walter wrote:
> > On Wednesday 12 September 2007, J. Bruce Fields wrote:
> > > On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
> > > > So it is in 2.6.21 and later and should probably go to .stable for
> > > > .21 and .22.
> > > >
> > > > Bruce: for you :-)
> > >
> > > OK, thanks! But, (as is alas often the case) I'm still confused:
> > > > if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
> > > > continue;
> > > > - if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY,
> > > > &svsk->sk_flags)) + if (atomic_read(&svsk->sk_inuse) > 1
> > > > + || test_bit(SK_BUSY, &svsk->sk_flags))
> > > > continue;
> > > > atomic_inc(&svsk->sk_inuse);
> > > > list_move(le, &to_be_aged);
> > >
> > > What is it that ensures svsk->sk_inuse isn't incremented or SK_BUSY set
> > > after that test? Not all the code that does either of those is under
> > > the same serv->sv_lock lock that this code is.
> >
> > This should not matter - SK_CLOSED may be set at any time.
> >
> > svc_age_temp_sockets only detaches the socket, sets SK_CLOSED and then
> > enqueues it. If SK_BUSY is set its already enqueued and svc_sock_enqueue
> > ensures that it is not enqueued twice.
>
> Oh, got it. And the list manipulation is safe thanks to sv_lock. Neat,
> thanks. Can you verify that this solves your problem?
Patch works fine here.
Regards,
--
Wolfgang Walter
Studentenwerk M?nchen
Anstalt des ?ffentlichen Rechts
On Fri, Sep 14, 2007 at 11:12:30AM +0200, Wolfgang Walter wrote:
> Am Mittwoch, 12. September 2007 21:55 schrieb J. Bruce Fields:
> > On Wed, Sep 12, 2007 at 09:40:57PM +0200, Wolfgang Walter wrote:
> > > On Wednesday 12 September 2007, J. Bruce Fields wrote:
> > > > On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
> > > > > So it is in 2.6.21 and later and should probably go to .stable for
> > > > > .21 and .22.
> > > > >
> > > > > Bruce: for you :-)
> > > >
> > > > OK, thanks! But, (as is alas often the case) I'm still confused:
> > > > > if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
> > > > > continue;
> > > > > - if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY,
> > > > > &svsk->sk_flags)) + if (atomic_read(&svsk->sk_inuse) > 1
> > > > > + || test_bit(SK_BUSY, &svsk->sk_flags))
> > > > > continue;
> > > > > atomic_inc(&svsk->sk_inuse);
> > > > > list_move(le, &to_be_aged);
> > > >
> > > > What is it that ensures svsk->sk_inuse isn't incremented or SK_BUSY set
> > > > after that test? Not all the code that does either of those is under
> > > > the same serv->sv_lock lock that this code is.
> > >
> > > This should not matter - SK_CLOSED may be set at any time.
> > >
> > > svc_age_temp_sockets only detaches the socket, sets SK_CLOSED and then
> > > enqueues it. If SK_BUSY is set its already enqueued and svc_sock_enqueue
> > > ensures that it is not enqueued twice.
> >
> > Oh, got it. And the list manipulation is safe thanks to sv_lock. Neat,
> > thanks. Can you verify that this solves your problem?
>
> Patch works fine here.
Great, thanks again!
--b.