2008-12-16 12:05:52

by Kasparek Tomas

[permalink] [raw]
Subject: Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"

On Sun, Dec 14, 2008 at 02:50:01PM -0500, Trond Myklebust wrote:
> On Thu, 2008-12-04 at 11:23 +0100, Kasparek Tomas wrote:
> > On Tue, Dec 02, 2008 at 01:10:07PM -0500, Trond Myklebust wrote:
> > > On Tue, 2008-12-02 at 17:26 +0100, Kasparek Tomas wrote:
> > >
> > > > Did tried. The number should be seconds and defaults to 60, These
> > > > connections are still there after several hours. Changing it to 10 (sec)
> > > > and same behaviour. (BTW The server did not changed in last several months)
> > >
> > > Are you seeing the same behaviour with 'netstat -t'?
> >
> > yes:
> >
> > root@pckasparek: ~# ssh root@pcnlp1 'netstat -pan | grep WAIT' | cut -c-85
> > tcp 0 0 147.229.12.146:989 147.229.176.14:2049 FIN_WAIT2
> > root@pckasparek: ~# ssh root@pcnlp1 'netstat -t | grep WAIT' | cut -c-85
> > tcp 0 0 pcnlp1.fit.vutbr.:ftps-data eva.fit.vutbr.cz:nfs FIN_WAIT2
> >
> > but it should be the same, did't it? -t just selects TCP connections and
> > this is TCP connection so it shows the same
>
> Right, but the point is that the client is in the state FIN_WAIT2, which
> means that it has closed the socket on its end, and is waiting for the
> server to close on its end. The fact that the server is failing to do
> this is a server bug.
>
> That said, we can't wait forever for buggy servers. I see now why the
> linger2 stuff isn't working. I believe that the appended patch should
> help...

Hm, not happy to say that but it still does not work after some time. Now
the problem is opposite there are no connections to the server according to
netstat on client, just time to time there is

pcnlp1.fit.vutbr.cz.15234 > kazi.fit.vutbr.cz.nfs: 40 null
kazi.fit.vutbr.cz.nfs > pcnlp1.fit.vutbr.cz.15234: reply ok 24 null

(kazi is server). Will try to investigate more details.

(just to remember the same kernel with reversed
e06799f958bf7f9f8fae15f0c6f519953fb0257c works fine - exact patch is
included - it was slightly modified to fit 2.6.27.x kernels)

Thank you very much for your help so far.

--

Tomas Kasparek, PhD student E-mail: [email protected]
CVT FIT VUT Brno, L127 Web: http://www.fit.vutbr.cz/~kasparek
Bozetechova 1, 612 66 Fax: +420 54114-1270
Brno, Czech Republic Phone: +420 54114-1220

jabber: [email protected]
GPG: 2F1E 1AAF FD3B CFA3 1537 63BD DCBE 18FF A035 53BC



2008-12-23 22:34:37

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"

On Tue, 2008-12-16 at 13:05 +0100, Kasparek Tomas wrote:
> Hm, not happy to say that but it still does not work after some time. Now
> the problem is opposite there are no connections to the server according to
> netstat on client, just time to time there is
>
> pcnlp1.fit.vutbr.cz.15234 > kazi.fit.vutbr.cz.nfs: 40 null
> kazi.fit.vutbr.cz.nfs > pcnlp1.fit.vutbr.cz.15234: reply ok 24 null
>
> (kazi is server). Will try to investigate more details.

OK. Here is one more try. I've tightened up some locking issues with the
previous patch.

Thanks for helping test this!

Cheers
Trond
-----------------------------------------------------------
From: Trond Myklebust <[email protected]>
Date: Tue, 23 Dec 2008 16:21:25 -0500
SUNRPC: Add the equivalent of the linger2 timeout to RPC sockets

This avoids us getting stuck in the TCP_FIN_WAIT2 state forever.

Signed-off-by: Trond Myklebust <[email protected]>
---

include/linux/sunrpc/xprt.h | 1 +
net/sunrpc/xprtsock.c | 63 ++++++++++++++++++++++++++++++-------------
2 files changed, 45 insertions(+), 19 deletions(-)


diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 11fc71d..1a6ecd7 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -259,6 +259,7 @@ void xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie);
#define XPRT_BOUND (4)
#define XPRT_BINDING (5)
#define XPRT_CLOSING (6)
+#define XPRT_CONNECTION_ABORT (7)

static inline void xprt_set_connected(struct rpc_xprt *xprt)
{
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 0a50361..dfb0aeb 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -171,6 +171,7 @@ static ctl_table sunrpc_table[] = {
*/
#define XS_TCP_INIT_REEST_TO (3U * HZ)
#define XS_TCP_MAX_REEST_TO (5U * 60 * HZ)
+#define XS_TCP_LINGER2_TO (5U * HZ)

/*
* TCP idle timeout; client drops the transport socket if it is idle
@@ -792,6 +793,7 @@ static void xs_close(struct rpc_xprt *xprt)
sock_release(sock);
clear_close_wait:
smp_mb__before_clear_bit();
+ clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
clear_bit(XPRT_CLOSING, &xprt->state);
smp_mb__after_clear_bit();
@@ -1126,6 +1128,7 @@ out:
*/
static void xs_tcp_state_change(struct sock *sk)
{
+ struct sock_xprt *transport;
struct rpc_xprt *xprt;

read_lock(&sk->sk_callback_lock);
@@ -1137,13 +1140,12 @@ static void xs_tcp_state_change(struct sock *sk)
sock_flag(sk, SOCK_DEAD),
sock_flag(sk, SOCK_ZAPPED));

+ transport = container_of(xprt, struct sock_xprt, xprt);
+
switch (sk->sk_state) {
case TCP_ESTABLISHED:
spin_lock_bh(&xprt->transport_lock);
if (!xprt_test_and_set_connected(xprt)) {
- struct sock_xprt *transport = container_of(xprt,
- struct sock_xprt, xprt);
-
/* Reset TCP record info */
transport->tcp_offset = 0;
transport->tcp_reclen = 0;
@@ -1184,7 +1186,24 @@ static void xs_tcp_state_change(struct sock *sk)
clear_bit(XPRT_CONNECTED, &xprt->state);
smp_mb__after_clear_bit();
break;
+ case TCP_FIN_WAIT2:
+ /* Do the equivalent of linger2 handling for dealing with
+ * broken servers that don't close the socket in a timely
+ * fashion
+ */
+ if (!xprt_test_and_set_connecting(xprt)) {
+ set_bit(XPRT_CONNECTION_ABORT, &xprt->state);
+ queue_delayed_work(rpciod_workqueue,
+ &transport->connect_worker,
+ XS_TCP_LINGER2_TO);
+ }
+ break;
case TCP_CLOSE:
+ if (delayed_work_pending(&transport->connect_worker) &&
+ cancel_delayed_work(&transport->connect_worker)) {
+ clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
+ xprt_clear_connecting(xprt);
+ }
smp_mb__before_clear_bit();
clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
clear_bit(XPRT_CLOSING, &xprt->state);
@@ -1549,8 +1568,8 @@ static void xs_udp_connect_worker4(struct work_struct *work)
xs_udp_finish_connecting(xprt, sock);
status = 0;
out:
- xprt_wake_pending_tasks(xprt, status);
xprt_clear_connecting(xprt);
+ xprt_wake_pending_tasks(xprt, status);
}

/**
@@ -1590,8 +1609,8 @@ static void xs_udp_connect_worker6(struct work_struct *work)
xs_udp_finish_connecting(xprt, sock);
status = 0;
out:
- xprt_wake_pending_tasks(xprt, status);
xprt_clear_connecting(xprt);
+ xprt_wake_pending_tasks(xprt, status);
}

/*
@@ -1675,6 +1694,7 @@ static void xs_tcp_connect_worker4(struct work_struct *work)
goto out;

if (!sock) {
+ clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
/* start from scratch */
if ((err = sock_create_kern(PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock)) < 0) {
dprintk("RPC: can't create TCP transport socket (%d).\n", -err);
@@ -1686,10 +1706,14 @@ static void xs_tcp_connect_worker4(struct work_struct *work)
sock_release(sock);
goto out;
}
- } else
+ } else {
/* "close" the socket, preserving the local port */
xs_tcp_reuse_connection(xprt);

+ if (test_and_clear_bit(XPRT_CONNECTION_ABORT, &xprt->state))
+ goto out;
+ }
+
dprintk("RPC: worker connecting xprt %p to address: %s\n",
xprt, xprt->address_strings[RPC_DISPLAY_ALL]);

@@ -1701,19 +1725,17 @@ static void xs_tcp_connect_worker4(struct work_struct *work)
switch (status) {
case -EINPROGRESS:
case -EALREADY:
- goto out_clear;
- case -ECONNREFUSED:
- case -ECONNRESET:
- /* retry with existing socket, after a delay */
break;
default:
/* get rid of existing socket, and retry */
xs_tcp_shutdown(xprt);
+ case -ECONNREFUSED:
+ case -ECONNRESET:
+ /* retry with existing socket, after a delay */
+ xprt_wake_pending_tasks(xprt, status);
}
}
out:
- xprt_wake_pending_tasks(xprt, status);
-out_clear:
xprt_clear_connecting(xprt);
}

@@ -1735,6 +1757,7 @@ static void xs_tcp_connect_worker6(struct work_struct *work)
goto out;

if (!sock) {
+ clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
/* start from scratch */
if ((err = sock_create_kern(PF_INET6, SOCK_STREAM, IPPROTO_TCP, &sock)) < 0) {
dprintk("RPC: can't create TCP transport socket (%d).\n", -err);
@@ -1746,10 +1769,14 @@ static void xs_tcp_connect_worker6(struct work_struct *work)
sock_release(sock);
goto out;
}
- } else
+ } else {
/* "close" the socket, preserving the local port */
xs_tcp_reuse_connection(xprt);

+ if (test_and_clear_bit(XPRT_CONNECTION_ABORT, &xprt->state))
+ goto out;
+ }
+
dprintk("RPC: worker connecting xprt %p to address: %s\n",
xprt, xprt->address_strings[RPC_DISPLAY_ALL]);

@@ -1760,19 +1787,17 @@ static void xs_tcp_connect_worker6(struct work_struct *work)
switch (status) {
case -EINPROGRESS:
case -EALREADY:
- goto out_clear;
- case -ECONNREFUSED:
- case -ECONNRESET:
- /* retry with existing socket, after a delay */
break;
default:
/* get rid of existing socket, and retry */
xs_tcp_shutdown(xprt);
+ case -ECONNREFUSED:
+ case -ECONNRESET:
+ /* retry with existing socket, after a delay */
+ xprt_wake_pending_tasks(xprt, status);
}
}
out:
- xprt_wake_pending_tasks(xprt, status);
-out_clear:
xprt_clear_connecting(xprt);
}



--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2008-12-16 12:10:15

by Kasparek Tomas

[permalink] [raw]
Subject: Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"

On Tue, Dec 16, 2008 at 01:05:47PM +0100, Kasparek Tomas wrote:
> On Sun, Dec 14, 2008 at 02:50:01PM -0500, Trond Myklebust wrote:
> > On Thu, 2008-12-04 at 11:23 +0100, Kasparek Tomas wrote:
> > > On Tue, Dec 02, 2008 at 01:10:07PM -0500, Trond Myklebust wrote:
> > > > On Tue, 2008-12-02 at 17:26 +0100, Kasparek Tomas wrote:
> > > >
> > > > > Did tried. The number should be seconds and defaults to 60, These
> > > > > connections are still there after several hours. Changing it to 10 (sec)
> > > > > and same behaviour. (BTW The server did not changed in last several months)
> > > >
> > > > Are you seeing the same behaviour with 'netstat -t'?
> > >
> > > yes:
> > >
> > > root@pckasparek: ~# ssh root@pcnlp1 'netstat -pan | grep WAIT' | cut -c-85
> > > tcp 0 0 147.229.12.146:989 147.229.176.14:2049 FIN_WAIT2
> > > root@pckasparek: ~# ssh root@pcnlp1 'netstat -t | grep WAIT' | cut -c-85
> > > tcp 0 0 pcnlp1.fit.vutbr.:ftps-data eva.fit.vutbr.cz:nfs FIN_WAIT2
> > >
> > > but it should be the same, did't it? -t just selects TCP connections and
> > > this is TCP connection so it shows the same
> >
> > Right, but the point is that the client is in the state FIN_WAIT2, which
> > means that it has closed the socket on its end, and is waiting for the
> > server to close on its end. The fact that the server is failing to do
> > this is a server bug.
> >
> > That said, we can't wait forever for buggy servers. I see now why the
> > linger2 stuff isn't working. I believe that the appended patch should
> > help...
>
> Hm, not happy to say that but it still does not work after some time. Now
> the problem is opposite there are no connections to the server according to
> netstat on client, just time to time there is
>
> pcnlp1.fit.vutbr.cz.15234 > kazi.fit.vutbr.cz.nfs: 40 null
> kazi.fit.vutbr.cz.nfs > pcnlp1.fit.vutbr.cz.15234: reply ok 24 null
>
> (kazi is server). Will try to investigate more details.
>
> (just to remember the same kernel with reversed
> e06799f958bf7f9f8fae15f0c6f519953fb0257c works fine - exact patch is
> included - it was slightly modified to fit 2.6.27.x kernels)
>
> Thank you very much for your help so far.

just the forgoten patch promised.

--

Tomas Kasparek, PhD student E-mail: [email protected]
CVT FIT VUT Brno, L127 Web: http://www.fit.vutbr.cz/~kasparek
Bozetechova 1, 612 66 Fax: +420 54114-1270
Brno, Czech Republic Phone: +420 54114-1220

jabber: [email protected]
GPG: 2F1E 1AAF FD3B CFA3 1537 63BD DCBE 18FF A035 53BC


Attachments:
(No filename) (2.54 kB)
linux-2.6.git-e06799f958bf7f9f8fae15f0c6f519953fb0257c-own-modified.patch (2.36 kB)
Download all attachments

2008-12-16 13:00:17

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"

On Tue, 2008-12-16 at 13:10 +0100, Kasparek Tomas wrote:
> On Tue, Dec 16, 2008 at 01:05:47PM +0100, Kasparek Tomas wrote:
> > On Sun, Dec 14, 2008 at 02:50:01PM -0500, Trond Myklebust wrote:
> > > On Thu, 2008-12-04 at 11:23 +0100, Kasparek Tomas wrote:
> > > > On Tue, Dec 02, 2008 at 01:10:07PM -0500, Trond Myklebust wrote:
> > > > > On Tue, 2008-12-02 at 17:26 +0100, Kasparek Tomas wrote:
> > > > >
> > > > > > Did tried. The number should be seconds and defaults to 60, These
> > > > > > connections are still there after several hours. Changing it to 10 (sec)
> > > > > > and same behaviour. (BTW The server did not changed in last several months)
> > > > >
> > > > > Are you seeing the same behaviour with 'netstat -t'?
> > > >
> > > > yes:
> > > >
> > > > root@pckasparek: ~# ssh root@pcnlp1 'netstat -pan | grep WAIT' | cut -c-85
> > > > tcp 0 0 147.229.12.146:989 147.229.176.14:2049 FIN_WAIT2
> > > > root@pckasparek: ~# ssh root@pcnlp1 'netstat -t | grep WAIT' | cut -c-85
> > > > tcp 0 0 pcnlp1.fit.vutbr.:ftps-data eva.fit.vutbr.cz:nfs FIN_WAIT2
> > > >
> > > > but it should be the same, did't it? -t just selects TCP connections and
> > > > this is TCP connection so it shows the same
> > >
> > > Right, but the point is that the client is in the state FIN_WAIT2, which
> > > means that it has closed the socket on its end, and is waiting for the
> > > server to close on its end. The fact that the server is failing to do
> > > this is a server bug.
> > >
> > > That said, we can't wait forever for buggy servers. I see now why the
> > > linger2 stuff isn't working. I believe that the appended patch should
> > > help...
> >
> > Hm, not happy to say that but it still does not work after some time. Now
> > the problem is opposite there are no connections to the server according to
> > netstat on client, just time to time there is
> >
> > pcnlp1.fit.vutbr.cz.15234 > kazi.fit.vutbr.cz.nfs: 40 null
> > kazi.fit.vutbr.cz.nfs > pcnlp1.fit.vutbr.cz.15234: reply ok 24 null
> >
> > (kazi is server). Will try to investigate more details.
> >
> > (just to remember the same kernel with reversed
> > e06799f958bf7f9f8fae15f0c6f519953fb0257c works fine - exact patch is
> > included - it was slightly modified to fit 2.6.27.x kernels)
> >
> > Thank you very much for your help so far.
>
> just the forgoten patch promised.

NACK. That takes us right back to the previous broken behaviour.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com