2012-08-20 21:52:12

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH] svcrpc: sends on closed socket should stop immediately

From: "J. Bruce Fields" <[email protected]>

svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately. Meanwhile other
threads could send further replies before the socket is really shut
down. This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.

Symptoms were data corruption preceded by svc_tcp_sendto logging
something like

kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket

Cc: [email protected]
Reported-by: Malahal Naineni <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
net/sunrpc/svc_xprt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

Can anyone see a reason *not* to fail the send on a socket marked with
XPT_CLOSE?

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 88f2bf6..0d693a8 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -794,7 +794,8 @@ int svc_send(struct svc_rqst *rqstp)

/* Grab mutex to serialize outgoing data. */
mutex_lock(&xprt->xpt_mutex);
- if (test_bit(XPT_DEAD, &xprt->xpt_flags))
+ if (test_bit(XPT_DEAD, &xprt->xpt_flags)
+ || test_bit(XPT_CLOSE, &xprt->xpt_flags))
len = -ENOTCONN;
else
len = xprt->xpt_ops->xpo_sendto(rqstp);
--
1.7.9.5



2012-08-24 15:38:26

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] svcrpc: sends on closed socket should stop immediately

On Mon, Aug 20, 2012 at 05:35:06PM -0500, Malahal Naineni wrote:
> J. Bruce Fields [[email protected]] wrote:
> > From: "J. Bruce Fields" <[email protected]>
> >
> > svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
> > However, the XPT_CLOSE won't be acted on immediately. Meanwhile other
> > threads could send further replies before the socket is really shut
> > down.
...
> Instrumented svc_send_common() to send partial read replies, was able
> reproduce the corruption easily. After applying this patch, I wasn't
> able to reproduce the corruption. The patch looks good.

I wonder, maybe someone who understands the tcp code better could
answer: is it really possible for a sendto to fail and then a subsequent
send succeed? If so, what are possible causes?

I'm curious how we'd reproduce this without the artificial fault
injection.

--b.

2012-08-20 22:36:46

by Malahal Naineni

[permalink] [raw]
Subject: Re: [PATCH] svcrpc: sends on closed socket should stop immediately

J. Bruce Fields [[email protected]] wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
> However, the XPT_CLOSE won't be acted on immediately. Meanwhile other
> threads could send further replies before the socket is really shut
> down. This can manifest as data corruption: for example, if a truncated
> read reply is followed by another rpc reply, that second reply will look
> to the client like further read data.
>
> Symptoms were data corruption preceded by svc_tcp_sendto logging
> something like
>
> kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket
>
> Cc: [email protected]
> Reported-by: Malahal Naineni <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> net/sunrpc/svc_xprt.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> Can anyone see a reason *not* to fail the send on a socket marked with
> XPT_CLOSE?
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 88f2bf6..0d693a8 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -794,7 +794,8 @@ int svc_send(struct svc_rqst *rqstp)
>
> /* Grab mutex to serialize outgoing data. */
> mutex_lock(&xprt->xpt_mutex);
> - if (test_bit(XPT_DEAD, &xprt->xpt_flags))
> + if (test_bit(XPT_DEAD, &xprt->xpt_flags)
> + || test_bit(XPT_CLOSE, &xprt->xpt_flags))
> len = -ENOTCONN;
> else
> len = xprt->xpt_ops->xpo_sendto(rqstp);
> --
> 1.7.9.5

Instrumented svc_send_common() to send partial read replies, was able
reproduce the corruption easily. After applying this patch, I wasn't
able to reproduce the corruption. The patch looks good.

Reviewed-by: Malahal Naineni <[email protected]>
Tested-by: Malahal Naineni <[email protected]>