2008-04-19 20:50:28

by Myklebust, Trond

Subject: [PATCH 10/33] SUNRPC: Fix read ordering problems with req->rq_private_buf.len

We want to ensure that req->rq_private_buf.len is updated before
req->rq_received, so that call_decode() doesn't use an old value for
req->rq_rcv_buf.len.

In call_decode() itself, instead of using task->tk_status (which is set
using req->rq_received), we must use the actual value of
req->rq_private_buf.len when deciding whether or not the received RPC reply
is too short.

Finally, ensure that we set req->rq_rcv_buf.len to zero when retrying a
request. A typo meant that we were resetting req->rq_private_buf.len in
call_decode(), and then clobbering that value with the old rq_rcv_buf.len
again in xprt_transmit().

Signed-off-by: Trond Myklebust <[email protected]>
---

 net/sunrpc/clnt.c | 26 +++++++++++++-------------
 net/sunrpc/xprt.c | 3 ++-
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 0c29792..57663a4 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1195,18 +1195,6 @@ call_decode(struct rpc_task *task)
 		task->tk_flags &= ~RPC_CALL_MAJORSEEN;
 	}
 
-	if (task->tk_status < 12) {
-		if (!RPC_IS_SOFT(task)) {
-			task->tk_action = call_bind;
-			clnt->cl_stats->rpcretrans++;
-			goto out_retry;
-		}
-		dprintk("RPC: %s: too small RPC reply size (%d bytes)\n",
-				clnt->cl_protname, task->tk_status);
-		task->tk_action = call_timeout;
-		goto out_retry;
-	}
-
 	/*
 	 * Ensure that we see all writes made by xprt_complete_rqst()
 	 * before it changed req->rq_received.
@@ -1218,6 +1206,18 @@ call_decode(struct rpc_task *task)
 	WARN_ON(memcmp(&req->rq_rcv_buf, &req->rq_private_buf,
 				sizeof(req->rq_rcv_buf)) != 0);
 
+	if (req->rq_rcv_buf.len < 12) {
+		if (!RPC_IS_SOFT(task)) {
+			task->tk_action = call_bind;
+			clnt->cl_stats->rpcretrans++;
+			goto out_retry;
+		}
+		dprintk("RPC: %s: too small RPC reply size (%d bytes)\n",
+				clnt->cl_protname, task->tk_status);
+		task->tk_action = call_timeout;
+		goto out_retry;
+	}
+
 	/* Verify the RPC header */
 	p = call_verify(task);
 	if (IS_ERR(p)) {
@@ -1239,7 +1239,7 @@ out_retry:
 	task->tk_status = 0;
 	/* Note: call_verify() may have freed the RPC slot */
 	if (task->tk_rqstp == req) {
-		req->rq_received = req->rq_private_buf.len = 0;
+		req->rq_received = req->rq_rcv_buf.len = 0;
 		if (task->tk_client->cl_discrtry)
 			xprt_force_disconnect(task->tk_xprt);
 	}
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3ba64f9..5110a4e 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -757,9 +757,10 @@ void xprt_complete_rqst(struct rpc_task *task, int copied)
 	task->tk_rtt = (long)jiffies - req->rq_xtime;
 
 	list_del_init(&req->rq_list);
+	req->rq_private_buf.len = copied;
 	/* Ensure all writes are done before we update req->rq_received */
 	smp_wmb();
-	req->rq_received = req->rq_private_buf.len = copied;
+	req->rq_received = copied;
 	rpc_wake_up_queued_task(&xprt->pending, task);
 }
 EXPORT_SYMBOL_GPL(xprt_complete_rqst);
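
[Editorial illustration, not part of the patch.] The retry typo described in the changelog can be modelled with a small userspace sketch. Everything named toy_* is hypothetical, and the sketch assumes that the retransmit path re-initialises the private buffer by copying the receive buffer over it, which is how the changelog's "clobbering" in xprt_transmit() is read here. It is not the kernel code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, stripped-down stand-ins for struct xdr_buf / struct rpc_rqst. */
struct toy_buf {
	size_t len;
};

struct toy_rqst {
	struct toy_buf rcv_buf;     /* models req->rq_rcv_buf */
	struct toy_buf private_buf; /* models req->rq_private_buf */
	int received;               /* models req->rq_received */
};

/* Assumption: retransmission copies rcv_buf over private_buf. */
static void toy_transmit(struct toy_rqst *req)
{
	memcpy(&req->private_buf, &req->rcv_buf, sizeof(req->private_buf));
}

/* Retry path before the patch: resets the wrong buffer's length. */
static void toy_retry_old(struct toy_rqst *req)
{
	req->received = 0;
	req->private_buf.len = 0;
}

/* Retry path after the patch: resets the length that gets copied later. */
static void toy_retry_new(struct toy_rqst *req)
{
	req->received = 0;
	req->rcv_buf.len = 0;
}
```

With a short 8-byte reply recorded in both buffers, toy_retry_old() followed by toy_transmit() leaves private_buf.len at the stale value 8, while toy_retry_new() followed by toy_transmit() leaves it at 0.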



2008-04-21 21:19:40

by Chuck Lever III

Subject: Re: [PATCH 10/33] SUNRPC: Fix read ordering problems with req->rq_private_buf.len

Hi Trond-

On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> We want to ensure that req->rq_private_buf.len is updated before
> req->rq_received, so that call_decode() doesn't use an old value for
> req->rq_rcv_buf.len.
>
> In 'call_decode()' itself, instead of using task->tk_status (which is
> set using req->rq_received) must use the actual value of
> req->rq_private_buf.len when deciding whether or not the received RPC
> reply is too short.
>
> Finally ensure that we set req->rq_rcv_buf.len to zero when retrying a
> request. A typo meant that we were resetting req->rq_private_buf.len
> in call_decode(), and then clobbering that value with the old
> rq_rcv_buf.len again in xprt_transmit().

After staring at this for a while, the interaction between
xprt_complete_rqst and call_decode isn't clear to me.

I take it there is no guarantee that the xdr_buf fields and
rq_received are completely updated before the task is awoken and
call_decode runs?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2008-04-22 00:30:02

by Myklebust, Trond

Subject: Re: [PATCH 10/33] SUNRPC: Fix read ordering problems with req->rq_private_buf.len


On Mon, 2008-04-21 at 17:19 -0400, Chuck Lever wrote:
> Hi Trond-
>
> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> > We want to ensure that req->rq_private_buf.len is updated before
> > req->rq_received, so that call_decode() doesn't use an old value for
> > req->rq_rcv_buf.len.
> >
> > In 'call_decode()' itself, instead of using task->tk_status (which is
> > set using req->rq_received) must use the actual value of
> > req->rq_private_buf.len when deciding whether or not the received RPC
> > reply is too short.
> >
> > Finally ensure that we set req->rq_rcv_buf.len to zero when retrying a
> > request. A typo meant that we were resetting req->rq_private_buf.len
> > in call_decode(), and then clobbering that value with the old
> > rq_rcv_buf.len again in xprt_transmit().
>
> After staring at this for a while, the interaction between
> xprt_complete_rqst and call_decode isn't clear to me.
>
> I take it there is no guarantee that the xdr_buf fields and
> rq_received are completely updated before the task is awoken and
> call_decode runs?

The call could complete just as the RPC call is being woken up due to a
timeout. In any case, we need to ensure that the ordering of the update
is correct. We need to know that if a processor sees req->rq_received as
being non-zero, then the same processor will see req->rq_private_buf.len
as being updated: on something like an alpha processor or a PPC, we need
to use explicit read and write barriers to ensure this.
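
[Editorial illustration, not part of the original message.] The ordering requirement above is the usual "publish data, then set a flag" pattern. A minimal userspace sketch using C11 fences as rough analogues of the kernel's smp_wmb() and smp_rmb() might look like the following. The toy_* names are hypothetical, and C11 release/acquire fences are somewhat stronger than the kernel barriers they stand in for, so this is only an approximation of the idea.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct rpc_rqst fields under discussion. */
struct toy_rqst {
	size_t private_buf_len; /* plain data, published via the flag below */
	atomic_int received;    /* non-zero once a complete reply is in place */
};

/* Writer side, modelled on the patched xprt_complete_rqst(). */
static void toy_complete_rqst(struct toy_rqst *req, int copied)
{
	req->private_buf_len = (size_t)copied;
	/* Analogue of smp_wmb(): the length store is ordered before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&req->received, copied, memory_order_relaxed);
}

/* Reader side: returns 1 and fills *len if a reply is visible, otherwise 0. */
static int toy_read_reply(struct toy_rqst *req, size_t *len)
{
	if (atomic_load_explicit(&req->received, memory_order_relaxed) == 0)
		return 0;
	/* Analogue of smp_rmb(): the flag load is ordered before the length load. */
	atomic_thread_fence(memory_order_acquire);
	*len = req->private_buf_len;
	return 1;
}
```

If a reader on another CPU observes a non-zero flag, the paired fences guarantee it also observes the length stored before the flag was set. Without them, a weakly ordered CPU could in principle expose the flag first and a stale length second, which is the failure the patch is meant to rule out.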


--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com