When an RPC request is deferred, the rq_xprt_ctxt pointer is moved out
of the svc_rqst into the svc_deferred_req.
When the deferred request is revisited, the pointer is copied into
the new svc_rqst - and also remains in the svc_deferred_req.
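Condensed, the handoff looks like this (the svc_defer() side is in the
diff below; the revisit side is svc_deferred_recv() in the same file,
paraphrased here):

	/* svc_defer(): first deferral moves the ctxt into the dr */
	dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
	rqstp->rq_xprt_ctxt = NULL;

	/* svc_deferred_recv(): revisiting copies the pointer back,
	 * but it also remains in dr->xprt_ctxt */
	rqstp->rq_xprt_ctxt = dr->xprt_ctxt;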
In the (rare?) case that the request is deferred a second time, the old
svc_deferred_req is reused - it still has all the correct content.
However in that case the rq_xprt_ctxt pointer is NOT cleared, so
when xpo_release_xprt is called, the ctxt is freed (UDP) or possibly
added to a free list (RDMA).
When the deferred request is revisited a second time, it will
reference this ctxt, which may be invalid, and then free the object a
second time, which is likely to oops.
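The problem is confined to the reuse branch at the top of svc_defer(),
roughly:

	if (rqstp->rq_deferred) {
		/* second deferral: reuse the old dr, which already
		 * holds the ctxt in dr->xprt_ctxt.  rq_xprt_ctxt is
		 * left set, so the transport's release op frees the
		 * ctxt while dr->xprt_ctxt still points at it. */
		dr = rqstp->rq_deferred;
		rqstp->rq_deferred = NULL;
	} else {
		...
		dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
		rqstp->rq_xprt_ctxt = NULL;	/* only cleared here */
		...
	}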
So change svc_defer() to *always* clear rq_xprt_ctxt, and assert that
the value is now stored in the svc_deferred_req.
Fixes: 773f91b2cf3f ("SUNRPC: Fix NFSD's request deferral on RDMA transports")
Signed-off-by: NeilBrown <[email protected]>
---
net/sunrpc/svc_xprt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 84e5d7d31481..5fd94f6bdc75 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -1223,13 +1223,14 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req)
dr->daddr = rqstp->rq_daddr;
dr->argslen = rqstp->rq_arg.len >> 2;
dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
- rqstp->rq_xprt_ctxt = NULL;

/* back up head to the start of the buffer and copy */
skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
memcpy(dr->args, rqstp->rq_arg.head[0].iov_base - skip,
dr->argslen << 2);
}
+ WARN_ON_ONCE(rqstp->rq_xprt_ctxt != dr->xprt_ctxt);
+ rqstp->rq_xprt_ctxt = NULL;
trace_svc_defer(rqstp);
svc_xprt_get(rqstp->rq_xprt);
dr->xprt = rqstp->rq_xprt;
--
2.40.1
On Tue, 2023-05-09 at 09:41 +1000, NeilBrown wrote:
> When an RPC request is deferred, the rq_xprt_ctxt pointer is moved out
> of the svc_rqst into the svc_deferred_req.
> When the deferred request is revisited, the pointer is copied into
> the new svc_rqst - and also remains in the svc_deferred_req.
>
> In the (rare?) case that the request is deferred a second time, the old
> svc_deferred_req is reused - it still has all the correct content.
> However in that case the rq_xprt_ctxt pointer is NOT cleared, so
> when xpo_release_xprt is called, the ctxt is freed (UDP) or possibly
> added to a free list (RDMA).
> When the deferred request is revisited a second time, it will
> reference this ctxt, which may be invalid, and then free the object a
> second time, which is likely to oops.
>
I've always found the deferral code to be really hard to follow. The
dearth of comments around the design doesn't help either...
To be clear: when we call into svc_defer(), if rq_deferred is already
set, does that mean we're revisiting a request that has already been
deferred at least once?
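For reference, I'm reading the receive path as the place where
rq_deferred gets populated - paraphrasing from svc_xprt.c, so the
details may be off:

	/* svc_handle_xprt(): a dequeued deferral arrives with
	 * rq_deferred set and is replayed rather than received */
	rqstp->rq_deferred = svc_deferred_dequeue(xprt);
	if (rqstp->rq_deferred)
		len = svc_deferred_recv(rqstp);
	else
		len = xprt->xpt_ops->xpo_recvfrom(rqstp);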
> So change svc_defer() to *always* clear rq_xprt_ctxt, and assert that
> the value is now stored in the svc_deferred_req.
>
> Fixes: 773f91b2cf3f ("SUNRPC: Fix NFSD's request deferral on RDMA transports")
> Signed-off-by: NeilBrown <[email protected]>
> ---
> net/sunrpc/svc_xprt.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 84e5d7d31481..5fd94f6bdc75 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -1223,13 +1223,14 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req)
> dr->daddr = rqstp->rq_daddr;
> dr->argslen = rqstp->rq_arg.len >> 2;
> dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
> - rqstp->rq_xprt_ctxt = NULL;
>
> /* back up head to the start of the buffer and copy */
> skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
> memcpy(dr->args, rqstp->rq_arg.head[0].iov_base - skip,
> dr->argslen << 2);
> }
> + WARN_ON_ONCE(rqstp->rq_xprt_ctxt != dr->xprt_ctxt);
> + rqstp->rq_xprt_ctxt = NULL;
> trace_svc_defer(rqstp);
> svc_xprt_get(rqstp->rq_xprt);
> dr->xprt = rqstp->rq_xprt;
I think this looks right, assuming my understanding of what
rq_deferred == NULL indicates is correct.
Reviewed-by: Jeff Layton <[email protected]>