2015-09-21 17:24:24

by Steve Wise

[permalink] [raw]
Subject: [PATCH 1/3] xprtrdma: disconnect and flush cqs before freeing buffers

Otherwise a FRMR completion can cause a touch-after-free crash.

In xprt_rdma_destroy(), call rpcrdma_buffer_destroy() only after calling
rpcrdma_ep_destroy().

In rpcrdma_ep_destroy(), disconnect the cm_id first which should flush the
qp, then drain the cqs, then destroy the qp, and finally destroy the cqs.

Signed-off-by: Steve Wise <[email protected]>
Tested-by: Chuck Lever <[email protected]>
---

net/sunrpc/xprtrdma/transport.c | 2 +-
net/sunrpc/xprtrdma/verbs.c | 9 ++++++---
2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 64443eb..41e452b 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -270,8 +270,8 @@ xprt_rdma_destroy(struct rpc_xprt *xprt)

xprt_clear_connected(xprt);

- rpcrdma_buffer_destroy(&r_xprt->rx_buf);
rpcrdma_ep_destroy(&r_xprt->rx_ep, &r_xprt->rx_ia);
+ rpcrdma_buffer_destroy(&r_xprt->rx_buf);
rpcrdma_ia_close(&r_xprt->rx_ia);

xprt_rdma_free_addresses(xprt);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 6829967..01a314a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -755,19 +755,22 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)

cancel_delayed_work_sync(&ep->rep_connect_worker);

- if (ia->ri_id->qp) {
+ if (ia->ri_id->qp)
rpcrdma_ep_disconnect(ep, ia);
+
+ rpcrdma_clean_cq(ep->rep_attr.recv_cq);
+ rpcrdma_clean_cq(ep->rep_attr.send_cq);
+
+ if (ia->ri_id->qp) {
rdma_destroy_qp(ia->ri_id);
ia->ri_id->qp = NULL;
}

- rpcrdma_clean_cq(ep->rep_attr.recv_cq);
rc = ib_destroy_cq(ep->rep_attr.recv_cq);
if (rc)
dprintk("RPC: %s: ib_destroy_cq returned %i\n",
__func__, rc);

- rpcrdma_clean_cq(ep->rep_attr.send_cq);
rc = ib_destroy_cq(ep->rep_attr.send_cq);
if (rc)
dprintk("RPC: %s: ib_destroy_cq returned %i\n",



2015-09-21 17:24:29

by Steve Wise

[permalink] [raw]
Subject: [PATCH 2/3] svcrdma: handle rdma read with a non-zero initial page offset

The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
were not taking into account the initial page_offset when determining
the rdma read length. This resulted in a read who's starting address
and length exceeded the base/bounds of the frmr.

Most work loads don't tickle this bug apparently, but one test hit it
every time: building the linux kernel on a 16 core node with 'make -j
16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.

This bug seems to only be tripped with devices having small fastreg page
list depths. I didn't see it with mlx4, for instance.

Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
Signed-off-by: Steve Wise <[email protected]>
Tested-by: Chuck Lever <[email protected]>
---

net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index cb51742..5f6ca47 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -136,7 +136,8 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
ctxt->direction = DMA_FROM_DEVICE;
ctxt->read_hdr = head;
pages_needed = min_t(int, pages_needed, xprt->sc_max_sge_rd);
- read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
+ read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
+ rs_length);

for (pno = 0; pno < pages_needed; pno++) {
int len = min_t(int, rs_length, PAGE_SIZE - pg_off);
@@ -235,7 +236,8 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
ctxt->direction = DMA_FROM_DEVICE;
ctxt->frmr = frmr;
pages_needed = min_t(int, pages_needed, xprt->sc_frmr_pg_list_len);
- read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
+ read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
+ rs_length);

frmr->kva = page_address(rqstp->rq_arg.pages[pg_no]);
frmr->direction = DMA_FROM_DEVICE;


2015-09-21 17:24:35

by Steve Wise

[permalink] [raw]
Subject: [PATCH 3/3] xprtrdma: don't log warnings for flushed completions

Unsignaled send WRs can get flushed as part of normal unmount, so don't
log them as warnings.

Signed-off-by: Steve Wise <[email protected]>
---

net/sunrpc/xprtrdma/frwr_ops.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index d6653f5..f1868ba 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -257,8 +257,11 @@ frwr_sendcompletion(struct ib_wc *wc)

/* WARNING: Only wr_id and status are reliable at this point */
r = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
- pr_warn("RPC: %s: frmr %p flushed, status %s (%d)\n",
- __func__, r, ib_wc_status_msg(wc->status), wc->status);
+ if (wc->status == IB_WC_WR_FLUSH_ERR)
+ dprintk("RPC: %s: frmr %p flushed\n", __func__, r);
+ else
+ pr_warn("RPC: %s: frmr %p error, status %s (%d)\n",
+ __func__, r, ib_wc_status_msg(wc->status), wc->status);
r->r.frmr.fr_state = FRMR_IS_STALE;
}



2015-09-28 14:30:46

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH 1/3] xprtrdma: disconnect and flush cqs before freeing buffers

On 9/21/2015 12:24 PM, Steve Wise wrote:
> Otherwise a FRMR completion can cause a touch-after-free crash.
>
> In xprt_rdma_destroy(), call rpcrdma_buffer_destroy() only after calling
> rpcrdma_ep_destroy().
>
> In rpcrdma_ep_destroy(), disconnect the cm_id first which should flush the
> qp, then drain the cqs, then destroy the qp, and finally destroy the cqs.
>
> Signed-off-by: Steve Wise <[email protected]>
> Tested-by: Chuck Lever <[email protected]>
> ---

Hey Trond, I'm hoping this can make 4.3-rc (and stable if you agree).

Thanks,

Steve.


2015-09-28 14:31:24

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH 2/3] svcrdma: handle rdma read with a non-zero initial page offset

On 9/21/2015 12:24 PM, Steve Wise wrote:
> The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
> were not taking into account the initial page_offset when determining
> the rdma read length. This resulted in a read who's starting address
> and length exceeded the base/bounds of the frmr.
>
> Most work loads don't tickle this bug apparently, but one test hit it
> every time: building the linux kernel on a 16 core node with 'make -j
> 16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.
>
> This bug seems to only be tripped with devices having small fastreg page
> list depths. I didn't see it with mlx4, for instance.
>
> Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
> Signed-off-by: Steve Wise <[email protected]>
> Tested-by: Chuck Lever <[email protected]>
> ---
>
>

Hey Bruce, can this make 4.3-rc? Also, what do you think about pushing
it to stable?

Thanks,

Steve.

2015-09-28 14:45:39

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 1/3] xprtrdma: disconnect and flush cqs before freeing buffers

Hi Steve,

On 09/28/2015 10:30 AM, Steve Wise wrote:
> On 9/21/2015 12:24 PM, Steve Wise wrote:
>> Otherwise a FRMR completion can cause a touch-after-free crash.
>>
>> In xprt_rdma_destroy(), call rpcrdma_buffer_destroy() only after calling
>> rpcrdma_ep_destroy().
>>
>> In rpcrdma_ep_destroy(), disconnect the cm_id first which should flush the
>> qp, then drain the cqs, then destroy the qp, and finally destroy the cqs.
>>
>> Signed-off-by: Steve Wise <[email protected]>
>> Tested-by: Chuck Lever <[email protected]>
>> ---
>
> Hey Trond, I'm hoping this can make 4.3-rc (and stable if you agree).

This patch looks fine to me. I'll pass it on to Trond!

I'll save patch 3/3 for the Linux 4.4 merge.

Thanks,
Anna

>
> Thanks,
>
> Steve.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2015-09-28 14:50:57

by Steve Wise

[permalink] [raw]
Subject: RE: [PATCH 1/3] xprtrdma: disconnect and flush cqs before freeing buffers



> -----Original Message-----
> From: Anna Schumaker [mailto:[email protected]]
> Sent: Monday, September 28, 2015 9:45 AM
> To: Steve Wise; [email protected]; [email protected]
> Cc: [email protected]; [email protected]
> Subject: Re: [PATCH 1/3] xprtrdma: disconnect and flush cqs before freeing buffers
>
> Hi Steve,
>
> On 09/28/2015 10:30 AM, Steve Wise wrote:
> > On 9/21/2015 12:24 PM, Steve Wise wrote:
> >> Otherwise a FRMR completion can cause a touch-after-free crash.
> >>
> >> In xprt_rdma_destroy(), call rpcrdma_buffer_destroy() only after calling
> >> rpcrdma_ep_destroy().
> >>
> >> In rpcrdma_ep_destroy(), disconnect the cm_id first which should flush the
> >> qp, then drain the cqs, then destroy the qp, and finally destroy the cqs.
> >>
> >> Signed-off-by: Steve Wise <[email protected]>
> >> Tested-by: Chuck Lever <[email protected]>
> >> ---
> >
> > Hey Trond, I'm hoping this can make 4.3-rc (and stable if you agree).
>
> This patch looks fine to me. I'll pass it on to Trond!
>
> I'll save patch 3/3 for the Linux 4.4 merge.
>
> Thanks,
> Anna
>

Thanks. Going forward I'll make sure you are CC'd for client patches too! I wasn't sure if you are formally taking all xprtrdma patches and sending them to Trond...

Steve.



2015-09-28 14:57:23

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 1/3] xprtrdma: disconnect and flush cqs before freeing buffers

On 09/28/2015 10:50 AM, Steve Wise wrote:
>
>
>> -----Original Message-----
>> From: Anna Schumaker [mailto:[email protected]]
>> Sent: Monday, September 28, 2015 9:45 AM
>> To: Steve Wise; [email protected]; [email protected]
>> Cc: [email protected]; [email protected]
>> Subject: Re: [PATCH 1/3] xprtrdma: disconnect and flush cqs before freeing buffers
>>
>> Hi Steve,
>>
>> On 09/28/2015 10:30 AM, Steve Wise wrote:
>>> On 9/21/2015 12:24 PM, Steve Wise wrote:
>>>> Otherwise a FRMR completion can cause a touch-after-free crash.
>>>>
>>>> In xprt_rdma_destroy(), call rpcrdma_buffer_destroy() only after calling
>>>> rpcrdma_ep_destroy().
>>>>
>>>> In rpcrdma_ep_destroy(), disconnect the cm_id first which should flush the
>>>> qp, then drain the cqs, then destroy the qp, and finally destroy the cqs.
>>>>
>>>> Signed-off-by: Steve Wise <[email protected]>
>>>> Tested-by: Chuck Lever <[email protected]>
>>>> ---
>>>
>>> Hey Trond, I'm hoping this can make 4.3-rc (and stable if you agree).
>>
>> This patch looks fine to me. I'll pass it on to Trond!
>>
>> I'll save patch 3/3 for the Linux 4.4 merge.
>>
>> Thanks,
>> Anna
>>
>
> Thanks. Going forward I'll make sure you are CC'd for client patches too! I wasn't sure if you are formally taking all xprtrdma patches and sending them to Trond...

Yeah, I'm taking all the client RDMA patches. I'm glad that's cleared up now! :)

Anna

>
> Steve.
>
>


2015-09-28 21:04:59

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/3] svcrdma: handle rdma read with a non-zero initial page offset

On Mon, Sep 28, 2015 at 09:31:25AM -0500, Steve Wise wrote:
> On 9/21/2015 12:24 PM, Steve Wise wrote:
> >The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
> >were not taking into account the initial page_offset when determining
> >the rdma read length. This resulted in a read who's starting address
> >and length exceeded the base/bounds of the frmr.
> >
> >Most work loads don't tickle this bug apparently, but one test hit it
> >every time: building the linux kernel on a 16 core node with 'make -j
> >16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.
> >
> >This bug seems to only be tripped with devices having small fastreg page
> >list depths. I didn't see it with mlx4, for instance.
> >
> >Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
> >Signed-off-by: Steve Wise <[email protected]>
> >Tested-by: Chuck Lever <[email protected]>
> >---
> >
> >
>
> Hey Bruce, can this make 4.3-rc? Also, what do you think about
> pushing it to stable?

It looks like a reasonable candidate for stable. Apologies, somehow I
missed it when you posted it--would you mind resending?

--b.

2015-09-28 21:49:19

by Steve Wise

[permalink] [raw]
Subject: RE: [PATCH 2/3] svcrdma: handle rdma read with a non-zero initial page offset



> -----Original Message-----
> From: J. Bruce Fields [mailto:[email protected]]
> Sent: Monday, September 28, 2015 4:05 PM
> To: Steve Wise
> Cc: [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 2/3] svcrdma: handle rdma read with a non-zero initial page offset
>
> On Mon, Sep 28, 2015 at 09:31:25AM -0500, Steve Wise wrote:
> > On 9/21/2015 12:24 PM, Steve Wise wrote:
> > >The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
> > >were not taking into account the initial page_offset when determining
> > >the rdma read length. This resulted in a read who's starting address
> > >and length exceeded the base/bounds of the frmr.
> > >
> > >Most work loads don't tickle this bug apparently, but one test hit it
> > >every time: building the linux kernel on a 16 core node with 'make -j
> > >16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.
> > >
> > >This bug seems to only be tripped with devices having small fastreg page
> > >list depths. I didn't see it with mlx4, for instance.
> > >
> > >Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
> > >Signed-off-by: Steve Wise <[email protected]>
> > >Tested-by: Chuck Lever <[email protected]>
> > >---
> > >
> > >
> >
> > Hey Bruce, can this make 4.3-rc? Also, what do you think about
> > pushing it to stable?
>
> It looks like a reasonable candidate for stable. Apologies, somehow I
> missed it when you posted it--would you mind resending?
>
> --b.

resent this one patch.

What is your process for pushing to stable?

Thanks,

Steve.


2015-09-29 15:40:47

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/3] svcrdma: handle rdma read with a non-zero initial page offset

On Mon, Sep 28, 2015 at 04:49:21PM -0500, Steve Wise wrote:
>
>
> > -----Original Message-----
> > From: J. Bruce Fields [mailto:[email protected]]
> > Sent: Monday, September 28, 2015 4:05 PM
> > To: Steve Wise
> > Cc: [email protected]; [email protected]; [email protected]
> > Subject: Re: [PATCH 2/3] svcrdma: handle rdma read with a non-zero initial page offset
> >
> > On Mon, Sep 28, 2015 at 09:31:25AM -0500, Steve Wise wrote:
> > > On 9/21/2015 12:24 PM, Steve Wise wrote:
> > > >The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
> > > >were not taking into account the initial page_offset when determining
> > > >the rdma read length. This resulted in a read who's starting address
> > > >and length exceeded the base/bounds of the frmr.
> > > >
> > > >Most work loads don't tickle this bug apparently, but one test hit it
> > > >every time: building the linux kernel on a 16 core node with 'make -j
> > > >16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.
> > > >
> > > >This bug seems to only be tripped with devices having small fastreg page
> > > >list depths. I didn't see it with mlx4, for instance.
> > > >
> > > >Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
> > > >Signed-off-by: Steve Wise <[email protected]>
> > > >Tested-by: Chuck Lever <[email protected]>
> > > >---
> > > >
> > > >
> > >
> > > Hey Bruce, can this make 4.3-rc? Also, what do you think about
> > > pushing it to stable?
> >
> > It looks like a reasonable candidate for stable. Apologies, somehow I
> > missed it when you posted it--would you mind resending?
> >
> > --b.
>
> resent this one patch.
>
> What is your process for pushing to stable?

Currently my standard regression tests don't seem to be passing on
recent 4.3; once I figure that out, I'll push out a branch with your
patch (and any others required) to

git://linux-nfs.org/~bfields for-4.3

and then send a pull request to Linus. Should be by the end of the
week.

--b.