Return-Path: Received: from userp1040.oracle.com ([156.151.31.81]:28056 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751053AbbEKPWq convert rfc822-to-8bit (ORCPT ); Mon, 11 May 2015 11:22:46 -0400 Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: [PATCH v1 04/14] xprtrdma: Use ib_device pointer safely From: Chuck Lever In-Reply-To: <554B80B7.8090900@dev.mellanox.co.il> Date: Mon, 11 May 2015 11:22:59 -0400 Cc: linux-rdma , Linux NFS Mailing List Message-Id: References: <20150504174626.3483.97639.stgit@manet.1015granger.net> <20150504175720.3483.80356.stgit@manet.1015granger.net> <554B37CF.2070206@dev.mellanox.co.il> <554B6F2A.6000608@dev.mellanox.co.il> <554B80B7.8090900@dev.mellanox.co.il> To: Sagi Grimberg , "Hefty, Sean" Sender: linux-nfs-owner@vger.kernel.org List-ID: On May 7, 2015, at 11:11 AM, Sagi Grimberg wrote: > On 5/7/2015 5:12 PM, Chuck Lever wrote: >> >> On May 7, 2015, at 9:56 AM, Sagi Grimberg wrote: >> >>> On 5/7/2015 4:39 PM, Chuck Lever wrote: >>>> >>>> On May 7, 2015, at 6:00 AM, Sagi Grimberg wrote: >>>> >>>>> On 5/4/2015 8:57 PM, Chuck Lever wrote: >>>>>> The connect worker can replace ri_id, but prevents ri_id->device >>>>>> from changing during the lifetime of a transport instance. >>>>>> >>>>>> Cache a copy of ri_id->device in rpcrdma_ia and in rpcrdma_rep. >>>>>> The cached copy can be used safely in code that does not serialize >>>>>> with the connect worker. >>>>>> >>>>>> Other code can use it to save an extra address generation (one >>>>>> pointer dereference instead of two). >>>>>> >>>>>> Signed-off-by: Chuck Lever >>>>>> --- >>>>>> net/sunrpc/xprtrdma/fmr_ops.c | 8 +---- >>>>>> net/sunrpc/xprtrdma/frwr_ops.c | 12 +++---- >>>>>> net/sunrpc/xprtrdma/physical_ops.c | 8 +---- >>>>>> net/sunrpc/xprtrdma/verbs.c | 61 +++++++++++++++++++----------------- >>>>>> net/sunrpc/xprtrdma/xprt_rdma.h | 2 + >>>>>> 5 files changed, 43 insertions(+), 48 deletions(-) >>>>>> >>>>>> diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c >>>>>> index 302d4eb..0a96155 100644 >>>>>> --- a/net/sunrpc/xprtrdma/fmr_ops.c >>>>>> +++ b/net/sunrpc/xprtrdma/fmr_ops.c >>>>>> @@ -85,7 +85,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, >>>>>> int nsegs, bool writing) >>>>>> { >>>>>> struct rpcrdma_ia *ia = &r_xprt->rx_ia; >>>>>> - struct ib_device *device = ia->ri_id->device; >>>>>> + struct ib_device *device = ia->ri_device; >>>>>> enum dma_data_direction direction = rpcrdma_data_dir(writing); >>>>>> struct rpcrdma_mr_seg *seg1 = seg; >>>>>> struct rpcrdma_mw *mw = seg1->rl_mw; >>>>>> @@ -137,17 +137,13 @@ fmr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) >>>>>> { >>>>>> struct rpcrdma_ia *ia = &r_xprt->rx_ia; >>>>>> struct rpcrdma_mr_seg *seg1 = seg; >>>>>> - struct ib_device *device; >>>>>> int rc, nsegs = seg->mr_nsegs; >>>>>> LIST_HEAD(l); >>>>>> >>>>>> list_add(&seg1->rl_mw->r.fmr->list, &l); >>>>>> rc = ib_unmap_fmr(&l); >>>>>> - read_lock(&ia->ri_qplock); >>>>>> - device = ia->ri_id->device; >>>>>> while (seg1->mr_nsegs--) >>>>>> - rpcrdma_unmap_one(device, seg++); >>>>>> - read_unlock(&ia->ri_qplock); >>>>>> + rpcrdma_unmap_one(ia->ri_device, seg++); >>>>> >>>>> Umm, I'm wandering if this is guaranteed to be the same device as >>>>> ri_id->device? >>>>> >>>>> Imagine you are working on a bond device where each slave belongs to >>>>> a different adapter. When the active port toggles, you will see a >>>>> ADDR_CHANGED event (that the current code does not handle...), what >>>>> you'd want to do is just reconnect and rdma_cm will resolve the new >>>>> address for you (via the backup slave). I suspect that in case this >>>>> flow is concurrent with the reconnects you may end up with a stale >>>>> device handle. >>>> >>>> I?m not sure what you mean by ?stale? : freed memory? >>>> >>>> I?m looking at this code in rpcrdma_ep_connect() : >>>> >>>> 916 if (ia->ri_id->device != id->device) { >>>> 917 printk("RPC: %s: can't reconnect on " >>>> 918 "different device!\n", __func__); >>>> 919 rdma_destroy_id(id); >>>> 920 rc = -ENETUNREACH; >>>> 921 goto out; >>>> 922 } >>>> >>>> After reconnecting, if the ri_id has changed, the connect fails. Today, >>>> xprtrdma does not support the device changing out from under it. >>>> >>>> Note also that our receive completion upcall uses ri_id->device for >>>> DMA map syncing. Would that also be a problem during a bond failover? >>>> >>> >>> I'm not talking about ri_id->device, this will be consistent. I'm >>> wandering about ia->ri_device, which might not have been updated yet. >> >> ia->ri_device is never updated. The only place it is set is in >> rpcrdma_ia_open(). > > So you assume that each ri_id that you will recreate contains the > same device handle? This issue is still unresolved. xprtrdma does not assume the device pointers are the same, it does actually check for it. The connect worker is careful to create a new ri_id first, then check to see that the new ri_id device pointer is the same as the device pointer in the old (defunct) ri_id. If the device pointer changes, the old ri_id is kept and the connect operation fails (retried later). That is how we guarantee that ia->ri_id->device never changes, and thus the cached pointer in ia->ri_device never needs to be updated. > I think that for ADDR_CHANGE event when the slave belongs to another > device you will hit a mismatch. CC'ing Sean for more info? Yes, I?d like some confirmation that xprtrdma is not abusing the verbs API. :-) -- Chuck Lever chuck[dot]lever[at]oracle[dot]com