Date: Mon, 5 Jun 2017 16:25:56 +0200 (CEST)
From: Marta Rybczynska
To: Max Gurtovoy
Cc: Keith Busch, axboe@fb.com, hch@lst.de, sagi@grimberg.me,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
	Jason Gunthorpe, Bart Van Assche, Doug Ledford, Leon Romanovsky
Message-ID: <73128754.75195919.1496672756008.JavaMail.zimbra@kalray.eu>
References: <1785464880.73761005.1496655938570.JavaMail.zimbra@kalray.eu>
Subject: Re: [PATCH] nvme-rdma: remove race conditions from IB signalling

> On 6/5/2017 12:45 PM, Marta Rybczynska wrote:
>> This patch improves the way the RDMA IB signalling is done
>> by using atomic operations for the signalling variable. This
>> avoids race conditions on sig_count.
>>
>> The signalling interval changes slightly and is now the
>> largest power of two not larger than queue depth / 2.
>>
>> ilog() usage idea by Bart Van Assche.
>>
>> Signed-off-by: Marta Rybczynska
>> ---
>>  drivers/nvme/host/rdma.c | 31 +++++++++++++++++++++----------
>>  1 file changed, 21 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index 28bd255..80682f7 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -88,7 +88,7 @@ enum nvme_rdma_queue_flags {
>>
>>  struct nvme_rdma_queue {
>>  	struct nvme_rdma_qe	*rsp_ring;
>> -	u8			sig_count;
>> +	atomic_t		sig_count;
>>  	int			queue_size;
>>  	size_t			cmnd_capsule_len;
>>  	struct nvme_rdma_ctrl	*ctrl;
>> @@ -554,6 +554,8 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
>>
>>  	queue->queue_size = queue_size;
>>
>> +	atomic_set(&queue->sig_count, 0);
>> +
>>  	queue->cm_id = rdma_create_id(&init_net, nvme_rdma_cm_handler, queue,
>>  			RDMA_PS_TCP, IB_QPT_RC);
>>  	if (IS_ERR(queue->cm_id)) {
>> @@ -1038,17 +1040,26 @@ static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
>>  		nvme_rdma_wr_error(cq, wc, "SEND");
>>  }
>>
>> -static inline int nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
>> +static inline int nvme_rdma_init_sig_count(int queue_size)
>>  {
>> -	int sig_limit;
>> -
>> -	/*
>> -	 * We signal completion every queue depth/2 and also handle the
>> -	 * degenerated case of a device with queue_depth=1, where we
>> -	 * would need to signal every message.
>> +	/* We want to signal completion at least every queue depth/2.
>> +	 * This returns the largest power of two that is not above half
>> +	 * of (queue size + 1) to optimize (avoid divisions).
>>  	 */
>> -	sig_limit = max(queue->queue_size / 2, 1);
>> -	return (++queue->sig_count % sig_limit) == 0;
>> +	return 1 << ilog2((queue_size + 1) / 2);
>> +}
>> +
>> +static inline bool nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
>> +{
>> +	int count, limit;
>> +
>> +	limit = nvme_rdma_init_sig_count(queue->queue_size);
>
> Maybe I'm missing something, but why we need to calc limit each time in
> data path (not a big deal but we can avoid that)?
> Can you add queue->sig_limit that will be calculated once at queue
> allocation ?

We said last time that ilog2() is cheap enough that we do not need to add
this variable: it maps to fls() or to a platform-specific instruction.

Marta
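
For reference, the division-avoidance point can be illustrated with a
minimal userspace sketch (assumed helper names sig_limit() and
should_signal(), and a plain loop standing in for the kernel's ilog2();
this is not the driver code): because the limit is a power of two, the
per-send check reduces to an increment and a mask.

/*
 * Minimal userspace sketch (assumed names, not the nvme-rdma code):
 * with a power-of-two limit, "signal every Nth send" needs only an
 * increment plus a mask, so the hot path has no division or modulo.
 */
#include <stdbool.h>
#include <stdio.h>

/* Largest power of two not above half of (queue_size + 1);
 * stand-in for the kernel's 1 << ilog2((queue_size + 1) / 2). */
static int sig_limit(int queue_size)
{
	int limit = 1;

	while (limit * 2 <= (queue_size + 1) / 2)
		limit *= 2;
	return limit;
}

/* True every 'limit' calls; the driver uses an atomic_t counter here. */
static bool should_signal(unsigned int *count, int limit)
{
	return ((++*count) & (limit - 1)) == 0;
}

int main(void)
{
	unsigned int count = 0;
	int limit = sig_limit(128);	/* queue depth 128 -> limit 64 */
	int signalled = 0;

	for (int i = 0; i < 1000; i++)
		if (should_signal(&count, limit))
			signalled++;

	printf("limit=%d, signalled %d of 1000 sends\n", limit, signalled);
	return 0;
}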