From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Sagi Grimberg, Max Gurtovoy, Christoph Hellwig
Subject: [PATCH 4.14 110/110] nvme-rdma: don't suppress send completions
Date: Wed, 7 Mar 2018 11:39:33 -0800
Message-Id: <20180307191054.072161192@linuxfoundation.org>
In-Reply-To: <20180307191039.748351103@linuxfoundation.org>
References: <20180307191039.748351103@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.
------------------

From: Sagi Grimberg

commit b4b591c87f2b0f4ebaf3a68d4f13873b241aa584 upstream.

The entire completions suppress mechanism is currently broken because the
HCA might retry a send operation (due to dropped ack) after the nvme
transaction has completed.

In order to handle this, we signal all send completions and introduce a
separate done handler for async events as they will be handled differently
(as they don't include in-capsule data by definition).

Signed-off-by: Sagi Grimberg
Reviewed-by: Max Gurtovoy
Signed-off-by: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

---
 drivers/nvme/host/rdma.c |   54 ++++++++++++-----------------------------------
 1 file changed, 14 insertions(+), 40 deletions(-)

--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -88,7 +88,6 @@ enum nvme_rdma_queue_flags {
 
 struct nvme_rdma_queue {
 	struct nvme_rdma_qe	*rsp_ring;
-	atomic_t		sig_count;
 	int			queue_size;
 	size_t			cmnd_capsule_len;
 	struct nvme_rdma_ctrl	*ctrl;
@@ -521,7 +520,6 @@ static int nvme_rdma_alloc_queue(struct 
 		queue->cmnd_capsule_len = sizeof(struct nvme_command);
 
 	queue->queue_size = queue_size;
-	atomic_set(&queue->sig_count, 0);
 
 	queue->cm_id = rdma_create_id(&init_net, nvme_rdma_cm_handler, queue,
 			RDMA_PS_TCP, IB_QPT_RC);
@@ -1232,21 +1230,9 @@ static void nvme_rdma_send_done(struct i
 	nvme_end_request(rq, req->status, req->result);
 }
 
-/*
- * We want to signal completion at least every queue depth/2. This returns the
- * largest power of two that is not above half of (queue size + 1) to optimize
- * (avoid divisions).
- */
-static inline bool nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
-{
-	int limit = 1 << ilog2((queue->queue_size + 1) / 2);
-
-	return (atomic_inc_return(&queue->sig_count) & (limit - 1)) == 0;
-}
-
 static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_qe *qe, struct ib_sge *sge, u32 num_sge,
-		struct ib_send_wr *first, bool flush)
+		struct ib_send_wr *first)
 {
 	struct ib_send_wr wr, *bad_wr;
 	int ret;
@@ -1255,31 +1241,12 @@ static int nvme_rdma_post_send(struct nv
 	sge->length = sizeof(struct nvme_command),
 	sge->lkey   = queue->device->pd->local_dma_lkey;
 
-	qe->cqe.done = nvme_rdma_send_done;
-
 	wr.next       = NULL;
 	wr.wr_cqe     = &qe->cqe;
 	wr.sg_list    = sge;
 	wr.num_sge    = num_sge;
 	wr.opcode     = IB_WR_SEND;
-	wr.send_flags = 0;
-
-	/*
-	 * Unsignalled send completions are another giant desaster in the
-	 * IB Verbs spec: If we don't regularly post signalled sends
-	 * the send queue will fill up and only a QP reset will rescue us.
-	 * Would have been way to obvious to handle this in hardware or
-	 * at least the RDMA stack..
-	 *
-	 * Always signal the flushes. The magic request used for the flush
-	 * sequencer is not allocated in our driver's tagset and it's
-	 * triggered to be freed by blk_cleanup_queue(). So we need to
-	 * always mark it as signaled to ensure that the "wr_cqe", which is
-	 * embedded in request's payload, is not freed when __ib_process_cq()
-	 * calls wr_cqe->done().
-	 */
-	if (nvme_rdma_queue_sig_limit(queue) || flush)
-		wr.send_flags |= IB_SEND_SIGNALED;
+	wr.send_flags = IB_SEND_SIGNALED;
 
 	if (first)
 		first->next = &wr;
@@ -1329,6 +1296,12 @@ static struct blk_mq_tags *nvme_rdma_tag
 	return queue->ctrl->tag_set.tags[queue_idx - 1];
 }
 
+static void nvme_rdma_async_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+	if (unlikely(wc->status != IB_WC_SUCCESS))
+		nvme_rdma_wr_error(cq, wc, "ASYNC");
+}
+
 static void nvme_rdma_submit_async_event(struct nvme_ctrl *arg, int aer_idx)
 {
 	struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(arg);
@@ -1350,10 +1323,12 @@ static void nvme_rdma_submit_async_event
 	cmd->common.flags |= NVME_CMD_SGL_METABUF;
 	nvme_rdma_set_sg_null(cmd);
 
+	sqe->cqe.done = nvme_rdma_async_done;
+
 	ib_dma_sync_single_for_device(dev, sqe->dma, sizeof(*cmd),
 			DMA_TO_DEVICE);
 
-	ret = nvme_rdma_post_send(queue, sqe, &sge, 1, NULL, false);
+	ret = nvme_rdma_post_send(queue, sqe, &sge, 1, NULL);
 	WARN_ON_ONCE(ret);
 }
 
@@ -1639,7 +1614,6 @@ static blk_status_t nvme_rdma_queue_rq(s
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_rdma_qe *sqe = &req->sqe;
 	struct nvme_command *c = sqe->data;
-	bool flush = false;
 	struct ib_device *dev;
 	blk_status_t ret;
 	int err;
@@ -1668,13 +1642,13 @@ static blk_status_t nvme_rdma_queue_rq(s
 		goto err;
 	}
 
+	sqe->cqe.done = nvme_rdma_send_done;
+
 	ib_dma_sync_single_for_device(dev, sqe->dma,
 			sizeof(struct nvme_command), DMA_TO_DEVICE);
 
-	if (req_op(rq) == REQ_OP_FLUSH)
-		flush = true;
 	err = nvme_rdma_post_send(queue, sqe, req->sge, req->num_sge,
-			req->mr->need_inval ? &req->reg_wr.wr : NULL, flush);
+			req->mr->need_inval ? &req->reg_wr.wr : NULL);
 	if (unlikely(err)) {
 		nvme_rdma_unmap_data(queue, rq);
 		goto err;
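
A note on the pattern for anyone reviewing: after this change every send WR
is posted signalled and the ib_cqe done callback is chosen by the caller
before posting, so a completion consumed late (e.g. after an HCA retry) still
lands on a valid, expected handler. The sketch below is only an illustration
of that pattern, not code from the driver; the example_* names are made up
and error handling is reduced to a print. It is written against the 4.14-era
verbs API (non-const bad_wr in ib_post_send()).

/* Minimal sketch of the always-signalled send pattern, assuming a 4.14 tree. */
#include <rdma/ib_verbs.h>

struct example_qe {
	struct ib_cqe	cqe;		/* completion entry embedded in the qe */
};

static void example_send_done(struct ib_cq *cq, struct ib_wc *wc)
{
	/* Runs for every send now that no completion is suppressed. */
	if (unlikely(wc->status != IB_WC_SUCCESS))
		pr_err("SEND error: %s\n", ib_wc_status_msg(wc->status));
}

static int example_post_send(struct ib_qp *qp, struct example_qe *qe,
		struct ib_sge *sge, u32 num_sge)
{
	struct ib_send_wr wr = {}, *bad_wr;

	qe->cqe.done  = example_send_done;	/* set per use, as in the patch */
	wr.wr_cqe     = &qe->cqe;
	wr.sg_list    = sge;
	wr.num_sge    = num_sge;
	wr.opcode     = IB_WR_SEND;
	wr.send_flags = IB_SEND_SIGNALED;	/* never suppress the completion */

	return ib_post_send(qp, &wr, &bad_wr);
}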