From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Don Dutile, Chuck Lever, "J. Bruce Fields"
Subject: [PATCH 4.19 44/44] svcrdma: Remove max_sge check at connect time
Date: Wed, 13 Feb 2019 19:38:45 +0100
Message-Id: <20190213183655.483627018@linuxfoundation.org>
In-Reply-To: <20190213183651.648060257@linuxfoundation.org>
References: <20190213183651.648060257@linuxfoundation.org>

4.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Chuck Lever

commit e248aa7be86e8179f20ac0931774ecd746f3f5bf upstream.

Two and a half years ago, the client was changed to use gathered
Send for larger inline messages, in commit 655fec6987b ("xprtrdma:
Use gathered Send for large inline messages"). Several fixes were
required because there are a few in-kernel device drivers whose
max_sge is 3, and these were broken by the change.

Apparently my memory is going, because some time later, I submitted
commit 25fd86eca11c ("svcrdma: Don't overrun the SGE array in
svc_rdma_send_ctxt"), and after that, commit f3c1fd0ee294 ("svcrdma:
Reduce max_send_sges"). These too incorrectly assumed in-kernel
device drivers would have more than a few Send SGEs available.

The fix for the server side is not the same. This is because the
fundamental problem on the server is that, whether or not the client
has provisioned a chunk for the RPC reply, the server must squeeze
even the most complex RPC replies into a single RDMA Send. Failing
in the send path because of Send SGE exhaustion should never be an
option.

Therefore, instead of failing when the send path runs out of SGEs,
switch to using a bounce buffer mechanism to handle RPC replies that
are too complex for the device to send directly.
That allows us to remove the max_sge check to enable drivers with
small max_sge to work again.

Reported-by: Don Dutile
Fixes: 25fd86eca11c ("svcrdma: Don't overrun the SGE array in ...")
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever
Signed-off-by: J. Bruce Fields
Signed-off-by: Greg Kroah-Hartman

---
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    |  105 +++++++++++++++++++++++++++++--
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    9 --
 2 files changed, 102 insertions(+), 12 deletions(-)

--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -563,6 +563,99 @@ void svc_rdma_sync_reply_hdr(struct svcx
 				      DMA_TO_DEVICE);
 }
 
+/* If the xdr_buf has more elements than the device can
+ * transmit in a single RDMA Send, then the reply will
+ * have to be copied into a bounce buffer.
+ */
+static bool svc_rdma_pull_up_needed(struct svcxprt_rdma *rdma,
+				    struct xdr_buf *xdr,
+				    __be32 *wr_lst)
+{
+	int elements;
+
+	/* xdr->head */
+	elements = 1;
+
+	/* xdr->pages */
+	if (!wr_lst) {
+		unsigned int remaining;
+		unsigned long pageoff;
+
+		pageoff = xdr->page_base & ~PAGE_MASK;
+		remaining = xdr->page_len;
+		while (remaining) {
+			++elements;
+			remaining -= min_t(u32, PAGE_SIZE - pageoff,
+					   remaining);
+			pageoff = 0;
+		}
+	}
+
+	/* xdr->tail */
+	if (xdr->tail[0].iov_len)
+		++elements;
+
+	/* assume 1 SGE is needed for the transport header */
+	return elements >= rdma->sc_max_send_sges;
+}
+
+/* The device is not capable of sending the reply directly.
+ * Assemble the elements of @xdr into the transport header
+ * buffer.
+ */
+static int svc_rdma_pull_up_reply_msg(struct svcxprt_rdma *rdma,
+				      struct svc_rdma_send_ctxt *ctxt,
+				      struct xdr_buf *xdr, __be32 *wr_lst)
+{
+	unsigned char *dst, *tailbase;
+	unsigned int taillen;
+
+	dst = ctxt->sc_xprt_buf;
+	dst += ctxt->sc_sges[0].length;
+
+	memcpy(dst, xdr->head[0].iov_base, xdr->head[0].iov_len);
+	dst += xdr->head[0].iov_len;
+
+	tailbase = xdr->tail[0].iov_base;
+	taillen = xdr->tail[0].iov_len;
+	if (wr_lst) {
+		u32 xdrpad;
+
+		xdrpad = xdr_padsize(xdr->page_len);
+		if (taillen && xdrpad) {
+			tailbase += xdrpad;
+			taillen -= xdrpad;
+		}
+	} else {
+		unsigned int len, remaining;
+		unsigned long pageoff;
+		struct page **ppages;
+
+		ppages = xdr->pages + (xdr->page_base >> PAGE_SHIFT);
+		pageoff = xdr->page_base & ~PAGE_MASK;
+		remaining = xdr->page_len;
+		while (remaining) {
+			len = min_t(u32, PAGE_SIZE - pageoff, remaining);
+
+			memcpy(dst, page_address(*ppages), len);
+			remaining -= len;
+			dst += len;
+			pageoff = 0;
+		}
+	}
+
+	if (taillen)
+		memcpy(dst, tailbase, taillen);
+
+	ctxt->sc_sges[0].length += xdr->len;
+	ib_dma_sync_single_for_device(rdma->sc_pd->device,
+				      ctxt->sc_sges[0].addr,
+				      ctxt->sc_sges[0].length,
+				      DMA_TO_DEVICE);
+
+	return 0;
+}
+
 /* svc_rdma_map_reply_msg - Map the buffer holding RPC message
  * @rdma: controlling transport
  * @ctxt: send_ctxt for the Send WR
@@ -585,8 +678,10 @@ int svc_rdma_map_reply_msg(struct svcxpr
 	u32 xdr_pad;
 	int ret;
 
-	if (++ctxt->sc_cur_sge_no >= rdma->sc_max_send_sges)
-		return -EIO;
+	if (svc_rdma_pull_up_needed(rdma, xdr, wr_lst))
+		return svc_rdma_pull_up_reply_msg(rdma, ctxt, xdr, wr_lst);
+
+	++ctxt->sc_cur_sge_no;
 	ret = svc_rdma_dma_map_buf(rdma, ctxt,
 				   xdr->head[0].iov_base,
 				   xdr->head[0].iov_len);
@@ -617,8 +712,7 @@ int svc_rdma_map_reply_msg(struct svcxpr
 	while (remaining) {
 		len = min_t(u32, PAGE_SIZE - page_off, remaining);
 
-		if (++ctxt->sc_cur_sge_no >= rdma->sc_max_send_sges)
-			return -EIO;
+		++ctxt->sc_cur_sge_no;
 		ret = svc_rdma_dma_map_page(rdma, ctxt, *ppages++,
 					    page_off, len);
 		if (ret < 0)
@@ -632,8 +726,7 @@ int svc_rdma_map_reply_msg(struct svcxpr
 	len = xdr->tail[0].iov_len;
 tail:
 	if (len) {
-		if (++ctxt->sc_cur_sge_no >= rdma->sc_max_send_sges)
-			return -EIO;
+		++ctxt->sc_cur_sge_no;
 		ret = svc_rdma_dma_map_buf(rdma, ctxt, base, len);
 		if (ret < 0)
 			return ret;
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -478,12 +478,9 @@ static struct svc_xprt *svc_rdma_accept(
 	/* Transport header, head iovec, tail iovec */
 	newxprt->sc_max_send_sges = 3;
 	/* Add one SGE per page list entry */
-	newxprt->sc_max_send_sges += svcrdma_max_req_size / PAGE_SIZE;
-	if (newxprt->sc_max_send_sges > dev->attrs.max_send_sge) {
-		pr_err("svcrdma: too few Send SGEs available (%d needed)\n",
-		       newxprt->sc_max_send_sges);
-		goto errout;
-	}
+	newxprt->sc_max_send_sges += (svcrdma_max_req_size / PAGE_SIZE) + 1;
+	if (newxprt->sc_max_send_sges > dev->attrs.max_send_sge)
+		newxprt->sc_max_send_sges = dev->attrs.max_send_sge;
 	newxprt->sc_max_req_size = svcrdma_max_req_size;
 	newxprt->sc_max_requests = svcrdma_max_requests;
 	newxprt->sc_max_bc_requests = svcrdma_max_bc_requests;
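
To experiment with this logic outside the kernel, here is a minimal user-space C sketch of what the patch adds. It covers counting how many Send SGEs a reply would need, deciding whether the reply must be pulled up into a bounce buffer, copying it there, and sizing sc_max_send_sges at accept time by clamping to the device limit instead of refusing the connection. It is not kernel code. struct reply_buf, SKETCH_PAGE_SIZE (assumed to be 4096), count_reply_elements(), pull_up_needed(), sized_max_send_sges() and pull_up_copy() are hypothetical names invented for this illustration. The Write-list case, where page data goes out via RDMA Write and only the XDR pad is trimmed from the tail, is left out of the copy routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch (the kernel uses PAGE_SIZE). */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical, simplified stand-in for struct xdr_buf. */
struct reply_buf {
	const unsigned char *head;
	size_t head_len;
	unsigned char **pages;	/* each entry is SKETCH_PAGE_SIZE bytes */
	size_t page_base;	/* byte offset of page data in pages[] */
	size_t page_len;
	const unsigned char *tail;
	size_t tail_len;
};

/* SGEs a direct Send would need: 1 for the head, 1 per page touched by
 * the page list (only when there is no Write list), 1 for a tail. */
static unsigned int count_reply_elements(const struct reply_buf *xdr,
					 int has_write_list)
{
	unsigned int elements = 1;

	if (!has_write_list) {
		size_t pageoff = xdr->page_base % SKETCH_PAGE_SIZE;
		size_t remaining = xdr->page_len;

		while (remaining) {
			size_t len = SKETCH_PAGE_SIZE - pageoff;

			if (len > remaining)
				len = remaining;
			++elements;
			remaining -= len;
			pageoff = 0;
		}
	}
	if (xdr->tail_len)
		++elements;
	return elements;
}

/* One SGE is reserved for the transport header, hence ">=". */
static int pull_up_needed(unsigned int elements, unsigned int max_send_sges)
{
	return elements >= max_send_sges;
}

/* Accept-time sizing: header + head + tail, one SGE per page of the
 * largest request plus one, clamped to the device limit. */
static unsigned int sized_max_send_sges(unsigned int max_req_size,
					unsigned int device_max_send_sge)
{
	unsigned int n = 3 + max_req_size / SKETCH_PAGE_SIZE + 1;

	return n > device_max_send_sge ? device_max_send_sge : n;
}

/* Copy head, page data and tail into one contiguous buffer, right after
 * a transport header of hdr_len bytes already at the start of bounce.
 * Returns the total length to send. Page-list case only. */
static size_t pull_up_copy(unsigned char *bounce, size_t bounce_cap,
			   size_t hdr_len, const struct reply_buf *xdr)
{
	unsigned char *dst = bounce + hdr_len;
	unsigned char **ppages = xdr->pages + xdr->page_base / SKETCH_PAGE_SIZE;
	size_t pageoff = xdr->page_base % SKETCH_PAGE_SIZE;
	size_t remaining = xdr->page_len;
	size_t total = hdr_len + xdr->head_len + xdr->page_len + xdr->tail_len;

	assert(total <= bounce_cap);

	if (xdr->head_len)
		memcpy(dst, xdr->head, xdr->head_len);
	dst += xdr->head_len;

	while (remaining) {
		size_t len = SKETCH_PAGE_SIZE - pageoff;

		if (len > remaining)
			len = remaining;
		/* Honor the offset in the first page, then advance a page. */
		memcpy(dst, *ppages + pageoff, len);
		dst += len;
		remaining -= len;
		pageoff = 0;
		ppages++;
	}

	if (xdr->tail_len)
		memcpy(dst, xdr->tail, xdr->tail_len);
	return total;
}
```

For example, a reply whose page data starts 100 bytes into a page and is 8000 bytes long touches two pages, so with a non-empty tail it needs 4 elements. On a device whose max_send_sge is 3, that is at or above the limit, so the reply is copied instead of failing with -EIO. The trade-off is an extra memcpy of the reply on such devices, in exchange for never failing in the send path.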