From: Long Li
To: Steve French, linux-cifs@vger.kernel.org, samba-technical@lists.samba.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	Christoph Hellwig, Tom Talpey, Matthew Wilcox, Stephen Hemminger
Cc: Long Li
Subject: [Patch v7 19/22] CIFS: SMBD: Upper layer performs SMB write via RDMA read through memory registration
Date: Tue, 7 Nov 2017 01:55:11 -0700
Message-Id: <20171107085514.12693-20-longli@exchange.microsoft.com>
X-Mailer: git-send-email 2.14.1
In-Reply-To: <20171107085514.12693-1-longli@exchange.microsoft.com>
References: <20171107085514.12693-1-longli@exchange.microsoft.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Long Li

When sending I/O, if the size is at least rdma_readwrite_threshold,
prepare to send the SMB write packet for an RDMA read via memory
registration. The actual I/O is done by the remote peer through local
RDMA hardware. Modify the relevant fields in the packet accordingly, and
append a smbd_buffer_descriptor_v1 to the end of the SMB write packet.

When the write I/O finishes, deregister the memory region if it was used
for an RDMA read. If remote invalidation is not used, the call to
smbd_deregister_mr will do local invalidation and possibly wait.

The memory region is normally deregistered in the MID callback as soon
as it has been used. There are situations where the MID may not be
created on I/O failure, in which case the memory region is deregistered
when the write data context is released.

Signed-off-by: Long Li
---
 fs/cifs/cifsglob.h |  3 +++
 fs/cifs/cifssmb.c  |  7 ++++++
 fs/cifs/smb2pdu.c  | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 5585516..66f210d 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1168,6 +1168,9 @@ struct cifs_writedata {
 	pid_t pid;
 	unsigned int bytes;
 	int result;
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	struct smbd_mr *mr;
+#endif
 	unsigned int pagesz;
 	unsigned int tailsz;
 	unsigned int credits;
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 5857009..e012e3f 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -43,6 +43,7 @@
 #include "cifs_unicode.h"
 #include "cifs_debug.h"
 #include "fscache.h"
+#include "smbdirect.h"
 
 #ifdef CONFIG_CIFS_POSIX
 static struct {
@@ -1911,6 +1912,12 @@ cifs_writedata_release(struct kref *refcount)
 {
 	struct cifs_writedata *wdata = container_of(refcount,
 					struct cifs_writedata, refcount);
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	if (wdata->mr) {
+		smbd_deregister_mr(wdata->mr);
+		wdata->mr = NULL;
+	}
+#endif
 
 	if (wdata->cfile)
 		cifsFileInfo_put(wdata->cfile);
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 32ad590..c8afb83 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -48,6 +48,7 @@
 #include "smb2glob.h"
 #include "cifspdu.h"
 #include "cifs_spnego.h"
+#include "smbdirect.h"
 
 /*
  * The following table defines the expected "StructureSize" of SMB2 requests
@@ -2656,7 +2657,19 @@ smb2_writev_callback(struct mid_q_entry *mid)
 		wdata->result = -EIO;
 		break;
 	}
-
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	/*
+	 * If this wdata has a memory region registered, the MR can be
+	 * freed. The number of MRs available is limited, so it's important
+	 * to recover a used MR as soon as I/O is finished. Holding the MR
+	 * longer in the later I/O process can result in I/O deadlock due
+	 * to a lack of MRs to send requests on I/O retry.
+	 */
+	if (wdata->mr) {
+		smbd_deregister_mr(wdata->mr);
+		wdata->mr = NULL;
+	}
+#endif
 	if (wdata->result)
 		cifs_stats_fail_inc(tcon, SMB2_WRITE_HE);
 
@@ -2707,7 +2720,42 @@ smb2_async_writev(struct cifs_writedata *wdata,
 	req->DataOffset = cpu_to_le16(
 				offsetof(struct smb2_write_req, Buffer) - 4);
 	req->RemainingBytes = 0;
-
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	/*
+	 * If we want to do a server RDMA read, fill in and append
+	 * smbd_buffer_descriptor_v1 to the end of the write request
+	 */
+	if (server->rdma && wdata->bytes >=
+		server->smbd_conn->rdma_readwrite_threshold) {
+
+		struct smbd_buffer_descriptor_v1 *v1;
+		bool need_invalidate = server->dialect == SMB30_PROT_ID;
+
+		wdata->mr = smbd_register_mr(
+				server->smbd_conn, wdata->pages,
+				wdata->nr_pages, wdata->tailsz,
+				false, need_invalidate);
+		if (!wdata->mr) {
+			rc = -ENOBUFS;
+			goto async_writev_out;
+		}
+		req->Length = 0;
+		req->DataOffset = 0;
+		req->RemainingBytes =
+			(wdata->nr_pages-1)*PAGE_SIZE + wdata->tailsz;
+		req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE;
+		if (need_invalidate)
+			req->Channel = SMB2_CHANNEL_RDMA_V1;
+		req->WriteChannelInfoOffset =
+			offsetof(struct smb2_write_req, Buffer) - 4;
+		req->WriteChannelInfoLength =
+			sizeof(struct smbd_buffer_descriptor_v1);
+		v1 = (struct smbd_buffer_descriptor_v1 *) &req->Buffer[0];
+		v1->offset = wdata->mr->mr->iova;
+		v1->token = wdata->mr->mr->rkey;
+		v1->length = wdata->mr->mr->length;
+	}
+#endif
 	/* 4 for rfc1002 length field and 1 for Buffer */
 	iov[0].iov_len = 4;
 	iov[0].iov_base = req;
@@ -2720,12 +2768,22 @@ smb2_async_writev(struct cifs_writedata *wdata,
 	rqst.rq_npages = wdata->nr_pages;
 	rqst.rq_pagesz = wdata->pagesz;
 	rqst.rq_tailsz = wdata->tailsz;
-
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	if (wdata->mr) {
+		iov[1].iov_len += sizeof(struct smbd_buffer_descriptor_v1);
+		rqst.rq_npages = 0;
+	}
+#endif
 	cifs_dbg(FYI, "async write at %llu %u bytes\n",
 		 wdata->offset, wdata->bytes);
 
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	/* For RDMA read, I/O size is in RemainingBytes not in Length */
+	if (!wdata->mr)
+		req->Length = cpu_to_le32(wdata->bytes);
+#else
 	req->Length = cpu_to_le32(wdata->bytes);
-
+#endif
 	inc_rfc1001_len(&req->hdr, wdata->bytes - 1 /* Buffer */);
 
 	if (wdata->credits) {
-- 
2.7.4
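
For readers following the request setup in smb2_async_writev(), here is a
minimal user-space sketch of the decision the patch makes: send the payload
inline, or describe a registered memory region and let the server fetch it
with an RDMA read. Every sketch_* name, the SKETCH_PAGE_SIZE value, and the
simplified struct layouts are hypothetical stand-ins rather than the kernel
definitions. The sketch assumes the write size equals
(nr_pages - 1) * page size + tailsz, and it ignores wire byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
#define SKETCH_PAGE_SIZE 4096u

enum sketch_channel {
	SKETCH_CHANNEL_NONE,              /* payload travels inline */
	SKETCH_CHANNEL_RDMA_V1,           /* client invalidates the MR locally */
	SKETCH_CHANNEL_RDMA_V1_INVALIDATE /* server invalidates the MR remotely */
};

/* Simplified registered memory region: address, remote key, length. */
struct sketch_mr {
	uint64_t iova;
	uint32_t rkey;
	uint32_t length;
};

/* Simplified buffer descriptor appended to the write request. */
struct sketch_descriptor_v1 {
	uint64_t offset;
	uint32_t token;
	uint32_t length;
};

/* Only the write-request fields that the patch touches. */
struct sketch_write_req {
	uint32_t length;
	uint32_t remaining_bytes;
	enum sketch_channel channel;
	struct sketch_descriptor_v1 desc;
};

/* Payload size: every page is full except the last, which holds tailsz. */
static uint32_t sketch_payload_bytes(uint32_t nr_pages, uint32_t tailsz)
{
	assert(nr_pages > 0);
	return (nr_pages - 1) * SKETCH_PAGE_SIZE + tailsz;
}

/*
 * Fill req for a write of nr_pages pages. Returns true when the RDMA
 * read path is chosen, false when the payload is sent inline.
 */
static bool sketch_fill_write_req(struct sketch_write_req *req, bool rdma,
				  uint32_t threshold, bool need_invalidate,
				  const struct sketch_mr *mr,
				  uint32_t nr_pages, uint32_t tailsz)
{
	uint32_t bytes = sketch_payload_bytes(nr_pages, tailsz);

	if (!rdma || bytes < threshold) {
		/* Inline path: the I/O size is carried in length. */
		req->length = bytes;
		req->remaining_bytes = 0;
		req->channel = SKETCH_CHANNEL_NONE;
		return false;
	}

	/* RDMA path: the I/O size moves to remaining_bytes. */
	req->length = 0;
	req->remaining_bytes = bytes;
	req->channel = need_invalidate ? SKETCH_CHANNEL_RDMA_V1
				       : SKETCH_CHANNEL_RDMA_V1_INVALIDATE;
	req->desc.offset = mr->iova;
	req->desc.token = mr->rkey;
	req->desc.length = mr->length;
	return true;
}
```

As in the patch, the comparison against the threshold is inclusive, and the
channel without remote invalidation is selected when the client has to
invalidate the memory region itself.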