From: Long Li
To: Steve French, linux-cifs@vger.kernel.org,
	samba-technical@lists.samba.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, Christoph Hellwig, Tom Talpey,
	Matthew Wilcox, Stephen Hemminger
Cc: Long Li
Subject: [Patch v8 12/16] CIFS: SMBD: Upper layer performs SMB write via RDMA read through memory registration
Date: Wed, 22 Nov 2017 17:38:45 -0700
Message-Id: <20171123003849.17093-13-longli@exchange.microsoft.com>
X-Mailer: git-send-email 2.15.0
In-Reply-To: <20171123003849.17093-1-longli@exchange.microsoft.com>
References: <20171123003849.17093-1-longli@exchange.microsoft.com>

From: Long Li

When sending I/O, if the size is at least rdma_readwrite_threshold,
prepare the SMB write packet for an RDMA read via memory registration.
The actual data transfer is done by the remote peer through the local
RDMA hardware. Modify the relevant fields in the packet accordingly, and
append a smbd_buffer_descriptor_v1 to the end of the SMB write packet.

When the write I/O finishes, deregister the memory region if it was used
for an RDMA read. If remote invalidation is not used, the call to
smbd_deregister_mr() does a local invalidation and may wait.

The memory region is normally deregistered in the MID callback as soon
as the I/O has finished. If the I/O fails before a MID is created, the
memory region is deregistered when the write data context is released.

Signed-off-by: Long Li
---
 fs/cifs/cifsglob.h |  3 +++
 fs/cifs/cifssmb.c  |  7 ++++++
 fs/cifs/smb2pdu.c  | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 3fb1a2f..22bfda0 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1174,6 +1174,9 @@ struct cifs_writedata {
 	pid_t pid;
 	unsigned int bytes;
 	int result;
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	struct smbd_mr *mr;
+#endif
 	unsigned int pagesz;
 	unsigned int tailsz;
 	unsigned int credits;
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 35dc5bf..66d1ebf 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -43,6 +43,7 @@
 #include "cifs_unicode.h"
 #include "cifs_debug.h"
 #include "fscache.h"
+#include "smbdirect.h"
 
 #ifdef CONFIG_CIFS_POSIX
 static struct {
@@ -1923,6 +1924,12 @@ cifs_writedata_release(struct kref *refcount)
 {
 	struct cifs_writedata *wdata = container_of(refcount,
 					struct cifs_writedata, refcount);
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	if (wdata->mr) {
+		smbd_deregister_mr(wdata->mr);
+		wdata->mr = NULL;
+	}
+#endif
 
 	if (wdata->cfile)
 		cifsFileInfo_put(wdata->cfile);
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index c0dc049..908d777 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -48,6 +48,7 @@
 #include "smb2glob.h"
 #include "cifspdu.h"
 #include "cifs_spnego.h"
+#include "smbdirect.h"
 
 /*
  *  The following table defines the expected "StructureSize" of SMB2 requests
@@ -2728,7 +2729,19 @@ smb2_writev_callback(struct mid_q_entry *mid)
 		wdata->result = -EIO;
 		break;
 	}
-
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	/*
+	 * If this wdata has a memory registered, the MR can be freed
+	 * The number of MRs available is limited, it's important to recover
+	 * used MR as soon as I/O is finished. Hold MR longer in the later
+	 * I/O process can possibly result in I/O deadlock due to lack of MR
+	 * to send request on I/O retry
+	 */
+	if (wdata->mr) {
+		smbd_deregister_mr(wdata->mr);
+		wdata->mr = NULL;
+	}
+#endif
 	if (wdata->result)
 		cifs_stats_fail_inc(tcon, SMB2_WRITE_HE);
 
@@ -2780,7 +2793,42 @@ smb2_async_writev(struct cifs_writedata *wdata,
 	req->DataOffset = cpu_to_le16(
 				offsetof(struct smb2_write_req, Buffer));
 	req->RemainingBytes = 0;
-
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	/*
+	 * If we want to do a server RDMA read, fill in and append
+	 * smbd_buffer_descriptor_v1 to the end of write request
+	 */
+	if (server->rdma && wdata->bytes >=
+		server->smbd_conn->rdma_readwrite_threshold) {
+
+		struct smbd_buffer_descriptor_v1 *v1;
+		bool need_invalidate = server->dialect == SMB30_PROT_ID;
+
+		wdata->mr = smbd_register_mr(
+				server->smbd_conn, wdata->pages,
+				wdata->nr_pages, wdata->tailsz,
+				false, need_invalidate);
+		if (!wdata->mr) {
+			rc = -ENOBUFS;
+			goto async_writev_out;
+		}
+		req->Length = 0;
+		req->DataOffset = 0;
+		req->RemainingBytes =
+			(wdata->nr_pages-1)*PAGE_SIZE + wdata->tailsz;
+		req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE;
+		if (need_invalidate)
+			req->Channel = SMB2_CHANNEL_RDMA_V1;
+		req->WriteChannelInfoOffset =
+			offsetof(struct smb2_write_req, Buffer);
+		req->WriteChannelInfoLength =
+			sizeof(struct smbd_buffer_descriptor_v1);
+		v1 = (struct smbd_buffer_descriptor_v1 *) &req->Buffer[0];
+		v1->offset = wdata->mr->mr->iova;
+		v1->token = wdata->mr->mr->rkey;
+		v1->length = wdata->mr->mr->length;
+	}
+#endif
 	/* 4 for rfc1002 length field and 1 for Buffer */
 	iov[0].iov_len = 4;
 	rfc1002_marker = cpu_to_be32(total_len - 1 + wdata->bytes);
@@ -2794,11 +2842,22 @@ smb2_async_writev(struct cifs_writedata *wdata,
 	rqst.rq_npages = wdata->nr_pages;
 	rqst.rq_pagesz = wdata->pagesz;
 	rqst.rq_tailsz = wdata->tailsz;
-
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	if (wdata->mr) {
+		iov[1].iov_len += sizeof(struct smbd_buffer_descriptor_v1);
+		rqst.rq_npages = 0;
+	}
+#endif
 	cifs_dbg(FYI, "async write at %llu %u bytes\n",
 		 wdata->offset, wdata->bytes);
 
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	/* For RDMA read, I/O size is in RemainingBytes not in Length */
+	if (!wdata->mr)
+		req->Length = cpu_to_le32(wdata->bytes);
+#else
 	req->Length = cpu_to_le32(wdata->bytes);
+#endif
 
 	if (wdata->credits) {
 		shdr->CreditCharge = cpu_to_le16(DIV_ROUND_UP(wdata->bytes,
-- 
2.7.4
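
For readers following the smb2_async_writev() hunks, the field selection
can be modelled in user space roughly as below. This is only a sketch
under stated assumptions, not the kernel code: every model_* name, the
MODEL_PAGE_SIZE value and the struct layouts are hypothetical stand-ins,
endianness conversion is left out, and no memory is registered. It only
shows which request fields carry the I/O size on the inline path versus
the RDMA read path, and how the channel follows need_invalidate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are the kernel's types. */
#define MODEL_PAGE_SIZE 4096u

enum model_channel {
	MODEL_CHANNEL_NONE = 0,
	MODEL_CHANNEL_RDMA_V1 = 1,            /* client invalidates the MR locally */
	MODEL_CHANNEL_RDMA_V1_INVALIDATE = 2, /* remote invalidation is allowed */
};

/* Same three fields the patch fills in; the layout is illustrative only. */
struct model_buffer_descriptor_v1 {
	uint64_t offset;
	uint32_t token;
	uint32_t length;
};

struct model_write {
	uint32_t bytes;    /* total payload size */
	uint32_t nr_pages; /* pages backing the payload */
	uint32_t tailsz;   /* bytes used in the last page */
};

struct model_write_req {
	uint32_t length;              /* bytes sent inline in the packet */
	uint32_t remaining_bytes;     /* bytes the peer fetches via RDMA read */
	uint32_t channel;
	uint32_t channel_info_length; /* size of the appended descriptor */
};

/* Mirrors the patch's condition: RDMA transport and size >= threshold. */
static bool model_use_rdma_read(bool rdma, uint32_t bytes, uint32_t threshold)
{
	return rdma && bytes >= threshold;
}

static void model_fill_write_req(struct model_write_req *req,
				 const struct model_write *w, bool rdma,
				 uint32_t threshold, bool need_invalidate)
{
	assert(w->nr_pages >= 1);

	if (!model_use_rdma_read(rdma, w->bytes, threshold)) {
		/* Inline path: the payload travels in the packet itself. */
		req->length = w->bytes;
		req->remaining_bytes = 0;
		req->channel = MODEL_CHANNEL_NONE;
		req->channel_info_length = 0;
		return;
	}

	/* RDMA read path: the I/O size moves from Length to RemainingBytes. */
	req->length = 0;
	req->remaining_bytes = (w->nr_pages - 1) * MODEL_PAGE_SIZE + w->tailsz;
	req->channel = need_invalidate ? MODEL_CHANNEL_RDMA_V1
				       : MODEL_CHANNEL_RDMA_V1_INVALIDATE;
	req->channel_info_length =
		(uint32_t)sizeof(struct model_buffer_descriptor_v1);
}
```

With three pages and a 100-byte tail, the RDMA read path reports
2 * 4096 + 100 = 8292 bytes in remaining_bytes and leaves length at 0;
with rdma set to false the same write reports 8292 in length instead.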