From: Long Li
To: Steve French, linux-cifs@vger.kernel.org, samba-technical@lists.samba.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, Christoph Hellwig, Tom Talpey, Matthew Wilcox, Stephen Hemminger
Cc: Long Li
Subject: [Patch v8 14/16] CIFS: SMBD: Upper layer performs SMB read via RDMA write through memory registration
Date: Wed, 22 Nov 2017 17:38:47 -0700
Message-Id: <20171123003849.17093-15-longli@exchange.microsoft.com>
X-Mailer: git-send-email 2.15.0
In-Reply-To: <20171123003849.17093-1-longli@exchange.microsoft.com>
References: <20171123003849.17093-1-longli@exchange.microsoft.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Long Li

If the I/O size is at least rdma_readwrite_threshold, use RDMA write for
SMB read by specifying channel SMB2_CHANNEL_RDMA_V1 or
SMB2_CHANNEL_RDMA_V1_INVALIDATE in the SMB packet, depending on the SMB
dialect in use. Append an smbd_buffer_descriptor_v1 to the end of the SMB
packet and fill in the other values to indicate that this SMB read uses
RDMA write.

There is no need to read the incoming payload from the transport. By the
time the SMB read response comes back, the data has already been
transferred and placed in the pages by the RDMA hardware.

When the SMB read is finished, deregister the memory region if RDMA write
was used for this SMB read. smbd_deregister_mr may need to do local
invalidation and sleep if server remote invalidation is not used.

There are situations where the MID may not be created on I/O failure. In
that case the memory region is deregistered when the read data context is
released.

Signed-off-by: Long Li
---
 fs/cifs/file.c    | 17 +++++++++++++++--
 fs/cifs/smb2pdu.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index df9f682..93259a16 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -42,7 +42,7 @@
 #include "cifs_debug.h"
 #include "cifs_fs_sb.h"
 #include "fscache.h"
-
+#include "smbdirect.h"
 
 static inline int cifs_convert_flags(unsigned int flags)
 {
@@ -2902,7 +2902,12 @@ cifs_readdata_release(struct kref *refcount)
 {
 	struct cifs_readdata *rdata = container_of(refcount,
 					struct cifs_readdata, refcount);
-
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	if (rdata->mr) {
+		smbd_deregister_mr(rdata->mr);
+		rdata->mr = NULL;
+	}
+#endif
 	if (rdata->cfile)
 		cifsFileInfo_put(rdata->cfile);
 
@@ -3031,6 +3036,10 @@ uncached_fill_pages(struct TCP_Server_Info *server,
 		}
 		if (iter)
 			result = copy_page_from_iter(page, 0, n, iter);
+#ifdef CONFIG_CIFS_SMB_DIRECT
+		else if (rdata->mr)
+			result = n;
+#endif
 		else
 			result = cifs_read_page_from_socket(server, page, n);
 		if (result < 0)
@@ -3600,6 +3609,10 @@ readpages_fill_pages(struct TCP_Server_Info *server,
 
 		if (iter)
 			result = copy_page_from_iter(page, 0, n, iter);
+#ifdef CONFIG_CIFS_SMB_DIRECT
+		else if (rdata->mr)
+			result = n;
+#endif
 		else
 			result = cifs_read_page_from_socket(server, page, n);
 		if (result < 0)
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 908d777..bee0871d 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -2458,7 +2458,40 @@ smb2_new_read_req(void **buf, unsigned int *total_len,
 	req->MinimumCount = 0;
 	req->Length = cpu_to_le32(io_parms->length);
 	req->Offset = cpu_to_le64(io_parms->offset);
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	/*
+	 * If we want to do an RDMA write, fill in and append
+	 * smbd_buffer_descriptor_v1 to the end of the read request
+	 */
+	if (server->rdma && rdata &&
+		rdata->bytes >= server->smbd_conn->rdma_readwrite_threshold) {
+
+		struct smbd_buffer_descriptor_v1 *v1;
+		bool need_invalidate =
+			io_parms->tcon->ses->server->dialect == SMB30_PROT_ID;
+
+		rdata->mr = smbd_register_mr(
+				server->smbd_conn, rdata->pages,
+				rdata->nr_pages, rdata->tailsz,
+				true, need_invalidate);
+		if (!rdata->mr)
+			return -ENOBUFS;
+
+		req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE;
+		if (need_invalidate)
+			req->Channel = SMB2_CHANNEL_RDMA_V1;
+		req->ReadChannelInfoOffset =
+			offsetof(struct smb2_read_plain_req, Buffer);
+		req->ReadChannelInfoLength =
+			sizeof(struct smbd_buffer_descriptor_v1);
+		v1 = (struct smbd_buffer_descriptor_v1 *) &req->Buffer[0];
+		v1->offset = rdata->mr->mr->iova;
+		v1->token = rdata->mr->mr->rkey;
+		v1->length = rdata->mr->mr->length;
 
+		*total_len += sizeof(*v1) - 1;
+	}
+#endif
 	if (request_type & CHAINED_REQUEST) {
 		if (!(request_type & END_OF_CHAIN)) {
 			/* next 8-byte aligned request */
@@ -2537,7 +2570,17 @@ smb2_readv_callback(struct mid_q_entry *mid)
 		if (rdata->result != -ENODATA)
 			rdata->result = -EIO;
 	}
-
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	/*
+	 * If this rdata has a memory region registered, the MR can be freed.
+	 * Free it as soon as the I/O finishes to prevent deadlock: MRs are
+	 * limited in number and are needed by future I/Os.
+	 */
+	if (rdata->mr) {
+		smbd_deregister_mr(rdata->mr);
+		rdata->mr = NULL;
+	}
+#endif
 	if (rdata->result)
 		cifs_stats_fail_inc(tcon, SMB2_READ_HE);
 
-- 
2.7.4
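
Reader's note (not part of the patch): the decision logic in
smb2_new_read_req can be hard to follow inside the diff, so here is a
minimal user-space sketch of it. It is an illustration only, not the
kernel code. Every demo_* name is hypothetical, the numeric constants are
assumed to match the kernel's SMB30_PROT_ID and SMB2_CHANNEL_* values,
and wire byte order is ignored. It shows the three steps the patch takes:
check the transport and threshold, pick the channel from the dialect, and
fill in the buffer descriptor from the registered memory region.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for SMB30_PROT_ID and a later dialect. */
enum { DEMO_SMB30_PROT_ID = 0x0300, DEMO_SMB311_PROT_ID = 0x0311 };

/* Assumed stand-ins for the SMB2_CHANNEL_* values named in the patch. */
enum {
	DEMO_CHANNEL_NONE = 0,
	DEMO_CHANNEL_RDMA_V1 = 1,
	DEMO_CHANNEL_RDMA_V1_INVALIDATE = 2
};

/* Same three fields the patch writes into smbd_buffer_descriptor_v1. */
struct demo_buffer_descriptor_v1 {
	uint64_t offset;
	uint32_t token;
	uint32_t length;
};

/* Hypothetical view of a registered memory region (iova, rkey, length). */
struct demo_mr {
	uint64_t iova;
	uint32_t rkey;
	uint32_t length;
};

/* Hypothetical, simplified read request with the descriptor appended. */
struct demo_read_req {
	uint32_t channel;
	uint16_t channel_info_offset;
	uint16_t channel_info_length;
	struct demo_buffer_descriptor_v1 desc;
};

/* RDMA write is used only on an RDMA transport and at or above the threshold. */
static bool demo_use_rdma(bool rdma_transport, size_t bytes, size_t threshold)
{
	return rdma_transport && bytes >= threshold;
}

/*
 * With the SMB 3.0 dialect the client invalidates the MR locally, so the
 * plain RDMA_V1 channel is requested; otherwise the server is asked to
 * invalidate remotely.
 */
static uint32_t demo_pick_channel(uint16_t dialect)
{
	bool need_invalidate = (dialect == DEMO_SMB30_PROT_ID);

	return need_invalidate ? DEMO_CHANNEL_RDMA_V1
			       : DEMO_CHANNEL_RDMA_V1_INVALIDATE;
}

/* Fill in the channel fields and copy the MR details into the descriptor. */
static void demo_fill_read_req(struct demo_read_req *req, uint16_t dialect,
			       const struct demo_mr *mr)
{
	req->channel = demo_pick_channel(dialect);
	req->channel_info_offset =
		(uint16_t)offsetof(struct demo_read_req, desc);
	req->channel_info_length = (uint16_t)sizeof(req->desc);
	req->desc.offset = mr->iova;
	req->desc.token = mr->rkey;
	req->desc.length = mr->length;
}
```

A caller would first test demo_use_rdma() and, only if it returns true,
register memory and call demo_fill_read_req(). Keeping the dialect check
in its own helper makes it easier to see that the SMB 3.0 case maps to
the non-invalidating channel.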