2018-05-30 19:54:48

by Long Li

[permalink] [raw]
Subject: [Patch v2 00/15] CIFS: Add direct I/O support

From: Long Li <[email protected]>

This patch set implements direct I/O.

In normal code path (even with cache=none), CIFS copies I/O data from
user-space to kernel-space for security reasons of possible protocol
required signing and encryption on user data.

With this patch set, CIFS passes the I/O data directly from user-space
buffer to the transport layer, when file system is mounted with
"cache-none".

Patch v2 addressed comments from Christoph Hellwig <[email protected]> and
Tom Talpey <[email protected]> to implement direct I/O for both
socket and RDMA.


Long Li (15):
CIFS: Introduce offset for the 1st page in data transfer structures
CIFS: Add support for direct pages in rdata
CIFS: Use offset when reading pages
CIFS: Add support for direct pages in wdata
CIFS: Calculate the correct request length based on page offset and
tail size
CIFS: Introduce helper function to get page offset and length in
smb_rqst
CIFS: When sending data on socket, pass the correct page offset
CIFS: SMBD: Support page offset in RDMA send
CIFS: SMBD: Support page offset in RDMA recv
CIFS: SMBD: Support page offset in memory registration
CIFS: Pass page offset for calculating signature
CIFS: Pass page offset for encrypting
CIFS: Add support for direct I/O read
CIFS: Add support for direct I/O write
CIFS: Add direct I/O functions to file_operations

fs/cifs/cifsencrypt.c | 9 +-
fs/cifs/cifsfs.c | 5 +-
fs/cifs/cifsfs.h | 2 +
fs/cifs/cifsglob.h | 7 +-
fs/cifs/cifsproto.h | 9 +-
fs/cifs/cifssmb.c | 17 ++-
fs/cifs/connect.c | 5 +-
fs/cifs/file.c | 389 +++++++++++++++++++++++++++++++++++++++++++++++---
fs/cifs/misc.c | 17 +++
fs/cifs/smb2ops.c | 22 +--
fs/cifs/smb2pdu.c | 20 ++-
fs/cifs/smbdirect.c | 121 ++++++++++------
fs/cifs/smbdirect.h | 2 +-
fs/cifs/transport.c | 34 +++--
14 files changed, 554 insertions(+), 105 deletions(-)

--
2.7.4



2018-05-30 19:50:50

by Long Li

[permalink] [raw]
Subject: [Patch v2 04/15] CIFS: Add support for direct pages in wdata

From: Long Li <[email protected]>

Add a function to allocate wdata without allocating pages for data
transfer. This gives the caller an option to pass a number of pages that
point to the data buffer to write to.

wdata is reponsible for free those pages after it's done.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifsglob.h | 2 +-
fs/cifs/cifsproto.h | 2 ++
fs/cifs/cifssmb.c | 17 ++++++++++++++---
3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 56864a87..7f62c98 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1205,7 +1205,7 @@ struct cifs_writedata {
unsigned int tailsz;
unsigned int credits;
unsigned int nr_pages;
- struct page *pages[];
+ struct page **pages;
};

/*
diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 1f27d8e..7933c5f 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -533,6 +533,8 @@ int cifs_async_writev(struct cifs_writedata *wdata,
void cifs_writev_complete(struct work_struct *work);
struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
work_func_t complete);
+struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages,
+ work_func_t complete);
void cifs_writedata_release(struct kref *refcount);
int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
struct cifs_sb_info *cifs_sb,
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index c8e4278..5aca336 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -1952,6 +1952,7 @@ cifs_writedata_release(struct kref *refcount)
if (wdata->cfile)
cifsFileInfo_put(wdata->cfile);

+ kvfree(wdata->pages);
kfree(wdata);
}

@@ -2075,12 +2076,22 @@ cifs_writev_complete(struct work_struct *work)
struct cifs_writedata *
cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
{
+ struct page **pages =
+ kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);
+ if (pages)
+ return cifs_writedata_direct_alloc(pages, complete);
+
+ return NULL;
+}
+
+struct cifs_writedata *
+cifs_writedata_direct_alloc(struct page **pages, work_func_t complete)
+{
struct cifs_writedata *wdata;

- /* writedata + number of page pointers */
- wdata = kzalloc(sizeof(*wdata) +
- sizeof(struct page *) * nr_pages, GFP_NOFS);
+ wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
if (wdata != NULL) {
+ wdata->pages = pages;
kref_init(&wdata->refcount);
INIT_LIST_HEAD(&wdata->list);
init_completion(&wdata->done);
--
2.7.4


2018-05-30 19:51:04

by Long Li

[permalink] [raw]
Subject: [Patch v2 12/15] CIFS: Pass page offset for encrypting

From: Long Li <[email protected]>

Encryption function needs to read data starting page offset from input
buffer.

This doesn't affect decryption path since it allocates its own page
buffers.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/smb2ops.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 1fa1c29..38d19b6 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2189,9 +2189,10 @@ init_sg(struct smb_rqst *rqst, u8 *sign)
smb2_sg_set_buf(&sg[i], rqst->rq_iov[i].iov_base,
rqst->rq_iov[i].iov_len);
for (j = 0; i < sg_len - 1; i++, j++) {
- unsigned int len = (j < rqst->rq_npages - 1) ? rqst->rq_pagesz
- : rqst->rq_tailsz;
- sg_set_page(&sg[i], rqst->rq_pages[j], len, 0);
+ unsigned int len, offset;
+
+ rqst_page_get_length(rqst, j, &len, &offset);
+ sg_set_page(&sg[i], rqst->rq_pages[j], len, offset);
}
smb2_sg_set_buf(&sg[sg_len - 1], sign, SMB2_SIGNATURE_SIZE);
return sg;
@@ -2332,6 +2333,7 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, struct smb_rqst *new_rq,
return rc;

new_rq->rq_pages = pages;
+ new_rq->rq_offset = old_rq->rq_offset;
new_rq->rq_npages = old_rq->rq_npages;
new_rq->rq_pagesz = old_rq->rq_pagesz;
new_rq->rq_tailsz = old_rq->rq_tailsz;
@@ -2363,10 +2365,14 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, struct smb_rqst *new_rq,

/* copy pages form the old */
for (i = 0; i < npages; i++) {
- char *dst = kmap(new_rq->rq_pages[i]);
- char *src = kmap(old_rq->rq_pages[i]);
- unsigned int len = (i < npages - 1) ? new_rq->rq_pagesz :
- new_rq->rq_tailsz;
+ char *dst, *src;
+ unsigned int offset, len;
+
+ rqst_page_get_length(new_rq, i, &len, &offset);
+
+ dst = (char *) kmap(new_rq->rq_pages[i]) + offset;
+ src = (char *) kmap(old_rq->rq_pages[i]) + offset;
+
memcpy(dst, src, len);
kunmap(new_rq->rq_pages[i]);
kunmap(old_rq->rq_pages[i]);
--
2.7.4


2018-05-30 19:51:20

by Long Li

[permalink] [raw]
Subject: [Patch v2 15/15] CIFS: Add direct I/O functions to file_operations

From: Long Li <[email protected]>

With direct read/write functions implemented, add them to file_operations.
When mounting with "cache=none", CIFS uses direct I/O.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifsfs.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 62f1662..e84f8c2 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1113,9 +1113,8 @@ const struct file_operations cifs_file_strict_ops = {
};

const struct file_operations cifs_file_direct_ops = {
- /* BB reevaluate whether they can be done with directio, no cache */
- .read_iter = cifs_user_readv,
- .write_iter = cifs_user_writev,
+ .read_iter = cifs_direct_readv,
+ .write_iter = cifs_direct_writev,
.open = cifs_open,
.release = cifs_close,
.lock = cifs_lock,
--
2.7.4


2018-05-30 19:51:52

by Long Li

[permalink] [raw]
Subject: [Patch v2 11/15] CIFS: Pass page offset for calculating signature

From: Long Li <[email protected]>

When calculating signature for the packet, it needs to read into the
correct page offset for the data.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifsencrypt.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index a6ef088..e88303c 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -68,11 +68,12 @@ int __cifs_calc_signature(struct smb_rqst *rqst,

/* now hash over the rq_pages array */
for (i = 0; i < rqst->rq_npages; i++) {
- void *kaddr = kmap(rqst->rq_pages[i]);
- size_t len = rqst->rq_pagesz;
+ void *kaddr;
+ unsigned int len, offset;

- if (i == rqst->rq_npages - 1)
- len = rqst->rq_tailsz;
+ rqst_page_get_length(rqst, i, &len, &offset);
+
+ kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;

crypto_shash_update(shash, kaddr, len);

--
2.7.4


2018-05-30 19:52:35

by Long Li

[permalink] [raw]
Subject: [Patch v2 08/15] CIFS: SMBD: Support page offset in RDMA send

From: Long Li <[email protected]>

The RDMA send function needs to look at offset in the request pages, and
send data starting from there.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/smbdirect.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index c62f7c9..6141e3c 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -17,6 +17,7 @@
#include <linux/highmem.h>
#include "smbdirect.h"
#include "cifs_debug.h"
+#include "cifsproto.h"

static struct smbd_response *get_empty_queue_buffer(
struct smbd_connection *info);
@@ -2082,7 +2083,7 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
struct kvec vec;
int nvecs;
int size;
- int buflen = 0, remaining_data_length;
+ unsigned int buflen = 0, remaining_data_length;
int start, i, j;
int max_iov_size =
info->max_send_size - sizeof(struct smbd_data_transfer);
@@ -2113,10 +2114,17 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
buflen += iov[i].iov_len;
}

- /* add in the page array if there is one */
+ /*
+ * Add in the page array if there is one. The caller needs to set
+ * rq_tailsz to PAGE_SIZE when the buffer has multiple pages and
+ * ends at page boundary
+ */
if (rqst->rq_npages) {
- buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
- buflen += rqst->rq_tailsz;
+ if (rqst->rq_npages == 1)
+ buflen += rqst->rq_tailsz;
+ else
+ buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
+ rqst->rq_offset + rqst->rq_tailsz;
}

if (buflen + sizeof(struct smbd_data_transfer) >
@@ -2213,8 +2221,9 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)

/* now sending pages if there are any */
for (i = 0; i < rqst->rq_npages; i++) {
- buflen = (i == rqst->rq_npages-1) ?
- rqst->rq_tailsz : rqst->rq_pagesz;
+ unsigned int offset;
+
+ rqst_page_get_length(rqst, i, &buflen, &offset);
nvecs = (buflen + max_iov_size - 1) / max_iov_size;
log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
buflen, nvecs);
@@ -2225,9 +2234,11 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
remaining_data_length -= size;
log_write(INFO, "sending pages i=%d offset=%d size=%d"
" remaining_data_length=%d\n",
- i, j*max_iov_size, size, remaining_data_length);
+ i, j*max_iov_size+offset, size,
+ remaining_data_length);
rc = smbd_post_send_page(
- info, rqst->rq_pages[i], j*max_iov_size,
+ info, rqst->rq_pages[i],
+ j*max_iov_size + offset,
size, remaining_data_length);
if (rc)
goto done;
--
2.7.4


2018-05-30 19:52:36

by Long Li

[permalink] [raw]
Subject: [Patch v2 10/15] CIFS: SMBD: Support page offset in memory registration

From: Long Li <[email protected]>

Change code to pass the correct page offset during memory registration for
RDMA read/write.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/smb2pdu.c | 18 ++++++++-----
fs/cifs/smbdirect.c | 76 +++++++++++++++++++++++++++++++----------------------
fs/cifs/smbdirect.h | 2 +-
3 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index f603fbe..fc30774 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -2623,8 +2623,8 @@ smb2_new_read_req(void **buf, unsigned int *total_len,

rdata->mr = smbd_register_mr(
server->smbd_conn, rdata->pages,
- rdata->nr_pages, rdata->tailsz,
- true, need_invalidate);
+ rdata->nr_pages, rdata->page_offset,
+ rdata->tailsz, true, need_invalidate);
if (!rdata->mr)
return -ENOBUFS;

@@ -3013,16 +3013,22 @@ smb2_async_writev(struct cifs_writedata *wdata,

wdata->mr = smbd_register_mr(
server->smbd_conn, wdata->pages,
- wdata->nr_pages, wdata->tailsz,
- false, need_invalidate);
+ wdata->nr_pages, wdata->page_offset,
+ wdata->tailsz, false, need_invalidate);
if (!wdata->mr) {
rc = -ENOBUFS;
goto async_writev_out;
}
req->Length = 0;
req->DataOffset = 0;
- req->RemainingBytes =
- cpu_to_le32((wdata->nr_pages-1)*PAGE_SIZE + wdata->tailsz);
+ if (wdata->nr_pages > 1)
+ req->RemainingBytes =
+ cpu_to_le32(
+ (wdata->nr_pages - 1) * wdata->pagesz -
+ wdata->page_offset + wdata->tailsz
+ );
+ else
+ req->RemainingBytes = cpu_to_le32(wdata->tailsz);
req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE;
if (need_invalidate)
req->Channel = SMB2_CHANNEL_RDMA_V1;
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index ba53c52..e459c97 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2299,37 +2299,37 @@ static void smbd_mr_recovery_work(struct work_struct *work)
if (smbdirect_mr->state == MR_INVALIDATED ||
smbdirect_mr->state == MR_ERROR) {

- if (smbdirect_mr->state == MR_INVALIDATED) {
+ /* recover this MR entry */
+ rc = ib_dereg_mr(smbdirect_mr->mr);
+ if (rc) {
+ log_rdma_mr(ERR,
+ "ib_dereg_mr failed rc=%x\n",
+ rc);
+ smbd_disconnect_rdma_connection(info);
+ continue;
+ }
+
+ smbdirect_mr->mr = ib_alloc_mr(
+ info->pd, info->mr_type,
+ info->max_frmr_depth);
+ if (IS_ERR(smbdirect_mr->mr)) {
+ log_rdma_mr(ERR,
+ "ib_alloc_mr failed mr_type=%x "
+ "max_frmr_depth=%x\n",
+ info->mr_type,
+ info->max_frmr_depth);
+ smbd_disconnect_rdma_connection(info);
+ continue;
+ }
+
+ if (smbdirect_mr->state == MR_INVALIDATED)
ib_dma_unmap_sg(
info->id->device, smbdirect_mr->sgl,
smbdirect_mr->sgl_count,
smbdirect_mr->dir);
- smbdirect_mr->state = MR_READY;
- } else if (smbdirect_mr->state == MR_ERROR) {
-
- /* recover this MR entry */
- rc = ib_dereg_mr(smbdirect_mr->mr);
- if (rc) {
- log_rdma_mr(ERR,
- "ib_dereg_mr failed rc=%x\n",
- rc);
- smbd_disconnect_rdma_connection(info);
- }

- smbdirect_mr->mr = ib_alloc_mr(
- info->pd, info->mr_type,
- info->max_frmr_depth);
- if (IS_ERR(smbdirect_mr->mr)) {
- log_rdma_mr(ERR,
- "ib_alloc_mr failed mr_type=%x "
- "max_frmr_depth=%x\n",
- info->mr_type,
- info->max_frmr_depth);
- smbd_disconnect_rdma_connection(info);
- }
+ smbdirect_mr->state = MR_READY;

- smbdirect_mr->state = MR_READY;
- }
/* smbdirect_mr->state is updated by this function
* and is read and updated by I/O issuing CPUs trying
* to get a MR, the call to atomic_inc_return
@@ -2475,7 +2475,7 @@ static struct smbd_mr *get_mr(struct smbd_connection *info)
*/
struct smbd_mr *smbd_register_mr(
struct smbd_connection *info, struct page *pages[], int num_pages,
- int tailsz, bool writing, bool need_invalidate)
+ int offset, int tailsz, bool writing, bool need_invalidate)
{
struct smbd_mr *smbdirect_mr;
int rc, i;
@@ -2498,17 +2498,31 @@ struct smbd_mr *smbd_register_mr(
smbdirect_mr->sgl_count = num_pages;
sg_init_table(smbdirect_mr->sgl, num_pages);

- for (i = 0; i < num_pages - 1; i++)
- sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
+ log_rdma_mr(INFO, "num_pages=0x%x offset=0x%x tailsz=0x%x\n",
+ num_pages, offset, tailsz);

+ if (num_pages == 1) {
+ sg_set_page(&smbdirect_mr->sgl[0], pages[0], tailsz, offset);
+ goto skip_multiple_pages;
+ }
+
+ /* We have at least two pages to register */
+ sg_set_page(
+ &smbdirect_mr->sgl[0], pages[0], PAGE_SIZE - offset, offset);
+ i = 1;
+ while (i < num_pages - 1) {
+ sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
+ i++;
+ }
sg_set_page(&smbdirect_mr->sgl[i], pages[i],
tailsz ? tailsz : PAGE_SIZE, 0);

+skip_multiple_pages:
dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
smbdirect_mr->dir = dir;
rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgl, num_pages, dir);
if (!rc) {
- log_rdma_mr(INFO, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
+ log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
num_pages, dir, rc);
goto dma_map_error;
}
@@ -2516,8 +2530,8 @@ struct smbd_mr *smbd_register_mr(
rc = ib_map_mr_sg(smbdirect_mr->mr, smbdirect_mr->sgl, num_pages,
NULL, PAGE_SIZE);
if (rc != num_pages) {
- log_rdma_mr(INFO,
- "ib_map_mr_sg failed rc = %x num_pages = %x\n",
+ log_rdma_mr(ERR,
+ "ib_map_mr_sg failed rc = %d num_pages = %x\n",
rc, num_pages);
goto map_mr_error;
}
diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h
index f9038da..1e419c2 100644
--- a/fs/cifs/smbdirect.h
+++ b/fs/cifs/smbdirect.h
@@ -321,7 +321,7 @@ struct smbd_mr {
/* Interfaces to register and deregister MR for RDMA read/write */
struct smbd_mr *smbd_register_mr(
struct smbd_connection *info, struct page *pages[], int num_pages,
- int tailsz, bool writing, bool need_invalidate);
+ int offset, int tailsz, bool writing, bool need_invalidate);
int smbd_deregister_mr(struct smbd_mr *mr);

#else
--
2.7.4


2018-05-30 19:52:43

by Long Li

[permalink] [raw]
Subject: [Patch v2 14/15] CIFS: Add support for direct I/O write

From: Long Li <[email protected]>

Implement the function for direct I/O write. It doesn't support AIO, which
will be implemented in a follow up patch.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifsfs.h | 1 +
fs/cifs/file.c | 165 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 166 insertions(+)

diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 7fba9aa..e9c5103 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -105,6 +105,7 @@ extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
+extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
extern int cifs_lock(struct file *, int, struct file_lock *);
extern int cifs_fsync(struct file *, loff_t, loff_t, int);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index e6e6f24..8c385b1 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2461,6 +2461,35 @@ cifs_uncached_writedata_release(struct kref *refcount)

static void collect_uncached_write_data(struct cifs_aio_ctx *ctx);

+static void cifs_direct_writedata_release(struct kref *refcount)
+{
+ int i;
+ struct cifs_writedata *wdata = container_of(refcount,
+ struct cifs_writedata, refcount);
+
+ for (i = 0; i < wdata->nr_pages; i++)
+ put_page(wdata->pages[i]);
+
+ cifs_writedata_release(refcount);
+}
+
+static void cifs_direct_writev_complete(struct work_struct *work)
+{
+ struct cifs_writedata *wdata = container_of(work,
+ struct cifs_writedata, work);
+ struct inode *inode = d_inode(wdata->cfile->dentry);
+ struct cifsInodeInfo *cifsi = CIFS_I(inode);
+
+ spin_lock(&inode->i_lock);
+ cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
+ if (cifsi->server_eof > inode->i_size)
+ i_size_write(inode, cifsi->server_eof);
+ spin_unlock(&inode->i_lock);
+
+ complete(&wdata->done);
+ kref_put(&wdata->refcount, cifs_direct_writedata_release);
+}
+
static void
cifs_uncached_writev_complete(struct work_struct *work)
{
@@ -2703,6 +2732,142 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx)
complete(&ctx->done);
}

+ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
+{
+ struct file *file = iocb->ki_filp;
+ ssize_t total_written = 0;
+ struct cifsFileInfo *cfile;
+ struct cifs_tcon *tcon;
+ struct cifs_sb_info *cifs_sb;
+ struct TCP_Server_Info *server;
+ pid_t pid;
+ unsigned long nr_pages;
+ loff_t offset = iocb->ki_pos;
+ size_t len = iov_iter_count(from);
+ int rc;
+ struct cifs_writedata *wdata;
+
+ /*
+ * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
+ * In this case, fall back to non-direct write function.
+ */
+ if (from->type & ITER_KVEC) {
+ cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec I/O\n");
+ return cifs_user_writev(iocb, from);
+ }
+
+ rc = generic_write_checks(iocb, from);
+ if (rc <= 0)
+ return rc;
+
+ cifs_sb = CIFS_FILE_SB(file);
+ cfile = file->private_data;
+ tcon = tlink_tcon(cfile->tlink);
+ server = tcon->ses->server;
+
+ if (!server->ops->async_writev)
+ return -ENOSYS;
+
+ if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
+ pid = cfile->pid;
+ else
+ pid = current->tgid;
+
+ do {
+ unsigned int wsize, credits;
+ struct page **pagevec;
+ size_t start;
+ ssize_t cur_len;
+
+ rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
+ &wsize, &credits);
+ if (rc)
+ break;
+
+ cur_len = iov_iter_get_pages_alloc(
+ from, &pagevec, wsize, &start);
+ if (cur_len < 0) {
+ cifs_dbg(VFS,
+ "direct_writev couldn't get user pages "
+ "(rc=%zd) iter type %d iov_offset %lu count"
+ " %lu\n",
+ cur_len, from->type,
+ from->iov_offset, from->count);
+ dump_stack();
+ break;
+ }
+ if (cur_len < 0)
+ break;
+
+ nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
+
+ wdata = cifs_writedata_direct_alloc(pagevec,
+ cifs_direct_writev_complete);
+ if (!wdata) {
+ rc = -ENOMEM;
+ add_credits_and_wake_if(server, credits, 0);
+ break;
+ }
+
+ wdata->nr_pages = nr_pages;
+ wdata->page_offset = start;
+ wdata->pagesz = PAGE_SIZE;
+ wdata->tailsz =
+ nr_pages > 1 ?
+ cur_len - (PAGE_SIZE - start) -
+ (nr_pages - 2) * PAGE_SIZE :
+ cur_len;
+
+ wdata->sync_mode = WB_SYNC_ALL;
+ wdata->offset = (__u64)offset;
+ wdata->cfile = cifsFileInfo_get(cfile);
+ wdata->pid = pid;
+ wdata->bytes = cur_len;
+ wdata->credits = credits;
+
+ rc = 0;
+ if (wdata->cfile->invalidHandle)
+ rc = cifs_reopen_file(wdata->cfile, false);
+
+ if (!rc)
+ rc = server->ops->async_writev(wdata,
+ cifs_direct_writedata_release);
+
+ if (rc) {
+ add_credits_and_wake_if(server, wdata->credits, 0);
+ kref_put(&wdata->refcount,
+ cifs_direct_writedata_release);
+ if (rc == -EAGAIN)
+ continue;
+ break;
+ }
+
+ wait_for_completion(&wdata->done);
+ if (wdata->result) {
+ rc = wdata->result;
+ kref_put(&wdata->refcount,
+ cifs_direct_writedata_release);
+ if (rc == -EAGAIN)
+ continue;
+ break;
+ }
+
+ kref_put(&wdata->refcount, cifs_direct_writedata_release);
+
+ iov_iter_advance(from, cur_len);
+ total_written += cur_len;
+ offset += cur_len;
+ len -= cur_len;
+ } while (len);
+
+ if (unlikely(!total_written))
+ return rc;
+
+ iocb->ki_pos += total_written;
+ return total_written;
+
+}
+
ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
--
2.7.4


2018-05-30 19:53:11

by Long Li

[permalink] [raw]
Subject: [Patch v2 13/15] CIFS: Add support for direct I/O read

From: Long Li <[email protected]>

Implement the function for direct I/O read. It doesn't support AIO, which
will be implemented in a follow up patch.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifsfs.h | 1 +
fs/cifs/file.c | 149 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 150 insertions(+)

diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 5f02318..7fba9aa 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -102,6 +102,7 @@ extern int cifs_open(struct inode *inode, struct file *file);
extern int cifs_close(struct inode *inode, struct file *file);
extern int cifs_closedir(struct inode *inode, struct file *file);
extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
+extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 87eece6..e6e6f24 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2955,6 +2955,18 @@ cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages)
return rc;
}

+static void cifs_direct_readdata_release(struct kref *refcount)
+{
+ struct cifs_readdata *rdata = container_of(refcount,
+ struct cifs_readdata, refcount);
+ unsigned int i;
+
+ for (i = 0; i < rdata->nr_pages; i++)
+ put_page(rdata->pages[i]);
+
+ cifs_readdata_release(refcount);
+}
+
static void
cifs_uncached_readdata_release(struct kref *refcount)
{
@@ -3267,6 +3279,143 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
complete(&ctx->done);
}

+static void cifs_direct_readv_complete(struct work_struct *work)
+{
+ struct cifs_readdata *rdata =
+ container_of(work, struct cifs_readdata, work);
+
+ complete(&rdata->done);
+ kref_put(&rdata->refcount, cifs_direct_readdata_release);
+}
+
+ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
+{
+ size_t len, cur_len, start;
+ unsigned int npages, rsize, credits;
+ struct file *file;
+ struct cifs_sb_info *cifs_sb;
+ struct cifsFileInfo *cfile;
+ struct cifs_tcon *tcon;
+ struct page **pagevec;
+ ssize_t rc, total_read = 0;
+ struct TCP_Server_Info *server;
+ loff_t offset = iocb->ki_pos;
+ pid_t pid;
+ struct cifs_readdata *rdata;
+
+ /*
+ * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
+ * fall back to data copy read path
+ */
+ if (to->type & ITER_KVEC) {
+ cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
+ return cifs_user_readv(iocb, to);
+ }
+
+ len = iov_iter_count(to);
+ if (!len)
+ return 0;
+
+ file = iocb->ki_filp;
+ cifs_sb = CIFS_FILE_SB(file);
+ cfile = file->private_data;
+ tcon = tlink_tcon(cfile->tlink);
+ server = tcon->ses->server;
+
+ if (!server->ops->async_readv)
+ return -ENOSYS;
+
+ if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
+ pid = cfile->pid;
+ else
+ pid = current->tgid;
+
+ if ((file->f_flags & O_ACCMODE) == O_WRONLY)
+ cifs_dbg(FYI, "attempting read on write only file instance\n");
+
+ do {
+ rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
+ &rsize, &credits);
+ if (rc)
+ break;
+
+ cur_len = min_t(const size_t, len, rsize);
+
+ rc = iov_iter_get_pages_alloc(to, &pagevec, cur_len, &start);
+ if (rc < 0) {
+ cifs_dbg(VFS,
+ "couldn't get user pages (rc=%zd) iter type %d"
+ " iov_offset %lu count %lu\n",
+ rc, to->type, to->iov_offset, to->count);
+ dump_stack();
+ break;
+ }
+
+ rdata = cifs_readdata_direct_alloc(
+ pagevec, cifs_direct_readv_complete);
+ if (!rdata) {
+ add_credits_and_wake_if(server, credits, 0);
+ rc = -ENOMEM;
+ break;
+ }
+
+ npages = (rc + start + PAGE_SIZE-1) / PAGE_SIZE;
+ rdata->nr_pages = npages;
+ rdata->page_offset = start;
+ rdata->pagesz = PAGE_SIZE;
+ rdata->tailsz = npages > 1 ?
+ rc-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
+ rc;
+ cur_len = rc;
+
+ rdata->cfile = cifsFileInfo_get(cfile);
+ rdata->offset = offset;
+ rdata->bytes = rc;
+ rdata->pid = pid;
+ rdata->read_into_pages = cifs_uncached_read_into_pages;
+ rdata->copy_into_pages = cifs_uncached_copy_into_pages;
+ rdata->credits = credits;
+
+ rc = 0;
+ if (rdata->cfile->invalidHandle)
+ rc = cifs_reopen_file(rdata->cfile, true);
+
+ if (!rc)
+ rc = server->ops->async_readv(rdata);
+
+ if (rc) {
+ add_credits_and_wake_if(server, rdata->credits, 0);
+ kref_put(&rdata->refcount,
+ cifs_direct_readdata_release);
+ if (rc == -EAGAIN)
+ continue;
+ break;
+ }
+
+ wait_for_completion(&rdata->done);
+ rc = rdata->result;
+ if (rc) {
+ kref_put(
+ &rdata->refcount,
+ cifs_direct_readdata_release);
+ if (rc == -EAGAIN)
+ continue;
+ break;
+ }
+
+ total_read += rdata->got_bytes;
+ kref_put(&rdata->refcount, cifs_direct_readdata_release);
+
+ iov_iter_advance(to, cur_len);
+ len -= cur_len;
+ offset += cur_len;
+ } while (len);
+
+ iocb->ki_pos += total_read;
+
+ return total_read;
+}
+
ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
{
struct file *file = iocb->ki_filp;
--
2.7.4


2018-05-30 19:53:25

by Long Li

[permalink] [raw]
Subject: [Patch v2 03/15] CIFS: Use offset when reading pages

From: Long Li <[email protected]>

With offset defined in rdata, transport functions need to look at this
offset when reading data into the correct places in pages.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifsproto.h | 4 +++-
fs/cifs/connect.c | 5 +++--
fs/cifs/file.c | 52 +++++++++++++++++++++++++++++++++++++---------------
fs/cifs/smb2ops.c | 2 +-
fs/cifs/smb2pdu.c | 1 +
5 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index dc80f84..1f27d8e 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -203,7 +203,9 @@ extern void dequeue_mid(struct mid_q_entry *mid, bool malformed);
extern int cifs_read_from_socket(struct TCP_Server_Info *server, char *buf,
unsigned int to_read);
extern int cifs_read_page_from_socket(struct TCP_Server_Info *server,
- struct page *page, unsigned int to_read);
+ struct page *page,
+ unsigned int page_offset,
+ unsigned int to_read);
extern int cifs_setup_cifs_sb(struct smb_vol *pvolume_info,
struct cifs_sb_info *cifs_sb);
extern int cifs_match_super(struct super_block *, void *);
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 83b0234..8501da0 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -594,10 +594,11 @@ cifs_read_from_socket(struct TCP_Server_Info *server, char *buf,

int
cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page,
- unsigned int to_read)
+ unsigned int page_offset, unsigned int to_read)
{
struct msghdr smb_msg;
- struct bio_vec bv = {.bv_page = page, .bv_len = to_read};
+ struct bio_vec bv = {
+ .bv_page = page, .bv_len = to_read, .bv_offset = page_offset};
iov_iter_bvec(&smb_msg.msg_iter, READ | ITER_BVEC, &bv, 1, to_read);
return cifs_readv_from_socket(server, &smb_msg);
}
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 1c98293..87eece6 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3026,12 +3026,20 @@ uncached_fill_pages(struct TCP_Server_Info *server,
int result = 0;
unsigned int i;
unsigned int nr_pages = rdata->nr_pages;
+ unsigned int page_offset = rdata->page_offset;

rdata->got_bytes = 0;
rdata->tailsz = PAGE_SIZE;
for (i = 0; i < nr_pages; i++) {
struct page *page = rdata->pages[i];
size_t n;
+ unsigned int segment_size = rdata->pagesz;
+
+ if (i == 0)
+ segment_size -= page_offset;
+ else
+ page_offset = 0;
+

if (len <= 0) {
/* no need to hold page hostage */
@@ -3040,24 +3048,25 @@ uncached_fill_pages(struct TCP_Server_Info *server,
put_page(page);
continue;
}
+
n = len;
- if (len >= PAGE_SIZE) {
+ if (len >= segment_size)
/* enough data to fill the page */
- n = PAGE_SIZE;
- len -= n;
- } else {
- zero_user(page, len, PAGE_SIZE - len);
+ n = segment_size;
+ else
rdata->tailsz = len;
- len = 0;
- }
+ len -= n;
+
if (iter)
- result = copy_page_from_iter(page, 0, n, iter);
+ result = copy_page_from_iter(
+ page, page_offset, n, iter);
#ifdef CONFIG_CIFS_SMB_DIRECT
else if (rdata->mr)
result = n;
#endif
else
- result = cifs_read_page_from_socket(server, page, n);
+ result = cifs_read_page_from_socket(
+ server, page, page_offset, n);
if (result < 0)
break;

@@ -3130,6 +3139,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
rdata->bytes = cur_len;
rdata->pid = pid;
rdata->pagesz = PAGE_SIZE;
+ rdata->tailsz = PAGE_SIZE;
rdata->read_into_pages = cifs_uncached_read_into_pages;
rdata->copy_into_pages = cifs_uncached_copy_into_pages;
rdata->credits = credits;
@@ -3574,6 +3584,7 @@ readpages_fill_pages(struct TCP_Server_Info *server,
u64 eof;
pgoff_t eof_index;
unsigned int nr_pages = rdata->nr_pages;
+ unsigned int page_offset = rdata->page_offset;

/* determine the eof that the server (probably) has */
eof = CIFS_I(rdata->mapping->host)->server_eof;
@@ -3584,13 +3595,21 @@ readpages_fill_pages(struct TCP_Server_Info *server,
rdata->tailsz = PAGE_SIZE;
for (i = 0; i < nr_pages; i++) {
struct page *page = rdata->pages[i];
- size_t n = PAGE_SIZE;
+ unsigned int to_read = rdata->pagesz;
+ size_t n;
+
+ if (i == 0)
+ to_read -= page_offset;
+ else
+ page_offset = 0;
+
+ n = to_read;

- if (len >= PAGE_SIZE) {
- len -= PAGE_SIZE;
+ if (len >= to_read) {
+ len -= to_read;
} else if (len > 0) {
/* enough for partial page, fill and zero the rest */
- zero_user(page, len, PAGE_SIZE - len);
+ zero_user(page, len + page_offset, to_read - len);
n = rdata->tailsz = len;
len = 0;
} else if (page->index > eof_index) {
@@ -3622,13 +3641,15 @@ readpages_fill_pages(struct TCP_Server_Info *server,
}

if (iter)
- result = copy_page_from_iter(page, 0, n, iter);
+ result = copy_page_from_iter(
+ page, page_offset, n, iter);
#ifdef CONFIG_CIFS_SMB_DIRECT
else if (rdata->mr)
result = n;
#endif
else
- result = cifs_read_page_from_socket(server, page, n);
+ result = cifs_read_page_from_socket(
+ server, page, page_offset, n);
if (result < 0)
break;

@@ -3807,6 +3828,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
rdata->bytes = bytes;
rdata->pid = pid;
rdata->pagesz = PAGE_SIZE;
+ rdata->tailsz = PAGE_SIZE;
rdata->read_into_pages = cifs_readpages_read_into_pages;
rdata->copy_into_pages = cifs_readpages_copy_into_pages;
rdata->credits = credits;
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 7c0edd2..1fa1c29 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2467,7 +2467,7 @@ read_data_into_pages(struct TCP_Server_Info *server, struct page **pages,
zero_user(page, len, PAGE_SIZE - len);
len = 0;
}
- length = cifs_read_page_from_socket(server, page, n);
+ length = cifs_read_page_from_socket(server, page, 0, n);
if (length < 0)
return length;
server->total_read += length;
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index a02f6b6..f603fbe 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -2683,6 +2683,7 @@ smb2_readv_callback(struct mid_q_entry *mid)
struct smb_rqst rqst = { .rq_iov = rdata->iov,
.rq_nvec = 2,
.rq_pages = rdata->pages,
+ .rq_offset = rdata->page_offset,
.rq_npages = rdata->nr_pages,
.rq_pagesz = rdata->pagesz,
.rq_tailsz = rdata->tailsz };
--
2.7.4


2018-05-30 19:53:37

by Long Li

[permalink] [raw]
Subject: [Patch v2 01/15] CIFS: Introduce offset for the 1st page in data transfer structures

From: Long Li <[email protected]>

When direct I/O is used, the data buffer may not always align to page
boundaries. Introduce a page offset in transport data structures to
describe the location of the buffer within the page.

Also change the function to pass the page offset when sending data to
transport.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifsglob.h | 3 +++
fs/cifs/smb2pdu.c | 1 +
2 files changed, 4 insertions(+)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 4f674b7..8d16c3e 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -176,6 +176,7 @@ struct smb_rqst {
struct kvec *rq_iov; /* array of kvecs */
unsigned int rq_nvec; /* number of kvecs in array */
struct page **rq_pages; /* pointer to array of page ptrs */
+ unsigned int rq_offset; /* the offset to the 1st page */
unsigned int rq_npages; /* number pages in array */
unsigned int rq_pagesz; /* page size to use */
unsigned int rq_tailsz; /* length of last page */
@@ -1174,6 +1175,7 @@ struct cifs_readdata {
struct smbd_mr *mr;
#endif
unsigned int pagesz;
+ unsigned int page_offset;
unsigned int tailsz;
unsigned int credits;
unsigned int nr_pages;
@@ -1199,6 +1201,7 @@ struct cifs_writedata {
struct smbd_mr *mr;
#endif
unsigned int pagesz;
+ unsigned int page_offset;
unsigned int tailsz;
unsigned int credits;
unsigned int nr_pages;
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 47d5331..a02f6b6 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -3045,6 +3045,7 @@ smb2_async_writev(struct cifs_writedata *wdata,
rqst.rq_iov = iov;
rqst.rq_nvec = 2;
rqst.rq_pages = wdata->pages;
+ rqst.rq_offset = wdata->page_offset;
rqst.rq_npages = wdata->nr_pages;
rqst.rq_pagesz = wdata->pagesz;
rqst.rq_tailsz = wdata->tailsz;
--
2.7.4


2018-05-30 19:53:49

by Long Li

[permalink] [raw]
Subject: [Patch v2 09/15] CIFS: SMBD: Support page offset in RDMA recv

From: Long Li <[email protected]>

RDMA recv function needs to place data to the correct place starting at
page offset.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/smbdirect.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 6141e3c..ba53c52 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2004,10 +2004,12 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
* return value: actual data read
*/
static int smbd_recv_page(struct smbd_connection *info,
- struct page *page, unsigned int to_read)
+ struct page *page, unsigned int page_offset,
+ unsigned int to_read)
{
int ret;
char *to_address;
+ void *page_address;

/* make sure we have the page ready for read */
ret = wait_event_interruptible(
@@ -2015,16 +2017,17 @@ static int smbd_recv_page(struct smbd_connection *info,
info->reassembly_data_length >= to_read ||
info->transport_status != SMBD_CONNECTED);
if (ret)
- return 0;
+ return ret;

/* now we can read from reassembly queue and not sleep */
- to_address = kmap_atomic(page);
+ page_address = kmap_atomic(page);
+ to_address = (char *) page_address + page_offset;

log_read(INFO, "reading from page=%p address=%p to_read=%d\n",
page, to_address, to_read);

ret = smbd_recv_buf(info, to_address, to_read);
- kunmap_atomic(to_address);
+ kunmap_atomic(page_address);

return ret;
}
@@ -2038,7 +2041,7 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
{
char *buf;
struct page *page;
- unsigned int to_read;
+ unsigned int to_read, page_offset;
int rc;

info->smbd_recv_pending++;
@@ -2052,15 +2055,16 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)

case READ | ITER_BVEC:
page = msg->msg_iter.bvec->bv_page;
+ page_offset = msg->msg_iter.bvec->bv_offset;
to_read = msg->msg_iter.bvec->bv_len;
- rc = smbd_recv_page(info, page, to_read);
+ rc = smbd_recv_page(info, page, page_offset, to_read);
break;

default:
/* It's a bug in upper layer to get there */
cifs_dbg(VFS, "CIFS: invalid msg type %d\n",
msg->msg_iter.type);
- rc = -EIO;
+ rc = -EINVAL;
}

info->smbd_recv_pending--;
--
2.7.4


2018-05-30 19:54:08

by Long Li

[permalink] [raw]
Subject: [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst

From: Long Li <[email protected]>

Introduce a function rqst_page_get_length to return the page offset and
length for a given page in smb_rqst. This function is to be used by
following patches.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifsproto.h | 3 +++
fs/cifs/misc.c | 17 +++++++++++++++++
2 files changed, 20 insertions(+)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 7933c5f..89dda14 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -557,4 +557,7 @@ int cifs_alloc_hash(const char *name, struct crypto_shash **shash,
struct sdesc **sdesc);
void cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc);

+extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
+ unsigned int *len, unsigned int *offset);
+
#endif /* _CIFSPROTO_H */
diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
index 96849b5..e951417 100644
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -905,3 +905,20 @@ cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc)
crypto_free_shash(*shash);
*shash = NULL;
}
+
+/**
+ * rqst_page_get_length - obtain the length and offset for a page in smb_rqst
+ * Input: rqst - a smb_rqst, page - a page index for rqst
+ * Output: *len - the length for this page, *offset - the offset for this page
+ */
+void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
+ unsigned int *len, unsigned int *offset)
+{
+ *len = rqst->rq_pagesz;
+ *offset = (page == 0) ? rqst->rq_offset : 0;
+
+ if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
+ *len = rqst->rq_tailsz;
+ else if (page == 0)
+ *len = rqst->rq_pagesz - rqst->rq_offset;
+}
--
2.7.4


2018-05-30 19:54:20

by Long Li

[permalink] [raw]
Subject: [Patch v2 07/15] CIFS: When sending data on socket, pass the correct page offset

From: Long Li <[email protected]>

It's possible that the offset is non-zero in the page to send, change the
function to pass this offset to socket.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/transport.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index d6b5523..5c96ee8 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -288,15 +288,13 @@ __smb_send_rqst(struct TCP_Server_Info *server, struct smb_rqst *rqst)

/* now walk the page array and send each page in it */
for (i = 0; i < rqst->rq_npages; i++) {
- size_t len = i == rqst->rq_npages - 1
- ? rqst->rq_tailsz
- : rqst->rq_pagesz;
- struct bio_vec bvec = {
- .bv_page = rqst->rq_pages[i],
- .bv_len = len
- };
+ struct bio_vec bvec;
+
+ bvec.bv_page = rqst->rq_pages[i];
+ rqst_page_get_length(rqst, i, &bvec.bv_len, &bvec.bv_offset);
+
iov_iter_bvec(&smb_msg.msg_iter, WRITE | ITER_BVEC,
- &bvec, 1, len);
+ &bvec, 1, bvec.bv_len);
rc = smb_send_kvec(server, &smb_msg, &sent);
if (rc < 0)
break;
--
2.7.4


2018-05-30 19:55:02

by Long Li

[permalink] [raw]
Subject: [Patch v2 05/15] CIFS: Calculate the correct request length based on page offset and tail size

From: Long Li <[email protected]>

It's possible that the page offset is non-zero in the pages in a request,
change the function to calculate the correct data buffer length.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/transport.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 927226a..d6b5523 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -212,10 +212,24 @@ rqst_len(struct smb_rqst *rqst)
for (i = 0; i < rqst->rq_nvec; i++)
buflen += iov[i].iov_len;

- /* add in the page array if there is one */
+ /*
+ * Add in the page array if there is one. The caller needs to make
+ * sure rq_offset and rq_tailsz are set correctly. If a buffer of
+ * multiple pages ends at page boundary, rq_tailsz needs to be set to
+ * PAGE_SIZE.
+ */
if (rqst->rq_npages) {
- buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
- buflen += rqst->rq_tailsz;
+ if (rqst->rq_npages == 1)
+ buflen += rqst->rq_tailsz;
+ else {
+ /*
+ * If there is more than one page, calculate the
+ * buffer length based on rq_offset and rq_tailsz
+ */
+ buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
+ rqst->rq_offset;
+ buflen += rqst->rq_tailsz;
+ }
}

return buflen;
--
2.7.4


2018-05-30 19:56:06

by Long Li

[permalink] [raw]
Subject: [Patch v2 02/15] CIFS: Add support for direct pages in rdata

From: Long Li <[email protected]>

Add a function to allocate rdata without allocating pages for data
transfer. This gives the caller an option to pass a number of pages
that point to the data buffer.

rdata is still reponsible for free those pages after it's done.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifsglob.h | 2 +-
fs/cifs/file.c | 23 ++++++++++++++++++++---
2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 8d16c3e..56864a87 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1179,7 +1179,7 @@ struct cifs_readdata {
unsigned int tailsz;
unsigned int credits;
unsigned int nr_pages;
- struct page *pages[];
+ struct page **pages;
};

struct cifs_writedata;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 23fd430..1c98293 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2880,13 +2880,13 @@ cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from)
}

static struct cifs_readdata *
-cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
+cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
{
struct cifs_readdata *rdata;

- rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
- GFP_KERNEL);
+ rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
if (rdata != NULL) {
+ rdata->pages = pages;
kref_init(&rdata->refcount);
INIT_LIST_HEAD(&rdata->list);
init_completion(&rdata->done);
@@ -2896,6 +2896,22 @@ cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
return rdata;
}

+static struct cifs_readdata *
+cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
+{
+ struct page **pages =
+ kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
+ struct cifs_readdata *ret = NULL;
+
+ if (pages) {
+ ret = cifs_readdata_direct_alloc(pages, complete);
+ if (!ret)
+ kfree(pages);
+ }
+
+ return ret;
+}
+
void
cifs_readdata_release(struct kref *refcount)
{
@@ -2910,6 +2926,7 @@ cifs_readdata_release(struct kref *refcount)
if (rdata->cfile)
cifsFileInfo_put(rdata->cfile);

+ kvfree(rdata->pages);
kfree(rdata);
}

--
2.7.4


2018-05-30 20:28:41

by Michael J. Ruhl

[permalink] [raw]
Subject: RE: [Patch v2 02/15] CIFS: Add support for direct pages in rdata

>-----Original Message-----
>From: [email protected] [mailto:linux-rdma-
>[email protected]] On Behalf Of Long Li
>Sent: Wednesday, May 30, 2018 3:48 PM
>To: Steve French <[email protected]>; [email protected]; samba-
>[email protected]; [email protected]; linux-
>[email protected]
>Cc: Long Li <[email protected]>
>Subject: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
>
>From: Long Li <[email protected]>
>
>Add a function to allocate rdata without allocating pages for data
>transfer. This gives the caller an option to pass a number of pages
>that point to the data buffer.
>
>rdata is still reponsible for free those pages after it's done.
>
>Signed-off-by: Long Li <[email protected]>
>---
> fs/cifs/cifsglob.h | 2 +-
> fs/cifs/file.c | 23 ++++++++++++++++++++---
> 2 files changed, 21 insertions(+), 4 deletions(-)
>
>diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
>index 8d16c3e..56864a87 100644
>--- a/fs/cifs/cifsglob.h
>+++ b/fs/cifs/cifsglob.h
>@@ -1179,7 +1179,7 @@ struct cifs_readdata {
> unsigned int tailsz;
> unsigned int credits;
> unsigned int nr_pages;
>- struct page *pages[];
>+ struct page **pages;
> };
>
> struct cifs_writedata;
>diff --git a/fs/cifs/file.c b/fs/cifs/file.c
>index 23fd430..1c98293 100644
>--- a/fs/cifs/file.c
>+++ b/fs/cifs/file.c
>@@ -2880,13 +2880,13 @@ cifs_strict_writev(struct kiocb *iocb, struct
>iov_iter *from)
> }
>
> static struct cifs_readdata *
>-cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
>+cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
> {
> struct cifs_readdata *rdata;
>
>- rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
>- GFP_KERNEL);
>+ rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
> if (rdata != NULL) {
>+ rdata->pages = pages;
> kref_init(&rdata->refcount);
> INIT_LIST_HEAD(&rdata->list);
> init_completion(&rdata->done);
>@@ -2896,6 +2896,22 @@ cifs_readdata_alloc(unsigned int nr_pages,
>work_func_t complete)
> return rdata;
> }
>
>+static struct cifs_readdata *
>+cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
>+{
>+ struct page **pages =
>+ kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
>+ struct cifs_readdata *ret = NULL;
>+
>+ if (pages) {
>+ ret = cifs_readdata_direct_alloc(pages, complete);
>+ if (!ret)
>+ kfree(pages);
>+ }
>+
>+ return ret;
>+}
>+
> void
> cifs_readdata_release(struct kref *refcount)
> {
>@@ -2910,6 +2926,7 @@ cifs_readdata_release(struct kref *refcount)
> if (rdata->cfile)
> cifsFileInfo_put(rdata->cfile);
>
>+ kvfree(rdata->pages);

Is the kvfree() correct?

You use kzalloc() and kfree in cifs_readdata_alloc().

Mike

> kfree(rdata);
> }
>
>--
>2.7.4
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html

2018-05-30 21:00:23

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 02/15] CIFS: Add support for direct pages in rdata

> Subject: RE: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
>
> >-----Original Message-----
> >From: [email protected] [mailto:linux-rdma-
> >[email protected]] On Behalf Of Long Li
> >Sent: Wednesday, May 30, 2018 3:48 PM
> >To: Steve French <[email protected]>; [email protected];
> >samba- [email protected]; [email protected]; linux-
> >[email protected]
> >Cc: Long Li <[email protected]>
> >Subject: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
> >
> >From: Long Li <[email protected]>
> >
> >Add a function to allocate rdata without allocating pages for data
> >transfer. This gives the caller an option to pass a number of pages
> >that point to the data buffer.
> >
> >rdata is still reponsible for free those pages after it's done.
> >
> >Signed-off-by: Long Li <[email protected]>
> >---
> > fs/cifs/cifsglob.h | 2 +-
> > fs/cifs/file.c | 23 ++++++++++++++++++++---
> > 2 files changed, 21 insertions(+), 4 deletions(-)
> >
> >diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index
> >8d16c3e..56864a87 100644
> >--- a/fs/cifs/cifsglob.h
> >+++ b/fs/cifs/cifsglob.h
> >@@ -1179,7 +1179,7 @@ struct cifs_readdata {
> > unsigned int tailsz;
> > unsigned int credits;
> > unsigned int nr_pages;
> >- struct page *pages[];
> >+ struct page **pages;
> > };
> >
> > struct cifs_writedata;
> >diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 23fd430..1c98293
> >100644
> >--- a/fs/cifs/file.c
> >+++ b/fs/cifs/file.c
> >@@ -2880,13 +2880,13 @@ cifs_strict_writev(struct kiocb *iocb, struct
> >iov_iter *from) }
> >
> > static struct cifs_readdata *
> >-cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
> >+cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
> > {
> > struct cifs_readdata *rdata;
> >
> >- rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
> >- GFP_KERNEL);
> >+ rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
> > if (rdata != NULL) {
> >+ rdata->pages = pages;
> > kref_init(&rdata->refcount);
> > INIT_LIST_HEAD(&rdata->list);
> > init_completion(&rdata->done);
> >@@ -2896,6 +2896,22 @@ cifs_readdata_alloc(unsigned int nr_pages,
> >work_func_t complete)
> > return rdata;
> > }
> >
> >+static struct cifs_readdata *
> >+cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete) {
> >+ struct page **pages =
> >+ kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
> >+ struct cifs_readdata *ret = NULL;
> >+
> >+ if (pages) {
> >+ ret = cifs_readdata_direct_alloc(pages, complete);
> >+ if (!ret)
> >+ kfree(pages);
> >+ }
> >+
> >+ return ret;
> >+}
> >+
> > void
> > cifs_readdata_release(struct kref *refcount) { @@ -2910,6 +2926,7 @@
> >cifs_readdata_release(struct kref *refcount)
> > if (rdata->cfile)
> > cifsFileInfo_put(rdata->cfile);
> >
> >+ kvfree(rdata->pages);
>
> Is the kvfree() correct?
>
> You use kzalloc() and kfree in cifs_readdata_alloc().

This function is shared by both non-direct I/O and direct I/O code paths.

Direct I/O uses kvmalloc to allocate pages, so kvfree is used here to handle both cases.

>
> Mike
>
> > kfree(rdata);
> > }
> >
> >--
> >2.7.4
> >
> >--
> >To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> >in the body of a message to [email protected] More majordomo
> >info at
> >https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvger.k
> e
> >rnel.org%2Fmajordomo-
> info.html&data=02%7C01%7Clongli%40microsoft.com%7C
> >e810388a534643737ab108d5c66bd6df%7C72f988bf86f141af91ab2d7cd011db
> 47%7C1
> >%7C0%7C636633088833938755&sdata=iHKiji2rUhLHpbH5x13SJBWCvHExSr4a
> rz9Xiv3
> >1rMQ%3D&reserved=0
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in the
> body of a message to [email protected] More majordomo info at
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvger.ke
> rnel.org%2Fmajordomo-
> info.html&data=02%7C01%7Clongli%40microsoft.com%7Ce810388a53464373
> 7ab108d5c66bd6df%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63
> 6633088833938755&sdata=iHKiji2rUhLHpbH5x13SJBWCvHExSr4arz9Xiv31rMQ
> %3D&reserved=0

2018-06-02 05:53:32

by kernel test robot

[permalink] [raw]
Subject: Re: [Patch v2 13/15] CIFS: Add support for direct I/O read

Hi Long,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v4.17-rc6]
[cannot apply to cifs/for-next next-20180601]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Long-Li/CIFS-Add-direct-I-O-support/20180602-130240
config: i386-randconfig-x000-201821 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All warnings (new ones prefixed by >>):

In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs/cifs/file.c:24:
fs/cifs/file.c: In function 'cifs_direct_readv':
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
^~~~~~~~
include/linux/printk.h:342:12: note: in expansion of macro 'KERN_DEBUG'
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~
>> fs/cifs/cifs_debug.h:82:3: note: in expansion of macro 'pr_debug'
pr_debug(fmt, ##__VA_ARGS__); \
^~~~~~~~
fs/cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs/cifs/file.c:3348:20: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u
In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs/cifs/file.c:24:
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
^~~~~~~~
include/linux/printk.h:342:12: note: in expansion of macro 'KERN_DEBUG'
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~
>> fs/cifs/cifs_debug.h:82:3: note: in expansion of macro 'pr_debug'
pr_debug(fmt, ##__VA_ARGS__); \
^~~~~~~~
fs/cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs/cifs/file.c:3348:30: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u

vim +/pr_debug +82 fs/cifs/cifs_debug.h

f2f176b4 Aurelien Aptel 2018-04-10 73
^1da177e Linus Torvalds 2005-04-16 74 /*
^1da177e Linus Torvalds 2005-04-16 75 * debug OFF
^1da177e Linus Torvalds 2005-04-16 76 * ---------
^1da177e Linus Torvalds 2005-04-16 77 */
^1da177e Linus Torvalds 2005-04-16 78 #else /* _CIFS_DEBUG */
f96637be Joe Perches 2013-05-04 79 #define cifs_dbg(type, fmt, ...) \
bde98197 Joe Perches 2012-12-05 80 do { \
bde98197 Joe Perches 2012-12-05 81 if (0) \
0b456f04 Andy Shevchenko 2014-08-27 @82 pr_debug(fmt, ##__VA_ARGS__); \
bde98197 Joe Perches 2012-12-05 83 } while (0)
f96637be Joe Perches 2013-05-04 84 #endif
^1da177e Linus Torvalds 2005-04-16 85

:::::: The code at line 82 was first introduced by commit
:::::: 0b456f04bcdf5b1151136adaada158bfc26f1be7 cifs: convert printk(LEVEL...) to pr_<level>

:::::: TO: Andy Shevchenko <[email protected]>
:::::: CC: Steve French <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (4.60 kB)
.config.gz (24.65 kB)
Download all attachments

2018-06-02 07:16:17

by kernel test robot

[permalink] [raw]
Subject: Re: [Patch v2 13/15] CIFS: Add support for direct I/O read

Hi Long,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v4.17-rc6]
[cannot apply to cifs/for-next next-20180601]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Long-Li/CIFS-Add-direct-I-O-support/20180602-130240
config: i386-randconfig-x008-201821 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All warnings (new ones prefixed by >>):

In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs//cifs/file.c:24:
fs//cifs/file.c: In function 'cifs_direct_readv':
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
^~~~~~~~
include/linux/printk.h:410:12: note: in expansion of macro 'KERN_DEBUG'
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~
fs//cifs/cifs_debug.h:54:3: note: in expansion of macro 'pr_debug_once'
pr_debug_ ## ratefunc("%s: " \
^~~~~~~~~
fs//cifs/cifs_debug.h:67:3: note: in expansion of macro 'cifs_dbg_func'
cifs_dbg_func(once, \
^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs//cifs/file.c:3348:20: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u
In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs//cifs/file.c:24:
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
^~~~~~~~
include/linux/printk.h:410:12: note: in expansion of macro 'KERN_DEBUG'
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~
fs//cifs/cifs_debug.h:54:3: note: in expansion of macro 'pr_debug_once'
pr_debug_ ## ratefunc("%s: " \
^~~~~~~~~
fs//cifs/cifs_debug.h:67:3: note: in expansion of macro 'cifs_dbg_func'
cifs_dbg_func(once, \
^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs//cifs/file.c:3348:30: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u
In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs//cifs/file.c:24:
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:357:10: note: in definition of macro 'printk_once'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
#define KERN_ERR KERN_SOH "3" /* error conditions */
^~~~~~~~
include/linux/printk.h:386:14: note: in expansion of macro 'KERN_ERR'
printk_once(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~
fs//cifs/cifs_debug.h:57:3: note: in expansion of macro 'pr_err_once'
pr_err_ ## ratefunc("CIFS VFS: " \
^~~~~~~
fs//cifs/cifs_debug.h:67:3: note: in expansion of macro 'cifs_dbg_func'
cifs_dbg_func(once, \
^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs//cifs/file.c:3348:20: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u
In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs//cifs/file.c:24:
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:357:10: note: in definition of macro 'printk_once'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
#define KERN_ERR KERN_SOH "3" /* error conditions */
^~~~~~~~
include/linux/printk.h:386:14: note: in expansion of macro 'KERN_ERR'
printk_once(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~
fs//cifs/cifs_debug.h:57:3: note: in expansion of macro 'pr_err_once'
pr_err_ ## ratefunc("CIFS VFS: " \
^~~~~~~
fs//cifs/cifs_debug.h:67:3: note: in expansion of macro 'cifs_dbg_func'
cifs_dbg_func(once, \
^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs//cifs/file.c:3348:30: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u
In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs//cifs/file.c:24:
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
^~~~~~~~
include/linux/printk.h:410:12: note: in expansion of macro 'KERN_DEBUG'
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~
fs//cifs/cifs_debug.h:60:3: note: in expansion of macro 'pr_debug_once'
pr_debug_ ## ratefunc(fmt, ##__VA_ARGS__); \
^~~~~~~~~
fs//cifs/cifs_debug.h:67:3: note: in expansion of macro 'cifs_dbg_func'
cifs_dbg_func(once, \
^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs//cifs/file.c:3348:20: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u
In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs//cifs/file.c:24:
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
^~~~~~~~
include/linux/printk.h:410:12: note: in expansion of macro 'KERN_DEBUG'
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~
fs//cifs/cifs_debug.h:60:3: note: in expansion of macro 'pr_debug_once'
pr_debug_ ## ratefunc(fmt, ##__VA_ARGS__); \
^~~~~~~~~
fs//cifs/cifs_debug.h:67:3: note: in expansion of macro 'cifs_dbg_func'
cifs_dbg_func(once, \
^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs//cifs/file.c:3348:30: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u
In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs//cifs/file.c:24:
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
^~~~~~~~
include/linux/printk.h:474:12: note: in expansion of macro 'KERN_DEBUG'
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~
fs//cifs/cifs_debug.h:54:3: note: in expansion of macro 'pr_debug_ratelimited'
pr_debug_ ## ratefunc("%s: " \
^~~~~~~~~
fs//cifs/cifs_debug.h:70:3: note: in expansion of macro 'cifs_dbg_func'
cifs_dbg_func(ratelimited, \
^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs//cifs/file.c:3348:20: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u
In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs//cifs/file.c:24:
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
^~~~~~~~
include/linux/printk.h:474:12: note: in expansion of macro 'KERN_DEBUG'
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~
fs//cifs/cifs_debug.h:54:3: note: in expansion of macro 'pr_debug_ratelimited'
pr_debug_ ## ratefunc("%s: " \
^~~~~~~~~
fs//cifs/cifs_debug.h:70:3: note: in expansion of macro 'cifs_dbg_func'
cifs_dbg_func(ratelimited, \
^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs//cifs/file.c:3348:30: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u
In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs//cifs/file.c:24:
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:425:10: note: in definition of macro 'printk_ratelimited'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
#define KERN_ERR KERN_SOH "3" /* error conditions */
^~~~~~~~
include/linux/printk.h:439:21: note: in expansion of macro 'KERN_ERR'
printk_ratelimited(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~
fs//cifs/cifs_debug.h:57:3: note: in expansion of macro 'pr_err_ratelimited'
pr_err_ ## ratefunc("CIFS VFS: " \
^~~~~~~
fs//cifs/cifs_debug.h:70:3: note: in expansion of macro 'cifs_dbg_func'
cifs_dbg_func(ratelimited, \
^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs//cifs/file.c:3348:20: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u
In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs//cifs/file.c:24:
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:425:10: note: in definition of macro 'printk_ratelimited'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
#define KERN_ERR KERN_SOH "3" /* error conditions */
^~~~~~~~
include/linux/printk.h:439:21: note: in expansion of macro 'KERN_ERR'
printk_ratelimited(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~
fs//cifs/cifs_debug.h:57:3: note: in expansion of macro 'pr_err_ratelimited'
pr_err_ ## ratefunc("CIFS VFS: " \
^~~~~~~
fs//cifs/cifs_debug.h:70:3: note: in expansion of macro 'cifs_dbg_func'
cifs_dbg_func(ratelimited, \
^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs//cifs/file.c:3348:30: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u
In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs//cifs/file.c:24:
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
^~~~~~~~
include/linux/printk.h:474:12: note: in expansion of macro 'KERN_DEBUG'
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~
fs//cifs/cifs_debug.h:60:3: note: in expansion of macro 'pr_debug_ratelimited'
pr_debug_ ## ratefunc(fmt, ##__VA_ARGS__); \
^~~~~~~~~
fs//cifs/cifs_debug.h:70:3: note: in expansion of macro 'cifs_dbg_func'
cifs_dbg_func(ratelimited, \
^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs//cifs/file.c:3348:20: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u
In file included from include/linux/kernel.h:14:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs//cifs/file.c:24:
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^
include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
printk(fmt, ##__VA_ARGS__); \
^~~
include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
#define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
^~~~~~~~
include/linux/printk.h:474:12: note: in expansion of macro 'KERN_DEBUG'
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~
fs//cifs/cifs_debug.h:60:3: note: in expansion of macro 'pr_debug_ratelimited'
pr_debug_ ## ratefunc(fmt, ##__VA_ARGS__); \
^~~~~~~~~
fs//cifs/cifs_debug.h:70:3: note: in expansion of macro 'cifs_dbg_func'
cifs_dbg_func(ratelimited, \
^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
cifs_dbg(VFS,
^~~~~~~~
fs//cifs/file.c:3348:30: note: format string is defined here
" iov_offset %lu count %lu\n",
~~^
%u

vim +/cifs_dbg +3346 fs//cifs/file.c

3290
3291 ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
3292 {
3293 size_t len, cur_len, start;
3294 unsigned int npages, rsize, credits;
3295 struct file *file;
3296 struct cifs_sb_info *cifs_sb;
3297 struct cifsFileInfo *cfile;
3298 struct cifs_tcon *tcon;
3299 struct page **pagevec;
3300 ssize_t rc, total_read = 0;
3301 struct TCP_Server_Info *server;
3302 loff_t offset = iocb->ki_pos;
3303 pid_t pid;
3304 struct cifs_readdata *rdata;
3305
3306 /*
3307 * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
3308 * fall back to data copy read path
3309 */
3310 if (to->type & ITER_KVEC) {
3311 cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
3312 return cifs_user_readv(iocb, to);
3313 }
3314
3315 len = iov_iter_count(to);
3316 if (!len)
3317 return 0;
3318
3319 file = iocb->ki_filp;
3320 cifs_sb = CIFS_FILE_SB(file);
3321 cfile = file->private_data;
3322 tcon = tlink_tcon(cfile->tlink);
3323 server = tcon->ses->server;
3324
3325 if (!server->ops->async_readv)
3326 return -ENOSYS;
3327
3328 if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
3329 pid = cfile->pid;
3330 else
3331 pid = current->tgid;
3332
3333 if ((file->f_flags & O_ACCMODE) == O_WRONLY)
3334 cifs_dbg(FYI, "attempting read on write only file instance\n");
3335
3336 do {
3337 rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
3338 &rsize, &credits);
3339 if (rc)
3340 break;
3341
3342 cur_len = min_t(const size_t, len, rsize);
3343
3344 rc = iov_iter_get_pages_alloc(to, &pagevec, cur_len, &start);
3345 if (rc < 0) {
> 3346 cifs_dbg(VFS,
3347 "couldn't get user pages (rc=%zd) iter type %d"
3348 " iov_offset %lu count %lu\n",
3349 rc, to->type, to->iov_offset, to->count);
3350 dump_stack();
3351 break;
3352 }
3353
3354 rdata = cifs_readdata_direct_alloc(
3355 pagevec, cifs_direct_readv_complete);
3356 if (!rdata) {
3357 add_credits_and_wake_if(server, credits, 0);
3358 rc = -ENOMEM;
3359 break;
3360 }
3361
3362 npages = (rc + start + PAGE_SIZE-1) / PAGE_SIZE;
3363 rdata->nr_pages = npages;
3364 rdata->page_offset = start;
3365 rdata->pagesz = PAGE_SIZE;
3366 rdata->tailsz = npages > 1 ?
3367 rc-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
3368 rc;
3369 cur_len = rc;
3370
3371 rdata->cfile = cifsFileInfo_get(cfile);
3372 rdata->offset = offset;
3373 rdata->bytes = rc;
3374 rdata->pid = pid;
3375 rdata->read_into_pages = cifs_uncached_read_into_pages;
3376 rdata->copy_into_pages = cifs_uncached_copy_into_pages;
3377 rdata->credits = credits;
3378
3379 rc = 0;
3380 if (rdata->cfile->invalidHandle)
3381 rc = cifs_reopen_file(rdata->cfile, true);
3382
3383 if (!rc)
3384 rc = server->ops->async_readv(rdata);
3385
3386 if (rc) {
3387 add_credits_and_wake_if(server, rdata->credits, 0);
3388 kref_put(&rdata->refcount,
3389 cifs_direct_readdata_release);
3390 if (rc == -EAGAIN)
3391 continue;
3392 break;
3393 }
3394
3395 wait_for_completion(&rdata->done);
3396 rc = rdata->result;
3397 if (rc) {
3398 kref_put(
3399 &rdata->refcount,
3400 cifs_direct_readdata_release);
3401 if (rc == -EAGAIN)
3402 continue;
3403 break;
3404 }
3405
3406 total_read += rdata->got_bytes;
3407 kref_put(&rdata->refcount, cifs_direct_readdata_release);
3408
3409 iov_iter_advance(to, cur_len);
3410 len -= cur_len;
3411 offset += cur_len;
3412 } while (len);
3413
3414 iocb->ki_pos += total_read;
3415
3416 return total_read;
3417 }
3418

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (23.07 kB)
.config.gz (28.03 kB)
Download all attachments

2018-06-07 11:19:06

by Pavel Shilovsky

[permalink] [raw]
Subject: Re: [Patch v2 15/15] CIFS: Add direct I/O functions to file_operations

2018-05-30 12:48 GMT-07:00 Long Li <[email protected]>:
> From: Long Li <[email protected]>
>
> With direct read/write functions implemented, add them to file_operations.
> When mounting with "cache=none", CIFS uses direct I/O.
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/cifsfs.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 62f1662..e84f8c2 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -1113,9 +1113,8 @@ const struct file_operations cifs_file_strict_ops = {
> };
>
> const struct file_operations cifs_file_direct_ops = {
> - /* BB reevaluate whether they can be done with directio, no cache */
> - .read_iter = cifs_user_readv,
> - .write_iter = cifs_user_writev,
> + .read_iter = cifs_direct_readv,
> + .write_iter = cifs_direct_writev,

I would postpone making this change until we have asynchronous I/O
support for direct mode.

--
Best regards,
Pavel Shilovsky

2018-06-24 02:00:07

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 02/15] CIFS: Add support for direct pages in rdata

On 5/30/2018 3:47 PM, Long Li wrote:
> From: Long Li <[email protected]>
>
> Add a function to allocate rdata without allocating pages for data
> transfer. This gives the caller an option to pass a number of pages
> that point to the data buffer.
>
> rdata is still reponsible for free those pages after it's done.

"Caller" is still responsible? Or is the rdata somehow freeing itself
via another mechanism?

>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/cifsglob.h | 2 +-
> fs/cifs/file.c | 23 ++++++++++++++++++++---
> 2 files changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 8d16c3e..56864a87 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1179,7 +1179,7 @@ struct cifs_readdata {
> unsigned int tailsz;
> unsigned int credits;
> unsigned int nr_pages;
> - struct page *pages[];
> + struct page **pages;

Technically speaking, these are syntactically equivalent. It may not
be worth changing this historic definition.

Tom.


> };
>
> struct cifs_writedata;
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 23fd430..1c98293 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2880,13 +2880,13 @@ cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from)
> }
>
> static struct cifs_readdata *
> -cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
> +cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
> {
> struct cifs_readdata *rdata;
>
> - rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
> - GFP_KERNEL);
> + rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
> if (rdata != NULL) {
> + rdata->pages = pages;
> kref_init(&rdata->refcount);
> INIT_LIST_HEAD(&rdata->list);
> init_completion(&rdata->done);
> @@ -2896,6 +2896,22 @@ cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
> return rdata;
> }
>
> +static struct cifs_readdata *
> +cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
> +{
> + struct page **pages =
> + kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
> + struct cifs_readdata *ret = NULL;
> +
> + if (pages) {
> + ret = cifs_readdata_direct_alloc(pages, complete);
> + if (!ret)
> + kfree(pages);
> + }
> +
> + return ret;
> +}
> +
> void
> cifs_readdata_release(struct kref *refcount)
> {
> @@ -2910,6 +2926,7 @@ cifs_readdata_release(struct kref *refcount)
> if (rdata->cfile)
> cifsFileInfo_put(rdata->cfile);
>
> + kvfree(rdata->pages);
> kfree(rdata);
> }
>
>

2018-06-24 02:00:07

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 03/15] CIFS: Use offset when reading pages

On 5/30/2018 3:47 PM, Long Li wrote:
> From: Long Li <[email protected]>
>
> With offset defined in rdata, transport functions need to look at this
> offset when reading data into the correct places in pages.
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/cifsproto.h | 4 +++-
> fs/cifs/connect.c | 5 +++--
> fs/cifs/file.c | 52 +++++++++++++++++++++++++++++++++++++---------------
> fs/cifs/smb2ops.c | 2 +-
> fs/cifs/smb2pdu.c | 1 +
> 5 files changed, 45 insertions(+), 19 deletions(-)
>
> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> index dc80f84..1f27d8e 100644
> --- a/fs/cifs/cifsproto.h
> +++ b/fs/cifs/cifsproto.h
> @@ -203,7 +203,9 @@ extern void dequeue_mid(struct mid_q_entry *mid, bool malformed);
> extern int cifs_read_from_socket(struct TCP_Server_Info *server, char *buf,
> unsigned int to_read);
> extern int cifs_read_page_from_socket(struct TCP_Server_Info *server,
> - struct page *page, unsigned int to_read);
> + struct page *page,
> + unsigned int page_offset,
> + unsigned int to_read);
> extern int cifs_setup_cifs_sb(struct smb_vol *pvolume_info,
> struct cifs_sb_info *cifs_sb);
> extern int cifs_match_super(struct super_block *, void *);
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 83b0234..8501da0 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -594,10 +594,11 @@ cifs_read_from_socket(struct TCP_Server_Info *server, char *buf,
>
> int
> cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page,
> - unsigned int to_read)
> + unsigned int page_offset, unsigned int to_read)
> {
> struct msghdr smb_msg;
> - struct bio_vec bv = {.bv_page = page, .bv_len = to_read};
> + struct bio_vec bv = {
> + .bv_page = page, .bv_len = to_read, .bv_offset = page_offset};
> iov_iter_bvec(&smb_msg.msg_iter, READ | ITER_BVEC, &bv, 1, to_read);
> return cifs_readv_from_socket(server, &smb_msg);
> }
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 1c98293..87eece6 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -3026,12 +3026,20 @@ uncached_fill_pages(struct TCP_Server_Info *server,
> int result = 0;
> unsigned int i;
> unsigned int nr_pages = rdata->nr_pages;
> + unsigned int page_offset = rdata->page_offset;
>
> rdata->got_bytes = 0;
> rdata->tailsz = PAGE_SIZE;
> for (i = 0; i < nr_pages; i++) {
> struct page *page = rdata->pages[i];
> size_t n;
> + unsigned int segment_size = rdata->pagesz;
> +
> + if (i == 0)
> + segment_size -= page_offset;
> + else
> + page_offset = 0;
> +
>
> if (len <= 0) {
> /* no need to hold page hostage */
> @@ -3040,24 +3048,25 @@ uncached_fill_pages(struct TCP_Server_Info *server,
> put_page(page);
> continue;
> }
> +
> n = len;
> - if (len >= PAGE_SIZE) {
> + if (len >= segment_size)
> /* enough data to fill the page */
> - n = PAGE_SIZE;
> - len -= n;
> - } else {
> - zero_user(page, len, PAGE_SIZE - len);
> + n = segment_size;
> + else
> rdata->tailsz = len;
> - len = 0;
> - }
> + len -= n;
> +
> if (iter)
> - result = copy_page_from_iter(page, 0, n, iter);
> + result = copy_page_from_iter(
> + page, page_offset, n, iter);
> #ifdef CONFIG_CIFS_SMB_DIRECT
> else if (rdata->mr)
> result = n;
> #endif

I hadn't noticed this type of xonditional code before. It's kind of ugly
to modify the data handling code like this. Do you plan to break this
out into an smbdirect-specific hander, like the cases above and below?

> else
> - result = cifs_read_page_from_socket(server, page, n);
> + result = cifs_read_page_from_socket(
> + server, page, page_offset, n);
> if (result < 0)
> break;
>
> @@ -3130,6 +3139,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> rdata->bytes = cur_len;
> rdata->pid = pid;
> rdata->pagesz = PAGE_SIZE;
> + rdata->tailsz = PAGE_SIZE;
> rdata->read_into_pages = cifs_uncached_read_into_pages;
> rdata->copy_into_pages = cifs_uncached_copy_into_pages;
> rdata->credits = credits;
> @@ -3574,6 +3584,7 @@ readpages_fill_pages(struct TCP_Server_Info *server,
> u64 eof;
> pgoff_t eof_index;
> unsigned int nr_pages = rdata->nr_pages;
> + unsigned int page_offset = rdata->page_offset;
>
> /* determine the eof that the server (probably) has */
> eof = CIFS_I(rdata->mapping->host)->server_eof;
> @@ -3584,13 +3595,21 @@ readpages_fill_pages(struct TCP_Server_Info *server,
> rdata->tailsz = PAGE_SIZE;
> for (i = 0; i < nr_pages; i++) {
> struct page *page = rdata->pages[i];
> - size_t n = PAGE_SIZE;
> + unsigned int to_read = rdata->pagesz;
> + size_t n;
> +
> + if (i == 0)
> + to_read -= page_offset;
> + else
> + page_offset = 0;
> +
> + n = to_read;
>
> - if (len >= PAGE_SIZE) {
> - len -= PAGE_SIZE;
> + if (len >= to_read) {
> + len -= to_read;
> } else if (len > 0) {
> /* enough for partial page, fill and zero the rest */
> - zero_user(page, len, PAGE_SIZE - len);
> + zero_user(page, len + page_offset, to_read - len);
> n = rdata->tailsz = len;
> len = 0;
> } else if (page->index > eof_index) {
> @@ -3622,13 +3641,15 @@ readpages_fill_pages(struct TCP_Server_Info *server,
> }
>
> if (iter)
> - result = copy_page_from_iter(page, 0, n, iter);
> + result = copy_page_from_iter(
> + page, page_offset, n, iter);
> #ifdef CONFIG_CIFS_SMB_DIRECT
> else if (rdata->mr)
> result = n;
> #endif

Ditto previous comment/question.

> else
> - result = cifs_read_page_from_socket(server, page, n);
> + result = cifs_read_page_from_socket(
> + server, page, page_offset, n);
> if (result < 0)
> break;
>
> @@ -3807,6 +3828,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
> rdata->bytes = bytes;
> rdata->pid = pid;
> rdata->pagesz = PAGE_SIZE;
> + rdata->tailsz = PAGE_SIZE;
> rdata->read_into_pages = cifs_readpages_read_into_pages;
> rdata->copy_into_pages = cifs_readpages_copy_into_pages;
> rdata->credits = credits;
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index 7c0edd2..1fa1c29 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -2467,7 +2467,7 @@ read_data_into_pages(struct TCP_Server_Info *server, struct page **pages,
> zero_user(page, len, PAGE_SIZE - len);
> len = 0;
> }
> - length = cifs_read_page_from_socket(server, page, n);
> + length = cifs_read_page_from_socket(server, page, 0, n);
> if (length < 0)
> return length;
> server->total_read += length;
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> index a02f6b6..f603fbe 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -2683,6 +2683,7 @@ smb2_readv_callback(struct mid_q_entry *mid)
> struct smb_rqst rqst = { .rq_iov = rdata->iov,
> .rq_nvec = 2,
> .rq_pages = rdata->pages,
> + .rq_offset = rdata->page_offset,
> .rq_npages = rdata->nr_pages,
> .rq_pagesz = rdata->pagesz,
> .rq_tailsz = rdata->tailsz };
>

2018-06-24 02:02:56

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 04/15] CIFS: Add support for direct pages in wdata

On 5/30/2018 3:47 PM, Long Li wrote:
> From: Long Li <[email protected]>
>
> Add a function to allocate wdata without allocating pages for data
> transfer. This gives the caller an option to pass a number of pages that
> point to the data buffer to write to.
>
> wdata is reponsible for free those pages after it's done.

Same comment as for the earlier patch. "Caller" is responsible for
"freeing" those pages? Confusing, as worded.

>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/cifsglob.h | 2 +-
> fs/cifs/cifsproto.h | 2 ++
> fs/cifs/cifssmb.c | 17 ++++++++++++++---
> 3 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 56864a87..7f62c98 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1205,7 +1205,7 @@ struct cifs_writedata {
> unsigned int tailsz;
> unsigned int credits;
> unsigned int nr_pages;
> - struct page *pages[];
> + struct page **pages;

Also same comment as for earlier patch, these are syntactically
equivalent and maybe not needed to change.

> };
>
> /*
> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> index 1f27d8e..7933c5f 100644
> --- a/fs/cifs/cifsproto.h
> +++ b/fs/cifs/cifsproto.h
> @@ -533,6 +533,8 @@ int cifs_async_writev(struct cifs_writedata *wdata,
> void cifs_writev_complete(struct work_struct *work);
> struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
> work_func_t complete);
> +struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages,
> + work_func_t complete);
> void cifs_writedata_release(struct kref *refcount);
> int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
> struct cifs_sb_info *cifs_sb,
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index c8e4278..5aca336 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -1952,6 +1952,7 @@ cifs_writedata_release(struct kref *refcount)
> if (wdata->cfile)
> cifsFileInfo_put(wdata->cfile);
>
> + kvfree(wdata->pages);
> kfree(wdata);
> }
>
> @@ -2075,12 +2076,22 @@ cifs_writev_complete(struct work_struct *work)
> struct cifs_writedata *
> cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
> {
> + struct page **pages =
> + kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);

Why do you do a GFP_NOFS here but GFP_KERNEL in the earlier patch?

> + if (pages)
> + return cifs_writedata_direct_alloc(pages, complete);
> +
> + return NULL;
> +}
> +
> +struct cifs_writedata *
> +cifs_writedata_direct_alloc(struct page **pages, work_func_t complete)
> +{
> struct cifs_writedata *wdata;
>
> - /* writedata + number of page pointers */
> - wdata = kzalloc(sizeof(*wdata) +
> - sizeof(struct page *) * nr_pages, GFP_NOFS);
> + wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
> if (wdata != NULL) {
> + wdata->pages = pages;
> kref_init(&wdata->refcount);
> INIT_LIST_HEAD(&wdata->list);
> init_completion(&wdata->done);
>

2018-06-24 02:08:55

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 05/15] CIFS: Calculate the correct request length based on page offset and tail size

On 5/30/2018 3:47 PM, Long Li wrote:
> From: Long Li <[email protected]>
>
> It's possible that the page offset is non-zero in the pages in a request,
> change the function to calculate the correct data buffer length.
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/transport.c | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 927226a..d6b5523 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -212,10 +212,24 @@ rqst_len(struct smb_rqst *rqst)
> for (i = 0; i < rqst->rq_nvec; i++)
> buflen += iov[i].iov_len;
>
> - /* add in the page array if there is one */
> + /*
> + * Add in the page array if there is one. The caller needs to make
> + * sure rq_offset and rq_tailsz are set correctly. If a buffer of
> + * multiple pages ends at page boundary, rq_tailsz needs to be set to
> + * PAGE_SIZE.
> + */
> if (rqst->rq_npages) {
> - buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> - buflen += rqst->rq_tailsz;
> + if (rqst->rq_npages == 1)
> + buflen += rqst->rq_tailsz;
> + else {
> + /*
> + * If there is more than one page, calculate the
> + * buffer length based on rq_offset and rq_tailsz
> + */
> + buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> + rqst->rq_offset;
> + buflen += rqst->rq_tailsz;
> + }

Wouldn't it be simpler to keep the original code, but then just subtract
the rq_offset?

buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
buflen += rqst->rq_tailsz;
buflen -= rqst->rq_offset;

It's kind of confusing as written. Also, what if it's just one page, but
has a non-zero offset? Is that somehow not possible? My suggested code
would take that into account, yours doesn't.

Tom.

> }
>
> return buflen;
>

2018-06-24 02:10:09

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst

On 5/30/2018 3:47 PM, Long Li wrote:
> From: Long Li <[email protected]>
>
> Introduce a function rqst_page_get_length to return the page offset and
> length for a given page in smb_rqst. This function is to be used by
> following patches.
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/cifsproto.h | 3 +++
> fs/cifs/misc.c | 17 +++++++++++++++++
> 2 files changed, 20 insertions(+)
>
> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> index 7933c5f..89dda14 100644
> --- a/fs/cifs/cifsproto.h
> +++ b/fs/cifs/cifsproto.h
> @@ -557,4 +557,7 @@ int cifs_alloc_hash(const char *name, struct crypto_shash **shash,
> struct sdesc **sdesc);
> void cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc);
>
> +extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
> + unsigned int *len, unsigned int *offset);
> +
> #endif /* _CIFSPROTO_H */
> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
> index 96849b5..e951417 100644
> --- a/fs/cifs/misc.c
> +++ b/fs/cifs/misc.c
> @@ -905,3 +905,20 @@ cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc)
> crypto_free_shash(*shash);
> *shash = NULL;
> }
> +
> +/**
> + * rqst_page_get_length - obtain the length and offset for a page in smb_rqst
> + * Input: rqst - a smb_rqst, page - a page index for rqst
> + * Output: *len - the length for this page, *offset - the offset for this page
> + */
> +void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
> + unsigned int *len, unsigned int *offset)
> +{
> + *len = rqst->rq_pagesz;
> + *offset = (page == 0) ? rqst->rq_offset : 0;

Really? Page 0 always has a zero offset??

> +
> + if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
> + *len = rqst->rq_tailsz;
> + else if (page == 0)
> + *len = rqst->rq_pagesz - rqst->rq_offset;
> +}
>

This subroutine does what patch 5 does inline. Why not push this patch
up in the sequence and use the helper?

Tom.

2018-06-24 02:12:33

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 08/15] CIFS: SMBD: Support page offset in RDMA send

On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <[email protected]>
>
> The RDMA send function needs to look at offset in the request pages, and
> send data starting from there.
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/smbdirect.c | 27 +++++++++++++++++++--------
> 1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
> index c62f7c9..6141e3c 100644
> --- a/fs/cifs/smbdirect.c
> +++ b/fs/cifs/smbdirect.c
> @@ -17,6 +17,7 @@
> #include <linux/highmem.h>
> #include "smbdirect.h"
> #include "cifs_debug.h"
> +#include "cifsproto.h"
>
> static struct smbd_response *get_empty_queue_buffer(
> struct smbd_connection *info);
> @@ -2082,7 +2083,7 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
> struct kvec vec;
> int nvecs;
> int size;
> - int buflen = 0, remaining_data_length;
> + unsigned int buflen = 0, remaining_data_length;
> int start, i, j;
> int max_iov_size =
> info->max_send_size - sizeof(struct smbd_data_transfer);
> @@ -2113,10 +2114,17 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
> buflen += iov[i].iov_len;
> }
>
> - /* add in the page array if there is one */
> + /*
> + * Add in the page array if there is one. The caller needs to set
> + * rq_tailsz to PAGE_SIZE when the buffer has multiple pages and
> + * ends at page boundary
> + */
> if (rqst->rq_npages) {
> - buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> - buflen += rqst->rq_tailsz;
> + if (rqst->rq_npages == 1)
> + buflen += rqst->rq_tailsz;
> + else
> + buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> + rqst->rq_offset + rqst->rq_tailsz;
> }

This code is really confusing and redundant. It tests npages > 0,
then tests npages == 1, then does an else. Why not call the helper
like the following smbd_send()?

Tom.

>
> if (buflen + sizeof(struct smbd_data_transfer) >
> @@ -2213,8 +2221,9 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
>
> /* now sending pages if there are any */
> for (i = 0; i < rqst->rq_npages; i++) {
> - buflen = (i == rqst->rq_npages-1) ?
> - rqst->rq_tailsz : rqst->rq_pagesz;
> + unsigned int offset;
> +
> + rqst_page_get_length(rqst, i, &buflen, &offset);
> nvecs = (buflen + max_iov_size - 1) / max_iov_size;
> log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
> buflen, nvecs);
> @@ -2225,9 +2234,11 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
> remaining_data_length -= size;
> log_write(INFO, "sending pages i=%d offset=%d size=%d"
> " remaining_data_length=%d\n",
> - i, j*max_iov_size, size, remaining_data_length);
> + i, j*max_iov_size+offset, size,
> + remaining_data_length);
> rc = smbd_post_send_page(
> - info, rqst->rq_pages[i], j*max_iov_size,
> + info, rqst->rq_pages[i],
> + j*max_iov_size + offset,
> size, remaining_data_length);
> if (rc)
> goto done;
>

2018-06-24 02:17:09

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 09/15] CIFS: SMBD: Support page offset in RDMA recv

On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <[email protected]>
>
> RDMA recv function needs to place data to the correct place starting at
> page offset.
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/smbdirect.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
> index 6141e3c..ba53c52 100644
> --- a/fs/cifs/smbdirect.c
> +++ b/fs/cifs/smbdirect.c
> @@ -2004,10 +2004,12 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
> * return value: actual data read
> */
> static int smbd_recv_page(struct smbd_connection *info,
> - struct page *page, unsigned int to_read)
> + struct page *page, unsigned int page_offset,
> + unsigned int to_read)
> {
> int ret;
> char *to_address;
> + void *page_address;
>
> /* make sure we have the page ready for read */
> ret = wait_event_interruptible(
> @@ -2015,16 +2017,17 @@ static int smbd_recv_page(struct smbd_connection *info,
> info->reassembly_data_length >= to_read ||
> info->transport_status != SMBD_CONNECTED);
> if (ret)
> - return 0;
> + return ret;
>
> /* now we can read from reassembly queue and not sleep */
> - to_address = kmap_atomic(page);
> + page_address = kmap_atomic(page);
> + to_address = (char *) page_address + page_offset;
>
> log_read(INFO, "reading from page=%p address=%p to_read=%d\n",
> page, to_address, to_read);
>
> ret = smbd_recv_buf(info, to_address, to_read);
> - kunmap_atomic(to_address);
> + kunmap_atomic(page_address);

Is "page" truly not mapped? This kmap/kunmap for each received 4KB is
very expensive. Is there not a way to keep a kva for the reassembly
queue segments?

>
> return ret;
> }
> @@ -2038,7 +2041,7 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
> {
> char *buf;
> struct page *page;
> - unsigned int to_read;
> + unsigned int to_read, page_offset;
> int rc;
>
> info->smbd_recv_pending++;
> @@ -2052,15 +2055,16 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
>
> case READ | ITER_BVEC:
> page = msg->msg_iter.bvec->bv_page;
> + page_offset = msg->msg_iter.bvec->bv_offset;
> to_read = msg->msg_iter.bvec->bv_len;
> - rc = smbd_recv_page(info, page, to_read);
> + rc = smbd_recv_page(info, page, page_offset, to_read);
> break;
>
> default:
> /* It's a bug in upper layer to get there */
> cifs_dbg(VFS, "CIFS: invalid msg type %d\n",
> msg->msg_iter.type);
> - rc = -EIO;
> + rc = -EINVAL;
> }
>
> info->smbd_recv_pending--;
>

2018-06-24 02:25:31

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 10/15] CIFS: SMBD: Support page offset in memory registration

On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <[email protected]>
>
> Change code to pass the correct page offset during memory registration for
> RDMA read/write.
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/smb2pdu.c | 18 ++++++++-----
> fs/cifs/smbdirect.c | 76 +++++++++++++++++++++++++++++++----------------------
> fs/cifs/smbdirect.h | 2 +-
> 3 files changed, 58 insertions(+), 38 deletions(-)
>
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> index f603fbe..fc30774 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -2623,8 +2623,8 @@ smb2_new_read_req(void **buf, unsigned int *total_len,
>
> rdata->mr = smbd_register_mr(
> server->smbd_conn, rdata->pages,
> - rdata->nr_pages, rdata->tailsz,
> - true, need_invalidate);
> + rdata->nr_pages, rdata->page_offset,
> + rdata->tailsz, true, need_invalidate);
> if (!rdata->mr)
> return -ENOBUFS;
>
> @@ -3013,16 +3013,22 @@ smb2_async_writev(struct cifs_writedata *wdata,
>
> wdata->mr = smbd_register_mr(
> server->smbd_conn, wdata->pages,
> - wdata->nr_pages, wdata->tailsz,
> - false, need_invalidate);
> + wdata->nr_pages, wdata->page_offset,
> + wdata->tailsz, false, need_invalidate);
> if (!wdata->mr) {
> rc = -ENOBUFS;
> goto async_writev_out;
> }
> req->Length = 0;
> req->DataOffset = 0;
> - req->RemainingBytes =
> - cpu_to_le32((wdata->nr_pages-1)*PAGE_SIZE + wdata->tailsz);
> + if (wdata->nr_pages > 1)
> + req->RemainingBytes =
> + cpu_to_le32(
> + (wdata->nr_pages - 1) * wdata->pagesz -
> + wdata->page_offset + wdata->tailsz
> + );
> + else
> + req->RemainingBytes = cpu_to_le32(wdata->tailsz);

Again, I think a helper that computed and returned this size would be
much clearer and compact. And I still am incredulous that a single page
io always has an offset of zero. :-)

> req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE;
> if (need_invalidate)
> req->Channel = SMB2_CHANNEL_RDMA_V1;
> diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
> index ba53c52..e459c97 100644
> --- a/fs/cifs/smbdirect.c
> +++ b/fs/cifs/smbdirect.c
> @@ -2299,37 +2299,37 @@ static void smbd_mr_recovery_work(struct work_struct *work)
> if (smbdirect_mr->state == MR_INVALIDATED ||
> smbdirect_mr->state == MR_ERROR) {
>
> - if (smbdirect_mr->state == MR_INVALIDATED) {
> + /* recover this MR entry */
> + rc = ib_dereg_mr(smbdirect_mr->mr);
> + if (rc) {
> + log_rdma_mr(ERR,
> + "ib_dereg_mr failed rc=%x\n",
> + rc);
> + smbd_disconnect_rdma_connection(info);
> + continue;
> + }

Ok, we discussed this ib_dereg_mr() call at the plugfest last week.
It's unnecessary - the MR is reusable and does not need to be destroyed
after each use.

> +
> + smbdirect_mr->mr = ib_alloc_mr(
> + info->pd, info->mr_type,
> + info->max_frmr_depth);
> + if (IS_ERR(smbdirect_mr->mr)) {
> + log_rdma_mr(ERR,
> + "ib_alloc_mr failed mr_type=%x "
> + "max_frmr_depth=%x\n",
> + info->mr_type,
> + info->max_frmr_depth);
> + smbd_disconnect_rdma_connection(info);
> + continue;
> + }
> +

Not needed, for the same reason above.

> + if (smbdirect_mr->state == MR_INVALIDATED)
> ib_dma_unmap_sg(
> info->id->device, smbdirect_mr->sgl,
> smbdirect_mr->sgl_count,
> smbdirect_mr->dir);
> - smbdirect_mr->state = MR_READY;

As we observed, the smbdirect_mr is not protected by a lock, therefore
this MR_READY state transition needs a memory barrier in front of it!

> - } else if (smbdirect_mr->state == MR_ERROR) {
> -
> - /* recover this MR entry */
> - rc = ib_dereg_mr(smbdirect_mr->mr);
> - if (rc) {
> - log_rdma_mr(ERR,
> - "ib_dereg_mr failed rc=%x\n",
> - rc);
> - smbd_disconnect_rdma_connection(info);
> - }
Why are you deleting the MR_ERROR handling? It seems this is precisely
the place where the MR needs to be destroyed, to prevent any later RDMA
operations from potentially reaching the original memory.

>
> - smbdirect_mr->mr = ib_alloc_mr(
> - info->pd, info->mr_type,
> - info->max_frmr_depth);
> - if (IS_ERR(smbdirect_mr->mr)) {
> - log_rdma_mr(ERR,
> - "ib_alloc_mr failed mr_type=%x "
> - "max_frmr_depth=%x\n",
> - info->mr_type,
> - info->max_frmr_depth);
> - smbd_disconnect_rdma_connection(info);
> - }
> + smbdirect_mr->state = MR_READY;
>
> - smbdirect_mr->state = MR_READY;
> - }
> /* smbdirect_mr->state is updated by this function
> * and is read and updated by I/O issuing CPUs trying
> * to get a MR, the call to atomic_inc_return
> @@ -2475,7 +2475,7 @@ static struct smbd_mr *get_mr(struct smbd_connection *info)
> */
> struct smbd_mr *smbd_register_mr(
> struct smbd_connection *info, struct page *pages[], int num_pages,
> - int tailsz, bool writing, bool need_invalidate)
> + int offset, int tailsz, bool writing, bool need_invalidate)
> {
> struct smbd_mr *smbdirect_mr;
> int rc, i;
> @@ -2498,17 +2498,31 @@ struct smbd_mr *smbd_register_mr(
> smbdirect_mr->sgl_count = num_pages;
> sg_init_table(smbdirect_mr->sgl, num_pages);
>
> - for (i = 0; i < num_pages - 1; i++)
> - sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
> + log_rdma_mr(INFO, "num_pages=0x%x offset=0x%x tailsz=0x%x\n",
> + num_pages, offset, tailsz);
>
> + if (num_pages == 1) {
> + sg_set_page(&smbdirect_mr->sgl[0], pages[0], tailsz, offset);
> + goto skip_multiple_pages;

A simple "else" would be much preferable to this "goto".

> + }
> +
> + /* We have at least two pages to register */
> + sg_set_page(
> + &smbdirect_mr->sgl[0], pages[0], PAGE_SIZE - offset, offset);
> + i = 1;
> + while (i < num_pages - 1) {
> + sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
> + i++;
> + }
> sg_set_page(&smbdirect_mr->sgl[i], pages[i],
> tailsz ? tailsz : PAGE_SIZE, 0);
>
> +skip_multiple_pages:
> dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> smbdirect_mr->dir = dir;
> rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgl, num_pages, dir);
> if (!rc) {
> - log_rdma_mr(INFO, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
> + log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
> num_pages, dir, rc);
> goto dma_map_error;
> }
> @@ -2516,8 +2530,8 @@ struct smbd_mr *smbd_register_mr(
> rc = ib_map_mr_sg(smbdirect_mr->mr, smbdirect_mr->sgl, num_pages,
> NULL, PAGE_SIZE);
> if (rc != num_pages) {
> - log_rdma_mr(INFO,
> - "ib_map_mr_sg failed rc = %x num_pages = %x\n",
> + log_rdma_mr(ERR,
> + "ib_map_mr_sg failed rc = %d num_pages = %x\n",
> rc, num_pages);
> goto map_mr_error;
> }
> diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h
> index f9038da..1e419c2 100644
> --- a/fs/cifs/smbdirect.h
> +++ b/fs/cifs/smbdirect.h
> @@ -321,7 +321,7 @@ struct smbd_mr {
> /* Interfaces to register and deregister MR for RDMA read/write */
> struct smbd_mr *smbd_register_mr(
> struct smbd_connection *info, struct page *pages[], int num_pages,
> - int tailsz, bool writing, bool need_invalidate);
> + int offset, int tailsz, bool writing, bool need_invalidate);
> int smbd_deregister_mr(struct smbd_mr *mr);
>
> #else
>

2018-06-24 02:29:01

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 11/15] CIFS: Pass page offset for calculating signature

On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <[email protected]>
>
> When calculating signature for the packet, it needs to read into the
> correct page offset for the data.
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/cifsencrypt.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
> index a6ef088..e88303c 100644
> --- a/fs/cifs/cifsencrypt.c
> +++ b/fs/cifs/cifsencrypt.c
> @@ -68,11 +68,12 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
>
> /* now hash over the rq_pages array */
> for (i = 0; i < rqst->rq_npages; i++) {
> - void *kaddr = kmap(rqst->rq_pages[i]);
> - size_t len = rqst->rq_pagesz;
> + void *kaddr;
> + unsigned int len, offset;
>
> - if (i == rqst->rq_npages - 1)
> - len = rqst->rq_tailsz;
> + rqst_page_get_length(rqst, i, &len, &offset);
> +
> + kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;

I suppose it's more robust to map a page at a time, but it's pretty
expensive. Is this the only way to iterate over a potentially very
large block of data? For example, a 1MB segment means 256 kmap/kunmaps.

Tom.

>
> crypto_shash_update(shash, kaddr, len);
>
>

2018-06-24 02:30:48

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 12/15] CIFS: Pass page offset for encrypting

On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <[email protected]>
>
> Encryption function needs to read data starting page offset from input
> buffer.
>
> This doesn't affect decryption path since it allocates its own page
> buffers.
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/smb2ops.c | 20 +++++++++++++-------
> 1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index 1fa1c29..38d19b6 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -2189,9 +2189,10 @@ init_sg(struct smb_rqst *rqst, u8 *sign)
> smb2_sg_set_buf(&sg[i], rqst->rq_iov[i].iov_base,
> rqst->rq_iov[i].iov_len);
> for (j = 0; i < sg_len - 1; i++, j++) {
> - unsigned int len = (j < rqst->rq_npages - 1) ? rqst->rq_pagesz
> - : rqst->rq_tailsz;
> - sg_set_page(&sg[i], rqst->rq_pages[j], len, 0);
> + unsigned int len, offset;
> +
> + rqst_page_get_length(rqst, j, &len, &offset);
> + sg_set_page(&sg[i], rqst->rq_pages[j], len, offset);
> }
> smb2_sg_set_buf(&sg[sg_len - 1], sign, SMB2_SIGNATURE_SIZE);
> return sg;
> @@ -2332,6 +2333,7 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, struct smb_rqst *new_rq,
> return rc;
>
> new_rq->rq_pages = pages;
> + new_rq->rq_offset = old_rq->rq_offset;
> new_rq->rq_npages = old_rq->rq_npages;
> new_rq->rq_pagesz = old_rq->rq_pagesz;
> new_rq->rq_tailsz = old_rq->rq_tailsz;
> @@ -2363,10 +2365,14 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, struct smb_rqst *new_rq,
>
> /* copy pages form the old */
> for (i = 0; i < npages; i++) {
> - char *dst = kmap(new_rq->rq_pages[i]);
> - char *src = kmap(old_rq->rq_pages[i]);
> - unsigned int len = (i < npages - 1) ? new_rq->rq_pagesz :
> - new_rq->rq_tailsz;
> + char *dst, *src;
> + unsigned int offset, len;
> +
> + rqst_page_get_length(new_rq, i, &len, &offset);
> +
> + dst = (char *) kmap(new_rq->rq_pages[i]) + offset;
> + src = (char *) kmap(old_rq->rq_pages[i]) + offset;

Ouch! TWO kmap/kunmaps per page of data? Is there not already a kva,
at least for the destination (message)?

Tom.


> +
> memcpy(dst, src, len);
> kunmap(new_rq->rq_pages[i]);
> kunmap(old_rq->rq_pages[i]);
>

2018-06-24 02:40:41

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 13/15] CIFS: Add support for direct I/O read



On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <[email protected]>
>
> Implement the function for direct I/O read. It doesn't support AIO, which
> will be implemented in a follow up patch.
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/cifsfs.h | 1 +
> fs/cifs/file.c | 149 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 150 insertions(+)
>
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index 5f02318..7fba9aa 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -102,6 +102,7 @@ extern int cifs_open(struct inode *inode, struct file *file);
> extern int cifs_close(struct inode *inode, struct file *file);
> extern int cifs_closedir(struct inode *inode, struct file *file);
> extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
> +extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
> extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
> extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
> extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 87eece6..e6e6f24 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2955,6 +2955,18 @@ cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages)
> return rc;
> }
>
> +static void cifs_direct_readdata_release(struct kref *refcount)
> +{
> + struct cifs_readdata *rdata = container_of(refcount,
> + struct cifs_readdata, refcount);
> + unsigned int i;
> +
> + for (i = 0; i < rdata->nr_pages; i++)
> + put_page(rdata->pages[i]);
> +
> + cifs_readdata_release(refcount);
> +}
> +
> static void
> cifs_uncached_readdata_release(struct kref *refcount)
> {
> @@ -3267,6 +3279,143 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
> complete(&ctx->done);
> }
>
> +static void cifs_direct_readv_complete(struct work_struct *work)
> +{
> + struct cifs_readdata *rdata =
> + container_of(work, struct cifs_readdata, work);
> +
> + complete(&rdata->done);
> + kref_put(&rdata->refcount, cifs_direct_readdata_release);
> +}
> +
> +ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
> +{
> + size_t len, cur_len, start;
> + unsigned int npages, rsize, credits;
> + struct file *file;
> + struct cifs_sb_info *cifs_sb;
> + struct cifsFileInfo *cfile;
> + struct cifs_tcon *tcon;
> + struct page **pagevec;
> + ssize_t rc, total_read = 0;
> + struct TCP_Server_Info *server;
> + loff_t offset = iocb->ki_pos;
> + pid_t pid;
> + struct cifs_readdata *rdata;
> +
> + /*
> + * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> + * fall back to data copy read path
> + */
> + if (to->type & ITER_KVEC) {
> + cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
> + return cifs_user_readv(iocb, to);
> + }
> +
> + len = iov_iter_count(to);
> + if (!len)
> + return 0;
> +
> + file = iocb->ki_filp;
> + cifs_sb = CIFS_FILE_SB(file);
> + cfile = file->private_data;
> + tcon = tlink_tcon(cfile->tlink);
> + server = tcon->ses->server;
> +
> + if (!server->ops->async_readv)
> + return -ENOSYS;
> +
> + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> + pid = cfile->pid;
> + else
> + pid = current->tgid;
> +
> + if ((file->f_flags & O_ACCMODE) == O_WRONLY)
> + cifs_dbg(FYI, "attempting read on write only file instance\n");

Confusing. Maybe "attempting read on write-only filehandle"?

> +
> + do {
> + rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
> + &rsize, &credits);
> + if (rc)
> + break;
> +
> + cur_len = min_t(const size_t, len, rsize);
> +
> + rc = iov_iter_get_pages_alloc(to, &pagevec, cur_len, &start);
> + if (rc < 0) {
> + cifs_dbg(VFS,
> + "couldn't get user pages (rc=%zd) iter type %d"
> + " iov_offset %lu count %lu\n",
> + rc, to->type, to->iov_offset, to->count);
> + dump_stack();
> + break;
> + }
> +
> + rdata = cifs_readdata_direct_alloc(
> + pagevec, cifs_direct_readv_complete);
> + if (!rdata) {
> + add_credits_and_wake_if(server, credits, 0);
> + rc = -ENOMEM;
> + break;
> + }
> +
> + npages = (rc + start + PAGE_SIZE-1) / PAGE_SIZE;
> + rdata->nr_pages = npages;
> + rdata->page_offset = start;
> + rdata->pagesz = PAGE_SIZE;
> + rdata->tailsz = npages > 1 ?
> + rc-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
> + rc;

This expression makes my head hurt. Surely it can be simplified, or
expressed in a clearer way.

> + cur_len = rc;
> +
> + rdata->cfile = cifsFileInfo_get(cfile);
> + rdata->offset = offset;
> + rdata->bytes = rc;
> + rdata->pid = pid;
> + rdata->read_into_pages = cifs_uncached_read_into_pages;
> + rdata->copy_into_pages = cifs_uncached_copy_into_pages;
> + rdata->credits = credits;
> +
> + rc = 0;
> + if (rdata->cfile->invalidHandle)
> + rc = cifs_reopen_file(rdata->cfile, true);
> +
> + if (!rc)
> + rc = server->ops->async_readv(rdata);
> +
> + if (rc) {

This whole rc thing is messy. Initializing to zero, setting only in
one case, then testing the result, then setting it again, is twisted.
I actually think a goto or two would read much more clearly.

> + add_credits_and_wake_if(server, rdata->credits, 0);
> + kref_put(&rdata->refcount,
> + cifs_direct_readdata_release);
> + if (rc == -EAGAIN)
> + continue;
> + break;

It's worth a comment here that this either breaks or continues the
entire do {} while (); and btw when it breaks it does *not* return "rc".
Again, maybe a goto instead of a break?

> + }
> +
> + wait_for_completion(&rdata->done);
> + rc = rdata->result;
> + if (rc) {
> + kref_put(
> + &rdata->refcount,
> + cifs_direct_readdata_release);
> + if (rc == -EAGAIN)
> + continue;
> + break;

Ditto.

> + }
> +
> + total_read += rdata->got_bytes;
> + kref_put(&rdata->refcount, cifs_direct_readdata_release);
> +
> + iov_iter_advance(to, cur_len);
> + len -= cur_len;
> + offset += cur_len;
> + } while (len);
> +
> + iocb->ki_pos += total_read;
> +
> + return total_read;
> +}
> +
> ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> {
> struct file *file = iocb->ki_filp;
>

2018-06-24 02:49:17

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 14/15] CIFS: Add support for direct I/O write

On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <[email protected]>
>
> Implement the function for direct I/O write. It doesn't support AIO, which
> will be implemented in a follow up patch.
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/cifsfs.h | 1 +
> fs/cifs/file.c | 165 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 166 insertions(+)
>
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index 7fba9aa..e9c5103 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -105,6 +105,7 @@ extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
> extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
> extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
> extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
> +extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
> extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
> extern int cifs_lock(struct file *, int, struct file_lock *);
> extern int cifs_fsync(struct file *, loff_t, loff_t, int);
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index e6e6f24..8c385b1 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2461,6 +2461,35 @@ cifs_uncached_writedata_release(struct kref *refcount)
>
> static void collect_uncached_write_data(struct cifs_aio_ctx *ctx);
>
> +static void cifs_direct_writedata_release(struct kref *refcount)
> +{
> + int i;
> + struct cifs_writedata *wdata = container_of(refcount,
> + struct cifs_writedata, refcount);
> +
> + for (i = 0; i < wdata->nr_pages; i++)
> + put_page(wdata->pages[i]);
> +
> + cifs_writedata_release(refcount);
> +}
> +
> +static void cifs_direct_writev_complete(struct work_struct *work)
> +{
> + struct cifs_writedata *wdata = container_of(work,
> + struct cifs_writedata, work);
> + struct inode *inode = d_inode(wdata->cfile->dentry);
> + struct cifsInodeInfo *cifsi = CIFS_I(inode);
> +
> + spin_lock(&inode->i_lock);
> + cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
> + if (cifsi->server_eof > inode->i_size)
> + i_size_write(inode, cifsi->server_eof);
> + spin_unlock(&inode->i_lock);
> +
> + complete(&wdata->done);
> + kref_put(&wdata->refcount, cifs_direct_writedata_release);
> +}
> +
> static void
> cifs_uncached_writev_complete(struct work_struct *work)
> {
> @@ -2703,6 +2732,142 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx)
> complete(&ctx->done);
> }
>
> +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
> +{
> + struct file *file = iocb->ki_filp;
> + ssize_t total_written = 0;
> + struct cifsFileInfo *cfile;
> + struct cifs_tcon *tcon;
> + struct cifs_sb_info *cifs_sb;
> + struct TCP_Server_Info *server;
> + pid_t pid;
> + unsigned long nr_pages;
> + loff_t offset = iocb->ki_pos;
> + size_t len = iov_iter_count(from);
> + int rc;
> + struct cifs_writedata *wdata;
> +
> + /*
> + * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
> + * In this case, fall back to non-direct write function.
> + */
> + if (from->type & ITER_KVEC) {
> + cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec I/O\n");
> + return cifs_user_writev(iocb, from);
> + }
> +
> + rc = generic_write_checks(iocb, from);
> + if (rc <= 0)
> + return rc;
> +
> + cifs_sb = CIFS_FILE_SB(file);
> + cfile = file->private_data;
> + tcon = tlink_tcon(cfile->tlink);
> + server = tcon->ses->server;
> +
> + if (!server->ops->async_writev)
> + return -ENOSYS;
> +
> + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> + pid = cfile->pid;
> + else
> + pid = current->tgid;
> +
> + do {
> + unsigned int wsize, credits;
> + struct page **pagevec;
> + size_t start;
> + ssize_t cur_len;
> +
> + rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
> + &wsize, &credits);
> + if (rc)
> + break;
> +
> + cur_len = iov_iter_get_pages_alloc(
> + from, &pagevec, wsize, &start);
> + if (cur_len < 0) {
> + cifs_dbg(VFS,
> + "direct_writev couldn't get user pages "
> + "(rc=%zd) iter type %d iov_offset %lu count"
> + " %lu\n",
> + cur_len, from->type,
> + from->iov_offset, from->count);
> + dump_stack();
> + break;
> + }
> + if (cur_len < 0)
> + break;

This cur_len < 0 test is redundant with the prior if(), delete.
> +
> + nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;

Am I misreading, or will this return be one more page than needed? If
start (the first byte offset) is > 0, nr_pages will already be one.
And if cur_len is 4KB, even if start is 0, nr_pages will be two.

> +
> + wdata = cifs_writedata_direct_alloc(pagevec,
> + cifs_direct_writev_complete);
> + if (!wdata) {
> + rc = -ENOMEM;
> + add_credits_and_wake_if(server, credits, 0);
> + break;
> + }
> +
> + wdata->nr_pages = nr_pages;
> + wdata->page_offset = start;
> + wdata->pagesz = PAGE_SIZE;
> + wdata->tailsz =
> + nr_pages > 1 ?
> + cur_len - (PAGE_SIZE - start) -
> + (nr_pages - 2) * PAGE_SIZE :
> + cur_len;
> +
> + wdata->sync_mode = WB_SYNC_ALL;
> + wdata->offset = (__u64)offset;
> + wdata->cfile = cifsFileInfo_get(cfile);
> + wdata->pid = pid;
> + wdata->bytes = cur_len;
> + wdata->credits = credits;
> +
> + rc = 0;
> + if (wdata->cfile->invalidHandle)
> + rc = cifs_reopen_file(wdata->cfile, false);
> +
> + if (!rc)
> + rc = server->ops->async_writev(wdata,
> + cifs_direct_writedata_release);
> +
> + if (rc) {
> + add_credits_and_wake_if(server, wdata->credits, 0);
> + kref_put(&wdata->refcount,
> + cifs_direct_writedata_release);
> + if (rc == -EAGAIN)
> + continue;
> + break;
> + }

Same comments as for previous patch re the if (rc) ladder, and
the break/continues both being better expressed as careful goto's.

> +
> + wait_for_completion(&wdata->done);
> + if (wdata->result) {
> + rc = wdata->result;
> + kref_put(&wdata->refcount,
> + cifs_direct_writedata_release);
> + if (rc == -EAGAIN)
> + continue;
> + break;
> + }
> +
> + kref_put(&wdata->refcount, cifs_direct_writedata_release);
> +
> + iov_iter_advance(from, cur_len);
> + total_written += cur_len;
> + offset += cur_len;
> + len -= cur_len;
> + } while (len);
> +
> + if (unlikely(!total_written))
> + return rc;
> +
> + iocb->ki_pos += total_written;
> + return total_written;
> +
> +}
> +
> ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
> {
> struct file *file = iocb->ki_filp;
>

2018-06-25 20:26:59

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 02/15] CIFS: Add support for direct pages in rdata

> From: Tom Talpey <[email protected]>
> Sent: Saturday, June 23, 2018 6:50 PM
> To: Long Li <[email protected]>; Steve French <[email protected]>;
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
>
> On 5/30/2018 3:47 PM, Long Li wrote:
> > From: Long Li <[email protected]>
> >
> > Add a function to allocate rdata without allocating pages for data
> > transfer. This gives the caller an option to pass a number of pages
> > that point to the data buffer.
> >
> > rdata is still reponsible for free those pages after it's done.
>
> "Caller" is still responsible? Or is the rdata somehow freeing itself via another
> mechanism?

I meant to say free the array "struct page **pages", not the pages themselves.

Before the patch, the page array is allocated at the end of the struct cifs_readdata:

rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
GFP_KERNEL);

If this is the case, they are automatically freed when cifs_readdata is freed.

With the direct I/O patch, the page array is handled separately. So they need to be freed after being used.

>
> >
> > Signed-off-by: Long Li <[email protected]>
> > ---
> > fs/cifs/cifsglob.h | 2 +-
> > fs/cifs/file.c | 23 ++++++++++++++++++++---
> > 2 files changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index
> > 8d16c3e..56864a87 100644
> > --- a/fs/cifs/cifsglob.h
> > +++ b/fs/cifs/cifsglob.h
> > @@ -1179,7 +1179,7 @@ struct cifs_readdata {
> > unsigned int tailsz;
> > unsigned int credits;
> > unsigned int nr_pages;
> > - struct page *pages[];
> > + struct page **pages;
>
> Technically speaking, these are syntactically equivalent. It may not be worth
> changing this historic definition.
>
> Tom.
>
>
> > };
> >
> > struct cifs_writedata;
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 23fd430..1c98293
> > 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -2880,13 +2880,13 @@ cifs_strict_writev(struct kiocb *iocb, struct
> iov_iter *from)
> > }
> >
> > static struct cifs_readdata *
> > -cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
> > +cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
> > {
> > struct cifs_readdata *rdata;
> >
> > - rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
> > - GFP_KERNEL);
> > + rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
> > if (rdata != NULL) {
> > + rdata->pages = pages;
> > kref_init(&rdata->refcount);
> > INIT_LIST_HEAD(&rdata->list);
> > init_completion(&rdata->done);
> > @@ -2896,6 +2896,22 @@ cifs_readdata_alloc(unsigned int nr_pages,
> work_func_t complete)
> > return rdata;
> > }
> >
> > +static struct cifs_readdata *
> > +cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete) {
> > + struct page **pages =
> > + kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
> > + struct cifs_readdata *ret = NULL;
> > +
> > + if (pages) {
> > + ret = cifs_readdata_direct_alloc(pages, complete);
> > + if (!ret)
> > + kfree(pages);
> > + }
> > +
> > + return ret;
> > +}
> > +
> > void
> > cifs_readdata_release(struct kref *refcount)
> > {
> > @@ -2910,6 +2926,7 @@ cifs_readdata_release(struct kref *refcount)
> > if (rdata->cfile)
> > cifsFileInfo_put(rdata->cfile);
> >
> > + kvfree(rdata->pages);
> > kfree(rdata);
> > }
> >
> >

2018-06-25 20:34:37

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 03/15] CIFS: Use offset when reading pages

> From: Tom Talpey <[email protected]>
> Sent: Saturday, June 23, 2018 6:59 PM
> To: Long Li <[email protected]>; Steve French <[email protected]>;
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [Patch v2 03/15] CIFS: Use offset when reading pages
>
> On 5/30/2018 3:47 PM, Long Li wrote:
> > From: Long Li <[email protected]>
> >
> > With offset defined in rdata, transport functions need to look at this
> > offset when reading data into the correct places in pages.
> >
> > Signed-off-by: Long Li <[email protected]>
> > ---
> > fs/cifs/cifsproto.h | 4 +++-
> > fs/cifs/connect.c | 5 +++--
> > fs/cifs/file.c | 52 +++++++++++++++++++++++++++++++++++++--------
> -------
> > fs/cifs/smb2ops.c | 2 +-
> > fs/cifs/smb2pdu.c | 1 +
> > 5 files changed, 45 insertions(+), 19 deletions(-)
> >
> > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index
> > dc80f84..1f27d8e 100644
> > --- a/fs/cifs/cifsproto.h
> > +++ b/fs/cifs/cifsproto.h
> > @@ -203,7 +203,9 @@ extern void dequeue_mid(struct mid_q_entry *mid,
> bool malformed);
> > extern int cifs_read_from_socket(struct TCP_Server_Info *server, char
> *buf,
> > unsigned int to_read);
> > extern int cifs_read_page_from_socket(struct TCP_Server_Info *server,
> > - struct page *page, unsigned int to_read);
> > + struct page *page,
> > + unsigned int page_offset,
> > + unsigned int to_read);
> > extern int cifs_setup_cifs_sb(struct smb_vol *pvolume_info,
> > struct cifs_sb_info *cifs_sb);
> > extern int cifs_match_super(struct super_block *, void *); diff
> > --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 83b0234..8501da0
> > 100644
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -594,10 +594,11 @@ cifs_read_from_socket(struct TCP_Server_Info
> > *server, char *buf,
> >
> > int
> > cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page
> *page,
> > - unsigned int to_read)
> > + unsigned int page_offset, unsigned int to_read)
> > {
> > struct msghdr smb_msg;
> > - struct bio_vec bv = {.bv_page = page, .bv_len = to_read};
> > + struct bio_vec bv = {
> > + .bv_page = page, .bv_len = to_read, .bv_offset =
> page_offset};
> > iov_iter_bvec(&smb_msg.msg_iter, READ | ITER_BVEC, &bv, 1,
> to_read);
> > return cifs_readv_from_socket(server, &smb_msg);
> > }
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 1c98293..87eece6
> > 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -3026,12 +3026,20 @@ uncached_fill_pages(struct TCP_Server_Info
> *server,
> > int result = 0;
> > unsigned int i;
> > unsigned int nr_pages = rdata->nr_pages;
> > + unsigned int page_offset = rdata->page_offset;
> >
> > rdata->got_bytes = 0;
> > rdata->tailsz = PAGE_SIZE;
> > for (i = 0; i < nr_pages; i++) {
> > struct page *page = rdata->pages[i];
> > size_t n;
> > + unsigned int segment_size = rdata->pagesz;
> > +
> > + if (i == 0)
> > + segment_size -= page_offset;
> > + else
> > + page_offset = 0;
> > +
> >
> > if (len <= 0) {
> > /* no need to hold page hostage */ @@ -3040,24
> +3048,25 @@
> > uncached_fill_pages(struct TCP_Server_Info *server,
> > put_page(page);
> > continue;
> > }
> > +
> > n = len;
> > - if (len >= PAGE_SIZE) {
> > + if (len >= segment_size)
> > /* enough data to fill the page */
> > - n = PAGE_SIZE;
> > - len -= n;
> > - } else {
> > - zero_user(page, len, PAGE_SIZE - len);
> > + n = segment_size;
> > + else
> > rdata->tailsz = len;
> > - len = 0;
> > - }
> > + len -= n;
> > +
> > if (iter)
> > - result = copy_page_from_iter(page, 0, n, iter);
> > + result = copy_page_from_iter(
> > + page, page_offset, n, iter);
> > #ifdef CONFIG_CIFS_SMB_DIRECT
> > else if (rdata->mr)
> > result = n;
> > #endif
>
> I hadn't noticed this type of xonditional code before. It's kind of ugly to
> modify the data handling code like this. Do you plan to break this out into an
> smbdirect-specific hander, like the cases above and below?

This is a good point. I will look for ways to improve this.

>
> > else
> > - result = cifs_read_page_from_socket(server, page,
> n);
> > + result = cifs_read_page_from_socket(
> > + server, page, page_offset, n);
> > if (result < 0)
> > break;
> >
> > @@ -3130,6 +3139,7 @@ cifs_send_async_read(loff_t offset, size_t len,
> struct cifsFileInfo *open_file,
> > rdata->bytes = cur_len;
> > rdata->pid = pid;
> > rdata->pagesz = PAGE_SIZE;
> > + rdata->tailsz = PAGE_SIZE;
> > rdata->read_into_pages = cifs_uncached_read_into_pages;
> > rdata->copy_into_pages = cifs_uncached_copy_into_pages;
> > rdata->credits = credits;
> > @@ -3574,6 +3584,7 @@ readpages_fill_pages(struct TCP_Server_Info
> *server,
> > u64 eof;
> > pgoff_t eof_index;
> > unsigned int nr_pages = rdata->nr_pages;
> > + unsigned int page_offset = rdata->page_offset;
> >
> > /* determine the eof that the server (probably) has */
> > eof = CIFS_I(rdata->mapping->host)->server_eof;
> > @@ -3584,13 +3595,21 @@ readpages_fill_pages(struct TCP_Server_Info
> *server,
> > rdata->tailsz = PAGE_SIZE;
> > for (i = 0; i < nr_pages; i++) {
> > struct page *page = rdata->pages[i];
> > - size_t n = PAGE_SIZE;
> > + unsigned int to_read = rdata->pagesz;
> > + size_t n;
> > +
> > + if (i == 0)
> > + to_read -= page_offset;
> > + else
> > + page_offset = 0;
> > +
> > + n = to_read;
> >
> > - if (len >= PAGE_SIZE) {
> > - len -= PAGE_SIZE;
> > + if (len >= to_read) {
> > + len -= to_read;
> > } else if (len > 0) {
> > /* enough for partial page, fill and zero the rest */
> > - zero_user(page, len, PAGE_SIZE - len);
> > + zero_user(page, len + page_offset, to_read - len);
> > n = rdata->tailsz = len;
> > len = 0;
> > } else if (page->index > eof_index) { @@ -3622,13 +3641,15
> @@
> > readpages_fill_pages(struct TCP_Server_Info *server,
> > }
> >
> > if (iter)
> > - result = copy_page_from_iter(page, 0, n, iter);
> > + result = copy_page_from_iter(
> > + page, page_offset, n, iter);
> > #ifdef CONFIG_CIFS_SMB_DIRECT
> > else if (rdata->mr)
> > result = n;
> > #endif
>
> Ditto previous comment/question.
>
> > else
> > - result = cifs_read_page_from_socket(server, page,
> n);
> > + result = cifs_read_page_from_socket(
> > + server, page, page_offset, n);
> > if (result < 0)
> > break;
> >
> > @@ -3807,6 +3828,7 @@ static int cifs_readpages(struct file *file, struct
> address_space *mapping,
> > rdata->bytes = bytes;
> > rdata->pid = pid;
> > rdata->pagesz = PAGE_SIZE;
> > + rdata->tailsz = PAGE_SIZE;
> > rdata->read_into_pages = cifs_readpages_read_into_pages;
> > rdata->copy_into_pages = cifs_readpages_copy_into_pages;
> > rdata->credits = credits;
> > diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index
> > 7c0edd2..1fa1c29 100644
> > --- a/fs/cifs/smb2ops.c
> > +++ b/fs/cifs/smb2ops.c
> > @@ -2467,7 +2467,7 @@ read_data_into_pages(struct TCP_Server_Info
> *server, struct page **pages,
> > zero_user(page, len, PAGE_SIZE - len);
> > len = 0;
> > }
> > - length = cifs_read_page_from_socket(server, page, n);
> > + length = cifs_read_page_from_socket(server, page, 0, n);
> > if (length < 0)
> > return length;
> > server->total_read += length;
> > diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index
> > a02f6b6..f603fbe 100644
> > --- a/fs/cifs/smb2pdu.c
> > +++ b/fs/cifs/smb2pdu.c
> > @@ -2683,6 +2683,7 @@ smb2_readv_callback(struct mid_q_entry *mid)
> > struct smb_rqst rqst = { .rq_iov = rdata->iov,
> > .rq_nvec = 2,
> > .rq_pages = rdata->pages,
> > + .rq_offset = rdata->page_offset,
> > .rq_npages = rdata->nr_pages,
> > .rq_pagesz = rdata->pagesz,
> > .rq_tailsz = rdata->tailsz };
> >

2018-06-25 20:35:45

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 04/15] CIFS: Add support for direct pages in wdata

> Subject: Re: [Patch v2 04/15] CIFS: Add support for direct pages in wdata
>
> On 5/30/2018 3:47 PM, Long Li wrote:
> > From: Long Li <[email protected]>
> >
> > Add a function to allocate wdata without allocating pages for data
> > transfer. This gives the caller an option to pass a number of pages
> > that point to the data buffer to write to.
> >
> > wdata is reponsible for free those pages after it's done.
>
> Same comment as for the earlier patch. "Caller" is responsible for "freeing"
> those pages? Confusing, as worded.
>
> >
> > Signed-off-by: Long Li <[email protected]>
> > ---
> > fs/cifs/cifsglob.h | 2 +-
> > fs/cifs/cifsproto.h | 2 ++
> > fs/cifs/cifssmb.c | 17 ++++++++++++++---
> > 3 files changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index
> > 56864a87..7f62c98 100644
> > --- a/fs/cifs/cifsglob.h
> > +++ b/fs/cifs/cifsglob.h
> > @@ -1205,7 +1205,7 @@ struct cifs_writedata {
> > unsigned int tailsz;
> > unsigned int credits;
> > unsigned int nr_pages;
> > - struct page *pages[];
> > + struct page **pages;
>
> Also same comment as for earlier patch, these are syntactically equivalent
> and maybe not needed to change.
>
> > };
> >
> > /*
> > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index
> > 1f27d8e..7933c5f 100644
> > --- a/fs/cifs/cifsproto.h
> > +++ b/fs/cifs/cifsproto.h
> > @@ -533,6 +533,8 @@ int cifs_async_writev(struct cifs_writedata *wdata,
> > void cifs_writev_complete(struct work_struct *work);
> > struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
> > work_func_t complete);
> > +struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages,
> > + work_func_t complete);
> > void cifs_writedata_release(struct kref *refcount);
> > int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
> > struct cifs_sb_info *cifs_sb,
> > diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index
> > c8e4278..5aca336 100644
> > --- a/fs/cifs/cifssmb.c
> > +++ b/fs/cifs/cifssmb.c
> > @@ -1952,6 +1952,7 @@ cifs_writedata_release(struct kref *refcount)
> > if (wdata->cfile)
> > cifsFileInfo_put(wdata->cfile);
> >
> > + kvfree(wdata->pages);
> > kfree(wdata);
> > }
> >
> > @@ -2075,12 +2076,22 @@ cifs_writev_complete(struct work_struct
> *work)
> > struct cifs_writedata *
> > cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
> > {
> > + struct page **pages =
> > + kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);
>
> Why do you do a GFP_NOFS here but GFP_KERNEL in the earlier patch?

This is for the I/O write path. The earlier patch is for I/O read path. This code preserves the behavior from the buffer I/O code.

The write path is possibly used when memory is under pressure, so it's better not to call FS (and CIFS) that may result in a loop.

>
> > + if (pages)
> > + return cifs_writedata_direct_alloc(pages, complete);
> > +
> > + return NULL;
> > +}
> > +
> > +struct cifs_writedata *
> > +cifs_writedata_direct_alloc(struct page **pages, work_func_t
> > +complete) {
> > struct cifs_writedata *wdata;
> >
> > - /* writedata + number of page pointers */
> > - wdata = kzalloc(sizeof(*wdata) +
> > - sizeof(struct page *) * nr_pages, GFP_NOFS);
> > + wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
> > if (wdata != NULL) {
> > + wdata->pages = pages;
> > kref_init(&wdata->refcount);
> > INIT_LIST_HEAD(&wdata->list);
> > init_completion(&wdata->done);
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in the
> body of a message to [email protected] More majordomo info at
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvger.ke
> rnel.org%2Fmajordomo-
> info.html&amp;data=02%7C01%7Clongli%40microsoft.com%7C20e0bcbdcce3
> 402133ee08d5d9767977%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%
> 7C636654025229749549&amp;sdata=p4P9AqS8cnOHErCFc1XALeaZxHuB7%2B
> VOF10SjaaycAI%3D&amp;reserved=0

2018-06-25 21:04:29

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [Patch v2 02/15] CIFS: Add support for direct pages in rdata

On Sat, Jun 23, 2018 at 09:50:20PM -0400, Tom Talpey wrote:
> On 5/30/2018 3:47 PM, Long Li wrote:
> >From: Long Li <[email protected]>
> >
> >Add a function to allocate rdata without allocating pages for data
> >transfer. This gives the caller an option to pass a number of pages
> >that point to the data buffer.
> >
> >rdata is still reponsible for free those pages after it's done.
>
> "Caller" is still responsible? Or is the rdata somehow freeing itself
> via another mechanism?
>
> >
> >Signed-off-by: Long Li <[email protected]>
> > fs/cifs/cifsglob.h | 2 +-
> > fs/cifs/file.c | 23 ++++++++++++++++++++---
> > 2 files changed, 21 insertions(+), 4 deletions(-)
> >
> >diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> >index 8d16c3e..56864a87 100644
> >+++ b/fs/cifs/cifsglob.h
> >@@ -1179,7 +1179,7 @@ struct cifs_readdata {
> > unsigned int tailsz;
> > unsigned int credits;
> > unsigned int nr_pages;
> >- struct page *pages[];
> >+ struct page **pages;
>
> Technically speaking, these are syntactically equivalent. It may not
> be worth changing this historic definition.

[] is a C99 'flex array', it has a different allocation behavior than
** and is not interchangeable..

Jason

2018-06-25 21:12:05

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 05/15] CIFS: Calculate the correct request length based on page offset and tail size

> Subject: Re: [Patch v2 05/15] CIFS: Calculate the correct request length based
> on page offset and tail size
>
> On 5/30/2018 3:47 PM, Long Li wrote:
> > From: Long Li <[email protected]>
> >
> > It's possible that the page offset is non-zero in the pages in a
> > request, change the function to calculate the correct data buffer length.
> >
> > Signed-off-by: Long Li <[email protected]>
> > ---
> > fs/cifs/transport.c | 20 +++++++++++++++++---
> > 1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c index
> > 927226a..d6b5523 100644
> > --- a/fs/cifs/transport.c
> > +++ b/fs/cifs/transport.c
> > @@ -212,10 +212,24 @@ rqst_len(struct smb_rqst *rqst)
> > for (i = 0; i < rqst->rq_nvec; i++)
> > buflen += iov[i].iov_len;
> >
> > - /* add in the page array if there is one */
> > + /*
> > + * Add in the page array if there is one. The caller needs to make
> > + * sure rq_offset and rq_tailsz are set correctly. If a buffer of
> > + * multiple pages ends at page boundary, rq_tailsz needs to be set to
> > + * PAGE_SIZE.
> > + */
> > if (rqst->rq_npages) {
> > - buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> > - buflen += rqst->rq_tailsz;
> > + if (rqst->rq_npages == 1)
> > + buflen += rqst->rq_tailsz;
> > + else {
> > + /*
> > + * If there is more than one page, calculate the
> > + * buffer length based on rq_offset and rq_tailsz
> > + */
> > + buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> > + rqst->rq_offset;
> > + buflen += rqst->rq_tailsz;
> > + }
>
> Wouldn't it be simpler to keep the original code, but then just subtract the
> rq_offset?
>
> buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> buflen += rqst->rq_tailsz;
> buflen -= rqst->rq_offset;
>
> It's kind of confusing as written. Also, what if it's just one page, but has a
> non-zero offset? Is that somehow not possible? My suggested code would
> take that into account, yours doesn't.

I think It can be changed to this:

buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
buflen += rqst->rq_tailsz;
if (rqst->rq_npages > 1)
buflen -= rqst->rq_offset;

This looks cleaner than before.

When there is only one page with a non-zero offset, rq_tailsz is the actual data length.

>
> Tom.
>
> > }
> >
> > return buflen;
> >

2018-06-25 21:16:13

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst

> Subject: Re: [Patch v2 06/15] CIFS: Introduce helper function to get page
> offset and length in smb_rqst
>
> On 5/30/2018 3:47 PM, Long Li wrote:
> > From: Long Li <[email protected]>
> >
> > Introduce a function rqst_page_get_length to return the page offset
> > and length for a given page in smb_rqst. This function is to be used
> > by following patches.
> >
> > Signed-off-by: Long Li <[email protected]>
> > ---
> > fs/cifs/cifsproto.h | 3 +++
> > fs/cifs/misc.c | 17 +++++++++++++++++
> > 2 files changed, 20 insertions(+)
> >
> > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index
> > 7933c5f..89dda14 100644
> > --- a/fs/cifs/cifsproto.h
> > +++ b/fs/cifs/cifsproto.h
> > @@ -557,4 +557,7 @@ int cifs_alloc_hash(const char *name, struct
> crypto_shash **shash,
> > struct sdesc **sdesc);
> > void cifs_free_hash(struct crypto_shash **shash, struct sdesc
> > **sdesc);
> >
> > +extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int
> page,
> > + unsigned int *len, unsigned int *offset);
> > +
> > #endif /* _CIFSPROTO_H */
> > diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index 96849b5..e951417
> > 100644
> > --- a/fs/cifs/misc.c
> > +++ b/fs/cifs/misc.c
> > @@ -905,3 +905,20 @@ cifs_free_hash(struct crypto_shash **shash,
> struct sdesc **sdesc)
> > crypto_free_shash(*shash);
> > *shash = NULL;
> > }
> > +
> > +/**
> > + * rqst_page_get_length - obtain the length and offset for a page in
> > +smb_rqst
> > + * Input: rqst - a smb_rqst, page - a page index for rqst
> > + * Output: *len - the length for this page, *offset - the offset for
> > +this page */ void rqst_page_get_length(struct smb_rqst *rqst,
> > +unsigned int page,
> > + unsigned int *len, unsigned int *offset) {
> > + *len = rqst->rq_pagesz;
> > + *offset = (page == 0) ? rqst->rq_offset : 0;
>
> Really? Page 0 always has a zero offset??

I think you are misreading this line. The offset for page 0 is rq_offset, the offset for all subsequent pages are 0.

>
> > +
> > + if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
> > + *len = rqst->rq_tailsz;
> > + else if (page == 0)
> > + *len = rqst->rq_pagesz - rqst->rq_offset; }
> >
>
> This subroutine does what patch 5 does inline. Why not push this patch up in
> the sequence and use the helper?

This is actually a little different. This function returns the length and offset for a given page in the request. There might be multiple pages in a request.

The other function calculates the total length of all the pages in a request.

>
> Tom.

2018-06-25 21:26:28

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 08/15] CIFS: SMBD: Support page offset in RDMA send

> Subject: Re: [Patch v2 08/15] CIFS: SMBD: Support page offset in RDMA send
>
> On 5/30/2018 3:48 PM, Long Li wrote:
> > From: Long Li <[email protected]>
> >
> > The RDMA send function needs to look at offset in the request pages,
> > and send data starting from there.
> >
> > Signed-off-by: Long Li <[email protected]>
> > ---
> > fs/cifs/smbdirect.c | 27 +++++++++++++++++++--------
> > 1 file changed, 19 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index
> > c62f7c9..6141e3c 100644
> > --- a/fs/cifs/smbdirect.c
> > +++ b/fs/cifs/smbdirect.c
> > @@ -17,6 +17,7 @@
> > #include <linux/highmem.h>
> > #include "smbdirect.h"
> > #include "cifs_debug.h"
> > +#include "cifsproto.h"
> >
> > static struct smbd_response *get_empty_queue_buffer(
> > struct smbd_connection *info);
> > @@ -2082,7 +2083,7 @@ int smbd_send(struct smbd_connection *info,
> struct smb_rqst *rqst)
> > struct kvec vec;
> > int nvecs;
> > int size;
> > - int buflen = 0, remaining_data_length;
> > + unsigned int buflen = 0, remaining_data_length;
> > int start, i, j;
> > int max_iov_size =
> > info->max_send_size - sizeof(struct smbd_data_transfer);
> @@
> > -2113,10 +2114,17 @@ int smbd_send(struct smbd_connection *info,
> struct smb_rqst *rqst)
> > buflen += iov[i].iov_len;
> > }
> >
> > - /* add in the page array if there is one */
> > + /*
> > + * Add in the page array if there is one. The caller needs to set
> > + * rq_tailsz to PAGE_SIZE when the buffer has multiple pages and
> > + * ends at page boundary
> > + */
> > if (rqst->rq_npages) {
> > - buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> > - buflen += rqst->rq_tailsz;
> > + if (rqst->rq_npages == 1)
> > + buflen += rqst->rq_tailsz;
> > + else
> > + buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> > + rqst->rq_offset + rqst->rq_tailsz;
> > }
>
> This code is really confusing and redundant. It tests npages > 0, then tests
> npages == 1, then does an else. Why not call the helper like the following
> smbd_send()?

This code needs to get the combined length of all the pages in the request (but excluding iov, this is different to the function rqst_len in patch 05), but the following is for getting a single page from the request.

I can simplify the code a little bit by following your suggestion in patch 05.

>
> Tom.
>
> >
> > if (buflen + sizeof(struct smbd_data_transfer) > @@ -2213,8 +2221,9
> > @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
> >
> > /* now sending pages if there are any */
> > for (i = 0; i < rqst->rq_npages; i++) {
> > - buflen = (i == rqst->rq_npages-1) ?
> > - rqst->rq_tailsz : rqst->rq_pagesz;
> > + unsigned int offset;
> > +
> > + rqst_page_get_length(rqst, i, &buflen, &offset);
> > nvecs = (buflen + max_iov_size - 1) / max_iov_size;
> > log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
> > buflen, nvecs);
> > @@ -2225,9 +2234,11 @@ int smbd_send(struct smbd_connection *info,
> struct smb_rqst *rqst)
> > remaining_data_length -= size;
> > log_write(INFO, "sending pages i=%d offset=%d
> size=%d"
> > " remaining_data_length=%d\n",
> > - i, j*max_iov_size, size,
> remaining_data_length);
> > + i, j*max_iov_size+offset, size,
> > + remaining_data_length);
> > rc = smbd_post_send_page(
> > - info, rqst->rq_pages[i], j*max_iov_size,
> > + info, rqst->rq_pages[i],
> > + j*max_iov_size + offset,
> > size, remaining_data_length);
> > if (rc)
> > goto done;
> >

2018-06-25 21:30:48

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 09/15] CIFS: SMBD: Support page offset in RDMA recv

> Subject: Re: [Patch v2 09/15] CIFS: SMBD: Support page offset in RDMA recv
>
> On 5/30/2018 3:48 PM, Long Li wrote:
> > From: Long Li <[email protected]>
> >
> > RDMA recv function needs to place data to the correct place starting
> > at page offset.
> >
> > Signed-off-by: Long Li <[email protected]>
> > ---
> > fs/cifs/smbdirect.c | 18 +++++++++++-------
> > 1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index
> > 6141e3c..ba53c52 100644
> > --- a/fs/cifs/smbdirect.c
> > +++ b/fs/cifs/smbdirect.c
> > @@ -2004,10 +2004,12 @@ static int smbd_recv_buf(struct
> smbd_connection *info, char *buf,
> > * return value: actual data read
> > */
> > static int smbd_recv_page(struct smbd_connection *info,
> > - struct page *page, unsigned int to_read)
> > + struct page *page, unsigned int page_offset,
> > + unsigned int to_read)
> > {
> > int ret;
> > char *to_address;
> > + void *page_address;
> >
> > /* make sure we have the page ready for read */
> > ret = wait_event_interruptible(
> > @@ -2015,16 +2017,17 @@ static int smbd_recv_page(struct
> smbd_connection *info,
> > info->reassembly_data_length >= to_read ||
> > info->transport_status != SMBD_CONNECTED);
> > if (ret)
> > - return 0;
> > + return ret;
> >
> > /* now we can read from reassembly queue and not sleep */
> > - to_address = kmap_atomic(page);
> > + page_address = kmap_atomic(page);
> > + to_address = (char *) page_address + page_offset;
> >
> > log_read(INFO, "reading from page=%p address=%p to_read=%d\n",
> > page, to_address, to_read);
> >
> > ret = smbd_recv_buf(info, to_address, to_read);
> > - kunmap_atomic(to_address);
> > + kunmap_atomic(page_address);
>
> Is "page" truly not mapped? This kmap/kunmap for each received 4KB is very
> expensive. Is there not a way to keep a kva for the reassembly queue
> segments?

I will find a way to not map those upper layer pages when doing RDMA receive. The reason for doing this is to use a common layer (that will greatly simplifies the code) for transport. It's probably a waste of time when only pages are involved, and not need to be mapped.

I will address those in a separate patch.

>
> >
> > return ret;
> > }
> > @@ -2038,7 +2041,7 @@ int smbd_recv(struct smbd_connection *info,
> struct msghdr *msg)
> > {
> > char *buf;
> > struct page *page;
> > - unsigned int to_read;
> > + unsigned int to_read, page_offset;
> > int rc;
> >
> > info->smbd_recv_pending++;
> > @@ -2052,15 +2055,16 @@ int smbd_recv(struct smbd_connection *info,
> > struct msghdr *msg)
> >
> > case READ | ITER_BVEC:
> > page = msg->msg_iter.bvec->bv_page;
> > + page_offset = msg->msg_iter.bvec->bv_offset;
> > to_read = msg->msg_iter.bvec->bv_len;
> > - rc = smbd_recv_page(info, page, to_read);
> > + rc = smbd_recv_page(info, page, page_offset, to_read);
> > break;
> >
> > default:
> > /* It's a bug in upper layer to get there */
> > cifs_dbg(VFS, "CIFS: invalid msg type %d\n",
> > msg->msg_iter.type);
> > - rc = -EIO;
> > + rc = -EINVAL;
> > }
> >
> > info->smbd_recv_pending--;
> >

2018-06-26 04:16:49

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 11/15] CIFS: Pass page offset for calculating signature

> Subject: Re: [Patch v2 11/15] CIFS: Pass page offset for calculating signature
>
> On 5/30/2018 3:48 PM, Long Li wrote:
> > From: Long Li <[email protected]>
> >
> > When calculating signature for the packet, it needs to read into the
> > correct page offset for the data.
> >
> > Signed-off-by: Long Li <[email protected]>
> > ---
> > fs/cifs/cifsencrypt.c | 9 +++++----
> > 1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c index
> > a6ef088..e88303c 100644
> > --- a/fs/cifs/cifsencrypt.c
> > +++ b/fs/cifs/cifsencrypt.c
> > @@ -68,11 +68,12 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
> >
> > /* now hash over the rq_pages array */
> > for (i = 0; i < rqst->rq_npages; i++) {
> > - void *kaddr = kmap(rqst->rq_pages[i]);
> > - size_t len = rqst->rq_pagesz;
> > + void *kaddr;
> > + unsigned int len, offset;
> >
> > - if (i == rqst->rq_npages - 1)
> > - len = rqst->rq_tailsz;
> > + rqst_page_get_length(rqst, i, &len, &offset);
> > +
> > + kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;
>
> I suppose it's more robust to map a page at a time, but it's pretty expensive.
> Is this the only way to iterate over a potentially very large block of data? For
> example, a 1MB segment means 256 kmap/kunmaps.

I will look into not mapping those pages while doing I/O.

This code path is for RDMA send/receive, and it's rarely used for transferring large amount of data.

>
> Tom.
>
> >
> > crypto_shash_update(shash, kaddr, len);
> >
> >

2018-06-26 04:35:29

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 13/15] CIFS: Add support for direct I/O read

> Subject: Re: [Patch v2 13/15] CIFS: Add support for direct I/O read
>
>
>
> On 5/30/2018 3:48 PM, Long Li wrote:
> > From: Long Li <[email protected]>
> >
> > Implement the function for direct I/O read. It doesn't support AIO,
> > which will be implemented in a follow up patch.
> >
> > Signed-off-by: Long Li <[email protected]>
> > ---
> > fs/cifs/cifsfs.h | 1 +
> > fs/cifs/file.c | 149
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 150 insertions(+)
> >
> > diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index
> > 5f02318..7fba9aa 100644
> > --- a/fs/cifs/cifsfs.h
> > +++ b/fs/cifs/cifsfs.h
> > @@ -102,6 +102,7 @@ extern int cifs_open(struct inode *inode, struct file
> *file);
> > extern int cifs_close(struct inode *inode, struct file *file);
> > extern int cifs_closedir(struct inode *inode, struct file *file);
> > extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter
> > *to);
> > +extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter
> > +*to);
> > extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
> > extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
> > extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct
> > iov_iter *from); diff --git a/fs/cifs/file.c b/fs/cifs/file.c index
> > 87eece6..e6e6f24 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -2955,6 +2955,18 @@ cifs_read_allocate_pages(struct cifs_readdata
> *rdata, unsigned int nr_pages)
> > return rc;
> > }
> >
> > +static void cifs_direct_readdata_release(struct kref *refcount) {
> > + struct cifs_readdata *rdata = container_of(refcount,
> > + struct cifs_readdata, refcount);
> > + unsigned int i;
> > +
> > + for (i = 0; i < rdata->nr_pages; i++)
> > + put_page(rdata->pages[i]);
> > +
> > + cifs_readdata_release(refcount);
> > +}
> > +
> > static void
> > cifs_uncached_readdata_release(struct kref *refcount)
> > {
> > @@ -3267,6 +3279,143 @@ collect_uncached_read_data(struct
> cifs_aio_ctx *ctx)
> > complete(&ctx->done);
> > }
> >
> > +static void cifs_direct_readv_complete(struct work_struct *work) {
> > + struct cifs_readdata *rdata =
> > + container_of(work, struct cifs_readdata, work);
> > +
> > + complete(&rdata->done);
> > + kref_put(&rdata->refcount, cifs_direct_readdata_release); }
> > +
> > +ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to) {
> > + size_t len, cur_len, start;
> > + unsigned int npages, rsize, credits;
> > + struct file *file;
> > + struct cifs_sb_info *cifs_sb;
> > + struct cifsFileInfo *cfile;
> > + struct cifs_tcon *tcon;
> > + struct page **pagevec;
> > + ssize_t rc, total_read = 0;
> > + struct TCP_Server_Info *server;
> > + loff_t offset = iocb->ki_pos;
> > + pid_t pid;
> > + struct cifs_readdata *rdata;
> > +
> > + /*
> > + * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> > + * fall back to data copy read path
> > + */
> > + if (to->type & ITER_KVEC) {
> > + cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec
> I/O\n");
> > + return cifs_user_readv(iocb, to);
> > + }
> > +
> > + len = iov_iter_count(to);
> > + if (!len)
> > + return 0;
> > +
> > + file = iocb->ki_filp;
> > + cifs_sb = CIFS_FILE_SB(file);
> > + cfile = file->private_data;
> > + tcon = tlink_tcon(cfile->tlink);
> > + server = tcon->ses->server;
> > +
> > + if (!server->ops->async_readv)
> > + return -ENOSYS;
> > +
> > + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> > + pid = cfile->pid;
> > + else
> > + pid = current->tgid;
> > +
> > + if ((file->f_flags & O_ACCMODE) == O_WRONLY)
> > + cifs_dbg(FYI, "attempting read on write only file instance\n");
>
> Confusing. Maybe "attempting read on write-only filehandle"?
>
> > +
> > + do {
> > + rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
> > + &rsize, &credits);
> > + if (rc)
> > + break;
> > +
> > + cur_len = min_t(const size_t, len, rsize);
> > +
> > + rc = iov_iter_get_pages_alloc(to, &pagevec, cur_len, &start);
> > + if (rc < 0) {
> > + cifs_dbg(VFS,
> > + "couldn't get user pages (rc=%zd) iter
> type %d"
> > + " iov_offset %lu count %lu\n",
> > + rc, to->type, to->iov_offset, to->count);
> > + dump_stack();
> > + break;
> > + }
> > +
> > + rdata = cifs_readdata_direct_alloc(
> > + pagevec, cifs_direct_readv_complete);
> > + if (!rdata) {
> > + add_credits_and_wake_if(server, credits, 0);
> > + rc = -ENOMEM;
> > + break;
> > + }
> > +
> > + npages = (rc + start + PAGE_SIZE-1) / PAGE_SIZE;
> > + rdata->nr_pages = npages;
> > + rdata->page_offset = start;
> > + rdata->pagesz = PAGE_SIZE;
> > + rdata->tailsz = npages > 1 ?
> > + rc-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
> > + rc;
>
> This expression makes my head hurt. Surely it can be simplified, or expressed
> in a clearer way.
>
> > + cur_len = rc;
> > +
> > + rdata->cfile = cifsFileInfo_get(cfile);
> > + rdata->offset = offset;
> > + rdata->bytes = rc;
> > + rdata->pid = pid;
> > + rdata->read_into_pages = cifs_uncached_read_into_pages;
> > + rdata->copy_into_pages = cifs_uncached_copy_into_pages;
> > + rdata->credits = credits;
> > +
> > + rc = 0;
> > + if (rdata->cfile->invalidHandle)
> > + rc = cifs_reopen_file(rdata->cfile, true);
> > +
> > + if (!rc)
> > + rc = server->ops->async_readv(rdata);
> > +
> > + if (rc) {
>
> This whole rc thing is messy. Initializing to zero, setting only in one case, then
> testing the result, then setting it again, is twisted.
> I actually think a goto or two would read much more clearly.
>
> > + add_credits_and_wake_if(server, rdata->credits, 0);
> > + kref_put(&rdata->refcount,
> > + cifs_direct_readdata_release);
> > + if (rc == -EAGAIN)
> > + continue;
> > + break;
>
> It's worth a comment here that this either breaks or continues the entire do {}
> while (); and btw when it breaks it does *not* return "rc".
> Again, maybe a goto instead of a break?
>
> > + }
> > +
> > + wait_for_completion(&rdata->done);
> > + rc = rdata->result;
> > + if (rc) {
> > + kref_put(
> > + &rdata->refcount,
> > + cifs_direct_readdata_release);
> > + if (rc == -EAGAIN)
> > + continue;
> > + break;
>
> Ditto.

I will re-work this patch.

>
> > + }
> > +
> > + total_read += rdata->got_bytes;
> > + kref_put(&rdata->refcount, cifs_direct_readdata_release);
> > +
> > + iov_iter_advance(to, cur_len);
> > + len -= cur_len;
> > + offset += cur_len;
> > + } while (len);
> > +
> > + iocb->ki_pos += total_read;
> > +
> > + return total_read;
> > +}
> > +
> > ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> > {
> > struct file *file = iocb->ki_filp;
> >

2018-06-26 04:40:42

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 14/15] CIFS: Add support for direct I/O write

> Subject: Re: [Patch v2 14/15] CIFS: Add support for direct I/O write
>
> On 5/30/2018 3:48 PM, Long Li wrote:
> > From: Long Li <[email protected]>
> >
> > Implement the function for direct I/O write. It doesn't support AIO,
> > which will be implemented in a follow up patch.
> >
> > Signed-off-by: Long Li <[email protected]>
> > ---
> > fs/cifs/cifsfs.h | 1 +
> > fs/cifs/file.c | 165
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 166 insertions(+)
> >
> > diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index
> > 7fba9aa..e9c5103 100644
> > --- a/fs/cifs/cifsfs.h
> > +++ b/fs/cifs/cifsfs.h
> > @@ -105,6 +105,7 @@ extern ssize_t cifs_user_readv(struct kiocb *iocb,
> struct iov_iter *to);
> > extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
> > extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
> > extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter
> > *from);
> > +extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter
> > +*from);
> > extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
> > extern int cifs_lock(struct file *, int, struct file_lock *);
> > extern int cifs_fsync(struct file *, loff_t, loff_t, int); diff
> > --git a/fs/cifs/file.c b/fs/cifs/file.c index e6e6f24..8c385b1 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -2461,6 +2461,35 @@ cifs_uncached_writedata_release(struct kref
> > *refcount)
> >
> > static void collect_uncached_write_data(struct cifs_aio_ctx *ctx);
> >
> > +static void cifs_direct_writedata_release(struct kref *refcount) {
> > + int i;
> > + struct cifs_writedata *wdata = container_of(refcount,
> > + struct cifs_writedata, refcount);
> > +
> > + for (i = 0; i < wdata->nr_pages; i++)
> > + put_page(wdata->pages[i]);
> > +
> > + cifs_writedata_release(refcount);
> > +}
> > +
> > +static void cifs_direct_writev_complete(struct work_struct *work) {
> > + struct cifs_writedata *wdata = container_of(work,
> > + struct cifs_writedata, work);
> > + struct inode *inode = d_inode(wdata->cfile->dentry);
> > + struct cifsInodeInfo *cifsi = CIFS_I(inode);
> > +
> > + spin_lock(&inode->i_lock);
> > + cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
> > + if (cifsi->server_eof > inode->i_size)
> > + i_size_write(inode, cifsi->server_eof);
> > + spin_unlock(&inode->i_lock);
> > +
> > + complete(&wdata->done);
> > + kref_put(&wdata->refcount, cifs_direct_writedata_release); }
> > +
> > static void
> > cifs_uncached_writev_complete(struct work_struct *work)
> > {
> > @@ -2703,6 +2732,142 @@ static void collect_uncached_write_data(struct
> cifs_aio_ctx *ctx)
> > complete(&ctx->done);
> > }
> >
> > +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
> > +{
> > + struct file *file = iocb->ki_filp;
> > + ssize_t total_written = 0;
> > + struct cifsFileInfo *cfile;
> > + struct cifs_tcon *tcon;
> > + struct cifs_sb_info *cifs_sb;
> > + struct TCP_Server_Info *server;
> > + pid_t pid;
> > + unsigned long nr_pages;
> > + loff_t offset = iocb->ki_pos;
> > + size_t len = iov_iter_count(from);
> > + int rc;
> > + struct cifs_writedata *wdata;
> > +
> > + /*
> > + * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
> > + * In this case, fall back to non-direct write function.
> > + */
> > + if (from->type & ITER_KVEC) {
> > + cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec
> I/O\n");
> > + return cifs_user_writev(iocb, from);
> > + }
> > +
> > + rc = generic_write_checks(iocb, from);
> > + if (rc <= 0)
> > + return rc;
> > +
> > + cifs_sb = CIFS_FILE_SB(file);
> > + cfile = file->private_data;
> > + tcon = tlink_tcon(cfile->tlink);
> > + server = tcon->ses->server;
> > +
> > + if (!server->ops->async_writev)
> > + return -ENOSYS;
> > +
> > + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> > + pid = cfile->pid;
> > + else
> > + pid = current->tgid;
> > +
> > + do {
> > + unsigned int wsize, credits;
> > + struct page **pagevec;
> > + size_t start;
> > + ssize_t cur_len;
> > +
> > + rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
> > + &wsize, &credits);
> > + if (rc)
> > + break;
> > +
> > + cur_len = iov_iter_get_pages_alloc(
> > + from, &pagevec, wsize, &start);
> > + if (cur_len < 0) {
> > + cifs_dbg(VFS,
> > + "direct_writev couldn't get user pages "
> > + "(rc=%zd) iter type %d iov_offset %lu count"
> > + " %lu\n",
> > + cur_len, from->type,
> > + from->iov_offset, from->count);
> > + dump_stack();
> > + break;
> > + }
> > + if (cur_len < 0)
> > + break;
>
> This cur_len < 0 test is redundant with the prior if(), delete.
> > +
> > + nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
>
> Am I misreading, or will this return be one more page than needed? If start
> (the first byte offset) is > 0, nr_pages will already be one.
> And if cur_len is 4KB, even if start is 0, nr_pages will be two.

I think the calculation is correct, assuming cur_len > 0. (which should be the case if we reach here)

If cur_len is 4kb and start is 0, nr_pages will be 1.

>
> > +
> > + wdata = cifs_writedata_direct_alloc(pagevec,
> > + cifs_direct_writev_complete);
> > + if (!wdata) {
> > + rc = -ENOMEM;
> > + add_credits_and_wake_if(server, credits, 0);
> > + break;
> > + }
> > +
> > + wdata->nr_pages = nr_pages;
> > + wdata->page_offset = start;
> > + wdata->pagesz = PAGE_SIZE;
> > + wdata->tailsz =
> > + nr_pages > 1 ?
> > + cur_len - (PAGE_SIZE - start) -
> > + (nr_pages - 2) * PAGE_SIZE :
> > + cur_len;
> > +
> > + wdata->sync_mode = WB_SYNC_ALL;
> > + wdata->offset = (__u64)offset;
> > + wdata->cfile = cifsFileInfo_get(cfile);
> > + wdata->pid = pid;
> > + wdata->bytes = cur_len;
> > + wdata->credits = credits;
> > +
> > + rc = 0;
> > + if (wdata->cfile->invalidHandle)
> > + rc = cifs_reopen_file(wdata->cfile, false);
> > +
> > + if (!rc)
> > + rc = server->ops->async_writev(wdata,
> > + cifs_direct_writedata_release);
> > +
> > + if (rc) {
> > + add_credits_and_wake_if(server, wdata->credits, 0);
> > + kref_put(&wdata->refcount,
> > + cifs_direct_writedata_release);
> > + if (rc == -EAGAIN)
> > + continue;
> > + break;
> > + }
>
> Same comments as for previous patch re the if (rc) ladder, and the
> break/continues both being better expressed as careful goto's.
>
> > +
> > + wait_for_completion(&wdata->done);
> > + if (wdata->result) {
> > + rc = wdata->result;
> > + kref_put(&wdata->refcount,
> > + cifs_direct_writedata_release);
> > + if (rc == -EAGAIN)
> > + continue;
> > + break;
> > + }
> > +
> > + kref_put(&wdata->refcount, cifs_direct_writedata_release);
> > +
> > + iov_iter_advance(from, cur_len);
> > + total_written += cur_len;
> > + offset += cur_len;
> > + len -= cur_len;
> > + } while (len);
> > +
> > + if (unlikely(!total_written))
> > + return rc;
> > +
> > + iocb->ki_pos += total_written;
> > + return total_written;
> > +
> > +}
> > +
> > ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
> > {
> > struct file *file = iocb->ki_filp;
> >

2018-06-26 13:51:43

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst

On 6/25/2018 5:14 PM, Long Li wrote:
>> Subject: Re: [Patch v2 06/15] CIFS: Introduce helper function to get page
>> offset and length in smb_rqst
>>
>> On 5/30/2018 3:47 PM, Long Li wrote:
>>> From: Long Li <[email protected]>
>>>
>>> Introduce a function rqst_page_get_length to return the page offset
>>> and length for a given page in smb_rqst. This function is to be used
>>> by following patches.
>>>
>>> Signed-off-by: Long Li <[email protected]>
>>> ---
>>> fs/cifs/cifsproto.h | 3 +++
>>> fs/cifs/misc.c | 17 +++++++++++++++++
>>> 2 files changed, 20 insertions(+)
>>>
>>> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index
>>> 7933c5f..89dda14 100644
>>> --- a/fs/cifs/cifsproto.h
>>> +++ b/fs/cifs/cifsproto.h
>>> @@ -557,4 +557,7 @@ int cifs_alloc_hash(const char *name, struct
>> crypto_shash **shash,
>>> struct sdesc **sdesc);
>>> void cifs_free_hash(struct crypto_shash **shash, struct sdesc
>>> **sdesc);
>>>
>>> +extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int
>> page,
>>> + unsigned int *len, unsigned int *offset);
>>> +
>>> #endif /* _CIFSPROTO_H */
>>> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index 96849b5..e951417
>>> 100644
>>> --- a/fs/cifs/misc.c
>>> +++ b/fs/cifs/misc.c
>>> @@ -905,3 +905,20 @@ cifs_free_hash(struct crypto_shash **shash,
>> struct sdesc **sdesc)
>>> crypto_free_shash(*shash);
>>> *shash = NULL;
>>> }
>>> +
>>> +/**
>>> + * rqst_page_get_length - obtain the length and offset for a page in
>>> +smb_rqst
>>> + * Input: rqst - a smb_rqst, page - a page index for rqst
>>> + * Output: *len - the length for this page, *offset - the offset for
>>> +this page */ void rqst_page_get_length(struct smb_rqst *rqst,
>>> +unsigned int page,
>>> + unsigned int *len, unsigned int *offset) {
>>> + *len = rqst->rq_pagesz;
>>> + *offset = (page == 0) ? rqst->rq_offset : 0;
>>
>> Really? Page 0 always has a zero offset??
>
> I think you are misreading this line. The offset for page 0 is rq_offset, the offset for all subsequent pages are 0.

Ah, yes, sorry I did read it incorrectly.
>>> +
>>> + if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
>>> + *len = rqst->rq_tailsz;
>>> + else if (page == 0)
>>> + *len = rqst->rq_pagesz - rqst->rq_offset; }
>>>
>>
>> This subroutine does what patch 5 does inline. Why not push this patch up in
>> the sequence and use the helper?
>
> This is actually a little different. This function returns the length and offset for a given page in the request. There might be multiple pages in a request.

Ok, but I still think there is quite a bit of inline computation of
this stuff that would more clearly and more robustly be done in a set
of common functions. If someone ever touches the code to support a
new upper layer, or even integrate with more complex compounding,
things will get ugly fast.

> The other function calculates the total length of all the pages in a request.

Again, a great candidate for a common set of subroutines, IMO.

Tom.

2018-06-26 14:02:02

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 14/15] CIFS: Add support for direct I/O write

On 6/26/2018 12:39 AM, Long Li wrote:
>> Subject: Re: [Patch v2 14/15] CIFS: Add support for direct I/O write
>>
>> On 5/30/2018 3:48 PM, Long Li wrote:
>>> From: Long Li <[email protected]>
>>>
>>> Implement the function for direct I/O write. It doesn't support AIO,
>>> which will be implemented in a follow up patch.
>>>
>>> Signed-off-by: Long Li <[email protected]>
>>> ---
>>> fs/cifs/cifsfs.h | 1 +
>>> fs/cifs/file.c | 165
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 166 insertions(+)
>>>
>>> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index
>>> 7fba9aa..e9c5103 100644
>>> --- a/fs/cifs/cifsfs.h
>>> +++ b/fs/cifs/cifsfs.h
>>> @@ -105,6 +105,7 @@ extern ssize_t cifs_user_readv(struct kiocb *iocb,
>> struct iov_iter *to);
>>> extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
>>> extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
>>> extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter
>>> *from);
>>> +extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter
>>> +*from);
>>> extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
>>> extern int cifs_lock(struct file *, int, struct file_lock *);
>>> extern int cifs_fsync(struct file *, loff_t, loff_t, int); diff
>>> --git a/fs/cifs/file.c b/fs/cifs/file.c index e6e6f24..8c385b1 100644
>>> --- a/fs/cifs/file.c
>>> +++ b/fs/cifs/file.c
>>> @@ -2461,6 +2461,35 @@ cifs_uncached_writedata_release(struct kref
>>> *refcount)
>>>
>>> static void collect_uncached_write_data(struct cifs_aio_ctx *ctx);
>>>
>>> +static void cifs_direct_writedata_release(struct kref *refcount) {
>>> + int i;
>>> + struct cifs_writedata *wdata = container_of(refcount,
>>> + struct cifs_writedata, refcount);
>>> +
>>> + for (i = 0; i < wdata->nr_pages; i++)
>>> + put_page(wdata->pages[i]);
>>> +
>>> + cifs_writedata_release(refcount);
>>> +}
>>> +
>>> +static void cifs_direct_writev_complete(struct work_struct *work) {
>>> + struct cifs_writedata *wdata = container_of(work,
>>> + struct cifs_writedata, work);
>>> + struct inode *inode = d_inode(wdata->cfile->dentry);
>>> + struct cifsInodeInfo *cifsi = CIFS_I(inode);
>>> +
>>> + spin_lock(&inode->i_lock);
>>> + cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
>>> + if (cifsi->server_eof > inode->i_size)
>>> + i_size_write(inode, cifsi->server_eof);
>>> + spin_unlock(&inode->i_lock);
>>> +
>>> + complete(&wdata->done);
>>> + kref_put(&wdata->refcount, cifs_direct_writedata_release); }
>>> +
>>> static void
>>> cifs_uncached_writev_complete(struct work_struct *work)
>>> {
>>> @@ -2703,6 +2732,142 @@ static void collect_uncached_write_data(struct
>> cifs_aio_ctx *ctx)
>>> complete(&ctx->done);
>>> }
>>>
>>> +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
>>> +{
>>> + struct file *file = iocb->ki_filp;
>>> + ssize_t total_written = 0;
>>> + struct cifsFileInfo *cfile;
>>> + struct cifs_tcon *tcon;
>>> + struct cifs_sb_info *cifs_sb;
>>> + struct TCP_Server_Info *server;
>>> + pid_t pid;
>>> + unsigned long nr_pages;
>>> + loff_t offset = iocb->ki_pos;
>>> + size_t len = iov_iter_count(from);
>>> + int rc;
>>> + struct cifs_writedata *wdata;
>>> +
>>> + /*
>>> + * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
>>> + * In this case, fall back to non-direct write function.
>>> + */
>>> + if (from->type & ITER_KVEC) {
>>> + cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec
>> I/O\n");
>>> + return cifs_user_writev(iocb, from);
>>> + }
>>> +
>>> + rc = generic_write_checks(iocb, from);
>>> + if (rc <= 0)
>>> + return rc;
>>> +
>>> + cifs_sb = CIFS_FILE_SB(file);
>>> + cfile = file->private_data;
>>> + tcon = tlink_tcon(cfile->tlink);
>>> + server = tcon->ses->server;
>>> +
>>> + if (!server->ops->async_writev)
>>> + return -ENOSYS;
>>> +
>>> + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
>>> + pid = cfile->pid;
>>> + else
>>> + pid = current->tgid;
>>> +
>>> + do {
>>> + unsigned int wsize, credits;
>>> + struct page **pagevec;
>>> + size_t start;
>>> + ssize_t cur_len;
>>> +
>>> + rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
>>> + &wsize, &credits);
>>> + if (rc)
>>> + break;
>>> +
>>> + cur_len = iov_iter_get_pages_alloc(
>>> + from, &pagevec, wsize, &start);
>>> + if (cur_len < 0) {
>>> + cifs_dbg(VFS,
>>> + "direct_writev couldn't get user pages "
>>> + "(rc=%zd) iter type %d iov_offset %lu count"
>>> + " %lu\n",
>>> + cur_len, from->type,
>>> + from->iov_offset, from->count);
>>> + dump_stack();
>>> + break;
>>> + }
>>> + if (cur_len < 0)
>>> + break;
>>
>> This cur_len < 0 test is redundant with the prior if(), delete.
>>> +
>>> + nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
>>
>> Am I misreading, or will this return be one more page than needed? If start
>> (the first byte offset) is > 0, nr_pages will already be one.
>> And if cur_len is 4KB, even if start is 0, nr_pages will be two.
>
> I think the calculation is correct, assuming cur_len > 0. (which should be the case if we reach here)

Err, cur_len could possibly be zero, the prior line only breaks the
loop if cur_len < 0.

> If cur_len is 4kb and start is 0, nr_pages will be 1.

Hmm, I guess. But again, it seems as if this page handling is all
being coded inline, in subtly different ways. I'm just concerned
about clarity and robustness. To me, if it's hard to review, it's
even harder to maintain.

Tom.

>
>>
>>> +
>>> + wdata = cifs_writedata_direct_alloc(pagevec,
>>> + cifs_direct_writev_complete);
>>> + if (!wdata) {
>>> + rc = -ENOMEM;
>>> + add_credits_and_wake_if(server, credits, 0);
>>> + break;
>>> + }
>>> +
>>> + wdata->nr_pages = nr_pages;
>>> + wdata->page_offset = start;
>>> + wdata->pagesz = PAGE_SIZE;
>>> + wdata->tailsz =
>>> + nr_pages > 1 ?
>>> + cur_len - (PAGE_SIZE - start) -
>>> + (nr_pages - 2) * PAGE_SIZE :
>>> + cur_len;
>>> +
>>> + wdata->sync_mode = WB_SYNC_ALL;
>>> + wdata->offset = (__u64)offset;
>>> + wdata->cfile = cifsFileInfo_get(cfile);
>>> + wdata->pid = pid;
>>> + wdata->bytes = cur_len;
>>> + wdata->credits = credits;
>>> +
>>> + rc = 0;
>>> + if (wdata->cfile->invalidHandle)
>>> + rc = cifs_reopen_file(wdata->cfile, false);
>>> +
>>> + if (!rc)
>>> + rc = server->ops->async_writev(wdata,
>>> + cifs_direct_writedata_release);
>>> +
>>> + if (rc) {
>>> + add_credits_and_wake_if(server, wdata->credits, 0);
>>> + kref_put(&wdata->refcount,
>>> + cifs_direct_writedata_release);
>>> + if (rc == -EAGAIN)
>>> + continue;
>>> + break;
>>> + }
>>
>> Same comments as for previous patch re the if (rc) ladder, and the
>> break/continues both being better expressed as careful goto's.
>>
>>> +
>>> + wait_for_completion(&wdata->done);
>>> + if (wdata->result) {
>>> + rc = wdata->result;
>>> + kref_put(&wdata->refcount,
>>> + cifs_direct_writedata_release);
>>> + if (rc == -EAGAIN)
>>> + continue;
>>> + break;
>>> + }
>>> +
>>> + kref_put(&wdata->refcount, cifs_direct_writedata_release);
>>> +
>>> + iov_iter_advance(from, cur_len);
>>> + total_written += cur_len;
>>> + offset += cur_len;
>>> + len -= cur_len;
>>> + } while (len);
>>> +
>>> + if (unlikely(!total_written))
>>> + return rc;
>>> +
>>> + iocb->ki_pos += total_written;
>>> + return total_written;
>>> +
>>> +}
>>> +
>>> ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
>>> {
>>> struct file *file = iocb->ki_filp;
>>>


2018-06-26 15:22:19

by Tom Talpey

[permalink] [raw]
Subject: Re: [Patch v2 02/15] CIFS: Add support for direct pages in rdata

On 6/25/2018 5:01 PM, Jason Gunthorpe wrote:
> On Sat, Jun 23, 2018 at 09:50:20PM -0400, Tom Talpey wrote:
>> On 5/30/2018 3:47 PM, Long Li wrote:
>>> From: Long Li <[email protected]>
>>>
>>> Add a function to allocate rdata without allocating pages for data
>>> transfer. This gives the caller an option to pass a number of pages
>>> that point to the data buffer.
>>>
>>> rdata is still reponsible for free those pages after it's done.
>>
>> "Caller" is still responsible? Or is the rdata somehow freeing itself
>> via another mechanism?
>>
>>>
>>> Signed-off-by: Long Li <[email protected]>
>>> fs/cifs/cifsglob.h | 2 +-
>>> fs/cifs/file.c | 23 ++++++++++++++++++++---
>>> 2 files changed, 21 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
>>> index 8d16c3e..56864a87 100644
>>> +++ b/fs/cifs/cifsglob.h
>>> @@ -1179,7 +1179,7 @@ struct cifs_readdata {
>>> unsigned int tailsz;
>>> unsigned int credits;
>>> unsigned int nr_pages;
>>> - struct page *pages[];
>>> + struct page **pages;
>>
>> Technically speaking, these are syntactically equivalent. It may not
>> be worth changing this historic definition.
>
> [] is a C99 'flex array', it has a different allocation behavior than
> ** and is not interchangeable..

In that case, it's an even better reason to not change the declaration.

Tom.


2018-06-27 04:56:07

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 02/15] CIFS: Add support for direct pages in rdata

> Subject: Re: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
>
> On 6/25/2018 5:01 PM, Jason Gunthorpe wrote:
> > On Sat, Jun 23, 2018 at 09:50:20PM -0400, Tom Talpey wrote:
> >> On 5/30/2018 3:47 PM, Long Li wrote:
> >>> From: Long Li <[email protected]>
> >>>
> >>> Add a function to allocate rdata without allocating pages for data
> >>> transfer. This gives the caller an option to pass a number of pages
> >>> that point to the data buffer.
> >>>
> >>> rdata is still reponsible for free those pages after it's done.
> >>
> >> "Caller" is still responsible? Or is the rdata somehow freeing itself
> >> via another mechanism?
> >>
> >>>
> >>> Signed-off-by: Long Li <[email protected]>
> >>> fs/cifs/cifsglob.h | 2 +-
> >>> fs/cifs/file.c | 23 ++++++++++++++++++++---
> >>> 2 files changed, 21 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index
> >>> 8d16c3e..56864a87 100644
> >>> +++ b/fs/cifs/cifsglob.h
> >>> @@ -1179,7 +1179,7 @@ struct cifs_readdata {
> >>> unsigned int tailsz;
> >>> unsigned int credits;
> >>> unsigned int nr_pages;
> >>> - struct page *pages[];
> >>> + struct page **pages;
> >>
> >> Technically speaking, these are syntactically equivalent. It may not
> >> be worth changing this historic definition.
> >
> > [] is a C99 'flex array', it has a different allocation behavior than
> > ** and is not interchangeable..
>
> In that case, it's an even better reason to not change the declaration.

No, it needs to be declared separately.

With Direct I/O, **pages are allocated and returned from iov_iter_get_pages_alloc() when locking those user pages. They can't be allocated as part of struct cifs_readdata.

2018-06-27 04:56:43

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst

> Subject: Re: [Patch v2 06/15] CIFS: Introduce helper function to get page
> offset and length in smb_rqst
>
> On 6/25/2018 5:14 PM, Long Li wrote:
> >> Subject: Re: [Patch v2 06/15] CIFS: Introduce helper function to get
> >> page offset and length in smb_rqst
> >>
> >> On 5/30/2018 3:47 PM, Long Li wrote:
> >>> From: Long Li <[email protected]>
> >>>
> >>> Introduce a function rqst_page_get_length to return the page offset
> >>> and length for a given page in smb_rqst. This function is to be used
> >>> by following patches.
> >>>
> >>> Signed-off-by: Long Li <[email protected]>
> >>> ---
> >>> fs/cifs/cifsproto.h | 3 +++
> >>> fs/cifs/misc.c | 17 +++++++++++++++++
> >>> 2 files changed, 20 insertions(+)
> >>>
> >>> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index
> >>> 7933c5f..89dda14 100644
> >>> --- a/fs/cifs/cifsproto.h
> >>> +++ b/fs/cifs/cifsproto.h
> >>> @@ -557,4 +557,7 @@ int cifs_alloc_hash(const char *name, struct
> >> crypto_shash **shash,
> >>> struct sdesc **sdesc);
> >>> void cifs_free_hash(struct crypto_shash **shash, struct sdesc
> >>> **sdesc);
> >>>
> >>> +extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned
> >>> +int
> >> page,
> >>> + unsigned int *len, unsigned int *offset);
> >>> +
> >>> #endif /* _CIFSPROTO_H */
> >>> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index 96849b5..e951417
> >>> 100644
> >>> --- a/fs/cifs/misc.c
> >>> +++ b/fs/cifs/misc.c
> >>> @@ -905,3 +905,20 @@ cifs_free_hash(struct crypto_shash **shash,
> >> struct sdesc **sdesc)
> >>> crypto_free_shash(*shash);
> >>> *shash = NULL;
> >>> }
> >>> +
> >>> +/**
> >>> + * rqst_page_get_length - obtain the length and offset for a page
> >>> +in smb_rqst
> >>> + * Input: rqst - a smb_rqst, page - a page index for rqst
> >>> + * Output: *len - the length for this page, *offset - the offset
> >>> +for this page */ void rqst_page_get_length(struct smb_rqst *rqst,
> >>> +unsigned int page,
> >>> + unsigned int *len, unsigned int *offset) {
> >>> + *len = rqst->rq_pagesz;
> >>> + *offset = (page == 0) ? rqst->rq_offset : 0;
> >>
> >> Really? Page 0 always has a zero offset??
> >
> > I think you are misreading this line. The offset for page 0 is rq_offset, the
> offset for all subsequent pages are 0.
>
> Ah, yes, sorry I did read it incorrectly.
> >>> +
> >>> + if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
> >>> + *len = rqst->rq_tailsz;
> >>> + else if (page == 0)
> >>> + *len = rqst->rq_pagesz - rqst->rq_offset; }
> >>>
> >>
> >> This subroutine does what patch 5 does inline. Why not push this
> >> patch up in the sequence and use the helper?
> >
> > This is actually a little different. This function returns the length and offset
> for a given page in the request. There might be multiple pages in a request.
>
> Ok, but I still think there is quite a bit of inline computation of this stuff that
> would more clearly and more robustly be done in a set of common functions.
> If someone ever touches the code to support a new upper layer, or even
> integrate with more complex compounding, things will get ugly fast.
>
> > The other function calculates the total length of all the pages in a request.
>
> Again, a great candidate for a common set of subroutines, IMO.

I will look into this. The reason I didn't make a common function is that this is the only patch that will use it in this way.

2018-06-27 04:58:57

by Long Li

[permalink] [raw]
Subject: RE: [Patch v2 14/15] CIFS: Add support for direct I/O write

> Subject: Re: [Patch v2 14/15] CIFS: Add support for direct I/O write
>
> On 6/26/2018 12:39 AM, Long Li wrote:
> >> Subject: Re: [Patch v2 14/15] CIFS: Add support for direct I/O write
> >>
> >> On 5/30/2018 3:48 PM, Long Li wrote:
> >>> From: Long Li <[email protected]>
> >>>
> >>> Implement the function for direct I/O write. It doesn't support AIO,
> >>> which will be implemented in a follow up patch.
> >>>
> >>> Signed-off-by: Long Li <[email protected]>
> >>> ---
> >>> fs/cifs/cifsfs.h | 1 +
> >>> fs/cifs/file.c | 165
> >> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> 2 files changed, 166 insertions(+)
> >>>
> >>> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index
> >>> 7fba9aa..e9c5103 100644
> >>> --- a/fs/cifs/cifsfs.h
> >>> +++ b/fs/cifs/cifsfs.h
> >>> @@ -105,6 +105,7 @@ extern ssize_t cifs_user_readv(struct kiocb
> >>> *iocb,
> >> struct iov_iter *to);
> >>> extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
> >>> extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
> >>> extern ssize_t cifs_user_writev(struct kiocb *iocb, struct
> >>> iov_iter *from);
> >>> +extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct
> >>> +iov_iter *from);
> >>> extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter
> *from);
> >>> extern int cifs_lock(struct file *, int, struct file_lock *);
> >>> extern int cifs_fsync(struct file *, loff_t, loff_t, int); diff
> >>> --git a/fs/cifs/file.c b/fs/cifs/file.c index e6e6f24..8c385b1
> >>> 100644
> >>> --- a/fs/cifs/file.c
> >>> +++ b/fs/cifs/file.c
> >>> @@ -2461,6 +2461,35 @@ cifs_uncached_writedata_release(struct kref
> >>> *refcount)
> >>>
> >>> static void collect_uncached_write_data(struct cifs_aio_ctx
> >>> *ctx);
> >>>
> >>> +static void cifs_direct_writedata_release(struct kref *refcount) {
> >>> + int i;
> >>> + struct cifs_writedata *wdata = container_of(refcount,
> >>> + struct cifs_writedata, refcount);
> >>> +
> >>> + for (i = 0; i < wdata->nr_pages; i++)
> >>> + put_page(wdata->pages[i]);
> >>> +
> >>> + cifs_writedata_release(refcount);
> >>> +}
> >>> +
> >>> +static void cifs_direct_writev_complete(struct work_struct *work) {
> >>> + struct cifs_writedata *wdata = container_of(work,
> >>> + struct cifs_writedata, work);
> >>> + struct inode *inode = d_inode(wdata->cfile->dentry);
> >>> + struct cifsInodeInfo *cifsi = CIFS_I(inode);
> >>> +
> >>> + spin_lock(&inode->i_lock);
> >>> + cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
> >>> + if (cifsi->server_eof > inode->i_size)
> >>> + i_size_write(inode, cifsi->server_eof);
> >>> + spin_unlock(&inode->i_lock);
> >>> +
> >>> + complete(&wdata->done);
> >>> + kref_put(&wdata->refcount, cifs_direct_writedata_release); }
> >>> +
> >>> static void
> >>> cifs_uncached_writev_complete(struct work_struct *work)
> >>> {
> >>> @@ -2703,6 +2732,142 @@ static void
> >>> collect_uncached_write_data(struct
> >> cifs_aio_ctx *ctx)
> >>> complete(&ctx->done);
> >>> }
> >>>
> >>> +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter
> >>> +*from) {
> >>> + struct file *file = iocb->ki_filp;
> >>> + ssize_t total_written = 0;
> >>> + struct cifsFileInfo *cfile;
> >>> + struct cifs_tcon *tcon;
> >>> + struct cifs_sb_info *cifs_sb;
> >>> + struct TCP_Server_Info *server;
> >>> + pid_t pid;
> >>> + unsigned long nr_pages;
> >>> + loff_t offset = iocb->ki_pos;
> >>> + size_t len = iov_iter_count(from);
> >>> + int rc;
> >>> + struct cifs_writedata *wdata;
> >>> +
> >>> + /*
> >>> + * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
> >>> + * In this case, fall back to non-direct write function.
> >>> + */
> >>> + if (from->type & ITER_KVEC) {
> >>> + cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec
> >> I/O\n");
> >>> + return cifs_user_writev(iocb, from);
> >>> + }
> >>> +
> >>> + rc = generic_write_checks(iocb, from);
> >>> + if (rc <= 0)
> >>> + return rc;
> >>> +
> >>> + cifs_sb = CIFS_FILE_SB(file);
> >>> + cfile = file->private_data;
> >>> + tcon = tlink_tcon(cfile->tlink);
> >>> + server = tcon->ses->server;
> >>> +
> >>> + if (!server->ops->async_writev)
> >>> + return -ENOSYS;
> >>> +
> >>> + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> >>> + pid = cfile->pid;
> >>> + else
> >>> + pid = current->tgid;
> >>> +
> >>> + do {
> >>> + unsigned int wsize, credits;
> >>> + struct page **pagevec;
> >>> + size_t start;
> >>> + ssize_t cur_len;
> >>> +
> >>> + rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
> >>> + &wsize, &credits);
> >>> + if (rc)
> >>> + break;
> >>> +
> >>> + cur_len = iov_iter_get_pages_alloc(
> >>> + from, &pagevec, wsize, &start);
> >>> + if (cur_len < 0) {
> >>> + cifs_dbg(VFS,
> >>> + "direct_writev couldn't get user pages "
> >>> + "(rc=%zd) iter type %d iov_offset %lu count"
> >>> + " %lu\n",
> >>> + cur_len, from->type,
> >>> + from->iov_offset, from->count);
> >>> + dump_stack();
> >>> + break;
> >>> + }
> >>> + if (cur_len < 0)
> >>> + break;
> >>
> >> This cur_len < 0 test is redundant with the prior if(), delete.
> >>> +
> >>> + nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
> >>
> >> Am I misreading, or will this return be one more page than needed? If
> >> start (the first byte offset) is > 0, nr_pages will already be one.
> >> And if cur_len is 4KB, even if start is 0, nr_pages will be two.
> >
> > I think the calculation is correct, assuming cur_len > 0. (which
> > should be the case if we reach here)
>
> Err, cur_len could possibly be zero, the prior line only breaks the loop if
> cur_len < 0.

I will look into this. It's a good idea to return error on 0 since we have no way to proceed.

>
> > If cur_len is 4kb and start is 0, nr_pages will be 1.
>
> Hmm, I guess. But again, it seems as if this page handling is all being coded
> inline, in subtly different ways. I'm just concerned about clarity and
> robustness. To me, if it's hard to review, it's even harder to maintain.

I will address this in an updated patch.

>
> Tom.
>
> >
> >>
> >>> +
> >>> + wdata = cifs_writedata_direct_alloc(pagevec,
> >>> + cifs_direct_writev_complete);
> >>> + if (!wdata) {
> >>> + rc = -ENOMEM;
> >>> + add_credits_and_wake_if(server, credits, 0);
> >>> + break;
> >>> + }
> >>> +
> >>> + wdata->nr_pages = nr_pages;
> >>> + wdata->page_offset = start;
> >>> + wdata->pagesz = PAGE_SIZE;
> >>> + wdata->tailsz =
> >>> + nr_pages > 1 ?
> >>> + cur_len - (PAGE_SIZE - start) -
> >>> + (nr_pages - 2) * PAGE_SIZE :
> >>> + cur_len;
> >>> +
> >>> + wdata->sync_mode = WB_SYNC_ALL;
> >>> + wdata->offset = (__u64)offset;
> >>> + wdata->cfile = cifsFileInfo_get(cfile);
> >>> + wdata->pid = pid;
> >>> + wdata->bytes = cur_len;
> >>> + wdata->credits = credits;
> >>> +
> >>> + rc = 0;
> >>> + if (wdata->cfile->invalidHandle)
> >>> + rc = cifs_reopen_file(wdata->cfile, false);
> >>> +
> >>> + if (!rc)
> >>> + rc = server->ops->async_writev(wdata,
> >>> + cifs_direct_writedata_release);
> >>> +
> >>> + if (rc) {
> >>> + add_credits_and_wake_if(server, wdata->credits, 0);
> >>> + kref_put(&wdata->refcount,
> >>> + cifs_direct_writedata_release);
> >>> + if (rc == -EAGAIN)
> >>> + continue;
> >>> + break;
> >>> + }
> >>
> >> Same comments as for previous patch re the if (rc) ladder, and the
> >> break/continues both being better expressed as careful goto's.
> >>
> >>> +
> >>> + wait_for_completion(&wdata->done);
> >>> + if (wdata->result) {
> >>> + rc = wdata->result;
> >>> + kref_put(&wdata->refcount,
> >>> + cifs_direct_writedata_release);
> >>> + if (rc == -EAGAIN)
> >>> + continue;
> >>> + break;
> >>> + }
> >>> +
> >>> + kref_put(&wdata->refcount, cifs_direct_writedata_release);
> >>> +
> >>> + iov_iter_advance(from, cur_len);
> >>> + total_written += cur_len;
> >>> + offset += cur_len;
> >>> + len -= cur_len;
> >>> + } while (len);
> >>> +
> >>> + if (unlikely(!total_written))
> >>> + return rc;
> >>> +
> >>> + iocb->ki_pos += total_written;
> >>> + return total_written;
> >>> +
> >>> +}
> >>> +
> >>> ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
> >>> {
> >>> struct file *file = iocb->ki_filp;
> >>>