2018-09-20 21:20:25

by Long Li

Subject: [PATCH V3 (resend) 1/7] CIFS: pass page offsets on SMB1 read/write

From: Long Li <[email protected]>

When issuing SMB1 read/write, pass the page offset to the transport layer.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifssmb.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 41329f4..f82fd34 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -1607,6 +1607,7 @@ cifs_readv_callback(struct mid_q_entry *mid)
struct smb_rqst rqst = { .rq_iov = rdata->iov,
.rq_nvec = 2,
.rq_pages = rdata->pages,
+ .rq_offset = rdata->page_offset,
.rq_npages = rdata->nr_pages,
.rq_pagesz = rdata->pagesz,
.rq_tailsz = rdata->tailsz };
@@ -2210,6 +2211,7 @@ cifs_async_writev(struct cifs_writedata *wdata,
rqst.rq_iov = iov;
rqst.rq_nvec = 2;
rqst.rq_pages = wdata->pages;
+ rqst.rq_offset = wdata->page_offset;
rqst.rq_npages = wdata->nr_pages;
rqst.rq_pagesz = wdata->pagesz;
rqst.rq_tailsz = wdata->tailsz;
--
2.7.4



2018-09-20 21:20:33

by Long Li

Subject: [PATCH V3 (resend) 2/7] CIFS: SMBD: Do not call ib_dereg_mr on invalidated memory registration

From: Long Li <[email protected]>

It is not necessary to deregister a memory registration after it has been
successfully invalidated.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/smbdirect.c | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 5fdb9a5..5e28236 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2295,8 +2295,12 @@ static void smbd_mr_recovery_work(struct work_struct *work)
int rc;

list_for_each_entry(smbdirect_mr, &info->mr_list, list) {
- if (smbdirect_mr->state == MR_INVALIDATED ||
- smbdirect_mr->state == MR_ERROR) {
+ if (smbdirect_mr->state == MR_INVALIDATED)
+ ib_dma_unmap_sg(
+ info->id->device, smbdirect_mr->sgl,
+ smbdirect_mr->sgl_count,
+ smbdirect_mr->dir);
+ else if (smbdirect_mr->state == MR_ERROR) {

/* recover this MR entry */
rc = ib_dereg_mr(smbdirect_mr->mr);
@@ -2320,25 +2324,21 @@ static void smbd_mr_recovery_work(struct work_struct *work)
smbd_disconnect_rdma_connection(info);
continue;
}
+ } else
+ /* This MR is being used, don't recover it */
+ continue;

- if (smbdirect_mr->state == MR_INVALIDATED)
- ib_dma_unmap_sg(
- info->id->device, smbdirect_mr->sgl,
- smbdirect_mr->sgl_count,
- smbdirect_mr->dir);
-
- smbdirect_mr->state = MR_READY;
+ smbdirect_mr->state = MR_READY;

- /* smbdirect_mr->state is updated by this function
- * and is read and updated by I/O issuing CPUs trying
- * to get a MR, the call to atomic_inc_return
- * implicates a memory barrier and guarantees this
- * value is updated before waking up any calls to
- * get_mr() from the I/O issuing CPUs
- */
- if (atomic_inc_return(&info->mr_ready_count) == 1)
- wake_up_interruptible(&info->wait_mr);
- }
+ /* smbdirect_mr->state is updated by this function
+ * and is read and updated by I/O issuing CPUs trying
+ * to get a MR, the call to atomic_inc_return
+ * implicates a memory barrier and guarantees this
+ * value is updated before waking up any calls to
+ * get_mr() from the I/O issuing CPUs
+ */
+ if (atomic_inc_return(&info->mr_ready_count) == 1)
+ wake_up_interruptible(&info->wait_mr);
}
}

--
2.7.4


2018-09-20 21:20:35

by Long Li

Subject: [PATCH V3 (resend) 5/7] CIFS: Add direct I/O functions to file_operations

From: Long Li <[email protected]>

With direct read/write functions implemented, add them to file_operations.

Direct I/O is used under two conditions:
1. When mounting with "cache=none", CIFS uses direct I/O for all user file
data transfer.
2. When opening a file with O_DIRECT, CIFS uses direct I/O for all data
transfer on this file.

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifsfs.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 7065426..3ba44f1 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1159,9 +1159,8 @@ const struct file_operations cifs_file_strict_ops = {
};

const struct file_operations cifs_file_direct_ops = {
- /* BB reevaluate whether they can be done with directio, no cache */
- .read_iter = cifs_user_readv,
- .write_iter = cifs_user_writev,
+ .read_iter = cifs_direct_readv,
+ .write_iter = cifs_direct_writev,
.open = cifs_open,
.release = cifs_close,
.lock = cifs_lock,
@@ -1215,9 +1214,8 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
};

const struct file_operations cifs_file_direct_nobrl_ops = {
- /* BB reevaluate whether they can be done with directio, no cache */
- .read_iter = cifs_user_readv,
- .write_iter = cifs_user_writev,
+ .read_iter = cifs_direct_readv,
+ .write_iter = cifs_direct_writev,
.open = cifs_open,
.release = cifs_close,
.fsync = cifs_fsync,
--
2.7.4


2018-09-20 21:21:05

by Long Li

Subject: [PATCH V3 (resend) 4/7] CIFS: Add support for direct I/O write

From: Long Li <[email protected]>

With direct I/O write, user-supplied buffers are pinned in memory and data
is transferred directly from the user buffers to the transport layer.

Change in v3: add support for kernel AIO

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifsfs.h | 1 +
fs/cifs/file.c | 196 ++++++++++++++++++++++++++++++++++++++++++++++---------
2 files changed, 166 insertions(+), 31 deletions(-)

diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index ed5479c..cc54051 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -104,6 +104,7 @@ extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
+extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
extern int cifs_lock(struct file *, int, struct file_lock *);
extern int cifs_fsync(struct file *, loff_t, loff_t, int);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 6a939fa..2a5d209 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2537,6 +2537,8 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
loff_t saved_offset = offset;
pid_t pid;
struct TCP_Server_Info *server;
+ struct page **pagevec;
+ size_t start;

if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
pid = open_file->pid;
@@ -2553,38 +2555,74 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
if (rc)
break;

- nr_pages = get_numpages(wsize, len, &cur_len);
- wdata = cifs_writedata_alloc(nr_pages,
+ if (ctx->direct_io) {
+ cur_len = iov_iter_get_pages_alloc(
+ from, &pagevec, wsize, &start);
+ if (cur_len < 0) {
+ cifs_dbg(VFS,
+ "direct_writev couldn't get user pages "
+ "(rc=%zd) iter type %d iov_offset %zd count"
+ " %zd\n",
+ cur_len, from->type,
+ from->iov_offset, from->count);
+ dump_stack();
+ break;
+ }
+ iov_iter_advance(from, cur_len);
+
+ nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
+
+ wdata = cifs_writedata_direct_alloc(pagevec,
cifs_uncached_writev_complete);
- if (!wdata) {
- rc = -ENOMEM;
- add_credits_and_wake_if(server, credits, 0);
- break;
- }
+ if (!wdata) {
+ rc = -ENOMEM;
+ add_credits_and_wake_if(server, credits, 0);
+ break;
+ }

- rc = cifs_write_allocate_pages(wdata->pages, nr_pages);
- if (rc) {
- kfree(wdata);
- add_credits_and_wake_if(server, credits, 0);
- break;
- }

- num_pages = nr_pages;
- rc = wdata_fill_from_iovec(wdata, from, &cur_len, &num_pages);
- if (rc) {
- for (i = 0; i < nr_pages; i++)
- put_page(wdata->pages[i]);
- kfree(wdata);
- add_credits_and_wake_if(server, credits, 0);
- break;
- }
+ wdata->page_offset = start;
+ wdata->tailsz =
+ nr_pages > 1 ?
+ cur_len - (PAGE_SIZE - start) -
+ (nr_pages - 2) * PAGE_SIZE :
+ cur_len;
+ } else {
+ nr_pages = get_numpages(wsize, len, &cur_len);
+ wdata = cifs_writedata_alloc(nr_pages,
+ cifs_uncached_writev_complete);
+ if (!wdata) {
+ rc = -ENOMEM;
+ add_credits_and_wake_if(server, credits, 0);
+ break;
+ }

- /*
- * Bring nr_pages down to the number of pages we actually used,
- * and free any pages that we didn't use.
- */
- for ( ; nr_pages > num_pages; nr_pages--)
- put_page(wdata->pages[nr_pages - 1]);
+ rc = cifs_write_allocate_pages(wdata->pages, nr_pages);
+ if (rc) {
+ kfree(wdata);
+ add_credits_and_wake_if(server, credits, 0);
+ break;
+ }
+
+ num_pages = nr_pages;
+ rc = wdata_fill_from_iovec(wdata, from, &cur_len, &num_pages);
+ if (rc) {
+ for (i = 0; i < nr_pages; i++)
+ put_page(wdata->pages[i]);
+ kfree(wdata);
+ add_credits_and_wake_if(server, credits, 0);
+ break;
+ }
+
+ /*
+ * Bring nr_pages down to the number of pages we actually used,
+ * and free any pages that we didn't use.
+ */
+ for ( ; nr_pages > num_pages; nr_pages--)
+ put_page(wdata->pages[nr_pages - 1]);
+
+ wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE);
+ }

wdata->sync_mode = WB_SYNC_ALL;
wdata->nr_pages = nr_pages;
@@ -2593,7 +2631,6 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
wdata->pid = pid;
wdata->bytes = cur_len;
wdata->pagesz = PAGE_SIZE;
- wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE);
wdata->credits = credits;
wdata->ctx = ctx;
kref_get(&ctx->refcount);
@@ -2687,8 +2724,9 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx)
kref_put(&wdata->refcount, cifs_uncached_writedata_release);
}

- for (i = 0; i < ctx->npages; i++)
- put_page(ctx->bv[i].bv_page);
+ if (!ctx->direct_io)
+ for (i = 0; i < ctx->npages; i++)
+ put_page(ctx->bv[i].bv_page);

cifs_stats_bytes_written(tcon, ctx->total_len);
set_bit(CIFS_INO_INVALID_MAPPING, &CIFS_I(dentry->d_inode)->flags);
@@ -2703,6 +2741,102 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx)
complete(&ctx->done);
}

+ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
+{
+ struct file *file = iocb->ki_filp;
+ ssize_t total_written = 0;
+ struct cifsFileInfo *cfile;
+ struct cifs_tcon *tcon;
+ struct cifs_sb_info *cifs_sb;
+ struct TCP_Server_Info *server;
+ size_t len = iov_iter_count(from);
+ int rc;
+ struct cifs_aio_ctx *ctx;
+
+ /*
+ * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
+ * In this case, fall back to non-direct write function.
+ * this could be improved by getting pages directly in ITER_KVEC
+ */
+ if (from->type & ITER_KVEC) {
+ cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec I/O\n");
+ return cifs_user_writev(iocb, from);
+ }
+
+ rc = generic_write_checks(iocb, from);
+ if (rc <= 0)
+ return rc;
+
+ cifs_sb = CIFS_FILE_SB(file);
+ cfile = file->private_data;
+ tcon = tlink_tcon(cfile->tlink);
+ server = tcon->ses->server;
+
+ if (!server->ops->async_writev)
+ return -ENOSYS;
+
+ ctx = cifs_aio_ctx_alloc();
+ if (!ctx)
+ return -ENOMEM;
+
+ ctx->cfile = cifsFileInfo_get(cfile);
+
+ if (!is_sync_kiocb(iocb))
+ ctx->iocb = iocb;
+
+ ctx->pos = iocb->ki_pos;
+
+ ctx->direct_io = true;
+ ctx->iter = *from;
+ ctx->len = len;
+
+ /* grab a lock here due to read response handlers can access ctx */
+ mutex_lock(&ctx->aio_mutex);
+
+ rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, from,
+ cfile, cifs_sb, &ctx->list, ctx);
+
+ /*
+ * If at least one write was successfully sent, then discard any rc
+ * value from the later writes. If the other write succeeds, then
+ * we'll end up returning whatever was written. If it fails, then
+ * we'll get a new rc value from that.
+ */
+ if (!list_empty(&ctx->list))
+ rc = 0;
+
+ mutex_unlock(&ctx->aio_mutex);
+
+ if (rc) {
+ kref_put(&ctx->refcount, cifs_aio_ctx_release);
+ return rc;
+ }
+
+ if (!is_sync_kiocb(iocb)) {
+ kref_put(&ctx->refcount, cifs_aio_ctx_release);
+ return -EIOCBQUEUED;
+ }
+
+ rc = wait_for_completion_killable(&ctx->done);
+ if (rc) {
+ mutex_lock(&ctx->aio_mutex);
+ ctx->rc = rc = -EINTR;
+ total_written = ctx->total_len;
+ mutex_unlock(&ctx->aio_mutex);
+ } else {
+ rc = ctx->rc;
+ total_written = ctx->total_len;
+ }
+
+ kref_put(&ctx->refcount, cifs_aio_ctx_release);
+
+ if (unlikely(!total_written))
+ return rc;
+
+ iocb->ki_pos += total_written;
+ return total_written;
+}
+
ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
--
2.7.4


2018-09-20 21:21:24

by Long Li

Subject: [PATCH V3 (resend) 3/7] CIFS: Add support for direct I/O read

From: Long Li <[email protected]>

With direct I/O read, we transfer the data directly from the transport layer
to the user data buffer.

Change in v3: add support for kernel AIO

Signed-off-by: Long Li <[email protected]>
---
fs/cifs/cifsfs.h | 1 +
fs/cifs/cifsglob.h | 5 ++
fs/cifs/file.c | 210 +++++++++++++++++++++++++++++++++++++++++++++--------
3 files changed, 187 insertions(+), 29 deletions(-)

diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index f047e87..ed5479c 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -101,6 +101,7 @@ extern int cifs_open(struct inode *inode, struct file *file);
extern int cifs_close(struct inode *inode, struct file *file);
extern int cifs_closedir(struct inode *inode, struct file *file);
extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
+extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 9dcaed0..2131fec 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1172,6 +1172,11 @@ struct cifs_aio_ctx {
unsigned int len;
unsigned int total_len;
bool should_dirty;
+ /*
+ * Indicates if this aio_ctx is for direct_io,
+ * If yes, iter is a copy of the user passed iov_iter
+ */
+ bool direct_io;
};

struct cifs_readdata;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 8d41ca7..6a939fa 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2965,7 +2965,6 @@ cifs_uncached_readdata_release(struct kref *refcount)
kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
for (i = 0; i < rdata->nr_pages; i++) {
put_page(rdata->pages[i]);
- rdata->pages[i] = NULL;
}
cifs_readdata_release(refcount);
}
@@ -3004,7 +3003,7 @@ cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter)
return remaining ? -EFAULT : 0;
}

-static void collect_uncached_read_data(struct cifs_aio_ctx *ctx);
+static void collect_uncached_read_data(struct cifs_readdata *rdata, struct cifs_aio_ctx *ctx);

static void
cifs_uncached_readv_complete(struct work_struct *work)
@@ -3013,7 +3012,7 @@ cifs_uncached_readv_complete(struct work_struct *work)
struct cifs_readdata, work);

complete(&rdata->done);
- collect_uncached_read_data(rdata->ctx);
+ collect_uncached_read_data(rdata, rdata->ctx);
/* the below call can possibly free the last ref to aio ctx */
kref_put(&rdata->refcount, cifs_uncached_readdata_release);
}
@@ -3103,6 +3102,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
int rc;
pid_t pid;
struct TCP_Server_Info *server;
+ struct page **pagevec;
+ size_t start;
+ struct iov_iter direct_iov = ctx->iter;

server = tlink_tcon(open_file->tlink)->ses->server;

@@ -3111,6 +3113,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
else
pid = current->tgid;

+ if (ctx->direct_io)
+ iov_iter_advance(&direct_iov, offset - ctx->pos);
+
do {
rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
&rsize, &credits);
@@ -3118,20 +3123,56 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
break;

cur_len = min_t(const size_t, len, rsize);
- npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);

- /* allocate a readdata struct */
- rdata = cifs_readdata_alloc(npages,
+ if (ctx->direct_io) {
+
+ cur_len = iov_iter_get_pages_alloc(
+ &direct_iov, &pagevec,
+ cur_len, &start);
+ if (cur_len < 0) {
+ cifs_dbg(VFS,
+ "couldn't get user pages (cur_len=%zd)"
+ " iter type %d"
+ " iov_offset %zd count %zd\n",
+ cur_len, direct_iov.type, direct_iov.iov_offset,
+ direct_iov.count);
+ dump_stack();
+ break;
+ }
+ iov_iter_advance(&direct_iov, cur_len);
+
+ rdata = cifs_readdata_direct_alloc(
+ pagevec, cifs_uncached_readv_complete);
+ if (!rdata) {
+ add_credits_and_wake_if(server, credits, 0);
+ rc = -ENOMEM;
+ break;
+ }
+
+ npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE;
+ rdata->page_offset = start;
+ rdata->tailsz = npages > 1 ?
+ cur_len-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
+ cur_len;
+
+ } else {
+
+ npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
+ /* allocate a readdata struct */
+ rdata = cifs_readdata_alloc(npages,
cifs_uncached_readv_complete);
- if (!rdata) {
- add_credits_and_wake_if(server, credits, 0);
- rc = -ENOMEM;
- break;
- }
+ if (!rdata) {
+ add_credits_and_wake_if(server, credits, 0);
+ rc = -ENOMEM;
+ break;
+ }

- rc = cifs_read_allocate_pages(rdata, npages);
- if (rc)
- goto error;
+ rc = cifs_read_allocate_pages(rdata, npages);
+ if (rc)
+ goto error;
+
+ rdata->tailsz = PAGE_SIZE;
+ }

rdata->cfile = cifsFileInfo_get(open_file);
rdata->nr_pages = npages;
@@ -3139,7 +3180,6 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
rdata->bytes = cur_len;
rdata->pid = pid;
rdata->pagesz = PAGE_SIZE;
- rdata->tailsz = PAGE_SIZE;
rdata->read_into_pages = cifs_uncached_read_into_pages;
rdata->copy_into_pages = cifs_uncached_copy_into_pages;
rdata->credits = credits;
@@ -3153,13 +3193,17 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
if (rc) {
add_credits_and_wake_if(server, rdata->credits, 0);
kref_put(&rdata->refcount,
- cifs_uncached_readdata_release);
- if (rc == -EAGAIN)
+ cifs_uncached_readdata_release);
+ if (rc == -EAGAIN) {
+ iov_iter_revert(&direct_iov, cur_len);
continue;
+ }
break;
}

- list_add_tail(&rdata->list, rdata_list);
+ /* Add to aio pending list if it's not there */
+ if (rdata_list)
+ list_add_tail(&rdata->list, rdata_list);
offset += cur_len;
len -= cur_len;
} while (len > 0);
@@ -3168,7 +3212,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
}

static void
-collect_uncached_read_data(struct cifs_aio_ctx *ctx)
+collect_uncached_read_data(struct cifs_readdata *uncached_rdata, struct cifs_aio_ctx *ctx)
{
struct cifs_readdata *rdata, *tmp;
struct iov_iter *to = &ctx->iter;
@@ -3211,10 +3255,12 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
* reading.
*/
if (got_bytes && got_bytes < rdata->bytes) {
- rc = cifs_readdata_to_iov(rdata, to);
+ rc = 0;
+ if (!ctx->direct_io)
+ rc = cifs_readdata_to_iov(rdata, to);
if (rc) {
kref_put(&rdata->refcount,
- cifs_uncached_readdata_release);
+ cifs_uncached_readdata_release);
continue;
}
}
@@ -3228,28 +3274,32 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
list_splice(&tmp_list, &ctx->list);

kref_put(&rdata->refcount,
- cifs_uncached_readdata_release);
+ cifs_uncached_readdata_release);
goto again;
} else if (rdata->result)
rc = rdata->result;
- else
+ else if (!ctx->direct_io)
rc = cifs_readdata_to_iov(rdata, to);

/* if there was a short read -- discard anything left */
if (rdata->got_bytes && rdata->got_bytes < rdata->bytes)
rc = -ENODATA;
+
+ ctx->total_len += rdata->got_bytes;
}
list_del_init(&rdata->list);
kref_put(&rdata->refcount, cifs_uncached_readdata_release);
}

- for (i = 0; i < ctx->npages; i++) {
- if (ctx->should_dirty)
- set_page_dirty(ctx->bv[i].bv_page);
- put_page(ctx->bv[i].bv_page);
- }
+ if (!ctx->direct_io) {
+ for (i = 0; i < ctx->npages; i++) {
+ if (ctx->should_dirty)
+ set_page_dirty(ctx->bv[i].bv_page);
+ put_page(ctx->bv[i].bv_page);
+ }

- ctx->total_len = ctx->len - iov_iter_count(to);
+ ctx->total_len = ctx->len - iov_iter_count(to);
+ }

cifs_stats_bytes_read(tcon, ctx->total_len);

@@ -3267,6 +3317,108 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
complete(&ctx->done);
}

+ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
+{
+ size_t len;
+ struct file *file;
+ struct cifs_sb_info *cifs_sb;
+ struct cifsFileInfo *cfile;
+ struct cifs_tcon *tcon;
+ ssize_t rc, total_read = 0;
+ struct TCP_Server_Info *server;
+ loff_t offset = iocb->ki_pos;
+ pid_t pid;
+ struct cifs_aio_ctx *ctx;
+
+ /*
+ * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
+ * fall back to data copy read path
+ * this could be improved by getting pages directly in ITER_KVEC
+ */
+ if (to->type & ITER_KVEC) {
+ cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
+ return cifs_user_readv(iocb, to);
+ }
+
+ len = iov_iter_count(to);
+ if (!len)
+ return 0;
+
+ file = iocb->ki_filp;
+ cifs_sb = CIFS_FILE_SB(file);
+ cfile = file->private_data;
+ tcon = tlink_tcon(cfile->tlink);
+ server = tcon->ses->server;
+
+ if (!server->ops->async_readv)
+ return -ENOSYS;
+
+ if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
+ pid = cfile->pid;
+ else
+ pid = current->tgid;
+
+ if ((file->f_flags & O_ACCMODE) == O_WRONLY)
+ cifs_dbg(FYI, "attempting read on write only file instance\n");
+
+ ctx = cifs_aio_ctx_alloc();
+ if (!ctx)
+ return -ENOMEM;
+
+ ctx->cfile = cifsFileInfo_get(cfile);
+
+ if (!is_sync_kiocb(iocb))
+ ctx->iocb = iocb;
+
+ if (to->type == ITER_IOVEC)
+ ctx->should_dirty = true;
+
+ ctx->pos = offset;
+ ctx->direct_io = true;
+ ctx->iter = *to;
+ ctx->len = len;
+
+ /* grab a lock here due to read response handlers can access ctx */
+ mutex_lock(&ctx->aio_mutex);
+
+ rc = cifs_send_async_read(offset, len, cfile, cifs_sb, &ctx->list, ctx);
+
+ /* if at least one read request send succeeded, then reset rc */
+ if (!list_empty(&ctx->list))
+ rc = 0;
+
+ mutex_unlock(&ctx->aio_mutex);
+
+ if (rc) {
+ kref_put(&ctx->refcount, cifs_aio_ctx_release);
+ return rc;
+ }
+
+ if (!is_sync_kiocb(iocb)) {
+ kref_put(&ctx->refcount, cifs_aio_ctx_release);
+ return -EIOCBQUEUED;
+ }
+
+ rc = wait_for_completion_killable(&ctx->done);
+ if (rc) {
+ mutex_lock(&ctx->aio_mutex);
+ ctx->rc = rc = -EINTR;
+ total_read = ctx->total_len;
+ mutex_unlock(&ctx->aio_mutex);
+ } else {
+ rc = ctx->rc;
+ total_read = ctx->total_len;
+ }
+
+ kref_put(&ctx->refcount, cifs_aio_ctx_release);
+
+ if (total_read) {
+ iocb->ki_pos += total_read;
+ return total_read;
+ }
+ return rc;
+}
+
ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
{
struct file *file = iocb->ki_filp;
--
2.7.4


2018-09-21 22:19:45

by Pavel Shilovsky

Subject: Re: [PATCH V3 (resend) 3/7] CIFS: Add support for direct I/O read

Thu, Sep 20, 2018 at 14:22, Long Li <[email protected]>:
>
> From: Long Li <[email protected]>
>
> With direct I/O read, we transfer the data directly from transport layer to
> the user data buffer.
>
> Change in v3: add support for kernel AIO
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/cifsfs.h | 1 +
> fs/cifs/cifsglob.h | 5 ++
> fs/cifs/file.c | 210 +++++++++++++++++++++++++++++++++++++++++++++--------
> 3 files changed, 187 insertions(+), 29 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index f047e87..ed5479c 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -101,6 +101,7 @@ extern int cifs_open(struct inode *inode, struct file *file);
> extern int cifs_close(struct inode *inode, struct file *file);
> extern int cifs_closedir(struct inode *inode, struct file *file);
> extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
> +extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
> extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
> extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
> extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 9dcaed0..2131fec 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1172,6 +1172,11 @@ struct cifs_aio_ctx {
> unsigned int len;
> unsigned int total_len;
> bool should_dirty;
> + /*
> + * Indicates if this aio_ctx is for direct_io,
> + * If yes, iter is a copy of the user passed iov_iter
> + */
> + bool direct_io;
> };
>
> struct cifs_readdata;
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 8d41ca7..6a939fa 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2965,7 +2965,6 @@ cifs_uncached_readdata_release(struct kref *refcount)
> kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
> for (i = 0; i < rdata->nr_pages; i++) {
> put_page(rdata->pages[i]);
> - rdata->pages[i] = NULL;

why is this needed?

> }
> cifs_readdata_release(refcount);
> }
> @@ -3004,7 +3003,7 @@ cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter)
> return remaining ? -EFAULT : 0;
> }
>
> -static void collect_uncached_read_data(struct cifs_aio_ctx *ctx);
> +static void collect_uncached_read_data(struct cifs_readdata *rdata, struct cifs_aio_ctx *ctx);
>
> static void
> cifs_uncached_readv_complete(struct work_struct *work)
> @@ -3013,7 +3012,7 @@ cifs_uncached_readv_complete(struct work_struct *work)
> struct cifs_readdata, work);
>
> complete(&rdata->done);
> - collect_uncached_read_data(rdata->ctx);
> + collect_uncached_read_data(rdata, rdata->ctx);
> /* the below call can possibly free the last ref to aio ctx */
> kref_put(&rdata->refcount, cifs_uncached_readdata_release);
> }
> @@ -3103,6 +3102,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> int rc;
> pid_t pid;
> struct TCP_Server_Info *server;
> + struct page **pagevec;
> + size_t start;
> + struct iov_iter direct_iov = ctx->iter;
>
> server = tlink_tcon(open_file->tlink)->ses->server;
>
> @@ -3111,6 +3113,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> else
> pid = current->tgid;
>
> + if (ctx->direct_io)
> + iov_iter_advance(&direct_iov, offset - ctx->pos);
> +
> do {
> rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
> &rsize, &credits);
> @@ -3118,20 +3123,56 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> break;
>
> cur_len = min_t(const size_t, len, rsize);
> - npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
>
> - /* allocate a readdata struct */
> - rdata = cifs_readdata_alloc(npages,
> + if (ctx->direct_io) {
> +
> + cur_len = iov_iter_get_pages_alloc(
> + &direct_iov, &pagevec,
> + cur_len, &start);
> + if (cur_len < 0) {
> + cifs_dbg(VFS,
> + "couldn't get user pages (cur_len=%zd)"
> + " iter type %d"
> + " iov_offset %zd count %zd\n",
> + cur_len, direct_iov.type, direct_iov.iov_offset,
> + direct_iov.count);
> + dump_stack();
> + break;
> + }
> + iov_iter_advance(&direct_iov, cur_len);
> +
> + rdata = cifs_readdata_direct_alloc(
> + pagevec, cifs_uncached_readv_complete);
> + if (!rdata) {
> + add_credits_and_wake_if(server, credits, 0);
> + rc = -ENOMEM;
> + break;
> + }
> +
> + npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE;
> + rdata->page_offset = start;
> + rdata->tailsz = npages > 1 ?
> + cur_len-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
> + cur_len;
> +
> + } else {
> +
> + npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
> + /* allocate a readdata struct */
> + rdata = cifs_readdata_alloc(npages,
> cifs_uncached_readv_complete);
> - if (!rdata) {
> - add_credits_and_wake_if(server, credits, 0);
> - rc = -ENOMEM;
> - break;
> - }
> + if (!rdata) {
> + add_credits_and_wake_if(server, credits, 0);
> + rc = -ENOMEM;
> + break;
> + }
>
> - rc = cifs_read_allocate_pages(rdata, npages);
> - if (rc)
> - goto error;
> + rc = cifs_read_allocate_pages(rdata, npages);
> + if (rc)
> + goto error;
> +
> + rdata->tailsz = PAGE_SIZE;
> + }
>
> rdata->cfile = cifsFileInfo_get(open_file);
> rdata->nr_pages = npages;
> @@ -3139,7 +3180,6 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> rdata->bytes = cur_len;
> rdata->pid = pid;
> rdata->pagesz = PAGE_SIZE;
> - rdata->tailsz = PAGE_SIZE;
> rdata->read_into_pages = cifs_uncached_read_into_pages;
> rdata->copy_into_pages = cifs_uncached_copy_into_pages;
> rdata->credits = credits;
> @@ -3153,13 +3193,17 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> if (rc) {
> add_credits_and_wake_if(server, rdata->credits, 0);
> kref_put(&rdata->refcount,
> - cifs_uncached_readdata_release);
> - if (rc == -EAGAIN)
> + cifs_uncached_readdata_release);
> + if (rc == -EAGAIN) {
> + iov_iter_revert(&direct_iov, cur_len);
> continue;
> + }
> break;
> }
>
> - list_add_tail(&rdata->list, rdata_list);
> + /* Add to aio pending list if it's not there */
> + if (rdata_list)
> + list_add_tail(&rdata->list, rdata_list);
> offset += cur_len;
> len -= cur_len;
> } while (len > 0);
> @@ -3168,7 +3212,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> }
>
> static void
> -collect_uncached_read_data(struct cifs_aio_ctx *ctx)
> +collect_uncached_read_data(struct cifs_readdata *uncached_rdata, struct cifs_aio_ctx *ctx)

Why do you need uncached_rdata argument? It doesn't seem you are using
it in this function.

> {
> struct cifs_readdata *rdata, *tmp;
> struct iov_iter *to = &ctx->iter;
> @@ -3211,10 +3255,12 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
> * reading.
> */
> if (got_bytes && got_bytes < rdata->bytes) {
> - rc = cifs_readdata_to_iov(rdata, to);
> + rc = 0;
> + if (!ctx->direct_io)
> + rc = cifs_readdata_to_iov(rdata, to);
> if (rc) {
> kref_put(&rdata->refcount,
> - cifs_uncached_readdata_release);
> + cifs_uncached_readdata_release);
> continue;
> }
> }
> @@ -3228,28 +3274,32 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
> list_splice(&tmp_list, &ctx->list);
>
> kref_put(&rdata->refcount,
> - cifs_uncached_readdata_release);
> + cifs_uncached_readdata_release);
> goto again;
> } else if (rdata->result)
> rc = rdata->result;
> - else
> + else if (!ctx->direct_io)
> rc = cifs_readdata_to_iov(rdata, to);
>
> /* if there was a short read -- discard anything left */
> if (rdata->got_bytes && rdata->got_bytes < rdata->bytes)
> rc = -ENODATA;
> +
> + ctx->total_len += rdata->got_bytes;
> }
> list_del_init(&rdata->list);
> kref_put(&rdata->refcount, cifs_uncached_readdata_release);
> }
>
> - for (i = 0; i < ctx->npages; i++) {
> - if (ctx->should_dirty)
> - set_page_dirty(ctx->bv[i].bv_page);
> - put_page(ctx->bv[i].bv_page);
> - }
> + if (!ctx->direct_io) {
> + for (i = 0; i < ctx->npages; i++) {
> + if (ctx->should_dirty)
> + set_page_dirty(ctx->bv[i].bv_page);
> + put_page(ctx->bv[i].bv_page);
> + }
>
> - ctx->total_len = ctx->len - iov_iter_count(to);
> + ctx->total_len = ctx->len - iov_iter_count(to);
> + }
>
> cifs_stats_bytes_read(tcon, ctx->total_len);
>
> @@ -3267,6 +3317,108 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
> complete(&ctx->done);
> }
>
> +ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
> +{
> + size_t len;
> + struct file *file;
> + struct cifs_sb_info *cifs_sb;
> + struct cifsFileInfo *cfile;
> + struct cifs_tcon *tcon;
> + ssize_t rc, total_read = 0;
> + struct TCP_Server_Info *server;
> + loff_t offset = iocb->ki_pos;
> + pid_t pid;
> + struct cifs_aio_ctx *ctx;
> +
> + /*
> + * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> + * fall back to data copy read path
> + * this could be improved by getting pages directly in ITER_KVEC
> + */
> + if (to->type & ITER_KVEC) {
> + cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
> + return cifs_user_readv(iocb, to);
> + }
> +
> + len = iov_iter_count(to);
> + if (!len)
> + return 0;
> +
> + file = iocb->ki_filp;
> + cifs_sb = CIFS_FILE_SB(file);
> + cfile = file->private_data;
> + tcon = tlink_tcon(cfile->tlink);
> + server = tcon->ses->server;
> +
> + if (!server->ops->async_readv)
> + return -ENOSYS;
> +
> + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> + pid = cfile->pid;
> + else
> + pid = current->tgid;
> +

pid variable is not being used.

> + if ((file->f_flags & O_ACCMODE) == O_WRONLY)
> + cifs_dbg(FYI, "attempting read on write only file instance\n");
> +
> + ctx = cifs_aio_ctx_alloc();
> + if (!ctx)
> + return -ENOMEM;
> +
> + ctx->cfile = cifsFileInfo_get(cfile);
> +
> + if (!is_sync_kiocb(iocb))
> + ctx->iocb = iocb;
> +
> + if (to->type == ITER_IOVEC)
> + ctx->should_dirty = true;
> +
> + ctx->pos = offset;
> + ctx->direct_io = true;
> + ctx->iter = *to;
> + ctx->len = len;
> +
> + /* grab a lock here due to read response handlers can access ctx */
> + mutex_lock(&ctx->aio_mutex);
> +
> + rc = cifs_send_async_read(offset, len, cfile, cifs_sb, &ctx->list, ctx);
> +
> + /* if at least one read request send succeeded, then reset rc */
> + if (!list_empty(&ctx->list))
> + rc = 0;
> +
> + mutex_unlock(&ctx->aio_mutex);
> +
> + if (rc) {
> + kref_put(&ctx->refcount, cifs_aio_ctx_release);
> + return rc;
> + }
> +
> + if (!is_sync_kiocb(iocb)) {
> + kref_put(&ctx->refcount, cifs_aio_ctx_release);
> + return -EIOCBQUEUED;
> + }
> +
> + rc = wait_for_completion_killable(&ctx->done);
> + if (rc) {
> + mutex_lock(&ctx->aio_mutex);
> + ctx->rc = rc = -EINTR;
> + total_read = ctx->total_len;
> + mutex_unlock(&ctx->aio_mutex);
> + } else {
> + rc = ctx->rc;
> + total_read = ctx->total_len;
> + }
> +
> + kref_put(&ctx->refcount, cifs_aio_ctx_release);
> +
> + if (total_read) {
> + iocb->ki_pos += total_read;
> + return total_read;
> + }
> + return rc;

This function is almost identical to cifs_user_readv. Can the latter
be refactored to avoid code duplication?

> +}
> +
> ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> {
> struct file *file = iocb->ki_filp;
> --
> 2.7.4
>

2018-09-21 22:31:15

by Pavel Shilovsky

Subject: Re: [PATCH V3 (resend) 4/7] CIFS: Add support for direct I/O write

On Thu, 20 Sep 2018 at 14:22, Long Li <[email protected]> wrote:
>
> From: Long Li <[email protected]>
>
> With direct I/O write, user supplied buffers are pinned to the memory and data
> are transferred directly from user buffers to the transport layer.
>
> Change in v3: add support for kernel AIO
>
> Signed-off-by: Long Li <[email protected]>
> ---
> fs/cifs/cifsfs.h | 1 +
> fs/cifs/file.c | 196 ++++++++++++++++++++++++++++++++++++++++++++++---------
> 2 files changed, 166 insertions(+), 31 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index ed5479c..cc54051 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -104,6 +104,7 @@ extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
> extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
> extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
> extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
> +extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
> extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
> extern int cifs_lock(struct file *, int, struct file_lock *);
> extern int cifs_fsync(struct file *, loff_t, loff_t, int);
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 6a939fa..2a5d209 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2537,6 +2537,8 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
> loff_t saved_offset = offset;
> pid_t pid;
> struct TCP_Server_Info *server;
> + struct page **pagevec;
> + size_t start;
>
> if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> pid = open_file->pid;
> @@ -2553,38 +2555,74 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
> if (rc)
> break;
>
> - nr_pages = get_numpages(wsize, len, &cur_len);
> - wdata = cifs_writedata_alloc(nr_pages,
> + if (ctx->direct_io) {
> + cur_len = iov_iter_get_pages_alloc(
> + from, &pagevec, wsize, &start);
> + if (cur_len < 0) {
> + cifs_dbg(VFS,
> + "direct_writev couldn't get user pages "
> + "(rc=%zd) iter type %d iov_offset %zd count"
> + " %zd\n",
> + cur_len, from->type,
> + from->iov_offset, from->count);
> + dump_stack();
> + break;
> + }
> + iov_iter_advance(from, cur_len);
> +
> + nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
> +
> + wdata = cifs_writedata_direct_alloc(pagevec,
> cifs_uncached_writev_complete);
> - if (!wdata) {
> - rc = -ENOMEM;
> - add_credits_and_wake_if(server, credits, 0);
> - break;
> - }
> + if (!wdata) {
> + rc = -ENOMEM;
> + add_credits_and_wake_if(server, credits, 0);
> + break;
> + }
>
> - rc = cifs_write_allocate_pages(wdata->pages, nr_pages);
> - if (rc) {
> - kfree(wdata);
> - add_credits_and_wake_if(server, credits, 0);
> - break;
> - }
>
> - num_pages = nr_pages;
> - rc = wdata_fill_from_iovec(wdata, from, &cur_len, &num_pages);
> - if (rc) {
> - for (i = 0; i < nr_pages; i++)
> - put_page(wdata->pages[i]);
> - kfree(wdata);
> - add_credits_and_wake_if(server, credits, 0);
> - break;
> - }
> + wdata->page_offset = start;
> + wdata->tailsz =
> + nr_pages > 1 ?
> + cur_len - (PAGE_SIZE - start) -
> + (nr_pages - 2) * PAGE_SIZE :
> + cur_len;
> + } else {
> + nr_pages = get_numpages(wsize, len, &cur_len);
> + wdata = cifs_writedata_alloc(nr_pages,
> + cifs_uncached_writev_complete);
> + if (!wdata) {
> + rc = -ENOMEM;
> + add_credits_and_wake_if(server, credits, 0);
> + break;
> + }
>
> - /*
> - * Bring nr_pages down to the number of pages we actually used,
> - * and free any pages that we didn't use.
> - */
> - for ( ; nr_pages > num_pages; nr_pages--)
> - put_page(wdata->pages[nr_pages - 1]);
> + rc = cifs_write_allocate_pages(wdata->pages, nr_pages);
> + if (rc) {
> + kfree(wdata);
> + add_credits_and_wake_if(server, credits, 0);
> + break;
> + }
> +
> + num_pages = nr_pages;
> + rc = wdata_fill_from_iovec(wdata, from, &cur_len, &num_pages);
> + if (rc) {
> + for (i = 0; i < nr_pages; i++)
> + put_page(wdata->pages[i]);
> + kfree(wdata);
> + add_credits_and_wake_if(server, credits, 0);
> + break;
> + }
> +
> + /*
> + * Bring nr_pages down to the number of pages we actually used,
> + * and free any pages that we didn't use.
> + */
> + for ( ; nr_pages > num_pages; nr_pages--)
> + put_page(wdata->pages[nr_pages - 1]);
> +
> + wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE);
> + }
>
> wdata->sync_mode = WB_SYNC_ALL;
> wdata->nr_pages = nr_pages;
> @@ -2593,7 +2631,6 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
> wdata->pid = pid;
> wdata->bytes = cur_len;
> wdata->pagesz = PAGE_SIZE;
> - wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE);
> wdata->credits = credits;
> wdata->ctx = ctx;
> kref_get(&ctx->refcount);
> @@ -2687,8 +2724,9 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx)
> kref_put(&wdata->refcount, cifs_uncached_writedata_release);
> }
>
> - for (i = 0; i < ctx->npages; i++)
> - put_page(ctx->bv[i].bv_page);
> + if (!ctx->direct_io)
> + for (i = 0; i < ctx->npages; i++)
> + put_page(ctx->bv[i].bv_page);
>
> cifs_stats_bytes_written(tcon, ctx->total_len);
> set_bit(CIFS_INO_INVALID_MAPPING, &CIFS_I(dentry->d_inode)->flags);
> @@ -2703,6 +2741,102 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx)
> complete(&ctx->done);
> }
>
> +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)

The same comment as for readv: this function is almost identical to
cifs_user_writev. Can the latter be refactored to avoid code
duplication?

> +{
> + struct file *file = iocb->ki_filp;
> + ssize_t total_written = 0;
> + struct cifsFileInfo *cfile;
> + struct cifs_tcon *tcon;
> + struct cifs_sb_info *cifs_sb;
> + struct TCP_Server_Info *server;
> + size_t len = iov_iter_count(from);
> + int rc;
> + struct cifs_aio_ctx *ctx;
> +
> + /*
> + * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
> + * In this case, fall back to non-direct write function.
> + * this could be improved by getting pages directly in ITER_KVEC
> + */
> + if (from->type & ITER_KVEC) {
> + cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec I/O\n");
> + return cifs_user_writev(iocb, from);
> + }
> +
> + rc = generic_write_checks(iocb, from);
> + if (rc <= 0)
> + return rc;
> +
> + cifs_sb = CIFS_FILE_SB(file);
> + cfile = file->private_data;
> + tcon = tlink_tcon(cfile->tlink);
> + server = tcon->ses->server;
> +
> + if (!server->ops->async_writev)
> + return -ENOSYS;
> +
> + ctx = cifs_aio_ctx_alloc();
> + if (!ctx)
> + return -ENOMEM;
> +
> + ctx->cfile = cifsFileInfo_get(cfile);
> +
> + if (!is_sync_kiocb(iocb))
> + ctx->iocb = iocb;
> +
> + ctx->pos = iocb->ki_pos;
> +
> + ctx->direct_io = true;
> + ctx->iter = *from;
> + ctx->len = len;
> +
> + /* grab a lock here due to read response handlers can access ctx */
> + mutex_lock(&ctx->aio_mutex);
> +
> + rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, from,
> + cfile, cifs_sb, &ctx->list, ctx);
> +
> + /*
> + * If at least one write was successfully sent, then discard any rc
> + * value from the later writes. If the other write succeeds, then
> + * we'll end up returning whatever was written. If it fails, then
> + * we'll get a new rc value from that.
> + */
> + if (!list_empty(&ctx->list))
> + rc = 0;
> +
> + mutex_unlock(&ctx->aio_mutex);
> +
> + if (rc) {
> + kref_put(&ctx->refcount, cifs_aio_ctx_release);
> + return rc;
> + }
> +
> + if (!is_sync_kiocb(iocb)) {
> + kref_put(&ctx->refcount, cifs_aio_ctx_release);
> + return -EIOCBQUEUED;
> + }
> +
> + rc = wait_for_completion_killable(&ctx->done);
> + if (rc) {
> + mutex_lock(&ctx->aio_mutex);
> + ctx->rc = rc = -EINTR;
> + total_written = ctx->total_len;
> + mutex_unlock(&ctx->aio_mutex);
> + } else {
> + rc = ctx->rc;
> + total_written = ctx->total_len;
> + }
> +
> + kref_put(&ctx->refcount, cifs_aio_ctx_release);
> +
> + if (unlikely(!total_written))
> + return rc;
> +
> + iocb->ki_pos += total_written;
> + return total_written;
> +}
> +
> ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
> {
> struct file *file = iocb->ki_filp;
> --
> 2.7.4
>


--
Best regards,
Pavel Shilovsky

2018-09-24 19:33:09

by Long Li

Subject: RE: [PATCH V3 (resend) 3/7] CIFS: Add support for direct I/O read

> Subject: Re: [PATCH V3 (resend) 3/7] CIFS: Add support for direct I/O read
>
> On Thu, 20 Sep 2018 at 14:22, Long Li <[email protected]> wrote:
> >
> > From: Long Li <[email protected]>
> >
> > With direct I/O read, we transfer the data directly from transport
> > layer to the user data buffer.
> >
> > Change in v3: add support for kernel AIO
> >
> > Signed-off-by: Long Li <[email protected]>
> > ---
> > fs/cifs/cifsfs.h | 1 +
> > fs/cifs/cifsglob.h | 5 ++
> > fs/cifs/file.c | 210 +++++++++++++++++++++++++++++++++++++++++++++--------
> > 3 files changed, 187 insertions(+), 29 deletions(-)
> >
> > diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> > index f047e87..ed5479c 100644
> > --- a/fs/cifs/cifsfs.h
> > +++ b/fs/cifs/cifsfs.h
> > @@ -101,6 +101,7 @@ extern int cifs_open(struct inode *inode, struct file *file);
> > extern int cifs_close(struct inode *inode, struct file *file);
> > extern int cifs_closedir(struct inode *inode, struct file *file);
> > extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
> > +extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
> > extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
> > extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
> > extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
> > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> > index 9dcaed0..2131fec 100644
> > --- a/fs/cifs/cifsglob.h
> > +++ b/fs/cifs/cifsglob.h
> > @@ -1172,6 +1172,11 @@ struct cifs_aio_ctx {
> > unsigned int len;
> > unsigned int total_len;
> > bool should_dirty;
> > + /*
> > + * Indicates if this aio_ctx is for direct_io,
> > + * If yes, iter is a copy of the user passed iov_iter
> > + */
> > + bool direct_io;
> > };
> >
> > struct cifs_readdata;
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > index 8d41ca7..6a939fa 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -2965,7 +2965,6 @@ cifs_uncached_readdata_release(struct kref *refcount)
> > kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
> > for (i = 0; i < rdata->nr_pages; i++) {
> > put_page(rdata->pages[i]);
> > - rdata->pages[i] = NULL;
>
> why is this needed?
It is not needed: the pages are being released right here, so there is no point in setting pages[i] to NULL first, and the patch simply removes the assignment.
>
> > }
> > cifs_readdata_release(refcount);
> > }
> > @@ -3004,7 +3003,7 @@ cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter)
> > return remaining ? -EFAULT : 0;
> > }
> >
> > -static void collect_uncached_read_data(struct cifs_aio_ctx *ctx);
> > +static void collect_uncached_read_data(struct cifs_readdata *rdata, struct cifs_aio_ctx *ctx);
> >
> > static void
> > cifs_uncached_readv_complete(struct work_struct *work)
> > @@ -3013,7 +3012,7 @@ cifs_uncached_readv_complete(struct work_struct *work)
> > struct cifs_readdata,
> > work);
> >
> > complete(&rdata->done);
> > - collect_uncached_read_data(rdata->ctx);
> > + collect_uncached_read_data(rdata, rdata->ctx);
> > /* the below call can possibly free the last ref to aio ctx */
> > kref_put(&rdata->refcount, cifs_uncached_readdata_release);
> > }
> > @@ -3103,6 +3102,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> > int rc;
> > pid_t pid;
> > struct TCP_Server_Info *server;
> > + struct page **pagevec;
> > + size_t start;
> > + struct iov_iter direct_iov = ctx->iter;
> >
> > server = tlink_tcon(open_file->tlink)->ses->server;
> >
> > @@ -3111,6 +3113,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> > else
> > pid = current->tgid;
> >
> > + if (ctx->direct_io)
> > + iov_iter_advance(&direct_iov, offset - ctx->pos);
> > +
> > do {
> > rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
> > &rsize, &credits);
> > @@ -3118,20 +3123,56 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> > break;
> >
> > cur_len = min_t(const size_t, len, rsize);
> > - npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
> >
> > - /* allocate a readdata struct */
> > - rdata = cifs_readdata_alloc(npages,
> > + if (ctx->direct_io) {
> > +
> > + cur_len = iov_iter_get_pages_alloc(
> > + &direct_iov, &pagevec,
> > + cur_len, &start);
> > + if (cur_len < 0) {
> > + cifs_dbg(VFS,
> > + "couldn't get user pages (cur_len=%zd)"
> > + " iter type %d"
> > + " iov_offset %zd count %zd\n",
> > + cur_len, direct_iov.type, direct_iov.iov_offset,
> > + direct_iov.count);
> > + dump_stack();
> > + break;
> > + }
> > + iov_iter_advance(&direct_iov, cur_len);
> > +
> > + rdata = cifs_readdata_direct_alloc(
> > + pagevec, cifs_uncached_readv_complete);
> > + if (!rdata) {
> > + add_credits_and_wake_if(server, credits, 0);
> > + rc = -ENOMEM;
> > + break;
> > + }
> > +
> > + npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE;
> > + rdata->page_offset = start;
> > + rdata->tailsz = npages > 1 ?
> > + cur_len-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
> > + cur_len;
> > +
> > + } else {
> > +
> > + npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
> > + /* allocate a readdata struct */
> > + rdata = cifs_readdata_alloc(npages,
> > cifs_uncached_readv_complete);
> > - if (!rdata) {
> > - add_credits_and_wake_if(server, credits, 0);
> > - rc = -ENOMEM;
> > - break;
> > - }
> > + if (!rdata) {
> > + add_credits_and_wake_if(server, credits, 0);
> > + rc = -ENOMEM;
> > + break;
> > + }
> >
> > - rc = cifs_read_allocate_pages(rdata, npages);
> > - if (rc)
> > - goto error;
> > + rc = cifs_read_allocate_pages(rdata, npages);
> > + if (rc)
> > + goto error;
> > +
> > + rdata->tailsz = PAGE_SIZE;
> > + }
> >
> > rdata->cfile = cifsFileInfo_get(open_file);
> > rdata->nr_pages = npages;
> > @@ -3139,7 +3180,6 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> > rdata->bytes = cur_len;
> > rdata->pid = pid;
> > rdata->pagesz = PAGE_SIZE;
> > - rdata->tailsz = PAGE_SIZE;
> > rdata->read_into_pages = cifs_uncached_read_into_pages;
> > rdata->copy_into_pages = cifs_uncached_copy_into_pages;
> > rdata->credits = credits;
> > @@ -3153,13 +3193,17 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> > if (rc) {
> > add_credits_and_wake_if(server, rdata->credits, 0);
> > kref_put(&rdata->refcount,
> > - cifs_uncached_readdata_release);
> > - if (rc == -EAGAIN)
> > + cifs_uncached_readdata_release);
> > + if (rc == -EAGAIN) {
> > + iov_iter_revert(&direct_iov, cur_len);
> > continue;
> > + }
> > break;
> > }
> >
> > - list_add_tail(&rdata->list, rdata_list);
> > + /* Add to aio pending list if it's not there */
> > + if (rdata_list)
> > + list_add_tail(&rdata->list, rdata_list);
> > offset += cur_len;
> > len -= cur_len;
> > } while (len > 0);
> > @@ -3168,7 +3212,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
> > }
> >
> > static void
> > -collect_uncached_read_data(struct cifs_aio_ctx *ctx)
> > +collect_uncached_read_data(struct cifs_readdata *uncached_rdata, struct cifs_aio_ctx *ctx)
>
> Why do you need uncached_rdata argument? It doesn't seem you are using
> it in this function.

My bad. I will remove it.

>
> > {
> > struct cifs_readdata *rdata, *tmp;
> > struct iov_iter *to = &ctx->iter;
> > @@ -3211,10 +3255,12 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
> > * reading.
> > */
> > if (got_bytes && got_bytes < rdata->bytes) {
> > - rc = cifs_readdata_to_iov(rdata, to);
> > + rc = 0;
> > + if (!ctx->direct_io)
> > + rc = cifs_readdata_to_iov(rdata, to);
> > if (rc) {
> > kref_put(&rdata->refcount,
> > - cifs_uncached_readdata_release);
> > + cifs_uncached_readdata_release);
> > continue;
> > }
> > }
> > @@ -3228,28 +3274,32 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
> > list_splice(&tmp_list, &ctx->list);
> >
> > kref_put(&rdata->refcount,
> > - cifs_uncached_readdata_release);
> > + cifs_uncached_readdata_release);
> > goto again;
> > } else if (rdata->result)
> > rc = rdata->result;
> > - else
> > + else if (!ctx->direct_io)
> > rc = cifs_readdata_to_iov(rdata, to);
> >
> > /* if there was a short read -- discard anything left */
> > if (rdata->got_bytes && rdata->got_bytes < rdata->bytes)
> > rc = -ENODATA;
> > +
> > + ctx->total_len += rdata->got_bytes;
> > }
> > list_del_init(&rdata->list);
> > kref_put(&rdata->refcount, cifs_uncached_readdata_release);
> > }
> >
> > - for (i = 0; i < ctx->npages; i++) {
> > - if (ctx->should_dirty)
> > - set_page_dirty(ctx->bv[i].bv_page);
> > - put_page(ctx->bv[i].bv_page);
> > - }
> > + if (!ctx->direct_io) {
> > + for (i = 0; i < ctx->npages; i++) {
> > + if (ctx->should_dirty)
> > + set_page_dirty(ctx->bv[i].bv_page);
> > + put_page(ctx->bv[i].bv_page);
> > + }
> >
> > - ctx->total_len = ctx->len - iov_iter_count(to);
> > + ctx->total_len = ctx->len - iov_iter_count(to);
> > + }
> >
> > cifs_stats_bytes_read(tcon, ctx->total_len);
> >
> > @@ -3267,6 +3317,108 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
> > complete(&ctx->done);
> > }
> >
> > +ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
> > +{
> > + size_t len;
> > + struct file *file;
> > + struct cifs_sb_info *cifs_sb;
> > + struct cifsFileInfo *cfile;
> > + struct cifs_tcon *tcon;
> > + ssize_t rc, total_read = 0;
> > + struct TCP_Server_Info *server;
> > + loff_t offset = iocb->ki_pos;
> > + pid_t pid;
> > + struct cifs_aio_ctx *ctx;
> > +
> > + /*
> > + * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> > + * fall back to data copy read path
> > + * this could be improved by getting pages directly in ITER_KVEC
> > + */
> > + if (to->type & ITER_KVEC) {
> > + cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
> > + return cifs_user_readv(iocb, to);
> > + }
> > +
> > + len = iov_iter_count(to);
> > + if (!len)
> > + return 0;
> > +
> > + file = iocb->ki_filp;
> > + cifs_sb = CIFS_FILE_SB(file);
> > + cfile = file->private_data;
> > + tcon = tlink_tcon(cfile->tlink);
> > + server = tcon->ses->server;
> > +
> > + if (!server->ops->async_readv)
> > + return -ENOSYS;
> > +
> > + if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> > + pid = cfile->pid;
> > + else
> > + pid = current->tgid;
> > +
>
> pid variable is not being used.

I will remove it.

>
> > + if ((file->f_flags & O_ACCMODE) == O_WRONLY)
> > + cifs_dbg(FYI, "attempting read on write only file instance\n");
> > +
> > + ctx = cifs_aio_ctx_alloc();
> > + if (!ctx)
> > + return -ENOMEM;
> > +
> > + ctx->cfile = cifsFileInfo_get(cfile);
> > +
> > + if (!is_sync_kiocb(iocb))
> > + ctx->iocb = iocb;
> > +
> > + if (to->type == ITER_IOVEC)
> > + ctx->should_dirty = true;
> > +
> > + ctx->pos = offset;
> > + ctx->direct_io = true;
> > + ctx->iter = *to;
> > + ctx->len = len;
> > +
> > + /* grab a lock here due to read response handlers can access ctx */
> > + mutex_lock(&ctx->aio_mutex);
> > +
> > + rc = cifs_send_async_read(offset, len, cfile, cifs_sb, &ctx->list, ctx);
> > +
> > + /* if at least one read request send succeeded, then reset rc */
> > + if (!list_empty(&ctx->list))
> > + rc = 0;
> > +
> > + mutex_unlock(&ctx->aio_mutex);
> > +
> > + if (rc) {
> > + kref_put(&ctx->refcount, cifs_aio_ctx_release);
> > + return rc;
> > + }
> > +
> > + if (!is_sync_kiocb(iocb)) {
> > + kref_put(&ctx->refcount, cifs_aio_ctx_release);
> > + return -EIOCBQUEUED;
> > + }
> > +
> > + rc = wait_for_completion_killable(&ctx->done);
> > + if (rc) {
> > + mutex_lock(&ctx->aio_mutex);
> > + ctx->rc = rc = -EINTR;
> > + total_read = ctx->total_len;
> > + mutex_unlock(&ctx->aio_mutex);
> > + } else {
> > + rc = ctx->rc;
> > + total_read = ctx->total_len;
> > + }
> > +
> > + kref_put(&ctx->refcount, cifs_aio_ctx_release);
> > +
> > + if (total_read) {
> > + iocb->ki_pos += total_read;
> > + return total_read;
> > + }
> > + return rc;
>
> This function is almost identical to cifs_user_readv. Can the latter be
> refactored to avoid code duplication?

I was thinking about the same thing. I will merge them into one function.

>
> > +}
> > +
> > ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> > {
> > struct file *file = iocb->ki_filp;
> > --
> > 2.7.4
> >