2022-08-23 17:17:42

by David Howells

[permalink] [raw]
Subject: [PATCH 0/7] smb3: Add iter helpers and use iov_iters down to the network transport


Hi Steve, Al,

Here's an updated version of a subset of my branch to make the cifs/smb3
driver pass iov_iters down to the lowest layers where they can be passed to
the network transport.

Al: Could you look at the first two patches, that add extract_iter_to_iter()
to see about decanting iterators of various types (but that might have to be
lost) into iterators that can be held on to (pinning pages in the process),
and iov_iter_scan() which passes each partial page of an iterator to a scanner
function to do something with (such as create an sglist element for).

Possibly I should add an extract_iter_to_sglist() - I'm doing that in a number
of places.

Steve: assuming Al is okay with the iov_iter patches, can you look at taking
this into your tree (or should it go through mine?)?

I've pushed the patches here also:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=cifs-for-viro

David
---
David Howells (7):
iov_iter: Add a function to extract an iter's buffers to a bvec iter
iov_iter: Add a general purpose iteration function
cifs: Add some helper functions
cifs: Add a function to read into an iter from a socket
cifs: Change the I/O paths to use an iterator rather than a page list
cifs: Remove unused code
cifs: Add some RDMA send tracepoints


fs/cifs/cifsencrypt.c | 40 +-
fs/cifs/cifsfs.h | 3 +
fs/cifs/cifsglob.h | 28 +-
fs/cifs/cifsproto.h | 11 +-
fs/cifs/cifssmb.c | 13 +-
fs/cifs/connect.c | 16 +
fs/cifs/file.c | 1653 ++++++++++++++++++-----------------------
fs/cifs/fscache.c | 22 +-
fs/cifs/fscache.h | 10 +-
fs/cifs/misc.c | 108 ---
fs/cifs/smb2ops.c | 369 +++++----
fs/cifs/smb2pdu.c | 44 +-
fs/cifs/smbdirect.c | 335 ++++-----
fs/cifs/smbdirect.h | 4 +-
fs/cifs/trace.h | 95 +++
fs/cifs/transport.c | 54 +-
include/linux/uio.h | 8 +
lib/iov_iter.c | 159 +++-
18 files changed, 1391 insertions(+), 1581 deletions(-)



2022-08-23 17:20:06

by David Howells

[permalink] [raw]
Subject: [PATCH 1/7] iov_iter: Add a function to extract an iter's buffers to a bvec iter

Copy cifs's setup_aio_ctx_iter() and to lib/iov_iter.c and generalise it as
extract_iter_to_iter(). This allocates and sets up an array of bio_vecs
for all the page fragments in an I/O iterator and sets a second supplied
iterator to bvec-type pointing to the array.

This is can be used when setting up for a direct I/O or an asynchronous I/O
to set up a record of the page fragments that are going to contribute to
the buffer, paging them all in to prevent DIO->mmap loops and allowing the
original iterator to be deallocated (it may be on the stack of the caller).

Note that extract_iter_to_iter() doesn't actually need to make a separate
allocation for the page array. It can place the page array at the end of
the bvec array storage, provided it traverses both arrays from the 0th
element forwards.

Signed-off-by: David Howells <[email protected]>
---

include/linux/uio.h | 4 ++
lib/iov_iter.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 97 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 5896af36199c..88fd93508710 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -251,6 +251,10 @@ ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
size_t maxsize, unsigned maxpages, size_t *start);
ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
size_t maxsize, size_t *start);
+ssize_t extract_iter_to_iter(struct iov_iter *orig,
+ size_t orig_len,
+ struct iov_iter *new,
+ struct bio_vec **_bv);
int iov_iter_npages(const struct iov_iter *i, int maxpages);
void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state);

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 4b7fce72e3e5..c07bf978b935 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1518,6 +1518,99 @@ ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
}
EXPORT_SYMBOL(iov_iter_get_pages_alloc2);

+/**
+ * extract_iter_to_iter - Extract the pages from one iterator into another
+ * @orig: The original iterator
+ * @orig_len: The amount of iterator to copy
+ * @new: The iterator to be set up
+ * @_bv: Where to store the allocated bvec table pointer, if allocated
+ *
+ * Extract the page fragments from the given amount of the source iterator and
+ * build up a second iterator that refers to all of those bits. This allows
+ * the original iterator to disposed of.
+ *
+ * If a bvec array is created, the number of pages in the array is returned and
+ * a pointer to the array is saved into *@_bv;
+ */
+ssize_t extract_iter_to_iter(struct iov_iter *orig,
+ size_t orig_len,
+ struct iov_iter *new,
+ struct bio_vec **_bv)
+{
+ struct bio_vec *bv = NULL;
+ struct page **pages;
+ unsigned int cur_npages;
+ unsigned int max_pages = iov_iter_npages(orig, INT_MAX);
+ unsigned int npages = 0;
+ unsigned int i;
+ size_t count = orig_len;
+ ssize_t ret;
+ size_t bv_size, pg_size;
+ size_t start;
+ size_t len;
+
+ *_bv = NULL;
+
+ if (iov_iter_is_kvec(orig) || iov_iter_is_discard(orig)) {
+ *new = *orig;
+ iov_iter_advance(orig, count);
+ return 0;
+ }
+
+ bv_size = array_size(max_pages, sizeof(*bv));
+ bv = kvmalloc(bv_size, GFP_KERNEL);
+ if (!bv)
+ return -ENOMEM;
+
+ /* Put the page list at the end of the bvec list storage. bvec
+ * elements are larger than page pointers, so as long as we work
+ * 0->last, we should be fine.
+ */
+ pg_size = array_size(max_pages, sizeof(*pages));
+ pages = (void *)bv + bv_size - pg_size;
+
+ while (count && npages < max_pages) {
+ ret = iov_iter_get_pages2(orig, pages, count, max_pages - npages,
+ &start);
+ if (ret < 0) {
+ pr_err("Couldn't get user pages (rc=%zd)\n", ret);
+ break;
+ }
+
+ if (ret > count) {
+ pr_err("get_pages rc=%zd more than %zu\n", ret, count);
+ break;
+ }
+
+ iov_iter_advance(orig, ret);
+ count -= ret;
+ ret += start;
+ cur_npages = DIV_ROUND_UP(ret, PAGE_SIZE);
+
+ if (npages + cur_npages > max_pages) {
+ pr_err("Out of bvec array capacity (%u vs %u)\n",
+ npages + cur_npages, max_pages);
+ break;
+ }
+
+ for (i = 0; i < cur_npages; i++) {
+ len = ret > PAGE_SIZE ? PAGE_SIZE : ret;
+ bv[npages + i].bv_page = *pages++;
+ bv[npages + i].bv_offset = start;
+ bv[npages + i].bv_len = len - start;
+ ret -= len;
+ start = 0;
+ }
+
+ npages += cur_npages;
+ }
+
+ *_bv = bv;
+ iov_iter_bvec(new, iov_iter_rw(orig), bv, npages, orig_len - count);
+ return npages;
+}
+EXPORT_SYMBOL(extract_iter_to_iter);
+
size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
struct iov_iter *i)
{


2022-08-23 17:54:16

by David Howells

[permalink] [raw]
Subject: [PATCH 7/7] cifs: Add some RDMA send tracepoints

Add some tracepoints to help trace RDMA in cifs.

Signed-off-by: David Howells <[email protected]>
---

fs/cifs/smbdirect.c | 15 +++++++-
fs/cifs/trace.h | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 578c24d5a941..711beb3722e3 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -1945,6 +1945,8 @@ static ssize_t smbd_post_send_iter_scanner(struct iov_iter *iter, const void *p,
ctx->len += len;
log_write(INFO, "sending page i=%d offset=%zu size=%zu remaining_data_length=%d\n",
ix, offset, len, *ctx->_remaining_data_length);
+ trace_smb3_rdma_page(ctx->num_rqst, ctx->rqst_idx, ix, offset, len,
+ iov_iter_count(iter), *ctx->_remaining_data_length);
*ctx->_remaining_data_length -= len;
ctx->nsg++;
return len;
@@ -2043,19 +2045,27 @@ int smbd_send(struct TCP_Server_Info *server,
klen += rqst->rq_iov[i].iov_len;
iov_iter_kvec(&iter, WRITE, rqst->rq_iov, rqst->rq_nvec, klen);

+ trace_smb3_rdma_send(num_rqst, rqst_idx, rqst->rq_nvec, klen,
+ iov_iter_count(&rqst->rq_iter), remaining_data_length);
+
rc = smbd_post_send_iter(info, &iter, num_rqst, rqst_idx,
&remaining_data_length);
- if (rc < 0)
+ if (rc < 0) {
+ trace_smb3_rdma_fail(num_rqst, rqst_idx, 1, rc);
goto done;
+ }

if (iov_iter_count(&rqst->rq_iter) > 0) {
/* And then the data pages if there are any */
rc = smbd_post_send_iter(info, &rqst->rq_iter, num_rqst, rqst_idx,
&remaining_data_length);
- if (rc < 0)
+ if (rc < 0) {
+ trace_smb3_rdma_fail(num_rqst, rqst_idx, 2, rc);
goto done;
+ }
}

+ trace_smb3_rdma_done(num_rqst, rqst_idx, 3);
rqst_idx++;
if (rqst_idx < num_rqst)
goto next_rqst;
@@ -2282,6 +2292,7 @@ static ssize_t smbd_iter_to_mr_scanner(struct iov_iter *iter, const void *p,
sg_set_buf(&ctx->sgl[ix], p, len);
log_write(INFO, "sending page i=%d offset=%zu size=%zu\n",
ix, offset, len);
+ trace_smb3_rdma_page(0, 0, ix, offset, len, iov_iter_count(iter), 0);
ctx->nsg++;
return len;
}
diff --git a/fs/cifs/trace.h b/fs/cifs/trace.h
index 6b88dc2e364f..70640232b572 100644
--- a/fs/cifs/trace.h
+++ b/fs/cifs/trace.h
@@ -1055,6 +1055,101 @@ DEFINE_SMB3_CREDIT_EVENT(waitff_credits);
DEFINE_SMB3_CREDIT_EVENT(overflow_credits);
DEFINE_SMB3_CREDIT_EVENT(set_credits);

+TRACE_EVENT(smb3_rdma_send,
+ TP_PROTO(unsigned int nr_rqst, unsigned int sub_rqst,
+ unsigned int nr_kvec, unsigned int kvec_len,
+ unsigned int iter_len, unsigned int remaining_len),
+ TP_ARGS(nr_rqst, sub_rqst, nr_kvec, kvec_len, iter_len, remaining_len),
+ TP_STRUCT__entry(
+ __field(u8, nr_rqst)
+ __field(u8, sub_rqst)
+ __field(u8, nr_kvec)
+ __field(unsigned int, kvec_len)
+ __field(unsigned int, iter_len)
+ __field(unsigned int, remaining_len)
+ ),
+ TP_fast_assign(
+ __entry->nr_rqst = nr_rqst;
+ __entry->sub_rqst = sub_rqst;
+ __entry->nr_kvec = nr_kvec;
+ __entry->kvec_len = kvec_len;
+ __entry->iter_len = iter_len;
+ __entry->remaining_len = remaining_len;
+ ),
+ TP_printk("nrq=%u/%u nkv=%u kvl=%u iter=%u rem=%u",
+ __entry->sub_rqst + 1, __entry->nr_rqst,
+ __entry->nr_kvec, __entry->kvec_len,
+ __entry->iter_len, __entry->remaining_len)
+ )
+
+TRACE_EVENT(smb3_rdma_page,
+ TP_PROTO(unsigned int nr_rqst, unsigned int sub_rqst,
+ unsigned int i, size_t offset, ssize_t len,
+ unsigned int iter_len, unsigned int remaining_len),
+ TP_ARGS(nr_rqst, sub_rqst, i, offset, len, iter_len, remaining_len),
+ TP_STRUCT__entry(
+ __field(u8, nr_rqst)
+ __field(u8, sub_rqst)
+ __field(unsigned int, iter_len)
+ __field(unsigned int, remaining_len)
+ __field(unsigned int, i)
+ __field(size_t, offset)
+ __field(ssize_t, len)
+ ),
+ TP_fast_assign(
+ __entry->nr_rqst = nr_rqst;
+ __entry->sub_rqst = sub_rqst;
+ __entry->i = i;
+ __entry->offset = offset;
+ __entry->len = len;
+ __entry->iter_len = iter_len;
+ __entry->remaining_len = remaining_len;
+ ),
+ TP_printk("nrq=%u/%u pg=%u o=%zx l=%zx iter=%u rem=%u",
+ __entry->sub_rqst + 1, __entry->nr_rqst,
+ __entry->i, __entry->offset, __entry->len,
+ __entry->iter_len, __entry->remaining_len)
+ )
+
+TRACE_EVENT(smb3_rdma_done,
+ TP_PROTO(unsigned int nr_rqst, unsigned int sub_rqst,
+ unsigned int where),
+ TP_ARGS(nr_rqst, sub_rqst, where),
+ TP_STRUCT__entry(
+ __field(u8, nr_rqst)
+ __field(u8, sub_rqst)
+ __field(u8, where)
+ ),
+ TP_fast_assign(
+ __entry->nr_rqst = nr_rqst;
+ __entry->sub_rqst = sub_rqst;
+ __entry->where = where;
+ ),
+ TP_printk("nrq=%u/%u where=%u",
+ __entry->sub_rqst + 1, __entry->nr_rqst, __entry->where)
+ )
+
+TRACE_EVENT(smb3_rdma_fail,
+ TP_PROTO(unsigned int nr_rqst, unsigned int sub_rqst,
+ unsigned int where, int rc),
+ TP_ARGS(nr_rqst, sub_rqst, where, rc),
+ TP_STRUCT__entry(
+ __field(u8, nr_rqst)
+ __field(u8, sub_rqst)
+ __field(u8, where)
+ __field(short, rc)
+ ),
+ TP_fast_assign(
+ __entry->nr_rqst = nr_rqst;
+ __entry->sub_rqst = sub_rqst;
+ __entry->where = where;
+ __entry->rc = rc;
+ ),
+ TP_printk("nrq=%u/%u where=%u rc=%d",
+ __entry->sub_rqst + 1, __entry->nr_rqst,
+ __entry->where, __entry->rc)
+ )
+
#endif /* _CIFS_TRACE_H */

#undef TRACE_INCLUDE_PATH


2022-08-24 04:30:00

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/7] iov_iter: Add a function to extract an iter's buffers to a bvec iter

On Tue, Aug 23, 2022 at 03:12:14PM +0100, David Howells wrote:
> Copy cifs's setup_aio_ctx_iter() and to lib/iov_iter.c and generalise it as
> extract_iter_to_iter(). This allocates and sets up an array of bio_vecs
> for all the page fragments in an I/O iterator and sets a second supplied
> iterator to bvec-type pointing to the array.

Did you read my NACK and comments from last time?

2022-08-26 21:48:49

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/7] iov_iter: Add a function to extract an iter's buffers to a bvec iter

On Tue, Aug 23, 2022 at 03:12:14PM +0100, David Howells wrote:
> + ret = iov_iter_get_pages2(orig, pages, count, max_pages - npages,
> + &start);
> + if (ret < 0) {
> + pr_err("Couldn't get user pages (rc=%zd)\n", ret);
> + break;
> + }
> +
> + if (ret > count) {
> + pr_err("get_pages rc=%zd more than %zu\n", ret, count);
> + break;
> + }
> +
> + iov_iter_advance(orig, ret);

Have you even tested that? iov_iter_get_pages2() advances the iterator it had been
given. And no, it does *not* return more than it had been asked to, so the second
check is complete BS.

That's aside of the usefulness of the primitive in question...

2022-10-14 14:47:45

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 1/7] iov_iter: Add a function to extract an iter's buffers to a bvec iter

Christoph Hellwig <[email protected]> wrote:

> On Tue, Aug 23, 2022 at 03:12:14PM +0100, David Howells wrote:
> > Copy cifs's setup_aio_ctx_iter() and to lib/iov_iter.c and generalise it as
> > extract_iter_to_iter(). This allocates and sets up an array of bio_vecs
> > for all the page fragments in an I/O iterator and sets a second supplied
> > iterator to bvec-type pointing to the array.
>
> Did you read my NACK and comments from last time?

No, because they ended up in a different mailbox from everything else for some
reason.

> I really do not like this as a general purpose helper. This is an odd
> quirk that we really generally should not needed unless you have very
> convoluted locking. So please keep it inside of cifs.

I'm using it in under-development netfslib code also.

Let me ask the question more generally in a separate email.

David