2023-09-13 21:25:08

by David Howells

Subject: [PATCH v4 00/13] iov_iter: Convert the iterator macros into inline funcs

Hi Al, Linus,

Here's my latest go at converting the iov_iter iteration macros to be
inline functions. The first four patches are the main part of the
series:

(1) Add three benchmarking tests to the recently added iov_iter kunit
tests. They time copying 256MiB from one buffer to another using three
different methods (a single BVEC over the whole buffer, BVECs allocated
dynamically for 256-page chunks and a whole-buffer XARRAY). The results
are dumped to dmesg. No setup with null blockdevs or anything like
that is required.

(2) Renumber the type enum so that the ITER_* constants match the order in
iterate_and_advance*().

(3) Since (2) puts UBUF and IOVEC at 0 and 1, change user_backed_iter() to
just use the type value and get rid of the extra flag.

(4) Convert the iov_iter iteration macros to always-inline functions to
make the code easier to follow. It uses function pointers, but they
get optimised away. The priv2 argument likewise gets optimised away
if unused.

The rest of the patches are some options for consideration:

(5) Move the iov_iter iteration macros to a header file so that bespoke
iterators can be created elsewhere. For instance, rbd has an
optimisation that requires it to scan the buffer it is given to see
if it is all zeros. It would be nice if this could use
iterate_and_advance() - but that's currently buried inside
lib/iov_iter.c. A rough sketch of what this could look like follows
this list.

(6) On top of (5), provide a cut-down iteration function that can only
handle kernel-backed iterators (ie. BVEC, KVEC, XARRAY and DISCARD)
for situations where we know that we won't see UBUF/IOVEC.

(7-8) Make copy_to/from_iter() always catch MCE and return a short copy.
This doesn't particularly increase the code size as the handling works
via exception handling tables. That said, there may be code that
doesn't check the result of the copy and could be adversely affected.
If we go with this, it might be worth having an 'MCE happened' flag in
the iterator, or some other means by which this can be checked.

[?] Is it better to kill the thread than to return a short copy if an
MCE occurs?
[?] Is it better to manually select MCE handling?

(9) On top of (5), move the copy-and-csum code to net/ where it can be in
proximity with the code that uses it. This eliminates the code if
CONFIG_NET=n and allows for the slim possibility of it being inlined.

(10) On top of (9), fold csum_and_memcpy() into its two users.

(11) On top of (9), move csum_and_copy_from_iter_full() out of line and
merge in csum_and_copy_from_iter() since the former is the only caller
of the latter.

(12) Move hash_and_copy_to_iter() to net/ where it can be with its only
caller.

(13) Add a misc device for testing/benchmarking ITER_UBUF and ITER_IOVEC
iterators. It is driven by read/write/readv/writev and the results
are dumped through a tracepoint.
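
As a rough illustration of (5) - this is not part of the series, and
iov_iter_is_zero() and the step function names are invented - rbd's
all-zeros scan might look something like this once the framework is in a
header:

    #include <linux/iov_iter.h>
    #include <linux/string.h>

    static __always_inline
    size_t scan_zero_step(void *iter_base, size_t progress, size_t len,
                          void *priv, void *priv2)
    {
            /* Returning a non-zero remainder stops the iteration early. */
            return memchr_inv(iter_base, 0, len) ? len : 0;
    }

    static __always_inline
    size_t scan_zero_ustep(void __user *iter_base, size_t progress,
                           size_t len, void *priv, void *priv2)
    {
            return len;     /* Kernel-backed iterators only in this sketch. */
    }

    /* See if the next count bytes of a kernel-backed iterator are all zero. */
    static bool iov_iter_is_zero(struct iov_iter *iter, size_t count)
    {
            return iterate_and_advance(iter, count, NULL,
                                       scan_zero_ustep, scan_zero_step) == count;
    }

Note that this consumes the bytes it checks as it goes; a real version
would probably need to save and restore the iterator state around the
scan.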

Further changes I could make:

(1) Add an 'ITER_OTHER' type and an ops table pointer and have
iterate_and_advance2(), iov_iter_advance(), iov_iter_revert(),
etc. jump through it if they see the ITER_OTHER type. This would allow
types for, say, scatterlist, bio list and skbuff to be added without
further expanding the core. A rough sketch follows this list.

(2) Move the ITER_XARRAY type to being an ITER_OTHER type. This would
shrink the core iterators quite a lot and reduce the stack usage as
the xarray walking stuff wouldn't be there.
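
A very rough, hypothetical sketch of the ITER_OTHER idea - nothing like
this exists in the series and all of the names here are invented:

    struct iov_iter_ops {
            size_t (*iterate)(struct iov_iter *iter, size_t len,
                              void *priv, void *priv2,
                              iov_ustep_f ustep, iov_step_f step);
            void (*advance)(struct iov_iter *iter, size_t count);
            void (*revert)(struct iov_iter *iter, size_t count);
    };

    static __always_inline
    size_t iterate_other(struct iov_iter *iter, size_t len, void *priv,
                         void *priv2, iov_ustep_f ustep, iov_step_f step)
    {
            /* iterate_and_advance2() would jump through the ops table here. */
            return iter->other_ops->iterate(iter, len, priv, priv2,
                                            ustep, step);
    }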

Anyway, the changes in compiled function size either side of patch (4) on
x86_64 look like:

_copy_from_iter inc 0x360 -> 0x3d5 +0x75
_copy_from_iter_flushcache inc 0x34c -> 0x358 +0xc
_copy_from_iter_nocache dcr 0x354 -> 0x346 -0xe
_copy_mc_to_iter inc 0x396 -> 0x3cf +0x39
_copy_to_iter inc 0x33b -> 0x35d +0x22
copy_page_from_iter_atomic.part.0 inc 0x393 -> 0x408 +0x75
copy_page_to_iter_nofault.part.0 dcr 0x3de -> 0x3b2 -0x2c
copyin del 0x30
copyout del 0x2d
copyout_mc del 0x2b
csum_and_copy_from_iter inc 0x3db -> 0x3f4 +0x19
csum_and_copy_to_iter dcr 0x45d -> 0x45b -0x2
iov_iter_zero dcr 0x34a -> 0x342 -0x8
memcpy_from_iter.isra.0 del 0x1f
memcpy_from_iter_mc new 0x2b

Note that some of the main functions expand noticeably because a number
of the helpers get inlined instead of being called.

In terms of benchmarking patch (4), three runs without it:

# iov_kunit_benchmark_bvec: avg 3175 uS
# iov_kunit_benchmark_bvec_split: avg 3404 uS
# iov_kunit_benchmark_xarray: avg 3611 uS
# iov_kunit_benchmark_bvec: avg 3175 uS
# iov_kunit_benchmark_bvec_split: avg 3403 uS
# iov_kunit_benchmark_xarray: avg 3611 uS
# iov_kunit_benchmark_bvec: avg 3172 uS
# iov_kunit_benchmark_bvec_split: avg 3401 uS
# iov_kunit_benchmark_xarray: avg 3614 uS

and three runs with it:

# iov_kunit_benchmark_bvec: avg 3141 uS
# iov_kunit_benchmark_bvec_split: avg 3405 uS
# iov_kunit_benchmark_xarray: avg 3546 uS
# iov_kunit_benchmark_bvec: avg 3140 uS
# iov_kunit_benchmark_bvec_split: avg 3405 uS
# iov_kunit_benchmark_xarray: avg 3546 uS
# iov_kunit_benchmark_bvec: avg 3138 uS
# iov_kunit_benchmark_bvec_split: avg 3401 uS
# iov_kunit_benchmark_xarray: avg 3542 uS

It looks like patch (4) makes things a little bit faster, probably due to
the extra inlining.

I've pushed the patches here also:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-cleanup

David

Changes
=======
ver #4)
- Fix iterate_bvec() and iterate_kvec() to update iter->bvec and
iter->kvec only after they've been used to recalculate iter->nr_segs.
- Change iterate_xarray() to use start+progress rather than increasing
start to reduce code size.
- Added patches to move some iteration functions over to net/ as the files
there can #include the iterator framework.
- Added a patch to benchmark the iteration.

ver #3)
- Use min_t(size_t,) not min() to avoid a warning on Hexagon.
- Inline all the step functions.
- Added a patch to better handle copy_mc.

ver #2)
- Rebased on top of Willy's changes in linux-next.
- Change the checksum argument to the iteration functions to be a general
void* and use it to pass iter->copy_mc flag to memcpy_from_iter_mc() to
avoid using a function pointer.
- Arrange the end of the iterate_*() functions to look the same to give
the optimiser the best chance.
- Make iterate_and_advance() a wrapper around iterate_and_advance2().
- Adjust iterate_and_advance2() to use if-else-if-else-if-else rather than
switch(), to put ITER_BVEC before KVEC and to mark UBUF and IOVEC as
likely().
- Move "iter->count += progress" into iterate_and_advance2() from the
iterate functions.
- Mark a number of the iterator helpers with __always_inline.
- Fix _copy_from_iter_flushcache() to use memcpy_from_iter_flushcache()
not memcpy_from_iter().

Link: https://lore.kernel.org/r/[email protected]/ # v1
Link: https://lore.kernel.org/r/[email protected]/ # v2
Link: https://lore.kernel.org/r/[email protected]/ # v3

David Howells (13):
iov_iter: Add a benchmarking kunit test
iov_iter: Renumber ITER_* constants
iov_iter: Derive user-backedness from the iterator type
iov_iter: Convert iterate*() to inline funcs
iov: Move iterator functions to a header file
iov_iter: Add a kernel-type iterator-only iteration function
iov_iter: Make copy_from_iter() always handle MCE
iov_iter: Remove the copy_mc flag and associated functions
iov_iter, net: Move csum_and_copy_to/from_iter() to net/
iov_iter, net: Fold in csum_and_memcpy()
iov_iter, net: Merge csum_and_copy_from_iter{,_full}() together
iov_iter, net: Move hash_and_copy_to_iter() to net/
iov_iter: Create a fake device to allow iov_iter testing/benchmarking

arch/x86/include/asm/mce.h | 23 ++
fs/coredump.c | 1 -
include/linux/iov_iter.h | 296 +++++++++++++++++++++++++
include/linux/skbuff.h | 3 +
include/linux/uio.h | 45 +---
lib/Kconfig.debug | 8 +
lib/Makefile | 1 +
lib/iov_iter.c | 429 +++++++++++--------------------------
lib/kunit_iov_iter.c | 181 ++++++++++++++++
lib/test_iov_iter.c | 134 ++++++++++++
lib/test_iov_iter_trace.h | 80 +++++++
net/core/datagram.c | 75 ++++++-
net/core/skbuff.c | 40 ++++
13 files changed, 966 insertions(+), 350 deletions(-)
create mode 100644 include/linux/iov_iter.h
create mode 100644 lib/test_iov_iter.c
create mode 100644 lib/test_iov_iter_trace.h


2023-09-13 23:01:10

by David Howells

Subject: [PATCH v4 05/13] iov: Move iterator functions to a header file

Move the iterator functions to a header file so that other operations that
need to scan over an iterator can be added. For instance, the rbd driver
could use this to scan a buffer to see if it is all zeros and libceph could
use this to generate a crc.
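
As a hedged illustration - not part of this patch, and with
iov_iter_crc32c() being an invented name - the libceph crc case might
then look something like the following, using crc32c() from
<linux/crc32c.h>:

    #include <linux/iov_iter.h>
    #include <linux/crc32c.h>

    static __always_inline
    size_t crc32c_step(void *iter_base, size_t progress, size_t len,
                       void *priv, void *priv2)
    {
            u32 *crc = priv;

            *crc = crc32c(*crc, iter_base, len);
            return 0;       /* The whole segment was consumed. */
    }

    /* Checksum the next count bytes of a kernel-backed iterator. */
    static u32 iov_iter_crc32c(struct iov_iter *iter, size_t count, u32 crc)
    {
            if (WARN_ON(user_backed_iter(iter)))
                    return crc;
            iterate_and_advance(iter, count, &crc, NULL, crc32c_step);
            return crc;
    }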

Signed-off-by: David Howells <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: David Laight <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
---
include/linux/iov_iter.h | 261 +++++++++++++++++++++++++++++++++++++++
lib/iov_iter.c | 197 +----------------------------
2 files changed, 262 insertions(+), 196 deletions(-)
create mode 100644 include/linux/iov_iter.h

diff --git a/include/linux/iov_iter.h b/include/linux/iov_iter.h
new file mode 100644
index 000000000000..836854847cdf
--- /dev/null
+++ b/include/linux/iov_iter.h
@@ -0,0 +1,261 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* I/O iterator iteration building functions.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#ifndef _LINUX_IOV_ITER_H
+#define _LINUX_IOV_ITER_H
+
+#include <linux/uio.h>
+#include <linux/bvec.h>
+
+typedef size_t (*iov_step_f)(void *iter_base, size_t progress, size_t len,
+ void *priv, void *priv2);
+typedef size_t (*iov_ustep_f)(void __user *iter_base, size_t progress, size_t len,
+ void *priv, void *priv2);
+
+/*
+ * Handle ITER_UBUF.
+ */
+static __always_inline
+size_t iterate_ubuf(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_ustep_f step)
+{
+ void __user *base = iter->ubuf;
+ size_t progress = 0, remain;
+
+ remain = step(base + iter->iov_offset, 0, len, priv, priv2);
+ progress = len - remain;
+ iter->iov_offset += progress;
+ return progress;
+}
+
+/*
+ * Handle ITER_IOVEC.
+ */
+static __always_inline
+size_t iterate_iovec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_ustep_f step)
+{
+ const struct iovec *p = iter->__iov;
+ size_t progress = 0, skip = iter->iov_offset;
+
+ do {
+ size_t remain, consumed;
+ size_t part = min(len, p->iov_len - skip);
+
+ if (likely(part)) {
+ remain = step(p->iov_base + skip, progress, part, priv, priv2);
+ consumed = part - remain;
+ progress += consumed;
+ skip += consumed;
+ len -= consumed;
+ if (skip < p->iov_len)
+ break;
+ }
+ p++;
+ skip = 0;
+ } while (len);
+
+ iter->nr_segs -= p - iter->__iov;
+ iter->__iov = p;
+ iter->iov_offset = skip;
+ return progress;
+}
+
+/*
+ * Handle ITER_KVEC.
+ */
+static __always_inline
+size_t iterate_kvec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_step_f step)
+{
+ const struct kvec *p = iter->kvec;
+ size_t progress = 0, skip = iter->iov_offset;
+
+ do {
+ size_t remain, consumed;
+ size_t part = min(len, p->iov_len - skip);
+
+ if (likely(part)) {
+ remain = step(p->iov_base + skip, progress, part, priv, priv2);
+ consumed = part - remain;
+ progress += consumed;
+ skip += consumed;
+ len -= consumed;
+ if (skip < p->iov_len)
+ break;
+ }
+ p++;
+ skip = 0;
+ } while (len);
+
+ iter->nr_segs -= p - iter->kvec;
+ iter->kvec = p;
+ iter->iov_offset = skip;
+ return progress;
+}
+
+/*
+ * Handle ITER_BVEC.
+ */
+static __always_inline
+size_t iterate_bvec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_step_f step)
+{
+ const struct bio_vec *p = iter->bvec;
+ size_t progress = 0, skip = iter->iov_offset;
+
+ do {
+ size_t remain, consumed;
+ size_t offset = p->bv_offset + skip, part;
+ void *kaddr = kmap_local_page(p->bv_page + offset / PAGE_SIZE);
+
+ part = min3(len,
+ (size_t)(p->bv_len - skip),
+ (size_t)(PAGE_SIZE - offset % PAGE_SIZE));
+ remain = step(kaddr + offset % PAGE_SIZE, progress, part, priv, priv2);
+ kunmap_local(kaddr);
+ consumed = part - remain;
+ len -= consumed;
+ progress += consumed;
+ skip += consumed;
+ if (skip >= p->bv_len) {
+ skip = 0;
+ p++;
+ }
+ if (remain)
+ break;
+ } while (len);
+
+ iter->nr_segs -= p - iter->bvec;
+ iter->bvec = p;
+ iter->iov_offset = skip;
+ return progress;
+}
+
+/*
+ * Handle ITER_XARRAY.
+ */
+static __always_inline
+size_t iterate_xarray(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_step_f step)
+{
+ struct folio *folio;
+ size_t progress = 0;
+ loff_t start = iter->xarray_start + iter->iov_offset;
+ pgoff_t index = start / PAGE_SIZE;
+ XA_STATE(xas, iter->xarray, index);
+
+ rcu_read_lock();
+ xas_for_each(&xas, folio, ULONG_MAX) {
+ size_t remain, consumed, offset, part, flen;
+
+ if (xas_retry(&xas, folio))
+ continue;
+ if (WARN_ON(xa_is_value(folio)))
+ break;
+ if (WARN_ON(folio_test_hugetlb(folio)))
+ break;
+
+ offset = offset_in_folio(folio, start + progress);
+ flen = min(folio_size(folio) - offset, len);
+
+ while (flen) {
+ void *base = kmap_local_folio(folio, offset);
+
+ part = min_t(size_t, flen,
+ PAGE_SIZE - offset_in_page(offset));
+ remain = step(base, progress, part, priv, priv2);
+ kunmap_local(base);
+
+ consumed = part - remain;
+ progress += consumed;
+ len -= consumed;
+
+ if (remain || len == 0)
+ goto out;
+ flen -= consumed;
+ offset += consumed;
+ }
+ }
+
+out:
+ rcu_read_unlock();
+ iter->iov_offset += progress;
+ return progress;
+}
+
+/**
+ * iterate_and_advance2 - Iterate over an iterator
+ * @iter: The iterator to iterate over.
+ * @len: The amount to iterate over.
+ * @priv: Data for the step functions.
+ * @priv2: More data for the step functions.
+ * @ustep: Function for UBUF/IOVEC iterators; given __user addresses.
+ * @step: Function for other iterators; given kernel addresses.
+ *
+ * Iterate over the next part of an iterator, up to the specified length. The
+ * buffer is presented in segments, which for kernel iteration are broken up by
+ * physical pages and mapped, with the mapped address being presented.
+ *
+ * Two step functions, @step and @ustep, must be provided: @step is given
+ * mapped kernel addresses and @ustep is given user addresses, which have the
+ * potential to fault since no pinning is performed.
+ *
+ * The step functions are passed the address and length of the segment, @priv,
+ * @priv2 and the amount of data so far iterated over (which can, for example,
+ * be added to @priv to point to the right part of a second buffer). The step
+ * functions should return the amount of the segment they didn't process (ie. 0
+ * indicates complete processing).
+ *
+ * This function returns the amount of data processed (ie. 0 means nothing was
+ * processed and a value of @len means it was processed to completion).
+ */
+static __always_inline
+size_t iterate_and_advance2(struct iov_iter *iter, size_t len, void *priv,
+ void *priv2, iov_ustep_f ustep, iov_step_f step)
+{
+ size_t progress;
+
+ if (unlikely(iter->count < len))
+ len = iter->count;
+ if (unlikely(!len))
+ return 0;
+
+ if (likely(iter_is_ubuf(iter)))
+ progress = iterate_ubuf(iter, len, priv, priv2, ustep);
+ else if (likely(iter_is_iovec(iter)))
+ progress = iterate_iovec(iter, len, priv, priv2, ustep);
+ else if (iov_iter_is_bvec(iter))
+ progress = iterate_bvec(iter, len, priv, priv2, step);
+ else if (iov_iter_is_kvec(iter))
+ progress = iterate_kvec(iter, len, priv, priv2, step);
+ else if (iov_iter_is_xarray(iter))
+ progress = iterate_xarray(iter, len, priv, priv2, step);
+ else
+ progress = len;
+ iter->count -= progress;
+ return progress;
+}
+
+/**
+ * iterate_and_advance - Iterate over an iterator
+ * @iter: The iterator to iterate over.
+ * @len: The amount to iterate over.
+ * @priv: Data for the step functions.
+ * @ustep: Function for UBUF/IOVEC iterators; given __user addresses.
+ * @step: Function for other iterators; given kernel addresses.
+ *
+ * As iterate_and_advance2(), but priv2 is always NULL.
+ */
+static __always_inline
+size_t iterate_and_advance(struct iov_iter *iter, size_t len, void *priv,
+ iov_ustep_f ustep, iov_step_f step)
+{
+ return iterate_and_advance2(iter, len, priv, NULL, ustep, step);
+}
+
+#endif /* _LINUX_IOV_ITER_H */
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index b3ce6fa5f7a5..65374ee91ecd 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -13,202 +13,7 @@
#include <net/checksum.h>
#include <linux/scatterlist.h>
#include <linux/instrumented.h>
-
-typedef size_t (*iov_step_f)(void *iter_base, size_t progress, size_t len,
- void *priv, void *priv2);
-typedef size_t (*iov_ustep_f)(void __user *iter_base, size_t progress, size_t len,
- void *priv, void *priv2);
-
-static __always_inline
-size_t iterate_ubuf(struct iov_iter *iter, size_t len, void *priv, void *priv2,
- iov_ustep_f step)
-{
- void __user *base = iter->ubuf;
- size_t progress = 0, remain;
-
- remain = step(base + iter->iov_offset, 0, len, priv, priv2);
- progress = len - remain;
- iter->iov_offset += progress;
- return progress;
-}
-
-static __always_inline
-size_t iterate_iovec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
- iov_ustep_f step)
-{
- const struct iovec *p = iter->__iov;
- size_t progress = 0, skip = iter->iov_offset;
-
- do {
- size_t remain, consumed;
- size_t part = min(len, p->iov_len - skip);
-
- if (likely(part)) {
- remain = step(p->iov_base + skip, progress, part, priv, priv2);
- consumed = part - remain;
- progress += consumed;
- skip += consumed;
- len -= consumed;
- if (skip < p->iov_len)
- break;
- }
- p++;
- skip = 0;
- } while (len);
-
- iter->nr_segs -= p - iter->__iov;
- iter->__iov = p;
- iter->iov_offset = skip;
- return progress;
-}
-
-static __always_inline
-size_t iterate_kvec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
- iov_step_f step)
-{
- const struct kvec *p = iter->kvec;
- size_t progress = 0, skip = iter->iov_offset;
-
- do {
- size_t remain, consumed;
- size_t part = min(len, p->iov_len - skip);
-
- if (likely(part)) {
- remain = step(p->iov_base + skip, progress, part, priv, priv2);
- consumed = part - remain;
- progress += consumed;
- skip += consumed;
- len -= consumed;
- if (skip < p->iov_len)
- break;
- }
- p++;
- skip = 0;
- } while (len);
-
- iter->nr_segs -= p - iter->kvec;
- iter->kvec = p;
- iter->iov_offset = skip;
- return progress;
-}
-
-static __always_inline
-size_t iterate_bvec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
- iov_step_f step)
-{
- const struct bio_vec *p = iter->bvec;
- size_t progress = 0, skip = iter->iov_offset;
-
- do {
- size_t remain, consumed;
- size_t offset = p->bv_offset + skip, part;
- void *kaddr = kmap_local_page(p->bv_page + offset / PAGE_SIZE);
-
- part = min3(len,
- (size_t)(p->bv_len - skip),
- (size_t)(PAGE_SIZE - offset % PAGE_SIZE));
- remain = step(kaddr + offset % PAGE_SIZE, progress, part, priv, priv2);
- kunmap_local(kaddr);
- consumed = part - remain;
- len -= consumed;
- progress += consumed;
- skip += consumed;
- if (skip >= p->bv_len) {
- skip = 0;
- p++;
- }
- if (remain)
- break;
- } while (len);
-
- iter->nr_segs -= p - iter->bvec;
- iter->bvec = p;
- iter->iov_offset = skip;
- return progress;
-}
-
-static __always_inline
-size_t iterate_xarray(struct iov_iter *iter, size_t len, void *priv, void *priv2,
- iov_step_f step)
-{
- struct folio *folio;
- size_t progress = 0;
- loff_t start = iter->xarray_start + iter->iov_offset;
- pgoff_t index = start / PAGE_SIZE;
- XA_STATE(xas, iter->xarray, index);
-
- rcu_read_lock();
- xas_for_each(&xas, folio, ULONG_MAX) {
- size_t remain, consumed, offset, part, flen;
-
- if (xas_retry(&xas, folio))
- continue;
- if (WARN_ON(xa_is_value(folio)))
- break;
- if (WARN_ON(folio_test_hugetlb(folio)))
- break;
-
- offset = offset_in_folio(folio, start + progress);
- flen = min(folio_size(folio) - offset, len);
-
- while (flen) {
- void *base = kmap_local_folio(folio, offset);
-
- part = min_t(size_t, flen,
- PAGE_SIZE - offset_in_page(offset));
- remain = step(base, progress, part, priv, priv2);
- kunmap_local(base);
-
- consumed = part - remain;
- progress += consumed;
- len -= consumed;
-
- if (remain || len == 0)
- goto out;
- flen -= consumed;
- offset += consumed;
- }
- }
-
-out:
- rcu_read_unlock();
- iter->iov_offset += progress;
- return progress;
-}
-
-static __always_inline
-size_t iterate_and_advance2(struct iov_iter *iter, size_t len, void *priv,
- void *priv2, iov_ustep_f ustep, iov_step_f step)
-{
- size_t progress;
-
- if (unlikely(iter->count < len))
- len = iter->count;
- if (unlikely(!len))
- return 0;
-
- if (likely(iter_is_ubuf(iter)))
- progress = iterate_ubuf(iter, len, priv, priv2, ustep);
- else if (likely(iter_is_iovec(iter)))
- progress = iterate_iovec(iter, len, priv, priv2, ustep);
- else if (iov_iter_is_bvec(iter))
- progress = iterate_bvec(iter, len, priv, priv2, step);
- else if (iov_iter_is_kvec(iter))
- progress = iterate_kvec(iter, len, priv, priv2, step);
- else if (iov_iter_is_xarray(iter))
- progress = iterate_xarray(iter, len, priv, priv2, step);
- else
- progress = len;
- iter->count -= progress;
- return progress;
-}
-
-static __always_inline
-size_t iterate_and_advance(struct iov_iter *iter, size_t len, void *priv,
- iov_ustep_f ustep, iov_step_f step)
-{
- return iterate_and_advance2(iter, len, priv, NULL, ustep, step);
-}
+#include <linux/iov_iter.h>

static __always_inline
size_t copy_to_user_iter(void __user *iter_to, size_t progress,

2023-09-13 23:01:24

by David Howells

Subject: [PATCH v4 08/13] iov_iter: Remove the copy_mc flag and associated functions

copy_to/from_iter() now always return a short copy on MCE, so we don't need
to have special handling.
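
For illustration, a hypothetical caller (not part of this patch) now only
needs to check the returned length, which covers both faults and memory
errors:

    #include <linux/uio.h>

    static int read_from_iter(void *buf, size_t len, struct iov_iter *iter)
    {
            size_t copied = copy_from_iter(buf, len, iter);

            if (copied != len)      /* Fault or MCE: only copied bytes are valid. */
                    return -EFAULT;
            return 0;
    }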

Signed-off-by: David Howells <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: David Laight <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
---
include/linux/uio.h | 16 ----------------
lib/iov_iter.c | 5 -----
2 files changed, 21 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index e2a248dad80b..cc250892872e 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -40,7 +40,6 @@ struct iov_iter_state {

struct iov_iter {
u8 iter_type;
- bool copy_mc;
bool nofault;
bool data_source;
union {
@@ -251,22 +250,8 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);

#ifdef CONFIG_ARCH_HAS_COPY_MC
size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
-static inline void iov_iter_set_copy_mc(struct iov_iter *i)
-{
- i->copy_mc = true;
-}
-
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
- return i->copy_mc;
-}
#else
#define _copy_mc_to_iter _copy_to_iter
-static inline void iov_iter_set_copy_mc(struct iov_iter *i) { }
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
- return false;
-}
#endif

size_t iov_iter_zero(size_t bytes, struct iov_iter *);
@@ -381,7 +366,6 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter) {
.iter_type = ITER_UBUF,
- .copy_mc = false,
.data_source = direction,
.ubuf = buf,
.count = count,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index b574601783bc..66d2a16774fe 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -169,7 +169,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter) {
.iter_type = ITER_IOVEC,
- .copy_mc = false,
.nofault = false,
.data_source = direction,
.__iov = iov,
@@ -629,7 +628,6 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter){
.iter_type = ITER_KVEC,
- .copy_mc = false,
.data_source = direction,
.kvec = kvec,
.nr_segs = nr_segs,
@@ -646,7 +644,6 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter){
.iter_type = ITER_BVEC,
- .copy_mc = false,
.data_source = direction,
.bvec = bvec,
.nr_segs = nr_segs,
@@ -675,7 +672,6 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction,
BUG_ON(direction & ~1);
*i = (struct iov_iter) {
.iter_type = ITER_XARRAY,
- .copy_mc = false,
.data_source = direction,
.xarray = xarray,
.xarray_start = start,
@@ -699,7 +695,6 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
BUG_ON(direction != READ);
*i = (struct iov_iter){
.iter_type = ITER_DISCARD,
- .copy_mc = false,
.data_source = false,
.count = count,
.iov_offset = 0

2023-09-13 23:01:44

by David Howells

Subject: [PATCH v4 04/13] iov_iter: Convert iterate*() to inline funcs

Convert the iov_iter iteration macros to inline functions to make the code
easier to follow.

The functions are marked __always_inline as we don't want to end up with
indirect calls in the code. This, however, leaves dealing with ->copy_mc
in an awkward situation since the step function (memcpy_from_iter_mc())
needs to test the flag in the iterator, but isn't passed the iterator.
This will be dealt with in a follow-up patch.

The variable names in the per-type iterator functions have been harmonised
as much as possible and made clearer as to the variable purpose.

Signed-off-by: David Howells <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: David Laight <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v1
Link: https://lore.kernel.org/r/[email protected]/ # v2
Link: https://lore.kernel.org/r/[email protected]/ # v3
---

Notes:
Changes
=======
ver #4)
- Fix iterate_bvec() and iterate_kvec() to update iter->bvec and
iter->kvec only after they've been used to recalculate iter->nr_segs.
- Change iterate_xarray() to use start+progress rather than increasing
start to reduce code size.

ver #3)
- Use min_t(size_t,) not min() to avoid a warning on Hexagon.
- Inline all the step functions.

ver #2)
- Rebased on top of Willy's changes in linux-next.
- Change the checksum argument to the iteration functions to be a general
void* and use it to pass iter->copy_mc flag to memcpy_from_iter_mc() to
avoid using a function pointer.
- Arrange the end of the iterate_*() functions to look the same to give
the optimiser the best chance.
- Make iterate_and_advance() a wrapper around iterate_and_advance2().
- Adjust iterate_and_advance2() to use if-else-if-else-if-else rather than
switch(), to put ITER_BVEC before KVEC and to mark UBUF and IOVEC as
likely().
- Move "iter->count += progress" into iterate_and_advance2() from the
iterate functions.
- Mark a number of the iterator helpers with __always_inline.
- Fix _copy_from_iter_flushcache() to use memcpy_from_iter_flushcache()
not memcpy_from_iter().

lib/iov_iter.c | 611 ++++++++++++++++++++++++++++++-------------------
1 file changed, 370 insertions(+), 241 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 227c9f536b94..b3ce6fa5f7a5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -14,188 +14,263 @@
#include <linux/scatterlist.h>
#include <linux/instrumented.h>

-/* covers ubuf and kbuf alike */
-#define iterate_buf(i, n, base, len, off, __p, STEP) { \
- size_t __maybe_unused off = 0; \
- len = n; \
- base = __p + i->iov_offset; \
- len -= (STEP); \
- i->iov_offset += len; \
- n = len; \
-}
-
-/* covers iovec and kvec alike */
-#define iterate_iovec(i, n, base, len, off, __p, STEP) { \
- size_t off = 0; \
- size_t skip = i->iov_offset; \
- do { \
- len = min(n, __p->iov_len - skip); \
- if (likely(len)) { \
- base = __p->iov_base + skip; \
- len -= (STEP); \
- off += len; \
- skip += len; \
- n -= len; \
- if (skip < __p->iov_len) \
- break; \
- } \
- __p++; \
- skip = 0; \
- } while (n); \
- i->iov_offset = skip; \
- n = off; \
-}
-
-#define iterate_bvec(i, n, base, len, off, p, STEP) { \
- size_t off = 0; \
- unsigned skip = i->iov_offset; \
- while (n) { \
- unsigned offset = p->bv_offset + skip; \
- unsigned left; \
- void *kaddr = kmap_local_page(p->bv_page + \
- offset / PAGE_SIZE); \
- base = kaddr + offset % PAGE_SIZE; \
- len = min(min(n, (size_t)(p->bv_len - skip)), \
- (size_t)(PAGE_SIZE - offset % PAGE_SIZE)); \
- left = (STEP); \
- kunmap_local(kaddr); \
- len -= left; \
- off += len; \
- skip += len; \
- if (skip == p->bv_len) { \
- skip = 0; \
- p++; \
- } \
- n -= len; \
- if (left) \
- break; \
- } \
- i->iov_offset = skip; \
- n = off; \
-}
-
-#define iterate_xarray(i, n, base, len, __off, STEP) { \
- __label__ __out; \
- size_t __off = 0; \
- struct folio *folio; \
- loff_t start = i->xarray_start + i->iov_offset; \
- pgoff_t index = start / PAGE_SIZE; \
- XA_STATE(xas, i->xarray, index); \
- \
- len = PAGE_SIZE - offset_in_page(start); \
- rcu_read_lock(); \
- xas_for_each(&xas, folio, ULONG_MAX) { \
- unsigned left; \
- size_t offset; \
- if (xas_retry(&xas, folio)) \
- continue; \
- if (WARN_ON(xa_is_value(folio))) \
- break; \
- if (WARN_ON(folio_test_hugetlb(folio))) \
- break; \
- offset = offset_in_folio(folio, start + __off); \
- while (offset < folio_size(folio)) { \
- base = kmap_local_folio(folio, offset); \
- len = min(n, len); \
- left = (STEP); \
- kunmap_local(base); \
- len -= left; \
- __off += len; \
- n -= len; \
- if (left || n == 0) \
- goto __out; \
- offset += len; \
- len = PAGE_SIZE; \
- } \
- } \
-__out: \
- rcu_read_unlock(); \
- i->iov_offset += __off; \
- n = __off; \
-}
-
-#define __iterate_and_advance(i, n, base, len, off, I, K) { \
- if (unlikely(i->count < n)) \
- n = i->count; \
- if (likely(n)) { \
- if (likely(iter_is_ubuf(i))) { \
- void __user *base; \
- size_t len; \
- iterate_buf(i, n, base, len, off, \
- i->ubuf, (I)) \
- } else if (likely(iter_is_iovec(i))) { \
- const struct iovec *iov = iter_iov(i); \
- void __user *base; \
- size_t len; \
- iterate_iovec(i, n, base, len, off, \
- iov, (I)) \
- i->nr_segs -= iov - iter_iov(i); \
- i->__iov = iov; \
- } else if (iov_iter_is_bvec(i)) { \
- const struct bio_vec *bvec = i->bvec; \
- void *base; \
- size_t len; \
- iterate_bvec(i, n, base, len, off, \
- bvec, (K)) \
- i->nr_segs -= bvec - i->bvec; \
- i->bvec = bvec; \
- } else if (iov_iter_is_kvec(i)) { \
- const struct kvec *kvec = i->kvec; \
- void *base; \
- size_t len; \
- iterate_iovec(i, n, base, len, off, \
- kvec, (K)) \
- i->nr_segs -= kvec - i->kvec; \
- i->kvec = kvec; \
- } else if (iov_iter_is_xarray(i)) { \
- void *base; \
- size_t len; \
- iterate_xarray(i, n, base, len, off, \
- (K)) \
- } \
- i->count -= n; \
- } \
-}
-#define iterate_and_advance(i, n, base, len, off, I, K) \
- __iterate_and_advance(i, n, base, len, off, I, ((void)(K),0))
-
-static int copyout(void __user *to, const void *from, size_t n)
+typedef size_t (*iov_step_f)(void *iter_base, size_t progress, size_t len,
+ void *priv, void *priv2);
+typedef size_t (*iov_ustep_f)(void __user *iter_base, size_t progress, size_t len,
+ void *priv, void *priv2);
+
+static __always_inline
+size_t iterate_ubuf(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_ustep_f step)
{
- if (should_fail_usercopy())
- return n;
- if (access_ok(to, n)) {
- instrument_copy_to_user(to, from, n);
- n = raw_copy_to_user(to, from, n);
+ void __user *base = iter->ubuf;
+ size_t progress = 0, remain;
+
+ remain = step(base + iter->iov_offset, 0, len, priv, priv2);
+ progress = len - remain;
+ iter->iov_offset += progress;
+ return progress;
+}
+
+static __always_inline
+size_t iterate_iovec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_ustep_f step)
+{
+ const struct iovec *p = iter->__iov;
+ size_t progress = 0, skip = iter->iov_offset;
+
+ do {
+ size_t remain, consumed;
+ size_t part = min(len, p->iov_len - skip);
+
+ if (likely(part)) {
+ remain = step(p->iov_base + skip, progress, part, priv, priv2);
+ consumed = part - remain;
+ progress += consumed;
+ skip += consumed;
+ len -= consumed;
+ if (skip < p->iov_len)
+ break;
+ }
+ p++;
+ skip = 0;
+ } while (len);
+
+ iter->nr_segs -= p - iter->__iov;
+ iter->__iov = p;
+ iter->iov_offset = skip;
+ return progress;
+}
+
+static __always_inline
+size_t iterate_kvec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_step_f step)
+{
+ const struct kvec *p = iter->kvec;
+ size_t progress = 0, skip = iter->iov_offset;
+
+ do {
+ size_t remain, consumed;
+ size_t part = min(len, p->iov_len - skip);
+
+ if (likely(part)) {
+ remain = step(p->iov_base + skip, progress, part, priv, priv2);
+ consumed = part - remain;
+ progress += consumed;
+ skip += consumed;
+ len -= consumed;
+ if (skip < p->iov_len)
+ break;
+ }
+ p++;
+ skip = 0;
+ } while (len);
+
+ iter->nr_segs -= p - iter->kvec;
+ iter->kvec = p;
+ iter->iov_offset = skip;
+ return progress;
+}
+
+static __always_inline
+size_t iterate_bvec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_step_f step)
+{
+ const struct bio_vec *p = iter->bvec;
+ size_t progress = 0, skip = iter->iov_offset;
+
+ do {
+ size_t remain, consumed;
+ size_t offset = p->bv_offset + skip, part;
+ void *kaddr = kmap_local_page(p->bv_page + offset / PAGE_SIZE);
+
+ part = min3(len,
+ (size_t)(p->bv_len - skip),
+ (size_t)(PAGE_SIZE - offset % PAGE_SIZE));
+ remain = step(kaddr + offset % PAGE_SIZE, progress, part, priv, priv2);
+ kunmap_local(kaddr);
+ consumed = part - remain;
+ len -= consumed;
+ progress += consumed;
+ skip += consumed;
+ if (skip >= p->bv_len) {
+ skip = 0;
+ p++;
+ }
+ if (remain)
+ break;
+ } while (len);
+
+ iter->nr_segs -= p - iter->bvec;
+ iter->bvec = p;
+ iter->iov_offset = skip;
+ return progress;
+}
+
+static __always_inline
+size_t iterate_xarray(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_step_f step)
+{
+ struct folio *folio;
+ size_t progress = 0;
+ loff_t start = iter->xarray_start + iter->iov_offset;
+ pgoff_t index = start / PAGE_SIZE;
+ XA_STATE(xas, iter->xarray, index);
+
+ rcu_read_lock();
+ xas_for_each(&xas, folio, ULONG_MAX) {
+ size_t remain, consumed, offset, part, flen;
+
+ if (xas_retry(&xas, folio))
+ continue;
+ if (WARN_ON(xa_is_value(folio)))
+ break;
+ if (WARN_ON(folio_test_hugetlb(folio)))
+ break;
+
+ offset = offset_in_folio(folio, start + progress);
+ flen = min(folio_size(folio) - offset, len);
+
+ while (flen) {
+ void *base = kmap_local_folio(folio, offset);
+
+ part = min_t(size_t, flen,
+ PAGE_SIZE - offset_in_page(offset));
+ remain = step(base, progress, part, priv, priv2);
+ kunmap_local(base);
+
+ consumed = part - remain;
+ progress += consumed;
+ len -= consumed;
+
+ if (remain || len == 0)
+ goto out;
+ flen -= consumed;
+ offset += consumed;
+ }
}
- return n;
+
+out:
+ rcu_read_unlock();
+ iter->iov_offset += progress;
+ return progress;
}

-static int copyout_nofault(void __user *to, const void *from, size_t n)
+static __always_inline
+size_t iterate_and_advance2(struct iov_iter *iter, size_t len, void *priv,
+ void *priv2, iov_ustep_f ustep, iov_step_f step)
{
- long res;
+ size_t progress;

+ if (unlikely(iter->count < len))
+ len = iter->count;
+ if (unlikely(!len))
+ return 0;
+
+ if (likely(iter_is_ubuf(iter)))
+ progress = iterate_ubuf(iter, len, priv, priv2, ustep);
+ else if (likely(iter_is_iovec(iter)))
+ progress = iterate_iovec(iter, len, priv, priv2, ustep);
+ else if (iov_iter_is_bvec(iter))
+ progress = iterate_bvec(iter, len, priv, priv2, step);
+ else if (iov_iter_is_kvec(iter))
+ progress = iterate_kvec(iter, len, priv, priv2, step);
+ else if (iov_iter_is_xarray(iter))
+ progress = iterate_xarray(iter, len, priv, priv2, step);
+ else
+ progress = len;
+ iter->count -= progress;
+ return progress;
+}
+
+static __always_inline
+size_t iterate_and_advance(struct iov_iter *iter, size_t len, void *priv,
+ iov_ustep_f ustep, iov_step_f step)
+{
+ return iterate_and_advance2(iter, len, priv, NULL, ustep, step);
+}
+
+static __always_inline
+size_t copy_to_user_iter(void __user *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
if (should_fail_usercopy())
- return n;
+ return len;
+ if (access_ok(iter_to, len)) {
+ from += progress;
+ instrument_copy_to_user(iter_to, from, len);
+ len = raw_copy_to_user(iter_to, from, len);
+ }
+ return len;
+}

- res = copy_to_user_nofault(to, from, n);
+static __always_inline
+size_t copy_to_user_iter_nofault(void __user *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ ssize_t res;

- return res < 0 ? n : res;
+ if (should_fail_usercopy())
+ return len;
+
+ from += progress;
+ res = copy_to_user_nofault(iter_to, from, len);
+ return res < 0 ? len : res;
}

-static int copyin(void *to, const void __user *from, size_t n)
+static __always_inline
+size_t copy_from_user_iter(void __user *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
{
- size_t res = n;
+ size_t res = len;

if (should_fail_usercopy())
- return n;
- if (access_ok(from, n)) {
- instrument_copy_from_user_before(to, from, n);
- res = raw_copy_from_user(to, from, n);
- instrument_copy_from_user_after(to, from, n, res);
+ return len;
+ if (access_ok(iter_from, len)) {
+ to += progress;
+ instrument_copy_from_user_before(to, iter_from, len);
+ res = raw_copy_from_user(to, iter_from, len);
+ instrument_copy_from_user_after(to, iter_from, len, res);
}
return res;
}

+static __always_inline
+size_t memcpy_to_iter(void *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ memcpy(iter_to, from + progress, len);
+ return 0;
+}
+
+static __always_inline
+size_t memcpy_from_iter(void *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ memcpy(to + progress, iter_from, len);
+ return 0;
+}
+
/*
* fault_in_iov_iter_readable - fault in iov iterator for reading
* @i: iterator
@@ -312,23 +387,29 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
return 0;
if (user_backed_iter(i))
might_fault();
- iterate_and_advance(i, bytes, base, len, off,
- copyout(base, addr + off, len),
- memcpy(base, addr + off, len)
- )
-
- return bytes;
+ return iterate_and_advance(i, bytes, (void *)addr,
+ copy_to_user_iter, memcpy_to_iter);
}
EXPORT_SYMBOL(_copy_to_iter);

#ifdef CONFIG_ARCH_HAS_COPY_MC
-static int copyout_mc(void __user *to, const void *from, size_t n)
-{
- if (access_ok(to, n)) {
- instrument_copy_to_user(to, from, n);
- n = copy_mc_to_user((__force void *) to, from, n);
+static __always_inline
+size_t copy_to_user_iter_mc(void __user *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ if (access_ok(iter_to, len)) {
+ from += progress;
+ instrument_copy_to_user(iter_to, from, len);
+ len = copy_mc_to_user(iter_to, from, len);
}
- return n;
+ return len;
+}
+
+static __always_inline
+size_t memcpy_to_iter_mc(void *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ return copy_mc_to_kernel(iter_to, from + progress, len);
}

/**
@@ -361,22 +442,20 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
return 0;
if (user_backed_iter(i))
might_fault();
- __iterate_and_advance(i, bytes, base, len, off,
- copyout_mc(base, addr + off, len),
- copy_mc_to_kernel(base, addr + off, len)
- )
-
- return bytes;
+ return iterate_and_advance(i, bytes, (void *)addr,
+ copy_to_user_iter_mc, memcpy_to_iter_mc);
}
EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
#endif /* CONFIG_ARCH_HAS_COPY_MC */

-static void *memcpy_from_iter(struct iov_iter *i, void *to, const void *from,
- size_t size)
+static size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
{
- if (iov_iter_is_copy_mc(i))
- return (void *)copy_mc_to_kernel(to, from, size);
- return memcpy(to, from, size);
+ struct iov_iter *iter = priv2;
+
+ if (iov_iter_is_copy_mc(iter))
+ return copy_mc_to_kernel(to + progress, iter_from, len);
+ return memcpy_from_iter(iter_from, progress, len, to, priv2);
}

size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
@@ -386,30 +465,46 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)

if (user_backed_iter(i))
might_fault();
- iterate_and_advance(i, bytes, base, len, off,
- copyin(addr + off, base, len),
- memcpy_from_iter(i, addr + off, base, len)
- )
-
- return bytes;
+ return iterate_and_advance2(i, bytes, addr, i,
+ copy_from_user_iter,
+ memcpy_from_iter_mc);
}
EXPORT_SYMBOL(_copy_from_iter);

+static __always_inline
+size_t copy_from_user_iter_nocache(void __user *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ return __copy_from_user_inatomic_nocache(to + progress, iter_from, len);
+}
+
size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
{
if (WARN_ON_ONCE(!i->data_source))
return 0;

- iterate_and_advance(i, bytes, base, len, off,
- __copy_from_user_inatomic_nocache(addr + off, base, len),
- memcpy(addr + off, base, len)
- )
-
- return bytes;
+ return iterate_and_advance(i, bytes, addr,
+ copy_from_user_iter_nocache,
+ memcpy_from_iter);
}
EXPORT_SYMBOL(_copy_from_iter_nocache);

#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+static __always_inline
+size_t copy_from_user_iter_flushcache(void __user *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ return __copy_from_user_flushcache(to + progress, iter_from, len);
+}
+
+static __always_inline
+size_t memcpy_from_iter_flushcache(void *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ memcpy_flushcache(to + progress, iter_from, len);
+ return 0;
+}
+
/**
* _copy_from_iter_flushcache - write destination through cpu cache
* @addr: destination kernel address
@@ -431,12 +526,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
if (WARN_ON_ONCE(!i->data_source))
return 0;

- iterate_and_advance(i, bytes, base, len, off,
- __copy_from_user_flushcache(addr + off, base, len),
- memcpy_flushcache(addr + off, base, len)
- )
-
- return bytes;
+ return iterate_and_advance(i, bytes, addr,
+ copy_from_user_iter_flushcache,
+ memcpy_from_iter_flushcache);
}
EXPORT_SYMBOL_GPL(_copy_from_iter_flushcache);
#endif
@@ -508,10 +600,9 @@ size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t byte
void *kaddr = kmap_local_page(page);
size_t n = min(bytes, (size_t)PAGE_SIZE - offset);

- iterate_and_advance(i, n, base, len, off,
- copyout_nofault(base, kaddr + offset + off, len),
- memcpy(base, kaddr + offset + off, len)
- )
+ n = iterate_and_advance(i, n, kaddr + offset,
+ copy_to_user_iter_nofault,
+ memcpy_to_iter);
kunmap_local(kaddr);
res += n;
bytes -= n;
@@ -554,14 +645,25 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
}
EXPORT_SYMBOL(copy_page_from_iter);

-size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
+static __always_inline
+size_t zero_to_user_iter(void __user *iter_to, size_t progress,
+ size_t len, void *priv, void *priv2)
{
- iterate_and_advance(i, bytes, base, len, count,
- clear_user(base, len),
- memset(base, 0, len)
- )
+ return clear_user(iter_to, len);
+}

- return bytes;
+static __always_inline
+size_t zero_to_iter(void *iter_to, size_t progress,
+ size_t len, void *priv, void *priv2)
+{
+ memset(iter_to, 0, len);
+ return 0;
+}
+
+size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
+{
+ return iterate_and_advance(i, bytes, NULL,
+ zero_to_user_iter, zero_to_iter);
}
EXPORT_SYMBOL(iov_iter_zero);

@@ -586,10 +688,9 @@ size_t copy_page_from_iter_atomic(struct page *page, size_t offset,
}

p = kmap_atomic(page) + offset;
- iterate_and_advance(i, n, base, len, off,
- copyin(p + off, base, len),
- memcpy_from_iter(i, p + off, base, len)
- )
+ n = iterate_and_advance2(i, n, p, i,
+ copy_from_user_iter,
+ memcpy_from_iter_mc);
kunmap_atomic(p);
copied += n;
offset += n;
@@ -1180,32 +1281,64 @@ ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
}
EXPORT_SYMBOL(iov_iter_get_pages_alloc2);

+static __always_inline
+size_t copy_from_user_iter_csum(void __user *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ __wsum next, *csum = priv2;
+
+ next = csum_and_copy_from_user(iter_from, to + progress, len);
+ *csum = csum_block_add(*csum, next, progress);
+ return next ? 0 : len;
+}
+
+static __always_inline
+size_t memcpy_from_iter_csum(void *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ __wsum *csum = priv2;
+
+ *csum = csum_and_memcpy(to + progress, iter_from, len, *csum, progress);
+ return 0;
+}
+
size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
struct iov_iter *i)
{
- __wsum sum, next;
- sum = *csum;
if (WARN_ON_ONCE(!i->data_source))
return 0;
-
- iterate_and_advance(i, bytes, base, len, off, ({
- next = csum_and_copy_from_user(base, addr + off, len);
- sum = csum_block_add(sum, next, off);
- next ? 0 : len;
- }), ({
- sum = csum_and_memcpy(addr + off, base, len, sum, off);
- })
- )
- *csum = sum;
- return bytes;
+ return iterate_and_advance2(i, bytes, addr, csum,
+ copy_from_user_iter_csum,
+ memcpy_from_iter_csum);
}
EXPORT_SYMBOL(csum_and_copy_from_iter);

+static __always_inline
+size_t copy_to_user_iter_csum(void __user *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ __wsum next, *csum = priv2;
+
+ next = csum_and_copy_to_user(from + progress, iter_to, len);
+ *csum = csum_block_add(*csum, next, progress);
+ return next ? 0 : len;
+}
+
+static __always_inline
+size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ __wsum *csum = priv2;
+
+ *csum = csum_and_memcpy(iter_to, from + progress, len, *csum, progress);
+ return 0;
+}
+
size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
struct iov_iter *i)
{
struct csum_state *csstate = _csstate;
- __wsum sum, next;
+ __wsum sum;

if (WARN_ON_ONCE(i->data_source))
return 0;
@@ -1219,14 +1352,10 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
}

sum = csum_shift(csstate->csum, csstate->off);
- iterate_and_advance(i, bytes, base, len, off, ({
- next = csum_and_copy_to_user(addr + off, base, len);
- sum = csum_block_add(sum, next, off);
- next ? 0 : len;
- }), ({
- sum = csum_and_memcpy(base, addr + off, len, sum, off);
- })
- )
+
+ bytes = iterate_and_advance2(i, bytes, (void *)addr, &sum,
+ copy_to_user_iter_csum,
+ memcpy_to_iter_csum);
csstate->csum = csum_shift(sum, csstate->off);
csstate->off += bytes;
return bytes;

2023-09-13 23:17:40

by David Howells

Subject: [PATCH v4 01/13] iov_iter: Add a benchmarking kunit test
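
Add three tests to the iov_iter kunit tests that benchmark copying 256MiB
from one buffer to another via a single whole-buffer BVEC, via dynamically
allocated BVEC chunks and via a whole-buffer XARRAY, dumping the timings
to dmesg.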

---
lib/kunit_iov_iter.c | 181 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 181 insertions(+)

diff --git a/lib/kunit_iov_iter.c b/lib/kunit_iov_iter.c
index 859b67c4d697..478fea956f58 100644
--- a/lib/kunit_iov_iter.c
+++ b/lib/kunit_iov_iter.c
@@ -756,6 +756,184 @@ static void __init iov_kunit_extract_pages_xarray(struct kunit *test)
KUNIT_SUCCEED();
}

+static void iov_kunit_free_page(void *data)
+{
+ __free_page(data);
+}
+
+static void __init iov_kunit_benchmark_print_stats(struct kunit *test,
+ unsigned int *samples)
+{
+ unsigned long total = 0;
+ int i;
+
+ for (i = 0; i < 16; i++) {
+ total += samples[i];
+ kunit_info(test, "run %x: %u uS\n", i, samples[i]);
+ }
+
+ kunit_info(test, "avg %lu uS\n", total / 16);
+}
+
+/*
+ * Time copying 256MiB through an ITER_BVEC.
+ */
+static void __init iov_kunit_benchmark_bvec(struct kunit *test)
+{
+ struct iov_iter iter;
+ struct bio_vec *bvec;
+ struct page *page, **pages;
+ unsigned int samples[16];
+ ktime_t a, b;
+ ssize_t copied;
+ size_t size = 256 * 1024 * 1024, npages = size / PAGE_SIZE;
+ void *scratch;
+ int i;
+
+ /* Allocate a page and tile it repeatedly in the buffer. */
+ page = alloc_page(GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, page);
+ kunit_add_action_or_reset(test, iov_kunit_free_page, page);
+
+ bvec = kunit_kmalloc_array(test, npages, sizeof(bvec[0]), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, bvec);
+ for (i = 0; i < npages; i++)
+ bvec_set_page(&bvec[i], page, PAGE_SIZE, 0);
+
+ /* Create a single large buffer to copy to/from. */
+ pages = kunit_kmalloc_array(test, npages, sizeof(pages[0]), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, pages);
+ for (i = 0; i < npages; i++)
+ pages[i] = page;
+
+ scratch = vmap(pages, npages, VM_MAP | VM_MAP_PUT_PAGES, PAGE_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, scratch);
+ kunit_add_action_or_reset(test, iov_kunit_unmap, scratch);
+
+ /* Perform and time a bunch of copies. */
+ kunit_info(test, "Benchmarking copy_to_iter() over BVEC:\n");
+ for (i = 0; i < 16; i++) {
+ iov_iter_bvec(&iter, ITER_DEST, bvec, npages, size);
+ a = ktime_get_real();
+ copied = copy_to_iter(scratch, size, &iter);
+ b = ktime_get_real();
+ KUNIT_EXPECT_EQ(test, copied, size);
+ samples[i] = ktime_to_us(ktime_sub(b, a));
+ }
+
+ iov_kunit_benchmark_print_stats(test, samples);
+ KUNIT_SUCCEED();
+}
+
+/*
+ * Time copying 256MiB through an ITER_BVEC in 256 page chunks.
+ */
+static void __init iov_kunit_benchmark_bvec_split(struct kunit *test)
+{
+ struct iov_iter iter;
+ struct bio_vec *bvec;
+ struct page *page, **pages;
+ unsigned int samples[16];
+ ktime_t a, b;
+ ssize_t copied;
+ size_t size, npages = 64;
+ void *scratch;
+ int i, j;
+
+ /* Allocate a page and tile it repeatedly in the buffer. */
+ page = alloc_page(GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, page);
+ kunit_add_action_or_reset(test, iov_kunit_free_page, page);
+
+ /* Create a single large buffer to copy to/from. */
+ pages = kunit_kmalloc_array(test, npages, sizeof(pages[0]), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, pages);
+ for (i = 0; i < npages; i++)
+ pages[i] = page;
+
+ scratch = vmap(pages, npages, VM_MAP | VM_MAP_PUT_PAGES, PAGE_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, scratch);
+ kunit_add_action_or_reset(test, iov_kunit_unmap, scratch);
+
+ /* Perform and time a bunch of copies. */
+ kunit_info(test, "Benchmarking copy_to_iter() over BVEC:\n");
+ for (i = 0; i < 16; i++) {
+ size = 256 * 1024 * 1024;
+ a = ktime_get_real();
+ do {
+ size_t part = min(size, npages * PAGE_SIZE);
+
+ bvec = kunit_kmalloc_array(test, npages, sizeof(bvec[0]), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, bvec);
+ for (j = 0; j < npages; j++)
+ bvec_set_page(&bvec[j], page, PAGE_SIZE, 0);
+
+ iov_iter_bvec(&iter, ITER_DEST, bvec, npages, part);
+ copied = copy_to_iter(scratch, part, &iter);
+ KUNIT_EXPECT_EQ(test, copied, part);
+ size -= part;
+ } while (size > 0);
+ b = ktime_get_real();
+ samples[i] = ktime_to_us(ktime_sub(b, a));
+ }
+
+ iov_kunit_benchmark_print_stats(test, samples);
+ KUNIT_SUCCEED();
+}
+
+/*
+ * Time copying 256MiB through an ITER_XARRAY.
+ */
+static void __init iov_kunit_benchmark_xarray(struct kunit *test)
+{
+ struct iov_iter iter;
+ struct xarray *xarray;
+ struct page *page, **pages;
+ unsigned int samples[16];
+ ktime_t a, b;
+ ssize_t copied;
+ size_t size = 256 * 1024 * 1024, npages = size / PAGE_SIZE;
+ void *scratch;
+ int i;
+
+ /* Allocate a page and tile it repeatedly in the buffer. */
+ page = alloc_page(GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, page);
+ kunit_add_action_or_reset(test, iov_kunit_free_page, page);
+
+ xarray = iov_kunit_create_xarray(test);
+
+ for (i = 0; i < npages; i++) {
+ void *x = xa_store(xarray, i, page, GFP_KERNEL);
+
+ KUNIT_ASSERT_FALSE(test, xa_is_err(x));
+ }
+
+ /* Create a single large buffer to copy to/from. */
+ pages = kunit_kmalloc_array(test, npages, sizeof(pages[0]), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, pages);
+ for (i = 0; i < npages; i++)
+ pages[i] = page;
+
+ scratch = vmap(pages, npages, VM_MAP | VM_MAP_PUT_PAGES, PAGE_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, scratch);
+ kunit_add_action_or_reset(test, iov_kunit_unmap, scratch);
+
+ /* Perform and time a bunch of copies. */
+ kunit_info(test, "Benchmarking copy_to_iter() over XARRAY:\n");
+ for (i = 0; i < 16; i++) {
+ iov_iter_xarray(&iter, ITER_DEST, xarray, 0, size);
+ a = ktime_get_real();
+ copied = copy_to_iter(scratch, size, &iter);
+ b = ktime_get_real();
+ KUNIT_EXPECT_EQ(test, copied, size);
+ samples[i] = ktime_to_us(ktime_sub(b, a));
+ }
+
+ iov_kunit_benchmark_print_stats(test, samples);
+ KUNIT_SUCCEED();
+}
+
static struct kunit_case __refdata iov_kunit_cases[] = {
KUNIT_CASE(iov_kunit_copy_to_kvec),
KUNIT_CASE(iov_kunit_copy_from_kvec),
@@ -766,6 +944,9 @@ static struct kunit_case __refdata iov_kunit_cases[] = {
KUNIT_CASE(iov_kunit_extract_pages_kvec),
KUNIT_CASE(iov_kunit_extract_pages_bvec),
KUNIT_CASE(iov_kunit_extract_pages_xarray),
+ KUNIT_CASE(iov_kunit_benchmark_bvec),
+ KUNIT_CASE(iov_kunit_benchmark_bvec_split),
+ KUNIT_CASE(iov_kunit_benchmark_xarray),
{}
};


2023-09-14 00:23:09

by David Howells

Subject: [PATCH v4 13/13] iov_iter: Create a fake device to allow iov_iter testing/benchmarking

Create a fake device to allow testing and benchmarking of UBUF and IOVEC
iterators. /dev/iov-test is created and can be written with pwritev(), in
which case it copies everything to a sink page, or read with preadv(), in
which case it copies repeatedly from a patterned page.

The time taken is logged with tracepoints.

This can be driven by something like:

echo 1 >/sys/kernel/debug/tracing/events/iov_test/enable
cmd="r -b 1M -V 256 0 256M"; xfs_io -c "open /dev/iov-test" \
-c "$cmd" -c "$cmd" -c "$cmd" -c "$cmd" \
-c "$cmd" -c "$cmd" -c "$cmd" -c "$cmd" \
-c "$cmd" -c "$cmd" -c "$cmd" -c "$cmd" \
-c "$cmd" -c "$cmd" -c "$cmd" -c "$cmd"
cmd="w -b 1M -V 256 0 256M"; xfs_io -c "open /dev/iov-test" \
-c "$cmd" -c "$cmd" -c "$cmd" -c "$cmd" \
-c "$cmd" -c "$cmd" -c "$cmd" -c "$cmd" \
-c "$cmd" -c "$cmd" -c "$cmd" -c "$cmd" \
-c "$cmd" -c "$cmd" -c "$cmd" -c "$cmd"

showing something like:

...: iov_test_read: size=10000000 done=10000000 ty=1 nr=256 dur=27653
...: iov_test_write: size=10000000 done=10000000 ty=1 nr=256 dur=31792

in the trace log.

Signed-off-by: David Howells <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: David Laight <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
---
lib/Kconfig.debug | 8 +++
lib/Makefile | 1 +
lib/test_iov_iter.c | 134 ++++++++++++++++++++++++++++++++++++++
lib/test_iov_iter_trace.h | 80 +++++++++++++++++++++++
4 files changed, 223 insertions(+)
create mode 100644 lib/test_iov_iter.c
create mode 100644 lib/test_iov_iter_trace.h

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fa307f93fa2e..cf8392c51344 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2524,6 +2524,14 @@ config TEST_SYSCTL

If unsure, say N.

+config TEST_IOV_ITER
+ tristate "iov_iter test driver"
+ help
+ This creates a misc device that can be used as a way to test various
+ I/O iterator functions through the use of readv/writev and ioctl.
+
+ If unsure, say N.
+
config BITFIELD_KUNIT
tristate "KUnit test bitfield functions at runtime" if !KUNIT_ALL_TESTS
depends on KUNIT
diff --git a/lib/Makefile b/lib/Makefile
index 740109b6e2c8..f6419544a749 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -65,6 +65,7 @@ CFLAGS_test_bitops.o += -Werror
obj-$(CONFIG_CPUMASK_KUNIT_TEST) += cpumask_kunit.o
obj-$(CONFIG_TEST_SYSCTL) += test_sysctl.o
obj-$(CONFIG_TEST_IOV_ITER) += kunit_iov_iter.o
+obj-$(CONFIG_TEST_IOV_ITER) += test_iov_iter.o
obj-$(CONFIG_HASH_KUNIT_TEST) += test_hash.o
obj-$(CONFIG_TEST_IDA) += test_ida.o
obj-$(CONFIG_TEST_UBSAN) += test_ubsan.o
diff --git a/lib/test_iov_iter.c b/lib/test_iov_iter.c
new file mode 100644
index 000000000000..afa70647dbde
--- /dev/null
+++ b/lib/test_iov_iter.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* I/O iterator testing device.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/uio.h>
+#include <linux/fs.h>
+#include <linux/pagemap.h>
+#include <linux/miscdevice.h>
+#define CREATE_TRACE_POINTS
+#include "test_iov_iter_trace.h"
+
+MODULE_DESCRIPTION("iov_iter testing");
+MODULE_AUTHOR("David Howells <[email protected]>");
+MODULE_LICENSE("GPL");
+
+static ssize_t iov_test_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+ struct folio *folio = iocb->ki_filp->private_data;
+ unsigned int nr_segs = iter->nr_segs;
+ size_t size = iov_iter_count(iter), fsize = folio_size(folio);
+ size_t copied = 0, offset = 0, i;
+ ktime_t a, b;
+ u8 *p;
+
+ /* Pattern the buffer */
+ p = kmap_local_folio(folio, 0);
+ for (i = 0; i < folio_size(folio); i++)
+ p[i] = i & 0xff;
+ kunmap_local(p);
+
+ a = ktime_get_real();
+ while (iov_iter_count(iter)) {
+ size_t done, part = min(iov_iter_count(iter), fsize - offset);
+
+ done = copy_folio_to_iter(folio, offset, part, iter);
+ if (done == 0)
+ break;
+ copied += done;
+ offset = (offset + done) & (fsize - 1);
+ }
+
+ b = ktime_get_real();
+ trace_iov_test_read(size, copied, iov_iter_type(iter), nr_segs,
+ ktime_to_us(ktime_sub(b, a)));
+ return copied;
+}
+
+static ssize_t iov_test_write_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+ struct folio *folio = iocb->ki_filp->private_data;
+ unsigned int nr_segs = iter->nr_segs;
+ size_t size = iov_iter_count(iter), fsize = folio_size(folio);
+ size_t copied = 0, offset = 0;
+ ktime_t a = ktime_get_real(), b;
+
+ while (iov_iter_count(iter)) {
+ size_t done, part = min(iov_iter_count(iter), fsize - offset);
+
+ done = copy_page_from_iter(folio_page(folio, 0), offset, part, iter);
+ if (done == 0)
+ break;
+ copied += done;
+ offset = (offset + done) & (fsize - 1);
+ }
+
+ b = ktime_get_real();
+ trace_iov_test_write(size, copied, iov_iter_type(iter), nr_segs,
+ ktime_to_us(ktime_sub(b, a)));
+ return copied;
+}
+
+static int iov_test_open(struct inode *inode, struct file *file)
+{
+ struct folio *folio;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ folio = folio_alloc(GFP_KERNEL, 0);
+ if (!folio)
+ return -ENOMEM;
+ file->private_data = folio;
+ return 0;
+}
+
+static int iov_test_release(struct inode *inode, struct file *file)
+{
+ struct folio *folio = file->private_data;
+
+ folio_put(folio);
+ return 0;
+}
+
+static const struct file_operations iov_test_fops = {
+ .owner = THIS_MODULE,
+ .open = iov_test_open,
+ .release = iov_test_release,
+ .read_iter = iov_test_read_iter,
+ .write_iter = iov_test_write_iter,
+ .splice_read = copy_splice_read,
+ .llseek = noop_llseek,
+};
+
+static struct miscdevice iov_test_dev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "iov-test",
+ .fops = &iov_test_fops,
+};
+
+static int __init iov_test_init(void)
+{
+ int ret;
+
+ ret = misc_register(&iov_test_dev);
+ if (ret < 0)
+ return ret;
+ pr_info("Loaded\n");
+ return 0;
+}
+module_init(iov_test_init);
+
+static void __exit iov_test_exit(void)
+{
+ pr_info("Unloading\n");
+
+ misc_deregister(&iov_test_dev);
+}
+module_exit(iov_test_exit);
diff --git a/lib/test_iov_iter_trace.h b/lib/test_iov_iter_trace.h
new file mode 100644
index 000000000000..b99cade5d004
--- /dev/null
+++ b/lib/test_iov_iter_trace.h
@@ -0,0 +1,80 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* I/O iterator testing device.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM iov_test
+
+#if !defined(_IOV_TEST_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _IOV_TEST_TRACE_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(iov_test_read,
+ TP_PROTO(size_t size, size_t done, enum iter_type type,
+ unsigned int nr_segs, u64 duration),
+
+ TP_ARGS(size, done, type, nr_segs, duration),
+
+ TP_STRUCT__entry(
+ __field(size_t, size)
+ __field(size_t, done)
+ __field(enum iter_type, type)
+ __field(unsigned int, nr_segs)
+ __field(u64, duration)
+ ),
+
+ TP_fast_assign(
+ __entry->size = size;
+ __entry->done = done;
+ __entry->type = type;
+ __entry->nr_segs = nr_segs;
+ __entry->duration = duration;
+ ),
+
+ TP_printk("size=%zx done=%zx ty=%u nr=%u dur=%llu",
+ __entry->size,
+ __entry->done,
+ __entry->type,
+ __entry->nr_segs,
+ __entry->duration)
+ );
+
+TRACE_EVENT(iov_test_write,
+ TP_PROTO(size_t size, size_t done, enum iter_type type,
+ unsigned int nr_segs, u64 duration),
+
+ TP_ARGS(size, done, type, nr_segs, duration),
+
+ TP_STRUCT__entry(
+ __field(size_t, size)
+ __field(size_t, done)
+ __field(enum iter_type, type)
+ __field(unsigned int, nr_segs)
+ __field(u64, duration)
+ ),
+
+ TP_fast_assign(
+ __entry->size = size;
+ __entry->done = done;
+ __entry->type = type;
+ __entry->nr_segs = nr_segs;
+ __entry->duration = duration;
+ ),
+
+ TP_printk("size=%zx done=%zx ty=%u nr=%u dur=%llu",
+ __entry->size,
+ __entry->done,
+ __entry->type,
+ __entry->nr_segs,
+ __entry->duration)
+ );
+
+#endif /* _IOV_TEST_TRACE_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE test_iov_iter_trace
+#include <trace/define_trace.h>

2023-09-14 03:47:48

by David Howells

[permalink] [raw]
Subject: [PATCH v4 10/13] iov_iter, net: Fold in csum_and_memcpy()

Fold csum_and_memcpy() into its two callers, memcpy_to_iter_csum() and
memcpy_from_iter_csum(), and remove the now-unused helper from
linux/skbuff.h.

Signed-off-by: David Howells <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: David Laight <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
---
include/linux/skbuff.h | 7 -------
net/core/datagram.c | 3 ++-
net/core/skbuff.c | 3 ++-
3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d0656cc11c16..c81ef5d76953 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3679,13 +3679,6 @@ static inline int __must_check skb_put_padto(struct sk_buff *skb, unsigned int l
return __skb_put_padto(skb, len, true);
}

-static inline __wsum csum_and_memcpy(void *to, const void *from, size_t len,
- __wsum sum, size_t off)
-{
- __wsum next = csum_partial_copy_nocheck(from, to, len);
- return csum_block_add(sum, next, off);
-}
-
struct csum_state {
__wsum csum;
size_t off;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 37c89d0933b7..452620dd41e4 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -732,8 +732,9 @@ size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
size_t len, void *from, void *priv2)
{
__wsum *csum = priv2;
+ __wsum next = csum_partial_copy_nocheck(from, iter_to, len);

- *csum = csum_and_memcpy(iter_to, from + progress, len, *csum, progress);
+ *csum = csum_block_add(*csum, next, progress);
return 0;
}

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5dbdfce2d05f..3efed86321db 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6938,8 +6938,9 @@ size_t memcpy_from_iter_csum(void *iter_from, size_t progress,
size_t len, void *to, void *priv2)
{
__wsum *csum = priv2;
+ __wsum next = csum_partial_copy_nocheck(iter_from, to + progress, len);

- *csum = csum_and_memcpy(to + progress, iter_from, len, *csum, progress);
+ *csum = csum_block_add(*csum, next, progress);
return 0;
}


2023-09-14 05:32:51

by David Howells

[permalink] [raw]
Subject: [PATCH v4 09/13] iov_iter, net: Move csum_and_copy_to/from_iter() to net/

Move csum_and_copy_to/from_iter() to net code now that the iteration
framework can be #included.

Signed-off-by: David Howells <[email protected]>
---
include/linux/skbuff.h | 25 ++++++++++++
include/linux/uio.h | 18 ---------
lib/iov_iter.c | 89 ------------------------------------------
net/core/datagram.c | 50 +++++++++++++++++++++++-
net/core/skbuff.c | 33 ++++++++++++++++
5 files changed, 107 insertions(+), 108 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4174c4b82d13..d0656cc11c16 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3679,6 +3679,31 @@ static inline int __must_check skb_put_padto(struct sk_buff *skb, unsigned int l
return __skb_put_padto(skb, len, true);
}

+static inline __wsum csum_and_memcpy(void *to, const void *from, size_t len,
+ __wsum sum, size_t off)
+{
+ __wsum next = csum_partial_copy_nocheck(from, to, len);
+ return csum_block_add(sum, next, off);
+}
+
+struct csum_state {
+ __wsum csum;
+ size_t off;
+};
+
+size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
+
+static __always_inline __must_check
+bool csum_and_copy_from_iter_full(void *addr, size_t bytes,
+ __wsum *csum, struct iov_iter *i)
+{
+ size_t copied = csum_and_copy_from_iter(addr, bytes, csum, i);
+ if (likely(copied == bytes))
+ return true;
+ iov_iter_revert(i, copied);
+ return false;
+}
+
static inline int skb_add_data(struct sk_buff *skb,
struct iov_iter *from, int copy)
{
diff --git a/include/linux/uio.h b/include/linux/uio.h
index cc250892872e..c335b95626af 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -326,24 +326,6 @@ iov_iter_npages_cap(struct iov_iter *i, int maxpages, size_t max_bytes)
return npages;
}

-struct csum_state {
- __wsum csum;
- size_t off;
-};
-
-size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csstate, struct iov_iter *i);
-size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
-
-static __always_inline __must_check
-bool csum_and_copy_from_iter_full(void *addr, size_t bytes,
- __wsum *csum, struct iov_iter *i)
-{
- size_t copied = csum_and_copy_from_iter(addr, bytes, csum, i);
- if (likely(copied == bytes))
- return true;
- iov_iter_revert(i, copied);
- return false;
-}
size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
struct iov_iter *i);

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 66d2a16774fe..5b2d053f057f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -10,7 +10,6 @@
#include <linux/vmalloc.h>
#include <linux/splice.h>
#include <linux/compat.h>
-#include <net/checksum.h>
#include <linux/scatterlist.h>
#include <linux/instrumented.h>
#include <linux/iov_iter.h>
@@ -179,13 +178,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
}
EXPORT_SYMBOL(iov_iter_init);

-static __wsum csum_and_memcpy(void *to, const void *from, size_t len,
- __wsum sum, size_t off)
-{
- __wsum next = csum_partial_copy_nocheck(from, to, len);
- return csum_block_add(sum, next, off);
-}
-
size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
{
if (WARN_ON_ONCE(i->data_source))
@@ -1079,87 +1071,6 @@ ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
}
EXPORT_SYMBOL(iov_iter_get_pages_alloc2);

-static __always_inline
-size_t copy_from_user_iter_csum(void __user *iter_from, size_t progress,
- size_t len, void *to, void *priv2)
-{
- __wsum next, *csum = priv2;
-
- next = csum_and_copy_from_user(iter_from, to + progress, len);
- *csum = csum_block_add(*csum, next, progress);
- return next ? 0 : len;
-}
-
-static __always_inline
-size_t memcpy_from_iter_csum(void *iter_from, size_t progress,
- size_t len, void *to, void *priv2)
-{
- __wsum *csum = priv2;
-
- *csum = csum_and_memcpy(to + progress, iter_from, len, *csum, progress);
- return 0;
-}
-
-size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
- struct iov_iter *i)
-{
- if (WARN_ON_ONCE(!i->data_source))
- return 0;
- return iterate_and_advance2(i, bytes, addr, csum,
- copy_from_user_iter_csum,
- memcpy_from_iter_csum);
-}
-EXPORT_SYMBOL(csum_and_copy_from_iter);
-
-static __always_inline
-size_t copy_to_user_iter_csum(void __user *iter_to, size_t progress,
- size_t len, void *from, void *priv2)
-{
- __wsum next, *csum = priv2;
-
- next = csum_and_copy_to_user(from + progress, iter_to, len);
- *csum = csum_block_add(*csum, next, progress);
- return next ? 0 : len;
-}
-
-static __always_inline
-size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
- size_t len, void *from, void *priv2)
-{
- __wsum *csum = priv2;
-
- *csum = csum_and_memcpy(iter_to, from + progress, len, *csum, progress);
- return 0;
-}
-
-size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
- struct iov_iter *i)
-{
- struct csum_state *csstate = _csstate;
- __wsum sum;
-
- if (WARN_ON_ONCE(i->data_source))
- return 0;
- if (unlikely(iov_iter_is_discard(i))) {
- // can't use csum_memcpy() for that one - data is not copied
- csstate->csum = csum_block_add(csstate->csum,
- csum_partial(addr, bytes, 0),
- csstate->off);
- csstate->off += bytes;
- return bytes;
- }
-
- sum = csum_shift(csstate->csum, csstate->off);
-
- bytes = iterate_and_advance2(i, bytes, (void *)addr, &sum,
- copy_to_user_iter_csum,
- memcpy_to_iter_csum);
- csstate->csum = csum_shift(sum, csstate->off);
- csstate->off += bytes;
- return bytes;
-}
-EXPORT_SYMBOL(csum_and_copy_to_iter);
-
size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
struct iov_iter *i)
{
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 176eb5834746..37c89d0933b7 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -50,7 +50,7 @@
#include <linux/spinlock.h>
#include <linux/slab.h>
#include <linux/pagemap.h>
-#include <linux/uio.h>
+#include <linux/iov_iter.h>
#include <linux/indirect_call_wrapper.h>

#include <net/protocol.h>
@@ -716,6 +716,54 @@ int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *from)
}
EXPORT_SYMBOL(zerocopy_sg_from_iter);

+static __always_inline
+size_t copy_to_user_iter_csum(void __user *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ __wsum next, *csum = priv2;
+
+ next = csum_and_copy_to_user(from + progress, iter_to, len);
+ *csum = csum_block_add(*csum, next, progress);
+ return next ? 0 : len;
+}
+
+static __always_inline
+size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ __wsum *csum = priv2;
+
+ *csum = csum_and_memcpy(iter_to, from + progress, len, *csum, progress);
+ return 0;
+}
+
+static size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
+ struct iov_iter *i)
+{
+ struct csum_state *csstate = _csstate;
+ __wsum sum;
+
+ if (WARN_ON_ONCE(i->data_source))
+ return 0;
+ if (unlikely(iov_iter_is_discard(i))) {
+ // can't use csum_memcpy() for that one - data is not copied
+ csstate->csum = csum_block_add(csstate->csum,
+ csum_partial(addr, bytes, 0),
+ csstate->off);
+ csstate->off += bytes;
+ return bytes;
+ }
+
+ sum = csum_shift(csstate->csum, csstate->off);
+
+ bytes = iterate_and_advance2(i, bytes, (void *)addr, &sum,
+ copy_to_user_iter_csum,
+ memcpy_to_iter_csum);
+ csstate->csum = csum_shift(sum, csstate->off);
+ csstate->off += bytes;
+ return bytes;
+}
+
/**
* skb_copy_and_csum_datagram - Copy datagram to an iovec iterator
* and update a checksum.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4eaf7ed0d1f4..5dbdfce2d05f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -62,6 +62,7 @@
#include <linux/if_vlan.h>
#include <linux/mpls.h>
#include <linux/kcov.h>
+#include <linux/iov_iter.h>

#include <net/protocol.h>
#include <net/dst.h>
@@ -6931,3 +6932,35 @@ ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter,
return spliced ?: ret;
}
EXPORT_SYMBOL(skb_splice_from_iter);
+
+static __always_inline
+size_t memcpy_from_iter_csum(void *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ __wsum *csum = priv2;
+
+ *csum = csum_and_memcpy(to + progress, iter_from, len, *csum, progress);
+ return 0;
+}
+
+static __always_inline
+size_t copy_from_user_iter_csum(void __user *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ __wsum next, *csum = priv2;
+
+ next = csum_and_copy_from_user(iter_from, to + progress, len);
+ *csum = csum_block_add(*csum, next, progress);
+ return next ? 0 : len;
+}
+
+size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
+ struct iov_iter *i)
+{
+ if (WARN_ON_ONCE(!i->data_source))
+ return 0;
+ return iterate_and_advance2(i, bytes, addr, csum,
+ copy_from_user_iter_csum,
+ memcpy_from_iter_csum);
+}
+EXPORT_SYMBOL(csum_and_copy_from_iter);

2023-09-14 09:18:54

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v4 05/13] iov: Move iterator functions to a header file

From: David Howells
> Sent: 13 September 2023 17:57
>
> Move the iterator functions to a header file so that other operations that
> need to scan over an iterator can be added. For instance, the rbd driver
> could use this to scan a buffer to see if it is all zeros and libceph could
> use this to generate a crc.

These all look a bit big for being more generally inlined.

I know you want to avoid the indirect call in the normal cases,
but maybe it would be ok for other uses?

David


2023-09-14 09:29:29

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH v4 01/13] iov_iter: Add a benchmarking kunit test

On 13.09.23 18:58, David Howells wrote:
> ---
> lib/kunit_iov_iter.c | 181 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 181 insertions(+)

Hi David,

#1 this is missing a SoB
#2 looks like all the KUNIT_ASSERT_NOT_NULL() macros are indented wrong
#3 a bit more of a commit message would be nice

Byte,
Johannes


>
> diff --git a/lib/kunit_iov_iter.c b/lib/kunit_iov_iter.c
> index 859b67c4d697..478fea956f58 100644
> --- a/lib/kunit_iov_iter.c
> +++ b/lib/kunit_iov_iter.c
> @@ -756,6 +756,184 @@ static void __init iov_kunit_extract_pages_xarray(struct kunit *test)
> KUNIT_SUCCEED();
> }
>
> +static void iov_kunit_free_page(void *data)
> +{
> + __free_page(data);
> +}
> +
> +static void __init iov_kunit_benchmark_print_stats(struct kunit *test,
> + unsigned int *samples)
> +{
> + unsigned long total = 0;
> + int i;
> +
> + for (i = 0; i < 16; i++) {
> + total += samples[i];
> + kunit_info(test, "run %x: %u uS\n", i, samples[i]);
> + }
> +
> + kunit_info(test, "avg %lu uS\n", total / 16);
> +}
> +
> +/*
> + * Time copying 256MiB through an ITER_BVEC.
> + */
> +static void __init iov_kunit_benchmark_bvec(struct kunit *test)
> +{
> + struct iov_iter iter;
> + struct bio_vec *bvec;
> + struct page *page, **pages;
> + unsigned int samples[16];
> + ktime_t a, b;
> + ssize_t copied;
> + size_t size = 256 * 1024 * 1024, npages = size / PAGE_SIZE;
> + void *scratch;
> + int i;
> +
> + /* Allocate a page and tile it repeatedly in the buffer. */
> + page = alloc_page(GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, page);
> + kunit_add_action_or_reset(test, iov_kunit_free_page, page);
> +
> + bvec = kunit_kmalloc_array(test, npages, sizeof(bvec[0]), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, bvec);
> + for (i = 0; i < npages; i++)
> + bvec_set_page(&bvec[i], page, PAGE_SIZE, 0);
> +
> + /* Create a single large buffer to copy to/from. */
> + pages = kunit_kmalloc_array(test, npages, sizeof(pages[0]), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, pages);
> + for (i = 0; i < npages; i++)
> + pages[i] = page;
> +
> + scratch = vmap(pages, npages, VM_MAP | VM_MAP_PUT_PAGES, PAGE_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, scratch);
> + kunit_add_action_or_reset(test, iov_kunit_unmap, scratch);
> +
> + /* Perform and time a bunch of copies. */
> + kunit_info(test, "Benchmarking copy_to_iter() over BVEC:\n");
> + for (i = 0; i < 16; i++) {
> + iov_iter_bvec(&iter, ITER_DEST, bvec, npages, size);
> + a = ktime_get_real();
> + copied = copy_to_iter(scratch, size, &iter);
> + b = ktime_get_real();
> + KUNIT_EXPECT_EQ(test, copied, size);
> + samples[i] = ktime_to_us(ktime_sub(b, a));
> + }
> +
> + iov_kunit_benchmark_print_stats(test, samples);
> + KUNIT_SUCCEED();
> +}
> +
> +/*
> + * Time copying 256MiB through an ITER_BVEC in 256 page chunks.
> + */
> +static void __init iov_kunit_benchmark_bvec_split(struct kunit *test)
> +{
> + struct iov_iter iter;
> + struct bio_vec *bvec;
> + struct page *page, **pages;
> + unsigned int samples[16];
> + ktime_t a, b;
> + ssize_t copied;
> + size_t size, npages = 64;
> + void *scratch;
> + int i, j;
> +
> + /* Allocate a page and tile it repeatedly in the buffer. */
> + page = alloc_page(GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, page);
> + kunit_add_action_or_reset(test, iov_kunit_free_page, page);
> +
> + /* Create a single large buffer to copy to/from. */
> + pages = kunit_kmalloc_array(test, npages, sizeof(pages[0]), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, pages);
> + for (i = 0; i < npages; i++)
> + pages[i] = page;
> +
> + scratch = vmap(pages, npages, VM_MAP | VM_MAP_PUT_PAGES, PAGE_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, scratch);
> + kunit_add_action_or_reset(test, iov_kunit_unmap, scratch);
> +
> + /* Perform and time a bunch of copies. */
> + kunit_info(test, "Benchmarking copy_to_iter() over BVEC:\n");
> + for (i = 0; i < 16; i++) {
> + size = 256 * 1024 * 1024;
> + a = ktime_get_real();
> + do {
> + size_t part = min(size, npages * PAGE_SIZE);
> +
> + bvec = kunit_kmalloc_array(test, npages, sizeof(bvec[0]), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, bvec);
> + for (j = 0; j < npages; j++)
> + bvec_set_page(&bvec[j], page, PAGE_SIZE, 0);
> +
> + iov_iter_bvec(&iter, ITER_DEST, bvec, npages, part);
> + copied = copy_to_iter(scratch, part, &iter);
> + KUNIT_EXPECT_EQ(test, copied, part);
> + size -= part;
> + } while (size > 0);
> + b = ktime_get_real();
> + samples[i] = ktime_to_us(ktime_sub(b, a));
> + }
> +
> + iov_kunit_benchmark_print_stats(test, samples);
> + KUNIT_SUCCEED();
> +}
> +
> +/*
> + * Time copying 256MiB through an ITER_XARRAY.
> + */
> +static void __init iov_kunit_benchmark_xarray(struct kunit *test)
> +{
> + struct iov_iter iter;
> + struct xarray *xarray;
> + struct page *page, **pages;
> + unsigned int samples[16];
> + ktime_t a, b;
> + ssize_t copied;
> + size_t size = 256 * 1024 * 1024, npages = size / PAGE_SIZE;
> + void *scratch;
> + int i;
> +
> + /* Allocate a page and tile it repeatedly in the buffer. */
> + page = alloc_page(GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, page);
> + kunit_add_action_or_reset(test, iov_kunit_free_page, page);
> +
> + xarray = iov_kunit_create_xarray(test);
> +
> + for (i = 0; i < npages; i++) {
> + void *x = xa_store(xarray, i, page, GFP_KERNEL);
> +
> + KUNIT_ASSERT_FALSE(test, xa_is_err(x));
> + }
> +
> + /* Create a single large buffer to copy to/from. */
> + pages = kunit_kmalloc_array(test, npages, sizeof(pages[0]), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, pages);
> + for (i = 0; i < npages; i++)
> + pages[i] = page;
> +
> + scratch = vmap(pages, npages, VM_MAP | VM_MAP_PUT_PAGES, PAGE_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, scratch);
> + kunit_add_action_or_reset(test, iov_kunit_unmap, scratch);
> +
> + /* Perform and time a bunch of copies. */
> + kunit_info(test, "Benchmarking copy_to_iter() over XARRAY:\n");
> + for (i = 0; i < 16; i++) {
> + iov_iter_xarray(&iter, ITER_DEST, xarray, 0, size);
> + a = ktime_get_real();
> + copied = copy_to_iter(scratch, size, &iter);
> + b = ktime_get_real();
> + KUNIT_EXPECT_EQ(test, copied, size);
> + samples[i] = ktime_to_us(ktime_sub(b, a));
> + }
> +
> + iov_kunit_benchmark_print_stats(test, samples);
> + KUNIT_SUCCEED();
> +}
> +
> static struct kunit_case __refdata iov_kunit_cases[] = {
> KUNIT_CASE(iov_kunit_copy_to_kvec),
> KUNIT_CASE(iov_kunit_copy_from_kvec),
> @@ -766,6 +944,9 @@ static struct kunit_case __refdata iov_kunit_cases[] = {
> KUNIT_CASE(iov_kunit_extract_pages_kvec),
> KUNIT_CASE(iov_kunit_extract_pages_bvec),
> KUNIT_CASE(iov_kunit_extract_pages_xarray),
> + KUNIT_CASE(iov_kunit_benchmark_bvec),
> + KUNIT_CASE(iov_kunit_benchmark_bvec_split),
> + KUNIT_CASE(iov_kunit_benchmark_xarray),
> {}
> };
>
>
>

2023-09-15 12:19:01

by David Howells

[permalink] [raw]
Subject: Re: [PATCH v4 05/13] iov: Move iterator functions to a header file

David Laight <[email protected]> wrote:

> > Move the iterator functions to a header file so that other operations that
> > need to scan over an iterator can be added. For instance, the rbd driver
> > could use this to scan a buffer to see if it is all zeros and libceph could
> > use this to generate a crc.
>
> These all look a bit big for being more generally inlined.
>
> I know you want to avoid the indirect call in the normal cases,
> but maybe it would be ok for other uses?

So you'd advocate for something like:

size_t generic_iterate(struct iov_iter *iter, size_t len, void *priv,
		       void *priv2, iov_ustep_f ustep, iov_step_f step)
{
	return iterate_and_advance2(iter, len, priv, priv2,
				    ustep, step);
}
EXPORT_SYMBOL(generic_iterate);

in lib/iov_iter.c and then call that from the places that want to use it?

I tried benchmarking that (see attached patch - it needs to go on top of my
iov patches). Running the insmod thrice and then filtering out and sorting
the results:

iov_kunit_benchmark_bvec: avg 3174 uS, stddev 68 uS
iov_kunit_benchmark_bvec: avg 3176 uS, stddev 61 uS
iov_kunit_benchmark_bvec: avg 3180 uS, stddev 64 uS
iov_kunit_benchmark_bvec_outofline: avg 3678 uS, stddev 4 uS
iov_kunit_benchmark_bvec_outofline: avg 3678 uS, stddev 5 uS
iov_kunit_benchmark_bvec_outofline: avg 3679 uS, stddev 6 uS
iov_kunit_benchmark_xarray: avg 3560 uS, stddev 5 uS
iov_kunit_benchmark_xarray: avg 3560 uS, stddev 6 uS
iov_kunit_benchmark_xarray: avg 3570 uS, stddev 16 uS
iov_kunit_benchmark_xarray_outofline: avg 4125 uS, stddev 13 uS
iov_kunit_benchmark_xarray_outofline: avg 4125 uS, stddev 2 uS
iov_kunit_benchmark_xarray_outofline: avg 4125 uS, stddev 6 uS

It adds almost 16% overhead:

(gdb) p 4125/3560.0
$2 = 1.1587078651685394
(gdb) p 3678/3174.0
$3 = 1.1587901701323251

I'm guessing a lot of that is due to function-pointer mitigations
(e.g. retpolines making the indirect calls more expensive).

Now, part of the code size expansion can be mitigated by using, say,
iterate_and_advance_kernel() if you know you aren't going to encounter
user-backed iterators, or even using, say, iterate_bvec() if you know you're
only going to see a specific iterator type.
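
To illustrate, a kernel-only user - say the all-zeros scan that rbd
wants - might look something like the sketch below. This assumes the
step-function signature used in this series and that a step returning a
non-zero leftover stops the iteration early; scan_step() and
iter_is_all_zeroes() are made-up names for illustration:

static size_t scan_step(void *iter_from, size_t progress,
			size_t len, void *priv, void *priv2)
{
	bool *all_zeroes = priv;

	/* memchr_inv() returns non-NULL if any byte differs from zero */
	if (memchr_inv(iter_from, 0, len)) {
		*all_zeroes = false;
		return len;	/* non-zero leftover stops the iteration */
	}
	return 0;		/* whole segment consumed, keep going */
}

static bool iter_is_all_zeroes(struct iov_iter *iter, size_t len)
{
	bool all_zeroes = true;

	/* Only kernel-backed iterators (BVEC/KVEC/XARRAY) expected here */
	iterate_and_advance_kernel(iter, len, &all_zeroes, NULL, scan_step);
	return all_zeroes;
}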

David
---
iov_iter: Benchmark out of line generic iterator

diff --git a/include/linux/iov_iter.h b/include/linux/iov_iter.h
index 2ebb86c041b6..8f562e80473b 100644
--- a/include/linux/iov_iter.h
+++ b/include/linux/iov_iter.h
@@ -293,4 +293,7 @@ size_t iterate_and_advance_kernel(struct iov_iter *iter, size_t len, void *priv,
return progress;
}

+size_t generic_iterate(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_ustep_f ustep, iov_step_f step);
+
#endif /* _LINUX_IOV_ITER_H */
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 8f7a10c4a295..f9643dd02676 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1684,3 +1684,10 @@ ssize_t iov_iter_extract_pages(struct iov_iter *i,
return -EFAULT;
}
EXPORT_SYMBOL_GPL(iov_iter_extract_pages);
+
+size_t generic_iterate(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_ustep_f ustep, iov_step_f step)
+{
+ return iterate_and_advance2(iter, len, priv, priv2, ustep, step);
+}
+EXPORT_SYMBOL(generic_iterate);
diff --git a/lib/kunit_iov_iter.c b/lib/kunit_iov_iter.c
index cc9c64663a73..f208516a68c9 100644
--- a/lib/kunit_iov_iter.c
+++ b/lib/kunit_iov_iter.c
@@ -18,6 +18,7 @@
#include <linux/writeback.h>
#include <linux/uio.h>
#include <linux/bvec.h>
+#include <linux/iov_iter.h>
#include <kunit/test.h>

MODULE_DESCRIPTION("iov_iter testing");
@@ -1571,6 +1572,124 @@ static void __init iov_kunit_benchmark_xarray(struct kunit *test)
KUNIT_SUCCEED();
}

+static noinline
+size_t shovel_to_user_iter(void __user *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ if (should_fail_usercopy())
+ return len;
+ if (access_ok(iter_to, len)) {
+ from += progress;
+ instrument_copy_to_user(iter_to, from, len);
+ len = raw_copy_to_user(iter_to, from, len);
+ }
+ return len;
+}
+
+static noinline
+size_t shovel_to_kernel_iter(void *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ memcpy(iter_to, from + progress, len);
+ return 0;
+}
+
+/*
+ * Time copying 256MiB through an ITER_BVEC with an out-of-line copier
+ * function.
+ */
+static void __init iov_kunit_benchmark_bvec_outofline(struct kunit *test)
+{
+ struct iov_iter iter;
+ struct bio_vec *bvec;
+ struct page *page;
+ unsigned int samples[IOV_KUNIT_NR_SAMPLES];
+ ktime_t a, b;
+ ssize_t copied;
+ size_t size = 256 * 1024 * 1024, npages = size / PAGE_SIZE;
+ void *scratch;
+ int i;
+
+ /* Allocate a page and tile it repeatedly in the buffer. */
+ page = alloc_page(GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, page);
+ kunit_add_action_or_reset(test, iov_kunit_free_page, page);
+
+ bvec = kunit_kmalloc_array(test, npages, sizeof(bvec[0]), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, bvec);
+ for (i = 0; i < npages; i++)
+ bvec_set_page(&bvec[i], page, PAGE_SIZE, 0);
+
+ /* Create a single large buffer to copy to/from. */
+ scratch = iov_kunit_create_source(test, npages);
+
+ /* Perform and time a bunch of copies. */
+ kunit_info(test, "Benchmarking copy_to_iter() over BVEC:\n");
+ for (i = 0; i < IOV_KUNIT_NR_SAMPLES; i++) {
+ iov_iter_bvec(&iter, ITER_DEST, bvec, npages, size);
+ a = ktime_get_real();
+ copied = generic_iterate(&iter, size, scratch, NULL,
+ shovel_to_user_iter,
+ shovel_to_kernel_iter);
+ b = ktime_get_real();
+ KUNIT_EXPECT_EQ(test, copied, size);
+ samples[i] = ktime_to_us(ktime_sub(b, a));
+ }
+
+ iov_kunit_benchmark_print_stats(test, samples);
+ KUNIT_SUCCEED();
+}
+
+/*
+ * Time copying 256MiB through an ITER_XARRAY with an out-of-line copier
+ * function.
+ */
+static void __init iov_kunit_benchmark_xarray_outofline(struct kunit *test)
+{
+ struct iov_iter iter;
+ struct xarray *xarray;
+ struct page *page;
+ unsigned int samples[IOV_KUNIT_NR_SAMPLES];
+ ktime_t a, b;
+ ssize_t copied;
+ size_t size = 256 * 1024 * 1024, npages = size / PAGE_SIZE;
+ void *scratch;
+ int i;
+
+ /* Allocate a page and tile it repeatedly in the buffer. */
+ page = alloc_page(GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, page);
+ kunit_add_action_or_reset(test, iov_kunit_free_page, page);
+
+ xarray = iov_kunit_create_xarray(test);
+
+ for (i = 0; i < npages; i++) {
+ void *x = xa_store(xarray, i, page, GFP_KERNEL);
+
+ KUNIT_ASSERT_FALSE(test, xa_is_err(x));
+ }
+
+ /* Create a single large buffer to copy to/from. */
+ scratch = iov_kunit_create_source(test, npages);
+
+ /* Perform and time a bunch of copies. */
+ kunit_info(test, "Benchmarking copy_to_iter() over XARRAY:\n");
+ for (i = 0; i < IOV_KUNIT_NR_SAMPLES; i++) {
+ iov_iter_xarray(&iter, ITER_DEST, xarray, 0, size);
+ a = ktime_get_real();
+
+ copied = generic_iterate(&iter, size, scratch, NULL,
+ shovel_to_user_iter,
+ shovel_to_kernel_iter);
+ b = ktime_get_real();
+ KUNIT_EXPECT_EQ(test, copied, size);
+ samples[i] = ktime_to_us(ktime_sub(b, a));
+ }
+
+ iov_kunit_benchmark_print_stats(test, samples);
+ KUNIT_SUCCEED();
+}
+
static struct kunit_case __refdata iov_kunit_cases[] = {
KUNIT_CASE(iov_kunit_copy_to_ubuf),
KUNIT_CASE(iov_kunit_copy_from_ubuf),
@@ -1593,6 +1712,8 @@ static struct kunit_case __refdata iov_kunit_cases[] = {
KUNIT_CASE(iov_kunit_benchmark_bvec),
KUNIT_CASE(iov_kunit_benchmark_bvec_split),
KUNIT_CASE(iov_kunit_benchmark_xarray),
+ KUNIT_CASE(iov_kunit_benchmark_bvec_outofline),
+ KUNIT_CASE(iov_kunit_benchmark_xarray_outofline),
{}
};