2023-09-25 20:02:05

by David Howells

Subject: [PATCH v7 00/12] iov_iter: Convert the iterator macros into inline funcs

Hi Jens, Christian,

Could you take these patches into the block tree or the fs tree? The
patches convert the iov_iter iteration macros to be inline functions.

(1) Remove last_offset from iov_iter as it was only used by ITER_PIPE.

(2) Add a __user tag on copy_mc_to_user()'s dst argument on x86 to match
that on powerpc and get rid of a sparse warning.

(3) Convert iter->user_backed to user_backed_iter() in the sound PCM
driver.

(4) Convert iter->user_backed to user_backed_iter() in a couple of
infiniband drivers.

(5) Renumber the type enum so that the ITER_* constants match the order in
iterate_and_advance*().

(6) Since the preceding patch puts UBUF and IOVEC at 0 and 1, change
user_backed_iter() to just use the type value and get rid of the extra
flag.
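
As a userspace sketch (not the kernel header itself) of what (5)+(6)
buy: with ITER_UBUF and ITER_IOVEC renumbered to 0 and 1, the
user-backed test reduces to a single comparison on the type value.
Names here are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the renumbered type enum from patch (5): UBUF and IOVEC
 * first, grouping the user-backed iterators at the bottom of the
 * value range. */
enum iter_type {
	ITER_UBUF,
	ITER_IOVEC,
	ITER_BVEC,
	ITER_KVEC,
	ITER_XARRAY,
	ITER_DISCARD,
};

/* Patch (6)'s idea: derive user-backedness from the type value alone,
 * with no separate flag in the iterator. */
static inline bool type_is_user_backed(enum iter_type t)
{
	return t <= ITER_IOVEC;
}
```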

(7) Convert the iov_iter iteration macros to always-inline functions to
make the code easier to follow. It uses function pointers, but they
get optimised away. The priv2 argument likewise gets optimised away
if unused.

The functions are placed into a header file so that bespoke iterators
can be created elsewhere. For instance, rbd has an optimisation that
requires it to scan the buffer it is given to see if it is all
zeros. It would be nice if this could use iterate_and_advance() - but
that's currently buried inside lib/iov_iter.c.

Whilst we could provide a generic iteration function that takes a pair
of function pointers instead, it does add around 16% overhead in the
framework, presumably from the combination of function pointers and
various mitigations.

Further, if it is known that only a particular iterator type is in
play, that iteration function can be used directly.
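
A rough userspace model of the always-inline pattern described in (7) -
the names and signatures below are illustrative, not the kernel's
actual iterate_and_advance(). The step function returns how many bytes
it could NOT process (0 on success):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef size_t (*step_fn)(void *seg, size_t progress, size_t len,
			  void *priv, void *priv2);

static inline size_t memcpy_step(void *seg, size_t progress, size_t len,
				 void *priv, void *priv2)
{
	(void)priv2;	/* unused; vanishes once the call is inlined */
	memcpy((char *)priv + progress, seg, len);
	return 0;	/* everything copied */
}

/* Always-inline driver: at each call site 'step' is a compile-time
 * constant, so the compiler can inline it and the indirect call
 * disappears from the generated code. */
static inline size_t iterate_segments(char *segs[], const size_t lens[],
				      int nr_segs, size_t bytes,
				      void *priv, void *priv2, step_fn step)
{
	size_t progress = 0;

	for (int i = 0; i < nr_segs && progress < bytes; i++) {
		size_t len = lens[i];
		size_t remain;

		if (len > bytes - progress)
			len = bytes - progress;
		remain = step(segs[i], progress, len, priv, priv2);
		progress += len - remain;
		if (remain)
			break;
	}
	return progress;
}
```

That constant-propagation property is why a generic out-of-line
function taking the same pair of pointers pays the ~16% penalty
mentioned above while the inlined form does not.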

(8) Move the check for ->copy_mc to _copy_from_iter() and
copy_page_from_iter_atomic() rather than in memcpy_from_iter_mc()
where it gets repeated for every segment. Instead, we check once and
invoke a side function that can use iterate_bvec() rather than
iterate_and_advance() and supply a different step function.
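
The shape of (8), as a small userspace sketch with hypothetical names:
hoist a per-segment flag test into a single up-front branch that picks
a dedicated copy loop:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the two copy flavours; in the kernel the "careful"
 * one would be copy_mc_to_kernel(). */
static size_t copy_careful(char *dst, const char *src, size_t len)
{
	memcpy(dst, src, len);
	return len;
}

static size_t copy_plain(char *dst, const char *src, size_t len)
{
	memcpy(dst, src, len);
	return len;
}

static size_t copy_segments(char *dst, const char *segs[],
			    const size_t lens[], int nr, int careful)
{
	size_t done = 0;

	/* One branch here, rather than re-testing 'careful' inside the
	 * step function for every segment. */
	if (careful) {
		for (int i = 0; i < nr; i++)
			done += copy_careful(dst + done, segs[i], lens[i]);
	} else {
		for (int i = 0; i < nr; i++)
			done += copy_plain(dst + done, segs[i], lens[i]);
	}
	return done;
}
```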

(9) Move the copy-and-csum code to net/ where it can be in proximity with
the code that uses it. This eliminates the code if CONFIG_NET=n and
allows for the slim possibility of it being inlined.

(10) Fold csum_and_memcpy() into its two users.

(11) Move csum_and_copy_from_iter_full() out of line and merge in
csum_and_copy_from_iter() since the former is the only caller of the
latter.

(12) Move hash_and_copy_to_iter() to net/ where it can be with its only
caller.

Anyway, the changes in compiled function size either side of patches
(7)+(8) on x86_64 look like:

__copy_from_iter_mc new 0xe8
_copy_from_iter inc 0x360 -> 0x3a7 +0x47
_copy_from_iter_flushcache inc 0x34c -> 0x38e +0x42
_copy_from_iter_nocache inc 0x354 -> 0x378 +0x24
_copy_mc_to_iter inc 0x396 -> 0x3f1 +0x5b
_copy_to_iter inc 0x33b -> 0x385 +0x4a
copy_page_from_iter_atomic.part.0 inc 0x393 -> 0x3e0 +0x4d
copy_page_to_iter_nofault.part.0 inc 0x3de -> 0x3eb +0xd
copyin del 0x30
copyout del 0x2d
copyout_mc del 0x2b
csum_and_copy_from_iter inc 0x3db -> 0x41d +0x42
csum_and_copy_to_iter inc 0x45d -> 0x48d +0x30
iov_iter_zero inc 0x34a -> 0x36a +0x20
memcpy_from_iter.isra.0 del 0x1f

Note that there's a noticeable expansion on some of the main functions
because a number of the helpers get inlined instead of being called.

In terms of benchmarking patches (7)+(8), three runs without them:

iov_kunit_benchmark_ubuf: avg 3955 uS, stddev 169 uS
iov_kunit_benchmark_ubuf: avg 4122 uS, stddev 1292 uS
iov_kunit_benchmark_ubuf: avg 4451 uS, stddev 1362 uS
iov_kunit_benchmark_iovec: avg 6607 uS, stddev 22 uS
iov_kunit_benchmark_iovec: avg 6608 uS, stddev 19 uS
iov_kunit_benchmark_iovec: avg 6609 uS, stddev 24 uS
iov_kunit_benchmark_bvec: avg 3166 uS, stddev 11 uS
iov_kunit_benchmark_bvec: avg 3167 uS, stddev 13 uS
iov_kunit_benchmark_bvec: avg 3170 uS, stddev 16 uS
iov_kunit_benchmark_bvec_split: avg 3394 uS, stddev 12 uS
iov_kunit_benchmark_bvec_split: avg 3394 uS, stddev 20 uS
iov_kunit_benchmark_bvec_split: avg 3395 uS, stddev 19 uS
iov_kunit_benchmark_kvec: avg 2672 uS, stddev 12 uS
iov_kunit_benchmark_kvec: avg 2672 uS, stddev 12 uS
iov_kunit_benchmark_kvec: avg 2672 uS, stddev 9 uS
iov_kunit_benchmark_xarray: avg 3719 uS, stddev 9 uS
iov_kunit_benchmark_xarray: avg 3719 uS, stddev 9 uS
iov_kunit_benchmark_xarray: avg 3721 uS, stddev 24 uS

and three runs with them:

iov_kunit_benchmark_ubuf: avg 4110 uS, stddev 1254 uS
iov_kunit_benchmark_ubuf: avg 4141 uS, stddev 1411 uS
iov_kunit_benchmark_ubuf: avg 4572 uS, stddev 1889 uS
iov_kunit_benchmark_iovec: avg 6582 uS, stddev 27 uS
iov_kunit_benchmark_iovec: avg 6585 uS, stddev 25 uS
iov_kunit_benchmark_iovec: avg 6586 uS, stddev 48 uS
iov_kunit_benchmark_bvec: avg 3175 uS, stddev 13 uS
iov_kunit_benchmark_bvec: avg 3177 uS, stddev 12 uS
iov_kunit_benchmark_bvec: avg 3178 uS, stddev 12 uS
iov_kunit_benchmark_bvec_split: avg 3380 uS, stddev 20 uS
iov_kunit_benchmark_bvec_split: avg 3384 uS, stddev 15 uS
iov_kunit_benchmark_bvec_split: avg 3386 uS, stddev 25 uS
iov_kunit_benchmark_kvec: avg 2671 uS, stddev 11 uS
iov_kunit_benchmark_kvec: avg 2672 uS, stddev 12 uS
iov_kunit_benchmark_kvec: avg 2677 uS, stddev 20 uS
iov_kunit_benchmark_xarray: avg 3599 uS, stddev 20 uS
iov_kunit_benchmark_xarray: avg 3603 uS, stddev 8 uS
iov_kunit_benchmark_xarray: avg 3610 uS, stddev 16 uS

I've pushed the patches here also:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-cleanup

David

Changes
=======
ver #7)
- Defer the patch to add a kernel-backed-iter-only variant to my ceph
branch.
- Add missing sign-offs and cc's to commit descriptions.

ver #6)
- Rebase on v6.6-rc2 rather than on my iov-kunit branch.
- Add a patch to remove last_offset from iov_iter.
- Add a patch to tag copy_mc_to_user() with __user on x86.
- Document the priv2 arg of iterate_and_advance_kernel().

ver #5)
- Moved kunit test patches out to a separate set.
- Moved "iter->count - progress" into individual iteration subfunctions.
- Fix iterate_iovec() to update iter->__iov after subtracting it to
calculate iter->nr_segs.
- Merged the function inlining patch and the move-to-header patch.
- Rearranged the patch order slightly to put the patches to move
networking stuff to net/ last.
- Went back to a simpler patch to special-case ->copy_mc in
_copy_from_iter() and copy_page_from_iter_atomic() before we get to call
iterate_and_advance() so as to reduce the number of checks for this.

ver #4)
- Fix iterate_bvec() and iterate_kvec() to update iter->bvec and
iter->kvec after subtracting it to calculate iter->nr_segs.
- Change iterate_xarray() to use start+progress rather than increasing
start to reduce code size.
- Added patches to move some iteration functions over to net/ as the files
there can #include the iterator framework.
- Added a patch to benchmark the iteration.
- Tried an experimental patch to make copy_from_iter() and similar always
catch MCE.

ver #3)
- Use min_t(size_t,) not min() to avoid a warning on Hexagon.
- Inline all the step functions.
- Added a patch to better handle copy_mc.

ver #2)
- Rebased on top of Willy's changes in linux-next.
- Change the checksum argument to the iteration functions to be a general
void* and use it to pass iter->copy_mc flag to memcpy_from_iter_mc() to
avoid using a function pointer.
- Arrange the end of the iterate_*() functions to look the same to give
the optimiser the best chance.
- Make iterate_and_advance() a wrapper around iterate_and_advance2().
- Adjust iterate_and_advance2() to use if-else-if-else-if-else rather than
switch(), to put ITER_BVEC before KVEC and to mark UBUF and IOVEC as
likely().
- Move "iter->count += progress" into iterate_and_advance2() from the
iterate functions.
- Mark a number of the iterator helpers with __always_inline.
- Fix _copy_from_iter_flushcache() to use memcpy_from_iter_flushcache()
not memcpy_from_iter().

Link: https://lore.kernel.org/r/[email protected]/ # v1
Link: https://lore.kernel.org/r/[email protected]/ # v2
Link: https://lore.kernel.org/r/[email protected]/ # v3
Link: https://lore.kernel.org/r/[email protected]/ # v4
Link: https://lore.kernel.org/r/[email protected]/ # v5
Link: https://lore.kernel.org/r/[email protected]/ # v6

David Howells (12):
iov_iter: Remove last_offset from iov_iter as it was for ITER_PIPE
iov_iter, x86: Be consistent about the __user tag on copy_mc_to_user()
sound: Fix snd_pcm_readv()/writev() to use iov access functions
infiniband: Use user_backed_iter() to see if iterator is UBUF/IOVEC
iov_iter: Renumber ITER_* constants
iov_iter: Derive user-backedness from the iterator type
iov_iter: Convert iterate*() to inline funcs
iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
iov_iter, net: Move csum_and_copy_to/from_iter() to net/
iov_iter, net: Fold in csum_and_memcpy()
iov_iter, net: Merge csum_and_copy_from_iter{,_full}() together
iov_iter, net: Move hash_and_copy_to_iter() to net/

arch/x86/include/asm/uaccess.h | 2 +-
arch/x86/lib/copy_mc.c | 8 +-
drivers/infiniband/hw/hfi1/file_ops.c | 2 +-
drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
include/linux/iov_iter.h | 274 ++++++++++++++
include/linux/skbuff.h | 3 +
include/linux/uio.h | 34 +-
lib/iov_iter.c | 441 +++++++----------------
net/core/datagram.c | 75 +++-
net/core/skbuff.c | 40 ++
sound/core/pcm_native.c | 4 +-
11 files changed, 544 insertions(+), 341 deletions(-)
create mode 100644 include/linux/iov_iter.h


2023-09-25 20:27:03

by David Howells

Subject: [PATCH v7 10/12] iov_iter, net: Fold in csum_and_memcpy()

Fold csum_and_memcpy() into its callers.

Signed-off-by: David Howells <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: David Laight <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
---
include/linux/skbuff.h | 7 -------
net/core/datagram.c | 3 ++-
net/core/skbuff.c | 3 ++-
3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d0656cc11c16..c81ef5d76953 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3679,13 +3679,6 @@ static inline int __must_check skb_put_padto(struct sk_buff *skb, unsigned int l
return __skb_put_padto(skb, len, true);
}

-static inline __wsum csum_and_memcpy(void *to, const void *from, size_t len,
- __wsum sum, size_t off)
-{
- __wsum next = csum_partial_copy_nocheck(from, to, len);
- return csum_block_add(sum, next, off);
-}
-
struct csum_state {
__wsum csum;
size_t off;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 37c89d0933b7..452620dd41e4 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -732,8 +732,9 @@ size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
size_t len, void *from, void *priv2)
{
__wsum *csum = priv2;
+ __wsum next = csum_partial_copy_nocheck(from, iter_to, len);

- *csum = csum_and_memcpy(iter_to, from + progress, len, *csum, progress);
+ *csum = csum_block_add(*csum, next, progress);
return 0;
}

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5dbdfce2d05f..3efed86321db 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6938,8 +6938,9 @@ size_t memcpy_from_iter_csum(void *iter_from, size_t progress,
size_t len, void *to, void *priv2)
{
__wsum *csum = priv2;
+ __wsum next = csum_partial_copy_nocheck(iter_from, to + progress, len);

- *csum = csum_and_memcpy(to + progress, iter_from, len, *csum, progress);
+ *csum = csum_block_add(*csum, next, progress);
return 0;
}


2023-09-25 22:36:49

by David Howells

Subject: [PATCH v7 05/12] iov_iter: Renumber ITER_* constants

Renumber the ITER_* iterator-type constants to put things in the same order
as in the iteration functions and to group user-backed iterators at the
bottom.

Signed-off-by: David Howells <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: David Laight <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
---
include/linux/uio.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 2000e42a6586..bef8e56aa45c 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -21,12 +21,12 @@ struct kvec {

enum iter_type {
/* iter types */
+ ITER_UBUF,
ITER_IOVEC,
- ITER_KVEC,
ITER_BVEC,
+ ITER_KVEC,
ITER_XARRAY,
ITER_DISCARD,
- ITER_UBUF,
};

#define ITER_SOURCE 1 // == WRITE

2023-09-25 23:26:16

by David Howells

Subject: [PATCH v7 09/12] iov_iter, net: Move csum_and_copy_to/from_iter() to net/

Move csum_and_copy_to/from_iter() to net code now that the iteration
framework can be #included.

Signed-off-by: David Howells <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: David Laight <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
---
include/linux/skbuff.h | 25 ++++++++++++
include/linux/uio.h | 18 ---------
lib/iov_iter.c | 89 ------------------------------------------
net/core/datagram.c | 50 +++++++++++++++++++++++-
net/core/skbuff.c | 33 ++++++++++++++++
5 files changed, 107 insertions(+), 108 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4174c4b82d13..d0656cc11c16 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3679,6 +3679,31 @@ static inline int __must_check skb_put_padto(struct sk_buff *skb, unsigned int l
return __skb_put_padto(skb, len, true);
}

+static inline __wsum csum_and_memcpy(void *to, const void *from, size_t len,
+ __wsum sum, size_t off)
+{
+ __wsum next = csum_partial_copy_nocheck(from, to, len);
+ return csum_block_add(sum, next, off);
+}
+
+struct csum_state {
+ __wsum csum;
+ size_t off;
+};
+
+size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
+
+static __always_inline __must_check
+bool csum_and_copy_from_iter_full(void *addr, size_t bytes,
+ __wsum *csum, struct iov_iter *i)
+{
+ size_t copied = csum_and_copy_from_iter(addr, bytes, csum, i);
+ if (likely(copied == bytes))
+ return true;
+ iov_iter_revert(i, copied);
+ return false;
+}
+
static inline int skb_add_data(struct sk_buff *skb,
struct iov_iter *from, int copy)
{
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 65d9143f83c8..0a5426c97e02 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -338,24 +338,6 @@ iov_iter_npages_cap(struct iov_iter *i, int maxpages, size_t max_bytes)
return npages;
}

-struct csum_state {
- __wsum csum;
- size_t off;
-};
-
-size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csstate, struct iov_iter *i);
-size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
-
-static __always_inline __must_check
-bool csum_and_copy_from_iter_full(void *addr, size_t bytes,
- __wsum *csum, struct iov_iter *i)
-{
- size_t copied = csum_and_copy_from_iter(addr, bytes, csum, i);
- if (likely(copied == bytes))
- return true;
- iov_iter_revert(i, copied);
- return false;
-}
size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
struct iov_iter *i);

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 943aa3cfd7b3..fef934a8745d 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -10,7 +10,6 @@
#include <linux/vmalloc.h>
#include <linux/splice.h>
#include <linux/compat.h>
-#include <net/checksum.h>
#include <linux/scatterlist.h>
#include <linux/instrumented.h>
#include <linux/iov_iter.h>
@@ -179,13 +178,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
}
EXPORT_SYMBOL(iov_iter_init);

-static __wsum csum_and_memcpy(void *to, const void *from, size_t len,
- __wsum sum, size_t off)
-{
- __wsum next = csum_partial_copy_nocheck(from, to, len);
- return csum_block_add(sum, next, off);
-}
-
size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
{
if (WARN_ON_ONCE(i->data_source))
@@ -1101,87 +1093,6 @@ ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
}
EXPORT_SYMBOL(iov_iter_get_pages_alloc2);

-static __always_inline
-size_t copy_from_user_iter_csum(void __user *iter_from, size_t progress,
- size_t len, void *to, void *priv2)
-{
- __wsum next, *csum = priv2;
-
- next = csum_and_copy_from_user(iter_from, to + progress, len);
- *csum = csum_block_add(*csum, next, progress);
- return next ? 0 : len;
-}
-
-static __always_inline
-size_t memcpy_from_iter_csum(void *iter_from, size_t progress,
- size_t len, void *to, void *priv2)
-{
- __wsum *csum = priv2;
-
- *csum = csum_and_memcpy(to + progress, iter_from, len, *csum, progress);
- return 0;
-}
-
-size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
- struct iov_iter *i)
-{
- if (WARN_ON_ONCE(!i->data_source))
- return 0;
- return iterate_and_advance2(i, bytes, addr, csum,
- copy_from_user_iter_csum,
- memcpy_from_iter_csum);
-}
-EXPORT_SYMBOL(csum_and_copy_from_iter);
-
-static __always_inline
-size_t copy_to_user_iter_csum(void __user *iter_to, size_t progress,
- size_t len, void *from, void *priv2)
-{
- __wsum next, *csum = priv2;
-
- next = csum_and_copy_to_user(from + progress, iter_to, len);
- *csum = csum_block_add(*csum, next, progress);
- return next ? 0 : len;
-}
-
-static __always_inline
-size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
- size_t len, void *from, void *priv2)
-{
- __wsum *csum = priv2;
-
- *csum = csum_and_memcpy(iter_to, from + progress, len, *csum, progress);
- return 0;
-}
-
-size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
- struct iov_iter *i)
-{
- struct csum_state *csstate = _csstate;
- __wsum sum;
-
- if (WARN_ON_ONCE(i->data_source))
- return 0;
- if (unlikely(iov_iter_is_discard(i))) {
- // can't use csum_memcpy() for that one - data is not copied
- csstate->csum = csum_block_add(csstate->csum,
- csum_partial(addr, bytes, 0),
- csstate->off);
- csstate->off += bytes;
- return bytes;
- }
-
- sum = csum_shift(csstate->csum, csstate->off);
-
- bytes = iterate_and_advance2(i, bytes, (void *)addr, &sum,
- copy_to_user_iter_csum,
- memcpy_to_iter_csum);
- csstate->csum = csum_shift(sum, csstate->off);
- csstate->off += bytes;
- return bytes;
-}
-EXPORT_SYMBOL(csum_and_copy_to_iter);
-
size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
struct iov_iter *i)
{
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 176eb5834746..37c89d0933b7 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -50,7 +50,7 @@
#include <linux/spinlock.h>
#include <linux/slab.h>
#include <linux/pagemap.h>
-#include <linux/uio.h>
+#include <linux/iov_iter.h>
#include <linux/indirect_call_wrapper.h>

#include <net/protocol.h>
@@ -716,6 +716,54 @@ int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *from)
}
EXPORT_SYMBOL(zerocopy_sg_from_iter);

+static __always_inline
+size_t copy_to_user_iter_csum(void __user *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ __wsum next, *csum = priv2;
+
+ next = csum_and_copy_to_user(from + progress, iter_to, len);
+ *csum = csum_block_add(*csum, next, progress);
+ return next ? 0 : len;
+}
+
+static __always_inline
+size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ __wsum *csum = priv2;
+
+ *csum = csum_and_memcpy(iter_to, from + progress, len, *csum, progress);
+ return 0;
+}
+
+static size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
+ struct iov_iter *i)
+{
+ struct csum_state *csstate = _csstate;
+ __wsum sum;
+
+ if (WARN_ON_ONCE(i->data_source))
+ return 0;
+ if (unlikely(iov_iter_is_discard(i))) {
+ // can't use csum_memcpy() for that one - data is not copied
+ csstate->csum = csum_block_add(csstate->csum,
+ csum_partial(addr, bytes, 0),
+ csstate->off);
+ csstate->off += bytes;
+ return bytes;
+ }
+
+ sum = csum_shift(csstate->csum, csstate->off);
+
+ bytes = iterate_and_advance2(i, bytes, (void *)addr, &sum,
+ copy_to_user_iter_csum,
+ memcpy_to_iter_csum);
+ csstate->csum = csum_shift(sum, csstate->off);
+ csstate->off += bytes;
+ return bytes;
+}
+
/**
* skb_copy_and_csum_datagram - Copy datagram to an iovec iterator
* and update a checksum.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4eaf7ed0d1f4..5dbdfce2d05f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -62,6 +62,7 @@
#include <linux/if_vlan.h>
#include <linux/mpls.h>
#include <linux/kcov.h>
+#include <linux/iov_iter.h>

#include <net/protocol.h>
#include <net/dst.h>
@@ -6931,3 +6932,35 @@ ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter,
return spliced ?: ret;
}
EXPORT_SYMBOL(skb_splice_from_iter);
+
+static __always_inline
+size_t memcpy_from_iter_csum(void *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ __wsum *csum = priv2;
+
+ *csum = csum_and_memcpy(to + progress, iter_from, len, *csum, progress);
+ return 0;
+}
+
+static __always_inline
+size_t copy_from_user_iter_csum(void __user *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ __wsum next, *csum = priv2;
+
+ next = csum_and_copy_from_user(iter_from, to + progress, len);
+ *csum = csum_block_add(*csum, next, progress);
+ return next ? 0 : len;
+}
+
+size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
+ struct iov_iter *i)
+{
+ if (WARN_ON_ONCE(!i->data_source))
+ return 0;
+ return iterate_and_advance2(i, bytes, addr, csum,
+ copy_from_user_iter_csum,
+ memcpy_from_iter_csum);
+}
+EXPORT_SYMBOL(csum_and_copy_from_iter);

2023-09-26 01:02:46

by David Howells

Subject: [PATCH v7 08/12] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()

iter->copy_mc is only used with a bvec iterator and only by
dump_emit_page() in fs/coredump.c, so rather than handling this in
memcpy_from_iter_mc(), where it is checked repeatedly by _copy_from_iter()
and copy_page_from_iter_atomic(), check it once in those functions and
invoke a side function that iterates with iterate_bvec() and a different
step function.

Signed-off-by: David Howells <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: David Laight <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
---
lib/iov_iter.c | 39 +++++++++++++++++++++++++++------------
1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 65374ee91ecd..943aa3cfd7b3 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -253,14 +253,33 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
#endif /* CONFIG_ARCH_HAS_COPY_MC */

-static size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
- size_t len, void *to, void *priv2)
+static __always_inline
+size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ return copy_mc_to_kernel(to + progress, iter_from, len);
+}
+
+static size_t __copy_from_iter_mc(void *addr, size_t bytes, struct iov_iter *i)
{
- struct iov_iter *iter = priv2;
+ size_t progress;

- if (iov_iter_is_copy_mc(iter))
- return copy_mc_to_kernel(to + progress, iter_from, len);
- return memcpy_from_iter(iter_from, progress, len, to, priv2);
+ if (unlikely(i->count < bytes))
+ bytes = i->count;
+ if (unlikely(!bytes))
+ return 0;
+ progress = iterate_bvec(i, bytes, addr, NULL, memcpy_from_iter_mc);
+ i->count -= progress;
+ return progress;
+}
+
+static __always_inline
+size_t __copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
+{
+ if (unlikely(iov_iter_is_copy_mc(i)))
+ return __copy_from_iter_mc(addr, bytes, i);
+ return iterate_and_advance(i, bytes, addr,
+ copy_from_user_iter, memcpy_from_iter);
}

size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
@@ -270,9 +289,7 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)

if (user_backed_iter(i))
might_fault();
- return iterate_and_advance2(i, bytes, addr, i,
- copy_from_user_iter,
- memcpy_from_iter_mc);
+ return __copy_from_iter(addr, bytes, i);
}
EXPORT_SYMBOL(_copy_from_iter);

@@ -493,9 +510,7 @@ size_t copy_page_from_iter_atomic(struct page *page, size_t offset,
}

p = kmap_atomic(page) + offset;
- n = iterate_and_advance2(i, n, p, i,
- copy_from_user_iter,
- memcpy_from_iter_mc);
+ __copy_from_iter(p, n, i);
kunmap_atomic(p);
copied += n;
offset += n;

2023-09-26 09:57:06

by David Howells

Subject: [PATCH v7 04/12] infiniband: Use user_backed_iter() to see if iterator is UBUF/IOVEC

Use user_backed_iter() to see if iterator is UBUF/IOVEC rather than poking
inside the iterator.

Signed-off-by: David Howells <[email protected]>
cc: Dennis Dalessandro <[email protected]>
cc: Jason Gunthorpe <[email protected]>
cc: Leon Romanovsky <[email protected]>
cc: [email protected]
---
drivers/infiniband/hw/hfi1/file_ops.c | 2 +-
drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index a5ab22cedd41..788fc249234f 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -267,7 +267,7 @@ static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from)

if (!HFI1_CAP_IS_KSET(SDMA))
return -EINVAL;
- if (!from->user_backed)
+ if (!user_backed_iter(from))
return -EINVAL;
idx = srcu_read_lock(&fd->pq_srcu);
pq = srcu_dereference(fd->pq, &fd->pq_srcu);
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index 152952127f13..29e4c59aa23b 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -2244,7 +2244,7 @@ static ssize_t qib_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct qib_ctxtdata *rcd = ctxt_fp(iocb->ki_filp);
struct qib_user_sdma_queue *pq = fp->pq;

- if (!from->user_backed || !from->nr_segs || !pq)
+ if (!user_backed_iter(from) || !from->nr_segs || !pq)
return -EINVAL;

return qib_user_sdma_writev(rcd, pq, iter_iov(from), from->nr_segs);

2023-10-02 09:35:33

by David Howells

Subject: Re: [PATCH v7 08/12] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()

David Howells <[email protected]> wrote:

> +static size_t __copy_from_iter_mc(void *addr, size_t bytes, struct iov_iter *i)
> {
> - struct iov_iter *iter = priv2;
> + size_t progress;
>
> - if (iov_iter_is_copy_mc(iter))
> - return copy_mc_to_kernel(to + progress, iter_from, len);
> - return memcpy_from_iter(iter_from, progress, len, to, priv2);
> + if (unlikely(i->count < bytes))
> + bytes = i->count;
> + if (unlikely(!bytes))
> + return 0;
> + progress = iterate_bvec(i, bytes, addr, NULL, memcpy_from_iter_mc);
> + i->count -= progress;

i->count shouldn't be decreased here as iterate_bvec() now does that.

This causes the LTP abort01 test to log a warning under KASAN (see below).
I'll remove the line and repush the patches.

David

LTP: starting abort01
==================================================================
BUG: KASAN: stack-out-of-bounds in __copy_from_iter_mc+0x2e6/0x480
Read of size 4 at addr ffffc90004777594 by task abort01/708

CPU: 4 PID: 708 Comm: abort01 Not tainted 99.6.0-rc3-ged6251886a1d #46
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/Incus, BIOS unknown 2/2/2022
Call Trace:
<TASK>
dump_stack_lvl+0x3d/0x70
print_report+0xce/0x650
? lock_acquire+0x1b1/0x330
kasan_report+0xda/0x110
? __copy_from_iter_mc+0x2e6/0x480
? __copy_from_iter_mc+0x2e6/0x480
__copy_from_iter_mc+0x2e6/0x480
copy_page_from_iter_atomic+0x517/0x1350
? __pfx_copy_page_from_iter_atomic+0x10/0x10
? __filemap_get_folio+0x281/0x6c0
? folio_wait_writeback+0x53/0x1e0
? prepare_pages.constprop.0+0x40b/0x6c0
btrfs_copy_from_user+0xc6/0x290
btrfs_buffered_write+0x8c9/0x1190
? __pfx_btrfs_buffered_write+0x10/0x10
? _raw_spin_unlock+0x2d/0x50
? btrfs_file_llseek+0x100/0xf00
? follow_page_mask+0x69f/0x1e10
btrfs_do_write_iter+0x859/0xff0
? __pfx_btrfs_file_llseek+0x10/0x10
? find_held_lock+0x2d/0x110
? __pfx_btrfs_do_write_iter+0x10/0x10
? __up_read+0x211/0x790
? __pfx___get_user_pages+0x10/0x10
? __pfx___up_read+0x10/0x10
? __kernel_write_iter+0x3be/0x6d0
__kernel_write_iter+0x226/0x6d0
? __pfx___kernel_write_iter+0x10/0x10
dump_user_range+0x25d/0x650
? __pfx_dump_user_range+0x10/0x10
? __pfx_writenote+0x10/0x10
elf_core_dump+0x231f/0x2e90
? __pfx_elf_core_dump+0x10/0x10
? do_coredump+0x12a9/0x38c0
? kasan_set_track+0x25/0x30
? __kasan_kmalloc+0xaa/0xb0
? __kmalloc_node+0x6c/0x1b0
? do_coredump+0x12a9/0x38c0
? get_signal+0x1e7d/0x20f0
? 0xffffffffff600000
? mas_next_slot+0x328/0x1dd0
? lock_acquire+0x162/0x330
? do_coredump+0x2537/0x38c0
do_coredump+0x2537/0x38c0
? __pfx_do_coredump+0x10/0x10
? kmem_cache_free+0x114/0x520
? find_held_lock+0x2d/0x110
get_signal+0x1e7d/0x20f0
? __pfx_get_signal+0x10/0x10
? do_send_specific+0xf1/0x1c0
? __pfx_do_send_specific+0x10/0x10
arch_do_signal_or_restart+0x8b/0x4b0
? __pfx_arch_do_signal_or_restart+0x10/0x10
exit_to_user_mode_prepare+0xde/0x210
syscall_exit_to_user_mode+0x16/0x50
do_syscall_64+0x53/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8

2023-10-07 04:33:45

by Hugh Dickins

Subject: [PATCH next] iov_iter: fix copy_page_from_iter_atomic()

Trying to test tmpfs on latest linux-next, copying and building kernel
trees, huge pages, and swapping while swapping off involved: lots of
cp: error writing '/tmp/2624/Documentation/fb/vesafb.txt': Bad address
cp: error writing '/tmp/2624/arch/mips/math-emu/dp_fsp.c': Bad address
etc.

Bisection leads to next-20231006's 376fdc4552f1 ("iov_iter:
Don't deal with iter->copy_mc in memcpy_from_iter_mc()") from vfs.git.
The tweak below makes it healthy again: please feel free to fold in.

Signed-off-by: Hugh Dickins <[email protected]>

--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -497,7 +497,7 @@ size_t copy_page_from_iter_atomic(struct
}

p = kmap_atomic(page) + offset;
- __copy_from_iter(p, n, i);
+ n = __copy_from_iter(p, n, i);
kunmap_atomic(p);
copied += n;
offset += n;

2023-10-07 07:31:23

by David Howells

Subject: Re: [PATCH next] iov_iter: fix copy_page_from_iter_atomic()

Hugh Dickins <[email protected]> wrote:

> - __copy_from_iter(p, n, i);
> + n = __copy_from_iter(p, n, i);

Yeah, that looks right. Can you fold it in, Christian?

David

2023-10-09 07:37:08

by Christian Brauner

Subject: Re: [PATCH next] iov_iter: fix copy_page_from_iter_atomic()

On Sat, Oct 07, 2023 at 08:29:14AM +0100, David Howells wrote:
> Hugh Dickins <[email protected]> wrote:
>
> > - __copy_from_iter(p, n, i);
> > + n = __copy_from_iter(p, n, i);
>
> Yeah, that looks right. Can you fold it in, Christian?

Of course. Folded into
c9eec08bac96 ("iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()")
on vfs.iov_iter.