2023-09-21 06:56:59

by David Howells

Subject: [PATCH v5 00/11] iov_iter: Convert the iterator macros into inline funcs

Hi Jens,

Could you take these patches into the block tree? Note that the series sits
on top of the kunit patchset, though it could be taken separately. The
patches convert the iov_iter iteration macros into inline functions.

(1) Convert iter->user_backed to user_backed_iter() in the sound PCM
driver.

(2) Convert iter->user_backed to user_backed_iter() in a couple of
infiniband drivers.

(3) Renumber the type enum so that the ITER_* constants match the order in
iterate_and_advance*().

(4) Since (3) puts UBUF and IOVEC at 0 and 1, change user_backed_iter() to
just use the type value and get rid of the extra flag.
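
For reference, the ordering this aims at (a sketch of the intent, with
the user-backed types first) looks like:

    enum iter_type {
            ITER_UBUF,      /* user-backed, single buffer */
            ITER_IOVEC,     /* user-backed, iovec array */
            ITER_BVEC,
            ITER_KVEC,
            ITER_XARRAY,
            ITER_DISCARD,
    };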

(5) Convert the iov_iter iteration macros to always-inline functions to
make the code easier to follow. It uses function pointers, but they
get optimised away. The priv2 argument likewise gets optimised away
if unused.

The functions are placed into a header file so that bespoke iterators
can be created elsewhere. For instance, rbd has an optimisation that
requires it to scan the buffer it is given to see if it is all
zeros. It would be nice if this could use iterate_and_advance() - but
that's currently buried inside lib/iov_iter.c.

Whilst we could provide a generic iteration function that takes a pair
of function pointers instead, it does add around 16% overhead in the
framework, presumably from the combination of function pointers and
various mitigations.

Further, if it is known that only a particular iterator type is in
play, that type's iteration function can be called directly.
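
As an aside, here's an illustrative sketch (not part of this series)
of the sort of step function the rbd zero-scan could supply once
iterate_and_advance() is available in a header; scan_zero_step() is a
hypothetical name, and memchr_inv() is the in-kernel helper that finds
the first byte not matching a value:

    static size_t scan_zero_step(void *iter_base, size_t progress,
                                 size_t len, void *priv, void *priv2)
    {
            /* Return the unprocessed length to stop the iteration at
             * the first non-zero byte; returning 0 consumes the
             * segment.
             */
            return memchr_inv(iter_base, 0, len) ? len : 0;
    }

The buffer is all zeros if everything got consumed, e.g.:

    all_zero = iterate_and_advance(iter, count, NULL, NULL,
                                   scan_zero_step) == count;

(passing a NULL user-step is fine here provided the iterator is known
to be kernel-backed).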

(6) Move the check for ->copy_mc into _copy_from_iter() and
copy_page_from_iter_atomic() rather than doing it in
memcpy_from_iter_mc(), where it gets repeated for every segment.
Instead, we check once and invoke a side function that can use
iterate_bvec() rather than iterate_and_advance() and supply a
different step function.

(7) Provide a cut-down iteration function that only handles kernel-backed
iterators (ie. BVEC, KVEC, XARRAY and DISCARD) for situations where we
know that we won't see UBUF/IOVEC. This crops up in a number of
filesystems where we decant a UBUF/IOVEC iter into a series of BVEC
iters.
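
For instance - a sketch only, assuming patch (7) adds something along
the lines of iterate_and_advance_kernel() with the same shape as
iterate_and_advance2() minus the user-step - a filesystem that has
decanted into BVEC/KVEC iters could run a CRC over the buffer with a
hypothetical step function (crc32c() being the kernel's CRC32C
helper):

    static size_t crc_step(void *iter_base, size_t progress,
                           size_t len, void *priv, void *priv2)
    {
            u32 *crc = priv;

            *crc = crc32c(*crc, iter_base, len);
            return 0;       /* whole segment consumed */
    }

    u32 crc = ~0;
    iterate_and_advance_kernel(iter, iov_iter_count(iter), &crc, NULL,
                               crc_step);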

(8) Move the copy-and-csum code to net/ where it can be in proximity with
the code that uses it. This eliminates the code if CONFIG_NET=n and
allows for the slim possibility of it being inlined.

(9) Fold csum_and_memcpy() into its two users.

(10) Move csum_and_copy_from_iter_full() out of line and merge in
csum_and_copy_from_iter() since the former is the only caller of the
latter.

(11) Move hash_and_copy_to_iter() to net/ where it can be with its only
caller.

Anyway, the changes in compiled function size on either side of patches
(5)+(6) on x86_64 look like:

__copy_from_iter_mc                new 0xe8
_copy_from_iter                    inc 0x360 -> 0x3a7 +0x47
_copy_from_iter_flushcache         inc 0x34c -> 0x38e +0x42
_copy_from_iter_nocache            inc 0x354 -> 0x378 +0x24
_copy_mc_to_iter                   inc 0x396 -> 0x3f1 +0x5b
_copy_to_iter                      inc 0x33b -> 0x385 +0x4a
copy_page_from_iter_atomic.part.0  inc 0x393 -> 0x3e0 +0x4d
copy_page_to_iter_nofault.part.0   inc 0x3de -> 0x3eb +0xd
copyin                             del 0x30
copyout                            del 0x2d
copyout_mc                         del 0x2b
csum_and_copy_from_iter            inc 0x3db -> 0x41d +0x42
csum_and_copy_to_iter              inc 0x45d -> 0x48d +0x30
iov_iter_zero                      inc 0x34a -> 0x36a +0x20
memcpy_from_iter.isra.0            del 0x1f

Note that there's a noticeable expansion on some of the main functions
because a number of the helpers get inlined instead of being called.

In terms of benchmarking patches (5)+(6), three runs without them:

iov_kunit_benchmark_ubuf: avg 3955 uS, stddev 169 uS
iov_kunit_benchmark_ubuf: avg 4122 uS, stddev 1292 uS
iov_kunit_benchmark_ubuf: avg 4451 uS, stddev 1362 uS
iov_kunit_benchmark_iovec: avg 6607 uS, stddev 22 uS
iov_kunit_benchmark_iovec: avg 6608 uS, stddev 19 uS
iov_kunit_benchmark_iovec: avg 6609 uS, stddev 24 uS
iov_kunit_benchmark_bvec: avg 3166 uS, stddev 11 uS
iov_kunit_benchmark_bvec: avg 3167 uS, stddev 13 uS
iov_kunit_benchmark_bvec: avg 3170 uS, stddev 16 uS
iov_kunit_benchmark_bvec_split: avg 3394 uS, stddev 12 uS
iov_kunit_benchmark_bvec_split: avg 3394 uS, stddev 20 uS
iov_kunit_benchmark_bvec_split: avg 3395 uS, stddev 19 uS
iov_kunit_benchmark_kvec: avg 2672 uS, stddev 12 uS
iov_kunit_benchmark_kvec: avg 2672 uS, stddev 12 uS
iov_kunit_benchmark_kvec: avg 2672 uS, stddev 9 uS
iov_kunit_benchmark_xarray: avg 3719 uS, stddev 9 uS
iov_kunit_benchmark_xarray: avg 3719 uS, stddev 9 uS
iov_kunit_benchmark_xarray: avg 3721 uS, stddev 24 uS

and three runs with them:

iov_kunit_benchmark_ubuf: avg 4110 uS, stddev 1254 uS
iov_kunit_benchmark_ubuf: avg 4141 uS, stddev 1411 uS
iov_kunit_benchmark_ubuf: avg 4572 uS, stddev 1889 uS
iov_kunit_benchmark_iovec: avg 6582 uS, stddev 27 uS
iov_kunit_benchmark_iovec: avg 6585 uS, stddev 25 uS
iov_kunit_benchmark_iovec: avg 6586 uS, stddev 48 uS
iov_kunit_benchmark_bvec: avg 3175 uS, stddev 13 uS
iov_kunit_benchmark_bvec: avg 3177 uS, stddev 12 uS
iov_kunit_benchmark_bvec: avg 3178 uS, stddev 12 uS
iov_kunit_benchmark_bvec_split: avg 3380 uS, stddev 20 uS
iov_kunit_benchmark_bvec_split: avg 3384 uS, stddev 15 uS
iov_kunit_benchmark_bvec_split: avg 3386 uS, stddev 25 uS
iov_kunit_benchmark_kvec: avg 2671 uS, stddev 11 uS
iov_kunit_benchmark_kvec: avg 2672 uS, stddev 12 uS
iov_kunit_benchmark_kvec: avg 2677 uS, stddev 20 uS
iov_kunit_benchmark_xarray: avg 3599 uS, stddev 20 uS
iov_kunit_benchmark_xarray: avg 3603 uS, stddev 8 uS
iov_kunit_benchmark_xarray: avg 3610 uS, stddev 16 uS

I've pushed the patches here also:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-cleanup

David

Changes
=======
ver #5)
- Moved kunit test patches out to a separate set.
- Moved "iter->count - progress" into individual iteration subfunctions.
- Fix iterate_iovec() to update iter->__iov after subtracting it to
calculate iter->nr_segs.
- Merged the function inlining patch and the move-to-header patch.
- Rearranged the patch order slightly to put the patches to move
networking stuff to net/ last.
- Went back to a simpler patch to special-case ->copy_mc in
_copy_from_iter() and copy_page_from_iter_atomic() before we get to call
iterate_and_advance() so as to reduce the number of checks for this.

ver #4)
- Fix iterate_bvec() and iterate_kvec() to update iter->bvec and
iter->kvec after subtracting it to calculate iter->nr_segs.
- Change iterate_xarray() to use start+progress rather than increasing
start to reduce code size.
- Added patches to move some iteration functions over to net/ as the files
there can #include the iterator framework.
- Added a patch to benchmark the iteration.
- Tried an experimental patch to make copy_from_iter() and similar always
catch MCE.

ver #3)
- Use min_t(size_t,) not min() to avoid a warning on Hexagon.
- Inline all the step functions.
- Added a patch to better handle copy_mc.

ver #2)
- Rebased on top of Willy's changes in linux-next.
- Change the checksum argument to the iteration functions to be a general
void* and use it to pass iter->copy_mc flag to memcpy_from_iter_mc() to
avoid using a function pointer.
- Arrange the end of the iterate_*() functions to look the same to give
the optimiser the best chance.
- Make iterate_and_advance() a wrapper around iterate_and_advance2().
- Adjust iterate_and_advance2() to use if-else-if-else-if-else rather than
switch(), to put ITER_BVEC before KVEC and to mark UBUF and IOVEC as
likely().
- Move "iter->count += progress" into iterate_and_advance2() from the
iterate functions.
- Mark a number of the iterator helpers with __always_inline.
- Fix _copy_from_iter_flushcache() to use memcpy_from_iter_flushcache()
not memcpy_from_iter().

Link: https://lore.kernel.org/r/[email protected]/ # v1
Link: https://lore.kernel.org/r/[email protected]/ # v2
Link: https://lore.kernel.org/r/[email protected]/ # v3
Link: https://lore.kernel.org/r/[email protected]/ # v4

David Howells (11):
sound: Fix snd_pcm_readv()/writev() to use iov access functions
infiniband: Use user_backed_iter() to see if iterator is UBUF/IOVEC
iov_iter: Renumber ITER_* constants
iov_iter: Derive user-backedness from the iterator type
iov_iter: Convert iterate*() to inline funcs
iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
iov_iter: Add a kernel-type iterator-only iteration function
iov_iter, net: Move csum_and_copy_to/from_iter() to net/
iov_iter, net: Fold in csum_and_memcpy()
iov_iter, net: Merge csum_and_copy_from_iter{,_full}() together
iov_iter, net: Move hash_and_copy_to_iter() to net/

drivers/infiniband/hw/hfi1/file_ops.c    |   2 +-
drivers/infiniband/hw/qib/qib_file_ops.c |   2 +-
include/linux/iov_iter.h                 | 305 ++++++++++++++++
include/linux/skbuff.h                   |   3 +
include/linux/uio.h                      |  29 +-
lib/iov_iter.c                           | 441 +++++++----------------
net/core/datagram.c                      |  75 +++-
net/core/skbuff.c                        |  40 ++
sound/core/pcm_native.c                  |   4 +-
9 files changed, 569 insertions(+), 332 deletions(-)
create mode 100644 include/linux/iov_iter.h


2023-09-21 07:40:19

by David Howells

Subject: [PATCH v5 01/11] sound: Fix snd_pcm_readv()/writev() to use iov access functions

Fix snd_pcm_readv()/writev() to use iov access functions rather than poking
at the iov_iter internals directly.

Signed-off-by: David Howells <[email protected]>
cc: Jaroslav Kysela <[email protected]>
cc: Takashi Iwai <[email protected]>
cc: Oswald Buddenhagen <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Suren Baghdasaryan <[email protected]>
cc: Kuninori Morimoto <[email protected]>
cc: [email protected]
---
sound/core/pcm_native.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index bd9ddf412b46..9a69236fa207 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -3527,7 +3527,7 @@ static ssize_t snd_pcm_readv(struct kiocb *iocb, struct iov_iter *to)
if (runtime->state == SNDRV_PCM_STATE_OPEN ||
runtime->state == SNDRV_PCM_STATE_DISCONNECTED)
return -EBADFD;
- if (!to->user_backed)
+ if (!user_backed_iter(to))
return -EINVAL;
if (to->nr_segs > 1024 || to->nr_segs != runtime->channels)
return -EINVAL;
@@ -3567,7 +3567,7 @@ static ssize_t snd_pcm_writev(struct kiocb *iocb, struct iov_iter *from)
if (runtime->state == SNDRV_PCM_STATE_OPEN ||
runtime->state == SNDRV_PCM_STATE_DISCONNECTED)
return -EBADFD;
- if (!from->user_backed)
+ if (!user_backed_iter(from))
return -EINVAL;
if (from->nr_segs > 128 || from->nr_segs != runtime->channels ||
!frame_aligned(runtime, iov->iov_len))

2023-09-21 08:34:37

by David Howells

Subject: [PATCH v5 11/11] iov_iter, net: Move hash_and_copy_to_iter() to net/

Move hash_and_copy_to_iter() to be with its only caller in networking code.

Signed-off-by: David Howells <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: David Laight <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
---
include/linux/uio.h |  3 ---
lib/iov_iter.c      | 20 --------------------
net/core/datagram.c | 19 +++++++++++++++++++
3 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 0cf9937f98d3..442d0d238207 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -341,9 +341,6 @@ iov_iter_npages_cap(struct iov_iter *i, int maxpages, size_t max_bytes)
return npages;
}

-size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
- struct iov_iter *i);
-
struct iovec *iovec_from_user(const struct iovec __user *uvector,
unsigned long nr_segs, unsigned long fast_segs,
struct iovec *fast_iov, bool compat);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index fef934a8745d..2547c96d56c7 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1,5 +1,4 @@
// SPDX-License-Identifier: GPL-2.0-only
-#include <crypto/hash.h>
#include <linux/export.h>
#include <linux/bvec.h>
#include <linux/fault-inject-usercopy.h>
@@ -1093,25 +1092,6 @@ ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
}
EXPORT_SYMBOL(iov_iter_get_pages_alloc2);

-size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
- struct iov_iter *i)
-{
-#ifdef CONFIG_CRYPTO_HASH
- struct ahash_request *hash = hashp;
- struct scatterlist sg;
- size_t copied;
-
- copied = copy_to_iter(addr, bytes, i);
- sg_init_one(&sg, addr, copied);
- ahash_request_set_crypt(hash, &sg, NULL, copied);
- crypto_ahash_update(hash);
- return copied;
-#else
- return 0;
-#endif
-}
-EXPORT_SYMBOL(hash_and_copy_to_iter);
-
static int iov_npages(const struct iov_iter *i, int maxpages)
{
size_t skip = i->iov_offset, size = i->count;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 722311eeee18..103d46fa0eeb 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -61,6 +61,7 @@
#include <net/tcp_states.h>
#include <trace/events/skb.h>
#include <net/busy_poll.h>
+#include <crypto/hash.h>

/*
* Is a socket 'connection oriented' ?
@@ -489,6 +490,24 @@ static int __skb_datagram_iter(const struct sk_buff *skb, int offset,
return 0;
}

+static size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
+ struct iov_iter *i)
+{
+#ifdef CONFIG_CRYPTO_HASH
+ struct ahash_request *hash = hashp;
+ struct scatterlist sg;
+ size_t copied;
+
+ copied = copy_to_iter(addr, bytes, i);
+ sg_init_one(&sg, addr, copied);
+ ahash_request_set_crypt(hash, &sg, NULL, copied);
+ crypto_ahash_update(hash);
+ return copied;
+#else
+ return 0;
+#endif
+}
+
/**
* skb_copy_and_hash_datagram_iter - Copy datagram to an iovec iterator
* and update a hash.

2023-09-21 09:06:09

by David Howells

Subject: [PATCH v5 02/11] infiniband: Use user_backed_iter() to see if iterator is UBUF/IOVEC

Use user_backed_iter() to see if iterator is UBUF/IOVEC rather than poking
inside the iterator.

Signed-off-by: David Howells <[email protected]>
cc: Dennis Dalessandro <[email protected]>
cc: Jason Gunthorpe <[email protected]>
cc: Leon Romanovsky <[email protected]>
cc: [email protected]
---
drivers/infiniband/hw/hfi1/file_ops.c    | 2 +-
drivers/infiniband/hw/qib/qib_file_ops.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index a5ab22cedd41..788fc249234f 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -267,7 +267,7 @@ static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from)

if (!HFI1_CAP_IS_KSET(SDMA))
return -EINVAL;
- if (!from->user_backed)
+ if (!user_backed_iter(from))
return -EINVAL;
idx = srcu_read_lock(&fd->pq_srcu);
pq = srcu_dereference(fd->pq, &fd->pq_srcu);
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index 152952127f13..29e4c59aa23b 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -2244,7 +2244,7 @@ static ssize_t qib_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct qib_ctxtdata *rcd = ctxt_fp(iocb->ki_filp);
struct qib_user_sdma_queue *pq = fp->pq;

- if (!from->user_backed || !from->nr_segs || !pq)
+ if (!user_backed_iter(from) || !from->nr_segs || !pq)
return -EINVAL;

return qib_user_sdma_writev(rcd, pq, iter_iov(from), from->nr_segs);

2023-09-21 09:53:18

by David Howells

Subject: [PATCH v5 05/11] iov_iter: Convert iterate*() to inline funcs

Convert the iov_iter iteration macros to inline functions to make the code
easier to follow.

The functions are marked __always_inline as we don't want to end up with
indirect calls in the code. This, however, leaves dealing with ->copy_mc
in an awkward situation since the step function (memcpy_from_iter_mc())
needs to test the flag in the iterator, but isn't passed the iterator.
This will be dealt with in a follow-up patch.

The variable names in the per-type iterator functions have been harmonised
as much as possible and made clearer as to their purpose.

The iterator functions are also moved to a header file so that other
operations that need to scan over an iterator can be added. For instance,
the rbd driver could use this to scan a buffer to see if it is all zeros
and libceph could use this to generate a crc.

Signed-off-by: David Howells <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: David Laight <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ # v1
Link: https://lore.kernel.org/r/[email protected]/ # v2
Link: https://lore.kernel.org/r/[email protected]/ # v3
---

Notes:
Changes
=======
ver #5)
- Merge in patch to move iteration framework to a header file.
- Move "iter->count - progress" into individual iteration subfunctions.

include/linux/iov_iter.h | 274 ++++++++++++++++++++++++++
lib/iov_iter.c           | 416 ++++++++++++++++-----------------
2 files changed, 449 insertions(+), 241 deletions(-)
create mode 100644 include/linux/iov_iter.h

diff --git a/include/linux/iov_iter.h b/include/linux/iov_iter.h
new file mode 100644
index 000000000000..270454a6703d
--- /dev/null
+++ b/include/linux/iov_iter.h
@@ -0,0 +1,274 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* I/O iterator iteration building functions.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#ifndef _LINUX_IOV_ITER_H
+#define _LINUX_IOV_ITER_H
+
+#include <linux/uio.h>
+#include <linux/bvec.h>
+
+typedef size_t (*iov_step_f)(void *iter_base, size_t progress, size_t len,
+ void *priv, void *priv2);
+typedef size_t (*iov_ustep_f)(void __user *iter_base, size_t progress, size_t len,
+ void *priv, void *priv2);
+
+/*
+ * Handle ITER_UBUF.
+ */
+static __always_inline
+size_t iterate_ubuf(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_ustep_f step)
+{
+ void __user *base = iter->ubuf;
+ size_t progress = 0, remain;
+
+ remain = step(base + iter->iov_offset, 0, len, priv, priv2);
+ progress = len - remain;
+ iter->iov_offset += progress;
+ iter->count -= progress;
+ return progress;
+}
+
+/*
+ * Handle ITER_IOVEC.
+ */
+static __always_inline
+size_t iterate_iovec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_ustep_f step)
+{
+ const struct iovec *p = iter->__iov;
+ size_t progress = 0, skip = iter->iov_offset;
+
+ do {
+ size_t remain, consumed;
+ size_t part = min(len, p->iov_len - skip);
+
+ if (likely(part)) {
+ remain = step(p->iov_base + skip, progress, part, priv, priv2);
+ consumed = part - remain;
+ progress += consumed;
+ skip += consumed;
+ len -= consumed;
+ if (skip < p->iov_len)
+ break;
+ }
+ p++;
+ skip = 0;
+ } while (len);
+
+ iter->nr_segs -= p - iter->__iov;
+ iter->__iov = p;
+ iter->iov_offset = skip;
+ iter->count -= progress;
+ return progress;
+}
+
+/*
+ * Handle ITER_KVEC.
+ */
+static __always_inline
+size_t iterate_kvec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_step_f step)
+{
+ const struct kvec *p = iter->kvec;
+ size_t progress = 0, skip = iter->iov_offset;
+
+ do {
+ size_t remain, consumed;
+ size_t part = min(len, p->iov_len - skip);
+
+ if (likely(part)) {
+ remain = step(p->iov_base + skip, progress, part, priv, priv2);
+ consumed = part - remain;
+ progress += consumed;
+ skip += consumed;
+ len -= consumed;
+ if (skip < p->iov_len)
+ break;
+ }
+ p++;
+ skip = 0;
+ } while (len);
+
+ iter->nr_segs -= p - iter->kvec;
+ iter->kvec = p;
+ iter->iov_offset = skip;
+ iter->count -= progress;
+ return progress;
+}
+
+/*
+ * Handle ITER_BVEC.
+ */
+static __always_inline
+size_t iterate_bvec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_step_f step)
+{
+ const struct bio_vec *p = iter->bvec;
+ size_t progress = 0, skip = iter->iov_offset;
+
+ do {
+ size_t remain, consumed;
+ size_t offset = p->bv_offset + skip, part;
+ void *kaddr = kmap_local_page(p->bv_page + offset / PAGE_SIZE);
+
+ part = min3(len,
+ (size_t)(p->bv_len - skip),
+ (size_t)(PAGE_SIZE - offset % PAGE_SIZE));
+ remain = step(kaddr + offset % PAGE_SIZE, progress, part, priv, priv2);
+ kunmap_local(kaddr);
+ consumed = part - remain;
+ len -= consumed;
+ progress += consumed;
+ skip += consumed;
+ if (skip >= p->bv_len) {
+ skip = 0;
+ p++;
+ }
+ if (remain)
+ break;
+ } while (len);
+
+ iter->nr_segs -= p - iter->bvec;
+ iter->bvec = p;
+ iter->iov_offset = skip;
+ iter->count -= progress;
+ return progress;
+}
+
+/*
+ * Handle ITER_XARRAY.
+ */
+static __always_inline
+size_t iterate_xarray(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_step_f step)
+{
+ struct folio *folio;
+ size_t progress = 0;
+ loff_t start = iter->xarray_start + iter->iov_offset;
+ pgoff_t index = start / PAGE_SIZE;
+ XA_STATE(xas, iter->xarray, index);
+
+ rcu_read_lock();
+ xas_for_each(&xas, folio, ULONG_MAX) {
+ size_t remain, consumed, offset, part, flen;
+
+ if (xas_retry(&xas, folio))
+ continue;
+ if (WARN_ON(xa_is_value(folio)))
+ break;
+ if (WARN_ON(folio_test_hugetlb(folio)))
+ break;
+
+ offset = offset_in_folio(folio, start + progress);
+ flen = min(folio_size(folio) - offset, len);
+
+ while (flen) {
+ void *base = kmap_local_folio(folio, offset);
+
+ part = min_t(size_t, flen,
+ PAGE_SIZE - offset_in_page(offset));
+ remain = step(base, progress, part, priv, priv2);
+ kunmap_local(base);
+
+ consumed = part - remain;
+ progress += consumed;
+ len -= consumed;
+
+ if (remain || len == 0)
+ goto out;
+ flen -= consumed;
+ offset += consumed;
+ }
+ }
+
+out:
+ rcu_read_unlock();
+ iter->iov_offset += progress;
+ iter->count -= progress;
+ return progress;
+}
+
+/*
+ * Handle ITER_DISCARD.
+ */
+static __always_inline
+size_t iterate_discard(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+ iov_step_f step)
+{
+ size_t progress = len;
+
+ iter->count -= progress;
+ return progress;
+}
+
+/**
+ * iterate_and_advance2 - Iterate over an iterator
+ * @iter: The iterator to iterate over.
+ * @len: The amount to iterate over.
+ * @priv: Data for the step functions.
+ * @priv2: More data for the step functions.
+ * @ustep: Function for UBUF/IOVEC iterators; given __user addresses.
+ * @step: Function for other iterators; given kernel addresses.
+ *
+ * Iterate over the next part of an iterator, up to the specified length. The
+ * buffer is presented in segments, which for kernel iteration are broken up by
+ * physical pages and mapped, with the mapped address being presented.
+ *
+ * Two step functions, @step and @ustep, must be provided, one for handling
+ * mapped kernel addresses and the other is given user addresses which have the
+ * potential to fault since no pinning is performed.
+ *
+ * The step functions are passed the address and length of the segment, @priv,
+ * @priv2 and the amount of data so far iterated over (which can, for example,
+ * be added to @priv to point to the right part of a second buffer). The step
+ * functions should return the amount of the segment they didn't process (ie. 0
+ * indicates complete processing).
+ *
+ * This function returns the amount of data processed (ie. 0 means nothing was
+ * processed and the value of @len means processed to completion).
+ */
+static __always_inline
+size_t iterate_and_advance2(struct iov_iter *iter, size_t len, void *priv,
+ void *priv2, iov_ustep_f ustep, iov_step_f step)
+{
+ if (unlikely(iter->count < len))
+ len = iter->count;
+ if (unlikely(!len))
+ return 0;
+
+ if (likely(iter_is_ubuf(iter)))
+ return iterate_ubuf(iter, len, priv, priv2, ustep);
+ if (likely(iter_is_iovec(iter)))
+ return iterate_iovec(iter, len, priv, priv2, ustep);
+ if (iov_iter_is_bvec(iter))
+ return iterate_bvec(iter, len, priv, priv2, step);
+ if (iov_iter_is_kvec(iter))
+ return iterate_kvec(iter, len, priv, priv2, step);
+ if (iov_iter_is_xarray(iter))
+ return iterate_xarray(iter, len, priv, priv2, step);
+ return iterate_discard(iter, len, priv, priv2, step);
+}
+
+/**
+ * iterate_and_advance - Iterate over an iterator
+ * @iter: The iterator to iterate over.
+ * @len: The amount to iterate over.
+ * @priv: Data for the step functions.
+ * @ustep: Function for UBUF/IOVEC iterators; given __user addresses.
+ * @step: Function for other iterators; given kernel addresses.
+ *
+ * As iterate_and_advance2(), but priv2 is always NULL.
+ */
+static __always_inline
+size_t iterate_and_advance(struct iov_iter *iter, size_t len, void *priv,
+ iov_ustep_f ustep, iov_step_f step)
+{
+ return iterate_and_advance2(iter, len, priv, NULL, ustep, step);
+}
+
+#endif /* _LINUX_IOV_ITER_H */
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 227c9f536b94..65374ee91ecd 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -13,189 +13,69 @@
#include <net/checksum.h>
#include <linux/scatterlist.h>
#include <linux/instrumented.h>
+#include <linux/iov_iter.h>

-/* covers ubuf and kbuf alike */
-#define iterate_buf(i, n, base, len, off, __p, STEP) { \
- size_t __maybe_unused off = 0; \
- len = n; \
- base = __p + i->iov_offset; \
- len -= (STEP); \
- i->iov_offset += len; \
- n = len; \
-}
-
-/* covers iovec and kvec alike */
-#define iterate_iovec(i, n, base, len, off, __p, STEP) { \
- size_t off = 0; \
- size_t skip = i->iov_offset; \
- do { \
- len = min(n, __p->iov_len - skip); \
- if (likely(len)) { \
- base = __p->iov_base + skip; \
- len -= (STEP); \
- off += len; \
- skip += len; \
- n -= len; \
- if (skip < __p->iov_len) \
- break; \
- } \
- __p++; \
- skip = 0; \
- } while (n); \
- i->iov_offset = skip; \
- n = off; \
-}
-
-#define iterate_bvec(i, n, base, len, off, p, STEP) { \
- size_t off = 0; \
- unsigned skip = i->iov_offset; \
- while (n) { \
- unsigned offset = p->bv_offset + skip; \
- unsigned left; \
- void *kaddr = kmap_local_page(p->bv_page + \
- offset / PAGE_SIZE); \
- base = kaddr + offset % PAGE_SIZE; \
- len = min(min(n, (size_t)(p->bv_len - skip)), \
- (size_t)(PAGE_SIZE - offset % PAGE_SIZE)); \
- left = (STEP); \
- kunmap_local(kaddr); \
- len -= left; \
- off += len; \
- skip += len; \
- if (skip == p->bv_len) { \
- skip = 0; \
- p++; \
- } \
- n -= len; \
- if (left) \
- break; \
- } \
- i->iov_offset = skip; \
- n = off; \
-}
-
-#define iterate_xarray(i, n, base, len, __off, STEP) { \
- __label__ __out; \
- size_t __off = 0; \
- struct folio *folio; \
- loff_t start = i->xarray_start + i->iov_offset; \
- pgoff_t index = start / PAGE_SIZE; \
- XA_STATE(xas, i->xarray, index); \
- \
- len = PAGE_SIZE - offset_in_page(start); \
- rcu_read_lock(); \
- xas_for_each(&xas, folio, ULONG_MAX) { \
- unsigned left; \
- size_t offset; \
- if (xas_retry(&xas, folio)) \
- continue; \
- if (WARN_ON(xa_is_value(folio))) \
- break; \
- if (WARN_ON(folio_test_hugetlb(folio))) \
- break; \
- offset = offset_in_folio(folio, start + __off); \
- while (offset < folio_size(folio)) { \
- base = kmap_local_folio(folio, offset); \
- len = min(n, len); \
- left = (STEP); \
- kunmap_local(base); \
- len -= left; \
- __off += len; \
- n -= len; \
- if (left || n == 0) \
- goto __out; \
- offset += len; \
- len = PAGE_SIZE; \
- } \
- } \
-__out: \
- rcu_read_unlock(); \
- i->iov_offset += __off; \
- n = __off; \
-}
-
-#define __iterate_and_advance(i, n, base, len, off, I, K) { \
- if (unlikely(i->count < n)) \
- n = i->count; \
- if (likely(n)) { \
- if (likely(iter_is_ubuf(i))) { \
- void __user *base; \
- size_t len; \
- iterate_buf(i, n, base, len, off, \
- i->ubuf, (I)) \
- } else if (likely(iter_is_iovec(i))) { \
- const struct iovec *iov = iter_iov(i); \
- void __user *base; \
- size_t len; \
- iterate_iovec(i, n, base, len, off, \
- iov, (I)) \
- i->nr_segs -= iov - iter_iov(i); \
- i->__iov = iov; \
- } else if (iov_iter_is_bvec(i)) { \
- const struct bio_vec *bvec = i->bvec; \
- void *base; \
- size_t len; \
- iterate_bvec(i, n, base, len, off, \
- bvec, (K)) \
- i->nr_segs -= bvec - i->bvec; \
- i->bvec = bvec; \
- } else if (iov_iter_is_kvec(i)) { \
- const struct kvec *kvec = i->kvec; \
- void *base; \
- size_t len; \
- iterate_iovec(i, n, base, len, off, \
- kvec, (K)) \
- i->nr_segs -= kvec - i->kvec; \
- i->kvec = kvec; \
- } else if (iov_iter_is_xarray(i)) { \
- void *base; \
- size_t len; \
- iterate_xarray(i, n, base, len, off, \
- (K)) \
- } \
- i->count -= n; \
- } \
-}
-#define iterate_and_advance(i, n, base, len, off, I, K) \
- __iterate_and_advance(i, n, base, len, off, I, ((void)(K),0))
-
-static int copyout(void __user *to, const void *from, size_t n)
+static __always_inline
+size_t copy_to_user_iter(void __user *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
{
if (should_fail_usercopy())
- return n;
- if (access_ok(to, n)) {
- instrument_copy_to_user(to, from, n);
- n = raw_copy_to_user(to, from, n);
+ return len;
+ if (access_ok(iter_to, len)) {
+ from += progress;
+ instrument_copy_to_user(iter_to, from, len);
+ len = raw_copy_to_user(iter_to, from, len);
}
- return n;
+ return len;
}

-static int copyout_nofault(void __user *to, const void *from, size_t n)
+static __always_inline
+size_t copy_to_user_iter_nofault(void __user *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
{
- long res;
+ ssize_t res;

if (should_fail_usercopy())
- return n;
-
- res = copy_to_user_nofault(to, from, n);
+ return len;

- return res < 0 ? n : res;
+ from += progress;
+ res = copy_to_user_nofault(iter_to, from, len);
+ return res < 0 ? len : res;
}

-static int copyin(void *to, const void __user *from, size_t n)
+static __always_inline
+size_t copy_from_user_iter(void __user *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
{
- size_t res = n;
+ size_t res = len;

if (should_fail_usercopy())
- return n;
- if (access_ok(from, n)) {
- instrument_copy_from_user_before(to, from, n);
- res = raw_copy_from_user(to, from, n);
- instrument_copy_from_user_after(to, from, n, res);
+ return len;
+ if (access_ok(iter_from, len)) {
+ to += progress;
+ instrument_copy_from_user_before(to, iter_from, len);
+ res = raw_copy_from_user(to, iter_from, len);
+ instrument_copy_from_user_after(to, iter_from, len, res);
}
return res;
}

+static __always_inline
+size_t memcpy_to_iter(void *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ memcpy(iter_to, from + progress, len);
+ return 0;
+}
+
+static __always_inline
+size_t memcpy_from_iter(void *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ memcpy(to + progress, iter_from, len);
+ return 0;
+}
+
/*
* fault_in_iov_iter_readable - fault in iov iterator for reading
* @i: iterator
@@ -312,23 +192,29 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
return 0;
if (user_backed_iter(i))
might_fault();
- iterate_and_advance(i, bytes, base, len, off,
- copyout(base, addr + off, len),
- memcpy(base, addr + off, len)
- )
-
- return bytes;
+ return iterate_and_advance(i, bytes, (void *)addr,
+ copy_to_user_iter, memcpy_to_iter);
}
EXPORT_SYMBOL(_copy_to_iter);

#ifdef CONFIG_ARCH_HAS_COPY_MC
-static int copyout_mc(void __user *to, const void *from, size_t n)
-{
- if (access_ok(to, n)) {
- instrument_copy_to_user(to, from, n);
- n = copy_mc_to_user((__force void *) to, from, n);
+static __always_inline
+size_t copy_to_user_iter_mc(void __user *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ if (access_ok(iter_to, len)) {
+ from += progress;
+ instrument_copy_to_user(iter_to, from, len);
+ len = copy_mc_to_user(iter_to, from, len);
}
- return n;
+ return len;
+}
+
+static __always_inline
+size_t memcpy_to_iter_mc(void *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ return copy_mc_to_kernel(iter_to, from + progress, len);
}

/**
@@ -361,22 +247,20 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
return 0;
if (user_backed_iter(i))
might_fault();
- __iterate_and_advance(i, bytes, base, len, off,
- copyout_mc(base, addr + off, len),
- copy_mc_to_kernel(base, addr + off, len)
- )
-
- return bytes;
+ return iterate_and_advance(i, bytes, (void *)addr,
+ copy_to_user_iter_mc, memcpy_to_iter_mc);
}
EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
#endif /* CONFIG_ARCH_HAS_COPY_MC */

-static void *memcpy_from_iter(struct iov_iter *i, void *to, const void *from,
- size_t size)
+static size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
{
- if (iov_iter_is_copy_mc(i))
- return (void *)copy_mc_to_kernel(to, from, size);
- return memcpy(to, from, size);
+ struct iov_iter *iter = priv2;
+
+ if (iov_iter_is_copy_mc(iter))
+ return copy_mc_to_kernel(to + progress, iter_from, len);
+ return memcpy_from_iter(iter_from, progress, len, to, priv2);
}

size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
@@ -386,30 +270,46 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)

if (user_backed_iter(i))
might_fault();
- iterate_and_advance(i, bytes, base, len, off,
- copyin(addr + off, base, len),
- memcpy_from_iter(i, addr + off, base, len)
- )
-
- return bytes;
+ return iterate_and_advance2(i, bytes, addr, i,
+ copy_from_user_iter,
+ memcpy_from_iter_mc);
}
EXPORT_SYMBOL(_copy_from_iter);

+static __always_inline
+size_t copy_from_user_iter_nocache(void __user *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ return __copy_from_user_inatomic_nocache(to + progress, iter_from, len);
+}
+
size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
{
if (WARN_ON_ONCE(!i->data_source))
return 0;

- iterate_and_advance(i, bytes, base, len, off,
- __copy_from_user_inatomic_nocache(addr + off, base, len),
- memcpy(addr + off, base, len)
- )
-
- return bytes;
+ return iterate_and_advance(i, bytes, addr,
+ copy_from_user_iter_nocache,
+ memcpy_from_iter);
}
EXPORT_SYMBOL(_copy_from_iter_nocache);

#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+static __always_inline
+size_t copy_from_user_iter_flushcache(void __user *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ return __copy_from_user_flushcache(to + progress, iter_from, len);
+}
+
+static __always_inline
+size_t memcpy_from_iter_flushcache(void *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ memcpy_flushcache(to + progress, iter_from, len);
+ return 0;
+}
+
/**
* _copy_from_iter_flushcache - write destination through cpu cache
* @addr: destination kernel address
@@ -431,12 +331,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
if (WARN_ON_ONCE(!i->data_source))
return 0;

- iterate_and_advance(i, bytes, base, len, off,
- __copy_from_user_flushcache(addr + off, base, len),
- memcpy_flushcache(addr + off, base, len)
- )
-
- return bytes;
+ return iterate_and_advance(i, bytes, addr,
+ copy_from_user_iter_flushcache,
+ memcpy_from_iter_flushcache);
}
EXPORT_SYMBOL_GPL(_copy_from_iter_flushcache);
#endif
@@ -508,10 +405,9 @@ size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t byte
void *kaddr = kmap_local_page(page);
size_t n = min(bytes, (size_t)PAGE_SIZE - offset);

- iterate_and_advance(i, n, base, len, off,
- copyout_nofault(base, kaddr + offset + off, len),
- memcpy(base, kaddr + offset + off, len)
- )
+ n = iterate_and_advance(i, n, kaddr + offset,
+ copy_to_user_iter_nofault,
+ memcpy_to_iter);
kunmap_local(kaddr);
res += n;
bytes -= n;
@@ -554,14 +450,25 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
}
EXPORT_SYMBOL(copy_page_from_iter);

-size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
+static __always_inline
+size_t zero_to_user_iter(void __user *iter_to, size_t progress,
+ size_t len, void *priv, void *priv2)
{
- iterate_and_advance(i, bytes, base, len, count,
- clear_user(base, len),
- memset(base, 0, len)
- )
+ return clear_user(iter_to, len);
+}

- return bytes;
+static __always_inline
+size_t zero_to_iter(void *iter_to, size_t progress,
+ size_t len, void *priv, void *priv2)
+{
+ memset(iter_to, 0, len);
+ return 0;
+}
+
+size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
+{
+ return iterate_and_advance(i, bytes, NULL,
+ zero_to_user_iter, zero_to_iter);
}
EXPORT_SYMBOL(iov_iter_zero);

@@ -586,10 +493,9 @@ size_t copy_page_from_iter_atomic(struct page *page, size_t offset,
}

p = kmap_atomic(page) + offset;
- iterate_and_advance(i, n, base, len, off,
- copyin(p + off, base, len),
- memcpy_from_iter(i, p + off, base, len)
- )
+ n = iterate_and_advance2(i, n, p, i,
+ copy_from_user_iter,
+ memcpy_from_iter_mc);
kunmap_atomic(p);
copied += n;
offset += n;
@@ -1180,32 +1086,64 @@ ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
}
EXPORT_SYMBOL(iov_iter_get_pages_alloc2);

+static __always_inline
+size_t copy_from_user_iter_csum(void __user *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ __wsum next, *csum = priv2;
+
+ next = csum_and_copy_from_user(iter_from, to + progress, len);
+ *csum = csum_block_add(*csum, next, progress);
+ return next ? 0 : len;
+}
+
+static __always_inline
+size_t memcpy_from_iter_csum(void *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ __wsum *csum = priv2;
+
+ *csum = csum_and_memcpy(to + progress, iter_from, len, *csum, progress);
+ return 0;
+}
+
size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
struct iov_iter *i)
{
- __wsum sum, next;
- sum = *csum;
if (WARN_ON_ONCE(!i->data_source))
return 0;
-
- iterate_and_advance(i, bytes, base, len, off, ({
- next = csum_and_copy_from_user(base, addr + off, len);
- sum = csum_block_add(sum, next, off);
- next ? 0 : len;
- }), ({
- sum = csum_and_memcpy(addr + off, base, len, sum, off);
- })
- )
- *csum = sum;
- return bytes;
+ return iterate_and_advance2(i, bytes, addr, csum,
+ copy_from_user_iter_csum,
+ memcpy_from_iter_csum);
}
EXPORT_SYMBOL(csum_and_copy_from_iter);

+static __always_inline
+size_t copy_to_user_iter_csum(void __user *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ __wsum next, *csum = priv2;
+
+ next = csum_and_copy_to_user(from + progress, iter_to, len);
+ *csum = csum_block_add(*csum, next, progress);
+ return next ? 0 : len;
+}
+
+static __always_inline
+size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
+ size_t len, void *from, void *priv2)
+{
+ __wsum *csum = priv2;
+
+ *csum = csum_and_memcpy(iter_to, from + progress, len, *csum, progress);
+ return 0;
+}
+
size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
struct iov_iter *i)
{
struct csum_state *csstate = _csstate;
- __wsum sum, next;
+ __wsum sum;

if (WARN_ON_ONCE(i->data_source))
return 0;
@@ -1219,14 +1157,10 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
}

sum = csum_shift(csstate->csum, csstate->off);
- iterate_and_advance(i, bytes, base, len, off, ({
- next = csum_and_copy_to_user(addr + off, base, len);
- sum = csum_block_add(sum, next, off);
- next ? 0 : len;
- }), ({
- sum = csum_and_memcpy(base, addr + off, len, sum, off);
- })
- )
+
+ bytes = iterate_and_advance2(i, bytes, (void *)addr, &sum,
+ copy_to_user_iter_csum,
+ memcpy_to_iter_csum);
csstate->csum = csum_shift(sum, csstate->off);
csstate->off += bytes;
return bytes;

2023-09-21 10:24:15

by David Howells

Subject: [PATCH v5 04/11] iov_iter: Derive user-backedness from the iterator type

Use the iterator type to determine whether an iterator is user-backed or
not rather than using a special flag for it. Now that ITER_UBUF and
ITER_IOVEC are 0 and 1, they can be checked with a single comparison.
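
Illustratively, the two tests then reduce to a single unsigned
comparison, which the compiler can do for us:

    /* iter_is_ubuf(i) || iter_is_iovec(i) compiles down to: */
    return i->iter_type <= ITER_IOVEC;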

Signed-off-by: David Howells <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: David Laight <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
---
include/linux/uio.h | 4 +---
lib/iov_iter.c      | 1 -
2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index d1801c46e89e..e2a248dad80b 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -43,7 +43,6 @@ struct iov_iter {
bool copy_mc;
bool nofault;
bool data_source;
- bool user_backed;
union {
size_t iov_offset;
int last_offset;
@@ -143,7 +142,7 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i)

static inline bool user_backed_iter(const struct iov_iter *i)
{
- return i->user_backed;
+ return iter_is_ubuf(i) || iter_is_iovec(i);
}

/*
@@ -383,7 +382,6 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
*i = (struct iov_iter) {
.iter_type = ITER_UBUF,
.copy_mc = false,
- .user_backed = true,
.data_source = direction,
.ubuf = buf,
.count = count,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 27234a820eeb..227c9f536b94 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -290,7 +290,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
.iter_type = ITER_IOVEC,
.copy_mc = false,
.nofault = false,
- .user_backed = true,
.data_source = direction,
.__iov = iov,
.nr_segs = nr_segs,

2023-09-21 13:22:48

by David Howells

Subject: [PATCH v5 06/11] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()

iter->copy_mc is only used with a bvec iterator and only by
dump_emit_page() in fs/coredump.c, so rather than handle this in
memcpy_from_iter_mc(), where it is checked repeatedly by _copy_from_iter()
and copy_page_from_iter_atomic(), check it once in those two functions and
have them call a separate __copy_from_iter_mc() that uses iterate_bvec()
directly with an MC-specific step function.

Signed-off-by: David Howells <[email protected]>
---
lib/iov_iter.c | 39 +++++++++++++++++++++++++++------------
1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 65374ee91ecd..943aa3cfd7b3 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -253,14 +253,33 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
#endif /* CONFIG_ARCH_HAS_COPY_MC */

-static size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
- size_t len, void *to, void *priv2)
+static __always_inline
+size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
+ size_t len, void *to, void *priv2)
+{
+ return copy_mc_to_kernel(to + progress, iter_from, len);
+}
+
+static size_t __copy_from_iter_mc(void *addr, size_t bytes, struct iov_iter *i)
{
- struct iov_iter *iter = priv2;
+ size_t progress;

- if (iov_iter_is_copy_mc(iter))
- return copy_mc_to_kernel(to + progress, iter_from, len);
- return memcpy_from_iter(iter_from, progress, len, to, priv2);
+ if (unlikely(i->count < bytes))
+ bytes = i->count;
+ if (unlikely(!bytes))
+ return 0;
+ progress = iterate_bvec(i, bytes, addr, NULL, memcpy_from_iter_mc);
+ i->count -= progress;
+ return progress;
+}
+
+static __always_inline
+size_t __copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
+{
+ if (unlikely(iov_iter_is_copy_mc(i)))
+ return __copy_from_iter_mc(addr, bytes, i);
+ return iterate_and_advance(i, bytes, addr,
+ copy_from_user_iter, memcpy_from_iter);
}

size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
@@ -270,9 +289,7 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)

if (user_backed_iter(i))
might_fault();
- return iterate_and_advance2(i, bytes, addr, i,
- copy_from_user_iter,
- memcpy_from_iter_mc);
+ return __copy_from_iter(addr, bytes, i);
}
EXPORT_SYMBOL(_copy_from_iter);

@@ -493,9 +510,7 @@ size_t copy_page_from_iter_atomic(struct page *page, size_t offset,
}

p = kmap_atomic(page) + offset;
- n = iterate_and_advance2(i, n, p, i,
- copy_from_user_iter,
- memcpy_from_iter_mc);
+ n = __copy_from_iter(p, n, i);
kunmap_atomic(p);
copied += n;
offset += n;

2023-09-22 00:04:35

by Takashi Iwai

Subject: Re: [PATCH v5 01/11] sound: Fix snd_pcm_readv()/writev() to use iov access functions

On Thu, 21 Sep 2023 17:03:28 +0200,
David Howells wrote:
>
> Takashi Iwai <[email protected]> wrote:
>
> > Would you apply it through your tree, or shall I apply this one via
> > sound git tree?
>
> It's a prerequisite for a later patch in this series, so I'd prefer to keep it
> with my other patches.

Sure, please go ahead.


thanks,

Takashi

2023-09-22 00:13:14

by David Laight

Subject: RE: [PATCH v5 00/11] iov_iter: Convert the iterator macros into inline funcs

From: David Howells
> Sent: 20 September 2023 23:22
...
> (8) Move the copy-and-csum code to net/ where it can be in proximity with
> the code that uses it. This eliminates the code if CONFIG_NET=n and
> allows for the slim possibility of it being inlined.
>
> (9) Fold csum_and_memcpy() into its two users.
>
> (10) Move csum_and_copy_from_iter_full() out of line and merge in
> csum_and_copy_from_iter() since the former is the only caller of the
> latter.

I thought that the real idea behind these was to do the checksum
at the same time as the copy to avoid loading the data into the L1
data-cache twice - especially for long buffers.
I wonder how often there are multiple iov[] that actually make
it better than just checksumming the linear buffer?

I had a feeling that checksumming of UDP data was done during
copy_to/from_user, but the code can't be the copy-and-csum here
for that because it is missing support for odd-length buffers.

Intel x86 desktop chips can easily checksum at 8 bytes/clock
(But probably not with the current code!).
(I've got ~12 bytes/clock using adox and adcx but that loop
is entirely horrid and it would need run-time patching.
Especially since I think some AMD cpus execute them very slowly.)

OTOH 'rep movs[bq]' copy will copy 16 bytes/clock (32 if the
destination is 32 byte aligned - it pretty much won't be).

So you'd need a csum-and-copy loop that did 16 bytes every
three clocks to get the same throughput for long buffers.
In principle splitting the 'adc memory' into two instructions
is the same number of u-ops - but I'm sure I've tried to do
that and failed and the extra memory write can happen in
parallel with everything else.
So I don't think you'll get 16 bytes in two clocks - but you
might get it in three.

OTOH for a cpu where memcpy is a code loop, summing the data in
the copy loop is likely to be a gain.

But I suspect doing the checksum and copy at the same time
got 'all too complicated' to actually implement fully.
With most modern ethernet chips checksumming receive packets,
does it really get used enough for the additional complexity?

David


2023-09-22 04:41:04

by Jaroslav Kysela

Subject: Re: [PATCH v5 01/11] sound: Fix snd_pcm_readv()/writev() to use iov access functions

On 21. 09. 23 0:22, David Howells wrote:
> Fix snd_pcm_readv()/writev() to use iov access functions rather than poking
> at the iov_iter internals directly.
>
> Signed-off-by: David Howells <[email protected]>
> cc: Jaroslav Kysela <[email protected]>
> cc: Takashi Iwai <[email protected]>
> cc: Oswald Buddenhagen <[email protected]>
> cc: Jens Axboe <[email protected]>
> cc: Suren Baghdasaryan <[email protected]>
> cc: Kuninori Morimoto <[email protected]>
> cc: [email protected]
> ---
> sound/core/pcm_native.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Jaroslav Kysela <[email protected]>

--
Jaroslav Kysela <[email protected]>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.

2023-09-22 08:44:57

by David Howells

Subject: Re: [PATCH v5 01/11] sound: Fix snd_pcm_readv()/writev() to use iov access functions

Takashi Iwai <[email protected]> wrote:

> Would you apply it through your tree, or shall I apply this one via
> sound git tree?

It's a prerequisite for a later patch in this series, so I'd prefer to keep it
with my other patches.

David

2023-09-22 10:31:03

by Simon Horman

Subject: Re: [PATCH v5 05/11] iov_iter: Convert iterate*() to inline funcs

On Wed, Sep 20, 2023 at 11:22:25PM +0100, David Howells wrote:

...

> @@ -312,23 +192,29 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
> return 0;
> if (user_backed_iter(i))
> might_fault();
> - iterate_and_advance(i, bytes, base, len, off,
> - copyout(base, addr + off, len),
> - memcpy(base, addr + off, len)
> - )
> -
> - return bytes;
> + return iterate_and_advance(i, bytes, (void *)addr,
> + copy_to_user_iter, memcpy_to_iter);
> }
> EXPORT_SYMBOL(_copy_to_iter);
>
> #ifdef CONFIG_ARCH_HAS_COPY_MC
> -static int copyout_mc(void __user *to, const void *from, size_t n)
> -{
> - if (access_ok(to, n)) {
> - instrument_copy_to_user(to, from, n);
> - n = copy_mc_to_user((__force void *) to, from, n);
> +static __always_inline
> +size_t copy_to_user_iter_mc(void __user *iter_to, size_t progress,
> + size_t len, void *from, void *priv2)
> +{
> + if (access_ok(iter_to, len)) {
> + from += progress;
> + instrument_copy_to_user(iter_to, from, len);
> + len = copy_mc_to_user(iter_to, from, len);

Hi David,

Sparse complains a bit about the line above; perhaps the '(__force void *)'
should be retained from the old code?

lib/iov_iter.c:208:39: warning: incorrect type in argument 1 (different address spaces)
lib/iov_iter.c:208:39: expected void *to
lib/iov_iter.c:208:39: got void [noderef] __user *iter_to

> }
> - return n;
> + return len;
> +}
> +
> +static __always_inline
> +size_t memcpy_to_iter_mc(void *iter_to, size_t progress,
> + size_t len, void *from, void *priv2)
> +{
> + return copy_mc_to_kernel(iter_to, from + progress, len);
> }
>
> /**

...

2023-09-22 14:39:05

by Takashi Iwai

Subject: Re: [PATCH v5 01/11] sound: Fix snd_pcm_readv()/writev() to use iov access functions

On Thu, 21 Sep 2023 00:22:21 +0200,
David Howells wrote:
>
> Fix snd_pcm_readv()/writev() to use iov access functions rather than poking
> at the iov_iter internals directly.
>
> Signed-off-by: David Howells <[email protected]>
> cc: Jaroslav Kysela <[email protected]>
> cc: Takashi Iwai <[email protected]>
> cc: Oswald Buddenhagen <[email protected]>
> cc: Jens Axboe <[email protected]>
> cc: Suren Baghdasaryan <[email protected]>
> cc: Kuninori Morimoto <[email protected]>
> cc: [email protected]

Reviewed-by: Takashi Iwai <[email protected]>

Would you apply it through your tree, or shall I apply this one via
sound git tree?


thanks,

Takashi

> ---
> sound/core/pcm_native.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
> index bd9ddf412b46..9a69236fa207 100644
> --- a/sound/core/pcm_native.c
> +++ b/sound/core/pcm_native.c
> @@ -3527,7 +3527,7 @@ static ssize_t snd_pcm_readv(struct kiocb *iocb, struct iov_iter *to)
> if (runtime->state == SNDRV_PCM_STATE_OPEN ||
> runtime->state == SNDRV_PCM_STATE_DISCONNECTED)
> return -EBADFD;
> - if (!to->user_backed)
> + if (!user_backed_iter(to))
> return -EINVAL;
> if (to->nr_segs > 1024 || to->nr_segs != runtime->channels)
> return -EINVAL;
> @@ -3567,7 +3567,7 @@ static ssize_t snd_pcm_writev(struct kiocb *iocb, struct iov_iter *from)
> if (runtime->state == SNDRV_PCM_STATE_OPEN ||
> runtime->state == SNDRV_PCM_STATE_DISCONNECTED)
> return -EBADFD;
> - if (!from->user_backed)
> + if (!user_backed_iter(from))
> return -EINVAL;
> if (from->nr_segs > 128 || from->nr_segs != runtime->channels ||
> !frame_aligned(runtime, iov->iov_len))
>

2023-09-22 16:43:14

by David Howells

Subject: Re: [PATCH v5 05/11] iov_iter: Convert iterate*() to inline funcs

Simon Horman <[email protected]> wrote:

> Sparse complains a bit about the line above, perhaps the '(__force void *)'
> should be retained from the old code?

I've added a patch to add the missing __user to the x86 copy_mc_to_user().
The powerpc one already has it.
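
Roughly this sort of thing (a sketch of the idea, not the actual patch
text):

    -unsigned long __must_check
    -copy_mc_to_user(void *to, const void *from, unsigned len);
    +unsigned long __must_check
    +copy_mc_to_user(void __user *to, const void *from, unsigned len);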

David

2023-09-22 17:08:24

by David Howells

Subject: Re: [PATCH v5 00/11] iov_iter: Convert the iterator macros into inline funcs

David Laight <[email protected]> wrote:

> > (8) Move the copy-and-csum code to net/ where it can be in proximity with
> > the code that uses it. This eliminates the code if CONFIG_NET=n and
> > allows for the slim possibility of it being inlined.
> >
> > (9) Fold csum_and_memcpy() into its two users.
> >
> > (10) Move csum_and_copy_from_iter_full() out of line and merge in
> > csum_and_copy_from_iter() since the former is the only caller of the
> > latter.
>
> I thought that the real idea behind these was to do the checksum
> at the same time as the copy to avoid loading the data into the L1
> data-cache twice - especially for long buffers.
> I wonder how often there are multiple iov[] that actually make
> it better than just checksumming the linear buffer?

It also reduces the overhead for finding the data to checksum in the case the
packet gets split since we're doing the checksumming as we copy - but with a
linear buffer, that's negligible.

> I had a feeling that checksumming of UDP data was done during
> copy_to/from_user, but the code can't be the copy-and-csum here
> for that because it is missing support for odd-length buffers.

Is there a bug there?

> Intel x86 desktop chips can easily checksum at 8 bytes/clock
> (But probably not with the current code!).
> (I've got ~12 bytes/clock using adox and adcx but that loop
> is entirely horrid and it would need run-time patching.
> Especially since I think some AMD cpus execute them very slowly.)
>
> OTOH 'rep movs[bq]' copy will copy 16 bytes/clock (32 if the
> destination is 32 byte aligned - it pretty much won't be).
>
> So you'd need a csum-and-copy loop that did 16 bytes every
> three clocks to get the same throughput for long buffers.
> In principle splitting the 'adc memory' into two instructions
> is the same number of u-ops - but I'm sure I've tried to do
> that and failed and the extra memory write can happen in
> parallel with everything else.
> So I don't think you'll get 16 bytes in two clocks - but you
> might get it in three.
>
> OTOH for a cpu where memcpy is a code loop, summing the data in
> the copy loop is likely to be a gain.
>
> But I suspect doing the checksum and copy at the same time
> got 'all too complicated' to actually implement fully.
> With most modern ethernet chips checksumming receive packets,
> does it really get used enough for the additional complexity?

You may be right. That's more a question for the networking folks than for
me. It's entirely possible that the checksumming code is just not used on
modern systems these days.

Maybe Willem can comment since he's the UDP maintainer?

David

2023-09-23 09:05:41

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH v5 00/11] iov_iter: Convert the iterator macros into inline funcs

On Fri, Sep 22, 2023 at 2:01 PM David Howells <[email protected]> wrote:
>
> David Laight <[email protected]> wrote:
>
> > > (8) Move the copy-and-csum code to net/ where it can be in proximity with
> > > the code that uses it. This eliminates the code if CONFIG_NET=n and
> > > allows for the slim possibility of it being inlined.
> > >
> > > (9) Fold csum_and_memcpy() into its two users.
> > >
> > > (10) Move csum_and_copy_from_iter_full() out of line and merge in
> > > csum_and_copy_from_iter() since the former is the only caller of the
> > > latter.
> >
> > I thought that the real idea behind these was to do the checksum
> > at the same time as the copy to avoid loading the data into the L1
> > data-cache twice - especially for long buffers.
> > I wonder how often there are multiple iov[] that actually make
> > it better than just checksumming the linear buffer?
>
> It also reduces the overhead for finding the data to checksum in the case the
> packet gets split since we're doing the checksumming as we copy - but with a
> linear buffer, that's negligible.
>
> > I had a feeling that checksumming of UDP data was done during
> > copy_to/from_user, but the code can't be the copy-and-csum here
> > for that because it is missing support for odd-length buffers.
>
> Is there a bug there?
>
> > Intel x86 desktop chips can easily checksum at 8 bytes/clock
> > (But probably not with the current code!).
> > (I've got ~12 bytes/clock using adox and adcx but that loop
> > is entirely horrid and it would need run-time patching.
> > Especially since I think some AMD cpus execute them very slowly.)
> >
> > OTOH 'rep movs[bq]' copy will copy 16 bytes/clock (32 if the
> > destination is 32 byte aligned - it pretty much won't be).
> >
> > So you'd need a csum-and-copy loop that did 16 bytes every
> > three clocks to get the same throughput for long buffers.
> > In principle splitting the 'adc memory' into two instructions
> > is the same number of u-ops - but I'm sure I've tried to do
> > that and failed and the extra memory write can happen in
> > parallel with everything else.
> > So I don't think you'll get 16 bytes in two clocks - but you
> > might get it in three.
> >
> > OTOH for a cpu where memcpy is a code loop, summing the data in
> > the copy loop is likely to be a gain.
> >
> > But I suspect doing the checksum and copy at the same time
> > got 'all too complicated' to actually implement fully.
> > With most modern ethernet chips checksumming receive packets,
> > does it really get used enough for the additional complexity?
>
> You may be right. That's more a question for the networking folks than for
> me. It's entirely possible that the checksumming code is just not used on
> modern systems these days.
>
> Maybe Willem can comment since he's the UDP maintainer?

Perhaps these days it is more relevant to embedded systems than high
end servers.

2023-09-23 10:38:04

by David Laight

Subject: RE: [PATCH v5 00/11] iov_iter: Convert the iterator macros into inline funcs

From: Willem de Bruijn
> Sent: 23 September 2023 07:59
>
> On Fri, Sep 22, 2023 at 2:01 PM David Howells <[email protected]> wrote:
> >
> > David Laight <[email protected]> wrote:
> >
> > > > (8) Move the copy-and-csum code to net/ where it can be in proximity with
> > > > the code that uses it. This eliminates the code if CONFIG_NET=n and
> > > > allows for the slim possibility of it being inlined.
> > > >
> > > > (9) Fold csum_and_memcpy() into its two users.
> > > >
> > > > (10) Move csum_and_copy_from_iter_full() out of line and merge in
> > > > csum_and_copy_from_iter() since the former is the only caller of the
> > > > latter.
> > >
> > > I thought that the real idea behind these was to do the checksum
> > > at the same time as the copy to avoid loading the data into the L1
> > > data-cache twice - especially for long buffers.
> > > I wonder how often there are multiple iov[] that actually make
> > > it better than just checksumming the linear buffer?
> >
> > It also reduces the overhead for finding the data to checksum in the case the
> > packet gets split since we're doing the checksumming as we copy - but with a
> > linear buffer, that's negligible.
> >
> > > I had a feeling that checksumming of UDP data was done during
> > > copy_to/from_user, but the code can't be the copy-and-csum here
> > > for that because it is missing support for odd-length buffers.
> >
> > Is there a bug there?

No, I misread the code - I shouldn't scan patches when I'd
got a viral head cold...

...
> > You may be right. That's more a question for the networking folks than for
> > me. It's entirely possible that the checksumming code is just not used on
> > modern systems these days.
> >
> > Maybe Willem can comment since he's the UDP maintainer?
>
> Perhaps these days it is more relevant to embedded systems than high
> end servers.

The checksum and copy are done together.
I probably missed it because the function isn't passed the
old checksum (which it can pretty much process for free).
Instead the caller is adding it afterwards - which involves
an extra explicit csum_add().

The x86-64 IP checksum loops are all horrid though.
The unrolling in them is so 1990's.
With the out-of-order pipeline the memory accesses tend
to take care of themselves.
Not to mention that a whole raft of (now oldish) cpus take two
clocks to execute 'adc'.

David
