2020-11-21 14:16:32

by David Howells

Subject: [PATCH 00/29] RFC: iov_iter: Switch to using an ops table


Hi Pavel, Willy, Jens, Al,

I had a go switching the iov_iter stuff away from using a type bitmask to
using an ops table to get rid of the if-if-if-if chains that are all over
the place. After I pushed it, someone pointed me at Pavel's two patches.

I have another iterator class that I want to add, and that would lengthen the
if-if-if-if chains further. A lot of the time, there's a conditional clause at
the beginning of a function that just jumps off to a type-specific handler or
rejects the operation for that type. An ops table can simply point at that
handler instead.
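
To make the dispatch pattern concrete, here's a toy userspace sketch of the
idea - not the kernel code, and all of the names in it are made up:

#include <stdio.h>

/* Each iterator type supplies a table of handlers once, installed by
 * that type's constructor. */
struct toy_iter;

struct toy_iter_ops {
	size_t (*copy_to)(const void *addr, size_t bytes, struct toy_iter *i);
};

struct toy_iter {
	const struct toy_iter_ops *ops;
	size_t count;
};

/* Per-type handlers replace the if-if-if-if chain that every entry
 * point would otherwise need. */
static size_t iovec_copy_to(const void *addr, size_t bytes, struct toy_iter *i)
{
	size_t n = bytes < i->count ? bytes : i->count;
	(void)addr;		/* a real version would copy out here */
	i->count -= n;
	return n;
}

static size_t discard_copy_to(const void *addr, size_t bytes, struct toy_iter *i)
{
	size_t n = bytes < i->count ? bytes : i->count;
	(void)addr;		/* sink the data without copying */
	i->count -= n;
	return n;
}

static const struct toy_iter_ops iovec_ops   = { .copy_to = iovec_copy_to };
static const struct toy_iter_ops discard_ops = { .copy_to = discard_copy_to };

/* The generic entry point becomes a single indirect call. */
static size_t toy_copy_to_iter(const void *addr, size_t bytes,
			       struct toy_iter *i)
{
	return i->ops->copy_to(addr, bytes, i);
}

int main(void)
{
	char buf[64] = { 0 };
	struct toy_iter a = { .ops = &iovec_ops,   .count = 100 };
	struct toy_iter b = { .ops = &discard_ops, .count = 10 };

	printf("%zu %zu\n",
	       toy_copy_to_iter(buf, sizeof(buf), &a),
	       toy_copy_to_iter(buf, sizeof(buf), &b));
	return 0;
}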

As far as I can tell, there's no difference in performance in most cases,
though AFS-based kernel compiles appear to take less time (down from 3m20 to
2m50). That might make sense, as AFS uses iterators a lot - but there are too
many variables for it to be a good benchmark (I'm dealing with a remote
server, for a start).

Can someone recommend a good way to benchmark this properly? The problem is
that the difference this makes is tiny relative to the amount of time taken
to actually do the I/O.

I've tried TCP transfers using the following sink program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define OSERROR(X, Y) do { if ((long)(X) == -1) { perror(Y); exit(1); } } while(0)

static unsigned char buffer[512 * 1024] __attribute__((aligned(4096)));

int main(int argc, char *argv[])
{
	struct sockaddr_in sin = { .sin_family = AF_INET, .sin_port = htons(5555) };
	int sfd, afd;

	sfd = socket(AF_INET, SOCK_STREAM, 0);
	OSERROR(sfd, "socket");
	OSERROR(bind(sfd, (struct sockaddr *)&sin, sizeof(sin)), "bind");
	OSERROR(listen(sfd, 1), "listen");

	for (;;) {
		afd = accept(sfd, NULL, NULL);
		if (afd != -1) {
			/* Drain everything the sender writes, then hang up. */
			while (read(afd, buffer, sizeof(buffer)) > 0) {}
			close(afd);
		}
	}
}

and send program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <sys/stat.h>
#include <sys/sendfile.h>

#define OSERROR(X, Y) do { if ((long)(X) == -1) { perror(Y); exit(1); } } while(0)

static unsigned char buffer[512 * 1024] __attribute__((aligned(4096)));

int main(int argc, char *argv[])
{
	struct sockaddr_in sin = { .sin_family = AF_INET, .sin_port = htons(5555) };
	struct hostent *h;
	ssize_t size, r, o;
	int cfd;

	if (argc != 3) {
		fprintf(stderr, "tcp-gen <server> <size>\n");
		exit(2);
	}

	size = strtoul(argv[2], NULL, 0);
	if (size <= 0) {
		fprintf(stderr, "Bad size\n");
		exit(2);
	}

	h = gethostbyname(argv[1]);
	if (!h) {
		fprintf(stderr, "%s: %s\n", argv[1], hstrerror(h_errno));
		exit(3);
	}
	if (!h->h_addr_list[0]) {
		fprintf(stderr, "%s: No addresses\n", argv[1]);
		exit(3);
	}
	memcpy(&sin.sin_addr, h->h_addr_list[0], h->h_length);

	cfd = socket(AF_INET, SOCK_STREAM, 0);
	OSERROR(cfd, "socket");
	OSERROR(connect(cfd, (struct sockaddr *)&sin, sizeof(sin)), "connect");

	/* Write the requested number of bytes in buffer-sized chunks,
	 * handling short writes. */
	do {
		r = size > sizeof(buffer) ? sizeof(buffer) : size;
		size -= r;
		o = 0;
		do {
			ssize_t w = write(cfd, buffer + o, r - o);
			OSERROR(w, "write");
			o += w;
		} while (o < r);
	} while (size > 0);

	OSERROR(close(cfd), "close/c");
	return 0;
}

since the socket interface uses iterators. It seems to show no difference.
One side note, though: I've been doing 10GiB same-machine transfers, and each
run takes either ~2.5s or ~0.87s and rarely anything in between, with or
without these patches, alternating apparently at random between the two times.
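
For reference - assuming the sink is saved as tcp-sink.c (my file name; the
original doesn't give one) and the sender as tcp-gen.c (matching the name in
its usage string) - a same-machine 10GiB run would look something like:

	cc -O2 -o tcp-sink tcp-sink.c
	cc -O2 -o tcp-gen tcp-gen.c
	./tcp-sink &
	time ./tcp-gen localhost $((10 * 1024 * 1024 * 1024))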

The patches can be found here:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-ops

David
---
David Howells (29):
iov_iter: Switch to using a table of operations
iov_iter: Split copy_page_to_iter()
iov_iter: Split iov_iter_fault_in_readable
iov_iter: Split the iterate_and_advance() macro
iov_iter: Split copy_to_iter()
iov_iter: Split copy_mc_to_iter()
iov_iter: Split copy_from_iter()
iov_iter: Split the iterate_all_kinds() macro
iov_iter: Split copy_from_iter_full()
iov_iter: Split copy_from_iter_nocache()
iov_iter: Split copy_from_iter_flushcache()
iov_iter: Split copy_from_iter_full_nocache()
iov_iter: Split copy_page_from_iter()
iov_iter: Split iov_iter_zero()
iov_iter: Split copy_from_user_atomic()
iov_iter: Split iov_iter_advance()
iov_iter: Split iov_iter_revert()
iov_iter: Split iov_iter_single_seg_count()
iov_iter: Split iov_iter_alignment()
iov_iter: Split iov_iter_gap_alignment()
iov_iter: Split iov_iter_get_pages()
iov_iter: Split iov_iter_get_pages_alloc()
iov_iter: Split csum_and_copy_from_iter()
iov_iter: Split csum_and_copy_from_iter_full()
iov_iter: Split csum_and_copy_to_iter()
iov_iter: Split iov_iter_npages()
iov_iter: Split dup_iter()
iov_iter: Split iov_iter_for_each_range()
iov_iter: Remove iterate_all_kinds() and iterate_and_advance()


lib/iov_iter.c | 1440 +++++++++++++++++++++++++++++++-----------------
1 file changed, 934 insertions(+), 506 deletions(-)



2020-11-21 14:16:41

by David Howells

Subject: [PATCH 01/29] iov_iter: Switch to using a table of operations

Switch to using a table of operations. In a future patch the individual
methods will be split up by type. For the moment, however, the ops tables
just jump directly to the old functions, which are now static. Inline
wrappers are provided to call through the hooks.
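
This also gives a clean extension point for adding a new iterator class:
define an ops table for it and point i->ops at that table from its
constructor. A hedged sketch of the shape - ITER_MYVEC and everything named
"myvec" here is hypothetical, not part of this series:

	static const struct iov_iter_ops myvec_iter_ops = {
		.type		= ITER_MYVEC,	/* hypothetical type */
		.advance	= myvec_advance,
		.copy_to_iter	= myvec_copy_to_iter,
		/* ... and so on for the remaining hooks ... */
	};

	void iov_iter_myvec(struct iov_iter *i, unsigned int direction,
			    const struct myvec *vec, unsigned long nr_segs,
			    size_t count)
	{
		WARN_ON(direction & ~(READ | WRITE));
		i->ops = &myvec_iter_ops;
		i->flags = direction & (READ | WRITE);
		/* a real class would also stash vec in the union here */
		i->nr_segs = nr_segs;
		i->iov_offset = 0;
		i->count = count;
	}

This mirrors what iov_iter_kvec() and the other constructors do in the patch
below.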

Signed-off-by: David Howells <[email protected]>
---

fs/io_uring.c | 2
include/linux/uio.h | 241 ++++++++++++++++++++++++++++++++++--------
lib/iov_iter.c | 293 +++++++++++++++++++++++++++++++++++++++------------
3 files changed, 422 insertions(+), 114 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4ead291b2976..baa78f58ae5c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3192,7 +3192,7 @@ static void io_req_map_rw(struct io_kiocb *req, const struct iovec *iovec,
rw->free_iovec = iovec;
rw->bytes_done = 0;
/* can only be fixed buffers, no need to do anything */
- if (iter->type == ITER_BVEC)
+ if (iov_iter_is_bvec(iter))
return;
if (!iovec) {
unsigned iov_off = 0;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 72d88566694e..45ee087f8c43 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -32,9 +32,10 @@ struct iov_iter {
* Bit 1 is the BVEC_FLAG_NO_REF bit, set if type is a bvec and
* the caller isn't expecting to drop a page reference when done.
*/
- unsigned int type;
+ unsigned int flags;
size_t iov_offset;
size_t count;
+ const struct iov_iter_ops *ops;
union {
const struct iovec *iov;
const struct kvec *kvec;
@@ -50,9 +51,63 @@ struct iov_iter {
};
};

+void iov_iter_init(struct iov_iter *i, unsigned int direction, const struct iovec *iov,
+ unsigned long nr_segs, size_t count);
+void iov_iter_kvec(struct iov_iter *i, unsigned int direction, const struct kvec *kvec,
+ unsigned long nr_segs, size_t count);
+void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec,
+ unsigned long nr_segs, size_t count);
+void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe,
+ size_t count);
+void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
+
+struct iov_iter_ops {
+ enum iter_type type;
+ size_t (*copy_from_user_atomic)(struct page *page, struct iov_iter *i,
+ unsigned long offset, size_t bytes);
+ void (*advance)(struct iov_iter *i, size_t bytes);
+ void (*revert)(struct iov_iter *i, size_t bytes);
+ int (*fault_in_readable)(struct iov_iter *i, size_t bytes);
+ size_t (*single_seg_count)(const struct iov_iter *i);
+ size_t (*copy_page_to_iter)(struct page *page, size_t offset, size_t bytes,
+ struct iov_iter *i);
+ size_t (*copy_page_from_iter)(struct page *page, size_t offset, size_t bytes,
+ struct iov_iter *i);
+ size_t (*copy_to_iter)(const void *addr, size_t bytes, struct iov_iter *i);
+ size_t (*copy_from_iter)(void *addr, size_t bytes, struct iov_iter *i);
+ bool (*copy_from_iter_full)(void *addr, size_t bytes, struct iov_iter *i);
+ size_t (*copy_from_iter_nocache)(void *addr, size_t bytes, struct iov_iter *i);
+ bool (*copy_from_iter_full_nocache)(void *addr, size_t bytes, struct iov_iter *i);
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+ size_t (*copy_from_iter_flushcache)(void *addr, size_t bytes, struct iov_iter *i);
+#endif
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+ size_t (*copy_mc_to_iter)(const void *addr, size_t bytes, struct iov_iter *i);
+#endif
+ size_t (*csum_and_copy_to_iter)(const void *addr, size_t bytes, void *csump,
+ struct iov_iter *i);
+ size_t (*csum_and_copy_from_iter)(void *addr, size_t bytes, __wsum *csum,
+ struct iov_iter *i);
+ bool (*csum_and_copy_from_iter_full)(void *addr, size_t bytes, __wsum *csum,
+ struct iov_iter *i);
+
+ size_t (*zero)(size_t bytes, struct iov_iter *i);
+ unsigned long (*alignment)(const struct iov_iter *i);
+ unsigned long (*gap_alignment)(const struct iov_iter *i);
+ ssize_t (*get_pages)(struct iov_iter *i, struct page **pages,
+ size_t maxsize, unsigned maxpages, size_t *start);
+ ssize_t (*get_pages_alloc)(struct iov_iter *i, struct page ***pages,
+ size_t maxsize, size_t *start);
+ int (*npages)(const struct iov_iter *i, int maxpages);
+ const void *(*dup_iter)(struct iov_iter *new, struct iov_iter *old, gfp_t flags);
+ int (*for_each_range)(struct iov_iter *i, size_t bytes,
+ int (*f)(struct kvec *vec, void *context),
+ void *context);
+};
+
static inline enum iter_type iov_iter_type(const struct iov_iter *i)
{
- return i->type & ~(READ | WRITE);
+ return i->ops->type;
}

static inline bool iter_is_iovec(const struct iov_iter *i)
@@ -82,7 +137,7 @@ static inline bool iov_iter_is_discard(const struct iov_iter *i)

static inline unsigned char iov_iter_rw(const struct iov_iter *i)
{
- return i->type & (READ | WRITE);
+ return i->flags & (READ | WRITE);
}

/*
@@ -111,22 +166,71 @@ static inline struct iovec iov_iter_iovec(const struct iov_iter *iter)
};
}

-size_t iov_iter_copy_from_user_atomic(struct page *page,
- struct iov_iter *i, unsigned long offset, size_t bytes);
-void iov_iter_advance(struct iov_iter *i, size_t bytes);
-void iov_iter_revert(struct iov_iter *i, size_t bytes);
-int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes);
-size_t iov_iter_single_seg_count(const struct iov_iter *i);
+static inline
+size_t iov_iter_copy_from_user_atomic(struct page *page, struct iov_iter *i,
+ unsigned long offset, size_t bytes)
+{
+ return i->ops->copy_from_user_atomic(page, i, offset, bytes);
+}
+static inline
+void iov_iter_advance(struct iov_iter *i, size_t bytes)
+{
+ return i->ops->advance(i, bytes);
+}
+static inline
+void iov_iter_revert(struct iov_iter *i, size_t bytes)
+{
+ return i->ops->revert(i, bytes);
+}
+static inline
+int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes)
+{
+ return i->ops->fault_in_readable(i, bytes);
+}
+static inline
+size_t iov_iter_single_seg_count(const struct iov_iter *i)
+{
+ return i->ops->single_seg_count(i);
+}
+
+static inline
size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
- struct iov_iter *i);
+ struct iov_iter *i)
+{
+ return i->ops->copy_page_to_iter(page, offset, bytes, i);
+}
+static inline
size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
- struct iov_iter *i);
+ struct iov_iter *i)
+{
+ return i->ops->copy_page_from_iter(page, offset, bytes, i);
+}

-size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
-size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
-bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i);
-size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
-bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i);
+static __always_inline __must_check
+size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
+{
+ return i->ops->copy_to_iter(addr, bytes, i);
+}
+static __always_inline __must_check
+size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
+{
+ return i->ops->copy_from_iter(addr, bytes, i);
+}
+static __always_inline __must_check
+bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
+{
+ return i->ops->copy_from_iter_full(addr, bytes, i);
+}
+static __always_inline __must_check
+size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
+{
+ return i->ops->copy_from_iter_nocache(addr, bytes, i);
+}
+static __always_inline __must_check
+bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
+{
+ return i->ops->copy_from_iter_full_nocache(addr, bytes, i);
+}

static __always_inline __must_check
size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
@@ -173,23 +277,21 @@ bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
return _copy_from_iter_full_nocache(addr, bytes, i);
}

-#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
/*
* Note, users like pmem that depend on the stricter semantics of
* copy_from_iter_flushcache() than copy_from_iter_nocache() must check for
* IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
* destination is flushed from the cache on return.
*/
-size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
-#else
-#define _copy_from_iter_flushcache _copy_from_iter_nocache
-#endif
-
-#ifdef CONFIG_ARCH_HAS_COPY_MC
-size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
+static __always_inline __must_check
+size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+{
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+ return i->ops->copy_from_iter_flushcache(addr, bytes, i);
#else
-#define _copy_mc_to_iter _copy_to_iter
+ return i->ops->copy_from_iter_nocache(addr, bytes, i);
#endif
+}

static __always_inline __must_check
size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
@@ -200,6 +302,16 @@ size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
return _copy_from_iter_flushcache(addr, bytes, i);
}

+static __always_inline __must_check
+size_t _copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
+{
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+ return i->ops->copy_mc_to_iter(addr, bytes, i);
+#else
+ return i->ops->copy_to_iter(addr, bytes, i);
+#endif
+}
+
static __always_inline __must_check
size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
{
@@ -209,25 +321,47 @@ size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
return _copy_mc_to_iter(addr, bytes, i);
}

-size_t iov_iter_zero(size_t bytes, struct iov_iter *);
-unsigned long iov_iter_alignment(const struct iov_iter *i);
-unsigned long iov_iter_gap_alignment(const struct iov_iter *i);
-void iov_iter_init(struct iov_iter *i, unsigned int direction, const struct iovec *iov,
- unsigned long nr_segs, size_t count);
-void iov_iter_kvec(struct iov_iter *i, unsigned int direction, const struct kvec *kvec,
- unsigned long nr_segs, size_t count);
-void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec,
- unsigned long nr_segs, size_t count);
-void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe,
- size_t count);
-void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
+static inline
+size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
+{
+ return i->ops->zero(bytes, i);
+}
+static inline
+unsigned long iov_iter_alignment(const struct iov_iter *i)
+{
+ return i->ops->alignment(i);
+}
+static inline
+unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
+{
+ return i->ops->gap_alignment(i);
+}
+
+static inline
ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
- size_t maxsize, unsigned maxpages, size_t *start);
+ size_t maxsize, unsigned maxpages, size_t *start)
+{
+ return i->ops->get_pages(i, pages, maxsize, maxpages, start);
+}
+
+static inline
ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
- size_t maxsize, size_t *start);
-int iov_iter_npages(const struct iov_iter *i, int maxpages);
+ size_t maxsize, size_t *start)
+{
+ return i->ops->get_pages_alloc(i, pages, maxsize, start);
+}
+
+static inline
+int iov_iter_npages(const struct iov_iter *i, int maxpages)
+{
+ return i->ops->npages(i, maxpages);
+}

-const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags);
+static inline
+const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
+{
+ return old->ops->dup_iter(new, old, flags);
+}

static inline size_t iov_iter_count(const struct iov_iter *i)
{
@@ -260,9 +394,22 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count)
{
i->count = count;
}
-size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, struct iov_iter *i);
-size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
-bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
+
+static inline
+size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, struct iov_iter *i)
+{
+ return i->ops->csum_and_copy_to_iter(addr, bytes, csump, i);
+}
+static inline
+size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i)
+{
+ return i->ops->csum_and_copy_from_iter(addr, bytes, csum, i);
+}
+static inline
+bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i)
+{
+ return i->ops->csum_and_copy_from_iter_full(addr, bytes, csum, i);
+}
size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
struct iov_iter *i);

@@ -278,8 +425,12 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec,
int import_single_range(int type, void __user *buf, size_t len,
struct iovec *iov, struct iov_iter *i);

+static inline
int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
int (*f)(struct kvec *vec, void *context),
- void *context);
+ void *context)
+{
+ return i->ops->for_each_range(i, bytes, f, context);
+}

#endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1635111c5bd2..e403d524c797 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -13,6 +13,12 @@
#include <linux/scatterlist.h>
#include <linux/instrumented.h>

+static const struct iov_iter_ops iovec_iter_ops;
+static const struct iov_iter_ops kvec_iter_ops;
+static const struct iov_iter_ops bvec_iter_ops;
+static const struct iov_iter_ops pipe_iter_ops;
+static const struct iov_iter_ops discard_iter_ops;
+
#define PIPE_PARANOIA /* for now */

#define iterate_iovec(i, n, __v, __p, skip, STEP) { \
@@ -81,15 +87,15 @@
#define iterate_all_kinds(i, n, v, I, B, K) { \
if (likely(n)) { \
size_t skip = i->iov_offset; \
- if (unlikely(i->type & ITER_BVEC)) { \
+ if (unlikely(iov_iter_type(i) & ITER_BVEC)) { \
struct bio_vec v; \
struct bvec_iter __bi; \
iterate_bvec(i, n, v, __bi, skip, (B)) \
- } else if (unlikely(i->type & ITER_KVEC)) { \
+ } else if (unlikely(iov_iter_type(i) & ITER_KVEC)) { \
const struct kvec *kvec; \
struct kvec v; \
iterate_kvec(i, n, v, kvec, skip, (K)) \
- } else if (unlikely(i->type & ITER_DISCARD)) { \
+ } else if (unlikely(iov_iter_type(i) & ITER_DISCARD)) { \
} else { \
const struct iovec *iov; \
struct iovec v; \
@@ -103,7 +109,7 @@
n = i->count; \
if (i->count) { \
size_t skip = i->iov_offset; \
- if (unlikely(i->type & ITER_BVEC)) { \
+ if (unlikely(iov_iter_type(i) & ITER_BVEC)) { \
const struct bio_vec *bvec = i->bvec; \
struct bio_vec v; \
struct bvec_iter __bi; \
@@ -111,7 +117,7 @@
i->bvec = __bvec_iter_bvec(i->bvec, __bi); \
i->nr_segs -= i->bvec - bvec; \
skip = __bi.bi_bvec_done; \
- } else if (unlikely(i->type & ITER_KVEC)) { \
+ } else if (unlikely(iov_iter_type(i) & ITER_KVEC)) { \
const struct kvec *kvec; \
struct kvec v; \
iterate_kvec(i, n, v, kvec, skip, (K)) \
@@ -121,7 +127,7 @@
} \
i->nr_segs -= kvec - i->kvec; \
i->kvec = kvec; \
- } else if (unlikely(i->type & ITER_DISCARD)) { \
+ } else if (unlikely(iov_iter_type(i) & ITER_DISCARD)) { \
skip += n; \
} else { \
const struct iovec *iov; \
@@ -427,14 +433,14 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by
* Return 0 on success, or non-zero if the memory could not be accessed (i.e.
* because it is an invalid address).
*/
-int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes)
+static int xxx_fault_in_readable(struct iov_iter *i, size_t bytes)
{
size_t skip = i->iov_offset;
const struct iovec *iov;
int err;
struct iovec v;

- if (!(i->type & (ITER_BVEC|ITER_KVEC))) {
+ if (!(iov_iter_type(i) & (ITER_BVEC|ITER_KVEC))) {
iterate_iovec(i, bytes, v, iov, skip, ({
err = fault_in_pages_readable(v.iov_base, v.iov_len);
if (unlikely(err))
@@ -443,7 +449,6 @@ int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes)
}
return 0;
}
-EXPORT_SYMBOL(iov_iter_fault_in_readable);

void iov_iter_init(struct iov_iter *i, unsigned int direction,
const struct iovec *iov, unsigned long nr_segs,
@@ -454,10 +459,12 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,

/* It will get better. Eventually... */
if (uaccess_kernel()) {
- i->type = ITER_KVEC | direction;
+ i->ops = &kvec_iter_ops;
+ i->flags = direction;
i->kvec = (struct kvec *)iov;
} else {
- i->type = ITER_IOVEC | direction;
+ i->ops = &iovec_iter_ops;
+ i->flags = direction;
i->iov = iov;
}
i->nr_segs = nr_segs;
@@ -625,7 +632,7 @@ static size_t csum_and_copy_to_pipe_iter(const void *addr, size_t bytes,
return bytes;
}

-size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
+static size_t xxx_copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
{
const char *from = addr;
if (unlikely(iov_iter_is_pipe(i)))
@@ -641,7 +648,6 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)

return bytes;
}
-EXPORT_SYMBOL(_copy_to_iter);

#ifdef CONFIG_ARCH_HAS_COPY_MC
static int copyout_mc(void __user *to, const void *from, size_t n)
@@ -723,7 +729,7 @@ static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
* Compare to copy_to_iter() where only ITER_IOVEC attempts might return
* a short copy.
*/
-size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
+static size_t xxx_copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
{
const char *from = addr;
unsigned long rem, curr_addr, s_addr = (unsigned long) addr;
@@ -757,10 +763,9 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)

return bytes;
}
-EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
#endif /* CONFIG_ARCH_HAS_COPY_MC */

-size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
+static size_t xxx_copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
if (unlikely(iov_iter_is_pipe(i))) {
@@ -778,9 +783,8 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)

return bytes;
}
-EXPORT_SYMBOL(_copy_from_iter);

-bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
+static bool xxx_copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
if (unlikely(iov_iter_is_pipe(i))) {
@@ -805,9 +809,8 @@ bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
iov_iter_advance(i, bytes);
return true;
}
-EXPORT_SYMBOL(_copy_from_iter_full);

-size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
+static size_t xxx_copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
if (unlikely(iov_iter_is_pipe(i))) {
@@ -824,7 +827,6 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)

return bytes;
}
-EXPORT_SYMBOL(_copy_from_iter_nocache);

#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
/**
@@ -841,7 +843,7 @@ EXPORT_SYMBOL(_copy_from_iter_nocache);
* bypass the cache for the ITER_IOVEC case, and on some archs may use
* instructions that strand dirty-data in the cache.
*/
-size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+static size_t xxx_copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
if (unlikely(iov_iter_is_pipe(i))) {
@@ -859,10 +861,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)

return bytes;
}
-EXPORT_SYMBOL_GPL(_copy_from_iter_flushcache);
#endif

-bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
+static bool xxx_copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
if (unlikely(iov_iter_is_pipe(i))) {
@@ -884,7 +885,6 @@ bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
iov_iter_advance(i, bytes);
return true;
}
-EXPORT_SYMBOL(_copy_from_iter_full_nocache);

static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
{
@@ -910,12 +910,12 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
return false;
}

-size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
+static size_t xxx_copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
struct iov_iter *i)
{
if (unlikely(!page_copy_sane(page, offset, bytes)))
return 0;
- if (i->type & (ITER_BVEC|ITER_KVEC)) {
+ if (iov_iter_type(i) & (ITER_BVEC|ITER_KVEC)) {
void *kaddr = kmap_atomic(page);
size_t wanted = copy_to_iter(kaddr + offset, bytes, i);
kunmap_atomic(kaddr);
@@ -927,9 +927,8 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
else
return copy_page_to_iter_pipe(page, offset, bytes, i);
}
-EXPORT_SYMBOL(copy_page_to_iter);

-size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
+static size_t xxx_copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
struct iov_iter *i)
{
if (unlikely(!page_copy_sane(page, offset, bytes)))
@@ -938,15 +937,14 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
WARN_ON(1);
return 0;
}
- if (i->type & (ITER_BVEC|ITER_KVEC)) {
+ if (iov_iter_type(i) & (ITER_BVEC|ITER_KVEC)) {
void *kaddr = kmap_atomic(page);
- size_t wanted = _copy_from_iter(kaddr + offset, bytes, i);
+ size_t wanted = xxx_copy_from_iter(kaddr + offset, bytes, i);
kunmap_atomic(kaddr);
return wanted;
} else
return copy_page_from_iter_iovec(page, offset, bytes, i);
}
-EXPORT_SYMBOL(copy_page_from_iter);

static size_t pipe_zero(size_t bytes, struct iov_iter *i)
{
@@ -975,7 +973,7 @@ static size_t pipe_zero(size_t bytes, struct iov_iter *i)
return bytes;
}

-size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
+static size_t xxx_zero(size_t bytes, struct iov_iter *i)
{
if (unlikely(iov_iter_is_pipe(i)))
return pipe_zero(bytes, i);
@@ -987,9 +985,8 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)

return bytes;
}
-EXPORT_SYMBOL(iov_iter_zero);

-size_t iov_iter_copy_from_user_atomic(struct page *page,
+static size_t xxx_copy_from_user_atomic(struct page *page,
struct iov_iter *i, unsigned long offset, size_t bytes)
{
char *kaddr = kmap_atomic(page), *p = kaddr + offset;
@@ -1011,7 +1008,6 @@ size_t iov_iter_copy_from_user_atomic(struct page *page,
kunmap_atomic(kaddr);
return bytes;
}
-EXPORT_SYMBOL(iov_iter_copy_from_user_atomic);

static inline void pipe_truncate(struct iov_iter *i)
{
@@ -1067,7 +1063,7 @@ static void pipe_advance(struct iov_iter *i, size_t size)
pipe_truncate(i);
}

-void iov_iter_advance(struct iov_iter *i, size_t size)
+static void xxx_advance(struct iov_iter *i, size_t size)
{
if (unlikely(iov_iter_is_pipe(i))) {
pipe_advance(i, size);
@@ -1079,9 +1075,8 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
}
iterate_and_advance(i, size, v, 0, 0, 0)
}
-EXPORT_SYMBOL(iov_iter_advance);

-void iov_iter_revert(struct iov_iter *i, size_t unroll)
+static void xxx_revert(struct iov_iter *i, size_t unroll)
{
if (!unroll)
return;
@@ -1147,12 +1142,11 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
}
}
}
-EXPORT_SYMBOL(iov_iter_revert);

/*
* Return the count of just the current iov_iter segment.
*/
-size_t iov_iter_single_seg_count(const struct iov_iter *i)
+static size_t xxx_single_seg_count(const struct iov_iter *i)
{
if (unlikely(iov_iter_is_pipe(i)))
return i->count; // it is a silly place, anyway
@@ -1165,14 +1159,14 @@ size_t iov_iter_single_seg_count(const struct iov_iter *i)
else
return min(i->count, i->iov->iov_len - i->iov_offset);
}
-EXPORT_SYMBOL(iov_iter_single_seg_count);

void iov_iter_kvec(struct iov_iter *i, unsigned int direction,
- const struct kvec *kvec, unsigned long nr_segs,
- size_t count)
+ const struct kvec *kvec, unsigned long nr_segs,
+ size_t count)
{
WARN_ON(direction & ~(READ | WRITE));
- i->type = ITER_KVEC | (direction & (READ | WRITE));
+ i->ops = &kvec_iter_ops;
+ i->flags = direction & (READ | WRITE);
i->kvec = kvec;
i->nr_segs = nr_segs;
i->iov_offset = 0;
@@ -1185,7 +1179,8 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
size_t count)
{
WARN_ON(direction & ~(READ | WRITE));
- i->type = ITER_BVEC | (direction & (READ | WRITE));
+ i->ops = &bvec_iter_ops;
+ i->flags = direction & (READ | WRITE);
i->bvec = bvec;
i->nr_segs = nr_segs;
i->iov_offset = 0;
@@ -1199,7 +1194,8 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction,
{
BUG_ON(direction != READ);
WARN_ON(pipe_full(pipe->head, pipe->tail, pipe->ring_size));
- i->type = ITER_PIPE | READ;
+ i->ops = &pipe_iter_ops;
+ i->flags = READ;
i->pipe = pipe;
i->head = pipe->head;
i->iov_offset = 0;
@@ -1220,13 +1216,14 @@ EXPORT_SYMBOL(iov_iter_pipe);
void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
{
BUG_ON(direction != READ);
- i->type = ITER_DISCARD | READ;
+ i->ops = &discard_iter_ops;
+ i->flags = READ;
i->count = count;
i->iov_offset = 0;
}
EXPORT_SYMBOL(iov_iter_discard);

-unsigned long iov_iter_alignment(const struct iov_iter *i)
+static unsigned long xxx_alignment(const struct iov_iter *i)
{
unsigned long res = 0;
size_t size = i->count;
@@ -1245,9 +1242,8 @@ unsigned long iov_iter_alignment(const struct iov_iter *i)
)
return res;
}
-EXPORT_SYMBOL(iov_iter_alignment);

-unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
+static unsigned long xxx_gap_alignment(const struct iov_iter *i)
{
unsigned long res = 0;
size_t size = i->count;
@@ -1267,7 +1263,6 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
);
return res;
}
-EXPORT_SYMBOL(iov_iter_gap_alignment);

static inline ssize_t __pipe_get_pages(struct iov_iter *i,
size_t maxsize,
@@ -1313,7 +1308,7 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, start);
}

-ssize_t iov_iter_get_pages(struct iov_iter *i,
+static ssize_t xxx_get_pages(struct iov_iter *i,
struct page **pages, size_t maxsize, unsigned maxpages,
size_t *start)
{
@@ -1352,7 +1347,6 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
)
return 0;
}
-EXPORT_SYMBOL(iov_iter_get_pages);

static struct page **get_pages_array(size_t n)
{
@@ -1392,7 +1386,7 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
return n;
}

-ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
+static ssize_t xxx_get_pages_alloc(struct iov_iter *i,
struct page ***pages, size_t maxsize,
size_t *start)
{
@@ -1439,9 +1433,8 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
)
return 0;
}
-EXPORT_SYMBOL(iov_iter_get_pages_alloc);

-size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
+static size_t xxx_csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
struct iov_iter *i)
{
char *to = addr;
@@ -1478,9 +1471,8 @@ size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
*csum = sum;
return bytes;
}
-EXPORT_SYMBOL(csum_and_copy_from_iter);

-bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
+static bool xxx_csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
struct iov_iter *i)
{
char *to = addr;
@@ -1520,9 +1512,8 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
iov_iter_advance(i, bytes);
return true;
}
-EXPORT_SYMBOL(csum_and_copy_from_iter_full);

-size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
+static size_t xxx_csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
struct iov_iter *i)
{
const char *from = addr;
@@ -1564,7 +1555,6 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
*csum = sum;
return bytes;
}
-EXPORT_SYMBOL(csum_and_copy_to_iter);

size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
struct iov_iter *i)
@@ -1585,7 +1575,7 @@ size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
}
EXPORT_SYMBOL(hash_and_copy_to_iter);

-int iov_iter_npages(const struct iov_iter *i, int maxpages)
+static int xxx_npages(const struct iov_iter *i, int maxpages)
{
size_t size = i->count;
int npages = 0;
@@ -1628,9 +1618,8 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
)
return npages;
}
-EXPORT_SYMBOL(iov_iter_npages);

-const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
+static const void *xxx_dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
{
*new = *old;
if (unlikely(iov_iter_is_pipe(new))) {
@@ -1649,7 +1638,6 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
new->nr_segs * sizeof(struct iovec),
flags);
}
-EXPORT_SYMBOL(dup_iter);

static int copy_compat_iovec_from_user(struct iovec *iov,
const struct iovec __user *uvec, unsigned long nr_segs)
@@ -1826,7 +1814,7 @@ int import_single_range(int rw, void __user *buf, size_t len,
}
EXPORT_SYMBOL(import_single_range);

-int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
+static int xxx_for_each_range(struct iov_iter *i, size_t bytes,
int (*f)(struct kvec *vec, void *context),
void *context)
{
@@ -1846,4 +1834,173 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
)
return err;
}
-EXPORT_SYMBOL(iov_iter_for_each_range);
+
+static const struct iov_iter_ops iovec_iter_ops = {
+ .type = ITER_IOVEC,
+ .copy_from_user_atomic = xxx_copy_from_user_atomic,
+ .advance = xxx_advance,
+ .revert = xxx_revert,
+ .fault_in_readable = xxx_fault_in_readable,
+ .single_seg_count = xxx_single_seg_count,
+ .copy_page_to_iter = xxx_copy_page_to_iter,
+ .copy_page_from_iter = xxx_copy_page_from_iter,
+ .copy_to_iter = xxx_copy_to_iter,
+ .copy_from_iter = xxx_copy_from_iter,
+ .copy_from_iter_full = xxx_copy_from_iter_full,
+ .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
+ .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+ .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
+#endif
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+ .copy_mc_to_iter = xxx_copy_mc_to_iter,
+#endif
+ .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
+ .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
+ .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
+
+ .zero = xxx_zero,
+ .alignment = xxx_alignment,
+ .gap_alignment = xxx_gap_alignment,
+ .get_pages = xxx_get_pages,
+ .get_pages_alloc = xxx_get_pages_alloc,
+ .npages = xxx_npages,
+ .dup_iter = xxx_dup_iter,
+ .for_each_range = xxx_for_each_range,
+};
+
+static const struct iov_iter_ops kvec_iter_ops = {
+ .type = ITER_KVEC,
+ .copy_from_user_atomic = xxx_copy_from_user_atomic,
+ .advance = xxx_advance,
+ .revert = xxx_revert,
+ .fault_in_readable = xxx_fault_in_readable,
+ .single_seg_count = xxx_single_seg_count,
+ .copy_page_to_iter = xxx_copy_page_to_iter,
+ .copy_page_from_iter = xxx_copy_page_from_iter,
+ .copy_to_iter = xxx_copy_to_iter,
+ .copy_from_iter = xxx_copy_from_iter,
+ .copy_from_iter_full = xxx_copy_from_iter_full,
+ .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
+ .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+ .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
+#endif
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+ .copy_mc_to_iter = xxx_copy_mc_to_iter,
+#endif
+ .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
+ .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
+ .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
+
+ .zero = xxx_zero,
+ .alignment = xxx_alignment,
+ .gap_alignment = xxx_gap_alignment,
+ .get_pages = xxx_get_pages,
+ .get_pages_alloc = xxx_get_pages_alloc,
+ .npages = xxx_npages,
+ .dup_iter = xxx_dup_iter,
+ .for_each_range = xxx_for_each_range,
+};
+
+static const struct iov_iter_ops bvec_iter_ops = {
+ .type = ITER_BVEC,
+ .copy_from_user_atomic = xxx_copy_from_user_atomic,
+ .advance = xxx_advance,
+ .revert = xxx_revert,
+ .fault_in_readable = xxx_fault_in_readable,
+ .single_seg_count = xxx_single_seg_count,
+ .copy_page_to_iter = xxx_copy_page_to_iter,
+ .copy_page_from_iter = xxx_copy_page_from_iter,
+ .copy_to_iter = xxx_copy_to_iter,
+ .copy_from_iter = xxx_copy_from_iter,
+ .copy_from_iter_full = xxx_copy_from_iter_full,
+ .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
+ .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+ .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
+#endif
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+ .copy_mc_to_iter = xxx_copy_mc_to_iter,
+#endif
+ .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
+ .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
+ .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
+
+ .zero = xxx_zero,
+ .alignment = xxx_alignment,
+ .gap_alignment = xxx_gap_alignment,
+ .get_pages = xxx_get_pages,
+ .get_pages_alloc = xxx_get_pages_alloc,
+ .npages = xxx_npages,
+ .dup_iter = xxx_dup_iter,
+ .for_each_range = xxx_for_each_range,
+};
+
+static const struct iov_iter_ops pipe_iter_ops = {
+ .type = ITER_PIPE,
+ .copy_from_user_atomic = xxx_copy_from_user_atomic,
+ .advance = xxx_advance,
+ .revert = xxx_revert,
+ .fault_in_readable = xxx_fault_in_readable,
+ .single_seg_count = xxx_single_seg_count,
+ .copy_page_to_iter = xxx_copy_page_to_iter,
+ .copy_page_from_iter = xxx_copy_page_from_iter,
+ .copy_to_iter = xxx_copy_to_iter,
+ .copy_from_iter = xxx_copy_from_iter,
+ .copy_from_iter_full = xxx_copy_from_iter_full,
+ .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
+ .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+ .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
+#endif
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+ .copy_mc_to_iter = xxx_copy_mc_to_iter,
+#endif
+ .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
+ .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
+ .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
+
+ .zero = xxx_zero,
+ .alignment = xxx_alignment,
+ .gap_alignment = xxx_gap_alignment,
+ .get_pages = xxx_get_pages,
+ .get_pages_alloc = xxx_get_pages_alloc,
+ .npages = xxx_npages,
+ .dup_iter = xxx_dup_iter,
+ .for_each_range = xxx_for_each_range,
+};
+
+static const struct iov_iter_ops discard_iter_ops = {
+ .type = ITER_DISCARD,
+ .copy_from_user_atomic = xxx_copy_from_user_atomic,
+ .advance = xxx_advance,
+ .revert = xxx_revert,
+ .fault_in_readable = xxx_fault_in_readable,
+ .single_seg_count = xxx_single_seg_count,
+ .copy_page_to_iter = xxx_copy_page_to_iter,
+ .copy_page_from_iter = xxx_copy_page_from_iter,
+ .copy_to_iter = xxx_copy_to_iter,
+ .copy_from_iter = xxx_copy_from_iter,
+ .copy_from_iter_full = xxx_copy_from_iter_full,
+ .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
+ .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
+#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+ .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
+#endif
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+ .copy_mc_to_iter = xxx_copy_mc_to_iter,
+#endif
+ .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
+ .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
+ .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
+
+ .zero = xxx_zero,
+ .alignment = xxx_alignment,
+ .gap_alignment = xxx_gap_alignment,
+ .get_pages = xxx_get_pages,
+ .get_pages_alloc = xxx_get_pages_alloc,
+ .npages = xxx_npages,
+ .dup_iter = xxx_dup_iter,
+ .for_each_range = xxx_for_each_range,
+};


2020-11-21 14:16:45

by David Howells

Subject: [PATCH 02/29] iov_iter: Split copy_page_to_iter()

Split copy_page_to_iter() by type.

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 44 +++++++++++++++++++++++++-------------------
1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e403d524c797..fee8e99fbb9c 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -19,6 +19,8 @@ static const struct iov_iter_ops bvec_iter_ops;
static const struct iov_iter_ops pipe_iter_ops;
static const struct iov_iter_ops discard_iter_ops;

+static inline bool page_copy_sane(struct page *page, size_t offset, size_t n);
+
#define PIPE_PARANOIA /* for now */

#define iterate_iovec(i, n, __v, __p, skip, STEP) { \
@@ -167,7 +169,7 @@ static int copyin(void *to, const void __user *from, size_t n)
return n;
}

-static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
+static size_t iovec_copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
struct iov_iter *i)
{
size_t skip, copy, left, wanted;
@@ -175,6 +177,8 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
char __user *buf;
void *kaddr, *from;

+ if (unlikely(!page_copy_sane(page, offset, bytes)))
+ return 0;
if (unlikely(bytes > i->count))
bytes = i->count;

@@ -378,7 +382,7 @@ static bool sanity(const struct iov_iter *i)
#define sanity(i) true
#endif

-static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
+static size_t pipe_copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
struct iov_iter *i)
{
struct pipe_inode_info *pipe = i->pipe;
@@ -388,6 +392,8 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by
unsigned int i_head = i->head;
size_t off;

+ if (unlikely(!page_copy_sane(page, offset, bytes)))
+ return 0;
if (unlikely(bytes > i->count))
bytes = i->count;

@@ -910,22 +916,22 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
return false;
}

-static size_t xxx_copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
+static size_t bkvec_copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
struct iov_iter *i)
{
- if (unlikely(!page_copy_sane(page, offset, bytes)))
- return 0;
- if (iov_iter_type(i) & (ITER_BVEC|ITER_KVEC)) {
+ size_t wanted = 0;
+ if (likely(page_copy_sane(page, offset, bytes))) {
void *kaddr = kmap_atomic(page);
- size_t wanted = copy_to_iter(kaddr + offset, bytes, i);
+ wanted = copy_to_iter(kaddr + offset, bytes, i);
kunmap_atomic(kaddr);
- return wanted;
- } else if (unlikely(iov_iter_is_discard(i)))
- return bytes;
- else if (likely(!iov_iter_is_pipe(i)))
- return copy_page_to_iter_iovec(page, offset, bytes, i);
- else
- return copy_page_to_iter_pipe(page, offset, bytes, i);
+ }
+ return wanted;
+}
+
+static size_t discard_copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
+ struct iov_iter *i)
+{
+ return bytes;
}

static size_t xxx_copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
@@ -1842,7 +1848,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.revert = xxx_revert,
.fault_in_readable = xxx_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
- .copy_page_to_iter = xxx_copy_page_to_iter,
+ .copy_page_to_iter = iovec_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = xxx_copy_to_iter,
.copy_from_iter = xxx_copy_from_iter,
@@ -1876,7 +1882,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.revert = xxx_revert,
.fault_in_readable = xxx_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
- .copy_page_to_iter = xxx_copy_page_to_iter,
+ .copy_page_to_iter = bkvec_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = xxx_copy_to_iter,
.copy_from_iter = xxx_copy_from_iter,
@@ -1910,7 +1916,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.revert = xxx_revert,
.fault_in_readable = xxx_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
- .copy_page_to_iter = xxx_copy_page_to_iter,
+ .copy_page_to_iter = bkvec_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = xxx_copy_to_iter,
.copy_from_iter = xxx_copy_from_iter,
@@ -1944,7 +1950,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.revert = xxx_revert,
.fault_in_readable = xxx_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
- .copy_page_to_iter = xxx_copy_page_to_iter,
+ .copy_page_to_iter = pipe_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = xxx_copy_to_iter,
.copy_from_iter = xxx_copy_from_iter,
@@ -1978,7 +1984,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.revert = xxx_revert,
.fault_in_readable = xxx_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
- .copy_page_to_iter = xxx_copy_page_to_iter,
+ .copy_page_to_iter = discard_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = xxx_copy_to_iter,
.copy_from_iter = xxx_copy_from_iter,


2020-11-21 14:16:54

by David Howells

Subject: [PATCH 03/29] iov_iter: Split iov_iter_fault_in_readable

Split iov_iter_fault_in_readable() by type.

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 29 ++++++++++++++++-------------
1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index fee8e99fbb9c..280b5c9c9a9c 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -439,20 +439,23 @@ static size_t pipe_copy_page_to_iter(struct page *page, size_t offset, size_t by
* Return 0 on success, or non-zero if the memory could not be accessed (i.e.
* because it is an invalid address).
*/
-static int xxx_fault_in_readable(struct iov_iter *i, size_t bytes)
+static int iovec_fault_in_readable(struct iov_iter *i, size_t bytes)
{
size_t skip = i->iov_offset;
const struct iovec *iov;
int err;
struct iovec v;

- if (!(iov_iter_type(i) & (ITER_BVEC|ITER_KVEC))) {
- iterate_iovec(i, bytes, v, iov, skip, ({
- err = fault_in_pages_readable(v.iov_base, v.iov_len);
- if (unlikely(err))
- return err;
- 0;}))
- }
+ iterate_iovec(i, bytes, v, iov, skip, ({
+ err = fault_in_pages_readable(v.iov_base, v.iov_len);
+ if (unlikely(err))
+ return err;
+ 0;}))
+ return 0;
+}
+
+static int no_fault_in_readable(struct iov_iter *i, size_t bytes)
+{
return 0;
}

@@ -1846,7 +1849,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.copy_from_user_atomic = xxx_copy_from_user_atomic,
.advance = xxx_advance,
.revert = xxx_revert,
- .fault_in_readable = xxx_fault_in_readable,
+ .fault_in_readable = iovec_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = iovec_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
@@ -1880,7 +1883,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.copy_from_user_atomic = xxx_copy_from_user_atomic,
.advance = xxx_advance,
.revert = xxx_revert,
- .fault_in_readable = xxx_fault_in_readable,
+ .fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = bkvec_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
@@ -1914,7 +1917,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.copy_from_user_atomic = xxx_copy_from_user_atomic,
.advance = xxx_advance,
.revert = xxx_revert,
- .fault_in_readable = xxx_fault_in_readable,
+ .fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = bkvec_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
@@ -1948,7 +1951,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.copy_from_user_atomic = xxx_copy_from_user_atomic,
.advance = xxx_advance,
.revert = xxx_revert,
- .fault_in_readable = xxx_fault_in_readable,
+ .fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = pipe_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
@@ -1982,7 +1985,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.copy_from_user_atomic = xxx_copy_from_user_atomic,
.advance = xxx_advance,
.revert = xxx_revert,
- .fault_in_readable = xxx_fault_in_readable,
+ .fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = discard_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,


2020-11-21 14:17:11

by David Howells

Subject: [PATCH 04/29] iov_iter: Split the iterate_and_advance() macro

Split the iterate_and_advance() macro into iovec, bvec, kvec and discard
variants. (The macro never handled pipes, so no pipe variant is needed.)
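
Once these exist, each per-type handler collapses to a single macro
invocation. For instance, this is the shape the copy_to_iter() split in a
later patch of this series takes for the kvec case:

	static size_t kvec_copy_to_iter(const void *addr, size_t bytes,
					struct iov_iter *i)
	{
		const char *from = addr;
		iterate_and_advance_kvec(i, bytes, v,
			memcpy(v.iov_base, (from += v.iov_len) - v.iov_len,
			       v.iov_len));
		return bytes;
	}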

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 280b5c9c9a9c..a221e7771201 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -147,6 +147,68 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n);
} \
}

+#define iterate_and_advance_iovec(i, n, v, CMD) { \
+ if (unlikely(i->count < n)) \
+ n = i->count; \
+ if (i->count) { \
+ size_t skip = i->iov_offset; \
+ const struct iovec *iov; \
+ struct iovec v; \
+ iterate_iovec(i, n, v, iov, skip, (CMD)) \
+ if (skip == iov->iov_len) { \
+ iov++; \
+ skip = 0; \
+ } \
+ i->nr_segs -= iov - i->iov; \
+ i->iov = iov; \
+ i->count -= n; \
+ i->iov_offset = skip; \
+ } \
+}
+
+#define iterate_and_advance_bvec(i, n, v, CMD) { \
+ if (unlikely(i->count < n)) \
+ n = i->count; \
+ if (i->count) { \
+ size_t skip = i->iov_offset; \
+ const struct bio_vec *bvec = i->bvec; \
+ struct bio_vec v; \
+ struct bvec_iter __bi; \
+ iterate_bvec(i, n, v, __bi, skip, (CMD)) \
+ i->bvec = __bvec_iter_bvec(i->bvec, __bi); \
+ i->nr_segs -= i->bvec - bvec; \
+ skip = __bi.bi_bvec_done; \
+ i->count -= n; \
+ i->iov_offset = skip; \
+ } \
+}
+
+#define iterate_and_advance_kvec(i, n, v, CMD) { \
+ if (unlikely(i->count < n)) \
+ n = i->count; \
+ if (i->count) { \
+ size_t skip = i->iov_offset; \
+ const struct kvec *kvec; \
+ struct kvec v; \
+ iterate_kvec(i, n, v, kvec, skip, (CMD)) \
+ if (skip == kvec->iov_len) { \
+ kvec++; \
+ skip = 0; \
+ } \
+ i->nr_segs -= kvec - i->kvec; \
+ i->kvec = kvec; \
+ i->count -= n; \
+ i->iov_offset = skip; \
+ } \
+}
+
+#define iterate_and_advance_discard(i, n) { \
+ if (unlikely(i->count < n)) \
+ n = i->count; \
+ i->count -= n; \
+ i->iov_offset += n; \
+}
+
static int copyout(void __user *to, const void *from, size_t n)
{
if (should_fail_usercopy())


2020-11-21 14:17:22

by David Howells

Subject: [PATCH 09/29] iov_iter: Split copy_from_iter_full()

Split copy_from_iter_full() by type.

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 59 +++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 41 insertions(+), 18 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 934193627540..3dba665a1ee9 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -923,32 +923,55 @@ static size_t no_copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
return bytes;
}

-static bool xxx_copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
+static bool iovec_copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
- if (unlikely(iov_iter_is_pipe(i))) {
- WARN_ON(1);
- return false;
- }
+
if (unlikely(i->count < bytes))
return false;

- if (iter_is_iovec(i))
- might_fault();
- iterate_all_kinds(i, bytes, v, ({
+ might_fault();
+ iterate_over_iovec(i, bytes, v, ({
if (copyin((to += v.iov_len) - v.iov_len,
- v.iov_base, v.iov_len))
+ v.iov_base, v.iov_len))
return false;
- 0;}),
+ 0;}));
+ iov_iter_advance(i, bytes);
+ return true;
+}
+
+static bool bvec_copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
+{
+ char *to = addr;
+
+ if (unlikely(i->count < bytes))
+ return false;
+ iterate_over_bvec(i, bytes, v,
memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
- v.bv_offset, v.bv_len),
- memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
- )
+ v.bv_offset, v.bv_len));
+ iov_iter_advance(i, bytes);
+ return true;
+}
+
+static bool kvec_copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
+{
+ char *to = addr;

+ if (unlikely(i->count < bytes))
+ return false;
+
+ iterate_over_kvec(i, bytes, v,
+ memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len));
iov_iter_advance(i, bytes);
return true;
}

+static bool no_copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
+{
+ WARN_ON(1);
+ return false;
+}
+
static size_t xxx_copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
@@ -1985,7 +2008,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = iovec_copy_to_iter,
.copy_from_iter = iovec_copy_from_iter,
- .copy_from_iter_full = xxx_copy_from_iter_full,
+ .copy_from_iter_full = iovec_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
@@ -2019,7 +2042,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = kvec_copy_to_iter,
.copy_from_iter = kvec_copy_from_iter,
- .copy_from_iter_full = xxx_copy_from_iter_full,
+ .copy_from_iter_full = kvec_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
@@ -2053,7 +2076,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = bvec_copy_to_iter,
.copy_from_iter = bvec_copy_from_iter,
- .copy_from_iter_full = xxx_copy_from_iter_full,
+ .copy_from_iter_full = bvec_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
@@ -2087,7 +2110,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = pipe_copy_to_iter,
.copy_from_iter = no_copy_from_iter,
- .copy_from_iter_full = xxx_copy_from_iter_full,
+ .copy_from_iter_full = no_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
@@ -2121,7 +2144,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = discard_copy_to_iter,
.copy_from_iter = no_copy_from_iter,
- .copy_from_iter_full = xxx_copy_from_iter_full,
+ .copy_from_iter_full = no_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE


2020-11-21 14:17:39

by David Howells

Subject: [PATCH 05/29] iov_iter: Split copy_to_iter()

Split copy_to_iter() by type.

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 47 +++++++++++++++++++++++++++++++----------------
1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index a221e7771201..0865e0b6eee9 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -634,7 +634,7 @@ static size_t push_pipe(struct iov_iter *i, size_t size,
return size - left;
}

-static size_t copy_pipe_to_iter(const void *addr, size_t bytes,
+static size_t pipe_copy_to_iter(const void *addr, size_t bytes,
struct iov_iter *i)
{
struct pipe_inode_info *pipe = i->pipe;
@@ -703,20 +703,35 @@ static size_t csum_and_copy_to_pipe_iter(const void *addr, size_t bytes,
return bytes;
}

-static size_t xxx_copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
+static size_t iovec_copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
{
const char *from = addr;
- if (unlikely(iov_iter_is_pipe(i)))
- return copy_pipe_to_iter(addr, bytes, i);
- if (iter_is_iovec(i))
- might_fault();
- iterate_and_advance(i, bytes, v,
- copyout(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
+ might_fault();
+ iterate_and_advance_iovec(i, bytes, v,
+ copyout(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len));
+ return bytes;
+}
+
+static size_t bvec_copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
+{
+ const char *from = addr;
+ iterate_and_advance_bvec(i, bytes, v,
memcpy_to_page(v.bv_page, v.bv_offset,
- (from += v.bv_len) - v.bv_len, v.bv_len),
- memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len)
- )
+ (from += v.bv_len) - v.bv_len, v.bv_len));
+ return bytes;
+}

+static size_t kvec_copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
+{
+ const char *from = addr;
+ iterate_and_advance_kvec(i, bytes, v,
+ memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len));
+ return bytes;
+}
+
+static size_t discard_copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
+{
+ iterate_and_advance_discard(i, bytes);
return bytes;
}

@@ -1915,7 +1930,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = iovec_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
- .copy_to_iter = xxx_copy_to_iter,
+ .copy_to_iter = iovec_copy_to_iter,
.copy_from_iter = xxx_copy_from_iter,
.copy_from_iter_full = xxx_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
@@ -1949,7 +1964,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = bkvec_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
- .copy_to_iter = xxx_copy_to_iter,
+ .copy_to_iter = kvec_copy_to_iter,
.copy_from_iter = xxx_copy_from_iter,
.copy_from_iter_full = xxx_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
@@ -1983,7 +1998,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = bkvec_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
- .copy_to_iter = xxx_copy_to_iter,
+ .copy_to_iter = bvec_copy_to_iter,
.copy_from_iter = xxx_copy_from_iter,
.copy_from_iter_full = xxx_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
@@ -2017,7 +2032,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = pipe_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
- .copy_to_iter = xxx_copy_to_iter,
+ .copy_to_iter = pipe_copy_to_iter,
.copy_from_iter = xxx_copy_from_iter,
.copy_from_iter_full = xxx_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
@@ -2051,7 +2066,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = discard_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
- .copy_to_iter = xxx_copy_to_iter,
+ .copy_to_iter = discard_copy_to_iter,
.copy_from_iter = xxx_copy_from_iter,
.copy_from_iter_full = xxx_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,


2020-11-21 14:17:50

by David Howells

Subject: [PATCH 06/29] iov_iter: Split copy_mc_to_iter()

Split copy_mc_to_iter() by type.
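
The machine-check variants are the one case where a kernel-addressable
segment can legitimately come up short, so the bytes-written computation
is worth a worked example: with "from" already advanced past the faulting
segment and rem bytes left uncopied, the amount that actually landed is
curr_addr - s_addr - rem. Checking the arithmetic in isolation (the
numbers are made up):

#include <assert.h>
#include <stdio.h>

int main(void)
{
	/* two 64-byte segments; a fault leaves 24 bytes of the
	 * second segment uncopied */
	unsigned long s_addr = 0x1000;		/* original source address */
	unsigned long curr_addr = 0x1000 + 128;	/* 'from' after segment 2 */
	unsigned long rem = 24;			/* tail that didn't copy */
	unsigned long copied = curr_addr - s_addr - rem;

	assert(copied == 104);
	printf("copied %lu of 128 bytes\n", copied);
	return 0;
}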

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 54 +++++++++++++++++++++++++++++++++---------------------
1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 0865e0b6eee9..7c1d92f7d020 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -758,7 +758,7 @@ static unsigned long copy_mc_to_page(struct page *page, size_t offset,
return ret;
}

-static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
+static size_t pipe_copy_mc_to_iter(const void *addr, size_t bytes,
struct iov_iter *i)
{
struct pipe_inode_info *pipe = i->pipe;
@@ -815,18 +815,23 @@ static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
* Compare to copy_to_iter() where only ITER_IOVEC attempts might return
* a short copy.
*/
-static size_t xxx_copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
+static size_t iovec_copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
{
const char *from = addr;
- unsigned long rem, curr_addr, s_addr = (unsigned long) addr;

- if (unlikely(iov_iter_is_pipe(i)))
- return copy_mc_pipe_to_iter(addr, bytes, i);
- if (iter_is_iovec(i))
- might_fault();
- iterate_and_advance(i, bytes, v,
+ might_fault();
+ iterate_and_advance_iovec(i, bytes, v,
copyout_mc(v.iov_base, (from += v.iov_len) - v.iov_len,
- v.iov_len),
+ v.iov_len));
+ return bytes;
+}
+
+static size_t bvec_copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
+{
+ const char *from = addr;
+ unsigned long rem, curr_addr, s_addr = (unsigned long) addr;
+
+ iterate_and_advance_bvec(i, bytes, v,
({
rem = copy_mc_to_page(v.bv_page, v.bv_offset,
(from += v.bv_len) - v.bv_len, v.bv_len);
@@ -835,18 +840,25 @@ static size_t xxx_copy_mc_to_iter(const void *addr, size_t bytes, struct iov_ite
bytes = curr_addr - s_addr - rem;
return bytes;
}
- }),
- ({
- rem = copy_mc_to_kernel(v.iov_base, (from += v.iov_len)
- - v.iov_len, v.iov_len);
+ }))
+ return bytes;
+}
+
+static size_t kvec_copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
+{
+ const char *from = addr;
+ unsigned long rem, curr_addr, s_addr = (unsigned long) addr;
+
+ iterate_and_advance_kvec(i, bytes, v, ({
+ rem = copy_mc_to_kernel(v.iov_base,
+ (from += v.iov_len) - v.iov_len,
+ v.iov_len);
if (rem) {
curr_addr = (unsigned long) from;
bytes = curr_addr - s_addr - rem;
return bytes;
}
- })
- )
-
+ }));
return bytes;
}
#endif /* CONFIG_ARCH_HAS_COPY_MC */
@@ -1939,7 +1951,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
#endif
#ifdef CONFIG_ARCH_HAS_COPY_MC
- .copy_mc_to_iter = xxx_copy_mc_to_iter,
+ .copy_mc_to_iter = iovec_copy_mc_to_iter,
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
.csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
@@ -1973,7 +1985,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
#endif
#ifdef CONFIG_ARCH_HAS_COPY_MC
- .copy_mc_to_iter = xxx_copy_mc_to_iter,
+ .copy_mc_to_iter = kvec_copy_mc_to_iter,
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
.csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
@@ -2007,7 +2019,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
#endif
#ifdef CONFIG_ARCH_HAS_COPY_MC
- .copy_mc_to_iter = xxx_copy_mc_to_iter,
+ .copy_mc_to_iter = bvec_copy_mc_to_iter,
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
.csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
@@ -2041,7 +2053,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
#endif
#ifdef CONFIG_ARCH_HAS_COPY_MC
- .copy_mc_to_iter = xxx_copy_mc_to_iter,
+ .copy_mc_to_iter = pipe_copy_mc_to_iter,
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
.csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
@@ -2075,7 +2087,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
#endif
#ifdef CONFIG_ARCH_HAS_COPY_MC
- .copy_mc_to_iter = xxx_copy_mc_to_iter,
+ .copy_mc_to_iter = discard_copy_to_iter,
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
.csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,


2020-11-21 14:18:04

by David Howells

Subject: [PATCH 07/29] iov_iter: Split copy_from_iter()

Split copy_from_iter() by type.

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 50 ++++++++++++++++++++++++++++++++------------------
1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 7c1d92f7d020..5b18dfe0dcc7 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -863,22 +863,36 @@ static size_t kvec_copy_mc_to_iter(const void *addr, size_t bytes, struct iov_it
}
#endif /* CONFIG_ARCH_HAS_COPY_MC */

-static size_t xxx_copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
+static size_t iovec_copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
- if (unlikely(iov_iter_is_pipe(i))) {
- WARN_ON(1);
- return 0;
- }
- if (iter_is_iovec(i))
- might_fault();
- iterate_and_advance(i, bytes, v,
- copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+ might_fault();
+ iterate_and_advance_iovec(i, bytes, v,
+ copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len));
+
+ return bytes;
+}
+
+static size_t bvec_copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
+{
+ char *to = addr;
+ iterate_and_advance_bvec(i, bytes, v,
memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
- v.bv_offset, v.bv_len),
- memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
- )
+ v.bv_offset, v.bv_len));
+ return bytes;
+}
+
+static size_t kvec_copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
+{
+ char *to = addr;
+ iterate_and_advance_kvec(i, bytes, v,
+ memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len));
+ return bytes;
+}

+static size_t no_copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
+{
+ WARN_ON(1);
return bytes;
}

@@ -1037,7 +1051,7 @@ static size_t xxx_copy_page_from_iter(struct page *page, size_t offset, size_t b
}
if (iov_iter_type(i) & (ITER_BVEC|ITER_KVEC)) {
void *kaddr = kmap_atomic(page);
- size_t wanted = xxx_copy_from_iter(kaddr + offset, bytes, i);
+ size_t wanted = copy_from_iter(kaddr + offset, bytes, i);
kunmap_atomic(kaddr);
return wanted;
} else
@@ -1943,7 +1957,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.copy_page_to_iter = iovec_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = iovec_copy_to_iter,
- .copy_from_iter = xxx_copy_from_iter,
+ .copy_from_iter = iovec_copy_from_iter,
.copy_from_iter_full = xxx_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
@@ -1977,7 +1991,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.copy_page_to_iter = bkvec_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = kvec_copy_to_iter,
- .copy_from_iter = xxx_copy_from_iter,
+ .copy_from_iter = kvec_copy_from_iter,
.copy_from_iter_full = xxx_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
@@ -2011,7 +2025,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.copy_page_to_iter = bkvec_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = bvec_copy_to_iter,
- .copy_from_iter = xxx_copy_from_iter,
+ .copy_from_iter = bvec_copy_from_iter,
.copy_from_iter_full = xxx_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
@@ -2045,7 +2059,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.copy_page_to_iter = pipe_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = pipe_copy_to_iter,
- .copy_from_iter = xxx_copy_from_iter,
+ .copy_from_iter = no_copy_from_iter,
.copy_from_iter_full = xxx_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
@@ -2079,7 +2093,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.copy_page_to_iter = discard_copy_page_to_iter,
.copy_page_from_iter = xxx_copy_page_from_iter,
.copy_to_iter = discard_copy_to_iter,
- .copy_from_iter = xxx_copy_from_iter,
+ .copy_from_iter = no_copy_from_iter,
.copy_from_iter_full = xxx_copy_from_iter_full,
.copy_from_iter_nocache = xxx_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,


2020-11-21 14:18:18

by David Howells

Subject: [PATCH 14/29] iov_iter: Split iov_iter_zero()

Split iov_iter_zero() by type.

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 40 +++++++++++++++++++++++++++-------------
1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 54029aeab3ec..9a167f53ecff 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1168,16 +1168,30 @@ static size_t pipe_zero(size_t bytes, struct iov_iter *i)
return bytes;
}

-static size_t xxx_zero(size_t bytes, struct iov_iter *i)
+static size_t iovec_zero(size_t bytes, struct iov_iter *i)
{
- if (unlikely(iov_iter_is_pipe(i)))
- return pipe_zero(bytes, i);
- iterate_and_advance(i, bytes, v,
- clear_user(v.iov_base, v.iov_len),
- memzero_page(v.bv_page, v.bv_offset, v.bv_len),
- memset(v.iov_base, 0, v.iov_len)
- )
+ iterate_and_advance_iovec(i, bytes, v,
+ clear_user(v.iov_base, v.iov_len));
+ return bytes;
+}

+static size_t bvec_zero(size_t bytes, struct iov_iter *i)
+{
+ iterate_and_advance_bvec(i, bytes, v,
+ memzero_page(v.bv_page, v.bv_offset, v.bv_len));
+ return bytes;
+}
+
+static size_t kvec_zero(size_t bytes, struct iov_iter *i)
+{
+ iterate_and_advance_kvec(i, bytes, v,
+ memset(v.iov_base, 0, v.iov_len));
+ return bytes;
+}
+
+static size_t discard_zero(size_t bytes, struct iov_iter *i)
+{
+ iterate_and_advance_discard(i, bytes);
return bytes;
}

@@ -2054,7 +2068,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

- .zero = xxx_zero,
+ .zero = iovec_zero,
.alignment = xxx_alignment,
.gap_alignment = xxx_gap_alignment,
.get_pages = xxx_get_pages,
@@ -2088,7 +2102,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

- .zero = xxx_zero,
+ .zero = kvec_zero,
.alignment = xxx_alignment,
.gap_alignment = xxx_gap_alignment,
.get_pages = xxx_get_pages,
@@ -2122,7 +2136,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

- .zero = xxx_zero,
+ .zero = bvec_zero,
.alignment = xxx_alignment,
.gap_alignment = xxx_gap_alignment,
.get_pages = xxx_get_pages,
@@ -2156,7 +2170,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

- .zero = xxx_zero,
+ .zero = pipe_zero,
.alignment = xxx_alignment,
.gap_alignment = xxx_gap_alignment,
.get_pages = xxx_get_pages,
@@ -2190,7 +2204,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

- .zero = xxx_zero,
+ .zero = discard_zero,
.alignment = xxx_alignment,
.gap_alignment = xxx_gap_alignment,
.get_pages = xxx_get_pages,


2020-11-21 14:18:20

by David Howells

Subject: [PATCH 08/29] iov_iter: Split the iterate_all_kinds() macro

Split the iterate_all_kinds() macro into iovec, bvec and kvec variants.
The macro doesn't handle pipes, and the discard case is a no-op that can
be open-coded directly.
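
Each of the new macros walks a single segment array, clamps every chunk
to the remaining byte budget and hands it to CMD. A cut-down user-space
model of the kvec-shaped variant, used here to OR together segment
addresses and lengths (a sketch, not the real macro):

#include <stdio.h>
#include <stddef.h>

struct kvec { void *iov_base; size_t iov_len; };

#define iterate_over_kvec(kvp, nsegs, n, v, CMD) do {		\
	size_t __left = (n);					\
	const struct kvec *__k = (kvp);				\
	for (int __s = 0; __s < (nsegs) && __left; __s++, __k++) { \
		struct kvec v = *__k;				\
		if (v.iov_len > __left)				\
			v.iov_len = __left;			\
		__left -= v.iov_len;				\
		(void)(CMD);	/* real macro also uses CMD's value */ \
	}							\
} while (0)

int main(void)
{
	char a[32], b[64];
	struct kvec kv[2] = { { a, sizeof(a) }, { b, sizeof(b) } };
	unsigned long res = 0;

	iterate_over_kvec(kv, 2, 96, v,
		(res |= (unsigned long)v.iov_base | v.iov_len));
	printf("alignment bits: %#lx\n", res);
	return 0;
}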

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5b18dfe0dcc7..934193627540 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -106,6 +106,33 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n);
} \
}

+#define iterate_over_iovec(i, n, v, CMD) { \
+ if (likely(n)) { \
+ size_t skip = i->iov_offset; \
+ const struct iovec *iov; \
+ struct iovec v; \
+ iterate_iovec(i, n, v, iov, skip, (CMD)) \
+ } \
+}
+
+#define iterate_over_bvec(i, n, v, CMD) { \
+ if (likely(n)) { \
+ size_t skip = i->iov_offset; \
+ struct bio_vec v; \
+ struct bvec_iter __bi; \
+ iterate_bvec(i, n, v, __bi, skip, (CMD)) \
+ } \
+}
+
+#define iterate_over_kvec(i, n, v, CMD) { \
+ if (likely(n)) { \
+ size_t skip = i->iov_offset; \
+ const struct kvec *kvec; \
+ struct kvec v; \
+ iterate_kvec(i, n, v, kvec, skip, (CMD)) \
+ } \
+}
+
#define iterate_and_advance(i, n, v, I, B, K) { \
if (unlikely(i->count < n)) \
n = i->count; \


2020-11-21 14:18:42

by David Howells

Subject: [PATCH 18/29] iov_iter: Split iov_iter_single_seg_count()

Split iov_iter_single_seg_count() by type.
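
The per-type bodies reduce to min(count, space left in the current
segment); e.g. with count 100, a first segment of 64 bytes and an
iov_offset of 10, the current segment contributes 54. A quick check
(values are illustrative):

#include <assert.h>

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

int main(void)
{
	unsigned long count = 100, seg_len = 64, iov_offset = 10;

	assert(min_ul(count, seg_len - iov_offset) == 54);
	return 0;
}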

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index b8e3da20547e..90291188ace5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1415,18 +1415,23 @@ static void discard_revert(struct iov_iter *i, size_t unroll)
/*
* Return the count of just the current iov_iter segment.
*/
-static size_t xxx_single_seg_count(const struct iov_iter *i)
+static size_t iovec_kvec_single_seg_count(const struct iov_iter *i)
{
- if (unlikely(iov_iter_is_pipe(i)))
- return i->count; // it is a silly place, anyway
if (i->nr_segs == 1)
return i->count;
- if (unlikely(iov_iter_is_discard(i)))
+ return min(i->count, i->iov->iov_len - i->iov_offset);
+}
+
+static size_t bvec_single_seg_count(const struct iov_iter *i)
+{
+ if (i->nr_segs == 1)
return i->count;
- else if (iov_iter_is_bvec(i))
- return min(i->count, i->bvec->bv_len - i->iov_offset);
- else
- return min(i->count, i->iov->iov_len - i->iov_offset);
+ return min(i->count, i->bvec->bv_len - i->iov_offset);
+}
+
+static size_t simple_single_seg_count(const struct iov_iter *i)
+{
+ return i->count;
}

void iov_iter_kvec(struct iov_iter *i, unsigned int direction,
@@ -2110,7 +2115,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.advance = iovec_advance,
.revert = iovec_kvec_revert,
.fault_in_readable = iovec_fault_in_readable,
- .single_seg_count = xxx_single_seg_count,
+ .single_seg_count = iovec_kvec_single_seg_count,
.copy_page_to_iter = iovec_copy_page_to_iter,
.copy_page_from_iter = iovec_copy_page_from_iter,
.copy_to_iter = iovec_copy_to_iter,
@@ -2144,7 +2149,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.advance = kvec_advance,
.revert = iovec_kvec_revert,
.fault_in_readable = no_fault_in_readable,
- .single_seg_count = xxx_single_seg_count,
+ .single_seg_count = iovec_kvec_single_seg_count,
.copy_page_to_iter = bkvec_copy_page_to_iter,
.copy_page_from_iter = bkvec_copy_page_from_iter,
.copy_to_iter = kvec_copy_to_iter,
@@ -2178,7 +2183,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.advance = bvec_iov_advance,
.revert = bvec_revert,
.fault_in_readable = no_fault_in_readable,
- .single_seg_count = xxx_single_seg_count,
+ .single_seg_count = bvec_single_seg_count,
.copy_page_to_iter = bkvec_copy_page_to_iter,
.copy_page_from_iter = bkvec_copy_page_from_iter,
.copy_to_iter = bvec_copy_to_iter,
@@ -2212,7 +2217,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.advance = pipe_advance,
.revert = pipe_revert,
.fault_in_readable = no_fault_in_readable,
- .single_seg_count = xxx_single_seg_count,
+ .single_seg_count = simple_single_seg_count,
.copy_page_to_iter = pipe_copy_page_to_iter,
.copy_page_from_iter = no_copy_page_from_iter,
.copy_to_iter = pipe_copy_to_iter,
@@ -2246,7 +2251,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.advance = discard_advance,
.revert = discard_revert,
.fault_in_readable = no_fault_in_readable,
- .single_seg_count = xxx_single_seg_count,
+ .single_seg_count = simple_single_seg_count,
.copy_page_to_iter = discard_copy_page_to_iter,
.copy_page_from_iter = no_copy_page_from_iter,
.copy_to_iter = discard_copy_to_iter,


2020-11-21 14:18:46

by David Howells

Subject: [PATCH 19/29] iov_iter: Split iov_iter_alignment()

Split iov_iter_alignment() by type.
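
As a reminder of the contract: the result is the OR of every segment base
address and length, so a caller only has to test the low bits against a
block-size mask. Worked user-space example (made-up addresses):

#include <stdio.h>

int main(void)
{
	/* two segments: base 0x10200 len 0x200, base 0x30400 len 0x400 */
	unsigned long res = 0;

	res |= 0x10200UL | 0x200UL;
	res |= 0x30400UL | 0x400UL;
	printf("res=%#lx, 512-byte aligned: %s\n",
	       res, (res & 511) ? "no" : "yes");	/* prints yes */
	return 0;
}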

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 59 ++++++++++++++++++++++++++++++++++++++++----------------
1 file changed, 42 insertions(+), 17 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 90291188ace5..d2a66e951995 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1497,26 +1497,51 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
}
EXPORT_SYMBOL(iov_iter_discard);

-static unsigned long xxx_alignment(const struct iov_iter *i)
+static unsigned long iovec_alignment(const struct iov_iter *i)
{
unsigned long res = 0;
size_t size = i->count;

- if (unlikely(iov_iter_is_pipe(i))) {
- unsigned int p_mask = i->pipe->ring_size - 1;
+ iterate_over_iovec(i, size, v,
+ (res |= (unsigned long)v.iov_base | v.iov_len, 0));
+ return res;
+}

- if (size && i->iov_offset && allocated(&i->pipe->bufs[i->head & p_mask]))
- return size | i->iov_offset;
- return size;
- }
- iterate_all_kinds(i, size, v,
- (res |= (unsigned long)v.iov_base | v.iov_len, 0),
- res |= v.bv_offset | v.bv_len,
- res |= (unsigned long)v.iov_base | v.iov_len
- )
+static unsigned long bvec_alignment(const struct iov_iter *i)
+{
+ unsigned long res = 0;
+ size_t size = i->count;
+
+ iterate_over_bvec(i, size, v,
+ res |= v.bv_offset | v.bv_len);
return res;
}

+static unsigned long kvec_alignment(const struct iov_iter *i)
+{
+ unsigned long res = 0;
+ size_t size = i->count;
+
+ iterate_over_kvec(i, size, v,
+ res |= (unsigned long)v.iov_base | v.iov_len);
+ return res;
+}
+
+static unsigned long pipe_alignment(const struct iov_iter *i)
+{
+ size_t size = i->count;
+ unsigned int p_mask = i->pipe->ring_size - 1;
+
+ if (size && i->iov_offset && allocated(&i->pipe->bufs[i->head & p_mask]))
+ return size | i->iov_offset;
+ return size;
+}
+
+static unsigned long no_alignment(const struct iov_iter *i)
+{
+ return 0;
+}
+
static unsigned long xxx_gap_alignment(const struct iov_iter *i)
{
unsigned long res = 0;
@@ -2134,7 +2159,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

.zero = iovec_zero,
- .alignment = xxx_alignment,
+ .alignment = iovec_alignment,
.gap_alignment = xxx_gap_alignment,
.get_pages = xxx_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
@@ -2168,7 +2193,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

.zero = kvec_zero,
- .alignment = xxx_alignment,
+ .alignment = kvec_alignment,
.gap_alignment = xxx_gap_alignment,
.get_pages = xxx_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
@@ -2202,7 +2227,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

.zero = bvec_zero,
- .alignment = xxx_alignment,
+ .alignment = bvec_alignment,
.gap_alignment = xxx_gap_alignment,
.get_pages = xxx_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
@@ -2236,7 +2261,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

.zero = pipe_zero,
- .alignment = xxx_alignment,
+ .alignment = pipe_alignment,
.gap_alignment = xxx_gap_alignment,
.get_pages = xxx_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
@@ -2270,7 +2295,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

.zero = discard_zero,
- .alignment = xxx_alignment,
+ .alignment = no_alignment,
.gap_alignment = xxx_gap_alignment,
.get_pages = xxx_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,


2020-11-21 14:18:55

by David Howells

Subject: [PATCH 20/29] iov_iter: Split iov_iter_gap_alignment()

Split iov_iter_gap_alignment() by type.
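
The accumulated value goes nonzero when a later segment starts at a
misaligned address or a non-final segment ends short, which is what the
scatter-gather gap checks want to know. Tracing the same expression over
two segments in user space (illustrative numbers):

#include <stdio.h>

struct seg { unsigned long base, len; };

int main(void)
{
	struct seg s[2] = { { 0x1000, 512 }, { 0x2100, 512 } };
	unsigned long res = 0, size = 1024;

	for (int k = 0; k < 2; k++) {
		res |= (!res ? 0 : s[k].base) |
		       (size != s[k].len ? size : 0);
		size -= s[k].len;
	}
	/* seg 0: size (1024) != len (512), so 1024 is ORed in;
	 * seg 1: res is already nonzero, so its base is ORed in */
	printf("gap alignment: %#lx\n", res);
	return 0;
}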

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 50 ++++++++++++++++++++++++++++++++++----------------
1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index d2a66e951995..5744ddec854f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1542,27 +1542,45 @@ static unsigned long no_alignment(const struct iov_iter *i)
return 0;
}

-static unsigned long xxx_gap_alignment(const struct iov_iter *i)
+static unsigned long iovec_gap_alignment(const struct iov_iter *i)
{
unsigned long res = 0;
size_t size = i->count;

- if (unlikely(iov_iter_is_pipe(i) || iov_iter_is_discard(i))) {
- WARN_ON(1);
- return ~0U;
- }
-
- iterate_all_kinds(i, size, v,
+ iterate_over_iovec(i, size, v,
(res |= (!res ? 0 : (unsigned long)v.iov_base) |
- (size != v.iov_len ? size : 0), 0),
+ (size != v.iov_len ? size : 0), 0));
+ return res;
+}
+
+static unsigned long bvec_gap_alignment(const struct iov_iter *i)
+{
+ unsigned long res = 0;
+ size_t size = i->count;
+
+ iterate_over_bvec(i, size, v,
(res |= (!res ? 0 : (unsigned long)v.bv_offset) |
- (size != v.bv_len ? size : 0)),
+ (size != v.bv_len ? size : 0)));
+ return res;
+}
+
+static unsigned long kvec_gap_alignment(const struct iov_iter *i)
+{
+ unsigned long res = 0;
+ size_t size = i->count;
+
+ iterate_over_kvec(i, size, v,
(res |= (!res ? 0 : (unsigned long)v.iov_base) |
- (size != v.iov_len ? size : 0))
- );
+ (size != v.iov_len ? size : 0)));
return res;
}

+static unsigned long no_gap_alignment(const struct iov_iter *i)
+{
+ WARN_ON(1);
+ return ~0U;
+}
+
static inline ssize_t __pipe_get_pages(struct iov_iter *i,
size_t maxsize,
struct page **pages,
@@ -2160,7 +2178,7 @@ static const struct iov_iter_ops iovec_iter_ops = {

.zero = iovec_zero,
.alignment = iovec_alignment,
- .gap_alignment = xxx_gap_alignment,
+ .gap_alignment = iovec_gap_alignment,
.get_pages = xxx_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
.npages = xxx_npages,
@@ -2194,7 +2212,7 @@ static const struct iov_iter_ops kvec_iter_ops = {

.zero = kvec_zero,
.alignment = kvec_alignment,
- .gap_alignment = xxx_gap_alignment,
+ .gap_alignment = kvec_gap_alignment,
.get_pages = xxx_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
.npages = xxx_npages,
@@ -2228,7 +2246,7 @@ static const struct iov_iter_ops bvec_iter_ops = {

.zero = bvec_zero,
.alignment = bvec_alignment,
- .gap_alignment = xxx_gap_alignment,
+ .gap_alignment = bvec_gap_alignment,
.get_pages = xxx_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
.npages = xxx_npages,
@@ -2262,7 +2280,7 @@ static const struct iov_iter_ops pipe_iter_ops = {

.zero = pipe_zero,
.alignment = pipe_alignment,
- .gap_alignment = xxx_gap_alignment,
+ .gap_alignment = no_gap_alignment,
.get_pages = xxx_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
.npages = xxx_npages,
@@ -2296,7 +2314,7 @@ static const struct iov_iter_ops discard_iter_ops = {

.zero = discard_zero,
.alignment = no_alignment,
- .gap_alignment = xxx_gap_alignment,
+ .gap_alignment = no_gap_alignment,
.get_pages = xxx_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
.npages = xxx_npages,


2020-11-21 14:18:56

by David Howells

Subject: [PATCH 21/29] iov_iter: Split iov_iter_get_pages()

Split iov_iter_get_pages() by type.
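
For the iovec case the contract is that *start receives the buffer's
offset within its first page and the return value is the number of bytes
the pinned pages cover. The address arithmetic on its own (made-up
numbers):

#include <assert.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL

int main(void)
{
	unsigned long addr = 0x10030;			/* user buffer */
	unsigned long want = 8192;			/* bytes asked for */
	unsigned long start = addr & (PAGE_SIZE - 1);	/* 0x30 */
	unsigned long len = want + start;		/* incl. lead-in */
	unsigned long n = (len + PAGE_SIZE - 1) / PAGE_SIZE;

	assert(start == 0x30 && n == 3);
	/* all n pages pinned covers len - start = want bytes; a short
	 * pin of res pages covers res * PAGE_SIZE - start bytes */
	printf("full: %lu, short(2): %lu\n",
	       len - start, 2 * PAGE_SIZE - start);
	return 0;
}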

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 46 +++++++++++++++++++++++++++++-----------------
1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5744ddec854f..a2de201b947f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1611,6 +1611,8 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
unsigned int iter_head, npages;
size_t capacity;

+ if (maxsize > i->count)
+ maxsize = i->count;
if (!maxsize)
return 0;

@@ -1625,19 +1627,14 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, start);
}

-static ssize_t xxx_get_pages(struct iov_iter *i,
+static ssize_t iovec_get_pages(struct iov_iter *i,
struct page **pages, size_t maxsize, unsigned maxpages,
size_t *start)
{
if (maxsize > i->count)
maxsize = i->count;

- if (unlikely(iov_iter_is_pipe(i)))
- return pipe_get_pages(i, pages, maxsize, maxpages, start);
- if (unlikely(iov_iter_is_discard(i)))
- return -EFAULT;
-
- iterate_all_kinds(i, maxsize, v, ({
+ iterate_over_iovec(i, maxsize, v, ({
unsigned long addr = (unsigned long)v.iov_base;
size_t len = v.iov_len + (*start = addr & (PAGE_SIZE - 1));
int n;
@@ -1653,18 +1650,33 @@ static ssize_t xxx_get_pages(struct iov_iter *i,
if (unlikely(res < 0))
return res;
return (res == n ? len : res * PAGE_SIZE) - *start;
- 0;}),({
+ 0;}));
+ return 0;
+}
+
+static ssize_t bvec_get_pages(struct iov_iter *i,
+ struct page **pages, size_t maxsize, unsigned maxpages,
+ size_t *start)
+{
+ if (maxsize > i->count)
+ maxsize = i->count;
+
+ iterate_over_bvec(i, maxsize, v, ({
/* can't be more than PAGE_SIZE */
*start = v.bv_offset;
get_page(*pages = v.bv_page);
return v.bv_len;
- }),({
- return -EFAULT;
- })
- )
+ }));
return 0;
}

+static ssize_t no_get_pages(struct iov_iter *i,
+ struct page **pages, size_t maxsize, unsigned maxpages,
+ size_t *start)
+{
+ return -EFAULT;
+}
+
static struct page **get_pages_array(size_t n)
{
return kvmalloc_array(n, sizeof(struct page *), GFP_KERNEL);
@@ -2179,7 +2191,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.zero = iovec_zero,
.alignment = iovec_alignment,
.gap_alignment = iovec_gap_alignment,
- .get_pages = xxx_get_pages,
+ .get_pages = iovec_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
.npages = xxx_npages,
.dup_iter = xxx_dup_iter,
@@ -2213,7 +2225,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.zero = kvec_zero,
.alignment = kvec_alignment,
.gap_alignment = kvec_gap_alignment,
- .get_pages = xxx_get_pages,
+ .get_pages = no_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
.npages = xxx_npages,
.dup_iter = xxx_dup_iter,
@@ -2247,7 +2259,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.zero = bvec_zero,
.alignment = bvec_alignment,
.gap_alignment = bvec_gap_alignment,
- .get_pages = xxx_get_pages,
+ .get_pages = bvec_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
.npages = xxx_npages,
.dup_iter = xxx_dup_iter,
@@ -2281,7 +2293,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.zero = pipe_zero,
.alignment = pipe_alignment,
.gap_alignment = no_gap_alignment,
- .get_pages = xxx_get_pages,
+ .get_pages = pipe_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
.npages = xxx_npages,
.dup_iter = xxx_dup_iter,
@@ -2315,7 +2327,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.zero = discard_zero,
.alignment = no_alignment,
.gap_alignment = no_gap_alignment,
- .get_pages = xxx_get_pages,
+ .get_pages = no_get_pages,
.get_pages_alloc = xxx_get_pages_alloc,
.npages = xxx_npages,
.dup_iter = xxx_dup_iter,


2020-11-21 14:19:12

by David Howells

Subject: [PATCH 13/29] iov_iter: Split copy_page_from_iter()

Split copy_page_from_iter() by type.
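
For bvec and kvec the page copy is simply: map the page, reuse the
flat-buffer copy, unmap. The same shape in a self-contained model, with
memcpy standing in for both the atomic mapping and the iterator copy:

#include <stdio.h>
#include <string.h>

/* stand-in for copy_from_iter(): pull bytes from a flat source */
static size_t copy_from_src(void *to, size_t bytes, const char **src)
{
	memcpy(to, *src, bytes);
	*src += bytes;
	return bytes;
}

static size_t copy_page_from_src(char *page, size_t offset, size_t bytes,
				 const char **src)
{
	/* kernel: kaddr = kmap_atomic(page); ...; kunmap_atomic(kaddr) */
	return copy_from_src(page + offset, bytes, src);
}

int main(void)
{
	char page[64] = { 0 };
	const char *src = "hello";

	copy_page_from_src(page, 8, 5, &src);
	printf("%.5s\n", page + 8);
	return 0;
}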

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 39 +++++++++++++++++++++------------------
1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 544e532e3e9f..54029aeab3ec 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -344,7 +344,7 @@ static size_t iovec_copy_page_to_iter(struct page *page, size_t offset, size_t b
return wanted - bytes;
}

-static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t bytes,
+static size_t iovec_copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
struct iov_iter *i)
{
size_t skip, copy, left, wanted;
@@ -352,6 +352,8 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
char __user *buf;
void *kaddr, *to;

+ if (unlikely(!page_copy_sane(page, offset, bytes)))
+ return 0;
if (unlikely(bytes > i->count))
bytes = i->count;

@@ -1120,22 +1122,23 @@ static size_t discard_copy_page_to_iter(struct page *page, size_t offset, size_t
return bytes;
}

-static size_t xxx_copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
+static size_t bkvec_copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
struct iov_iter *i)
{
- if (unlikely(!page_copy_sane(page, offset, bytes)))
- return 0;
- if (unlikely(iov_iter_is_pipe(i) || iov_iter_is_discard(i))) {
- WARN_ON(1);
- return 0;
- }
- if (iov_iter_type(i) & (ITER_BVEC|ITER_KVEC)) {
+ size_t wanted = 0;
+ if (likely(page_copy_sane(page, offset, bytes))) {
void *kaddr = kmap_atomic(page);
- size_t wanted = copy_from_iter(kaddr + offset, bytes, i);
+ wanted = copy_from_iter(kaddr + offset, bytes, i);
kunmap_atomic(kaddr);
- return wanted;
- } else
- return copy_page_from_iter_iovec(page, offset, bytes, i);
+ }
+ return wanted;
+}
+
+static size_t no_copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
+ struct iov_iter *i)
+{
+ WARN_ON(1);
+ return 0;
}

static size_t pipe_zero(size_t bytes, struct iov_iter *i)
@@ -2035,7 +2038,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.fault_in_readable = iovec_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = iovec_copy_page_to_iter,
- .copy_page_from_iter = xxx_copy_page_from_iter,
+ .copy_page_from_iter = iovec_copy_page_from_iter,
.copy_to_iter = iovec_copy_to_iter,
.copy_from_iter = iovec_copy_from_iter,
.copy_from_iter_full = iovec_copy_from_iter_full,
@@ -2069,7 +2072,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = bkvec_copy_page_to_iter,
- .copy_page_from_iter = xxx_copy_page_from_iter,
+ .copy_page_from_iter = bkvec_copy_page_from_iter,
.copy_to_iter = kvec_copy_to_iter,
.copy_from_iter = kvec_copy_from_iter,
.copy_from_iter_full = kvec_copy_from_iter_full,
@@ -2103,7 +2106,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = bkvec_copy_page_to_iter,
- .copy_page_from_iter = xxx_copy_page_from_iter,
+ .copy_page_from_iter = bkvec_copy_page_from_iter,
.copy_to_iter = bvec_copy_to_iter,
.copy_from_iter = bvec_copy_from_iter,
.copy_from_iter_full = bvec_copy_from_iter_full,
@@ -2137,7 +2140,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = pipe_copy_page_to_iter,
- .copy_page_from_iter = xxx_copy_page_from_iter,
+ .copy_page_from_iter = no_copy_page_from_iter,
.copy_to_iter = pipe_copy_to_iter,
.copy_from_iter = no_copy_from_iter,
.copy_from_iter_full = no_copy_from_iter_full,
@@ -2171,7 +2174,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = discard_copy_page_to_iter,
- .copy_page_from_iter = xxx_copy_page_from_iter,
+ .copy_page_from_iter = no_copy_page_from_iter,
.copy_to_iter = discard_copy_to_iter,
.copy_from_iter = no_copy_from_iter,
.copy_from_iter_full = no_copy_from_iter_full,


2020-11-21 14:19:13

by David Howells

Subject: [PATCH 24/29] iov_iter: Split csum_and_copy_from_iter_full()

Split csum_and_copy_from_iter_full() by type.
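
The _full variants are all-or-nothing: refuse up front if the iterator is
short, copy without consuming, and only advance once the whole transfer
(checksum included) has succeeded. csum_block_add() folds each chunk in
at its running offset because a chunk's contribution to the Internet
checksum depends on whether it starts at an odd or even byte. The
advance-on-success contract, reduced to a user-space skeleton:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct src { const char *p; size_t count; };

static bool copy_from_src_full(void *addr, size_t bytes, struct src *s)
{
	if (s->count < bytes)		/* refuse partial transfers */
		return false;
	memcpy(addr, s->p, bytes);
	s->p += bytes;			/* advance only on success; the */
	s->count -= bytes;		/* kernel's copy step doesn't */
	return true;			/* consume, iov_iter_advance() does */
}

int main(void)
{
	char buf[8];
	struct src s = { "abcdefgh", 8 };

	printf("%d %d\n", copy_from_src_full(buf, 8, &s),
	       copy_from_src_full(buf, 1, &s));	/* 1 0: source is empty */
	return 0;
}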

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 62 ++++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 47 insertions(+), 15 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1f596cffddf9..8820a9e72815 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1841,20 +1841,16 @@ static size_t no_csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
return 0;
}

-static bool xxx_csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
+static bool iovec_csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
struct iov_iter *i)
{
char *to = addr;
__wsum sum, next;
size_t off = 0;
sum = *csum;
- if (unlikely(iov_iter_is_pipe(i) || iov_iter_is_discard(i))) {
- WARN_ON(1);
- return false;
- }
if (unlikely(i->count < bytes))
return false;
- iterate_all_kinds(i, bytes, v, ({
+ iterate_over_iovec(i, bytes, v, ({
next = csum_and_copy_from_user(v.iov_base,
(to += v.iov_len) - v.iov_len,
v.iov_len);
@@ -1863,25 +1859,61 @@ static bool xxx_csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *c
sum = csum_block_add(sum, next, off);
off += v.iov_len;
0;
- }), ({
+ }));
+ *csum = sum;
+ iov_iter_advance(i, bytes);
+ return true;
+}
+
+static bool bvec_csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
+ struct iov_iter *i)
+{
+ char *to = addr;
+ __wsum sum;
+ size_t off = 0;
+ sum = *csum;
+ if (unlikely(i->count < bytes))
+ return false;
+ iterate_over_bvec(i, bytes, v, ({
char *p = kmap_atomic(v.bv_page);
sum = csum_and_memcpy((to += v.bv_len) - v.bv_len,
p + v.bv_offset, v.bv_len,
sum, off);
kunmap_atomic(p);
off += v.bv_len;
- }),({
+ }));
+ *csum = sum;
+ iov_iter_advance(i, bytes);
+ return true;
+}
+
+static bool kvec_csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
+ struct iov_iter *i)
+{
+ char *to = addr;
+ __wsum sum;
+ size_t off = 0;
+ sum = *csum;
+ if (unlikely(i->count < bytes))
+ return false;
+ iterate_over_kvec(i, bytes, v, ({
sum = csum_and_memcpy((to += v.iov_len) - v.iov_len,
v.iov_base, v.iov_len,
sum, off);
off += v.iov_len;
- })
- )
+ }));
*csum = sum;
iov_iter_advance(i, bytes);
return true;
}

+static bool no_csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
+ struct iov_iter *i)
+{
+ WARN_ON(1);
+ return false;
+}
+
static size_t xxx_csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
struct iov_iter *i)
{
@@ -2226,7 +2258,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
.csum_and_copy_from_iter = iovec_csum_and_copy_from_iter,
- .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
+ .csum_and_copy_from_iter_full = iovec_csum_and_copy_from_iter_full,

.zero = iovec_zero,
.alignment = iovec_alignment,
@@ -2260,7 +2292,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
.csum_and_copy_from_iter = kvec_csum_and_copy_from_iter,
- .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
+ .csum_and_copy_from_iter_full = kvec_csum_and_copy_from_iter_full,

.zero = kvec_zero,
.alignment = kvec_alignment,
@@ -2294,7 +2326,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
.csum_and_copy_from_iter = bvec_csum_and_copy_from_iter,
- .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
+ .csum_and_copy_from_iter_full = bvec_csum_and_copy_from_iter_full,

.zero = bvec_zero,
.alignment = bvec_alignment,
@@ -2328,7 +2360,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
.csum_and_copy_from_iter = no_csum_and_copy_from_iter,
- .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
+ .csum_and_copy_from_iter_full = no_csum_and_copy_from_iter_full,

.zero = pipe_zero,
.alignment = pipe_alignment,
@@ -2362,7 +2394,7 @@ static const struct iov_iter_ops discard_iter_ops = {
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
.csum_and_copy_from_iter = no_csum_and_copy_from_iter,
- .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
+ .csum_and_copy_from_iter_full = no_csum_and_copy_from_iter_full,

.zero = discard_zero,
.alignment = no_alignment,


2020-11-21 14:19:28

by David Howells

Subject: [PATCH 10/29] iov_iter: Split copy_from_iter_nocache()

Split copy_from_iter_nocache() by type.

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 38 +++++++++++++++++++++++---------------
1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 3dba665a1ee9..c57c2171f730 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -972,21 +972,29 @@ static bool no_copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
return false;
}

-static size_t xxx_copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
+static size_t iovec_copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
- if (unlikely(iov_iter_is_pipe(i))) {
- WARN_ON(1);
- return 0;
- }
- iterate_and_advance(i, bytes, v,
+ iterate_and_advance_iovec(i, bytes, v,
__copy_from_user_inatomic_nocache((to += v.iov_len) - v.iov_len,
- v.iov_base, v.iov_len),
+ v.iov_base, v.iov_len));
+ return bytes;
+}
+
+static size_t bvec_copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
+{
+ char *to = addr;
+ iterate_and_advance_bvec(i, bytes, v,
memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
- v.bv_offset, v.bv_len),
- memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
- )
+ v.bv_offset, v.bv_len));
+ return bytes;
+}

+static size_t kvec_copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
+{
+ char *to = addr;
+ iterate_and_advance_kvec(i, bytes, v,
+ memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len));
return bytes;
}

@@ -2009,7 +2017,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.copy_to_iter = iovec_copy_to_iter,
.copy_from_iter = iovec_copy_from_iter,
.copy_from_iter_full = iovec_copy_from_iter_full,
- .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
+ .copy_from_iter_nocache = iovec_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
.copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
@@ -2043,7 +2051,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.copy_to_iter = kvec_copy_to_iter,
.copy_from_iter = kvec_copy_from_iter,
.copy_from_iter_full = kvec_copy_from_iter_full,
- .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
+ .copy_from_iter_nocache = kvec_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
.copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
@@ -2077,7 +2085,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.copy_to_iter = bvec_copy_to_iter,
.copy_from_iter = bvec_copy_from_iter,
.copy_from_iter_full = bvec_copy_from_iter_full,
- .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
+ .copy_from_iter_nocache = bvec_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
.copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
@@ -2111,7 +2119,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.copy_to_iter = pipe_copy_to_iter,
.copy_from_iter = no_copy_from_iter,
.copy_from_iter_full = no_copy_from_iter_full,
- .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
+ .copy_from_iter_nocache = no_copy_from_iter,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
.copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
@@ -2145,7 +2153,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.copy_to_iter = discard_copy_to_iter,
.copy_from_iter = no_copy_from_iter,
.copy_from_iter_full = no_copy_from_iter_full,
- .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
+ .copy_from_iter_nocache = no_copy_from_iter,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
.copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,


2020-11-21 14:19:31

by David Howells

Subject: [PATCH 26/29] iov_iter: Split iov_iter_npages()

Split iov_iter_npages() by type.
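
Two counting rules fall out of the split: an iovec/kvec segment is an
arbitrary address range, so the pages it spans is a DIV_ROUND_UP over the
range, while each bvec chunk the iterator yields fits within one page
(the get_pages code relies on the same property) and counts as exactly
one. Checking the range arithmetic (illustrative):

#include <assert.h>

#define PAGE_SIZE 4096UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static unsigned long pages_spanned(unsigned long p, unsigned long len)
{
	return DIV_ROUND_UP(p + len, PAGE_SIZE) - p / PAGE_SIZE;
}

int main(void)
{
	assert(pages_spanned(0x1000, 4096) == 1); /* aligned, one page */
	assert(pages_spanned(0x1ff0, 32) == 2);	  /* straddles a boundary */
	assert(pages_spanned(0x1ff0, 16) == 1);	  /* fits in the tail */
	return 0;
}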

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 84 ++++++++++++++++++++++++++++++++++++++------------------
1 file changed, 57 insertions(+), 27 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 2f8019e3b09a..d8ef6c81c55f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -2004,50 +2004,80 @@ size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
}
EXPORT_SYMBOL(hash_and_copy_to_iter);

-static int xxx_npages(const struct iov_iter *i, int maxpages)
+static int iovec_npages(const struct iov_iter *i, int maxpages)
{
size_t size = i->count;
int npages = 0;

if (!size)
return 0;
- if (unlikely(iov_iter_is_discard(i)))
- return 0;
-
- if (unlikely(iov_iter_is_pipe(i))) {
- struct pipe_inode_info *pipe = i->pipe;
- unsigned int iter_head;
- size_t off;
-
- if (!sanity(i))
- return 0;
-
- data_start(i, &iter_head, &off);
- /* some of this one + all after this one */
- npages = pipe_space_for_user(iter_head, pipe->tail, pipe);
- if (npages >= maxpages)
- return maxpages;
- } else iterate_all_kinds(i, size, v, ({
+ iterate_over_iovec(i, size, v, ({
unsigned long p = (unsigned long)v.iov_base;
npages += DIV_ROUND_UP(p + v.iov_len, PAGE_SIZE)
- p / PAGE_SIZE;
if (npages >= maxpages)
return maxpages;
- 0;}),({
+ 0;}));
+ return npages;
+}
+
+static int bvec_npages(const struct iov_iter *i, int maxpages)
+{
+ size_t size = i->count;
+ int npages = 0;
+
+ if (!size)
+ return 0;
+ iterate_over_bvec(i, size, v, ({
npages++;
if (npages >= maxpages)
return maxpages;
- }),({
+ }));
+ return npages;
+}
+
+static int kvec_npages(const struct iov_iter *i, int maxpages)
+{
+ size_t size = i->count;
+ int npages = 0;
+
+ if (!size)
+ return 0;
+ iterate_over_kvec(i, size, v, ({
unsigned long p = (unsigned long)v.iov_base;
npages += DIV_ROUND_UP(p + v.iov_len, PAGE_SIZE)
- p / PAGE_SIZE;
if (npages >= maxpages)
return maxpages;
- })
- )
+ }));
return npages;
}

+static int pipe_npages(const struct iov_iter *i, int maxpages)
+{
+ struct pipe_inode_info *pipe = i->pipe;
+ size_t size = i->count, off;
+ unsigned int iter_head;
+ int npages = 0;
+
+ if (!size)
+ return 0;
+ if (!sanity(i))
+ return 0;
+
+ data_start(i, &iter_head, &off);
+ /* some of this one + all after this one */
+ npages = pipe_space_for_user(iter_head, pipe->tail, pipe);
+ if (npages >= maxpages)
+ return maxpages;
+ return npages;
+}
+
+static int discard_npages(const struct iov_iter *i, int maxpages)
+{
+ return 0;
+}
+
static const void *xxx_dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
{
*new = *old;
@@ -2293,7 +2323,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.gap_alignment = iovec_gap_alignment,
.get_pages = iovec_get_pages,
.get_pages_alloc = iovec_get_pages_alloc,
- .npages = xxx_npages,
+ .npages = iovec_npages,
.dup_iter = xxx_dup_iter,
.for_each_range = xxx_for_each_range,
};
@@ -2327,7 +2357,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.gap_alignment = kvec_gap_alignment,
.get_pages = no_get_pages,
.get_pages_alloc = no_get_pages_alloc,
- .npages = xxx_npages,
+ .npages = kvec_npages,
.dup_iter = xxx_dup_iter,
.for_each_range = xxx_for_each_range,
};
@@ -2361,7 +2391,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.gap_alignment = bvec_gap_alignment,
.get_pages = bvec_get_pages,
.get_pages_alloc = bvec_get_pages_alloc,
- .npages = xxx_npages,
+ .npages = bvec_npages,
.dup_iter = xxx_dup_iter,
.for_each_range = xxx_for_each_range,
};
@@ -2395,7 +2425,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.gap_alignment = no_gap_alignment,
.get_pages = pipe_get_pages,
.get_pages_alloc = pipe_get_pages_alloc,
- .npages = xxx_npages,
+ .npages = pipe_npages,
.dup_iter = xxx_dup_iter,
.for_each_range = xxx_for_each_range,
};
@@ -2429,7 +2459,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.gap_alignment = no_gap_alignment,
.get_pages = no_get_pages,
.get_pages_alloc = no_get_pages_alloc,
- .npages = xxx_npages,
+ .npages = discard_npages,
.dup_iter = xxx_dup_iter,
.for_each_range = xxx_for_each_range,
};


2020-11-21 14:19:35

by David Howells

Subject: [PATCH 27/29] iov_iter: Split dup_iter()

Split dup_iter() by type.
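
dup_iter() needs a deep copy of the segment array, since the new iterator
gets advanced independently of the old one; the iterator struct itself
copies by assignment and only the vector is kmemdup()ed (iovec and kvec
share a layout, so one helper serves both). A user-space model with
malloc/memcpy standing in for kmemdup():

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct kvec { void *iov_base; size_t iov_len; };
struct iter {
	const struct kvec *kvec;
	unsigned long nr_segs;
	size_t count;
};

static const void *dup_iter(struct iter *new, const struct iter *old)
{
	struct kvec *v;

	*new = *old;			/* shallow copy of the iterator */
	v = malloc(old->nr_segs * sizeof(*v));
	if (v)
		memcpy(v, old->kvec, old->nr_segs * sizeof(*v));
	new->kvec = v;			/* deep copy of the segments */
	return v;			/* NULL on allocation failure */
}

int main(void)
{
	char buf[16];
	struct kvec kv[1] = { { buf, sizeof(buf) } };
	struct iter old = { kv, 1, sizeof(buf) }, new;

	printf("%s\n", dup_iter(&new, &old) ? "duplicated" : "ENOMEM");
	free((void *)new.kvec);
	return 0;
}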

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 49 +++++++++++++++++++++++++++++--------------------
1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index d8ef6c81c55f..ca0e94596eda 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -2078,26 +2078,35 @@ static int discard_npages(const struct iov_iter *i, int maxpages)
return 0;
}

-static const void *xxx_dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
+static const void *iovec_kvec_dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
{
*new = *old;
- if (unlikely(iov_iter_is_pipe(new))) {
- WARN_ON(1);
- return NULL;
- }
- if (unlikely(iov_iter_is_discard(new)))
- return NULL;
- if (iov_iter_is_bvec(new))
- return new->bvec = kmemdup(new->bvec,
- new->nr_segs * sizeof(struct bio_vec),
- flags);
- else
- /* iovec and kvec have identical layout */
- return new->iov = kmemdup(new->iov,
- new->nr_segs * sizeof(struct iovec),
+ /* iovec and kvec have identical layout */
+ return new->iov = kmemdup(new->iov,
+ new->nr_segs * sizeof(struct iovec),
+ flags);
+}
+
+static const void *bvec_dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
+{
+ *new = *old;
+ return new->bvec = kmemdup(new->bvec,
+ new->nr_segs * sizeof(struct bio_vec),
flags);
}

+static const void *discard_dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
+{
+ *new = *old;
+ return NULL;
+}
+
+static const void *no_dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
+{
+ WARN_ON(1);
+ return NULL;
+}
+
static int copy_compat_iovec_from_user(struct iovec *iov,
const struct iovec __user *uvec, unsigned long nr_segs)
{
@@ -2324,7 +2333,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.get_pages = iovec_get_pages,
.get_pages_alloc = iovec_get_pages_alloc,
.npages = iovec_npages,
- .dup_iter = xxx_dup_iter,
+ .dup_iter = iovec_kvec_dup_iter,
.for_each_range = xxx_for_each_range,
};

@@ -2358,7 +2367,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.get_pages = no_get_pages,
.get_pages_alloc = no_get_pages_alloc,
.npages = kvec_npages,
- .dup_iter = xxx_dup_iter,
+ .dup_iter = iovec_kvec_dup_iter,
.for_each_range = xxx_for_each_range,
};

@@ -2392,7 +2401,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.get_pages = bvec_get_pages,
.get_pages_alloc = bvec_get_pages_alloc,
.npages = bvec_npages,
- .dup_iter = xxx_dup_iter,
+ .dup_iter = bvec_dup_iter,
.for_each_range = xxx_for_each_range,
};

@@ -2426,7 +2435,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.get_pages = pipe_get_pages,
.get_pages_alloc = pipe_get_pages_alloc,
.npages = pipe_npages,
- .dup_iter = xxx_dup_iter,
+ .dup_iter = no_dup_iter,
.for_each_range = xxx_for_each_range,
};

@@ -2460,6 +2469,6 @@ static const struct iov_iter_ops discard_iter_ops = {
.get_pages = no_get_pages,
.get_pages_alloc = no_get_pages_alloc,
.npages = discard_npages,
- .dup_iter = xxx_dup_iter,
+ .dup_iter = discard_dup_iter,
.for_each_range = xxx_for_each_range,
};


2020-11-21 14:19:37

by David Howells

Subject: [PATCH 11/29] iov_iter: Split copy_from_iter_flushcache()

Split copy_from_iter_flushcache() by type.

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 42 +++++++++++++++++++++++++-----------------
1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index c57c2171f730..6b4739d7dd9a 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1000,7 +1000,7 @@ static size_t kvec_copy_from_iter_nocache(void *addr, size_t bytes, struct iov_i

#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
/**
- * _copy_from_iter_flushcache - write destination through cpu cache
+ * copy_from_iter_flushcache - write destination through cpu cache
* @addr: destination kernel address
* @bytes: total transfer length
* @iter: source iterator
@@ -1013,22 +1013,30 @@ static size_t kvec_copy_from_iter_nocache(void *addr, size_t bytes, struct iov_i
* bypass the cache for the ITER_IOVEC case, and on some archs may use
* instructions that strand dirty-data in the cache.
*/
-static size_t xxx_copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+static size_t iovec_copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
- if (unlikely(iov_iter_is_pipe(i))) {
- WARN_ON(1);
- return 0;
- }
- iterate_and_advance(i, bytes, v,
+ iterate_and_advance_iovec(i, bytes, v,
__copy_from_user_flushcache((to += v.iov_len) - v.iov_len,
- v.iov_base, v.iov_len),
+ v.iov_base, v.iov_len));
+ return bytes;
+}
+
+static size_t bvec_copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+{
+ char *to = addr;
+ iterate_and_advance_bvec(i, bytes, v,
memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page,
- v.bv_offset, v.bv_len),
- memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base,
- v.iov_len)
- )
+ v.bv_offset, v.bv_len));
+ return bytes;
+}

+static size_t kvec_copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
+{
+ char *to = addr;
+ iterate_and_advance_kvec(i, bytes, v,
+ memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base,
+ v.iov_len));
return bytes;
}
#endif
@@ -2020,7 +2028,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.copy_from_iter_nocache = iovec_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
- .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
+ .copy_from_iter_flushcache = iovec_copy_from_iter_flushcache,
#endif
#ifdef CONFIG_ARCH_HAS_COPY_MC
.copy_mc_to_iter = iovec_copy_mc_to_iter,
@@ -2054,7 +2062,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.copy_from_iter_nocache = kvec_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
- .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
+ .copy_from_iter_flushcache = kvec_copy_from_iter_flushcache,
#endif
#ifdef CONFIG_ARCH_HAS_COPY_MC
.copy_mc_to_iter = kvec_copy_mc_to_iter,
@@ -2088,7 +2096,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.copy_from_iter_nocache = bvec_copy_from_iter_nocache,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
- .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
+ .copy_from_iter_flushcache = bvec_copy_from_iter_flushcache,
#endif
#ifdef CONFIG_ARCH_HAS_COPY_MC
.copy_mc_to_iter = bvec_copy_mc_to_iter,
@@ -2122,7 +2130,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.copy_from_iter_nocache = no_copy_from_iter,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
- .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
+ .copy_from_iter_flushcache = no_copy_from_iter,
#endif
#ifdef CONFIG_ARCH_HAS_COPY_MC
.copy_mc_to_iter = pipe_copy_mc_to_iter,
@@ -2156,7 +2164,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.copy_from_iter_nocache = no_copy_from_iter,
.copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
- .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
+ .copy_from_iter_flushcache = no_copy_from_iter,
#endif
#ifdef CONFIG_ARCH_HAS_COPY_MC
.copy_mc_to_iter = discard_copy_to_iter,


2020-11-21 14:19:43

by David Howells

Subject: [PATCH 28/29] iov_iter: Split iov_iter_for_each_range()

Split iov_iter_for_each_range() by type.
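
The bvec variant has to kmap() each page to hand the callback a kvec,
while the kvec variant passes segments through directly; with no segments
processed the result stays -EINVAL. The callback plumbing, modelled in
user space (stopping on a nonzero callback return is a simplification):

#include <stdio.h>
#include <stddef.h>

struct kvec { void *iov_base; size_t iov_len; };

static int for_each_range(const struct kvec *v, int nsegs,
			  int (*f)(struct kvec *vec, void *context),
			  void *context)
{
	int err = -22;			/* -EINVAL if nothing is visited */

	for (int k = 0; k < nsegs; k++) {
		struct kvec w = v[k];	/* the bvec case kmaps here */
		err = f(&w, context);
		if (err)
			break;
	}
	return err;
}

static int count_bytes(struct kvec *vec, void *context)
{
	*(size_t *)context += vec->iov_len;
	return 0;
}

int main(void)
{
	char a[10], b[22];
	struct kvec v[2] = { { a, sizeof(a) }, { b, sizeof(b) } };
	size_t total = 0;

	for_each_range(v, 2, count_bytes, &total);
	printf("%zu bytes\n", total);	/* prints 32 bytes */
	return 0;
}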

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 41 +++++++++++++++++++++++++++++++----------
1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ca0e94596eda..db798966823e 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -2282,7 +2282,7 @@ int import_single_range(int rw, void __user *buf, size_t len,
}
EXPORT_SYMBOL(import_single_range);

-static int xxx_for_each_range(struct iov_iter *i, size_t bytes,
+static int bvec_for_each_range(struct iov_iter *i, size_t bytes,
int (*f)(struct kvec *vec, void *context),
void *context)
{
@@ -2291,18 +2291,39 @@ static int xxx_for_each_range(struct iov_iter *i, size_t bytes,
if (!bytes)
return 0;

- iterate_all_kinds(i, bytes, v, -EINVAL, ({
+ iterate_over_bvec(i, bytes, v, ({
w.iov_base = kmap(v.bv_page) + v.bv_offset;
w.iov_len = v.bv_len;
err = f(&w, context);
kunmap(v.bv_page);
- err;}), ({
+ err;
+ }));
+ return err;
+}
+
+static int kvec_for_each_range(struct iov_iter *i, size_t bytes,
+ int (*f)(struct kvec *vec, void *context),
+ void *context)
+{
+ struct kvec w;
+ int err = -EINVAL;
+ if (!bytes)
+ return 0;
+
+ iterate_over_kvec(i, bytes, v, ({
w = v;
- err = f(&w, context);})
- )
+ err = f(&w, context);
+ }));
return err;
}

+static int no_for_each_range(struct iov_iter *i, size_t bytes,
+ int (*f)(struct kvec *vec, void *context),
+ void *context)
+{
+ return !bytes ? 0 : -EINVAL;
+}
+
static const struct iov_iter_ops iovec_iter_ops = {
.type = ITER_IOVEC,
.copy_from_user_atomic = iovec_copy_from_user_atomic,
@@ -2334,7 +2355,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.get_pages_alloc = iovec_get_pages_alloc,
.npages = iovec_npages,
.dup_iter = iovec_kvec_dup_iter,
- .for_each_range = xxx_for_each_range,
+ .for_each_range = no_for_each_range,
};

static const struct iov_iter_ops kvec_iter_ops = {
@@ -2368,7 +2389,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.get_pages_alloc = no_get_pages_alloc,
.npages = kvec_npages,
.dup_iter = iovec_kvec_dup_iter,
- .for_each_range = xxx_for_each_range,
+ .for_each_range = kvec_for_each_range,
};

static const struct iov_iter_ops bvec_iter_ops = {
@@ -2402,7 +2423,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.get_pages_alloc = bvec_get_pages_alloc,
.npages = bvec_npages,
.dup_iter = bvec_dup_iter,
- .for_each_range = xxx_for_each_range,
+ .for_each_range = bvec_for_each_range,
};

static const struct iov_iter_ops pipe_iter_ops = {
@@ -2436,7 +2457,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.get_pages_alloc = pipe_get_pages_alloc,
.npages = pipe_npages,
.dup_iter = no_dup_iter,
- .for_each_range = xxx_for_each_range,
+ .for_each_range = no_for_each_range,
};

static const struct iov_iter_ops discard_iter_ops = {
@@ -2470,5 +2491,5 @@ static const struct iov_iter_ops discard_iter_ops = {
.get_pages_alloc = no_get_pages_alloc,
.npages = discard_npages,
.dup_iter = discard_dup_iter,
- .for_each_range = xxx_for_each_range,
+ .for_each_range = no_for_each_range,
};


2020-11-21 14:19:51

by David Howells

[permalink] [raw]
Subject: [PATCH 12/29] iov_iter: Split copy_from_iter_full_nocache()

Split copy_from_iter_full_nocache() by type.
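
The _full variants have all-or-nothing semantics: they return a bool and
only advance the iterator when the entire request could be satisfied. A
userland sketch of that contract (hypothetical names):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct iter { const char *buf; size_t count; };

static bool copy_from_iter_full(void *addr, size_t bytes, struct iter *i)
{
	if (i->count < bytes)
		return false;	/* short source: fail, consume nothing */
	memcpy(addr, i->buf, bytes);
	i->buf += bytes;	/* advance only on complete success */
	i->count -= bytes;
	return true;
}

int main(void)
{
	struct iter i = { "hello world", 11 };
	char out[8];

	printf("%d\n", copy_from_iter_full(out, 8, &i));	/* 1 */
	printf("%d\n", copy_from_iter_full(out, 8, &i));	/* 0 */
	printf("%zu\n", i.count);				/* 3 */
	return 0;
}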

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 44 +++++++++++++++++++++++++++++---------------
1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 6b4739d7dd9a..544e532e3e9f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1041,25 +1041,39 @@ static size_t kvec_copy_from_iter_flushcache(void *addr, size_t bytes, struct io
}
#endif

-static bool xxx_copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
+static bool iovec_copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
- if (unlikely(iov_iter_is_pipe(i))) {
- WARN_ON(1);
- return false;
- }
if (unlikely(i->count < bytes))
return false;
- iterate_all_kinds(i, bytes, v, ({
+ iterate_over_iovec(i, bytes, v, ({
if (__copy_from_user_inatomic_nocache((to += v.iov_len) - v.iov_len,
v.iov_base, v.iov_len))
return false;
- 0;}),
+ 0;}));
+ iov_iter_advance(i, bytes);
+ return true;
+}
+
+static bool bvec_copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
+{
+ char *to = addr;
+ if (unlikely(i->count < bytes))
+ return false;
+ iterate_over_bvec(i, bytes, v,
memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
- v.bv_offset, v.bv_len),
- memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
- )
+ v.bv_offset, v.bv_len));
+ iov_iter_advance(i, bytes);
+ return true;
+}

+static bool kvec_copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
+{
+ char *to = addr;
+ if (unlikely(i->count < bytes))
+ return false;
+ iterate_over_kvec(i, bytes, v,
+ memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len));
iov_iter_advance(i, bytes);
return true;
}
@@ -2026,7 +2040,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.copy_from_iter = iovec_copy_from_iter,
.copy_from_iter_full = iovec_copy_from_iter_full,
.copy_from_iter_nocache = iovec_copy_from_iter_nocache,
- .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
+ .copy_from_iter_full_nocache = iovec_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
.copy_from_iter_flushcache = iovec_copy_from_iter_flushcache,
#endif
@@ -2060,7 +2074,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.copy_from_iter = kvec_copy_from_iter,
.copy_from_iter_full = kvec_copy_from_iter_full,
.copy_from_iter_nocache = kvec_copy_from_iter_nocache,
- .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
+ .copy_from_iter_full_nocache = kvec_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
.copy_from_iter_flushcache = kvec_copy_from_iter_flushcache,
#endif
@@ -2094,7 +2108,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.copy_from_iter = bvec_copy_from_iter,
.copy_from_iter_full = bvec_copy_from_iter_full,
.copy_from_iter_nocache = bvec_copy_from_iter_nocache,
- .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
+ .copy_from_iter_full_nocache = bvec_copy_from_iter_full_nocache,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
.copy_from_iter_flushcache = bvec_copy_from_iter_flushcache,
#endif
@@ -2128,7 +2142,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.copy_from_iter = no_copy_from_iter,
.copy_from_iter_full = no_copy_from_iter_full,
.copy_from_iter_nocache = no_copy_from_iter,
- .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
+ .copy_from_iter_full_nocache = no_copy_from_iter_full,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
.copy_from_iter_flushcache = no_copy_from_iter,
#endif
@@ -2162,7 +2176,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.copy_from_iter = no_copy_from_iter,
.copy_from_iter_full = no_copy_from_iter_full,
.copy_from_iter_nocache = no_copy_from_iter,
- .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
+ .copy_from_iter_full_nocache = no_copy_from_iter_full,
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
.copy_from_iter_flushcache = no_copy_from_iter,
#endif


2020-11-21 14:20:23

by David Howells

[permalink] [raw]
Subject: [PATCH 15/29] iov_iter: Split copy_from_user_atomic()

Split copy_from_user_atomic() by type.
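
The "(p += v.iov_len) - v.iov_len" expression that recurs in these
helpers advances the destination cursor and yields its pre-increment
value in a single expression. De-sugared, it is just a save-then-bump,
as this standalone snippet shows:

#include <stdio.h>
#include <string.h>

int main(void)
{
	char dst[8], *p = dst;
	const char *src = "abcdef";
	size_t seg[2] = { 2, 4 };
	size_t off = 0;

	for (int s = 0; s < 2; s++) {
		/* same as: char *here = p; p += seg[s]; memcpy(here, ...); */
		memcpy((p += seg[s]) - seg[s], src + off, seg[s]);
		off += seg[s];
	}
	printf("%.6s\n", dst);	/* prints abcdef */
	return 0;
}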

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 53 ++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 40 insertions(+), 13 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 9a167f53ecff..a626d41fef72 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1195,7 +1195,7 @@ static size_t discard_zero(size_t bytes, struct iov_iter *i)
return bytes;
}

-static size_t xxx_copy_from_user_atomic(struct page *page,
+static size_t iovec_copy_from_user_atomic(struct page *page,
struct iov_iter *i, unsigned long offset, size_t bytes)
{
char *kaddr = kmap_atomic(page), *p = kaddr + offset;
@@ -1203,21 +1203,48 @@ static size_t xxx_copy_from_user_atomic(struct page *page,
kunmap_atomic(kaddr);
return 0;
}
- if (unlikely(iov_iter_is_pipe(i) || iov_iter_is_discard(i))) {
+ iterate_over_iovec(i, bytes, v,
+ copyin((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len));
+ kunmap_atomic(kaddr);
+ return bytes;
+}
+
+static size_t bvec_copy_from_user_atomic(struct page *page,
+ struct iov_iter *i, unsigned long offset, size_t bytes)
+{
+ char *kaddr = kmap_atomic(page), *p = kaddr + offset;
+ if (unlikely(!page_copy_sane(page, offset, bytes))) {
kunmap_atomic(kaddr);
- WARN_ON(1);
return 0;
}
- iterate_all_kinds(i, bytes, v,
- copyin((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
+ iterate_over_bvec(i, bytes, v,
memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page,
- v.bv_offset, v.bv_len),
- memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
- )
+ v.bv_offset, v.bv_len));
kunmap_atomic(kaddr);
return bytes;
}

+static size_t kvec_copy_from_user_atomic(struct page *page,
+ struct iov_iter *i, unsigned long offset, size_t bytes)
+{
+ char *kaddr = kmap_atomic(page), *p = kaddr + offset;
+ if (unlikely(!page_copy_sane(page, offset, bytes))) {
+ kunmap_atomic(kaddr);
+ return 0;
+ }
+ iterate_over_kvec(i, bytes, v,
+ memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len));
+ kunmap_atomic(kaddr);
+ return bytes;
+}
+
+static size_t no_copy_from_user_atomic(struct page *page,
+ struct iov_iter *i, unsigned long offset, size_t bytes)
+{
+ WARN_ON(1);
+ return 0;
+}
+
static inline void pipe_truncate(struct iov_iter *i)
{
struct pipe_inode_info *pipe = i->pipe;
@@ -2046,7 +2073,7 @@ static int xxx_for_each_range(struct iov_iter *i, size_t bytes,

static const struct iov_iter_ops iovec_iter_ops = {
.type = ITER_IOVEC,
- .copy_from_user_atomic = xxx_copy_from_user_atomic,
+ .copy_from_user_atomic = iovec_copy_from_user_atomic,
.advance = xxx_advance,
.revert = xxx_revert,
.fault_in_readable = iovec_fault_in_readable,
@@ -2080,7 +2107,7 @@ static const struct iov_iter_ops iovec_iter_ops = {

static const struct iov_iter_ops kvec_iter_ops = {
.type = ITER_KVEC,
- .copy_from_user_atomic = xxx_copy_from_user_atomic,
+ .copy_from_user_atomic = kvec_copy_from_user_atomic,
.advance = xxx_advance,
.revert = xxx_revert,
.fault_in_readable = no_fault_in_readable,
@@ -2114,7 +2141,7 @@ static const struct iov_iter_ops kvec_iter_ops = {

static const struct iov_iter_ops bvec_iter_ops = {
.type = ITER_BVEC,
- .copy_from_user_atomic = xxx_copy_from_user_atomic,
+ .copy_from_user_atomic = bvec_copy_from_user_atomic,
.advance = xxx_advance,
.revert = xxx_revert,
.fault_in_readable = no_fault_in_readable,
@@ -2148,7 +2175,7 @@ static const struct iov_iter_ops bvec_iter_ops = {

static const struct iov_iter_ops pipe_iter_ops = {
.type = ITER_PIPE,
- .copy_from_user_atomic = xxx_copy_from_user_atomic,
+ .copy_from_user_atomic = no_copy_from_user_atomic,
.advance = xxx_advance,
.revert = xxx_revert,
.fault_in_readable = no_fault_in_readable,
@@ -2182,7 +2209,7 @@ static const struct iov_iter_ops pipe_iter_ops = {

static const struct iov_iter_ops discard_iter_ops = {
.type = ITER_DISCARD,
- .copy_from_user_atomic = xxx_copy_from_user_atomic,
+ .copy_from_user_atomic = no_copy_from_user_atomic,
.advance = xxx_advance,
.revert = xxx_revert,
.fault_in_readable = no_fault_in_readable,


2020-11-21 14:20:32

by David Howells

[permalink] [raw]
Subject: [PATCH 16/29] iov_iter: Split iov_iter_advance()

Split iov_iter_advance() by type.
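
For the segment-backed types, advance consumes bytes by moving the
segment cursor and intra-segment offset; for ITER_DISCARD there is no
backing memory, so it reduces to decrementing i->count. A userland
model of the segment walk (hypothetical names; a sketch only):

#include <stdio.h>
#include <stddef.h>

struct seg { size_t len; };
struct iter { struct seg *segs; int nr_segs; size_t off, count; };

static void advance(struct iter *i, size_t size)
{
	i->count -= size;
	while (size) {
		size_t room = i->segs->len - i->off;

		if (size < room) {	/* lands inside this segment */
			i->off += size;
			return;
		}
		size -= room;		/* step to the next segment */
		i->segs++;
		i->nr_segs--;
		i->off = 0;
	}
}

int main(void)
{
	struct seg segs[3] = { { 10 }, { 20 }, { 30 } };
	struct iter i = { segs, 3, 0, 60 };

	advance(&i, 25);
	printf("count=%zu off=%zu segs_left=%d\n",
	       i.count, i.off, i.nr_segs);	/* count=35 off=15 segs_left=2 */
	return 0;
}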

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 37 ++++++++++++++++++++++---------------
1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index a626d41fef72..9859b4b8a116 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1299,17 +1299,24 @@ static void pipe_advance(struct iov_iter *i, size_t size)
pipe_truncate(i);
}

-static void xxx_advance(struct iov_iter *i, size_t size)
+static void iovec_advance(struct iov_iter *i, size_t size)
{
- if (unlikely(iov_iter_is_pipe(i))) {
- pipe_advance(i, size);
- return;
- }
- if (unlikely(iov_iter_is_discard(i))) {
- i->count -= size;
- return;
- }
- iterate_and_advance(i, size, v, 0, 0, 0)
+ iterate_and_advance_iovec(i, size, v, 0)
+}
+
+static void bvec_iov_advance(struct iov_iter *i, size_t size)
+{
+ iterate_and_advance_bvec(i, size, v, 0)
+}
+
+static void kvec_advance(struct iov_iter *i, size_t size)
+{
+ iterate_and_advance_kvec(i, size, v, 0)
+}
+
+static void discard_advance(struct iov_iter *i, size_t size)
+{
+ i->count -= size;
}

static void xxx_revert(struct iov_iter *i, size_t unroll)
@@ -2074,7 +2081,7 @@ static int xxx_for_each_range(struct iov_iter *i, size_t bytes,
static const struct iov_iter_ops iovec_iter_ops = {
.type = ITER_IOVEC,
.copy_from_user_atomic = iovec_copy_from_user_atomic,
- .advance = xxx_advance,
+ .advance = iovec_advance,
.revert = xxx_revert,
.fault_in_readable = iovec_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
@@ -2108,7 +2115,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
static const struct iov_iter_ops kvec_iter_ops = {
.type = ITER_KVEC,
.copy_from_user_atomic = kvec_copy_from_user_atomic,
- .advance = xxx_advance,
+ .advance = kvec_advance,
.revert = xxx_revert,
.fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
@@ -2142,7 +2149,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
static const struct iov_iter_ops bvec_iter_ops = {
.type = ITER_BVEC,
.copy_from_user_atomic = bvec_copy_from_user_atomic,
- .advance = xxx_advance,
+ .advance = bvec_iov_advance,
.revert = xxx_revert,
.fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
@@ -2176,7 +2183,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
static const struct iov_iter_ops pipe_iter_ops = {
.type = ITER_PIPE,
.copy_from_user_atomic = no_copy_from_user_atomic,
- .advance = xxx_advance,
+ .advance = pipe_advance,
.revert = xxx_revert,
.fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
@@ -2210,7 +2217,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
static const struct iov_iter_ops discard_iter_ops = {
.type = ITER_DISCARD,
.copy_from_user_atomic = no_copy_from_user_atomic,
- .advance = xxx_advance,
+ .advance = discard_advance,
.revert = xxx_revert,
.fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,


2020-11-21 14:20:36

by David Howells

[permalink] [raw]
Subject: [PATCH 17/29] iov_iter: Split iov_iter_revert()

Split iov_iter_revert() by type.
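
Revert is the mirror image of advance: first unwind within the current
segment, then walk the segment array backwards, restoring nr_segs as it
goes, until the unroll lands inside an earlier segment. A userland model
of that walk (hypothetical names; the pipe case is different and gets its
own pipe_revert()):

#include <stdio.h>
#include <stddef.h>

struct seg { size_t len; };
struct iter { struct seg *segs; int nr_segs; size_t off, count; };

static void revert(struct iter *i, size_t unroll)
{
	if (!unroll)
		return;
	i->count += unroll;
	if (unroll <= i->off) {		/* stays in the current segment */
		i->off -= unroll;
		return;
	}
	unroll -= i->off;
	for (;;) {
		size_t n = (--i->segs)->len;	/* step back one segment */
		i->nr_segs++;
		if (unroll <= n) {
			i->off = n - unroll;
			return;
		}
		unroll -= n;
	}
}

int main(void)
{
	struct seg segs[3] = { { 10 }, { 20 }, { 30 } };
	/* as if 25 bytes were consumed: cursor in segs[1] at offset 15 */
	struct iter i = { segs + 1, 2, 15, 35 };

	revert(&i, 20);		/* ends in segs[0] at offset 5 */
	printf("count=%zu off=%zu segs_left=%d\n",
	       i.count, i.off, i.nr_segs);	/* count=55 off=5 segs_left=3 */
	return 0;
}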

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 132 ++++++++++++++++++++++++++++++++++----------------------
1 file changed, 79 insertions(+), 53 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 9859b4b8a116..b8e3da20547e 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1319,71 +1319,97 @@ static void discard_advance(struct iov_iter *i, size_t size)
i->count -= size;
}

-static void xxx_revert(struct iov_iter *i, size_t unroll)
+static void iovec_kvec_revert(struct iov_iter *i, size_t unroll)
{
+ const struct iovec *iov = i->iov;
if (!unroll)
return;
if (WARN_ON(unroll > MAX_RW_COUNT))
return;
i->count += unroll;
- if (unlikely(iov_iter_is_pipe(i))) {
- struct pipe_inode_info *pipe = i->pipe;
- unsigned int p_mask = pipe->ring_size - 1;
- unsigned int i_head = i->head;
- size_t off = i->iov_offset;
- while (1) {
- struct pipe_buffer *b = &pipe->bufs[i_head & p_mask];
- size_t n = off - b->offset;
- if (unroll < n) {
- off -= unroll;
- break;
- }
- unroll -= n;
- if (!unroll && i_head == i->start_head) {
- off = 0;
- break;
- }
- i_head--;
- b = &pipe->bufs[i_head & p_mask];
- off = b->offset + b->len;
- }
- i->iov_offset = off;
- i->head = i_head;
- pipe_truncate(i);
+ if (unroll <= i->iov_offset) {
+ i->iov_offset -= unroll;
return;
}
- if (unlikely(iov_iter_is_discard(i)))
+ unroll -= i->iov_offset;
+ while (1) {
+ size_t n = (--iov)->iov_len;
+ i->nr_segs++;
+ if (unroll <= n) {
+ i->iov = iov;
+ i->iov_offset = n - unroll;
+ return;
+ }
+ unroll -= n;
+ }
+}
+
+static void bvec_revert(struct iov_iter *i, size_t unroll)
+{
+ const struct bio_vec *bvec = i->bvec;
+
+ if (!unroll)
return;
+ if (WARN_ON(unroll > MAX_RW_COUNT))
+ return;
+ i->count += unroll;
if (unroll <= i->iov_offset) {
i->iov_offset -= unroll;
return;
}
unroll -= i->iov_offset;
- if (iov_iter_is_bvec(i)) {
- const struct bio_vec *bvec = i->bvec;
- while (1) {
- size_t n = (--bvec)->bv_len;
- i->nr_segs++;
- if (unroll <= n) {
- i->bvec = bvec;
- i->iov_offset = n - unroll;
- return;
- }
- unroll -= n;
+ while (1) {
+ size_t n = (--bvec)->bv_len;
+ i->nr_segs++;
+ if (unroll <= n) {
+ i->bvec = bvec;
+ i->iov_offset = n - unroll;
+ return;
}
- } else { /* same logics for iovec and kvec */
- const struct iovec *iov = i->iov;
- while (1) {
- size_t n = (--iov)->iov_len;
- i->nr_segs++;
- if (unroll <= n) {
- i->iov = iov;
- i->iov_offset = n - unroll;
- return;
- }
- unroll -= n;
+ unroll -= n;
+ }
+}
+
+static void pipe_revert(struct iov_iter *i, size_t unroll)
+{
+ struct pipe_inode_info *pipe = i->pipe;
+ unsigned int p_mask = pipe->ring_size - 1;
+ unsigned int i_head = i->head;
+ size_t off = i->iov_offset;
+
+ if (!unroll)
+ return;
+ if (WARN_ON(unroll > MAX_RW_COUNT))
+ return;
+
+ while (1) {
+ struct pipe_buffer *b = &pipe->bufs[i_head & p_mask];
+ size_t n = off - b->offset;
+ if (unroll < n) {
+ off -= unroll;
+ break;
+ }
+ unroll -= n;
+ if (!unroll && i_head == i->start_head) {
+ off = 0;
+ break;
}
+ i_head--;
+ b = &pipe->bufs[i_head & p_mask];
+ off = b->offset + b->len;
}
+ i->iov_offset = off;
+ i->head = i_head;
+ pipe_truncate(i);
+}
+
+static void discard_revert(struct iov_iter *i, size_t unroll)
+{
+ if (!unroll)
+ return;
+ if (WARN_ON(unroll > MAX_RW_COUNT))
+ return;
+ i->count += unroll;
}

/*
@@ -2082,7 +2108,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.type = ITER_IOVEC,
.copy_from_user_atomic = iovec_copy_from_user_atomic,
.advance = iovec_advance,
- .revert = xxx_revert,
+ .revert = iovec_kvec_revert,
.fault_in_readable = iovec_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = iovec_copy_page_to_iter,
@@ -2116,7 +2142,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.type = ITER_KVEC,
.copy_from_user_atomic = kvec_copy_from_user_atomic,
.advance = kvec_advance,
- .revert = xxx_revert,
+ .revert = iovec_kvec_revert,
.fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = bkvec_copy_page_to_iter,
@@ -2150,7 +2176,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.type = ITER_BVEC,
.copy_from_user_atomic = bvec_copy_from_user_atomic,
.advance = bvec_iov_advance,
- .revert = xxx_revert,
+ .revert = bvec_revert,
.fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = bkvec_copy_page_to_iter,
@@ -2184,7 +2210,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.type = ITER_PIPE,
.copy_from_user_atomic = no_copy_from_user_atomic,
.advance = pipe_advance,
- .revert = xxx_revert,
+ .revert = pipe_revert,
.fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = pipe_copy_page_to_iter,
@@ -2218,7 +2244,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.type = ITER_DISCARD,
.copy_from_user_atomic = no_copy_from_user_atomic,
.advance = discard_advance,
- .revert = xxx_revert,
+ .revert = discard_revert,
.fault_in_readable = no_fault_in_readable,
.single_seg_count = xxx_single_seg_count,
.copy_page_to_iter = discard_copy_page_to_iter,


2020-11-21 14:20:59

by David Howells

[permalink] [raw]
Subject: [PATCH 22/29] iov_iter: Split iov_iter_get_pages_alloc()

Split iov_iter_get_pages_alloc() by type.
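
One behavioural detail worth noting: the old generic entry point clamped
maxsize to i->count before dispatching, so pipe_get_pages_alloc() now has
to apply that clamp itself once it is called directly through the ops
table. In miniature (hypothetical names):

#include <stdio.h>
#include <stddef.h>

struct iter { size_t count; };

static size_t pipe_get_pages(struct iter *i, size_t maxsize)
{
	if (maxsize > i->count)	/* clamp moved into the variant */
		maxsize = i->count;
	return maxsize;		/* stand-in for the real page walk */
}

int main(void)
{
	struct iter i = { 100 };

	printf("%zu\n", pipe_get_pages(&i, 4096));	/* 100, not 4096 */
	return 0;
}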

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 48 +++++++++++++++++++++++++++++++-----------------
1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index a2de201b947f..a038bfbbbd53 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1690,6 +1690,8 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
unsigned int iter_head, npages;
ssize_t n;

+ if (maxsize > i->count)
+ maxsize = i->count;
if (!maxsize)
return 0;

@@ -1715,7 +1717,7 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
return n;
}

-static ssize_t xxx_get_pages_alloc(struct iov_iter *i,
+static ssize_t iovec_get_pages_alloc(struct iov_iter *i,
struct page ***pages, size_t maxsize,
size_t *start)
{
@@ -1724,12 +1726,7 @@ static ssize_t xxx_get_pages_alloc(struct iov_iter *i,
if (maxsize > i->count)
maxsize = i->count;

- if (unlikely(iov_iter_is_pipe(i)))
- return pipe_get_pages_alloc(i, pages, maxsize, start);
- if (unlikely(iov_iter_is_discard(i)))
- return -EFAULT;
-
- iterate_all_kinds(i, maxsize, v, ({
+ iterate_over_iovec(i, maxsize, v, ({
unsigned long addr = (unsigned long)v.iov_base;
size_t len = v.iov_len + (*start = addr & (PAGE_SIZE - 1));
int n;
@@ -1748,7 +1745,20 @@ static ssize_t xxx_get_pages_alloc(struct iov_iter *i,
}
*pages = p;
return (res == n ? len : res * PAGE_SIZE) - *start;
- 0;}),({
+ 0;}));
+ return 0;
+}
+
+static ssize_t bvec_get_pages_alloc(struct iov_iter *i,
+ struct page ***pages, size_t maxsize,
+ size_t *start)
+{
+ struct page **p;
+
+ if (maxsize > i->count)
+ maxsize = i->count;
+
+ iterate_over_bvec(i, maxsize, v, ({
/* can't be more than PAGE_SIZE */
*start = v.bv_offset;
*pages = p = get_pages_array(1);
@@ -1756,13 +1766,17 @@ static ssize_t xxx_get_pages_alloc(struct iov_iter *i,
return -ENOMEM;
get_page(*p = v.bv_page);
return v.bv_len;
- }),({
- return -EFAULT;
- })
- )
+ }));
return 0;
}

+static ssize_t no_get_pages_alloc(struct iov_iter *i,
+ struct page ***pages, size_t maxsize,
+ size_t *start)
+{
+ return -EFAULT;
+}
+
static size_t xxx_csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
struct iov_iter *i)
{
@@ -2192,7 +2206,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.alignment = iovec_alignment,
.gap_alignment = iovec_gap_alignment,
.get_pages = iovec_get_pages,
- .get_pages_alloc = xxx_get_pages_alloc,
+ .get_pages_alloc = iovec_get_pages_alloc,
.npages = xxx_npages,
.dup_iter = xxx_dup_iter,
.for_each_range = xxx_for_each_range,
@@ -2226,7 +2240,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.alignment = kvec_alignment,
.gap_alignment = kvec_gap_alignment,
.get_pages = no_get_pages,
- .get_pages_alloc = xxx_get_pages_alloc,
+ .get_pages_alloc = no_get_pages_alloc,
.npages = xxx_npages,
.dup_iter = xxx_dup_iter,
.for_each_range = xxx_for_each_range,
@@ -2260,7 +2274,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.alignment = bvec_alignment,
.gap_alignment = bvec_gap_alignment,
.get_pages = bvec_get_pages,
- .get_pages_alloc = xxx_get_pages_alloc,
+ .get_pages_alloc = bvec_get_pages_alloc,
.npages = xxx_npages,
.dup_iter = xxx_dup_iter,
.for_each_range = xxx_for_each_range,
@@ -2294,7 +2308,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.alignment = pipe_alignment,
.gap_alignment = no_gap_alignment,
.get_pages = pipe_get_pages,
- .get_pages_alloc = xxx_get_pages_alloc,
+ .get_pages_alloc = pipe_get_pages_alloc,
.npages = xxx_npages,
.dup_iter = xxx_dup_iter,
.for_each_range = xxx_for_each_range,
@@ -2328,7 +2342,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.alignment = no_alignment,
.gap_alignment = no_gap_alignment,
.get_pages = no_get_pages,
- .get_pages_alloc = xxx_get_pages_alloc,
+ .get_pages_alloc = no_get_pages_alloc,
.npages = xxx_npages,
.dup_iter = xxx_dup_iter,
.for_each_range = xxx_for_each_range,


2020-11-21 14:21:09

by David Howells

[permalink] [raw]
Subject: [PATCH 23/29] iov_iter: Split csum_and_copy_from_iter()

Split csum_and_copy_from_iter() by type.
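
The per-segment copies fold their checksums together with
csum_block_add(), which has to account for where each block landed: a
16-bit ones'-complement sum is insensitive to position except for byte
parity, so a block starting at an odd offset is byte-swapped before
being folded in. A simplified userland model of that property (not the
kernel csum code; carry folding is omitted since the demo values stay
small):

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

static uint32_t csum_partial(const uint8_t *p, size_t len, uint32_t sum)
{
	for (size_t n = 0; n < len; n++)	/* even bytes are high-order */
		sum += (n & 1) ? p[n] : (uint32_t)p[n] << 8;
	return sum;
}

static uint32_t csum_block_add(uint32_t sum, uint32_t block, size_t off)
{
	if (off & 1)	/* odd offset: swap bytes within each 16-bit lane */
		block = ((block & 0x00ff00ff) << 8) | ((block >> 8) & 0x00ff00ff);
	return sum + block;
}

int main(void)
{
	const uint8_t data[5] = { 1, 2, 3, 4, 5 };
	uint32_t whole = csum_partial(data, 5, 0);
	uint32_t a = csum_partial(data, 3, 0);
	uint32_t b = csum_partial(data + 3, 2, 0);

	/* folding block b at offset 3 matches the whole-buffer sum */
	printf("%u %u\n", whole, csum_block_add(a, b, 3));	/* 2310 2310 */
	return 0;
}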

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 56 +++++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 41 insertions(+), 15 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index a038bfbbbd53..1f596cffddf9 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1777,18 +1777,14 @@ static ssize_t no_get_pages_alloc(struct iov_iter *i,
return -EFAULT;
}

-static size_t xxx_csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
+static size_t iovec_csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
struct iov_iter *i)
{
char *to = addr;
__wsum sum, next;
size_t off = 0;
sum = *csum;
- if (unlikely(iov_iter_is_pipe(i) || iov_iter_is_discard(i))) {
- WARN_ON(1);
- return 0;
- }
- iterate_and_advance(i, bytes, v, ({
+ iterate_and_advance_iovec(i, bytes, v, ({
next = csum_and_copy_from_user(v.iov_base,
(to += v.iov_len) - v.iov_len,
v.iov_len);
@@ -1797,24 +1793,54 @@ static size_t xxx_csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum
off += v.iov_len;
}
next ? 0 : v.iov_len;
- }), ({
+ }));
+ *csum = sum;
+ return bytes;
+}
+
+static size_t bvec_csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
+ struct iov_iter *i)
+{
+ char *to = addr;
+ __wsum sum;
+ size_t off = 0;
+ sum = *csum;
+ iterate_and_advance_bvec(i, bytes, v, ({
char *p = kmap_atomic(v.bv_page);
sum = csum_and_memcpy((to += v.bv_len) - v.bv_len,
p + v.bv_offset, v.bv_len,
sum, off);
kunmap_atomic(p);
off += v.bv_len;
- }),({
+ }));
+ *csum = sum;
+ return bytes;
+}
+
+static size_t kvec_csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
+ struct iov_iter *i)
+{
+ char *to = addr;
+ __wsum sum;
+ size_t off = 0;
+ sum = *csum;
+ iterate_and_advance_kvec(i, bytes, v, ({
sum = csum_and_memcpy((to += v.iov_len) - v.iov_len,
v.iov_base, v.iov_len,
sum, off);
off += v.iov_len;
- })
- )
+ }));
*csum = sum;
return bytes;
}

+static size_t no_csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
+ struct iov_iter *i)
+{
+ WARN_ON(1);
+ return 0;
+}
+
static bool xxx_csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
struct iov_iter *i)
{
@@ -2199,7 +2225,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
.copy_mc_to_iter = iovec_copy_mc_to_iter,
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
- .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
+ .csum_and_copy_from_iter = iovec_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

.zero = iovec_zero,
@@ -2233,7 +2259,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
.copy_mc_to_iter = kvec_copy_mc_to_iter,
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
- .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
+ .csum_and_copy_from_iter = kvec_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

.zero = kvec_zero,
@@ -2267,7 +2293,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
.copy_mc_to_iter = bvec_copy_mc_to_iter,
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
- .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
+ .csum_and_copy_from_iter = bvec_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

.zero = bvec_zero,
@@ -2301,7 +2327,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
.copy_mc_to_iter = pipe_copy_mc_to_iter,
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
- .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
+ .csum_and_copy_from_iter = no_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

.zero = pipe_zero,
@@ -2335,7 +2361,7 @@ static const struct iov_iter_ops discard_iter_ops = {
.copy_mc_to_iter = discard_copy_to_iter,
#endif
.csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
- .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
+ .csum_and_copy_from_iter = no_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,

.zero = discard_zero,


2020-11-21 14:21:15

by David Howells

[permalink] [raw]
Subject: [PATCH 25/29] iov_iter: Split csum_and_copy_to_iter()

Split csum_and_copy_to_iter() by type.
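
This also repurposes csum_and_copy_to_pipe_iter() as the pipe method,
changing its csum argument from __wsum * to void * so that every
implementation matches the one prototype in the ops table, with the
concrete type recovered by a cast inside. The shaping in miniature
(hypothetical names):

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

struct iter;
typedef size_t (*csum_copy_fn)(const void *addr, size_t bytes,
			       void *csump, struct iter *i);

static size_t pipe_csum_and_copy(const void *addr, size_t bytes,
				 void *csump, struct iter *i)
{
	uint32_t *csum = csump;	/* recover the concrete pointer type */

	(void)addr; (void)i;
	*csum += (uint32_t)bytes;	/* stand-in for the real csum+copy */
	return bytes;
}

int main(void)
{
	csum_copy_fn op = pipe_csum_and_copy;	/* fits the table slot */
	uint32_t csum = 0;

	op("data", 4, &csum, NULL);
	printf("%u\n", csum);	/* 4 */
	return 0;
}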

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 68 ++++++++++++++++++++++++++++++++++++++++----------------
1 file changed, 48 insertions(+), 20 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 8820a9e72815..2f8019e3b09a 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -698,14 +698,15 @@ static __wsum csum_and_memcpy(void *to, const void *from, size_t len,
return csum_block_add(sum, next, off);
}

-static size_t csum_and_copy_to_pipe_iter(const void *addr, size_t bytes,
- __wsum *csum, struct iov_iter *i)
+static size_t pipe_csum_and_copy_to_iter(const void *addr, size_t bytes,
+ void *csump, struct iov_iter *i)
{
struct pipe_inode_info *pipe = i->pipe;
unsigned int p_mask = pipe->ring_size - 1;
unsigned int i_head;
size_t n, r;
size_t off = 0;
+ __wsum *csum = csump;
__wsum sum = *csum;

if (!sanity(i))
@@ -1914,7 +1915,7 @@ static bool no_csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *cs
return false;
}

-static size_t xxx_csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
+static size_t iovec_csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
struct iov_iter *i)
{
const char *from = addr;
@@ -1922,15 +1923,8 @@ static size_t xxx_csum_and_copy_to_iter(const void *addr, size_t bytes, void *cs
__wsum sum, next;
size_t off = 0;

- if (unlikely(iov_iter_is_pipe(i)))
- return csum_and_copy_to_pipe_iter(addr, bytes, csum, i);
-
sum = *csum;
- if (unlikely(iov_iter_is_discard(i))) {
- WARN_ON(1); /* for now */
- return 0;
- }
- iterate_and_advance(i, bytes, v, ({
+ iterate_and_advance_iovec(i, bytes, v, ({
next = csum_and_copy_to_user((from += v.iov_len) - v.iov_len,
v.iov_base,
v.iov_len);
@@ -1939,24 +1933,58 @@ static size_t xxx_csum_and_copy_to_iter(const void *addr, size_t bytes, void *cs
off += v.iov_len;
}
next ? 0 : v.iov_len;
- }), ({
+ }));
+ *csum = sum;
+ return bytes;
+}
+
+static size_t bvec_csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
+ struct iov_iter *i)
+{
+ const char *from = addr;
+ __wsum *csum = csump;
+ __wsum sum;
+ size_t off = 0;
+
+ sum = *csum;
+ iterate_and_advance_bvec(i, bytes, v, ({
char *p = kmap_atomic(v.bv_page);
sum = csum_and_memcpy(p + v.bv_offset,
(from += v.bv_len) - v.bv_len,
v.bv_len, sum, off);
kunmap_atomic(p);
off += v.bv_len;
- }),({
+ }));
+ *csum = sum;
+ return bytes;
+}
+
+static size_t kvec_csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
+ struct iov_iter *i)
+{
+ const char *from = addr;
+ __wsum *csum = csump;
+ __wsum sum;
+ size_t off = 0;
+
+ sum = *csum;
+ iterate_and_advance_kvec(i, bytes, v, ({
sum = csum_and_memcpy(v.iov_base,
(from += v.iov_len) - v.iov_len,
v.iov_len, sum, off);
off += v.iov_len;
- })
- )
+ }));
*csum = sum;
return bytes;
}

+static size_t discard_csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
+ struct iov_iter *i)
+{
+ WARN_ON(1); /* for now */
+ return 0;
+}
+
size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
struct iov_iter *i)
{
@@ -2256,7 +2284,7 @@ static const struct iov_iter_ops iovec_iter_ops = {
#ifdef CONFIG_ARCH_HAS_COPY_MC
.copy_mc_to_iter = iovec_copy_mc_to_iter,
#endif
- .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
+ .csum_and_copy_to_iter = iovec_csum_and_copy_to_iter,
.csum_and_copy_from_iter = iovec_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = iovec_csum_and_copy_from_iter_full,

@@ -2290,7 +2318,7 @@ static const struct iov_iter_ops kvec_iter_ops = {
#ifdef CONFIG_ARCH_HAS_COPY_MC
.copy_mc_to_iter = kvec_copy_mc_to_iter,
#endif
- .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
+ .csum_and_copy_to_iter = kvec_csum_and_copy_to_iter,
.csum_and_copy_from_iter = kvec_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = kvec_csum_and_copy_from_iter_full,

@@ -2324,7 +2352,7 @@ static const struct iov_iter_ops bvec_iter_ops = {
#ifdef CONFIG_ARCH_HAS_COPY_MC
.copy_mc_to_iter = bvec_copy_mc_to_iter,
#endif
- .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
+ .csum_and_copy_to_iter = bvec_csum_and_copy_to_iter,
.csum_and_copy_from_iter = bvec_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = bvec_csum_and_copy_from_iter_full,

@@ -2358,7 +2386,7 @@ static const struct iov_iter_ops pipe_iter_ops = {
#ifdef CONFIG_ARCH_HAS_COPY_MC
.copy_mc_to_iter = pipe_copy_mc_to_iter,
#endif
- .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
+ .csum_and_copy_to_iter = pipe_csum_and_copy_to_iter,
.csum_and_copy_from_iter = no_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = no_csum_and_copy_from_iter_full,

@@ -2392,7 +2420,7 @@ static const struct iov_iter_ops discard_iter_ops = {
#ifdef CONFIG_ARCH_HAS_COPY_MC
.copy_mc_to_iter = discard_copy_to_iter,
#endif
- .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
+ .csum_and_copy_to_iter = discard_csum_and_copy_to_iter,
.csum_and_copy_from_iter = no_csum_and_copy_from_iter,
.csum_and_copy_from_iter_full = no_csum_and_copy_from_iter_full,



2020-11-21 14:21:47

by David Howells

[permalink] [raw]
Subject: [PATCH 29/29] iov_iter: Remove iterate_all_kinds() and iterate_and_advance()

Remove iterate_all_kinds() and iterate_and_advance() as they're no longer
used, now that all of their users have been split out by iterator type.
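
The end state of the series in miniature: the per-type branch chains are
gone and every iterator carries an ops table, so each generic entry point
is a single indirect call. A userland sketch (hypothetical names, not
the kernel structures):

#include <stdio.h>
#include <stddef.h>

struct iter;
struct iter_ops {
	void (*advance)(struct iter *i, size_t n);
};

struct iter {
	const struct iter_ops *ops;
	size_t count;
};

/* stand-ins for the real per-type implementations */
static void iovec_advance(struct iter *i, size_t n)   { i->count -= n; }
static void discard_advance(struct iter *i, size_t n) { i->count -= n; }

static const struct iter_ops iovec_ops   = { .advance = iovec_advance };
static const struct iter_ops discard_ops = { .advance = discard_advance };

/* was: if (is_pipe(i)) ... else if (is_discard(i)) ... else ... */
static inline void iter_advance(struct iter *i, size_t n)
{
	i->ops->advance(i, n);
}

int main(void)
{
	struct iter i = { &iovec_ops, 100 };
	struct iter d = { &discard_ops, 50 };

	iter_advance(&i, 25);
	iter_advance(&d, 10);
	printf("%zu %zu\n", i.count, d.count);	/* 75 40 */
	return 0;
}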

Signed-off-by: David Howells <[email protected]>
---

lib/iov_iter.c | 61 --------------------------------------------------------
1 file changed, 61 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index db798966823e..ba6b60c45103 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -86,26 +86,6 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n);
} \
}

-#define iterate_all_kinds(i, n, v, I, B, K) { \
- if (likely(n)) { \
- size_t skip = i->iov_offset; \
- if (unlikely(iov_iter_type(i) & ITER_BVEC)) { \
- struct bio_vec v; \
- struct bvec_iter __bi; \
- iterate_bvec(i, n, v, __bi, skip, (B)) \
- } else if (unlikely(iov_iter_type(i) & ITER_KVEC)) { \
- const struct kvec *kvec; \
- struct kvec v; \
- iterate_kvec(i, n, v, kvec, skip, (K)) \
- } else if (unlikely(iov_iter_type(i) & ITER_DISCARD)) { \
- } else { \
- const struct iovec *iov; \
- struct iovec v; \
- iterate_iovec(i, n, v, iov, skip, (I)) \
- } \
- } \
-}
-
#define iterate_over_iovec(i, n, v, CMD) { \
if (likely(n)) { \
size_t skip = i->iov_offset; \
@@ -133,47 +113,6 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n);
} \
}

-#define iterate_and_advance(i, n, v, I, B, K) { \
- if (unlikely(i->count < n)) \
- n = i->count; \
- if (i->count) { \
- size_t skip = i->iov_offset; \
- if (unlikely(iov_iter_type(i) & ITER_BVEC)) { \
- const struct bio_vec *bvec = i->bvec; \
- struct bio_vec v; \
- struct bvec_iter __bi; \
- iterate_bvec(i, n, v, __bi, skip, (B)) \
- i->bvec = __bvec_iter_bvec(i->bvec, __bi); \
- i->nr_segs -= i->bvec - bvec; \
- skip = __bi.bi_bvec_done; \
- } else if (unlikely(iov_iter_type(i) & ITER_KVEC)) { \
- const struct kvec *kvec; \
- struct kvec v; \
- iterate_kvec(i, n, v, kvec, skip, (K)) \
- if (skip == kvec->iov_len) { \
- kvec++; \
- skip = 0; \
- } \
- i->nr_segs -= kvec - i->kvec; \
- i->kvec = kvec; \
- } else if (unlikely(iov_iter_type(i) & ITER_DISCARD)) { \
- skip += n; \
- } else { \
- const struct iovec *iov; \
- struct iovec v; \
- iterate_iovec(i, n, v, iov, skip, (I)) \
- if (skip == iov->iov_len) { \
- iov++; \
- skip = 0; \
- } \
- i->nr_segs -= iov - i->iov; \
- i->iov = iov; \
- } \
- i->count -= n; \
- i->iov_offset = skip; \
- } \
-}
-
#define iterate_and_advance_iovec(i, n, v, CMD) { \
if (unlikely(i->count < n)) \
n = i->count; \


2020-11-21 14:36:24

by Pavel Begunkov

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

On 21/11/2020 14:13, David Howells wrote:
> Switch to using a table of operations. In a future patch the individual
> methods will be split up by type. For the moment, however, the ops tables
> just jump directly to the old functions - which are now static. Inline
> wrappers are provided to jump through the hooks.
>
> Signed-off-by: David Howells <[email protected]>
> ---
>
> fs/io_uring.c | 2
> include/linux/uio.h | 241 ++++++++++++++++++++++++++++++++++--------
> lib/iov_iter.c | 293 +++++++++++++++++++++++++++++++++++++++------------
> 3 files changed, 422 insertions(+), 114 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 4ead291b2976..baa78f58ae5c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -3192,7 +3192,7 @@ static void io_req_map_rw(struct io_kiocb *req, const struct iovec *iovec,
> rw->free_iovec = iovec;
> rw->bytes_done = 0;
> /* can only be fixed buffers, no need to do anything */
> - if (iter->type == ITER_BVEC)
> + if (iov_iter_is_bvec(iter))

Could you split this io_uring change and send for 5.10?
Or I can do it for you if you wish.

> return;
> if (!iovec) {
> unsigned iov_off = 0;
> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 72d88566694e..45ee087f8c43 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -32,9 +32,10 @@ struct iov_iter {
> * Bit 1 is the BVEC_FLAG_NO_REF bit, set if type is a bvec and
> * the caller isn't expecting to drop a page reference when done.
> */
> - unsigned int type;
> + unsigned int flags;
> size_t iov_offset;
> size_t count;
> + const struct iov_iter_ops *ops;
> union {
> const struct iovec *iov;
> const struct kvec *kvec;
> @@ -50,9 +51,63 @@ struct iov_iter {
> };
> };
>
> +void iov_iter_init(struct iov_iter *i, unsigned int direction, const struct iovec *iov,
> + unsigned long nr_segs, size_t count);
> +void iov_iter_kvec(struct iov_iter *i, unsigned int direction, const struct kvec *kvec,
> + unsigned long nr_segs, size_t count);
> +void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec,
> + unsigned long nr_segs, size_t count);
> +void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe,
> + size_t count);
> +void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
> +
> +struct iov_iter_ops {
> + enum iter_type type;
> + size_t (*copy_from_user_atomic)(struct page *page, struct iov_iter *i,
> + unsigned long offset, size_t bytes);
> + void (*advance)(struct iov_iter *i, size_t bytes);
> + void (*revert)(struct iov_iter *i, size_t bytes);
> + int (*fault_in_readable)(struct iov_iter *i, size_t bytes);
> + size_t (*single_seg_count)(const struct iov_iter *i);
> + size_t (*copy_page_to_iter)(struct page *page, size_t offset, size_t bytes,
> + struct iov_iter *i);
> + size_t (*copy_page_from_iter)(struct page *page, size_t offset, size_t bytes,
> + struct iov_iter *i);
> + size_t (*copy_to_iter)(const void *addr, size_t bytes, struct iov_iter *i);
> + size_t (*copy_from_iter)(void *addr, size_t bytes, struct iov_iter *i);
> + bool (*copy_from_iter_full)(void *addr, size_t bytes, struct iov_iter *i);
> + size_t (*copy_from_iter_nocache)(void *addr, size_t bytes, struct iov_iter *i);
> + bool (*copy_from_iter_full_nocache)(void *addr, size_t bytes, struct iov_iter *i);
> +#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
> + size_t (*copy_from_iter_flushcache)(void *addr, size_t bytes, struct iov_iter *i);
> +#endif
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> + size_t (*copy_mc_to_iter)(const void *addr, size_t bytes, struct iov_iter *i);
> +#endif
> + size_t (*csum_and_copy_to_iter)(const void *addr, size_t bytes, void *csump,
> + struct iov_iter *i);
> + size_t (*csum_and_copy_from_iter)(void *addr, size_t bytes, __wsum *csum,
> + struct iov_iter *i);
> + bool (*csum_and_copy_from_iter_full)(void *addr, size_t bytes, __wsum *csum,
> + struct iov_iter *i);
> +
> + size_t (*zero)(size_t bytes, struct iov_iter *i);
> + unsigned long (*alignment)(const struct iov_iter *i);
> + unsigned long (*gap_alignment)(const struct iov_iter *i);
> + ssize_t (*get_pages)(struct iov_iter *i, struct page **pages,
> + size_t maxsize, unsigned maxpages, size_t *start);
> + ssize_t (*get_pages_alloc)(struct iov_iter *i, struct page ***pages,
> + size_t maxsize, size_t *start);
> + int (*npages)(const struct iov_iter *i, int maxpages);
> + const void *(*dup_iter)(struct iov_iter *new, struct iov_iter *old, gfp_t flags);
> + int (*for_each_range)(struct iov_iter *i, size_t bytes,
> + int (*f)(struct kvec *vec, void *context),
> + void *context);
> +};
> +
> static inline enum iter_type iov_iter_type(const struct iov_iter *i)
> {
> - return i->type & ~(READ | WRITE);
> + return i->ops->type;
> }
>
> static inline bool iter_is_iovec(const struct iov_iter *i)
> @@ -82,7 +137,7 @@ static inline bool iov_iter_is_discard(const struct iov_iter *i)
>
> static inline unsigned char iov_iter_rw(const struct iov_iter *i)
> {
> - return i->type & (READ | WRITE);
> + return i->flags & (READ | WRITE);
> }
>
> /*
> @@ -111,22 +166,71 @@ static inline struct iovec iov_iter_iovec(const struct iov_iter *iter)
> };
> }
>
> -size_t iov_iter_copy_from_user_atomic(struct page *page,
> - struct iov_iter *i, unsigned long offset, size_t bytes);
> -void iov_iter_advance(struct iov_iter *i, size_t bytes);
> -void iov_iter_revert(struct iov_iter *i, size_t bytes);
> -int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes);
> -size_t iov_iter_single_seg_count(const struct iov_iter *i);
> +static inline
> +size_t iov_iter_copy_from_user_atomic(struct page *page, struct iov_iter *i,
> + unsigned long offset, size_t bytes)
> +{
> + return i->ops->copy_from_user_atomic(page, i, offset, bytes);
> +}
> +static inline
> +void iov_iter_advance(struct iov_iter *i, size_t bytes)
> +{
> + return i->ops->advance(i, bytes);
> +}
> +static inline
> +void iov_iter_revert(struct iov_iter *i, size_t bytes)
> +{
> + return i->ops->revert(i, bytes);
> +}
> +static inline
> +int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes)
> +{
> + return i->ops->fault_in_readable(i, bytes);
> +}
> +static inline
> +size_t iov_iter_single_seg_count(const struct iov_iter *i)
> +{
> + return i->ops->single_seg_count(i);
> +}
> +
> +static inline
> size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
> - struct iov_iter *i);
> + struct iov_iter *i)
> +{
> + return i->ops->copy_page_to_iter(page, offset, bytes, i);
> +}
> +static inline
> size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
> - struct iov_iter *i);
> + struct iov_iter *i)
> +{
> + return i->ops->copy_page_from_iter(page, offset, bytes, i);
> +}
>
> -size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
> -size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
> -bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i);
> -size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
> -bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i);
> +static __always_inline __must_check
> +size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
> +{
> + return i->ops->copy_to_iter(addr, bytes, i);
> +}
> +static __always_inline __must_check
> +size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
> +{
> + return i->ops->copy_from_iter(addr, bytes, i);
> +}
> +static __always_inline __must_check
> +bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
> +{
> + return i->ops->copy_from_iter_full(addr, bytes, i);
> +}
> +static __always_inline __must_check
> +size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
> +{
> + return i->ops->copy_from_iter_nocache(addr, bytes, i);
> +}
> +static __always_inline __must_check
> +bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
> +{
> + return i->ops->copy_from_iter_full_nocache(addr, bytes, i);
> +}
>
> static __always_inline __must_check
> size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
> @@ -173,23 +277,21 @@ bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
> return _copy_from_iter_full_nocache(addr, bytes, i);
> }
>
> -#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
> /*
> * Note, users like pmem that depend on the stricter semantics of
> * copy_from_iter_flushcache() than copy_from_iter_nocache() must check for
> * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the
> * destination is flushed from the cache on return.
> */
> -size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
> -#else
> -#define _copy_from_iter_flushcache _copy_from_iter_nocache
> -#endif
> -
> -#ifdef CONFIG_ARCH_HAS_COPY_MC
> -size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
> +static __always_inline __must_check
> +size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
> +{
> +#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
> + return i->ops->copy_from_iter_flushcache(addr, bytes, i);
> #else
> -#define _copy_mc_to_iter _copy_to_iter
> + return i->ops->copy_from_iter_nocache(addr, bytes, i);
> #endif
> +}
>
> static __always_inline __must_check
> size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
> @@ -200,6 +302,16 @@ size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
> return _copy_from_iter_flushcache(addr, bytes, i);
> }
>
> +static __always_inline __must_check
> +size_t _copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
> +{
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> + return i->ops->copy_mc_to_iter(addr, bytes, i);
> +#else
> + return i->ops->copy_to_iter(addr, bytes, i);
> +#endif
> +}
> +
> static __always_inline __must_check
> size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
> {
> @@ -209,25 +321,47 @@ size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
> return _copy_mc_to_iter(addr, bytes, i);
> }
>
> -size_t iov_iter_zero(size_t bytes, struct iov_iter *);
> -unsigned long iov_iter_alignment(const struct iov_iter *i);
> -unsigned long iov_iter_gap_alignment(const struct iov_iter *i);
> -void iov_iter_init(struct iov_iter *i, unsigned int direction, const struct iovec *iov,
> - unsigned long nr_segs, size_t count);
> -void iov_iter_kvec(struct iov_iter *i, unsigned int direction, const struct kvec *kvec,
> - unsigned long nr_segs, size_t count);
> -void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec,
> - unsigned long nr_segs, size_t count);
> -void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe,
> - size_t count);
> -void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
> +static inline
> +size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
> +{
> + return i->ops->zero(bytes, i);
> +}
> +static inline
> +unsigned long iov_iter_alignment(const struct iov_iter *i)
> +{
> + return i->ops->alignment(i);
> +}
> +static inline
> +unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
> +{
> + return i->ops->gap_alignment(i);
> +}
> +
> +static inline
> ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
> - size_t maxsize, unsigned maxpages, size_t *start);
> + size_t maxsize, unsigned maxpages, size_t *start)
> +{
> + return i->ops->get_pages(i, pages, maxsize, maxpages, start);
> +}
> +
> +static inline
> ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
> - size_t maxsize, size_t *start);
> -int iov_iter_npages(const struct iov_iter *i, int maxpages);
> + size_t maxsize, size_t *start)
> +{
> + return i->ops->get_pages_alloc(i, pages, maxsize, start);
> +}
> +
> +static inline
> +int iov_iter_npages(const struct iov_iter *i, int maxpages)
> +{
> + return i->ops->npages(i, maxpages);
> +}
>
> -const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags);
> +static inline
> +const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
> +{
> + return old->ops->dup_iter(new, old, flags);
> +}
>
> static inline size_t iov_iter_count(const struct iov_iter *i)
> {
> @@ -260,9 +394,22 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count)
> {
> i->count = count;
> }
> -size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, struct iov_iter *i);
> -size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
> -bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
> +
> +static inline
> +size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, struct iov_iter *i)
> +{
> + return i->ops->csum_and_copy_to_iter(addr, bytes, csump, i);
> +}
> +static inline
> +size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i)
> +{
> + return i->ops->csum_and_copy_from_iter(addr, bytes, csum, i);
> +}
> +static inline
> +bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i)
> +{
> + return i->ops->csum_and_copy_from_iter_full(addr, bytes, csum, i);
> +}
> size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
> struct iov_iter *i);
>
> @@ -278,8 +425,12 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec,
> int import_single_range(int type, void __user *buf, size_t len,
> struct iovec *iov, struct iov_iter *i);
>
> +static inline
> int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
> int (*f)(struct kvec *vec, void *context),
> - void *context);
> + void *context)
> +{
> + return i->ops->for_each_range(i, bytes, f, context);
> +}
>
> #endif
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 1635111c5bd2..e403d524c797 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -13,6 +13,12 @@
> #include <linux/scatterlist.h>
> #include <linux/instrumented.h>
>
> +static const struct iov_iter_ops iovec_iter_ops;
> +static const struct iov_iter_ops kvec_iter_ops;
> +static const struct iov_iter_ops bvec_iter_ops;
> +static const struct iov_iter_ops pipe_iter_ops;
> +static const struct iov_iter_ops discard_iter_ops;
> +
> #define PIPE_PARANOIA /* for now */
>
> #define iterate_iovec(i, n, __v, __p, skip, STEP) { \
> @@ -81,15 +87,15 @@
> #define iterate_all_kinds(i, n, v, I, B, K) { \
> if (likely(n)) { \
> size_t skip = i->iov_offset; \
> - if (unlikely(i->type & ITER_BVEC)) { \
> + if (unlikely(iov_iter_type(i) & ITER_BVEC)) { \
> struct bio_vec v; \
> struct bvec_iter __bi; \
> iterate_bvec(i, n, v, __bi, skip, (B)) \
> - } else if (unlikely(i->type & ITER_KVEC)) { \
> + } else if (unlikely(iov_iter_type(i) & ITER_KVEC)) { \
> const struct kvec *kvec; \
> struct kvec v; \
> iterate_kvec(i, n, v, kvec, skip, (K)) \
> - } else if (unlikely(i->type & ITER_DISCARD)) { \
> + } else if (unlikely(iov_iter_type(i) & ITER_DISCARD)) { \
> } else { \
> const struct iovec *iov; \
> struct iovec v; \
> @@ -103,7 +109,7 @@
> n = i->count; \
> if (i->count) { \
> size_t skip = i->iov_offset; \
> - if (unlikely(i->type & ITER_BVEC)) { \
> + if (unlikely(iov_iter_type(i) & ITER_BVEC)) { \
> const struct bio_vec *bvec = i->bvec; \
> struct bio_vec v; \
> struct bvec_iter __bi; \
> @@ -111,7 +117,7 @@
> i->bvec = __bvec_iter_bvec(i->bvec, __bi); \
> i->nr_segs -= i->bvec - bvec; \
> skip = __bi.bi_bvec_done; \
> - } else if (unlikely(i->type & ITER_KVEC)) { \
> + } else if (unlikely(iov_iter_type(i) & ITER_KVEC)) { \
> const struct kvec *kvec; \
> struct kvec v; \
> iterate_kvec(i, n, v, kvec, skip, (K)) \
> @@ -121,7 +127,7 @@
> } \
> i->nr_segs -= kvec - i->kvec; \
> i->kvec = kvec; \
> - } else if (unlikely(i->type & ITER_DISCARD)) { \
> + } else if (unlikely(iov_iter_type(i) & ITER_DISCARD)) { \
> skip += n; \
> } else { \
> const struct iovec *iov; \
> @@ -427,14 +433,14 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by
> * Return 0 on success, or non-zero if the memory could not be accessed (i.e.
> * because it is an invalid address).
> */
> -int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes)
> +static int xxx_fault_in_readable(struct iov_iter *i, size_t bytes)
> {
> size_t skip = i->iov_offset;
> const struct iovec *iov;
> int err;
> struct iovec v;
>
> - if (!(i->type & (ITER_BVEC|ITER_KVEC))) {
> + if (!(iov_iter_type(i) & (ITER_BVEC|ITER_KVEC))) {
> iterate_iovec(i, bytes, v, iov, skip, ({
> err = fault_in_pages_readable(v.iov_base, v.iov_len);
> if (unlikely(err))
> @@ -443,7 +449,6 @@ int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes)
> }
> return 0;
> }
> -EXPORT_SYMBOL(iov_iter_fault_in_readable);
>
> void iov_iter_init(struct iov_iter *i, unsigned int direction,
> const struct iovec *iov, unsigned long nr_segs,
> @@ -454,10 +459,12 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
>
> /* It will get better. Eventually... */
> if (uaccess_kernel()) {
> - i->type = ITER_KVEC | direction;
> + i->ops = &kvec_iter_ops;
> + i->flags = direction;
> i->kvec = (struct kvec *)iov;
> } else {
> - i->type = ITER_IOVEC | direction;
> + i->ops = &iovec_iter_ops;
> + i->flags = direction;
> i->iov = iov;
> }
> i->nr_segs = nr_segs;
> @@ -625,7 +632,7 @@ static size_t csum_and_copy_to_pipe_iter(const void *addr, size_t bytes,
> return bytes;
> }
>
> -size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
> +static size_t xxx_copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
> {
> const char *from = addr;
> if (unlikely(iov_iter_is_pipe(i)))
> @@ -641,7 +648,6 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
>
> return bytes;
> }
> -EXPORT_SYMBOL(_copy_to_iter);
>
> #ifdef CONFIG_ARCH_HAS_COPY_MC
> static int copyout_mc(void __user *to, const void *from, size_t n)
> @@ -723,7 +729,7 @@ static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
> * Compare to copy_to_iter() where only ITER_IOVEC attempts might return
> * a short copy.
> */
> -size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
> +static size_t xxx_copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
> {
> const char *from = addr;
> unsigned long rem, curr_addr, s_addr = (unsigned long) addr;
> @@ -757,10 +763,9 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
>
> return bytes;
> }
> -EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
> #endif /* CONFIG_ARCH_HAS_COPY_MC */
>
> -size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
> +static size_t xxx_copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
> {
> char *to = addr;
> if (unlikely(iov_iter_is_pipe(i))) {
> @@ -778,9 +783,8 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
>
> return bytes;
> }
> -EXPORT_SYMBOL(_copy_from_iter);
>
> -bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
> +static bool xxx_copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
> {
> char *to = addr;
> if (unlikely(iov_iter_is_pipe(i))) {
> @@ -805,9 +809,8 @@ bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
> iov_iter_advance(i, bytes);
> return true;
> }
> -EXPORT_SYMBOL(_copy_from_iter_full);
>
> -size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
> +static size_t xxx_copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
> {
> char *to = addr;
> if (unlikely(iov_iter_is_pipe(i))) {
> @@ -824,7 +827,6 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
>
> return bytes;
> }
> -EXPORT_SYMBOL(_copy_from_iter_nocache);
>
> #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
> /**
> @@ -841,7 +843,7 @@ EXPORT_SYMBOL(_copy_from_iter_nocache);
> * bypass the cache for the ITER_IOVEC case, and on some archs may use
> * instructions that strand dirty-data in the cache.
> */
> -size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
> +static size_t xxx_copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
> {
> char *to = addr;
> if (unlikely(iov_iter_is_pipe(i))) {
> @@ -859,10 +861,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
>
> return bytes;
> }
> -EXPORT_SYMBOL_GPL(_copy_from_iter_flushcache);
> #endif
>
> -bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
> +static bool xxx_copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
> {
> char *to = addr;
> if (unlikely(iov_iter_is_pipe(i))) {
> @@ -884,7 +885,6 @@ bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
> iov_iter_advance(i, bytes);
> return true;
> }
> -EXPORT_SYMBOL(_copy_from_iter_full_nocache);
>
> static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
> {
> @@ -910,12 +910,12 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
> return false;
> }
>
> -size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
> +static size_t xxx_copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
> struct iov_iter *i)
> {
> if (unlikely(!page_copy_sane(page, offset, bytes)))
> return 0;
> - if (i->type & (ITER_BVEC|ITER_KVEC)) {
> + if (iov_iter_type(i) & (ITER_BVEC|ITER_KVEC)) {
> void *kaddr = kmap_atomic(page);
> size_t wanted = copy_to_iter(kaddr + offset, bytes, i);
> kunmap_atomic(kaddr);
> @@ -927,9 +927,8 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
> else
> return copy_page_to_iter_pipe(page, offset, bytes, i);
> }
> -EXPORT_SYMBOL(copy_page_to_iter);
>
> -size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
> +static size_t xxx_copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
> struct iov_iter *i)
> {
> if (unlikely(!page_copy_sane(page, offset, bytes)))
> @@ -938,15 +937,14 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
> WARN_ON(1);
> return 0;
> }
> - if (i->type & (ITER_BVEC|ITER_KVEC)) {
> + if (iov_iter_type(i) & (ITER_BVEC|ITER_KVEC)) {
> void *kaddr = kmap_atomic(page);
> - size_t wanted = _copy_from_iter(kaddr + offset, bytes, i);
> + size_t wanted = xxx_copy_from_iter(kaddr + offset, bytes, i);
> kunmap_atomic(kaddr);
> return wanted;
> } else
> return copy_page_from_iter_iovec(page, offset, bytes, i);
> }
> -EXPORT_SYMBOL(copy_page_from_iter);
>
> static size_t pipe_zero(size_t bytes, struct iov_iter *i)
> {
> @@ -975,7 +973,7 @@ static size_t pipe_zero(size_t bytes, struct iov_iter *i)
> return bytes;
> }
>
> -size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
> +static size_t xxx_zero(size_t bytes, struct iov_iter *i)
> {
> if (unlikely(iov_iter_is_pipe(i)))
> return pipe_zero(bytes, i);
> @@ -987,9 +985,8 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
>
> return bytes;
> }
> -EXPORT_SYMBOL(iov_iter_zero);
>
> -size_t iov_iter_copy_from_user_atomic(struct page *page,
> +static size_t xxx_copy_from_user_atomic(struct page *page,
> struct iov_iter *i, unsigned long offset, size_t bytes)
> {
> char *kaddr = kmap_atomic(page), *p = kaddr + offset;
> @@ -1011,7 +1008,6 @@ size_t iov_iter_copy_from_user_atomic(struct page *page,
> kunmap_atomic(kaddr);
> return bytes;
> }
> -EXPORT_SYMBOL(iov_iter_copy_from_user_atomic);
>
> static inline void pipe_truncate(struct iov_iter *i)
> {
> @@ -1067,7 +1063,7 @@ static void pipe_advance(struct iov_iter *i, size_t size)
> pipe_truncate(i);
> }
>
> -void iov_iter_advance(struct iov_iter *i, size_t size)
> +static void xxx_advance(struct iov_iter *i, size_t size)
> {
> if (unlikely(iov_iter_is_pipe(i))) {
> pipe_advance(i, size);
> @@ -1079,9 +1075,8 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
> }
> iterate_and_advance(i, size, v, 0, 0, 0)
> }
> -EXPORT_SYMBOL(iov_iter_advance);
>
> -void iov_iter_revert(struct iov_iter *i, size_t unroll)
> +static void xxx_revert(struct iov_iter *i, size_t unroll)
> {
> if (!unroll)
> return;
> @@ -1147,12 +1142,11 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
> }
> }
> }
> -EXPORT_SYMBOL(iov_iter_revert);
>
> /*
> * Return the count of just the current iov_iter segment.
> */
> -size_t iov_iter_single_seg_count(const struct iov_iter *i)
> +static size_t xxx_single_seg_count(const struct iov_iter *i)
> {
> if (unlikely(iov_iter_is_pipe(i)))
> return i->count; // it is a silly place, anyway
> @@ -1165,14 +1159,14 @@ size_t iov_iter_single_seg_count(const struct iov_iter *i)
> else
> return min(i->count, i->iov->iov_len - i->iov_offset);
> }
> -EXPORT_SYMBOL(iov_iter_single_seg_count);
>
> void iov_iter_kvec(struct iov_iter *i, unsigned int direction,
> - const struct kvec *kvec, unsigned long nr_segs,
> - size_t count)
> + const struct kvec *kvec, unsigned long nr_segs,
> + size_t count)
> {
> WARN_ON(direction & ~(READ | WRITE));
> - i->type = ITER_KVEC | (direction & (READ | WRITE));
> + i->ops = &kvec_iter_ops;
> + i->flags = direction & (READ | WRITE);
> i->kvec = kvec;
> i->nr_segs = nr_segs;
> i->iov_offset = 0;
> @@ -1185,7 +1179,8 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
> size_t count)
> {
> WARN_ON(direction & ~(READ | WRITE));
> - i->type = ITER_BVEC | (direction & (READ | WRITE));
> + i->ops = &bvec_iter_ops;
> + i->flags = direction & (READ | WRITE);
> i->bvec = bvec;
> i->nr_segs = nr_segs;
> i->iov_offset = 0;
> @@ -1199,7 +1194,8 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction,
> {
> BUG_ON(direction != READ);
> WARN_ON(pipe_full(pipe->head, pipe->tail, pipe->ring_size));
> - i->type = ITER_PIPE | READ;
> + i->ops = &pipe_iter_ops;
> + i->flags = READ;
> i->pipe = pipe;
> i->head = pipe->head;
> i->iov_offset = 0;
> @@ -1220,13 +1216,14 @@ EXPORT_SYMBOL(iov_iter_pipe);
> void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
> {
> BUG_ON(direction != READ);
> - i->type = ITER_DISCARD | READ;
> + i->ops = &discard_iter_ops;
> + i->flags = READ;
> i->count = count;
> i->iov_offset = 0;
> }
> EXPORT_SYMBOL(iov_iter_discard);
>
> -unsigned long iov_iter_alignment(const struct iov_iter *i)
> +static unsigned long xxx_alignment(const struct iov_iter *i)
> {
> unsigned long res = 0;
> size_t size = i->count;
> @@ -1245,9 +1242,8 @@ unsigned long iov_iter_alignment(const struct iov_iter *i)
> )
> return res;
> }
> -EXPORT_SYMBOL(iov_iter_alignment);
>
> -unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
> +static unsigned long xxx_gap_alignment(const struct iov_iter *i)
> {
> unsigned long res = 0;
> size_t size = i->count;
> @@ -1267,7 +1263,6 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
> );
> return res;
> }
> -EXPORT_SYMBOL(iov_iter_gap_alignment);
>
> static inline ssize_t __pipe_get_pages(struct iov_iter *i,
> size_t maxsize,
> @@ -1313,7 +1308,7 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
> return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, start);
> }
>
> -ssize_t iov_iter_get_pages(struct iov_iter *i,
> +static ssize_t xxx_get_pages(struct iov_iter *i,
> struct page **pages, size_t maxsize, unsigned maxpages,
> size_t *start)
> {
> @@ -1352,7 +1347,6 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
> )
> return 0;
> }
> -EXPORT_SYMBOL(iov_iter_get_pages);
>
> static struct page **get_pages_array(size_t n)
> {
> @@ -1392,7 +1386,7 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
> return n;
> }
>
> -ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
> +static ssize_t xxx_get_pages_alloc(struct iov_iter *i,
> struct page ***pages, size_t maxsize,
> size_t *start)
> {
> @@ -1439,9 +1433,8 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
> )
> return 0;
> }
> -EXPORT_SYMBOL(iov_iter_get_pages_alloc);
>
> -size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
> +static size_t xxx_csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
> struct iov_iter *i)
> {
> char *to = addr;
> @@ -1478,9 +1471,8 @@ size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
> *csum = sum;
> return bytes;
> }
> -EXPORT_SYMBOL(csum_and_copy_from_iter);
>
> -bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
> +static bool xxx_csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
> struct iov_iter *i)
> {
> char *to = addr;
> @@ -1520,9 +1512,8 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
> iov_iter_advance(i, bytes);
> return true;
> }
> -EXPORT_SYMBOL(csum_and_copy_from_iter_full);
>
> -size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
> +static size_t xxx_csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
> struct iov_iter *i)
> {
> const char *from = addr;
> @@ -1564,7 +1555,6 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump,
> *csum = sum;
> return bytes;
> }
> -EXPORT_SYMBOL(csum_and_copy_to_iter);
>
> size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
> struct iov_iter *i)
> @@ -1585,7 +1575,7 @@ size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
> }
> EXPORT_SYMBOL(hash_and_copy_to_iter);
>
> -int iov_iter_npages(const struct iov_iter *i, int maxpages)
> +static int xxx_npages(const struct iov_iter *i, int maxpages)
> {
> size_t size = i->count;
> int npages = 0;
> @@ -1628,9 +1618,8 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
> )
> return npages;
> }
> -EXPORT_SYMBOL(iov_iter_npages);
>
> -const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
> +static const void *xxx_dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
> {
> *new = *old;
> if (unlikely(iov_iter_is_pipe(new))) {
> @@ -1649,7 +1638,6 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
> new->nr_segs * sizeof(struct iovec),
> flags);
> }
> -EXPORT_SYMBOL(dup_iter);
>
> static int copy_compat_iovec_from_user(struct iovec *iov,
> const struct iovec __user *uvec, unsigned long nr_segs)
> @@ -1826,7 +1814,7 @@ int import_single_range(int rw, void __user *buf, size_t len,
> }
> EXPORT_SYMBOL(import_single_range);
>
> -int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
> +static int xxx_for_each_range(struct iov_iter *i, size_t bytes,
> int (*f)(struct kvec *vec, void *context),
> void *context)
> {
> @@ -1846,4 +1834,173 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
> )
> return err;
> }
> -EXPORT_SYMBOL(iov_iter_for_each_range);
> +
> +static const struct iov_iter_ops iovec_iter_ops = {
> + .type = ITER_IOVEC,
> + .copy_from_user_atomic = xxx_copy_from_user_atomic,
> + .advance = xxx_advance,
> + .revert = xxx_revert,
> + .fault_in_readable = xxx_fault_in_readable,
> + .single_seg_count = xxx_single_seg_count,
> + .copy_page_to_iter = xxx_copy_page_to_iter,
> + .copy_page_from_iter = xxx_copy_page_from_iter,
> + .copy_to_iter = xxx_copy_to_iter,
> + .copy_from_iter = xxx_copy_from_iter,
> + .copy_from_iter_full = xxx_copy_from_iter_full,
> + .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
> + .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
> +#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
> + .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
> +#endif
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> + .copy_mc_to_iter = xxx_copy_mc_to_iter,
> +#endif
> + .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
> + .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
> + .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
> +
> + .zero = xxx_zero,
> + .alignment = xxx_alignment,
> + .gap_alignment = xxx_gap_alignment,
> + .get_pages = xxx_get_pages,
> + .get_pages_alloc = xxx_get_pages_alloc,
> + .npages = xxx_npages,
> + .dup_iter = xxx_dup_iter,
> + .for_each_range = xxx_for_each_range,
> +};
> +
> +static const struct iov_iter_ops kvec_iter_ops = {
> + .type = ITER_KVEC,
> + .copy_from_user_atomic = xxx_copy_from_user_atomic,
> + .advance = xxx_advance,
> + .revert = xxx_revert,
> + .fault_in_readable = xxx_fault_in_readable,
> + .single_seg_count = xxx_single_seg_count,
> + .copy_page_to_iter = xxx_copy_page_to_iter,
> + .copy_page_from_iter = xxx_copy_page_from_iter,
> + .copy_to_iter = xxx_copy_to_iter,
> + .copy_from_iter = xxx_copy_from_iter,
> + .copy_from_iter_full = xxx_copy_from_iter_full,
> + .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
> + .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
> +#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
> + .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
> +#endif
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> + .copy_mc_to_iter = xxx_copy_mc_to_iter,
> +#endif
> + .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
> + .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
> + .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
> +
> + .zero = xxx_zero,
> + .alignment = xxx_alignment,
> + .gap_alignment = xxx_gap_alignment,
> + .get_pages = xxx_get_pages,
> + .get_pages_alloc = xxx_get_pages_alloc,
> + .npages = xxx_npages,
> + .dup_iter = xxx_dup_iter,
> + .for_each_range = xxx_for_each_range,
> +};
> +
> +static const struct iov_iter_ops bvec_iter_ops = {
> + .type = ITER_BVEC,
> + .copy_from_user_atomic = xxx_copy_from_user_atomic,
> + .advance = xxx_advance,
> + .revert = xxx_revert,
> + .fault_in_readable = xxx_fault_in_readable,
> + .single_seg_count = xxx_single_seg_count,
> + .copy_page_to_iter = xxx_copy_page_to_iter,
> + .copy_page_from_iter = xxx_copy_page_from_iter,
> + .copy_to_iter = xxx_copy_to_iter,
> + .copy_from_iter = xxx_copy_from_iter,
> + .copy_from_iter_full = xxx_copy_from_iter_full,
> + .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
> + .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
> +#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
> + .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
> +#endif
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> + .copy_mc_to_iter = xxx_copy_mc_to_iter,
> +#endif
> + .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
> + .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
> + .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
> +
> + .zero = xxx_zero,
> + .alignment = xxx_alignment,
> + .gap_alignment = xxx_gap_alignment,
> + .get_pages = xxx_get_pages,
> + .get_pages_alloc = xxx_get_pages_alloc,
> + .npages = xxx_npages,
> + .dup_iter = xxx_dup_iter,
> + .for_each_range = xxx_for_each_range,
> +};
> +
> +static const struct iov_iter_ops pipe_iter_ops = {
> + .type = ITER_PIPE,
> + .copy_from_user_atomic = xxx_copy_from_user_atomic,
> + .advance = xxx_advance,
> + .revert = xxx_revert,
> + .fault_in_readable = xxx_fault_in_readable,
> + .single_seg_count = xxx_single_seg_count,
> + .copy_page_to_iter = xxx_copy_page_to_iter,
> + .copy_page_from_iter = xxx_copy_page_from_iter,
> + .copy_to_iter = xxx_copy_to_iter,
> + .copy_from_iter = xxx_copy_from_iter,
> + .copy_from_iter_full = xxx_copy_from_iter_full,
> + .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
> + .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
> +#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
> + .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
> +#endif
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> + .copy_mc_to_iter = xxx_copy_mc_to_iter,
> +#endif
> + .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
> + .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
> + .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
> +
> + .zero = xxx_zero,
> + .alignment = xxx_alignment,
> + .gap_alignment = xxx_gap_alignment,
> + .get_pages = xxx_get_pages,
> + .get_pages_alloc = xxx_get_pages_alloc,
> + .npages = xxx_npages,
> + .dup_iter = xxx_dup_iter,
> + .for_each_range = xxx_for_each_range,
> +};
> +
> +static const struct iov_iter_ops discard_iter_ops = {
> + .type = ITER_DISCARD,
> + .copy_from_user_atomic = xxx_copy_from_user_atomic,
> + .advance = xxx_advance,
> + .revert = xxx_revert,
> + .fault_in_readable = xxx_fault_in_readable,
> + .single_seg_count = xxx_single_seg_count,
> + .copy_page_to_iter = xxx_copy_page_to_iter,
> + .copy_page_from_iter = xxx_copy_page_from_iter,
> + .copy_to_iter = xxx_copy_to_iter,
> + .copy_from_iter = xxx_copy_from_iter,
> + .copy_from_iter_full = xxx_copy_from_iter_full,
> + .copy_from_iter_nocache = xxx_copy_from_iter_nocache,
> + .copy_from_iter_full_nocache = xxx_copy_from_iter_full_nocache,
> +#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
> + .copy_from_iter_flushcache = xxx_copy_from_iter_flushcache,
> +#endif
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> + .copy_mc_to_iter = xxx_copy_mc_to_iter,
> +#endif
> + .csum_and_copy_to_iter = xxx_csum_and_copy_to_iter,
> + .csum_and_copy_from_iter = xxx_csum_and_copy_from_iter,
> + .csum_and_copy_from_iter_full = xxx_csum_and_copy_from_iter_full,
> +
> + .zero = xxx_zero,
> + .alignment = xxx_alignment,
> + .gap_alignment = xxx_gap_alignment,
> + .get_pages = xxx_get_pages,
> + .get_pages_alloc = xxx_get_pages_alloc,
> + .npages = xxx_npages,
> + .dup_iter = xxx_dup_iter,
> + .for_each_range = xxx_for_each_range,
> +};
>
>
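
For reference, the "inline wrappers" mentioned in the patch description as
jumping through the hooks would presumably take roughly this shape; a
minimal sketch based on the ops tables above, not code copied from the
series:

	static inline void iov_iter_advance(struct iov_iter *i, size_t size)
	{
		i->ops->advance(i, size);
	}

	static inline size_t copy_to_iter(const void *addr, size_t bytes,
					  struct iov_iter *i)
	{
		return i->ops->copy_to_iter(addr, bytes, i);
	}

Each caller then makes one indirect call through the table instead of
walking the old if-if-if-if chain.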

--
Pavel Begunkov

2020-11-21 14:39:50

by Pavel Begunkov

[permalink] [raw]
Subject: Re: [PATCH 00/29] RFC: iov_iter: Switch to using an ops table

On 21/11/2020 14:13, David Howells wrote:
>
> Hi Pavel, Willy, Jens, Al,
>
> I had a go switching the iov_iter stuff away from using a type bitmask to
> using an ops table to get rid of the if-if-if-if chains that are all over
> the place. After I pushed it, someone pointed me at Pavel's two patches.
>
> I have another iterator class that I want to add - which would lengthen the
> if-if-if-if chains. A lot of the time, there's a conditional clause at the
> beginning of a function that just jumps off to a type-specific handler or
> to reject the operation for that type. An ops table can just point to that
> instead.
>
> As far as I can tell, there's no difference in performance in most cases,
> though doing AFS-based kernel compiles appears to take less time (down from
> 3m20 to 2m50), which might make sense as that uses iterators a lot - but
> there are too many variables in that for that to be a good benchmark (I'm
> dealing with a remote server, for a start).
>
> Can someone recommend a good way to benchmark this properly? The problem
> is that the difference this makes relative to the amount of time taken to
> actually do I/O is tiny.

I see enough iov_iter overhead when running fio/t/io_uring.c with nullblk.
Not sure whether it'll help you, but it's worth a try.

>
> [snip: TCP sink/send test programs and same-machine transfer results,
> quoted in full from the original mail above]
>
> The patches can be found here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-ops
>
> David
> ---
> David Howells (29):
> iov_iter: Switch to using a table of operations
> iov_iter: Split copy_page_to_iter()
> iov_iter: Split iov_iter_fault_in_readable
> iov_iter: Split the iterate_and_advance() macro
> iov_iter: Split copy_to_iter()
> iov_iter: Split copy_mc_to_iter()
> iov_iter: Split copy_from_iter()
> iov_iter: Split the iterate_all_kinds() macro
> iov_iter: Split copy_from_iter_full()
> iov_iter: Split copy_from_iter_nocache()
> iov_iter: Split copy_from_iter_flushcache()
> iov_iter: Split copy_from_iter_full_nocache()
> iov_iter: Split copy_page_from_iter()
> iov_iter: Split iov_iter_zero()
> iov_iter: Split copy_from_user_atomic()
> iov_iter: Split iov_iter_advance()
> iov_iter: Split iov_iter_revert()
> iov_iter: Split iov_iter_single_seg_count()
> iov_iter: Split iov_iter_alignment()
> iov_iter: Split iov_iter_gap_alignment()
> iov_iter: Split iov_iter_get_pages()
> iov_iter: Split iov_iter_get_pages_alloc()
> iov_iter: Split csum_and_copy_from_iter()
> iov_iter: Split csum_and_copy_from_iter_full()
> iov_iter: Split csum_and_copy_to_iter()
> iov_iter: Split iov_iter_npages()
> iov_iter: Split dup_iter()
> iov_iter: Split iov_iter_for_each_range()
> iov_iter: Remove iterate_all_kinds() and iterate_and_advance()
>
>
> lib/iov_iter.c | 1440 +++++++++++++++++++++++++++++++-----------------
> 1 file changed, 934 insertions(+), 506 deletions(-)
>
>

--
Pavel Begunkov

2020-11-21 18:25:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

On Sat, Nov 21, 2020 at 6:13 AM David Howells <[email protected]> wrote:
>
> Switch to using a table of operations. In a future patch the individual
> methods will be split up by type. For the moment, however, the ops tables
> just jump directly to the old functions - which are now static. Inline
> wrappers are provided to jump through the hooks.

So I think conceptually this is the right thing to do, but I have a
couple of worries:

- do we really need all those different versions? I'm thinking of the
"iter_full" versions in particular. I think the iter_full versions
could just be wrappers that call the regular iter functions and verify
that the end result is full (and revert if not). No?
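
A minimal sketch of such a wrapper, assuming the existing
copy_from_iter()/iov_iter_revert() semantics in lib/iov_iter.c (this is
an illustration, not code from the series):

	static bool copy_from_iter_full(void *addr, size_t bytes,
					struct iov_iter *i)
	{
		size_t copied = copy_from_iter(addr, bytes, i);

		if (likely(copied == bytes))
			return true;
		/* Partial copy: rewind so the caller sees no progress. */
		iov_iter_revert(i, copied);
		return false;
	}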

- I don't like the xxx_iter_op naming - even as a temporary thing.

Please don't use "xxx" as a placeholder. It's not a great grep
pattern, it's not really descriptive, and we've literally had issues
with things being marked as spam when you use that. So it's about the
worst pattern to use.

Use "anycase" - or something like that - which is descriptive and
greps much better (ie not a single hit for that pattern in the kernel
either before or after).

- I worry a bit about the indirect call overhead and spectre v2.

So yeah, it would be good to have benchmarks to make sure this
doesn't regress for some simple case.

Other than those things, my initial reaction is "this does seem cleaner".

Al?

Linus

2020-11-21 18:25:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 00/29] RFC: iov_iter: Switch to using an ops table

On Sat, Nov 21, 2020 at 6:13 AM David Howells <[email protected]> wrote:
>
> Can someone recommend a good way to benchmark this properly? The problem
> is that the difference this makes relative to the amount of time taken to
> actually do I/O is tiny.

Maybe try /dev/zero -> /dev/null to get a load where the IO itself is
cheap. Or vmsplice to /dev/null?

Linus

2020-11-22 13:36:01

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

Linus Torvalds <[email protected]> wrote:

> - I worry a bit about the indirect call overhead and spectre v2.

I don't know enough about how spectre v2 works to say if this would be a
problem for the ops-table approach, but wouldn't it also affect the chain of
conditional branches that we currently use, since it's branch-prediction
based?

David

2020-11-22 14:00:18

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 01/29] iov_iter: Switch to using a table of operations

From: David Howells
> Sent: 22 November 2020 13:33
>
> Linus Torvalds <[email protected]> wrote:
>
> > - I worry a bit about the indirect call overhead and spectre v2.
>
> I don't know enough about how spectre v2 works to say if this would be a
> problem for the ops-table approach, but wouldn't it also affect the chain of
> conditional branches that we currently use, since it's branch-prediction
> based?

The advantage of the 'chain of branches' is that it can be converted
into a 'tree of branches' because the values are all separate bits.

So as well as putting the (expected) common one first, you can do:

	if (likely(a & (A | B))) {
		if (a & A) {
			code for A;
		} else {
			code for B;
		}
	} else ...

and so get better control over the branch sequence.
(Hopefully the compiler doesn't change the logic.
I want a dumb compiler that (mostly) compiles what I write!)

Part of the difficulty is deciding the common case.
There'll always be a benchmark that exercises an uncommon case.

Adding an indirect call does let you do things like adding
ITER_IOVEC_SINGLE and ITER_KVEC_SINGLE types that are used in the
common case of a single buffer fragment.
That might be a measurable gain.
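
For instance, a hypothetical single-segment variant could reduce advance
to simple arithmetic (a sketch; ITER_IOVEC_SINGLE and this helper are
invented here for illustration):

	/* Advance an iterator known to cover one contiguous segment. */
	static void iovec_single_advance(struct iov_iter *i, size_t size)
	{
		size = min(size, i->count);
		i->iov_offset += size;
		i->count -= size;
	}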

It is also possible to optimise the common case to a direct
call (or even inline code) and use an indirect call for
everything else.
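
Something like this, perhaps (again only a sketch; iovec_iter_ops and
xxx_advance stand in for whatever the series ends up exporting):

	static inline void iov_iter_advance(struct iov_iter *i, size_t size)
	{
		if (likely(i->ops == &iovec_iter_ops))
			xxx_advance(i, size);		/* direct, inlinable */
		else
			i->ops->advance(i, size);	/* retpolined indirect */
	}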

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-11-22 19:26:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

On Sun, Nov 22, 2020 at 5:33 AM David Howells <[email protected]> wrote:
>
> I don't know enough about how spectre v2 works to say if this would be a
> problem for the ops-table approach, but wouldn't it also affect the chain of
> conditional branches that we currently use, since it's branch-prediction
> based?

No, regular conditional branches aren't a problem. Yes, they may
mispredict, but outside of a few very rare cases that we handle
specially, that's not an issue.

Why? Because they always mispredict to one or the other side, so the
code flow may be mis-predicted, but it is fairly controlled.

In contrast, an indirect jump can mispredict the target, and branch
_anywhere_, and the attack vectors can poison the BTB (branch target
buffer), so our mitigation for that is that every single indirect
branch isn't predicted at all (using "retpoline").

So a conditional branch takes zero cycles when predicted (and most
will predict quite well). And as David Laight pointed out, a compiler
can also turn a series of conditional branches into a tree, which means
that N conditional branches basically only need log2(N) conditionals
executed.

In contrast, with retpoline in place, an indirect branch will
basically always take something like 25-30 cycles, because it always
mispredicts.

End result:

- well-predicted conditional branches are basically free (apart from
code layout issues)

- even with average prediction, a series of conditional branches has
to be fairly long for it to be worse than an indirect branch

- only completely unpredictable conditional branches end up basically
losing, and even then you probably need more than one. And while
completely unpredictable conditional branches do exist, they are
pretty rare.

The other side of the coin, of course, is

- often this is not measurable anyway.

- code cleanliness is important

- not everything needs retpolines and the expensive indirect branches.

So this is not in any way "indirect branches are bad". It's more of a
"indirect branches really aren't necessarily better than a couple of
conditionals, and _may_ be much worse".

For example, look at this gcc bugzilla:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952

which basically is about the compiler generating a jump table (i.e. a
single indirect branch) vs a series of conditional branches. With
retpoline, the cross-over point is basically when you need to have
over 10 conditional branches - and because of the log2(N) behavior,
that's around a thousand cases!

(But this depends hugely on microarchitectural details).

Linus

2020-11-22 22:49:23

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 01/29] iov_iter: Switch to using a table of operations

From: David Howells
> Sent: 21 November 2020 14:14
>
> Switch to using a table of operations. In a future patch the individual
> methods will be split up by type. For the moment, however, the ops tables
> just jump directly to the old functions - which are now static. Inline
> wrappers are provided to jump through the hooks.

I was wondering if you could use a bit of 'cpp magic' so that the call
sites would be:

	ITER_CALL(iter, action)(arg_list);

which might expand to:

	iter->action(arg_list);

in the function-table case, but could also be an if-chain:

	if (iter->type & foo)
		foo_action(args);
	else ...

with foo_action() being inlined.
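
One hypothetical spelling, folding the argument list into the macro (a
sketch only; ITER_CALL and the iovec_##action pasting are invented here):

	#ifdef CONFIG_RETPOLINE
	/* Open-code the common type to avoid a retpolined indirect call. */
	#define ITER_CALL(iter, action, args...)		\
		(iter_is_iovec(iter) ?				\
			iovec_##action(args) :			\
			(iter)->ops->action(args))
	#else
	#define ITER_CALL(iter, action, args...)		\
		((iter)->ops->action(args))
	#endif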

If there is enough symmetry it might make the code easier to read.
Although I'm not sure what happens to 'iterate_all_kinds'.
OTOH that is already unreadable.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-11-22 22:59:50

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 01/29] iov_iter: Switch to using a table of operations

From: Linus Torvalds
> Sent: 22 November 2020 19:22
> Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations
>
> On Sun, Nov 22, 2020 at 5:33 AM David Howells <[email protected]> wrote:
> >
> > I don't know enough about how spectre v2 works to say if this would be a
> > problem for the ops-table approach, but wouldn't it also affect the chain of
> > conditional branches that we currently use, since it's branch-prediction
> > based?
>
> No, regular conditional branches aren't a problem. Yes, they may
> mispredict, but outside of a few very rare cases that we handle
> specially, that's not an issue.
>
> Why? Because they always mispredict to one or the other side, so the
> code flow may be mis-predicted, but it is fairly controlled.
>
> In contrast, an indirect jump can mispredict the target, and branch
> _anywhere_, and the attack vectors can poison the BTB (branch target
> buffer), so our mitigation for that is that every single indirect
> branch isn't predicted at all (using "retpoline").
>
> So a conditional branch takes zero cycles when predicted (and most
> will predict quite well). And as David Laight pointed out, a compiler
> can also turn a series of conditional branches into a tree, which means
> that N conditional branches basically only need log2(N) conditionals
> executed.

The compiler can convert a switch statement into a branch tree.
But I don't think it can convert the 'if chain' in the current code
to one.
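
(Illustration: with a small dense type enum, a switch would give the
compiler the freedom to build that tree, or a jump table, by itself.
Hypothetical, since the current ITER_* values are bit flags and the
per-type advance helpers named here don't exist:)

	switch (iov_iter_type(i)) {
	case ITER_IOVEC:	iovec_advance(i, size);		break;
	case ITER_KVEC:		kvec_advance(i, size);		break;
	case ITER_BVEC:		bvec_advance(i, size);		break;
	case ITER_PIPE:		pipe_advance(i, size);		break;
	case ITER_DISCARD:	discard_advance(i, size);	break;
	}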

There is also the problem that some x86 cpus can't predict branches
if too many happen in the same cache line (or similar).

> In contrast, with retpoline in place, an indirect branch will
> basically always take something like 25-30 cycles, because it always
> mispredicts.

I also wonder whether a retpoline trashes the return stack optimisation.
(If that is ever really a significant gain for real functions.)

...
> So this is not in any way "indirect branches are bad". It's more of a
> "indirect branches really aren't necessarily better than a couple of
> conditionals, and _may_ be much worse".

Even without retpolines, the jump table is likely to take a data-cache
miss (and maybe a TLB miss) unless you are running hot-cache.
That is probably an extra cache miss on top of the I-cache ones.
Even worse if you end up with the jump table near the code
since the data cache line and TLB might never be shared.

So a very short switch statement is likely to be better as
conditional jumps anyway.

> For example, look at this gcc bugzilla:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952
>
> which basically is about the compiler generating a jump table (is a
> single indirect branch) vs a series of conditional branches. With
> retpoline, the cross-over point is basically when you need to have
> over 10 conditional branches - and because of the log2(N) behavior,
> that's around a thousand cases!

That was a hot-cache test.
Cold-cache is likely to favour the retpoline a little sooner.
(And the retpoline (probably) won't be (much) worse than the
mispredicted indirect jump.)

I do wonder how much of the kernel actually runs hot-cache,
except for parts that explicitly run things in bursts.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-11-23 08:09:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

On Sat, Nov 21, 2020 at 02:13:30PM +0000, David Howells wrote:
> Switch to using a table of operations. In a future patch the individual
> methods will be split up by type. For the moment, however, the ops tables
> just jump directly to the old functions - which are now static. Inline
> wrappers are provided to jump through the hooks.
>
> Signed-off-by: David Howells <[email protected]>

Please run performance tests. I think the indirect calls could totally
wreck things like high performance direct I/O, especially using io_uring
on x86.

2020-11-23 23:26:21

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

Christoph Hellwig <[email protected]> wrote:

> Please run performance tests. I think the indirect calls could totally
> wreck things like high performance direct I/O, especially using io_uring
> on x86.

Here's an initial test using fio and null_blk. I left null_blk in its default
configuration and used the following command line:

fio --ioengine=libaio --direct=1 --gtod_reduce=1 --name=readtest --filename=/dev/nullb0 --bs=4k --iodepth=128 --time_based --runtime=120 --readwrite=randread --iodepth_low=96 --iodepth_batch=16 --numjobs=4

I borrowed some of the parameters from an email I found online, so I'm not
sure if they're that useful.

I tried three different sets of patches: none, just the first (which adds the
jump table without getting rid of the conditional branches), and all of them.

I'm not sure which stats are of particular interest here, so I took the two
summary stats from the output of fio and also added together the "issued rwts:
total=a,b,c,d" from each test thread (only the first of which is non-zero).

The CPU is an Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz, so 4 single-thread
cores, and 16G of RAM. No virtualisation is involved.

Unpatched:

READ: bw=4109MiB/s (4308MB/s), 1025MiB/s-1029MiB/s (1074MB/s-1079MB/s), io=482GiB (517GB), run=120001-120001msec
READ: bw=4097MiB/s (4296MB/s), 1020MiB/s-1029MiB/s (1070MB/s-1079MB/s), io=480GiB (516GB), run=120001-120001msec
READ: bw=4113MiB/s (4312MB/s), 1025MiB/s-1031MiB/s (1075MB/s-1082MB/s), io=482GiB (517GB), run=120001-120001msec
READ: bw=4125MiB/s (4325MB/s), 1028MiB/s-1033MiB/s (1078MB/s-1084MB/s), io=483GiB (519GB), run=120001-120001msec

nullb0: ios=126017326/0, merge=53/0, ticks=3538817/0, in_queue=3538817, util=100.00%
nullb0: ios=125655193/0, merge=55/0, ticks=3548157/0, in_queue=3548157, util=100.00%
nullb0: ios=126133014/0, merge=58/0, ticks=3545621/0, in_queue=3545621, util=100.00%
nullb0: ios=126512562/0, merge=57/0, ticks=3531600/0, in_queue=3531600, util=100.00%

sum issued rwts = 126224632
sum issued rwts = 125861368
sum issued rwts = 126340344
sum issued rwts = 126718648

Just first patch:

READ: bw=4106MiB/s (4306MB/s), 1023MiB/s-1030MiB/s (1073MB/s-1080MB/s), io=481GiB (517GB), run=120001-120001msec
READ: bw=4126MiB/s (4327MB/s), 1029MiB/s-1034MiB/s (1079MB/s-1084MB/s), io=484GiB (519GB), run=120001-120001msec
READ: bw=4109MiB/s (4308MB/s), 1025MiB/s-1029MiB/s (1075MB/s-1079MB/s), io=481GiB (517GB), run=120001-120001msec
READ: bw=4097MiB/s (4296MB/s), 1023MiB/s-1025MiB/s (1073MB/s-1074MB/s), io=480GiB (516GB), run=120001-120001msec

nullb0: ios=125939152/0, merge=62/0, ticks=3534917/0, in_queue=3534917, util=100.00%
nullb0: ios=126554181/0, merge=61/0, ticks=3532067/0, in_queue=3532067, util=100.00%
nullb0: ios=126012346/0, merge=54/0, ticks=3530504/0, in_queue=3530504, util=100.00%
nullb0: ios=125653775/0, merge=54/0, ticks=3537438/0, in_queue=3537438, util=100.00%

sum issued rwts = 126144952
sum issued rwts = 126765368
sum issued rwts = 126215928
sum issued rwts = 125864120

All patches:
nullb0: ios=10477062/0, merge=2/0, ticks=284992/0, in_queue=284992, util=95.87%
nullb0: ios=10405246/0, merge=2/0, ticks=291886/0, in_queue=291886, util=99.82%
nullb0: ios=10425583/0, merge=1/0, ticks=291699/0, in_queue=291699, util=99.22%
nullb0: ios=10438845/0, merge=3/0, ticks=292445/0, in_queue=292445, util=99.31%

READ: bw=4118MiB/s (4318MB/s), 1028MiB/s-1032MiB/s (1078MB/s-1082MB/s), io=483GiB (518GB), run=120001-120001msec
READ: bw=4109MiB/s (4308MB/s), 1024MiB/s-1030MiB/s (1073MB/s-1080MB/s), io=481GiB (517GB), run=120001-120001msec
READ: bw=4108MiB/s (4308MB/s), 1026MiB/s-1029MiB/s (1076MB/s-1079MB/s), io=481GiB (517GB), run=120001-120001msec
READ: bw=4112MiB/s (4312MB/s), 1025MiB/s-1031MiB/s (1075MB/s-1081MB/s), io=482GiB (517GB), run=120001-120001msec

nullb0: ios=126282410/0, merge=58/0, ticks=3557384/0, in_queue=3557384, util=100.00%
nullb0: ios=126004837/0, merge=67/0, ticks=3565235/0, in_queue=3565235, util=100.00%
nullb0: ios=125988876/0, merge=59/0, ticks=3563026/0, in_queue=3563026, util=100.00%
nullb0: ios=126118279/0, merge=57/0, ticks=3566122/0, in_queue=3566122, util=100.00%

sum issued rwts = 126494904
sum issued rwts = 126214200
sum issued rwts = 126198200
sum issued rwts = 126328312


David

2020-11-23 23:29:42

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

David Howells <[email protected]> wrote:

> I tried three different sets of patches: none, just the first (which adds the
> jump table without getting rid of the conditional branches), and all of them.

And, I forgot to mention, I ran each test four times and then interleaved the
result lines for that set.

David

2020-11-24 00:41:55

by Pavel Begunkov

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

On 21/11/2020 14:31, Pavel Begunkov wrote:
> On 21/11/2020 14:13, David Howells wrote:
>> Switch to using a table of operations. In a future patch the individual
>> methods will be split up by type. For the moment, however, the ops tables
>> just jump directly to the old functions - which are now static. Inline
>> wrappers are provided to jump through the hooks.
>>
>> Signed-off-by: David Howells <[email protected]>
>> ---
>>
>> fs/io_uring.c | 2
>> include/linux/uio.h | 241 ++++++++++++++++++++++++++++++++++--------
>> lib/iov_iter.c | 293 +++++++++++++++++++++++++++++++++++++++------------
>> 3 files changed, 422 insertions(+), 114 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 4ead291b2976..baa78f58ae5c 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -3192,7 +3192,7 @@ static void io_req_map_rw(struct io_kiocb *req, const struct iovec *iovec,
>> rw->free_iovec = iovec;
>> rw->bytes_done = 0;
>> /* can only be fixed buffers, no need to do anything */
>> - if (iter->type == ITER_BVEC)
>> + if (iov_iter_is_bvec(iter))
>
> Could you split this io_uring change and send for 5.10?
> Or I can do it for you if you wish.

FYI, I stole this chunk with the right attribution. It should go through
io_uring for 5.10, so it shouldn't be a problem if you just drop it.

--
Pavel Begunkov

2020-11-24 00:43:23

by Pavel Begunkov

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

On 23/11/2020 10:31, David Howells wrote:
> Christoph Hellwig <[email protected]> wrote:
>
>> Please run performance tests. I think the indirect calls could totally
>> wreck things like high performance direct I/O, especially using io_uring
>> on x86.
>
> Here's an initial test using fio and null_blk. I left null_blk in its default
> configuration and used the following command line:

I'd prefer something along the lines of no_sched=1 submit_queues=$(nproc) to reduce overhead.

>
> fio --ioengine=libaio --direct=1 --gtod_reduce=1 --name=readtest --filename=/dev/nullb0 --bs=4k --iodepth=128 --time_based --runtime=120 --readwrite=randread --iodepth_low=96 --iodepth_batch=16 --numjobs=4

fio is relatively heavy; I'd suggest trying fio/t/io_uring with nullblk.

>
> [snip: full fio parameters and null_blk results, quoted from David's
> mail above]

--
Pavel Begunkov

2020-11-24 21:44:01

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

Pavel Begunkov <[email protected]> wrote:

> fio is relatively heavy; I'd suggest trying fio/t/io_uring with nullblk.

no patches:

IOPS=885152, IOS/call=25/25, inflight=64 (64)
IOPS=890400, IOS/call=25/25, inflight=32 (32)
IOPS=890656, IOS/call=25/25, inflight=64 (64)
IOPS=896096, IOS/call=25/25, inflight=96 (96)
IOPS=876256, IOS/call=25/25, inflight=128 (128)
IOPS=905056, IOS/call=25/25, inflight=128 (128)
IOPS=882912, IOS/call=25/25, inflight=96 (96)
IOPS=887392, IOS/call=25/25, inflight=64 (32)
IOPS=897152, IOS/call=25/25, inflight=128 (128)
IOPS=871392, IOS/call=25/25, inflight=32 (32)
IOPS=865088, IOS/call=25/25, inflight=96 (96)
IOPS=880032, IOS/call=25/25, inflight=32 (32)
IOPS=905376, IOS/call=25/25, inflight=96 (96)
IOPS=898016, IOS/call=25/25, inflight=128 (128)
IOPS=885792, IOS/call=25/25, inflight=64 (64)
IOPS=897632, IOS/call=25/25, inflight=96 (96)

first patch only:

IOPS=876640, IOS/call=25/25, inflight=64 (64)
IOPS=878208, IOS/call=25/25, inflight=64 (64)
IOPS=884000, IOS/call=25/25, inflight=64 (64)
IOPS=900864, IOS/call=25/25, inflight=64 (64)
IOPS=878496, IOS/call=25/25, inflight=64 (64)
IOPS=870944, IOS/call=25/25, inflight=32 (32)
IOPS=900672, IOS/call=25/25, inflight=32 (32)
IOPS=882368, IOS/call=25/25, inflight=128 (128)
IOPS=877120, IOS/call=25/25, inflight=128 (128)
IOPS=861856, IOS/call=25/25, inflight=64 (64)
IOPS=892896, IOS/call=25/25, inflight=96 (96)
IOPS=875808, IOS/call=25/25, inflight=128 (128)
IOPS=887808, IOS/call=25/25, inflight=32 (80)
IOPS=889984, IOS/call=25/25, inflight=128 (128)

all patches:

IOPS=872192, IOS/call=25/25, inflight=96 (96)
IOPS=887360, IOS/call=25/25, inflight=32 (32)
IOPS=894432, IOS/call=25/25, inflight=128 (128)
IOPS=884640, IOS/call=25/25, inflight=32 (32)
IOPS=886784, IOS/call=25/25, inflight=32 (32)
IOPS=884160, IOS/call=25/25, inflight=96 (96)
IOPS=886944, IOS/call=25/25, inflight=96 (96)
IOPS=903360, IOS/call=25/25, inflight=128 (128)
IOPS=887744, IOS/call=25/25, inflight=64 (64)
IOPS=891072, IOS/call=25/25, inflight=32 (32)
IOPS=900512, IOS/call=25/25, inflight=128 (128)
IOPS=888544, IOS/call=25/25, inflight=128 (128)
IOPS=877312, IOS/call=25/25, inflight=128 (128)
IOPS=895008, IOS/call=25/25, inflight=128 (128)
IOPS=889376, IOS/call=25/25, inflight=128 (128)

David

2020-11-24 23:21:34

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

On 11/24/20 5:50 AM, David Howells wrote:
> Pavel Begunkov <[email protected]> wrote:
>
>> fio is relatively heavy; I'd suggest trying fio/t/io_uring with nullblk.
>
> no patches:

Here's what I get. nullb0 using blk-mq, and submit_queues==NPROC.
iostats and merging disabled, using 8k bs for t/io_uring to ensure we
have > 1 segment. Everything pinned to the same CPU to ensure
reproducibility and stability. Kernel has CONFIG_RETPOLINE enabled.

5.10-rc5:
IOPS=2453184, IOS/call=32/31, inflight=128 (128)
IOPS=2435648, IOS/call=32/32, inflight=64 (64)
IOPS=2448544, IOS/call=32/31, inflight=96 (96)
IOPS=2439584, IOS/call=32/31, inflight=128 (128)
IOPS=2454176, IOS/call=32/32, inflight=32 (32)

5.10-rc5+all patches
IOPS=2304224, IOS/call=32/32, inflight=64 (64)
IOPS=2309216, IOS/call=32/32, inflight=32 (32)
IOPS=2305376, IOS/call=32/31, inflight=128 (128)
IOPS=2300544, IOS/call=32/32, inflight=128 (128)
IOPS=2301728, IOS/call=32/32, inflight=32 (32)

which looks to be around a 6% drop.

Using actual hardware instead of just null_blk:

5.10-rc5:
IOPS=854163, IOS/call=31/31, inflight=101 (101)
IOPS=855495, IOS/call=31/31, inflight=117 (117)
IOPS=856118, IOS/call=31/31, inflight=100 (100)
IOPS=855863, IOS/call=31/31, inflight=113 (113)
IOPS=856282, IOS/call=31/31, inflight=116 (116)

5.10-rc5+all patches
IOPS=833391, IOS/call=31/31, inflight=100 (100)
IOPS=838342, IOS/call=31/31, inflight=100 (100)
IOPS=839921, IOS/call=31/31, inflight=105 (105)
IOPS=841607, IOS/call=31/31, inflight=123 (123)
IOPS=843625, IOS/call=31/31, inflight=107 (107)

which looks to be around 2-3%, but we're also running at a much
slower rate (830K vs ~2.3M).

--
Jens Axboe

2020-11-27 17:17:56

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

Jens Axboe <[email protected]> wrote:

> which looks to be around a 6% drop.

That's quite a lot.

> which looks to be around 2-3%, but we're also running at a much
> slower rate (830K vs ~2.3M).

That's still a lot.

Thanks for having a look!

David

2020-12-03 06:34:56

by kernel test robot

[permalink] [raw]
Subject: [iov_iter] 9bd0e337c6: will-it-scale.per_process_ops -4.8% regression


Greeting,

FYI, we noticed a -4.8% regression of will-it-scale.per_process_ops due to commit:


commit: 9bd0e337c633aed3e8ec3c7397b7ae0b8436f163 ("[PATCH 01/29] iov_iter: Switch to using a table of operations")
url: https://github.com/0day-ci/linux/commits/David-Howells/RFC-iov_iter-Switch-to-using-an-ops-table/20201121-222344
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 27bba9c532a8d21050b94224ffd310ad0058c353

in testcase: will-it-scale
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory
with following parameters:

nr_task: 50%
mode: process
test: pwrite1
cpufreq_governor: performance
ucode: 0x42e

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/pwrite1/will-it-scale/0x42e

commit:
27bba9c532 ("Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi")
9bd0e337c6 ("iov_iter: Switch to using a table of operations")

27bba9c532a8d210 9bd0e337c633aed3e8ec3c7397b
---------------- ---------------------------
%stddev %change %stddev
\ | \
28443113 -4.8% 27064036 will-it-scale.24.processes
1185129 -4.8% 1127667 will-it-scale.per_process_ops
28443113 -4.8% 27064036 will-it-scale.workload
13.84 +1.0% 13.98 boot-time.dhcp
0.00 ± 9% -13.5% 0.00 ± 3% sched_debug.cpu.next_balance.stddev
1251 ± 9% -17.2% 1035 ± 10% slabinfo.dmaengine-unmap-16.active_objs
1251 ± 9% -17.2% 1035 ± 10% slabinfo.dmaengine-unmap-16.num_objs
24623 ± 5% -18.0% 20184 ± 15% softirqs.CPU0.RCU
28877 ± 10% -30.6% 20051 ± 15% softirqs.CPU19.RCU
5693 ± 31% +402.3% 28595 ± 22% softirqs.CPU19.SCHED
21142 ± 15% -26.5% 15533 ± 11% softirqs.CPU27.RCU
20776 ± 38% -50.5% 10290 ± 58% softirqs.CPU3.SCHED
26618 ± 11% -35.3% 17214 ± 6% softirqs.CPU37.RCU
10894 ± 48% +175.5% 30012 ± 34% softirqs.CPU37.SCHED
17015 ± 4% +39.2% 23681 ± 7% softirqs.CPU43.RCU
411.75 ± 58% +76.8% 728.00 ± 32% numa-vmstat.node0.nr_active_anon
34304 ± 2% -35.6% 22103 ± 48% numa-vmstat.node0.nr_anon_pages
36087 ± 2% -31.0% 24915 ± 43% numa-vmstat.node0.nr_inactive_anon
2233 ± 51% +60.4% 3582 ± 7% numa-vmstat.node0.nr_shmem
411.75 ± 58% +76.8% 728.00 ± 32% numa-vmstat.node0.nr_zone_active_anon
36087 ± 2% -31.0% 24915 ± 43% numa-vmstat.node0.nr_zone_inactive_anon
24265 ± 3% +51.3% 36707 ± 29% numa-vmstat.node1.nr_anon_pages
25441 ± 2% +44.9% 36858 ± 29% numa-vmstat.node1.nr_inactive_anon
537.25 ± 20% +22.8% 659.50 ± 10% numa-vmstat.node1.nr_page_table_pages
25441 ± 2% +44.9% 36858 ± 29% numa-vmstat.node1.nr_zone_inactive_anon
1649 ± 58% +76.7% 2913 ± 32% numa-meminfo.node0.Active
1649 ± 58% +76.7% 2913 ± 32% numa-meminfo.node0.Active(anon)
137223 ± 2% -35.6% 88410 ± 48% numa-meminfo.node0.AnonPages
164997 ± 9% -28.4% 118095 ± 42% numa-meminfo.node0.AnonPages.max
144353 ± 2% -31.0% 99656 ± 43% numa-meminfo.node0.Inactive
144353 ± 2% -31.0% 99656 ± 43% numa-meminfo.node0.Inactive(anon)
8937 ± 51% +60.3% 14328 ± 7% numa-meminfo.node0.Shmem
97072 ± 3% +51.3% 146858 ± 29% numa-meminfo.node1.AnonPages
127410 ± 5% +43.2% 182468 ± 16% numa-meminfo.node1.AnonPages.max
101822 ± 2% +44.9% 147521 ± 29% numa-meminfo.node1.Inactive
101822 ± 2% +44.9% 147521 ± 29% numa-meminfo.node1.Inactive(anon)
2148 ± 20% +22.9% 2639 ± 10% numa-meminfo.node1.PageTables
3431 ± 89% -85.1% 512.25 ±109% interrupts.38:PCI-MSI.2621444-edge.eth0-TxRx-3
348.50 ± 62% +152.7% 880.75 ± 27% interrupts.40:PCI-MSI.2621446-edge.eth0-TxRx-5
1697 ± 63% -53.1% 796.75 ± 13% interrupts.CPU13.CAL:Function_call_interrupts
89.75 ± 36% +220.3% 287.50 ± 20% interrupts.CPU13.RES:Rescheduling_interrupts
745.75 ± 3% +104.6% 1526 ± 69% interrupts.CPU19.CAL:Function_call_interrupts
293.00 ± 5% -60.0% 117.25 ± 47% interrupts.CPU19.RES:Rescheduling_interrupts
778.50 ± 9% +123.7% 1741 ± 64% interrupts.CPU22.CAL:Function_call_interrupts
6450 ± 29% -38.0% 4000 ± 4% interrupts.CPU24.NMI:Non-maskable_interrupts
6450 ± 29% -38.0% 4000 ± 4% interrupts.CPU24.PMI:Performance_monitoring_interrupts
2012 ± 56% -57.6% 852.75 ± 6% interrupts.CPU26.CAL:Function_call_interrupts
184.25 ± 37% -47.9% 96.00 ± 49% interrupts.CPU27.RES:Rescheduling_interrupts
0.50 ±100% +64250.0% 321.75 ±170% interrupts.CPU28.TLB:TLB_shootdowns
3431 ± 89% -85.1% 512.25 ±109% interrupts.CPU29.38:PCI-MSI.2621444-edge.eth0-TxRx-3
348.50 ± 62% +152.7% 880.75 ± 27% interrupts.CPU31.40:PCI-MSI.2621446-edge.eth0-TxRx-5
156.50 ± 51% -51.3% 76.25 ± 59% interrupts.CPU33.RES:Rescheduling_interrupts
883.50 ± 18% -23.8% 673.25 ± 22% interrupts.CPU36.CAL:Function_call_interrupts
7492 ± 13% -45.6% 4073 ± 63% interrupts.CPU37.NMI:Non-maskable_interrupts
7492 ± 13% -45.6% 4073 ± 63% interrupts.CPU37.PMI:Performance_monitoring_interrupts
250.50 ± 19% -52.5% 119.00 ± 50% interrupts.CPU37.RES:Rescheduling_interrupts
4688 ± 27% +63.5% 7667 ± 15% interrupts.CPU40.NMI:Non-maskable_interrupts
4688 ± 27% +63.5% 7667 ± 15% interrupts.CPU40.PMI:Performance_monitoring_interrupts
96.75 ± 92% +135.1% 227.50 ± 22% interrupts.CPU43.RES:Rescheduling_interrupts
2932 ± 36% +73.4% 5084 ± 21% interrupts.CPU47.NMI:Non-maskable_interrupts
2932 ± 36% +73.4% 5084 ± 21% interrupts.CPU47.PMI:Performance_monitoring_interrupts
57.50 ± 78% +250.4% 201.50 ± 42% interrupts.CPU47.RES:Rescheduling_interrupts
4207 ± 61% +86.0% 7827 ± 11% interrupts.CPU8.NMI:Non-maskable_interrupts
4207 ± 61% +86.0% 7827 ± 11% interrupts.CPU8.PMI:Performance_monitoring_interrupts
1.089e+10 -2.3% 1.064e+10 perf-stat.i.branch-instructions
1.62 +0.7 2.34 perf-stat.i.branch-miss-rate%
1.741e+08 +42.3% 2.476e+08 perf-stat.i.branch-misses
1.36 +3.3% 1.41 perf-stat.i.cpi
1.233e+08 ± 3% -7.1% 1.146e+08 perf-stat.i.dTLB-load-misses
2.38e+10 -3.3% 2.302e+10 perf-stat.i.dTLB-loads
57501510 -4.9% 54711717 perf-stat.i.dTLB-store-misses
1.828e+10 -3.7% 1.761e+10 perf-stat.i.dTLB-stores
98.97 -2.9 96.02 ± 2% perf-stat.i.iTLB-load-miss-rate%
29795797 ± 4% -5.0% 28320171 perf-stat.i.iTLB-load-misses
299268 ± 2% +298.1% 1191476 ± 50% perf-stat.i.iTLB-loads
5.335e+10 -3.7% 5.138e+10 perf-stat.i.instructions
0.74 -3.7% 0.71 perf-stat.i.ipc
0.20 ± 8% +12.1% 0.23 perf-stat.i.major-faults
1104 -3.2% 1069 perf-stat.i.metric.M/sec
72308 +2.3% 73975 ± 2% perf-stat.i.node-stores
0.10 +7.9% 0.11 ± 8% perf-stat.overall.MPKI
1.60 +0.7 2.33 perf-stat.overall.branch-miss-rate%
1.35 +4.1% 1.41 perf-stat.overall.cpi
99.00 -3.0 95.98 ± 2% perf-stat.overall.iTLB-load-miss-rate%
0.74 -3.9% 0.71 perf-stat.overall.ipc
1.085e+10 -2.3% 1.06e+10 perf-stat.ps.branch-instructions
1.735e+08 +42.3% 2.468e+08 perf-stat.ps.branch-misses
1.229e+08 ± 3% -7.1% 1.142e+08 perf-stat.ps.dTLB-load-misses
2.372e+10 -3.3% 2.294e+10 perf-stat.ps.dTLB-loads
57306258 -4.9% 54525679 perf-stat.ps.dTLB-store-misses
1.822e+10 -3.7% 1.755e+10 perf-stat.ps.dTLB-stores
29695158 ± 4% -5.0% 28224049 perf-stat.ps.iTLB-load-misses
298257 ± 2% +298.1% 1187498 ± 50% perf-stat.ps.iTLB-loads
5.317e+10 -3.7% 5.12e+10 perf-stat.ps.instructions
0.20 ± 7% +12.0% 0.23 ± 2% perf-stat.ps.major-faults
1.613e+13 -3.9% 1.55e+13 perf-stat.total.instructions
8.00 ± 14% -8.0 0.00 perf-profile.calltrace.cycles-pp.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
7.38 ± 14% -7.4 0.00 perf-profile.calltrace.cycles-pp.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
7.27 ± 14% -7.3 0.00 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
0.69 ± 14% -0.4 0.29 ±100% perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.new_sync_write.vfs_write.ksys_pwrite64
0.62 ± 15% -0.3 0.30 ±101% perf-profile.calltrace.cycles-pp.unlock_page.shmem_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
0.85 ± 8% -0.2 0.66 ± 15% perf-profile.calltrace.cycles-pp.__fget_light.ksys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
0.91 ± 11% -0.1 0.79 ± 12% perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.new_sync_write.vfs_write
0.00 +1.0 1.01 ± 13% perf-profile.calltrace.cycles-pp.__get_user_nocheck_1.xxx_fault_in_readable.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
0.00 +1.4 1.42 ± 12% perf-profile.calltrace.cycles-pp.xxx_advance.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
0.00 +2.1 2.15 ± 13% perf-profile.calltrace.cycles-pp.xxx_fault_in_readable.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
0.00 +6.8 6.82 ± 13% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin.xxx_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
0.00 +6.9 6.92 ± 13% perf-profile.calltrace.cycles-pp.copyin.xxx_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
0.00 +8.1 8.09 ± 14% perf-profile.calltrace.cycles-pp.xxx_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
8.03 ± 14% -8.0 0.00 perf-profile.children.cycles-pp.iov_iter_copy_from_user_atomic
0.85 ± 8% -0.2 0.66 ± 15% perf-profile.children.cycles-pp.__fget_light
0.69 ± 14% -0.2 0.52 ± 15% perf-profile.children.cycles-pp.up_write
0.62 ± 13% -0.2 0.46 ± 14% perf-profile.children.cycles-pp.apparmor_file_permission
0.94 ± 11% -0.1 0.82 ± 13% perf-profile.children.cycles-pp.file_update_time
0.51 ± 12% -0.1 0.40 ± 14% perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
0.55 ± 12% -0.1 0.47 ± 12% perf-profile.children.cycles-pp.current_time
0.62 ± 14% -0.1 0.55 ± 13% perf-profile.children.cycles-pp.unlock_page
0.24 ± 13% -0.0 0.20 ± 16% perf-profile.children.cycles-pp.timestamp_truncate
0.18 ± 11% -0.0 0.14 ± 15% perf-profile.children.cycles-pp.file_remove_privs
0.55 ± 14% +0.3 0.87 ± 15% perf-profile.children.cycles-pp.__x86_retpoline_rax
0.00 +1.4 1.42 ± 12% perf-profile.children.cycles-pp.xxx_advance
0.00 +2.2 2.22 ± 13% perf-profile.children.cycles-pp.xxx_fault_in_readable
0.00 +8.1 8.12 ± 14% perf-profile.children.cycles-pp.xxx_copy_from_user_atomic
1.02 ± 16% -0.2 0.82 ± 12% perf-profile.self.cycles-pp.shmem_getpage_gfp
0.82 ± 8% -0.2 0.63 ± 15% perf-profile.self.cycles-pp.__fget_light
0.66 ± 14% -0.2 0.49 ± 15% perf-profile.self.cycles-pp.up_write
0.54 ± 15% -0.2 0.39 ± 14% perf-profile.self.cycles-pp.apparmor_file_permission
0.59 ± 13% -0.1 0.46 ± 13% perf-profile.self.cycles-pp.ksys_pwrite64
0.50 ± 12% -0.1 0.40 ± 13% perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited
0.24 ± 15% -0.0 0.19 ± 15% perf-profile.self.cycles-pp.timestamp_truncate
0.20 ± 13% -0.0 0.17 ± 12% perf-profile.self.cycles-pp.current_time
0.12 ± 14% +0.1 0.19 ± 14% perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
0.43 ± 14% +0.3 0.68 ± 15% perf-profile.self.cycles-pp.__x86_retpoline_rax
0.00 +1.1 1.14 ± 15% perf-profile.self.cycles-pp.xxx_copy_from_user_atomic
0.00 +1.2 1.21 ± 12% perf-profile.self.cycles-pp.xxx_fault_in_readable
0.00 +1.3 1.28 ± 12% perf-profile.self.cycles-pp.xxx_advance



will-it-scale.24.processes

2.88e+07 +----------------------------------------------------------------+
2.86e+07 |-+ +.+.+..+. |
| +. + +. .+. |
2.84e+07 |.+.+.+.+. + +.+.+.+.+ + + |
2.82e+07 |-+ +.+ |
| |
2.8e+07 |-+ |
2.78e+07 |-+ |
2.76e+07 |-+ |
| |
2.74e+07 |-+ |
2.72e+07 |-O O O O O O O O O O O O O O O O |
| O O O O O O O O O O O O O |
2.7e+07 |-+ O O |
2.68e+07 +----------------------------------------------------------------+


will-it-scale.per_process_ops

1.2e+06 +----------------------------------------------------------------+
| +.+.+..+. |
1.19e+06 |-+ +. + +. .+. |
1.18e+06 |.+.+.+.+ + +.+.+.+.+ + + |
| + .+ |
1.17e+06 |-+ + |
| |
1.16e+06 |-+ |
| |
1.15e+06 |-+ |
1.14e+06 |-+ |
| O O O O |
1.13e+06 |-O O O O O O O O O O O O O O O O O O O O O O O |
| O O O O |
1.12e+06 +----------------------------------------------------------------+


will-it-scale.workload

2.88e+07 +----------------------------------------------------------------+
2.86e+07 |-+ +.+.+..+. |
| +. + +. .+. |
2.84e+07 |.+.+.+.+. + +.+.+.+.+ + + |
2.82e+07 |-+ +.+ |
| |
2.8e+07 |-+ |
2.78e+07 |-+ |
2.76e+07 |-+ |
| |
2.74e+07 |-+ |
2.72e+07 |-O O O O O O O O O O O O O O O O |
| O O O O O O O O O O O O O |
2.7e+07 |-+ O O |
2.68e+07 +----------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


Attachments:
config-5.10.0-rc4-00369-g9bd0e337c633 (172.78 kB)
job-script (8.00 kB)
job.yaml (5.51 kB)
reproduce (349.00 B)

2020-12-03 17:52:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: [iov_iter] 9bd0e337c6: will-it-scale.per_process_ops -4.8% regression

On Wed, Dec 2, 2020 at 10:31 PM kernel test robot <[email protected]> wrote:
>
> FYI, we noticed a -4.8% regression of will-it-scale.per_process_ops due to commit:

Ok, I guess that's bigger than expected, but the profile data does
show how bad the indirect branches are.

There's both a "direct" cost to them:

> 0.55 ± 14% +0.3 0.87 ± 15% perf-profile.children.cycles-pp.__x86_retpoline_rax
> 0.12 ± 14% +0.1 0.19 ± 14% perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
> 0.43 ± 14% +0.3 0.68 ± 15% perf-profile.self.cycles-pp.__x86_retpoline_rax

The actual retpoline profile costs themselves do not add up to 4%, but
I think that's because the indirect costs are higher: each branch
mispredict makes everything run slower for a while, as the OoO engine
has to flush and restart.

So the global cost then shows up in the CPU and branch-miss stats,
where IPC goes down (which is the same thing as saying that CPI goes
up):

> 1.741e+08 +42.3% 2.476e+08 perf-stat.i.branch-misses
> 0.74 -3.9% 0.71 perf-stat.overall.ipc
> 1.35 +4.1% 1.41 perf-stat.overall.cpi

which is why it ends up being so costly even if the retpoline overhead
itself is "only" just under 1%.

Linus

2020-12-03 17:54:11

by Jens Axboe

[permalink] [raw]
Subject: Re: [iov_iter] 9bd0e337c6: will-it-scale.per_process_ops -4.8% regression

On 12/3/20 10:47 AM, Linus Torvalds wrote:
> On Wed, Dec 2, 2020 at 10:31 PM kernel test robot <[email protected]> wrote:
>>
>> FYI, we noticed a -4.8% regression of will-it-scale.per_process_ops due to commit:
>
> Ok, I guess that's bigger than expected, but the profile data does
> show how bad the indirect branches are.

It's also in the same range (3-6%) as the microbenchmarks I ran and posted.
So at least there's correlation there too.

--
Jens Axboe

2020-12-04 11:55:17

by David Howells

[permalink] [raw]
Subject: Re: [iov_iter] 9bd0e337c6: will-it-scale.per_process_ops -4.8% regression

kernel test robot <[email protected]> wrote:

> FYI, we noticed a -4.8% regression of will-it-scale.per_process_ops due to commit:
>
>
> commit: 9bd0e337c633aed3e8ec3c7397b7ae0b8436f163 ("[PATCH 01/29] iov_iter: Switch to using a table of operations")

Out of interest, would it be possible for you to run this on the tail of the
series on the same hardware?

Thanks,
David

2020-12-04 11:56:04

by David Howells

[permalink] [raw]
Subject: Re: [iov_iter] 9bd0e337c6: will-it-scale.per_process_ops -4.8% regression

Linus Torvalds <[email protected]> wrote:

> > FYI, we noticed a -4.8% regression of will-it-scale.per_process_ops due to commit:
>
> Ok, I guess that's bigger than expected,

Note that it appears to be testing just the first patch and not the whole
series:

| commit: 9bd0e337c633aed3e8ec3c7397b7ae0b8436f163 ("[PATCH 01/29] iov_iter: Switch to using a table of operations")

that just adds an indirection table without taking away any of the conditional
branching. It seems quite likely, though, that even if you add all the other
patches, you won't get back enough to make it worth it.
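
To illustrate the shape of the change - a rough sketch only, with
simplified, made-up helper names, not the actual patch:

/* Before: the type bits are tested on entry to each helper. */
size_t iov_iter_copy_from_user_atomic(struct page *page, struct iov_iter *i,
                                      unsigned long offset, size_t bytes)
{
        if (i->type & ITER_BVEC)
                return bvec_copy_from_user_atomic(page, i, offset, bytes);
        if (i->type & ITER_KVEC)
                return kvec_copy_from_user_atomic(page, i, offset, bytes);
        return iovec_copy_from_user_atomic(page, i, offset, bytes);
}

/* After patch 01: a single indirect call through the per-type ops
 * table, which the compiler turns into a retpoline thunk call. */
size_t iov_iter_copy_from_user_atomic(struct page *page, struct iov_iter *i,
                                      unsigned long offset, size_t bytes)
{
        return i->ops->copy_from_user_atomic(page, i, offset, bytes);
}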

David

2020-12-07 12:59:31

by kernel test robot

[permalink] [raw]
Subject: Re: [iov_iter] 9bd0e337c6: will-it-scale.per_process_ops -4.8% regression

Hi David,

On Fri, Dec 04, 2020 at 11:51:48AM +0000, David Howells wrote:
> kernel test robot <[email protected]> wrote:
>
> > FYI, we noticed a -4.8% regression of will-it-scale.per_process_ops due to commit:
> >
> >
> > commit: 9bd0e337c633aed3e8ec3c7397b7ae0b8436f163 ("[PATCH 01/29] iov_iter: Switch to using a table of operations")
>
> Out of interest, would it be possible for you to run this on the tail of the
> series on the same hardware?

Sorry for the late reply. Below is the result with the tail of the series added:
* ded69a6991fe0 (linux-review/David-Howells/RFC-iov_iter-Switch-to-using-an-ops-table/20201121-222344) iov_iter: Remove iterate_all_kinds() and iterate_and_advance()

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/pwrite1/will-it-scale/0x42e

commit:
27bba9c532a8d21050b94224ffd310ad0058c353
9bd0e337c633aed3e8ec3c7397b7ae0b8436f163
ded69a6991fe0094f36d96bf1ace2a9636428676

27bba9c532a8d210 9bd0e337c633aed3e8ec3c7397b ded69a6991fe0094f36d96bf1ac
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
28443113 -4.8% 27064036 -4.8% 27084904 will-it-scale.24.processes
1185129 -4.8% 1127667 -4.8% 1128537 will-it-scale.per_process_ops
28443113 -4.8% 27064036 -4.8% 27084904 will-it-scale.workload
13.84 +1.0% 13.98 +0.3% 13.89 boot-time.dhcp
1251 ? 9% -17.2% 1035 ? 10% -9.1% 1137 ? 5% slabinfo.dmaengine-unmap-16.active_objs
1251 ? 9% -17.2% 1035 ? 10% -9.1% 1137 ? 5% slabinfo.dmaengine-unmap-16.num_objs
1052 ? 6% -1.1% 1041 ? 5% -13.4% 911.75 ? 10% slabinfo.task_group.active_objs
1052 ? 6% -1.1% 1041 ? 5% -13.4% 911.75 ? 10% slabinfo.task_group.num_objs
31902 ? 5% -5.6% 30124 ? 7% -8.3% 29265 ? 4% slabinfo.vm_area_struct.active_objs
32163 ? 5% -5.4% 30441 ? 6% -8.0% 29602 ? 4% slabinfo.vm_area_struct.num_objs
73.46 ? 48% -59.7% 29.59 ?100% -100.0% 0.00 sched_debug.cfs_rq:/.MIN_vruntime.avg
2386 ? 23% -40.5% 1420 ?100% -100.0% 0.00 sched_debug.cfs_rq:/.MIN_vruntime.max
393.92 ? 33% -48.5% 202.85 ?100% -100.0% 0.00 sched_debug.cfs_rq:/.MIN_vruntime.stddev
73.46 ? 48% -59.7% 29.60 ?100% -100.0% 0.00 sched_debug.cfs_rq:/.max_vruntime.avg
2386 ? 23% -40.5% 1420 ?100% -100.0% 0.00 sched_debug.cfs_rq:/.max_vruntime.max
393.92 ? 33% -48.5% 202.94 ?100% -100.0% 0.00 sched_debug.cfs_rq:/.max_vruntime.stddev
0.00 ? 9% -13.5% 0.00 ? 3% -2.9% 0.00 ? 13% sched_debug.cpu.next_balance.stddev
-18.50 +33.5% -24.70 -41.9% -10.75 sched_debug.cpu.nr_uninterruptible.min
411.75 ? 58% +76.8% 728.00 ? 32% +59.2% 655.50 ? 50% numa-vmstat.node0.nr_active_anon
34304 ? 2% -35.6% 22103 ? 48% +8.6% 37243 ? 26% numa-vmstat.node0.nr_anon_pages
36087 ? 2% -31.0% 24915 ? 43% +7.0% 38606 ? 27% numa-vmstat.node0.nr_inactive_anon
2233 ? 51% +60.4% 3582 ? 7% -7.7% 2062 ? 51% numa-vmstat.node0.nr_shmem
411.75 ? 58% +76.8% 728.00 ? 32% +59.2% 655.50 ? 50% numa-vmstat.node0.nr_zone_active_anon
36087 ? 2% -31.0% 24915 ? 43% +7.0% 38606 ? 27% numa-vmstat.node0.nr_zone_inactive_anon
24265 ? 3% +51.3% 36707 ? 29% -12.2% 21315 ? 47% numa-vmstat.node1.nr_anon_pages
25441 ? 2% +44.9% 36858 ? 29% -9.9% 22912 ? 47% numa-vmstat.node1.nr_inactive_anon
537.25 ? 20% +22.8% 659.50 ? 10% +14.5% 615.00 ? 21% numa-vmstat.node1.nr_page_table_pages
25441 ? 2% +44.9% 36858 ? 29% -9.9% 22912 ? 47% numa-vmstat.node1.nr_zone_inactive_anon
1649 ? 58% +76.7% 2913 ? 32% +59.0% 2621 ? 50% numa-meminfo.node0.Active
1649 ? 58% +76.7% 2913 ? 32% +59.0% 2621 ? 50% numa-meminfo.node0.Active(anon)
137223 ? 2% -35.6% 88410 ? 48% +8.6% 148973 ? 26% numa-meminfo.node0.AnonPages
164997 ? 9% -28.4% 118095 ? 42% +6.9% 176340 ? 23% numa-meminfo.node0.AnonPages.max
144353 ? 2% -31.0% 99656 ? 43% +7.0% 154424 ? 27% numa-meminfo.node0.Inactive
144353 ? 2% -31.0% 99656 ? 43% +7.0% 154424 ? 27% numa-meminfo.node0.Inactive(anon)
8937 ? 51% +60.3% 14328 ? 7% -7.7% 8251 ? 51% numa-meminfo.node0.Shmem
97072 ? 3% +51.3% 146858 ? 29% -12.2% 85274 ? 47% numa-meminfo.node1.AnonPages
127410 ? 5% +43.2% 182468 ? 16% -1.9% 124986 ? 42% numa-meminfo.node1.AnonPages.max
101822 ? 2% +44.9% 147521 ? 29% -9.9% 91738 ? 47% numa-meminfo.node1.Inactive
101822 ? 2% +44.9% 147521 ? 29% -9.9% 91738 ? 47% numa-meminfo.node1.Inactive(anon)
2148 ? 20% +22.9% 2639 ? 10% +14.5% 2460 ? 21% numa-meminfo.node1.PageTables
24623 ? 5% -18.0% 20184 ? 15% -6.9% 22929 ? 15% softirqs.CPU0.RCU
15977 ? 9% +34.4% 21477 ? 22% +54.7% 24711 ? 15% softirqs.CPU13.RCU
30680 ? 40% -56.2% 13431 ? 60% -70.8% 8966 ? 44% softirqs.CPU13.SCHED
28877 ? 10% -30.6% 20051 ? 15% -24.2% 21887 ? 13% softirqs.CPU19.RCU
5693 ? 31% +402.3% 28595 ? 22% +154.6% 14496 ? 46% softirqs.CPU19.SCHED
5753 ? 14% +141.4% 13886 ? 87% +172.2% 15657 ? 51% softirqs.CPU2.SCHED
7252 ? 79% +239.9% 24653 ? 48% +189.1% 20968 ? 44% softirqs.CPU23.SCHED
42479 -24.7% 31999 ? 39% -25.9% 31488 ? 27% softirqs.CPU26.SCHED
21142 ? 15% -26.5% 15533 ? 11% +5.6% 22317 ? 17% softirqs.CPU27.RCU
20776 ? 38% -50.5% 10290 ? 58% +4.7% 21748 ? 35% softirqs.CPU3.SCHED
26618 ? 11% -35.3% 17214 ? 6% -33.5% 17689 ? 5% softirqs.CPU37.RCU
10894 ? 48% +175.5% 30012 ? 34% +237.2% 36734 ? 10% softirqs.CPU37.SCHED
17015 ? 4% +39.2% 23681 ? 7% +9.9% 18707 ? 21% softirqs.CPU43.RCU
29682 ? 10% -17.6% 24446 ? 23% -18.9% 24062 ? 9% softirqs.CPU6.RCU
21953 ? 20% +9.7% 24079 ? 24% -18.3% 17943 ? 23% softirqs.CPU7.RCU
3431 ? 89% -85.1% 512.25 ?109% -93.6% 220.75 ? 32% interrupts.38:PCI-MSI.2621444-edge.eth0-TxRx-3
348.50 ? 62% +152.7% 880.75 ? 27% -30.1% 243.50 ? 44% interrupts.40:PCI-MSI.2621446-edge.eth0-TxRx-5
50948 -0.6% 50655 +7.1% 54590 ? 6% interrupts.CAL:Function_call_interrupts
2579 ? 26% +32.3% 3412 ? 43% +58.3% 4082 ? 27% interrupts.CPU0.NMI:Non-maskable_interrupts
2579 ? 26% +32.3% 3412 ? 43% +58.3% 4082 ? 27% interrupts.CPU0.PMI:Performance_monitoring_interrupts
296.75 -3.4% 286.75 ? 7% -38.2% 183.50 ? 40% interrupts.CPU1.RES:Rescheduling_interrupts
737.25 +8.7% 801.75 ? 13% +92.5% 1419 ? 73% interrupts.CPU11.CAL:Function_call_interrupts
1697 ? 63% -53.1% 796.75 ? 13% -55.7% 751.50 interrupts.CPU13.CAL:Function_call_interrupts
89.75 ? 36% +220.3% 287.50 ? 20% +195.3% 265.00 ? 10% interrupts.CPU13.RES:Rescheduling_interrupts
745.75 ? 3% +104.6% 1526 ? 69% +52.7% 1138 ? 61% interrupts.CPU19.CAL:Function_call_interrupts
293.00 ? 5% -60.0% 117.25 ? 47% -24.1% 222.25 ? 22% interrupts.CPU19.RES:Rescheduling_interrupts
778.50 ? 9% +123.7% 1741 ? 64% +3.3% 804.50 ? 10% interrupts.CPU22.CAL:Function_call_interrupts
670.00 ? 22% +40.2% 939.50 ? 49% +84.6% 1236 ? 63% interrupts.CPU23.CAL:Function_call_interrupts
283.50 ? 7% -47.7% 148.25 ? 64% -38.9% 173.25 ? 38% interrupts.CPU23.RES:Rescheduling_interrupts
6450 ? 29% -38.0% 4000 ? 4% +8.2% 6977 ? 29% interrupts.CPU24.NMI:Non-maskable_interrupts
6450 ? 29% -38.0% 4000 ? 4% +8.2% 6977 ? 29% interrupts.CPU24.PMI:Performance_monitoring_interrupts
2505 ? 24% +100.2% 5015 ? 45% +166.6% 6679 ? 26% interrupts.CPU25.NMI:Non-maskable_interrupts
2505 ? 24% +100.2% 5015 ? 45% +166.6% 6679 ? 26% interrupts.CPU25.PMI:Performance_monitoring_interrupts
2012 ? 56% -57.6% 852.75 ? 6% -48.0% 1047 ? 35% interrupts.CPU26.CAL:Function_call_interrupts
71.50 ? 12% +73.4% 124.00 ? 72% +106.3% 147.50 ? 49% interrupts.CPU26.RES:Rescheduling_interrupts
4198 ? 54% +5.7% 4438 ? 51% +41.8% 5952 ? 40% interrupts.CPU27.NMI:Non-maskable_interrupts
4198 ? 54% +5.7% 4438 ? 51% +41.8% 5952 ? 40% interrupts.CPU27.PMI:Performance_monitoring_interrupts
184.25 ? 37% -47.9% 96.00 ? 49% -6.5% 172.25 ? 27% interrupts.CPU27.RES:Rescheduling_interrupts
0.50 ?100% +64250.0% 321.75 ?170% +500.0% 3.00 ?115% interrupts.CPU28.TLB:TLB_shootdowns
3431 ? 89% -85.1% 512.25 ?109% -93.6% 220.75 ? 32% interrupts.CPU29.38:PCI-MSI.2621444-edge.eth0-TxRx-3
5982 ? 40% -21.5% 4695 ? 46% -35.1% 3881 ? 64% interrupts.CPU3.NMI:Non-maskable_interrupts
5982 ? 40% -21.5% 4695 ? 46% -35.1% 3881 ? 64% interrupts.CPU3.PMI:Performance_monitoring_interrupts
348.50 ? 62% +152.7% 880.75 ? 27% -30.1% 243.50 ? 44% interrupts.CPU31.40:PCI-MSI.2621446-edge.eth0-TxRx-5
156.50 ? 51% -51.3% 76.25 ? 59% +9.1% 170.75 ? 48% interrupts.CPU33.RES:Rescheduling_interrupts
883.50 ? 18% -23.8% 673.25 ? 22% -2.2% 863.75 ? 12% interrupts.CPU36.CAL:Function_call_interrupts
7492 ? 13% -45.6% 4073 ? 63% -40.2% 4483 ? 27% interrupts.CPU37.NMI:Non-maskable_interrupts
7492 ? 13% -45.6% 4073 ? 63% -40.2% 4483 ? 27% interrupts.CPU37.PMI:Performance_monitoring_interrupts
250.50 ? 19% -52.5% 119.00 ? 50% -76.0% 60.00 ? 49% interrupts.CPU37.RES:Rescheduling_interrupts
772.50 ? 2% +2.0% 787.75 ? 10% +346.2% 3447 ?127% interrupts.CPU40.CAL:Function_call_interrupts
4688 ? 27% +63.5% 7667 ? 15% +14.0% 5345 ? 38% interrupts.CPU40.NMI:Non-maskable_interrupts
4688 ? 27% +63.5% 7667 ? 15% +14.0% 5345 ? 38% interrupts.CPU40.PMI:Performance_monitoring_interrupts
96.75 ? 92% +135.1% 227.50 ? 22% +29.5% 125.25 ? 46% interrupts.CPU43.RES:Rescheduling_interrupts
2932 ? 36% +73.4% 5084 ? 21% +24.7% 3656 ? 55% interrupts.CPU47.NMI:Non-maskable_interrupts
2932 ? 36% +73.4% 5084 ? 21% +24.7% 3656 ? 55% interrupts.CPU47.PMI:Performance_monitoring_interrupts
57.50 ? 78% +250.4% 201.50 ? 42% +251.7% 202.25 ? 17% interrupts.CPU47.RES:Rescheduling_interrupts
4207 ? 61% +86.0% 7827 ? 11% +48.7% 6258 ? 33% interrupts.CPU8.NMI:Non-maskable_interrupts
4207 ? 61% +86.0% 7827 ? 11% +48.7% 6258 ? 33% interrupts.CPU8.PMI:Performance_monitoring_interrupts
0.18 ? 60% -36.2% 0.11 ? 9% -39.0% 0.11 ? 4% perf-stat.i.MPKI
1.089e+10 -2.3% 1.064e+10 -4.8% 1.036e+10 perf-stat.i.branch-instructions
1.62 +0.7 2.34 +0.8 2.40 perf-stat.i.branch-miss-rate%
1.741e+08 +42.3% 2.476e+08 +42.2% 2.475e+08 perf-stat.i.branch-misses
2.70 -0.1 2.65 ? 6% +0.2 2.95 ? 3% perf-stat.i.cache-miss-rate%
5228328 +4.0% 5436325 ? 8% -4.5% 4992245 ? 2% perf-stat.i.cache-references
1.36 +3.3% 1.41 +5.5% 1.44 perf-stat.i.cpi
52.10 +0.9% 52.55 +1.8% 53.04 perf-stat.i.cpu-migrations
1.233e+08 ? 3% -7.1% 1.146e+08 +1.6% 1.253e+08 ? 11% perf-stat.i.dTLB-load-misses
2.38e+10 -3.3% 2.302e+10 -4.5% 2.273e+10 perf-stat.i.dTLB-loads
57501510 -4.9% 54711717 -4.6% 54852849 perf-stat.i.dTLB-store-misses
1.828e+10 -3.7% 1.761e+10 -4.3% 1.75e+10 perf-stat.i.dTLB-stores
98.97 -2.9 96.02 ? 2% -29.3 69.69 perf-stat.i.iTLB-load-miss-rate%
29795797 ? 4% -5.0% 28320171 -5.2% 28254639 perf-stat.i.iTLB-load-misses
299268 ? 2% +298.1% 1191476 ? 50% +4062.6% 12457396 ? 4% perf-stat.i.iTLB-loads
5.335e+10 -3.7% 5.138e+10 -5.7% 5.029e+10 perf-stat.i.instructions
0.74 -3.7% 0.71 -5.7% 0.70 perf-stat.i.ipc
0.20 ? 8% +12.1% 0.23 +2.7% 0.21 ? 9% perf-stat.i.major-faults
1104 -3.2% 1069 -4.5% 1055 perf-stat.i.metric.M/sec
66981 +4.3% 69845 ? 6% +10.1% 73725 ? 4% perf-stat.i.node-load-misses
84278 ? 2% +7.2% 90313 ? 6% +9.8% 92543 ? 5% perf-stat.i.node-loads
72308 +2.3% 73975 ? 2% +1.5% 73361 perf-stat.i.node-stores
0.10 +7.9% 0.11 ? 8% +1.3% 0.10 ? 3% perf-stat.overall.MPKI
1.60 +0.7 2.33 +0.8 2.39 perf-stat.overall.branch-miss-rate%
3.60 ? 6% -0.1 3.45 ? 7% +0.3 3.88 ? 2% perf-stat.overall.cache-miss-rate%
1.35 +4.1% 1.41 +6.2% 1.44 perf-stat.overall.cpi
99.00 -3.0 95.98 ? 2% -29.6 69.42 perf-stat.overall.iTLB-load-miss-rate%
0.74 -3.9% 0.71 -5.9% 0.70 perf-stat.overall.ipc
567203 +1.0% 572789 -1.2% 560464 perf-stat.overall.path-length
1.085e+10 -2.3% 1.06e+10 -4.8% 1.033e+10 perf-stat.ps.branch-instructions
1.735e+08 +42.3% 2.468e+08 +42.2% 2.467e+08 perf-stat.ps.branch-misses
5216268 +4.0% 5422673 ? 8% -4.5% 4979211 ? 2% perf-stat.ps.cache-references
51.99 +0.8% 52.43 +1.8% 52.92 perf-stat.ps.cpu-migrations
1.229e+08 ? 3% -7.1% 1.142e+08 +1.6% 1.249e+08 ? 12% perf-stat.ps.dTLB-load-misses
2.372e+10 -3.3% 2.294e+10 -4.5% 2.266e+10 perf-stat.ps.dTLB-loads
57306258 -4.9% 54525679 -4.6% 54668669 perf-stat.ps.dTLB-store-misses
1.822e+10 -3.7% 1.755e+10 -4.3% 1.744e+10 perf-stat.ps.dTLB-stores
29695158 ? 4% -5.0% 28224049 -5.2% 28159995 perf-stat.ps.iTLB-load-misses
298257 ? 2% +298.1% 1187498 ? 50% +4061.6% 12412241 ? 4% perf-stat.ps.iTLB-loads
5.317e+10 -3.7% 5.12e+10 -5.7% 5.012e+10 perf-stat.ps.instructions
0.20 ? 7% +12.0% 0.23 ? 2% +3.0% 0.21 ? 8% perf-stat.ps.major-faults
66882 +4.3% 69726 ? 6% +10.1% 73651 ? 4% perf-stat.ps.node-load-misses
84325 ? 2% +7.1% 90306 ? 6% +9.7% 92489 ? 5% perf-stat.ps.node-loads
1.613e+13 -3.9% 1.55e+13 -5.9% 1.518e+13 perf-stat.total.instructions
8.00 ? 14% -8.0 0.00 -8.0 0.00 perf-profile.calltrace.cycles-pp.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
7.38 ? 14% -7.4 0.00 -7.4 0.00 perf-profile.calltrace.cycles-pp.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
7.27 ? 14% -7.3 0.00 -7.3 0.00 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
6.71 ? 12% -0.7 5.98 ? 13% -0.7 6.03 ? 10% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_pwrite
4.93 ? 12% -0.6 4.29 ? 14% -0.5 4.40 ? 11% perf-profile.calltrace.cycles-pp.shmem_getpage_gfp.shmem_write_begin.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
5.81 ? 13% -0.6 5.22 ? 14% -0.6 5.17 ? 11% perf-profile.calltrace.cycles-pp.shmem_write_begin.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
3.50 ? 14% -0.5 3.03 ? 13% -0.4 3.13 ? 11% perf-profile.calltrace.cycles-pp.shmem_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
0.69 ? 14% -0.4 0.29 ?100% -0.5 0.14 ?173% perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.new_sync_write.vfs_write.ksys_pwrite64
3.44 ? 12% -0.4 3.06 ? 14% -0.4 3.05 ? 12% perf-profile.calltrace.cycles-pp.find_lock_entry.shmem_getpage_gfp.shmem_write_begin.generic_perform_write.__generic_file_write_iter
0.62 ? 15% -0.3 0.30 ?101% -0.2 0.43 ? 59% perf-profile.calltrace.cycles-pp.unlock_page.shmem_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
0.85 ? 8% -0.2 0.66 ? 15% -0.1 0.71 ? 10% perf-profile.calltrace.cycles-pp.__fget_light.ksys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
0.84 ? 14% -0.1 0.71 ? 14% -0.1 0.72 ? 8% perf-profile.calltrace.cycles-pp.set_page_dirty.shmem_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
0.91 ? 11% -0.1 0.79 ? 12% -0.1 0.82 ? 10% perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.new_sync_write.vfs_write
0.68 ? 15% -0.1 0.58 ? 13% -0.1 0.57 ? 9% perf-profile.calltrace.cycles-pp.page_mapping.set_page_dirty.shmem_write_end.generic_perform_write.__generic_file_write_iter
0.00 +0.0 0.00 +1.0 1.02 ? 11% perf-profile.calltrace.cycles-pp.__get_user_nocheck_1.iovec_fault_in_readable.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
0.00 +0.0 0.00 +1.2 1.17 ? 9% perf-profile.calltrace.cycles-pp.iovec_advance.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
0.00 +0.0 0.00 +2.1 2.13 ? 11% perf-profile.calltrace.cycles-pp.iovec_fault_in_readable.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
0.00 +0.0 0.00 +6.8 6.85 ? 10% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin.iovec_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
0.00 +0.0 0.00 +6.9 6.95 ? 10% perf-profile.calltrace.cycles-pp.copyin.iovec_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
0.00 +0.0 0.00 +8.2 8.17 ? 10% perf-profile.calltrace.cycles-pp.iovec_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
0.00 +1.0 1.01 ? 13% +0.0 0.00 perf-profile.calltrace.cycles-pp.__get_user_nocheck_1.xxx_fault_in_readable.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
0.00 +1.4 1.42 ? 12% +0.0 0.00 perf-profile.calltrace.cycles-pp.xxx_advance.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
0.00 +2.1 2.15 ? 13% +0.0 0.00 perf-profile.calltrace.cycles-pp.xxx_fault_in_readable.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
0.00 +6.8 6.82 ? 13% +0.0 0.00 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin.xxx_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
0.00 +6.9 6.92 ? 13% +0.0 0.00 perf-profile.calltrace.cycles-pp.copyin.xxx_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
0.00 +8.1 8.09 ? 14% +0.0 0.00 perf-profile.calltrace.cycles-pp.xxx_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
8.03 ? 14% -8.0 0.00 -8.0 0.00 perf-profile.children.cycles-pp.iov_iter_copy_from_user_atomic
7.55 ? 12% -0.8 6.75 ? 13% -0.8 6.79 ? 10% perf-profile.children.cycles-pp.syscall_return_via_sysret
4.99 ? 12% -0.6 4.34 ? 14% -0.5 4.45 ? 11% perf-profile.children.cycles-pp.shmem_getpage_gfp
5.84 ? 13% -0.6 5.22 ? 14% -0.6 5.20 ? 11% perf-profile.children.cycles-pp.shmem_write_begin
3.53 ? 13% -0.5 3.07 ? 13% -0.4 3.17 ? 11% perf-profile.children.cycles-pp.shmem_write_end
3.48 ? 12% -0.4 3.09 ? 14% -0.4 3.09 ? 12% perf-profile.children.cycles-pp.find_lock_entry
0.85 ? 8% -0.2 0.66 ? 15% -0.1 0.71 ? 10% perf-profile.children.cycles-pp.__fget_light
0.69 ? 14% -0.2 0.52 ? 15% -0.2 0.48 ? 9% perf-profile.children.cycles-pp.up_write
0.62 ? 13% -0.2 0.46 ? 14% -0.2 0.47 ? 12% perf-profile.children.cycles-pp.apparmor_file_permission
0.86 ? 14% -0.1 0.74 ? 14% -0.1 0.74 ? 8% perf-profile.children.cycles-pp.set_page_dirty
0.94 ? 11% -0.1 0.82 ? 13% -0.1 0.85 ? 10% perf-profile.children.cycles-pp.file_update_time
0.51 ? 12% -0.1 0.40 ? 14% +0.0 0.52 ? 11% perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
0.71 ? 15% -0.1 0.60 ? 13% -0.1 0.60 ? 9% perf-profile.children.cycles-pp.page_mapping
0.55 ? 12% -0.1 0.47 ? 12% -0.0 0.50 ? 9% perf-profile.children.cycles-pp.current_time
0.62 ? 14% -0.1 0.55 ? 13% -0.1 0.56 ? 13% perf-profile.children.cycles-pp.unlock_page
0.24 ? 13% -0.0 0.20 ? 16% -0.0 0.22 ? 12% perf-profile.children.cycles-pp.timestamp_truncate
0.18 ? 11% -0.0 0.14 ? 15% -0.0 0.18 ? 12% perf-profile.children.cycles-pp.file_remove_privs
0.42 ? 13% -0.0 0.39 ? 14% -0.1 0.36 ? 13% perf-profile.children.cycles-pp.testcase
0.00 +0.0 0.00 +1.2 1.18 ? 9% perf-profile.children.cycles-pp.iovec_advance
0.00 +0.0 0.00 +2.2 2.21 ? 11% perf-profile.children.cycles-pp.iovec_fault_in_readable
0.00 +0.0 0.00 +8.2 8.20 ? 10% perf-profile.children.cycles-pp.iovec_copy_from_user_atomic
0.21 ? 17% +0.1 0.28 ? 16% +0.1 0.29 ? 10% perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
0.55 ? 14% +0.3 0.87 ? 15% +0.3 0.89 ? 13% perf-profile.children.cycles-pp.__x86_retpoline_rax
0.00 +1.4 1.42 ? 12% +0.0 0.00 perf-profile.children.cycles-pp.xxx_advance
0.00 +2.2 2.22 ? 13% +0.0 0.00 perf-profile.children.cycles-pp.xxx_fault_in_readable
0.00 +8.1 8.12 ? 14% +0.0 0.00 perf-profile.children.cycles-pp.xxx_copy_from_user_atomic
7.52 ? 12% -0.8 6.72 ? 13% -0.8 6.77 ? 10% perf-profile.self.cycles-pp.syscall_return_via_sysret
1.02 ? 16% -0.2 0.82 ? 12% -0.1 0.92 ? 10% perf-profile.self.cycles-pp.shmem_getpage_gfp
0.82 ? 8% -0.2 0.63 ? 15% -0.1 0.68 ? 10% perf-profile.self.cycles-pp.__fget_light
0.66 ? 14% -0.2 0.49 ? 15% -0.2 0.46 ? 8% perf-profile.self.cycles-pp.up_write
0.54 ? 15% -0.2 0.39 ? 14% -0.1 0.40 ? 12% perf-profile.self.cycles-pp.apparmor_file_permission
0.59 ? 13% -0.1 0.46 ? 13% -0.1 0.45 ? 9% perf-profile.self.cycles-pp.ksys_pwrite64
0.50 ? 12% -0.1 0.40 ? 13% -0.0 0.47 ? 12% perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited
0.67 ? 15% -0.1 0.57 ? 12% -0.1 0.57 ? 9% perf-profile.self.cycles-pp.page_mapping
0.71 ? 17% -0.1 0.63 ? 13% -0.1 0.60 ? 14% perf-profile.self.cycles-pp.security_file_permission
0.24 ? 15% -0.0 0.19 ? 15% -0.0 0.22 ? 12% perf-profile.self.cycles-pp.timestamp_truncate
0.20 ? 13% -0.0 0.17 ? 12% -0.0 0.18 ? 10% perf-profile.self.cycles-pp.current_time
0.00 +0.0 0.00 +1.1 1.05 ? 9% perf-profile.self.cycles-pp.iovec_advance
0.00 +0.0 0.00 +1.2 1.17 ? 12% perf-profile.self.cycles-pp.iovec_fault_in_readable
0.00 +0.0 0.00 +1.2 1.19 ? 10% perf-profile.self.cycles-pp.iovec_copy_from_user_atomic
0.82 ? 15% +0.0 0.83 ? 12% -0.1 0.71 ? 10% perf-profile.self.cycles-pp.shmem_write_begin
0.12 ? 14% +0.1 0.19 ? 14% +0.1 0.20 ? 7% perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
0.43 ? 14% +0.3 0.68 ? 15% +0.3 0.69 ? 15% perf-profile.self.cycles-pp.__x86_retpoline_rax
0.00 +1.1 1.14 ? 15% +0.0 0.00 perf-profile.self.cycles-pp.xxx_copy_from_user_atomic
0.00 +1.2 1.21 ? 12% +0.0 0.00 perf-profile.self.cycles-pp.xxx_fault_in_readable
0.00 +1.3 1.28 ? 12% +0.0 0.00 perf-profile.self.cycles-pp.xxx_advance

>
> Thanks,
> David
>

2020-12-07 13:25:44

by David Howells

[permalink] [raw]
Subject: Re: [iov_iter] 9bd0e337c6: will-it-scale.per_process_ops -4.8% regression

Oliver Sang <[email protected]> wrote:

> > Out of interest, would it be possible for you to run this on the tail of the
> > series on the same hardware?
>
> Sorry for the late reply. Below is the result with the tail of the series added:
> * ded69a6991fe0 (linux-review/David-Howells/RFC-iov_iter-Switch-to-using-an-ops-table/20201121-222344) iov_iter: Remove iterate_all_kinds() and iterate_and_advance()

Thanks very much for doing that!

David

2020-12-11 11:05:35

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 01/29] iov_iter: Switch to using a table of operations

On Sat, Nov 21, 2020 at 10:21:17AM -0800, Linus Torvalds wrote:
> So I think conceptually this is the right thing to do, but I have a
> couple of worries:
>
> - do we really need all those different versions? I'm thinking
> "iter_full" versions in particular. They I think the iter_full version
> could just be wrappers that call the regular iter thing and verify the
> end result is full (and revert if not). No?

Umm... Not sure - iov_iter_revert() is not exactly light. OTOH, it's
on a slow path... Other variants:
* save a local copy, run the normal variant on the iter, then copy
the saved state back on failure
* make a local copy, run the normal variant on _that_, then
copy it back on success

Note that the entire thing is 5 words, and we end up reading all of
them anyway, so I wouldn't bet on which variant ends up being faster -
that would need testing to compare.

I would certainly like to get rid of the duplication there, especially
if we are going to add copy_to_iter_full() and friends (there are
use cases for those).
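
For instance, the first variant as a wrapper - a minimal sketch,
assuming the normal variant returns the number of bytes it copied:

bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
{
        struct iov_iter save = *i;      /* all five words */

        if (copy_from_iter(addr, bytes, i) == bytes)
                return true;
        *i = save;      /* short copy: put the saved state back */
        return false;
}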

> - I worry a bit about the indirect call overhead and spectre v2.
>
> So yeah, it would be good to have benchmarks to make sure this
> doesn't regress for some simple case.
>
> Other than those things, my initial reaction is "this does seem cleaner".

It does seem cleaner, all right, but that stuff is on fairly hot paths.
And I didn't want to mix the overhead of indirect calls into the picture,
so it turned into cascades of ifs with rather vile macros to keep the
size down.

It looks like the cost of indirects is noticeable. OTOH, there are
other iov_iter patches floating around, hopefully getting better
code generation. Let's see how much those give, and if they win
considerably more than these several percents, revisit this series.

2020-12-11 11:39:06

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 00/29] RFC: iov_iter: Switch to using an ops table

On Sat, Nov 21, 2020 at 02:13:21PM +0000, David Howells wrote:
> I had a go switching the iov_iter stuff away from using a type bitmask to
> using an ops table to get rid of the if-if-if-if chains that are all over
> the place. After I pushed it, someone pointed me at Pavel's two patches.
>
> I have another iterator class that I want to add - which would lengthen the
> if-if-if-if chains. A lot of the time, there's a conditional clause at the
> beginning of a function that just jumps off to a type-specific handler or
> to reject the operation for that type. An ops table can just point to that
> instead.

So, given the performance problem, how about turning this inside out?

struct iov_step {
        union {
                void *kaddr;
                void __user *uaddr;
        };
        unsigned int len;
        bool user_addr;
        bool kmap;
        struct page *page;
};

bool iov_iterate(struct iov_step *step, struct iov_iter *i, size_t max)
{
        /* Drop whatever mapping the previous step left behind (a fuller
         * version would also clear step->page / step->kmap here). */
        if (step->page)
                kunmap(step->page);
        else if (step->kmap)
                kunmap_atomic(step->kaddr);

        if (max == 0)
                return false;

        if (i->type & ITER_IOVEC) {
                step->user_addr = true;
                step->uaddr = i->iov.iov_base + i->iov_offset;
                /* ... set len from the current segment ... */
                return true;
        }
        if (i->type & ITER_BVEC) {
                /* ... get the page ... */
        } else if (i->type & ITER_KVEC) {
                /* ... get the page ... */
        } else {
                /* ... */
        }

        /* ... kmap or kmap_atomic as appropriate ... */
        /* ... set kaddr & len ... */

        return true;
}

size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
{
        struct iov_step step = {};
        size_t total = bytes;

        while (iov_iterate(&step, i, bytes)) {
                if (step.user_addr)
                        copy_from_user(addr, step.uaddr, step.len);
                else
                        memcpy(addr, step.kaddr, step.len);
                addr += step.len;
                bytes -= step.len;
        }
        return total - bytes;
}
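
The other direction would presumably just mirror it - again only a
sketch, under the same assumptions as above:

size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
{
        struct iov_step step = {};
        size_t total = bytes;

        while (iov_iterate(&step, i, bytes)) {
                if (step.user_addr)
                        copy_to_user(step.uaddr, addr, step.len);
                else
                        memcpy(step.kaddr, addr, step.len);
                addr += step.len;
                bytes -= step.len;
        }
        return total - bytes;
}

The point is that the per-type knowledge stays in one place
(iov_iterate()) and each operation is just a trivial loop body - no
indirect calls anywhere.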