2023-06-05 13:12:01

by David Howells

[permalink] [raw]
Subject: [PATCH net-next v4 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS

Here are patches to do the following:

(1) Block MSG_SENDPAGE_* flags from leaking into ->sendmsg() from
userspace, whilst allowing splice_to_socket() to pass them in.

(2) Allow MSG_SPLICE_PAGES to be passed into tls_*_sendmsg(). Until
support is added, it will be ignored and a splice-driven sendmsg()
will be treated like a normal sendmsg(). TCP, UDP, AF_UNIX and
Chelsio-TLS already handle the flag in net-next.

(3) Replace a chain of functions to splice-to-sendpage with a single
function to splice via sendmsg() with MSG_SPLICE_PAGES. This allows a
bunch of pages to be spliced from a pipe in a single call using a
bio_vec[] and pushes the main processing loop down into the bowels of
the protocol driver rather than repeatedly calling in with a page at a
time.

(4) Provide a ->splice_eof() op[2] that allows splice to signal to its
output that the input observed a premature EOF and that the caller
didn't flag SPLICE_F_MORE, thereby allowing a corked socket to be
flushed. This attempts to maintain the current behaviour. It is also
not called if we didn't manage to read any data and so didn't called
the actor function.

This needs routing though several layers to get it down to the network
protocol.

[!] Note that I chose not to pass in any flags - I'm not sure it's
particularly useful to pass in the splice flags; I also elected
not to return any error code - though we might actually want to do
that.

(5) Provide tls_{device,sw}_splice_eof() to flush a pending TLS record if
there is one.

(6) Alter the behaviour of sendfile() and fix SPLICE_F_MORE/MSG_MORE
signalling[1] such SPLICE_F_MORE is always signalled until we have
read sufficient data to finish the request. If we get a zero-length
before we've managed to splice sufficient data, we now leave the
socket expecting more data and leave it to userspace to deal with it.

(7) Make AF_TLS handle the MSG_SPLICE_PAGES internal sendmsg flag.
MSG_SPLICE_PAGES is an internal hint that tells the protocol that it
should splice the pages supplied if it can. Its sendpage
implementations are then turned into wrappers around that.


I've pushed the patches here also:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-2-tls

David

Changes
=======
ver #4)
- Switch to using ->splice_eof() to signal premature EOF to the splice
output[2].

ver #3)
- Include the splice-to-socket rewrite patch.
- Fix SPLICE_F_MORE/MSG_MORE signalling.
- Allow AF_TLS to accept sendmsg() with MSG_SPLICE_PAGES before it is
handled.
- Allow a zero-length send() to a TLS socket to flush an outstanding
record.
- Address TLS kselftest failure.

ver #2)
- Dropped the slab data copying.
- "rls_" should be "tls_".
- Attempted to fix splice_direct_to_actor().
- Blocked MSG_SENDPAGE_* from being set by userspace.

Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ [2]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/[email protected]/ # v1

David Howells (11):
net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg
splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()
splice, net: Add a splice_eof op to file-ops and socket-ops
tls/sw: Use splice_eof() to flush
tls/device: Use splice_eof() to flush
splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()
tls/sw: Support MSG_SPLICE_PAGES
tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
tls/device: Support MSG_SPLICE_PAGES
tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES

fs/splice.c | 207 ++++++++++++++++++++++++++-------
include/linux/fs.h | 3 +-
include/linux/net.h | 1 +
include/linux/socket.h | 4 +-
include/linux/splice.h | 3 +
include/net/sock.h | 1 +
net/socket.c | 36 ++----
net/tls/tls.h | 2 +
net/tls/tls_device.c | 110 +++++++++---------
net/tls/tls_main.c | 4 +
net/tls/tls_sw.c | 253 ++++++++++++++++++++++-------------------
11 files changed, 385 insertions(+), 239 deletions(-)



2023-06-05 13:12:09

by David Howells

[permalink] [raw]
Subject: [PATCH net-next v4 04/11] splice, net: Add a splice_eof op to file-ops and socket-ops

Add an optional method, ->splice_eof(), to allow splice to indicate the
premature termination of a splice to struct file_operations and struct
proto_ops.

This is called if sendfile() or splice() encounters all of the following
conditions inside splice_direct_to_actor():

(1) the user did not set SPLICE_F_MORE (splice only), and

(2) an EOF condition occurred (->splice_read() returned 0), and

(3) we haven't read enough to fulfill the request (ie. len > 0 still), and

(4) we have already spliced at least one byte.

A further patch will modify the behaviour of SPLICE_F_MORE to always be
passed to the actor if either the user set it or we haven't yet read
sufficient data to fulfill the request.

Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Al Viro <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Jan Kara <[email protected]>
cc: Jeff Layton <[email protected]>
cc: David Hildenbrand <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Chuck Lever <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
---
fs/splice.c | 31 ++++++++++++++++++++++++++++++-
include/linux/fs.h | 1 +
include/linux/net.h | 1 +
include/linux/splice.h | 1 +
include/net/sock.h | 1 +
net/socket.c | 10 ++++++++++
6 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index 9b1d43c0c562..3063f9a3ba62 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -969,6 +969,17 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
return out->f_op->splice_write(pipe, out, ppos, len, flags);
}

+/*
+ * Indicate to the caller that there was a premature EOF when reading from the
+ * source and the caller didn't indicate they would be sending more data after
+ * this.
+ */
+static void do_splice_eof(struct splice_desc *sd)
+{
+ if (sd->splice_eof)
+ sd->splice_eof(sd);
+}
+
/*
* Attempt to initiate a splice from a file to a pipe.
*/
@@ -1068,7 +1079,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,

ret = do_splice_to(in, &pos, pipe, len, flags);
if (unlikely(ret <= 0))
- goto out_release;
+ goto read_failure;

read_len = ret;
sd->total_len = read_len;
@@ -1108,6 +1119,15 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
file_accessed(in);
return bytes;

+read_failure:
+ /*
+ * If the user did *not* set SPLICE_F_MORE *and* we didn't hit that
+ * "use all of len" case that cleared SPLICE_F_MORE, *and* we did a
+ * "->splice_in()" that returned EOF (ie zero) *and* we have sent at
+ * least 1 byte *then* we will also do the ->splice_eof() call.
+ */
+ if (ret == 0 && !more && len > 0 && bytes)
+ do_splice_eof(sd);
out_release:
/*
* If we did an incomplete transfer we must release
@@ -1136,6 +1156,14 @@ static int direct_splice_actor(struct pipe_inode_info *pipe,
sd->flags);
}

+static void direct_file_splice_eof(struct splice_desc *sd)
+{
+ struct file *file = sd->u.file;
+
+ if (file->f_op->splice_eof)
+ file->f_op->splice_eof(file);
+}
+
/**
* do_splice_direct - splices data directly between two files
* @in: file to splice from
@@ -1161,6 +1189,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
.flags = flags,
.pos = *ppos,
.u.file = out,
+ .splice_eof = direct_file_splice_eof,
.opos = opos,
};
long ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f8254c3acf83..e393f2550300 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1796,6 +1796,7 @@ struct file_operations {
int (*flock) (struct file *, int, struct file_lock *);
ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
+ void (*splice_eof)(struct file *file);
int (*setlease)(struct file *, long, struct file_lock **, void **);
long (*fallocate)(struct file *file, int mode, loff_t offset,
loff_t len);
diff --git a/include/linux/net.h b/include/linux/net.h
index b73ad8e3c212..8defc8f1d82e 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -210,6 +210,7 @@ struct proto_ops {
int offset, size_t size, int flags);
ssize_t (*splice_read)(struct socket *sock, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len, unsigned int flags);
+ void (*splice_eof)(struct socket *sock);
int (*set_peek_off)(struct sock *sk, int val);
int (*peek_len)(struct socket *sock);

diff --git a/include/linux/splice.h b/include/linux/splice.h
index 991ae318b6eb..4fab18a6e371 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -38,6 +38,7 @@ struct splice_desc {
struct file *file; /* file to read/write */
void *data; /* cookie */
} u;
+ void (*splice_eof)(struct splice_desc *sd); /* Unexpected EOF handler */
loff_t pos; /* file position */
loff_t *opos; /* sendfile: output position */
size_t num_spliced; /* number of bytes already spliced */
diff --git a/include/net/sock.h b/include/net/sock.h
index 656ea89f60ff..330b9c24ef70 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1267,6 +1267,7 @@ struct proto {
size_t len, int flags, int *addr_len);
int (*sendpage)(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
+ void (*splice_eof)(struct socket *sock);
int (*bind)(struct sock *sk,
struct sockaddr *addr, int addr_len);
int (*bind_add)(struct sock *sk,
diff --git a/net/socket.c b/net/socket.c
index c4d9104418c8..b778fc03c6e0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -130,6 +130,7 @@ static int sock_fasync(int fd, struct file *filp, int on);
static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags);
+static void sock_splice_eof(struct file *file);

#ifdef CONFIG_PROC_FS
static void sock_show_fdinfo(struct seq_file *m, struct file *f)
@@ -163,6 +164,7 @@ static const struct file_operations socket_file_ops = {
.fasync = sock_fasync,
.splice_write = splice_to_socket,
.splice_read = sock_splice_read,
+ .splice_eof = sock_splice_eof,
.show_fdinfo = sock_show_fdinfo,
};

@@ -1076,6 +1078,14 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
return sock->ops->splice_read(sock, ppos, pipe, len, flags);
}

+static void sock_splice_eof(struct file *file)
+{
+ struct socket *sock = file->private_data;
+
+ if (sock->ops->splice_eof)
+ sock->ops->splice_eof(sock);
+}
+
static ssize_t sock_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
struct file *file = iocb->ki_filp;


2023-06-05 13:12:35

by David Howells

[permalink] [raw]
Subject: [PATCH net-next v4 08/11] tls/sw: Support MSG_SPLICE_PAGES

Make TLS's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced from the source iterator if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <[email protected]>
cc: Chuck Lever <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
---

Notes:
ver #2)
- "rls_" should be "tls_".

net/tls/tls_sw.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 7a6bb670073f..b85f92be7c9d 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -929,6 +929,38 @@ static int tls_sw_push_pending_record(struct sock *sk, int flags)
&copied, flags);
}

+static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg,
+ struct sk_msg *msg_pl, size_t try_to_copy,
+ ssize_t *copied)
+{
+ struct page *page = NULL, **pages = &page;
+
+ do {
+ ssize_t part;
+ size_t off;
+ bool put = false;
+
+ part = iov_iter_extract_pages(&msg->msg_iter, &pages,
+ try_to_copy, 1, 0, &off);
+ if (part <= 0)
+ return part ?: -EIO;
+
+ if (WARN_ON_ONCE(!sendpage_ok(page))) {
+ iov_iter_revert(&msg->msg_iter, part);
+ return -EIO;
+ }
+
+ sk_msg_page_add(msg_pl, page, part, off);
+ sk_mem_charge(sk, part);
+ if (put)
+ put_page(page);
+ *copied += part;
+ try_to_copy -= part;
+ } while (try_to_copy && !sk_msg_full(msg_pl));
+
+ return 0;
+}
+
int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
@@ -1018,6 +1050,17 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
full_record = true;
}

+ if (try_to_copy && (msg->msg_flags & MSG_SPLICE_PAGES)) {
+ ret = tls_sw_sendmsg_splice(sk, msg, msg_pl,
+ try_to_copy, &copied);
+ if (ret < 0)
+ goto send_end;
+ tls_ctx->pending_open_record_frags = true;
+ if (full_record || eor || sk_msg_full(msg_pl))
+ goto copied;
+ continue;
+ }
+
if (!is_kvec && (full_record || eor) && !async_capable) {
u32 first = msg_pl->sg.end;

@@ -1080,8 +1123,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
/* Open records defined only if successfully copied, otherwise
* we would trim the sg but not reset the open record frags.
*/
- tls_ctx->pending_open_record_frags = true;
copied += try_to_copy;
+copied:
+ tls_ctx->pending_open_record_frags = true;
if (full_record || eor) {
ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
record_type, &copied,


2023-06-05 13:12:57

by David Howells

[permalink] [raw]
Subject: [PATCH net-next v4 02/11] tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg

Allow MSG_SPLICE_PAGES to be specified to sendmsg() but treat it as normal
sendmsg for now. This means the data will just be copied until
MSG_SPLICE_PAGES is handled.

Signed-off-by: David Howells <[email protected]>
cc: Chuck Lever <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
---
net/tls/tls_device.c | 3 ++-
net/tls/tls_sw.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a959572a816f..9ef766e41c7a 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -447,7 +447,8 @@ static int tls_push_data(struct sock *sk,
long timeo;

if (flags &
- ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST))
+ ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST |
+ MSG_SPLICE_PAGES))
return -EOPNOTSUPP;

if (unlikely(sk->sk_err))
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 6e6a7c37d685..cac1adc968e8 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -953,7 +953,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int pending;

if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
- MSG_CMSG_COMPAT))
+ MSG_CMSG_COMPAT | MSG_SPLICE_PAGES))
return -EOPNOTSUPP;

ret = mutex_lock_interruptible(&tls_ctx->tx_lock);