2023-06-02 15:13:36

by David Howells

Subject: [PATCH net-next v3 00/11] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS

Here are patches to do the following:

(1) Block MSG_SENDPAGE_* flags from leaking into ->sendmsg() from
userspace, whilst allowing splice_to_socket() to pass them in.

(2) Allow MSG_SPLICE_PAGES to be passed into tls_*_sendmsg(). Until
support is added, it will be ignored and a splice-driven sendmsg()
will be treated like a normal sendmsg(). TCP, UDP, AF_UNIX and
Chelsio-TLS already handle the flag in net-next.

(3) Allow tls/sw to be given a zero-length send()/sendto()/sendmsg()
without MSG_MORE set to allow userspace to flush the pending record.

(4) Replace a chain of functions to splice-to-sendpage with a single
function to splice via sendmsg() with MSG_SPLICE_PAGES. This allows a
bunch of pages to be spliced from a pipe in a single call using a
bio_vec[] and pushes the main processing loop down into the bowels of
the protocol driver rather than repeatedly calling in with a page at a
time.

(5) Alter the behaviour of sendfile() and fix SPLICE_F_MORE/MSG_MORE
signalling[1] such that SPLICE_F_MORE is always signalled until we have
read sufficient data to finish the request. If we get a zero-length read
before we've managed to splice sufficient data, we now leave the
socket expecting more data and leave it to userspace to deal with it.

(6) Address the now failing TLS multi_chunk_sendfile kselftest by putting
in a zero-length send() to end the record.

(7) Make AF_TLS handle the MSG_SPLICE_PAGES internal sendmsg flag.
MSG_SPLICE_PAGES is an internal hint that tells the protocol that it
should splice the pages supplied if it can. Its sendpage
implementations are then turned into wrappers around that.

(8) Provide some sample programs for driving AF_ALG (hash & encrypt), TCP,
TLS, UDP and AF_UNIX.

Here are some simple timings, taking the best timing for each out of
several runs. In the following table, samples added in the last patch were
used for the first five columns and the tls kselftest for the last:

Patches  unix-send  tcp-send  tcp-send  tls-send  tls-send  tls
                    (10G)     (lo)      (10G)     (lo)      kselftest
=======  =========  ========  ========  ========  ========  =========
none     0.516      0.469     0.492     3.121     3.082     1.152
splice   0.470      0.452     0.471     3.074     3.041     0.294
all      0.469      0.440     0.475     3.077     3.041     0.345

The times are all in seconds. The "none" row is with none of the patches
applied, "splice" is with the patches up to the splice-to-sendpage
replacement applied, and "all" is with all the patches applied. The "10G"
columns are for sending to a server on a different box over 10G Ethernet
and the "lo" columns are for sending to a server on the same box over the
loopback device.

I think the apparent improvement comes from cutting out a layer in the
splice stack and pushing more than one page in a single sendmsg(). The
improvement in the tls kselftest column is particularly marked (from
1.152s down to 0.294s with the splice rewrite).

The following sample and selftest commands were used:

    unix-sink /tmp/sock &               # server
    unix-send -ds 256M /tmp/sock        # client
    tcp-sink &                          # server
    tcp-send -ds 256M 127.0.0.1         # client - loopback
    tcp-send -ds 256M 192.168.6.1       # client - 10G ethernet
    tls-sink &                          # server
    tls-send -ds 256M 127.0.0.1         # client - loopback
    tls-send -ds 256M 192.168.6.1       # client - 10G ethernet
    tls -r tls.12_aes_gcm.multi_chunk_sendfile

where 256M is a 256MiB file to be read in its entirety unless otherwise
specified, -d indicates O_DIRECT, and -s asks for splice (if the input is
a pipe) or sendfile (if the input is not a pipe) to be used.


I've pushed the patches here also:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-2-tls

David

Changes
=======
ver #3)
- Include the splice-to-socket rewrite patch.
- Fix SPLICE_F_MORE/MSG_MORE signalling.
- Allow AF_TLS to accept sendmsg() with MSG_SPLICE_PAGES before it is
handled.
- Allow a zero-length send() to a TLS socket to flush an outstanding
record.
- Address TLS kselftest failure.

ver #2)
- Dropped the slab data copying.
- "rls_" should be "tls_".
- Attempted to fix splice_direct_to_actor().
- Blocked MSG_SENDPAGE_* from being set by userspace.

Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/[email protected]/ # v1

David Howells (11):
net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg
tls/sw: Use zero-length sendmsg() without MSG_MORE to flush
splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()
splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()
tls: Address behaviour change in multi_chunk_sendfile kselftest
tls/sw: Support MSG_SPLICE_PAGES
tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
tls/device: Support MSG_SPLICE_PAGES
tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES
net: Add samples for network I/O and splicing

fs/splice.c | 176 ++++++++++++++++++------
include/linux/fs.h | 2 -
include/linux/socket.h | 4 +-
include/linux/splice.h | 2 +
net/socket.c | 26 +---
net/tls/tls_device.c | 97 ++++++-------
net/tls/tls_sw.c | 217 +++++++++++-------------------
samples/Kconfig | 14 ++
samples/Makefile | 1 +
samples/net/Makefile | 13 ++
samples/net/alg-encrypt.c | 206 ++++++++++++++++++++++++++++
samples/net/alg-hash.c | 147 ++++++++++++++++++++
samples/net/splice-out.c | 147 ++++++++++++++++++++
samples/net/tcp-send.c | 177 ++++++++++++++++++++++++
samples/net/tcp-sink.c | 80 +++++++++++
samples/net/tls-send.c | 188 ++++++++++++++++++++++++++
samples/net/tls-sink.c | 104 ++++++++++++++
samples/net/udp-send.c | 156 +++++++++++++++++++++
samples/net/udp-sink.c | 84 ++++++++++++
samples/net/unix-send.c | 151 +++++++++++++++++++++
samples/net/unix-sink.c | 54 ++++++++
tools/testing/selftests/net/tls.c | 6 +-
22 files changed, 1792 insertions(+), 260 deletions(-)
create mode 100644 samples/net/Makefile
create mode 100644 samples/net/alg-encrypt.c
create mode 100644 samples/net/alg-hash.c
create mode 100644 samples/net/splice-out.c
create mode 100644 samples/net/tcp-send.c
create mode 100644 samples/net/tcp-sink.c
create mode 100644 samples/net/tls-send.c
create mode 100644 samples/net/tls-sink.c
create mode 100644 samples/net/udp-send.c
create mode 100644 samples/net/udp-sink.c
create mode 100644 samples/net/unix-send.c
create mode 100644 samples/net/unix-sink.c



2023-06-02 15:13:40

by David Howells

Subject: [PATCH net-next v3 05/11] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()

splice_direct_to_actor() doesn't manage SPLICE_F_MORE correctly[1] - and,
as a result, it incorrectly signals/fails to signal MSG_MORE when splicing
to a socket. The problem I'm seeing happens when a short splice occurs
because we got a short read due to hitting the EOF on a file: as the length
read (read_len) is less than the remaining size to be spliced (len),
SPLICE_F_MORE (and thus MSG_MORE) is set.

The issue is that, for the moment, we have no way to know *why* the short
read occurred and so can't make a good decision on whether we *should* keep
MSG_MORE set. Further, the argument can be made that it should be left to
userspace to decide how to handle it - userspace could perform some sort of
cancellation for example.

MSG_SENDPAGE_NOTLAST was added to work around this, but that is also set
incorrectly under some circumstances - for example if a short read fills a
single pipe_buffer, but the next read would return more (seqfile can do
this).

This was observed with the multi_chunk_sendfile tests in the tls kselftest
program. Some of those tests would hang and time out when the last chunk
of the file was smaller than the sendfile request size:

build/kselftest/net/tls -r tls.12_aes_gcm.multi_chunk_sendfile

This has been observed before[2] and worked around in AF_TLS[3].

Fix this by making splice_direct_to_actor() always signal SPLICE_F_MORE if
we haven't yet hit the requested operation size. SPLICE_F_MORE remains
signalled if the user passed it in to splice() but otherwise gets cleared
when we've read sufficient data to fulfill the request. Cleaning up after
a short splice is left to userspace.
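The rule above, as implemented by the hunks to splice_direct_to_actor() further down, reduces to a small predicate. The following is a hypothetical userspace model, not kernel code: signal_more() is an illustrative name, read_len is what the latest read into the pipe returned, len is how much was still wanted before that read, and caller_more says whether the caller passed SPLICE_F_MORE.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the patched rule in splice_direct_to_actor():
 * should SPLICE_F_MORE (and hence MSG_MORE) go out with the data just read?
 *   read_len:    bytes returned by the latest read into the pipe
 *   len:         bytes still wanted before that read
 *   caller_more: whether the caller passed SPLICE_F_MORE in */
bool signal_more(size_t read_len, size_t len, bool caller_more)
{
	if (caller_more)
		return true;	/* the caller's flag is always honoured */

	/* Otherwise MORE is cleared only once the request is fulfilled; a
	 * short read (for example, one caused by EOF) leaves it set. */
	return read_len < len;
}
```

For example, a 4096-byte request that hits EOF after 100 bytes gives signal_more(100, 4096, false) == true, which is why userspace must now flush such a record itself.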

[!] Note that this changes user-visible behaviour. It will cause the
multi_chunk_sendfile tests in the TLS kselftest to fail. This failure
in the testsuite will be addressed in a subsequent patch by making
userspace do a zero-length send().

It appears that SPLICE_F_MORE is only used by splice-to-socket.

Signed-off-by: David Howells <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Al Viro <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: Jan Kara <[email protected]>
cc: Jeff Layton <[email protected]>
cc: David Hildenbrand <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Chuck Lever <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]

Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/[email protected]/ [2]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a [3]
---
fs/splice.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 9b1d43c0c562..c71bd8e03469 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1052,13 +1052,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
*/
bytes = 0;
len = sd->total_len;
+
+ /* Don't block on output, we have to drain the direct pipe. */
flags = sd->flags;
+ sd->flags &= ~SPLICE_F_NONBLOCK;

/*
- * Don't block on output, we have to drain the direct pipe.
+ * We signal MORE until we've read sufficient data to fulfill the
+ * request and we keep signalling it if the caller set it.
*/
- sd->flags &= ~SPLICE_F_NONBLOCK;
more = sd->flags & SPLICE_F_MORE;
+ sd->flags |= SPLICE_F_MORE;

WARN_ON_ONCE(!pipe_empty(pipe->head, pipe->tail));

@@ -1074,14 +1078,12 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
sd->total_len = read_len;

/*
- * If more data is pending, set SPLICE_F_MORE
- * If this is the last data and SPLICE_F_MORE was not set
- * initially, clears it.
+ * If we now have sufficient data to fulfill the request then
+ * we clear SPLICE_F_MORE if it was not set initially.
*/
- if (read_len < len)
- sd->flags |= SPLICE_F_MORE;
- else if (!more)
+ if (read_len >= len && !more)
sd->flags &= ~SPLICE_F_MORE;
+
/*
* NOTE: nonblocking mode only applies to the input. We
* must not do the output in nonblocking mode as then we


2023-06-02 15:13:43

by David Howells

Subject: [PATCH net-next v3 03/11] tls/sw: Use zero-length sendmsg() without MSG_MORE to flush

Allow userspace to end a TLS record without supplying any data by calling
send()/sendto()/sendmsg() with no data and no MSG_MORE flag. This can be
used to flush a previous send/splice that had MSG_MORE or SPLICE_F_MORE set
or a sendfile() that was incomplete.

Without this, a zero-length send to tls-sw is just ignored. I think
tls-device will do the right thing without modification.
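The entry check added to tls_sw_sendmsg() can be modelled in isolation. This is a hypothetical sketch: the enum and tls_tx_classify() are illustrative names, and it assumes that eor is "MSG_MORE not set", which is how tls_sw_sendmsg() derives it (that line is outside the hunk shown). The "nothing" case reflects the statement above that a zero-length send to tls-sw was otherwise just ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>		/* MSG_MORE */

/* Hypothetical classification of a sendmsg() call on a tls-sw socket. */
enum tls_tx_action {
	TLS_TX_NOTHING,		/* zero-length send with MSG_MORE: no-op */
	TLS_TX_JUST_FLUSH,	/* zero-length send without MSG_MORE: flush */
	TLS_TX_COPY_DATA,	/* normal path: add data to the open record */
};

enum tls_tx_action tls_tx_classify(size_t data_left, int msg_flags)
{
	bool eor = !(msg_flags & MSG_MORE);	/* assumed, as in tls_sw_sendmsg() */

	if (data_left == 0)
		return eor ? TLS_TX_JUST_FLUSH : TLS_TX_NOTHING;
	return TLS_TX_COPY_DATA;
}
```

From userspace, the flush case is simply send(fd, "", 0, 0) on the kTLS socket.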

Signed-off-by: David Howells <[email protected]>
cc: Chuck Lever <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
---
net/tls/tls_sw.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index cac1adc968e8..6aa6d17888f5 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -945,7 +945,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
struct tls_rec *rec;
int required_size;
int num_async = 0;
- bool full_record;
+ bool full_record = false;
int record_room;
int num_zc = 0;
int orig_size;
@@ -971,6 +971,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
}
}

+ if (!msg_data_left(msg) && eor)
+ goto just_flush;
+
while (msg_data_left(msg)) {
if (sk->sk_err) {
ret = -sk->sk_err;
@@ -1082,6 +1085,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
*/
tls_ctx->pending_open_record_frags = true;
copied += try_to_copy;
+just_flush:
if (full_record || eor) {
ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
record_type, &copied,


2023-06-02 15:13:58

by David Howells

Subject: [PATCH net-next v3 02/11] tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg

Allow MSG_SPLICE_PAGES to be specified to sendmsg() but treat it as normal
sendmsg for now. This means the data will just be copied until
MSG_SPLICE_PAGES is handled.

Signed-off-by: David Howells <[email protected]>
cc: Chuck Lever <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
---
net/tls/tls_device.c | 3 ++-
net/tls/tls_sw.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a959572a816f..9ef766e41c7a 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -447,7 +447,8 @@ static int tls_push_data(struct sock *sk,
long timeo;

if (flags &
- ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST))
+ ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST |
+ MSG_SPLICE_PAGES))
return -EOPNOTSUPP;

if (unlikely(sk->sk_err))
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 6e6a7c37d685..cac1adc968e8 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -953,7 +953,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int pending;

if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
- MSG_CMSG_COMPAT))
+ MSG_CMSG_COMPAT | MSG_SPLICE_PAGES))
return -EOPNOTSUPP;

ret = mutex_lock_interruptible(&tls_ctx->tx_lock);


2023-06-02 15:16:06

by David Howells

Subject: [PATCH net-next v3 04/11] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()

Replace generic_splice_sendpage() + splice_from_pipe() + pipe_to_sendpage()
with a net-specific handler, splice_to_socket(), that calls sendmsg() with
MSG_SPLICE_PAGES set instead of calling ->sendpage().

MSG_MORE is used to indicate whether the sendmsg() is expected to be
followed by more data.

This allows multiple pipe-buffer pages to be passed in a single call in a
BVEC iterator, allowing the processing to be pushed down to a loop in the
protocol driver. This helps pave the way for passing multipage folios down
too.

Protocols that haven't been converted to handle MSG_SPLICE_PAGES yet should
just ignore it and do a normal sendmsg() for now - although that may be a
bit slower as it may copy everything.
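The batching and consumption steps of splice_to_socket() can be modelled in userspace as follows. This is a simplified, hypothetical sketch rather than the kernel code: struct seg stands in for a pipe_buffer, a plain array replaces the pipe ring, and the PAGE_SIZE clamp, pipe locking and buffer release are omitted. gather() collects up to BATCH non-empty segments (the patch uses a 16-entry bio_vec array) and makes the same MSG_MORE decision as the patch; consume() retires whatever the socket accepted from the front of the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>		/* MSG_MORE */

#define BATCH 16	/* the patch batches up to 16 bio_vecs per sendmsg() */

/* Hypothetical stand-in for a pipe_buffer: a window onto some data. */
struct seg {
	size_t offset;
	size_t len;
};

/* Gather up to BATCH non-empty segments, starting at index first, without
 * exceeding len bytes.  Stores the batch size in *bytes and returns the
 * msg_flags to use for the sendmsg() call. */
int gather(const struct seg *s, size_t first, size_t count, size_t len,
	   bool caller_more, size_t *bytes)
{
	size_t remain = len, n = 0, i = first;
	int flags = 0;

	while (i < count && n < BATCH && remain > 0) {
		size_t take;

		if (s[i].len == 0) {	/* skip empty buffers */
			i++;
			continue;
		}
		take = s[i].len < remain ? s[i].len : remain;
		remain -= take;
		n++;
		if (take == s[i].len)
			i++;		/* segment fully covered by this batch */
	}
	*bytes = len - remain;

	/* MORE if the caller asked for it, or if we still want data and
	 * more is already queued behind this batch. */
	if (caller_more || (remain > 0 && i < count))
		flags |= MSG_MORE;
	return flags;
}

/* Retire the bytes the socket accepted from the front of the queue. */
void consume(struct seg *s, size_t *first, size_t count, size_t sent)
{
	while (sent > 0) {
		size_t take;

		assert(*first < count);	/* cannot consume more than queued */
		take = s[*first].len < sent ? s[*first].len : sent;
		s[*first].offset += take;
		s[*first].len -= take;
		sent -= take;
		if (s[*first].len == 0)
			(*first)++;	/* the kernel releases the buffer here */
	}
}
```

Handing the socket a whole batch per call is what lets the protocol loop over the pages itself instead of being re-entered once per page, as the old ->sendpage() path was.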

Signed-off-by: David Howells <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
---
fs/splice.c | 158 +++++++++++++++++++++++++++++++++--------
include/linux/fs.h | 2 -
include/linux/splice.h | 2 +
net/socket.c | 26 +------
4 files changed, 131 insertions(+), 57 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 3e06611d19ae..9b1d43c0c562 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -33,6 +33,7 @@
#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/gfp.h>
+#include <linux/net.h>
#include <linux/socket.h>
#include <linux/sched/signal.h>

@@ -448,30 +449,6 @@ const struct pipe_buf_operations nosteal_pipe_buf_ops = {
};
EXPORT_SYMBOL(nosteal_pipe_buf_ops);

-/*
- * Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos'
- * using sendpage(). Return the number of bytes sent.
- */
-static int pipe_to_sendpage(struct pipe_inode_info *pipe,
- struct pipe_buffer *buf, struct splice_desc *sd)
-{
- struct file *file = sd->u.file;
- loff_t pos = sd->pos;
- int more;
-
- if (!likely(file->f_op->sendpage))
- return -EINVAL;
-
- more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
-
- if (sd->len < sd->total_len &&
- pipe_occupancy(pipe->head, pipe->tail) > 1)
- more |= MSG_SENDPAGE_NOTLAST;
-
- return file->f_op->sendpage(file, buf->page, buf->offset,
- sd->len, &pos, more);
-}
-
static void wakeup_pipe_writers(struct pipe_inode_info *pipe)
{
smp_mb();
@@ -652,7 +629,7 @@ static void splice_from_pipe_end(struct pipe_inode_info *pipe, struct splice_des
* Description:
* This function does little more than loop over the pipe and call
* @actor to do the actual moving of a single struct pipe_buffer to
- * the desired destination. See pipe_to_file, pipe_to_sendpage, or
+ * the desired destination. See pipe_to_file, pipe_to_sendmsg, or
* pipe_to_user.
*
*/
@@ -833,8 +810,9 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,

EXPORT_SYMBOL(iter_file_splice_write);

+#ifdef CONFIG_NET
/**
- * generic_splice_sendpage - splice data from a pipe to a socket
+ * splice_to_socket - splice data from a pipe to a socket
* @pipe: pipe to splice from
* @out: socket to write to
* @ppos: position in @out
@@ -846,13 +824,131 @@ EXPORT_SYMBOL(iter_file_splice_write);
* is involved.
*
*/
-ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, struct file *out,
- loff_t *ppos, size_t len, unsigned int flags)
+ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags)
{
- return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_sendpage);
-}
+ struct socket *sock = sock_from_file(out);
+ struct bio_vec bvec[16];
+ struct msghdr msg = {};
+ ssize_t ret;
+ size_t spliced = 0;
+ bool need_wakeup = false;
+
+ pipe_lock(pipe);
+
+ while (len > 0) {
+ unsigned int head, tail, mask, bc = 0;
+ size_t remain = len;
+
+ /*
+ * Check for signal early to make process killable when there
+ * are always buffers available
+ */
+ ret = -ERESTARTSYS;
+ if (signal_pending(current))
+ break;
+
+ while (pipe_empty(pipe->head, pipe->tail)) {
+ ret = 0;
+ if (!pipe->writers)
+ goto out;
+
+ if (spliced)
+ goto out;
+
+ ret = -EAGAIN;
+ if (flags & SPLICE_F_NONBLOCK)
+ goto out;
+
+ ret = -ERESTARTSYS;
+ if (signal_pending(current))
+ goto out;
+
+ if (need_wakeup) {
+ wakeup_pipe_writers(pipe);
+ need_wakeup = false;
+ }
+
+ pipe_wait_readable(pipe);
+ }
+
+ head = pipe->head;
+ tail = pipe->tail;
+ mask = pipe->ring_size - 1;
+
+ while (!pipe_empty(head, tail)) {
+ struct pipe_buffer *buf = &pipe->bufs[tail & mask];
+ size_t seg;

-EXPORT_SYMBOL(generic_splice_sendpage);
+ if (!buf->len) {
+ tail++;
+ continue;
+ }
+
+ seg = min_t(size_t, remain, buf->len);
+ seg = min_t(size_t, seg, PAGE_SIZE);
+
+ ret = pipe_buf_confirm(pipe, buf);
+ if (unlikely(ret)) {
+ if (ret == -ENODATA)
+ ret = 0;
+ break;
+ }
+
+ bvec_set_page(&bvec[bc++], buf->page, seg, buf->offset);
+ remain -= seg;
+ if (seg >= buf->len)
+ tail++;
+ if (bc >= ARRAY_SIZE(bvec))
+ break;
+ }
+
+ if (!bc)
+ break;
+
+ msg.msg_flags = MSG_SPLICE_PAGES;
+ if (flags & SPLICE_F_MORE)
+ msg.msg_flags |= MSG_MORE;
+ if (remain && pipe_occupancy(pipe->head, tail) > 0)
+ msg.msg_flags |= MSG_MORE;
+
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvec, bc,
+ len - remain);
+ ret = sock_sendmsg(sock, &msg);
+ if (ret <= 0)
+ break;
+
+ spliced += ret;
+ len -= ret;
+ tail = pipe->tail;
+ while (ret > 0) {
+ struct pipe_buffer *buf = &pipe->bufs[tail & mask];
+ size_t seg = min_t(size_t, ret, buf->len);
+
+ buf->offset += seg;
+ buf->len -= seg;
+ ret -= seg;
+
+ if (!buf->len) {
+ pipe_buf_release(pipe, buf);
+ tail++;
+ }
+ }
+
+ if (tail != pipe->tail) {
+ pipe->tail = tail;
+ if (pipe->files)
+ need_wakeup = true;
+ }
+ }
+
+out:
+ pipe_unlock(pipe);
+ if (need_wakeup)
+ wakeup_pipe_writers(pipe);
+ return spliced ?: ret;
+}
+#endif

static int warn_unsupported(struct file *file, const char *op)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21a981680856..f8254c3acf83 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2759,8 +2759,6 @@ extern ssize_t generic_file_splice_read(struct file *, loff_t *,
struct pipe_inode_info *, size_t, unsigned int);
extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
struct file *, loff_t *, size_t, unsigned int);
-extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
- struct file *out, loff_t *, size_t len, unsigned int flags);
extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
loff_t *opos, size_t len, unsigned int flags);

diff --git a/include/linux/splice.h b/include/linux/splice.h
index a55179fd60fc..991ae318b6eb 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -84,6 +84,8 @@ extern long do_splice(struct file *in, loff_t *off_in,

extern long do_tee(struct file *in, struct file *out, size_t len,
unsigned int flags);
+extern ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags);

/*
* for dynamic pipe sizing
diff --git a/net/socket.c b/net/socket.c
index 3df96e9ba4e2..c4d9104418c8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -57,6 +57,7 @@
#include <linux/mm.h>
#include <linux/socket.h>
#include <linux/file.h>
+#include <linux/splice.h>
#include <linux/net.h>
#include <linux/interrupt.h>
#include <linux/thread_info.h>
@@ -126,8 +127,6 @@ static long compat_sock_ioctl(struct file *file,
unsigned int cmd, unsigned long arg);
#endif
static int sock_fasync(int fd, struct file *filp, int on);
-static ssize_t sock_sendpage(struct file *file, struct page *page,
- int offset, size_t size, loff_t *ppos, int more);
static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags);
@@ -162,8 +161,7 @@ static const struct file_operations socket_file_ops = {
.mmap = sock_mmap,
.release = sock_close,
.fasync = sock_fasync,
- .sendpage = sock_sendpage,
- .splice_write = generic_splice_sendpage,
+ .splice_write = splice_to_socket,
.splice_read = sock_splice_read,
.show_fdinfo = sock_show_fdinfo,
};
@@ -1066,26 +1064,6 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
}
EXPORT_SYMBOL(kernel_recvmsg);

-static ssize_t sock_sendpage(struct file *file, struct page *page,
- int offset, size_t size, loff_t *ppos, int more)
-{
- struct socket *sock;
- int flags;
- int ret;
-
- sock = file->private_data;
-
- flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
- /* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */
- flags |= more;
-
- ret = kernel_sendpage(sock, page, offset, size, flags);
-
- if (trace_sock_send_length_enabled())
- call_trace_sock_send_length(sock->sk, ret, 0);
- return ret;
-}
-
static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags)


2023-06-02 15:17:11

by David Howells

Subject: [PATCH net-next v3 06/11] tls: Address behaviour change in multi_chunk_sendfile kselftest

The multi_chunk_sendfile tests in the TLS kselftest now fail because the
behaviour of sendfile()[*] changed when SPLICE_F_MORE signalling was fixed.
Now MSG_MORE is signalled to the socket until we have read sufficient data
to fulfill the request, which means that if we get a short read, MSG_MORE
is never seen to be dropped and the TLS record remains pending.

[*] This will also affect splice() if SPLICE_F_MORE isn't included in the
flags.

Fix the TLS multi_chunk_sendfile kselftest to attempt to flush the
outstanding TLS record if we get a short sendfile() by doing a zero-length
send() with MSG_MORE unset.
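The same pattern, generalised beyond the kselftest, can be sketched as a hypothetical helper; sendfile_chunked() is an illustrative name, not code from this series. It sends a file in sendfile() calls of at most chunk bytes and, if the last call returned less than it asked for (typically EOF), issues a zero-length send() without MSG_MORE so that a kTLS socket closes the pending record. On sockets with no pending record (TCP, AF_UNIX) the extra send() is assumed to be a harmless no-op.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send up to size bytes of file to sock in calls of
 * at most chunk bytes, then end the TLS record if the last call was short.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t sendfile_chunked(int sock, int file, size_t size, size_t chunk)
{
	off_t offset = 0;
	size_t sent = 0;
	bool last_short = false;

	if (chunk == 0) {
		errno = EINVAL;
		return -1;
	}

	while (sent < size) {
		size_t want = size - sent < chunk ? size - sent : chunk;
		ssize_t n = sendfile(sock, file, &offset, want);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		/* A short return usually means EOF: MSG_MORE stays asserted. */
		last_short = (size_t)n < want;
		if (n == 0)
			break;
		sent += (size_t)n;
	}

	/* Zero-length send() without MSG_MORE: ask kTLS to close the record. */
	if (last_short && send(sock, "", 0, 0) < 0)
		return -1;
	return (ssize_t)sent;
}
```

Checking "last call was short" rather than "total is short" avoids a needless flush when a request is satisfied exactly, since in that case the kernel already clears MSG_MORE.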

Signed-off-by: David Howells <[email protected]>
cc: Chuck Lever <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
---
tools/testing/selftests/net/tls.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index e699548d4247..8f4bed8aacc0 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -377,7 +377,7 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
char buf[TLS_PAYLOAD_MAX_LEN];
uint16_t test_payload_size;
int size = 0;
- int ret;
+ int ret = 0;
char filename[] = "/tmp/mytemp.XXXXXX";
int fd = mkstemp(filename);
off_t offset = 0;
@@ -398,6 +398,10 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
size -= ret;
}

+ /* Flush the TLS record on a short read. */
+ if (ret < chunk_size)
+ EXPECT_EQ(send(self->fd, "", 0, 0), 0);
+
EXPECT_EQ(recv(self->cfd, buf, test_payload_size, MSG_WAITALL),
test_payload_size);



2023-06-02 15:21:10

by David Howells

Subject: [PATCH net-next v3 01/11] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace

It is necessary to allow MSG_SENDPAGE_* to be passed into ->sendmsg() to
allow sendmsg(MSG_SPLICE_PAGES) to replace ->sendpage(). Unblocking them
in the network protocol, however, allows these flags to be passed in by
userspace too[1].

Fix this by marking MSG_SENDPAGE_NOPOLICY, MSG_SENDPAGE_NOTLAST and
MSG_SENDPAGE_DECRYPTED as internal flags, which causes the sendmsg() and
sendmmsg() syscalls to clear them on entry if userspace passes them in.
Network protocol ->sendmsg() implementations can then allow them through.
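The effect of MSG_INTERNAL_SENDMSG_FLAGS can be modelled as a simple mask applied on syscall entry. The X_-prefixed bit values below are hypothetical placeholders chosen for illustration, not the kernel's definitions, and syscall_entry_flags() is an illustrative name. The point is only that userspace-supplied internal bits are stripped before a protocol's ->sendmsg() sees them, while in-kernel callers such as splice_to_socket() can still set them.

```c
/* Hypothetical bit values, for illustration only. */
#define X_MSG_MORE			0x0001
#define X_MSG_DONTWAIT			0x0002
#define X_MSG_SPLICE_PAGES		0x0100
#define X_MSG_SENDPAGE_NOPOLICY		0x0200
#define X_MSG_SENDPAGE_NOTLAST		0x0400
#define X_MSG_SENDPAGE_DECRYPTED	0x0800

#define X_MSG_INTERNAL_SENDMSG_FLAGS				\
	(X_MSG_SPLICE_PAGES | X_MSG_SENDPAGE_NOPOLICY |		\
	 X_MSG_SENDPAGE_NOTLAST | X_MSG_SENDPAGE_DECRYPTED)

/* Model of the syscall entry path: internal flags from userspace are
 * cleared, everything else is passed through unchanged. */
unsigned int syscall_entry_flags(unsigned int user_flags)
{
	return user_flags & ~X_MSG_INTERNAL_SENDMSG_FLAGS;
}
```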

Note that it should be possible to remove MSG_SENDPAGE_NOTLAST once
sendpage is removed as a whole slew of pages will be passed in in one go by
splice through sendmsg, with MSG_MORE being set if it has more data waiting
in the pipe.

Signed-off-by: David Howells <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Chuck Lever <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ [1]
---
include/linux/socket.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index bd1cc3238851..3fd3436bc09f 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -339,7 +339,9 @@ struct ucred {
#endif

/* Flags to be cleared on entry by sendmsg and sendmmsg syscalls */
-#define MSG_INTERNAL_SENDMSG_FLAGS (MSG_SPLICE_PAGES)
+#define MSG_INTERNAL_SENDMSG_FLAGS \
+ (MSG_SPLICE_PAGES | MSG_SENDPAGE_NOPOLICY | MSG_SENDPAGE_NOTLAST | \
+ MSG_SENDPAGE_DECRYPTED)

/* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
#define SOL_IP 0


2023-06-02 15:22:22

by David Howells

Subject: [PATCH net-next v3 10/11] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES

Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself. With that, the tls_iter_offset
union is no longer necessary and can be replaced with an iov_iter pointer
and the zc_page argument to tls_push_data() can also be removed.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
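The shape of the converted tls_device_sendpage() can be mimicked in userspace: wrap one page fragment in a single-element vector, map "not last" to MSG_MORE, and hand off to the sendmsg path. This is only an analogy under stated assumptions: a one-entry iovec stands in for the kernel's bio_vec, send_page_as_msg() is an illustrative name, its notlast parameter is a hypothetical boolean standing in for MSG_SENDPAGE_NOTLAST (which userspace cannot pass), and plain sendmsg() stands in for tls_device_sendmsg().

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical wrapper: send size bytes starting at offset within page as
 * a single-segment sendmsg(), requesting MSG_MORE if notlast is set. */
ssize_t send_page_as_msg(int sock, const char *page, size_t offset,
			 size_t size, bool notlast)
{
	struct iovec iov = {
		.iov_base = (void *)(page + offset),
		.iov_len = size,
	};
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
	};

	return sendmsg(sock, &msg, notlast ? MSG_MORE : 0);
}
```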

Signed-off-by: David Howells <[email protected]>
cc: Chuck Lever <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
---
net/tls/tls_device.c | 84 +++++++++++---------------------------------
1 file changed, 20 insertions(+), 64 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index f2f1aff19e4a..c698d6d60219 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -422,16 +422,10 @@ static int tls_device_copy_data(void *addr, size_t bytes, struct iov_iter *i)
return 0;
}

-union tls_iter_offset {
- struct iov_iter *msg_iter;
- int offset;
-};
-
static int tls_push_data(struct sock *sk,
- union tls_iter_offset iter_offset,
+ struct iov_iter *iter,
size_t size, int flags,
- unsigned char record_type,
- struct page *zc_page)
+ unsigned char record_type)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_prot_info *prot = &tls_ctx->prot_info;
@@ -500,22 +494,13 @@ static int tls_push_data(struct sock *sk,
record = ctx->open_record;

copy = min_t(size_t, size, max_open_record_len - record->len);
- if (copy && zc_page) {
- struct page_frag zc_pfrag;
-
- zc_pfrag.page = zc_page;
- zc_pfrag.offset = iter_offset.offset;
- zc_pfrag.size = copy;
- tls_append_frag(record, &zc_pfrag, copy);
-
- iter_offset.offset += copy;
- } else if (copy && (flags & MSG_SPLICE_PAGES)) {
+ if (copy && (flags & MSG_SPLICE_PAGES)) {
struct page_frag zc_pfrag;
struct page **pages = &zc_pfrag.page;
size_t off;

- rc = iov_iter_extract_pages(iter_offset.msg_iter,
- &pages, copy, 1, 0, &off);
+ rc = iov_iter_extract_pages(iter, &pages,
+ copy, 1, 0, &off);
if (rc <= 0) {
if (rc == 0)
rc = -EIO;
@@ -524,7 +509,7 @@ static int tls_push_data(struct sock *sk,
copy = rc;

if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
- iov_iter_revert(iter_offset.msg_iter, copy);
+ iov_iter_revert(iter, copy);
rc = -EIO;
goto handle_error;
}
@@ -537,7 +522,7 @@ static int tls_push_data(struct sock *sk,

rc = tls_device_copy_data(page_address(pfrag->page) +
pfrag->offset, copy,
- iter_offset.msg_iter);
+ iter);
if (rc)
goto handle_error;
tls_append_frag(record, pfrag, copy);
@@ -592,7 +577,6 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
unsigned char record_type = TLS_RECORD_TYPE_DATA;
struct tls_context *tls_ctx = tls_get_ctx(sk);
- union tls_iter_offset iter;
int rc;

if (!tls_ctx->zerocopy_sendfile)
@@ -607,8 +591,8 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
goto out;
}

- iter.msg_iter = &msg->msg_iter;
- rc = tls_push_data(sk, iter, size, msg->msg_flags, record_type, NULL);
+ rc = tls_push_data(sk, &msg->msg_iter, size, msg->msg_flags,
+ record_type);

out:
release_sock(sk);
@@ -619,44 +603,18 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int tls_device_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
- struct tls_context *tls_ctx = tls_get_ctx(sk);
- union tls_iter_offset iter_offset;
- struct iov_iter msg_iter;
- char *kaddr;
- struct kvec iov;
- int rc;
+ struct bio_vec bvec;
+ struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };

if (flags & MSG_SENDPAGE_NOTLAST)
- flags |= MSG_MORE;
-
- mutex_lock(&tls_ctx->tx_lock);
- lock_sock(sk);
+ msg.msg_flags |= MSG_MORE;

- if (flags & MSG_OOB) {
- rc = -EOPNOTSUPP;
- goto out;
- }
-
- if (tls_ctx->zerocopy_sendfile) {
- iter_offset.offset = offset;
- rc = tls_push_data(sk, iter_offset, size,
- flags, TLS_RECORD_TYPE_DATA, page);
- goto out;
- }
-
- kaddr = kmap(page);
- iov.iov_base = kaddr + offset;
- iov.iov_len = size;
- iov_iter_kvec(&msg_iter, ITER_SOURCE, &iov, 1, size);
- iter_offset.msg_iter = &msg_iter;
- rc = tls_push_data(sk, iter_offset, size, flags, TLS_RECORD_TYPE_DATA,
- NULL);
- kunmap(page);
+ if (flags & MSG_OOB)
+ return -EOPNOTSUPP;

-out:
- release_sock(sk);
- mutex_unlock(&tls_ctx->tx_lock);
- return rc;
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ return tls_device_sendmsg(sk, &msg, size);
}

struct tls_record_info *tls_get_record(struct tls_offload_context_tx *context,
@@ -721,12 +679,10 @@ EXPORT_SYMBOL(tls_get_record);

static int tls_device_push_pending_record(struct sock *sk, int flags)
{
- union tls_iter_offset iter;
- struct iov_iter msg_iter;
+ struct iov_iter iter;

- iov_iter_kvec(&msg_iter, ITER_SOURCE, NULL, 0, 0);
- iter.msg_iter = &msg_iter;
- return tls_push_data(sk, iter, 0, flags, TLS_RECORD_TYPE_DATA, NULL);
+ iov_iter_kvec(&iter, ITER_SOURCE, NULL, 0, 0);
+ return tls_push_data(sk, &iter, 0, flags, TLS_RECORD_TYPE_DATA);
}

void tls_device_write_space(struct sock *sk, struct tls_context *ctx)


2023-06-02 15:22:48

by David Howells

Subject: [PATCH net-next v3 08/11] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES

Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg()
with MSG_SPLICE_PAGES rather than directly splicing in the pages itself.

[!] Note that tls_sw_sendpage_locked() appears to have the wrong locking
upstream. I think the caller will only hold the socket lock, but it
should hold tls_ctx->tx_lock too.
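The locking rule in this note can be pictured with a small userspace model. It is not kernel code: the pthread mutexes and the model_* names are hypothetical stand-ins for tls_ctx->tx_lock and the socket lock. It shows the split this patch makes between a locking wrapper and a _locked worker that assumes both locks are already held:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: tx_lock stands in for tls_ctx->tx_lock and
 * sk_lock for the socket lock taken by lock_sock(). */
struct model_sock {
	pthread_mutex_t tx_lock;
	pthread_mutex_t sk_lock;
	size_t queued;
};

/* Return 1 if the mutex is currently locked (by any thread), else 0. */
static int mutex_is_locked(pthread_mutex_t *m)
{
	int r = pthread_mutex_trylock(m);

	if (r == 0) {
		pthread_mutex_unlock(m);
		return 0;
	}
	return r == EBUSY;
}

/* Worker: the caller must already hold both locks, in the spirit of
 * tls_sw_sendmsg_locked(). */
static size_t model_sendmsg_locked(struct model_sock *s, size_t len)
{
	assert(mutex_is_locked(&s->tx_lock));
	assert(mutex_is_locked(&s->sk_lock));
	s->queued += len;
	return len;
}

/* Wrapper: take the outer lock, then the inner one, call the worker and
 * release in reverse order, as tls_sw_sendmsg() does after this patch. */
static size_t model_sendmsg(struct model_sock *s, size_t len)
{
	size_t ret;

	pthread_mutex_lock(&s->tx_lock);
	pthread_mutex_lock(&s->sk_lock);
	ret = model_sendmsg_locked(s, len);
	pthread_mutex_unlock(&s->sk_lock);
	pthread_mutex_unlock(&s->tx_lock);
	return ret;
}
```

Suppose a sendpage-style entry point called model_sendmsg_locked() directly while holding only the inner lock. The first assert() would then fire. That is the situation this note is concerned about.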

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
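Here is a rough userspace analogy for handing the protocol several segments at once instead of one page per call. It is an analogy only: MSG_SPLICE_PAGES and bio_vec are kernel-internal and cannot be used from userspace. The sketch passes an iovec array to a single sendmsg(); send_segments() and SEG_MAX are hypothetical names:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define SEG_MAX 16	/* hypothetical cap on segments per call */

/* Hand all nseg segments to the protocol in one sendmsg() call instead
 * of issuing one send() per segment.  Returns sendmsg()'s result. */
static ssize_t send_segments(int sock, const char *const segs[],
			     const size_t lens[], size_t nseg)
{
	struct iovec iov[SEG_MAX];
	struct msghdr msg;
	size_t i;

	if (nseg > SEG_MAX) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < nseg; i++) {
		iov[i].iov_base = (void *)segs[i];
		iov[i].iov_len = lens[i];
	}
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = iov;
	msg.msg_iovlen = nseg;
	return sendmsg(sock, &msg, 0);
}
```

The protocol then sees the whole batch in one call and can pack it into as few records or segments as it likes.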

Signed-off-by: David Howells <[email protected]>
cc: Chuck Lever <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
---
net/tls/tls_sw.c | 165 +++++++++--------------------------------------
1 file changed, 31 insertions(+), 134 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 14636cc6c3a4..4caed478bef8 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -961,7 +961,8 @@ static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg,
return 0;
}

-int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
+ size_t size)
{
long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
struct tls_context *tls_ctx = tls_get_ctx(sk);
@@ -984,15 +985,6 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int ret = 0;
int pending;

- if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
- MSG_CMSG_COMPAT | MSG_SPLICE_PAGES))
- return -EOPNOTSUPP;
-
- ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
- if (ret)
- return ret;
- lock_sock(sk);
-
if (unlikely(msg->msg_controllen)) {
ret = tls_process_cmsg(sk, msg, &record_type);
if (ret) {
@@ -1197,157 +1189,62 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)

send_end:
ret = sk_stream_error(sk, msg->msg_flags, ret);
-
- release_sock(sk);
- mutex_unlock(&tls_ctx->tx_lock);
return copied > 0 ? copied : ret;
}

-static int tls_sw_do_sendpage(struct sock *sk, struct page *page,
- int offset, size_t size, int flags)
+int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
- long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
- struct tls_prot_info *prot = &tls_ctx->prot_info;
- unsigned char record_type = TLS_RECORD_TYPE_DATA;
- struct sk_msg *msg_pl;
- struct tls_rec *rec;
- int num_async = 0;
- ssize_t copied = 0;
- bool full_record;
- int record_room;
- int ret = 0;
- bool eor;
-
- eor = !(flags & MSG_SENDPAGE_NOTLAST);
- sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
-
- /* Call the sk_stream functions to manage the sndbuf mem. */
- while (size > 0) {
- size_t copy, required_size;
-
- if (sk->sk_err) {
- ret = -sk->sk_err;
- goto sendpage_end;
- }
-
- if (ctx->open_rec)
- rec = ctx->open_rec;
- else
- rec = ctx->open_rec = tls_get_rec(sk);
- if (!rec) {
- ret = -ENOMEM;
- goto sendpage_end;
- }
-
- msg_pl = &rec->msg_plaintext;
-
- full_record = false;
- record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size;
- copy = size;
- if (copy >= record_room) {
- copy = record_room;
- full_record = true;
- }
-
- required_size = msg_pl->sg.size + copy + prot->overhead_size;
-
- if (!sk_stream_memory_free(sk))
- goto wait_for_sndbuf;
-alloc_payload:
- ret = tls_alloc_encrypted_msg(sk, required_size);
- if (ret) {
- if (ret != -ENOSPC)
- goto wait_for_memory;
-
- /* Adjust copy according to the amount that was
- * actually allocated. The difference is due
- * to max sg elements limit
- */
- copy -= required_size - msg_pl->sg.size;
- full_record = true;
- }
-
- sk_msg_page_add(msg_pl, page, copy, offset);
- sk_mem_charge(sk, copy);
-
- offset += copy;
- size -= copy;
- copied += copy;
-
- tls_ctx->pending_open_record_frags = true;
- if (full_record || eor || sk_msg_full(msg_pl)) {
- ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
- record_type, &copied, flags);
- if (ret) {
- if (ret == -EINPROGRESS)
- num_async++;
- else if (ret == -ENOMEM)
- goto wait_for_memory;
- else if (ret != -EAGAIN) {
- if (ret == -ENOSPC)
- ret = 0;
- goto sendpage_end;
- }
- }
- }
- continue;
-wait_for_sndbuf:
- set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-wait_for_memory:
- ret = sk_stream_wait_memory(sk, &timeo);
- if (ret) {
- if (ctx->open_rec)
- tls_trim_both_msgs(sk, msg_pl->sg.size);
- goto sendpage_end;
- }
+ int ret;

- if (ctx->open_rec)
- goto alloc_payload;
- }
+ if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+ MSG_CMSG_COMPAT | MSG_SPLICE_PAGES |
+ MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
+ return -EOPNOTSUPP;

- if (num_async) {
- /* Transmit if any encryptions have completed */
- if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
- cancel_delayed_work(&ctx->tx_work.work);
- tls_tx_records(sk, flags);
- }
- }
-sendpage_end:
- ret = sk_stream_error(sk, flags, ret);
- return copied > 0 ? copied : ret;
+ ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
+ if (ret)
+ return ret;
+ lock_sock(sk);
+ ret = tls_sw_sendmsg_locked(sk, msg, size);
+ release_sock(sk);
+ mutex_unlock(&tls_ctx->tx_lock);
+ return ret;
}

int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
+ struct bio_vec bvec;
+ struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
+
if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY |
MSG_NO_SHARED_FRAGS))
return -EOPNOTSUPP;
+ if (flags & MSG_SENDPAGE_NOTLAST)
+ msg.msg_flags |= MSG_MORE;

- return tls_sw_do_sendpage(sk, page, offset, size, flags);
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ return tls_sw_sendmsg_locked(sk, &msg, size);
}

int tls_sw_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
- struct tls_context *tls_ctx = tls_get_ctx(sk);
- int ret;
+ struct bio_vec bvec;
+ struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };

if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
return -EOPNOTSUPP;
+ if (flags & MSG_SENDPAGE_NOTLAST)
+ msg.msg_flags |= MSG_MORE;

- ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
- if (ret)
- return ret;
- lock_sock(sk);
- ret = tls_sw_do_sendpage(sk, page, offset, size, flags);
- release_sock(sk);
- mutex_unlock(&tls_ctx->tx_lock);
- return ret;
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ return tls_sw_sendmsg(sk, &msg, size);
}

static int


2023-06-02 15:37:53

by David Howells

Subject: [PATCH net-next v3 11/11] net: Add samples for network I/O and splicing

Add some small sample programs for doing network I/O including splicing.

There are three IPv4/IPv6 servers: tcp-sink, tls-sink and udp-sink. They
can be given a port number by passing "-p <port>" and will listen on an
IPv6 socket unless given a "-4" flag, in which case they'll listen for IPv4
only.

There are three IPv4/IPv6 clients: tcp-send, tls-send and udp-send. They
are given a file to get data from (or "-" for stdin) and the name of a
server to talk to. They can also be given a port number by passing "-p
<port>", "-4" or "-6" to force the use of IPv4 or IPv6, "-s" to indicate
they should use splice/sendfile to transfer the data and "-n" to specify
how much data to copy. If "-s" is given, the input will be spliced if it's
a pipe and sendfiled otherwise.
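The "-s" dispatch amounts to a check on the input's file type. Below is a minimal sketch of it. It assumes a connected stream socket and a caller-supplied byte count. transfer() is a hypothetical helper, not a function from the samples:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Move up to len bytes from in_fd to sock: splice() when in_fd is a
 * pipe, sendfile() otherwise, looping over short transfers.
 * Returns the number of bytes moved, or -1 with errno set. */
static ssize_t transfer(int in_fd, int sock, size_t len)
{
	struct stat st;
	size_t done = 0;
	ssize_t r;

	if (fstat(in_fd, &st) == -1)
		return -1;

	while (done < len) {
		if (S_ISFIFO(st.st_mode))
			r = splice(in_fd, NULL, sock, NULL, len - done, 0);
		else
			r = sendfile(sock, in_fd, NULL, len - done);
		if (r == -1)
			return -1;
		if (r == 0)
			break;	/* EOF: the input had fewer than len bytes */
		done += r;
	}
	return (ssize_t)done;
}
```

This sketch passes no SPLICE_F_MORE. The samples pass it while data remains, to tell the protocol that more is coming.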

A driver program, splice-out, is provided to splice data from a file/stdin
to stdout and can be used to pipe into the aforementioned clients for
testing splice. This takes the name of the file to splice from (or "-" for
stdin). It can also be given "-w <size>" to indicate the maximum size of
each splice, "-k <size>" if a chunk of the input should be skipped between
splices to prevent coalescence and "-s" if sendfile should be used instead
of splice.
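To see how "-w" and "-k" interact, here is a hypothetical arithmetic model of splice-out's loop over a regular file. plan_splices() and struct splice_plan are illustrative names. The model assumes every read, splice and sendfile call transfers the full amount requested, which real pipes with limited capacity need not do:

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of the modelled loop. */
struct splice_plan {
	size_t calls;		/* splice()/sendfile() calls that moved data */
	size_t delivered;	/* bytes written to the pipe */
	size_t skipped;		/* bytes read and thrown away */
};

/* Model splice-out over a file of file_len bytes: before each splice,
 * discard skip bytes, then splice min(unit, remaining) bytes (unit == 0
 * means no per-call cap) until want bytes are delivered or EOF. */
static struct splice_plan plan_splices(size_t file_len, size_t want,
				       size_t unit, size_t skip)
{
	struct splice_plan p = { 0, 0, 0 };
	size_t pos = 0, left = want, part;

	while (left > 0 && pos < file_len) {
		part = skip < file_len - pos ? skip : file_len - pos;
		pos += part;
		p.skipped += part;

		part = unit && unit < left ? unit : left;
		if (part > file_len - pos)
			part = file_len - pos;
		if (part == 0)
			break;		/* splice would return 0 at EOF */
		pos += part;
		left -= part;
		p.delivered += part;
		p.calls++;
	}
	return p;
}
```

For example, "./splice-out -w0x400 /foo/16K 4K" corresponds to plan_splices(16384, 4096, 1024, 0), which yields four 1KiB splices.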

Additionally, there is an AF_UNIX client and server. These are similar to
the IPv[46] programs, except both take a socket path and there is no option
to change the port number.

And then there are two AF_ALG clients (there is no server). These are
similar to the other clients, except that no destination is specified.
One exercises skcipher encryption and the other hashing.
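The encryption client passes the operation and IV as SOL_ALG control messages on its first sendmsg(). The sample builds these by advancing msg_controllen by hand. The sketch below builds the same layout with the standard CMSG_FIRSTHDR()/CMSG_NXTHDR() macros instead. alg_build_cmsgs() is a hypothetical helper, and the caller is assumed to supply a cmsghdr-aligned buffer:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

/* Fill msg's control buffer with an ALG_SET_OP and an ALG_SET_IV
 * message.  ctrl must be cmsghdr-aligned and at least ctrl_len bytes.
 * Returns 0 on success, -1 if the buffer is too small. */
static int alg_build_cmsgs(struct msghdr *msg, void *ctrl, size_t ctrl_len,
			   uint32_t op, const void *iv, uint32_t ivlen)
{
	size_t need = CMSG_SPACE(sizeof(uint32_t)) +
		      CMSG_SPACE(sizeof(struct af_alg_iv) + ivlen);
	struct af_alg_iv *alg_iv;
	struct cmsghdr *cmsg;

	if (ctrl_len < need)
		return -1;
	memset(ctrl, 0, need);
	msg->msg_control = ctrl;
	msg->msg_controllen = need;

	/* First message: which operation (encrypt/decrypt) to perform. */
	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_ALG;
	cmsg->cmsg_type = ALG_SET_OP;
	cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
	memcpy(CMSG_DATA(cmsg), &op, sizeof(op));

	/* Second message: struct af_alg_iv followed by the IV bytes. */
	cmsg = CMSG_NXTHDR(msg, cmsg);
	cmsg->cmsg_level = SOL_ALG;
	cmsg->cmsg_type = ALG_SET_IV;
	cmsg->cmsg_len = CMSG_LEN(sizeof(struct af_alg_iv) + ivlen);
	alg_iv = (struct af_alg_iv *)CMSG_DATA(cmsg);
	alg_iv->ivlen = ivlen;
	memcpy(alg_iv->iv, iv, ivlen);
	return 0;
}
```

A union of struct cmsghdr with a byte array is the usual way to get an aligned control buffer on the stack.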

Examples include:

./splice-out -w0x400 /foo/16K 4K | ./alg-encrypt -s -
./splice-out -w0x400 /foo/1M | ./unix-send -s - /tmp/foo
./splice-out -w0x400 /foo/16K 16K -w1 | ./tls-send -s6 -n16K - servbox
./tcp-send /bin/ls 192.168.6.1
./udp-send -4 -p5555 /foo/4K localhost

where, for example, /foo/16K is a 16KiB file.

Signed-off-by: David Howells <[email protected]>
cc: Willem de Bruijn <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: Herbert Xu <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: [email protected]
---
samples/Kconfig | 14 +++
samples/Makefile | 1 +
samples/net/Makefile | 13 +++
samples/net/alg-encrypt.c | 206 ++++++++++++++++++++++++++++++++++++++
samples/net/alg-hash.c | 147 +++++++++++++++++++++++++++
samples/net/splice-out.c | 147 +++++++++++++++++++++++++++
samples/net/tcp-send.c | 177 ++++++++++++++++++++++++++++++++
samples/net/tcp-sink.c | 80 +++++++++++++++
samples/net/tls-send.c | 188 ++++++++++++++++++++++++++++++++++
samples/net/tls-sink.c | 104 +++++++++++++++++++
samples/net/udp-send.c | 156 +++++++++++++++++++++++++++++
samples/net/udp-sink.c | 84 ++++++++++++++++
samples/net/unix-send.c | 151 ++++++++++++++++++++++++++++
samples/net/unix-sink.c | 54 ++++++++++
14 files changed, 1522 insertions(+)
create mode 100644 samples/net/Makefile
create mode 100644 samples/net/alg-encrypt.c
create mode 100644 samples/net/alg-hash.c
create mode 100644 samples/net/splice-out.c
create mode 100644 samples/net/tcp-send.c
create mode 100644 samples/net/tcp-sink.c
create mode 100644 samples/net/tls-send.c
create mode 100644 samples/net/tls-sink.c
create mode 100644 samples/net/udp-send.c
create mode 100644 samples/net/udp-sink.c
create mode 100644 samples/net/unix-send.c
create mode 100644 samples/net/unix-sink.c

diff --git a/samples/Kconfig b/samples/Kconfig
index b2db430bd3ff..928e06b08b99 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -280,6 +280,20 @@ config SAMPLE_KMEMLEAK
Build a sample program which have explicitly leaks memory to test
kmemleak

+config SAMPLE_NET
+ bool "Build example programs for driving network protocols"
+ depends on NET
+ help
+ Build example userspace programs for driving network protocols. Most
+ of the programs (tcp, udp, tls, unix) come as client/server pairs so
+ that a test can be split across a network (except in the unix case).
+ Others, such as the AF_ALG samples, are standalone as there is no
+ server as such.
+
+ The programs can use sendfile and splice. An additional program
+ sendfiles or splices a file to stdout so that its output can be
+ piped into the other programs to exercise splicing there.
+
source "samples/rust/Kconfig"

endif # SAMPLES
diff --git a/samples/Makefile b/samples/Makefile
index 7727f1a0d6d1..b9fbf80a53be 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -37,3 +37,4 @@ obj-$(CONFIG_SAMPLE_KMEMLEAK) += kmemleak/
obj-$(CONFIG_SAMPLE_CORESIGHT_SYSCFG) += coresight/
obj-$(CONFIG_SAMPLE_FPROBE) += fprobe/
obj-$(CONFIG_SAMPLES_RUST) += rust/
+obj-$(CONFIG_SAMPLE_NET) += net/
diff --git a/samples/net/Makefile b/samples/net/Makefile
new file mode 100644
index 000000000000..0ccd68a36edf
--- /dev/null
+++ b/samples/net/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+userprogs-always-y += \
+ alg-hash \
+ alg-encrypt \
+ splice-out \
+ tcp-send \
+ tcp-sink \
+ tls-send \
+ tls-sink \
+ udp-send \
+ udp-sink \
+ unix-send \
+ unix-sink
diff --git a/samples/net/alg-encrypt.c b/samples/net/alg-encrypt.c
new file mode 100644
index 000000000000..3851b5fbaeda
--- /dev/null
+++ b/samples/net/alg-encrypt.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* AF_ALG skcipher encryption test
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <getopt.h>
+#include <limits.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/un.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/sendfile.h>
+#include <linux/if_alg.h>
+
+#define OSERROR(X, Y) \
+ do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+#define min(x, y) ((x) < (y) ? (x) : (y))
+
+static unsigned char buffer[4096 * 32] __attribute__((aligned(4096)));
+static unsigned char iv[16];
+static unsigned char key[16];
+
+static const struct sockaddr_alg sa = {
+ .salg_family = AF_ALG,
+ .salg_type = "skcipher",
+ .salg_name = "cbc(aes)",
+};
+
+static void format(void)
+{
+ fprintf(stderr, "alg-encrypt [-ds] [-n<size>] <file>|-\n");
+ exit(2);
+}
+
+static void algif_add_set_op(struct msghdr *msg, unsigned int op)
+{
+ struct cmsghdr *__cmsg;
+
+ __cmsg = msg->msg_control + msg->msg_controllen;
+ __cmsg->cmsg_len = CMSG_LEN(sizeof(unsigned int));
+ __cmsg->cmsg_level = SOL_ALG;
+ __cmsg->cmsg_type = ALG_SET_OP;
+ *(unsigned int *)CMSG_DATA(__cmsg) = op;
+ msg->msg_controllen += CMSG_ALIGN(__cmsg->cmsg_len);
+}
+
+static void algif_add_set_iv(struct msghdr *msg, const void *iv, size_t ivlen)
+{
+ struct af_alg_iv *ivbuf;
+ struct cmsghdr *__cmsg;
+
+ /* Append the IV message after any control messages already present. */
+ __cmsg = msg->msg_control + msg->msg_controllen;
+ __cmsg->cmsg_len = CMSG_LEN(sizeof(*ivbuf) + ivlen);
+ __cmsg->cmsg_level = SOL_ALG;
+ __cmsg->cmsg_type = ALG_SET_IV;
+ ivbuf = (struct af_alg_iv *)CMSG_DATA(__cmsg);
+ ivbuf->ivlen = ivlen;
+ memcpy(ivbuf->iv, iv, ivlen);
+ msg->msg_controllen += CMSG_ALIGN(__cmsg->cmsg_len);
+}
+
+int main(int argc, char *argv[])
+{
+ struct msghdr msg;
+ struct stat st;
+ const char *filename;
+ unsigned char ctrl[4096] __attribute__((aligned(__alignof__(struct cmsghdr))));
+ unsigned int flags = O_RDONLY;
+ ssize_t r, w, o, ret;
+ size_t size = LONG_MAX, total = 0, i, out = 160;
+ char *end;
+ bool use_sendfile = false, all = true;
+ int opt, alg, sock, fd = 0;
+
+ while ((opt = getopt(argc, argv, "dn:s")) != EOF) {
+ switch (opt) {
+ case 'd':
+ flags |= O_DIRECT;
+ break;
+ case 'n':
+ size = strtoul(optarg, &end, 0);
+ switch (*end) {
+ case 'K':
+ case 'k':
+ size *= 1024;
+ break;
+ case 'M':
+ case 'm':
+ size *= 1024 * 1024;
+ break;
+ }
+ all = false;
+ break;
+ case 's':
+ use_sendfile = true;
+ break;
+ default:
+ format();
+ }
+ }
+
+ argc -= optind;
+ argv += optind;
+ if (argc != 1)
+ format();
+ filename = argv[0];
+
+ alg = socket(AF_ALG, SOCK_SEQPACKET, 0);
+ OSERROR(alg, "AF_ALG");
+ OSERROR(bind(alg, (struct sockaddr *)&sa, sizeof(sa)), "bind");
+ OSERROR(setsockopt(alg, SOL_ALG, ALG_SET_KEY, key, sizeof(key)),
+ "ALG_SET_KEY");
+ sock = accept(alg, NULL, 0);
+ OSERROR(sock, "accept");
+
+ if (strcmp(filename, "-") != 0) {
+ fd = open(filename, flags);
+ OSERROR(fd, filename);
+ OSERROR(fstat(fd, &st), filename);
+ size = st.st_size;
+ } else {
+ OSERROR(fstat(fd, &st), "stdin");
+ }
+
+ memset(&msg, 0, sizeof(msg));
+ msg.msg_control = ctrl;
+ algif_add_set_op(&msg, ALG_OP_ENCRYPT);
+ algif_add_set_iv(&msg, iv, sizeof(iv));
+
+ OSERROR(sendmsg(sock, &msg, MSG_MORE), "sock/sendmsg");
+
+ if (!use_sendfile) {
+ bool more = false;
+
+ while (size) {
+ r = read(fd, buffer, sizeof(buffer));
+ OSERROR(r, filename);
+ if (r == 0)
+ break;
+ size -= r;
+
+ o = 0;
+ do {
+ more = size > 0;
+ w = send(sock, buffer + o, r - o,
+ more ? MSG_MORE : 0);
+ OSERROR(w, "sock/send");
+ total += w;
+ o += w;
+ } while (o < r);
+ }
+
+ if (more)
+ send(sock, NULL, 0, 0);
+ } else if (S_ISFIFO(st.st_mode)) {
+ do {
+ r = splice(fd, NULL, sock, NULL, size,
+ size > 0 ? SPLICE_F_MORE : 0);
+ OSERROR(r, "sock/splice");
+ size -= r;
+ total += r;
+ } while (r > 0 && size > 0);
+ if (size && !all) {
+ fprintf(stderr, "Short splice\n");
+ exit(1);
+ }
+ } else {
+ r = sendfile(sock, fd, NULL, size);
+ OSERROR(r, "sock/sendfile");
+ if (r != size) {
+ fprintf(stderr, "Short sendfile\n");
+ exit(1);
+ }
+ total = r;
+ }
+
+ while (total > 0) {
+ ret = read(sock, buffer, min(sizeof(buffer), total));
+ OSERROR(ret, "sock/read");
+ if (ret == 0)
+ break;
+ total -= ret;
+
+ if (out > 0) {
+ ret = min(out, ret);
+ out -= ret;
+ for (i = 0; i < ret; i++)
+ printf("%02x", (unsigned char)buffer[i]);
+ }
+ printf("...\n");
+ }
+
+ OSERROR(close(sock), "sock/close");
+ OSERROR(close(alg), "alg/close");
+ OSERROR(close(fd), "close");
+ return 0;
+}
diff --git a/samples/net/alg-hash.c b/samples/net/alg-hash.c
new file mode 100644
index 000000000000..df63c87e7661
--- /dev/null
+++ b/samples/net/alg-hash.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* AF_ALG hash test
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <getopt.h>
+#include <limits.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/un.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/sendfile.h>
+#include <linux/if_alg.h>
+
+#define OSERROR(X, Y) \
+ do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+
+static unsigned char buffer[4096 * 32] __attribute__((aligned(4096)));
+
+static const struct sockaddr_alg sa = {
+ .salg_family = AF_ALG,
+ .salg_type = "hash",
+ .salg_name = "sha1",
+};
+
+static void format(void)
+{
+ fprintf(stderr, "alg-hash [-ds] [-n<size>] <file>|-\n");
+ exit(2);
+}
+
+int main(int argc, char *argv[])
+{
+ struct stat st;
+ const char *filename;
+ unsigned int flags = O_RDONLY;
+ ssize_t r, w, o, ret;
+ size_t size = LONG_MAX, i;
+ char *end;
+ bool use_sendfile = false;
+ int opt, alg, sock, fd = 0;
+
+ while ((opt = getopt(argc, argv, "dn:s")) != EOF) {
+ switch (opt) {
+ case 'd':
+ flags |= O_DIRECT;
+ break;
+ case 'n':
+ size = strtoul(optarg, &end, 0);
+ switch (*end) {
+ case 'K':
+ case 'k':
+ size *= 1024;
+ break;
+ case 'M':
+ case 'm':
+ size *= 1024 * 1024;
+ break;
+ }
+ break;
+ case 's':
+ use_sendfile = true;
+ break;
+ default:
+ format();
+ }
+ }
+
+ argc -= optind;
+ argv += optind;
+ if (argc != 1)
+ format();
+ filename = argv[0];
+
+ alg = socket(AF_ALG, SOCK_SEQPACKET, 0);
+ OSERROR(alg, "AF_ALG");
+ OSERROR(bind(alg, (struct sockaddr *)&sa, sizeof(sa)), "bind");
+ sock = accept(alg, NULL, 0);
+ OSERROR(sock, "accept");
+
+ if (strcmp(filename, "-") != 0) {
+ fd = open(filename, flags);
+ OSERROR(fd, filename);
+ OSERROR(fstat(fd, &st), filename);
+ size = st.st_size;
+ } else {
+ OSERROR(fstat(fd, &st), "stdin");
+ }
+
+ if (!use_sendfile) {
+ bool more = false;
+
+ while (size) {
+ r = read(fd, buffer, sizeof(buffer));
+ OSERROR(r, filename);
+ if (r == 0)
+ break;
+ size -= r;
+
+ o = 0;
+ do {
+ more = size > 0;
+ w = send(sock, buffer + o, r - o,
+ more ? MSG_MORE : 0);
+ OSERROR(w, "sock/send");
+ o += w;
+ } while (o < r);
+ }
+
+ if (more)
+ send(sock, NULL, 0, 0);
+ } else if (S_ISFIFO(st.st_mode)) {
+ r = splice(fd, NULL, sock, NULL, size, 0);
+ OSERROR(r, "sock/splice");
+ if (r != size) {
+ fprintf(stderr, "Short splice\n");
+ exit(1);
+ }
+ } else {
+ r = sendfile(sock, fd, NULL, size);
+ OSERROR(r, "sock/sendfile");
+ if (r != size) {
+ fprintf(stderr, "Short sendfile\n");
+ exit(1);
+ }
+ }
+
+ ret = read(sock, buffer, sizeof(buffer));
+ OSERROR(ret, "sock/read");
+
+ for (i = 0; i < ret; i++)
+ printf("%02x", (unsigned char)buffer[i]);
+ printf("\n");
+
+ OSERROR(close(sock), "sock/close");
+ OSERROR(close(alg), "alg/close");
+ OSERROR(close(fd), "close");
+ return 0;
+}
diff --git a/samples/net/splice-out.c b/samples/net/splice-out.c
new file mode 100644
index 000000000000..224010dfd387
--- /dev/null
+++ b/samples/net/splice-out.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Splice or sendfile from the given file/stdin to stdout.
+ *
+ * Format: splice-out [-d][-kN][-s][-wN] <file>|- [<size>]
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <getopt.h>
+#include <sys/stat.h>
+#include <sys/sendfile.h>
+
+#define OSERROR(X, Y) \
+ do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+#define min(x, y) ((x) < (y) ? (x) : (y))
+
+static unsigned char buffer[4096];
+
+static void format(void)
+{
+ fprintf(stderr, "splice-out [-d][-kN][-s][-wN] <file>|- [<size>]\n");
+ exit(2);
+}
+
+int main(int argc, char *argv[])
+{
+ struct stat st;
+ const char *filename;
+ unsigned int flags = O_RDONLY;
+ ssize_t r;
+ size_t size = 1024 * 1024, skip = 0, unit = 0, part;
+ char *end;
+ bool use_sendfile = false, all = true;
+ int opt, fd = 0;
+
+ while ((opt = getopt(argc, argv, "dk:sw:")),
+ opt != -1) {
+ switch (opt) {
+ case 'd':
+ flags |= O_DIRECT;
+ break;
+ case 'k':
+ /* Skip size - prevent coalescence. */
+ skip = strtoul(optarg, &end, 0);
+ if (skip < 1 || skip >= 4096) {
+ fprintf(stderr, "-kN must be 0<N<4096\n");
+ exit(2);
+ }
+ break;
+ case 's':
+ use_sendfile = true;
+ break;
+ case 'w':
+ /* Write unit size */
+ unit = strtoul(optarg, &end, 0);
+ if (!unit) {
+ fprintf(stderr, "-wN must be >0\n");
+ exit(2);
+ }
+ switch (*end) {
+ case 'K':
+ case 'k':
+ unit *= 1024;
+ break;
+ case 'M':
+ case 'm':
+ unit *= 1024 * 1024;
+ break;
+ }
+ break;
+ default:
+ format();
+ }
+ }
+
+ argc -= optind;
+ argv += optind;
+
+ if (argc != 1 && argc != 2)
+ format();
+
+ filename = argv[0];
+ if (argc == 2) {
+ size = strtoul(argv[1], &end, 0);
+ switch (*end) {
+ case 'K':
+ case 'k':
+ size *= 1024;
+ break;
+ case 'M':
+ case 'm':
+ size *= 1024 * 1024;
+ break;
+ }
+ all = false;
+ }
+
+ OSERROR(fstat(1, &st), "stdout");
+ if (!S_ISFIFO(st.st_mode)) {
+ fprintf(stderr, "stdout must be a pipe\n");
+ exit(3);
+ }
+
+ if (strcmp(filename, "-") != 0) {
+ fd = open(filename, flags);
+ OSERROR(fd, filename);
+ OSERROR(fstat(fd, &st), filename);
+ if (!all && size > st.st_size) {
+ fprintf(stderr, "%s: Specified size larger than file\n",
+ filename);
+ exit(3);
+ }
+ }
+
+ do {
+ if (skip) {
+ part = skip;
+ do {
+ r = read(fd, buffer, part);
+ OSERROR(r, filename);
+ part -= r;
+ } while (part > 0 && r > 0);
+ }
+
+ part = unit ? min(size, unit) : size;
+ if (use_sendfile) {
+ r = sendfile(1, fd, NULL, part);
+ OSERROR(r, "sendfile");
+ } else {
+ r = splice(fd, NULL, 1, NULL, part, 0);
+ OSERROR(r, "splice");
+ }
+ if (!all)
+ size -= r;
+ } while (r > 0 && size > 0);
+
+ OSERROR(close(fd), "close");
+ return 0;
+}
diff --git a/samples/net/tcp-send.c b/samples/net/tcp-send.c
new file mode 100644
index 000000000000..608055354789
--- /dev/null
+++ b/samples/net/tcp-send.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * TCP send client. Pass -s to use splice/sendfile; -z to use MSG_ZEROCOPY.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <getopt.h>
+#include <limits.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <netdb.h>
+#include <netinet/in.h>
+#include <sys/stat.h>
+#include <sys/sendfile.h>
+
+#define OSERROR(X, Y) \
+ do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+
+static unsigned char buffer[4096] __attribute__((aligned(4096)));
+
+static void format(void)
+{
+ fprintf(stderr,
+ "tcp-send [-46dsz][-p<port>][-n<size>] <file>|- <server>\n");
+ exit(2);
+}
+
+int main(int argc, char *argv[])
+{
+ struct addrinfo *addrs = NULL, hints = {};
+ struct stat st;
+ const char *filename, *sockname, *service = "5555";
+ unsigned int flags = O_RDONLY;
+ ssize_t r, w, o;
+ size_t size = LONG_MAX;
+ char *end;
+ bool use_sendfile = false, use_zerocopy = false, all = true;
+ int opt, sock, fd = 0, gai;
+
+ hints.ai_family = AF_UNSPEC;
+ hints.ai_socktype = SOCK_STREAM;
+
+ while ((opt = getopt(argc, argv, "46dn:p:sz")) != EOF) {
+ switch (opt) {
+ case '4':
+ hints.ai_family = AF_INET;
+ break;
+ case '6':
+ hints.ai_family = AF_INET6;
+ break;
+ case 'd':
+ flags |= O_DIRECT;
+ break;
+ case 'n':
+ size = strtoul(optarg, &end, 0);
+ switch (*end) {
+ case 'K':
+ case 'k':
+ size *= 1024;
+ break;
+ case 'M':
+ case 'm':
+ size *= 1024 * 1024;
+ break;
+ }
+ all = false;
+ break;
+ case 'p':
+ service = optarg;
+ break;
+ case 's':
+ use_sendfile = true;
+ break;
+ case 'z':
+ use_zerocopy = true;
+ break;
+ default:
+ format();
+ }
+ }
+
+ argc -= optind;
+ argv += optind;
+ if (argc != 2)
+ format();
+ filename = argv[0];
+ sockname = argv[1];
+
+ gai = getaddrinfo(sockname, service, &hints, &addrs);
+ if (gai) {
+ fprintf(stderr, "%s: %s\n", sockname, gai_strerror(gai));
+ exit(3);
+ }
+
+ if (!addrs) {
+ fprintf(stderr, "%s: No addresses\n", sockname);
+ exit(3);
+ }
+
+ sockname = addrs->ai_canonname;
+ sock = socket(addrs->ai_family, addrs->ai_socktype, addrs->ai_protocol);
+ OSERROR(sock, "socket");
+ OSERROR(connect(sock, addrs->ai_addr, addrs->ai_addrlen), "connect");
+
+ if (strcmp(filename, "-") != 0) {
+ fd = open(filename, flags);
+ OSERROR(fd, filename);
+ OSERROR(fstat(fd, &st), filename);
+ if (size > st.st_size)
+ size = st.st_size;
+ } else {
+ OSERROR(fstat(fd, &st), filename);
+ }
+
+ if (!use_sendfile) {
+ unsigned int flags = 0;
+
+ if (use_zerocopy) {
+ int zcflag = 1;
+
+ OSERROR(setsockopt(sock, SOL_SOCKET, SO_ZEROCOPY,
+ &zcflag, sizeof(zcflag)),
+ "SO_ZEROCOPY");
+ flags |= MSG_ZEROCOPY;
+ }
+
+ while (size) {
+ r = read(fd, buffer, sizeof(buffer));
+ OSERROR(r, filename);
+ if (r == 0)
+ break;
+ size -= r;
+
+ o = 0;
+ do {
+ flags &= ~MSG_MORE;
+ if (size > 0)
+ flags |= MSG_MORE;
+ w = send(sock, buffer + o, r - o, flags);
+ OSERROR(w, "sock/send");
+ o += w;
+ } while (o < r);
+ }
+
+ if (flags & MSG_MORE)
+ send(sock, NULL, 0, flags & ~MSG_MORE);
+ } else if (S_ISFIFO(st.st_mode)) {
+ do {
+ r = splice(fd, NULL, sock, NULL, size,
+ size > 0 ? SPLICE_F_MORE : 0);
+ OSERROR(r, "sock/splice");
+ size -= r;
+ } while (r > 0 && size > 0);
+ if (size && !all) {
+ fprintf(stderr, "Short splice\n");
+ exit(1);
+ }
+ } else {
+ r = sendfile(sock, fd, NULL, size);
+ OSERROR(r, "sock/sendfile");
+ if (r != size) {
+ fprintf(stderr, "Short sendfile\n");
+ exit(1);
+ }
+ }
+
+ OSERROR(close(sock), "sock/close");
+ OSERROR(close(fd), "close");
+ return 0;
+}
diff --git a/samples/net/tcp-sink.c b/samples/net/tcp-sink.c
new file mode 100644
index 000000000000..5c27c24dfb76
--- /dev/null
+++ b/samples/net/tcp-sink.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * TCP sink server
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <getopt.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <netinet/in.h>
+
+#define OSERROR(X, Y) \
+ do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+
+static unsigned char buffer[512 * 1024];
+
+static void format(void)
+{
+ fprintf(stderr, "tcp-sink [-4][-p<port>]\n");
+ exit(2);
+}
+
+int main(int argc, char *argv[])
+{
+ unsigned int port = 5555;
+ bool ipv6 = true;
+ int opt, server_sock, sock;
+
+
+ while ((opt = getopt(argc, argv, "4p:")) != EOF) {
+ switch (opt) {
+ case '4':
+ ipv6 = false;
+ break;
+ case 'p':
+ port = atoi(optarg);
+ break;
+ default:
+ format();
+ }
+ }
+
+ if (!ipv6) {
+ struct sockaddr_in sin = {
+ .sin_family = AF_INET,
+ .sin_port = htons(port),
+ };
+ server_sock = socket(AF_INET, SOCK_STREAM, 0);
+ OSERROR(server_sock, "socket");
+ OSERROR(bind(server_sock, (struct sockaddr *)&sin, sizeof(sin)),
+ "bind");
+ OSERROR(listen(server_sock, 1), "listen");
+ } else {
+ struct sockaddr_in6 sin6 = {
+ .sin6_family = AF_INET6,
+ .sin6_port = htons(port),
+ };
+ server_sock = socket(AF_INET6, SOCK_STREAM, 0);
+ OSERROR(server_sock, "socket");
+ OSERROR(bind(server_sock, (struct sockaddr *)&sin6,
+ sizeof(sin6)),
+ "bind");
+ OSERROR(listen(server_sock, 1), "listen");
+ }
+
+ for (;;) {
+ sock = accept(server_sock, NULL, NULL);
+ if (sock != -1) {
+ while (read(sock, buffer, sizeof(buffer)) > 0)
+ ;
+ close(sock);
+ }
+ }
+}
diff --git a/samples/net/tls-send.c b/samples/net/tls-send.c
new file mode 100644
index 000000000000..d99b79aaf536
--- /dev/null
+++ b/samples/net/tls-send.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * TLS-over-TCP send client. Pass -s to splice.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <getopt.h>
+#include <limits.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <netdb.h>
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <sys/stat.h>
+#include <sys/sendfile.h>
+#include <linux/tls.h>
+
+#define OSERROR(X, Y) \
+ do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+
+static unsigned char buffer[4096];
+
+static void format(void)
+{
+ fprintf(stderr,
+ "tls-send [-46ds][-n<size>][-p<port>] <file>|- <server>\n");
+ exit(2);
+}
+
+static void set_tls(int sock)
+{
+ struct tls12_crypto_info_aes_gcm_128 crypto_info;
+
+ crypto_info.info.version = TLS_1_2_VERSION;
+ crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+ memset(crypto_info.iv, 0, TLS_CIPHER_AES_GCM_128_IV_SIZE);
+ memset(crypto_info.rec_seq, 0, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
+ memset(crypto_info.key, 0, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
+ memset(crypto_info.salt, 0, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
+
+ OSERROR(setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")),
+ "TCP_ULP");
+ OSERROR(setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info,
+ sizeof(crypto_info)),
+ "TLS_TX");
+ OSERROR(setsockopt(sock, SOL_TLS, TLS_RX, &crypto_info,
+ sizeof(crypto_info)),
+ "TLS_RX");
+}
+
+int main(int argc, char *argv[])
+{
+ struct addrinfo *addrs = NULL, hints = {};
+ struct stat st;
+ const char *filename, *sockname, *service = "5556";
+ unsigned int flags = O_RDONLY;
+ ssize_t r, w, o;
+ size_t size = LONG_MAX;
+ char *end;
+ bool use_sendfile = false, all = true;
+ int opt, sock, fd = 0, gai;
+
+ hints.ai_family = AF_UNSPEC;
+ hints.ai_socktype = SOCK_STREAM;
+
+ while ((opt = getopt(argc, argv, "46dn:p:s")) != EOF) {
+ switch (opt) {
+ case '4':
+ hints.ai_family = AF_INET;
+ break;
+ case '6':
+ hints.ai_family = AF_INET6;
+ break;
+ case 'd':
+ flags |= O_DIRECT;
+ break;
+ case 'n':
+ size = strtoul(optarg, &end, 0);
+ switch (*end) {
+ case 'K':
+ case 'k':
+ size *= 1024;
+ break;
+ case 'M':
+ case 'm':
+ size *= 1024 * 1024;
+ break;
+ }
+ all = false;
+ break;
+ case 'p':
+ service = optarg;
+ break;
+ case 's':
+ use_sendfile = true;
+ break;
+ default:
+ format();
+ }
+ }
+
+ argc -= optind;
+ argv += optind;
+ if (argc != 2)
+ format();
+ filename = argv[0];
+ sockname = argv[1];
+
+ gai = getaddrinfo(sockname, service, &hints, &addrs);
+ if (gai) {
+ fprintf(stderr, "%s: %s\n", sockname, gai_strerror(gai));
+ exit(3);
+ }
+
+ if (!addrs) {
+ fprintf(stderr, "%s: No addresses\n", sockname);
+ exit(3);
+ }
+
+ sockname = addrs->ai_canonname;
+ sock = socket(addrs->ai_family, addrs->ai_socktype, addrs->ai_protocol);
+ OSERROR(sock, "socket");
+ OSERROR(connect(sock, addrs->ai_addr, addrs->ai_addrlen), "connect");
+ set_tls(sock);
+
+ if (strcmp(filename, "-") != 0) {
+ fd = open(filename, flags);
+ OSERROR(fd, filename);
+ OSERROR(fstat(fd, &st), filename);
+ if (size > st.st_size)
+ size = st.st_size;
+ } else {
+ OSERROR(fstat(fd, &st), filename);
+ }
+
+ if (!use_sendfile) {
+ bool more = false;
+
+ while (size) {
+ r = read(fd, buffer, size < sizeof(buffer) ? size : sizeof(buffer));
+ OSERROR(r, filename);
+ if (r == 0)
+ break;
+ size -= r;
+
+ o = 0;
+ do {
+ more = size > 0;
+ w = send(sock, buffer + o, r - o,
+ more ? MSG_MORE : 0);
+ OSERROR(w, "sock/send");
+ o += w;
+ } while (o < r);
+ }
+
+ if (more)
+ send(sock, NULL, 0, 0);
+ } else if (S_ISFIFO(st.st_mode)) {
+ do {
+ r = splice(fd, NULL, sock, NULL, size,
+ size > 0 ? SPLICE_F_MORE : 0);
+ OSERROR(r, "sock/splice");
+ size -= r;
+ } while (r > 0 && size > 0);
+ if (size && !all) {
+ fprintf(stderr, "Short splice\n");
+ exit(1);
+ }
+ } else {
+ r = sendfile(sock, fd, NULL, size);
+ OSERROR(r, "sock/sendfile");
+ if (r != size) {
+ fprintf(stderr, "Short sendfile\n");
+ exit(1);
+ }
+ }
+
+ OSERROR(close(sock), "sock/close");
+ OSERROR(close(fd), "close");
+ return 0;
+}
diff --git a/samples/net/tls-sink.c b/samples/net/tls-sink.c
new file mode 100644
index 000000000000..67900b74d6d6
--- /dev/null
+++ b/samples/net/tls-sink.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * TLS-over-TCP sink server
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <getopt.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <linux/tls.h>
+
+#define OSERROR(X, Y) \
+ do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+
+static unsigned char buffer[512 * 1024];
+
+static void format(void)
+{
+ fprintf(stderr, "tls-sink [-4][-p<port>]\n");
+ exit(2);
+}
+
+static void set_tls(int sock)
+{
+ struct tls12_crypto_info_aes_gcm_128 crypto_info;
+
+ crypto_info.info.version = TLS_1_2_VERSION;
+ crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+ memset(crypto_info.iv, 0, TLS_CIPHER_AES_GCM_128_IV_SIZE);
+ memset(crypto_info.rec_seq, 0, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
+ memset(crypto_info.key, 0, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
+ memset(crypto_info.salt, 0, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
+
+ OSERROR(setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")),
+ "TCP_ULP");
+ OSERROR(setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info,
+ sizeof(crypto_info)),
+ "TLS_TX");
+ OSERROR(setsockopt(sock, SOL_TLS, TLS_RX, &crypto_info,
+ sizeof(crypto_info)),
+ "TLS_RX");
+}
+
+int main(int argc, char *argv[])
+{
+ unsigned int port = 5556;
+ bool ipv6 = true;
+ int opt, server_sock, sock;
+
+
+ while ((opt = getopt(argc, argv, "4p:")) != EOF) {
+ switch (opt) {
+ case '4':
+ ipv6 = false;
+ break;
+ case 'p':
+ port = atoi(optarg);
+ break;
+ default:
+ format();
+ }
+ }
+
+ if (!ipv6) {
+ struct sockaddr_in sin = {
+ .sin_family = AF_INET,
+ .sin_port = htons(port),
+ };
+ server_sock = socket(AF_INET, SOCK_STREAM, 0);
+ OSERROR(server_sock, "socket");
+ OSERROR(bind(server_sock, (struct sockaddr *)&sin, sizeof(sin)),
+ "bind");
+ OSERROR(listen(server_sock, 1), "listen");
+ } else {
+ struct sockaddr_in6 sin6 = {
+ .sin6_family = AF_INET6,
+ .sin6_port = htons(port),
+ };
+ server_sock = socket(AF_INET6, SOCK_STREAM, 0);
+ OSERROR(server_sock, "socket");
+ OSERROR(bind(server_sock, (struct sockaddr *)&sin6,
+ sizeof(sin6)),
+ "bind");
+ OSERROR(listen(server_sock, 1), "listen");
+ }
+
+ for (;;) {
+ sock = accept(server_sock, NULL, NULL);
+ if (sock != -1) {
+ set_tls(sock);
+ while (read(sock, buffer, sizeof(buffer)) > 0)
+ ;
+ close(sock);
+ }
+ }
+}
diff --git a/samples/net/udp-send.c b/samples/net/udp-send.c
new file mode 100644
index 000000000000..7c6c27eb0fcc
--- /dev/null
+++ b/samples/net/udp-send.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * UDP send client. Pass -s to splice.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <getopt.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <netdb.h>
+#include <netinet/in.h>
+#include <sys/stat.h>
+#include <sys/sendfile.h>
+
+#define OSERROR(X, Y) \
+ do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+#define min(x, y) ((x) < (y) ? (x) : (y))
+
+static unsigned char buffer[65536];
+
+static void format(void)
+{
+ fprintf(stderr,
+ "udp-send [-46ds][-n<size>][-p<port>] <file>|- <server>\n");
+ exit(2);
+}
+
+int main(int argc, char *argv[])
+{
+ struct addrinfo *addrs = NULL, hints = {};
+ struct stat st;
+ const char *filename, *sockname, *service = "5555";
+ unsigned int flags = O_RDONLY, len;
+ ssize_t r, o, size = 65535;
+ char *end;
+ bool use_sendfile = false;
+ int opt, sock, fd = 0, gai;
+
+ hints.ai_family = AF_UNSPEC;
+ hints.ai_socktype = SOCK_DGRAM;
+
+ while ((opt = getopt(argc, argv, "46dn:p:s")) != EOF) {
+ switch (opt) {
+ case '4':
+ hints.ai_family = AF_INET;
+ break;
+ case '6':
+ hints.ai_family = AF_INET6;
+ break;
+ case 'd':
+ flags |= O_DIRECT;
+ break;
+ case 'n':
+ size = strtoul(optarg, &end, 0);
+ switch (*end) {
+ case 'K':
+ case 'k':
+ size *= 1024;
+ break;
+ }
+ if (size > 65535) {
+ fprintf(stderr,
+ "Too much data for UDP packet\n");
+ exit(2);
+ }
+ break;
+ case 'p':
+ service = optarg;
+ break;
+ case 's':
+ use_sendfile = true;
+ break;
+ default:
+ format();
+ }
+ }
+
+ argc -= optind;
+ argv += optind;
+ if (argc != 2)
+ format();
+ filename = argv[0];
+ sockname = argv[1];
+
+ gai = getaddrinfo(sockname, service, &hints, &addrs);
+ if (gai) {
+ fprintf(stderr, "%s: %s\n", sockname, gai_strerror(gai));
+ exit(3);
+ }
+
+ if (!addrs) {
+ fprintf(stderr, "%s: No addresses\n", sockname);
+ exit(3);
+ }
+
+ sockname = addrs->ai_canonname;
+ sock = socket(addrs->ai_family, addrs->ai_socktype, addrs->ai_protocol);
+ OSERROR(sock, "socket");
+ OSERROR(connect(sock, addrs->ai_addr, addrs->ai_addrlen), "connect");
+
+ if (strcmp(filename, "-") != 0) {
+ fd = open(filename, flags);
+ OSERROR(fd, filename);
+ OSERROR(fstat(fd, &st), filename);
+ if (size > st.st_size)
+ size = st.st_size;
+ } else {
+ OSERROR(fstat(fd, &st), filename);
+ }
+
+ len = htonl(size);
+ OSERROR(send(sock, &len, 4, MSG_MORE), "sock/send");
+
+ if (!use_sendfile) {
+ while (size) {
+ r = read(fd, buffer, min(sizeof(buffer), size));
+ OSERROR(r, filename);
+ if (r == 0)
+ break;
+ size -= r;
+
+ o = 0;
+ do {
+ ssize_t w = send(sock, buffer + o, r - o,
+ size > 0 ? MSG_MORE : 0);
+ OSERROR(w, "sock/send");
+ o += w;
+ } while (o < r);
+ }
+ } else if (S_ISFIFO(st.st_mode)) {
+ r = splice(fd, NULL, sock, NULL, size, 0);
+ OSERROR(r, "sock/splice");
+ if (r != size) {
+ fprintf(stderr, "Short splice\n");
+ exit(1);
+ }
+ } else {
+ r = sendfile(sock, fd, NULL, size);
+ OSERROR(r, "sock/sendfile");
+ if (r != size) {
+ fprintf(stderr, "Short sendfile\n");
+ exit(1);
+ }
+ }
+
+ OSERROR(close(sock), "sock/close");
+ OSERROR(close(fd), "close");
+ return 0;
+}
diff --git a/samples/net/udp-sink.c b/samples/net/udp-sink.c
new file mode 100644
index 000000000000..f23c64acec4a
--- /dev/null
+++ b/samples/net/udp-sink.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * UDP sink server
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <getopt.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <netinet/in.h>
+
+#define OSERROR(X, Y) \
+ do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+
+static unsigned char buffer[512 * 1024];
+
+static void format(void)
+{
+ fprintf(stderr, "udp-sink [-4][-p<port>]\n");
+ exit(2);
+}
+
+int main(int argc, char *argv[])
+{
+ struct iovec iov[1] = {
+ [0] = {
+ .iov_base = buffer,
+ .iov_len = sizeof(buffer),
+ },
+ };
+ struct msghdr msg = {
+ .msg_iov = iov,
+ .msg_iovlen = 1,
+ };
+ unsigned int port = 5555;
+ bool ipv6 = true;
+ int opt, sock;
+
+ while ((opt = getopt(argc, argv, "4p:")) != EOF) {
+ switch (opt) {
+ case '4':
+ ipv6 = false;
+ break;
+ case 'p':
+ port = atoi(optarg);
+ break;
+ default:
+ format();
+ }
+ }
+
+ if (!ipv6) {
+ struct sockaddr_in sin = {
+ .sin_family = AF_INET,
+ .sin_port = htons(port),
+ };
+ sock = socket(AF_INET, SOCK_DGRAM, 0);
+ OSERROR(sock, "socket");
+ OSERROR(bind(sock, (struct sockaddr *)&sin, sizeof(sin)),
+ "bind");
+ } else {
+ struct sockaddr_in6 sin6 = {
+ .sin6_family = AF_INET6,
+ .sin6_port = htons(port),
+ };
+ sock = socket(AF_INET6, SOCK_DGRAM, 0);
+ OSERROR(sock, "socket");
+ OSERROR(bind(sock, (struct sockaddr *)&sin6, sizeof(sin6)),
+ "bind");
+ }
+
+ for (;;) {
+ ssize_t r;
+
+ r = recvmsg(sock, &msg, 0);
+ printf("rx %zd\n", r);
+ }
+}
diff --git a/samples/net/unix-send.c b/samples/net/unix-send.c
new file mode 100644
index 000000000000..5950fcf1ccd2
--- /dev/null
+++ b/samples/net/unix-send.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * AF_UNIX stream send client. Pass -s to use splice/sendfile.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <getopt.h>
+#include <limits.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/un.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/sendfile.h>
+
+#define OSERROR(X, Y) \
+ do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+#define min(x, y) ((x) < (y) ? (x) : (y))
+
+static unsigned char buffer[4096];
+
+static void format(void)
+{
+ fprintf(stderr, "unix-send [-ds] [-n<size>] <file>|- <socket-file>\n");
+ exit(2);
+}
+
+int main(int argc, char *argv[])
+{
+ struct sockaddr_un sun = { .sun_family = AF_UNIX, };
+ struct stat st;
+ const char *filename, *sockname;
+ unsigned int flags = O_RDONLY;
+ ssize_t r, w, o, size = LONG_MAX;
+ size_t plen, total = 0;
+ char *end;
+ bool use_sendfile = false, all = true;
+ int opt, sock, fd = 0;
+
+ while ((opt = getopt(argc, argv, "dn:s")) != EOF) {
+ switch (opt) {
+ case 'd':
+ flags |= O_DIRECT;
+ break;
+ case 'n':
+ size = strtoul(optarg, &end, 0);
+ switch (*end) {
+ case 'K':
+ case 'k':
+ size *= 1024;
+ break;
+ case 'M':
+ case 'm':
+ size *= 1024 * 1024;
+ break;
+ }
+ all = false;
+ break;
+ case 's':
+ use_sendfile = true;
+ break;
+ default:
+ format();
+ }
+ }
+
+ argc -= optind;
+ argv += optind;
+ if (argc != 2)
+ format();
+ filename = argv[0];
+ sockname = argv[1];
+
+ plen = strlen(sockname);
+ if (plen == 0 || plen > sizeof(sun.sun_path) - 1) {
+ fprintf(stderr, "socket filename too short or too long\n");
+ exit(2);
+ }
+ memcpy(sun.sun_path, sockname, plen + 1);
+
+ sock = socket(AF_UNIX, SOCK_STREAM, 0);
+ OSERROR(sock, "socket");
+ OSERROR(connect(sock, (struct sockaddr *)&sun, sizeof(sun)), "connect");
+
+ if (strcmp(filename, "-") != 0) {
+ fd = open(filename, flags);
+ OSERROR(fd, filename);
+ OSERROR(fstat(fd, &st), filename);
+ if (size > st.st_size)
+ size = st.st_size;
+ } else {
+ OSERROR(fstat(fd, &st), filename);
+ }
+
+ if (!use_sendfile) {
+ bool more = false;
+
+ while (size) {
+ r = read(fd, buffer, min(sizeof(buffer), size));
+ OSERROR(r, filename);
+ if (r == 0)
+ break;
+ size -= r;
+
+ o = 0;
+ do {
+ more = size > 0;
+ w = send(sock, buffer + o, r - o,
+ more ? MSG_MORE : 0);
+ OSERROR(w, "sock/send");
+ o += w;
+ total += w;
+ } while (o < r);
+ }
+
+ if (more)
+ send(sock, NULL, 0, 0);
+ } else if (S_ISFIFO(st.st_mode)) {
+ do {
+ r = splice(fd, NULL, sock, NULL, size,
+ size > 0 ? SPLICE_F_MORE : 0);
+ OSERROR(r, "sock/splice");
+ size -= r;
+ total += r;
+ } while (r > 0 && size > 0);
+ if (size && !all) {
+ fprintf(stderr, "Short splice\n");
+ exit(1);
+ }
+ } else {
+ r = sendfile(sock, fd, NULL, size);
+ OSERROR(r, "sock/sendfile");
+ if (r != size) {
+ fprintf(stderr, "Short sendfile\n");
+ exit(1);
+ }
+ total += r;
+ }
+
+ printf("Sent %zu bytes\n", total);
+ OSERROR(close(sock), "sock/close");
+ OSERROR(close(fd), "close");
+ return 0;
+}
diff --git a/samples/net/unix-sink.c b/samples/net/unix-sink.c
new file mode 100644
index 000000000000..9f0a5ac9c578
--- /dev/null
+++ b/samples/net/unix-sink.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * UNIX stream sink server
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/un.h>
+#include <sys/socket.h>
+
+#define OSERROR(X, Y) \
+ do { if ((long)(X) == -1) { perror(Y); exit(1); } } while (0)
+
+static unsigned char buffer[512 * 1024];
+
+int main(int argc, char *argv[])
+{
+ struct sockaddr_un sun = { .sun_family = AF_UNIX, };
+ size_t plen;
+ int server_sock, sock;
+
+ if (argc != 2) {
+ fprintf(stderr, "unix-sink <socket-file>\n");
+ exit(2);
+ }
+
+ plen = strlen(argv[1]);
+ if (plen == 0 || plen > sizeof(sun.sun_path) - 1) {
+ fprintf(stderr, "socket filename too short or too long\n");
+ exit(2);
+ }
+ memcpy(sun.sun_path, argv[1], plen + 1);
+
+ server_sock = socket(AF_UNIX, SOCK_STREAM, 0);
+ OSERROR(server_sock, "socket");
+ OSERROR(bind(server_sock, (struct sockaddr *)&sun, sizeof(sun)),
+ "bind");
+ OSERROR(listen(server_sock, 1), "listen");
+
+ for (;;) {
+ sock = accept(server_sock, NULL, NULL);
+ if (sock != -1) {
+ while (read(sock, buffer, sizeof(buffer)) > 0)
+ ;
+ close(sock);
+ }
+ }
+}
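udp-send above relies on UDP corking: the 4-byte length header goes out with MSG_MORE and the file data without it, so both should leave as a single datagram. The following is a minimal loopback sketch of that behaviour, assuming a Linux host with IPv4 loopback available; udp_loopback_pair() and cork_and_send() are hypothetical helper names, not part of the samples.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP receiver bound to an ephemeral loopback
 * port and a sender connected to it.  Returns 0 on success, -1 on error
 * (any socket already opened is closed again). */
static int udp_loopback_pair(int *rx_sock, int *tx_sock)
{
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
		.sin_port = 0,		/* let the kernel choose the port */
	};
	socklen_t alen = sizeof(addr);

	*rx_sock = socket(AF_INET, SOCK_DGRAM, 0);
	*tx_sock = socket(AF_INET, SOCK_DGRAM, 0);
	if (*rx_sock < 0 || *tx_sock < 0 ||
	    bind(*rx_sock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(*rx_sock, (struct sockaddr *)&addr, &alen) < 0 ||
	    connect(*tx_sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		if (*rx_sock >= 0)
			close(*rx_sock);
		if (*tx_sock >= 0)
			close(*tx_sock);
		return -1;
	}
	return 0;
}

/* Send a 4-byte big-endian length header corked with MSG_MORE, then the
 * payload without MSG_MORE, which releases the cork: UDP should emit the
 * header and payload together as one datagram. */
static int cork_and_send(int sock, const void *data, uint32_t size)
{
	uint32_t len = htonl(size);

	if (send(sock, &len, sizeof(len), MSG_MORE) != (ssize_t)sizeof(len))
		return -1;
	if (send(sock, data, size, 0) != (ssize_t)size)
		return -1;
	return 0;
}
```

A caller would open the pair, call cork_and_send(tx, "hello", 5) and expect a single recv() on the receiver to return 9 bytes: the header followed by the payload.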


2023-06-02 15:40:08

by David Howells

Subject: [PATCH net-next v3 09/11] tls/device: Support MSG_SPLICE_PAGES

Make TLS's device sendmsg() support MSG_SPLICE_PAGES. This causes pages to
be spliced from the source iterator if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <[email protected]>
cc: Chuck Lever <[email protected]>
cc: Boris Pismenny <[email protected]>
cc: John Fastabend <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
---
net/tls/tls_device.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 9ef766e41c7a..f2f1aff19e4a 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -509,6 +509,29 @@ static int tls_push_data(struct sock *sk,
tls_append_frag(record, &zc_pfrag, copy);

iter_offset.offset += copy;
+ } else if (copy && (flags & MSG_SPLICE_PAGES)) {
+ struct page_frag zc_pfrag;
+ struct page **pages = &zc_pfrag.page;
+ size_t off;
+
+ rc = iov_iter_extract_pages(iter_offset.msg_iter,
+ &pages, copy, 1, 0, &off);
+ if (rc <= 0) {
+ if (rc == 0)
+ rc = -EIO;
+ goto handle_error;
+ }
+ copy = rc;
+
+ if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
+ iov_iter_revert(iter_offset.msg_iter, copy);
+ rc = -EIO;
+ goto handle_error;
+ }
+
+ zc_pfrag.offset = off;
+ zc_pfrag.size = copy;
+ tls_append_frag(record, &zc_pfrag, copy);
} else if (copy) {
copy = min_t(size_t, copy, pfrag->size - pfrag->offset);

@@ -572,6 +595,9 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
union tls_iter_offset iter;
int rc;

+ if (!tls_ctx->zerocopy_sendfile)
+ msg->msg_flags &= ~MSG_SPLICE_PAGES;
+
mutex_lock(&tls_ctx->tx_lock);
lock_sock(sk);



2023-06-02 16:59:52

by Linus Torvalds

Subject: Re: [PATCH net-next v3 05/11] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()

On Fri, Jun 2, 2023 at 11:08 AM David Howells <[email protected]> wrote:
>
> Fix this by making splice_direct_to_actor() always signal SPLICE_F_MORE if
> we haven't yet hit the requested operation size.

Well, I certainly like this patch better than the previous versions,
just because it doesn't add random fd-specific code.

That said, I think it might be worth really documenting the behavior,
particularly for files where the kernel *could* know "the file is at
EOF, no more data".

I hope that if user space wants to splice() a file to a socket, said
user space would have done an 'fstat()' and actually pass in the file
size as the length to splice(). Because if they do, I think this
simplified patch does the right thing automatically.

But if user space instead passes in a "maximally big len", and just
depends on the kernel then doing that

ret = do_splice_to(in, &pos, pipe, len, flags);
if (unlikely(ret <= 0))
goto out_release;

to stop splicing at EOF, then the last splice_write() will have had
SPLICE_F_MORE set, even though no more data is coming from the file,
of course.

And I think that's fine. But wasn't that effectively what the old code
was already doing because 'read_len' was smaller than 'len'? I thought
that was what you wanted to fix?

IOW, I thought you wanted to clear SPLICE_F_MORE when we hit EOF. This
still doesn't do that.

So now I'm confused about what your "fix" is. Your patch doesn't
actually seem to change existing behavior in splice_direct_to_actor().

I was expecting you to actually pass the 'sd' down to do_splice_to()
and then to ->splice_read(), so that the splice_read() function could
say "I have no more", and clear it.

But you didn't do that.

Am I misreading something, or did I miss another patch?

Linus
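The calling convention Linus describes can be sketched from userspace: fstat() the file and pass the exact remaining byte count as the length, rather than a "maximally big" value. sendfile() goes through splice_direct_to_actor(), so, as Linus suggests, the final chunk is then the one that reaches the requested size. This is a minimal sketch under those assumptions; send_whole_file() is a hypothetical name and the retry-on-EINTR policy is an illustrative choice.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send the whole of the file at 'path' to 'out_fd'
 * (typically a connected socket).  The length passed to sendfile() is the
 * exact number of bytes still to go, derived from fstat().
 * Returns the number of bytes sent, or -1 on error (errno is set).
 */
static ssize_t send_whole_file(int out_fd, const char *path)
{
	struct stat st;
	off_t off = 0;
	int fd, saved;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0)
		goto fail;

	while (off < st.st_size) {
		/* sendfile() advances 'off' by the amount transferred. */
		ssize_t r = sendfile(out_fd, fd, &off, st.st_size - off);

		if (r < 0) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		if (r == 0)	/* file was truncated underneath us */
			break;
	}
	close(fd);
	return off;

fail:
	saved = errno;
	close(fd);
	errno = saved;
	return -1;
}
```

Passing a huge length instead leaves the last write flagged as expecting more data even though the file is at EOF, which is the case Linus asks about.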

2023-06-02 18:45:00

by Simon Horman

Subject: Re: [PATCH net-next v3 03/11] tls/sw: Use zero-length sendmsg() without MSG_MORE to flush

+ dan Carpenter

On Fri, Jun 02, 2023 at 04:07:44PM +0100, David Howells wrote:
> Allow userspace to end a TLS record without supplying any data by calling
> send()/sendto()/sendmsg() with no data and no MSG_MORE flag. This can be
> used to flush a previous send/splice that had MSG_MORE or SPLICE_F_MORE set
> or a sendfile() that was incomplete.
>
> Without this, a zero-length send to tls-sw is just ignored. I think
> tls-device will do the right thing without modification.
>
> Signed-off-by: David Howells <[email protected]>
> cc: Chuck Lever <[email protected]>
> cc: Boris Pismenny <[email protected]>
> cc: John Fastabend <[email protected]>
> cc: Jakub Kicinski <[email protected]>
> cc: Eric Dumazet <[email protected]>
> cc: "David S. Miller" <[email protected]>
> cc: Paolo Abeni <[email protected]>
> cc: Jens Axboe <[email protected]>
> cc: Matthew Wilcox <[email protected]>
> cc: [email protected]
> ---
> net/tls/tls_sw.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index cac1adc968e8..6aa6d17888f5 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -945,7 +945,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> struct tls_rec *rec;
> int required_size;
> int num_async = 0;
> - bool full_record;
> + bool full_record = false;
> int record_room;
> int num_zc = 0;
> int orig_size;
> @@ -971,6 +971,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> }
> }
>
> + if (!msg_data_left(msg) && eor)
> + goto just_flush;
> +

Hi David,

the flow of this function is not entirely simple, so it is not easy for me
to manually verify this. But in combination gcc-12 -Wmaybe-uninitialized
and Smatch report that the following may be used uninitialised as a result
of this change:

* msg_pl
* orig_size
* msg_en
* required_size
* try_to_copy

> while (msg_data_left(msg)) {
> if (sk->sk_err) {
> ret = -sk->sk_err;
> @@ -1082,6 +1085,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> */
> tls_ctx->pending_open_record_frags = true;
> copied += try_to_copy;
> +just_flush:
> if (full_record || eor) {
> ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
> record_type, &copied,
>
>

2023-06-02 19:12:05

by Dan Carpenter

Subject: Re: [PATCH net-next v3 03/11] tls/sw: Use zero-length sendmsg() without MSG_MORE to flush

On Fri, Jun 02, 2023 at 08:27:56PM +0200, Simon Horman wrote:
> + dan Carpenter
>
> On Fri, Jun 02, 2023 at 04:07:44PM +0100, David Howells wrote:
> > Allow userspace to end a TLS record without supplying any data by calling
> > send()/sendto()/sendmsg() with no data and no MSG_MORE flag. This can be
> > used to flush a previous send/splice that had MSG_MORE or SPLICE_F_MORE set
> > or a sendfile() that was incomplete.
> >
> > Without this, a zero-length send to tls-sw is just ignored. I think
> > tls-device will do the right thing without modification.
> >
> > Signed-off-by: David Howells <[email protected]>
> > cc: Chuck Lever <[email protected]>
> > cc: Boris Pismenny <[email protected]>
> > cc: John Fastabend <[email protected]>
> > cc: Jakub Kicinski <[email protected]>
> > cc: Eric Dumazet <[email protected]>
> > cc: "David S. Miller" <[email protected]>
> > cc: Paolo Abeni <[email protected]>
> > cc: Jens Axboe <[email protected]>
> > cc: Matthew Wilcox <[email protected]>
> > cc: [email protected]
> > ---
> > net/tls/tls_sw.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > index cac1adc968e8..6aa6d17888f5 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -945,7 +945,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > struct tls_rec *rec;
> > int required_size;
> > int num_async = 0;
> > - bool full_record;
> > + bool full_record = false;
> > int record_room;
> > int num_zc = 0;
> > int orig_size;
> > @@ -971,6 +971,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > }
> > }
> >
> > + if (!msg_data_left(msg) && eor)
> > + goto just_flush;
> > +
>
> Hi David,
>
> the flow of this function is not entirely simple, so it is not easy for me
> to manually verify this. But in combination gcc-12 -Wmaybe-uninitialized
> and Smatch report that the following may be used uninitialised as a result
> of this change:
>
> * msg_pl

This warning seems correct to me.

> * orig_size

This warning assumes we hit the first warning and then hit the goto
wait_for_memory;

> * msg_en

I don't get this warning on my system but it's the same thing. Hit the
first warning then the goto wait_for_memory.

> * required_size

Same.

> * try_to_copy

I don't really understand this warning and I can't reproduce it.
Strange.

regards,
dan carpenter
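The warnings discussed here arise because the early goto skips the loop body that would otherwise assign msg_pl and the related variables before the code at just_flush reads them. Below is a simplified, hypothetical model of the general remedy: assign everything the label's code reads before any jump to it. pending_rec and model_sendmsg() are invented for illustration and do not reflect the actual tls_sw_sendmsg() code or its eventual fix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an open TLS record awaiting transmission. */
struct pending_rec {
	size_t len;	/* bytes buffered but not yet pushed */
	int pushed;	/* number of times the record has been closed */
};

/*
 * Hypothetical model of a sendmsg() that appends 'len' bytes to the open
 * record and closes it when 'eor' is set.  'rec' and 'copied' are both
 * assigned before the early goto, so the code at 'flush' never reads an
 * indeterminate value, whichever path reaches it.
 */
static size_t model_sendmsg(struct pending_rec *open, size_t len, bool eor)
{
	struct pending_rec *rec = open;	/* valid on every path */
	size_t copied = 0;

	if (len == 0 && eor)
		goto flush;	/* zero-length send: only end the record */

	rec->len += len;	/* stand-in for copying data into the record */
	copied = len;

flush:
	if (eor && rec->len > 0) {
		rec->len = 0;	/* stand-in for encrypting and pushing */
		rec->pushed++;
	}
	return copied;
}
```

In this model a zero-length flush with nothing pending is a no-op.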


2023-06-03 07:02:35

by Jakub Kicinski

Subject: Re: [PATCH net-next v3 11/11] net: Add samples for network I/O and splicing

On Fri, 2 Jun 2023 16:07:52 +0100 David Howells wrote:
> Examples include:
>
> ./splice-out -w0x400 /foo/16K 4K | ./alg-encrypt -s -
> ./splice-out -w0x400 /foo/1M | ./unix-send -s - /tmp/foo
> ./splice-out -w0x400 /foo/16K 16K -w1 | ./tls-send -s6 -n16K - servbox
> ./tcp-send /bin/ls 192.168.6.1
> ./udp-send -4 -p5555 /foo/4K localhost

Can it be made into a selftest? Move the code and wrap the above in a
bash script?

2023-06-03 15:03:23

by Simon Horman

Subject: Re: [PATCH net-next v3 03/11] tls/sw: Use zero-length sendmsg() without MSG_MORE to flush

On Fri, Jun 02, 2023 at 10:00:45PM +0300, Dan Carpenter wrote:
> On Fri, Jun 02, 2023 at 08:27:56PM +0200, Simon Horman wrote:
> > + dan Carpenter
> >
> > On Fri, Jun 02, 2023 at 04:07:44PM +0100, David Howells wrote:
> > > Allow userspace to end a TLS record without supplying any data by calling
> > > send()/sendto()/sendmsg() with no data and no MSG_MORE flag. This can be
> > > used to flush a previous send/splice that had MSG_MORE or SPLICE_F_MORE set
> > > or a sendfile() that was incomplete.
> > >
> > > Without this, a zero-length send to tls-sw is just ignored. I think
> > > tls-device will do the right thing without modification.
> > >
> > > Signed-off-by: David Howells <[email protected]>
> > > cc: Chuck Lever <[email protected]>
> > > cc: Boris Pismenny <[email protected]>
> > > cc: John Fastabend <[email protected]>
> > > cc: Jakub Kicinski <[email protected]>
> > > cc: Eric Dumazet <[email protected]>
> > > cc: "David S. Miller" <[email protected]>
> > > cc: Paolo Abeni <[email protected]>
> > > cc: Jens Axboe <[email protected]>
> > > cc: Matthew Wilcox <[email protected]>
> > > cc: [email protected]
> > > ---
> > > net/tls/tls_sw.c | 6 +++++-
> > > 1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > > index cac1adc968e8..6aa6d17888f5 100644
> > > --- a/net/tls/tls_sw.c
> > > +++ b/net/tls/tls_sw.c
> > > @@ -945,7 +945,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > > struct tls_rec *rec;
> > > int required_size;
> > > int num_async = 0;
> > > - bool full_record;
> > > + bool full_record = false;
> > > int record_room;
> > > int num_zc = 0;
> > > int orig_size;
> > > @@ -971,6 +971,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > > }
> > > }
> > >
> > > + if (!msg_data_left(msg) && eor)
> > > + goto just_flush;
> > > +
> >
> > Hi David,
> >
> > the flow of this function is not entirely simple, so it is not easy for me
> > to manually verify this. But in combination gcc-12 -Wmaybe-uninitialized
> > and Smatch report that the following may be used uninitialised as a result
> > of this change:
> >
> > * msg_pl
>
> This warning seems correct to me.
>
> > * orig_size
>
> This warning assumes we hit the first warning and then hit the goto
> wait_for_memory;
>
> > * msg_en
>
> I don't get this warning on my system but it's the same thing. Hit the
> first warning then the goto wait_for_memory.
>
> > * required_size
>
> Same.
>
> > * try_to_copy
>
> I don't really understand this warning and I can't reproduce it.
> Strange.

Thanks Dan.

Of the above, I think only the last one was flagged by GCC but not by
Smatch. I can try investigating further if it is useful.