Now that splice_to_socket() has been rewritten so that nothing uses the
->sendpage() file op[1] any more, some further changes can be made. Here
are some miscellaneous changes that this now allows:
(1) Remove the ->sendpage() file op.
(2) Remove hash_sendpage*() from AF_ALG.
(3) Make sunrpc send multiple pages in a single sendmsg() call rather than
calling sendpage() in TCP (or maybe TLS).
(4) Make tcp_bpf_sendpage() a wrapper around tcp_bpf_sendmsg().
(5) Make AF_KCM use sendmsg() when calling down to TCP and then make it
send entire fragment lists in single sendmsg() calls.
I've also pushed the patches here:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-3-misc
David
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=fd5f4d7da29218485153fd8b4c08da7fc130c79f [1]
David Howells (6):
Remove file->f_op->sendpage
algif: Remove hash_sendpage*()
sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
tcp_bpf: Make tcp_bpf_sendpage() go through tcp_bpf_sendmsg(MSG_SPLICE_PAGES)
kcm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
kcm: Send multiple frags in one sendmsg()
crypto/algif_hash.c | 66 --------------------
include/linux/fs.h | 1 -
include/linux/sunrpc/svc.h | 11 ++--
include/net/kcm.h | 2 +-
net/ipv4/tcp_bpf.c | 49 +++------------
net/kcm/kcmsock.c | 120 ++++++++++++++++---------------------
net/sunrpc/svcsock.c | 38 ++++--------
7 files changed, 77 insertions(+), 210 deletions(-)
When transmitting data, call down into the transport socket using sendmsg()
with MSG_SPLICE_PAGES to indicate that the content should be spliced, rather
than using sendpage().
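A rough userspace analogue of this conversion, offered only as a sketch:
struct bio_vec, iov_iter and MSG_SPLICE_PAGES are kernel-internal, so here a
struct iovec stands in for the bio_vec and plain sendmsg() (with no splicing)
stands in for sock_sendmsg(). The helper names iov_set_frag() and send_frag()
are hypothetical. Like bvec_set_page() in the patch, the setter takes the
length before the offset, whereas kernel_sendpage() took the offset first,
which is easy to get backwards.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical userspace stand-in for bvec_set_page(): the length comes
 * before the offset, matching the argument order used in the patch. */
static void iov_set_frag(struct iovec *iov, char *page, size_t len,
			 size_t offset)
{
	iov->iov_base = page + offset;
	iov->iov_len = len;
}

/* Send the unsent remainder of one fragment in a single sendmsg() call.
 * frag_off/frag_size describe the fragment within 'page', and frag_offset
 * is how many of its bytes have already gone out (cf. the kcm loop).
 * Returns sendmsg()'s result: bytes sent, or -1 with errno set. */
static ssize_t send_frag(int fd, char *page, size_t frag_off,
			 size_t frag_size, size_t frag_offset)
{
	struct iovec iov;
	struct msghdr msg;

	assert(frag_offset < frag_size);
	iov_set_frag(&iov, page, frag_size - frag_offset,
		     frag_off + frag_offset);
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	/* Userspace has no MSG_SPLICE_PAGES; only MSG_DONTWAIT is kept. */
	return sendmsg(fd, &msg, MSG_DONTWAIT);
}
```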
Signed-off-by: David Howells <[email protected]>
cc: Tom Herbert <[email protected]>
cc: Tom Herbert <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
---
net/kcm/kcmsock.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 7dee74430b59..3bcac1453f10 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -641,6 +641,10 @@ static int kcm_write_msgs(struct kcm_sock *kcm)
for (fragidx = 0; fragidx < skb_shinfo(skb)->nr_frags;
fragidx++) {
+ struct bio_vec bvec;
+ struct msghdr msg = {
+ .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES,
+ };
skb_frag_t *frag;
frag_offset = 0;
@@ -651,11 +655,13 @@ static int kcm_write_msgs(struct kcm_sock *kcm)
goto out;
}
- ret = kernel_sendpage(psock->sk->sk_socket,
- skb_frag_page(frag),
- skb_frag_off(frag) + frag_offset,
- skb_frag_size(frag) - frag_offset,
- MSG_DONTWAIT);
+ bvec_set_page(&bvec,
+ skb_frag_page(frag),
+ skb_frag_size(frag) - frag_offset,
+ skb_frag_off(frag) + frag_offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1,
+ bvec.bv_len);
+ ret = sock_sendmsg(psock->sk->sk_socket, &msg);
if (ret <= 0) {
if (ret == -EAGAIN) {
/* Save state to try again when there's
Rewrite the AF_KCM transmission loop to send all the fragments in a single
skb or frag_list skb in one sendmsg() call with MSG_SPLICE_PAGES set. The
list of fragments in each skb is conveniently a bio_vec[] that can simply be
attached to a BVEC iterator.
Note: I'm working out the size of each fragment skb by adding up bv_len over
all the bio_vecs in skb->frags[], but surely this information is recorded
somewhere? For the skbs in head->frag_list, it is equal to skb->data_len,
but not for the head, because head->data_len also includes all the tail
frags.
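The resumable loop can be approximated in userspace as below. This is a
sketch under stated assumptions: a struct iovec array stands in for
skb->frags[], the hypothetical iov_advance() mimics what iov_iter_advance()
does to skip the txm->frag_offset bytes already sent, and MAX_FRAGS is an
arbitrary cap chosen for the sketch. The total is computed by summing the
per-fragment lengths, as the note above describes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_FRAGS 16 /* arbitrary cap for this sketch */

/* Sum the fragment lengths, as the patch sums bv_len over skb->frags[]. */
static size_t frags_total(const struct iovec *frags, int nr)
{
	size_t total = 0;

	for (int i = 0; i < nr; i++)
		total += frags[i].iov_len;
	return total;
}

/* Hypothetical analogue of iov_iter_advance(): skip 'skip' bytes by moving
 * *vec forward and trimming the first remaining element in place. */
static void iov_advance(struct iovec **vec, int *nr, size_t skip)
{
	while (*nr > 0 && skip >= (*vec)->iov_len) {
		skip -= (*vec)->iov_len;
		(*vec)++;
		(*nr)--;
	}
	if (*nr > 0) {
		(*vec)->iov_base = (char *)(*vec)->iov_base + skip;
		(*vec)->iov_len -= skip;
	}
}

/* Send all fragments, resuming after *frag_offset bytes, using one
 * sendmsg() per attempt rather than one call per fragment.
 * Returns 0 once everything is sent, or -1 if sendmsg() fails or sends
 * nothing; *frag_offset then records how far the transfer got. */
static int send_frags(int fd, const struct iovec *frags, int nr,
		      size_t *frag_offset)
{
	size_t total = frags_total(frags, nr);

	assert(nr > 0 && nr <= MAX_FRAGS && *frag_offset <= total);
	while (*frag_offset < total) {
		struct iovec local[MAX_FRAGS], *vec = local;
		int left = nr;
		struct msghdr msg;
		ssize_t ret;

		/* Work on a copy so the caller's array is not trimmed. */
		memcpy(local, frags, nr * sizeof(*frags));
		iov_advance(&vec, &left, *frag_offset);
		memset(&msg, 0, sizeof(msg));
		msg.msg_iov = vec;
		msg.msg_iovlen = left;
		ret = sendmsg(fd, &msg, MSG_DONTWAIT);
		if (ret <= 0)
			return -1;
		*frag_offset += ret;
	}
	return 0;
}
```

If sendmsg() fails with EAGAIN, *frag_offset still records how far the
transfer got, so calling send_frags() again picks up where it left off; in
the patch, txm->frag_skb and txm->frag_offset play that role.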
Signed-off-by: David Howells <[email protected]>
cc: Tom Herbert <[email protected]>
cc: Tom Herbert <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
---
include/net/kcm.h | 2 +-
net/kcm/kcmsock.c | 126 ++++++++++++++++++----------------------------
2 files changed, 51 insertions(+), 77 deletions(-)
diff --git a/include/net/kcm.h b/include/net/kcm.h
index 2d704f8f4905..90279e5e09a5 100644
--- a/include/net/kcm.h
+++ b/include/net/kcm.h
@@ -47,9 +47,9 @@ struct kcm_stats {
struct kcm_tx_msg {
unsigned int sent;
- unsigned int fragidx;
unsigned int frag_offset;
unsigned int msg_flags;
+ bool started_tx;
struct sk_buff *frag_skb;
struct sk_buff *last_skb;
};
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 3bcac1453f10..d75d775e9462 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -581,12 +581,10 @@ static void kcm_report_tx_retry(struct kcm_sock *kcm)
*/
static int kcm_write_msgs(struct kcm_sock *kcm)
{
+ unsigned int total_sent = 0;
struct sock *sk = &kcm->sk;
struct kcm_psock *psock;
- struct sk_buff *skb, *head;
- struct kcm_tx_msg *txm;
- unsigned short fragidx, frag_offset;
- unsigned int sent, total_sent = 0;
+ struct sk_buff *head;
int ret = 0;
kcm->tx_wait_more = false;
@@ -600,78 +598,57 @@ static int kcm_write_msgs(struct kcm_sock *kcm)
if (skb_queue_empty(&sk->sk_write_queue))
return 0;
- kcm_tx_msg(skb_peek(&sk->sk_write_queue))->sent = 0;
-
- } else if (skb_queue_empty(&sk->sk_write_queue)) {
- return 0;
+ kcm_tx_msg(skb_peek(&sk->sk_write_queue))->started_tx = false;
}
- head = skb_peek(&sk->sk_write_queue);
- txm = kcm_tx_msg(head);
-
- if (txm->sent) {
- /* Send of first skbuff in queue already in progress */
- if (WARN_ON(!psock)) {
- ret = -EINVAL;
- goto out;
+retry:
+ while ((head = skb_peek(&sk->sk_write_queue))) {
+ struct msghdr msg = {
+ .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES,
+ };
+ struct kcm_tx_msg *txm = kcm_tx_msg(head);
+ struct sk_buff *skb;
+ unsigned int msize;
+ int i;
+
+ if (!txm->started_tx) {
+ psock = reserve_psock(kcm);
+ if (!psock)
+ goto out;
+ skb = head;
+ txm->frag_offset = 0;
+ txm->sent = 0;
+ txm->started_tx = true;
+ } else {
+ if (WARN_ON(!psock)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ skb = txm->frag_skb;
}
- sent = txm->sent;
- frag_offset = txm->frag_offset;
- fragidx = txm->fragidx;
- skb = txm->frag_skb;
-
- goto do_frag;
- }
-
-try_again:
- psock = reserve_psock(kcm);
- if (!psock)
- goto out;
-
- do {
- skb = head;
- txm = kcm_tx_msg(head);
- sent = 0;
-do_frag_list:
if (WARN_ON(!skb_shinfo(skb)->nr_frags)) {
ret = -EINVAL;
goto out;
}
- for (fragidx = 0; fragidx < skb_shinfo(skb)->nr_frags;
- fragidx++) {
- struct bio_vec bvec;
- struct msghdr msg = {
- .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES,
- };
- skb_frag_t *frag;
-
- frag_offset = 0;
-do_frag:
- frag = &skb_shinfo(skb)->frags[fragidx];
- if (WARN_ON(!skb_frag_size(frag))) {
- ret = -EINVAL;
- goto out;
- }
+ msize = 0;
+ for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
+ msize += skb_shinfo(skb)->frags[i].bv_len;
+
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE,
+ skb_shinfo(skb)->frags, skb_shinfo(skb)->nr_frags,
+ msize);
+ iov_iter_advance(&msg.msg_iter, txm->frag_offset);
- bvec_set_page(&bvec,
- skb_frag_page(frag),
- skb_frag_size(frag) - frag_offset,
- skb_frag_off(frag) + frag_offset);
- iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1,
- bvec.bv_len);
+ do {
ret = sock_sendmsg(psock->sk->sk_socket, &msg);
if (ret <= 0) {
if (ret == -EAGAIN) {
/* Save state to try again when there's
* write space on the socket
*/
- txm->sent = sent;
- txm->frag_offset = frag_offset;
- txm->fragidx = fragidx;
txm->frag_skb = skb;
-
ret = 0;
goto out;
}
@@ -685,39 +662,36 @@ static int kcm_write_msgs(struct kcm_sock *kcm)
true);
unreserve_psock(kcm);
- txm->sent = 0;
+ txm->started_tx = false;
kcm_report_tx_retry(kcm);
ret = 0;
-
- goto try_again;
+ goto retry;
}
- sent += ret;
- frag_offset += ret;
+ txm->sent += ret;
+ txm->frag_offset += ret;
KCM_STATS_ADD(psock->stats.tx_bytes, ret);
- if (frag_offset < skb_frag_size(frag)) {
- /* Not finished with this frag */
- goto do_frag;
- }
- }
+ } while (msg.msg_iter.count > 0);
if (skb == head) {
if (skb_has_frag_list(skb)) {
- skb = skb_shinfo(skb)->frag_list;
- goto do_frag_list;
+ txm->frag_skb = skb_shinfo(skb)->frag_list;
+ txm->frag_offset = 0;
+ continue;
}
} else if (skb->next) {
- skb = skb->next;
- goto do_frag_list;
+ txm->frag_skb = skb->next;
+ txm->frag_offset = 0;
+ continue;
}
/* Successfully sent the whole packet, account for it. */
+ sk->sk_wmem_queued -= txm->sent;
+ total_sent += txm->sent;
skb_dequeue(&sk->sk_write_queue);
kfree_skb(head);
- sk->sk_wmem_queued -= sent;
- total_sent += sent;
KCM_STATS_INCR(psock->stats.tx_msgs);
- } while ((head = skb_peek(&sk->sk_write_queue)));
+ }
out:
if (!head) {
/* Done with all queued messages. */
Make tcp_bpf_sendpage() a wrapper around tcp_bpf_sendmsg(MSG_SPLICE_PAGES)
rather than a loop calling tcp_sendpage(). sendpage() will be removed in
the future.
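The wrapper pattern can be illustrated in userspace as follows, again only
as a sketch: MSG_SENDPAGE_NOTLAST is kernel-internal, so the hypothetical
send_buffer() takes a separate more_follows flag and maps it to MSG_MORE
(which does exist in the Linux userspace socket API), whereas the patch
keeps the caller's flags and ORs in MSG_MORE. The single buffer is wrapped
in a one-element vector, much as bvec_set_page() and iov_iter_bvec() do in
the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Translate a "more data follows" hint into MSG_MORE, as the patch does
 * for MSG_SENDPAGE_NOTLAST. */
static int wrapper_flags(int flags, bool more_follows)
{
	return more_follows ? (flags | MSG_MORE) : flags;
}

/* Hypothetical sendpage()-style entry point implemented on top of
 * sendmsg(): wrap [offset, offset + size) of 'page' in a one-element
 * vector and send it. Returns sendmsg()'s result. */
static ssize_t send_buffer(int fd, const char *page, size_t offset,
			   size_t size, int flags, bool more_follows)
{
	struct iovec iov;
	struct msghdr msg;

	assert(page != NULL);
	iov.iov_base = (char *)page + offset;
	iov.iov_len = size;
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	return sendmsg(fd, &msg, wrapper_flags(flags, more_follows));
}
```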
Signed-off-by: David Howells <[email protected]>
cc: John Fastabend <[email protected]>
cc: Jakub Sitnicki <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
---
net/ipv4/tcp_bpf.c | 49 +++++++++-------------------------------------
1 file changed, 9 insertions(+), 40 deletions(-)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index e75023ea052f..5a84053ac62b 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -568,49 +568,18 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
static int tcp_bpf_sendpage(struct sock *sk, struct page *page, int offset,
size_t size, int flags)
{
- struct sk_msg tmp, *msg = NULL;
- int err = 0, copied = 0;
- struct sk_psock *psock;
- bool enospc = false;
-
- psock = sk_psock_get(sk);
- if (unlikely(!psock))
- return tcp_sendpage(sk, page, offset, size, flags);
+ struct bio_vec bvec;
+ struct msghdr msg = {
+ .msg_flags = flags | MSG_SPLICE_PAGES,
+ };
- lock_sock(sk);
- if (psock->cork) {
- msg = psock->cork;
- } else {
- msg = &tmp;
- sk_msg_init(msg);
- }
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
- /* Catch case where ring is full and sendpage is stalled. */
- if (unlikely(sk_msg_full(msg)))
- goto out_err;
-
- sk_msg_page_add(msg, page, size, offset);
- sk_mem_charge(sk, size);
- copied = size;
- if (sk_msg_full(msg))
- enospc = true;
- if (psock->cork_bytes) {
- if (size > psock->cork_bytes)
- psock->cork_bytes = 0;
- else
- psock->cork_bytes -= size;
- if (psock->cork_bytes && !enospc)
- goto out_err;
- /* All cork bytes are accounted, rerun the prog. */
- psock->eval = __SK_NONE;
- psock->cork_bytes = 0;
- }
+ if (flags & MSG_SENDPAGE_NOTLAST)
+ msg.msg_flags |= MSG_MORE;
- err = tcp_bpf_send_verdict(sk, psock, msg, &copied, flags);
-out_err:
- release_sock(sk);
- sk_psock_put(sk, psock);
- return copied ? copied : err;
+ return tcp_bpf_sendmsg(sk, &msg, size);
}
enum {
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <[email protected]>:
On Fri, 9 Jun 2023 11:02:15 +0100 you wrote:
> Now that the splice_to_socket() has been rewritten so that nothing now uses
> the ->sendpage() file op[1], some further changes can be made, so here are
> some miscellaneous changes that can now be done.
>
> (1) Remove the ->sendpage() file op.
>
> (2) Remove hash_sendpage*() from AF_ALG.
>
> [...]
Here is the summary with links:
- [net-next,1/6] Remove file->f_op->sendpage
https://git.kernel.org/netdev/net-next/c/a3bbdc52c38f
- [net-next,2/6] algif: Remove hash_sendpage*()
https://git.kernel.org/netdev/net-next/c/345ee3e8126a
- [net-next,3/6] sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage
https://git.kernel.org/netdev/net-next/c/5df5dd03a8f7
- [net-next,4/6] tcp_bpf: Make tcp_bpf_sendpage() go through tcp_bpf_sendmsg(MSG_SPLICE_PAGES)
https://git.kernel.org/netdev/net-next/c/de17c6857301
- [net-next,5/6] kcm: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage
https://git.kernel.org/netdev/net-next/c/264ba53fac79
- [net-next,6/6] kcm: Send multiple frags in one sendmsg()
https://git.kernel.org/netdev/net-next/c/c31a25e1db48
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html