2023-07-30 09:42:02

by Arseniy Krasnov

Subject: [PATCH net-next v5 1/4] vsock/virtio/vhost: read data from non-linear skb

This is a preparation patch for MSG_ZEROCOPY support. It adds handling of
non-linear skbs by replacing direct calls of 'memcpy_to_msg()' with
'skb_copy_datagram_iter()'. The main advantage of the latter is that it
can handle the paged part of the skb by using 'kmap()' on each page; if
there are no pages in the skb, it behaves like a simple copy to the iov
iterator. This patch also adds a new field to the control block of the
skb: it holds the current offset in the skb from which to read the next
portion of data (regardless of whether the skb is linear or not). The
idea behind this field is that 'skb_copy_datagram_iter()' handles both
types of skb internally; it just needs an offset from which to copy data
from the given skb. This offset is incremented on each read from the skb.
This approach avoids special handling of non-linear skbs:
1) We can't call 'skb_pull()' on it, because it updates 'data' pointer.
2) We need to update 'data_len' also on each read from this skb.
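
As an illustration of the offset idea described above, here is a minimal
userspace sketch. It is not kernel code: 'struct toy_skb', 'toy_copy_at()'
and 'toy_read()' are hypothetical names invented for this example, and
'toy_copy_at()' only imitates the role that 'skb_copy_datagram_iter()'
plays in the patch (copy from a given offset, whether the bytes sit in
the linear part or in fragments).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy userspace model of an skb: a linear head plus paged fragments.
 * None of these names are kernel APIs.
 */
struct toy_frag {
	const char *data;
	size_t len;
};

struct toy_skb {
	const char *head;	/* linear part */
	size_t head_len;
	struct toy_frag frags[4];
	size_t nr_frags;
	size_t len;		/* total bytes: head_len + all frags */
	size_t offset;		/* read cursor, plays the role of the new CB field */
};

/* Copy 'len' bytes starting 'offset' bytes into the packet, crossing the
 * linear/paged boundary as needed. Returns 0 on success and -1 if the
 * requested range does not fit inside the packet.
 */
static int toy_copy_at(const struct toy_skb *skb, size_t offset,
		       char *dst, size_t len)
{
	size_t i, chunk;

	if (offset > skb->len || len > skb->len - offset)
		return -1;

	/* Linear part first. */
	if (offset < skb->head_len) {
		chunk = skb->head_len - offset;
		if (chunk > len)
			chunk = len;
		memcpy(dst, skb->head + offset, chunk);
		dst += chunk;
		len -= chunk;
		offset = 0;
	} else {
		offset -= skb->head_len;
	}

	/* Then each fragment in turn. */
	for (i = 0; i < skb->nr_frags && len > 0; i++) {
		const struct toy_frag *f = &skb->frags[i];

		if (offset >= f->len) {
			offset -= f->len;
			continue;
		}
		chunk = f->len - offset;
		if (chunk > len)
			chunk = len;
		memcpy(dst, f->data + offset, chunk);
		dst += chunk;
		len -= chunk;
		offset = 0;
	}

	/* Holds as long as skb->len matches head_len plus all fragments. */
	assert(len == 0);
	return 0;
}

/* Read up to 'want' bytes and advance the cursor; the packet itself is
 * never modified. Returns the number of bytes copied.
 */
static size_t toy_read(struct toy_skb *skb, char *dst, size_t want)
{
	size_t bytes = skb->len - skb->offset;

	if (bytes > want)
		bytes = want;
	if (toy_copy_at(skb, skb->offset, dst, bytes))
		return 0;
	skb->offset += bytes;
	return bytes;
}
```

Calling 'toy_read()' repeatedly drains the packet in chunks, and the
packet counts as fully consumed once 'offset' equals 'len', which is the
same condition the patch checks in place of 'skb->len == 0'.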

Signed-off-by: Arseniy Krasnov <[email protected]>
---
Changelog:
v5(big patchset) -> v1:
* Merge 'virtio_transport_common.c' and 'vhost/vsock.c' patches into
this single patch.
* Commit message update: grammar fix and remark that this patch is
MSG_ZEROCOPY preparation.
* Use 'min_t()' instead of comparison using '<>' operators.
v1 -> v2:
* R-b tag added.
v3 -> v4:
* R-b tag removed due to rebase:
* Part for 'virtio_transport_stream_do_peek()' is changed.
* Part for 'virtio_transport_seqpacket_do_peek()' is added.
* Comments about sleep in 'memcpy_to_msg()' now describe sleep in
'skb_copy_datagram_iter()'.

drivers/vhost/vsock.c | 14 +++++++----
include/linux/virtio_vsock.h | 1 +
net/vmw_vsock/virtio_transport_common.c | 32 +++++++++++++++----------
3 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 817d377a3f36..8c917be32b5d 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -114,6 +114,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
struct sk_buff *skb;
unsigned out, in;
size_t nbytes;
+ u32 frag_off;
int head;

skb = virtio_vsock_skb_dequeue(&vsock->send_pkt_queue);
@@ -156,7 +157,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
}

iov_iter_init(&iov_iter, ITER_DEST, &vq->iov[out], in, iov_len);
- payload_len = skb->len;
+ frag_off = VIRTIO_VSOCK_SKB_CB(skb)->frag_off;
+ payload_len = skb->len - frag_off;
hdr = virtio_vsock_hdr(skb);

/* If the packet is greater than the space available in the
@@ -197,8 +199,10 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
break;
}

- nbytes = copy_to_iter(skb->data, payload_len, &iov_iter);
- if (nbytes != payload_len) {
+ if (skb_copy_datagram_iter(skb,
+ frag_off,
+ &iov_iter,
+ payload_len)) {
kfree_skb(skb);
vq_err(vq, "Faulted on copying pkt buf\n");
break;
@@ -212,13 +216,13 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
vhost_add_used(vq, head, sizeof(*hdr) + payload_len);
added = true;

- skb_pull(skb, payload_len);
+ VIRTIO_VSOCK_SKB_CB(skb)->frag_off += payload_len;
total_len += payload_len;

/* If we didn't send all the payload we can requeue the packet
* to send it with the next available buffer.
*/
- if (skb->len > 0) {
+ if (VIRTIO_VSOCK_SKB_CB(skb)->frag_off < skb->len) {
hdr->flags |= cpu_to_le32(flags_to_restore);

/* We are queueing the same skb to handle
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index c58453699ee9..17dbb7176e37 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -12,6 +12,7 @@
struct virtio_vsock_skb_cb {
bool reply;
bool tap_delivered;
+ u32 frag_off;
};

#define VIRTIO_VSOCK_SKB_CB(skb) ((struct virtio_vsock_skb_cb *)((skb)->cb))
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 352d042b130b..0b6a89139810 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -364,9 +364,10 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk,
spin_unlock_bh(&vvs->rx_lock);

/* sk_lock is held by caller so no one else can dequeue.
- * Unlock rx_lock since memcpy_to_msg() may sleep.
+ * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
*/
- err = memcpy_to_msg(msg, skb->data, bytes);
+ err = skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->frag_off,
+ &msg->msg_iter, bytes);
if (err)
goto out;

@@ -410,25 +411,27 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
while (total < len && !skb_queue_empty(&vvs->rx_queue)) {
skb = skb_peek(&vvs->rx_queue);

- bytes = len - total;
- if (bytes > skb->len)
- bytes = skb->len;
+ bytes = min_t(size_t, len - total,
+ skb->len - VIRTIO_VSOCK_SKB_CB(skb)->frag_off);

/* sk_lock is held by caller so no one else can dequeue.
- * Unlock rx_lock since memcpy_to_msg() may sleep.
+ * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
*/
spin_unlock_bh(&vvs->rx_lock);

- err = memcpy_to_msg(msg, skb->data, bytes);
+ err = skb_copy_datagram_iter(skb,
+ VIRTIO_VSOCK_SKB_CB(skb)->frag_off,
+ &msg->msg_iter, bytes);
if (err)
goto out;

spin_lock_bh(&vvs->rx_lock);

total += bytes;
- skb_pull(skb, bytes);

- if (skb->len == 0) {
+ VIRTIO_VSOCK_SKB_CB(skb)->frag_off += bytes;
+
+ if (skb->len == VIRTIO_VSOCK_SKB_CB(skb)->frag_off) {
u32 pkt_len = le32_to_cpu(virtio_vsock_hdr(skb)->len);

virtio_transport_dec_rx_pkt(vvs, pkt_len);
@@ -492,9 +495,10 @@ virtio_transport_seqpacket_do_peek(struct vsock_sock *vsk,
spin_unlock_bh(&vvs->rx_lock);

/* sk_lock is held by caller so no one else can dequeue.
- * Unlock rx_lock since memcpy_to_msg() may sleep.
+ * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
*/
- err = memcpy_to_msg(msg, skb->data, bytes);
+ err = skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->frag_off,
+ &msg->msg_iter, bytes);
if (err)
return err;

@@ -553,11 +557,13 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
int err;

/* sk_lock is held by caller so no one else can dequeue.
- * Unlock rx_lock since memcpy_to_msg() may sleep.
+ * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
*/
spin_unlock_bh(&vvs->rx_lock);

- err = memcpy_to_msg(msg, skb->data, bytes_to_copy);
+ err = skb_copy_datagram_iter(skb, 0,
+ &msg->msg_iter,
+ bytes_to_copy);
if (err) {
/* Copy of message failed. Rest of
* fragments will be freed without copy.
--
2.25.1



2023-08-01 16:08:46

by Arseniy Krasnov

Subject: Re: [PATCH net-next v5 1/4] vsock/virtio/vhost: read data from non-linear skb



On 01.08.2023 16:11, Paolo Abeni wrote:
> On Sun, 2023-07-30 at 11:59 +0300, Arseniy Krasnov wrote:
>> This is preparation patch for MSG_ZEROCOPY support. It adds handling of
>> non-linear skbs by replacing direct calls of 'memcpy_to_msg()' with
>> 'skb_copy_datagram_iter()'. Main advantage of the second one is that it
>> can handle paged part of the skb by using 'kmap()' on each page, but if
>> there are no pages in the skb, it behaves like simple copying to iov
>> iterator. This patch also adds new field to the control block of skb -
>> this value shows current offset in the skb to read next portion of data
>> (it doesn't matter linear it or not). Idea behind this field is that
>> 'skb_copy_datagram_iter()' handles both types of skb internally - it
>> just needs an offset from which to copy data from the given skb. This
>> offset is incremented on each read from skb. This approach allows to
>> avoid special handling of non-linear skbs:
>> 1) We can't call 'skb_pull()' on it, because it updates 'data' pointer.
>> 2) We need to update 'data_len' also on each read from this skb.
>
> It looks like the above sentence is a left-over from previous version
> as, as this patch does not touch data_len. And I think it contradicts
> the previous one, so it's a bit confusing.

Yes, it seems I need to rephrase it in the next version. I meant that with
the approach introduced in this patch we don't need to check whether the
skb is linear or non-linear after reading data from it. Otherwise:
1) In case of a linear skb we would need to call 'skb_pull()' after
reading data, to update the 'data' pointer.
2) In case of a non-linear skb we would need to update the 'data_len'
field after reading data, as this field shows the amount of data in the
fragged part.
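
To make the contrast above concrete, here is a rough userspace sketch of
the two kinds of bookkeeping. Everything in it is hypothetical:
'struct toy_pkt' only carries counters named after the skb fields
mentioned above, and for simplicity it assumes a non-linear packet keeps
all of its payload in the fragged part.

```c
#include <assert.h>
#include <stddef.h>

/* Toy bookkeeping fields only; this is not the kernel's struct sk_buff. */
struct toy_pkt {
	size_t data;		/* index of the first unread byte, linear part */
	size_t len;		/* unread bytes in total */
	size_t data_len;	/* unread bytes held in the fragged part */
	size_t offset;		/* read cursor used by the offset approach */
};

/* Without a cursor, the update after a read depends on the packet type. */
static void consume_by_type(struct toy_pkt *pkt, size_t bytes)
{
	assert(bytes <= pkt->len);

	if (pkt->data_len == 0) {
		/* Linear: move 'data' forward, as skb_pull() would. */
		pkt->data += bytes;
		pkt->len -= bytes;
	} else {
		/* Non-linear (all payload in frags): shrink both counters. */
		pkt->len -= bytes;
		pkt->data_len -= bytes;
	}
}

/* With a cursor, one update covers both types and the packet's own
 * fields stay untouched.
 */
static void consume_by_offset(struct toy_pkt *pkt, size_t bytes)
{
	assert(bytes <= pkt->len - pkt->offset);
	pkt->offset += bytes;
}

/* Bytes still unread under the offset approach. */
static size_t remaining_by_offset(const struct toy_pkt *pkt)
{
	return pkt->len - pkt->offset;
}
```

Both variants report the same number of unread bytes after a read; the
offset variant simply avoids the branch on the packet type.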

>
>> Signed-off-by: Arseniy Krasnov <[email protected]>
>> ---
>> Changelog:
>> v5(big patchset) -> v1:
>> * Merge 'virtio_transport_common.c' and 'vhost/vsock.c' patches into
>> this single patch.
>> * Commit message update: grammar fix and remark that this patch is
>> MSG_ZEROCOPY preparation.
>> * Use 'min_t()' instead of comparison using '<>' operators.
>> v1 -> v2:
>> * R-b tag added.
>> v3 -> v4:
>> * R-b tag removed due to rebase:
>> * Part for 'virtio_transport_stream_do_peek()' is changed.
>> * Part for 'virtio_transport_seqpacket_do_peek()' is added.
>> * Comments about sleep in 'memcpy_to_msg()' now describe sleep in
>> 'skb_copy_datagram_iter()'.
>>
>> drivers/vhost/vsock.c | 14 +++++++----
>> include/linux/virtio_vsock.h | 1 +
>> net/vmw_vsock/virtio_transport_common.c | 32 +++++++++++++++----------
>> 3 files changed, 29 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index 817d377a3f36..8c917be32b5d 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -114,6 +114,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>> struct sk_buff *skb;
>> unsigned out, in;
>> size_t nbytes;
>> + u32 frag_off;
>
> IMHO 'offset' would be a better name for both the variable and the CB
> field, as it can points both inside the skb frags, linear part or frag
> list.

Ack

>
> Otherwise LGTM, thanks!
>
> Paolo
>

Thanks, Arseniy

2023-08-01 16:19:27

by Paolo Abeni

Subject: Re: [PATCH net-next v5 1/4] vsock/virtio/vhost: read data from non-linear skb

On Sun, 2023-07-30 at 11:59 +0300, Arseniy Krasnov wrote:
> This is preparation patch for MSG_ZEROCOPY support. It adds handling of
> non-linear skbs by replacing direct calls of 'memcpy_to_msg()' with
> 'skb_copy_datagram_iter()'. Main advantage of the second one is that it
> can handle paged part of the skb by using 'kmap()' on each page, but if
> there are no pages in the skb, it behaves like simple copying to iov
> iterator. This patch also adds new field to the control block of skb -
> this value shows current offset in the skb to read next portion of data
> (it doesn't matter linear it or not). Idea behind this field is that
> 'skb_copy_datagram_iter()' handles both types of skb internally - it
> just needs an offset from which to copy data from the given skb. This
> offset is incremented on each read from skb. This approach allows to
> avoid special handling of non-linear skbs:
> 1) We can't call 'skb_pull()' on it, because it updates 'data' pointer.
> 2) We need to update 'data_len' also on each read from this skb.

It looks like the above sentence is a left-over from a previous version,
as this patch does not touch data_len. And I think it contradicts
the previous one, so it's a bit confusing.

> Signed-off-by: Arseniy Krasnov <[email protected]>
> ---
> Changelog:
> v5(big patchset) -> v1:
> * Merge 'virtio_transport_common.c' and 'vhost/vsock.c' patches into
> this single patch.
> * Commit message update: grammar fix and remark that this patch is
> MSG_ZEROCOPY preparation.
> * Use 'min_t()' instead of comparison using '<>' operators.
> v1 -> v2:
> * R-b tag added.
> v3 -> v4:
> * R-b tag removed due to rebase:
> * Part for 'virtio_transport_stream_do_peek()' is changed.
> * Part for 'virtio_transport_seqpacket_do_peek()' is added.
> * Comments about sleep in 'memcpy_to_msg()' now describe sleep in
> 'skb_copy_datagram_iter()'.
>
> drivers/vhost/vsock.c | 14 +++++++----
> include/linux/virtio_vsock.h | 1 +
> net/vmw_vsock/virtio_transport_common.c | 32 +++++++++++++++----------
> 3 files changed, 29 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 817d377a3f36..8c917be32b5d 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -114,6 +114,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> struct sk_buff *skb;
> unsigned out, in;
> size_t nbytes;
> + u32 frag_off;

IMHO 'offset' would be a better name for both the variable and the CB
field, as it can point inside the skb frags, the linear part, or the
frag list.

Otherwise LGTM, thanks!

Paolo