Hello,
this patchset is the first of three parts of another big patchset for
MSG_ZEROCOPY flag support:
https://lore.kernel.org/netdev/[email protected]/
During review of that series, Stefano Garzarella <[email protected]>
suggested splitting it into three parts to simplify review and merging:
1) virtio and vhost updates (for fragged skbs) <--- this patchset
2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
tx completions) and update for Documentation/.
3) Updates for tests and utils.
This series enables handling of fragged skbs in the virtio and vhost parts.
The new logic won't be triggered yet, because the SO_ZEROCOPY option is
still impossible to enable at this moment (the next bunch of patches from
the big set above will enable it).
I've included a changelog in some patches anyway, because there were some
comments during review of the last big patchset from the link above.
Head for this patchset is:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=f2fa1c812c91e99d0317d1fc7d845e1e05f39716
Link to v1:
https://lore.kernel.org/netdev/[email protected]/
Link to v2:
https://lore.kernel.org/netdev/[email protected]/
Link to v3:
https://lore.kernel.org/netdev/[email protected]/
Link to v4:
https://lore.kernel.org/netdev/[email protected]/
Link to v5:
https://lore.kernel.org/netdev/[email protected]/
Link to v6:
https://lore.kernel.org/netdev/[email protected]/
Link to v7:
https://lore.kernel.org/netdev/[email protected]/
Link to v8:
https://lore.kernel.org/netdev/[email protected]/
Changelog:
v3 -> v4:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
v4 -> v5:
* See per-patch changelog after ---.
v5 -> v6:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
v6 -> v7:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
v7 -> v8:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
v8 -> v9:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
Arseniy Krasnov (4):
vsock/virtio/vhost: read data from non-linear skb
vsock/virtio: support to send non-linear skb
vsock/virtio: non-linear skb handling for tap
vsock/virtio: MSG_ZEROCOPY flag support
drivers/vhost/vsock.c | 14 +-
include/linux/virtio_vsock.h | 10 +
.../events/vsock_virtio_transport_common.h | 12 +-
net/vmw_vsock/virtio_transport.c | 92 +++++-
net/vmw_vsock/virtio_transport_common.c | 307 ++++++++++++++----
5 files changed, 348 insertions(+), 87 deletions(-)
--
2.25.1
For the tap device, a new skb is created and data from the current skb is
copied to it. This patch adds copying of data from a non-linear skb to
the new skb.
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
net/vmw_vsock/virtio_transport_common.c | 31 ++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 3e08d52a9355..3a48e48a99ac 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -106,6 +106,27 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
return NULL;
}
+static void virtio_transport_copy_nonlinear_skb(const struct sk_buff *skb,
+ void *dst,
+ size_t len)
+{
+ struct iov_iter iov_iter = { 0 };
+ struct kvec kvec;
+ size_t to_copy;
+
+ kvec.iov_base = dst;
+ kvec.iov_len = len;
+
+ iov_iter.iter_type = ITER_KVEC;
+ iov_iter.kvec = &kvec;
+ iov_iter.nr_segs = 1;
+
+ to_copy = min_t(size_t, len, skb->len);
+
+ skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->offset,
+ &iov_iter, to_copy);
+}
+
/* Packet capture */
static struct sk_buff *virtio_transport_build_skb(void *opaque)
{
@@ -114,7 +135,6 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
struct af_vsockmon_hdr *hdr;
struct sk_buff *skb;
size_t payload_len;
- void *payload_buf;
/* A packet could be split to fit the RX buffer, so we can retrieve
* the payload length from the header and the buffer pointer taking
@@ -122,7 +142,6 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
*/
pkt_hdr = virtio_vsock_hdr(pkt);
payload_len = pkt->len;
- payload_buf = pkt->data;
skb = alloc_skb(sizeof(*hdr) + sizeof(*pkt_hdr) + payload_len,
GFP_ATOMIC);
@@ -165,7 +184,13 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
skb_put_data(skb, pkt_hdr, sizeof(*pkt_hdr));
if (payload_len) {
- skb_put_data(skb, payload_buf, payload_len);
+ if (skb_is_nonlinear(pkt)) {
+ void *data = skb_put(skb, payload_len);
+
+ virtio_transport_copy_nonlinear_skb(pkt, data, payload_len);
+ } else {
+ skb_put_data(skb, pkt->data, payload_len);
+ }
}
return skb;
--
2.25.1
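As an aside, the non-linear copy in 'virtio_transport_build_skb()' above
boils down to walking the skb fragments and copying at most
min(len, skb->len) bytes into the flat destination. A minimal userspace
sketch of that bounded copy (plain buffers stand in for skb frags; all
names here are illustrative, not kernel API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A fragment of payload, standing in for an skb frag. */
struct frag {
	const char *data;
	size_t len;
};

/* Copy at most 'len' bytes from the fragment list into the flat
 * buffer 'dst', mirroring how the copy above is bounded by
 * min(len, skb->len). Returns the number of bytes copied. */
static size_t copy_frags(char *dst, size_t len,
			 const struct frag *frags, size_t nfrags)
{
	size_t copied = 0;

	for (size_t i = 0; i < nfrags && copied < len; i++) {
		size_t chunk = frags[i].len;

		if (chunk > len - copied)
			chunk = len - copied;
		memcpy(dst + copied, frags[i].data, chunk);
		copied += chunk;
	}

	return copied;
}
```

In the kernel this walk is done by 'skb_copy_datagram_iter()' over a
one-segment kvec iterator, so no per-frag loop is open-coded in the patch.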
This adds handling of the MSG_ZEROCOPY flag on the transmission path:
1) If this flag is set and zerocopy transmission is possible (enabled
   in socket options and the transport allows zerocopy), then a
   non-linear skb is created and filled with the pages of the user's
   buffer. Pages of the user's buffer are locked in memory by
   'get_user_pages()'.
2) Replaces the way the skb is owned: instead of 'skb_set_owner_sk_safe()'
   it calls 'skb_set_owner_w()'. The reason for this change is that
   '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' of the socket,
   so to decrease this field correctly, the proper skb destructor is
   needed: 'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.
3) Adds a new callback to 'struct virtio_transport': 'can_msgzerocopy'.
   If this callback is set, the transport needs an extra check to be
   able to send the provided number of buffers in zerocopy mode.
   Currently, the only transport that needs this callback is virtio,
   because this transport adds new buffers to the virtio queue and we
   need to check that the number of these buffers is less than the size
   of the queue (as required by the virtio spec). The vhost and loopback
   transports don't need this check.
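The queue-size check from point 3 can be illustrated with a small
userspace sketch (constants and names are illustrative; the real code
uses 'iov_iter_npages()', 'MAX_SKB_FRAGS' and 'vq->num_max'):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ   4096
#define MAX_FRAGS 17	/* stand-in for MAX_SKB_FRAGS */

/* Number of pages needed to cover 'len' bytes of user buffer. */
static size_t pages_for_len(size_t len)
{
	return (len + PAGE_SZ - 1) / PAGE_SZ;
}

/* Each page of the user buffer becomes one buffer in the tx virtqueue,
 * plus one buffer for the packet header, so the total must fit into
 * the queue or the skb could never be submitted in one iteration. */
static bool can_msgzerocopy(size_t payload_len, size_t vq_num_max)
{
	size_t bufs = pages_for_len(payload_len);

	if (bufs > MAX_FRAGS)
		bufs = MAX_FRAGS;

	/* +1 is for the packet header; in the real code this
	 * addition is done by the caller of the callback. */
	return bufs + 1 <= vq_num_max;
}
```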
Signed-off-by: Arseniy Krasnov <[email protected]>
---
Changelog:
v5(big patchset) -> v1:
* Refactor 'if' conditions.
* Remove an extra blank line.
* Remove unneeded init of the 'frag_off' field.
* Add function 'virtio_transport_fill_skb()' which fills both linear
and non-linear skb with provided data.
v1 -> v2:
* Use original order of last four arguments in 'virtio_transport_alloc_skb()'.
v2 -> v3:
* Add new transport callback: 'msgzerocopy_check_iov'. It checks that
  the provided 'iov_iter' with data could be sent in zerocopy mode.
  If this callback is not set in the transport, the transport allows
  sending any 'iov_iter' in zerocopy mode. Otherwise, zerocopy is
  allowed only if the callback returns 'true'. The reason for this
  callback is that in case of G2H transmission we insert the whole skb
  into the tx virtio queue, and such skb must fit into the size of the
  virtio queue to be sent in a single iteration (maybe the tx logic in
  'virtio_transport.c' could be reworked as in vhost to support partial
  send of the current skb). This callback will be enabled only for the
  G2H path. For details please see the comment 'Check that tx queue...'
  below.
v3 -> v4:
* 'msgzerocopy_check_iov' moved from 'struct vsock_transport' to
  'struct virtio_transport' as it is a virtio-specific callback and is
  never needed in other transports.
v4 -> v5:
* 'msgzerocopy_check_iov' renamed to 'can_msgzerocopy'; it now takes
  the number of buffers to send as its input argument. I think there is
  no need to pass the iov to this callback (at least today, it is used
  only by the guest side of the virtio transport), because the only
  thing this callback does is compare the number of buffers to be
  inserted into the tx queue with the size of this queue.
* Remove any checks for the type of the current 'iov_iter' with payload
  (whether it is 'iovec' or 'ubuf'). These checks were left over from
  earlier versions where I didn't use the already implemented kernel
  API which handles every type of 'iov_iter'.
v5 -> v6:
* Refactor 'virtio_transport_fill_skb()'.
* Add 'WARN_ON_ONCE()' and comment on invalid combination of destination
socket and payload in 'virtio_transport_alloc_skb()'.
v7 -> v8:
* Move the '+1' addition from the 'can_msgzerocopy' callback body to
  the caller. This addition accounts for the packet header.
* In 'virtio_transport_can_zcopy()' rename 'max_to_send' argument to
'pkt_len'.
* Update commit message by adding details about new 'can_msgzerocopy'
callback.
* In 'virtio_transport_init_hdr()' move 'len' argument directly after
'info'.
* Add comment about processing last skb in tx loop.
* Update comment for 'can_msgzerocopy' callback for more details.
v8 -> v9:
* Restore and update the comment for 'virtio_transport_alloc_skb()'.
* Pass a pointer to the transport ops to 'virtio_transport_can_zcopy()';
  this allows using it directly without calling
  'virtio_transport_get_ops()'.
* Remove a redundant call to 'msg_data_left()' in
  'virtio_transport_fill_skb()'.
* Do not pass 'struct vsock_sock *' to 'virtio_transport_alloc_skb()';
  use the same pointer from the already passed
  'struct virtio_vsock_pkt_info *'.
* Fix setting the 'end of message' bit for SOCK_SEQPACKET (add a check
  for 'msg_data_left()' == 0).
* Add 'zcopy' parameter to packet allocation trace event.
include/linux/virtio_vsock.h | 9 +
.../events/vsock_virtio_transport_common.h | 12 +-
net/vmw_vsock/virtio_transport.c | 32 +++
net/vmw_vsock/virtio_transport_common.c | 250 ++++++++++++++----
4 files changed, 241 insertions(+), 62 deletions(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index a91fbdf233e4..ebb3ce63d64d 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -160,6 +160,15 @@ struct virtio_transport {
/* Takes ownership of the packet */
int (*send_pkt)(struct sk_buff *skb);
+
+ /* Used in MSG_ZEROCOPY mode. Checks, that provided data
+ * (number of buffers) could be transmitted with zerocopy
+ * mode. If this callback is not implemented for the current
+ * transport - this means that this transport doesn't need
+ * extra checks and can perform zerocopy transmission by
+ * default.
+ */
+ bool (*can_msgzerocopy)(int bufs_num);
};
ssize_t
diff --git a/include/trace/events/vsock_virtio_transport_common.h b/include/trace/events/vsock_virtio_transport_common.h
index d0b3f0ea9ba1..f1ebe36787c3 100644
--- a/include/trace/events/vsock_virtio_transport_common.h
+++ b/include/trace/events/vsock_virtio_transport_common.h
@@ -43,7 +43,8 @@ TRACE_EVENT(virtio_transport_alloc_pkt,
__u32 len,
__u16 type,
__u16 op,
- __u32 flags
+ __u32 flags,
+ bool zcopy
),
TP_ARGS(
src_cid, src_port,
@@ -51,7 +52,8 @@ TRACE_EVENT(virtio_transport_alloc_pkt,
len,
type,
op,
- flags
+ flags,
+ zcopy
),
TP_STRUCT__entry(
__field(__u32, src_cid)
@@ -62,6 +64,7 @@ TRACE_EVENT(virtio_transport_alloc_pkt,
__field(__u16, type)
__field(__u16, op)
__field(__u32, flags)
+ __field(bool, zcopy)
),
TP_fast_assign(
__entry->src_cid = src_cid;
@@ -72,14 +75,15 @@ TRACE_EVENT(virtio_transport_alloc_pkt,
__entry->type = type;
__entry->op = op;
__entry->flags = flags;
+ __entry->zcopy = zcopy;
),
- TP_printk("%u:%u -> %u:%u len=%u type=%s op=%s flags=%#x",
+ TP_printk("%u:%u -> %u:%u len=%u type=%s op=%s flags=%#x zcopy=%s",
__entry->src_cid, __entry->src_port,
__entry->dst_cid, __entry->dst_port,
__entry->len,
show_type(__entry->type),
show_op(__entry->op),
- __entry->flags)
+ __entry->flags, __entry->zcopy ? "true" : "false")
);
TRACE_EVENT(virtio_transport_recv_pkt,
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 73d730156349..09ba3128e759 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -455,6 +455,37 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
queue_work(virtio_vsock_workqueue, &vsock->rx_work);
}
+static bool virtio_transport_can_msgzerocopy(int bufs_num)
+{
+ struct virtio_vsock *vsock;
+ bool res = false;
+
+ rcu_read_lock();
+
+ vsock = rcu_dereference(the_virtio_vsock);
+ if (vsock) {
+ struct virtqueue *vq = vsock->vqs[VSOCK_VQ_TX];
+
+ /* Check that tx queue is large enough to keep whole
+ * data to send. This is needed, because when there is
+ * not enough free space in the queue, current skb to
+ * send will be reinserted to the head of tx list of
+ * the socket to retry transmission later, so if skb
+ * is bigger than whole queue, it will be reinserted
+ * again and again, thus blocking other skbs to be sent.
+ * Each page of the user provided buffer will be added
+ * as a single buffer to the tx virtqueue, so compare
+ * number of pages against maximum capacity of the queue.
+ */
+ if (bufs_num <= vq->num_max)
+ res = true;
+ }
+
+ rcu_read_unlock();
+
+ return res;
+}
+
static bool virtio_transport_seqpacket_allow(u32 remote_cid);
static struct virtio_transport virtio_transport = {
@@ -504,6 +535,7 @@ static struct virtio_transport virtio_transport = {
},
.send_pkt = virtio_transport_send_pkt,
+ .can_msgzerocopy = virtio_transport_can_msgzerocopy,
};
static bool virtio_transport_seqpacket_allow(u32 remote_cid)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 3a48e48a99ac..e22c81435ef7 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -37,73 +37,99 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
return container_of(t, struct virtio_transport, transport);
}
-/* Returns a new packet on success, otherwise returns NULL.
- *
- * If NULL is returned, errp is set to a negative errno.
- */
-static struct sk_buff *
-virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
- size_t len,
- u32 src_cid,
- u32 src_port,
- u32 dst_cid,
- u32 dst_port)
-{
- const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len;
- struct virtio_vsock_hdr *hdr;
- struct sk_buff *skb;
- void *payload;
- int err;
+static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
+ struct virtio_vsock_pkt_info *info,
+ size_t pkt_len)
+{
+ struct iov_iter *iov_iter;
- skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
- if (!skb)
- return NULL;
+ if (!info->msg)
+ return false;
- hdr = virtio_vsock_hdr(skb);
- hdr->type = cpu_to_le16(info->type);
- hdr->op = cpu_to_le16(info->op);
- hdr->src_cid = cpu_to_le64(src_cid);
- hdr->dst_cid = cpu_to_le64(dst_cid);
- hdr->src_port = cpu_to_le32(src_port);
- hdr->dst_port = cpu_to_le32(dst_port);
- hdr->flags = cpu_to_le32(info->flags);
- hdr->len = cpu_to_le32(len);
+ iov_iter = &info->msg->msg_iter;
- if (info->msg && len > 0) {
- payload = skb_put(skb, len);
- err = memcpy_from_msg(payload, info->msg, len);
- if (err)
- goto out;
+ if (iov_iter->iov_offset)
+ return false;
- if (msg_data_left(info->msg) == 0 &&
- info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {
- hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
+ /* We can't send whole iov. */
+ if (iov_iter->count > pkt_len)
+ return false;
- if (info->msg->msg_flags & MSG_EOR)
- hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
- }
+ /* Check that transport can send data in zerocopy mode. */
+ t_ops = virtio_transport_get_ops(info->vsk);
+
+ if (t_ops->can_msgzerocopy) {
+ int pages_in_iov = iov_iter_npages(iov_iter, MAX_SKB_FRAGS);
+ int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS);
+
+ /* +1 is for packet header. */
+ return t_ops->can_msgzerocopy(pages_to_send + 1);
}
- if (info->reply)
- virtio_vsock_skb_set_reply(skb);
+ return true;
+}
- trace_virtio_transport_alloc_pkt(src_cid, src_port,
- dst_cid, dst_port,
- len,
- info->type,
- info->op,
- info->flags);
+static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
+ struct sk_buff *skb,
+ struct msghdr *msg,
+ bool zerocopy)
+{
+ struct ubuf_info *uarg;
- if (info->vsk && !skb_set_owner_sk_safe(skb, sk_vsock(info->vsk))) {
- WARN_ONCE(1, "failed to allocate skb on vsock socket with sk_refcnt == 0\n");
- goto out;
+ if (msg->msg_ubuf) {
+ uarg = msg->msg_ubuf;
+ net_zcopy_get(uarg);
+ } else {
+ struct iov_iter *iter = &msg->msg_iter;
+ struct ubuf_info_msgzc *uarg_zc;
+
+ uarg = msg_zerocopy_realloc(sk_vsock(vsk),
+ iter->count,
+ NULL);
+ if (!uarg)
+ return -1;
+
+ uarg_zc = uarg_to_msgzc(uarg);
+ uarg_zc->zerocopy = zerocopy ? 1 : 0;
}
- return skb;
+ skb_zcopy_init(skb, uarg);
-out:
- kfree_skb(skb);
- return NULL;
+ return 0;
+}
+
+static int virtio_transport_fill_skb(struct sk_buff *skb,
+ struct virtio_vsock_pkt_info *info,
+ size_t len,
+ bool zcopy)
+{
+ if (zcopy)
+ return __zerocopy_sg_from_iter(info->msg, NULL, skb,
+ &info->msg->msg_iter,
+ len);
+
+ return memcpy_from_msg(skb_put(skb, len), info->msg, len);
+}
+
+static void virtio_transport_init_hdr(struct sk_buff *skb,
+ struct virtio_vsock_pkt_info *info,
+ size_t payload_len,
+ u32 src_cid,
+ u32 src_port,
+ u32 dst_cid,
+ u32 dst_port)
+{
+ struct virtio_vsock_hdr *hdr;
+
+ hdr = virtio_vsock_hdr(skb);
+ hdr->type = cpu_to_le16(info->type);
+ hdr->op = cpu_to_le16(info->op);
+ hdr->src_cid = cpu_to_le64(src_cid);
+ hdr->dst_cid = cpu_to_le64(dst_cid);
+ hdr->src_port = cpu_to_le32(src_port);
+ hdr->dst_port = cpu_to_le32(dst_port);
+ hdr->flags = cpu_to_le32(info->flags);
+ hdr->len = cpu_to_le32(payload_len);
}
static void virtio_transport_copy_nonlinear_skb(const struct sk_buff *skb,
@@ -214,6 +240,82 @@ static u16 virtio_transport_get_type(struct sock *sk)
return VIRTIO_VSOCK_TYPE_SEQPACKET;
}
+/* Returns new sk_buff on success, otherwise returns NULL. */
+static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
+ size_t payload_len,
+ bool zcopy,
+ u32 src_cid,
+ u32 src_port,
+ u32 dst_cid,
+ u32 dst_port)
+{
+ struct vsock_sock *vsk;
+ struct sk_buff *skb;
+ size_t skb_len;
+
+ skb_len = VIRTIO_VSOCK_SKB_HEADROOM;
+
+ if (!zcopy)
+ skb_len += payload_len;
+
+ skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
+ if (!skb)
+ return NULL;
+
+ virtio_transport_init_hdr(skb, info, payload_len, src_cid, src_port,
+ dst_cid, dst_port);
+
+ vsk = info->vsk;
+
+ /* If 'vsk' != NULL then payload is always present, so we
+ * will never call '__zerocopy_sg_from_iter()' below without
+ * setting skb owner in 'skb_set_owner_w()'. The only case
+ * when 'vsk' == NULL is VIRTIO_VSOCK_OP_RST control message
+ * without payload.
+ */
+ WARN_ON_ONCE(!(vsk && (info->msg && payload_len)) && zcopy);
+
+ /* Set owner here, because '__zerocopy_sg_from_iter()' uses
+ * owner of skb without check to update 'sk_wmem_alloc'.
+ */
+ if (vsk)
+ skb_set_owner_w(skb, sk_vsock(vsk));
+
+ if (info->msg && payload_len > 0) {
+ int err;
+
+ err = virtio_transport_fill_skb(skb, info, payload_len, zcopy);
+ if (err)
+ goto out;
+
+ if (msg_data_left(info->msg) == 0 &&
+ info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {
+ struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
+
+ hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
+
+ if (info->msg->msg_flags & MSG_EOR)
+ hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
+ }
+ }
+
+ if (info->reply)
+ virtio_vsock_skb_set_reply(skb);
+
+ trace_virtio_transport_alloc_pkt(src_cid, src_port,
+ dst_cid, dst_port,
+ payload_len,
+ info->type,
+ info->op,
+ info->flags,
+ zcopy);
+
+ return skb;
+out:
+ kfree_skb(skb);
+ return NULL;
+}
+
/* This function can only be used on connecting/connected sockets,
* since a socket assigned to a transport is required.
*
@@ -222,10 +324,12 @@ static u16 virtio_transport_get_type(struct sock *sk)
static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
struct virtio_vsock_pkt_info *info)
{
+ u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
u32 src_cid, src_port, dst_cid, dst_port;
const struct virtio_transport *t_ops;
struct virtio_vsock_sock *vvs;
u32 pkt_len = info->pkt_len;
+ bool can_zcopy = false;
u32 rest_len;
int ret;
@@ -254,15 +358,30 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
return pkt_len;
+ if (info->msg) {
+ /* If zerocopy is not enabled by 'setsockopt()', we behave as
+ * there is no MSG_ZEROCOPY flag set.
+ */
+ if (!sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY))
+ info->msg->msg_flags &= ~MSG_ZEROCOPY;
+
+ if (info->msg->msg_flags & MSG_ZEROCOPY)
+ can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len);
+
+ if (can_zcopy)
+ max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
+ (MAX_SKB_FRAGS * PAGE_SIZE));
+ }
+
rest_len = pkt_len;
do {
struct sk_buff *skb;
size_t skb_len;
- skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE, rest_len);
+ skb_len = min(max_skb_len, rest_len);
- skb = virtio_transport_alloc_skb(info, skb_len,
+ skb = virtio_transport_alloc_skb(info, skb_len, can_zcopy,
src_cid, src_port,
dst_cid, dst_port);
if (!skb) {
@@ -270,6 +389,21 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
break;
}
+ /* We process buffer part by part, allocating skb on
+ * each iteration. If this is last skb for this buffer
+ * and MSG_ZEROCOPY mode is in use - we must allocate
+ * completion for the current syscall.
+ */
+ if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
+ skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
+ if (virtio_transport_init_zcopy_skb(vsk, skb,
+ info->msg,
+ can_zcopy)) {
+ ret = -ENOMEM;
+ break;
+ }
+ }
+
virtio_transport_inc_tx_pkt(vvs, skb);
ret = t_ops->send_pkt(skb);
@@ -985,7 +1119,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
if (!t)
return -ENOTCONN;
- reply = virtio_transport_alloc_skb(&info, 0,
+ reply = virtio_transport_alloc_skb(&info, 0, false,
le64_to_cpu(hdr->dst_cid),
le32_to_cpu(hdr->dst_port),
le64_to_cpu(hdr->src_cid),
--
2.25.1
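For reference, the tx loop in 'virtio_transport_send_pkt_info()' above
caps each skb at 'max_skb_len' (smaller in zerocopy mode, since a
non-linear skb carries at most MAX_SKB_FRAGS pages) and walks the
payload chunk by chunk. A userspace sketch of just the splitting
(illustrative names, not kernel API):

```c
#include <assert.h>
#include <stddef.h>

/* Split 'pkt_len' bytes into chunks of at most 'max_skb_len', as the
 * tx loop does when allocating one skb per iteration. Returns the
 * number of skbs that would be allocated. */
static size_t count_tx_skbs(size_t pkt_len, size_t max_skb_len)
{
	size_t rest_len = pkt_len;
	size_t skbs = 0;

	while (rest_len) {
		size_t skb_len = rest_len < max_skb_len ? rest_len
							: max_skb_len;

		rest_len -= skb_len;
		skbs++;
	}

	return skbs;
}
```

The zerocopy completion is attached only on the iteration where
'skb_len == rest_len', i.e. the last skb produced by this split.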
On Sat, Sep 16, 2023 at 04:09:18PM +0300, Arseniy Krasnov wrote:
>This adds handling of MSG_ZEROCOPY flag on transmission path:
>
>[...]
Thanks for addressing the comments!
LGTM!
Reviewed-by: Stefano Garzarella <[email protected]>
Hi Stefano,
thanks for the review! So when this patchset is merged to net-next,
I'll start sending the next part of the MSG_ZEROCOPY patchset, i.e. the
AF_VSOCK + Documentation/ patches.
Thanks, Arseniy
On 16.09.2023 16:09, Arseniy Krasnov wrote:
> Hello,
>
> [...]
On Sat, Sep 16, 2023 at 04:09:14PM +0300, Arseniy Krasnov wrote:
> Hello,
>
> this patchset is first of three parts of another big patchset for
> MSG_ZEROCOPY flag support:
>
> [...]
Acked-by: Michael S. Tsirkin <[email protected]>
On Tue, 2023-09-19 at 09:54 +0200, Stefano Garzarella wrote:
> On Mon, Sep 18, 2023 at 07:56:00PM +0300, Arseniy Krasnov wrote:
> > Hi Stefano,
> >
> > thanks for review! So when this patchset will be merged to net-next,
> > I'll start sending next part of MSG_ZEROCOPY patchset, e.g. AF_VSOCK +
> > Documentation/ patches.
>
> Ack, if it is not a very big series, maybe better to include also the
> tests so we can run them before merge the feature.
I understand that at least 2 follow-up series are waiting for this, one
of them targeting net-next and the bigger one targeting the virtio
tree. Am I correct?
DaveM suggests this should go via the virtio tree, too. Any different
opinion?
Thanks!
Paolo
On Mon, Sep 18, 2023 at 07:56:00PM +0300, Arseniy Krasnov wrote:
>Hi Stefano,
>
>thanks for review! So when this patchset will be merged to net-next,
>I'll start sending next part of MSG_ZEROCOPY patchset, e.g. AF_VSOCK +
>Documentation/ patches.
Ack. If it is not a very big series, it might be better to also include
the tests, so we can run them before merging the feature.
WDYT?
Stefano
On Tue, Sep 19, 2023 at 03:19:54PM +0200, Paolo Abeni wrote:
>On Tue, 2023-09-19 at 09:54 +0200, Stefano Garzarella wrote:
>> On Mon, Sep 18, 2023 at 07:56:00PM +0300, Arseniy Krasnov wrote:
>> > Hi Stefano,
>> >
>> > thanks for review! So when this patchset will be merged to net-next,
>> > I'll start sending next part of MSG_ZEROCOPY patchset, e.g. AF_VSOCK +
>> > Documentation/ patches.
>>
>> Ack, if it is not a very big series, maybe better to include also the
>> tests so we can run them before merge the feature.
>
>I understand that at least 2 follow-up series are waiting for this, one
>of them targeting net-next and the bigger one targeting the virtio
>tree. Am I correct?
IIUC the next series will touch only the vsock core
(net/vmw_vsock/af_vsock.c), tests, and documentation.
The virtio part should be fully covered by this series.
@Arseniy feel free to correct me!
>
>DaveM suggests this should go via the virtio tree, too. Any different
>opinion?
For this series it should be fine; I'm not sure about the next one.
If we merge this via the virtio tree, does that force us to do the
follow-up there as well?
In theory the follow-up is more about the core, so net-next would fit
better, but it's also true that for now only the virtio transports
support it, so it might be okay to continue with virtio.
@Michael WDYT?
Thanks,
Stefano
On 19.09.2023 19:48, Arseniy Krasnov wrote:
>
>
> On 19.09.2023 16:35, Stefano Garzarella wrote:
>> On Tue, Sep 19, 2023 at 03:19:54PM +0200, Paolo Abeni wrote:
>>> On Tue, 2023-09-19 at 09:54 +0200, Stefano Garzarella wrote:
>>>> On Mon, Sep 18, 2023 at 07:56:00PM +0300, Arseniy Krasnov wrote:
>>>>> Hi Stefano,
>>>>>
>>>>> thanks for review! So when this patchset will be merged to net-next,
>>>>> I'll start sending next part of MSG_ZEROCOPY patchset, e.g. AF_VSOCK +
>>>>> Documentation/ patches.
>>>>
>>>> Ack, if it is not a very big series, maybe better to include also the
>>>> tests so we can run them before merge the feature.
>>>
>>> I understand that at least 2 follow-up series are waiting for this, one
>>> of them targeting net-next and the bigger one targeting the virtio
>>> tree. Am I correct?
>>
>> IIUC the next series will touch only the vsock core
>> (net/vmw_vsock/af_vsock.c), tests, and documentation.
>>
>> The virtio part should be fully covered by this series.
>>
>> @Arseniy feel free to correct me!
>
> Yes, only this patchset touches virtio code. Next patchset will be AF_VSOCK,
> Documentation/ and tests. I think there is no need to merge it to the virtio
> tree - we can continue in the same way as before during AF_VSOCK development,
> e.g. merging it to net-next only.
^^^
I mean, of course, if there is a need to merge to the virtio tree as well - no problem;
I'm just noting that the next set of patches doesn't touch virtio code
(except two small patches of a few lines each for drivers/vhost/vsock.c and net/vmw_vsock/virtio_transport.c).
>
> Thanks, Arseniy
>
>>
>>>
>>> DaveM suggests this should go via the virtio tree, too. Any different
>>> opinion?
>>
>> For this series should be fine, I'm not sure about the next series.
>> Merging this with the virtio tree, then it forces us to do it for
>> followup as well right?
>>
>> In theory followup is more on the core, so better with net-next, but
>> it's also true that for now only virtio transports support it, so it
>> might be okay to continue with virtio.
>>
>> @Michael WDYT?
>>
>> Thanks,
>> Stefano
>>
On 19.09.2023 16:35, Stefano Garzarella wrote:
> On Tue, Sep 19, 2023 at 03:19:54PM +0200, Paolo Abeni wrote:
>> On Tue, 2023-09-19 at 09:54 +0200, Stefano Garzarella wrote:
>>> On Mon, Sep 18, 2023 at 07:56:00PM +0300, Arseniy Krasnov wrote:
>>> > Hi Stefano,
>>> >
>>> > thanks for review! So when this patchset will be merged to net-next,
>>> > I'll start sending next part of MSG_ZEROCOPY patchset, e.g. AF_VSOCK +
>>> > Documentation/ patches.
>>>
>>> Ack, if it is not a very big series, maybe better to include also the
>>> tests so we can run them before merge the feature.
>>
>> I understand that at least 2 follow-up series are waiting for this, one
>> of them targeting net-next and the bigger one targeting the virtio
>> tree. Am I correct?
>
> IIUC the next series will touch only the vsock core
> (net/vmw_vsock/af_vsock.c), tests, and documentation.
>
> The virtio part should be fully covered by this series.
>
> @Arseniy feel free to correct me!
Yes, only this patchset touches virtio code. The next patchset will cover AF_VSOCK,
Documentation/ and tests. I think there is no need to merge it to the virtio
tree - we can continue the same way as before during AF_VSOCK development,
i.e. merging to net-next only.
Thanks, Arseniy
>
>>
>> DaveM suggests this should go via the virtio tree, too. Any different
>> opinion?
>
> For this series should be fine, I'm not sure about the next series.
> Merging this with the virtio tree, then it forces us to do it for
> followup as well right?
>
> In theory followup is more on the core, so better with net-next, but
> it's also true that for now only virtio transports support it, so it
> might be okay to continue with virtio.
>
> @Michael WDYT?
>
> Thanks,
> Stefano
>
On 19.09.2023 10:54, Stefano Garzarella wrote:
> On Mon, Sep 18, 2023 at 07:56:00PM +0300, Arseniy Krasnov wrote:
>> Hi Stefano,
>>
>> thanks for review! So when this patchset will be merged to net-next,
>> I'll start sending next part of MSG_ZEROCOPY patchset, e.g. AF_VSOCK +
>> Documentation/ patches.
>
> Ack, if it is not a very big series, maybe better to include also the
> tests so we can run them before merge the feature.
>
> WDYT?
Yes, OK! The AF_VSOCK part is smaller than the virtio part.
Thanks, Arseniy
>
> Stefano
>
On Tue, Sep 19, 2023 at 03:35:51PM +0200, Stefano Garzarella wrote:
> On Tue, Sep 19, 2023 at 03:19:54PM +0200, Paolo Abeni wrote:
> > On Tue, 2023-09-19 at 09:54 +0200, Stefano Garzarella wrote:
> > > On Mon, Sep 18, 2023 at 07:56:00PM +0300, Arseniy Krasnov wrote:
> > > > Hi Stefano,
> > > >
> > > > thanks for review! So when this patchset will be merged to net-next,
> > > > I'll start sending next part of MSG_ZEROCOPY patchset, e.g. AF_VSOCK +
> > > > Documentation/ patches.
> > >
> > > Ack, if it is not a very big series, maybe better to include also the
> > > tests so we can run them before merge the feature.
> >
> > I understand that at least 2 follow-up series are waiting for this, one
> > of them targeting net-next and the bigger one targeting the virtio
> > tree. Am I correct?
>
> IIUC the next series will touch only the vsock core
> (net/vmw_vsock/af_vsock.c), tests, and documentation.
>
> The virtio part should be fully covered by this series.
>
> @Arseniy feel free to correct me!
>
> >
> > DaveM suggests this should go via the virtio tree, too. Any different
> > opinion?
>
> For this series should be fine, I'm not sure about the next series.
> Merging this with the virtio tree, then it forces us to do it for
> followup as well right?
>
> In theory followup is more on the core, so better with net-next, but
> it's also true that for now only virtio transports support it, so it
> might be okay to continue with virtio.
>
> @Michael WDYT?
>
> Thanks,
> Stefano
I didn't get DaveM's mail - was this off-list?
I think net-next is easier because the follow-up belongs in net-next.
But if not I can take it, sure. Let me know.
--
MST
On Tue, 2023-09-19 at 22:38 -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 19, 2023 at 03:35:51PM +0200, Stefano Garzarella wrote:
> > On Tue, Sep 19, 2023 at 03:19:54PM +0200, Paolo Abeni wrote:
> >
> > > DaveM suggests this should go via the virtio tree, too. Any different
> > > opinion?
> >
> > For this series should be fine, I'm not sure about the next series.
> > Merging this with the virtio tree, then it forces us to do it for
> > followup as well right?
> >
> > In theory followup is more on the core, so better with net-next, but
> > it's also true that for now only virtio transports support it, so it
> > might be okay to continue with virtio.
> >
> > @Michael WDYT?
> >
> > Thanks,
> > Stefano
>
> I didn't get DaveM's mail - was this off-list?
Yes, that was off-list co-ordination.
> I think net-next is easier because the follow up belongs in net-next.
> But if not I can take it, sure. Let me know.
Since there is agreement on that route, we will take it (likely
tomorrow).
Cheers,
Paolo
Hello:
This series was applied to netdev/net-next.git (main)
by Paolo Abeni <[email protected]>:
On Sat, 16 Sep 2023 16:09:14 +0300 you wrote:
> Hello,
>
> this patchset is first of three parts of another big patchset for
> MSG_ZEROCOPY flag support:
> https://lore.kernel.org/netdev/[email protected]/
>
> During review of this series, Stefano Garzarella <[email protected]>
> suggested to split it for three parts to simplify review and merging:
>
> [...]
Here is the summary with links:
- [net-next,v9,1/4] vsock/virtio/vhost: read data from non-linear skb
https://git.kernel.org/netdev/net-next/c/0df7cd3c13e4
- [net-next,v9,2/4] vsock/virtio: support to send non-linear skb
https://git.kernel.org/netdev/net-next/c/64c99d2d6ada
- [net-next,v9,3/4] vsock/virtio: non-linear skb handling for tap
https://git.kernel.org/netdev/net-next/c/4b0bf10eb077
- [net-next,v9,4/4] vsock/virtio: MSG_ZEROCOPY flag support
https://git.kernel.org/netdev/net-next/c/581512a6dc93
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html