Hey all!
This series introduces support for datagrams to virtio/vsock.
It is a spin-off (and smaller version) of this series from the summer:
https://lore.kernel.org/all/[email protected]/
Please note that this is an RFC and should not be merged until
associated changes are made to the virtio specification, which will
follow after discussion from this series.
As another aside, v4 of the series has only been lightly tested with a
run of tools/testing/vsock/vsock_test. Some code likely needs cleaning
up, but I'm hoping to get some of the design choices agreed upon before
spending too much time making it pretty.
This series first supports datagrams in a basic form for virtio, and
then optimizes the sendpath for all datagram transports.
The result is a very fast datagram communication protocol that
outperforms even UDP on multi-queue virtio-net w/ vhost on a variety
of multi-threaded workload samples.
For those that are curious, some summary data comparing UDP and VSOCK
DGRAM (N=5):

    vCPUS:             16
    virtio-net queues: 16
    payload size:      4KB
    Setup:             bare metal + vm (non-nested)

    UDP:               287.59 MB/s
    VSOCK DGRAM:       509.2 MB/s
Some notes about the implementation:
This datagram implementation forces datagrams to self-throttle according
to the threshold set by sk_sndbuf. It behaves similarly to the credits
used by streams in its effect on throughput and memory consumption, but,
unlike credits, it is not influenced by the receiving socket.
The device drops packets silently.
As discussed previously, this series introduces datagrams and defers
fairness to future work. See discussion in v2 for more context around
datagrams, fairness, and this implementation.
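For illustration only (this is not code from the series), the sender-side throttling described above can be modelled in userspace roughly as follows. The struct and function names are hypothetical, and the model is stricter than the kernel's real sk_sndbuf accounting: it simply refuses any datagram that would push the in-flight byte count past the limit, and it never consults the receiver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a sender-side budget; not kernel code. */
struct dgram_budget {
	size_t sndbuf;    /* limit, playing the role of sk_sndbuf */
	size_t in_flight; /* bytes queued but not yet released by the transport */
};

/* Account for a datagram of @len bytes. Returns false when the sender
 * would exceed its own budget and therefore has to wait or fail.
 */
bool dgram_budget_charge(struct dgram_budget *b, size_t len)
{
	assert(b->in_flight <= b->sndbuf);

	if (len > b->sndbuf - b->in_flight)
		return false;

	b->in_flight += len;
	return true;
}

/* Release the accounting once the transport is done with the datagram. */
void dgram_budget_uncharge(struct dgram_budget *b, size_t len)
{
	assert(len <= b->in_flight);
	b->in_flight -= len;
}
```

Note that nothing in this model depends on the state of the receiving socket, which is the difference from stream credits called out above.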
Signed-off-by: Bobby Eshleman <[email protected]>
---
Changes in v5:
- teach vhost to drop dgram when a datagram exceeds the receive buffer
  - now uses MSG_ERRQUEUE and depends on Arseniy's zerocopy patch:
    "vsock: read from socket's error queue"
- replace multiple ->dgram_* callbacks with single ->dgram_addr_init()
  callback
- refactor virtio dgram skb allocator to reduce conflicts w/ zerocopy series
- add _fallback/_FALLBACK suffix to dgram transport variables/macros
- add WARN_ONCE() for table_size / VSOCK_HASH issue
- add static to vsock_find_bound_socket_common
- dedupe code in vsock_dgram_sendmsg() using module_got var
- drop concurrent sendmsg() for dgram and defer to future series
- Add more tests
  - test EHOSTUNREACH in errqueue
  - test stream + dgram address collision
  - improve clarity of dgram msg bounds test code
- Link to v4: https://lore.kernel.org/r/[email protected]
Changes in v4:
- style changes
  - vsock: use sk_vsock(vsk) in vsock_dgram_recvmsg instead of
    &sk->vsk
  - vsock: fix xmas tree declaration
  - vsock: fix spacing issues
  - virtio/vsock: virtio_transport_recv_dgram returns void because err
    unused
- sparse analysis warnings/errors
  - virtio/vsock: fix uninitialized skerr on destroy
  - virtio/vsock: fix uninitialized err var on goto out
  - vsock: fix declarations that need static
  - vsock: fix __rcu annotation order
- bugs
  - vsock: fix null ptr in remote_info code
  - vsock/dgram: make transport_dgram a fallback instead of first
    priority
  - vsock: remove redundant rcu read lock acquire in getname()
- tests
  - add more tests (message bounds and more)
  - add vsock_dgram_bind() helper
  - add vsock_dgram_connect() helper
Changes in v3:
- Support multi-transport dgram, changing logic in connect/bind
  to support VMCI case
- Support per-pkt transport lookup for sendto() case
- Fix dgram_allow() implementation
- Fix dgram feature bit number (now it is 3)
- Fix binding so dgram and connectible (cid,port) spaces are
  non-overlapping
- RCU protect transport ptr so connect() calls never leave
  a lockless read of the transport and remote_addr are always
  in sync
- Link to v2: https://lore.kernel.org/r/[email protected]
---
Bobby Eshleman (13):
af_vsock: generalize vsock_dgram_recvmsg() to all transports
af_vsock: refactor transport lookup code
af_vsock: support multi-transport datagrams
af_vsock: generalize bind table functions
af_vsock: use a separate dgram bind table
virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM
virtio/vsock: add common datagram send path
af_vsock: add vsock_find_bound_dgram_socket()
virtio/vsock: add common datagram recv path
virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
vhost/vsock: implement datagram support
vsock/loopback: implement datagram support
virtio/vsock: implement datagram support
Jiang Wang (1):
test/vsock: add vsock dgram tests
drivers/vhost/vsock.c | 64 ++-
include/linux/virtio_vsock.h | 10 +-
include/net/af_vsock.h | 14 +-
include/uapi/linux/virtio_vsock.h | 2 +
net/vmw_vsock/af_vsock.c | 281 ++++++++++---
net/vmw_vsock/hyperv_transport.c | 13 -
net/vmw_vsock/virtio_transport.c | 26 +-
net/vmw_vsock/virtio_transport_common.c | 190 +++++++--
net/vmw_vsock/vmci_transport.c | 60 +--
net/vmw_vsock/vsock_loopback.c | 10 +-
tools/testing/vsock/util.c | 141 ++++++-
tools/testing/vsock/util.h | 6 +
tools/testing/vsock/vsock_test.c | 680 ++++++++++++++++++++++++++++++++
13 files changed, 1320 insertions(+), 177 deletions(-)
---
base-commit: 37cadc266ebdc7e3531111c2b3304fa01b2131e8
change-id: 20230413-b4-vsock-dgram-3b6eba6a64e5
Best regards,
--
Bobby Eshleman <[email protected]>
This commit makes the bind table management functions in vsock usable
for different bind tables. Future work will introduce a new table for
datagrams to avoid address collisions, and these functions will be used
there.
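As a rough userspace sketch of the idea (not the kernel code itself), the lookup below takes the bucket to search as a parameter instead of hard-coding one global table. All names are hypothetical; HASH_SIZE stands in for VSOCK_HASH_SIZE, and the singly linked list is a simplification of the kernel's list_head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 251u /* stand-in for VSOCK_HASH_SIZE; illustrative value */

struct bound_entry {
	uint32_t cid;
	uint32_t port;
	struct bound_entry *next;
};

/* One slot more than HASH_SIZE: the last slot would hold unbound sockets. */
struct bind_table {
	struct bound_entry *bucket[HASH_SIZE + 1];
};

/* Bucket index for a port; always below HASH_SIZE. */
unsigned int port_hash(uint32_t port)
{
	return port % HASH_SIZE;
}

/* Search a caller-supplied bucket, so any bind table can reuse this. */
struct bound_entry *find_bound_common(struct bound_entry *bucket,
				      uint32_t cid, uint32_t port)
{
	struct bound_entry *e;

	for (e = bucket; e; e = e->next)
		if (e->cid == cid && e->port == port)
			return e;
	return NULL;
}

/* Insert at the head of the bucket that the entry's port hashes to. */
void insert_bound(struct bind_table *t, struct bound_entry *e)
{
	unsigned int h;

	assert(t && e);
	h = port_hash(e->port);
	e->next = t->bucket[h];
	t->bucket[h] = e;
}
```

One consequence of passing the bucket explicitly is that the caller must derive it from the address actually being looked up; searching a bucket computed from a different port will miss an entry that exists.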
Signed-off-by: Bobby Eshleman <[email protected]>
---
net/vmw_vsock/af_vsock.c | 34 +++++++++++++++++++++++++++-------
1 file changed, 27 insertions(+), 7 deletions(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 26c97b33d55a..88100154156c 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -231,11 +231,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
sock_put(&vsk->sk);
}
-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
+static struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
+ struct list_head *bind_table)
{
struct vsock_sock *vsk;
- list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
+ list_for_each_entry(vsk, bind_table, bound_table) {
if (vsock_addr_equals_addr(addr, &vsk->local_addr))
return sk_vsock(vsk);
@@ -248,6 +249,11 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
return NULL;
}
+static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
+{
+ return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
+}
+
static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
struct sockaddr_vm *dst)
{
@@ -647,12 +653,18 @@ static void vsock_pending_work(struct work_struct *work)
/**** SOCKET OPERATIONS ****/
-static int __vsock_bind_connectible(struct vsock_sock *vsk,
- struct sockaddr_vm *addr)
+static int vsock_bind_common(struct vsock_sock *vsk,
+ struct sockaddr_vm *addr,
+ struct list_head *bind_table,
+ size_t table_size)
{
static u32 port;
struct sockaddr_vm new_addr;
+ if (WARN_ONCE(table_size < VSOCK_HASH_SIZE,
+ "table size too small, may cause overflow"))
+ return -EINVAL;
+
if (!port)
port = get_random_u32_above(LAST_RESERVED_PORT);
@@ -668,7 +680,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
new_addr.svm_port = port++;
- if (!__vsock_find_bound_socket(&new_addr)) {
+ if (!vsock_find_bound_socket_common(&new_addr,
+ &bind_table[VSOCK_HASH(&new_addr)])) {
found = true;
break;
}
@@ -685,7 +698,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
return -EACCES;
}
- if (__vsock_find_bound_socket(&new_addr))
+ if (vsock_find_bound_socket_common(&new_addr,
+ &bind_table[VSOCK_HASH(&new_addr)]))
return -EADDRINUSE;
}
@@ -697,11 +711,17 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
* by AF_UNIX.
*/
__vsock_remove_bound(vsk);
- __vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk);
+ __vsock_insert_bound(&bind_table[VSOCK_HASH(&vsk->local_addr)], vsk);
return 0;
}
+static int __vsock_bind_connectible(struct vsock_sock *vsk,
+ struct sockaddr_vm *addr)
+{
+ return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
+}
+
static int __vsock_bind_dgram(struct vsock_sock *vsk,
struct sockaddr_vm *addr)
{
--
2.30.2
This commit adds a feature bit for virtio vsock to support datagrams.
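For readers less familiar with virtio feature bits, here is a small userspace sketch of how a bit index such as VIRTIO_VSOCK_F_DGRAM turns into a mask and how both sides have to offer it. The helper names are hypothetical and the negotiation is reduced to a plain AND of two bitmaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_VSOCK_F_SEQPACKET 1 /* bit index from the existing header */
#define VIRTIO_VSOCK_F_DGRAM     3 /* bit index proposed by this patch */

/* Convert a feature bit index into a 64-bit feature mask. */
uint64_t feature_mask(unsigned int bit)
{
	assert(bit < 64);
	return 1ULL << bit;
}

/* Datagrams are usable only if device and driver both offer the bit. */
bool dgram_negotiated(uint64_t device_features, uint64_t driver_features)
{
	uint64_t common = device_features & driver_features;

	return (common & feature_mask(VIRTIO_VSOCK_F_DGRAM)) != 0;
}
```

The defines hold bit indexes, not masks, which is why the vhost and virtio code elsewhere in the series shifts 1ULL by them.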
Signed-off-by: Jiang Wang <[email protected]>
Signed-off-by: Bobby Eshleman <[email protected]>
---
include/uapi/linux/virtio_vsock.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index 331be28b1d30..27b4b2b8bf13 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -40,6 +40,7 @@
/* The feature bitmap for virtio vsock */
#define VIRTIO_VSOCK_F_SEQPACKET 1 /* SOCK_SEQPACKET supported */
+#define VIRTIO_VSOCK_F_DGRAM 3 /* SOCK_DGRAM supported */
struct virtio_vsock_config {
__le64 guest_cid;
--
2.30.2
This commit implements datagram support for vsock loopback.
This amounts to little more than toggling on "dgram_allow" and
continuing to use the common virtio functions.
Signed-off-by: Bobby Eshleman <[email protected]>
---
net/vmw_vsock/vsock_loopback.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 278235ea06c4..0459b2bf7b15 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -46,6 +46,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
return 0;
}
+static bool vsock_loopback_dgram_allow(u32 cid, u32 port);
static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
static struct virtio_transport loopback_transport = {
@@ -62,7 +63,7 @@ static struct virtio_transport loopback_transport = {
.cancel_pkt = vsock_loopback_cancel_pkt,
.dgram_enqueue = virtio_transport_dgram_enqueue,
- .dgram_allow = virtio_transport_dgram_allow,
+ .dgram_allow = vsock_loopback_dgram_allow,
.stream_dequeue = virtio_transport_stream_dequeue,
.stream_enqueue = virtio_transport_stream_enqueue,
@@ -95,6 +96,11 @@ static struct virtio_transport loopback_transport = {
.send_pkt = vsock_loopback_send_pkt,
};
+static bool vsock_loopback_dgram_allow(u32 cid, u32 port)
+{
+ return true;
+}
+
static bool vsock_loopback_seqpacket_allow(u32 remote_cid)
{
return true;
--
2.30.2
This commit implements datagram support for virtio/vsock by teaching
virtio to use the general virtio transport ->dgram_addr_init() function
and implementing a new version of ->dgram_allow().
Additionally, it drops virtio_transport_dgram_allow() as an exported
symbol because it is no longer used in other transports.
Signed-off-by: Bobby Eshleman <[email protected]>
---
include/linux/virtio_vsock.h | 1 -
net/vmw_vsock/virtio_transport.c | 24 +++++++++++++++++++++++-
net/vmw_vsock/virtio_transport_common.c | 6 ------
3 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index b3856b8a42b3..d0a4f08b12c1 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -211,7 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
bool virtio_transport_stream_allow(u32 cid, u32 port);
-bool virtio_transport_dgram_allow(u32 cid, u32 port);
void virtio_transport_dgram_addr_init(struct sk_buff *skb,
struct sockaddr_vm *addr);
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index ac2126c7dac5..713718861bd4 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -63,6 +63,7 @@ struct virtio_vsock {
u32 guest_cid;
bool seqpacket_allow;
+ bool dgram_allow;
};
static u32 virtio_transport_get_local_cid(void)
@@ -413,6 +414,7 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
queue_work(virtio_vsock_workqueue, &vsock->rx_work);
}
+static bool virtio_transport_dgram_allow(u32 cid, u32 port);
static bool virtio_transport_seqpacket_allow(u32 remote_cid);
static struct virtio_transport virtio_transport = {
@@ -430,6 +432,7 @@ static struct virtio_transport virtio_transport = {
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
+ .dgram_addr_init = virtio_transport_dgram_addr_init,
.stream_dequeue = virtio_transport_stream_dequeue,
.stream_enqueue = virtio_transport_stream_enqueue,
@@ -462,6 +465,21 @@ static struct virtio_transport virtio_transport = {
.send_pkt = virtio_transport_send_pkt,
};
+static bool virtio_transport_dgram_allow(u32 cid, u32 port)
+{
+ struct virtio_vsock *vsock;
+ bool dgram_allow;
+
+ dgram_allow = false;
+ rcu_read_lock();
+ vsock = rcu_dereference(the_virtio_vsock);
+ if (vsock)
+ dgram_allow = vsock->dgram_allow;
+ rcu_read_unlock();
+
+ return dgram_allow;
+}
+
static bool virtio_transport_seqpacket_allow(u32 remote_cid)
{
struct virtio_vsock *vsock;
@@ -655,6 +673,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
vsock->seqpacket_allow = true;
+ if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
+ vsock->dgram_allow = true;
+
vdev->priv = vsock;
ret = virtio_vsock_vqs_init(vsock);
@@ -747,7 +768,8 @@ static struct virtio_device_id id_table[] = {
};
static unsigned int features[] = {
- VIRTIO_VSOCK_F_SEQPACKET
+ VIRTIO_VSOCK_F_SEQPACKET,
+ VIRTIO_VSOCK_F_DGRAM
};
static struct virtio_driver virtio_vsock_driver = {
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 96118e258097..77898f5325cd 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -783,12 +783,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
}
EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
-bool virtio_transport_dgram_allow(u32 cid, u32 port)
-{
- return false;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
-
int virtio_transport_connect(struct vsock_sock *vsk)
{
struct virtio_vsock_pkt_info info = {
--
2.30.2
This commit implements datagram support for vhost/vsock by teaching
vhost to use the common virtio transport datagram functions.
If the virtio RX buffer is too small, then the transmission is
abandoned, the packet dropped, and EHOSTUNREACH is added to the socket's
error queue.
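The decision described above can be sketched in isolation as below. This is a hypothetical userspace model, not the vhost code: the enum names are made up, the numeric packet types are assumptions for the sketch, and the header length is passed in rather than taken from struct virtio_vsock_hdr.

```c
#include <assert.h>
#include <stddef.h>

/* Packet types for this model only; the values are assumptions. */
enum pkt_type { PKT_STREAM = 1, PKT_SEQPACKET = 2, PKT_DGRAM = 3 };

enum rx_action {
	RX_COPY_WHOLE,      /* payload fits in the guest RX buffer */
	RX_SPLIT,           /* connectible: continue in further buffers */
	RX_DROP_AND_REPORT, /* datagram: drop and queue EHOSTUNREACH */
};

/* Decide what to do with a packet given the RX buffer the guest posted. */
enum rx_action rx_decide(enum pkt_type type, size_t payload_len,
			 size_t iov_len, size_t hdr_len)
{
	assert(iov_len >= hdr_len);

	if (payload_len <= iov_len - hdr_len)
		return RX_COPY_WHOLE;

	/* A datagram must arrive whole or not at all. */
	if (type == PKT_DGRAM)
		return RX_DROP_AND_REPORT;

	return RX_SPLIT;
}
```

Splitting is safe for connectible sockets because the byte stream (or the seqpacket record markers) lets the receiver reassemble the pieces; a datagram has no such framing, so the only options are delivering it intact or dropping it.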
Signed-off-by: Bobby Eshleman <[email protected]>
---
drivers/vhost/vsock.c | 62 +++++++++++++++++++++++++++++++++++++++++++++---
net/vmw_vsock/af_vsock.c | 5 +++-
2 files changed, 63 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index d5d6a3c3f273..da14260c6654 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -8,6 +8,7 @@
*/
#include <linux/miscdevice.h>
#include <linux/atomic.h>
+#include <linux/errqueue.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/vmalloc.h>
@@ -32,7 +33,8 @@
enum {
VHOST_VSOCK_FEATURES = VHOST_FEATURES |
(1ULL << VIRTIO_F_ACCESS_PLATFORM) |
- (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
+ (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
+ (1ULL << VIRTIO_VSOCK_F_DGRAM)
};
enum {
@@ -56,6 +58,7 @@ struct vhost_vsock {
atomic_t queued_replies;
u32 guest_cid;
+ bool dgram_allow;
bool seqpacket_allow;
};
@@ -86,6 +89,32 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
return NULL;
}
+/* Claims ownership of the skb, do not free the skb after calling! */
+static void
+vhost_transport_error(struct sk_buff *skb, int err)
+{
+ struct sock_exterr_skb *serr;
+ struct sock *sk = skb->sk;
+ struct sk_buff *clone;
+
+ serr = SKB_EXT_ERR(skb);
+ memset(serr, 0, sizeof(*serr));
+ serr->ee.ee_errno = err;
+ serr->ee.ee_origin = SO_EE_ORIGIN_NONE;
+
+ clone = skb_clone(skb, GFP_KERNEL);
+ if (!clone)
+ return;
+
+ if (sock_queue_err_skb(sk, clone))
+ kfree_skb(clone);
+
+ sk->sk_err = err;
+ sk_error_report(sk);
+
+ kfree_skb(skb);
+}
+
static void
vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
struct vhost_virtqueue *vq)
@@ -160,9 +189,15 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
hdr = virtio_vsock_hdr(skb);
/* If the packet is greater than the space available in the
- * buffer, we split it using multiple buffers.
+ * buffer, we split it using multiple buffers for connectible
+ * sockets and drop the packet for datagram sockets.
*/
if (payload_len > iov_len - sizeof(*hdr)) {
+ if (le16_to_cpu(hdr->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
+ vhost_transport_error(skb, EHOSTUNREACH);
+ continue;
+ }
+
payload_len = iov_len - sizeof(*hdr);
/* As we are copying pieces of large packet's buffer to
@@ -394,6 +429,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
return val < vq->num;
}
+static bool vhost_transport_dgram_allow(u32 cid, u32 port);
static bool vhost_transport_seqpacket_allow(u32 remote_cid);
static struct virtio_transport vhost_transport = {
@@ -410,7 +446,8 @@ static struct virtio_transport vhost_transport = {
.cancel_pkt = vhost_transport_cancel_pkt,
.dgram_enqueue = virtio_transport_dgram_enqueue,
- .dgram_allow = virtio_transport_dgram_allow,
+ .dgram_allow = vhost_transport_dgram_allow,
+ .dgram_addr_init = virtio_transport_dgram_addr_init,
.stream_enqueue = virtio_transport_stream_enqueue,
.stream_dequeue = virtio_transport_stream_dequeue,
@@ -443,6 +480,22 @@ static struct virtio_transport vhost_transport = {
.send_pkt = vhost_transport_send_pkt,
};
+static bool vhost_transport_dgram_allow(u32 cid, u32 port)
+{
+ struct vhost_vsock *vsock;
+ bool dgram_allow = false;
+
+ rcu_read_lock();
+ vsock = vhost_vsock_get(cid);
+
+ if (vsock)
+ dgram_allow = vsock->dgram_allow;
+
+ rcu_read_unlock();
+
+ return dgram_allow;
+}
+
static bool vhost_transport_seqpacket_allow(u32 remote_cid)
{
struct vhost_vsock *vsock;
@@ -799,6 +852,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
vsock->seqpacket_allow = true;
+ if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
+ vsock->dgram_allow = true;
+
for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
vq = &vsock->vqs[i];
mutex_lock(&vq->mutex);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index e73f3b2c52f1..449ed63ac2b0 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1427,9 +1427,12 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
return prot->recvmsg(sk, msg, len, flags, NULL);
#endif
- if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
+ if (unlikely(flags & MSG_OOB))
return -EOPNOTSUPP;
+ if (unlikely(flags & MSG_ERRQUEUE))
+ return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
+
transport = vsk->transport;
/* Retrieve the head sk_buff from the socket's receive queue. */
--
2.30.2
This commit adds support for bound dgram sockets to be tracked in a
separate bind table from connectible sockets in order to avoid address
collisions. With this commit, users can simultaneously bind a dgram
socket and connectible socket to the same CID and port.
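The user-visible effect can be illustrated with a toy model in which each socket class has its own set of bound addresses. Everything here is hypothetical (fixed-size arrays, no hashing, no locking); it only shows why the same (CID, port) pair no longer collides across classes while still colliding within one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BOUND 16 /* arbitrary capacity for the sketch */

struct bound_addr {
	uint32_t cid;
	uint32_t port;
};

/* One instance per socket class, e.g. connectible and datagram. */
struct bound_set {
	struct bound_addr slot[MAX_BOUND];
	size_t used;
};

/* Bind (cid, port) within one set; other sets are never consulted. */
int bound_set_bind(struct bound_set *s, uint32_t cid, uint32_t port)
{
	size_t i;

	assert(s != NULL);

	for (i = 0; i < s->used; i++)
		if (s->slot[i].cid == cid && s->slot[i].port == port)
			return -EADDRINUSE;

	if (s->used == MAX_BOUND)
		return -ENOMEM;

	s->slot[s->used].cid = cid;
	s->slot[s->used].port = port;
	s->used++;
	return 0;
}
```

Binding (3, 1234) once in a connectible set and once in a datagram set succeeds in both, while a second bind of the same pair in either set fails with -EADDRINUSE.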
Signed-off-by: Bobby Eshleman <[email protected]>
---
net/vmw_vsock/af_vsock.c | 103 ++++++++++++++++++++++++++++++++++-------------
1 file changed, 76 insertions(+), 27 deletions(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 88100154156c..0895f4c1d340 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -10,18 +10,23 @@
* - There are two kinds of sockets: those created by user action (such as
* calling socket(2)) and those created by incoming connection request packets.
*
- * - There are two "global" tables, one for bound sockets (sockets that have
- * specified an address that they are responsible for) and one for connected
- * sockets (sockets that have established a connection with another socket).
- * These tables are "global" in that all sockets on the system are placed
- * within them. - Note, though, that the bound table contains an extra entry
- * for a list of unbound sockets and SOCK_DGRAM sockets will always remain in
- * that list. The bound table is used solely for lookup of sockets when packets
- * are received and that's not necessary for SOCK_DGRAM sockets since we create
- * a datagram handle for each and need not perform a lookup. Keeping SOCK_DGRAM
- * sockets out of the bound hash buckets will reduce the chance of collisions
- * when looking for SOCK_STREAM sockets and prevents us from having to check the
- * socket type in the hash table lookups.
+ * - There are three "global" tables, one for bound connectible (stream /
+ * seqpacket) sockets, one for bound datagram sockets, and one for connected
+ * sockets. Bound sockets are sockets that have specified an address that
+ * they are responsible for. Connected sockets are sockets that have
+ * established a connection with another socket. These tables are "global" in
+ * that all sockets on the system are placed within them. - Note, though,
+ * that the bound tables contain an extra entry for a list of unbound
+ * sockets. The bound tables are used solely for lookup of sockets when packets
+ * are received.
+ *
+ * - There are separate bind tables for connectible and datagram sockets to avoid
+ * address collisions between stream/seqpacket sockets and datagram sockets.
+ *
+ * - Transports may elect to NOT use the global datagram bind table by
+ * implementing the ->dgram_bind() callback. If that callback is implemented,
+ * the global bind table is not used and the responsibility of bound datagram
+ * socket tracking is deferred to the transport.
*
* - Sockets created by user action will either be "client" sockets that
* initiate a connection or "server" sockets that listen for connections; we do
@@ -115,6 +120,7 @@
static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
static void vsock_sk_destruct(struct sock *sk);
static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
+static bool sock_type_connectible(u16 type);
/* Protocol family. */
struct proto vsock_proto = {
@@ -151,21 +157,25 @@ static DEFINE_MUTEX(vsock_register_mutex);
* VSocket is stored in the connected hash table.
*
* Unbound sockets are all put on the same list attached to the end of the hash
- * table (vsock_unbound_sockets). Bound sockets are added to the hash table in
- * the bucket that their local address hashes to (vsock_bound_sockets(addr)
- * represents the list that addr hashes to).
+ * tables (vsock_unbound_sockets/vsock_unbound_dgram_sockets). Bound sockets
+ * are added to the hash table in the bucket that their local address hashes to
+ * (vsock_bound_sockets(addr) and vsock_bound_dgram_sockets(addr) represent
+ * the list that addr hashes to).
*
- * Specifically, we initialize the vsock_bind_table array to a size of
- * VSOCK_HASH_SIZE + 1 so that vsock_bind_table[0] through
- * vsock_bind_table[VSOCK_HASH_SIZE - 1] are for bound sockets and
- * vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets. The hash function
- * mods with VSOCK_HASH_SIZE to ensure this.
+ * Specifically, taking connectible sockets as an example we initialize the
+ * vsock_bind_table array to a size of VSOCK_HASH_SIZE + 1 so that
+ * vsock_bind_table[0] through vsock_bind_table[VSOCK_HASH_SIZE - 1] are for
+ * bound sockets and vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets.
+ * The hash function mods with VSOCK_HASH_SIZE to ensure this.
+ * Datagrams and vsock_dgram_bind_table operate in the same way.
*/
#define MAX_PORT_RETRIES 24
#define VSOCK_HASH(addr) ((addr)->svm_port % VSOCK_HASH_SIZE)
#define vsock_bound_sockets(addr) (&vsock_bind_table[VSOCK_HASH(addr)])
+#define vsock_bound_dgram_sockets(addr) (&vsock_dgram_bind_table[VSOCK_HASH(addr)])
#define vsock_unbound_sockets (&vsock_bind_table[VSOCK_HASH_SIZE])
+#define vsock_unbound_dgram_sockets (&vsock_dgram_bind_table[VSOCK_HASH_SIZE])
/* XXX This can probably be implemented in a better way. */
#define VSOCK_CONN_HASH(src, dst) \
@@ -181,6 +191,8 @@ struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
EXPORT_SYMBOL_GPL(vsock_connected_table);
DEFINE_SPINLOCK(vsock_table_lock);
EXPORT_SYMBOL_GPL(vsock_table_lock);
+static struct list_head vsock_dgram_bind_table[VSOCK_HASH_SIZE + 1];
+static DEFINE_SPINLOCK(vsock_dgram_table_lock);
/* Autobind this socket to the local address if necessary. */
static int vsock_auto_bind(struct vsock_sock *vsk)
@@ -203,6 +215,9 @@ static void vsock_init_tables(void)
for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
INIT_LIST_HEAD(&vsock_connected_table[i]);
+
+ for (i = 0; i < ARRAY_SIZE(vsock_dgram_bind_table); i++)
+ INIT_LIST_HEAD(&vsock_dgram_bind_table[i]);
}
static void __vsock_insert_bound(struct list_head *list,
@@ -270,13 +285,28 @@ static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
return NULL;
}
-static void vsock_insert_unbound(struct vsock_sock *vsk)
+static void __vsock_insert_dgram_unbound(struct vsock_sock *vsk)
+{
+ spin_lock_bh(&vsock_dgram_table_lock);
+ __vsock_insert_bound(vsock_unbound_dgram_sockets, vsk);
+ spin_unlock_bh(&vsock_dgram_table_lock);
+}
+
+static void __vsock_insert_connectible_unbound(struct vsock_sock *vsk)
{
spin_lock_bh(&vsock_table_lock);
__vsock_insert_bound(vsock_unbound_sockets, vsk);
spin_unlock_bh(&vsock_table_lock);
}
+static void vsock_insert_unbound(struct vsock_sock *vsk)
+{
+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
+ __vsock_insert_connectible_unbound(vsk);
+ else
+ __vsock_insert_dgram_unbound(vsk);
+}
+
void vsock_insert_connected(struct vsock_sock *vsk)
{
struct list_head *list = vsock_connected_sockets(
@@ -288,6 +318,14 @@ void vsock_insert_connected(struct vsock_sock *vsk)
}
EXPORT_SYMBOL_GPL(vsock_insert_connected);
+static void vsock_remove_dgram_bound(struct vsock_sock *vsk)
+{
+ spin_lock_bh(&vsock_dgram_table_lock);
+ if (__vsock_in_bound_table(vsk))
+ __vsock_remove_bound(vsk);
+ spin_unlock_bh(&vsock_dgram_table_lock);
+}
+
void vsock_remove_bound(struct vsock_sock *vsk)
{
spin_lock_bh(&vsock_table_lock);
@@ -339,7 +377,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
void vsock_remove_sock(struct vsock_sock *vsk)
{
- vsock_remove_bound(vsk);
+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
+ vsock_remove_bound(vsk);
+ else
+ vsock_remove_dgram_bound(vsk);
vsock_remove_connected(vsk);
}
EXPORT_SYMBOL_GPL(vsock_remove_sock);
@@ -722,11 +763,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
}
-static int __vsock_bind_dgram(struct vsock_sock *vsk,
- struct sockaddr_vm *addr)
+static int vsock_bind_dgram(struct vsock_sock *vsk,
+ struct sockaddr_vm *addr)
{
- if (!vsk->transport || !vsk->transport->dgram_bind)
- return -EINVAL;
+ if (!vsk->transport || !vsk->transport->dgram_bind) {
+ int retval;
+
+ spin_lock_bh(&vsock_dgram_table_lock);
+ retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
+ VSOCK_HASH_SIZE + 1);
+ spin_unlock_bh(&vsock_dgram_table_lock);
+
+ return retval;
+ }
return vsk->transport->dgram_bind(vsk, addr);
}
@@ -757,7 +806,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
break;
case SOCK_DGRAM:
- retval = __vsock_bind_dgram(vsk, addr);
+ retval = vsock_bind_dgram(vsk, addr);
break;
default:
--
2.30.2
This commit drops the transport->dgram_dequeue callback and makes
vsock_dgram_recvmsg() generic to all transports.
To make this possible, two transport-level changes are introduced:
- implementation of the ->dgram_addr_init() callback to initialize
the sockaddr_vm structure with data from incoming socket buffers.
- transport implementations set the skb->data pointer to the beginning
of the payload prior to adding the skb to the socket's receive queue.
That is, they must use skb_pull() before enqueuing. This is an
agreement between the transport and the socket layer that skb->data
always points to the beginning of the payload (and not, for example,
the packet header).
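To make the contract concrete, here is a userspace sketch with a hypothetical pkt_buf type standing in for an skb. The transport side advances past its header (in the spirit of skb_pull()), and the generic receive side copies from the start of the buffer, truncating and flagging when the caller's buffer is too small (the MSG_TRUNC case). None of these names exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parts of an skb that matter here. */
struct pkt_buf {
	const unsigned char *data; /* start of valid bytes, like skb->data */
	size_t len;                /* number of valid bytes, like skb->len */
};

/* Transport side: skip a header of @hdr_len bytes before enqueuing. */
void pkt_pull(struct pkt_buf *p, size_t hdr_len)
{
	assert(hdr_len <= p->len);
	p->data += hdr_len;
	p->len -= hdr_len;
}

/* Socket side: copy at most @len payload bytes into @dst and report
 * whether the datagram had to be truncated. Returns bytes copied.
 */
size_t dgram_copy_out(const struct pkt_buf *p, void *dst, size_t len,
		      bool *truncated)
{
	size_t n = p->len;

	*truncated = false;
	if (n > len) {
		n = len;
		*truncated = true;
	}

	memcpy(dst, p->data, n);
	return n;
}
```

Because the pull happens before enqueuing, the receive side needs no knowledge of any transport's header layout, which is what lets a single vsock_dgram_recvmsg() serve every transport.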
Signed-off-by: Bobby Eshleman <[email protected]>
---
drivers/vhost/vsock.c | 1 -
include/linux/virtio_vsock.h | 5 ---
include/net/af_vsock.h | 3 +-
net/vmw_vsock/af_vsock.c | 40 ++++++++++++++++++++++-
net/vmw_vsock/hyperv_transport.c | 7 ----
net/vmw_vsock/virtio_transport.c | 1 -
net/vmw_vsock/virtio_transport_common.c | 9 -----
net/vmw_vsock/vmci_transport.c | 58 ++++++---------------------------
net/vmw_vsock/vsock_loopback.c | 1 -
9 files changed, 50 insertions(+), 75 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6578db78f0ae..ae8891598a48 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
.cancel_pkt = vhost_transport_cancel_pkt,
.dgram_enqueue = virtio_transport_dgram_enqueue,
- .dgram_dequeue = virtio_transport_dgram_dequeue,
.dgram_bind = virtio_transport_dgram_bind,
.dgram_allow = virtio_transport_dgram_allow,
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index c58453699ee9..18cbe8d37fca 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -167,11 +167,6 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk,
size_t len,
int type);
int
-virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
- struct msghdr *msg,
- size_t len, int flags);
-
-int
virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
struct msghdr *msg,
size_t len);
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 0e7504a42925..305d57502e89 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -120,11 +120,10 @@ struct vsock_transport {
/* DGRAM. */
int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
- int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
- size_t len, int flags);
int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
struct msghdr *, size_t len);
bool (*dgram_allow)(u32 cid, u32 port);
+ void (*dgram_addr_init)(struct sk_buff *skb, struct sockaddr_vm *addr);
/* STREAM. */
/* TODO: stream_bind() */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index deb72a8c44a7..ad71e084bf2f 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1272,11 +1272,15 @@ static int vsock_dgram_connect(struct socket *sock,
int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
size_t len, int flags)
{
+ const struct vsock_transport *transport;
#ifdef CONFIG_BPF_SYSCALL
const struct proto *prot;
#endif
struct vsock_sock *vsk;
+ struct sk_buff *skb;
+ size_t payload_len;
struct sock *sk;
+ int err;
sk = sock->sk;
vsk = vsock_sk(sk);
@@ -1287,7 +1291,41 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
return prot->recvmsg(sk, msg, len, flags, NULL);
#endif
- return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
+ if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
+ return -EOPNOTSUPP;
+
+ transport = vsk->transport;
+
+ /* Retrieve the head sk_buff from the socket's receive queue. */
+ err = 0;
+ skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
+ if (!skb)
+ return err;
+
+ payload_len = skb->len;
+
+ if (payload_len > len) {
+ payload_len = len;
+ msg->msg_flags |= MSG_TRUNC;
+ }
+
+ /* Place the datagram payload in the user's iovec. */
+ err = skb_copy_datagram_msg(skb, 0, msg, payload_len);
+ if (err)
+ goto out;
+
+ if (msg->msg_name) {
+ /* Provide the address of the sender. */
+ DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
+
+ transport->dgram_addr_init(skb, vm_addr);
+ msg->msg_namelen = sizeof(*vm_addr);
+ }
+ err = payload_len;
+
+out:
+ skb_free_datagram(&vsk->sk, skb);
+ return err;
}
EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 7cb1a9d2cdb4..7f1ea434656d 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -556,12 +556,6 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
return -EOPNOTSUPP;
}
-static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
- size_t len, int flags)
-{
- return -EOPNOTSUPP;
-}
-
static int hvs_dgram_enqueue(struct vsock_sock *vsk,
struct sockaddr_vm *remote, struct msghdr *msg,
size_t dgram_len)
@@ -833,7 +827,6 @@ static struct vsock_transport hvs_transport = {
.shutdown = hvs_shutdown,
.dgram_bind = hvs_dgram_bind,
- .dgram_dequeue = hvs_dgram_dequeue,
.dgram_enqueue = hvs_dgram_enqueue,
.dgram_allow = hvs_dgram_allow,
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e95df847176b..66edffdbf303 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -429,7 +429,6 @@ static struct virtio_transport virtio_transport = {
.cancel_pkt = virtio_transport_cancel_pkt,
.dgram_bind = virtio_transport_dgram_bind,
- .dgram_dequeue = virtio_transport_dgram_dequeue,
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index b769fc258931..01ea1402ad40 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -583,15 +583,6 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
}
EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_enqueue);
-int
-virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
- struct msghdr *msg,
- size_t len, int flags)
-{
- return -EOPNOTSUPP;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
-
s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
{
struct virtio_vsock_sock *vvs = vsk->trans;
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index b370070194fa..0bbbdb222245 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -641,6 +641,7 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
sock_hold(sk);
skb_put(skb, size);
memcpy(skb->data, dg, size);
+ skb_pull(skb, VMCI_DG_HEADERSIZE);
sk_receive_skb(sk, skb, 0);
return VMCI_SUCCESS;
@@ -1731,57 +1732,18 @@ static int vmci_transport_dgram_enqueue(
return err - sizeof(*dg);
}
-static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
- struct msghdr *msg, size_t len,
- int flags)
+static void vmci_transport_dgram_addr_init(struct sk_buff *skb,
+ struct sockaddr_vm *addr)
{
- int err;
struct vmci_datagram *dg;
- size_t payload_len;
- struct sk_buff *skb;
-
- if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
- return -EOPNOTSUPP;
-
- /* Retrieve the head sk_buff from the socket's receive queue. */
- err = 0;
- skb = skb_recv_datagram(&vsk->sk, flags, &err);
- if (!skb)
- return err;
-
- dg = (struct vmci_datagram *)skb->data;
- if (!dg)
- /* err is 0, meaning we read zero bytes. */
- goto out;
-
- payload_len = dg->payload_size;
- /* Ensure the sk_buff matches the payload size claimed in the packet. */
- if (payload_len != skb->len - sizeof(*dg)) {
- err = -EINVAL;
- goto out;
- }
-
- if (payload_len > len) {
- payload_len = len;
- msg->msg_flags |= MSG_TRUNC;
- }
+ unsigned int cid, port;
- /* Place the datagram payload in the user's iovec. */
- err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
- if (err)
- goto out;
-
- if (msg->msg_name) {
- /* Provide the address of the sender. */
- DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
- vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
- msg->msg_namelen = sizeof(*vm_addr);
- }
- err = payload_len;
+ WARN_ONCE(skb->head == skb->data, "vmci vsock bug: bad dgram skb");
-out:
- skb_free_datagram(&vsk->sk, skb);
- return err;
+ dg = (struct vmci_datagram *)skb->head;
+ cid = dg->src.context;
+ port = dg->src.resource;
+ vsock_addr_init(addr, cid, port);
}
static bool vmci_transport_dgram_allow(u32 cid, u32 port)
@@ -2040,9 +2002,9 @@ static struct vsock_transport vmci_transport = {
.release = vmci_transport_release,
.connect = vmci_transport_connect,
.dgram_bind = vmci_transport_dgram_bind,
- .dgram_dequeue = vmci_transport_dgram_dequeue,
.dgram_enqueue = vmci_transport_dgram_enqueue,
.dgram_allow = vmci_transport_dgram_allow,
+ .dgram_addr_init = vmci_transport_dgram_addr_init,
.stream_dequeue = vmci_transport_stream_dequeue,
.stream_enqueue = vmci_transport_stream_enqueue,
.stream_has_data = vmci_transport_stream_has_data,
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 5c6360df1f31..2a59dd177c74 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -62,7 +62,6 @@ static struct virtio_transport loopback_transport = {
.cancel_pkt = vsock_loopback_cancel_pkt,
.dgram_bind = virtio_transport_dgram_bind,
- .dgram_dequeue = virtio_transport_dgram_dequeue,
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
--
2.30.2
This commit implements the common function
virtio_transport_dgram_enqueue() for enqueueing datagrams. It is not yet
wired into either vhost or virtio.
Signed-off-by: Bobby Eshleman <[email protected]>
---
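Illustration only, not part of this patch: the header fill in
virtio_transport_dgram_enqueue() stores every field little-endian. The
user-space sketch below models that layout by serializing the fields
into a byte buffer at the offsets of the packed struct virtio_vsock_hdr.
All model_* names are hypothetical, and MODEL_TYPE_DGRAM = 3 is an
assumed value that depends on the pending spec change.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_HDR_SIZE   44 /* packed size of struct virtio_vsock_hdr */
#define MODEL_OP_RW      5  /* VIRTIO_VSOCK_OP_RW */
#define MODEL_TYPE_DGRAM 3  /* assumed value, pending the spec change */

/* Hypothetical host-endian view of the fields the enqueue path sets. */
struct model_dgram_hdr {
	uint64_t src_cid;
	uint64_t dst_cid;
	uint32_t src_port;
	uint32_t dst_port;
	uint32_t len;
	uint16_t type;
	uint16_t op;
	uint32_t flags;
};

/* Store the low n bytes of v at p, least significant byte first. */
void model_put_le(uint8_t *p, uint64_t v, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Serialize h into out using the little-endian wire layout. */
void model_hdr_encode(const struct model_dgram_hdr *h,
		      uint8_t out[MODEL_HDR_SIZE])
{
	memset(out, 0, MODEL_HDR_SIZE);
	model_put_le(out + 0, h->src_cid, 8);
	model_put_le(out + 8, h->dst_cid, 8);
	model_put_le(out + 16, h->src_port, 4);
	model_put_le(out + 20, h->dst_port, 4);
	model_put_le(out + 24, h->len, 4);
	model_put_le(out + 28, h->type, 2);
	model_put_le(out + 30, h->op, 2);
	model_put_le(out + 32, h->flags, 4);
	/* Bytes 36..43 (buf_alloc, fwd_cnt) are left zero in this model. */
}
```

Writing bytes explicitly keeps the sketch independent of host
endianness, which is the same guarantee the cpu_to_le*() helpers give
the kernel code.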
net/vmw_vsock/virtio_transport_common.c | 76 ++++++++++++++++++++++++++++++++-
1 file changed, 75 insertions(+), 1 deletion(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index ffcbdd77feaa..3bfaff758433 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -819,7 +819,81 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
struct msghdr *msg,
size_t dgram_len)
{
- return -EOPNOTSUPP;
+ /* Here we are only using the info struct to retain style uniformity
+ * and to ease future refactoring and merging.
+ */
+ struct virtio_vsock_pkt_info info_stack = {
+ .op = VIRTIO_VSOCK_OP_RW,
+ .msg = msg,
+ .vsk = vsk,
+ .type = VIRTIO_VSOCK_TYPE_DGRAM,
+ };
+ const struct virtio_transport *t_ops;
+ struct virtio_vsock_pkt_info *info;
+ struct sock *sk = sk_vsock(vsk);
+ struct virtio_vsock_hdr *hdr;
+ u32 src_cid, src_port;
+ struct sk_buff *skb;
+ void *payload;
+ int noblock;
+ int err;
+
+ info = &info_stack;
+
+ if (dgram_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+ return -EMSGSIZE;
+
+ t_ops = virtio_transport_get_ops(vsk);
+ if (unlikely(!t_ops))
+ return -EFAULT;
+
+ /* Unlike some of our other sending functions, this function is not
+ * intended for use without a msghdr.
+ */
+ if (WARN_ONCE(!msg, "vsock dgram bug: no msghdr found for dgram enqueue\n"))
+ return -EFAULT;
+
+ noblock = msg->msg_flags & MSG_DONTWAIT;
+
+ /* Use sock_alloc_send_skb to throttle by sk_sndbuf. This helps avoid
+ * triggering the OOM killer.
+ */
+ skb = sock_alloc_send_skb(sk, dgram_len + VIRTIO_VSOCK_SKB_HEADROOM,
+ noblock, &err);
+ if (!skb)
+ return err;
+
+ skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
+
+ src_cid = t_ops->transport.get_local_cid();
+ src_port = vsk->local_addr.svm_port;
+
+ hdr = virtio_vsock_hdr(skb);
+ hdr->type = cpu_to_le16(info->type);
+ hdr->op = cpu_to_le16(info->op);
+ hdr->src_cid = cpu_to_le64(src_cid);
+ hdr->dst_cid = cpu_to_le64(remote_addr->svm_cid);
+ hdr->src_port = cpu_to_le32(src_port);
+ hdr->dst_port = cpu_to_le32(remote_addr->svm_port);
+ hdr->flags = cpu_to_le32(info->flags);
+ hdr->len = cpu_to_le32(dgram_len);
+
+ skb_set_owner_w(skb, sk);
+
+ payload = skb_put(skb, dgram_len);
+ err = memcpy_from_msg(payload, msg, dgram_len);
+ if (err) {
+ kfree_skb(skb);
+ return err;
+ }
+
+ trace_virtio_transport_alloc_pkt(src_cid, src_port,
+ remote_addr->svm_cid,
+ remote_addr->svm_port,
+ dgram_len,
+ info->type,
+ info->op,
+ 0);
+
+ return t_ops->send_pkt(skb);
}
EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
--
2.30.2
This patch adds support for multi-transport datagrams.
This includes:
- Per-packet lookup of transports when using sendto(sockaddr_vm)
- Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
sockaddr_vm
- Renaming VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
- Assigning the transport in connect() (similar to connectible
sockets)
To preserve backwards compatibility with VMCI, "transport_dgram" /
VSOCK_TRANSPORT_F_DGRAM is now used for dgrams only if no registered
g2h or h2g transport can transmit the packet. If there is a g2h/h2g
transport for that remote address, then that transport is used instead
of "transport_dgram". This essentially makes "transport_dgram" a
fallback transport for when h2g/g2h has not yet come online, and so it
is renamed "transport_dgram_fallback". VMCI implements this transport.
The logic around "transport_dgram" needs to be retained to prevent
breaking VMCI:
1) VMCI datagrams existed prior to h2g/g2h and so operate under a
different paradigm. When the vmci transport comes online, it registers
itself with the DGRAM feature, but not H2G/G2H. Only later when the
transport has more information about its environment does it register
H2G or G2H. In the case that a datagram socket is created after
VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
the "transport_dgram" transport is the only registered transport and so
needs to be used.
2) VMCI seems to require a special message be sent by the transport when a
datagram socket calls bind(). Under the h2g/g2h model, the transport
is selected using the remote_addr which is set by connect(). At
bind time there is no remote_addr because often no connect() has been
called yet: the transport is null. Therefore, with a null transport
there doesn't seem to be any good way for a datagram socket to tell the
VMCI transport that it has just had bind() called upon it.
With the new fallback logic, after H2G/G2H comes online the socket layer
will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
coming online, the socket layer will access the VMCI transport via
"transport_dgram_fallback".
Only transports with a special datagram fallback use-case such as VMCI
need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
Signed-off-by: Bobby Eshleman <[email protected]>
---
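Illustration only, not part of this patch: the lookup order described
above can be modeled in plain user-space C. The model_* names and the
registry struct are hypothetical stand-ins for the static transport
pointers in af_vsock.c, and the local-CID check is deliberately
simplified compared to vsock_use_local_transport().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct vsock_transport. */
struct model_transport {
	const char *name;
};

/* Hypothetical stand-in for the static transport_* pointers. */
struct model_registry {
	const struct model_transport *h2g;
	const struct model_transport *g2h;
	const struct model_transport *local;
	const struct model_transport *dgram_fallback;
};

#define MODEL_CID_LOCAL    1u    /* mirrors VMADDR_CID_LOCAL */
#define MODEL_CID_HOST     2u    /* mirrors VMADDR_CID_HOST */
#define MODEL_FLAG_TO_HOST 0x01u /* mirrors VMADDR_FLAG_TO_HOST */

/* Same decision order as the connectible lookup; may return NULL. */
const struct model_transport *
model_connectible_lookup(const struct model_registry *r,
			 unsigned int cid, unsigned char flags)
{
	/* Simplification: only the well-known local CID selects loopback. */
	if (r->local && cid == MODEL_CID_LOCAL)
		return r->local;
	if (cid <= MODEL_CID_HOST || !r->h2g || (flags & MODEL_FLAG_TO_HOST))
		return r->g2h;
	return r->h2g;
}

/* Datagram lookup: prefer g2h/h2g/local, otherwise use the fallback. */
const struct model_transport *
model_dgram_lookup(const struct model_registry *r,
		   unsigned int cid, unsigned char flags)
{
	const struct model_transport *t;

	t = model_connectible_lookup(r, cid, flags);
	return t ? t : r->dgram_fallback;
}
```

With only the fallback registered (the VMCI case before G2H/H2G comes
online), every lookup in this model returns the fallback. Once a g2h
transport registers, a lookup for the host CID resolves to it instead.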
drivers/vhost/vsock.c | 1 -
include/linux/virtio_vsock.h | 2 --
include/net/af_vsock.h | 10 +++---
net/vmw_vsock/af_vsock.c | 64 ++++++++++++++++++++++++++-------
net/vmw_vsock/hyperv_transport.c | 6 ----
net/vmw_vsock/virtio_transport.c | 1 -
net/vmw_vsock/virtio_transport_common.c | 7 ----
net/vmw_vsock/vmci_transport.c | 2 +-
net/vmw_vsock/vsock_loopback.c | 1 -
9 files changed, 58 insertions(+), 36 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index ae8891598a48..d5d6a3c3f273 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
.cancel_pkt = vhost_transport_cancel_pkt,
.dgram_enqueue = virtio_transport_dgram_enqueue,
- .dgram_bind = virtio_transport_dgram_bind,
.dgram_allow = virtio_transport_dgram_allow,
.stream_enqueue = virtio_transport_stream_enqueue,
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 18cbe8d37fca..7632552bee58 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -211,8 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
bool virtio_transport_stream_allow(u32 cid, u32 port);
-int virtio_transport_dgram_bind(struct vsock_sock *vsk,
- struct sockaddr_vm *addr);
bool virtio_transport_dgram_allow(u32 cid, u32 port);
int virtio_transport_connect(struct vsock_sock *vsk);
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 305d57502e89..f6a0ca9d7c3e 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
/* Transport features flags */
/* Transport provides host->guest communication */
-#define VSOCK_TRANSPORT_F_H2G 0x00000001
+#define VSOCK_TRANSPORT_F_H2G 0x00000001
/* Transport provides guest->host communication */
-#define VSOCK_TRANSPORT_F_G2H 0x00000002
-/* Transport provides DGRAM communication */
-#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
+#define VSOCK_TRANSPORT_F_G2H 0x00000002
+/* Transport provides fallback for DGRAM communication */
+#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
/* Transport provides local (loopback) communication */
-#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
+#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
struct vsock_transport {
struct module *module;
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index ae5ac5531d96..26c97b33d55a 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -139,8 +139,8 @@ struct proto vsock_proto = {
static const struct vsock_transport *transport_h2g;
/* Transport used for guest->host communication */
static const struct vsock_transport *transport_g2h;
-/* Transport used for DGRAM communication */
-static const struct vsock_transport *transport_dgram;
+/* Transport used as a fallback for DGRAM communication */
+static const struct vsock_transport *transport_dgram_fallback;
/* Transport used for local communication */
static const struct vsock_transport *transport_local;
static DEFINE_MUTEX(vsock_register_mutex);
@@ -439,6 +439,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
return transport;
}
+static const struct vsock_transport *
+vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
+{
+ const struct vsock_transport *transport;
+
+ transport = vsock_connectible_lookup_transport(cid, flags);
+ if (transport)
+ return transport;
+
+ return transport_dgram_fallback;
+}
+
/* Assign a transport to a socket and call the .init transport callback.
*
* Note: for connection oriented socket this must be called when vsk->remote_addr
@@ -475,7 +487,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
switch (sk->sk_type) {
case SOCK_DGRAM:
- new_transport = transport_dgram;
+ new_transport = vsock_dgram_lookup_transport(remote_cid,
+ remote_flags);
break;
case SOCK_STREAM:
case SOCK_SEQPACKET:
@@ -692,6 +705,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
static int __vsock_bind_dgram(struct vsock_sock *vsk,
struct sockaddr_vm *addr)
{
+ if (!vsk->transport || !vsk->transport->dgram_bind)
+ return -EINVAL;
+
return vsk->transport->dgram_bind(vsk, addr);
}
@@ -1162,6 +1178,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
struct vsock_sock *vsk;
struct sockaddr_vm *remote_addr;
const struct vsock_transport *transport;
+ bool module_got = false;
if (msg->msg_flags & MSG_OOB)
return -EOPNOTSUPP;
@@ -1173,19 +1190,34 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
lock_sock(sk);
- transport = vsk->transport;
-
err = vsock_auto_bind(vsk);
if (err)
goto out;
-
/* If the provided message contains an address, use that. Otherwise
* fall back on the socket's remote handle (if it has been connected).
*/
if (msg->msg_name &&
vsock_addr_cast(msg->msg_name, msg->msg_namelen,
&remote_addr) == 0) {
+ transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
+ remote_addr->svm_flags);
+ if (!transport) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (!try_module_get(transport->module)) {
+ err = -ENODEV;
+ goto out;
+ }
+
+ /* When looking up a transport dynamically and acquiring a
+ * reference on the module, we need to remember to release the
+ * reference later.
+ */
+ module_got = true;
+
/* Ensure this address is of the right type and is a valid
* destination.
*/
@@ -1200,6 +1232,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
} else if (sock->state == SS_CONNECTED) {
remote_addr = &vsk->remote_addr;
+ transport = vsk->transport;
if (remote_addr->svm_cid == VMADDR_CID_ANY)
remote_addr->svm_cid = transport->get_local_cid();
@@ -1224,6 +1257,8 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
out:
+ if (module_got)
+ module_put(transport->module);
release_sock(sk);
return err;
}
@@ -1256,13 +1291,18 @@ static int vsock_dgram_connect(struct socket *sock,
if (err)
goto out;
+ memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
+
+ err = vsock_assign_transport(vsk, NULL);
+ if (err)
+ goto out;
+
if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
remote_addr->svm_port)) {
err = -EINVAL;
goto out;
}
- memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
sock->state = SS_CONNECTED;
/* sock map disallows redirection of non-TCP sockets with sk_state !=
@@ -2487,7 +2527,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
t_h2g = transport_h2g;
t_g2h = transport_g2h;
- t_dgram = transport_dgram;
+ t_dgram = transport_dgram_fallback;
t_local = transport_local;
if (features & VSOCK_TRANSPORT_F_H2G) {
@@ -2506,7 +2546,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
t_g2h = t;
}
- if (features & VSOCK_TRANSPORT_F_DGRAM) {
+ if (features & VSOCK_TRANSPORT_F_DGRAM_FALLBACK) {
if (t_dgram) {
err = -EBUSY;
goto err_busy;
@@ -2524,7 +2564,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
transport_h2g = t_h2g;
transport_g2h = t_g2h;
- transport_dgram = t_dgram;
+ transport_dgram_fallback = t_dgram;
transport_local = t_local;
err_busy:
@@ -2543,8 +2583,8 @@ void vsock_core_unregister(const struct vsock_transport *t)
if (transport_g2h == t)
transport_g2h = NULL;
- if (transport_dgram == t)
- transport_dgram = NULL;
+ if (transport_dgram_fallback == t)
+ transport_dgram_fallback = NULL;
if (transport_local == t)
transport_local = NULL;
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 7f1ea434656d..c29000f2612a 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
kfree(hvs);
}
-static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
-{
- return -EOPNOTSUPP;
-}
-
static int hvs_dgram_enqueue(struct vsock_sock *vsk,
struct sockaddr_vm *remote, struct msghdr *msg,
size_t dgram_len)
@@ -826,7 +821,6 @@ static struct vsock_transport hvs_transport = {
.connect = hvs_connect,
.shutdown = hvs_shutdown,
- .dgram_bind = hvs_dgram_bind,
.dgram_enqueue = hvs_dgram_enqueue,
.dgram_allow = hvs_dgram_allow,
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 66edffdbf303..ac2126c7dac5 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = {
.shutdown = virtio_transport_shutdown,
.cancel_pkt = virtio_transport_cancel_pkt,
- .dgram_bind = virtio_transport_dgram_bind,
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 01ea1402ad40..ffcbdd77feaa 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -781,13 +781,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
}
EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
-int virtio_transport_dgram_bind(struct vsock_sock *vsk,
- struct sockaddr_vm *addr)
-{
- return -EOPNOTSUPP;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
-
bool virtio_transport_dgram_allow(u32 cid, u32 port)
{
return false;
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 0bbbdb222245..857b0461f856 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -2072,7 +2072,7 @@ static int __init vmci_transport_init(void)
/* Register only with dgram feature, other features (H2G, G2H) will be
* registered when the first host or guest becomes active.
*/
- err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM);
+ err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM_FALLBACK);
if (err < 0)
goto err_unsubscribe;
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 2a59dd177c74..278235ea06c4 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -61,7 +61,6 @@ static struct virtio_transport loopback_transport = {
.shutdown = virtio_transport_shutdown,
.cancel_pkt = vsock_loopback_cancel_pkt,
- .dgram_bind = virtio_transport_dgram_bind,
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
--
2.30.2
Introduce a new reusable function, vsock_connectible_lookup_transport(),
that performs the transport lookup logic.
No functional change intended.
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Bobby Eshleman <[email protected]>
---
net/vmw_vsock/af_vsock.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index ad71e084bf2f..ae5ac5531d96 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -423,6 +423,22 @@ static void vsock_deassign_transport(struct vsock_sock *vsk)
vsk->transport = NULL;
}
+static const struct vsock_transport *
+vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
+{
+ const struct vsock_transport *transport;
+
+ if (vsock_use_local_transport(cid))
+ transport = transport_local;
+ else if (cid <= VMADDR_CID_HOST || !transport_h2g ||
+ (flags & VMADDR_FLAG_TO_HOST))
+ transport = transport_g2h;
+ else
+ transport = transport_h2g;
+
+ return transport;
+}
+
/* Assign a transport to a socket and call the .init transport callback.
*
* Note: for connection oriented socket this must be called when vsk->remote_addr
@@ -463,13 +479,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
break;
case SOCK_STREAM:
case SOCK_SEQPACKET:
- if (vsock_use_local_transport(remote_cid))
- new_transport = transport_local;
- else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
- (remote_flags & VMADDR_FLAG_TO_HOST))
- new_transport = transport_g2h;
- else
- new_transport = transport_h2g;
+ new_transport = vsock_connectible_lookup_transport(remote_cid,
+ remote_flags);
break;
default:
return -ESOCKTNOSUPPORT;
--
2.30.2
From: Jiang Wang <[email protected]>
This commit adds tests for vsock datagrams.
Signed-off-by: Bobby Eshleman <[email protected]>
Signed-off-by: Jiang Wang <[email protected]>
---
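Illustration only, not part of this patch: the tests below all follow
the same user-space pattern of filling a sockaddr_vm and calling
sendto() on a SOCK_DGRAM socket. A minimal sketch of that pattern,
assuming a kernel with vsock datagram support; dgram_addr_fill() and
dgram_send_once() are hypothetical helper names, not functions from
util.c.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Fill addr with an AF_VSOCK <cid, port> pair; other fields are zeroed. */
void dgram_addr_fill(struct sockaddr_vm *addr, unsigned int cid,
		     unsigned int port)
{
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = cid;
	addr->svm_port = port;
}

/* Send one datagram to <cid, port>. Returns bytes sent, or -1 with
 * errno set by socket() or sendto().
 */
ssize_t dgram_send_once(unsigned int cid, unsigned int port,
			const void *buf, size_t len)
{
	struct sockaddr_vm addr;
	int saved_errno;
	ssize_t n;
	int fd;

	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	dgram_addr_fill(&addr, cid, port);
	n = sendto(fd, buf, len, 0, (struct sockaddr *)&addr, sizeof(addr));

	/* Preserve the sendto() errno across close(). */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return n;
}
```

No connect() or bind() is needed on the sending side in this sketch,
which matches the sendto client tests that rely on the kernel's
auto-bind.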
tools/testing/vsock/util.c | 141 +++++++-
tools/testing/vsock/util.h | 6 +
tools/testing/vsock/vsock_test.c | 680 +++++++++++++++++++++++++++++++++++++++
3 files changed, 826 insertions(+), 1 deletion(-)
diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 01b636d3039a..811e70d7cf1e 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -99,7 +99,8 @@ static int vsock_connect(unsigned int cid, unsigned int port, int type)
int ret;
int fd;
- control_expectln("LISTENING");
+ if (type != SOCK_DGRAM)
+ control_expectln("LISTENING");
fd = socket(AF_VSOCK, type, 0);
@@ -130,6 +131,11 @@ int vsock_seqpacket_connect(unsigned int cid, unsigned int port)
return vsock_connect(cid, port, SOCK_SEQPACKET);
}
+int vsock_dgram_connect(unsigned int cid, unsigned int port)
+{
+ return vsock_connect(cid, port, SOCK_DGRAM);
+}
+
/* Listen on <cid, port> and return the first incoming connection. The remote
* address is stored to clientaddrp. clientaddrp may be NULL.
*/
@@ -211,6 +217,34 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
return vsock_accept(cid, port, clientaddrp, SOCK_SEQPACKET);
}
+int vsock_dgram_bind(unsigned int cid, unsigned int port)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = port,
+ .svm_cid = cid,
+ },
+ };
+ int fd;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ return fd;
+}
+
/* Transmit one byte and check the return value.
*
* expected_ret:
@@ -260,6 +294,57 @@ void send_byte(int fd, int expected_ret, int flags)
}
}
+/* Transmit one byte and check the return value.
+ *
+ * expected_ret:
+ * <0 Negative errno (for testing errors)
+ * 0 End-of-file
+ * 1 Success
+ */
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+ int flags)
+{
+ const uint8_t byte = 'A';
+ ssize_t nwritten;
+
+ timeout_begin(TIMEOUT);
+ do {
+ nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr,
+ len);
+ timeout_check("write");
+ } while (nwritten < 0 && errno == EINTR);
+ timeout_end();
+
+ if (expected_ret < 0) {
+ if (nwritten != -1) {
+ fprintf(stderr, "bogus sendto(2) return value %zd\n",
+ nwritten);
+ exit(EXIT_FAILURE);
+ }
+ if (errno != -expected_ret) {
+ perror("write");
+ exit(EXIT_FAILURE);
+ }
+ return;
+ }
+
+ if (nwritten < 0) {
+ perror("write");
+ exit(EXIT_FAILURE);
+ }
+ if (nwritten == 0) {
+ if (expected_ret == 0)
+ return;
+
+ fprintf(stderr, "unexpected EOF while sending byte\n");
+ exit(EXIT_FAILURE);
+ }
+ if (nwritten != sizeof(byte)) {
+ fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten);
+ exit(EXIT_FAILURE);
+ }
+}
+
/* Receive one byte and check the return value.
*
* expected_ret:
@@ -313,6 +398,60 @@ void recv_byte(int fd, int expected_ret, int flags)
}
}
+/* Receive one byte and check the return value.
+ *
+ * expected_ret:
+ * <0 Negative errno (for testing errors)
+ * 0 End-of-file
+ * 1 Success
+ */
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+ int expected_ret, int flags)
+{
+ uint8_t byte;
+ ssize_t nread;
+
+ timeout_begin(TIMEOUT);
+ do {
+ nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen);
+ timeout_check("read");
+ } while (nread < 0 && errno == EINTR);
+ timeout_end();
+
+ if (expected_ret < 0) {
+ if (nread != -1) {
+ fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
+ nread);
+ exit(EXIT_FAILURE);
+ }
+ if (errno != -expected_ret) {
+ perror("read");
+ exit(EXIT_FAILURE);
+ }
+ return;
+ }
+
+ if (nread < 0) {
+ perror("read");
+ exit(EXIT_FAILURE);
+ }
+ if (nread == 0) {
+ if (expected_ret == 0)
+ return;
+
+ fprintf(stderr, "unexpected EOF while receiving byte\n");
+ exit(EXIT_FAILURE);
+ }
+ if (nread != sizeof(byte)) {
+ fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread);
+ exit(EXIT_FAILURE);
+ }
+ if (byte != 'A') {
+ fprintf(stderr, "unexpected byte read %c\n", byte);
+ exit(EXIT_FAILURE);
+ }
+}
+
/* Run test cases. The program terminates if a failure occurs. */
void run_tests(const struct test_case *test_cases,
const struct test_opts *opts)
diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
index fb99208a95ea..a69e128d120c 100644
--- a/tools/testing/vsock/util.h
+++ b/tools/testing/vsock/util.h
@@ -37,13 +37,19 @@ void init_signals(void);
unsigned int parse_cid(const char *str);
int vsock_stream_connect(unsigned int cid, unsigned int port);
int vsock_seqpacket_connect(unsigned int cid, unsigned int port);
+int vsock_dgram_connect(unsigned int cid, unsigned int port);
int vsock_stream_accept(unsigned int cid, unsigned int port,
struct sockaddr_vm *clientaddrp);
int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
struct sockaddr_vm *clientaddrp);
+int vsock_dgram_bind(unsigned int cid, unsigned int port);
void vsock_wait_remote_close(int fd);
void send_byte(int fd, int expected_ret, int flags);
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+ int flags);
void recv_byte(int fd, int expected_ret, int flags);
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+ int expected_ret, int flags);
void run_tests(const struct test_case *test_cases,
const struct test_opts *opts);
void list_tests(const struct test_case *test_cases);
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index ac1bd3ac1533..c9904a3376ce 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -13,6 +13,7 @@
#include <string.h>
#include <errno.h>
#include <unistd.h>
+#include <linux/errqueue.h>
#include <linux/kernel.h>
#include <sys/types.h>
#include <sys/socket.h>
@@ -24,6 +25,12 @@
#include "control.h"
#include "util.h"
+#ifndef SOL_VSOCK
+#define SOL_VSOCK 287
+#endif
+
+#define DGRAM_MSG_CNT 16
+
static void test_stream_connection_reset(const struct test_opts *opts)
{
union {
@@ -1053,6 +1060,644 @@ static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
close(fd);
}
+static void test_dgram_sendto_client(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = opts->peer_cid,
+ },
+ };
+ int fd;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_sendto_server(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = VMADDR_CID_ANY,
+ },
+ };
+ unsigned long sock_buf_size;
+ socklen_t len = sizeof(addr.svm);
+ int fd;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Set receive buffer to maximum */
+ sock_buf_size = -1;
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
+ &sock_buf_size, sizeof(sock_buf_size))) {
+ perror("setsockopt(SO_RCVBUF)");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Notify the client that the server is ready */
+ control_writeln("BIND");
+
+ recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+
+ /* Wait for the client to finish */
+ control_expectln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_connect_client(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = opts->peer_cid,
+ },
+ };
+ int ret;
+ int fd;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ ret = connect(fd, &addr.sa, sizeof(addr.svm));
+ if (ret < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ send_byte(fd, 1, 0);
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_connect_server(const struct test_opts *opts)
+{
+ test_dgram_sendto_server(opts);
+}
+
+static void test_dgram_multiconn_sendto_client(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = opts->peer_cid,
+ },
+ };
+ int fds[MULTICONN_NFDS];
+ int i;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fds[i] < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
+
+ /* This is here to make explicit the case of the test failing
+ * due to packet loss. The test fails when recv() times out
+ * otherwise, which is much more confusing.
+ */
+ control_expectln("PKTRECV");
+ }
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ for (i = 0; i < MULTICONN_NFDS; i++)
+ close(fds[i]);
+}
+
+static void test_dgram_multiconn_sendto_server(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = VMADDR_CID_ANY,
+ },
+ };
+ socklen_t len = sizeof(addr.svm);
+ int fd;
+ int i;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Notify the client that the server is ready */
+ control_writeln("BIND");
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+ control_writeln("PKTRECV");
+ }
+
+ /* Wait for the client to finish */
+ control_expectln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_multiconn_send_client(const struct test_opts *opts)
+{
+ int fds[MULTICONN_NFDS];
+ int i;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ fds[i] = vsock_dgram_connect(opts->peer_cid, 1234);
+ if (fds[i] < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ send_byte(fds[i], 1, 0);
+ /* This is here to make explicit the case of the test failing
+ * due to packet loss.
+ */
+ control_expectln("PKTRECV");
+ }
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ for (i = 0; i < MULTICONN_NFDS; i++)
+ close(fds[i]);
+}
+
+static void test_dgram_multiconn_send_server(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = VMADDR_CID_ANY,
+ },
+ };
+ unsigned long sock_buf_size;
+ int fd;
+ int i;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Set receive buffer to maximum */
+ sock_buf_size = -1;
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
+ &sock_buf_size, sizeof(sock_buf_size))) {
+ perror("setsockopt(SO_RCVBUF)");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Notify the client that the server is ready */
+ control_writeln("BIND");
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ recv_byte(fd, 1, 0);
+ control_writeln("PKTRECV");
+ }
+
+ /* Wait for the client to finish */
+ control_expectln("DONE");
+
+ close(fd);
+}
+
+/*
+ * This test is similar to the seqpacket msg bounds tests, but it is unreliable
+ * because it may also fail in the unlikely case that packets are dropped.
+ */
+static void test_dgram_bounds_unreliable_client(const struct test_opts *opts)
+{
+ unsigned long recv_buf_size;
+ unsigned long *hashes;
+ int page_size;
+ int fd;
+ int i;
+
+ fd = vsock_dgram_connect(opts->peer_cid, 1234);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ hashes = malloc(DGRAM_MSG_CNT * sizeof(unsigned long));
+ if (!hashes) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Let the server know the client is ready */
+ control_writeln("CLNTREADY");
+
+ /* Wait until the receiver sets the buffer size. */
+ control_expectln("SRVREADY");
+
+ recv_buf_size = control_readulong();
+
+ page_size = getpagesize();
+
+ for (i = 0; i < DGRAM_MSG_CNT; i++) {
+ ssize_t send_size;
+ size_t buf_size;
+ void *buf;
+
+ /* Use "small" buffers and "big" buffers. */
+ if (opts->peer_cid <= VMADDR_CID_HOST && (i & 1))
+ buf_size = page_size +
+ (rand() % (MAX_MSG_SIZE - page_size));
+ else
+ buf_size = 1 + (rand() % page_size);
+
+ buf_size = min(buf_size, recv_buf_size);
+
+ buf = malloc(buf_size);
+
+ if (!buf) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ memset(buf, rand() & 0xff, buf_size);
+
+ send_size = send(fd, buf, buf_size, 0);
+ if (send_size < 0) {
+ perror("send");
+ exit(EXIT_FAILURE);
+ }
+
+ if (send_size != buf_size) {
+ fprintf(stderr, "Invalid send size\n");
+ exit(EXIT_FAILURE);
+ }
+
+ /* In theory the implementation isn't required to transmit
+ * these packets in order, so we use this PKTSENT/PKTRECV
+ * message sequence so that server and client coordinate
+ * sending and receiving one packet at a time. The client sends
+ * a packet and waits until it has been received before sending
+ * another.
+ *
+ * Also in theory these packets can be lost and the test will
+ * fail for that reason.
+ */
+ control_writeln("PKTSENT");
+ control_expectln("PKTRECV");
+
+ /* Send the server a hash of the packet */
+ hashes[i] = hash_djb2(buf, buf_size);
+ free(buf);
+ }
+
+ control_writeln("SENDDONE");
+ close(fd);
+
+ for (i = 0; i < DGRAM_MSG_CNT; i++) {
+ if (hashes[i] != control_readulong()) {
+ fprintf(stderr, "broken dgram message bounds or packet loss\n");
+ exit(EXIT_FAILURE);
+ }
+ }
+ free(hashes);
+}
+
+static void test_dgram_bounds_unreliable_server(const struct test_opts *opts)
+{
+ unsigned long hashes[DGRAM_MSG_CNT];
+ int sock_buf_size; /* SO_RCVBUF is an int option; getsockopt() only fills sizeof(int) bytes */
+ struct msghdr msg = {0};
+ struct iovec iov = {0};
+ char buf[MAX_MSG_SIZE];
+ socklen_t len;
+ int fd;
+ int i;
+
+ fd = vsock_dgram_bind(VMADDR_CID_ANY, 1234);
+ if (fd < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Set receive buffer to maximum */
+ sock_buf_size = -1;
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
+ &sock_buf_size, sizeof(sock_buf_size))) {
+ perror("setsockopt(SO_RCVBUF)");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Retrieve the receive buffer size */
+ len = sizeof(sock_buf_size);
+ if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF,
+ &sock_buf_size, &len)) {
+ perror("getsockopt(SO_RCVBUF)");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Client ready to receive parameters */
+ control_expectln("CLNTREADY");
+
+ /* Ready to receive data. */
+ control_writeln("SRVREADY");
+
+ if (opts->peer_cid > VMADDR_CID_HOST)
+ control_writeulong(sock_buf_size);
+ else
+ control_writeulong(getpagesize());
+
+ iov.iov_base = buf;
+ iov.iov_len = sizeof(buf);
+ msg.msg_iov = &iov;
+ msg.msg_iovlen = 1;
+
+ for (i = 0; i < DGRAM_MSG_CNT; i++) {
+ ssize_t recv_size;
+
+ control_expectln("PKTSENT");
+ recv_size = recvmsg(fd, &msg, 0);
+ control_writeln("PKTRECV");
+
+ if (!recv_size)
+ break;
+
+ if (recv_size < 0) {
+ perror("recvmsg");
+ exit(EXIT_FAILURE);
+ }
+
+ hashes[i] = hash_djb2(msg.msg_iov[0].iov_base, recv_size);
+ }
+
+ control_expectln("SENDDONE");
+
+ close(fd);
+
+ for (i = 0; i < DGRAM_MSG_CNT; i++)
+ control_writeulong(hashes[i]);
+}
+
+#define POLL_TIMEOUT_MS 1000
+void vsock_recv_error(int fd)
+{
+ struct sock_extended_err *serr;
+ struct msghdr msg = { 0 };
+ struct pollfd fds = { 0 };
+ char cmsg_data[128];
+ struct cmsghdr *cm;
+ ssize_t res;
+
+ fds.fd = fd;
+ fds.events = 0;
+
+ if (poll(&fds, 1, POLL_TIMEOUT_MS) < 0) {
+ perror("poll");
+ exit(EXIT_FAILURE);
+ }
+
+ if (!(fds.revents & POLLERR)) {
+ fprintf(stderr, "POLLERR expected\n");
+ exit(EXIT_FAILURE);
+ }
+
+ msg.msg_control = cmsg_data;
+ msg.msg_controllen = sizeof(cmsg_data);
+
+ res = recvmsg(fd, &msg, MSG_ERRQUEUE);
+ if (res) {
+ fprintf(stderr, "failed to read error queue: %zi\n", res);
+ exit(EXIT_FAILURE);
+ }
+
+ cm = CMSG_FIRSTHDR(&msg);
+ if (!cm) {
+ fprintf(stderr, "cmsg: no cmsg\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (cm->cmsg_level != SOL_VSOCK) {
+ fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (cm->cmsg_type != 0) {
+ fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
+ exit(EXIT_FAILURE);
+ }
+
+ serr = (void *)CMSG_DATA(cm);
+ if (serr->ee_origin != 0) {
+ fprintf(stderr, "serr: unexpected 'ee_origin'\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (serr->ee_errno != EHOSTUNREACH) {
+ fprintf(stderr, "serr: wrong error code: %u\n", serr->ee_errno);
+ exit(EXIT_FAILURE);
+ }
+}
+
+/*
+ * Attempt to send a packet larger than the client's RX buffer. Test that the
+ * packet was dropped and that there is an error in the error queue.
+ */
+static void test_dgram_drop_big_packets_server(const struct test_opts *opts)
+{
+ unsigned long client_rx_buf_size;
+ size_t buf_size;
+ void *buf;
+ int fd;
+
+ if (opts->peer_cid <= VMADDR_CID_HOST) {
+ printf("The server's peer must be a guest (not CID %u), skipped...\n",
+ opts->peer_cid);
+ return;
+ }
+
+ /* Wait for the client to be ready */
+ control_expectln("READY");
+
+ fd = vsock_dgram_connect(opts->peer_cid, 1234);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ client_rx_buf_size = control_readulong();
+
+ buf_size = client_rx_buf_size + 1;
+ buf = malloc(buf_size);
+ if (!buf) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Even though the buffer is exceeded, the send() should still succeed. */
+ if (send(fd, buf, buf_size, 0) < 0) {
+ perror("send");
+ exit(EXIT_FAILURE);
+ }
+
+ vsock_recv_error(fd);
+
+ /* Notify the client that the server has finished */
+ control_writeln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_drop_big_packets_client(const struct test_opts *opts)
+{
+ unsigned long buf_size = getpagesize();
+
+ if (opts->peer_cid > VMADDR_CID_HOST) {
+ printf("The client's peer must be the host (not CID %u), skipped...\n",
+ opts->peer_cid);
+ return;
+ }
+
+ control_writeln("READY");
+ control_writeulong(buf_size);
+ control_expectln("DONE");
+}
+
+static void test_stream_dgram_address_collision_client(const struct test_opts *opts)
+{
+ int dgram_fd, stream_fd;
+
+ stream_fd = vsock_stream_connect(opts->peer_cid, 1234);
+ if (stream_fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ /* This simply tests if connect() causes address collision client-side.
+ * Keep in mind that there is no exchange of packets with the
+ * bound socket on the server.
+ */
+ dgram_fd = vsock_dgram_connect(opts->peer_cid, 1234);
+ if (dgram_fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ close(stream_fd);
+ close(dgram_fd);
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+}
+
+static void test_stream_dgram_address_collision_server(const struct test_opts *opts)
+{
+ int dgram_fd, stream_fd;
+ struct sockaddr_vm addr;
+ socklen_t addrlen;
+
+ stream_fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, 0);
+ if (stream_fd < 0) {
+ perror("accept");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Retrieve the CID/port for re-use. */
+ addrlen = sizeof(addr);
+ if (getsockname(stream_fd, (struct sockaddr *)&addr, &addrlen)) {
+ perror("getsockname");
+ exit(EXIT_FAILURE);
+ }
+
+ /* See the note in the client function about the pairwise connect call. */
+ dgram_fd = vsock_dgram_bind(addr.svm_cid, addr.svm_port);
+ if (dgram_fd < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ control_expectln("DONE");
+
+ close(stream_fd);
+ close(dgram_fd);
+}
+
static struct test_case test_cases[] = {
{
.name = "SOCK_STREAM connection reset",
@@ -1128,6 +1773,41 @@ static struct test_case test_cases[] = {
.run_client = test_stream_virtio_skb_merge_client,
.run_server = test_stream_virtio_skb_merge_server,
},
+ {
+ .name = "SOCK_DGRAM client sendto",
+ .run_client = test_dgram_sendto_client,
+ .run_server = test_dgram_sendto_server,
+ },
+ {
+ .name = "SOCK_DGRAM client connect",
+ .run_client = test_dgram_connect_client,
+ .run_server = test_dgram_connect_server,
+ },
+ {
+ .name = "SOCK_DGRAM multiple connections using sendto",
+ .run_client = test_dgram_multiconn_sendto_client,
+ .run_server = test_dgram_multiconn_sendto_server,
+ },
+ {
+ .name = "SOCK_DGRAM multiple connections using send",
+ .run_client = test_dgram_multiconn_send_client,
+ .run_server = test_dgram_multiconn_send_server,
+ },
+ {
+ .name = "SOCK_DGRAM msg bounds unreliable",
+ .run_client = test_dgram_bounds_unreliable_client,
+ .run_server = test_dgram_bounds_unreliable_server,
+ },
+ {
+ .name = "SOCK_DGRAM drop big packets",
+ .run_client = test_dgram_drop_big_packets_client,
+ .run_server = test_dgram_drop_big_packets_server,
+ },
+ {
+ .name = "SOCK_STREAM and SOCK_DGRAM address collision",
+ .run_client = test_stream_dgram_address_collision_client,
+ .run_server = test_stream_dgram_address_collision_server,
+ },
{},
};
--
2.30.2
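The message-bounds test above compares datagrams by hashing each payload with hash_djb2() on both sides and exchanging the hashes over the control channel. For readers who do not have the test utilities at hand, here is a minimal sketch of a djb2-style hash. The name sketch_hash_djb2 is hypothetical, and the exact body of the helper in tools/testing/vsock/util.c is assumed, not quoted from this series.

```c
#include <stddef.h>

/* Hypothetical stand-in for the test suite's hash_djb2() helper.
 * Classic djb2: start at 5381, then hash = hash * 33 + byte.
 */
unsigned long sketch_hash_djb2(const void *data, size_t len)
{
	const unsigned char *p = data;
	unsigned long hash = 5381;
	size_t i;

	for (i = 0; i < len; i++)
		hash = ((hash << 5) + hash) + p[i];	/* hash * 33 + byte */

	return hash;
}
```

Because the hash depends on byte order and length, a datagram that was merged with a neighbour, truncated, or reordered should produce a different value on the receiving side, which is what the test relies on.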
This commit adds the common datagram receive functionality for virtio
transports. It does not add the vhost/virtio users of that
functionality.
This functionality includes:
- changes to the virtio_transport_recv_pkt() path for finding the
bound socket receiver for incoming packets.
- a virtio_transport_dgram_addr_init() function to be used as the
->dgram_addr_init callback for initializing sockaddr_vm inside
the generic recvmsg() caller.
Signed-off-by: Bobby Eshleman <[email protected]>
---
include/linux/virtio_vsock.h | 2 +
net/vmw_vsock/virtio_transport_common.c | 92 ++++++++++++++++++++++++++++-----
2 files changed, 81 insertions(+), 13 deletions(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 7632552bee58..b3856b8a42b3 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -212,6 +212,8 @@ u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
bool virtio_transport_stream_allow(u32 cid, u32 port);
bool virtio_transport_dgram_allow(u32 cid, u32 port);
+void virtio_transport_dgram_addr_init(struct sk_buff *skb,
+ struct sockaddr_vm *addr);
int virtio_transport_connect(struct vsock_sock *vsk);
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 3bfaff758433..96118e258097 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -183,7 +183,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
static u16 virtio_transport_get_type(struct sock *sk)
{
- if (sk->sk_type == SOCK_STREAM)
+ if (sk->sk_type == SOCK_DGRAM)
+ return VIRTIO_VSOCK_TYPE_DGRAM;
+ else if (sk->sk_type == SOCK_STREAM)
return VIRTIO_VSOCK_TYPE_STREAM;
else
return VIRTIO_VSOCK_TYPE_SEQPACKET;
@@ -1184,6 +1186,35 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
kfree_skb(skb);
}
+static void
+virtio_transport_dgram_kfree_skb(struct sk_buff *skb, int err)
+{
+ if (err == -ENOMEM)
+ kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_RCVBUFF);
+ else if (err == -ENOBUFS)
+ kfree_skb_reason(skb, SKB_DROP_REASON_PROTO_MEM);
+ else
+ kfree_skb(skb);
+}
+
+/* This function takes ownership of the skb.
+ *
+ * It either places the skb on the sk_receive_queue or frees it.
+ */
+static void
+virtio_transport_recv_dgram(struct sock *sk, struct sk_buff *skb)
+{
+ int err;
+
+ err = sock_queue_rcv_skb(sk, skb);
+ if (err) {
+ virtio_transport_dgram_kfree_skb(skb, err);
+ return;
+ }
+
+ sk->sk_data_ready(sk);
+}
+
static int
virtio_transport_recv_connected(struct sock *sk,
struct sk_buff *skb)
@@ -1347,7 +1378,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
static bool virtio_transport_valid_type(u16 type)
{
return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
- (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
+ (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
+ (type == VIRTIO_VSOCK_TYPE_DGRAM);
}
/* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
@@ -1361,40 +1393,52 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
struct vsock_sock *vsk;
struct sock *sk;
bool space_available;
+ u16 type;
vsock_addr_init(&src, le64_to_cpu(hdr->src_cid),
le32_to_cpu(hdr->src_port));
vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid),
le32_to_cpu(hdr->dst_port));
+ type = le16_to_cpu(hdr->type);
+
trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port,
dst.svm_cid, dst.svm_port,
le32_to_cpu(hdr->len),
- le16_to_cpu(hdr->type),
+ type,
le16_to_cpu(hdr->op),
le32_to_cpu(hdr->flags),
le32_to_cpu(hdr->buf_alloc),
le32_to_cpu(hdr->fwd_cnt));
- if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
+ if (!virtio_transport_valid_type(type)) {
(void)virtio_transport_reset_no_sock(t, skb);
goto free_pkt;
}
- /* The socket must be in connected or bound table
- * otherwise send reset back
+ /* For stream/seqpacket, the socket must be in connected or bound table
+ * otherwise send reset back.
+ *
+ * For datagrams, no reset is sent back.
*/
sk = vsock_find_connected_socket(&src, &dst);
if (!sk) {
- sk = vsock_find_bound_socket(&dst);
- if (!sk) {
- (void)virtio_transport_reset_no_sock(t, skb);
- goto free_pkt;
+ if (type == VIRTIO_VSOCK_TYPE_DGRAM) {
+ sk = vsock_find_bound_dgram_socket(&dst);
+ if (!sk)
+ goto free_pkt;
+ } else {
+ sk = vsock_find_bound_socket(&dst);
+ if (!sk) {
+ (void)virtio_transport_reset_no_sock(t, skb);
+ goto free_pkt;
+ }
}
}
- if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) {
- (void)virtio_transport_reset_no_sock(t, skb);
+ if (virtio_transport_get_type(sk) != type) {
+ if (type != VIRTIO_VSOCK_TYPE_DGRAM)
+ (void)virtio_transport_reset_no_sock(t, skb);
sock_put(sk);
goto free_pkt;
}
@@ -1410,12 +1454,18 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
/* Check if sk has been closed before lock_sock */
if (sock_flag(sk, SOCK_DONE)) {
- (void)virtio_transport_reset_no_sock(t, skb);
+ if (type != VIRTIO_VSOCK_TYPE_DGRAM)
+ (void)virtio_transport_reset_no_sock(t, skb);
release_sock(sk);
sock_put(sk);
goto free_pkt;
}
+ if (sk->sk_type == SOCK_DGRAM) {
+ virtio_transport_recv_dgram(sk, skb);
+ goto out;
+ }
+
space_available = virtio_transport_space_update(sk, skb);
/* Update CID in case it has changed after a transport reset event */
@@ -1447,6 +1497,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
break;
}
+out:
release_sock(sk);
/* Release refcnt obtained when we fetched this socket out of the
@@ -1515,6 +1566,21 @@ int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t recv_acto
}
EXPORT_SYMBOL_GPL(virtio_transport_read_skb);
+void virtio_transport_dgram_addr_init(struct sk_buff *skb,
+ struct sockaddr_vm *addr)
+{
+ struct virtio_vsock_hdr *hdr;
+ unsigned int cid, port;
+
+ WARN_ONCE(skb->head == skb->data, "virtio vsock bug: bad dgram skb");
+
+ hdr = virtio_vsock_hdr(skb);
+ cid = le64_to_cpu(hdr->src_cid);
+ port = le32_to_cpu(hdr->src_port);
+ vsock_addr_init(addr, cid, port);
+}
+EXPORT_SYMBOL_GPL(virtio_transport_dgram_addr_init);
+
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Asias He");
MODULE_DESCRIPTION("common code for virtio vsock");
--
2.30.2
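The receive-path changes in the patch above boil down to one rule: wherever a stream or seqpacket packet would be answered with a reset, a datagram is dropped silently. The following userspace sketch models only that decision. All sketch_* names are hypothetical, and the socket lookup is reduced to a boolean, so this is an illustration of the control flow as I read the diff, not kernel code.

```c
#include <stdbool.h>

enum sketch_type {
	SKETCH_STREAM = 1,
	SKETCH_SEQPACKET = 2,
	SKETCH_DGRAM = 3,
};

enum sketch_action {
	SKETCH_DELIVER,		/* hand the packet to the socket */
	SKETCH_DROP_SILENT,	/* free the packet, send nothing back */
	SKETCH_DROP_RESET,	/* free the packet and send a reset */
};

/* found:     a socket was located in the connected table or in the
 *            bound table that matches the packet type
 * sock_type: type of that socket (ignored when !found)
 * sock_done: the socket was already closed
 */
enum sketch_action sketch_rx_action(int pkt_type, bool found,
				    int sock_type, bool sock_done)
{
	bool dgram = (pkt_type == SKETCH_DGRAM);

	/* Unknown packet types are always answered with a reset. */
	if (pkt_type < SKETCH_STREAM || pkt_type > SKETCH_DGRAM)
		return SKETCH_DROP_RESET;

	/* No socket, wrong socket type, or closed socket: drop. Only
	 * connectible packet types trigger a reset.
	 */
	if (!found || sock_type != pkt_type || sock_done)
		return dgram ? SKETCH_DROP_SILENT : SKETCH_DROP_RESET;

	return SKETCH_DELIVER;
}
```

For example, sketch_rx_action(SKETCH_DGRAM, false, 0, false) yields SKETCH_DROP_SILENT, while the same call with SKETCH_STREAM yields SKETCH_DROP_RESET.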
This commit adds the datagram packet type for inclusion in virtio vsock
packet headers. It is included here as a standalone commit because
multiple future but distinct commits depend on it.
Signed-off-by: Bobby Eshleman <[email protected]>
---
include/uapi/linux/virtio_vsock.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index 64738838bee5..331be28b1d30 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -69,6 +69,7 @@ struct virtio_vsock_hdr {
enum virtio_vsock_type {
VIRTIO_VSOCK_TYPE_STREAM = 1,
VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
+ VIRTIO_VSOCK_TYPE_DGRAM = 3,
};
enum virtio_vsock_op {
--
2.30.2
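The new enum value travels in the 16-bit little-endian type field of the packet header, and the receive side rejects values it does not know. A small userspace sketch of that round trip follows. The sketch_* helpers are hypothetical stand-ins for cpu_to_le16()/le16_to_cpu() and virtio_transport_valid_type(), written byte by byte so that they behave the same on any host endianness.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
	SKETCH_TYPE_STREAM = 1,
	SKETCH_TYPE_SEQPACKET = 2,
	SKETCH_TYPE_DGRAM = 3,
};

/* Store a 16-bit value in little-endian byte order. */
void sketch_put_le16(uint8_t out[2], uint16_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)(v >> 8);
}

/* Load a 16-bit little-endian value. */
uint16_t sketch_get_le16(const uint8_t in[2])
{
	return (uint16_t)(in[0] | (in[1] << 8));
}

/* Accept only the three packet types known after this patch. */
bool sketch_valid_type(uint16_t type)
{
	return type == SKETCH_TYPE_STREAM ||
	       type == SKETCH_TYPE_SEQPACKET ||
	       type == SKETCH_TYPE_DGRAM;
}
```

Encoding SKETCH_TYPE_DGRAM gives the bytes 0x03 0x00 on the wire. A peer that predates this patch would see type 3 as invalid, which is presumably why the series also adds a feature bit before datagrams are sent.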
This commit adds vsock_find_bound_dgram_socket() which allows transports
to find bound dgram sockets in the global dgram bind table. It is
intended to be used for "routing" incoming packets to the correct
sockets if the transport uses the global bind table.
Signed-off-by: Bobby Eshleman <[email protected]>
---
include/net/af_vsock.h | 1 +
net/vmw_vsock/af_vsock.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index f6a0ca9d7c3e..ae6b6cdf6a4d 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -215,6 +215,7 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
void (*fn)(struct sock *sk));
int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
bool vsock_find_cid(unsigned int cid);
+struct sock *vsock_find_bound_dgram_socket(struct sockaddr_vm *addr);
/**** TAP ****/
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 0895f4c1d340..e73f3b2c52f1 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -264,6 +264,22 @@ static struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
return NULL;
}
+struct sock *
+vsock_find_bound_dgram_socket(struct sockaddr_vm *addr)
+{
+ struct sock *sk;
+
+ spin_lock_bh(&vsock_dgram_table_lock);
+ sk = vsock_find_bound_socket_common(addr, vsock_bound_dgram_sockets(addr));
+ if (sk)
+ sock_hold(sk);
+
+ spin_unlock_bh(&vsock_dgram_table_lock);
+
+ return sk;
+}
+EXPORT_SYMBOL_GPL(vsock_find_bound_dgram_socket);
+
static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
{
return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
--
2.30.2
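To make the "routing" idea concrete, here is a userspace sketch of a bound-socket lookup that returns the match with an extra reference held. Everything here is an assumption made for illustration: the sketch_* names, the flat array in place of the hashed bind table, the matching rule (same port, and either the same CID or a wildcard CID on one side), and the omission of locking. The real function does the lookup under vsock_dgram_table_lock and takes the reference with sock_hold().

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CID_ANY 0xffffffffU	/* mirrors VMADDR_CID_ANY (-1U) */

struct sketch_sock {
	uint32_t cid;
	uint32_t port;
	int refcnt;
};

/* Return the first socket bound to (cid, port), or NULL. The caller
 * owns one reference on the returned socket and must drop it later.
 */
struct sketch_sock *sketch_find_bound(struct sketch_sock *table, size_t n,
				      uint32_t cid, uint32_t port)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct sketch_sock *s = &table[i];

		if (s->port != port)
			continue;

		if (s->cid == cid || s->cid == SKETCH_CID_ANY ||
		    cid == SKETCH_CID_ANY) {
			s->refcnt++;	/* stands in for sock_hold() */
			return s;
		}
	}

	return NULL;
}
```

Taking the reference before the lock is released is what lets the caller keep using the socket after the lookup; the receive path drops it again with sock_put() once the packet has been handled.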
On 19.07.2023 03:50, Bobby Eshleman wrote:
> This commit implements the common function
> virtio_transport_dgram_enqueue for enqueueing datagrams. It does not add
> usage in either vhost or virtio yet.
>
> Signed-off-by: Bobby Eshleman <[email protected]>
> ---
> net/vmw_vsock/virtio_transport_common.c | 76 ++++++++++++++++++++++++++++++++-
> 1 file changed, 75 insertions(+), 1 deletion(-)
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index ffcbdd77feaa..3bfaff758433 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -819,7 +819,81 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> struct msghdr *msg,
> size_t dgram_len)
> {
> - return -EOPNOTSUPP;
> + /* Here we are only using the info struct to retain style uniformity
> + * and to ease future refactoring and merging.
> + */
> + struct virtio_vsock_pkt_info info_stack = {
> + .op = VIRTIO_VSOCK_OP_RW,
> + .msg = msg,
> + .vsk = vsk,
> + .type = VIRTIO_VSOCK_TYPE_DGRAM,
> + };
> + const struct virtio_transport *t_ops;
> + struct virtio_vsock_pkt_info *info;
> + struct sock *sk = sk_vsock(vsk);
> + struct virtio_vsock_hdr *hdr;
> + u32 src_cid, src_port;
> + struct sk_buff *skb;
> + void *payload;
> + int noblock;
> + int err;
> +
> + info = &info_stack;
I think the 'info' assignment could be moved below, to the place where it is
first used.
> +
> + if (dgram_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> + return -EMSGSIZE;
> +
> + t_ops = virtio_transport_get_ops(vsk);
> + if (unlikely(!t_ops))
> + return -EFAULT;
> +
> + /* Unlike some of our other sending functions, this function is not
> + * intended for use without a msghdr.
> + */
> + if (WARN_ONCE(!msg, "vsock dgram bug: no msghdr found for dgram enqueue\n"))
> + return -EFAULT;
Sorry, but is that possible? I thought 'msg' is always provided by the general socket layer (e.g. before
the af_vsock.c code) and can't be NULL for DGRAM. Please correct me if I'm wrong.
Also, I see that in af_vsock.c 'vsock_dgram_sendmsg()' dereferences 'msg' to check MSG_OOB without any
checks (before calling the transport callback, which is this function in the virtio case). So I think that if we want to keep
this type of check, it must be placed in af_vsock.c or somewhere before the first dereference of this pointer.
> +
> + noblock = msg->msg_flags & MSG_DONTWAIT;
> +
> + /* Use sock_alloc_send_skb to throttle by sk_sndbuf. This helps avoid
> + * triggering the OOM.
> + */
> + skb = sock_alloc_send_skb(sk, dgram_len + VIRTIO_VSOCK_SKB_HEADROOM,
> + noblock, &err);
> + if (!skb)
> + return err;
> +
> + skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
> +
> + src_cid = t_ops->transport.get_local_cid();
> + src_port = vsk->local_addr.svm_port;
> +
> + hdr = virtio_vsock_hdr(skb);
> + hdr->type = cpu_to_le16(info->type);
> + hdr->op = cpu_to_le16(info->op);
> + hdr->src_cid = cpu_to_le64(src_cid);
> + hdr->dst_cid = cpu_to_le64(remote_addr->svm_cid);
> + hdr->src_port = cpu_to_le32(src_port);
> + hdr->dst_port = cpu_to_le32(remote_addr->svm_port);
> + hdr->flags = cpu_to_le32(info->flags);
> + hdr->len = cpu_to_le32(dgram_len);
> +
> + skb_set_owner_w(skb, sk);
> +
> + payload = skb_put(skb, dgram_len);
> + err = memcpy_from_msg(payload, msg, dgram_len);
> + if (err)
> + return err;
Do we need to free the allocated skb here?
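The pattern behind this question is "free what you allocated on every early return". Below is a userspace sketch of that pattern, not the kernel fix itself: malloc()/free() stand in for the skb allocation and kfree_skb(), sketch_copy() is a hypothetical stand-in for memcpy_from_msg(), and sketch_live_bufs exists only to make a leak observable.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

int sketch_live_bufs;	/* outstanding allocations; non-zero means a leak */

/* Hypothetical copy step that can fail, like memcpy_from_msg(). */
static int sketch_copy(void *dst, const void *src, size_t len)
{
	if (!src)
		return -EFAULT;

	memcpy(dst, src, len);
	return 0;
}

int sketch_enqueue(const void *src, size_t len)
{
	void *buf = malloc(len ? len : 1);
	int err;

	if (!buf)
		return -ENOMEM;
	sketch_live_bufs++;

	err = sketch_copy(buf, src, len);
	if (err) {
		/* Without this free the buffer would leak on error. */
		free(buf);
		sketch_live_bufs--;
		return err;
	}

	/* A real send path would hand the buffer to the transport here;
	 * the sketch simply releases it.
	 */
	free(buf);
	sketch_live_bufs--;
	return 0;
}
```

In the quoted function the equivalent change would be to free the skb before returning the memcpy_from_msg() error.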
> +
> + trace_virtio_transport_alloc_pkt(src_cid, src_port,
> + remote_addr->svm_cid,
> + remote_addr->svm_port,
> + dgram_len,
> + info->type,
> + info->op,
> + 0);
> +
> + return t_ops->send_pkt(skb);
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>
>
Thanks, Arseniy
Hello Bobby!
Thanks for this patchset! I left some comments and will continue reviewing and
testing over the next few days.
Thanks, Arseniy
On 19.07.2023 03:50, Bobby Eshleman wrote:
> This commit implements datagram support for vhost/vsock by teaching
> vhost to use the common virtio transport datagram functions.
>
> If the virtio RX buffer is too small, then the transmission is
> abandoned, the packet dropped, and EHOSTUNREACH is added to the socket's
> error queue.
>
> Signed-off-by: Bobby Eshleman <[email protected]>
> ---
> drivers/vhost/vsock.c | 62 +++++++++++++++++++++++++++++++++++++++++++++---
> net/vmw_vsock/af_vsock.c | 5 +++-
> 2 files changed, 63 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index d5d6a3c3f273..da14260c6654 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -8,6 +8,7 @@
> */
> #include <linux/miscdevice.h>
> #include <linux/atomic.h>
> +#include <linux/errqueue.h>
> #include <linux/module.h>
> #include <linux/mutex.h>
> #include <linux/vmalloc.h>
> @@ -32,7 +33,8 @@
> enum {
> VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> - (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
> + (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
> + (1ULL << VIRTIO_VSOCK_F_DGRAM)
> };
>
> enum {
> @@ -56,6 +58,7 @@ struct vhost_vsock {
> atomic_t queued_replies;
>
> u32 guest_cid;
> + bool dgram_allow;
> bool seqpacket_allow;
> };
>
> @@ -86,6 +89,32 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
> return NULL;
> }
>
> +/* Claims ownership of the skb, do not free the skb after calling! */
> +static void
> +vhost_transport_error(struct sk_buff *skb, int err)
> +{
> + struct sock_exterr_skb *serr;
> + struct sock *sk = skb->sk;
> + struct sk_buff *clone;
> +
> + serr = SKB_EXT_ERR(skb);
> + memset(serr, 0, sizeof(*serr));
> + serr->ee.ee_errno = err;
> + serr->ee.ee_origin = SO_EE_ORIGIN_NONE;
> +
> + clone = skb_clone(skb, GFP_KERNEL);
Maybe for the skb that carries the error we can use 'sock_omalloc()' instead of 'skb_clone()'? TCP uses skbs
allocated by this function as carriers of the error structure. I guess 'skb_clone()' also clones the data of the original,
but I think there is no need for the data, as we insert the skb into the error queue of the socket.
What do you think?
> + if (!clone)
> + return;
What will happen here if 'clone' is NULL? Will the skb leak, since it was already removed from the queue?
> +
> + if (sock_queue_err_skb(sk, clone))
> + kfree_skb(clone);
> +
> + sk->sk_err = err;
> + sk_error_report(sk);
> +
> + kfree_skb(skb);
> +}
> +
> static void
> vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> struct vhost_virtqueue *vq)
> @@ -160,9 +189,15 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> hdr = virtio_vsock_hdr(skb);
>
> /* If the packet is greater than the space available in the
> - * buffer, we split it using multiple buffers.
> + * buffer, we split it using multiple buffers for connectible
> + * sockets and drop the packet for datagram sockets.
> */
> if (payload_len > iov_len - sizeof(*hdr)) {
> + if (le16_to_cpu(hdr->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
> + vhost_transport_error(skb, EHOSTUNREACH);
> + continue;
> + }
> +
> payload_len = iov_len - sizeof(*hdr);
>
> /* As we are copying pieces of large packet's buffer to
> @@ -394,6 +429,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
> return val < vq->num;
> }
>
> +static bool vhost_transport_dgram_allow(u32 cid, u32 port);
> static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>
> static struct virtio_transport vhost_transport = {
> @@ -410,7 +446,8 @@ static struct virtio_transport vhost_transport = {
> .cancel_pkt = vhost_transport_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> - .dgram_allow = virtio_transport_dgram_allow,
> + .dgram_allow = vhost_transport_dgram_allow,
> + .dgram_addr_init = virtio_transport_dgram_addr_init,
>
> .stream_enqueue = virtio_transport_stream_enqueue,
> .stream_dequeue = virtio_transport_stream_dequeue,
> @@ -443,6 +480,22 @@ static struct virtio_transport vhost_transport = {
> .send_pkt = vhost_transport_send_pkt,
> };
>
> +static bool vhost_transport_dgram_allow(u32 cid, u32 port)
> +{
> + struct vhost_vsock *vsock;
> + bool dgram_allow = false;
> +
> + rcu_read_lock();
> + vsock = vhost_vsock_get(cid);
> +
> + if (vsock)
> + dgram_allow = vsock->dgram_allow;
> +
> + rcu_read_unlock();
> +
> + return dgram_allow;
> +}
> +
> static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> {
> struct vhost_vsock *vsock;
> @@ -799,6 +852,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
> vsock->seqpacket_allow = true;
>
> + if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
> + vsock->dgram_allow = true;
> +
> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> vq = &vsock->vqs[i];
> mutex_lock(&vq->mutex);
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index e73f3b2c52f1..449ed63ac2b0 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -1427,9 +1427,12 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> return prot->recvmsg(sk, msg, len, flags, NULL);
> #endif
>
> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> + if (unlikely(flags & MSG_OOB))
> return -EOPNOTSUPP;
>
> + if (unlikely(flags & MSG_ERRQUEUE))
> + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
> +
Sorry, but I get a build error here because SOL_VSOCK is undefined. I think it should be added to
include/linux/socket.h and also to the uapi files for future use in userspace.
Also, Stefano Garzarella suggested adding a define like VSOCK_RECVERR, in the same way as
IP_RECVERR, and using it as the last parameter of 'sock_recv_errqueue()'.
> transport = vsk->transport;
>
> /* Retrieve the head sk_buff from the socket's receive queue. */
>
Thanks, Arseniy
On 19.07.2023 03:50, Bobby Eshleman wrote:
> This commit implements datagram support for virtio/vsock by teaching
> virtio to use the general virtio transport ->dgram_addr_init() function
> and implementation a new version of ->dgram_allow().
>
> Additionally, it drops virtio_transport_dgram_allow() as an exported
> symbol because it is no longer used in other transports.
>
> Signed-off-by: Bobby Eshleman <[email protected]>
> ---
> include/linux/virtio_vsock.h | 1 -
> net/vmw_vsock/virtio_transport.c | 24 +++++++++++++++++++++++-
> net/vmw_vsock/virtio_transport_common.c | 6 ------
> 3 files changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index b3856b8a42b3..d0a4f08b12c1 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -211,7 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> bool virtio_transport_stream_allow(u32 cid, u32 port);
> -bool virtio_transport_dgram_allow(u32 cid, u32 port);
> void virtio_transport_dgram_addr_init(struct sk_buff *skb,
> struct sockaddr_vm *addr);
>
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index ac2126c7dac5..713718861bd4 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -63,6 +63,7 @@ struct virtio_vsock {
>
> u32 guest_cid;
> bool seqpacket_allow;
> + bool dgram_allow;
> };
>
> static u32 virtio_transport_get_local_cid(void)
> @@ -413,6 +414,7 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> }
>
> +static bool virtio_transport_dgram_allow(u32 cid, u32 port);
Maybe add the body here, without the prototype? Same for loopback and vhost.
> static bool virtio_transport_seqpacket_allow(u32 remote_cid);
>
> static struct virtio_transport virtio_transport = {
> @@ -430,6 +432,7 @@ static struct virtio_transport virtio_transport = {
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
> + .dgram_addr_init = virtio_transport_dgram_addr_init,
>
> .stream_dequeue = virtio_transport_stream_dequeue,
> .stream_enqueue = virtio_transport_stream_enqueue,
> @@ -462,6 +465,21 @@ static struct virtio_transport virtio_transport = {
> .send_pkt = virtio_transport_send_pkt,
> };
>
> +static bool virtio_transport_dgram_allow(u32 cid, u32 port)
> +{
> + struct virtio_vsock *vsock;
> + bool dgram_allow;
> +
> + dgram_allow = false;
> + rcu_read_lock();
> + vsock = rcu_dereference(the_virtio_vsock);
> + if (vsock)
> + dgram_allow = vsock->dgram_allow;
> + rcu_read_unlock();
> +
> + return dgram_allow;
> +}
> +
> static bool virtio_transport_seqpacket_allow(u32 remote_cid)
> {
> struct virtio_vsock *vsock;
> @@ -655,6 +673,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
> vsock->seqpacket_allow = true;
>
> + if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
> + vsock->dgram_allow = true;
> +
> vdev->priv = vsock;
>
> ret = virtio_vsock_vqs_init(vsock);
> @@ -747,7 +768,8 @@ static struct virtio_device_id id_table[] = {
> };
>
> static unsigned int features[] = {
> - VIRTIO_VSOCK_F_SEQPACKET
> + VIRTIO_VSOCK_F_SEQPACKET,
> + VIRTIO_VSOCK_F_DGRAM
> };
>
> static struct virtio_driver virtio_vsock_driver = {
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 96118e258097..77898f5325cd 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -783,12 +783,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
>
> -bool virtio_transport_dgram_allow(u32 cid, u32 port)
> -{
> - return false;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
> -
> int virtio_transport_connect(struct vsock_sock *vsk)
> {
> struct virtio_vsock_pkt_info info = {
>
Thanks, Arseniy
On 19.07.2023 03:50, Bobby Eshleman wrote:
> This patch adds support for multi-transport datagrams.
>
> This includes:
> - Per-packet lookup of transports when using sendto(sockaddr_vm)
> - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> sockaddr_vm
> - rename VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
> - connect() now assigns the transport for (similar to connectible
> sockets)
>
> To preserve backwards compatibility with VMCI, some important changes
> are made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
> be used for dgrams only if there is not yet a g2h or h2g transport that
> has been registered that can transmit the packet. If there is a g2h/h2g
> transport for that remote address, then that transport will be used and
> not "transport_dgram". This essentially makes "transport_dgram" a
> fallback transport for when h2g/g2h has not yet gone online, and so it
> is renamed "transport_dgram_fallback". VMCI implements this transport.
>
> The logic around "transport_dgram" needs to be retained to prevent
> breaking VMCI:
>
> 1) VMCI datagrams existed prior to h2g/g2h and so operate under a
> different paradigm. When the vmci transport comes online, it registers
> itself with the DGRAM feature, but not H2G/G2H. Only later when the
> transport has more information about its environment does it register
> H2G or G2H. In the case that a datagram socket is created after
> VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
> the "transport_dgram" transport is the only registered transport and so
> needs to be used.
>
> 2) VMCI seems to require a special message be sent by the transport when a
> datagram socket calls bind(). Under the h2g/g2h model, the transport
> is selected using the remote_addr which is set by connect(). At
> bind time there is no remote_addr because often no connect() has been
> called yet: the transport is null. Therefore, with a null transport
> there doesn't seem to be any good way for a datagram socket to tell the
> VMCI transport that it has just had bind() called upon it.
>
> With the new fallback logic, after H2G/G2H comes online the socket layer
> will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
> coming online, the socket layer will access the VMCI transport via
> "transport_dgram_fallback".
>
> Only transports with a special datagram fallback use-case such as VMCI
> need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
>
> Signed-off-by: Bobby Eshleman <[email protected]>
> ---
> drivers/vhost/vsock.c | 1 -
> include/linux/virtio_vsock.h | 2 --
> include/net/af_vsock.h | 10 +++---
> net/vmw_vsock/af_vsock.c | 64 ++++++++++++++++++++++++++-------
> net/vmw_vsock/hyperv_transport.c | 6 ----
> net/vmw_vsock/virtio_transport.c | 1 -
> net/vmw_vsock/virtio_transport_common.c | 7 ----
> net/vmw_vsock/vmci_transport.c | 2 +-
> net/vmw_vsock/vsock_loopback.c | 1 -
> 9 files changed, 58 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index ae8891598a48..d5d6a3c3f273 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> .cancel_pkt = vhost_transport_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> - .dgram_bind = virtio_transport_dgram_bind,
> .dgram_allow = virtio_transport_dgram_allow,
>
> .stream_enqueue = virtio_transport_stream_enqueue,
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 18cbe8d37fca..7632552bee58 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -211,8 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> bool virtio_transport_stream_allow(u32 cid, u32 port);
> -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> - struct sockaddr_vm *addr);
> bool virtio_transport_dgram_allow(u32 cid, u32 port);
>
> int virtio_transport_connect(struct vsock_sock *vsk);
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index 305d57502e89..f6a0ca9d7c3e 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
>
> /* Transport features flags */
> /* Transport provides host->guest communication */
> -#define VSOCK_TRANSPORT_F_H2G 0x00000001
> +#define VSOCK_TRANSPORT_F_H2G 0x00000001
> /* Transport provides guest->host communication */
> -#define VSOCK_TRANSPORT_F_G2H 0x00000002
> -/* Transport provides DGRAM communication */
> -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
> +#define VSOCK_TRANSPORT_F_G2H 0x00000002
> +/* Transport provides fallback for DGRAM communication */
> +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
> /* Transport provides local (loopback) communication */
> -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
>
> struct vsock_transport {
> struct module *module;
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index ae5ac5531d96..26c97b33d55a 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -139,8 +139,8 @@ struct proto vsock_proto = {
> static const struct vsock_transport *transport_h2g;
> /* Transport used for guest->host communication */
> static const struct vsock_transport *transport_g2h;
> -/* Transport used for DGRAM communication */
> -static const struct vsock_transport *transport_dgram;
> +/* Transport used as a fallback for DGRAM communication */
> +static const struct vsock_transport *transport_dgram_fallback;
> /* Transport used for local communication */
> static const struct vsock_transport *transport_local;
> static DEFINE_MUTEX(vsock_register_mutex);
> @@ -439,6 +439,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> return transport;
> }
>
> +static const struct vsock_transport *
> +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> +{
> + const struct vsock_transport *transport;
> +
> + transport = vsock_connectible_lookup_transport(cid, flags);
> + if (transport)
> + return transport;
> +
> + return transport_dgram_fallback;
> +}
> +
> /* Assign a transport to a socket and call the .init transport callback.
> *
> * Note: for connection oriented socket this must be called when vsk->remote_addr
> @@ -475,7 +487,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>
> switch (sk->sk_type) {
> case SOCK_DGRAM:
> - new_transport = transport_dgram;
> + new_transport = vsock_dgram_lookup_transport(remote_cid,
> + remote_flags);
I'm a little bit confused about this:
1) Let's create a SOCK_DGRAM socket using vsock_create().
2) For SOCK_DGRAM it calls 'vsock_assign_transport()' and we get here with remote_cid == -1.
3) I guess 'vsock_dgram_lookup_transport()' uses the logic from 0002 and returns h2g for such a remote
cid, which is not correct, I think...
Please correct me if I'm wrong.
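To show what I mean, here is a small userspace model (not kernel code). It assumes that the lookup
from 0002 mirrors the H2G/G2H choice that 'vsock_assign_transport()' makes for connectible sockets,
and it leaves out the local/loopback case; all the model_* names are made up for this sketch:

```c
#include <stddef.h>
#include <stdint.h>

#define VMADDR_CID_ANY       ((uint32_t)-1)  /* -1U, as in vm_sockets.h */
#define VMADDR_CID_HOST      2U
#define VMADDR_FLAG_TO_HOST  0x01

/* Stand-in for struct vsock_transport; only the identity matters here. */
struct model_transport {
	const char *name;
};

static const struct model_transport model_h2g = { "h2g" };
static const struct model_transport model_g2h = { "g2h" };
static const struct model_transport model_fallback = { "dgram_fallback" };

/* Registered transports; NULL means "not registered". */
static const struct model_transport *transport_h2g = &model_h2g;
static const struct model_transport *transport_g2h = &model_g2h;
static const struct model_transport *transport_dgram_fallback = &model_fallback;

/* Assumed shape of the connectible lookup: G2H for CIDs up to the host
 * CID, when no H2G transport exists, or when TO_HOST is set; H2G otherwise.
 */
static const struct model_transport *
model_connectible_lookup(uint32_t cid, uint8_t flags)
{
	if (cid <= VMADDR_CID_HOST || !transport_h2g ||
	    (flags & VMADDR_FLAG_TO_HOST))
		return transport_g2h;

	return transport_h2g;
}

/* Same structure as vsock_dgram_lookup_transport() in this patch. */
static const struct model_transport *
model_dgram_lookup(uint32_t cid, uint8_t flags)
{
	const struct model_transport *t;

	t = model_connectible_lookup(cid, flags);
	if (t)
		return t;

	return transport_dgram_fallback;
}
```

Under these assumptions, model_dgram_lookup(VMADDR_CID_ANY, 0) picks h2g whenever an H2G transport
is registered, because -1U is greater than VMADDR_CID_HOST. The fallback is only reached when
neither h2g nor g2h is registered.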
Thanks, Arseniy
> break;
> case SOCK_STREAM:
> case SOCK_SEQPACKET:
> @@ -692,6 +705,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> static int __vsock_bind_dgram(struct vsock_sock *vsk,
> struct sockaddr_vm *addr)
> {
> + if (!vsk->transport || !vsk->transport->dgram_bind)
> + return -EINVAL;
> +
> return vsk->transport->dgram_bind(vsk, addr);
> }
>
> @@ -1162,6 +1178,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> struct vsock_sock *vsk;
> struct sockaddr_vm *remote_addr;
> const struct vsock_transport *transport;
> + bool module_got = false;
>
> if (msg->msg_flags & MSG_OOB)
> return -EOPNOTSUPP;
> @@ -1173,19 +1190,34 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
>
> lock_sock(sk);
>
> - transport = vsk->transport;
> -
> err = vsock_auto_bind(vsk);
> if (err)
> goto out;
>
> -
> /* If the provided message contains an address, use that. Otherwise
> * fall back on the socket's remote handle (if it has been connected).
> */
> if (msg->msg_name &&
> vsock_addr_cast(msg->msg_name, msg->msg_namelen,
> &remote_addr) == 0) {
> + transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
> + remote_addr->svm_flags);
> + if (!transport) {
> + err = -EINVAL;
> + goto out;
> + }
> +
> + if (!try_module_get(transport->module)) {
> + err = -ENODEV;
> + goto out;
> + }
> +
> + /* When looking up a transport dynamically and acquiring a
> + * reference on the module, we need to remember to release the
> + * reference later.
> + */
> + module_got = true;
> +
> /* Ensure this address is of the right type and is a valid
> * destination.
> */
> @@ -1200,6 +1232,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> } else if (sock->state == SS_CONNECTED) {
> remote_addr = &vsk->remote_addr;
>
> + transport = vsk->transport;
> if (remote_addr->svm_cid == VMADDR_CID_ANY)
> remote_addr->svm_cid = transport->get_local_cid();
>
> @@ -1224,6 +1257,8 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
>
> out:
> + if (module_got)
> + module_put(transport->module);
> release_sock(sk);
> return err;
> }
> @@ -1256,13 +1291,18 @@ static int vsock_dgram_connect(struct socket *sock,
> if (err)
> goto out;
>
> + memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> +
> + err = vsock_assign_transport(vsk, NULL);
> + if (err)
> + goto out;
> +
> if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> remote_addr->svm_port)) {
> err = -EINVAL;
> goto out;
> }
>
> - memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> sock->state = SS_CONNECTED;
>
> /* sock map disallows redirection of non-TCP sockets with sk_state !=
> @@ -2487,7 +2527,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
>
> t_h2g = transport_h2g;
> t_g2h = transport_g2h;
> - t_dgram = transport_dgram;
> + t_dgram = transport_dgram_fallback;
> t_local = transport_local;
>
> if (features & VSOCK_TRANSPORT_F_H2G) {
> @@ -2506,7 +2546,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> t_g2h = t;
> }
>
> - if (features & VSOCK_TRANSPORT_F_DGRAM) {
> + if (features & VSOCK_TRANSPORT_F_DGRAM_FALLBACK) {
> if (t_dgram) {
> err = -EBUSY;
> goto err_busy;
> @@ -2524,7 +2564,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
>
> transport_h2g = t_h2g;
> transport_g2h = t_g2h;
> - transport_dgram = t_dgram;
> + transport_dgram_fallback = t_dgram;
> transport_local = t_local;
>
> err_busy:
> @@ -2543,8 +2583,8 @@ void vsock_core_unregister(const struct vsock_transport *t)
> if (transport_g2h == t)
> transport_g2h = NULL;
>
> - if (transport_dgram == t)
> - transport_dgram = NULL;
> + if (transport_dgram_fallback == t)
> + transport_dgram_fallback = NULL;
>
> if (transport_local == t)
> transport_local = NULL;
> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> index 7f1ea434656d..c29000f2612a 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
> kfree(hvs);
> }
>
> -static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> -{
> - return -EOPNOTSUPP;
> -}
> -
> static int hvs_dgram_enqueue(struct vsock_sock *vsk,
> struct sockaddr_vm *remote, struct msghdr *msg,
> size_t dgram_len)
> @@ -826,7 +821,6 @@ static struct vsock_transport hvs_transport = {
> .connect = hvs_connect,
> .shutdown = hvs_shutdown,
>
> - .dgram_bind = hvs_dgram_bind,
> .dgram_enqueue = hvs_dgram_enqueue,
> .dgram_allow = hvs_dgram_allow,
>
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 66edffdbf303..ac2126c7dac5 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = {
> .shutdown = virtio_transport_shutdown,
> .cancel_pkt = virtio_transport_cancel_pkt,
>
> - .dgram_bind = virtio_transport_dgram_bind,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 01ea1402ad40..ffcbdd77feaa 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -781,13 +781,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
>
> -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> - struct sockaddr_vm *addr)
> -{
> - return -EOPNOTSUPP;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> -
> bool virtio_transport_dgram_allow(u32 cid, u32 port)
> {
> return false;
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index 0bbbdb222245..857b0461f856 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -2072,7 +2072,7 @@ static int __init vmci_transport_init(void)
> /* Register only with dgram feature, other features (H2G, G2H) will be
> * registered when the first host or guest becomes active.
> */
> - err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM);
> + err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM_FALLBACK);
> if (err < 0)
> goto err_unsubscribe;
>
> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> index 2a59dd177c74..278235ea06c4 100644
> --- a/net/vmw_vsock/vsock_loopback.c
> +++ b/net/vmw_vsock/vsock_loopback.c
> @@ -61,7 +61,6 @@ static struct virtio_transport loopback_transport = {
> .shutdown = virtio_transport_shutdown,
> .cancel_pkt = vsock_loopback_cancel_pkt,
>
> - .dgram_bind = virtio_transport_dgram_bind,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
>
>
On 19.07.2023 03:50, Bobby Eshleman wrote:
> This commit drops the transport->dgram_dequeue callback and makes
> vsock_dgram_recvmsg() generic to all transports.
>
> To make this possible, two transport-level changes are introduced:
> - implementation of the ->dgram_addr_init() callback to initialize
> the sockaddr_vm structure with data from incoming socket buffers.
> - transport implementations set the skb->data pointer to the beginning
> of the payload prior to adding the skb to the socket's receive queue.
> That is, they must use skb_pull() before enqueuing. This is an
> agreement between the transport and the socket layer that skb->data
> always points to the beginning of the payload (and not, for example,
> the packet header).
>
> Signed-off-by: Bobby Eshleman <[email protected]>
> ---
> drivers/vhost/vsock.c | 1 -
> include/linux/virtio_vsock.h | 5 ---
> include/net/af_vsock.h | 3 +-
> net/vmw_vsock/af_vsock.c | 40 ++++++++++++++++++++++-
> net/vmw_vsock/hyperv_transport.c | 7 ----
> net/vmw_vsock/virtio_transport.c | 1 -
> net/vmw_vsock/virtio_transport_common.c | 9 -----
> net/vmw_vsock/vmci_transport.c | 58 ++++++---------------------------
> net/vmw_vsock/vsock_loopback.c | 1 -
> 9 files changed, 50 insertions(+), 75 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 6578db78f0ae..ae8891598a48 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> .cancel_pkt = vhost_transport_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> - .dgram_dequeue = virtio_transport_dgram_dequeue,
> .dgram_bind = virtio_transport_dgram_bind,
> .dgram_allow = virtio_transport_dgram_allow,
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index c58453699ee9..18cbe8d37fca 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -167,11 +167,6 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk,
> size_t len,
> int type);
> int
> -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> - struct msghdr *msg,
> - size_t len, int flags);
> -
> -int
> virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> struct msghdr *msg,
> size_t len);
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index 0e7504a42925..305d57502e89 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -120,11 +120,10 @@ struct vsock_transport {
>
> /* DGRAM. */
> int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
> - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> - size_t len, int flags);
> int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> struct msghdr *, size_t len);
> bool (*dgram_allow)(u32 cid, u32 port);
> + void (*dgram_addr_init)(struct sk_buff *skb, struct sockaddr_vm *addr);
>
> /* STREAM. */
> /* TODO: stream_bind() */
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index deb72a8c44a7..ad71e084bf2f 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -1272,11 +1272,15 @@ static int vsock_dgram_connect(struct socket *sock,
> int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> size_t len, int flags)
> {
> + const struct vsock_transport *transport;
> #ifdef CONFIG_BPF_SYSCALL
> const struct proto *prot;
> #endif
> struct vsock_sock *vsk;
> + struct sk_buff *skb;
> + size_t payload_len;
> struct sock *sk;
> + int err;
>
> sk = sock->sk;
> vsk = vsock_sk(sk);
> @@ -1287,7 +1291,41 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> return prot->recvmsg(sk, msg, len, flags, NULL);
> #endif
>
> - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
> + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> + return -EOPNOTSUPP;
> +
> + transport = vsk->transport;
> +
> + /* Retrieve the head sk_buff from the socket's receive queue. */
> + err = 0;
> + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
> + if (!skb)
> + return err;
> +
> + payload_len = skb->len;
> +
> + if (payload_len > len) {
> + payload_len = len;
> + msg->msg_flags |= MSG_TRUNC;
> + }
> +
> + /* Place the datagram payload in the user's iovec. */
> + err = skb_copy_datagram_msg(skb, 0, msg, payload_len);
> + if (err)
> + goto out;
> +
> + if (msg->msg_name) {
> + /* Provide the address of the sender. */
> + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> +
> + transport->dgram_addr_init(skb, vm_addr);
Do we need to check that dgram_addr_init != NULL? I see that not all transports have this
callback set in this patch.
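Something along these lines is what I have in mind, written as a standalone userspace model rather
than kernel code. The model_* types and the choice of -EOPNOTSUPP as the error are my assumptions,
not part of the patch:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for sockaddr_vm and sk_buff. */
struct model_addr {
	uint32_t cid;
	uint32_t port;
};

struct model_skb {
	uint32_t src_cid;
	uint32_t src_port;
};

struct model_transport {
	/* Optional callback: NULL for transports that do not provide it. */
	void (*dgram_addr_init)(const struct model_skb *skb,
				struct model_addr *addr);
};

/* Example callback that copies the sender address out of the packet. */
static void model_addr_init(const struct model_skb *skb,
			    struct model_addr *addr)
{
	addr->cid = skb->src_cid;
	addr->port = skb->src_port;
}

/* Fill in the sender address, failing cleanly instead of calling through
 * a NULL function pointer.
 */
static int model_fill_sender(const struct model_transport *t,
			     const struct model_skb *skb,
			     struct model_addr *addr)
{
	if (!t || !t->dgram_addr_init)
		return -EOPNOTSUPP;

	t->dgram_addr_init(skb, addr);
	return 0;
}
```

The same kind of guard was added for ->dgram_bind in '__vsock_bind_dgram()' by the multi-transport
patch, so it would at least be consistent.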
> + msg->msg_namelen = sizeof(*vm_addr);
> + }
> + err = payload_len;
> +
> +out:
> + skb_free_datagram(&vsk->sk, skb);
> + return err;
> }
> EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
>
> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> index 7cb1a9d2cdb4..7f1ea434656d 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -556,12 +556,6 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> return -EOPNOTSUPP;
> }
>
> -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
> - size_t len, int flags)
> -{
> - return -EOPNOTSUPP;
> -}
> -
> static int hvs_dgram_enqueue(struct vsock_sock *vsk,
> struct sockaddr_vm *remote, struct msghdr *msg,
> size_t dgram_len)
> @@ -833,7 +827,6 @@ static struct vsock_transport hvs_transport = {
> .shutdown = hvs_shutdown,
>
> .dgram_bind = hvs_dgram_bind,
> - .dgram_dequeue = hvs_dgram_dequeue,
> .dgram_enqueue = hvs_dgram_enqueue,
> .dgram_allow = hvs_dgram_allow,
>
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index e95df847176b..66edffdbf303 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -429,7 +429,6 @@ static struct virtio_transport virtio_transport = {
> .cancel_pkt = virtio_transport_cancel_pkt,
>
> .dgram_bind = virtio_transport_dgram_bind,
> - .dgram_dequeue = virtio_transport_dgram_dequeue,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index b769fc258931..01ea1402ad40 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -583,15 +583,6 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> }
> EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_enqueue);
>
> -int
> -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> - struct msghdr *msg,
> - size_t len, int flags)
> -{
> - return -EOPNOTSUPP;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
> -
> s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
> {
> struct virtio_vsock_sock *vvs = vsk->trans;
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index b370070194fa..0bbbdb222245 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -641,6 +641,7 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
> sock_hold(sk);
> skb_put(skb, size);
> memcpy(skb->data, dg, size);
> + skb_pull(skb, VMCI_DG_HEADERSIZE);
> sk_receive_skb(sk, skb, 0);
>
> return VMCI_SUCCESS;
> @@ -1731,57 +1732,18 @@ static int vmci_transport_dgram_enqueue(
> return err - sizeof(*dg);
> }
>
> -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
> - struct msghdr *msg, size_t len,
> - int flags)
> +static void vmci_transport_dgram_addr_init(struct sk_buff *skb,
> + struct sockaddr_vm *addr)
> {
> - int err;
> struct vmci_datagram *dg;
> - size_t payload_len;
> - struct sk_buff *skb;
> -
> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> - return -EOPNOTSUPP;
> -
> - /* Retrieve the head sk_buff from the socket's receive queue. */
> - err = 0;
> - skb = skb_recv_datagram(&vsk->sk, flags, &err);
> - if (!skb)
> - return err;
> -
> - dg = (struct vmci_datagram *)skb->data;
> - if (!dg)
> - /* err is 0, meaning we read zero bytes. */
> - goto out;
> -
> - payload_len = dg->payload_size;
> - /* Ensure the sk_buff matches the payload size claimed in the packet. */
> - if (payload_len != skb->len - sizeof(*dg)) {
> - err = -EINVAL;
> - goto out;
> - }
> -
> - if (payload_len > len) {
> - payload_len = len;
> - msg->msg_flags |= MSG_TRUNC;
> - }
> + unsigned int cid, port;
>
> - /* Place the datagram payload in the user's iovec. */
> - err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
> - if (err)
> - goto out;
> -
> - if (msg->msg_name) {
> - /* Provide the address of the sender. */
> - DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> - vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
> - msg->msg_namelen = sizeof(*vm_addr);
> - }
> - err = payload_len;
> + WARN_ONCE(skb->head == skb->data, "vmci vsock bug: bad dgram skb");
>
> -out:
> - skb_free_datagram(&vsk->sk, skb);
> - return err;
> + dg = (struct vmci_datagram *)skb->head;
> + cid = dg->src.context;
> + port = dg->src.resource;
> + vsock_addr_init(addr, cid, port);
I think we
1) can shorten this to:
vsock_addr_init(addr, dg->src.context, dg->src.resource);
2) if not, cid and port should be u32, as the VMCI structure has u32 fields 'context' and
'resource', and 'vsock_addr_init()' also takes u32 for both arguments.
Thanks, Arseniy
> }
>
> static bool vmci_transport_dgram_allow(u32 cid, u32 port)
> @@ -2040,9 +2002,9 @@ static struct vsock_transport vmci_transport = {
> .release = vmci_transport_release,
> .connect = vmci_transport_connect,
> .dgram_bind = vmci_transport_dgram_bind,
> - .dgram_dequeue = vmci_transport_dgram_dequeue,
> .dgram_enqueue = vmci_transport_dgram_enqueue,
> .dgram_allow = vmci_transport_dgram_allow,
> + .dgram_addr_init = vmci_transport_dgram_addr_init,
> .stream_dequeue = vmci_transport_stream_dequeue,
> .stream_enqueue = vmci_transport_stream_enqueue,
> .stream_has_data = vmci_transport_stream_has_data,
> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> index 5c6360df1f31..2a59dd177c74 100644
> --- a/net/vmw_vsock/vsock_loopback.c
> +++ b/net/vmw_vsock/vsock_loopback.c
> @@ -62,7 +62,6 @@ static struct virtio_transport loopback_transport = {
> .cancel_pkt = vsock_loopback_cancel_pkt,
>
> .dgram_bind = virtio_transport_dgram_bind,
> - .dgram_dequeue = virtio_transport_dgram_dequeue,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
>
>
On Sat, Jul 22, 2023 at 11:45:29AM +0300, Arseniy Krasnov wrote:
>
>
> On 19.07.2023 03:50, Bobby Eshleman wrote:
> > This commit implements datagram support for virtio/vsock by teaching
> > virtio to use the general virtio transport ->dgram_addr_init() function
> > and implementing a new version of ->dgram_allow().
> >
> > Additionally, it drops virtio_transport_dgram_allow() as an exported
> > symbol because it is no longer used in other transports.
> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
> > ---
> > include/linux/virtio_vsock.h | 1 -
> > net/vmw_vsock/virtio_transport.c | 24 +++++++++++++++++++++++-
> > net/vmw_vsock/virtio_transport_common.c | 6 ------
> > 3 files changed, 23 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index b3856b8a42b3..d0a4f08b12c1 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -211,7 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > bool virtio_transport_stream_allow(u32 cid, u32 port);
> > -bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > void virtio_transport_dgram_addr_init(struct sk_buff *skb,
> > struct sockaddr_vm *addr);
> >
> > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > index ac2126c7dac5..713718861bd4 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -63,6 +63,7 @@ struct virtio_vsock {
> >
> > u32 guest_cid;
> > bool seqpacket_allow;
> > + bool dgram_allow;
> > };
> >
> > static u32 virtio_transport_get_local_cid(void)
> > @@ -413,6 +414,7 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> > queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> > }
> >
> > +static bool virtio_transport_dgram_allow(u32 cid, u32 port);
>
> Maybe add the function body here, without the forward declaration? Same for loopback and vhost.
>
Sounds okay to me, but this seems to go against the pattern
established by seqpacket. Any reason why?
> > static bool virtio_transport_seqpacket_allow(u32 remote_cid);
> >
> > static struct virtio_transport virtio_transport = {
> > @@ -430,6 +432,7 @@ static struct virtio_transport virtio_transport = {
> >
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> > + .dgram_addr_init = virtio_transport_dgram_addr_init,
> >
> > .stream_dequeue = virtio_transport_stream_dequeue,
> > .stream_enqueue = virtio_transport_stream_enqueue,
> > @@ -462,6 +465,21 @@ static struct virtio_transport virtio_transport = {
> > .send_pkt = virtio_transport_send_pkt,
> > };
> >
> > +static bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > +{
> > + struct virtio_vsock *vsock;
> > + bool dgram_allow;
> > +
> > + dgram_allow = false;
> > + rcu_read_lock();
> > + vsock = rcu_dereference(the_virtio_vsock);
> > + if (vsock)
> > + dgram_allow = vsock->dgram_allow;
> > + rcu_read_unlock();
> > +
> > + return dgram_allow;
> > +}
> > +
> > static bool virtio_transport_seqpacket_allow(u32 remote_cid)
> > {
> > struct virtio_vsock *vsock;
> > @@ -655,6 +673,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> > if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
> > vsock->seqpacket_allow = true;
> >
> > + if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
> > + vsock->dgram_allow = true;
> > +
> > vdev->priv = vsock;
> >
> > ret = virtio_vsock_vqs_init(vsock);
> > @@ -747,7 +768,8 @@ static struct virtio_device_id id_table[] = {
> > };
> >
> > static unsigned int features[] = {
> > - VIRTIO_VSOCK_F_SEQPACKET
> > + VIRTIO_VSOCK_F_SEQPACKET,
> > + VIRTIO_VSOCK_F_DGRAM
> > };
> >
> > static struct virtio_driver virtio_vsock_driver = {
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 96118e258097..77898f5325cd 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -783,12 +783,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> >
> > -bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > -{
> > - return false;
> > -}
> > -EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
> > -
> > int virtio_transport_connect(struct vsock_sock *vsk)
> > {
> > struct virtio_vsock_pkt_info info = {
> >
>
> Thanks, Arseniy
Thanks,
Bobby
On Sat, Jul 22, 2023 at 11:16:05AM +0300, Arseniy Krasnov wrote:
>
>
> On 19.07.2023 03:50, Bobby Eshleman wrote:
> > This commit implements the common function
> > virtio_transport_dgram_enqueue for enqueueing datagrams. It does not add
> > usage in either vhost or virtio yet.
> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
> > ---
> > net/vmw_vsock/virtio_transport_common.c | 76 ++++++++++++++++++++++++++++++++-
> > 1 file changed, 75 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index ffcbdd77feaa..3bfaff758433 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -819,7 +819,81 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> > struct msghdr *msg,
> > size_t dgram_len)
> > {
> > - return -EOPNOTSUPP;
> > + /* Here we are only using the info struct to retain style uniformity
> > + * and to ease future refactoring and merging.
> > + */
> > + struct virtio_vsock_pkt_info info_stack = {
> > + .op = VIRTIO_VSOCK_OP_RW,
> > + .msg = msg,
> > + .vsk = vsk,
> > + .type = VIRTIO_VSOCK_TYPE_DGRAM,
> > + };
> > + const struct virtio_transport *t_ops;
> > + struct virtio_vsock_pkt_info *info;
> > + struct sock *sk = sk_vsock(vsk);
> > + struct virtio_vsock_hdr *hdr;
> > + u32 src_cid, src_port;
> > + struct sk_buff *skb;
> > + void *payload;
> > + int noblock;
> > + int err;
> > +
> > + info = &info_stack;
>
> I think 'info' assignment could be moved below, to the place where it is used
> first time.
>
> > +
> > + if (dgram_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> > + return -EMSGSIZE;
> > +
> > + t_ops = virtio_transport_get_ops(vsk);
> > + if (unlikely(!t_ops))
> > + return -EFAULT;
> > +
> > + /* Unlike some of our other sending functions, this function is not
> > + * intended for use without a msghdr.
> > + */
> > + if (WARN_ONCE(!msg, "vsock dgram bug: no msghdr found for dgram enqueue\n"))
> > + return -EFAULT;
>
> Sorry, but is that possible? I thought 'msg' is always provided by the general socket layer (i.e. before the
> af_vsock.c code) and can't be NULL for DGRAM. Please correct me if I'm wrong.
>
> Also, I see that in af_vsock.c, 'vsock_dgram_sendmsg()' dereferences 'msg' to check MSG_OOB without any
> checks (before calling the transport callback, which is this function in the virtio case). So if we want to keep
> this kind of check, it must be placed in af_vsock.c, or somewhere before the first dereference of this pointer.
>
There is some talk about dgram sockets adding additional message types
in the future that help with congestion control. Those messages won't
come from the socket layer, so msghdr will be null. Since there is no
other function for sending datagrams, it seemed likely that this
function would be reworked for that purpose. I felt that adding this
check was a direct way to make it explicit that this function is
currently designed only for the socket-layer caller.
Perhaps a comment would suffice?
> > +
> > + noblock = msg->msg_flags & MSG_DONTWAIT;
> > +
> > + /* Use sock_alloc_send_skb to throttle by sk_sndbuf. This helps avoid
> > + * triggering the OOM.
> > + */
> > + skb = sock_alloc_send_skb(sk, dgram_len + VIRTIO_VSOCK_SKB_HEADROOM,
> > + noblock, &err);
> > + if (!skb)
> > + return err;
> > +
> > + skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
> > +
> > + src_cid = t_ops->transport.get_local_cid();
> > + src_port = vsk->local_addr.svm_port;
> > +
> > + hdr = virtio_vsock_hdr(skb);
> > + hdr->type = cpu_to_le16(info->type);
> > + hdr->op = cpu_to_le16(info->op);
> > + hdr->src_cid = cpu_to_le64(src_cid);
> > + hdr->dst_cid = cpu_to_le64(remote_addr->svm_cid);
> > + hdr->src_port = cpu_to_le32(src_port);
> > + hdr->dst_port = cpu_to_le32(remote_addr->svm_port);
> > + hdr->flags = cpu_to_le32(info->flags);
> > + hdr->len = cpu_to_le32(dgram_len);
> > +
> > + skb_set_owner_w(skb, sk);
> > +
> > + payload = skb_put(skb, dgram_len);
> > + err = memcpy_from_msg(payload, msg, dgram_len);
> > + if (err)
> > + return err;
>
> Do we need to free the allocated skb here?
>
Yep, thanks.
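To make the missing cleanup concrete, here is a minimal userspace sketch of the error path. It is not kernel code: struct buf, buf_alloc(), buf_free(), copy_from_user_model() and dgram_enqueue_model() are hypothetical stand-ins for the skb, sock_alloc_send_skb(), kfree_skb(), memcpy_from_msg() and the enqueue function, and live_bufs only exists so the leak can be observed.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: just an owned payload buffer. */
struct buf {
	unsigned char *data;
	size_t len;
};

/* Number of buffers allocated and not yet freed; nonzero at exit means a leak. */
int live_bufs;

struct buf *buf_alloc(size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = malloc(len ? len : 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->len = len;
	live_bufs++;
	return b;
}

void buf_free(struct buf *b)
{
	free(b->data);
	free(b);
	live_bufs--;
}

/* Stand-in for memcpy_from_msg(): fails with -EFAULT when src is NULL. */
int copy_from_user_model(void *dst, const void *src, size_t len)
{
	if (!src)
		return -EFAULT;
	memcpy(dst, src, len);
	return 0;
}

/* Returns 0 and hands the buffer to *out, or a negative errno with nothing leaked. */
int dgram_enqueue_model(const void *src, size_t len, struct buf **out)
{
	struct buf *b = buf_alloc(len);
	int err;

	if (!b)
		return -ENOMEM;

	err = copy_from_user_model(b->data, src, len);
	if (err) {
		buf_free(b);	/* the step missing from the patch */
		return err;
	}

	*out = b;
	return 0;
}
```

In the patch itself the equivalent would presumably be a kfree_skb(skb) before returning err from the memcpy_from_msg() failure branch.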
> > +
> > + trace_virtio_transport_alloc_pkt(src_cid, src_port,
> > + remote_addr->svm_cid,
> > + remote_addr->svm_port,
> > + dgram_len,
> > + info->type,
> > + info->op,
> > + 0);
> > +
> > + return t_ops->send_pkt(skb);
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
> >
> >
>
> Thanks, Arseniy
Thanks for the review!
Best,
Bobby
On Wed, Jul 19, 2023 at 12:50:15AM +0000, Bobby Eshleman wrote:
> This commit implements datagram support for vhost/vsock by teaching
> vhost to use the common virtio transport datagram functions.
>
> If the virtio RX buffer is too small, then the transmission is
> abandoned, the packet dropped, and EHOSTUNREACH is added to the socket's
> error queue.
>
> Signed-off-by: Bobby Eshleman <[email protected]>
EHOSTUNREACH?
> ---
> drivers/vhost/vsock.c | 62 +++++++++++++++++++++++++++++++++++++++++++++---
> net/vmw_vsock/af_vsock.c | 5 +++-
> 2 files changed, 63 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index d5d6a3c3f273..da14260c6654 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -8,6 +8,7 @@
> */
> #include <linux/miscdevice.h>
> #include <linux/atomic.h>
> +#include <linux/errqueue.h>
> #include <linux/module.h>
> #include <linux/mutex.h>
> #include <linux/vmalloc.h>
> @@ -32,7 +33,8 @@
> enum {
> VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> - (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
> + (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
> + (1ULL << VIRTIO_VSOCK_F_DGRAM)
> };
>
> enum {
> @@ -56,6 +58,7 @@ struct vhost_vsock {
> atomic_t queued_replies;
>
> u32 guest_cid;
> + bool dgram_allow;
> bool seqpacket_allow;
> };
>
> @@ -86,6 +89,32 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
> return NULL;
> }
>
> +/* Claims ownership of the skb, do not free the skb after calling! */
> +static void
> +vhost_transport_error(struct sk_buff *skb, int err)
> +{
> + struct sock_exterr_skb *serr;
> + struct sock *sk = skb->sk;
> + struct sk_buff *clone;
> +
> + serr = SKB_EXT_ERR(skb);
> + memset(serr, 0, sizeof(*serr));
> + serr->ee.ee_errno = err;
> + serr->ee.ee_origin = SO_EE_ORIGIN_NONE;
> +
> + clone = skb_clone(skb, GFP_KERNEL);
> + if (!clone)
> + return;
> +
> + if (sock_queue_err_skb(sk, clone))
> + kfree_skb(clone);
> +
> + sk->sk_err = err;
> + sk_error_report(sk);
> +
> + kfree_skb(skb);
> +}
> +
> static void
> vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> struct vhost_virtqueue *vq)
> @@ -160,9 +189,15 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> hdr = virtio_vsock_hdr(skb);
>
> /* If the packet is greater than the space available in the
> - * buffer, we split it using multiple buffers.
> + * buffer, we split it using multiple buffers for connectible
> + * sockets and drop the packet for datagram sockets.
> */
Won't this break things like the recently proposed zerocopy?
I think splitting has to be supported for all types.
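For clarity, the oversized-packet policy this hunk introduces can be modelled in userspace roughly as follows. This is only a sketch of the decision, not the vhost code: the enum names and rx_decide() are hypothetical, and hdr_len stands in for sizeof(struct virtio_vsock_hdr).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the VIRTIO_VSOCK_TYPE_* values. */
enum pkt_type_model {
	TYPE_STREAM_MODEL,
	TYPE_SEQPACKET_MODEL,
	TYPE_DGRAM_MODEL,
};

enum rx_action {
	RX_COPY_WHOLE,	/* packet fits in the guest RX buffer */
	RX_SPLIT,	/* connectible socket: send in several pieces */
	RX_DROP,	/* datagram socket: drop and report an error */
};

/*
 * Decide what to do with a packet of payload_len bytes given a guest RX
 * buffer of iov_len bytes that must also hold a header of hdr_len bytes.
 * The real code rejects buffers smaller than the header before this
 * point; the model simply treats them as too small.
 */
enum rx_action rx_decide(enum pkt_type_model type, size_t payload_len,
			 size_t iov_len, size_t hdr_len)
{
	if (iov_len >= hdr_len && payload_len <= iov_len - hdr_len)
		return RX_COPY_WHOLE;

	if (type == TYPE_DGRAM_MODEL)
		return RX_DROP;

	return RX_SPLIT;
}
```

Supporting splitting for all types would amount to removing the TYPE_DGRAM_MODEL branch, at the cost of the receiver having to reassemble datagrams.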
> if (payload_len > iov_len - sizeof(*hdr)) {
> + if (le16_to_cpu(hdr->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
> + vhost_transport_error(skb, EHOSTUNREACH);
> + continue;
> + }
> +
> payload_len = iov_len - sizeof(*hdr);
>
> /* As we are copying pieces of large packet's buffer to
> @@ -394,6 +429,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
> return val < vq->num;
> }
>
> +static bool vhost_transport_dgram_allow(u32 cid, u32 port);
> static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>
> static struct virtio_transport vhost_transport = {
> @@ -410,7 +446,8 @@ static struct virtio_transport vhost_transport = {
> .cancel_pkt = vhost_transport_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> - .dgram_allow = virtio_transport_dgram_allow,
> + .dgram_allow = vhost_transport_dgram_allow,
> + .dgram_addr_init = virtio_transport_dgram_addr_init,
>
> .stream_enqueue = virtio_transport_stream_enqueue,
> .stream_dequeue = virtio_transport_stream_dequeue,
> @@ -443,6 +480,22 @@ static struct virtio_transport vhost_transport = {
> .send_pkt = vhost_transport_send_pkt,
> };
>
> +static bool vhost_transport_dgram_allow(u32 cid, u32 port)
> +{
> + struct vhost_vsock *vsock;
> + bool dgram_allow = false;
> +
> + rcu_read_lock();
> + vsock = vhost_vsock_get(cid);
> +
> + if (vsock)
> + dgram_allow = vsock->dgram_allow;
> +
> + rcu_read_unlock();
> +
> + return dgram_allow;
> +}
> +
> static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> {
> struct vhost_vsock *vsock;
> @@ -799,6 +852,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
> vsock->seqpacket_allow = true;
>
> + if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
> + vsock->dgram_allow = true;
> +
> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> vq = &vsock->vqs[i];
> mutex_lock(&vq->mutex);
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index e73f3b2c52f1..449ed63ac2b0 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -1427,9 +1427,12 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> return prot->recvmsg(sk, msg, len, flags, NULL);
> #endif
>
> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> + if (unlikely(flags & MSG_OOB))
> return -EOPNOTSUPP;
>
> + if (unlikely(flags & MSG_ERRQUEUE))
> + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
> +
> transport = vsk->transport;
>
> /* Retrieve the head sk_buff from the socket's receive queue. */
>
> --
> 2.30.2
On Mon, Jul 24, 2023 at 09:11:44PM +0300, Arseniy Krasnov wrote:
>
>
> On 19.07.2023 03:50, Bobby Eshleman wrote:
> > This commit drops the transport->dgram_dequeue callback and makes
> > vsock_dgram_recvmsg() generic to all transports.
> >
> > To make this possible, two transport-level changes are introduced:
> > - implementation of the ->dgram_addr_init() callback to initialize
> > the sockaddr_vm structure with data from incoming socket buffers.
> > - transport implementations set the skb->data pointer to the beginning
> > of the payload prior to adding the skb to the socket's receive queue.
> > That is, they must use skb_pull() before enqueuing. This is an
> > agreement between the transport and the socket layer that skb->data
> > always points to the beginning of the payload (and not, for example,
> > the packet header).
> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
> > ---
> > drivers/vhost/vsock.c | 1 -
> > include/linux/virtio_vsock.h | 5 ---
> > include/net/af_vsock.h | 3 +-
> > net/vmw_vsock/af_vsock.c | 40 ++++++++++++++++++++++-
> > net/vmw_vsock/hyperv_transport.c | 7 ----
> > net/vmw_vsock/virtio_transport.c | 1 -
> > net/vmw_vsock/virtio_transport_common.c | 9 -----
> > net/vmw_vsock/vmci_transport.c | 58 ++++++---------------------------
> > net/vmw_vsock/vsock_loopback.c | 1 -
> > 9 files changed, 50 insertions(+), 75 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index 6578db78f0ae..ae8891598a48 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> > .cancel_pkt = vhost_transport_cancel_pkt,
> >
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > - .dgram_dequeue = virtio_transport_dgram_dequeue,
> > .dgram_bind = virtio_transport_dgram_bind,
> > .dgram_allow = virtio_transport_dgram_allow,
> >
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index c58453699ee9..18cbe8d37fca 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -167,11 +167,6 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk,
> > size_t len,
> > int type);
> > int
> > -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> > - struct msghdr *msg,
> > - size_t len, int flags);
> > -
> > -int
> > virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> > struct msghdr *msg,
> > size_t len);
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index 0e7504a42925..305d57502e89 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -120,11 +120,10 @@ struct vsock_transport {
> >
> > /* DGRAM. */
> > int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
> > - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> > - size_t len, int flags);
> > int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> > struct msghdr *, size_t len);
> > bool (*dgram_allow)(u32 cid, u32 port);
> > + void (*dgram_addr_init)(struct sk_buff *skb, struct sockaddr_vm *addr);
> >
> > /* STREAM. */
> > /* TODO: stream_bind() */
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index deb72a8c44a7..ad71e084bf2f 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -1272,11 +1272,15 @@ static int vsock_dgram_connect(struct socket *sock,
> > int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> > size_t len, int flags)
> > {
> > + const struct vsock_transport *transport;
> > #ifdef CONFIG_BPF_SYSCALL
> > const struct proto *prot;
> > #endif
> > struct vsock_sock *vsk;
> > + struct sk_buff *skb;
> > + size_t payload_len;
> > struct sock *sk;
> > + int err;
> >
> > sk = sock->sk;
> > vsk = vsock_sk(sk);
> > @@ -1287,7 +1291,41 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> > return prot->recvmsg(sk, msg, len, flags, NULL);
> > #endif
> >
> > - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
> > + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> > + return -EOPNOTSUPP;
> > +
> > + transport = vsk->transport;
> > +
> > + /* Retrieve the head sk_buff from the socket's receive queue. */
> > + err = 0;
> > + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
> > + if (!skb)
> > + return err;
> > +
> > + payload_len = skb->len;
> > +
> > + if (payload_len > len) {
> > + payload_len = len;
> > + msg->msg_flags |= MSG_TRUNC;
> > + }
> > +
> > + /* Place the datagram payload in the user's iovec. */
> > + err = skb_copy_datagram_msg(skb, 0, msg, payload_len);
> > + if (err)
> > + goto out;
> > +
> > + if (msg->msg_name) {
> > + /* Provide the address of the sender. */
> > + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> > +
> > + transport->dgram_addr_init(skb, vm_addr);
>
> Do we need to check that dgram_addr_init != NULL? I see that not all transports have this
> callback set in this patch.
>
How about adding the check somewhere outside of the hotpath, such as
when the transport is assigned?
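A minimal userspace sketch of that idea, assuming the check lives in the registration path: struct transport_model, MODEL_F_DGRAM, transport_register() and example_addr_init() are hypothetical names for illustration, not the af_vsock API. A transport that registers for datagrams without the callback is rejected up front, so the receive path can call it unconditionally.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature flag: the transport wants to handle datagrams. */
#define MODEL_F_DGRAM 0x1u

/* Hypothetical, trimmed-down model of a transport's callback table. */
struct transport_model {
	void (*dgram_addr_init)(const void *pkt, uint32_t *cid, uint32_t *port);
};

/* The currently registered datagram transport, if any. */
const struct transport_model *dgram_transport;

/*
 * Validate the callback once, at registration time, instead of on every
 * recvmsg() call. Returns 0 on success or a negative errno.
 */
int transport_register(const struct transport_model *t, unsigned int features)
{
	if (!t)
		return -EINVAL;

	if (features & MODEL_F_DGRAM) {
		/* A dgram transport must be able to report the sender. */
		if (!t->dgram_addr_init)
			return -EINVAL;
		if (dgram_transport)
			return -EBUSY;
		dgram_transport = t;
	}

	return 0;
}

/* Example callback that reports a fixed sender address. */
void example_addr_init(const void *pkt, uint32_t *cid, uint32_t *port)
{
	(void)pkt;
	*cid = 2;
	*port = 1234;
}
```

Transports that never register for datagrams (the features argument without MODEL_F_DGRAM here) are unaffected by the check.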
> > + msg->msg_namelen = sizeof(*vm_addr);
> > + }
> > + err = payload_len;
> > +
> > +out:
> > + skb_free_datagram(&vsk->sk, skb);
> > + return err;
> > }
> > EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
> >
> > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > index 7cb1a9d2cdb4..7f1ea434656d 100644
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -556,12 +556,6 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> > return -EOPNOTSUPP;
> > }
> >
> > -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
> > - size_t len, int flags)
> > -{
> > - return -EOPNOTSUPP;
> > -}
> > -
> > static int hvs_dgram_enqueue(struct vsock_sock *vsk,
> > struct sockaddr_vm *remote, struct msghdr *msg,
> > size_t dgram_len)
> > @@ -833,7 +827,6 @@ static struct vsock_transport hvs_transport = {
> > .shutdown = hvs_shutdown,
> >
> > .dgram_bind = hvs_dgram_bind,
> > - .dgram_dequeue = hvs_dgram_dequeue,
> > .dgram_enqueue = hvs_dgram_enqueue,
> > .dgram_allow = hvs_dgram_allow,
> >
> > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > index e95df847176b..66edffdbf303 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -429,7 +429,6 @@ static struct virtio_transport virtio_transport = {
> > .cancel_pkt = virtio_transport_cancel_pkt,
> >
> > .dgram_bind = virtio_transport_dgram_bind,
> > - .dgram_dequeue = virtio_transport_dgram_dequeue,
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> >
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index b769fc258931..01ea1402ad40 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -583,15 +583,6 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_enqueue);
> >
> > -int
> > -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> > - struct msghdr *msg,
> > - size_t len, int flags)
> > -{
> > - return -EOPNOTSUPP;
> > -}
> > -EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
> > -
> > s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
> > {
> > struct virtio_vsock_sock *vvs = vsk->trans;
> > diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> > index b370070194fa..0bbbdb222245 100644
> > --- a/net/vmw_vsock/vmci_transport.c
> > +++ b/net/vmw_vsock/vmci_transport.c
> > @@ -641,6 +641,7 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
> > sock_hold(sk);
> > skb_put(skb, size);
> > memcpy(skb->data, dg, size);
> > + skb_pull(skb, VMCI_DG_HEADERSIZE);
> > sk_receive_skb(sk, skb, 0);
> >
> > return VMCI_SUCCESS;
> > @@ -1731,57 +1732,18 @@ static int vmci_transport_dgram_enqueue(
> > return err - sizeof(*dg);
> > }
> >
> > -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
> > - struct msghdr *msg, size_t len,
> > - int flags)
> > +static void vmci_transport_dgram_addr_init(struct sk_buff *skb,
> > + struct sockaddr_vm *addr)
> > {
> > - int err;
> > struct vmci_datagram *dg;
> > - size_t payload_len;
> > - struct sk_buff *skb;
> > -
> > - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> > - return -EOPNOTSUPP;
> > -
> > - /* Retrieve the head sk_buff from the socket's receive queue. */
> > - err = 0;
> > - skb = skb_recv_datagram(&vsk->sk, flags, &err);
> > - if (!skb)
> > - return err;
> > -
> > - dg = (struct vmci_datagram *)skb->data;
> > - if (!dg)
> > - /* err is 0, meaning we read zero bytes. */
> > - goto out;
> > -
> > - payload_len = dg->payload_size;
> > - /* Ensure the sk_buff matches the payload size claimed in the packet. */
> > - if (payload_len != skb->len - sizeof(*dg)) {
> > - err = -EINVAL;
> > - goto out;
> > - }
> > -
> > - if (payload_len > len) {
> > - payload_len = len;
> > - msg->msg_flags |= MSG_TRUNC;
> > - }
> > + unsigned int cid, port;
> >
> > - /* Place the datagram payload in the user's iovec. */
> > - err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
> > - if (err)
> > - goto out;
> > -
> > - if (msg->msg_name) {
> > - /* Provide the address of the sender. */
> > - DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> > - vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
> > - msg->msg_namelen = sizeof(*vm_addr);
> > - }
> > - err = payload_len;
> > + WARN_ONCE(skb->head == skb->data, "vmci vsock bug: bad dgram skb");
> >
> > -out:
> > - skb_free_datagram(&vsk->sk, skb);
> > - return err;
> > + dg = (struct vmci_datagram *)skb->head;
> > + cid = dg->src.context;
> > + port = dg->src.resource;
> > + vsock_addr_init(addr, cid, port);
>
> I think we:
>
> 1) can shorten this to:
>
> vsock_addr_init(addr, dg->src.context, dg->src.resource);
>
> 2) otherwise, if the locals stay, cid and port should be u32, since the VMCI structure has u32 fields
> 'context' and 'resource', and 'vsock_addr_init()' also takes u32 for both arguments.
>
> Thanks, Arseniy
Sounds good, thanks.
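For reference, a minimal userspace sketch of the shortened form. The types and functions here (handle_model, datagram_model, addr_model, addr_init_model(), dgram_addr_init_model()) are hypothetical stand-ins that only mirror the u32 'context' and 'resource' fields mentioned above; this is not the VMCI code itself.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins mirroring only the fields discussed above. */
struct handle_model {
	uint32_t context;
	uint32_t resource;
};

struct datagram_model {
	struct handle_model dst;
	struct handle_model src;
	uint64_t payload_size;
};

struct addr_model {
	uint32_t cid;
	uint32_t port;
};

/* Models vsock_addr_init(): both arguments are 32-bit. */
void addr_init_model(struct addr_model *addr, uint32_t cid, uint32_t port)
{
	memset(addr, 0, sizeof(*addr));
	addr->cid = cid;
	addr->port = port;
}

/*
 * head points at the datagram header; the payload follows it. The source
 * fields are passed straight through, with no intermediate locals whose
 * type could differ from the 32-bit fields.
 */
void dgram_addr_init_model(const void *head, struct addr_model *addr)
{
	const struct datagram_model *dg = head;

	addr_init_model(addr, dg->src.context, dg->src.resource);
}
```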
>
> > }
> >
> > static bool vmci_transport_dgram_allow(u32 cid, u32 port)
> > @@ -2040,9 +2002,9 @@ static struct vsock_transport vmci_transport = {
> > .release = vmci_transport_release,
> > .connect = vmci_transport_connect,
> > .dgram_bind = vmci_transport_dgram_bind,
> > - .dgram_dequeue = vmci_transport_dgram_dequeue,
> > .dgram_enqueue = vmci_transport_dgram_enqueue,
> > .dgram_allow = vmci_transport_dgram_allow,
> > + .dgram_addr_init = vmci_transport_dgram_addr_init,
> > .stream_dequeue = vmci_transport_stream_dequeue,
> > .stream_enqueue = vmci_transport_stream_enqueue,
> > .stream_has_data = vmci_transport_stream_has_data,
> > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > index 5c6360df1f31..2a59dd177c74 100644
> > --- a/net/vmw_vsock/vsock_loopback.c
> > +++ b/net/vmw_vsock/vsock_loopback.c
> > @@ -62,7 +62,6 @@ static struct virtio_transport loopback_transport = {
> > .cancel_pkt = vsock_loopback_cancel_pkt,
> >
> > .dgram_bind = virtio_transport_dgram_bind,
> > - .dgram_dequeue = virtio_transport_dgram_dequeue,
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> >
> >
Thanks,
Bobby
On Wed, Jul 19, 2023 at 12:50:04AM +0000, Bobby Eshleman wrote:
> Hey all!
>
> This series introduces support for datagrams to virtio/vsock.
>
> It is a spin-off (and smaller version) of this series from the summer:
> https://lore.kernel.org/all/[email protected]/
>
> Please note that this is an RFC and should not be merged until
> associated changes are made to the virtio specification, which will
> follow after discussion from this series.
>
> Another aside, the v4 of the series has only been mildly tested with a
> run of tools/testing/vsock/vsock_test. Some code likely needs cleaning
> up, but I'm hoping to get some of the design choices agreed upon before
> spending too much time making it pretty.
Stale from v4 cover, sorry.
>
> This series first supports datagrams in a basic form for virtio, and
> then optimizes the sendpath for all datagram transports.
>
> The result is a very fast datagram communication protocol that
> outperforms even UDP on multi-queue virtio-net w/ vhost on a variety
> of multi-threaded workload samples.
>
> For those that are curious, some summary data comparing UDP and VSOCK
> DGRAM (N=5):
>
> vCPUS: 16
> virtio-net queues: 16
> payload size: 4KB
> Setup: bare metal + vm (non-nested)
>
> UDP: 287.59 MB/s
> VSOCK DGRAM: 509.2 MB/s
Also stale. After dropping the lockless sendpath patch and deferring it
to later, this data does not apply to the series anymore.
>
> Some notes about the implementation...
>
> This datagram implementation forces datagrams to self-throttle according
> to the threshold set by sk_sndbuf. It behaves similar to the credits
> used by streams in its effect on throughput and memory consumption, but
> it is not influenced by the receiving socket as credits are.
>
> The device drops packets silently.
>
> As discussed previously, this series introduces datagrams and defers
> fairness to future work. See discussion in v2 for more context around
> datagrams, fairness, and this implementation.
>
> Signed-off-by: Bobby Eshleman <[email protected]>
> ---
> Changes in v5:
> - teach vhost to drop dgram when a datagram exceeds the receive buffer
> - now uses MSG_ERRQUEUE and depends on Arseniy's zerocopy patch:
> "vsock: read from socket's error queue"
> - replace multiple ->dgram_* callbacks with single ->dgram_addr_init()
> callback
> - refactor virtio dgram skb allocator to reduce conflicts w/ zerocopy series
> - add _fallback/_FALLBACK suffix to dgram transport variables/macros
> - add WARN_ONCE() for table_size / VSOCK_HASH issue
> - add static to vsock_find_bound_socket_common
> - dedupe code in vsock_dgram_sendmsg() using module_got var
> - drop concurrent sendmsg() for dgram and defer to future series
> - Add more tests
> - test EHOSTUNREACH in errqueue
> - test stream + dgram address collision
> - improve clarity of dgram msg bounds test code
> - Link to v4: https://lore.kernel.org/r/[email protected]
>
> Changes in v4:
> - style changes
> - vsock: use sk_vsock(vsk) in vsock_dgram_recvmsg instead of
> &sk->vsk
> - vsock: fix xmas tree declaration
> - vsock: fix spacing issues
> - virtio/vsock: virtio_transport_recv_dgram returns void because err
> unused
> - sparse analysis warnings/errors
> - virtio/vsock: fix uninitialized skerr on destroy
> - virtio/vsock: fix uninitialized err var on goto out
> - vsock: fix declarations that need static
> - vsock: fix __rcu annotation order
> - bugs
> - vsock: fix null ptr in remote_info code
> - vsock/dgram: make transport_dgram a fallback instead of first
> priority
> - vsock: remove redundant rcu read lock acquire in getname()
> - tests
> - add more tests (message bounds and more)
> - add vsock_dgram_bind() helper
> - add vsock_dgram_connect() helper
>
> Changes in v3:
> - Support multi-transport dgram, changing logic in connect/bind
> to support VMCI case
> - Support per-pkt transport lookup for sendto() case
> - Fix dgram_allow() implementation
> - Fix dgram feature bit number (now it is 3)
> - Fix binding so dgram and connectible (cid,port) spaces are
> non-overlapping
> - RCU protect transport ptr so connect() calls never leave
> a lockless read of the transport and remote_addr are always
> in sync
> - Link to v2: https://lore.kernel.org/r/[email protected]
>
> ---
> Bobby Eshleman (13):
> af_vsock: generalize vsock_dgram_recvmsg() to all transports
> af_vsock: refactor transport lookup code
> af_vsock: support multi-transport datagrams
> af_vsock: generalize bind table functions
> af_vsock: use a separate dgram bind table
> virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM
> virtio/vsock: add common datagram send path
> af_vsock: add vsock_find_bound_dgram_socket()
> virtio/vsock: add common datagram recv path
> virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
> vhost/vsock: implement datagram support
> vsock/loopback: implement datagram support
> virtio/vsock: implement datagram support
>
> Jiang Wang (1):
> test/vsock: add vsock dgram tests
>
> drivers/vhost/vsock.c | 64 ++-
> include/linux/virtio_vsock.h | 10 +-
> include/net/af_vsock.h | 14 +-
> include/uapi/linux/virtio_vsock.h | 2 +
> net/vmw_vsock/af_vsock.c | 281 ++++++++++---
> net/vmw_vsock/hyperv_transport.c | 13 -
> net/vmw_vsock/virtio_transport.c | 26 +-
> net/vmw_vsock/virtio_transport_common.c | 190 +++++++--
> net/vmw_vsock/vmci_transport.c | 60 +--
> net/vmw_vsock/vsock_loopback.c | 10 +-
> tools/testing/vsock/util.c | 141 ++++++-
> tools/testing/vsock/util.h | 6 +
> tools/testing/vsock/vsock_test.c | 680 ++++++++++++++++++++++++++++++++
> 13 files changed, 1320 insertions(+), 177 deletions(-)
> ---
> base-commit: 37cadc266ebdc7e3531111c2b3304fa01b2131e8
> change-id: 20230413-b4-vsock-dgram-3b6eba6a64e5
>
> Best regards,
> --
> Bobby Eshleman <[email protected]>
>
On Sat, Jul 22, 2023 at 11:42:38AM +0300, Arseniy Krasnov wrote:
>
>
> On 19.07.2023 03:50, Bobby Eshleman wrote:
> > This commit implements datagram support for vhost/vsock by teaching
> > vhost to use the common virtio transport datagram functions.
> >
> > If the virtio RX buffer is too small, then the transmission is
> > abandoned, the packet dropped, and EHOSTUNREACH is added to the socket's
> > error queue.
> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
> > ---
> > drivers/vhost/vsock.c | 62 +++++++++++++++++++++++++++++++++++++++++++++---
> > net/vmw_vsock/af_vsock.c | 5 +++-
> > 2 files changed, 63 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index d5d6a3c3f273..da14260c6654 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -8,6 +8,7 @@
> > */
> > #include <linux/miscdevice.h>
> > #include <linux/atomic.h>
> > +#include <linux/errqueue.h>
> > #include <linux/module.h>
> > #include <linux/mutex.h>
> > #include <linux/vmalloc.h>
> > @@ -32,7 +33,8 @@
> > enum {
> > VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> > (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> > - (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
> > + (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
> > + (1ULL << VIRTIO_VSOCK_F_DGRAM)
> > };
> >
> > enum {
> > @@ -56,6 +58,7 @@ struct vhost_vsock {
> > atomic_t queued_replies;
> >
> > u32 guest_cid;
> > + bool dgram_allow;
> > bool seqpacket_allow;
> > };
> >
> > @@ -86,6 +89,32 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
> > return NULL;
> > }
> >
> > +/* Claims ownership of the skb, do not free the skb after calling! */
> > +static void
> > +vhost_transport_error(struct sk_buff *skb, int err)
> > +{
> > + struct sock_exterr_skb *serr;
> > + struct sock *sk = skb->sk;
> > + struct sk_buff *clone;
> > +
> > + serr = SKB_EXT_ERR(skb);
> > + memset(serr, 0, sizeof(*serr));
> > + serr->ee.ee_errno = err;
> > + serr->ee.ee_origin = SO_EE_ORIGIN_NONE;
> > +
> > + clone = skb_clone(skb, GFP_KERNEL);
>
> Maybe for the skb that carries the error we can use 'sock_omalloc()' instead of 'skb_clone()'? TCP uses an skb
> allocated by this function as the carrier of the error structure. I guess 'skb_clone()' also clones the data of the original,
> but I think the data is not needed, since we only insert the skb into the error queue of the socket.
>
> What do You think?
IIUC skb_clone() is often used in this scenario so that the user can
retrieve the error-causing packet from the error queue. Is there some
reason we shouldn't do this?
I'm seeing that the serr bits need to occur on the clone here, not the
original. I didn't realize the SKB_EXT_ERR() is a skb->cb cast. I'm not
actually sure how this passes the test case since ->cb isn't cloned.
>
> > + if (!clone)
> > + return;
>
> What happens here when 'if (!clone)' is true? Will the skb leak, since it was already removed from the queue?
>
Ah yes, true.
> > +
> > + if (sock_queue_err_skb(sk, clone))
> > + kfree_skb(clone);
> > +
> > + sk->sk_err = err;
> > + sk_error_report(sk);
> > +
> > + kfree_skb(skb);
> > +}
> > +
> > static void
> > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > struct vhost_virtqueue *vq)
> > @@ -160,9 +189,15 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > hdr = virtio_vsock_hdr(skb);
> >
> > /* If the packet is greater than the space available in the
> > - * buffer, we split it using multiple buffers.
> > + * buffer, we split it using multiple buffers for connectible
> > + * sockets and drop the packet for datagram sockets.
> > */
> > if (payload_len > iov_len - sizeof(*hdr)) {
> > + if (le16_to_cpu(hdr->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
> > + vhost_transport_error(skb, EHOSTUNREACH);
> > + continue;
> > + }
> > +
> > payload_len = iov_len - sizeof(*hdr);
> >
> > /* As we are copying pieces of large packet's buffer to
> > @@ -394,6 +429,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
> > return val < vq->num;
> > }
> >
> > +static bool vhost_transport_dgram_allow(u32 cid, u32 port);
> > static bool vhost_transport_seqpacket_allow(u32 remote_cid);
> >
> > static struct virtio_transport vhost_transport = {
> > @@ -410,7 +446,8 @@ static struct virtio_transport vhost_transport = {
> > .cancel_pkt = vhost_transport_cancel_pkt,
> >
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > - .dgram_allow = virtio_transport_dgram_allow,
> > + .dgram_allow = vhost_transport_dgram_allow,
> > + .dgram_addr_init = virtio_transport_dgram_addr_init,
> >
> > .stream_enqueue = virtio_transport_stream_enqueue,
> > .stream_dequeue = virtio_transport_stream_dequeue,
> > @@ -443,6 +480,22 @@ static struct virtio_transport vhost_transport = {
> > .send_pkt = vhost_transport_send_pkt,
> > };
> >
> > +static bool vhost_transport_dgram_allow(u32 cid, u32 port)
> > +{
> > + struct vhost_vsock *vsock;
> > + bool dgram_allow = false;
> > +
> > + rcu_read_lock();
> > + vsock = vhost_vsock_get(cid);
> > +
> > + if (vsock)
> > + dgram_allow = vsock->dgram_allow;
> > +
> > + rcu_read_unlock();
> > +
> > + return dgram_allow;
> > +}
> > +
> > static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> > {
> > struct vhost_vsock *vsock;
> > @@ -799,6 +852,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> > if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
> > vsock->seqpacket_allow = true;
> >
> > + if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
> > + vsock->dgram_allow = true;
> > +
> > for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> > vq = &vsock->vqs[i];
> > mutex_lock(&vq->mutex);
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index e73f3b2c52f1..449ed63ac2b0 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -1427,9 +1427,12 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> > return prot->recvmsg(sk, msg, len, flags, NULL);
> > #endif
> >
> > - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> > + if (unlikely(flags & MSG_OOB))
> > return -EOPNOTSUPP;
> >
> > + if (unlikely(flags & MSG_ERRQUEUE))
> > + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
> > +
>
> Sorry, but I get a build error here, because SOL_VSOCK is undefined. I think it should be added to
> include/linux/socket.h and also to the uapi files for future use in userspace.
>
Strange, I built each patch individually without issue. My base is
netdev/main with your SOL_VSOCK patch applied. I will look today and see
if I'm missing something.
> Also Stefano Garzarella <[email protected]> suggested adding a define, something like VSOCK_RECVERR,
> in the same way as IP_RECVERR, and using it as the last parameter of 'sock_recv_errqueue()'.
>
Got it, thanks.
> > transport = vsk->transport;
> >
> > /* Retrieve the head sk_buff from the socket's receive queue. */
> >
>
> Thanks, Arseniy
Thanks,
Bobby
On Wed, Jul 19, 2023 at 12:50:14AM +0000, Bobby Eshleman wrote:
> This commit adds a feature bit for virtio vsock to support datagrams.
>
> Signed-off-by: Jiang Wang <[email protected]>
> Signed-off-by: Bobby Eshleman <[email protected]>
> ---
> include/uapi/linux/virtio_vsock.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
> index 331be28b1d30..27b4b2b8bf13 100644
> --- a/include/uapi/linux/virtio_vsock.h
> +++ b/include/uapi/linux/virtio_vsock.h
> @@ -40,6 +40,7 @@
>
> /* The feature bitmap for virtio vsock */
> #define VIRTIO_VSOCK_F_SEQPACKET 1 /* SOCK_SEQPACKET supported */
> +#define VIRTIO_VSOCK_F_DGRAM 3 /* SOCK_DGRAM supported */
>
> struct virtio_vsock_config {
> __le64 guest_cid;
pls do not add interface without first getting it accepted in the
virtio spec.
> --
> 2.30.2
On Wed, Jul 26, 2023 at 02:38:08PM -0400, Michael S. Tsirkin wrote:
>On Wed, Jul 19, 2023 at 12:50:14AM +0000, Bobby Eshleman wrote:
>> This commit adds a feature bit for virtio vsock to support datagrams.
>>
>> Signed-off-by: Jiang Wang <[email protected]>
>> Signed-off-by: Bobby Eshleman <[email protected]>
>> ---
>> include/uapi/linux/virtio_vsock.h | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>> index 331be28b1d30..27b4b2b8bf13 100644
>> --- a/include/uapi/linux/virtio_vsock.h
>> +++ b/include/uapi/linux/virtio_vsock.h
>> @@ -40,6 +40,7 @@
>>
>> /* The feature bitmap for virtio vsock */
>> #define VIRTIO_VSOCK_F_SEQPACKET 1 /* SOCK_SEQPACKET supported */
>> +#define VIRTIO_VSOCK_F_DGRAM 3 /* SOCK_DGRAM supported */
>>
>> struct virtio_vsock_config {
>> __le64 guest_cid;
>
>pls do not add interface without first getting it accepted in the
>virtio spec.
Yep, fortunately this series is still RFC.
I think by now we've seen that the implementation is doable, so we
should discuss the changes to the specification ASAP. Then we can
merge the series.
@Bobby can you start the discussion about spec changes?
Thanks,
Stefano
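
As a small illustration of what the proposed bit means in practice: the patch defines VIRTIO_VSOCK_F_DGRAM as bit number 3, and vhost tests it with features & (1ULL << VIRTIO_VSOCK_F_DGRAM). The sketch below mirrors that check in plain C. The helper vsock_has_feature() is hypothetical, and the value 3 is only what this RFC proposes, not something the virtio specification has accepted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as they appear in this series. F_DGRAM = 3 is the value
 * proposed by the patch; it is not (yet) part of the virtio specification.
 */
#define VIRTIO_VSOCK_F_SEQPACKET 1
#define VIRTIO_VSOCK_F_DGRAM     3

/* Hypothetical helper: true if the negotiated feature word has 'bit' set. */
static bool vsock_has_feature(uint64_t features, unsigned int bit)
{
	return (features & (1ULL << bit)) != 0;
}
```

A feature word of 0x8 therefore advertises datagram support only, and 0xA advertises both SEQPACKET and DGRAM.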
On Wed, Jul 19, 2023 at 12:50:04AM +0000, Bobby Eshleman wrote:
> Hey all!
>
> This series introduces support for datagrams to virtio/vsock.
>
> It is a spin-off (and smaller version) of this series from the summer:
> https://lore.kernel.org/all/[email protected]/
>
> Please note that this is an RFC and should not be merged until
> associated changes are made to the virtio specification, which will
> follow after discussion from this series.
>
> Another aside, the v4 of the series has only been mildly tested with a
> run of tools/testing/vsock/vsock_test. Some code likely needs cleaning
> up, but I'm hoping to get some of the design choices agreed upon before
> spending too much time making it pretty.
>
> This series first supports datagrams in a basic form for virtio, and
> then optimizes the sendpath for all datagram transports.
>
> The result is a very fast datagram communication protocol that
> outperforms even UDP on multi-queue virtio-net w/ vhost on a variety
> of multi-threaded workload samples.
>
> For those that are curious, some summary data comparing UDP and VSOCK
> DGRAM (N=5):
>
> vCPUS: 16
> virtio-net queues: 16
> payload size: 4KB
> Setup: bare metal + vm (non-nested)
>
> UDP: 287.59 MB/s
> VSOCK DGRAM: 509.2 MB/s
>
> Some notes about the implementation...
>
> This datagram implementation forces datagrams to self-throttle according
> to the threshold set by sk_sndbuf. It behaves similar to the credits
> used by streams in its effect on throughput and memory consumption, but
> it is not influenced by the receiving socket as credits are.
>
> The device drops packets silently.
>
> As discussed previously, this series introduces datagrams and defers
> fairness to future work. See discussion in v2 for more context around
> datagrams, fairness, and this implementation.
it's a big thread - can't you summarize here?
> Signed-off-by: Bobby Eshleman <[email protected]>
could you give a bit more motivation? which applications do
you have in mind? for example, on localhost loopback datagrams
are actually reliable and a bunch of apps came to depend
on that even if they shouldn't.
> ---
> Changes in v5:
> - teach vhost to drop dgram when a datagram exceeds the receive buffer
> - now uses MSG_ERRQUEUE and depends on Arseniy's zerocopy patch:
> "vsock: read from socket's error queue"
> - replace multiple ->dgram_* callbacks with single ->dgram_addr_init()
> callback
> - refactor virtio dgram skb allocator to reduce conflicts w/ zerocopy series
> - add _fallback/_FALLBACK suffix to dgram transport variables/macros
> - add WARN_ONCE() for table_size / VSOCK_HASH issue
> - add static to vsock_find_bound_socket_common
> - dedupe code in vsock_dgram_sendmsg() using module_got var
> - drop concurrent sendmsg() for dgram and defer to future series
> - Add more tests
> - test EHOSTUNREACH in errqueue
> - test stream + dgram address collision
> - improve clarity of dgram msg bounds test code
> - Link to v4: https://lore.kernel.org/r/[email protected]
>
> Changes in v4:
> - style changes
> - vsock: use sk_vsock(vsk) in vsock_dgram_recvmsg instead of
> &sk->vsk
> - vsock: fix xmas tree declaration
> - vsock: fix spacing issues
> - virtio/vsock: virtio_transport_recv_dgram returns void because err
> unused
> - sparse analysis warnings/errors
> - virtio/vsock: fix uninitialized skerr on destroy
> - virtio/vsock: fix uninitialized err var on goto out
> - vsock: fix declarations that need static
> - vsock: fix __rcu annotation order
> - bugs
> - vsock: fix null ptr in remote_info code
> - vsock/dgram: make transport_dgram a fallback instead of first
> priority
> - vsock: remove redundant rcu read lock acquire in getname()
> - tests
> - add more tests (message bounds and more)
> - add vsock_dgram_bind() helper
> - add vsock_dgram_connect() helper
>
> Changes in v3:
> - Support multi-transport dgram, changing logic in connect/bind
> to support VMCI case
> - Support per-pkt transport lookup for sendto() case
> - Fix dgram_allow() implementation
> - Fix dgram feature bit number (now it is 3)
> - Fix binding so dgram and connectible (cid,port) spaces are
> non-overlapping
> - RCU protect transport ptr so connect() calls never leave
> a lockless read of the transport and remote_addr are always
> in sync
> - Link to v2: https://lore.kernel.org/r/[email protected]
>
> ---
> Bobby Eshleman (13):
> af_vsock: generalize vsock_dgram_recvmsg() to all transports
> af_vsock: refactor transport lookup code
> af_vsock: support multi-transport datagrams
> af_vsock: generalize bind table functions
> af_vsock: use a separate dgram bind table
> virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM
> virtio/vsock: add common datagram send path
> af_vsock: add vsock_find_bound_dgram_socket()
> virtio/vsock: add common datagram recv path
> virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
> vhost/vsock: implement datagram support
> vsock/loopback: implement datagram support
> virtio/vsock: implement datagram support
>
> Jiang Wang (1):
> test/vsock: add vsock dgram tests
>
> drivers/vhost/vsock.c | 64 ++-
> include/linux/virtio_vsock.h | 10 +-
> include/net/af_vsock.h | 14 +-
> include/uapi/linux/virtio_vsock.h | 2 +
> net/vmw_vsock/af_vsock.c | 281 ++++++++++---
> net/vmw_vsock/hyperv_transport.c | 13 -
> net/vmw_vsock/virtio_transport.c | 26 +-
> net/vmw_vsock/virtio_transport_common.c | 190 +++++++--
> net/vmw_vsock/vmci_transport.c | 60 +--
> net/vmw_vsock/vsock_loopback.c | 10 +-
> tools/testing/vsock/util.c | 141 ++++++-
> tools/testing/vsock/util.h | 6 +
> tools/testing/vsock/vsock_test.c | 680 ++++++++++++++++++++++++++++++++
> 13 files changed, 1320 insertions(+), 177 deletions(-)
> ---
> base-commit: 37cadc266ebdc7e3531111c2b3304fa01b2131e8
> change-id: 20230413-b4-vsock-dgram-3b6eba6a64e5
>
> Best regards,
> --
> Bobby Eshleman <[email protected]>
On 26.07.2023 21:21, Bobby Eshleman wrote:
> On Mon, Jul 24, 2023 at 09:11:44PM +0300, Arseniy Krasnov wrote:
>>
>>
>> On 19.07.2023 03:50, Bobby Eshleman wrote:
>>> This commit drops the transport->dgram_dequeue callback and makes
>>> vsock_dgram_recvmsg() generic to all transports.
>>>
>>> To make this possible, two transport-level changes are introduced:
>>> - implementation of the ->dgram_addr_init() callback to initialize
>>> the sockaddr_vm structure with data from incoming socket buffers.
>>> - transport implementations set the skb->data pointer to the beginning
>>> of the payload prior to adding the skb to the socket's receive queue.
>>> That is, they must use skb_pull() before enqueuing. This is an
>>> agreement between the transport and the socket layer that skb->data
>>> always points to the beginning of the payload (and not, for example,
>>> the packet header).
>>>
>>> Signed-off-by: Bobby Eshleman <[email protected]>
>>> ---
>>> drivers/vhost/vsock.c | 1 -
>>> include/linux/virtio_vsock.h | 5 ---
>>> include/net/af_vsock.h | 3 +-
>>> net/vmw_vsock/af_vsock.c | 40 ++++++++++++++++++++++-
>>> net/vmw_vsock/hyperv_transport.c | 7 ----
>>> net/vmw_vsock/virtio_transport.c | 1 -
>>> net/vmw_vsock/virtio_transport_common.c | 9 -----
>>> net/vmw_vsock/vmci_transport.c | 58 ++++++---------------------------
>>> net/vmw_vsock/vsock_loopback.c | 1 -
>>> 9 files changed, 50 insertions(+), 75 deletions(-)
>>>
>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>> index 6578db78f0ae..ae8891598a48 100644
>>> --- a/drivers/vhost/vsock.c
>>> +++ b/drivers/vhost/vsock.c
>>> @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
>>> .cancel_pkt = vhost_transport_cancel_pkt,
>>>
>>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>>> - .dgram_dequeue = virtio_transport_dgram_dequeue,
>>> .dgram_bind = virtio_transport_dgram_bind,
>>> .dgram_allow = virtio_transport_dgram_allow,
>>>
>>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>>> index c58453699ee9..18cbe8d37fca 100644
>>> --- a/include/linux/virtio_vsock.h
>>> +++ b/include/linux/virtio_vsock.h
>>> @@ -167,11 +167,6 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk,
>>> size_t len,
>>> int type);
>>> int
>>> -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
>>> - struct msghdr *msg,
>>> - size_t len, int flags);
>>> -
>>> -int
>>> virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
>>> struct msghdr *msg,
>>> size_t len);
>>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>>> index 0e7504a42925..305d57502e89 100644
>>> --- a/include/net/af_vsock.h
>>> +++ b/include/net/af_vsock.h
>>> @@ -120,11 +120,10 @@ struct vsock_transport {
>>>
>>> /* DGRAM. */
>>> int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
>>> - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
>>> - size_t len, int flags);
>>> int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
>>> struct msghdr *, size_t len);
>>> bool (*dgram_allow)(u32 cid, u32 port);
>>> + void (*dgram_addr_init)(struct sk_buff *skb, struct sockaddr_vm *addr);
>>>
>>> /* STREAM. */
>>> /* TODO: stream_bind() */
>>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>> index deb72a8c44a7..ad71e084bf2f 100644
>>> --- a/net/vmw_vsock/af_vsock.c
>>> +++ b/net/vmw_vsock/af_vsock.c
>>> @@ -1272,11 +1272,15 @@ static int vsock_dgram_connect(struct socket *sock,
>>> int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>>> size_t len, int flags)
>>> {
>>> + const struct vsock_transport *transport;
>>> #ifdef CONFIG_BPF_SYSCALL
>>> const struct proto *prot;
>>> #endif
>>> struct vsock_sock *vsk;
>>> + struct sk_buff *skb;
>>> + size_t payload_len;
>>> struct sock *sk;
>>> + int err;
>>>
>>> sk = sock->sk;
>>> vsk = vsock_sk(sk);
>>> @@ -1287,7 +1291,41 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>>> return prot->recvmsg(sk, msg, len, flags, NULL);
>>> #endif
>>>
>>> - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
>>> + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
>>> + return -EOPNOTSUPP;
>>> +
>>> + transport = vsk->transport;
>>> +
>>> + /* Retrieve the head sk_buff from the socket's receive queue. */
>>> + err = 0;
>>> + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
>>> + if (!skb)
>>> + return err;
>>> +
>>> + payload_len = skb->len;
>>> +
>>> + if (payload_len > len) {
>>> + payload_len = len;
>>> + msg->msg_flags |= MSG_TRUNC;
>>> + }
>>> +
>>> + /* Place the datagram payload in the user's iovec. */
>>> + err = skb_copy_datagram_msg(skb, 0, msg, payload_len);
>>> + if (err)
>>> + goto out;
>>> +
>>> + if (msg->msg_name) {
>>> + /* Provide the address of the sender. */
>>> + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>>> +
>>> + transport->dgram_addr_init(skb, vm_addr);
>>
>> Do we need to check that dgram_addr_init != NULL? I see that not all transports have this
>> callback set in this patch.
>>
>
> How about adding the check somewhere outside of the hotpath, such as
> when the transport is assigned?
Yes, maybe we can return ESOCKTNOSUPPORT if this callback is not provided by the transport (as we dereference
it here without any checks).
Thanks, Arseniy
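
To make the idea concrete, here is a userspace model of validating the callback once, when the transport is assigned, so the receive hot path can call it unconditionally. Every type and function name in it (transport_model, sock_model, assign_dgram_transport) is a hypothetical stand-in and not the af_vsock API; only the ESOCKTNOSUPPORT return value comes from the suggestion above.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for struct sockaddr_vm, struct vsock_transport
 * and struct vsock_sock. These are models for illustration only.
 */
struct sockaddr_vm_model {
	unsigned int cid;
	unsigned int port;
};

struct transport_model {
	void (*dgram_addr_init)(const void *pkt,
				struct sockaddr_vm_model *addr);
};

struct sock_model {
	const struct transport_model *transport;
};

/* Reject a transport that cannot fill in the sender address. After this
 * succeeds, callers may invoke sk->transport->dgram_addr_init() without
 * a NULL check on every received datagram.
 */
static int assign_dgram_transport(struct sock_model *sk,
				  const struct transport_model *t)
{
	if (!t || !t->dgram_addr_init)
		return -ESOCKTNOSUPPORT;

	sk->transport = t;
	return 0;
}
```

The trade-off is that the check is paid once per assignment rather than once per recvmsg(), at the cost of having to remember it on every path that can set the transport.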
>
>>> + msg->msg_namelen = sizeof(*vm_addr);
>>> + }
>>> + err = payload_len;
>>> +
>>> +out:
>>> + skb_free_datagram(&vsk->sk, skb);
>>> + return err;
>>> }
>>> EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
>>>
>>> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
>>> index 7cb1a9d2cdb4..7f1ea434656d 100644
>>> --- a/net/vmw_vsock/hyperv_transport.c
>>> +++ b/net/vmw_vsock/hyperv_transport.c
>>> @@ -556,12 +556,6 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
>>> return -EOPNOTSUPP;
>>> }
>>>
>>> -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
>>> - size_t len, int flags)
>>> -{
>>> - return -EOPNOTSUPP;
>>> -}
>>> -
>>> static int hvs_dgram_enqueue(struct vsock_sock *vsk,
>>> struct sockaddr_vm *remote, struct msghdr *msg,
>>> size_t dgram_len)
>>> @@ -833,7 +827,6 @@ static struct vsock_transport hvs_transport = {
>>> .shutdown = hvs_shutdown,
>>>
>>> .dgram_bind = hvs_dgram_bind,
>>> - .dgram_dequeue = hvs_dgram_dequeue,
>>> .dgram_enqueue = hvs_dgram_enqueue,
>>> .dgram_allow = hvs_dgram_allow,
>>>
>>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>>> index e95df847176b..66edffdbf303 100644
>>> --- a/net/vmw_vsock/virtio_transport.c
>>> +++ b/net/vmw_vsock/virtio_transport.c
>>> @@ -429,7 +429,6 @@ static struct virtio_transport virtio_transport = {
>>> .cancel_pkt = virtio_transport_cancel_pkt,
>>>
>>> .dgram_bind = virtio_transport_dgram_bind,
>>> - .dgram_dequeue = virtio_transport_dgram_dequeue,
>>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>>> .dgram_allow = virtio_transport_dgram_allow,
>>>
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index b769fc258931..01ea1402ad40 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -583,15 +583,6 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
>>> }
>>> EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_enqueue);
>>>
>>> -int
>>> -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
>>> - struct msghdr *msg,
>>> - size_t len, int flags)
>>> -{
>>> - return -EOPNOTSUPP;
>>> -}
>>> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
>>> -
>>> s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
>>> {
>>> struct virtio_vsock_sock *vvs = vsk->trans;
>>> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
>>> index b370070194fa..0bbbdb222245 100644
>>> --- a/net/vmw_vsock/vmci_transport.c
>>> +++ b/net/vmw_vsock/vmci_transport.c
>>> @@ -641,6 +641,7 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
>>> sock_hold(sk);
>>> skb_put(skb, size);
>>> memcpy(skb->data, dg, size);
>>> + skb_pull(skb, VMCI_DG_HEADERSIZE);
>>> sk_receive_skb(sk, skb, 0);
>>>
>>> return VMCI_SUCCESS;
>>> @@ -1731,57 +1732,18 @@ static int vmci_transport_dgram_enqueue(
>>> return err - sizeof(*dg);
>>> }
>>>
>>> -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
>>> - struct msghdr *msg, size_t len,
>>> - int flags)
>>> +static void vmci_transport_dgram_addr_init(struct sk_buff *skb,
>>> + struct sockaddr_vm *addr)
>>> {
>>> - int err;
>>> struct vmci_datagram *dg;
>>> - size_t payload_len;
>>> - struct sk_buff *skb;
>>> -
>>> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
>>> - return -EOPNOTSUPP;
>>> -
>>> - /* Retrieve the head sk_buff from the socket's receive queue. */
>>> - err = 0;
>>> - skb = skb_recv_datagram(&vsk->sk, flags, &err);
>>> - if (!skb)
>>> - return err;
>>> -
>>> - dg = (struct vmci_datagram *)skb->data;
>>> - if (!dg)
>>> - /* err is 0, meaning we read zero bytes. */
>>> - goto out;
>>> -
>>> - payload_len = dg->payload_size;
>>> - /* Ensure the sk_buff matches the payload size claimed in the packet. */
>>> - if (payload_len != skb->len - sizeof(*dg)) {
>>> - err = -EINVAL;
>>> - goto out;
>>> - }
>>> -
>>> - if (payload_len > len) {
>>> - payload_len = len;
>>> - msg->msg_flags |= MSG_TRUNC;
>>> - }
>>> + unsigned int cid, port;
>>>
>>> - /* Place the datagram payload in the user's iovec. */
>>> - err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
>>> - if (err)
>>> - goto out;
>>> -
>>> - if (msg->msg_name) {
>>> - /* Provide the address of the sender. */
>>> - DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>>> - vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
>>> - msg->msg_namelen = sizeof(*vm_addr);
>>> - }
>>> - err = payload_len;
>>> + WARN_ONCE(skb->head == skb->data, "vmci vsock bug: bad dgram skb");
>>>
>>> -out:
>>> - skb_free_datagram(&vsk->sk, skb);
>>> - return err;
>>> + dg = (struct vmci_datagram *)skb->head;
>>> + cid = dg->src.context;
>>> + port = dg->src.resource;
>>> + vsock_addr_init(addr, cid, port);
>>
>> I think we
>>
>> 1) can shorten this to:
>>
>> vsock_addr_init(addr, dg->src.context, dg->src.resource);
>>
>> 2) w/o previous point, cid and port better be u32, as VMCI structure has u32 fields 'context' and
>> 'resource' and 'vsock_addr_init()' also has u32 type for both arguments.
>>
>> Thanks, Arseniy
>
> Sounds good, thanks.
>
>>
>>> }
>>>
>>> static bool vmci_transport_dgram_allow(u32 cid, u32 port)
>>> @@ -2040,9 +2002,9 @@ static struct vsock_transport vmci_transport = {
>>> .release = vmci_transport_release,
>>> .connect = vmci_transport_connect,
>>> .dgram_bind = vmci_transport_dgram_bind,
>>> - .dgram_dequeue = vmci_transport_dgram_dequeue,
>>> .dgram_enqueue = vmci_transport_dgram_enqueue,
>>> .dgram_allow = vmci_transport_dgram_allow,
>>> + .dgram_addr_init = vmci_transport_dgram_addr_init,
>>> .stream_dequeue = vmci_transport_stream_dequeue,
>>> .stream_enqueue = vmci_transport_stream_enqueue,
>>> .stream_has_data = vmci_transport_stream_has_data,
>>> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>>> index 5c6360df1f31..2a59dd177c74 100644
>>> --- a/net/vmw_vsock/vsock_loopback.c
>>> +++ b/net/vmw_vsock/vsock_loopback.c
>>> @@ -62,7 +62,6 @@ static struct virtio_transport loopback_transport = {
>>> .cancel_pkt = vsock_loopback_cancel_pkt,
>>>
>>> .dgram_bind = virtio_transport_dgram_bind,
>>> - .dgram_dequeue = virtio_transport_dgram_dequeue,
>>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>>> .dgram_allow = virtio_transport_dgram_allow,
>>>
>>>
>
> Thanks,
> Bobby
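
The generic vsock_dgram_recvmsg() quoted above copies at most len bytes and sets MSG_TRUNC in msg->msg_flags when the datagram is larger than the caller's buffer. The userspace sketch below shows how an application can observe that flag. The wrapper name recv_dgram() is hypothetical; it relies only on recvmsg(2), and it is written against generic datagram socket semantics rather than anything vsock-specific.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical wrapper: receive one datagram into buf.
 * Returns the number of bytes copied, or -1 on error. On success,
 * *truncated is 1 if the datagram did not fit in buf and 0 otherwise.
 */
static ssize_t recv_dgram(int fd, void *buf, size_t len, int *truncated)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	ssize_t n;

	n = recvmsg(fd, &msg, 0);
	if (n >= 0)
		*truncated = (msg.msg_flags & MSG_TRUNC) != 0;

	return n;
}
```

The excess bytes of a truncated datagram are discarded, so a caller that cares about message boundaries should size its buffer for the largest datagram it expects.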
On 26.07.2023 20:08, Bobby Eshleman wrote:
> On Sat, Jul 22, 2023 at 11:16:05AM +0300, Arseniy Krasnov wrote:
>>
>>
>> On 19.07.2023 03:50, Bobby Eshleman wrote:
>>> This commit implements the common function
>>> virtio_transport_dgram_enqueue for enqueueing datagrams. It does not add
>>> usage in either vhost or virtio yet.
>>>
>>> Signed-off-by: Bobby Eshleman <[email protected]>
>>> ---
>>> net/vmw_vsock/virtio_transport_common.c | 76 ++++++++++++++++++++++++++++++++-
>>> 1 file changed, 75 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index ffcbdd77feaa..3bfaff758433 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -819,7 +819,81 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
>>> struct msghdr *msg,
>>> size_t dgram_len)
>>> {
>>> - return -EOPNOTSUPP;
>>> + /* Here we are only using the info struct to retain style uniformity
>>> + * and to ease future refactoring and merging.
>>> + */
>>> + struct virtio_vsock_pkt_info info_stack = {
>>> + .op = VIRTIO_VSOCK_OP_RW,
>>> + .msg = msg,
>>> + .vsk = vsk,
>>> + .type = VIRTIO_VSOCK_TYPE_DGRAM,
>>> + };
>>> + const struct virtio_transport *t_ops;
>>> + struct virtio_vsock_pkt_info *info;
>>> + struct sock *sk = sk_vsock(vsk);
>>> + struct virtio_vsock_hdr *hdr;
>>> + u32 src_cid, src_port;
>>> + struct sk_buff *skb;
>>> + void *payload;
>>> + int noblock;
>>> + int err;
>>> +
>>> + info = &info_stack;
>>
>> I think the 'info' assignment could be moved below, to the place where it is
>> first used.
>>
>>> +
>>> + if (dgram_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
>>> + return -EMSGSIZE;
>>> +
>>> + t_ops = virtio_transport_get_ops(vsk);
>>> + if (unlikely(!t_ops))
>>> + return -EFAULT;
>>> +
>>> + /* Unlike some of our other sending functions, this function is not
>>> + * intended for use without a msghdr.
>>> + */
>>> + if (WARN_ONCE(!msg, "vsock dgram bug: no msghdr found for dgram enqueue\n"))
>>> + return -EFAULT;
>>
>> Sorry, but is that possible? I thought 'msg' is always provided by the general socket layer (e.g. before
>> the af_vsock.c code) and can't be NULL for DGRAM. Please correct me if I'm wrong.
>>
>> Also, I see that in af_vsock.c 'vsock_dgram_sendmsg()' dereferences 'msg' to check MSG_OOB without any
>> checks (before calling the transport callback, which is this function in the case of virtio). So I think if we want to keep
>> this type of check, it must be placed in af_vsock.c or somewhere before the first dereference of this pointer.
>>
>
> There is some talk about dgram sockets adding additional message types
> in the future that help with congestion control. Those messages won't
> come from the socket layer, so msghdr will be null. Since there is no
> other function for sending datagrams, it seemed likely that this
> function would be reworked for that purpose. I felt that adding this
> check was a direct way to make it explicit that this function is
> currently designed only for the socket-layer caller.
>
> Perhaps a comment would suffice?
I see, thanks, it is for future usage. Sorry for the dumb question, but if msg is NULL, how
will we decide what to do in this call? Will the interface of this callback be updated, or
will some fields of 'vsock_sock' contain the type of such messages?
Thanks, Arseniy
>
>>> +
>>> + noblock = msg->msg_flags & MSG_DONTWAIT;
>>> +
>>> + /* Use sock_alloc_send_skb to throttle by sk_sndbuf. This helps avoid
>>> + * triggering the OOM.
>>> + */
>>> + skb = sock_alloc_send_skb(sk, dgram_len + VIRTIO_VSOCK_SKB_HEADROOM,
>>> + noblock, &err);
>>> + if (!skb)
>>> + return err;
>>> +
>>> + skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
>>> +
>>> + src_cid = t_ops->transport.get_local_cid();
>>> + src_port = vsk->local_addr.svm_port;
>>> +
>>> + hdr = virtio_vsock_hdr(skb);
>>> + hdr->type = cpu_to_le16(info->type);
>>> + hdr->op = cpu_to_le16(info->op);
>>> + hdr->src_cid = cpu_to_le64(src_cid);
>>> + hdr->dst_cid = cpu_to_le64(remote_addr->svm_cid);
>>> + hdr->src_port = cpu_to_le32(src_port);
>>> + hdr->dst_port = cpu_to_le32(remote_addr->svm_port);
>>> + hdr->flags = cpu_to_le32(info->flags);
>>> + hdr->len = cpu_to_le32(dgram_len);
>>> +
>>> + skb_set_owner_w(skb, sk);
>>> +
>>> + payload = skb_put(skb, dgram_len);
>>> + err = memcpy_from_msg(payload, msg, dgram_len);
>>> + if (err)
>>> + return err;
>>
>> Do we need to free the allocated skb here?
>>
>
> Yep, thanks.
>
>>> +
>>> + trace_virtio_transport_alloc_pkt(src_cid, src_port,
>>> + remote_addr->svm_cid,
>>> + remote_addr->svm_port,
>>> + dgram_len,
>>> + info->type,
>>> + info->op,
>>> + 0);
>>> +
>>> + return t_ops->send_pkt(skb);
>>> }
>>> EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>>>
>>>
>>
>> Thanks, Arseniy
>
> Thanks for the review!
>
> Best,
> Bobby
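
The skb leak agreed on above (returning straight from a failed memcpy_from_msg() without releasing the buffer) is an instance of a general pattern: once a buffer has been allocated, every later failure has to unwind it. Below is a self-contained userspace model of the goto-based cleanup usually used for this. The names struct pkt, copy_payload() and build_dgram() are hypothetical stand-ins, with malloc()/free() playing the role of skb allocation and kfree_skb(); it is a sketch of the pattern, not the fix that will land in the series.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an skb: a length plus a heap-allocated payload. */
struct pkt {
	size_t len;
	unsigned char *data;
};

/* Stand-in for memcpy_from_msg(): fails with -EFAULT on a NULL source. */
static int copy_payload(void *dst, const void *src, size_t len)
{
	if (!src)
		return -EFAULT;

	memcpy(dst, src, len);
	return 0;
}

/* Build a packet from payload. On any failure, everything allocated so
 * far is released before returning, so the caller never owns a
 * half-built packet. On success, *out holds the new packet.
 */
static int build_dgram(const void *payload, size_t len, struct pkt **out)
{
	struct pkt *p;
	int err;

	p = malloc(sizeof(*p));
	if (!p)
		return -ENOMEM;

	p->data = malloc(len ? len : 1);
	if (!p->data) {
		err = -ENOMEM;
		goto free_pkt;
	}
	p->len = len;

	err = copy_payload(p->data, payload, len);
	if (err)
		goto free_data;	/* the step missing from the quoted patch */

	*out = p;
	return 0;

free_data:
	free(p->data);
free_pkt:
	free(p);
	return err;
}
```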
On 26.07.2023 20:55, Bobby Eshleman wrote:
> On Sat, Jul 22, 2023 at 11:42:38AM +0300, Arseniy Krasnov wrote:
>>
>>
>> On 19.07.2023 03:50, Bobby Eshleman wrote:
>>> This commit implements datagram support for vhost/vsock by teaching
>>> vhost to use the common virtio transport datagram functions.
>>>
>>> If the virtio RX buffer is too small, then the transmission is
>>> abandoned, the packet dropped, and EHOSTUNREACH is added to the socket's
>>> error queue.
>>>
>>> Signed-off-by: Bobby Eshleman <[email protected]>
>>> ---
>>> drivers/vhost/vsock.c | 62 +++++++++++++++++++++++++++++++++++++++++++++---
>>> net/vmw_vsock/af_vsock.c | 5 +++-
>>> 2 files changed, 63 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>> index d5d6a3c3f273..da14260c6654 100644
>>> --- a/drivers/vhost/vsock.c
>>> +++ b/drivers/vhost/vsock.c
>>> @@ -8,6 +8,7 @@
>>> */
>>> #include <linux/miscdevice.h>
>>> #include <linux/atomic.h>
>>> +#include <linux/errqueue.h>
>>> #include <linux/module.h>
>>> #include <linux/mutex.h>
>>> #include <linux/vmalloc.h>
>>> @@ -32,7 +33,8 @@
>>> enum {
>>> VHOST_VSOCK_FEATURES = VHOST_FEATURES |
>>> (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
>>> - (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
>>> + (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
>>> + (1ULL << VIRTIO_VSOCK_F_DGRAM)
>>> };
>>>
>>> enum {
>>> @@ -56,6 +58,7 @@ struct vhost_vsock {
>>> atomic_t queued_replies;
>>>
>>> u32 guest_cid;
>>> + bool dgram_allow;
>>> bool seqpacket_allow;
>>> };
>>>
>>> @@ -86,6 +89,32 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>>> return NULL;
>>> }
>>>
>>> +/* Claims ownership of the skb, do not free the skb after calling! */
>>> +static void
>>> +vhost_transport_error(struct sk_buff *skb, int err)
>>> +{
>>> + struct sock_exterr_skb *serr;
>>> + struct sock *sk = skb->sk;
>>> + struct sk_buff *clone;
>>> +
>>> + serr = SKB_EXT_ERR(skb);
>>> + memset(serr, 0, sizeof(*serr));
>>> + serr->ee.ee_errno = err;
>>> + serr->ee.ee_origin = SO_EE_ORIGIN_NONE;
>>> +
>>> + clone = skb_clone(skb, GFP_KERNEL);
>>
>> May for skb which is error carrier we can use 'sock_omalloc()', not 'skb_clone()' ? TCP uses skb
>> allocated by this function as carriers of error structure. I guess 'skb_clone()' also clones data of origin,
>> but i think that there is no need in data as we insert it to error queue of the socket.
>>
>> What do You think?
>
> IIUC skb_clone() is often used in this scenario so that the user can
> retrieve the error-causing packet from the error queue. Is there some
> reason we shouldn't do this?
>
> I'm seeing that the serr bits need to occur on the clone here, not the
> original. I didn't realize the SKB_EXT_ERR() is a skb->cb cast. I'm not
> actually sure how this passes the test case since ->cb isn't cloned.
>
>>
>>> + if (!clone)
>>> + return;
>>
>> What will happen here 'if (!clone)' ? skb will leak as it was removed from queue?
>>
>
> Ah yes, true.
>
>>> +
>>> + if (sock_queue_err_skb(sk, clone))
>>> + kfree_skb(clone);
>>> +
>>> + sk->sk_err = err;
>>> + sk_error_report(sk);
>>> +
>>> + kfree_skb(skb);
>>> +}
>>> +
>>> static void
>>> vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>>> struct vhost_virtqueue *vq)
>>> @@ -160,9 +189,15 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>>> hdr = virtio_vsock_hdr(skb);
>>>
>>> /* If the packet is greater than the space available in the
>>> - * buffer, we split it using multiple buffers.
>>> + * buffer, we split it using multiple buffers for connectible
>>> + * sockets and drop the packet for datagram sockets.
>>> */
>>> if (payload_len > iov_len - sizeof(*hdr)) {
>>> + if (le16_to_cpu(hdr->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
>>> + vhost_transport_error(skb, EHOSTUNREACH);
>>> + continue;
>>> + }
>>> +
>>> payload_len = iov_len - sizeof(*hdr);
>>>
>>> /* As we are copying pieces of large packet's buffer to
>>> @@ -394,6 +429,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
>>> return val < vq->num;
>>> }
>>>
>>> +static bool vhost_transport_dgram_allow(u32 cid, u32 port);
>>> static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>>>
>>> static struct virtio_transport vhost_transport = {
>>> @@ -410,7 +446,8 @@ static struct virtio_transport vhost_transport = {
>>> .cancel_pkt = vhost_transport_cancel_pkt,
>>>
>>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>>> - .dgram_allow = virtio_transport_dgram_allow,
>>> + .dgram_allow = vhost_transport_dgram_allow,
>>> + .dgram_addr_init = virtio_transport_dgram_addr_init,
>>>
>>> .stream_enqueue = virtio_transport_stream_enqueue,
>>> .stream_dequeue = virtio_transport_stream_dequeue,
>>> @@ -443,6 +480,22 @@ static struct virtio_transport vhost_transport = {
>>> .send_pkt = vhost_transport_send_pkt,
>>> };
>>>
>>> +static bool vhost_transport_dgram_allow(u32 cid, u32 port)
>>> +{
>>> + struct vhost_vsock *vsock;
>>> + bool dgram_allow = false;
>>> +
>>> + rcu_read_lock();
>>> + vsock = vhost_vsock_get(cid);
>>> +
>>> + if (vsock)
>>> + dgram_allow = vsock->dgram_allow;
>>> +
>>> + rcu_read_unlock();
>>> +
>>> + return dgram_allow;
>>> +}
>>> +
>>> static bool vhost_transport_seqpacket_allow(u32 remote_cid)
>>> {
>>> struct vhost_vsock *vsock;
>>> @@ -799,6 +852,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
>>> if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
>>> vsock->seqpacket_allow = true;
>>>
>>> + if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
>>> + vsock->dgram_allow = true;
>>> +
>>> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
>>> vq = &vsock->vqs[i];
>>> mutex_lock(&vq->mutex);
>>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>> index e73f3b2c52f1..449ed63ac2b0 100644
>>> --- a/net/vmw_vsock/af_vsock.c
>>> +++ b/net/vmw_vsock/af_vsock.c
>>> @@ -1427,9 +1427,12 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>>> return prot->recvmsg(sk, msg, len, flags, NULL);
>>> #endif
>>>
>>> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
>>> + if (unlikely(flags & MSG_OOB))
>>> return -EOPNOTSUPP;
>>>
>>> + if (unlikely(flags & MSG_ERRQUEUE))
>>> + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
>>> +
>>
>> Sorry, but I get build error here, because SOL_VSOCK in undefined. I think it should be added to
>> include/linux/socket.h and to uapi files also for future use in userspace.
>>
>
> Strange, I built each patch individually without issue. My base is
> netdev/main with your SOL_VSOCK patch applied. I will look today and see
> if I'm missing something.
I see, that is the difference: I'm trying to run this patchset on the latest
net-next (as it is supposed to be merged to net-next). I guess you should add
this define anyway once you are ready to be merged to net-next (I really don't
know which SOL_VSOCK will be merged first, "yours" or "mine" :) )

Thanks, Arseniy
>
>> Also Stefano Garzarella <[email protected]> suggested to add define something like VSOCK_RECVERR,
>> in the same way as IP_RECVERR, and use it as last parameter of 'sock_recv_errqueue()'.
>>
>
> Got it, thanks.
>
>>> transport = vsk->transport;
>>>
>>> /* Retrieve the head sk_buff from the socket's receive queue. */
>>>
>>
>> Thanks, Arseniy
>
> Thanks,
> Bobby
On 26.07.2023 20:58, Bobby Eshleman wrote:
> On Sat, Jul 22, 2023 at 11:45:29AM +0300, Arseniy Krasnov wrote:
>>
>>
>> On 19.07.2023 03:50, Bobby Eshleman wrote:
>>> This commit implements datagram support for virtio/vsock by teaching
>>> virtio to use the general virtio transport ->dgram_addr_init() function
>>> and implementing a new version of ->dgram_allow().
>>>
>>> Additionally, it drops virtio_transport_dgram_allow() as an exported
>>> symbol because it is no longer used in other transports.
>>>
>>> Signed-off-by: Bobby Eshleman <[email protected]>
>>> ---
>>> include/linux/virtio_vsock.h | 1 -
>>> net/vmw_vsock/virtio_transport.c | 24 +++++++++++++++++++++++-
>>> net/vmw_vsock/virtio_transport_common.c | 6 ------
>>> 3 files changed, 23 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>>> index b3856b8a42b3..d0a4f08b12c1 100644
>>> --- a/include/linux/virtio_vsock.h
>>> +++ b/include/linux/virtio_vsock.h
>>> @@ -211,7 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
>>> u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
>>> bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
>>> bool virtio_transport_stream_allow(u32 cid, u32 port);
>>> -bool virtio_transport_dgram_allow(u32 cid, u32 port);
>>> void virtio_transport_dgram_addr_init(struct sk_buff *skb,
>>> struct sockaddr_vm *addr);
>>>
>>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>>> index ac2126c7dac5..713718861bd4 100644
>>> --- a/net/vmw_vsock/virtio_transport.c
>>> +++ b/net/vmw_vsock/virtio_transport.c
>>> @@ -63,6 +63,7 @@ struct virtio_vsock {
>>>
>>> u32 guest_cid;
>>> bool seqpacket_allow;
>>> + bool dgram_allow;
>>> };
>>>
>>> static u32 virtio_transport_get_local_cid(void)
>>> @@ -413,6 +414,7 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
>>> queue_work(virtio_vsock_workqueue, &vsock->rx_work);
>>> }
>>>
>>> +static bool virtio_transport_dgram_allow(u32 cid, u32 port);
>>
>> May be add body here? Without prototyping? Same for loopback and vhost.
>>
>
> Sounds okay with me, but this seems to go against the pattern
> established by seqpacket. Any reason why?
Stefano Garzarella <[email protected]> commented on my patch, which used the same approach:

https://lore.kernel.org/netdev/lex6l5suez7azhirt22lidndtjomkbagfbpvvi5p7c2t7klzas@4l2qly7at37c/

Thanks, Arseniy
>
>>> static bool virtio_transport_seqpacket_allow(u32 remote_cid);
>>>
>>> static struct virtio_transport virtio_transport = {
>>> @@ -430,6 +432,7 @@ static struct virtio_transport virtio_transport = {
>>>
>>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>>> .dgram_allow = virtio_transport_dgram_allow,
>>> + .dgram_addr_init = virtio_transport_dgram_addr_init,
>>>
>>> .stream_dequeue = virtio_transport_stream_dequeue,
>>> .stream_enqueue = virtio_transport_stream_enqueue,
>>> @@ -462,6 +465,21 @@ static struct virtio_transport virtio_transport = {
>>> .send_pkt = virtio_transport_send_pkt,
>>> };
>>>
>>> +static bool virtio_transport_dgram_allow(u32 cid, u32 port)
>>> +{
>>> + struct virtio_vsock *vsock;
>>> + bool dgram_allow;
>>> +
>>> + dgram_allow = false;
>>> + rcu_read_lock();
>>> + vsock = rcu_dereference(the_virtio_vsock);
>>> + if (vsock)
>>> + dgram_allow = vsock->dgram_allow;
>>> + rcu_read_unlock();
>>> +
>>> + return dgram_allow;
>>> +}
>>> +
>>> static bool virtio_transport_seqpacket_allow(u32 remote_cid)
>>> {
>>> struct virtio_vsock *vsock;
>>> @@ -655,6 +673,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
>>> if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
>>> vsock->seqpacket_allow = true;
>>>
>>> + if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
>>> + vsock->dgram_allow = true;
>>> +
>>> vdev->priv = vsock;
>>>
>>> ret = virtio_vsock_vqs_init(vsock);
>>> @@ -747,7 +768,8 @@ static struct virtio_device_id id_table[] = {
>>> };
>>>
>>> static unsigned int features[] = {
>>> - VIRTIO_VSOCK_F_SEQPACKET
>>> + VIRTIO_VSOCK_F_SEQPACKET,
>>> + VIRTIO_VSOCK_F_DGRAM
>>> };
>>>
>>> static struct virtio_driver virtio_vsock_driver = {
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index 96118e258097..77898f5325cd 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -783,12 +783,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
>>> }
>>> EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
>>>
>>> -bool virtio_transport_dgram_allow(u32 cid, u32 port)
>>> -{
>>> - return false;
>>> -}
>>> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
>>> -
>>> int virtio_transport_connect(struct vsock_sock *vsk)
>>> {
>>> struct virtio_vsock_pkt_info info = {
>>>
>>
>> Thanks, Arseniy
>
> Thanks,
> Bobby
On 26.07.2023 20:55, Bobby Eshleman wrote:
> On Sat, Jul 22, 2023 at 11:42:38AM +0300, Arseniy Krasnov wrote:
>>
>>
>> On 19.07.2023 03:50, Bobby Eshleman wrote:
>>> This commit implements datagram support for vhost/vsock by teaching
>>> vhost to use the common virtio transport datagram functions.
>>>
>>> If the virtio RX buffer is too small, then the transmission is
>>> abandoned, the packet dropped, and EHOSTUNREACH is added to the socket's
>>> error queue.
>>>
>>> Signed-off-by: Bobby Eshleman <[email protected]>
>>> ---
>>> drivers/vhost/vsock.c | 62 +++++++++++++++++++++++++++++++++++++++++++++---
>>> net/vmw_vsock/af_vsock.c | 5 +++-
>>> 2 files changed, 63 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>> index d5d6a3c3f273..da14260c6654 100644
>>> --- a/drivers/vhost/vsock.c
>>> +++ b/drivers/vhost/vsock.c
>>> @@ -8,6 +8,7 @@
>>> */
>>> #include <linux/miscdevice.h>
>>> #include <linux/atomic.h>
>>> +#include <linux/errqueue.h>
>>> #include <linux/module.h>
>>> #include <linux/mutex.h>
>>> #include <linux/vmalloc.h>
>>> @@ -32,7 +33,8 @@
>>> enum {
>>> VHOST_VSOCK_FEATURES = VHOST_FEATURES |
>>> (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
>>> - (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
>>> + (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
>>> + (1ULL << VIRTIO_VSOCK_F_DGRAM)
>>> };
>>>
>>> enum {
>>> @@ -56,6 +58,7 @@ struct vhost_vsock {
>>> atomic_t queued_replies;
>>>
>>> u32 guest_cid;
>>> + bool dgram_allow;
>>> bool seqpacket_allow;
>>> };
>>>
>>> @@ -86,6 +89,32 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>>> return NULL;
>>> }
>>>
>>> +/* Claims ownership of the skb, do not free the skb after calling! */
>>> +static void
>>> +vhost_transport_error(struct sk_buff *skb, int err)
>>> +{
>>> + struct sock_exterr_skb *serr;
>>> + struct sock *sk = skb->sk;
>>> + struct sk_buff *clone;
>>> +
>>> + serr = SKB_EXT_ERR(skb);
>>> + memset(serr, 0, sizeof(*serr));
>>> + serr->ee.ee_errno = err;
>>> + serr->ee.ee_origin = SO_EE_ORIGIN_NONE;
>>> +
>>> + clone = skb_clone(skb, GFP_KERNEL);
>>
>> May for skb which is error carrier we can use 'sock_omalloc()', not 'skb_clone()' ? TCP uses skb
>> allocated by this function as carriers of error structure. I guess 'skb_clone()' also clones data of origin,
>> but i think that there is no need in data as we insert it to error queue of the socket.
>>
>> What do You think?
>
> IIUC skb_clone() is often used in this scenario so that the user can
> retrieve the error-causing packet from the error queue. Is there some
> reason we shouldn't do this?
>
> I'm seeing that the serr bits need to occur on the clone here, not the
> original. I didn't realize the SKB_EXT_ERR() is a skb->cb cast. I'm not
> actually sure how this passes the test case since ->cb isn't cloned.
Ah yes, sorry, you are right. I confused this case with zerocopy completion
handling: there we allocate an "empty" skb which carries completion metadata
in its 'cb' field.

Hm, but can't we just reinsert the current skb (updating its 'cb' as
'sock_exterr_skb') into the socket's error queue without cloning it?

Thanks, Arseniy
>
>>
>>> + if (!clone)
>>> + return;
>>
>> What will happen here 'if (!clone)' ? skb will leak as it was removed from queue?
>>
>
> Ah yes, true.
>
>>> +
>>> + if (sock_queue_err_skb(sk, clone))
>>> + kfree_skb(clone);
>>> +
>>> + sk->sk_err = err;
>>> + sk_error_report(sk);
>>> +
>>> + kfree_skb(skb);
>>> +}
>>> +
>>> static void
>>> vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>>> struct vhost_virtqueue *vq)
>>> @@ -160,9 +189,15 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>>> hdr = virtio_vsock_hdr(skb);
>>>
>>> /* If the packet is greater than the space available in the
>>> - * buffer, we split it using multiple buffers.
>>> + * buffer, we split it using multiple buffers for connectible
>>> + * sockets and drop the packet for datagram sockets.
>>> */
>>> if (payload_len > iov_len - sizeof(*hdr)) {
>>> + if (le16_to_cpu(hdr->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
>>> + vhost_transport_error(skb, EHOSTUNREACH);
>>> + continue;
>>> + }
>>> +
>>> payload_len = iov_len - sizeof(*hdr);
>>>
>>> /* As we are copying pieces of large packet's buffer to
>>> @@ -394,6 +429,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
>>> return val < vq->num;
>>> }
>>>
>>> +static bool vhost_transport_dgram_allow(u32 cid, u32 port);
>>> static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>>>
>>> static struct virtio_transport vhost_transport = {
>>> @@ -410,7 +446,8 @@ static struct virtio_transport vhost_transport = {
>>> .cancel_pkt = vhost_transport_cancel_pkt,
>>>
>>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>>> - .dgram_allow = virtio_transport_dgram_allow,
>>> + .dgram_allow = vhost_transport_dgram_allow,
>>> + .dgram_addr_init = virtio_transport_dgram_addr_init,
>>>
>>> .stream_enqueue = virtio_transport_stream_enqueue,
>>> .stream_dequeue = virtio_transport_stream_dequeue,
>>> @@ -443,6 +480,22 @@ static struct virtio_transport vhost_transport = {
>>> .send_pkt = vhost_transport_send_pkt,
>>> };
>>>
>>> +static bool vhost_transport_dgram_allow(u32 cid, u32 port)
>>> +{
>>> + struct vhost_vsock *vsock;
>>> + bool dgram_allow = false;
>>> +
>>> + rcu_read_lock();
>>> + vsock = vhost_vsock_get(cid);
>>> +
>>> + if (vsock)
>>> + dgram_allow = vsock->dgram_allow;
>>> +
>>> + rcu_read_unlock();
>>> +
>>> + return dgram_allow;
>>> +}
>>> +
>>> static bool vhost_transport_seqpacket_allow(u32 remote_cid)
>>> {
>>> struct vhost_vsock *vsock;
>>> @@ -799,6 +852,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
>>> if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
>>> vsock->seqpacket_allow = true;
>>>
>>> + if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
>>> + vsock->dgram_allow = true;
>>> +
>>> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
>>> vq = &vsock->vqs[i];
>>> mutex_lock(&vq->mutex);
>>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>> index e73f3b2c52f1..449ed63ac2b0 100644
>>> --- a/net/vmw_vsock/af_vsock.c
>>> +++ b/net/vmw_vsock/af_vsock.c
>>> @@ -1427,9 +1427,12 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>>> return prot->recvmsg(sk, msg, len, flags, NULL);
>>> #endif
>>>
>>> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
>>> + if (unlikely(flags & MSG_OOB))
>>> return -EOPNOTSUPP;
>>>
>>> + if (unlikely(flags & MSG_ERRQUEUE))
>>> + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
>>> +
>>
>> Sorry, but I get build error here, because SOL_VSOCK in undefined. I think it should be added to
>> include/linux/socket.h and to uapi files also for future use in userspace.
>>
>
> Strange, I built each patch individually without issue. My base is
> netdev/main with your SOL_VSOCK patch applied. I will look today and see
> if I'm missing something.
>
>> Also Stefano Garzarella <[email protected]> suggested to add define something like VSOCK_RECVERR,
>> in the same way as IP_RECVERR, and use it as last parameter of 'sock_recv_errqueue()'.
>>
>
> Got it, thanks.
>
>>> transport = vsk->transport;
>>>
>>> /* Retrieve the head sk_buff from the socket's receive queue. */
>>>
>>
>> Thanks, Arseniy
>
> Thanks,
> Bobby
On Thu, Jul 27, 2023 at 11:09:21AM +0300, Arseniy Krasnov wrote:
>
>
> On 26.07.2023 20:58, Bobby Eshleman wrote:
> > On Sat, Jul 22, 2023 at 11:45:29AM +0300, Arseniy Krasnov wrote:
> >>
> >>
> >> On 19.07.2023 03:50, Bobby Eshleman wrote:
> >>> This commit implements datagram support for virtio/vsock by teaching
> >>> virtio to use the general virtio transport ->dgram_addr_init() function
> >>> and implementing a new version of ->dgram_allow().
> >>>
> >>> Additionally, it drops virtio_transport_dgram_allow() as an exported
> >>> symbol because it is no longer used in other transports.
> >>>
> >>> Signed-off-by: Bobby Eshleman <[email protected]>
> >>> ---
> >>> include/linux/virtio_vsock.h | 1 -
> >>> net/vmw_vsock/virtio_transport.c | 24 +++++++++++++++++++++++-
> >>> net/vmw_vsock/virtio_transport_common.c | 6 ------
> >>> 3 files changed, 23 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> >>> index b3856b8a42b3..d0a4f08b12c1 100644
> >>> --- a/include/linux/virtio_vsock.h
> >>> +++ b/include/linux/virtio_vsock.h
> >>> @@ -211,7 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> >>> u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> >>> bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> >>> bool virtio_transport_stream_allow(u32 cid, u32 port);
> >>> -bool virtio_transport_dgram_allow(u32 cid, u32 port);
> >>> void virtio_transport_dgram_addr_init(struct sk_buff *skb,
> >>> struct sockaddr_vm *addr);
> >>>
> >>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> >>> index ac2126c7dac5..713718861bd4 100644
> >>> --- a/net/vmw_vsock/virtio_transport.c
> >>> +++ b/net/vmw_vsock/virtio_transport.c
> >>> @@ -63,6 +63,7 @@ struct virtio_vsock {
> >>>
> >>> u32 guest_cid;
> >>> bool seqpacket_allow;
> >>> + bool dgram_allow;
> >>> };
> >>>
> >>> static u32 virtio_transport_get_local_cid(void)
> >>> @@ -413,6 +414,7 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> >>> queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> >>> }
> >>>
> >>> +static bool virtio_transport_dgram_allow(u32 cid, u32 port);
> >>
> >> May be add body here? Without prototyping? Same for loopback and vhost.
> >>
> >
> > Sounds okay with me, but this seems to go against the pattern
> > established by seqpacket. Any reason why?
>
> Stefano Garzarella <[email protected]> commented my patch with the same approach:
>
> https://lore.kernel.org/netdev/lex6l5suez7azhirt22lidndtjomkbagfbpvvi5p7c2t7klzas@4l2qly7at37c/
>
> Thanks, Arseniy
>
Gotcha, sounds good.
Thanks,
Bobby
>
> >
> >>> static bool virtio_transport_seqpacket_allow(u32 remote_cid);
> >>>
> >>> static struct virtio_transport virtio_transport = {
> >>> @@ -430,6 +432,7 @@ static struct virtio_transport virtio_transport = {
> >>>
> >>> .dgram_enqueue = virtio_transport_dgram_enqueue,
> >>> .dgram_allow = virtio_transport_dgram_allow,
> >>> + .dgram_addr_init = virtio_transport_dgram_addr_init,
> >>>
> >>> .stream_dequeue = virtio_transport_stream_dequeue,
> >>> .stream_enqueue = virtio_transport_stream_enqueue,
> >>> @@ -462,6 +465,21 @@ static struct virtio_transport virtio_transport = {
> >>> .send_pkt = virtio_transport_send_pkt,
> >>> };
> >>>
> >>> +static bool virtio_transport_dgram_allow(u32 cid, u32 port)
> >>> +{
> >>> + struct virtio_vsock *vsock;
> >>> + bool dgram_allow;
> >>> +
> >>> + dgram_allow = false;
> >>> + rcu_read_lock();
> >>> + vsock = rcu_dereference(the_virtio_vsock);
> >>> + if (vsock)
> >>> + dgram_allow = vsock->dgram_allow;
> >>> + rcu_read_unlock();
> >>> +
> >>> + return dgram_allow;
> >>> +}
> >>> +
> >>> static bool virtio_transport_seqpacket_allow(u32 remote_cid)
> >>> {
> >>> struct virtio_vsock *vsock;
> >>> @@ -655,6 +673,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> >>> if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
> >>> vsock->seqpacket_allow = true;
> >>>
> >>> + if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
> >>> + vsock->dgram_allow = true;
> >>> +
> >>> vdev->priv = vsock;
> >>>
> >>> ret = virtio_vsock_vqs_init(vsock);
> >>> @@ -747,7 +768,8 @@ static struct virtio_device_id id_table[] = {
> >>> };
> >>>
> >>> static unsigned int features[] = {
> >>> - VIRTIO_VSOCK_F_SEQPACKET
> >>> + VIRTIO_VSOCK_F_SEQPACKET,
> >>> + VIRTIO_VSOCK_F_DGRAM
> >>> };
> >>>
> >>> static struct virtio_driver virtio_vsock_driver = {
> >>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >>> index 96118e258097..77898f5325cd 100644
> >>> --- a/net/vmw_vsock/virtio_transport_common.c
> >>> +++ b/net/vmw_vsock/virtio_transport_common.c
> >>> @@ -783,12 +783,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> >>> }
> >>> EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> >>>
> >>> -bool virtio_transport_dgram_allow(u32 cid, u32 port)
> >>> -{
> >>> - return false;
> >>> -}
> >>> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
> >>> -
> >>> int virtio_transport_connect(struct vsock_sock *vsk)
> >>> {
> >>> struct virtio_vsock_pkt_info info = {
> >>>
> >>
> >> Thanks, Arseniy
> >
> > Thanks,
> > Bobby
On Wed, Jul 26, 2023 at 02:40:22PM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 19, 2023 at 12:50:15AM +0000, Bobby Eshleman wrote:
> > This commit implements datagram support for vhost/vsock by teaching
> > vhost to use the common virtio transport datagram functions.
> >
> > If the virtio RX buffer is too small, then the transmission is
> > abandoned, the packet dropped, and EHOSTUNREACH is added to the socket's
> > error queue.
> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
>
> EHOSTUNREACH?
>
Yes, in the v4 thread we decided to try to mimic UDP/ICMP behavior when
IP packets are lost.

If an IP packet is dropped and the full UDP datagram cannot be
reassembled, then an ICMP_TIME_EXCEEDED message with code
ICMP_EXC_FRAGTIME is sent. The sending stack propagates this up to the
socket as EHOSTUNREACH. ENOBUFS/ENOMEM are already used for local
buffers, so EHOSTUNREACH also distinctly points to the remote end of the
flow.
>
> > ---
> > drivers/vhost/vsock.c | 62 +++++++++++++++++++++++++++++++++++++++++++++---
> > net/vmw_vsock/af_vsock.c | 5 +++-
> > 2 files changed, 63 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index d5d6a3c3f273..da14260c6654 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -8,6 +8,7 @@
> > */
> > #include <linux/miscdevice.h>
> > #include <linux/atomic.h>
> > +#include <linux/errqueue.h>
> > #include <linux/module.h>
> > #include <linux/mutex.h>
> > #include <linux/vmalloc.h>
> > @@ -32,7 +33,8 @@
> > enum {
> > VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> > (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> > - (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
> > + (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
> > + (1ULL << VIRTIO_VSOCK_F_DGRAM)
> > };
> >
> > enum {
> > @@ -56,6 +58,7 @@ struct vhost_vsock {
> > atomic_t queued_replies;
> >
> > u32 guest_cid;
> > + bool dgram_allow;
> > bool seqpacket_allow;
> > };
> >
> > @@ -86,6 +89,32 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
> > return NULL;
> > }
> >
> > +/* Claims ownership of the skb, do not free the skb after calling! */
> > +static void
> > +vhost_transport_error(struct sk_buff *skb, int err)
> > +{
> > + struct sock_exterr_skb *serr;
> > + struct sock *sk = skb->sk;
> > + struct sk_buff *clone;
> > +
> > + serr = SKB_EXT_ERR(skb);
> > + memset(serr, 0, sizeof(*serr));
> > + serr->ee.ee_errno = err;
> > + serr->ee.ee_origin = SO_EE_ORIGIN_NONE;
> > +
> > + clone = skb_clone(skb, GFP_KERNEL);
> > + if (!clone)
> > + return;
> > +
> > + if (sock_queue_err_skb(sk, clone))
> > + kfree_skb(clone);
> > +
> > + sk->sk_err = err;
> > + sk_error_report(sk);
> > +
> > + kfree_skb(skb);
> > +}
> > +
> > static void
> > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > struct vhost_virtqueue *vq)
> > @@ -160,9 +189,15 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > hdr = virtio_vsock_hdr(skb);
> >
> > /* If the packet is greater than the space available in the
> > - * buffer, we split it using multiple buffers.
> > + * buffer, we split it using multiple buffers for connectible
> > + * sockets and drop the packet for datagram sockets.
> > */
>
> won't this break things like recently proposed zerocopy?
> I think splitup has to be supported for all types.
>
>
> > if (payload_len > iov_len - sizeof(*hdr)) {
> > + if (le16_to_cpu(hdr->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
> > + vhost_transport_error(skb, EHOSTUNREACH);
> > + continue;
> > + }
> > +
> > payload_len = iov_len - sizeof(*hdr);
> >
> > /* As we are copying pieces of large packet's buffer to
> > @@ -394,6 +429,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
> > return val < vq->num;
> > }
> >
> > +static bool vhost_transport_dgram_allow(u32 cid, u32 port);
> > static bool vhost_transport_seqpacket_allow(u32 remote_cid);
> >
> > static struct virtio_transport vhost_transport = {
> > @@ -410,7 +446,8 @@ static struct virtio_transport vhost_transport = {
> > .cancel_pkt = vhost_transport_cancel_pkt,
> >
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > - .dgram_allow = virtio_transport_dgram_allow,
> > + .dgram_allow = vhost_transport_dgram_allow,
> > + .dgram_addr_init = virtio_transport_dgram_addr_init,
> >
> > .stream_enqueue = virtio_transport_stream_enqueue,
> > .stream_dequeue = virtio_transport_stream_dequeue,
> > @@ -443,6 +480,22 @@ static struct virtio_transport vhost_transport = {
> > .send_pkt = vhost_transport_send_pkt,
> > };
> >
> > +static bool vhost_transport_dgram_allow(u32 cid, u32 port)
> > +{
> > + struct vhost_vsock *vsock;
> > + bool dgram_allow = false;
> > +
> > + rcu_read_lock();
> > + vsock = vhost_vsock_get(cid);
> > +
> > + if (vsock)
> > + dgram_allow = vsock->dgram_allow;
> > +
> > + rcu_read_unlock();
> > +
> > + return dgram_allow;
> > +}
> > +
> > static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> > {
> > struct vhost_vsock *vsock;
> > @@ -799,6 +852,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> > if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
> > vsock->seqpacket_allow = true;
> >
> > + if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
> > + vsock->dgram_allow = true;
> > +
> > for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> > vq = &vsock->vqs[i];
> > mutex_lock(&vq->mutex);
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index e73f3b2c52f1..449ed63ac2b0 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -1427,9 +1427,12 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> > return prot->recvmsg(sk, msg, len, flags, NULL);
> > #endif
> >
> > - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> > + if (unlikely(flags & MSG_OOB))
> > return -EOPNOTSUPP;
> >
> > + if (unlikely(flags & MSG_ERRQUEUE))
> > + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
> > +
> > transport = vsk->transport;
> >
> > /* Retrieve the head sk_buff from the socket's receive queue. */
> >
> > --
> > 2.30.2
>
On Thu, Jul 27, 2023 at 03:51:42AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 19, 2023 at 12:50:04AM +0000, Bobby Eshleman wrote:
> > Hey all!
> >
> > This series introduces support for datagrams to virtio/vsock.
> >
> > It is a spin-off (and smaller version) of this series from the summer:
> > https://lore.kernel.org/all/[email protected]/
> >
> > Please note that this is an RFC and should not be merged until
> > associated changes are made to the virtio specification, which will
> > follow after discussion from this series.
> >
> > Another aside, the v4 of the series has only been mildly tested with a
> > run of tools/testing/vsock/vsock_test. Some code likely needs cleaning
> > up, but I'm hoping to get some of the design choices agreed upon before
> > spending too much time making it pretty.
> >
> > This series first supports datagrams in a basic form for virtio, and
> > then optimizes the sendpath for all datagram transports.
> >
> > The result is a very fast datagram communication protocol that
> > outperforms even UDP on multi-queue virtio-net w/ vhost on a variety
> > of multi-threaded workload samples.
> >
> > For those that are curious, some summary data comparing UDP and VSOCK
> > DGRAM (N=5):
> >
> > vCPUS: 16
> > virtio-net queues: 16
> > payload size: 4KB
> > Setup: bare metal + vm (non-nested)
> >
> > UDP: 287.59 MB/s
> > VSOCK DGRAM: 509.2 MB/s
> >
> > Some notes about the implementation...
> >
> > This datagram implementation forces datagrams to self-throttle according
> > to the threshold set by sk_sndbuf. It behaves similar to the credits
> > used by streams in its effect on throughput and memory consumption, but
> > it is not influenced by the receiving socket as credits are.
> >
> > The device drops packets silently.
> >
> > As discussed previously, this series introduces datagrams and defers
> > fairness to future work. See discussion in v2 for more context around
> > datagrams, fairness, and this implementation.
>
> it's a big thread - can't you summarize here?
>
Sure, no problem. I'll add that in the next rev. For the sake of readers
here, the fairness of vsock streams and vsock datagrams per this
implementation was experimentally demonstrated to be nearly equal.
Fairness was measured as the percentage reduction in throughput of an
active, concurrent stream flow. The socket type under test (datagram
or stream) was overprovisioned into a large pool of sockets, each of
which was driven at maximum sending throughput. Each socket was given a
unique port and a single-threaded sender to avoid any scalability
differences between datagrams and streams. Meanwhile, the throughput of
a single, lone stream socket was measured before and throughout the
lifetime of the pool of sockets, to detect unfairness as a reduction in
its throughput.
It was demonstrated that there was no real difference in this fairness
characteristic between datagrams and streams for vsock. In fact,
datagrams fared better (that is, datagrams were nicer to streams than
streams were to other streams), although the effect was not
statistically significant. From the design perspective, the queuing
policy is always FIFO regardless of socket type. Credits, despite being
a perfect mechanism for synchronizing send and receive buffer sizes,
have no effect on queuing fairness either.
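To make the metric concrete, here is a minimal sketch of the computation
described above (percentage reduction of the lone stream's throughput
while the pool is active). The function name is made up for
illustration and is not taken from the test harness that was used.

```c
#include <stdio.h>

/* Percentage reduction in the lone stream's throughput once the pool of
 * competing sockets is active. 0 means the pool had no effect; larger
 * values mean the pool was less fair to the lone stream.
 * Returns -1.0 if the baseline is not positive.
 */
static double throughput_reduction_pct(double baseline, double loaded)
{
	if (baseline <= 0.0)
		return -1.0;

	return 100.0 * (baseline - loaded) / baseline;
}
```

With hypothetical inputs (not measured data) of a 200 MB/s baseline and
150 MB/s under load, throughput_reduction_pct(200.0, 150.0) evaluates to
25.0. Comparing this value for a datagram pool against a stream pool is
the comparison summarized above.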
>
> > Signed-off-by: Bobby Eshleman <[email protected]>
>
>
> could you give a bit more motivation? which applications do
> you have in mind? for example, on localhost loopback datagrams
> are actually reliable and a bunch of apps came to depend
> on that even if they shouldn't.
>
>
Our use case is sending various metrics from VMs to the host.
Ultimately, we just like the performance numbers we get from this
datagram implementation compared to what we get from UDP.
Currently the system is:
producers <-> UDS <-> guest proxy <-> UDP <-> host <-> UDS <-> consumers
^-------- guest ----------------^ ^------------ host ------------------^
And the numbers look really promising when using vsock dgram:
producers <-> UDS <-> guest proxy <-> VSOCK dgram <-> host <-> UDS <-> consumers
^-------- guest ----------------^ ^------------ host ---------------------------^
The numbers also look really promising when using sockmap in lieu of the
proxies.
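For anyone who wants to picture the guest proxy side, below is a rough
userspace sketch of sending one metrics record to the host as a vsock
datagram. The function names and the idea of passing the port as a
parameter are assumptions for illustration only, and SOCK_DGRAM on
AF_VSOCK only works when the running kernel and transport support it
(for virtio, that means something like this series applied).

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Fill in a vsock address that targets the host (hypothetical helper). */
static void metrics_addr_init(struct sockaddr_vm *addr, unsigned int port)
{
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = VMADDR_CID_HOST;
	addr->svm_port = port;
}

/* Send one record as a single datagram (hypothetical helper).
 * Returns 0 on success, -1 on failure with errno set.
 */
static int metrics_send(const void *buf, size_t len, unsigned int port)
{
	struct sockaddr_vm addr;
	ssize_t n;
	int saved;
	int fd;

	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	metrics_addr_init(&addr, port);
	n = sendto(fd, buf, len, 0, (struct sockaddr *)&addr, sizeof(addr));

	/* close() may clobber errno, so preserve the sendto() result. */
	saved = errno;
	close(fd);
	errno = saved;

	return n < 0 ? -1 : 0;
}
```

A real proxy would keep the socket open across sends rather than
creating one per record; the sketch opens and closes it only to stay
short.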
Best,
Bobby
>
> > ---
> > Changes in v5:
> > - teach vhost to drop dgram when a datagram exceeds the receive buffer
> > - now uses MSG_ERRQUEUE and depends on Arseniy's zerocopy patch:
> > "vsock: read from socket's error queue"
> > - replace multiple ->dgram_* callbacks with single ->dgram_addr_init()
> > callback
> > - refactor virtio dgram skb allocator to reduce conflicts w/ zerocopy series
> > - add _fallback/_FALLBACK suffix to dgram transport variables/macros
> > - add WARN_ONCE() for table_size / VSOCK_HASH issue
> > - add static to vsock_find_bound_socket_common
> > - dedupe code in vsock_dgram_sendmsg() using module_got var
> > - drop concurrent sendmsg() for dgram and defer to future series
> > - Add more tests
> > - test EHOSTUNREACH in errqueue
> > - test stream + dgram address collision
> > - improve clarity of dgram msg bounds test code
> > - Link to v4: https://lore.kernel.org/r/[email protected]
> >
> > Changes in v4:
> > - style changes
> > - vsock: use sk_vsock(vsk) in vsock_dgram_recvmsg instead of
> > &sk->vsk
> > - vsock: fix xmas tree declaration
> > - vsock: fix spacing issues
> > - virtio/vsock: virtio_transport_recv_dgram returns void because err
> > unused
> > - sparse analysis warnings/errors
> > - virtio/vsock: fix uninitialized skerr on destroy
> > - virtio/vsock: fix uninitialized err var on goto out
> > - vsock: fix declarations that need static
> > - vsock: fix __rcu annotation order
> > - bugs
> > - vsock: fix null ptr in remote_info code
> > - vsock/dgram: make transport_dgram a fallback instead of first
> > priority
> > - vsock: remove redundant rcu read lock acquire in getname()
> > - tests
> > - add more tests (message bounds and more)
> > - add vsock_dgram_bind() helper
> > - add vsock_dgram_connect() helper
> >
> > Changes in v3:
> > - Support multi-transport dgram, changing logic in connect/bind
> > to support VMCI case
> > - Support per-pkt transport lookup for sendto() case
> > - Fix dgram_allow() implementation
> > - Fix dgram feature bit number (now it is 3)
> > - Fix binding so dgram and connectible (cid,port) spaces are
> > non-overlapping
> > - RCU protect transport ptr so connect() calls never leave
> > a lockless read of the transport and remote_addr are always
> > in sync
> > - Link to v2: https://lore.kernel.org/r/[email protected]
> >
> > ---
> > Bobby Eshleman (13):
> > af_vsock: generalize vsock_dgram_recvmsg() to all transports
> > af_vsock: refactor transport lookup code
> > af_vsock: support multi-transport datagrams
> > af_vsock: generalize bind table functions
> > af_vsock: use a separate dgram bind table
> > virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM
> > virtio/vsock: add common datagram send path
> > af_vsock: add vsock_find_bound_dgram_socket()
> > virtio/vsock: add common datagram recv path
> > virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
> > vhost/vsock: implement datagram support
> > vsock/loopback: implement datagram support
> > virtio/vsock: implement datagram support
> >
> > Jiang Wang (1):
> > test/vsock: add vsock dgram tests
> >
> > drivers/vhost/vsock.c | 64 ++-
> > include/linux/virtio_vsock.h | 10 +-
> > include/net/af_vsock.h | 14 +-
> > include/uapi/linux/virtio_vsock.h | 2 +
> > net/vmw_vsock/af_vsock.c | 281 ++++++++++---
> > net/vmw_vsock/hyperv_transport.c | 13 -
> > net/vmw_vsock/virtio_transport.c | 26 +-
> > net/vmw_vsock/virtio_transport_common.c | 190 +++++++--
> > net/vmw_vsock/vmci_transport.c | 60 +--
> > net/vmw_vsock/vsock_loopback.c | 10 +-
> > tools/testing/vsock/util.c | 141 ++++++-
> > tools/testing/vsock/util.h | 6 +
> > tools/testing/vsock/vsock_test.c | 680 ++++++++++++++++++++++++++++++++
> > 13 files changed, 1320 insertions(+), 177 deletions(-)
> > ---
> > base-commit: 37cadc266ebdc7e3531111c2b3304fa01b2131e8
> > change-id: 20230413-b4-vsock-dgram-3b6eba6a64e5
> >
> > Best regards,
> > --
> > Bobby Eshleman <[email protected]>
>
> _______________________________________________
> Virtualization mailing list
> [email protected]
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
On Thu, Jul 27, 2023 at 09:48:21AM +0200, Stefano Garzarella wrote:
> On Wed, Jul 26, 2023 at 02:38:08PM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 19, 2023 at 12:50:14AM +0000, Bobby Eshleman wrote:
> > > This commit adds a feature bit for virtio vsock to support datagrams.
> > >
> > > Signed-off-by: Jiang Wang <[email protected]>
> > > Signed-off-by: Bobby Eshleman <[email protected]>
> > > ---
> > > include/uapi/linux/virtio_vsock.h | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
> > > index 331be28b1d30..27b4b2b8bf13 100644
> > > --- a/include/uapi/linux/virtio_vsock.h
> > > +++ b/include/uapi/linux/virtio_vsock.h
> > > @@ -40,6 +40,7 @@
> > >
> > > /* The feature bitmap for virtio vsock */
> > > #define VIRTIO_VSOCK_F_SEQPACKET 1 /* SOCK_SEQPACKET supported */
> > > +#define VIRTIO_VSOCK_F_DGRAM 3 /* SOCK_DGRAM supported */
> > >
> > > struct virtio_vsock_config {
> > > __le64 guest_cid;
> >
> > pls do not add interface without first getting it accepted in the
> > virtio spec.
>
> Yep, fortunately this series is still RFC.
> I think by now we've seen that the implementation is doable, so we
> should discuss the changes to the specification ASAP. Then we can
> merge the series.
>
> @Bobby can you start the discussion about spec changes?
>
No problem at all. Am I right to assume that a new patch to the spec is
the standard starting point for discussion?
> Thanks,
> Stefano
>
On Tue, Aug 01, 2023 at 04:30:22AM +0000, Bobby Eshleman wrote:
>On Thu, Jul 27, 2023 at 09:48:21AM +0200, Stefano Garzarella wrote:
>> On Wed, Jul 26, 2023 at 02:38:08PM -0400, Michael S. Tsirkin wrote:
>> > On Wed, Jul 19, 2023 at 12:50:14AM +0000, Bobby Eshleman wrote:
>> > > This commit adds a feature bit for virtio vsock to support datagrams.
>> > >
>> > > Signed-off-by: Jiang Wang <[email protected]>
>> > > Signed-off-by: Bobby Eshleman <[email protected]>
>> > > ---
>> > > include/uapi/linux/virtio_vsock.h | 1 +
>> > > 1 file changed, 1 insertion(+)
>> > >
>> > > diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>> > > index 331be28b1d30..27b4b2b8bf13 100644
>> > > --- a/include/uapi/linux/virtio_vsock.h
>> > > +++ b/include/uapi/linux/virtio_vsock.h
>> > > @@ -40,6 +40,7 @@
>> > >
>> > > /* The feature bitmap for virtio vsock */
>> > > #define VIRTIO_VSOCK_F_SEQPACKET 1 /* SOCK_SEQPACKET supported */
>> > > +#define VIRTIO_VSOCK_F_DGRAM 3 /* SOCK_DGRAM supported */
>> > >
>> > > struct virtio_vsock_config {
>> > > __le64 guest_cid;
>> >
>> > pls do not add interface without first getting it accepted in the
>> > virtio spec.
>>
>> Yep, fortunately this series is still RFC.
>> I think by now we've seen that the implementation is doable, so we
>> should discuss the changes to the specification ASAP. Then we can
>> merge the series.
>>
>> @Bobby can you start the discussion about spec changes?
>>
>
>No problem at all. Am I right to assume that a new patch to the spec is
>the standard starting point for discussion?
Yep, I think so!
Thanks,
Stefano
On Wed, Jul 26, 2023 at 02:40:22PM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 19, 2023 at 12:50:15AM +0000, Bobby Eshleman wrote:
> > This commit implements datagram support for vhost/vsock by teaching
> > vhost to use the common virtio transport datagram functions.
> >
> > If the virtio RX buffer is too small, then the transmission is
> > abandoned, the packet dropped, and EHOSTUNREACH is added to the socket's
> > error queue.
> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
>
> EHOSTUNREACH?
>
>
> > ---
> > drivers/vhost/vsock.c | 62 +++++++++++++++++++++++++++++++++++++++++++++---
> > net/vmw_vsock/af_vsock.c | 5 +++-
> > 2 files changed, 63 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index d5d6a3c3f273..da14260c6654 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -8,6 +8,7 @@
> > */
> > #include <linux/miscdevice.h>
> > #include <linux/atomic.h>
> > +#include <linux/errqueue.h>
> > #include <linux/module.h>
> > #include <linux/mutex.h>
> > #include <linux/vmalloc.h>
> > @@ -32,7 +33,8 @@
> > enum {
> > VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> > (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> > - (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
> > + (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
> > + (1ULL << VIRTIO_VSOCK_F_DGRAM)
> > };
> >
> > enum {
> > @@ -56,6 +58,7 @@ struct vhost_vsock {
> > atomic_t queued_replies;
> >
> > u32 guest_cid;
> > + bool dgram_allow;
> > bool seqpacket_allow;
> > };
> >
> > @@ -86,6 +89,32 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
> > return NULL;
> > }
> >
> > +/* Claims ownership of the skb, do not free the skb after calling! */
> > +static void
> > +vhost_transport_error(struct sk_buff *skb, int err)
> > +{
> > + struct sock_exterr_skb *serr;
> > + struct sock *sk = skb->sk;
> > + struct sk_buff *clone;
> > +
> > + serr = SKB_EXT_ERR(skb);
> > + memset(serr, 0, sizeof(*serr));
> > + serr->ee.ee_errno = err;
> > + serr->ee.ee_origin = SO_EE_ORIGIN_NONE;
> > +
> > + clone = skb_clone(skb, GFP_KERNEL);
> > + if (!clone)
> > + return;
> > +
> > + if (sock_queue_err_skb(sk, clone))
> > + kfree_skb(clone);
> > +
> > + sk->sk_err = err;
> > + sk_error_report(sk);
> > +
> > + kfree_skb(skb);
> > +}
> > +
> > static void
> > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > struct vhost_virtqueue *vq)
> > @@ -160,9 +189,15 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > hdr = virtio_vsock_hdr(skb);
> >
> > /* If the packet is greater than the space available in the
> > - * buffer, we split it using multiple buffers.
> > + * buffer, we split it using multiple buffers for connectible
> > + * sockets and drop the packet for datagram sockets.
> > */
>
> won't this break things like recently proposed zerocopy?
> I think splitup has to be supported for all types.
>
Could you elaborate? Is there something about zerocopy that would
prohibit the transport from dropping a datagram?
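To be clear about the behavior in question, here is a small userspace
model of the decision made in the hunk quoted above. It is only a
sketch: the enum and function name are hypothetical, and the real code
operates on the skb and virtqueue buffers rather than on plain lengths.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the hdr->type check in the quoted hunk. */
enum pkt_kind { PKT_CONNECTIBLE, PKT_DGRAM };

/* Decide how much payload goes into one RX buffer of iov_len bytes,
 * hdr_len of which are taken by the packet header.
 *
 * Returns the number of payload bytes to copy, -EHOSTUNREACH if a
 * datagram does not fit and is dropped, or -EINVAL if the buffer cannot
 * even hold the header.
 */
static long rx_payload_fit(enum pkt_kind kind, size_t payload_len,
			   size_t iov_len, size_t hdr_len)
{
	size_t room;

	if (iov_len < hdr_len)
		return -EINVAL;

	room = iov_len - hdr_len;
	if (payload_len <= room)
		return (long)payload_len;

	/* Datagrams are never split: drop and report the error. */
	if (kind == PKT_DGRAM)
		return -EHOSTUNREACH;

	/* Connectible sockets send what fits and continue in later buffers. */
	return (long)room;
}
```

So an oversized stream or seqpacket packet is still split exactly as
before; only the datagram case takes the drop path.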
>
> > if (payload_len > iov_len - sizeof(*hdr)) {
> > + if (le16_to_cpu(hdr->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
> > + vhost_transport_error(skb, EHOSTUNREACH);
> > + continue;
> > + }
> > +
> > payload_len = iov_len - sizeof(*hdr);
> >
> > /* As we are copying pieces of large packet's buffer to
> > @@ -394,6 +429,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
> > return val < vq->num;
> > }
> >
> > +static bool vhost_transport_dgram_allow(u32 cid, u32 port);
> > static bool vhost_transport_seqpacket_allow(u32 remote_cid);
> >
> > static struct virtio_transport vhost_transport = {
> > @@ -410,7 +446,8 @@ static struct virtio_transport vhost_transport = {
> > .cancel_pkt = vhost_transport_cancel_pkt,
> >
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > - .dgram_allow = virtio_transport_dgram_allow,
> > + .dgram_allow = vhost_transport_dgram_allow,
> > + .dgram_addr_init = virtio_transport_dgram_addr_init,
> >
> > .stream_enqueue = virtio_transport_stream_enqueue,
> > .stream_dequeue = virtio_transport_stream_dequeue,
> > @@ -443,6 +480,22 @@ static struct virtio_transport vhost_transport = {
> > .send_pkt = vhost_transport_send_pkt,
> > };
> >
> > +static bool vhost_transport_dgram_allow(u32 cid, u32 port)
> > +{
> > + struct vhost_vsock *vsock;
> > + bool dgram_allow = false;
> > +
> > + rcu_read_lock();
> > + vsock = vhost_vsock_get(cid);
> > +
> > + if (vsock)
> > + dgram_allow = vsock->dgram_allow;
> > +
> > + rcu_read_unlock();
> > +
> > + return dgram_allow;
> > +}
> > +
> > static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> > {
> > struct vhost_vsock *vsock;
> > @@ -799,6 +852,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> > if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
> > vsock->seqpacket_allow = true;
> >
> > + if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
> > + vsock->dgram_allow = true;
> > +
> > for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> > vq = &vsock->vqs[i];
> > mutex_lock(&vq->mutex);
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index e73f3b2c52f1..449ed63ac2b0 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -1427,9 +1427,12 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> > return prot->recvmsg(sk, msg, len, flags, NULL);
> > #endif
> >
> > - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> > + if (unlikely(flags & MSG_OOB))
> > return -EOPNOTSUPP;
> >
> > + if (unlikely(flags & MSG_ERRQUEUE))
> > + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
> > +
> > transport = vsk->transport;
> >
> > /* Retrieve the head sk_buff from the socket's receive queue. */
> >
> > --
> > 2.30.2
>
On Thu, Jul 27, 2023 at 10:57:05AM +0300, Arseniy Krasnov wrote:
>
>
> On 26.07.2023 20:08, Bobby Eshleman wrote:
> > On Sat, Jul 22, 2023 at 11:16:05AM +0300, Arseniy Krasnov wrote:
> >>
> >>
> >> On 19.07.2023 03:50, Bobby Eshleman wrote:
> >>> This commit implements the common function
> >>> virtio_transport_dgram_enqueue for enqueueing datagrams. It does not add
> >>> usage in either vhost or virtio yet.
> >>>
> >>> Signed-off-by: Bobby Eshleman <[email protected]>
> >>> ---
> >>> net/vmw_vsock/virtio_transport_common.c | 76 ++++++++++++++++++++++++++++++++-
> >>> 1 file changed, 75 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >>> index ffcbdd77feaa..3bfaff758433 100644
> >>> --- a/net/vmw_vsock/virtio_transport_common.c
> >>> +++ b/net/vmw_vsock/virtio_transport_common.c
> >>> @@ -819,7 +819,81 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> >>> struct msghdr *msg,
> >>> size_t dgram_len)
> >>> {
> >>> - return -EOPNOTSUPP;
> >>> + /* Here we are only using the info struct to retain style uniformity
> >>> + * and to ease future refactoring and merging.
> >>> + */
> >>> + struct virtio_vsock_pkt_info info_stack = {
> >>> + .op = VIRTIO_VSOCK_OP_RW,
> >>> + .msg = msg,
> >>> + .vsk = vsk,
> >>> + .type = VIRTIO_VSOCK_TYPE_DGRAM,
> >>> + };
> >>> + const struct virtio_transport *t_ops;
> >>> + struct virtio_vsock_pkt_info *info;
> >>> + struct sock *sk = sk_vsock(vsk);
> >>> + struct virtio_vsock_hdr *hdr;
> >>> + u32 src_cid, src_port;
> >>> + struct sk_buff *skb;
> >>> + void *payload;
> >>> + int noblock;
> >>> + int err;
> >>> +
> >>> + info = &info_stack;
> >>
> >> I think 'info' assignment could be moved below, to the place where it is used
> >> first time.
> >>
> >>> +
> >>> + if (dgram_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> >>> + return -EMSGSIZE;
> >>> +
> >>> + t_ops = virtio_transport_get_ops(vsk);
> >>> + if (unlikely(!t_ops))
> >>> + return -EFAULT;
> >>> +
> >>> + /* Unlike some of our other sending functions, this function is not
> >>> + * intended for use without a msghdr.
> >>> + */
> >>> + if (WARN_ONCE(!msg, "vsock dgram bug: no msghdr found for dgram enqueue\n"))
> >>> + return -EFAULT;
> >>
> >> Sorry, but is that possible? I thought 'msg' is always provided by general socket layer (e.g. before
> >> af_vsock.c code) and can't be NULL for DGRAM. Please correct me if i'm wrong.
> >>
> >> Also I see, that in af_vsock.c , 'vsock_dgram_sendmsg()' dereferences 'msg' for checking MSG_OOB without any
> >> checks (before calling transport callback - this function in case of virtio). So I think if we want to keep
> >> this type of check - such check must be placed in af_vsock.c or somewhere before first dereference of this pointer.
> >>
> >
> > There is some talk about dgram sockets adding additional messages types
> > in the future that help with congestion control. Those messages won't
> > come from the socket layer, so msghdr will be null. Since there is no
> > other function for sending datagrams, it seemed likely that this
> > function would be reworked for that purpose. I felt that adding this
> > check was a direct way to make it explicit that this function is
> > currently designed only for the socket-layer caller.
> >
> > Perhaps a comment would suffice?
>
> I see, thanks, it is for future usage. Sorry for dumb question: but if msg is NULL, how
> we will decide what to do in this call? Interface of this callback will be updated or
> some fields of 'vsock_sock' will contain type of such messages ?
>
> Thanks, Arseniy
>
Hey Arseniy, sorry about the delay, I forgot about this chunk of the
thread.
This warning was intended to call attention to the fact that, even
though this function is the only way to send dgram packets, it requires
a non-NULL msg, unlike the connectible sending function
virtio_transport_send_pkt_info(). It seems like it doesn't help and
just causes more confusion than anything. It is a wasted cycle on the
fastpath too, so I think I'll just drop it in the next rev.
> >
> >>> +
> >>> + noblock = msg->msg_flags & MSG_DONTWAIT;
> >>> +
> >>> + /* Use sock_alloc_send_skb to throttle by sk_sndbuf. This helps avoid
> >>> + * triggering the OOM.
> >>> + */
> >>> + skb = sock_alloc_send_skb(sk, dgram_len + VIRTIO_VSOCK_SKB_HEADROOM,
> >>> + noblock, &err);
> >>> + if (!skb)
> >>> + return err;
> >>> +
> >>> + skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
> >>> +
> >>> + src_cid = t_ops->transport.get_local_cid();
> >>> + src_port = vsk->local_addr.svm_port;
> >>> +
> >>> + hdr = virtio_vsock_hdr(skb);
> >>> + hdr->type = cpu_to_le16(info->type);
> >>> + hdr->op = cpu_to_le16(info->op);
> >>> + hdr->src_cid = cpu_to_le64(src_cid);
> >>> + hdr->dst_cid = cpu_to_le64(remote_addr->svm_cid);
> >>> + hdr->src_port = cpu_to_le32(src_port);
> >>> + hdr->dst_port = cpu_to_le32(remote_addr->svm_port);
> >>> + hdr->flags = cpu_to_le32(info->flags);
> >>> + hdr->len = cpu_to_le32(dgram_len);
> >>> +
> >>> + skb_set_owner_w(skb, sk);
> >>> +
> >>> + payload = skb_put(skb, dgram_len);
> >>> + err = memcpy_from_msg(payload, msg, dgram_len);
> >>> + if (err)
> >>> + return err;
> >>
> >> Do we need free allocated skb here ?
> >>
> >
> > Yep, thanks.
> >
> >>> +
> >>> + trace_virtio_transport_alloc_pkt(src_cid, src_port,
> >>> + remote_addr->svm_cid,
> >>> + remote_addr->svm_port,
> >>> + dgram_len,
> >>> + info->type,
> >>> + info->op,
> >>> + 0);
> >>> +
> >>> + return t_ops->send_pkt(skb);
> >>> }
> >>> EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
> >>>
> >>>
> >>
> >> Thanks, Arseniy
> >
> > Thanks for the review!
> >
> > Best,
> > Bobby
On Thu, Jul 27, 2023 at 11:00:55AM +0300, Arseniy Krasnov wrote:
>
>
> On 26.07.2023 20:55, Bobby Eshleman wrote:
> > On Sat, Jul 22, 2023 at 11:42:38AM +0300, Arseniy Krasnov wrote:
> >>
> >>
> >> On 19.07.2023 03:50, Bobby Eshleman wrote:
> >>> This commit implements datagram support for vhost/vsock by teaching
> >>> vhost to use the common virtio transport datagram functions.
> >>>
> >>> If the virtio RX buffer is too small, then the transmission is
> >>> abandoned, the packet dropped, and EHOSTUNREACH is added to the socket's
> >>> error queue.
> >>>
> >>> Signed-off-by: Bobby Eshleman <[email protected]>
> >>> ---
> >>> drivers/vhost/vsock.c | 62 +++++++++++++++++++++++++++++++++++++++++++++---
> >>> net/vmw_vsock/af_vsock.c | 5 +++-
> >>> 2 files changed, 63 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> >>> index d5d6a3c3f273..da14260c6654 100644
> >>> --- a/drivers/vhost/vsock.c
> >>> +++ b/drivers/vhost/vsock.c
> >>> @@ -8,6 +8,7 @@
> >>> */
> >>> #include <linux/miscdevice.h>
> >>> #include <linux/atomic.h>
> >>> +#include <linux/errqueue.h>
> >>> #include <linux/module.h>
> >>> #include <linux/mutex.h>
> >>> #include <linux/vmalloc.h>
> >>> @@ -32,7 +33,8 @@
> >>> enum {
> >>> VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> >>> (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> >>> - (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
> >>> + (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
> >>> + (1ULL << VIRTIO_VSOCK_F_DGRAM)
> >>> };
> >>>
> >>> enum {
> >>> @@ -56,6 +58,7 @@ struct vhost_vsock {
> >>> atomic_t queued_replies;
> >>>
> >>> u32 guest_cid;
> >>> + bool dgram_allow;
> >>> bool seqpacket_allow;
> >>> };
> >>>
> >>> @@ -86,6 +89,32 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
> >>> return NULL;
> >>> }
> >>>
> >>> +/* Claims ownership of the skb, do not free the skb after calling! */
> >>> +static void
> >>> +vhost_transport_error(struct sk_buff *skb, int err)
> >>> +{
> >>> + struct sock_exterr_skb *serr;
> >>> + struct sock *sk = skb->sk;
> >>> + struct sk_buff *clone;
> >>> +
> >>> + serr = SKB_EXT_ERR(skb);
> >>> + memset(serr, 0, sizeof(*serr));
> >>> + serr->ee.ee_errno = err;
> >>> + serr->ee.ee_origin = SO_EE_ORIGIN_NONE;
> >>> +
> >>> + clone = skb_clone(skb, GFP_KERNEL);
> >>
> >> Maybe for the skb which is the error carrier we can use 'sock_omalloc()' instead of 'skb_clone()'? TCP uses skbs
> >> allocated by this function as carriers of the error structure. I guess 'skb_clone()' also clones the data of the original,
> >> but I think there is no need for the data as we insert it into the error queue of the socket.
> >>
> >> What do You think?
> >
> > IIUC skb_clone() is often used in this scenario so that the user can
> > retrieve the error-causing packet from the error queue. Is there some
> > reason we shouldn't do this?
> >
> > I'm seeing that the serr bits need to occur on the clone here, not the
> > original. I didn't realize the SKB_EXT_ERR() is a skb->cb cast. I'm not
> > actually sure how this passes the test case since ->cb isn't cloned.
>
> Ah yes, sorry, You are right, I just confused this case with zerocopy completion
> handling - there we allocate "empty" skb which carries completion metadata in its
> 'cb' field.
>
> Hm, but can't we just reinsert the current skb (update its 'cb' as 'sock_exterr_skb')
> into the error queue of the socket without cloning it?
>
> Thanks, Arseniy
>
I just assumed other socket types used skb_clone() for some reason
unknown to me and I didn't want to deviate.
If it is fine to just use the skb directly, then I am happy to make that
change.
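Either way, the userspace view should be the same. As a rough sketch of
how a sender could pick the error up with MSG_ERRQUEUE, something like
the following would do. The helper names are made up, and the cmsg level
and type are left as parameters because the exact constants (SOL_VSOCK,
and possibly a VSOCK_RECVERR-style type) are still being settled in this
thread.

```c
#include <errno.h>
#include <string.h>
#include <time.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/errqueue.h>

/* Return ee_errno from the first control message matching level/type,
 * or 0 if there is none (hypothetical helper).
 */
static int first_exterr(struct msghdr *msg, int level, int type)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		struct sock_extended_err ee;

		if (cmsg->cmsg_level != level || cmsg->cmsg_type != type)
			continue;
		if (cmsg->cmsg_len < CMSG_LEN(sizeof(ee)))
			continue;

		memcpy(&ee, CMSG_DATA(cmsg), sizeof(ee));
		return (int)ee.ee_errno;
	}

	return 0;
}

/* Pop one entry off the socket's error queue (hypothetical helper).
 * Returns the queued errno (e.g. EHOSTUNREACH), 0 if no matching cmsg
 * was attached, or a negative errno if recvmsg() itself failed.
 */
static int read_sock_error(int fd, int level, int type)
{
	union {
		char buf[CMSG_SPACE(sizeof(struct sock_extended_err))];
		struct cmsghdr align;
	} control;
	char data[1];
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = control.buf,
		.msg_controllen = sizeof(control.buf),
	};

	if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
		return -errno;

	return first_exterr(&msg, level, type);
}
```

If the original packet is queued (cloned or not), its payload would come
back through msg_iov as well; the one-byte buffer here simply truncates
it since only the error code is of interest.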
Best,
Bobby
> >
> >>
> >>> + if (!clone)
> >>> + return;
> >>
> >> What will happen here 'if (!clone)' ? skb will leak as it was removed from queue?
> >>
> >
> > Ah yes, true.
> >
> >>> +
> >>> + if (sock_queue_err_skb(sk, clone))
> >>> + kfree_skb(clone);
> >>> +
> >>> + sk->sk_err = err;
> >>> + sk_error_report(sk);
> >>> +
> >>> + kfree_skb(skb);
> >>> +}
> >>> +
> >>> static void
> >>> vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >>> struct vhost_virtqueue *vq)
> >>> @@ -160,9 +189,15 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >>> hdr = virtio_vsock_hdr(skb);
> >>>
> >>> /* If the packet is greater than the space available in the
> >>> - * buffer, we split it using multiple buffers.
> >>> + * buffer, we split it using multiple buffers for connectible
> >>> + * sockets and drop the packet for datagram sockets.
> >>> */
> >>> if (payload_len > iov_len - sizeof(*hdr)) {
> >>> + if (le16_to_cpu(hdr->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
> >>> + vhost_transport_error(skb, EHOSTUNREACH);
> >>> + continue;
> >>> + }
> >>> +
> >>> payload_len = iov_len - sizeof(*hdr);
> >>>
> >>> /* As we are copying pieces of large packet's buffer to
> >>> @@ -394,6 +429,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
> >>> return val < vq->num;
> >>> }
> >>>
> >>> +static bool vhost_transport_dgram_allow(u32 cid, u32 port);
> >>> static bool vhost_transport_seqpacket_allow(u32 remote_cid);
> >>>
> >>> static struct virtio_transport vhost_transport = {
> >>> @@ -410,7 +446,8 @@ static struct virtio_transport vhost_transport = {
> >>> .cancel_pkt = vhost_transport_cancel_pkt,
> >>>
> >>> .dgram_enqueue = virtio_transport_dgram_enqueue,
> >>> - .dgram_allow = virtio_transport_dgram_allow,
> >>> + .dgram_allow = vhost_transport_dgram_allow,
> >>> + .dgram_addr_init = virtio_transport_dgram_addr_init,
> >>>
> >>> .stream_enqueue = virtio_transport_stream_enqueue,
> >>> .stream_dequeue = virtio_transport_stream_dequeue,
> >>> @@ -443,6 +480,22 @@ static struct virtio_transport vhost_transport = {
> >>> .send_pkt = vhost_transport_send_pkt,
> >>> };
> >>>
> >>> +static bool vhost_transport_dgram_allow(u32 cid, u32 port)
> >>> +{
> >>> + struct vhost_vsock *vsock;
> >>> + bool dgram_allow = false;
> >>> +
> >>> + rcu_read_lock();
> >>> + vsock = vhost_vsock_get(cid);
> >>> +
> >>> + if (vsock)
> >>> + dgram_allow = vsock->dgram_allow;
> >>> +
> >>> + rcu_read_unlock();
> >>> +
> >>> + return dgram_allow;
> >>> +}
> >>> +
> >>> static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> >>> {
> >>> struct vhost_vsock *vsock;
> >>> @@ -799,6 +852,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> >>> if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
> >>> vsock->seqpacket_allow = true;
> >>>
> >>> + if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
> >>> + vsock->dgram_allow = true;
> >>> +
> >>> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> >>> vq = &vsock->vqs[i];
> >>> mutex_lock(&vq->mutex);
> >>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> >>> index e73f3b2c52f1..449ed63ac2b0 100644
> >>> --- a/net/vmw_vsock/af_vsock.c
> >>> +++ b/net/vmw_vsock/af_vsock.c
> >>> @@ -1427,9 +1427,12 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> >>> return prot->recvmsg(sk, msg, len, flags, NULL);
> >>> #endif
> >>>
> >>> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> >>> + if (unlikely(flags & MSG_OOB))
> >>> return -EOPNOTSUPP;
> >>>
> >>> + if (unlikely(flags & MSG_ERRQUEUE))
> >>> + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
> >>> +
> >>
> >> Sorry, but I get a build error here, because SOL_VSOCK is undefined. I think it should be added to
> >> include/linux/socket.h and to the uapi files also for future use in userspace.
> >>
> >
> > Strange, I built each patch individually without issue. My base is
> > netdev/main with your SOL_VSOCK patch applied. I will look today and see
> > if I'm missing something.
> >
> >> Also Stefano Garzarella <[email protected]> suggested to add define something like VSOCK_RECVERR,
> >> in the same way as IP_RECVERR, and use it as last parameter of 'sock_recv_errqueue()'.
> >>
> >
> > Got it, thanks.
> >
> >>> transport = vsk->transport;
> >>>
> >>> /* Retrieve the head sk_buff from the socket's receive queue. */
> >>>
> >>
> >> Thanks, Arseniy
> >
> > Thanks,
> > Bobby
On Sun, Jul 23, 2023 at 12:53:15AM +0300, Arseniy Krasnov wrote:
>
>
> On 19.07.2023 03:50, Bobby Eshleman wrote:
> > This patch adds support for multi-transport datagrams.
> >
> > This includes:
> > - Per-packet lookup of transports when using sendto(sockaddr_vm)
> > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> > sockaddr_vm
> > - rename VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
> > - connect() now assigns the transport (similar to connectible
> > sockets)
> >
> > To preserve backwards compatibility with VMCI, some important changes
> > are made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
> > be used for dgrams only if there is not yet a g2h or h2g transport that
> > has been registered that can transmit the packet. If there is a g2h/h2g
> > transport for that remote address, then that transport will be used and
> > not "transport_dgram". This essentially makes "transport_dgram" a
> > fallback transport for when h2g/g2h has not yet gone online, and so it
> > is renamed "transport_dgram_fallback". VMCI implements this transport.
> >
> > The logic around "transport_dgram" needs to be retained to prevent
> > breaking VMCI:
> >
> > 1) VMCI datagrams existed prior to h2g/g2h and so operate under a
> > different paradigm. When the vmci transport comes online, it registers
> > itself with the DGRAM feature, but not H2G/G2H. Only later when the
> > transport has more information about its environment does it register
> > H2G or G2H. In the case that a datagram socket is created after
> > VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
> > the "transport_dgram" transport is the only registered transport and so
> > needs to be used.
> >
> > 2) VMCI seems to require a special message be sent by the transport when a
> > datagram socket calls bind(). Under the h2g/g2h model, the transport
> > is selected using the remote_addr which is set by connect(). At
> > bind time there is no remote_addr because often no connect() has been
> > called yet: the transport is null. Therefore, with a null transport
> > there doesn't seem to be any good way for a datagram socket to tell the
> > VMCI transport that it has just had bind() called upon it.
> >
> > With the new fallback logic, after H2G/G2H comes online the socket layer
> > will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
> > coming online, the socket layer will access the VMCI transport via
> > "transport_dgram_fallback".
> >
> > Only transports with a special datagram fallback use-case such as VMCI
> > need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
> > ---
> > drivers/vhost/vsock.c | 1 -
> > include/linux/virtio_vsock.h | 2 --
> > include/net/af_vsock.h | 10 +++---
> > net/vmw_vsock/af_vsock.c | 64 ++++++++++++++++++++++++++-------
> > net/vmw_vsock/hyperv_transport.c | 6 ----
> > net/vmw_vsock/virtio_transport.c | 1 -
> > net/vmw_vsock/virtio_transport_common.c | 7 ----
> > net/vmw_vsock/vmci_transport.c | 2 +-
> > net/vmw_vsock/vsock_loopback.c | 1 -
> > 9 files changed, 58 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index ae8891598a48..d5d6a3c3f273 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> > .cancel_pkt = vhost_transport_cancel_pkt,
> >
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > - .dgram_bind = virtio_transport_dgram_bind,
> > .dgram_allow = virtio_transport_dgram_allow,
> >
> > .stream_enqueue = virtio_transport_stream_enqueue,
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 18cbe8d37fca..7632552bee58 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -211,8 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > bool virtio_transport_stream_allow(u32 cid, u32 port);
> > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > - struct sockaddr_vm *addr);
> > bool virtio_transport_dgram_allow(u32 cid, u32 port);
> >
> > int virtio_transport_connect(struct vsock_sock *vsk);
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index 305d57502e89..f6a0ca9d7c3e 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
> >
> > /* Transport features flags */
> > /* Transport provides host->guest communication */
> > -#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > +#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > /* Transport provides guest->host communication */
> > -#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > -/* Transport provides DGRAM communication */
> > -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
> > +#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > +/* Transport provides fallback for DGRAM communication */
> > +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
> > /* Transport provides local (loopback) communication */
> > -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> > +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> >
> > struct vsock_transport {
> > struct module *module;
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index ae5ac5531d96..26c97b33d55a 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -139,8 +139,8 @@ struct proto vsock_proto = {
> > static const struct vsock_transport *transport_h2g;
> > /* Transport used for guest->host communication */
> > static const struct vsock_transport *transport_g2h;
> > -/* Transport used for DGRAM communication */
> > -static const struct vsock_transport *transport_dgram;
> > +/* Transport used as a fallback for DGRAM communication */
> > +static const struct vsock_transport *transport_dgram_fallback;
> > /* Transport used for local communication */
> > static const struct vsock_transport *transport_local;
> > static DEFINE_MUTEX(vsock_register_mutex);
> > @@ -439,6 +439,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> > return transport;
> > }
> >
> > +static const struct vsock_transport *
> > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> > +{
> > + const struct vsock_transport *transport;
> > +
> > + transport = vsock_connectible_lookup_transport(cid, flags);
> > + if (transport)
> > + return transport;
> > +
> > + return transport_dgram_fallback;
> > +}
> > +
> > /* Assign a transport to a socket and call the .init transport callback.
> > *
> > * Note: for connection oriented socket this must be called when vsk->remote_addr
> > @@ -475,7 +487,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> >
> > switch (sk->sk_type) {
> > case SOCK_DGRAM:
> > - new_transport = transport_dgram;
> > + new_transport = vsock_dgram_lookup_transport(remote_cid,
> > + remote_flags);
>
> I'm a little bit confused about this:
> 1) Let's create SOCK_DGRAM socket using vsock_create()
> 2) for SOCK_DGRAM it calls 'vsock_assign_transport()' and we go here, remote_cid == -1
> 3) I guess 'vsock_dgram_lookup_transport()' calls logic from 0002 and returns h2g for such remote cid, which is not
> correct I think...
>
> Please correct me if i'm wrong
>
> Thanks, Arseniy
>
As I understand, for the VMCI case, if transport_h2g != NULL, then
transport_h2g == transport_dgram_fallback. In either case,
vsk->transport == transport_dgram_fallback.

For the virtio/vhost case, temporarily vsk->transport == transport_h2g,
but it is unused because vsk->transport->dgram_bind == NULL.

Until SS_CONNECTED is set by connect() and vsk->transport is set
correctly, the send path is barred from using the bad transport.

I guess the recvmsg() path is a little more sketchy, and probably only
works in my test cases because h2g/g2h in the vhost/virtio case have
identical dgram_addr_init() implementations.

I think a cleaner solution is maybe checking in vsock_create() if
dgram_bind is implemented. If it is not, then vsk->transport should be
reset to NULL and a comment added explaining why VMCI requires this.

Then the other calls can begin explicitly checking for vsk->transport ==
NULL.

Thoughts?
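
To make the lookup order above concrete, here is a rough userspace
sketch of the fallback selection. It is only a model: the model_* names
are made up for illustration, and the connectible branch is a simplified
assumption loosely following the CID/flag checks in
vsock_assign_transport(), not the actual helper from patch 0002.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants mirroring the UAPI values. */
#define MODEL_CID_HOST     2u          /* VMADDR_CID_HOST */
#define MODEL_CID_ANY      0xffffffffu /* VMADDR_CID_ANY (-1U) */
#define MODEL_FLAG_TO_HOST 0x01u       /* VMADDR_FLAG_TO_HOST */

/* Hypothetical stand-in for struct vsock_transport. */
struct model_transport {
	const char *name;
};

/* Registered transports; NULL means "not registered yet". */
struct model_registry {
	const struct model_transport *h2g;
	const struct model_transport *g2h;
	const struct model_transport *dgram_fallback;
};

/* Assumed connectible lookup: host-bound traffic goes to g2h, else h2g. */
static const struct model_transport *
model_connectible_lookup(const struct model_registry *r, unsigned int cid,
			 unsigned char flags)
{
	assert(r != NULL);

	if (cid <= MODEL_CID_HOST || !r->h2g || (flags & MODEL_FLAG_TO_HOST))
		return r->g2h;

	return r->h2g;
}

/* Datagram lookup: prefer h2g/g2h, use the fallback only if neither fits. */
static const struct model_transport *
model_dgram_lookup(const struct model_registry *r, unsigned int cid,
		   unsigned char flags)
{
	const struct model_transport *t;

	t = model_connectible_lookup(r, cid, flags);
	if (t)
		return t;

	return r->dgram_fallback;
}
```

In this model, a lookup with MODEL_CID_ANY before any h2g/g2h
registration yields the fallback, while the same lookup after an h2g
transport registers yields h2g, which is the situation Arseniy
describes for a freshly created socket.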
> > break;
> > case SOCK_STREAM:
> > case SOCK_SEQPACKET:
> > @@ -692,6 +705,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > struct sockaddr_vm *addr)
> > {
> > + if (!vsk->transport || !vsk->transport->dgram_bind)
> > + return -EINVAL;
> > +
> > return vsk->transport->dgram_bind(vsk, addr);
> > }
> >
> > @@ -1162,6 +1178,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > struct vsock_sock *vsk;
> > struct sockaddr_vm *remote_addr;
> > const struct vsock_transport *transport;
> > + bool module_got = false;
> >
> > if (msg->msg_flags & MSG_OOB)
> > return -EOPNOTSUPP;
> > @@ -1173,19 +1190,34 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> >
> > lock_sock(sk);
> >
> > - transport = vsk->transport;
> > -
> > err = vsock_auto_bind(vsk);
> > if (err)
> > goto out;
> >
> > -
> > /* If the provided message contains an address, use that. Otherwise
> > * fall back on the socket's remote handle (if it has been connected).
> > */
> > if (msg->msg_name &&
> > vsock_addr_cast(msg->msg_name, msg->msg_namelen,
> > &remote_addr) == 0) {
> > + transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
> > + remote_addr->svm_flags);
> > + if (!transport) {
> > + err = -EINVAL;
> > + goto out;
> > + }
> > +
> > + if (!try_module_get(transport->module)) {
> > + err = -ENODEV;
> > + goto out;
> > + }
> > +
> > + /* When looking up a transport dynamically and acquiring a
> > + * reference on the module, we need to remember to release the
> > + * reference later.
> > + */
> > + module_got = true;
> > +
> > /* Ensure this address is of the right type and is a valid
> > * destination.
> > */
> > @@ -1200,6 +1232,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > } else if (sock->state == SS_CONNECTED) {
> > remote_addr = &vsk->remote_addr;
> >
> > + transport = vsk->transport;
> > if (remote_addr->svm_cid == VMADDR_CID_ANY)
> > remote_addr->svm_cid = transport->get_local_cid();
> >
> > @@ -1224,6 +1257,8 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
> >
> > out:
> > + if (module_got)
> > + module_put(transport->module);
> > release_sock(sk);
> > return err;
> > }
> > @@ -1256,13 +1291,18 @@ static int vsock_dgram_connect(struct socket *sock,
> > if (err)
> > goto out;
> >
> > + memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > +
> > + err = vsock_assign_transport(vsk, NULL);
> > + if (err)
> > + goto out;
> > +
> > if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> > remote_addr->svm_port)) {
> > err = -EINVAL;
> > goto out;
> > }
> >
> > - memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > sock->state = SS_CONNECTED;
> >
> > /* sock map disallows redirection of non-TCP sockets with sk_state !=
> > @@ -2487,7 +2527,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> >
> > t_h2g = transport_h2g;
> > t_g2h = transport_g2h;
> > - t_dgram = transport_dgram;
> > + t_dgram = transport_dgram_fallback;
> > t_local = transport_local;
> >
> > if (features & VSOCK_TRANSPORT_F_H2G) {
> > @@ -2506,7 +2546,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> > t_g2h = t;
> > }
> >
> > - if (features & VSOCK_TRANSPORT_F_DGRAM) {
> > + if (features & VSOCK_TRANSPORT_F_DGRAM_FALLBACK) {
> > if (t_dgram) {
> > err = -EBUSY;
> > goto err_busy;
> > @@ -2524,7 +2564,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> >
> > transport_h2g = t_h2g;
> > transport_g2h = t_g2h;
> > - transport_dgram = t_dgram;
> > + transport_dgram_fallback = t_dgram;
> > transport_local = t_local;
> >
> > err_busy:
> > @@ -2543,8 +2583,8 @@ void vsock_core_unregister(const struct vsock_transport *t)
> > if (transport_g2h == t)
> > transport_g2h = NULL;
> >
> > - if (transport_dgram == t)
> > - transport_dgram = NULL;
> > + if (transport_dgram_fallback == t)
> > + transport_dgram_fallback = NULL;
> >
> > if (transport_local == t)
> > transport_local = NULL;
> > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > index 7f1ea434656d..c29000f2612a 100644
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
> > kfree(hvs);
> > }
> >
> > -static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> > -{
> > - return -EOPNOTSUPP;
> > -}
> > -
> > static int hvs_dgram_enqueue(struct vsock_sock *vsk,
> > struct sockaddr_vm *remote, struct msghdr *msg,
> > size_t dgram_len)
> > @@ -826,7 +821,6 @@ static struct vsock_transport hvs_transport = {
> > .connect = hvs_connect,
> > .shutdown = hvs_shutdown,
> >
> > - .dgram_bind = hvs_dgram_bind,
> > .dgram_enqueue = hvs_dgram_enqueue,
> > .dgram_allow = hvs_dgram_allow,
> >
> > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > index 66edffdbf303..ac2126c7dac5 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = {
> > .shutdown = virtio_transport_shutdown,
> > .cancel_pkt = virtio_transport_cancel_pkt,
> >
> > - .dgram_bind = virtio_transport_dgram_bind,
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> >
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 01ea1402ad40..ffcbdd77feaa 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -781,13 +781,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> >
> > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > - struct sockaddr_vm *addr)
> > -{
> > - return -EOPNOTSUPP;
> > -}
> > -EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> > -
> > bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > {
> > return false;
> > diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> > index 0bbbdb222245..857b0461f856 100644
> > --- a/net/vmw_vsock/vmci_transport.c
> > +++ b/net/vmw_vsock/vmci_transport.c
> > @@ -2072,7 +2072,7 @@ static int __init vmci_transport_init(void)
> > /* Register only with dgram feature, other features (H2G, G2H) will be
> > * registered when the first host or guest becomes active.
> > */
> > - err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM);
> > + err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM_FALLBACK);
> > if (err < 0)
> > goto err_unsubscribe;
> >
> > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > index 2a59dd177c74..278235ea06c4 100644
> > --- a/net/vmw_vsock/vsock_loopback.c
> > +++ b/net/vmw_vsock/vsock_loopback.c
> > @@ -61,7 +61,6 @@ static struct virtio_transport loopback_transport = {
> > .shutdown = virtio_transport_shutdown,
> > .cancel_pkt = vsock_loopback_cancel_pkt,
> >
> > - .dgram_bind = virtio_transport_dgram_bind,
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> >
> >
On Wed, Aug 02, 2023 at 10:24:44PM +0000, Bobby Eshleman wrote:
> On Sun, Jul 23, 2023 at 12:53:15AM +0300, Arseniy Krasnov wrote:
> >
> >
> > On 19.07.2023 03:50, Bobby Eshleman wrote:
> > > This patch adds support for multi-transport datagrams.
> > >
> > > This includes:
> > > - Per-packet lookup of transports when using sendto(sockaddr_vm)
> > > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> > > sockaddr_vm
> > > - rename VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
> > > - connect() now assigns the transport (similar to connectible
> > > sockets)
> > >
> > > To preserve backwards compatibility with VMCI, some important changes
> > > are made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
> > > be used for dgrams only if there is not yet a g2h or h2g transport that
> > > has been registered that can transmit the packet. If there is a g2h/h2g
> > > transport for that remote address, then that transport will be used and
> > > not "transport_dgram". This essentially makes "transport_dgram" a
> > > fallback transport for when h2g/g2h has not yet gone online, and so it
> > > is renamed "transport_dgram_fallback". VMCI implements this transport.
> > >
> > > The logic around "transport_dgram" needs to be retained to prevent
> > > breaking VMCI:
> > >
> > > 1) VMCI datagrams existed prior to h2g/g2h and so operate under a
> > > different paradigm. When the vmci transport comes online, it registers
> > > itself with the DGRAM feature, but not H2G/G2H. Only later when the
> > > transport has more information about its environment does it register
> > > H2G or G2H. In the case that a datagram socket is created after
> > > VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
> > > the "transport_dgram" transport is the only registered transport and so
> > > needs to be used.
> > >
> > > 2) VMCI seems to require a special message be sent by the transport when a
> > > datagram socket calls bind(). Under the h2g/g2h model, the transport
> > > is selected using the remote_addr which is set by connect(). At
> > > bind time there is no remote_addr because often no connect() has been
> > > called yet: the transport is null. Therefore, with a null transport
> > > there doesn't seem to be any good way for a datagram socket to tell the
> > > VMCI transport that it has just had bind() called upon it.
> > >
> > > With the new fallback logic, after H2G/G2H comes online the socket layer
> > > will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
> > > coming online, the socket layer will access the VMCI transport via
> > > "transport_dgram_fallback".
> > >
> > > Only transports with a special datagram fallback use-case such as VMCI
> > > need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
> > >
> > > Signed-off-by: Bobby Eshleman <[email protected]>
> > > ---
> > > drivers/vhost/vsock.c | 1 -
> > > include/linux/virtio_vsock.h | 2 --
> > > include/net/af_vsock.h | 10 +++---
> > > net/vmw_vsock/af_vsock.c | 64 ++++++++++++++++++++++++++-------
> > > net/vmw_vsock/hyperv_transport.c | 6 ----
> > > net/vmw_vsock/virtio_transport.c | 1 -
> > > net/vmw_vsock/virtio_transport_common.c | 7 ----
> > > net/vmw_vsock/vmci_transport.c | 2 +-
> > > net/vmw_vsock/vsock_loopback.c | 1 -
> > > 9 files changed, 58 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > index ae8891598a48..d5d6a3c3f273 100644
> > > --- a/drivers/vhost/vsock.c
> > > +++ b/drivers/vhost/vsock.c
> > > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> > > .cancel_pkt = vhost_transport_cancel_pkt,
> > >
> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > - .dgram_bind = virtio_transport_dgram_bind,
> > > .dgram_allow = virtio_transport_dgram_allow,
> > >
> > > .stream_enqueue = virtio_transport_stream_enqueue,
> > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > index 18cbe8d37fca..7632552bee58 100644
> > > --- a/include/linux/virtio_vsock.h
> > > +++ b/include/linux/virtio_vsock.h
> > > @@ -211,8 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > > bool virtio_transport_stream_allow(u32 cid, u32 port);
> > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > - struct sockaddr_vm *addr);
> > > bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > >
> > > int virtio_transport_connect(struct vsock_sock *vsk);
> > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > > index 305d57502e89..f6a0ca9d7c3e 100644
> > > --- a/include/net/af_vsock.h
> > > +++ b/include/net/af_vsock.h
> > > @@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
> > >
> > > /* Transport features flags */
> > > /* Transport provides host->guest communication */
> > > -#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > > +#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > > /* Transport provides guest->host communication */
> > > -#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > > -/* Transport provides DGRAM communication */
> > > -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
> > > +#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > > +/* Transport provides fallback for DGRAM communication */
> > > +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
> > > /* Transport provides local (loopback) communication */
> > > -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> > > +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> > >
> > > struct vsock_transport {
> > > struct module *module;
> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > index ae5ac5531d96..26c97b33d55a 100644
> > > --- a/net/vmw_vsock/af_vsock.c
> > > +++ b/net/vmw_vsock/af_vsock.c
> > > @@ -139,8 +139,8 @@ struct proto vsock_proto = {
> > > static const struct vsock_transport *transport_h2g;
> > > /* Transport used for guest->host communication */
> > > static const struct vsock_transport *transport_g2h;
> > > -/* Transport used for DGRAM communication */
> > > -static const struct vsock_transport *transport_dgram;
> > > +/* Transport used as a fallback for DGRAM communication */
> > > +static const struct vsock_transport *transport_dgram_fallback;
> > > /* Transport used for local communication */
> > > static const struct vsock_transport *transport_local;
> > > static DEFINE_MUTEX(vsock_register_mutex);
> > > @@ -439,6 +439,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> > > return transport;
> > > }
> > >
> > > +static const struct vsock_transport *
> > > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> > > +{
> > > + const struct vsock_transport *transport;
> > > +
> > > + transport = vsock_connectible_lookup_transport(cid, flags);
> > > + if (transport)
> > > + return transport;
> > > +
> > > + return transport_dgram_fallback;
> > > +}
> > > +
> > > /* Assign a transport to a socket and call the .init transport callback.
> > > *
> > > * Note: for connection oriented socket this must be called when vsk->remote_addr
> > > @@ -475,7 +487,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> > >
> > > switch (sk->sk_type) {
> > > case SOCK_DGRAM:
> > > - new_transport = transport_dgram;
> > > + new_transport = vsock_dgram_lookup_transport(remote_cid,
> > > + remote_flags);
> >
> > I'm a little bit confused about this:
> > 1) Let's create SOCK_DGRAM socket using vsock_create()
> > 2) for SOCK_DGRAM it calls 'vsock_assign_transport()' and we go here, remote_cid == -1
> > 3) I guess 'vsock_dgram_lookup_transport()' calls logic from 0002 and returns h2g for such remote cid, which is not
> > correct I think...
> >
> > Please correct me if i'm wrong
> >
> > Thanks, Arseniy
> >
>
> As I understand, for the VMCI case, if transport_h2g != NULL, then
> transport_h2g == transport_dgram_fallback. In either case,
> vsk->transport == transport_dgram_fallback.
>
> For the virtio/vhost case, temporarily vsk->transport == transport_h2g,
> but it is unused because vsk->transport->dgram_bind == NULL.
>
> Until SS_CONNECTED is set by connect() and vsk->transport is set
> correctly, the send path is barred from using the bad transport.
>
> I guess the recvmsg() path is a little more sketchy, and probably only
> works in my test cases because h2g/g2h in the vhost/virtio case have
> identical dgram_addr_init() implementations.
>
> I think a cleaner solution is maybe checking in vsock_create() if
> dgram_bind is implemented. If it is not, then vsk->transport should be
> reset to NULL and a comment added explaining why VMCI requires this.
>
> Then the other calls can begin explicitly checking for vsk->transport ==
> NULL.
Actually, on further reflection: for vsk->transport to be set in time
for ->dgram_addr_init() to be called, vsock_dgram_bind() is going to
need to call vsock_assign_transport() anyway.

I think this means that the vsock_assign_transport() call can be removed
from vsock_create() entirely, and yet VMCI can still dispatch messages
upon bind() calls as needed.

This would then simplify the whole arrangement, provided there aren't
other unseen issues.
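
Roughly what I have in mind, as a userspace sketch rather than real
kernel code. All of the model_* names are hypothetical, and the
"candidate" argument stands in for whatever vsock_assign_transport()
would pick at bind time. The NULL checks mirror the guard this patch
adds to __vsock_bind_dgram().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_sock;

/* Hypothetical stand-in for the dgram part of struct vsock_transport. */
struct model_transport {
	int (*dgram_bind)(struct model_sock *sk, unsigned int port);
};

/* Hypothetical stand-in for struct vsock_sock. */
struct model_sock {
	const struct model_transport *transport; /* NULL after create */
	unsigned int bound_port;
	int transport_notified; /* set when the transport saw the bind */
};

/* VMCI-like transport: it wants to be told about every bind(). */
static int model_vmci_dgram_bind(struct model_sock *sk, unsigned int port)
{
	sk->bound_port = port;
	sk->transport_notified = 1;
	return 0;
}

/* Bind path: assign the transport here instead of at socket creation. */
static int model_dgram_bind(struct model_sock *sk,
			    const struct model_transport *candidate,
			    unsigned int port)
{
	assert(sk != NULL);

	/* Stand-in for calling vsock_assign_transport() at bind time. */
	sk->transport = candidate;

	/* Same guard as __vsock_bind_dgram() in this patch. */
	if (!sk->transport || !sk->transport->dgram_bind)
		return -EINVAL;

	return sk->transport->dgram_bind(sk, port);
}
```

With this shape the socket carries no transport between create and
bind, so nothing can pick up a stale h2g pointer in the meantime, and a
VMCI-like transport still sees the bind.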
>
> Thoughts?
>
> > > break;
> > > case SOCK_STREAM:
> > > case SOCK_SEQPACKET:
> > > @@ -692,6 +705,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > > static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > > struct sockaddr_vm *addr)
> > > {
> > > + if (!vsk->transport || !vsk->transport->dgram_bind)
> > > + return -EINVAL;
> > > +
> > > return vsk->transport->dgram_bind(vsk, addr);
> > > }
> > >
> > > @@ -1162,6 +1178,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > > struct vsock_sock *vsk;
> > > struct sockaddr_vm *remote_addr;
> > > const struct vsock_transport *transport;
> > > + bool module_got = false;
> > >
> > > if (msg->msg_flags & MSG_OOB)
> > > return -EOPNOTSUPP;
> > > @@ -1173,19 +1190,34 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > >
> > > lock_sock(sk);
> > >
> > > - transport = vsk->transport;
> > > -
> > > err = vsock_auto_bind(vsk);
> > > if (err)
> > > goto out;
> > >
> > > -
> > > /* If the provided message contains an address, use that. Otherwise
> > > * fall back on the socket's remote handle (if it has been connected).
> > > */
> > > if (msg->msg_name &&
> > > vsock_addr_cast(msg->msg_name, msg->msg_namelen,
> > > &remote_addr) == 0) {
> > > + transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
> > > + remote_addr->svm_flags);
> > > + if (!transport) {
> > > + err = -EINVAL;
> > > + goto out;
> > > + }
> > > +
> > > + if (!try_module_get(transport->module)) {
> > > + err = -ENODEV;
> > > + goto out;
> > > + }
> > > +
> > > + /* When looking up a transport dynamically and acquiring a
> > > + * reference on the module, we need to remember to release the
> > > + * reference later.
> > > + */
> > > + module_got = true;
> > > +
> > > /* Ensure this address is of the right type and is a valid
> > > * destination.
> > > */
> > > @@ -1200,6 +1232,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > > } else if (sock->state == SS_CONNECTED) {
> > > remote_addr = &vsk->remote_addr;
> > >
> > > + transport = vsk->transport;
> > > if (remote_addr->svm_cid == VMADDR_CID_ANY)
> > > remote_addr->svm_cid = transport->get_local_cid();
> > >
> > > @@ -1224,6 +1257,8 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > > err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
> > >
> > > out:
> > > + if (module_got)
> > > + module_put(transport->module);
> > > release_sock(sk);
> > > return err;
> > > }
> > > @@ -1256,13 +1291,18 @@ static int vsock_dgram_connect(struct socket *sock,
> > > if (err)
> > > goto out;
> > >
> > > + memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > > +
> > > + err = vsock_assign_transport(vsk, NULL);
> > > + if (err)
> > > + goto out;
> > > +
> > > if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> > > remote_addr->svm_port)) {
> > > err = -EINVAL;
> > > goto out;
> > > }
> > >
> > > - memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > > sock->state = SS_CONNECTED;
> > >
> > > /* sock map disallows redirection of non-TCP sockets with sk_state !=
> > > @@ -2487,7 +2527,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> > >
> > > t_h2g = transport_h2g;
> > > t_g2h = transport_g2h;
> > > - t_dgram = transport_dgram;
> > > + t_dgram = transport_dgram_fallback;
> > > t_local = transport_local;
> > >
> > > if (features & VSOCK_TRANSPORT_F_H2G) {
> > > @@ -2506,7 +2546,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> > > t_g2h = t;
> > > }
> > >
> > > - if (features & VSOCK_TRANSPORT_F_DGRAM) {
> > > + if (features & VSOCK_TRANSPORT_F_DGRAM_FALLBACK) {
> > > if (t_dgram) {
> > > err = -EBUSY;
> > > goto err_busy;
> > > @@ -2524,7 +2564,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> > >
> > > transport_h2g = t_h2g;
> > > transport_g2h = t_g2h;
> > > - transport_dgram = t_dgram;
> > > + transport_dgram_fallback = t_dgram;
> > > transport_local = t_local;
> > >
> > > err_busy:
> > > @@ -2543,8 +2583,8 @@ void vsock_core_unregister(const struct vsock_transport *t)
> > > if (transport_g2h == t)
> > > transport_g2h = NULL;
> > >
> > > - if (transport_dgram == t)
> > > - transport_dgram = NULL;
> > > + if (transport_dgram_fallback == t)
> > > + transport_dgram_fallback = NULL;
> > >
> > > if (transport_local == t)
> > > transport_local = NULL;
> > > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > > index 7f1ea434656d..c29000f2612a 100644
> > > --- a/net/vmw_vsock/hyperv_transport.c
> > > +++ b/net/vmw_vsock/hyperv_transport.c
> > > @@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
> > > kfree(hvs);
> > > }
> > >
> > > -static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> > > -{
> > > - return -EOPNOTSUPP;
> > > -}
> > > -
> > > static int hvs_dgram_enqueue(struct vsock_sock *vsk,
> > > struct sockaddr_vm *remote, struct msghdr *msg,
> > > size_t dgram_len)
> > > @@ -826,7 +821,6 @@ static struct vsock_transport hvs_transport = {
> > > .connect = hvs_connect,
> > > .shutdown = hvs_shutdown,
> > >
> > > - .dgram_bind = hvs_dgram_bind,
> > > .dgram_enqueue = hvs_dgram_enqueue,
> > > .dgram_allow = hvs_dgram_allow,
> > >
> > > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > > index 66edffdbf303..ac2126c7dac5 100644
> > > --- a/net/vmw_vsock/virtio_transport.c
> > > +++ b/net/vmw_vsock/virtio_transport.c
> > > @@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = {
> > > .shutdown = virtio_transport_shutdown,
> > > .cancel_pkt = virtio_transport_cancel_pkt,
> > >
> > > - .dgram_bind = virtio_transport_dgram_bind,
> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > .dgram_allow = virtio_transport_dgram_allow,
> > >
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index 01ea1402ad40..ffcbdd77feaa 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -781,13 +781,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> > > }
> > > EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> > >
> > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > - struct sockaddr_vm *addr)
> > > -{
> > > - return -EOPNOTSUPP;
> > > -}
> > > -EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> > > -
> > > bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > > {
> > > return false;
> > > diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> > > index 0bbbdb222245..857b0461f856 100644
> > > --- a/net/vmw_vsock/vmci_transport.c
> > > +++ b/net/vmw_vsock/vmci_transport.c
> > > @@ -2072,7 +2072,7 @@ static int __init vmci_transport_init(void)
> > > /* Register only with dgram feature, other features (H2G, G2H) will be
> > > * registered when the first host or guest becomes active.
> > > */
> > > - err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM);
> > > + err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM_FALLBACK);
> > > if (err < 0)
> > > goto err_unsubscribe;
> > >
> > > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > > index 2a59dd177c74..278235ea06c4 100644
> > > --- a/net/vmw_vsock/vsock_loopback.c
> > > +++ b/net/vmw_vsock/vsock_loopback.c
> > > @@ -61,7 +61,6 @@ static struct virtio_transport loopback_transport = {
> > > .shutdown = virtio_transport_shutdown,
> > > .cancel_pkt = vsock_loopback_cancel_pkt,
> > >
> > > - .dgram_bind = virtio_transport_dgram_bind,
> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > .dgram_allow = virtio_transport_dgram_allow,
> > >
> > >
On Thu, Aug 03, 2023 at 12:53:22AM +0000, Bobby Eshleman wrote:
>On Wed, Aug 02, 2023 at 10:24:44PM +0000, Bobby Eshleman wrote:
>> On Sun, Jul 23, 2023 at 12:53:15AM +0300, Arseniy Krasnov wrote:
>> >
>> >
>> > On 19.07.2023 03:50, Bobby Eshleman wrote:
>> > > This patch adds support for multi-transport datagrams.
>> > >
>> > > This includes:
>> > > - Per-packet lookup of transports when using sendto(sockaddr_vm)
>> > > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
>> > > sockaddr_vm
>> > > - rename VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
>> > > - connect() now assigns the transport (similar to connectible
>> > > sockets)
>> > >
>> > > To preserve backwards compatibility with VMCI, some important changes
>> > > are made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
>> > > be used for dgrams only if there is not yet a g2h or h2g transport that
>> > > has been registered that can transmit the packet. If there is a g2h/h2g
>> > > transport for that remote address, then that transport will be used and
>> > > not "transport_dgram". This essentially makes "transport_dgram" a
>> > > fallback transport for when h2g/g2h has not yet gone online, and so it
>> > > is renamed "transport_dgram_fallback". VMCI implements this transport.
>> > >
>> > > The logic around "transport_dgram" needs to be retained to prevent
>> > > breaking VMCI:
>> > >
>> > > 1) VMCI datagrams existed prior to h2g/g2h and so operate under a
>> > > different paradigm. When the vmci transport comes online, it registers
>> > > itself with the DGRAM feature, but not H2G/G2H. Only later when the
>> > > transport has more information about its environment does it register
>> > > H2G or G2H. In the case that a datagram socket is created after
>> > > VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
>> > > the "transport_dgram" transport is the only registered transport and so
>> > > needs to be used.
>> > >
>> > > 2) VMCI seems to require a special message be sent by the transport when a
>> > > datagram socket calls bind(). Under the h2g/g2h model, the transport
>> > > is selected using the remote_addr which is set by connect(). At
>> > > bind time there is no remote_addr because often no connect() has been
>> > > called yet: the transport is null. Therefore, with a null transport
>> > > there doesn't seem to be any good way for a datagram socket to tell the
>> > > VMCI transport that it has just had bind() called upon it.
>> > >
>> > > With the new fallback logic, after H2G/G2H comes online the socket layer
>> > > will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
>> > > coming online, the socket layer will access the VMCI transport via
>> > > "transport_dgram_fallback".
>> > >
>> > > Only transports with a special datagram fallback use-case such as VMCI
>> > > need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
>> > >
>> > > Signed-off-by: Bobby Eshleman <[email protected]>
>> > > ---
>> > > drivers/vhost/vsock.c | 1 -
>> > > include/linux/virtio_vsock.h | 2 --
>> > > include/net/af_vsock.h | 10 +++---
>> > > net/vmw_vsock/af_vsock.c | 64 ++++++++++++++++++++++++++-------
>> > > net/vmw_vsock/hyperv_transport.c | 6 ----
>> > > net/vmw_vsock/virtio_transport.c | 1 -
>> > > net/vmw_vsock/virtio_transport_common.c | 7 ----
>> > > net/vmw_vsock/vmci_transport.c | 2 +-
>> > > net/vmw_vsock/vsock_loopback.c | 1 -
>> > > 9 files changed, 58 insertions(+), 36 deletions(-)
>> > >
>> > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> > > index ae8891598a48..d5d6a3c3f273 100644
>> > > --- a/drivers/vhost/vsock.c
>> > > +++ b/drivers/vhost/vsock.c
>> > > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
>> > > .cancel_pkt = vhost_transport_cancel_pkt,
>> > >
>> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
>> > > - .dgram_bind = virtio_transport_dgram_bind,
>> > > .dgram_allow = virtio_transport_dgram_allow,
>> > >
>> > > .stream_enqueue = virtio_transport_stream_enqueue,
>> > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> > > index 18cbe8d37fca..7632552bee58 100644
>> > > --- a/include/linux/virtio_vsock.h
>> > > +++ b/include/linux/virtio_vsock.h
>> > > @@ -211,8 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
>> > > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
>> > > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
>> > > bool virtio_transport_stream_allow(u32 cid, u32 port);
>> > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>> > > - struct sockaddr_vm *addr);
>> > > bool virtio_transport_dgram_allow(u32 cid, u32 port);
>> > >
>> > > int virtio_transport_connect(struct vsock_sock *vsk);
>> > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> > > index 305d57502e89..f6a0ca9d7c3e 100644
>> > > --- a/include/net/af_vsock.h
>> > > +++ b/include/net/af_vsock.h
>> > > @@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
>> > >
>> > > /* Transport features flags */
>> > > /* Transport provides host->guest communication */
>> > > -#define VSOCK_TRANSPORT_F_H2G 0x00000001
>> > > +#define VSOCK_TRANSPORT_F_H2G 0x00000001
>> > > /* Transport provides guest->host communication */
>> > > -#define VSOCK_TRANSPORT_F_G2H 0x00000002
>> > > -/* Transport provides DGRAM communication */
>> > > -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
>> > > +#define VSOCK_TRANSPORT_F_G2H 0x00000002
>> > > +/* Transport provides fallback for DGRAM communication */
>> > > +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
>> > > /* Transport provides local (loopback) communication */
>> > > -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
>> > > +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
>> > >
>> > > struct vsock_transport {
>> > > struct module *module;
>> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> > > index ae5ac5531d96..26c97b33d55a 100644
>> > > --- a/net/vmw_vsock/af_vsock.c
>> > > +++ b/net/vmw_vsock/af_vsock.c
>> > > @@ -139,8 +139,8 @@ struct proto vsock_proto = {
>> > > static const struct vsock_transport *transport_h2g;
>> > > /* Transport used for guest->host communication */
>> > > static const struct vsock_transport *transport_g2h;
>> > > -/* Transport used for DGRAM communication */
>> > > -static const struct vsock_transport *transport_dgram;
>> > > +/* Transport used as a fallback for DGRAM communication */
>> > > +static const struct vsock_transport *transport_dgram_fallback;
>> > > /* Transport used for local communication */
>> > > static const struct vsock_transport *transport_local;
>> > > static DEFINE_MUTEX(vsock_register_mutex);
>> > > @@ -439,6 +439,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
>> > > return transport;
>> > > }
>> > >
>> > > +static const struct vsock_transport *
>> > > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
>> > > +{
>> > > + const struct vsock_transport *transport;
>> > > +
>> > > + transport = vsock_connectible_lookup_transport(cid, flags);
>> > > + if (transport)
>> > > + return transport;
>> > > +
>> > > + return transport_dgram_fallback;
>> > > +}
>> > > +
>> > > /* Assign a transport to a socket and call the .init transport callback.
>> > > *
>> > > * Note: for connection oriented socket this must be called when vsk->remote_addr
>> > > @@ -475,7 +487,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>> > >
>> > > switch (sk->sk_type) {
>> > > case SOCK_DGRAM:
>> > > - new_transport = transport_dgram;
>> > > + new_transport = vsock_dgram_lookup_transport(remote_cid,
>> > > + remote_flags);
>> >
>> > I'm a little bit confused about this:
>> > 1) Let's create SOCK_DGRAM socket using vsock_create()
>> > 2) for SOCK_DGRAM it calls 'vsock_assign_transport()' and we go here, remote_cid == -1
>> > 3) I guess 'vsock_dgram_lookup_transport()' calls logic from 0002 and returns h2g for such remote cid, which is not
>> > correct I think...
>> >
>> > Please correct me if i'm wrong
>> >
>> > Thanks, Arseniy
>> >
>>
>> As I understand, for the VMCI case, if transport_h2g != NULL, then
>> transport_h2g == transport_dgram_fallback. In either case,
>> vsk->transport == transport_dgram_fallback.
>>
>> For the virtio/vhost case, temporarily vsk->transport == transport_h2g,
>> but it is unused because vsk->transport->dgram_bind == NULL.
>>
>> Until SS_CONNECTED is set by connect() and vsk->transport is set
>> correctly, the send path is barred from using the bad transport.
>>
>> I guess the recvmsg() path is a little more sketchy, and probably only
>> works in my test cases because h2g/g2h in the vhost/virtio case have
>> identical dgram_addr_init() implementations.
>>
>> I think a cleaner solution is maybe checking in vsock_create() if
>> dgram_bind is implemented. If it is not, then vsk->transport should be
>> reset to NULL and a comment added explaining why VMCI requires this.
>>
>> Then the other calls can begin explicitly checking for vsk->transport ==
>> NULL.
>
>Actually, on further reflection here, in order for the vsk->transport to
>be called in time for ->dgram_addr_init(), it is going to be necessary
>to call vsock_assign_transport() in vsock_dgram_bind() anyway.
>
>I think this means that the vsock_assign_transport() call can be removed
>from vsock_create() call entirely, and yet VMCI can still dispatch
>messages upon bind() calls as needed.
>
>This would then simplify the whole arrangement, if there aren't other
>unseen issues.
This sounds like a good approach.
My only question is whether vsock_dgram_bind() is always called for each
dgram socket.
Stefano
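For readers following the quoted patch, here is a small userspace model of the fallback lookup and of the case Arseniy raises (the remote CID is still unset when the socket is created). It is only a sketch: the model_* names are hypothetical, and model_connectible_lookup() is an assumed stand-in for vsock_connectible_lookup_transport(), whose body (from patch 0002) is not shown in this thread. The constants mirror VMADDR_CID_ANY, VMADDR_CID_HOST and VMADDR_FLAG_TO_HOST.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_CID_ANY      0xffffffffu /* mirrors VMADDR_CID_ANY (-1U) */
#define MODEL_CID_HOST     2u          /* mirrors VMADDR_CID_HOST */
#define MODEL_FLAG_TO_HOST 0x01        /* mirrors VMADDR_FLAG_TO_HOST */

/* Hypothetical stand-in for struct vsock_transport. */
struct model_transport {
	const char *name;
};

static const struct model_transport *model_h2g;
static const struct model_transport *model_g2h;
static const struct model_transport *model_dgram_fallback;

/* ASSUMPTION: approximates the h2g/g2h selection used for connectible
 * sockets; the local (loopback) transport is left out for brevity.
 */
static const struct model_transport *
model_connectible_lookup(unsigned int cid, unsigned char flags)
{
	if (cid <= MODEL_CID_HOST || !model_h2g ||
	    (flags & MODEL_FLAG_TO_HOST))
		return model_g2h;

	return model_h2g;
}

/* Same shape as vsock_dgram_lookup_transport() in the quoted diff:
 * prefer h2g/g2h, otherwise use the datagram fallback.
 */
static const struct model_transport *
model_dgram_lookup(unsigned int cid, unsigned char flags)
{
	const struct model_transport *transport;

	transport = model_connectible_lookup(cid, flags);
	if (transport)
		return transport;

	return model_dgram_fallback;
}
```

In this model, once an h2g transport is registered, a lookup with MODEL_CID_ANY resolves to h2g rather than to the fallback, which is the behaviour questioned above for a freshly created socket.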
On Thu, Aug 03, 2023 at 02:42:26PM +0200, Stefano Garzarella wrote:
> On Thu, Aug 03, 2023 at 12:53:22AM +0000, Bobby Eshleman wrote:
> > On Wed, Aug 02, 2023 at 10:24:44PM +0000, Bobby Eshleman wrote:
> > > On Sun, Jul 23, 2023 at 12:53:15AM +0300, Arseniy Krasnov wrote:
> > > >
> > > >
> > > > On 19.07.2023 03:50, Bobby Eshleman wrote:
> > > > > This patch adds support for multi-transport datagrams.
> > > > >
> > > > > This includes:
> > > > > - Per-packet lookup of transports when using sendto(sockaddr_vm)
> > > > > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> > > > > sockaddr_vm
> > > > > - rename VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
> > > > > > - connect() now assigns the transport (similar to connectible
> > > > > > sockets)
> > > > >
> > > > > To preserve backwards compatibility with VMCI, some important changes
> > > > > are made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
> > > > > be used for dgrams only if there is not yet a g2h or h2g transport that
> > > > > has been registered that can transmit the packet. If there is a g2h/h2g
> > > > > transport for that remote address, then that transport will be used and
> > > > > not "transport_dgram". This essentially makes "transport_dgram" a
> > > > > fallback transport for when h2g/g2h has not yet gone online, and so it
> > > > > is renamed "transport_dgram_fallback". VMCI implements this transport.
> > > > >
> > > > > The logic around "transport_dgram" needs to be retained to prevent
> > > > > breaking VMCI:
> > > > >
> > > > > 1) VMCI datagrams existed prior to h2g/g2h and so operate under a
> > > > > different paradigm. When the vmci transport comes online, it registers
> > > > > itself with the DGRAM feature, but not H2G/G2H. Only later when the
> > > > > transport has more information about its environment does it register
> > > > > H2G or G2H. In the case that a datagram socket is created after
> > > > > VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
> > > > > the "transport_dgram" transport is the only registered transport and so
> > > > > needs to be used.
> > > > >
> > > > > 2) VMCI seems to require a special message be sent by the transport when a
> > > > > datagram socket calls bind(). Under the h2g/g2h model, the transport
> > > > > is selected using the remote_addr which is set by connect(). At
> > > > > bind time there is no remote_addr because often no connect() has been
> > > > > called yet: the transport is null. Therefore, with a null transport
> > > > > there doesn't seem to be any good way for a datagram socket to tell the
> > > > > VMCI transport that it has just had bind() called upon it.
> > > > >
> > > > > With the new fallback logic, after H2G/G2H comes online the socket layer
> > > > > will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
> > > > > coming online, the socket layer will access the VMCI transport via
> > > > > "transport_dgram_fallback".
> > > > >
> > > > > Only transports with a special datagram fallback use-case such as VMCI
> > > > > need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
> > > > >
> > > > > Signed-off-by: Bobby Eshleman <[email protected]>
> > > > > ---
> > > > > drivers/vhost/vsock.c | 1 -
> > > > > include/linux/virtio_vsock.h | 2 --
> > > > > include/net/af_vsock.h | 10 +++---
> > > > > net/vmw_vsock/af_vsock.c | 64 ++++++++++++++++++++++++++-------
> > > > > net/vmw_vsock/hyperv_transport.c | 6 ----
> > > > > net/vmw_vsock/virtio_transport.c | 1 -
> > > > > net/vmw_vsock/virtio_transport_common.c | 7 ----
> > > > > net/vmw_vsock/vmci_transport.c | 2 +-
> > > > > net/vmw_vsock/vsock_loopback.c | 1 -
> > > > > 9 files changed, 58 insertions(+), 36 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > > index ae8891598a48..d5d6a3c3f273 100644
> > > > > --- a/drivers/vhost/vsock.c
> > > > > +++ b/drivers/vhost/vsock.c
> > > > > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> > > > > .cancel_pkt = vhost_transport_cancel_pkt,
> > > > >
> > > > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > > > - .dgram_bind = virtio_transport_dgram_bind,
> > > > > .dgram_allow = virtio_transport_dgram_allow,
> > > > >
> > > > > .stream_enqueue = virtio_transport_stream_enqueue,
> > > > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > > > index 18cbe8d37fca..7632552bee58 100644
> > > > > --- a/include/linux/virtio_vsock.h
> > > > > +++ b/include/linux/virtio_vsock.h
> > > > > @@ -211,8 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > > > > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > > > > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > > > > bool virtio_transport_stream_allow(u32 cid, u32 port);
> > > > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > > > - struct sockaddr_vm *addr);
> > > > > bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > > > >
> > > > > int virtio_transport_connect(struct vsock_sock *vsk);
> > > > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > > > > index 305d57502e89..f6a0ca9d7c3e 100644
> > > > > --- a/include/net/af_vsock.h
> > > > > +++ b/include/net/af_vsock.h
> > > > > @@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
> > > > >
> > > > > /* Transport features flags */
> > > > > /* Transport provides host->guest communication */
> > > > > -#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > > > > +#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > > > > /* Transport provides guest->host communication */
> > > > > -#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > > > > -/* Transport provides DGRAM communication */
> > > > > -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
> > > > > +#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > > > > +/* Transport provides fallback for DGRAM communication */
> > > > > +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
> > > > > /* Transport provides local (loopback) communication */
> > > > > -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> > > > > +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> > > > >
> > > > > struct vsock_transport {
> > > > > struct module *module;
> > > > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > > > index ae5ac5531d96..26c97b33d55a 100644
> > > > > --- a/net/vmw_vsock/af_vsock.c
> > > > > +++ b/net/vmw_vsock/af_vsock.c
> > > > > @@ -139,8 +139,8 @@ struct proto vsock_proto = {
> > > > > static const struct vsock_transport *transport_h2g;
> > > > > /* Transport used for guest->host communication */
> > > > > static const struct vsock_transport *transport_g2h;
> > > > > -/* Transport used for DGRAM communication */
> > > > > -static const struct vsock_transport *transport_dgram;
> > > > > +/* Transport used as a fallback for DGRAM communication */
> > > > > +static const struct vsock_transport *transport_dgram_fallback;
> > > > > /* Transport used for local communication */
> > > > > static const struct vsock_transport *transport_local;
> > > > > static DEFINE_MUTEX(vsock_register_mutex);
> > > > > @@ -439,6 +439,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> > > > > return transport;
> > > > > }
> > > > >
> > > > > +static const struct vsock_transport *
> > > > > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> > > > > +{
> > > > > + const struct vsock_transport *transport;
> > > > > +
> > > > > + transport = vsock_connectible_lookup_transport(cid, flags);
> > > > > + if (transport)
> > > > > + return transport;
> > > > > +
> > > > > + return transport_dgram_fallback;
> > > > > +}
> > > > > +
> > > > > /* Assign a transport to a socket and call the .init transport callback.
> > > > > *
> > > > > * Note: for connection oriented socket this must be called when vsk->remote_addr
> > > > > @@ -475,7 +487,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> > > > >
> > > > > switch (sk->sk_type) {
> > > > > case SOCK_DGRAM:
> > > > > - new_transport = transport_dgram;
> > > > > + new_transport = vsock_dgram_lookup_transport(remote_cid,
> > > > > + remote_flags);
> > > >
> > > > I'm a little bit confused about this:
> > > > 1) Let's create SOCK_DGRAM socket using vsock_create()
> > > > 2) for SOCK_DGRAM it calls 'vsock_assign_transport()' and we go here, remote_cid == -1
> > > > 3) I guess 'vsock_dgram_lookup_transport()' calls logic from 0002 and returns h2g for such remote cid, which is not
> > > > correct I think...
> > > >
> > > > Please correct me if i'm wrong
> > > >
> > > > Thanks, Arseniy
> > > >
> > >
> > > As I understand, for the VMCI case, if transport_h2g != NULL, then
> > > transport_h2g == transport_dgram_fallback. In either case,
> > > vsk->transport == transport_dgram_fallback.
> > >
> > > For the virtio/vhost case, temporarily vsk->transport == transport_h2g,
> > > but it is unused because vsk->transport->dgram_bind == NULL.
> > >
> > > Until SS_CONNECTED is set by connect() and vsk->transport is set
> > > correctly, the send path is barred from using the bad transport.
> > >
> > > I guess the recvmsg() path is a little more sketchy, and probably only
> > > works in my test cases because h2g/g2h in the vhost/virtio case have
> > > identical dgram_addr_init() implementations.
> > >
> > > I think a cleaner solution is maybe checking in vsock_create() if
> > > dgram_bind is implemented. If it is not, then vsk->transport should be
> > > reset to NULL and a comment added explaining why VMCI requires this.
> > >
> > > Then the other calls can begin explicitly checking for vsk->transport ==
> > > NULL.
> >
> > Actually, on further reflection here, in order for the vsk->transport to
> > be called in time for ->dgram_addr_init(), it is going to be necessary
> > to call vsock_assign_transport() in vsock_dgram_bind() anyway.
> >
> > I think this means that the vsock_assign_transport() call can be removed
> > from vsock_create() call entirely, and yet VMCI can still dispatch
> > messages upon bind() calls as needed.
> >
> > This would then simplify the whole arrangement, if there aren't other
> > unseen issues.
>
> This sounds like a good approach.
>
> My only question is whether vsock_dgram_bind() is always called for each
> dgram socket.
>
No, not yet.
Currently, receivers may use vsock_dgram_recvmsg() prior to any bind,
but this should probably change.
For UDP, if we initialize a socket and call recvmsg() with no prior
bind, then the socket will be auto-bound to 0.0.0.0. I guess vsock
should probably also auto-bind in this case.
For other cases, bind may not be called prior to calls to vsock_poll() /
vsock_getname() (even if it doesn't make sense to do so), but I think it
is okay as long as vsk->transport is not used.
vsock_dgram_sendmsg() always auto-binds if needed.
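To make the auto-bind idea concrete, here is a userspace sketch of the proposed behaviour: both the send and the receive path bind the socket on first use, so a bind-time hook (where transport assignment would happen) runs exactly once before any I/O. Everything here is hypothetical modelling code, not the kernel implementation; the model_* names and the starting port 1024 are assumptions made only for illustration, and MODEL_PORT_ANY mirrors VMADDR_PORT_ANY.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PORT_ANY 0xffffffffu /* mirrors VMADDR_PORT_ANY (-1U) */

/* Hypothetical stand-in for a datagram vsock socket. */
struct model_dgram_sock {
	bool bound;
	unsigned int local_port;
	int bind_hook_calls; /* how many times the bind-time hook ran */
};

/* Arbitrary first port handed out by auto-bind in this model. */
static unsigned int model_next_port = 1024;

/* Bind once; rebinding an already bound socket fails. */
static int model_dgram_bind(struct model_dgram_sock *s, unsigned int port)
{
	if (s->bound)
		return -1;

	s->local_port = (port == MODEL_PORT_ANY) ? model_next_port++ : port;
	s->bound = true;
	s->bind_hook_calls++; /* stand-in for assigning the transport */
	return 0;
}

/* Bind to any port if the caller never called bind(). */
static int model_autobind(struct model_dgram_sock *s)
{
	return s->bound ? 0 : model_dgram_bind(s, MODEL_PORT_ANY);
}

/* Receive path: auto-bind first, then (in a real stack) dequeue. */
static int model_dgram_recvmsg(struct model_dgram_sock *s)
{
	int err = model_autobind(s);

	if (err)
		return err;
	return 0;
}

/* Send path: auto-bind first, then (in a real stack) enqueue. */
static int model_dgram_sendmsg(struct model_dgram_sock *s)
{
	int err = model_autobind(s);

	if (err)
		return err;
	return 0;
}
```

With this arrangement the question of whether bind is "always called" reduces to whether every path that touches the transport goes through the auto-bind helper first.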
> Stefano
>
On 03.08.2023 00:23, Bobby Eshleman wrote:
> On Thu, Jul 27, 2023 at 11:00:55AM +0300, Arseniy Krasnov wrote:
>>
>>
>> On 26.07.2023 20:55, Bobby Eshleman wrote:
>>> On Sat, Jul 22, 2023 at 11:42:38AM +0300, Arseniy Krasnov wrote:
>>>>
>>>>
>>>> On 19.07.2023 03:50, Bobby Eshleman wrote:
>>>>> This commit implements datagram support for vhost/vsock by teaching
>>>>> vhost to use the common virtio transport datagram functions.
>>>>>
>>>>> If the virtio RX buffer is too small, then the transmission is
>>>>> abandoned, the packet dropped, and EHOSTUNREACH is added to the socket's
>>>>> error queue.
>>>>>
>>>>> Signed-off-by: Bobby Eshleman <[email protected]>
>>>>> ---
>>>>> drivers/vhost/vsock.c | 62 +++++++++++++++++++++++++++++++++++++++++++++---
>>>>> net/vmw_vsock/af_vsock.c | 5 +++-
>>>>> 2 files changed, 63 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>>>> index d5d6a3c3f273..da14260c6654 100644
>>>>> --- a/drivers/vhost/vsock.c
>>>>> +++ b/drivers/vhost/vsock.c
>>>>> @@ -8,6 +8,7 @@
>>>>> */
>>>>> #include <linux/miscdevice.h>
>>>>> #include <linux/atomic.h>
>>>>> +#include <linux/errqueue.h>
>>>>> #include <linux/module.h>
>>>>> #include <linux/mutex.h>
>>>>> #include <linux/vmalloc.h>
>>>>> @@ -32,7 +33,8 @@
>>>>> enum {
>>>>> VHOST_VSOCK_FEATURES = VHOST_FEATURES |
>>>>> (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
>>>>> - (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
>>>>> + (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
>>>>> + (1ULL << VIRTIO_VSOCK_F_DGRAM)
>>>>> };
>>>>>
>>>>> enum {
>>>>> @@ -56,6 +58,7 @@ struct vhost_vsock {
>>>>> atomic_t queued_replies;
>>>>>
>>>>> u32 guest_cid;
>>>>> + bool dgram_allow;
>>>>> bool seqpacket_allow;
>>>>> };
>>>>>
>>>>> @@ -86,6 +89,32 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>>>>> return NULL;
>>>>> }
>>>>>
>>>>> +/* Claims ownership of the skb, do not free the skb after calling! */
>>>>> +static void
>>>>> +vhost_transport_error(struct sk_buff *skb, int err)
>>>>> +{
>>>>> + struct sock_exterr_skb *serr;
>>>>> + struct sock *sk = skb->sk;
>>>>> + struct sk_buff *clone;
>>>>> +
>>>>> + serr = SKB_EXT_ERR(skb);
>>>>> + memset(serr, 0, sizeof(*serr));
>>>>> + serr->ee.ee_errno = err;
>>>>> + serr->ee.ee_origin = SO_EE_ORIGIN_NONE;
>>>>> +
>>>>> + clone = skb_clone(skb, GFP_KERNEL);
>>>>
>>>> Maybe for the skb that carries the error we can use 'sock_omalloc()' instead of 'skb_clone()'? TCP uses skbs
>>>> allocated by this function as carriers of the error structure. I guess 'skb_clone()' also clones the data of the
>>>> original, but I think there is no need for the data as we insert it into the error queue of the socket.
>>>>
>>>> What do You think?
>>>
>>> IIUC skb_clone() is often used in this scenario so that the user can
>>> retrieve the error-causing packet from the error queue. Is there some
>>> reason we shouldn't do this?
>>>
>>> I'm seeing that the serr bits need to occur on the clone here, not the
>>> original. I didn't realize the SKB_EXT_ERR() is a skb->cb cast. I'm not
>>> actually sure how this passes the test case since ->cb isn't cloned.
>>
>> Ah yes, sorry, You are right, I just confused this case with zerocopy completion
>> handling - there we allocate "empty" skb which carries completion metadata in its
>> 'cb' field.
>>
>> Hm, but can't we just reinsert the current skb (update its 'cb' as 'sock_exterr_skb')
>> into the error queue of the socket without cloning it?
>>
>> Thanks, Arseniy
>>
>
> I just assumed other socket types used skb_clone() for some reason
> unknown to me and I didn't want to deviate.
>
> If it is fine to just use the skb directly, then I am happy to make that
> change.
Agreed, it is better to follow the behaviour of already-implemented sockets.
I also found that ICMP clones the skb in this way:
https://elixir.bootlin.com/linux/latest/source/net/ipv4/ip_sockglue.c#L412
skb = skb_clone(skb, GFP_ATOMIC);
I guess there is some reason behind 'skb = skb_clone(skb)'...
Thanks, Arseniy
>
> Best,
> Bobby
>
>>>
>>>>
>>>>> + if (!clone)
>>>>> + return;
>>>>
>>>> What will happen here 'if (!clone)' ? skb will leak as it was removed from queue?
>>>>
>>>
>>> Ah yes, true.
>>>
>>>>> +
>>>>> + if (sock_queue_err_skb(sk, clone))
>>>>> + kfree_skb(clone);
>>>>> +
>>>>> + sk->sk_err = err;
>>>>> + sk_error_report(sk);
>>>>> +
>>>>> + kfree_skb(skb);
>>>>> +}
>>>>> +
>>>>> static void
>>>>> vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>>>>> struct vhost_virtqueue *vq)
>>>>> @@ -160,9 +189,15 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>>>>> hdr = virtio_vsock_hdr(skb);
>>>>>
>>>>> /* If the packet is greater than the space available in the
>>>>> - * buffer, we split it using multiple buffers.
>>>>> + * buffer, we split it using multiple buffers for connectible
>>>>> + * sockets and drop the packet for datagram sockets.
>>>>> */
>>>>> if (payload_len > iov_len - sizeof(*hdr)) {
>>>>> + if (le16_to_cpu(hdr->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
>>>>> + vhost_transport_error(skb, EHOSTUNREACH);
>>>>> + continue;
>>>>> + }
>>>>> +
>>>>> payload_len = iov_len - sizeof(*hdr);
>>>>>
>>>>> /* As we are copying pieces of large packet's buffer to
>>>>> @@ -394,6 +429,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
>>>>> return val < vq->num;
>>>>> }
>>>>>
>>>>> +static bool vhost_transport_dgram_allow(u32 cid, u32 port);
>>>>> static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>>>>>
>>>>> static struct virtio_transport vhost_transport = {
>>>>> @@ -410,7 +446,8 @@ static struct virtio_transport vhost_transport = {
>>>>> .cancel_pkt = vhost_transport_cancel_pkt,
>>>>>
>>>>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>>>>> - .dgram_allow = virtio_transport_dgram_allow,
>>>>> + .dgram_allow = vhost_transport_dgram_allow,
>>>>> + .dgram_addr_init = virtio_transport_dgram_addr_init,
>>>>>
>>>>> .stream_enqueue = virtio_transport_stream_enqueue,
>>>>> .stream_dequeue = virtio_transport_stream_dequeue,
>>>>> @@ -443,6 +480,22 @@ static struct virtio_transport vhost_transport = {
>>>>> .send_pkt = vhost_transport_send_pkt,
>>>>> };
>>>>>
>>>>> +static bool vhost_transport_dgram_allow(u32 cid, u32 port)
>>>>> +{
>>>>> + struct vhost_vsock *vsock;
>>>>> + bool dgram_allow = false;
>>>>> +
>>>>> + rcu_read_lock();
>>>>> + vsock = vhost_vsock_get(cid);
>>>>> +
>>>>> + if (vsock)
>>>>> + dgram_allow = vsock->dgram_allow;
>>>>> +
>>>>> + rcu_read_unlock();
>>>>> +
>>>>> + return dgram_allow;
>>>>> +}
>>>>> +
>>>>> static bool vhost_transport_seqpacket_allow(u32 remote_cid)
>>>>> {
>>>>> struct vhost_vsock *vsock;
>>>>> @@ -799,6 +852,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
>>>>> if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
>>>>> vsock->seqpacket_allow = true;
>>>>>
>>>>> + if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
>>>>> + vsock->dgram_allow = true;
>>>>> +
>>>>> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
>>>>> vq = &vsock->vqs[i];
>>>>> mutex_lock(&vq->mutex);
>>>>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>>>> index e73f3b2c52f1..449ed63ac2b0 100644
>>>>> --- a/net/vmw_vsock/af_vsock.c
>>>>> +++ b/net/vmw_vsock/af_vsock.c
>>>>> @@ -1427,9 +1427,12 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>>>>> return prot->recvmsg(sk, msg, len, flags, NULL);
>>>>> #endif
>>>>>
>>>>> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
>>>>> + if (unlikely(flags & MSG_OOB))
>>>>> return -EOPNOTSUPP;
>>>>>
>>>>> + if (unlikely(flags & MSG_ERRQUEUE))
>>>>> + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
>>>>> +
>>>>
>>>> Sorry, but I get a build error here, because SOL_VSOCK is undefined. I think it should be added to
>>>> include/linux/socket.h and to the uapi files also, for future use in userspace.
>>>>
>>>
>>> Strange, I built each patch individually without issue. My base is
>>> netdev/main with your SOL_VSOCK patch applied. I will look today and see
>>> if I'm missing something.
>>>
>>>> Also Stefano Garzarella <[email protected]> suggested adding a define, something like VSOCK_RECVERR,
>>>> in the same way as IP_RECVERR, and using it as the last parameter of 'sock_recv_errqueue()'.
>>>>
>>>
>>> Got it, thanks.
>>>
>>>>> transport = vsk->transport;
>>>>>
>>>>> /* Retrieve the head sk_buff from the socket's receive queue. */
>>>>>
>>>>
>>>> Thanks, Arseniy
>>>
>>> Thanks,
>>> Bobby
On Thu, Aug 03, 2023 at 06:58:24PM +0000, Bobby Eshleman wrote:
>On Thu, Aug 03, 2023 at 02:42:26PM +0200, Stefano Garzarella wrote:
>> On Thu, Aug 03, 2023 at 12:53:22AM +0000, Bobby Eshleman wrote:
>> > On Wed, Aug 02, 2023 at 10:24:44PM +0000, Bobby Eshleman wrote:
>> > > On Sun, Jul 23, 2023 at 12:53:15AM +0300, Arseniy Krasnov wrote:
>> > > >
>> > > >
>> > > > On 19.07.2023 03:50, Bobby Eshleman wrote:
>> > > > > This patch adds support for multi-transport datagrams.
>> > > > >
>> > > > > This includes:
>> > > > > - Per-packet lookup of transports when using sendto(sockaddr_vm)
>> > > > > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
>> > > > > sockaddr_vm
>> > > > > - rename VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
>> > > > > - connect() now assigns the transport (similar to connectible
>> > > > > sockets)
>> > > > >
>> > > > > To preserve backwards compatibility with VMCI, some important changes
>> > > > > are made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
>> > > > > be used for dgrams only if there is not yet a g2h or h2g transport that
>> > > > > has been registered that can transmit the packet. If there is a g2h/h2g
>> > > > > transport for that remote address, then that transport will be used and
>> > > > > not "transport_dgram". This essentially makes "transport_dgram" a
>> > > > > fallback transport for when h2g/g2h has not yet gone online, and so it
>> > > > > is renamed "transport_dgram_fallback". VMCI implements this transport.
>> > > > >
>> > > > > The logic around "transport_dgram" needs to be retained to prevent
>> > > > > breaking VMCI:
>> > > > >
>> > > > > 1) VMCI datagrams existed prior to h2g/g2h and so operate under a
>> > > > > different paradigm. When the vmci transport comes online, it registers
>> > > > > itself with the DGRAM feature, but not H2G/G2H. Only later when the
>> > > > > transport has more information about its environment does it register
>> > > > > H2G or G2H. In the case that a datagram socket is created after
>> > > > > VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
>> > > > > the "transport_dgram" transport is the only registered transport and so
>> > > > > needs to be used.
>> > > > >
>> > > > > 2) VMCI seems to require a special message be sent by the transport when a
>> > > > > datagram socket calls bind(). Under the h2g/g2h model, the transport
>> > > > > is selected using the remote_addr which is set by connect(). At
>> > > > > bind time there is no remote_addr because often no connect() has been
>> > > > > called yet: the transport is null. Therefore, with a null transport
>> > > > > there doesn't seem to be any good way for a datagram socket to tell the
>> > > > > VMCI transport that it has just had bind() called upon it.
>> > > > >
>> > > > > With the new fallback logic, after H2G/G2H comes online the socket layer
>> > > > > will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
>> > > > > coming online, the socket layer will access the VMCI transport via
>> > > > > "transport_dgram_fallback".
>> > > > >
>> > > > > Only transports with a special datagram fallback use-case such as VMCI
>> > > > > need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
>> > > > >
>> > > > > Signed-off-by: Bobby Eshleman <[email protected]>
>> > > > > ---
>> > > > > drivers/vhost/vsock.c | 1 -
>> > > > > include/linux/virtio_vsock.h | 2 --
>> > > > > include/net/af_vsock.h | 10 +++---
>> > > > > net/vmw_vsock/af_vsock.c | 64 ++++++++++++++++++++++++++-------
>> > > > > net/vmw_vsock/hyperv_transport.c | 6 ----
>> > > > > net/vmw_vsock/virtio_transport.c | 1 -
>> > > > > net/vmw_vsock/virtio_transport_common.c | 7 ----
>> > > > > net/vmw_vsock/vmci_transport.c | 2 +-
>> > > > > net/vmw_vsock/vsock_loopback.c | 1 -
>> > > > > 9 files changed, 58 insertions(+), 36 deletions(-)
>> > > > >
>> > > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> > > > > index ae8891598a48..d5d6a3c3f273 100644
>> > > > > --- a/drivers/vhost/vsock.c
>> > > > > +++ b/drivers/vhost/vsock.c
>> > > > > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
>> > > > > .cancel_pkt = vhost_transport_cancel_pkt,
>> > > > >
>> > > > > .dgram_enqueue = virtio_transport_dgram_enqueue,
>> > > > > - .dgram_bind = virtio_transport_dgram_bind,
>> > > > > .dgram_allow = virtio_transport_dgram_allow,
>> > > > >
>> > > > > .stream_enqueue = virtio_transport_stream_enqueue,
>> > > > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> > > > > index 18cbe8d37fca..7632552bee58 100644
>> > > > > --- a/include/linux/virtio_vsock.h
>> > > > > +++ b/include/linux/virtio_vsock.h
>> > > > > @@ -211,8 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
>> > > > > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
>> > > > > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
>> > > > > bool virtio_transport_stream_allow(u32 cid, u32 port);
>> > > > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>> > > > > - struct sockaddr_vm *addr);
>> > > > > bool virtio_transport_dgram_allow(u32 cid, u32 port);
>> > > > >
>> > > > > int virtio_transport_connect(struct vsock_sock *vsk);
>> > > > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> > > > > index 305d57502e89..f6a0ca9d7c3e 100644
>> > > > > --- a/include/net/af_vsock.h
>> > > > > +++ b/include/net/af_vsock.h
>> > > > > @@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
>> > > > >
>> > > > > /* Transport features flags */
>> > > > > /* Transport provides host->guest communication */
>> > > > > -#define VSOCK_TRANSPORT_F_H2G 0x00000001
>> > > > > +#define VSOCK_TRANSPORT_F_H2G 0x00000001
>> > > > > /* Transport provides guest->host communication */
>> > > > > -#define VSOCK_TRANSPORT_F_G2H 0x00000002
>> > > > > -/* Transport provides DGRAM communication */
>> > > > > -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
>> > > > > +#define VSOCK_TRANSPORT_F_G2H 0x00000002
>> > > > > +/* Transport provides fallback for DGRAM communication */
>> > > > > +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
>> > > > > /* Transport provides local (loopback) communication */
>> > > > > -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
>> > > > > +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
>> > > > >
>> > > > > struct vsock_transport {
>> > > > > struct module *module;
>> > > > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> > > > > index ae5ac5531d96..26c97b33d55a 100644
>> > > > > --- a/net/vmw_vsock/af_vsock.c
>> > > > > +++ b/net/vmw_vsock/af_vsock.c
>> > > > > @@ -139,8 +139,8 @@ struct proto vsock_proto = {
>> > > > > static const struct vsock_transport *transport_h2g;
>> > > > > /* Transport used for guest->host communication */
>> > > > > static const struct vsock_transport *transport_g2h;
>> > > > > -/* Transport used for DGRAM communication */
>> > > > > -static const struct vsock_transport *transport_dgram;
>> > > > > +/* Transport used as a fallback for DGRAM communication */
>> > > > > +static const struct vsock_transport *transport_dgram_fallback;
>> > > > > /* Transport used for local communication */
>> > > > > static const struct vsock_transport *transport_local;
>> > > > > static DEFINE_MUTEX(vsock_register_mutex);
>> > > > > @@ -439,6 +439,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
>> > > > > return transport;
>> > > > > }
>> > > > >
>> > > > > +static const struct vsock_transport *
>> > > > > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
>> > > > > +{
>> > > > > + const struct vsock_transport *transport;
>> > > > > +
>> > > > > + transport = vsock_connectible_lookup_transport(cid, flags);
>> > > > > + if (transport)
>> > > > > + return transport;
>> > > > > +
>> > > > > + return transport_dgram_fallback;
>> > > > > +}
>> > > > > +
>> > > > > /* Assign a transport to a socket and call the .init transport callback.
>> > > > > *
>> > > > > * Note: for connection oriented socket this must be called when vsk->remote_addr
>> > > > > @@ -475,7 +487,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>> > > > >
>> > > > > switch (sk->sk_type) {
>> > > > > case SOCK_DGRAM:
>> > > > > - new_transport = transport_dgram;
>> > > > > + new_transport = vsock_dgram_lookup_transport(remote_cid,
>> > > > > + remote_flags);
>> > > >
>> > > > I'm a little bit confused about this:
>> > > > 1) Let's create SOCK_DGRAM socket using vsock_create()
>> > > > 2) for SOCK_DGRAM it calls 'vsock_assign_transport()' and we go here, remote_cid == -1
>> > > > 3) I guess 'vsock_dgram_lookup_transport()' calls logic from 0002 and returns h2g for such remote cid, which is not
>> > > > correct I think...
>> > > >
>> > > > Please correct me if I'm wrong
>> > > >
>> > > > Thanks, Arseniy
>> > > >
>> > >
>> > > As I understand, for the VMCI case, if transport_h2g != NULL, then
>> > > transport_h2g == transport_dgram_fallback. In either case,
>> > > vsk->transport == transport_dgram_fallback.
>> > >
>> > > For the virtio/vhost case, temporarily vsk->transport == transport_h2g,
>> > > but it is unused because vsk->transport->dgram_bind == NULL.
>> > >
>> > > Until SS_CONNECTED is set by connect() and vsk->transport is set
>> > > correctly, the send path is barred from using the bad transport.
>> > >
>> > > I guess the recvmsg() path is a little more sketchy, and probably only
>> > > works in my test cases because h2g/g2h in the vhost/virtio case have
>> > > identical dgram_addr_init() implementations.
>> > >
>> > > I think a cleaner solution is maybe checking in vsock_create() if
>> > > dgram_bind is implemented. If it is not, then vsk->transport should be
>> > > reset to NULL and a comment added explaining why VMCI requires this.
>> > >
>> > > Then the other calls can begin explicitly checking for vsk->transport ==
>> > > NULL.
>> >
>> > Actually, on further reflection here, in order for vsk->transport to
>> > be assigned in time for ->dgram_addr_init(), it is going to be necessary
>> > to call vsock_assign_transport() in vsock_dgram_bind() anyway.
>> >
>> > I think this means that the vsock_assign_transport() call can be removed
>> > from vsock_create() call entirely, and yet VMCI can still dispatch
>> > messages upon bind() calls as needed.
>> >
>> > This would then simplify the whole arrangement, if there aren't other
>> > unseen issues.
>>
>> This sounds like a good approach.
>>
>> My only question is whether vsock_dgram_bind() is always called for each
>> dgram socket.
>>
>
>No, not yet.
>
>Currently, receivers may use vsock_dgram_recvmsg() prior to any bind,
>but this should probably change.
>
>For UDP, if we initialize a socket and call recvmsg() with no prior
>bind, then the socket will be auto-bound to 0.0.0.0. I guess vsock
>should probably also auto-bind in this case.
I see.
>
>For other cases, bind may not be called prior to calls to vsock_poll() /
>vsock_getname() (even if it doesn't make sense to do so), but I think it
>is okay as long as vsk->transport is not used.
Makes sense.
>
>vsock_dgram_sendmsg() always auto-binds if needed.
Okay, but the transport for sending messages doesn't depend on the
local address, right?

Thanks,
Stefano
On Fri, Aug 04, 2023 at 04:11:58PM +0200, Stefano Garzarella wrote:
> On Thu, Aug 03, 2023 at 06:58:24PM +0000, Bobby Eshleman wrote:
> > On Thu, Aug 03, 2023 at 02:42:26PM +0200, Stefano Garzarella wrote:
> > > On Thu, Aug 03, 2023 at 12:53:22AM +0000, Bobby Eshleman wrote:
> > > > On Wed, Aug 02, 2023 at 10:24:44PM +0000, Bobby Eshleman wrote:
> > > > > On Sun, Jul 23, 2023 at 12:53:15AM +0300, Arseniy Krasnov wrote:
> > > > > >
> > > > > >
> > > > > > On 19.07.2023 03:50, Bobby Eshleman wrote:
> > > > > > > This patch adds support for multi-transport datagrams.
> > > > > > >
> > > > > > > This includes:
> > > > > > > - Per-packet lookup of transports when using sendto(sockaddr_vm)
> > > > > > > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> > > > > > > sockaddr_vm
> > > > > > > - rename VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
> > > > > > > > - connect() now assigns the transport for datagram sockets (similar to
> > > > > > > > connectible sockets)
> > > > > > >
> > > > > > > To preserve backwards compatibility with VMCI, some important changes
> > > > > > > are made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
> > > > > > > be used for dgrams only if there is not yet a g2h or h2g transport that
> > > > > > > has been registered that can transmit the packet. If there is a g2h/h2g
> > > > > > > transport for that remote address, then that transport will be used and
> > > > > > > not "transport_dgram". This essentially makes "transport_dgram" a
> > > > > > > fallback transport for when h2g/g2h has not yet gone online, and so it
> > > > > > > is renamed "transport_dgram_fallback". VMCI implements this transport.
> > > > > > >
> > > > > > > The logic around "transport_dgram" needs to be retained to prevent
> > > > > > > breaking VMCI:
> > > > > > >
> > > > > > > 1) VMCI datagrams existed prior to h2g/g2h and so operate under a
> > > > > > > different paradigm. When the vmci transport comes online, it registers
> > > > > > > itself with the DGRAM feature, but not H2G/G2H. Only later when the
> > > > > > > transport has more information about its environment does it register
> > > > > > > H2G or G2H. In the case that a datagram socket is created after
> > > > > > > VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
> > > > > > > the "transport_dgram" transport is the only registered transport and so
> > > > > > > needs to be used.
> > > > > > >
> > > > > > > 2) VMCI seems to require a special message be sent by the transport when a
> > > > > > > datagram socket calls bind(). Under the h2g/g2h model, the transport
> > > > > > > is selected using the remote_addr which is set by connect(). At
> > > > > > > bind time there is no remote_addr because often no connect() has been
> > > > > > > called yet: the transport is null. Therefore, with a null transport
> > > > > > > there doesn't seem to be any good way for a datagram socket to tell the
> > > > > > > VMCI transport that it has just had bind() called upon it.
> > > > > > >
> > > > > > > With the new fallback logic, after H2G/G2H comes online the socket layer
> > > > > > > will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
> > > > > > > coming online, the socket layer will access the VMCI transport via
> > > > > > > "transport_dgram_fallback".
> > > > > > >
> > > > > > > Only transports with a special datagram fallback use-case such as VMCI
> > > > > > > need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
> > > > > > >
> > > > > > > Signed-off-by: Bobby Eshleman <[email protected]>
> > > > > > > ---
> > > > > > > drivers/vhost/vsock.c | 1 -
> > > > > > > include/linux/virtio_vsock.h | 2 --
> > > > > > > include/net/af_vsock.h | 10 +++---
> > > > > > > net/vmw_vsock/af_vsock.c | 64 ++++++++++++++++++++++++++-------
> > > > > > > net/vmw_vsock/hyperv_transport.c | 6 ----
> > > > > > > net/vmw_vsock/virtio_transport.c | 1 -
> > > > > > > net/vmw_vsock/virtio_transport_common.c | 7 ----
> > > > > > > net/vmw_vsock/vmci_transport.c | 2 +-
> > > > > > > net/vmw_vsock/vsock_loopback.c | 1 -
> > > > > > > 9 files changed, 58 insertions(+), 36 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > > > > index ae8891598a48..d5d6a3c3f273 100644
> > > > > > > --- a/drivers/vhost/vsock.c
> > > > > > > +++ b/drivers/vhost/vsock.c
> > > > > > > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> > > > > > > .cancel_pkt = vhost_transport_cancel_pkt,
> > > > > > >
> > > > > > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > > > > > - .dgram_bind = virtio_transport_dgram_bind,
> > > > > > > .dgram_allow = virtio_transport_dgram_allow,
> > > > > > >
> > > > > > > .stream_enqueue = virtio_transport_stream_enqueue,
> > > > > > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > > > > > index 18cbe8d37fca..7632552bee58 100644
> > > > > > > --- a/include/linux/virtio_vsock.h
> > > > > > > +++ b/include/linux/virtio_vsock.h
> > > > > > > @@ -211,8 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > > > > > > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > > > > > > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > > > > > > bool virtio_transport_stream_allow(u32 cid, u32 port);
> > > > > > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > > > > > - struct sockaddr_vm *addr);
> > > > > > > bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > > > > > >
> > > > > > > int virtio_transport_connect(struct vsock_sock *vsk);
> > > > > > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > > > > > > index 305d57502e89..f6a0ca9d7c3e 100644
> > > > > > > --- a/include/net/af_vsock.h
> > > > > > > +++ b/include/net/af_vsock.h
> > > > > > > @@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
> > > > > > >
> > > > > > > /* Transport features flags */
> > > > > > > /* Transport provides host->guest communication */
> > > > > > > -#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > > > > > > +#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > > > > > > /* Transport provides guest->host communication */
> > > > > > > -#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > > > > > > -/* Transport provides DGRAM communication */
> > > > > > > -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
> > > > > > > +#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > > > > > > +/* Transport provides fallback for DGRAM communication */
> > > > > > > +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
> > > > > > > /* Transport provides local (loopback) communication */
> > > > > > > -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> > > > > > > +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> > > > > > >
> > > > > > > struct vsock_transport {
> > > > > > > struct module *module;
> > > > > > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > > > > > index ae5ac5531d96..26c97b33d55a 100644
> > > > > > > --- a/net/vmw_vsock/af_vsock.c
> > > > > > > +++ b/net/vmw_vsock/af_vsock.c
> > > > > > > @@ -139,8 +139,8 @@ struct proto vsock_proto = {
> > > > > > > static const struct vsock_transport *transport_h2g;
> > > > > > > /* Transport used for guest->host communication */
> > > > > > > static const struct vsock_transport *transport_g2h;
> > > > > > > -/* Transport used for DGRAM communication */
> > > > > > > -static const struct vsock_transport *transport_dgram;
> > > > > > > +/* Transport used as a fallback for DGRAM communication */
> > > > > > > +static const struct vsock_transport *transport_dgram_fallback;
> > > > > > > /* Transport used for local communication */
> > > > > > > static const struct vsock_transport *transport_local;
> > > > > > > static DEFINE_MUTEX(vsock_register_mutex);
> > > > > > > @@ -439,6 +439,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> > > > > > > return transport;
> > > > > > > }
> > > > > > >
> > > > > > > +static const struct vsock_transport *
> > > > > > > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> > > > > > > +{
> > > > > > > + const struct vsock_transport *transport;
> > > > > > > +
> > > > > > > + transport = vsock_connectible_lookup_transport(cid, flags);
> > > > > > > + if (transport)
> > > > > > > + return transport;
> > > > > > > +
> > > > > > > + return transport_dgram_fallback;
> > > > > > > +}
> > > > > > > +
> > > > > > > /* Assign a transport to a socket and call the .init transport callback.
> > > > > > > *
> > > > > > > * Note: for connection oriented socket this must be called when vsk->remote_addr
> > > > > > > @@ -475,7 +487,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> > > > > > >
> > > > > > > switch (sk->sk_type) {
> > > > > > > case SOCK_DGRAM:
> > > > > > > - new_transport = transport_dgram;
> > > > > > > + new_transport = vsock_dgram_lookup_transport(remote_cid,
> > > > > > > + remote_flags);
> > > > > >
> > > > > > I'm a little bit confused about this:
> > > > > > 1) Let's create SOCK_DGRAM socket using vsock_create()
> > > > > > 2) for SOCK_DGRAM it calls 'vsock_assign_transport()' and we go here, remote_cid == -1
> > > > > > 3) I guess 'vsock_dgram_lookup_transport()' calls logic from 0002 and returns h2g for such remote cid, which is not
> > > > > > correct I think...
> > > > > >
> > > > > > > Please correct me if I'm wrong
> > > > > >
> > > > > > Thanks, Arseniy
> > > > > >
> > > > >
> > > > > As I understand, for the VMCI case, if transport_h2g != NULL, then
> > > > > transport_h2g == transport_dgram_fallback. In either case,
> > > > > vsk->transport == transport_dgram_fallback.
> > > > >
> > > > > For the virtio/vhost case, temporarily vsk->transport == transport_h2g,
> > > > > but it is unused because vsk->transport->dgram_bind == NULL.
> > > > >
> > > > > Until SS_CONNECTED is set by connect() and vsk->transport is set
> > > > > correctly, the send path is barred from using the bad transport.
> > > > >
> > > > > I guess the recvmsg() path is a little more sketchy, and probably only
> > > > > works in my test cases because h2g/g2h in the vhost/virtio case have
> > > > > identical dgram_addr_init() implementations.
> > > > >
> > > > > I think a cleaner solution is maybe checking in vsock_create() if
> > > > > dgram_bind is implemented. If it is not, then vsk->transport should be
> > > > > reset to NULL and a comment added explaining why VMCI requires this.
> > > > >
> > > > > Then the other calls can begin explicitly checking for vsk->transport ==
> > > > > NULL.
> > > >
> > > > > Actually, on further reflection here, in order for vsk->transport to
> > > > > be assigned in time for ->dgram_addr_init(), it is going to be necessary
> > > > > to call vsock_assign_transport() in vsock_dgram_bind() anyway.
> > > >
> > > > I think this means that the vsock_assign_transport() call can be removed
> > > > from vsock_create() call entirely, and yet VMCI can still dispatch
> > > > messages upon bind() calls as needed.
> > > >
> > > > This would then simplify the whole arrangement, if there aren't other
> > > > unseen issues.
> > >
> > > This sounds like a good approach.
> > >
> > > My only question is whether vsock_dgram_bind() is always called for each
> > > dgram socket.
> > >
> >
> > No, not yet.
> >
> > Currently, receivers may use vsock_dgram_recvmsg() prior to any bind,
> > but this should probably change.
> >
> > For UDP, if we initialize a socket and call recvmsg() with no prior
> > bind, then the socket will be auto-bound to 0.0.0.0. I guess vsock
> > should probably also auto-bind in this case.
>
> I see.
>
> >
> > For other cases, bind may not be called prior to calls to vsock_poll() /
> > vsock_getname() (even if it doesn't make sense to do so), but I think it
> > is okay as long as vsk->transport is not used.
>
> Makes sense.
>
> >
> > vsock_dgram_sendmsg() always auto-binds if needed.
>
> Okay, but the transport for sending messages doesn't depend on the
> local address, right?
That is correct.
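
To make that concrete, here is a rough userspace sketch (not the kernel
code) of the lookup in this patch. All sketch_* names are made up for
illustration, and the connectible lookup is a simplified stand-in that
leaves out loopback handling. The point is that neither function takes
the local address: only the remote CID and flags pick the transport,
and the fallback is used only when that lookup comes back empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins; these are not the kernel definitions. */
#define SKETCH_CID_HOST     2u    /* same value as VMADDR_CID_HOST */
#define SKETCH_FLAG_TO_HOST 0x01u /* same value as VMADDR_FLAG_TO_HOST */

struct sketch_transport {
	const char *name;
};

/* Registered transports; NULL means "not registered yet". */
static const struct sketch_transport *sketch_h2g;
static const struct sketch_transport *sketch_g2h;
static const struct sketch_transport *sketch_dgram_fallback;

/* Simplified stand-in for vsock_connectible_lookup_transport():
 * the choice depends only on the remote CID and flags.
 */
static const struct sketch_transport *
sketch_connectible_lookup(uint32_t remote_cid, uint8_t flags)
{
	if (remote_cid <= SKETCH_CID_HOST || !sketch_h2g ||
	    (flags & SKETCH_FLAG_TO_HOST))
		return sketch_g2h;

	return sketch_h2g;
}

/* Mirrors the shape of vsock_dgram_lookup_transport() in the patch:
 * prefer h2g/g2h, use the fallback only if neither is available.
 */
static const struct sketch_transport *
sketch_dgram_lookup(uint32_t remote_cid, uint8_t flags)
{
	const struct sketch_transport *transport;

	transport = sketch_connectible_lookup(remote_cid, flags);
	if (transport)
		return transport;

	return sketch_dgram_fallback;
}

/* Walks through the VMCI ordering from the commit message: the
 * fallback is registered first, H2G only later. Returns 0 if all
 * checks pass.
 */
int sketch_demo(void)
{
	static const struct sketch_transport vmci = { "vmci" };

	sketch_h2g = NULL;
	sketch_g2h = NULL;
	sketch_dgram_fallback = &vmci;
	/* Only the fallback exists, so it is what the lookup returns. */
	assert(sketch_dgram_lookup(3, 0) == &vmci);

	sketch_h2g = &vmci;
	/* After H2G registration the same transport is reached via h2g. */
	assert(sketch_dgram_lookup(3, 0) == &vmci);

	return 0;
}
```

In the sketch, a sendto() to CID 3 resolves the same way no matter what
the socket was bound to locally.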
Best,
Bobby