2023-06-10 01:17:44

by Bobby Eshleman

Subject: [PATCH RFC net-next v4 0/8] virtio/vsock: support datagrams

Hey all!

This series introduces support for datagrams to virtio/vsock.

It is a spin-off (and smaller version) of this series from the summer:
https://lore.kernel.org/all/[email protected]/

Please note that this is an RFC and should not be merged until
associated changes are made to the virtio specification, which will
follow after discussion from this series.

As another aside, v4 of the series has only been lightly tested with a
run of tools/testing/vsock/vsock_test. Some code likely needs cleaning
up, but I'm hoping to get some of the design choices agreed upon before
spending too much time making it pretty.

This series first supports datagrams in a basic form for virtio, and
then optimizes the sendpath for all datagram transports.

The result is a very fast datagram communication protocol that
outperforms even UDP on multi-queue virtio-net w/ vhost on a variety
of multi-threaded workload samples.

For those that are curious, some summary data comparing UDP and VSOCK
DGRAM (N=5):

vCPUS: 16
virtio-net queues: 16
payload size: 4KB
Setup: bare metal + vm (non-nested)

UDP: 287.59 MB/s
VSOCK DGRAM: 509.2 MB/s

Some notes about the implementation...

This datagram implementation forces datagrams to self-throttle according
to the threshold set by sk_sndbuf. Its effect on throughput and memory
consumption is similar to that of the credits used by streams, but
unlike credits it is not influenced by the receiving socket.

The device drops packets silently.

As discussed previously, this series introduces datagrams and defers
fairness to future work. See discussion in v2 for more context around
datagrams, fairness, and this implementation.

Signed-off-by: Bobby Eshleman <[email protected]>
---
Changes in v4:
- style changes
- vsock: use sk_vsock(vsk) in vsock_dgram_recvmsg instead of
&sk->vsk
- vsock: fix xmas tree declaration
- vsock: fix spacing issues
- virtio/vsock: virtio_transport_recv_dgram returns void because err
unused
- sparse analysis warnings/errors
- virtio/vsock: fix uninitialized skerr on destroy
- virtio/vsock: fix uninitialized err var on goto out
- vsock: fix declarations that need static
- vsock: fix __rcu annotation order
- bugs
- vsock: fix null ptr in remote_info code
- vsock/dgram: make transport_dgram a fallback instead of first
priority
- vsock: remove redundant rcu read lock acquire in getname()
- tests
- add more tests (message bounds and more)
- add vsock_dgram_bind() helper
- add vsock_dgram_connect() helper

Changes in v3:
- Support multi-transport dgram, changing logic in connect/bind
to support VMCI case
- Support per-pkt transport lookup for sendto() case
- Fix dgram_allow() implementation
- Fix dgram feature bit number (now it is 3)
- Fix binding so dgram and connectible (cid,port) spaces are
non-overlapping
- RCU protect transport ptr so connect() calls never leave
a lockless read of the transport and remote_addr are always
in sync
- Link to v2: https://lore.kernel.org/r/[email protected]

---
Bobby Eshleman (7):
vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue
vsock: refactor transport lookup code
vsock: support multi-transport datagrams
vsock: make vsock bind reusable
virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
virtio/vsock: support dgrams
vsock: Add lockless sendmsg() support

Jiang Wang (1):
tests: add vsock dgram tests

drivers/vhost/vsock.c | 44 ++-
include/linux/virtio_vsock.h | 13 +-
include/net/af_vsock.h | 52 ++-
include/uapi/linux/virtio_vsock.h | 2 +
net/vmw_vsock/af_vsock.c | 616 ++++++++++++++++++++++++++------
net/vmw_vsock/diag.c | 10 +-
net/vmw_vsock/hyperv_transport.c | 42 ++-
net/vmw_vsock/virtio_transport.c | 28 +-
net/vmw_vsock/virtio_transport_common.c | 226 +++++++++---
net/vmw_vsock/vmci_transport.c | 152 ++++----
net/vmw_vsock/vsock_bpf.c | 10 +-
net/vmw_vsock/vsock_loopback.c | 13 +-
tools/testing/vsock/util.c | 141 +++++++-
tools/testing/vsock/util.h | 6 +
tools/testing/vsock/vsock_test.c | 432 ++++++++++++++++++++++
15 files changed, 1533 insertions(+), 254 deletions(-)
---
base-commit: 28cfea989d6f55c3d10608eba2a2bae609c5bf3e
change-id: 20230413-b4-vsock-dgram-3b6eba6a64e5

Best regards,
--
Bobby Eshleman <[email protected]>



2023-06-10 01:21:15

by Bobby Eshleman

Subject: [PATCH RFC net-next v4 2/8] vsock: refactor transport lookup code

Introduce new reusable function vsock_connectible_lookup_transport()
that performs the transport lookup logic.

No functional change intended.

Signed-off-by: Bobby Eshleman <[email protected]>
---
net/vmw_vsock/af_vsock.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index ffb4dd8b6ea7..74358f0b47fa 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -422,6 +422,22 @@ static void vsock_deassign_transport(struct vsock_sock *vsk)
vsk->transport = NULL;
}

+static const struct vsock_transport *
+vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
+{
+ const struct vsock_transport *transport;
+
+ if (vsock_use_local_transport(cid))
+ transport = transport_local;
+ else if (cid <= VMADDR_CID_HOST || !transport_h2g ||
+ (flags & VMADDR_FLAG_TO_HOST))
+ transport = transport_g2h;
+ else
+ transport = transport_h2g;
+
+ return transport;
+}
+
/* Assign a transport to a socket and call the .init transport callback.
*
* Note: for connection oriented socket this must be called when vsk->remote_addr
@@ -462,13 +478,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
break;
case SOCK_STREAM:
case SOCK_SEQPACKET:
- if (vsock_use_local_transport(remote_cid))
- new_transport = transport_local;
- else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
- (remote_flags & VMADDR_FLAG_TO_HOST))
- new_transport = transport_g2h;
- else
- new_transport = transport_h2g;
+ new_transport = vsock_connectible_lookup_transport(remote_cid,
+ remote_flags);
break;
default:
return -ESOCKTNOSUPPORT;

--
2.30.2


2023-06-10 01:21:22

by Bobby Eshleman

Subject: [PATCH RFC net-next v4 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue

This commit drops the transport->dgram_dequeue callback and makes
vsock_dgram_recvmsg() generic. It also adds additional transport
callbacks for use by the generic vsock_dgram_recvmsg(), such as for
parsing skbs for CID/port which vary in format per transport.

Signed-off-by: Bobby Eshleman <[email protected]>
---
drivers/vhost/vsock.c | 4 +-
include/linux/virtio_vsock.h | 3 ++
include/net/af_vsock.h | 13 ++++++-
net/vmw_vsock/af_vsock.c | 51 ++++++++++++++++++++++++-
net/vmw_vsock/hyperv_transport.c | 17 +++++++--
net/vmw_vsock/virtio_transport.c | 4 +-
net/vmw_vsock/virtio_transport_common.c | 18 +++++++++
net/vmw_vsock/vmci_transport.c | 68 +++++++++++++--------------------
net/vmw_vsock/vsock_loopback.c | 4 +-
9 files changed, 132 insertions(+), 50 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6578db78f0ae..c8201c070b4b 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -410,9 +410,11 @@ static struct virtio_transport vhost_transport = {
.cancel_pkt = vhost_transport_cancel_pkt,

.dgram_enqueue = virtio_transport_dgram_enqueue,
- .dgram_dequeue = virtio_transport_dgram_dequeue,
.dgram_bind = virtio_transport_dgram_bind,
.dgram_allow = virtio_transport_dgram_allow,
+ .dgram_get_cid = virtio_transport_dgram_get_cid,
+ .dgram_get_port = virtio_transport_dgram_get_port,
+ .dgram_get_length = virtio_transport_dgram_get_length,

.stream_enqueue = virtio_transport_stream_enqueue,
.stream_dequeue = virtio_transport_stream_dequeue,
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index c58453699ee9..23521a318cf0 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -219,6 +219,9 @@ bool virtio_transport_stream_allow(u32 cid, u32 port);
int virtio_transport_dgram_bind(struct vsock_sock *vsk,
struct sockaddr_vm *addr);
bool virtio_transport_dgram_allow(u32 cid, u32 port);
+int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
+int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
+int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);

int virtio_transport_connect(struct vsock_sock *vsk);

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 0e7504a42925..7bedb9ee7e3e 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -120,11 +120,20 @@ struct vsock_transport {

/* DGRAM. */
int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
- int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
- size_t len, int flags);
int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
struct msghdr *, size_t len);
bool (*dgram_allow)(u32 cid, u32 port);
+ int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid);
+ int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port);
+ int (*dgram_get_length)(struct sk_buff *skb, size_t *length);
+
+ /* The number of bytes into the buffer at which the payload starts, as
+ * first seen by the receiving socket layer. For example, if the
+ * transport presets the skb pointers using skb_pull(sizeof(header))
+ * then this would be zero; otherwise it would be the size of the
+ * header.
+ */
+ const size_t dgram_payload_offset;

/* STREAM. */
/* TODO: stream_bind() */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index efb8a0937a13..ffb4dd8b6ea7 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1271,11 +1271,15 @@ static int vsock_dgram_connect(struct socket *sock,
int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
size_t len, int flags)
{
+ const struct vsock_transport *transport;
#ifdef CONFIG_BPF_SYSCALL
const struct proto *prot;
#endif
struct vsock_sock *vsk;
+ struct sk_buff *skb;
+ size_t payload_len;
struct sock *sk;
+ int err;

sk = sock->sk;
vsk = vsock_sk(sk);
@@ -1286,7 +1290,52 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
return prot->recvmsg(sk, msg, len, flags, NULL);
#endif

- return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
+ if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
+ return -EOPNOTSUPP;
+
+ transport = vsk->transport;
+
+ /* Retrieve the head sk_buff from the socket's receive queue. */
+ err = 0;
+ skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
+ if (!skb)
+ return err;
+
+ err = transport->dgram_get_length(skb, &payload_len);
+ if (err)
+ goto out;
+
+ if (payload_len > len) {
+ payload_len = len;
+ msg->msg_flags |= MSG_TRUNC;
+ }
+
+ /* Place the datagram payload in the user's iovec. */
+ err = skb_copy_datagram_msg(skb, transport->dgram_payload_offset, msg, payload_len);
+ if (err)
+ goto out;
+
+ if (msg->msg_name) {
+ /* Provide the address of the sender. */
+ DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
+ unsigned int cid, port;
+
+ err = transport->dgram_get_cid(skb, &cid);
+ if (err)
+ goto out;
+
+ err = transport->dgram_get_port(skb, &port);
+ if (err)
+ goto out;
+
+ vsock_addr_init(vm_addr, cid, port);
+ msg->msg_namelen = sizeof(*vm_addr);
+ }
+ err = payload_len;
+
+out:
+ skb_free_datagram(&vsk->sk, skb);
+ return err;
}
EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 7cb1a9d2cdb4..ff6e87e25fa0 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -556,8 +556,17 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
return -EOPNOTSUPP;
}

-static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
- size_t len, int flags)
+static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
+{
+ return -EOPNOTSUPP;
+}
+
+static int hvs_dgram_get_port(struct sk_buff *skb, unsigned int *port)
+{
+ return -EOPNOTSUPP;
+}
+
+static int hvs_dgram_get_length(struct sk_buff *skb, size_t *len)
{
return -EOPNOTSUPP;
}
@@ -833,7 +842,9 @@ static struct vsock_transport hvs_transport = {
.shutdown = hvs_shutdown,

.dgram_bind = hvs_dgram_bind,
- .dgram_dequeue = hvs_dgram_dequeue,
+ .dgram_get_cid = hvs_dgram_get_cid,
+ .dgram_get_port = hvs_dgram_get_port,
+ .dgram_get_length = hvs_dgram_get_length,
.dgram_enqueue = hvs_dgram_enqueue,
.dgram_allow = hvs_dgram_allow,

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e95df847176b..5763cdf13804 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -429,9 +429,11 @@ static struct virtio_transport virtio_transport = {
.cancel_pkt = virtio_transport_cancel_pkt,

.dgram_bind = virtio_transport_dgram_bind,
- .dgram_dequeue = virtio_transport_dgram_dequeue,
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
+ .dgram_get_cid = virtio_transport_dgram_get_cid,
+ .dgram_get_port = virtio_transport_dgram_get_port,
+ .dgram_get_length = virtio_transport_dgram_get_length,

.stream_dequeue = virtio_transport_stream_dequeue,
.stream_enqueue = virtio_transport_stream_enqueue,
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index b769fc258931..e6903c719964 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -797,6 +797,24 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk,
}
EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);

+int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
+{
+ return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid);
+
+int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
+{
+ return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port);
+
+int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
+{
+ return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length);
+
bool virtio_transport_dgram_allow(u32 cid, u32 port)
{
return false;
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index b370070194fa..bbc63826bf48 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -1731,57 +1731,40 @@ static int vmci_transport_dgram_enqueue(
return err - sizeof(*dg);
}

-static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
- struct msghdr *msg, size_t len,
- int flags)
+static int vmci_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
{
- int err;
struct vmci_datagram *dg;
- size_t payload_len;
- struct sk_buff *skb;

- if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
- return -EOPNOTSUPP;
+ dg = (struct vmci_datagram *)skb->data;
+ if (!dg)
+ return -EINVAL;

- /* Retrieve the head sk_buff from the socket's receive queue. */
- err = 0;
- skb = skb_recv_datagram(&vsk->sk, flags, &err);
- if (!skb)
- return err;
+ *cid = dg->src.context;
+ return 0;
+}
+
+static int vmci_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
+{
+ struct vmci_datagram *dg;

dg = (struct vmci_datagram *)skb->data;
if (!dg)
- /* err is 0, meaning we read zero bytes. */
- goto out;
-
- payload_len = dg->payload_size;
- /* Ensure the sk_buff matches the payload size claimed in the packet. */
- if (payload_len != skb->len - sizeof(*dg)) {
- err = -EINVAL;
- goto out;
- }
+ return -EINVAL;

- if (payload_len > len) {
- payload_len = len;
- msg->msg_flags |= MSG_TRUNC;
- }
+ *port = dg->src.resource;
+ return 0;
+}

- /* Place the datagram payload in the user's iovec. */
- err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
- if (err)
- goto out;
+static int vmci_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
+{
+ struct vmci_datagram *dg;

- if (msg->msg_name) {
- /* Provide the address of the sender. */
- DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
- vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
- msg->msg_namelen = sizeof(*vm_addr);
- }
- err = payload_len;
+ dg = (struct vmci_datagram *)skb->data;
+ if (!dg)
+ return -EINVAL;

-out:
- skb_free_datagram(&vsk->sk, skb);
- return err;
+ *len = dg->payload_size;
+ return 0;
}

static bool vmci_transport_dgram_allow(u32 cid, u32 port)
@@ -2040,9 +2023,12 @@ static struct vsock_transport vmci_transport = {
.release = vmci_transport_release,
.connect = vmci_transport_connect,
.dgram_bind = vmci_transport_dgram_bind,
- .dgram_dequeue = vmci_transport_dgram_dequeue,
.dgram_enqueue = vmci_transport_dgram_enqueue,
.dgram_allow = vmci_transport_dgram_allow,
+ .dgram_get_cid = vmci_transport_dgram_get_cid,
+ .dgram_get_port = vmci_transport_dgram_get_port,
+ .dgram_get_length = vmci_transport_dgram_get_length,
+ .dgram_payload_offset = sizeof(struct vmci_datagram),
.stream_dequeue = vmci_transport_stream_dequeue,
.stream_enqueue = vmci_transport_stream_enqueue,
.stream_has_data = vmci_transport_stream_has_data,
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 5c6360df1f31..2f3cabc79ee5 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -62,9 +62,11 @@ static struct virtio_transport loopback_transport = {
.cancel_pkt = vsock_loopback_cancel_pkt,

.dgram_bind = virtio_transport_dgram_bind,
- .dgram_dequeue = virtio_transport_dgram_dequeue,
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
+ .dgram_get_cid = virtio_transport_dgram_get_cid,
+ .dgram_get_port = virtio_transport_dgram_get_port,
+ .dgram_get_length = virtio_transport_dgram_get_length,

.stream_dequeue = virtio_transport_stream_dequeue,
.stream_enqueue = virtio_transport_stream_enqueue,

--
2.30.2


2023-06-10 01:21:53

by Bobby Eshleman

Subject: [PATCH RFC net-next v4 3/8] vsock: support multi-transport datagrams

This patch adds support for multi-transport datagrams.

This includes:
- Per-packet lookup of transports when using sendto(sockaddr_vm)
- Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
sockaddr_vm

To preserve backwards compatibility with VMCI, some important changes
were made. The "transport_dgram" transport (VSOCK_TRANSPORT_F_DGRAM) is
now used for dgrams only if no g2h or h2g transport that can transmit
the packet has been registered yet. If a g2h/h2g transport exists for
that remote address, it is used instead of "transport_dgram". This
essentially makes "transport_dgram" a fallback transport for when
h2g/g2h has not yet come online, which appears to be exactly the VMCI
use case.

This design makes sense, because there is no reason that the
transport_{g2h,h2g} cannot also service datagrams, which makes the role
of transport_dgram difficult to understand outside of the VMCI context.

The logic around "transport_dgram" had to be retained to prevent
breaking VMCI:

1) VMCI datagrams appear to function outside of the h2g/g2h
paradigm. When the vmci transport comes online, it registers itself
with the DGRAM feature, but not H2G/G2H. Only later when the
transport has more information about its environment does it register
H2G or G2H. In the case that a datagram socket becomes active
after DGRAM registration but before G2H/H2G registration, the
"transport_dgram" transport needs to be used.

2) VMCI seems to require a special message to be sent by the transport
when a datagram socket calls bind(). Under the h2g/g2h model, the transport
is selected using the remote_addr which is set by connect(). At
bind time there is no remote_addr because often no connect() has been
called yet: the transport is null. Therefore, with a null transport
there doesn't seem to be any good way for a datagram socket to tell the
VMCI transport that it has just had bind() called upon it.

Only transports with a special datagram fallback use-case such as VMCI
need to register VSOCK_TRANSPORT_F_DGRAM.

Signed-off-by: Bobby Eshleman <[email protected]>
---
drivers/vhost/vsock.c | 1 -
include/linux/virtio_vsock.h | 2 -
net/vmw_vsock/af_vsock.c | 78 +++++++++++++++++++++++++--------
net/vmw_vsock/hyperv_transport.c | 6 ---
net/vmw_vsock/virtio_transport.c | 1 -
net/vmw_vsock/virtio_transport_common.c | 7 ---
net/vmw_vsock/vsock_loopback.c | 1 -
7 files changed, 60 insertions(+), 36 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index c8201c070b4b..8f0082da5e70 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
.cancel_pkt = vhost_transport_cancel_pkt,

.dgram_enqueue = virtio_transport_dgram_enqueue,
- .dgram_bind = virtio_transport_dgram_bind,
.dgram_allow = virtio_transport_dgram_allow,
.dgram_get_cid = virtio_transport_dgram_get_cid,
.dgram_get_port = virtio_transport_dgram_get_port,
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 23521a318cf0..73afa09f4585 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -216,8 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
bool virtio_transport_stream_allow(u32 cid, u32 port);
-int virtio_transport_dgram_bind(struct vsock_sock *vsk,
- struct sockaddr_vm *addr);
bool virtio_transport_dgram_allow(u32 cid, u32 port);
int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 74358f0b47fa..ef86765f3765 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -438,6 +438,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
return transport;
}

+static const struct vsock_transport *
+vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
+{
+ const struct vsock_transport *transport;
+
+ transport = vsock_connectible_lookup_transport(cid, flags);
+ if (transport)
+ return transport;
+
+ return transport_dgram;
+}
+
/* Assign a transport to a socket and call the .init transport callback.
*
* Note: for connection oriented socket this must be called when vsk->remote_addr
@@ -474,7 +486,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)

switch (sk->sk_type) {
case SOCK_DGRAM:
- new_transport = transport_dgram;
+ new_transport = vsock_dgram_lookup_transport(remote_cid,
+ remote_flags);
break;
case SOCK_STREAM:
case SOCK_SEQPACKET:
@@ -691,6 +704,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
static int __vsock_bind_dgram(struct vsock_sock *vsk,
struct sockaddr_vm *addr)
{
+ if (!vsk->transport || !vsk->transport->dgram_bind)
+ return -EINVAL;
+
return vsk->transport->dgram_bind(vsk, addr);
}

@@ -1172,19 +1188,24 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,

lock_sock(sk);

- transport = vsk->transport;
-
- err = vsock_auto_bind(vsk);
- if (err)
- goto out;
-
-
/* If the provided message contains an address, use that. Otherwise
* fall back on the socket's remote handle (if it has been connected).
*/
if (msg->msg_name &&
vsock_addr_cast(msg->msg_name, msg->msg_namelen,
&remote_addr) == 0) {
+ transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
+ remote_addr->svm_flags);
+ if (!transport) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (!try_module_get(transport->module)) {
+ err = -ENODEV;
+ goto out;
+ }
+
/* Ensure this address is of the right type and is a valid
* destination.
*/
@@ -1193,11 +1214,27 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
remote_addr->svm_cid = transport->get_local_cid();

if (!vsock_addr_bound(remote_addr)) {
+ module_put(transport->module);
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (!transport->dgram_allow(remote_addr->svm_cid,
+ remote_addr->svm_port)) {
+ module_put(transport->module);
err = -EINVAL;
goto out;
}
+
+ err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
+ module_put(transport->module);
} else if (sock->state == SS_CONNECTED) {
remote_addr = &vsk->remote_addr;
+ transport = vsk->transport;
+
+ err = vsock_auto_bind(vsk);
+ if (err)
+ goto out;

if (remote_addr->svm_cid == VMADDR_CID_ANY)
remote_addr->svm_cid = transport->get_local_cid();
@@ -1205,23 +1242,23 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
/* XXX Should connect() or this function ensure remote_addr is
* bound?
*/
- if (!vsock_addr_bound(&vsk->remote_addr)) {
+ if (!vsock_addr_bound(remote_addr)) {
err = -EINVAL;
goto out;
}
- } else {
- err = -EINVAL;
- goto out;
- }

- if (!transport->dgram_allow(remote_addr->svm_cid,
- remote_addr->svm_port)) {
+ if (!transport->dgram_allow(remote_addr->svm_cid,
+ remote_addr->svm_port)) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
+ } else {
err = -EINVAL;
goto out;
}

- err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
-
out:
release_sock(sk);
return err;
@@ -1255,13 +1292,18 @@ static int vsock_dgram_connect(struct socket *sock,
if (err)
goto out;

+ memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
+
+ err = vsock_assign_transport(vsk, NULL);
+ if (err)
+ goto out;
+
if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
remote_addr->svm_port)) {
err = -EINVAL;
goto out;
}

- memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
sock->state = SS_CONNECTED;

/* sock map disallows redirection of non-TCP sockets with sk_state !=
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index ff6e87e25fa0..c00bc5da769a 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
kfree(hvs);
}

-static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
-{
- return -EOPNOTSUPP;
-}
-
static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
{
return -EOPNOTSUPP;
@@ -841,7 +836,6 @@ static struct vsock_transport hvs_transport = {
.connect = hvs_connect,
.shutdown = hvs_shutdown,

- .dgram_bind = hvs_dgram_bind,
.dgram_get_cid = hvs_dgram_get_cid,
.dgram_get_port = hvs_dgram_get_port,
.dgram_get_length = hvs_dgram_get_length,
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 5763cdf13804..1b7843a7779a 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = {
.shutdown = virtio_transport_shutdown,
.cancel_pkt = virtio_transport_cancel_pkt,

- .dgram_bind = virtio_transport_dgram_bind,
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
.dgram_get_cid = virtio_transport_dgram_get_cid,
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index e6903c719964..d5a3c8efe84b 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -790,13 +790,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
}
EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);

-int virtio_transport_dgram_bind(struct vsock_sock *vsk,
- struct sockaddr_vm *addr)
-{
- return -EOPNOTSUPP;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
-
int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
{
return -EOPNOTSUPP;
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 2f3cabc79ee5..e9de45a26fbd 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -61,7 +61,6 @@ static struct virtio_transport loopback_transport = {
.shutdown = virtio_transport_shutdown,
.cancel_pkt = vsock_loopback_cancel_pkt,

- .dgram_bind = virtio_transport_dgram_bind,
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
.dgram_get_cid = virtio_transport_dgram_get_cid,

--
2.30.2


2023-06-10 01:22:06

by Bobby Eshleman

Subject: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams

This commit adds support for datagrams over virtio/vsock.

Message boundaries are preserved on a per-skb and per-vq entry basis.
Messages are copied whole from the user into an skb, which in turn is
added whole to the virtqueue's scatterlist for the device.
Messages do not straddle skbs and they do not straddle packets.
Messages may be truncated by the receiving user if their buffer is
shorter than the message.

Other properties of vsock datagrams:
- Datagrams self-throttle at the per-socket sk_sndbuf threshold.
- The same virtqueue is used for datagrams as for stream and seqpacket flows
- Credits are not used for datagrams
- Packets are dropped silently by the device, which means the virtqueue
will still get kicked even during high packet loss, so long as the
socket does not exceed sk_sndbuf.

Future work might include finding a way to reduce the virtqueue kick
rate for datagram flows with high packet loss.

Signed-off-by: Bobby Eshleman <[email protected]>
---
drivers/vhost/vsock.c | 27 ++++-
include/linux/virtio_vsock.h | 5 +-
include/net/af_vsock.h | 1 +
include/uapi/linux/virtio_vsock.h | 1 +
net/vmw_vsock/af_vsock.c | 58 +++++++--
net/vmw_vsock/virtio_transport.c | 23 +++-
net/vmw_vsock/virtio_transport_common.c | 207 ++++++++++++++++++++++++--------
net/vmw_vsock/vsock_loopback.c | 8 +-
8 files changed, 264 insertions(+), 66 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 8f0082da5e70..159c1a22c1a8 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -32,7 +32,8 @@
enum {
VHOST_VSOCK_FEATURES = VHOST_FEATURES |
(1ULL << VIRTIO_F_ACCESS_PLATFORM) |
- (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
+ (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
+ (1ULL << VIRTIO_VSOCK_F_DGRAM)
};

enum {
@@ -56,6 +57,7 @@ struct vhost_vsock {
atomic_t queued_replies;

u32 guest_cid;
+ bool dgram_allow;
bool seqpacket_allow;
};

@@ -394,6 +396,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
return val < vq->num;
}

+static bool vhost_transport_dgram_allow(u32 cid, u32 port);
static bool vhost_transport_seqpacket_allow(u32 remote_cid);

static struct virtio_transport vhost_transport = {
@@ -410,10 +413,11 @@ static struct virtio_transport vhost_transport = {
.cancel_pkt = vhost_transport_cancel_pkt,

.dgram_enqueue = virtio_transport_dgram_enqueue,
- .dgram_allow = virtio_transport_dgram_allow,
+ .dgram_allow = vhost_transport_dgram_allow,
.dgram_get_cid = virtio_transport_dgram_get_cid,
.dgram_get_port = virtio_transport_dgram_get_port,
.dgram_get_length = virtio_transport_dgram_get_length,
+ .dgram_payload_offset = 0,

.stream_enqueue = virtio_transport_stream_enqueue,
.stream_dequeue = virtio_transport_stream_dequeue,
@@ -446,6 +450,22 @@ static struct virtio_transport vhost_transport = {
.send_pkt = vhost_transport_send_pkt,
};

+static bool vhost_transport_dgram_allow(u32 cid, u32 port)
+{
+ struct vhost_vsock *vsock;
+ bool dgram_allow = false;
+
+ rcu_read_lock();
+ vsock = vhost_vsock_get(cid);
+
+ if (vsock)
+ dgram_allow = vsock->dgram_allow;
+
+ rcu_read_unlock();
+
+ return dgram_allow;
+}
+
static bool vhost_transport_seqpacket_allow(u32 remote_cid)
{
struct vhost_vsock *vsock;
@@ -802,6 +822,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
vsock->seqpacket_allow = true;

+ if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
+ vsock->dgram_allow = true;
+
for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
vq = &vsock->vqs[i];
mutex_lock(&vq->mutex);
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 73afa09f4585..237ca87a2ecd 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -216,7 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
bool virtio_transport_stream_allow(u32 cid, u32 port);
-bool virtio_transport_dgram_allow(u32 cid, u32 port);
int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
@@ -247,4 +246,5 @@ void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit);
void virtio_transport_deliver_tap_pkt(struct sk_buff *skb);
int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *list);
int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t read_actor);
+void virtio_transport_init_dgram_bind_tables(void);
#endif /* _LINUX_VIRTIO_VSOCK_H */
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 7bedb9ee7e3e..c115e655b4f5 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -225,6 +225,7 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
void (*fn)(struct sock *sk));
int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
bool vsock_find_cid(unsigned int cid);
+struct sock *vsock_find_bound_dgram_socket(struct sockaddr_vm *addr);

/**** TAP ****/

diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index 9c25f267bbc0..27b4b2b8bf13 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
enum virtio_vsock_type {
VIRTIO_VSOCK_TYPE_STREAM = 1,
VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
+ VIRTIO_VSOCK_TYPE_DGRAM = 3,
};

enum virtio_vsock_op {
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 7a3ca4270446..b0b18e7f4299 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -114,6 +114,7 @@
static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
static void vsock_sk_destruct(struct sock *sk);
static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
+static bool sock_type_connectible(u16 type);

/* Protocol family. */
struct proto vsock_proto = {
@@ -180,6 +181,8 @@ struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
EXPORT_SYMBOL_GPL(vsock_connected_table);
DEFINE_SPINLOCK(vsock_table_lock);
EXPORT_SYMBOL_GPL(vsock_table_lock);
+static struct list_head vsock_dgram_bind_table[VSOCK_HASH_SIZE];
+static DEFINE_SPINLOCK(vsock_dgram_table_lock);

/* Autobind this socket to the local address if necessary. */
static int vsock_auto_bind(struct vsock_sock *vsk)
@@ -202,6 +205,9 @@ static void vsock_init_tables(void)

for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
INIT_LIST_HEAD(&vsock_connected_table[i]);
+
+ for (i = 0; i < ARRAY_SIZE(vsock_dgram_bind_table); i++)
+ INIT_LIST_HEAD(&vsock_dgram_bind_table[i]);
}

static void __vsock_insert_bound(struct list_head *list,
@@ -230,8 +236,8 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
sock_put(&vsk->sk);
}

-struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
- struct list_head *bind_table)
+static struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
+ struct list_head *bind_table)
{
struct vsock_sock *vsk;

@@ -248,6 +254,23 @@ struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
return NULL;
}

+struct sock *
+vsock_find_bound_dgram_socket(struct sockaddr_vm *addr)
+{
+ struct sock *sk;
+
+ spin_lock_bh(&vsock_dgram_table_lock);
+ sk = vsock_find_bound_socket_common(addr,
+ &vsock_dgram_bind_table[VSOCK_HASH(addr)]);
+ if (sk)
+ sock_hold(sk);
+
+ spin_unlock_bh(&vsock_dgram_table_lock);
+
+ return sk;
+}
+EXPORT_SYMBOL_GPL(vsock_find_bound_dgram_socket);
+
static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
{
return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
@@ -287,6 +310,14 @@ void vsock_insert_connected(struct vsock_sock *vsk)
}
EXPORT_SYMBOL_GPL(vsock_insert_connected);

+static void vsock_remove_dgram_bound(struct vsock_sock *vsk)
+{
+ spin_lock_bh(&vsock_dgram_table_lock);
+ if (__vsock_in_bound_table(vsk))
+ __vsock_remove_bound(vsk);
+ spin_unlock_bh(&vsock_dgram_table_lock);
+}
+
void vsock_remove_bound(struct vsock_sock *vsk)
{
spin_lock_bh(&vsock_table_lock);
@@ -338,7 +369,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket);

void vsock_remove_sock(struct vsock_sock *vsk)
{
- vsock_remove_bound(vsk);
+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
+ vsock_remove_bound(vsk);
+ else
+ vsock_remove_dgram_bound(vsk);
vsock_remove_connected(vsk);
}
EXPORT_SYMBOL_GPL(vsock_remove_sock);
@@ -720,11 +754,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
}

-static int __vsock_bind_dgram(struct vsock_sock *vsk,
- struct sockaddr_vm *addr)
+static int vsock_bind_dgram(struct vsock_sock *vsk,
+ struct sockaddr_vm *addr)
{
- if (!vsk->transport || !vsk->transport->dgram_bind)
- return -EINVAL;
+ if (!vsk->transport || !vsk->transport->dgram_bind) {
+ int retval;
+
+ spin_lock_bh(&vsock_dgram_table_lock);
+ retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
+ VSOCK_HASH_SIZE);
+ spin_unlock_bh(&vsock_dgram_table_lock);
+
+ return retval;
+ }

return vsk->transport->dgram_bind(vsk, addr);
}
@@ -755,7 +797,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
break;

case SOCK_DGRAM:
- retval = __vsock_bind_dgram(vsk, addr);
+ retval = vsock_bind_dgram(vsk, addr);
break;

default:
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 1b7843a7779a..7160a3104218 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -63,6 +63,7 @@ struct virtio_vsock {

u32 guest_cid;
bool seqpacket_allow;
+ bool dgram_allow;
};

static u32 virtio_transport_get_local_cid(void)
@@ -413,6 +414,7 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
queue_work(virtio_vsock_workqueue, &vsock->rx_work);
}

+static bool virtio_transport_dgram_allow(u32 cid, u32 port);
static bool virtio_transport_seqpacket_allow(u32 remote_cid);

static struct virtio_transport virtio_transport = {
@@ -465,6 +467,21 @@ static struct virtio_transport virtio_transport = {
.send_pkt = virtio_transport_send_pkt,
};

+static bool virtio_transport_dgram_allow(u32 cid, u32 port)
+{
+ struct virtio_vsock *vsock;
+ bool dgram_allow = false;
+
+ rcu_read_lock();
+ vsock = rcu_dereference(the_virtio_vsock);
+ if (vsock)
+ dgram_allow = vsock->dgram_allow;
+ rcu_read_unlock();
+
+ return dgram_allow;
+}
+
static bool virtio_transport_seqpacket_allow(u32 remote_cid)
{
struct virtio_vsock *vsock;
@@ -658,6 +675,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
vsock->seqpacket_allow = true;

+ if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
+ vsock->dgram_allow = true;
+
vdev->priv = vsock;

ret = virtio_vsock_vqs_init(vsock);
@@ -750,7 +770,8 @@ static struct virtio_device_id id_table[] = {
};

static unsigned int features[] = {
- VIRTIO_VSOCK_F_SEQPACKET
+ VIRTIO_VSOCK_F_SEQPACKET,
+ VIRTIO_VSOCK_F_DGRAM
};

static struct virtio_driver virtio_vsock_driver = {
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index d5a3c8efe84b..bc9d459723f5 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -37,6 +37,35 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
return container_of(t, struct virtio_transport, transport);
}

+/* Requires info->msg and info->vsk */
+static struct sk_buff *
+virtio_transport_sock_alloc_send_skb(struct virtio_vsock_pkt_info *info, unsigned int size,
+ gfp_t mask, int *err)
+{
+ struct sk_buff *skb;
+ struct sock *sk;
+ int noblock;
+
+ if (size < VIRTIO_VSOCK_SKB_HEADROOM) {
+ *err = -EINVAL;
+ return NULL;
+ }
+
+ if (info->msg)
+ noblock = info->msg->msg_flags & MSG_DONTWAIT;
+ else
+ noblock = 1;
+
+ sk = sk_vsock(info->vsk);
+ sk->sk_allocation = mask;
+ skb = sock_alloc_send_skb(sk, size, noblock, err);
+ if (!skb)
+ return NULL;
+
+ skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
+ return skb;
+}
+
/* Returns a new packet on success, otherwise returns NULL.
*
* If NULL is returned, errp is set to a negative errno.
@@ -47,7 +76,8 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
u32 src_cid,
u32 src_port,
u32 dst_cid,
- u32 dst_port)
+ u32 dst_port,
+ int *errp)
{
const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len;
struct virtio_vsock_hdr *hdr;
@@ -55,9 +85,21 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
void *payload;
int err;

- skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
- if (!skb)
+ /* Datagrams do not use credits; they self-throttle according to
+ * sk_sndbuf via sock_alloc_send_skb(). This helps avoid triggering
+ * the OOM killer.
+ */
+ if (info->vsk && info->type == VIRTIO_VSOCK_TYPE_DGRAM) {
+ skb = virtio_transport_sock_alloc_send_skb(info, skb_len, GFP_KERNEL, &err);
+ } else {
+ skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
+ if (!skb)
+ err = -ENOMEM;
+ }
+
+ if (!skb) {
+ *errp = err;
return NULL;
+ }

hdr = virtio_vsock_hdr(skb);
hdr->type = cpu_to_le16(info->type);
@@ -96,12 +138,14 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,

if (info->vsk && !skb_set_owner_sk_safe(skb, sk_vsock(info->vsk))) {
WARN_ONCE(1, "failed to allocate skb on vsock socket with sk_refcnt == 0\n");
+ err = -EFAULT;
goto out;
}

return skb;

out:
+ *errp = err;
kfree_skb(skb);
return NULL;
}
@@ -183,7 +227,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);

static u16 virtio_transport_get_type(struct sock *sk)
{
- if (sk->sk_type == SOCK_STREAM)
+ if (sk->sk_type == SOCK_DGRAM)
+ return VIRTIO_VSOCK_TYPE_DGRAM;
+ else if (sk->sk_type == SOCK_STREAM)
return VIRTIO_VSOCK_TYPE_STREAM;
else
return VIRTIO_VSOCK_TYPE_SEQPACKET;
@@ -239,11 +285,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,

skb = virtio_transport_alloc_skb(info, skb_len,
src_cid, src_port,
- dst_cid, dst_port);
- if (!skb) {
- ret = -ENOMEM;
+ dst_cid, dst_port,
+ &ret);
+ if (!skb)
break;
- }

virtio_transport_inc_tx_pkt(vvs, skb);

@@ -583,14 +628,30 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
}
EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_enqueue);

-int
-virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
- struct msghdr *msg,
- size_t len, int flags)
+int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
+{
+ *cid = le64_to_cpu(virtio_vsock_hdr(skb)->src_cid);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid);
+
+int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
+{
+ *port = le32_to_cpu(virtio_vsock_hdr(skb)->src_port);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port);
+
+int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
{
- return -EOPNOTSUPP;
+ /* The device layer must have already moved the data ptr beyond the
+ * header for skb->len to be correct.
+ */
+ WARN_ON(skb->data == skb->head);
+ *len = skb->len;
+ return 0;
}
-EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
+EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length);

s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
{
@@ -790,30 +851,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
}
EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);

-int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
-{
- return -EOPNOTSUPP;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid);
-
-int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
-{
- return -EOPNOTSUPP;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port);
-
-int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
-{
- return -EOPNOTSUPP;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length);
-
-bool virtio_transport_dgram_allow(u32 cid, u32 port)
-{
- return false;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
-
int virtio_transport_connect(struct vsock_sock *vsk)
{
struct virtio_vsock_pkt_info info = {
@@ -846,7 +883,34 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
struct msghdr *msg,
size_t dgram_len)
{
- return -EOPNOTSUPP;
+ const struct virtio_transport *t_ops;
+ struct virtio_vsock_pkt_info info = {
+ .op = VIRTIO_VSOCK_OP_RW,
+ .msg = msg,
+ .vsk = vsk,
+ .type = VIRTIO_VSOCK_TYPE_DGRAM,
+ };
+ u32 src_cid, src_port;
+ struct sk_buff *skb;
+ int err;
+
+ if (dgram_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+ return -EMSGSIZE;
+
+ t_ops = virtio_transport_get_ops(vsk);
+ src_cid = t_ops->transport.get_local_cid();
+ src_port = vsk->local_addr.svm_port;
+
+ skb = virtio_transport_alloc_skb(&info, dgram_len,
+ src_cid, src_port,
+ remote_addr->svm_cid,
+ remote_addr->svm_port,
+ &err);
+ if (!skb)
+ return err;
+
+ return t_ops->send_pkt(skb);
}
EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);

@@ -903,6 +967,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
.reply = true,
};
struct sk_buff *reply;
+ int err;

/* Send RST only if the original pkt is not a RST pkt */
if (le16_to_cpu(hdr->op) == VIRTIO_VSOCK_OP_RST)
@@ -915,9 +980,10 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
le64_to_cpu(hdr->dst_cid),
le32_to_cpu(hdr->dst_port),
le64_to_cpu(hdr->src_cid),
- le32_to_cpu(hdr->src_port));
+ le32_to_cpu(hdr->src_port),
+ &err);
if (!reply)
- return -ENOMEM;
+ return err;

return t->send_pkt(reply);
}
@@ -1137,6 +1203,21 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
kfree_skb(skb);
}

+/* This function takes ownership of the skb.
+ *
+ * It either places the skb on the sk_receive_queue or frees it.
+ */
+static void
+virtio_transport_recv_dgram(struct sock *sk, struct sk_buff *skb)
+{
+ if (sock_queue_rcv_skb(sk, skb)) {
+ kfree_skb(skb);
+ return;
+ }
+
+ sk->sk_data_ready(sk);
+}
+
static int
virtio_transport_recv_connected(struct sock *sk,
struct sk_buff *skb)
@@ -1300,7 +1381,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
static bool virtio_transport_valid_type(u16 type)
{
return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
- (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
+ (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
+ (type == VIRTIO_VSOCK_TYPE_DGRAM);
}

/* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
@@ -1314,40 +1396,52 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
struct vsock_sock *vsk;
struct sock *sk;
bool space_available;
+ u16 type;

vsock_addr_init(&src, le64_to_cpu(hdr->src_cid),
le32_to_cpu(hdr->src_port));
vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid),
le32_to_cpu(hdr->dst_port));

+ type = le16_to_cpu(hdr->type);
+
trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port,
dst.svm_cid, dst.svm_port,
le32_to_cpu(hdr->len),
- le16_to_cpu(hdr->type),
+ type,
le16_to_cpu(hdr->op),
le32_to_cpu(hdr->flags),
le32_to_cpu(hdr->buf_alloc),
le32_to_cpu(hdr->fwd_cnt));

- if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
+ if (!virtio_transport_valid_type(type)) {
(void)virtio_transport_reset_no_sock(t, skb);
goto free_pkt;
}

- /* The socket must be in connected or bound table
- * otherwise send reset back
+ /* For stream/seqpacket, the socket must be in connected or bound table
+ * otherwise send reset back.
+ *
+ * For datagrams, no reset is sent back.
*/
sk = vsock_find_connected_socket(&src, &dst);
if (!sk) {
- sk = vsock_find_bound_socket(&dst);
- if (!sk) {
- (void)virtio_transport_reset_no_sock(t, skb);
- goto free_pkt;
+ if (type == VIRTIO_VSOCK_TYPE_DGRAM) {
+ sk = vsock_find_bound_dgram_socket(&dst);
+ if (!sk)
+ goto free_pkt;
+ } else {
+ sk = vsock_find_bound_socket(&dst);
+ if (!sk) {
+ (void)virtio_transport_reset_no_sock(t, skb);
+ goto free_pkt;
+ }
}
}

- if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) {
- (void)virtio_transport_reset_no_sock(t, skb);
+ if (virtio_transport_get_type(sk) != type) {
+ if (type != VIRTIO_VSOCK_TYPE_DGRAM)
+ (void)virtio_transport_reset_no_sock(t, skb);
sock_put(sk);
goto free_pkt;
}
@@ -1363,12 +1457,18 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,

/* Check if sk has been closed before lock_sock */
if (sock_flag(sk, SOCK_DONE)) {
- (void)virtio_transport_reset_no_sock(t, skb);
+ if (type != VIRTIO_VSOCK_TYPE_DGRAM)
+ (void)virtio_transport_reset_no_sock(t, skb);
release_sock(sk);
sock_put(sk);
goto free_pkt;
}

+ if (sk->sk_type == SOCK_DGRAM) {
+ virtio_transport_recv_dgram(sk, skb);
+ goto out;
+ }
+
space_available = virtio_transport_space_update(sk, skb);

/* Update CID in case it has changed after a transport reset event */
@@ -1400,6 +1500,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
break;
}

+out:
release_sock(sk);

/* Release refcnt obtained when we fetched this socket out of the
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index e9de45a26fbd..68312aa8c972 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -46,6 +46,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
return 0;
}

+static bool vsock_loopback_dgram_allow(u32 cid, u32 port);
static bool vsock_loopback_seqpacket_allow(u32 remote_cid);

static struct virtio_transport loopback_transport = {
@@ -62,7 +63,7 @@ static struct virtio_transport loopback_transport = {
.cancel_pkt = vsock_loopback_cancel_pkt,

.dgram_enqueue = virtio_transport_dgram_enqueue,
- .dgram_allow = virtio_transport_dgram_allow,
+ .dgram_allow = vsock_loopback_dgram_allow,
.dgram_get_cid = virtio_transport_dgram_get_cid,
.dgram_get_port = virtio_transport_dgram_get_port,
.dgram_get_length = virtio_transport_dgram_get_length,
@@ -98,6 +99,11 @@ static struct virtio_transport loopback_transport = {
.send_pkt = vsock_loopback_send_pkt,
};

+static bool vsock_loopback_dgram_allow(u32 cid, u32 port)
+{
+ return true;
+}
+
static bool vsock_loopback_seqpacket_allow(u32 remote_cid)
{
return true;

--
2.30.2


2023-06-10 01:37:39

by Bobby Eshleman

Subject: [PATCH RFC net-next v4 4/8] vsock: make vsock bind reusable

This commit makes the bind table management functions in vsock reusable
across different bind tables, in preparation for datagram support in a
later patch.
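To make the refactor concrete, here is a hedged userspace sketch of the same idea: one lookup helper parameterized by the bucket array serves any number of bind tables. Names like `bound_sock`, `find_bound_common`, and `HASH_SIZE` are illustrative stand-ins, not the kernel API:

```c
#include <stddef.h>

#define HASH_SIZE 16

/* Userspace stand-in for the kernel's bound-socket entries; the real
 * code links struct vsock_sock via its bound_table list head. */
struct bound_sock {
	unsigned int cid;
	unsigned int port;
	struct bound_sock *next;
};

/* Illustrative hash; the kernel uses VSOCK_HASH(addr) on the port. */
static unsigned int addr_hash(unsigned int port)
{
	return port % HASH_SIZE;
}

/* The refactored lookup: it walks whichever bucket list the caller
 * passes in, so the same helper serves both the connectible bind
 * table and a separate datagram bind table. */
static struct bound_sock *find_bound_common(struct bound_sock **table,
					    unsigned int cid,
					    unsigned int port)
{
	struct bound_sock *s;

	for (s = table[addr_hash(port)]; s; s = s->next)
		if (s->cid == cid && s->port == port)
			return s;
	return NULL;
}

static void insert_bound(struct bound_sock **table, struct bound_sock *s)
{
	unsigned int b = addr_hash(s->port);

	s->next = table[b];
	table[b] = s;
}
```

With two tables, the same helper resolves lookups in either without duplicating the list-walking code, which is what lets a later patch add `vsock_dgram_bind_table` cheaply.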

Signed-off-by: Bobby Eshleman <[email protected]>
---
net/vmw_vsock/af_vsock.c | 33 ++++++++++++++++++++++++++-------
1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index ef86765f3765..7a3ca4270446 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -230,11 +230,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
sock_put(&vsk->sk);
}

-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
+struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
+ struct list_head *bind_table)
{
struct vsock_sock *vsk;

- list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
+ list_for_each_entry(vsk, bind_table, bound_table) {
if (vsock_addr_equals_addr(addr, &vsk->local_addr))
return sk_vsock(vsk);

@@ -247,6 +248,11 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
return NULL;
}

+static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
+{
+ return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
+}
+
static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
struct sockaddr_vm *dst)
{
@@ -646,12 +652,17 @@ static void vsock_pending_work(struct work_struct *work)

/**** SOCKET OPERATIONS ****/

-static int __vsock_bind_connectible(struct vsock_sock *vsk,
- struct sockaddr_vm *addr)
+static int vsock_bind_common(struct vsock_sock *vsk,
+ struct sockaddr_vm *addr,
+ struct list_head *bind_table,
+ size_t table_size)
{
static u32 port;
struct sockaddr_vm new_addr;

+ if (table_size < VSOCK_HASH_SIZE)
+ return -EINVAL;
+
if (!port)
port = get_random_u32_above(LAST_RESERVED_PORT);

@@ -667,7 +678,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,

new_addr.svm_port = port++;

- if (!__vsock_find_bound_socket(&new_addr)) {
+ if (!vsock_find_bound_socket_common(&new_addr,
+ &bind_table[VSOCK_HASH(addr)])) {
found = true;
break;
}
@@ -684,7 +696,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
return -EACCES;
}

- if (__vsock_find_bound_socket(&new_addr))
+ if (vsock_find_bound_socket_common(&new_addr,
+ &bind_table[VSOCK_HASH(addr)]))
return -EADDRINUSE;
}

@@ -696,11 +709,17 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
* by AF_UNIX.
*/
__vsock_remove_bound(vsk);
- __vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk);
+ __vsock_insert_bound(&bind_table[VSOCK_HASH(&vsk->local_addr)], vsk);

return 0;
}

+static int __vsock_bind_connectible(struct vsock_sock *vsk,
+ struct sockaddr_vm *addr)
+{
+ return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
+}
+
static int __vsock_bind_dgram(struct vsock_sock *vsk,
struct sockaddr_vm *addr)
{

--
2.30.2


2023-06-10 02:16:41

by Bobby Eshleman

Subject: [PATCH RFC net-next v4 8/8] tests: add vsock dgram tests

From: Jiang Wang <[email protected]>

This patch adds tests for vsock datagram support: sendto/recvfrom,
connected sends, multiple concurrent sockets, and message bounds.
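As background for the tests below, a minimal userspace sketch of the unconnected send path they exercise. `make_vsock_addr` and `vsock_dgram_send_byte` are illustrative helpers (not part of util.c), and actually sending requires a vsock-capable host with datagram support, so only the address construction is exercised here:

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Build an AF_VSOCK address, as the tests do with their
 * union { struct sockaddr sa; struct sockaddr_vm svm; }. */
static struct sockaddr_vm make_vsock_addr(unsigned int cid,
					  unsigned int port)
{
	struct sockaddr_vm svm;

	memset(&svm, 0, sizeof(svm));
	svm.svm_family = AF_VSOCK;
	svm.svm_cid = cid;
	svm.svm_port = port;
	return svm;
}

/* One unconnected datagram send, mirroring sendto_byte() in util.c.
 * Returns 0 on success, -1 with errno set otherwise. Not called in
 * the assertions below because it needs a vsock-capable peer. */
static int vsock_dgram_send_byte(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm dst = make_vsock_addr(cid, port);
	const unsigned char byte = 'A';
	int ret = -1;
	int fd;

	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	if (sendto(fd, &byte, sizeof(byte), 0,
		   (const struct sockaddr *)&dst, sizeof(dst)) == sizeof(byte))
		ret = 0;
	close(fd);
	return ret;
}
```

Note that, unlike the stream tests, no `LISTENING` handshake is needed before sending: a datagram needs only a bound receiver, which is why `vsock_connect()` above skips `control_expectln("LISTENING")` for `SOCK_DGRAM`.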

Signed-off-by: Bobby Eshleman <[email protected]>
Signed-off-by: Jiang Wang <[email protected]>
---
tools/testing/vsock/util.c | 141 ++++++++++++-
tools/testing/vsock/util.h | 6 +
tools/testing/vsock/vsock_test.c | 432 +++++++++++++++++++++++++++++++++++++++
3 files changed, 578 insertions(+), 1 deletion(-)

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 01b636d3039a..811e70d7cf1e 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -99,7 +99,8 @@ static int vsock_connect(unsigned int cid, unsigned int port, int type)
int ret;
int fd;

- control_expectln("LISTENING");
+ if (type != SOCK_DGRAM)
+ control_expectln("LISTENING");

fd = socket(AF_VSOCK, type, 0);

@@ -130,6 +131,11 @@ int vsock_seqpacket_connect(unsigned int cid, unsigned int port)
return vsock_connect(cid, port, SOCK_SEQPACKET);
}

+int vsock_dgram_connect(unsigned int cid, unsigned int port)
+{
+ return vsock_connect(cid, port, SOCK_DGRAM);
+}
+
/* Listen on <cid, port> and return the first incoming connection. The remote
* address is stored to clientaddrp. clientaddrp may be NULL.
*/
@@ -211,6 +217,34 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
return vsock_accept(cid, port, clientaddrp, SOCK_SEQPACKET);
}

+int vsock_dgram_bind(unsigned int cid, unsigned int port)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = port,
+ .svm_cid = cid,
+ },
+ };
+ int fd;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ return fd;
+}
+
/* Transmit one byte and check the return value.
*
* expected_ret:
@@ -260,6 +294,57 @@ void send_byte(int fd, int expected_ret, int flags)
}
}

+/* Transmit one byte and check the return value.
+ *
+ * expected_ret:
+ * <0 Negative errno (for testing errors)
+ * 0 End-of-file
+ * 1 Success
+ */
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+ int flags)
+{
+ const uint8_t byte = 'A';
+ ssize_t nwritten;
+
+ timeout_begin(TIMEOUT);
+ do {
+ nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr,
+ len);
+ timeout_check("write");
+ } while (nwritten < 0 && errno == EINTR);
+ timeout_end();
+
+ if (expected_ret < 0) {
+ if (nwritten != -1) {
+ fprintf(stderr, "bogus sendto(2) return value %zd\n",
+ nwritten);
+ exit(EXIT_FAILURE);
+ }
+ if (errno != -expected_ret) {
+ perror("write");
+ exit(EXIT_FAILURE);
+ }
+ return;
+ }
+
+ if (nwritten < 0) {
+ perror("write");
+ exit(EXIT_FAILURE);
+ }
+ if (nwritten == 0) {
+ if (expected_ret == 0)
+ return;
+
+ fprintf(stderr, "unexpected EOF while sending byte\n");
+ exit(EXIT_FAILURE);
+ }
+ if (nwritten != sizeof(byte)) {
+ fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten);
+ exit(EXIT_FAILURE);
+ }
+}
+
/* Receive one byte and check the return value.
*
* expected_ret:
@@ -313,6 +398,60 @@ void recv_byte(int fd, int expected_ret, int flags)
}
}

+/* Receive one byte and check the return value.
+ *
+ * expected_ret:
+ * <0 Negative errno (for testing errors)
+ * 0 End-of-file
+ * 1 Success
+ */
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+ int expected_ret, int flags)
+{
+ uint8_t byte;
+ ssize_t nread;
+
+ timeout_begin(TIMEOUT);
+ do {
+ nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen);
+ timeout_check("read");
+ } while (nread < 0 && errno == EINTR);
+ timeout_end();
+
+ if (expected_ret < 0) {
+ if (nread != -1) {
+ fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
+ nread);
+ exit(EXIT_FAILURE);
+ }
+ if (errno != -expected_ret) {
+ perror("read");
+ exit(EXIT_FAILURE);
+ }
+ return;
+ }
+
+ if (nread < 0) {
+ perror("read");
+ exit(EXIT_FAILURE);
+ }
+ if (nread == 0) {
+ if (expected_ret == 0)
+ return;
+
+ fprintf(stderr, "unexpected EOF while receiving byte\n");
+ exit(EXIT_FAILURE);
+ }
+ if (nread != sizeof(byte)) {
+ fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread);
+ exit(EXIT_FAILURE);
+ }
+ if (byte != 'A') {
+ fprintf(stderr, "unexpected byte read %c\n", byte);
+ exit(EXIT_FAILURE);
+ }
+}
+
/* Run test cases. The program terminates if a failure occurs. */
void run_tests(const struct test_case *test_cases,
const struct test_opts *opts)
diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
index fb99208a95ea..a69e128d120c 100644
--- a/tools/testing/vsock/util.h
+++ b/tools/testing/vsock/util.h
@@ -37,13 +37,19 @@ void init_signals(void);
unsigned int parse_cid(const char *str);
int vsock_stream_connect(unsigned int cid, unsigned int port);
int vsock_seqpacket_connect(unsigned int cid, unsigned int port);
+int vsock_dgram_connect(unsigned int cid, unsigned int port);
int vsock_stream_accept(unsigned int cid, unsigned int port,
struct sockaddr_vm *clientaddrp);
int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
struct sockaddr_vm *clientaddrp);
+int vsock_dgram_bind(unsigned int cid, unsigned int port);
void vsock_wait_remote_close(int fd);
void send_byte(int fd, int expected_ret, int flags);
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+ int flags);
void recv_byte(int fd, int expected_ret, int flags);
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+ int expected_ret, int flags);
void run_tests(const struct test_case *test_cases,
const struct test_opts *opts);
void list_tests(const struct test_case *test_cases);
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index ac1bd3ac1533..ded82d39ee5d 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -1053,6 +1053,413 @@ static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
close(fd);
}

+static void test_dgram_sendto_client(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = opts->peer_cid,
+ },
+ };
+ int fd;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_sendto_server(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = VMADDR_CID_ANY,
+ },
+ };
+ socklen_t len = sizeof(addr.svm);
+ int fd;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Notify the client that the server is ready */
+ control_writeln("BIND");
+
+ recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+
+ /* Wait for the client to finish */
+ control_expectln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_connect_client(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = opts->peer_cid,
+ },
+ };
+ int ret;
+ int fd;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ ret = connect(fd, &addr.sa, sizeof(addr.svm));
+ if (ret < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ send_byte(fd, 1, 0);
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_connect_server(const struct test_opts *opts)
+{
+ test_dgram_sendto_server(opts);
+}
+
+static void test_dgram_multiconn_sendto_client(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = opts->peer_cid,
+ },
+ };
+ int fds[MULTICONN_NFDS];
+ int i;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fds[i] < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ for (i = 0; i < MULTICONN_NFDS; i++)
+ sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ for (i = 0; i < MULTICONN_NFDS; i++)
+ close(fds[i]);
+}
+
+static void test_dgram_multiconn_sendto_server(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = VMADDR_CID_ANY,
+ },
+ };
+ socklen_t len = sizeof(addr.svm);
+ int fd;
+ int i;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Notify the client that the server is ready */
+ control_writeln("BIND");
+
+ for (i = 0; i < MULTICONN_NFDS; i++)
+ recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+
+ /* Wait for the client to finish */
+ control_expectln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_multiconn_send_client(const struct test_opts *opts)
+{
+ int fds[MULTICONN_NFDS];
+ int i;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ fds[i] = vsock_dgram_connect(opts->peer_cid, 1234);
+ if (fds[i] < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ for (i = 0; i < MULTICONN_NFDS; i++)
+ send_byte(fds[i], 1, 0);
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ for (i = 0; i < MULTICONN_NFDS; i++)
+ close(fds[i]);
+}
+
+static void test_dgram_multiconn_send_server(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = VMADDR_CID_ANY,
+ },
+ };
+ int fd;
+ int i;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Notify the client that the server is ready */
+ control_writeln("BIND");
+
+ for (i = 0; i < MULTICONN_NFDS; i++)
+ recv_byte(fd, 1, 0);
+
+ /* Wait for the client to finish */
+ control_expectln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_msg_bounds_client(const struct test_opts *opts)
+{
+ unsigned long recv_buf_size;
+ int page_size;
+ int msg_cnt;
+ int fd;
+
+ fd = vsock_dgram_connect(opts->peer_cid, 1234);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Let the server know the client is ready */
+ control_writeln("CLNTREADY");
+
+ msg_cnt = control_readulong();
+ recv_buf_size = control_readulong();
+
+	/* Wait until the receiver has set its buffer size. */
+ control_expectln("SRVREADY");
+
+ page_size = getpagesize();
+
+ for (int i = 0; i < msg_cnt; i++) {
+ unsigned long curr_hash;
+ ssize_t send_size;
+ size_t buf_size;
+ void *buf;
+
+ /* Use "small" buffers and "big" buffers. */
+ if (i & 1)
+ buf_size = page_size +
+ (rand() % (MAX_MSG_SIZE - page_size));
+ else
+ buf_size = 1 + (rand() % page_size);
+
+ buf_size = min(buf_size, recv_buf_size);
+
+ buf = malloc(buf_size);
+
+ if (!buf) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+		/* Fill the buffer with a random byte value. */
+		memset(buf, rand() & 0xff, buf_size);
+
+ send_size = send(fd, buf, buf_size, 0);
+
+ if (send_size < 0) {
+ perror("send");
+ exit(EXIT_FAILURE);
+ }
+
+ if (send_size != buf_size) {
+ fprintf(stderr, "Invalid send size\n");
+ exit(EXIT_FAILURE);
+ }
+
+ /* In theory the implementation isn't required to transmit
+ * these packets in order, so we use this SYNC control message
+ * so that server and client coordinate sending and receiving
+ * one packet at a time. The client sends a packet and waits
+ * until it has been received before sending another.
+ */
+ control_writeln("PKTSENT");
+ control_expectln("PKTRECV");
+
+ /* Send the server a hash of the packet */
+ curr_hash = hash_djb2(buf, buf_size);
+ control_writeulong(curr_hash);
+ free(buf);
+ }
+
+ control_writeln("SENDDONE");
+ close(fd);
+}
+
+static void test_dgram_msg_bounds_server(const struct test_opts *opts)
+{
+ const unsigned long msg_cnt = 16;
+ unsigned long sock_buf_size;
+ struct msghdr msg = {0};
+ struct iovec iov = {0};
+ char buf[MAX_MSG_SIZE];
+ socklen_t len;
+ int fd;
+ int i;
+
+ fd = vsock_dgram_bind(VMADDR_CID_ANY, 1234);
+
+ if (fd < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Set receive buffer to maximum */
+ sock_buf_size = -1;
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
+ &sock_buf_size, sizeof(sock_buf_size))) {
+		perror("setsockopt(SO_RCVBUF)");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Retrieve the receive buffer size */
+ len = sizeof(sock_buf_size);
+ if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF,
+ &sock_buf_size, &len)) {
+		perror("getsockopt(SO_RCVBUF)");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Client ready to receive parameters */
+ control_expectln("CLNTREADY");
+
+ control_writeulong(msg_cnt);
+ control_writeulong(sock_buf_size);
+
+ /* Ready to receive data. */
+ control_writeln("SRVREADY");
+
+ iov.iov_base = buf;
+ iov.iov_len = sizeof(buf);
+ msg.msg_iov = &iov;
+ msg.msg_iovlen = 1;
+
+ for (i = 0; i < msg_cnt; i++) {
+ unsigned long remote_hash;
+ unsigned long curr_hash;
+ ssize_t recv_size;
+
+ control_expectln("PKTSENT");
+ recv_size = recvmsg(fd, &msg, 0);
+ control_writeln("PKTRECV");
+
+ if (!recv_size)
+ break;
+
+ if (recv_size < 0) {
+ perror("recvmsg");
+ exit(EXIT_FAILURE);
+ }
+
+ curr_hash = hash_djb2(msg.msg_iov[0].iov_base, recv_size);
+ remote_hash = control_readulong();
+
+ if (curr_hash != remote_hash) {
+ fprintf(stderr, "Message bounds broken\n");
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ close(fd);
+}
+
static struct test_case test_cases[] = {
{
.name = "SOCK_STREAM connection reset",
@@ -1128,6 +1535,31 @@ static struct test_case test_cases[] = {
.run_client = test_stream_virtio_skb_merge_client,
.run_server = test_stream_virtio_skb_merge_server,
},
+ {
+ .name = "SOCK_DGRAM client sendto",
+ .run_client = test_dgram_sendto_client,
+ .run_server = test_dgram_sendto_server,
+ },
+ {
+ .name = "SOCK_DGRAM client connect",
+ .run_client = test_dgram_connect_client,
+ .run_server = test_dgram_connect_server,
+ },
+ {
+ .name = "SOCK_DGRAM multiple connections using sendto",
+ .run_client = test_dgram_multiconn_sendto_client,
+ .run_server = test_dgram_multiconn_sendto_server,
+ },
+ {
+ .name = "SOCK_DGRAM multiple connections using send",
+ .run_client = test_dgram_multiconn_send_client,
+ .run_server = test_dgram_multiconn_send_server,
+ },
+ {
+ .name = "SOCK_DGRAM msg bounds",
+ .run_client = test_dgram_msg_bounds_client,
+ .run_server = test_dgram_msg_bounds_server,
+ },
{},
};


--
2.30.2


2023-06-11 21:19:45

by Arseniy Krasnov

Subject: Re: [PATCH RFC net-next v4 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue

Hello Bobby! Thanks for this patchset! Small comment below:

On 10.06.2023 03:58, Bobby Eshleman wrote:
> This commit drops the transport->dgram_dequeue callback and makes
> vsock_dgram_recvmsg() generic. It also adds additional transport
> callbacks for use by the generic vsock_dgram_recvmsg(), such as for
> parsing skbs for CID/port which vary in format per transport.
>
> Signed-off-by: Bobby Eshleman <[email protected]>
> ---
> drivers/vhost/vsock.c | 4 +-
> include/linux/virtio_vsock.h | 3 ++
> include/net/af_vsock.h | 13 ++++++-
> net/vmw_vsock/af_vsock.c | 51 ++++++++++++++++++++++++-
> net/vmw_vsock/hyperv_transport.c | 17 +++++++--
> net/vmw_vsock/virtio_transport.c | 4 +-
> net/vmw_vsock/virtio_transport_common.c | 18 +++++++++
> net/vmw_vsock/vmci_transport.c | 68 +++++++++++++--------------------
> net/vmw_vsock/vsock_loopback.c | 4 +-
> 9 files changed, 132 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 6578db78f0ae..c8201c070b4b 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -410,9 +410,11 @@ static struct virtio_transport vhost_transport = {
> .cancel_pkt = vhost_transport_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> - .dgram_dequeue = virtio_transport_dgram_dequeue,
> .dgram_bind = virtio_transport_dgram_bind,
> .dgram_allow = virtio_transport_dgram_allow,
> + .dgram_get_cid = virtio_transport_dgram_get_cid,
> + .dgram_get_port = virtio_transport_dgram_get_port,
> + .dgram_get_length = virtio_transport_dgram_get_length,
>
> .stream_enqueue = virtio_transport_stream_enqueue,
> .stream_dequeue = virtio_transport_stream_dequeue,
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index c58453699ee9..23521a318cf0 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -219,6 +219,9 @@ bool virtio_transport_stream_allow(u32 cid, u32 port);
> int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> struct sockaddr_vm *addr);
> bool virtio_transport_dgram_allow(u32 cid, u32 port);
> +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
> +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
>
> int virtio_transport_connect(struct vsock_sock *vsk);
>
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index 0e7504a42925..7bedb9ee7e3e 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -120,11 +120,20 @@ struct vsock_transport {
>
> /* DGRAM. */
> int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
> - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> - size_t len, int flags);
> int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> struct msghdr *, size_t len);
> bool (*dgram_allow)(u32 cid, u32 port);
> + int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid);
> + int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port);
> + int (*dgram_get_length)(struct sk_buff *skb, size_t *length);
> +
> + /* The number of bytes into the buffer at which the payload starts, as
> + * first seen by the receiving socket layer. For example, if the
> + * transport presets the skb pointers using skb_pull(sizeof(header))
> + * then this would be zero, otherwise it would be the size of the
> + * header.
> + */
> + const size_t dgram_payload_offset;
>
> /* STREAM. */
> /* TODO: stream_bind() */
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index efb8a0937a13..ffb4dd8b6ea7 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -1271,11 +1271,15 @@ static int vsock_dgram_connect(struct socket *sock,
> int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> size_t len, int flags)
> {
> + const struct vsock_transport *transport;
> #ifdef CONFIG_BPF_SYSCALL
> const struct proto *prot;
> #endif
> struct vsock_sock *vsk;
> + struct sk_buff *skb;
> + size_t payload_len;
> struct sock *sk;
> + int err;
>
> sk = sock->sk;
> vsk = vsock_sk(sk);
> @@ -1286,7 +1290,52 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> return prot->recvmsg(sk, msg, len, flags, NULL);
> #endif
>
> - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
> + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> + return -EOPNOTSUPP;
> +
> + transport = vsk->transport;
> +
> + /* Retrieve the head sk_buff from the socket's receive queue. */
> + err = 0;
> + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
> + if (!skb)
> + return err;
> +
> + err = transport->dgram_get_length(skb, &payload_len);
> + if (err)
> + goto out;
> +
> + if (payload_len > len) {
> + payload_len = len;
> + msg->msg_flags |= MSG_TRUNC;
> + }
> +
> + /* Place the datagram payload in the user's iovec. */
> + err = skb_copy_datagram_msg(skb, transport->dgram_payload_offset, msg, payload_len);
> + if (err)
> + goto out;
> +
> + if (msg->msg_name) {
> + /* Provide the address of the sender. */
> + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> + unsigned int cid, port;
> +
> + err = transport->dgram_get_cid(skb, &cid);
> + if (err)
> + goto out;
> +
> + err = transport->dgram_get_port(skb, &port);
> + if (err)
> + goto out;

Maybe we can merge 'dgram_get_cid' and 'dgram_get_port' into a single callback? As far as I can see, this is
the only place where both are used (correct me if I'm wrong), and logically both operate on the address:
CID and port. E.g. something like dgram_get_cid_n_port().

Moreover, I'm not sure this is a good tradeoff: we remove the transport-specific callback for dgram receive,
where we already have a 'msghdr' with both the data buffer and the buffer for 'sockaddr_vm', and instead add
several new callbacks to the transports, like dgram_get_cid() and dgram_get_port(). I agree that each
transport-specific callback would duplicate the same copying logic, calling 'skb_copy_datagram_msg()' and
filling in the address with 'vsock_addr_init()', but in that case we wouldn't need to change the transports
so much. For example, HyperV would stay unchanged, since it does not support SOCK_DGRAM, and for VMCI you
would just need to add the 'vsock_addr_init()' logic to its dgram dequeue callback.

What do you think?

Thanks, Arseniy

> +
> + vsock_addr_init(vm_addr, cid, port);
> + msg->msg_namelen = sizeof(*vm_addr);
> + }
> + err = payload_len;
> +
> +out:
> + skb_free_datagram(&vsk->sk, skb);
> + return err;
> }
> EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
>
> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> index 7cb1a9d2cdb4..ff6e87e25fa0 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -556,8 +556,17 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> return -EOPNOTSUPP;
> }
>
> -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
> - size_t len, int flags)
> +static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static int hvs_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static int hvs_dgram_get_length(struct sk_buff *skb, size_t *len)
> {
> return -EOPNOTSUPP;
> }
> @@ -833,7 +842,9 @@ static struct vsock_transport hvs_transport = {
> .shutdown = hvs_shutdown,
>
> .dgram_bind = hvs_dgram_bind,
> - .dgram_dequeue = hvs_dgram_dequeue,
> + .dgram_get_cid = hvs_dgram_get_cid,
> + .dgram_get_port = hvs_dgram_get_port,
> + .dgram_get_length = hvs_dgram_get_length,
> .dgram_enqueue = hvs_dgram_enqueue,
> .dgram_allow = hvs_dgram_allow,
>
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index e95df847176b..5763cdf13804 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -429,9 +429,11 @@ static struct virtio_transport virtio_transport = {
> .cancel_pkt = virtio_transport_cancel_pkt,
>
> .dgram_bind = virtio_transport_dgram_bind,
> - .dgram_dequeue = virtio_transport_dgram_dequeue,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
> + .dgram_get_cid = virtio_transport_dgram_get_cid,
> + .dgram_get_port = virtio_transport_dgram_get_port,
> + .dgram_get_length = virtio_transport_dgram_get_length,
>
> .stream_dequeue = virtio_transport_stream_dequeue,
> .stream_enqueue = virtio_transport_stream_enqueue,
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index b769fc258931..e6903c719964 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -797,6 +797,24 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>
> +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> +{
> + return -EOPNOTSUPP;
> +}
> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid);
> +
> +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> +{
> + return -EOPNOTSUPP;
> +}
> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port);
> +
> +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
> +{
> + return -EOPNOTSUPP;
> +}
> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length);
> +
> bool virtio_transport_dgram_allow(u32 cid, u32 port)
> {
> return false;
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index b370070194fa..bbc63826bf48 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -1731,57 +1731,40 @@ static int vmci_transport_dgram_enqueue(
> return err - sizeof(*dg);
> }
>
> -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
> - struct msghdr *msg, size_t len,
> - int flags)
> +static int vmci_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> {
> - int err;
> struct vmci_datagram *dg;
> - size_t payload_len;
> - struct sk_buff *skb;
>
> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> - return -EOPNOTSUPP;
> + dg = (struct vmci_datagram *)skb->data;
> + if (!dg)
> + return -EINVAL;
>
> - /* Retrieve the head sk_buff from the socket's receive queue. */
> - err = 0;
> - skb = skb_recv_datagram(&vsk->sk, flags, &err);
> - if (!skb)
> - return err;
> + *cid = dg->src.context;
> + return 0;
> +}
> +
> +static int vmci_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> +{
> + struct vmci_datagram *dg;
>
> dg = (struct vmci_datagram *)skb->data;
> if (!dg)
> - /* err is 0, meaning we read zero bytes. */
> - goto out;
> -
> - payload_len = dg->payload_size;
> - /* Ensure the sk_buff matches the payload size claimed in the packet. */
> - if (payload_len != skb->len - sizeof(*dg)) {
> - err = -EINVAL;
> - goto out;
> - }
> + return -EINVAL;
>
> - if (payload_len > len) {
> - payload_len = len;
> - msg->msg_flags |= MSG_TRUNC;
> - }
> + *port = dg->src.resource;
> + return 0;
> +}
>
> - /* Place the datagram payload in the user's iovec. */
> - err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
> - if (err)
> - goto out;
> +static int vmci_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
> +{
> + struct vmci_datagram *dg;
>
> - if (msg->msg_name) {
> - /* Provide the address of the sender. */
> - DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> - vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
> - msg->msg_namelen = sizeof(*vm_addr);
> - }
> - err = payload_len;
> + dg = (struct vmci_datagram *)skb->data;
> + if (!dg)
> + return -EINVAL;
>
> -out:
> - skb_free_datagram(&vsk->sk, skb);
> - return err;
> + *len = dg->payload_size;
> + return 0;
> }
>
> static bool vmci_transport_dgram_allow(u32 cid, u32 port)
> @@ -2040,9 +2023,12 @@ static struct vsock_transport vmci_transport = {
> .release = vmci_transport_release,
> .connect = vmci_transport_connect,
> .dgram_bind = vmci_transport_dgram_bind,
> - .dgram_dequeue = vmci_transport_dgram_dequeue,
> .dgram_enqueue = vmci_transport_dgram_enqueue,
> .dgram_allow = vmci_transport_dgram_allow,
> + .dgram_get_cid = vmci_transport_dgram_get_cid,
> + .dgram_get_port = vmci_transport_dgram_get_port,
> + .dgram_get_length = vmci_transport_dgram_get_length,
> + .dgram_payload_offset = sizeof(struct vmci_datagram),
> .stream_dequeue = vmci_transport_stream_dequeue,
> .stream_enqueue = vmci_transport_stream_enqueue,
> .stream_has_data = vmci_transport_stream_has_data,
> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> index 5c6360df1f31..2f3cabc79ee5 100644
> --- a/net/vmw_vsock/vsock_loopback.c
> +++ b/net/vmw_vsock/vsock_loopback.c
> @@ -62,9 +62,11 @@ static struct virtio_transport loopback_transport = {
> .cancel_pkt = vsock_loopback_cancel_pkt,
>
> .dgram_bind = virtio_transport_dgram_bind,
> - .dgram_dequeue = virtio_transport_dgram_dequeue,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
> + .dgram_get_cid = virtio_transport_dgram_get_cid,
> + .dgram_get_port = virtio_transport_dgram_get_port,
> + .dgram_get_length = virtio_transport_dgram_get_length,
>
> .stream_dequeue = virtio_transport_stream_dequeue,
> .stream_enqueue = virtio_transport_stream_enqueue,
>

2023-06-11 21:20:15

by Arseniy Krasnov

Subject: Re: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams

Hello Bobby!

On 10.06.2023 03:58, Bobby Eshleman wrote:
> This commit adds support for datagrams over virtio/vsock.
>
> Message boundaries are preserved on a per-skb and per-vq entry basis.

I'm a little bit confused about the following case: suppose vhost sends a 4097-byte datagram to the guest.
The guest uses 4096-byte RX buffers in its virtio queue, each with an empty skb attached. Vhost places the
first 4096 bytes into the first buffer of the guest's RX queue and the last byte into the second buffer. Now,
IIUC, the guest has two skbs in its rx queue. When a user in the guest reads the data, do they get 4097 bytes,
even though the guest holds two skbs of 4096 bytes and 1 byte? In seqpacket there is a special marker in the
header which shows where the message ends, so how does that work here?

Thanks, Arseniy

> Messages are copied in whole from the user to an SKB, which in turn is
> added to the scatterlist for the virtqueue in whole for the device.
> Messages do not straddle skbs and they do not straddle packets.
> Messages may be truncated by the receiving user if their buffer is
> shorter than the message.
>
> Other properties of vsock datagrams:
> - Datagrams self-throttle at the per-socket sk_sndbuf threshold.
> - The same virtqueue is used as is used for streams and seqpacket flows
> - Credits are not used for datagrams
> - Packets are dropped silently by the device, which means the virtqueue
> will still get kicked even during high packet loss, so long as the
> socket does not exceed sk_sndbuf.
>
> Future work might include finding a way to reduce the virtqueue kick
> rate for datagram flows with high packet loss.
>
> Signed-off-by: Bobby Eshleman <[email protected]>
> ---
> drivers/vhost/vsock.c | 27 ++++-
> include/linux/virtio_vsock.h | 5 +-
> include/net/af_vsock.h | 1 +
> include/uapi/linux/virtio_vsock.h | 1 +
> net/vmw_vsock/af_vsock.c | 58 +++++++--
> net/vmw_vsock/virtio_transport.c | 23 +++-
> net/vmw_vsock/virtio_transport_common.c | 207 ++++++++++++++++++++++++--------
> net/vmw_vsock/vsock_loopback.c | 8 +-
> 8 files changed, 264 insertions(+), 66 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 8f0082da5e70..159c1a22c1a8 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -32,7 +32,8 @@
> enum {
> VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> - (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
> + (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
> + (1ULL << VIRTIO_VSOCK_F_DGRAM)
> };
>
> enum {
> @@ -56,6 +57,7 @@ struct vhost_vsock {
> atomic_t queued_replies;
>
> u32 guest_cid;
> + bool dgram_allow;
> bool seqpacket_allow;
> };
>
> @@ -394,6 +396,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
> return val < vq->num;
> }
>
> +static bool vhost_transport_dgram_allow(u32 cid, u32 port);
> static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>
> static struct virtio_transport vhost_transport = {
> @@ -410,10 +413,11 @@ static struct virtio_transport vhost_transport = {
> .cancel_pkt = vhost_transport_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> - .dgram_allow = virtio_transport_dgram_allow,
> + .dgram_allow = vhost_transport_dgram_allow,
> .dgram_get_cid = virtio_transport_dgram_get_cid,
> .dgram_get_port = virtio_transport_dgram_get_port,
> .dgram_get_length = virtio_transport_dgram_get_length,
> + .dgram_payload_offset = 0,
>
> .stream_enqueue = virtio_transport_stream_enqueue,
> .stream_dequeue = virtio_transport_stream_dequeue,
> @@ -446,6 +450,22 @@ static struct virtio_transport vhost_transport = {
> .send_pkt = vhost_transport_send_pkt,
> };
>
> +static bool vhost_transport_dgram_allow(u32 cid, u32 port)
> +{
> + struct vhost_vsock *vsock;
> + bool dgram_allow = false;
> +
> + rcu_read_lock();
> + vsock = vhost_vsock_get(cid);
> +
> + if (vsock)
> + dgram_allow = vsock->dgram_allow;
> +
> + rcu_read_unlock();
> +
> + return dgram_allow;
> +}
> +
> static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> {
> struct vhost_vsock *vsock;
> @@ -802,6 +822,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
> vsock->seqpacket_allow = true;
>
> + if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
> + vsock->dgram_allow = true;
> +
> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> vq = &vsock->vqs[i];
> mutex_lock(&vq->mutex);
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 73afa09f4585..237ca87a2ecd 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -216,7 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> bool virtio_transport_stream_allow(u32 cid, u32 port);
> -bool virtio_transport_dgram_allow(u32 cid, u32 port);
> int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
> int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
> @@ -247,4 +246,8 @@ void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit);
> void virtio_transport_deliver_tap_pkt(struct sk_buff *skb);
> int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *list);
> int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t read_actor);
> +void virtio_transport_init_dgram_bind_tables(void);
> +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
> +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
> #endif /* _LINUX_VIRTIO_VSOCK_H */
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index 7bedb9ee7e3e..c115e655b4f5 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -225,6 +225,7 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
> void (*fn)(struct sock *sk));
> int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
> bool vsock_find_cid(unsigned int cid);
> +struct sock *vsock_find_bound_dgram_socket(struct sockaddr_vm *addr);
>
> /**** TAP ****/
>
> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
> index 9c25f267bbc0..27b4b2b8bf13 100644
> --- a/include/uapi/linux/virtio_vsock.h
> +++ b/include/uapi/linux/virtio_vsock.h
> @@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
> enum virtio_vsock_type {
> VIRTIO_VSOCK_TYPE_STREAM = 1,
> VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
> + VIRTIO_VSOCK_TYPE_DGRAM = 3,
> };
>
> enum virtio_vsock_op {
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 7a3ca4270446..b0b18e7f4299 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -114,6 +114,7 @@
> static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> static void vsock_sk_destruct(struct sock *sk);
> static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
> +static bool sock_type_connectible(u16 type);
>
> /* Protocol family. */
> struct proto vsock_proto = {
> @@ -180,6 +181,8 @@ struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
> EXPORT_SYMBOL_GPL(vsock_connected_table);
> DEFINE_SPINLOCK(vsock_table_lock);
> EXPORT_SYMBOL_GPL(vsock_table_lock);
> +static struct list_head vsock_dgram_bind_table[VSOCK_HASH_SIZE];
> +static DEFINE_SPINLOCK(vsock_dgram_table_lock);
>
> /* Autobind this socket to the local address if necessary. */
> static int vsock_auto_bind(struct vsock_sock *vsk)
> @@ -202,6 +205,9 @@ static void vsock_init_tables(void)
>
> for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
> INIT_LIST_HEAD(&vsock_connected_table[i]);
> +
> + for (i = 0; i < ARRAY_SIZE(vsock_dgram_bind_table); i++)
> + INIT_LIST_HEAD(&vsock_dgram_bind_table[i]);
> }
>
> static void __vsock_insert_bound(struct list_head *list,
> @@ -230,8 +236,8 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> sock_put(&vsk->sk);
> }
>
> -struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
> - struct list_head *bind_table)
> +static struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
> + struct list_head *bind_table)
> {
> struct vsock_sock *vsk;
>
> @@ -248,6 +254,23 @@ struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
> return NULL;
> }
>
> +struct sock *
> +vsock_find_bound_dgram_socket(struct sockaddr_vm *addr)
> +{
> + struct sock *sk;
> +
> + spin_lock_bh(&vsock_dgram_table_lock);
> + sk = vsock_find_bound_socket_common(addr,
> + &vsock_dgram_bind_table[VSOCK_HASH(addr)]);
> + if (sk)
> + sock_hold(sk);
> +
> + spin_unlock_bh(&vsock_dgram_table_lock);
> +
> + return sk;
> +}
> +EXPORT_SYMBOL_GPL(vsock_find_bound_dgram_socket);
> +
> static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> {
> return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
> @@ -287,6 +310,14 @@ void vsock_insert_connected(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(vsock_insert_connected);
>
> +static void vsock_remove_dgram_bound(struct vsock_sock *vsk)
> +{
> + spin_lock_bh(&vsock_dgram_table_lock);
> + if (__vsock_in_bound_table(vsk))
> + __vsock_remove_bound(vsk);
> + spin_unlock_bh(&vsock_dgram_table_lock);
> +}
> +
> void vsock_remove_bound(struct vsock_sock *vsk)
> {
> spin_lock_bh(&vsock_table_lock);
> @@ -338,7 +369,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
>
> void vsock_remove_sock(struct vsock_sock *vsk)
> {
> - vsock_remove_bound(vsk);
> + if (sock_type_connectible(sk_vsock(vsk)->sk_type))
> + vsock_remove_bound(vsk);
> + else
> + vsock_remove_dgram_bound(vsk);
> vsock_remove_connected(vsk);
> }
> EXPORT_SYMBOL_GPL(vsock_remove_sock);
> @@ -720,11 +754,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
> }
>
> -static int __vsock_bind_dgram(struct vsock_sock *vsk,
> - struct sockaddr_vm *addr)
> +static int vsock_bind_dgram(struct vsock_sock *vsk,
> + struct sockaddr_vm *addr)
> {
> - if (!vsk->transport || !vsk->transport->dgram_bind)
> - return -EINVAL;
> + if (!vsk->transport || !vsk->transport->dgram_bind) {
> + int retval;
> +
> + spin_lock_bh(&vsock_dgram_table_lock);
> + retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
> + VSOCK_HASH_SIZE);
> + spin_unlock_bh(&vsock_dgram_table_lock);
> +
> + return retval;
> + }
>
> return vsk->transport->dgram_bind(vsk, addr);
> }
> @@ -755,7 +797,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
> break;
>
> case SOCK_DGRAM:
> - retval = __vsock_bind_dgram(vsk, addr);
> + retval = vsock_bind_dgram(vsk, addr);
> break;
>
> default:
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 1b7843a7779a..7160a3104218 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -63,6 +63,7 @@ struct virtio_vsock {
>
> u32 guest_cid;
> bool seqpacket_allow;
> + bool dgram_allow;
> };
>
> static u32 virtio_transport_get_local_cid(void)
> @@ -413,6 +414,7 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> }
>
> +static bool virtio_transport_dgram_allow(u32 cid, u32 port);
> static bool virtio_transport_seqpacket_allow(u32 remote_cid);
>
> static struct virtio_transport virtio_transport = {
> @@ -465,6 +467,21 @@ static struct virtio_transport virtio_transport = {
> .send_pkt = virtio_transport_send_pkt,
> };
>
> +static bool virtio_transport_dgram_allow(u32 cid, u32 port)
> +{
> + struct virtio_vsock *vsock;
> + bool dgram_allow;
> +
> + dgram_allow = false;
> + rcu_read_lock();
> + vsock = rcu_dereference(the_virtio_vsock);
> + if (vsock)
> + dgram_allow = vsock->dgram_allow;
> + rcu_read_unlock();
> +
> + return dgram_allow;
> +}
> +
> static bool virtio_transport_seqpacket_allow(u32 remote_cid)
> {
> struct virtio_vsock *vsock;
> @@ -658,6 +675,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
> vsock->seqpacket_allow = true;
>
> + if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
> + vsock->dgram_allow = true;
> +
> vdev->priv = vsock;
>
> ret = virtio_vsock_vqs_init(vsock);
> @@ -750,7 +770,8 @@ static struct virtio_device_id id_table[] = {
> };
>
> static unsigned int features[] = {
> - VIRTIO_VSOCK_F_SEQPACKET
> + VIRTIO_VSOCK_F_SEQPACKET,
> + VIRTIO_VSOCK_F_DGRAM
> };
>
> static struct virtio_driver virtio_vsock_driver = {
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index d5a3c8efe84b..bc9d459723f5 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -37,6 +37,35 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
> return container_of(t, struct virtio_transport, transport);
> }
>
> +/* Requires info->msg and info->vsk */
> +static struct sk_buff *
> +virtio_transport_sock_alloc_send_skb(struct virtio_vsock_pkt_info *info, unsigned int size,
> + gfp_t mask, int *err)
> +{
> + struct sk_buff *skb;
> + struct sock *sk;
> + int noblock;
> +
> + if (size < VIRTIO_VSOCK_SKB_HEADROOM) {
> + *err = -EINVAL;
> + return NULL;
> + }
> +
> + if (info->msg)
> + noblock = info->msg->msg_flags & MSG_DONTWAIT;
> + else
> + noblock = 1;
> +
> + sk = sk_vsock(info->vsk);
> + sk->sk_allocation = mask;
> + skb = sock_alloc_send_skb(sk, size, noblock, err);
> + if (!skb)
> + return NULL;
> +
> + skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
> + return skb;
> +}
> +
> /* Returns a new packet on success, otherwise returns NULL.
> *
> * If NULL is returned, errp is set to a negative errno.
> @@ -47,7 +76,8 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
> u32 src_cid,
> u32 src_port,
> u32 dst_cid,
> - u32 dst_port)
> + u32 dst_port,
> + int *errp)
> {
> const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len;
> struct virtio_vsock_hdr *hdr;
> @@ -55,9 +85,21 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
> void *payload;
> int err;
>
> - skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
> - if (!skb)
> + /* Datagrams do not use credits; they self-throttle according to
> + * sk_sndbuf via sock_alloc_send_skb(). This helps avoid
> + * triggering the OOM killer.
> + */
> + if (info->vsk && info->type == VIRTIO_VSOCK_TYPE_DGRAM) {
> + skb = virtio_transport_sock_alloc_send_skb(info, skb_len, GFP_KERNEL, &err);
> + } else {
> + skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
> + if (!skb)
> + err = -ENOMEM;
> + }
> +
> + if (!skb) {
> + *errp = err;
> return NULL;
> + }
>
> hdr = virtio_vsock_hdr(skb);
> hdr->type = cpu_to_le16(info->type);
> @@ -96,12 +138,14 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
>
> if (info->vsk && !skb_set_owner_sk_safe(skb, sk_vsock(info->vsk))) {
> WARN_ONCE(1, "failed to allocate skb on vsock socket with sk_refcnt == 0\n");
> + err = -EFAULT;
> goto out;
> }
>
> return skb;
>
> out:
> + *errp = err;
> kfree_skb(skb);
> return NULL;
> }
> @@ -183,7 +227,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
>
> static u16 virtio_transport_get_type(struct sock *sk)
> {
> - if (sk->sk_type == SOCK_STREAM)
> + if (sk->sk_type == SOCK_DGRAM)
> + return VIRTIO_VSOCK_TYPE_DGRAM;
> + else if (sk->sk_type == SOCK_STREAM)
> return VIRTIO_VSOCK_TYPE_STREAM;
> else
> return VIRTIO_VSOCK_TYPE_SEQPACKET;
> @@ -239,11 +285,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>
> skb = virtio_transport_alloc_skb(info, skb_len,
> src_cid, src_port,
> - dst_cid, dst_port);
> - if (!skb) {
> - ret = -ENOMEM;
> + dst_cid, dst_port,
> + &ret);
> + if (!skb)
> break;
> - }
>
> virtio_transport_inc_tx_pkt(vvs, skb);
>
> @@ -583,14 +628,30 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> }
> EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_enqueue);
>
> -int
> -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> - struct msghdr *msg,
> - size_t len, int flags)
> +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> +{
> + *cid = le64_to_cpu(virtio_vsock_hdr(skb)->src_cid);
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid);
> +
> +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> +{
> + *port = le32_to_cpu(virtio_vsock_hdr(skb)->src_port);
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port);
> +
> +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
> {
> - return -EOPNOTSUPP;
> + /* The device layer must have already moved the data ptr beyond the
> + * header for skb->len to be correct.
> + */
> + WARN_ON(skb->data == skb->head);
> + *len = skb->len;
> + return 0;
> }
> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length);
>
> s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
> {
> @@ -790,30 +851,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
>
> -int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> -{
> - return -EOPNOTSUPP;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid);
> -
> -int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> -{
> - return -EOPNOTSUPP;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port);
> -
> -int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
> -{
> - return -EOPNOTSUPP;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length);
> -
> -bool virtio_transport_dgram_allow(u32 cid, u32 port)
> -{
> - return false;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
> -
> int virtio_transport_connect(struct vsock_sock *vsk)
> {
> struct virtio_vsock_pkt_info info = {
> @@ -846,7 +883,34 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> struct msghdr *msg,
> size_t dgram_len)
> {
> - return -EOPNOTSUPP;
> + const struct virtio_transport *t_ops;
> + struct virtio_vsock_pkt_info info = {
> + .op = VIRTIO_VSOCK_OP_RW,
> + .msg = msg,
> + .vsk = vsk,
> + .type = VIRTIO_VSOCK_TYPE_DGRAM,
> + };
> + u32 src_cid, src_port;
> + struct sk_buff *skb;
> + int err;
> +
> + if (dgram_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> + return -EMSGSIZE;
> +
> + t_ops = virtio_transport_get_ops(vsk);
> + src_cid = t_ops->transport.get_local_cid();
> + src_port = vsk->local_addr.svm_port;
> +
> + skb = virtio_transport_alloc_skb(&info, dgram_len,
> + src_cid, src_port,
> + remote_addr->svm_cid,
> + remote_addr->svm_port,
> + &err);
> +
> + if (!skb)
> + return err;
> +
> + return t_ops->send_pkt(skb);
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>
> @@ -903,6 +967,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> .reply = true,
> };
> struct sk_buff *reply;
> + int err;
>
> /* Send RST only if the original pkt is not a RST pkt */
> if (le16_to_cpu(hdr->op) == VIRTIO_VSOCK_OP_RST)
> @@ -915,9 +980,10 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> le64_to_cpu(hdr->dst_cid),
> le32_to_cpu(hdr->dst_port),
> le64_to_cpu(hdr->src_cid),
> - le32_to_cpu(hdr->src_port));
> + le32_to_cpu(hdr->src_port),
> + &err);
> if (!reply)
> - return -ENOMEM;
> + return err;
>
> return t->send_pkt(reply);
> }
> @@ -1137,6 +1203,21 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> kfree_skb(skb);
> }
>
> +/* This function takes ownership of the skb.
> + *
> + * It either places the skb on the sk_receive_queue or frees it.
> + */
> +static void
> +virtio_transport_recv_dgram(struct sock *sk, struct sk_buff *skb)
> +{
> + if (sock_queue_rcv_skb(sk, skb)) {
> + kfree_skb(skb);
> + return;
> + }
> +
> + sk->sk_data_ready(sk);
> +}
> +
> static int
> virtio_transport_recv_connected(struct sock *sk,
> struct sk_buff *skb)
> @@ -1300,7 +1381,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
> static bool virtio_transport_valid_type(u16 type)
> {
> return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
> - (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
> + (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
> + (type == VIRTIO_VSOCK_TYPE_DGRAM);
> }
>
> /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
> @@ -1314,40 +1396,52 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> struct vsock_sock *vsk;
> struct sock *sk;
> bool space_available;
> + u16 type;
>
> vsock_addr_init(&src, le64_to_cpu(hdr->src_cid),
> le32_to_cpu(hdr->src_port));
> vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid),
> le32_to_cpu(hdr->dst_port));
>
> + type = le16_to_cpu(hdr->type);
> +
> trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port,
> dst.svm_cid, dst.svm_port,
> le32_to_cpu(hdr->len),
> - le16_to_cpu(hdr->type),
> + type,
> le16_to_cpu(hdr->op),
> le32_to_cpu(hdr->flags),
> le32_to_cpu(hdr->buf_alloc),
> le32_to_cpu(hdr->fwd_cnt));
>
> - if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
> + if (!virtio_transport_valid_type(type)) {
> (void)virtio_transport_reset_no_sock(t, skb);
> goto free_pkt;
> }
>
> - /* The socket must be in connected or bound table
> - * otherwise send reset back
> + /* For stream/seqpacket, the socket must be in the connected or
> + * bound table; otherwise, a reset is sent back.
> + *
> + * For datagrams, no reset is sent.
> + */
> sk = vsock_find_connected_socket(&src, &dst);
> if (!sk) {
> - sk = vsock_find_bound_socket(&dst);
> - if (!sk) {
> - (void)virtio_transport_reset_no_sock(t, skb);
> - goto free_pkt;
> + if (type == VIRTIO_VSOCK_TYPE_DGRAM) {
> + sk = vsock_find_bound_dgram_socket(&dst);
> + if (!sk)
> + goto free_pkt;
> + } else {
> + sk = vsock_find_bound_socket(&dst);
> + if (!sk) {
> + (void)virtio_transport_reset_no_sock(t, skb);
> + goto free_pkt;
> + }
> }
> }
>
> - if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) {
> - (void)virtio_transport_reset_no_sock(t, skb);
> + if (virtio_transport_get_type(sk) != type) {
> + if (type != VIRTIO_VSOCK_TYPE_DGRAM)
> + (void)virtio_transport_reset_no_sock(t, skb);
> sock_put(sk);
> goto free_pkt;
> }
> @@ -1363,12 +1457,18 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>
> /* Check if sk has been closed before lock_sock */
> if (sock_flag(sk, SOCK_DONE)) {
> - (void)virtio_transport_reset_no_sock(t, skb);
> + if (type != VIRTIO_VSOCK_TYPE_DGRAM)
> + (void)virtio_transport_reset_no_sock(t, skb);
> release_sock(sk);
> sock_put(sk);
> goto free_pkt;
> }
>
> + if (sk->sk_type == SOCK_DGRAM) {
> + virtio_transport_recv_dgram(sk, skb);
> + goto out;
> + }
> +
> space_available = virtio_transport_space_update(sk, skb);
>
> /* Update CID in case it has changed after a transport reset event */
> @@ -1400,6 +1500,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> break;
> }
>
> +out:
> release_sock(sk);
>
> /* Release refcnt obtained when we fetched this socket out of the
> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> index e9de45a26fbd..68312aa8c972 100644
> --- a/net/vmw_vsock/vsock_loopback.c
> +++ b/net/vmw_vsock/vsock_loopback.c
> @@ -46,6 +46,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
> return 0;
> }
>
> +static bool vsock_loopback_dgram_allow(u32 cid, u32 port);
> static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
>
> static struct virtio_transport loopback_transport = {
> @@ -62,7 +63,7 @@ static struct virtio_transport loopback_transport = {
> .cancel_pkt = vsock_loopback_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> - .dgram_allow = virtio_transport_dgram_allow,
> + .dgram_allow = vsock_loopback_dgram_allow,
> .dgram_get_cid = virtio_transport_dgram_get_cid,
> .dgram_get_port = virtio_transport_dgram_get_port,
> .dgram_get_length = virtio_transport_dgram_get_length,
> @@ -98,6 +99,11 @@ static struct virtio_transport loopback_transport = {
> .send_pkt = vsock_loopback_send_pkt,
> };
>
> +static bool vsock_loopback_dgram_allow(u32 cid, u32 port)
> +{
> + return true;
> +}
> +
> static bool vsock_loopback_seqpacket_allow(u32 remote_cid)
> {
> return true;
>

2023-06-11 21:44:18

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 8/8] tests: add vsock dgram tests

Hello Bobby!

Sorry, maybe I'm becoming a little bit annoying :), but I tried to run vsock_test with
this v4 version and again got the same crash:

# cat client.sh
./vsock_test --mode=client --control-host=192.168.1.1 --control-port=12345 --peer-cid=2
# ./client.sh
Control socket connected to 192.168.1.1:12345.
0 - SOCK_STREAM connection reset...[ 20.065237] BUG: kernel NULL pointer dereference, addre0
[ 20.065895] #PF: supervisor read access in kernel mode
[ 20.065895] #PF: error_code(0x0000) - not-present page
[ 20.065895] PGD 0 P4D 0
[ 20.065895] Oops: 0000 [#1] PREEMPT SMP PTI
[ 20.065895] CPU: 0 PID: 111 Comm: vsock_test Not tainted 6.4.0-rc3-gefcccba07069 #385
[ 20.065895] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd44
[ 20.065895] RIP: 0010:static_key_count+0x0/0x20
[ 20.065895] Code: 04 4c 8b 46 08 49 29 c0 4c 01 c8 4c 89 47 08 89 0e 89 56 04 48 89 46 08 f
[ 20.065895] RSP: 0018:ffffbbb000223dc0 EFLAGS: 00010202
[ 20.065895] RAX: ffffffff85709880 RBX: ffffffffc0079140 RCX: 0000000000000000
[ 20.065895] RDX: ffff9f73c2175700 RSI: 0000000000000000 RDI: 0000000000000000
[ 20.065895] RBP: ffff9f73c2385900 R08: ffffbbb000223d30 R09: ffff9f73ff896000
[ 20.065895] R10: 0000000000001000 R11: 0000000000000000 R12: ffffbbb000223e80
[ 20.065895] R13: 0000000000000000 R14: 0000000000000002 R15: ffff9f73c1cfaa80
[ 20.065895] FS: 00007f1ad82f55c0(0000) GS:ffff9f73fe400000(0000) knlGS:0000000000000000
[ 20.065895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 20.065895] CR2: 0000000000000000 CR3: 000000003f954000 CR4: 00000000000006f0
[ 20.065895] Call Trace:
[ 20.065895] <TASK>
[ 20.065895] once_deferred+0xd/0x30
[ 20.065895] vsock_assign_transport+0x9a/0x1b0 [vsock]
[ 20.065895] vsock_connect+0xb4/0x3a0 [vsock]
[ 20.065895] ? var_wake_function+0x60/0x60
[ 20.065895] __sys_connect+0x9e/0xd0
[ 20.065895] ? _raw_spin_unlock_irq+0xe/0x30
[ 20.065895] ? do_setitimer+0x128/0x1f0
[ 20.065895] ? alarm_setitimer+0x4c/0x90
[ 20.065895] ? fpregs_assert_state_consistent+0x1d/0x50
[ 20.065895] ? exit_to_user_mode_prepare+0x36/0x130
[ 20.065895] __x64_sys_connect+0x11/0x20
[ 20.065895] do_syscall_64+0x3b/0xc0
[ 20.065895] entry_SYSCALL_64_after_hwframe+0x4b/0xb5
[ 20.065895] RIP: 0033:0x7f1ad822dd13
[ 20.065895] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 64 8
[ 20.065895] RSP: 002b:00007ffc513e3c98 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
[ 20.065895] RAX: ffffffffffffffda RBX: 000055aed298e020 RCX: 00007f1ad822dd13
[ 20.065895] RDX: 0000000000000010 RSI: 00007ffc513e3cb0 RDI: 0000000000000004
[ 20.065895] RBP: 0000000000000004 R08: 000055aed32b2018 R09: 0000000000000000
[ 20.065895] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 20.065895] R13: 000055aed298acb1 R14: 00007ffc513e3cb0 R15: 00007ffc513e3d40
[ 20.065895] </TASK>
[ 20.065895] Modules linked in: vsock_loopback vhost_vsock vmw_vsock_virtio_transport vmw_vb
[ 20.065895] CR2: 0000000000000000
[ 20.154060] ---[ end trace 0000000000000000 ]---
[ 20.155519] RIP: 0010:static_key_count+0x0/0x20
[ 20.156932] Code: 04 4c 8b 46 08 49 29 c0 4c 01 c8 4c 89 47 08 89 0e 89 56 04 48 89 46 08 f
[ 20.161367] RSP: 0018:ffffbbb000223dc0 EFLAGS: 00010202
[ 20.162613] RAX: ffffffff85709880 RBX: ffffffffc0079140 RCX: 0000000000000000
[ 20.164262] RDX: ffff9f73c2175700 RSI: 0000000000000000 RDI: 0000000000000000
[ 20.165934] RBP: ffff9f73c2385900 R08: ffffbbb000223d30 R09: ffff9f73ff896000
[ 20.167684] R10: 0000000000001000 R11: 0000000000000000 R12: ffffbbb000223e80
[ 20.169427] R13: 0000000000000000 R14: 0000000000000002 R15: ffff9f73c1cfaa80
[ 20.171109] FS: 00007f1ad82f55c0(0000) GS:ffff9f73fe400000(0000) knlGS:0000000000000000
[ 20.173000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 20.174381] CR2: 0000000000000000 CR3: 000000003f954000 CR4: 00000000000006f0

So, what HEAD do you use? Maybe you have some specific config? (I use the x86-64
defconfig plus the vsock/vhost-related options.)

Thanks, Arseniy


On 10.06.2023 03:58, Bobby Eshleman wrote:
> From: Jiang Wang <[email protected]>
>
> This patch adds tests for vsock datagram.
>
> Signed-off-by: Bobby Eshleman <[email protected]>
> Signed-off-by: Jiang Wang <[email protected]>
> ---
> tools/testing/vsock/util.c | 141 ++++++++++++-
> tools/testing/vsock/util.h | 6 +
> tools/testing/vsock/vsock_test.c | 432 +++++++++++++++++++++++++++++++++++++++
> 3 files changed, 578 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
> index 01b636d3039a..811e70d7cf1e 100644
> --- a/tools/testing/vsock/util.c
> +++ b/tools/testing/vsock/util.c
> @@ -99,7 +99,8 @@ static int vsock_connect(unsigned int cid, unsigned int port, int type)
> int ret;
> int fd;
>
> - control_expectln("LISTENING");
> + if (type != SOCK_DGRAM)
> + control_expectln("LISTENING");
>
> fd = socket(AF_VSOCK, type, 0);
>
> @@ -130,6 +131,11 @@ int vsock_seqpacket_connect(unsigned int cid, unsigned int port)
> return vsock_connect(cid, port, SOCK_SEQPACKET);
> }
>
> +int vsock_dgram_connect(unsigned int cid, unsigned int port)
> +{
> + return vsock_connect(cid, port, SOCK_DGRAM);
> +}
> +
> /* Listen on <cid, port> and return the first incoming connection. The remote
> * address is stored to clientaddrp. clientaddrp may be NULL.
> */
> @@ -211,6 +217,34 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
> return vsock_accept(cid, port, clientaddrp, SOCK_SEQPACKET);
> }
>
> +int vsock_dgram_bind(unsigned int cid, unsigned int port)
> +{
> + union {
> + struct sockaddr sa;
> + struct sockaddr_vm svm;
> + } addr = {
> + .svm = {
> + .svm_family = AF_VSOCK,
> + .svm_port = port,
> + .svm_cid = cid,
> + },
> + };
> + int fd;
> +
> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> + if (fd < 0) {
> + perror("socket");
> + exit(EXIT_FAILURE);
> + }
> +
> + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> + perror("bind");
> + exit(EXIT_FAILURE);
> + }
> +
> + return fd;
> +}
> +
> /* Transmit one byte and check the return value.
> *
> * expected_ret:
> @@ -260,6 +294,57 @@ void send_byte(int fd, int expected_ret, int flags)
> }
> }
>
> +/* Transmit one byte and check the return value.
> + *
> + * expected_ret:
> + * <0 Negative errno (for testing errors)
> + * 0 End-of-file
> + * 1 Success
> + */
> +void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
> + int flags)
> +{
> + const uint8_t byte = 'A';
> + ssize_t nwritten;
> +
> + timeout_begin(TIMEOUT);
> + do {
> + nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr,
> + len);
> + timeout_check("write");
> + } while (nwritten < 0 && errno == EINTR);
> + timeout_end();
> +
> + if (expected_ret < 0) {
> + if (nwritten != -1) {
> + fprintf(stderr, "bogus sendto(2) return value %zd\n",
> + nwritten);
> + exit(EXIT_FAILURE);
> + }
> + if (errno != -expected_ret) {
> + perror("write");
> + exit(EXIT_FAILURE);
> + }
> + return;
> + }
> +
> + if (nwritten < 0) {
> + perror("write");
> + exit(EXIT_FAILURE);
> + }
> + if (nwritten == 0) {
> + if (expected_ret == 0)
> + return;
> +
> + fprintf(stderr, "unexpected EOF while sending byte\n");
> + exit(EXIT_FAILURE);
> + }
> + if (nwritten != sizeof(byte)) {
> + fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten);
> + exit(EXIT_FAILURE);
> + }
> +}
> +
> /* Receive one byte and check the return value.
> *
> * expected_ret:
> @@ -313,6 +398,60 @@ void recv_byte(int fd, int expected_ret, int flags)
> }
> }
>
> +/* Receive one byte and check the return value.
> + *
> + * expected_ret:
> + * <0 Negative errno (for testing errors)
> + * 0 End-of-file
> + * 1 Success
> + */
> +void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
> + int expected_ret, int flags)
> +{
> + uint8_t byte;
> + ssize_t nread;
> +
> + timeout_begin(TIMEOUT);
> + do {
> + nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen);
> + timeout_check("read");
> + } while (nread < 0 && errno == EINTR);
> + timeout_end();
> +
> + if (expected_ret < 0) {
> + if (nread != -1) {
> + fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
> + nread);
> + exit(EXIT_FAILURE);
> + }
> + if (errno != -expected_ret) {
> + perror("read");
> + exit(EXIT_FAILURE);
> + }
> + return;
> + }
> +
> + if (nread < 0) {
> + perror("read");
> + exit(EXIT_FAILURE);
> + }
> + if (nread == 0) {
> + if (expected_ret == 0)
> + return;
> +
> + fprintf(stderr, "unexpected EOF while receiving byte\n");
> + exit(EXIT_FAILURE);
> + }
> + if (nread != sizeof(byte)) {
> + fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread);
> + exit(EXIT_FAILURE);
> + }
> + if (byte != 'A') {
> + fprintf(stderr, "unexpected byte read %c\n", byte);
> + exit(EXIT_FAILURE);
> + }
> +}
> +
> /* Run test cases. The program terminates if a failure occurs. */
> void run_tests(const struct test_case *test_cases,
> const struct test_opts *opts)
> diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
> index fb99208a95ea..a69e128d120c 100644
> --- a/tools/testing/vsock/util.h
> +++ b/tools/testing/vsock/util.h
> @@ -37,13 +37,19 @@ void init_signals(void);
> unsigned int parse_cid(const char *str);
> int vsock_stream_connect(unsigned int cid, unsigned int port);
> int vsock_seqpacket_connect(unsigned int cid, unsigned int port);
> +int vsock_dgram_connect(unsigned int cid, unsigned int port);
> int vsock_stream_accept(unsigned int cid, unsigned int port,
> struct sockaddr_vm *clientaddrp);
> int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
> struct sockaddr_vm *clientaddrp);
> +int vsock_dgram_bind(unsigned int cid, unsigned int port);
> void vsock_wait_remote_close(int fd);
> void send_byte(int fd, int expected_ret, int flags);
> +void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
> + int flags);
> void recv_byte(int fd, int expected_ret, int flags);
> +void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
> + int expected_ret, int flags);
> void run_tests(const struct test_case *test_cases,
> const struct test_opts *opts);
> void list_tests(const struct test_case *test_cases);
> diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
> index ac1bd3ac1533..ded82d39ee5d 100644
> --- a/tools/testing/vsock/vsock_test.c
> +++ b/tools/testing/vsock/vsock_test.c
> @@ -1053,6 +1053,413 @@ static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
> close(fd);
> }
>
> +static void test_dgram_sendto_client(const struct test_opts *opts)
> +{
> + union {
> + struct sockaddr sa;
> + struct sockaddr_vm svm;
> + } addr = {
> + .svm = {
> + .svm_family = AF_VSOCK,
> + .svm_port = 1234,
> + .svm_cid = opts->peer_cid,
> + },
> + };
> + int fd;
> +
> + /* Wait for the server to be ready */
> + control_expectln("BIND");
> +
> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> + if (fd < 0) {
> + perror("socket");
> + exit(EXIT_FAILURE);
> + }
> +
> + sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
> +
> + /* Notify the server that the client has finished */
> + control_writeln("DONE");
> +
> + close(fd);
> +}
> +
> +static void test_dgram_sendto_server(const struct test_opts *opts)
> +{
> + union {
> + struct sockaddr sa;
> + struct sockaddr_vm svm;
> + } addr = {
> + .svm = {
> + .svm_family = AF_VSOCK,
> + .svm_port = 1234,
> + .svm_cid = VMADDR_CID_ANY,
> + },
> + };
> + int len = sizeof(addr.sa);
> + int fd;
> +
> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> + if (fd < 0) {
> + perror("socket");
> + exit(EXIT_FAILURE);
> + }
> +
> + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> + perror("bind");
> + exit(EXIT_FAILURE);
> + }
> +
> + /* Notify the client that the server is ready */
> + control_writeln("BIND");
> +
> + recvfrom_byte(fd, &addr.sa, &len, 1, 0);
> +
> + /* Wait for the client to finish */
> + control_expectln("DONE");
> +
> + close(fd);
> +}
> +
> +static void test_dgram_connect_client(const struct test_opts *opts)
> +{
> + union {
> + struct sockaddr sa;
> + struct sockaddr_vm svm;
> + } addr = {
> + .svm = {
> + .svm_family = AF_VSOCK,
> + .svm_port = 1234,
> + .svm_cid = opts->peer_cid,
> + },
> + };
> + int ret;
> + int fd;
> +
> + /* Wait for the server to be ready */
> + control_expectln("BIND");
> +
> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> + if (fd < 0) {
> + perror("socket");
> + exit(EXIT_FAILURE);
> + }
> +
> + ret = connect(fd, &addr.sa, sizeof(addr.svm));
> + if (ret < 0) {
> + perror("connect");
> + exit(EXIT_FAILURE);
> + }
> +
> + send_byte(fd, 1, 0);
> +
> + /* Notify the server that the client has finished */
> + control_writeln("DONE");
> +
> + close(fd);
> +}
> +
> +static void test_dgram_connect_server(const struct test_opts *opts)
> +{
> + test_dgram_sendto_server(opts);
> +}
> +
> +static void test_dgram_multiconn_sendto_client(const struct test_opts *opts)
> +{
> + union {
> + struct sockaddr sa;
> + struct sockaddr_vm svm;
> + } addr = {
> + .svm = {
> + .svm_family = AF_VSOCK,
> + .svm_port = 1234,
> + .svm_cid = opts->peer_cid,
> + },
> + };
> + int fds[MULTICONN_NFDS];
> + int i;
> +
> + /* Wait for the server to be ready */
> + control_expectln("BIND");
> +
> + for (i = 0; i < MULTICONN_NFDS; i++) {
> + fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
> + if (fds[i] < 0) {
> + perror("socket");
> + exit(EXIT_FAILURE);
> + }
> + }
> +
> + for (i = 0; i < MULTICONN_NFDS; i++)
> + sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
> +
> + /* Notify the server that the client has finished */
> + control_writeln("DONE");
> +
> + for (i = 0; i < MULTICONN_NFDS; i++)
> + close(fds[i]);
> +}
> +
> +static void test_dgram_multiconn_sendto_server(const struct test_opts *opts)
> +{
> + union {
> + struct sockaddr sa;
> + struct sockaddr_vm svm;
> + } addr = {
> + .svm = {
> + .svm_family = AF_VSOCK,
> + .svm_port = 1234,
> + .svm_cid = VMADDR_CID_ANY,
> + },
> + };
> + int len = sizeof(addr.sa);
> + int fd;
> + int i;
> +
> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> + if (fd < 0) {
> + perror("socket");
> + exit(EXIT_FAILURE);
> + }
> +
> + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> + perror("bind");
> + exit(EXIT_FAILURE);
> + }
> +
> + /* Notify the client that the server is ready */
> + control_writeln("BIND");
> +
> + for (i = 0; i < MULTICONN_NFDS; i++)
> + recvfrom_byte(fd, &addr.sa, &len, 1, 0);
> +
> + /* Wait for the client to finish */
> + control_expectln("DONE");
> +
> + close(fd);
> +}
> +
> +static void test_dgram_multiconn_send_client(const struct test_opts *opts)
> +{
> + int fds[MULTICONN_NFDS];
> + int i;
> +
> + /* Wait for the server to be ready */
> + control_expectln("BIND");
> +
> + for (i = 0; i < MULTICONN_NFDS; i++) {
> + fds[i] = vsock_dgram_connect(opts->peer_cid, 1234);
> + if (fds[i] < 0) {
> + perror("socket");
> + exit(EXIT_FAILURE);
> + }
> + }
> +
> + for (i = 0; i < MULTICONN_NFDS; i++)
> + send_byte(fds[i], 1, 0);
> +
> + /* Notify the server that the client has finished */
> + control_writeln("DONE");
> +
> + for (i = 0; i < MULTICONN_NFDS; i++)
> + close(fds[i]);
> +}
> +
> +static void test_dgram_multiconn_send_server(const struct test_opts *opts)
> +{
> + union {
> + struct sockaddr sa;
> + struct sockaddr_vm svm;
> + } addr = {
> + .svm = {
> + .svm_family = AF_VSOCK,
> + .svm_port = 1234,
> + .svm_cid = VMADDR_CID_ANY,
> + },
> + };
> + int fd;
> + int i;
> +
> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> + if (fd < 0) {
> + perror("socket");
> + exit(EXIT_FAILURE);
> + }
> +
> + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> + perror("bind");
> + exit(EXIT_FAILURE);
> + }
> +
> + /* Notify the client that the server is ready */
> + control_writeln("BIND");
> +
> + for (i = 0; i < MULTICONN_NFDS; i++)
> + recv_byte(fd, 1, 0);
> +
> + /* Wait for the client to finish */
> + control_expectln("DONE");
> +
> + close(fd);
> +}
> +
> +static void test_dgram_msg_bounds_client(const struct test_opts *opts)
> +{
> + unsigned long recv_buf_size;
> + int page_size;
> + int msg_cnt;
> + int fd;
> +
> + fd = vsock_dgram_connect(opts->peer_cid, 1234);
> + if (fd < 0) {
> + perror("connect");
> + exit(EXIT_FAILURE);
> + }
> +
> + /* Let the server know the client is ready */
> + control_writeln("CLNTREADY");
> +
> + msg_cnt = control_readulong();
> + recv_buf_size = control_readulong();
> +
> + /* Wait until the receiver sets the buffer size. */
> + control_expectln("SRVREADY");
> +
> + page_size = getpagesize();
> +
> + for (int i = 0; i < msg_cnt; i++) {
> + unsigned long curr_hash;
> + ssize_t send_size;
> + size_t buf_size;
> + void *buf;
> +
> + /* Use "small" buffers and "big" buffers. */
> + if (i & 1)
> + buf_size = page_size +
> + (rand() % (MAX_MSG_SIZE - page_size));
> + else
> + buf_size = 1 + (rand() % page_size);
> +
> + buf_size = min(buf_size, recv_buf_size);
> +
> + buf = malloc(buf_size);
> +
> + if (!buf) {
> + perror("malloc");
> + exit(EXIT_FAILURE);
> + }
> +
> + /* Fill the buffer with a single random byte value. */
> + memset(buf, rand() & 0xff, buf_size);
> +
> + send_size = send(fd, buf, buf_size, 0);
> +
> + if (send_size < 0) {
> + perror("send");
> + exit(EXIT_FAILURE);
> + }
> +
> + if (send_size != buf_size) {
> + fprintf(stderr, "Invalid send size\n");
> + exit(EXIT_FAILURE);
> + }
> +
> + /* In theory the implementation isn't required to transmit
> + * these packets in order, so we use this SYNC control message
> + * so that server and client coordinate sending and receiving
> + * one packet at a time. The client sends a packet and waits
> + * until it has been received before sending another.
> + */
> + control_writeln("PKTSENT");
> + control_expectln("PKTRECV");
> +
> + /* Send the server a hash of the packet */
> + curr_hash = hash_djb2(buf, buf_size);
> + control_writeulong(curr_hash);
> + free(buf);
> + }
> +
> + control_writeln("SENDDONE");
> + close(fd);
> +}
> +
> +static void test_dgram_msg_bounds_server(const struct test_opts *opts)
> +{
> + const unsigned long msg_cnt = 16;
> + unsigned long sock_buf_size;
> + struct msghdr msg = {0};
> + struct iovec iov = {0};
> + char buf[MAX_MSG_SIZE];
> + socklen_t len;
> + int fd;
> + int i;
> +
> + fd = vsock_dgram_bind(VMADDR_CID_ANY, 1234);
> +
> + if (fd < 0) {
> + perror("bind");
> + exit(EXIT_FAILURE);
> + }
> +
> + /* Set receive buffer to maximum */
> + sock_buf_size = -1;
> + if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
> + &sock_buf_size, sizeof(sock_buf_size))) {
> + perror("setsockopt(SO_RCVBUF)");
> + exit(EXIT_FAILURE);
> + }
> +
> + /* Retrieve the receive buffer size */
> + len = sizeof(sock_buf_size);
> + if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF,
> + &sock_buf_size, &len)) {
> + perror("getsockopt(SO_RCVBUF)");
> + exit(EXIT_FAILURE);
> + }
> +
> + /* Client ready to receive parameters */
> + control_expectln("CLNTREADY");
> +
> + control_writeulong(msg_cnt);
> + control_writeulong(sock_buf_size);
> +
> + /* Ready to receive data. */
> + control_writeln("SRVREADY");
> +
> + iov.iov_base = buf;
> + iov.iov_len = sizeof(buf);
> + msg.msg_iov = &iov;
> + msg.msg_iovlen = 1;
> +
> + for (i = 0; i < msg_cnt; i++) {
> + unsigned long remote_hash;
> + unsigned long curr_hash;
> + ssize_t recv_size;
> +
> + control_expectln("PKTSENT");
> + recv_size = recvmsg(fd, &msg, 0);
> + control_writeln("PKTRECV");
> +
> + if (!recv_size)
> + break;
> +
> + if (recv_size < 0) {
> + perror("recvmsg");
> + exit(EXIT_FAILURE);
> + }
> +
> + curr_hash = hash_djb2(msg.msg_iov[0].iov_base, recv_size);
> + remote_hash = control_readulong();
> +
> + if (curr_hash != remote_hash) {
> + fprintf(stderr, "Message bounds broken\n");
> + exit(EXIT_FAILURE);
> + }
> + }
> +
> + close(fd);
> +}
> +
> static struct test_case test_cases[] = {
> {
> .name = "SOCK_STREAM connection reset",
> @@ -1128,6 +1535,31 @@ static struct test_case test_cases[] = {
> .run_client = test_stream_virtio_skb_merge_client,
> .run_server = test_stream_virtio_skb_merge_server,
> },
> + {
> + .name = "SOCK_DGRAM client sendto",
> + .run_client = test_dgram_sendto_client,
> + .run_server = test_dgram_sendto_server,
> + },
> + {
> + .name = "SOCK_DGRAM client connect",
> + .run_client = test_dgram_connect_client,
> + .run_server = test_dgram_connect_server,
> + },
> + {
> + .name = "SOCK_DGRAM multiple connections using sendto",
> + .run_client = test_dgram_multiconn_sendto_client,
> + .run_server = test_dgram_multiconn_sendto_server,
> + },
> + {
> + .name = "SOCK_DGRAM multiple connections using send",
> + .run_client = test_dgram_multiconn_send_client,
> + .run_server = test_dgram_multiconn_send_server,
> + },
> + {
> + .name = "SOCK_DGRAM msg bounds",
> + .run_client = test_dgram_msg_bounds_client,
> + .run_server = test_dgram_msg_bounds_server,
> + },
> {},
> };
>
>

2023-06-12 10:18:37

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 4/8] vsock: make vsock bind reusable

On Sat, Jun 10, 2023 at 12:58:31AM +0000, Bobby Eshleman wrote:
> This commit makes the bind table management functions in vsock usable
> for different bind tables. For use by datagrams in a future patch.
>
> Signed-off-by: Bobby Eshleman <[email protected]>
> ---
> net/vmw_vsock/af_vsock.c | 33 ++++++++++++++++++++++++++-------
> 1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index ef86765f3765..7a3ca4270446 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -230,11 +230,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> sock_put(&vsk->sk);
> }
>
> -static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> +struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
> + struct list_head *bind_table)

Hi Bobby,

This function seems to only be used in this file.
Should it be static?

2023-06-22 15:13:11

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 2/8] vsock: refactor transport lookup code

On Sat, Jun 10, 2023 at 12:58:29AM +0000, Bobby Eshleman wrote:
>Introduce new reusable function vsock_connectible_lookup_transport()
>that performs the transport lookup logic.
>
>No functional change intended.
>
>Signed-off-by: Bobby Eshleman <[email protected]>
>---
> net/vmw_vsock/af_vsock.c | 25 ++++++++++++++++++-------
> 1 file changed, 18 insertions(+), 7 deletions(-)

Reviewed-by: Stefano Garzarella <[email protected]>

>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index ffb4dd8b6ea7..74358f0b47fa 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -422,6 +422,22 @@ static void vsock_deassign_transport(struct vsock_sock *vsk)
> vsk->transport = NULL;
> }
>
>+static const struct vsock_transport *
>+vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
>+{
>+ const struct vsock_transport *transport;
>+
>+ if (vsock_use_local_transport(cid))
>+ transport = transport_local;
>+ else if (cid <= VMADDR_CID_HOST || !transport_h2g ||
>+ (flags & VMADDR_FLAG_TO_HOST))
>+ transport = transport_g2h;
>+ else
>+ transport = transport_h2g;
>+
>+ return transport;
>+}
>+
> /* Assign a transport to a socket and call the .init transport callback.
> *
> * Note: for connection oriented socket this must be called when vsk->remote_addr
>@@ -462,13 +478,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> break;
> case SOCK_STREAM:
> case SOCK_SEQPACKET:
>- if (vsock_use_local_transport(remote_cid))
>- new_transport = transport_local;
>- else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
>- (remote_flags & VMADDR_FLAG_TO_HOST))
>- new_transport = transport_g2h;
>- else
>- new_transport = transport_h2g;
>+ new_transport = vsock_connectible_lookup_transport(remote_cid,
>+ remote_flags);
> break;
> default:
> return -ESOCKTNOSUPPORT;
>
>--
>2.30.2
>


2023-06-22 15:17:10

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue

On Sun, Jun 11, 2023 at 11:43:15PM +0300, Arseniy Krasnov wrote:
>Hello Bobby! Thanks for this patchset! Small comment below:
>
>On 10.06.2023 03:58, Bobby Eshleman wrote:
>> This commit drops the transport->dgram_dequeue callback and makes
>> vsock_dgram_recvmsg() generic. It also adds additional transport
>> callbacks for use by the generic vsock_dgram_recvmsg(), such as for
>> parsing skbs for CID/port which vary in format per transport.
>>
>> Signed-off-by: Bobby Eshleman <[email protected]>
>> ---
>> drivers/vhost/vsock.c | 4 +-
>> include/linux/virtio_vsock.h | 3 ++
>> include/net/af_vsock.h | 13 ++++++-
>> net/vmw_vsock/af_vsock.c | 51 ++++++++++++++++++++++++-
>> net/vmw_vsock/hyperv_transport.c | 17 +++++++--
>> net/vmw_vsock/virtio_transport.c | 4 +-
>> net/vmw_vsock/virtio_transport_common.c | 18 +++++++++
>> net/vmw_vsock/vmci_transport.c | 68 +++++++++++++--------------------
>> net/vmw_vsock/vsock_loopback.c | 4 +-
>> 9 files changed, 132 insertions(+), 50 deletions(-)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index 6578db78f0ae..c8201c070b4b 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -410,9 +410,11 @@ static struct virtio_transport vhost_transport = {
>> .cancel_pkt = vhost_transport_cancel_pkt,
>>
>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>> - .dgram_dequeue = virtio_transport_dgram_dequeue,
>> .dgram_bind = virtio_transport_dgram_bind,
>> .dgram_allow = virtio_transport_dgram_allow,
>> + .dgram_get_cid = virtio_transport_dgram_get_cid,
>> + .dgram_get_port = virtio_transport_dgram_get_port,
>> + .dgram_get_length = virtio_transport_dgram_get_length,
>>
>> .stream_enqueue = virtio_transport_stream_enqueue,
>> .stream_dequeue = virtio_transport_stream_dequeue,
>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> index c58453699ee9..23521a318cf0 100644
>> --- a/include/linux/virtio_vsock.h
>> +++ b/include/linux/virtio_vsock.h
>> @@ -219,6 +219,9 @@ bool virtio_transport_stream_allow(u32 cid, u32 port);
>> int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>> struct sockaddr_vm *addr);
>> bool virtio_transport_dgram_allow(u32 cid, u32 port);
>> +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
>> +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
>> +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
>>
>> int virtio_transport_connect(struct vsock_sock *vsk);
>>
>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> index 0e7504a42925..7bedb9ee7e3e 100644
>> --- a/include/net/af_vsock.h
>> +++ b/include/net/af_vsock.h
>> @@ -120,11 +120,20 @@ struct vsock_transport {
>>
>> /* DGRAM. */
>> int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
>> - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
>> - size_t len, int flags);
>> int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
>> struct msghdr *, size_t len);
>> bool (*dgram_allow)(u32 cid, u32 port);
>> + int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid);
>> + int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port);
>> + int (*dgram_get_length)(struct sk_buff *skb, size_t *length);
>> +
>> + /* The number of bytes into the buffer at which the payload starts, as
>> + * first seen by the receiving socket layer. For example, if the
>> + * transport presets the skb pointers using skb_pull(sizeof(header))
>> + * then this would be zero, otherwise it would be the size of the
>> + * header.
>> + */
>> + const size_t dgram_payload_offset;
>>
>> /* STREAM. */
>> /* TODO: stream_bind() */
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index efb8a0937a13..ffb4dd8b6ea7 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -1271,11 +1271,15 @@ static int vsock_dgram_connect(struct socket *sock,
>> int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>> size_t len, int flags)
>> {
>> + const struct vsock_transport *transport;
>> #ifdef CONFIG_BPF_SYSCALL
>> const struct proto *prot;
>> #endif
>> struct vsock_sock *vsk;
>> + struct sk_buff *skb;
>> + size_t payload_len;
>> struct sock *sk;
>> + int err;
>>
>> sk = sock->sk;
>> vsk = vsock_sk(sk);
>> @@ -1286,7 +1290,52 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>> return prot->recvmsg(sk, msg, len, flags, NULL);
>> #endif
>>
>> - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
>> + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
>> + return -EOPNOTSUPP;
>> +
>> + transport = vsk->transport;
>> +
>> + /* Retrieve the head sk_buff from the socket's receive queue. */
>> + err = 0;
>> + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
>> + if (!skb)
>> + return err;
>> +
>> + err = transport->dgram_get_length(skb, &payload_len);

What about ssize_t return value here?

Or maybe a single callback that return both length and offset?

.dgram_get_payload_info(skb, &payload_len, &payload_off)

>> + if (err)
>> + goto out;
>> +
>> + if (payload_len > len) {
>> + payload_len = len;
>> + msg->msg_flags |= MSG_TRUNC;
>> + }
>> +
>> + /* Place the datagram payload in the user's iovec. */
>> + err = skb_copy_datagram_msg(skb, transport->dgram_payload_offset, msg, payload_len);
>> + if (err)
>> + goto out;
>> +
>> + if (msg->msg_name) {
>> + /* Provide the address of the sender. */
>> + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>> + unsigned int cid, port;
>> +
>> + err = transport->dgram_get_cid(skb, &cid);
>> + if (err)
>> + goto out;
>> +
>> + err = transport->dgram_get_port(skb, &port);
>> + if (err)
>> + goto out;
>
>Maybe we can merge 'dgram_get_cid' and 'dgram_get_port' to a single callback? Because I see that this is
>the only place where both are used (correct me if i'm wrong) and logically both operates with addresses:
>CID and port. E.g. something like that: dgram_get_cid_n_port().

What about .dgram_addr_init(struct sk_buff *skb, struct sockaddr_vm *addr)
and the transport can set cid and port?

>
>Moreover, I'm not sure this is a good tradeoff: we remove the transport-specific callback for
>dgram receive, where we already have 'msghdr' with both the data buffer and the buffer for
>'sockaddr_vm', and instead add several new fields (callbacks) to the transports, like
>dgram_get_cid() and dgram_get_port(). I agree that each transport-specific callback would have
>the same copying logic, calling 'skb_copy_datagram_msg()' and filling the address with
>'vsock_addr_init()', but in that case we don't need to update the transports as much. For
>example, HyperV stays unchanged as it does not support SOCK_DGRAM. For VMCI you just need to
>add the 'vsock_addr_init()' logic to its dgram dequeue callback.
>
>What do you think?

Honestly, I'd rather avoid duplicate code than reduce changes in
transports that don't support dgram.

One thing I do agree on, though, is minimizing the number of callbacks
to call, reducing the number of indirections (better performance?).

Thanks,
Stefano

>
>Thanks, Arseniy
>
>> +
>> + vsock_addr_init(vm_addr, cid, port);
>> + msg->msg_namelen = sizeof(*vm_addr);
>> + }
>> + err = payload_len;
>> +
>> +out:
>> + skb_free_datagram(&vsk->sk, skb);
>> + return err;
>> }
>> EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
>>
>> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
>> index 7cb1a9d2cdb4..ff6e87e25fa0 100644
>> --- a/net/vmw_vsock/hyperv_transport.c
>> +++ b/net/vmw_vsock/hyperv_transport.c
>> @@ -556,8 +556,17 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
>> return -EOPNOTSUPP;
>> }
>>
>> -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
>> - size_t len, int flags)
>> +static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static int hvs_dgram_get_port(struct sk_buff *skb, unsigned int *port)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static int hvs_dgram_get_length(struct sk_buff *skb, size_t *len)
>> {
>> return -EOPNOTSUPP;
>> }
>> @@ -833,7 +842,9 @@ static struct vsock_transport hvs_transport = {
>> .shutdown = hvs_shutdown,
>>
>> .dgram_bind = hvs_dgram_bind,
>> - .dgram_dequeue = hvs_dgram_dequeue,
>> + .dgram_get_cid = hvs_dgram_get_cid,
>> + .dgram_get_port = hvs_dgram_get_port,
>> + .dgram_get_length = hvs_dgram_get_length,
>> .dgram_enqueue = hvs_dgram_enqueue,
>> .dgram_allow = hvs_dgram_allow,
>>
>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>> index e95df847176b..5763cdf13804 100644
>> --- a/net/vmw_vsock/virtio_transport.c
>> +++ b/net/vmw_vsock/virtio_transport.c
>> @@ -429,9 +429,11 @@ static struct virtio_transport virtio_transport = {
>> .cancel_pkt = virtio_transport_cancel_pkt,
>>
>> .dgram_bind = virtio_transport_dgram_bind,
>> - .dgram_dequeue = virtio_transport_dgram_dequeue,
>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>> .dgram_allow = virtio_transport_dgram_allow,
>> + .dgram_get_cid = virtio_transport_dgram_get_cid,
>> + .dgram_get_port = virtio_transport_dgram_get_port,
>> + .dgram_get_length = virtio_transport_dgram_get_length,
>>
>> .stream_dequeue = virtio_transport_stream_dequeue,
>> .stream_enqueue = virtio_transport_stream_enqueue,
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index b769fc258931..e6903c719964 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -797,6 +797,24 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>> }
>> EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>>
>> +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid);
>> +
>> +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port);
>> +
>> +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length);
>> +
>> bool virtio_transport_dgram_allow(u32 cid, u32 port)
>> {
>> return false;
>> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
>> index b370070194fa..bbc63826bf48 100644
>> --- a/net/vmw_vsock/vmci_transport.c
>> +++ b/net/vmw_vsock/vmci_transport.c
>> @@ -1731,57 +1731,40 @@ static int vmci_transport_dgram_enqueue(
>> return err - sizeof(*dg);
>> }
>>
>> -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
>> - struct msghdr *msg, size_t len,
>> - int flags)
>> +static int vmci_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
>> {
>> - int err;
>> struct vmci_datagram *dg;
>> - size_t payload_len;
>> - struct sk_buff *skb;
>>
>> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
>> - return -EOPNOTSUPP;
>> + dg = (struct vmci_datagram *)skb->data;
>> + if (!dg)
>> + return -EINVAL;
>>
>> - /* Retrieve the head sk_buff from the socket's receive queue. */
>> - err = 0;
>> - skb = skb_recv_datagram(&vsk->sk, flags, &err);
>> - if (!skb)
>> - return err;
>> + *cid = dg->src.context;
>> + return 0;
>> +}
>> +
>> +static int vmci_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
>> +{
>> + struct vmci_datagram *dg;
>>
>> dg = (struct vmci_datagram *)skb->data;
>> if (!dg)
>> - /* err is 0, meaning we read zero bytes. */
>> - goto out;
>> -
>> - payload_len = dg->payload_size;
>> - /* Ensure the sk_buff matches the payload size claimed in the packet. */
>> - if (payload_len != skb->len - sizeof(*dg)) {
>> - err = -EINVAL;
>> - goto out;
>> - }
>> + return -EINVAL;
>>
>> - if (payload_len > len) {
>> - payload_len = len;
>> - msg->msg_flags |= MSG_TRUNC;
>> - }
>> + *port = dg->src.resource;
>> + return 0;
>> +}
>>
>> - /* Place the datagram payload in the user's iovec. */
>> - err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
>> - if (err)
>> - goto out;
>> +static int vmci_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
>> +{
>> + struct vmci_datagram *dg;
>>
>> - if (msg->msg_name) {
>> - /* Provide the address of the sender. */
>> - DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>> - vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
>> - msg->msg_namelen = sizeof(*vm_addr);
>> - }
>> - err = payload_len;
>> + dg = (struct vmci_datagram *)skb->data;
>> + if (!dg)
>> + return -EINVAL;
>>
>> -out:
>> - skb_free_datagram(&vsk->sk, skb);
>> - return err;
>> + *len = dg->payload_size;
>> + return 0;
>> }
>>
>> static bool vmci_transport_dgram_allow(u32 cid, u32 port)
>> @@ -2040,9 +2023,12 @@ static struct vsock_transport vmci_transport = {
>> .release = vmci_transport_release,
>> .connect = vmci_transport_connect,
>> .dgram_bind = vmci_transport_dgram_bind,
>> - .dgram_dequeue = vmci_transport_dgram_dequeue,
>> .dgram_enqueue = vmci_transport_dgram_enqueue,
>> .dgram_allow = vmci_transport_dgram_allow,
>> + .dgram_get_cid = vmci_transport_dgram_get_cid,
>> + .dgram_get_port = vmci_transport_dgram_get_port,
>> + .dgram_get_length = vmci_transport_dgram_get_length,
>> + .dgram_payload_offset = sizeof(struct vmci_datagram),
>> .stream_dequeue = vmci_transport_stream_dequeue,
>> .stream_enqueue = vmci_transport_stream_enqueue,
>> .stream_has_data = vmci_transport_stream_has_data,
>> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>> index 5c6360df1f31..2f3cabc79ee5 100644
>> --- a/net/vmw_vsock/vsock_loopback.c
>> +++ b/net/vmw_vsock/vsock_loopback.c
>> @@ -62,9 +62,11 @@ static struct virtio_transport loopback_transport = {
>> .cancel_pkt = vsock_loopback_cancel_pkt,
>>
>> .dgram_bind = virtio_transport_dgram_bind,
>> - .dgram_dequeue = virtio_transport_dgram_dequeue,
>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>> .dgram_allow = virtio_transport_dgram_allow,
>> + .dgram_get_cid = virtio_transport_dgram_get_cid,
>> + .dgram_get_port = virtio_transport_dgram_get_port,
>> + .dgram_get_length = virtio_transport_dgram_get_length,
>>
>> .stream_dequeue = virtio_transport_stream_dequeue,
>> .stream_enqueue = virtio_transport_stream_enqueue,
>>
>


2023-06-22 15:28:28

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 3/8] vsock: support multi-transport datagrams

On Sat, Jun 10, 2023 at 12:58:30AM +0000, Bobby Eshleman wrote:
>This patch adds support for multi-transport datagrams.
>
>This includes:
>- Per-packet lookup of transports when using sendto(sockaddr_vm)
>- Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> sockaddr_vm
>
>To preserve backwards compatibility with VMCI, some important changes
>were made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
>be used for dgrams iff there is not yet a g2h or h2g transport that has

s/iff/if

>been registered that can transmit the packet. If there is a g2h/h2g
>transport for that remote address, then that transport will be used and
>not "transport_dgram". This essentially makes "transport_dgram" a
>fallback transport for when h2g/g2h has not yet gone online, which
>appears to be the exact use case for VMCI.
>
>This design makes sense, because there is no reason that the
>transport_{g2h,h2g} cannot also service datagrams, which makes the role
>of transport_dgram difficult to understand outside of the VMCI context.
>
>The logic around "transport_dgram" had to be retained to prevent
>breaking VMCI:
>
>1) VMCI datagrams appear to function outside of the h2g/g2h
> paradigm. When the vmci transport becomes online, it registers itself
> with the DGRAM feature, but not H2G/G2H. Only later when the
> transport has more information about its environment does it register
> H2G or G2H. In the case that a datagram socket becomes active
> after DGRAM registration but before G2H/H2G registration, the
> "transport_dgram" transport needs to be used.

IIRC we did this, because at that time only VMCI supported DGRAM. Now
that there are more transports, maybe DGRAM can follow the h2g/g2h
paradigm.

>
>2) VMCI seems to require special message be sent by the transport when a
> datagram socket calls bind(). Under the h2g/g2h model, the transport
> is selected using the remote_addr which is set by connect(). At
> bind time there is no remote_addr because often no connect() has been
> called yet: the transport is null. Therefore, with a null transport
> there doesn't seem to be any good way for a datagram socket to tell the
> VMCI transport that it has just had bind() called upon it.

@Vishnu, @Bryan do you think we can avoid this in some way?

>
>Only transports with a special datagram fallback use-case such as VMCI
>need to register VSOCK_TRANSPORT_F_DGRAM.

Maybe we should rename it to VSOCK_TRANSPORT_F_DGRAM_FALLBACK or
something like that.

In any case, we definitely need to update the comment in
include/net/af_vsock.h on top of VSOCK_TRANSPORT_F_DGRAM mentioning
this.

>
>Signed-off-by: Bobby Eshleman <[email protected]>
>---
> drivers/vhost/vsock.c | 1 -
> include/linux/virtio_vsock.h | 2 -
> net/vmw_vsock/af_vsock.c | 78 +++++++++++++++++++++++++--------
> net/vmw_vsock/hyperv_transport.c | 6 ---
> net/vmw_vsock/virtio_transport.c | 1 -
> net/vmw_vsock/virtio_transport_common.c | 7 ---
> net/vmw_vsock/vsock_loopback.c | 1 -
> 7 files changed, 60 insertions(+), 36 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index c8201c070b4b..8f0082da5e70 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> .cancel_pkt = vhost_transport_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
>- .dgram_bind = virtio_transport_dgram_bind,
> .dgram_allow = virtio_transport_dgram_allow,
> .dgram_get_cid = virtio_transport_dgram_get_cid,
> .dgram_get_port = virtio_transport_dgram_get_port,
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 23521a318cf0..73afa09f4585 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -216,8 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> bool virtio_transport_stream_allow(u32 cid, u32 port);
>-int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>- struct sockaddr_vm *addr);
> bool virtio_transport_dgram_allow(u32 cid, u32 port);
> int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 74358f0b47fa..ef86765f3765 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -438,6 +438,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> return transport;
> }
>
>+static const struct vsock_transport *
>+vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
>+{
>+ const struct vsock_transport *transport;
>+
>+ transport = vsock_connectible_lookup_transport(cid, flags);
>+ if (transport)
>+ return transport;
>+
>+ return transport_dgram;
>+}
>+
> /* Assign a transport to a socket and call the .init transport callback.
> *
> * Note: for connection oriented socket this must be called when vsk->remote_addr
>@@ -474,7 +486,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>
> switch (sk->sk_type) {
> case SOCK_DGRAM:
>- new_transport = transport_dgram;
>+ new_transport = vsock_dgram_lookup_transport(remote_cid,
>+ remote_flags);
> break;
> case SOCK_STREAM:
> case SOCK_SEQPACKET:
>@@ -691,6 +704,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> static int __vsock_bind_dgram(struct vsock_sock *vsk,
> struct sockaddr_vm *addr)
> {
>+ if (!vsk->transport || !vsk->transport->dgram_bind)
>+ return -EINVAL;
>+
> return vsk->transport->dgram_bind(vsk, addr);
> }
>
>@@ -1172,19 +1188,24 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
>
> lock_sock(sk);
>
>- transport = vsk->transport;
>-
>- err = vsock_auto_bind(vsk);
>- if (err)
>- goto out;
>-
>-
> /* If the provided message contains an address, use that. Otherwise
> * fall back on the socket's remote handle (if it has been connected).
> */
> if (msg->msg_name &&
> vsock_addr_cast(msg->msg_name, msg->msg_namelen,
> &remote_addr) == 0) {
>+ transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
>+ remote_addr->svm_flags);
>+ if (!transport) {
>+ err = -EINVAL;
>+ goto out;
>+ }
>+
>+ if (!try_module_get(transport->module)) {
>+ err = -ENODEV;
>+ goto out;
>+ }
>+
> /* Ensure this address is of the right type and is a valid
> * destination.
> */
>@@ -1193,11 +1214,27 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> remote_addr->svm_cid = transport->get_local_cid();
>

From here ...

> if (!vsock_addr_bound(remote_addr)) {
>+ module_put(transport->module);
>+ err = -EINVAL;
>+ goto out;
>+ }
>+
>+ if (!transport->dgram_allow(remote_addr->svm_cid,
>+ remote_addr->svm_port)) {
>+ module_put(transport->module);
> err = -EINVAL;
> goto out;
> }
>+
>+ err = transport->dgram_enqueue(vsk, remote_addr, msg, len);

... to here, looks like duplicate code, can we get it out of the if
block?
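One possible shape for that, sketched against the quoted diff (untested; the module_owned flag is hypothetical, marking whether the sendto() branch took a module reference that must be dropped):

```c
	/* after the if/else above has picked transport and remote_addr */
	err = -EINVAL;
	if (!vsock_addr_bound(remote_addr))
		goto out_put;

	if (!transport->dgram_allow(remote_addr->svm_cid,
				    remote_addr->svm_port))
		goto out_put;

	err = transport->dgram_enqueue(vsk, remote_addr, msg, len);

out_put:
	if (module_owned)	/* only the sendto() path holds a ref */
		module_put(transport->module);
out:
	release_sock(sk);
	return err;
```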

>+ module_put(transport->module);
> } else if (sock->state == SS_CONNECTED) {
> remote_addr = &vsk->remote_addr;
>+ transport = vsk->transport;
>+
>+ err = vsock_auto_bind(vsk);
>+ if (err)
>+ goto out;
>
> if (remote_addr->svm_cid == VMADDR_CID_ANY)
> remote_addr->svm_cid = transport->get_local_cid();
>@@ -1205,23 +1242,23 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> /* XXX Should connect() or this function ensure remote_addr is
> * bound?
> */
>- if (!vsock_addr_bound(&vsk->remote_addr)) {
>+ if (!vsock_addr_bound(remote_addr)) {
> err = -EINVAL;
> goto out;
> }
>- } else {
>- err = -EINVAL;
>- goto out;
>- }
>
>- if (!transport->dgram_allow(remote_addr->svm_cid,
>- remote_addr->svm_port)) {
>+ if (!transport->dgram_allow(remote_addr->svm_cid,
>+ remote_addr->svm_port)) {
>+ err = -EINVAL;
>+ goto out;
>+ }
>+
>+ err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
>+ } else {
> err = -EINVAL;
> goto out;
> }
>
>- err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
>-
> out:
> release_sock(sk);
> return err;
>@@ -1255,13 +1292,18 @@ static int vsock_dgram_connect(struct socket *sock,
> if (err)
> goto out;
>
>+ memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
>+
>+ err = vsock_assign_transport(vsk, NULL);
>+ if (err)
>+ goto out;
>+
> if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> remote_addr->svm_port)) {
> err = -EINVAL;
> goto out;
> }
>
>- memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> sock->state = SS_CONNECTED;
>
> /* sock map disallows redirection of non-TCP sockets with sk_state !=
>diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
>index ff6e87e25fa0..c00bc5da769a 100644
>--- a/net/vmw_vsock/hyperv_transport.c
>+++ b/net/vmw_vsock/hyperv_transport.c
>@@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
> kfree(hvs);
> }
>
>-static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
>-{
>- return -EOPNOTSUPP;
>-}
>-
> static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> {
> return -EOPNOTSUPP;
>@@ -841,7 +836,6 @@ static struct vsock_transport hvs_transport = {
> .connect = hvs_connect,
> .shutdown = hvs_shutdown,
>
>- .dgram_bind = hvs_dgram_bind,
> .dgram_get_cid = hvs_dgram_get_cid,
> .dgram_get_port = hvs_dgram_get_port,
> .dgram_get_length = hvs_dgram_get_length,
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 5763cdf13804..1b7843a7779a 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = {
> .shutdown = virtio_transport_shutdown,
> .cancel_pkt = virtio_transport_cancel_pkt,
>
>- .dgram_bind = virtio_transport_dgram_bind,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
> .dgram_get_cid = virtio_transport_dgram_get_cid,
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index e6903c719964..d5a3c8efe84b 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -790,13 +790,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
>
>-int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>- struct sockaddr_vm *addr)
>-{
>- return -EOPNOTSUPP;
>-}
>-EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>-
> int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> {
> return -EOPNOTSUPP;
>diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>index 2f3cabc79ee5..e9de45a26fbd 100644
>--- a/net/vmw_vsock/vsock_loopback.c
>+++ b/net/vmw_vsock/vsock_loopback.c
>@@ -61,7 +61,6 @@ static struct virtio_transport loopback_transport = {
> .shutdown = virtio_transport_shutdown,
> .cancel_pkt = vsock_loopback_cancel_pkt,
>
>- .dgram_bind = virtio_transport_dgram_bind,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
> .dgram_get_cid = virtio_transport_dgram_get_cid,
>
>--
>2.30.2
>

The rest LGTM!

Stefano


2023-06-22 16:06:44

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 4/8] vsock: make vsock bind reusable

On Sat, Jun 10, 2023 at 12:58:31AM +0000, Bobby Eshleman wrote:
>This commit makes the bind table management functions in vsock usable
>with different bind tables, for use by datagrams in a future patch.
>
>Signed-off-by: Bobby Eshleman <[email protected]>
>---
> net/vmw_vsock/af_vsock.c | 33 ++++++++++++++++++++++++++-------
> 1 file changed, 26 insertions(+), 7 deletions(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index ef86765f3765..7a3ca4270446 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -230,11 +230,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> sock_put(&vsk->sk);
> }
>
>-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>+struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
>+ struct list_head *bind_table)
> {
> struct vsock_sock *vsk;
>
>- list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
>+ list_for_each_entry(vsk, bind_table, bound_table) {
> if (vsock_addr_equals_addr(addr, &vsk->local_addr))
> return sk_vsock(vsk);
>
>@@ -247,6 +248,11 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> return NULL;
> }
>
>+static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>+{
>+ return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
>+}
>+
> static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> struct sockaddr_vm *dst)
> {
>@@ -646,12 +652,17 @@ static void vsock_pending_work(struct work_struct *work)
>
> /**** SOCKET OPERATIONS ****/
>
>-static int __vsock_bind_connectible(struct vsock_sock *vsk,
>- struct sockaddr_vm *addr)
>+static int vsock_bind_common(struct vsock_sock *vsk,
>+ struct sockaddr_vm *addr,
>+ struct list_head *bind_table,
>+ size_t table_size)
> {
> static u32 port;
> struct sockaddr_vm new_addr;
>
>+ if (table_size < VSOCK_HASH_SIZE)
>+ return -1;

Why we need this check now?

>+
> if (!port)
> port = get_random_u32_above(LAST_RESERVED_PORT);
>
>@@ -667,7 +678,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
>
> new_addr.svm_port = port++;
>
>- if (!__vsock_find_bound_socket(&new_addr)) {
>+ if (!vsock_find_bound_socket_common(&new_addr,
>+ &bind_table[VSOCK_HASH(addr)])) {
> found = true;
> break;
> }
>@@ -684,7 +696,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> return -EACCES;
> }
>
>- if (__vsock_find_bound_socket(&new_addr))
>+ if (vsock_find_bound_socket_common(&new_addr,
>+ &bind_table[VSOCK_HASH(addr)]))
> return -EADDRINUSE;
> }
>
>@@ -696,11 +709,17 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> * by AF_UNIX.
> */
> __vsock_remove_bound(vsk);
>- __vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk);
>+ __vsock_insert_bound(&bind_table[VSOCK_HASH(&vsk->local_addr)], vsk);
>
> return 0;
> }
>
>+static int __vsock_bind_connectible(struct vsock_sock *vsk,
>+ struct sockaddr_vm *addr)
>+{
>+ return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
>+}
>+
> static int __vsock_bind_dgram(struct vsock_sock *vsk,
> struct sockaddr_vm *addr)
> {
>
>--
>2.30.2
>

The rest seems okay to me, but I agree with Simon's suggestion.

Stefano


2023-06-22 16:15:48

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams

On Sun, Jun 11, 2023 at 11:49:02PM +0300, Arseniy Krasnov wrote:
>Hello Bobby!
>
>On 10.06.2023 03:58, Bobby Eshleman wrote:
>> This commit adds support for datagrams over virtio/vsock.
>>
>> Message boundaries are preserved on a per-skb and per-vq entry basis.
>
>I'm a little bit confused about the following case: suppose vhost sends a 4097-byte
>datagram to the guest. The guest uses 4096-byte RX buffers in its virtio queue, each
>buffer with an empty skb attached. Vhost places the first 4096 bytes into the first
>buffer of the guest's RX queue, and the last byte into the second buffer. Now IIUC the
>guest has two skbs in its rx queue, and when a user in the guest reads the data, does it
>read 4097 bytes, while the guest has two skbs of 4096 bytes and 1 byte? In seqpacket
>there is a special marker in the header which shows where a message ends; how does that work here?

I think the main difference is that DGRAM is not connection-oriented, so
we don't have a stream and we can't split the packet into 2 (maybe we
could, but we have no guarantee that, for example, the second one will
not be discarded because there is no space).

So I think it is acceptable as a restriction to keep it simple.

My only doubt is, should we make the RX buffer size configurable,
instead of always using 4k?

Thanks,
Stefano


2023-06-22 16:39:19

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams

On Sat, Jun 10, 2023 at 12:58:33AM +0000, Bobby Eshleman wrote:
>This commit adds support for datagrams over virtio/vsock.
>
>Message boundaries are preserved on a per-skb and per-vq entry basis.
>Messages are copied in whole from the user to an SKB, which in turn is
>added to the scatterlist for the virtqueue in whole for the device.
>Messages do not straddle skbs and they do not straddle packets.
>Messages may be truncated by the receiving user if their buffer is
>shorter than the message.
>
>Other properties of vsock datagrams:
>- Datagrams self-throttle at the per-socket sk_sndbuf threshold.
>- The same virtqueue is used as is used for streams and seqpacket flows
>- Credits are not used for datagrams
>- Packets are dropped silently by the device, which means the virtqueue
> will still get kicked even during high packet loss, so long as the
> socket does not exceed sk_sndbuf.
>
>Future work might include finding a way to reduce the virtqueue kick
>rate for datagram flows with high packet loss.
>
>Signed-off-by: Bobby Eshleman <[email protected]>
>---
> drivers/vhost/vsock.c | 27 ++++-
> include/linux/virtio_vsock.h | 5 +-
> include/net/af_vsock.h | 1 +
> include/uapi/linux/virtio_vsock.h | 1 +
> net/vmw_vsock/af_vsock.c | 58 +++++++--
> net/vmw_vsock/virtio_transport.c | 23 +++-
> net/vmw_vsock/virtio_transport_common.c | 207 ++++++++++++++++++++++++--------
> net/vmw_vsock/vsock_loopback.c | 8 +-
> 8 files changed, 264 insertions(+), 66 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 8f0082da5e70..159c1a22c1a8 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -32,7 +32,8 @@
> enum {
> VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
>- (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
>+ (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
>+ (1ULL << VIRTIO_VSOCK_F_DGRAM)
> };
>
> enum {
>@@ -56,6 +57,7 @@ struct vhost_vsock {
> atomic_t queued_replies;
>
> u32 guest_cid;
>+ bool dgram_allow;
> bool seqpacket_allow;
> };
>
>@@ -394,6 +396,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
> return val < vq->num;
> }
>
>+static bool vhost_transport_dgram_allow(u32 cid, u32 port);
> static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>
> static struct virtio_transport vhost_transport = {
>@@ -410,10 +413,11 @@ static struct virtio_transport vhost_transport = {
> .cancel_pkt = vhost_transport_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
>- .dgram_allow = virtio_transport_dgram_allow,
>+ .dgram_allow = vhost_transport_dgram_allow,
> .dgram_get_cid = virtio_transport_dgram_get_cid,
> .dgram_get_port = virtio_transport_dgram_get_port,
> .dgram_get_length = virtio_transport_dgram_get_length,
>+ .dgram_payload_offset = 0,
>
> .stream_enqueue = virtio_transport_stream_enqueue,
> .stream_dequeue = virtio_transport_stream_dequeue,
>@@ -446,6 +450,22 @@ static struct virtio_transport vhost_transport = {
> .send_pkt = vhost_transport_send_pkt,
> };
>
>+static bool vhost_transport_dgram_allow(u32 cid, u32 port)
>+{
>+ struct vhost_vsock *vsock;
>+ bool dgram_allow = false;
>+
>+ rcu_read_lock();
>+ vsock = vhost_vsock_get(cid);
>+
>+ if (vsock)
>+ dgram_allow = vsock->dgram_allow;
>+
>+ rcu_read_unlock();
>+
>+ return dgram_allow;
>+}
>+
> static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> {
> struct vhost_vsock *vsock;
>@@ -802,6 +822,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
> vsock->seqpacket_allow = true;
>
>+ if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
>+ vsock->dgram_allow = true;
>+
> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> vq = &vsock->vqs[i];
> mutex_lock(&vq->mutex);
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 73afa09f4585..237ca87a2ecd 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -216,7 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> bool virtio_transport_stream_allow(u32 cid, u32 port);
>-bool virtio_transport_dgram_allow(u32 cid, u32 port);
> int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
> int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
>@@ -247,4 +246,8 @@ void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit);
> void virtio_transport_deliver_tap_pkt(struct sk_buff *skb);
> int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *list);
> int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t read_actor);
>+void virtio_transport_init_dgram_bind_tables(void);
>+int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
>+int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
>+int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
> #endif /* _LINUX_VIRTIO_VSOCK_H */
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index 7bedb9ee7e3e..c115e655b4f5 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -225,6 +225,7 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
> void (*fn)(struct sock *sk));
> int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
> bool vsock_find_cid(unsigned int cid);
>+struct sock *vsock_find_bound_dgram_socket(struct sockaddr_vm *addr);
>
> /**** TAP ****/
>
>diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>index 9c25f267bbc0..27b4b2b8bf13 100644
>--- a/include/uapi/linux/virtio_vsock.h
>+++ b/include/uapi/linux/virtio_vsock.h
>@@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
> enum virtio_vsock_type {
> VIRTIO_VSOCK_TYPE_STREAM = 1,
> VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
>+ VIRTIO_VSOCK_TYPE_DGRAM = 3,
> };
>
> enum virtio_vsock_op {
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 7a3ca4270446..b0b18e7f4299 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c

I would split this patch in 2: one with the changes in af_vsock.c,
the other for the virtio changes.

>@@ -114,6 +114,7 @@
> static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> static void vsock_sk_destruct(struct sock *sk);
> static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
>+static bool sock_type_connectible(u16 type);
>
> /* Protocol family. */
> struct proto vsock_proto = {
>@@ -180,6 +181,8 @@ struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
> EXPORT_SYMBOL_GPL(vsock_connected_table);
> DEFINE_SPINLOCK(vsock_table_lock);
> EXPORT_SYMBOL_GPL(vsock_table_lock);
>+static struct list_head vsock_dgram_bind_table[VSOCK_HASH_SIZE];
>+static DEFINE_SPINLOCK(vsock_dgram_table_lock);
>
> /* Autobind this socket to the local address if necessary. */
> static int vsock_auto_bind(struct vsock_sock *vsk)
>@@ -202,6 +205,9 @@ static void vsock_init_tables(void)
>
> for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
> INIT_LIST_HEAD(&vsock_connected_table[i]);
>+
>+ for (i = 0; i < ARRAY_SIZE(vsock_dgram_bind_table); i++)
>+ INIT_LIST_HEAD(&vsock_dgram_bind_table[i]);
> }
>
> static void __vsock_insert_bound(struct list_head *list,
>@@ -230,8 +236,8 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> sock_put(&vsk->sk);
> }
>
>-struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
>- struct list_head *bind_table)
>+static struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
>+ struct list_head *bind_table)
> {
> struct vsock_sock *vsk;
>
>@@ -248,6 +254,23 @@ struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
> return NULL;
> }
>
>+struct sock *
>+vsock_find_bound_dgram_socket(struct sockaddr_vm *addr)
>+{
>+ struct sock *sk;
>+
>+ spin_lock_bh(&vsock_dgram_table_lock);
>+ sk = vsock_find_bound_socket_common(addr,
>+ &vsock_dgram_bind_table[VSOCK_HASH(addr)]);
>+ if (sk)
>+ sock_hold(sk);
>+
>+ spin_unlock_bh(&vsock_dgram_table_lock);
>+
>+ return sk;
>+}
>+EXPORT_SYMBOL_GPL(vsock_find_bound_dgram_socket);
>+
> static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> {
> return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
>@@ -287,6 +310,14 @@ void vsock_insert_connected(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(vsock_insert_connected);
>
>+static void vsock_remove_dgram_bound(struct vsock_sock *vsk)
>+{
>+ spin_lock_bh(&vsock_dgram_table_lock);
>+ if (__vsock_in_bound_table(vsk))
>+ __vsock_remove_bound(vsk);
>+ spin_unlock_bh(&vsock_dgram_table_lock);
>+}
>+
> void vsock_remove_bound(struct vsock_sock *vsk)
> {
> spin_lock_bh(&vsock_table_lock);
>@@ -338,7 +369,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
>
> void vsock_remove_sock(struct vsock_sock *vsk)
> {
>- vsock_remove_bound(vsk);
>+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
>+ vsock_remove_bound(vsk);
>+ else
>+ vsock_remove_dgram_bound(vsk);
> vsock_remove_connected(vsk);
> }
> EXPORT_SYMBOL_GPL(vsock_remove_sock);
>@@ -720,11 +754,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
> }
>
>-static int __vsock_bind_dgram(struct vsock_sock *vsk,
>- struct sockaddr_vm *addr)
>+static int vsock_bind_dgram(struct vsock_sock *vsk,
>+ struct sockaddr_vm *addr)
> {
>- if (!vsk->transport || !vsk->transport->dgram_bind)
>- return -EINVAL;
>+ if (!vsk->transport || !vsk->transport->dgram_bind) {
>+ int retval;
>+
>+ spin_lock_bh(&vsock_dgram_table_lock);
>+ retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
>+ VSOCK_HASH_SIZE);
>+ spin_unlock_bh(&vsock_dgram_table_lock);
>+
>+ return retval;
>+ }
>
> return vsk->transport->dgram_bind(vsk, addr);
> }
>@@ -755,7 +797,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
> break;
>
> case SOCK_DGRAM:
>- retval = __vsock_bind_dgram(vsk, addr);
>+ retval = vsock_bind_dgram(vsk, addr);
> break;
>
> default:
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 1b7843a7779a..7160a3104218 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -63,6 +63,7 @@ struct virtio_vsock {
>
> u32 guest_cid;
> bool seqpacket_allow;
>+ bool dgram_allow;
> };
>
> static u32 virtio_transport_get_local_cid(void)
>@@ -413,6 +414,7 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> }
>
>+static bool virtio_transport_dgram_allow(u32 cid, u32 port);
> static bool virtio_transport_seqpacket_allow(u32 remote_cid);
>
> static struct virtio_transport virtio_transport = {
>@@ -465,6 +467,21 @@ static struct virtio_transport virtio_transport = {
> .send_pkt = virtio_transport_send_pkt,
> };
>
>+static bool virtio_transport_dgram_allow(u32 cid, u32 port)
>+{
>+ struct virtio_vsock *vsock;
>+ bool dgram_allow;
>+
>+ dgram_allow = false;
>+ rcu_read_lock();
>+ vsock = rcu_dereference(the_virtio_vsock);
>+ if (vsock)
>+ dgram_allow = vsock->dgram_allow;
>+ rcu_read_unlock();
>+
>+ return dgram_allow;
>+}
>+
> static bool virtio_transport_seqpacket_allow(u32 remote_cid)
> {
> struct virtio_vsock *vsock;
>@@ -658,6 +675,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
> vsock->seqpacket_allow = true;
>
>+ if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
>+ vsock->dgram_allow = true;
>+
> vdev->priv = vsock;
>
> ret = virtio_vsock_vqs_init(vsock);
>@@ -750,7 +770,8 @@ static struct virtio_device_id id_table[] = {
> };
>
> static unsigned int features[] = {
>- VIRTIO_VSOCK_F_SEQPACKET
>+ VIRTIO_VSOCK_F_SEQPACKET,
>+ VIRTIO_VSOCK_F_DGRAM
> };
>
> static struct virtio_driver virtio_vsock_driver = {
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index d5a3c8efe84b..bc9d459723f5 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -37,6 +37,35 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
> return container_of(t, struct virtio_transport, transport);
> }
>
>+/* Requires info->msg and info->vsk */
>+static struct sk_buff *
>+virtio_transport_sock_alloc_send_skb(struct virtio_vsock_pkt_info *info, unsigned int size,
>+ gfp_t mask, int *err)
>+{
>+ struct sk_buff *skb;
>+ struct sock *sk;
>+ int noblock;
>+
>+ if (size < VIRTIO_VSOCK_SKB_HEADROOM) {
>+ *err = -EINVAL;
>+ return NULL;
>+ }
>+
>+ if (info->msg)
>+ noblock = info->msg->msg_flags & MSG_DONTWAIT;
>+ else
>+ noblock = 1;
>+
>+ sk = sk_vsock(info->vsk);
>+ sk->sk_allocation = mask;
>+ skb = sock_alloc_send_skb(sk, size, noblock, err);
>+ if (!skb)
>+ return NULL;
>+
>+ skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
>+ return skb;
>+}
>+
> /* Returns a new packet on success, otherwise returns NULL.
> *
> * If NULL is returned, errp is set to a negative errno.
^
So this comment was wrong before this change?

>@@ -47,7 +76,8 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
> u32 src_cid,
> u32 src_port,
> u32 dst_cid,
>- u32 dst_port)
>+ u32 dst_port,
>+ int *errp)
> {
> const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len;
> struct virtio_vsock_hdr *hdr;
>@@ -55,9 +85,21 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
> void *payload;
> int err;
>
>- skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
>- if (!skb)
>+ /* dgrams do not use credits, self-throttle according to sk_sndbuf
>+ * using sock_alloc_send_skb. This helps avoid triggering the OOM.
>+ */

I'm wondering whether we should do something similar for the other types as well...

>+ if (info->vsk && info->type == VIRTIO_VSOCK_TYPE_DGRAM) {
>+ skb = virtio_transport_sock_alloc_send_skb(info, skb_len, GFP_KERNEL, &err);

Why not use errp here?

>+ } else {
>+ skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);

Maybe we can pass errp also to virtio_vsock_alloc_skb.


The rest LGTM.

Anyway, the implementation seems to work well, so I think now we should
discuss the virtio-spec changes, which with this approach should not be
big, right?

Thanks,
Stefano


2023-06-22 19:17:10

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams



On 22.06.2023 19:09, Stefano Garzarella wrote:
> On Sun, Jun 11, 2023 at 11:49:02PM +0300, Arseniy Krasnov wrote:
>> Hello Bobby!
>>
>> On 10.06.2023 03:58, Bobby Eshleman wrote:
>>> This commit adds support for datagrams over virtio/vsock.
>>>
>>> Message boundaries are preserved on a per-skb and per-vq entry basis.
>>
>> I'm a little bit confused about the following case: let vhost send a 4097-byte
>> datagram to the guest. The guest uses 4096-byte RX buffers in its virtio queue, and each
>> buffer has an empty skb attached to it. Vhost places the first 4096 bytes into the first
>> buffer of the guest's RX queue, and the last byte into the second buffer. Now IIUC the guest
>> has two skbs in its rx queue, and a user in the guest wants to read data - does it read
>> 4097 bytes, while the guest has two skbs - 4096 bytes and 1 byte? In seqpacket there is a
>> special marker in the header which shows where the message ends; how does it work here?
>
> I think the main difference is that DGRAM is not connection-oriented, so
> we don't have a stream and we can't split the packet into 2 (maybe we
> could, but we have no guarantee that the second one for example will be
> not discarded because there is no space).
>
> So I think it is acceptable as a restriction to keep it simple.

Ah, I see, the idea is that any "corruption" of data can be considered as
"DGRAM is not reliable anyway, so that's it" :)

>
> My only doubt is, should we make the RX buffer size configurable,
> instead of always using 4k?

I guess this is useful only for DGRAM usage, when we want to tune the buffers
for some specific case - maybe for the exact length of the messages (for example, if we have
4096-byte buffers while the sender always wants to send 5000 bytes per 'send()' - I think it
would be really strange for the reader to ALWAYS dequeue 4096 and 904 bytes as two packets).
For stream types of socket I think the size of the rx buffers is not a big deal in most cases.

Thanks, Arseniy

>
> Thanks,
> Stefano
>

2023-06-22 20:18:17

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue



On 22.06.2023 17:51, Stefano Garzarella wrote:
> On Sun, Jun 11, 2023 at 11:43:15PM +0300, Arseniy Krasnov wrote:
>> Hello Bobby! Thanks for this patchset! Small comment below:
>>
>> On 10.06.2023 03:58, Bobby Eshleman wrote:
>>> This commit drops the transport->dgram_dequeue callback and makes
>>> vsock_dgram_recvmsg() generic. It also adds additional transport
>>> callbacks for use by the generic vsock_dgram_recvmsg(), such as for
>>> parsing skbs for CID/port which vary in format per transport.
>>>
>>> Signed-off-by: Bobby Eshleman <[email protected]>
>>> ---
>>>  drivers/vhost/vsock.c                   |  4 +-
>>>  include/linux/virtio_vsock.h            |  3 ++
>>>  include/net/af_vsock.h                  | 13 ++++++-
>>>  net/vmw_vsock/af_vsock.c                | 51 ++++++++++++++++++++++++-
>>>  net/vmw_vsock/hyperv_transport.c        | 17 +++++++--
>>>  net/vmw_vsock/virtio_transport.c        |  4 +-
>>>  net/vmw_vsock/virtio_transport_common.c | 18 +++++++++
>>>  net/vmw_vsock/vmci_transport.c          | 68 +++++++++++++--------------------
>>>  net/vmw_vsock/vsock_loopback.c          |  4 +-
>>>  9 files changed, 132 insertions(+), 50 deletions(-)
>>>
>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>> index 6578db78f0ae..c8201c070b4b 100644
>>> --- a/drivers/vhost/vsock.c
>>> +++ b/drivers/vhost/vsock.c
>>> @@ -410,9 +410,11 @@ static struct virtio_transport vhost_transport = {
>>>          .cancel_pkt               = vhost_transport_cancel_pkt,
>>>
>>>          .dgram_enqueue            = virtio_transport_dgram_enqueue,
>>> -        .dgram_dequeue            = virtio_transport_dgram_dequeue,
>>>          .dgram_bind               = virtio_transport_dgram_bind,
>>>          .dgram_allow              = virtio_transport_dgram_allow,
>>> +        .dgram_get_cid          = virtio_transport_dgram_get_cid,
>>> +        .dgram_get_port          = virtio_transport_dgram_get_port,
>>> +        .dgram_get_length      = virtio_transport_dgram_get_length,
>>>
>>>          .stream_enqueue           = virtio_transport_stream_enqueue,
>>>          .stream_dequeue           = virtio_transport_stream_dequeue,
>>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>>> index c58453699ee9..23521a318cf0 100644
>>> --- a/include/linux/virtio_vsock.h
>>> +++ b/include/linux/virtio_vsock.h
>>> @@ -219,6 +219,9 @@ bool virtio_transport_stream_allow(u32 cid, u32 port);
>>>  int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>>>                  struct sockaddr_vm *addr);
>>>  bool virtio_transport_dgram_allow(u32 cid, u32 port);
>>> +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
>>> +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
>>> +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
>>>
>>>  int virtio_transport_connect(struct vsock_sock *vsk);
>>>
>>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>>> index 0e7504a42925..7bedb9ee7e3e 100644
>>> --- a/include/net/af_vsock.h
>>> +++ b/include/net/af_vsock.h
>>> @@ -120,11 +120,20 @@ struct vsock_transport {
>>>
>>>      /* DGRAM. */
>>>      int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
>>> -    int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
>>> -                 size_t len, int flags);
>>>      int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
>>>                   struct msghdr *, size_t len);
>>>      bool (*dgram_allow)(u32 cid, u32 port);
>>> +    int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid);
>>> +    int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port);
>>> +    int (*dgram_get_length)(struct sk_buff *skb, size_t *length);
>>> +
>>> +    /* The number of bytes into the buffer at which the payload starts, as
>>> +     * first seen by the receiving socket layer. For example, if the
>>> +     * transport presets the skb pointers using skb_pull(sizeof(header))
>>> +     * than this would be zero, otherwise it would be the size of the
>>> +     * header.
>>> +     */
>>> +    const size_t dgram_payload_offset;
>>>
>>>      /* STREAM. */
>>>      /* TODO: stream_bind() */
>>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>> index efb8a0937a13..ffb4dd8b6ea7 100644
>>> --- a/net/vmw_vsock/af_vsock.c
>>> +++ b/net/vmw_vsock/af_vsock.c
>>> @@ -1271,11 +1271,15 @@ static int vsock_dgram_connect(struct socket *sock,
>>>  int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>>>              size_t len, int flags)
>>>  {
>>> +    const struct vsock_transport *transport;
>>>  #ifdef CONFIG_BPF_SYSCALL
>>>      const struct proto *prot;
>>>  #endif
>>>      struct vsock_sock *vsk;
>>> +    struct sk_buff *skb;
>>> +    size_t payload_len;
>>>      struct sock *sk;
>>> +    int err;
>>>
>>>      sk = sock->sk;
>>>      vsk = vsock_sk(sk);
>>> @@ -1286,7 +1290,52 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>>>          return prot->recvmsg(sk, msg, len, flags, NULL);
>>>  #endif
>>>
>>> -    return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
>>> +    if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
>>> +        return -EOPNOTSUPP;
>>> +
>>> +    transport = vsk->transport;
>>> +
>>> +    /* Retrieve the head sk_buff from the socket's receive queue. */
>>> +    err = 0;
>>> +    skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
>>> +    if (!skb)
>>> +        return err;
>>> +
>>> +    err = transport->dgram_get_length(skb, &payload_len);
>
> What about ssize_t return value here?
>
> Or maybe a single callback that returns both length and offset?
>
> .dgram_get_payload_info(skb, &payload_len, &payload_off)

Just an architectural question:

Maybe we can avoid this callback for the length? IIUC the concept of skbuff is that
the current level of the network stack already has a pointer to its data ('skb->data') and
the length of the payload ('skb->len') (both are set by the previous stack handler - the transport in
this case), so here we can just use 'skb->len' and that's all. There is no need to ask the
lower level of the network stack for the length of the payload. I see that VMCI stores metadata
with the payload in the 'data' buffer, but maybe it would be more correct to do 'skb_pull()'
in vmci before inserting the skbuff into the socket's queue? In this case the field with the dgram payload
offset could be removed from the transport.

Thanks, Arseniy

>
>>> +    if (err)
>>> +        goto out;
>>> +
>>> +    if (payload_len > len) {
>>> +        payload_len = len;
>>> +        msg->msg_flags |= MSG_TRUNC;
>>> +    }
>>> +
>>> +    /* Place the datagram payload in the user's iovec. */
>>> +    err = skb_copy_datagram_msg(skb, transport->dgram_payload_offset, msg, payload_len);
>>> +    if (err)
>>> +        goto out;
>>> +
>>> +    if (msg->msg_name) {
>>> +        /* Provide the address of the sender. */
>>> +        DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>>> +        unsigned int cid, port;
>>> +
>>> +        err = transport->dgram_get_cid(skb, &cid);
>>> +        if (err)
>>> +            goto out;
>>> +
>>> +        err = transport->dgram_get_port(skb, &port);
>>> +        if (err)
>>> +            goto out;
>>
>> Maybe we can merge 'dgram_get_cid' and 'dgram_get_port' to a single callback? Because I see that this is
>> the only place where both are used (correct me if i'm wrong) and logically both operates with addresses:
>> CID and port. E.g. something like that: dgram_get_cid_n_port().
>
> What about .dgram_addr_init(struct sk_buff *skb, struct sockaddr_vm *addr)
> and the transport can set cid and port?
>
>>
>> Moreover, I'm not sure, but is it good "tradeoff" here: remove transport specific callback for dgram receive
>> where we already have 'msghdr' with both data buffer and buffer for 'sockaddr_vm' and instead of it add new
>> several fields (callbacks) to transports like dgram_get_cid(), dgram_get_port()? I agree, that in each transport
>> specific callback we will have same copying logic by calling 'skb_copy_datagram_msg()' and filling address
>> by using 'vsock_addr_init()', but in this case we don't need to update transports too much. For example HyperV
>> still unchanged as it does not support SOCK_DGRAM. For VMCI You just need to add 'vsock_addr_init()' logic
>> to it's dgram dequeue callback.
>>
>> What do You think?
>
> Honestly, I'd rather avoid duplicate code than reduce changes in
> transports that don't support dgram.
>
> One thing I do agree on though is minimizing the number of callbacks
> to call to reduce the number of indirection (more performance?).
>
> Thanks,
> Stefano
>
>>
>> Thanks, Arseniy
>>
>>> +
>>> +        vsock_addr_init(vm_addr, cid, port);
>>> +        msg->msg_namelen = sizeof(*vm_addr);
>>> +    }
>>> +    err = payload_len;
>>> +
>>> +out:
>>> +    skb_free_datagram(&vsk->sk, skb);
>>> +    return err;
>>>  }
>>>  EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
>>>
>>> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
>>> index 7cb1a9d2cdb4..ff6e87e25fa0 100644
>>> --- a/net/vmw_vsock/hyperv_transport.c
>>> +++ b/net/vmw_vsock/hyperv_transport.c
>>> @@ -556,8 +556,17 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
>>>      return -EOPNOTSUPP;
>>>  }
>>>
>>> -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
>>> -                 size_t len, int flags)
>>> +static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
>>> +{
>>> +    return -EOPNOTSUPP;
>>> +}
>>> +
>>> +static int hvs_dgram_get_port(struct sk_buff *skb, unsigned int *port)
>>> +{
>>> +    return -EOPNOTSUPP;
>>> +}
>>> +
>>> +static int hvs_dgram_get_length(struct sk_buff *skb, size_t *len)
>>>  {
>>>      return -EOPNOTSUPP;
>>>  }
>>> @@ -833,7 +842,9 @@ static struct vsock_transport hvs_transport = {
>>>      .shutdown                 = hvs_shutdown,
>>>
>>>      .dgram_bind               = hvs_dgram_bind,
>>> -    .dgram_dequeue            = hvs_dgram_dequeue,
>>> +    .dgram_get_cid          = hvs_dgram_get_cid,
>>> +    .dgram_get_port          = hvs_dgram_get_port,
>>> +    .dgram_get_length      = hvs_dgram_get_length,
>>>      .dgram_enqueue            = hvs_dgram_enqueue,
>>>      .dgram_allow              = hvs_dgram_allow,
>>>
>>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>>> index e95df847176b..5763cdf13804 100644
>>> --- a/net/vmw_vsock/virtio_transport.c
>>> +++ b/net/vmw_vsock/virtio_transport.c
>>> @@ -429,9 +429,11 @@ static struct virtio_transport virtio_transport = {
>>>          .cancel_pkt               = virtio_transport_cancel_pkt,
>>>
>>>          .dgram_bind               = virtio_transport_dgram_bind,
>>> -        .dgram_dequeue            = virtio_transport_dgram_dequeue,
>>>          .dgram_enqueue            = virtio_transport_dgram_enqueue,
>>>          .dgram_allow              = virtio_transport_dgram_allow,
>>> +        .dgram_get_cid          = virtio_transport_dgram_get_cid,
>>> +        .dgram_get_port          = virtio_transport_dgram_get_port,
>>> +        .dgram_get_length      = virtio_transport_dgram_get_length,
>>>
>>>          .stream_dequeue           = virtio_transport_stream_dequeue,
>>>          .stream_enqueue           = virtio_transport_stream_enqueue,
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index b769fc258931..e6903c719964 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -797,6 +797,24 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>>>  }
>>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>>>
>>> +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
>>> +{
>>> +    return -EOPNOTSUPP;
>>> +}
>>> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid);
>>> +
>>> +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
>>> +{
>>> +    return -EOPNOTSUPP;
>>> +}
>>> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port);
>>> +
>>> +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
>>> +{
>>> +    return -EOPNOTSUPP;
>>> +}
>>> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length);
>>> +
>>>  bool virtio_transport_dgram_allow(u32 cid, u32 port)
>>>  {
>>>      return false;
>>> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
>>> index b370070194fa..bbc63826bf48 100644
>>> --- a/net/vmw_vsock/vmci_transport.c
>>> +++ b/net/vmw_vsock/vmci_transport.c
>>> @@ -1731,57 +1731,40 @@ static int vmci_transport_dgram_enqueue(
>>>      return err - sizeof(*dg);
>>>  }
>>>
>>> -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
>>> -                    struct msghdr *msg, size_t len,
>>> -                    int flags)
>>> +static int vmci_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
>>>  {
>>> -    int err;
>>>      struct vmci_datagram *dg;
>>> -    size_t payload_len;
>>> -    struct sk_buff *skb;
>>>
>>> -    if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
>>> -        return -EOPNOTSUPP;
>>> +    dg = (struct vmci_datagram *)skb->data;
>>> +    if (!dg)
>>> +        return -EINVAL;
>>>
>>> -    /* Retrieve the head sk_buff from the socket's receive queue. */
>>> -    err = 0;
>>> -    skb = skb_recv_datagram(&vsk->sk, flags, &err);
>>> -    if (!skb)
>>> -        return err;
>>> +    *cid = dg->src.context;
>>> +    return 0;
>>> +}
>>> +
>>> +static int vmci_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
>>> +{
>>> +    struct vmci_datagram *dg;
>>>
>>>      dg = (struct vmci_datagram *)skb->data;
>>>      if (!dg)
>>> -        /* err is 0, meaning we read zero bytes. */
>>> -        goto out;
>>> -
>>> -    payload_len = dg->payload_size;
>>> -    /* Ensure the sk_buff matches the payload size claimed in the packet. */
>>> -    if (payload_len != skb->len - sizeof(*dg)) {
>>> -        err = -EINVAL;
>>> -        goto out;
>>> -    }
>>> +        return -EINVAL;
>>>
>>> -    if (payload_len > len) {
>>> -        payload_len = len;
>>> -        msg->msg_flags |= MSG_TRUNC;
>>> -    }
>>> +    *port = dg->src.resource;
>>> +    return 0;
>>> +}
>>>
>>> -    /* Place the datagram payload in the user's iovec. */
>>> -    err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
>>> -    if (err)
>>> -        goto out;
>>> +static int vmci_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
>>> +{
>>> +    struct vmci_datagram *dg;
>>>
>>> -    if (msg->msg_name) {
>>> -        /* Provide the address of the sender. */
>>> -        DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>>> -        vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
>>> -        msg->msg_namelen = sizeof(*vm_addr);
>>> -    }
>>> -    err = payload_len;
>>> +    dg = (struct vmci_datagram *)skb->data;
>>> +    if (!dg)
>>> +        return -EINVAL;
>>>
>>> -out:
>>> -    skb_free_datagram(&vsk->sk, skb);
>>> -    return err;
>>> +    *len = dg->payload_size;
>>> +    return 0;
>>>  }
>>>
>>>  static bool vmci_transport_dgram_allow(u32 cid, u32 port)
>>> @@ -2040,9 +2023,12 @@ static struct vsock_transport vmci_transport = {
>>>      .release = vmci_transport_release,
>>>      .connect = vmci_transport_connect,
>>>      .dgram_bind = vmci_transport_dgram_bind,
>>> -    .dgram_dequeue = vmci_transport_dgram_dequeue,
>>>      .dgram_enqueue = vmci_transport_dgram_enqueue,
>>>      .dgram_allow = vmci_transport_dgram_allow,
>>> +    .dgram_get_cid = vmci_transport_dgram_get_cid,
>>> +    .dgram_get_port = vmci_transport_dgram_get_port,
>>> +    .dgram_get_length = vmci_transport_dgram_get_length,
>>> +    .dgram_payload_offset = sizeof(struct vmci_datagram),
>>>      .stream_dequeue = vmci_transport_stream_dequeue,
>>>      .stream_enqueue = vmci_transport_stream_enqueue,
>>>      .stream_has_data = vmci_transport_stream_has_data,
>>> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>>> index 5c6360df1f31..2f3cabc79ee5 100644
>>> --- a/net/vmw_vsock/vsock_loopback.c
>>> +++ b/net/vmw_vsock/vsock_loopback.c
>>> @@ -62,9 +62,11 @@ static struct virtio_transport loopback_transport = {
>>>          .cancel_pkt               = vsock_loopback_cancel_pkt,
>>>
>>>          .dgram_bind               = virtio_transport_dgram_bind,
>>> -        .dgram_dequeue            = virtio_transport_dgram_dequeue,
>>>          .dgram_enqueue            = virtio_transport_dgram_enqueue,
>>>          .dgram_allow              = virtio_transport_dgram_allow,
>>> +        .dgram_get_cid            = virtio_transport_dgram_get_cid,
>>> +        .dgram_get_port           = virtio_transport_dgram_get_port,
>>> +        .dgram_get_length         = virtio_transport_dgram_get_length,
>>>
>>>          .stream_dequeue           = virtio_transport_stream_dequeue,
>>>          .stream_enqueue           = virtio_transport_stream_enqueue,
>>>
>>
>

2023-06-22 23:14:09

by Bobby Eshleman

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 4/8] vsock: make vsock bind reusable

On Thu, Jun 22, 2023 at 05:25:55PM +0200, Stefano Garzarella wrote:
> On Sat, Jun 10, 2023 at 12:58:31AM +0000, Bobby Eshleman wrote:
> > This commit makes the bind table management functions in vsock usable
> > for different bind tables. For use by datagrams in a future patch.
> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
> > ---
> > net/vmw_vsock/af_vsock.c | 33 ++++++++++++++++++++++++++-------
> > 1 file changed, 26 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index ef86765f3765..7a3ca4270446 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -230,11 +230,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> > sock_put(&vsk->sk);
> > }
> >
> > -static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
> > + struct list_head *bind_table)
> > {
> > struct vsock_sock *vsk;
> >
> > - list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
> > + list_for_each_entry(vsk, bind_table, bound_table) {
> > if (vsock_addr_equals_addr(addr, &vsk->local_addr))
> > return sk_vsock(vsk);
> >
> > @@ -247,6 +248,11 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> > return NULL;
> > }
> >
> > +static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +{
> > + return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
> > +}
> > +
> > static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> > struct sockaddr_vm *dst)
> > {
> > @@ -646,12 +652,17 @@ static void vsock_pending_work(struct work_struct *work)
> >
> > /**** SOCKET OPERATIONS ****/
> >
> > -static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > - struct sockaddr_vm *addr)
> > +static int vsock_bind_common(struct vsock_sock *vsk,
> > + struct sockaddr_vm *addr,
> > + struct list_head *bind_table,
> > + size_t table_size)
> > {
> > static u32 port;
> > struct sockaddr_vm new_addr;
> >
> > + if (table_size < VSOCK_HASH_SIZE)
> > + return -1;
>
> Why we need this check now?
>

If table_size is less than VSOCK_HASH_SIZE, then the VSOCK_HASH(addr)
index used later could overflow the table.

Maybe this really deserves a WARN() and a comment?
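
Roughly what I have in mind, modeled in userspace just to show the bounds
reasoning (the VSOCK_HASH_SIZE value and the modulo mirror af_vsock.c; the
kernel version would be a WARN_ON() plus an early return, not this
assert-style helper):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the bind-table bounds reasoning, not kernel code.
 * VSOCK_HASH() yields indices in [0, VSOCK_HASH_SIZE), so any bind
 * table passed to vsock_bind_common() must have at least
 * VSOCK_HASH_SIZE buckets (the real table has one extra slot for
 * unbound sockets, hence VSOCK_HASH_SIZE + 1 at the call site).
 */
#define VSOCK_HASH_SIZE 251

static size_t vsock_hash(unsigned int port)
{
	return port % VSOCK_HASH_SIZE;
}

/* The proposed guard: reject (and, in the kernel, WARN() about) any
 * table too small for the hash to index safely.
 */
static int bind_table_size_ok(size_t table_size)
{
	return table_size >= VSOCK_HASH_SIZE;
}
```

With that, the check in vsock_bind_common() could become something like
`if (WARN_ON(table_size < VSOCK_HASH_SIZE)) return -EINVAL;`, returning a
real errno instead of -1.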

> > +
> > if (!port)
> > port = get_random_u32_above(LAST_RESERVED_PORT);
> >
> > @@ -667,7 +678,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> >
> > new_addr.svm_port = port++;
> >
> > - if (!__vsock_find_bound_socket(&new_addr)) {
> > + if (!vsock_find_bound_socket_common(&new_addr,
> > + &bind_table[VSOCK_HASH(addr)])) {
> > found = true;
> > break;
> > }
> > @@ -684,7 +696,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > return -EACCES;
> > }
> >
> > - if (__vsock_find_bound_socket(&new_addr))
> > + if (vsock_find_bound_socket_common(&new_addr,
> > + &bind_table[VSOCK_HASH(addr)]))
> > return -EADDRINUSE;
> > }
> >
> > @@ -696,11 +709,17 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > * by AF_UNIX.
> > */
> > __vsock_remove_bound(vsk);
> > - __vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk);
> > + __vsock_insert_bound(&bind_table[VSOCK_HASH(&vsk->local_addr)], vsk);
> >
> > return 0;
> > }
> >
> > +static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > + struct sockaddr_vm *addr)
> > +{
> > + return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
> > +}
> > +
> > static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > struct sockaddr_vm *addr)
> > {
> >
> > --
> > 2.30.2
> >
>
> The rest seems okay to me, but I agree with Simon's suggestion.
>
> Stefano
>

Thanks,
Bobby

2023-06-22 23:34:50

by Bobby Eshleman

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 8/8] tests: add vsock dgram tests

On Sun, Jun 11, 2023 at 11:54:57PM +0300, Arseniy Krasnov wrote:
> Hello Bobby!
>
> Sorry, maybe I'm becoming a little bit annoying :), but I tried to run vsock_test with
> this v4 version and again got the same crash:

Haha not annoying at all. I appreciate the testing!

>
> # cat client.sh
> ./vsock_test --mode=client --control-host=192.168.1.1 --control-port=12345 --peer-cid=2
> # ./client.sh
> Control socket connected to 192.168.1.1:12345.
> 0 - SOCK_STREAM connection reset...[ 20.065237] BUG: kernel NULL pointer dereference, addre0
> [ 20.065895] #PF: supervisor read access in kernel mode
> [ 20.065895] #PF: error_code(0x0000) - not-present page
> [ 20.065895] PGD 0 P4D 0
> [ 20.065895] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 20.065895] CPU: 0 PID: 111 Comm: vsock_test Not tainted 6.4.0-rc3-gefcccba07069 #385
> [ 20.065895] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd44
> [ 20.065895] RIP: 0010:static_key_count+0x0/0x20
> [ 20.065895] Code: 04 4c 8b 46 08 49 29 c0 4c 01 c8 4c 89 47 08 89 0e 89 56 04 48 89 46 08 f
> [ 20.065895] RSP: 0018:ffffbbb000223dc0 EFLAGS: 00010202
> [ 20.065895] RAX: ffffffff85709880 RBX: ffffffffc0079140 RCX: 0000000000000000
> [ 20.065895] RDX: ffff9f73c2175700 RSI: 0000000000000000 RDI: 0000000000000000
> [ 20.065895] RBP: ffff9f73c2385900 R08: ffffbbb000223d30 R09: ffff9f73ff896000
> [ 20.065895] R10: 0000000000001000 R11: 0000000000000000 R12: ffffbbb000223e80
> [ 20.065895] R13: 0000000000000000 R14: 0000000000000002 R15: ffff9f73c1cfaa80
> [ 20.065895] FS: 00007f1ad82f55c0(0000) GS:ffff9f73fe400000(0000) knlGS:0000000000000000
> [ 20.065895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 20.065895] CR2: 0000000000000000 CR3: 000000003f954000 CR4: 00000000000006f0
> [ 20.065895] Call Trace:
> [ 20.065895] <TASK>
> [ 20.065895] once_deferred+0xd/0x30
> [ 20.065895] vsock_assign_transport+0x9a/0x1b0 [vsock]
> [ 20.065895] vsock_connect+0xb4/0x3a0 [vsock]
> [ 20.065895] ? var_wake_function+0x60/0x60
> [ 20.065895] __sys_connect+0x9e/0xd0
> [ 20.065895] ? _raw_spin_unlock_irq+0xe/0x30
> [ 20.065895] ? do_setitimer+0x128/0x1f0
> [ 20.065895] ? alarm_setitimer+0x4c/0x90
> [ 20.065895] ? fpregs_assert_state_consistent+0x1d/0x50
> [ 20.065895] ? exit_to_user_mode_prepare+0x36/0x130
> [ 20.065895] __x64_sys_connect+0x11/0x20
> [ 20.065895] do_syscall_64+0x3b/0xc0
> [ 20.065895] entry_SYSCALL_64_after_hwframe+0x4b/0xb5
> [ 20.065895] RIP: 0033:0x7f1ad822dd13
> [ 20.065895] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 64 8
> [ 20.065895] RSP: 002b:00007ffc513e3c98 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
> [ 20.065895] RAX: ffffffffffffffda RBX: 000055aed298e020 RCX: 00007f1ad822dd13
> [ 20.065895] RDX: 0000000000000010 RSI: 00007ffc513e3cb0 RDI: 0000000000000004
> [ 20.065895] RBP: 0000000000000004 R08: 000055aed32b2018 R09: 0000000000000000
> [ 20.065895] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [ 20.065895] R13: 000055aed298acb1 R14: 00007ffc513e3cb0 R15: 00007ffc513e3d40
> [ 20.065895] </TASK>
> [ 20.065895] Modules linked in: vsock_loopback vhost_vsock vmw_vsock_virtio_transport vmw_vb

^ I'm guessing this is the difference between our setups. I have been
going all built-in, let me see if I can reproduce w/ modules...
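
For reference, the module-based config I'll try reproducing with looks
roughly like this (a fragment guessed from the modules listed in the oops
above; adjust to your tree):

```
CONFIG_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS_COMMON=m
CONFIG_VHOST_VSOCK=m
CONFIG_VSOCKETS_LOOPBACK=m
```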

> [ 20.065895] CR2: 0000000000000000
> [ 20.154060] ---[ end trace 0000000000000000 ]---
> [ 20.155519] RIP: 0010:static_key_count+0x0/0x20
> [ 20.156932] Code: 04 4c 8b 46 08 49 29 c0 4c 01 c8 4c 89 47 08 89 0e 89 56 04 48 89 46 08 f
> [ 20.161367] RSP: 0018:ffffbbb000223dc0 EFLAGS: 00010202
> [ 20.162613] RAX: ffffffff85709880 RBX: ffffffffc0079140 RCX: 0000000000000000
> [ 20.164262] RDX: ffff9f73c2175700 RSI: 0000000000000000 RDI: 0000000000000000
> [ 20.165934] RBP: ffff9f73c2385900 R08: ffffbbb000223d30 R09: ffff9f73ff896000
> [ 20.167684] R10: 0000000000001000 R11: 0000000000000000 R12: ffffbbb000223e80
> [ 20.169427] R13: 0000000000000000 R14: 0000000000000002 R15: ffff9f73c1cfaa80
> [ 20.171109] FS: 00007f1ad82f55c0(0000) GS:ffff9f73fe400000(0000) knlGS:0000000000000000
> [ 20.173000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 20.174381] CR2: 0000000000000000 CR3: 000000003f954000 CR4: 00000000000006f0
>
> So, what HEAD do you use? Maybe you have some specific config (I use the x86-64 defconfig + vsock/vhost
> related things)?
>

For this series I used net-next:
28cfea989d6f55c3d10608eba2a2bae609c5bf3e

> Thanks, Arseniy
>

As always, thanks for the bug finding! I'll report back when I
reproduce or with questions if I can't.

Best,
Bobby

>
> On 10.06.2023 03:58, Bobby Eshleman wrote:
> > From: Jiang Wang <[email protected]>
> >
> > This patch adds tests for vsock datagram.
> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
> > Signed-off-by: Jiang Wang <[email protected]>
> > ---
> > tools/testing/vsock/util.c | 141 ++++++++++++-
> > tools/testing/vsock/util.h | 6 +
> > tools/testing/vsock/vsock_test.c | 432 +++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 578 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
> > index 01b636d3039a..811e70d7cf1e 100644
> > --- a/tools/testing/vsock/util.c
> > +++ b/tools/testing/vsock/util.c
> > @@ -99,7 +99,8 @@ static int vsock_connect(unsigned int cid, unsigned int port, int type)
> > int ret;
> > int fd;
> >
> > - control_expectln("LISTENING");
> > + if (type != SOCK_DGRAM)
> > + control_expectln("LISTENING");
> >
> > fd = socket(AF_VSOCK, type, 0);
> >
> > @@ -130,6 +131,11 @@ int vsock_seqpacket_connect(unsigned int cid, unsigned int port)
> > return vsock_connect(cid, port, SOCK_SEQPACKET);
> > }
> >
> > +int vsock_dgram_connect(unsigned int cid, unsigned int port)
> > +{
> > + return vsock_connect(cid, port, SOCK_DGRAM);
> > +}
> > +
> > /* Listen on <cid, port> and return the first incoming connection. The remote
> > * address is stored to clientaddrp. clientaddrp may be NULL.
> > */
> > @@ -211,6 +217,34 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
> > return vsock_accept(cid, port, clientaddrp, SOCK_SEQPACKET);
> > }
> >
> > +int vsock_dgram_bind(unsigned int cid, unsigned int port)
> > +{
> > + union {
> > + struct sockaddr sa;
> > + struct sockaddr_vm svm;
> > + } addr = {
> > + .svm = {
> > + .svm_family = AF_VSOCK,
> > + .svm_port = port,
> > + .svm_cid = cid,
> > + },
> > + };
> > + int fd;
> > +
> > + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> > + if (fd < 0) {
> > + perror("socket");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> > + perror("bind");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + return fd;
> > +}
> > +
> > /* Transmit one byte and check the return value.
> > *
> > * expected_ret:
> > @@ -260,6 +294,57 @@ void send_byte(int fd, int expected_ret, int flags)
> > }
> > }
> >
> > +/* Transmit one byte and check the return value.
> > + *
> > + * expected_ret:
> > + * <0 Negative errno (for testing errors)
> > + * 0 End-of-file
> > + * 1 Success
> > + */
> > +void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
> > + int flags)
> > +{
> > + const uint8_t byte = 'A';
> > + ssize_t nwritten;
> > +
> > + timeout_begin(TIMEOUT);
> > + do {
> > + nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr,
> > + len);
> > + timeout_check("write");
> > + } while (nwritten < 0 && errno == EINTR);
> > + timeout_end();
> > +
> > + if (expected_ret < 0) {
> > + if (nwritten != -1) {
> > + fprintf(stderr, "bogus sendto(2) return value %zd\n",
> > + nwritten);
> > + exit(EXIT_FAILURE);
> > + }
> > + if (errno != -expected_ret) {
> > + perror("write");
> > + exit(EXIT_FAILURE);
> > + }
> > + return;
> > + }
> > +
> > + if (nwritten < 0) {
> > + perror("write");
> > + exit(EXIT_FAILURE);
> > + }
> > + if (nwritten == 0) {
> > + if (expected_ret == 0)
> > + return;
> > +
> > + fprintf(stderr, "unexpected EOF while sending byte\n");
> > + exit(EXIT_FAILURE);
> > + }
> > + if (nwritten != sizeof(byte)) {
> > + fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten);
> > + exit(EXIT_FAILURE);
> > + }
> > +}
> > +
> > /* Receive one byte and check the return value.
> > *
> > * expected_ret:
> > @@ -313,6 +398,60 @@ void recv_byte(int fd, int expected_ret, int flags)
> > }
> > }
> >
> > +/* Receive one byte and check the return value.
> > + *
> > + * expected_ret:
> > + * <0 Negative errno (for testing errors)
> > + * 0 End-of-file
> > + * 1 Success
> > + */
> > +void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
> > + int expected_ret, int flags)
> > +{
> > + uint8_t byte;
> > + ssize_t nread;
> > +
> > + timeout_begin(TIMEOUT);
> > + do {
> > + nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen);
> > + timeout_check("read");
> > + } while (nread < 0 && errno == EINTR);
> > + timeout_end();
> > +
> > + if (expected_ret < 0) {
> > + if (nread != -1) {
> > + fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
> > + nread);
> > + exit(EXIT_FAILURE);
> > + }
> > + if (errno != -expected_ret) {
> > + perror("read");
> > + exit(EXIT_FAILURE);
> > + }
> > + return;
> > + }
> > +
> > + if (nread < 0) {
> > + perror("read");
> > + exit(EXIT_FAILURE);
> > + }
> > + if (nread == 0) {
> > + if (expected_ret == 0)
> > + return;
> > +
> > + fprintf(stderr, "unexpected EOF while receiving byte\n");
> > + exit(EXIT_FAILURE);
> > + }
> > + if (nread != sizeof(byte)) {
> > + fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread);
> > + exit(EXIT_FAILURE);
> > + }
> > + if (byte != 'A') {
> > + fprintf(stderr, "unexpected byte read %c\n", byte);
> > + exit(EXIT_FAILURE);
> > + }
> > +}
> > +
> > /* Run test cases. The program terminates if a failure occurs. */
> > void run_tests(const struct test_case *test_cases,
> > const struct test_opts *opts)
> > diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
> > index fb99208a95ea..a69e128d120c 100644
> > --- a/tools/testing/vsock/util.h
> > +++ b/tools/testing/vsock/util.h
> > @@ -37,13 +37,19 @@ void init_signals(void);
> > unsigned int parse_cid(const char *str);
> > int vsock_stream_connect(unsigned int cid, unsigned int port);
> > int vsock_seqpacket_connect(unsigned int cid, unsigned int port);
> > +int vsock_dgram_connect(unsigned int cid, unsigned int port);
> > int vsock_stream_accept(unsigned int cid, unsigned int port,
> > struct sockaddr_vm *clientaddrp);
> > int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
> > struct sockaddr_vm *clientaddrp);
> > +int vsock_dgram_bind(unsigned int cid, unsigned int port);
> > void vsock_wait_remote_close(int fd);
> > void send_byte(int fd, int expected_ret, int flags);
> > +void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
> > + int flags);
> > void recv_byte(int fd, int expected_ret, int flags);
> > +void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
> > + int expected_ret, int flags);
> > void run_tests(const struct test_case *test_cases,
> > const struct test_opts *opts);
> > void list_tests(const struct test_case *test_cases);
> > diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
> > index ac1bd3ac1533..ded82d39ee5d 100644
> > --- a/tools/testing/vsock/vsock_test.c
> > +++ b/tools/testing/vsock/vsock_test.c
> > @@ -1053,6 +1053,413 @@ static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
> > close(fd);
> > }
> >
> > +static void test_dgram_sendto_client(const struct test_opts *opts)
> > +{
> > + union {
> > + struct sockaddr sa;
> > + struct sockaddr_vm svm;
> > + } addr = {
> > + .svm = {
> > + .svm_family = AF_VSOCK,
> > + .svm_port = 1234,
> > + .svm_cid = opts->peer_cid,
> > + },
> > + };
> > + int fd;
> > +
> > + /* Wait for the server to be ready */
> > + control_expectln("BIND");
> > +
> > + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> > + if (fd < 0) {
> > + perror("socket");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
> > +
> > + /* Notify the server that the client has finished */
> > + control_writeln("DONE");
> > +
> > + close(fd);
> > +}
> > +
> > +static void test_dgram_sendto_server(const struct test_opts *opts)
> > +{
> > + union {
> > + struct sockaddr sa;
> > + struct sockaddr_vm svm;
> > + } addr = {
> > + .svm = {
> > + .svm_family = AF_VSOCK,
> > + .svm_port = 1234,
> > + .svm_cid = VMADDR_CID_ANY,
> > + },
> > + };
> > + socklen_t len = sizeof(addr.sa);
> > + int fd;
> > +
> > + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> > + if (fd < 0) {
> > + perror("socket");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> > + perror("bind");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + /* Notify the client that the server is ready */
> > + control_writeln("BIND");
> > +
> > + recvfrom_byte(fd, &addr.sa, &len, 1, 0);
> > +
> > + /* Wait for the client to finish */
> > + control_expectln("DONE");
> > +
> > + close(fd);
> > +}
> > +
> > +static void test_dgram_connect_client(const struct test_opts *opts)
> > +{
> > + union {
> > + struct sockaddr sa;
> > + struct sockaddr_vm svm;
> > + } addr = {
> > + .svm = {
> > + .svm_family = AF_VSOCK,
> > + .svm_port = 1234,
> > + .svm_cid = opts->peer_cid,
> > + },
> > + };
> > + int ret;
> > + int fd;
> > +
> > + /* Wait for the server to be ready */
> > + control_expectln("BIND");
> > +
> > + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> > + if (fd < 0) {
> > + perror("socket");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + ret = connect(fd, &addr.sa, sizeof(addr.svm));
> > + if (ret < 0) {
> > + perror("connect");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + send_byte(fd, 1, 0);
> > +
> > + /* Notify the server that the client has finished */
> > + control_writeln("DONE");
> > +
> > + close(fd);
> > +}
> > +
> > +static void test_dgram_connect_server(const struct test_opts *opts)
> > +{
> > + test_dgram_sendto_server(opts);
> > +}
> > +
> > +static void test_dgram_multiconn_sendto_client(const struct test_opts *opts)
> > +{
> > + union {
> > + struct sockaddr sa;
> > + struct sockaddr_vm svm;
> > + } addr = {
> > + .svm = {
> > + .svm_family = AF_VSOCK,
> > + .svm_port = 1234,
> > + .svm_cid = opts->peer_cid,
> > + },
> > + };
> > + int fds[MULTICONN_NFDS];
> > + int i;
> > +
> > + /* Wait for the server to be ready */
> > + control_expectln("BIND");
> > +
> > + for (i = 0; i < MULTICONN_NFDS; i++) {
> > + fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
> > + if (fds[i] < 0) {
> > + perror("socket");
> > + exit(EXIT_FAILURE);
> > + }
> > + }
> > +
> > + for (i = 0; i < MULTICONN_NFDS; i++)
> > + sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
> > +
> > + /* Notify the server that the client has finished */
> > + control_writeln("DONE");
> > +
> > + for (i = 0; i < MULTICONN_NFDS; i++)
> > + close(fds[i]);
> > +}
> > +
> > +static void test_dgram_multiconn_sendto_server(const struct test_opts *opts)
> > +{
> > + union {
> > + struct sockaddr sa;
> > + struct sockaddr_vm svm;
> > + } addr = {
> > + .svm = {
> > + .svm_family = AF_VSOCK,
> > + .svm_port = 1234,
> > + .svm_cid = VMADDR_CID_ANY,
> > + },
> > + };
> > + socklen_t len = sizeof(addr.sa);
> > + int fd;
> > + int i;
> > +
> > + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> > + if (fd < 0) {
> > + perror("socket");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> > + perror("bind");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + /* Notify the client that the server is ready */
> > + control_writeln("BIND");
> > +
> > + for (i = 0; i < MULTICONN_NFDS; i++)
> > + recvfrom_byte(fd, &addr.sa, &len, 1, 0);
> > +
> > + /* Wait for the client to finish */
> > + control_expectln("DONE");
> > +
> > + close(fd);
> > +}
> > +
> > +static void test_dgram_multiconn_send_client(const struct test_opts *opts)
> > +{
> > + int fds[MULTICONN_NFDS];
> > + int i;
> > +
> > + /* Wait for the server to be ready */
> > + control_expectln("BIND");
> > +
> > + for (i = 0; i < MULTICONN_NFDS; i++) {
> > + fds[i] = vsock_dgram_connect(opts->peer_cid, 1234);
> > + if (fds[i] < 0) {
> > + perror("socket");
> > + exit(EXIT_FAILURE);
> > + }
> > + }
> > +
> > + for (i = 0; i < MULTICONN_NFDS; i++)
> > + send_byte(fds[i], 1, 0);
> > +
> > + /* Notify the server that the client has finished */
> > + control_writeln("DONE");
> > +
> > + for (i = 0; i < MULTICONN_NFDS; i++)
> > + close(fds[i]);
> > +}
> > +
> > +static void test_dgram_multiconn_send_server(const struct test_opts *opts)
> > +{
> > + union {
> > + struct sockaddr sa;
> > + struct sockaddr_vm svm;
> > + } addr = {
> > + .svm = {
> > + .svm_family = AF_VSOCK,
> > + .svm_port = 1234,
> > + .svm_cid = VMADDR_CID_ANY,
> > + },
> > + };
> > + int fd;
> > + int i;
> > +
> > + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> > + if (fd < 0) {
> > + perror("socket");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> > + perror("bind");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + /* Notify the client that the server is ready */
> > + control_writeln("BIND");
> > +
> > + for (i = 0; i < MULTICONN_NFDS; i++)
> > + recv_byte(fd, 1, 0);
> > +
> > + /* Wait for the client to finish */
> > + control_expectln("DONE");
> > +
> > + close(fd);
> > +}
> > +
> > +static void test_dgram_msg_bounds_client(const struct test_opts *opts)
> > +{
> > + unsigned long recv_buf_size;
> > + int page_size;
> > + int msg_cnt;
> > + int fd;
> > +
> > + fd = vsock_dgram_connect(opts->peer_cid, 1234);
> > + if (fd < 0) {
> > + perror("connect");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + /* Let the server know the client is ready */
> > + control_writeln("CLNTREADY");
> > +
> > + msg_cnt = control_readulong();
> > + recv_buf_size = control_readulong();
> > +
> > + /* Wait until the receiver sets the buffer size. */
> > + control_expectln("SRVREADY");
> > +
> > + page_size = getpagesize();
> > +
> > + for (int i = 0; i < msg_cnt; i++) {
> > + unsigned long curr_hash;
> > + ssize_t send_size;
> > + size_t buf_size;
> > + void *buf;
> > +
> > + /* Use "small" buffers and "big" buffers. */
> > + if (i & 1)
> > + buf_size = page_size +
> > + (rand() % (MAX_MSG_SIZE - page_size));
> > + else
> > + buf_size = 1 + (rand() % page_size);
> > +
> > + buf_size = min(buf_size, recv_buf_size);
> > +
> > + buf = malloc(buf_size);
> > +
> > + if (!buf) {
> > + perror("malloc");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + /* Fill the buffer with a single random byte value. */
> > + memset(buf, rand() & 0xff, buf_size);
> > +
> > + send_size = send(fd, buf, buf_size, 0);
> > +
> > + if (send_size < 0) {
> > + perror("send");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + if (send_size != buf_size) {
> > + fprintf(stderr, "Invalid send size\n");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + /* In theory the implementation isn't required to transmit
> > + * these packets in order, so we use this SYNC control message
> > + * so that server and client coordinate sending and receiving
> > + * one packet at a time. The client sends a packet and waits
> > + * until it has been received before sending another.
> > + */
> > + control_writeln("PKTSENT");
> > + control_expectln("PKTRECV");
> > +
> > + /* Send the server a hash of the packet */
> > + curr_hash = hash_djb2(buf, buf_size);
> > + control_writeulong(curr_hash);
> > + free(buf);
> > + }
> > +
> > + control_writeln("SENDDONE");
> > + close(fd);
> > +}
> > +
> > +static void test_dgram_msg_bounds_server(const struct test_opts *opts)
> > +{
> > + const unsigned long msg_cnt = 16;
> > + unsigned long sock_buf_size;
> > + struct msghdr msg = {0};
> > + struct iovec iov = {0};
> > + char buf[MAX_MSG_SIZE];
> > + socklen_t len;
> > + int fd;
> > + int i;
> > +
> > + fd = vsock_dgram_bind(VMADDR_CID_ANY, 1234);
> > +
> > + if (fd < 0) {
> > + perror("bind");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + /* Set receive buffer to maximum */
> > + sock_buf_size = -1;
> > + if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
> > + &sock_buf_size, sizeof(sock_buf_size))) {
> > + perror("setsockopt(SO_RCVBUF)");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + /* Retrieve the receive buffer size */
> > + len = sizeof(sock_buf_size);
> > + if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF,
> > + &sock_buf_size, &len)) {
> > + perror("getsockopt(SO_RCVBUF)");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + /* Client ready to receive parameters */
> > + control_expectln("CLNTREADY");
> > +
> > + control_writeulong(msg_cnt);
> > + control_writeulong(sock_buf_size);
> > +
> > + /* Ready to receive data. */
> > + control_writeln("SRVREADY");
> > +
> > + iov.iov_base = buf;
> > + iov.iov_len = sizeof(buf);
> > + msg.msg_iov = &iov;
> > + msg.msg_iovlen = 1;
> > +
> > + for (i = 0; i < msg_cnt; i++) {
> > + unsigned long remote_hash;
> > + unsigned long curr_hash;
> > + ssize_t recv_size;
> > +
> > + control_expectln("PKTSENT");
> > + recv_size = recvmsg(fd, &msg, 0);
> > + control_writeln("PKTRECV");
> > +
> > + if (!recv_size)
> > + break;
> > +
> > + if (recv_size < 0) {
> > + perror("recvmsg");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + curr_hash = hash_djb2(msg.msg_iov[0].iov_base, recv_size);
> > + remote_hash = control_readulong();
> > +
> > + if (curr_hash != remote_hash) {
> > + fprintf(stderr, "Message bounds broken\n");
> > + exit(EXIT_FAILURE);
> > + }
> > + }
> > +
> > + close(fd);
> > +}
> > +
> > static struct test_case test_cases[] = {
> > {
> > .name = "SOCK_STREAM connection reset",
> > @@ -1128,6 +1535,31 @@ static struct test_case test_cases[] = {
> > .run_client = test_stream_virtio_skb_merge_client,
> > .run_server = test_stream_virtio_skb_merge_server,
> > },
> > + {
> > + .name = "SOCK_DGRAM client sendto",
> > + .run_client = test_dgram_sendto_client,
> > + .run_server = test_dgram_sendto_server,
> > + },
> > + {
> > + .name = "SOCK_DGRAM client connect",
> > + .run_client = test_dgram_connect_client,
> > + .run_server = test_dgram_connect_server,
> > + },
> > + {
> > + .name = "SOCK_DGRAM multiple connections using sendto",
> > + .run_client = test_dgram_multiconn_sendto_client,
> > + .run_server = test_dgram_multiconn_sendto_server,
> > + },
> > + {
> > + .name = "SOCK_DGRAM multiple connections using send",
> > + .run_client = test_dgram_multiconn_send_client,
> > + .run_server = test_dgram_multiconn_send_server,
> > + },
> > + {
> > + .name = "SOCK_DGRAM msg bounds",
> > + .run_client = test_dgram_msg_bounds_client,
> > + .run_server = test_dgram_msg_bounds_server,
> > + },
> > {},
> > };
> >
> >

2023-06-22 23:51:44

by Bobby Eshleman

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue

On Thu, Jun 22, 2023 at 04:51:41PM +0200, Stefano Garzarella wrote:
> On Sun, Jun 11, 2023 at 11:43:15PM +0300, Arseniy Krasnov wrote:
> > Hello Bobby! Thanks for this patchset! Small comment below:
> >
> > On 10.06.2023 03:58, Bobby Eshleman wrote:
> > > This commit drops the transport->dgram_dequeue callback and makes
> > > vsock_dgram_recvmsg() generic. It also adds additional transport
> > > callbacks for use by the generic vsock_dgram_recvmsg(), such as for
> > > parsing skbs for CID/port which vary in format per transport.
> > >
> > > Signed-off-by: Bobby Eshleman <[email protected]>
> > > ---
> > > drivers/vhost/vsock.c | 4 +-
> > > include/linux/virtio_vsock.h | 3 ++
> > > include/net/af_vsock.h | 13 ++++++-
> > > net/vmw_vsock/af_vsock.c | 51 ++++++++++++++++++++++++-
> > > net/vmw_vsock/hyperv_transport.c | 17 +++++++--
> > > net/vmw_vsock/virtio_transport.c | 4 +-
> > > net/vmw_vsock/virtio_transport_common.c | 18 +++++++++
> > > net/vmw_vsock/vmci_transport.c | 68 +++++++++++++--------------------
> > > net/vmw_vsock/vsock_loopback.c | 4 +-
> > > 9 files changed, 132 insertions(+), 50 deletions(-)
> > >
> > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > index 6578db78f0ae..c8201c070b4b 100644
> > > --- a/drivers/vhost/vsock.c
> > > +++ b/drivers/vhost/vsock.c
> > > @@ -410,9 +410,11 @@ static struct virtio_transport vhost_transport = {
> > > .cancel_pkt = vhost_transport_cancel_pkt,
> > >
> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > - .dgram_dequeue = virtio_transport_dgram_dequeue,
> > > .dgram_bind = virtio_transport_dgram_bind,
> > > .dgram_allow = virtio_transport_dgram_allow,
> > > + .dgram_get_cid = virtio_transport_dgram_get_cid,
> > > + .dgram_get_port = virtio_transport_dgram_get_port,
> > > + .dgram_get_length = virtio_transport_dgram_get_length,
> > >
> > > .stream_enqueue = virtio_transport_stream_enqueue,
> > > .stream_dequeue = virtio_transport_stream_dequeue,
> > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > index c58453699ee9..23521a318cf0 100644
> > > --- a/include/linux/virtio_vsock.h
> > > +++ b/include/linux/virtio_vsock.h
> > > @@ -219,6 +219,9 @@ bool virtio_transport_stream_allow(u32 cid, u32 port);
> > > int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > struct sockaddr_vm *addr);
> > > bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > > +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> > > +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
> > > +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
> > >
> > > int virtio_transport_connect(struct vsock_sock *vsk);
> > >
> > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > > index 0e7504a42925..7bedb9ee7e3e 100644
> > > --- a/include/net/af_vsock.h
> > > +++ b/include/net/af_vsock.h
> > > @@ -120,11 +120,20 @@ struct vsock_transport {
> > >
> > > /* DGRAM. */
> > > int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
> > > - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> > > - size_t len, int flags);
> > > int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> > > struct msghdr *, size_t len);
> > > bool (*dgram_allow)(u32 cid, u32 port);
> > > + int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid);
> > > + int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port);
> > > + int (*dgram_get_length)(struct sk_buff *skb, size_t *length);
> > > +
> > > + /* The number of bytes into the buffer at which the payload starts, as
> > > + * first seen by the receiving socket layer. For example, if the
> > > + * transport presets the skb pointers using skb_pull(sizeof(header))
> > > + * then this would be zero, otherwise it would be the size of the
> > > + * header.
> > > + */
> > > + const size_t dgram_payload_offset;
> > >
> > > /* STREAM. */
> > > /* TODO: stream_bind() */
> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > index efb8a0937a13..ffb4dd8b6ea7 100644
> > > --- a/net/vmw_vsock/af_vsock.c
> > > +++ b/net/vmw_vsock/af_vsock.c
> > > @@ -1271,11 +1271,15 @@ static int vsock_dgram_connect(struct socket *sock,
> > > int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> > > size_t len, int flags)
> > > {
> > > + const struct vsock_transport *transport;
> > > #ifdef CONFIG_BPF_SYSCALL
> > > const struct proto *prot;
> > > #endif
> > > struct vsock_sock *vsk;
> > > + struct sk_buff *skb;
> > > + size_t payload_len;
> > > struct sock *sk;
> > > + int err;
> > >
> > > sk = sock->sk;
> > > vsk = vsock_sk(sk);
> > > @@ -1286,7 +1290,52 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> > > return prot->recvmsg(sk, msg, len, flags, NULL);
> > > #endif
> > >
> > > - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
> > > + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> > > + return -EOPNOTSUPP;
> > > +
> > > + transport = vsk->transport;
> > > +
> > > + /* Retrieve the head sk_buff from the socket's receive queue. */
> > > + err = 0;
> > > + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
> > > + if (!skb)
> > > + return err;
> > > +
> > > + err = transport->dgram_get_length(skb, &payload_len);
>
> What about ssize_t return value here?
>
> Or maybe a single callback that return both length and offset?
>
> .dgram_get_payload_info(skb, &payload_len, &payload_off)
>

What are your thoughts on Arseniy's idea of using skb->len and adding a
skb_pull() just before vmci adds the skb to the sk receive queue?

> > > + if (err)
> > > + goto out;
> > > +
> > > + if (payload_len > len) {
> > > + payload_len = len;
> > > + msg->msg_flags |= MSG_TRUNC;
> > > + }
> > > +
> > > + /* Place the datagram payload in the user's iovec. */
> > > + err = skb_copy_datagram_msg(skb, transport->dgram_payload_offset, msg, payload_len);
> > > + if (err)
> > > + goto out;
> > > +
> > > + if (msg->msg_name) {
> > > + /* Provide the address of the sender. */
> > > + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> > > + unsigned int cid, port;
> > > +
> > > + err = transport->dgram_get_cid(skb, &cid);
> > > + if (err)
> > > + goto out;
> > > +
> > > + err = transport->dgram_get_port(skb, &port);
> > > + if (err)
> > > + goto out;
> >
> > Maybe we can merge 'dgram_get_cid' and 'dgram_get_port' into a single callback? As far as I can see, this is
> > the only place where both are used (correct me if I'm wrong), and logically both operate on the address:
> > CID and port. E.g. something like dgram_get_cid_n_port().
>
> What about .dgram_addr_init(struct sk_buff *skb, struct sockaddr_vm *addr)
> and the transport can set cid and port?
>

LGTM!

> >
> > Moreover, I'm not sure it's a good tradeoff: we remove the transport-specific callback for dgram receive,
> > where we already have a 'msghdr' with both the data buffer and the buffer for 'sockaddr_vm', and instead
> > add several new callbacks to the transports, like dgram_get_cid() and dgram_get_port(). I agree that each
> > transport-specific callback would repeat the same copying logic, calling 'skb_copy_datagram_msg()' and
> > filling the address with 'vsock_addr_init()', but this way we don't need to change the transports as much.
> > For example, HyperV stays unchanged since it does not support SOCK_DGRAM, and for VMCI you just need to add
> > the 'vsock_addr_init()' logic to its dgram dequeue callback.
> >
> > What do you think?
>
> Honestly, I'd rather avoid duplicate code than reduce changes in
> transports that don't support dgram.
>
> One thing I do agree on, though, is minimizing the number of callbacks
> we have to call, to reduce the level of indirection (better performance?).
>
> Thanks,
> Stefano
>

Thanks!

Best,
Bobby

> >
> > Thanks, Arseniy
> >
> > > +
> > > + vsock_addr_init(vm_addr, cid, port);
> > > + msg->msg_namelen = sizeof(*vm_addr);
> > > + }
> > > + err = payload_len;
> > > +
> > > +out:
> > > + skb_free_datagram(&vsk->sk, skb);
> > > + return err;
> > > }
> > > EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
> > >
> > > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > > index 7cb1a9d2cdb4..ff6e87e25fa0 100644
> > > --- a/net/vmw_vsock/hyperv_transport.c
> > > +++ b/net/vmw_vsock/hyperv_transport.c
> > > @@ -556,8 +556,17 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> > > return -EOPNOTSUPP;
> > > }
> > >
> > > -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
> > > - size_t len, int flags)
> > > +static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> > > +{
> > > + return -EOPNOTSUPP;
> > > +}
> > > +
> > > +static int hvs_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> > > +{
> > > + return -EOPNOTSUPP;
> > > +}
> > > +
> > > +static int hvs_dgram_get_length(struct sk_buff *skb, size_t *len)
> > > {
> > > return -EOPNOTSUPP;
> > > }
> > > @@ -833,7 +842,9 @@ static struct vsock_transport hvs_transport = {
> > > .shutdown = hvs_shutdown,
> > >
> > > .dgram_bind = hvs_dgram_bind,
> > > - .dgram_dequeue = hvs_dgram_dequeue,
> > > + .dgram_get_cid = hvs_dgram_get_cid,
> > > + .dgram_get_port = hvs_dgram_get_port,
> > > + .dgram_get_length = hvs_dgram_get_length,
> > > .dgram_enqueue = hvs_dgram_enqueue,
> > > .dgram_allow = hvs_dgram_allow,
> > >
> > > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > > index e95df847176b..5763cdf13804 100644
> > > --- a/net/vmw_vsock/virtio_transport.c
> > > +++ b/net/vmw_vsock/virtio_transport.c
> > > @@ -429,9 +429,11 @@ static struct virtio_transport virtio_transport = {
> > > .cancel_pkt = virtio_transport_cancel_pkt,
> > >
> > > .dgram_bind = virtio_transport_dgram_bind,
> > > - .dgram_dequeue = virtio_transport_dgram_dequeue,
> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > .dgram_allow = virtio_transport_dgram_allow,
> > > + .dgram_get_cid = virtio_transport_dgram_get_cid,
> > > + .dgram_get_port = virtio_transport_dgram_get_port,
> > > + .dgram_get_length = virtio_transport_dgram_get_length,
> > >
> > > .stream_dequeue = virtio_transport_stream_dequeue,
> > > .stream_enqueue = virtio_transport_stream_enqueue,
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index b769fc258931..e6903c719964 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -797,6 +797,24 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > }
> > > EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> > >
> > > +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> > > +{
> > > + return -EOPNOTSUPP;
> > > +}
> > > +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid);
> > > +
> > > +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> > > +{
> > > + return -EOPNOTSUPP;
> > > +}
> > > +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port);
> > > +
> > > +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
> > > +{
> > > + return -EOPNOTSUPP;
> > > +}
> > > +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length);
> > > +
> > > bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > > {
> > > return false;
> > > diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> > > index b370070194fa..bbc63826bf48 100644
> > > --- a/net/vmw_vsock/vmci_transport.c
> > > +++ b/net/vmw_vsock/vmci_transport.c
> > > @@ -1731,57 +1731,40 @@ static int vmci_transport_dgram_enqueue(
> > > return err - sizeof(*dg);
> > > }
> > >
> > > -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
> > > - struct msghdr *msg, size_t len,
> > > - int flags)
> > > +static int vmci_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> > > {
> > > - int err;
> > > struct vmci_datagram *dg;
> > > - size_t payload_len;
> > > - struct sk_buff *skb;
> > >
> > > - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> > > - return -EOPNOTSUPP;
> > > + dg = (struct vmci_datagram *)skb->data;
> > > + if (!dg)
> > > + return -EINVAL;
> > >
> > > - /* Retrieve the head sk_buff from the socket's receive queue. */
> > > - err = 0;
> > > - skb = skb_recv_datagram(&vsk->sk, flags, &err);
> > > - if (!skb)
> > > - return err;
> > > + *cid = dg->src.context;
> > > + return 0;
> > > +}
> > > +
> > > +static int vmci_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> > > +{
> > > + struct vmci_datagram *dg;
> > >
> > > dg = (struct vmci_datagram *)skb->data;
> > > if (!dg)
> > > - /* err is 0, meaning we read zero bytes. */
> > > - goto out;
> > > -
> > > - payload_len = dg->payload_size;
> > > - /* Ensure the sk_buff matches the payload size claimed in the packet. */
> > > - if (payload_len != skb->len - sizeof(*dg)) {
> > > - err = -EINVAL;
> > > - goto out;
> > > - }
> > > + return -EINVAL;
> > >
> > > - if (payload_len > len) {
> > > - payload_len = len;
> > > - msg->msg_flags |= MSG_TRUNC;
> > > - }
> > > + *port = dg->src.resource;
> > > + return 0;
> > > +}
> > >
> > > - /* Place the datagram payload in the user's iovec. */
> > > - err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
> > > - if (err)
> > > - goto out;
> > > +static int vmci_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
> > > +{
> > > + struct vmci_datagram *dg;
> > >
> > > - if (msg->msg_name) {
> > > - /* Provide the address of the sender. */
> > > - DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> > > - vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
> > > - msg->msg_namelen = sizeof(*vm_addr);
> > > - }
> > > - err = payload_len;
> > > + dg = (struct vmci_datagram *)skb->data;
> > > + if (!dg)
> > > + return -EINVAL;
> > >
> > > -out:
> > > - skb_free_datagram(&vsk->sk, skb);
> > > - return err;
> > > + *len = dg->payload_size;
> > > + return 0;
> > > }
> > >
> > > static bool vmci_transport_dgram_allow(u32 cid, u32 port)
> > > @@ -2040,9 +2023,12 @@ static struct vsock_transport vmci_transport = {
> > > .release = vmci_transport_release,
> > > .connect = vmci_transport_connect,
> > > .dgram_bind = vmci_transport_dgram_bind,
> > > - .dgram_dequeue = vmci_transport_dgram_dequeue,
> > > .dgram_enqueue = vmci_transport_dgram_enqueue,
> > > .dgram_allow = vmci_transport_dgram_allow,
> > > + .dgram_get_cid = vmci_transport_dgram_get_cid,
> > > + .dgram_get_port = vmci_transport_dgram_get_port,
> > > + .dgram_get_length = vmci_transport_dgram_get_length,
> > > + .dgram_payload_offset = sizeof(struct vmci_datagram),
> > > .stream_dequeue = vmci_transport_stream_dequeue,
> > > .stream_enqueue = vmci_transport_stream_enqueue,
> > > .stream_has_data = vmci_transport_stream_has_data,
> > > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > > index 5c6360df1f31..2f3cabc79ee5 100644
> > > --- a/net/vmw_vsock/vsock_loopback.c
> > > +++ b/net/vmw_vsock/vsock_loopback.c
> > > @@ -62,9 +62,11 @@ static struct virtio_transport loopback_transport = {
> > > .cancel_pkt = vsock_loopback_cancel_pkt,
> > >
> > > .dgram_bind = virtio_transport_dgram_bind,
> > > - .dgram_dequeue = virtio_transport_dgram_dequeue,
> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > .dgram_allow = virtio_transport_dgram_allow,
> > > + .dgram_get_cid = virtio_transport_dgram_get_cid,
> > > + .dgram_get_port = virtio_transport_dgram_get_port,
> > > + .dgram_get_length = virtio_transport_dgram_get_length,
> > >
> > > .stream_dequeue = virtio_transport_stream_dequeue,
> > > .stream_enqueue = virtio_transport_stream_enqueue,
> > >
> >
>
> _______________________________________________
> Virtualization mailing list
> [email protected]
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

2023-06-22 23:53:12

by Bobby Eshleman

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue

On Sun, Jun 11, 2023 at 11:43:15PM +0300, Arseniy Krasnov wrote:
> Hello Bobby! Thanks for this patchset! Small comment below:
>
> On 10.06.2023 03:58, Bobby Eshleman wrote:
> > This commit drops the transport->dgram_dequeue callback and makes
> > vsock_dgram_recvmsg() generic. It also adds additional transport
> > callbacks for use by the generic vsock_dgram_recvmsg(), such as for
> > parsing skbs for CID/port which vary in format per transport.
> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
> > ---
> > drivers/vhost/vsock.c | 4 +-
> > include/linux/virtio_vsock.h | 3 ++
> > include/net/af_vsock.h | 13 ++++++-
> > net/vmw_vsock/af_vsock.c | 51 ++++++++++++++++++++++++-
> > net/vmw_vsock/hyperv_transport.c | 17 +++++++--
> > net/vmw_vsock/virtio_transport.c | 4 +-
> > net/vmw_vsock/virtio_transport_common.c | 18 +++++++++
> > net/vmw_vsock/vmci_transport.c | 68 +++++++++++++--------------------
> > net/vmw_vsock/vsock_loopback.c | 4 +-
> > 9 files changed, 132 insertions(+), 50 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index 6578db78f0ae..c8201c070b4b 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -410,9 +410,11 @@ static struct virtio_transport vhost_transport = {
> > .cancel_pkt = vhost_transport_cancel_pkt,
> >
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > - .dgram_dequeue = virtio_transport_dgram_dequeue,
> > .dgram_bind = virtio_transport_dgram_bind,
> > .dgram_allow = virtio_transport_dgram_allow,
> > + .dgram_get_cid = virtio_transport_dgram_get_cid,
> > + .dgram_get_port = virtio_transport_dgram_get_port,
> > + .dgram_get_length = virtio_transport_dgram_get_length,
> >
> > .stream_enqueue = virtio_transport_stream_enqueue,
> > .stream_dequeue = virtio_transport_stream_dequeue,
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index c58453699ee9..23521a318cf0 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -219,6 +219,9 @@ bool virtio_transport_stream_allow(u32 cid, u32 port);
> > int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > struct sockaddr_vm *addr);
> > bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> > +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
> > +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
> >
> > int virtio_transport_connect(struct vsock_sock *vsk);
> >
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index 0e7504a42925..7bedb9ee7e3e 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -120,11 +120,20 @@ struct vsock_transport {
> >
> > /* DGRAM. */
> > int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
> > - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> > - size_t len, int flags);
> > int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> > struct msghdr *, size_t len);
> > bool (*dgram_allow)(u32 cid, u32 port);
> > + int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid);
> > + int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port);
> > + int (*dgram_get_length)(struct sk_buff *skb, size_t *length);
> > +
> > + /* The number of bytes into the buffer at which the payload starts, as
> > + * first seen by the receiving socket layer. For example, if the
> > + * transport presets the skb pointers using skb_pull(sizeof(header))
> > + * then this would be zero, otherwise it would be the size of the
> > + * header.
> > + */
> > + const size_t dgram_payload_offset;
> >
> > /* STREAM. */
> > /* TODO: stream_bind() */
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index efb8a0937a13..ffb4dd8b6ea7 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -1271,11 +1271,15 @@ static int vsock_dgram_connect(struct socket *sock,
> > int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> > size_t len, int flags)
> > {
> > + const struct vsock_transport *transport;
> > #ifdef CONFIG_BPF_SYSCALL
> > const struct proto *prot;
> > #endif
> > struct vsock_sock *vsk;
> > + struct sk_buff *skb;
> > + size_t payload_len;
> > struct sock *sk;
> > + int err;
> >
> > sk = sock->sk;
> > vsk = vsock_sk(sk);
> > @@ -1286,7 +1290,52 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> > return prot->recvmsg(sk, msg, len, flags, NULL);
> > #endif
> >
> > - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
> > + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> > + return -EOPNOTSUPP;
> > +
> > + transport = vsk->transport;
> > +
> > + /* Retrieve the head sk_buff from the socket's receive queue. */
> > + err = 0;
> > + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
> > + if (!skb)
> > + return err;
> > +
> > + err = transport->dgram_get_length(skb, &payload_len);
> > + if (err)
> > + goto out;
> > +
> > + if (payload_len > len) {
> > + payload_len = len;
> > + msg->msg_flags |= MSG_TRUNC;
> > + }
> > +
> > + /* Place the datagram payload in the user's iovec. */
> > + err = skb_copy_datagram_msg(skb, transport->dgram_payload_offset, msg, payload_len);
> > + if (err)
> > + goto out;
> > +
> > + if (msg->msg_name) {
> > + /* Provide the address of the sender. */
> > + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> > + unsigned int cid, port;
> > +
> > + err = transport->dgram_get_cid(skb, &cid);
> > + if (err)
> > + goto out;
> > +
> > + err = transport->dgram_get_port(skb, &port);
> > + if (err)
> > + goto out;
>
> Maybe we can merge 'dgram_get_cid' and 'dgram_get_port' into a single callback? As far as I can see, this is
> the only place where both are used (correct me if I'm wrong), and logically both operate on the address:
> CID and port. E.g. something like dgram_get_cid_n_port().

I like this idea.

>
> Moreover, I'm not sure it's a good tradeoff: we remove the transport-specific callback for dgram receive,
> where we already have a 'msghdr' with both the data buffer and the buffer for 'sockaddr_vm', and instead
> add several new callbacks to the transports, like dgram_get_cid() and dgram_get_port(). I agree that each
> transport-specific callback would repeat the same copying logic, calling 'skb_copy_datagram_msg()' and
> filling the address with 'vsock_addr_init()', but this way we don't need to change the transports as much.
> For example, HyperV stays unchanged since it does not support SOCK_DGRAM, and for VMCI you just need to add
> the 'vsock_addr_init()' logic to its dgram dequeue callback.
>
> What do you think?
>
> Thanks, Arseniy
>

I tend to agree with your point here that adding this many callbacks is
not the big win in complexity reduction that we're hoping for.

I also agree with Stefano's assessment that having two near identical
implementations is not good either.

Hopefully having one simpler callback will bring the best of both
worlds?

Best,
Bobby

> > +
> > + vsock_addr_init(vm_addr, cid, port);
> > + msg->msg_namelen = sizeof(*vm_addr);
> > + }
> > + err = payload_len;
> > +
> > +out:
> > + skb_free_datagram(&vsk->sk, skb);
> > + return err;
> > }
> > EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
> >
> > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > index 7cb1a9d2cdb4..ff6e87e25fa0 100644
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -556,8 +556,17 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> > return -EOPNOTSUPP;
> > }
> >
> > -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
> > - size_t len, int flags)
> > +static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +
> > +static int hvs_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +
> > +static int hvs_dgram_get_length(struct sk_buff *skb, size_t *len)
> > {
> > return -EOPNOTSUPP;
> > }
> > @@ -833,7 +842,9 @@ static struct vsock_transport hvs_transport = {
> > .shutdown = hvs_shutdown,
> >
> > .dgram_bind = hvs_dgram_bind,
> > - .dgram_dequeue = hvs_dgram_dequeue,
> > + .dgram_get_cid = hvs_dgram_get_cid,
> > + .dgram_get_port = hvs_dgram_get_port,
> > + .dgram_get_length = hvs_dgram_get_length,
> > .dgram_enqueue = hvs_dgram_enqueue,
> > .dgram_allow = hvs_dgram_allow,
> >
> > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > index e95df847176b..5763cdf13804 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -429,9 +429,11 @@ static struct virtio_transport virtio_transport = {
> > .cancel_pkt = virtio_transport_cancel_pkt,
> >
> > .dgram_bind = virtio_transport_dgram_bind,
> > - .dgram_dequeue = virtio_transport_dgram_dequeue,
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> > + .dgram_get_cid = virtio_transport_dgram_get_cid,
> > + .dgram_get_port = virtio_transport_dgram_get_port,
> > + .dgram_get_length = virtio_transport_dgram_get_length,
> >
> > .stream_dequeue = virtio_transport_stream_dequeue,
> > .stream_enqueue = virtio_transport_stream_enqueue,
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index b769fc258931..e6903c719964 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -797,6 +797,24 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> >
> > +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid);
> > +
> > +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port);
> > +
> > +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length);
> > +
> > bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > {
> > return false;
> > diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> > index b370070194fa..bbc63826bf48 100644
> > --- a/net/vmw_vsock/vmci_transport.c
> > +++ b/net/vmw_vsock/vmci_transport.c
> > @@ -1731,57 +1731,40 @@ static int vmci_transport_dgram_enqueue(
> > return err - sizeof(*dg);
> > }
> >
> > -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
> > - struct msghdr *msg, size_t len,
> > - int flags)
> > +static int vmci_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> > {
> > - int err;
> > struct vmci_datagram *dg;
> > - size_t payload_len;
> > - struct sk_buff *skb;
> >
> > - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> > - return -EOPNOTSUPP;
> > + dg = (struct vmci_datagram *)skb->data;
> > + if (!dg)
> > + return -EINVAL;
> >
> > - /* Retrieve the head sk_buff from the socket's receive queue. */
> > - err = 0;
> > - skb = skb_recv_datagram(&vsk->sk, flags, &err);
> > - if (!skb)
> > - return err;
> > + *cid = dg->src.context;
> > + return 0;
> > +}
> > +
> > +static int vmci_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> > +{
> > + struct vmci_datagram *dg;
> >
> > dg = (struct vmci_datagram *)skb->data;
> > if (!dg)
> > - /* err is 0, meaning we read zero bytes. */
> > - goto out;
> > -
> > - payload_len = dg->payload_size;
> > - /* Ensure the sk_buff matches the payload size claimed in the packet. */
> > - if (payload_len != skb->len - sizeof(*dg)) {
> > - err = -EINVAL;
> > - goto out;
> > - }
> > + return -EINVAL;
> >
> > - if (payload_len > len) {
> > - payload_len = len;
> > - msg->msg_flags |= MSG_TRUNC;
> > - }
> > + *port = dg->src.resource;
> > + return 0;
> > +}
> >
> > - /* Place the datagram payload in the user's iovec. */
> > - err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
> > - if (err)
> > - goto out;
> > +static int vmci_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
> > +{
> > + struct vmci_datagram *dg;
> >
> > - if (msg->msg_name) {
> > - /* Provide the address of the sender. */
> > - DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> > - vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
> > - msg->msg_namelen = sizeof(*vm_addr);
> > - }
> > - err = payload_len;
> > + dg = (struct vmci_datagram *)skb->data;
> > + if (!dg)
> > + return -EINVAL;
> >
> > -out:
> > - skb_free_datagram(&vsk->sk, skb);
> > - return err;
> > + *len = dg->payload_size;
> > + return 0;
> > }
> >
> > static bool vmci_transport_dgram_allow(u32 cid, u32 port)
> > @@ -2040,9 +2023,12 @@ static struct vsock_transport vmci_transport = {
> > .release = vmci_transport_release,
> > .connect = vmci_transport_connect,
> > .dgram_bind = vmci_transport_dgram_bind,
> > - .dgram_dequeue = vmci_transport_dgram_dequeue,
> > .dgram_enqueue = vmci_transport_dgram_enqueue,
> > .dgram_allow = vmci_transport_dgram_allow,
> > + .dgram_get_cid = vmci_transport_dgram_get_cid,
> > + .dgram_get_port = vmci_transport_dgram_get_port,
> > + .dgram_get_length = vmci_transport_dgram_get_length,
> > + .dgram_payload_offset = sizeof(struct vmci_datagram),
> > .stream_dequeue = vmci_transport_stream_dequeue,
> > .stream_enqueue = vmci_transport_stream_enqueue,
> > .stream_has_data = vmci_transport_stream_has_data,
> > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > index 5c6360df1f31..2f3cabc79ee5 100644
> > --- a/net/vmw_vsock/vsock_loopback.c
> > +++ b/net/vmw_vsock/vsock_loopback.c
> > @@ -62,9 +62,11 @@ static struct virtio_transport loopback_transport = {
> > .cancel_pkt = vsock_loopback_cancel_pkt,
> >
> > .dgram_bind = virtio_transport_dgram_bind,
> > - .dgram_dequeue = virtio_transport_dgram_dequeue,
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> > + .dgram_get_cid = virtio_transport_dgram_get_cid,
> > + .dgram_get_port = virtio_transport_dgram_get_port,
> > + .dgram_get_length = virtio_transport_dgram_get_length,
> >
> > .stream_dequeue = virtio_transport_stream_dequeue,
> > .stream_enqueue = virtio_transport_stream_enqueue,
> >

2023-06-22 23:53:49

by Bobby Eshleman

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 4/8] vsock: make vsock bind reusable

On Mon, Jun 12, 2023 at 11:49:57AM +0200, Simon Horman wrote:
> On Sat, Jun 10, 2023 at 12:58:31AM +0000, Bobby Eshleman wrote:
> > This commit makes the bind table management functions in vsock usable
> > for different bind tables, for use by datagrams in a future patch.
> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
> > ---
> > net/vmw_vsock/af_vsock.c | 33 ++++++++++++++++++++++++++-------
> > 1 file changed, 26 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index ef86765f3765..7a3ca4270446 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -230,11 +230,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> > sock_put(&vsk->sk);
> > }
> >
> > -static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
> > + struct list_head *bind_table)
>
> Hi Bobby,
>
> This function seems to only be used in this file.
> Should it be static?

Oh good call, yep.

Thanks!
Bobby

2023-06-22 23:54:30

by Bobby Eshleman

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue

On Thu, Jun 22, 2023 at 10:23:26PM +0300, Arseniy Krasnov wrote:
>
>
> On 22.06.2023 17:51, Stefano Garzarella wrote:
> > On Sun, Jun 11, 2023 at 11:43:15PM +0300, Arseniy Krasnov wrote:
> >> Hello Bobby! Thanks for this patchset! Small comment below:
> >>
> >> On 10.06.2023 03:58, Bobby Eshleman wrote:
> >>> This commit drops the transport->dgram_dequeue callback and makes
> >>> vsock_dgram_recvmsg() generic. It also adds additional transport
> >>> callbacks for use by the generic vsock_dgram_recvmsg(), such as for
> >>> parsing skbs for CID/port which vary in format per transport.
> >>>
> >>> Signed-off-by: Bobby Eshleman <[email protected]>
> >>> ---
> >>>  drivers/vhost/vsock.c                   |  4 +-
> >>>  include/linux/virtio_vsock.h            |  3 ++
> >>>  include/net/af_vsock.h                  | 13 ++++++-
> >>>  net/vmw_vsock/af_vsock.c                | 51 ++++++++++++++++++++++++-
> >>>  net/vmw_vsock/hyperv_transport.c        | 17 +++++++--
> >>>  net/vmw_vsock/virtio_transport.c        |  4 +-
> >>>  net/vmw_vsock/virtio_transport_common.c | 18 +++++++++
> >>>  net/vmw_vsock/vmci_transport.c          | 68 +++++++++++++--------------------
> >>>  net/vmw_vsock/vsock_loopback.c          |  4 +-
> >>>  9 files changed, 132 insertions(+), 50 deletions(-)
> >>>
> >>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> >>> index 6578db78f0ae..c8201c070b4b 100644
> >>> --- a/drivers/vhost/vsock.c
> >>> +++ b/drivers/vhost/vsock.c
> >>> @@ -410,9 +410,11 @@ static struct virtio_transport vhost_transport = {
> >>>          .cancel_pkt               = vhost_transport_cancel_pkt,
> >>>
> >>>          .dgram_enqueue            = virtio_transport_dgram_enqueue,
> >>> -        .dgram_dequeue            = virtio_transport_dgram_dequeue,
> >>>          .dgram_bind               = virtio_transport_dgram_bind,
> >>>          .dgram_allow              = virtio_transport_dgram_allow,
> >>> +        .dgram_get_cid            = virtio_transport_dgram_get_cid,
> >>> +        .dgram_get_port           = virtio_transport_dgram_get_port,
> >>> +        .dgram_get_length         = virtio_transport_dgram_get_length,
> >>>
> >>>          .stream_enqueue           = virtio_transport_stream_enqueue,
> >>>          .stream_dequeue           = virtio_transport_stream_dequeue,
> >>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> >>> index c58453699ee9..23521a318cf0 100644
> >>> --- a/include/linux/virtio_vsock.h
> >>> +++ b/include/linux/virtio_vsock.h
> >>> @@ -219,6 +219,9 @@ bool virtio_transport_stream_allow(u32 cid, u32 port);
> >>>  int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> >>>                                  struct sockaddr_vm *addr);
> >>>  bool virtio_transport_dgram_allow(u32 cid, u32 port);
> >>> +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> >>> +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
> >>> +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
> >>>
> >>>  int virtio_transport_connect(struct vsock_sock *vsk);
> >>>
> >>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> >>> index 0e7504a42925..7bedb9ee7e3e 100644
> >>> --- a/include/net/af_vsock.h
> >>> +++ b/include/net/af_vsock.h
> >>> @@ -120,11 +120,20 @@ struct vsock_transport {
> >>>
> >>>      /* DGRAM. */
> >>>      int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
> >>> -    int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> >>> -                 size_t len, int flags);
> >>>      int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> >>>                  struct msghdr *, size_t len);
> >>>      bool (*dgram_allow)(u32 cid, u32 port);
> >>> +    int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid);
> >>> +    int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port);
> >>> +    int (*dgram_get_length)(struct sk_buff *skb, size_t *length);
> >>> +
> >>> +    /* The number of bytes into the buffer at which the payload starts, as
> >>> +     * first seen by the receiving socket layer. For example, if the
> >>> +     * transport presets the skb pointers using skb_pull(sizeof(header))
> >>> +     * then this would be zero, otherwise it would be the size of the
> >>> +     * header.
> >>> +     */
> >>> +    const size_t dgram_payload_offset;
> >>>
> >>>      /* STREAM. */
> >>>      /* TODO: stream_bind() */
> >>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> >>> index efb8a0937a13..ffb4dd8b6ea7 100644
> >>> --- a/net/vmw_vsock/af_vsock.c
> >>> +++ b/net/vmw_vsock/af_vsock.c
> >>> @@ -1271,11 +1271,15 @@ static int vsock_dgram_connect(struct socket *sock,
> >>>  int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> >>>              size_t len, int flags)
> >>>  {
> >>> +    const struct vsock_transport *transport;
> >>>  #ifdef CONFIG_BPF_SYSCALL
> >>>      const struct proto *prot;
> >>>  #endif
> >>>      struct vsock_sock *vsk;
> >>> +    struct sk_buff *skb;
> >>> +    size_t payload_len;
> >>>      struct sock *sk;
> >>> +    int err;
> >>>
> >>>      sk = sock->sk;
> >>>      vsk = vsock_sk(sk);
> >>> @@ -1286,7 +1290,52 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> >>>          return prot->recvmsg(sk, msg, len, flags, NULL);
> >>>  #endif
> >>>
> >>> -    return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
> >>> +    if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> >>> +        return -EOPNOTSUPP;
> >>> +
> >>> +    transport = vsk->transport;
> >>> +
> >>> +    /* Retrieve the head sk_buff from the socket's receive queue. */
> >>> +    err = 0;
> >>> +    skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
> >>> +    if (!skb)
> >>> +        return err;
> >>> +
> >>> +    err = transport->dgram_get_length(skb, &payload_len);
> >
> > What about ssize_t return value here?
> >
> > Or maybe a single callback that return both length and offset?
> >
> > .dgram_get_payload_info(skb, &payload_len, &payload_off)
>
> Just architectural question:
>
> Maybe we can avoid this callback for length? IIUC, the concept of the skbuff is that
> the current level of the network stack already has a pointer to its data ('skb->data') and
> the length of the payload ('skb->len'), both set by the previous stack handler - the transport
> in this case - so here we can just use 'skb->len' and that's all. There is no need to ask
> the lower level of the network stack for the payload length. I see that VMCI stores metadata
> with the payload in the 'data' buffer, but maybe it would be more correct to do 'skb_pull()'
> in vmci before inserting the skbuff into the socket's queue? In this case the dgram payload
> offset field could be removed from the transport.
>
> Thanks, Arseniy
>

I agree, I think introducing the skb_pull() in vmci would honestly be
best. Based on my reading of the code, I don't think it should break
anything if it is called just prior to sk_receive_skb().


Thanks,
Bobby

> >
> >>> +    if (err)
> >>> +        goto out;
> >>> +
> >>> +    if (payload_len > len) {
> >>> +        payload_len = len;
> >>> +        msg->msg_flags |= MSG_TRUNC;
> >>> +    }
> >>> +
> >>> +    /* Place the datagram payload in the user's iovec. */
> >>> +    err = skb_copy_datagram_msg(skb, transport->dgram_payload_offset, msg, payload_len);
> >>> +    if (err)
> >>> +        goto out;
> >>> +
> >>> +    if (msg->msg_name) {
> >>> +        /* Provide the address of the sender. */
> >>> +        DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> >>> +        unsigned int cid, port;
> >>> +
> >>> +        err = transport->dgram_get_cid(skb, &cid);
> >>> +        if (err)
> >>> +            goto out;
> >>> +
> >>> +        err = transport->dgram_get_port(skb, &port);
> >>> +        if (err)
> >>> +            goto out;
> >>
> >> Maybe we can merge 'dgram_get_cid' and 'dgram_get_port' to a single callback? Because I see that this is
> >> the only place where both are used (correct me if i'm wrong) and logically both operates with addresses:
> >> CID and port. E.g. something like that: dgram_get_cid_n_port().
> >
> > What about .dgram_addr_init(struct sk_buff *skb, struct sockaddr_vm *addr)
> > and the transport can set cid and port?
> >
> >>
> >> Moreover, I'm not sure, but is it a good tradeoff here to remove the transport-specific callback for dgram receive,
> >> where we already have a 'msghdr' with both the data buffer and a buffer for 'sockaddr_vm', and instead add
> >> several new fields (callbacks) to transports like dgram_get_cid() and dgram_get_port()? I agree that each transport-
> >> specific callback would contain the same copying logic, calling 'skb_copy_datagram_msg()' and filling the address
> >> with 'vsock_addr_init()', but in this case we wouldn't need to update the transports too much. For example, HyperV
> >> stays unchanged as it does not support SOCK_DGRAM. For VMCI you would just need to add the 'vsock_addr_init()' logic
> >> to its dgram dequeue callback.
> >>
> >> What do You think?
> >
> > Honestly, I'd rather avoid duplicate code than reduce changes in
> > transports that don't support dgram.
> >
> > One thing I do agree on, though, is minimizing the number of callbacks
> > we invoke, to reduce the amount of indirection (better performance?).
> >
> > Thanks,
> > Stefano
> >
> >>
> >> Thanks, Arseniy
> >>
> >>> +
> >>> +        vsock_addr_init(vm_addr, cid, port);
> >>> +        msg->msg_namelen = sizeof(*vm_addr);
> >>> +    }
> >>> +    err = payload_len;
> >>> +
> >>> +out:
> >>> +    skb_free_datagram(&vsk->sk, skb);
> >>> +    return err;
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
> >>>
> >>> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> >>> index 7cb1a9d2cdb4..ff6e87e25fa0 100644
> >>> --- a/net/vmw_vsock/hyperv_transport.c
> >>> +++ b/net/vmw_vsock/hyperv_transport.c
> >>> @@ -556,8 +556,17 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> >>>      return -EOPNOTSUPP;
> >>>  }
> >>>
> >>> -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
> >>> -                 size_t len, int flags)
> >>> +static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> >>> +{
> >>> +    return -EOPNOTSUPP;
> >>> +}
> >>> +
> >>> +static int hvs_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> >>> +{
> >>> +    return -EOPNOTSUPP;
> >>> +}
> >>> +
> >>> +static int hvs_dgram_get_length(struct sk_buff *skb, size_t *len)
> >>>  {
> >>>      return -EOPNOTSUPP;
> >>>  }
> >>> @@ -833,7 +842,9 @@ static struct vsock_transport hvs_transport = {
> >>>      .shutdown                 = hvs_shutdown,
> >>>
> >>>      .dgram_bind               = hvs_dgram_bind,
> >>> -    .dgram_dequeue            = hvs_dgram_dequeue,
> >>> +    .dgram_get_cid            = hvs_dgram_get_cid,
> >>> +    .dgram_get_port           = hvs_dgram_get_port,
> >>> +    .dgram_get_length         = hvs_dgram_get_length,
> >>>      .dgram_enqueue            = hvs_dgram_enqueue,
> >>>      .dgram_allow              = hvs_dgram_allow,
> >>>
> >>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> >>> index e95df847176b..5763cdf13804 100644
> >>> --- a/net/vmw_vsock/virtio_transport.c
> >>> +++ b/net/vmw_vsock/virtio_transport.c
> >>> @@ -429,9 +429,11 @@ static struct virtio_transport virtio_transport = {
> >>>          .cancel_pkt               = virtio_transport_cancel_pkt,
> >>>
> >>>          .dgram_bind               = virtio_transport_dgram_bind,
> >>> -        .dgram_dequeue            = virtio_transport_dgram_dequeue,
> >>>          .dgram_enqueue            = virtio_transport_dgram_enqueue,
> >>>          .dgram_allow              = virtio_transport_dgram_allow,
> >>> +        .dgram_get_cid            = virtio_transport_dgram_get_cid,
> >>> +        .dgram_get_port           = virtio_transport_dgram_get_port,
> >>> +        .dgram_get_length         = virtio_transport_dgram_get_length,
> >>>
> >>>          .stream_dequeue           = virtio_transport_stream_dequeue,
> >>>          .stream_enqueue           = virtio_transport_stream_enqueue,
> >>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >>> index b769fc258931..e6903c719964 100644
> >>> --- a/net/vmw_vsock/virtio_transport_common.c
> >>> +++ b/net/vmw_vsock/virtio_transport_common.c
> >>> @@ -797,6 +797,24 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> >>>
> >>> +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> >>> +{
> >>> +    return -EOPNOTSUPP;
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid);
> >>> +
> >>> +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> >>> +{
> >>> +    return -EOPNOTSUPP;
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port);
> >>> +
> >>> +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
> >>> +{
> >>> +    return -EOPNOTSUPP;
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length);
> >>> +
> >>>  bool virtio_transport_dgram_allow(u32 cid, u32 port)
> >>>  {
> >>>      return false;
> >>> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> >>> index b370070194fa..bbc63826bf48 100644
> >>> --- a/net/vmw_vsock/vmci_transport.c
> >>> +++ b/net/vmw_vsock/vmci_transport.c
> >>> @@ -1731,57 +1731,40 @@ static int vmci_transport_dgram_enqueue(
> >>>      return err - sizeof(*dg);
> >>>  }
> >>>
> >>> -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
> >>> -                    struct msghdr *msg, size_t len,
> >>> -                    int flags)
> >>> +static int vmci_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> >>>  {
> >>> -    int err;
> >>>      struct vmci_datagram *dg;
> >>> -    size_t payload_len;
> >>> -    struct sk_buff *skb;
> >>>
> >>> -    if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> >>> -        return -EOPNOTSUPP;
> >>> +    dg = (struct vmci_datagram *)skb->data;
> >>> +    if (!dg)
> >>> +        return -EINVAL;
> >>>
> >>> -    /* Retrieve the head sk_buff from the socket's receive queue. */
> >>> -    err = 0;
> >>> -    skb = skb_recv_datagram(&vsk->sk, flags, &err);
> >>> -    if (!skb)
> >>> -        return err;
> >>> +    *cid = dg->src.context;
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static int vmci_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> >>> +{
> >>> +    struct vmci_datagram *dg;
> >>>
> >>>      dg = (struct vmci_datagram *)skb->data;
> >>>      if (!dg)
> >>> -        /* err is 0, meaning we read zero bytes. */
> >>> -        goto out;
> >>> -
> >>> -    payload_len = dg->payload_size;
> >>> -    /* Ensure the sk_buff matches the payload size claimed in the packet. */
> >>> -    if (payload_len != skb->len - sizeof(*dg)) {
> >>> -        err = -EINVAL;
> >>> -        goto out;
> >>> -    }
> >>> +        return -EINVAL;
> >>>
> >>> -    if (payload_len > len) {
> >>> -        payload_len = len;
> >>> -        msg->msg_flags |= MSG_TRUNC;
> >>> -    }
> >>> +    *port = dg->src.resource;
> >>> +    return 0;
> >>> +}
> >>>
> >>> -    /* Place the datagram payload in the user's iovec. */
> >>> -    err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
> >>> -    if (err)
> >>> -        goto out;
> >>> +static int vmci_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
> >>> +{
> >>> +    struct vmci_datagram *dg;
> >>>
> >>> -    if (msg->msg_name) {
> >>> -        /* Provide the address of the sender. */
> >>> -        DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> >>> -        vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
> >>> -        msg->msg_namelen = sizeof(*vm_addr);
> >>> -    }
> >>> -    err = payload_len;
> >>> +    dg = (struct vmci_datagram *)skb->data;
> >>> +    if (!dg)
> >>> +        return -EINVAL;
> >>>
> >>> -out:
> >>> -    skb_free_datagram(&vsk->sk, skb);
> >>> -    return err;
> >>> +    *len = dg->payload_size;
> >>> +    return 0;
> >>>  }
> >>>
> >>>  static bool vmci_transport_dgram_allow(u32 cid, u32 port)
> >>> @@ -2040,9 +2023,12 @@ static struct vsock_transport vmci_transport = {
> >>>      .release = vmci_transport_release,
> >>>      .connect = vmci_transport_connect,
> >>>      .dgram_bind = vmci_transport_dgram_bind,
> >>> -    .dgram_dequeue = vmci_transport_dgram_dequeue,
> >>>      .dgram_enqueue = vmci_transport_dgram_enqueue,
> >>>      .dgram_allow = vmci_transport_dgram_allow,
> >>> +    .dgram_get_cid = vmci_transport_dgram_get_cid,
> >>> +    .dgram_get_port = vmci_transport_dgram_get_port,
> >>> +    .dgram_get_length = vmci_transport_dgram_get_length,
> >>> +    .dgram_payload_offset = sizeof(struct vmci_datagram),
> >>>      .stream_dequeue = vmci_transport_stream_dequeue,
> >>>      .stream_enqueue = vmci_transport_stream_enqueue,
> >>>      .stream_has_data = vmci_transport_stream_has_data,
> >>> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> >>> index 5c6360df1f31..2f3cabc79ee5 100644
> >>> --- a/net/vmw_vsock/vsock_loopback.c
> >>> +++ b/net/vmw_vsock/vsock_loopback.c
> >>> @@ -62,9 +62,11 @@ static struct virtio_transport loopback_transport = {
> >>>          .cancel_pkt               = vsock_loopback_cancel_pkt,
> >>>
> >>>          .dgram_bind               = virtio_transport_dgram_bind,
> >>> -        .dgram_dequeue            = virtio_transport_dgram_dequeue,
> >>>          .dgram_enqueue            = virtio_transport_dgram_enqueue,
> >>>          .dgram_allow              = virtio_transport_dgram_allow,
> >>> +        .dgram_get_cid            = virtio_transport_dgram_get_cid,
> >>> +        .dgram_get_port           = virtio_transport_dgram_get_port,
> >>> +        .dgram_get_length         = virtio_transport_dgram_get_length,
> >>>
> >>>          .stream_dequeue           = virtio_transport_stream_dequeue,
> >>>          .stream_enqueue           = virtio_transport_stream_enqueue,
> >>>
> >>
> >

2023-06-23 08:24:25


by Stefano Garzarella

Subject: Re: [PATCH RFC net-next v4 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue

On Thu, Jun 22, 2023 at 11:37:42PM +0000, Bobby Eshleman wrote:
>On Thu, Jun 22, 2023 at 04:51:41PM +0200, Stefano Garzarella wrote:
>> On Sun, Jun 11, 2023 at 11:43:15PM +0300, Arseniy Krasnov wrote:
>> > Hello Bobby! Thanks for this patchset! Small comment below:
>> >
>> > On 10.06.2023 03:58, Bobby Eshleman wrote:
>> > > This commit drops the transport->dgram_dequeue callback and makes
>> > > vsock_dgram_recvmsg() generic. It also adds additional transport
>> > > callbacks for use by the generic vsock_dgram_recvmsg(), such as for
>> > > parsing skbs for CID/port which vary in format per transport.
>> > >
>> > > Signed-off-by: Bobby Eshleman <[email protected]>
>> > > ---
>> > > drivers/vhost/vsock.c | 4 +-
>> > > include/linux/virtio_vsock.h | 3 ++
>> > > include/net/af_vsock.h | 13 ++++++-
>> > > net/vmw_vsock/af_vsock.c | 51 ++++++++++++++++++++++++-
>> > > net/vmw_vsock/hyperv_transport.c | 17 +++++++--
>> > > net/vmw_vsock/virtio_transport.c | 4 +-
>> > > net/vmw_vsock/virtio_transport_common.c | 18 +++++++++
>> > > net/vmw_vsock/vmci_transport.c | 68 +++++++++++++--------------------
>> > > net/vmw_vsock/vsock_loopback.c | 4 +-
>> > > 9 files changed, 132 insertions(+), 50 deletions(-)
>> > >
>> > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> > > index 6578db78f0ae..c8201c070b4b 100644
>> > > --- a/drivers/vhost/vsock.c
>> > > +++ b/drivers/vhost/vsock.c
>> > > @@ -410,9 +410,11 @@ static struct virtio_transport vhost_transport = {
>> > > .cancel_pkt = vhost_transport_cancel_pkt,
>> > >
>> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
>> > > - .dgram_dequeue = virtio_transport_dgram_dequeue,
>> > > .dgram_bind = virtio_transport_dgram_bind,
>> > > .dgram_allow = virtio_transport_dgram_allow,
>> > > + .dgram_get_cid = virtio_transport_dgram_get_cid,
>> > > + .dgram_get_port = virtio_transport_dgram_get_port,
>> > > + .dgram_get_length = virtio_transport_dgram_get_length,
>> > >
>> > > .stream_enqueue = virtio_transport_stream_enqueue,
>> > > .stream_dequeue = virtio_transport_stream_dequeue,
>> > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> > > index c58453699ee9..23521a318cf0 100644
>> > > --- a/include/linux/virtio_vsock.h
>> > > +++ b/include/linux/virtio_vsock.h
>> > > @@ -219,6 +219,9 @@ bool virtio_transport_stream_allow(u32 cid, u32 port);
>> > > int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>> > > struct sockaddr_vm *addr);
>> > > bool virtio_transport_dgram_allow(u32 cid, u32 port);
>> > > +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
>> > > +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
>> > > +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
>> > >
>> > > int virtio_transport_connect(struct vsock_sock *vsk);
>> > >
>> > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> > > index 0e7504a42925..7bedb9ee7e3e 100644
>> > > --- a/include/net/af_vsock.h
>> > > +++ b/include/net/af_vsock.h
>> > > @@ -120,11 +120,20 @@ struct vsock_transport {
>> > >
>> > > /* DGRAM. */
>> > > int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
>> > > - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
>> > > - size_t len, int flags);
>> > > int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
>> > > struct msghdr *, size_t len);
>> > > bool (*dgram_allow)(u32 cid, u32 port);
>> > > + int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid);
>> > > + int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port);
>> > > + int (*dgram_get_length)(struct sk_buff *skb, size_t *length);
>> > > +
>> > > + /* The number of bytes into the buffer at which the payload starts, as
>> > > + * first seen by the receiving socket layer. For example, if the
>> > > + * transport presets the skb pointers using skb_pull(sizeof(header))
>> > > + * then this would be zero, otherwise it would be the size of the
>> > > + * header.
>> > > + */
>> > > + const size_t dgram_payload_offset;
>> > >
>> > > /* STREAM. */
>> > > /* TODO: stream_bind() */
>> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> > > index efb8a0937a13..ffb4dd8b6ea7 100644
>> > > --- a/net/vmw_vsock/af_vsock.c
>> > > +++ b/net/vmw_vsock/af_vsock.c
>> > > @@ -1271,11 +1271,15 @@ static int vsock_dgram_connect(struct socket *sock,
>> > > int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>> > > size_t len, int flags)
>> > > {
>> > > + const struct vsock_transport *transport;
>> > > #ifdef CONFIG_BPF_SYSCALL
>> > > const struct proto *prot;
>> > > #endif
>> > > struct vsock_sock *vsk;
>> > > + struct sk_buff *skb;
>> > > + size_t payload_len;
>> > > struct sock *sk;
>> > > + int err;
>> > >
>> > > sk = sock->sk;
>> > > vsk = vsock_sk(sk);
>> > > @@ -1286,7 +1290,52 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>> > > return prot->recvmsg(sk, msg, len, flags, NULL);
>> > > #endif
>> > >
>> > > - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
>> > > + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
>> > > + return -EOPNOTSUPP;
>> > > +
>> > > + transport = vsk->transport;
>> > > +
>> > > + /* Retrieve the head sk_buff from the socket's receive queue. */
>> > > + err = 0;
>> > > + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
>> > > + if (!skb)
>> > > + return err;
>> > > +
>> > > + err = transport->dgram_get_length(skb, &payload_len);
>>
>> What about ssize_t return value here?
>>
>> Or maybe a single callback that return both length and offset?
>>
>> .dgram_get_payload_info(skb, &payload_len, &payload_off)
>>
>
>What are your thoughts on Arseniy's idea of using skb->len and adding a
>skb_pull() just before vmci adds the skb to the sk receive queue?

Yep, I agree on that!

Thanks,
Stefano


2023-06-23 08:25:25

by Stefano Garzarella

Subject: Re: [PATCH RFC net-next v4 4/8] vsock: make vsock bind reusable

On Thu, Jun 22, 2023 at 11:05:43PM +0000, Bobby Eshleman wrote:
>On Thu, Jun 22, 2023 at 05:25:55PM +0200, Stefano Garzarella wrote:
>> On Sat, Jun 10, 2023 at 12:58:31AM +0000, Bobby Eshleman wrote:
>> > This commit makes the bind table management functions in vsock usable
>> > for different bind tables. For use by datagrams in a future patch.
>> >
>> > Signed-off-by: Bobby Eshleman <[email protected]>
>> > ---
>> > net/vmw_vsock/af_vsock.c | 33 ++++++++++++++++++++++++++-------
>> > 1 file changed, 26 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> > index ef86765f3765..7a3ca4270446 100644
>> > --- a/net/vmw_vsock/af_vsock.c
>> > +++ b/net/vmw_vsock/af_vsock.c
>> > @@ -230,11 +230,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
>> > sock_put(&vsk->sk);
>> > }
>> >
>> > -static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>> > +struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
>> > + struct list_head *bind_table)
>> > {
>> > struct vsock_sock *vsk;
>> >
>> > - list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
>> > + list_for_each_entry(vsk, bind_table, bound_table) {
>> > if (vsock_addr_equals_addr(addr, &vsk->local_addr))
>> > return sk_vsock(vsk);
>> >
>> > @@ -247,6 +248,11 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>> > return NULL;
>> > }
>> >
>> > +static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>> > +{
>> > + return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
>> > +}
>> > +
>> > static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
>> > struct sockaddr_vm *dst)
>> > {
>> > @@ -646,12 +652,17 @@ static void vsock_pending_work(struct work_struct *work)
>> >
>> > /**** SOCKET OPERATIONS ****/
>> >
>> > -static int __vsock_bind_connectible(struct vsock_sock *vsk,
>> > - struct sockaddr_vm *addr)
>> > +static int vsock_bind_common(struct vsock_sock *vsk,
>> > + struct sockaddr_vm *addr,
>> > + struct list_head *bind_table,
>> > + size_t table_size)
>> > {
>> > static u32 port;
>> > struct sockaddr_vm new_addr;
>> >
>> > + if (table_size < VSOCK_HASH_SIZE)
>> > + return -1;
>>
>> Why we need this check now?
>>
>
>If the table_size is not at least VSOCK_HASH_SIZE then the
>VSOCK_HASH(addr) used later could overflow the table.
>
>Maybe this really deserves a WARN() and a comment?

Yes, please WARN_ONCE() should be enough.

Stefano


2023-06-23 18:53:12

by Arseniy Krasnov

Subject: Re: [PATCH RFC net-next v4 8/8] tests: add vsock dgram tests



On 23.06.2023 02:16, Bobby Eshleman wrote:
> On Sun, Jun 11, 2023 at 11:54:57PM +0300, Arseniy Krasnov wrote:
>> Hello Bobby!
>>
>> Sorry, maybe I'm becoming a little bit annoying :), but I tried to run vsock_test with
>> this v4 version, and again got the same crash:
>
> Haha not annoying at all. I appreciate the testing!
>
>>
>> # cat client.sh
>> ./vsock_test --mode=client --control-host=192.168.1.1 --control-port=12345 --peer-cid=2
>> # ./client.sh
>> Control socket connected to 192.168.1.1:12345.
>> 0 - SOCK_STREAM connection reset...[ 20.065237] BUG: kernel NULL pointer dereference, addre0
>> [ 20.065895] #PF: supervisor read access in kernel mode
>> [ 20.065895] #PF: error_code(0x0000) - not-present page
>> [ 20.065895] PGD 0 P4D 0
>> [ 20.065895] Oops: 0000 [#1] PREEMPT SMP PTI
>> [ 20.065895] CPU: 0 PID: 111 Comm: vsock_test Not tainted 6.4.0-rc3-gefcccba07069 #385
>> [ 20.065895] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd44
>> [ 20.065895] RIP: 0010:static_key_count+0x0/0x20
>> [ 20.065895] Code: 04 4c 8b 46 08 49 29 c0 4c 01 c8 4c 89 47 08 89 0e 89 56 04 48 89 46 08 f
>> [ 20.065895] RSP: 0018:ffffbbb000223dc0 EFLAGS: 00010202
>> [ 20.065895] RAX: ffffffff85709880 RBX: ffffffffc0079140 RCX: 0000000000000000
>> [ 20.065895] RDX: ffff9f73c2175700 RSI: 0000000000000000 RDI: 0000000000000000
>> [ 20.065895] RBP: ffff9f73c2385900 R08: ffffbbb000223d30 R09: ffff9f73ff896000
>> [ 20.065895] R10: 0000000000001000 R11: 0000000000000000 R12: ffffbbb000223e80
>> [ 20.065895] R13: 0000000000000000 R14: 0000000000000002 R15: ffff9f73c1cfaa80
>> [ 20.065895] FS: 00007f1ad82f55c0(0000) GS:ffff9f73fe400000(0000) knlGS:0000000000000000
>> [ 20.065895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 20.065895] CR2: 0000000000000000 CR3: 000000003f954000 CR4: 00000000000006f0
>> [ 20.065895] Call Trace:
>> [ 20.065895] <TASK>
>> [ 20.065895] once_deferred+0xd/0x30
>> [ 20.065895] vsock_assign_transport+0x9a/0x1b0 [vsock]
>> [ 20.065895] vsock_connect+0xb4/0x3a0 [vsock]
>> [ 20.065895] ? var_wake_function+0x60/0x60
>> [ 20.065895] __sys_connect+0x9e/0xd0
>> [ 20.065895] ? _raw_spin_unlock_irq+0xe/0x30
>> [ 20.065895] ? do_setitimer+0x128/0x1f0
>> [ 20.065895] ? alarm_setitimer+0x4c/0x90
>> [ 20.065895] ? fpregs_assert_state_consistent+0x1d/0x50
>> [ 20.065895] ? exit_to_user_mode_prepare+0x36/0x130
>> [ 20.065895] __x64_sys_connect+0x11/0x20
>> [ 20.065895] do_syscall_64+0x3b/0xc0
>> [ 20.065895] entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>> [ 20.065895] RIP: 0033:0x7f1ad822dd13
>> [ 20.065895] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 64 8
>> [ 20.065895] RSP: 002b:00007ffc513e3c98 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
>> [ 20.065895] RAX: ffffffffffffffda RBX: 000055aed298e020 RCX: 00007f1ad822dd13
>> [ 20.065895] RDX: 0000000000000010 RSI: 00007ffc513e3cb0 RDI: 0000000000000004
>> [ 20.065895] RBP: 0000000000000004 R08: 000055aed32b2018 R09: 0000000000000000
>> [ 20.065895] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>> [ 20.065895] R13: 000055aed298acb1 R14: 00007ffc513e3cb0 R15: 00007ffc513e3d40
>> [ 20.065895] </TASK>
>> [ 20.065895] Modules linked in: vsock_loopback vhost_vsock vmw_vsock_virtio_transport vmw_vb
>
> ^ I'm guessing this is the difference between our setups. I have been
> going all built-in, let me see if I can reproduce w/ modules...

Ah ok, I think using modules is good practice here, because it tests that your symbol
set works when built as modules, and 'rmmod' can catch problems with forgotten references,
for example to a socket or an skb. I'm working with modules most of the time.

>
>> [ 20.065895] CR2: 0000000000000000
>> [ 20.154060] ---[ end trace 0000000000000000 ]---
>> [ 20.155519] RIP: 0010:static_key_count+0x0/0x20
>> [ 20.156932] Code: 04 4c 8b 46 08 49 29 c0 4c 01 c8 4c 89 47 08 89 0e 89 56 04 48 89 46 08 f
>> [ 20.161367] RSP: 0018:ffffbbb000223dc0 EFLAGS: 00010202
>> [ 20.162613] RAX: ffffffff85709880 RBX: ffffffffc0079140 RCX: 0000000000000000
>> [ 20.164262] RDX: ffff9f73c2175700 RSI: 0000000000000000 RDI: 0000000000000000
>> [ 20.165934] RBP: ffff9f73c2385900 R08: ffffbbb000223d30 R09: ffff9f73ff896000
>> [ 20.167684] R10: 0000000000001000 R11: 0000000000000000 R12: ffffbbb000223e80
>> [ 20.169427] R13: 0000000000000000 R14: 0000000000000002 R15: ffff9f73c1cfaa80
>> [ 20.171109] FS: 00007f1ad82f55c0(0000) GS:ffff9f73fe400000(0000) knlGS:0000000000000000
>> [ 20.173000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 20.174381] CR2: 0000000000000000 CR3: 000000003f954000 CR4: 00000000000006f0
>>
>> So, what HEAD do you use? Maybe you have some specific config (I use x86-64 defconfig + vsock/vhost
>> related things)?
>>
>
> For this series I used net-next:
> 28cfea989d6f55c3d10608eba2a2bae609c5bf3e
>
>> Thanks, Arseniy
>>
>
> As always, thanks for the bug finding! I'll report back when I
> reproduce or with questions if I can't.

Thanks!

Thanks, Arseniy

>
> Best,
> Bobby
>
>>
>> On 10.06.2023 03:58, Bobby Eshleman wrote:
>>> From: Jiang Wang <[email protected]>
>>>
>>> This patch adds tests for vsock datagram.
>>>
>>> Signed-off-by: Bobby Eshleman <[email protected]>
>>> Signed-off-by: Jiang Wang <[email protected]>
>>> ---
>>> tools/testing/vsock/util.c | 141 ++++++++++++-
>>> tools/testing/vsock/util.h | 6 +
>>> tools/testing/vsock/vsock_test.c | 432 +++++++++++++++++++++++++++++++++++++++
>>> 3 files changed, 578 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
>>> index 01b636d3039a..811e70d7cf1e 100644
>>> --- a/tools/testing/vsock/util.c
>>> +++ b/tools/testing/vsock/util.c
>>> @@ -99,7 +99,8 @@ static int vsock_connect(unsigned int cid, unsigned int port, int type)
>>> int ret;
>>> int fd;
>>>
>>> - control_expectln("LISTENING");
>>> + if (type != SOCK_DGRAM)
>>> + control_expectln("LISTENING");
>>>
>>> fd = socket(AF_VSOCK, type, 0);
>>>
>>> @@ -130,6 +131,11 @@ int vsock_seqpacket_connect(unsigned int cid, unsigned int port)
>>> return vsock_connect(cid, port, SOCK_SEQPACKET);
>>> }
>>>
>>> +int vsock_dgram_connect(unsigned int cid, unsigned int port)
>>> +{
>>> + return vsock_connect(cid, port, SOCK_DGRAM);
>>> +}
>>> +
>>> /* Listen on <cid, port> and return the first incoming connection. The remote
>>> * address is stored to clientaddrp. clientaddrp may be NULL.
>>> */
>>> @@ -211,6 +217,34 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
>>> return vsock_accept(cid, port, clientaddrp, SOCK_SEQPACKET);
>>> }
>>>
>>> +int vsock_dgram_bind(unsigned int cid, unsigned int port)
>>> +{
>>> + union {
>>> + struct sockaddr sa;
>>> + struct sockaddr_vm svm;
>>> + } addr = {
>>> + .svm = {
>>> + .svm_family = AF_VSOCK,
>>> + .svm_port = port,
>>> + .svm_cid = cid,
>>> + },
>>> + };
>>> + int fd;
>>> +
>>> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>>> + if (fd < 0) {
>>> + perror("socket");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
>>> + perror("bind");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + return fd;
>>> +}
>>> +
>>> /* Transmit one byte and check the return value.
>>> *
>>> * expected_ret:
>>> @@ -260,6 +294,57 @@ void send_byte(int fd, int expected_ret, int flags)
>>> }
>>> }
>>>
>>> +/* Transmit one byte and check the return value.
>>> + *
>>> + * expected_ret:
>>> + * <0 Negative errno (for testing errors)
>>> + * 0 End-of-file
>>> + * 1 Success
>>> + */
>>> +void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
>>> + int flags)
>>> +{
>>> + const uint8_t byte = 'A';
>>> + ssize_t nwritten;
>>> +
>>> + timeout_begin(TIMEOUT);
>>> + do {
>>> + nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr,
>>> + len);
>>> + timeout_check("write");
>>> + } while (nwritten < 0 && errno == EINTR);
>>> + timeout_end();
>>> +
>>> + if (expected_ret < 0) {
>>> + if (nwritten != -1) {
>>> + fprintf(stderr, "bogus sendto(2) return value %zd\n",
>>> + nwritten);
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + if (errno != -expected_ret) {
>>> + perror("write");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + return;
>>> + }
>>> +
>>> + if (nwritten < 0) {
>>> + perror("write");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + if (nwritten == 0) {
>>> + if (expected_ret == 0)
>>> + return;
>>> +
>>> + fprintf(stderr, "unexpected EOF while sending byte\n");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + if (nwritten != sizeof(byte)) {
>>> + fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten);
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +}
>>> +
>>> /* Receive one byte and check the return value.
>>> *
>>> * expected_ret:
>>> @@ -313,6 +398,60 @@ void recv_byte(int fd, int expected_ret, int flags)
>>> }
>>> }
>>>
>>> +/* Receive one byte and check the return value.
>>> + *
>>> + * expected_ret:
>>> + * <0 Negative errno (for testing errors)
>>> + * 0 End-of-file
>>> + * 1 Success
>>> + */
>>> +void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
>>> + int expected_ret, int flags)
>>> +{
>>> + uint8_t byte;
>>> + ssize_t nread;
>>> +
>>> + timeout_begin(TIMEOUT);
>>> + do {
>>> + nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen);
>>> + timeout_check("read");
>>> + } while (nread < 0 && errno == EINTR);
>>> + timeout_end();
>>> +
>>> + if (expected_ret < 0) {
>>> + if (nread != -1) {
>>> + fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
>>> + nread);
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + if (errno != -expected_ret) {
>>> + perror("read");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + return;
>>> + }
>>> +
>>> + if (nread < 0) {
>>> + perror("read");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + if (nread == 0) {
>>> + if (expected_ret == 0)
>>> + return;
>>> +
>>> + fprintf(stderr, "unexpected EOF while receiving byte\n");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + if (nread != sizeof(byte)) {
>>> + fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread);
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + if (byte != 'A') {
>>> + fprintf(stderr, "unexpected byte read %c\n", byte);
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +}
>>> +
>>> /* Run test cases. The program terminates if a failure occurs. */
>>> void run_tests(const struct test_case *test_cases,
>>> const struct test_opts *opts)
>>> diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
>>> index fb99208a95ea..a69e128d120c 100644
>>> --- a/tools/testing/vsock/util.h
>>> +++ b/tools/testing/vsock/util.h
>>> @@ -37,13 +37,19 @@ void init_signals(void);
>>> unsigned int parse_cid(const char *str);
>>> int vsock_stream_connect(unsigned int cid, unsigned int port);
>>> int vsock_seqpacket_connect(unsigned int cid, unsigned int port);
>>> +int vsock_dgram_connect(unsigned int cid, unsigned int port);
>>> int vsock_stream_accept(unsigned int cid, unsigned int port,
>>> struct sockaddr_vm *clientaddrp);
>>> int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
>>> struct sockaddr_vm *clientaddrp);
>>> +int vsock_dgram_bind(unsigned int cid, unsigned int port);
>>> void vsock_wait_remote_close(int fd);
>>> void send_byte(int fd, int expected_ret, int flags);
>>> +void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
>>> + int flags);
>>> void recv_byte(int fd, int expected_ret, int flags);
>>> +void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
>>> + int expected_ret, int flags);
>>> void run_tests(const struct test_case *test_cases,
>>> const struct test_opts *opts);
>>> void list_tests(const struct test_case *test_cases);
>>> diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>>> index ac1bd3ac1533..ded82d39ee5d 100644
>>> --- a/tools/testing/vsock/vsock_test.c
>>> +++ b/tools/testing/vsock/vsock_test.c
>>> @@ -1053,6 +1053,413 @@ static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
>>> close(fd);
>>> }
>>>
>>> +static void test_dgram_sendto_client(const struct test_opts *opts)
>>> +{
>>> + union {
>>> + struct sockaddr sa;
>>> + struct sockaddr_vm svm;
>>> + } addr = {
>>> + .svm = {
>>> + .svm_family = AF_VSOCK,
>>> + .svm_port = 1234,
>>> + .svm_cid = opts->peer_cid,
>>> + },
>>> + };
>>> + int fd;
>>> +
>>> + /* Wait for the server to be ready */
>>> + control_expectln("BIND");
>>> +
>>> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>>> + if (fd < 0) {
>>> + perror("socket");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
>>> +
>>> + /* Notify the server that the client has finished */
>>> + control_writeln("DONE");
>>> +
>>> + close(fd);
>>> +}
>>> +
>>> +static void test_dgram_sendto_server(const struct test_opts *opts)
>>> +{
>>> + union {
>>> + struct sockaddr sa;
>>> + struct sockaddr_vm svm;
>>> + } addr = {
>>> + .svm = {
>>> + .svm_family = AF_VSOCK,
>>> + .svm_port = 1234,
>>> + .svm_cid = VMADDR_CID_ANY,
>>> + },
>>> + };
>>> + socklen_t len = sizeof(addr.svm);
>>> + int fd;
>>> +
>>> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>>> + if (fd < 0) {
>>> + perror("socket");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
>>> + perror("bind");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + /* Notify the client that the server is ready */
>>> + control_writeln("BIND");
>>> +
>>> + recvfrom_byte(fd, &addr.sa, &len, 1, 0);
>>> +
>>> + /* Wait for the client to finish */
>>> + control_expectln("DONE");
>>> +
>>> + close(fd);
>>> +}
>>> +
>>> +static void test_dgram_connect_client(const struct test_opts *opts)
>>> +{
>>> + union {
>>> + struct sockaddr sa;
>>> + struct sockaddr_vm svm;
>>> + } addr = {
>>> + .svm = {
>>> + .svm_family = AF_VSOCK,
>>> + .svm_port = 1234,
>>> + .svm_cid = opts->peer_cid,
>>> + },
>>> + };
>>> + int ret;
>>> + int fd;
>>> +
>>> + /* Wait for the server to be ready */
>>> + control_expectln("BIND");
>>> +
>>> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>>> + if (fd < 0) {
>>> + perror("socket");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + ret = connect(fd, &addr.sa, sizeof(addr.svm));
>>> + if (ret < 0) {
>>> + perror("connect");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + send_byte(fd, 1, 0);
>>> +
>>> + /* Notify the server that the client has finished */
>>> + control_writeln("DONE");
>>> +
>>> + close(fd);
>>> +}
>>> +
>>> +static void test_dgram_connect_server(const struct test_opts *opts)
>>> +{
>>> + test_dgram_sendto_server(opts);
>>> +}
>>> +
>>> +static void test_dgram_multiconn_sendto_client(const struct test_opts *opts)
>>> +{
>>> + union {
>>> + struct sockaddr sa;
>>> + struct sockaddr_vm svm;
>>> + } addr = {
>>> + .svm = {
>>> + .svm_family = AF_VSOCK,
>>> + .svm_port = 1234,
>>> + .svm_cid = opts->peer_cid,
>>> + },
>>> + };
>>> + int fds[MULTICONN_NFDS];
>>> + int i;
>>> +
>>> + /* Wait for the server to be ready */
>>> + control_expectln("BIND");
>>> +
>>> + for (i = 0; i < MULTICONN_NFDS; i++) {
>>> + fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
>>> + if (fds[i] < 0) {
>>> + perror("socket");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + }
>>> +
>>> + for (i = 0; i < MULTICONN_NFDS; i++)
>>> + sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
>>> +
>>> + /* Notify the server that the client has finished */
>>> + control_writeln("DONE");
>>> +
>>> + for (i = 0; i < MULTICONN_NFDS; i++)
>>> + close(fds[i]);
>>> +}
>>> +
>>> +static void test_dgram_multiconn_sendto_server(const struct test_opts *opts)
>>> +{
>>> + union {
>>> + struct sockaddr sa;
>>> + struct sockaddr_vm svm;
>>> + } addr = {
>>> + .svm = {
>>> + .svm_family = AF_VSOCK,
>>> + .svm_port = 1234,
>>> + .svm_cid = VMADDR_CID_ANY,
>>> + },
>>> + };
>>> + socklen_t len = sizeof(addr.svm);
>>> + int fd;
>>> + int i;
>>> +
>>> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>>> + if (fd < 0) {
>>> + perror("socket");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
>>> + perror("bind");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + /* Notify the client that the server is ready */
>>> + control_writeln("BIND");
>>> +
>>> + for (i = 0; i < MULTICONN_NFDS; i++)
>>> + recvfrom_byte(fd, &addr.sa, &len, 1, 0);
>>> +
>>> + /* Wait for the client to finish */
>>> + control_expectln("DONE");
>>> +
>>> + close(fd);
>>> +}
>>> +
>>> +static void test_dgram_multiconn_send_client(const struct test_opts *opts)
>>> +{
>>> + int fds[MULTICONN_NFDS];
>>> + int i;
>>> +
>>> + /* Wait for the server to be ready */
>>> + control_expectln("BIND");
>>> +
>>> + for (i = 0; i < MULTICONN_NFDS; i++) {
>>> + fds[i] = vsock_dgram_connect(opts->peer_cid, 1234);
>>> + if (fds[i] < 0) {
>>> + perror("socket");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + }
>>> +
>>> + for (i = 0; i < MULTICONN_NFDS; i++)
>>> + send_byte(fds[i], 1, 0);
>>> +
>>> + /* Notify the server that the client has finished */
>>> + control_writeln("DONE");
>>> +
>>> + for (i = 0; i < MULTICONN_NFDS; i++)
>>> + close(fds[i]);
>>> +}
>>> +
>>> +static void test_dgram_multiconn_send_server(const struct test_opts *opts)
>>> +{
>>> + union {
>>> + struct sockaddr sa;
>>> + struct sockaddr_vm svm;
>>> + } addr = {
>>> + .svm = {
>>> + .svm_family = AF_VSOCK,
>>> + .svm_port = 1234,
>>> + .svm_cid = VMADDR_CID_ANY,
>>> + },
>>> + };
>>> + int fd;
>>> + int i;
>>> +
>>> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>>> + if (fd < 0) {
>>> + perror("socket");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
>>> + perror("bind");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + /* Notify the client that the server is ready */
>>> + control_writeln("BIND");
>>> +
>>> + for (i = 0; i < MULTICONN_NFDS; i++)
>>> + recv_byte(fd, 1, 0);
>>> +
>>> + /* Wait for the client to finish */
>>> + control_expectln("DONE");
>>> +
>>> + close(fd);
>>> +}
>>> +
>>> +static void test_dgram_msg_bounds_client(const struct test_opts *opts)
>>> +{
>>> + unsigned long recv_buf_size;
>>> + int page_size;
>>> + int msg_cnt;
>>> + int fd;
>>> +
>>> + fd = vsock_dgram_connect(opts->peer_cid, 1234);
>>> + if (fd < 0) {
>>> + perror("connect");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + /* Let the server know the client is ready */
>>> + control_writeln("CLNTREADY");
>>> +
>>> + msg_cnt = control_readulong();
>>> + recv_buf_size = control_readulong();
>>> +
>>> + /* Wait until the receiver sets the buffer size. */
>>> + control_expectln("SRVREADY");
>>> +
>>> + page_size = getpagesize();
>>> +
>>> + for (int i = 0; i < msg_cnt; i++) {
>>> + unsigned long curr_hash;
>>> + ssize_t send_size;
>>> + size_t buf_size;
>>> + void *buf;
>>> +
>>> + /* Use "small" buffers and "big" buffers. */
>>> + if (i & 1)
>>> + buf_size = page_size +
>>> + (rand() % (MAX_MSG_SIZE - page_size));
>>> + else
>>> + buf_size = 1 + (rand() % page_size);
>>> +
>>> + buf_size = min(buf_size, recv_buf_size);
>>> +
>>> + buf = malloc(buf_size);
>>> +
>>> + if (!buf) {
>>> + perror("malloc");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + memset(buf, rand() & 0xff, buf_size);
>>> + /* Fill the buffer with a single random byte value. */
>>> +
>>> + send_size = send(fd, buf, buf_size, 0);
>>> +
>>> + if (send_size < 0) {
>>> + perror("send");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + if (send_size != buf_size) {
>>> + fprintf(stderr, "Invalid send size\n");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + /* In theory the implementation isn't required to transmit
>>> + * these packets in order, so we use this SYNC control message
>>> + * so that server and client coordinate sending and receiving
>>> + * one packet at a time. The client sends a packet and waits
>>> + * until it has been received before sending another.
>>> + */
>>> + control_writeln("PKTSENT");
>>> + control_expectln("PKTRECV");
>>> +
>>> + /* Send the server a hash of the packet */
>>> + curr_hash = hash_djb2(buf, buf_size);
>>> + control_writeulong(curr_hash);
>>> + free(buf);
>>> + }
>>> +
>>> + control_writeln("SENDDONE");
>>> + close(fd);
>>> +}
>>> +
>>> +static void test_dgram_msg_bounds_server(const struct test_opts *opts)
>>> +{
>>> + const unsigned long msg_cnt = 16;
>>> + unsigned long sock_buf_size;
>>> + struct msghdr msg = {0};
>>> + struct iovec iov = {0};
>>> + char buf[MAX_MSG_SIZE];
>>> + socklen_t len;
>>> + int fd;
>>> + int i;
>>> +
>>> + fd = vsock_dgram_bind(VMADDR_CID_ANY, 1234);
>>> +
>>> + if (fd < 0) {
>>> + perror("bind");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + /* Set receive buffer to maximum */
>>> + sock_buf_size = -1;
>>> + if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
>>> + &sock_buf_size, sizeof(sock_buf_size))) {
>>> + perror("setsockopt(SO_RCVBUF)");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + /* Retrieve the receive buffer size */
>>> + len = sizeof(sock_buf_size);
>>> + if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF,
>>> + &sock_buf_size, &len)) {
>>> + perror("getsockopt(SO_RCVBUF)");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + /* Client ready to receive parameters */
>>> + control_expectln("CLNTREADY");
>>> +
>>> + control_writeulong(msg_cnt);
>>> + control_writeulong(sock_buf_size);
>>> +
>>> + /* Ready to receive data. */
>>> + control_writeln("SRVREADY");
>>> +
>>> + iov.iov_base = buf;
>>> + iov.iov_len = sizeof(buf);
>>> + msg.msg_iov = &iov;
>>> + msg.msg_iovlen = 1;
>>> +
>>> + for (i = 0; i < msg_cnt; i++) {
>>> + unsigned long remote_hash;
>>> + unsigned long curr_hash;
>>> + ssize_t recv_size;
>>> +
>>> + control_expectln("PKTSENT");
>>> + recv_size = recvmsg(fd, &msg, 0);
>>> + control_writeln("PKTRECV");
>>> +
>>> + if (!recv_size)
>>> + break;
>>> +
>>> + if (recv_size < 0) {
>>> + perror("recvmsg");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> + curr_hash = hash_djb2(msg.msg_iov[0].iov_base, recv_size);
>>> + remote_hash = control_readulong();
>>> +
>>> + if (curr_hash != remote_hash) {
>>> + fprintf(stderr, "Message bounds broken\n");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + }
>>> +
>>> + close(fd);
>>> +}
>>> +
>>> static struct test_case test_cases[] = {
>>> {
>>> .name = "SOCK_STREAM connection reset",
>>> @@ -1128,6 +1535,31 @@ static struct test_case test_cases[] = {
>>> .run_client = test_stream_virtio_skb_merge_client,
>>> .run_server = test_stream_virtio_skb_merge_server,
>>> },
>>> + {
>>> + .name = "SOCK_DGRAM client sendto",
>>> + .run_client = test_dgram_sendto_client,
>>> + .run_server = test_dgram_sendto_server,
>>> + },
>>> + {
>>> + .name = "SOCK_DGRAM client connect",
>>> + .run_client = test_dgram_connect_client,
>>> + .run_server = test_dgram_connect_server,
>>> + },
>>> + {
>>> + .name = "SOCK_DGRAM multiple connections using sendto",
>>> + .run_client = test_dgram_multiconn_sendto_client,
>>> + .run_server = test_dgram_multiconn_sendto_server,
>>> + },
>>> + {
>>> + .name = "SOCK_DGRAM multiple connections using send",
>>> + .run_client = test_dgram_multiconn_send_client,
>>> + .run_server = test_dgram_multiconn_send_server,
>>> + },
>>> + {
>>> + .name = "SOCK_DGRAM msg bounds",
>>> + .run_client = test_dgram_msg_bounds_client,
>>> + .run_server = test_dgram_msg_bounds_server,
>>> + },
>>> {},
>>> };
>>>
>>>

2023-06-23 20:44:54

by Bobby Eshleman

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 3/8] vsock: support multi-transport datagrams

On Thu, Jun 22, 2023 at 05:19:08PM +0200, Stefano Garzarella wrote:
> On Sat, Jun 10, 2023 at 12:58:30AM +0000, Bobby Eshleman wrote:
> > This patch adds support for multi-transport datagrams.
> >
> > This includes:
> > - Per-packet lookup of transports when using sendto(sockaddr_vm)
> > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> > sockaddr_vm
> >
> > To preserve backwards compatibility with VMCI, some important changes
> > were made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
> > be used for dgrams iff there is not yet a g2h or h2g transport that has
>
> s/iff/if
>
> > been registered that can transmit the packet. If there is a g2h/h2g
> > transport for that remote address, then that transport will be used and
> > not "transport_dgram". This essentially makes "transport_dgram" a
> > fallback transport for when h2g/g2h has not yet gone online, which
> > appears to be the exact use case for VMCI.
> >
> > This design makes sense, because there is no reason that the
> > transport_{g2h,h2g} cannot also service datagrams, which makes the role
> > of transport_dgram difficult to understand outside of the VMCI context.
> >
> > The logic around "transport_dgram" had to be retained to prevent
> > breaking VMCI:
> >
> > 1) VMCI datagrams appear to function outside of the h2g/g2h
> > paradigm. When the vmci transport becomes online, it registers itself
> > with the DGRAM feature, but not H2G/G2H. Only later when the
> > transport has more information about its environment does it register
> > H2G or G2H. In the case that a datagram socket becomes active
> > after DGRAM registration but before G2H/H2G registration, the
> > "transport_dgram" transport needs to be used.
>
> IIRC we did this, because at that time only VMCI supported DGRAM. Now that
> there are more transports, maybe DGRAM can follow the h2g/g2h paradigm.
>

Totally makes sense. I'll add the detail above that the prior design was
a result of chronology.

> >
> > 2) VMCI seems to require a special message to be sent by the transport when a
> > datagram socket calls bind(). Under the h2g/g2h model, the transport
> > is selected using the remote_addr which is set by connect(). At
> > bind time there is no remote_addr because often no connect() has been
> > called yet: the transport is null. Therefore, with a null transport
> > there doesn't seem to be any good way for a datagram socket to tell the
> > VMCI transport that it has just had bind() called upon it.
>
> @Vishnu, @Bryan do you think we can avoid this in some way?
>
> >
> > Only transports with a special datagram fallback use-case such as VMCI
> > need to register VSOCK_TRANSPORT_F_DGRAM.
>
> Maybe we should rename it in VSOCK_TRANSPORT_F_DGRAM_FALLBACK or
> something like that.
>
> In any case, we definitely need to update the comment in
> include/net/af_vsock.h on top of VSOCK_TRANSPORT_F_DGRAM mentioning
> this.
>

Agreed. I'll rename to VSOCK_TRANSPORT_F_DGRAM_FALLBACK, unless we find
there is a better way altogether.

> >
> > Signed-off-by: Bobby Eshleman <[email protected]>
> > ---
> > drivers/vhost/vsock.c | 1 -
> > include/linux/virtio_vsock.h | 2 -
> > net/vmw_vsock/af_vsock.c | 78 +++++++++++++++++++++++++--------
> > net/vmw_vsock/hyperv_transport.c | 6 ---
> > net/vmw_vsock/virtio_transport.c | 1 -
> > net/vmw_vsock/virtio_transport_common.c | 7 ---
> > net/vmw_vsock/vsock_loopback.c | 1 -
> > 7 files changed, 60 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index c8201c070b4b..8f0082da5e70 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> > .cancel_pkt = vhost_transport_cancel_pkt,
> >
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > - .dgram_bind = virtio_transport_dgram_bind,
> > .dgram_allow = virtio_transport_dgram_allow,
> > .dgram_get_cid = virtio_transport_dgram_get_cid,
> > .dgram_get_port = virtio_transport_dgram_get_port,
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 23521a318cf0..73afa09f4585 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -216,8 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > bool virtio_transport_stream_allow(u32 cid, u32 port);
> > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > - struct sockaddr_vm *addr);
> > bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> > int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index 74358f0b47fa..ef86765f3765 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -438,6 +438,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> > return transport;
> > }
> >
> > +static const struct vsock_transport *
> > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> > +{
> > + const struct vsock_transport *transport;
> > +
> > + transport = vsock_connectible_lookup_transport(cid, flags);
> > + if (transport)
> > + return transport;
> > +
> > + return transport_dgram;
> > +}
> > +
> > /* Assign a transport to a socket and call the .init transport callback.
> > *
> > * Note: for connection oriented socket this must be called when vsk->remote_addr
> > @@ -474,7 +486,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> >
> > switch (sk->sk_type) {
> > case SOCK_DGRAM:
> > - new_transport = transport_dgram;
> > + new_transport = vsock_dgram_lookup_transport(remote_cid,
> > + remote_flags);
> > break;
> > case SOCK_STREAM:
> > case SOCK_SEQPACKET:
> > @@ -691,6 +704,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > struct sockaddr_vm *addr)
> > {
> > + if (!vsk->transport || !vsk->transport->dgram_bind)
> > + return -EINVAL;
> > +
> > return vsk->transport->dgram_bind(vsk, addr);
> > }
> >
> > @@ -1172,19 +1188,24 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> >
> > lock_sock(sk);
> >
> > - transport = vsk->transport;
> > -
> > - err = vsock_auto_bind(vsk);
> > - if (err)
> > - goto out;
> > -
> > -
> > /* If the provided message contains an address, use that. Otherwise
> > * fall back on the socket's remote handle (if it has been connected).
> > */
> > if (msg->msg_name &&
> > vsock_addr_cast(msg->msg_name, msg->msg_namelen,
> > &remote_addr) == 0) {
> > + transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
> > + remote_addr->svm_flags);
> > + if (!transport) {
> > + err = -EINVAL;
> > + goto out;
> > + }
> > +
> > + if (!try_module_get(transport->module)) {
> > + err = -ENODEV;
> > + goto out;
> > + }
> > +
> > /* Ensure this address is of the right type and is a valid
> > * destination.
> > */
> > @@ -1193,11 +1214,27 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > remote_addr->svm_cid = transport->get_local_cid();
> >
>
> From here ...
>
> > if (!vsock_addr_bound(remote_addr)) {
> > + module_put(transport->module);
> > + err = -EINVAL;
> > + goto out;
> > + }
> > +
> > + if (!transport->dgram_allow(remote_addr->svm_cid,
> > + remote_addr->svm_port)) {
> > + module_put(transport->module);
> > err = -EINVAL;
> > goto out;
> > }
> > +
> > + err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
>
> ... to here, looks like duplicate code, can we get it out of the if block?
>

Yes, I think using something like this:

[...]
bool module_got = false;

[...]
if (!try_module_get(transport->module)) {
err = -ENODEV;
goto out;
}
module_got = true;

[...]

out:
if (likely(module_got))
module_put(transport->module);

> > + module_put(transport->module);
> > } else if (sock->state == SS_CONNECTED) {
> > remote_addr = &vsk->remote_addr;
> > + transport = vsk->transport;
> > +
> > + err = vsock_auto_bind(vsk);
> > + if (err)
> > + goto out;
> >
> > if (remote_addr->svm_cid == VMADDR_CID_ANY)
> > remote_addr->svm_cid = transport->get_local_cid();
> > @@ -1205,23 +1242,23 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > /* XXX Should connect() or this function ensure remote_addr is
> > * bound?
> > */
> > - if (!vsock_addr_bound(&vsk->remote_addr)) {
> > + if (!vsock_addr_bound(remote_addr)) {
> > err = -EINVAL;
> > goto out;
> > }
> > - } else {
> > - err = -EINVAL;
> > - goto out;
> > - }
> >
> > - if (!transport->dgram_allow(remote_addr->svm_cid,
> > - remote_addr->svm_port)) {
> > + if (!transport->dgram_allow(remote_addr->svm_cid,
> > + remote_addr->svm_port)) {
> > + err = -EINVAL;
> > + goto out;
> > + }
> > +
> > + err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
> > + } else {
> > err = -EINVAL;
> > goto out;
> > }
> >
> > - err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
> > -
> > out:
> > release_sock(sk);
> > return err;
> > @@ -1255,13 +1292,18 @@ static int vsock_dgram_connect(struct socket *sock,
> > if (err)
> > goto out;
> >
> > + memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > +
> > + err = vsock_assign_transport(vsk, NULL);
> > + if (err)
> > + goto out;
> > +
> > if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> > remote_addr->svm_port)) {
> > err = -EINVAL;
> > goto out;
> > }
> >
> > - memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > sock->state = SS_CONNECTED;
> >
> > /* sock map disallows redirection of non-TCP sockets with sk_state !=
> > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > index ff6e87e25fa0..c00bc5da769a 100644
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
> > kfree(hvs);
> > }
> >
> > -static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> > -{
> > - return -EOPNOTSUPP;
> > -}
> > -
> > static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> > {
> > return -EOPNOTSUPP;
> > @@ -841,7 +836,6 @@ static struct vsock_transport hvs_transport = {
> > .connect = hvs_connect,
> > .shutdown = hvs_shutdown,
> >
> > - .dgram_bind = hvs_dgram_bind,
> > .dgram_get_cid = hvs_dgram_get_cid,
> > .dgram_get_port = hvs_dgram_get_port,
> > .dgram_get_length = hvs_dgram_get_length,
> > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > index 5763cdf13804..1b7843a7779a 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = {
> > .shutdown = virtio_transport_shutdown,
> > .cancel_pkt = virtio_transport_cancel_pkt,
> >
> > - .dgram_bind = virtio_transport_dgram_bind,
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> > .dgram_get_cid = virtio_transport_dgram_get_cid,
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index e6903c719964..d5a3c8efe84b 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -790,13 +790,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> >
> > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > - struct sockaddr_vm *addr)
> > -{
> > - return -EOPNOTSUPP;
> > -}
> > -EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> > -
> > int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> > {
> > return -EOPNOTSUPP;
> > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > index 2f3cabc79ee5..e9de45a26fbd 100644
> > --- a/net/vmw_vsock/vsock_loopback.c
> > +++ b/net/vmw_vsock/vsock_loopback.c
> > @@ -61,7 +61,6 @@ static struct virtio_transport loopback_transport = {
> > .shutdown = virtio_transport_shutdown,
> > .cancel_pkt = vsock_loopback_cancel_pkt,
> >
> > - .dgram_bind = virtio_transport_dgram_bind,
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> > .dgram_get_cid = virtio_transport_dgram_get_cid,
> >
> > --
> > 2.30.2
> >
>
> The rest LGTM!
>
> Stefano

Thanks,
Bobby

2023-06-23 20:55:51

by Bobby Eshleman

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v4 3/8] vsock: support multi-transport datagrams

On Fri, Jun 23, 2023 at 02:50:01AM +0000, Bobby Eshleman wrote:
> On Thu, Jun 22, 2023 at 05:19:08PM +0200, Stefano Garzarella wrote:
> > On Sat, Jun 10, 2023 at 12:58:30AM +0000, Bobby Eshleman wrote:
> > > This patch adds support for multi-transport datagrams.
> > >
> > > This includes:
> > > - Per-packet lookup of transports when using sendto(sockaddr_vm)
> > > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> > > sockaddr_vm
> > >
> > > To preserve backwards compatibility with VMCI, some important changes
> > > were made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
> > > be used for dgrams iff there is not yet a g2h or h2g transport that has
> >
> > s/iff/if
> >
> > > been registered that can transmit the packet. If there is a g2h/h2g
> > > transport for that remote address, then that transport will be used and
> > > not "transport_dgram". This essentially makes "transport_dgram" a
> > > fallback transport for when h2g/g2h has not yet gone online, which
> > > appears to be the exact use case for VMCI.
> > >
> > > This design makes sense, because there is no reason that the
> > > transport_{g2h,h2g} cannot also service datagrams, which makes the role
> > > of transport_dgram difficult to understand outside of the VMCI context.
> > >
> > > The logic around "transport_dgram" had to be retained to prevent
> > > breaking VMCI:
> > >
> > > 1) VMCI datagrams appear to function outside of the h2g/g2h
> > > paradigm. When the vmci transport becomes online, it registers itself
> > > with the DGRAM feature, but not H2G/G2H. Only later when the
> > > transport has more information about its environment does it register
> > > H2G or G2H. In the case that a datagram socket becomes active
> > > after DGRAM registration but before G2H/H2G registration, the
> > > "transport_dgram" transport needs to be used.
> >
> > IIRC we did this, because at that time only VMCI supported DGRAM. Now that
> > there are more transports, maybe DGRAM can follow the h2g/g2h paradigm.
> >
>
> Totally makes sense. I'll add the detail above that the prior design was
> a result of chronology.
>
> > >
> > > 2) VMCI seems to require a special message to be sent by the transport when a
> > > datagram socket calls bind(). Under the h2g/g2h model, the transport
> > > is selected using the remote_addr which is set by connect(). At
> > > bind time there is no remote_addr because often no connect() has been
> > > called yet: the transport is null. Therefore, with a null transport
> > > there doesn't seem to be any good way for a datagram socket to tell the
> > > VMCI transport that it has just had bind() called upon it.
> >
> > @Vishnu, @Bryan do you think we can avoid this in some way?
> >
> > >
> > > Only transports with a special datagram fallback use-case such as VMCI
> > > need to register VSOCK_TRANSPORT_F_DGRAM.
> >
> > Maybe we should rename it in VSOCK_TRANSPORT_F_DGRAM_FALLBACK or
> > something like that.
> >
> > In any case, we definitely need to update the comment in
> > include/net/af_vsock.h on top of VSOCK_TRANSPORT_F_DGRAM mentioning
> > this.
> >
>
> Agreed. I'll rename to VSOCK_TRANSPORT_F_DGRAM_FALLBACK, unless we find
> there is a better way altogether.
>
> > >
> > > Signed-off-by: Bobby Eshleman <[email protected]>
> > > ---
> > > drivers/vhost/vsock.c | 1 -
> > > include/linux/virtio_vsock.h | 2 -
> > > net/vmw_vsock/af_vsock.c | 78 +++++++++++++++++++++++++--------
> > > net/vmw_vsock/hyperv_transport.c | 6 ---
> > > net/vmw_vsock/virtio_transport.c | 1 -
> > > net/vmw_vsock/virtio_transport_common.c | 7 ---
> > > net/vmw_vsock/vsock_loopback.c | 1 -
> > > 7 files changed, 60 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > index c8201c070b4b..8f0082da5e70 100644
> > > --- a/drivers/vhost/vsock.c
> > > +++ b/drivers/vhost/vsock.c
> > > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> > > .cancel_pkt = vhost_transport_cancel_pkt,
> > >
> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > - .dgram_bind = virtio_transport_dgram_bind,
> > > .dgram_allow = virtio_transport_dgram_allow,
> > > .dgram_get_cid = virtio_transport_dgram_get_cid,
> > > .dgram_get_port = virtio_transport_dgram_get_port,
> > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > index 23521a318cf0..73afa09f4585 100644
> > > --- a/include/linux/virtio_vsock.h
> > > +++ b/include/linux/virtio_vsock.h
> > > @@ -216,8 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > > bool virtio_transport_stream_allow(u32 cid, u32 port);
> > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > - struct sockaddr_vm *addr);
> > > bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > > int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> > > int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > index 74358f0b47fa..ef86765f3765 100644
> > > --- a/net/vmw_vsock/af_vsock.c
> > > +++ b/net/vmw_vsock/af_vsock.c
> > > @@ -438,6 +438,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> > > return transport;
> > > }
> > >
> > > +static const struct vsock_transport *
> > > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> > > +{
> > > + const struct vsock_transport *transport;
> > > +
> > > + transport = vsock_connectible_lookup_transport(cid, flags);
> > > + if (transport)
> > > + return transport;
> > > +
> > > + return transport_dgram;
> > > +}
> > > +
> > > /* Assign a transport to a socket and call the .init transport callback.
> > > *
> > > * Note: for connection oriented socket this must be called when vsk->remote_addr
> > > @@ -474,7 +486,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> > >
> > > switch (sk->sk_type) {
> > > case SOCK_DGRAM:
> > > - new_transport = transport_dgram;
> > > + new_transport = vsock_dgram_lookup_transport(remote_cid,
> > > + remote_flags);
> > > break;
> > > case SOCK_STREAM:
> > > case SOCK_SEQPACKET:
> > > @@ -691,6 +704,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > > static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > > struct sockaddr_vm *addr)
> > > {
> > > + if (!vsk->transport || !vsk->transport->dgram_bind)
> > > + return -EINVAL;
> > > +
> > > return vsk->transport->dgram_bind(vsk, addr);
> > > }
> > >
> > > @@ -1172,19 +1188,24 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > >
> > > lock_sock(sk);
> > >
> > > - transport = vsk->transport;
> > > -
> > > - err = vsock_auto_bind(vsk);
> > > - if (err)
> > > - goto out;
> > > -
> > > -
> > > /* If the provided message contains an address, use that. Otherwise
> > > * fall back on the socket's remote handle (if it has been connected).
> > > */
> > > if (msg->msg_name &&
> > > vsock_addr_cast(msg->msg_name, msg->msg_namelen,
> > > &remote_addr) == 0) {
> > > + transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
> > > + remote_addr->svm_flags);
> > > + if (!transport) {
> > > + err = -EINVAL;
> > > + goto out;
> > > + }
> > > +
> > > + if (!try_module_get(transport->module)) {
> > > + err = -ENODEV;
> > > + goto out;
> > > + }
> > > +
> > > /* Ensure this address is of the right type and is a valid
> > > * destination.
> > > */
> > > @@ -1193,11 +1214,27 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > > remote_addr->svm_cid = transport->get_local_cid();
> > >
> >
> > From here ...
> >
> > > if (!vsock_addr_bound(remote_addr)) {
> > > + module_put(transport->module);
> > > + err = -EINVAL;
> > > + goto out;
> > > + }
> > > +
> > > + if (!transport->dgram_allow(remote_addr->svm_cid,
> > > + remote_addr->svm_port)) {
> > > + module_put(transport->module);
> > > err = -EINVAL;
> > > goto out;
> > > }
> > > +
> > > + err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
> >
> > ... to here, looks like duplicate code, can we get it out of the if block?
> >
>
> Yes, I think using something like this:
>
> [...]
> bool module_got = false;
>
> [...]
> if (!try_module_get(transport->module)) {
> err = -ENODEV;
> goto out;
> }
> module_got = true;
>
> [...]
>
> out:
> if (likely(transport && !err && module_got))

Actually, just...

if (module_got)

> module_put(transport->module)
>
> > > + module_put(transport->module);
> > > } else if (sock->state == SS_CONNECTED) {
> > > remote_addr = &vsk->remote_addr;
> > > + transport = vsk->transport;
> > > +
> > > + err = vsock_auto_bind(vsk);
> > > + if (err)
> > > + goto out;
> > >
> > > if (remote_addr->svm_cid == VMADDR_CID_ANY)
> > > remote_addr->svm_cid = transport->get_local_cid();
> > > @@ -1205,23 +1242,23 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > > /* XXX Should connect() or this function ensure remote_addr is
> > > * bound?
> > > */
> > > - if (!vsock_addr_bound(&vsk->remote_addr)) {
> > > + if (!vsock_addr_bound(remote_addr)) {
> > > err = -EINVAL;
> > > goto out;
> > > }
> > > - } else {
> > > - err = -EINVAL;
> > > - goto out;
> > > - }
> > >
> > > - if (!transport->dgram_allow(remote_addr->svm_cid,
> > > - remote_addr->svm_port)) {
> > > + if (!transport->dgram_allow(remote_addr->svm_cid,
> > > + remote_addr->svm_port)) {
> > > + err = -EINVAL;
> > > + goto out;
> > > + }
> > > +
> > > + err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
> > > + } else {
> > > err = -EINVAL;
> > > goto out;
> > > }
> > >
> > > - err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
> > > -
> > > out:
> > > release_sock(sk);
> > > return err;
> > > @@ -1255,13 +1292,18 @@ static int vsock_dgram_connect(struct socket *sock,
> > > if (err)
> > > goto out;
> > >
> > > + memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > > +
> > > + err = vsock_assign_transport(vsk, NULL);
> > > + if (err)
> > > + goto out;
> > > +
> > > if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> > > remote_addr->svm_port)) {
> > > err = -EINVAL;
> > > goto out;
> > > }
> > >
> > > - memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > > sock->state = SS_CONNECTED;
> > >
> > > /* sock map disallows redirection of non-TCP sockets with sk_state !=
> > > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > > index ff6e87e25fa0..c00bc5da769a 100644
> > > --- a/net/vmw_vsock/hyperv_transport.c
> > > +++ b/net/vmw_vsock/hyperv_transport.c
> > > @@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
> > > kfree(hvs);
> > > }
> > >
> > > -static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> > > -{
> > > - return -EOPNOTSUPP;
> > > -}
> > > -
> > > static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> > > {
> > > return -EOPNOTSUPP;
> > > @@ -841,7 +836,6 @@ static struct vsock_transport hvs_transport = {
> > > .connect = hvs_connect,
> > > .shutdown = hvs_shutdown,
> > >
> > > - .dgram_bind = hvs_dgram_bind,
> > > .dgram_get_cid = hvs_dgram_get_cid,
> > > .dgram_get_port = hvs_dgram_get_port,
> > > .dgram_get_length = hvs_dgram_get_length,
> > > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > > index 5763cdf13804..1b7843a7779a 100644
> > > --- a/net/vmw_vsock/virtio_transport.c
> > > +++ b/net/vmw_vsock/virtio_transport.c
> > > @@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = {
> > > .shutdown = virtio_transport_shutdown,
> > > .cancel_pkt = virtio_transport_cancel_pkt,
> > >
> > > - .dgram_bind = virtio_transport_dgram_bind,
> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > .dgram_allow = virtio_transport_dgram_allow,
> > > .dgram_get_cid = virtio_transport_dgram_get_cid,
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index e6903c719964..d5a3c8efe84b 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -790,13 +790,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> > > }
> > > EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> > >
> > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > - struct sockaddr_vm *addr)
> > > -{
> > > - return -EOPNOTSUPP;
> > > -}
> > > -EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> > > -
> > > int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> > > {
> > > return -EOPNOTSUPP;
> > > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > > index 2f3cabc79ee5..e9de45a26fbd 100644
> > > --- a/net/vmw_vsock/vsock_loopback.c
> > > +++ b/net/vmw_vsock/vsock_loopback.c
> > > @@ -61,7 +61,6 @@ static struct virtio_transport loopback_transport = {
> > > .shutdown = virtio_transport_shutdown,
> > > .cancel_pkt = vsock_loopback_cancel_pkt,
> > >
> > > - .dgram_bind = virtio_transport_dgram_bind,
> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > .dgram_allow = virtio_transport_dgram_allow,
> > > .dgram_get_cid = virtio_transport_dgram_get_cid,
> > >
> > > --
> > > 2.30.2
> > >
> >
> > The rest LGTM!
> >
> > Stefano
>
> Thanks,
> Bobby

2023-06-23 22:31:50

by Bobby Eshleman

Subject: Re: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams

On Thu, Jun 22, 2023 at 06:09:12PM +0200, Stefano Garzarella wrote:
> On Sun, Jun 11, 2023 at 11:49:02PM +0300, Arseniy Krasnov wrote:
> > Hello Bobby!
> >
> > On 10.06.2023 03:58, Bobby Eshleman wrote:
> > > This commit adds support for datagrams over virtio/vsock.
> > >
> > > Message boundaries are preserved on a per-skb and per-vq entry basis.
> >
> > I'm a little bit confused about the following case: suppose vhost sends a 4097-byte
> > datagram to the guest. The guest uses 4096-byte RX buffers in its virtio queue, each
> > buffer with an empty skb attached to it. Vhost places the first 4096 bytes into the first
> > buffer of the guest's RX queue, and the last byte into the second buffer. Now IIUC the
> > guest has two skbs in its rx queue, and a user in the guest wants to read data - does it
> > read 4097 bytes, while the guest has two skbs of 4096 bytes and 1 byte? In seqpacket there
> > is a special marker in the header which shows where the message ends - how does that work here?
>
> I think the main difference is that DGRAM is not connection-oriented, so
> we don't have a stream and we can't split the packet into 2 (maybe we
> could, but we have no guarantee that the second one, for example, will
> not be discarded because there is no space).
>
> So I think it is acceptable as a restriction to keep it simple.
>
> My only doubt is, should we make the RX buffer size configurable,
> instead of always using 4k?
>
I think that is a really good idea. What mechanism do you imagine?

For sendmsg() with buflen > VQ_BUF_SIZE, I think I'd like -ENOBUFS
returned even though it is uncharacteristic of Linux sockets.
Alternatively, silently dropping is okay... but seems needlessly
unhelpful.

FYI, this patch is broken for h2g because it requeues partially sent
skbs, so it probably doesn't need much code review until we decide on the
policy.

Best,
Bobby

2023-06-24 00:42:18

by Bobby Eshleman

Subject: Re: [PATCH RFC net-next v4 8/8] tests: add vsock dgram tests

On Fri, Jun 23, 2023 at 09:34:51PM +0300, Arseniy Krasnov wrote:
>
>
> On 23.06.2023 02:16, Bobby Eshleman wrote:
> > On Sun, Jun 11, 2023 at 11:54:57PM +0300, Arseniy Krasnov wrote:
> >> Hello Bobby!
> >>
> >> Sorry, maybe I'm becoming a little bit annoying :), but I tried to run vsock_test with
> >> this v4 version, and again got the same crash:
> >
> > Haha not annoying at all. I appreciate the testing!
> >
> >>
> >> # cat client.sh
> >> ./vsock_test --mode=client --control-host=192.168.1.1 --control-port=12345 --peer-cid=2
> >> # ./client.sh
> >> Control socket connected to 192.168.1.1:12345.
> >> 0 - SOCK_STREAM connection reset...[ 20.065237] BUG: kernel NULL pointer dereference, addre0
> >> [ 20.065895] #PF: supervisor read access in kernel mode
> >> [ 20.065895] #PF: error_code(0x0000) - not-present page
> >> [ 20.065895] PGD 0 P4D 0
> >> [ 20.065895] Oops: 0000 [#1] PREEMPT SMP PTI
> >> [ 20.065895] CPU: 0 PID: 111 Comm: vsock_test Not tainted 6.4.0-rc3-gefcccba07069 #385
> >> [ 20.065895] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd44
> >> [ 20.065895] RIP: 0010:static_key_count+0x0/0x20
> >> [ 20.065895] Code: 04 4c 8b 46 08 49 29 c0 4c 01 c8 4c 89 47 08 89 0e 89 56 04 48 89 46 08 f
> >> [ 20.065895] RSP: 0018:ffffbbb000223dc0 EFLAGS: 00010202
> >> [ 20.065895] RAX: ffffffff85709880 RBX: ffffffffc0079140 RCX: 0000000000000000
> >> [ 20.065895] RDX: ffff9f73c2175700 RSI: 0000000000000000 RDI: 0000000000000000
> >> [ 20.065895] RBP: ffff9f73c2385900 R08: ffffbbb000223d30 R09: ffff9f73ff896000
> >> [ 20.065895] R10: 0000000000001000 R11: 0000000000000000 R12: ffffbbb000223e80
> >> [ 20.065895] R13: 0000000000000000 R14: 0000000000000002 R15: ffff9f73c1cfaa80
> >> [ 20.065895] FS: 00007f1ad82f55c0(0000) GS:ffff9f73fe400000(0000) knlGS:0000000000000000
> >> [ 20.065895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 20.065895] CR2: 0000000000000000 CR3: 000000003f954000 CR4: 00000000000006f0
> >> [ 20.065895] Call Trace:
> >> [ 20.065895] <TASK>
> >> [ 20.065895] once_deferred+0xd/0x30
> >> [ 20.065895] vsock_assign_transport+0x9a/0x1b0 [vsock]
> >> [ 20.065895] vsock_connect+0xb4/0x3a0 [vsock]
> >> [ 20.065895] ? var_wake_function+0x60/0x60
> >> [ 20.065895] __sys_connect+0x9e/0xd0
> >> [ 20.065895] ? _raw_spin_unlock_irq+0xe/0x30
> >> [ 20.065895] ? do_setitimer+0x128/0x1f0
> >> [ 20.065895] ? alarm_setitimer+0x4c/0x90
> >> [ 20.065895] ? fpregs_assert_state_consistent+0x1d/0x50
> >> [ 20.065895] ? exit_to_user_mode_prepare+0x36/0x130
> >> [ 20.065895] __x64_sys_connect+0x11/0x20
> >> [ 20.065895] do_syscall_64+0x3b/0xc0
> >> [ 20.065895] entry_SYSCALL_64_after_hwframe+0x4b/0xb5
> >> [ 20.065895] RIP: 0033:0x7f1ad822dd13
> >> [ 20.065895] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 64 8
> >> [ 20.065895] RSP: 002b:00007ffc513e3c98 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
> >> [ 20.065895] RAX: ffffffffffffffda RBX: 000055aed298e020 RCX: 00007f1ad822dd13
> >> [ 20.065895] RDX: 0000000000000010 RSI: 00007ffc513e3cb0 RDI: 0000000000000004
> >> [ 20.065895] RBP: 0000000000000004 R08: 000055aed32b2018 R09: 0000000000000000
> >> [ 20.065895] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> >> [ 20.065895] R13: 000055aed298acb1 R14: 00007ffc513e3cb0 R15: 00007ffc513e3d40
> >> [ 20.065895] </TASK>
> >> [ 20.065895] Modules linked in: vsock_loopback vhost_vsock vmw_vsock_virtio_transport vmw_vb
> >
> > ^ I'm guessing this is the difference between our setups. I have been
> > going all built-in, let me see if I can reproduce w/ modules...
>
> Ah ok, I think using modules is good practice here, because it tests that your symbol
> set is valid when built as modules, and 'rmmod' can catch problems with forgotten
> references, for example to a socket or skb. I'm working with modules most of the time.
>

Agreed, definitely good practice. I test combinations of 'm' and 'y' for
the subsystem as part of a more rigorous testing process that I do when
I feel it is getting closer to losing the RFC tag. I'll definitely test
it for the rest of this series so you don't run into issues, though. You
may have just convinced me to change my environment around to use
modules by default...

Thanks,
Bobby

> >
> >> [ 20.065895] CR2: 0000000000000000
> >> [ 20.154060] ---[ end trace 0000000000000000 ]---
> >> [ 20.155519] RIP: 0010:static_key_count+0x0/0x20
> >> [ 20.156932] Code: 04 4c 8b 46 08 49 29 c0 4c 01 c8 4c 89 47 08 89 0e 89 56 04 48 89 46 08 f
> >> [ 20.161367] RSP: 0018:ffffbbb000223dc0 EFLAGS: 00010202
> >> [ 20.162613] RAX: ffffffff85709880 RBX: ffffffffc0079140 RCX: 0000000000000000
> >> [ 20.164262] RDX: ffff9f73c2175700 RSI: 0000000000000000 RDI: 0000000000000000
> >> [ 20.165934] RBP: ffff9f73c2385900 R08: ffffbbb000223d30 R09: ffff9f73ff896000
> >> [ 20.167684] R10: 0000000000001000 R11: 0000000000000000 R12: ffffbbb000223e80
> >> [ 20.169427] R13: 0000000000000000 R14: 0000000000000002 R15: ffff9f73c1cfaa80
> >> [ 20.171109] FS: 00007f1ad82f55c0(0000) GS:ffff9f73fe400000(0000) knlGS:0000000000000000
> >> [ 20.173000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 20.174381] CR2: 0000000000000000 CR3: 000000003f954000 CR4: 00000000000006f0
> >>
> >> So, what HEAD do you use? Maybe you have some specific config (I use the x86-64 defconfig
> >> + vsock/vhost related things)?
> >>
> >
> > For this series I used net-next:
> > 28cfea989d6f55c3d10608eba2a2bae609c5bf3e
> >
> >> Thanks, Arseniy
> >>
> >
> > As always, thanks for the bug finding! I'll report back when I
> > reproduce or with questions if I can't.
>
> Thanks!
>
> Thanks, Arseniy
>
> >
> > Best,
> > Bobby
> >
> >>
> >> On 10.06.2023 03:58, Bobby Eshleman wrote:
> >>> From: Jiang Wang <[email protected]>
> >>>
> >>> This patch adds tests for vsock datagram.
> >>>
> >>> Signed-off-by: Bobby Eshleman <[email protected]>
> >>> Signed-off-by: Jiang Wang <[email protected]>
> >>> ---
> >>> tools/testing/vsock/util.c | 141 ++++++++++++-
> >>> tools/testing/vsock/util.h | 6 +
> >>> tools/testing/vsock/vsock_test.c | 432 +++++++++++++++++++++++++++++++++++++++
> >>> 3 files changed, 578 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
> >>> index 01b636d3039a..811e70d7cf1e 100644
> >>> --- a/tools/testing/vsock/util.c
> >>> +++ b/tools/testing/vsock/util.c
> >>> @@ -99,7 +99,8 @@ static int vsock_connect(unsigned int cid, unsigned int port, int type)
> >>> int ret;
> >>> int fd;
> >>>
> >>> - control_expectln("LISTENING");
> >>> + if (type != SOCK_DGRAM)
> >>> + control_expectln("LISTENING");
> >>>
> >>> fd = socket(AF_VSOCK, type, 0);
> >>>
> >>> @@ -130,6 +131,11 @@ int vsock_seqpacket_connect(unsigned int cid, unsigned int port)
> >>> return vsock_connect(cid, port, SOCK_SEQPACKET);
> >>> }
> >>>
> >>> +int vsock_dgram_connect(unsigned int cid, unsigned int port)
> >>> +{
> >>> + return vsock_connect(cid, port, SOCK_DGRAM);
> >>> +}
> >>> +
> >>> /* Listen on <cid, port> and return the first incoming connection. The remote
> >>> * address is stored to clientaddrp. clientaddrp may be NULL.
> >>> */
> >>> @@ -211,6 +217,34 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
> >>> return vsock_accept(cid, port, clientaddrp, SOCK_SEQPACKET);
> >>> }
> >>>
> >>> +int vsock_dgram_bind(unsigned int cid, unsigned int port)
> >>> +{
> >>> + union {
> >>> + struct sockaddr sa;
> >>> + struct sockaddr_vm svm;
> >>> + } addr = {
> >>> + .svm = {
> >>> + .svm_family = AF_VSOCK,
> >>> + .svm_port = port,
> >>> + .svm_cid = cid,
> >>> + },
> >>> + };
> >>> + int fd;
> >>> +
> >>> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> >>> + if (fd < 0) {
> >>> + perror("socket");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> >>> + perror("bind");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + return fd;
> >>> +}
> >>> +
> >>> /* Transmit one byte and check the return value.
> >>> *
> >>> * expected_ret:
> >>> @@ -260,6 +294,57 @@ void send_byte(int fd, int expected_ret, int flags)
> >>> }
> >>> }
> >>>
> >>> +/* Transmit one byte and check the return value.
> >>> + *
> >>> + * expected_ret:
> >>> + * <0 Negative errno (for testing errors)
> >>> + * 0 End-of-file
> >>> + * 1 Success
> >>> + */
> >>> +void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
> >>> + int flags)
> >>> +{
> >>> + const uint8_t byte = 'A';
> >>> + ssize_t nwritten;
> >>> +
> >>> + timeout_begin(TIMEOUT);
> >>> + do {
> >>> + nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr,
> >>> + len);
> >>> + timeout_check("write");
> >>> + } while (nwritten < 0 && errno == EINTR);
> >>> + timeout_end();
> >>> +
> >>> + if (expected_ret < 0) {
> >>> + if (nwritten != -1) {
> >>> + fprintf(stderr, "bogus sendto(2) return value %zd\n",
> >>> + nwritten);
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> + if (errno != -expected_ret) {
> >>> + perror("write");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> + return;
> >>> + }
> >>> +
> >>> + if (nwritten < 0) {
> >>> + perror("write");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> + if (nwritten == 0) {
> >>> + if (expected_ret == 0)
> >>> + return;
> >>> +
> >>> + fprintf(stderr, "unexpected EOF while sending byte\n");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> + if (nwritten != sizeof(byte)) {
> >>> + fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten);
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +}
> >>> +
> >>> /* Receive one byte and check the return value.
> >>> *
> >>> * expected_ret:
> >>> @@ -313,6 +398,60 @@ void recv_byte(int fd, int expected_ret, int flags)
> >>> }
> >>> }
> >>>
> >>> +/* Receive one byte and check the return value.
> >>> + *
> >>> + * expected_ret:
> >>> + * <0 Negative errno (for testing errors)
> >>> + * 0 End-of-file
> >>> + * 1 Success
> >>> + */
> >>> +void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
> >>> + int expected_ret, int flags)
> >>> +{
> >>> + uint8_t byte;
> >>> + ssize_t nread;
> >>> +
> >>> + timeout_begin(TIMEOUT);
> >>> + do {
> >>> + nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen);
> >>> + timeout_check("read");
> >>> + } while (nread < 0 && errno == EINTR);
> >>> + timeout_end();
> >>> +
> >>> + if (expected_ret < 0) {
> >>> + if (nread != -1) {
> >>> + fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
> >>> + nread);
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> + if (errno != -expected_ret) {
> >>> + perror("read");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> + return;
> >>> + }
> >>> +
> >>> + if (nread < 0) {
> >>> + perror("read");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> + if (nread == 0) {
> >>> + if (expected_ret == 0)
> >>> + return;
> >>> +
> >>> + fprintf(stderr, "unexpected EOF while receiving byte\n");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> + if (nread != sizeof(byte)) {
> >>> + fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread);
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> + if (byte != 'A') {
> >>> + fprintf(stderr, "unexpected byte read %c\n", byte);
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +}
> >>> +
> >>> /* Run test cases. The program terminates if a failure occurs. */
> >>> void run_tests(const struct test_case *test_cases,
> >>> const struct test_opts *opts)
> >>> diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
> >>> index fb99208a95ea..a69e128d120c 100644
> >>> --- a/tools/testing/vsock/util.h
> >>> +++ b/tools/testing/vsock/util.h
> >>> @@ -37,13 +37,19 @@ void init_signals(void);
> >>> unsigned int parse_cid(const char *str);
> >>> int vsock_stream_connect(unsigned int cid, unsigned int port);
> >>> int vsock_seqpacket_connect(unsigned int cid, unsigned int port);
> >>> +int vsock_dgram_connect(unsigned int cid, unsigned int port);
> >>> int vsock_stream_accept(unsigned int cid, unsigned int port,
> >>> struct sockaddr_vm *clientaddrp);
> >>> int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
> >>> struct sockaddr_vm *clientaddrp);
> >>> +int vsock_dgram_bind(unsigned int cid, unsigned int port);
> >>> void vsock_wait_remote_close(int fd);
> >>> void send_byte(int fd, int expected_ret, int flags);
> >>> +void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
> >>> + int flags);
> >>> void recv_byte(int fd, int expected_ret, int flags);
> >>> +void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
> >>> + int expected_ret, int flags);
> >>> void run_tests(const struct test_case *test_cases,
> >>> const struct test_opts *opts);
> >>> void list_tests(const struct test_case *test_cases);
> >>> diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
> >>> index ac1bd3ac1533..ded82d39ee5d 100644
> >>> --- a/tools/testing/vsock/vsock_test.c
> >>> +++ b/tools/testing/vsock/vsock_test.c
> >>> @@ -1053,6 +1053,413 @@ static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
> >>> close(fd);
> >>> }
> >>>
> >>> +static void test_dgram_sendto_client(const struct test_opts *opts)
> >>> +{
> >>> + union {
> >>> + struct sockaddr sa;
> >>> + struct sockaddr_vm svm;
> >>> + } addr = {
> >>> + .svm = {
> >>> + .svm_family = AF_VSOCK,
> >>> + .svm_port = 1234,
> >>> + .svm_cid = opts->peer_cid,
> >>> + },
> >>> + };
> >>> + int fd;
> >>> +
> >>> + /* Wait for the server to be ready */
> >>> + control_expectln("BIND");
> >>> +
> >>> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> >>> + if (fd < 0) {
> >>> + perror("socket");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
> >>> +
> >>> + /* Notify the server that the client has finished */
> >>> + control_writeln("DONE");
> >>> +
> >>> + close(fd);
> >>> +}
> >>> +
> >>> +static void test_dgram_sendto_server(const struct test_opts *opts)
> >>> +{
> >>> + union {
> >>> + struct sockaddr sa;
> >>> + struct sockaddr_vm svm;
> >>> + } addr = {
> >>> + .svm = {
> >>> + .svm_family = AF_VSOCK,
> >>> + .svm_port = 1234,
> >>> + .svm_cid = VMADDR_CID_ANY,
> >>> + },
> >>> + };
> >>> + int len = sizeof(addr.sa);
> >>> + int fd;
> >>> +
> >>> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> >>> + if (fd < 0) {
> >>> + perror("socket");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> >>> + perror("bind");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + /* Notify the client that the server is ready */
> >>> + control_writeln("BIND");
> >>> +
> >>> + recvfrom_byte(fd, &addr.sa, &len, 1, 0);
> >>> +
> >>> + /* Wait for the client to finish */
> >>> + control_expectln("DONE");
> >>> +
> >>> + close(fd);
> >>> +}
> >>> +
> >>> +static void test_dgram_connect_client(const struct test_opts *opts)
> >>> +{
> >>> + union {
> >>> + struct sockaddr sa;
> >>> + struct sockaddr_vm svm;
> >>> + } addr = {
> >>> + .svm = {
> >>> + .svm_family = AF_VSOCK,
> >>> + .svm_port = 1234,
> >>> + .svm_cid = opts->peer_cid,
> >>> + },
> >>> + };
> >>> + int ret;
> >>> + int fd;
> >>> +
> >>> + /* Wait for the server to be ready */
> >>> + control_expectln("BIND");
> >>> +
> >>> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> >>> + if (fd < 0) {
> >>> + perror("bind");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + ret = connect(fd, &addr.sa, sizeof(addr.svm));
> >>> + if (ret < 0) {
> >>> + perror("connect");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + send_byte(fd, 1, 0);
> >>> +
> >>> + /* Notify the server that the client has finished */
> >>> + control_writeln("DONE");
> >>> +
> >>> + close(fd);
> >>> +}
> >>> +
> >>> +static void test_dgram_connect_server(const struct test_opts *opts)
> >>> +{
> >>> + test_dgram_sendto_server(opts);
> >>> +}
> >>> +
> >>> +static void test_dgram_multiconn_sendto_client(const struct test_opts *opts)
> >>> +{
> >>> + union {
> >>> + struct sockaddr sa;
> >>> + struct sockaddr_vm svm;
> >>> + } addr = {
> >>> + .svm = {
> >>> + .svm_family = AF_VSOCK,
> >>> + .svm_port = 1234,
> >>> + .svm_cid = opts->peer_cid,
> >>> + },
> >>> + };
> >>> + int fds[MULTICONN_NFDS];
> >>> + int i;
> >>> +
> >>> + /* Wait for the server to be ready */
> >>> + control_expectln("BIND");
> >>> +
> >>> + for (i = 0; i < MULTICONN_NFDS; i++) {
> >>> + fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
> >>> + if (fds[i] < 0) {
> >>> + perror("socket");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> + }
> >>> +
> >>> + for (i = 0; i < MULTICONN_NFDS; i++)
> >>> + sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
> >>> +
> >>> + /* Notify the server that the client has finished */
> >>> + control_writeln("DONE");
> >>> +
> >>> + for (i = 0; i < MULTICONN_NFDS; i++)
> >>> + close(fds[i]);
> >>> +}
> >>> +
> >>> +static void test_dgram_multiconn_sendto_server(const struct test_opts *opts)
> >>> +{
> >>> + union {
> >>> + struct sockaddr sa;
> >>> + struct sockaddr_vm svm;
> >>> + } addr = {
> >>> + .svm = {
> >>> + .svm_family = AF_VSOCK,
> >>> + .svm_port = 1234,
> >>> + .svm_cid = VMADDR_CID_ANY,
> >>> + },
> >>> + };
> >>> + int len = sizeof(addr.sa);
> >>> + int fd;
> >>> + int i;
> >>> +
> >>> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> >>> + if (fd < 0) {
> >>> + perror("socket");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> >>> + perror("bind");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + /* Notify the client that the server is ready */
> >>> + control_writeln("BIND");
> >>> +
> >>> + for (i = 0; i < MULTICONN_NFDS; i++)
> >>> + recvfrom_byte(fd, &addr.sa, &len, 1, 0);
> >>> +
> >>> + /* Wait for the client to finish */
> >>> + control_expectln("DONE");
> >>> +
> >>> + close(fd);
> >>> +}
> >>> +
> >>> +static void test_dgram_multiconn_send_client(const struct test_opts *opts)
> >>> +{
> >>> + int fds[MULTICONN_NFDS];
> >>> + int i;
> >>> +
> >>> + /* Wait for the server to be ready */
> >>> + control_expectln("BIND");
> >>> +
> >>> + for (i = 0; i < MULTICONN_NFDS; i++) {
> >>> + fds[i] = vsock_dgram_connect(opts->peer_cid, 1234);
> >>> + if (fds[i] < 0) {
> >>> + perror("socket");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> + }
> >>> +
> >>> + for (i = 0; i < MULTICONN_NFDS; i++)
> >>> + send_byte(fds[i], 1, 0);
> >>> +
> >>> + /* Notify the server that the client has finished */
> >>> + control_writeln("DONE");
> >>> +
> >>> + for (i = 0; i < MULTICONN_NFDS; i++)
> >>> + close(fds[i]);
> >>> +}
> >>> +
> >>> +static void test_dgram_multiconn_send_server(const struct test_opts *opts)
> >>> +{
> >>> + union {
> >>> + struct sockaddr sa;
> >>> + struct sockaddr_vm svm;
> >>> + } addr = {
> >>> + .svm = {
> >>> + .svm_family = AF_VSOCK,
> >>> + .svm_port = 1234,
> >>> + .svm_cid = VMADDR_CID_ANY,
> >>> + },
> >>> + };
> >>> + int fd;
> >>> + int i;
> >>> +
> >>> + fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> >>> + if (fd < 0) {
> >>> + perror("socket");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> >>> + perror("bind");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + /* Notify the client that the server is ready */
> >>> + control_writeln("BIND");
> >>> +
> >>> + for (i = 0; i < MULTICONN_NFDS; i++)
> >>> + recv_byte(fd, 1, 0);
> >>> +
> >>> + /* Wait for the client to finish */
> >>> + control_expectln("DONE");
> >>> +
> >>> + close(fd);
> >>> +}
> >>> +
> >>> +static void test_dgram_msg_bounds_client(const struct test_opts *opts)
> >>> +{
> >>> + unsigned long recv_buf_size;
> >>> + int page_size;
> >>> + int msg_cnt;
> >>> + int fd;
> >>> +
> >>> + fd = vsock_dgram_connect(opts->peer_cid, 1234);
> >>> + if (fd < 0) {
> >>> + perror("connect");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + /* Let the server know the client is ready */
> >>> + control_writeln("CLNTREADY");
> >>> +
> >>> + msg_cnt = control_readulong();
> >>> + recv_buf_size = control_readulong();
> >>> +
> >>> + /* Wait, until receiver sets buffer size. */
> >>> + control_expectln("SRVREADY");
> >>> +
> >>> + page_size = getpagesize();
> >>> +
> >>> + for (int i = 0; i < msg_cnt; i++) {
> >>> + unsigned long curr_hash;
> >>> + ssize_t send_size;
> >>> + size_t buf_size;
> >>> + void *buf;
> >>> +
> >>> + /* Use "small" buffers and "big" buffers. */
> >>> + if (i & 1)
> >>> + buf_size = page_size +
> >>> + (rand() % (MAX_MSG_SIZE - page_size));
> >>> + else
> >>> + buf_size = 1 + (rand() % page_size);
> >>> +
> >>> + buf_size = min(buf_size, recv_buf_size);
> >>> +
> >>> + buf = malloc(buf_size);
> >>> +
> >>> + if (!buf) {
> >>> + perror("malloc");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + memset(buf, rand() & 0xff, buf_size);
> >>> + /* Set at least one MSG_EOR + some random. */
> >>> +
> >>> + send_size = send(fd, buf, buf_size, 0);
> >>> +
> >>> + if (send_size < 0) {
> >>> + perror("send");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + if (send_size != buf_size) {
> >>> + fprintf(stderr, "Invalid send size\n");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + /* In theory the implementation isn't required to transmit
> >>> + * these packets in order, so we use this SYNC control message
> >>> + * so that server and client coordinate sending and receiving
> >>> + * one packet at a time. The client sends a packet and waits
> >>> + * until it has been received before sending another.
> >>> + */
> >>> + control_writeln("PKTSENT");
> >>> + control_expectln("PKTRECV");
> >>> +
> >>> + /* Send the server a hash of the packet */
> >>> + curr_hash = hash_djb2(buf, buf_size);
> >>> + control_writeulong(curr_hash);
> >>> + free(buf);
> >>> + }
> >>> +
> >>> + control_writeln("SENDDONE");
> >>> + close(fd);
> >>> +}
> >>> +
> >>> +static void test_dgram_msg_bounds_server(const struct test_opts *opts)
> >>> +{
> >>> + const unsigned long msg_cnt = 16;
> >>> + unsigned long sock_buf_size;
> >>> + struct msghdr msg = {0};
> >>> + struct iovec iov = {0};
> >>> + char buf[MAX_MSG_SIZE];
> >>> + socklen_t len;
> >>> + int fd;
> >>> + int i;
> >>> +
> >>> + fd = vsock_dgram_bind(VMADDR_CID_ANY, 1234);
> >>> +
> >>> + if (fd < 0) {
> >>> + perror("bind");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + /* Set receive buffer to maximum */
> >>> + sock_buf_size = -1;
> >>> + if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
> >>> + &sock_buf_size, sizeof(sock_buf_size))) {
> >>> + perror("setsockopt(SO_RCVBUF)");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + /* Retrieve the receive buffer size */
> >>> + len = sizeof(sock_buf_size);
> >>> + if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF,
> >>> + &sock_buf_size, &len)) {
> >>> + perror("getsockopt(SO_RCVBUF)");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + /* Client ready to receive parameters */
> >>> + control_expectln("CLNTREADY");
> >>> +
> >>> + control_writeulong(msg_cnt);
> >>> + control_writeulong(sock_buf_size);
> >>> +
> >>> + /* Ready to receive data. */
> >>> + control_writeln("SRVREADY");
> >>> +
> >>> + iov.iov_base = buf;
> >>> + iov.iov_len = sizeof(buf);
> >>> + msg.msg_iov = &iov;
> >>> + msg.msg_iovlen = 1;
> >>> +
> >>> + for (i = 0; i < msg_cnt; i++) {
> >>> + unsigned long remote_hash;
> >>> + unsigned long curr_hash;
> >>> + ssize_t recv_size;
> >>> +
> >>> + control_expectln("PKTSENT");
> >>> + recv_size = recvmsg(fd, &msg, 0);
> >>> + control_writeln("PKTRECV");
> >>> +
> >>> + if (!recv_size)
> >>> + break;
> >>> +
> >>> + if (recv_size < 0) {
> >>> + perror("recvmsg");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> +
> >>> + curr_hash = hash_djb2(msg.msg_iov[0].iov_base, recv_size);
> >>> + remote_hash = control_readulong();
> >>> +
> >>> + if (curr_hash != remote_hash) {
> >>> + fprintf(stderr, "Message bounds broken\n");
> >>> + exit(EXIT_FAILURE);
> >>> + }
> >>> + }
> >>> +
> >>> + close(fd);
> >>> +}
> >>> +
> >>> static struct test_case test_cases[] = {
> >>> {
> >>> .name = "SOCK_STREAM connection reset",
> >>> @@ -1128,6 +1535,31 @@ static struct test_case test_cases[] = {
> >>> .run_client = test_stream_virtio_skb_merge_client,
> >>> .run_server = test_stream_virtio_skb_merge_server,
> >>> },
> >>> + {
> >>> + .name = "SOCK_DGRAM client sendto",
> >>> + .run_client = test_dgram_sendto_client,
> >>> + .run_server = test_dgram_sendto_server,
> >>> + },
> >>> + {
> >>> + .name = "SOCK_DGRAM client connect",
> >>> + .run_client = test_dgram_connect_client,
> >>> + .run_server = test_dgram_connect_server,
> >>> + },
> >>> + {
> >>> + .name = "SOCK_DGRAM multiple connections using sendto",
> >>> + .run_client = test_dgram_multiconn_sendto_client,
> >>> + .run_server = test_dgram_multiconn_sendto_server,
> >>> + },
> >>> + {
> >>> + .name = "SOCK_DGRAM multiple connections using send",
> >>> + .run_client = test_dgram_multiconn_send_client,
> >>> + .run_server = test_dgram_multiconn_send_server,
> >>> + },
> >>> + {
> >>> + .name = "SOCK_DGRAM msg bounds",
> >>> + .run_client = test_dgram_msg_bounds_client,
> >>> + .run_server = test_dgram_msg_bounds_server,
> >>> + },
> >>> {},
> >>> };
> >>>
> >>>

2023-06-26 15:29:07

by Stefano Garzarella

Subject: Re: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams

On Fri, Jun 23, 2023 at 04:37:55AM +0000, Bobby Eshleman wrote:
>On Thu, Jun 22, 2023 at 06:09:12PM +0200, Stefano Garzarella wrote:
>> On Sun, Jun 11, 2023 at 11:49:02PM +0300, Arseniy Krasnov wrote:
>> > Hello Bobby!
>> >
>> > On 10.06.2023 03:58, Bobby Eshleman wrote:
>> > > This commit adds support for datagrams over virtio/vsock.
>> > >
>> > > Message boundaries are preserved on a per-skb and per-vq entry basis.
>> >
>> > I'm a little bit confused about the following case: suppose vhost sends a
>> > 4097-byte datagram to the guest. The guest uses 4096-byte RX buffers in its
>> > virtio queue, each buffer with an empty skb attached to it. Vhost places the
>> > first 4096 bytes into the first buffer of the guest's RX queue, and the last
>> > byte into the second buffer. Now IIUC the guest has two skbs in its rx queue,
>> > and a user in the guest wants to read data - does it read 4097 bytes, while
>> > the guest has two skbs - 4096 bytes and 1 byte? In seqpacket there is a
>> > special marker in the header which shows where the message ends; how does
>> > that work here?
>>
>> I think the main difference is that DGRAM is not connection-oriented, so
>> we don't have a stream and we can't split the packet into 2 (maybe we
>> could, but we have no guarantee that, for example, the second one will
>> not be discarded because there is no space).
>>
>> So I think it is acceptable as a restriction to keep it simple.
>>
>> My only doubt is, should we make the RX buffer size configurable,
>> instead of always using 4k?
>>
>I think that is a really good idea. What mechanism do you imagine?

Some parameter in sysfs?

>
>For sendmsg() with buflen > VQ_BUF_SIZE, I think I'd like -ENOBUFS

For the guest it should be easy since it allocates the buffers, but for
the host?

Maybe we should add a field in the configuration space that reports some
sort of MTU.

Something in addition to what Laura had proposed here:
https://markmail.org/message/ymhz7wllutdxji3e

>returned even though it is uncharacteristic of Linux sockets.
>Alternatively, silently dropping is okay... but seems needlessly
>unhelpful.

UDP takes advantage of IP fragmentation, right?
But what happens if a fragment is lost?

We should try to behave in a similar way.

>
>FYI, this patch is broken for h2g because it requeues partially sent
>skbs, so probably doesn't need much code review until we decided on the
>policy.

Got it.

Thanks,
Stefano


2023-06-26 15:29:08

by Stefano Garzarella

Subject: Re: [PATCH RFC net-next v4 3/8] vsock: support multi-transport datagrams

On Fri, Jun 23, 2023 at 02:59:23AM +0000, Bobby Eshleman wrote:
>On Fri, Jun 23, 2023 at 02:50:01AM +0000, Bobby Eshleman wrote:
>> On Thu, Jun 22, 2023 at 05:19:08PM +0200, Stefano Garzarella wrote:
>> > On Sat, Jun 10, 2023 at 12:58:30AM +0000, Bobby Eshleman wrote:
>> > > This patch adds support for multi-transport datagrams.
>> > >
>> > > This includes:
>> > > - Per-packet lookup of transports when using sendto(sockaddr_vm)
>> > > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
>> > > sockaddr_vm
>> > >
>> > > To preserve backwards compatibility with VMCI, some important changes
>> > > were made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
>> > > be used for dgrams iff there is not yet a g2h or h2g transport that has
>> >
>> > s/iff/if
>> >
>> > > been registered that can transmit the packet. If there is a g2h/h2g
>> > > transport for that remote address, then that transport will be used and
>> > > not "transport_dgram". This essentially makes "transport_dgram" a
>> > > fallback transport for when h2g/g2h has not yet gone online, which
>> > > appears to be the exact use case for VMCI.
>> > >
>> > > This design makes sense, because there is no reason that the
>> > > transport_{g2h,h2g} cannot also service datagrams, which makes the role
>> > > of transport_dgram difficult to understand outside of the VMCI context.
>> > >
>> > > The logic around "transport_dgram" had to be retained to prevent
>> > > breaking VMCI:
>> > >
>> > > 1) VMCI datagrams appear to function outside of the h2g/g2h
>> > > paradigm. When the vmci transport becomes online, it registers itself
>> > > with the DGRAM feature, but not H2G/G2H. Only later when the
>> > > transport has more information about its environment does it register
>> > > H2G or G2H. In the case that a datagram socket becomes active
>> > > after DGRAM registration but before G2H/H2G registration, the
>> > > "transport_dgram" transport needs to be used.
>> >
>> > IIRC we did this, because at that time only VMCI supported DGRAM. Now that
>> > there are more transports, maybe DGRAM can follow the h2g/g2h paradigm.
>> >
>>
>> Totally makes sense. I'll add the detail above that the prior design was
>> a result of chronology.
>>
>> > >
>> > > 2) VMCI seems to require a special message be sent by the transport when a
>> > > datagram socket calls bind(). Under the h2g/g2h model, the transport
>> > > is selected using the remote_addr which is set by connect(). At
>> > > bind time there is no remote_addr because often no connect() has been
>> > > called yet: the transport is null. Therefore, with a null transport
>> > > there doesn't seem to be any good way for a datagram socket to tell the
>> > > VMCI transport that it has just had bind() called upon it.
>> >
>> > @Vishnu, @Bryan do you think we can avoid this in some way?
>> >
>> > >
>> > > Only transports with a special datagram fallback use-case such as VMCI
>> > > need to register VSOCK_TRANSPORT_F_DGRAM.
>> >
>> > Maybe we should rename it in VSOCK_TRANSPORT_F_DGRAM_FALLBACK or
>> > something like that.
>> >
>> > In any case, we definitely need to update the comment in
>> > include/net/af_vsock.h on top of VSOCK_TRANSPORT_F_DGRAM mentioning
>> > this.
>> >
>>
>> Agreed. I'll rename to VSOCK_TRANSPORT_F_DGRAM_FALLBACK, unless we find
>> there is a better way altogether.
>>
>> > >
>> > > Signed-off-by: Bobby Eshleman <[email protected]>
>> > > ---
>> > > drivers/vhost/vsock.c | 1 -
>> > > include/linux/virtio_vsock.h | 2 -
>> > > net/vmw_vsock/af_vsock.c | 78 +++++++++++++++++++++++++--------
>> > > net/vmw_vsock/hyperv_transport.c | 6 ---
>> > > net/vmw_vsock/virtio_transport.c | 1 -
>> > > net/vmw_vsock/virtio_transport_common.c | 7 ---
>> > > net/vmw_vsock/vsock_loopback.c | 1 -
>> > > 7 files changed, 60 insertions(+), 36 deletions(-)
>> > >
>> > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> > > index c8201c070b4b..8f0082da5e70 100644
>> > > --- a/drivers/vhost/vsock.c
>> > > +++ b/drivers/vhost/vsock.c
>> > > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
>> > > .cancel_pkt = vhost_transport_cancel_pkt,
>> > >
>> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
>> > > - .dgram_bind = virtio_transport_dgram_bind,
>> > > .dgram_allow = virtio_transport_dgram_allow,
>> > > .dgram_get_cid = virtio_transport_dgram_get_cid,
>> > > .dgram_get_port = virtio_transport_dgram_get_port,
>> > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> > > index 23521a318cf0..73afa09f4585 100644
>> > > --- a/include/linux/virtio_vsock.h
>> > > +++ b/include/linux/virtio_vsock.h
>> > > @@ -216,8 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
>> > > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
>> > > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
>> > > bool virtio_transport_stream_allow(u32 cid, u32 port);
>> > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>> > > - struct sockaddr_vm *addr);
>> > > bool virtio_transport_dgram_allow(u32 cid, u32 port);
>> > > int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
>> > > int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
>> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> > > index 74358f0b47fa..ef86765f3765 100644
>> > > --- a/net/vmw_vsock/af_vsock.c
>> > > +++ b/net/vmw_vsock/af_vsock.c
>> > > @@ -438,6 +438,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
>> > > return transport;
>> > > }
>> > >
>> > > +static const struct vsock_transport *
>> > > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
>> > > +{
>> > > + const struct vsock_transport *transport;
>> > > +
>> > > + transport = vsock_connectible_lookup_transport(cid, flags);
>> > > + if (transport)
>> > > + return transport;
>> > > +
>> > > + return transport_dgram;
>> > > +}
>> > > +
>> > > /* Assign a transport to a socket and call the .init transport callback.
>> > > *
>> > > * Note: for connection oriented socket this must be called when vsk->remote_addr
>> > > @@ -474,7 +486,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>> > >
>> > > switch (sk->sk_type) {
>> > > case SOCK_DGRAM:
>> > > - new_transport = transport_dgram;
>> > > + new_transport = vsock_dgram_lookup_transport(remote_cid,
>> > > + remote_flags);
>> > > break;
>> > > case SOCK_STREAM:
>> > > case SOCK_SEQPACKET:
>> > > @@ -691,6 +704,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
>> > > static int __vsock_bind_dgram(struct vsock_sock *vsk,
>> > > struct sockaddr_vm *addr)
>> > > {
>> > > + if (!vsk->transport || !vsk->transport->dgram_bind)
>> > > + return -EINVAL;
>> > > +
>> > > return vsk->transport->dgram_bind(vsk, addr);
>> > > }
>> > >
>> > > @@ -1172,19 +1188,24 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
>> > >
>> > > lock_sock(sk);
>> > >
>> > > - transport = vsk->transport;
>> > > -
>> > > - err = vsock_auto_bind(vsk);
>> > > - if (err)
>> > > - goto out;
>> > > -
>> > > -
>> > > /* If the provided message contains an address, use that. Otherwise
>> > > * fall back on the socket's remote handle (if it has been connected).
>> > > */
>> > > if (msg->msg_name &&
>> > > vsock_addr_cast(msg->msg_name, msg->msg_namelen,
>> > > &remote_addr) == 0) {
>> > > + transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
>> > > + remote_addr->svm_flags);
>> > > + if (!transport) {
>> > > + err = -EINVAL;
>> > > + goto out;
>> > > + }
>> > > +
>> > > + if (!try_module_get(transport->module)) {
>> > > + err = -ENODEV;
>> > > + goto out;
>> > > + }
>> > > +
>> > > /* Ensure this address is of the right type and is a valid
>> > > * destination.
>> > > */
>> > > @@ -1193,11 +1214,27 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
>> > > remote_addr->svm_cid = transport->get_local_cid();
>> > >
>> >
>> > From here ...
>> >
>> > > if (!vsock_addr_bound(remote_addr)) {
>> > > + module_put(transport->module);
>> > > + err = -EINVAL;
>> > > + goto out;
>> > > + }
>> > > +
>> > > + if (!transport->dgram_allow(remote_addr->svm_cid,
>> > > + remote_addr->svm_port)) {
>> > > + module_put(transport->module);
>> > > err = -EINVAL;
>> > > goto out;
>> > > }
>> > > +
>> > > + err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
>> >
>> > ... to here, looks like duplicate code, can we get it out of the if block?
>> >
>>
>> Yes, I think using something like this:
>>
>> [...]
>> bool module_got = false;
>>
>> [...]
>> if (!try_module_get(transport->module)) {
>> err = -ENODEV;
>> goto out;
>> }
>> module_got = true;
>>
>> [...]
>>
>> out:
>> if (likely(transport && !err && module_got))
>
>Actually, just...
>
> if (module_got)
>

Yep, I think it should work ;-)

Thanks,
Stefano


2023-06-27 17:36:09

by Bobby Eshleman

Subject: Re: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams

On Mon, Jun 26, 2023 at 05:03:15PM +0200, Stefano Garzarella wrote:
> On Fri, Jun 23, 2023 at 04:37:55AM +0000, Bobby Eshleman wrote:
> > On Thu, Jun 22, 2023 at 06:09:12PM +0200, Stefano Garzarella wrote:
> > > On Sun, Jun 11, 2023 at 11:49:02PM +0300, Arseniy Krasnov wrote:
> > > > Hello Bobby!
> > > >
> > > > On 10.06.2023 03:58, Bobby Eshleman wrote:
> > > > > This commit adds support for datagrams over virtio/vsock.
> > > > >
> > > > > Message boundaries are preserved on a per-skb and per-vq entry basis.
> > > >
> > > > I'm a little bit confused about the following case: suppose vhost sends a
> > > > 4097-byte datagram to the guest. The guest uses 4096-byte RX buffers in its
> > > > virtio queue, each buffer with an empty skb attached to it. Vhost places the
> > > > first 4096 bytes into the first buffer of the guest's RX queue, and the last
> > > > byte into the second buffer. Now IIUC the guest has two skbs in its rx queue,
> > > > and a user in the guest wants to read data - does it read 4097 bytes, while
> > > > the guest has two skbs - 4096 bytes and 1 byte? In seqpacket there is a
> > > > special marker in the header which shows where the message ends; how does
> > > > that work here?
> > >
> > > I think the main difference is that DGRAM is not connection-oriented, so
> > > we don't have a stream and we can't split the packet into 2 (maybe we
> > > could, but we have no guarantee that, for example, the second one will
> > > not be discarded because there is no space).
> > >
> > > So I think it is acceptable as a restriction to keep it simple.
> > >
> > > My only doubt is, should we make the RX buffer size configurable,
> > > instead of always using 4k?
> > >
> > I think that is a really good idea. What mechanism do you imagine?
>
> Some parameter in sysfs?
>

I comment more on this below.

> >
> > For sendmsg() with buflen > VQ_BUF_SIZE, I think I'd like -ENOBUFS
>
> For the guest it should be easy since it allocates the buffers, but for
> the host?
>
> Maybe we should add a field in the configuration space that reports some
> sort of MTU.
>
> Something in addition to what Laura had proposed here:
> https://markmail.org/message/ymhz7wllutdxji3e
>

That sounds good to me.

IIUC vhost exposes the limit via the configuration space, and the guest
can configure the RX buffer size up to that limit via sysfs?

> > returned even though it is uncharacteristic of Linux sockets.
> > Alternatively, silently dropping is okay... but seems needlessly
> > unhelpful.
>
> UDP takes advantage of IP fragmentation, right?
> But what happens if a fragment is lost?
>
> We should try to behave in a similar way.
>

AFAICT in UDP the sending socket will see EHOSTUNREACH on its error
queue and the packet will be dropped.

For more details:
- the IP defragmenter will emit an ICMP_TIME_EXCEEDED from ip_expire()
if the fragment queue is not completed within time.
- Upon seeing ICMP_TIME_EXCEEDED, the sending stack will then add
EHOSTUNREACH to the socket's error queue, as seen in __udp4_lib_err().

Given some updated man pages I think enqueuing EHOSTUNREACH is okay for
vsock too. This also reserves ENOBUFS/ENOMEM only for shortage on local
buffers / mem.

What do you think?

Thanks,
Bobby

2023-06-29 12:40:19

by Stefano Garzarella

Subject: Re: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams

On Tue, Jun 27, 2023 at 01:19:43AM +0000, Bobby Eshleman wrote:
>On Mon, Jun 26, 2023 at 05:03:15PM +0200, Stefano Garzarella wrote:
>> On Fri, Jun 23, 2023 at 04:37:55AM +0000, Bobby Eshleman wrote:
>> > On Thu, Jun 22, 2023 at 06:09:12PM +0200, Stefano Garzarella wrote:
>> > > On Sun, Jun 11, 2023 at 11:49:02PM +0300, Arseniy Krasnov wrote:
>> > > > Hello Bobby!
>> > > >
>> > > > On 10.06.2023 03:58, Bobby Eshleman wrote:
>> > > > > This commit adds support for datagrams over virtio/vsock.
>> > > > >
>> > > > > Message boundaries are preserved on a per-skb and per-vq entry basis.
>> > > >
>> > > > I'm a little bit confused about the following case: suppose vhost sends a
>> > > > 4097-byte datagram to the guest. The guest uses 4096-byte RX buffers in its
>> > > > virtio queue, each buffer with an empty skb attached to it. Vhost places the
>> > > > first 4096 bytes into the first buffer of the guest's RX queue, and the last
>> > > > byte into the second buffer. Now IIUC the guest has two skbs in its rx queue,
>> > > > and a user in the guest wants to read data - does it read 4097 bytes, while
>> > > > the guest has two skbs - 4096 bytes and 1 byte? In seqpacket there is a
>> > > > special marker in the header which shows where the message ends; how does
>> > > > that work here?
>> > >
>> > > I think the main difference is that DGRAM is not connection-oriented, so
>> > > we don't have a stream and we can't split the packet into 2 (maybe we
>> > > could, but we have no guarantee that, for example, the second one will
>> > > not be discarded because there is no space).
>> > >
>> > > So I think it is acceptable as a restriction to keep it simple.
>> > >
>> > > My only doubt is, should we make the RX buffer size configurable,
>> > > instead of always using 4k?
>> > >
>> > I think that is a really good idea. What mechanism do you imagine?
>>
>> Some parameter in sysfs?
>>
>
>I comment more on this below.
>
>> >
>> > For sendmsg() with buflen > VQ_BUF_SIZE, I think I'd like -ENOBUFS
>>
>> For the guest it should be easy since it allocates the buffers, but for
>> the host?
>>
>> Maybe we should add a field in the configuration space that reports some
>> sort of MTU.
>>
>> Something in addition to what Laura had proposed here:
>> https://markmail.org/message/ymhz7wllutdxji3e
>>
>
>That sounds good to me.
>
>IIUC vhost exposes the limit via the configuration space, and the guest
>can configure the RX buffer size up to that limit via sysfs?
>
>> > returned even though it is uncharacteristic of Linux sockets.
>> > Alternatively, silently dropping is okay... but seems needlessly
>> > unhelpful.
>>
>> UDP takes advantage of IP fragmentation, right?
>> But what happens if a fragment is lost?
>>
>> We should try to behave in a similar way.
>>
>
>AFAICT in UDP the sending socket will see EHOSTUNREACH on its error
>queue and the packet will be dropped.
>
>For more details:
>- the IP defragmenter will emit an ICMP_TIME_EXCEEDED from ip_expire()
> if the fragment queue is not completed within time.
>- Upon seeing ICMP_TIME_EXCEEDED, the sending stack will then add
> EHOSTUNREACH to the socket's error queue, as seen in __udp4_lib_err().
>
>Given some updated man pages I think enqueuing EHOSTUNREACH is okay for
>vsock too. This also reserves ENOBUFS/ENOMEM only for shortage on local
>buffers / mem.
>
>What do you think?

Yep, makes sense to me!

Stefano