This series tries to increase the throughput of virtio-vsock with slight
changes:
- patch 1/4: reduces the number of credit update messages sent to the
transmitter
- patch 2/4: allows the host to split packets across multiple buffers;
in this way, we can remove the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE
limit on packet size
- patch 3/4: uses VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size
allowed
- patch 4/4: increases RX buffer size to 64 KiB (affects only host->guest)
RFC:
- maybe patch 4 can be replaced by using multiple queues with different
buffer sizes, or by using EWMA to adapt the buffer size to the traffic
- as Jason suggested in a previous thread [1], I'll evaluate using
virtio-net as a transport, but I need to understand better how to
interface with it, maybe by introducing sk_buff in virtio-vsock.
Any suggestions?
Here are some benchmarks, step by step. I used iperf3 [2] modified with
VSOCK support:
host -> guest [Gbps]
pkt_size before opt. patch 1 patches 2+3 patch 4
64 0.060 0.102 0.102 0.096
256 0.22 0.40 0.40 0.36
512 0.42 0.82 0.85 0.74
1K 0.7 1.6 1.6 1.5
2K 1.5 3.0 3.1 2.9
4K 2.5 5.2 5.3 5.3
8K 3.9 8.4 8.6 8.8
16K 6.6 11.1 11.3 12.8
32K 9.9 15.8 15.8 18.1
64K 13.5 17.4 17.7 21.4
128K 17.9 19.0 19.0 23.6
256K 18.0 19.4 19.8 24.4
512K 18.4 19.6 20.1 25.3
guest -> host [Gbps]
pkt_size before opt. patch 1 patches 2+3
64 0.088 0.100 0.101
256 0.35 0.36 0.41
512 0.70 0.74 0.73
1K 1.1 1.3 1.3
2K 2.4 2.4 2.6
4K 4.3 4.3 4.5
8K 7.3 7.4 7.6
16K 9.2 9.6 11.1
32K 8.3 8.9 18.1
64K 8.3 8.9 25.4
128K 7.2 8.7 26.7
256K 7.7 8.4 24.9
512K 7.7 8.5 25.0
Thanks,
Stefano
[1] https://www.spinics.net/lists/netdev/msg531783.html
[2] https://github.com/stefano-garzarella/iperf/
Stefano Garzarella (4):
vsock/virtio: reduce credit update messages
vhost/vsock: split packets to send using multiple buffers
vsock/virtio: change the maximum packet size allowed
vsock/virtio: increase RX buffer size to 64 KiB
drivers/vhost/vsock.c | 35 ++++++++++++++++++++-----
include/linux/virtio_vsock.h | 3 ++-
net/vmw_vsock/virtio_transport_common.c | 18 +++++++++----
3 files changed, 44 insertions(+), 12 deletions(-)
--
2.20.1
In order to reduce the number of credit update messages,
we send them only when the space available seen by the
transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
Signed-off-by: Stefano Garzarella <[email protected]>
---
include/linux/virtio_vsock.h | 1 +
net/vmw_vsock/virtio_transport_common.c | 14 +++++++++++---
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index e223e2632edd..6d7a22cc20bf 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -37,6 +37,7 @@ struct virtio_vsock_sock {
u32 tx_cnt;
u32 buf_alloc;
u32 peer_fwd_cnt;
+ u32 last_fwd_cnt;
u32 peer_buf_alloc;
/* Protected by rx_lock */
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 602715fc9a75..f32301d823f5 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -206,6 +206,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
{
spin_lock_bh(&vvs->tx_lock);
+ vvs->last_fwd_cnt = vvs->fwd_cnt;
pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
spin_unlock_bh(&vvs->tx_lock);
@@ -256,6 +257,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
struct virtio_vsock_sock *vvs = vsk->trans;
struct virtio_vsock_pkt *pkt;
size_t bytes, total = 0;
+ s64 free_space;
int err = -EFAULT;
spin_lock_bh(&vvs->rx_lock);
@@ -288,9 +290,15 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
}
spin_unlock_bh(&vvs->rx_lock);
- /* Send a credit pkt to peer */
- virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
- NULL);
+ /* We send a credit update only when the space available seen
+ * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
+ */
+ free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
+ if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
+ virtio_transport_send_credit_update(vsk,
+ VIRTIO_VSOCK_TYPE_STREAM,
+ NULL);
+ }
return total;
--
2.20.1
If the packets to send to the guest are bigger than the available
buffer, we can split them across multiple buffers, fixing the
length in the packet header.
This is safe since virtio-vsock supports only stream sockets.
Signed-off-by: Stefano Garzarella <[email protected]>
---
drivers/vhost/vsock.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index bb5fc0e9fbc2..9951b7e661f6 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -94,7 +94,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
struct iov_iter iov_iter;
unsigned out, in;
size_t nbytes;
- size_t len;
+ size_t iov_len, payload_len;
int head;
spin_lock_bh(&vsock->send_pkt_list_lock);
@@ -139,8 +139,18 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
break;
}
- len = iov_length(&vq->iov[out], in);
- iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
+ payload_len = pkt->len - pkt->off;
+ iov_len = iov_length(&vq->iov[out], in);
+ iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
+
+ /* If the packet is greater than the space available in the
+ * buffer, we split it using multiple buffers.
+ */
+ if (payload_len > iov_len - sizeof(pkt->hdr))
+ payload_len = iov_len - sizeof(pkt->hdr);
+
+ /* Set the correct length in the header */
+ pkt->hdr.len = cpu_to_le32(payload_len);
nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
if (nbytes != sizeof(pkt->hdr)) {
@@ -149,16 +159,29 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
break;
}
- nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
- if (nbytes != pkt->len) {
+ nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len,
+ &iov_iter);
+ if (nbytes != payload_len) {
virtio_transport_free_pkt(pkt);
vq_err(vq, "Faulted on copying pkt buf\n");
break;
}
- vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
+ vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
added = true;
+ pkt->off += payload_len;
+
+ /* If we didn't send all the payload we can requeue the packet
+ * to send it with the next available buffer.
+ */
+ if (pkt->off < pkt->len) {
+ spin_lock_bh(&vsock->send_pkt_list_lock);
+ list_add(&pkt->list, &vsock->send_pkt_list);
+ spin_unlock_bh(&vsock->send_pkt_list_lock);
+ continue;
+ }
+
if (pkt->reply) {
int val;
--
2.20.1
In order to increase host -> guest throughput with large packets,
we can use 64 KiB RX buffers.
Signed-off-by: Stefano Garzarella <[email protected]>
---
include/linux/virtio_vsock.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 6d7a22cc20bf..43cce304408e 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -10,7 +10,7 @@
#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE 128
#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE (1024 * 256)
#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE (1024 * 256)
-#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 4)
+#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 64)
#define VIRTIO_VSOCK_MAX_BUF_SIZE 0xFFFFFFFFUL
#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE (1024 * 64)
--
2.20.1
Since we are now able to split packets, we no longer need to limit
their size to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the maximum
packet size.
Signed-off-by: Stefano Garzarella <[email protected]>
---
net/vmw_vsock/virtio_transport_common.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index f32301d823f5..822e5d07a4ec 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -167,8 +167,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
vvs = vsk->trans;
/* we can send less than pkt_len bytes */
- if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
- pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+ if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+ pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
/* virtio_transport_get_credit might return less than pkt_len credit */
pkt_len = virtio_transport_get_credit(vvs, pkt_len);
--
2.20.1
On Thu, Apr 04, 2019 at 12:58:34PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes:
> - patch 1/4: reduces the number of credit update messages sent to the
> transmitter
> - patch 2/4: allows the host to split packets on multiple buffers,
> in this way, we can remove the packet size limit to
> VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE
> - patch 3/4: uses VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size
> allowed
> - patch 4/4: increases RX buffer size to 64 KiB (affects only host->guest)
>
> RFC:
> - maybe patch 4 can be replaced with multiple queues with different
> buffer sizes or using EWMA to adapt the buffer size to the traffic
>
> - as Jason suggested in a previous thread [1] I'll evaluate to use
> virtio-net as transport, but I need to understand better how to
> interface with it, maybe introducing sk_buff in virtio-vsock.
>
> Any suggestions?
Great performance results, nice job!
Please include efficiency numbers (bandwidth / CPU utilization) in the
future. Due to the nature of these optimizations it's unlikely that
efficiency has decreased, so I'm not too worried about it this time.
On Thu, Apr 04, 2019 at 03:14:10PM +0100, Stefan Hajnoczi wrote:
> On Thu, Apr 04, 2019 at 12:58:34PM +0200, Stefano Garzarella wrote:
> > This series tries to increase the throughput of virtio-vsock with slight
> > changes:
> > - patch 1/4: reduces the number of credit update messages sent to the
> > transmitter
> > - patch 2/4: allows the host to split packets on multiple buffers,
> > in this way, we can remove the packet size limit to
> > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE
> > - patch 3/4: uses VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size
> > allowed
> > - patch 4/4: increases RX buffer size to 64 KiB (affects only host->guest)
> >
> > RFC:
> > - maybe patch 4 can be replaced with multiple queues with different
> > buffer sizes or using EWMA to adapt the buffer size to the traffic
> >
> > - as Jason suggested in a previous thread [1] I'll evaluate to use
> > virtio-net as transport, but I need to understand better how to
> > interface with it, maybe introducing sk_buff in virtio-vsock.
> >
> > Any suggestions?
>
> Great performance results, nice job!
:)
>
> Please include efficiency numbers (bandwidth / CPU utilization) in the
> future. Due to the nature of these optimizations it's unlikely that
> efficiency has decreased, so I'm not too worried about it this time.
Thanks for the suggestion! I'll also measure efficiency for future
optimizations.
Cheers,
Stefano
On Thu, Apr 04, 2019 at 12:58:34PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes:
> - patch 1/4: reduces the number of credit update messages sent to the
> transmitter
> - patch 2/4: allows the host to split packets on multiple buffers,
> in this way, we can remove the packet size limit to
> VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE
> - patch 3/4: uses VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size
> allowed
> - patch 4/4: increases RX buffer size to 64 KiB (affects only host->guest)
>
> RFC:
> - maybe patch 4 can be replaced with multiple queues with different
> buffer sizes or using EWMA to adapt the buffer size to the traffic
>
> - as Jason suggested in a previous thread [1] I'll evaluate to use
> virtio-net as transport, but I need to understand better how to
> interface with it, maybe introducing sk_buff in virtio-vsock.
>
> Any suggestions?
>
> Here some benchmarks step by step. I used iperf3 [2] modified with VSOCK
> support:
>
> host -> guest [Gbps]
> pkt_size before opt. patch 1 patches 2+3 patch 4
> 64 0.060 0.102 0.102 0.096
> 256 0.22 0.40 0.40 0.36
> 512 0.42 0.82 0.85 0.74
> 1K 0.7 1.6 1.6 1.5
> 2K 1.5 3.0 3.1 2.9
> 4K 2.5 5.2 5.3 5.3
> 8K 3.9 8.4 8.6 8.8
> 16K 6.6 11.1 11.3 12.8
> 32K 9.9 15.8 15.8 18.1
> 64K 13.5 17.4 17.7 21.4
> 128K 17.9 19.0 19.0 23.6
> 256K 18.0 19.4 19.8 24.4
> 512K 18.4 19.6 20.1 25.3
>
> guest -> host [Gbps]
> pkt_size before opt. patch 1 patches 2+3
> 64 0.088 0.100 0.101
> 256 0.35 0.36 0.41
> 512 0.70 0.74 0.73
> 1K 1.1 1.3 1.3
> 2K 2.4 2.4 2.6
> 4K 4.3 4.3 4.5
> 8K 7.3 7.4 7.6
> 16K 9.2 9.6 11.1
> 32K 8.3 8.9 18.1
> 64K 8.3 8.9 25.4
> 128K 7.2 8.7 26.7
> 256K 7.7 8.4 24.9
> 512K 7.7 8.5 25.0
>
> Thanks,
> Stefano
I simply love it that you have analysed the individual impact of
each patch! Great job!
For comparison's sake, it could IMHO be beneficial to add a column
with virtio-net+vhost-net performance.
This will both give us an idea about whether the vsock layer introduces
inefficiencies, and whether the virtio-net idea has merit.
One other comment: it makes sense to test with SMAP mitigations
disabled (boot host and guest with nosmap). No problem with also
testing the default SMAP path, but I think you will discover that the
performance impact of SMAP hardening being enabled is often severe for
such benchmarks.
> [1] https://www.spinics.net/lists/netdev/msg531783.html
> [2] https://github.com/stefano-garzarella/iperf/
>
> Stefano Garzarella (4):
> vsock/virtio: reduce credit update messages
> vhost/vsock: split packets to send using multiple buffers
> vsock/virtio: change the maximum packet size allowed
> vsock/virtio: increase RX buffer size to 64 KiB
>
> drivers/vhost/vsock.c | 35 ++++++++++++++++++++-----
> include/linux/virtio_vsock.h | 3 ++-
> net/vmw_vsock/virtio_transport_common.c | 18 +++++++++----
> 3 files changed, 44 insertions(+), 12 deletions(-)
>
> --
> 2.20.1
On Thu, Apr 04, 2019 at 11:52:46AM -0400, Michael S. Tsirkin wrote:
> I simply love it that you have analysed the individual impact of
> each patch! Great job!
Thanks! I followed Stefan's suggestions!
>
> For comparison's sake, it could be IMHO benefitial to add a column
> with virtio-net+vhost-net performance.
>
> This will both give us an idea about whether the vsock layer introduces
> inefficiencies, and whether the virtio-net idea has merit.
>
Sure, I already did TCP tests on virtio-net + vhost, starting qemu in
this way:
$ qemu-system-x86_64 ... \
-netdev tap,id=net0,vhost=on,ifname=tap0,script=no,downscript=no \
-device virtio-net-pci,netdev=net0
I also did a test using TCP_NODELAY, just to be fair, because VSOCK
doesn't implement anything like it.
In both cases I set the MTU to the maximum allowed (65520).
VSOCK TCP + virtio-net + vhost
host -> guest [Gbps] host -> guest [Gbps]
pkt_size before opt. patch 1 patches 2+3 patch 4 TCP_NODELAY
64 0.060 0.102 0.102 0.096 0.16 0.15
256 0.22 0.40 0.40 0.36 0.32 0.57
512 0.42 0.82 0.85 0.74 1.2 1.2
1K 0.7 1.6 1.6 1.5 2.1 2.1
2K 1.5 3.0 3.1 2.9 3.5 3.4
4K 2.5 5.2 5.3 5.3 5.5 5.3
8K 3.9 8.4 8.6 8.8 8.0 7.9
16K 6.6 11.1 11.3 12.8 9.8 10.2
32K 9.9 15.8 15.8 18.1 11.8 10.7
64K 13.5 17.4 17.7 21.4 11.4 11.3
128K 17.9 19.0 19.0 23.6 11.2 11.0
256K 18.0 19.4 19.8 24.4 11.1 11.0
512K 18.4 19.6 20.1 25.3 10.1 10.7
For small packet sizes (< 4K) I think we should implement some kind of
batching/merging, which could come for free if we use virtio-net as a transport.
Note: maybe I have something misconfigured, because TCP on virtio-net
in the host -> guest case doesn't exceed 11 Gbps.
VSOCK TCP + virtio-net + vhost
guest -> host [Gbps] guest -> host [Gbps]
pkt_size before opt. patch 1 patches 2+3 TCP_NODELAY
64 0.088 0.100 0.101 0.24 0.24
256 0.35 0.36 0.41 0.36 1.03
512 0.70 0.74 0.73 0.69 1.6
1K 1.1 1.3 1.3 1.1 3.0
2K 2.4 2.4 2.6 2.1 5.5
4K 4.3 4.3 4.5 3.8 8.8
8K 7.3 7.4 7.6 6.6 20.0
16K 9.2 9.6 11.1 12.3 29.4
32K 8.3 8.9 18.1 19.3 28.2
64K 8.3 8.9 25.4 20.6 28.7
128K 7.2 8.7 26.7 23.1 27.9
256K 7.7 8.4 24.9 28.5 29.4
512K 7.7 8.5 25.0 28.3 29.3
For guest -> host I think the TCP_NODELAY test is important, because
TCP buffering increases the throughput a lot.
> One other comment: it makes sense to test with disabling smap
> mitigations (boot host and guest with nosmap). No problem with also
> testing the default smap path, but I think you will discover that the
> performance impact of smap hardening being enabled is often severe for
> such benchmarks.
Thanks for this valuable suggestion, I'll redo all the tests with nosmap!
Cheers,
Stefano
On Thu, Apr 04, 2019 at 06:47:15PM +0200, Stefano Garzarella wrote:
> On Thu, Apr 04, 2019 at 11:52:46AM -0400, Michael S. Tsirkin wrote:
> > I simply love it that you have analysed the individual impact of
> > each patch! Great job!
>
> Thanks! I followed Stefan's suggestions!
>
> >
> > For comparison's sake, it could be IMHO benefitial to add a column
> > with virtio-net+vhost-net performance.
> >
> > This will both give us an idea about whether the vsock layer introduces
> > inefficiencies, and whether the virtio-net idea has merit.
> >
>
> Sure, I already did TCP tests on virtio-net + vhost, starting qemu in
> this way:
> $ qemu-system-x86_64 ... \
> -netdev tap,id=net0,vhost=on,ifname=tap0,script=no,downscript=no \
> -device virtio-net-pci,netdev=net0
>
> I did also a test using TCP_NODELAY, just to be fair, because VSOCK
> doesn't implement something like this.
Why not?
> In both cases I set the MTU to the maximum allowed (65520).
>
> VSOCK TCP + virtio-net + vhost
> host -> guest [Gbps] host -> guest [Gbps]
> pkt_size before opt. patch 1 patches 2+3 patch 4 TCP_NODELAY
> 64 0.060 0.102 0.102 0.096 0.16 0.15
> 256 0.22 0.40 0.40 0.36 0.32 0.57
> 512 0.42 0.82 0.85 0.74 1.2 1.2
> 1K 0.7 1.6 1.6 1.5 2.1 2.1
> 2K 1.5 3.0 3.1 2.9 3.5 3.4
> 4K 2.5 5.2 5.3 5.3 5.5 5.3
> 8K 3.9 8.4 8.6 8.8 8.0 7.9
> 16K 6.6 11.1 11.3 12.8 9.8 10.2
> 32K 9.9 15.8 15.8 18.1 11.8 10.7
> 64K 13.5 17.4 17.7 21.4 11.4 11.3
> 128K 17.9 19.0 19.0 23.6 11.2 11.0
> 256K 18.0 19.4 19.8 24.4 11.1 11.0
> 512K 18.4 19.6 20.1 25.3 10.1 10.7
>
> For small packet size (< 4K) I think we should implement some kind of
> batching/merging, that could be for free if we use virtio-net as a transport.
>
> Note: Maybe I have something miss configured because TCP on virtio-net
> for host -> guest case doesn't exceed 11 Gbps.
>
> VSOCK TCP + virtio-net + vhost
> guest -> host [Gbps] guest -> host [Gbps]
> pkt_size before opt. patch 1 patches 2+3 TCP_NODELAY
> 64 0.088 0.100 0.101 0.24 0.24
> 256 0.35 0.36 0.41 0.36 1.03
> 512 0.70 0.74 0.73 0.69 1.6
> 1K 1.1 1.3 1.3 1.1 3.0
> 2K 2.4 2.4 2.6 2.1 5.5
> 4K 4.3 4.3 4.5 3.8 8.8
> 8K 7.3 7.4 7.6 6.6 20.0
> 16K 9.2 9.6 11.1 12.3 29.4
> 32K 8.3 8.9 18.1 19.3 28.2
> 64K 8.3 8.9 25.4 20.6 28.7
> 128K 7.2 8.7 26.7 23.1 27.9
> 256K 7.7 8.4 24.9 28.5 29.4
> 512K 7.7 8.5 25.0 28.3 29.3
>
> For guest -> host I think is important the TCP_NODELAY test, because TCP
> buffering increases a lot the throughput.
>
> > One other comment: it makes sense to test with disabling smap
> > mitigations (boot host and guest with nosmap). No problem with also
> > testing the default smap path, but I think you will discover that the
> > performance impact of smap hardening being enabled is often severe for
> > such benchmarks.
>
> Thanks for this valuable suggestion, I'll redo all the tests with nosmap!
>
> Cheers,
> Stefano
On Thu, Apr 04, 2019 at 12:58:35PM +0200, Stefano Garzarella wrote:
> @@ -256,6 +257,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> struct virtio_vsock_sock *vvs = vsk->trans;
> struct virtio_vsock_pkt *pkt;
> size_t bytes, total = 0;
> + s64 free_space;
Why s64? buf_alloc, fwd_cnt, and last_fwd_cnt are all u32. fwd_cnt -
last_fwd_cnt <= buf_alloc is always true.
> int err = -EFAULT;
>
> spin_lock_bh(&vvs->rx_lock);
> @@ -288,9 +290,15 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> }
> spin_unlock_bh(&vvs->rx_lock);
>
> - /* Send a credit pkt to peer */
> - virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
> - NULL);
> + /* We send a credit update only when the space available seen
> + * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
> + */
> + free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
Locking? These fields should be accessed under tx_lock.
> + if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> + virtio_transport_send_credit_update(vsk,
> + VIRTIO_VSOCK_TYPE_STREAM,
> + NULL);
> + }
On Thu, Apr 04, 2019 at 02:04:10PM -0400, Michael S. Tsirkin wrote:
> On Thu, Apr 04, 2019 at 06:47:15PM +0200, Stefano Garzarella wrote:
> > On Thu, Apr 04, 2019 at 11:52:46AM -0400, Michael S. Tsirkin wrote:
> > > I simply love it that you have analysed the individual impact of
> > > each patch! Great job!
> >
> > Thanks! I followed Stefan's suggestions!
> >
> > >
> > > For comparison's sake, it could be IMHO benefitial to add a column
> > > with virtio-net+vhost-net performance.
> > >
> > > This will both give us an idea about whether the vsock layer introduces
> > > inefficiencies, and whether the virtio-net idea has merit.
> > >
> >
> > Sure, I already did TCP tests on virtio-net + vhost, starting qemu in
> > this way:
> > $ qemu-system-x86_64 ... \
> > -netdev tap,id=net0,vhost=on,ifname=tap0,script=no,downscript=no \
> > -device virtio-net-pci,netdev=net0
> >
> > I did also a test using TCP_NODELAY, just to be fair, because VSOCK
> > doesn't implement something like this.
>
> Why not?
>
I think because originally VSOCK was designed to be simple and
low-latency, but of course we can introduce something like that.
The current implementation directly copies the buffer from user space
into a virtio_vsock_pkt and enqueues it for transmission.
Maybe we can introduce a per-socket buffer where we accumulate bytes
and send them when the buffer is full or when a timer fires. We could
also introduce a VSOCK_NODELAY option (maybe using the same value as
TCP_NODELAY for compatibility) to send the buffer immediately for
low-latency use cases.
What do you think?
Thanks,
Stefano
On Thu, Apr 04, 2019 at 12:58:36PM +0200, Stefano Garzarella wrote:
> @@ -139,8 +139,18 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> break;
> }
>
> - len = iov_length(&vq->iov[out], in);
> - iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> + payload_len = pkt->len - pkt->off;
> + iov_len = iov_length(&vq->iov[out], in);
> + iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
> +
> + /* If the packet is greater than the space available in the
> + * buffer, we split it using multiple buffers.
> + */
> + if (payload_len > iov_len - sizeof(pkt->hdr))
Integer underflow. iov_len is controlled by the guest and therefore
untrusted. Please validate iov_len before assuming it's larger than
sizeof(pkt->hdr).
> - vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
> + vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
> added = true;
>
> + pkt->off += payload_len;
> +
> + /* If we didn't send all the payload we can requeue the packet
> + * to send it with the next available buffer.
> + */
> + if (pkt->off < pkt->len) {
> + spin_lock_bh(&vsock->send_pkt_list_lock);
> + list_add(&pkt->list, &vsock->send_pkt_list);
> + spin_unlock_bh(&vsock->send_pkt_list_lock);
> + continue;
The virtio_transport_deliver_tap_pkt() call is skipped. Packet capture
should see the exact packets that are delivered. I think this patch
will present one large packet instead of several smaller packets that
were actually delivered.
On Thu, Apr 04, 2019 at 08:15:39PM +0100, Stefan Hajnoczi wrote:
> On Thu, Apr 04, 2019 at 12:58:35PM +0200, Stefano Garzarella wrote:
> > @@ -256,6 +257,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > struct virtio_vsock_sock *vvs = vsk->trans;
> > struct virtio_vsock_pkt *pkt;
> > size_t bytes, total = 0;
> > + s64 free_space;
>
> Why s64? buf_alloc, fwd_cnt, and last_fwd_cnt are all u32. fwd_cnt -
> last_fwd_cnt <= buf_alloc is always true.
>
Right, I'll use a u32 for free_space!
It is a leftover from when I initially implemented something like
virtio_transport_has_space().
> > int err = -EFAULT;
> >
> > spin_lock_bh(&vvs->rx_lock);
> > @@ -288,9 +290,15 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > }
> > spin_unlock_bh(&vvs->rx_lock);
> >
> > - /* Send a credit pkt to peer */
> > - virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
> > - NULL);
> > + /* We send a credit update only when the space available seen
> > + * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
> > + */
> > + free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
>
> Locking? These fields should be accessed under tx_lock.
>
Yes, we need a lock, but looking at the code, vvs->fwd_cnt is written
under rx_lock (virtio_transport_dec_rx_pkt) and read under tx_lock
(virtio_transport_inc_tx_pkt).
Maybe we should use another spin_lock shared between RX and TX for
those fields, or use atomic variables.
What do you suggest?
Thanks,
Stefano
On Thu, Apr 04, 2019 at 12:58:37PM +0200, Stefano Garzarella wrote:
> Since now we are able to split packets, we can avoid limiting
> their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> packet size.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> net/vmw_vsock/virtio_transport_common.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index f32301d823f5..822e5d07a4ec 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -167,8 +167,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> vvs = vsk->trans;
>
> /* we can send less than pkt_len bytes */
> - if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> - pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> + if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> + pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
The next line limits pkt_len based on available credits:
/* virtio_transport_get_credit might return less than pkt_len credit */
pkt_len = virtio_transport_get_credit(vvs, pkt_len);
I think drivers/vhost/vsock.c:vhost_transport_do_send_pkt() now works
correctly even with pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
The other ->send_pkt() callback is
net/vmw_vsock/virtio_transport.c:virtio_transport_send_pkt_work() and it
can already send any size packet.
Do you remember why VIRTIO_VSOCK_MAX_PKT_BUF_SIZE still needs to be the
limit? I'm wondering if we can get rid of it now and just limit packets
to the available credits.
Stefan
On Thu, Apr 04, 2019 at 12:58:38PM +0200, Stefano Garzarella wrote:
> In order to increase host -> guest throughput with large packets,
> we can use 64 KiB RX buffers.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> include/linux/virtio_vsock.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 6d7a22cc20bf..43cce304408e 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -10,7 +10,7 @@
> #define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE 128
> #define VIRTIO_VSOCK_DEFAULT_BUF_SIZE (1024 * 256)
> #define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE (1024 * 256)
> -#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 4)
> +#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 64)
This patch raises rx ring memory consumption from 128 * 4KB = 512KB to
128 * 64KB = 8MB.
Michael, Jason: Any advice regarding rx/tx ring sizes and buffer sizes?
Depending on rx ring size and the workload's packet size, different
values might be preferred.
This could become a tunable in the future. It determines the size of
the guest driver's rx buffers.
On Fri, Apr 05, 2019 at 09:13:56AM +0100, Stefan Hajnoczi wrote:
> On Thu, Apr 04, 2019 at 12:58:36PM +0200, Stefano Garzarella wrote:
> > @@ -139,8 +139,18 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > break;
> > }
> >
> > - len = iov_length(&vq->iov[out], in);
> > - iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> > + payload_len = pkt->len - pkt->off;
> > + iov_len = iov_length(&vq->iov[out], in);
> > + iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
> > +
> > + /* If the packet is greater than the space available in the
> > + * buffer, we split it using multiple buffers.
> > + */
> > + if (payload_len > iov_len - sizeof(pkt->hdr))
>
> Integer underflow. iov_len is controlled by the guest and therefore
> untrusted. Please validate iov_len before assuming it's larger than
> sizeof(pkt->hdr).
>
Okay, I'll do it!
> > - vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
> > + vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
> > added = true;
> >
> > + pkt->off += payload_len;
> > +
> > + /* If we didn't send all the payload we can requeue the packet
> > + * to send it with the next available buffer.
> > + */
> > + if (pkt->off < pkt->len) {
> > + spin_lock_bh(&vsock->send_pkt_list_lock);
> > + list_add(&pkt->list, &vsock->send_pkt_list);
> > + spin_unlock_bh(&vsock->send_pkt_list_lock);
> > + continue;
>
> The virtio_transport_deliver_tap_pkt() call is skipped. Packet capture
> should see the exact packets that are delivered. I think this patch
> will present one large packet instead of several smaller packets that
> were actually delivered.
I'll modify virtio_transport_build_skb() to take pkt->off into
account, reading the payload size from the virtio_vsock_hdr.
Otherwise, should I introduce another field in virtio_vsock_pkt to
store the payload size?
Thanks,
Stefano
On Fri, Apr 05, 2019 at 09:24:47AM +0100, Stefan Hajnoczi wrote:
> On Thu, Apr 04, 2019 at 12:58:37PM +0200, Stefano Garzarella wrote:
> > Since now we are able to split packets, we can avoid limiting
> > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > packet size.
> >
> > Signed-off-by: Stefano Garzarella <[email protected]>
> > ---
> > net/vmw_vsock/virtio_transport_common.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index f32301d823f5..822e5d07a4ec 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -167,8 +167,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > vvs = vsk->trans;
> >
> > /* we can send less than pkt_len bytes */
> > - if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > - pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > + if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> > + pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>
> The next line limits pkt_len based on available credits:
>
> /* virtio_transport_get_credit might return less than pkt_len credit */
> pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>
> I think drivers/vhost/vsock.c:vhost_transport_do_send_pkt() now works
> correctly even with pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
Correct.
>
> The other ->send_pkt() callback is
> net/vmw_vsock/virtio_transport.c:virtio_transport_send_pkt_work() and it
> can already send any size packet.
>
> Do you remember why VIRTIO_VSOCK_MAX_PKT_BUF_SIZE still needs to be the
> limit? I'm wondering if we can get rid of it now and just limit packets
> to the available credits.
There are 2 reasons why I left this limit:
1. When the host receives a packet, it must be <=
VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [drivers/vhost/vsock.c:vhost_vsock_alloc_pkt()],
so in this way we can limit the packets sent by the guest.
2. When the host sends packets, it helps us increase parallelism
(especially if the guest has 64 KiB RX buffers), because the user thread
will split packets, calling transport->stream_enqueue() multiple times
in net/vmw_vsock/af_vsock.c:vsock_stream_sendmsg() while
vhost_transport_send_pkt_work() sends them to the guest.
Does that make sense?
Thanks,
Stefano
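To make the two clamps under discussion concrete, here is a minimal userspace C model: first the constant cap applied by the patch, then the credit check in the spirit of virtio_transport_get_credit(). The struct, field, and function names are invented for illustration and locking is omitted:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative cap; the real constant lives in include/linux/virtio_vsock.h. */
#define MAX_PKT_BUF_SIZE (1024u * 64u)

/* Hypothetical model of the per-socket credit state. */
struct credit {
	uint32_t peer_buf_alloc; /* rx buffer size advertised by the peer */
	uint32_t tx_cnt;         /* bytes sent so far */
	uint32_t peer_fwd_cnt;   /* bytes the peer has consumed so far */
};

/* First clamp: the constant cap applied by the patch. */
static size_t cap_pkt_len(size_t pkt_len)
{
	return pkt_len > MAX_PKT_BUF_SIZE ? MAX_PKT_BUF_SIZE : pkt_len;
}

/* Second clamp: available credit, in the spirit of
 * virtio_transport_get_credit(); may return less than 'wanted'. */
static size_t get_credit(struct credit *c, size_t wanted)
{
	uint32_t in_flight = c->tx_cnt - c->peer_fwd_cnt;
	uint32_t avail = c->peer_buf_alloc - in_flight;
	size_t granted = wanted < avail ? wanted : (size_t)avail;

	c->tx_cnt += granted;
	return granted;
}
```

With a 256 KiB peer buffer, a 200000-byte send is first capped to 64 KiB and then further limited as credit runs out across successive calls.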
On 2019/4/5 4:44 PM, Stefan Hajnoczi wrote:
> On Thu, Apr 04, 2019 at 12:58:38PM +0200, Stefano Garzarella wrote:
>> In order to increase host -> guest throughput with large packets,
>> we can use 64 KiB RX buffers.
>>
>> Signed-off-by: Stefano Garzarella <[email protected]>
>> ---
>> include/linux/virtio_vsock.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> index 6d7a22cc20bf..43cce304408e 100644
>> --- a/include/linux/virtio_vsock.h
>> +++ b/include/linux/virtio_vsock.h
>> @@ -10,7 +10,7 @@
>> #define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE 128
>> #define VIRTIO_VSOCK_DEFAULT_BUF_SIZE (1024 * 256)
>> #define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE (1024 * 256)
>> -#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 4)
>> +#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 64)
> This patch raises rx ring memory consumption from 128 * 4KB = 512KB to
> 128 * 64KB = 8MB.
>
> Michael, Jason: Any advice regarding rx/tx ring sizes and buffer sizes?
>
> Depending on rx ring size and the workload's packet size, different
> values might be preferred.
>
> This could become a tunable in the future. It determines the size of
> the guest driver's rx buffers.
In virtio-net, we have mergeable rx buffers and estimate the rx buffer
size through an EWMA.
That's another reason I suggest squashing the vsock code into virtio-net.
Thanks
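For reference, the estimate Jason mentions boils down to an exponentially weighted moving average of received packet sizes (virtio-net uses the ewma_* helpers from linux/average.h). A simplified integer model, where the 3/4 weight is purely illustrative:

```c
/* Simplified integer EWMA, in the spirit of the ewma_* helpers from
 * linux/average.h; the 3/4 weight here is illustrative, not the value
 * virtio-net actually uses. */
static unsigned int ewma_update(unsigned int avg, unsigned int sample)
{
	/* new_avg = 3/4 * old_avg + 1/4 * sample */
	return (avg * 3 + sample) / 4;
}
```

Feeding each received packet's size into such an average lets the driver pick rx buffer sizes that track the recent traffic instead of a fixed 4 KiB or 64 KiB.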
On 2019/4/4 下午6:58, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes:
> - patch 1/4: reduces the number of credit update messages sent to the
> transmitter
> - patch 2/4: allows the host to split packets on multiple buffers,
> in this way, we can remove the packet size limit to
> VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE
> - patch 3/4: uses VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size
> allowed
> - patch 4/4: increases RX buffer size to 64 KiB (affects only host->guest)
>
> RFC:
> - maybe patch 4 can be replaced with multiple queues with different
> buffer sizes or using EWMA to adapt the buffer size to the traffic
Or EWMA + mergeable rx buffers, but if we decide to unify the datapath
with virtio-net, we can reuse its code.
>
> - as Jason suggested in a previous thread [1] I'll evaluate to use
> virtio-net as transport, but I need to understand better how to
> interface with it, maybe introducing sk_buff in virtio-vsock.
>
> Any suggestions?
My understanding is that this is not a must, but if it makes things
easier, we can do it.
Another thing that may help is to implement sendpage(), which would
greatly improve the performance.
Thanks
>
> Here some benchmarks step by step. I used iperf3 [2] modified with VSOCK
> support:
>
> host -> guest [Gbps]
> pkt_size before opt. patch 1 patches 2+3 patch 4
> 64 0.060 0.102 0.102 0.096
> 256 0.22 0.40 0.40 0.36
> 512 0.42 0.82 0.85 0.74
> 1K 0.7 1.6 1.6 1.5
> 2K 1.5 3.0 3.1 2.9
> 4K 2.5 5.2 5.3 5.3
> 8K 3.9 8.4 8.6 8.8
> 16K 6.6 11.1 11.3 12.8
> 32K 9.9 15.8 15.8 18.1
> 64K 13.5 17.4 17.7 21.4
> 128K 17.9 19.0 19.0 23.6
> 256K 18.0 19.4 19.8 24.4
> 512K 18.4 19.6 20.1 25.3
>
> guest -> host [Gbps]
> pkt_size before opt. patch 1 patches 2+3
> 64 0.088 0.100 0.101
> 256 0.35 0.36 0.41
> 512 0.70 0.74 0.73
> 1K 1.1 1.3 1.3
> 2K 2.4 2.4 2.6
> 4K 4.3 4.3 4.5
> 8K 7.3 7.4 7.6
> 16K 9.2 9.6 11.1
> 32K 8.3 8.9 18.1
> 64K 8.3 8.9 25.4
> 128K 7.2 8.7 26.7
> 256K 7.7 8.4 24.9
> 512K 7.7 8.5 25.0
>
> Thanks,
> Stefano
>
> [1] https://www.spinics.net/lists/netdev/msg531783.html
> [2] https://github.com/stefano-garzarella/iperf/
>
> Stefano Garzarella (4):
> vsock/virtio: reduce credit update messages
> vhost/vsock: split packets to send using multiple buffers
> vsock/virtio: change the maximum packet size allowed
> vsock/virtio: increase RX buffer size to 64 KiB
>
> drivers/vhost/vsock.c | 35 ++++++++++++++++++++-----
> include/linux/virtio_vsock.h | 3 ++-
> net/vmw_vsock/virtio_transport_common.c | 18 +++++++++----
> 3 files changed, 44 insertions(+), 12 deletions(-)
>
On Fri, Apr 05, 2019 at 09:49:17AM +0200, Stefano Garzarella wrote:
> On Thu, Apr 04, 2019 at 02:04:10PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Apr 04, 2019 at 06:47:15PM +0200, Stefano Garzarella wrote:
> > > On Thu, Apr 04, 2019 at 11:52:46AM -0400, Michael S. Tsirkin wrote:
> > > > I simply love it that you have analysed the individual impact of
> > > > each patch! Great job!
> > >
> > > Thanks! I followed Stefan's suggestions!
> > >
> > > >
> > > > For comparison's sake, it could be IMHO benefitial to add a column
> > > > with virtio-net+vhost-net performance.
> > > >
> > > > This will both give us an idea about whether the vsock layer introduces
> > > > inefficiencies, and whether the virtio-net idea has merit.
> > > >
> > >
> > > Sure, I already did TCP tests on virtio-net + vhost, starting qemu in
> > > this way:
> > > $ qemu-system-x86_64 ... \
> > > -netdev tap,id=net0,vhost=on,ifname=tap0,script=no,downscript=no \
> > > -device virtio-net-pci,netdev=net0
> > >
> > > I also did a test using TCP_NODELAY, just to be fair, because VSOCK
> > > doesn't implement anything like it.
> >
> > Why not?
> >
>
> I think it's because VSOCK was originally designed to be simple and
> low-latency, but of course we can introduce something like that.
>
> The current implementation directly copies the buffer from user space
> into a virtio_vsock_pkt and enqueues it to be transmitted.
>
> Maybe we can introduce a per-socket buffer where we accumulate bytes and
> send them when the buffer is full or when a timer fires. We could also
> introduce a VSOCK_NODELAY option (maybe using the same value as
> TCP_NODELAY for compatibility) to send the buffer immediately for
> low-latency use cases.
>
> What do you think?
Today virtio-vsock implements a 1:1 sendmsg():packet relationship
because it's simple. But there's no need for the guest to enqueue
multiple VIRTIO_VSOCK_OP_RW packets when a single large packet could
combine all payloads for a connection. This is not the same as
TCP_NODELAY but related.
I think it's worth exploring TCP_NODELAY and send_pkt_list merging.
Hopefully it won't make the code much more complicated.
Stefan
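A rough userspace sketch of the per-socket coalescing buffer Stefano proposes above. Everything here is hypothetical: VSOCK_NODELAY does not exist today, the 4 KiB staging size is arbitrary, and a real implementation would also need the timer-based flush and a loop when the payload doesn't fit:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-socket staging buffer; names and sizes are invented
 * for illustration. */
struct tx_buf {
	unsigned char data[4096];
	size_t len;
	int nodelay; /* would mirror a hypothetical VSOCK_NODELAY option */
};

/* Stage up to n bytes; returns the number of bytes flushed to the
 * transport (0 while still coalescing). */
static size_t stage_bytes(struct tx_buf *b, const void *src, size_t n)
{
	size_t room = sizeof(b->data) - b->len;
	size_t take = n < room ? n : room;

	memcpy(b->data + b->len, src, take);
	b->len += take;

	/* flush when full, or immediately when "nodelay" is set */
	if (b->nodelay || b->len == sizeof(b->data)) {
		size_t sent = b->len;

		b->len = 0; /* hand off to the transport and reset */
		return sent;
	}
	return 0;
}
```

Small writes accumulate until the buffer fills (or the timer fires), trading a little latency for fewer, larger packets; setting the nodelay flag restores the current send-immediately behaviour.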
On Fri, Apr 05, 2019 at 10:16:48AM +0200, Stefano Garzarella wrote:
> On Thu, Apr 04, 2019 at 08:15:39PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Apr 04, 2019 at 12:58:35PM +0200, Stefano Garzarella wrote:
> > > int err = -EFAULT;
> > >
> > > spin_lock_bh(&vvs->rx_lock);
> > > @@ -288,9 +290,15 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > > }
> > > spin_unlock_bh(&vvs->rx_lock);
> > >
> > > - /* Send a credit pkt to peer */
> > > - virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
> > > - NULL);
> > > + /* We send a credit update only when the space available seen
> > > + * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
> > > + */
> > > + free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> >
> > Locking? These fields should be accessed under tx_lock.
> >
>
> Yes, we need a lock, but looking at the code, vvs->fwd_cnt is written
> while holding rx_lock (virtio_transport_dec_rx_pkt) and read while
> holding tx_lock (virtio_transport_inc_tx_pkt).
>
> Maybe we should use another spinlock shared between RX and TX for those
> fields, or use atomic variables.
>
> What do you suggest?
Or make vvs->fwd_cnt atomic if it's the only field that needs to be
accessed in this manner.
Stefan
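The heuristic from the quoted hunk can be modeled in plain C as follows. Field names follow the patch, the threshold is the 64 KiB VIRTIO_VSOCK_MAX_PKT_BUF_SIZE under discussion, and the locking question debated above is deliberately left out:

```c
#include <stdint.h>

/* Threshold from the patch; value matches the 64 KiB discussed above. */
#define MAX_PKT_BUF_SIZE (1024u * 64u)

struct rx_credit {
	uint32_t buf_alloc;    /* size of our receive buffer */
	uint32_t fwd_cnt;      /* bytes consumed so far */
	uint32_t last_fwd_cnt; /* fwd_cnt at the last credit update sent */
};

/* Returns 1 if a credit update should be sent now, 0 otherwise.
 * The transmitter's view of our free space shrinks as fwd_cnt moves
 * past last_fwd_cnt without an update being sent. */
static int need_credit_update(struct rx_credit *c)
{
	uint32_t free_space = c->buf_alloc - (c->fwd_cnt - c->last_fwd_cnt);

	if (free_space < MAX_PKT_BUF_SIZE) {
		c->last_fwd_cnt = c->fwd_cnt; /* update sent, remember it */
		return 1;
	}
	return 0;
}
```

Instead of one credit update per dequeue, an update goes out only when the transmitter's stale view of free space drops below one maximum-sized packet, which is what patch 1/4 exploits.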
On Fri, Apr 05, 2019 at 11:36:08AM +0200, Stefano Garzarella wrote:
> On Fri, Apr 05, 2019 at 09:13:56AM +0100, Stefan Hajnoczi wrote:
> > On Thu, Apr 04, 2019 at 12:58:36PM +0200, Stefano Garzarella wrote:
> > > - vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
> > > + vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
> > > added = true;
> > >
> > > + pkt->off += payload_len;
> > > +
> > > + /* If we didn't send all the payload we can requeue the packet
> > > + * to send it with the next available buffer.
> > > + */
> > > + if (pkt->off < pkt->len) {
> > > + spin_lock_bh(&vsock->send_pkt_list_lock);
> > > + list_add(&pkt->list, &vsock->send_pkt_list);
> > > + spin_unlock_bh(&vsock->send_pkt_list_lock);
> > > + continue;
> >
> > The virtio_transport_deliver_tap_pkt() call is skipped. Packet capture
> > should see the exact packets that are delivered. I think this patch
> > will present one large packet instead of several smaller packets that
> > were actually delivered.
>
> I'll modify virtio_transport_build_skb() to take care of pkt->off
> and to read the payload size from the virtio_vsock_hdr.
> Otherwise, should I introduce another field in virtio_vsock_pkt to store
> the payload size?
I don't remember the details but I trust you'll pick a good way of doing
it.
Stefan
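The requeue logic in the quoted hunk amounts to the loop below, sketched as a standalone userspace model (buffer sizes and function names are illustrative; the real code dequeues descriptors from the virtqueue rather than taking an array):

```c
#include <stddef.h>

/* Model of splitting one packet across guest rx buffers; returns the
 * number of buffers consumed. */
static int send_split(size_t pkt_len, const size_t *buf_sizes, int nbufs)
{
	size_t off = 0; /* plays the role of pkt->off */
	int used = 0;

	while (off < pkt_len && used < nbufs) {
		size_t remaining = pkt_len - off;
		size_t payload = remaining < buf_sizes[used] ?
				 remaining : buf_sizes[used];

		off += payload; /* pkt->off += payload_len in the patch */
		used++;
		/* if off < pkt_len here, the patch requeues the packet on
		 * send_pkt_list to continue with the next available buffer */
	}
	return used;
}
```

This is why the packet-capture point matters: the guest sees several smaller buffers on the wire, while a naive tap delivery would report the single large packet.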
On Fri, Apr 05, 2019 at 12:07:47PM +0200, Stefano Garzarella wrote:
> On Fri, Apr 05, 2019 at 09:24:47AM +0100, Stefan Hajnoczi wrote:
> > On Thu, Apr 04, 2019 at 12:58:37PM +0200, Stefano Garzarella wrote:
> > > Since now we are able to split packets, we can avoid limiting
> > > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > > packet size.
> > >
> > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > ---
> > > net/vmw_vsock/virtio_transport_common.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index f32301d823f5..822e5d07a4ec 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -167,8 +167,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > vvs = vsk->trans;
> > >
> > > /* we can send less than pkt_len bytes */
> > > - if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > - pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > > + if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> > > + pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> >
> > The next line limits pkt_len based on available credits:
> >
> > /* virtio_transport_get_credit might return less than pkt_len credit */
> > pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> >
> > I think drivers/vhost/vsock.c:vhost_transport_do_send_pkt() now works
> > correctly even with pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>
> Correct.
>
> >
> > The other ->send_pkt() callback is
> > net/vmw_vsock/virtio_transport.c:virtio_transport_send_pkt_work() and it
> > can already send any size packet.
> >
> > Do you remember why VIRTIO_VSOCK_MAX_PKT_BUF_SIZE still needs to be the
> > limit? I'm wondering if we can get rid of it now and just limit packets
> > to the available credits.
>
> There are 2 reasons why I left this limit:
> 1. When the host receives a packet, it must be <=
> VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [drivers/vhost/vsock.c:vhost_vsock_alloc_pkt()],
> so in this way we can limit the packets sent from the guest.
The general intent is to prevent the guest from sending huge buffers.
This is good.
However, the guest must already obey the credit limit advertised by the
host. Therefore I think we should be checking against that instead of
an arbitrary constant limit.
So I think the limit should be the receive buffer size, not
VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. But at this point the code doesn't know
which connection the packet is associated with and cannot check the
receive buffer size. :(
Anyway, any change to this behavior requires compatibility so new guest
drivers work with old vhost_vsock.ko. Therefore we should probably just
leave the limit for now.
> 2. When the host sends packets, it helps us to increase the parallelism
> (especially if the guest has 64 KB RX buffers) because the user thread
> will split packets, calling transport->stream_enqueue() multiple times
> in net/vmw_vsock/af_vsock.c:vsock_stream_sendmsg(), while
> vhost_transport_send_pkt_work() sends them to the guest.
Sorry, I don't understand the reasoning. Overall this creates more
work. Are you saying the benefit is that
vhost_transport_send_pkt_work() can run "early" and notify the guest of
partial rx data before all of it has been enqueued?
Stefan
On Mon, Apr 08, 2019 at 02:43:28PM +0800, Jason Wang wrote:
> Another thing that may help is to implement sendpage(), which will greatly
> improve the performance.
I can't find documentation for ->sendpage(). Is the idea that you get a
struct page for the payload and can do zero-copy tx? (And can userspace
still write to the page, invalidating checksums in the header?)
Stefan
On Mon, Apr 08, 2019 at 10:37:23AM +0100, Stefan Hajnoczi wrote:
> On Fri, Apr 05, 2019 at 12:07:47PM +0200, Stefano Garzarella wrote:
> > On Fri, Apr 05, 2019 at 09:24:47AM +0100, Stefan Hajnoczi wrote:
> > > On Thu, Apr 04, 2019 at 12:58:37PM +0200, Stefano Garzarella wrote:
> > > > Since now we are able to split packets, we can avoid limiting
> > > > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > > > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > > > packet size.
> > > >
> > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > > ---
> > > > net/vmw_vsock/virtio_transport_common.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > > index f32301d823f5..822e5d07a4ec 100644
> > > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > > @@ -167,8 +167,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > vvs = vsk->trans;
> > > >
> > > > /* we can send less than pkt_len bytes */
> > > > - if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > > - pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > > > + if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> > > > + pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> > >
> > > The next line limits pkt_len based on available credits:
> > >
> > > /* virtio_transport_get_credit might return less than pkt_len credit */
> > > pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > >
> > > I think drivers/vhost/vsock.c:vhost_transport_do_send_pkt() now works
> > > correctly even with pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
> >
> > Correct.
> >
> > >
> > > The other ->send_pkt() callback is
> > > net/vmw_vsock/virtio_transport.c:virtio_transport_send_pkt_work() and it
> > > can already send any size packet.
> > >
> > > Do you remember why VIRTIO_VSOCK_MAX_PKT_BUF_SIZE still needs to be the
> > > limit? I'm wondering if we can get rid of it now and just limit packets
> > > to the available credits.
> >
> > There are 2 reasons why I left this limit:
> > 1. When the host receives a packet, it must be <=
> > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE [drivers/vhost/vsock.c:vhost_vsock_alloc_pkt()],
> > so in this way we can limit the packets sent from the guest.
>
> The general intent is to prevent the guest from sending huge buffers.
> This is good.
>
> However, the guest must already obey the credit limit advertised by the
> host. Therefore I think we should be checking against that instead of
> an arbitrary constant limit.
>
> So I think the limit should be the receive buffer size, not
> VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. But at this point the code doesn't know
> which connection the packet is associated with and cannot check the
> receive buffer size. :(
>
> Anyway, any change to this behavior requires compatibility so new guest
> drivers work with old vhost_vsock.ko. Therefore we should probably just
> leave the limit for now.
I understand your point of view and I completely agree with you.
But until we have a way to expose features/versions between guest
and host, maybe it is better to leave the limit in order to stay
compatible with old vhost_vsock.
>
> > 2. When the host sends packets, it helps us to increase the parallelism
> > (especially if the guest has 64 KB RX buffers) because the user thread
> > will split packets, calling transport->stream_enqueue() multiple times
> > in net/vmw_vsock/af_vsock.c:vsock_stream_sendmsg(), while
> > vhost_transport_send_pkt_work() sends them to the guest.
>
> Sorry, I don't understand the reasoning. Overall this creates more
> work. Are you saying the benefit is that
> vhost_transport_send_pkt_work() can run "early" and notify the guest of
> partial rx data before all of it has been enqueued?
Something like that. Your reasoning is more accurate.
Anyway, I'll do some tests in order to better understand the behaviour!
Thanks,
Stefano
On Mon, Apr 08, 2019 at 04:55:31PM +0200, Stefano Garzarella wrote:
> > Anyway, any change to this behavior requires compatibility so new guest
> > drivers work with old vhost_vsock.ko. Therefore we should probably just
> > leave the limit for now.
>
> I understood your point of view and I completely agree with you.
> But, until we don't have a way to expose features/versions between guest
> and host,
Why not use the standard virtio feature negotiation mechanism for this?
> maybe it is better to leave the limit in order to stay compatible
> with old vhost_vsock.
--
MST
On Mon, Apr 08, 2019 at 10:57:44AM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 08, 2019 at 04:55:31PM +0200, Stefano Garzarella wrote:
> > > Anyway, any change to this behavior requires compatibility so new guest
> > > drivers work with old vhost_vsock.ko. Therefore we should probably just
> > > leave the limit for now.
> >
> > I understand your point of view and I completely agree with you.
> > But until we have a way to expose features/versions between guest
> > and host,
>
> Why not use the standard virtio feature negotiation mechanism for this?
>
Yes, I have this in mind :), but I want to understand better whether we
can use virtio-net for this mechanism as well.
For now, I don't think limiting packets to 64 KiB is a big issue.
What do you think about postponing this until it is clearer whether we
can use virtio-net or not? (in order to avoid duplicated work)
Thanks,
Stefano
On Mon, Apr 08, 2019 at 05:17:35PM +0200, Stefano Garzarella wrote:
> On Mon, Apr 08, 2019 at 10:57:44AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Apr 08, 2019 at 04:55:31PM +0200, Stefano Garzarella wrote:
> > > > Anyway, any change to this behavior requires compatibility so new guest
> > > > drivers work with old vhost_vsock.ko. Therefore we should probably just
> > > > leave the limit for now.
> > >
> > > I understand your point of view and I completely agree with you.
> > > But until we have a way to expose features/versions between guest
> > > and host,
> >
> > Why not use the standard virtio feature negotiation mechanism for this?
> >
>
> Yes, I have this in mind :), but I want to understand better whether we
> can use virtio-net for this mechanism as well.
> For now, I don't think limiting packets to 64 KiB is a big issue.
>
> What do you think about postponing this until it is clearer whether we
> can use virtio-net or not? (in order to avoid duplicated work)
Yes, I agree. VIRTIO has feature negotiation and we can use it to
change this behavior cleanly.
However, this will require a spec change and this patch series delivers
significant performance improvements that can be merged sooner than
VIRTIO spec changes.
Let's defer the max packet size change via VIRTIO feature bits. It can
be done separately if we decide to stick to the virtio-vsock device
design and not virtio-net.
Stefan
On 2019/4/8 5:44 PM, Stefan Hajnoczi wrote:
> On Mon, Apr 08, 2019 at 02:43:28PM +0800, Jason Wang wrote:
>> Another thing that may help is to implement sendpage(), which will greatly
>> improve the performance.
> I can't find documentation for ->sendpage(). Is the idea that you get a
> struct page for the payload and can do zero-copy tx?
Yes.
> (And can userspace
> still write to the page, invalidating checksums in the header?)
>
> Stefan
Userspace can still write to the page, but for correctness (e.g. in the
case of SPLICE_F_GIFT described by vmsplice(2)), it should not do this.
For vmsplice, it may be hard to detect when the page can be reused. Maybe
MSG_ZEROCOPY [1] is better.
Anyway, sendpage() could still be useful for sendfile() or splice().
Thanks
[1] https://netdevconf.org/2.1/papers/netdev.pdf
On Mon, Apr 08, 2019 at 02:43:28PM +0800, Jason Wang wrote:
>
> On 2019/4/4 6:58 PM, Stefano Garzarella wrote:
> > This series tries to increase the throughput of virtio-vsock with slight
> > changes:
> > - patch 1/4: reduces the number of credit update messages sent to the
> > transmitter
> > - patch 2/4: allows the host to split packets on multiple buffers,
> > in this way, we can remove the packet size limit to
> > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE
> > - patch 3/4: uses VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size
> > allowed
> > - patch 4/4: increases RX buffer size to 64 KiB (affects only host->guest)
> >
> > RFC:
> > - maybe patch 4 can be replaced with multiple queues with different
> > buffer sizes or using EWMA to adapt the buffer size to the traffic
>
>
> Or EWMA + mergeable rx buffers, but if we decide to unify the datapath
> with virtio-net, we can reuse its code.
>
>
> >
> > - as Jason suggested in a previous thread [1] I'll evaluate to use
> > virtio-net as transport, but I need to understand better how to
> > interface with it, maybe introducing sk_buff in virtio-vsock.
> >
> > Any suggestions?
>
>
> My understanding is that this is not a must, but if it makes things
> easier, we can do it.
Hopefully it will simplify maintenance and avoid duplicated code.
>
> Another thing that may help is to implement sendpage(), which will greatly
> improve the performance.
Thanks for your suggestions!
I'll try to implement sendpage() in VSOCK to measure the improvement.
Cheers,
Stefano