While I was testing this new series (v2), I discovered huge memory usage
and a memory leak in the virtio-vsock driver in the guest when I sent
1-byte packets to the guest.
These issues have been present since the introduction of the virtio-vsock
driver. I added patches 1 and 2 to this series to fix them, in order
to better track the performance trends.
v1: https://patchwork.kernel.org/cover/10885431/
v2:
- Add patch 1 to limit the memory usage
- Add patch 2 to avoid memory leak during the socket release
- Add patch 3 to fix locking of fwd_cnt and buf_alloc
- Patch 4: fix 'free_space' type (u32 instead of s64) [Stefan]
- Patch 5: Avoid integer underflow of iov_len [Stefan]
- Patch 5: Fix packet capture in order to see the exact packets that are
delivered. [Stefan]
- Add patch 8 to make the RX buffer size tunable [Stefan]
Below are the benchmarks, step by step. I used iperf3 [1] modified with VSOCK
support.
As Michael suggested in v1, I booted host and guest with 'nosmap', and I
added a column with virtio-net+vhost-net performance.
A brief description of patches:
- Patches 1+2: limit the memory usage with an extra copy and avoid a memory leak
- Patches 3+4: fix locking and reduce the number of credit update messages sent
  to the transmitter
- Patches 5+6: allow the host to split packets across multiple buffers and use
  VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the maximum packet size allowed
- Patches 7+8: increase the RX buffer size to 64 KiB
host -> guest [Gbps]
pkt_size  before opt  p 1+2  p 3+4  p 5+6  p 7+8  virtio-net+vhost  virtio-net+vhost (TCP_NODELAY)
64 0.068 0.063 0.130 0.131 0.128 0.188 0.187
256 0.274 0.236 0.392 0.338 0.282 0.749 0.654
512 0.531 0.457 0.862 0.725 0.602 1.419 1.414
1K 0.954 0.827 1.591 1.598 1.548 2.599 2.640
2K 1.783 1.543 3.731 3.637 3.469 4.530 4.754
4K 3.332 3.436 7.164 7.124 6.494 7.738 7.696
8K 5.792 5.530 11.653 11.787 11.444 12.307 11.850
16K 8.405 8.462 16.372 16.855 17.562 16.936 16.954
32K 14.208 13.669 18.945 20.009 23.128 21.980 23.015
64K 21.082 18.893 20.266 20.903 30.622 27.290 27.383
128K 20.696 20.148 20.112 21.746 32.152 30.446 30.990
256K 20.801 20.589 20.725 22.685 34.721 33.151 32.745
512K 21.220 20.465 20.432 22.106 34.496 36.847 31.096
guest -> host [Gbps]
pkt_size  before opt  p 1+2  p 3+4  p 5+6  p 7+8  virtio-net+vhost  virtio-net+vhost (TCP_NODELAY)
64 0.089 0.091 0.120 0.115 0.117 0.274 0.272
256 0.352 0.354 0.452 0.445 0.451 1.085 1.136
512 0.705 0.704 0.893 0.858 0.898 2.131 1.882
1K 1.394 1.433 1.721 1.669 1.691 3.984 3.576
2K 2.818 2.874 3.316 3.249 3.303 6.719 6.359
4K 5.293 5.397 6.129 5.933 6.082 10.105 9.860
8K 8.890 9.151 10.990 10.545 10.519 15.239 14.868
16K 11.444 11.018 12.074 15.255 15.577 20.551 20.848
32K 11.229 10.875 10.857 24.401 25.227 26.294 26.380
64K 10.832 10.545 10.816 39.487 39.616 34.996 32.041
128K 10.435 10.241 10.500 39.813 40.012 38.379 35.055
256K 10.263 9.866 9.845 34.971 35.143 36.559 37.232
512K 10.224 10.060 10.092 35.469 34.627 34.963 33.401
As Stefan suggested in v1, this time I also measured the efficiency in this
way:
efficiency = Mbps / (%CPU_Host + %CPU_Guest)
The '%CPU_Guest' is measured inside the VM. I know that this is not the best
way, but it's provided for free by iperf3 and can serve as an indication.
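For example, in the host -> guest case with 512K packets after patches 7+8,
34.496 Gbps at an efficiency of 496.35 corresponds to a combined host+guest
CPU load of roughly 34496 / 496.35 ~= 69.5%.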
host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size  before opt  p 1+2  p 3+4  p 5+6  p 7+8  virtio-net+vhost  virtio-net+vhost (TCP_NODELAY)
64 0.94 0.59 3.96 4.06 4.09 2.82 2.11
256 2.62 2.50 6.45 6.09 5.81 9.64 8.73
512 5.16 4.87 13.16 12.39 11.67 17.83 17.76
1K 9.16 8.85 24.98 24.97 25.01 32.57 32.04
2K 17.41 17.03 49.09 48.59 49.22 55.31 57.14
4K 32.99 33.62 90.80 90.98 91.72 91.79 91.40
8K 58.51 59.98 153.53 170.83 167.31 137.51 132.85
16K 89.32 95.29 216.98 264.18 260.95 176.05 176.05
32K 152.94 167.10 285.75 387.02 360.81 215.49 226.30
64K 250.38 307.20 317.65 489.53 472.70 238.97 244.27
128K 327.99 335.24 335.76 523.71 486.41 253.29 260.86
256K 327.06 334.24 338.64 533.76 509.85 267.78 266.22
512K 337.36 330.61 334.95 512.90 496.35 280.42 241.43
guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size  before opt  p 1+2  p 3+4  p 5+6  p 7+8  virtio-net+vhost  virtio-net+vhost (TCP_NODELAY)
64 0.90 0.91 1.37 1.32 1.35 2.15 2.13
256 3.59 3.55 5.23 5.19 5.29 8.50 8.89
512 7.19 7.08 10.21 9.95 10.38 16.74 14.71
1K 14.15 14.34 19.85 19.06 19.33 31.44 28.11
2K 28.44 29.09 37.78 37.18 37.49 53.07 50.63
4K 55.37 57.60 71.02 69.27 70.97 81.56 79.32
8K 105.58 100.45 111.95 124.68 123.61 120.85 118.66
16K 141.63 138.24 137.67 187.41 190.20 160.43 163.00
32K 147.56 143.09 138.48 296.41 301.04 214.64 223.94
64K 144.81 143.27 138.49 433.98 462.26 298.86 269.71
128K 150.14 147.99 146.85 511.36 514.29 350.17 298.09
256K 156.69 152.25 148.69 542.19 549.97 326.42 333.32
512K 157.29 153.35 152.22 546.52 533.24 315.55 302.27
[1] https://github.com/stefano-garzarella/iperf/
Stefano Garzarella (8):
vsock/virtio: limit the memory used per-socket
vsock/virtio: free packets during the socket release
vsock/virtio: fix locking for fwd_cnt and buf_alloc
vsock/virtio: reduce credit update messages
vhost/vsock: split packets to send using multiple buffers
vsock/virtio: change the maximum packet size allowed
vsock/virtio: increase RX buffer size to 64 KiB
vsock/virtio: make the RX buffer size tunable
drivers/vhost/vsock.c | 53 +++++++--
include/linux/virtio_vsock.h | 14 ++-
net/vmw_vsock/virtio_transport.c | 28 ++++-
net/vmw_vsock/virtio_transport_common.c | 144 ++++++++++++++++++------
4 files changed, 190 insertions(+), 49 deletions(-)
--
2.20.1
The RX buffer size determines the memory consumption of the
vsock/virtio guest driver, so we make it tunable through
a module parameter.
The allowed sizes are between 4 KiB and 64 KiB in order to be
compatible with old host drivers.
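The parameter can be set when the module is loaded or, since it is 0644,
changed later through sysfs; only buffers queued after the change use the
new size. For example (assuming the transport is built as the
vmw_vsock_virtio_transport module):

  # modprobe vmw_vsock_virtio_transport rx_buf_size=16384
  # echo 32768 > /sys/module/vmw_vsock_virtio_transport/parameters/rx_buf_size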
Suggested-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
---
include/linux/virtio_vsock.h | 1 +
net/vmw_vsock/virtio_transport.c | 27 ++++++++++++++++++++++++++-
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 5a9d25be72df..b9f8c3d91f80 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -13,6 +13,7 @@
#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 64)
#define VIRTIO_VSOCK_MAX_BUF_SIZE 0xFFFFFFFFUL
#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE (1024 * 64)
+#define VIRTIO_VSOCK_MIN_PKT_BUF_SIZE (1024 * 4)
enum {
VSOCK_VQ_RX = 0, /* for host to guest data */
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index af1d2ce12f54..732398b4e28f 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -66,6 +66,31 @@ struct virtio_vsock {
u32 guest_cid;
};
+static unsigned int rx_buf_size = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+
+static int param_set_rx_buf_size(const char *val, const struct kernel_param *kp)
+{
+ unsigned int size;
+ int ret;
+
+ ret = kstrtouint(val, 0, &size);
+ if (ret)
+ return ret;
+
+ if (size < VIRTIO_VSOCK_MIN_PKT_BUF_SIZE ||
+ size > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+ return -EINVAL;
+
+ return param_set_uint(val, kp);
+};
+
+static const struct kernel_param_ops param_ops_rx_buf_size = {
+ .set = param_set_rx_buf_size,
+ .get = param_get_uint,
+};
+
+module_param_cb(rx_buf_size, &param_ops_rx_buf_size, &rx_buf_size, 0644);
+
static struct virtio_vsock *virtio_vsock_get(void)
{
return the_virtio_vsock;
@@ -261,7 +286,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
{
- int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+ int buf_len = rx_buf_size;
struct virtio_vsock_pkt *pkt;
struct scatterlist hdr, buf, *sgs[2];
struct virtqueue *vq;
--
2.20.1
When the socket is released, we should free all packets
queued in the per-socket list in order to avoid a memory
leak.
Signed-off-by: Stefano Garzarella <[email protected]>
---
net/vmw_vsock/virtio_transport_common.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 0248d6808755..65c8b4a23f2b 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -827,12 +827,20 @@ static bool virtio_transport_close(struct vsock_sock *vsk)
void virtio_transport_release(struct vsock_sock *vsk)
{
+ struct virtio_vsock_sock *vvs = vsk->trans;
+ struct virtio_vsock_buf *buf;
struct sock *sk = &vsk->sk;
bool remove_sock = true;
lock_sock(sk);
if (sk->sk_type == SOCK_STREAM)
remove_sock = virtio_transport_close(vsk);
+ while (!list_empty(&vvs->rx_queue)) {
+ buf = list_first_entry(&vvs->rx_queue,
+ struct virtio_vsock_buf, list);
+ list_del(&buf->list);
+ virtio_transport_free_buf(buf);
+ }
release_sock(sk);
if (remove_sock)
--
2.20.1
From: Stefano Garzarella <[email protected]>
Date: Fri, 10 May 2019 14:58:37 +0200
> @@ -827,12 +827,20 @@ static bool virtio_transport_close(struct vsock_sock *vsk)
>
> void virtio_transport_release(struct vsock_sock *vsk)
> {
> + struct virtio_vsock_sock *vvs = vsk->trans;
> + struct virtio_vsock_buf *buf;
> struct sock *sk = &vsk->sk;
> bool remove_sock = true;
>
> lock_sock(sk);
> if (sk->sk_type == SOCK_STREAM)
> remove_sock = virtio_transport_close(vsk);
> + while (!list_empty(&vvs->rx_queue)) {
> + buf = list_first_entry(&vvs->rx_queue,
> + struct virtio_vsock_buf, list);
Please use list_for_each_entry_safe().
On Fri, May 10, 2019 at 03:20:08PM -0700, David Miller wrote:
> From: Stefano Garzarella <[email protected]>
> Date: Fri, 10 May 2019 14:58:37 +0200
>
> > [...]
>
> Please use list_for_each_entry_safe().
Thanks for the review, I'll change it in v3.
Cheers,
Stefano
On 2019/5/10 at 8:58 PM, Stefano Garzarella wrote:
> [...]
Hi:
Do you have any explanation for why vsock is better here? Is this because
of the mergeable buffer? If so, we need to test with mrg_rxbuf=off.
Thanks
On 2019/5/10 at 8:58 PM, Stefano Garzarella wrote:
> The RX buffer size determines the memory consumption of the
> vsock/virtio guest driver, so we make it tunable through
> a module parameter.
>
> The allowed sizes are between 4 KiB and 64 KiB in order to be
> compatible with old host drivers.
>
> Suggested-by: Stefan Hajnoczi <[email protected]>
> Signed-off-by: Stefano Garzarella <[email protected]>
I don't see much value in doing this through the kernel command line. We
should deal with it automatically like virtio-net does. Or even a module
parameter would be better.
Thanks
On 2019/5/13 at 6:05 PM, Jason Wang wrote:
>
> On 2019/5/10 at 8:58 PM, Stefano Garzarella wrote:
> > [...]
>
>
> I don't see much value in doing this through the kernel command line. We
> should deal with it automatically like virtio-net does. Or even
> a module parameter would be better.
>
> Thanks
Sorry, I misread the patch. But even a module parameter is not flexible
enough. We should deal with this transparently.
Thanks
On Mon, May 13, 2019 at 05:33:40PM +0800, Jason Wang wrote:
>
> On 2019/5/10 at 8:58 PM, Stefano Garzarella wrote:
> > [...]
>
>
> Hi:
>
> Do you have any explanation for why vsock is better here? Is this because
> of the mergeable buffer? If so, we need to test with mrg_rxbuf=off.
Hi Jason,
virtio-net stays faster for packets with sizes up to 16K/32K; maybe, as
you suggested, this could be related to the mergeable buffer.
I'll try to disable it and re-run the tests.
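To disable it, I'd presumably just turn the feature off on the virtio-net
device in QEMU for that run, e.g. something like (the netdev id here is
only an example):

  -device virtio-net-pci,netdev=net0,mrg_rxbuf=off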
Thanks,
Stefano
On Mon, May 13, 2019 at 08:46:19PM +0800, Jason Wang wrote:
>
> On 2019/5/13 at 6:05 PM, Jason Wang wrote:
> >
> > On 2019/5/10 at 8:58 PM, Stefano Garzarella wrote:
> > > [...]
> >
> >
> > I don't see much value in doing this through the kernel command line. We
> > should deal with it automatically like virtio-net does. Or even a
> > module parameter would be better.
> >
> > Thanks
>
>
> Sorry, I misread the patch. But even a module parameter is not flexible
> enough. We should deal with this transparently.
>
Okay, I'll try to understand how we can automatically adapt the RX
buffer size. Since the flow is stream-based, the receiver doesn't know the
original packet size.
Maybe I can reuse the EWMA approach to understand whether the buffers are
entirely filled or not, and in that case increase (e.g. double) or decrease
the size.
I'll try to do it!
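Just to sketch the idea (completely untested; the helper name and the
'ewma_used' field are made up for the example, only the MIN/MAX defines
come from this series):

  /* Hypothetical helper: pick the next RX buffer size from an EWMA of how
   * much of each posted buffer was actually used; 'ewma_used' would be a
   * new field kept in struct virtio_vsock.
   */
  static u32 virtio_vsock_next_rx_buf_len(u32 cur_len, u32 *ewma_used,
  					  u32 used_len)
  {
  	/* new = 7/8 * old + 1/8 * sample, similar in spirit to the
  	 * estimator virtio-net uses for mergeable buffers.
  	 */
  	*ewma_used = (7 * *ewma_used + used_len) / 8;

  	if (*ewma_used > cur_len - cur_len / 4)
  		/* Buffers are almost always full: try doubling. */
  		return min_t(u32, cur_len * 2, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE);

  	if (*ewma_used < cur_len / 4)
  		/* Buffers are mostly empty: shrink. */
  		return max_t(u32, cur_len / 2, VIRTIO_VSOCK_MIN_PKT_BUF_SIZE);

  	return cur_len;
  }

Then virtio_vsock_rx_fill() would use the returned length for the next
buffers it queues.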
Thanks,
Stefano
On Fri, May 10, 2019 at 02:58:37PM +0200, Stefano Garzarella wrote:
> When the socket is released, we should free all packets
> queued in the per-socket list in order to avoid a memory
> leak.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> net/vmw_vsock/virtio_transport_common.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
Ouch, this would be nice as a separate patch that can be merged right
away (with s/virtio_vsock_buf/virtio_vsock_pkt/).
On Thu, May 16, 2019 at 04:32:18PM +0100, Stefan Hajnoczi wrote:
> On Fri, May 10, 2019 at 02:58:37PM +0200, Stefano Garzarella wrote:
> > [...]
>
> Ouch, this would be nice as a separate patch that can be merged right
> away (with s/virtio_vsock_buf/virtio_vsock_pkt/).
Okay, I'll fix this patch following David's comment and I'll send it
as a separate patch using virtio_vsock_pkt.
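Something like this is what I have in mind for the standalone patch (just a
sketch, not tested yet):

  	struct virtio_vsock_pkt *pkt, *tmp;

  	/* Free any packets still queued on the per-socket RX list. */
  	list_for_each_entry_safe(pkt, tmp, &vvs->rx_queue, list) {
  		list_del(&pkt->list);
  		virtio_transport_free_pkt(pkt);
  	}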
Thanks,
Stefano
On Mon, May 13, 2019 at 05:33:40PM +0800, Jason Wang wrote:
>
> On 2019/5/10 at 8:58 PM, Stefano Garzarella wrote:
> > [...]
>
>
> Hi:
>
> Do you have any explanation for why vsock is better here? Is this because
> of the mergeable buffer? If so, we need to test with mrg_rxbuf=off.
>
Hi Jason,
I tried to disable the mergeable buffer but I got even worse performance
with virtio-net.
Do you think the differences could be related to the TCP/IP stack?
Thanks,
Stefano