2019-07-17 11:31:22

by Stefano Garzarella

Subject: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

This series tries to increase the throughput of virtio-vsock with small
changes.
While testing the v2 of this series I discovered huge memory usage, so I
added patch 1 to mitigate this issue. I put it in this series in order
to better track the performance trends.

v4:
- rebased all patches on current master (conflicts in Patch 4)
- Patch 1: added Stefan's R-b
- Patch 3: removed lock when buf_alloc is written [David];
moved this patch after "vsock/virtio: reduce credit update messages"
to make it clearer
- Patch 4: vhost_exceeds_weight() was recently introduced, so I solved some
conflicts

v3: https://patchwork.kernel.org/cover/10970145

v2: https://patchwork.kernel.org/cover/10938743

v1: https://patchwork.kernel.org/cover/10885431

Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
support. As Michael suggested in the v1, I booted host and guest with 'nosmap'.

A brief description of patches:
- Patch 1: limits the memory usage with an extra copy for small packets
- Patches 2+3: reduce the number of credit update messages sent to the
transmitter
- Patches 4+5: allow the host to split packets across multiple buffers and use
VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the maximum packet size allowed

host -> guest [Gbps]
pkt_size before opt p 1 p 2+3 p 4+5

32 0.032 0.030 0.048 0.051
64 0.061 0.059 0.108 0.117
128 0.122 0.112 0.227 0.234
256 0.244 0.241 0.418 0.415
512 0.459 0.466 0.847 0.865
1K 0.927 0.919 1.657 1.641
2K 1.884 1.813 3.262 3.269
4K 3.378 3.326 6.044 6.195
8K 5.637 5.676 10.141 11.287
16K 8.250 8.402 15.976 16.736
32K 13.327 13.204 19.013 20.515
64K 21.241 21.341 20.973 21.879
128K 21.851 22.354 21.816 23.203
256K 21.408 21.693 21.846 24.088
512K 21.600 21.899 21.921 24.106

guest -> host [Gbps]
pkt_size before opt p 1 p 2+3 p 4+5

32 0.045 0.046 0.057 0.057
64 0.089 0.091 0.103 0.104
128 0.170 0.179 0.192 0.200
256 0.364 0.351 0.361 0.379
512 0.709 0.699 0.731 0.790
1K 1.399 1.407 1.395 1.427
2K 2.670 2.684 2.745 2.835
4K 5.171 5.199 5.305 5.451
8K 8.442 8.500 10.083 9.941
16K 12.305 12.259 13.519 15.385
32K 11.418 11.150 11.988 24.680
64K 10.778 10.659 11.589 35.273
128K 10.421 10.339 10.939 40.338
256K 10.300 9.719 10.508 36.562
512K 9.833 9.808 10.612 35.979

As Stefan suggested in the v1, I also measured the efficiency in this way:
efficiency = Mbps / (%CPU_Host + %CPU_Guest)

The '%CPU_Guest' is measured inside the VM. I know that this is not the best
way, but iperf3 provides it for free and it can serve as an indication.
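
(Purely as an illustration of the formula above, with made-up numbers: a run
that reaches 10 Gbps while using 50% of a host CPU and 50% of a guest CPU
scores 10000 / (50 + 50) = 100.)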

host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt p 1 p 2+3 p 4+5

32 0.35 0.45 0.79 1.02
64 0.56 0.80 1.41 1.54
128 1.11 1.52 3.03 3.12
256 2.20 2.16 5.44 5.58
512 4.17 4.18 10.96 11.46
1K 8.30 8.26 20.99 20.89
2K 16.82 16.31 39.76 39.73
4K 30.89 30.79 74.07 75.73
8K 53.74 54.49 124.24 148.91
16K 80.68 83.63 200.21 232.79
32K 132.27 132.52 260.81 357.07
64K 229.82 230.40 300.19 444.18
128K 332.60 329.78 331.51 492.28
256K 331.06 337.22 339.59 511.59
512K 335.58 328.50 331.56 504.56

guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt p 1 p 2+3 p 4+5

32 0.43 0.43 0.53 0.56
64 0.85 0.86 1.04 1.10
128 1.63 1.71 2.07 2.13
256 3.48 3.35 4.02 4.22
512 6.80 6.67 7.97 8.63
1K 13.32 13.31 15.72 15.94
2K 25.79 25.92 30.84 30.98
4K 50.37 50.48 58.79 59.69
8K 95.90 96.15 107.04 110.33
16K 145.80 145.43 143.97 174.70
32K 147.06 144.74 146.02 282.48
64K 145.25 143.99 141.62 406.40
128K 149.34 146.96 147.49 489.34
256K 156.35 149.81 152.21 536.37
512K 151.65 150.74 151.52 519.93

[1] https://github.com/stefano-garzarella/iperf/

Stefano Garzarella (5):
vsock/virtio: limit the memory used per-socket
vsock/virtio: reduce credit update messages
vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
vhost/vsock: split packets to send using multiple buffers
vsock/virtio: change the maximum packet size allowed

drivers/vhost/vsock.c | 68 ++++++++++++-----
include/linux/virtio_vsock.h | 4 +-
net/vmw_vsock/virtio_transport.c | 1 +
net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
4 files changed, 134 insertions(+), 38 deletions(-)

--
2.20.1


2019-07-17 11:31:42

by Stefano Garzarella

Subject: [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()

fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
the same spinlock also in the TX path.

Also move buf_alloc under the same lock.

Signed-off-by: Stefano Garzarella <[email protected]>
---
include/linux/virtio_vsock.h | 2 +-
net/vmw_vsock/virtio_transport_common.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 49fc9d20bc43..4c7781f4b29b 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -35,7 +35,6 @@ struct virtio_vsock_sock {

/* Protected by tx_lock */
u32 tx_cnt;
- u32 buf_alloc;
u32 peer_fwd_cnt;
u32 peer_buf_alloc;

@@ -43,6 +42,7 @@ struct virtio_vsock_sock {
u32 fwd_cnt;
u32 last_fwd_cnt;
u32 rx_bytes;
+ u32 buf_alloc;
struct list_head rx_queue;
};

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a85559d4d974..34a2b42313b7 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -210,11 +210,11 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,

void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
{
- spin_lock_bh(&vvs->tx_lock);
+ spin_lock_bh(&vvs->rx_lock);
vvs->last_fwd_cnt = vvs->fwd_cnt;
pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
- spin_unlock_bh(&vvs->tx_lock);
+ spin_unlock_bh(&vvs->rx_lock);
}
EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);

--
2.20.1

2019-07-17 11:31:48

by Stefano Garzarella

Subject: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers

If the packets to be sent to the guest are bigger than the buffers
available, we can split them across multiple buffers, fixing the
length in each packet header.
This is safe since virtio-vsock supports only stream sockets.
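
(As a concrete example of the mechanism, using the 4 KB guest RX buffers
mentioned in the cover letter: a queued 64 KB packet is copied out in chunks
of up to 4096 - sizeof(hdr) bytes, each preceded by its own header with
hdr.len rewritten to the chunk size, until pkt->off reaches pkt->len.)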

Signed-off-by: Stefano Garzarella <[email protected]>
---
drivers/vhost/vsock.c | 66 ++++++++++++++++++-------
net/vmw_vsock/virtio_transport_common.c | 15 ++++--
2 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6c8390a2af52..9f57736fe15e 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -102,7 +102,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
struct iov_iter iov_iter;
unsigned out, in;
size_t nbytes;
- size_t len;
+ size_t iov_len, payload_len;
int head;

spin_lock_bh(&vsock->send_pkt_list_lock);
@@ -147,8 +147,24 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
break;
}

- len = iov_length(&vq->iov[out], in);
- iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
+ iov_len = iov_length(&vq->iov[out], in);
+ if (iov_len < sizeof(pkt->hdr)) {
+ virtio_transport_free_pkt(pkt);
+ vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
+ break;
+ }
+
+ iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
+ payload_len = pkt->len - pkt->off;
+
+ /* If the packet is greater than the space available in the
+ * buffer, we split it using multiple buffers.
+ */
+ if (payload_len > iov_len - sizeof(pkt->hdr))
+ payload_len = iov_len - sizeof(pkt->hdr);
+
+ /* Set the correct length in the header */
+ pkt->hdr.len = cpu_to_le32(payload_len);

nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
if (nbytes != sizeof(pkt->hdr)) {
@@ -157,33 +173,47 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
break;
}

- nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
- if (nbytes != pkt->len) {
+ nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len,
+ &iov_iter);
+ if (nbytes != payload_len) {
virtio_transport_free_pkt(pkt);
vq_err(vq, "Faulted on copying pkt buf\n");
break;
}

- vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
+ vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
added = true;

- if (pkt->reply) {
- int val;
-
- val = atomic_dec_return(&vsock->queued_replies);
-
- /* Do we have resources to resume tx processing? */
- if (val + 1 == tx_vq->num)
- restart_tx = true;
- }
-
/* Deliver to monitoring devices all correctly transmitted
* packets.
*/
virtio_transport_deliver_tap_pkt(pkt);

- total_len += pkt->len;
- virtio_transport_free_pkt(pkt);
+ pkt->off += payload_len;
+ total_len += payload_len;
+
+ /* If we didn't send all the payload we can requeue the packet
+ * to send it with the next available buffer.
+ */
+ if (pkt->off < pkt->len) {
+ spin_lock_bh(&vsock->send_pkt_list_lock);
+ list_add(&pkt->list, &vsock->send_pkt_list);
+ spin_unlock_bh(&vsock->send_pkt_list_lock);
+ } else {
+ if (pkt->reply) {
+ int val;
+
+ val = atomic_dec_return(&vsock->queued_replies);
+
+ /* Do we have resources to resume tx
+ * processing?
+ */
+ if (val + 1 == tx_vq->num)
+ restart_tx = true;
+ }
+
+ virtio_transport_free_pkt(pkt);
+ }
} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
if (added)
vhost_signal(&vsock->dev, vq);
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 34a2b42313b7..56fab3f03d0e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -97,8 +97,17 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
struct virtio_vsock_pkt *pkt = opaque;
struct af_vsockmon_hdr *hdr;
struct sk_buff *skb;
+ size_t payload_len;
+ void *payload_buf;

- skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len,
+ /* A packet could be split to fit the RX buffer, so we can retrieve
+ * the payload length from the header and the buffer pointer taking
+ * care of the offset in the original packet.
+ */
+ payload_len = le32_to_cpu(pkt->hdr.len);
+ payload_buf = pkt->buf + pkt->off;
+
+ skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len,
GFP_ATOMIC);
if (!skb)
return NULL;
@@ -138,8 +147,8 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)

skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr));

- if (pkt->len) {
- skb_put_data(skb, pkt->buf, pkt->len);
+ if (payload_len) {
+ skb_put_data(skb, payload_buf, payload_len);
}

return skb;
--
2.20.1

2019-07-17 11:32:39

by Stefano Garzarella

Subject: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

Since virtio-vsock was introduced, the buffers filled by the host
and pushed to the guest using the vring are directly queued in
a per-socket list. These buffers are preallocated by the guest
with a fixed size (4 KB).

The maximum amount of memory used by each socket should be
controlled by the credit mechanism.
The default credit available per socket is 256 KB, but if we use
only 1 byte per packet, the guest can queue up to 262144 4 KB
buffers, using up to 1 GB of memory per socket. In addition, the
guest will continue to fill the vring with new 4 KB free buffers
to avoid starvation of other sockets.
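
For reference, the worst case described above, spelled out (all numbers
taken from this description):

    credit window        = 256 KB = 262144 bytes
    payload per packet   = 1 byte  -> up to 262144 packets fit in the window
    buffer per packet    = 4 KB    -> 262144 * 4 KB = 1 GB of memory per socket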

This patch mitigates this issue by copying the payload of small
packets (< 128 bytes) into the buffer of the last packet queued, in
order to avoid wasting memory.

Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
---
drivers/vhost/vsock.c | 2 +
include/linux/virtio_vsock.h | 1 +
net/vmw_vsock/virtio_transport.c | 1 +
net/vmw_vsock/virtio_transport_common.c | 60 +++++++++++++++++++++----
4 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6a50e1d0529c..6c8390a2af52 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -329,6 +329,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
return NULL;
}

+ pkt->buf_len = pkt->len;
+
nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
if (nbytes != pkt->len) {
vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index e223e2632edd..7d973903f52e 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -52,6 +52,7 @@ struct virtio_vsock_pkt {
/* socket refcnt not held, only use for cancellation */
struct vsock_sock *vsk;
void *buf;
+ u32 buf_len;
u32 len;
u32 off;
bool reply;
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 0815d1357861..082a30936690 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -307,6 +307,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
break;
}

+ pkt->buf_len = buf_len;
pkt->len = buf_len;

sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 6f1a8aff65c5..095221f94786 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -26,6 +26,9 @@
/* How long to wait for graceful shutdown of a connection */
#define VSOCK_CLOSE_TIMEOUT (8 * HZ)

+/* Threshold for detecting small packets to copy */
+#define GOOD_COPY_LEN 128
+
static const struct virtio_transport *virtio_transport_get_ops(void)
{
const struct vsock_transport *t = vsock_core_get_transport();
@@ -64,6 +67,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
pkt->buf = kmalloc(len, GFP_KERNEL);
if (!pkt->buf)
goto out_pkt;
+
+ pkt->buf_len = len;
+
err = memcpy_from_msg(pkt->buf, info->msg, len);
if (err)
goto out;
@@ -841,24 +847,60 @@ virtio_transport_recv_connecting(struct sock *sk,
return err;
}

+static void
+virtio_transport_recv_enqueue(struct vsock_sock *vsk,
+ struct virtio_vsock_pkt *pkt)
+{
+ struct virtio_vsock_sock *vvs = vsk->trans;
+ bool free_pkt = false;
+
+ pkt->len = le32_to_cpu(pkt->hdr.len);
+ pkt->off = 0;
+
+ spin_lock_bh(&vvs->rx_lock);
+
+ virtio_transport_inc_rx_pkt(vvs, pkt);
+
+ /* Try to copy small packets into the buffer of last packet queued,
+ * to avoid wasting memory queueing the entire buffer with a small
+ * payload.
+ */
+ if (pkt->len <= GOOD_COPY_LEN && !list_empty(&vvs->rx_queue)) {
+ struct virtio_vsock_pkt *last_pkt;
+
+ last_pkt = list_last_entry(&vvs->rx_queue,
+ struct virtio_vsock_pkt, list);
+
+ /* If there is space in the last packet queued, we copy the
+ * new packet in its buffer.
+ */
+ if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
+ memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
+ pkt->len);
+ last_pkt->len += pkt->len;
+ free_pkt = true;
+ goto out;
+ }
+ }
+
+ list_add_tail(&pkt->list, &vvs->rx_queue);
+
+out:
+ spin_unlock_bh(&vvs->rx_lock);
+ if (free_pkt)
+ virtio_transport_free_pkt(pkt);
+}
+
static int
virtio_transport_recv_connected(struct sock *sk,
struct virtio_vsock_pkt *pkt)
{
struct vsock_sock *vsk = vsock_sk(sk);
- struct virtio_vsock_sock *vvs = vsk->trans;
int err = 0;

switch (le16_to_cpu(pkt->hdr.op)) {
case VIRTIO_VSOCK_OP_RW:
- pkt->len = le32_to_cpu(pkt->hdr.len);
- pkt->off = 0;
-
- spin_lock_bh(&vvs->rx_lock);
- virtio_transport_inc_rx_pkt(vvs, pkt);
- list_add_tail(&pkt->list, &vvs->rx_queue);
- spin_unlock_bh(&vvs->rx_lock);
-
+ virtio_transport_recv_enqueue(vsk, pkt);
sk->sk_data_ready(sk);
return err;
case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
--
2.20.1

2019-07-17 11:33:00

by Stefano Garzarella

Subject: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed

Now that we are able to split packets, we can avoid limiting
their size to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the maximum
packet size.

Signed-off-by: Stefano Garzarella <[email protected]>
---
net/vmw_vsock/virtio_transport_common.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 56fab3f03d0e..94cc0fa3e848 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -181,8 +181,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
vvs = vsk->trans;

/* we can send less than pkt_len bytes */
- if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
- pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+ if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+ pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;

/* virtio_transport_get_credit might return less than pkt_len credit */
pkt_len = virtio_transport_get_credit(vvs, pkt_len);
--
2.20.1

2019-07-17 11:33:38

by Stefano Garzarella

Subject: [PATCH v4 2/5] vsock/virtio: reduce credit update messages

In order to reduce the number of credit update messages,
we send them only when the space available seen by the
transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
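
(As a worked example, assuming the 64 KB VIRTIO_VSOCK_MAX_PKT_BUF_SIZE used
later in this series: the receive path now computes

    free_space = buf_alloc - (fwd_cnt - last_fwd_cnt)

and sends a credit update only when this drops below 64 KB, i.e. only when
the transmitter no longer sees room for at least one maximum-sized packet.)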

Signed-off-by: Stefano Garzarella <[email protected]>
---
include/linux/virtio_vsock.h | 1 +
net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 7d973903f52e..49fc9d20bc43 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -41,6 +41,7 @@ struct virtio_vsock_sock {

/* Protected by rx_lock */
u32 fwd_cnt;
+ u32 last_fwd_cnt;
u32 rx_bytes;
struct list_head rx_queue;
};
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 095221f94786..a85559d4d974 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
{
spin_lock_bh(&vvs->tx_lock);
+ vvs->last_fwd_cnt = vvs->fwd_cnt;
pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
spin_unlock_bh(&vvs->tx_lock);
@@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
struct virtio_vsock_sock *vvs = vsk->trans;
struct virtio_vsock_pkt *pkt;
size_t bytes, total = 0;
+ u32 free_space;
int err = -EFAULT;

spin_lock_bh(&vvs->rx_lock);
@@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
virtio_transport_free_pkt(pkt);
}
}
+
+ free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
+
spin_unlock_bh(&vvs->rx_lock);

- /* Send a credit pkt to peer */
- virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
- NULL);
+ /* We send a credit update only when the space available seen
+ * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
+ */
+ if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
+ virtio_transport_send_credit_update(vsk,
+ VIRTIO_VSOCK_TYPE_STREAM,
+ NULL);
+ }

return total;

--
2.20.1

2019-07-17 14:52:36

by Michael S. Tsirkin

Subject: Re: [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()

On Wed, Jul 17, 2019 at 01:30:28PM +0200, Stefano Garzarella wrote:
> fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
> the same spinlock also if we are in the TX path.
>
> Move also buf_alloc under the same lock.
>
> Signed-off-by: Stefano Garzarella <[email protected]>

Wait a second, is this a bugfix?
If it's used under the wrong lock, won't values get corrupted?
Won't traffic then stall, or more data get sent than the
credits allow?

> ---
> include/linux/virtio_vsock.h | 2 +-
> net/vmw_vsock/virtio_transport_common.c | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 49fc9d20bc43..4c7781f4b29b 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -35,7 +35,6 @@ struct virtio_vsock_sock {
>
> /* Protected by tx_lock */
> u32 tx_cnt;
> - u32 buf_alloc;
> u32 peer_fwd_cnt;
> u32 peer_buf_alloc;
>
> @@ -43,6 +42,7 @@ struct virtio_vsock_sock {
> u32 fwd_cnt;
> u32 last_fwd_cnt;
> u32 rx_bytes;
> + u32 buf_alloc;
> struct list_head rx_queue;
> };
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index a85559d4d974..34a2b42313b7 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -210,11 +210,11 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
>
> void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> {
> - spin_lock_bh(&vvs->tx_lock);
> + spin_lock_bh(&vvs->rx_lock);
> vvs->last_fwd_cnt = vvs->fwd_cnt;
> pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> - spin_unlock_bh(&vvs->tx_lock);
> + spin_unlock_bh(&vvs->rx_lock);
> }
> EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
>
> --
> 2.20.1

2019-07-17 14:55:46

by Michael S. Tsirkin

Subject: Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers

On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> If the packets to sent to the guest are bigger than the buffer
> available, we can split them, using multiple buffers and fixing
> the length in the packet header.
> This is safe since virtio-vsock supports only stream sockets.
>
> Signed-off-by: Stefano Garzarella <[email protected]>

So how does it work right now? If an app
does sendmsg with a 64K buffer and the other
side publishes 4K buffers - does it just stall?


> ---
> drivers/vhost/vsock.c | 66 ++++++++++++++++++-------
> net/vmw_vsock/virtio_transport_common.c | 15 ++++--
> 2 files changed, 60 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 6c8390a2af52..9f57736fe15e 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -102,7 +102,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> struct iov_iter iov_iter;
> unsigned out, in;
> size_t nbytes;
> - size_t len;
> + size_t iov_len, payload_len;
> int head;
>
> spin_lock_bh(&vsock->send_pkt_list_lock);
> @@ -147,8 +147,24 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> break;
> }
>
> - len = iov_length(&vq->iov[out], in);
> - iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> + iov_len = iov_length(&vq->iov[out], in);
> + if (iov_len < sizeof(pkt->hdr)) {
> + virtio_transport_free_pkt(pkt);
> + vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
> + break;
> + }
> +
> + iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
> + payload_len = pkt->len - pkt->off;
> +
> + /* If the packet is greater than the space available in the
> + * buffer, we split it using multiple buffers.
> + */
> + if (payload_len > iov_len - sizeof(pkt->hdr))
> + payload_len = iov_len - sizeof(pkt->hdr);
> +
> + /* Set the correct length in the header */
> + pkt->hdr.len = cpu_to_le32(payload_len);
>
> nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> if (nbytes != sizeof(pkt->hdr)) {
> @@ -157,33 +173,47 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> break;
> }
>
> - nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> - if (nbytes != pkt->len) {
> + nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len,
> + &iov_iter);
> + if (nbytes != payload_len) {
> virtio_transport_free_pkt(pkt);
> vq_err(vq, "Faulted on copying pkt buf\n");
> break;
> }
>
> - vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
> + vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
> added = true;
>
> - if (pkt->reply) {
> - int val;
> -
> - val = atomic_dec_return(&vsock->queued_replies);
> -
> - /* Do we have resources to resume tx processing? */
> - if (val + 1 == tx_vq->num)
> - restart_tx = true;
> - }
> -
> /* Deliver to monitoring devices all correctly transmitted
> * packets.
> */
> virtio_transport_deliver_tap_pkt(pkt);
>
> - total_len += pkt->len;
> - virtio_transport_free_pkt(pkt);
> + pkt->off += payload_len;
> + total_len += payload_len;
> +
> + /* If we didn't send all the payload we can requeue the packet
> + * to send it with the next available buffer.
> + */
> + if (pkt->off < pkt->len) {
> + spin_lock_bh(&vsock->send_pkt_list_lock);
> + list_add(&pkt->list, &vsock->send_pkt_list);
> + spin_unlock_bh(&vsock->send_pkt_list_lock);
> + } else {
> + if (pkt->reply) {
> + int val;
> +
> + val = atomic_dec_return(&vsock->queued_replies);
> +
> + /* Do we have resources to resume tx
> + * processing?
> + */
> + if (val + 1 == tx_vq->num)
> + restart_tx = true;
> + }
> +
> + virtio_transport_free_pkt(pkt);
> + }
> } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
> if (added)
> vhost_signal(&vsock->dev, vq);
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 34a2b42313b7..56fab3f03d0e 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -97,8 +97,17 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
> struct virtio_vsock_pkt *pkt = opaque;
> struct af_vsockmon_hdr *hdr;
> struct sk_buff *skb;
> + size_t payload_len;
> + void *payload_buf;
>
> - skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len,
> + /* A packet could be split to fit the RX buffer, so we can retrieve
> + * the payload length from the header and the buffer pointer taking
> + * care of the offset in the original packet.
> + */
> + payload_len = le32_to_cpu(pkt->hdr.len);
> + payload_buf = pkt->buf + pkt->off;
> +
> + skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len,
> GFP_ATOMIC);
> if (!skb)
> return NULL;
> @@ -138,8 +147,8 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
>
> skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr));
>
> - if (pkt->len) {
> - skb_put_data(skb, pkt->buf, pkt->len);
> + if (payload_len) {
> + skb_put_data(skb, payload_buf, payload_len);
> }
>
> return skb;
> --
> 2.20.1

2019-07-17 15:02:04

by Michael S. Tsirkin

Subject: Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed

On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> Since now we are able to split packets, we can avoid limiting
> their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> packet size.
>
> Signed-off-by: Stefano Garzarella <[email protected]>


OK so this is kind of like GSO where we are passing
64K packets to the vsock and then split at the
low level.


> ---
> net/vmw_vsock/virtio_transport_common.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 56fab3f03d0e..94cc0fa3e848 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -181,8 +181,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> vvs = vsk->trans;
>
> /* we can send less than pkt_len bytes */
> - if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> - pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> + if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> + pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>
> /* virtio_transport_get_credit might return less than pkt_len credit */
> pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> --
> 2.20.1

2019-07-18 07:45:02

by Stefano Garzarella

Subject: Re: [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()

On Wed, Jul 17, 2019 at 4:51 PM Michael S. Tsirkin <[email protected]> wrote:
>
> On Wed, Jul 17, 2019 at 01:30:28PM +0200, Stefano Garzarella wrote:
> > fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
> > the same spinlock also if we are in the TX path.
> >
> > Move also buf_alloc under the same lock.
> >
> > Signed-off-by: Stefano Garzarella <[email protected]>
>
> Wait a second is this a bugfix?
> If it's used under the wrong lock won't values get corrupted?
> Won't traffic then stall or more data get to sent than
> credits?

Before this series, we only read vvs->fwd_cnt and vvs->buf_alloc in this
function, but using a different lock than the one used to write them.
I'm not sure whether corruption can happen, but if we want to avoid the
lock, we should use an atomic operation or memory barriers.

Since we now also need to update vvs->last_fwd_cnt, in order to limit the
credit update messages, I decided to take the same lock used to protect
vvs->fwd_cnt and vvs->last_fwd_cnt.
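
(Just to make the lockless alternative mentioned above concrete, a rough
sketch of what it could look like, relying on WRITE_ONCE()/READ_ONCE() pairs
instead of the spinlock; this is NOT what the patch does:

    /* RX path (writer), still under rx_lock */
    WRITE_ONCE(vvs->fwd_cnt, vvs->fwd_cnt + pkt->len);

    /* TX path (reader), no lock taken */
    pkt->hdr.fwd_cnt = cpu_to_le32(READ_ONCE(vvs->fwd_cnt));

buf_alloc and last_fwd_cnt would need similar care, which is why taking the
same lock is simpler.)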


Thanks,
Stefano

>
> > ---
> > include/linux/virtio_vsock.h | 2 +-
> > net/vmw_vsock/virtio_transport_common.c | 4 ++--
> > 2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 49fc9d20bc43..4c7781f4b29b 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -35,7 +35,6 @@ struct virtio_vsock_sock {
> >
> > /* Protected by tx_lock */
> > u32 tx_cnt;
> > - u32 buf_alloc;
> > u32 peer_fwd_cnt;
> > u32 peer_buf_alloc;
> >
> > @@ -43,6 +42,7 @@ struct virtio_vsock_sock {
> > u32 fwd_cnt;
> > u32 last_fwd_cnt;
> > u32 rx_bytes;
> > + u32 buf_alloc;
> > struct list_head rx_queue;
> > };
> >
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index a85559d4d974..34a2b42313b7 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -210,11 +210,11 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> >
> > void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> > {
> > - spin_lock_bh(&vvs->tx_lock);
> > + spin_lock_bh(&vvs->rx_lock);
> > vvs->last_fwd_cnt = vvs->fwd_cnt;
> > pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> > pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> > - spin_unlock_bh(&vvs->tx_lock);
> > + spin_unlock_bh(&vvs->rx_lock);
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
> >
> > --
> > 2.20.1

2019-07-18 07:52:34

by Stefano Garzarella

Subject: Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers

On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <[email protected]> wrote:
>
> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > If the packets to sent to the guest are bigger than the buffer
> > available, we can split them, using multiple buffers and fixing
> > the length in the packet header.
> > This is safe since virtio-vsock supports only stream sockets.
> >
> > Signed-off-by: Stefano Garzarella <[email protected]>
>
> So how does it work right now? If an app
> does sendmsg with a 64K buffer and the other
> side publishes 4K buffers - does it just stall?

Before this series, 64K (or bigger) user messages were split into 4K packets
(a size fixed in the code) and queued in an internal list for the TX worker.

After this series, we queue packets of up to 64K and they are then split in
the TX worker, depending on the size of the buffers available in the
vring. (The idea was to allow EWMA or a configurable buffer size, but
for now we postponed it.)

Note: virtio-vsock only supports stream sockets for now.

Thanks,
Stefano

2019-07-18 07:54:00

by Stefano Garzarella

Subject: Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed

On Wed, Jul 17, 2019 at 5:00 PM Michael S. Tsirkin <[email protected]> wrote:
>
> On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> > Since now we are able to split packets, we can avoid limiting
> > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > packet size.
> >
> > Signed-off-by: Stefano Garzarella <[email protected]>
>
>
> OK so this is kind of like GSO where we are passing
> 64K packets to the vsock and then split at the
> low level.

Exactly, something like that in the Host->Guest path; in the Guest->Host
path, instead, we use the entire 64K packet.

Thanks,
Stefano

2019-07-18 08:14:10

by Michael S. Tsirkin

Subject: Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers

On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <[email protected]> wrote:
> >
> > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > If the packets to sent to the guest are bigger than the buffer
> > > available, we can split them, using multiple buffers and fixing
> > > the length in the packet header.
> > > This is safe since virtio-vsock supports only stream sockets.
> > >
> > > Signed-off-by: Stefano Garzarella <[email protected]>
> >
> > So how does it work right now? If an app
> > does sendmsg with a 64K buffer and the other
> > side publishes 4K buffers - does it just stall?
>
> Before this series, the 64K (or bigger) user messages was split in 4K packets
> (fixed in the code) and queued in an internal list for the TX worker.
>
> After this series, we will queue up to 64K packets and then it will be split in
> the TX worker, depending on the size of the buffers available in the
> vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> for now we postponed it)

Got it. Using workers for xmit is IMHO a bad idea btw.
Why is it done like this?

> Note: virtio-vsock only supports stream socket for now.
>
> Thanks,
> Stefano

2019-07-18 09:39:38

by Stefano Garzarella

Subject: Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers

On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <[email protected]> wrote:
> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <[email protected]> wrote:
> > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > If the packets to sent to the guest are bigger than the buffer
> > > > available, we can split them, using multiple buffers and fixing
> > > > the length in the packet header.
> > > > This is safe since virtio-vsock supports only stream sockets.
> > > >
> > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > >
> > > So how does it work right now? If an app
> > > does sendmsg with a 64K buffer and the other
> > > side publishes 4K buffers - does it just stall?
> >
> > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > (fixed in the code) and queued in an internal list for the TX worker.
> >
> > After this series, we will queue up to 64K packets and then it will be split in
> > the TX worker, depending on the size of the buffers available in the
> > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > for now we postponed it)
>
> Got it. Using workers for xmit is IMHO a bad idea btw.
> Why is it done like this?

Honestly, I don't know the exact reasons for this design, but I suppose
that the idea was to have only one worker that uses the vring, and
multiple user threads that enqueue packets in the list.
This can simplify the code and we can put the user threads to sleep if
we don't have "credit" available (this means that the receiver doesn't
have space to receive the packet).

What are the drawbacks in your opinion?


Thanks,
Stefano

2019-07-18 11:37:07

by Michael S. Tsirkin

Subject: Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers

On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <[email protected]> wrote:
> > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <[email protected]> wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > available, we can split them, using multiple buffers and fixing
> > > > > the length in the packet header.
> > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > >
> > > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > >
> > > > So how does it work right now? If an app
> > > > does sendmsg with a 64K buffer and the other
> > > > side publishes 4K buffers - does it just stall?
> > >
> > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > (fixed in the code) and queued in an internal list for the TX worker.
> > >
> > > After this series, we will queue up to 64K packets and then it will be split in
> > > the TX worker, depending on the size of the buffers available in the
> > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > for now we postponed it)
> >
> > Got it. Using workers for xmit is IMHO a bad idea btw.
> > Why is it done like this?
>
> Honestly, I don't know the exact reasons for this design, but I suppose
> that the idea was to have only one worker that uses the vring, and
> multiple user threads that enqueue packets in the list.
> This can simplify the code and we can put the user threads to sleep if
> we don't have "credit" available (this means that the receiver doesn't
> have space to receive the packet).


I think you mean the reverse: even without credits you can copy from
user and queue up data, then process it without waking up the user
thread.
Does it help though? It certainly adds up work outside of
user thread context which means it's not accounted for
correctly.

Maybe we want more VQs. Would help improve parallelism. The question
would then become how to map sockets to VQs. With a simple hash
it's easy to create collisions ...


>
> What are the drawbacks in your opinion?
>
>
> Thanks,
> Stefano

- More pressure on scheduler
- Increased latency


--
MST

2019-07-18 12:34:39

by Michael S. Tsirkin

Subject: Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed

On Thu, Jul 18, 2019 at 09:52:41AM +0200, Stefano Garzarella wrote:
> On Wed, Jul 17, 2019 at 5:00 PM Michael S. Tsirkin <[email protected]> wrote:
> >
> > On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> > > Since now we are able to split packets, we can avoid limiting
> > > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > > packet size.
> > >
> > > Signed-off-by: Stefano Garzarella <[email protected]>
> >
> >
> > OK so this is kind of like GSO where we are passing
> > 64K packets to the vsock and then split at the
> > low level.
>
> Exactly, something like that in the Host->Guest path, instead in the
> Guest->Host we use the entire 64K packet.
>
> Thanks,
> Stefano

btw two allocations for each packet isn't great. How about
allocating the struct linearly with the data?
And all buffers are same length for you - so you can actually
do alloc_pages.
Allocating/freeing pages in a batch should also be considered.

--
MST

2019-07-19 08:09:12

by Stefano Garzarella

Subject: Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers

On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> > On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <[email protected]> wrote:
> > > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <[email protected]> wrote:
> > > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > > available, we can split them, using multiple buffers and fixing
> > > > > > the length in the packet header.
> > > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > > >
> > > > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > > >
> > > > > So how does it work right now? If an app
> > > > > does sendmsg with a 64K buffer and the other
> > > > > side publishes 4K buffers - does it just stall?
> > > >
> > > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > > (fixed in the code) and queued in an internal list for the TX worker.
> > > >
> > > > After this series, we will queue up to 64K packets and then it will be split in
> > > > the TX worker, depending on the size of the buffers available in the
> > > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > > for now we postponed it)
> > >
> > > Got it. Using workers for xmit is IMHO a bad idea btw.
> > > Why is it done like this?
> >
> > Honestly, I don't know the exact reasons for this design, but I suppose
> > that the idea was to have only one worker that uses the vring, and
> > multiple user threads that enqueue packets in the list.
> > This can simplify the code and we can put the user threads to sleep if
> > we don't have "credit" available (this means that the receiver doesn't
> > have space to receive the packet).
>
>
> I think you mean the reverse: even without credits you can copy from
> user and queue up data, then process it without waking up the user
> thread.

I looked at the code more carefully, but it doesn't seem to do that.
The .sendmsg callback of af_vsock checks whether the transport has space
(the virtio-vsock transport returns the credit available). If there is no
space, it puts the thread to sleep on the 'sk_sleep(sk)' wait queue.

When the transport receives an update of the credit available on the other
peer, it calls 'sk->sk_write_space(sk)', which wakes up the sleeping thread
so that it can queue the new packet.

So, in the current implementation, the TX worker doesn't check the
credit available, it only sends the packets.
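
(A rough sketch of the flow described above; the helper names are made up
for illustration, only sk_sleep()/sk_write_space() are the real hooks:

    /* af_vsock .sendmsg path, simplified */
    while (!transport_has_space(vsk)) {     /* credit == 0 */
            /* sleep until the peer's credit update triggers sk_write_space() */
            wait_on(sk_sleep(sk));
    }
    pkt = copy_from_user_into_pkt(msg);
    queue_for_tx_worker(pkt);   /* the worker just transmits, no credit check */
)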

> Does it help though? It certainly adds up work outside of
> user thread context which means it's not accounted for
> correctly.

I can try to xmit the packet directly in the user thread context, to see
the improvements.

>
> Maybe we want more VQs. Would help improve parallelism. The question
> would then become how to map sockets to VQs. With a simple hash
> it's easy to create collisions ...

Yes, more VQs can help, but the mapping question is not simple to answer.
Maybe we can do a hash on the (cid, port) pair or do some kind of estimation
of queue utilization and try to balance.
Should the mapping be unique?
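
(For example, a purely hypothetical static mapping could be as simple as:

    /* not in the current code, just to illustrate the (cid, port) hash idea */
    u32 vq_idx = jhash_2words(cid, port, 0) % nr_tx_vqs;

but, as noted, collisions and balancing would still be an open problem.)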

>
>
> >
> > What are the drawbacks in your opinion?
> >
> >
> > Thanks,
> > Stefano
>
> - More pressure on scheduler
> - Increased latency
>

Thanks,
Stefano

2019-07-19 08:23:11

by Jason Wang

Subject: Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers


On 2019/7/19 4:08 PM, Stefano Garzarella wrote:
> On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
>> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
>>> On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<[email protected]> wrote:
>>>> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
>>>>> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<[email protected]> wrote:
>>>>>> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
>>>>>>> If the packets to sent to the guest are bigger than the buffer
>>>>>>> available, we can split them, using multiple buffers and fixing
>>>>>>> the length in the packet header.
>>>>>>> This is safe since virtio-vsock supports only stream sockets.
>>>>>>>
>>>>>>> Signed-off-by: Stefano Garzarella<[email protected]>
>>>>>> So how does it work right now? If an app
>>>>>> does sendmsg with a 64K buffer and the other
>>>>>> side publishes 4K buffers - does it just stall?
>>>>> Before this series, the 64K (or bigger) user messages was split in 4K packets
>>>>> (fixed in the code) and queued in an internal list for the TX worker.
>>>>>
>>>>> After this series, we will queue up to 64K packets and then it will be split in
>>>>> the TX worker, depending on the size of the buffers available in the
>>>>> vring. (The idea was to allow EWMA or a configuration of the buffers size, but
>>>>> for now we postponed it)
>>>> Got it. Using workers for xmit is IMHO a bad idea btw.
>>>> Why is it done like this?
>>> Honestly, I don't know the exact reasons for this design, but I suppose
>>> that the idea was to have only one worker that uses the vring, and
>>> multiple user threads that enqueue packets in the list.
>>> This can simplify the code and we can put the user threads to sleep if
>>> we don't have "credit" available (this means that the receiver doesn't
>>> have space to receive the packet).
>> I think you mean the reverse: even without credits you can copy from
>> user and queue up data, then process it without waking up the user
>> thread.
> I checked the code better, but it doesn't seem to do that.
> The .sendmsg callback of af_vsock, check if the transport has space
> (virtio-vsock transport returns the credit available). If there is no
> space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
>
> When the transport receives an update of credit available on the other
> peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
> sleeping, that will queue the new packet.
>
> So, in the current implementation, the TX worker doesn't check the
> credit available, it only sends the packets.
>
>> Does it help though? It certainly adds up work outside of
>> user thread context which means it's not accounted for
>> correctly.
> I can try to xmit the packet directly in the user thread context, to see
> the improvements.


It will then look more like what virtio-net (and other networking
devices) do.


>
>> Maybe we want more VQs. Would help improve parallelism. The question
>> would then become how to map sockets to VQs. With a simple hash
>> it's easy to create collisions ...
> Yes, more VQs can help but the map question is not simple to answer.
> Maybe we can do an hash on the (cid, port) or do some kind of estimation
> of queue utilization and try to balance.
> Should the mapping be unique?


It sounds to me like you want some kind of fair queuing? We already have
several qdiscs that do this.

So if we use the kernel networking xmit path, all those issues could be
addressed.

Thanks

>

2019-07-19 08:30:44

by Stefano Garzarella

Subject: Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed

On Thu, Jul 18, 2019 at 08:33:40AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 09:52:41AM +0200, Stefano Garzarella wrote:
> > On Wed, Jul 17, 2019 at 5:00 PM Michael S. Tsirkin <[email protected]> wrote:
> > >
> > > On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> > > > Since now we are able to split packets, we can avoid limiting
> > > > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > > > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > > > packet size.
> > > >
> > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > >
> > >
> > > OK so this is kind of like GSO where we are passing
> > > 64K packets to the vsock and then split at the
> > > low level.
> >
> > Exactly, something like that in the Host->Guest path, instead in the
> > Guest->Host we use the entire 64K packet.
> >
> > Thanks,
> > Stefano
>
> btw two allocations for each packet isn't great. How about
> allocating the struct linearly with the data?

Are you referring to the kzalloc() to allocate the 'struct
virtio_vsock_pkt', followed by the kmalloc() to allocate the buffer?

Actually, they don't look great; I will try to do a single allocation.
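
(Something along these lines, as a rough sketch of the single-allocation
idea rather than a final implementation:

    /* hypothetical: make the payload a flexible array member of the struct */
    struct virtio_vsock_pkt {
            ...                 /* existing fields */
            u32 buf_len;
            u32 len;
            u8 buf[];           /* payload allocated together with the struct */
    };

    pkt = kzalloc(sizeof(*pkt) + buf_len, GFP_KERNEL);

so a single kzalloc() replaces the current kzalloc() + kmalloc() pair.)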

> And all buffers are same length for you - so you can actually
> do alloc_pages.

Yes, Jason also suggested it, and we decided to postpone it since we will
try to reuse the virtio-net code, where it comes for free.

> Allocating/freeing pages in a batch should also be considered.

For the allocation of guest rx buffers we do some kind of batching (we
refill the queue when it reaches half capacity), but only in this case :(

I'll try to do more alloc/free batching.

Thanks,
Stefano

2019-07-19 08:40:02

by Stefano Garzarella

Subject: Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers

On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
>
> On 2019/7/19 4:08 PM, Stefano Garzarella wrote:
> > On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> > > > On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<[email protected]> wrote:
> > > > > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > > > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<[email protected]> wrote:
> > > > > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > > > > available, we can split them, using multiple buffers and fixing
> > > > > > > > the length in the packet header.
> > > > > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > > > > >
> > > > > > > > Signed-off-by: Stefano Garzarella<[email protected]>
> > > > > > > So how does it work right now? If an app
> > > > > > > does sendmsg with a 64K buffer and the other
> > > > > > > side publishes 4K buffers - does it just stall?
> > > > > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > > > > (fixed in the code) and queued in an internal list for the TX worker.
> > > > > >
> > > > > > After this series, we will queue up to 64K packets and then it will be split in
> > > > > > the TX worker, depending on the size of the buffers available in the
> > > > > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > > > > for now we postponed it)
> > > > > Got it. Using workers for xmit is IMHO a bad idea btw.
> > > > > Why is it done like this?
> > > > Honestly, I don't know the exact reasons for this design, but I suppose
> > > > that the idea was to have only one worker that uses the vring, and
> > > > multiple user threads that enqueue packets in the list.
> > > > This can simplify the code and we can put the user threads to sleep if
> > > > we don't have "credit" available (this means that the receiver doesn't
> > > > have space to receive the packet).
> > > I think you mean the reverse: even without credits you can copy from
> > > user and queue up data, then process it without waking up the user
> > > thread.
> > I checked the code better, but it doesn't seem to do that.
> > The .sendmsg callback of af_vsock, check if the transport has space
> > (virtio-vsock transport returns the credit available). If there is no
> > space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
> >
> > When the transport receives an update of credit available on the other
> > peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
> > sleeping, that will queue the new packet.
> >
> > So, in the current implementation, the TX worker doesn't check the
> > credit available, it only sends the packets.
> >
> > > Does it help though? It certainly adds up work outside of
> > > user thread context which means it's not accounted for
> > > correctly.
> > I can try to xmit the packet directly in the user thread context, to see
> > the improvements.
>
>
> It will then looks more like what virtio-net (and other networking device)
> did.

I'll try ASAP, the changes should not be too complicated... I hope :)

>
>
> >
> > > Maybe we want more VQs. Would help improve parallelism. The question
> > > would then become how to map sockets to VQs. With a simple hash
> > > it's easy to create collisions ...
> > Yes, more VQs can help but the map question is not simple to answer.
> > Maybe we can do an hash on the (cid, port) or do some kind of estimation
> > of queue utilization and try to balance.
> > Should the mapping be unique?
>
>
> It sounds to me you want some kind of fair queuing? We've already had
> several qdiscs that do this.

Thanks for pointing it out!

>
> So if we use the kernel networking xmit path, all those issues could be
> addressed.

One more point in favor of AF_VSOCK + net-stack, but we have to evaluate
possible drawbacks of using the net-stack (e.g. more latency due to its
complexity?).

Thanks,
Stefano

2019-07-19 08:52:25

by Jason Wang

Subject: Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers


On 2019/7/19 4:39 PM, Stefano Garzarella wrote:
> On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
>> On 2019/7/19 4:08 PM, Stefano Garzarella wrote:
>>> On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
>>>> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
>>>>> On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<[email protected]> wrote:
>>>>>> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
>>>>>>> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<[email protected]> wrote:
>>>>>>>> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
>>>>>>>>> If the packets to sent to the guest are bigger than the buffer
>>>>>>>>> available, we can split them, using multiple buffers and fixing
>>>>>>>>> the length in the packet header.
>>>>>>>>> This is safe since virtio-vsock supports only stream sockets.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Stefano Garzarella<[email protected]>
>>>>>>>> So how does it work right now? If an app
>>>>>>>> does sendmsg with a 64K buffer and the other
>>>>>>>> side publishes 4K buffers - does it just stall?
>>>>>>> Before this series, the 64K (or bigger) user messages was split in 4K packets
>>>>>>> (fixed in the code) and queued in an internal list for the TX worker.
>>>>>>>
>>>>>>> After this series, we will queue up to 64K packets and then it will be split in
>>>>>>> the TX worker, depending on the size of the buffers available in the
>>>>>>> vring. (The idea was to allow EWMA or a configuration of the buffers size, but
>>>>>>> for now we postponed it)
>>>>>> Got it. Using workers for xmit is IMHO a bad idea btw.
>>>>>> Why is it done like this?
>>>>> Honestly, I don't know the exact reasons for this design, but I suppose
>>>>> that the idea was to have only one worker that uses the vring, and
>>>>> multiple user threads that enqueue packets in the list.
>>>>> This can simplify the code and we can put the user threads to sleep if
>>>>> we don't have "credit" available (this means that the receiver doesn't
>>>>> have space to receive the packet).
>>>> I think you mean the reverse: even without credits you can copy from
>>>> user and queue up data, then process it without waking up the user
>>>> thread.
>>> I checked the code better, but it doesn't seem to do that.
>>> The .sendmsg callback of af_vsock, check if the transport has space
>>> (virtio-vsock transport returns the credit available). If there is no
>>> space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
>>>
>>> When the transport receives an update of credit available on the other
>>> peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
>>> sleeping, that will queue the new packet.
>>>
>>> So, in the current implementation, the TX worker doesn't check the
>>> credit available, it only sends the packets.
>>>
>>>> Does it help though? It certainly adds up work outside of
>>>> user thread context which means it's not accounted for
>>>> correctly.
>>> I can try to xmit the packet directly in the user thread context, to see
>>> the improvements.
>>
>> It will then looks more like what virtio-net (and other networking device)
>> did.
> I'll try ASAP, the changes should not be too complicated... I hope :)
>
>>
>>>> Maybe we want more VQs. Would help improve parallelism. The question
>>>> would then become how to map sockets to VQs. With a simple hash
>>>> it's easy to create collisions ...
>>> Yes, more VQs can help but the map question is not simple to answer.
>>> Maybe we can do an hash on the (cid, port) or do some kind of estimation
>>> of queue utilization and try to balance.
>>> Should the mapping be unique?
>>
>> It sounds to me you want some kind of fair queuing? We've already had
>> several qdiscs that do this.
> Thanks for pointing it out!
>
>> So if we use the kernel networking xmit path, all those issues could be
>> addressed.
> One more point to AF_VSOCK + net-stack, but we have to evaluate possible
> drawbacks in using the net-stack. (e.g. more latency due to the complexity
> of the net-stack?)


Yes, we need to benchmark the performance. But as we've noticed, the current
vsock implementation is not efficient, and for stream sockets the
overhead should be minimal. The most important thing is to avoid
reinventing things that already exist.

Thanks


>
> Thanks,
> Stefano
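
To make the credit flow described above easier to follow, here is a minimal
sketch in the same spirit, assuming hypothetical helpers
(transport_free_space(), queue_packet_for_tx_worker()) in place of the real
transport callbacks; it is an illustration, not the af_vsock code:

#include <linux/sched.h>
#include <linux/wait.h>
#include <net/sock.h>

/*
 * Sketch only: the user thread sleeps on sk_sleep(sk) while the transport
 * reports no credit; a CREDIT_UPDATE from the peer ends up triggering
 * sk->sk_write_space(sk), which wakes it. The TX worker just drains the
 * queued packets and never looks at the credit itself.
 */
static int vsock_sendmsg_sketch(struct sock *sk, size_t len)
{
	DEFINE_WAIT(wait);

	while (transport_free_space(sk) == 0) {	/* hypothetical helper */
		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
		release_sock(sk);
		schedule();
		lock_sock(sk);
		finish_wait(sk_sleep(sk), &wait);
	}

	/* Credit available: copy from userspace and hand off to the TX worker. */
	return queue_packet_for_tx_worker(sk, len);	/* hypothetical helper */
}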

2019-07-19 09:20:59

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers

On Fri, Jul 19, 2019 at 04:51:00PM +0800, Jason Wang wrote:
>
> On 2019/7/19 下午4:39, Stefano Garzarella wrote:
> > On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
> > > On 2019/7/19 下午4:08, Stefano Garzarella wrote:
> > > > On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> > > > > On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> > > > > > On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<[email protected]> wrote:
> > > > > > > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > > > > > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<[email protected]> wrote:
> > > > > > > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > > > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > > > > > > available, we can split them, using multiple buffers and fixing
> > > > > > > > > > the length in the packet header.
> > > > > > > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Stefano Garzarella<[email protected]>
> > > > > > > > > So how does it work right now? If an app
> > > > > > > > > does sendmsg with a 64K buffer and the other
> > > > > > > > > side publishes 4K buffers - does it just stall?
> > > > > > > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > > > > > > (fixed in the code) and queued in an internal list for the TX worker.
> > > > > > > >
> > > > > > > > After this series, we will queue up to 64K packets and then it will be split in
> > > > > > > > the TX worker, depending on the size of the buffers available in the
> > > > > > > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > > > > > > for now we postponed it)
> > > > > > > Got it. Using workers for xmit is IMHO a bad idea btw.
> > > > > > > Why is it done like this?
> > > > > > Honestly, I don't know the exact reasons for this design, but I suppose
> > > > > > that the idea was to have only one worker that uses the vring, and
> > > > > > multiple user threads that enqueue packets in the list.
> > > > > > This can simplify the code and we can put the user threads to sleep if
> > > > > > we don't have "credit" available (this means that the receiver doesn't
> > > > > > have space to receive the packet).
> > > > > I think you mean the reverse: even without credits you can copy from
> > > > > user and queue up data, then process it without waking up the user
> > > > > thread.
> > > > I checked the code better, but it doesn't seem to do that.
> > > > The .sendmsg callback of af_vsock, check if the transport has space
> > > > (virtio-vsock transport returns the credit available). If there is no
> > > > space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
> > > >
> > > > When the transport receives an update of credit available on the other
> > > > peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
> > > > sleeping, that will queue the new packet.
> > > >
> > > > So, in the current implementation, the TX worker doesn't check the
> > > > credit available, it only sends the packets.
> > > >
> > > > > Does it help though? It certainly adds up work outside of
> > > > > user thread context which means it's not accounted for
> > > > > correctly.
> > > > I can try to xmit the packet directly in the user thread context, to see
> > > > the improvements.
> > >
> > > It will then looks more like what virtio-net (and other networking device)
> > > did.
> > I'll try ASAP, the changes should not be too complicated... I hope :)
> >
> > >
> > > > > Maybe we want more VQs. Would help improve parallelism. The question
> > > > > would then become how to map sockets to VQs. With a simple hash
> > > > > it's easy to create collisions ...
> > > > Yes, more VQs can help but the map question is not simple to answer.
> > > > Maybe we can do an hash on the (cid, port) or do some kind of estimation
> > > > of queue utilization and try to balance.
> > > > Should the mapping be unique?
> > >
> > > It sounds to me you want some kind of fair queuing? We've already had
> > > several qdiscs that do this.
> > Thanks for pointing it out!
> >
> > > So if we use the kernel networking xmit path, all those issues could be
> > > addressed.
> > One more point to AF_VSOCK + net-stack, but we have to evaluate possible
> > drawbacks in using the net-stack. (e.g. more latency due to the complexity
> > of the net-stack?)
>
>
> Yes, we need benchmark the performance. But as we've noticed, current vsock
> implementation is not efficient, and for stream socket, the overhead should
> be minimal. The most important thing is to avoid reinventing things that has
> already existed.

Got it. I completely agree with you, and I want to avoid reinventing things
(surely in a worse way).

But the idea (also suggested by Michael) of discovering how fast a new
protocol separate from the networking stack can go is quite attractive :)

Thanks,
Stefano
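
To make the mapping question above concrete, here is a purely hypothetical
sketch of hashing a connection onto one of N TX virtqueues (no multi-queue
support exists in this series); a simple hash like this is exactly where the
collision concern comes from:

#include <linux/jhash.h>
#include <linux/types.h>

/* Hypothetical: pick a TX virtqueue for a (cid, port) pair. Two busy
 * connections can easily hash to the same queue, which is the collision
 * problem mentioned in the thread.
 */
static u32 vsock_pick_txq_sketch(u32 cid, u32 port, u32 num_queues)
{
	return jhash_2words(cid, port, 0) % num_queues;
}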

2019-07-22 08:41:09

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] vsock/virtio: reduce credit update messages

On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:
> In order to reduce the number of credit update messages,
> we send them only when the space available seen by the
> transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> include/linux/virtio_vsock.h | 1 +
> net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
> 2 files changed, 14 insertions(+), 3 deletions(-)

It's an arbitrary limit but the risk of regressions is low since the
credit update traffic was so excessive:

Reviewed-by: Stefan Hajnoczi <[email protected]>
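
A sketch of the policy summarized in the commit message, using the
virtio_vsock_sock fields the series works with; the helper call and the
exact hook point in the dequeue path are assumptions, not a quote of the
diff:

/* Send a credit update only when the transmitter's view of our free space
 * (based on the last fwd_cnt we advertised) drops below one maximum packet.
 */
static void maybe_send_credit_update_sketch(struct vsock_sock *vsk,
					    struct virtio_vsock_sock *vvs)
{
	u32 free_space;

	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);

	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
		virtio_transport_send_credit_update(vsk,
						    VIRTIO_VSOCK_TYPE_STREAM,
						    NULL);
}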



2019-07-22 08:54:13

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()

On Wed, Jul 17, 2019 at 01:30:28PM +0200, Stefano Garzarella wrote:
> fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
> the same spinlock also if we are in the TX path.
>
> Move also buf_alloc under the same lock.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> include/linux/virtio_vsock.h | 2 +-
> net/vmw_vsock/virtio_transport_common.c | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi <[email protected]>
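
A sketch of what the commit message describes; the function body is an
assumption reconstructed from the fields it names, not a quote of the patch:

/* TX path: snapshot fwd_cnt/buf_alloc into the outgoing header under the
 * same rx_lock that the RX path uses when updating them.
 */
static void virtio_transport_inc_tx_pkt_sketch(struct virtio_vsock_sock *vvs,
					       struct virtio_vsock_pkt *pkt)
{
	spin_lock_bh(&vvs->rx_lock);
	vvs->last_fwd_cnt = vvs->fwd_cnt;
	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
	spin_unlock_bh(&vvs->rx_lock);
}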



2019-07-22 09:39:35

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers

On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> If the packets to sent to the guest are bigger than the buffer
> available, we can split them, using multiple buffers and fixing
> the length in the packet header.
> This is safe since virtio-vsock supports only stream sockets.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> drivers/vhost/vsock.c | 66 ++++++++++++++++++-------
> net/vmw_vsock/virtio_transport_common.c | 15 ++++--
> 2 files changed, 60 insertions(+), 21 deletions(-)

Reviewed-by: Stefan Hajnoczi <[email protected]>
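
For reference, a simplified sketch of the splitting idea being reviewed here:
one oversized packet is sent as several chunks, each reusing the same header
with the length fixed up (copy_chunk_to_guest() is a hypothetical stand-in
for the vhost descriptor handling):

#include <linux/kernel.h>
#include <linux/virtio_vsock.h>

/* Illustration only: split pkt->len bytes of a stream packet across guest
 * RX buffers of rx_buf_size bytes each. Safe for SOCK_STREAM because the
 * receiver sees a byte stream, not packet boundaries.
 */
static void send_pkt_split_sketch(struct virtio_vsock_pkt *pkt,
				  size_t rx_buf_size)
{
	size_t remaining = pkt->len;
	size_t off = 0;

	while (remaining > 0) {
		size_t chunk = min(remaining, rx_buf_size - sizeof(pkt->hdr));

		pkt->hdr.len = cpu_to_le32(chunk);	/* fix up per-chunk length */
		copy_chunk_to_guest(&pkt->hdr, pkt->buf + off, chunk); /* hypothetical */

		off += chunk;
		remaining -= chunk;
	}
}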



2019-07-22 09:45:13

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered an huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.
>
> v4:
> - rebased all patches on current master (conflicts is Patch 4)
> - Patch 1: added Stefan's R-b
> - Patch 3: removed lock when buf_alloc is written [David];
> moved this patch after "vsock/virtio: reduce credit update messages"
> to make it clearer
> - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
> conflicts

Stefano: Do you want to continue experimenting before we merge this
patch series? The code looks functionally correct and the performance
increases, so I'm happy for it to be merged.



2019-07-22 10:58:45

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed

On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> Since now we are able to split packets, we can avoid limiting
> their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> packet size.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> net/vmw_vsock/virtio_transport_common.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <[email protected]>



2019-07-22 11:09:56

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

On Mon, Jul 22, 2019 at 10:08:35AM +0100, Stefan Hajnoczi wrote:
> On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > This series tries to increase the throughput of virtio-vsock with slight
> > changes.
> > While I was testing the v2 of this series I discovered an huge use of memory,
> > so I added patch 1 to mitigate this issue. I put it in this series in order
> > to better track the performance trends.
> >
> > v4:
> > - rebased all patches on current master (conflicts is Patch 4)
> > - Patch 1: added Stefan's R-b
> > - Patch 3: removed lock when buf_alloc is written [David];
> > moved this patch after "vsock/virtio: reduce credit update messages"
> > to make it clearer
> > - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
> > conflicts
>
> Stefano: Do you want to continue experimenting before we merge this
> patch series? The code looks functionally correct and the performance
> increases, so I'm happy for it to be merged.

I think we can merge this series.

I'll continue to do other experiments (e.g. removing TX workers, allocating
pages, etc.) but I think these changes are prerequisites for the other patches,
so we can merge them.

Thank you very much for the reviews!
Stefano

2019-07-29 14:04:27

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

On Mon, Jul 22, 2019 at 11:14:34AM +0200, Stefano Garzarella wrote:
> On Mon, Jul 22, 2019 at 10:08:35AM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > > This series tries to increase the throughput of virtio-vsock with slight
> > > changes.
> > > While I was testing the v2 of this series I discovered an huge use of memory,
> > > so I added patch 1 to mitigate this issue. I put it in this series in order
> > > to better track the performance trends.
> > >
> > > v4:
> > > - rebased all patches on current master (conflicts is Patch 4)
> > > - Patch 1: added Stefan's R-b
> > > - Patch 3: removed lock when buf_alloc is written [David];
> > > moved this patch after "vsock/virtio: reduce credit update messages"
> > > to make it clearer
> > > - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
> > > conflicts
> >
> > Stefano: Do you want to continue experimenting before we merge this
> > patch series? The code looks functionally correct and the performance
> > increases, so I'm happy for it to be merged.
>
> I think we can merge this series.
>
> I'll continue to do other experiments (e.g. removing TX workers, allocating
> pages, etc.) but I think these changes are prerequisites for the other patches,
> so we can merge them.
>
> Thank you very much for the reviews!

All patches have been reviewed by now. Have an Ack for good measure:

Acked-by: Stefan Hajnoczi <[email protected]>

The topics discussed in sub-threads relate to longer-term optimization
work that doesn't block this series. Please merge.



2019-07-29 14:24:46

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered an huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.

Series:

Acked-by: Michael S. Tsirkin <[email protected]>

Can this go into net-next?


> v4:
> - rebased all patches on current master (conflicts is Patch 4)
> - Patch 1: added Stefan's R-b
> - Patch 3: removed lock when buf_alloc is written [David];
> moved this patch after "vsock/virtio: reduce credit update messages"
> to make it clearer
> - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
> conflicts
>
> v3: https://patchwork.kernel.org/cover/10970145
>
> v2: https://patchwork.kernel.org/cover/10938743
>
> v1: https://patchwork.kernel.org/cover/10885431
>
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support. As Micheal suggested in the v1, I booted host and guest with 'nosmap'.
>
> A brief description of patches:
> - Patches 1: limit the memory usage with an extra copy for small packets
> - Patches 2+3: reduce the number of credit update messages sent to the
> transmitter
> - Patches 4+5: allow the host to split packets on multiple buffers and use
> VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
>
> host -> guest [Gbps]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.032 0.030 0.048 0.051
> 64 0.061 0.059 0.108 0.117
> 128 0.122 0.112 0.227 0.234
> 256 0.244 0.241 0.418 0.415
> 512 0.459 0.466 0.847 0.865
> 1K 0.927 0.919 1.657 1.641
> 2K 1.884 1.813 3.262 3.269
> 4K 3.378 3.326 6.044 6.195
> 8K 5.637 5.676 10.141 11.287
> 16K 8.250 8.402 15.976 16.736
> 32K 13.327 13.204 19.013 20.515
> 64K 21.241 21.341 20.973 21.879
> 128K 21.851 22.354 21.816 23.203
> 256K 21.408 21.693 21.846 24.088
> 512K 21.600 21.899 21.921 24.106
>
> guest -> host [Gbps]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.045 0.046 0.057 0.057
> 64 0.089 0.091 0.103 0.104
> 128 0.170 0.179 0.192 0.200
> 256 0.364 0.351 0.361 0.379
> 512 0.709 0.699 0.731 0.790
> 1K 1.399 1.407 1.395 1.427
> 2K 2.670 2.684 2.745 2.835
> 4K 5.171 5.199 5.305 5.451
> 8K 8.442 8.500 10.083 9.941
> 16K 12.305 12.259 13.519 15.385
> 32K 11.418 11.150 11.988 24.680
> 64K 10.778 10.659 11.589 35.273
> 128K 10.421 10.339 10.939 40.338
> 256K 10.300 9.719 10.508 36.562
> 512K 9.833 9.808 10.612 35.979
>
> As Stefan suggested in the v1, I measured also the efficiency in this way:
> efficiency = Mbps / (%CPU_Host + %CPU_Guest)
>
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free from iperf3 and could be an indication.
>
> host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.35 0.45 0.79 1.02
> 64 0.56 0.80 1.41 1.54
> 128 1.11 1.52 3.03 3.12
> 256 2.20 2.16 5.44 5.58
> 512 4.17 4.18 10.96 11.46
> 1K 8.30 8.26 20.99 20.89
> 2K 16.82 16.31 39.76 39.73
> 4K 30.89 30.79 74.07 75.73
> 8K 53.74 54.49 124.24 148.91
> 16K 80.68 83.63 200.21 232.79
> 32K 132.27 132.52 260.81 357.07
> 64K 229.82 230.40 300.19 444.18
> 128K 332.60 329.78 331.51 492.28
> 256K 331.06 337.22 339.59 511.59
> 512K 335.58 328.50 331.56 504.56
>
> guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.43 0.43 0.53 0.56
> 64 0.85 0.86 1.04 1.10
> 128 1.63 1.71 2.07 2.13
> 256 3.48 3.35 4.02 4.22
> 512 6.80 6.67 7.97 8.63
> 1K 13.32 13.31 15.72 15.94
> 2K 25.79 25.92 30.84 30.98
> 4K 50.37 50.48 58.79 59.69
> 8K 95.90 96.15 107.04 110.33
> 16K 145.80 145.43 143.97 174.70
> 32K 147.06 144.74 146.02 282.48
> 64K 145.25 143.99 141.62 406.40
> 128K 149.34 146.96 147.49 489.34
> 256K 156.35 149.81 152.21 536.37
> 512K 151.65 150.74 151.52 519.93
>
> [1] https://github.com/stefano-garzarella/iperf/
>
> Stefano Garzarella (5):
> vsock/virtio: limit the memory used per-socket
> vsock/virtio: reduce credit update messages
> vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
> vhost/vsock: split packets to send using multiple buffers
> vsock/virtio: change the maximum packet size allowed
>
> drivers/vhost/vsock.c | 68 ++++++++++++-----
> include/linux/virtio_vsock.h | 4 +-
> net/vmw_vsock/virtio_transport.c | 1 +
> net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
> 4 files changed, 134 insertions(+), 38 deletions(-)
>
> --
> 2.20.1

2019-07-29 16:31:26

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > Since virtio-vsock was introduced, the buffers filled by the host
> > and pushed to the guest using the vring, are directly queued in
> > a per-socket list. These buffers are preallocated by the guest
> > with a fixed size (4 KB).
> >
> > The maximum amount of memory used by each socket should be
> > controlled by the credit mechanism.
> > The default credit available per-socket is 256 KB, but if we use
> > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > buffers, using up to 1 GB of memory per-socket. In addition, the
> > guest will continue to fill the vring with new 4 KB free buffers
> > to avoid starvation of other sockets.
> >
> > This patch mitigates this issue copying the payload of small
> > packets (< 128 bytes) into the buffer of last packet queued, in
> > order to avoid wasting memory.
> >
> > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > Signed-off-by: Stefano Garzarella <[email protected]>
>
> This is good enough for net-next, but for net I think we
> should figure out how to address the issue completely.
> Can we make the accounting precise? What happens to
> performance if we do?
>

In order to do more precise accounting, maybe we can use the buffer size
instead of the payload size when we update the credit available.
In this way, the credit available for each socket will reflect the memory
actually used.

I should check this more carefully, because I'm not sure what happens if the
peer sees 1KB of space available and then sends 1KB of payload (using a 4KB
buffer).

The other option is to copy each packet into a new buffer, like I did in
v2 [2], but this forces us to make a copy for each packet that does
not fill the entire buffer, which is perhaps too expensive.

[2] https://patchwork.kernel.org/patch/10938741/


Thanks,
Stefano

2019-07-29 16:39:04

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > Since virtio-vsock was introduced, the buffers filled by the host
> > > and pushed to the guest using the vring, are directly queued in
> > > a per-socket list. These buffers are preallocated by the guest
> > > with a fixed size (4 KB).
> > >
> > > The maximum amount of memory used by each socket should be
> > > controlled by the credit mechanism.
> > > The default credit available per-socket is 256 KB, but if we use
> > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > guest will continue to fill the vring with new 4 KB free buffers
> > > to avoid starvation of other sockets.
> > >
> > > This patch mitigates this issue copying the payload of small
> > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > order to avoid wasting memory.
> > >
> > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > Signed-off-by: Stefano Garzarella <[email protected]>
> >
> > This is good enough for net-next, but for net I think we
> > should figure out how to address the issue completely.
> > Can we make the accounting precise? What happens to
> > performance if we do?
> >
>
> In order to do more precise accounting maybe we can use the buffer size,
> instead of payload size when we update the credit available.
> In this way, the credit available for each socket will reflect the memory
> actually used.
>
> I should check better, because I'm not sure what happen if the peer sees
> 1KB of space available, then it sends 1KB of payload (using a 4KB
> buffer).
> The other option is to copy each packet in a new buffer like I did in
> the v2 [2], but this forces us to make a copy for each packet that does
> not fill the entire buffer, perhaps too expensive.
>
> [2] https://patchwork.kernel.org/patch/10938741/
>

So one thing we can easily do is to under-report the
available credit. E.g. if we copy up to 256 bytes,
then report just 256 bytes for every buffer in the queue.


>
> Thanks,
> Stefano

2019-07-29 17:03:42

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring, are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > >
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > >
> > > > > This patch mitigates this issue copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > order to avoid wasting memory.
> > > > >
> > > > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > >
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > >
> > >
> > > In order to do more precise accounting maybe we can use the buffer size,
> > > instead of payload size when we update the credit available.
> > > In this way, the credit available for each socket will reflect the memory
> > > actually used.
> > >
> > > I should check better, because I'm not sure what happen if the peer sees
> > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > buffer).
> > >
> > > The other option is to copy each packet in a new buffer like I did in
> > > the v2 [2], but this forces us to make a copy for each packet that does
> > > not fill the entire buffer, perhaps too expensive.
> > >
> > > [2] https://patchwork.kernel.org/patch/10938741/
> > >
> > >
> > > Thanks,
> > > Stefano
> >
> > Interesting. You are right, and at some level the protocol forces copies.
> >
> > We could try to detect that the actual memory is getting close to
> > admin limits and force copies on queued packets after the fact.
> > Is that practical?
>
> Yes, I think it is doable!
> We can decrease the credit available with the buffer size queued, and
> when the buffer size of packet to queue is bigger than the credit
> available, we can copy it.
>
> >
> > And yes we can extend the credit accounting to include buffer size.
> > That's a protocol change but maybe it makes sense.
>
> Since we send to the other peer the credit available, maybe this
> change can be backwards compatible (I'll check better this).

What I said was wrong.

We send a counter (increased when the user consumes the packets) and the
"buf_alloc" (the max memory allowed) to the other peer.
To calculate the credit available, it takes the difference between a local
counter (increased when the packets are sent) and the remote counter:

u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
{
	u32 ret;

	spin_lock_bh(&vvs->tx_lock);
	ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
	if (ret > credit)
		ret = credit;
	vvs->tx_cnt += ret;
	spin_unlock_bh(&vvs->tx_lock);

	return ret;
}
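
To make the formula concrete, a worked example with illustrative numbers
(not taken from any measurement):

/* Worked example of the credit formula above, with illustrative numbers. */
static u32 credit_example(void)
{
	u32 peer_buf_alloc = 262144;	/* peer advertised 256 KB          */
	u32 tx_cnt         = 150000;	/* bytes we have sent so far       */
	u32 peer_fwd_cnt   = 100000;	/* bytes the peer says it consumed */

	u32 in_flight = tx_cnt - peer_fwd_cnt;	/*  50000 bytes in flight    */

	return peer_buf_alloc - in_flight;	/* 212144 bytes of credit    */
}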

Maybe I can play with "buf_alloc" to take care of bytes queued but not
used.

Thanks,
Stefano

2019-07-29 17:44:54

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> Since virtio-vsock was introduced, the buffers filled by the host
> and pushed to the guest using the vring, are directly queued in
> a per-socket list. These buffers are preallocated by the guest
> with a fixed size (4 KB).
>
> The maximum amount of memory used by each socket should be
> controlled by the credit mechanism.
> The default credit available per-socket is 256 KB, but if we use
> only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> buffers, using up to 1 GB of memory per-socket. In addition, the
> guest will continue to fill the vring with new 4 KB free buffers
> to avoid starvation of other sockets.
>
> This patch mitigates this issue copying the payload of small
> packets (< 128 bytes) into the buffer of last packet queued, in
> order to avoid wasting memory.
>
> Reviewed-by: Stefan Hajnoczi <[email protected]>
> Signed-off-by: Stefano Garzarella <[email protected]>

This is good enough for net-next, but for net I think we
should figure out how to address the issue completely.
Can we make the accounting precise? What happens to
performance if we do?


> ---
> drivers/vhost/vsock.c | 2 +
> include/linux/virtio_vsock.h | 1 +
> net/vmw_vsock/virtio_transport.c | 1 +
> net/vmw_vsock/virtio_transport_common.c | 60 +++++++++++++++++++++----
> 4 files changed, 55 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 6a50e1d0529c..6c8390a2af52 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -329,6 +329,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> return NULL;
> }
>
> + pkt->buf_len = pkt->len;
> +
> nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
> if (nbytes != pkt->len) {
> vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index e223e2632edd..7d973903f52e 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -52,6 +52,7 @@ struct virtio_vsock_pkt {
> /* socket refcnt not held, only use for cancellation */
> struct vsock_sock *vsk;
> void *buf;
> + u32 buf_len;
> u32 len;
> u32 off;
> bool reply;
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 0815d1357861..082a30936690 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -307,6 +307,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> break;
> }
>
> + pkt->buf_len = buf_len;
> pkt->len = buf_len;
>
> sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 6f1a8aff65c5..095221f94786 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -26,6 +26,9 @@
> /* How long to wait for graceful shutdown of a connection */
> #define VSOCK_CLOSE_TIMEOUT (8 * HZ)
>
> +/* Threshold for detecting small packets to copy */
> +#define GOOD_COPY_LEN 128
> +
> static const struct virtio_transport *virtio_transport_get_ops(void)
> {
> const struct vsock_transport *t = vsock_core_get_transport();
> @@ -64,6 +67,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
> pkt->buf = kmalloc(len, GFP_KERNEL);
> if (!pkt->buf)
> goto out_pkt;
> +
> + pkt->buf_len = len;
> +
> err = memcpy_from_msg(pkt->buf, info->msg, len);
> if (err)
> goto out;
> @@ -841,24 +847,60 @@ virtio_transport_recv_connecting(struct sock *sk,
> return err;
> }
>
> +static void
> +virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> + struct virtio_vsock_pkt *pkt)
> +{
> + struct virtio_vsock_sock *vvs = vsk->trans;
> + bool free_pkt = false;
> +
> + pkt->len = le32_to_cpu(pkt->hdr.len);
> + pkt->off = 0;
> +
> + spin_lock_bh(&vvs->rx_lock);
> +
> + virtio_transport_inc_rx_pkt(vvs, pkt);
> +
> + /* Try to copy small packets into the buffer of last packet queued,
> + * to avoid wasting memory queueing the entire buffer with a small
> + * payload.
> + */
> + if (pkt->len <= GOOD_COPY_LEN && !list_empty(&vvs->rx_queue)) {
> + struct virtio_vsock_pkt *last_pkt;
> +
> + last_pkt = list_last_entry(&vvs->rx_queue,
> + struct virtio_vsock_pkt, list);
> +
> + /* If there is space in the last packet queued, we copy the
> + * new packet in its buffer.
> + */
> + if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
> + memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
> + pkt->len);
> + last_pkt->len += pkt->len;
> + free_pkt = true;
> + goto out;
> + }
> + }
> +
> + list_add_tail(&pkt->list, &vvs->rx_queue);
> +
> +out:
> + spin_unlock_bh(&vvs->rx_lock);
> + if (free_pkt)
> + virtio_transport_free_pkt(pkt);
> +}
> +
> static int
> virtio_transport_recv_connected(struct sock *sk,
> struct virtio_vsock_pkt *pkt)
> {
> struct vsock_sock *vsk = vsock_sk(sk);
> - struct virtio_vsock_sock *vvs = vsk->trans;
> int err = 0;
>
> switch (le16_to_cpu(pkt->hdr.op)) {
> case VIRTIO_VSOCK_OP_RW:
> - pkt->len = le32_to_cpu(pkt->hdr.len);
> - pkt->off = 0;
> -
> - spin_lock_bh(&vvs->rx_lock);
> - virtio_transport_inc_rx_pkt(vvs, pkt);
> - list_add_tail(&pkt->list, &vvs->rx_queue);
> - spin_unlock_bh(&vvs->rx_lock);
> -
> + virtio_transport_recv_enqueue(vsk, pkt);
> sk->sk_data_ready(sk);
> return err;
> case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
> --
> 2.20.1

2019-07-29 18:58:50

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > and pushed to the guest using the vring, are directly queued in
> > > > a per-socket list. These buffers are preallocated by the guest
> > > > with a fixed size (4 KB).
> > > >
> > > > The maximum amount of memory used by each socket should be
> > > > controlled by the credit mechanism.
> > > > The default credit available per-socket is 256 KB, but if we use
> > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > to avoid starvation of other sockets.
> > > >
> > > > This patch mitigates this issue copying the payload of small
> > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > order to avoid wasting memory.
> > > >
> > > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > >
> > > This is good enough for net-next, but for net I think we
> > > should figure out how to address the issue completely.
> > > Can we make the accounting precise? What happens to
> > > performance if we do?
> > >
> >
> > In order to do more precise accounting maybe we can use the buffer size,
> > instead of payload size when we update the credit available.
> > In this way, the credit available for each socket will reflect the memory
> > actually used.
> >
> > I should check better, because I'm not sure what happen if the peer sees
> > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > buffer).
> >
> > The other option is to copy each packet in a new buffer like I did in
> > the v2 [2], but this forces us to make a copy for each packet that does
> > not fill the entire buffer, perhaps too expensive.
> >
> > [2] https://patchwork.kernel.org/patch/10938741/
> >
> >
> > Thanks,
> > Stefano
>
> Interesting. You are right, and at some level the protocol forces copies.
>
> We could try to detect that the actual memory is getting close to
> admin limits and force copies on queued packets after the fact.
> Is that practical?

Yes, I think it is doable!
We can decrease the credit available with the buffer size queued, and
when the buffer size of packet to queue is bigger than the credit
available, we can copy it.

>
> And yes we can extend the credit accounting to include buffer size.
> That's a protocol change but maybe it makes sense.

Since we send to the other peer the credit available, maybe this
change can be backwards compatible (I'll check better this).

Thanks,
Stefano

2019-07-29 19:24:33

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > Since virtio-vsock was introduced, the buffers filled by the host
> > > and pushed to the guest using the vring, are directly queued in
> > > a per-socket list. These buffers are preallocated by the guest
> > > with a fixed size (4 KB).
> > >
> > > The maximum amount of memory used by each socket should be
> > > controlled by the credit mechanism.
> > > The default credit available per-socket is 256 KB, but if we use
> > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > guest will continue to fill the vring with new 4 KB free buffers
> > > to avoid starvation of other sockets.
> > >
> > > This patch mitigates this issue copying the payload of small
> > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > order to avoid wasting memory.
> > >
> > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > Signed-off-by: Stefano Garzarella <[email protected]>
> >
> > This is good enough for net-next, but for net I think we
> > should figure out how to address the issue completely.
> > Can we make the accounting precise? What happens to
> > performance if we do?
> >
>
> In order to do more precise accounting maybe we can use the buffer size,
> instead of payload size when we update the credit available.
> In this way, the credit available for each socket will reflect the memory
> actually used.
>
> I should check better, because I'm not sure what happen if the peer sees
> 1KB of space available, then it sends 1KB of payload (using a 4KB
> buffer).
>
> The other option is to copy each packet in a new buffer like I did in
> the v2 [2], but this forces us to make a copy for each packet that does
> not fill the entire buffer, perhaps too expensive.
>
> [2] https://patchwork.kernel.org/patch/10938741/
>
>
> Thanks,
> Stefano

Interesting. You are right, and at some level the protocol forces copies.

We could try to detect that the actual memory is getting close to
admin limits and force copies on queued packets after the fact.
Is that practical?

And yes we can extend the credit accounting to include buffer size.
That's a protocol change but maybe it makes sense.

--
MST

2019-07-29 19:26:30

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Jul 29, 2019 at 12:01:37PM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > and pushed to the guest using the vring, are directly queued in
> > > > a per-socket list. These buffers are preallocated by the guest
> > > > with a fixed size (4 KB).
> > > >
> > > > The maximum amount of memory used by each socket should be
> > > > controlled by the credit mechanism.
> > > > The default credit available per-socket is 256 KB, but if we use
> > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > to avoid starvation of other sockets.
> > > >
> > > > This patch mitigates this issue copying the payload of small
> > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > order to avoid wasting memory.
> > > >
> > > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > >
> > > This is good enough for net-next, but for net I think we
> > > should figure out how to address the issue completely.
> > > Can we make the accounting precise? What happens to
> > > performance if we do?
> > >
> >
> > In order to do more precise accounting maybe we can use the buffer size,
> > instead of payload size when we update the credit available.
> > In this way, the credit available for each socket will reflect the memory
> > actually used.
> >
> > I should check better, because I'm not sure what happen if the peer sees
> > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > buffer).
> > The other option is to copy each packet in a new buffer like I did in
> > the v2 [2], but this forces us to make a copy for each packet that does
> > not fill the entire buffer, perhaps too expensive.
> >
> > [2] https://patchwork.kernel.org/patch/10938741/
> >
>
> So one thing we can easily do is to under-report the
> available credit. E.g. if we copy up to 256bytes,
> then report just 256bytes for every buffer in the queue.
>

Sorry, I got lost :(
Could you explain that in more detail?


Thanks,
Stefano

2019-07-29 20:12:24

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Jul 29, 2019 at 06:41:27PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 12:01:37PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring, are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > >
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > >
> > > > > This patch mitigates this issue copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > order to avoid wasting memory.
> > > > >
> > > > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > >
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > >
> > >
> > > In order to do more precise accounting maybe we can use the buffer size,
> > > instead of payload size when we update the credit available.
> > > In this way, the credit available for each socket will reflect the memory
> > > actually used.
> > >
> > > I should check better, because I'm not sure what happen if the peer sees
> > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > buffer).
> > > The other option is to copy each packet in a new buffer like I did in
> > > the v2 [2], but this forces us to make a copy for each packet that does
> > > not fill the entire buffer, perhaps too expensive.
> > >
> > > [2] https://patchwork.kernel.org/patch/10938741/
> > >
> >
> > So one thing we can easily do is to under-report the
> > available credit. E.g. if we copy up to 256bytes,
> > then report just 256bytes for every buffer in the queue.
> >
>
> Ehm sorry, I got lost :(
> Can you explain better?
>
>
> Thanks,
> Stefano

I think I suggested a better idea more recently.
But to clarify this option: we are adding a 4K buffer.
Let's say we know we will always copy 128 bytes.

So we just tell remote we have 128.
If we add another 4K buffer we add another 128 credits.

So we are charging local socket 16x more (4k for a 128 byte packet) but
we are paying remote 16x less (128 credits for 4k byte buffer). It evens
out.

Way less credits to go around so I'm not sure it's a good idea,
at least as the only solution. Can be combined with other
optimizations and probably in a less drastic fashion
(e.g. 2x rather than 16x).
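
A sketch of the under-reporting idea as described here, purely illustrative
(the series does not implement it, and the threshold is just an example):

#define COPY_THRESHOLD	128	/* bytes we are always willing to copy/compact */

/* Advertise only COPY_THRESHOLD bytes of credit per queued 4 KB buffer, so
 * a sender can never pin more real memory than we are prepared to compact.
 * The trade-off discussed above: far fewer credits in flight overall.
 */
static u32 advertised_credit_sketch(u32 nr_queued_buffers)
{
	return nr_queued_buffers * COPY_THRESHOLD;
}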


--
MST

2019-07-29 22:02:02

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Jul 29, 2019 at 06:50:56PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > > and pushed to the guest using the vring, are directly queued in
> > > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > > with a fixed size (4 KB).
> > > > > >
> > > > > > The maximum amount of memory used by each socket should be
> > > > > > controlled by the credit mechanism.
> > > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > > to avoid starvation of other sockets.
> > > > > >
> > > > > > This patch mitigates this issue copying the payload of small
> > > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > > order to avoid wasting memory.
> > > > > >
> > > > > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > > >
> > > > > This is good enough for net-next, but for net I think we
> > > > > should figure out how to address the issue completely.
> > > > > Can we make the accounting precise? What happens to
> > > > > performance if we do?
> > > > >
> > > >
> > > > In order to do more precise accounting maybe we can use the buffer size,
> > > > instead of payload size when we update the credit available.
> > > > In this way, the credit available for each socket will reflect the memory
> > > > actually used.
> > > >
> > > > I should check better, because I'm not sure what happen if the peer sees
> > > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > > buffer).
> > > >
> > > > The other option is to copy each packet in a new buffer like I did in
> > > > the v2 [2], but this forces us to make a copy for each packet that does
> > > > not fill the entire buffer, perhaps too expensive.
> > > >
> > > > [2] https://patchwork.kernel.org/patch/10938741/
> > > >
> > > >
> > > > Thanks,
> > > > Stefano
> > >
> > > Interesting. You are right, and at some level the protocol forces copies.
> > >
> > > We could try to detect that the actual memory is getting close to
> > > admin limits and force copies on queued packets after the fact.
> > > Is that practical?
> >
> > Yes, I think it is doable!
> > We can decrease the credit available with the buffer size queued, and
> > when the buffer size of packet to queue is bigger than the credit
> > available, we can copy it.
> >
> > >
> > > And yes we can extend the credit accounting to include buffer size.
> > > That's a protocol change but maybe it makes sense.
> >
> > Since we send to the other peer the credit available, maybe this
> > change can be backwards compatible (I'll check better this).
>
> What I said was wrong.
>
> We send a counter (increased when the user consumes the packets) and the
> "buf_alloc" (the max memory allowed) to the other peer.
> It makes a difference between a local counter (increased when the
> packets are sent) and the remote counter to calculate the credit available:
>
> u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
> {
> u32 ret;
>
> spin_lock_bh(&vvs->tx_lock);
> ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> if (ret > credit)
> ret = credit;
> vvs->tx_cnt += ret;
> spin_unlock_bh(&vvs->tx_lock);
>
> return ret;
> }
>
> Maybe I can play with "buf_alloc" to take care of bytes queued but not
> used.
>
> Thanks,
> Stefano

Right. And the idea behind it all was that if we send a credit
to remote then we have space for it.
I think the basic idea was that if we have actual allocated
memory and can copy data there, then we send the credit to
remote.

Of course that means an extra copy for every packet.
So as an optimization, it seems that we just assume
that we will be able to allocate a new buffer.

First, this is not the best we can do. We can actually
allocate memory in the socket before sending credit.
If the packet is small then we copy it there.
If the packet is big then we queue the packet,
take the buffer out of the socket and add it to the virtqueue.

The second question is what to do about medium-sized packets.
The packet is 1K but the buffer is 4K, so what do we do?
And here I wonder - why don't we add the remaining 3K of the buffer
back to the vq?
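
A rough sketch of the scheme outlined in this message; every helper and
threshold below is hypothetical, and the medium-packet case is left as the
open question it is here:

#include <net/sock.h>

#define SMALL_PKT_THRESHOLD	128	/* illustrative copy threshold */

/* Sketch: memory was reserved in the socket before the credit was granted. */
static void rx_pkt_sketch(struct sock *sk, void *buf, size_t buf_len,
			  size_t payload_len)
{
	if (payload_len <= SMALL_PKT_THRESHOLD) {
		/* Small packet: copy into the socket's reservation and give
		 * the buffer straight back to the virtqueue.
		 */
		copy_into_socket_reservation(sk, buf, payload_len);	/* hypothetical */
		recycle_buffer_to_vq(buf, buf_len);			/* hypothetical */
	} else {
		/* Big packet: the socket takes over the whole buffer. */
		attach_buffer_to_socket(sk, buf, payload_len);		/* hypothetical */
		/* Medium packets (e.g. 1K payload in a 4K buffer): the open
		 * question above is whether the unused 3K tail could be added
		 * back to the virtqueue as a smaller buffer.
		 */
	}
}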



--
MST

2019-07-30 11:18:10

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Jul 29, 2019 at 03:10:15PM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 29, 2019 at 06:50:56PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> > > > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > > > and pushed to the guest using the vring, are directly queued in
> > > > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > > > with a fixed size (4 KB).
> > > > > > >
> > > > > > > The maximum amount of memory used by each socket should be
> > > > > > > controlled by the credit mechanism.
> > > > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > > > to avoid starvation of other sockets.
> > > > > > >
> > > > > > > This patch mitigates this issue copying the payload of small
> > > > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > > > order to avoid wasting memory.
> > > > > > >
> > > > > > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > > > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > > > >
> > > > > > This is good enough for net-next, but for net I think we
> > > > > > should figure out how to address the issue completely.
> > > > > > Can we make the accounting precise? What happens to
> > > > > > performance if we do?
> > > > > >
> > > > >
> > > > > In order to do more precise accounting maybe we can use the buffer size,
> > > > > instead of payload size when we update the credit available.
> > > > > In this way, the credit available for each socket will reflect the memory
> > > > > actually used.
> > > > >
> > > > > I should check better, because I'm not sure what happen if the peer sees
> > > > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > > > buffer).
> > > > >
> > > > > The other option is to copy each packet in a new buffer like I did in
> > > > > the v2 [2], but this forces us to make a copy for each packet that does
> > > > > not fill the entire buffer, perhaps too expensive.
> > > > >
> > > > > [2] https://patchwork.kernel.org/patch/10938741/
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Stefano
> > > >
> > > > Interesting. You are right, and at some level the protocol forces copies.
> > > >
> > > > We could try to detect that the actual memory is getting close to
> > > > admin limits and force copies on queued packets after the fact.
> > > > Is that practical?
> > >
> > > Yes, I think it is doable!
> > > We can decrease the credit available with the buffer size queued, and
> > > when the buffer size of packet to queue is bigger than the credit
> > > available, we can copy it.
> > >
> > > >
> > > > And yes we can extend the credit accounting to include buffer size.
> > > > That's a protocol change but maybe it makes sense.
> > >
> > > Since we send to the other peer the credit available, maybe this
> > > change can be backwards compatible (I'll check better this).
> >
> > What I said was wrong.
> >
> > We send a counter (increased when the user consumes the packets) and the
> > "buf_alloc" (the max memory allowed) to the other peer.
> > It makes a difference between a local counter (increased when the
> > packets are sent) and the remote counter to calculate the credit available:
> >
> > u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
> > {
> > u32 ret;
> >
> > spin_lock_bh(&vvs->tx_lock);
> > ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> > if (ret > credit)
> > ret = credit;
> > vvs->tx_cnt += ret;
> > spin_unlock_bh(&vvs->tx_lock);
> >
> > return ret;
> > }
> >
> > Maybe I can play with "buf_alloc" to take care of bytes queued but not
> > used.
> >
> > Thanks,
> > Stefano
>
> Right. And the idea behind it all was that if we send a credit
> to remote then we have space for it.

Yes.

> I think the basic idea was that if we have actual allocated
> memory and can copy data there, then we send the credit to
> remote.
>
> Of course that means an extra copy every packet.
> So as an optimization, it seems that we just assume
> that we will be able to allocate a new buffer.

Yes, we refill the virtqueue when half of the buffers have been used.

>
> First this is not the best we can do. We can actually do
> allocate memory in the socket before sending credit.

In this case, IIUC we should allocate an entire buffer (4KB),
so we can reuse it if the packet is big.

> If packet is small then we copy it there.
> If packet is big then we queue the packet,
> take the buffer out of socket and add it to the virtqueue.
>
> Second question is what to do about medium sized packets.
> Packet is 1K but buffer is 4K, what do we do?
> And here I wonder - why don't we add the 3K buffer
> to the vq?

This would allow us to have accurate credit accounting.

The problem here is compatibility. Before this series, the virtio-vsock
and vhost-vsock modules had the RX buffer size hard-coded
(VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
than 4K, there might be issues.

Maybe it is time to add 'features' to the virtio-vsock device.

Thanks,
Stefano

2019-07-30 11:40:13

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput


On 2019/7/30 下午5:40, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 09:59:23AM -0400, Michael S. Tsirkin wrote:
>> On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
>>> This series tries to increase the throughput of virtio-vsock with slight
>>> changes.
>>> While I was testing the v2 of this series I discovered an huge use of memory,
>>> so I added patch 1 to mitigate this issue. I put it in this series in order
>>> to better track the performance trends.
>> Series:
>>
>> Acked-by: Michael S. Tsirkin <[email protected]>
>>
>> Can this go into net-next?
>>
> I think so.
> Michael, Stefan thanks to ack the series!
>
> Should I resend it with net-next tag?
>
> Thanks,
> Stefano


I think so.

Thanks

2019-07-30 14:59:41

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

On Mon, Jul 29, 2019 at 09:59:23AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > This series tries to increase the throughput of virtio-vsock with slight
> > changes.
> > While I was testing the v2 of this series I discovered an huge use of memory,
> > so I added patch 1 to mitigate this issue. I put it in this series in order
> > to better track the performance trends.
>
> Series:
>
> Acked-by: Michael S. Tsirkin <[email protected]>
>
> Can this go into net-next?
>

I think so.
Michael, Stefan, thanks for acking the series!

Should I resend it with the net-next tag?

Thanks,
Stefano

2019-07-30 16:40:19

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

On Tue, Jul 30, 2019 at 06:03:24PM +0800, Jason Wang wrote:
>
> On 2019/7/30 下午5:40, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 09:59:23AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > > > This series tries to increase the throughput of virtio-vsock with slight
> > > > changes.
> > > > While I was testing the v2 of this series I discovered an huge use of memory,
> > > > so I added patch 1 to mitigate this issue. I put it in this series in order
> > > > to better track the performance trends.
> > > Series:
> > >
> > > Acked-by: Michael S. Tsirkin <[email protected]>
> > >
> > > Can this go into net-next?
> > >
> > I think so.
> > Michael, Stefan thanks to ack the series!
> >
> > Should I resend it with net-next tag?
> >
> > Thanks,
> > Stefano
>
>
> I think so.

Okay, I'll resend it!

Thanks,
Stefano

2019-07-31 01:53:58

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 03:10:15PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 29, 2019 at 06:50:56PM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote:
> > > > On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> > > > > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > > > > and pushed to the guest using the vring, are directly queued in
> > > > > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > > > > with a fixed size (4 KB).
> > > > > > > >
> > > > > > > > The maximum amount of memory used by each socket should be
> > > > > > > > controlled by the credit mechanism.
> > > > > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > > > > to avoid starvation of other sockets.
> > > > > > > >
> > > > > > > > This patch mitigates this issue copying the payload of small
> > > > > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > > > > order to avoid wasting memory.
> > > > > > > >
> > > > > > > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > > > > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > > > > >
> > > > > > > This is good enough for net-next, but for net I think we
> > > > > > > should figure out how to address the issue completely.
> > > > > > > Can we make the accounting precise? What happens to
> > > > > > > performance if we do?
> > > > > > >
> > > > > >
> > > > > > In order to do more precise accounting maybe we can use the buffer size,
> > > > > > instead of payload size when we update the credit available.
> > > > > > In this way, the credit available for each socket will reflect the memory
> > > > > > actually used.
> > > > > >
> > > > > > I should check better, because I'm not sure what happen if the peer sees
> > > > > > 1KB of space available, then it sends 1KB of payload (using a 4KB
> > > > > > buffer).
> > > > > >
> > > > > > The other option is to copy each packet in a new buffer like I did in
> > > > > > the v2 [2], but this forces us to make a copy for each packet that does
> > > > > > not fill the entire buffer, perhaps too expensive.
> > > > > >
> > > > > > [2] https://patchwork.kernel.org/patch/10938741/
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Stefano
> > > > >
> > > > > Interesting. You are right, and at some level the protocol forces copies.
> > > > >
> > > > > We could try to detect that the actual memory is getting close to
> > > > > admin limits and force copies on queued packets after the fact.
> > > > > Is that practical?
> > > >
> > > > Yes, I think it is doable!
> > > > We can decrease the credit available with the buffer size queued, and
> > > > when the buffer size of packet to queue is bigger than the credit
> > > > available, we can copy it.
> > > >
> > > > >
> > > > > And yes we can extend the credit accounting to include buffer size.
> > > > > That's a protocol change but maybe it makes sense.
> > > >
> > > > Since we send to the other peer the credit available, maybe this
> > > > change can be backwards compatible (I'll check better this).
> > >
> > > What I said was wrong.
> > >
> > > We send a counter (increased when the user consumes the packets) and the
> > > "buf_alloc" (the max memory allowed) to the other peer.
> > > It makes a difference between a local counter (increased when the
> > > packets are sent) and the remote counter to calculate the credit available:
> > >
> > > u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
> > > {
> > > u32 ret;
> > >
> > > spin_lock_bh(&vvs->tx_lock);
> > > ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> > > if (ret > credit)
> > > ret = credit;
> > > vvs->tx_cnt += ret;
> > > spin_unlock_bh(&vvs->tx_lock);
> > >
> > > return ret;
> > > }
> > >
> > > Maybe I can play with "buf_alloc" to take care of bytes queued but not
> > > used.
> > >
> > > Thanks,
> > > Stefano
> >
> > Right. And the idea behind it all was that if we send a credit
> > to remote then we have space for it.
>
> Yes.
>
> > I think the basic idea was that if we have actual allocated
> > memory and can copy data there, then we send the credit to
> > remote.
> >
> > Of course that means an extra copy every packet.
> > So as an optimization, it seems that we just assume
> > that we will be able to allocate a new buffer.
>
> Yes, we refill the virtqueue when half of the buffers were used.
>
> >
> > First this is not the best we can do. We can actually do
> > allocate memory in the socket before sending credit.
>
> In this case, IIUC we should allocate an entire buffer (4KB),
> so we can reuse it if the packet is big.
>
> > If packet is small then we copy it there.
> > If packet is big then we queue the packet,
> > take the buffer out of socket and add it to the virtqueue.
> >
> > Second question is what to do about medium sized packets.
> > Packet is 1K but buffer is 4K, what do we do?
> > And here I wonder - why don't we add the 3K buffer
> > to the vq?
>
> This would allow us to have an accurate credit account.
>
> The problem here is the compatibility. Before this series virtio-vsock
> and vhost-vsock modules had the RX buffer size hard-coded
> (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> of 4K, there might be issues.

Shouldn't be if they are following the spec. If not let's fix
the broken parts.

>
> Maybe it is the time to add add 'features' to virtio-vsock device.
>
> Thanks,
> Stefano

Why would a remote care about buffer sizes?

Let's first see what the issues are. If they exist
we can either fix the bugs, or code the bug as a feature in spec.

--
MST

2019-08-01 12:02:49

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:

(...)

> >
> > The problem here is the compatibility. Before this series virtio-vsock
> > and vhost-vsock modules had the RX buffer size hard-coded
> > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > of 4K, there might be issues.
>
> Shouldn't be if they are following the spec. If not let's fix
> the broken parts.
>
> >
> > Maybe it is the time to add add 'features' to virtio-vsock device.
> >
> > Thanks,
> > Stefano
>
> Why would a remote care about buffer sizes?
>
> Let's first see what the issues are. If they exist
> we can either fix the bugs, or code the bug as a feature in spec.
>

The vhost_transport '.stream_enqueue' callback
[virtio_transport_stream_enqueue()] calls virtio_transport_send_pkt_info(),
passing the user message. This function allocates a new packet, copying
the user message, but (before this series) it limits the packet size to
VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):

static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
                                          struct virtio_vsock_pkt_info *info)
{
        ...
        /* we can send less than pkt_len bytes */
        if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
                pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;

        /* virtio_transport_get_credit might return less than pkt_len credit */
        pkt_len = virtio_transport_get_credit(vvs, pkt_len);

        /* Do not send zero length OP_RW pkt */
        if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
                return pkt_len;
        ...
}

then it queues the packet for the TX worker, calling .send_pkt()
[vhost_transport_send_pkt() in the vhost_transport case].

The main function executed by the TX worker is
vhost_transport_do_send_pkt(), which picks up a buffer from the virtqueue
and tries to copy the packet (up to 4K) into it. If the buffer
allocated by the guest is smaller than 4K, I think the packet will be
discarded with an error here:

static void
vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
                            struct vhost_virtqueue *vq)
{
        ...
        nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
        if (nbytes != pkt->len) {
                virtio_transport_free_pkt(pkt);
                vq_err(vq, "Faulted on copying pkt buf\n");
                break;
        }
        ...
}


This series changes this behavior: now we split the packet in
vhost_transport_do_send_pkt() depending on the buffer found in the
virtqueue.
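
Roughly, the splitting works like this (a simplified sketch of how many
payload bytes a single guest buffer can carry; the helper name is made
up for illustration, it is not the code in the patch):

#include <stddef.h>

/* Illustrative helper, not the actual patch: given the packet state and
 * the guest RX buffer just popped from the virtqueue, compute how many
 * payload bytes this buffer will carry. The remaining bytes stay in the
 * packet (tracked by an offset) and are sent with the next buffer.
 */
static size_t bytes_for_this_buf(size_t pkt_len, size_t pkt_off,
                                 size_t buf_len, size_t hdr_len)
{
        size_t payload_room = buf_len > hdr_len ? buf_len - hdr_len : 0;
        size_t remaining = pkt_len - pkt_off;

        return remaining < payload_room ? remaining : payload_room;
}

In the actual patch, the header sent with each buffer advertises only
the length of that fragment, and the packet stays queued until all of
it has been copied.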

We didn't change the buffer size in this series, so we are still
backward compatible, but if we were to use buffers smaller than 4K, we
would hit the error described above.

How do you suggest we proceed if we want to change the buffer size?
Maybe adding a feature to "support any buffer size"?

Thanks,
Stefano

2019-08-01 13:39:58

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
>
> (...)
>
> > >
> > > The problem here is the compatibility. Before this series virtio-vsock
> > > and vhost-vsock modules had the RX buffer size hard-coded
> > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > of 4K, there might be issues.
> >
> > Shouldn't be if they are following the spec. If not let's fix
> > the broken parts.
> >
> > >
> > > Maybe it is the time to add add 'features' to virtio-vsock device.
> > >
> > > Thanks,
> > > Stefano
> >
> > Why would a remote care about buffer sizes?
> >
> > Let's first see what the issues are. If they exist
> > we can either fix the bugs, or code the bug as a feature in spec.
> >
>
> The vhost_transport '.stream_enqueue' callback
> [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> passing the user message. This function allocates a new packet, copying
> the user message, but (before this series) it limits the packet size to
> the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
>
> static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> struct virtio_vsock_pkt_info *info)
> {
> ...
> /* we can send less than pkt_len bytes */
> if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
>
> /* virtio_transport_get_credit might return less than pkt_len credit */
> pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>
> /* Do not send zero length OP_RW pkt */
> if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> return pkt_len;
> ...
> }
>
> then it queues the packet for the TX worker calling .send_pkt()
> [vhost_transport_send_pkt() in the vhost_transport case]
>
> The main function executed by the TX worker is
> vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> and it tries to copy the packet (up to 4K) on it. If the buffer
> allocated from the guest will be smaller then 4K, I think here it will
> be discarded with an error:
>
> static void
> vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> struct vhost_virtqueue *vq)
> {
> ...
> nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);

isn't pkt len the actual length though?

> if (nbytes != pkt->len) {
> virtio_transport_free_pkt(pkt);
> vq_err(vq, "Faulted on copying pkt buf\n");
> break;
> }
> ...
> }
>
>
> This series changes this behavior since now we will split the packet in
> vhost_transport_do_send_pkt() depending on the buffer found in the
> virtqueue.
>
> We didn't change the buffer size in this series, so we still backward
> compatible, but if we will use buffers smaller than 4K, we should
> encounter the error described above.
>
> How do you suggest we proceed if we want to change the buffer size?
> Maybe adding a feature to "support any buffer size"?
>
> Thanks,
> Stefano


2019-08-01 13:46:08

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> >
> > (...)
> >
> > > >
> > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > of 4K, there might be issues.
> > >
> > > Shouldn't be if they are following the spec. If not let's fix
> > > the broken parts.
> > >
> > > >
> > > > Maybe it is the time to add add 'features' to virtio-vsock device.
> > > >
> > > > Thanks,
> > > > Stefano
> > >
> > > Why would a remote care about buffer sizes?
> > >
> > > Let's first see what the issues are. If they exist
> > > we can either fix the bugs, or code the bug as a feature in spec.
> > >
> >
> > The vhost_transport '.stream_enqueue' callback
> > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > passing the user message. This function allocates a new packet, copying
> > the user message, but (before this series) it limits the packet size to
> > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> >
> > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > struct virtio_vsock_pkt_info *info)
> > {
> > ...
> > /* we can send less than pkt_len bytes */
> > if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> >
> > /* virtio_transport_get_credit might return less than pkt_len credit */
> > pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> >
> > /* Do not send zero length OP_RW pkt */
> > if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > return pkt_len;
> > ...
> > }
> >
> > then it queues the packet for the TX worker calling .send_pkt()
> > [vhost_transport_send_pkt() in the vhost_transport case]
> >
> > The main function executed by the TX worker is
> > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > and it tries to copy the packet (up to 4K) on it. If the buffer
> > allocated from the guest will be smaller then 4K, I think here it will
> > be discarded with an error:
> >

I'm adding more lines to explain better.

> > static void
> > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > struct vhost_virtqueue *vq)
> > {
        ...

        head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
                                 &out, &in, NULL, NULL);

        ...

        len = iov_length(&vq->iov[out], in);
        iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);

        nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
        if (nbytes != sizeof(pkt->hdr)) {
                virtio_transport_free_pkt(pkt);
                vq_err(vq, "Faulted on copying pkt hdr\n");
                break;
        }

> > ...
> > nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
>
> isn't pck len the actual length though?
>

It is the length of the packet that we are copying into the guest RX
buffers pointed to by the iov_iter. The guest allocates an iovec with 2
buffers, one for the header and one for the payload (4KB).
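
For reference, this is roughly how the guest posts those RX buffers (a
simplified sketch from memory; see virtio_vsock_rx_fill() in
net/vmw_vsock/virtio_transport.c for the real code):

/* Simplified sketch (from memory) of how the guest posts one RX entry:
 * a 2-element scatterlist, header + fixed 4 KB payload buffer. This
 * fixed size is what the host currently relies on.
 */
struct scatterlist hdr, buf, *sgs[2];

pkt->buf = kmalloc(VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE, GFP_KERNEL);
pkt->len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;

sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
sgs[0] = &hdr;
sg_init_one(&buf, pkt->buf, pkt->len);
sgs[1] = &buf;

virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);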

> > if (nbytes != pkt->len) {
> > virtio_transport_free_pkt(pkt);
> > vq_err(vq, "Faulted on copying pkt buf\n");
> > break;
> > }
> > ...
> > }
> >
> >
> > This series changes this behavior since now we will split the packet in
> > vhost_transport_do_send_pkt() depending on the buffer found in the
> > virtqueue.
> >
> > We didn't change the buffer size in this series, so we still backward
> > compatible, but if we will use buffers smaller than 4K, we should
> > encounter the error described above.
> >
> > How do you suggest we proceed if we want to change the buffer size?
> > Maybe adding a feature to "support any buffer size"?
> >
> > Thanks,
> > Stefano
>
>

--

2019-08-30 09:42:34

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > Since virtio-vsock was introduced, the buffers filled by the host
> > and pushed to the guest using the vring, are directly queued in
> > a per-socket list. These buffers are preallocated by the guest
> > with a fixed size (4 KB).
> >
> > The maximum amount of memory used by each socket should be
> > controlled by the credit mechanism.
> > The default credit available per-socket is 256 KB, but if we use
> > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > buffers, using up to 1 GB of memory per-socket. In addition, the
> > guest will continue to fill the vring with new 4 KB free buffers
> > to avoid starvation of other sockets.
> >
> > This patch mitigates this issue copying the payload of small
> > packets (< 128 bytes) into the buffer of last packet queued, in
> > order to avoid wasting memory.
> >
> > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > Signed-off-by: Stefano Garzarella <[email protected]>
>
> This is good enough for net-next, but for net I think we
> should figure out how to address the issue completely.
> Can we make the accounting precise? What happens to
> performance if we do?
>

Since I'm back from holidays, I'm restarting this thread to figure out
how to address the issue completely.

I did a better analysis of the credit mechanism that we implemented in
virtio-vsock to get a clearer view, and I'd like to share it with you:

This issue affects only the "host->guest" path. In this case, when the
host wants to send a packet to the guest, it uses a "free" buffer
allocated by the guest (4KB).
The "free" buffers available to the host are shared between all
sockets; the credit mechanism, instead, is per-socket, I think to
avoid starvation of other sockets.
The guest re-fills the "free" queue when the available buffers drop
below half.

Each peer has these variables in the per-socket state:

    /* local vars */
    buf_alloc      /* max bytes usable by this socket
                      [exposed to the other peer] */
    fwd_cnt        /* increased when an RX packet is consumed by
                      user space [exposed to the other peer] */
    tx_cnt         /* increased when a TX packet is sent to the other peer */

    /* remote vars */
    peer_buf_alloc /* peer's buf_alloc */
    peer_fwd_cnt   /* peer's fwd_cnt */

When a peer sends a packet, it increases 'tx_cnt'; when the
receiver consumes the packet (copies it to the user-space buffer), it
increases 'fwd_cnt'.
Note: increments are made using the payload length, not the
buffer length.

The values of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
every packet header or with an explicit CREDIT_UPDATE packet.

The local 'buf_alloc' value can be modified by user space using
setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.

Before sending a packet, the peer checks the space available:
credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
and it will send up to credit_available bytes to the other peer.
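
In C, the formula above looks like this (just an illustration with the
names used here, not the kernel implementation):

#include <stdint.h>

/* Illustration of the credit formula; unsigned arithmetic keeps the
 * subtraction correct across counter wrap-around.
 */
static uint32_t credit_available(uint32_t peer_buf_alloc,
                                 uint32_t tx_cnt, uint32_t peer_fwd_cnt)
{
        /* bytes in flight = sent but not yet consumed by the peer */
        uint32_t in_flight = tx_cnt - peer_fwd_cnt;

        return peer_buf_alloc - in_flight;
}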

Possible solutions considering Michael's advice:
1. Use the buffer length instead of the payload length when we increment
   the counters:
   - This approach accounts precisely for the memory used per socket.
   - It requires changes in both guest and host.
   - It is not compatible with old drivers, so a feature should be negotiated.

2. Decrease the advertised 'buf_alloc', accounting for the bytes queued in
   the socket queue but not yet consumed (e.g. 256 bytes used of the 4K
   available in the buffer); see the sketch right after this list:
   - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
   - This should also be compatible with old drivers.
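
A rough sketch of what option 2 could look like when we fill in the
packet header (purely illustrative: 'bytes_unused' and 'rx_buf_total'
are hypothetical, not existing fields):

/* Hypothetical sketch of option 2: shrink the advertised buf_alloc by
 * the buffer bytes sitting in the RX queue that carry no payload, so
 * the peer's view of our free space matches the memory actually pinned.
 */
spin_lock_bh(&vvs->rx_lock);
bytes_unused = vvs->rx_buf_total - vvs->rx_bytes;  /* rx_buf_total is hypothetical */
spin_unlock_bh(&vvs->rx_lock);

pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc - bytes_unused);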

Maybe the second is less invasive, but will it be too tricky?
Any other advice or suggestions?

Thanks in advance,
Stefano

2019-09-01 06:58:19

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > Since virtio-vsock was introduced, the buffers filled by the host
> > > and pushed to the guest using the vring, are directly queued in
> > > a per-socket list. These buffers are preallocated by the guest
> > > with a fixed size (4 KB).
> > >
> > > The maximum amount of memory used by each socket should be
> > > controlled by the credit mechanism.
> > > The default credit available per-socket is 256 KB, but if we use
> > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > guest will continue to fill the vring with new 4 KB free buffers
> > > to avoid starvation of other sockets.
> > >
> > > This patch mitigates this issue copying the payload of small
> > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > order to avoid wasting memory.
> > >
> > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > Signed-off-by: Stefano Garzarella <[email protected]>
> >
> > This is good enough for net-next, but for net I think we
> > should figure out how to address the issue completely.
> > Can we make the accounting precise? What happens to
> > performance if we do?
> >
>
> Since I'm back from holidays, I'm restarting this thread to figure out
> how to address the issue completely.
>
> I did a better analysis of the credit mechanism that we implemented in
> virtio-vsock to get a clearer view and I'd share it with you:
>
> This issue affect only the "host->guest" path. In this case, when the
> host wants to send a packet to the guest, it uses a "free" buffer
> allocated by the guest (4KB).
> The "free" buffers available for the host are shared between all
> sockets, instead, the credit mechanism is per-socket, I think to
> avoid the starvation of others sockets.
> The guests re-fill the "free" queue when the available buffers are
> less than half.
>
> Each peer have these variables in the per-socket state:
> /* local vars */
> buf_alloc /* max bytes usable by this socket
> [exposed to the other peer] */
> fwd_cnt /* increased when RX packet is consumed by the
> user space [exposed to the other peer] */
> tx_cnt /* increased when TX packet is sent to the other peer */
>
> /* remote vars */
> peer_buf_alloc /* peer's buf_alloc */
> peer_fwd_cnt /* peer's fwd_cnt */
>
> When a peer sends a packet, it increases the 'tx_cnt'; when the
> receiver consumes the packet (copy it to the user-space buffer), it
> increases the 'fwd_cnt'.
> Note: increments are made considering the payload length and not the
> buffer length.
>
> The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> all packet headers or with an explicit CREDIT_UPDATE packet.
>
> The local 'buf_alloc' value can be modified by the user space using
> setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
>
> Before to send a packet, the peer checks the space available:
> credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> and it will send up to credit_available bytes to the other peer.
>
> Possible solutions considering Michael's advice:
> 1. Use the buffer length instead of the payload length when we increment
> the counters:
> - This approach will account precisely the memory used per socket.
> - This requires changes in both guest and host.
> - It is not compatible with old drivers, so a feature should be negotiated.
> 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
> the socket queue but not used. (e.g. 256 byte used on 4K available in
> the buffer)
> - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> - This should be compatible also with old drivers.
>
> Maybe the second is less invasive, but will it be too tricky?
> Any other advice or suggestions?
>
> Thanks in advance,
> Stefano

OK let me try to clarify. The idea is this:

Let's say we queue a buffer of 4K, and we copy if len < 128 bytes. This
means that in the worst case (128 byte packets), each byte of credit in
the socket uses up 4K/128 = 32 bytes of kernel memory. In fact we need
to also account for the virtio_vsock_pkt since I think it's kept around
until userspace consumes it.

Thus given X buf alloc allowed in the socket, we should publish X/32
credits to the other side. This will ensure the other side does not send
more than X/32 bytes for a given socket and thus we won't need to
allocate more than X bytes to hold the data.

We can play with the copy break value to tweak this.
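
A minimal sketch of the scaling (illustrative only; the constant and
helper names are made up, and a real limit would also need to cover the
per-packet virtio_vsock_pkt overhead):

#include <stdint.h>

#define RX_BUF_SIZE   4096   /* VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE */
#define COPY_BREAK     128   /* packets shorter than this are copied */

/* Worst case: every non-copied packet is COPY_BREAK bytes of payload
 * but pins a whole RX_BUF_SIZE buffer, i.e. 4096/128 = 32 bytes of
 * kernel memory per byte of credit.
 */
#define CREDIT_OVERHEAD  (RX_BUF_SIZE / COPY_BREAK)

/* Publish only buf_alloc / CREDIT_OVERHEAD credits, so the peer cannot
 * make us pin much more than buf_alloc bytes of buffers.
 */
static uint32_t published_credit(uint32_t buf_alloc)
{
        return buf_alloc / CREDIT_OVERHEAD;
}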



2019-09-01 08:34:32

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Thu, Aug 01, 2019 at 03:36:16PM +0200, Stefano Garzarella wrote:
> On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> > >
> > > (...)
> > >
> > > > >
> > > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > > of 4K, there might be issues.
> > > >
> > > > Shouldn't be if they are following the spec. If not let's fix
> > > > the broken parts.
> > > >
> > > > >
> > > > > Maybe it is the time to add add 'features' to virtio-vsock device.
> > > > >
> > > > > Thanks,
> > > > > Stefano
> > > >
> > > > Why would a remote care about buffer sizes?
> > > >
> > > > Let's first see what the issues are. If they exist
> > > > we can either fix the bugs, or code the bug as a feature in spec.
> > > >
> > >
> > > The vhost_transport '.stream_enqueue' callback
> > > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > > passing the user message. This function allocates a new packet, copying
> > > the user message, but (before this series) it limits the packet size to
> > > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> > >
> > > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > struct virtio_vsock_pkt_info *info)
> > > {
> > > ...
> > > /* we can send less than pkt_len bytes */
> > > if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > >
> > > /* virtio_transport_get_credit might return less than pkt_len credit */
> > > pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > >
> > > /* Do not send zero length OP_RW pkt */
> > > if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > return pkt_len;
> > > ...
> > > }
> > >
> > > then it queues the packet for the TX worker calling .send_pkt()
> > > [vhost_transport_send_pkt() in the vhost_transport case]
> > >
> > > The main function executed by the TX worker is
> > > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > > and it tries to copy the packet (up to 4K) on it. If the buffer
> > > allocated from the guest will be smaller then 4K, I think here it will
> > > be discarded with an error:
> > >
>
> I'm adding more lines to explain better.
>
> > > static void
> > > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > > struct vhost_virtqueue *vq)
> > > {
> ...
>
> head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> &out, &in, NULL, NULL);
>
> ...
>
> len = iov_length(&vq->iov[out], in);
> iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
>
> nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> if (nbytes != sizeof(pkt->hdr)) {
> virtio_transport_free_pkt(pkt);
> vq_err(vq, "Faulted on copying pkt hdr\n");
> break;
> }
>
> > > ...
> > > nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> >
> > isn't pck len the actual length though?
> >
>
> It is the length of the packet that we are copying in the guest RX
> buffers pointed by the iov_iter. The guest allocates an iovec with 2
> buffers, one for the header and one for the payload (4KB).

BTW at the moment that forces another kmalloc within virtio core. Maybe
vsock needs a flag to skip allocation in this case. Worth benchmarking.
See virtqueue_use_indirect which just does total_sg > 1.

>
> > > if (nbytes != pkt->len) {
> > > virtio_transport_free_pkt(pkt);
> > > vq_err(vq, "Faulted on copying pkt buf\n");
> > > break;
> > > }
> > > ...
> > > }
> > >
> > >
> > > This series changes this behavior since now we will split the packet in
> > > vhost_transport_do_send_pkt() depending on the buffer found in the
> > > virtqueue.
> > >
> > > We didn't change the buffer size in this series, so we still backward
> > > compatible, but if we will use buffers smaller than 4K, we should
> > > encounter the error described above.

So that's an implementation bug then? It made an assumption
of a 4K sized buffer? Or even PAGE_SIZE sized buffer?


> > >
> > > How do you suggest we proceed if we want to change the buffer size?
> > > Maybe adding a feature to "support any buffer size"?
> > >
> > > Thanks,
> > > Stefano
> >
> >
>
> --

2019-09-01 10:27:38

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Sun, Sep 01, 2019 at 04:26:19AM -0400, Michael S. Tsirkin wrote:
> On Thu, Aug 01, 2019 at 03:36:16PM +0200, Stefano Garzarella wrote:
> > On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > > > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> > > >
> > > > (...)
> > > >
> > > > > >
> > > > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > > > of 4K, there might be issues.
> > > > >
> > > > > Shouldn't be if they are following the spec. If not let's fix
> > > > > the broken parts.
> > > > >
> > > > > >
> > > > > > Maybe it is the time to add add 'features' to virtio-vsock device.
> > > > > >
> > > > > > Thanks,
> > > > > > Stefano
> > > > >
> > > > > Why would a remote care about buffer sizes?
> > > > >
> > > > > Let's first see what the issues are. If they exist
> > > > > we can either fix the bugs, or code the bug as a feature in spec.
> > > > >
> > > >
> > > > The vhost_transport '.stream_enqueue' callback
> > > > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > > > passing the user message. This function allocates a new packet, copying
> > > > the user message, but (before this series) it limits the packet size to
> > > > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> > > >
> > > > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > struct virtio_vsock_pkt_info *info)
> > > > {
> > > > ...
> > > > /* we can send less than pkt_len bytes */
> > > > if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > > pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > > >
> > > > /* virtio_transport_get_credit might return less than pkt_len credit */
> > > > pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > >
> > > > /* Do not send zero length OP_RW pkt */
> > > > if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > > return pkt_len;
> > > > ...
> > > > }
> > > >
> > > > then it queues the packet for the TX worker calling .send_pkt()
> > > > [vhost_transport_send_pkt() in the vhost_transport case]
> > > >
> > > > The main function executed by the TX worker is
> > > > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > > > and it tries to copy the packet (up to 4K) on it. If the buffer
> > > > allocated from the guest will be smaller then 4K, I think here it will
> > > > be discarded with an error:
> > > >
> >
> > I'm adding more lines to explain better.
> >
> > > > static void
> > > > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > > > struct vhost_virtqueue *vq)
> > > > {
> > ...
> >
> > head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> > &out, &in, NULL, NULL);
> >
> > ...
> >
> > len = iov_length(&vq->iov[out], in);
> > iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> >
> > nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> > if (nbytes != sizeof(pkt->hdr)) {
> > virtio_transport_free_pkt(pkt);
> > vq_err(vq, "Faulted on copying pkt hdr\n");
> > break;
> > }
> >
> > > > ...
> > > > nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> > >
> > > isn't pck len the actual length though?
> > >
> >
> > It is the length of the packet that we are copying in the guest RX
> > buffers pointed by the iov_iter. The guest allocates an iovec with 2
> > buffers, one for the header and one for the payload (4KB).
>
> BTW at the moment that forces another kmalloc within virtio core. Maybe
> vsock needs a flag to skip allocation in this case. Worth benchmarking.
> See virtqueue_use_indirect which just does total_sg > 1.
>
> >
> > > > if (nbytes != pkt->len) {
> > > > virtio_transport_free_pkt(pkt);
> > > > vq_err(vq, "Faulted on copying pkt buf\n");
> > > > break;
> > > > }
> > > > ...
> > > > }
> > > >
> > > >
> > > > This series changes this behavior since now we will split the packet in
> > > > vhost_transport_do_send_pkt() depending on the buffer found in the
> > > > virtqueue.
> > > >
> > > > We didn't change the buffer size in this series, so we still backward
> > > > compatible, but if we will use buffers smaller than 4K, we should
> > > > encounter the error described above.
>
> So that's an implementation bug then? It made an assumption
> of a 4K sized buffer? Or even PAGE_SIZE sized buffer?

Assuming we miss nothing and buffers < 4K are broken,
I think we need to add this to the spec, possibly with
a feature bit to relax the requirement that all buffers
are at least 4k in size.
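
For example (the feature name and bit below are purely hypothetical,
only to illustrate the negotiation; the real bit would have to be
assigned in the spec):

/* Hypothetical feature bit: device accepts RX buffers of any size. */
#define VIRTIO_VSOCK_F_ANY_BUF_LEN  1

/* Driver side (guest): only post RX buffers smaller than 4K if the
 * device negotiated the feature.
 */
if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_ANY_BUF_LEN))
        buf_len = small_buf_len;
else
        buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;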

>
> > > >
> > > > How do you suggest we proceed if we want to change the buffer size?
> > > > Maybe adding a feature to "support any buffer size"?
> > > >
> > > > Thanks,
> > > > Stefano
> > >
> > >
> >
> > --

2019-09-02 09:37:23

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Sun, Sep 01, 2019 at 02:56:44AM -0400, Michael S. Tsirkin wrote:
> On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > and pushed to the guest using the vring, are directly queued in
> > > > a per-socket list. These buffers are preallocated by the guest
> > > > with a fixed size (4 KB).
> > > >
> > > > The maximum amount of memory used by each socket should be
> > > > controlled by the credit mechanism.
> > > > The default credit available per-socket is 256 KB, but if we use
> > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > to avoid starvation of other sockets.
> > > >
> > > > This patch mitigates this issue copying the payload of small
> > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > order to avoid wasting memory.
> > > >
> > > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > >
> > > This is good enough for net-next, but for net I think we
> > > should figure out how to address the issue completely.
> > > Can we make the accounting precise? What happens to
> > > performance if we do?
> > >
> >
> > Since I'm back from holidays, I'm restarting this thread to figure out
> > how to address the issue completely.
> >
> > I did a better analysis of the credit mechanism that we implemented in
> > virtio-vsock to get a clearer view and I'd share it with you:
> >
> > This issue affect only the "host->guest" path. In this case, when the
> > host wants to send a packet to the guest, it uses a "free" buffer
> > allocated by the guest (4KB).
> > The "free" buffers available for the host are shared between all
> > sockets, instead, the credit mechanism is per-socket, I think to
> > avoid the starvation of others sockets.
> > The guests re-fill the "free" queue when the available buffers are
> > less than half.
> >
> > Each peer have these variables in the per-socket state:
> > /* local vars */
> > buf_alloc /* max bytes usable by this socket
> > [exposed to the other peer] */
> > fwd_cnt /* increased when RX packet is consumed by the
> > user space [exposed to the other peer] */
> > tx_cnt /* increased when TX packet is sent to the other peer */
> >
> > /* remote vars */
> > peer_buf_alloc /* peer's buf_alloc */
> > peer_fwd_cnt /* peer's fwd_cnt */
> >
> > When a peer sends a packet, it increases the 'tx_cnt'; when the
> > receiver consumes the packet (copy it to the user-space buffer), it
> > increases the 'fwd_cnt'.
> > Note: increments are made considering the payload length and not the
> > buffer length.
> >
> > The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> > all packet headers or with an explicit CREDIT_UPDATE packet.
> >
> > The local 'buf_alloc' value can be modified by the user space using
> > setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> >
> > Before to send a packet, the peer checks the space available:
> > credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> > and it will send up to credit_available bytes to the other peer.
> >
> > Possible solutions considering Michael's advice:
> > 1. Use the buffer length instead of the payload length when we increment
> > the counters:
> > - This approach will account precisely the memory used per socket.
> > - This requires changes in both guest and host.
> > - It is not compatible with old drivers, so a feature should be negotiated.
> > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
> > the socket queue but not used. (e.g. 256 byte used on 4K available in
> > the buffer)
> > - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> > - This should be compatible also with old drivers.
> >
> > Maybe the second is less invasive, but will it be too tricky?
> > Any other advice or suggestions?
> >
> > Thanks in advance,
> > Stefano
>
> OK let me try to clarify. The idea is this:
>
> Let's say we queue a buffer of 4K, and we copy if len < 128 bytes. This
> means that in the worst case (128 byte packets), each byte of credit in
> the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> to also account for the virtio_vsock_pkt since I think it's kept around
> until userspace consumes it.
>
> Thus given X buf alloc allowed in the socket, we should publish X/16
> credits to the other side. This will ensure the other side does not send
> more than X/16 bytes for a given socket and thus we won't need to
> allocate more than X bytes to hold the data.
>
> We can play with the copy break value to tweak this.

This seems like a reasonable solution. Hopefully the benchmark results
will come out okay too.

Stefan



2019-09-02 09:59:56

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Sun, Sep 01, 2019 at 06:17:58AM -0400, Michael S. Tsirkin wrote:
> On Sun, Sep 01, 2019 at 04:26:19AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Aug 01, 2019 at 03:36:16PM +0200, Stefano Garzarella wrote:
> > > On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > > > > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> > > > >
> > > > > (...)
> > > > >
> > > > > > >
> > > > > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > > > > of 4K, there might be issues.
> > > > > >
> > > > > > Shouldn't be if they are following the spec. If not let's fix
> > > > > > the broken parts.
> > > > > >
> > > > > > >
> > > > > > > Maybe it is the time to add add 'features' to virtio-vsock device.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Stefano
> > > > > >
> > > > > > Why would a remote care about buffer sizes?
> > > > > >
> > > > > > Let's first see what the issues are. If they exist
> > > > > > we can either fix the bugs, or code the bug as a feature in spec.
> > > > > >
> > > > >
> > > > > The vhost_transport '.stream_enqueue' callback
> > > > > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > > > > passing the user message. This function allocates a new packet, copying
> > > > > the user message, but (before this series) it limits the packet size to
> > > > > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> > > > >
> > > > > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > > struct virtio_vsock_pkt_info *info)
> > > > > {
> > > > > ...
> > > > > /* we can send less than pkt_len bytes */
> > > > > if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > > > pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > > > >
> > > > > /* virtio_transport_get_credit might return less than pkt_len credit */
> > > > > pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > > >
> > > > > /* Do not send zero length OP_RW pkt */
> > > > > if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > > > return pkt_len;
> > > > > ...
> > > > > }
> > > > >
> > > > > then it queues the packet for the TX worker calling .send_pkt()
> > > > > [vhost_transport_send_pkt() in the vhost_transport case]
> > > > >
> > > > > The main function executed by the TX worker is
> > > > > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > > > > and it tries to copy the packet (up to 4K) on it. If the buffer
> > > > > allocated from the guest will be smaller then 4K, I think here it will
> > > > > be discarded with an error:
> > > > >
> > >
> > > I'm adding more lines to explain better.
> > >
> > > > > static void
> > > > > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > > > > struct vhost_virtqueue *vq)
> > > > > {
> > > ...
> > >
> > > head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> > > &out, &in, NULL, NULL);
> > >
> > > ...
> > >
> > > len = iov_length(&vq->iov[out], in);
> > > iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> > >
> > > nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> > > if (nbytes != sizeof(pkt->hdr)) {
> > > virtio_transport_free_pkt(pkt);
> > > vq_err(vq, "Faulted on copying pkt hdr\n");
> > > break;
> > > }
> > >
> > > > > ...
> > > > > nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> > > >
> > > > isn't pck len the actual length though?
> > > >
> > >
> > > It is the length of the packet that we are copying in the guest RX
> > > buffers pointed by the iov_iter. The guest allocates an iovec with 2
> > > buffers, one for the header and one for the payload (4KB).
> >
> > BTW at the moment that forces another kmalloc within virtio core. Maybe
> > vsock needs a flag to skip allocation in this case. Worth benchmarking.
> > See virtqueue_use_indirect which just does total_sg > 1.

Okay, I'll take a look at virtqueue_use_indirect and I'll do some
benchmarking.

> >
> > >
> > > > > if (nbytes != pkt->len) {
> > > > > virtio_transport_free_pkt(pkt);
> > > > > vq_err(vq, "Faulted on copying pkt buf\n");
> > > > > break;
> > > > > }
> > > > > ...
> > > > > }
> > > > >
> > > > >
> > > > > This series changes this behavior since now we will split the packet in
> > > > > vhost_transport_do_send_pkt() depending on the buffer found in the
> > > > > virtqueue.
> > > > >
> > > > > We didn't change the buffer size in this series, so we still backward
> > > > > compatible, but if we will use buffers smaller than 4K, we should
> > > > > encounter the error described above.
> >
> > So that's an implementation bug then? It made an assumption
> > of a 4K sized buffer? Or even PAGE_SIZE sized buffer?

Yes, I think it made an assumption and it used this macro as a limit:

include/linux/virtio_vsock.h:13:
#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 4)

>
> Assuming we miss nothing and buffers < 4K are broken,
> I think we need to add this to the spec, possibly with
> a feature bit to relax the requirement that all buffers
> are at least 4k in size.
>

Okay, should I send a proposal to [email protected]?

Thanks,
Stefano

2019-09-02 10:13:21

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Sep 02, 2019 at 09:39:12AM +0100, Stefan Hajnoczi wrote:
> On Sun, Sep 01, 2019 at 02:56:44AM -0400, Michael S. Tsirkin wrote:
> >
> > OK let me try to clarify. The idea is this:
> >
> > Let's say we queue a buffer of 4K, and we copy if len < 128 bytes. This
> > means that in the worst case (128 byte packets), each byte of credit in
> > the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> > to also account for the virtio_vsock_pkt since I think it's kept around
> > until userspace consumes it.
> >
> > Thus given X buf alloc allowed in the socket, we should publish X/16
> > credits to the other side. This will ensure the other side does not send
> > more than X/16 bytes for a given socket and thus we won't need to
> > allocate more than X bytes to hold the data.
> >
> > We can play with the copy break value to tweak this.

Thanks Michael, now it is perfectly clear. It seems an excellent solution and
easy to implement. I'll work on that.

>
> This seems like a reasonable solution. Hopefully the benchmark results
> will come out okay too.

Yes, as Michael suggested, I'll play with the copy-break value to see
how the benchmark is affected.
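
For the experiments, something like this could make the threshold easy
to vary (hypothetical, not part of the series; the parameter name is
made up):

/* Hypothetical knob for benchmarking different copy-break thresholds
 * without rebuilding: expose the 128-byte limit from patch 1 as a
 * module parameter.
 */
static unsigned int copy_break = 128;
module_param(copy_break, uint, 0644);
MODULE_PARM_DESC(copy_break, "Copy packets shorter than this into the last queued buffer");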

Thank you very much,
Stefano

2019-09-02 15:36:21

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> On Sun, Sep 01, 2019 at 06:17:58AM -0400, Michael S. Tsirkin wrote:
> > On Sun, Sep 01, 2019 at 04:26:19AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Aug 01, 2019 at 03:36:16PM +0200, Stefano Garzarella wrote:
> > > > On Thu, Aug 01, 2019 at 09:21:15AM -0400, Michael S. Tsirkin wrote:
> > > > > On Thu, Aug 01, 2019 at 12:47:54PM +0200, Stefano Garzarella wrote:
> > > > > > On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:
> > > > > >
> > > > > > (...)
> > > > > >
> > > > > > > >
> > > > > > > > The problem here is the compatibility. Before this series virtio-vsock
> > > > > > > > and vhost-vsock modules had the RX buffer size hard-coded
> > > > > > > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller
> > > > > > > > of 4K, there might be issues.
> > > > > > >
> > > > > > > Shouldn't be if they are following the spec. If not let's fix
> > > > > > > the broken parts.
> > > > > > >
> > > > > > > >
> > > > > > > > Maybe it is the time to add add 'features' to virtio-vsock device.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Stefano
> > > > > > >
> > > > > > > Why would a remote care about buffer sizes?
> > > > > > >
> > > > > > > Let's first see what the issues are. If they exist
> > > > > > > we can either fix the bugs, or code the bug as a feature in spec.
> > > > > > >
> > > > > >
> > > > > > The vhost_transport '.stream_enqueue' callback
> > > > > > [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(),
> > > > > > passing the user message. This function allocates a new packet, copying
> > > > > > the user message, but (before this series) it limits the packet size to
> > > > > > the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K):
> > > > > >
> > > > > > static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > > > struct virtio_vsock_pkt_info *info)
> > > > > > {
> > > > > > ...
> > > > > > /* we can send less than pkt_len bytes */
> > > > > > if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> > > > > > pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > > > > >
> > > > > > /* virtio_transport_get_credit might return less than pkt_len credit */
> > > > > > pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > > > >
> > > > > > /* Do not send zero length OP_RW pkt */
> > > > > > if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > > > > return pkt_len;
> > > > > > ...
> > > > > > }
> > > > > >
> > > > > > then it queues the packet for the TX worker calling .send_pkt()
> > > > > > [vhost_transport_send_pkt() in the vhost_transport case]
> > > > > >
> > > > > > The main function executed by the TX worker is
> > > > > > vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue
> > > > > > and it tries to copy the packet (up to 4K) on it. If the buffer
> > > > > > allocated from the guest will be smaller then 4K, I think here it will
> > > > > > be discarded with an error:
> > > > > >
> > > >
> > > > I'm adding more lines to explain better.
> > > >
> > > > > > static void
> > > > > > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > > > > > struct vhost_virtqueue *vq)
> > > > > > {
> > > > ...
> > > >
> > > > head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> > > > &out, &in, NULL, NULL);
> > > >
> > > > ...
> > > >
> > > > len = iov_length(&vq->iov[out], in);
> > > > iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
> > > >
> > > > nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> > > > if (nbytes != sizeof(pkt->hdr)) {
> > > > virtio_transport_free_pkt(pkt);
> > > > vq_err(vq, "Faulted on copying pkt hdr\n");
> > > > break;
> > > > }
> > > >
> > > > > > ...
> > > > > > nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
> > > > >
> > > > > isn't pck len the actual length though?
> > > > >
> > > >
> > > > It is the length of the packet that we are copying in the guest RX
> > > > buffers pointed by the iov_iter. The guest allocates an iovec with 2
> > > > buffers, one for the header and one for the payload (4KB).
> > >
> > > BTW at the moment that forces another kmalloc within virtio core. Maybe
> > > vsock needs a flag to skip allocation in this case. Worth benchmarking.
> > > See virtqueue_use_indirect which just does total_sg > 1.
>
> Okay, I'll take a look at virtqueue_use_indirect and I'll do some
> benchmarking.
>
> > >
> > > >
> > > > > > if (nbytes != pkt->len) {
> > > > > > virtio_transport_free_pkt(pkt);
> > > > > > vq_err(vq, "Faulted on copying pkt buf\n");
> > > > > > break;
> > > > > > }
> > > > > > ...
> > > > > > }
> > > > > >
> > > > > >
> > > > > > This series changes this behavior since now we will split the packet in
> > > > > > vhost_transport_do_send_pkt() depending on the buffer found in the
> > > > > > virtqueue.
> > > > > >
> > > > > > We didn't change the buffer size in this series, so we still backward
> > > > > > compatible, but if we will use buffers smaller than 4K, we should
> > > > > > encounter the error described above.
> > >
> > > So that's an implementation bug then? It made an assumption
> > > of a 4K sized buffer? Or even PAGE_SIZE sized buffer?
>
> Yes, I think it made an assumption and it used this macro as a limit:
>
> include/linux/virtio_vsock.h:13:
> #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 4)
>
> >
> > Assuming we miss nothing and buffers < 4K are broken,
> > I think we need to add this to the spec, possibly with
> > a feature bit to relax the requirement that all buffers
> > are at least 4k in size.
> >
>
> Okay, should I send a proposal to [email protected]?
>
> Thanks,
> Stefano

virtio-comment is more appropriate for this.

--
MST

2019-09-03 04:40:24

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] vsock/virtio: reduce credit update messages

On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:
> In order to reduce the number of credit update messages,
> we send them only when the space available seen by the
> transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
>
> Signed-off-by: Stefano Garzarella <[email protected]>
> ---
> include/linux/virtio_vsock.h | 1 +
> net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
> 2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 7d973903f52e..49fc9d20bc43 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -41,6 +41,7 @@ struct virtio_vsock_sock {
>
> /* Protected by rx_lock */
> u32 fwd_cnt;
> + u32 last_fwd_cnt;
> u32 rx_bytes;
> struct list_head rx_queue;
> };
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 095221f94786..a85559d4d974 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> {
> spin_lock_bh(&vvs->tx_lock);
> + vvs->last_fwd_cnt = vvs->fwd_cnt;
> pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> spin_unlock_bh(&vvs->tx_lock);
> @@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> struct virtio_vsock_sock *vvs = vsk->trans;
> struct virtio_vsock_pkt *pkt;
> size_t bytes, total = 0;
> + u32 free_space;
> int err = -EFAULT;
>
> spin_lock_bh(&vvs->rx_lock);
> @@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> virtio_transport_free_pkt(pkt);
> }
> }
> +
> + free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> +
> spin_unlock_bh(&vvs->rx_lock);
>
> - /* Send a credit pkt to peer */
> - virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
> - NULL);
> + /* We send a credit update only when the space available seen
> + * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE

This is just repeating what code does though.
Please include the *reason* for the condition.
E.g. here's a better comment:

/* To reduce number of credit update messages,
 * don't update credits as long as lots of space is available.
 * Note: the limit chosen here is arbitrary. Setting the limit
 * too high causes extra messages. Too low causes transmitter
 * stalls. As stalls are in theory more expensive than extra
 * messages, we set the limit to a high value. TODO: experiment
 * with different values.
 */


> + */
> + if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> + virtio_transport_send_credit_update(vsk,
> + VIRTIO_VSOCK_TYPE_STREAM,
> + NULL);
> + }
>
> return total;
>
> --
> 2.20.1

2019-09-03 04:41:51

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> >
> > Assuming we miss nothing and buffers < 4K are broken,
> > I think we need to add this to the spec, possibly with
> > a feature bit to relax the requirement that all buffers
> > are at least 4k in size.
> >
>
> Okay, should I send a proposal to [email protected]?

How about we also fix the bug for now?

--
MST

2019-09-03 07:32:33

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] vsock/virtio: reduce credit update messages

On Tue, Sep 03, 2019 at 12:38:02AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:
> > In order to reduce the number of credit update messages,
> > we send them only when the space available seen by the
> > transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
> >
> > Signed-off-by: Stefano Garzarella <[email protected]>
> > ---
> > include/linux/virtio_vsock.h | 1 +
> > net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
> > 2 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 7d973903f52e..49fc9d20bc43 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -41,6 +41,7 @@ struct virtio_vsock_sock {
> >
> > /* Protected by rx_lock */
> > u32 fwd_cnt;
> > + u32 last_fwd_cnt;
> > u32 rx_bytes;
> > struct list_head rx_queue;
> > };
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 095221f94786..a85559d4d974 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> > void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> > {
> > spin_lock_bh(&vvs->tx_lock);
> > + vvs->last_fwd_cnt = vvs->fwd_cnt;
> > pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> > pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> > spin_unlock_bh(&vvs->tx_lock);
> > @@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > struct virtio_vsock_sock *vvs = vsk->trans;
> > struct virtio_vsock_pkt *pkt;
> > size_t bytes, total = 0;
> > + u32 free_space;
> > int err = -EFAULT;
> >
> > spin_lock_bh(&vvs->rx_lock);
> > @@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > virtio_transport_free_pkt(pkt);
> > }
> > }
> > +
> > + free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> > +
> > spin_unlock_bh(&vvs->rx_lock);
> >
> > - /* Send a credit pkt to peer */
> > - virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
> > - NULL);
> > + /* We send a credit update only when the space available seen
> > + * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
>
> This is just repeating what code does though.
> Please include the *reason* for the condition.
> E.g. here's a better comment:
>
> /* To reduce number of credit update messages,
> * don't update credits as long as lots of space is available.
> * Note: the limit chosen here is arbitrary. Setting the limit
> * too high causes extra messages. Too low causes transmitter
> * stalls. As stalls are in theory more expensive than extra
> * messages, we set the limit to a high value. TODO: experiment
> * with different values.
> */
>

Yes, that is better, sorry about that. I'll try to avoid comments that just
restate the code and instead explain the reason for the change.

Since this patch is already queued in net-next, should I send another
patch to fix the comment?

Thanks,
Stefano

2019-09-03 07:41:14

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] vsock/virtio: reduce credit update messages

On Tue, Sep 03, 2019 at 09:31:20AM +0200, Stefano Garzarella wrote:
> On Tue, Sep 03, 2019 at 12:38:02AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:
> > > In order to reduce the number of credit update messages,
> > > we send them only when the space available seen by the
> > > transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
> > >
> > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > ---
> > > include/linux/virtio_vsock.h | 1 +
> > > net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
> > > 2 files changed, 14 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > index 7d973903f52e..49fc9d20bc43 100644
> > > --- a/include/linux/virtio_vsock.h
> > > +++ b/include/linux/virtio_vsock.h
> > > @@ -41,6 +41,7 @@ struct virtio_vsock_sock {
> > >
> > > /* Protected by rx_lock */
> > > u32 fwd_cnt;
> > > + u32 last_fwd_cnt;
> > > u32 rx_bytes;
> > > struct list_head rx_queue;
> > > };
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index 095221f94786..a85559d4d974 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> > > void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> > > {
> > > spin_lock_bh(&vvs->tx_lock);
> > > + vvs->last_fwd_cnt = vvs->fwd_cnt;
> > > pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> > > pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> > > spin_unlock_bh(&vvs->tx_lock);
> > > @@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > > struct virtio_vsock_sock *vvs = vsk->trans;
> > > struct virtio_vsock_pkt *pkt;
> > > size_t bytes, total = 0;
> > > + u32 free_space;
> > > int err = -EFAULT;
> > >
> > > spin_lock_bh(&vvs->rx_lock);
> > > @@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> > > virtio_transport_free_pkt(pkt);
> > > }
> > > }
> > > +
> > > + free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> > > +
> > > spin_unlock_bh(&vvs->rx_lock);
> > >
> > > - /* Send a credit pkt to peer */
> > > - virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
> > > - NULL);
> > > + /* We send a credit update only when the space available seen
> > > + * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
> >
> > This is just repeating what code does though.
> > Please include the *reason* for the condition.
> > E.g. here's a better comment:
> >
> > /* To reduce number of credit update messages,
> > * don't update credits as long as lots of space is available.
> > * Note: the limit chosen here is arbitrary. Setting the limit
> > * too high causes extra messages. Too low causes transmitter
> > * stalls. As stalls are in theory more expensive than extra
> > * messages, we set the limit to a high value. TODO: experiment
> > * with different values.
> > */
> >
>
> Yes, it is better, sorry for that. I'll try to avoid unnecessary comments,
> explaining the reason for certain changes.
>
> Since this patch is already queued in net-next, should I send another
> patch to fix the comment?
>
> Thanks,
> Stefano

I just sent a patch like that, pls ack it.

--
MST

2019-09-03 07:48:07

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Tue, Sep 03, 2019 at 12:39:19AM -0400, Michael S. Tsirkin wrote:
> On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> > >
> > > Assuming we miss nothing and buffers < 4K are broken,
> > > I think we need to add this to the spec, possibly with
> > > a feature bit to relax the requirement that all buffers
> > > are at least 4k in size.
> > >
> >
> > Okay, should I send a proposal to [email protected]?
>
> How about we also fix the bug for now?

This series unintentionally fixes the bug, because it introduces a way to
split packets across buffers depending on the buffer size ([PATCH 4/5]
vhost/vsock: split packets to send using multiple buffers) and removes the
4K limit on the packet size ([PATCH 5/5] vsock/virtio: change the maximum
packet size allowed).
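
For illustration only, a rough sketch of the packet-splitting idea
(hypothetical helper and parameter names, not the actual vhost code): the
payload is copied into as many receive buffers as needed, each chunk limited
to the size of the buffer it lands in.

#include <stddef.h>
#include <string.h>

/* Copy 'len' bytes of 'payload' into a set of fixed-size buffers, filling
 * each buffer up to its own length. Returns the number of bytes placed.
 */
static size_t split_into_buffers(const void *payload, size_t len,
				 void **bufs, const size_t *buf_lens,
				 size_t nbufs)
{
	size_t sent = 0;

	for (size_t i = 0; i < nbufs && sent < len; i++) {
		size_t chunk = len - sent;

		if (chunk > buf_lens[i])
			chunk = buf_lens[i];
		memcpy(bufs[i], (const char *)payload + sent, chunk);
		sent += chunk;
	}
	return sent;
}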

I discovered that there was a bug while we discussed memory accounting.

Do you think that is enough until we introduce the feature bit in the spec?

Thanks,
Stefano

2019-09-03 07:53:50

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Tue, Sep 03, 2019 at 09:45:54AM +0200, Stefano Garzarella wrote:
> On Tue, Sep 03, 2019 at 12:39:19AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> > > >
> > > > Assuming we miss nothing and buffers < 4K are broken,
> > > > I think we need to add this to the spec, possibly with
> > > > a feature bit to relax the requirement that all buffers
> > > > are at least 4k in size.
> > > >
> > >
> > > Okay, should I send a proposal to [email protected]?
> >
> > How about we also fix the bug for now?
>
> This series unintentionally fix the bug because we are introducing a way
> to split the packet depending on the buffer size ([PATCH 4/5] vhost/vsock:
> split packets to send using multiple buffers) and we removed the limit
> to 4K buffers ([PATCH 5/5] vsock/virtio: change the maximum packet size
> allowed).
>
> I discovered that there was a bug while we discussed memory accounting.
>
> Do you think it's enough while we introduce the feature bit in the spec?
>
> Thanks,
> Stefano

Well, locking is also broken (patch 3/5). It seems that 3/5 and 4/5 work
by themselves, right? So how about we ask Dave to send these to stable?
Also, what about 1/5? Is it also needed for stable?


--
MST

2019-09-03 08:02:04

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Tue, Sep 03, 2019 at 03:52:24AM -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 03, 2019 at 09:45:54AM +0200, Stefano Garzarella wrote:
> > On Tue, Sep 03, 2019 at 12:39:19AM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Sep 02, 2019 at 11:57:23AM +0200, Stefano Garzarella wrote:
> > > > >
> > > > > Assuming we miss nothing and buffers < 4K are broken,
> > > > > I think we need to add this to the spec, possibly with
> > > > > a feature bit to relax the requirement that all buffers
> > > > > are at least 4k in size.
> > > > >
> > > >
> > > > Okay, should I send a proposal to [email protected]?
> > >
> > > How about we also fix the bug for now?
> >
> > This series unintentionally fix the bug because we are introducing a way
> > to split the packet depending on the buffer size ([PATCH 4/5] vhost/vsock:
> > split packets to send using multiple buffers) and we removed the limit
> > to 4K buffers ([PATCH 5/5] vsock/virtio: change the maximum packet size
> > allowed).
> >
> > I discovered that there was a bug while we discussed memory accounting.
> >
> > Do you think it's enough while we introduce the feature bit in the spec?
> >
> > Thanks,
> > Stefano
>
> Well locking is also broken (patch 3/5). It seems that 3/5 and 4/5 work
> by themselves, right? So how about we ask Dave to send these to stable?

Yes, they work by themselves, and I agree that they should be sent to stable.

> Also, how about 1/5? Also needed for stable?

I think so: without this patch, flooding the guest with 1-byte packets can
consume ~1 GB of guest memory per socket.
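
For context, a quick sketch of the arithmetic behind that figure, using the
defaults from the patch description (256 KB of credit per socket, 4 KB
preallocated guest buffers, worst case of 1 payload byte per packet):

#include <stdio.h>

int main(void)
{
	unsigned long buf_alloc = 256UL * 1024; /* default credit per socket */
	unsigned long buf_size = 4UL * 1024;    /* preallocated guest buffer */
	unsigned long payload = 1;              /* worst case: 1 byte per packet */

	unsigned long pkts = buf_alloc / payload; /* 262144 packets queued */
	unsigned long mem = pkts * buf_size;      /* guest memory they pin */

	printf("%lu packets, ~%lu MB per socket\n", pkts, mem >> 20);
	return 0;
}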

Thanks,
Stefano

2019-09-03 08:05:43

by Michael S. Tsirkin

[permalink] [raw]
Subject: request for stable (was Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput)

Patches 1, 3 and 4 are needed for stable.
Dave, could you queue them there please?

On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered an huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.
>
> v4:
> - rebased all patches on current master (conflicts is Patch 4)
> - Patch 1: added Stefan's R-b
> - Patch 3: removed lock when buf_alloc is written [David];
> moved this patch after "vsock/virtio: reduce credit update messages"
> to make it clearer
> - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some
> conflicts
>
> v3: https://patchwork.kernel.org/cover/10970145
>
> v2: https://patchwork.kernel.org/cover/10938743
>
> v1: https://patchwork.kernel.org/cover/10885431
>
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support. As Micheal suggested in the v1, I booted host and guest with 'nosmap'.
>
> A brief description of patches:
> - Patches 1: limit the memory usage with an extra copy for small packets
> - Patches 2+3: reduce the number of credit update messages sent to the
> transmitter
> - Patches 4+5: allow the host to split packets on multiple buffers and use
> VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
>
> host -> guest [Gbps]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.032 0.030 0.048 0.051
> 64 0.061 0.059 0.108 0.117
> 128 0.122 0.112 0.227 0.234
> 256 0.244 0.241 0.418 0.415
> 512 0.459 0.466 0.847 0.865
> 1K 0.927 0.919 1.657 1.641
> 2K 1.884 1.813 3.262 3.269
> 4K 3.378 3.326 6.044 6.195
> 8K 5.637 5.676 10.141 11.287
> 16K 8.250 8.402 15.976 16.736
> 32K 13.327 13.204 19.013 20.515
> 64K 21.241 21.341 20.973 21.879
> 128K 21.851 22.354 21.816 23.203
> 256K 21.408 21.693 21.846 24.088
> 512K 21.600 21.899 21.921 24.106
>
> guest -> host [Gbps]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.045 0.046 0.057 0.057
> 64 0.089 0.091 0.103 0.104
> 128 0.170 0.179 0.192 0.200
> 256 0.364 0.351 0.361 0.379
> 512 0.709 0.699 0.731 0.790
> 1K 1.399 1.407 1.395 1.427
> 2K 2.670 2.684 2.745 2.835
> 4K 5.171 5.199 5.305 5.451
> 8K 8.442 8.500 10.083 9.941
> 16K 12.305 12.259 13.519 15.385
> 32K 11.418 11.150 11.988 24.680
> 64K 10.778 10.659 11.589 35.273
> 128K 10.421 10.339 10.939 40.338
> 256K 10.300 9.719 10.508 36.562
> 512K 9.833 9.808 10.612 35.979
>
> As Stefan suggested in the v1, I measured also the efficiency in this way:
> efficiency = Mbps / (%CPU_Host + %CPU_Guest)
>
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free from iperf3 and could be an indication.
>
> host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.35 0.45 0.79 1.02
> 64 0.56 0.80 1.41 1.54
> 128 1.11 1.52 3.03 3.12
> 256 2.20 2.16 5.44 5.58
> 512 4.17 4.18 10.96 11.46
> 1K 8.30 8.26 20.99 20.89
> 2K 16.82 16.31 39.76 39.73
> 4K 30.89 30.79 74.07 75.73
> 8K 53.74 54.49 124.24 148.91
> 16K 80.68 83.63 200.21 232.79
> 32K 132.27 132.52 260.81 357.07
> 64K 229.82 230.40 300.19 444.18
> 128K 332.60 329.78 331.51 492.28
> 256K 331.06 337.22 339.59 511.59
> 512K 335.58 328.50 331.56 504.56
>
> guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size before opt p 1 p 2+3 p 4+5
>
> 32 0.43 0.43 0.53 0.56
> 64 0.85 0.86 1.04 1.10
> 128 1.63 1.71 2.07 2.13
> 256 3.48 3.35 4.02 4.22
> 512 6.80 6.67 7.97 8.63
> 1K 13.32 13.31 15.72 15.94
> 2K 25.79 25.92 30.84 30.98
> 4K 50.37 50.48 58.79 59.69
> 8K 95.90 96.15 107.04 110.33
> 16K 145.80 145.43 143.97 174.70
> 32K 147.06 144.74 146.02 282.48
> 64K 145.25 143.99 141.62 406.40
> 128K 149.34 146.96 147.49 489.34
> 256K 156.35 149.81 152.21 536.37
> 512K 151.65 150.74 151.52 519.93
>
> [1] https://github.com/stefano-garzarella/iperf/
>
> Stefano Garzarella (5):
> vsock/virtio: limit the memory used per-socket
> vsock/virtio: reduce credit update messages
> vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
> vhost/vsock: split packets to send using multiple buffers
> vsock/virtio: change the maximum packet size allowed
>
> drivers/vhost/vsock.c | 68 ++++++++++++-----
> include/linux/virtio_vsock.h | 4 +-
> net/vmw_vsock/virtio_transport.c | 1 +
> net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
> 4 files changed, 134 insertions(+), 38 deletions(-)
>
> --
> 2.20.1

2019-10-11 13:42:05

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin <[email protected]> wrote:
> On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > and pushed to the guest using the vring, are directly queued in
> > > > a per-socket list. These buffers are preallocated by the guest
> > > > with a fixed size (4 KB).
> > > >
> > > > The maximum amount of memory used by each socket should be
> > > > controlled by the credit mechanism.
> > > > The default credit available per-socket is 256 KB, but if we use
> > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > to avoid starvation of other sockets.
> > > >
> > > > This patch mitigates this issue copying the payload of small
> > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > order to avoid wasting memory.
> > > >
> > > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > >
> > > This is good enough for net-next, but for net I think we
> > > should figure out how to address the issue completely.
> > > Can we make the accounting precise? What happens to
> > > performance if we do?
> > >
> >
> > Since I'm back from holidays, I'm restarting this thread to figure out
> > how to address the issue completely.
> >
> > I did a better analysis of the credit mechanism that we implemented in
> > virtio-vsock to get a clearer view and I'd share it with you:
> >
> > This issue affect only the "host->guest" path. In this case, when the
> > host wants to send a packet to the guest, it uses a "free" buffer
> > allocated by the guest (4KB).
> > The "free" buffers available for the host are shared between all
> > sockets, instead, the credit mechanism is per-socket, I think to
> > avoid the starvation of others sockets.
> > The guests re-fill the "free" queue when the available buffers are
> > less than half.
> >
> > Each peer have these variables in the per-socket state:
> > /* local vars */
> > buf_alloc /* max bytes usable by this socket
> > [exposed to the other peer] */
> > fwd_cnt /* increased when RX packet is consumed by the
> > user space [exposed to the other peer] */
> > tx_cnt /* increased when TX packet is sent to the other peer */
> >
> > /* remote vars */
> > peer_buf_alloc /* peer's buf_alloc */
> > peer_fwd_cnt /* peer's fwd_cnt */
> >
> > When a peer sends a packet, it increases the 'tx_cnt'; when the
> > receiver consumes the packet (copy it to the user-space buffer), it
> > increases the 'fwd_cnt'.
> > Note: increments are made considering the payload length and not the
> > buffer length.
> >
> > The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> > all packet headers or with an explicit CREDIT_UPDATE packet.
> >
> > The local 'buf_alloc' value can be modified by the user space using
> > setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> >
> > Before to send a packet, the peer checks the space available:
> > credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> > and it will send up to credit_available bytes to the other peer.
> >
> > Possible solutions considering Michael's advice:
> > 1. Use the buffer length instead of the payload length when we increment
> > the counters:
> > - This approach will account precisely the memory used per socket.
> > - This requires changes in both guest and host.
> > - It is not compatible with old drivers, so a feature should be negotiated.
> > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
> > the socket queue but not used. (e.g. 256 byte used on 4K available in
> > the buffer)
> > - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> > - This should be compatible also with old drivers.
> >
> > Maybe the second is less invasive, but will it be too tricky?
> > Any other advice or suggestions?
> >
> > Thanks in advance,
> > Stefano
>
> OK let me try to clarify. The idea is this:
>
> Let's say we queue a buffer of 4K, and we copy if len < 128 bytes. This
> means that in the worst case (128 byte packets), each byte of credit in
> the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> to also account for the virtio_vsock_pkt since I think it's kept around
> until userspace consumes it.
>
> Thus given X buf alloc allowed in the socket, we should publish X/16
> credits to the other side. This will ensure the other side does not send
> more than X/16 bytes for a given socket and thus we won't need to
> allocate more than X bytes to hold the data.
>
> We can play with the copy break value to tweak this.
>

Hi Michael,
sorry for the long silence, but I was focused on the multi-transport work.

Before implementing your idea, I did some calculations and took a closer
look at our credit mechanism:

buf_alloc = 256 KB (default, tunable through setsockopt)
sizeof(struct virtio_vsock_pkt) = 128

- guest (we use preallocated 4 KB buffers to receive packets, copying
small packets, i.e. < 128 bytes)
worst_case = 129
buf_size = 4 KB
credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 32

credit_published = buf_alloc / credit2mem = ~8 KB
Space for just 2 full packets (4 KB each)

- host (we copy packets from the vring, allocating the space for the payload)
worst_case = 1
buf_size = 1
credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 129

credit_published = buf_alloc / credit2mem = ~2 KB
Less than a full packet (the guest can now send up to 64 KB in a single
packet, so it will be limited to 2 KB)

Current memory consumption in the worst case if the RX queue is full:
- guest
mem = (buf_alloc / worst_case) *
(buf_size + sizeof(struct virtio_vsock_pkt)) = ~8 MB

- host
mem = (buf_alloc / worst_case) *
(buf_size + sizeof(struct virtio_vsock_pkt)) = ~32 MB

I think the performance with big packets will be affected, but I still have
to measure it.
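
A small sketch of the calculations above (illustration only, using the same
assumptions: buf_alloc = 256 KB, sizeof(struct virtio_vsock_pkt) = 128, and
the worst-case/buffer-size pairs listed for guest and host):

#include <stdio.h>

static void account(const char *who, unsigned long buf_size,
		    unsigned long worst_case)
{
	const unsigned long buf_alloc = 256UL * 1024;
	const unsigned long pkt_overhead = 128; /* sizeof(struct virtio_vsock_pkt) */

	unsigned long credit2mem = (buf_size + pkt_overhead) / worst_case;
	unsigned long credit_published = buf_alloc / credit2mem;
	unsigned long worst_mem = (buf_alloc / worst_case) *
				  (buf_size + pkt_overhead);

	printf("%s: credit2mem=%lu, published=%lu bytes, worst-case mem=%lu MB\n",
	       who, credit2mem, credit_published, worst_mem >> 20);
}

int main(void)
{
	account("guest", 4096, 129); /* ~8 KB published, ~8 MB worst case */
	account("host", 1, 1);       /* ~2 KB published, ~32 MB worst case */
	return 0;
}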

Another approach that I want to explore is to play with buf_alloc
published to the peer.

One thing that's not clear to me yet is the meaning of
SO_VM_SOCKETS_BUFFER_SIZE: is it
- the max amount of memory used in the RX queue, or
- the max amount of payload bytes in the RX queue (without the overhead of
  struct virtio_vsock_pkt and the preallocated buffer)?

From the 'include/uapi/linux/vm_sockets.h':
/* Option name for STREAM socket buffer size. Use as the option name in
* setsockopt(3) or getsockopt(3) to set or get an unsigned long long that
* specifies the size of the buffer underlying a vSockets STREAM socket.
* Value is clamped to the MIN and MAX.
*/

#define SO_VM_SOCKETS_BUFFER_SIZE 0

Regardless, I think we need to limit memory consumption in some way. I'll
check the implementations of the other transports to understand this better.

I'll keep you updated!

Thanks,
Stefano

2019-10-11 14:14:16

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Fri, Oct 11, 2019 at 03:40:48PM +0200, Stefano Garzarella wrote:
> On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin <[email protected]> wrote:
> > On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring, are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > >
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > >
> > > > > This patch mitigates this issue copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > order to avoid wasting memory.
> > > > >
> > > > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > >
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > >
> > >
> > > Since I'm back from holidays, I'm restarting this thread to figure out
> > > how to address the issue completely.
> > >
> > > I did a better analysis of the credit mechanism that we implemented in
> > > virtio-vsock to get a clearer view and I'd share it with you:
> > >
> > > This issue affect only the "host->guest" path. In this case, when the
> > > host wants to send a packet to the guest, it uses a "free" buffer
> > > allocated by the guest (4KB).
> > > The "free" buffers available for the host are shared between all
> > > sockets, instead, the credit mechanism is per-socket, I think to
> > > avoid the starvation of others sockets.
> > > The guests re-fill the "free" queue when the available buffers are
> > > less than half.
> > >
> > > Each peer have these variables in the per-socket state:
> > > /* local vars */
> > > buf_alloc /* max bytes usable by this socket
> > > [exposed to the other peer] */
> > > fwd_cnt /* increased when RX packet is consumed by the
> > > user space [exposed to the other peer] */
> > > tx_cnt /* increased when TX packet is sent to the other peer */
> > >
> > > /* remote vars */
> > > peer_buf_alloc /* peer's buf_alloc */
> > > peer_fwd_cnt /* peer's fwd_cnt */
> > >
> > > When a peer sends a packet, it increases the 'tx_cnt'; when the
> > > receiver consumes the packet (copy it to the user-space buffer), it
> > > increases the 'fwd_cnt'.
> > > Note: increments are made considering the payload length and not the
> > > buffer length.
> > >
> > > The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> > > all packet headers or with an explicit CREDIT_UPDATE packet.
> > >
> > > The local 'buf_alloc' value can be modified by the user space using
> > > setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> > >
> > > Before to send a packet, the peer checks the space available:
> > > credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> > > and it will send up to credit_available bytes to the other peer.
> > >
> > > Possible solutions considering Michael's advice:
> > > 1. Use the buffer length instead of the payload length when we increment
> > > the counters:
> > > - This approach will account precisely the memory used per socket.
> > > - This requires changes in both guest and host.
> > > - It is not compatible with old drivers, so a feature should be negotiated.
> > > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
> > > the socket queue but not used. (e.g. 256 byte used on 4K available in
> > > the buffer)
> > > - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> > > - This should be compatible also with old drivers.
> > >
> > > Maybe the second is less invasive, but will it be too tricky?
> > > Any other advice or suggestions?
> > >
> > > Thanks in advance,
> > > Stefano
> >
> > OK let me try to clarify. The idea is this:
> >
> > Let's say we queue a buffer of 4K, and we copy if len < 128 bytes. This
> > means that in the worst case (128 byte packets), each byte of credit in
> > the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> > to also account for the virtio_vsock_pkt since I think it's kept around
> > until userspace consumes it.
> >
> > Thus given X buf alloc allowed in the socket, we should publish X/16
> > credits to the other side. This will ensure the other side does not send
> > more than X/16 bytes for a given socket and thus we won't need to
> > allocate more than X bytes to hold the data.
> >
> > We can play with the copy break value to tweak this.
> >
>
> Hi Michael,
> sorry for the long silence, but I focused on multi-transport.
>
> Before to implement your idea, I tried to do some calculations and
> looking better to our credit mechanism:
>
> buf_alloc = 256 KB (default, tunable through setsockopt)
> sizeof(struct virtio_vsock_pkt) = 128
>
> - guest (we use preallocated 4 KB buffers to receive packets, copying
> small packet - < 128 -)
> worst_case = 129
> buf_size = 4 KB
> credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 32

Making virtio_vsock_pkt smaller will help, btw. E.g. why keep a work struct
per packet? Do we need the header there? Etc.

>
> credit_published = buf_alloc / credit2mem = ~8 KB
> Space for just 2 full packet (4 KB)
>
> - host (we copy packets from the vring, allocating the space for the payload)
> worst_case = 1
> buf_size = 1
> credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 129
>
> credit_published = buf_alloc / credit2mem = ~2 KB
> Less than a full packet (guest now can send up to 64 KB with a single
> packet, so it will be limited to 2 KB)

1 byte in struct virtio_vsock_pkt is crazy. Can't we copy
like we do in the guest? E.g. allocate at least 128 bytes,
and then we can add data to the tail of the last packet?

> Current memory consumption in the worst case if the RX queue is full:
> - guest
> mem = (buf_alloc / worst_case) *
> (buf_size + sizeof(struct virtio_vsock_pkt) = ~8MB
>
> - host
> mem = (buf_alloc / worst_case) *
> (buf_size + sizeof(struct virtio_vsock_pkt) = ~32MB
>
> I think that the performance with big packets will be affected,
> but I still have to try.
>
> Another approach that I want to explore is to play with buf_alloc
> published to the peer.
>
> One thing that's not clear to me yet is the meaning of
> SO_VM_SOCKETS_BUFFER_SIZE:
> - max amount of memory used in the RX queue
> - max amount of payload bytes in the RX queue (without overhead of
> struct virtio_vsock_pkt + preallocated buffer)
>
> From the 'include/uapi/linux/vm_sockets.h':
> /* Option name for STREAM socket buffer size. Use as the option name in
> * setsockopt(3) or getsockopt(3) to set or get an unsigned long long that
> * specifies the size of the buffer underlying a vSockets STREAM socket.
> * Value is clamped to the MIN and MAX.
> */
>
> #define SO_VM_SOCKETS_BUFFER_SIZE 0
>
> Regardless, I think we need to limit memory consumption in some way.
> I'll check the implementation of other transports, to understand better.
>
> I'll keep you updated!
>
> Thanks,
> Stefano

2019-10-11 14:25:14

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Fri, Oct 11, 2019 at 10:11:19AM -0400, Michael S. Tsirkin wrote:
> On Fri, Oct 11, 2019 at 03:40:48PM +0200, Stefano Garzarella wrote:
> > On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin <[email protected]> wrote:
> > > On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > > and pushed to the guest using the vring, are directly queued in
> > > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > > with a fixed size (4 KB).
> > > > > >
> > > > > > The maximum amount of memory used by each socket should be
> > > > > > controlled by the credit mechanism.
> > > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > > to avoid starvation of other sockets.
> > > > > >
> > > > > > This patch mitigates this issue copying the payload of small
> > > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > > order to avoid wasting memory.
> > > > > >
> > > > > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > > >
> > > > > This is good enough for net-next, but for net I think we
> > > > > should figure out how to address the issue completely.
> > > > > Can we make the accounting precise? What happens to
> > > > > performance if we do?
> > > > >
> > > >
> > > > Since I'm back from holidays, I'm restarting this thread to figure out
> > > > how to address the issue completely.
> > > >
> > > > I did a better analysis of the credit mechanism that we implemented in
> > > > virtio-vsock to get a clearer view and I'd share it with you:
> > > >
> > > > This issue affect only the "host->guest" path. In this case, when the
> > > > host wants to send a packet to the guest, it uses a "free" buffer
> > > > allocated by the guest (4KB).
> > > > The "free" buffers available for the host are shared between all
> > > > sockets, instead, the credit mechanism is per-socket, I think to
> > > > avoid the starvation of others sockets.
> > > > The guests re-fill the "free" queue when the available buffers are
> > > > less than half.
> > > >
> > > > Each peer have these variables in the per-socket state:
> > > > /* local vars */
> > > > buf_alloc /* max bytes usable by this socket
> > > > [exposed to the other peer] */
> > > > fwd_cnt /* increased when RX packet is consumed by the
> > > > user space [exposed to the other peer] */
> > > > tx_cnt /* increased when TX packet is sent to the other peer */
> > > >
> > > > /* remote vars */
> > > > peer_buf_alloc /* peer's buf_alloc */
> > > > peer_fwd_cnt /* peer's fwd_cnt */
> > > >
> > > > When a peer sends a packet, it increases the 'tx_cnt'; when the
> > > > receiver consumes the packet (copy it to the user-space buffer), it
> > > > increases the 'fwd_cnt'.
> > > > Note: increments are made considering the payload length and not the
> > > > buffer length.
> > > >
> > > > The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> > > > all packet headers or with an explicit CREDIT_UPDATE packet.
> > > >
> > > > The local 'buf_alloc' value can be modified by the user space using
> > > > setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> > > >
> > > > Before to send a packet, the peer checks the space available:
> > > > credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> > > > and it will send up to credit_available bytes to the other peer.
> > > >
> > > > Possible solutions considering Michael's advice:
> > > > 1. Use the buffer length instead of the payload length when we increment
> > > > the counters:
> > > > - This approach will account precisely the memory used per socket.
> > > > - This requires changes in both guest and host.
> > > > - It is not compatible with old drivers, so a feature should be negotiated.
> > > > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
> > > > the socket queue but not used. (e.g. 256 byte used on 4K available in
> > > > the buffer)
> > > > - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> > > > - This should be compatible also with old drivers.
> > > >
> > > > Maybe the second is less invasive, but will it be too tricky?
> > > > Any other advice or suggestions?
> > > >
> > > > Thanks in advance,
> > > > Stefano
> > >
> > > OK let me try to clarify. The idea is this:
> > >
> > > Let's say we queue a buffer of 4K, and we copy if len < 128 bytes. This
> > > means that in the worst case (128 byte packets), each byte of credit in
> > > the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> > > to also account for the virtio_vsock_pkt since I think it's kept around
> > > until userspace consumes it.
> > >
> > > Thus given X buf alloc allowed in the socket, we should publish X/16
> > > credits to the other side. This will ensure the other side does not send
> > > more than X/16 bytes for a given socket and thus we won't need to
> > > allocate more than X bytes to hold the data.
> > >
> > > We can play with the copy break value to tweak this.
> > >
> >
> > Hi Michael,
> > sorry for the long silence, but I focused on multi-transport.
> >
> > Before to implement your idea, I tried to do some calculations and
> > looking better to our credit mechanism:
> >
> > buf_alloc = 256 KB (default, tunable through setsockopt)
> > sizeof(struct virtio_vsock_pkt) = 128
> >
> > - guest (we use preallocated 4 KB buffers to receive packets, copying
> > small packet - < 128 -)
> > worst_case = 129
> > buf_size = 4 KB
> > credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 32
>
> Making virtio_vsock_pkt smaller will help btw.
> E.g. why keep a work struct per packet? do we need the header there? etc
> etc.
>

Yes, I had the same feeling, I'll try to make it smaller!


> >
> > credit_published = buf_alloc / credit2mem = ~8 KB
> > Space for just 2 full packet (4 KB)
> >
> > - host (we copy packets from the vring, allocating the space for the payload)
> > worst_case = 1
> > buf_size = 1
> > credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 129
> >
> > credit_published = buf_alloc / credit2mem = ~2 KB
> > Less than a full packet (guest now can send up to 64 KB with a single
> > packet, so it will be limited to 2 KB)
>
> 1 byte in struct virtio_vsock_pkt is crazy. Can't we copy

I know :-)

> like we do in the guest? E.g. allocate at least 128 bytes,
> and then we can add data to the tail of the last packet?

Yes, I've been thinking about trying that out. (allocate at least 128 bytes
and do as we do in the guest).
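
For illustration, a rough sketch of that tail-copy idea (hypothetical types
and helper, not the actual vhost/vsock code): if the last queued packet's
buffer still has room, append the small payload to it instead of allocating
a new virtio_vsock_pkt.

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct pkt {
	char *buf;
	size_t buf_len; /* allocated buffer size (e.g. at least 128 bytes) */
	size_t len;     /* bytes currently used */
};

/* Returns true if the payload was absorbed into the last queued packet;
 * otherwise the caller allocates a new packet as usual.
 */
static bool try_tail_copy(struct pkt *last, const void *data, size_t len)
{
	if (!last || last->buf_len - last->len < len)
		return false;

	memcpy(last->buf + last->len, data, len);
	last->len += len;
	return true;
}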


Thanks,
Stefano

2019-10-14 08:19:28

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Fri, Oct 11, 2019 at 03:40:48PM +0200, Stefano Garzarella wrote:
> On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin <[email protected]> wrote:
> > On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring, are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > >
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > >
> > > > > This patch mitigates this issue copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > order to avoid wasting memory.
> > > > >
> > > > > Reviewed-by: Stefan Hajnoczi <[email protected]>
> > > > > Signed-off-by: Stefano Garzarella <[email protected]>
> > > >
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > >
> > >
> > > Since I'm back from holidays, I'm restarting this thread to figure out
> > > how to address the issue completely.
> > >
> > > I did a better analysis of the credit mechanism that we implemented in
> > > virtio-vsock to get a clearer view and I'd share it with you:
> > >
> > > This issue affect only the "host->guest" path. In this case, when the
> > > host wants to send a packet to the guest, it uses a "free" buffer
> > > allocated by the guest (4KB).
> > > The "free" buffers available for the host are shared between all
> > > sockets, instead, the credit mechanism is per-socket, I think to
> > > avoid the starvation of others sockets.
> > > The guests re-fill the "free" queue when the available buffers are
> > > less than half.
> > >
> > > Each peer have these variables in the per-socket state:
> > > /* local vars */
> > > buf_alloc /* max bytes usable by this socket
> > > [exposed to the other peer] */
> > > fwd_cnt /* increased when RX packet is consumed by the
> > > user space [exposed to the other peer] */
> > > tx_cnt /* increased when TX packet is sent to the other peer */
> > >
> > > /* remote vars */
> > > peer_buf_alloc /* peer's buf_alloc */
> > > peer_fwd_cnt /* peer's fwd_cnt */
> > >
> > > When a peer sends a packet, it increases the 'tx_cnt'; when the
> > > receiver consumes the packet (copy it to the user-space buffer), it
> > > increases the 'fwd_cnt'.
> > > Note: increments are made considering the payload length and not the
> > > buffer length.
> > >
> > > The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> > > all packet headers or with an explicit CREDIT_UPDATE packet.
> > >
> > > The local 'buf_alloc' value can be modified by the user space using
> > > setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> > >
> > > Before to send a packet, the peer checks the space available:
> > > credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> > > and it will send up to credit_available bytes to the other peer.
> > >
> > > Possible solutions considering Michael's advice:
> > > 1. Use the buffer length instead of the payload length when we increment
> > > the counters:
> > > - This approach will account precisely the memory used per socket.
> > > - This requires changes in both guest and host.
> > > - It is not compatible with old drivers, so a feature should be negotiated.
> > > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
> > > the socket queue but not used. (e.g. 256 byte used on 4K available in
> > > the buffer)
> > > - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
> > > - This should be compatible also with old drivers.
> > >
> > > Maybe the second is less invasive, but will it be too tricky?
> > > Any other advice or suggestions?
> > >
> > > Thanks in advance,
> > > Stefano
> >
> > OK let me try to clarify. The idea is this:
> >
> > Let's say we queue a buffer of 4K, and we copy if len < 128 bytes. This
> > means that in the worst case (128 byte packets), each byte of credit in
> > the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
> > to also account for the virtio_vsock_pkt since I think it's kept around
> > until userspace consumes it.
> >
> > Thus given X buf alloc allowed in the socket, we should publish X/16
> > credits to the other side. This will ensure the other side does not send
> > more than X/16 bytes for a given socket and thus we won't need to
> > allocate more than X bytes to hold the data.
> >
> > We can play with the copy break value to tweak this.
> >
>
> Hi Michael,
> sorry for the long silence, but I focused on multi-transport.
>
> Before to implement your idea, I tried to do some calculations and
> looking better to our credit mechanism:
>
> buf_alloc = 256 KB (default, tunable through setsockopt)
> sizeof(struct virtio_vsock_pkt) = 128
>
> - guest (we use preallocated 4 KB buffers to receive packets, copying
> small packet - < 128 -)
> worst_case = 129
> buf_size = 4 KB
> credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 32
>
> credit_published = buf_alloc / credit2mem = ~8 KB
> Space for just 2 full packet (4 KB)
>
> - host (we copy packets from the vring, allocating the space for the payload)
> worst_case = 1
> buf_size = 1
> credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 129
>
> credit_published = buf_alloc / credit2mem = ~2 KB
> Less than a full packet (guest now can send up to 64 KB with a single
> packet, so it will be limited to 2 KB)
>
> Current memory consumption in the worst case if the RX queue is full:
> - guest
> mem = (buf_alloc / worst_case) *
> (buf_size + sizeof(struct virtio_vsock_pkt) = ~8MB
>
> - host
> mem = (buf_alloc / worst_case) *
> (buf_size + sizeof(struct virtio_vsock_pkt) = ~32MB
>
> I think that the performance with big packets will be affected,
> but I still have to try.
>
> Another approach that I want to explore is to play with buf_alloc
> published to the peer.
>
> One thing that's not clear to me yet is the meaning of
> SO_VM_SOCKETS_BUFFER_SIZE:
> - max amount of memory used in the RX queue
> - max amount of payload bytes in the RX queue (without overhead of
> struct virtio_vsock_pkt + preallocated buffer)
>
> From the 'include/uapi/linux/vm_sockets.h':
> /* Option name for STREAM socket buffer size. Use as the option name in
> * setsockopt(3) or getsockopt(3) to set or get an unsigned long long that
> * specifies the size of the buffer underlying a vSockets STREAM socket.
> * Value is clamped to the MIN and MAX.
> */
>
> #define SO_VM_SOCKETS_BUFFER_SIZE 0
>
> Regardless, I think we need to limit memory consumption in some way.
> I'll check the implementation of other transports, to understand better.

SO_VM_SOCKETS_BUFFER_SIZE might have been useful for VMCI-specific
applications, but we should use SO_RCVBUF and SO_SNDBUF for portable
applications in the future. Those socket options also work with other
address families.

I guess these sockopts are bypassed by AF_VSOCK because it doesn't use
the common skb queuing code in net/core/sock.c :(. But one day we might
migrate to it...
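
For reference, a minimal userspace sketch of the VMCI-style option mentioned
above (it assumes an AF_VSOCK stream socket and <linux/vm_sockets.h>; the
value is an unsigned long long and is clamped to the MIN/MAX limits).
Portable applications would prefer SO_RCVBUF/SO_SNDBUF once AF_VSOCK honours
them.

#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Set the size of the buffer underlying a vsock STREAM socket. */
static int set_vsock_buffer_size(int fd, unsigned long long size)
{
	return setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE,
			  &size, sizeof(size));
}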

Stefan



2019-10-14 08:23:45

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket


On 2019/10/14 4:17 PM, Stefan Hajnoczi wrote:
> SO_VM_SOCKETS_BUFFER_SIZE might have been useful for VMCI-specific
> applications, but we should use SO_RCVBUF and SO_SNDBUF for portable
> applications in the future. Those socket options also work with other
> address families.
>
> I guess these sockopts are bypassed by AF_VSOCK because it doesn't use
> the common skb queuing code in net/core/sock.c:(. But one day we might
> migrate to it...
>
> Stefan


+1, we should really consider reusing the existing socket mechanism
instead of re-inventing the wheel.

Thanks

2019-10-14 08:39:45

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

On Mon, Oct 14, 2019 at 04:21:35PM +0800, Jason Wang wrote:
> On 2019/10/14 4:17 PM, Stefan Hajnoczi wrote:
> > SO_VM_SOCKETS_BUFFER_SIZE might have been useful for VMCI-specific
> > applications, but we should use SO_RCVBUF and SO_SNDBUF for portable
> > applications in the future. Those socket options also work with other
> > address families.
> >

I think hyperv_transport started using it in this patch:
ac383f58f3c9 ("hv_sock: perf: Allow the socket buffer size options to
influence the actual socket buffers")


> > I guess these sockopts are bypassed by AF_VSOCK because it doesn't use
> > the common skb queuing code in net/core/sock.c:(. But one day we might
> > migrate to it...
> >
> > Stefan
>
>
> +1, we should really consider to reuse the exist socket mechanism instead of
> re-inventing wheels.

+1, I totally agree. I'll go this way.

Guys, thank you all for your suggestions!