Hi,
I stumbled across poor performance of virtio-blk while working on a
high-performance network storage protocol. Moving virtio-blk's host
side into the kernel did increase single-queue IOPS, but a multiqueue
disk still did not scale well. It turned out that vhost handles events
from all virtio queues in a single helper thread, which is a major
serialization point.
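To make that concrete: upstream vhost queues every vhost_work onto the
one dev->work_list drained by the single dev->worker task. Below is a
rough conceptual sketch of the per-queue alternative; the vq->work_list
and vq->worker fields are assumed additions for illustration only, not
the patch itself.

/*
 * Conceptual sketch only, not the posted patch. The work is added to
 * this virtqueue's own list and only its own worker task is woken, so
 * handlers for different queues no longer serialize on one thread.
 * vq->work_list and vq->worker are hypothetical per-vq fields.
 */
#include <linux/bitops.h>
#include <linux/llist.h>
#include <linux/sched.h>
#include "vhost.h"

static void vhost_vq_work_queue(struct vhost_virtqueue *vq,
				struct vhost_work *work)
{
	if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
		/* Queue onto this vq's list and wake only its worker,
		 * instead of the one device-wide worker thread. */
		llist_add(&work->node, &vq->work_list);
		wake_up_process(vq->worker);
	}
}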
The following patch handles events in a per-queue thread instead and
increases I/O concurrency; see the IOPS numbers below:
# num-queues   bare metal   virtio-blk   vhost-blk
  1            171k         148k         195k
  2            328k         249k         349k
  3            479k         179k         501k
  4            622k         143k         620k
  5            755k         136k         737k
  6            887k         131k         830k
  7            1004k        126k         926k
  8            1099k        117k         1001k
  9            1194k        115k         1055k
  10           1278k        109k         1130k
  11           1345k        110k         1119k
  12           1411k        104k         1201k
  13           1466k        106k         1260k
  14           1517k        103k         1296k
  15           1552k        102k         1322k
  16           1480k        101k         1346k
Vitaly Mayatskikh (1):
vhost: add per-vq worker thread
drivers/vhost/vhost.c | 123 +++++++++++++++++++++++++++++++-----------
drivers/vhost/vhost.h | 11 +++-
2 files changed, 100 insertions(+), 34 deletions(-)
--
2.17.1
On 2018/11/3 12:07 AM, Vitaly Mayatskikh wrote:
> Hi,
>
> I stumbled across poor performance of virtio-blk while working on a
> high-performance network storage protocol. Moving virtio-blk's host
> side into the kernel did increase single-queue IOPS, but a multiqueue
> disk still did not scale well. It turned out that vhost handles events
> from all virtio queues in a single helper thread, which is a major
> serialization point.
>
> The following patch handles events in a per-queue thread instead and
> increases I/O concurrency; see the IOPS numbers below:
Thanks a lot for the patches. Here are some thoughts:
- This is not the first attempt to parallelize vhost workers, so we
need a comparison among them:
1) Multiple vhost workers from Anthony,
https://www.spinics.net/lists/netdev/msg189432.html
2) ELVIS from IBM, http://www.mulix.org/pubs/eli/elvis-h319.pdf
3) CMWQ from Bandan,
http://www.linux-kvm.org/images/5/52/02x08-Aspen-Bandan_Das-vhost-sharing_is_better.pdf
- vhost-net uses a different multiqueue model: each vhost device on the
host deals only with a specific queue pair instead of a whole device.
This allows great flexibility, and multiqueue could be implemented
without touching vhost code (see the sketch after this list).
- The current vhost-net implementation depends heavily on the
single-thread assumption, especially in its busy-polling code, and
would be broken by this approach. If we decide to go this way, that
needs to be fixed, and we do need networking performance results.
- Having more threads is not necessarily a win; at a minimum, I believe
we need a module parameter or similar knob to control the number of
threads.
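To illustrate the vhost-net model mentioned above, here is a hedged
userspace sketch, not QEMU's actual code: multiqueue is built by
opening /dev/vhost-net once per queue pair, so each fd, and therefore
each vhost worker thread, only ever services one TX/RX pair.

/* One vhost-net device (and thus one worker thread) per TX/RX queue
 * pair. Illustrative only. */
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

static int open_vhost_net_queue_pair(void)
{
	int fd = open("/dev/vhost-net", O_RDWR);

	if (fd < 0)
		return -1;
	/* VHOST_SET_OWNER is what spawns the worker for this device. */
	if (ioctl(fd, VHOST_SET_OWNER, NULL) < 0) {
		close(fd);
		return -1;
	}
	/* The caller then issues VHOST_SET_VRING_* and
	 * VHOST_NET_SET_BACKEND for the two rings of this pair only;
	 * every other queue pair gets its own fd and its own worker. */
	return fd;
}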
Thanks
On Sun, Nov 4, 2018 at 9:52 PM Jason Wang <[email protected]> wrote:
> - vhost-net uses a different multiqueue model: each vhost device on the
> host deals only with a specific queue pair instead of a whole device.
> This allows great flexibility, and multiqueue could be implemented
> without touching vhost code.
I'm in no way a network expert, but I think this is because it follows
the NIC's combined queue model: a TX/RX queue pair looks like a natural
choice for that case.
> - The current vhost-net implementation depends heavily on the
> single-thread assumption, especially in its busy-polling code, and
> would be broken by this approach. If we decide to go this way, that
> needs to be fixed, and we do need networking performance results.
Thanks for noting that; I'm missing a lot of the historical background
and will read up on it.
> - Having more threads is not necessarily a win; at a minimum, I believe
> we need a module parameter or similar knob to control the number of
> threads.
I agree; I didn't think fully about other cases, but for disk it is
already under control via QEMU's num-queues disk parameter.
There's a saturation point beyond which adding more threads does not
yield much more performance; in my environment it's about 12 queues.
So, how does this sound: the default behaviour stays at one worker per
vhost device, and if the user needs per-vq workers, they enable them
with a new VHOST_SET_ ioctl?
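Roughly, such a knob could look like the placeholder below; the ioctl
name, number, and struct layout are illustrative only and are not part
of the posted patch or the existing vhost UAPI:

/* Hypothetical UAPI sketch, placeholders only. */
#include <linux/ioctl.h>
#include <linux/types.h>
#include <linux/vhost.h>	/* for VHOST_VIRTIO (0xAF) */

struct vhost_vring_worker_mode {
	__u32 index;		/* virtqueue index */
	__u32 own_worker;	/* 0 = shared per-device worker (default),
				 * 1 = dedicated per-vq worker */
};

#define VHOST_SET_VRING_WORKER_MODE \
	_IOW(VHOST_VIRTIO, 0x80, struct vhost_vring_worker_mode)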
--
wbr, Vitaly