In a virtual machine environment, the CPU topology may differ from that of a
typical physical server.
For example (a machine with 4 cores, 2 threads per core):
Normal physical server:
core-id thread-0-id thread-1-id
0 0 4
1 1 5
2 2 6
3 3 7
Virtual machine:
core-id thread-0-id thread-1-id
0 0 1
1 2 3
2 4 5
3 6 7
If we attach a disk with multiple queues to such a virtual machine,
blk_mq_map_queues() can produce a seriously imbalanced mapping, which hurts
system I/O performance.
Here is the qemu cmdline:
"-smp 8,sockets=1,cores=8,threads=2"
"-device virtio-blk-pci,drive=drive0,id=device0,num-queues=4,vectors=2"
When vectors is less than num-queues, virtio-blk falls back to
blk_mq_map_queues().
Before this patch:
[root@blk-mq ~]# cat /sys/block/vd*/mq/*/cpu_list
0, 4, 5, 8, 9, 12, 13
1
2, 6, 7, 10, 11, 14, 15
3
After this patch:
[root@blk-mq ~]# cat /sys/block/vd*/mq/*/cpu_list
0, 1, 8, 9
2, 3, 10, 11
4, 5, 12, 13
6, 7, 14, 15
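
For reference, here is a minimal stand-alone user-space sketch (not part of the
patch) that mimics both mappings for the 16-CPU / 4-queue configuration above.
It hard-codes the adjacent-sibling topology of the virtual machine, and since
8 cores >= 4 queues it assumes the patched logic always takes the per-core path:

#include <stdio.h>

#define NR_CPUS   16	/* 8 cores x 2 threads, as in the qemu example */
#define NR_QUEUES 4	/* num-queues=4 */

/* Adjacent-sibling VM topology: the first sibling of CPU n is n & ~1. */
static unsigned int first_sibling(unsigned int cpu)
{
	return cpu & ~1u;
}

/* old = 1 mimics the current logic, old = 0 the core-based logic of this patch. */
static void map_queues(unsigned int *map, int old)
{
	unsigned int cpu, fs, core = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (old && cpu < NR_QUEUES) {
			/* old: sequential mapping for the first nr_queues CPUs */
			map[cpu] = cpu % NR_QUEUES;
		} else {
			fs = first_sibling(cpu);
			if (fs == cpu)
				/* old: spread by CPU id; new: spread by core id */
				map[cpu] = (old ? cpu : core++) % NR_QUEUES;
			else
				map[cpu] = map[fs];	/* sibling follows its first thread */
		}
	}
}

static void print_map(const char *tag, const unsigned int *map)
{
	unsigned int q, cpu;

	printf("%s\n", tag);
	for (q = 0; q < NR_QUEUES; q++) {
		printf("  queue %u:", q);
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			if (map[cpu] == q)
				printf(" %u", cpu);
		printf("\n");
	}
}

int main(void)
{
	unsigned int map[NR_CPUS];

	map_queues(map, 1);
	print_map("before (current blk_mq_map_queues):", map);
	map_queues(map, 0);
	print_map("after (core-based mapping of this patch):", map);
	return 0;
}

Its output matches the cpu_list contents shown above.
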
Signed-off-by: luferry <[email protected]>
---
block/blk-mq-cpumap.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 03a534820271..2eb78ad4a49b 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -35,22 +35,37 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
{
unsigned int *map = qmap->mq_map;
unsigned int nr_queues = qmap->nr_queues;
- unsigned int cpu, first_sibling;
+ unsigned int cpu, first_sibling, core = 0;
+ bool core_map = false;
+ /*
+ * If the number of cores is equal to or greater than nr_queues,
+ * we are guaranteed at least one core per queue.
+ */
+ for_each_possible_cpu(cpu) {
+ if (get_first_sibling(cpu) == cpu)
+ core++;
+ }
+ if (core >= nr_queues)
+ core_map = true;
+
+ core = 0;
for_each_possible_cpu(cpu) {
/*
- * First do sequential mapping between CPUs and queues.
- * In case we still have CPUs to map, and we have some number of
- * threads per cores then map sibling threads to the same queue for
+ * If there are enough cores, just map cores to queues directly;
+ * otherwise first do sequential mapping between CPUs and queues.
+ * For the remaining CPUs, which have some number of threads per
+ * core, map sibling threads to the same queue for
* performace optimizations.
*/
- if (cpu < nr_queues) {
+ if (!core_map && cpu < nr_queues) {
map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
} else {
first_sibling = get_first_sibling(cpu);
- if (first_sibling == cpu)
- map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
- else
+ if (first_sibling == cpu) {
+ map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
+ core++;
+ } else
map[cpu] = map[first_sibling];
}
}
--
2.14.1.40.g8e62ba1
Why isn't this using the automatic PCI-level affinity assignment to
start with?
At 2019-03-26 15:39:54, "Christoph Hellwig" <[email protected]> wrote:
>Why isn't this using the automatic PCI-level affinity assignment to
>start with?
When virtio-blk is enabled with multiple queues but only 2 MSI-X vectors,
vp_dev->per_vq_vectors will be false, so vp_get_vq_affinity() returns NULL
directly and blk_mq_virtio_map_queues() falls back to blk_mq_map_queues().
const struct cpumask *vp_get_vq_affinity(struct virtio_device *vdev, int index)
{
	struct virtio_pci_device *vp_dev = to_vp_device(vdev);

	if (!vp_dev->per_vq_vectors ||
	    vp_dev->vqs[index]->msix_vector == VIRTIO_MSI_NO_VECTOR)
		return NULL;

	return pci_irq_get_affinity(vp_dev->pci_dev,
				    vp_dev->vqs[index]->msix_vector);
}
int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
			     struct virtio_device *vdev, int first_vec)
{
	const struct cpumask *mask;
	unsigned int queue, cpu;

	if (!vdev->config->get_vq_affinity)
		goto fallback;

	for (queue = 0; queue < qmap->nr_queues; queue++) {
		/* vp_get_vq_affinity() returns NULL here */
		mask = vdev->config->get_vq_affinity(vdev, first_vec + queue);
		if (!mask)
			goto fallback;

		for_each_cpu(cpu, mask)
			qmap->mq_map[cpu] = qmap->queue_offset + queue;
	}

	return 0;
fallback:
	return blk_mq_map_queues(qmap);
}
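
To make the fallback condition concrete, here is a tiny stand-alone illustration
(hypothetical user-space code, not an excerpt from the kernel); the "one vector
per VQ plus one for config changes" requirement is only a rough model of
virtio-pci's per-VQ MSI-X mode, not its exact allocation logic:

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	unsigned int vectors = 2;	/* "vectors=2" from the qemu cmdline */
	unsigned int num_queues = 4;	/* "num-queues=4" */
	bool per_vq_vectors;

	/*
	 * Rough model: per-VQ MSI-X mode needs one vector per virtqueue plus
	 * one for config changes.  If the device does not expose that many,
	 * virtio-pci ends up sharing a single vector for all VQs (and
	 * eventually falls back to INTx), with per_vq_vectors == false.
	 */
	per_vq_vectors = vectors >= num_queues + 1;

	printf("per_vq_vectors = %s\n", per_vq_vectors ? "true" : "false");
	printf("vp_get_vq_affinity() would return %s, so %s is used\n",
	       per_vq_vectors ? "a per-queue mask" : "NULL",
	       per_vq_vectors ? "the PCI-level affinity" : "blk_mq_map_queues()");
	return 0;
}
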
Here is the previous discussion:
https://patchwork.kernel.org/patch/10865461/
On Tue, Mar 26, 2019 at 03:55:10PM +0800, luferry wrote:
>
>
>
> At 2019-03-26 15:39:54, "Christoph Hellwig" <[email protected]> wrote:
> >Why isn't this using the automatic PCI-level affinity assignment to
> >start with?
>
> When virtio-blk is enabled with multiple queues but only 2 MSI-X vectors,
> vp_dev->per_vq_vectors will be false, so vp_get_vq_affinity() returns NULL
> directly and blk_mq_virtio_map_queues() falls back to blk_mq_map_queues().
What is the point of the multiqueue mode if you don't have enough
(virtual) MSI-X vectors?
Actually, I just bought a VM from a public cloud provider and ran into this problem.
After reading the code and comparing PCI device info, I reproduced this scenario.
Since ordinary users cannot change the number of MSI vectors, I suggest making
blk_mq_map_queues() more topology-friendly, since it is the last-resort fallback.
At 2019-03-27 16:16:19, "Christoph Hellwig" <[email protected]> wrote:
>On Tue, Mar 26, 2019 at 03:55:10PM +0800, luferry wrote:
>>
>>
>>
>> At 2019-03-26 15:39:54, "Christoph Hellwig" <[email protected]> wrote:
>> >Why isn't this using the automatic PCI-level affinity assignment to
>> >start with?
>>
>> When virtio-blk is enabled with multiple queues but only 2 MSI-X vectors,
>> vp_dev->per_vq_vectors will be false, so vp_get_vq_affinity() returns NULL
>> directly and blk_mq_virtio_map_queues() falls back to blk_mq_map_queues().
>
>What is the point of the multiqueue mode if you don't have enough
>(virtual) MSI-X vectors?
On 03/27/2019 05:17 PM, luferry wrote:
>
>
>
> Actually, I just bought a VM from a public cloud provider and ran into this problem.
> After reading the code and comparing PCI device info, I reproduced this scenario.
There is also a lack-of-vectors issue when the number of queues is greater than
maxcpus.

https://lore.kernel.org/lkml/e4afe4c5-0262-4500-aeec-60f30734b4fc@default

I was going to send out a patch to fix that specific case on the virtio side.

BTW, is your VM's virtio MMIO-based or PCI-based?

As mentioned in a previous email, when I was playing with Firecracker by Amazon,
the virtio-blk was MMIO-based. Not sure if that would lead to the lack-of-vectors
issue.
Dongli Zhang
>
>There is also a lack-of-vectors issue when the number of queues is greater than
>maxcpus.
>
>https://lore.kernel.org/lkml/e4afe4c5-0262-4500-aeec-60f30734b4fc@default
>
>I was going to send out a patch to fix that specific case on the virtio side.
>
>BTW, is your VM's virtio MMIO-based or PCI-based?
>
>As mentioned in a previous email, when I was playing with Firecracker by Amazon,
>the virtio-blk was MMIO-based. Not sure if that would lead to the lack-of-vectors
>issue.
>
>Dongli Zhang
It's PCI-based.