2019-03-25 15:28:15

by luferry

Subject: [PATCH] make blk_mq_map_queues more friendly for cpu topology

In a virtual machine environment, the CPU topology may differ
from that of a normal physical server.

For example (machine with 4 cores, 2 threads per core):

normal physical server:
core-id  thread-0-id  thread-1-id
0        0            4
1        1            5
2        2            6
3        3            7

virtual machine:
core-id  thread-0-id  thread-1-id
0        0            1
1        2            3
2        4            5
3        6            7

If we attach a disk with multiple queues to a virtual machine,
blk_mq_map_queues can produce a seriously imbalanced mapping, which
hurts system I/O performance.

Here is the qemu cmdline:
"-smp 8,sockets=1,cores=8,threads=2"
"-device virtio-blk-pci,drive=drive0,id=device0,num-queues=4,vectors=2"

When vectors is less than num-queues, virtio-blk falls back to
blk_mq_map_queues.

Before this patch:
[root@blk-mq ~]# cat /sys/block/vd*/mq/*/cpu_list
0, 4, 5, 8, 9, 12, 13
1
2, 6, 7, 10, 11, 14, 15
3

After this patch:
[root@blk-mq ~]# cat /sys/block/vd*/mq/*/cpu_list
0, 1, 8, 9
2, 3, 10, 11
4, 5, 12, 13
6, 7, 14, 15

Signed-off-by: luferry <[email protected]>
---
block/blk-mq-cpumap.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
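
For illustration only (not part of this patch), here is a minimal userspace
sketch that simulates both the old and the new mapping loop for the sibling
layout in the commit message. Nothing below is kernel code: first_sibling()
and cpu_to_queue_index() are simplified stand-ins for the kernel helpers,
assuming adjacent sibling pairs (0,1), (2,3), ..., 16 possible CPUs and
4 queues.

/* topo_sim.c - standalone simulation, not kernel code */
#include <stdio.h>

#define NR_CPUS   16
#define NR_QUEUES 4

/* VM layout from the example: siblings are adjacent, e.g. (0,1), (2,3), ... */
static unsigned int first_sibling(unsigned int cpu)
{
        return cpu & ~1u;
}

static unsigned int cpu_to_queue_index(unsigned int nr_queues, unsigned int idx)
{
        return idx % nr_queues;
}

/* Old blk_mq_map_queues() logic: sequential mapping first, then follow siblings. */
static void map_old(unsigned int *map)
{
        unsigned int cpu, fs;

        for (cpu = 0; cpu < NR_CPUS; cpu++) {
                if (cpu < NR_QUEUES) {
                        map[cpu] = cpu_to_queue_index(NR_QUEUES, cpu);
                } else {
                        fs = first_sibling(cpu);
                        map[cpu] = (fs == cpu) ?
                                cpu_to_queue_index(NR_QUEUES, cpu) : map[fs];
                }
        }
}

/*
 * Patched logic for the case where the core count (8) is >= nr_queues (4):
 * queues are indexed by core number, and siblings share their core's queue.
 */
static void map_new(unsigned int *map)
{
        unsigned int cpu, fs, core = 0;

        for (cpu = 0; cpu < NR_CPUS; cpu++) {
                fs = first_sibling(cpu);
                if (fs == cpu)
                        map[cpu] = cpu_to_queue_index(NR_QUEUES, core++);
                else
                        map[cpu] = map[fs];
        }
}

/* Print the CPUs assigned to each queue, like /sys/block/vd*/mq/<n>/cpu_list. */
static void dump(const char *tag, const unsigned int *map)
{
        unsigned int q, cpu;

        printf("%s\n", tag);
        for (q = 0; q < NR_QUEUES; q++) {
                printf("  queue %u:", q);
                for (cpu = 0; cpu < NR_CPUS; cpu++)
                        if (map[cpu] == q)
                                printf(" %u", cpu);
                printf("\n");
        }
}

int main(void)
{
        unsigned int map[NR_CPUS];

        map_old(map);
        dump("before patch", map);
        map_new(map);
        dump("after patch", map);
        return 0;
}

Under these assumptions the simulation reproduces the same per-queue CPU
grouping as the two cpu_list dumps shown above.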

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 03a534820271..2eb78ad4a49b 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -35,22 +35,37 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
 {
         unsigned int *map = qmap->mq_map;
         unsigned int nr_queues = qmap->nr_queues;
-        unsigned int cpu, first_sibling;
+        unsigned int cpu, first_sibling, core = 0;
+        bool core_map = false;
 
+        /*
+         * If the core count is equal to or greater than nr_queues,
+         * we can give each core at least one queue.
+         */
+        for_each_possible_cpu(cpu) {
+                if (get_first_sibling(cpu) == cpu)
+                        core++;
+        }
+        if (core >= nr_queues)
+                core_map = true;
+
+        core = 0;
         for_each_possible_cpu(cpu) {
                 /*
-                 * First do sequential mapping between CPUs and queues.
-                 * In case we still have CPUs to map, and we have some number of
-                 * threads per cores then map sibling threads to the same queue for
+                 * If there are enough cores, map cores to queues directly;
+                 * otherwise do sequential mapping between CPUs and queues first.
+                 * For the remaining CPUs, if we have some number of threads
+                 * per core, map sibling threads to the same queue for
                  * performace optimizations.
                  */
-                if (cpu < nr_queues) {
+                if (!core_map && cpu < nr_queues) {
                         map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
                 } else {
                         first_sibling = get_first_sibling(cpu);
-                        if (first_sibling == cpu)
-                                map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
-                        else
+                        if (first_sibling == cpu) {
+                                map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
+                                core++;
+                        } else
                                 map[cpu] = map[first_sibling];
                 }
         }
--
2.14.1.40.g8e62ba1



2019-03-26 07:42:11

by Christoph Hellwig

Subject: Re: [PATCH] make blk_mq_map_queues more friendly for cpu topology

Why isn't this using the automatic PCI-level affinity assignment to
start with?

2019-03-26 08:07:47

by luferry

Subject: Re:Re: [PATCH] make blk_mq_map_queues more friendly for cpu topology




At 2019-03-26 15:39:54, "Christoph Hellwig" <[email protected]> wrote:
>Why isn't this using the automatic PCI-level affinity assignment to
>start with?

When virtio-blk is enabled with multiple queues but only 2 MSI-X vectors,
vp_dev->per_vq_vectors will be false, so vp_get_vq_affinity will return NULL directly
and blk_mq_virtio_map_queues will fall back to blk_mq_map_queues.


const struct cpumask *vp_get_vq_affinity(struct virtio_device *vdev, int index)
{
        struct virtio_pci_device *vp_dev = to_vp_device(vdev);

        if (!vp_dev->per_vq_vectors ||
            vp_dev->vqs[index]->msix_vector == VIRTIO_MSI_NO_VECTOR)
                return NULL;

        return pci_irq_get_affinity(vp_dev->pci_dev,
                                    vp_dev->vqs[index]->msix_vector);
}

int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
                struct virtio_device *vdev, int first_vec)
{
        const struct cpumask *mask;
        unsigned int queue, cpu;

        if (!vdev->config->get_vq_affinity)
                goto fallback;

        for (queue = 0; queue < qmap->nr_queues; queue++) {
                /* vp_get_vq_affinity returns NULL here */
                mask = vdev->config->get_vq_affinity(vdev, first_vec + queue);
                if (!mask)
                        goto fallback;

                for_each_cpu(cpu, mask)
                        qmap->mq_map[cpu] = qmap->queue_offset + queue;
        }

        return 0;
fallback:
        return blk_mq_map_queues(qmap);
}

Here is the previous discussion:
https://patchwork.kernel.org/patch/10865461/

2019-03-27 08:17:19

by Christoph Hellwig

Subject: Re: [PATCH] make blk_mq_map_queues more friendly for cpu topology

On Tue, Mar 26, 2019 at 03:55:10PM +0800, luferry wrote:
>
>
>
> At 2019-03-26 15:39:54, "Christoph Hellwig" <[email protected]> wrote:
> >Why isn't this using the automatic PCI-level affinity assignment to
> >start with?
>
> When virtio-blk is enabled with multiple queues but only 2 MSI-X vectors,
> vp_dev->per_vq_vectors will be false, so vp_get_vq_affinity will return NULL directly
> and blk_mq_virtio_map_queues will fall back to blk_mq_map_queues.

What is the point of the multiqueue mode if you don't have enough
(virtual) MSI-X vectors?

2019-03-27 09:34:04

by luferry

Subject: Re:Re: [PATCH] make blk_mq_map_queues more friendly for cpu topology




Actually, I just bought a VM from a public cloud provider and ran into this problem.
After reading the code and comparing the PCI device info, I reproduced this scenario.

Since ordinary users cannot change the number of MSI-X vectors, I suggest making
blk_mq_map_queues more friendly; it may be the last resort.





At 2019-03-27 16:16:19, "Christoph Hellwig" <[email protected]> wrote:
>On Tue, Mar 26, 2019 at 03:55:10PM +0800, luferry wrote:
>>
>>
>>
>> At 2019-03-26 15:39:54, "Christoph Hellwig" <[email protected]> wrote:
>> >Why isn't this using the automatic PCI-level affinity assignment to
>> >start with?
>>
>> When virtio-blk is enabled with multiple queues but only 2 MSI-X vectors,
>> vp_dev->per_vq_vectors will be false, so vp_get_vq_affinity will return NULL directly
>> and blk_mq_virtio_map_queues will fall back to blk_mq_map_queues.
>
>What is the point of the multiqueue mode if you don't have enough
>(virtual) MSI-X vectors?

2019-03-27 09:42:07

by Dongli Zhang

Subject: Re: [PATCH] make blk_mq_map_queues more friendly for cpu topology



On 03/27/2019 05:17 PM, luferry wrote:
>
>
>
> Actually, I just bought a VM from a public cloud provider and ran into this problem.
> After reading the code and comparing the PCI device info, I reproduced this scenario.

There is also a lack-of-vectors issue when the number of queues is more than maxcpus.

https://lore.kernel.org/lkml/e4afe4c5-0262-4500-aeec-60f30734b4fc@default

I was going to send out a patch to fix that specific case on the virtio side.

BTW, is your VM's virtio MMIO-based or PCI-based?

As mentioned in a previous email, when I was playing with Firecracker by Amazon,
the virtio-blk was MMIO-based. Not sure if that would lead to the lack-of-vectors
issue.

Dongli Zhang

2019-03-27 09:56:37

by luferry

Subject: Re:Re: [PATCH] make blk_mq_map_queues more friendly for cpu topology


>
>There is also a lack-of-vectors issue when the number of queues is more than maxcpus.
>
>https://lore.kernel.org/lkml/e4afe4c5-0262-4500-aeec-60f30734b4fc@default
>
>I was going to send out a patch to fix that specific case on the virtio side.
>
>BTW, is your VM's virtio MMIO-based or PCI-based?
>
>As mentioned in a previous email, when I was playing with Firecracker by Amazon,
>the virtio-blk was MMIO-based. Not sure if that would lead to the lack-of-vectors
>issue.
>
>Dongli Zhang

It's PCI-based.