Subject: Re: [PATCH] block/mq: blk map queues by core id
To: luferry
References: <20190322100949.5555-1-luferry@163.com>
 <9fa3eda2-536c-86a1-a7a3-c157fd0cdfdc@oracle.com>
 <4cddfb88.8534.169a941f24a.Coremail.luferry@163.com>
 <9b568440-cd06-d74c-5708-333b847e49c1@oracle.com>
 <7cf8dc67.12abe.169b441500d.Coremail.luferry@163.com>
 <4181cc8.12c63.169b44549cb.Coremail.luferry@163.com>
Cc: Jens Axboe, Ming Lei, Christoph Hellwig,
 linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
From: Dongli Zhang
Date: Mon, 25 Mar 2019 21:53:00 +0800
In-Reply-To: <4181cc8.12c63.169b44549cb.Coremail.luferry@163.com>
On 03/25/2019 05:53 PM, luferry wrote:
>
> sorry, messed up some func name
>
> When virtio-blk is enabled with multiple queues but with only 2 msix vectors,
> vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will fall
> back to blk_mq_map_queues.
>
> Since virtual machine users cannot change the number of vectors, I think
> blk_mq_map_queues should be more friendly with different cpu topologies.

Yes, we will fall back to blk_mq_map_queues if there is a lack of vectors.

Here is the qemu cmdline:

"-smp 8,sockets=1,cores=4,threads=2"
"-device virtio-blk-pci,drive=drive0,id=device0,num-queues=4,vectors=2"

# cat /sys/block/vda/mq/0/cpu_list
0, 4, 5
root@vm:/home/zhang# cat /sys/block/vda/mq/1/cpu_list
1
root@vm:/home/zhang# cat /sys/block/vda/mq/2/cpu_list
2, 6, 7
root@vm:/home/zhang# cat /sys/block/vda/mq/3/cpu_list
3

How about putting how to hit the issue in the commit message, so people would
be able to know the scenario?

Dongli Zhang

>
>
> At 2019-03-25 17:49:33, "luferry" wrote:
>>
>> After reading the code and comparing pci device info,
>> I can reproduce this imbalance under KVM.
>> When virtio-blk is enabled with multiple queues but with only 2 msix vectors,
>> vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will fall
>> back to virtio_mq_map_queues.
>>
>> Since virtual machine users cannot change the number of vectors, I think
>> virtio_mq_map_queues should be more friendly with different cpu topologies.
>>
>> 448 const struct cpumask *vp_get_vq_affinity(struct virtio_device *vdev, int index)
>> 449 {
>> 450         struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>> 451
>> 452         if (!vp_dev->per_vq_vectors ||
>> 453             vp_dev->vqs[index]->msix_vector == VIRTIO_MSI_NO_VECTOR)
>> 454                 return NULL;
>> 455
>> 456         return pci_irq_get_affinity(vp_dev->pci_dev,
>> 457                                     vp_dev->vqs[index]->msix_vector);
>> 458 }
>>
>> 32 int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
>> 33                              struct virtio_device *vdev, int first_vec)
>> 34 {
>> 35         const struct cpumask *mask;
>> 36         unsigned int queue, cpu;
>> 37
>> 38         if (!vdev->config->get_vq_affinity)
>> 39                 goto fallback;
>> 40
>> 41         for (queue = 0; queue < qmap->nr_queues; queue++) {
>> 42                 mask = vdev->config->get_vq_affinity(vdev, first_vec + queue); // vp_get_vq_affinity
>> 43                 if (!mask)
>> 44                         goto fallback;
>> 45
>> 46                 for_each_cpu(cpu, mask)
>> 47                         qmap->mq_map[cpu] = qmap->queue_offset + queue;
>> 48         }
>> 49
>> 50         return 0;
>> 51 fallback:
>> 52         return blk_mq_map_queues(qmap);
>> 53 }
>>
>>
>>
>> At 2019-03-23 19:14:34, "Dongli Zhang" wrote:
>>>
>>>
>>> On 03/23/2019 02:34 PM, luferry wrote:
>>>>
>>>> I just bought one vm, so I cannot access the hypervisor. I will try to build
>>>> the environment on my desktop. I'm sure about the following:
>>>> the hypervisor is KVM and the disk driver is virtio-blk.
>>>> [root@blk-mq ~]# dmesg | grep KVM
>>>> [    0.000000] Hypervisor detected: KVM
>>>> [    0.186330] Booting paravirtualized kernel on KVM
>>>> [    0.279106] KVM setup async PF for cpu 0
>>>> [    0.420819] KVM setup async PF for cpu 1
>>>> [    0.421682] KVM setup async PF for cpu 2
>>>> [    0.422113] KVM setup async PF for cpu 3
>>>> [    0.422434] KVM setup async PF for cpu 4
>>>> [    0.422434] KVM setup async PF for cpu 5
>>>> [    0.423312] KVM setup async PF for cpu 6
>>>> [    0.423394] KVM setup async PF for cpu 7
>>>> [    0.424125] KVM setup async PF for cpu 8
>>>> [    0.424414] KVM setup async PF for cpu 9
>>>> [    0.424415] KVM setup async PF for cpu 10
>>>> [    0.425329] KVM setup async PF for cpu 11
>>>> [    0.425420] KVM setup async PF for cpu 12
>>>> [    0.426156] KVM setup async PF for cpu 13
>>>> [    0.426431] KVM setup async PF for cpu 14
>>>> [    0.426431] KVM setup async PF for cpu 15
>>>>
>>>> [root@blk-mq ~]# lspci | grep block
>>>> 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>>> 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>>>
>>>> [root@blk-mq ~]# lscpu
>>>> Architecture:          x86_64
>>>> CPU op-mode(s):        32-bit, 64-bit
>>>> Byte Order:            Little Endian
>>>> CPU(s):                16
>>>> On-line CPU(s) list:   0-15
>>>> Thread(s) per core:    2
>>>> Core(s) per socket:    8
>>>>
>>>> [root@blk-mq ~]# ls /sys/block/vdb/mq/
>>>> 0  1  2  3
>>>>
>>>> [root@blk-mq ~]# cat /proc/cpuinfo | egrep 'processor|core id'
>>>> processor : 0
>>>> core id   : 0
>>>> processor : 1
>>>> core id   : 0
>>>> processor : 2
>>>> core id   : 1
>>>> processor : 3
>>>> core id   : 1
>>>> processor : 4
>>>> core id   : 2
>>>> processor : 5
>>>> core id   : 2
>>>> processor : 6
>>>> core id   : 3
>>>> processor : 7
>>>> core id   : 3
>>>> processor : 8
>>>> core id   : 4
>>>> processor : 9
>>>> core id   : 4
>>>> processor : 10
>>>> core id   : 5
>>>> processor : 11
>>>> core id   : 5
>>>> processor : 12
>>>> core id   : 6
>>>> processor : 13
>>>> core id   : 6
>>>> processor : 14
>>>> core id   : 7
>>>> processor : 15
>>>> core id   : 7
>>>>
>>>> --before this patch--
>>>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>>> 0, 4, 5, 8, 9, 12, 13
>>>> 1
>>>> 2, 6, 7, 10, 11, 14, 15
>>>> 3
>>>>
>>>> --after this patch--
>>>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>>> 0, 4, 5, 12, 13
>>>> 1, 6, 7, 14, 15
>>>> 2, 8, 9
>>>> 3, 10, 11
>>>>
>>>> I added a dump_stack inside blk_mq_map_queues.
>>>>
>>>> [    1.378680] Call Trace:
>>>> [    1.378690]  dump_stack+0x5a/0x73
>>>> [    1.378695]  blk_mq_map_queues+0x29/0xb0
>>>> [    1.378700]  blk_mq_alloc_tag_set+0x1bd/0x2d0
>>>> [    1.378705]  virtblk_probe+0x1ae/0x8e4 [virtio_blk]
>>>> [    1.378709]  virtio_dev_probe+0x18a/0x230 [virtio]
>>>> [    1.378713]  really_probe+0x215/0x3f0
>>>> [    1.378716]  driver_probe_device+0x115/0x130
>>>> [    1.378718]  device_driver_attach+0x50/0x60
>>>> [    1.378720]  __driver_attach+0xbd/0x140
>>>> [    1.378722]  ? device_driver_attach+0x60/0x60
>>>> [    1.378724]  bus_for_each_dev+0x67/0xc0
>>>> [    1.378727]  ? klist_add_tail+0x57/0x70
>>>
>>> I am not able to reproduce the above call stack when virtio-blk is assigned 4
>>> queues while my qemu cmdline is "-smp 16,sockets=1,cores=8,threads=2".
>>>
>>> # cat /sys/block/vda/mq/0/cpu_list
>>> 0, 1, 2, 3
>>> # cat /sys/block/vda/mq/1/cpu_list
>>> 4, 5, 6, 7
>>> # cat /sys/block/vda/mq/2/cpu_list
>>> 8, 9, 10, 11
>>> # cat /sys/block/vda/mq/3/cpu_list
>>> 12, 13, 14, 15
>>>
>>> I do agree that in the above case we would have an issue if the mapping were
>>> established by blk_mq_map_queues().
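
(For anyone following the thread: the imbalance above falls out of the two
helpers blk_mq_map_queues() relies on. Paraphrased from memory of
block/blk-mq-cpumap.c around this kernel version; treat it as a sketch rather
than a verbatim copy:)

static int cpu_to_queue_index(struct blk_mq_queue_map *qmap,
			      unsigned int nr_queues, const int cpu)
{
	/* plain round-robin: queue index is cpu modulo nr_queues */
	return qmap->queue_offset + (cpu % nr_queues);
}

static int get_first_sibling(unsigned int cpu)
{
	unsigned int first;

	/* first cpu in this cpu's hyperthread sibling mask */
	first = cpumask_first(topology_sibling_cpumask(cpu));
	if (first < nr_cpu_ids)
		return first;

	return cpu;
}

With the sibling-adjacent numbering of the VM (cpus 0/1 share core 0, 2/3 share
core 1, and so on), every odd cpu just copies its even sibling's entry, and the
even cpus at or above nr_queues all land on the same few queues via
cpu % nr_queues, which gives exactly the 0,4,5,8,9,12,13 / 1 /
2,6,7,10,11,14,15 / 3 split shown above.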
>>>
>>> However, I am just curious how we finally reach blk_mq_map_queues() from
>>> blk_mq_alloc_tag_set()?
>>>
>>> It should be something like:
>>>
>>> blk_mq_alloc_tag_set()
>>>  -> blk_mq_update_queue_map()
>>>      -> if (set->ops->map_queues && !is_kdump_kernel())
>>>             return set->ops->map_queues(set);
>>>      -> else
>>>             return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
>>>
>>> Wouldn't we always have set->ops->map_queues = virtblk_map_queues()?
>>>
>>> Or does the execution reach:
>>>
>>> virtblk_map_queues()
>>>  -> blk_mq_virtio_map_queues()
>>>      -> if (!vdev->config->get_vq_affinity)
>>>             return blk_mq_map_queues(qmap);
>>>      -> else
>>>             establish the mapping via get_vq_affinity
>>>
>>> but with vdev->config->get_vq_affinity == NULL?
>>>
>>> For virtio pci, get_vq_affinity is always set. It seems virtio mmio would
>>> not set get_vq_affinity.
>>>
>>> I used to play with firecracker (by Amazon) and it is interesting that
>>> firecracker uses mmio to set up virtio-blk.
>>>
>>> Sorry for disturbing the review of this patch. I just would like to clarify
>>> in which scenario we would hit this issue, e.g., when virtio-blk is based on
>>> mmio?
>>>
>>> Dongli Zhang
>>>
>>>>
>>>>
>>>> At 2019-03-22 19:58:08, "Dongli Zhang" wrote:
>>>>>
>>>>>
>>>>> On 03/22/2019 06:09 PM, luferry wrote:
>>>>>> under virtual machine environment, cpu topology may differ from normal
>>>>>> physical server.
>>>>>
>>>>> Would you mind sharing the name of the virtual machine monitor, the
>>>>> command line if available, and which device to use to reproduce?
>>>>>
>>>>> For instance, I am not able to reproduce with qemu nvme or virtio-blk, as
>>>>> I assume they use pci or virtio specific mappers to establish the mapping.
>>>>>
>>>>> E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2
>>>>>
>>>>> Indeed I use three queues instead of two as one is reserved for admin.
>>>>>
>>>>> # ls /sys/block/nvme0n1/mq/*
>>>>> /sys/block/nvme0n1/mq/0:
>>>>> cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags
>>>>>
>>>>> /sys/block/nvme0n1/mq/1:
>>>>> cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags
>>>>>
>>>>> Thank you very much!
>>>>>
>>>>> Dongli Zhang
>>>>>
>>>>>> for example (machine with 4 cores, 2 threads per core):
>>>>>>
>>>>>> normal physical server:
>>>>>> core-id    thread-0-id    thread-1-id
>>>>>> 0          0              4
>>>>>> 1          1              5
>>>>>> 2          2              6
>>>>>> 3          3              7
>>>>>>
>>>>>> virtual machine:
>>>>>> core-id    thread-0-id    thread-1-id
>>>>>> 0          0              1
>>>>>> 1          2              3
>>>>>> 2          4              5
>>>>>> 3          6              7
>>>>>>
>>>>>> When a disk is attached with two queues, all the even numbered cpus will be
Under virtual machine, all the cpus is followed by >>>>>> its sibling cpu.Before this patch, all the odd numbered cpus will also >>>>>> be mapped to queue 0, can cause serious imbalance.this will lead to >>>>>> performance impact on system IO >>>>>> >>>>>> So suggest to allocate cpu map by core id, this can be more currency >>>>>> >>>>>> Signed-off-by: luferry >>>>>> --- >>>>>> block/blk-mq-cpumap.c | 9 +++++---- >>>>>> 1 file changed, 5 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c >>>>>> index 03a534820271..4125e8e77679 100644 >>>>>> --- a/block/blk-mq-cpumap.c >>>>>> +++ b/block/blk-mq-cpumap.c >>>>>> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap) >>>>>> { >>>>>> unsigned int *map = qmap->mq_map; >>>>>> unsigned int nr_queues = qmap->nr_queues; >>>>>> - unsigned int cpu, first_sibling; >>>>>> + unsigned int cpu, first_sibling, core = 0; >>>>>> >>>>>> for_each_possible_cpu(cpu) { >>>>>> /* >>>>>> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap) >>>>>> map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu); >>>>>> } else { >>>>>> first_sibling = get_first_sibling(cpu); >>>>>> - if (first_sibling == cpu) >>>>>> - map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu); >>>>>> - else >>>>>> + if (first_sibling == cpu) { >>>>>> + map[cpu] = cpu_to_queue_index(qmap, nr_queues, core); >>>>>> + core++; >>>>>> + } else >>>>>> map[cpu] = map[first_sibling]; >>>>>> } >>>>>> } >>>>>>