2019-05-05 02:57:49

by Heyi Guo

Subject: ARM/gic-v4: deadlock occurred

Hi folks,

We observed deadlocks after enabling GICv4 and PCI passthrough on ARM64 virtual
machines, when not pinning VCPUs to physical CPUs.

We observed the warnings below after enabling lockdep debugging in the kernel:

[ 362.847021] =====================================================
[ 362.855643] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[ 362.864840] 4.19.34+ #7 Tainted: G W
[ 362.872314] -----------------------------------------------------
[ 362.881034] CPU 0/KVM/51468 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 362.890504] 00000000659c1dc9 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.22+0x0/0x48
[ 362.901413]
[ 362.901413] and this task is already holding:
[ 362.912976] 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
[ 362.928626] which would create a new lock dependency:
[ 362.936837] (&dev->event_map.vlpi_lock){....} -> (fs_reclaim){+.+.}
[ 362.946449]
[ 362.946449] but this new dependency connects a HARDIRQ-irq-safe lock:
[ 362.960877] (&irq_desc_lock_class){-.-.}
[ 362.960880]
[ 362.960880] ... which became HARDIRQ-irq-safe at:
[ 362.981234] lock_acquire+0xf0/0x258
[ 362.988337] _raw_spin_lock+0x54/0x90
[ 362.995543] handle_fasteoi_irq+0x2c/0x198
[ 363.003205] generic_handle_irq+0x34/0x50
[ 363.010787] __handle_domain_irq+0x68/0xc0
[ 363.018500] gic_handle_irq+0xf4/0x1e0
[ 363.025913] el1_irq+0xc8/0x180
[ 363.032683] _raw_spin_unlock_irq+0x40/0x60
[ 363.040512] finish_task_switch+0x98/0x258
[ 363.048254] __schedule+0x350/0xca8
[ 363.055359] schedule+0x40/0xa8
[ 363.062098] worker_thread+0xd8/0x410
[ 363.069340] kthread+0x134/0x138
[ 363.076070] ret_from_fork+0x10/0x18
[ 363.083111]
[ 363.083111] to a HARDIRQ-irq-unsafe lock:
[ 363.095213] (fs_reclaim){+.+.}
[ 363.095216]
[ 363.095216] ... which became HARDIRQ-irq-unsafe at:
[ 363.114527] ...
[ 363.114530] lock_acquire+0xf0/0x258
[ 363.126269] fs_reclaim_acquire.part.22+0x3c/0x48
[ 363.134206] fs_reclaim_acquire+0x2c/0x38
[ 363.141363] kmem_cache_alloc_trace+0x44/0x368
[ 363.148892] acpi_os_map_iomem+0x9c/0x208
[ 363.155934] acpi_os_map_memory+0x28/0x38
[ 363.162831] acpi_tb_acquire_table+0x58/0x8c
[ 363.170021] acpi_tb_validate_table+0x34/0x58
[ 363.177162] acpi_tb_get_table+0x4c/0x90
[ 363.183741] acpi_get_table+0x94/0xc4
[ 363.190020] find_acpi_cpu_topology_tag+0x54/0x240
[ 363.197404] find_acpi_cpu_topology_package+0x28/0x38
[ 363.204985] init_cpu_topology+0xdc/0x1e4
[ 363.211498] smp_prepare_cpus+0x2c/0x108
[ 363.217882] kernel_init_freeable+0x130/0x508
[ 363.224699] kernel_init+0x18/0x118
[ 363.230624] ret_from_fork+0x10/0x18
[ 363.236611]
[ 363.236611] other info that might help us debug this:
[ 363.236611]
[ 363.251604] Chain exists of:
[ 363.251604] &irq_desc_lock_class --> &dev->event_map.vlpi_lock --> fs_reclaim
[ 363.251604]
[ 363.270508] Possible interrupt unsafe locking scenario:
[ 363.270508]
[ 363.282238]        CPU0                    CPU1
[ 363.289228]        ----                    ----
[ 363.296189]   lock(fs_reclaim);
[ 363.301726]                                local_irq_disable();
[ 363.310122]                                lock(&irq_desc_lock_class);
[ 363.319143]                                lock(&dev->event_map.vlpi_lock);
[ 363.328617]   <Interrupt>
[ 363.333713]     lock(&irq_desc_lock_class);
[ 363.340414]
[ 363.340414] *** DEADLOCK ***
[ 363.340414]
[ 363.353682] 5 locks held by CPU 0/KVM/51468:
[ 363.360412] #0: 00000000eeb852a5 (&vdev->igate){+.+.}, at: vfio_pci_ioctl+0x2f8/0xed0
[ 363.370915] #1: 000000002ab491f7 (lock#9){+.+.}, at: irq_bypass_register_producer+0x6c/0x1d0
[ 363.382139] #2: 000000000d9fd5c6 (&its->its_lock){+.+.}, at: kvm_vgic_v4_set_forwarding+0xd0/0x188
[ 363.396625] #3: 00000000232bdc47 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x60/0xa0
[ 363.408486] #4: 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638


Then we found that irq_set_vcpu_affinity() in kernel/irq/manage.c acquires an
atomic context via irq_get_desc_lock() at the beginning, but
its_irq_set_vcpu_affinity() (drivers/irqchip/irq-gic-v3-its.c) still uses
mutex_lock, kcalloc, kfree, etc., which we think should be forbidden in atomic
context.
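
For clarity, this is roughly the calling context we are referring to (a
simplified sketch paraphrased from kernel/irq/manage.c, not the exact 4.19
source):

/*
 * Simplified sketch of kernel/irq/manage.c:irq_set_vcpu_affinity()
 * (paraphrased), showing why the irqchip callback runs in atomic context.
 */
int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info)
{
        unsigned long flags;
        /* Takes desc->lock (a raw spinlock) with interrupts disabled. */
        struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
        struct irq_chip *chip;
        int ret = -ENOSYS;

        if (!desc)
                return -EINVAL;

        chip = irq_desc_get_chip(desc);
        if (chip && chip->irq_set_vcpu_affinity)
                /*
                 * For the ITS this eventually reaches
                 * its_irq_set_vcpu_affinity(), which today takes a mutex
                 * and calls kcalloc()/kfree() with GFP_KERNEL -- all of
                 * which may sleep.
                 */
                ret = chip->irq_set_vcpu_affinity(irq_desc_get_irq_data(desc),
                                                  vcpu_info);

        /* Releases desc->lock and restores the interrupt state. */
        irq_put_desc_unlock(desc, flags);
        return ret;
}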

Though the issue is observed on 4.19.34, we haven't found any related fixes in mainline yet.

Please advise.

Thanks,

Heyi




2019-05-05 10:39:57

by Marc Zyngier

Subject: Re: ARM/gic-v4: deadlock occurred

[+ kvmarm]

Hi Heyi,

On Sun, 05 May 2019 03:26:18 +0100,
Heyi Guo <[email protected]> wrote:
>
> Hi folks,
>
> We observed deadlocks after enabling GICv4 and PCI passthrough on
> ARM64 virtual machines, when not pinning VCPU to physical CPU.
>
> We observed below warnings after enabling lockdep debug in kernel:
>
> [ 362.847021] =====================================================
> [ 362.855643] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> [ 362.864840] 4.19.34+ #7 Tainted: G W
> [ 362.872314] -----------------------------------------------------
> [ 362.881034] CPU 0/KVM/51468 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [ 362.890504] 00000000659c1dc9 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.22+0x0/0x48
> [ 362.901413]
> [ 362.901413] and this task is already holding:
> [ 362.912976] 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
> [ 362.928626] which would create a new lock dependency:
> [ 362.936837] (&dev->event_map.vlpi_lock){....} -> (fs_reclaim){+.+.}
> [ 362.946449]
> [ 362.946449] but this new dependency connects a HARDIRQ-irq-safe lock:
> [ 362.960877] (&irq_desc_lock_class){-.-.}
> [ 362.960880]
> [ 362.960880] ... which became HARDIRQ-irq-safe at:
> [ 362.981234] lock_acquire+0xf0/0x258
> [ 362.988337] _raw_spin_lock+0x54/0x90
> [ 362.995543] handle_fasteoi_irq+0x2c/0x198
> [ 363.003205] generic_handle_irq+0x34/0x50
> [ 363.010787] __handle_domain_irq+0x68/0xc0
> [ 363.018500] gic_handle_irq+0xf4/0x1e0
> [ 363.025913] el1_irq+0xc8/0x180
> [ 363.032683] _raw_spin_unlock_irq+0x40/0x60
> [ 363.040512] finish_task_switch+0x98/0x258
> [ 363.048254] __schedule+0x350/0xca8
> [ 363.055359] schedule+0x40/0xa8
> [ 363.062098] worker_thread+0xd8/0x410
> [ 363.069340] kthread+0x134/0x138
> [ 363.076070] ret_from_fork+0x10/0x18
> [ 363.083111]
> [ 363.083111] to a HARDIRQ-irq-unsafe lock:
> [ 363.095213] (fs_reclaim){+.+.}
> [ 363.095216]
> [ 363.095216] ... which became HARDIRQ-irq-unsafe at:
> [ 363.114527] ...
> [ 363.114530] lock_acquire+0xf0/0x258
> [ 363.126269] fs_reclaim_acquire.part.22+0x3c/0x48
> [ 363.134206] fs_reclaim_acquire+0x2c/0x38
> [ 363.141363] kmem_cache_alloc_trace+0x44/0x368
> [ 363.148892] acpi_os_map_iomem+0x9c/0x208
> [ 363.155934] acpi_os_map_memory+0x28/0x38
> [ 363.162831] acpi_tb_acquire_table+0x58/0x8c
> [ 363.170021] acpi_tb_validate_table+0x34/0x58
> [ 363.177162] acpi_tb_get_table+0x4c/0x90
> [ 363.183741] acpi_get_table+0x94/0xc4
> [ 363.190020] find_acpi_cpu_topology_tag+0x54/0x240
> [ 363.197404] find_acpi_cpu_topology_package+0x28/0x38
> [ 363.204985] init_cpu_topology+0xdc/0x1e4
> [ 363.211498] smp_prepare_cpus+0x2c/0x108
> [ 363.217882] kernel_init_freeable+0x130/0x508
> [ 363.224699] kernel_init+0x18/0x118
> [ 363.230624] ret_from_fork+0x10/0x18
> [ 363.236611]
> [ 363.236611] other info that might help us debug this:
> [ 363.236611]
> [ 363.251604] Chain exists of:
> [ 363.251604] &irq_desc_lock_class --> &dev->event_map.vlpi_lock --> fs_reclaim
> [ 363.251604]
> [ 363.270508] Possible interrupt unsafe locking scenario:
> [ 363.270508]
> [ 363.282238]        CPU0                    CPU1
> [ 363.289228]        ----                    ----
> [ 363.296189]   lock(fs_reclaim);
> [ 363.301726]                                local_irq_disable();
> [ 363.310122]                                lock(&irq_desc_lock_class);
> [ 363.319143]                                lock(&dev->event_map.vlpi_lock);
> [ 363.328617]   <Interrupt>
> [ 363.333713]     lock(&irq_desc_lock_class);
> [ 363.340414]
> [ 363.340414] *** DEADLOCK ***
> [ 363.340414]
> [ 363.353682] 5 locks held by CPU 0/KVM/51468:
> [ 363.360412] #0: 00000000eeb852a5 (&vdev->igate){+.+.}, at: vfio_pci_ioctl+0x2f8/0xed0
> [ 363.370915] #1: 000000002ab491f7 (lock#9){+.+.}, at: irq_bypass_register_producer+0x6c/0x1d0
> [ 363.382139] #2: 000000000d9fd5c6 (&its->its_lock){+.+.}, at: kvm_vgic_v4_set_forwarding+0xd0/0x188
> [ 363.396625] #3: 00000000232bdc47 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x60/0xa0
> [ 363.408486] #4: 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>
>
> Then we found that irq_set_vcpu_affinity() in kernel/irq/manage.c
> acquires an atomic context via irq_get_desc_lock() at the beginning,
> but its_irq_set_vcpu_affinity()
> (drivers/irqchip/irq-gic-v3-its.c) still uses mutex_lock,
> kcalloc, kfree, etc., which we think should be forbidden in atomic
> context.
>
> Though the issue is observed in 4.19.34, we don't find any related
> fixes in the mainline yet.

Thanks for the report. Given that you're the only users of GICv4,
you're bound to find a number of these issues.

Can you try the patch below and let me know whether it helps? This is
the simplest thing I can think of to paper over the issue, but it
isn't pretty, and I'm looking at possible alternatives (ideally, we'd
be able to allocate the map outside of the irqdesc lock, but this
requires some API change between KVM, the GICv4 layer and the ITS
code).

Note that I'm travelling for the next two weeks without access to my
test rig, so I'm relying on you to test this stuff.

Thanks,

M.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 7577755bdcf4..18aa04b6a9f4 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -142,7 +142,7 @@ struct event_lpi_map {
u16 *col_map;
irq_hw_number_t lpi_base;
int nr_lpis;
- struct mutex vlpi_lock;
+ raw_spinlock_t vlpi_lock;
struct its_vm *vm;
struct its_vlpi_map *vlpi_maps;
int nr_vlpis;
@@ -1263,13 +1263,13 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
if (!info->map)
return -EINVAL;

- mutex_lock(&its_dev->event_map.vlpi_lock);
+ raw_spin_lock(&its_dev->event_map.vlpi_lock);

if (!its_dev->event_map.vm) {
struct its_vlpi_map *maps;

maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps),
- GFP_KERNEL);
+ GFP_ATOMIC);
if (!maps) {
ret = -ENOMEM;
goto out;
@@ -1312,7 +1312,7 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
}

out:
- mutex_unlock(&its_dev->event_map.vlpi_lock);
+ raw_spin_unlock(&its_dev->event_map.vlpi_lock);
return ret;
}

@@ -1322,7 +1322,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
u32 event = its_get_event_id(d);
int ret = 0;

- mutex_lock(&its_dev->event_map.vlpi_lock);
+ raw_spin_lock(&its_dev->event_map.vlpi_lock);

if (!its_dev->event_map.vm ||
!its_dev->event_map.vlpi_maps[event].vm) {
@@ -1334,7 +1334,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
*info->map = its_dev->event_map.vlpi_maps[event];

out:
- mutex_unlock(&its_dev->event_map.vlpi_lock);
+ raw_spin_unlock(&its_dev->event_map.vlpi_lock);
return ret;
}

@@ -1344,7 +1344,7 @@ static int its_vlpi_unmap(struct irq_data *d)
u32 event = its_get_event_id(d);
int ret = 0;

- mutex_lock(&its_dev->event_map.vlpi_lock);
+ raw_spin_lock(&its_dev->event_map.vlpi_lock);

if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d)) {
ret = -EINVAL;
@@ -1374,7 +1374,7 @@ static int its_vlpi_unmap(struct irq_data *d)
}

out:
- mutex_unlock(&its_dev->event_map.vlpi_lock);
+ raw_spin_unlock(&its_dev->event_map.vlpi_lock);
return ret;
}

@@ -2436,7 +2436,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
dev->event_map.col_map = col_map;
dev->event_map.lpi_base = lpi_base;
dev->event_map.nr_lpis = nr_lpis;
- mutex_init(&dev->event_map.vlpi_lock);
+ raw_spin_lock_init(&dev->event_map.vlpi_lock);
dev->device_id = dev_id;
INIT_LIST_HEAD(&dev->entry);


--
Jazz is not dead, it just smells funny.

2019-05-05 11:09:38

by Heyi Guo

Subject: Re: ARM/gic-v4: deadlock occurred

Hi Marc,

Appreciate your quick patch :) We'll test it and let you know the result.

Heyi


On 2019/5/5 18:38, Marc Zyngier wrote:
> [+ kvmarm]
>
> Hi Heyi,
>
> On Sun, 05 May 2019 03:26:18 +0100,
> Heyi Guo <[email protected]> wrote:
>> Hi folks,
>>
>> We observed deadlocks after enabling GICv4 and PCI passthrough on
>> ARM64 virtual machines, when not pinning VCPU to physical CPU.
>>
>> We observed below warnings after enabling lockdep debug in kernel:
>>
>> [ 362.847021] =====================================================
>> [ 362.855643] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
>> [ 362.864840] 4.19.34+ #7 Tainted: G W
>> [ 362.872314] -----------------------------------------------------
>> [ 362.881034] CPU 0/KVM/51468 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
>> [ 362.890504] 00000000659c1dc9 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.22+0x0/0x48
>> [ 362.901413]
>> [ 362.901413] and this task is already holding:
>> [ 362.912976] 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>> [ 362.928626] which would create a new lock dependency:
>> [ 362.936837] (&dev->event_map.vlpi_lock){....} -> (fs_reclaim){+.+.}
>> [ 362.946449]
>> [ 362.946449] but this new dependency connects a HARDIRQ-irq-safe lock:
>> [ 362.960877] (&irq_desc_lock_class){-.-.}
>> [ 362.960880]
>> [ 362.960880] ... which became HARDIRQ-irq-safe at:
>> [ 362.981234] lock_acquire+0xf0/0x258
>> [ 362.988337] _raw_spin_lock+0x54/0x90
>> [ 362.995543] handle_fasteoi_irq+0x2c/0x198
>> [ 363.003205] generic_handle_irq+0x34/0x50
>> [ 363.010787] __handle_domain_irq+0x68/0xc0
>> [ 363.018500] gic_handle_irq+0xf4/0x1e0
>> [ 363.025913] el1_irq+0xc8/0x180
>> [ 363.032683] _raw_spin_unlock_irq+0x40/0x60
>> [ 363.040512] finish_task_switch+0x98/0x258
>> [ 363.048254] __schedule+0x350/0xca8
>> [ 363.055359] schedule+0x40/0xa8
>> [ 363.062098] worker_thread+0xd8/0x410
>> [ 363.069340] kthread+0x134/0x138
>> [ 363.076070] ret_from_fork+0x10/0x18
>> [ 363.083111]
>> [ 363.083111] to a HARDIRQ-irq-unsafe lock:
>> [ 363.095213] (fs_reclaim){+.+.}
>> [ 363.095216]
>> [ 363.095216] ... which became HARDIRQ-irq-unsafe at:
>> [ 363.114527] ...
>> [ 363.114530] lock_acquire+0xf0/0x258
>> [ 363.126269] fs_reclaim_acquire.part.22+0x3c/0x48
>> [ 363.134206] fs_reclaim_acquire+0x2c/0x38
>> [ 363.141363] kmem_cache_alloc_trace+0x44/0x368
>> [ 363.148892] acpi_os_map_iomem+0x9c/0x208
>> [ 363.155934] acpi_os_map_memory+0x28/0x38
>> [ 363.162831] acpi_tb_acquire_table+0x58/0x8c
>> [ 363.170021] acpi_tb_validate_table+0x34/0x58
>> [ 363.177162] acpi_tb_get_table+0x4c/0x90
>> [ 363.183741] acpi_get_table+0x94/0xc4
>> [ 363.190020] find_acpi_cpu_topology_tag+0x54/0x240
>> [ 363.197404] find_acpi_cpu_topology_package+0x28/0x38
>> [ 363.204985] init_cpu_topology+0xdc/0x1e4
>> [ 363.211498] smp_prepare_cpus+0x2c/0x108
>> [ 363.217882] kernel_init_freeable+0x130/0x508
>> [ 363.224699] kernel_init+0x18/0x118
>> [ 363.230624] ret_from_fork+0x10/0x18
>> [ 363.236611]
>> [ 363.236611] other info that might help us debug this:
>> [ 363.236611]
>> [ 363.251604] Chain exists of:
>> [ 363.251604] &irq_desc_lock_class --> &dev->event_map.vlpi_lock --> fs_reclaim
>> [ 363.251604]
>> [ 363.270508] Possible interrupt unsafe locking scenario:
>> [ 363.270508]
>> [ 363.282238]        CPU0                    CPU1
>> [ 363.289228]        ----                    ----
>> [ 363.296189]   lock(fs_reclaim);
>> [ 363.301726]                                local_irq_disable();
>> [ 363.310122]                                lock(&irq_desc_lock_class);
>> [ 363.319143]                                lock(&dev->event_map.vlpi_lock);
>> [ 363.328617]   <Interrupt>
>> [ 363.333713]     lock(&irq_desc_lock_class);
>> [ 363.340414]
>> [ 363.340414] *** DEADLOCK ***
>> [ 363.340414]
>> [ 363.353682] 5 locks held by CPU 0/KVM/51468:
>> [ 363.360412] #0: 00000000eeb852a5 (&vdev->igate){+.+.}, at: vfio_pci_ioctl+0x2f8/0xed0
>> [ 363.370915] #1: 000000002ab491f7 (lock#9){+.+.}, at: irq_bypass_register_producer+0x6c/0x1d0
>> [ 363.382139] #2: 000000000d9fd5c6 (&its->its_lock){+.+.}, at: kvm_vgic_v4_set_forwarding+0xd0/0x188
>> [ 363.396625] #3: 00000000232bdc47 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x60/0xa0
>> [ 363.408486] #4: 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>>
>>
>> Then we found that irq_set_vcpu_affinity() in kernel/irq/manage.c
>> acquires an atomic context via irq_get_desc_lock() at the beginning,
>> but its_irq_set_vcpu_affinity()
>> (drivers/irqchip/irq-gic-v3-its.c) still uses mutex_lock,
>> kcalloc, kfree, etc., which we think should be forbidden in atomic
>> context.
>>
>> Though the issue is observed in 4.19.34, we don't find any related
>> fixes in the mainline yet.
> Thanks for the report. Given that you're the only users of GICv4,
> you're bound to find a number of these issues.
>
> Can you try the patch below and let me know whether it helps? This is
> the simplest thing I can think of to paper over the issue, but it
> isn't pretty, and I'm looking at possible alternatives (ideally, we'd
> be able to allocate the map outside of the irqdesc lock, but this
> requires some API change between KVM, the GICv4 layer and the ITS
> code).
>
> Note that I'm travelling for the next two weeks without access to my
> test rig, so I'm relying on you to test this stuff.
>
> Thanks,
>
> M.
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 7577755bdcf4..18aa04b6a9f4 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -142,7 +142,7 @@ struct event_lpi_map {
> u16 *col_map;
> irq_hw_number_t lpi_base;
> int nr_lpis;
> - struct mutex vlpi_lock;
> + raw_spinlock_t vlpi_lock;
> struct its_vm *vm;
> struct its_vlpi_map *vlpi_maps;
> int nr_vlpis;
> @@ -1263,13 +1263,13 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
> if (!info->map)
> return -EINVAL;
>
> - mutex_lock(&its_dev->event_map.vlpi_lock);
> + raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
> if (!its_dev->event_map.vm) {
> struct its_vlpi_map *maps;
>
> maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps),
> - GFP_KERNEL);
> + GFP_ATOMIC);
> if (!maps) {
> ret = -ENOMEM;
> goto out;
> @@ -1312,7 +1312,7 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
> }
>
> out:
> - mutex_unlock(&its_dev->event_map.vlpi_lock);
> + raw_spin_unlock(&its_dev->event_map.vlpi_lock);
> return ret;
> }
>
> @@ -1322,7 +1322,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
> u32 event = its_get_event_id(d);
> int ret = 0;
>
> - mutex_lock(&its_dev->event_map.vlpi_lock);
> + raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
> if (!its_dev->event_map.vm ||
> !its_dev->event_map.vlpi_maps[event].vm) {
> @@ -1334,7 +1334,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
> *info->map = its_dev->event_map.vlpi_maps[event];
>
> out:
> - mutex_unlock(&its_dev->event_map.vlpi_lock);
> + raw_spin_unlock(&its_dev->event_map.vlpi_lock);
> return ret;
> }
>
> @@ -1344,7 +1344,7 @@ static int its_vlpi_unmap(struct irq_data *d)
> u32 event = its_get_event_id(d);
> int ret = 0;
>
> - mutex_lock(&its_dev->event_map.vlpi_lock);
> + raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
> if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d)) {
> ret = -EINVAL;
> @@ -1374,7 +1374,7 @@ static int its_vlpi_unmap(struct irq_data *d)
> }
>
> out:
> - mutex_unlock(&its_dev->event_map.vlpi_lock);
> + raw_spin_unlock(&its_dev->event_map.vlpi_lock);
> return ret;
> }
>
> @@ -2436,7 +2436,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
> dev->event_map.col_map = col_map;
> dev->event_map.lpi_base = lpi_base;
> dev->event_map.nr_lpis = nr_lpis;
> - mutex_init(&dev->event_map.vlpi_lock);
> + raw_spin_lock_init(&dev->event_map.vlpi_lock);
> dev->device_id = dev_id;
> INIT_LIST_HEAD(&dev->entry);
>
>


2019-05-05 11:18:43

by Heyi Guo

Subject: Re: ARM/gic-v4: deadlock occurred

BTW, since its_irq_set_vcpu_affinity() is already called in atomic context, do we really need the separate lock its_dev->event_map.vlpi_lock? I didn't find anything outside the its_irq_set_vcpu_affinity() call chain that acquires this lock.

Thanks,

Heyi


On 2019/5/5 18:38, Marc Zyngier wrote:
> [+ kvmarm]
>
> Hi Heyi,
>
> On Sun, 05 May 2019 03:26:18 +0100,
> Heyi Guo <[email protected]> wrote:
>> Hi folks,
>>
>> We observed deadlocks after enabling GICv4 and PCI passthrough on
>> ARM64 virtual machines, when not pinning VCPU to physical CPU.
>>
>> We observed below warnings after enabling lockdep debug in kernel:
>>
>> [ 362.847021] =====================================================
>> [ 362.855643] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
>> [ 362.864840] 4.19.34+ #7 Tainted: G W
>> [ 362.872314] -----------------------------------------------------
>> [ 362.881034] CPU 0/KVM/51468 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
>> [ 362.890504] 00000000659c1dc9 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.22+0x0/0x48
>> [ 362.901413]
>> [ 362.901413] and this task is already holding:
>> [ 362.912976] 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>> [ 362.928626] which would create a new lock dependency:
>> [ 362.936837] (&dev->event_map.vlpi_lock){....} -> (fs_reclaim){+.+.}
>> [ 362.946449]
>> [ 362.946449] but this new dependency connects a HARDIRQ-irq-safe lock:
>> [ 362.960877] (&irq_desc_lock_class){-.-.}
>> [ 362.960880]
>> [ 362.960880] ... which became HARDIRQ-irq-safe at:
>> [ 362.981234] lock_acquire+0xf0/0x258
>> [ 362.988337] _raw_spin_lock+0x54/0x90
>> [ 362.995543] handle_fasteoi_irq+0x2c/0x198
>> [ 363.003205] generic_handle_irq+0x34/0x50
>> [ 363.010787] __handle_domain_irq+0x68/0xc0
>> [ 363.018500] gic_handle_irq+0xf4/0x1e0
>> [ 363.025913] el1_irq+0xc8/0x180
>> [ 363.032683] _raw_spin_unlock_irq+0x40/0x60
>> [ 363.040512] finish_task_switch+0x98/0x258
>> [ 363.048254] __schedule+0x350/0xca8
>> [ 363.055359] schedule+0x40/0xa8
>> [ 363.062098] worker_thread+0xd8/0x410
>> [ 363.069340] kthread+0x134/0x138
>> [ 363.076070] ret_from_fork+0x10/0x18
>> [ 363.083111]
>> [ 363.083111] to a HARDIRQ-irq-unsafe lock:
>> [ 363.095213] (fs_reclaim){+.+.}
>> [ 363.095216]
>> [ 363.095216] ... which became HARDIRQ-irq-unsafe at:
>> [ 363.114527] ...
>> [ 363.114530] lock_acquire+0xf0/0x258
>> [ 363.126269] fs_reclaim_acquire.part.22+0x3c/0x48
>> [ 363.134206] fs_reclaim_acquire+0x2c/0x38
>> [ 363.141363] kmem_cache_alloc_trace+0x44/0x368
>> [ 363.148892] acpi_os_map_iomem+0x9c/0x208
>> [ 363.155934] acpi_os_map_memory+0x28/0x38
>> [ 363.162831] acpi_tb_acquire_table+0x58/0x8c
>> [ 363.170021] acpi_tb_validate_table+0x34/0x58
>> [ 363.177162] acpi_tb_get_table+0x4c/0x90
>> [ 363.183741] acpi_get_table+0x94/0xc4
>> [ 363.190020] find_acpi_cpu_topology_tag+0x54/0x240
>> [ 363.197404] find_acpi_cpu_topology_package+0x28/0x38
>> [ 363.204985] init_cpu_topology+0xdc/0x1e4
>> [ 363.211498] smp_prepare_cpus+0x2c/0x108
>> [ 363.217882] kernel_init_freeable+0x130/0x508
>> [ 363.224699] kernel_init+0x18/0x118
>> [ 363.230624] ret_from_fork+0x10/0x18
>> [ 363.236611]
>> [ 363.236611] other info that might help us debug this:
>> [ 363.236611]
>> [ 363.251604] Chain exists of:
>> [ 363.251604] &irq_desc_lock_class --> &dev->event_map.vlpi_lock --> fs_reclaim
>> [ 363.251604]
>> [ 363.270508] Possible interrupt unsafe locking scenario:
>> [ 363.270508]
>> [ 363.282238]        CPU0                    CPU1
>> [ 363.289228]        ----                    ----
>> [ 363.296189]   lock(fs_reclaim);
>> [ 363.301726]                                local_irq_disable();
>> [ 363.310122]                                lock(&irq_desc_lock_class);
>> [ 363.319143]                                lock(&dev->event_map.vlpi_lock);
>> [ 363.328617]   <Interrupt>
>> [ 363.333713]     lock(&irq_desc_lock_class);
>> [ 363.340414]
>> [ 363.340414] *** DEADLOCK ***
>> [ 363.340414]
>> [ 363.353682] 5 locks held by CPU 0/KVM/51468:
>> [ 363.360412] #0: 00000000eeb852a5 (&vdev->igate){+.+.}, at: vfio_pci_ioctl+0x2f8/0xed0
>> [ 363.370915] #1: 000000002ab491f7 (lock#9){+.+.}, at: irq_bypass_register_producer+0x6c/0x1d0
>> [ 363.382139] #2: 000000000d9fd5c6 (&its->its_lock){+.+.}, at: kvm_vgic_v4_set_forwarding+0xd0/0x188
>> [ 363.396625] #3: 00000000232bdc47 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x60/0xa0
>> [ 363.408486] #4: 000000007318873f (&dev->event_map.vlpi_lock){....}, at: its_irq_set_vcpu_affinity+0x134/0x638
>>
>>
>> Then we found that irq_set_vcpu_affinity() in kernel/irq/manage.c
>> acquires an atomic context via irq_get_desc_lock() at the beginning,
>> but its_irq_set_vcpu_affinity()
>> (drivers/irqchip/irq-gic-v3-its.c) still uses mutex_lock,
>> kcalloc, kfree, etc., which we think should be forbidden in atomic
>> context.
>>
>> Though the issue is observed in 4.19.34, we don't find any related
>> fixes in the mainline yet.
> Thanks for the report. Given that you're the only users of GICv4,
> you're bound to find a number of these issues.
>
> Can you try the patch below and let me know whether it helps? This is
> the simplest thing I can think of to paper over the issue, but it
> isn't pretty, and I'm looking at possible alternatives (ideally, we'd
> be able to allocate the map outside of the irqdesc lock, but this
> requires some API change between KVM, the GICv4 layer and the ITS
> code).
>
> Note that I'm travelling for the next two weeks without access to my
> test rig, so I'm relying on you to test this stuff.
>
> Thanks,
>
> M.
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 7577755bdcf4..18aa04b6a9f4 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -142,7 +142,7 @@ struct event_lpi_map {
> u16 *col_map;
> irq_hw_number_t lpi_base;
> int nr_lpis;
> - struct mutex vlpi_lock;
> + raw_spinlock_t vlpi_lock;
> struct its_vm *vm;
> struct its_vlpi_map *vlpi_maps;
> int nr_vlpis;
> @@ -1263,13 +1263,13 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
> if (!info->map)
> return -EINVAL;
>
> - mutex_lock(&its_dev->event_map.vlpi_lock);
> + raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
> if (!its_dev->event_map.vm) {
> struct its_vlpi_map *maps;
>
> maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps),
> - GFP_KERNEL);
> + GFP_ATOMIC);
> if (!maps) {
> ret = -ENOMEM;
> goto out;
> @@ -1312,7 +1312,7 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
> }
>
> out:
> - mutex_unlock(&its_dev->event_map.vlpi_lock);
> + raw_spin_unlock(&its_dev->event_map.vlpi_lock);
> return ret;
> }
>
> @@ -1322,7 +1322,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
> u32 event = its_get_event_id(d);
> int ret = 0;
>
> - mutex_lock(&its_dev->event_map.vlpi_lock);
> + raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
> if (!its_dev->event_map.vm ||
> !its_dev->event_map.vlpi_maps[event].vm) {
> @@ -1334,7 +1334,7 @@ static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info)
> *info->map = its_dev->event_map.vlpi_maps[event];
>
> out:
> - mutex_unlock(&its_dev->event_map.vlpi_lock);
> + raw_spin_unlock(&its_dev->event_map.vlpi_lock);
> return ret;
> }
>
> @@ -1344,7 +1344,7 @@ static int its_vlpi_unmap(struct irq_data *d)
> u32 event = its_get_event_id(d);
> int ret = 0;
>
> - mutex_lock(&its_dev->event_map.vlpi_lock);
> + raw_spin_lock(&its_dev->event_map.vlpi_lock);
>
> if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d)) {
> ret = -EINVAL;
> @@ -1374,7 +1374,7 @@ static int its_vlpi_unmap(struct irq_data *d)
> }
>
> out:
> - mutex_unlock(&its_dev->event_map.vlpi_lock);
> + raw_spin_unlock(&its_dev->event_map.vlpi_lock);
> return ret;
> }
>
> @@ -2436,7 +2436,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
> dev->event_map.col_map = col_map;
> dev->event_map.lpi_base = lpi_base;
> dev->event_map.nr_lpis = nr_lpis;
> - mutex_init(&dev->event_map.vlpi_lock);
> + raw_spin_lock_init(&dev->event_map.vlpi_lock);
> dev->device_id = dev_id;
> INIT_LIST_HEAD(&dev->entry);
>
>


2019-05-06 09:23:25

by Marc Zyngier

Subject: Re: ARM/gic-v4: deadlock occurred

On Sun, 05 May 2019 12:15:51 +0100,
Heyi Guo <[email protected]> wrote:
>
> BTW, since its_irq_set_vcpu_affinity() is already called in atomic
> context, do we really need the separate lock
> its_dev->event_map.vlpi_lock? I didn't find anything outside the
> its_irq_set_vcpu_affinity() call chain that acquires this lock.

The reason is that the vlpi_maps array covers the whole of the
generating device, and not just a single interrupt. Relying on the
irq_desc lock to protect the array wouldn't work, as you could still
have concurrent accesses to the array (map, unmap and get all access
the same data).
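
To make it concrete, here is a rough illustration (not the actual driver
code; the field names come from the event_lpi_map structure shown in the
patch above, and the lock is the raw spinlock from that patch) of the kind
of race the device-wide lock guards against:

/*
 * Illustration only: two events of the same its_dev run under *different*
 * irq_desc locks, yet both manipulate shared per-device state.
 */
static void event0_map(struct its_device *its_dev, struct its_vlpi_map *map)
{
        /* e.g. its_vlpi_map() on CPU0, holding the irq_desc lock of event 0 */
        raw_spin_lock(&its_dev->event_map.vlpi_lock);
        its_dev->event_map.vlpi_maps[0] = *map;         /* shared array   */
        its_dev->event_map.nr_vlpis++;                  /* shared counter */
        raw_spin_unlock(&its_dev->event_map.vlpi_lock);
}

static void event1_unmap(struct its_device *its_dev)
{
        /* e.g. its_vlpi_unmap() on CPU1, holding the irq_desc lock of event 1 */
        raw_spin_lock(&its_dev->event_map.vlpi_lock);
        if (!--its_dev->event_map.nr_vlpis)
                /* the last unmap tears down state the map path relies on */
                its_dev->event_map.vm = NULL;
        raw_spin_unlock(&its_dev->event_map.vlpi_lock);
}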

So one way or another, we need some form of mutual exclusion at this
level. I guess one of the design mistakes that we have in the current
code is that there is no "device wide" operation, and that we rely on
map/unmap to perform the allocations on demand in the low level code.

What we could potentially do would be to move this allocation higher
up in the stack, and track the first time an LPI is turned into a VLPI
at that level. That's an invasive change though...

Thanks,

M.

--
Jazz is not dead, it just smells funny.

2019-05-08 12:34:43

by Marc Zyngier

Subject: Re: ARM/gic-v4: deadlock occurred

On Sun, 05 May 2019 12:07:02 +0100,
Heyi Guo <[email protected]> wrote:
>
> Hi Marc,
>
> Appreciate your quick patch :) We'll test it and let you know the
> result.

For what it is worth, I've just pushed out a branch (irq/vlpi-map-rework)
with a more complete fix. It is of course untested (it does compile though).

Please let me know how this fares.

Thanks,

M.

--
Jazz is not dead, it just smells funny.

2019-05-08 14:43:52

by Heyi Guo

Subject: Re: ARM/gic-v4: deadlock occurred

Hi Marc,

The bad news is that though your previous patch fixed the lockdep warnings, we can still reproduce soft lockup panics and some other exceptions... So our issue may not be related to this lock defect.

Most of the call traces are as below, stuck in smp_call_function_many:

[ 6862.660611] watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [CPU 18/KVM:95311]
[ 6862.668283] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm hns_roce_hw_v2 hns_roce aes_ce_blk crypto_simd ib_core cryptd aes_ce_cipher crc32_ce ghash_ce sha2_ce sha256_arm64 sha1_ce marvell ses enclosure hibmc_drm ttm drm_kms_helper drm sg ixgbe mdio fb_sys_fops syscopyarea hns3 hclge sysfillrect hnae3 sysimgblt sbsa_gwdt vhost_net tun vhost tap ip_tables dm_mod megaraid_sas hisi_sas_v3_hw hisi_sas_main ipmi_si ipmi_devintf ipmi_msghandler br_netfilter xt_sctp
[ 6862.668519] irq event stamp: 1670812
[ 6862.668526] hardirqs last enabled at (1670811): [<ffff000008083498>] el1_irq+0xd8/0x180
[ 6862.668530] hardirqs last disabled at (1670812): [<ffff000008083448>] el1_irq+0x88/0x180
[ 6862.668534] softirqs last enabled at (1661542): [<ffff000008081d2c>] __do_softirq+0x41c/0x51c
[ 6862.668539] softirqs last disabled at (1661535): [<ffff0000080fafc4>] irq_exit+0x18c/0x198
[ 6862.668544] CPU: 27 PID: 95311 Comm: CPU 18/KVM Kdump: loaded Tainted: G W 4.19.36-1.2.141.aarch64 #1
[ 6862.668548] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS TaiShan 2280 V2 - B900 01/29/2019
[ 6862.668551] pstate: 80400009 (Nzcv daif +PAN -UAO)
[ 6862.668557] pc : smp_call_function_many+0x360/0x3b8
[ 6862.668560] lr : smp_call_function_many+0x320/0x3b8
[ 6862.668563] sp : ffff000028f338e0
[ 6862.668566] x29: ffff000028f338e0 x28: ffff000009893fb4
[ 6862.668575] x27: 0000000000000400 x26: 0000000000000000
[ 6862.668583] x25: ffff0000080b1e08 x24: 0000000000000001
[ 6862.668591] x23: ffff000009891bc8 x22: ffff000009891bc8
[ 6862.668599] x21: ffff805f7d6da408 x20: ffff000009893fb4
[ 6862.668608] x19: ffff805f7d6da400 x18: 0000000000000000
[ 6862.668616] x17: 0000000000000000 x16: 0000000000000000
[ 6862.668624] x15: 0000000000000000 x14: 0000000000000000
[ 6862.668632] x13: 0000000000000040 x12: 0000000000000228
[ 6862.668640] x11: 0000000000000020 x10: 0000000000000040
[ 6862.668648] x9 : 0000000000000000 x8 : 0000000000000010
[ 6862.668656] x7 : 0000000000000000 x6 : ffff805f7d329660
[ 6862.668664] x5 : ffff000028f33850 x4 : 0000000002000402
[ 6862.668673] x3 : 0000000000000000 x2 : ffff803f7f3dc678
[ 6862.668681] x1 : 0000000000000003 x0 : 000000000000000a
[ 6862.668689] Call trace:
[ 6862.668693] smp_call_function_many+0x360/0x3b8
[ 6862.668697] kvm_make_vcpus_request_mask+0x1dc/0x240
[ 6862.668701] kvm_make_all_cpus_request+0x70/0x98
[ 6862.668705] kvm_arm_halt_guest+0x70/0x80
[ 6862.668708] kvm_arch_irq_bypass_stop+0x24/0x30
[ 6862.668714] __disconnect+0x3c/0x90
[ 6862.668718] irq_bypass_unregister_producer+0xe0/0x138
[ 6862.668724] vfio_msi_set_vector_signal+0x9c/0x2a8
[ 6862.668727] vfio_msi_set_block+0x78/0x108
[ 6862.668730] vfio_pci_set_msi_trigger+0x238/0x2e0
[ 6862.668734] vfio_pci_set_irqs_ioctl+0xf8/0x140
[ 6862.668737] vfio_pci_ioctl+0x30c/0xed0
[ 6862.668743] vfio_device_fops_unl_ioctl+0x44/0x70
[ 6862.668748] do_vfs_ioctl+0xc4/0x7f0
[ 6862.668752] ksys_ioctl+0x8c/0xa0
[ 6862.668755] __arm64_sys_ioctl+0x28/0x38
[ 6862.668760] el0_svc_common+0xa0/0x190
[ 6862.668763] el0_svc_handler+0x38/0x78
[ 6862.668767] el0_svc+0x8/0xc
[ 6862.668771] Kernel panic - not syncing: softlockup: hung tasks
[ 6862.674729] CPU: 27 PID: 95311 Comm: CPU 18/KVM Kdump: loaded Tainted: G W L 4.19.36-1.2.141.aarch64 #1
[ 6862.685479] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS TaiShan 2280 V2 - B900 01/29/2019
[ 6862.695346] Call trace:
[ 6862.697831] dump_backtrace+0x0/0x198
[ 6862.701562] show_stack+0x24/0x30
[ 6862.704940] dump_stack+0xd0/0x11c
[ 6862.708409] panic+0x138/0x324
[ 6862.711521] watchdog_timer_fn+0x344/0x348
[ 6862.715700] __run_hrtimer+0xfc/0x4d0
[ 6862.719431] __hrtimer_run_queues+0xd4/0x140
[ 6862.723785] hrtimer_interrupt+0xd8/0x250
[ 6862.727876] arch_timer_handler_phys+0x38/0x58
[ 6862.732410] handle_percpu_devid_irq+0xd0/0x3f8
[ 6862.737033] generic_handle_irq+0x34/0x50
[ 6862.741119] __handle_domain_irq+0x68/0xc0
[ 6862.745295] gic_handle_irq+0xf4/0x1e0
[ 6862.749117] el1_irq+0xc8/0x180
[ 6862.752316] smp_call_function_many+0x360/0x3b8
[ 6862.756937] kvm_make_vcpus_request_mask+0x1dc/0x240
[ 6862.762003] kvm_make_all_cpus_request+0x70/0x98
[ 6862.766714] kvm_arm_halt_guest+0x70/0x80
[ 6862.770805] kvm_arch_irq_bypass_stop+0x24/0x30
[ 6862.775428] __disconnect+0x3c/0x90
[ 6862.778982] irq_bypass_unregister_producer+0xe0/0x138
[ 6862.784228] vfio_msi_set_vector_signal+0x9c/0x2a8
[ 6862.789116] vfio_msi_set_block+0x78/0x108
[ 6862.793294] vfio_pci_set_msi_trigger+0x238/0x2e0
[ 6862.798093] vfio_pci_set_irqs_ioctl+0xf8/0x140
[ 6862.802714] vfio_pci_ioctl+0x30c/0xed0
[ 6862.809692] vfio_device_fops_unl_ioctl+0x44/0x70
[ 6862.817609] do_vfs_ioctl+0xc4/0x7f0
[ 6862.824336] ksys_ioctl+0x8c/0xa0
[ 6862.830680] __arm64_sys_ioctl+0x28/0x38
[ 6862.837507] el0_svc_common+0xa0/0x190
[ 6862.844116] el0_svc_handler+0x38/0x78
[ 6862.850624] el0_svc+0x8/0xc


Any idea is appreciated.

We will find some time and a board to test your new patch set, but right now our top priority is to debug the above issue, so it may take some time to get back with the test result. Sorry for that.

Thanks,
Heyi


On 2019/5/8 20:31, Marc Zyngier wrote:
> On Sun, 05 May 2019 12:07:02 +0100,
> Heyi Guo <[email protected]> wrote:
>> Hi Marc,
>>
>> Appreciate your quick patch :) We'll test it and let you know the
>> result.
> For what it is worth, I've just pushed out a branch (irq/vlpi-map-rework)
> with a more complete fix. It is of course untested (it does compile though).
>
> Please let me know how this fares.
>
> Thanks,
>
> M.
>


2019-05-09 07:50:05

by Marc Zyngier

Subject: Re: ARM/gic-v4: deadlock occurred

Hi Heyi,

On Wed, 08 May 2019 14:01:48 +0100,
Heyi Guo <[email protected]> wrote:
>
> Hi Marc,
>
> The bad news is that though your previous patch fixed the lockdep
> warnings, we can still reproduce soft lockup panics and some other
> exceptions... So our issue may not be related with this lock defect.
>
> Most of the call traces are as below, stuck in smp_call_function_many:
>
> [ 6862.660611] watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [CPU 18/KVM:95311]
> [ 6862.668283] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm hns_roce_hw_v2 hns_roce aes_ce_blk crypto_simd ib_core cryptd aes_ce_cipher crc32_ce ghash_ce sha2_ce sha256_arm64 sha1_ce marvell ses enclosure hibmc_drm ttm drm_kms_helper drm sg ixgbe mdio fb_sys_fops syscopyarea hns3 hclge sysfillrect hnae3 sysimgblt sbsa_gwdt vhost_net tun vhost tap ip_tables dm_mod megaraid_sas hisi_sas_v3_hw hisi_sas_main ipmi_si ipmi_devintf ipmi_msghandler br_netfilter xt_sctp
> [ 6862.668519] irq event stamp: 1670812
> [ 6862.668526] hardirqs last enabled at (1670811): [<ffff000008083498>] el1_irq+0xd8/0x180
> [ 6862.668530] hardirqs last disabled at (1670812): [<ffff000008083448>] el1_irq+0x88/0x180
> [ 6862.668534] softirqs last enabled at (1661542): [<ffff000008081d2c>] __do_softirq+0x41c/0x51c
> [ 6862.668539] softirqs last disabled at (1661535): [<ffff0000080fafc4>] irq_exit+0x18c/0x198
> [ 6862.668544] CPU: 27 PID: 95311 Comm: CPU 18/KVM Kdump: loaded Tainted: G W 4.19.36-1.2.141.aarch64 #1
> [ 6862.668548] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS TaiShan 2280 V2 - B900 01/29/2019
> [ 6862.668551] pstate: 80400009 (Nzcv daif +PAN -UAO)
> [ 6862.668557] pc : smp_call_function_many+0x360/0x3b8
> [ 6862.668560] lr : smp_call_function_many+0x320/0x3b8
> [ 6862.668563] sp : ffff000028f338e0
> [ 6862.668566] x29: ffff000028f338e0 x28: ffff000009893fb4
> [ 6862.668575] x27: 0000000000000400 x26: 0000000000000000
> [ 6862.668583] x25: ffff0000080b1e08 x24: 0000000000000001
> [ 6862.668591] x23: ffff000009891bc8 x22: ffff000009891bc8
> [ 6862.668599] x21: ffff805f7d6da408 x20: ffff000009893fb4
> [ 6862.668608] x19: ffff805f7d6da400 x18: 0000000000000000
> [ 6862.668616] x17: 0000000000000000 x16: 0000000000000000
> [ 6862.668624] x15: 0000000000000000 x14: 0000000000000000
> [ 6862.668632] x13: 0000000000000040 x12: 0000000000000228
> [ 6862.668640] x11: 0000000000000020 x10: 0000000000000040
> [ 6862.668648] x9 : 0000000000000000 x8 : 0000000000000010
> [ 6862.668656] x7 : 0000000000000000 x6 : ffff805f7d329660
> [ 6862.668664] x5 : ffff000028f33850 x4 : 0000000002000402
> [ 6862.668673] x3 : 0000000000000000 x2 : ffff803f7f3dc678
> [ 6862.668681] x1 : 0000000000000003 x0 : 000000000000000a
> [ 6862.668689] Call trace:
> [ 6862.668693] smp_call_function_many+0x360/0x3b8

This would tend to indicate that one of the CPUs isn't responding to
the IPI because it has its interrupts disabled, or has crashed badly
already. Can you check where in smp_call_function_many this is
hanging? My bet is on the wait loop at the end of the function.
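
For reference, the tail of that function looks roughly like this in 4.19
(paraphrased from kernel/smp.c, not the exact source):

/*
 * With wait == true the caller spins until every target CPU has run the
 * IPI handler and released its csd. A CPU that never takes the IPI
 * (interrupts masked, or already wedged) keeps the caller here until the
 * soft-lockup watchdog fires.
 */
if (wait) {
        for_each_cpu(cpu, cfd->cpumask) {
                call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);

                /* spins on csd->flags until the target CPU clears CSD_FLAG_LOCK */
                csd_lock_wait(csd);
        }
}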

You'll need to find out what this unresponsive CPU is doing...

> Any idea is appreciated.
>
> We will find some time and board to test your new patch set, but
> right now our top priority is to debug the above issue, so it may
> take some time to get back with the test result. Sorry for that.

No worries, that can wait.

M.

--
Jazz is not dead, it just smells funny.

2019-05-09 11:39:50

by Heyi Guo

Subject: Re: ARM/gic-v4: deadlock occurred



On 2019/5/9 15:48, Marc Zyngier wrote:
> Hi Heyi,
>
> On Wed, 08 May 2019 14:01:48 +0100,
> Heyi Guo <[email protected]> wrote:
>> Hi Marc,
>>
>> The bad news is that though your previous patch fixed the lockdep
>> warnings, we can still reproduce soft lockup panics and some other
>> exceptions... So our issue may not be related with this lock defect.
>>
>> Most of the call traces are as below, stuck in smp_call_function_many:
>>
>> [ 6862.660611] watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [CPU 18/KVM:95311]
>> [ 6862.668283] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm hns_roce_hw_v2 hns_roce aes_ce_blk crypto_simd ib_core cryptd aes_ce_cipher crc32_ce ghash_ce sha2_ce sha256_arm64 sha1_ce marvell ses enclosure hibmc_drm ttm drm_kms_helper drm sg ixgbe mdio fb_sys_fops syscopyarea hns3 hclge sysfillrect hnae3 sysimgblt sbsa_gwdt vhost_net tun vhost tap ip_tables dm_mod megaraid_sas hisi_sas_v3_hw hisi_sas_main ipmi_si ipmi_devintf ipmi_msghandler br_netfilter xt_sctp
>> [ 6862.668519] irq event stamp: 1670812
>> [ 6862.668526] hardirqs last enabled at (1670811): [<ffff000008083498>] el1_irq+0xd8/0x180
>> [ 6862.668530] hardirqs last disabled at (1670812): [<ffff000008083448>] el1_irq+0x88/0x180
>> [ 6862.668534] softirqs last enabled at (1661542): [<ffff000008081d2c>] __do_softirq+0x41c/0x51c
>> [ 6862.668539] softirqs last disabled at (1661535): [<ffff0000080fafc4>] irq_exit+0x18c/0x198
>> [ 6862.668544] CPU: 27 PID: 95311 Comm: CPU 18/KVM Kdump: loaded Tainted: G W 4.19.36-1.2.141.aarch64 #1
>> [ 6862.668548] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS TaiShan 2280 V2 - B900 01/29/2019
>> [ 6862.668551] pstate: 80400009 (Nzcv daif +PAN -UAO)
>> [ 6862.668557] pc : smp_call_function_many+0x360/0x3b8
>> [ 6862.668560] lr : smp_call_function_many+0x320/0x3b8
>> [ 6862.668563] sp : ffff000028f338e0
>> [ 6862.668566] x29: ffff000028f338e0 x28: ffff000009893fb4
>> [ 6862.668575] x27: 0000000000000400 x26: 0000000000000000
>> [ 6862.668583] x25: ffff0000080b1e08 x24: 0000000000000001
>> [ 6862.668591] x23: ffff000009891bc8 x22: ffff000009891bc8
>> [ 6862.668599] x21: ffff805f7d6da408 x20: ffff000009893fb4
>> [ 6862.668608] x19: ffff805f7d6da400 x18: 0000000000000000
>> [ 6862.668616] x17: 0000000000000000 x16: 0000000000000000
>> [ 6862.668624] x15: 0000000000000000 x14: 0000000000000000
>> [ 6862.668632] x13: 0000000000000040 x12: 0000000000000228
>> [ 6862.668640] x11: 0000000000000020 x10: 0000000000000040
>> [ 6862.668648] x9 : 0000000000000000 x8 : 0000000000000010
>> [ 6862.668656] x7 : 0000000000000000 x6 : ffff805f7d329660
>> [ 6862.668664] x5 : ffff000028f33850 x4 : 0000000002000402
>> [ 6862.668673] x3 : 0000000000000000 x2 : ffff803f7f3dc678
>> [ 6862.668681] x1 : 0000000000000003 x0 : 000000000000000a
>> [ 6862.668689] Call trace:
>> [ 6862.668693] smp_call_function_many+0x360/0x3b8
> This would tend to indicate that one of the CPUs isn't responding to
> the IPI because it has its interrupts disabled, or has crashed badly
> already. Can you check where in smp_call_function_many this is
> hanging? My bet is on the wait loop at the end of the function.

Yes.

>
> You'll need to find out what this unresponsive CPU is doing...

True; we need to dig more deeply...

Appreciate it.

Heyi

>
>> Any idea is appreciated.
>>
>> We will find some time and board to test your new patch set, but
>> right now our top priority is to debug the above issue, so it may
>> take some time to get back with the test result. Sorry for that.
> No worries, that can wait.
>
> M.
>


2019-07-13 11:10:04

by Heyi Guo

Subject: Re: ARM/gic-v4: deadlock occurred

Hi Marc,

Really sorry for the delay in testing the rework patches. I picked up
the work these days and applied the patches to our 4.19.36 stable
branch. However, I got the below panic during the boot process of the
host (guests not yet booted).

I suppose the result is not related to my testing kernel version, since
we don't have many differences in the ITS driver; I can test against
mainline if you think it is necessary.

Thanks,

Heyi


[ 16.990413] iommu: Adding device 0000:00:00.0 to group 6
[ 17.000691] pcieport 0000:00:00.0: Signaling PME with IRQ 133
[ 17.006456] pcieport 0000:00:00.0: AER enabled with IRQ 134
[ 17.012151] iommu: Adding device 0000:00:08.0 to group 7
[ 17.018575] Unable to handle kernel paging request at virtual address 00686361635f746f
[ 17.026467] Mem abort info:
[ 17.029251] ESR = 0x96000004
[ 17.032313] Exception class = DABT (current EL), IL = 32 bits
[ 17.038207] SET = 0, FnV = 0
[ 17.041258] EA = 0, S1PTW = 0
[ 17.044391] Data abort info:
[ 17.047260] ISV = 0, ISS = 0x00000004
[ 17.051081] CM = 0, WnR = 0
[ 17.054035] [00686361635f746f] address between user and kernel address ranges
[ 17.061140] Internal error: Oops: 96000004 [#1] SMP
[ 17.065997] Process kworker/0:4 (pid: 889, stack limit = 0x(____ptrval____))
[ 17.073013] CPU: 0 PID: 889 Comm: kworker/0:4 Not tainted 4.19.36+ #8
[ 17.079422] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.52 06/20/2019
[ 17.086788] Workqueue: events work_for_cpu_fn
[ 17.091126] pstate: 20c00009 (nzCv daif +PAN +UAO)
[ 17.095895] pc : __kmalloc_track_caller+0xb0/0x2a0
[ 17.100662] lr : __kmalloc_track_caller+0x64/0x2a0
[ 17.105429] sp : ffff00002920ba00
[ 17.108728] x29: ffff00002920ba00 x28: ffff802cb6792780
[ 17.114015] x27: 00000000006080c0 x26: 00000000006000c0
[ 17.119302] x25: ffff0000084c8a00 x24: ffff802cbfc0fc00
[ 17.124588] x23: ffff802cbfc0fc00 x22: ffff0000084c8a00
[ 17.129875] x21: 0000000000000004 x20: 00000000006000c0
[ 17.135161] x19: 65686361635f746f x18: ffffffffffffffff
[ 17.140448] x17: 000000000000000e x16: 0000000000000007
[ 17.145734] x15: ffff000009119708 x14: 0000000000000000
[ 17.151021] x13: 0000000000000003 x12: 0000000000000000
[ 17.156307] x11: 0000000005f5e0ff x10: ffff00002920bb80
[ 17.161594] x9 : 00000000ffffffd0 x8 : 0000000000000098
[ 17.166880] x7 : ffff00002920bb80 x6 : ffff000008a8cb98
[ 17.172167] x5 : 000000000000a705 x4 : ffff803f802d22e0
[ 17.177453] x3 : ffff00002920b990 x2 : ffff7e00b2dafd00
[ 17.182740] x1 : 0000803f77476000 x0 : 0000000000000000
[ 17.188027] Call trace:
[ 17.190461] __kmalloc_track_caller+0xb0/0x2a0
[ 17.194886] kvasprintf+0x7c/0x108
[ 17.198271] kasprintf+0x60/0x80
[ 17.201488] populate_msi_sysfs+0xe4/0x250
[ 17.205564] __pci_enable_msi_range+0x278/0x450
[ 17.210073] pci_alloc_irq_vectors_affinity+0xd4/0x110
[ 17.215188] pcie_port_device_register+0x134/0x558
[ 17.219955] pcie_portdrv_probe+0x3c/0xf0
[ 17.223947] local_pci_probe+0x44/0xa8
[ 17.227679] work_for_cpu_fn+0x20/0x30
[ 17.231411] process_one_work+0x1b4/0x3f8
[ 17.235401] worker_thread+0x210/0x470
[ 17.239134] kthread+0x134/0x138
[ 17.242348] ret_from_fork+0x10/0x18
[ 17.245907] Code: f100005f fa401a64 54000bc0 b9402300 (f8606a66)
[ 17.251970] kernel fault(0x1) notification starting on CPU 0
[ 17.257602] kernel fault(0x1) notification finished on CPU 0
[ 17.263234] Modules linked in:
[ 17.266277] ---[ end trace 023e6b19cb68b94f ]---



On 2019/5/9 15:48, Marc Zyngier wrote:
> Hi Heyi,
>
> On Wed, 08 May 2019 14:01:48 +0100,
> Heyi Guo <[email protected]> wrote:
>> Hi Marc,
>>
>> The bad news is that though your previous patch fixed the lockdep
>> warnings, we can still reproduce soft lockup panics and some other
>> exceptions... So our issue may not be related with this lock defect.
>>
>> Most of the call traces are as below, stuck in smp_call_function_many:
>>
>> [ 6862.660611] watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [CPU 18/KVM:95311]
>> [ 6862.668283] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm hns_roce_hw_v2 hns_roce aes_ce_blk crypto_simd ib_core cryptd aes_ce_cipher crc32_ce ghash_ce sha2_ce sha256_arm64 sha1_ce marvell ses enclosure hibmc_drm ttm drm_kms_helper drm sg ixgbe mdio fb_sys_fops syscopyarea hns3 hclge sysfillrect hnae3 sysimgblt sbsa_gwdt vhost_net tun vhost tap ip_tables dm_mod megaraid_sas hisi_sas_v3_hw hisi_sas_main ipmi_si ipmi_devintf ipmi_msghandler br_netfilter xt_sctp
>> [ 6862.668519] irq event stamp: 1670812
>> [ 6862.668526] hardirqs last enabled at (1670811): [<ffff000008083498>] el1_irq+0xd8/0x180
>> [ 6862.668530] hardirqs last disabled at (1670812): [<ffff000008083448>] el1_irq+0x88/0x180
>> [ 6862.668534] softirqs last enabled at (1661542): [<ffff000008081d2c>] __do_softirq+0x41c/0x51c
>> [ 6862.668539] softirqs last disabled at (1661535): [<ffff0000080fafc4>] irq_exit+0x18c/0x198
>> [ 6862.668544] CPU: 27 PID: 95311 Comm: CPU 18/KVM Kdump: loaded Tainted: G W 4.19.36-1.2.141.aarch64 #1
>> [ 6862.668548] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS TaiShan 2280 V2 - B900 01/29/2019
>> [ 6862.668551] pstate: 80400009 (Nzcv daif +PAN -UAO)
>> [ 6862.668557] pc : smp_call_function_many+0x360/0x3b8
>> [ 6862.668560] lr : smp_call_function_many+0x320/0x3b8
>> [ 6862.668563] sp : ffff000028f338e0
>> [ 6862.668566] x29: ffff000028f338e0 x28: ffff000009893fb4
>> [ 6862.668575] x27: 0000000000000400 x26: 0000000000000000
>> [ 6862.668583] x25: ffff0000080b1e08 x24: 0000000000000001
>> [ 6862.668591] x23: ffff000009891bc8 x22: ffff000009891bc8
>> [ 6862.668599] x21: ffff805f7d6da408 x20: ffff000009893fb4
>> [ 6862.668608] x19: ffff805f7d6da400 x18: 0000000000000000
>> [ 6862.668616] x17: 0000000000000000 x16: 0000000000000000
>> [ 6862.668624] x15: 0000000000000000 x14: 0000000000000000
>> [ 6862.668632] x13: 0000000000000040 x12: 0000000000000228
>> [ 6862.668640] x11: 0000000000000020 x10: 0000000000000040
>> [ 6862.668648] x9 : 0000000000000000 x8 : 0000000000000010
>> [ 6862.668656] x7 : 0000000000000000 x6 : ffff805f7d329660
>> [ 6862.668664] x5 : ffff000028f33850 x4 : 0000000002000402
>> [ 6862.668673] x3 : 0000000000000000 x2 : ffff803f7f3dc678
>> [ 6862.668681] x1 : 0000000000000003 x0 : 000000000000000a
>> [ 6862.668689] Call trace:
>> [ 6862.668693] smp_call_function_many+0x360/0x3b8
> This would tend to indicate that one of the CPUs isn't responding to
> the IPI because it has its interrupts disabled, or has crashed badly
> already. Can you check where in smp_call_function_many this is
> hanging? My bet is on the wait loop at the end of the function.
>
> You'll need to find out what this unresponsive CPU is doing...
>
>> Any idea is appreciated.
>>
>> We will find some time and board to test your new patch set, but
>> right now our top priority is to debug the above issue, so it may
>> take some time to get back with the test result. Sorry for that.
> No worries, that can wait.
>
> M.
>


2019-07-13 11:40:45

by Marc Zyngier

Subject: Re: ARM/gic-v4: deadlock occurred

On Sat, 13 Jul 2019 19:08:57 +0800
Guoheyi <[email protected]> wrote:

Hi Heyi,

> Hi Marc,
>
> Really sorry for the delay in testing the rework patches. I picked up
> the work these days and applied the patches to our 4.19.36 stable
> branch. However, I got the below panic during the boot process of the
> host (guests not yet booted).
>
> I suppose the result is not related to my testing kernel version,
> since we don't have many differences in the ITS driver; I can test
> against mainline if you think it is necessary.

In general, please report bugs against mainline. There isn't much I can
do about your private tree...

That being said, a couple of comments below.

> Thanks,
>
> Heyi
>
>
> [ 16.990413] iommu: Adding device 0000:00:00.0 to group 6
> [ 17.000691] pcieport 0000:00:00.0: Signaling PME with IRQ 133
> [ 17.006456] pcieport 0000:00:00.0: AER enabled with IRQ 134
> [ 17.012151] iommu: Adding device 0000:00:08.0 to group 7
> [ 17.018575] Unable to handle kernel paging request at virtual address 00686361635f746f
> [ 17.026467] Mem abort info:
> [ 17.029251] ESR = 0x96000004
> [ 17.032313] Exception class = DABT (current EL), IL = 32 bits
> [ 17.038207] SET = 0, FnV = 0
> [ 17.041258] EA = 0, S1PTW = 0
> [ 17.044391] Data abort info:
> [ 17.047260] ISV = 0, ISS = 0x00000004
> [ 17.051081] CM = 0, WnR = 0
> [ 17.054035] [00686361635f746f] address between user and kernel address ranges
> [ 17.061140] Internal error: Oops: 96000004 [#1] SMP
> [ 17.065997] Process kworker/0:4 (pid: 889, stack limit = 0x(____ptrval____))
> [ 17.073013] CPU: 0 PID: 889 Comm: kworker/0:4 Not tainted 4.19.36+ #8
> [ 17.079422] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.52 06/20/2019
> [ 17.086788] Workqueue: events work_for_cpu_fn
> [ 17.091126] pstate: 20c00009 (nzCv daif +PAN +UAO)
> [ 17.095895] pc : __kmalloc_track_caller+0xb0/0x2a0
> [ 17.100662] lr : __kmalloc_track_caller+0x64/0x2a0
> [ 17.105429] sp : ffff00002920ba00
> [ 17.108728] x29: ffff00002920ba00 x28: ffff802cb6792780
> [ 17.114015] x27: 00000000006080c0 x26: 00000000006000c0
> [ 17.119302] x25: ffff0000084c8a00 x24: ffff802cbfc0fc00
> [ 17.124588] x23: ffff802cbfc0fc00 x22: ffff0000084c8a00
> [ 17.129875] x21: 0000000000000004 x20: 00000000006000c0
> [ 17.135161] x19: 65686361635f746f x18: ffffffffffffffff
> [ 17.140448] x17: 000000000000000e x16: 0000000000000007
> [ 17.145734] x15: ffff000009119708 x14: 0000000000000000
> [ 17.151021] x13: 0000000000000003 x12: 0000000000000000
> [ 17.156307] x11: 0000000005f5e0ff x10: ffff00002920bb80
> [ 17.161594] x9 : 00000000ffffffd0 x8 : 0000000000000098
> [ 17.166880] x7 : ffff00002920bb80 x6 : ffff000008a8cb98
> [ 17.172167] x5 : 000000000000a705 x4 : ffff803f802d22e0
> [ 17.177453] x3 : ffff00002920b990 x2 : ffff7e00b2dafd00
> [ 17.182740] x1 : 0000803f77476000 x0 : 0000000000000000
> [ 17.188027] Call trace:
> [ 17.190461] __kmalloc_track_caller+0xb0/0x2a0
> [ 17.194886] kvasprintf+0x7c/0x108
> [ 17.198271] kasprintf+0x60/0x80
> [ 17.201488] populate_msi_sysfs+0xe4/0x250
> [ 17.205564] __pci_enable_msi_range+0x278/0x450
> [ 17.210073] pci_alloc_irq_vectors_affinity+0xd4/0x110
> [ 17.215188] pcie_port_device_register+0x134/0x558
> [ 17.219955] pcie_portdrv_probe+0x3c/0xf0
> [ 17.223947] local_pci_probe+0x44/0xa8
> [ 17.227679] work_for_cpu_fn+0x20/0x30
> [ 17.231411] process_one_work+0x1b4/0x3f8
> [ 17.235401] worker_thread+0x210/0x470
> [ 17.239134] kthread+0x134/0x138
> [ 17.242348] ret_from_fork+0x10/0x18
> [ 17.245907] Code: f100005f fa401a64 54000bc0 b9402300 (f8606a66)
> [ 17.251970] kernel fault(0x1) notification starting on CPU 0
> [ 17.257602] kernel fault(0x1) notification finished on CPU 0
> [ 17.263234] Modules linked in:
> [ 17.266277] ---[ end trace 023e6b19cb68b94f ]---

What in this trace makes you think that this has anything to do with an
ITS change? The system crashes in a completely unrelated piece of code.
Also, if you look at the VA that indicates the crash, it should be
obvious that this isn't a kernel address. Worse, this is a piece of
ASCII text:

$ echo 00686361635f746f | xxd -r -p
hcac_to

This tends to indicate some form of memory corruption ("hcac_to" looks
like the beginning of a symbol), and I'm not sure how the ITS would be
involved in this... Furthermore, this happens on the host at boot time,
while the patch you suspect only affects VMs...

I think you need to spend more time analysing this issue.

Thanks,

M.
--
Without deviation from the norm, progress is not possible.

2019-07-15 06:33:26

by Heyi Guo

Subject: Re: ARM/gic-v4: deadlock occurred

Hi Marc,

The issue only occurs after applying the vlpi_map_rework patches, and we
can see the patches are mainly about VMs; however, they change
its_create_device() a little, so they may affect host booting in some
ways. That's why I took the lazy way of sending this out for some insights.

I suspect the code below; what will happen if alloc_lpis == false?
Anyway, I will investigate this further.


if (alloc_lpis) {
        lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
        if (lpi_map)
                col_map = kcalloc(nr_lpis, sizeof(*col_map),
                                  GFP_KERNEL);
} else {
        col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
        nr_lpis = 0;
        lpi_base = 0;
}
if (its->is_v4)
        vlpi_map = kcalloc(nr_lpis, sizeof(*vlpi_map), GFP_KERNEL);

if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis) ||
    (!vlpi_map && its->is_v4)) {
        kfree(dev);
        kfree(itt);
        kfree(lpi_map);
        kfree(col_map);
        kfree(vlpi_map);
        return NULL;
}
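
For reference, my current reading of the alloc_lpis == false branch (just
an observation, not a confirmed root cause):

/*
 * Observation only: with alloc_lpis == false we end up with
 *
 *         nr_lpis  = 0;
 *         vlpi_map = kcalloc(0, sizeof(*vlpi_map), GFP_KERNEL);
 *
 * and kcalloc(0, ...) returns ZERO_SIZE_PTR, which is non-NULL, so the
 * "(!vlpi_map && its->is_v4)" check does not trigger; the device is then
 * created with nr_lpis == 0 and a zero-length vlpi_map.
 */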


Thanks,

Heyi


On 2019/7/13 19:37, Marc Zyngier wrote:
> On Sat, 13 Jul 2019 19:08:57 +0800
> Guoheyi <[email protected]> wrote:
>
> Hi Heyi,
>
>> Hi Marc,
>>
>> Really sorry for the delay of testing the rework patches. I picked up
>> the work these days and applied the patches to our 4.19.36 stable
>> branch. However, I got below panic during the boot process of host
>> (not yet to boot guests).
>>
>> I supposed the result was not related with my testing kernel version,
>> for we don't have many differences in ITS driver; I can test against
>> mainline if you think it is necessary.
> In general, please report bugs against mainline. There isn't much I can
> do about your private tree...
>
> That being said, a couple of comments below.
>
>> Thanks,
>>
>> Heyi
>>
>>
>> [ 16.990413] iommu: Adding device 0000:00:00.0 to group 6
>> [ 17.000691] pcieport 0000:00:00.0: Signaling PME with IRQ 133
>> [ 17.006456] pcieport 0000:00:00.0: AER enabled with IRQ 134
>> [ 17.012151] iommu: Adding device 0000:00:08.0 to group 7
>> [ 17.018575] Unable to handle kernel paging request at virtual address 00686361635f746f
>> [ 17.026467] Mem abort info:
>> [ 17.029251] ESR = 0x96000004
>> [ 17.032313] Exception class = DABT (current EL), IL = 32 bits
>> [ 17.038207] SET = 0, FnV = 0
>> [ 17.041258] EA = 0, S1PTW = 0
>> [ 17.044391] Data abort info:
>> [ 17.047260] ISV = 0, ISS = 0x00000004
>> [ 17.051081] CM = 0, WnR = 0
>> [ 17.054035] [00686361635f746f] address between user and kernel address ranges
>> [ 17.061140] Internal error: Oops: 96000004 [#1] SMP
>> [ 17.065997] Process kworker/0:4 (pid: 889, stack limit = 0x(____ptrval____))
>> [ 17.073013] CPU: 0 PID: 889 Comm: kworker/0:4 Not tainted 4.19.36+ #8
>> [ 17.079422] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.52 06/20/2019
>> [ 17.086788] Workqueue: events work_for_cpu_fn
>> [ 17.091126] pstate: 20c00009 (nzCv daif +PAN +UAO)
>> [ 17.095895] pc : __kmalloc_track_caller+0xb0/0x2a0
>> [ 17.100662] lr : __kmalloc_track_caller+0x64/0x2a0
>> [ 17.105429] sp : ffff00002920ba00
>> [ 17.108728] x29: ffff00002920ba00 x28: ffff802cb6792780
>> [ 17.114015] x27: 00000000006080c0 x26: 00000000006000c0
>> [ 17.119302] x25: ffff0000084c8a00 x24: ffff802cbfc0fc00
>> [ 17.124588] x23: ffff802cbfc0fc00 x22: ffff0000084c8a00
>> [ 17.129875] x21: 0000000000000004 x20: 00000000006000c0
>> [ 17.135161] x19: 65686361635f746f x18: ffffffffffffffff
>> [ 17.140448] x17: 000000000000000e x16: 0000000000000007
>> [ 17.145734] x15: ffff000009119708 x14: 0000000000000000
>> [ 17.151021] x13: 0000000000000003 x12: 0000000000000000
>> [ 17.156307] x11: 0000000005f5e0ff x10: ffff00002920bb80
>> [ 17.161594] x9 : 00000000ffffffd0 x8 : 0000000000000098
>> [ 17.166880] x7 : ffff00002920bb80 x6 : ffff000008a8cb98
>> [ 17.172167] x5 : 000000000000a705 x4 : ffff803f802d22e0
>> [ 17.177453] x3 : ffff00002920b990 x2 : ffff7e00b2dafd00
>> [ 17.182740] x1 : 0000803f77476000 x0 : 0000000000000000
>> [ 17.188027] Call trace:
>> [ 17.190461] __kmalloc_track_caller+0xb0/0x2a0
>> [ 17.194886] kvasprintf+0x7c/0x108
>> [ 17.198271] kasprintf+0x60/0x80
>> [ 17.201488] populate_msi_sysfs+0xe4/0x250
>> [ 17.205564] __pci_enable_msi_range+0x278/0x450
>> [ 17.210073] pci_alloc_irq_vectors_affinity+0xd4/0x110
>> [ 17.215188] pcie_port_device_register+0x134/0x558
>> [ 17.219955] pcie_portdrv_probe+0x3c/0xf0
>> [ 17.223947] local_pci_probe+0x44/0xa8
>> [ 17.227679] work_for_cpu_fn+0x20/0x30
>> [ 17.231411] process_one_work+0x1b4/0x3f8
>> [ 17.235401] worker_thread+0x210/0x470
>> [ 17.239134] kthread+0x134/0x138
>> [ 17.242348] ret_from_fork+0x10/0x18
>> [ 17.245907] Code: f100005f fa401a64 54000bc0 b9402300 (f8606a66)
>> [ 17.251970] kernel fault(0x1) notification starting on CPU 0
>> [ 17.257602] kernel fault(0x1) notification finished on CPU 0
>> [ 17.263234] Modules linked in:
>> [ 17.266277] ---[ end trace 023e6b19cb68b94f ]---
> What in this trace makes you think that this has anything to do with an
> ITS change? The system crashes in a completely unrelated piece of code.
> Also, if you look at the VA that indicates the crash, it should be
> obvious that this isn't a kernel address. Worse, this is a piece of
> ASCII text:
>
> $ echo 00686361635f746f | xxd -r -p
> hcac_to
>
> This tends to indicate some memory form of corruption ("hcac_to" looks
> like the begining of a symbol), and I'm not sure how the ITS would be
> involved in this... Furthermore, this happens on the host at boot time,
> while the patch you suspect only affects VMs...
>
> I think you need to spend more time analysing this issue.
>
> Thanks,
>
> M.


2019-07-15 09:11:12

by Marc Zyngier

[permalink] [raw]
Subject: Re: ARM/gic-v4: deadlock occurred

On 15/07/2019 07:32, Guoheyi wrote:
> Hi Marc,
>
> The issue only occurs after applying the vlpi_map_rework patches, and we
> can see the patches only affect VM; it changes its_create_device() a
> little so it may affect host booting in some ways, so I took the lazy
> way to send it out for some insights.
>
> I am suspecting below code; if alloc_lpis == false, what will happen?

If !alloc_lpis, then we don't allocate the lpi_map, which is the
intended effect.
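
To spell it out (an annotated reading of that path, not the literal driver
code; it relies on kcalloc() with a zero count returning the non-NULL
ZERO_SIZE_PTR, which kfree() silently ignores):

    /* !alloc_lpis: no new LPIs are allocated for this device */
    col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
    nr_lpis = 0;
    lpi_base = 0;
    /* lpi_map deliberately stays NULL */

    if (its->is_v4)
        /* nr_lpis == 0, so this returns ZERO_SIZE_PTR, which is non-NULL */
        vlpi_map = kcalloc(nr_lpis, sizeof(*vlpi_map), GFP_KERNEL);

    /*
     * The bail-out condition only rejects a NULL lpi_map when alloc_lpis
     * is true, so this path is not treated as an allocation failure, and
     * kfree() on NULL or ZERO_SIZE_PTR in the error path is a no-op.
     */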

> Anyway, I will investigate more on this.
>
>
> if (alloc_lpis) {
> lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
> if (lpi_map)
> col_map = kcalloc(nr_lpis, sizeof(*col_map),
> GFP_KERNEL);
> } else {
> col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
> nr_lpis = 0;
> lpi_base = 0;
> }
> if (its->is_v4)
> vlpi_map = kcalloc(nr_lpis, sizeof(*vlpi_map), GFP_KERNEL);
>
> if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis) ||
> (!vlpi_map && its->is_v4)) {
> kfree(dev);
> kfree(itt);
> kfree(lpi_map);
> kfree(col_map);
> kfree(vlpi_map);
> return NULL;
> }

How does this relate to the patch posted in this discussion? The
proposed changes turn the locking from a mutex into a raw_spinlock.
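
To recap for the thread, the shape of that change is simply the following
(a from-memory sketch with made-up structure and function names, not the
actual patch): the lock protecting the per-device vLPI mapping stops being
a mutex, which can sleep and therefore cannot be taken from IRQ-safe
paths, and becomes a raw spinlock.

#include <linux/mutex.h>
#include <linux/spinlock.h>

struct vlpi_map_state_old {
    struct mutex    vlpi_lock;  /* sleeping lock: unusable in atomic context */
};

struct vlpi_map_state_new {
    raw_spinlock_t  vlpi_lock;  /* can be taken with interrupts disabled */
};

static void update_map_old(struct vlpi_map_state_old *s)
{
    mutex_lock(&s->vlpi_lock);
    /* ... modify the vLPI mapping ... */
    mutex_unlock(&s->vlpi_lock);
}

static void update_map_new(struct vlpi_map_state_new *s)
{
    unsigned long flags;

    raw_spin_lock_irqsave(&s->vlpi_lock, flags);
    /* ... modify the vLPI mapping ... */
    raw_spin_unlock_irqrestore(&s->vlpi_lock, flags);
}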

That's not to say there is no bug in the GICv4 code, but I'd expect a
bit more analysis before you start pointing at random pieces of code
without establishing any link between effects and possible causes.

Thanks,

M.
>
>
> Thanks,
>
> Heyi
>
>
> On 2019/7/13 19:37, Marc Zyngier wrote:
>> On Sat, 13 Jul 2019 19:08:57 +0800
>> Guoheyi <[email protected]> wrote:
>>
>> Hi Heyi,
>>
>>> Hi Marc,
>>>
>>> Really sorry for the delay of testing the rework patches. I picked up
>>> the work these days and applied the patches to our 4.19.36 stable
>>> branch. However, I got below panic during the boot process of host
>>> (not yet to boot guests).
>>>
>>> I supposed the result was not related with my testing kernel version,
>>> for we don't have many differences in ITS driver; I can test against
>>> mainline if you think it is necessary.
>> In general, please report bugs against mainline. There isn't much I can
>> do about your private tree...
>>
>> That being said, a couple of comments below.
>>
>>> Thanks,
>>>
>>> Heyi
>>>
>>>
>>> [ 16.990413] iommu: Adding device 0000:00:00.0 to group 6
>>> [ 17.000691] pcieport 0000:00:00.0: Signaling PME with IRQ 133
>>> [ 17.006456] pcieport 0000:00:00.0: AER enabled with IRQ 134
>>> [ 17.012151] iommu: Adding device 0000:00:08.0 to group 7
>>> [ 17.018575] Unable to handle kernel paging request at virtual address 00686361635f746f
>>> [ 17.026467] Mem abort info:
>>> [ 17.029251] ESR = 0x96000004
>>> [ 17.032313] Exception class = DABT (current EL), IL = 32 bits
>>> [ 17.038207] SET = 0, FnV = 0
>>> [ 17.041258] EA = 0, S1PTW = 0
>>> [ 17.044391] Data abort info:
>>> [ 17.047260] ISV = 0, ISS = 0x00000004
>>> [ 17.051081] CM = 0, WnR = 0
>>> [ 17.054035] [00686361635f746f] address between user and kernel address ranges
>>> [ 17.061140] Internal error: Oops: 96000004 [#1] SMP
>>> [ 17.065997] Process kworker/0:4 (pid: 889, stack limit = 0x(____ptrval____))
>>> [ 17.073013] CPU: 0 PID: 889 Comm: kworker/0:4 Not tainted 4.19.36+ #8
>>> [ 17.079422] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.52 06/20/2019
>>> [ 17.086788] Workqueue: events work_for_cpu_fn
>>> [ 17.091126] pstate: 20c00009 (nzCv daif +PAN +UAO)
>>> [ 17.095895] pc : __kmalloc_track_caller+0xb0/0x2a0
>>> [ 17.100662] lr : __kmalloc_track_caller+0x64/0x2a0
>>> [ 17.105429] sp : ffff00002920ba00
>>> [ 17.108728] x29: ffff00002920ba00 x28: ffff802cb6792780
>>> [ 17.114015] x27: 00000000006080c0 x26: 00000000006000c0
>>> [ 17.119302] x25: ffff0000084c8a00 x24: ffff802cbfc0fc00
>>> [ 17.124588] x23: ffff802cbfc0fc00 x22: ffff0000084c8a00
>>> [ 17.129875] x21: 0000000000000004 x20: 00000000006000c0
>>> [ 17.135161] x19: 65686361635f746f x18: ffffffffffffffff
>>> [ 17.140448] x17: 000000000000000e x16: 0000000000000007
>>> [ 17.145734] x15: ffff000009119708 x14: 0000000000000000
>>> [ 17.151021] x13: 0000000000000003 x12: 0000000000000000
>>> [ 17.156307] x11: 0000000005f5e0ff x10: ffff00002920bb80
>>> [ 17.161594] x9 : 00000000ffffffd0 x8 : 0000000000000098
>>> [ 17.166880] x7 : ffff00002920bb80 x6 : ffff000008a8cb98
>>> [ 17.172167] x5 : 000000000000a705 x4 : ffff803f802d22e0
>>> [ 17.177453] x3 : ffff00002920b990 x2 : ffff7e00b2dafd00
>>> [ 17.182740] x1 : 0000803f77476000 x0 : 0000000000000000
>>> [ 17.188027] Call trace:
>>> [ 17.190461] __kmalloc_track_caller+0xb0/0x2a0
>>> [ 17.194886] kvasprintf+0x7c/0x108
>>> [ 17.198271] kasprintf+0x60/0x80
>>> [ 17.201488] populate_msi_sysfs+0xe4/0x250
>>> [ 17.205564] __pci_enable_msi_range+0x278/0x450
>>> [ 17.210073] pci_alloc_irq_vectors_affinity+0xd4/0x110
>>> [ 17.215188] pcie_port_device_register+0x134/0x558
>>> [ 17.219955] pcie_portdrv_probe+0x3c/0xf0
>>> [ 17.223947] local_pci_probe+0x44/0xa8
>>> [ 17.227679] work_for_cpu_fn+0x20/0x30
>>> [ 17.231411] process_one_work+0x1b4/0x3f8
>>> [ 17.235401] worker_thread+0x210/0x470
>>> [ 17.239134] kthread+0x134/0x138
>>> [ 17.242348] ret_from_fork+0x10/0x18
>>> [ 17.245907] Code: f100005f fa401a64 54000bc0 b9402300 (f8606a66)
>>> [ 17.251970] kernel fault(0x1) notification starting on CPU 0
>>> [ 17.257602] kernel fault(0x1) notification finished on CPU 0
>>> [ 17.263234] Modules linked in:
>>> [ 17.266277] ---[ end trace 023e6b19cb68b94f ]---
>> What in this trace makes you think that this has anything to do with an
>> ITS change? The system crashes in a completely unrelated piece of code.
>> Also, if you look at the VA that indicates the crash, it should be
>> obvious that this isn't a kernel address. Worse, this is a piece of
>> ASCII text:
>>
>> $ echo 00686361635f746f | xxd -r -p
>> hcac_to
>>
>> This tends to indicate some memory form of corruption ("hcac_to" looks
>> like the begining of a symbol), and I'm not sure how the ITS would be
>> involved in this... Furthermore, this happens on the host at boot time,
>> while the patch you suspect only affects VMs...
>>
>> I think you need to spend more time analysing this issue.
>>
>> Thanks,
>>
>> M.
>
>


--
Jazz is not dead. It just smells funny...

2019-07-15 10:44:43

by Heyi Guo

[permalink] [raw]
Subject: Re: ARM/gic-v4: deadlock occurred



On 2019/7/15 17:07, Marc Zyngier wrote:
> On 15/07/2019 07:32, Guoheyi wrote:
>> Hi Marc,
>>
>> The issue only occurs after applying the vlpi_map_rework patches, and we
>> can see the patches only affect VM; it changes its_create_device() a
>> little so it may affect host booting in some ways, so I took the lazy
>> way to send it out for some insights.
>>
>> I am suspecting below code; if alloc_lpis == false, what will happen?
> If !alloc_lpis, then we don't allocate the lpi_map, which is the
> intended effect.
>
>> Anyway, I will investigate more on this.
>>
>>
>> if (alloc_lpis) {
>> lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
>> if (lpi_map)
>> col_map = kcalloc(nr_lpis, sizeof(*col_map),
>> GFP_KERNEL);
>> } else {
>> col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
>> nr_lpis = 0;
>> lpi_base = 0;
>> }
>> if (its->is_v4)
>> vlpi_map = kcalloc(nr_lpis, sizeof(*vlpi_map), GFP_KERNEL);
>>
>> if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis) ||
>> (!vlpi_map && its->is_v4)) {
>> kfree(dev);
>> kfree(itt);
>> kfree(lpi_map);
>> kfree(col_map);
>> kfree(vlpi_map);
>> return NULL;
>> }
> How does this relate to the patch posted in this discussion? The
> proposed changes turn the locking from a mutex into a raw_spinlock.

I'm testing the patchset in
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/vlpi-map-rework,
not only the patch posted directly in this mail thread. The first patch
"irqchip/gic-v3-its: Make vlpi_map allocations atomic" works well in
our internal tree, and my new testing is against the other 3 patches in
your vlpi-map-rework branch, as I promised. I'm sorry if I didn't state
this clearly.

Heyi
>
> That's not to say there is no bug in the GICv4 code, but I'd expect a
> bit more analysis before you start pointing at random pieces of code
> without establishing any link between effects and possible causes.
>
> Thanks,
>
> M.
>>
>> Thanks,
>>
>> Heyi
>>
>>
>> On 2019/7/13 19:37, Marc Zyngier wrote:
>>> On Sat, 13 Jul 2019 19:08:57 +0800
>>> Guoheyi <[email protected]> wrote:
>>>
>>> Hi Heyi,
>>>
>>>> Hi Marc,
>>>>
>>>> Really sorry for the delay of testing the rework patches. I picked up
>>>> the work these days and applied the patches to our 4.19.36 stable
>>>> branch. However, I got below panic during the boot process of host
>>>> (not yet to boot guests).
>>>>
>>>> I supposed the result was not related with my testing kernel version,
>>>> for we don't have many differences in ITS driver; I can test against
>>>> mainline if you think it is necessary.
>>> In general, please report bugs against mainline. There isn't much I can
>>> do about your private tree...
>>>
>>> That being said, a couple of comments below.
>>>
>>>> Thanks,
>>>>
>>>> Heyi
>>>>
>>>>
>>>> [ 16.990413] iommu: Adding device 0000:00:00.0 to group 6
>>>> [ 17.000691] pcieport 0000:00:00.0: Signaling PME with IRQ 133
>>>> [ 17.006456] pcieport 0000:00:00.0: AER enabled with IRQ 134
>>>> [ 17.012151] iommu: Adding device 0000:00:08.0 to group 7
>>>> [ 17.018575] Unable to handle kernel paging request at virtual address 00686361635f746f
>>>> [ 17.026467] Mem abort info:
>>>> [ 17.029251] ESR = 0x96000004
>>>> [ 17.032313] Exception class = DABT (current EL), IL = 32 bits
>>>> [ 17.038207] SET = 0, FnV = 0
>>>> [ 17.041258] EA = 0, S1PTW = 0
>>>> [ 17.044391] Data abort info:
>>>> [ 17.047260] ISV = 0, ISS = 0x00000004
>>>> [ 17.051081] CM = 0, WnR = 0
>>>> [ 17.054035] [00686361635f746f] address between user and kernel address ranges
>>>> [ 17.061140] Internal error: Oops: 96000004 [#1] SMP
>>>> [ 17.065997] Process kworker/0:4 (pid: 889, stack limit = 0x(____ptrval____))
>>>> [ 17.073013] CPU: 0 PID: 889 Comm: kworker/0:4 Not tainted 4.19.36+ #8
>>>> [ 17.079422] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.52 06/20/2019
>>>> [ 17.086788] Workqueue: events work_for_cpu_fn
>>>> [ 17.091126] pstate: 20c00009 (nzCv daif +PAN +UAO)
>>>> [ 17.095895] pc : __kmalloc_track_caller+0xb0/0x2a0
>>>> [ 17.100662] lr : __kmalloc_track_caller+0x64/0x2a0
>>>> [ 17.105429] sp : ffff00002920ba00
>>>> [ 17.108728] x29: ffff00002920ba00 x28: ffff802cb6792780
>>>> [ 17.114015] x27: 00000000006080c0 x26: 00000000006000c0
>>>> [ 17.119302] x25: ffff0000084c8a00 x24: ffff802cbfc0fc00
>>>> [ 17.124588] x23: ffff802cbfc0fc00 x22: ffff0000084c8a00
>>>> [ 17.129875] x21: 0000000000000004 x20: 00000000006000c0
>>>> [ 17.135161] x19: 65686361635f746f x18: ffffffffffffffff
>>>> [ 17.140448] x17: 000000000000000e x16: 0000000000000007
>>>> [ 17.145734] x15: ffff000009119708 x14: 0000000000000000
>>>> [ 17.151021] x13: 0000000000000003 x12: 0000000000000000
>>>> [ 17.156307] x11: 0000000005f5e0ff x10: ffff00002920bb80
>>>> [ 17.161594] x9 : 00000000ffffffd0 x8 : 0000000000000098
>>>> [ 17.166880] x7 : ffff00002920bb80 x6 : ffff000008a8cb98
>>>> [ 17.172167] x5 : 000000000000a705 x4 : ffff803f802d22e0
>>>> [ 17.177453] x3 : ffff00002920b990 x2 : ffff7e00b2dafd00
>>>> [ 17.182740] x1 : 0000803f77476000 x0 : 0000000000000000
>>>> [ 17.188027] Call trace:
>>>> [ 17.190461] __kmalloc_track_caller+0xb0/0x2a0
>>>> [ 17.194886] kvasprintf+0x7c/0x108
>>>> [ 17.198271] kasprintf+0x60/0x80
>>>> [ 17.201488] populate_msi_sysfs+0xe4/0x250
>>>> [ 17.205564] __pci_enable_msi_range+0x278/0x450
>>>> [ 17.210073] pci_alloc_irq_vectors_affinity+0xd4/0x110
>>>> [ 17.215188] pcie_port_device_register+0x134/0x558
>>>> [ 17.219955] pcie_portdrv_probe+0x3c/0xf0
>>>> [ 17.223947] local_pci_probe+0x44/0xa8
>>>> [ 17.227679] work_for_cpu_fn+0x20/0x30
>>>> [ 17.231411] process_one_work+0x1b4/0x3f8
>>>> [ 17.235401] worker_thread+0x210/0x470
>>>> [ 17.239134] kthread+0x134/0x138
>>>> [ 17.242348] ret_from_fork+0x10/0x18
>>>> [ 17.245907] Code: f100005f fa401a64 54000bc0 b9402300 (f8606a66)
>>>> [ 17.251970] kernel fault(0x1) notification starting on CPU 0
>>>> [ 17.257602] kernel fault(0x1) notification finished on CPU 0
>>>> [ 17.263234] Modules linked in:
>>>> [ 17.266277] ---[ end trace 023e6b19cb68b94f ]---
>>> What in this trace makes you think that this has anything to do with an
>>> ITS change? The system crashes in a completely unrelated piece of code.
>>> Also, if you look at the VA that indicates the crash, it should be
>>> obvious that this isn't a kernel address. Worse, this is a piece of
>>> ASCII text:
>>>
>>> $ echo 00686361635f746f | xxd -r -p
>>> hcac_to
>>>
>>> This tends to indicate some memory form of corruption ("hcac_to" looks
>>> like the begining of a symbol), and I'm not sure how the ITS would be
>>> involved in this... Furthermore, this happens on the host at boot time,
>>> while the patch you suspect only affects VMs...
>>>
>>> I think you need to spend more time analysing this issue.
>>>
>>> Thanks,
>>>
>>> M.
>>
>


2019-07-15 11:14:41

by Marc Zyngier

[permalink] [raw]
Subject: Re: ARM/gic-v4: deadlock occurred

On 15/07/2019 11:43, Guoheyi wrote:
>
>
> On 2019/7/15 17:07, Marc Zyngier wrote:
>> On 15/07/2019 07:32, Guoheyi wrote:
>>> Hi Marc,
>>>
>>> The issue only occurs after applying the vlpi_map_rework patches, and we
>>> can see the patches only affect VM; it changes its_create_device() a
>>> little so it may affect host booting in some ways, so I took the lazy
>>> way to send it out for some insights.
>>>
>>> I am suspecting below code; if alloc_lpis == false, what will happen?
>> If !alloc_lpis, then we don't allocate the lpi_map, which is the
>> intended effect.
>>
>>> Anyway, I will investigate more on this.
>>>
>>>
>>> if (alloc_lpis) {
>>> lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
>>> if (lpi_map)
>>> col_map = kcalloc(nr_lpis, sizeof(*col_map),
>>> GFP_KERNEL);
>>> } else {
>>> col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
>>> nr_lpis = 0;
>>> lpi_base = 0;
>>> }
>>> if (its->is_v4)
>>> vlpi_map = kcalloc(nr_lpis, sizeof(*vlpi_map), GFP_KERNEL);
>>>
>>> if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis) ||
>>> (!vlpi_map && its->is_v4)) {
>>> kfree(dev);
>>> kfree(itt);
>>> kfree(lpi_map);
>>> kfree(col_map);
>>> kfree(vlpi_map);
>>> return NULL;
>>> }
>> How does this relate to the patch posted in this discussion? The
>> proposed changes turn the locking from a mutex into a raw_spinlock.
>
> I'm testing the patchset in
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/vlpi-map-rework,
> not only the patch posted in the mail directly. The first patch
> *"**irqchip/gic-v3-its: Make vlpi_map allocations atomic" works well in
> our internal tree, and my new testing is against the other 3 patches in
> your vlpi-map-rework branch, as I promised. I'm sorry if I didn't state
> this clearly.

Ah, I had completely forgotten about this branch. As I said, it is
completely untested. I'll see if I can get some brain bandwidth in the
next couple of weeks to get back to it...

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2019-07-15 12:50:19

by Heyi Guo

[permalink] [raw]
Subject: Re: ARM/gic-v4: deadlock occurred


On 2019/7/15 19:13, Marc Zyngier wrote:
> On 15/07/2019 11:43, Guoheyi wrote:
>>
>> On 2019/7/15 17:07, Marc Zyngier wrote:
>>> On 15/07/2019 07:32, Guoheyi wrote:
>>>> Hi Marc,
>>>>
>>>> The issue only occurs after applying the vlpi_map_rework patches, and we
>>>> can see the patches only affect VM; it changes its_create_device() a
>>>> little so it may affect host booting in some ways, so I took the lazy
>>>> way to send it out for some insights.
>>>>
>>>> I am suspecting below code; if alloc_lpis == false, what will happen?
>>> If !alloc_lpis, then we don't allocate the lpi_map, which is the
>>> intended effect.
>>>
>>>> Anyway, I will investigate more on this.
>>>>
>>>>
>>>> if (alloc_lpis) {
>>>> lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
>>>> if (lpi_map)
>>>> col_map = kcalloc(nr_lpis, sizeof(*col_map),
>>>> GFP_KERNEL);
>>>> } else {
>>>> col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
>>>> nr_lpis = 0;
>>>> lpi_base = 0;
>>>> }
>>>> if (its->is_v4)
>>>> vlpi_map = kcalloc(nr_lpis, sizeof(*vlpi_map), GFP_KERNEL);
>>>>
>>>> if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis) ||
>>>> (!vlpi_map && its->is_v4)) {
>>>> kfree(dev);
>>>> kfree(itt);
>>>> kfree(lpi_map);
>>>> kfree(col_map);
>>>> kfree(vlpi_map);
>>>> return NULL;
>>>> }
>>> How does this relate to the patch posted in this discussion? The
>>> proposed changes turn the locking from a mutex into a raw_spinlock.
>> I'm testing the patchset in
>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/vlpi-map-rework,
>> not only the patch posted in the mail directly. The first patch
>> *"**irqchip/gic-v3-its: Make vlpi_map allocations atomic" works well in
>> our internal tree, and my new testing is against the other 3 patches in
>> your vlpi-map-rework branch, as I promised. I'm sorry if I didn't state
>> this clearly.
> Ah, I had completely forgot about this branch. As I said, it is
> completely untested. I'll see if I can get some brain bandwidth in the
> next couple of weeks to get back to it...
Yes, a bit too long ago...

And finally I found that the panic is caused by this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/vlpi-map-rework&id=fe3dd7e06ee0e82bade4f2a107ef6422e5c9021e

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 18aa04b..6f55886 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2458,6 +2458,8 @@ static void its_free_device(struct its_device *its_dev)
 	list_del(&its_dev->entry);
 	raw_spin_unlock_irqrestore(&its_dev->its->lock, flags);
 	kfree(its_dev->itt);
+	kfree(its_dev->event_map.lpi_map);
+	kfree(its_dev->event_map.col_map);
 	kfree(its_dev);
 }

This patch causes a double free of both lpi_map and col_map in
its_irq_domain_free():

    if (!its_dev->shared &&
        bitmap_empty(its_dev->event_map.lpi_map,
                     its_dev->event_map.nr_lpis)) {
        its_lpi_free(its_dev->event_map.lpi_map,   /* ----> event_map.lpi_map is freed here */
                     its_dev->event_map.lpi_base,
                     its_dev->event_map.nr_lpis);
        kfree(its_dev->event_map.col_map);         /* ----> event_map.col_map is freed here */

        /* Unmap device/itt */
        its_send_mapd(its_dev, 0);
        its_free_device(its_dev);                  /* ----> lpi_map and col_map are freed again */
    }
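
If it is of any use, the simplest workaround I can see (just an untested
sketch on top of your branch, not a proper patch) is to clear the pointers
in its_irq_domain_free() once they have been released, so that the extra
kfree() calls added to its_free_device() become harmless no-ops:

        its_lpi_free(its_dev->event_map.lpi_map,
                     its_dev->event_map.lpi_base,
                     its_dev->event_map.nr_lpis);
        its_dev->event_map.lpi_map = NULL;      /* already freed by its_lpi_free() */

        kfree(its_dev->event_map.col_map);
        its_dev->event_map.col_map = NULL;      /* avoid the second kfree() */

        /* Unmap device/itt */
        its_send_mapd(its_dev, 0);
        its_free_device(its_dev);               /* kfree(NULL) is a no-op */

Alternatively, dropping the two kfree() calls from its_free_device() again
would have the same effect; I will leave the proper fix to you.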

Thanks,

Heyi

>
> Thanks,
>
> M.