2021-02-01 18:38:30

by John Garry

[permalink] [raw]
Subject: PCI MSI issue with reinserting a driver

Just a heads-up, by chance I noticed that I can't re-insert a specific
driver on v5.11-rc6:

[ 64.356023] hisi_dma 0000:7b:00.0: Adding to iommu group 31
[ 64.368627] hisi_dma 0000:7b:00.0: enabling device (0000 -> 0002)
[ 64.384156] hisi_dma 0000:7b:00.0: Failed to allocate MSI vectors!
[ 64.397180] hisi_dma: probe of 0000:7b:00.0 failed with error -28

That's with CONFIG_DEBUG_TEST_DRIVER_REMOVE=y

Bisect tells me that this is the first bad commit:
4615fbc3788d genirq/irqdomain: Don't try to free an interrupt that has
no mapping

The relevant driver code is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/hisi_dma.c#n547

That driver only allocates 30 MSI, so maybe there's a problem with not
allocating (and freeing) all 32 MSI.

I'll have a bit more of a look tomorrow.

Cheers,
john

Bisect log:

git bisect start
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [1048ba83fb1c00cd24172e23e8263972f6b5d9ac] Linux 5.11-rc6
git bisect bad 1048ba83fb1c00cd24172e23e8263972f6b5d9ac
# bad: [ee249d30fadec7677364063648f5547e243bf93f] Merge branch
'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect bad ee249d30fadec7677364063648f5547e243bf93f
# good: [15b447361794271f4d03c04d82276a841fe06328] mm/lru: revise the
comments of lru_lock
git bisect good 15b447361794271f4d03c04d82276a841fe06328
# good: [15b447361794271f4d03c04d82276a841fe06328] mm/lru: revise the
comments of lru_lock
git bisect good 15b447361794271f4d03c04d82276a841fe06328
# good: [2aa899ebd5c3aef707460f58951cc8a1d1f466c1] MAINTAINERS: add
mvpp2 driver entry
git bisect good 2aa899ebd5c3aef707460f58951cc8a1d1f466c1
# good: [2911ed9f47b47cb5ab87d03314b3b9fe008e607f] Merge tag
'char-misc-5.11-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

git bisect good 2911ed9f47b47cb5ab87d03314b3b9fe008e607f
# bad: [a45f1d43311d3a4f6534e48a3655ba3247a59d48] Merge tag
'regmap-v5.11' of
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
git bisect bad a45f1d43311d3a4f6534e48a3655ba3247a59d48
# good: [749c1e1481e1d242ded9dd1bf210ddb7c0d22a4f] Merge tag
'iio-for-5.11a' of
https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into
staging-next
git bisect good 749c1e1481e1d242ded9dd1bf210ddb7c0d22a4f
# good: [98b32c71a455ff289442779fee02ad60a6217006] staging: rtl8723bs:
replace HT_CAP_AMPDU_FACTOR
git bisect good 98b32c71a455ff289442779fee02ad60a6217006
# bad: [3c41e57a1e168d879e923c5583adeae47eec9f64] Merge tag
'irqchip-5.11' of
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into
irq/core
git bisect bad 3c41e57a1e168d879e923c5583adeae47eec9f64
# good: [e15f2fa959f2cce8a05e8e3a596e75d068cd42c5] driver core:
platform: Add devm_platform_get_irqs_affinity()
git bisect good e15f2fa959f2cce8a05e8e3a596e75d068cd42c5
# good: [2cb0837e56e1b04b773ed05df72297de4e010063] arm64: irqstat: Get
rid of duplicated declaration
git bisect good 2cb0837e56e1b04b773ed05df72297de4e010063
# bad: [4615fbc3788ddc8e7c6d697714ad35a53729aa2c] genirq/irqdomain:
Don't try to free an interrupt that has no mapping
git bisect bad 4615fbc3788ddc8e7c6d697714ad35a53729aa2c
# good: [e091bc90cd2d65f48e4688faead2911558d177d7] irqstat: Move
declaration into asm-generic/hardirq.h
git bisect good e091bc90cd2d65f48e4688faead2911558d177d7
# good: [ae9ef58996a4447dd44aa638759f913c883ba816] softirq: Move related
code into one section
git bisect good ae9ef58996a4447dd44aa638759f913c883ba816
# good: [15b8d9372f27c47e17c91f6f16d359314cf11404] sh/irq: Add missing
closing parentheses in arch_show_interrupts()
git bisect good 15b8d9372f27c47e17c91f6f16d359314cf11404
# first bad commit: [4615fbc3788ddc8e7c6d697714ad35a53729aa2c]
genirq/irqdomain: Don't try to free an interrupt that has no mapping


2021-02-01 18:54:46

by Marc Zyngier

[permalink] [raw]
Subject: Re: PCI MSI issue with reinserting a driver

Hi John,

On Mon, 01 Feb 2021 18:34:59 +0000,
John Garry <[email protected]> wrote:
>
> Just a heads-up, by chance I noticed that I can't re-insert a specific
> driver on v5.11-rc6:
>
> [ 64.356023] hisi_dma 0000:7b:00.0: Adding to iommu group 31
> [ 64.368627] hisi_dma 0000:7b:00.0: enabling device (0000 -> 0002)
> [ 64.384156] hisi_dma 0000:7b:00.0: Failed to allocate MSI vectors!
> [ 64.397180] hisi_dma: probe of 0000:7b:00.0 failed with error -28
>
> That's with CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
>
> Bisect tells me that this is the first bad commit:
> 4615fbc3788d genirq/irqdomain: Don't try to free an interrupt that has
> no mapping
>
> The relevant driver code is
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/hisi_dma.c#n547
>
> That driver only allocates 30 MSI, so maybe there's a problem with not
> allocating (and freeing) all 32 MSI.

Are they Multi-MSI (and not MSI-X)?

> I'll have a bit more of a look tomorrow.

Here's my suspicion: two of the interrupts are mapped in the low-level
domain (the ITS, I'd expect in your case), but they have never been
mapped at the higher level.

On teardown, we only get rid of the 30 that were actually mapped, and
leave the last two dangling in the ITS domain, and thus the ITS device
resources are never freed. On reload, we request another 32
interrupts, which can't be satisfied for this device.

Assuming I got it right, the question is: why weren't these interrupts
mapped in the PCI domain the first place. And if I got it wrong, I'm
even more curious!

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2021-02-02 08:41:33

by John Garry

[permalink] [raw]
Subject: Re: PCI MSI issue with reinserting a driver

On 01/02/2021 18:50, Marc Zyngier wrote:

Hi Marc,

>> Just a heads-up, by chance I noticed that I can't re-insert a specific
>> driver on v5.11-rc6:
>>
>> [ 64.356023] hisi_dma 0000:7b:00.0: Adding to iommu group 31
>> [ 64.368627] hisi_dma 0000:7b:00.0: enabling device (0000 -> 0002)
>> [ 64.384156] hisi_dma 0000:7b:00.0: Failed to allocate MSI vectors!
>> [ 64.397180] hisi_dma: probe of 0000:7b:00.0 failed with error -28
>>
>> That's with CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
>>
>> Bisect tells me that this is the first bad commit:
>> 4615fbc3788d genirq/irqdomain: Don't try to free an interrupt that has
>> no mapping
>>
>> The relevant driver code is
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/hisi_dma.c#n547
>>
>> That driver only allocates 30 MSI, so maybe there's a problem with not
>> allocating (and freeing) all 32 MSI.
> Are they Multi-MSI (and not MSI-X)?

multi-msi

>
>> I'll have a bit more of a look tomorrow.
> Here's my suspicion: two of the interrupts are mapped in the low-level
> domain (the ITS, I'd expect in your case), but they have never been
> mapped at the higher level.
>
> On teardown, we only get rid of the 30 that were actually mapped, and
> leave the last two dangling in the ITS domain, and thus the ITS device
> resources are never freed. On reload, we request another 32
> interrupts, which can't be satisfied for this device.
>
> Assuming I got it right, the question is: why weren't these interrupts
> mapped in the PCI domain the first place. And if I got it wrong, I'm
> even more curious!

Not sure. I also now notice an error for the SAS PCI driver on D06 when
nr_cpus < 16, which means number of MSI vectors allocated < 32, so looks
the same problem. There we try to allocate 16 + max(nr cpus, 16) MSI.

Anyway, let me have a look today to see what is going wrong.

cheers,
John

2021-02-02 21:26:02

by John Garry

[permalink] [raw]
Subject: Re: PCI MSI issue with reinserting a driver

>> Here's my suspicion: two of the interrupts are mapped in the low-level
>> domain (the ITS, I'd expect in your case), but they have never been
>> mapped at the higher level.
>>
>> On teardown, we only get rid of the 30 that were actually mapped, and
>> leave the last two dangling in the ITS domain, and thus the ITS device
>> resources are never freed. On reload, we request another 32
>> interrupts, which can't be satisfied for this device.
>>
>> Assuming I got it right, the question is: why weren't these interrupts
>> mapped in the PCI domain the first place. And if I got it wrong, I'm
>> even more curious!
>
> Not sure. I also now notice an error for the SAS PCI driver on D06 when
> nr_cpus < 16, which means number of MSI vectors allocated < 32, so looks
> the same problem. There we try to allocate 16 + max(nr cpus, 16) MSI.
>
> Anyway, let me have a look today to see what is going wrong.
>
Could this be the problem:

nr_cpus=11

In alloc path, we have:
its_alloc_device_irq(nvecs=27 = 16+11)
bitmap_find_free_region(order = 5);
In free path, we have:
its_irq_domain_free(nvecs = 1) and free each 27 vecs
bitmap_release_region(order = 0)

So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.

FWIW, this hack seems to fix it:

---->8-----

diff --git a/drivers/irqchip/irq-gic-v3-its.c
b/drivers/irqchip/irq-gic-v3-its.c
index ac5412b284e6..458ea0ebea2b 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c

@@ -3533,34 +3534,39 @@ static int its_irq_domain_alloc(struct
irq_domain *domain, unsigned int virq,
struct its_device *its_dev = info->scratchpad[0].ptr;
struct its_node *its = its_dev->its;
struct irq_data *irqd;
- irq_hw_number_t hwirq;
+ irq_hw_number_t hwirq[nr_irqs]; //vla :(
int err;
int i;

- err = its_alloc_device_irq(its_dev, nr_irqs, &hwirq);
+ for (i = 0; i < nr_irqs; i++) {
+ err = its_alloc_device_irq(its_dev, 1, &hwirq[i]);
+ if (err) //tidy
+ return err;
+ }
+
- if (err)
- return err;

err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev));
if (err)
return err;

for (i = 0; i < nr_irqs; i++) {
- err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
+ err = its_irq_gic_domain_alloc(domain, virq + i, hwirq[i]);
if (err)
return err;

irq_domain_set_hwirq_and_chip(domain, virq + i,
- hwirq + i, &its_irq_chip, its_dev);
+ hwirq[i], &its_irq_chip, its_dev);
irqd = irq_get_irq_data(virq + i);
irqd_set_single_target(irqd);
irqd_set_affinity_on_activate(irqd);
pr_debug("ID:%d pID:%d vID:%d\n",
- (int)(hwirq + i - its_dev->event_map.lpi_base),
- (int)(hwirq + i), virq + i);
+ (int)(hwirq[i] - its_dev->event_map.lpi_base),
+ (int)(hwirq[i]), virq + i);
}
----8<-----


But I'm not sure that we have any requirement for those map bits to be
consecutive.

Thanks,
John

2021-02-02 22:29:49

by Marc Zyngier

[permalink] [raw]
Subject: Re: PCI MSI issue with reinserting a driver

On 2021-02-02 12:38, John Garry wrote:
>>> Here's my suspicion: two of the interrupts are mapped in the
>>> low-level
>>> domain (the ITS, I'd expect in your case), but they have never been
>>> mapped at the higher level.
>>>
>>> On teardown, we only get rid of the 30 that were actually mapped, and
>>> leave the last two dangling in the ITS domain, and thus the ITS
>>> device
>>> resources are never freed. On reload, we request another 32
>>> interrupts, which can't be satisfied for this device.
>>>
>>> Assuming I got it right, the question is: why weren't these
>>> interrupts
>>> mapped in the PCI domain the first place. And if I got it wrong, I'm
>>> even more curious!
>>
>> Not sure. I also now notice an error for the SAS PCI driver on D06
>> when nr_cpus < 16, which means number of MSI vectors allocated < 32,
>> so looks the same problem. There we try to allocate 16 + max(nr cpus,
>> 16) MSI.
>>
>> Anyway, let me have a look today to see what is going wrong.
>>
> Could this be the problem:
>
> nr_cpus=11
>
> In alloc path, we have:
> its_alloc_device_irq(nvecs=27 = 16+11)
> bitmap_find_free_region(order = 5);
> In free path, we have:
> its_irq_domain_free(nvecs = 1) and free each 27 vecs
> bitmap_release_region(order = 0)
>
> So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.
>
> FWIW, this hack seems to fix it:
>
> ---->8-----
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c
> b/drivers/irqchip/irq-gic-v3-its.c
> index ac5412b284e6..458ea0ebea2b 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
>
> @@ -3533,34 +3534,39 @@ static int its_irq_domain_alloc(struct
> irq_domain *domain, unsigned int virq,
> struct its_device *its_dev = info->scratchpad[0].ptr;
> struct its_node *its = its_dev->its;
> struct irq_data *irqd;
> - irq_hw_number_t hwirq;
> + irq_hw_number_t hwirq[nr_irqs]; //vla :(
> int err;
> int i;
>
> - err = its_alloc_device_irq(its_dev, nr_irqs, &hwirq);
> + for (i = 0; i < nr_irqs; i++) {
> + err = its_alloc_device_irq(its_dev, 1, &hwirq[i]);
> + if (err) //tidy
> + return err;
> + }
> +
> - if (err)
> - return err;
>
> err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev));
> if (err)
> return err;
>
> for (i = 0; i < nr_irqs; i++) {
> - err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
> + err = its_irq_gic_domain_alloc(domain, virq + i, hwirq[i]);
> if (err)
> return err;
>
> irq_domain_set_hwirq_and_chip(domain, virq + i,
> - hwirq + i, &its_irq_chip, its_dev);
> + hwirq[i], &its_irq_chip, its_dev);
> irqd = irq_get_irq_data(virq + i);
> irqd_set_single_target(irqd);
> irqd_set_affinity_on_activate(irqd);
> pr_debug("ID:%d pID:%d vID:%d\n",
> - (int)(hwirq + i - its_dev->event_map.lpi_base),
> - (int)(hwirq + i), virq + i);
> + (int)(hwirq[i] - its_dev->event_map.lpi_base),
> + (int)(hwirq[i]), virq + i);
> }
> ----8<-----
>
>
> But I'm not sure that we have any requirement for those map bits to be
> consecutive.

We can't really do that. All the events must be contiguous,
and there is also a lot of assumptions in the ITS driver that
LPI allocations is also contiguous.

But there is also the fact that for Multi-MSI, we *must*
allocate 32 vectors. Any driver could assume that if we have
allocated 17 vectors, then there is another 15 available.

My question still stand: how was this working with the previous
behaviour?

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2021-02-02 22:46:46

by John Garry

[permalink] [raw]
Subject: Re: PCI MSI issue with reinserting a driver

On 02/02/2021 14:48, Marc Zyngier wrote:
>>>
>>> Not sure. I also now notice an error for the SAS PCI driver on D06
>>> when nr_cpus < 16, which means number of MSI vectors allocated < 32,
>>> so looks the same problem. There we try to allocate 16 + max(nr cpus,
>>> 16) MSI.
>>>
>>> Anyway, let me have a look today to see what is going wrong.
>>>
>> Could this be the problem:
>>
>> nr_cpus=11
>>
>> In alloc path, we have:
>>     its_alloc_device_irq(nvecs=27 = 16+11)
>>       bitmap_find_free_region(order = 5);
>> In free path, we have:
>>     its_irq_domain_free(nvecs = 1) and free each 27 vecs
>>       bitmap_release_region(order = 0)
>>
>> So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.

[ ... ]

>>
>>
>> But I'm not sure that we have any requirement for those map bits to be
>> consecutive.
>
> We can't really do that. All the events must be contiguous,
> and there is also a lot of assumptions in the ITS driver that
> LPI allocations is also contiguous.
>
> But there is also the fact that for Multi-MSI, we *must*
> allocate 32 vectors. Any driver could assume that if we have
> allocated 17 vectors, then there is another 15 available.
>
> My question still stand: how was this working with the previous
> behaviour?

Because previously in this scenario we would allocate 32 bits and free
32 bits in the map; but now we allocate 32 bits, yet only free 27 - so
leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy()
now frees per-interrupt, instead of all irqs per domain.

Before:
In free path, we have:
its_irq_domain_free(nvecs = 27)
bitmap_release_region(count order = 5 == 32bits)

Current:
In free path, we have:
its_irq_domain_free(nvecs = 1) for free each 27 vecs
bitmap_release_region(count order = 0 == 1bit)

Cheers,
John

2021-02-03 17:28:01

by Marc Zyngier

[permalink] [raw]
Subject: Re: PCI MSI issue with reinserting a driver

On 2021-02-02 15:46, John Garry wrote:
> On 02/02/2021 14:48, Marc Zyngier wrote:
>>>>
>>>> Not sure. I also now notice an error for the SAS PCI driver on D06
>>>> when nr_cpus < 16, which means number of MSI vectors allocated < 32,
>>>> so looks the same problem. There we try to allocate 16 + max(nr
>>>> cpus, 16) MSI.
>>>>
>>>> Anyway, let me have a look today to see what is going wrong.
>>>>
>>> Could this be the problem:
>>>
>>> nr_cpus=11
>>>
>>> In alloc path, we have:
>>>     its_alloc_device_irq(nvecs=27 = 16+11)
>>>       bitmap_find_free_region(order = 5);
>>> In free path, we have:
>>>     its_irq_domain_free(nvecs = 1) and free each 27 vecs
>>>       bitmap_release_region(order = 0)
>>>
>>> So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.
>
> [ ... ]
>
>>>
>>>
>>> But I'm not sure that we have any requirement for those map bits to
>>> be
>>> consecutive.
>>
>> We can't really do that. All the events must be contiguous,
>> and there is also a lot of assumptions in the ITS driver that
>> LPI allocations is also contiguous.
>>
>> But there is also the fact that for Multi-MSI, we *must*
>> allocate 32 vectors. Any driver could assume that if we have
>> allocated 17 vectors, then there is another 15 available.
>>
>> My question still stand: how was this working with the previous
>> behaviour?
>
> Because previously in this scenario we would allocate 32 bits and free
> 32 bits in the map; but now we allocate 32 bits, yet only free 27 - so
> leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy()
> now frees per-interrupt, instead of all irqs per domain.
>
> Before:
> In free path, we have:
> its_irq_domain_free(nvecs = 27)
> bitmap_release_region(count order = 5 == 32bits)
>
> Current:
> In free path, we have:
> its_irq_domain_free(nvecs = 1) for free each 27 vecs
> bitmap_release_region(count order = 0 == 1bit)

Right. I was focusing on the patch and blindly ignored the explanation
at the top of the email. Apologies for that.

I'm not overly keen on handling this in the ITS though, and I'd rather
we try and do it in the generic code. How about this (compile tested
only)?

Thanks,

M.

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 6aacd342cd14..cfccad83c2df 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct
irq_domain *domain,
return;

for (i = 0; i < nr_irqs; i++) {
- if (irq_domain_get_irq_data(domain, irq_base + i))
- domain->ops->free(domain, irq_base + i, 1);
+ int n ;
+
+ /* Find the largest possible span of IRQs to free in one go */
+ for (n = 0;
+ ((i + n) < nr_irqs &&
+ irq_domain_get_irq_data(domain, irq_base + i + n));
+ n++);
+
+ if (!n)
+ continue;
+
+ domain->ops->free(domain, irq_base + i, n);
+ i += n;
}
}


--
Jazz is not dead. It just smells funny...

2021-02-04 10:49:35

by John Garry

[permalink] [raw]
Subject: Re: PCI MSI issue with reinserting a driver

On 03/02/2021 17:23, Marc Zyngier wrote:
>>
>> Before:
>>  In free path, we have:
>>      its_irq_domain_free(nvecs = 27)
>>        bitmap_release_region(count order = 5 == 32bits)
>>
>> Current:
>>  In free path, we have:
>>      its_irq_domain_free(nvecs = 1) for free each 27 vecs
>>        bitmap_release_region(count order = 0 == 1bit)
>
> Right. I was focusing on the patch and blindly ignored the explanation
> at the top of the email. Apologies for that.

Yeah, it was a distraction.

>
> I'm not overly keen on handling this in the ITS though, and I'd rather
> we try and do it in the generic code. How about this (compile tested
> only)?
>
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index 6aacd342cd14..cfccad83c2df 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct
> irq_domain *domain,
>          return;
>
>      for (i = 0; i < nr_irqs; i++) {
> -        if (irq_domain_get_irq_data(domain, irq_base + i))
> -            domain->ops->free(domain, irq_base + i, 1);
> +        int n ;
> +
> +        /* Find the largest possible span of IRQs to free in one go */
> +        for (n = 0;
> +             ((i + n) < nr_irqs &&
> +              irq_domain_get_irq_data(domain, irq_base + i + n));
> +             n++);
> +
> +        if (!n)
> +            continue;
> +
> +        domain->ops->free(domain, irq_base + i, n);
> +        i += n;
>      }
>  }

That fixed my problem.

For my benefit, if MSIs must be allocated in power of 2, could we check
a flag for the dealloc method? Like, if MSI domain, then do as before
4615fbc3788d. But I'm not sure on the specific scenario which that
commit fixed. Or even whether you want specifics like that in core code.

Thanks,
John

2021-04-06 19:26:21

by John Garry

[permalink] [raw]
Subject: Re: PCI MSI issue with reinserting a driver

On 03/02/2021 17:23, Marc Zyngier wrote:
> On 2021-02-02 15:46, John Garry wrote:
>> On 02/02/2021 14:48, Marc Zyngier wrote:
>>>>>
>>>>> Not sure. I also now notice an error for the SAS PCI driver on D06
>>>>> when nr_cpus < 16, which means number of MSI vectors allocated <
>>>>> 32, so looks the same problem. There we try to allocate 16 + max(nr
>>>>> cpus, 16) MSI.
>>>>>
>>>>> Anyway, let me have a look today to see what is going wrong.
>>>>>
>>>> Could this be the problem:
>>>>
>>>> nr_cpus=11
>>>>
>>>> In alloc path, we have:
>>>>     its_alloc_device_irq(nvecs=27 = 16+11)
>>>>       bitmap_find_free_region(order = 5);
>>>> In free path, we have:
>>>>     its_irq_domain_free(nvecs = 1) and free each 27 vecs
>>>>       bitmap_release_region(order = 0)
>>>>
>>>> So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.
>>
>> [ ... ]
>>
>>>>

Hi Marc,

Just a friendly reminder on this issue. Let me know if anything required
some our side.

Cheers,
John

>>>>
>>>> But I'm not sure that we have any requirement for those map bits to be
>>>> consecutive.
>>>
>>> We can't really do that. All the events must be contiguous,
>>> and there is also a lot of assumptions in the ITS driver that
>>> LPI allocations is also contiguous.
>>>
>>> But there is also the fact that for Multi-MSI, we *must*
>>> allocate 32 vectors. Any driver could assume that if we have
>>> allocated 17 vectors, then there is another 15 available.
>>>
>>> My question still stand: how was this working with the previous
>>> behaviour?
>>
>> Because previously in this scenario we would allocate 32 bits and free
>> 32 bits in the map; but now we allocate 32 bits, yet only free 27 - so
>> leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy()
>> now frees per-interrupt, instead of all irqs per domain.
>>
>> Before:
>>  In free path, we have:
>>      its_irq_domain_free(nvecs = 27)
>>        bitmap_release_region(count order = 5 == 32bits)
>>
>> Current:
>>  In free path, we have:
>>      its_irq_domain_free(nvecs = 1) for free each 27 vecs
>>        bitmap_release_region(count order = 0 == 1bit)
>
> Right. I was focusing on the patch and blindly ignored the explanation
> at the top of the email. Apologies for that.
>
> I'm not overly keen on handling this in the ITS though, and I'd rather
> we try and do it in the generic code. How about this (compile tested
> only)?
>
> Thanks,
>
>         M.
>
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index 6aacd342cd14..cfccad83c2df 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct
> irq_domain *domain,
>          return;
>
>      for (i = 0; i < nr_irqs; i++) {
> -        if (irq_domain_get_irq_data(domain, irq_base + i))
> -            domain->ops->free(domain, irq_base + i, 1);
> +        int n ;
> +
> +        /* Find the largest possible span of IRQs to free in one go */
> +        for (n = 0;
> +             ((i + n) < nr_irqs &&
> +              irq_domain_get_irq_data(domain, irq_base + i + n));
> +             n++);
> +
> +        if (!n)
> +            continue;
> +
> +        domain->ops->free(domain, irq_base + i, n);
> +        i += n;
>      }
>  }
>
>

2021-08-27 08:35:27

by Luo Jiaxing

[permalink] [raw]
Subject: Re: PCI MSI issue with reinserting a driver


On 2021/2/4 1:23, Marc Zyngier wrote:
> On 2021-02-02 15:46, John Garry wrote:
>> On 02/02/2021 14:48, Marc Zyngier wrote:
>>>>>
>>>>> Not sure. I also now notice an error for the SAS PCI driver on D06
>>>>> when nr_cpus < 16, which means number of MSI vectors allocated <
>>>>> 32, so looks the same problem. There we try to allocate 16 +
>>>>> max(nr cpus, 16) MSI.
>>>>>
>>>>> Anyway, let me have a look today to see what is going wrong.
>>>>>
>>>> Could this be the problem:
>>>>
>>>> nr_cpus=11
>>>>
>>>> In alloc path, we have:
>>>>     its_alloc_device_irq(nvecs=27 = 16+11)
>>>>       bitmap_find_free_region(order = 5);
>>>> In free path, we have:
>>>>     its_irq_domain_free(nvecs = 1) and free each 27 vecs
>>>>       bitmap_release_region(order = 0)
>>>>
>>>> So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.
>>
>> [ ... ]
>>
>>>>
>>>>
>>>> But I'm not sure that we have any requirement for those map bits to be
>>>> consecutive.
>>>
>>> We can't really do that. All the events must be contiguous,
>>> and there is also a lot of assumptions in the ITS driver that
>>> LPI allocations is also contiguous.
>>>
>>> But there is also the fact that for Multi-MSI, we *must*
>>> allocate 32 vectors. Any driver could assume that if we have
>>> allocated 17 vectors, then there is another 15 available.
>>>
>>> My question still stand: how was this working with the previous
>>> behaviour?
>>
>> Because previously in this scenario we would allocate 32 bits and free
>> 32 bits in the map; but now we allocate 32 bits, yet only free 27 - so
>> leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy()
>> now frees per-interrupt, instead of all irqs per domain.
>>
>> Before:
>>  In free path, we have:
>>      its_irq_domain_free(nvecs = 27)
>>        bitmap_release_region(count order = 5 == 32bits)
>>
>> Current:
>>  In free path, we have:
>>      its_irq_domain_free(nvecs = 1) for free each 27 vecs
>>        bitmap_release_region(count order = 0 == 1bit)
>
> Right. I was focusing on the patch and blindly ignored the explanation
> at the top of the email. Apologies for that.
>
> I'm not overly keen on handling this in the ITS though, and I'd rather
> we try and do it in the generic code. How about this (compile tested
> only)?
>
> Thanks,
>
>         M.


Hi, Marc, Just a friendly reminder on this issue. We tested the kernel
on 5.14-rc4 and found that this issue still existed, and the following
bugfix code was not incorporated into the kernel.

I wonder if you have any plans to merge this bugfix patch.


Thanks

Jiaxing


>
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index 6aacd342cd14..cfccad83c2df 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -1399,8 +1399,19 @@ static void
> irq_domain_free_irqs_hierarchy(struct irq_domain *domain,
>          return;
>
>      for (i = 0; i < nr_irqs; i++) {
> -        if (irq_domain_get_irq_data(domain, irq_base + i))
> -            domain->ops->free(domain, irq_base + i, 1);
> +        int n ;
> +
> +        /* Find the largest possible span of IRQs to free in one go */
> +        for (n = 0;
> +             ((i + n) < nr_irqs &&
> +              irq_domain_get_irq_data(domain, irq_base + i + n));
> +             n++);
> +
> +        if (!n)
> +            continue;
> +
> +        domain->ops->free(domain, irq_base + i, n);
> +        i += n;
>      }
>  }
>
>

2022-08-04 11:19:59

by John Garry

[permalink] [raw]
Subject: Re: PCI MSI issue with reinserting a driver

On 04/02/2021 10:45, John Garry wrote:

Hi Marc,

Just a friendly reminder that we still have the issue with reinseting a
PCI driver which does not allocate a power-of-2 MSIs.

> On 03/02/2021 17:23, Marc Zyngier wrote:
>>>
>>> Before:
>>>  In free path, we have:
>>>      its_irq_domain_free(nvecs = 27)
>>>        bitmap_release_region(count order = 5 == 32bits)
>>>
>>> Current:
>>>  In free path, we have:
>>>      its_irq_domain_free(nvecs = 1) for free each 27 vecs
>>>        bitmap_release_region(count order = 0 == 1bit)
>>
>> Right. I was focusing on the patch and blindly ignored the explanation
>> at the top of the email. Apologies for that.
>
> Yeah, it was a distraction.
>
>>
>> I'm not overly keen on handling this in the ITS though, and I'd rather
>> we try and do it in the generic code. How about this (compile tested
>> only)?
>>
>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>> index 6aacd342cd14..cfccad83c2df 100644
>> --- a/kernel/irq/irqdomain.c
>> +++ b/kernel/irq/irqdomain.c
>> @@ -1399,8 +1399,19 @@ static void
>> irq_domain_free_irqs_hierarchy(struct irq_domain *domain,
>>           return;
>>
>>       for (i = 0; i < nr_irqs; i++) {
>> -        if (irq_domain_get_irq_data(domain, irq_base + i))
>> -            domain->ops->free(domain, irq_base + i, 1);
>> +        int n ;
>> +
>> +        /* Find the largest possible span of IRQs to free in one go */
>> +        for (n = 0;
>> +             ((i + n) < nr_irqs &&
>> +              irq_domain_get_irq_data(domain, irq_base + i + n));
>> +             n++);
>> +
>> +        if (!n)
>> +            continue;
>> +
>> +        domain->ops->free(domain, irq_base + i, n);
>> +        i += n;
>>       }
>>   }
>
> That fixed my problem.
>
> For my benefit, if MSIs must be allocated in power of 2, could we check
> a flag for the dealloc method? Like, if MSI domain, then do as before
> 4615fbc3788d. But I'm not sure on the specific scenario which that
> commit fixed. Or even whether you want specifics like that in core code.

Thanks,
John