2022-08-31 03:17:02

by Liao, Chang

Subject: [PATCH V2] irqchip/gic-v3-its: Reclaim the dangling bits in LPI maps

The following interrupt allocation path leaves some interrupts mapped
in the low-level domain (Arm ITS) even though they are never mapped
at the higher level.

irq_domain_alloc_irqs_hierarchy(.., nr_irqs, ...)
its_irq_domain_alloc(..., nr_irqs, ...)
its_alloc_device_irq(..., nr_irqs, ...)
bitmap_find_free_region(..., get_count_order(nr_irqs))

The ITS domain searches for a region of zero bits whose length must be
a power of two. If nr_irqs is 30, the region is actually 32 bits long,
but only the first 30 bits are really mapped.
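
The rounding described above can be sketched in plain userspace C.
count_order() below is a hypothetical stand-in that mimics what
get_count_order() returns for counts of 1 or more; it is not the kernel
implementation, and reserved_bits() is a name made up for this
illustration.

```c
#include <assert.h>

/*
 * Stand-in for get_count_order(): the smallest order such that
 * (1 << order) >= count. Assumes a small, non-zero count.
 */
static int count_order(unsigned int count)
{
	int order = 0;

	assert(count >= 1);
	while ((1u << order) < count)
		order++;
	return order;
}

/* Number of bitmap bits a request for nvecs interrupts really reserves. */
static unsigned int reserved_bits(unsigned int nvecs)
{
	return 1u << count_order(nvecs);
}
```

With these helpers, count_order(30) is 5 and reserved_bits(30) is 32,
so a request for 30 vectors leaves two bits reserved but unmapped.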

On teardown, the low-level domain only frees the interrupts that are
actually mapped and leaves the trailing interrupts dangling in the ITS
domain. Thus the ITS device resources are never freed. On device driver
reload, the dangling interrupts prevent the ITS domain from allocating
enough resources.

irq_domain_free_irqs_hierarchy(..., nr_irqs, ...)
its_irq_domain_free(..., irq_base + i, 1)
bitmap_release_region(..., irq_base + i, get_count_order(1))

John reported this problem on LKML and Marc provided a fix in the
generic code; see the discussion in the Link tag. Marc's patch fixes
John's problem but does not cover some corner cases, as the example
below shows.

Step1: 32 interrupts are allocated in the LPI domain, but only the
first 30 are returned to the higher-level driver.

111111111111111111111111111111 11
|<------------0~29------------>|30,31|

Step2: interrupts #16~28 are released one by one, so #0~15 and #29~31
are still allocated.

1111111111111111 0000000000000 1 11
|<-----0~15----->|<---16~28--->|29|30,31|

Step3: on driver teardown, the generic code invokes the ITS domain code
twice. The first call releases #0~15; the second releases only #29
(a count of 1 rounds to a one-bit region).

0000000000000000 0000000000000 0 11
|<-----0~15----->|<---16~28--->|29|30,31|

In short, the dangling problem arises because fewer hwirqs are released
than were allocated in the ITS domain.
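
The three steps can be replayed on a toy 32-bit map. This is only a
sketch: region_mask() and replay_partial_free() are hypothetical
helpers, and the set/clear operations are simplified stand-ins for
bitmap_find_free_region() and bitmap_release_region(), not the kernel
code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for get_count_order(); assumes a small, non-zero count. */
static int count_order(unsigned int count)
{
	int order = 0;

	assert(count >= 1);
	while ((1u << order) < count)
		order++;
	return order;
}

/* Mask covering the 2^order bits that start at bit pos. */
static uint32_t region_mask(unsigned int pos, int order)
{
	unsigned int len = 1u << order;

	assert(order >= 0 && pos + len <= 32);
	return (uint32_t)(((1ull << len) - 1) << pos);
}

/* Replay steps 1-3 and return the bits still set afterwards. */
static uint32_t replay_partial_free(void)
{
	uint32_t map = 0;
	unsigned int i;

	/* Step 1: 30 vectors requested, order 5 reserves bits 0-31. */
	map |= region_mask(0, count_order(30));

	/* Step 2: #16~28 are released one at a time. */
	for (i = 16; i <= 28; i++)
		map &= ~region_mask(i, count_order(1));

	/* Step 3: teardown releases #0~15 as a block, then #29 alone. */
	map &= ~region_mask(0, count_order(16));
	map &= ~region_mask(29, count_order(1));

	return map;
}
```

Under these assumptions replay_partial_free() returns 0xC0000000: bits
30 and 31 stay set, which matches the last diagram.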

To fix this problem, introduce a dangling list that records the hwirqs
that are allocated but unmapped. Whenever some LPI hwirqs are released,
walk the dangling list to find any dangling bits that immediately
follow them and release those as well. Revisiting the example above:

Step1: record '2' into the irq_data.dangling of #29 hwirq.

111111111111111111111111111111 11
|<------------0~29------------>|30,31|
dangling: 000000000000000000000000000002

Step2: no change

1111111111111111 0000000000000 1 11
|<-----0~15----->|<---16~28--->|29|30,31|
dangling: 0000000000000000 0000000000000 2

Step3: ITS domain will release #30~31 since the irq_data.dangling of #29
is '2'.

0000000000000000 0000000000000 0 00
|<-----0~15----->|<---16~28--->|29|30,31|
dangling: 0000000000000000 0000000000000 2
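
The same example with the surplus range recorded can be sketched as
follows. To stay short this uses a single hypothetical struct
dangling_range instead of a list, and every helper name here is made up
for the illustration; it is not the code from the diff below.

```c
#include <assert.h>
#include <stdint.h>

struct dangling_range {
	unsigned int start;	/* first reserved-but-unmapped bit */
	unsigned int end;	/* one past the last such bit */
	int live;		/* non-zero until the range is reclaimed */
};

/* Stand-in for get_count_order(); assumes a small, non-zero count. */
static int count_order(unsigned int count)
{
	int order = 0;

	assert(count >= 1);
	while ((1u << order) < count)
		order++;
	return order;
}

/* Mask covering the 2^order bits that start at bit pos. */
static uint32_t region_mask(unsigned int pos, int order)
{
	unsigned int len = 1u << order;

	assert(order >= 0 && pos + len <= 32);
	return (uint32_t)(((1ull << len) - 1) << pos);
}

/* Reserve a power-of-two region at pos and remember the surplus bits. */
static void alloc_and_record(uint32_t *map, struct dangling_range *d,
			     unsigned int pos, unsigned int nvecs)
{
	int order = count_order(nvecs);

	*map |= region_mask(pos, order);
	d->start = pos + nvecs;
	d->end = pos + (1u << order);
	d->live = d->end > d->start;
}

/* Free nvecs bits at pos; reclaim the surplus if it follows directly. */
static void free_and_reclaim(uint32_t *map, struct dangling_range *d,
			     unsigned int pos, unsigned int nvecs)
{
	unsigned int i;

	*map &= ~region_mask(pos, count_order(nvecs));
	if (!d->live || d->start != pos + nvecs)
		return;
	for (i = d->start; i < d->end; i++)
		*map &= ~region_mask(i, 0);
	d->live = 0;
}

/* Replay steps 1-3 with the surplus range recorded. */
static uint32_t replay_with_reclaim(void)
{
	struct dangling_range d;
	uint32_t map = 0;
	unsigned int i;

	alloc_and_record(&map, &d, 0, 30);	/* records [30, 32) */
	for (i = 16; i <= 28; i++)
		free_and_reclaim(&map, &d, i, 1);
	free_and_reclaim(&map, &d, 0, 16);
	free_and_reclaim(&map, &d, 29, 1);	/* ends at 30: reclaim */
	return map;
}
```

In this sketch only the release of #29 ends exactly where the recorded
range starts, so that call also clears bits 30 and 31 and
replay_with_reclaim() returns 0.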

Fixes: 4615fbc3788dd ("genirq/irqdomain: Don't try to free an interrupt that has no mapping")
Reported-by: John Garry <[email protected]>
Signed-off-by: Liao Chang <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
---
Changes since v1:
- Correct grammar and spelling mistakes in commit message.
- Refactor the fix to avoid hacking the generic irq code.

---
drivers/irqchip/irq-gic-v3-its.c | 57 ++++++++++++++++++++++++++++++--
1 file changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 5ff09de6c48f..e191491bf683 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -143,6 +143,12 @@ struct its_node {
/* Convert page order to size in bytes */
#define PAGE_ORDER_TO_SIZE(o) (PAGE_SIZE << (o))

+struct dangling_lpis {
+ struct list_head list;
+ irq_hw_number_t start;
+ irq_hw_number_t end;
+};
+
struct event_lpi_map {
unsigned long *lpi_map;
u16 *col_map;
@@ -152,6 +158,7 @@ struct event_lpi_map {
struct its_vm *vm;
struct its_vlpi_map *vlpi_maps;
int nr_vlpis;
+ struct list_head dangling;
};

/*
@@ -3414,6 +3421,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
dev->event_map.col_map = col_map;
dev->event_map.lpi_base = lpi_base;
dev->event_map.nr_lpis = nr_lpis;
+ INIT_LIST_HEAD(&dev->event_map.dangling);
raw_spin_lock_init(&dev->event_map.vlpi_lock);
dev->device_id = dev_id;
INIT_LIST_HEAD(&dev->entry);
@@ -3443,6 +3451,8 @@ static void its_free_device(struct its_device *its_dev)
static int its_alloc_device_irq(struct its_device *dev, int nvecs, irq_hw_number_t *hwirq)
{
int idx;
+ int real_bits = (1 << get_count_order(nvecs));
+ struct dangling_lpis *lpis;

/* Find a free LPI region in lpi_map and allocate them. */
idx = bitmap_find_free_region(dev->event_map.lpi_map,
@@ -3453,9 +3463,52 @@ static int its_alloc_device_irq(struct its_device *dev, int nvecs, irq_hw_number

*hwirq = dev->event_map.lpi_base + idx;

+ /*
+ * In order to reclaim dangling hwirq bits when module teardown,
+ * record all dangling hwirq here.
+ */
+ if (real_bits > nvecs) {
+ lpis = kzalloc(sizeof(*lpis), GFP_KERNEL);
+ if (!lpis) {
+ bitmap_release_region(dev->event_map.lpi_map,
+ *hwirq, get_count_order(nvecs));
+ return -ENOMEM;
+ }
+ lpis->start = *hwirq + nvecs;
+ lpis->end = *hwirq + real_bits;
+ list_add_tail(&lpis->list, &dev->event_map.dangling);
+ }
+
return 0;
}

+static void its_free_device_irq(struct its_device *dev, int nvecs,
+ irq_hw_number_t hwirq)
+{
+ struct dangling_lpis *entry, *temp;
+
+ bitmap_release_region(dev->event_map.lpi_map, hwirq,
+ get_count_order(nvecs));
+
+ /*
+ * If these hwirq are followed by some dangling bits, it needs to
+ * reclaim these dangling bits.
+ */
+ list_for_each_entry_safe(entry, temp, &dev->event_map.dangling, list) {
+ if (entry->start != hwirq + nvecs)
+ continue;
+
+ while (entry->start < entry->end) {
+ bitmap_release_region(dev->event_map.lpi_map,
+ entry->start, get_count_order(1));
+ entry->start += 1;
+ }
+ list_del(&entry->list);
+ kfree(entry);
+ break;
+ }
+}
+
static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
int nvec, msi_alloc_info_t *info)
{
@@ -3619,9 +3672,7 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
struct its_node *its = its_dev->its;
int i;

- bitmap_release_region(its_dev->event_map.lpi_map,
- its_get_event_id(irq_domain_get_irq_data(domain, virq)),
- get_count_order(nr_irqs));
+ its_free_device_irq(its_dev, nr_irqs, its_get_event_id(d));

for (i = 0; i < nr_irqs; i++) {
struct irq_data *data = irq_domain_get_irq_data(domain,
--
2.17.1


2022-09-05 18:05:34

by Marc Zyngier

Subject: Re: [PATCH V2] irqchip/gic-v3-its: Reclaim the dangling bits in LPI maps

On Wed, 31 Aug 2022 03:33:32 +0100,
Liao Chang <[email protected]> wrote:
>
> Following interrupt allocation process leads to some interrupts are
> mapped in the low-level domain(Arm ITS), but they have never mapped
> at the higher level.
>
> irq_domain_alloc_irqs_hierarchy(.., nr_irqs, ...)
> its_irq_domain_alloc(..., nr_irqs, ...)
> its_alloc_device_irq(..., nr_irqs, ...)
> bitmap_find_free_region(..., get_count_order(nr_irqs))
>
> Since ITS domain finds a region of zero bits, the length of which must
> aligned to the power of two. If nr_irqs is 30, the length of zero bits
> is actually 32, but the first 30 bits are really mapped.
>
> On teardown, the low-level domain only free these interrupts that
> actually mapped, and leave last interrupts dangling in the ITS domain.
> Thus the ITS device resources are never freed. On device driver reload,
> dangling interrupts prevent ITS domain from allocating enough resource.
>
> irq_domain_free_irqs_hierarchy(..., nr_irqs, ...)
> its_irq_domain_free(..., irq_base + i, 1)
> bitmap_release_region(..., irq_base + i, get_count_order(1))
>
> John reported this problem to LKML and Marc provided a solution and fix
> it in the generic code, see the discussion from Link tag. Marc's patch
> fix John's problem, but does not take care of some corner case, look one
> example below.
>
> Step1: 32 interrupts allocated in LPI domain, but return the first 30 to
> higher driver.
>
> 111111111111111111111111111111 11
> |<------------0~29------------>|30,31|
>
> Step2: interrupt #16~28 are released one by one, then #0~15 and #29~31
> still be there.
>
> 1111111111111111 0000000000000 1 11
> |<-----0~15----->|<---16~28--->|29|30,31|
>
> Step#: on driver teardown, generic code will invoke ITS domain code
> twice. The first time, #0~15 will be released, the second one, only #29
> will be released(1 align to power of two).
>
> 0000000000000000 0000000000000 0 11
> |<-----0~15----->|<---16~28--->|29|30,31|

Which driver is doing this? This really looks like a driver bug to
only free a portion of its MSI allocation, and that's definitely not
something that is commonly done.

Even worse, this can result in some LPIs being released behind the
driver's back, exactly due to this power-of-two alignment.

It seems to me that you are trying to solve a problem that only exists
for a buggy driver. Please point me to the upstream code that has such
behaviour and explain why this can't be fixed in that driver itself.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2022-09-07 08:01:04

by Liao, Chang

Subject: Re: [PATCH V2] irqchip/gic-v3-its: Reclaim the dangling bits in LPI maps

Marc, thanks for the comment.

On 2022/9/6 1:21, Marc Zyngier wrote:
> On Wed, 31 Aug 2022 03:33:32 +0100,
> Liao Chang <[email protected]> wrote:
>>
>> Following interrupt allocation process leads to some interrupts are
>> mapped in the low-level domain(Arm ITS), but they have never mapped
>> at the higher level.
>>
>> irq_domain_alloc_irqs_hierarchy(.., nr_irqs, ...)
>> its_irq_domain_alloc(..., nr_irqs, ...)
>> its_alloc_device_irq(..., nr_irqs, ...)
>> bitmap_find_free_region(..., get_count_order(nr_irqs))
>>
>> Since ITS domain finds a region of zero bits, the length of which must
>> aligned to the power of two. If nr_irqs is 30, the length of zero bits
>> is actually 32, but the first 30 bits are really mapped.
>>
>> On teardown, the low-level domain only free these interrupts that
>> actually mapped, and leave last interrupts dangling in the ITS domain.
>> Thus the ITS device resources are never freed. On device driver reload,
>> dangling interrupts prevent ITS domain from allocating enough resource.
>>
>> irq_domain_free_irqs_hierarchy(..., nr_irqs, ...)
>> its_irq_domain_free(..., irq_base + i, 1)
>> bitmap_release_region(..., irq_base + i, get_count_order(1))
>>
>> John reported this problem to LKML and Marc provided a solution and fix
>> it in the generic code, see the discussion from Link tag. Marc's patch
>> fix John's problem, but does not take care of some corner case, look one
>> example below.
>>
>> Step1: 32 interrupts allocated in LPI domain, but return the first 30 to
>> higher driver.
>>
>> 111111111111111111111111111111 11
>> |<------------0~29------------>|30,31|
>>
>> Step2: interrupt #16~28 are released one by one, then #0~15 and #29~31
>> still be there.
>>
>> 1111111111111111 0000000000000 1 11
>> |<-----0~15----->|<---16~28--->|29|30,31|
>>
>> Step#: on driver teardown, generic code will invoke ITS domain code
>> twice. The first time, #0~15 will be released, the second one, only #29
>> will be released(1 align to power of two).
>>
>> 0000000000000000 0000000000000 0 11
>> |<-----0~15----->|<---16~28--->|29|30,31|
>
> Which driver is doing this? This really looks like a driver bug to
> only free a portion of its MSI allocation, and that's definitely not
> something that is commonly done.

Yes, this scenario is contrived. I used this example to show why the current ITS
allocation is buggy: the number of interrupts a driver releases in one call has
to equal the number allocated by the **last** allocation call, even if the totals
match. The pseudocode below illustrates the problem.

[Correct usage]
virq = irq_domain_alloc_irqs(...,0,30,...) // 32 bits are allocated actually.
irq_domain_free_irqs(virq, 30) // 32 bits are released actually.

[Incorrect usage]
virq = irq_domain_alloc_irqs(...,0,30,...)
for(i = 0; i < 30; i++)
irq_domain_free_irqs(virq + i, 1)
// driver releases 30 irqs, but the last 2 bits are left dangling due to alignment.
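
The difference between the two usages comes down to how many bits each
free path clears. A minimal userspace sketch, assuming every free call
clears 1 << get_count_order(n) bits as described earlier in the thread
(count_order() and both helper names are hypothetical):

```c
#include <assert.h>

/* Stand-in for get_count_order(); assumes a small, non-zero count. */
static int count_order(unsigned int count)
{
	int order = 0;

	assert(count >= 1);
	while ((1u << order) < count)
		order++;
	return order;
}

/* Bits cleared when all nvecs are freed by a single call. */
static unsigned int released_in_one_call(unsigned int nvecs)
{
	return 1u << count_order(nvecs);
}

/* Bits cleared when the same nvecs are freed one call at a time. */
static unsigned int released_one_by_one(unsigned int nvecs)
{
	unsigned int i, total = 0;

	for (i = 0; i < nvecs; i++)
		total += 1u << count_order(1);
	return total;
}
```

For 30 vectors the single call clears 32 bits while the loop clears
only 30, which is the two-bit gap noted in the comment above.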

>
> Even worse, this can result in some LPIs being released behind the
> driver's back, exactly due to this power-of-two alignment.
>
> It seems to me that you are trying to solve a problem that only exists
> for a buggy driver. Please point me to the upstream code that has such
> behaviour and explain why this can't be fixed in that driver itself.

Indeed, I found no upstream code with such buggy behaviour. Thanks for pointing
out this important and undocumented rule; it is very helpful to me. Please ignore
this patch.

>
> Thanks,
>
> M.
>

--
BR,
Liao, Chang