Hi,
I have been working on a interrupt controller driver that uses tree
based mapping for its domain (irq_domain_add_tree(..)).
If I understand correctly, the clients get a mapping when they call
platform_get_irq(..).
However, after these clients are removed (rmmod), when I try to remove
the interrupt controller driver where it calls irq_domain_remove(..), I
hit this warning from kernel/kernel/irq/irqdomain.c::
irq_domain_remove(..)
[WARN_ON(!radix_tree_empty(&domain->revmap_tree));]-
WARNING: CPU: 0 PID: 238 at /kernel/kernel/irq/irqdomain.c:246
irq_domain_remove+0x84/0x98
Also, I see that the requested IRQs by the clients are still present (in
/proc/interrupts) even after they had been removed.
Hence, I just wanted to know how to handle this warning. Should the
client clean up by calling irq_dispose_mapping(..) or is it the
responsibility of the interrupt controller driver to dispose the
mappings one by one?
Regards,
Prakruthi Deepak Heragu
On Tue, 23 Jul 2019 14:52:34 -0700
[email protected] wrote:
Hi Prakruthi,
> Hi,
>
> I have been working on a interrupt controller driver that uses tree
> based mapping for its domain (irq_domain_add_tree(..)).
> If I understand correctly, the clients get a mapping when they call
> platform_get_irq(..). However, after these clients are removed
> (rmmod), when I try to remove the interrupt controller driver where
> it calls irq_domain_remove(..), I hit this warning from
> kernel/kernel/irq/irqdomain.c:: irq_domain_remove(..)
> [WARN_ON(!radix_tree_empty(&domain->revmap_tree));]-
> WARNING: CPU: 0 PID: 238 at /kernel/kernel/irq/irqdomain.c:246 irq_domain_remove+0x84/0x98
>
> Also, I see that the requested IRQs by the clients are still present
> (in /proc/interrupts) even after they had been removed. Hence, I just
> wanted to know how to handle this warning. Should the client clean up
> by calling irq_dispose_mapping(..) or is it the responsibility of the
> interrupt controller driver to dispose the mappings one by one?
In general, building interrupt controller drivers as a module is a
pretty difficult thing to do in a safe manner. As you found out, this
relies on the irq_domain being "emptied" before it can be freed. There
are some other gotchas in the rest of the IRQ stack as well.
Doing that is hard. One of the reasons is that the OF subsystem will
happily allocate all the interrupts it can even if there is no driver
having requested them (see of_platform_populate). This means that you
cannot track whether a client driver is using one of the interrupt your
irqchip is in charge of. You can apply some heuristics, but they are in
general all wrong.
Fixing the OF subsystem is possible, but will break a lot of platforms
that will have to be identified and fixed one by one. Another
possibility would be to refcount irqdescs, and make sure the irqdomain
directly holds pointers to them. Doable, but may create overhead.
To sum it up, don't build your irqchip driver as a module if you can
avoid it. If you can't, you'll have to be very careful about how the
mapping is established (make sure it is not created by
of_platform_populate), and use irq_dispose_mapping in the client
drivers.
Hope this helps,
M.
--
Without deviation from the norm, progress is not possible.
On 2019-07-23 23:51, Marc Zyngier wrote:
> On Tue, 23 Jul 2019 14:52:34 -0700
> [email protected] wrote:
>
> Hi Prakruthi,
>
>> Hi,
>>
>> I have been working on a interrupt controller driver that uses tree
>> based mapping for its domain (irq_domain_add_tree(..)).
>> If I understand correctly, the clients get a mapping when they call
>> platform_get_irq(..). However, after these clients are removed
>> (rmmod), when I try to remove the interrupt controller driver where
>> it calls irq_domain_remove(..), I hit this warning from
>> kernel/kernel/irq/irqdomain.c:: irq_domain_remove(..)
>> [WARN_ON(!radix_tree_empty(&domain->revmap_tree));]-
>> WARNING: CPU: 0 PID: 238 at /kernel/kernel/irq/irqdomain.c:246
>> irq_domain_remove+0x84/0x98
>>
>> Also, I see that the requested IRQs by the clients are still present
>> (in /proc/interrupts) even after they had been removed. Hence, I just
>> wanted to know how to handle this warning. Should the client clean up
>> by calling irq_dispose_mapping(..) or is it the responsibility of the
>> interrupt controller driver to dispose the mappings one by one?
>
> In general, building interrupt controller drivers as a module is a
> pretty difficult thing to do in a safe manner. As you found out, this
> relies on the irq_domain being "emptied" before it can be freed. There
> are some other gotchas in the rest of the IRQ stack as well.
>
> Doing that is hard. One of the reasons is that the OF subsystem will
> happily allocate all the interrupts it can even if there is no driver
> having requested them (see of_platform_populate). This means that you
> cannot track whether a client driver is using one of the interrupt your
> irqchip is in charge of. You can apply some heuristics, but they are in
> general all wrong.
>
> Fixing the OF subsystem is possible, but will break a lot of platforms
> that will have to be identified and fixed one by one. Another
> possibility would be to refcount irqdescs, and make sure the irqdomain
> directly holds pointers to them. Doable, but may create overhead.
>
> To sum it up, don't build your irqchip driver as a module if you can
> avoid it. If you can't, you'll have to be very careful about how the
> mapping is established (make sure it is not created by
> of_platform_populate), and use irq_dispose_mapping in the client
> drivers.
>
As per your suggestion I tried making this driver a statically compiled
one.
I tried various approaches with this -
1. Using arch_inticall(..) - When I used this call, I saw that once the
clients were removed, I don't see the IRQs requested by them (in
/proc/interrupts).
2. Using module_init(..) (statically compiled) - When I used this call,
I saw that
even after the clients were removed, I do see their requested IRQs in
/proc/interrupts.
This behavior in #2 is the same as the one I saw when I compiled my
driver as a module
and used arch_initcall(..).
Is there any reason why this is seen only with arch_initcall(..) used
statically?
Regards,
Prakruthi Deepak