Dear Marc, Thomas,
we have encountered the following problem that we hope you can shed
some light on: What is the intended way to set the affinity (and
possibly other IRQ attributes) of the parent IRQ of chained IRQs when
using the irqdomain API?
We are working on a driver that
- registers an irqchip and adds an irqdomain
- calls irq_set_chained_handler_and_data(parent_irq, handler)
where handler triggers handling of child IRQs
- but since parent_irq isn't requested with request_threaded_irq(),
it does not show up in procfs/sysfs, only in debugfs
- the HW does not support setting affinity for the chained IRQs, only
the parent (which comes from a GIC chip)
The problem is that the parent IRQ, as mentioned in the third point,
does not show up in procfs/sysfs.
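To make this concrete, the setup looks roughly like the sketch below
(heavily simplified; the foo_* names and register layout are made up
for illustration and are not from our actual driver):

#include <linux/io.h>
#include <linux/irq.h>
#include <linux/irqchip/chained_irq.h>
#include <linux/irqdomain.h>

#define FOO_NR_IRQS	32
#define FOO_IRQ_STATUS	0x04

struct foo_priv {
	void __iomem *base;
	struct irq_domain *domain;
};

static const struct irq_domain_ops foo_domain_ops; /* .map etc. elided */

/* Chained flow handler: demultiplex the parent IRQ into the child domain. */
static void foo_chained_handler(struct irq_desc *desc)
{
	struct foo_priv *priv = irq_desc_get_handler_data(desc);
	struct irq_chip *chip = irq_desc_get_chip(desc);
	unsigned long pending;
	unsigned int bit;

	chained_irq_enter(chip, desc);

	pending = readl(priv->base + FOO_IRQ_STATUS);
	for_each_set_bit(bit, &pending, FOO_NR_IRQS)
		generic_handle_domain_irq(priv->domain, bit);

	chained_irq_exit(chip, desc);
}

static int foo_setup(struct foo_priv *priv, struct device_node *np,
		     unsigned int parent_irq)
{
	priv->domain = irq_domain_add_linear(np, FOO_NR_IRQS,
					     &foo_domain_ops, priv);
	if (!priv->domain)
		return -ENOMEM;

	/*
	 * parent_irq comes from the GIC. It is never request_irq()'d,
	 * so it only shows up in debugfs, not in procfs/sysfs.
	 */
	irq_set_chained_handler_and_data(parent_irq, foo_chained_handler,
					 priv);
	return 0;
}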
Is there some precedent for this?
Thank you.
Marek
On Mon, 02 May 2022 09:21:37 +0100,
Marek Behún <[email protected]> wrote:
>
> Dear Marc, Thomas,
>
> we have encountered the following problem that we hope you can shed
> some light on: What is the intended way to set the affinity (and
> possibly other IRQ attributes) of the parent IRQ of chained IRQs when
> using the irqdomain API?
Simples: you can't. What sense does it make to change the affinity of
the parent interrupt, given that its fate is tied to *all* of the
other interrupts that are muxed to it?
Moving the parent interrupt breaks userspace's view of how interrupt
affinity is managed (change the affinity of one interrupt, see all the
others move the same way). Which is why we don't expose this interrupt
to userspace, as this can only lead to bad things.
Note that this has nothing to do with the irqdomain API, but
everything to do with the userspace ABI.
>
> We are working on a driver that
> - registers an irqchip and adds an irqdomain
> - calls irq_set_chained_handler_and_data(parent_irq, handler)
> where handler triggers handling of child IRQs
> - but since parent_irq isn't requested with request_threaded_irq(),
> it does not show up in procfs/sysfs, only in debugfs
> - the HW does not support setting affinity for the chained IRQs, only
> the parent (which comes from a GIC chip)
>
> The problem is that the parent IRQ, as mentioned in the third point,
> does not show up in procfs/sysfs.
>
> Is there some precedent for this?
There were precedents of irqchips doing terrible things, such as
implementing a set_affinity() callback in the chained irqchip.
Thankfully, they have been either fixed or eradicated.
M.
--
Without deviation from the norm, progress is not possible.
On Mon, 02 May 2022 10:31:11 +0100
Marc Zyngier <[email protected]> wrote:
> On Mon, 02 May 2022 09:21:37 +0100,
> Marek Behún <[email protected]> wrote:
> >
> > Dear Marc, Thomas,
> >
> > we have encountered the following problem that we hope you can shed
> > some light on: What is the intended way to set the affinity (and
> > possibly other IRQ attributes) of the parent IRQ of chained IRQs when
> > using the irqdomain API?
>
> Simples: you can't. What sense does it make to change the affinity of
> the parent interrupt, given that its fate is tied to *all* of the
> other interrupts that are muxed to it?
Dear Marc,
thank you for your answer. Still:
What about when we want to set the same affinity for all the chained
interrupts?
Example: on Armada 385 there are 4 PCIe controllers. Each controller
has one interrupt from which we trigger chained interrupts. We would
like to configure each controller to trigger its interrupt (and thus
all the chained interrupts in its domain) on a different CPU core.
Moreover, we would really like to do this at runtime, through sysfs,
depending on, for example, whether there are cards plugged into the
PCIe ports.
Maybe there should be some mechanism to allow changing the affinity of
a whole irqdomain, or something?
Marek
On Mon, 02 May 2022 16:45:59 +0100,
Marek Behún <[email protected]> wrote:
>
> On Mon, 02 May 2022 10:31:11 +0100
> Marc Zyngier <[email protected]> wrote:
>
> > On Mon, 02 May 2022 09:21:37 +0100,
> > Marek Behún <[email protected]> wrote:
> > >
> > > Dear Marc, Thomas,
> > >
> > > we have encountered the following problem that we hope you can shed
> > > some light on: What is the intended way to set the affinity (and
> > > possibly other IRQ attributes) of the parent IRQ of chained IRQs when
> > > using the irqdomain API?
> >
> > Simples: you can't. What sense does it make to change the affinity of
> > the parent interrupt, given that its fate is tied to *all* of the
> > other interrupts that are muxed to it?
>
> Dear Marc,
>
> thank you for your answer. Still:
>
> What about when we want to set the same affinity for all the chained
> interrupts?
>
> Example: on Armada 385 there are 4 PCIe controllers. Each controller
> has one interrupt from which we trigger chained interrupts. We would
> like to configure each controller to trigger its interrupt (and thus
> all the chained interrupts in its domain) on a different CPU core.
>
> Moreover, we would really like to do this at runtime, through sysfs,
> depending on, for example, whether there are cards plugged into the
> PCIe ports.
>
> Maybe there should be some mechanism to allow changing the affinity of
> a whole irqdomain, or something?
Should? Maybe. But not for an irqdomain (which really doesn't have
anything to do with interrupt affinity).
What you may want is a new sysfs interface that would allow a parent
interrupt's affinity to be changed while also exposing to userspace all
the interrupts this affects *at the same time*. Something like:
/sys/kernel/irq/42/smp_affinity_list
/sys/kernel/irq/42/muxed_irqs/
/sys/kernel/irq/42/muxed_irqs/56 -> ../../56
/sys/kernel/irq/42/muxed_irqs/57 -> ../../57
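As far as the plumbing goes, the links could probably hang off the
per-interrupt kobjects that already live under /sys/kernel/irq/ (see
kernel/irq/irqdesc.c). Completely untested sketch; the mux_kobj field
is invented and doesn't exist today:

static int irq_setup_mux_dir(struct irq_desc *parent_desc)
{
	/* "muxed_irqs" directory under /sys/kernel/irq/<parent>/ */
	parent_desc->mux_kobj = kobject_create_and_add("muxed_irqs",
						       &parent_desc->kobj);
	return parent_desc->mux_kobj ? 0 : -ENOMEM;
}

static int irq_add_mux_link(struct irq_desc *parent_desc,
			    struct irq_desc *child_desc)
{
	/* e.g. muxed_irqs/56 -> ../../56 */
	return sysfs_create_link(parent_desc->mux_kobj, &child_desc->kobj,
				 kobject_name(&child_desc->kobj));
}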
The main issues are that:
- we don't really track the muxing information in any of the data
structures, so you can't just walk a short list and generate this
information. You'd need to build the topology information at
allocation time (or fish it out at runtime, but that's likely a
pain).
- sysfs doesn't deal with affinities at all. procfs does, but adding
more crap there is frowned upon.
- it *must* be a new interface. You can't repurpose the existing one,
as something like irqbalance would otherwise be massively confused by
seeing interrupts moving around behind its back.
- conversely, you'll need to teach irqbalance how to deal with this
new interface.
- this needs to be safe against CPU hotplug. It probably already is,
but nobody ever tested it, given that userspace can't interact with
these interrupts at the moment.
M.
--
Without deviation from the norm, progress is not possible.
Hello Marc, Marek,
On Tue, 2022-05-03 at 10:32 +0100, Marc Zyngier wrote:
> On Mon, 02 May 2022 16:45:59 +0100,
> Marek Behún <[email protected]> wrote:
> >
> > On Mon, 02 May 2022 10:31:11 +0100
> > Marc Zyngier <[email protected]> wrote:
> >
> > > On Mon, 02 May 2022 09:21:37 +0100,
> > > Marek Behún <[email protected]> wrote:
> > > >
> > > > Dear Marc, Thomas,
> > > >
> > > > we have encountered the following problem that we hope you can shed
> > > > some light on: What is the intended way to set the affinity (and
> > > > possibly other IRQ attributes) of the parent IRQ of chained IRQs when
> > > > using the irqdomain API?
> > >
> > > Simples: you can't. What sense does it make to change the affinity of
> > > the parent interrupt, given that its fate is tied to *all* of the
> > > other interrupts that are muxed to it?
> >
> > Dear Marc,
> >
> > thank you for your answer. Still:
> >
> > What about when we want to set the same affinity for all the chained
> > interrupts?
> >
> > Example: on Armada 385 there are 4 PCIe controllers. Each controller
> > has one interrupt from which we trigger chained interrupts. We would
> > like to configure each controller to trigger its interrupt (and thus
> > all the chained interrupts in its domain) on a different CPU core.
> >
> > Moreover, we would really like to do this at runtime, through sysfs,
> > depending on, for example, whether there are cards plugged into the
> > PCIe ports.
> >
> > Maybe there should be some mechanism to allow changing the affinity of
> > a whole irqdomain, or something?
>
> Should? Maybe. But not for an irqdomain (which really doesn't have
> anything to do with interrupt affinity).
>
> What you may want is a new sysfs interface that would allow a parent
> interrupt's affinity to be changed while also exposing to userspace all
> the interrupts this affects *at the same time*. Something like:
>
> /sys/kernel/irq/42/smp_affinity_list
> /sys/kernel/irq/42/muxed_irqs/
> /sys/kernel/irq/42/muxed_irqs/56 -> ../../56
> /sys/kernel/irq/42/muxed_irqs/57 -> ../../57
>
> The main issues are that:
>
> - we don't really track the muxing information in any of the data
> structures, so you can't just walk a short list and generate this
> information. You'd need to build the topology information at
> allocation time (or fish it out at runtime, but that's likely a
> pain).
>
> - sysfs doesn't deal with affinities at all. procfs does, but adding
> more crap there is frowned upon.
>
> - it *must* be a new interface. You can't repurpose the existing one,
> as something like irqbalance would otherwise be massively confused by
> seeing interrupts moving around behind its back.
>
> - conversely, you'll need to teach irqbalance how to deal with this
> new interface.
>
> - this needs to be safe against CPU hotplug. It probably already is,
> but nobody ever tested it, given that userspace can't interact with
> these interrupts at the moment.
Are you aware of any work being done (or having been done) in this
area? Thanks in advance!
My colleagues and I are looking into picking this up and implementing
the new sysfs interface and the related irqbalance changes, and we are
currently evaluating the level of effort. Obviously, we would like to
avoid any effort duplication.
Best regards,
Radu
Hi Radu,
On Fri, 07 Apr 2023 00:56:40 +0100,
Radu Rendec <[email protected]> wrote:
>
> Hello Marc, Marek,
>
> On Tue, 2022-05-03 at 10:32 +0100, Marc Zyngier wrote:
> > On Mon, 02 May 2022 16:45:59 +0100,
> > Marek Behún <[email protected]> wrote:
> > >
> > > On Mon, 02 May 2022 10:31:11 +0100
> > > Marc Zyngier <[email protected]> wrote:
> > >
> > > > On Mon, 02 May 2022 09:21:37 +0100,
> > > > Marek Behún <[email protected]> wrote:
> > > > >
> > > > > Dear Marc, Thomas,
> > > > >
> > > > > we have encountered the following problem that we hope you can shed
> > > > > some light on: What is the intended way to set the affinity (and
> > > > > possibly other IRQ attributes) of the parent IRQ of chained IRQs when
> > > > > using the irqdomain API?
> > > >
> > > > Simples: you can't. What sense does it make to change the affinity of
> > > > the parent interrupt, given that its fate is tied to *all* of the
> > > > other interrupts that are muxed to it?
> > >
> > > Dear Marc,
> > >
> > > thank you for your answer. Still:
> > >
> > > What about when we want to set the same affinity for all the chained
> > > interrupts?
> > >
> > > Example: on Armada 385 there are 4 PCIe controllers. Each controller
> > > has one interrupt from which we trigger chained interrupts. We would
> > > like to configure each controller to trigger its interrupt (and thus
> > > all the chained interrupts in its domain) on a different CPU core.
> > >
> > > Moreover, we would really like to do this at runtime, through sysfs,
> > > depending on, for example, whether there are cards plugged into the
> > > PCIe ports.
> > >
> > > Maybe there should be some mechanism to allow changing the affinity of
> > > a whole irqdomain, or something?
> >
> > Should? Maybe. But not for an irqdomain (which really doesn't have
> > anything to do with interrupt affinity).
> >
> > What you may want is a new sysfs interface that would allow a parent
> > interrupt's affinity to be changed while also exposing to userspace all
> > the interrupts this affects *at the same time*. Something like:
> >
> > /sys/kernel/irq/42/smp_affinity_list
> > /sys/kernel/irq/42/muxed_irqs/
> > /sys/kernel/irq/42/muxed_irqs/56 -> ../../56
> > /sys/kernel/irq/42/muxed_irqs/57 -> ../../57
> >
> > The main issues are that:
> >
> > - we don't really track the muxing information in any of the data
> > structures, so you can't just walk a short list and generate this
> > information. You'd need to build the topology information at
> > allocation time (or fish it out at runtime, but that's likely a
> > pain).
> >
> > - sysfs doesn't deal with affinities at all. procfs does, but adding
> > more crap there is frowned upon.
> >
> > - it *must* be a new interface. You can't repurpose the existing one,
> > as something like irqbalance would otherwise be massively confused by
> > seeing interrupts moving around behind its back.
> >
> > - conversely, you'll need to teach irqbalance how to deal with this
> > new interface.
> >
> > - this needs to be safe against CPU hotplug. It probably already is,
> > but nobody ever tested it, given that userspace can't interact with
> > these interrupts at the moment.
>
> Are you aware of any work being done (or having been done) in this
> area? Thanks in advance!
>
> My colleagues and I are looking into picking this up and implementing
> the new sysfs interface and the related irqbalance changes, and we are
> currently evaluating the level of effort. Obviously, we would like to
> avoid any effort duplication.
I don't think anyone ever tried it (it's far easier to just moan about
it than to do anything useful). But if you want to start looking into
that, that'd be great.
One of my concerns is that allowing affinity changes for chained
interrupts may uncover issues in existing drivers, so it would have to
be an explicit buy-in for any chained irqchip. That's probably not too
hard to achieve anyway given that you'll need some new infrastructure
to track the muxed interrupts.
Hopefully this will result in something actually happening! ;-)
Cheers,
M.
--
Without deviation from the norm, progress is not possible.
Hi Marc,
On Fri, 2023-04-07 at 10:18 +0100, Marc Zyngier wrote:
> On Fri, 07 Apr 2023 00:56:40 +0100, Radu Rendec <[email protected]> wrote:
> > Are you aware of any work being done (or having been done) in this
> > area? Thanks in advance!
> >
> > My colleagues and I are looking into picking this up and implementing
> > the new sysfs interface and the related irqbalance changes, and we are
> > currently evaluating the level of effort. Obviously, we would like to
> > avoid any effort duplication.
>
> I don't think anyone ever tried it (it's far easier to just moan about
> it than to do anything useful). But if you want to start looking into
> that, that'd be great.
Thanks for the feedback, and sorry for the late reply. It looks like I
have already started: I have been working on a "sandbox" driver that
implements hierarchical/muxed interrupts and will allow me to test in
a generic environment, without requiring mux hardware or messing with
real interrupts.
But first, I would like to clarify something, just to make sure I'm on
the right track. It looks to me like with the hierarchical IRQ domain
API, there is always a 1:1 end-to-end mapping between virqs and the
hwirqs near the CPU. IOW, there is a 1:1 mapping between a given virq
and the corresponding hwirq in each IRQ domain along the chain, and
there is no other virq in-between. I looked at many of the irqchip
drivers that implement the hierarchical API, and couldn't find a single
one that does muxed IRQs. Furthermore, the revmap in struct irq_domain
is clearly a 1:1 map, so when an IRQ vector is entered, there is no way
to map multiple virqs (and run the associated handlers). I tried it in
my test driver, and if the .alloc domain op implementation allocates
the same hwirq in the parent domain for two different (v)irqs, the
revmap slot in the parent domain is overwritten.
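To illustrate the 1:1 pass-through I am describing, the .alloc path in
my test driver looks roughly like this (simplified sketch; the foo_*
names are made up and error handling is omitted):

#include <linux/irq.h>
#include <linux/irqdomain.h>

static struct irq_chip foo_chip;	/* callbacks elided */

static int foo_domain_alloc(struct irq_domain *domain, unsigned int virq,
			    unsigned int nr_irqs, void *arg)
{
	struct irq_fwspec *fwspec = arg;
	irq_hw_number_t hwirq = fwspec->param[0];
	struct irq_fwspec parent_fwspec;
	int i;

	/* Each virq gets its own hwirq in this domain... */
	for (i = 0; i < nr_irqs; i++)
		irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i,
					      &foo_chip, domain->host_data);

	/*
	 * ...and is passed straight through to the parent domain. If two
	 * virqs end up on the same parent hwirq, the parent's revmap slot
	 * is simply overwritten, which is what I observed above.
	 */
	parent_fwspec = *fwspec;
	parent_fwspec.fwnode = domain->parent->fwnode;
	return irq_domain_alloc_irqs_parent(domain, virq, nr_irqs,
					    &parent_fwspec);
}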
If my understanding is correct, muxed IRQs are not possible with the
hierarchical IRQ domain API. That means in this particular case you can
never indirectly change the affinity of a different IRQ because hwirqs
are never shared. So, this is just a matter of exposing the affinity
through the new sysfs API for every irqchip driver that opts in.
On the other hand, muxed IRQs *are* possible with the legacy API, and
drivers/irqchip/irq-imx-intmux.c is a clear example of that. However,
in this case one (or multiple) additional virq(s) exist at the mux
level, and it is the virq handler that implements the logic to invoke
the appropriate downstream (child) virq handler(s). Then the virq(s) at
the mux level and all the corresponding downstream virqs share the same
affinity setting, because they also share the same hwirq in the root
domain (which is where affinity is really implemented). And yes, in
this case the relationship between these virqs is not tracked anywhere
currently. Is this what you had in mind when you mentioned below a "new
infrastructure to track muxed interrupts"?
> One of my concerns is that allowing affinity changes for chained
> interrupts may uncover issues in existing drivers, so it would have to
> be an explicit buy-in for any chained irqchip. That's probably not too
> hard to achieve anyway given that you'll need some new infrastructure
> to track the muxed interrupts.
The first thing that comes to mind for the "explicit buy-in" is a new
function pointer in struct irq_chip to set the affinity in a mux-aware
manner. Something like irq_set_affinity_shared or _chained. I may not
see the whole picture yet but so far my thinking is that the existing
irq_set_affinity must remain unchanged in order to preserve
compatibility/behavior of the procfs interface.
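Just to illustrate the idea (purely hypothetical; neither the callback
nor the foo_* names exist anywhere), an opted-in chained irqchip could
then implement the new callback by simply forwarding the request to its
parent interrupt:

#include <linux/interrupt.h>
#include <linux/irq.h>

struct foo_priv {
	unsigned int parent_irq;	/* the muxed GIC interrupt */
};

static int foo_irq_set_affinity_shared(struct irq_data *d,
				       const struct cpumask *mask,
				       bool force)
{
	struct foo_priv *priv = irq_data_get_irq_chip_data(d);

	/*
	 * Moving the parent also moves every child interrupt muxed
	 * behind it, which is exactly what the new sysfs interface
	 * would expose to userspace.
	 */
	return irq_set_affinity(priv->parent_irq, mask);
}

This way the existing ->irq_set_affinity (and the procfs files backed
by it) would stay untouched, and only chips that provide the new
callback would show up in the new interface.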
> Hopefully this will result in something actually happening! ;-)
I really hope so. I am also excited to have the opportunity to work on
this. I will likely need your guidance along the way but I think it's
better to talk in advance than submit a huge patch series that makes no
sense :)
Thanks,
Radu