2008-08-28 20:50:11

by Brice Goglin

Subject: [RFC] export irq_set/get_affinity() for multiqueue network drivers

Hello,

Is there any way to set up IRQ masks from within a driver? myri10ge
currently relies on an external script (writing to
/proc/irq/*/smp_affinity) to bind each queue/MSI-X to a different
processor. By default, Linux will either:
* round-robin the interrupts (killing the benefit of DCA, for instance)
* put all IRQs on the same CPU (killing much of the benefit of multislices)
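
(For illustration, a minimal userspace sketch of what such a script ends up
doing; the IRQ numbers and the one-queue-per-CPU mapping below are made up,
a real script would take them from /proc/interrupts:)

/*
 * Write a hex CPU mask into /proc/irq/<irq>/smp_affinity for each
 * MSI-X vector, binding queue i to CPU i.
 */
#include <stdio.h>

int main(void)
{
    int irqs[] = { 48, 49, 50, 51 };  /* hypothetical MSI-X vectors, one per slice */
    int i, nirqs = sizeof(irqs) / sizeof(irqs[0]);

    for (i = 0; i < nirqs; i++) {
        char path[64];
        FILE *f;

        snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irqs[i]);
        f = fopen(path, "w");
        if (!f) {
            perror(path);
            continue;
        }
        /* the mask is a hex bitmap of allowed CPUs: bind queue i to CPU i */
        fprintf(f, "%x\n", 1u << i);
        fclose(f);
    }
    return 0;
}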

With more and more drivers using multiple queues, I think we need a nice way
to bind MSI-X interrupts from within the drivers. I am not sure what's best;
the attached (untested) patch would just export the existing
irq_set_affinity() and add irq_get_affinity(). Comments?
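
(For context, the kind of driver-side call such an export would enable might
look roughly like the sketch below; this is not the attached patch, it assumes
the "const struct cpumask *" prototype of irq_set_affinity(), and the naive
queue-to-CPU mapping is only illustrative:)

/* Sketch: spread a NIC's MSI-X vectors over CPUs from inside the driver. */
#include <linux/interrupt.h>
#include <linux/cpumask.h>

static void example_bind_queues(const int *msix_irqs, int nqueues)
{
    int q;

    for (q = 0; q < nqueues; q++) {
        int cpu = q % num_online_cpus();  /* naive mapping, for illustration only */

        irq_set_affinity(msix_irqs[q], cpumask_of(cpu));
    }
}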

thanks,
Brice


Attachments:
export_irq_affinity.patch (1.76 kB)

2008-08-28 20:56:25

by David Miller

Subject: Re: [RFC] export irq_set/get_affinity() for multiqueue network drivers

From: Brice Goglin <[email protected]>
Date: Thu, 28 Aug 2008 22:21:53 +0200

> With more and more drivers using multiple queues, I think we need a nice way
> to bind MSI-X interrupts from within the drivers. I am not sure what's best;
> the attached (untested) patch would just export the existing
> irq_set_affinity() and add irq_get_affinity(). Comments?

I think we should rather have some kind of generic thing in the
IRQ layer that allows specifying the usage model of the device's
interrupts, so that the IRQ layer can choose default affinities.

I never notice any of this complete insanity on sparc64 because
we flat spread out all of the interrupts across the machine.

What we don't want is drivers choosing IRQ affinity settings;
they have no idea about NUMA topology, what NUMA node the
PCI controller sits behind, what CPUs are there, etc., and
without that kind of knowledge you cannot possibly make
affinity decisions properly.
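
(Purely as an illustration of that idea, not an existing or proposed
interface: the driver would only declare what each vector is for, and the
IRQ layer, which actually knows the topology, would pick the affinities.)

/* Hypothetical sketch only -- no such API exists in the tree. */
enum example_irq_usage {
    EXAMPLE_IRQ_USAGE_PER_QUEUE_RX,  /* one vector per receive queue */
    EXAMPLE_IRQ_USAGE_PER_QUEUE_TX,  /* one vector per transmit queue */
    EXAMPLE_IRQ_USAGE_MISC,          /* link state, errors, mailbox, ... */
};

/* hypothetical: the IRQ core spreads the vectors over CPUs close to the device */
int example_irq_set_usage(unsigned int irq, enum example_irq_usage usage);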

2008-08-29 07:08:22

by Brice Goglin

Subject: Re: [RFC] export irq_set/get_affinity() for multiqueue network drivers

David Miller wrote:
> I think we should rather have some kind of generic thing in the
> IRQ layer that allows specifying the usage model of the device's
> interrupts, so that the IRQ layer can choose default affinities.
>
> I never notice any of this complete insanity on sparc64 because
> we flat spread out all of the interrupts across the machine.
>
> What we don't want is drivers choosing IRQ affinity settings;
> they have no idea about NUMA topology, what NUMA node the
> PCI controller sits behind, what CPUs are there, etc., and
> without that kind of knowledge you cannot possibly make
> affinity decisions properly.

As long as we get something better than the current behavior, I am fine
with it :)

Brice

2008-08-29 14:53:32

by Arjan van de Ven

Subject: Re: [RFC] export irq_set/get_affinity() for multiqueue network drivers

On Thu, 28 Aug 2008 22:21:53 +0200
Brice Goglin <[email protected]> wrote:

> Hello,
>
> Is there any way to set up IRQ masks from within a driver? myri10ge
> currently relies on an external script (writing to
> /proc/irq/*/smp_affinity) to bind each queue/MSI-X to a different
> processor. By default, Linux will either:
> * round-robin the interrupts (killing the benefit of DCA, for instance)
> * put all IRQs on the same CPU (killing much of the benefit of multislices)

* do the right thing with the userspace irq balancer

--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-08-29 16:48:27

by Andi Kleen

Subject: Re: [RFC] export irq_set/get_affinity() for multiqueue network drivers

Arjan van de Ven <[email protected]> writes:

> On Thu, 28 Aug 2008 22:21:53 +0200
> Brice Goglin <[email protected]> wrote:
>
>> Hello,
>>
>> Is there any way to set up IRQ masks from within a driver? myri10ge
>> currently relies on an external script (writing to
>> /proc/irq/*/smp_affinity) to bind each queue/MSI-X to a different
>> processor. By default, Linux will either:
>> * round-robin the interrupts (killing the benefit of DCA, for instance)
>> * put all IRQs on the same CPU (killing much of the benefit of multislices)
>
> * do the right thing with the userspace irq balancer

It probably also needs to be hooked up to sched_mc_power_savings.
When that switch is on, the interrupts shouldn't be spread out over
that many sockets.

Does it need callbacks to change the interrupts when that variable
changes?

Also I suspect handling SMT explicitly is a good idea, e.g. I would
always set the affinity to all thread siblings in a core, not
just a single one, because context switching between them is very cheap.
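
(A sketch of that suggestion, assuming irq_set_affinity() is reachable from
the caller and using topology_sibling_cpumask(), the current name of the SMT
sibling mask; older kernels spell it differently:)

/* Bind an IRQ to every hardware thread of one core rather than
 * to a single logical CPU. */
#include <linux/interrupt.h>
#include <linux/topology.h>

static void example_bind_irq_to_core(unsigned int irq, int cpu)
{
    /* all SMT siblings sharing the core that 'cpu' belongs to */
    irq_set_affinity(irq, topology_sibling_cpumask(cpu));
}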

-Andi

--
[email protected]

2008-08-29 16:52:19

by Arjan van de Ven

Subject: Re: [RFC] export irq_set/get_affinity() for multiqueue network drivers

On Fri, 29 Aug 2008 18:48:12 +0200
Andi Kleen <[email protected]> wrote:

> Arjan van de Ven <[email protected]> writes:
>
> > On Thu, 28 Aug 2008 22:21:53 +0200
> > Brice Goglin <[email protected]> wrote:
> >
> >> Hello,
> >>
> >> Is there any way to set up IRQ masks from within a driver? myri10ge
> >> currently relies on an external script (writing to
> >> /proc/irq/*/smp_affinity) to bind each queue/MSI-X to a different
> >> processor. By default, Linux will either:
> >> * round-robin the interrupts (killing the benefit of DCA, for
> >> instance)
> >> * put all IRQs on the same CPU (killing much of the benefit of
> >> multislices)
> >
> > * do the right thing with the userspace irq balancer
>
> It probably also needs to be hooked up to sched_mc_power_savings.
> When that switch is on, the interrupts shouldn't be spread out over
> that many sockets.

That's what irqbalance already does today.

>
> Also I suspect handling SMT explicitly is a good idea, e.g. I would
> always set the affinity to all thread siblings in a core, not
> just a single one, because context switching between them is very cheap.

That is what irqbalance already does today, at least for what it
considers somewhat slower irqs.
For networking it still sucks, because the packet-reordering logic is
per logical CPU, so you still don't want to receive packets from the
same "stream" over multiple logical CPUs.



--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-08-29 17:14:39

by Rick Jones

Subject: Re: [RFC] export irq_set/get_affinity() for multiqueue network drivers

> Also I suspect handling SMT explicitly is a good idea, e.g. I would
> always set the affinity to all thread siblings in a core, not
> just a single one, because context switching between them is very cheap.

That is true, but don't they also "compete" for pipeline resources?

rick jones