2007-11-20 04:12:32

by Mark Lord

[permalink] [raw]
Subject: CONFIG_IRQBALANCE for 64-bit x86 ?

On 32-bit x86, we have CONFIG_IRQBALANCE available,
but not on 64-bit x86. Why not?

I ask, because this feature seems almost essential to obtaining
reasonable latencies during heavy I/O with fast devices.

My 32-bit Core2Duo MythTV box drops audio frames without it,
but works perfectly *with* IRQBALANCE.

My QuadCore box works very well in 32-bit mode with IRQBALANCE,
but responsiveness sucks bigtime when run in 64-bit mode (no IRQBALANCE)
during periods of multiple heavy I/O streams (USB flash drives).

That's with both the 32 and 64 bit versions of Kubuntu Gutsy,
so the software uses pretty much identical versions either way.

As near as I can tell, when IRQBALANCE is not configured,
all I/O device interrupts go to CPU#0.

I don't think our CPU scheduler takes that into account when assigning
tasks to CPUs, so anything sent to CPU0 runs with very high latencies.

Or something like that.

Why no IRQ_BALANCE in 64-bit mode ?


2007-11-20 04:15:51

by Ismail Dönmez

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Tuesday 20 November 2007 Tarihinde 06:12:21 yazmıştı:
> On 32-bit x86, we have CONFIG_IRQBALANCE available,
> but not on 64-bit x86. Why not?
>
> I ask, because this feature seems almost essential to obtaining
> reasonable latencies during heavy I/O with fast devices.
>
> My 32-bit Core2Duo MythTV box drops audio frames without it,
> but works perfectly *with* IRQBALANCE.
>
> My QuadCore box works very well in 32-bit mode with IRQBALANCE,
> but responsiveness sucks bigtime when run in 64-bit mode (no IRQBALANCE)
> during periods of multiple heavy I/O streams (USB flash drives).
>
> That's with both the 32 and 64 bit versions of Kubuntu Gutsy,
> so the software uses pretty much identical versions either way.
>
> As near as I can tell, when IRQBALANCE is not configured,
> all I/O device interrupts go to CPU#0.
>
> I don't think our CPU scheduler takes that into account when assigning
> tasks to CPUs, so anything sent to CPU0 runs with very high latencies.
>
> Or something like that.
>
> Why no IRQ_BALANCE in 64-bit mode ?

Have you tried running irqbalance on userspace? Checkout
http://irqbalance.org/ . AFAIK CONFIG_IRQBALANCE is deprecated and eats
battery power.

Regards,
ismail

--
Faith is believing what you know isn't so -- Mark Twain

2007-11-20 04:17:32

by Nick Piggin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Tuesday 20 November 2007 15:12, Mark Lord wrote:
> On 32-bit x86, we have CONFIG_IRQBALANCE available,
> but not on 64-bit x86. Why not?
>
> I ask, because this feature seems almost essential to obtaining
> reasonable latencies during heavy I/O with fast devices.
>
> My 32-bit Core2Duo MythTV box drops audio frames without it,
> but works perfectly *with* IRQBALANCE.
>
> My QuadCore box works very well in 32-bit mode with IRQBALANCE,
> but responsiveness sucks bigtime when run in 64-bit mode (no IRQBALANCE)
> during periods of multiple heavy I/O streams (USB flash drives).
>
> That's with both the 32 and 64 bit versions of Kubuntu Gutsy,
> so the software uses pretty much identical versions either way.
>
> As near as I can tell, when IRQBALANCE is not configured,
> all I/O device interrupts go to CPU#0.
>
> I don't think our CPU scheduler takes that into account when assigning
> tasks to CPUs, so anything sent to CPU0 runs with very high latencies.
>
> Or something like that.
>
> Why no IRQ_BALANCE in 64-bit mode ?

For that matter, I'd like to know why it has been decided that the
best place for IRQ balancing is in userspace. It should be in kernel
IMO, and it would probably allow better power saving, performance,
fairness, etc. if it were to be integrated with the task balancer as
well.

2007-11-20 04:29:51

by Willy Tarreau

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Tue, Nov 20, 2007 at 03:17:15PM +1100, Nick Piggin wrote:
> On Tuesday 20 November 2007 15:12, Mark Lord wrote:
> > On 32-bit x86, we have CONFIG_IRQBALANCE available,
> > but not on 64-bit x86. Why not?
> >
> > I ask, because this feature seems almost essential to obtaining
> > reasonable latencies during heavy I/O with fast devices.
> >
> > My 32-bit Core2Duo MythTV box drops audio frames without it,
> > but works perfectly *with* IRQBALANCE.
> >
> > My QuadCore box works very well in 32-bit mode with IRQBALANCE,
> > but responsiveness sucks bigtime when run in 64-bit mode (no IRQBALANCE)
> > during periods of multiple heavy I/O streams (USB flash drives).
> >
> > That's with both the 32 and 64 bit versions of Kubuntu Gutsy,
> > so the software uses pretty much identical versions either way.
> >
> > As near as I can tell, when IRQBALANCE is not configured,
> > all I/O device interrupts go to CPU#0.
> >
> > I don't think our CPU scheduler takes that into account when assigning
> > tasks to CPUs, so anything sent to CPU0 runs with very high latencies.
> >
> > Or something like that.
> >
> > Why no IRQ_BALANCE in 64-bit mode ?
>
> For that matter, I'd like to know why it has been decided that the
> best place for IRQ balancing is in userspace. It should be in kernel
> IMO, and it would probably allow better power saving, performance,
> fairness, etc. if it were to be integrated with the task balancer as
> well.

Agreed. When userspace has something to do with the way IRQs are
delivered, it's going to smell as bad as micro-kernels...

Willy

2007-11-20 04:38:15

by Adrian Bunk

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Tue, Nov 20, 2007 at 05:29:29AM +0100, Willy Tarreau wrote:
> On Tue, Nov 20, 2007 at 03:17:15PM +1100, Nick Piggin wrote:
> > On Tuesday 20 November 2007 15:12, Mark Lord wrote:
> > > On 32-bit x86, we have CONFIG_IRQBALANCE available,
> > > but not on 64-bit x86. Why not?
> > >
> > > I ask, because this feature seems almost essential to obtaining
> > > reasonable latencies during heavy I/O with fast devices.
> > >
> > > My 32-bit Core2Duo MythTV box drops audio frames without it,
> > > but works perfectly *with* IRQBALANCE.
> > >
> > > My QuadCore box works very well in 32-bit mode with IRQBALANCE,
> > > but responsiveness sucks bigtime when run in 64-bit mode (no IRQBALANCE)
> > > during periods of multiple heavy I/O streams (USB flash drives).
> > >
> > > That's with both the 32 and 64 bit versions of Kubuntu Gutsy,
> > > so the software uses pretty much identical versions either way.
> > >
> > > As near as I can tell, when IRQBALANCE is not configured,
> > > all I/O device interrupts go to CPU#0.
> > >
> > > I don't think our CPU scheduler takes that into account when assigning
> > > tasks to CPUs, so anything sent to CPU0 runs with very high latencies.
> > >
> > > Or something like that.
> > >
> > > Why no IRQ_BALANCE in 64-bit mode ?
> >
> > For that matter, I'd like to know why it has been decided that the
> > best place for IRQ balancing is in userspace. It should be in kernel
> > IMO, and it would probably allow better power saving, performance,
> > fairness, etc. if it were to be integrated with the task balancer as
> > well.
>
> Agreed. When userspace has something to do with the way IRQs are
> delivered, it's going to smell as bad as micro-kernels...

The next step to a micro-kernel would then be hardware drivers and file
systems in userspace? ;-)

> Willy

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-11-20 05:24:51

by Nick Piggin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Tuesday 20 November 2007 15:37, Adrian Bunk wrote:
> On Tue, Nov 20, 2007 at 05:29:29AM +0100, Willy Tarreau wrote:

> > Agreed. When userspace has something to do with the way IRQs are
> > delivered, it's going to smell as bad as micro-kernels...
>
> The next step to a micro-kernel would then be hardware drivers and file
> systems in userspace? ;-)

We already have those. So the next step would be to pretend the
performance critical ones can be in userspace and remain competitive,
wouldn't it? ;)

2007-11-20 05:29:48

by H. Peter Anvin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Nick Piggin wrote:
> On Tuesday 20 November 2007 15:37, Adrian Bunk wrote:
>> On Tue, Nov 20, 2007 at 05:29:29AM +0100, Willy Tarreau wrote:
>
>>> Agreed. When userspace has something to do with the way IRQs are
>>> delivered, it's going to smell as bad as micro-kernels...
>> The next step to a micro-kernel would then be hardware drivers and file
>> systems in userspace? ;-)
>
> We already have those. So the next step would be to pretend the
> performance critical ones can be in userspace and remain competitive,
> wouldn't it? ;)

Hey, I have a great idea... we can create a microkernel^W hypervisor and
make a single process^W domain do all the I/O...

-hpa

2007-11-20 05:40:17

by Arjan van de Ven

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Tue, 20 Nov 2007 15:17:15 +1100
Nick Piggin <[email protected]> wrote:

> On Tuesday 20 November 2007 15:12, Mark Lord wrote:
> > On 32-bit x86, we have CONFIG_IRQBALANCE available,
> > but not on 64-bit x86. Why not?

because the in-kernel one is actually quite bad.


> > My QuadCore box works very well in 32-bit mode with IRQBALANCE,
> > but responsiveness sucks bigtime when run in 64-bit mode (no
> > IRQBALANCE) during periods of multiple heavy I/O streams (USB flash
> > drives).

please run the userspace irq balancer, see http://www.irqbalance.org
afaik most distros ship that by default anyway.


> > As near as I can tell, when IRQBALANCE is not configured,
> > all I/O device interrupts go to CPU#0.

that depends on your chipset; some chipsets do worse than that.

>
> > I don't think our CPU scheduler takes that into account when
> > assigning tasks to CPUs, so anything sent to CPU0 runs with very
> > high latencies.
> >
> > Or something like that.
> >
> > Why no IRQ_BALANCE in 64-bit mode ?
>
> For that matter, I'd like to know why it has been decided that the
> best place for IRQ balancing is in userspace. It should be in kernel
> IMO, and it would probably allow better power saving, performance,
> fairness, etc. if it were to be integrated with the task balancer as
> well.

actually.... no. IRQ balancing is not a "fast" decision; every time you
move an interrupt around, you end up causing a really a TON of cache
line bounces, and generally really bad performance (esp if you do it
for networking ones, since you destroy the packet reassembly stuff in
the tcp/ip stack).

Instead, what ends up working is if you do high level categories of
interrupt classes and balance within those (so that no 2 networking
irqs are on the same core/package unless you have more nics than cores)
etc. Balancing on a 10 second scale seems to work quite well; no need
to pull that complexity into the kernel....

--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2007-11-20 07:37:57

by Nick Piggin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Tuesday 20 November 2007 16:37, Arjan van de Ven wrote:
> On Tue, 20 Nov 2007 15:17:15 +1100

> > For that matter, I'd like to know why it has been decided that the
> > best place for IRQ balancing is in userspace. It should be in kernel
> > IMO, and it would probably allow better power saving, performance,
> > fairness, etc. if it were to be integrated with the task balancer as
> > well.
>
> actually.... no. IRQ balancing is not a "fast" decision; every time you

I didn't say anything of the sort. But IRQ load could still fluctuate
a lot more rapidly than we'd like to wake up the irqbalancer.


> move an interrupt around, you end up causing a really a TON of cache
> line bounces, and generally really bad performance

All the more reason why the kernel should do it. When I say move it to
the kernel, I don't mean because I want to move IRQs 1 000 000 times
per second and can't sustain enough context switches to do it in
userspace. Userspace basically has insufficient information to do it
as well as kernel.

We do task balancing in the kernel too, it's a pretty similar problem
(although granted it is less feasible for userspace because tasks are
created and destroyed very often)


> (esp if you do it
> for networking ones, since you destroy the packet reassembly stuff in
> the tcp/ip stack).
>
> Instead, what ends up working is if you do high level categories of
> interrupt classes and balance within those (so that no 2 networking
> irqs are on the same core/package unless you have more nics than cores)

Sure, but you say that like it is difficult information for the kernel
to know about. Actually it is much easier. Note that you can still
bind interrupts to specific CPUs.


> etc. Balancing on a 10 second scale seems to work quite well; no need
> to pull that complexity into the kernel....

My perspective is that it isn't a good idea to have such a critical
piece of infrastructure outside the kernel.

I want the kernel to balance interrupts and tasks fairly; maybe move
interrupts closer to the tasks they are interacting with (instead of,
or combined with our current policy of moving tasks near the interrupts,
which can be much more damaging for cache and NUMA); move all interrupts
to a single core when there is enough capacity and we are balancing for
power savings; do exponential interrupt balancing backoff when it isn't
required; etc. Not easy to do all that in userspace.

Any reason you actually think it is a good idea, aside from the fact
that a userspace solution was able to be better than a crappy old
kernel one?

2007-11-20 14:50:48

by Arjan van de Ven

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Tue, 20 Nov 2007 18:37:39 +1100
Nick Piggin <[email protected]> wrote:
> > actually.... no. IRQ balancing is not a "fast" decision; every time
> > you
>
> I didn't say anything of the sort. But IRQ load could still fluctuate
> a lot more rapidly than we'd like to wake up the irqbalancer.

irq load fluctuates by definition. but acting on it faster isn't the
right thing.
>
>
> > move an interrupt around, you end up causing a really a TON of cache
> > line bounces, and generally really bad performance
>
> All the more reason why the kernel should do it. When I say move it to
> the kernel, I don't mean because I want to move IRQs 1 000 000 times
> per second and can't sustain enough context switches to do it in
> userspace. Userspace basically has insufficient information to do it
> as well as kernel.

like what?
Assuming this is a "once every few seconds" decision (and really it is,
esp for networking)....
>
>
> > (esp if you do it
> > for networking ones, since you destroy the packet reassembly stuff
> > in the tcp/ip stack).
> >
> > Instead, what ends up working is if you do high level categories of
> > interrupt classes and balance within those (so that no 2 networking
> > irqs are on the same core/package unless you have more nics than
> > cores)
>
> Sure, but you say that like it is difficult information for the kernel
> to know about. Actually it is much easier. Note that you can still
> bind interrupts to specific CPUs.

I assume you've read what/how irqbalance does; good luck convincing
people that that kind of policy belongs in the kernel.
>
>
> > etc. Balancing on a 10 second scale seems to work quite well; no
> > need to pull that complexity into the kernel....
>
> My perspective is that it isn't a good idea to have such a critical
> piece of infrastructure outside the kernel.

kernel or kernel source? If there was a good place in the kernel source
I'd not be against moving irqbalance there. In the kernel... not needed.
(also because on single socket machines, the irqbalancer basically has
a one-shot task because there balancing is effectively a static setup)

The same ("critical piece of infrastructure') can be said about other
things, like udev and ... even hal. Nobody is arguing for moving those
into the kernel though....

>
> I want the kernel to balance interrupts and tasks fairly;

with irqthreads that will come for free soon.

>maybe move
> interrupts closer to the tasks they are interacting with (instead of,
> or combined with our current policy of moving tasks near the
> interrupts, which can be much more damaging for cache and NUMA);

interrupts and tasks have an N:M relationship.... or sometimes 1:M
where tasks only depend on one irq. Moving the irq around then tends to
be a loss. For NUMA, you actually very likely want the IRQ on the node
that the IO is associdated with.

> move
> all interrupts to a single core when there is enough capacity and we
> are balancing for power savings;

irqbalance does that today.

>do exponential interrupt balancing
> backoff when it isn't required; etc. Not easy to do all that in
> userspace.
>
> Any reason you actually think it is a good idea, aside from the fact
> that a userspace solution was able to be better than a crappy old
> kernel one?

I listed a few;
1) it's policy
2) the memory is only needed for a short time (20 seconds or so) on
single-socket machines
3) it makes decisions on "subjective" information such as interrupt
device classes that the kernel currently just doesn't have (it could
grow that obviously), and is clearly policy information.




--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2007-11-20 15:47:39

by Mark Lord

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Arjan van de Ven wrote:
>..
> I listed a few;
> 1) it's policy
> 2) the memory is only needed for a short time (20 seconds or so) on
> single-socket machines
> 3) it makes decisions on "subjective" information such as interrupt
> device classes that the kernel currently just doesn't have (it could
> grow that obviously), and is clearly policy information.
..

It's much more than just "policy".
Distributing IRQs across available cores is *essential* functionality,
not an optional "extra" as this would have it be.

After reading some of the replies, I installed it on my malfunctioning 64-bit
system, but discovered it does not perform nearly as well as the kernel solution
in the 32-bit system does.

Responsiveness was jerky, and it took a long time to have any noticeable effect.

And in the end, it still just assigned IRQs to two of the four available cores.
Which still results in the task scheduler fighting against IRQs more than necessary.

Much of this could be due to a slow response curve in the userspace balancer (?),
but I have not yet examined it for such bugs. Hopefully it also is clever enough
to mlock() itself, and to run at a low RT priority ?

It really does need to respond *quickly* to changes in IRQ load,
as otherwise I see dropouts on sound playback (let along video..) and the like.

The vast majority of Linux machines are "single package", and this software
appears to be designed more for multi package, and doesn't do a great job here
on the single package Intel cores I have (Core2duo, Core2quad).

Cheers

2007-11-20 15:53:04

by Mark Lord

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Mark Lord wrote:
> Arjan van de Ven wrote:
>> ..
>> I listed a few;
>> 1) it's policy 2) the memory is only needed for a short time (20
>> seconds or so) on
>> single-socket machines
>> 3) it makes decisions on "subjective" information such as interrupt
>> device classes that the kernel currently just doesn't have (it could
>> grow that obviously), and is clearly policy information.
> ..
>
> It's much more than just "policy".
> Distributing IRQs across available cores is *essential* functionality,
> not an optional "extra" as this would have it be.
>
> After reading some of the replies, I installed it on my malfunctioning
> 64-bit
> system, but discovered it does not perform nearly as well as the kernel
> solution
> in the 32-bit system does.
>
> Responsiveness was jerky, and it took a long time to have any noticeable
> effect.
>
> And in the end, it still just assigned IRQs to two of the four available
> cores.
> Which still results in the task scheduler fighting against IRQs more
> than necessary.
>
> Much of this could be due to a slow response curve in the userspace
> balancer (?),
> but I have not yet examined it for such bugs. Hopefully it also is
> clever enough
> to mlock() itself, and to run at a low RT priority ?
> It really does need to respond *quickly* to changes in IRQ load,
> as otherwise I see dropouts on sound playback (let along video..) and
> the like.
>
> The vast majority of Linux machines are "single package", and this software
> appears to be designed more for multi package, and doesn't do a great
> job here
> on the single package Intel cores I have (Core2duo, Core2quad).
..

All of which reminds me of perhaps *the* most important reason to keep
core functionality like "IRQ distribution" *inside* the kernel:

It has to pass peer review on this mailing list.

External utilities have no such accountability, and can generally just
follow the whims of their maintainers at the expense of kernel performance.

Not that this may be the case (or not) here, but..

Cheers

2007-11-20 16:02:33

by Nick Piggin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Wednesday 21 November 2007 01:47, Arjan van de Ven wrote:
> On Tue, 20 Nov 2007 18:37:39 +1100
>
> Nick Piggin <[email protected]> wrote:
> > > actually.... no. IRQ balancing is not a "fast" decision; every time
> > > you
> >
> > I didn't say anything of the sort. But IRQ load could still fluctuate
> > a lot more rapidly than we'd like to wake up the irqbalancer.
>
> irq load fluctuates by definition. but acting on it faster isn't the
> right thing.

Of course it is, if you want to effectively use your resources.
Imagine if the task balancer only polled once every 10s.


> > > move an interrupt around, you end up causing a really a TON of cache
> > > line bounces, and generally really bad performance
> >
> > All the more reason why the kernel should do it. When I say move it to
> > the kernel, I don't mean because I want to move IRQs 1 000 000 times
> > per second and can't sustain enough context switches to do it in
> > userspace. Userspace basically has insufficient information to do it
> > as well as kernel.
>
> like what?

Knowledge of wakeup events, runqueue load, task and group fairness
requirements, the task balancer's consolidation of load to fewer cores.


> Assuming this is a "once every few seconds" decision (and really it is,
> esp for networking)....

Definitely not always the case. Sometimes fairness is a top concern, in
which case you probably want a lot better response than the hard coded
10 seconds in the userspace thing.


> > > (esp if you do it
> > > for networking ones, since you destroy the packet reassembly stuff
> > > in the tcp/ip stack).
> > >
> > > Instead, what ends up working is if you do high level categories of
> > > interrupt classes and balance within those (so that no 2 networking
> > > irqs are on the same core/package unless you have more nics than
> > > cores)
> >
> > Sure, but you say that like it is difficult information for the kernel
> > to know about. Actually it is much easier. Note that you can still
> > bind interrupts to specific CPUs.
>
> I assume you've read what/how irqbalance does; good luck convincing
> people that that kind of policy belongs in the kernel.

Lots of code to get topology and device information. Some constants
that make assumptions about the machine it is running on and may or may
not agree with what the task scheduler is trying to do. Some
classification stuff which makes guesses about how a particular bit of
hardware or device driver wants to be balanced. Hacks to poll hotplugging
and topology changes.

I'm still convinced. Who isn't?


> > > etc. Balancing on a 10 second scale seems to work quite well; no
> > > need to pull that complexity into the kernel....
> >
> > My perspective is that it isn't a good idea to have such a critical
> > piece of infrastructure outside the kernel.
>
> kernel or kernel source? If there was a good place in the kernel source
> I'd not be against moving irqbalance there. In the kernel... not needed.
> (also because on single socket machines, the irqbalancer basically has
> a one-shot task because there balancing is effectively a static setup)

I don't think that's a good argument for not having it in kernel.


> The same ("critical piece of infrastructure') can be said about other
> things, like udev and ... even hal. Nobody is arguing for moving those
> into the kernel though....

Maybe because there aren't any good arguments. I have good arguments
for irq balancing, though, which aren't invalidated by this observation.


> > I want the kernel to balance interrupts and tasks fairly;
>
> with irqthreads that will come for free soon.

No it won't. It will balance irqthreads. And irqthreads may not even
exist depending on the configuration.


> >maybe move
> > interrupts closer to the tasks they are interacting with (instead of,
> > or combined with our current policy of moving tasks near the
> > interrupts, which can be much more damaging for cache and NUMA);
>
> interrupts and tasks have an N:M relationship.... or sometimes 1:M
> where tasks only depend on one irq. Moving the irq around then tends to
> be a loss. For NUMA, you actually very likely want the IRQ on the node
> that the IO is associdated with.

And the kernel knows all this intimately. And it isn't always that
straightforward. And even if it were for NUMA, you still have SMP
within NUMA.


> > move
> > all interrupts to a single core when there is enough capacity and we
> > are balancing for power savings;
>
> irqbalance does that today.

To the same core which the task scheduler moves tasks? If so, I missed
that. Still, I guess that's the easiest thing to do.


> >do exponential interrupt balancing
> > backoff when it isn't required; etc. Not easy to do all that in
> > userspace.
> >
> > Any reason you actually think it is a good idea, aside from the fact
> > that a userspace solution was able to be better than a crappy old
> > kernel one?
>
> I listed a few;
> 1) it's policy

I don't think that's such a constructive point. Task balancing is
policy in exactly the same way.


> 2) the memory is only needed for a short time (20 seconds or so) on
> single-socket machines

Actually it could be a good idea for fairness and load balancing
to do it more than for a short time. Isn't it easily possible to
have a single socket, multicore system which can overload all cores
with combined IO (including a fair amount of int processing overhead),
but that often runs within CPU capacity?


> 3) it makes decisions on "subjective" information such as interrupt
> device classes that the kernel currently just doesn't have (it could
> grow that obviously), and is clearly policy information.

I'd argue that the kernel, eg. drivers, subsystems, arch code, knows
about this stuff better than irqbalance does anyway.

More out of place IMO, is irqbalance has things like checking for
NAPI turned on in a driver and in that case it does something specific
according to its knowledge of kernel implementation details.

2007-11-20 16:05:37

by Arjan van de Ven

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Tue, 20 Nov 2007 10:52:48 -0500
Mark Lord <[email protected]> wro
>
> All of which reminds me of perhaps *the* most important reason to keep
> core functionality like "IRQ distribution" *inside* the kernel:
>
> It has to pass peer review on this mailing list.


that's a reason to keep it in the *source*, that's not the same as
keeping it in ring0 pinning down memory all the time etc ;)

--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2007-11-20 16:10:38

by Mark Lord

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Arjan van de Ven wrote:
> On Tue, 20 Nov 2007 10:52:48 -0500
> Mark Lord <[email protected]> wro
>> All of which reminds me of perhaps *the* most important reason to keep
>> core functionality like "IRQ distribution" *inside* the kernel:
>>
>> It has to pass peer review on this mailing list.
>
>
> that's a reason to keep it in the *source*, that's not the same as
..

Ack. :)


> keeping it in ring0 pinning down memory all the time etc ;)
..

I belive it *must* remain pinned in memory to be effective,
because I also know it must run much more frequently than it
currently seems to run, in order to respond to quick changes
in IRQ load.

Eg. a heretofore idle device is suddenly now being used to copy
a DVD-sized file around. It *must* respond quickly to changes
in load like this, or system latencies will suffer badly.

Cheers

2007-11-20 18:43:06

by Mark Lord

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

(resending this one to the list).

Arjan van de Ven wrote:
> On Tue, 20 Nov 2007 10:47:24 -0500
> Mark Lord <[email protected]> wrote:
..
>> After reading some of the replies, I installed it on my
>> malfunctioning 64-bit system, but discovered it does not perform
>> nearly as well as the kernel solution in the 32-bit system does.
>
> can you send me the output you get from running irqbalance with the
> --debug option? That'll show me what decisions it made and why
..

The next time I'm using that system for large I/O I will try and do so.
But the shortcomings seem rather obvious already (more below).

>> Much of this could be due to a slow response curve in the userspace
>> balancer (?), but I have not yet examined it for such bugs.
>> Hopefully it also is clever enough to mlock() itself, and to run at a
>> low RT priority ?
>
> there's no need for either of those two.
..

But there is! If it is not in-memory, then it needs IRQs to be paged-in
before it can redistribute any IRQs. And when the situation is bad,
the page-in device is one that can be suffering from poor response.
Which just makes the system stutter even more.

>> It really does need to respond *quickly* to changes in IRQ load,
>> as otherwise I see dropouts on sound playback (let along video..) and
>> the like.
>
> the problem is, you cannot respond quickly like that without
> sacrificing huge heaps of performance, especially on networking.
..

You are more expert on that aspect than I am.
But surely networking can be taken into account when
distributing other IRQs dynamically ?

>> The vast majority of Linux machines are "single package", and this
>> software appears to be designed more for multi package,
>
> it's not. It just right now makes the assumption that on single package
> it can do a good enough job with a static balancing.
> Maybe you've found a case that proves that assumption wrong.
..

I think perhaps the existing algorithm makes the assumptions of
a static configuration of IRQ generating devices, and an unchanging
IRQ average frequency among them.

Neither assumption is valid in a hotplug environment, and the second
assumption is certainly not true on most of my machines.

The lone 64-bit desktop configuration I have here is the only one without
in-kernel IRQ distribution, and it has the fastest (2.5GHz) clock speed,
the largest number (4) of CPU cores, and the most memory (4GB)
of any of the machines here.

And yet it really felt "jerky" in use when copying data around last night,
even after installing/running the userspace irqbalance daemon.

I eventually just moved the work over to a slower machine with a 32-bit
kernel (notebook, 2.1GHz, two cores, 3GB), and things finished more rapidly
and with no noticeable mal effects on the GUI at the time.

The workload in all cases here was plugging in 2GB USB sticks,
and copying a 2GB image to them, and the unplugging/replugging them
and running md5sum to verify correct data transfers.

I had a lot of them (14) to do, and so generally two or three sticks
were plugged in and in use at any given time.

A last note on the quad-core, is that irqbalance *never* used
more than two cores. Dunno why not.

Cheers


2007-11-20 19:10:29

by Arjan van de Ven

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Wed, 21 Nov 2007 02:43:46 +1100
Nick Piggin <[email protected]> wrote:

> On Wednesday 21 November 2007 01:47, Arjan van de Ven wrote:
> > On Tue, 20 Nov 2007 18:37:39 +1100
> >
> > Nick Piggin <[email protected]> wrote:
> > > > actually.... no. IRQ balancing is not a "fast" decision; every
> > > > time you
> > >
> > > I didn't say anything of the sort. But IRQ load could still
> > > fluctuate a lot more rapidly than we'd like to wake up the
> > > irqbalancer.
> >
> > irq load fluctuates by definition. but acting on it faster isn't the
> > right thing.
>
> Of course it is, if you want to effectively use your resources.
> Imagine if the task balancer only polled once every 10s.

but unlike the task balancer, moving an irq is really expensive.
(at least for networking and a few other similar systems)
ANd no it's not just the cache bouncing, it's the entire reassembly of
multiple packets etc etc that gets really messy.


> >
> > I assume you've read what/how irqbalance does; good luck convincing
> > people that that kind of policy belongs in the kernel.
>
> Lots of code to get topology and device information.

yes this would go away in the kernel

> Some constants
> that make assumptions about the machine it is running on and may or
> may not agree with what the task scheduler is trying to do.

> Some
> classification stuff which makes guesses about how a particular bit of

you misunderstood this; the classification stuff is there to spread
different irqs of similar class (say networking) over multiple
cores/packages. Doing this is a system resource balancing proposition
not just a cpu time one.

You may think this spreading based on classification is a mistake, but
it's based on the following observation:
1) servers with multiple network cards serving internet traffic out
really need to load balance their loads; this is for various per-cpu
resource reasons (such as per cpu memory pools) to be evenly used. It
also makes sure that under network spikes on both interfaces, the
response is sane
2) servers with multiple IO devices need this to be spread out, just
think of oracle etc.

for both you could argue "but we could balance this based on actual
observed load in some way", but you can only do that if you rebalance
at a relatively high frequency, which you really don't want to do for
networking and probably even storage.

We used to rebalance this frequently in the 2.4-early kernels based on
a patch from Ingo. Turned out to be a really really bad idea;
performance really tanked.

> hardware or device driver wants to be balanced. Hacks to poll
> hotplugging and topology changes.

"hacks" as in "rescan".. so falls under the topology code and would
indeed be changed to hook into hotplug inside the kernel; just
different complexity.

>
> I'm still convinced. Who isn't?

I know you can do SOME sort of balancing in the kernel. But please
describe the algorithm you would use; I started out with the same
thought but when it got down to the algorithm to me at least it became
clear "we really don't want this complexity in kernel mode".



> > > > etc. Balancing on a 10 second scale seems to work quite well; no
> > > > need to pull that complexity into the kernel....
> > >
> > > My perspective is that it isn't a good idea to have such a
> > > critical piece of infrastructure outside the kernel.
> >
> > kernel or kernel source? If there was a good place in the kernel
> > source I'd not be against moving irqbalance there. In the kernel...
> > not needed. (also because on single socket machines, the
> > irqbalancer basically has a one-shot task because there balancing
> > is effectively a static setup)
>
> I don't think that's a good argument for not having it in kernel.

if you don't care about kernel unpagable memory footprint, fine.
Others do.


> > The same ("critical piece of infrastructure') can be said about
> > other things, like udev and ... even hal. Nobody is arguing for
> > moving those into the kernel though....
>
> Maybe because there aren't any good arguments. I have good arguments
> for irq balancing, though, which aren't invalidated by this
> observation.

I'm not arguing against doing irqbalancing per se (heck that's why I
wrote irqbalance); just that every time I try to do it in kernel the
complexity to get the behavior people (and benchmarks) want turns me
right off that again.

>
>
> > > I want the kernel to balance interrupts and tasks fairly;
> >
> > with irqthreads that will come for free soon.
>
> No it won't. It will balance irqthreads.

and it will know how much cpu they take and it'll move work around to
compensate for any unfairness. CFS is really good at that.

> > >maybe move
> > > interrupts closer to the tasks they are interacting with (instead
> > > of, or combined with our current policy of moving tasks near the
> > > interrupts, which can be much more damaging for cache and NUMA);
> >
> > interrupts and tasks have an N:M relationship.... or sometimes 1:M
> > where tasks only depend on one irq. Moving the irq around then
> > tends to be a loss. For NUMA, you actually very likely want the IRQ
> > on the node that the IO is associdated with.
>
> And the kernel knows all this intimately. And it isn't always that
> straightforward. And even if it were for NUMA, you still have SMP
> within NUMA.

for now yes. I agree the kernel "knows" this to some form (well it
COULD know). I just don't believe the "extra" information it has is in
practice useful for making decisions on.


>
>
> > > move
> > > all interrupts to a single core when there is enough capacity and
> > > we are balancing for power savings;
> >
> > irqbalance does that today.
>
> To the same core which the task scheduler moves tasks? If so, I missed
> that. Still, I guess that's the easiest thing to do.

yes; the power aware scheduler also moves processes to the first
package .. as does irqbalance.

> >
> > I listed a few;
> > 1) it's policy
>
> I don't think that's such a constructive point. Task balancing is
> policy in exactly the same way.

not really; CFS has shown that.... the only real policy in task
balancing is the fairness part, and that seems to be general accepted
as the right thing.



> More out of place IMO, is irqbalance has things like checking for
> NAPI turned on in a driver and in that case it does something specific
> according to its knowledge of kernel implementation details.

no it doesn't; it uses "packet counts" to deal with NAPI and other
effects such as irq mitigation to get a more accurate estimate of load
caused by an irq, but it's not fair to call this inappropriate checking
for NAPI being turned on.




--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2007-11-20 19:18:15

by Andi Kleen

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Nick Piggin <[email protected]> writes:
>
> For that matter, I'd like to know why it has been decided that the
> best place for IRQ balancing is in userspace.

There is a lot of possible policy in it

> It should be in kernel
> IMO, and it would probably allow better power saving, performance,
> fairness, etc. if it were to be integrated with the task balancer as
> well.

Integrating with the task balancer makes really only sense if the
device supports MSI-X and if it does that you don't really need
an irq balancer because you can just send to all CPUs as needed.

Without MSI-X you would be trying to reprogram the interrupts
all the time when a task is migrating and it is highly doubtful
that doing that automatically would do any good.

-Andi

2007-11-20 20:02:57

by Mark Lord

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Arjan van de Ven wrote:
> On Wed, 21 Nov 2007 02:43:46 +1100
> Nick Piggin <[email protected]> wrote:
>
>> On Wednesday 21 November 2007 01:47, Arjan van de Ven wrote:
>>> On Tue, 20 Nov 2007 18:37:39 +1100
>>>
>>> Nick Piggin <[email protected]> wrote:
>>>>> actually.... no. IRQ balancing is not a "fast" decision; every
>>>>> time you
>>>> I didn't say anything of the sort. But IRQ load could still
>>>> fluctuate a lot more rapidly than we'd like to wake up the
>>>> irqbalancer.
>>> irq load fluctuates by definition. but acting on it faster isn't the
>>> right thing.
>> Of course it is, if you want to effectively use your resources.
>> Imagine if the task balancer only polled once every 10s.
>
> but unlike the task balancer, moving an irq is really expensive.
> (at least for networking and a few other similar systems)
> ANd no it's not just the cache bouncing, it's the entire reassembly of
> multiple packets etc etc that gets really messy.
>
>
>>> I assume you've read what/how irqbalance does; good luck convincing
>>> people that that kind of policy belongs in the kernel.
>> Lots of code to get topology and device information.
>
> yes this would go away in the kernel
>
>> Some constants
>> that make assumptions about the machine it is running on and may or
>> may not agree with what the task scheduler is trying to do.
>
>> Some
>> classification stuff which makes guesses about how a particular bit of
>
> you misunderstood this; the classification stuff is there to spread
> different irqs of similar class (say networking) over multiple
> cores/packages. Doing this is a system resource balancing proposition
> not just a cpu time one.
>
> You may think this spreading based on classification is a mistake, but
> it's based on the following observation:
> 1) servers with multiple network cards serving internet traffic out
> really need to load balance their loads; this is for various per-cpu
> resource reasons (such as per cpu memory pools) to be evenly used. It
> also makes sure that under network spikes on both interfaces, the
> response is sane
> 2) servers with multiple IO devices need this to be spread out, just
> think of oracle etc.
>
> for both you could argue "but we could balance this based on actual
> observed load in some way", but you can only do that if you rebalance
> at a relatively high frequency, which you really don't want to do for
> networking and probably even storage.
>
> We used to rebalance this frequently in the 2.4-early kernels based on
> a patch from Ingo. Turned out to be a really really bad idea;
> performance really tanked.
>
>> hardware or device driver wants to be balanced. Hacks to poll
>> hotplugging and topology changes.
>
> "hacks" as in "rescan".. so falls under the topology code and would
> indeed be changed to hook into hotplug inside the kernel; just
> different complexity.
>
>> I'm still convinced. Who isn't?
>
> I know you can do SOME sort of balancing in the kernel. But please
> describe the algorithm you would use; I started out with the same
> thought but when it got down to the algorithm to me at least it became
> clear "we really don't want this complexity in kernel mode".
..

Well, for my dualCore notebook, dualCore MythTV box, and QuadCore desktop,
the behaviour of the existing, working, 32-bit kernel IRQBALANCE code
outperforms the userspace utility.

Mostly, I suspect, due to it's much faster response to changing conditions.
That's something the external one could try to match, but at present it
seems tuned specifically for high-traffic network servers, not for the
average notebook or desktop.

Cheers

2007-11-20 22:01:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?


* Arjan van de Ven <[email protected]> wrote:

> kernel or kernel source? If there was a good place in the kernel
> source I'd not be against moving irqbalance there. [...]

would this be a good case study to use klibc and start up irqbalanced
automatically? I'd love it if we moved more of the 'system support'
userspace into the kernel proper, to keep it under control. (and to
simplify the compatibility and QA matrix)

Ingo

2007-11-20 22:01:54

by Arjan van de Ven

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Tue, 20 Nov 2007 15:02:43 -0500
Mark Lord <[email protected]> wrote:
> ..
>
> Well, for my dualCore notebook, dualCore MythTV box, and QuadCore
> desktop, the behaviour of the existing, working, 32-bit kernel
> IRQBALANCE code outperforms the userspace utility.
>
> Mostly, I suspect, due to it's much faster response to changing
> conditions. That's something the external one could try to match, but
> at present it seems tuned specifically for high-traffic network
> servers, not for the average notebook or desktop.

I'd really like to see what it's doing before commenting on this;
at minimum can you give me the /proc/interrupts of the system?
It might a simple bug or simple missing item, not a total "scratch the
full system".


--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2007-11-20 23:17:20

by Mark Lord

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Arjan van de Ven wrote:
> On Tue, 20 Nov 2007 15:02:43 -0500
> Mark Lord <[email protected]> wrote:
>> ..
>>
>> Well, for my dualCore notebook, dualCore MythTV box, and QuadCore
>> desktop, the behaviour of the existing, working, 32-bit kernel
>> IRQBALANCE code outperforms the userspace utility.
>>
>> Mostly, I suspect, due to it's much faster response to changing
>> conditions. That's something the external one could try to match, but
>> at present it seems tuned specifically for high-traffic network
>> servers, not for the average notebook or desktop.
>
> I'd really like to see what it's doing before commenting on this;
> at minimum can you give me the /proc/interrupts of the system?
> It might a simple bug or simple missing item, not a total "scratch the
> full system".
..

Next time I'm doing something significant there,
I'll collect some data for you. Got other work now.

But it does make sense that this mechanism cannot be longterm for
a desktop. Intensive loads come and go quickly there, and the
interrupt handling has to respond in a timely fashion.

It's not like a server where loads generally increase/decrease gradually.

Cheers

2007-11-20 23:22:53

by Mark Lord

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Ingo Molnar wrote:
> * Arjan van de Ven <[email protected]> wrote:
>
>> kernel or kernel source? If there was a good place in the kernel
>> source I'd not be against moving irqbalance there. [...]
>
> would this be a good case study to use klibc and start up irqbalanced
> automatically? I'd love it if we moved more of the 'system support'
> userspace into the kernel proper, to keep it under control. (and to
> simplify the compatibility and QA matrix)
..

Perhaps, but this also violates the principle that the kernel
should just *work* with sensible defaults. I don't use an initrd,
or an initramfs, and have no intention of ever doing so.

I *like* having a single boot image with no unneeded/unwanted complexity.
It's only recently that I've even come round to using some loadable
modules for things like network drivers -- I prefer a single image
for as much as possible (like Linus there).

If putting a C-library and utilities "into the kernel" still leaves
me with a single image file, then.. maybe. Seems clumsy, though.

Handling interrupts efficiently is a very basic, core function
for any operating system kernel. With CONFIG_IRQBALANCE=y, Linux is
fine at present. But that's not available in 64-bit mode,
so we have a deficiency there.

I guess I'll patch it into my kernels soon-ish.

Cheers

2007-11-20 23:28:13

by Ingo Molnar

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?


* Mark Lord <[email protected]> wrote:

> Perhaps, but this also violates the principle that the kernel should
> just *work* with sensible defaults. I don't use an initrd, or an
> initramfs, and have no intention of ever doing so.

nor do i - i was under the impression that klibc was able to work out of
a bzImage too? Am i wrong?

Ingo

2007-11-20 23:33:19

by H. Peter Anvin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Mark Lord wrote:
>
> Perhaps, but this also violates the principle that the kernel
> should just *work* with sensible defaults. I don't use an initrd,
> or an initramfs, and have no intention of ever doing so.
>
> I *like* having a single boot image with no unneeded/unwanted complexity.
> It's only recently that I've even come round to using some loadable
> modules for things like network drivers -- I prefer a single image
> for as much as possible (like Linus there).
>
> If putting a C-library and utilities "into the kernel" still leaves
> me with a single image file, then.. maybe. Seems clumsy, though.
>

That was the whole point of klibc, and in fact it was in -mm that way
for a while. Linus rejected it at the time on the grounds that it added
no new features, only moved existing features to userspace.

The unified build tree has since then bitrotted slightly due to lack of
time on my part, but it wouldn't be hard at all to bring it up to current.

-hpa

2007-11-20 23:37:42

by H. Peter Anvin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Ingo Molnar wrote:
> * Mark Lord <[email protected]> wrote:
>
>> Perhaps, but this also violates the principle that the kernel should
>> just *work* with sensible defaults. I don't use an initrd, or an
>> initramfs, and have no intention of ever doing so.
>
> nor do i - i was under the impression that klibc was able to work out of
> a bzImage too? Am i wrong?
>

Nope. It runs inside an initramfs, of course; that initramfs is linked
into the kernel binary.

-hpa

2007-11-20 23:47:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?


* H. Peter Anvin <[email protected]> wrote:

> Ingo Molnar wrote:
>> * Mark Lord <[email protected]> wrote:
>>
>>> Perhaps, but this also violates the principle that the kernel should just
>>> *work* with sensible defaults. I don't use an initrd, or an initramfs,
>>> and have no intention of ever doing so.
>>
>> nor do i - i was under the impression that klibc was able to work out of a
>> bzImage too? Am i wrong?
>>
>
> Nope. It runs inside an initramfs, of course; that initramfs is
> linked into the kernel binary.

would be nice to have a single-image variant for all of this. having the
separate initrd was always trouble - and it's pointless as well. (we
rarely update the initrd without updating the vmlinuz as well)

Ingo

2007-11-20 23:54:46

by H. Peter Anvin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Ingo Molnar wrote:
> * H. Peter Anvin <[email protected]> wrote:
>
>> Ingo Molnar wrote:
>>> * Mark Lord <[email protected]> wrote:
>>>
>>>> Perhaps, but this also violates the principle that the kernel should just
>>>> *work* with sensible defaults. I don't use an initrd, or an initramfs,
>>>> and have no intention of ever doing so.
>>> nor do i - i was under the impression that klibc was able to work out of a
>>> bzImage too? Am i wrong?
>>>
>> Nope. It runs inside an initramfs, of course; that initramfs is
>> linked into the kernel binary.
>
> would be nice to have a single-image variant for all of this. having the
> separate initrd was always trouble - and it's pointless as well. (we
> rarely update the initrd without updating the vmlinuz as well)
>

We do. Am I missing something?

-hpa

2007-11-21 00:08:13

by Ingo Molnar

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?


* H. Peter Anvin <[email protected]> wrote:

>>> Nope. It runs inside an initramfs, of course; that initramfs is
>>> linked into the kernel binary.
>>
>> would be nice to have a single-image variant for all of this. having
>> the separate initrd was always trouble - and it's pointless as well.
>> (we rarely update the initrd without updating the vmlinuz as well)
>
> We do. Am I missing something?

do we have a single-image way of getting both the kernel image and the
initram set up at once? What i know of is a two-image approach: vmlinuz
and initrd.

Ingo

2007-11-21 00:24:17

by H. Peter Anvin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Ingo Molnar wrote:
> * H. Peter Anvin <[email protected]> wrote:
>
>>>> Nope. It runs inside an initramfs, of course; that initramfs is
>>>> linked into the kernel binary.
>>> would be nice to have a single-image variant for all of this. having
>>> the separate initrd was always trouble - and it's pointless as well.
>>> (we rarely update the initrd without updating the vmlinuz as well)
>> We do. Am I missing something?
>
> do we have a single-image way of getting both the kernel image and the
> initram set up at once? What i know of is a two-image approach: vmlinuz
> and initrd.
>

Yes, we do. The initramfs can be linked into the kernel image. The
unified klibc build tree does that by default.

-hpa

2007-11-21 00:37:09

by Ingo Molnar

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?


* H. Peter Anvin <[email protected]> wrote:

> Ingo Molnar wrote:
>> * H. Peter Anvin <[email protected]> wrote:
>>
>>>>> Nope. It runs inside an initramfs, of course; that initramfs is linked
>>>>> into the kernel binary.
>>>> would be nice to have a single-image variant for all of this. having the
>>>> separate initrd was always trouble - and it's pointless as well. (we
>>>> rarely update the initrd without updating the vmlinuz as well)
>>> We do. Am I missing something?
>>
>> do we have a single-image way of getting both the kernel image and the
>> initram set up at once? What i know of is a two-image approach: vmlinuz
>> and initrd.
>>
>
> Yes, we do. The initramfs can be linked into the kernel image. The
> unified klibc build tree does that by default.

argh. Guess i misread your answer:

>>>>>> nor do i - i was under the impression that klibc was able to work
>>>>>> out of a bzImage too? Am i wrong?
>>>>> Nope. It runs inside an initramfs, of course; that initramfs is linked
>>>>> into the kernel binary.

i took that "Nope" as referring to my impression - but you in fact meant
that i am not wrong? :-) So nothing to see here. single-bzImage initrd
was and is possible, so we could in fact move chunks of system-related
userland (such as irqbalanced) into the kernel proper?

Ingo

2007-11-21 00:51:24

by H. Peter Anvin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Ingo Molnar wrote:
>
> i took that "Nope" as referring to my impression - but you in fact meant
> that i am not wrong? :-) So nothing to see here. single-bzImage initrd
> was and is possible, so we could in fact move chunks of system-related
> userland (such as irqbalanced) into the kernel proper?
>

Yes, it should be quite straightforward.

-hpa

2007-11-21 02:30:53

by Walt H

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

>
> On Tue, 20 Nov 2007 15:17:15 +1100
> Nick Piggin <[email protected] <mailto:[email protected]>> wrote:
>
> > On Tuesday 20 November 2007 15:12, Mark Lord wrote:
> > > On 32-bit x86, we have CONFIG_IRQBALANCE available,
> > > but not on 64-bit x86. Why not?
>
> because the in-kernel one is actually quite bad.
>
>
> > > My QuadCore box works very well in 32-bit mode with IRQBALANCE,
> > > but responsiveness sucks bigtime when run in 64-bit mode (no
> > > IRQBALANCE) during periods of multiple heavy I/O streams (USB flash
> > > drives).
>
> please run the userspace irq balancer, see http://www.irqbalance.org
> afaik most distros ship that by default anyway.

I've been running the daemon for quite some time, however, have noticed
something on my newest computer. It's a core2 duo and the IRQ balance
daemon always exits after some time. After looking at the source, I see
it's because dual core/hyperthreaded boxes (single domain caches) always
get treated as though the --oneshot option were passed and exit after
the first pass (I assume same thing happens on quad cores?).

Does this not adversely affect IRQ balancing on those CPU's? If the IRQ
load of a mostly idle device changes from when the daemon was run,
wouldn't the inability of the balance to adjust it adversely affect
performance if the load changes at a later time? I'm used to my old SMP
box with 2 physical cores, so this is just something I've wondered about
on the new box. Thanks,

-Walt


2007-11-21 02:49:28

by Jeff Garzik

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Ingo Molnar wrote:
> single-bzImage initrd
> was and is possible,

Correct (though s/initrd/initramfs/).

Take a look at usr/Makefile for how initramfs is automatically included
in the image, right now.

The intention at the time was to quickly follow up this stub (generated
by gen_init_cpio) with a full inclusion of klibc + some basics like
nfsroot. It should be a very straightforward step to go from what we
have today to including klibc initramfs into the kernel image.


> so we could in fact move chunks of system-related
> userland (such as irqbalanced) into the kernel proper?

s/kernel/kernel tree/ I presume you mean...

With regards to irqbalanced, if you are thinking about including it in
initramfs, you would need to work out the details of how
userland/distros modify the default policy configurations.

Jeff



2007-11-21 03:04:52

by H. Peter Anvin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

Jeff Garzik wrote:
>
> Take a look at usr/Makefile for how initramfs is automatically included
> in the image, right now.
>
> The intention at the time was to quickly follow up this stub (generated
> by gen_init_cpio) with a full inclusion of klibc + some basics like
> nfsroot. It should be a very straightforward step to go from what we
> have today to including klibc initramfs into the kernel image.
>

http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-klibc.git;a=summary

-hpa

2007-11-22 07:59:13

by Nick Piggin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Wednesday 21 November 2007 06:07, Arjan van de Ven wrote:
> On Wed, 21 Nov 2007 02:43:46 +1100

> > Of course it is, if you want to effectively use your resources.
> > Imagine if the task balancer only polled once every 10s.
>
> but unlike the task balancer, moving an irq is really expensive.
> (at least for networking and a few other similar systems)
> ANd no it's not just the cache bouncing, it's the entire reassembly of
> multiple packets etc etc that gets really messy.

Actually a blanket statement like that is just wrong. Moving a
network interrupt yes is probably quite expensive, but it is
about the worst case one to move. What's more, moving tasks between
NUMA nodes could easily be many orders of magnitude worse than the
transient slowdown of moving irqs.

Furthermore, what you say doesn't really seem to be an argument
for doing it in userspace or an argument against moving IRQs. It
actually shows that there are complex, hardware and kernel
implementation dependent issues, all of which suggest it is better
to be in kernel.


> > Some constants
> > that make assumptions about the machine it is running on and may or
> > may not agree with what the task scheduler is trying to do.
> >
> > Some
> > classification stuff which makes guesses about how a particular bit of
>
> you misunderstood this; the classification stuff is there to spread
> different irqs of similar class (say networking) over multiple
> cores/packages. Doing this is a system resource balancing proposition
> not just a cpu time one.
>
> You may think this spreading based on classification is a mistake, but
> it's based on the following observation:

No I'm not misunderstanding or think it is a mistake. But it is
something which the kernel and the devices themselves should have
better knowledge of. You have a process which is reading off disk
and sending to a network interface? You may well want to put the
process and the disk interrupt and the network interrupt all on
the same CPU.

[snip]

> We used to rebalance this frequently in the 2.4-early kernels based on
> a patch from Ingo. Turned out to be a really really bad idea;
> performance really tanked.

To reiterate, I do not think that IRQs should be moved more frequently.
I think the kernel is in the position to know far better than userspace
about irq balancing.


> > hardware or device driver wants to be balanced. Hacks to poll
> > hotplugging and topology changes.
>
> "hacks" as in "rescan".. so falls under the topology code and would
> indeed be changed to hook into hotplug inside the kernel; just
> different complexity.

ie. simpler. All the topology stuff would be far simpler.


> > I'm still convinced. Who isn't?
>
> I know you can do SOME sort of balancing in the kernel. But please
> describe the algorithm you would use; I started out with the same
> thought but when it got down to the algorithm to me at least it became
> clear "we really don't want this complexity in kernel mode".

I'd rather not to this far into handwaving. I'm not saying that
I know exactly how it should work right now. I'm questioning the
established viewpoint that irq balancing belongs in userspace.

For that matter, I guess from the results you get, it's not terribly
bad to do in userspace or anything. But I think it can be done in
kernel.

Policy... I think that's a misused argument. The "policy" of any
kernel code I write is to utilise the hardware as efficiently as
possible within restrictions (eg. fairness, permissions). Setting
those restrictions is the realm of userspace, otherwise IMO it is
fine to go in kernel.

Using the same argument, task balancing and even scheduling is
policy, so is page reclaim, page writeback, filesystem block
allocation, etc. Now many of those things can be directed or
restricted somehow from userspace, and in-kernel irq balancing
would be no different.


> > > not needed. (also because on single socket machines, the
> > > irqbalancer basically has a one-shot task because there balancing
> > > is effectively a static setup)
> >
> > I don't think that's a good argument for not having it in kernel.
>
> if you don't care about kernel unpagable memory footprint, fine.
> Others do.

It would be a couple of K, right? I mean it would be probably less than
half the code of irqbalance because of the parsing and topology stuff.

Also, I don't think the one-shot behaviour on single socket machines is
good policy at all, and it can't capture dynamic behaviour at all.


> > > I listed a few;
> > > 1) it's policy
> >
> > I don't think that's such a constructive point. Task balancing is
> > policy in exactly the same way.
>
> not really; CFS has shown that.... the only real policy in task
> balancing is the fairness part,

Ahh, hate to get off topic, but let's not perpetuate this myth.
It wasn't Con, or CFS, or anything that showed fairness is some
great new idea. Actually I was arguing for fairness first,
against both Con and Ingo, way back when the old scheduler was
having so much problems.

Not that I am trying to claim the idea for myself. Fairness is
like the most fundamental and obvious behaviour for any sort of
resource scheduler that I have to laugh when people get "credited"
with this idea.

Back on topic... no, fairness is not the only real policy. Not at
all. Fariness is one of the most important ones, and that is exactly
why it is the default behaviour. After that, deviation from that is
a userspace thing.


> and that seems to be general accepted
> as the right thing.
>
> > More out of place IMO, is irqbalance has things like checking for
> > NAPI turned on in a driver and in that case it does something specific
> > according to its knowledge of kernel implementation details.
>
> no it doesn't; it uses "packet counts" to deal with NAPI and other
> effects such as irq mitigation to get a more accurate estimate of load
> caused by an irq, but it's not fair to call this inappropriate checking
> for NAPI being turned on.

I don't think it is inappropriate. Obviously it needs to check it
to do something close to the right thing. This to me is yet another
signal that says the kernel is the right place for it.

Anyway, I'm clearly not going to change your mind. But I do have an
idea of the rationale for doing it in userspace now. So if I wanted
to challenge that, I guess I'd have to write code and prove it...

2007-11-23 13:10:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?


* Nick Piggin <[email protected]> wrote:

> Ahh, hate to get off topic, but let's not perpetuate this myth. It
> wasn't Con, or CFS, or anything that showed fairness is some great new
> idea. Actually I was arguing for fairness first, against both Con and
> Ingo, way back when the old scheduler was having so much problems.
>
> Not that I am trying to claim the idea for myself. Fairness is like
> the most fundamental and obvious behaviour for any sort of resource
> scheduler that I have to laugh when people get "credited" with this
> idea.

just out of curiosity (and to get my own sense of history corrected), do
you remember in which thread you said that? (and even better, could you
dig out any URLs for that thread?)

btw., the question was never really whether fairness was a good idea for
a resource scheduler - the question was whether _strict fairness_ was a
good idea for a general purpose OS (and the desktop in particular). My
point back then was that strict fairness is not good enough and that we
thus need the interactivity estimator - and i still maintain the first
half of that position while conceding that i was wrong about the second
part :-)

I dont think anyone was arguing for a scheduler with no fairness at all
- but "fairness" indeed was more of an after-thought, not the driving
principle.

Current CFS uses a modified "sleeper fairness" model (not a strict
fairness model) via which we in essence replace the effect of the
interactivity estimator with "sleeper fairness". So in essence we've
replaced the O(1) scheduler's sleep average code with a deterministic
sleep average code. This in turn also made the allocation of CPU time
deterministic throughout. (which in other words can also be called "fair
allocation of CPU time")

_That_ scheme seems to behave rather well in practice and i think i can
take credit for _that_ bit ;-) [many people have hacked upon that
concept and code since then so it's nowhere near "my code" anymore, of
course.]

Ingo

2007-11-25 10:03:27

by Nick Piggin

[permalink] [raw]
Subject: Re: CONFIG_IRQBALANCE for 64-bit x86 ?

On Saturday 24 November 2007 00:09, Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
> > Ahh, hate to get off topic, but let's not perpetuate this myth. It
> > wasn't Con, or CFS, or anything that showed fairness is some great new
> > idea. Actually I was arguing for fairness first, against both Con and
> > Ingo, way back when the old scheduler was having so much problems.
> >
> > Not that I am trying to claim the idea for myself. Fairness is like
> > the most fundamental and obvious behaviour for any sort of resource
> > scheduler that I have to laugh when people get "credited" with this
> > idea.
>
> just out of curiosity (and to get my own sense of history corrected), do
> you remember in which thread you said that? (and even better, could you
> dig out any URLs for that thread?)

No, I have no idea except for the vague talking pictures in my noggin ;)
"nicksched", maybe? Or Con's patches on the old scheduler (around 2.6.0
time, was it?)


> btw., the question was never really whether fairness was a good idea for
> a resource scheduler - the question was whether _strict fairness_ was a
> good idea for a general purpose OS (and the desktop in particular). My
> point back then was that strict fairness is not good enough and that we
> thus need the interactivity estimator - and i still maintain the first
> half of that position while conceding that i was wrong about the second
> part :-)

I'm not sure what you mean by strict fairness. Obviously there are
fundamental points where you have to make some heuristic choice about
priority -- process creation/destruction, and sleep patterns in
particular. So yes, you do need decaying priority.

But if all that is applied consistently, it shouldn't be possible for
a process to get more CPU time than another of the same (or more)
demand, over a given period.


> I dont think anyone was arguing for a scheduler with no fairness at all
> - but "fairness" indeed was more of an after-thought, not the driving
> principle.

And actually it was systemically unfair by design ;) That's where
most the bad behavioural corner cases came in.


> Current CFS uses a modified "sleeper fairness" model (not a strict
> fairness model) via which we in essence replace the effect of the
> interactivity estimator with "sleeper fairness". So in essence we've
> replaced the O(1) scheduler's sleep average code with a deterministic
> sleep average code. This in turn also made the allocation of CPU time
> deterministic throughout. (which in other words can also be called "fair
> allocation of CPU time")

Yeah, it's OK I guess. I think it is quite complex -- you're dealing
with a complete heuristic anyway, so while the equations may look nice,
I don't actually know what justifies the equations themselves (not that
*any* scheduler can be completely justified in that way, but...). But
at least there is fairness and some rationale for it.

Nicksched had what I'd call a deterministic sleep average code too
(though much simpler). The big problem it had was that it also had
to scale timeslices back when there were high priority processes on
the runqueue in order to keep latency down while retaining O(1)
scheduling. It was hard or impossible to do exactly right. It would
have been easy with an O(lgn) data structure, though :P


> _That_ scheme seems to behave rather well in practice and i think i can
> take credit for _that_ bit ;-) [many people have hacked upon that
> concept and code since then so it's nowhere near "my code" anymore, of
> course.]

I found that just doing something relatively sane (eg. a simple, fair,
decaying priority system) that doesn't violate the principle of least
surprise (ie. that unix apps and programmers have expected over the
years) has resulted in good behaviour.

I don't really know about taking credit for ideas. Probably your exact
algorithm is unique, but there is a lot of research on CPU schedulers
I have never reviewed, so I can't say. Still, if you came up with it
independently, I guess that is the main thing for one's ego ;)

Still, what I can say with at least one counterexample is that fairness
is not a new concept (curious: wasn't the 2.4 scheduler fair?).