Hello,
I have a dual-CPU router/firewall with five gigabit NICs. Recently I have
found that irqbalance (0.55 from Fedora 9/x86_64) gives a suboptimal
IRQ to CPU mapping on this box:
During traffic spikes, it assigns two NICs to one CPU and the
other three to the second CPU. However, this does not account for
the fact that packets coming from the uplink interface are far more
expensive to handle than the rest of the traffic: most iptables rules
apply to packets received from the uplink interface. The result is
that the CPU which receives the IRQs for the uplink interface
is 100 % busy (mostly in softirq), while the other one is 90 % idle.
Setting the IRQ mapping by hand (uplink to one CPU, all the other
NICs to the other CPU) yields a well-balanced system (both CPUs 30-60 % busy).
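For the record, the manual assignment is just a matter of writing CPU
bitmasks into /proc/irq/<N>/smp_affinity. A minimal sketch of what I
do (the IRQ numbers are placeholders, not the real ones from my box,
and irqbalance has to be stopped first or it rewrites the masks):

/* pin-irqs.c: pin the uplink NIC's IRQ to CPU0 and the remaining
 * NICs to CPU1.  The IRQ numbers are placeholders; look up the real
 * ones in /proc/interrupts.  Needs root, and irqbalance must not be
 * running, or it will overwrite these masks again.
 */
#include <stdio.h>

static int set_affinity(int irq, unsigned int mask)
{
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f) {
        perror(path);
        return -1;
    }
    fprintf(f, "%x\n", mask);   /* hex CPU bitmask: 1 = CPU0, 2 = CPU1 */
    return fclose(f);
}

int main(void)
{
    static const int lan_irqs[] = { 17, 18, 19, 20 };   /* placeholders */
    unsigned int i;

    set_affinity(16, 0x1);              /* uplink NIC to CPU0 (placeholder) */
    for (i = 0; i < sizeof(lan_irqs) / sizeof(lan_irqs[0]); i++)
        set_affinity(lan_irqs[i], 0x2); /* other NICs to CPU1 */
    return 0;
}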
I am not sure whether my configuration is too special, but it might be
worth making the irqbalance daemon cope with this usage pattern as well.
Another problem is that with one CPU 100 % busy in the kernel,
the latency of user-space programs is _way_ too high. For example,
the MRTG graphs from my router used to have blank stripes (i.e. snmpd
failed to respond in time). The shell response time was also bad,
even though I was logged in over SSH on an interface whose IRQ was
routed to the other CPU at the time.
With the same network load and a manual IRQ-to-CPU assignment,
MRTG works well and both snmpd and shell response times are good.
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
>> If you find yourself arguing with Alan Cox, you’re _probably_ wrong. <<
>> --James Morris in "How and Why You Should Become a Kernel Hacker" <<
On Fri, 3 Oct 2008 15:21:17 +0200
Jan Kasprzak <[email protected]> wrote:
> Hello,
>
> I have a dual-CPU router/firewall with five gigabit NICs. Recently I
> have found that irqbalance (0.55 from Fedora 9/x86_64) gives a
> suboptimal IRQ to CPU mapping on this box:
>
> During traffic spikes, it assigns two NICs to one CPU and the
> other three to the second CPU. However, this does not account for
> the fact that packets coming from the uplink interface are far more
> expensive to handle than the rest of the traffic: most iptables rules
> apply to packets received from the uplink interface. The result is
> that the CPU which receives the IRQs for the uplink interface
> is 100 % busy (mostly in softirq), while the other one is 90 % idle.
>
> Setting the IRQ mapping by hand (uplink to one CPU, all the
> other NICs to the other CPU) yields a well-balanced system (both CPUs
> 30-60 % busy). I am not sure whether my configuration is too special,
> but it might be worth making the irqbalance daemon cope with this
> usage pattern as well.
>
One of the hard cases for irqbalance is that it doesn't have a
way to find out the actual CPU time spent in the handlers. For
networking it makes an estimate based just on the number of packets
(which is better than nothing)... but that breaks down if there is an
asymmetry in the CPU cost per packet, as you have.
The good news is that irqthreads at least have the potential to solve
this "lack of information"; if not, we could consider doing a form of
microaccounting for irq handlers....
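To illustrate, that estimate boils down to deltas of the per-IRQ
counters in /proc/interrupts. A simplified sketch of the idea (not
the actual irqbalance source; real parsing must cope with one count
column per CPU and the device names at the end of each line):

/* irq-load.c: crude per-IRQ load estimate from interrupt count
 * deltas, roughly the information irqbalance works from.  Assumes a
 * two-CPU box, so two count columns follow each IRQ number.
 */
#include <stdio.h>
#include <unistd.h>

#define MAX_IRQS 256

static void sample(unsigned long *counts)
{
    FILE *f = fopen("/proc/interrupts", "r");
    char line[512];

    if (!f)
        return;
    while (fgets(line, sizeof(line), f)) {
        int irq;
        unsigned long cpu0, cpu1;

        if (sscanf(line, " %d: %lu %lu", &irq, &cpu0, &cpu1) == 3 &&
            irq >= 0 && irq < MAX_IRQS)
            counts[irq] = cpu0 + cpu1;
    }
    fclose(f);
}

int main(void)
{
    static unsigned long prev[MAX_IRQS], curr[MAX_IRQS];
    int irq;

    sample(prev);
    sleep(10);          /* classic irqbalance rebalances every 10 s */
    sample(curr);
    for (irq = 0; irq < MAX_IRQS; irq++)
        if (curr[irq] > prev[irq])
            printf("irq %3d: %lu interrupts in the interval\n",
                   irq, curr[irq] - prev[irq]);
    return 0;
}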
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Fri, Oct 03, 2008 at 06:38:57AM -0700, Arjan van de Ven wrote:
> > Hello,
> >
> > I have a dual-CPU router/firewall with five gigabit NICs. Recently I
> > have found that irqbalance (0.55 from Fedora 9/x86_64) gives a
> > suboptimal IRQ to CPU mapping on this box:
> >
> > During traffic spikes, it assigns two NICs to one CPU and the
> > other three to the second CPU. However, this does not account for
> > the fact that packets coming from the uplink interface are far more
> > expensive to handle than the rest of the traffic: most iptables rules
> > apply to packets received from the uplink interface. The result is
> > that the CPU which receives the IRQs for the uplink interface
> > is 100 % busy (mostly in softirq), while the other one is 90 % idle.
> >
> > Setting the IRQ mapping by hand (uplink to one CPU, all the
> > other NICs to the other CPU) yields a well-balanced system (both CPUs
> > 30-60 % busy). I am not sure whether my configuration is too special,
> > but it might be worth making the irqbalance daemon cope with this
> > usage pattern as well.
> >
>
> One of the hard cases for irqbalance is that it doesn't have a
> way to find out the actual CPU time spent in the handlers. For
> networking it makes an estimate based just on the number of packets
> (which is better than nothing)... but that breaks down if there is an
> asymmetry in the CPU cost per packet, as you have.
>
> The good news is that irqthreads at least have the potential to solve
> this "lack of information"; if not, we could consider doing a form of
> microaccounting for irq handlers....
>
>
Perhaps this could be addressed using tracepoints. The currently
proposed ones are at the beginning and end of 'handle_IRQ_event()'. See:
http://marc.info/?l=linux-kernel&m=121616099830280&w=2
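A consumer of those tracepoints could accumulate per-IRQ handler time
for something like irqbalance to read. A rough sketch (the probe
signatures are my guess from the proposed patch, not a final API;
probe registration boilerplate is omitted, and nested interrupts are
ignored for simplicity):

/* Accumulate wall time spent in each IRQ's handlers using the
 * proposed entry/exit tracepoints; irq_time_ns[] would then be
 * exported via /proc or sysfs for a balancer to consume.
 */
#include <linux/interrupt.h>
#include <linux/percpu.h>
#include <linux/sched.h>

static DEFINE_PER_CPU(u64, irq_entry_ts);
static u64 irq_time_ns[NR_IRQS];

static void probe_irq_entry(int irq, struct irqaction *action)
{
    __get_cpu_var(irq_entry_ts) = sched_clock();
}

static void probe_irq_exit(int irq, struct irqaction *action)
{
    irq_time_ns[irq] += sched_clock() - __get_cpu_var(irq_entry_ts);
}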
thanks,
-Jason
Arjan van de Ven wrote:
: Jan Kasprzak <[email protected]> wrote:
: > The result is
: > that the CPU which receives the IRQs for the uplink interface
: > is 100 % busy (mostly in softirq), while the other one is 90 % idle.
:
: One of the hard cases for irqbalance is that it doesn't have a
: way to find out the actual CPU time spent in the handlers. For
: networking it makes an estimate based just on the number of packets
: (which is better than nothing)... but that breaks down if there is an
: asymmetry in the CPU cost per packet, as you have.
:
: The good news is that irqthreads at least have the potential to solve
: this "lack of information"; if not, we could consider doing a form of
: microaccounting for irq handlers....
I am not sure whether this would help. In my case, most of the
in-kernel CPU time is not spent in the irq handler per se, but in
softirq context (i.e. checking packets against the iptables rules).
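The imbalance is at least visible system-wide: /proc/stat has per-CPU
"irq" and "softirq" columns, so a balancer could in principle see
which CPU is drowning in softirq even without per-handler accounting.
A minimal sketch of reading them:

/* softirq-time.c: print per-CPU irq/softirq time from /proc/stat.
 * Column layout: cpuN user nice system idle iowait irq softirq ...
 */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/stat", "r");
    char line[512];

    if (!f)
        return 1;
    while (fgets(line, sizeof(line), f)) {
        int cpu;
        unsigned long user, nice, sys, idle, iowait, irq, softirq;

        /* skip the aggregate "cpu " line; per-CPU rows have a digit */
        if (strncmp(line, "cpu", 3) != 0 || !isdigit((unsigned char)line[3]))
            continue;
        if (sscanf(line, "cpu%d %lu %lu %lu %lu %lu %lu %lu",
                   &cpu, &user, &nice, &sys, &idle,
                   &iowait, &irq, &softirq) == 8)
            printf("cpu%d: irq=%lu softirq=%lu (jiffies)\n",
                   cpu, irq, softirq);
    }
    fclose(f);
    return 0;
}

This tells you which CPU is burning softirq time, but not which IRQ
put the work there, which is the part a balancer actually needs.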
-Yenya
Jason Baron wrote:
>> One of the hard cases for irqbalance is that it doesn't have a
>> way to find out the actual CPU time spent in the handlers. For
>> networking it makes an estimate based just on the number of packets
>> (which is better than nothing)... but that breaks down if there is an
>> asymmetry in the CPU cost per packet, as you have.
>>
>> The good news is that irqthreads at least have the potential to solve
>> this "lack of information"; if not, we could consider doing a form of
>> microaccounting for irq handlers....
>>
>>
>
> Perhaps this could be addressed using tracepoints. The currently
> proposed ones are at the beginning and end of 'handle_IRQ_event()'. See:
> http://marc.info/?l=linux-kernel&m=121616099830280&w=2
>
Something that you always need should not be a tracepoint.
Jan Kasprzak wrote:
> Arjan van de Ven wrote:
> : Jan Kasprzak <[email protected]> wrote:
> : > The result is
> : > that the CPU which receives the IRQs for the uplink interface
> : > is 100 % busy (mostly in softirq), while the other one is 90 % idle.
> :
> : One of the hard cases for irqbalance is that it doesn't have a
> : way to find out the actual CPU time spent in the handlers. For
> : networking it makes an estimate based just on the number of packets
> : (which is better than nothing)... but that breaks down if there is an
> : asymmetry in the CPU cost per packet, as you have.
> :
> : The good news is that irqthreads at least have the potential to solve
> : this "lack of information"; if not, we could consider doing a form of
> : microaccounting for irq handlers....
>
> I am not sure whether this would help. In my case, most of the
> in-kernel CPU time is not spent in the irq handler per se, but in
> softirq context (i.e. checking packets against the iptables rules).
There is some consideration of making softirqs that are raised run as
part of the irq thread, or at least thoughts in that direction.
On Fri, 2008-10-03 at 06:38 -0700, Arjan van de Ven wrote:
> The good news is that irqthreads at least have the potential to solve
> this "lack of information"; if not, we could consider doing a form of
> microaccounting for irq handlers....
I have some patches floating about that account for nmi/irq/softirq
time on a fine-grained scale. The trouble is that I've so far not
found a way to handle the case of a jiffy-based sched_clock().
The trouble with that is that the clock then only ever advances in
IRQ context, so you get nmi=0, softirq=0, regular=0 but irq=100%,
which is obviously sub-optimal :-)
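The scheme is roughly: snapshot sched_clock() at every context
transition and charge the elapsed delta to the context being left.
A toy sketch of the idea (invented names, not the actual patches):

/* Toy model of fine-grained context time accounting.  With a
 * jiffy-granular sched_clock() the clock only ever advances inside
 * the timer interrupt, so the task/softirq buckets always see a zero
 * delta and all time is charged when leaving CTX_IRQ, i.e. irq=100%.
 */
typedef unsigned long long u64;

extern u64 sched_clock(void);       /* ns since boot */

enum ctx { CTX_TASK, CTX_SOFTIRQ, CTX_IRQ, CTX_NMI, CTX_MAX };

struct cpu_acct {
    enum ctx cur;                   /* context we are currently in */
    u64 last;                       /* sched_clock() at last transition */
    u64 time[CTX_MAX];              /* accumulated ns per context */
};

static void ctx_switch(struct cpu_acct *a, enum ctx next)
{
    u64 now = sched_clock();

    a->time[a->cur] += now - a->last;   /* charge the context we leave */
    a->last = now;
    a->cur = next;
}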
On Tuesday 07 October 2008 21:29, Peter Zijlstra wrote:
> On Fri, 2008-10-03 at 06:38 -0700, Arjan van de Ven wrote:
> > The good news is that irqthreads at least have the potential to solve
> > this "lack of information"; if not, we could consider doing a form of
> > microaccounting for irq handlers....
>
> I have some patches floating about that account for nmi/irq/softirq
> time on a fine-grained scale. The trouble is that I've so far not
> found a way to handle the case of a jiffy-based sched_clock().
Would be nice to have.
> The trouble with that is that the clock then only ever advances in
> IRQ context, so you get nmi=0, softirq=0, regular=0 but irq=100%,
> which is obviously sub-optimal :-)
Can't you exempt the timer interrupt from counting itself as irq time
in the case of a timer-interrupt-based sched_clock()?
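Building on the toy sketch above, that exemption might look roughly
like this (sched_clock_is_coarse() and irq_is_timer() are invented
names for illustration):

/* When sched_clock() is tick-based, don't charge the timer tick's
 * own delta to the irq bucket: the clock only advances inside the
 * tick anyway, so that time really belongs to whatever context the
 * tick interrupted.
 */
extern int sched_clock_is_coarse(void);     /* invented predicate */
extern int irq_is_timer(int irq);           /* invented predicate */

static void ctx_switch_checked(struct cpu_acct *a, enum ctx next, int irq)
{
    u64 now = sched_clock();

    if (a->cur == CTX_IRQ && sched_clock_is_coarse() && irq_is_timer(irq))
        a->time[next] += now - a->last;     /* credit interrupted ctx */
    else
        a->time[a->cur] += now - a->last;
    a->last = now;
    a->cur = next;
}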