Question: Why is ksoftirqd eating about 5 to 10 percent of my CPU on an idle
system? The problem occurs if I config the kernel with tickless
support (i.e. CONFIG_TICK_ONESHOT=y). (Thanks to "oprofile" for putting me
onto this.)
I have noted this same problem on kernel versions: 2.6.23.1, 2.6.23.8 and
2.6.23.9
**************************************************************************
*** Output from "vmstat -n 1 10" -- Note very high context switch rate ***
*** This is on a idle machine! ***
**************************************************************************
procs -----------memory---------- ---swap-- -----io---- --system--
----cpu----
r b swpd free buff cache si so bi bo in cs us sy
id wa
0 0 0 1925556 4768 116104 0 0 124 2 6 7538 1 2
96 1
0 0 0 1925556 4768 116104 0 0 0 0 2 147329 0 1
99 0
0 0 0 1925548 4768 116104 0 0 0 0 0 154515 0 1
99 0
0 0 0 1925548 4768 116104 0 0 0 0 1 153898 0 2
98 0
0 0 0 1925548 4780 116104 0 0 0 16 3 155216 0 1
99 0
0 0 0 1925548 4780 116104 0 0 0 0 1 161718 0 1
99 0
0 0 0 1925548 4780 116104 0 0 0 0 0 147587 0 2
98 0
0 0 0 1925548 4780 116104 0 0 0 0 1 153524 0 2
98 0
0 0 0 1925448 4780 116104 0 0 0 0 0 153434 0 1
99 0
0 0 0 1925448 4792 116092 0 0 0 16 4 153527 0 2
98 0
********************
*** System Stats ***
********************
Distro: Slackware 10.2
Mobo: MSI MasterX FA6R E7210
CPUs: Dual 2.4 GHz P4 Xeons 400 MHz FSB - Hyperthreading enabled
Mem: 2 GB ECC DDR PC 266
******************
*** PCI Config ***
******************
00:00.0 Host bridge: Intel Corporation 82875P/E7210 Memory Controller Hub
(rev 02)
00:03.0 PCI bridge: Intel Corporation 82875P/E7210 Processor to PCI to CSA
Bridge (rev 02)
00:06.0 System peripheral: Intel Corporation 82875P/E7210 Processor to I/O
Memory Interface (rev 02)
00:1c.0 PCI bridge: Intel Corporation 6300ESB 64-bit PCI-X Bridge (rev 02)
00:1d.0 USB Controller: Intel Corporation 6300ESB USB Universal Host
Controller (rev 02)
00:1d.1 USB Controller: Intel Corporation 6300ESB USB Universal Host
Controller (rev 02)
00:1d.4 System peripheral: Intel Corporation 6300ESB Watchdog Timer (rev 02)
00:1d.5 PIC: Intel Corporation 6300ESB I/O Advanced Programmable Interrupt
Controller (rev 02)
00:1d.7 USB Controller: Intel Corporation 6300ESB USB2 Enhanced Host
Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 0a)
00:1f.0 ISA bridge: Intel Corporation 6300ESB LPC Interface Controller
(rev 02)
00:1f.1 IDE interface: Intel Corporation 6300ESB PATA Storage Controller
(rev 02)
00:1f.2 IDE interface: Intel Corporation 6300ESB SATA Storage Controller
(rev 02)
00:1f.3 SMBus: Intel Corporation 6300ESB SMBus Controller (rev 02)
01:01.0 Ethernet controller: Intel Corporation 82547GI Gigabit Ethernet
Controller
02:02.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X
Fusion-MPT Dual Ultra320 SCSI (rev 08)
03:09.0 Mass storage controller: Silicon Image, Inc. SiI 3114
[SATALink/SATARaid] Serial ATA Controller (rev 02)
03:0a.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet
Controller
03:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
[email protected] wrote:
> Question: Why is ksoftirqd eating about 5 to 10 percent of my CPU on an idle
> system? The problem occurs if I config the kernel with tickless
> support (i.e. CONFIG_TICK_ONESHOT=y). (Thanks to "oprofile" for putting me
> onto this.)
>
> I have noted this same problem on kernel versions: 2.6.23.1, 2.6.23.8 and
> 2.6.23.9
>
> **************************************************************************
> *** Output from "vmstat -n 1 10" -- Note very high context switch rate ***
> *** This is on a idle machine! ***
> **************************************************************************
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy
> id wa
> 0 0 0 1925556 4768 116104 0 0 124 2 6 7538 1 2
> 96 1
> 0 0 0 1925556 4768 116104 0 0 0 0 2 147329 0 1
> 99 0
What did oprofile show? It should be able to narrow down what
function(s) are responsible for the CPU usage..
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
On Mon, 26 Nov 2007 22:36:17 -0600
Robert Hancock <[email protected]> wrote:
> [email protected] wrote:
> > Question: Why is ksoftirqd eating about 5 to 10 percent of my CPU
> > on an idle system? The problem occurs if I config the kernel with
> > tickless support (i.e. CONFIG_TICK_ONESHOT=y). (Thanks to
> > "oprofile" for putting me onto this.)
> >
> > I have noted this same problem on kernel versions: 2.6.23.1,
> > 2.6.23.8 and 2.6.23.9
> >
> > **************************************************************************
> > *** Output from "vmstat -n 1 10" -- Note very high context switch
> > rate *** *** This is on a idle
> > machine! ***
> > **************************************************************************
> >
> > procs -----------memory---------- ---swap-- -----io---- --system--
> > ----cpu----
> > r b swpd free buff cache si so bi bo in cs
> > us sy id wa
> > 0 0 0 1925556 4768 116104 0 0 124 2 6
> > 7538 1 2 96 1
> > 0 0 0 1925556 4768 116104 0 0 0 0 2
> > 147329 0 1 99 0
>
> What did oprofile show? It should be able to narrow down what
> function(s) are responsible for the CPU usage..
>
or better, what does powertop version 1.9 show?
that tends to show tickless wakeup artifacts quite nicely
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
[email protected] wrote:
> Hello Robert,
>
> I've attached additional detail on the config of the misbehaving system
> including output from oprofile and PowerTop. PowerTop output leads me to
> believe that maybe this is an interaction between my bridged ethernet
> setup and dynticks? Hmmm...
Don't know about that, your top wakeups are from br_stp_enable_bridge,
but that is only 26 a second - that doesn't explain a context switch
rate of 150,000 a second..
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
On Mon, 26 Nov 2007 20:36:32 -0600 (CST) [email protected] wrote:
> Question: Why is ksoftirqd eating about 5 to 10 percent of my CPU on an idle
> system? The problem occurs if I config the kernel with tickless
> support (i.e. CONFIG_TICK_ONESHOT=y). (Thanks to "oprofile" for putting me
> onto this.)
beware that oprofile can provide misleading results on a paritally-idle
system. You may have discovered that ksoftirqd is consuming 5-10% of the
non-idle time on that idle system, which is less surprising.
> I have noted this same problem on kernel versions: 2.6.23.1, 2.6.23.8 and
> 2.6.23.9
>
> **************************************************************************
> *** Output from "vmstat -n 1 10" -- Note very high context switch rate ***
> *** This is on a idle machine! ***
> **************************************************************************
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy
> id wa
> 0 0 0 1925556 4768 116104 0 0 124 2 6 7538 1 2
> 96 1
> 0 0 0 1925556 4768 116104 0 0 0 0 2 147329 0 1
> 99 0
> 0 0 0 1925548 4768 116104 0 0 0 0 0 154515 0 1
> 99 0
> 0 0 0 1925548 4768 116104 0 0 0 0 1 153898 0 2
> 98 0
> 0 0 0 1925548 4780 116104 0 0 0 16 3 155216 0 1
> 99 0
> 0 0 0 1925548 4780 116104 0 0 0 0 1 161718 0 1
> 99 0
> 0 0 0 1925548 4780 116104 0 0 0 0 0 147587 0 2
> 98 0
> 0 0 0 1925548 4780 116104 0 0 0 0 1 153524 0 2
> 98 0
> 0 0 0 1925448 4780 116104 0 0 0 0 0 153434 0 1
> 99 0
> 0 0 0 1925448 4792 116092 0 0 0 16 4 153527 0 2
> 98 0
So what piece of code is scheduling so much? What does `top' say? What
does the (sorted) output of oprofile look like?
Did you try shutting down as much userspace code as possible to find out if
some userspace task is misbehaving?
On Mon, 26 Nov 2007 22:36:17 -0600 Robert Hancock <[email protected]> wrote:
> [email protected] wrote:
> > Question: Why is ksoftirqd eating about 5 to 10 percent of my CPU on an idle
> > system? The problem occurs if I config the kernel with tickless
> > support (i.e. CONFIG_TICK_ONESHOT=y). (Thanks to "oprofile" for putting me
> > onto this.)
> >
> > I have noted this same problem on kernel versions: 2.6.23.1, 2.6.23.8 and
> > 2.6.23.9
> >
> > **************************************************************************
> > *** Output from "vmstat -n 1 10" -- Note very high context switch rate ***
> > *** This is on a idle machine! ***
> > **************************************************************************
> >
> > procs -----------memory---------- ---swap-- -----io---- --system--
> > ----cpu----
> > r b swpd free buff cache si so bi bo in cs us sy
> > id wa
> > 0 0 0 1925556 4768 116104 0 0 124 2 6 7538 1 2
> > 96 1
> > 0 0 0 1925556 4768 116104 0 0 0 0 2 147329 0 1
> > 99 0
>
> What did oprofile show? It should be able to narrow down what
> function(s) are responsible for the CPU usage..
Sigh. I just asked a similar thing. Let's look at the mail headers:
Message-ID: <41877.67.173.156.207.1196130992.squirrel@http://www.techfinesse.net>
...
From: [email protected]
From: Robert Hancock <[email protected]>
...
In-reply-to: <fa.c70Wy8WHP3DOaDydQ2D94+Xxx/[email protected]>
Please fix your email client so as to not break threading?
* Andrew Morton <[email protected]> wrote:
> On Mon, 26 Nov 2007 20:36:32 -0600 (CST) [email protected] wrote:
>
> > Question: Why is ksoftirqd eating about 5 to 10 percent of my CPU on an idle
> > system? The problem occurs if I config the kernel with tickless
> > support (i.e. CONFIG_TICK_ONESHOT=y). (Thanks to "oprofile" for putting me
> > onto this.)
>
> beware that oprofile can provide misleading results on a paritally-idle
> system. You may have discovered that ksoftirqd is consuming 5-10% of the
> non-idle time on that idle system, which is less surprising.
>
> > I have noted this same problem on kernel versions: 2.6.23.1, 2.6.23.8 and
> > 2.6.23.9
> >
> > **************************************************************************
> > *** Output from "vmstat -n 1 10" -- Note very high context switch rate ***
> > *** This is on a idle machine! ***
> > **************************************************************************
> >
> > procs -----------memory---------- ---swap-- -----io---- --system--
> > ----cpu----
> > r b swpd free buff cache si so bi bo in cs us sy
> > id wa
> > 0 0 0 1925556 4768 116104 0 0 124 2 6 7538 1 2
> > 96 1
> > 0 0 0 1925556 4768 116104 0 0 0 0 2 147329 0 1
> > 99 0
> > 0 0 0 1925548 4768 116104 0 0 0 0 0 154515 0 1
> > 99 0
> > 0 0 0 1925548 4768 116104 0 0 0 0 1 153898 0 2
> > 98 0
> > 0 0 0 1925548 4780 116104 0 0 0 16 3 155216 0 1
> > 99 0
> > 0 0 0 1925548 4780 116104 0 0 0 0 1 161718 0 1
> > 99 0
> > 0 0 0 1925548 4780 116104 0 0 0 0 0 147587 0 2
> > 98 0
> > 0 0 0 1925548 4780 116104 0 0 0 0 1 153524 0 2
> > 98 0
> > 0 0 0 1925448 4780 116104 0 0 0 0 0 153434 0 1
> > 99 0
> > 0 0 0 1925448 4792 116092 0 0 0 16 4 153527 0 2
> > 98 0
>
> So what piece of code is scheduling so much? What does `top' say?
> What does the (sorted) output of oprofile look like?
>
> Did you try shutting down as much userspace code as possible to find
> out if some userspace task is misbehaving?
such 'what the heck is happening' problems can also be debugged via the
tracer. Here's a quickstart:
http://redhat.com/~mingo/latency-tracing-patches/tracing-QuickStart.txt
Ingo