2008-01-21 23:24:31

by S.Çağlar Onur

Subject: Rescheduling interrupts

Hi;

With Linus's latest git, powertop reports the following while the system is
nearly 100% idle:

PowerTOP version 1.9 (C) 2007 Intel Corporation

Cn Avg residency P-states (frequencies)
C0 (cpu running) ( 6,3%) 1,84 Ghz 0,4%
C1 0,0ms ( 0,0%) 1333 Mhz 0,0%
C2 0,1ms ( 0,5%) 1000 Mhz 99,6%
C3 3,7ms (93,2%)


Wakeups-from-idle per second : 306,8 interval: 10,0s
Power usage (5 minute ACPI estimate) : 23,1 W (0,5 hours left)

Top causes for wakeups:
59,9% (238,4) <kernel IPI> : Rescheduling interrupts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14,7% ( 58,6) amarokapp : schedule_timeout (process_timeout)
5,5% ( 21,9) java : futex_wait (hrtimer_wakeup)
5,0% ( 19,8) <interrupt> : iwl3945
2,5% ( 10,0) java : schedule_timeout (process_timeout)
2,5% ( 10,0) <kernel core> : ehci_work (ehci_watchdog)
2,5% ( 10,0) <interrupt> : extra timer interrupt
1,6% ( 6,4) artsd : schedule_timeout (process_timeout)
1,0% ( 4,0) <kernel module> : usb_hcd_poll_rh_status (rh_timer_func)
0,5% ( 2,0) <interrupt> : ohci1394, uhci_hcd:usb4, nvidia
0,5% ( 2,0) <kernel core> : clocksource_check_watchdog
(clocksource_watchdog)
0,5% ( 2,0) kwin : schedule_timeout (process_timeout)
0,5% ( 1,9) wpa_supplicant : schedule_timeout (process_timeout)
0,3% ( 1,2) kdesktop : schedule_timeout (process_timeout)
0,3% ( 1,0) kwrapper : do_nanosleep (hrtimer_wakeup)
0,3% ( 1,0) klipper : schedule_timeout (process_timeout)
0,3% ( 1,0) artsd : do_setitimer (it_real_fn)
0,3% ( 1,0) gpg-agent : schedule_timeout (process_timeout)
0,3% ( 1,0) X : nv_start_rc_timer (nv_kern_rc_timer)
0,3% ( 1,0) kicker : schedule_timeout (process_timeout)
0,1% ( 0,5) iwl3945 : ieee80211_authenticate
(ieee80211_sta_timer)
0,1% ( 0,5) <kernel core> : neigh_table_init_no_netlink
(neigh_periodic_timer)

This "<kernel IPI> : Rescheduling interrupts" causes at least 200 wakeups
(sometimes i see ~400 wakeups) for me and a quick google search yields [1],
but i didn't see this reported to LKML, so here it is :).

If anything else is needed, please yell...

[1] http://www.mail-archive.com/[email protected]/msg01009.html
--
S.Çağlar Onur <[email protected]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!



2008-01-22 10:58:14

by Ingo Molnar

Subject: Re: Rescheduling interrupts


* S.Çağlar Onur <[email protected]> wrote:

> Top causes for wakeups:
> 59,9% (238,4) <kernel IPI> : Rescheduling interrupts
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 14,7% ( 58,6) amarokapp : schedule_timeout (process_timeout)

hm, would be nice to figure out what causes these IPIs. Could you stick
something like this into arch/x86/kernel/smp_32.c's
smp_send_reschedule() function [this is the function that generates the
IPI]:

static void native_smp_send_reschedule(int cpu)
{
        WARN_ON(cpu_is_offline(cpu));
        send_IPI_mask(cpumask_of_cpu(cpu), RESCHEDULE_VECTOR);
        if (panic_timeout > 0) {
                panic_timeout--;
                printk("IPI from task %s:%d on CPU#%d:\n",
                       current->comm, current->pid, cpu);
                dump_stack();
        }
}

NOTE: if you run an SMP kernel then first remove these two lines from
kernel/printk.c:

        if (!oops_in_progress && waitqueue_active(&log_wait))
                wake_up_interruptible(&log_wait);

otherwise you'll get lockups. (the IPI is sent while holding the
runqueue lock, so the printks will lock up)
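
i.e. just comment them out in kernel/printk.c - something like this (an
untested sketch; the exact spot those two lines live in may differ in your
tree):

        /*
         * commented out for the IPI-debugging hack: the reschedule IPI is
         * sent while the runqueue lock is held, so waking up log readers
         * from printk here can lock up.
         */
        /*
        if (!oops_in_progress && waitqueue_active(&log_wait))
                wake_up_interruptible(&log_wait);
        */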

then wait for the bad condition to occur on your system and generate a
stream of ~10 backtraces, via:

echo 10 > /proc/sys/kernel/panic

you should be getting 10 immediate backtraces - please send them to us.
The backtraces should show the place that generates the wakeups. [turn
on CONFIG_FRAME_POINTERS=y to get high quality backtraces.]

If you do _not_ get 10 immediate backtraces, then something in the
system is generating such IPIs outside of the scheduler's control. That
would suggest some other sort of borkage.

Ingo

2008-01-22 12:55:40

by S.Çağlar Onur

Subject: Re: Rescheduling interrupts

Hi;

On Tuesday 22 January 2008, Ingo Molnar wrote:
> * S.Çağlar Onur <[email protected]> wrote:
>
> > Top causes for wakeups:
> > 59,9% (238,4) <kernel IPI> : Rescheduling interrupts
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 14,7% ( 58,6) amarokapp : schedule_timeout (process_timeout)
>
> hm, would be nice to figure out what causes these IPIs. Could you stick
> something like this into arch/x86/kernel/smp_32.c's
> smp_send_reschedule() function [this is the function that generates the
> IPI]:
>
> static void native_smp_send_reschedule(int cpu)
> {
>         WARN_ON(cpu_is_offline(cpu));
>         send_IPI_mask(cpumask_of_cpu(cpu), RESCHEDULE_VECTOR);
>         if (panic_timeout > 0) {
>                 panic_timeout--;
>                 printk("IPI from task %s:%d on CPU#%d:\n",
>                        current->comm, current->pid, cpu);
>                 dump_stack();
>         }
> }
>
> NOTE: if you run an SMP kernel then first remove these two lines from
> kernel/printk.c:
>
>         if (!oops_in_progress && waitqueue_active(&log_wait))
>                 wake_up_interruptible(&log_wait);
>
> otherwise you'll get lockups. (the IPI is sent while holding the
> runqueue lock, so the printks will lock up)
>
> then wait for the bad condition to occur on your system and generate a
> stream of ~10 backtraces, via:
>
> echo 10 > /proc/sys/kernel/panic
>
> you should be getting 10 immediate backtraces - please send them to us.
> The backtraces should show the place that generates the wakeups. [turn
> on CONFIG_FRAME_POINTERS=y to get high quality backtraces.]
>
> If you do _not_ get 10 immediate backtraces, then something in the
> system is generating such IPIs outside of the scheduler's control. That
> would suggest some other sort of borkage.
>
> Ingo

I grabbed the logs twice to make sure I caught the needed info. The 1st [1]
was generated while "Rescheduling interrupts" caused ~200 wakeups and the
2nd [2] while it caused ~350 wakeups.

[1] http://cekirdek.pardus.org.tr/~caglar/dmesg.1st
[2] http://cekirdek.pardus.org.tr/~caglar/dmesg.2nd

Cheers
--
S.Çağlar Onur <[email protected]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!

2008-01-22 15:24:07

by Ingo Molnar

Subject: Re: Rescheduling interrupts


* S.Çağlar Onur <[email protected]> wrote:

> I grabbed the logs twice to make sure I caught the needed info. The 1st [1]
> was generated while "Rescheduling interrupts" caused ~200 wakeups and the
> 2nd [2] while it caused ~350 wakeups.
>
> [1] http://cekirdek.pardus.org.tr/~caglar/dmesg.1st
> [2] http://cekirdek.pardus.org.tr/~caglar/dmesg.2nd

thanks, these seem to be mostly normal wakeups from standard tasks:

IPI from task kdm_greet:2118 on CPU#0:
IPI from task X:2079 on CPU#1:
IPI from task kdm_greet:2118 on CPU#0:
IPI from task hald-addon-inpu:2009 on CPU#1:
IPI from task events/0:7 on CPU#1:
IPI from task bash:2129 on CPU#0:
IPI from task kdm_greet:2118 on CPU#0:
IPI from task events/0:7 on CPU#1:
IPI from task events/0:7 on CPU#1:
IPI from task events/0:7 on CPU#1:
IPI from task bash:3902 on CPU#1:
IPI from task bash:3902 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:
IPI from task X:2079 on CPU#0:
IPI from task yakuake:3422 on CPU#0:
IPI from task X:2079 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:
IPI from task amarokapp:3423 on CPU#1:

Could you also add similar IPI printouts (with the same panic_timeout
logic) to arch/x86/kernel/smp_32.c's smp_reschedule_interrupt() function
- while still keeping the other printouts too?
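
Something like this ought to do it - a rough, untested sketch (the existing
body of smp_reschedule_interrupt() below is from memory and may differ
slightly in your tree; keep whatever is already there and just append the
printout):

        fastcall void smp_reschedule_interrupt(struct pt_regs *regs)
        {
                /* existing body (may differ in your tree) */
                ack_APIC_irq();
                __get_cpu_var(irq_stat).irq_resched_count++;

                /* same panic_timeout-limited printout as on the send side */
                if (panic_timeout > 0) {
                        panic_timeout--;
                        printk("IPI (@smp_reschedule_interrupt) from task %s:%d on CPU#%d:\n",
                               current->comm, current->pid, smp_processor_id());
                        dump_stack();
                }
        }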

Could you also enable PRINTK_TIME timestamps, so that we can see the
timings? (And do a "dmesg -n 1" so that the printks happen fast and the
timings are accurate.) I'd suggest to increase CONFIG_LOG_BUF_SHIFT to
20, so that your dmesg buffer is large enough. Plus try to capture 100
events, ok?

My theory is that for whatever reason we get "repeat" IPIs: multiple
reschedule IPIs although the other CPU only initiated one.

Ingo

2008-01-22 15:38:06

by Ingo Molnar

Subject: Re: Rescheduling interrupts


also, this might reduce the number of cross-CPU wakeups on near-idle
systems:

echo 1 > /sys/devices/system/cpu/sched_mc_power_savings

[ or if it doesn't, it should ;) ]

Ingo

2008-01-22 15:56:17

by S.Çağlar Onur

Subject: Re: Rescheduling interrupts

Hi;

On Tuesday 22 January 2008, Ingo Molnar wrote:
> * S.Çağlar Onur <[email protected]> wrote:
> > I grabbed the logs twice to make sure I caught the needed info. The 1st
> > [1] was generated while "Rescheduling interrupts" caused ~200 wakeups and
> > the 2nd [2] while it caused ~350 wakeups.
> >
> > [1] http://cekirdek.pardus.org.tr/~caglar/dmesg.1st
> > [2] http://cekirdek.pardus.org.tr/~caglar/dmesg.2nd
>
> thanks, these seem to be mostly normal wakeups from standard tasks:
>
> IPI from task kdm_greet:2118 on CPU#0:
> IPI from task X:2079 on CPU#1:
> IPI from task kdm_greet:2118 on CPU#0:
> IPI from task hald-addon-inpu:2009 on CPU#1:
> IPI from task events/0:7 on CPU#1:
> IPI from task bash:2129 on CPU#0:
> IPI from task kdm_greet:2118 on CPU#0:
> IPI from task events/0:7 on CPU#1:
> IPI from task events/0:7 on CPU#1:
> IPI from task events/0:7 on CPU#1:
> IPI from task bash:3902 on CPU#1:
> IPI from task bash:3902 on CPU#1:
> IPI from task amarokapp:3423 on CPU#1:
> IPI from task amarokapp:3423 on CPU#1:
> IPI from task amarokapp:3423 on CPU#1:
> IPI from task X:2079 on CPU#0:
> IPI from task yakuake:3422 on CPU#0:
> IPI from task X:2079 on CPU#1:
> IPI from task amarokapp:3423 on CPU#1:
> IPI from task amarokapp:3423 on CPU#1:
>
> Could you also add similar IPI printouts (with the same panic_timeout
> logic) to arch/x86/kernel/smp_32.c's smp_reschedule_interrupt() function
> - while still keeping the other printouts too?
>
> Could you also enable PRINTK_TIME timestamps, so that we can see the
> timings? (And do a "dmesg -n 1" so that the printks happen fast and the
> timings are accurate.) I'd suggest to increase CONFIG_LOG_BUF_SHIFT to
> 20, so that your dmesg buffer is large enough. Plus try to capture 100
> events, ok?
>
> My theory is that for whatever reason we get "repeat" IPIs: multiple
> reschedule IPIs although the other CPU only initiated one.

Ok, please see http://cekirdek.pardus.org.tr/~caglar/dmesg.3rd :)

Cheers
--
S.Çağlar Onur <[email protected]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!

2008-01-22 16:00:34

by S.Çağlar Onur

Subject: Re: Rescheduling interrupts

Hi;

On Tuesday 22 January 2008, Ingo Molnar wrote:
>
> also, this might reduce the number of cross-CPU wakeups on near-idle
> systems:
>
> echo 1 > /sys/devices/system/cpu/sched_mc_power_savings
>
> [ or if it doesn't, it should ;) ]
>
> Ingo

Seems like nothing changed:


zangetsu ~ # cat /sys/devices/system/cpu/sched_mc_power_savings
1



Powertop still reports ~300 wakeups for "<kernel IPI> : Rescheduling interrupts"

PowerTOP version 1.9 (C) 2007 Intel Corporation

Cn Avg residency P-states (frequencies)
C0 (cpu running) ( 4,8%)
C1 0,0ms ( 0,0%)
C2 0,2ms ( 2,4%)
C3 2,4ms (92,8%)


Wakeups-from-idle per second : 495,2 interval: 3,0s
no ACPI power usage estimate available

Top causes for wakeups:
40,0% (330,7) <kernel IPI> : Rescheduling interrupts
12,3% (102,0) USB device 3-2 : HP Integrated Module (Broadcom Corp)
12,1% (100,0) <interrupt> : uhci_hcd:usb3
8,0% ( 66,3) <interrupt> : extra timer interrupt
7,0% ( 58,0) amarokapp : schedule_timeout (process_timeout)
4,0% ( 33,0) <interrupt> : uhci_hcd:usb2



And this is what the system is doing while powertop reports the above:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.1 0.0 1512 532 ? Ss 17:41 0:00 init [3]
root 2 0.0 0.0 0 0 ? S< 17:41 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S< 17:41 0:00 [migration/0]
root 4 0.0 0.0 0 0 ? S< 17:41 0:00 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S< 17:41 0:00 [migration/1]
root 6 0.0 0.0 0 0 ? S< 17:41 0:00 [ksoftirqd/1]
root 7 0.0 0.0 0 0 ? S< 17:41 0:00 [events/0]
root 8 0.0 0.0 0 0 ? S< 17:41 0:00 [events/1]
root 9 0.0 0.0 0 0 ? S< 17:41 0:00 [khelper]
root 10 0.0 0.0 0 0 ? S< 17:41 0:00 [kblockd/0]
root 11 0.0 0.0 0 0 ? S< 17:41 0:00 [kblockd/1]
root 12 0.0 0.0 0 0 ? S< 17:41 0:00 [kacpid]
root 13 0.0 0.0 0 0 ? S< 17:41 0:00 [kacpi_notify]
root 14 0.0 0.0 0 0 ? S< 17:41 0:00 [cqueue/0]
root 15 0.0 0.0 0 0 ? S< 17:41 0:00 [cqueue/1]
root 16 0.0 0.0 0 0 ? S< 17:41 0:00 [kseriod]
root 17 0.0 0.0 0 0 ? S 17:41 0:00 [pdflush]
root 18 0.0 0.0 0 0 ? S 17:41 0:00 [pdflush]
root 19 0.0 0.0 0 0 ? S< 17:41 0:00 [kswapd0]
root 20 0.0 0.0 0 0 ? S< 17:41 0:00 [aio/0]
root 21 0.0 0.0 0 0 ? S< 17:41 0:00 [aio/1]
root 22 0.0 0.0 0 0 ? S< 17:41 0:00 [kpsmoused]
root 42 0.0 0.0 0 0 ? S< 17:41 0:00 [khpsbpkt]
root 46 0.0 0.0 0 0 ? S< 17:41 0:00 [knodemgrd_0]
root 55 0.0 0.0 0 0 ? S< 17:41 0:00 [ata/0]
root 56 0.0 0.0 0 0 ? S< 17:41 0:00 [ata/1]
root 57 0.0 0.0 0 0 ? S< 17:41 0:00 [ata_aux]
root 61 0.0 0.0 0 0 ? S< 17:41 0:00 [scsi_eh_0]
root 62 0.0 0.0 0 0 ? S< 17:41 0:00 [scsi_eh_1]
root 63 0.0 0.0 0 0 ? S< 17:41 0:00 [scsi_eh_2]
root 64 0.0 0.0 0 0 ? S< 17:41 0:00 [scsi_eh_3]
root 70 0.0 0.0 0 0 ? S< 17:41 0:00 [ksuspend_usbd]
root 71 0.0 0.0 0 0 ? S< 17:41 0:00 [khubd]
root 80 0.0 0.0 0 0 ? S< 17:41 0:00 [scsi_eh_4]
root 81 0.0 0.0 0 0 ? S< 17:41 0:00 [scsi_eh_5]
root 159 0.0 0.0 0 0 ? S< 17:41 0:00 [kjournald]
root 194 0.0 0.0 2452 1304 ? S<s 17:41 0:00 /sbin/udevd --daemon
root 1212 0.0 0.0 0 0 ? S< 17:41 0:00 [kmmcd]
root 1378 0.0 0.0 0 0 ? S< 17:41 0:00 [iwl3945/0]
root 1379 0.0 0.0 0 0 ? S< 17:41 0:00 [iwl3945/1]
root 1408 0.0 0.0 0 0 ? S< 17:41 0:00 [iwl3945]
root 1522 0.0 0.0 2320 348 ? Ss 17:41 0:00 /usr/sbin/irqbalance
root 1554 0.0 0.0 3904 1200 ? Ss 17:41 0:00 Comar
root 1681 0.0 0.0 3904 720 ? S 17:41 0:00 ComarRPC
root 1693 0.0 0.0 1556 608 ? Ss 17:41 0:00 /usr/sbin/syslogd -m 15
root 1696 0.0 0.0 1512 412 ? Ss 17:41 0:00 /usr/sbin/klogd -c 3 -2
root 1697 0.0 0.0 2168 948 ? Ss 17:41 0:00 /usr/sbin/hcid -s -f /etc/bluetooth/hcid.conf
root 1708 0.0 0.0 0 0 ? S< 17:41 0:00 [krfcommd]
root 1726 0.0 0.0 2008 712 tty3 Ss+ 17:41 0:00 /sbin/mingetty tty3
root 1727 0.0 0.0 2008 708 tty4 Ss+ 17:41 0:00 /sbin/mingetty tty4
root 1728 0.0 0.0 2008 708 tty5 Ss+ 17:41 0:00 /sbin/mingetty tty5
root 1729 0.0 0.0 2008 708 tty6 Ss+ 17:41 0:00 /sbin/mingetty tty6
root 1730 0.0 0.0 1508 512 ? Ss 17:41 0:00 /usr/sbin/acpid -c /etc/acpi/events
dbus 1797 0.0 0.0 2360 1000 ? Ss 17:41 0:00 /usr/bin/dbus-daemon --system
root 1846 0.0 0.1 8384 2196 ? Ssl 17:41 0:00 /usr/sbin/console-kit-daemon
hal 1939 0.0 0.1 5572 3948 ? Ss 17:42 0:00 /usr/sbin/hald --daemon=yes --use-syslog
root 1943 0.0 0.0 2980 1080 ? S 17:42 0:00 hald-runner
root 1950 0.5 1.7 214512 36420 ? Ssl 17:42 0:04 /opt/sun-jdk/bin/java -DConfigFile=/opt/zemberek-server/config/conf.ini -Djava.library.path=/opt/zemberek-server/lib -Xverify:none -Xms12m -Xmx14m -jar /opt/zemberek-server/zemberek_server-0.7.jar
root 1951 0.0 0.0 1780 692 ? Ss 17:42 0:00 /usr/sbin/cron
root 1952 0.0 0.0 10060 648 ? S<sl 17:42 0:00 /sbin/auditd
root 1954 0.0 0.0 9828 648 ? S<sl 17:42 0:00 /sbin/audispd
root 1955 0.0 0.0 0 0 ? S< 17:42 0:00 [kauditd]
root 1962 0.0 0.1 5812 2088 ? Ss 17:42 0:00 /usr/sbin/cupsd
avahi 1964 0.0 0.0 2656 1492 ? Ss 17:42 0:00 avahi-daemon: running [zangetsu.local]
avahi 1967 0.0 0.0 2516 456 ? Ss 17:42 0:00 avahi-daemon: chroot helper
root 2007 0.0 0.0 3072 1108 ? S 17:42 0:00 hald-addon-input: Listening on /dev/input/event8 /dev/input/event9 /dev/input/event0 /dev/input/event4 /dev/input/event5 /dev/input/event6 /dev/input/event7
hal 2039 0.0 0.0 2032 888 ? S 17:42 0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
root 2044 0.0 0.0 3076 1024 ? S 17:42 0:00 hald-addon-storage: polling /dev/sr0 (every 2 sec)
root 2057 0.0 0.0 1640 468 ? Ss 17:42 0:00 /usr/sbin/avahi-dnsconfd -D
root 2071 0.0 0.0 3952 776 ? Ss 17:42 0:00 /usr/kde/3.5/bin/kdm
root 2073 3.5 1.2 30852 26084 tty7 RLs+ 17:42 0:30 /usr/bin/X -br -nolisten tcp :0 vt7 -auth /var/run/xauth/A:0-E3nV3y
root 2077 0.0 0.0 4668 1976 ? S 17:42 0:00 -:0
root 2080 0.0 0.0 3296 1360 ? Ss 17:42 0:00 /usr/sbin/wpa_supplicant -wuB -P/var/run/wpa_supplicant.pid
root 2109 0.0 0.0 1652 248 ? Ss 17:42 0:00 /sbin/dhcpcd -R -Y -N eth1 -t 120
root 2694 0.0 0.0 2008 708 tty2 Ss+ 17:43 0:00 /sbin/mingetty --noclear tty2
root 2758 0.0 0.0 2008 708 tty1 Ss+ 17:43 0:00 /sbin/mingetty --noclear tty1
caglar 2783 0.0 0.0 3016 1348 ? Ss 17:43 0:00 /bin/sh /usr/kde/3.5/bin/startkde
caglar 2812 0.0 0.0 2664 468 ? Ss 17:44 0:00 /usr/bin/gpg-agent --daemon
caglar 2815 0.0 0.0 3056 764 ? Ss 17:44 0:00 /usr/bin/ssh-agent -s
caglar 2832 0.0 0.0 2984 432 ? S 17:44 0:00 dbus-launch --sh-syntax --exit-with-session
caglar 2833 0.0 0.0 2228 864 ? Ss 17:44 0:00 /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
root 2839 0.0 0.0 1360 148 ? S 17:44 0:00 start_kdeinit --new-startup +kcminit_startup
caglar 2840 0.0 0.3 25008 7800 ? Ss 17:44 0:00 kdeinit Running...
caglar 2843 0.0 0.1 24276 2788 ? S 17:44 0:00 dcopserver [kdeinit] --nosid
caglar 2845 0.0 0.3 26896 7720 ? S 17:44 0:00 klauncher [kdeinit] --new-startup
caglar 2847 0.0 0.7 33552 16424 ? S 17:44 0:00 kded [kdeinit] --new-startup
caglar 2852 0.0 0.0 1496 348 ? S 17:44 0:00 kwrapper ksmserver
caglar 2854 0.0 0.4 26656 9392 ? S 17:44 0:00 ksmserver [kdeinit]
caglar 2855 0.1 0.6 28584 12480 ? S 17:44 0:01 kwin [kdeinit] -session 10dfd5e1dc000119905582800000074630008_1201
caglar 2857 0.1 0.8 34436 17352 ? S 17:44 0:01 kdesktop [kdeinit]
caglar 2859 0.1 0.8 45892 17384 ? S 17:44 0:01 kicker [kdeinit]
caglar 2862 0.0 0.3 25560 6688 ? S 17:44 0:00 kio_file [kdeinit] file /tmp/ksocket-caglar/klauncherTCaD9a.slave-
caglar 2866 0.0 0.3 13736 7856 ? S 17:44 0:00 /usr/kde/3.5/bin/artsd -F 10 -S 4096 -s 3 -m artsmessage -c drkonqi -l 3 -f
caglar 2871 0.0 0.6 29552 12728 ? S 17:44 0:00 kmix [kdeinit] -session 10dfd5e1dc000118109216400000036490013_1201
caglar 2873 0.3 1.2 57188 26548 ? S 17:44 0:02 akregator -session 10dfd5e1dc000119737236400000038700014_1201016429_751575
caglar 2876 0.0 0.5 25972 10908 ? S 17:44 0:00 knazar -session 10dfd5e1dc000119738342100000051660022_1201016429_752054
caglar 2877 0.0 0.5 35540 11524 ? S 17:44 0:00 knotify [kdeinit]
caglar 2879 0.0 0.7 41988 14848 ? S 17:44 0:00 kgpg -session 10dfd5e1dc000119885454500000040010017_1201016429_751902
caglar 2881 0.4 0.7 29792 15976 ? S 17:44 0:03 yakuake -session 10dfd5e1dc000119949009100000068430023_1201016429_752567
caglar 2882 3.5 2.7 152560 57376 ? Sl 17:44 0:26 amarokapp -session 10dfd5e1dc000120001145300000022560015_1201016429_751762
caglar 2890 1.2 4.0 148500 83704 ? Sl 17:44 0:09 kmail -session 10dfd5e1dc000120100924600000022060011_1201016429_752344
caglar 2894 0.0 0.5 27664 11080 ? S 17:44 0:00 klipper [kdeinit]
caglar 2898 0.0 0.5 29504 11512 ? S 17:44 0:00 kpowersave [kdeinit]
caglar 3080 0.0 0.3 54032 7924 ? S 17:44 0:00 kio_http [kdeinit] http /tmp/ksocket-caglar/klauncherTCaD9a.slave-
caglar 3081 0.0 0.3 53912 7916 ? S 17:44 0:00 kio_http [kdeinit] http /tmp/ksocket-caglar/klauncherTCaD9a.slave-
caglar 3150 0.0 0.1 4136 2600 pts/2 Ss 17:44 0:00 /bin/bash
caglar 3236 0.0 0.4 16604 8576 ? S 17:44 0:00 /usr/libexec/notification-daemon
caglar 3273 0.0 0.3 27164 7028 ? S 17:44 0:00 kio_file [kdeinit] file /tmp/ksocket-caglar/klauncherTCaD9a.slave-
caglar 3274 0.0 0.1 3692 2348 ? S 17:44 0:00 ruby /usr/kde/3.5/share/apps/amarok/scripts/score_default/score_default.rb
caglar 3341 0.0 0.3 62224 8172 ? S 17:46 0:00 kio_http [kdeinit] http /tmp/ksocket-caglar/klauncherTCaD9a.slave-
caglar 3343 0.0 0.3 61208 8116 ? S 17:46 0:00 kio_http [kdeinit] http /tmp/ksocket-caglar/klauncherTCaD9a.slave-
caglar 3344 0.0 0.3 54036 8076 ? S 17:46 0:00 kio_http [kdeinit] http /tmp/ksocket-caglar/klauncherTCaD9a.slave-
caglar 3345 0.0 0.3 53012 8072 ? S 17:46 0:00 kio_http [kdeinit] http /tmp/ksocket-caglar/klauncherTCaD9a.slave-
caglar 3347 0.0 0.3 54040 8120 ? S 17:46 0:00 kio_http [kdeinit] http /tmp/ksocket-caglar/klauncherTCaD9a.slave-
caglar 3348 0.0 0.3 62232 8100 ? S 17:46 0:00 kio_http [kdeinit] http /tmp/ksocket-caglar/klauncherTCaD9a.slave-
caglar 3897 0.0 0.8 56212 17004 ? S 17:50 0:00 /usr/bin/python /usr/kde/3.5/bin/network-applet
caglar 4265 0.0 0.0 2752 1308 ? S 17:53 0:00 /bin/sh /usr/bin/firefox
caglar 4267 1.1 1.6 131892 33644 ? Sl 17:53 0:01 /usr/lib/MozillaFirefox//firefox-bin
caglar 4271 0.0 0.1 4148 2392 ? S 17:53 0:00 /usr/libexec/gconfd-2 14
caglar 4300 0.0 0.4 54932 9120 ? S 17:54 0:00 kio_pop3 [kdeinit] pop3 /tmp/ksocket-caglar/klauncherTCaD9a.slave-
root 4316 0.0 0.0 2524 1200 pts/2 S 17:55 0:00 su -
root 4318 0.2 0.1 4788 3476 pts/2 S 17:55 0:00 -su
root 4342 0.1 0.1 3200 2116 pts/2 S+ 17:55 0:00 powertop
caglar 4345 0.1 0.1 4004 2540 pts/0 Ss 17:55 0:00 /bin/bash
root 4451 0.0 0.0 2524 1200 pts/0 S 17:55 0:00 su -
root 4453 0.3 0.1 4788 3448 pts/0 S 17:55 0:00 -su
caglar 4543 0.2 0.4 54064 9416 ? S 17:55 0:00 kio_smtp [kdeinit] smtp /tmp/ksocket-caglar/klauncherTCaD9a.slave-
root 4588 0.0 0.0 2272 964 pts/0 R+ 17:56 0:00 ps aux

Cheers
--
S.Çağlar Onur <[email protected]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!

2008-01-22 16:06:28

by Ingo Molnar

Subject: Re: Rescheduling interrupts


* S.Çağlar Onur <[email protected]> wrote:

> > My theory is that for whatever reason we get "repeat" IPIs: multiple
> > reschedule IPIs although the other CPU only initiated one.
>
> Ok, please see http://cekirdek.pardus.org.tr/~caglar/dmesg.3rd :)

hm, the IPI sending and receiving is nicely paired up:

[ 625.795008] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 625.795223] IPI (@native_smp_send_reschedule) from task amarokapp:2882 on CPU#1:

amarokapp does wake up threads every 20 milliseconds - that could
explain it. It's probably Xorg running on one core, amarokapp on the
other core. That's already 100 reschedules/sec.

Ingo

2008-01-22 16:11:43

by S.Çağlar Onur

Subject: Re: Rescheduling interrupts

On Tuesday 22 January 2008, Ingo Molnar wrote:
>
> * S.Çağlar Onur <[email protected]> wrote:
>
> > > My theory is that for whatever reason we get "repeat" IPIs: multiple
> > > reschedule IPIs although the other CPU only initiated one.
> >
> > Ok, please see http://cekirdek.pardus.org.tr/~caglar/dmesg.3rd :)
>
> hm, the IPI sending and receiving is nicely paired up:
>
> [ 625.795008] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
> [ 625.795223] IPI (@native_smp_send_reschedule) from task amarokapp:2882 on CPU#1:
>
> amarokapp does wake up threads every 20 milliseconds - that could
> explain it. It's probably Xorg running on one core, amarokapp on the
> other core. That's already 100 reschedules/sec.

Heh, killing amarok ends up with the following:

PowerTOP version 1.9 (C) 2007 Intel Corporation

Cn Avg residency P-states (frequencies)
C0 (cpu running) ( 0,9%)
C1 0,0ms ( 0,0%)
C2 0,2ms ( 0,0%)
C3 5,1ms (99,1%)


Wakeups-from-idle per second : 197,8 interval: 10,0s
no ACPI power usage estimate available

Top causes for wakeups:
34,7% (130,7) USB device 3-2 : HP Integrated Module (Broadcom Corp)
26,5% (100,0) <interrupt> : uhci_hcd:usb3
5,8% ( 22,0) java : futex_wait (hrtimer_wakeup)
5,3% ( 20,0) <interrupt> : iwl3945
4,1% ( 15,4) USB device 2-2 : Microsoft Wireless Optical Mouse .00 (Microsoft)
2,9% ( 11,0) <interrupt> : libata
2,7% ( 10,1) <interrupt> : extra timer interrupt
2,7% ( 10,0) java : schedule_timeout (process_timeout)
2,7% ( 10,0) <kernel core> : scan_async (ehci_watchdog)
2,4% ( 9,0) <kernel IPI> : Rescheduling interrupts
2,1% ( 8,0) <kernel module> : usb_hcd_poll_rh_status (rh_timer_func)
1,7% ( 6,4) <interrupt> : uhci_hcd:usb2
1,7% ( 6,4) artsd : schedule_timeout (process_timeout)
0,6% ( 2,1) <interrupt> : ohci1394, uhci_hcd:usb4, nvidia
0,5% ( 2,0) <kernel core> : clocksource_check_watchdog (clocksource_watchdog)
0,5% ( 1,7) wpa_supplicant : schedule_timeout (process_timeout)
0,3% ( 1,0) kicker : schedule_timeout (process_timeout)
0,3% ( 1,0) kwin : schedule_timeout (process_timeout)
0,3% ( 1,0) kdesktop : schedule_timeout (process_timeout)
0,3% ( 1,0) klipper : schedule_timeout (process_timeout)
0,3% ( 1,0) kwrapper : do_nanosleep (hrtimer_wakeup)
0,3% ( 1,0) X : nv_start_rc_timer (nv_kern_rc_timer)

--
S.Çağlar Onur <[email protected]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!

2008-01-22 16:22:28

by S.Çağlar Onur

Subject: Re: Rescheduling interrupts

Hi;

On Tuesday 22 January 2008, S.Çağlar Onur wrote:
> > hm, the IPI sending and receiving is nicely paired up:
> >
> > [ 625.795008] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
> > [ 625.795223] IPI (@native_smp_send_reschedule) from task amarokapp:2882 on CPU#1:
> >
> > amarokapp does wake up threads every 20 milliseconds - that could
> > explain it. It's probably Xorg running on one core, amarokapp on the
> > other core. That's already 100 reschedules/sec.
>
> Heh, killing amarok ends up with the following:
>
> PowerTOP version 1.9 (C) 2007 Intel Corporation
>
> Cn Avg residency P-states (frequencies)
> C0 (cpu running) ( 0,9%)
> C1 0,0ms ( 0,0%)
> C2 0,2ms ( 0,0%)
> C3 5,1ms (99,1%)
>
>
> Wakeups-from-idle per second : 197,8 interval: 10,0s
> no ACPI power usage estimate available
>
> Top causes for wakeups:
> 34,7% (130,7) USB device 3-2 : HP Integrated Module (Broadcom Corp)
> 26,5% (100,0) <interrupt> : uhci_hcd:usb3
> 5,8% ( 22,0) java : futex_wait (hrtimer_wakeup)
> 5,3% ( 20,0) <interrupt> : iwl3945
> 4,1% ( 15,4) USB device 2-2 : Microsoft Wireless Optical Mouse .00 (Microsoft)
> 2,9% ( 11,0) <interrupt> : libata
> 2,7% ( 10,1) <interrupt> : extra timer interrupt
> 2,7% ( 10,0) java : schedule_timeout (process_timeout)
> 2,7% ( 10,0) <kernel core> : scan_async (ehci_watchdog)
> 2,4% ( 9,0) <kernel IPI> : Rescheduling interrupts
> 2,1% ( 8,0) <kernel module> : usb_hcd_poll_rh_status (rh_timer_func)
> 1,7% ( 6,4) <interrupt> : uhci_hcd:usb2
> 1,7% ( 6,4) artsd : schedule_timeout (process_timeout)
> 0,6% ( 2,1) <interrupt> : ohci1394, uhci_hcd:usb4, nvidia
> 0,5% ( 2,0) <kernel core> : clocksource_check_watchdog (clocksource_watchdog)
> 0,5% ( 1,7) wpa_supplicant : schedule_timeout (process_timeout)
> 0,3% ( 1,0) kicker : schedule_timeout (process_timeout)
> 0,3% ( 1,0) kwin : schedule_timeout (process_timeout)
> 0,3% ( 1,0) kdesktop : schedule_timeout (process_timeout)
> 0,3% ( 1,0) klipper : schedule_timeout (process_timeout)
> 0,3% ( 1,0) kwrapper : do_nanosleep (hrtimer_wakeup)
> 0,3% ( 1,0) X : nv_start_rc_timer (nv_kern_rc_timer)

By the way, logging out of KDE suffers from the same problem: this time kdm
migrated to CPU#1 and powertop again reports ~300 wakeups for
"<kernel IPI> : Rescheduling interrupts".

[...]
[ 2058.246692] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[ 2058.246737] IPI (@native_smp_send_reschedule) from task kdm_greet:6122 on CPU#1:
[ 2058.246812] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 2058.278947] IPI (@native_smp_send_reschedule) from task X:2073 on CPU#0:
[ 2058.279070] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[ 2058.279175] IPI (@native_smp_send_reschedule) from task kdm_greet:6122 on CPU#1:
[ 2058.279251] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 2058.279301] IPI (@native_smp_send_reschedule) from task X:2073 on CPU#0:
[ 2058.279377] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[ 2058.279425] IPI (@native_smp_send_reschedule) from task kdm_greet:6122 on CPU#1:
[ 2058.279503] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 2058.279565] IPI (@native_smp_send_reschedule) from task X:2073 on CPU#0:
[ 2058.279637] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[ 2058.279683] IPI (@native_smp_send_reschedule) from task kdm_greet:6122 on CPU#1:
[ 2058.279758] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
[ 2058.311903] IPI (@native_smp_send_reschedule) from task X:2073 on CPU#0:
[ 2058.312028] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#0:
[...]

Cheers
--
S.Çağlar Onur <[email protected]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!

2008-01-22 17:05:07

by Matt Mackall

Subject: Re: Rescheduling interrupts


On Tue, 2008-01-22 at 17:05 +0100, Ingo Molnar wrote:
> * S.Çağlar Onur <[email protected]> wrote:
>
> > > My theory is that for whatever reason we get "repeat" IPIs: multiple
> > > reschedule IPIs although the other CPU only initiated one.
> >
> > Ok, please see http://cekirdek.pardus.org.tr/~caglar/dmesg.3rd :)
>
> hm, the IPI sending and receiving is nicely paired up:
>
> [ 625.795008] IPI (@smp_reschedule_interrupt) from task swapper:0 on CPU#1:
> [ 625.795223] IPI (@native_smp_send_reschedule) from task amarokapp:2882 on CPU#1:
>
> amarokapp does wake up threads every 20 milliseconds - that could
> explain it. It's probably Xorg running on one core, amarokapp on the
> other core. That's already 100 reschedules/sec.

That suggests we want an "anti-load-balancing" heuristic when CPU usage
is very low. Migrating everything onto one core when we're close to idle
will save power and probably reduce latencies.

--
Mathematics is the supreme nostalgia of our time.

2008-01-22 19:28:19

by Ingo Molnar

Subject: Re: Rescheduling interrupts


* Matt Mackall <[email protected]> wrote:

> > amarokapp does wake up threads every 20 milliseconds - that could
> > explain it. It's probably Xorg running on one core, amarokapp on the
> > other core. That's already 100 reschedules/sec.
>
> That suggests we want an "anti-load-balancing" heuristic when CPU
> usage is very low. Migrating everything onto one core when we're close
> to idle will save power and probably reduce latencies.

that would probably be the case if it's multiple sockets - but for
multiple cores exactly the opposite is true: the sooner _both_ cores
finish processing, the deeper a power-saving state the CPU can reach. So
effective and immediate spreading of workloads amongst multiple cores -
especially with shared L2 caches, where the cost of migration is low -
helps power consumption. (And it obviously helps latencies and bandwidth.)

Ingo

2008-01-23 15:54:39

by Matt Mackall

Subject: Re: Rescheduling interrupts


On Wed, 2008-01-23 at 09:53 +0100, Andi Kleen wrote:
> Ingo Molnar <[email protected]> writes:
>
> > that would probably be the case if it's multiple sockets - but for
> > multiple cores exactly the opposite is true: the sooner _both_ cores
> > finish processing, the deeper power use the CPU can reach.
>
> That's only true on setups where the cores don't have
> separate sleep states. But that's not generally true anymore.
> E.g. AMD Fam10h has completely separate power planes for
> the cores, and I believe newer Intel CPUs can also let their
> cores go into at least some sleep states independently (although
> the deepest sleep modes still require all cores to be idle).

I think we can expect everyone to rapidly evolve towards full
independence of core power states. In fact, it wouldn't surprise me if
we eventually get to the point of shutting down individual functional
units like the FPU.

--
Mathematics is the supreme nostalgia of our time.