Hi,
This patchset focus on two problem:
1)timer_stats_timer_*_start_info() should belong to timer core only.
It should not be seen by outsider, such as workqueue. And should
not resident in timer.h.
Patch-0001~0003 first remove the only explicit out call, then move
timer_stats_timer_*_start_info() from timer.h to timer.c.
2)call site recorded by timer_stats sometimes show the timer core functions
instead of the real call site.
Now /proc/timer_stats is like below on my machine:
---now---
Timer Stats Version: v0.2
Sample period: 7.967 s
10075, 0 kworker/0:0 hrtimer_start_range_ns (tick_sched_timer)
642, 0 swapper hrtimer_start (tick_sched_timer)
16, 1312 pulseaudio hrtimer_start_range_ns (hrtimer_wakeup)
153, 0 swapper hrtimer_start_range_ns (tick_sched_timer)
8, 1117 sshd sk_reset_timer (tcp_write_timer)
15, 1373 multiload-apple hrtimer_start_range_ns (hrtimer_wakeup)
8, 1361 gvfs-afc-volume hrtimer_start_range_ns (hrtimer_wakeup)
4, 1288 gnome-settings- hrtimer_start_range_ns (hrtimer_wakeup)
1, 1403 indicator-me-se hrtimer_start_range_ns (hrtimer_wakeup)
1, 1306 nautilus hrtimer_start_range_ns (hrtimer_wakeup)
1, 487 gdm-binary hrtimer_start_range_ns (hrtimer_wakeup)
3, 1304 gnome-panel hrtimer_start_range_ns (hrtimer_wakeup)
4, 1347 udisks-daemon hrtimer_start_range_ns (hrtimer_wakeup)
9, 0 kworker/0:0 hrtimer_start (tick_sched_timer)
8, 1 swapper __enqueue_rt_entity (sched_rt_period_timer)
14, 0 swapper clocksource_watchdog (clocksource_watchdog)
2, 1227 gnome-session hrtimer_start_range_ns (hrtimer_wakeup)
4, 1013 hald-addon-stor hrtimer_start_range_ns (hrtimer_wakeup)
1, 951 hald hrtimer_start_range_ns (hrtimer_wakeup)
2, 0 kworker/0:0 cfq_completed_request (cfq_idle_slice_timer)
1, 628 Xorg queue_delayed_work (delayed_work_timer_fn)
1, 1309 compiz hrtimer_start_range_ns (hrtimer_wakeup)
29, 1117 sshd queue_delayed_work (delayed_work_timer_fn)
28, 1159 bash queue_delayed_work (delayed_work_timer_fn)
9, 0 swapper sk_reset_timer (tcp_delack_timer)
6, 502 avahi-daemon hrtimer_start_range_ns (hrtimer_wakeup)
1, 1303 gnome-power-man hrtimer_start_range_ns (hrtimer_wakeup)
1, 542 console-kit-dae hrtimer_start_range_ns (hrtimer_wakeup)
6D, 11 kworker/0:1 queue_delayed_work (delayed_work_timer_fn)
5D, 46 kworker/1:1 queue_delayed_work (delayed_work_timer_fn)
2, 7 watchdog/0 hrtimer_start (watchdog_timer_fn)
2, 12 watchdog/1 hrtimer_start (watchdog_timer_fn)
1, 919 rtkit-daemon hrtimer_start_range_ns (hrtimer_wakeup)
1, 1307 pulseaudio hrtimer_start_range_ns (hrtimer_wakeup)
1, 651 irqbalance hrtimer_start_range_ns (hrtimer_wakeup)
1, 0 kworker/0:0 dev_watchdog (dev_watchdog)
1, 914 pulseaudio hrtimer_start_range_ns (hrtimer_wakeup)
1, 1265 ssh-agent hrtimer_start_range_ns (hrtimer_wakeup)
493 total events, 61.880 events/sec
---------------------------------------------------------------------------------------
Then after patches applied:
---patched--
Timer Stats Version: v0.2
Sample period: 8.683 s
82, 0 kworker/0:0 tick_nohz_restart_sched_tick (tick_sched_timer)
130, 0 swapper tick_nohz_restart_sched_tick (tick_sched_timer)
5, 1172 udisks-daemon schedule_hrtimeout_range_clock (hrtimer_wakeup)
5, 1009 hald-addon-stor schedule_hrtimeout_range_clock (hrtimer_wakeup)
1, 959 hald schedule_hrtimeout_range_clock (hrtimer_wakeup)
3, 7 watchdog/0 watchdog (watchdog_timer_fn)
7, 3530 sshd sk_reset_timer (tcp_write_timer)
9, 1189 gvfs-afc-volume do_nanosleep (hrtimer_wakeup)
17, 1204 multiload-apple schedule_hrtimeout_range_clock (hrtimer_wakeup)
3, 12 watchdog/1 watchdog (watchdog_timer_fn)
4, 1 swapper __enqueue_rt_entity (sched_rt_period_timer)
14, 1244 gnome-terminal schedule_hrtimeout_range_clock (hrtimer_wakeup)
13, 629 Xorg queue_delayed_work_on (delayed_work_timer_fn)
16, 0 kworker/0:0 clocksource_watchdog (clocksource_watchdog)
2, 1368 update-notifier schedule_hrtimeout_range_clock (hrtimer_wakeup)
7D, 11 kworker/0:1 queue_delayed_work_on (delayed_work_timer_fn)
2, 1096 gnome-settings- schedule_hrtimeout_range_clock (hrtimer_wakeup)
8D, 46 kworker/1:1 queue_delayed_work_on (delayed_work_timer_fn)
2, 915 rtkit-daemon schedule_hrtimeout_range_clock (hrtimer_wakeup)
1, 654 irqbalance do_nanosleep (hrtimer_wakeup)
1, 1073 ssh-agent schedule_hrtimeout_range_clock (hrtimer_wakeup)
2, 1140 compiz schedule_hrtimeout_range_clock (hrtimer_wakeup)
2, 1136 gnome-panel schedule_hrtimeout_range_clock (hrtimer_wakeup)
4, 11 kworker/0:1 schedule_timeout_uninterruptible (process_timeout)
25, 3530 sshd queue_delayed_work_on (delayed_work_timer_fn)
24, 3573 bash queue_delayed_work_on (delayed_work_timer_fn)
7, 0 swapper sk_reset_timer (tcp_delack_timer)
1, 1080 gconfd-2 schedule_hrtimeout_range_clock (hrtimer_wakeup)
1, 1133 gnome-power-man schedule_hrtimeout_range_clock (hrtimer_wakeup)
1, 533 console-kit-dae schedule_hrtimeout_range_clock (hrtimer_wakeup)
399 total events, 45.951 events/sec
---------------------------------------------------------------------------------------
You can notice tick_nohz_restart_sched_tick(), watchdog(), do_nanosleep()
is showed as the start_site.
And there is also something reflected in timer_list(AKA, more readable
start_site).
Any comments is appreciated.
Thanks,
Yong
Yong Zhang (6):
timer: expliciyly call timer_stats_timer_set_start_info() in add_timer()
workqueue: remove redundant timer_stats_timer_set_start_info()
timer: move timer_stats_timer_*_start_info() to kernel/timer.c
hrtimer: change call site of timer_stats_hrtimer_clear_start_info()
hrtimer: record more real start_site for timer_stats
timer_stats: make output more readable
include/linux/timer.h | 23 -----------------------
kernel/hrtimer.c | 10 ++++++----
kernel/time/timer_stats.c | 4 ++--
kernel/timer.c | 21 +++++++++++++++++++--
kernel/workqueue.c | 2 --
5 files changed, 27 insertions(+), 33 deletions(-)
From: Yong Zhang <[email protected]>
Now /proc/timer_stats will show add_timer() as the start_site:
23, 0 swapper add_timer (peer_check_expire)
23, 0 swapper add_timer (addrconf_verify)
This is because add_timer() -> mod_timer() -> __mod_timer(),
but mod_timer() is not inlined. So add_timer() is showed.
Call timer_stats_timer_set_start_info() explicitly to record the
real start_site.
After this change, timer_stats_timer_set_start_info() in workqueue.c
can be removed.
Signed-off-by: Yong Zhang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/timer.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/kernel/timer.c b/kernel/timer.c
index 97bf05b..2a443cc 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -864,6 +864,7 @@ EXPORT_SYMBOL(mod_timer_pinned);
void add_timer(struct timer_list *timer)
{
BUG_ON(timer_pending(timer));
+ timer_stats_timer_set_start_info(timer);
mod_timer(timer, timer->expires);
}
EXPORT_SYMBOL(add_timer);
--
1.7.0.4
From: Yong Zhang <[email protected]>
No need to call timer_stats_timer_set_start_info() explicitly,
add_timer_on() and add_timer() have done the right job.
After this change, we can move timer_stats_timer_*_start_info()
from include/linux/timer.h to kernel/timer.c. Outsider should not
see those functions.
Signed-off-by: Yong Zhang <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
---
kernel/workqueue.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8bd600c..e87ffd7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1094,8 +1094,6 @@ int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
BUG_ON(timer_pending(timer));
BUG_ON(!list_empty(&work->entry));
- timer_stats_timer_set_start_info(&dwork->timer);
-
/*
* This stores cwq for the moment, for the timer_fn.
* Note that the work's gcwq is preserved to allow
--
1.7.0.4
From: Yong Zhang <[email protected]>
timer_stats_timer_*_start_info() should not be seen outside,
move it to where it should be.
Signed-off-by: Yong Zhang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
include/linux/timer.h | 23 -----------------------
kernel/timer.c | 20 ++++++++++++++++++--
2 files changed, 18 insertions(+), 25 deletions(-)
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 38cf093..4fb4cd7 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -217,33 +217,10 @@ extern void init_timer_stats(void);
extern void timer_stats_update_stats(void *timer, pid_t pid, void *startf,
void *timerf, char *comm,
unsigned int timer_flag);
-
-extern void __timer_stats_timer_set_start_info(struct timer_list *timer,
- void *addr);
-
-static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
-{
- if (likely(!timer_stats_active))
- return;
- __timer_stats_timer_set_start_info(timer, __builtin_return_address(0));
-}
-
-static inline void timer_stats_timer_clear_start_info(struct timer_list *timer)
-{
- timer->start_site = NULL;
-}
#else
static inline void init_timer_stats(void)
{
}
-
-static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
-{
-}
-
-static inline void timer_stats_timer_clear_start_info(struct timer_list *timer)
-{
-}
#endif
extern void add_timer(struct timer_list *timer);
diff --git a/kernel/timer.c b/kernel/timer.c
index 2a443cc..6dd3a9e 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -395,16 +395,24 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
}
#ifdef CONFIG_TIMER_STATS
-void __timer_stats_timer_set_start_info(struct timer_list *timer, void *addr)
+static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
{
+ if (likely(!timer_stats_active))
+ return;
+
if (timer->start_site)
return;
- timer->start_site = addr;
+ timer->start_site = __builtin_return_address(0);
memcpy(timer->start_comm, current->comm, TASK_COMM_LEN);
timer->start_pid = current->pid;
}
+static inline void timer_stats_timer_clear_start_info(struct timer_list *timer)
+{
+ timer->start_site = NULL;
+}
+
static void timer_stats_account_timer(struct timer_list *timer)
{
unsigned int flag = 0;
@@ -419,6 +427,14 @@ static void timer_stats_account_timer(struct timer_list *timer)
}
#else
+static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
+{
+}
+
+static inline void timer_stats_timer_clear_start_info(struct timer_list *timer)
+{
+}
+
static void timer_stats_account_timer(struct timer_list *timer) {}
#endif
--
1.7.0.4
From: Yong Zhang <[email protected]>
Put timer_stats_hrtimer_clear_start_info() to the real cancel
place.
Because remove_hrtimer() is used for internal, and it could be called
as below:
int __hrtimer_start_range_ns(ARG) {
...
remove_hrtimer();
...
timer_stats_hrtimer_set_start_info();
}
Then if remove_hrtimer() clear the start_site, the following
timer_stats_hrtimer_set_start_info() will record the caller of
__hrtimer_start_range_ns as start_site. But this is not what
we want.
Signed-off-by: Yong Zhang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/hrtimer.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index ce66917..2b79ab4 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -772,7 +772,7 @@ static inline void timer_stats_hrtimer_clear_start_info(struct hrtimer *timer)
static inline void timer_stats_account_hrtimer(struct hrtimer *timer)
{
#ifdef CONFIG_TIMER_STATS
- if (likely(!timer_stats_active))
+ if (!timer->start_site)
return;
timer_stats_update_stats(timer, timer->start_pid, timer->start_site,
timer->function, timer->start_comm, 0);
@@ -942,7 +942,6 @@ remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base)
* rare case and less expensive than a smp call.
*/
debug_deactivate(timer);
- timer_stats_hrtimer_clear_start_info(timer);
reprogram = base->cpu_base == &__get_cpu_var(hrtimer_bases);
__remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE,
reprogram);
@@ -1055,9 +1054,10 @@ int hrtimer_try_to_cancel(struct hrtimer *timer)
base = lock_hrtimer_base(timer, &flags);
- if (!hrtimer_callback_running(timer))
+ if (!hrtimer_callback_running(timer)){
+ timer_stats_hrtimer_clear_start_info(timer);
ret = remove_hrtimer(timer, base);
-
+ }
unlock_hrtimer_base(timer, &flags);
return ret;
--
1.7.0.4
From: Yong Zhang <[email protected]>
Most call sites are showed as hrtimer_start_range_ns(). This gives us
not enough info about the start_site. Then call
timer_stats_hrtimer_set_start_info() in hrtimer_start*() to
record the upper caller.
Signed-off-by: Yong Zhang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/hrtimer.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 2b79ab4..1820a83 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1014,6 +1014,7 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
int hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
unsigned long delta_ns, const enum hrtimer_mode mode)
{
+ timer_stats_hrtimer_set_start_info(timer);
return __hrtimer_start_range_ns(timer, tim, delta_ns, mode, 1);
}
EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
@@ -1031,6 +1032,7 @@ EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
int
hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
{
+ timer_stats_hrtimer_set_start_info(timer);
return __hrtimer_start_range_ns(timer, tim, 0, mode, 1);
}
EXPORT_SYMBOL_GPL(hrtimer_start);
--
1.7.0.4
From: Yong Zhang <[email protected]>
When running timer_stats for a little long time, the output
will be not aligned anymore because the volume of sample count
and pid will expand.
Signed-off-by: Yong Zhang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
kernel/time/timer_stats.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/time/timer_stats.c b/kernel/time/timer_stats.c
index 2f3b585..da85b3e 100644
--- a/kernel/time/timer_stats.c
+++ b/kernel/time/timer_stats.c
@@ -307,10 +307,10 @@ static int tstats_show(struct seq_file *m, void *v)
for (i = 0; i < nr_entries; i++) {
entry = entries + i;
if (entry->timer_flag & TIMER_STATS_FLAG_DEFERRABLE) {
- seq_printf(m, "%4luD, %5d %-16s ",
+ seq_printf(m, "%10luD, %10d %-16s ",
entry->count, entry->pid, entry->comm);
} else {
- seq_printf(m, " %4lu, %5d %-16s ",
+ seq_printf(m, " %10lu, %10d %-16s ",
entry->count, entry->pid, entry->comm);
}
--
1.7.0.4
On 8/29/2010 8:05 AM, Yong Zhang wrote:
> Hi,
>
> This patchset focus on two problem:
>
> 1)timer_stats_timer_*_start_info() should belong to timer core only.
> It should not be seen by outsider, such as workqueue. And should
> not resident in timer.h.
> Patch-0001~0003 first remove the only explicit out call, then move
> timer_stats_timer_*_start_info() from timer.h to timer.c.
>
> 2)call site recorded by timer_stats sometimes show the timer core functions
> instead of the real call site.
> Now /proc/timer_stats is like below on my machine:
> ---now---
>
one question; have you tested this with powertop? (powertop being one of
the prime consumers of this interface)
On Sun, Aug 29, 2010 at 11:44 PM, Arjan van de Ven
<[email protected]> wrote:
> On 8/29/2010 8:05 AM, Yong Zhang wrote:
>>
>> Hi,
>>
>> This patchset focus on two problem:
>>
>> 1)timer_stats_timer_*_start_info() should belong to timer core only.
>> It should not be seen by outsider, such as workqueue. And should
>> not resident in timer.h.
>> Patch-0001~0003 first remove the only explicit out call, then move
>> timer_stats_timer_*_start_info() from timer.h to timer.c.
>>
>> 2)call site recorded by timer_stats sometimes show the timer core
>> functions
>> instead of the real call site.
>> Now /proc/timer_stats is like below on my machine:
>> ---now---
>>
>
>
> one question; have you tested this with powertop? (powertop being one of the
> prime consumers of this interface)
>
Now, vanilla kernel with powtop is
Top causes for wakeups:
49.4% (250.3) kworker/0:0 : hrtimer_start_range_ns (tick_sched_timer)
49.4% (250.3) <kernel core> : hrtimer_start_range_ns (tick_sched_timer)
0.4% ( 2.0) <kernel core> : clocksource_watchdog (clocksource_watchdog)
0.4% ( 1.8) <interrupt> : eth1
0.2% ( 1.0) kworker/0:0 : add_timer (tg3_timer)
0.1% ( 0.3) <kernel core> : dev_watchdog (dev_watchdog)
0.0% ( 0.2) sendmail : hrtimer_start_range_ns (hrtimer_wakeup)
0.0% ( 0.2) init : hrtimer_start_range_ns (hrtimer_wakeup)
0.0% ( 0.1) <kernel core> : bdi_arm_supers_timer (sync_supers_timer_fn)
0.0% ( 0.1) flush-0:13 : bdi_writeback_thread (process_timeout)
0.0% ( 0.1) <kernel core> : __enqueue_rt_entity (sched_rt_period_timer)
And the patched kernel:
Top causes for wakeups:
34.9% ( 2.0) kworker/0:0 : clocksource_watchdog (clocksource_watchdog)
29.1% ( 1.7) <interrupt> : eth1
17.4% ( 1.0) kworker/0:0 : tg3_timer (tg3_timer)
3.5% ( 0.2) sendmail : schedule_hrtimeout_range_clock
(hrtimer_wakeup)
3.5% ( 0.2) <kernel core> : dev_watchdog (dev_watchdog)
3.5% ( 0.2) init : schedule_hrtimeout_range_clock
(hrtimer_wakeup)
2.3% ( 0.1) <kernel core> : bdi_arm_supers_timer (sync_supers_timer_fn)
2.3% ( 0.1) flush-0:13 : bdi_writeback_thread (process_timeout)
1.2% ( 0.1) <kernel core> : __enqueue_rt_entity (sched_rt_period_timer)
1.2% ( 0.1) syslogd : do_setitimer (it_real_fn)
The difference is tick_sched_timer() is gone in the patches version.
And after taking a look at powertop, I think you want to get rid of the tick
events, right? Or maybe I'm missing something?
Thanks,
Yong