2010-08-29 15:05:29

by Yong Zhang

[permalink] [raw]
Subject: [RFC PATCHSET 0/6] timer/hrtimer/timer_stats: Improvement on timer_stats

Hi,

This patchset focus on two problem:

1)timer_stats_timer_*_start_info() should belong to timer core only.
It should not be seen by outsider, such as workqueue. And should
not resident in timer.h.
Patch-0001~0003 first remove the only explicit out call, then move
timer_stats_timer_*_start_info() from timer.h to timer.c.

2)call site recorded by timer_stats sometimes show the timer core functions
instead of the real call site.
Now /proc/timer_stats is like below on my machine:
---now---
Timer Stats Version: v0.2
Sample period: 7.967 s
10075, 0 kworker/0:0 hrtimer_start_range_ns (tick_sched_timer)
642, 0 swapper hrtimer_start (tick_sched_timer)
16, 1312 pulseaudio hrtimer_start_range_ns (hrtimer_wakeup)
153, 0 swapper hrtimer_start_range_ns (tick_sched_timer)
8, 1117 sshd sk_reset_timer (tcp_write_timer)
15, 1373 multiload-apple hrtimer_start_range_ns (hrtimer_wakeup)
8, 1361 gvfs-afc-volume hrtimer_start_range_ns (hrtimer_wakeup)
4, 1288 gnome-settings- hrtimer_start_range_ns (hrtimer_wakeup)
1, 1403 indicator-me-se hrtimer_start_range_ns (hrtimer_wakeup)
1, 1306 nautilus hrtimer_start_range_ns (hrtimer_wakeup)
1, 487 gdm-binary hrtimer_start_range_ns (hrtimer_wakeup)
3, 1304 gnome-panel hrtimer_start_range_ns (hrtimer_wakeup)
4, 1347 udisks-daemon hrtimer_start_range_ns (hrtimer_wakeup)
9, 0 kworker/0:0 hrtimer_start (tick_sched_timer)
8, 1 swapper __enqueue_rt_entity (sched_rt_period_timer)
14, 0 swapper clocksource_watchdog (clocksource_watchdog)
2, 1227 gnome-session hrtimer_start_range_ns (hrtimer_wakeup)
4, 1013 hald-addon-stor hrtimer_start_range_ns (hrtimer_wakeup)
1, 951 hald hrtimer_start_range_ns (hrtimer_wakeup)
2, 0 kworker/0:0 cfq_completed_request (cfq_idle_slice_timer)
1, 628 Xorg queue_delayed_work (delayed_work_timer_fn)
1, 1309 compiz hrtimer_start_range_ns (hrtimer_wakeup)
29, 1117 sshd queue_delayed_work (delayed_work_timer_fn)
28, 1159 bash queue_delayed_work (delayed_work_timer_fn)
9, 0 swapper sk_reset_timer (tcp_delack_timer)
6, 502 avahi-daemon hrtimer_start_range_ns (hrtimer_wakeup)
1, 1303 gnome-power-man hrtimer_start_range_ns (hrtimer_wakeup)
1, 542 console-kit-dae hrtimer_start_range_ns (hrtimer_wakeup)
6D, 11 kworker/0:1 queue_delayed_work (delayed_work_timer_fn)
5D, 46 kworker/1:1 queue_delayed_work (delayed_work_timer_fn)
2, 7 watchdog/0 hrtimer_start (watchdog_timer_fn)
2, 12 watchdog/1 hrtimer_start (watchdog_timer_fn)
1, 919 rtkit-daemon hrtimer_start_range_ns (hrtimer_wakeup)
1, 1307 pulseaudio hrtimer_start_range_ns (hrtimer_wakeup)
1, 651 irqbalance hrtimer_start_range_ns (hrtimer_wakeup)
1, 0 kworker/0:0 dev_watchdog (dev_watchdog)
1, 914 pulseaudio hrtimer_start_range_ns (hrtimer_wakeup)
1, 1265 ssh-agent hrtimer_start_range_ns (hrtimer_wakeup)
493 total events, 61.880 events/sec
---------------------------------------------------------------------------------------
Then after patches applied:
---patched--
Timer Stats Version: v0.2
Sample period: 8.683 s
82, 0 kworker/0:0 tick_nohz_restart_sched_tick (tick_sched_timer)
130, 0 swapper tick_nohz_restart_sched_tick (tick_sched_timer)
5, 1172 udisks-daemon schedule_hrtimeout_range_clock (hrtimer_wakeup)
5, 1009 hald-addon-stor schedule_hrtimeout_range_clock (hrtimer_wakeup)
1, 959 hald schedule_hrtimeout_range_clock (hrtimer_wakeup)
3, 7 watchdog/0 watchdog (watchdog_timer_fn)
7, 3530 sshd sk_reset_timer (tcp_write_timer)
9, 1189 gvfs-afc-volume do_nanosleep (hrtimer_wakeup)
17, 1204 multiload-apple schedule_hrtimeout_range_clock (hrtimer_wakeup)
3, 12 watchdog/1 watchdog (watchdog_timer_fn)
4, 1 swapper __enqueue_rt_entity (sched_rt_period_timer)
14, 1244 gnome-terminal schedule_hrtimeout_range_clock (hrtimer_wakeup)
13, 629 Xorg queue_delayed_work_on (delayed_work_timer_fn)
16, 0 kworker/0:0 clocksource_watchdog (clocksource_watchdog)
2, 1368 update-notifier schedule_hrtimeout_range_clock (hrtimer_wakeup)
7D, 11 kworker/0:1 queue_delayed_work_on (delayed_work_timer_fn)
2, 1096 gnome-settings- schedule_hrtimeout_range_clock (hrtimer_wakeup)
8D, 46 kworker/1:1 queue_delayed_work_on (delayed_work_timer_fn)
2, 915 rtkit-daemon schedule_hrtimeout_range_clock (hrtimer_wakeup)
1, 654 irqbalance do_nanosleep (hrtimer_wakeup)
1, 1073 ssh-agent schedule_hrtimeout_range_clock (hrtimer_wakeup)
2, 1140 compiz schedule_hrtimeout_range_clock (hrtimer_wakeup)
2, 1136 gnome-panel schedule_hrtimeout_range_clock (hrtimer_wakeup)
4, 11 kworker/0:1 schedule_timeout_uninterruptible (process_timeout)
25, 3530 sshd queue_delayed_work_on (delayed_work_timer_fn)
24, 3573 bash queue_delayed_work_on (delayed_work_timer_fn)
7, 0 swapper sk_reset_timer (tcp_delack_timer)
1, 1080 gconfd-2 schedule_hrtimeout_range_clock (hrtimer_wakeup)
1, 1133 gnome-power-man schedule_hrtimeout_range_clock (hrtimer_wakeup)
1, 533 console-kit-dae schedule_hrtimeout_range_clock (hrtimer_wakeup)
399 total events, 45.951 events/sec
---------------------------------------------------------------------------------------
You can notice tick_nohz_restart_sched_tick(), watchdog(), do_nanosleep()
is showed as the start_site.

And there is also something reflected in timer_list(AKA, more readable
start_site).

Any comments is appreciated.

Thanks,
Yong

Yong Zhang (6):
timer: expliciyly call timer_stats_timer_set_start_info() in add_timer()
workqueue: remove redundant timer_stats_timer_set_start_info()
timer: move timer_stats_timer_*_start_info() to kernel/timer.c
hrtimer: change call site of timer_stats_hrtimer_clear_start_info()
hrtimer: record more real start_site for timer_stats
timer_stats: make output more readable

include/linux/timer.h | 23 -----------------------
kernel/hrtimer.c | 10 ++++++----
kernel/time/timer_stats.c | 4 ++--
kernel/timer.c | 21 +++++++++++++++++++--
kernel/workqueue.c | 2 --
5 files changed, 27 insertions(+), 33 deletions(-)


2010-08-29 15:05:49

by Yong Zhang

[permalink] [raw]
Subject: [RFC PATCH 1/6] timer: expliciyly call timer_stats_timer_set_start_info() in add_timer()

From: Yong Zhang <[email protected]>

Now /proc/timer_stats will show add_timer() as the start_site:
23, 0 swapper add_timer (peer_check_expire)
23, 0 swapper add_timer (addrconf_verify)

This is because add_timer() -> mod_timer() -> __mod_timer(),
but mod_timer() is not inlined. So add_timer() is showed.
Call timer_stats_timer_set_start_info() explicitly to record the
real start_site.

After this change, timer_stats_timer_set_start_info() in workqueue.c
can be removed.

Signed-off-by: Yong Zhang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/timer.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 97bf05b..2a443cc 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -864,6 +864,7 @@ EXPORT_SYMBOL(mod_timer_pinned);
void add_timer(struct timer_list *timer)
{
BUG_ON(timer_pending(timer));
+ timer_stats_timer_set_start_info(timer);
mod_timer(timer, timer->expires);
}
EXPORT_SYMBOL(add_timer);
--
1.7.0.4

2010-08-29 15:05:57

by Yong Zhang

[permalink] [raw]
Subject: [RFC PATCH 2/6] workqueue: remove redundant timer_stats_timer_set_start_info()

From: Yong Zhang <[email protected]>

No need to call timer_stats_timer_set_start_info() explicitly,
add_timer_on() and add_timer() have done the right job.

After this change, we can move timer_stats_timer_*_start_info()
from include/linux/timer.h to kernel/timer.c. Outsider should not
see those functions.

Signed-off-by: Yong Zhang <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
---
kernel/workqueue.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8bd600c..e87ffd7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1094,8 +1094,6 @@ int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
BUG_ON(timer_pending(timer));
BUG_ON(!list_empty(&work->entry));

- timer_stats_timer_set_start_info(&dwork->timer);
-
/*
* This stores cwq for the moment, for the timer_fn.
* Note that the work's gcwq is preserved to allow
--
1.7.0.4

2010-08-29 15:06:12

by Yong Zhang

[permalink] [raw]
Subject: [RFC PATCH 3/6] timer: move timer_stats_timer_*_start_info() to kernel/timer.c

From: Yong Zhang <[email protected]>

timer_stats_timer_*_start_info() should not be seen outside,
move it to where it should be.

Signed-off-by: Yong Zhang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
include/linux/timer.h | 23 -----------------------
kernel/timer.c | 20 ++++++++++++++++++--
2 files changed, 18 insertions(+), 25 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 38cf093..4fb4cd7 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -217,33 +217,10 @@ extern void init_timer_stats(void);
extern void timer_stats_update_stats(void *timer, pid_t pid, void *startf,
void *timerf, char *comm,
unsigned int timer_flag);
-
-extern void __timer_stats_timer_set_start_info(struct timer_list *timer,
- void *addr);
-
-static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
-{
- if (likely(!timer_stats_active))
- return;
- __timer_stats_timer_set_start_info(timer, __builtin_return_address(0));
-}
-
-static inline void timer_stats_timer_clear_start_info(struct timer_list *timer)
-{
- timer->start_site = NULL;
-}
#else
static inline void init_timer_stats(void)
{
}
-
-static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
-{
-}
-
-static inline void timer_stats_timer_clear_start_info(struct timer_list *timer)
-{
-}
#endif

extern void add_timer(struct timer_list *timer);
diff --git a/kernel/timer.c b/kernel/timer.c
index 2a443cc..6dd3a9e 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -395,16 +395,24 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
}

#ifdef CONFIG_TIMER_STATS
-void __timer_stats_timer_set_start_info(struct timer_list *timer, void *addr)
+static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
{
+ if (likely(!timer_stats_active))
+ return;
+
if (timer->start_site)
return;

- timer->start_site = addr;
+ timer->start_site = __builtin_return_address(0);
memcpy(timer->start_comm, current->comm, TASK_COMM_LEN);
timer->start_pid = current->pid;
}

+static inline void timer_stats_timer_clear_start_info(struct timer_list *timer)
+{
+ timer->start_site = NULL;
+}
+
static void timer_stats_account_timer(struct timer_list *timer)
{
unsigned int flag = 0;
@@ -419,6 +427,14 @@ static void timer_stats_account_timer(struct timer_list *timer)
}

#else
+static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
+{
+}
+
+static inline void timer_stats_timer_clear_start_info(struct timer_list *timer)
+{
+}
+
static void timer_stats_account_timer(struct timer_list *timer) {}
#endif

--
1.7.0.4

2010-08-29 15:06:33

by Yong Zhang

[permalink] [raw]
Subject: [RFC PATCH 4/6] hrtimer: change call site of timer_stats_hrtimer_clear_start_info()

From: Yong Zhang <[email protected]>

Put timer_stats_hrtimer_clear_start_info() to the real cancel
place.

Because remove_hrtimer() is used for internal, and it could be called
as below:
int __hrtimer_start_range_ns(ARG) {
...
remove_hrtimer();
...
timer_stats_hrtimer_set_start_info();
}

Then if remove_hrtimer() clear the start_site, the following
timer_stats_hrtimer_set_start_info() will record the caller of
__hrtimer_start_range_ns as start_site. But this is not what
we want.

Signed-off-by: Yong Zhang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/hrtimer.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index ce66917..2b79ab4 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -772,7 +772,7 @@ static inline void timer_stats_hrtimer_clear_start_info(struct hrtimer *timer)
static inline void timer_stats_account_hrtimer(struct hrtimer *timer)
{
#ifdef CONFIG_TIMER_STATS
- if (likely(!timer_stats_active))
+ if (!timer->start_site)
return;
timer_stats_update_stats(timer, timer->start_pid, timer->start_site,
timer->function, timer->start_comm, 0);
@@ -942,7 +942,6 @@ remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base)
* rare case and less expensive than a smp call.
*/
debug_deactivate(timer);
- timer_stats_hrtimer_clear_start_info(timer);
reprogram = base->cpu_base == &__get_cpu_var(hrtimer_bases);
__remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE,
reprogram);
@@ -1055,9 +1054,10 @@ int hrtimer_try_to_cancel(struct hrtimer *timer)

base = lock_hrtimer_base(timer, &flags);

- if (!hrtimer_callback_running(timer))
+ if (!hrtimer_callback_running(timer)){
+ timer_stats_hrtimer_clear_start_info(timer);
ret = remove_hrtimer(timer, base);
-
+ }
unlock_hrtimer_base(timer, &flags);

return ret;
--
1.7.0.4

2010-08-29 15:06:48

by Yong Zhang

[permalink] [raw]
Subject: [RFC PATCH 5/6] hrtimer: record more real start_site for timer_stats

From: Yong Zhang <[email protected]>

Most call sites are showed as hrtimer_start_range_ns(). This gives us
not enough info about the start_site. Then call
timer_stats_hrtimer_set_start_info() in hrtimer_start*() to
record the upper caller.

Signed-off-by: Yong Zhang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/hrtimer.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 2b79ab4..1820a83 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1014,6 +1014,7 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
int hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
unsigned long delta_ns, const enum hrtimer_mode mode)
{
+ timer_stats_hrtimer_set_start_info(timer);
return __hrtimer_start_range_ns(timer, tim, delta_ns, mode, 1);
}
EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
@@ -1031,6 +1032,7 @@ EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
int
hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
{
+ timer_stats_hrtimer_set_start_info(timer);
return __hrtimer_start_range_ns(timer, tim, 0, mode, 1);
}
EXPORT_SYMBOL_GPL(hrtimer_start);
--
1.7.0.4

2010-08-29 15:07:03

by Yong Zhang

[permalink] [raw]
Subject: [RFC PATCH 6/6] timer_stats: make output more readable

From: Yong Zhang <[email protected]>

When running timer_stats for a little long time, the output
will be not aligned anymore because the volume of sample count
and pid will expand.

Signed-off-by: Yong Zhang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
kernel/time/timer_stats.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer_stats.c b/kernel/time/timer_stats.c
index 2f3b585..da85b3e 100644
--- a/kernel/time/timer_stats.c
+++ b/kernel/time/timer_stats.c
@@ -307,10 +307,10 @@ static int tstats_show(struct seq_file *m, void *v)
for (i = 0; i < nr_entries; i++) {
entry = entries + i;
if (entry->timer_flag & TIMER_STATS_FLAG_DEFERRABLE) {
- seq_printf(m, "%4luD, %5d %-16s ",
+ seq_printf(m, "%10luD, %10d %-16s ",
entry->count, entry->pid, entry->comm);
} else {
- seq_printf(m, " %4lu, %5d %-16s ",
+ seq_printf(m, " %10lu, %10d %-16s ",
entry->count, entry->pid, entry->comm);
}

--
1.7.0.4

2010-08-29 15:44:48

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC PATCHSET 0/6] timer/hrtimer/timer_stats: Improvement on timer_stats

On 8/29/2010 8:05 AM, Yong Zhang wrote:
> Hi,
>
> This patchset focus on two problem:
>
> 1)timer_stats_timer_*_start_info() should belong to timer core only.
> It should not be seen by outsider, such as workqueue. And should
> not resident in timer.h.
> Patch-0001~0003 first remove the only explicit out call, then move
> timer_stats_timer_*_start_info() from timer.h to timer.c.
>
> 2)call site recorded by timer_stats sometimes show the timer core functions
> instead of the real call site.
> Now /proc/timer_stats is like below on my machine:
> ---now---
>


one question; have you tested this with powertop? (powertop being one of
the prime consumers of this interface)

2010-08-30 03:27:15

by Yong Zhang

[permalink] [raw]
Subject: Re: [RFC PATCHSET 0/6] timer/hrtimer/timer_stats: Improvement on timer_stats

On Sun, Aug 29, 2010 at 11:44 PM, Arjan van de Ven
<[email protected]> wrote:
> On 8/29/2010 8:05 AM, Yong Zhang wrote:
>>
>> Hi,
>>
>> This patchset focus on two problem:
>>
>> 1)timer_stats_timer_*_start_info() should belong to timer core only.
>>   It should not be seen by outsider, such as workqueue. And should
>>   not resident in timer.h.
>>   Patch-0001~0003 first remove the only explicit out call, then move
>>   timer_stats_timer_*_start_info() from timer.h to timer.c.
>>
>> 2)call site recorded by timer_stats sometimes show the timer core
>> functions
>>   instead of the real call site.
>>   Now /proc/timer_stats is like below on my machine:
>> ---now---
>>
>
>
> one question; have you tested this with powertop? (powertop being one of the
> prime consumers of this interface)
>

Now, vanilla kernel with powtop is
Top causes for wakeups:
49.4% (250.3) kworker/0:0 : hrtimer_start_range_ns (tick_sched_timer)
49.4% (250.3) <kernel core> : hrtimer_start_range_ns (tick_sched_timer)
0.4% ( 2.0) <kernel core> : clocksource_watchdog (clocksource_watchdog)
0.4% ( 1.8) <interrupt> : eth1
0.2% ( 1.0) kworker/0:0 : add_timer (tg3_timer)
0.1% ( 0.3) <kernel core> : dev_watchdog (dev_watchdog)
0.0% ( 0.2) sendmail : hrtimer_start_range_ns (hrtimer_wakeup)
0.0% ( 0.2) init : hrtimer_start_range_ns (hrtimer_wakeup)
0.0% ( 0.1) <kernel core> : bdi_arm_supers_timer (sync_supers_timer_fn)
0.0% ( 0.1) flush-0:13 : bdi_writeback_thread (process_timeout)
0.0% ( 0.1) <kernel core> : __enqueue_rt_entity (sched_rt_period_timer)

And the patched kernel:
Top causes for wakeups:
34.9% ( 2.0) kworker/0:0 : clocksource_watchdog (clocksource_watchdog)
29.1% ( 1.7) <interrupt> : eth1
17.4% ( 1.0) kworker/0:0 : tg3_timer (tg3_timer)
3.5% ( 0.2) sendmail : schedule_hrtimeout_range_clock
(hrtimer_wakeup)
3.5% ( 0.2) <kernel core> : dev_watchdog (dev_watchdog)
3.5% ( 0.2) init : schedule_hrtimeout_range_clock
(hrtimer_wakeup)
2.3% ( 0.1) <kernel core> : bdi_arm_supers_timer (sync_supers_timer_fn)
2.3% ( 0.1) flush-0:13 : bdi_writeback_thread (process_timeout)
1.2% ( 0.1) <kernel core> : __enqueue_rt_entity (sched_rt_period_timer)
1.2% ( 0.1) syslogd : do_setitimer (it_real_fn)

The difference is tick_sched_timer() is gone in the patches version.

And after taking a look at powertop, I think you want to get rid of the tick
events, right? Or maybe I'm missing something?

Thanks,
Yong