2009-06-03 15:23:21

by Martin Schwidefsky

[permalink] [raw]
Subject: [patch 0/2] NOHZ vs. profile/oprofile v2

Greetings,
version 2 of the profile patches. The only change is the in_interrupt()
fix in tick_nohz_stop_idle(). I would like to know how to proceed with
the issue.
Andy, do you still prefer to handle the old style profiler analog to
the oprofile patch? If yes I would drop patch #1 and extend patch #2
with another tick_nohz_disable().

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.


2009-06-09 20:53:16

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2

On Wed, 3 Jun 2009, Martin Schwidefsky wrote:

> Greetings,
> version 2 of the profile patches. The only change is the in_interrupt()
> fix in tick_nohz_stop_idle(). I would like to know how to proceed with
> the issue.
> Andy, do you still prefer to handle the old style profiler analog to
> the oprofile patch? If yes I would drop patch #1 and extend patch #2
> with another tick_nohz_disable().

Any update on this one ?

Thanks,

tglx

2009-06-10 17:34:11

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2

On Tue, 9 Jun 2009 22:52:51 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> On Wed, 3 Jun 2009, Martin Schwidefsky wrote:
>
> > Greetings,
> > version 2 of the profile patches. The only change is the in_interrupt()
> > fix in tick_nohz_stop_idle(). I would like to know how to proceed with
> > the issue.
> > Andy, do you still prefer to handle the old style profiler analog to
> > the oprofile patch? If yes I would drop patch #1 and extend patch #2
> > with another tick_nohz_disable().
>
> Any update on this one ?

I haven't heard anything from any Andy in regard to patch #1. We could
do patch #2 now and decide later what we want to do with the old style
profiler.

blue skies,
Martin.

2009-06-22 14:27:13

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2

On Tue, 9 Jun 2009 22:52:51 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> > version 2 of the profile patches. The only change is the in_interrupt()
> > fix in tick_nohz_stop_idle(). I would like to know how to proceed with
> > the issue.
> > Andy, do you still prefer to handle the old style profiler analog to
> > the oprofile patch? If yes I would drop patch #1 and extend patch #2
> > with another tick_nohz_disable().
>
> Any update on this one ?

A solution to this problem should go upstream soon, no? How about this
patch, it uses the tick_nohz_disable/tick_nohz_enable mechanic for
oprofile and the old style kernel profiler. Good enough ?

---
Subject: [PATCH] keep on ticking if a profiler is active

From: Martin Schwidefsky <[email protected]>

On a NOHZ system with oprofile or the old style kernel profiler enabled
the timer tick should not be stopped when a cpu goes idle. Currently
a maximum of 1 tick is accounted if a cpu sleeps for a longer period of
time. This does bad things to the percentages in the profiler output.

Signed-off-by: Martin Schwidefsky <[email protected]>
---

drivers/oprofile/oprof.c | 3 +++
include/linux/tick.h | 4 ++++
kernel/profile.c | 4 ++++
kernel/time/tick-sched.c | 27 ++++++++++++++++++++++++++-
4 files changed, 37 insertions(+), 1 deletion(-)

diff -urpN linux-2.6/drivers/oprofile/oprof.c linux-2.6-patched/drivers/oprofile/oprof.c
--- linux-2.6/drivers/oprofile/oprof.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6-patched/drivers/oprofile/oprof.c 2009-06-22 11:26:50.000000000 +0200
@@ -12,6 +12,7 @@
#include <linux/init.h>
#include <linux/oprofile.h>
#include <linux/moduleparam.h>
+#include <linux/tick.h>
#include <asm/mutex.h>

#include "oprof.h"
@@ -103,6 +104,7 @@ int oprofile_start(void)
if (oprofile_started)
goto out;

+ tick_nohz_disable(1);
oprofile_reset_stats();

if ((err = oprofile_ops.start()))
@@ -123,6 +125,7 @@ void oprofile_stop(void)
goto out;
oprofile_ops.stop();
oprofile_started = 0;
+ tick_nohz_enable();
/* wake up the daemon to read what remains */
wake_up_buffer_waiter();
out:
diff -urpN linux-2.6/include/linux/tick.h linux-2.6-patched/include/linux/tick.h
--- linux-2.6/include/linux/tick.h 2009-06-22 11:26:26.000000000 +0200
+++ linux-2.6-patched/include/linux/tick.h 2009-06-22 11:26:50.000000000 +0200
@@ -119,6 +119,8 @@ extern void tick_nohz_stop_sched_tick(in
extern void tick_nohz_restart_sched_tick(void);
extern ktime_t tick_nohz_get_sleep_length(void);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
+extern void tick_nohz_enable(void);
+extern void tick_nohz_disable(int wakeup);
# else
static inline void tick_nohz_stop_sched_tick(int inidle) { }
static inline void tick_nohz_restart_sched_tick(void) { }
@@ -129,6 +131,8 @@ static inline ktime_t tick_nohz_get_slee
return len;
}
static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
+static inline void tick_nohz_enable(void) { }
+static inline void tick_nohz_disable(int wakeup) { }
# endif /* !NO_HZ */

#endif
diff -urpN linux-2.6/kernel/profile.c linux-2.6-patched/kernel/profile.c
--- linux-2.6/kernel/profile.c 2009-06-22 11:26:26.000000000 +0200
+++ linux-2.6-patched/kernel/profile.c 2009-06-22 11:26:50.000000000 +0200
@@ -24,6 +24,7 @@
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
+#include <linux/tick.h>
#include <asm/sections.h>
#include <asm/irq_regs.h>
#include <asm/ptrace.h>
@@ -97,6 +98,8 @@ int profile_setup(char *str)
printk(KERN_INFO "kernel profiling enabled (shift: %ld)\n",
prof_shift);
}
+ if (prof_on)
+ tick_nohz_disable(0);
return 1;
}
__setup("profile=", profile_setup);
@@ -582,6 +585,7 @@ static int create_hash_tables(void)
return 0;
out_cleanup:
prof_on = 0;
+ tick_nohz_enable();
smp_mb();
on_each_cpu(profile_nop, NULL, 1);
for_each_online_cpu(cpu) {
diff -urpN linux-2.6/kernel/time/tick-sched.c linux-2.6-patched/kernel/time/tick-sched.c
--- linux-2.6/kernel/time/tick-sched.c 2009-06-22 11:26:26.000000000 +0200
+++ linux-2.6-patched/kernel/time/tick-sched.c 2009-06-22 11:26:50.000000000 +0200
@@ -124,6 +124,30 @@ static int __init setup_tick_nohz(char *

__setup("nohz=", setup_tick_nohz);

+/*
+ * NO HZ currently disabled ?
+ */
+static atomic_t tick_nohz_disable_counter = ATOMIC_INIT(0);
+
+void tick_nohz_enable(void)
+{
+ atomic_dec(&tick_nohz_disable_counter);
+}
+EXPORT_SYMBOL_GPL(tick_nohz_enable);
+
+static void __tick_nohz_disable(void *dummy)
+{
+}
+
+void tick_nohz_disable(int wakeup)
+{
+ if (atomic_inc_return(&tick_nohz_disable_counter) == 1)
+ if (wakeup)
+ /* Wake up all cpus to make them start ticking. */
+ smp_call_function(__tick_nohz_disable, NULL, 0);
+}
+EXPORT_SYMBOL_GPL(tick_nohz_disable);
+
/**
* tick_nohz_update_jiffies - update jiffies when idle was interrupted
*
@@ -276,7 +300,8 @@ void tick_nohz_stop_sched_tick(int inidl
next_jiffies = get_next_timer_interrupt(last_jiffies);
delta_jiffies = next_jiffies - last_jiffies;

- if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu))
+ if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) ||
+ atomic_read(&tick_nohz_disable_counter) > 0)
delta_jiffies = 1;
/*
* Do not stop the tick, if we are only one off


--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-06-22 14:41:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2


* Martin Schwidefsky <[email protected]> wrote:

> On Tue, 9 Jun 2009 22:52:51 +0200 (CEST)
> Thomas Gleixner <[email protected]> wrote:
>
> > > version 2 of the profile patches. The only change is the in_interrupt()
> > > fix in tick_nohz_stop_idle(). I would like to know how to proceed with
> > > the issue.
> > > Andy, do you still prefer to handle the old style profiler analog to
> > > the oprofile patch? If yes I would drop patch #1 and extend patch #2
> > > with another tick_nohz_disable().
> >
> > Any update on this one ?
>
> A solution to this problem should go upstream soon, no? How about this
> patch, it uses the tick_nohz_disable/tick_nohz_enable mechanic for
> oprofile and the old style kernel profiler. Good enough ?
>
> ---
> Subject: [PATCH] keep on ticking if a profiler is active
>
> From: Martin Schwidefsky <[email protected]>
>
> On a NOHZ system with oprofile or the old style kernel profiler enabled
> the timer tick should not be stopped when a cpu goes idle. Currently
> a maximum of 1 tick is accounted if a cpu sleeps for a longer period of
> time. This does bad things to the percentages in the profiler output.
>
> Signed-off-by: Martin Schwidefsky <[email protected]>
> ---
>
> drivers/oprofile/oprof.c | 3 +++
> include/linux/tick.h | 4 ++++
> kernel/profile.c | 4 ++++
> kernel/time/tick-sched.c | 27 ++++++++++++++++++++++++++-
> 4 files changed, 37 insertions(+), 1 deletion(-)

Hm, this is rather ugly. Why not use hrtimers like 'perf' does when
it fallback-samples based on the timer tick?

That method has three advantages:

- no weird hookery needed
- resolution can go far beyond HZ
- it is evidently dynticks-safe

Ingo

2009-06-22 14:59:45

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2

On Mon, 22 Jun 2009 16:41:10 +0200
Ingo Molnar <[email protected]> wrote:

> Hm, this is rather ugly. Why not use hrtimers like 'perf' does when
> it fallback-samples based on the timer tick?
>
> That method has three advantages:
>
> - no weird hookery needed
> - resolution can go far beyond HZ
> - it is evidently dynticks-safe

Hmm, if we replace the HZ based oprofile tick with an hrtimer we should
add an interface to configure the sample interval as well, no? Otherwise
we just replace one timer event (HZ) with another (hrtimer).

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-06-22 15:06:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2


* Martin Schwidefsky <[email protected]> wrote:

> On Mon, 22 Jun 2009 16:41:10 +0200
> Ingo Molnar <[email protected]> wrote:
>
> > Hm, this is rather ugly. Why not use hrtimers like 'perf' does when
> > it fallback-samples based on the timer tick?
> >
> > That method has three advantages:
> >
> > - no weird hookery needed
> > - resolution can go far beyond HZ
> > - it is evidently dynticks-safe
>
> Hmm, if we replace the HZ based oprofile tick with an hrtimer we
> should add an interface to configure the sample interval as well,
> no? Otherwise we just replace one timer event (HZ) with another
> (hrtimer).

Even if the hrtimer is set with a 1/HZ period it's a better
solution, as it's dynticks safe without invasive changes.

Ingo

2009-06-22 15:20:30

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2

On Mon, 22 Jun 2009 17:05:53 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Martin Schwidefsky <[email protected]> wrote:
>
> > On Mon, 22 Jun 2009 16:41:10 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> > > Hm, this is rather ugly. Why not use hrtimers like 'perf' does when
> > > it fallback-samples based on the timer tick?
> > >
> > > That method has three advantages:
> > >
> > > - no weird hookery needed
> > > - resolution can go far beyond HZ
> > > - it is evidently dynticks-safe
> >
> > Hmm, if we replace the HZ based oprofile tick with an hrtimer we
> > should add an interface to configure the sample interval as well,
> > no? Otherwise we just replace one timer event (HZ) with another
> > (hrtimer).
>
> Even if the hrtimer is set with a 1/HZ period it's a better
> solution, as it's dynticks safe without invasive changes.

Ok, but the patch will be quite big. All the profile_tick() calls from
the architecture files will have to be removed. And if we really want
to keep things separate there will be two sets of per-cpu hrtimer,
one for the old style profiler and one for oprofile. Any preference
for the user space interface to set the sample rate? A sysctl?

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-06-22 15:29:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2


* Martin Schwidefsky <[email protected]> wrote:

> On Mon, 22 Jun 2009 17:05:53 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Martin Schwidefsky <[email protected]> wrote:
> >
> > > On Mon, 22 Jun 2009 16:41:10 +0200
> > > Ingo Molnar <[email protected]> wrote:
> > >
> > > > Hm, this is rather ugly. Why not use hrtimers like 'perf' does when
> > > > it fallback-samples based on the timer tick?
> > > >
> > > > That method has three advantages:
> > > >
> > > > - no weird hookery needed
> > > > - resolution can go far beyond HZ
> > > > - it is evidently dynticks-safe
> > >
> > > Hmm, if we replace the HZ based oprofile tick with an hrtimer we
> > > should add an interface to configure the sample interval as well,
> > > no? Otherwise we just replace one timer event (HZ) with another
> > > (hrtimer).
> >
> > Even if the hrtimer is set with a 1/HZ period it's a better
> > solution, as it's dynticks safe without invasive changes.
>
> Ok, but the patch will be quite big. All the profile_tick() calls
> from the architecture files will have to be removed. [...]

Hey, that's a bonus :)

> [...] And if we really want to keep things separate there will be
> two sets of per-cpu hrtimer, one for the old style profiler and
> one for oprofile. Any preference for the user space interface to
> set the sample rate? A sysctl?

I dont think we want to keep things separate. Regarding old-style
profiler, does anyone still use it? There's now a superior in-tree
replacement for it, so we could phase it out.

A sysctl sounds like the most obvious way to set the sample period -
and it can default to 1/Hz.

Ingo

2009-06-22 15:36:22

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2

On Mon, 22 Jun 2009 17:29:37 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Martin Schwidefsky <[email protected]> wrote:
>
> > On Mon, 22 Jun 2009 17:05:53 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Martin Schwidefsky <[email protected]> wrote:
> > >
> > > > On Mon, 22 Jun 2009 16:41:10 +0200
> > > > Ingo Molnar <[email protected]> wrote:
> > > >
> > > > > Hm, this is rather ugly. Why not use hrtimers like 'perf' does when
> > > > > it fallback-samples based on the timer tick?
> > > > >
> > > > > That method has three advantages:
> > > > >
> > > > > - no weird hookery needed
> > > > > - resolution can go far beyond HZ
> > > > > - it is evidently dynticks-safe
> > > >
> > > > Hmm, if we replace the HZ based oprofile tick with an hrtimer we
> > > > should add an interface to configure the sample interval as well,
> > > > no? Otherwise we just replace one timer event (HZ) with another
> > > > (hrtimer).
> > >
> > > Even if the hrtimer is set with a 1/HZ period it's a better
> > > solution, as it's dynticks safe without invasive changes.
> >
> > Ok, but the patch will be quite big. All the profile_tick() calls
> > from the architecture files will have to be removed. [...]
>
> Hey, that's a bonus :)

It would remove some oddball code :-)

> > [...] And if we really want to keep things separate there will be
> > two sets of per-cpu hrtimer, one for the old style profiler and
> > one for oprofile. Any preference for the user space interface to
> > set the sample rate? A sysctl?
>
> I dont think we want to keep things separate. Regarding old-style
> profiler, does anyone still use it? There's now a superior in-tree
> replacement for it, so we could phase it out.

Well, for my part I won't miss it. But to be able to remove the
profile_tick() calls from the architectures I either have to rip out
the old profiler now, or adapt it to use hrtimer as well.

> A sysctl sounds like the most obvious way to set the sample period -
> and it can default to 1/Hz.

Agreed.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-06-22 15:40:47

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2


* Martin Schwidefsky <[email protected]> wrote:

> On Mon, 22 Jun 2009 17:29:37 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Martin Schwidefsky <[email protected]> wrote:
> >
> > > On Mon, 22 Jun 2009 17:05:53 +0200
> > > Ingo Molnar <[email protected]> wrote:
> > >
> > > >
> > > > * Martin Schwidefsky <[email protected]> wrote:
> > > >
> > > > > On Mon, 22 Jun 2009 16:41:10 +0200
> > > > > Ingo Molnar <[email protected]> wrote:
> > > > >
> > > > > > Hm, this is rather ugly. Why not use hrtimers like 'perf' does when
> > > > > > it fallback-samples based on the timer tick?
> > > > > >
> > > > > > That method has three advantages:
> > > > > >
> > > > > > - no weird hookery needed
> > > > > > - resolution can go far beyond HZ
> > > > > > - it is evidently dynticks-safe
> > > > >
> > > > > Hmm, if we replace the HZ based oprofile tick with an hrtimer we
> > > > > should add an interface to configure the sample interval as well,
> > > > > no? Otherwise we just replace one timer event (HZ) with another
> > > > > (hrtimer).
> > > >
> > > > Even if the hrtimer is set with a 1/HZ period it's a better
> > > > solution, as it's dynticks safe without invasive changes.
> > >
> > > Ok, but the patch will be quite big. All the profile_tick() calls
> > > from the architecture files will have to be removed. [...]
> >
> > Hey, that's a bonus :)
>
> It would remove some oddball code :-)
>
> > > [...] And if we really want to keep things separate there will be
> > > two sets of per-cpu hrtimer, one for the old style profiler and
> > > one for oprofile. Any preference for the user space interface to
> > > set the sample rate? A sysctl?
> >
> > I dont think we want to keep things separate. Regarding old-style
> > profiler, does anyone still use it? There's now a superior in-tree
> > replacement for it, so we could phase it out.
>
> Well, for my part I won't miss it. But to be able to remove the
> profile_tick() calls from the architectures I either have to rip
> out the old profiler now, or adapt it to use hrtimer as well.

Do we _have to_ touch it so widely right now? We could start with a
deprecation warning in this cycle. Once it's deprecated we can
remove all those calls.

Ingo

2009-06-22 16:37:39

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2

On Mon, 22 Jun 2009 17:40:30 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Martin Schwidefsky <[email protected]> wrote:
>
> > On Mon, 22 Jun 2009 17:29:37 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Martin Schwidefsky <[email protected]> wrote:
> > >
> > > > On Mon, 22 Jun 2009 17:05:53 +0200
> > > > Ingo Molnar <[email protected]> wrote:
> > > >
> > > > >
> > > > > * Martin Schwidefsky <[email protected]> wrote:
> > > > >
> > > > > > On Mon, 22 Jun 2009 16:41:10 +0200
> > > > > > Ingo Molnar <[email protected]> wrote:
> > > > > >
> > > > > > > Hm, this is rather ugly. Why not use hrtimers like 'perf' does when
> > > > > > > it fallback-samples based on the timer tick?
> > > > > > >
> > > > > > > That method has three advantages:
> > > > > > >
> > > > > > > - no weird hookery needed
> > > > > > > - resolution can go far beyond HZ
> > > > > > > - it is evidently dynticks-safe
> > > > > >
> > > > > > Hmm, if we replace the HZ based oprofile tick with an hrtimer we
> > > > > > should add an interface to configure the sample interval as well,
> > > > > > no? Otherwise we just replace one timer event (HZ) with another
> > > > > > (hrtimer).
> > > > >
> > > > > Even if the hrtimer is set with a 1/HZ period it's a better
> > > > > solution, as it's dynticks safe without invasive changes.
> > > >
> > > > Ok, but the patch will be quite big. All the profile_tick() calls
> > > > from the architecture files will have to be removed. [...]
> > >
> > > Hey, that's a bonus :)
> >
> > It would remove some oddball code :-)
> >
> > > > [...] And if we really want to keep things separate there will be
> > > > two sets of per-cpu hrtimer, one for the old style profiler and
> > > > one for oprofile. Any preference for the user space interface to
> > > > set the sample rate? A sysctl?
> > >
> > > I dont think we want to keep things separate. Regarding old-style
> > > profiler, does anyone still use it? There's now a superior in-tree
> > > replacement for it, so we could phase it out.
> >
> > Well, for my part I won't miss it. But to be able to remove the
> > profile_tick() calls from the architectures I either have to rip
> > out the old profiler now, or adapt it to use hrtimer as well.
>
> Do we _have to_ touch it so widely right now? We could start with a
> deprecation warning in this cycle. Once it's deprecated we can
> remove all those calls.

Hmm, I like that. Fix oprofile with hrtimer and leave profile_tick()
as it is for now.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-06-24 16:51:24

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2

On Mon, 22 Jun 2009 17:40:30 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Martin Schwidefsky <[email protected]> wrote:
>
> > On Mon, 22 Jun 2009 17:29:37 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Martin Schwidefsky <[email protected]> wrote:
> > >
> > > > [...] And if we really want to keep things separate there will be
> > > > two sets of per-cpu hrtimer, one for the old style profiler and
> > > > one for oprofile. Any preference for the user space interface to
> > > > set the sample rate? A sysctl?
> > >
> > > I dont think we want to keep things separate. Regarding old-style
> > > profiler, does anyone still use it? There's now a superior in-tree
> > > replacement for it, so we could phase it out.
> >
> > Well, for my part I won't miss it. But to be able to remove the
> > profile_tick() calls from the architectures I either have to rip
> > out the old profiler now, or adapt it to use hrtimer as well.
>
> Do we _have to_ touch it so widely right now? We could start with a
> deprecation warning in this cycle. Once it's deprecated we can
> remove all those calls.

First version of the hrtimer patch for oprofile. I did not add the
sysctl yet, if the sysctl is added in oprofile_timer_init it would not
be available if some better profiling source is available. If it is
added unconditionally it would only have an effect if the timer
fallback is used. Both cases are not exactly nice for a user space
interface.

---
Subject: [PATCH] convert oprofile from timer_hook to hrtimer

From: Martin Schwidefsky <[email protected]>

Oprofile is currently broken on systems running with NOHZ enabled.
A maximum of 1 tick is accounted via the timer_hook if a cpu sleeps
for a longer period of time. This does bad things to the percentages
in the profiler output. To solve this problem convert oprofile to
use a restarting hrtimer instead of the timer_hook.

Signed-off-by: Martin Schwidefsky <[email protected]>
---

drivers/oprofile/oprof.c | 12 ++++--
drivers/oprofile/oprof.h | 3 +
drivers/oprofile/timer_int.c | 77 +++++++++++++++++++++++++++++++++++++------
3 files changed, 78 insertions(+), 14 deletions(-)

diff -urpN linux-2.6/drivers/oprofile/oprof.c linux-2.6-patched/drivers/oprofile/oprof.c
--- linux-2.6/drivers/oprofile/oprof.c 2009-06-24 17:21:12.000000000 +0200
+++ linux-2.6-patched/drivers/oprofile/oprof.c 2009-06-24 17:18:28.000000000 +0200
@@ -184,22 +184,26 @@ static int __init oprofile_init(void)
int err;

err = oprofile_arch_init(&oprofile_ops);
-
if (err < 0 || timer) {
printk(KERN_INFO "oprofile: using timer interrupt.\n");
- oprofile_timer_init(&oprofile_ops);
+ err = oprofile_timer_init(&oprofile_ops);
+ if (err)
+ goto out_arch;
}
-
err = oprofilefs_register();
if (err)
- oprofile_arch_exit();
+ goto out_arch;
+ return 0;

+out_arch:
+ oprofile_arch_exit();
return err;
}


static void __exit oprofile_exit(void)
{
+ oprofile_timer_exit();
oprofilefs_unregister();
oprofile_arch_exit();
}
diff -urpN linux-2.6/drivers/oprofile/oprof.h linux-2.6-patched/drivers/oprofile/oprof.h
--- linux-2.6/drivers/oprofile/oprof.h 2009-06-24 17:21:12.000000000 +0200
+++ linux-2.6-patched/drivers/oprofile/oprof.h 2009-06-24 17:19:11.000000000 +0200
@@ -32,7 +32,8 @@ struct super_block;
struct dentry;

void oprofile_create_files(struct super_block *sb, struct dentry *root);
-void oprofile_timer_init(struct oprofile_operations *ops);
+int oprofile_timer_init(struct oprofile_operations *ops);
+void oprofile_timer_exit(void);

int oprofile_set_backtrace(unsigned long depth);

diff -urpN linux-2.6/drivers/oprofile/timer_int.c linux-2.6-patched/drivers/oprofile/timer_int.c
--- linux-2.6/drivers/oprofile/timer_int.c 2009-06-24 17:21:12.000000000 +0200
+++ linux-2.6-patched/drivers/oprofile/timer_int.c 2009-06-24 17:18:44.000000000 +0200
@@ -13,34 +13,93 @@
#include <linux/oprofile.h>
#include <linux/profile.h>
#include <linux/init.h>
+#include <linux/cpu.h>
+#include <linux/hrtimer.h>
+#include <asm/irq_regs.h>
#include <asm/ptrace.h>

#include "oprof.h"

-static int timer_notify(struct pt_regs *regs)
+static DEFINE_PER_CPU(struct hrtimer, oprofile_hrtimer);
+
+static enum hrtimer_restart oprofile_hrtimer_notify(struct hrtimer *hrtimer)
+{
+ oprofile_add_sample(get_irq_regs(), 0);
+ hrtimer_forward_now(hrtimer, ns_to_ktime(TICK_NSEC));
+ return HRTIMER_RESTART;
+}
+
+static void __oprofile_hrtimer_start(void *unused)
+{
+ struct hrtimer *hrtimer = &__get_cpu_var(oprofile_hrtimer);
+
+ hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ hrtimer->function = oprofile_hrtimer_notify;
+
+ hrtimer_start(hrtimer, ns_to_ktime(TICK_NSEC),
+ HRTIMER_MODE_REL_PINNED);
+}
+
+static int oprofile_hrtimer_start(void)
{
- oprofile_add_sample(regs, 0);
+ on_each_cpu(__oprofile_hrtimer_start, NULL, 1);
return 0;
}

-static int timer_start(void)
+static void __oprofile_hrtimer_stop(int cpu)
{
- return register_timer_hook(timer_notify);
+ struct hrtimer *hrtimer = &per_cpu(oprofile_hrtimer, cpu);
+
+ hrtimer_cancel(hrtimer);
}

+static void oprofile_hrtimer_stop(void)
+{
+ int cpu;
+ for_each_online_cpu(cpu)
+ __oprofile_hrtimer_stop(cpu);
+}

-static void timer_stop(void)
+static int __cpuinit oprofile_cpu_notify(struct notifier_block *self,
+ unsigned long action, void *hcpu)
{
- unregister_timer_hook(timer_notify);
+ long cpu = (long) hcpu;
+
+ switch (action) {
+ case CPU_ONLINE:
+ case CPU_ONLINE_FROZEN:
+ smp_call_function_single(cpu, __oprofile_hrtimer_start,
+ NULL, 1);
+ break;
+ case CPU_DEAD:
+ case CPU_DEAD_FROZEN:
+ __oprofile_hrtimer_stop(cpu);
+ break;
+ }
+ return NOTIFY_OK;
}

+static struct notifier_block __refdata oprofile_cpu_notifier = {
+ .notifier_call = oprofile_cpu_notify,
+};

-void __init oprofile_timer_init(struct oprofile_operations *ops)
+int __init oprofile_timer_init(struct oprofile_operations *ops)
{
+ int rc;
+
+ rc = register_hotcpu_notifier(&oprofile_cpu_notifier);
+ if (rc)
+ return rc;
ops->create_files = NULL;
ops->setup = NULL;
ops->shutdown = NULL;
- ops->start = timer_start;
- ops->stop = timer_stop;
+ ops->start = oprofile_hrtimer_start;
+ ops->stop = oprofile_hrtimer_stop;
ops->cpu_type = "timer";
+ return 0;
+}
+
+void __exit oprofile_timer_exit(void)
+{
+ unregister_hotcpu_notifier(&oprofile_cpu_notifier);
}

---

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-06-24 17:47:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch 0/2] NOHZ vs. profile/oprofile v2


* Martin Schwidefsky <[email protected]> wrote:

> On Mon, 22 Jun 2009 17:40:30 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Martin Schwidefsky <[email protected]> wrote:
> >
> > > On Mon, 22 Jun 2009 17:29:37 +0200
> > > Ingo Molnar <[email protected]> wrote:
> > >
> > > >
> > > > * Martin Schwidefsky <[email protected]> wrote:
> > > >
> > > > > [...] And if we really want to keep things separate there will be
> > > > > two sets of per-cpu hrtimer, one for the old style profiler and
> > > > > one for oprofile. Any preference for the user space interface to
> > > > > set the sample rate? A sysctl?
> > > >
> > > > I dont think we want to keep things separate. Regarding old-style
> > > > profiler, does anyone still use it? There's now a superior in-tree
> > > > replacement for it, so we could phase it out.
> > >
> > > Well, for my part I won't miss it. But to be able to remove the
> > > profile_tick() calls from the architectures I either have to rip
> > > out the old profiler now, or adapt it to use hrtimer as well.
> >
> > Do we _have to_ touch it so widely right now? We could start with a
> > deprecation warning in this cycle. Once it's deprecated we can
> > remove all those calls.
>
> First version of the hrtimer patch for oprofile. I did not add the
> sysctl yet, if the sysctl is added in oprofile_timer_init it would
> not be available if some better profiling source is available. If
> it is added unconditionally it would only have an effect if the
> timer fallback is used. Both cases are not exactly nice for a user
> space interface.

looks quite sane. I've Cc:-ed Robert.

Ingo