2009-06-24 11:11:19

by Martin Schwidefsky

[permalink] [raw]
Subject: register_timer_hook use in arch/sh/oprofile

Hi Paul,
following Ingo's suggestion I'm working on a patch to use a per-cpu
hrtimer instead of the timer_hook for the oprofile ticks. Given that
oprofile is the only user of the timer_hook I want to remove it
completely. That way I came across the register_timer_hook call in
arch/sh/oprofile/op_model_sh7750.c. Did this ever work? The startup
of oprofile is done in two steps, on module load oprofile_init is
called, on profiling start oprofile_start is called.
Now for SH7750 this happens:

oprofile_init()
...
oprofile_arch_init(&oprofile_ops);
...
lmodel->init();
/* init() is sh7750_ppc_init for CPU_SH7750/CPU_7750S */
sh7750_ppc_reset();
register_timer_hook(sh7750_timer_notify);
...
...
oprofile_timer_init(&oprofile_ops);
...
ops->start = timer_start;
...
...

oprofile_start()
...
oprofile_ops.start();
/* start() is timer_start set by oprofile_timer_init */
register_timer_hook(timer_notify);
...

As far as I can see the second register_timer_hook will fail
because sh7750_timer_notify is already registered. That will cause
oprofile_start to fail with -EBUSY, no?

My proposed solution for that particular problem would be to remove the
sh7750_timer_notify function together with the register/unregister
calls, see patch below. Then the sole caller of register_timer_hook
would be in drivers/oprofile/timer_int.c and after that has been
converted to hrtimer the whole timer_hook stuff could go away.

---
diff -urpN linux-2.6/arch/sh/oprofile/op_model_sh7750.c linux-2.6-patched/arch/sh/oprofile/op_model_sh7750.c
--- linux-2.6/arch/sh/oprofile/op_model_sh7750.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6-patched/arch/sh/oprofile/op_model_sh7750.c 2009-06-24 13:07:08.000000000 +0200
@@ -95,12 +95,6 @@ static struct sh7750_ppc_register_config
* behavior.
*/

-static int sh7750_timer_notify(struct pt_regs *regs)
-{
- oprofile_add_sample(regs, 0);
- return 0;
-}
-
static u64 sh7750_read_counter(int counter)
{
return (u64)((u64)(__raw_readl(PMCTRH(counter)) & 0xffff) << 32) |
@@ -231,14 +225,11 @@ static inline void sh7750_ppc_reset(void
static int sh7750_ppc_init(void)
{
sh7750_ppc_reset();
-
- return register_timer_hook(sh7750_timer_notify);
+ return 0;
}

static void sh7750_ppc_exit(void)
{
- unregister_timer_hook(sh7750_timer_notify);
-
sh7750_ppc_reset();
}

---

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.


2009-06-24 11:30:39

by Paul Mundt

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile

On Wed, Jun 24, 2009 at 01:11:04PM +0200, Martin Schwidefsky wrote:
> Hi Paul,
> following Ingo's suggestion I'm working on a patch to use a per-cpu
> hrtimer instead of the timer_hook for the oprofile ticks. Given that
> oprofile is the only user of the timer_hook I want to remove it
> completely. That way I came across the register_timer_hook call in
> arch/sh/oprofile/op_model_sh7750.c. Did this ever work? The startup
> of oprofile is done in two steps, on module load oprofile_init is
> called, on profiling start oprofile_start is called.
> Now for SH7750 this happens:
>
> oprofile_init()
> ...
> oprofile_arch_init(&oprofile_ops);
> ...
> lmodel->init();
> /* init() is sh7750_ppc_init for CPU_SH7750/CPU_7750S */
> sh7750_ppc_reset();
> register_timer_hook(sh7750_timer_notify);
> ...
> ...
> oprofile_timer_init(&oprofile_ops);
> ...
> ops->start = timer_start;
> ...
> ...
>
> oprofile_start()
> ...
> oprofile_ops.start();
> /* start() is timer_start set by oprofile_timer_init */
> register_timer_hook(timer_notify);
> ...
>
> As far as I can see the second register_timer_hook will fail
> because sh7750_timer_notify is already registered. That will cause
> oprofile_start to fail with -EBUSY, no?
>
No. oprofile_timer_init() is only entered if the performance counters
fail to register in the SH7750 case, so there is only one timer hook user
at a time:

static int __init oprofile_init(void)
{
int err;

err = oprofile_arch_init(&oprofile_ops);

if (err < 0 || timer) {
printk(KERN_INFO "oprofile: using timer interrupt.\n");
oprofile_timer_init(&oprofile_ops);
}
...

The sh7750 performance counter code has always purposely abused the timer
interrupt as its profiling interrupt given that the counters themselves do not
generate IRQs on their own. There are a couple of 64-bit counters that can
count various events, but have no real event associated with them. As long as
they are periodically read, they return accurate statistics, but the granularity
is never better than the timer IRQ.

In practice oprofile has never been a good fit for these sorts of counters, so
this has fairly limited use. If there's a way to wiggle these types of counters
in to the new perf_counter API, then I'll convert that over and just kill the
old oprofile driver off completely. Barring that, I'll just end up converting it
over to hrtimers as well, so don't let that stop you from ripping out the timer
hook bits.

Most of this code predates hrtimers anyways, and it also predates the timer
hook, which is only something that we converted to some years back.

2009-06-24 12:18:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile


* Paul Mundt <[email protected]> wrote:

> In practice oprofile has never been a good fit for these sorts of
> counters, so this has fairly limited use. If there's a way to
> wiggle these types of counters in to the new perf_counter API,
> then I'll convert that over and just kill the old oprofile driver
> off completely. Barring that, I'll just end up converting it over
> to hrtimers as well, so don't let that stop you from ripping out
> the timer hook bits.
>
> Most of this code predates hrtimers anyways, and it also predates
> the timer hook, which is only something that we converted to some
> years back.

Note, the current initial upstream SH support for perfcounters:

arch/sh/include/asm/perf_counter.h:#define set_perf_counter_pending() do { } while (0)
arch/sh/include/asm/unistd_32.h:#define __NR_perf_counter_open 336
arch/sh/include/asm/unistd_64.h:#define __NR_perf_counter_open 364
arch/sh/kernel/syscalls_32.S: .long sys_perf_counter_open
arch/sh/kernel/syscalls_64.S: .long sys_perf_counter_open

Should already give you hrtimers straight away.

To test it, could you try to run 'perf top' after:

cd tools/perf/
make install

It should display a hrtimer driven kernel profile already. You can
increase/decrease the frequency of sampling by using -F option - say
'perf top -F 10000' should sample at 10 KHz.

Please let me know if any of this does not work as expected.

Ingo

2009-06-24 12:26:42

by Paul Mundt

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile

On Wed, Jun 24, 2009 at 02:17:50PM +0200, Ingo Molnar wrote:
>
> * Paul Mundt <[email protected]> wrote:
>
> > In practice oprofile has never been a good fit for these sorts of
> > counters, so this has fairly limited use. If there's a way to
> > wiggle these types of counters in to the new perf_counter API,
> > then I'll convert that over and just kill the old oprofile driver
> > off completely. Barring that, I'll just end up converting it over
> > to hrtimers as well, so don't let that stop you from ripping out
> > the timer hook bits.
> >
> > Most of this code predates hrtimers anyways, and it also predates
> > the timer hook, which is only something that we converted to some
> > years back.
>
> Note, the current initial upstream SH support for perfcounters:
>
> arch/sh/include/asm/perf_counter.h:#define set_perf_counter_pending() do { } while (0)
> arch/sh/include/asm/unistd_32.h:#define __NR_perf_counter_open 336
> arch/sh/include/asm/unistd_64.h:#define __NR_perf_counter_open 364
> arch/sh/kernel/syscalls_32.S: .long sys_perf_counter_open
> arch/sh/kernel/syscalls_64.S: .long sys_perf_counter_open
>
> Should already give you hrtimers straight away.
>
> To test it, could you try to run 'perf top' after:
>
> cd tools/perf/
> make install
>
> It should display a hrtimer driven kernel profile already. You can
> increase/decrease the frequency of sampling by using -F option - say
> 'perf top -F 10000' should sample at 10 KHz.
>
> Please let me know if any of this does not work as expected.
>
Yes, that all works fine. My comment was more in reference to the
hardware performance counters that don't have IRQs of their own, which
still need to be tied in to the perf_counter API.

2009-06-24 12:28:36

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile

On Wed, 24 Jun 2009 20:29:29 +0900
Paul Mundt <[email protected]> wrote:

> On Wed, Jun 24, 2009 at 01:11:04PM +0200, Martin Schwidefsky wrote:
> > Hi Paul,
> > following Ingo's suggestion I'm working on a patch to use a per-cpu
> > hrtimer instead of the timer_hook for the oprofile ticks. Given that
> > oprofile is the only user of the timer_hook I want to remove it
> > completely. That way I came across the register_timer_hook call in
> > arch/sh/oprofile/op_model_sh7750.c. Did this ever work? The startup
> > of oprofile is done in two steps, on module load oprofile_init is
> > called, on profiling start oprofile_start is called.
> > Now for SH7750 this happens:
> >
> > oprofile_init()
> > ...
> > oprofile_arch_init(&oprofile_ops);
> > ...
> > lmodel->init();
> > /* init() is sh7750_ppc_init for CPU_SH7750/CPU_7750S */
> > sh7750_ppc_reset();
> > register_timer_hook(sh7750_timer_notify);
> > ...
> > ...
> > oprofile_timer_init(&oprofile_ops);
> > ...
> > ops->start = timer_start;
> > ...
> > ...
> >
> > oprofile_start()
> > ...
> > oprofile_ops.start();
> > /* start() is timer_start set by oprofile_timer_init */
> > register_timer_hook(timer_notify);
> > ...
> >
> > As far as I can see the second register_timer_hook will fail
> > because sh7750_timer_notify is already registered. That will cause
> > oprofile_start to fail with -EBUSY, no?
> >
> No. oprofile_timer_init() is only entered if the performance counters
> fail to register in the SH7750 case, so there is only one timer hook user
> at a time:
>
> static int __init oprofile_init(void)
> {
> int err;
>
> err = oprofile_arch_init(&oprofile_ops);
>
> if (err < 0 || timer) {
> printk(KERN_INFO "oprofile: using timer interrupt.\n");
> oprofile_timer_init(&oprofile_ops);
> }
> ...

Oh, I see. That is the reason why the s390 version of
oprofile_arch_init returns -ENODEV. It does so to trigger the fallback
to the timer_hook. That should work for sh as well, no?

> The sh7750 performance counter code has always purposely abused the timer
> interrupt as its profiling interrupt given that the counters themselves do not
> generate IRQs on their own. There are a couple of 64-bit counters that can
> count various events, but have no real event associated with them. As long as
> they are periodically read, they return accurate statistics, but the granularity
> is never better than the timer IRQ.
>
> In practice oprofile has never been a good fit for these sorts of counters, so
> this has fairly limited use. If there's a way to wiggle these types of counters
> in to the new perf_counter API, then I'll convert that over and just kill the
> old oprofile driver off completely. Barring that, I'll just end up converting it
> over to hrtimers as well, so don't let that stop you from ripping out the timer
> hook bits.
>
> Most of this code predates hrtimers anyways, and it also predates the timer
> hook, which is only something that we converted to some years back.

Consider the timer_hook as history :-)

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-06-24 12:35:29

by Paul Mundt

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile

On Wed, Jun 24, 2009 at 02:28:28PM +0200, Martin Schwidefsky wrote:
> On Wed, 24 Jun 2009 20:29:29 +0900
> Paul Mundt <[email protected]> wrote:
> > No. oprofile_timer_init() is only entered if the performance counters
> > fail to register in the SH7750 case, so there is only one timer hook user
> > at a time:
> >
> > static int __init oprofile_init(void)
> > {
> > int err;
> >
> > err = oprofile_arch_init(&oprofile_ops);
> >
> > if (err < 0 || timer) {
> > printk(KERN_INFO "oprofile: using timer interrupt.\n");
> > oprofile_timer_init(&oprofile_ops);
> > }
> > ...
>
> Oh, I see. That is the reason why the s390 version of
> oprofile_arch_init returns -ENODEV. It does so to trigger the fallback
> to the timer_hook. That should work for sh as well, no?
>
It would, yes, but it would also disable access to the SH7750 counters at
the same time, so we don't really want to do that. The sh7750 counters
are more like timer based profiling with some extra events that can be
set and read, so reverting to oprofile_timer_init() would reduce
functionality.

My current plan is to migrate things over to the perf_counter API and
annoy Ingo with my interrupt deprived counters ;-)

Given that hrtimers are already generically supported there, it should
tie in much cleaner there than in the oprofile case at least.

2009-06-24 12:38:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile


* Paul Mundt <[email protected]> wrote:

> On Wed, Jun 24, 2009 at 02:17:50PM +0200, Ingo Molnar wrote:
> >
> > * Paul Mundt <[email protected]> wrote:
> >
> > > In practice oprofile has never been a good fit for these sorts of
> > > counters, so this has fairly limited use. If there's a way to
> > > wiggle these types of counters in to the new perf_counter API,
> > > then I'll convert that over and just kill the old oprofile driver
> > > off completely. Barring that, I'll just end up converting it over
> > > to hrtimers as well, so don't let that stop you from ripping out
> > > the timer hook bits.
> > >
> > > Most of this code predates hrtimers anyways, and it also predates
> > > the timer hook, which is only something that we converted to some
> > > years back.
> >
> > Note, the current initial upstream SH support for perfcounters:
> >
> > arch/sh/include/asm/perf_counter.h:#define set_perf_counter_pending() do { } while (0)
> > arch/sh/include/asm/unistd_32.h:#define __NR_perf_counter_open 336
> > arch/sh/include/asm/unistd_64.h:#define __NR_perf_counter_open 364
> > arch/sh/kernel/syscalls_32.S: .long sys_perf_counter_open
> > arch/sh/kernel/syscalls_64.S: .long sys_perf_counter_open
> >
> > Should already give you hrtimers straight away.
> >
> > To test it, could you try to run 'perf top' after:
> >
> > cd tools/perf/
> > make install
> >
> > It should display a hrtimer driven kernel profile already. You can
> > increase/decrease the frequency of sampling by using -F option - say
> > 'perf top -F 10000' should sample at 10 KHz.
> >
> > Please let me know if any of this does not work as expected.
>
> Yes, that all works fine. [...]

Great! :)

> [...] My comment was more in reference to the hardware performance
> counters that don't have IRQs of their own, which still need to be
> tied in to the perf_counter API.

Yeah. Note that you dont have to implement explicit interrupts
support for that (especially if it's non-existent in the hardware):
just implement the enable/disable and read methods, and then you can
sample based on that counter by using it together in a counter-group
with a sampling software-counter.

Each software-counter IRQ (hrtimer driven) will cause a sample of
the hw counter to be emitted too.

This would work here and today.

A step further would be to librarize this in kernel/perf_counter.c
and allow architectures to offer such 'hw backed' counters to
user-space as a single item, under the generic enumeration:

PERF_COUNT_HW_CPU_CYCLES = 0,
PERF_COUNT_HW_INSTRUCTIONS = 1,
PERF_COUNT_HW_CACHE_REFERENCES = 2,
PERF_COUNT_HW_CACHE_MISSES = 3,
PERF_COUNT_HW_BRANCH_INSTRUCTIONS = 4,
PERF_COUNT_HW_BRANCH_MISSES = 5,

Note that with the latest tools it does not skew the results or
profiles if the 'period metric' is not the hardware counter itself,
but an independent hrtimer. All the tooling/reporting (perf top and
perf report) is using event weights, so the period is an invariant.

(and especially with self-tuning auto-freq counters the period is
never truly constant anyway)

So for all tooling and analysis purposes such counters would be
fully equivalent to 'real' hw counters that can generate interrupts.

Ingo

2009-06-24 12:40:32

by Paul Mackerras

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile

Paul Mundt writes:

> Yes, that all works fine. My comment was more in reference to the
> hardware performance counters that don't have IRQs of their own, which
> still need to be tied in to the perf_counter API.

For them you would need to check in hw_perf_counter_init() that
counter->attr.sample_period is zero, and fail with an -EINVAL error if
it isn't.

Paul.

2009-06-24 12:45:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile


* Paul Mundt <[email protected]> wrote:

> My current plan is to migrate things over to the perf_counter API
> and annoy Ingo with my interrupt deprived counters ;-)

Please do :-)

I just wrote a longer explanation about them: i think they can be
made full-blown hardware counters, which in the end would be
basically just as capable as 'real' IRQ capable counters.

The main complication that such counters bring is that in their 'own
metric' they do interrupt periods in an 'irregular' way (because
they interrupt in the nanosec metric - being hrtimers) - but both
the tools can deal with uneven periods just fine and the auto-freq
code can auto-balance based on this just fine too.

So you should be able to implement this within arch/sh/ with
existing perf_counter facilities and hrtimers, or you can help us
librarize it in kernel/perf_counter.c some more, to minimize
architecture code.

[ Of course, given that you are the first architecture to do this,
you would be fair to expect some unexpected details along the way
as well ;-) ]

Ingo

2009-06-24 12:48:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile


* Paul Mackerras <[email protected]> wrote:

> Paul Mundt writes:
>
> > Yes, that all works fine. My comment was more in reference to
> > the hardware performance counters that don't have IRQs of their
> > own, which still need to be tied in to the perf_counter API.
>
> For them you would need to check in hw_perf_counter_init() that
> counter->attr.sample_period is zero, and fail with an -EINVAL
> error if it isn't.

That would make 'perf stat' work - but 'perf top' and 'perf record'
would not.

But those can be made to work as well without hw IRQ support, if the
period is simply fed into a special hrtimer, with nanosecs units.
The resulting sampling wont be "constant period" - but none of the
tools really mind about that.

Ingo

2009-06-24 12:54:18

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile

On Wed, 24 Jun 2009 21:34:16 +0900
Paul Mundt <[email protected]> wrote:

> On Wed, Jun 24, 2009 at 02:28:28PM +0200, Martin Schwidefsky wrote:
> > On Wed, 24 Jun 2009 20:29:29 +0900
> > Paul Mundt <[email protected]> wrote:
> > > No. oprofile_timer_init() is only entered if the performance counters
> > > fail to register in the SH7750 case, so there is only one timer hook user
> > > at a time:
> > >
> > > static int __init oprofile_init(void)
> > > {
> > > int err;
> > >
> > > err = oprofile_arch_init(&oprofile_ops);
> > >
> > > if (err < 0 || timer) {
> > > printk(KERN_INFO "oprofile: using timer interrupt.\n");
> > > oprofile_timer_init(&oprofile_ops);
> > > }
> > > ...
> >
> > Oh, I see. That is the reason why the s390 version of
> > oprofile_arch_init returns -ENODEV. It does so to trigger the fallback
> > to the timer_hook. That should work for sh as well, no?
> >
> It would, yes, but it would also disable access to the SH7750 counters at
> the same time, so we don't really want to do that. The sh7750 counters
> are more like timer based profiling with some extra events that can be
> set and read, so reverting to oprofile_timer_init() would reduce
> functionality.
>
> My current plan is to migrate things over to the perf_counter API and
> annoy Ingo with my interrupt deprived counters ;-)
>
> Given that hrtimers are already generically supported there, it should
> tie in much cleaner there than in the oprofile case at least.

Ok, that makes sense. So I guess for now I should stop trying to get
rid of the timer_hook and concentrate to convert the fallback code
in timer_int.c to hrtimer. Then after sh is fully converted to the
perf_counter API we can do the cleanup.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-06-24 13:15:49

by Paul Mundt

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile

On Wed, Jun 24, 2009 at 02:45:42PM +0200, Ingo Molnar wrote:
>
> * Paul Mundt <[email protected]> wrote:
>
> > My current plan is to migrate things over to the perf_counter API
> > and annoy Ingo with my interrupt deprived counters ;-)
>
> Please do :-)
>
> I just wrote a longer explanation about them: i think they can be
> made full-blown hardware counters, which in the end would be
> basically just as capable as 'real' IRQ capable counters.
>
Thanks for the hints, now that all of my -rc1 stuff is out of the way,
I've got some spare cycles to start working on the hardware counters.

The lack of design awareness for these sorts of use cases has been one of
the biggest hassles of trying to deal with oprofile and the various
incarnations of perfmon and so on in the past, but perf_counter certainly
looks like a good way forward here.

> The main complication that such counters bring is that in their 'own
> metric' they do interrupt periods in an 'irregular' way (because
> they interrupt in the nanosec metric - being hrtimers) - but both
> the tools can deal with uneven periods just fine and the auto-freq
> code can auto-balance based on this just fine too.
>
How we have mostly handled these counters in the past have been for
long-term workload analysis. A given user wants to measure certain events
across the lifetime of their workload to get an idea of where the hot
spots are, and then send us numbers later. 64-bit hardware counters give
them quite a bit of time to accumulate results, and in these cases
precision is not so important as continuous non-interactive monitoring.

On the other hand trying to beat these sorts of counters in to a
framework where they can leverage existing userspace tools has never
really worked as well as it could have, and this too is something that
perf_counter seems well suited to accomodate.

I knew quite a few other architectures that also have similar counters,
so having these tie in cleanly in a generic way will make these a lot
more useful, especially since in the past people have either just thrown
them at sysfs or just ignored them completely.

> So you should be able to implement this within arch/sh/ with
> existing perf_counter facilities and hrtimers, or you can help us
> librarize it in kernel/perf_counter.c some more, to minimize
> architecture code.
>
Architecturally we have two fairly different counter types that follow a
similar design, so they will need separate drivers regardless. I don't
have much interest in hiding generic stuff in arch/sh/, especially given
that we aren't the only ones with these types of counters, so I'll
certainly hack on kernel/perf_counter.c as necessary to get things moving
along. I expect between the two types of SH counters we have to handle
there will already be quite a bit of variation.

> [ Of course, given that you are the first architecture to do this,
> you would be fair to expect some unexpected details along the way
> as well ;-) ]
>
That tends to be the way things go more often than not. In any event,
it's certainly worth spending time on getting right, and I would rather
spend my time being the guinea pig for evolving in-tree code than not
having the counters be useful at all. ;-)

2009-06-24 13:24:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile


* Paul Mundt <[email protected]> wrote:

> > The main complication that such counters bring is that in their
> > 'own metric' they do interrupt periods in an 'irregular' way
> > (because they interrupt in the nanosec metric - being hrtimers)
> > - but both the tools can deal with uneven periods just fine and
> > the auto-freq code can auto-balance based on this just fine too.
>
> How we have mostly handled these counters in the past have been
> for long-term workload analysis. A given user wants to measure
> certain events across the lifetime of their workload to get an
> idea of where the hot spots are, and then send us numbers later.
> 64-bit hardware counters give them quite a bit of time to
> accumulate results, and in these cases precision is not so
> important as continuous non-interactive monitoring.

Note that if i understand the constraints correctly, SH's counters
will be _exactly_ as precise as x86 counters or PowerPC counters.

Even if SH uses a hrtimer to drive the sampling of them. [*]

The reason is that perfcounters are designed to be 'sampling
frequency/precision invariant'. We deal with
[delta_count,delta_time] pairs in the histograms, etc. and nothing
assumes that periods are static.

[ And long-term analysis ('perf stat' type of runs) dont need IRQs
anyway - perfcounters reads outs the counts and summarizes them
across the measured workload. ]

Furthermore, -F (auto-freq) counters are by design variable-period
and self-adjusting, even on x86 and PowerPC.

[ Btw., we plan to switch over the tools to use a 10 KHz auto-freq
sampling by default - as that gives us meaningful samples all the
time, regardless of how rare the underlying hardware event is. ]

Ingo

[*] The only tradeoff is that you wont have NMI sampling. But if you
have _some_ sort of NMI source in SH (regardless of whether it's
the PMU that drives it), you can still expose that and drive
your perfcounter sampling via that.

2009-06-26 07:01:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile

On Wed, 2009-06-24 at 15:24 +0200, Ingo Molnar wrote:
>
> [ And long-term analysis ('perf stat' type of runs) dont need IRQs
> anyway - perfcounters reads outs the counts and summarizes them
> across the measured workload. ]

If the counter width is less than 64 bits we do need to have some
interrupt to read them from before they cycle so we can accumulate the
deltas into a proper u64.

But for proper 64 bit hardware counters there is indeed no need for
that.

2009-06-26 07:24:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile


* Peter Zijlstra <[email protected]> wrote:

> On Wed, 2009-06-24 at 15:24 +0200, Ingo Molnar wrote:
> >
> > [ And long-term analysis ('perf stat' type of runs) dont need IRQs
> > anyway - perfcounters reads outs the counts and summarizes them
> > across the measured workload. ]
>
> If the counter width is less than 64 bits we do need to have some
> interrupt to read them from before they cycle so we can accumulate
> the deltas into a proper u64.

Yeah - but even that can be driven from some housekeeping hrtimer.

> But for proper 64 bit hardware counters there is indeed no need
> for that.

Indeed - although they are rare.

Ingo

2009-06-26 07:27:49

by Paul Mundt

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile

On Fri, Jun 26, 2009 at 09:23:50AM +0200, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Wed, 2009-06-24 at 15:24 +0200, Ingo Molnar wrote:
> > >
> > > [ And long-term analysis ('perf stat' type of runs) dont need IRQs
> > > anyway - perfcounters reads outs the counts and summarizes them
> > > across the measured workload. ]
> >
> > If the counter width is less than 64 bits we do need to have some
> > interrupt to read them from before they cycle so we can accumulate
> > the deltas into a proper u64.
>
> Yeah - but even that can be driven from some housekeeping hrtimer.
>
> > But for proper 64 bit hardware counters there is indeed no need
> > for that.
>
> Indeed - although they are rare.
>
We have both full 64-bit counters as well as 48-bit, although they do
both at least set an overflow bit that can be looked at, even if there is
no exception associated with it. Optionally they can be split up in to
multiple 32-bit counters and half of them handed over to the bus
controller.

2009-06-26 07:36:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile

On Fri, 2009-06-26 at 09:23 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Wed, 2009-06-24 at 15:24 +0200, Ingo Molnar wrote:
> > >
> > > [ And long-term analysis ('perf stat' type of runs) dont need IRQs
> > > anyway - perfcounters reads outs the counts and summarizes them
> > > across the measured workload. ]
> >
> > If the counter width is less than 64 bits we do need to have some
> > interrupt to read them from before they cycle so we can accumulate
> > the deltas into a proper u64.
>
> Yeah - but even that can be driven from some housekeeping hrtimer.

Oh absolutely, but its one of those things one should not forget to do.

> > But for proper 64 bit hardware counters there is indeed no need
> > for that.
>
> Indeed - although they are rare.

Paul Mundt mentioned he had some.

2009-12-16 04:52:51

by Paul Mundt

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile

Hi Martin,

On Wed, Jun 24, 2009 at 02:54:06PM +0200, Martin Schwidefsky wrote:
> On Wed, 24 Jun 2009 21:34:16 +0900
> Paul Mundt <[email protected]> wrote:
> > On Wed, Jun 24, 2009 at 02:28:28PM +0200, Martin Schwidefsky wrote:
> > > On Wed, 24 Jun 2009 20:29:29 +0900
> > > Paul Mundt <[email protected]> wrote:
> > > > No. oprofile_timer_init() is only entered if the performance counters
> > > > fail to register in the SH7750 case, so there is only one timer hook user
> > > > at a time:
> > > >
> > > > static int __init oprofile_init(void)
> > > > {
> > > > int err;
> > > >
> > > > err = oprofile_arch_init(&oprofile_ops);
> > > >
> > > > if (err < 0 || timer) {
> > > > printk(KERN_INFO "oprofile: using timer interrupt.\n");
> > > > oprofile_timer_init(&oprofile_ops);
> > > > }
> > > > ...
> > >
> > > Oh, I see. That is the reason why the s390 version of
> > > oprofile_arch_init returns -ENODEV. It does so to trigger the fallback
> > > to the timer_hook. That should work for sh as well, no?
> > >
> > It would, yes, but it would also disable access to the SH7750 counters at
> > the same time, so we don't really want to do that. The sh7750 counters
> > are more like timer based profiling with some extra events that can be
> > set and read, so reverting to oprofile_timer_init() would reduce
> > functionality.
> >
> > My current plan is to migrate things over to the perf_counter API and
> > annoy Ingo with my interrupt deprived counters ;-)
> >
> > Given that hrtimers are already generically supported there, it should
> > tie in much cleaner there than in the oprofile case at least.
>
> Ok, that makes sense. So I guess for now I should stop trying to get
> rid of the timer_hook and concentrate to convert the fallback code
> in timer_int.c to hrtimer. Then after sh is fully converted to the
> perf_counter API we can do the cleanup.
>

This is just a follow-up to let you know that the first-step transition
to the perf_counter API is done. There's still more work to do, but we're
already more functional on the perf_counter side than we ever were on
oprofile. As such, I've subsequently killed off the bitrotted oprofile
bits, which includes the timer hook references.

There is now nothing remaining on the SH side blocking the timer hook
removal.

2009-12-16 07:22:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: register_timer_hook use in arch/sh/oprofile


* Paul Mundt <[email protected]> wrote:

> Hi Martin,
>
> On Wed, Jun 24, 2009 at 02:54:06PM +0200, Martin Schwidefsky wrote:
> > On Wed, 24 Jun 2009 21:34:16 +0900
> > Paul Mundt <[email protected]> wrote:
> > > On Wed, Jun 24, 2009 at 02:28:28PM +0200, Martin Schwidefsky wrote:
> > > > On Wed, 24 Jun 2009 20:29:29 +0900
> > > > Paul Mundt <[email protected]> wrote:
> > > > > No. oprofile_timer_init() is only entered if the performance counters
> > > > > fail to register in the SH7750 case, so there is only one timer hook user
> > > > > at a time:
> > > > >
> > > > > static int __init oprofile_init(void)
> > > > > {
> > > > > int err;
> > > > >
> > > > > err = oprofile_arch_init(&oprofile_ops);
> > > > >
> > > > > if (err < 0 || timer) {
> > > > > printk(KERN_INFO "oprofile: using timer interrupt.\n");
> > > > > oprofile_timer_init(&oprofile_ops);
> > > > > }
> > > > > ...
> > > >
> > > > Oh, I see. That is the reason why the s390 version of
> > > > oprofile_arch_init returns -ENODEV. It does so to trigger the fallback
> > > > to the timer_hook. That should work for sh as well, no?
> > > >
> > > It would, yes, but it would also disable access to the SH7750 counters at
> > > the same time, so we don't really want to do that. The sh7750 counters
> > > are more like timer based profiling with some extra events that can be
> > > set and read, so reverting to oprofile_timer_init() would reduce
> > > functionality.
> > >
> > > My current plan is to migrate things over to the perf_counter API and
> > > annoy Ingo with my interrupt deprived counters ;-)
> > >
> > > Given that hrtimers are already generically supported there, it should
> > > tie in much cleaner there than in the oprofile case at least.
> >
> > Ok, that makes sense. So I guess for now I should stop trying to get
> > rid of the timer_hook and concentrate to convert the fallback code
> > in timer_int.c to hrtimer. Then after sh is fully converted to the
> > perf_counter API we can do the cleanup.
> >
>
> This is just a follow-up to let you know that the first-step transition to
> the perf_counter API is done. There's still more work to do, but we're
> already more functional on the perf_counter side than we ever were on
> oprofile. [...]

Nice to hear!

Please let us know if you have any problems on the tools/perf/ side as well -
in particular everyday usability. Even small details matter.

Also, embedded-system usability would be of interest as well. Most of the perf
tooling gets tested on x86 so it's quite possible that some aspects are not as
slim as they could be.

One key point of performance is the bona fide overhead of a default "perf
record" profile recording session. On x86, it's roughly in the following
range:

aldebaran:~> perf stat --repeat 3 -e instructions ./loop_10b_instructions

Performance counter stats for './loop_10b_instructions' (3 runs):

10009061648 instructions # 0.000 IPC ( +- 0.067% )

2.407052637 seconds time elapsed ( +- 1.259% )

aldebaran:~> perf stat --repeat 3 -e instructions perf record -f ./loop_10b_instructions
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.073 MB perf.data (~3191 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.073 MB perf.data (~3183 samples) ]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.073 MB perf.data (~3190 samples) ]

Performance counter stats for 'perf record -f ./loop_10b_instructions' (3 runs):

10029183818 instructions # 0.000 IPC ( +- 0.004% )

2.377064570 seconds time elapsed ( +- 0.215% )

I.e. 10029183818/10009061648 ~== 0.2% direct overhead.

For something very system and task intense (such as hackbench) it can go up to
2.5%.

Thanks,

Ingo