Subject: [GIT PULL] updates for oprofile

Ingo,

please pull oprofile updates from:

git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core

You might want to pull from 'for-next' instead that includes an
additional merge with tip/tracing/core to resolve conflicts with that
branch.

Thanks.

-Robert

--

commit b971f06187d83b5c03d2b597cccdfef421c0ca91
Merge: cb6e943 c1ab9ca
Author: Robert Richter <[email protected]>
Date: Fri Apr 23 16:47:51 2010 +0200

Merge commit 'tip/tracing/core' into oprofile/core

Conflicts:
drivers/oprofile/cpu_buffer.c

Signed-off-by: Robert Richter <[email protected]>

commit cb6e943ccf19ab6d3189147e9d625a992e016084
Author: Andi Kleen <[email protected]>
Date: Thu Apr 1 03:17:25 2010 +0200

oprofile: remove double ring buffering

oprofile used a double buffer scheme for its cpu event buffer
to avoid races on reading with the old locked ring buffer.

But that is obsolete now with the new ring buffer, so simply
use a single buffer. This greatly simplifies the code and avoids
a lot of sample drops on large runs, especially with call graph.

Based on suggestions from Steven Rostedt

For stable kernels from v2.6.32, but not earlier.

Signed-off-by: Andi Kleen <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Robert Richter <[email protected]>

commit a36bf32e9e8a86f291f746b7f8292e042ee04a46
Merge: bc078e4 01bf0b6
Author: Robert Richter <[email protected]>
Date: Fri Apr 23 14:30:22 2010 +0200

Merge commit 'v2.6.34-rc5' into oprofile/core

commit bc078e4eab65f11bbaeed380593ab8151b30d703
Author: Martin Schwidefsky <[email protected]>
Date: Tue Mar 2 16:01:10 2010 +0100

oprofile: convert oprofile from timer_hook to hrtimer

Oprofile is currently broken on systems running with NOHZ enabled.
A maximum of 1 tick is accounted via the timer_hook if a cpu sleeps
for a longer period of time. This does bad things to the percentages
in the profiler output. To solve this problem convert oprofile to
use a restarting hrtimer instead of the timer_hook.

Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Robert Richter <[email protected]>

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]


2010-04-27 09:21:25

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] updates for oprofile


* Robert Richter <[email protected]> wrote:

> Ingo,
>
> please pull oprofile updates from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core
>
> You might want to pull from 'for-next' instead that includes an
> additional merge with tip/tracing/core to resolve conflicts with that
> branch.

Pulled, thanks Robert!

Ingo

2010-04-27 15:26:32

by Phil Carmody

[permalink] [raw]
Subject: Re: [GIT PULL] updates for oprofile

Ingo, et al.,

Regarding today's pulled request, containing:

commit bc078e4eab65f11bbaeed380593ab8151b30d703
Author: Martin Schwidefsky <[email protected]>
Date: Tue Mar 2 16:01:10 2010 +0100

oprofile: convert oprofile from timer_hook to hrtimer


Information is a touch scant, as I'm doing the investigation as I
write, but I believe that that patch can cause ooops regressions
via a null-pointer dereference in oprofile_add_sample().

That function declares:
"""
/**
* Add a sample. This may be called from any context.
*/
void oprofile_add_sample(struct pt_regs * const regs, unsigned long event);
"""

And begins:
"""
void oprofile_add_sample(struct pt_regs * const regs, unsigned long event)
{
int is_kernel = !user_mode(regs);
"""

Where on at least two major architectures (Arm, x86), user_mode()
unconditionally dereferences its parameter.

Now oprofile_add_sample() is called from this context:
"""
static enum hrtimer_restart oprofile_hrtimer_notify(struct hrtimer *hrtimer)
{
oprofile_add_sample(get_irq_regs(), 0);
"""

And get_irq_regs() is NULL when not in an IRQ context.

Bang.

An example of this kind of thing kicking in has already been encountered
last year:
http://www.mail-archive.com/[email protected]/msg14069.html
(That thread got a little side-tracked onto OMAP specifics, but the
original report is topical.)

Now would be a very good time for the "many eyes" principle to kick in.

I'm now looking into workarounds, but nothing that I'd necessarily
want to submit as a real fix.

Phil
cc:'d replies appreciated

Subject: Re: [GIT PULL] updates for oprofile

(cc'ing oprofile-list)

On 27.04.10 18:25:44, Phil Carmody wrote:
> Ingo, et al.,
>
> Regarding today's pulled request, containing:
>
> commit bc078e4eab65f11bbaeed380593ab8151b30d703
> Author: Martin Schwidefsky <[email protected]>
> Date: Tue Mar 2 16:01:10 2010 +0100
>
> oprofile: convert oprofile from timer_hook to hrtimer
>
>
> Information is a touch scant, as I'm doing the investigation as I
> write, but I believe that that patch can cause ooops regressions
> via a null-pointer dereference in oprofile_add_sample().
>
> That function declares:
> """
> /**
> * Add a sample. This may be called from any context.
> */
> void oprofile_add_sample(struct pt_regs * const regs, unsigned long event);
> """
>
> And begins:
> """
> void oprofile_add_sample(struct pt_regs * const regs, unsigned long event)
> {
> int is_kernel = !user_mode(regs);
> """
>
> Where on at least two major architectures (Arm, x86), user_mode()
> unconditionally dereferences its parameter.
>
> Now oprofile_add_sample() is called from this context:
> """
> static enum hrtimer_restart oprofile_hrtimer_notify(struct hrtimer *hrtimer)
> {
> oprofile_add_sample(get_irq_regs(), 0);
> """
>
> And get_irq_regs() is NULL when not in an IRQ context.

Perf is simply dropping the sample in such cases, see:

kernel/perf_event.c:perf_swevent_hrtimer()

So at quick fix would be to check for a null pointer also. But,
according to this:

http://www.mail-archive.com/[email protected]/msg14074.html

samples will be incorrect then since only interrupt context is
profiled. It seems there is no solution available right now.

-Robert

>
> Bang.
>
> An example of this kind of thing kicking in has already been encountered
> last year:
> http://www.mail-archive.com/[email protected]/msg14069.html
> (That thread got a little side-tracked onto OMAP specifics, but the
> original report is topical.)
>
> Now would be a very good time for the "many eyes" principle to kick in.
>
> I'm now looking into workarounds, but nothing that I'd necessarily
> want to submit as a real fix.
>
> Phil
> cc:'d replies appreciated
>

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]

2010-04-27 17:50:24

by Siarhei Siamashka

[permalink] [raw]
Subject: Re: [GIT PULL] updates for oprofile

On Tuesday 27 April 2010 20:40:26 ext Robert Richter wrote:
> On 27.04.10 18:25:44, Phil Carmody wrote:
> > Now oprofile_add_sample() is called from this context:
> > """
> > static enum hrtimer_restart oprofile_hrtimer_notify(struct hrtimer
> > *hrtimer) {
> > oprofile_add_sample(get_irq_regs(), 0);
> > """
> >
> > And get_irq_regs() is NULL when not in an IRQ context.
>
> Perf is simply dropping the sample in such cases, see:
>
> kernel/perf_event.c:perf_swevent_hrtimer()
>
> So at quick fix would be to check for a null pointer also. But,
> according to this:
>
> http://www.mail-archive.com/[email protected]/msg14074.html
>
> samples will be incorrect then since only interrupt context is
> profiled. It seems there is no solution available right now.

Isn't hrtimer callback function supposed to be only called from IRQ context
after this cleanup: http://lwn.net/Articles/308545/ ?

--
Best regards,
Siarhei Siamashka

Subject: [PATCH] oprofile, hrtimer: only add samples if regs are available

On 28.04.10 18:59:06, Robert Richter wrote:
> On 27.04.10 20:47:51, Siarhei Siamashka wrote:
> > Isn't hrtimer callback function supposed to be only called from IRQ context
> > after this cleanup: http://lwn.net/Articles/308545/ ?
>
> Yes, the patch is upstream since v2.6.29. Thanks Siarhei.
>
> I will add a null pointer check anyway.

--

>From ad8a6546b32d22ecf92b4448ad329133d18dde0f Mon Sep 17 00:00:00 2001
From: Robert Richter <[email protected]>
Date: Wed, 28 Apr 2010 20:43:18 +0200
Subject: [PATCH] oprofile, hrtimer: only add samples if regs are available

This patch adds a check if the regs pointer is valid. This actually
should not happen since hrtimer notification code should always run in
hard_irq context after this commit (since v2.6.29):

ca10949 hrtimer: removing all ur callback modes

However, if code does not run in interrupt context, regs will be null:

http://www.mail-archive.com/[email protected]/msg14074.html

Cc: Martin Schwidefsky <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
drivers/oprofile/timer_int.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/oprofile/timer_int.c b/drivers/oprofile/timer_int.c
index dc0ae4d..ca3436c 100644
--- a/drivers/oprofile/timer_int.c
+++ b/drivers/oprofile/timer_int.c
@@ -24,8 +24,12 @@ static DEFINE_PER_CPU(struct hrtimer, oprofile_hrtimer);

static enum hrtimer_restart oprofile_hrtimer_notify(struct hrtimer *hrtimer)
{
- oprofile_add_sample(get_irq_regs(), 0);
+ struct pt_regs *regs = get_irq_regs();
+
+ if (regs)
+ oprofile_add_sample(regs, 0);
hrtimer_forward_now(hrtimer, ns_to_ktime(TICK_NSEC));
+
return HRTIMER_RESTART;
}

--
1.7.0.3



--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]

2010-04-28 17:09:56

by Phil Carmody

[permalink] [raw]
Subject: Re: [GIT PULL] updates for oprofile

On 28/04/10 18:59 +0200, ext Robert Richter wrote:
> On 27.04.10 20:47:51, Siarhei Siamashka wrote:
> > Isn't hrtimer callback function supposed to be only called from IRQ context
> > after this cleanup: http://lwn.net/Articles/308545/ ?
>
> Yes, the patch is upstream since v2.6.29. Thanks Siarhei.
>
> I will add a null pointer check anyway.

A few here thrashed around a couple of ideas, and the general consensus
was that the following work for us, and is offered for consideration.

Phil


From: Phil Carmody <[email protected]>
Date: Tue, 27 Apr 2010 19:28:33 +0300
Subject: [PATCH v2 1/1] oprofile: HACK - protect from not being in an IRQ context

http://lkml.org/lkml/2010/4/27/285

Protect against dereferencing regs when it's NULL, and
force a magic number into pc to prevent too deep processing.
This approach permits the dropped samples to be tallied as
invalid Instruction Pointer events.

e.g. output from about 15mins at 10kHz sample rate:
Nr. samples received: 2565380
Nr. samples lost invalid pc: 4

Signed-off-by: Phil Carmody <[email protected]>
---
drivers/oprofile/cpu_buffer.c | 12 +++++++++---
1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index a7aae24..f70f954 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -357,9 +357,15 @@ void oprofile_add_ext_sample(unsigned long pc, struct pt_regs * const regs,

void oprofile_add_sample(struct pt_regs * const regs, unsigned long event)
{
- int is_kernel = !user_mode(regs);
- unsigned long pc = profile_pc(regs);
-
+ int is_kernel;
+ unsigned long pc;
+ if (likely(regs)) {
+ is_kernel = !user_mode(regs);
+ pc = profile_pc(regs);
+ } else {
+ is_kernel = 0; /* This value will not be used */
+ pc = ESCAPE_CODE; /* as this causes an early return. */
+ }
__oprofile_add_ext_sample(pc, regs, event, is_kernel);
}

--
1.6.0.4

Subject: Re: [GIT PULL] updates for oprofile

On 27.04.10 20:47:51, Siarhei Siamashka wrote:
> Isn't hrtimer callback function supposed to be only called from IRQ context
> after this cleanup: http://lwn.net/Articles/308545/ ?

Yes, the patch is upstream since v2.6.29. Thanks Siarhei.

I will add a null pointer check anyway.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]

Subject: Re: [GIT PULL] updates for oprofile

On 28.04.10 12:09:16, Phil Carmody wrote:
> A few here thrashed around a couple of ideas, and the general consensus
> was that the following work for us, and is offered for consideration.

Phil,

I missed your patch. I will apply your version in favour of my patch
since it adds statistic data.

Thanks,

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]