2010-04-12 21:45:15

by Mathieu Desnoyers

[permalink] [raw]
Subject: TRACE_EVENT() declarations belong to include/trace/

Hi,

Ranting about:

commit 1bf4af165050d90ea6659ffb2536ec8ca783aab5
Author: Anton Blanchard <[email protected]>
Date: Mon Oct 26 18:47:42 2009 +0000

powerpc: tracing: Add powerpc tracepoints for interrupt entry and exit

Why are there TRACE_EVENT() declarations in arch/powerpc/include/asm/trace.h for
irq_entry/exit ?

What's so special about them that they cannot be put in linux/trace/ ?

I'm all for the trace_irq_entry/exit instrumentation, but I don't see any good
in adding event declarations outside of include/trace/.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


2010-04-12 22:11:31

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: TRACE_EVENT() declarations belong to include/trace/

On Mon, Apr 12, 2010 at 05:45:11PM -0400, Mathieu Desnoyers wrote:
> Why are there TRACE_EVENT() declarations in arch/powerpc/include/asm/trace.h for
> irq_entry/exit ?
>
> What's so special about them that they cannot be put in linux/trace/ ?
>
> I'm all for the trace_irq_entry/exit instrumentation, but I don't see any good
> in adding event declarations outside of include/trace/.
>
> Thanks,


Yeah,

If this is to trace all irqs, then it seems to me the wrong way.
We already have generic irq_handler_entry and irq_handler_exit trace events.

May be those in powerpc are here to get the spurious irqs by computing
a diff between generic and arch irq events? In which case
it would be better to get dedicated spurious irq tracepoints.

2010-04-12 22:17:40

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: TRACE_EVENT() declarations belong to include/trace/

* Frederic Weisbecker ([email protected]) wrote:
> On Mon, Apr 12, 2010 at 05:45:11PM -0400, Mathieu Desnoyers wrote:
> > Why are there TRACE_EVENT() declarations in arch/powerpc/include/asm/trace.h for
> > irq_entry/exit ?
> >
> > What's so special about them that they cannot be put in linux/trace/ ?
> >
> > I'm all for the trace_irq_entry/exit instrumentation, but I don't see any good
> > in adding event declarations outside of include/trace/.
> >
> > Thanks,
>
>
> Yeah,
>
> If this is to trace all irqs, then it seems to me the wrong way.
> We already have generic irq_handler_entry and irq_handler_exit trace events.

The commit changelog :

<quote>
commit 1bf4af165050d90ea6659ffb2536ec8ca783aab5
Author: Anton Blanchard <[email protected]>
Date: Mon Oct 26 18:47:42 2009 +0000

powerpc: tracing: Add powerpc tracepoints for interrupt entry and exit

This adds powerpc-specific tracepoints for interrupt entry and exit.

While we already have generic irq_handler_entry and irq_handler_exit
tracepoints there are cases on our virtualised powerpc machines where an
interrupt is presented to the OS, but subsequently handled by the hypervisor.
This means no OS interrupt handler is invoked.

Here is an example on a POWER6 machine with the patch below applied:

<idle>-0 [006] 3243.949840744: irq_entry: pt_regs=c0000000ce31fb10
<idle>-0 [006] 3243.949850520: irq_exit: pt_regs=c0000000ce31fb10

<idle>-0 [007] 3243.950218208: irq_entry: pt_regs=c0000000ce323b10
<idle>-0 [007] 3243.950224080: irq_exit: pt_regs=c0000000ce323b10

<idle>-0 [000] 3244.021879320: irq_entry: pt_regs=c000000000a63aa0
<idle>-0 [000] 3244.021883616: irq_handler_entry: irq=87 handler=eth0
<idle>-0 [000] 3244.021887328: irq_handler_exit: irq=87 return=handled
<idle>-0 [000] 3244.021897408: irq_exit: pt_regs=c000000000a63aa0

Here we see two phantom interrupts (no handler was invoked), followed
by a real interrupt for eth0. Without the tracepoints in this patch we
would have missed the phantom interrupts.
</quote>

states that this is done for setups where no in-kernel handler is called. But it
does not say if tracing the beginning and end of handle_IRQ_event() from
kernel/irq/handle.c would fix the problem. That would be a lot neater than this
arch-specific solution.

Thanks,

Mathieu

>
> May be those in powerpc are here to get the spurious irqs by computing
> a diff between generic and arch irq events? In which case
> it would be better to get dedicated spurious irq tracepoints.
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-04-12 22:26:43

by Steven Rostedt

[permalink] [raw]
Subject: Re: TRACE_EVENT() declarations belong to include/trace/

On Mon, 2010-04-12 at 17:45 -0400, Mathieu Desnoyers wrote:
> Hi,
>
> Ranting about:
>
> commit 1bf4af165050d90ea6659ffb2536ec8ca783aab5
> Author: Anton Blanchard <[email protected]>
> Date: Mon Oct 26 18:47:42 2009 +0000
>
> powerpc: tracing: Add powerpc tracepoints for interrupt entry and exit
>
> Why are there TRACE_EVENT() declarations in arch/powerpc/include/asm/trace.h for
> irq_entry/exit ?
>
> What's so special about them that they cannot be put in linux/trace/ ?
>
> I'm all for the trace_irq_entry/exit instrumentation, but I don't see any good
> in adding event declarations outside of include/trace/.


If there is any specific architecture data being recorded in the
TRACE_EVENT() macro, then it should be arch specific, but if not, then
it should go in include/trace/

/me goes to look at the code.

-- Steve

2010-04-12 23:20:24

by Anton Blanchard

[permalink] [raw]
Subject: Re: TRACE_EVENT() declarations belong to include/trace/


Hi,

> states that this is done for setups where no in-kernel handler is called. But
> it does not say if tracing the beginning and end of handle_IRQ_event() from
> kernel/irq/handle.c would fix the problem. That would be a lot neater than
> this arch-specific solution.

Unfortunately that misses this problem completely. On some versions of the
POWER hypervisor we can be presented with interrupts for our virtualisation
layer that get handled in the get_irq hypervisor call. The code looks like
this:


void do_IRQ(struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);
unsigned int irq;

trace_irq_entry(regs);

irq_enter();

check_stack_overflow();

irq = ppc_md.get_irq(); <------------- jitter spikes here

if (irq != NO_IRQ && irq != NO_IRQ_IGNORE)
handle_one_irq(irq);
else if (irq != NO_IRQ_IGNORE)
__get_cpu_var(irq_stat).spurious_irqs++;


We've had HPC customers who have experienced jitter in their applications
caused by this and as a result I added the events so we can monitor it.

Since this is a POWER specific issue I'm happy to rename the trace events to
powerpc_irq_entry/exit. We could also look at changing the tracepoints, eg
putting it around the ppc_md.get_irq(), but I can't see how we can remove
them completely.

Anton

2010-04-12 23:25:27

by Anton Blanchard

[permalink] [raw]
Subject: Re: TRACE_EVENT() declarations belong to include/trace/


Hi Frederic,

> If this is to trace all irqs, then it seems to me the wrong way.
> We already have generic irq_handler_entry and irq_handler_exit trace events.
>
> May be those in powerpc are here to get the spurious irqs by computing
> a diff between generic and arch irq events? In which case
> it would be better to get dedicated spurious irq tracepoints.

The number of spurious irqs are not interesting - we track them in
/proc/interrupts. The duration of the disturbances are, and they were big
enough for people to see them in certain HPC loops.

Anton

2010-04-13 00:27:41

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: TRACE_EVENT() declarations belong to include/trace/

* Anton Blanchard ([email protected]) wrote:
>
> Hi,
>
> > states that this is done for setups where no in-kernel handler is called. But
> > it does not say if tracing the beginning and end of handle_IRQ_event() from
> > kernel/irq/handle.c would fix the problem. That would be a lot neater than
> > this arch-specific solution.
>
> Unfortunately that misses this problem completely. On some versions of the
> POWER hypervisor we can be presented with interrupts for our virtualisation
> layer that get handled in the get_irq hypervisor call. The code looks like
> this:
>
>
> void do_IRQ(struct pt_regs *regs)
> {
> struct pt_regs *old_regs = set_irq_regs(regs);
> unsigned int irq;
>
> trace_irq_entry(regs);
>
> irq_enter();
>
> check_stack_overflow();
>
> irq = ppc_md.get_irq(); <------------- jitter spikes here
>
> if (irq != NO_IRQ && irq != NO_IRQ_IGNORE)
> handle_one_irq(irq);
> else if (irq != NO_IRQ_IGNORE)
> __get_cpu_var(irq_stat).spurious_irqs++;
>
>
> We've had HPC customers who have experienced jitter in their applications
> caused by this and as a result I added the events so we can monitor it.
>
> Since this is a POWER specific issue I'm happy to rename the trace events to
> powerpc_irq_entry/exit. We could also look at changing the tracepoints, eg
> putting it around the ppc_md.get_irq(), but I can't see how we can remove
> them completely.

OK, I see. How about arch_irq_entry/exit() ?

This way, if we need to do something similar on another arch at the
architecture-level, we can use the same names.

Thanks,

Mathieu

>
> Anton

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com