LinuxLists.cc - [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]

2009-09-05 00:01:24

Subject: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]

Thomas,

I can port these over to -rt if you want, or you could just
add them yourself. I know the spinlocks need to be converted to atomic.
Just let me know if you want me to do it.

Ingo,

This patch series started out fixing a single bug that Arnaldo ran into
with the -rt patch series. I've hit it too and wrote a patch to solve it.
Unfortunately the patch was racey and error prone and would crash easily
in Arnaldo's setup.

The problem was with the wakeup latency tracer. When it was developed,
it controlled what went into the ring buffer. But now that we have
events, which can write into the ring buffer at anytime unless they are
disabled. When a max latency is encountered, the buffers are swapped
and the running ring buffer is reset. But if a commit is happening
at this time, it will be corrupted.

Luckily the ring buffer is robust enough to detect this and instead of
crashing the kernel, it safely disables the ring buffer and outputs
a back trace. But this does not help the developer that wants the trace.

Talking with Thomas Gleixner, we both agreed that resetting the ring
buffer on the fly was dangerous. Instead we should insert a marker and
use that as the start of the trace.

I thought this would be an easy change, but after a week of debugging,
fixing bugs just to create two new ones for every one I fixed, this
was not as easy as I thought.

The problem was that you can not write a marker in a buffer that is
not on the same CPU as the writer. I tried to add flags to make the
ring buffer reset itself when a new write happens on that CPU but
that just proved to be even more racey.

Finally (something I should have realized from day 1, not 4 days later)
I realized I already had a marker. The latency tracers always record
the timestamp of when the trace began. I can simply ignore any trace
event that happened before that timestamp. This ended up working out
very well. The code is quite simple and solid.

But this journey through the marker bits was not all in vain. I actually
came across several outstanding bugs in both the tracer and the ring buffer.
Nothing major, but enough to be fixed.

Not just the latency tracers could cause a corruption, but the reset
of the ring buffers by the switching of plugins could also cause it.
All resets must synchronize the disabling of the ring buffers with any
current writers.

Next I found that the swapping of the buffers with the wakeup tracer
was causing issues. The events would use the trace_array variable *tr
to access the buffers. tr->buffer to reserve space on the buffer
and then tr->buffer to commit it. The problem is that the wakeup
tracer swaps the tr->buffer with the max_tr.buffer. The commit can
happen on another buffer than what it started with. This again would
be detected by the ring buffer and would shut it down. The solution
here was to change the API to the writing of ftrace to pass in
the buffer directly and not the trace_array. Thus the events would pass
in the same buffer for both the reserve and the commit. It's OK
to update the max_tr on the wakeup tracer.

The irqsoff trace posed a different problem. It does not swap the
buffers, but it swaps a per cpu buffer within the buffer. This
can be done because the irqsoff tracer only records per cpu and does
not record the events of other cpus. But if a latency is detected
in the middle of a commit (commits can be preempted by interrupts) then
we can not swap the buffer. This is a tougher problem and I currently
do not have solution since it would require a copy of one buffer to the
other. Perhaps in the future. But instead of just failing, I added a
print into the buffer to notify the users that this has occurred.
They will now see:

# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 2.6.31-rc5
# --------------------------------------------------------------------
# latency: 151 us, #1/1, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: -4328 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: save_args
# => ended at: __do_softirq
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
bash-4328 3d.s5 167us : update_max_tr_single: Failed to swap buffers due to commit in progress

Which is better than a wrong trace. It still shows the max latency and the
start and stop points. Note, this run had lockdep enabled with locking
events on, which spews a lot of events after a latency has been hit
(the 167 entry with respect to the 151 max). This should not be too big of
a deal since it took a while loop of constant resettng of the max latency
and greping for this failure to hit it.

Because the irqsoff tracer swaps the internal ring buffer cpus,
a check needs to be added in the recording to handle this. Because this
check is only needed for swapping the internal buffers, and the irqsoff
(and preemptoff) tracer is the only user. I only do this if they
are configured (keep the overhead down).

Please pull the latest tip/tracing/core tree, which can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/core

Steven Rostedt (18):
ring-buffer: do not reset while in a commit
ring-buffer: do not swap buffers during a commit
ring-buffer: remove unnecessary cpu_relax
ring-buffer: fix ring_buffer_read crossing pages
ring-buffer: remove ring_buffer_event_discard
ring-buffer: do not count discarded events
ring-buffer: disable all cpu buffers when one finds a problem
tracing: print out start and stop in latency traces
tracing: disable update max tracer while reading trace
tracing: disable buffers and synchronize_sched before resetting
tracing: remove users of tracing_reset
tracing: use timestamp to determine start of latency traces
tracing: make tracing_reset safe for external use
tracing: pass around ring buffer instead of tracer
tracing: add trace_array_printk for internal tracers to use
tracing: report error in trace if we fail to swap latency buffer
ring-buffer: check for swapped buffers in start of committing
ring-buffer: only enable ring_buffer_swap_cpu when needed

----
include/linux/ftrace_event.h | 15 ++-
include/linux/ring_buffer.h | 23 +--
include/trace/ftrace.h | 15 ++-
kernel/trace/Kconfig | 8 +
kernel/trace/blktrace.c | 12 +-
kernel/trace/kmemtrace.c | 4 +-
kernel/trace/ring_buffer.c | 172 +++++++++++++-------
kernel/trace/trace.c | 297 ++++++++++++++++++++++++----------
kernel/trace/trace.h | 26 ++--
kernel/trace/trace_boot.c | 16 +-
kernel/trace/trace_events.c | 6 +-
kernel/trace/trace_functions_graph.c | 14 +-
kernel/trace/trace_irqsoff.c | 3 +-
kernel/trace/trace_mmiotrace.c | 10 +-
kernel/trace/trace_power.c | 22 ++-
kernel/trace/trace_sched_switch.c | 18 ++-
kernel/trace/trace_sched_wakeup.c | 7 +-
kernel/trace/trace_syscalls.c | 18 ++-
18 files changed, 444 insertions(+), 242 deletions(-)
--

2009-09-05 15:07:20

by Arnaldo Carvalho de Melo

[permalink] [raw]

Subject: Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]

Em Fri, Sep 04, 2009 at 07:55:27PM -0400, Steven Rostedt escreveu:
> Thomas,
>
> I can port these over to -rt if you want, or you could just
> add them yourself. I know the spinlocks need to be converted to atomic.
> Just let me know if you want me to do it.

I did a quick port to tip/rt/head and couldn't reproduce the problems I
was experiencing, thanks!

I've put the ported patch series at:

http://userweb.kernel.org/~acme/rostedt-rb-rt

Tested using:

echo 1 > /sys/kernel/debug/tracing/events/irq/enable
echo 1 > /sys/kernel/debug/tracing/events/sched/enable
echo wakeup > /sys/kernel/debug/tracing/current_tracer

Then a make -j64 allyesconfig + cyclictest -p95 ...
[root@hs21xm-1 ~]# uname -a
Linux hs21xm-1 2.6.31-rc8-rt9 #2 SMP PREEMPT RT Sat Sep 5 09:42:03 CDT
2009 x86_64 x86_64 x86_64 GNU/Linux
[root@hs21xm-1 ~]# cat /sys/kernel/debug/tracing/trace
# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 2.6.31-rc8-rt9
# --------------------------------------------------------------------
# latency: 220 us, #43/43, CPU#4 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
# -----------------
# | task: -2924 (uid:0 nice:-5 policy:1 rt_prio:85)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
sirq-rcu-52 3d..3 0us : sched_switch: task sirq-rcu/3:52 [29] (D) ==> cc1:13267 [120]
<idle>-0 5d.h3 0us : 0:140:R + [004] 2924: 14:R irq/60-eth2
<idle>-0 5d.h3 1us+: wake_up_process <-handle_IRQ_event
irq/60-e-2924 4d..2 4us+: sched_wakeup: task sirq-net-rx/4:60 [24] success=1
<...>-13267 3d..3 8us : sched_switch: task cc1:13267 [120] (T|0x100) ==> cc1:13085 [120]
irq/60-e-2924 4d.h2 8us+: sched_wakeup: task sirq-timer/4:58 [29] success=1
irq/60-e-2924 4d.h2 10us+: sched_wakeup: task sirq-rcu/4:65 [29] success=1
irq/60-e-2924 4d..3 15us+: sched_switch: task irq/60-eth2:2924 [14] (D) ==> sirq-net-rx/4:60 [24]
<...>-12172 0d..4 26us+: sched_wakeup: task cc1:13566 [120] success=0
<...>-13566 7d..4 36us+: sched_wakeup: task cc1:12172 [120] success=0
<...>-60 4d..2 39us+: sched_wakeup: task cc1:12985 [120] success=1
<...>-13300 2d..4 44us+: sched_wakeup: task cc1:12172 [120] success=0
<...>-13566 7d..3 46us+: sched_switch: task cc1:13566 [120] (R) ==> cc1:12985 [120]
<...>-60 4d..2 55us+: sched_wakeup: task cc1:13149 [120] success=1
<...>-60 4d..4 59us+: sched_wakeup: task cc1:12985 [120] success=0
<...>-12985 7d..2 65us : sched_migrate_task: task cc1:13274 [120] from: 4 to: 7
<idle>-0 5d.h3 67us+: sched_wakeup: task sirq-timer/5:71 [29] success=1
<...>-12985 7d..2 68us+: sched_wakeup: task cc1:13274 [120] success=1
<idle>-0 5d..3 70us+: sched_switch: task swapper:0 [140] (R) ==> sirq-timer/5:71 [29]
sirq-tim-71 5d..3 74us+: sched_switch: task sirq-timer/5:71 [29] (D) ==> swapper:0 [140]
<...>-60 4d..2 82us+: sched_wakeup: task cc1:12683 [120] success=1
<...>-60 4d..3 89us+: sched_switch: task sirq-net-rx/4:60 [24] (D) ==> sirq-timer/4:58 [29]
sirq-tim-58 4d..3 93us+: sched_switch: task sirq-timer/4:58 [29] (D) ==> sirq-rcu/4:65 [29]
sirq-rcu-65 4d..3 97us+: sched_switch: task sirq-rcu/4:65 [29] (D) ==> cc1:13355 [120]
<...>-13210 6d..3 103us+: sched_switch: task cc1:13210 [120] (T|0x100) ==> fixdep:13407 [120]
<...>-13407 6d..2 127us+: sched_wakeup: task cc1:13127 [120] success=1
<...>-13407 6d.h3 134us+: sched_wakeup: task sirq-timer/6:84 [29] success=1
<...>-13407 6dNh3 137us+: sched_wakeup: task sirq-rcu/6:91 [29] success=1
<...>-13407 6d..3 141us+: sched_switch: task fixdep:13407 [120] (R) ==> sirq-timer/6:84 [29]
<...>-13300 2d..4 145us : sched_wakeup: task cc1:12172 [120] success=0
sirq-tim-84 6d..3 146us+: sched_switch: task sirq-timer/6:84 [29] (D) ==> sirq-rcu/6:91 [29]
sirq-rcu-91 6d..3 153us+: sched_switch: task sirq-rcu/6:91 [29] (D) ==> genksyms:13194 [120]
<...>-12985 7d.h2 194us+: sched_wakeup: task sirq-timer/7:97 [29] success=1
<...>-13358 1d..3 199us : sched_switch: task genksyms:13358 [120] (D) ==> as:12659 [120]
<...>-12985 7dNh2 199us+: sched_wakeup: task sirq-rcu/7:104 [29] success=1
<...>-12985 7d..3 204us+: sched_switch: task cc1:12985 [120] (R) ==> sirq-timer/7:97 [29]
sirq-tim-97 7d.h1 207us : irq_handler_entry: irq=60 handler=eth2
sirq-tim-97 7d.h1 208us+: irq_handler_exit: irq=60 return=handled
sirq-tim-97 7d.h2 211us+: sched_wakeup: task irq/60-eth2:2924 [14] success=1
sirq-tim-97 7d..3 216us+: sched_switch: task sirq-timer/7:97 [29] (D) ==> sirq-rcu/7:104 [29]
<...>-13355 4d..3 219us : sched_switch: task cc1:13355 [120] (R) ==> irq/60-eth2:2924 [14]
<...>-13355 4d..3 219us : __schedule <-schedule
<...>-13355 4d..3 220us : 13355:120:R ==> [004] 2924: 14:R irq/60-eth2

- Arnaldo

2009-09-06 04:20:10

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]

* Steven Rostedt <[email protected]> wrote:

> Finally (something I should have realized from day 1, not 4 days
> later) I realized I already had a marker. The latency tracers
> always record the timestamp of when the trace began. I can simply
> ignore any trace event that happened before that timestamp. This
> ended up working out very well. The code is quite simple and
> solid.

Yes - that trick came from the original latency tracer and i added
it for that purpose.

The flip side is that if there's anything wrong with timestamps
(and this does happen frequently as well) we suddenly have an empty
trace and no explanation about what happened.

> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/core
>
>
> Steven Rostedt (18):
> ring-buffer: do not reset while in a commit
> ring-buffer: do not swap buffers during a commit
> ring-buffer: remove unnecessary cpu_relax
> ring-buffer: fix ring_buffer_read crossing pages
> ring-buffer: remove ring_buffer_event_discard
> ring-buffer: do not count discarded events
> ring-buffer: disable all cpu buffers when one finds a problem
> tracing: print out start and stop in latency traces
> tracing: disable update max tracer while reading trace
> tracing: disable buffers and synchronize_sched before resetting
> tracing: remove users of tracing_reset
> tracing: use timestamp to determine start of latency traces
> tracing: make tracing_reset safe for external use
> tracing: pass around ring buffer instead of tracer
> tracing: add trace_array_printk for internal tracers to use
> tracing: report error in trace if we fail to swap latency buffer
> ring-buffer: check for swapped buffers in start of committing
> ring-buffer: only enable ring_buffer_swap_cpu when needed
>
> ----
> include/linux/ftrace_event.h | 15 ++-
> include/linux/ring_buffer.h | 23 +--
> include/trace/ftrace.h | 15 ++-
> kernel/trace/Kconfig | 8 +
> kernel/trace/blktrace.c | 12 +-
> kernel/trace/kmemtrace.c | 4 +-
> kernel/trace/ring_buffer.c | 172 +++++++++++++-------
> kernel/trace/trace.c | 297 ++++++++++++++++++++++++----------
> kernel/trace/trace.h | 26 ++--
> kernel/trace/trace_boot.c | 16 +-
> kernel/trace/trace_events.c | 6 +-
> kernel/trace/trace_functions_graph.c | 14 +-
> kernel/trace/trace_irqsoff.c | 3 +-
> kernel/trace/trace_mmiotrace.c | 10 +-
> kernel/trace/trace_power.c | 22 ++-
> kernel/trace/trace_sched_switch.c | 18 ++-
> kernel/trace/trace_sched_wakeup.c | 7 +-
> kernel/trace/trace_syscalls.c | 18 ++-
> 18 files changed, 444 insertions(+), 242 deletions(-)

Pulled, thanks a lot Steve!

We should also think about how to expose the function tracer, and
in particular latency tracing functionality, via perf.

The most natural approach would be to expose it as a 'generic
event' - and simply stream function tracing events as they happen.

The latency tracer itself could be exposed either as a generic
event, or as a software counter. The difference there is that it
needs a per counter 'max latency' state that controls whether a new
trace is sent down the pipe.

This would allow very flexible tooling: for example one could do a
dump of the maximum latency a particular workflow experiences [in a
scheduling invariant manner] - without having to trace the full
system. I.e. individual apps could be validated separately and
latency impact on individual apps could be tracked more accurately
as well.

(beyond the other usecases such as tracking cache-miss rates in
addition to latencies - this would allow the observation of
worst-case cachemisses correlating with large latencies, etc.)

Steve, Frederic, does any of you have interest (and time ;-) to
look into that? It should probably have its own tooling variant,
not be part of 'perf trace' per se (which is the generic tracing
tool).

'perf latencytrace' or so - because latency tracing has its own
special needs (for the same reason do we have a separate latency
tracer ftrace plugin).

Ingo

2009-09-07 10:20:51

by Jan Blunck

[permalink] [raw]

Subject: Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]

On Sat, Sep 5, 2009 at 5:06 PM, Arnaldo Carvalho de Melo<[email protected]> wrote:
>
> I did a quick port to tip/rt/head and couldn't reproduce the problems I
> was experiencing, thanks!
>
> I've put the ported patch series at:
>
> http://userweb.kernel.org/~acme/rostedt-rb-rt
>

Nice, this also seems to fix my problem with the hanging wakup
selftest. At least the system continues to boot. However, I still get
this:

[ 13.739115] Testing tracer sched_switch: PASSED
[ 14.002071] Testing tracer function: PASSED
[ 14.158816] Testing dynamic ftrace: PASSED
[ 14.512524] Testing tracer irqsoff: PASSED
[ 14.693519] Testing tracer preemptoff: PASSED
[ 14.853519] Testing tracer preemptirqsoff: PASSED
[ 15.013506] Testing tracer wakeup:
[ 15.134004] hrtimer: interrupt too slow, forcing clock min delta to
27508383 ns
[ 15.134010] hrtimer: interrupt too slow, forcing clock min delta to
27544629 ns
[ 15.134016] hrtimer: interrupt too slow, forcing clock min delta to
27543582 ns
[ 15.134022] hrtimer: interrupt too slow, forcing clock min delta to
27543372 ns
[ 15.134028] hrtimer: interrupt too slow, forcing clock min delta to
27543162 ns
[ 15.134034] hrtimer: interrupt too slow, forcing clock min delta to
27542325 ns
[ 15.134040] hrtimer: interrupt too slow, forcing clock min delta to
27542952 ns
[ 15.134046] hrtimer: interrupt too slow, forcing clock min delta to
27543579 ns
[ 15.134052] hrtimer: interrupt too slow, forcing clock min delta to
27543582 ns
[ 15.134058] hrtimer: interrupt too slow, forcing clock min delta to
27543789 ns
[ 15.134063] hrtimer: interrupt too slow, forcing clock min delta to
27543372 ns
[ 15.134070] hrtimer: interrupt too slow, forcing clock min delta to
27542532 ns
[ 15.134076] hrtimer: interrupt too slow, forcing clock min delta to
27542745 ns
[ 15.134082] hrtimer: interrupt too slow, forcing clock min delta to
27543372 ns
[ 15.134088] hrtimer: interrupt too slow, forcing clock min delta to
27542742 ns
[ 15.134094] hrtimer: interrupt too slow, forcing clock min delta to
27542325 ns
[ 15.134101] hrtimer: interrupt too slow, forcing clock min delta to
27544419 ns
[ 15.134107] hrtimer: interrupt too slow, forcing clock min delta to
27544419 ns
[ 15.134113] hrtimer: interrupt too slow, forcing clock min delta to
27544209 ns
[ 15.134119] hrtimer: interrupt too slow, forcing clock min delta to
27544419 ns
[ 15.134126] hrtimer: interrupt too slow, forcing clock min delta to
27544419 ns
[ 15.134132] hrtimer: interrupt too slow, forcing clock min delta to
27544422 ns
[ 15.134161] hrtimer: interrupt too slow, forcing clock min delta to
27492249 ns
[ 15.133993] hrtimer: interrupt too slow, forcing clock min delta to
28730742 ns
[ 15.810670] PASSED
[ 15.865758] Testing tracer wakeup_rt: PASSED

And this time I have lockdep really disabled.

Jan

2009-09-07 10:28:45

by Frederic Weisbecker

[permalink] [raw]

Subject: Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]

On Mon, Sep 07, 2009 at 12:20:50PM +0200, Jan Blunck wrote:
> On Sat, Sep 5, 2009 at 5:06 PM, Arnaldo Carvalho de Melo<[email protected]> wrote:
> >
> > I did a quick port to tip/rt/head and couldn't reproduce the problems I
> > was experiencing, thanks!
> >
> > I've put the ported patch series at:
> >
> > http://userweb.kernel.org/~acme/rostedt-rb-rt
> >
>
> Nice, this also seems to fix my problem with the hanging wakup
> selftest. At least the system continues to boot. However, I still get
> this:
>
> [ 13.739115] Testing tracer sched_switch: PASSED
> [ 14.002071] Testing tracer function: PASSED
> [ 14.158816] Testing dynamic ftrace: PASSED
> [ 14.512524] Testing tracer irqsoff: PASSED
> [ 14.693519] Testing tracer preemptoff: PASSED
> [ 14.853519] Testing tracer preemptirqsoff: PASSED
> [ 15.013506] Testing tracer wakeup:
> [ 15.134004] hrtimer: interrupt too slow, forcing clock min delta to
> 27508383 ns

Ah, Ulrich Lukas also reported me such warnings. I would like to
track the reason of these messages (the hrtimer hanging detection
seems to warn too early).

Could you please send me your config, I hope I could reproduce it.

Thanks!

2009-09-07 10:41:59

by Jan Blunck

[permalink] [raw]

Subject: Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]

On Mon, Sep 07, Frederic Weisbecker wrote:

> On Mon, Sep 07, 2009 at 12:20:50PM +0200, Jan Blunck wrote:
> > On Sat, Sep 5, 2009 at 5:06 PM, Arnaldo Carvalho de Melo<[email protected]> wrote:
> > >
> > > I did a quick port to tip/rt/head and couldn't reproduce the problems I
> > > was experiencing, thanks!
> > >
> > > I've put the ported patch series at:
> > >
> > > http://userweb.kernel.org/~acme/rostedt-rb-rt
> > >
> >
> > Nice, this also seems to fix my problem with the hanging wakup
> > selftest. At least the system continues to boot. However, I still get
> > this:
> >
> > [ 13.739115] Testing tracer sched_switch: PASSED
> > [ 14.002071] Testing tracer function: PASSED
> > [ 14.158816] Testing dynamic ftrace: PASSED
> > [ 14.512524] Testing tracer irqsoff: PASSED
> > [ 14.693519] Testing tracer preemptoff: PASSED
> > [ 14.853519] Testing tracer preemptirqsoff: PASSED
> > [ 15.013506] Testing tracer wakeup:
> > [ 15.134004] hrtimer: interrupt too slow, forcing clock min delta to
> > 27508383 ns
>
>
>
> Ah, Ulrich Lukas also reported me such warnings. I would like to
> track the reason of these messages (the hrtimer hanging detection
> seems to warn too early).
>
> Could you please send me your config, I hope I could reproduce it.
>

I only see that on one of my systems, a 32 core Opteron and not on my 8
core. So it seems that size does matter ...

Attachments:

(No filename) (1.36 kB)
hrtimer-slow.config (65.12 kB)
Download all attachments

2009-09-07 11:22:32

by Frederic Weisbecker

[permalink] [raw]

Subject: Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]

On Mon, Sep 07, 2009 at 12:41:55PM +0200, Jan Blunck wrote:
> On Mon, Sep 07, Frederic Weisbecker wrote:
> > Ah, Ulrich Lukas also reported me such warnings. I would like to
> > track the reason of these messages (the hrtimer hanging detection
> > seems to warn too early).
> >
> > Could you please send me your config, I hope I could reproduce it.
> >
>
> I only see that on one of my systems, a 32 core Opteron and not on my 8
> core. So it seems that size does matter ...

Ah then I have few chances to reproduce it with my boxes :-)

I'll try anyway, thanks!

2009-09-08 14:46:35

by Jan Blunck

[permalink] [raw]

Subject: Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]

On Mon, Sep 7, 2009 at 12:20 PM, Jan Blunck<[email protected]> wrote:
> On Sat, Sep 5, 2009 at 5:06 PM, Arnaldo Carvalho de Melo<[email protected]> wrote:
>>
>> I did a quick port to tip/rt/head and couldn't reproduce the problems I
>> was experiencing, thanks!
>>
>> I've put the ported patch series at:
>>
>> http://userweb.kernel.org/~acme/rostedt-rb-rt
>>

Besides the problem with the wakeup tracer I see traces with
preemptirqsoff that I think are invalid:

# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 2.6.31-rc8-rt9-rt_trace
# --------------------------------------------------------------------
# latency: 332 us, #4/4, CPU#24 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:32)
# -----------------
# | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: start_secondary
# => ended at: start_secondary
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<idle>-0 24.N.1 0us : cpu_idle <-start_secondary
<idle>-0 24dN.1 1us!: tick_nohz_stop_idle <-tick_nohz_restart_sched_ti
ck
<idle>-0 24dN.1 332us+: cpu_idle <-start_secondary
<idle>-0 24dN.1 336us : trace_preempt_on <-start_secondary

I have set kernel.ftrace_enabled = 1 but still not a single function
is recorded in tick_nohz_stop_idle().

Any ideas?

2009-09-08 15:39:13

by Steven Rostedt

[permalink] [raw]

Subject: Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]

On Tue, 2009-09-08 at 16:42 +0200, Jan Blunck wrote:
> On Mon, Sep 7, 2009 at 12:20 PM, Jan Blunck<[email protected]> wrote:
> > On Sat, Sep 5, 2009 at 5:06 PM, Arnaldo Carvalho de Melo<[email protected]> wrote:
> >>
> >> I did a quick port to tip/rt/head and couldn't reproduce the problems I
> >> was experiencing, thanks!
> >>
> >> I've put the ported patch series at:
> >>
> >> http://userweb.kernel.org/~acme/rostedt-rb-rt
> >>
>
> Besides the problem with the wakeup tracer I see traces with
> preemptirqsoff that I think are invalid:
>
> # tracer: preemptirqsoff
> #
> # preemptirqsoff latency trace v1.1.5 on 2.6.31-rc8-rt9-rt_trace
> # --------------------------------------------------------------------
> # latency: 332 us, #4/4, CPU#24 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:32)
> # -----------------
> # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
> # -----------------
> # => started at: start_secondary
> # => ended at: start_secondary
> #
> #
> # _------=> CPU#
> # / _-----=> irqs-off
> # | / _----=> need-resched
> # || / _---=> hardirq/softirq
> # ||| / _--=> preempt-depth
> # |||| /
> # ||||| delay
> # cmd pid ||||| time | caller
> # \ / ||||| \ | /
> <idle>-0 24.N.1 0us : cpu_idle <-start_secondary
> <idle>-0 24dN.1 1us!: tick_nohz_stop_idle <-tick_nohz_restart_sched_ti
> ck
> <idle>-0 24dN.1 332us+: cpu_idle <-start_secondary
> <idle>-0 24dN.1 336us : trace_preempt_on <-start_secondary
>
> I have set kernel.ftrace_enabled = 1 but still not a single function
> is recorded in tick_nohz_stop_idle().
>
> Any ideas?

Which arch is this for? When the system goes into idle, it usually
disables preemption and maybe interrupts. But the idle call in assembly
will enable interrupts. And when an interrupt happens it comes out.

There should be callbacks in cpu_idle to disable latency tracing. These
functions are called stop_critical_timings and start_critical_timings.
You'll see these calls in both process_64.c and process_32.c in the x86
arch.

Is you set up not calling cpu_idle?

-- Steve