2007-08-24 17:59:08

by Steven Rostedt

Subject: [PATCH RT 3/3] fix get_monotonic_cycles for latency tracer

The latency tracer on SMP was giving crazy results. It was found that the
get_monotonic_cycles function it uses was not returning a monotonic counter.
The cause of this was that clock->cycle_raw and clock->cycle_last can
be updated on another CPU, making the cycle_now variable out-of-date.
So the delta that was calculated from cycle_now - cycle_last was
incorrect.

This patch adds a loop to make sure that cycle_raw and cycle_last
are consistent throughout the calculation (otherwise it performs the
loop again).

With this patch the latency_tracer can produce normal results again.

Signed-off-by: Steven Rostedt <[email protected]>

Index: linux-2.6-rt/kernel/time/timekeeping.c
===================================================================
--- linux-2.6-rt.orig/kernel/time/timekeeping.c 2007-08-24 11:41:04.000000000 -0400
+++ linux-2.6-rt/kernel/time/timekeeping.c 2007-08-24 11:47:01.000000000 -0400
@@ -75,15 +75,30 @@ s64 __get_nsec_offset(void)

cycle_t notrace get_monotonic_cycles(void)
{
- cycle_t cycle_now, cycle_delta;
+ cycle_t cycle_now, cycle_delta, cycle_raw, cycle_last;

- /* read clocksource: */
- cycle_now = clocksource_read(clock);
+ do {
+ /*
+ * cycle_raw and cycle_last can change on
+ * another CPU and we need the delta calculation
+ * of cycle_now and cycle_last to happen atomically, as well
+ * as the adding to cycle_raw. We don't need to grab
+ * any locks, we just keep trying until we get all the
+ * calculations together in one state.
+ */
+ cycle_raw = clock->cycle_raw;
+ cycle_last = clock->cycle_last;
+
+ /* read clocksource: */
+ cycle_now = clocksource_read(clock);
+
+ /* calculate the delta since the last update_wall_time: */
+ cycle_delta = (cycle_now - cycle_last) & clock->mask;

- /* calculate the delta since the last update_wall_time: */
- cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
+ } while (cycle_raw != clock->cycle_raw ||
+ cycle_last != clock->cycle_last);

- return clock->cycle_raw + cycle_delta;
+ return cycle_raw + cycle_delta;
}

unsigned long notrace cycles_to_usecs(cycle_t cycles)



2007-08-24 18:31:20

by john stultz

Subject: Re: [PATCH RT 3/3] fix get_monotonic_cycles for latency tracer

On Fri, 2007-08-24 at 13:57 -0400, Steven Rostedt wrote:
> The latency tracer on SMP was giving crazy results. It was found that the
> get_monotonic_cycles function it uses was not returning a monotonic counter.
> The cause of this was that clock->cycle_raw and clock->cycle_last can
> be updated on another CPU, making the cycle_now variable out-of-date.
> So the delta that was calculated from cycle_now - cycle_last was
> incorrect.
>
> This patch adds a loop to make sure that cycle_raw and cycle_last
> are consistent throughout the calculation (otherwise it performs the
> loop again).
>
> With this patch the latency_tracer can produce normal results again.

Ah! good catch. I totally missed that get_monotonic_cycles was being
called outside of the xtime lock.
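For anyone following along, the race is roughly this (a sketch of the
writer side as it presumably runs under xtime_lock in the -rt tree, not
the exact code):

	write_seqlock_irqsave(&xtime_lock, flags);

	cycle_now   = clocksource_read(clock);
	cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;

	clock->cycle_raw  += cycle_delta;	/* (1) */
	clock->cycle_last  = cycle_now;		/* (2) */

	write_sequnlock_irqrestore(&xtime_lock, flags);

A lockless reader whose load of clock->cycle_raw lands after (1) but
whose load of clock->cycle_last lands before (2) adds the new cycle_raw
to a delta computed against the stale cycle_last, counting that interval
twice, so its "monotonic" value jumps ahead of later, clean reads.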


> Signed-off-by: Steven Rostedt <[email protected]>
>
> Index: linux-2.6-rt/kernel/time/timekeeping.c
> ===================================================================
> --- linux-2.6-rt.orig/kernel/time/timekeeping.c 2007-08-24 11:41:04.000000000 -0400
> +++ linux-2.6-rt/kernel/time/timekeeping.c 2007-08-24 11:47:01.000000000 -0400
> @@ -75,15 +75,30 @@ s64 __get_nsec_offset(void)
>
> cycle_t notrace get_monotonic_cycles(void)
> {
> - cycle_t cycle_now, cycle_delta;
> + cycle_t cycle_now, cycle_delta, cycle_raw, cycle_last;
>
> - /* read clocksource: */
> - cycle_now = clocksource_read(clock);
> + do {
> + /*
> + * cycle_raw and cycle_last can change on
> + * another CPU and we need the delta calculation
> + * of cycle_now and cycle_last to happen atomically, as well
> + * as the adding to cycle_raw. We don't need to grab
> + * any locks, we just keep trying until we get all the
> + * calculations together in one state.
> + */
> + cycle_raw = clock->cycle_raw;
> + cycle_last = clock->cycle_last;
> +
> + /* read clocksource: */
> + cycle_now = clocksource_read(clock);
> +
> + /* calculate the delta since the last update_wall_time: */
> + cycle_delta = (cycle_now - cycle_last) & clock->mask;
>
> - /* calculate the delta since the last update_wall_time: */
> - cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
> + } while (cycle_raw != clock->cycle_raw ||
> + cycle_last != clock->cycle_last);

So if I'm understanding this right, not taking a lock isn't an
optimization (as the seq read lock code is almost the same), but a
requirement (as this might be called while xtime_lock is held),
correct?

Might want to clarify that a bit in the comment.


> - return clock->cycle_raw + cycle_delta;
> + return cycle_raw + cycle_delta;
> }
>
> unsigned long notrace cycles_to_usecs(cycle_t cycles)

Otherwise:
Acked-by: John Stultz <[email protected]>

thanks
-john


2007-08-24 18:57:14

by Steven Rostedt

Subject: Re: [PATCH RT 3/3] fix get_monotonic_cycles for latency tracer



--
On Fri, 24 Aug 2007, john stultz wrote:

> >
> > - /* calculate the delta since the last update_wall_time: */
> > - cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
> > + } while (cycle_raw != clock->cycle_raw ||
> > + cycle_last != clock->cycle_last);
>
> So if I'm understanding this right, not taking a lock isn't an
> optimization (as the seq read lock code is almost the same), but a
> requirement (as this might be called while xtime_lock is held),
> correct?

Exactly. The latency tracer uses this so no locks can be grabbed
(seq_locks included).
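For example (illustration only, not actual code): if the tracer fires
from a path that already holds the write side of xtime_lock, a seqlock
read loop in get_monotonic_cycles() can never make progress:

	write_seqlock_irqsave(&xtime_lock, flags);
	...
	/* tracer hook fires here and calls get_monotonic_cycles() */
	do {
		seq = read_seqbegin(&xtime_lock);  /* spins while a write is in flight */
		/* ... */
	} while (read_seqretry(&xtime_lock, seq));
	...
	write_sequnlock_irqrestore(&xtime_lock, flags);

The write in progress is on this very CPU and can never complete, so
that's an instant deadlock.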

>
> Might want to clarify that a bit in the comment.
>

OK, will do and send an updated patch.

I wasn't in the best state writing this patch in the wee hours of the
night ;-)

-- Steve

2007-08-24 19:03:52

by Steven Rostedt

Subject: [PATCH RT 3/3 - take two ] fix get_monotonic_cycles for latency tracer

[Added comment about not being able to take the xtime lock in
get_monotonic_cycles - suggested by John Stultz]

The latency tracer on SMP was giving crazy results. It was found that the
get_monotonic_cycles function it uses was not returning a monotonic counter.
The cause of this was that clock->cycle_raw and clock->cycle_last can
be updated on another CPU, making the cycle_now variable out-of-date.
So the delta that was calculated from cycle_now - cycle_last was
incorrect.

This patch adds a loop to make sure that cycle_raw and cycle_last
are consistent throughout the calculation (otherwise it performs the
loop again).

With this patch the latency_tracer can produce normal results again.

Signed-off-by: Steven Rostedt <[email protected]>

Index: linux-2.6-rt/kernel/time/timekeeping.c
===================================================================
--- linux-2.6-rt.orig/kernel/time/timekeeping.c 2007-08-24 13:41:28.000000000 -0400
+++ linux-2.6-rt/kernel/time/timekeeping.c 2007-08-24 14:58:02.000000000 -0400
@@ -75,15 +75,35 @@ s64 __get_nsec_offset(void)

cycle_t notrace get_monotonic_cycles(void)
{
- cycle_t cycle_now, cycle_delta;
+ cycle_t cycle_now, cycle_delta, cycle_raw, cycle_last;

- /* read clocksource: */
- cycle_now = clocksource_read(clock);
+ do {
+ /*
+ * cycle_raw and cycle_last can change on
+ * another CPU and we need the delta calculation
+ * of cycle_now and cycle_last to happen atomically, as well
+ * as the adding to cycle_raw. We don't need to grab
+ * any locks, we just keep trying until we get all the
+ * calculations together in one state.
+ *
+ * In fact, we __can't__ grab any locks. This
+ * function is called from the latency_tracer, which can
+ * be called anywhere. If we grab any locks (including
+ * seq_locks) we risk putting ourselves into a deadlock.
+ */
+ cycle_raw = clock->cycle_raw;
+ cycle_last = clock->cycle_last;
+
+ /* read clocksource: */
+ cycle_now = clocksource_read(clock);
+
+ /* calculate the delta since the last update_wall_time: */
+ cycle_delta = (cycle_now - cycle_last) & clock->mask;

- /* calculate the delta since the last update_wall_time: */
- cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
+ } while (cycle_raw != clock->cycle_raw ||
+ cycle_last != clock->cycle_last);

- return clock->cycle_raw + cycle_delta;
+ return cycle_raw + cycle_delta;
}

unsigned long notrace cycles_to_usecs(cycle_t cycles)


2007-08-25 11:07:59

by Frank Ch. Eigler

Subject: Re: [PATCH RT 3/3 - take two ] fix get_monotonic_cycles for latency tracer


Steven Rostedt <[email protected]> writes:

> [...]
> + * [...] We don't need to grab
> + * any locks, we just keep trying until we get all the
> + * calculations together in one state.
> + *
> + * In fact, we __can't__ grab any locks. This
> + * function is called from the latency_tracer, which can
> + * be called anywhere. If we grab any locks (including
> + * seq_locks) we risk putting ourselves into a deadlock.

Perhaps you could add a comment about why the loop, which appears
potentially infinite as written, avoids livelock. (It looks rather
like a seqlock read loop.)
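For comparison, the canonical seqlock read side is:

	do {
		seq = read_seqbegin(&xtime_lock);
		/* read the protected data */
	} while (read_seqretry(&xtime_lock, seq));

which has the same retry-until-quiet shape; a sentence on why repeated
writer activity can't starve your open-coded variant would make the
intent clearer.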

- FChE

2007-08-26 01:27:48

by Steven Rostedt

Subject: Re: [PATCH RT 3/3 - take two ] fix get_monotonic_cycles for latency tracer


--
On Sat, 25 Aug 2007, Frank Ch. Eigler wrote:

>
> Steven Rostedt <[email protected]> writes:
>
> > [...]
> > + * [...] We don't need to grab
> > + * any locks, we just keep trying until we get all the
> > + * calculations together in one state.
> > + *
> > + * In fact, we __can't__ grab any locks. This
> > + * function is called from the latency_tracer, which can
> > + * be called anywhere. If we grab any locks (including
> > + * seq_locks) we risk putting ourselves into a deadlock.
>
> Perhaps you could add a comment about why the loop, which appears
> potentially infinite as written, avoids livelock. (It looks rather
> like a seqlock read loop.)
>

I guess I need to rewrite that comment. It shouldn't appear to loop
infinitely, since it is basically: do { x = A; func(); } while (x != A);
and to me the while condition will almost always be false unless
something touches A in between.
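Spelled out a bit more (illustration only):

	do {
		x = A;		/* snapshot the shared state */
		func();		/* a handful of reads, no waiting */
	} while (x != A);	/* retry only if A changed underneath us */

Here A is clock->cycle_raw/cycle_last, which normally only change from
the timer tick (update_wall_time), so a retry is rare and back-to-back
retries are practically impossible.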

But yes, it _is_ basically a seq lock, but it's on the data we are working
with. And we don't even need the memory barriers that a seq lock would
use, since it is very obvious to gcc that the function call can modify the
variables we are testing.

But do you still think that looks infinite? If so, I'll reword it.

-- Steve