Thomas Gleixner wrote:
> On Fri, 29 May 2009, Zhaolei wrote:
>> But, for trace_timer_start() in __mod_timer(), we need to put it after
>> timer->* has changed.
>
> Why?
>
>>> +	TP_fast_assign(
>>> +		__entry->timer    = timer;
>>> +		__entry->function = timer->function;
>>> +		__entry->expires  = timer->expires;
>>> +		__entry->cpu      = cpu;
>
> Again, neither timer nor function nor expires will change when the
> timer is added, right?
>
> The only unknown at this point is cpu. See below.
>
>> Nevertheless, it doesn't mean we need to separate trace_timer_start() and
>> debug_timer_activate(), because we can move debug_timer_activate() down,
>> like this:
>> -	debug_timer_activate(timer);
>> 	...
>> 	timer->expires = expires;
>> 	internal_add_timer(base, timer);
>> +	debug_timer_activate(timer);
>
> No, you can not call it with the base->lock held.
>
>> + trace_timer_start(timer, smp_processor_id());
>
> Also using smp_processor_id() here is wrong. We do not necessarily add
> the timer to the current CPUs timer wheel. See the code which selects
> the timer base. So this information is rather useless, because the
> tracer knows anyway on which CPU we are running.
>
> Unfortunately we do not have an easy way to figure out to which CPU
> the base belongs (except if it's the base of the current CPU). There
> is not much we can do about that. But OTOH, this is not a problem
> because we see when the timer expires on which CPU it was enqueued. So
> scrapping the cpu entry in the trace completely is not a big loss.
>
> The same applies to hrtimers as well.
>
Hi tglx:
I have a different view here. :-)
As you say, "We do not necessarily add the timer to the current CPUs timer
wheel", but in __mod_timer() the timer is added to the current CPU; it
selects the timer base with the code below:
	new_base = __get_cpu_var(tvec_bases);
In this case, we can use smp_processor_id() to get the CPU the timer is
added to.
We do not necessarily add the timer to the current CPU when using add_timer_on();
that function selects the timer base with the code below:
	struct tvec_base *base = per_cpu(tvec_bases, cpu);
In this case, we know the timer is added to 'cpu'.
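For reference, the relevant part of add_timer_on() looks roughly like this
(a trimmed sketch of the timer code of that era, debug hooks omitted;
details may differ between versions):

	/* kernel/timer.c: enqueue a timer on a specific CPU's wheel */
	void add_timer_on(struct timer_list *timer, int cpu)
	{
		/* pick the target CPU's base explicitly, not the local one */
		struct tvec_base *base = per_cpu(tvec_bases, cpu);
		unsigned long flags;

		BUG_ON(timer_pending(timer) || !timer->function);
		spin_lock_irqsave(&base->lock, flags);
		timer_set_base(timer, base);
		internal_add_timer(base, timer);
		spin_unlock_irqrestore(&base->lock, flags);
	}

Here the enqueue CPU is known explicitly: it is the 'cpu' argument.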
So, I added trace_timer_start() in both __mod_timer() and add_timer_on() in
my patch.
For hrtimers, every timer is added to the current CPU, which can be obtained
with smp_processor_id() in the probe function, so there is no 'cpu' argument
in my patch.
In addition, we had better not put trace_timer_start() and
debug_timer_activate() in one function, for two reasons:
1: By trace_timer_start()'s logic, the timer start event is completed in
   internal_add_timer(); in other words, the timer is not started before
   internal_add_timer().
2: As Zhaolei said in the last mail, the timer's data may be changed after
   debug_timer_activate().
Thanks,
Xiao Guangrong
On Wed, 3 Jun 2009, Xiao Guangrong wrote:
> Hi tglx:
>
> I have a different view here. :-)
>
> As you say, "We do not necessarily add the timer to the current CPUs timer
> wheel", but in __mod_timer() the timer is added to the current CPU; it
> selects the timer base with the code below:
> 	new_base = __get_cpu_var(tvec_bases);
> In this case, we can use smp_processor_id() to get the CPU the timer is
> added to.
No, this is _wrong_. You need to look at the full source. I added some
extra comments:
	new_base = __get_cpu_var(tvec_bases);

	if (base != new_base) {
+		/* current timer base is on a different CPU */
		/*
		 * We are trying to schedule the timer on the local CPU.
		 * However we can't change timer's base while it is running,
		 * otherwise del_timer_sync() can't detect that the timer's
		 * handler yet has not finished. This also guarantees that
		 * the timer is serialized wrt itself.
		 */
		if (likely(base->running_timer != timer)) {
			/* See the comment in lock_timer_base() */
			timer_set_base(timer, NULL);
			spin_unlock(&base->lock);
			base = new_base;
			spin_lock(&base->lock);
			timer_set_base(timer, base);
-		}
+		} else {
+			/*
+			 * We know that the callback is running on a
+			 * different CPU and we need to keep the base
+			 * unchanged, so smp_processor_id() would be
+			 * telling you the wrong information.
+			 */
+		}
> We do not necessarily add the timer to the current CPU when using add_timer_on();
We can add the timer to the current CPU by using add_timer_on() as well.
> that function selects the timer base with the code below:
> 	struct tvec_base *base = per_cpu(tvec_bases, cpu);
> In this case, we know the timer is added to 'cpu'.
>
> So, I added trace_timer_start() in both __mod_timer() and add_timer_on() in my patch.
Still, in the __mod_timer() case the tracing info can be wrong, and
tracing wrong information is worse than tracing no information.
Your patch could result in a trace which confuses the hell out of
people looking at it:
....... 0..... activate_timer on cpu 0
some time later
....... 2..... expire timer on cpu 2
And the guy who needs to analyse that trace would rip his hair out
trying to find out how the timer moved from cpu 0 to cpu 2.
> For hrtimers, every timer is added to the current CPU, which can be obtained
> with smp_processor_id() in the probe function, so there is no 'cpu' argument in my patch.
Wrong again. Read the code in switch_hrtimer_base(). It does the
same thing as the timer base selection logic in __mod_timer().
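For reference, the relevant logic in switch_hrtimer_base() looks roughly
like this (a trimmed sketch; details may differ between versions):

	/* kernel/hrtimer.c: try to move the hrtimer to the local CPU's base */
	new_cpu_base = &__get_cpu_var(hrtimer_bases);
	new_base = &new_cpu_base->clock_base[base->index];

	if (base != new_base) {
		/*
		 * If the callback is currently running on another CPU we
		 * must keep the old base, so here as well the enqueue CPU
		 * is not necessarily the current CPU.
		 */
		if (unlikely(hrtimer_callback_running(timer)))
			return base;
		/* ... otherwise retarget the timer to new_base ... */
	}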
> In addition, we had better not put trace_timer_start() and debug_timer_activate()
> in one function, for two reasons:
> 1: By trace_timer_start()'s logic, the timer start event is completed in
>    internal_add_timer(); in other words, the timer is not started before
>    internal_add_timer().
> Oh well. So what is the difference between tracing it before or after the
> list add happened? That's completely irrelevant.
> 2: As Zhaolei said in the last mail, the timer's data may be changed after
>    debug_timer_activate().
Really? What is going to change? Nothing in the normal case; if the
timer is active, it is removed first. Also it depends on
how you do this:
	void debug_and_trace_timer_activate(....)
	{
		debug_timer_activate(...);
		trace_timer_activate(...);
	}

in the timer code:

- debug_timer_activate(...);
+ debug_and_trace_timer_activate(...);
So this does not change the order of functions at all, but it avoids
sprinkling the real code with tons of extra crap.
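Concretely, the combined helper could look like this (a sketch; the exact
tracepoint name and signature are up to the patch, here assumed to take
the timer and the new expiry value):

	/* wraps the existing debug hook and the new tracepoint in one call */
	static inline void
	debug_and_trace_timer_activate(struct timer_list *timer,
				       unsigned long expires)
	{
		debug_timer_activate(timer);
		trace_timer_start(timer, expires);
	}

The call site in the timer code then stays a one-liner.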
Thanks,
tglx
Thomas Gleixner wrote:
> No, this is _wrong_. You need to look at the full source. I added some
> extra comments:
>
> 	new_base = __get_cpu_var(tvec_bases);
>
> 	if (base != new_base) {
> +		/* current timer base is on a different CPU */
> 		/*
> 		 * We are trying to schedule the timer on the local CPU.
> 		 * However we can't change timer's base while it is running,
> 		 * otherwise del_timer_sync() can't detect that the timer's
> 		 * handler yet has not finished. This also guarantees that
> 		 * the timer is serialized wrt itself.
> 		 */
> 		if (likely(base->running_timer != timer)) {
> 			/* See the comment in lock_timer_base() */
> 			timer_set_base(timer, NULL);
> 			spin_unlock(&base->lock);
> 			base = new_base;
> 			spin_lock(&base->lock);
> 			timer_set_base(timer, base);
> -		}
> +		} else {
> +			/*
> +			 * We know that the callback is running on a
> +			 * different CPU and we need to keep the base
> +			 * unchanged, so smp_processor_id() would be
> +			 * telling you the wrong information.
> +			 */
> +		}
>
>> We do not necessarily add the timer to the current CPU when using add_timer_on();
>
> We can add the timer to the current CPU by using add_timer_on() as well.
>
>> that function selects the timer base with the code below:
>> 	struct tvec_base *base = per_cpu(tvec_bases, cpu);
>> In this case, we know the timer is added to 'cpu'.
>>
>> So, I added trace_timer_start() in both __mod_timer() and add_timer_on() in my patch.
>
> Still, in the __mod_timer() case the tracing info can be wrong, and
> tracing wrong information is worse than tracing no information.
>
> Your patch could result in a trace which confuses the hell out of
> people looking at it:
>
> ....... 0..... activate_timer on cpu 0
>
> some time later
>
> ....... 2..... expire timer on cpu 2
>
> And the guy who needs to analyse that trace would rip his hair out
> trying to find out how the timer moved from cpu 0 to cpu 2.
>
>> For hrtimers, every timer is added to the current CPU, which can be obtained
>> with smp_processor_id() in the probe function, so there is no 'cpu' argument in my patch.
>
> Wrong again. Read the code in switch_hrtimer_base(). It does the
> same thing as the timer base selection logic in __mod_timer().
>
Hi tglx:
Thanks for correcting my mistake again. :-)
Since it's hard to detect which CPU the timer is added to, we can remove the
'cpu' parameter from trace_timer_start() as you suggest, like this:
TRACE_EVENT(timer_start,

	TP_PROTO(struct timer_list *timer),

	TP_ARGS(timer),

	TP_STRUCT__entry(
		__field( void *,        timer    )
		__field( void *,        function )
		__field( unsigned long, expires  )
	),
	......
);
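The elided parts could be filled in along these lines (a sketch, not the
final patch; the assignments follow the earlier patch fragment minus the
cpu field, and the TP_printk format string is an assumption):

	TP_fast_assign(
		__entry->timer    = timer;
		__entry->function = timer->function;
		__entry->expires  = timer->expires;
	),

	TP_printk("timer=%p func=%pf expires=%lu",
		  __entry->timer, __entry->function, __entry->expires)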
>> In addition, we had better not put trace_timer_start() and debug_timer_activate()
>> in one function, for two reasons:
>> 1: By trace_timer_start()'s logic, the timer start event is completed in
>>    internal_add_timer(); in other words, the timer is not started before
>>    internal_add_timer().
>
> Oh well. So what is the difference between tracing it before or after the
> list add happened? That's completely irrelevant.
>
Yes, maybe it's not important.
>> 2: As Zhaolei said in the last mail, the timer's data may be changed after
>>    debug_timer_activate().
>
> Really? What is going to change? Nothing in the normal case; if the
> timer is active, it is removed first. Also it depends on
> how you do this:
>
> 	void debug_and_trace_timer_activate(....)
> 	{
> 		debug_timer_activate(...);
> 		trace_timer_activate(...);
> 	}
>
> in the timer code:
>
> - debug_timer_activate(...);
> + debug_and_trace_timer_activate(...);
>
> So this does not change the order of functions at all, but it avoids
> sprinkling the real code with tons of extra crap.
>
See the code below:

	static inline int
	__mod_timer(......)
	{
		......
		......
		debug_timer_activate(timer);
		new_base = __get_cpu_var(tvec_bases);
		......
		......
		timer->expires = expires;	/* (*) */
		internal_add_timer(base, timer);
		trace_timer_start(...);
		......
	}

(This example is from Zhaolei's reply.)
timer->expires can be changed at (*), so if we put trace_timer_start() and
debug_timer_activate() together, we can't get the right value of timer->expires.
In addition, do you agree with my humble opinion about not putting __init_timer()
and debug_timer_init() together? (It can be found at:
http://marc.info/?l=linux-kernel&m=124399744614127&w=2)
If you agree with it, we had better detach the other events too.
Thanks,
Xiao Guangrong
On Thu, 4 Jun 2009, Xiao Guangrong wrote:
> > 	void debug_and_trace_timer_activate(....)
> > 	{
> > 		debug_timer_activate(...);
> > 		trace_timer_activate(...);
> > 	}
> >
> > in the timer code:
> >
> > - debug_timer_activate(...);
> > + debug_and_trace_timer_activate(...);
> >
> > So this does not change the order of functions at all, but it avoids
> > sprinkling the real code with tons of extra crap.
> >
>
> See the code below:
>
> 	static inline int
> 	__mod_timer(......)
> 	{
> 		......
> 		......
>
- debug_timer_activate(timer);
+ debug_and_trace_timer_activate(timer, expires);
Does the trick nicely.
> In addition, do you agree with my humble opinion about not putting __init_timer()
> and debug_timer_init() together? (It can be found at:
No, I do not agree at all. I still see no value in those extra trace
points. As I showed you above, you can hide it all in extra arguments to a
combined debug_trace_xxx() function.
> http://marc.info/?l=linux-kernel&m=124399744614127&w=2)
>> But TRACE_EVENT is not only used by ftrace but also by other probe
>> modules. Maybe detailed timer information is required in some other
>> probe function. In this case, we must put trace_timer_init() after
>> __init_timer().
Please stop this handwaving about potential use cases. I still have
not seen a single technical argument which explains the benefit of
those extra trace points. If we follow your "maybe it's needed" logic
then we have to add yet another dozen tracepoints in the same
functions to gather all possible states which might be there.
Tracing is important, but it needs to be unintrusive to the code, and
you have to apply common sense about which information is really
valuable and necessary. Gathering random crap just because it might be
necessary at some point is useless.
Get your gear together, sit down and think hard about which
information a tracer or a probe needs to have to give us a useful
insight into a particular object or function.
As long as there is no convincing technical argument that the gathered
information is really useful, I'm not going to apply any of those
patches to the code I maintain.
Thanks,
tglx
Hi Thomas:
Sorry for the delayed reply.
> Please stop this handwaving about potential use cases. I still have
> not seen a single technical argument which explains the benefit of
> those extra trace points. If we follow your "maybe it's needed" logic
> then we have to add yet another dozen tracepoints in the same
> functions to gather all possible states which might be there.
>
> Tracing is important, but it needs to be unintrusive to the code, and
> you have to apply common sense about which information is really
> valuable and necessary. Gathering random crap just because it might be
> necessary at some point is useless.
>
I see your point, and I agree with you on this.
> Get your gear together, sit down and think hard about which
> information a tracer or a probe needs to have to give us a useful
> insight into a particular object or function.
>
> As long as there is no convincing technical argument that the gathered
> information is really useful, I'm not going to apply any of those
> patches to the code I maintain.
>
Those tracepoints are wanted and useful:
1) We can detect a timer's delay if jiffies info is added, like this:
   XXX: timer_start: timer=e0b374e0 func=test_timerfuc expires=4294941565
   XXX: timer_expire: timer=e0b374e0 func=test_timerfuc jiffies=4294941567
   We expect the timer to expire at 4294941565, but it actually expires at
   4294941567, so it is delayed by 2 jiffies.
2) We can monitor the lifecycle and behavior of a timer. For example, when
   monitoring the dirty writeback timer, I found that reading
   /proc/sys/vm/dirty_writeback_centisecs resets the timer:
pdflush-244 [000] 0.554131: timer_start: timer=c077ef20 func=wb_timer_fn expires=2757949
<idle>-0 [000] 5.544769: timer_expire: timer=c077ef20 func=wb_timer_fn
<idle>-0 [000] 5.544770: timer_cancel: timer=c077ef20 func=wb_timer_fn
pdflush-244 [000] 5.544807: timer_start: timer=c077ef20 func=wb_timer_fn expires=2762949
cat-2326 [000] 5.618852: timer_cancel: timer=c077ef20 func=wb_timer_fn
cat-2326 [000] 6.618858: timer_start: timer=c077ef20 func=wb_timer_fn expires=2764025
3) We are developing "flight-record" using crash. It can read out the ring
   buffer from the dump file and let us know what was happening before a
   kernel panic, so we may find the cause of the panic. This requires
   well-defined tracepoints in different kernel subsystems, especially the
   core subsystems, and the timer is surely one of them.
I'll revise those patches according to your suggestions.
Thanks,
Xiao Guangrong
Xiao,
On Wed, 10 Jun 2009, Xiao Guangrong wrote:
> Those tracepoints are wanted and useful:
> 1) We can detect a timer's delay if jiffies info is added, like this:
>
>    XXX: timer_start: timer=e0b374e0 func=test_timerfuc expires=4294941565
>    XXX: timer_expire: timer=e0b374e0 func=test_timerfuc jiffies=4294941567
>    We expect the timer to expire at 4294941565, but it actually expires at
>    4294941567, so it is delayed by 2 jiffies.
You can achieve the same thing by storing (expires - jiffies) when you
trace the timer start. Then you can calculate the delta from the trace
timestamps, but I have no strong opinion about that.
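A sketch of that variant (the 'timeout' field name is hypothetical; it
records the relative timeout at enqueue time instead of the absolute
expiry):

	TP_fast_assign(
		__entry->timer    = timer;
		__entry->function = timer->function;
		/* relative timeout: how far in the future the timer is set */
		__entry->timeout  = timer->expires - jiffies;
	),

The delay at expiry can then be derived from the trace timestamps alone.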
> 2) We can monitor the lifecycle and behavior of a timer. For
>    example, when monitoring the dirty writeback timer, I found that reading
>    /proc/sys/vm/dirty_writeback_centisecs resets the timer:
> 3) We are developing "flight-record" using crash. It can read out the ring
>    buffer from the dump file and let us know what was happening before a
>    kernel panic, so we may find the cause of the panic. This requires
>    well-defined tracepoints in different kernel subsystems, especially the
>    core subsystems, and the timer is surely one of them.
I'm not arguing that tracepoints are not useful, but you can achieve
all of the things above by combining them with the existing debug points
in an unintrusive way.
This results in minimal noise from debug/trace insertions in the
code, which keeps it readable and maintainable. That's all I want.
Thanks,
tglx