2009-09-16 20:23:35

by john stultz

[permalink] [raw]
Subject: Re: [PATCH 2/3] ftrace: add tracepoint for xtime

On Tue, Aug 25, 2009 at 1:14 AM, Zhaolei <[email protected]> wrote:
> From: Xiao Guangrong <[email protected]>
>
> This patch can trace current xtime and wall_to_monotonic. Then user can
> use ctime() to convert them to wall time which is easier to be understood,
> especially for flight-recorder which need to get trace event from a kernel
> dump file.
>
> Example of ftrace output:
> ? ? ? ? ?<idle>-0 ? ? [000] 20118.489849: gtod_update_xtime: xtime=1243265589.999999824 wall_to_monotonic=3051713268.744158739
> ? ? ? ? ? <...>-4020 ?[001] 20118.489855: sys_open(filename: bf9e66e0, flags: 98800, mode: bf9e66e0)
> ? ? ? ? ? <...>-4020 ?[001] 20118.489873: sys_open -> 0xffffffec
>
> ctime(1243265590) = date:Mon May 25 11:33:10 2009
> So we can realize the task with pid 4020 open a file at
> Mon May 25 11:33:10 2009
>
> Changelog:
> v1->v2: Rebased by Zhao Lei <[email protected]>
>
> Signed-off-by: Zhao Lei <[email protected]>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> ?include/trace/events/xtime.h | ? 38 ++++++++++++++++++++++++++++++++++++++
> ?kernel/time/ntp.c ? ? ? ? ? ?| ? ?4 ++++
> ?kernel/time/timekeeping.c ? ?| ? ?6 ++++++
> ?3 files changed, 48 insertions(+), 0 deletions(-)
> ?create mode 100644 include/trace/events/xtime.h
>
> diff --git a/include/trace/events/xtime.h b/include/trace/events/xtime.h
> new file mode 100644
> index 0000000..398e679
> --- /dev/null
> +++ b/include/trace/events/xtime.h
> @@ -0,0 +1,38 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM xtime
> +
> +#if !defined(_TRACE_XTIME_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_XTIME_H
> +
> +#include <linux/time.h>
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(gtod_update_xtime,
> +
> + ? ? ? TP_PROTO(struct timespec *xtime, struct timespec *wall_to_monotonic),
> +
> + ? ? ? TP_ARGS(xtime, wall_to_monotonic),
> +
> + ? ? ? TP_STRUCT__entry(
> + ? ? ? ? ? ? ? __field( ? ? ? ?long, ? xtime_sec ? ? ? ? ? ? ? )
> + ? ? ? ? ? ? ? __field( ? ? ? ?long, ? xtime_nsec ? ? ? ? ? ? ?)
> + ? ? ? ? ? ? ? __field( ? ? ? ?long, ? wall_to_monotonic_sec ? )
> + ? ? ? ? ? ? ? __field( ? ? ? ?long, ? wall_to_monotonic_nsec ?)
> + ? ? ? ),
> +
> + ? ? ? TP_fast_assign(
> + ? ? ? ? ? ? ? __entry->xtime_sec ? ? ? ? ? ? ?= xtime->tv_sec;
> + ? ? ? ? ? ? ? __entry->xtime_nsec ? ? ? ? ? ? = xtime->tv_nsec;
> + ? ? ? ? ? ? ? __entry->wall_to_monotonic_sec ?= wall_to_monotonic->tv_sec;
> + ? ? ? ? ? ? ? __entry->wall_to_monotonic_nsec = wall_to_monotonic->tv_nsec;
> + ? ? ? ),
> +
> + ? ? ? TP_printk("xtime=%ld.%09ld wall_to_monotonic=%ld.%09ld",
> + ? ? ? ? ? ? ? ? __entry->xtime_sec, __entry->xtime_nsec,
> + ? ? ? ? ? ? ? ? __entry->wall_to_monotonic_sec, __entry->wall_to_monotonic_nsec)
> +);
> +
> +#endif /* _TRACE_XTIME_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index 4800f93..fc2f13a 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -15,6 +15,8 @@
> ?#include <linux/time.h>
> ?#include <linux/mm.h>
>
> +#include <trace/events/xtime.h>
> +
> ?/*
> ?* NTP timekeeping variables:
> ?*/
> @@ -218,6 +220,8 @@ static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
> ? ? ? ? ? ? ? ?break;
> ? ? ? ?}
>
> + ? ? ? trace_gtod_update_xtime(&xtime, &wall_to_monotonic);
> +
> ? ? ? ?write_sequnlock(&xtime_lock);
>
> ? ? ? ?return res;
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 03cbeb3..2e57a87 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -20,6 +20,9 @@
> ?#include <linux/tick.h>
> ?#include <linux/stop_machine.h>
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/xtime.h>
> +
> ?/* Structure holding internal timekeeping values. */
> ?struct timekeeper {
> ? ? ? ?/* Current clocksource used for timekeeping. */
> @@ -338,6 +341,8 @@ int do_settimeofday(struct timespec *tv)
>
> ? ? ? ?update_vsyscall(&xtime, timekeeper.clock);
>
> + ? ? ? trace_gtod_update_xtime(&xtime, &wall_to_monotonic);
> +

So the only thing to watch out on here is that xtime doesn't hold the
current time, but the accumulated time. There is some unaccumulated
time still kept in the clocksource structure.

You probably want (assuming you only need tick granularity time) to
use current_kernel_time().

As an aside, is there a reason you have to have update callbacks and
duplicate the time storage instead of using the existing interfaces?
(ie: Is this due to locking or something else?)

thanks
-john


2009-09-16 20:19:59

by john stultz

[permalink] [raw]
Subject: Re: [PATCH 2/3] ftrace: add tracepoint for xtime

On Wed, Sep 16, 2009 at 12:56 PM, john stultz <[email protected]> wrote:
> On Tue, Aug 25, 2009 at 1:14 AM, Zhaolei <[email protected]> wrote:
>> From: Xiao Guangrong <[email protected]>
>> ?/* Structure holding internal timekeeping values. */
>> ?struct timekeeper {
>> ? ? ? ?/* Current clocksource used for timekeeping. */
>> @@ -338,6 +341,8 @@ int do_settimeofday(struct timespec *tv)
>>
>> ? ? ? ?update_vsyscall(&xtime, timekeeper.clock);
>>
>> + ? ? ? trace_gtod_update_xtime(&xtime, &wall_to_monotonic);
>> +
>
> So the only thing to watch out on here is that xtime doesn't hold the
> current time, but the accumulated time. There is some unaccumulated
> time still kept in the clocksource structure.
>
> You probably want (assuming you only need tick granularity time) to
> use current_kernel_time().
>
> As an aside, is there a reason you have to have update callbacks and
> duplicate the time storage instead of using the existing interfaces?
> (ie: Is this due to locking or something else?)

Doh. Sorry, you're actually tracing the timekeeping code. Not using
this to assist tracing. Got this confused.

So yea, I think this should be ok, plus or minus determining if you
really want xtime or xtime_cache.

-john

2009-09-16 20:32:15

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 2/3] ftrace: add tracepoint for xtime

On Wed, 2009-09-16 at 12:58 -0700, john stultz wrote:
> On Wed, Sep 16, 2009 at 12:56 PM, john stultz <[email protected]> wrote:
> > On Tue, Aug 25, 2009 at 1:14 AM, Zhaolei <[email protected]> wrote:
> >> From: Xiao Guangrong <[email protected]>
> >> /* Structure holding internal timekeeping values. */
> >> struct timekeeper {
> >> /* Current clocksource used for timekeeping. */
> >> @@ -338,6 +341,8 @@ int do_settimeofday(struct timespec *tv)
> >>
> >> update_vsyscall(&xtime, timekeeper.clock);
> >>
> >> + trace_gtod_update_xtime(&xtime, &wall_to_monotonic);
> >> +
> >
> > So the only thing to watch out on here is that xtime doesn't hold the
> > current time, but the accumulated time. There is some unaccumulated
> > time still kept in the clocksource structure.
> >
> > You probably want (assuming you only need tick granularity time) to
> > use current_kernel_time().
> >
> > As an aside, is there a reason you have to have update callbacks and
> > duplicate the time storage instead of using the existing interfaces?
> > (ie: Is this due to locking or something else?)
>
> Doh. Sorry, you're actually tracing the timekeeping code. Not using
> this to assist tracing. Got this confused.
>
> So yea, I think this should be ok, plus or minus determining if you
> really want xtime or xtime_cache.

Well this may be a real concern. It's not about tracing timekeeping
(although it adds that functionality). His second patch (I didn't Cc you
on that one) hooks to these tracepoints to update time accordingly.

What is being done is a way to have a "wall time" value being added to
the ring buffer. But this needs to be very carefully done, because the
all tracers use this, including the function tracer in NMI code. So the
clock source can not take locks or do anything fancy.

What the idea is, is to have a semi clock that is read with some kind of
fast increment, and then at clock ticks, this clock is synced up.

I think that's the idea, anyone correct me if I'm wrong ;-)

-- Steve

2009-09-16 20:49:39

by john stultz

[permalink] [raw]
Subject: Re: [PATCH 2/3] ftrace: add tracepoint for xtime

On Wed, 2009-09-16 at 16:32 -0400, Steven Rostedt wrote:
> On Wed, 2009-09-16 at 12:58 -0700, john stultz wrote:
> > On Wed, Sep 16, 2009 at 12:56 PM, john stultz <[email protected]> wrote:
> > > On Tue, Aug 25, 2009 at 1:14 AM, Zhaolei <[email protected]> wrote:
> > >> From: Xiao Guangrong <[email protected]>
> > >> /* Structure holding internal timekeeping values. */
> > >> struct timekeeper {
> > >> /* Current clocksource used for timekeeping. */
> > >> @@ -338,6 +341,8 @@ int do_settimeofday(struct timespec *tv)
> > >>
> > >> update_vsyscall(&xtime, timekeeper.clock);
> > >>
> > >> + trace_gtod_update_xtime(&xtime, &wall_to_monotonic);
> > >> +
> > >
> > > So the only thing to watch out on here is that xtime doesn't hold the
> > > current time, but the accumulated time. There is some unaccumulated
> > > time still kept in the clocksource structure.
> > >
> > > You probably want (assuming you only need tick granularity time) to
> > > use current_kernel_time().
> > >
> > > As an aside, is there a reason you have to have update callbacks and
> > > duplicate the time storage instead of using the existing interfaces?
> > > (ie: Is this due to locking or something else?)
> >
> > Doh. Sorry, you're actually tracing the timekeeping code. Not using
> > this to assist tracing. Got this confused.
> >
> > So yea, I think this should be ok, plus or minus determining if you
> > really want xtime or xtime_cache.
>
> Well this may be a real concern. It's not about tracing timekeeping
> (although it adds that functionality). His second patch (I didn't Cc you
> on that one) hooks to these tracepoints to update time accordingly.
>
> What is being done is a way to have a "wall time" value being added to
> the ring buffer. But this needs to be very carefully done, because the
> all tracers use this, including the function tracer in NMI code. So the
> clock source can not take locks or do anything fancy.
>
> What the idea is, is to have a semi clock that is read with some kind of
> fast increment, and then at clock ticks, this clock is synced up.

Hmm.. Yea, if that's the case, then I'm not a big fan of this approach.

It sounds like what's really needed is a lock-free variant of
current_kernel_time() or something close to the CLOCK_MONOTONIC_COARSE
functionality currently queued.

Doing it without locks might have some downsides, and I guess that's the
point of the callback method (updates happen at prescribed times and
likely under locks that the trace code understands so it avoids races
and deadlocks).

-john

2009-09-17 06:32:12

by Zhao Lei

[permalink] [raw]
Subject: Re: Re: [PATCH 2/3] ftrace: add tracepoint for xtime

john stultz wrote:
> On Wed, 2009-09-16 at 16:32 -0400, Steven Rostedt wrote:
>> On Wed, 2009-09-16 at 12:58 -0700, john stultz wrote:
>>> On Wed, Sep 16, 2009 at 12:56 PM, john stultz <[email protected]> wrote:
>>>> On Tue, Aug 25, 2009 at 1:14 AM, Zhaolei <[email protected]> wrote:
>>>>> From: Xiao Guangrong <[email protected]>
>>>>> /* Structure holding internal timekeeping values. */
>>>>> struct timekeeper {
>>>>> /* Current clocksource used for timekeeping. */
>>>>> @@ -338,6 +341,8 @@ int do_settimeofday(struct timespec *tv)
>>>>>
>>>>> update_vsyscall(&xtime, timekeeper.clock);
>>>>>
>>>>> + trace_gtod_update_xtime(&xtime, &wall_to_monotonic);
>>>>> +
>>>> So the only thing to watch out on here is that xtime doesn't hold the
>>>> current time, but the accumulated time. There is some unaccumulated
>>>> time still kept in the clocksource structure.
>>>>
>>>> You probably want (assuming you only need tick granularity time) to
>>>> use current_kernel_time().
>>>>
>>>> As an aside, is there a reason you have to have update callbacks and
>>>> duplicate the time storage instead of using the existing interfaces?
>>>> (ie: Is this due to locking or something else?)
>>> Doh. Sorry, you're actually tracing the timekeeping code. Not using
>>> this to assist tracing. Got this confused.
>>>
>>> So yea, I think this should be ok, plus or minus determining if you
>>> really want xtime or xtime_cache.
>> Well this may be a real concern. It's not about tracing timekeeping
>> (although it adds that functionality). His second patch (I didn't Cc you
>> on that one) hooks to these tracepoints to update time accordingly.
>>
>> What is being done is a way to have a "wall time" value being added to
>> the ring buffer. But this needs to be very carefully done, because the
>> all tracers use this, including the function tracer in NMI code. So the
>> clock source can not take locks or do anything fancy.
>>
>> What the idea is, is to have a semi clock that is read with some kind of
>> fast increment, and then at clock ticks, this clock is synced up.
>
> Hmm.. Yea, if that's the case, then I'm not a big fan of this approach.
>
> It sounds like what's really needed is a lock-free variant of
> current_kernel_time() or something close to the CLOCK_MONOTONIC_COARSE
> functionality currently queued.

Yes.

And it will be better if I can get current walltime fast and
with high resolution.

I see your patch of __current_kernel_time() in Aug 19,
and I think it can be used to get walltime in ftrace.
But IMHO, use tracepoint to do it maybe better for its high resolution.

Thanks
Zhaolei

> Doing it without locks might have some downsides, and I guess that's the
> point of the callback method (updates happen at prescribed times and
> likely under locks that the trace code understands so it avoids races
> and deadlocks).
>
> -john
>