2016-11-21 22:13:00

by Joel Fernandes

Subject: [PATCH v2 1/2] timekeeping: Introduce a fast boot clock derived from fast monotonic clock

Introduce a fast boot clock. It accounts for suspend time and is based on the
fast monotonic clock, so it's safe to use for tracing. The next patch adds it
as a trace clock.

Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
---
kernel/time/timekeeping.c | 36 ++++++++++++++++++++++++++++++------
1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 37dec7e..5a3f5ff 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -56,6 +56,12 @@ static struct timekeeper shadow_timekeeper;
 struct tk_fast {
 	seqcount_t		seq;
 	struct tk_read_base	base[2];
+
+	/*
+	 * first dimension is based on lower seq bit,
+	 * second dimension is for offset type (real, boot, tai)
+	 */
+	ktime_t			offsets[2][TK_OFFS_MAX];
 };

static struct tk_fast tk_fast_mono ____cacheline_aligned;
@@ -350,14 +356,20 @@ static void update_fast_timekeeper(struct tk_read_base *tkr, struct tk_fast *tkf
 	/* Force readers off to base[1] */
 	raw_write_seqcount_latch(&tkf->seq);
 
-	/* Update base[0] */
+	/* Update base[0] and offsets */
 	memcpy(base, tkr, sizeof(*base));
+	tkf->offsets[0][TK_OFFS_REAL] = tk_core.timekeeper.offs_real;
+	tkf->offsets[0][TK_OFFS_BOOT] = tk_core.timekeeper.offs_boot;
+	tkf->offsets[0][TK_OFFS_TAI] = tk_core.timekeeper.offs_tai;
 
 	/* Force readers back to base[0] */
 	raw_write_seqcount_latch(&tkf->seq);
 
-	/* Update base[1] */
+	/* Update base[1] and offsets */
 	memcpy(base + 1, base, sizeof(*base));
+	tkf->offsets[1][TK_OFFS_REAL] = tk_core.timekeeper.offs_real;
+	tkf->offsets[1][TK_OFFS_BOOT] = tk_core.timekeeper.offs_boot;
+	tkf->offsets[1][TK_OFFS_TAI] = tk_core.timekeeper.offs_tai;
 }

/**
@@ -392,16 +404,23 @@ static void update_fast_timekeeper(struct tk_read_base *tkr, struct tk_fast *tkf
  * of the following timestamps. Callers need to be aware of that and
  * deal with it.
  */
-static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
+static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf, int offset)
 {
 	struct tk_read_base *tkr;
 	unsigned int seq;
 	u64 now;
+	ktime_t *off;
 
 	do {
 		seq = raw_read_seqcount_latch(&tkf->seq);
 		tkr = tkf->base + (seq & 0x01);
-		now = ktime_to_ns(tkr->base);
+
+		if (unlikely(offset >= 0)) {
+			off = tkf->offsets[seq & 0x01];
+			now = ktime_to_ns(ktime_add(tkr->base, off[offset]));
+		} else {
+			now = ktime_to_ns(tkr->base);
+		}
 
 		now += timekeeping_delta_to_ns(tkr,
 				clocksource_delta(
@@ -415,16 +434,21 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)

 u64 ktime_get_mono_fast_ns(void)
 {
-	return __ktime_get_fast_ns(&tk_fast_mono);
+	return __ktime_get_fast_ns(&tk_fast_mono, -1);
 }
 EXPORT_SYMBOL_GPL(ktime_get_mono_fast_ns);
 
 u64 ktime_get_raw_fast_ns(void)
 {
-	return __ktime_get_fast_ns(&tk_fast_raw);
+	return __ktime_get_fast_ns(&tk_fast_raw, -1);
 }
 EXPORT_SYMBOL_GPL(ktime_get_raw_fast_ns);
 
+u64 ktime_get_boot_fast_ns(void)
+{
+	return __ktime_get_fast_ns(&tk_fast_mono, TK_OFFS_BOOT);
+}
+
 /* Suspend-time cycles value for halted fast timekeeper. */
 static cycle_t cycles_at_suspend;

--
2.8.0.rc3.226.g39d4020


2016-11-21 22:13:12

by Joel Fernandes

Subject: [PATCH v2 2/2] trace: Add an option for boot clock as trace clock

Unlike the monotonic clock, the boot clock as a trace clock accounts for time
spent in suspend, which is useful for tracing suspend/resume. This uses the
fast boot clock infrastructure introduced in the previous patch.

Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
---
kernel/trace/trace.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8696ce6..f7b64db 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1125,6 +1125,7 @@ static struct {
 	{ trace_clock,			"perf",		1 },
 	{ ktime_get_mono_fast_ns,	"mono",		1 },
 	{ ktime_get_raw_fast_ns,	"mono_raw",	1 },
+	{ ktime_get_boot_fast_ns,	"boot",		1 },
 	ARCH_TRACE_CLOCKS
 };

--
2.8.0.rc3.226.g39d4020

2016-11-21 22:26:17

by Steven Rostedt

Subject: Re: [PATCH v2 2/2] trace: Add an option for boot clock as trace clock

On Mon, 21 Nov 2016 14:12:50 -0800
Joel Fernandes <[email protected]> wrote:

> Unlike monotonic clock, boot clock as a trace clock will account for
> time spent in suspend useful for tracing suspend/resume. This uses
> earlier introduced infrastructure for using the fast boot clock.
>
> Cc: Steven Rostedt <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Signed-off-by: Joel Fernandes <[email protected]>
> ---
> kernel/trace/trace.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 8696ce6..f7b64db 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1125,6 +1125,7 @@ static struct {
> { trace_clock, "perf", 1 },
> { ktime_get_mono_fast_ns, "mono", 1 },
> { ktime_get_raw_fast_ns, "mono_raw", 1 },
> + { ktime_get_boot_fast_ns, "boot", 1 },
> ARCH_TRACE_CLOCKS
> };
>

I'm fine with this change, but can you please update
Documentation/trace/ftrace.txt under trace_clock? I notice that mono
and mono_raw are missing too.
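
As a rough sketch, the missing entries under trace_clock could be described
along these lines (illustrative wording only, not taken from ftrace.txt):

	  mono: Uses the fast monotonic clock (CLOCK_MONOTONIC), which is
		monotonic and is subject to NTP rate adjustments.

	  mono_raw: Uses the raw monotonic clock (CLOCK_MONOTONIC_RAW), which
		is monotonic but not subject to any rate adjustments and
		ticks at the same rate as the hardware clocksource.

	  boot: Same as mono, but also accounts for time spent in suspend,
		which makes it useful for tracing suspend/resume activity.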

-- Steve

2016-11-22 03:43:13

by kernel test robot

Subject: Re: [PATCH v2 2/2] trace: Add an option for boot clock as trace clock

Hi Joel,

[auto build test ERROR on tip/timers/core]
[also build test ERROR on v4.9-rc6 next-20161117]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Joel-Fernandes/timekeeping-Introduce-a-fast-boot-clock-derived-from-fast-monotonic-clock/20161122-071652
config: x86_64-rhel (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All errors (new ones prefixed by >>):

>> kernel/trace/trace.c:1128:4: error: 'ktime_get_boot_fast_ns' undeclared here (not in a function)
{ ktime_get_boot_fast_ns, "boot", 1 },
^~~~~~~~~~~~~~~~~~~~~~

vim +/ktime_get_boot_fast_ns +1128 kernel/trace/trace.c

1122 { trace_clock_global, "global", 1 },
1123 { trace_clock_counter, "counter", 0 },
1124 { trace_clock_jiffies, "uptime", 0 },
1125 { trace_clock, "perf", 1 },
1126 { ktime_get_mono_fast_ns, "mono", 1 },
1127 { ktime_get_raw_fast_ns, "mono_raw", 1 },
> 1128 { ktime_get_boot_fast_ns, "boot", 1 },
1129 ARCH_TRACE_CLOCKS
1130 };
1131
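
The error above is just a missing declaration; presumably the new helper
needs a prototype next to the existing fast accessors (a sketch, assuming
the declarations live in include/linux/timekeeping.h as they do for the
mono/raw variants):

extern u64 ktime_get_mono_fast_ns(void);
extern u64 ktime_get_raw_fast_ns(void);
extern u64 ktime_get_boot_fast_ns(void);	/* new declaration */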

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2016-11-22 22:32:30

by Thomas Gleixner

Subject: Re: [PATCH v2 1/2] timekeeping: Introduce a fast boot clock derived from fast monotonic clock

On Mon, 21 Nov 2016, Joel Fernandes wrote:
> @@ -56,6 +56,12 @@ static struct timekeeper shadow_timekeeper;
> struct tk_fast {
> seqcount_t seq;
> struct tk_read_base base[2];
> +
> + /*
> + * first dimension is based on lower seq bit,
> + * second dimension is for offset type (real, boot, tai)
> + */

Can you figure out why there are comments above the struct which explain
the members in Kernel Doc format and not randomly formatted comments inside
the struct definition?

> + ktime_t offsets[2][TK_OFFS_MAX];

This bloats tk_fast_raw for no value, along with the extra stores in the
update function for tk_fast_raw, which will never use that offset stuff.

Aside from that, I really have to ask whether it's really necessary to add
this extra bloat in storage, update and readout code for something which is
not used by most people.

What's wrong with adding a tracepoint to the boot offset update function
and letting perf or the tracer inject the value of the boot offset into the
trace data when starting? The time adjustment can be done in
postprocessing.
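
For concreteness, such a tracepoint might look roughly like this; the event
name, fields and call site below are illustrative assumptions, not something
spelled out in this thread:

/* Illustrative only: emitted whenever the boot offset changes. */
TRACE_EVENT(timekeeping_boot_offset_update,

	TP_PROTO(s64 offs_boot_ns),

	TP_ARGS(offs_boot_ns),

	TP_STRUCT__entry(
		__field(s64, offs_boot_ns)
	),

	TP_fast_assign(
		__entry->offs_boot_ns = offs_boot_ns;
	),

	TP_printk("offs_boot=%lld ns", __entry->offs_boot_ns)
);

/* called from the sleeptime injection path after offs_boot is updated: */
trace_timekeeping_boot_offset_update(ktime_to_ns(tk->offs_boot));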

That should be sufficient for tracing suspend/resume behaviour. If there is
not a really convincing reason for having that as a real trace clock then I
prefer to avoid that extra stuff.

Thanks,

tglx

2016-11-23 00:25:01

by Joel Fernandes

Subject: Re: [PATCH v2 1/2] timekeeping: Introduce a fast boot clock derived from fast monotonic clock

Hi Thomas,

On Tue, Nov 22, 2016 at 2:29 PM, Thomas Gleixner <[email protected]> wrote:
>
> On Mon, 21 Nov 2016, Joel Fernandes wrote:
> > @@ -56,6 +56,12 @@ static struct timekeeper shadow_timekeeper;
> > struct tk_fast {
> > seqcount_t seq;
> > struct tk_read_base base[2];
> > +
> > + /*
> > + * first dimension is based on lower seq bit,
> > + * second dimension is for offset type (real, boot, tai)
> > + */
>
> Can you figure out why there are comments above the struct which explain
> the members in Kernel Doc format and not randomly formatted comments inside
> the struct definition?

Ok, sorry. I can move the comments above the struct in the prescribed kernel-doc format.
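
For example, moved above the struct in kernel-doc form it could look roughly
like this (a sketch based on the fields in the patch):

/**
 * struct tk_fast - NMI-safe timekeeper for the fast accessors
 * @seq:	Latch sequence counter; its lowest bit selects which copy in
 *		@base (and @offsets) readers use.
 * @base:	The two copies of the readout base.
 * @offsets:	Per-copy offsets to monotonic time, indexed by
 *		TK_OFFS_REAL/TK_OFFS_BOOT/TK_OFFS_TAI.
 */
struct tk_fast {
	seqcount_t		seq;
	struct tk_read_base	base[2];
	ktime_t			offsets[2][TK_OFFS_MAX];
};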

> > + ktime_t offsets[2][TK_OFFS_MAX];
>
> This bloats fast_tk_raw for no value, along with the extra stores in the
> update function for fast_tk_raw which will never use that offset stuff.
>
> Aside of that, I really have to ask the question whether it's really
> necessary to add this extra bloat in storage, update and readout code for
> something which is not used by most people.
>
> What's wrong with adding a tracepoint into the boot offset update function
> and let perf or the tracer inject the value of the boot offset into the
> trace data when starting. The time adjustment can be done in
> postprocessing.

I agree this bloats things and is probably not acceptable. A tracepoint
adds extra complexity and out-of-tree patches, which we'd like to avoid.
Would you be OK with adding a relatively simple function like the one below
that could do the job without bloating the common case?

/*
* Fast and NMI safe access to boot time. It may be slightly out of date
* as we're accessing offset without seqcount writes, but is safe to access.
*/
u64 ktime_get_boot_fast_ns(void)
{
	struct timekeeper *tk = &tk_core.timekeeper;
	return __ktime_get_fast_ns(&tk_fast_mono) + tk->offs_boot;
}
EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns);


> That should be sufficient for tracing suspend/resume behaviour. If there is
> not a really convincing reason for having that as a real trace clock then I
> prefer to avoid that extra stuff.

Several clocks are accessible just by writing the clock name to
trace_clock in debugfs. This is really convenient and doesn't require any
out-of-tree patches or post-processing complexity.

Thanks,
Joel

2016-11-23 00:26:54

by Joel Fernandes

Subject: Re: [PATCH v2 1/2] timekeeping: Introduce a fast boot clock derived from fast monotonic clock

On Tue, Nov 22, 2016 at 4:23 PM, Joel Fernandes <[email protected]> wrote:
> Hi Thomas,
>
> On Tue, Nov 22, 2016 at 2:29 PM, Thomas Gleixner <[email protected]> wrote:
>>
>> On Mon, 21 Nov 2016, Joel Fernandes wrote:
>> > @@ -56,6 +56,12 @@ static struct timekeeper shadow_timekeeper;
>> > struct tk_fast {
>> > seqcount_t seq;
>> > struct tk_read_base base[2];
>> > +
>> > + /*
>> > + * first dimension is based on lower seq bit,
>> > + * second dimension is for offset type (real, boot, tai)
>> > + */
>>
>> Can you figure out why there are comments above the struct which explain
>> the members in Kernel Doc format and not randomly formatted comments inside
>> the struct definition?
>
> Ok sorry. I can move the comments before the function in the prescribed format.
>
>> > + ktime_t offsets[2][TK_OFFS_MAX];
>>
>> This bloats fast_tk_raw for no value, along with the extra stores in the
>> update function for fast_tk_raw which will never use that offset stuff.
>>
>> Aside of that, I really have to ask the question whether it's really
>> necessary to add this extra bloat in storage, update and readout code for
>> something which is not used by most people.
>>
>> What's wrong with adding a tracepoint into the boot offset update function
>> and let perf or the tracer inject the value of the boot offset into the
>> trace data when starting. The time adjustment can be done in
>> postprocessing.
>
> I agree we're bloating this and probably not very acceptable.
> tracepoint adds additional complexity and out of tree patches which
> we'd like to avoid. Would you be Ok if we added a relatively simple
> function like below that could do the job and not bloat the optimal
> case?
>
> /*
> * Fast and NMI safe access to boot time. It may be slightly out of date
> * as we're accessing offset without seqcount writes, but is safe to access.

s/writes/reads/

> */
> u64 ktime_get_boot_fast_ns(void)
> {
> struct timekeeper *tk = &tk_core.timekeeper;
> return __ktime_get_fast_ns(&tk_fast_mono) + tk->offs_boot;

I meant __ktime_get_fast_ns(&tk_fast_mono) +
ktime_to_ns(tk->offs_boot). I was just showing the idea...

Joel


> }
> EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns);
>
>
>> That should be sufficient for tracing suspend/resume behaviour. If there is
>> not a really convincing reason for having that as a real trace clock then I
>> prefer to avoid that extra stuff.
>
> Several clocks are accessible just by simple writing of clock name to
> trace_clock in debugfs. This is really cool and doesn't require any
> out of tree patches or post processing complexity.
>
> Thanks,
> Joel

2016-11-23 10:20:55

by Thomas Gleixner

Subject: Re: [PATCH v2 1/2] timekeeping: Introduce a fast boot clock derived from fast monotonic clock

On Tue, 22 Nov 2016, Joel Fernandes wrote:
> On Tue, Nov 22, 2016 at 2:29 PM, Thomas Gleixner <[email protected]> wrote:
> > What's wrong with adding a tracepoint into the boot offset update function
> > and let perf or the tracer inject the value of the boot offset into the
> > trace data when starting. The time adjustment can be done in
> > postprocessing.
>
> I agree we're bloating this and probably not very acceptable.
> tracepoint adds additional complexity and out of tree patches which

Why are tracepoints a problem, and why do they add complexity? Also, why
would they require out-of-tree patches?

> we'd like to avoid. Would you be Ok if we added a relatively simple
> function like below that could do the job and not bloat the optimal
> case?
>
> /*
> * Fast and NMI safe access to boot time. It may be slightly out of date
> * as we're accessing offset without seqcount writes, but is safe to access.
> */
> u64 ktime_get_boot_fast_ns(void)
> {
> struct timekeeper *tk = &tk_core.timekeeper;
> return __ktime_get_fast_ns(&tk_fast_mono) + tk->offs_boot;

This needs a big fat comment and documentation of the possible side effects
of this:

1) The timestamp might be off due to a race like this:

     CPU 0					CPU 1
     timekeeping_inject_sleeptime64()
       __timekeeping_inject_sleeptime(tk, delta);
						timestamp();
       timekeeping_update(tk, TK_CLEAR_NTP...);

2) On 32-bit machines the timestamp might see a half-updated boot offset
value. On 64-bit machines it either sees the old or the new value.

I don't think this is a big issue, because the event of updating the boot
offset is really, really rare, so postprocessing should be able to handle
this. But at least it must be documented so people won't get surprised.
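
Putting those two caveats together with the proposed helper, the end result
might look roughly like this (a sketch combining the function Joel posted,
with the ktime_to_ns() fix from his follow-up, and the comment asked for
above; not the final patch):

/*
 * ktime_get_boot_fast_ns - NMI-safe, fast access to boot time.
 *
 * The boot offset is read without the timekeeper seqcount, so:
 *
 * 1) A reader racing with timekeeping_inject_sleeptime64() may see the new
 *    offs_boot before the monotonic base is updated and get a slightly off
 *    timestamp.
 *
 * 2) On 32-bit machines the 64-bit boot offset may be seen half-updated.
 *
 * Both can only happen across the (rare) sleeptime injection and can be
 * handled in postprocessing.
 */
u64 ktime_get_boot_fast_ns(void)
{
	struct timekeeper *tk = &tk_core.timekeeper;

	return ktime_get_mono_fast_ns() + ktime_to_ns(tk->offs_boot);
}
EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns);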

> > That should be sufficient for tracing suspend/resume behaviour. If there is
> > not a really convincing reason for having that as a real trace clock then I
> > prefer to avoid that extra stuff.
>
> Several clocks are accessible just by simple writing of clock name to
> trace_clock in debugfs. This is really cool and doesn't require any
> out of tree patches or post processing complexity.

I know that it's nice to have :)

Thanks,

tglx