2014-06-28 10:43:45

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH 0/2] tracing: fix uptime overflow problem

The "uptime" tracer added in:
commit 8aacf017b065a805d27467843490c976835eb4a5
tracing: Add "uptime" trace clock that uses jiffies
has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
(u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds. An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

Tony provide full featured jiffies_to_nsecs() function, but
can't resolve another problem that jiffies_lock is not safe
in NMI context.

Now we use the lockless function __current_kernel_time() and
getboottime() to calculate the uptime.

The former discussion is here:
http://lkml.org/lkml/2014/4/8/525

Additional, I changed trace_clock_jiffies to trace_clock_uptime,
in order to better describe its function and updated the document.

Xie XiuQi (2):
tracing: fix uptime overflow problem
tracing: update Documentation/trace/ftrace.txt for uptime

Documentation/trace/ftrace.txt | 3 +--
include/linux/trace_clock.h | 2 +-
kernel/trace/trace.c | 2 +-
kernel/trace/trace_clock.c | 15 +++++++++++----
4 files changed, 14 insertions(+), 8 deletions(-)

--
2.0.0


2014-06-28 10:43:55

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH 1/2] tracing: fix uptime overflow problem

The "uptime" tracer added in:
commit 8aacf017b065a805d27467843490c976835eb4a5
tracing: Add "uptime" trace clock that uses jiffies
has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
(u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds. An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

Tony provide full featured jiffies_to_nsecs() function, but
can't resolve another problem that jiffies_lock is not safe
in NMI context.

Now we use the lockless function __current_kernel_time() and
getboottime() to calculate the uptime.

The former discussion is here:
http://lkml.org/lkml/2014/4/8/525

Additional, I changed trace_clock_jiffies to trace_clock_uptime,
in order to better describe its function.

Reported-by: Tony Luck <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: [email protected] # 3.10+
Signed-off-by: Xie XiuQi <[email protected]>
---
include/linux/trace_clock.h | 2 +-
kernel/trace/trace.c | 2 +-
kernel/trace/trace_clock.c | 15 +++++++++++----
3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
index 1d7ca27..2961ac7 100644
--- a/include/linux/trace_clock.h
+++ b/include/linux/trace_clock.h
@@ -16,7 +16,7 @@

extern u64 notrace trace_clock_local(void);
extern u64 notrace trace_clock(void);
-extern u64 notrace trace_clock_jiffies(void);
+extern u64 notrace trace_clock_uptime(void);
extern u64 notrace trace_clock_global(void);
extern u64 notrace trace_clock_counter(void);

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 384ede3..867e849 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -809,7 +809,7 @@ static struct {
{ trace_clock_local, "local", 1 },
{ trace_clock_global, "global", 1 },
{ trace_clock_counter, "counter", 0 },
- { trace_clock_jiffies, "uptime", 1 },
+ { trace_clock_uptime, "uptime", 1 },
{ trace_clock, "perf", 1 },
ARCH_TRACE_CLOCKS
};
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 26dc348..59e4d4d 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -19,6 +19,7 @@
#include <linux/percpu.h>
#include <linux/sched.h>
#include <linux/ktime.h>
+#include <linux/time.h>
#include <linux/trace_clock.h>

/*
@@ -58,14 +59,20 @@ u64 notrace trace_clock(void)
}

/*
- * trace_jiffy_clock(): Simply use jiffies as a clock counter.
+ * trace_clock_uptime(): Use lockless version __current_kernel_time,
+ * so it's safe in NMI context.
*/
-u64 notrace trace_clock_jiffies(void)
+u64 notrace trace_clock_uptime(void)
{
- u64 jiffy = jiffies - INITIAL_JIFFIES;
+ struct timespec uptime, now, boottime;
+
+ /* Does not take xtime_lock, so it's safe in NMI context. */
+ now = __current_kernel_time();
+ getboottime(&boottime);
+ uptime = timespec_sub(now, boottime);

/* Return nsecs */
- return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
+ return timespec_to_ns(&uptime);
}

/*
--
2.0.0

2014-06-28 10:43:44

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH 2/2] tracing: update Documentation/trace/ftrace.txt for uptime

update uptime description.

Cc: Tony Luck <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Xie XiuQi <[email protected]>
---
Documentation/trace/ftrace.txt | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 2479b2a..7b2e906 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -331,8 +331,7 @@ of ftrace. Here is a list of some of the key files:
know exactly the order events occurred with respect to
each other on different CPUs.

- uptime: This uses the jiffies counter and the time stamp
- is relative to the time since boot up.
+ uptime: This time stamp is relative to the time since boot up.

perf: This makes ftrace use the same clock that perf uses.
Eventually perf will be able to read ftrace buffers
--
2.0.0

2014-06-30 15:10:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: fix uptime overflow problem

On Sat, 28 Jun 2014 19:10:00 +0800
Xie XiuQi <[email protected]> wrote:


> /*
> - * trace_jiffy_clock(): Simply use jiffies as a clock counter.
> + * trace_clock_uptime(): Use lockless version __current_kernel_time,
> + * so it's safe in NMI context.
> */
> -u64 notrace trace_clock_jiffies(void)
> +u64 notrace trace_clock_uptime(void)
> {
> - u64 jiffy = jiffies - INITIAL_JIFFIES;
> + struct timespec uptime, now, boottime;
> +
> + /* Does not take xtime_lock, so it's safe in NMI context. */
> + now = __current_kernel_time();

The problem with this patch is that you have a race:

ts.tv_sec = tk->xtime_sec;
ts.tv_nsec = (long)(tk->xtime_nsec >> tk->shift);

There's a chance that gets done between the update of xtime_sec an
xtime_nsec, and make time go backwards.

This call is lockless but is not reliable.

-- Steve


> + getboottime(&boottime);
> + uptime = timespec_sub(now, boottime);
>
> /* Return nsecs */
> - return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
> + return timespec_to_ns(&uptime);
> }
>
> /*

2014-06-30 18:17:23

by Tony Luck

[permalink] [raw]
Subject: [PATCH] tracing: Fix wraparound problems in "uptime" tracer

There seem to be no non-racy solutions ... I've been wondering
about giving up on a generic jiffies_to_nsec() function because
people might use it in cases where the races might be likley to
bite them. For my need, I think that "perfect is the enemy of good":

1) The race window is only a few microseconds wide
2) It only exists on 32-bit kernels - which are dying out on server
systems because they can't handle the amounts of memory on modern
machines.
3) It opens every 49 days (on a HZ=1000 system)
4) I'm logging error events that happen at a "per-month" frequency (or lower)
5) If the race does happen - the visible result is that we have a
bad time logged against an error event.

so what about this: ...

From: Tony Luck <[email protected]>

The "uptime" tracer added in:
commit 8aacf017b065a805d27467843490c976835eb4a5
tracing: Add "uptime" trace clock that uses jiffies
has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
(u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds. An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

We can't provide a full features jiffies_to_nsec() function in
any safe way (32-bit systems need locking to read the full 64-bit
jiffies value). Just do the best we can here and recognise that
32-bit systems may seem some timestamp anomolies if jiffies64
was in the middle of rolling over a 2^32 boundary.

Signed-off-by: Tony Luck <[email protected]>
---
kernel/timeconst.bc | 6 ++++++
kernel/trace/trace_clock.c | 10 ++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/timeconst.bc b/kernel/timeconst.bc
index 511bdf2cafda..a5fef7a7fb27 100644
--- a/kernel/timeconst.bc
+++ b/kernel/timeconst.bc
@@ -100,6 +100,12 @@ define timeconst(hz) {
print "#define USEC_TO_HZ_DEN\t\t", 1000000/cd, "\n"
print "\n"

+ obase=10
+ cd=gcd(hz,1000000000)
+ print "#define HZ_TO_NSEC_NUM\t\t", 1000000000/cd, "\n"
+ print "#define HZ_TO_NSEC_DEN\t\t", hz/cd, "\n"
+ print "\n"
+
print "#endif /* KERNEL_TIMECONST_H */\n"
}
halt
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 26dc348332b7..dc5b11b9f8a4 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -59,13 +59,19 @@ u64 notrace trace_clock(void)

/*
* trace_jiffy_clock(): Simply use jiffies as a clock counter.
+ * This usage of jiffies_64 isn't safe on 32-bit, but we may be
+ * called from NMI context, and we have no safe way to get a timestamp.
*/
u64 notrace trace_clock_jiffies(void)
{
- u64 jiffy = jiffies - INITIAL_JIFFIES;
+ u64 jiffy = jiffies_64 - INITIAL_JIFFIES;

/* Return nsecs */
- return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
+#if !(NSEC_PER_SEC % HZ)
+ return (NSEC_PER_SEC / HZ) * jiffy;
+#else
+ return (jiffy * HZ_TO_NSEC_NUM) / HZ_TO_NSEC_DEN;
+#endif
}

/*
--
1.8.4.1

2014-06-30 18:40:21

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] tracing: Fix wraparound problems in "uptime" tracer

On Mon, 30 Jun 2014 11:17:18 -0700
Tony Luck <[email protected]> wrote:

> There seem to be no non-racy solutions ... I've been wondering
> about giving up on a generic jiffies_to_nsec() function because
> people might use it in cases where the races might be likley to
> bite them. For my need, I think that "perfect is the enemy of good":
>
> 1) The race window is only a few microseconds wide
> 2) It only exists on 32-bit kernels - which are dying out on server
> systems because they can't handle the amounts of memory on modern
> machines.
> 3) It opens every 49 days (on a HZ=1000 system)
> 4) I'm logging error events that happen at a "per-month" frequency (or lower)
> 5) If the race does happen - the visible result is that we have a
> bad time logged against an error event.
>
> so what about this: ...
>
> From: Tony Luck <[email protected]>
>
> The "uptime" tracer added in:
> commit 8aacf017b065a805d27467843490c976835eb4a5
> tracing: Add "uptime" trace clock that uses jiffies
> has wraparound problems when the system has been up more
> than 1 hour 11 minutes and 34 seconds. It converts jiffies
> to nanoseconds using:
> (u64)jiffies_to_usecs(jiffy) * 1000ULL
> but since jiffies_to_usecs() only returns a 32-bit value, it
> truncates at 2^32 microseconds. An additional problem on 32-bit
> systems is that the argument is "unsigned long", so fixing the
> return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
> system).
>
> We can't provide a full features jiffies_to_nsec() function in
> any safe way (32-bit systems need locking to read the full 64-bit
> jiffies value). Just do the best we can here and recognise that
> 32-bit systems may seem some timestamp anomolies if jiffies64
> was in the middle of rolling over a 2^32 boundary.
>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> kernel/timeconst.bc | 6 ++++++
> kernel/trace/trace_clock.c | 10 ++++++++--
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/timeconst.bc b/kernel/timeconst.bc
> index 511bdf2cafda..a5fef7a7fb27 100644
> --- a/kernel/timeconst.bc
> +++ b/kernel/timeconst.bc
> @@ -100,6 +100,12 @@ define timeconst(hz) {
> print "#define USEC_TO_HZ_DEN\t\t", 1000000/cd, "\n"
> print "\n"
>
> + obase=10
> + cd=gcd(hz,1000000000)
> + print "#define HZ_TO_NSEC_NUM\t\t", 1000000000/cd, "\n"
> + print "#define HZ_TO_NSEC_DEN\t\t", hz/cd, "\n"
> + print "\n"
> +
> print "#endif /* KERNEL_TIMECONST_H */\n"
> }
> halt
> diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
> index 26dc348332b7..dc5b11b9f8a4 100644
> --- a/kernel/trace/trace_clock.c
> +++ b/kernel/trace/trace_clock.c
> @@ -59,13 +59,19 @@ u64 notrace trace_clock(void)
>
> /*
> * trace_jiffy_clock(): Simply use jiffies as a clock counter.
> + * This usage of jiffies_64 isn't safe on 32-bit, but we may be
> + * called from NMI context, and we have no safe way to get a timestamp.
> */
> u64 notrace trace_clock_jiffies(void)
> {
> - u64 jiffy = jiffies - INITIAL_JIFFIES;
> + u64 jiffy = jiffies_64 - INITIAL_JIFFIES;
>
> /* Return nsecs */
> - return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
> +#if !(NSEC_PER_SEC % HZ)
> + return (NSEC_PER_SEC / HZ) * jiffy;
> +#else
> + return (jiffy * HZ_TO_NSEC_NUM) / HZ_TO_NSEC_DEN;

Wont this break on 32 bit systems. That is, you can't divide 64bit
integers without using do_div().

-- Steve

> +#endif
> }
>
> /*

2014-06-30 20:31:44

by Tony Luck

[permalink] [raw]
Subject: [PATCH-v2] tracing: Fix wraparound problems in "uptime" tracer

The "uptime" tracer added in:
commit 8aacf017b065a805d27467843490c976835eb4a5
tracing: Add "uptime" trace clock that uses jiffies
has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
(u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds. An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

We can't provide a full features jiffies_to_nsec() function in
any safe way (32-bit systems need locking to read the full 64-bit
jiffies value). Just do the best we can here and recognise that
32-bit systems may seem some timestamp anomolies if jiffies64
was in the middle of rolling over a 2^32 boundary.

Signed-off-by: Tony Luck <[email protected]>
---

v1->v2: Use "do_div()" for 64-bit division

kernel/timeconst.bc | 6 ++++++
kernel/trace/trace_clock.c | 12 ++++++++++--
2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/timeconst.bc b/kernel/timeconst.bc
index 511bdf2cafda..a5fef7a7fb27 100644
--- a/kernel/timeconst.bc
+++ b/kernel/timeconst.bc
@@ -100,6 +100,12 @@ define timeconst(hz) {
print "#define USEC_TO_HZ_DEN\t\t", 1000000/cd, "\n"
print "\n"

+ obase=10
+ cd=gcd(hz,1000000000)
+ print "#define HZ_TO_NSEC_NUM\t\t", 1000000000/cd, "\n"
+ print "#define HZ_TO_NSEC_DEN\t\t", hz/cd, "\n"
+ print "\n"
+
print "#endif /* KERNEL_TIMECONST_H */\n"
}
halt
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 26dc348332b7..3fcf8f4807d5 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -59,13 +59,21 @@ u64 notrace trace_clock(void)

/*
* trace_jiffy_clock(): Simply use jiffies as a clock counter.
+ * This usage of jiffies_64 isn't safe on 32-bit, but we may be
+ * called from NMI context, and we have no safe way to get a timestamp.
*/
u64 notrace trace_clock_jiffies(void)
{
- u64 jiffy = jiffies - INITIAL_JIFFIES;
+ u64 jiffy = jiffies_64 - INITIAL_JIFFIES;

/* Return nsecs */
- return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
+#if !(NSEC_PER_SEC % HZ)
+ return (NSEC_PER_SEC / HZ) * jiffy;
+#else
+ jiffy *= HZ_TO_NSEC_NUM;
+ do_div(jiffy, HZ_TO_NSEC_DEN);
+ return jiffy;
+#endif
}

/*
--
1.8.4.1

2014-07-17 23:02:28

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH-v2] tracing: Fix wraparound problems in "uptime" tracer

On Mon, Jun 30, 2014 at 1:31 PM, Tony Luck <[email protected]> wrote:
> +#else
> + jiffy *= HZ_TO_NSEC_NUM;
> + do_div(jiffy, HZ_TO_NSEC_DEN);
> + return jiffy;
> +#endif

Well that didn't work either. kernel/trace/trace_clock.c doesn't include
timeconst.h - so we fail the compile because HZ_TO_NSEC_{NUM,DEN}
aren't defined. a #include looks dicey because timeconst.h is a generated
file up the the kernel/ level ... so getting Makefile dependencies right would
be ugly.

Do we really need to convert to nanoseconds? Couldn't we just return
jiffies:

u64 notrace trace_clock_jiffies(void)
{
return jiffies_64 - INITIAL_JIFFIES;
}

and leave it as an exercise to userspace to interpret that?

-Tony

[Yes, this changes the UI . but there can't be any serious users
given that the existing UI only works for 1hr 11 minutes after each
boot]

2014-07-18 02:09:03

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH-v2] tracing: Fix wraparound problems in "uptime" tracer

On Thu, 17 Jul 2014 16:02:26 -0700
Tony Luck <[email protected]> wrote:

> On Mon, Jun 30, 2014 at 1:31 PM, Tony Luck <[email protected]> wrote:
> > +#else
> > + jiffy *= HZ_TO_NSEC_NUM;
> > + do_div(jiffy, HZ_TO_NSEC_DEN);
> > + return jiffy;
> > +#endif
>
> Well that didn't work either. kernel/trace/trace_clock.c doesn't include
> timeconst.h - so we fail the compile because HZ_TO_NSEC_{NUM,DEN}
> aren't defined. a #include looks dicey because timeconst.h is a generated
> file up the the kernel/ level ... so getting Makefile dependencies right would
> be ugly.
>
> Do we really need to convert to nanoseconds? Couldn't we just return
> jiffies:

Sure, and we can make it a "counter". That is, the counters don't hide
1000 counts on output.

>
> u64 notrace trace_clock_jiffies(void)
> {
> return jiffies_64 - INITIAL_JIFFIES;
> }
>
> and leave it as an exercise to userspace to interpret that?
>
> -Tony
>
> [Yes, this changes the UI . but there can't be any serious users
> given that the existing UI only works for 1hr 11 minutes after each
> boot]

I doubt it will break any tools, so it's fine.

-- Steve

2014-07-18 17:05:03

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH-v2] tracing: Fix wraparound problems in "uptime" tracer

On Thu, Jul 17, 2014 at 7:08 PM, Steven Rostedt <[email protected]> wrote:
>> Do we really need to convert to nanoseconds? Couldn't we just return
>> jiffies:
>
> Sure, and we can make it a "counter". That is, the counters don't hide
> 1000 counts on output.

Can you explain that a bit more ... how do we mark it as a "counter"?

>> u64 notrace trace_clock_jiffies(void)
>> {
>> return jiffies_64 - INITIAL_JIFFIES;
>> }

Hmm it seems that we make it hard for applications to see what HZ
value the kernel is actually using. All user-facing interfaces should
convert to USER_HZ (100). So we should say:

u64 notrace trace_clock_jiffies(void)
{
return jiffies_64_to_clock_t(jiffies_64 - INITIAL_JIFFIES);
}

Do we need to change anything in Documentation/trace/ftrace.txt?

Currently it says:

uptime: This uses the jiffies counter and the time stamp
is relative to the time since boot up.

-Tony

2014-07-18 17:36:19

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH-v2] tracing: Fix wraparound problems in "uptime" tracer

On Fri, 18 Jul 2014 10:05:00 -0700
Tony Luck <[email protected]> wrote:

> On Thu, Jul 17, 2014 at 7:08 PM, Steven Rostedt <[email protected]> wrote:
> >> Do we really need to convert to nanoseconds? Couldn't we just return
> >> jiffies:
> >
> > Sure, and we can make it a "counter". That is, the counters don't hide
> > 1000 counts on output.
>
> Can you explain that a bit more ... how do we mark it as a "counter"?

My mistake, I should have said, mark it as not "in_ns"...

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index bda9621..291397e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -823,7 +823,7 @@ static struct {
{ trace_clock_local, "local", 1 },
{ trace_clock_global, "global", 1 },
{ trace_clock_counter, "counter", 0 },
- { trace_clock_jiffies, "uptime", 1 },
+ { trace_clock_jiffies, "uptime", 0 },
{ trace_clock, "perf", 1 },
ARCH_TRACE_CLOCKS
};

>
> >> u64 notrace trace_clock_jiffies(void)
> >> {
> >> return jiffies_64 - INITIAL_JIFFIES;
> >> }
>
> Hmm it seems that we make it hard for applications to see what HZ
> value the kernel is actually using. All user-facing interfaces should
> convert to USER_HZ (100). So we should say:

Yeah, I remember theses conversations in the past. Basically, HZ is an
internal kernel operation and userspace shouldn't know what it is,
otherwise you may have tools doing whacky things based on it.

Which is a good thing when we eventually get rid of the HZ idea.

>
> u64 notrace trace_clock_jiffies(void)
> {
> return jiffies_64_to_clock_t(jiffies_64 - INITIAL_JIFFIES);
> }
>
> Do we need to change anything in Documentation/trace/ftrace.txt?
>
> Currently it says:
>
> uptime: This uses the jiffies counter and the time stamp
> is relative to the time since boot up.

Yeah, and even if we change it to be converted, I believe that comment
is still correct. It does use the jiffies counter, and the time stamp
is still relative to system boot up.

-- Steve

2014-07-18 18:43:26

by Tony Luck

[permalink] [raw]
Subject: [PATCH] tracing: Fix wraparound problems in "uptime" tracer

The "uptime" tracer added in:

commit 8aacf017b065a805d27467843490c976835eb4a5
tracing: Add "uptime" trace clock that uses jiffies

has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
(u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds. An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

Avoid these problems by using jiffies_64 as our basis, and
not converting to nanoseconds (we do convert to clock_t because
user facing API must not be dependent on internal kernel
HZ values).

Signed-off-by: Tony Luck <[email protected]>
---
kernel/trace/trace.c | 2 +-
kernel/trace/trace_clock.c | 9 +++++----
2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 384ede311717..eb7d56480545 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -809,7 +809,7 @@ static struct {
{ trace_clock_local, "local", 1 },
{ trace_clock_global, "global", 1 },
{ trace_clock_counter, "counter", 0 },
- { trace_clock_jiffies, "uptime", 1 },
+ { trace_clock_jiffies, "uptime", 0 },
{ trace_clock, "perf", 1 },
ARCH_TRACE_CLOCKS
};
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 26dc348332b7..57b67b1f24d1 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -59,13 +59,14 @@ u64 notrace trace_clock(void)

/*
* trace_jiffy_clock(): Simply use jiffies as a clock counter.
+ * Note that this use of jiffies_64 is not completely safe on
+ * 32-bit systems. But the window is tiny, and the effect if
+ * we are affected is that we will have an obviously bogus
+ * timestamp on a trace event - i.e. not life threatening.
*/
u64 notrace trace_clock_jiffies(void)
{
- u64 jiffy = jiffies - INITIAL_JIFFIES;
-
- /* Return nsecs */
- return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
+ return jiffies_64_to_clock_t(jiffies_64 - INITIAL_JIFFIES);
}

/*
--
1.8.4.1

2014-07-18 18:47:38

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] tracing: Fix wraparound problems in "uptime" tracer

On Fri, 18 Jul 2014 11:43:01 -0700
Tony Luck <[email protected]> wrote:

> The "uptime" tracer added in:
>
> commit 8aacf017b065a805d27467843490c976835eb4a5
> tracing: Add "uptime" trace clock that uses jiffies
>
> has wraparound problems when the system has been up more
> than 1 hour 11 minutes and 34 seconds. It converts jiffies
> to nanoseconds using:
> (u64)jiffies_to_usecs(jiffy) * 1000ULL
> but since jiffies_to_usecs() only returns a 32-bit value, it
> truncates at 2^32 microseconds. An additional problem on 32-bit
> systems is that the argument is "unsigned long", so fixing the
> return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
> system).
>
> Avoid these problems by using jiffies_64 as our basis, and
> not converting to nanoseconds (we do convert to clock_t because
> user facing API must not be dependent on internal kernel
> HZ values).
>

Do you think this is worthy of stable and 3.16 material?

-- Steve

> Signed-off-by: Tony Luck <[email protected]>
> ---
> kernel/trace/trace.c | 2 +-
> kernel/trace/trace_clock.c | 9 +++++----
> 2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 384ede311717..eb7d56480545 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -809,7 +809,7 @@ static struct {
> { trace_clock_local, "local", 1 },
> { trace_clock_global, "global", 1 },
> { trace_clock_counter, "counter", 0 },
> - { trace_clock_jiffies, "uptime", 1 },
> + { trace_clock_jiffies, "uptime", 0 },
> { trace_clock, "perf", 1 },
> ARCH_TRACE_CLOCKS
> };
> diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
> index 26dc348332b7..57b67b1f24d1 100644
> --- a/kernel/trace/trace_clock.c
> +++ b/kernel/trace/trace_clock.c
> @@ -59,13 +59,14 @@ u64 notrace trace_clock(void)
>
> /*
> * trace_jiffy_clock(): Simply use jiffies as a clock counter.
> + * Note that this use of jiffies_64 is not completely safe on
> + * 32-bit systems. But the window is tiny, and the effect if
> + * we are affected is that we will have an obviously bogus
> + * timestamp on a trace event - i.e. not life threatening.
> */
> u64 notrace trace_clock_jiffies(void)
> {
> - u64 jiffy = jiffies - INITIAL_JIFFIES;
> -
> - /* Return nsecs */
> - return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
> + return jiffies_64_to_clock_t(jiffies_64 - INITIAL_JIFFIES);
> }
>
> /*

2014-07-18 18:54:29

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH] tracing: Fix wraparound problems in "uptime" tracer

On Fri, Jul 18, 2014 at 11:47 AM, Steven Rostedt <[email protected]> wrote:
> Do you think this is worthy of stable and 3.16 material?

Perhaps. Not sure that I need it (I'm stuck waiting for the extlog_trace pieces
which are now queued in "tip", but won't go in until 3.17). But I suppose we
should minimize the window where people see the bogus wraparound nanoseconds.

-Tony