Hello,
From reviewing the dynamic tick code, I noticed that the "max_delta_ns"
member of the "clock_event_device" structure, represents the upper limit
between timer events in nanoseconds. The variable, "max_delta_ns", is
defined as an unsigned long. An unsigned long is a 32-bit integer for
32-bit machines and is a 64-bit integer for 64-bit machines (if -m64
option is used for gcc). Also see [1].
The variable, "max_delta_ns", is configured by calling function
"clockevent_delta2ns()". The maximum value that "max_delta_ns" can be
set to by calling clockevent_delta2ns(), is LONG_MAX. For a 32-bit
machine LONG_MAX is equal to 0x7fffffff and in nanoseconds this equates
to ~2.15 seconds. Hence, the maximum sleep time for a 32-bit machine is
~2.15 seconds, where as for a 64-bit machine it will be many years.
Therefore, I wanted to propose changing the type of max_delta_ns to be
"unsigned long long" instead of "unsigned long". My understanding is
that a variable of type "unsigned long long" is 64-bits for both 32-bit
and 64-bit machines and hence this would allow a 32-bit machine to sleep
for longer than ~2.15 seconds. Please note that I also ended up making
"min_delta_ns" of type "unsigned long long" too and although this is
probably very unnecessary, it made the patch simpler. See below.
Anyway, making this change has allowed my 32-bit machine to sleep for
longer than ~2.15 seconds and thought this could be of interest to
others. Your comments/feedback would be appreciated.
[1] http://en.wikipedia.org/wiki/64-bit#64-bit_data_models
Cheers
Jon
Signed-off-by: Jon Hunter <[email protected]>
---
include/linux/clockchips.h | 6 +++---
kernel/hrtimer.c | 2 +-
kernel/time/clockevents.c | 10 +++++-----
kernel/time/tick-oneshot.c | 2 +-
kernel/time/timer_list.c | 4 ++--
5 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 3a1dbba..8154bc6 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -77,8 +77,8 @@ enum clock_event_nofitiers {
struct clock_event_device {
const char *name;
unsigned int features;
- unsigned long max_delta_ns;
- unsigned long min_delta_ns;
+ unsigned long long max_delta_ns;
+ unsigned long long min_delta_ns;
unsigned long mult;
int shift;
int rating;
@@ -116,7 +116,7 @@ static inline unsigned long div_sc(unsigned long
ticks, unsigned long nsec,
}
/* Clock event layer functions */
-extern unsigned long clockevent_delta2ns(unsigned long latch,
+extern unsigned long long clockevent_delta2ns(unsigned long latch,
struct clock_event_device *evt);
extern void clockevents_register_device(struct clock_event_device *dev);
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index cb8a15c..5b1cdc4 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1199,7 +1199,7 @@ hrtimer_interrupt_hanging(struct
clock_event_device *dev,
force_clock_reprogram = 1;
dev->min_delta_ns = (unsigned long)try_time.tv64 * 3;
printk(KERN_WARNING "hrtimer: interrupt too slow, "
- "forcing clock min delta to %lu ns\n", dev->min_delta_ns);
+ "forcing clock min delta to %llu ns\n", dev->min_delta_ns);
}
/*
* High resolution timer interrupt
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index d13be21..3fa07b3 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -36,10 +36,10 @@ static DEFINE_SPINLOCK(clockevents_lock);
*
* Math helper, returns latch value converted to nanoseconds (bound
checked)
*/
-unsigned long clockevent_delta2ns(unsigned long latch,
+unsigned long long clockevent_delta2ns(unsigned long latch,
struct clock_event_device *evt)
{
- u64 clc = ((u64) latch << evt->shift);
+ unsigned long long clc = ((unsigned long long) latch << evt->shift);
if (unlikely(!evt->mult)) {
evt->mult = 1;
@@ -49,10 +49,10 @@ unsigned long clockevent_delta2ns(unsigned long latch,
do_div(clc, evt->mult);
if (clc < 1000)
clc = 1000;
- if (clc > LONG_MAX)
- clc = LONG_MAX;
+ if (clc > LLONG_MAX)
+ clc = LLONG_MAX;
- return (unsigned long) clc;
+ return clc;
}
/**
diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c
index 2e8de67..857087b 100644
--- a/kernel/time/tick-oneshot.c
+++ b/kernel/time/tick-oneshot.c
@@ -50,7 +50,7 @@ int tick_dev_program_event(struct clock_event_device
*dev, ktime_t expires,
dev->min_delta_ns += dev->min_delta_ns >> 1;
printk(KERN_WARNING
- "CE: %s increasing min_delta_ns to %lu nsec\n",
+ "CE: %s increasing min_delta_ns to %llu nsec\n",
dev->name ? dev->name : "?",
dev->min_delta_ns << 1);
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index a999b92..3bf30b4 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -204,8 +204,8 @@ print_tickdevice(struct seq_file *m, struct
tick_device *td, int cpu)
return;
}
SEQ_printf(m, "%s\n", dev->name);
- SEQ_printf(m, " max_delta_ns: %lu\n", dev->max_delta_ns);
- SEQ_printf(m, " min_delta_ns: %lu\n", dev->min_delta_ns);
+ SEQ_printf(m, " max_delta_ns: %llu\n", dev->max_delta_ns);
+ SEQ_printf(m, " min_delta_ns: %llu\n", dev->min_delta_ns);
SEQ_printf(m, " mult: %lu\n", dev->mult);
SEQ_printf(m, " shift: %d\n", dev->shift);
SEQ_printf(m, " mode: %d\n", dev->mode);
--
1.6.1
* Jon Hunter <[email protected]> wrote:
> Hello,
>
> From reviewing the dynamic tick code, I noticed that the
> "max_delta_ns" member of the "clock_event_device" structure,
> represents the upper limit between timer events in nanoseconds.
> The variable, "max_delta_ns", is defined as an unsigned long. An
> unsigned long is a 32-bit integer for 32-bit machines and is a
> 64-bit integer for 64-bit machines (if -m64 option is used for
> gcc). Also see [1].
>
> The variable, "max_delta_ns", is configured by calling function
> "clockevent_delta2ns()". The maximum value that "max_delta_ns" can
> be set to by calling clockevent_delta2ns(), is LONG_MAX. For a
> 32-bit machine LONG_MAX is equal to 0x7fffffff and in nanoseconds
> this equates to ~2.15 seconds. Hence, the maximum sleep time for a
> 32-bit machine is ~2.15 seconds, where as for a 64-bit machine it
> will be many years.
>
> Therefore, I wanted to propose changing the type of max_delta_ns
> to be "unsigned long long" instead of "unsigned long". My
> understanding is that a variable of type "unsigned long long" is
> 64-bits for both 32-bit and 64-bit machines and hence this would
> allow a 32-bit machine to sleep for longer than ~2.15 seconds.
> Please note that I also ended up making "min_delta_ns" of type
> "unsigned long long" too and although this is probably very
> unnecessary, it made the patch simpler. See below.
>
> Anyway, making this change has allowed my 32-bit machine to sleep
> for longer than ~2.15 seconds and thought this could be of
> interest to others. Your comments/feedback would be appreciated.
>
> [1] http://en.wikipedia.org/wiki/64-bit#64-bit_data_models
>
> Cheers
> Jon
>
>
> Signed-off-by: Jon Hunter <[email protected]>
> ---
> include/linux/clockchips.h | 6 +++---
> kernel/hrtimer.c | 2 +-
> kernel/time/clockevents.c | 10 +++++-----
> kernel/time/tick-oneshot.c | 2 +-
> kernel/time/timer_list.c | 4 ++--
> 5 files changed, 12 insertions(+), 12 deletions(-)
Looks like a good thing at first sight. Thomas, John, what do you
think?
Ingo
On Tue, 2009-04-21 at 08:35 +0200, Ingo Molnar wrote:
> * Jon Hunter <[email protected]> wrote:
> > Anyway, making this change has allowed my 32-bit machine to sleep
> > for longer than ~2.15 seconds and thought this could be of
> > interest to others. Your comments/feedback would be appreciated.
> >
> Looks like a good thing at first sight. Thomas, John, what do you
> think?
The concern is many clocksources wrap after a handful of seconds. The
acpi_pm is the best example (its only 24 bits wide).
I brought this issue up earlier, and provided some example code that
could be used to limit the idle time appropriately for the current
clocksource here:
http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/02693.html
Jon: Did you see that mail, or is there a reason you didn't adapt this
code into your patch?
thanks
-john
john stultz wrote:
> The concern is many clocksources wrap after a handful of seconds. The
> acpi_pm is the best example (its only 24 bits wide).
>
> I brought this issue up earlier, and provided some example code that
> could be used to limit the idle time appropriately for the current
> clocksource here:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/02693.html
>
> Jon: Did you see that mail, or is there a reason you didn't adapt this
> code into your patch?
Hi John, yes I did read this email and thanks for bringing this up.
As I looked at this more I noticed that for 64-bit machines that the
max_delta_ns would be a 64-bit integer already and so this change would
only have an impact for 32-bit machines. I understand that there are
more 32-bit machines that 64-bit. However, I was trying to understand
how the wrapping of clocksources, such as the one you mention above, is
handled today for 64-bit machines that could theoretically sleep for
longer periods.
In addition to this as I was reviewing the "tick_nohz_stop_sched_tick()"
function that is configuring the dynamic tick and I noticed that this
code would actually stop the timer altogether if the time for the next
timer event is greater than NEXT_TIMER_MAX_DELTA jiffies. See code
snippet below. This is very unlikely, however, if this scenario was to
occur what would be the impact on the clocksource?
/*
* delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
* there is no timer pending or at least extremly far
* into the future (12 days for HZ=1000). In this case
* we simply stop the tick timer:
*/
if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
ts->idle_expires.tv64 = KTIME_MAX;
if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
hrtimer_cancel(&ts->sched_timer);
goto out;
}
I understand that clocksources need to be handled correctly, but as I
looked into this more I wanted to clarify how this is handled today for
64-bit machines. I appreciate your comments and feedback.
Cheers
Jon
On Tue, 2009-04-21 at 18:20 -0500, Jon Hunter wrote:
> john stultz wrote:
> > The concern is many clocksources wrap after a handful of seconds. The
> > acpi_pm is the best example (its only 24 bits wide).
> >
> > I brought this issue up earlier, and provided some example code that
> > could be used to limit the idle time appropriately for the current
> > clocksource here:
> >
> > http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/02693.html
> >
> > Jon: Did you see that mail, or is there a reason you didn't adapt this
> > code into your patch?
>
> Hi John, yes I did read this email and thanks for bringing this up.
>
> As I looked at this more I noticed that for 64-bit machines that the
> max_delta_ns would be a 64-bit integer already and so this change would
> only have an impact for 32-bit machines. I understand that there are
> more 32-bit machines that 64-bit. However, I was trying to understand
> how the wrapping of clocksources, such as the one you mention above, is
> handled today for 64-bit machines that could theoretically sleep for
> longer periods.
As the actual max_delta_ns currently comes from the clockevent driver,
that is our current limitation. For instance, on a box I'm using, the
lapic's max_delta_ns is a little more then 600ms. The HPET's does allow
for ~149sec timers, which would break with acpi_pm, but I suspect boxes
using the HPET for tick interrupts are currently using HPET for the
clocksource as well, which keeps it safe.
So I suspect its mostly luck that system configs don't hit the issue
that's saved us so far on 64bit boxes.
So yes, while your patch in of itself doesn't create the issue, it does
open the window a bit more. I'm just saying we need to add the
clocksource limiting factor in before more odd configs trip over it. :)
> In addition to this as I was reviewing the "tick_nohz_stop_sched_tick()"
> function that is configuring the dynamic tick and I noticed that this
> code would actually stop the timer altogether if the time for the next
> timer event is greater than NEXT_TIMER_MAX_DELTA jiffies. See code
> snippet below. This is very unlikely, however, if this scenario was to
> occur what would be the impact on the clocksource?
>
> /*
> * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
> * there is no timer pending or at least extremly far
>
> * into the future (12 days for HZ=1000). In this case
> * we simply stop the tick timer:
> */
> if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
> ts->idle_expires.tv64 = KTIME_MAX;
> if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
> hrtimer_cancel(&ts->sched_timer);
> goto out;
> }
Fair point. Thomas? Why do we kill the timer in the above case? Can that
catch us on an UP box? If so what would ever wake us up if there were no
other system interrupts?
thanks
-john
On Tue, 2009-04-21 at 18:20 -0500, Jon Hunter wrote:
> john stultz wrote:
> > The concern is many clocksources wrap after a handful of seconds. The
> > acpi_pm is the best example (its only 24 bits wide).
> >
> > I brought this issue up earlier, and provided some example code that
> > could be used to limit the idle time appropriately for the current
> > clocksource here:
> >
> > http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/02693.html
> >
> > Jon: Did you see that mail, or is there a reason you didn't adapt this
> > code into your patch?
>
> Hi John, yes I did read this email and thanks for bringing this up.
>
> As I looked at this more I noticed that for 64-bit machines that the
> max_delta_ns would be a 64-bit integer already and so this change would
> only have an impact for 32-bit machines. I understand that there are
> more 32-bit machines that 64-bit. However, I was trying to understand
> how the wrapping of clocksources, such as the one you mention above, is
> handled today for 64-bit machines that could theoretically sleep for
> longer periods.
One other minor comment nit, if we're really meaning that max_delta_ns
is a 64bit value, should we not just be using s64 and be explicit
instead of converting longs to long longs?
thanks
-john
john stultz wrote:
> One other minor comment nit, if we're really meaning that max_delta_ns
> is a 64bit value, should we not just be using s64 and be explicit
> instead of converting longs to long longs?
Thanks for the feedback. I am in two minds about this. I agree and I
would prefer to use s64/u64 as this is explicit with regard to the size
of the data type. However, for right or wrong I ended up with long long
because...
a). The function clockevent_delta2ns() uses LONG_MAX (or in my
suggestion LLONG_MAX) as the upper limit. LONG_MAX and LLONG_MAX are
defined as a long and long long respectively.
#define LONG_MAX ((long)(~0UL>>1))
#define LLONG_MAX ((long long)(~0ULL>>1))
b). There are a couple prints in the kernel for display max_delta_ns and
min_delta_ns. The prints use %lu and %llu to indicate long and long long
types respectively.
So to avoid using casts or possibly a type mismatch for some
architecture that I may have overlooked I kept it as long long. So this
assumes that long long will be a 64-bit type which I don't like.
However, this would not cause any compilation issues even if long long
turned out to be 32-bits or 128-bits (if this is even possible). We
could use u64 and cast where necessary to be safe/correct if this is
preferred.
Cheers
Jon
Jon Hunter wrote:
> So this
> assumes that long long will be a 64-bit type which I don't like.
> However, this would not cause any compilation issues even if long long
> turned out to be 32-bits or 128-bits (if this is even possible).
Isn't "long long" guaranteed to be 64-bit on all linux systems?
Unless the width is critical, I'd prefer to stay away from u64 until it
gets unified between architectures. I recently ran into a problem
printk-ing a "u64" value because it was a different type on ppc64 than
x86-64.
Chris
Chris Friesen wrote:
> Isn't "long long" guaranteed to be 64-bit on all linux systems?
If long long is guaranteed to be 64-bits this is the way to go. Looks
like there was some previous discussion on making u64 always a long
long, but I am not sure that this happened [1]. So may be this does
confirm this?
> Unless the width is critical, I'd prefer to stay away from u64 until it
> gets unified between architectures. I recently ran into a problem
> printk-ing a "u64" value because it was a different type on ppc64 than
> x86-64.
It is not critical but maybe more ideal, as it would be nice to be
explicit that this variable is intended to be 64bits. In fact the issue
you saw with the printk is one of the reasons that I previously
mentioned of why I had opted to stay with long long. I also found that
this issue was discussed in the thread I mentioned above [1]. Seems like
a common problem.
The alternative is to use u64 and make sure that all printks cast the
variable to long long where necessary. However, this is not clean and
you do run the risk of a new print being added that does not take this
into account and breaks the code for some architectures. So I wished to
avoid this.
For this specific case using long long should be fine. Even if there is
a case where long long is not 64bits, then this would not break
functionality, simply increase of decrease the dynamic range of
max_delta_ns and min_delta_ns.
Cheers
Jon
[1] http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/2805.html
On Wed, Apr 22, 2009 at 19:04, Jon Hunter <[email protected]> wrote:
> Chris Friesen wrote:
>>
>> Isn't "long long" guaranteed to be 64-bit on all linux systems?
>
> If long long is guaranteed to be 64-bits this is the way to go. Looks like
> there was some previous discussion on making u64 always a long long, but I
> am not sure that this happened [1]. So may be this does confirm this?
>
>> Unless the width is critical, I'd prefer to stay away from u64 until it
>> gets unified between architectures. I recently ran into a problem
>> printk-ing a "u64" value because it was a different type on ppc64 than
>> x86-64.
>
> It is not critical but maybe more ideal, as it would be nice to be explicit
> that this variable is intended to be 64bits. In fact the issue you saw with
> the printk is one of the reasons that I previously mentioned of why I had
> opted to stay with long long. I also found that this issue was discussed in
> the thread I mentioned above [1]. Seems like a common problem.
>
> The alternative is to use u64 and make sure that all printks cast the
> variable to long long where necessary. However, this is not clean and you do
> run the risk of a new print being added that does not take this into account
> and breaks the code for some architectures. So I wished to avoid this.
That's why recently, u64 became `unsigned long long' on ppc64. So please stay
away from the casts.
--
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds