Hi All,
On some new Intel Atom processors (Penwell and Cloverview), the TSC
doesn't stop in S3; that is, the TSC value isn't reset to 0 after
resume. This feature makes the TSC a more reliable clocksource and
can benefit the timekeeping code during system suspend/resume cycles.
The enabling work includes adding new flags for this feature and
modifying clocksource.c and timekeeping.c to support and utilize it.
The major change to timekeeping is how the suspended time is counted:
currently the persistent clock is tried first, and then the rtc if
the persistent clock can't be used. This series changes the order to:
suspend-nonstop clocksource -> persistent clock -> rtc
Please help to review them, thanks a lot!
Changelog:
v2:
* Drop the code that changes a clocksource's mult and shift,
as suggested by John, so as not to hurt accuracy
* Modify the CPU feature flag name to be consistent with
other flags
* Solve the problem of distinguishing S3 from S4, as the
clocksource counter will be reset after coming out of S4.
- Feng
Feng Tang (4):
x86: Add cpu capability flag X86_FEATURE_NONSTOP_TSC_S3
clocksource: Add new feature flag CLOCK_SOURCE_SUSPEND_NOTSTOP
x86: tsc: Add support for new S3_NOTSTOP feature
timekeeping: utilize the suspend-nonstop clocksource to count
suspended time
arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/kernel/cpu/intel.c | 12 ++++++++
arch/x86/kernel/tsc.c | 6 +++-
include/linux/clocksource.h | 1 +
kernel/time/timekeeping.c | 57 +++++++++++++++++++++++++++++++-----
5 files changed, 68 insertions(+), 9 deletions(-)
Signed-off-by: Feng Tang <[email protected]>
---
arch/x86/kernel/tsc.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 4b9ea10..fb581c6 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -768,7 +768,8 @@ static cycle_t read_tsc(struct clocksource *cs)
static void resume_tsc(struct clocksource *cs)
{
- clocksource_tsc.cycle_last = 0;
+ if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
+ clocksource_tsc.cycle_last = 0;
}
static struct clocksource clocksource_tsc = {
@@ -939,6 +940,9 @@ static int __init init_tsc_clocksource(void)
clocksource_tsc.flags &= ~CLOCK_SOURCE_IS_CONTINUOUS;
}
+ if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
+ clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NOTSTOP;
+
/*
* Trust the results of the earlier calibration on systems
* exporting a reliable TSC.
--
1.7.0.4
Some x86 processors have a TSC that continues to run while the
system is suspended. Add a feature flag so that it can be utilized.
Signed-off-by: Feng Tang <[email protected]>
---
include/linux/clocksource.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 27cfda4..5dee330 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -206,6 +206,7 @@ struct clocksource {
#define CLOCK_SOURCE_WATCHDOG 0x10
#define CLOCK_SOURCE_VALID_FOR_HRES 0x20
#define CLOCK_SOURCE_UNSTABLE 0x40
+#define CLOCK_SOURCE_SUSPEND_NOTSTOP 0x80
/* simplify initialization of mask field */
#define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
--
1.7.0.4
There are some new processors whose TSC clocksource won't stop during
suspend. Currently, after the system resumes, the kernel uses the
persistent clock or the RTC to compensate for the sleep time, but for
these new clocksources we can skip the compensation from external
sources and just use the current clocksource to recount the time.
This can solve time drift bugs caused by not-so-accurate or
error-prone RTC devices.
The current way to count suspended time is to try the persistent
clock first, and then the rtc if the persistent clock can't be used.
This patch changes the order to:
suspend-nonstop clocksource -> persistent clock -> rtc
Signed-off-by: Feng Tang <[email protected]>
---
kernel/time/timekeeping.c | 57 ++++++++++++++++++++++++++++++++++++++------
1 files changed, 49 insertions(+), 8 deletions(-)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9a0bc98..15cc086 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -788,22 +788,63 @@ void timekeeping_inject_sleeptime(struct timespec *delta)
static void timekeeping_resume(void)
{
struct timekeeper *tk = &timekeeper;
+ struct clocksource *clock = tk->clock;
unsigned long flags;
- struct timespec ts;
+ struct timespec ts_new, ts_delta;
+ cycle_t cycle_now, cycle_delta;
+ s64 nsec;
- read_persistent_clock(&ts);
+ ts_delta.tv_sec = 0;
+ read_persistent_clock(&ts_new);
clockevents_resume();
clocksource_resume();
write_seqlock_irqsave(&tk->lock, flags);
- if (timespec_compare(&ts, &timekeeping_suspend_time) > 0) {
- ts = timespec_sub(ts, timekeeping_suspend_time);
- __timekeeping_inject_sleeptime(tk, &ts);
- }
- /* re-base the last cycle value */
- tk->clock->cycle_last = tk->clock->read(tk->clock);
+ /*
+ * After system resumes, we need to calculate the suspended time and
+ * compensate it for the OS time. There are 3 sources that could be
+ * used: Nonstop clocksource during suspend, persistent clock and rtc
+ * device.
+ *
+ * One specific platform may have 1 or 2 or all of them, and the
+ * preference will be:
+ * suspend-nonstop clocksource > persistent clock > rtc
+ * The less preferred source will only be tried if there is no better
+ * usable source. The rtc part is handled separately in rtc core code.
+ */
+ cycle_now = clock->read(clock);
+ if ((clock->flags & CLOCK_SOURCE_SUSPEND_NOTSTOP) &&
+ cycle_now > clock->cycle_last) {
+
+ u64 max_cycles;
+ u32 mult = clock->mult;
+ u32 shift = clock->shift;
+
+ cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
+ /*
+ * The system suspended time and the delta cycles may be very
+ * long, so we can't call clocksource_cyc2ns() directly with
+ * clocksource's default mult and shift to avoid overflow.
+ */
+ max_cycles = 1ULL << (63 - (ilog2(mult) + 1));
+ while (cycle_delta > max_cycles) {
+ max_cycles <<= 1;
+ mult >>= 1;
+ shift--;
+ }
+
+ nsec = clocksource_cyc2ns(cycle_delta, mult, shift);
+ ts_delta = ns_to_timespec(nsec);
+ } else if (timespec_compare(&ts_new, &timekeeping_suspend_time) > 0)
+ ts_delta = timespec_sub(ts_new, timekeeping_suspend_time);
+
+ if (ts_delta.tv_sec >= 1)
+ __timekeeping_inject_sleeptime(tk, &ts_delta);
+
+ /* Re-base the last cycle value */
+ clock->cycle_last = clock->read(clock);
tk->ntp_error = 0;
timekeeping_suspended = 0;
timekeeping_update(tk, false);
--
1.7.0.4
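The mult/shift reduction in the timekeeping hunk above can be sanity-checked outside the kernel. Below is a hypothetical standalone rendering (ilog2_u64 and the typedefs stand in for kernel helpers; this is only a sketch of the same arithmetic, not kernel code):

```c
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

/* Stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0 */
static int ilog2_u64(u64 v)
{
	int l = -1;

	while (v) {
		v >>= 1;
		l++;
	}
	return l;
}

/*
 * Mirror of the suspend-time conversion in timekeeping_resume():
 * halve mult (and decrement shift, keeping the mult/2^shift ratio)
 * until cycle_delta * mult can no longer overflow 64 bits.
 */
static u64 cyc2ns_reduced(u64 cycle_delta, u32 mult, u32 shift)
{
	/* Largest delta whose product with mult still fits in 64 bits */
	u64 max_cycles = 1ULL << (63 - (ilog2_u64(mult) + 1));

	while (cycle_delta > max_cycles) {
		max_cycles <<= 1;
		mult >>= 1;
		shift--;
	}
	return (cycle_delta * mult) >> shift;
}
```

With mult = 1 << 22 and shift = 22 (a 1-cycle-per-ns setup), small deltas convert exactly, and a delta of 2^50 cycles, far too large for a direct (cycles * mult) >> shift, still converts without overflow, at the cost of the precision shaved off by the halved mult.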
On some new Intel Atom processors (Penwell and Cloverview), the TSC
doesn't stop in the S3 state; that is, the TSC value isn't reset to 0
after resume. This feature makes the TSC a more reliable clocksource
and can benefit the timekeeping code during system suspend/resume
cycles, so add a flag for it.
Signed-off-by: Feng Tang <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/kernel/cpu/intel.c | 12 ++++++++++++
2 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 93fe929..a8466f2 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -100,6 +100,7 @@
#define X86_FEATURE_AMD_DCM (3*32+27) /* multi-node processor */
#define X86_FEATURE_APERFMPERF (3*32+28) /* APERFMPERF */
#define X86_FEATURE_EAGER_FPU (3*32+29) /* "eagerfpu" Non lazy FPU restore */
+#define X86_FEATURE_NONSTOP_TSC_S3 (3*32+30) /* TSC doesn't stop in S3 state */
/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
#define X86_FEATURE_XMM3 (4*32+ 0) /* "pni" SSE-3 */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 1905ce9..fe57544 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -96,6 +96,18 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
sched_clock_stable = 1;
}
+ /* Penwell and Cloverview have a TSC that doesn't stop in S3 */
+ if (c->x86 == 6) {
+ switch (c->x86_model) {
+ case 0x27: /* Penwell */
+ case 0x35: /* Cloverview */
+ set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC_S3);
+ break;
+ default:
+ ;
+ }
+ }
+
/*
* There is a known erratum on Pentium III and Core Solo
* and Core Duo CPUs.
--
1.7.0.4
Hi Jason,
Thanks for the review and suggestions!
On Mon, Mar 04, 2013 at 08:20:36PM -0700, Jason Gunthorpe wrote:
> On Tue, Mar 05, 2013 at 02:34:25AM +0000, Tang, Feng wrote:
> > Hi Jason,
> >
> > Sorry, I forgot to add you in cc list in the first place. Please
> > help to review the patch series, thanks!
>
> Sure, I didn't get CC'd on the patches, so this is an imperfect reply,
> but..
I've copied your comments back to the main thread.
>
> Did you consider an approach closer to the function I outlined to
> Jonh:
>
> // Drops some small precision along the way but is simple..
> static inline u64 cyclecounter_cyc2ns_128(const struct cyclecounter *cc,
> cycle_t cycles)
> {
> u64 max = U64_MAX/cc->mult;
> u64 num = cycles/max;
> u64 result = num * ((max * cc->mult) >> cc->shift);
> return result + cyclecounter_cyc2ns(cc, cycles - num*cc->mult);
> }
Your way is surely more accurate; if the maintainers are OK with
adding the new API, I will use it.
>
> Rather than the while loop, which, I suspect, drops more precision
> than something like the above. (replace cyclecounter with clocksource)
> At the very least, keeping it as a distinct inline will let someone
> come by one day and implement a proper 128 bit multiply...
>
> You may want to also CC the maintainers of all the ARM subsystems that
> use read_persistent_clock and check with them to ensure this new
> interface will let them migrate their implementations as well.
Maybe I didn't get it right: my patches don't change
read_persistent_clock(), but add a new way of counting suspended
time. There should be no functional change for existing platforms.
> > * Solve the problem of judging S3/S4, as the clocksource
> > counter will be reset after coming out S4.
>
> Hrm, what if it wraps during suspend? This probably isn't a problem
> for a 64 bit TSC though..
Nice catch! I've added a wrap check in the patch, and will stop using
the clocksource if a wrap is found.
+ if ((clock->flags & CLOCK_SOURCE_SUSPEND_NOTSTOP) &&
+ cycle_now > clock->cycle_last) {
>
> Is it impossible to track if S4 or S3 was entered in the clocksource?
Currently there is no easy way to check it.
Thanks,
Feng
On Tue, Mar 05, 2013 at 11:53:02AM +0800, Feng Tang wrote:
> > // Drops some small precision along the way but is simple..
> > static inline u64 cyclecounter_cyc2ns_128(const struct cyclecounter *cc,
> > cycle_t cycles)
> > {
> > u64 max = U64_MAX/cc->mult;
> > u64 num = cycles/max;
> > u64 result = num * ((max * cc->mult) >> cc->shift);
> > return result + cyclecounter_cyc2ns(cc, cycles - num*cc->mult);
> > }
>
> Your way is surely more accurate, if maintainers are ok with adding
> the new API, I will use it.
Okay, give it a good look though, I only wrote it out in email, never
tested it :)
> > You may want to also CC the maintainers of all the ARM subsystems that
> > use read_persistent_clock and check with them to ensure this new
> > interface will let them migrate their implementations as well.
>
> Maybe I didn't get it well, my patches didn't change the
> read_persistent_clock(), but inject a new way of counting suspended
> time. It should have no functional changes to existing platforms.
Right, your patches are fine stand alone.
The ARM case of plat-omap/counter_32k.c would ideally be converted to
use your new API though; that is what I meant about involving them.
I'm not sure about mach-tegra/timer.c though - it seems to be using a
counter as well but somehow sharing registers with the RTC?
Regards,
Jason
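As an aside, the sketch quoted above computes the remainder as cycles - num*cc->mult, where the chunking logic seems to call for cycles - num*max (max being the chunk size in cycles). A corrected standalone rendering, with plain mult/shift parameters in place of the cyclecounter, and equally untested against a kernel tree:

```c
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

/*
 * Chunked cycles-to-ns conversion in the spirit of Jason's sketch:
 * split cycles into `num` full chunks of `max` cycles each (the
 * largest count whose product with mult fits in 64 bits), convert
 * one chunk, scale by num, then convert the remainder directly.
 */
static u64 cyc2ns_chunked(u64 cycles, u32 mult, u32 shift)
{
	u64 max = UINT64_MAX / mult;
	u64 num = cycles / max;
	u64 result = num * ((max * mult) >> shift);

	/* remainder = cycles - num * max < max, so remainder * mult fits */
	return result + (((cycles - num * max) * mult) >> shift);
}
```

As Jason notes, some precision is dropped: each full chunk is converted with a floored (max * mult) >> shift, so the error grows with the number of chunks, which is still negligible for realistic suspend lengths.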
On 03/05/2013 12:32 PM, Jason Gunthorpe wrote:
> On Tue, Mar 05, 2013 at 11:53:02AM +0800, Feng Tang wrote:
>
>>> // Drops some small precision along the way but is simple..
>>> static inline u64 cyclecounter_cyc2ns_128(const struct cyclecounter *cc,
>>> cycle_t cycles)
>>> {
>>> u64 max = U64_MAX/cc->mult;
>>> u64 num = cycles/max;
>>> u64 result = num * ((max * cc->mult) >> cc->shift);
>>> return result + cyclecounter_cyc2ns(cc, cycles - num*cc->mult);
>>> }
>> Your way is surely more accurate, if maintainers are ok with adding
>> the new API, I will use it.
> Okay, give it a good look though, I only wrote it out in email, never
> tested it :)
>
Probably want to use clocksource instead of cyclecounter, but I think
Jason's approach sounds ok. I might suggest that you initially make the
function static to the timekeeping code, just so we don't get unexpected
users.
thanks
-john
On 03/05/2013 10:27 AM, Feng Tang wrote:
> There are some new processors whose TSC clocksource won't stop during
> suspend. Currently, after system resumes, kernel will use persistent
> clock or RTC to compensate the sleep time, but for those new types of
> clocksources, we could skip the special compensation from external
> sources, and just use current clocksource for time recounting.
>
> This can solve some time drift bugs caused by some not-so-accurate or
> error-prone RTC devices.
>
> The current way to count suspended time is to try the persistent
> clock first, and then the rtc if the persistent clock can't be used.
> This patch changes the order to:
> suspend-nonstop clocksource -> persistent clock -> rtc
Thanks for sending out another iteration of this code. Jason's feedback
has been good, but I think this is starting to shape up nicely.
More below
> Signed-off-by: Feng Tang <[email protected]>
> ---
> kernel/time/timekeeping.c | 57 ++++++++++++++++++++++++++++++++++++++------
> 1 files changed, 49 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 9a0bc98..15cc086 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -788,22 +788,63 @@ void timekeeping_inject_sleeptime(struct timespec *delta)
> static void timekeeping_resume(void)
> {
> struct timekeeper *tk = &timekeeper;
> + struct clocksource *clock = tk->clock;
> unsigned long flags;
> - struct timespec ts;
> + struct timespec ts_new, ts_delta;
> + cycle_t cycle_now, cycle_delta;
> + s64 nsec;
>
> - read_persistent_clock(&ts);
> + ts_delta.tv_sec = 0;
> + read_persistent_clock(&ts_new);
>
> clockevents_resume();
> clocksource_resume();
>
> write_seqlock_irqsave(&tk->lock, flags);
>
> - if (timespec_compare(&ts, &timekeeping_suspend_time) > 0) {
> - ts = timespec_sub(ts, timekeeping_suspend_time);
> - __timekeeping_inject_sleeptime(tk, &ts);
> - }
> - /* re-base the last cycle value */
> - tk->clock->cycle_last = tk->clock->read(tk->clock);
> + /*
> + * After system resumes, we need to calculate the suspended time and
> + * compensate it for the OS time. There are 3 sources that could be
> + * used: Nonstop clocksource during suspend, persistent clock and rtc
> + * device.
> + *
> + * One specific platform may have 1 or 2 or all of them, and the
> + * preference will be:
> + * suspend-nonstop clocksource > persistent clock > rtc
> + * The less preferred source will only be tried if there is no better
> + * usable source. The rtc part is handled separately in rtc core code.
> + */
> + cycle_now = clock->read(clock);
So this might be ok for an initial implementation, as on the
non-stop-tsc hardware, the TSC is the best clocksource available. One
long-term concern is that there may be cases where the non-stop
clocksource is not the most performant clocksource on a system. In that
case, we may want to use a non-stop clocksource that is not the current
timekeeping clocksource. That may require some extra clocksource core
interfaces to access the non-stop clocksource instead of using the
timekeeper's clocksource. We'll also have to be sure to use something
other than cycle_last in that case, since we'll need to read the nonstop
clocksource at suspend, rather than trusting that forward_now updates
cycle_last as is done here.
thanks
-john
On Mon, Mar 04, 2013 at 09:32:03PM -0700, Jason Gunthorpe wrote:
> On Tue, Mar 05, 2013 at 11:53:02AM +0800, Feng Tang wrote:
> > > You may want to also CC the maintainers of all the ARM subsystems that
> > > use read_persistent_clock and check with them to ensure this new
> > > interface will let them migrate their implementations as well.
> >
> > Maybe I didn't get it well, my patches didn't change the
> > read_persistent_clock(), but inject a new way of counting suspended
> > time. It should have no functional changes to existing platforms.
>
> Right, your patches are fine stand alone.
>
> The ARM case of plat-omap/counter_32k.c would ideally be converted to
> use your new API though; that is what I meant about involving them.
I see now. Yes, counter_32k could be converted to a clocksource with
the SUSPEND_NONSTOP flag set, and it would no longer need to use
read_persistent_clock.
>
> I'm not sure about mach-tegra/timer.c though - it seems to be using a
> counter as well but somehow sharing registers with the RTC?
I just searched the 3.9-rc1 code; it seems the file has been moved to
drivers/clocksource/tegra20_timer.c, and its persistent clock also
seems to be based on an RTC-like device.
Thanks,
Feng
On Tue, Mar 05, 2013 at 02:27:34PM +0800, John Stultz wrote:
> On 03/05/2013 10:27 AM, Feng Tang wrote:
> >There are some new processors whose TSC clocksource won't stop during
> >suspend. Currently, after system resumes, kernel will use persistent
> >clock or RTC to compensate the sleep time, but for those new types of
> >clocksources, we could skip the special compensation from external
> >sources, and just use current clocksource for time recounting.
> >
> >This can solve some time drift bugs caused by some not-so-accurate or
> >error-prone RTC devices.
> >
> >The current way to count suspended time is to try the persistent
> >clock first, and then the rtc if the persistent clock can't be used.
> >This patch changes the order to:
> > suspend-nonstop clocksource -> persistent clock -> rtc
>
> Thanks for sending out another iteration of this code. Jason's
> feedback has been good, but I think this is starting to shape up
> nicely.
Thanks :)
> >Signed-off-by: Feng Tang <[email protected]>
> >---
> > kernel/time/timekeeping.c | 57 ++++++++++++++++++++++++++++++++++++++------
> > 1 files changed, 49 insertions(+), 8 deletions(-)
> >
> >diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> >index 9a0bc98..15cc086 100644
> >--- a/kernel/time/timekeeping.c
> >+++ b/kernel/time/timekeeping.c
> >@@ -788,22 +788,63 @@ void timekeeping_inject_sleeptime(struct timespec *delta)
> > static void timekeeping_resume(void)
> > {
> > struct timekeeper *tk = &timekeeper;
> >+ struct clocksource *clock = tk->clock;
> > unsigned long flags;
> >- struct timespec ts;
> >+ struct timespec ts_new, ts_delta;
> >+ cycle_t cycle_now, cycle_delta;
> >+ s64 nsec;
> >- read_persistent_clock(&ts);
> >+ ts_delta.tv_sec = 0;
> >+ read_persistent_clock(&ts_new);
> > clockevents_resume();
> > clocksource_resume();
> > write_seqlock_irqsave(&tk->lock, flags);
> >- if (timespec_compare(&ts, &timekeeping_suspend_time) > 0) {
> >- ts = timespec_sub(ts, timekeeping_suspend_time);
> >- __timekeeping_inject_sleeptime(tk, &ts);
> >- }
> >- /* re-base the last cycle value */
> >- tk->clock->cycle_last = tk->clock->read(tk->clock);
> >+ /*
> >+ * After system resumes, we need to calculate the suspended time and
> >+ * compensate it for the OS time. There are 3 sources that could be
> >+ * used: Nonstop clocksource during suspend, persistent clock and rtc
> >+ * device.
> >+ *
> >+ * One specific platform may have 1 or 2 or all of them, and the
> >+ * preference will be:
> >+ * suspend-nonstop clocksource > persistent clock > rtc
> >+ * The less preferred source will only be tried if there is no better
> >+ * usable source. The rtc part is handled separately in rtc core code.
> >+ */
> >+ cycle_now = clock->read(clock);
>
> So this might be ok for an initial implementation, as on the
> non-stop-tsc hardware, the TSC is the best clocksource available.
> One concern long term is that there may be cases where the non-stop
> clocksource is not the most performant clocksource on a system. In
> that case, we may want to use a non-stop clocksource that is not the
> current timekeeping clocksource. So that may require some extra
> clocksource core interfaces to access the non-stop clocksource
> instead of using the timekeeper's clocksource, also we'll have to be
> sure to use something other than cycle_last in that case, since
> we'll need to read the nonstop clocksource at suspend, rather than
> trusting that forward_now updates cycle_last as is done here.
Yeah, I just realized this when Jason mentioned the counter_32k on
OMAP.
So as a next step, we could add something in timekeeping.c like
static struct clocksource *suspend_time_cs;
read and save its counter right before entering and right after
getting out of the suspended state. And create a new struct which
includes all the suspend-time-related flags, counters, and pointers,
making it a member of struct timekeeper. Comments?
Thanks,
Feng
On 03/05/2013 02:38 PM, Feng Tang wrote:
> On Tue, Mar 05, 2013 at 02:27:34PM +0800, John Stultz wrote:
>
>>
>> So this might be ok for an initial implementation, as on the
>> non-stop-tsc hardware, the TSC is the best clocksource available.
>> One concern long term is that there may be cases where the non-stop
>> clocksource is not the most performant clocksource on a system. In
>> that case, we may want to use a non-stop clocksource that is not the
>> current timekeeping clocksource. So that may require some extra
>> clocksource core interfaces to access the non-stop clocksource
>> instead of using the timekeeper's clocksource, also we'll have to be
>> sure to use something other than cycle_last in that case, since
>> we'll need to read the nonstop clocksource at suspend, rather than
>> trusting that forward_now updates cycle_last as is done here.
> Yeah, I just realized this when Jason mentioned the counter_32k on
> OMAP.
>
> So as a next step, we could add something in timekeeping.c like
> static struct clocksource *suspend_time_cs;
> read and save its counter right before entering and right after
> getting out of the suspended state. And create a new struct which
> includes all the suspend-time-related flags, counters, and pointers,
> making it a member of struct timekeeper. Comments?
I'd maybe add it to the clocksource code rather than the timekeeper.
Have a clocksource_nonstop_clock() accessor which returns a pointer to
the highest-rated clocksource in the clocksource list that has the
nonstop flag (updating the pointer at registration time, rather than
scanning the list every time).
That way, if we want, we can later define clocksource_nonstop_clock()
as NULL and let the compiler optimize out the extra complexity in the
resume path if the arch doesn't support a nonstop clocksource.
thanks
-john
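John's suggestion could be sketched roughly as follows; the struct layout, hook name, and flag value here are all hypothetical, only illustrating the keep-the-best-at-registration-time idea rather than any real kernel interface:

```c
#include <stddef.h>

/* Minimal illustrative clocksource, not the real kernel struct */
struct clocksource {
	const char *name;
	int rating;
	unsigned long flags;
};

#define CLOCK_SOURCE_SUSPEND_NOTSTOP 0x80

/* Best-rated suspend-nonstop clocksource seen so far, NULL if none */
static struct clocksource *nonstop_clock;

/* Would be called from clocksource registration */
static void clocksource_track_nonstop(struct clocksource *cs)
{
	if (!(cs->flags & CLOCK_SOURCE_SUSPEND_NOTSTOP))
		return;
	if (!nonstop_clock || cs->rating > nonstop_clock->rating)
		nonstop_clock = cs;
}

/* Accessor for the resume path; may return NULL */
static struct clocksource *clocksource_nonstop_clock(void)
{
	return nonstop_clock;
}
```

The accessor stays O(1) on resume, and the pointer only changes on the (rare) registration path, which fits John's point about not scanning the list every time.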
On Tue, Mar 05, 2013 at 02:47:27PM +0800, John Stultz wrote:
> On 03/05/2013 02:38 PM, Feng Tang wrote:
> >On Tue, Mar 05, 2013 at 02:27:34PM +0800, John Stultz wrote:
> >
> >>
> >>So this might be ok for an initial implementation, as on the
> >>non-stop-tsc hardware, the TSC is the best clocksource available.
> >>One concern long term is that there may be cases where the non-stop
> >>clocksource is not the most performant clocksource on a system. In
> >>that case, we may want to use a non-stop clocksource that is not the
> >>current timekeeping clocksource. So that may require some extra
> >>clocksource core interfaces to access the non-stop clocksource
> >>instead of using the timekeeper's clocksource, also we'll have to be
> >>sure to use something other than cycle_last in that case, since
> >>we'll need to read the nonstop clocksource at suspend, rather than
> >>trusting that forward_now updates cycle_last as is done here.
> >Yeah, I just realized this when Jason mentioned the counter_32k on
> >OMAP.
> >
> >So as a next step, we could add something in timekeeping.c like
> > static struct clocksource *suspend_time_cs;
> >read and save its counter right before entering and right after
> >getting out of the suspended state. And create a new struct which
> >includes all the suspend-time-related flags, counters, and pointers,
> >making it a member of struct timekeeper. Comments?
> I'd maybe add it to the clocksource code rather than the timekeeper.
> Have a clocksource_nonstop_clock() accessor which returns a pointer
> to the highest-rated clocksource in the clocksource list that has
> the nonstop flag (updating the pointer at registration time, rather
> than scanning the list every time).
Yes, it's more natural to put it in the clocksource code if we need
to pick and compare the best non-stop clocksource.
Thanks,
Feng
On Tue, Mar 05, 2013 at 02:17:59PM +0800, John Stultz wrote:
> On 03/05/2013 12:32 PM, Jason Gunthorpe wrote:
> >On Tue, Mar 05, 2013 at 11:53:02AM +0800, Feng Tang wrote:
> >
> >>>// Drops some small precision along the way but is simple..
> >>>static inline u64 cyclecounter_cyc2ns_128(const struct cyclecounter *cc,
> >>> cycle_t cycles)
> >>>{
> >>> u64 max = U64_MAX/cc->mult;
> >>> u64 num = cycles/max;
> >>> u64 result = num * ((max * cc->mult) >> cc->shift);
> >>> return result + cyclecounter_cyc2ns(cc, cycles - num*cc->mult);
> >>>}
> >
> Probably want to use clocksource instead of cyclecounter, but I
> think Jason's approach sounds ok. I might suggest that you initially
> make the function static to the timekeeping code, just so we don't
> get unexpected users.
Having thought about it more: can we make clocksource_cyc2ns() cover
the overflow case directly? Something like:
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index aa7032c..1ecc872 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -274,7 +274,16 @@ static inline u32 clocksource_hz2mult(u32 hz, u32 shift_constant)
*/
static inline s64 clocksource_cyc2ns(cycle_t cycles, u32 mult, u32 shift)
{
- return ((u64) cycles * mult) >> shift;
+ u64 max = ULLONG_MAX / mult;
+ s64 nsec = 0;
+
+ /* The (mult * cycles) may overflow 64 bits, so add a max check */
+ if (cycles > max) {
+ nsec = ((max * mult) >> shift) * (cycles / max);
+ cycles %= max;
+ }
+ nsec += ((u64) cycles * mult) >> shift;
+ return nsec;
}
Thanks,
Feng
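For reference, the proposed conversion can be exercised standalone; stdint typedefs replace the kernel's types, and this is only a sketch of the same arithmetic, not a tested kernel change:

```c
#include <stdint.h>

typedef uint64_t u64;
typedef int64_t s64;
typedef uint32_t u32;

/* Standalone copy of the overflow-aware clocksource_cyc2ns() above */
static s64 cyc2ns_safe(u64 cycles, u32 mult, u32 shift)
{
	u64 max = UINT64_MAX / mult;
	s64 nsec = 0;

	/* The (mult * cycles) may overflow 64 bits, so add a max check */
	if (cycles > max) {
		nsec = (s64)(((max * mult) >> shift) * (cycles / max));
		cycles %= max;
	}
	nsec += (s64)((cycles * mult) >> shift);
	return nsec;
}
```

In the common case (cycles <= max) this is exactly the old one-line conversion; the extra division and modulo only run for deltas long enough to overflow, i.e. after a long suspend.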