2006-09-30 00:12:39

by Thomas Gleixner

[permalink] [raw]
Subject: [patch 02/23] GTOD: persistent clock support, core

From: John Stultz <[email protected]>

persistent clock support: do proper timekeeping across suspend/resume.

Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
--
include/linux/hrtimer.h | 3 +++
include/linux/time.h | 1 +
kernel/hrtimer.c | 8 ++++++++
kernel/timer.c | 34 +++++++++++++++++++++++++++++++---
4 files changed, 43 insertions(+), 3 deletions(-)

linux-2.6.18-rc6_timeofday-persistent-clock-generic_C6.patch
Index: linux-2.6.18-mm2/include/linux/hrtimer.h
===================================================================
--- linux-2.6.18-mm2.orig/include/linux/hrtimer.h 2006-09-30 01:41:14.000000000 +0200
+++ linux-2.6.18-mm2/include/linux/hrtimer.h 2006-09-30 01:41:15.000000000 +0200
@@ -146,6 +146,9 @@ extern void hrtimer_init_sleeper(struct
/* Soft interrupt function to run the hrtimer queues: */
extern void hrtimer_run_queues(void);

+/* Resume notification */
+void hrtimer_notify_resume(void);
+
/* Bootup initialization: */
extern void __init hrtimers_init(void);

Index: linux-2.6.18-mm2/include/linux/time.h
===================================================================
--- linux-2.6.18-mm2.orig/include/linux/time.h 2006-09-30 01:41:14.000000000 +0200
+++ linux-2.6.18-mm2/include/linux/time.h 2006-09-30 01:41:15.000000000 +0200
@@ -92,6 +92,7 @@ extern struct timespec xtime;
extern struct timespec wall_to_monotonic;
extern seqlock_t xtime_lock;

+extern unsigned long read_persistent_clock(void);
void timekeeping_init(void);

static inline unsigned long get_seconds(void)
Index: linux-2.6.18-mm2/kernel/hrtimer.c
===================================================================
--- linux-2.6.18-mm2.orig/kernel/hrtimer.c 2006-09-30 01:41:14.000000000 +0200
+++ linux-2.6.18-mm2/kernel/hrtimer.c 2006-09-30 01:41:15.000000000 +0200
@@ -287,6 +287,14 @@ static unsigned long ktime_divns(const k
#endif /* BITS_PER_LONG >= 64 */

/*
+ * Timekeeping resumed notification
+ */
+void hrtimer_notify_resume(void)
+{
+ clock_was_set();
+}
+
+/*
* Counterpart to lock_timer_base above:
*/
static inline
Index: linux-2.6.18-mm2/kernel/timer.c
===================================================================
--- linux-2.6.18-mm2.orig/kernel/timer.c 2006-09-30 01:41:14.000000000 +0200
+++ linux-2.6.18-mm2/kernel/timer.c 2006-09-30 01:41:15.000000000 +0200
@@ -41,6 +41,9 @@
#include <asm/timex.h>
#include <asm/io.h>

+/* jiffies at the most recent update of wall time */
+unsigned long wall_jiffies = INITIAL_JIFFIES;
+
u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;

EXPORT_SYMBOL(jiffies_64);
@@ -743,12 +746,20 @@ int timekeeping_is_continuous(void)
return ret;
}

+/* Weak dummy function for arches that do not yet support it.
+ * XXX - Do be sure to remove it once all arches implement it.
+ */
+unsigned long __attribute__((weak)) read_persistent_clock(void)
+{
+ return 0;
+}
+
/*
* timekeeping_init - Initializes the clocksource and common timekeeping values
*/
void __init timekeeping_init(void)
{
- unsigned long flags;
+ unsigned long flags, sec = read_persistent_clock();

write_seqlock_irqsave(&xtime_lock, flags);

@@ -758,11 +769,18 @@ void __init timekeeping_init(void)
clocksource_calculate_interval(clock, tick_nsec);
clock->cycle_last = clocksource_read(clock);

+ xtime.tv_sec = sec;
+ xtime.tv_nsec = (jiffies % HZ) * (NSEC_PER_SEC / HZ);
+ set_normalized_timespec(&wall_to_monotonic,
+ -xtime.tv_sec, -xtime.tv_nsec);
+
write_sequnlock_irqrestore(&xtime_lock, flags);
}


static int timekeeping_suspended;
+static unsigned long timekeeping_suspend_time;
+
/**
* timekeeping_resume - Resumes the generic timekeeping subsystem.
* @dev: unused
@@ -773,14 +791,23 @@ static int timekeeping_suspended;
*/
static int timekeeping_resume(struct sys_device *dev)
{
- unsigned long flags;
+ unsigned long flags, now = read_persistent_clock();

write_seqlock_irqsave(&xtime_lock, flags);
- /* restart the last cycle value */
+
+ if (now && (now > timekeeping_suspend_time)) {
+ unsigned long sleep_length = now - timekeeping_suspend_time;
+ xtime.tv_sec += sleep_length;
+ jiffies_64 += sleep_length * HZ;
+ }
+ /* re-base the last cycle value */
clock->cycle_last = clocksource_read(clock);
clock->error = 0;
timekeeping_suspended = 0;
write_sequnlock_irqrestore(&xtime_lock, flags);
+
+ hrtimer_notify_resume();
+
return 0;
}

@@ -790,6 +817,7 @@ static int timekeeping_suspend(struct sy

write_seqlock_irqsave(&xtime_lock, flags);
timekeeping_suspended = 1;
+ timekeeping_suspend_time = read_persistent_clock();
write_sequnlock_irqrestore(&xtime_lock, flags);
return 0;
}

--


2006-09-30 08:36:20

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 02/23] GTOD: persistent clock support, core


On Fri, 29 Sep 2006 23:58:21 -0000
Thomas Gleixner <[email protected]> wrote:

> From: John Stultz <[email protected]>
>
> persistent clock support: do proper timekeeping across suspend/resume.

How?

> +/* Weak dummy function for arches that do not yet support it.
> + * XXX - Do be sure to remove it once all arches implement it.
> + */
> +unsigned long __attribute__((weak)) read_persistent_clock(void)
> +{
> + return 0;
> +}

Seconds? microseconds? jiffies? walltime? uptime?

Needs some comments.


> void __init timekeeping_init(void)
> {
> - unsigned long flags;
> + unsigned long flags, sec = read_persistent_clock();

So it apparently returns seconds-since-epoch?

If so, why?

> write_seqlock_irqsave(&xtime_lock, flags);
>
> @@ -758,11 +769,18 @@ void __init timekeeping_init(void)
> clocksource_calculate_interval(clock, tick_nsec);
> clock->cycle_last = clocksource_read(clock);
>
> + xtime.tv_sec = sec;
> + xtime.tv_nsec = (jiffies % HZ) * (NSEC_PER_SEC / HZ);

Why is it valid to take the second from the persistent clock and the
fraction-of-a-second from jiffies? Some comments describing the
implementation would improve its understandability and maintainability.

This statement can set xtime.tv_nsec to a value >= NSEC_PER_SEC. Should it
not be normalised?

> + set_normalized_timespec(&wall_to_monotonic,
> + -xtime.tv_sec, -xtime.tv_nsec);
> +
> write_sequnlock_irqrestore(&xtime_lock, flags);
> }
>
>
> static int timekeeping_suspended;
> +static unsigned long timekeeping_suspend_time;

In what units?

> +
> /**
> * timekeeping_resume - Resumes the generic timekeeping subsystem.
> * @dev: unused
> @@ -773,14 +791,23 @@ static int timekeeping_suspended;
> */
> static int timekeeping_resume(struct sys_device *dev)
> {
> - unsigned long flags;
> + unsigned long flags, now = read_persistent_clock();

Would whoever keeps doing that please stop it? This:

unsigned long flags;
unsigned long now = read_persistent_clock();

is more readable and makes for more readable patches in the future.

> write_seqlock_irqsave(&xtime_lock, flags);
> - /* restart the last cycle value */
> +
> + if (now && (now > timekeeping_suspend_time)) {
> + unsigned long sleep_length = now - timekeeping_suspend_time;
> + xtime.tv_sec += sleep_length;
> + jiffies_64 += sleep_length * HZ;

sleep_length will overflow if we slept for more than 49 days, and HZ=1000.

> + }
> + /* re-base the last cycle value */
> clock->cycle_last = clocksource_read(clock);
> clock->error = 0;
> timekeeping_suspended = 0;
> write_sequnlock_irqrestore(&xtime_lock, flags);
> +
> + hrtimer_notify_resume();
> +
> return 0;
> }
>
> @@ -790,6 +817,7 @@ static int timekeeping_suspend(struct sy
>
> write_seqlock_irqsave(&xtime_lock, flags);
> timekeeping_suspended = 1;
> + timekeeping_suspend_time = read_persistent_clock();
> write_sequnlock_irqrestore(&xtime_lock, flags);
> return 0;
> }
>
> --

2006-09-30 17:24:32

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [patch 02/23] GTOD: persistent clock support, core


>> persistent clock support: do proper timekeeping across suspend/resume.
>
>How?

Rereading the RTC seems the only way to me. Someone prove me wrong, and
do it fast! :)


Jan Engelhardt
--

2006-10-02 21:49:32

by john stultz

[permalink] [raw]
Subject: Re: [patch 02/23] GTOD: persistent clock support, core

On Sat, 2006-09-30 at 01:35 -0700, Andrew Morton wrote:
> On Fri, 29 Sep 2006 23:58:21 -0000
> Thomas Gleixner <[email protected]> wrote:
>
> > From: John Stultz <[email protected]>
> >
> > persistent clock support: do proper timekeeping across suspend/resume.
>
> How?

Improved description included below.

> > +/* Weak dummy function for arches that do not yet support it.
> > + * XXX - Do be sure to remove it once all arches implement it.
> > + */
> > +unsigned long __attribute__((weak)) read_persistent_clock(void)
> > +{
> > + return 0;
> > +}
>
> Seconds? microseconds? jiffies? walltime? uptime?
>
> Needs some comments.

Agreed. Thanks for pointing it out.


>
> > void __init timekeeping_init(void)
> > {
> > - unsigned long flags;
> > + unsigned long flags, sec = read_persistent_clock();
>
> So it apparently returns seconds-since-epoch?
>
> If so, why?
>
> > write_seqlock_irqsave(&xtime_lock, flags);
> >
> > @@ -758,11 +769,18 @@ void __init timekeeping_init(void)
> > clocksource_calculate_interval(clock, tick_nsec);
> > clock->cycle_last = clocksource_read(clock);
> >
> > + xtime.tv_sec = sec;
> > + xtime.tv_nsec = (jiffies % HZ) * (NSEC_PER_SEC / HZ);
>
> Why is it valid to take the second from the persistent clock and the
> fraction-of-a-second from jiffies? Some comments describing the
> implementation would improve its understandability and maintainability.

Yea. i386 and other arches have done this for awhile, so I preserved it.
However on further inspection, it really doesn't make much sense. We're
pre-timer interurpts anyway, so jiffies won't have begun yet. So now I
just initialize it to zero.

> This statement can set xtime.tv_nsec to a value >= NSEC_PER_SEC. Should it
> not be normalised?

Yep, it is, and you commented just above it.:)

> > + set_normalized_timespec(&wall_to_monotonic,
> > + -xtime.tv_sec, -xtime.tv_nsec);
> > +
> > write_sequnlock_irqrestore(&xtime_lock, flags);
> > }
> >
> > static int timekeeping_suspended;
> > +static unsigned long timekeeping_suspend_time;
>
> In what units?

Fixed.

> > +
> > /**
> > * timekeeping_resume - Resumes the generic timekeeping subsystem.
> > * @dev: unused
> > @@ -773,14 +791,23 @@ static int timekeeping_suspended;
> > */
> > static int timekeeping_resume(struct sys_device *dev)
> > {
> > - unsigned long flags;
> > + unsigned long flags, now = read_persistent_clock();
>
> Would whoever keeps doing that please stop it? This:
> unsigned long flags;
> unsigned long now = read_persistent_clock();
>
> is more readable and makes for more readable patches in the future.

Fixed.

> > write_seqlock_irqsave(&xtime_lock, flags);
> > - /* restart the last cycle value */
> > +
> > + if (now && (now > timekeeping_suspend_time)) {
> > + unsigned long sleep_length = now - timekeeping_suspend_time;
> > + xtime.tv_sec += sleep_length;
> > + jiffies_64 += sleep_length * HZ;
>
> sleep_length will overflow if we slept for more than 49 days, and HZ=1000.

Oh! Great catch! Fixed.

Thanks so much for the thorough review!

Updated patch follows:

thanks
-john


Implement generic timekeeping suspend/resume accounting by introducing
the read_persistent_clock() interface. This is an arch specific
function that returns the seconds since the epoch using the arch
defined battery backed clock.

Aside from allowing the removal of duplicate arch specific resume
logic, this change helps avoid potential resume time ordering issues
between generic and arch specific time code.

This patch only provides the generic usage of this new function and a
weak dummy function, that always returns zero if no arch specific
function is defined. Thus if no persistent clock is present, no change
in behavior should be seen with this patch.

Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: John Stultz <[email protected]>
--

include/linux/hrtimer.h | 3 +++
include/linux/time.h | 1 +
kernel/hrtimer.c | 8 ++++++++
kernel/timer.c | 40 +++++++++++++++++++++++++++++++++++++++-
4 files changed, 51 insertions(+), 1 deletion(-)

linux-2.6.18_timeofday-persistent-clock-generic_C7.patch
============================================
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index fca9302..660d91d 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -146,6 +146,9 @@ extern void hrtimer_init_sleeper(struct
/* Soft interrupt function to run the hrtimer queues: */
extern void hrtimer_run_queues(void);

+/* Resume notification */
+void hrtimer_notify_resume(void);
+
/* Bootup initialization: */
extern void __init hrtimers_init(void);

diff --git a/include/linux/time.h b/include/linux/time.h
index a5b7399..db31d2a 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -92,6 +92,7 @@ extern struct timespec xtime;
extern struct timespec wall_to_monotonic;
extern seqlock_t xtime_lock;

+extern unsigned long read_persistent_clock(void);
void timekeeping_init(void);

static inline unsigned long get_seconds(void)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index d0ba190..090b752 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -287,6 +287,14 @@ static unsigned long ktime_divns(const k
#endif /* BITS_PER_LONG >= 64 */

/*
+ * Timekeeping resumed notification
+ */
+void hrtimer_notify_resume(void)
+{
+ clock_was_set();
+}
+
+/*
* Counterpart to lock_timer_base above:
*/
static inline
diff --git a/kernel/timer.c b/kernel/timer.c
index c1c7fbc..5069139 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -41,6 +41,9 @@
#include <asm/timex.h>
#include <asm/io.h>

+/* jiffies at the most recent update of wall time */
+unsigned long wall_jiffies = INITIAL_JIFFIES;
+
u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;

EXPORT_SYMBOL(jiffies_64);
@@ -743,12 +746,27 @@ int timekeeping_is_continuous(void)
return ret;
}

+/**
+ * read_persistent_clock - Return time in seconds from the persistent clock.
+ *
+ * Weak dummy function for arches that do not yet support it.
+ * Returns seconds from epoch using the battery backed persistent clock.
+ * Returns zero if unsupported.
+ *
+ * XXX - Do be sure to remove it once all arches implement it.
+ */
+unsigned long __attribute__((weak)) read_persistent_clock(void)
+{
+ return 0;
+}
+
/*
* timekeeping_init - Initializes the clocksource and common timekeeping values
*/
void __init timekeeping_init(void)
{
unsigned long flags;
+ unsigned long sec = read_persistent_clock();

write_seqlock_irqsave(&xtime_lock, flags);

@@ -758,11 +776,20 @@ void __init timekeeping_init(void)
clocksource_calculate_interval(clock, tick_nsec);
clock->cycle_last = clocksource_read(clock);

+ xtime.tv_sec = sec;
+ xtime.tv_nsec = 0;
+ set_normalized_timespec(&wall_to_monotonic,
+ -xtime.tv_sec, -xtime.tv_nsec);
+
write_sequnlock_irqrestore(&xtime_lock, flags);
}


+/* flag for if timekeeping is suspended */
static int timekeeping_suspended;
+/* time in seconds when suspend began */
+static unsigned long timekeeping_suspend_time;
+
/**
* timekeeping_resume - Resumes the generic timekeeping subsystem.
* @dev: unused
@@ -774,13 +801,23 @@ static int timekeeping_suspended;
static int timekeeping_resume(struct sys_device *dev)
{
unsigned long flags;
+ unsigned long now = read_persistent_clock();

write_seqlock_irqsave(&xtime_lock, flags);
- /* restart the last cycle value */
+
+ if (now && (now > timekeeping_suspend_time)) {
+ unsigned long sleep_length = now - timekeeping_suspend_time;
+ xtime.tv_sec += sleep_length;
+ jiffies_64 += (u64)sleep_length * HZ;
+ }
+ /* re-base the last cycle value */
clock->cycle_last = clocksource_read(clock);
clock->error = 0;
timekeeping_suspended = 0;
write_sequnlock_irqrestore(&xtime_lock, flags);
+
+ hrtimer_notify_resume();
+
return 0;
}

@@ -790,6 +827,7 @@ static int timekeeping_suspend(struct sy

write_seqlock_irqsave(&xtime_lock, flags);
timekeeping_suspended = 1;
+ timekeeping_suspend_time = read_persistent_clock();
write_sequnlock_irqrestore(&xtime_lock, flags);
return 0;
}


2006-10-03 13:53:17

by David Brownell

[permalink] [raw]
Subject: Re: [patch 02/23] GTOD: persistent clock support, core

> Implement generic timekeeping suspend/resume accounting by introducing
> the read_persistent_clock() interface. This is an arch specific
> function that returns the seconds since the epoch using the arch
> defined battery backed clock.

I remain unclear what's expected to happen when there IS no such
architcture-defined clock ... but where the system itself still
has one, e.g. a board may access one through I2C or SPI once IRQs
are working normally.

You'll recall that I had pointed out that the drivers/rtc framework
provides CONFIG_RTC_HCTOSYS, which already unifies quite a lot of
the "persistent" clocks in the way you described above, but without
that nasty requirement of working without IRQs enabled.


> +/**
> + * read_persistent_clock - Return time in seconds from the persistent clock.
> + *
> + * Weak dummy function for arches that do not yet support it.
> + * Returns seconds from epoch using the battery backed persistent clock.
> + * Returns zero if unsupported.
> + *
> + * XXX - Do be sure to remove it once all arches implement it.

But not all architectures **CAN** support this notion ...


> + */
> +unsigned long __attribute__((weak)) read_persistent_clock(void)
> +{
> + return 0;
> +}
> +
>
> /*
> * timekeeping_init - Initializes the clocksource and common timekeeping values
> */
> void __init timekeeping_init(void)
> {
> unsigned long flags;
> + unsigned long sec = read_persistent_clock();

... and timekeeping_init() is called before I2C or SPI could be used,
since IRQs aren't enabled yet and accessing those busses can't be
done in general without IRQs enabled.


> @@ -774,13 +801,23 @@ static int timekeeping_suspended;
> static int timekeeping_resume(struct sys_device *dev)
> {
> unsigned long flags;
> + unsigned long now = read_persistent_clock();

Again: sys_device resume() is called with IRQs disabled, which
prevents access to many systems' persistent clocks. In fact,
after posting this example patch

http://marc.theaimsgroup.com/?l=linux-kernel&m=115600629813751&w=2

I never heard anything more from you on this issue. Given this
particular patch (in $SUBJECT) should I assume you're going to
just ignore the issues whereby RTCs ("persistent clocks") can't
always meet the no-IRQs-needed assumptions being made here? Or
address isssues like using pointer-to-function instead of using
linker tricks?

http://marc.theaimsgroup.com/?l=linux-kernel&m=115600629825461&w=2

Those class suspend/resume hooks are now merged to kernel.org, by
the way, so that example patch is now pretty much deployable.

- Dave

2006-10-03 19:20:56

by john stultz

[permalink] [raw]
Subject: Re: [patch 02/23] GTOD: persistent clock support, core

On Tue, 2006-10-03 at 06:53 -0700, David Brownell wrote:
> > Implement generic timekeeping suspend/resume accounting by introducing
> > the read_persistent_clock() interface. This is an arch specific
> > function that returns the seconds since the epoch using the arch
> > defined battery backed clock.
>
> I remain unclear what's expected to happen when there IS no such
> architcture-defined clock ... but where the system itself still
> has one, e.g. a board may access one through I2C or SPI once IRQs
> are working normally.

Yea. First, let me apologize for falling off the last thread. I got busy
with other things and the discussion withered. This patch has been
re-raised because Thomas finally tripped over the "theoretical" resume
time ordering bug that I was concerned about.

So, while my first announcement with the patch was something of the
effect of "Trying to unify the cmos/RTC/whatever interface", due to our
discussion I'm scaling back that goal.

Instead the purpose of this is just a continuation of the generic
timekeeping work. Moving the *existing* arch specific resume timekeeping
code into generic code, so we don't get crazy resume ordering issues
splitting the issue of who sets what when between the generic and arch
specific time resume functions.

On arches that have the constraints you list above, a dummy
read_persistent_clock() that returns zero can be implemented, with a
comment about why and the RTC_HCTOSYS bits can be used, with no change
in behavior from what we have now.


> You'll recall that I had pointed out that the drivers/rtc framework
> provides CONFIG_RTC_HCTOSYS, which already unifies quite a lot of
> the "persistent" clocks in the way you described above, but without
> that nasty requirement of working without IRQs enabled.
>
>
> > +/**
> > + * read_persistent_clock - Return time in seconds from the persistent clock.
> > + *
> > + * Weak dummy function for arches that do not yet support it.
> > + * Returns seconds from epoch using the battery backed persistent clock.
> > + * Returns zero if unsupported.
> > + *
> > + * XXX - Do be sure to remove it once all arches implement it.
>
> But not all architectures **CAN** support this notion ...

That's ok and is the reason why we have a unsupported return option.

This patch just gives us a path to consolidate what the majority of
arches do currently.


> > /*
> > * timekeeping_init - Initializes the clocksource and common timekeeping values
> > */
> > void __init timekeeping_init(void)
> > {
> > unsigned long flags;
> > + unsigned long sec = read_persistent_clock();
>
> ... and timekeeping_init() is called before I2C or SPI could be used,
> since IRQs aren't enabled yet and accessing those busses can't be
> done in general without IRQs enabled.

Again, that's fine. If the read_persistent_clock is not supported, xtime
will still be zero and can be set later via other methods.


> > @@ -774,13 +801,23 @@ static int timekeeping_suspended;
> > static int timekeeping_resume(struct sys_device *dev)
> > {
> > unsigned long flags;
> > + unsigned long now = read_persistent_clock();
>
> Again: sys_device resume() is called with IRQs disabled, which
> prevents access to many systems' persistent clocks. In fact,
> after posting this example patch
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115600629813751&w=2


Ok, correct me if I'm wrong, though: The only catch, if I understand
correctly, is that it requires the system in question to have a proper
RTC driver, which doesn't cover 100% of the arch/platforms supported.
No?

I don't really have an issue w/ the RTC code above, however it does not
address the current suspend/resume code in the majority of arches. I
don't know if we're actually in that much conflict here, since I'm
trying to remove the arch specific suspend/resume timekeeping changes,
and (to my understanding) so are you.

We just have a difference in where we're trying to re-add the code. I'm
moving the current code into the generic tod path, and you're moving it
into the RTC driver. Both efforts are consolidating code, so even with
the minor duplication we have less code then before. So I'm sure as we
whittle away at this we can find a way to meet in the middle. I think
this patch moves us forward in that direction.


> I never heard anything more from you on this issue. Given this
> particular patch (in $SUBJECT) should I assume you're going to
> just ignore the issues whereby RTCs ("persistent clocks") can't
> always meet the no-IRQs-needed assumptions being made here? Or
> address isssues like using pointer-to-function instead of using
> linker tricks?

In my head I see it like this:

Currently here is how the timekeeping resume code breaks down:
1) timeofday_resume: reset clocksource, NTP
2) arch time resume: [set xtime]
3) RTC resume: [set xtime]

Where the [set xtime]s depend on platform config

I'm trying to just move us to:
1) timeofday_resume: reset clocksource, NTP, [set xtime]
2) RTC resume: [set xtime]

Again, where the [set xtime]s depend on platform config

Then as the RTC drivers gain coverage, maybe we can start cutting
timeofday resume down and have something like:
1) timeofday_resume: reset clocksource, NTP
2) RTC resume: set xtime

Does this sound like a way forward?

thanks
-john

2006-10-04 17:51:03

by David Brownell

[permalink] [raw]
Subject: Re: [patch 02/23] GTOD: persistent clock support, core

On Tuesday 03 October 2006 12:19 pm, john stultz wrote:
>
> Yea. First, let me apologize for falling off the last thread. I got busy
> with other things and the discussion withered. This patch has been
> re-raised because Thomas finally tripped over the "theoretical" resume
> time ordering bug that I was concerned about.

I recall losing track of what that issue was, and thinking a new explanation
was needed ... no such issues were described in this patch or its summary.


> So, while my first announcement with the patch was something of the
> effect of "Trying to unify the cmos/RTC/whatever interface", due to our
> discussion I'm scaling back that goal.
>
> Instead the purpose of this is just a continuation of the generic
> timekeeping work. Moving the *existing* arch specific resume timekeeping
> code into generic code,

... which wasn't what that one source code comment implied; quite
the opposite, it assumed all architectures _could_ work that way.


> On arches that have the constraints you list above, a dummy
> read_persistent_clock() that returns zero can be implemented, with a
> comment about why and the RTC_HCTOSYS bits can be used, with no change
> in behavior from what we have now.

Better IMO to use a more traditional run-time function call style
hook, not the link time magic you're now using. Point is, this
is most accurately a _board_ option not an architectural one.

Even systems that _could_ provide some access to a persistent
clock without relying on IRQs might be designed not to do so.
For example, RTCs integrated into system-on-chip processors
can be less power-efficient than discrete ones, so that when
power-efficient design is essential, that integrated RTC may
not be a viable design option.

And since we expect Linux kernels to be able to run on more
than one board, that includes configurations where one of the
boards uses the integrated register-based RTC, and another
uses a discrete one with serial interface. But your choice
of link-time binding precludes one kernel handling both boards
correctly...



> > > +/**
> > > + * read_persistent_clock - Return time in seconds from the persistent clock.
> > > + *
> > > + * Weak dummy function for arches that do not yet support it.
> > > + * Returns seconds from epoch using the battery backed persistent clock.
> > > + * Returns zero if unsupported.
> > > + *
> > > + * XXX - Do be sure to remove it once all arches implement it.
> >
> > But not all architectures **CAN** support this notion ...
>
> That's ok and is the reason why we have a unsupported return option.
>
> This patch just gives us a path to consolidate what the majority of
> arches do currently.

That XXX comment disagrees ... it assumes all architectures can work
that way ...

And I'm not sure about "majority"; depends how you count. Certainly
the number of embedded boards that could work differently is large and
growing ... and there are a lot more embedded computers in the world
than desktops, laptops, or servers.


> > > @@ -774,13 +801,23 @@ static int timekeeping_suspended;
> > > static int timekeeping_resume(struct sys_device *dev)
> > > {
> > > unsigned long flags;
> > > + unsigned long now = read_persistent_clock();
> >
> > Again: sys_device resume() is called with IRQs disabled, which
> > prevents access to many systems' persistent clocks. In fact,
> > after posting this example patch
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=115600629813751&w=2
>
>
> Ok, correct me if I'm wrong, though: The only catch, if I understand
> correctly, is that it requires the system in question to have a proper
> RTC driver, which doesn't cover 100% of the arch/platforms supported.
> No?

It's not a "catch" for the systems which have an RTC class driver,
and your approach doesn't cover 100% either.

The key difference between the two approaches is that yours **CANNOT**
handle cases like an I2C based RTC (call them "message based RTCs"), where
the RTC class drivers **CAN** handle "register based RTCs".


> I don't really have an issue w/ the RTC code above, however it does not
> address the current suspend/resume code in the majority of arches. I
> don't know if we're actually in that much conflict here, since I'm
> trying to remove the arch specific suspend/resume timekeeping changes,
> and (to my understanding) so are you.

It's a question of scope, I guess. You're limiting your solution to
boards that don't require message based RTCs. I'm thinking that's not
really sufficient ... and it bothers me to see something be pitched
as a basic architectural feature of the clock/time framework when
it's so clearly not sufficient.


> We just have a difference in where we're trying to re-add the code. I'm
> moving the current code into the generic tod path, and you're moving it
> into the RTC driver. Both efforts are consolidating code, so even with
> the minor duplication we have less code then before. So I'm sure as we
> whittle away at this we can find a way to meet in the middle. I think
> this patch moves us forward in that direction.

So long as this GTOD patch is updated to reflect the reality that
not all _boards_ can (or will) support that style of RTC ... cleanup
is good, but those code comments suggest deeper assumptions.


> > I never heard anything more from you on this issue. Given this
> > particular patch (in $SUBJECT) should I assume you're going to
> > just ignore the issues whereby RTCs ("persistent clocks") can't
> > always meet the no-IRQs-needed assumptions being made here? Or
> > address isssues like using pointer-to-function instead of using
> > linker tricks?
>
> In my head I see it like this:
>
> Currently here is how the timekeeping resume code breaks down:
> 1) timeofday_resume: reset clocksource, NTP
> 2) arch time resume: [set xtime]
> 3) RTC resume: [set xtime]
>
> Where the [set xtime]s depend on platform config

That is, three different ways to do it.


> I'm trying to just move us to:
> 1) timeofday_resume: reset clocksource, NTP, [set xtime]
> 2) RTC resume: [set xtime]
>
> Again, where the [set xtime]s depend on platform config

Getting rid of one is useful, yes. But why not get rid of two?


> Then as the RTC drivers gain coverage, maybe we can start cutting
> timeofday resume down and have something like:
> 1) timeofday_resume: reset clocksource, NTP
> 2) RTC resume: set xtime
>
> Does this sound like a way forward?

I'm not sure I see how the last two groups are very different;
are you saying it would get rid of these message-unfriendly
read_persistent_clock() calls?

- Dave

2006-10-04 19:09:29

by john stultz

[permalink] [raw]
Subject: Re: [patch 02/23] GTOD: persistent clock support, core

On Wed, 2006-10-04 at 10:50 -0700, David Brownell wrote:
> On Tuesday 03 October 2006 12:19 pm, john stultz wrote:
> >
> > Yea. First, let me apologize for falling off the last thread. I got busy
> > with other things and the discussion withered. This patch has been
> > re-raised because Thomas finally tripped over the "theoretical" resume
> > time ordering bug that I was concerned about.
>
> I recall losing track of what that issue was, and thinking a new explanation
> was needed ... no such issues were described in this patch or its summary.

Right. So briefly, the resume ordering issue is this: Currently resuming
the timekeeping code is split across both the generic code and the arch
specific code. It wasn't always this way, my earlier GTOD patches had a
similar change, but in my reworking and cutting down of the code to get
it acceptable for mainline I dropped it.

The assumption I made, was that the generic timekeeping_resume would be
called before the arch specific time resume code runs. And by luck it
does, however I realized shortly afterward that there isn't a strict
ordering to the resume functions and it could happen the other way. With
the dynticks patch, Thomas finally tripped over the issue, and so I got
this patch going again.


> > So, while my first announcement with the patch was something of the
> > effect of "Trying to unify the cmos/RTC/whatever interface", due to our
> > discussion I'm scaling back that goal.
> >
> > Instead the purpose of this is just a continuation of the generic
> > timekeeping work. Moving the *existing* arch specific resume timekeeping
> > code into generic code,
>
> ... which wasn't what that one source code comment implied; quite
> the opposite, it assumed all architectures _could_ work that way.

Well to me the comment meant that all arches could implement the
function which would return the persistent time or zero(unsupported).
And I guess I was thinking the arch implementation it would include
comment at to why it returns zero.

But sure, I can understand its confusing and will re-work the wording to
be more clear.

> > On arches that have the constraints you list above, a dummy
> > read_persistent_clock() that returns zero can be implemented, with a
> > comment about why and the RTC_HCTOSYS bits can be used, with no change
> > in behavior from what we have now.
>
> Better IMO to use a more traditional run-time function call style
> hook, not the link time magic you're now using. Point is, this
> is most accurately a _board_ option not an architectural one.
>
> Even systems that _could_ provide some access to a persistent
> clock without relying on IRQs might be designed not to do so.
> For example, RTCs integrated into system-on-chip processors
> can be less power-efficient than discrete ones, so that when
> power-efficient design is essential, that integrated RTC may
> not be a viable design option.
>
> And since we expect Linux kernels to be able to run on more
> than one board, that includes configurations where one of the
> boards uses the integrated register-based RTC, and another
> uses a discrete one with serial interface. But your choice
> of link-time binding precludes one kernel handling both boards
> correctly...

Sure, using a function pointer is easy enough to change.

> > > Again: sys_device resume() is called with IRQs disabled, which
> > > prevents access to many systems' persistent clocks. In fact,
> > > after posting this example patch
> > >
> > > http://marc.theaimsgroup.com/?l=linux-kernel&m=115600629813751&w=2
> >
> >
> > Ok, correct me if I'm wrong, though: The only catch, if I understand
> > correctly, is that it requires the system in question to have a proper
> > RTC driver, which doesn't cover 100% of the arch/platforms supported.
> > No?
>
> It's not a "catch" for the systems which have an RTC class driver,
> and your approach doesn't cover 100% either.

True, it doesn't provide a solution for 100% of the systems. But that's
not what I'm shooting for. I'm trying to fix a real bug (the resume
ordering issue), and reduce the arch specific logic.


> The key difference between the two approaches is that yours **CANNOT**
> handle cases like an I2C based RTC (call them "message based RTCs"), where
> the RTC class drivers **CAN** handle "register based RTCs".

Sure. As a long term solution, the RTC method has promise. The patch in
question does not prohibit the RTC work from getting there.

It simply resolves a possible resume ordering bug and allows logic that
already exists in alpha, cris, frv, h8300, i386, ia64, m68k, m68knommu,
mips, parisc, powerpc, ppc, s390...
[deep inhale]
sh, sh64, sparc64, um, v850, x86_64, and xtensa [whew!] to be
consolidated in the generic path.

And the above doesn't challenge your point about there being more
embedded systems then desktops. This patch isn't about one arch or
platform being more important or more common then another, its just a
code duplication and maintenance issue.

> > I don't really have an issue w/ the RTC code above, however it does not
> > address the current suspend/resume code in the majority of arches. I
> > don't know if we're actually in that much conflict here, since I'm
> > trying to remove the arch specific suspend/resume timekeeping changes,
> > and (to my understanding) so are you.
>
> It's a question of scope, I guess. You're limiting your solution to
> boards that don't require message based RTCs. I'm thinking that's not
> really sufficient ... and it bothers me to see something be pitched
> as a basic architectural feature of the clock/time framework when
> it's so clearly not sufficient.

Has an unsupported return code. Does not change existing behavior
(except for fixing the ordering issue). Not a basic architectural
feature. Just a common one we want to handle correctly.

> > We just have a difference in where we're trying to re-add the code. I'm
> > moving the current code into the generic tod path, and you're moving it
> > into the RTC driver. Both efforts are consolidating code, so even with
> > the minor duplication we have less code then before. So I'm sure as we
> > whittle away at this we can find a way to meet in the middle. I think
> > this patch moves us forward in that direction.
>
> So long as this GTOD patch is updated to reflect the reality that
> not all _boards_ can (or will) support that style of RTC ... cleanup
> is good, but those code comments suggest deeper assumptions.

Sure. I apologize for the confusion.

> > I'm trying to just move us to:
> > 1) timeofday_resume: reset clocksource, NTP, [set xtime]
> > 2) RTC resume: [set xtime]
> >
> > Again, where the [set xtime]s depend on platform config
>
> Getting rid of one is useful, yes. But why not get rid of two?

Sure, but that requires RTC drivers to be written for every arch and
platform first.

> > Then as the RTC drivers gain coverage, maybe we can start cutting
> > timeofday resume down and have something like:
> > 1) timeofday_resume: reset clocksource, NTP
> > 2) RTC resume: set xtime
> >
> > Does this sound like a way forward?
>
> I'm not sure I see how the last two groups are very different;
> are you saying it would get rid of these message-unfriendly
> read_persistent_clock() calls?

I'm open to that. Once the RTC drivers get there. :)

I apologize if this reply seems a bit terse (and more worrying,
combative). That's not the intent. I really do appreciate your review
and feedback. Your point about message based RTCs is solid and I will
keep it in mind as the development continues.

thanks again,
-john