2012-05-18 14:10:17

by Richard Cochran

[permalink] [raw]
Subject: [PATCH RFC V2 0/6] Fix leap seconds and add tai clock

In honor of this summer's new leap second, this patch series aim to
fix the incorrect time values reported during a leap second. This
second version of the patch series does not change the core time
keeping code, but rather affects only the adjtimex interface.

Of course, the POSIX UTC system is broken by design, and the Linux
kernel cannot fix that. However, what we can do is correctly execute
leap seconds and report the time variables (UTC time, TAI offset, and
leap second status) with consistency.

Since commit 6b43ae8a, the leap second is applied on the first timer
tick after the actual deadline, resulting in incorrect time values
from gettimeofday and friends. In contrast to V1, this series does
not address that issue but rather adds extra checks into the adjtimex
code path, with the effect of always providing correct UTC time
values, even during a leap second.

The first two patches merely clean up some cruft I noticed along the
way while working on this series.


John Stultz (1):
time: Add CLOCK_TAI clockid

Richard Cochran (5):
time: remove obsolete declaration
ntp: remove useless parameter
time: keep track of the pending utc/tai threshold
time: introduce leap second functional interface
time: move leap second management into time keeping core

include/linux/time.h | 9 +-
kernel/posix-timers.c | 10 ++
kernel/time/leap-seconds.h | 21 ++++
kernel/time/ntp.c | 90 ++++++------------
kernel/time/timekeeping.c | 228 ++++++++++++++++++++++++++++++++++++++++++--
5 files changed, 283 insertions(+), 75 deletions(-)
create mode 100644 kernel/time/leap-seconds.h

--
1.7.2.5


2012-05-18 14:10:20

by Richard Cochran

[permalink] [raw]
Subject: [PATCH RFC V2 4/6] time: introduce leap second functional interface

This patch adds a new private leap second interface for use by the NTP
code. In addition to methods for starting and ending leap seconds, the
interface provides a way to get the correct UTC time of day even during
a leap second.

Signed-off-by: Richard Cochran <[email protected]>
---
kernel/time/leap-seconds.h | 21 +++++++
kernel/time/timekeeping.c | 129 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 150 insertions(+), 0 deletions(-)
create mode 100644 kernel/time/leap-seconds.h

diff --git a/kernel/time/leap-seconds.h b/kernel/time/leap-seconds.h
new file mode 100644
index 0000000..3dea7b0
--- /dev/null
+++ b/kernel/time/leap-seconds.h
@@ -0,0 +1,21 @@
+/*
+ * linux/kernel/time/leap-seconds.h
+ *
+ * Functional interface to the timekeeper code,
+ * for use by the NTP code.
+ *
+ */
+#ifndef __LINUX_KERNEL_TIME_LEAP_SECONDS_H
+#define __LINUX_KERNEL_TIME_LEAP_SECONDS_H
+
+#include <linux/time.h>
+
+int timekeeping_gettod_status(struct timespec *ts, time_t *offset);
+
+void timekeeping_delete_leap_second(void);
+
+void timekeeping_finish_leap_second(void);
+
+void timekeeping_insert_leap_second(void);
+
+#endif
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index ac04de4..eab03fb 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -21,6 +21,8 @@
#include <linux/tick.h>
#include <linux/stop_machine.h>

+#include "leap-seconds.h"
+
/* Structure holding internal timekeeping values. */
struct timekeeper {
/* Current clocksource used for timekeeping. */
@@ -1290,3 +1292,130 @@ void xtime_update(unsigned long ticks)
do_timer(ticks);
write_sequnlock(&xtime_lock);
}
+
+/**
+ * zero_hour - Returns the time of the next zero hour.
+ */
+static time_t zero_hour(time_t now)
+{
+ time_t days = now / 86400;
+ time_t zero = (1 + days) * 86400;
+ return zero;
+}
+
+/**
+ * timekeeping_delete_leap_second - Delete a leap second today.
+ */
+void timekeeping_delete_leap_second(void)
+{
+ unsigned long flags;
+
+ write_seqlock_irqsave(&timekeeper.lock, flags);
+
+ if (timekeeper.leap_state == LEAP_IDLE) {
+ timekeeper.utc_epoch = zero_hour(timekeeper.xtime.tv_sec);
+ timekeeper.leap_state = LEAP_DEL_PENDING;
+ }
+ write_sequnlock_irqrestore(&timekeeper.lock, flags);
+}
+
+/**
+ * timekeeping_finish_leap_second - Advance the leap second threshold.
+ */
+void timekeeping_finish_leap_second(void)
+{
+ unsigned long flags;
+
+ write_seqlock_irqsave(&timekeeper.lock, flags);
+
+ if (timekeeper.leap_state == LEAP_INS_DONE ||
+ timekeeper.leap_state == LEAP_DEL_DONE) {
+ timekeeper.utc_epoch = LONG_MAX;
+ timekeeper.leap_state = LEAP_IDLE;
+ }
+ write_sequnlock_irqrestore(&timekeeper.lock, flags);
+}
+
+/**
+ * timekeeping_insert_leap_second - Add a leap second today.
+ */
+void timekeeping_insert_leap_second(void)
+{
+ unsigned long flags;
+
+ write_seqlock_irqsave(&timekeeper.lock, flags);
+
+ if (timekeeper.leap_state == LEAP_IDLE) {
+ timekeeper.utc_epoch = zero_hour(timekeeper.xtime.tv_sec);
+ timekeeper.leap_state = LEAP_INS_PENDING;
+ }
+ write_sequnlock_irqrestore(&timekeeper.lock, flags);
+}
+
+/**
+ * timekeeping_gettod_status - Get the current time, TAI offset,
+ * and leap second status.
+ */
+int timekeeping_gettod_status(struct timespec *ts, time_t *offset)
+{
+ int code = TIME_OK, leap_state;
+ time_t diff, epoch, taioff;
+ unsigned long seq;
+ s64 nsecs;
+
+ WARN_ON(timekeeping_suspended);
+
+ do {
+ seq = read_seqbegin(&timekeeper.lock);
+ *ts = timekeeper.xtime;
+ nsecs = timekeeping_get_ns();
+ nsecs += arch_gettimeoffset();
+ taioff = timekeeper.tai_offset;
+ epoch = timekeeper.utc_epoch;
+ leap_state = timekeeper.leap_state;
+ } while (read_seqretry(&timekeeper.lock, seq));
+
+ timespec_add_ns(ts, nsecs);
+
+ switch (leap_state) {
+ case LEAP_IDLE:
+ break;
+
+ case LEAP_INS_PENDING:
+ diff = epoch - ts->tv_sec;
+ if (diff > 0) {
+ code = TIME_INS;
+ } else {
+ code = diff ? TIME_WAIT : TIME_OOP;
+ ts->tv_sec--;
+ taioff++;
+ }
+ break;
+
+ case LEAP_INS_DONE:
+ diff = epoch - ts->tv_sec;
+ if (diff > 0)
+ code = TIME_OOP;
+ else
+ code = TIME_WAIT;
+ break;
+
+ case LEAP_DEL_PENDING:
+ diff = epoch - ts->tv_sec;
+ if (diff < 2) {
+ code = TIME_WAIT;
+ ts->tv_sec++;
+ taioff--;
+ } else {
+ code = TIME_DEL;
+ }
+ break;
+
+ case LEAP_DEL_DONE:
+ code = TIME_WAIT;
+ break;
+ }
+
+ *offset = taioff;
+ return code;
+}
--
1.7.2.5

2012-05-18 14:10:27

by Richard Cochran

[permalink] [raw]
Subject: [PATCH RFC V2 6/6] time: Add CLOCK_TAI clockid

From: John Stultz <[email protected]>

This adds a CLOCK_TAI clockid and the needed accessors.

Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Richard Cochran <[email protected]>
---
include/linux/time.h | 7 +++----
kernel/posix-timers.c | 10 ++++++++++
kernel/time/timekeeping.c | 31 +++++++++++++++++++++++++++++++
3 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index b6034b0..9be8205 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -169,6 +169,7 @@ extern int timekeeping_valid_for_hres(void);
extern u64 timekeeping_max_deferment(void);
extern int timekeeping_inject_offset(struct timespec *ts);
extern void timekeeping_set_tai_offset(time_t tai_offset);
+extern void timekeeping_clocktai(struct timespec *ts);

struct tms;
extern void do_sys_times(struct tms *);
@@ -297,11 +298,9 @@ struct itimerval {
#define CLOCK_BOOTTIME 7
#define CLOCK_REALTIME_ALARM 8
#define CLOCK_BOOTTIME_ALARM 9
+#define CLOCK_SGI_CYCLE 10 /* Hardware specific */
+#define CLOCK_TAI 11

-/*
- * The IDs of various hardware clocks:
- */
-#define CLOCK_SGI_CYCLE 10
#define MAX_CLOCKS 16
#define CLOCKS_MASK (CLOCK_REALTIME | CLOCK_MONOTONIC)
#define CLOCKS_MONO CLOCK_MONOTONIC
diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 69185ae..d6d146c 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -221,6 +221,11 @@ static int posix_get_boottime(const clockid_t which_clock, struct timespec *tp)
return 0;
}

+static int posix_get_tai(clockid_t which_clock, struct timespec *tp)
+{
+ timekeeping_clocktai(tp);
+ return 0;
+}

/*
* Initialize everything, well, just everything in Posix clocks/timers ;)
@@ -261,6 +266,10 @@ static __init int init_posix_timers(void)
.clock_getres = posix_get_coarse_res,
.clock_get = posix_get_monotonic_coarse,
};
+ struct k_clock clock_tai = {
+ .clock_getres = hrtimer_get_res,
+ .clock_get = posix_get_tai,
+ };
struct k_clock clock_boottime = {
.clock_getres = hrtimer_get_res,
.clock_get = posix_get_boottime,
@@ -278,6 +287,7 @@ static __init int init_posix_timers(void)
posix_timers_register_clock(CLOCK_REALTIME_COARSE, &clock_realtime_coarse);
posix_timers_register_clock(CLOCK_MONOTONIC_COARSE, &clock_monotonic_coarse);
posix_timers_register_clock(CLOCK_BOOTTIME, &clock_boottime);
+ posix_timers_register_clock(CLOCK_TAI, &clock_tai);

posix_timers_cache = kmem_cache_create("posix_timers_cache",
sizeof (struct k_itimer), 0, SLAB_PANIC,
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index fdd1a48..6696f60 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -315,6 +315,37 @@ void ktime_get_ts(struct timespec *ts)
}
EXPORT_SYMBOL_GPL(ktime_get_ts);

+
+/**
+ * timekeeping_clocktai - Returns the TAI time of day in a timespec
+ * @ts: pointer to the timespec to be set
+ *
+ * Returns the time of day in a timespec.
+ */
+void timekeeping_clocktai(struct timespec *ts)
+{
+ unsigned long seq;
+ s64 nsecs;
+
+ WARN_ON(timekeeping_suspended);
+
+ do {
+ seq = read_seqbegin(&timekeeper.lock);
+
+ *ts = timekeeper.xtime;
+ nsecs = timekeeping_get_ns();
+
+ /* If arch requires, add in gettimeoffset() */
+ nsecs += arch_gettimeoffset();
+ ts->tv_sec += timekeeper.tai_offset;
+
+ } while (read_seqretry(&timekeeper.lock, seq));
+
+ timespec_add_ns(ts, nsecs);
+}
+EXPORT_SYMBOL(timekeeping_clocktai);
+
+
#ifdef CONFIG_NTP_PPS

/**
--
1.7.2.5

2012-05-18 14:10:24

by Richard Cochran

[permalink] [raw]
Subject: [PATCH RFC V2 5/6] time: move leap second management into time keeping core

This patch refactors the timekeeping and ntp code in order to
improve leap second handling and to provide a TAI based clock.
The change has a number of aspects.

* remove the NTP time_status variable

Instead, compute this value on demand in adjtimex.

* move TAI managment into timekeeping core from ntp

Currently NTP manages the TAI offset. Since there's plans for a
CLOCK_TAI clockid, push the TAI management into the timekeeping
core.

[ Original idea from: John Stultz <[email protected]> ]
[ Changed by RC: ]

- replace u32 with time_t
- fixup second call site of second_overflow()

* replace modulus with integer test and schedule leap second

On the day of a leap second, the NTP code performs a division on every
tick in order to know when to leap. This patch replaces the division
with an integer comparison, making the code faster and easier to
understand.

Signed-off-by: Richard Cochran <[email protected]>
---
include/linux/time.h | 1 +
kernel/time/ntp.c | 86 +++++++++++++-------------------------------
kernel/time/timekeeping.c | 54 ++++++++++++++++++++++++----
3 files changed, 73 insertions(+), 68 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index 179f4d6..b6034b0 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -168,6 +168,7 @@ extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
extern int timekeeping_valid_for_hres(void);
extern u64 timekeeping_max_deferment(void);
extern int timekeeping_inject_offset(struct timespec *ts);
+extern void timekeeping_set_tai_offset(time_t tai_offset);

struct tms;
extern void do_sys_times(struct tms *);
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index d4d48b0..53c3227 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -16,6 +16,7 @@
#include <linux/mm.h>
#include <linux/module.h>

+#include "leap-seconds.h"
#include "tick-internal.h"

/*
@@ -24,6 +25,7 @@

DEFINE_SPINLOCK(ntp_lock);

+#define STA_LEAP (STA_INS | STA_DEL)

/* USER_HZ period (usecs): */
unsigned long tick_usec = TICK_USEC;
@@ -42,19 +44,9 @@ static u64 tick_length_base;
* phase-lock loop variables
*/

-/*
- * clock synchronization status
- *
- * (TIME_ERROR prevents overwriting the CMOS clock)
- */
-static int time_state = TIME_OK;
-
/* clock status bits: */
static int time_status = STA_UNSYNC;

-/* TAI offset (secs): */
-static long time_tai;
-
/* time adjustment (nsecs): */
static s64 time_offset;

@@ -386,57 +378,14 @@ u64 ntp_tick_length(void)
* They were originally developed for SUN and DEC kernels.
* All the kudos should go to Dave for this stuff.
*
- * Also handles leap second processing, and returns leap offset
*/
int second_overflow(unsigned long secs)
{
s64 delta;
- int leap = 0;
unsigned long flags;

spin_lock_irqsave(&ntp_lock, flags);

- /*
- * Leap second processing. If in leap-insert state at the end of the
- * day, the system clock is set back one second; if in leap-delete
- * state, the system clock is set ahead one second.
- */
- switch (time_state) {
- case TIME_OK:
- if (time_status & STA_INS)
- time_state = TIME_INS;
- else if (time_status & STA_DEL)
- time_state = TIME_DEL;
- break;
- case TIME_INS:
- if (secs % 86400 == 0) {
- leap = -1;
- time_state = TIME_OOP;
- time_tai++;
- printk(KERN_NOTICE
- "Clock: inserting leap second 23:59:60 UTC\n");
- }
- break;
- case TIME_DEL:
- if ((secs + 1) % 86400 == 0) {
- leap = 1;
- time_tai--;
- time_state = TIME_WAIT;
- printk(KERN_NOTICE
- "Clock: deleting leap second 23:59:59 UTC\n");
- }
- break;
- case TIME_OOP:
- time_state = TIME_WAIT;
- break;
-
- case TIME_WAIT:
- if (!(time_status & (STA_INS | STA_DEL)))
- time_state = TIME_OK;
- break;
- }
-
-
/* Bump the maxerror field */
time_maxerror += MAXFREQ / NSEC_PER_USEC;
if (time_maxerror > NTP_PHASE_LIMIT) {
@@ -478,7 +427,7 @@ int second_overflow(unsigned long secs)
out:
spin_unlock_irqrestore(&ntp_lock, flags);

- return leap;
+ return 0;
}

#ifdef CONFIG_GENERIC_CMOS_UPDATE
@@ -543,7 +492,6 @@ static inline void notify_cmos_timer(void) { }
static inline void process_adj_status(struct timex *txc)
{
if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
- time_state = TIME_OK;
time_status = STA_UNSYNC;
/* restart PPS frequency calibration */
pps_reset_freq_interval();
@@ -565,7 +513,7 @@ static inline void process_adj_status(struct timex *txc)
* Called with the xtime lock held, so we can access and modify
* all the global NTP state:
*/
-static inline void process_adjtimex_modes(struct timex *txc)
+static inline void process_adjtimex_modes(struct timex *txc, time_t *time_tai)
{
if (txc->modes & ADJ_STATUS)
process_adj_status(txc);
@@ -599,7 +547,7 @@ static inline void process_adjtimex_modes(struct timex *txc)
}

if (txc->modes & ADJ_TAI && txc->constant > 0)
- time_tai = txc->constant;
+ *time_tai = txc->constant;

if (txc->modes & ADJ_OFFSET)
ntp_update_offset(txc->offset);
@@ -618,6 +566,7 @@ static inline void process_adjtimex_modes(struct timex *txc)
int do_adjtimex(struct timex *txc)
{
struct timespec ts;
+ time_t time_tai, orig_tai;
int result;

/* Validate the data before disabling interrupts */
@@ -656,7 +605,22 @@ int do_adjtimex(struct timex *txc)
return result;
}

- getnstimeofday(&ts);
+ result = timekeeping_gettod_status(&ts, &orig_tai);
+ time_tai = orig_tai;
+
+ if (txc->modes & ADJ_STATUS) {
+ /*
+ * Check for new leap second commands.
+ */
+ if (!(time_status & STA_INS) && (txc->status & STA_INS))
+ timekeeping_insert_leap_second();
+
+ else if (!(time_status & STA_DEL) && (txc->status & STA_DEL))
+ timekeeping_delete_leap_second();
+
+ else if ((time_status & STA_LEAP) && !(txc->status & STA_LEAP))
+ timekeeping_finish_leap_second();
+ }

spin_lock_irq(&ntp_lock);

@@ -673,7 +637,7 @@ int do_adjtimex(struct timex *txc)

/* If there are input parameters, then process them: */
if (txc->modes)
- process_adjtimex_modes(txc);
+ process_adjtimex_modes(txc, &time_tai);

txc->offset = shift_right(time_offset * NTP_INTERVAL_FREQ,
NTP_SCALE_SHIFT);
@@ -681,7 +645,6 @@ int do_adjtimex(struct timex *txc)
txc->offset /= NSEC_PER_USEC;
}

- result = time_state; /* mostly `TIME_OK' */
/* check for errors */
if (is_error_status(time_status))
result = TIME_ERROR;
@@ -702,6 +665,9 @@ int do_adjtimex(struct timex *txc)

spin_unlock_irq(&ntp_lock);

+ if (time_tai != orig_tai && result == TIME_OK)
+ timekeeping_set_tai_offset(time_tai);
+
txc->time.tv_sec = ts.tv_sec;
txc->time.tv_usec = ts.tv_nsec;
if (!(time_status & STA_NANO))
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index eab03fb..fdd1a48 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -444,6 +444,18 @@ int timekeeping_inject_offset(struct timespec *ts)
EXPORT_SYMBOL(timekeeping_inject_offset);

/**
+ * timekeeping_set_tai_offset - Sets the current TAI offset from UTC
+ */
+void timekeeping_set_tai_offset(time_t tai_offset)
+{
+ unsigned long flags;
+
+ write_seqlock_irqsave(&timekeeper.lock, flags);
+ timekeeper.tai_offset = tai_offset;
+ write_sequnlock_irqrestore(&timekeeper.lock, flags);
+}
+
+/**
* change_clocksource - Swaps clocksources if a new one is available
*
* Accumulates current time interval and initializes new clocksource
@@ -950,6 +962,38 @@ static void timekeeping_adjust(s64 offset)
timekeeper.ntp_error_shift;
}

+static void timekeeper_overflow(void)
+{
+ timekeeper.xtime.tv_sec++;
+
+ switch (timekeeper.leap_state) {
+
+ case LEAP_IDLE:
+ case LEAP_INS_DONE:
+ case LEAP_DEL_DONE:
+ break;
+
+ case LEAP_INS_PENDING:
+ if (timekeeper.xtime.tv_sec == timekeeper.utc_epoch) {
+ pr_notice("Clock: inserting leap second 23:59:60 UTC\n");
+ timekeeper.xtime.tv_sec--;
+ timekeeper.tai_offset++;
+ timekeeper.leap_state = LEAP_INS_DONE;
+ }
+ break;
+
+ case LEAP_DEL_PENDING:
+ if (timekeeper.xtime.tv_sec + 1 == timekeeper.utc_epoch) {
+ pr_notice("Clock: deleting leap second 23:59:59 UTC\n");
+ timekeeper.xtime.tv_sec++;
+ timekeeper.tai_offset--;
+ timekeeper.leap_state = LEAP_DEL_DONE;
+ }
+ break;
+ }
+
+ second_overflow(timekeeper.xtime.tv_sec);
+}

/**
* logarithmic_accumulation - shifted accumulation of cycles
@@ -975,11 +1019,8 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)

timekeeper.xtime_nsec += timekeeper.xtime_interval << shift;
while (timekeeper.xtime_nsec >= nsecps) {
- int leap;
timekeeper.xtime_nsec -= nsecps;
- timekeeper.xtime.tv_sec++;
- leap = second_overflow(timekeeper.xtime.tv_sec);
- timekeeper.xtime.tv_sec += leap;
+ timekeeper_overflow();
}

/* Accumulate raw time */
@@ -1090,11 +1131,8 @@ static void update_wall_time(void)
* xtime.tv_nsec isn't larger than NSEC_PER_SEC
*/
if (unlikely(timekeeper.xtime.tv_nsec >= NSEC_PER_SEC)) {
- int leap;
timekeeper.xtime.tv_nsec -= NSEC_PER_SEC;
- timekeeper.xtime.tv_sec++;
- leap = second_overflow(timekeeper.xtime.tv_sec);
- timekeeper.xtime.tv_sec += leap;
+ timekeeper_overflow();
}

timekeeping_update(false);
--
1.7.2.5

2012-05-18 14:10:15

by Richard Cochran

[permalink] [raw]
Subject: [PATCH RFC V2 1/6] time: remove obsolete declaration

The function, timekeeping_leap_insert, was removed in commit
6b43ae8a619d17c4935c3320d2ef9e92bdeed05d

Signed-off-by: Richard Cochran <[email protected]>
---
include/linux/time.h | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index 33a92ea..179f4d6 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -167,7 +167,6 @@ extern void get_monotonic_boottime(struct timespec *ts);
extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
extern int timekeeping_valid_for_hres(void);
extern u64 timekeeping_max_deferment(void);
-extern void timekeeping_leap_insert(int leapsecond);
extern int timekeeping_inject_offset(struct timespec *ts);

struct tms;
--
1.7.2.5

2012-05-18 14:11:41

by Richard Cochran

[permalink] [raw]
Subject: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

This patch introduces time keeping variables to track the next
mini-epoch between the UTC and TAI timescales. A leap second occurs
one second before a mini-epoch. When no leap second is pending, then
the epoch is set to the far future, LONG_MAX.

This code will become useful later on for providing correct time
surrounding a leap second.

Signed-off-by: Richard Cochran <[email protected]>
---
kernel/time/timekeeping.c | 14 ++++++++++++++
1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index d66b213..ac04de4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -70,6 +70,19 @@ struct timekeeper {
/* The raw monotonic time for the CLOCK_MONOTONIC_RAW posix clock. */
struct timespec raw_time;

+ /* The current TAI - UTC offset */
+ time_t tai_offset;
+ /* The UTC time of the next leap second epoch */
+ time_t utc_epoch;
+ /* Tracks where we stand with regard to leap the second epoch. */
+ enum {
+ LEAP_IDLE,
+ LEAP_INS_PENDING,
+ LEAP_INS_DONE,
+ LEAP_DEL_PENDING,
+ LEAP_DEL_DONE,
+ } leap_state;
+
/* Seqlock for all timekeeper values */
seqlock_t lock;
};
@@ -608,6 +621,7 @@ void __init timekeeping_init(void)
-boot.tv_sec, -boot.tv_nsec);
timekeeper.total_sleep_time.tv_sec = 0;
timekeeper.total_sleep_time.tv_nsec = 0;
+ timekeeper.utc_epoch = LONG_MAX;
write_sequnlock_irqrestore(&timekeeper.lock, flags);
}

--
1.7.2.5

2012-05-18 14:12:00

by Richard Cochran

[permalink] [raw]
Subject: [PATCH RFC V2 2/6] ntp: remove useless parameter

This patch removes some leftover cruft in the NTP code.

Signed-off-by: Richard Cochran <[email protected]>
---
kernel/time/ntp.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index e8c8671..d4d48b0 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -540,7 +540,7 @@ static inline void notify_cmos_timer(void) { }
/*
* Propagate a new txc->status value into the NTP state:
*/
-static inline void process_adj_status(struct timex *txc, struct timespec *ts)
+static inline void process_adj_status(struct timex *txc)
{
if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
time_state = TIME_OK;
@@ -565,10 +565,10 @@ static inline void process_adj_status(struct timex *txc, struct timespec *ts)
* Called with the xtime lock held, so we can access and modify
* all the global NTP state:
*/
-static inline void process_adjtimex_modes(struct timex *txc, struct timespec *ts)
+static inline void process_adjtimex_modes(struct timex *txc)
{
if (txc->modes & ADJ_STATUS)
- process_adj_status(txc, ts);
+ process_adj_status(txc);

if (txc->modes & ADJ_NANO)
time_status |= STA_NANO;
@@ -673,7 +673,7 @@ int do_adjtimex(struct timex *txc)

/* If there are input parameters, then process them: */
if (txc->modes)
- process_adjtimex_modes(txc, &ts);
+ process_adjtimex_modes(txc);

txc->offset = shift_right(time_offset * NTP_INTERVAL_FREQ,
NTP_SCALE_SHIFT);
--
1.7.2.5

2012-05-21 18:01:20

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 4/6] time: introduce leap second functional interface

On 05/18/2012 07:09 AM, Richard Cochran wrote:
> This patch adds a new private leap second interface for use by the NTP
> code. In addition to methods for starting and ending leap seconds, the
> interface provides a way to get the correct UTC time of day even during
> a leap second.
>
> Signed-off-by: Richard Cochran<[email protected]>
> ---
> kernel/time/leap-seconds.h | 21 +++++++
> kernel/time/timekeeping.c | 129 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 150 insertions(+), 0 deletions(-)
> create mode 100644 kernel/time/leap-seconds.h
>
> diff --git a/kernel/time/leap-seconds.h b/kernel/time/leap-seconds.h
> new file mode 100644
> index 0000000..3dea7b0
> --- /dev/null
> +++ b/kernel/time/leap-seconds.h
> @@ -0,0 +1,21 @@
> +/*
> + * linux/kernel/time/leap-seconds.h
> + *
> + * Functional interface to the timekeeper code,
> + * for use by the NTP code.
> + *
> + */
> +#ifndef __LINUX_KERNEL_TIME_LEAP_SECONDS_H
> +#define __LINUX_KERNEL_TIME_LEAP_SECONDS_H
> +
> +#include<linux/time.h>
> +
> +int timekeeping_gettod_status(struct timespec *ts, time_t *offset);
> +
> +void timekeeping_delete_leap_second(void);
> +
> +void timekeeping_finish_leap_second(void);
> +
> +void timekeeping_insert_leap_second(void);
> +
> +#endif

Why not just add these to time.h?

thanks
-john

2012-05-21 18:11:02

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On 05/18/2012 07:09 AM, Richard Cochran wrote:
> This patch introduces time keeping variables to track the next
> mini-epoch between the UTC and TAI timescales. A leap second occurs
> one second before a mini-epoch. When no leap second is pending, then
> the epoch is set to the far future, LONG_MAX.
>
> This code will become useful later on for providing correct time
> surrounding a leap second.
>
> Signed-off-by: Richard Cochran<[email protected]>
> ---
> kernel/time/timekeeping.c | 14 ++++++++++++++
> 1 files changed, 14 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index d66b213..ac04de4 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -70,6 +70,19 @@ struct timekeeper {
> /* The raw monotonic time for the CLOCK_MONOTONIC_RAW posix clock. */
> struct timespec raw_time;
>
> + /* The current TAI - UTC offset */
> + time_t tai_offset;
> + /* The UTC time of the next leap second epoch */
> + time_t utc_epoch;

How about leap_utc_epoch just to be more clear?

> + /* Tracks where we stand with regard to leap the second epoch. */
> + enum {
> + LEAP_IDLE,
> + LEAP_INS_PENDING,
> + LEAP_INS_DONE,
> + LEAP_DEL_PENDING,
> + LEAP_DEL_DONE,
> + } leap_state;
> +
For continuity, would it make more sense for these to named closer to
the NTP time_state values, or maybe reworked to make use of them? Not
sure if its worth having separate state machines in the timekeeping code
and the ntp code, but maybe I'm not seeing a necessary detail here?

2012-05-21 18:18:56

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 5/6] time: move leap second management into time keeping core

On 05/18/2012 07:09 AM, Richard Cochran wrote:
> This patch refactors the timekeeping and ntp code in order to
> improve leap second handling and to provide a TAI based clock.
> The change has a number of aspects.
>
> * remove the NTP time_status variable
>
> Instead, compute this value on demand in adjtimex.
>
> * move TAI managment into timekeeping core from ntp
>
> Currently NTP manages the TAI offset. Since there's plans for a
> CLOCK_TAI clockid, push the TAI management into the timekeeping
> core.
>
> [ Original idea from: John Stultz<[email protected]> ]
> [ Changed by RC: ]
>
> - replace u32 with time_t
> - fixup second call site of second_overflow()
>
> * replace modulus with integer test and schedule leap second
>
> On the day of a leap second, the NTP code performs a division on every
> tick in order to know when to leap. This patch replaces the division
> with an integer comparison, making the code faster and easier to
> understand.
>
> Signed-off-by: Richard Cochran<[email protected]>
> ---
> include/linux/time.h | 1 +
> kernel/time/ntp.c | 86 +++++++++++++-------------------------------
> kernel/time/timekeeping.c | 54 ++++++++++++++++++++++++----
> 3 files changed, 73 insertions(+), 68 deletions(-)
>
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 179f4d6..b6034b0 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -168,6 +168,7 @@ extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
> extern int timekeeping_valid_for_hres(void);
> extern u64 timekeeping_max_deferment(void);
> extern int timekeeping_inject_offset(struct timespec *ts);
> +extern void timekeeping_set_tai_offset(time_t tai_offset);
>
> struct tms;
> extern void do_sys_times(struct tms *);
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index d4d48b0..53c3227 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -16,6 +16,7 @@
> #include<linux/mm.h>
> #include<linux/module.h>
>
> +#include "leap-seconds.h"
> #include "tick-internal.h"
>
> /*
> @@ -24,6 +25,7 @@
>
> DEFINE_SPINLOCK(ntp_lock);
>
> +#define STA_LEAP (STA_INS | STA_DEL)
>
> /* USER_HZ period (usecs): */
> unsigned long tick_usec = TICK_USEC;
> @@ -42,19 +44,9 @@ static u64 tick_length_base;
> * phase-lock loop variables
> */
>
> -/*
> - * clock synchronization status
> - *
> - * (TIME_ERROR prevents overwriting the CMOS clock)
> - */
> -static int time_state = TIME_OK;
> -
> /* clock status bits: */
> static int time_status = STA_UNSYNC;
>
> -/* TAI offset (secs): */
> -static long time_tai;
> -
> /* time adjustment (nsecs): */
> static s64 time_offset;
>
> @@ -386,57 +378,14 @@ u64 ntp_tick_length(void)
> * They were originally developed for SUN and DEC kernels.
> * All the kudos should go to Dave for this stuff.
> *
> - * Also handles leap second processing, and returns leap offset
> */
> int second_overflow(unsigned long secs)

Might be good to rename second_overflow() to ntp_second_overflow() and
drop passing the time_t as its now unused.

> {
> s64 delta;
> - int leap = 0;
> unsigned long flags;
>
> spin_lock_irqsave(&ntp_lock, flags);
>
> - /*
> - * Leap second processing. If in leap-insert state at the end of the
> - * day, the system clock is set back one second; if in leap-delete
> - * state, the system clock is set ahead one second.
> - */
> - switch (time_state) {
> - case TIME_OK:
> - if (time_status& STA_INS)
> - time_state = TIME_INS;
> - else if (time_status& STA_DEL)
> - time_state = TIME_DEL;
> - break;
> - case TIME_INS:
> - if (secs % 86400 == 0) {
> - leap = -1;
> - time_state = TIME_OOP;
> - time_tai++;
> - printk(KERN_NOTICE
> - "Clock: inserting leap second 23:59:60 UTC\n");
> - }
> - break;
> - case TIME_DEL:
> - if ((secs + 1) % 86400 == 0) {
> - leap = 1;
> - time_tai--;
> - time_state = TIME_WAIT;
> - printk(KERN_NOTICE
> - "Clock: deleting leap second 23:59:59 UTC\n");
> - }
> - break;
> - case TIME_OOP:
> - time_state = TIME_WAIT;
> - break;
> -
> - case TIME_WAIT:
> - if (!(time_status& (STA_INS | STA_DEL)))
> - time_state = TIME_OK;
> - break;
> - }
> -
> -
> /* Bump the maxerror field */
> time_maxerror += MAXFREQ / NSEC_PER_USEC;
> if (time_maxerror> NTP_PHASE_LIMIT) {
> @@ -478,7 +427,7 @@ int second_overflow(unsigned long secs)
> out:
> spin_unlock_irqrestore(&ntp_lock, flags);
>
> - return leap;
> + return 0;
> }
[snip]

> - getnstimeofday(&ts);
> + result = timekeeping_gettod_status(&ts,&orig_tai);
> + time_tai = orig_tai;
> +
> + if (txc->modes& ADJ_STATUS) {
> + /*
> + * Check for new leap second commands.
> + */
> + if (!(time_status& STA_INS)&& (txc->status& STA_INS))
> + timekeeping_insert_leap_second();
> +
> + else if (!(time_status& STA_DEL)&& (txc->status& STA_DEL))
> + timekeeping_delete_leap_second();
> +
> + else if ((time_status& STA_LEAP)&& !(txc->status& STA_LEAP))
> + timekeeping_finish_leap_second();
> + }
Doing this early doesn't run into any trouble with the tcx->modes rules,
right?

I'm just worried about changes in behavior if modes =
ADJ_ADJTIME|ADJ_STATUS


> spin_lock_irq(&ntp_lock);
>
> @@ -673,7 +637,7 @@ int do_adjtimex(struct timex *txc)
>
> /* If there are input parameters, then process them: */
> if (txc->modes)
> - process_adjtimex_modes(txc);
> + process_adjtimex_modes(txc,&time_tai);
>
> txc->offset = shift_right(time_offset * NTP_INTERVAL_FREQ,
> NTP_SCALE_SHIFT);
> @@ -681,7 +645,6 @@ int do_adjtimex(struct timex *txc)
> txc->offset /= NSEC_PER_USEC;
> }
>
> - result = time_state; /* mostly `TIME_OK' */
> /* check for errors */
> if (is_error_status(time_status))
> result = TIME_ERROR;
> @@ -702,6 +665,9 @@ int do_adjtimex(struct timex *txc)
>
> spin_unlock_irq(&ntp_lock);
>
> + if (time_tai != orig_tai&& result == TIME_OK)
> + timekeeping_set_tai_offset(time_tai);
> +
> txc->time.tv_sec = ts.tv_sec;
> txc->time.tv_usec = ts.tv_nsec;
> if (!(time_status& STA_NANO))
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index eab03fb..fdd1a48 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -444,6 +444,18 @@ int timekeeping_inject_offset(struct timespec *ts)
> EXPORT_SYMBOL(timekeeping_inject_offset);
>
> /**
> + * timekeeping_set_tai_offset - Sets the current TAI offset from UTC
> + */
> +void timekeeping_set_tai_offset(time_t tai_offset)
> +{
> + unsigned long flags;
> +
> + write_seqlock_irqsave(&timekeeper.lock, flags);
> + timekeeper.tai_offset = tai_offset;
> + write_sequnlock_irqrestore(&timekeeper.lock, flags);
> +}
> +
> +/**
> * change_clocksource - Swaps clocksources if a new one is available
> *
> * Accumulates current time interval and initializes new clocksource
> @@ -950,6 +962,38 @@ static void timekeeping_adjust(s64 offset)
> timekeeper.ntp_error_shift;
> }
>
> +static void timekeeper_overflow(void)

Mind renaming this to timekeeper_second_overflow, just so readers don't
mix it up with clocksource related overflows.

thanks
-john

2012-05-21 18:26:47

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On 05/18/2012 07:09 AM, Richard Cochran wrote:
> This patch introduces time keeping variables to track the next
> mini-epoch between the UTC and TAI timescales. A leap second occurs
> one second before a mini-epoch. When no leap second is pending, then
> the epoch is set to the far future, LONG_MAX.
>
> This code will become useful later on for providing correct time
> surrounding a leap second.
>
> Signed-off-by: Richard Cochran<[email protected]>
> ---
> kernel/time/timekeeping.c | 14 ++++++++++++++
> 1 files changed, 14 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index d66b213..ac04de4 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -70,6 +70,19 @@ struct timekeeper {
> /* The raw monotonic time for the CLOCK_MONOTONIC_RAW posix clock. */
> struct timespec raw_time;
>
> + /* The current TAI - UTC offset */
> + time_t tai_offset;
> + /* The UTC time of the next leap second epoch */
> + time_t utc_epoch;
> + /* Tracks where we stand with regard to leap the second epoch. */
> + enum {
> + LEAP_IDLE,
> + LEAP_INS_PENDING,
> + LEAP_INS_DONE,
> + LEAP_DEL_PENDING,
> + LEAP_DEL_DONE,
> + } leap_state;
> +
One other thing, I'd rather you break this patch set up logically as:
1) Move tai offset management into timeekeeping core
2) Move leap_state managment into timekeeping core.

Instead of:
1) Add unused values to timekeeper for tai and leapstate
2) Convert NTP code to use both these new values

I know its a pain to refactor a patch set in this way, but it makes it
much more clear as to what each patch is doing and the complexity of
each step.

thanks
-john


2012-05-21 19:08:28

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On Mon, May 21, 2012 at 11:09:51AM -0700, John Stultz wrote:
> On 05/18/2012 07:09 AM, Richard Cochran wrote:
> >This patch introduces time keeping variables to track the next
> >mini-epoch between the UTC and TAI timescales. A leap second occurs
> >one second before a mini-epoch. When no leap second is pending, then
> >the epoch is set to the far future, LONG_MAX.
> >
> >This code will become useful later on for providing correct time
> >surrounding a leap second.
> >
> >Signed-off-by: Richard Cochran<[email protected]>
> >---
> > kernel/time/timekeeping.c | 14 ++++++++++++++
> > 1 files changed, 14 insertions(+), 0 deletions(-)
> >
> >diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> >index d66b213..ac04de4 100644
> >--- a/kernel/time/timekeeping.c
> >+++ b/kernel/time/timekeeping.c
> >@@ -70,6 +70,19 @@ struct timekeeper {
> > /* The raw monotonic time for the CLOCK_MONOTONIC_RAW posix clock. */
> > struct timespec raw_time;
> >
> >+ /* The current TAI - UTC offset */
> >+ time_t tai_offset;
> >+ /* The UTC time of the next leap second epoch */
> >+ time_t utc_epoch;
>
> How about leap_utc_epoch just to be more clear?

Okay

>
> >+ /* Tracks where we stand with regard to leap the second epoch. */
> >+ enum {
> >+ LEAP_IDLE,
> >+ LEAP_INS_PENDING,
> >+ LEAP_INS_DONE,
> >+ LEAP_DEL_PENDING,
> >+ LEAP_DEL_DONE,
> >+ } leap_state;
> >+
> For continuity, would it make more sense for these to named closer
> to the NTP time_state values, or maybe reworked to make use of them?
> Not sure if its worth having separate state machines in the
> timekeeping code and the ntp code, but maybe I'm not seeing a
> necessary detail here?

Actually, the two state machines are _essential_ in making this
work. The first patch idea which you nixed (keep continuous time and
compute discontinuous UTC on demand) would make it all simple and
logical. But having to deal with the time basis jumping around (and
at the wrong time, too late, at the tick) forces us to keep extra
state.

For example In order to know TIME_OOP, you need to know

1. the UTC time of the epoch
2. from that, the UTC time of the leap second (before the jump)
3. whether or not the leap second correction has occurred

I don't think I am explaining this very well. I will try again to make
it clear using a table or something later on.

Thanks,
Richard

2012-05-21 19:14:07

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On Mon, May 21, 2012 at 11:21:18AM -0700, John Stultz wrote:
> On 05/18/2012 07:09 AM, Richard Cochran wrote:
> >This patch introduces time keeping variables to track the next
> >mini-epoch between the UTC and TAI timescales. A leap second occurs
> >one second before a mini-epoch. When no leap second is pending, then
> >the epoch is set to the far future, LONG_MAX.
> >
> >This code will become useful later on for providing correct time
> >surrounding a leap second.
> >
> >Signed-off-by: Richard Cochran<[email protected]>
> >---
> > kernel/time/timekeeping.c | 14 ++++++++++++++
> > 1 files changed, 14 insertions(+), 0 deletions(-)
> >
> >diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> >index d66b213..ac04de4 100644
> >--- a/kernel/time/timekeeping.c
> >+++ b/kernel/time/timekeeping.c
> >@@ -70,6 +70,19 @@ struct timekeeper {
> > /* The raw monotonic time for the CLOCK_MONOTONIC_RAW posix clock. */
> > struct timespec raw_time;
> >
> >+ /* The current TAI - UTC offset */
> >+ time_t tai_offset;
> >+ /* The UTC time of the next leap second epoch */
> >+ time_t utc_epoch;
> >+ /* Tracks where we stand with regard to leap the second epoch. */
> >+ enum {
> >+ LEAP_IDLE,
> >+ LEAP_INS_PENDING,
> >+ LEAP_INS_DONE,
> >+ LEAP_DEL_PENDING,
> >+ LEAP_DEL_DONE,
> >+ } leap_state;
> >+
> One other thing, I'd rather you break this patch set up logically as:
> 1) Move tai offset management into timeekeeping core

** see below

> 2) Move leap_state managment into timekeeping core.
>
> Instead of:
> 1) Add unused values to timekeeper for tai and leapstate
> 2) Convert NTP code to use both these new values

I can combine these two into one, no problem.

** But it does not make sense to move the tai offset by itself in one
patch, since that would mean adding an access method in one patch
and then removing again in the next.

Thanks,
Richard

2012-05-21 19:18:56

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 4/6] time: introduce leap second functional interface

On Mon, May 21, 2012 at 11:01:03AM -0700, John Stultz wrote:
> On 05/18/2012 07:09 AM, Richard Cochran wrote:
> >+
> >+int timekeeping_gettod_status(struct timespec *ts, time_t *offset);
> >+
> >+void timekeeping_delete_leap_second(void);
> >+
> >+void timekeeping_finish_leap_second(void);
> >+
> >+void timekeeping_insert_leap_second(void);
> >+
> >+#endif
>
> Why not just add these to time.h?

This is a private interface only for ntp.c, not for the whole rest of
the kernel via time.h.

BTW this highlights the very icky incestuous relationship between
ntp.c and timekeeper.c. Probably there should be a comment documenting
the (unspoken) locking sequence for ntp_lock and timekeeper.lock.

Thanks,
Richard

2012-05-21 19:24:35

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 5/6] time: move leap second management into time keeping core

On Mon, May 21, 2012 at 11:18:12AM -0700, John Stultz wrote:
> On 05/18/2012 07:09 AM, Richard Cochran wrote:
> >This patch refactors the timekeeping and ntp code in order to
> >improve leap second handling and to provide a TAI based clock.
> >The change has a number of aspects.
> >
> >* remove the NTP time_status variable
> >
> >Instead, compute this value on demand in adjtimex.
> >
> >* move TAI managment into timekeeping core from ntp
> >
> >Currently NTP manages the TAI offset. Since there's plans for a
> >CLOCK_TAI clockid, push the TAI management into the timekeeping
> >core.
> >
> >[ Original idea from: John Stultz<[email protected]> ]
> >[ Changed by RC: ]
> >
> > - replace u32 with time_t
> > - fixup second call site of second_overflow()
> >
> >* replace modulus with integer test and schedule leap second
> >
> >On the day of a leap second, the NTP code performs a division on every
> >tick in order to know when to leap. This patch replaces the division
> >with an integer comparison, making the code faster and easier to
> >understand.
> >
> >Signed-off-by: Richard Cochran<[email protected]>
> >---
> > include/linux/time.h | 1 +
> > kernel/time/ntp.c | 86 +++++++++++++-------------------------------
> > kernel/time/timekeeping.c | 54 ++++++++++++++++++++++++----
> > 3 files changed, 73 insertions(+), 68 deletions(-)
> >
> >diff --git a/include/linux/time.h b/include/linux/time.h
> >index 179f4d6..b6034b0 100644
> >--- a/include/linux/time.h
> >+++ b/include/linux/time.h
> >@@ -168,6 +168,7 @@ extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
> > extern int timekeeping_valid_for_hres(void);
> > extern u64 timekeeping_max_deferment(void);
> > extern int timekeeping_inject_offset(struct timespec *ts);
> >+extern void timekeeping_set_tai_offset(time_t tai_offset);
> >
> > struct tms;
> > extern void do_sys_times(struct tms *);
> >diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> >index d4d48b0..53c3227 100644
> >--- a/kernel/time/ntp.c
> >+++ b/kernel/time/ntp.c
> >@@ -16,6 +16,7 @@
> > #include<linux/mm.h>
> > #include<linux/module.h>
> >
> >+#include "leap-seconds.h"
> > #include "tick-internal.h"
> >
> > /*
> >@@ -24,6 +25,7 @@
> >
> > DEFINE_SPINLOCK(ntp_lock);
> >
> >+#define STA_LEAP (STA_INS | STA_DEL)
> >
> > /* USER_HZ period (usecs): */
> > unsigned long tick_usec = TICK_USEC;
> >@@ -42,19 +44,9 @@ static u64 tick_length_base;
> > * phase-lock loop variables
> > */
> >
> >-/*
> >- * clock synchronization status
> >- *
> >- * (TIME_ERROR prevents overwriting the CMOS clock)
> >- */
> >-static int time_state = TIME_OK;
> >-
> > /* clock status bits: */
> > static int time_status = STA_UNSYNC;
> >
> >-/* TAI offset (secs): */
> >-static long time_tai;
> >-
> > /* time adjustment (nsecs): */
> > static s64 time_offset;
> >
> >@@ -386,57 +378,14 @@ u64 ntp_tick_length(void)
> > * They were originally developed for SUN and DEC kernels.
> > * All the kudos should go to Dave for this stuff.
> > *
> >- * Also handles leap second processing, and returns leap offset
> > */
> > int second_overflow(unsigned long secs)
>
> Might be good to rename second_overflow() to ntp_second_overflow()
> and drop passing the time_t as its now unused.

Okay

> > {
> > s64 delta;
> >- int leap = 0;
> > unsigned long flags;
> >
> > spin_lock_irqsave(&ntp_lock, flags);
> >
> >- /*
> >- * Leap second processing. If in leap-insert state at the end of the
> >- * day, the system clock is set back one second; if in leap-delete
> >- * state, the system clock is set ahead one second.
> >- */
> >- switch (time_state) {
> >- case TIME_OK:
> >- if (time_status& STA_INS)
> >- time_state = TIME_INS;
> >- else if (time_status& STA_DEL)
> >- time_state = TIME_DEL;
> >- break;
> >- case TIME_INS:
> >- if (secs % 86400 == 0) {
> >- leap = -1;
> >- time_state = TIME_OOP;
> >- time_tai++;
> >- printk(KERN_NOTICE
> >- "Clock: inserting leap second 23:59:60 UTC\n");
> >- }
> >- break;
> >- case TIME_DEL:
> >- if ((secs + 1) % 86400 == 0) {
> >- leap = 1;
> >- time_tai--;
> >- time_state = TIME_WAIT;
> >- printk(KERN_NOTICE
> >- "Clock: deleting leap second 23:59:59 UTC\n");
> >- }
> >- break;
> >- case TIME_OOP:
> >- time_state = TIME_WAIT;
> >- break;
> >-
> >- case TIME_WAIT:
> >- if (!(time_status& (STA_INS | STA_DEL)))
> >- time_state = TIME_OK;
> >- break;
> >- }
> >-
> >-
> > /* Bump the maxerror field */
> > time_maxerror += MAXFREQ / NSEC_PER_USEC;
> > if (time_maxerror> NTP_PHASE_LIMIT) {
> >@@ -478,7 +427,7 @@ int second_overflow(unsigned long secs)
> > out:
> > spin_unlock_irqrestore(&ntp_lock, flags);
> >
> >- return leap;
> >+ return 0;
> > }
> [snip]
>
> >- getnstimeofday(&ts);
> >+ result = timekeeping_gettod_status(&ts,&orig_tai);
> >+ time_tai = orig_tai;
> >+
> >+ if (txc->modes& ADJ_STATUS) {
> >+ /*
> >+ * Check for new leap second commands.
> >+ */
> >+ if (!(time_status& STA_INS)&& (txc->status& STA_INS))
> >+ timekeeping_insert_leap_second();
> >+
> >+ else if (!(time_status& STA_DEL)&& (txc->status& STA_DEL))
> >+ timekeeping_delete_leap_second();
> >+
> >+ else if ((time_status& STA_LEAP)&& !(txc->status& STA_LEAP))
> >+ timekeeping_finish_leap_second();
> >+ }
> Doing this early doesn't run into any trouble with the tcx->modes
> rules, right?
>
> I'm just worried about changes in behavior if modes =
> ADJ_ADJTIME|ADJ_STATUS

I wouldn't worry too much. The whole ntp adjtimex thing is such a mess
anyhow. Garbage in, garbage out.

Let's turn the question around: What is the expected behavior of
ADJ_ADJTIME|ADJ_STATUS?

[ hint: not really well defined, I think you will find ]

>
>
> > spin_lock_irq(&ntp_lock);
> >
> >@@ -673,7 +637,7 @@ int do_adjtimex(struct timex *txc)
> >
> > /* If there are input parameters, then process them: */
> > if (txc->modes)
> >- process_adjtimex_modes(txc);
> >+ process_adjtimex_modes(txc,&time_tai);
> >
> > txc->offset = shift_right(time_offset * NTP_INTERVAL_FREQ,
> > NTP_SCALE_SHIFT);
> >@@ -681,7 +645,6 @@ int do_adjtimex(struct timex *txc)
> > txc->offset /= NSEC_PER_USEC;
> > }
> >
> >- result = time_state; /* mostly `TIME_OK' */
> > /* check for errors */
> > if (is_error_status(time_status))
> > result = TIME_ERROR;
> >@@ -702,6 +665,9 @@ int do_adjtimex(struct timex *txc)
> >
> > spin_unlock_irq(&ntp_lock);
> >
> >+ if (time_tai != orig_tai&& result == TIME_OK)
> >+ timekeeping_set_tai_offset(time_tai);
> >+
> > txc->time.tv_sec = ts.tv_sec;
> > txc->time.tv_usec = ts.tv_nsec;
> > if (!(time_status& STA_NANO))
> >diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> >index eab03fb..fdd1a48 100644
> >--- a/kernel/time/timekeeping.c
> >+++ b/kernel/time/timekeeping.c
> >@@ -444,6 +444,18 @@ int timekeeping_inject_offset(struct timespec *ts)
> > EXPORT_SYMBOL(timekeeping_inject_offset);
> >
> > /**
> >+ * timekeeping_set_tai_offset - Sets the current TAI offset from UTC
> >+ */
> >+void timekeeping_set_tai_offset(time_t tai_offset)
> >+{
> >+ unsigned long flags;
> >+
> >+ write_seqlock_irqsave(&timekeeper.lock, flags);
> >+ timekeeper.tai_offset = tai_offset;
> >+ write_sequnlock_irqrestore(&timekeeper.lock, flags);
> >+}
> >+
> >+/**
> > * change_clocksource - Swaps clocksources if a new one is available
> > *
> > * Accumulates current time interval and initializes new clocksource
> >@@ -950,6 +962,38 @@ static void timekeeping_adjust(s64 offset)
> > timekeeper.ntp_error_shift;
> > }
> >
> >+static void timekeeper_overflow(void)
>
> Mind renaming this to timekeeper_second_overflow, just so readers
> don't mix it up with clocksource related overflows.

Okay

Thanks,
Richard

2012-05-21 20:25:06

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 4/6] time: introduce leap second functional interface

On 05/21/2012 12:18 PM, Richard Cochran wrote:
> On Mon, May 21, 2012 at 11:01:03AM -0700, John Stultz wrote:
>> On 05/18/2012 07:09 AM, Richard Cochran wrote:
>>> +
>>> +int timekeeping_gettod_status(struct timespec *ts, time_t *offset);
>>> +
>>> +void timekeeping_delete_leap_second(void);
>>> +
>>> +void timekeeping_finish_leap_second(void);
>>> +
>>> +void timekeeping_insert_leap_second(void);
>>> +
>>> +#endif
>> Why not just add these to time.h?
> This is a private interface only for ntp.c, not for the whole rest of
> the kernel via time.h.
Hrm. I prefer to keep things fairly flat (even having time.h and
timex.h bugs me somewhat). But having such a separation could be
useful, but maybe at a slightly more coarse level. Something like
timekeeping-internal.h and time.h, splitting all the general accessors
away from the non-general.

I just don't want to have a ton of stray .h files, but maybe I'm
prematurely worrying about it.

> BTW this highlights the very icky incestuous relationship between
> ntp.c and timekeeper.c. Probably there should be a comment documenting
> the (unspoken) locking sequence for ntp_lock and timekeeper.lock.
>
The locking order is pretty straight forward: timekeeper.lock ->
ntp_lock. This only gets messy when you require timekeeping data from
the ntp context, but usually we provide the required data via the
caller. But better documentation is always welcome.

thanks
-john

2012-05-21 23:57:17

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 1/6] time: remove obsolete declaration

On 05/18/2012 07:09 AM, Richard Cochran wrote:
> The function, timekeeping_leap_insert, was removed in commit
> 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d
>
> Signed-off-by: Richard Cochran<[email protected]>
>

I just queued this one and sent it to tglx.

thanks
-john

2012-05-21 23:58:57

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 2/6] ntp: remove useless parameter

On 05/18/2012 07:09 AM, Richard Cochran wrote:
> This patch removes some leftover cruft in the NTP code.
>
> Signed-off-by: Richard Cochran<[email protected]>

FYI: This one didn't apply when I tried to add it to the other cleanups
you've sent me along with tip/timers/core.
Please rework it against the tree I sent Thomas and I'll queue it up.

thanks
-john

2012-05-22 04:25:47

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 4/6] time: introduce leap second functional interface

On Mon, May 21, 2012 at 01:24:57PM -0700, John Stultz wrote:
> On 05/21/2012 12:18 PM, Richard Cochran wrote:
> Hrm. I prefer to keep things fairly flat (even having time.h and
> timex.h bugs me somewhat). But having such a separation could be
> useful, but maybe at a slightly more coarse level. Something like
> timekeeping-internal.h and time.h, splitting all the general
> accessors away from the non-general.

Yes, time.h is full of stuff not really for public use. When compiling
on an atom netbook as I do, it gets really noticeable and annoying
when you tweak some private prototype, and then the whole darn kernel
rebuilds.

> The locking order is pretty straight forward: timekeeper.lock ->
> ntp_lock. This only gets messy when you require timekeeping data
> from the ntp context, but usually we provide the required data via
> the caller. But better documentation is always welcome.

The icky part is the fact that ntp would need access to timekeeper
state while holding ntp_lock.

Richard

2012-05-22 15:11:45

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 4/6] time: introduce leap second functional interface

On 05/21/2012 09:25 PM, Richard Cochran wrote:
> On Mon, May 21, 2012 at 01:24:57PM -0700, John Stultz wrote:
>> The locking order is pretty straight forward: timekeeper.lock ->
>> ntp_lock. This only gets messy when you require timekeeping data
>> from the ntp context, but usually we provide the required data via
>> the caller. But better documentation is always welcome.
> The icky part is the fact that ntp would need access to timekeeper
> state while holding ntp_lock.
Well, that needs to be reworked so it doesn't. :) Again, passing the
required time state to NTP functions from the timekeeping context should
handle these issues, and for those few NTP paths that aren't triggered
from the timekeeping core (do_adjtimex basically), we can grab the
required time state before taking the ntp lock as we have been doing.

thanks
-john

2012-05-22 17:40:09

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On Mon, May 21, 2012 at 09:08:15PM +0200, Richard Cochran wrote:
> On Mon, May 21, 2012 at 11:09:51AM -0700, John Stultz wrote:
> > On 05/18/2012 07:09 AM, Richard Cochran wrote:
> > >+ /* Tracks where we stand with regard to leap the second epoch. */
> > >+ enum {
> > >+ LEAP_IDLE,
> > >+ LEAP_INS_PENDING,
> > >+ LEAP_INS_DONE,
> > >+ LEAP_DEL_PENDING,
> > >+ LEAP_DEL_DONE,
> > >+ } leap_state;

...

> I don't think I am explaining this very well. I will try again to make
> it clear using a table or something later on.

The following table illustrates what happens around a (fictitious)
leap second. Imagine a new epoch will occur at UTC time value 10 and
the new TAI - UTC offset will be 2 seconds. The columns of the table
show the values of the relevant time variables.

U: UTC time
CODE: NTP time code
T: TAI - UTC offset
P: pending (explained below)

U CODE T P
--------------------
1 INS 1 1 leap second sheduled
--------------------
2 INS 1 1
--------------------
...
--------------------
8 INS 1 1
--------------------
9 INS 1 1
--------------------
| 10 OOP 1 1 leap second, 1st tick
|~~~~~~~~~~~~~~~~~~~
| 9 2 0 leap second, 2nd and subsequent ticks
--------------------
10 WAIT 2 0 new epoch
--------------------
11 WAIT 2 0

Without adding some extra state, it is impossible to decide if UTC
time value 10 means OOP or WAIT. With the current tick based
implementation, this value can appear in the leap second and also in
the new epoch. A similar problem exists for UTC time value 9. We
cannot consult the NTP state to figure these out, since that is what
we are trying to compute in the first place.

The solution I came up with is to add a "leap second pending" flag
which tracks whether the leap second correction has been applied yet,
shown in column P. Since the case for deletion is a bit different than
insertion, there are actually two flags, and together they appear in
the new enumerated state variable.

So, I hope that explains why this extra state is needed.

Thanks,
Richard

2012-05-22 18:06:44

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On 05/22/2012 10:39 AM, Richard Cochran wrote:
> On Mon, May 21, 2012 at 09:08:15PM +0200, Richard Cochran wrote:
>> On Mon, May 21, 2012 at 11:09:51AM -0700, John Stultz wrote:
>>> On 05/18/2012 07:09 AM, Richard Cochran wrote:
>>>> + /* Tracks where we stand with regard to leap the second epoch. */
>>>> + enum {
>>>> + LEAP_IDLE,
>>>> + LEAP_INS_PENDING,
>>>> + LEAP_INS_DONE,
>>>> + LEAP_DEL_PENDING,
>>>> + LEAP_DEL_DONE,
>>>> + } leap_state;
> ...
>
>> I don't think I am explaining this very well. I will try again to make
>> it clear using a table or something later on.
> The following table illustrates what happens around a (fictitious)
> leap second. Imagine a new epoch will occur at UTC time value 10 and
> the new TAI - UTC offset will be 2 seconds. The columns of the table
> show the values of the relevant time variables.
>
> U: UTC time
> CODE: NTP time code
> T: TAI - UTC offset
> P: pending (explained below)
>
> U CODE T P
> --------------------
> 1 INS 1 1 leap second sheduled
> --------------------
> 2 INS 1 1
> --------------------
> ...
> --------------------
> 8 INS 1 1
> --------------------
> 9 INS 1 1
> --------------------
> | 10 OOP 1 1 leap second, 1st tick
> |~~~~~~~~~~~~~~~~~~~
> | 9 2 0 leap second, 2nd and subsequent ticks
> --------------------
> 10 WAIT 2 0 new epoch
> --------------------
> 11 WAIT 2 0

Not sure I'm still following.

It seems currently we have:

U CODE T
----------------
9 INS 1
----------------
10 INS 1 pre tick, post leap second edge (this is the technically incorrect interval)
~~~~~~~~~~~~~~~~
9 OOP 2 post tick, post leap second edge
----------------
10 WAIT 2 new epoch


If you're trying to correct the pre-tick, post leap second edge, the above provides all you need.

In the adjtimex code, all you have to do is:


if (unlikely(CODE == INS&& U == 10))

/*note, we're not modifying state here, just returning corrected local values*/

return (U-1, OOP, T+1);

return (U,CODE, T);


Since when the tick triggers, we'll move the CODE state appropriately.

Or am I still missing something?

thanks
-john

2012-05-23 08:29:32

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On Tue, May 22, 2012 at 11:06:09AM -0700, John Stultz wrote:
> On 05/22/2012 10:39 AM, Richard Cochran wrote:
> >On Mon, May 21, 2012 at 09:08:15PM +0200, Richard Cochran wrote:
> >>On Mon, May 21, 2012 at 11:09:51AM -0700, John Stultz wrote:
> >>>On 05/18/2012 07:09 AM, Richard Cochran wrote:
> >>>>+ /* Tracks where we stand with regard to leap the second epoch. */
> >>>>+ enum {
> >>>>+ LEAP_IDLE,
> >>>>+ LEAP_INS_PENDING,
> >>>>+ LEAP_INS_DONE,
> >>>>+ LEAP_DEL_PENDING,
> >>>>+ LEAP_DEL_DONE,
> >>>>+ } leap_state;
> >...
> >
> >>I don't think I am explaining this very well. I will try again to make
> >>it clear using a table or something later on.
> >The following table illustrates what happens around a (fictitious)
> >leap second. Imagine a new epoch will occur at UTC time value 10 and
> >the new TAI - UTC offset will be 2 seconds. The columns of the table
> >show the values of the relevant time variables.
> >
> >U: UTC time
> >CODE: NTP time code
> >T: TAI - UTC offset
> >P: pending (explained below)
> >
> > U CODE T P
> > --------------------
> > 1 INS 1 1 leap second sheduled
> > --------------------
> > 2 INS 1 1
> > --------------------
> > ...
> > --------------------
> > 8 INS 1 1
> > --------------------
> > 9 INS 1 1
> > --------------------
> >| 10 OOP 1 1 leap second, 1st tick
> >|~~~~~~~~~~~~~~~~~~~
> >| 9 2 0 leap second, 2nd and subsequent ticks
> > --------------------
> > 10 WAIT 2 0 new epoch
> > --------------------
> > 11 WAIT 2 0
>
> Not sure I'm still following.
>
> It seems currently we have:
>
> U CODE T
> ----------------
> 9 INS 1
> ----------------
> 10 INS 1 pre tick, post leap second edge (this is the technically incorrect interval)
> ~~~~~~~~~~~~~~~~
> 9 OOP 2 post tick, post leap second edge
> ----------------
> 10 WAIT 2 new epoch
>
>
> If you're trying to correct the pre-tick, post leap second edge, the above provides all you need.
>
> In the adjtimex code, all you have to do is:
>
>
> if (unlikely(CODE == INS&& U == 10))
>
> /*note, we're not modifying state here, just returning corrected local values*/
>
> return (U-1, OOP, T+1);
>
> return (U,CODE, T);

Okay, if you want it that way, then you will have to add the other
cases. For example:

switch (code) {
case INS:
if (U == epoch) {
U--;
T++;
code = OOP;
}
break;
case OOP:
if (U == epoch) {
code = WAIT;
}
break;
case DEL:
if (U == epoch - 1) {
U++;
T--;
code = WAIT;
}
break;
default:
break;
}
return (U, code, T);

This is beginning to look a lot like the code in my patch. However,
your approach is somewhat simpler, because it assumes the tick will
never miss a second overflow.

> Since when the tick triggers, we'll move the CODE state appropriately.
>
> Or am I still missing something?

Considering the tickless options, is it safe to assume that the CODE
state update will never be an entire second too late? If so, then
I'll rework the patch as above. If not, then I think the patch I
posted already handles all the cases.

Thanks,
Richard

2012-05-23 16:52:13

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On 05/23/2012 01:29 AM, Richard Cochran wrote:
> On Tue, May 22, 2012 at 11:06:09AM -0700, John Stultz wrote:
>> It seems currently we have:
>>
>> U CODE T
>> ----------------
>> 9 INS 1
>> ----------------
>> 10 INS 1 pre tick, post leap second edge (this is the technically incorrect interval)
>> ~~~~~~~~~~~~~~~~
>> 9 OOP 2 post tick, post leap second edge
>> ----------------
>> 10 WAIT 2 new epoch
>>
>>
>> If you're trying to correct the pre-tick, post leap second edge, the above provides all you need.
>>
>> In the adjtimex code, all you have to do is:
>>
>>
>> if (unlikely(CODE == INS&& U == 10))
>>
>> /*note, we're not modifying state here, just returning corrected local values*/
>>
>> return (U-1, OOP, T+1);
>>
>> return (U,CODE, T);
> Okay, if you want it that way, then you will have to add the other
> cases. For example:
>
> switch (code) {
> case INS:
> if (U == epoch) {
> U--;
> T++;
> code = OOP;
> }
> break;
> case OOP:
> if (U == epoch) {
epoch + 1 here, right?
> code = WAIT;
> }
> break;
> case DEL:
> if (U == epoch - 1) {
> U++;
> T--;
> code = WAIT;
> }
> break;
> default:
> break;
> }
> return (U, code, T);
>
> This is beginning to look a lot like the code in my patch. However,
> your approach is somewhat simpler, because it assumes the tick will
> never miss a second overflow.

I'm a little unclear on the above, because it looks like you're
modifying the state from the reader.


>> Since when the tick triggers, we'll move the CODE state appropriately.
>>
>> Or am I still missing something?
> Considering the tickless options, is it safe to assume that the CODE
> state update will never be an entire second too late? If so, then
> I'll rework the patch as above. If not, then I think the patch I
> posted already handles all the cases.
>

I still don't think it matters. If we know the when next leap second is
supposed to be, if the time_state is INS, then we can still handle
things without extra state.

if (unlikely(CODE == INS&& U == next_leap))
return (U-1, OOP, T+1);

if (unlikely(CODE == INS&& U == next_leap + 1))
return (U-1, WAIT, T+1);

if (unlikely(CODE == DEL&& U == next_leap - 1))
return (U+1, WAIT, T-1);


So even if we somehow sleep for two seconds over the leap second, and then an application hits the read critical section before the timer interrupt comes in the update the state, we can still provide correct state transition in the reader.

Thus the only additional state you might need over what we already have is the next_leap value.

thanks
-john

2012-05-23 19:20:41

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On Wed, May 23, 2012 at 09:50:13AM -0700, John Stultz wrote:
> On 05/23/2012 01:29 AM, Richard Cochran wrote:
> >Okay, if you want it that way, then you will have to add the other
> >cases. For example:
> >
> > switch (code) {
> > case INS:
> > if (U == epoch) {
> > U--;
> > T++;
> > code = OOP;
> > }
> > break;
> > case OOP:
> > if (U == epoch) {
> epoch + 1 here, right?

No, I really mean epoch (not the leap second value, but the value when
the new TAI offset comes into effect).

> > code = WAIT;
> > }
> > break;
> > case DEL:
> > if (U == epoch - 1) {
> > U++;
> > T--;
> > code = WAIT;
> > }
> > break;
> > default:
> > break;
> > }
> > return (U, code, T);
> >
> >This is beginning to look a lot like the code in my patch. However,
> >your approach is somewhat simpler, because it assumes the tick will
> >never miss a second overflow.
>
> I'm a little unclear on the above, because it looks like you're
> modifying the state from the reader.

Sorry about that. Here is a more exact pseudo code:

switch (time_state) {
case INS:
if (U == epoch) {
U--;
T++;
result_code = OOP;
}
break;
case OOP:
if (U == epoch) {
result_code = WAIT;
}
break;
case DEL:
if (U == epoch - 1) {
U++;
T--;
result_code = WAIT;
}
break;
default:
break;
}
return (U, result_code, T);

> I still don't think it matters. If we know the when next leap second
> is supposed to be, if the time_state is INS, then we can still
> handle things without extra state.
>
> if (unlikely(CODE == INS&& U == next_leap))
> return (U-1, OOP, T+1);
>
> if (unlikely(CODE == INS&& U == next_leap + 1))
> return (U-1, WAIT, T+1);

And what if (U > next_leap + 1)?

In that case, you must also return WAIT. Are you going to add a test
for every second beyond 'next_leap'? I don't think so.

>
> if (unlikely(CODE == DEL&& U == next_leap - 1))
> return (U+1, WAIT, T-1);
>
>
> So even if we somehow sleep for two seconds over the leap second,
> and then an application hits the read critical section before the
> timer interrupt comes in the update the state, we can still provide
> correct state transition in the reader.

No, I think what you wrote above is not correct.

> Thus the only additional state you might need over what we already
> have is the next_leap value.

Again, you will need two things.

1. the epoch threshold value (not the leap second value)
2. whether the new epoch has been applied yet, or not

Thanks,
Richard

2012-05-23 19:43:08

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On Wed, May 23, 2012 at 09:50:13AM -0700, John Stultz wrote:
>
> Thus the only additional state you might need over what we already
> have is the next_leap value.

And that is BTW exactly what my patches do.

The added enumerated state variable 'leap_state' in timekeeper.c is
balanced by the removal of 'time_state' in ntp.c.

Thanks,
Richard

2012-05-23 20:19:10

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On 05/23/2012 12:17 PM, Richard Cochran wrote:
> On Wed, May 23, 2012 at 09:50:13AM -0700, John Stultz wrote:
>> On 05/23/2012 01:29 AM, Richard Cochran wrote:
>>> Okay, if you want it that way, then you will have to add the other
>>> cases. For example:
>>>
>>> switch (code) {
>>> case INS:
>>> if (U == epoch) {
>>> U--;
>>> T++;
>>> code = OOP;
>>> }
>>> break;
>>> case OOP:
>>> if (U == epoch) {
>> epoch + 1 here, right?
> No, I really mean epoch (not the leap second value, but the value when
> the new TAI offset comes into effect).
>
>>> code = WAIT;
>>> }
>>> break;
>>> case DEL:
>>> if (U == epoch - 1) {
>>> U++;
>>> T--;
>>> code = WAIT;
>>> }
>>> break;
>>> default:
>>> break;
>>> }
>>> return (U, code, T);
>>>
>>> This is beginning to look a lot like the code in my patch. However,
>>> your approach is somewhat simpler, because it assumes the tick will
>>> never miss a second overflow.
>> I'm a little unclear on the above, because it looks like you're
>> modifying the state from the reader.
> Sorry about that. Here is a more exact pseudo code:
>
> switch (time_state) {
> case INS:
> if (U == epoch) {
> U--;
> T++;
> result_code = OOP;
> }
> break;
> case OOP:
> if (U == epoch) {
> result_code = WAIT;
> }
> break;
> case DEL:
> if (U == epoch - 1) {
> U++;
> T--;
> result_code = WAIT;
> }
> break;
> default:
> break;
> }
> return (U, result_code, T);

Again, my issue here is that you're modifying state from the reader. Why
not leave that to the tick?

>> I still don't think it matters. If we know the when next leap second
>> is supposed to be, if the time_state is INS, then we can still
>> handle things without extra state.
>>
>> if (unlikely(CODE == INS&& U == next_leap))
>> return (U-1, OOP, T+1);
>>
>> if (unlikely(CODE == INS&& U == next_leap + 1))
>> return (U-1, WAIT, T+1);
> And what if (U> next_leap + 1)?
>
> In that case, you must also return WAIT. Are you going to add a test
> for every second beyond 'next_leap'? I don't think so.
You're quite correct, sorry for the omission there.

if (unlikely((CODE == INS || CODE== OOP)&& U>= next_leap + 1))
return (U-1, WAIT, T+1);


>> if (unlikely(CODE == DEL&& U == next_leap - 1))
>> return (U+1, WAIT, T-1);
>>
>>
>> So even if we somehow sleep for two seconds over the leap second,
>> and then an application hits the read critical section before the
>> timer interrupt comes in the update the state, we can still provide
>> correct state transition in the reader.
> No, I think what you wrote above is not correct.
So what's wrong with the corrected line above?

>> Thus the only additional state you might need over what we already
>> have is the next_leap value.
> Again, you will need two things.
>
> 1. the epoch threshold value (not the leap second value)
So I've avoided the term epoch just to try not to confuse things with
the unix epoch, that's why I've used next_leap, etc.
Even so, I'm not sure you've made clear the subtlety of the difference.


> 2. whether the new epoch has been applied yet, or not
I believe the internal time_state (along with the next leap second)
already provides this.

From the reader's perspective:

Not applied: (INS&& U< leap): return (INS, U)
Applied: (INS&& U == leap): return (OOP, U-1)
Finished applied: ((INS||OOP)&& U>= (leap+1)): return (WAIT,U-1)
Delete: (DEL&& U>= (leap-1)): return (WAIT,U+1)


Again, no state change is done by the reader, so we don't have to keep
track of application state or not.
Then when the tick comes in, it will move the state machine appropriately.

Sorry working this out is so difficult. If we don't come to consensus
soon, I'll try to find some time to implement what I'm suggesting so you
aren't up against my unclear hand-waving. :)

thanks
-john

2012-05-24 06:43:54

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On Wed, May 23, 2012 at 01:18:27PM -0700, John Stultz wrote:
> So I've avoided the term epoch just to try not to confuse things
> with the unix epoch, that's why I've used next_leap, etc.
> Even so, I'm not sure you've made clear the subtlety of the difference.

For me, when working with the whole UTC leap second mess, it is
important to distinguish between the leap seconds and the epochs. I
sometime have written mini-epoch to make clear that it is not about
*the* epoch from 1970.

The NIST leap second table give a list of epochs and TAI offsets. The
leap second is always the second just before the epoch. For example:

30 June 1972 23:59:59 (NTP 2287785599, first time) <-- normal second
30 June 1972 23:59:60 (NTP 2287785599, second time) <-- leap second
1 July 1972 00:00:00 (NTP 2287785600) <-- epoch

We should write the thresholding code so that it is clear whether we
are testing against the epoch or the leap second.

> I believe the internal time_state (along with the next leap second)
> already provides this.
>
> From the reader's perspective:
>
> Not applied: (INS&& U< leap): return (INS, U)
> Applied: (INS&& U == leap): return (OOP, U-1)
> Finished applied: ((INS||OOP)&& U>= (leap+1)): return (WAIT,U-1)
> Delete: (DEL&& U>= (leap-1)): return (WAIT,U+1)

I think this is still wrong, but I get your point.

My patch makes the state of the leap second application explicit and
then infers time_state on demand. You can also, of course, make
time_state explicit and infer whether or not the leap second has been
applied.

My point is that in either case, it is the same amount of code.

I prefer what I wrote, because I think it is clearer.

> Again, no state change is done by the reader, so we don't have to
> keep track of application state or not.
> Then when the tick comes in, it will move the state machine appropriately.
>
> Sorry working this out is so difficult. If we don't come to
> consensus soon, I'll try to find some time to implement what I'm
> suggesting so you aren't up against my unclear hand-waving. :)

BTW you can use the program I have been using to test this at

git://github.com/richardcochran/leap.git

Thanks,
Richard

2012-05-24 06:57:39

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On Thu, May 24, 2012 at 08:43:40AM +0200, Richard Cochran wrote:
> BTW you can use the program I have been using to test this at
>
> git://github.com/richardcochran/leap.git

That program exposes another leap second bug, too, I think. It reads
the time via adjtimex in a tight loop, optionally sleeping using

clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);

The program does not wake from this call during a leap second. It is
my expectation that CLOCK_MONOTONIC should always work. Why doesn't
it?

Thanks,
Richard

2012-05-26 15:08:19

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On Thu, May 24, 2012 at 08:57:28AM +0200, Richard Cochran wrote:
> On Thu, May 24, 2012 at 08:43:40AM +0200, Richard Cochran wrote:
> > BTW you can use the program I have been using to test this at
> >
> > git://github.com/richardcochran/leap.git
>
> That program exposes another leap second bug, too, I think. It reads
> the time via adjtimex in a tight loop, optionally sleeping using
>
> clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
>
> The program does not wake from this call during a leap second. It is
> my expectation that CLOCK_MONOTONIC should always work. Why doesn't
> it?

FWIW, the V1 of this patch series fixes this problem, but V2 doesn't.

Maybe that fact promotes my original idea?

Thanks,
Richard

2012-05-30 01:47:34

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On 05/23/2012 11:57 PM, Richard Cochran wrote:
> On Thu, May 24, 2012 at 08:43:40AM +0200, Richard Cochran wrote:
>> BTW you can use the program I have been using to test this at
>>
>> git://github.com/richardcochran/leap.git
> That program exposes another leap second bug, too, I think. It reads
> the time via adjtimex in a tight loop, optionally sleeping using
>
> clock_nanosleep(CLOCK_MONOTONIC, 0,&ts, NULL);
>
> The program does not wake from this call during a leap second. It is
> my expectation that CLOCK_MONOTONIC should always work. Why doesn't
> it?

Sorry for being slow here, just got a chance to look at this.

Does the following patch solve this issue for you?

thanks
-john


Make sure we update wall_to_monotonic when adding a leapsecond. This
could otherwise cause discontinuities in CLOCK_MONOTONIC.

Reported-by: Richard Cochran <[email protected]>
Signed-off-by: John Stultz <[email protected]>

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6e46cac..fb95654 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -962,6 +962,7 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
timekeeper.xtime.tv_sec++;
leap = second_overflow(timekeeper.xtime.tv_sec);
timekeeper.xtime.tv_sec += leap;
+ timekeeper.wall_to_monotonic -= leap;
}

/* Accumulate raw time */

2012-05-30 01:50:24

by john stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On 05/29/2012 06:46 PM, John Stultz wrote:
> On 05/23/2012 11:57 PM, Richard Cochran wrote:
>> On Thu, May 24, 2012 at 08:43:40AM +0200, Richard Cochran wrote:
>>> BTW you can use the program I have been using to test this at
>>>
>>> git://github.com/richardcochran/leap.git
>> That program exposes another leap second bug, too, I think. It reads
>> the time via adjtimex in a tight loop, optionally sleeping using
>>
>> clock_nanosleep(CLOCK_MONOTONIC, 0,&ts, NULL);
>>
>> The program does not wake from this call during a leap second. It is
>> my expectation that CLOCK_MONOTONIC should always work. Why doesn't
>> it?
>
> Sorry for being slow here, just got a chance to look at this.
>
> Does the following patch solve this issue for you?

Sorry, attached the wrong patch (that one doesn't build!).

Try this one.

thanks
-john


Make sure we update wall_to_monotonic when adding a leapsecond.
This could otherwise cause discontinuities in CLOCK_MONOTONIC.

Reported-by: Richard Cochran <[email protected]>
Signed-off-by: John Stultz <[email protected]>

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6e46cac..81c76a9 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -962,6 +962,7 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
timekeeper.xtime.tv_sec++;
leap = second_overflow(timekeeper.xtime.tv_sec);
timekeeper.xtime.tv_sec += leap;
+ timekeeper.wall_to_monotonic.tv_sec -= leap;
}

/* Accumulate raw time */

2012-05-30 05:11:54

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On Tue, May 29, 2012 at 06:49:30PM -0700, John Stultz wrote:
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 6e46cac..81c76a9 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -962,6 +962,7 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
> timekeeper.xtime.tv_sec++;
> leap = second_overflow(timekeeper.xtime.tv_sec);
> timekeeper.xtime.tv_sec += leap;
> + timekeeper.wall_to_monotonic.tv_sec -= leap;

Don't you need this in update_wall_time() too?

BTW I suggest refactoring this code (two almost identical if{} bodies)
into a shared helper function.

Thanks,
Richard

2012-05-30 05:56:57

by john stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On 05/29/2012 10:11 PM, Richard Cochran wrote:
> On Tue, May 29, 2012 at 06:49:30PM -0700, John Stultz wrote:
>> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>> index 6e46cac..81c76a9 100644
>> --- a/kernel/time/timekeeping.c
>> +++ b/kernel/time/timekeeping.c
>> @@ -962,6 +962,7 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
>> timekeeper.xtime.tv_sec++;
>> leap = second_overflow(timekeeper.xtime.tv_sec);
>> timekeeper.xtime.tv_sec += leap;
>> + timekeeper.wall_to_monotonic.tv_sec -= leap;
> Don't you need this in update_wall_time() too?
Yep. Good point.

> BTW I suggest refactoring this code (two almost identical if{} bodies)
> into a shared helper function.
>
Sounds good, although since the logic is ever so slightly different I
might split up the pure fix and do the refactoring later so the fix is
easier to apply to -stable branches.

thanks
-john

2012-05-30 06:19:37

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On Tue, May 29, 2012 at 10:56:18PM -0700, John Stultz wrote:
> On 05/29/2012 10:11 PM, Richard Cochran wrote:
> >On Tue, May 29, 2012 at 06:49:30PM -0700, John Stultz wrote:
> >>diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> >>index 6e46cac..81c76a9 100644
> >>--- a/kernel/time/timekeeping.c
> >>+++ b/kernel/time/timekeeping.c
> >>@@ -962,6 +962,7 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
> >> timekeeper.xtime.tv_sec++;
> >> leap = second_overflow(timekeeper.xtime.tv_sec);
> >> timekeeper.xtime.tv_sec += leap;
> >>+ timekeeper.wall_to_monotonic.tv_sec -= leap;
> >Don't you need this in update_wall_time() too?
> Yep. Good point.

Okay, so I can confirm that this fixes the CLOCK_MONOTONIC timer issue
during a leap second.

Thanks,
Richard

2012-05-30 06:23:33

by john stultz

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On 05/29/2012 11:19 PM, Richard Cochran wrote:
> On Tue, May 29, 2012 at 10:56:18PM -0700, John Stultz wrote:
>> On 05/29/2012 10:11 PM, Richard Cochran wrote:
>>> On Tue, May 29, 2012 at 06:49:30PM -0700, John Stultz wrote:
>>>> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>>>> index 6e46cac..81c76a9 100644
>>>> --- a/kernel/time/timekeeping.c
>>>> +++ b/kernel/time/timekeeping.c
>>>> @@ -962,6 +962,7 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
>>>> timekeeper.xtime.tv_sec++;
>>>> leap = second_overflow(timekeeper.xtime.tv_sec);
>>>> timekeeper.xtime.tv_sec += leap;
>>>> + timekeeper.wall_to_monotonic.tv_sec -= leap;
>>> Don't you need this in update_wall_time() too?
>> Yep. Good point.
> Okay, so I can confirm that this fixes the CLOCK_MONOTONIC timer issue
> during a leap second.
Thanks! Is it ok if I add your Tested-by: to the patch?

thanks
-john

2012-05-30 07:27:24

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCH RFC V2 3/6] time: keep track of the pending utc/tai threshold

On Tue, May 29, 2012 at 11:23:27PM -0700, John Stultz wrote:
> On 05/29/2012 11:19 PM, Richard Cochran wrote:
> >On Tue, May 29, 2012 at 10:56:18PM -0700, John Stultz wrote:
> >>On 05/29/2012 10:11 PM, Richard Cochran wrote:
> >>>On Tue, May 29, 2012 at 06:49:30PM -0700, John Stultz wrote:
> >>>>diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> >>>>index 6e46cac..81c76a9 100644
> >>>>--- a/kernel/time/timekeeping.c
> >>>>+++ b/kernel/time/timekeeping.c
> >>>>@@ -962,6 +962,7 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
> >>>> timekeeper.xtime.tv_sec++;
> >>>> leap = second_overflow(timekeeper.xtime.tv_sec);
> >>>> timekeeper.xtime.tv_sec += leap;
> >>>>+ timekeeper.wall_to_monotonic.tv_sec -= leap;
> >>>Don't you need this in update_wall_time() too?
> >>Yep. Good point.
> >Okay, so I can confirm that this fixes the CLOCK_MONOTONIC timer issue
> >during a leap second.
> Thanks! Is it ok if I add your Tested-by: to the patch?

Yes, by all means, thanks.

Richard