2010-11-11 19:30:35

by Alexander Shishkin

[permalink] [raw]
Subject: [PATCHv6 0/7] system time changes notification

Hi,

This is the sixth version of system time change notification mechanism
for linux kernel. The need for it comes from applications which would
like to keep track of time changes without having to wake up every
$TIMEOUT and calling gettimeofday().

An excellent description for one of the usecases, written by Kay Sievers
(http://kerneltrap.org/mailarchive/linux-kernel/2010/8/5/4603531):
"""
This is the example Lennart and I thought about when we considered
adding cron-like stuff to systemd's timer configs, but didn't want to
do silly things like scheduled checks for the actual time, so we
delayed this feature until such a notification becomes available.

Consider we want stuff like "wakeup every day at 3pm", the next wakeup
might be earlier than the timer we calculated last time, on system
time changes. We need to re-calculate it. This is necessary for all
repeating events.

Say we want to wakeup at 3pm, now it's 4pm, so we schedule it in 23
hours. Now the system time changes to 2pm, and we would expect to
wakeup in one hour, but we take 25.
"""

Changes since v5:
- addressed Thomas Gleixner's comments
- added clockid syscall parameter as suggested by John Stultz
- updated powerpc and blackfin syscalls
Changes since v4:
- updated arm and s390 syscall wiring
- removed RFC
Changes since v3:
- broken out separate patches adding time_change_notify syscall to
arm, powerpc, x86, ia64, s390 and blackfin
Changes since v2:
- replaced sysfs interface with a syscall
- added sysctl/procfs handle to set a limit to the number of users
- fixed issues pointed out by Greg.
Changes since v1:
- updated against 2.6.36-rc1,
- added notification/filtering options,
- added Documentation/ABI/sysfs-kernel-time-notify interface description.

Alexander Shishkin (7):
notify userspace about time changes
wire up sys_time_change_notify() on ARM
wire up sys_time_change_notify() on x86
wire up sys_time_change_notify() on ia64
wire up sys_time_change_notify() on s390
wire up sys_time_change_notify() on powerpc
wire up sys_time_change_notify() on blackfin

Documentation/time-change-notify-example.c | 65 +++++++++++
arch/arm/include/asm/unistd.h | 1 +
arch/arm/kernel/calls.S | 1 +
arch/blackfin/include/asm/unistd.h | 3 +-
arch/blackfin/mach-common/entry.S | 1 +
arch/ia64/include/asm/unistd.h | 3 +-
arch/ia64/kernel/entry.S | 1 +
arch/powerpc/include/asm/systbl.h | 1 +
arch/powerpc/include/asm/unistd.h | 3 +-
arch/s390/include/asm/unistd.h | 3 +-
arch/s390/kernel/compat_wrapper.S | 7 ++
arch/s390/kernel/syscalls.S | 1 +
arch/x86/ia32/ia32entry.S | 1 +
arch/x86/include/asm/unistd_32.h | 3 +-
arch/x86/include/asm/unistd_64.h | 2 +
arch/x86/kernel/syscall_table_32.S | 1 +
include/asm-generic/unistd.h | 4 +-
include/linux/syscalls.h | 2 +
include/linux/time.h | 13 +++
kernel/sys_ni.c | 3 +
kernel/time/Kconfig | 7 ++
kernel/time/Makefile | 1 +
kernel/time/notify.c | 163 ++++++++++++++++++++++++++++
kernel/time/ntp.c | 3 +
kernel/time/timekeeping.c | 3 +
25 files changed, 290 insertions(+), 6 deletions(-)
create mode 100644 Documentation/time-change-notify-example.c
create mode 100644 kernel/time/notify.c

Regards,
--
Alex


2010-11-11 19:30:36

by Alexander Shishkin

[permalink] [raw]
Subject: [PATCHv6 1/7] notify userspace about time changes

Certain userspace applications (like "clock" desktop applets or cron) might
want to be notified when some other application changes the system time.
There are several known to me reasons for this:
- avoiding periodic wakeups to poll time changes;
- changing system timekeeping policy for system-wide time management
programs;
- keeping guest applications/operating systems running in emulators
up to date.

This patch implements a notification interface via eventfd mechanism. Proccess
wishing to be notified about time changes should create an eventfd and pass it
to time_change_notify() syscall. After that, any calls to settimeofday()/
stime()/adjtimex() made by other processes will be signalled to this eventfd.
Credits for suggesting the eventfd mechanism for this purpose go to Kirill
Shutemov.

This patch adds the syscall to asm-generic/unistd.h and a simple usage
example.

Signed-off-by: Alexander Shishkin <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: John Stultz <[email protected]>
CC: Martin Schwidefsky <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Jon Hunter <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: "Paul E. McKenney" <[email protected]>
CC: David Howells <[email protected]>
CC: Avi Kivity <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: John Kacur <[email protected]>
CC: Alexander Shishkin <[email protected]>
CC: Chris Friesen <[email protected]>
CC: Kay Sievers <[email protected]>
CC: Greg KH <[email protected]>
CC: [email protected]
---
Documentation/time-change-notify-example.c | 65 +++++++++++
include/asm-generic/unistd.h | 4 +-
include/linux/syscalls.h | 2 +
include/linux/time.h | 13 +++
kernel/sys_ni.c | 3 +
kernel/time/Kconfig | 7 ++
kernel/time/Makefile | 1 +
kernel/time/notify.c | 163 ++++++++++++++++++++++++++++
kernel/time/ntp.c | 3 +
kernel/time/timekeeping.c | 3 +
10 files changed, 263 insertions(+), 1 deletions(-)
create mode 100644 Documentation/time-change-notify-example.c
create mode 100644 kernel/time/notify.c

diff --git a/Documentation/time-change-notify-example.c b/Documentation/time-change-notify-example.c
new file mode 100644
index 0000000..b90c605
--- /dev/null
+++ b/Documentation/time-change-notify-example.c
@@ -0,0 +1,65 @@
+/*
+ * Simple program to catch system time changes
+ *
+ * written by Alexander Shishkin <[email protected]>
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <sys/eventfd.h>
+#include <sys/syscall.h>
+#include <time.h>
+#include <errno.h>
+#include <unistd.h>
+#include <poll.h>
+
+#ifndef SYS_time_change_notify
+# include "asm/unistd.h"
+# ifdef __NR_time_change_notify
+# define SYS_time_change_notify __NR_time_change_notify
+# else
+# error Cannot figure out time_change_notify syscall number.
+# endif
+#endif
+
+static int time_change_notify(clockid_t clockid, int fd, unsigned int flags)
+{
+ return syscall(SYS_time_change_notify, clockid, fd, flags);
+}
+
+int main(int argc, char **argv)
+{
+ struct pollfd fds = { .events = POLLIN };
+
+ fds.fd = eventfd(0, 0);
+ if (fds.fd < 0) {
+ perror("eventfd");
+ return EXIT_FAILURE;
+ }
+
+ /* subscribe to all events from all sources */
+ if (time_change_notify(CLOCK_REALTIME, fds.fd, 0)) {
+ perror("time_change_notify");
+ return EXIT_FAILURE;
+ }
+
+ while (poll(&fds, 1, -1) > 0) {
+ eventfd_t data;
+ ssize_t r;
+
+ r = read(fds.fd, &data, sizeof data);
+ if (r == -1) {
+ if (errno == EINTR)
+ continue;
+
+ break;
+ }
+
+ printf("system time has changed %llu times\n", data);
+ }
+
+ puts("Done polling system time changes.\n");
+
+ return EXIT_SUCCESS;
+}
+
diff --git a/include/asm-generic/unistd.h b/include/asm-generic/unistd.h
index b969770..c8372db 100644
--- a/include/asm-generic/unistd.h
+++ b/include/asm-generic/unistd.h
@@ -646,9 +646,11 @@ __SYSCALL(__NR_prlimit64, sys_prlimit64)
__SYSCALL(__NR_fanotify_init, sys_fanotify_init)
#define __NR_fanotify_mark 263
__SYSCALL(__NR_fanotify_mark, sys_fanotify_mark)
+#define __NR_time_change_notify 264
+__SYSCALL(__NR_time_change_notify, sys_time_change_notify)

#undef __NR_syscalls
-#define __NR_syscalls 264
+#define __NR_syscalls 265

/*
* All syscalls below here should go away really,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index cacc27a..6b77576 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -820,6 +820,8 @@ asmlinkage long sys_fanotify_init(unsigned int flags, unsigned int event_f_flags
asmlinkage long sys_fanotify_mark(int fanotify_fd, unsigned int flags,
u64 mask, int fd,
const char __user *pathname);
+asmlinkage long sys_time_change_notify(clockid_t clockid, int fd,
+ unsigned int flags);

int kernel_execve(const char *filename, const char *const argv[], const char *const envp[]);

diff --git a/include/linux/time.h b/include/linux/time.h
index 9f15ac7..76b9710 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -252,6 +252,19 @@ static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
a->tv_nsec = ns;
}
+
+/* time change events types */
+enum {
+ TIME_EVENT_SET = 0,
+ TIME_EVENT_ADJ,
+};
+
+#ifdef CONFIG_TIME_NOTIFY
+void time_notify_all(clockid_t clockid, int type);
+#else
+static inline void time_notify_all(clockid_t clockid, int type) {}
+#endif
+
#endif /* __KERNEL__ */

#define NFDBITS __NFDBITS
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index c782fe9..48533f6 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -186,3 +186,6 @@ cond_syscall(sys_perf_event_open);
/* fanotify! */
cond_syscall(sys_fanotify_init);
cond_syscall(sys_fanotify_mark);
+
+/* time change notification */
+cond_syscall(sys_time_change_notify);
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index f06a8a3..71de686 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -27,3 +27,10 @@ config GENERIC_CLOCKEVENTS_BUILD
default y
depends on GENERIC_CLOCKEVENTS || GENERIC_CLOCKEVENTS_MIGR

+config TIME_NOTIFY
+ bool "System time changes notification for userspace"
+ depends on EVENTFD
+ help
+ Enable time change notification events to userspace via
+ eventfd.
+
diff --git a/kernel/time/Makefile b/kernel/time/Makefile
index ee26662..2db9d25 100644
--- a/kernel/time/Makefile
+++ b/kernel/time/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST) += tick-broadcast.o
obj-$(CONFIG_TICK_ONESHOT) += tick-oneshot.o
obj-$(CONFIG_TICK_ONESHOT) += tick-sched.o
obj-$(CONFIG_TIMER_STATS) += timer_stats.o
+obj-$(CONFIG_TIME_NOTIFY) += notify.o
diff --git a/kernel/time/notify.c b/kernel/time/notify.c
new file mode 100644
index 0000000..f6e9bd8
--- /dev/null
+++ b/kernel/time/notify.c
@@ -0,0 +1,163 @@
+/*
+ * linux/kernel/time/notify.c
+ *
+ * Copyright (C) 2010 Nokia Corporation
+ * Alexander Shishkin
+ *
+ * This file implements an interface to communicate time changes to userspace.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/syscalls.h>
+#include <linux/slab.h>
+#include <linux/eventfd.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+#include <linux/sched.h>
+#include <linux/poll.h>
+#include <linux/err.h>
+
+/*
+ * A process can "subscribe" to receive a notification via eventfd that
+ * some other process has called stime/settimeofday/adjtimex.
+ */
+struct time_event {
+ struct eventfd_ctx *eventfd;
+ clockid_t clockid;
+ unsigned int type;
+ struct work_struct remove;
+ wait_queue_t wq;
+ wait_queue_head_t *wqh;
+ poll_table pt;
+ struct list_head list;
+};
+
+static LIST_HEAD(event_list);
+static DEFINE_SPINLOCK(event_lock);
+
+/*
+ * Do the necessary cleanup when the eventfd is being closed
+ */
+static void time_event_remove(struct work_struct *work)
+{
+ struct time_event *evt = container_of(work, struct time_event, remove);
+ __u64 cnt;
+
+ eventfd_ctx_remove_wait_queue(evt->eventfd, &evt->wq, &cnt);
+ kfree(evt);
+}
+
+static int time_event_wakeup(wait_queue_t *wq, unsigned int mode, int sync,
+ void *key)
+{
+ struct time_event *evt = container_of(wq, struct time_event, wq);
+ unsigned long flags = (unsigned long)key;
+
+ if (flags & POLLHUP) {
+ spin_lock(&event_lock);
+ list_del(&evt->list);
+ spin_unlock(&event_lock);
+
+ schedule_work(&evt->remove);
+ }
+
+ return 0;
+}
+
+static void time_event_ptable_queue_proc(struct file *file,
+ wait_queue_head_t *wqh, poll_table *pt)
+{
+ struct time_event *evt = container_of(pt, struct time_event, pt);
+
+ evt->wqh = wqh;
+ add_wait_queue(wqh, &evt->wq);
+}
+
+/*
+ * time_change_notify() registers a given eventfd to receive time change
+ * notifications
+ */
+SYSCALL_DEFINE3(time_change_notify, clockid_t, clockid, int, fd, unsigned int,
+ type)
+{
+ struct time_event *evt;
+ struct file *file;
+ int ret;
+
+ /*
+ * for now only accept CLOCK_REALTIME clockid; in future CLOCK_RTC
+ * will make use of this interface as well; with dynamic clockids
+ * it may find even more users
+ */
+ if ((type != TIME_EVENT_SET && type != TIME_EVENT_ADJ) ||
+ clockid != CLOCK_REALTIME)
+ return -EINVAL;
+
+ evt = kmalloc(sizeof(*evt), GFP_KERNEL);
+ if (!evt)
+ return -ENOMEM;
+
+ evt->type = type;
+ evt->clockid = clockid;
+
+ file = eventfd_fget(fd);
+ if (IS_ERR(file)) {
+ ret = -EINVAL;
+ goto out_free;
+ }
+
+ evt->eventfd = eventfd_ctx_fileget(file);
+ if (IS_ERR(evt->eventfd)) {
+ ret = PTR_ERR(evt->eventfd);
+ goto out_fput;
+ }
+
+ INIT_LIST_HEAD(&evt->list);
+ INIT_WORK(&evt->remove, time_event_remove);
+
+ init_waitqueue_func_entry(&evt->wq, time_event_wakeup);
+ init_poll_funcptr(&evt->pt, time_event_ptable_queue_proc);
+
+ if (file->f_op->poll(file, &evt->pt) & POLLHUP) {
+ ret = 0;
+ goto out_ctxput;
+ }
+
+ spin_lock(&event_lock);
+ list_add(&evt->list, &event_list);
+ spin_unlock(&event_lock);
+
+ fput(file);
+
+ return 0;
+
+out_ctxput:
+ eventfd_ctx_put(evt->eventfd);
+
+out_fput:
+ fput(file);
+
+out_free:
+ kfree(evt);
+
+ return ret;
+}
+
+void time_notify_all(clockid_t clockid, int type)
+{
+ struct list_head *tmp;
+
+ spin_lock(&event_lock);
+ list_for_each(tmp, &event_list) {
+ struct time_event *e = container_of(tmp, struct time_event,
+ list);
+
+ if (e->type != type || e->clockid != clockid)
+ continue;
+
+ eventfd_signal(e->eventfd, 1);
+ }
+ spin_unlock(&event_lock);
+}
+
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index d232189..9022068 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -541,6 +541,9 @@ int do_adjtimex(struct timex *txc)

notify_cmos_timer();

+ if (txc->modes)
+ time_notify_all(CLOCK_REALTIME, TIME_EVENT_ADJ);
+
return result;
}

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 49010d8..f9517e2 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -335,6 +335,9 @@ int do_settimeofday(struct timespec *tv)
/* signal hrtimers about time change */
clock_was_set();

+ /* signal time_change_notify() listeners */
+ time_notify_all(CLOCK_REALTIME, TIME_EVENT_SET);
+
return 0;
}

--
1.7.2.1.45.gb66c2

2010-11-11 19:30:38

by Alexander Shishkin

[permalink] [raw]
Subject: [PATCHv6 5/7] wire up sys_time_change_notify() on s390

Signed-off-by: Alexander Shishkin <[email protected]>
Acked-by: Martin Schwidefsky <[email protected]>
CC: Heiko Carstens <[email protected]>
CC: [email protected]
CC: Andrew Morton <[email protected]>
CC: Jesper Nilsson <[email protected]>
CC: David Howells <[email protected]>
CC: Christian Borntraeger <[email protected]>
CC: "Eric W. Biederman" <[email protected]>
CC: Christoph Hellwig <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/s390/include/asm/unistd.h | 3 ++-
arch/s390/kernel/compat_wrapper.S | 7 +++++++
arch/s390/kernel/syscalls.S | 1 +
3 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/s390/include/asm/unistd.h b/arch/s390/include/asm/unistd.h
index 1049ef2..f815a2d 100644
--- a/arch/s390/include/asm/unistd.h
+++ b/arch/s390/include/asm/unistd.h
@@ -272,7 +272,8 @@
#define __NR_fanotify_init 332
#define __NR_fanotify_mark 333
#define __NR_prlimit64 334
-#define NR_syscalls 335
+#define __NR_time_change_notify 335
+#define NR_syscalls 336

/*
* There are some system calls that are not present on 64 bit, some
diff --git a/arch/s390/kernel/compat_wrapper.S b/arch/s390/kernel/compat_wrapper.S
index 8e60fb2..dd4b23e 100644
--- a/arch/s390/kernel/compat_wrapper.S
+++ b/arch/s390/kernel/compat_wrapper.S
@@ -1877,3 +1877,10 @@ sys_prlimit64_wrapper:
llgtr %r4,%r4 # const struct rlimit64 __user *
llgtr %r5,%r5 # struct rlimit64 __user *
jg sys_prlimit64 # branch to system call
+
+ .globl sys_time_change_notify_wrapper
+sys_time_change_notify_wrapper:
+ lgfr %r2,%r2 # clockid_t (int)
+ lgfr %r3,%r3 # int
+ llgfr %r4,%r4 # unsigned int
+ jg sys_time_change_notify # branch to system call
diff --git a/arch/s390/kernel/syscalls.S b/arch/s390/kernel/syscalls.S
index a8fee1b..50ca1a7 100644
--- a/arch/s390/kernel/syscalls.S
+++ b/arch/s390/kernel/syscalls.S
@@ -343,3 +343,4 @@ SYSCALL(sys_perf_event_open,sys_perf_event_open,sys_perf_event_open_wrapper)
SYSCALL(sys_fanotify_init,sys_fanotify_init,sys_fanotify_init_wrapper)
SYSCALL(sys_fanotify_mark,sys_fanotify_mark,sys_fanotify_mark_wrapper)
SYSCALL(sys_prlimit64,sys_prlimit64,sys_prlimit64_wrapper)
+SYSCALL(sys_time_change_notify,sys_time_change_notify,sys_time_change_notify_wrapper) /* 335 */
--
1.7.2.1.45.gb66c2

2010-11-11 19:30:33

by Alexander Shishkin

[permalink] [raw]
Subject: [PATCHv6 4/7] wire up sys_time_change_notify() on ia64

Signed-off-by: Alexander Shishkin <[email protected]>
CC: Tony Luck <[email protected]>
CC: Fenghua Yu <[email protected]>
CC: "David S. Miller" <[email protected]>
CC: Arnaldo Carvalho de Melo <[email protected]>
CC: David Howells <[email protected]>
CC: Alexander Shishkin <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/ia64/include/asm/unistd.h | 3 ++-
arch/ia64/kernel/entry.S | 1 +
2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/include/asm/unistd.h b/arch/ia64/include/asm/unistd.h
index 954d398..7445473 100644
--- a/arch/ia64/include/asm/unistd.h
+++ b/arch/ia64/include/asm/unistd.h
@@ -315,11 +315,12 @@
#define __NR_fanotify_init 1323
#define __NR_fanotify_mark 1324
#define __NR_prlimit64 1325
+#define __NR_time_change_notify 1326

#ifdef __KERNEL__


-#define NR_syscalls 302 /* length of syscall table */
+#define NR_syscalls 303 /* length of syscall table */

/*
* The following defines stop scripts/checksyscalls.sh from complaining about
diff --git a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S
index 244704a..48a5682 100644
--- a/arch/ia64/kernel/entry.S
+++ b/arch/ia64/kernel/entry.S
@@ -1771,6 +1771,7 @@ sys_call_table:
data8 sys_fanotify_init
data8 sys_fanotify_mark
data8 sys_prlimit64 // 1325
+ data8 sys_time_change_notify

.org sys_call_table + 8*NR_syscalls // guard against failures to increase NR_syscalls
#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */
--
1.7.2.1.45.gb66c2

2010-11-11 19:31:57

by Alexander Shishkin

[permalink] [raw]
Subject: [PATCHv6 3/7] wire up sys_time_change_notify() on x86

This patch adds a new syscall of the following form:

int sys_time_change_notify(int fd, unsigned int flags)

This syscall is used for registering an eventfd for system time
change notification.

Signed-off-by: Alexander Shishkin <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: [email protected]
CC: Christoph Hellwig <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Eric Paris <[email protected]>
CC: Russell King <[email protected]>
CC: David Howells <[email protected]>
CC: Jiri Slaby <[email protected]>
CC: "David S. Miller" <[email protected]>
CC: [email protected]
---
arch/x86/ia32/ia32entry.S | 1 +
arch/x86/include/asm/unistd_32.h | 3 ++-
arch/x86/include/asm/unistd_64.h | 2 ++
arch/x86/kernel/syscall_table_32.S | 1 +
4 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 518bb99..6981bdc 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -851,4 +851,5 @@ ia32_sys_call_table:
.quad sys_fanotify_init
.quad sys32_fanotify_mark
.quad sys_prlimit64 /* 340 */
+ .quad sys_time_change_notify
ia32_syscall_end:
diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index b766a5e..17ac4f5 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -346,10 +346,11 @@
#define __NR_fanotify_init 338
#define __NR_fanotify_mark 339
#define __NR_prlimit64 340
+#define __NR_time_change_notify 341

#ifdef __KERNEL__

-#define NR_syscalls 341
+#define NR_syscalls 342

#define __ARCH_WANT_IPC_PARSE_VERSION
#define __ARCH_WANT_OLD_READDIR
diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index 363e9b8..3d745f9 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -669,6 +669,8 @@ __SYSCALL(__NR_fanotify_init, sys_fanotify_init)
__SYSCALL(__NR_fanotify_mark, sys_fanotify_mark)
#define __NR_prlimit64 302
__SYSCALL(__NR_prlimit64, sys_prlimit64)
+#define __NR_time_change_notify 303
+__SYSCALL(__NR_time_change_notify, sys_time_change_notify)

#ifndef __NO_STUBS
#define __ARCH_WANT_OLD_READDIR
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index b35786d..b1ae9d4 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -340,3 +340,4 @@ ENTRY(sys_call_table)
.long sys_fanotify_init
.long sys_fanotify_mark
.long sys_prlimit64 /* 340 */
+ .long sys_time_change_notify
--
1.7.2.1.45.gb66c2

2010-11-11 19:30:32

by Alexander Shishkin

[permalink] [raw]
Subject: [PATCHv6 2/7] wire up sys_time_change_notify() on ARM

sys_time_change_notify() is a new syscall with number and types of
parameters such that no ARM-specific processing is needed.

Tested with 2.6.36-rc3 using Documentation/time-change-notify-example.c.

Signed-off-by: Alexander Shishkin <[email protected]>
CC: Russell King <[email protected]>
CC: Andrew Morton <[email protected]>
CC: David Howells <[email protected]>
CC: Christoph Hellwig <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/arm/include/asm/unistd.h | 1 +
arch/arm/kernel/calls.S | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
index c891eb7..45d5dc9 100644
--- a/arch/arm/include/asm/unistd.h
+++ b/arch/arm/include/asm/unistd.h
@@ -396,6 +396,7 @@
#define __NR_fanotify_init (__NR_SYSCALL_BASE+367)
#define __NR_fanotify_mark (__NR_SYSCALL_BASE+368)
#define __NR_prlimit64 (__NR_SYSCALL_BASE+369)
+#define __NR_time_change_notify (__NR_SYSCALL_BASE+370)

/*
* The following SWIs are ARM private.
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index 5c26ecc..633e71a 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -379,6 +379,7 @@
CALL(sys_fanotify_init)
CALL(sys_fanotify_mark)
CALL(sys_prlimit64)
+ CALL(sys_time_change_notify)
#ifndef syscalls_counted
.equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
#define syscalls_counted
--
1.7.2.1.45.gb66c2

2010-11-11 19:32:14

by Alexander Shishkin

[permalink] [raw]
Subject: [PATCHv6 7/7] wire up sys_time_change_notify() on blackfin

Signed-off-by: Alexander Shishkin <[email protected]>
CC: Mike Frysinger <[email protected]>
CC: Alexander Shishkin <[email protected]>
CC: Arjan van de Ven <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Robin Getz <[email protected]>
CC: Philippe Gerum <[email protected]>
CC: "David S. Miller" <[email protected]>
CC: Yi Li <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/blackfin/include/asm/unistd.h | 3 ++-
arch/blackfin/mach-common/entry.S | 1 +
2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/blackfin/include/asm/unistd.h b/arch/blackfin/include/asm/unistd.h
index 928ae97..e4a3f730 100644
--- a/arch/blackfin/include/asm/unistd.h
+++ b/arch/blackfin/include/asm/unistd.h
@@ -393,8 +393,9 @@
#define __NR_fanotify_mark 372
#define __NR_prlimit64 373
#define __NR_cacheflush 374
+#define __NR_time_change_notify 375

-#define __NR_syscall 375
+#define __NR_syscall 376
#define NR_syscalls __NR_syscall

/* Old optional stuff no one actually uses */
diff --git a/arch/blackfin/mach-common/entry.S b/arch/blackfin/mach-common/entry.S
index 2ca915e..8e85a9a 100644
--- a/arch/blackfin/mach-common/entry.S
+++ b/arch/blackfin/mach-common/entry.S
@@ -1738,6 +1738,7 @@ ENTRY(_sys_call_table)
.long _sys_fanotify_mark
.long _sys_prlimit64
.long _sys_cacheflush
+ .long _sys_time_change_notify /* 375 */

.rept NR_syscalls-(.-_sys_call_table)/4
.long _sys_ni_syscall
--
1.7.2.1.45.gb66c2

2010-11-11 19:32:32

by Alexander Shishkin

[permalink] [raw]
Subject: [PATCHv6 6/7] wire up sys_time_change_notify() on powerpc

Signed-off-by: Alexander Shishkin <[email protected]>
CC: Benjamin Herrenschmidt <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Andreas Schwab <[email protected]>
CC: Alexander Shishkin <[email protected]>
CC: Christoph Hellwig <[email protected]>
CC: Jesper Nilsson <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/powerpc/include/asm/systbl.h | 1 +
arch/powerpc/include/asm/unistd.h | 3 ++-
2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index aa0f1eb..a57cc82 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -348,3 +348,4 @@ COMPAT_SYS_SPU(sendmsg)
COMPAT_SYS_SPU(recvmsg)
COMPAT_SYS_SPU(recvmmsg)
SYSCALL_SPU(accept4)
+SYSCALL(time_change_notify)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index 6151937..bea2c9e 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -367,10 +367,11 @@
#define __NR_recvmsg 342
#define __NR_recvmmsg 343
#define __NR_accept4 344
+#define __NR_time_change_notify 345

#ifdef __KERNEL__

-#define __NR_syscalls 345
+#define __NR_syscalls 346

#define __NR__exit __NR_exit
#define NR_syscalls __NR_syscalls
--
1.7.2.1.45.gb66c2

2010-11-11 20:29:05

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Thu, 11 Nov 2010 21:29:55 +0200, Alexander Shishkin said:

> Consider we want stuff like "wakeup every day at 3pm", the next wakeup
> might be earlier than the timer we calculated last time, on system
> time changes. We need to re-calculate it. This is necessary for all
> repeating events.
>
> Say we want to wakeup at 3pm, now it's 4pm, so we schedule it in 23
> hours. Now the system time changes to 2pm, and we would expect to
> wakeup in one hour, but we take 25.

Sorry, I tuned in late here...

So the plan is that if you're not using this new interface, it will go off at
the same absolute offset (23 hours after timer was set), but if you're using
this interface, your timer event gets interrupted, you get woken up (say)
15 hours into your 23, and it's your job to decide if you need to set a
new timer for the remaining 6, 7, 8 hours or some other number?


Attachments:
(No filename) (227.00 B)

2010-11-11 20:51:26

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Thu, Nov 11, 2010 at 03:28:13PM -0500, [email protected] wrote:
> On Thu, 11 Nov 2010 21:29:55 +0200, Alexander Shishkin said:
>
> > Consider we want stuff like "wakeup every day at 3pm", the next wakeup
> > might be earlier than the timer we calculated last time, on system
> > time changes. We need to re-calculate it. This is necessary for all
> > repeating events.
> >
> > Say we want to wakeup at 3pm, now it's 4pm, so we schedule it in 23
> > hours. Now the system time changes to 2pm, and we would expect to
> > wakeup in one hour, but we take 25.
>
> Sorry, I tuned in late here...
>
> So the plan is that if you're not using this new interface, it will go off at
> the same absolute offset (23 hours after timer was set), but if you're using
> this interface, your timer event gets interrupted, you get woken up (say)
> 15 hours into your 23, and it's your job to decide if you need to set a
> new timer for the remaining 6, 7, 8 hours or some other number?

Yes. This interface doesn't deal with timers, it only provides notifications.

Regards,
--
Alex

2010-11-11 20:55:30

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCHv6 1/7] notify userspace about time changes

On Thursday 11 November 2010 20:29:56 Alexander Shishkin wrote:
> Certain userspace applications (like "clock" desktop applets or cron) might
> want to be notified when some other application changes the system time.
> There are several known to me reasons for this:
> - avoiding periodic wakeups to poll time changes;
> - changing system timekeeping policy for system-wide time management
> programs;
> - keeping guest applications/operating systems running in emulators
> up to date.
>
> This patch implements a notification interface via eventfd mechanism. Proccess
> wishing to be notified about time changes should create an eventfd and pass it
> to time_change_notify() syscall. After that, any calls to settimeofday()/
> stime()/adjtimex() made by other processes will be signalled to this eventfd.
> Credits for suggesting the eventfd mechanism for this purpose go to Kirill
> Shutemov.
>
> This patch adds the syscall to asm-generic/unistd.h and a simple usage
> example.

Looks reasonable to me.

It would be good to have the man page for this, too. I guess it could
be added to the existing clock_{get,set}time man page, so you can
add a patch for that.

Arnd

2010-11-11 21:16:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

B1;2401;0cOn Thu, 11 Nov 2010, Alexander Shishkin wrote:

> On Thu, Nov 11, 2010 at 03:28:13PM -0500, [email protected] wrote:
> > On Thu, 11 Nov 2010 21:29:55 +0200, Alexander Shishkin said:
> >
> > > Consider we want stuff like "wakeup every day at 3pm", the next wakeup
> > > might be earlier than the timer we calculated last time, on system
> > > time changes. We need to re-calculate it. This is necessary for all
> > > repeating events.
> > >
> > > Say we want to wakeup at 3pm, now it's 4pm, so we schedule it in 23
> > > hours. Now the system time changes to 2pm, and we would expect to
> > > wakeup in one hour, but we take 25.
> >
> > Sorry, I tuned in late here...
> >
> > So the plan is that if you're not using this new interface, it will go off at
> > the same absolute offset (23 hours after timer was set), but if you're using
> > this interface, your timer event gets interrupted, you get woken up (say)
> > 15 hours into your 23, and it's your job to decide if you need to set a
> > new timer for the remaining 6, 7, 8 hours or some other number?
>
> Yes. This interface doesn't deal with timers, it only provides notifications.

The notification itself is pointless unless your application is
dealing with timers which need to be adjusted the one way or the
other.

That said, I'm still not convinced that this usecase justifies a new
systemcall.

1) We can make timers wake up when a clock change happens
2) Can't we use existing notification stuff like uevents or such ?

Thanks,

tglx

2010-11-11 22:11:58

by Kyle Moffett

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Thu, Nov 11, 2010 at 16:16, Thomas Gleixner <[email protected]> wrote:
> The notification itself is pointless unless your application is
> dealing with timers which need to be adjusted the one way or the
> other.
>
> That said, I'm still not convinced that this usecase justifies a new
> systemcall.
>
> 1) We can make timers wake up when a clock change happens
> 2) Can't we use existing notification stuff like uevents or such ?

What about maybe adding device nodes for various kinds of "clock"
devices? You could then do:

#define CLOCK_FD 0x80000000
fd = open("/dev/clock/realtime", O_RDWR);
poll(fd);
clock_gettime(CLOCK_FD|fd, &ts);
[...]

This would also enable the folks who want to support things like PHY
hardware clocks (for very-low-latency ethernet timestamping). It
would resolve the enumeration problem; instead of 0, 1, 2, ... as
constants, they would show up in sysfs and be open()able. Ideally you
would be able to set up ntpd to slew the "realtime" clock by following
a particular hardware clock, or vice versa.

Cheers,
Kyle Moffett

2010-11-11 22:36:54

by john stultz

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Thu, 2010-11-11 at 17:11 -0500, Kyle Moffett wrote:
> On Thu, Nov 11, 2010 at 16:16, Thomas Gleixner <[email protected]> wrote:
> > The notification itself is pointless unless your application is
> > dealing with timers which need to be adjusted the one way or the
> > other.
> >
> > That said, I'm still not convinced that this usecase justifies a new
> > systemcall.
> >
> > 1) We can make timers wake up when a clock change happens

I think this seems like the most interesting solution.

However we may need some sort of special flag, as I don't think many
timers are expecting to fire before the specified time, so you'd likely
break regular applications if all timers woke up on clock changes.


> > 2) Can't we use existing notification stuff like uevents or such ?
>
> What about maybe adding device nodes for various kinds of "clock"
> devices? You could then do:
>
> #define CLOCK_FD 0x80000000
> fd = open("/dev/clock/realtime", O_RDWR);
> poll(fd);
> clock_gettime(CLOCK_FD|fd, &ts);

Ehh.. I'm not a huge fan of creating dynamic ids for what are static
clocksources (REALTIME, MONOTONIC, etc).

That said...

> [...]
>
> This would also enable the folks who want to support things like PHY
> hardware clocks (for very-low-latency ethernet timestamping). It
> would resolve the enumeration problem; instead of 0, 1, 2, ... as
> constants, they would show up in sysfs and be open()able. Ideally you
> would be able to set up ntpd to slew the "realtime" clock by following
> a particular hardware clock, or vice versa.

This is very similar in spirit to what's being done by Richard Cochran's
dynamic clock devices code: http://lwn.net/Articles/413332/

thanks
-john

2010-11-11 22:51:07

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

B1;2401;0cOn Thu, 11 Nov 2010, Kyle Moffett wrote:
> On Thu, Nov 11, 2010 at 16:16, Thomas Gleixner <[email protected]> wrote:
> > The notification itself is pointless unless your application is
> > dealing with timers which need to be adjusted the one way or the
> > other.
> >
> > That said, I'm still not convinced that this usecase justifies a new
> > systemcall.
> >
> > 1) We can make timers wake up when a clock change happens
> > 2) Can't we use existing notification stuff like uevents or such ?
>
> What about maybe adding device nodes for various kinds of "clock"
> devices? You could then do:
>
> #define CLOCK_FD 0x80000000
> fd = open("/dev/clock/realtime", O_RDWR);
> poll(fd);
> clock_gettime(CLOCK_FD|fd, &ts);

That won't work due to the posix-cputimers occupying the negative
number space already.

> [...]
>
> This would also enable the folks who want to support things like PHY
> hardware clocks (for very-low-latency ethernet timestamping). It
> would resolve the enumeration problem; instead of 0, 1, 2, ... as
> constants, they would show up in sysfs and be open()able. Ideally you
> would be able to set up ntpd to slew the "realtime" clock by following
> a particular hardware clock, or vice versa.

There are other plans for the various interesting clocks floating
around which look pretty good.

But what you folks really want for this stuff is an extension to
timerfd as you want to be able to poll, right?

So what about the following:

Add a new flag TDF_NOTIFY_CLOCK_WAS_SET to the timerfd flags. Now this
flag adds the timer to a separate list, which gets woken up when the
clock is set.

No new syscall, just a few lines of code in fs/timerfd.c and
clock_was_set().

Thoughts ?

tglx

2010-11-11 23:19:35

by Kyle Moffett

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Thu, Nov 11, 2010 at 17:50, Thomas Gleixner <[email protected]> wrote:
> On Thu, 11 Nov 2010, Kyle Moffett wrote:
>> What about maybe adding device nodes for various kinds of "clock"
>> devices? You could then do:
>>
>> #define CLOCK_FD 0x80000000
>> fd = open("/dev/clock/realtime", O_RDWR);
>> poll(fd);
>> clock_gettime(CLOCK_FD|fd, &ts);
>
> That won't work due to the posix-cputimers occupying the negative
> number space already.

Hmm, looks like the manpages clock_gettime(2) et. al. need updating,
they don't mention anything at all about negative clockids. The same
thing could still be done with, EG:

#define CLOCK_FD 0x40000000


On Thu, Nov 11, 2010 at 17:36, john stultz <[email protected]> wrote:
> On Thu, 2010-11-11 at 17:11 -0500, Kyle Moffett wrote:
>> On Thu, Nov 11, 2010 at 16:16, Thomas Gleixner <[email protected]> wrote:
>> > 2) Can't we use existing notification stuff like uevents or such ?
>>
>> What about maybe adding device nodes for various kinds of "clock"
>> devices?  You could then do:
>>
>> #define CLOCK_FD 0x80000000
>> fd = open("/dev/clock/realtime", O_RDWR);
>> poll(fd);
>> clock_gettime(CLOCK_FD|fd, &ts);
>
> Ehh.. I'm not a huge fan of creating dynamic ids for what are static
> clocksources (REALTIME, MONOTONIC, etc).
>
> That said...
>
>> [...]
>>
>> This would also enable the folks who want to support things like PHY
>> hardware clocks (for very-low-latency ethernet timestamping).  It
>> would resolve the enumeration problem; instead of 0, 1, 2, ... as
>> constants, they would show up in sysfs and be open()able.  Ideally you
>> would be able to set up ntpd to slew the "realtime" clock by following
>> a particular hardware clock, or vice versa.
>
> This is very similar in spirit to what's being done by Richard Cochran's
> dynamic clock devices code: http://lwn.net/Articles/413332/

Hmm, I've just been poking around and thinking about an extension of
this concept. Right now we have:

/sys/devices/system/clocksource
/sys/devices/system/clocksource/clocksource0
/sys/devices/system/clocksource/clocksource0/current_clocksource
/sys/devices/system/clocksource/clocksource0/available_clocksource

Could we actually register the separate clocksources (hpet, acpi_pm,
etc) in the device model properly?

Then consider the possibility of creating "virtual clocksources" which
are measured against an existing clocksource. They could be
independently slewed and adjusted relative to the parent clocksource.
Then the "UTS namespace" feature could also affect the current
clocksource used for CLOCK_MONOTONIC, etc.

You could perform various forms of time-sensitive software testing
without causing problems for a "make" process running elsewhere on the
system. You could test the operation of various kinds of software
across large jumps or long periods of time (at a highly accelerated
rate) without impacting your development environment.

One really nice example would be testing "ntpd" itself; you could run
a known-good "ntpd" in the base system to maintain a very stable
clock, then simulate all kinds of terrifyingly bad clock hardware and
kernel problems (sudden frequency changes, etc) in a container. This
kind of stuff can currently only be easily simulated with specialized
hardware.

You could also improve "container-based" virtualization, allowing
perceived "CPU-time" to be slewed based on the cgroup. IE: Processes
inside of a container allocated only "33%" of one CPU might see their
"CPU-time" accrue 3 times faster than a process outside of the
container, as though the process was the only thing running on the
system. Running "top" inside of the container might show 100% CPU
even though the hardware is at 33% utilization, or 200% CPU if the
container is currently bursting much higher.

Cheers,
Kyle Moffett

2010-11-11 23:41:36

by john stultz

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Thu, 2010-11-11 at 18:19 -0500, Kyle Moffett wrote:
> On Thu, Nov 11, 2010 at 17:50, Thomas Gleixner <[email protected]> wrote:
> > On Thu, 11 Nov 2010, Kyle Moffett wrote:
> >> What about maybe adding device nodes for various kinds of "clock"
> >> devices? You could then do:
> >>
> >> #define CLOCK_FD 0x80000000
> >> fd = open("/dev/clock/realtime", O_RDWR);
> >> poll(fd);
> >> clock_gettime(CLOCK_FD|fd, &ts);
> >
> > That won't work due to the posix-cputimers occupying the negative
> > number space already.
>
> Hmm, looks like the manpages clock_gettime(2) et. al. need updating,
> they don't mention anything at all about negative clockids. The same
> thing could still be done with, EG:
>
> #define CLOCK_FD 0x40000000

Again, see Richard's patch and the discussion around it for various
complications here (which cause pid_t size limits and run into
limitations with max number of fds per process).

> > This is very similar in spirit to what's being done by Richard Cochran's
> > dynamic clock devices code: http://lwn.net/Articles/413332/
>
> Hmm, I've just been poking around and thinking about an extension of
> this concept. Right now we have:
>
> /sys/devices/system/clocksource
> /sys/devices/system/clocksource/clocksource0
> /sys/devices/system/clocksource/clocksource0/current_clocksource
> /sys/devices/system/clocksource/clocksource0/available_clocksource
>
> Could we actually register the separate clocksources (hpet, acpi_pm,
> etc) in the device model properly?
>
> Then consider the possibility of creating "virtual clocksources" which
> are measured against an existing clocksource. They could be
> independently slewed and adjusted relative to the parent clocksource.
> Then the "UTS namespace" feature could also affect the current
> clocksource used for CLOCK_MONOTONIC, etc.
>
> You could perform various forms of time-sensitive software testing
> without causing problems for a "make" process running elsewhere on the
> system. You could test the operation of various kinds of software
> across large jumps or long periods of time (at a highly accelerated
> rate) without impacting your development environment.

This can already be done by registering a bogus clocksource that returns
a counter value <<'ed up.

That said, the entire system will then see time run faster, and since
timer irqs are triggered off of other devices and other devices notion
of time would not be accelerated, the irqs would seem late. At extreme
values, this would cause system issues, like instant device timeouts.
Further, it wouldn't accelerate the cpu execution time, so applications
would seem to run very slowly.

At one time I looked at doing this in the other direction (slowing down
system time to emulate what a faster cpu would be like), but there's
tons of issues around the fact that there are numerous time domains in a
system that are all very close to actual time, so lots of assumptions
are made as if there is really only one time domain. So by speeding up
the system time, you break the assumption between devices and things
don't function properly.

Again, you might be able to get away with very minor freq adjustments,
but that can easily be done by registering a clocksource with an
incorrect freq value.

> One really nice example would be testing "ntpd" itself; you could run
> a known-good "ntpd" in the base system to maintain a very stable
> clock, then simulate all kinds of terrifyingly bad clock hardware and
> kernel problems (sudden frequency changes, etc) in a container. This
> kind of stuff can currently only be easily simulated with specialized
> hardware.

Eh, this stuff is emulated in software frequently.

Also, doing what you propose could be easily done via virtualization or
a hardware emulator where you really can manage all the different time
domains properly.


> You could also improve "container-based" virtualization, allowing
> perceived "CPU-time" to be slewed based on the cgroup. IE: Processes
> inside of a container allocated only "33%" of one CPU might see their
> "CPU-time" accrue 3 times faster than a process outside of the
> container, as though the process was the only thing running on the
> system. Running "top" inside of the container might show 100% CPU
> even though the hardware is at 33% utilization, or 200% CPU if the
> container is currently bursting much higher.

I just don't see the real benefit to greatly complicating the
timekeeping code to keep track of multiple fake time domains when these
things can be achieved in other ways (emulation, or virtualization with
freq adjusted clocksources).

The only cases I see where exposing alternative time domains to the
system time is a good thing is where you actually need to precisely
interact with a device that is adjusted or runs on a different time
crystal (as is the case with the PTP clock Richard is working on, or the
clocks on audio hardware).

thanks
-john

2010-11-11 23:45:51

by john stultz

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Thu, 2010-11-11 at 18:19 -0500, Kyle Moffett wrote:
> Then consider the possibility of creating "virtual clocksources" which
> are measured against an existing clocksource. They could be
> independently slewed and adjusted relative to the parent clocksource.
> Then the "UTS namespace" feature could also affect the current
> clocksource used for CLOCK_MONOTONIC, etc.
>
> You could perform various forms of time-sensitive software testing
> without causing problems for a "make" process running elsewhere on the
> system. You could test the operation of various kinds of software
> across large jumps or long periods of time (at a highly accelerated
> rate) without impacting your development environment.

Oh, and I forgot, if you want to actually do something like this, the
best way is to create a LD_PRELOAD library that intercepts gettimeofday,
clock_gettime, and all the other syscalls that utilize time values and
adjust them as desired.

This way you only affect the specific application and not the rest of
the system, and avoid all the nasty hardware time domain assumptions
that the kernel makes when working with hardware.

thanks
-john

2010-11-12 02:37:38

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Thu, 11 Nov 2010, Thomas Gleixner wrote:

> > [...]
> >
> > This would also enable the folks who want to support things like PHY
> > hardware clocks (for very-low-latency ethernet timestamping). It
> > would resolve the enumeration problem; instead of 0, 1, 2, ... as
> > constants, they would show up in sysfs and be open()able. Ideally you
> > would be able to set up ntpd to slew the "realtime" clock by following
> > a particular hardware clock, or vice versa.
>
> There are other plans for the various interesting clocks floating
> around which look pretty good.
>
> But what you folks really want for this stuff is an extension to
> timerfd as you want to be able to poll, right?
>
> So what about the following:
>
> Add a new flag TDF_NOTIFY_CLOCK_WAS_SET to the timerfd flags. Now this
> flag adds the timer to a separate list, which gets woken up when the
> clock is set.

What chu talkin bout Willis? :)
This borders the interface multiplexing, if I got your idea correctly.
Timerfd is about expiring private timers, not about getting out
notifications about system-wide time changes.
I understand your concerns about that code, but please do not use timerfd
as a sacrificial mule to fit something which IMO does not belong there.



- Davide

2010-11-12 09:27:26

by Alan

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

> #define CLOCK_FD 0x80000000
> fd = open("/dev/clock/realtime", O_RDWR);
> poll(fd);
> clock_gettime(CLOCK_FD|fd, &ts);
> [...]

Oh please. Can we not manage to vaguely follow the direction of other
syscalls and instead of magic flag hacks just add

fclock_gettime

2010-11-12 10:48:17

by Kay Sievers

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Thu, Nov 11, 2010 at 22:16, Thomas Gleixner <[email protected]> wrote:
> B1;2401;0cOn Thu, 11 Nov 2010, Alexander Shishkin wrote:
>
>> On Thu, Nov 11, 2010 at 03:28:13PM -0500, [email protected] wrote:
>> > On Thu, 11 Nov 2010 21:29:55 +0200, Alexander Shishkin said:
>> >
>> > > Consider we want stuff like "wakeup every day at 3pm", the next wakeup
>> > > might be earlier than the timer we calculated last time, on system
>> > > time changes. We need to re-calculate it. This is necessary for all
>> > > repeating events.
>> > >
>> > > Say we want to wakeup at 3pm, now it's 4pm, so we schedule it in 23
>> > > hours. Now the system time changes to 2pm, and we would expect to
>> > > wakeup in one hour, but we take 25.
>> >
>> > Sorry, I tuned in late here...
>> >
>> > So the plan is that if you're not using this new interface, it will go off at
>> > the same absolute offset (23 hours after timer was set), but if you're using
>> > this interface, your timer event gets interrupted, you get woken up (say)
>> > 15 hours into your 23, and it's your job to decide if you need to set a
>> > new timer for the remaining 6, 7, 8 hours or some other number?
>>
>> Yes. This interface doesn't deal with timers, it only provides notifications.
>
> The notification itself is pointless unless your application is
> dealing with timers which need to be adjusted the one way or the
> other.
>
> That said, I'm still not convinced that this usecase justifies a new
> systemcall.
>
> 1) We can make timers wake up when a clock change happens

That would be fine too. We just need to wake up somehow and then can
find out ourselves what has happened underneath us.

> 2) Can't we use existing notification stuff like uevents or such ?

Uevents are heavy and very expensive to handle in userspace. Udev
wakes up all of these and runs stuff. They can only be used for device
discovery or some low-frequency state change notification, nothing
else. If there is the slightest chance that they might happen at a
very high frequency, they might bring an entire box down. It's really
not what they are made for.

Can't we just let poll() on timerfd return POLLPRI|POLLERR if the time
shifted to something that needs re-calculation in the requesting
process?

Kay

2010-11-12 10:54:08

by Richard Cochran

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Fri, Nov 12, 2010 at 09:25:04AM +0000, Alan Cox wrote:
> > #define CLOCK_FD 0x80000000
> > fd = open("/dev/clock/realtime", O_RDWR);
> > poll(fd);
> > clock_gettime(CLOCK_FD|fd, &ts);
> > [...]
>
> Oh please. Can we not manage to vaguely follow the direction of other
> syscalls and instead of magic flag hacks just add
>
> fclock_gettime

Did you see my recent post on dynamic clock ids, from Nov 4th?
It implements the clock lifetime cycle, and I would like to here your
comments on that.

I did not implement the clock_* syscalls in new forms as fclock_* but
will do so if there is agreement about it.

IMHO, from the application point of view, it would be nicer to be able
to mix and match dynamic clock ids with the CLOCK_ ids using a
clockid_t and the existing posix clock_* calls.

Thanks,
Richard

2010-11-12 11:27:14

by Alan

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

> Did you see my recent post on dynamic clock ids, from Nov 4th?
> It implements the clock lifetime cycle, and I would like to here your
> comments on that.

Its hard to see the intended API from the patches, but I remain strongly
convinced that you must be able to open/close and use normal file
operations on a clock.

Having the id space nibble away at the future cpu clock range and pid
size is always a bit iffy.

The rest of the stuff - kref counting and the like look nice and it's
definitely going the right way about that.

Alan

2010-11-12 12:40:08

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Thu, Nov 11, 2010 at 10:16:03PM +0100, Thomas Gleixner wrote:
> B1;2401;0cOn Thu, 11 Nov 2010, Alexander Shishkin wrote:
>
> > On Thu, Nov 11, 2010 at 03:28:13PM -0500, [email protected] wrote:
> > > On Thu, 11 Nov 2010 21:29:55 +0200, Alexander Shishkin said:
> > >
> > > > Consider we want stuff like "wakeup every day at 3pm", the next wakeup
> > > > might be earlier than the timer we calculated last time, on system
> > > > time changes. We need to re-calculate it. This is necessary for all
> > > > repeating events.
> > > >
> > > > Say we want to wakeup at 3pm, now it's 4pm, so we schedule it in 23
> > > > hours. Now the system time changes to 2pm, and we would expect to
> > > > wakeup in one hour, but we take 25.
> > >
> > > Sorry, I tuned in late here...
> > >
> > > So the plan is that if you're not using this new interface, it will go off at
> > > the same absolute offset (23 hours after timer was set), but if you're using
> > > this interface, your timer event gets interrupted, you get woken up (say)
> > > 15 hours into your 23, and it's your job to decide if you need to set a
> > > new timer for the remaining 6, 7, 8 hours or some other number?
> >
> > Yes. This interface doesn't deal with timers, it only provides notifications.
>
> The notification itself is pointless unless your application is
> dealing with timers which need to be adjusted the one way or the
> other.

Not quite. There are two use cases that I know of that need this notification
but don't use timers like that. One was described by Chris Friesen in the
comments to the first version of this patchset [1]. The other one was
described by me in the same thread [2]. They could, of course start using
timers specifically to get this notification, but I'm not sure if it doesn't
abuse the whole idea of timers.

> That said, I'm still not convinced that this usecase justifies a new
> systemcall.

Well, initially it was a sysfs interface (an arguably ugly one, though).

> 1) We can make timers wake up when a clock change happens

Isn't there a race so that if two clock changes happen in quick succession,
the user might lose the second one while he restarts the timer?

> 2) Can't we use existing notification stuff like uevents or such ?

I thought about that in the beginning but uevents indeed seem too heavy
for this sort of notification, especially compared to eventfds.

[1] http://lkml.org/lkml/2010/8/4/383
[2] http://lkml.org/lkml/2010/8/5/192

Regards,
--
Alex

2010-11-17 19:07:05

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Thu, Nov 11, 2010 at 11:50:38PM +0100, Thomas Gleixner wrote:
> B1;2401;0cOn Thu, 11 Nov 2010, Kyle Moffett wrote:
> > On Thu, Nov 11, 2010 at 16:16, Thomas Gleixner <[email protected]> wrote:
> > > The notification itself is pointless unless your application is
> > > dealing with timers which need to be adjusted the one way or the
> > > other.
> > >
> > > That said, I'm still not convinced that this usecase justifies a new
> > > systemcall.
> > >
> > > 1) We can make timers wake up when a clock change happens
> > > 2) Can't we use existing notification stuff like uevents or such ?
> >
> > What about maybe adding device nodes for various kinds of "clock"
> > devices? You could then do:
> >
> > #define CLOCK_FD 0x80000000
> > fd = open("/dev/clock/realtime", O_RDWR);
> > poll(fd);
> > clock_gettime(CLOCK_FD|fd, &ts);
>
> That won't work due to the posix-cputimers occupying the negative
> number space already.
>
> > [...]
> >
> > This would also enable the folks who want to support things like PHY
> > hardware clocks (for very-low-latency ethernet timestamping). It
> > would resolve the enumeration problem; instead of 0, 1, 2, ... as
> > constants, they would show up in sysfs and be open()able. Ideally you
> > would be able to set up ntpd to slew the "realtime" clock by following
> > a particular hardware clock, or vice versa.
>
> There are other plans for the various interesting clocks floating
> around which look pretty good.
>
> But what you folks really want for this stuff is an extension to
> timerfd as you want to be able to poll, right?
>
> So what about the following:
>
> Add a new flag TDF_NOTIFY_CLOCK_WAS_SET to the timerfd flags. Now this
> flag adds the timer to a separate list, which gets woken up when the
> clock is set.
>
> No new syscall, just a few lines of code in fs/timerfd.c and
> clock_was_set().
>
> Thoughts ?

Something like this (sans ugliness)?

Signed-off-by: Alexander Shishkin <[email protected]>
---
fs/timerfd.c | 52 +++++++++++++++++++++++++++++++++++++++++++++-
include/linux/timerfd.h | 3 +-
kernel/hrtimer.c | 2 +
3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 8c4fc14..51a033e 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -30,8 +30,14 @@ struct timerfd_ctx {
u64 ticks;
int expired;
int clockid;
+ int clock_notifier; /* list_empty()? */
+ struct list_head notifiers_list;
};

+/* TFD_NOTIFY_CLOCK_SET timers go here */
+static DEFINE_SPINLOCK(notifiers_lock);
+static LIST_HEAD(notifiers_list);
+
/*
* This gets called when the timer event triggers. We set the "expired"
* flag, but we do not re-arm the timer (in case it's necessary,
@@ -51,6 +57,21 @@ static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
return HRTIMER_NORESTART;
}

+void timerfd_clock_was_set(void)
+{
+ struct timerfd_ctx *ctx;
+ unsigned long flags;
+
+ spin_lock(&notifiers_lock);
+ list_for_each_entry(ctx, &notifiers_list, notifiers_list) {
+ spin_lock_irqsave(&ctx->wqh.lock, flags);
+ ctx->ticks++;
+ wake_up_locked(&ctx->wqh);
+ spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+ }
+ spin_unlock(&notifiers_lock);
+}
+
static ktime_t timerfd_get_remaining(struct timerfd_ctx *ctx)
{
ktime_t remaining;
@@ -65,6 +86,14 @@ static void timerfd_setup(struct timerfd_ctx *ctx, int flags,
enum hrtimer_mode htmode;
ktime_t texp;

+ if (flags & TFD_NOTIFY_CLOCK_SET) {
+ ctx->clock_notifier = 1;
+ ctx->ticks = 0;
+ INIT_LIST_HEAD(&ctx->notifiers_list);
+ list_add(&ctx->notifiers_list, &notifiers_list);
+ return;
+ }
+
htmode = (flags & TFD_TIMER_ABSTIME) ?
HRTIMER_MODE_ABS: HRTIMER_MODE_REL;

@@ -83,7 +112,12 @@ static int timerfd_release(struct inode *inode, struct file *file)
{
struct timerfd_ctx *ctx = file->private_data;

- hrtimer_cancel(&ctx->tmr);
+ if (ctx->clock_notifier) {
+ spin_lock(&notifiers_lock);
+ list_del(&ctx->notifiers_list);
+ spin_unlock(&notifiers_lock);
+ } else
+ hrtimer_cancel(&ctx->tmr);
kfree(ctx);
return 0;
}
@@ -113,6 +147,7 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,

if (count < sizeof(ticks))
return -EINVAL;
+
spin_lock_irq(&ctx->wqh.lock);
if (file->f_flags & O_NONBLOCK)
res = -EAGAIN;
@@ -120,7 +155,7 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
res = wait_event_interruptible_locked_irq(ctx->wqh, ctx->ticks);
if (ctx->ticks) {
ticks = ctx->ticks;
- if (ctx->expired && ctx->tintv.tv64) {
+ if (ctx->expired && ctx->tintv.tv64 && !ctx->clock_notifier) {
/*
* If tintv.tv64 != 0, this is a periodic timer that
* needs to be re-armed. We avoid doing it in the timer
@@ -218,13 +253,20 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
* it to the new values.
*/
for (;;) {
+ spin_lock(&notifiers_lock);
spin_lock_irq(&ctx->wqh.lock);
if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
break;
spin_unlock_irq(&ctx->wqh.lock);
+ spin_unlock(&notifiers_lock);
cpu_relax();
}

+ if (ctx->clock_notifier) {
+ ctx->clock_notifier = 0;
+ list_del(&ctx->notifiers_list);
+ }
+
/*
* If the timer is expired and it's periodic, we need to advance it
* because the caller may want to know the previous expiration time.
@@ -243,6 +285,7 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
timerfd_setup(ctx, flags, &ktmr);

spin_unlock_irq(&ctx->wqh.lock);
+ spin_unlock(&notifiers_lock);
fput(file);
if (otmr && copy_to_user(otmr, &kotmr, sizeof(kotmr)))
return -EFAULT;
@@ -262,6 +305,11 @@ SYSCALL_DEFINE2(timerfd_gettime, int, ufd, struct itimerspec __user *, otmr)
ctx = file->private_data;

spin_lock_irq(&ctx->wqh.lock);
+ if (ctx->clock_notifier) {
+ spin_unlock_irq(&ctx->wqh.lock);
+ return -EINVAL;
+ }
+
if (ctx->expired && ctx->tintv.tv64) {
ctx->expired = 0;
ctx->ticks +=
diff --git a/include/linux/timerfd.h b/include/linux/timerfd.h
index 2d07929..c3ddad9 100644
--- a/include/linux/timerfd.h
+++ b/include/linux/timerfd.h
@@ -19,6 +19,7 @@
* shared O_* flags.
*/
#define TFD_TIMER_ABSTIME (1 << 0)
+#define TFD_NOTIFY_CLOCK_SET (1 << 1)
#define TFD_CLOEXEC O_CLOEXEC
#define TFD_NONBLOCK O_NONBLOCK

@@ -26,6 +27,6 @@
/* Flags for timerfd_create. */
#define TFD_CREATE_FLAGS TFD_SHARED_FCNTL_FLAGS
/* Flags for timerfd_settime. */
-#define TFD_SETTIME_FLAGS TFD_TIMER_ABSTIME
+#define TFD_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_NOTIFY_CLOCK_SET)

#endif /* _LINUX_TIMERFD_H */
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 72206cf..a779e0f 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -642,8 +642,10 @@ static void retrigger_next_event(void *arg)
* resolution timer interrupts. On UP we just disable interrupts and
* call the high resolution interrupt code.
*/
+void timerfd_clock_was_set(clockid_t clockid);
void clock_was_set(void)
{
+ timerfd_clock_was_set(CLOCK_REALTIME);
/* Retrigger the CPU local events everywhere */
on_each_cpu(retrigger_next_event, NULL, 1);
}
--
1.7.2.1.45.gb66c2

2010-11-17 20:45:09

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Wed, 17 Nov 2010, Alexander Shishkin wrote:

> > But what you folks really want for this stuff is an extension to
> > timerfd as you want to be able to poll, right?
> >
> > So what about the following:
> >
> > Add a new flag TDF_NOTIFY_CLOCK_WAS_SET to the timerfd flags. Now this
> > flag adds the timer to a separate list, which gets woken up when the
> > clock is set.
> >
> > No new syscall, just a few lines of code in fs/timerfd.c and
> > clock_was_set().
> >
> > Thoughts ?
>
> Something like this (sans ugliness)?

Oh, gosh, please. This is interface-multiplexing-a-palooza.
It should be decided if the feature makes sense, and then have proper
interface, instead of multiplexing unrelated insterfaces.
This is a sort of system-event-report pattern. What is wrong with using a
netlink-based transport for those kind of things?





> Signed-off-by: Alexander Shishkin <[email protected]>
> ---
> fs/timerfd.c | 52 +++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/timerfd.h | 3 +-
> kernel/hrtimer.c | 2 +
> 3 files changed, 54 insertions(+), 3 deletions(-)
>
> diff --git a/fs/timerfd.c b/fs/timerfd.c
> index 8c4fc14..51a033e 100644
> --- a/fs/timerfd.c
> +++ b/fs/timerfd.c
> @@ -30,8 +30,14 @@ struct timerfd_ctx {
> u64 ticks;
> int expired;
> int clockid;
> + int clock_notifier; /* list_empty()? */
> + struct list_head notifiers_list;
> };
>
> +/* TFD_NOTIFY_CLOCK_SET timers go here */
> +static DEFINE_SPINLOCK(notifiers_lock);
> +static LIST_HEAD(notifiers_list);
> +
> /*
> * This gets called when the timer event triggers. We set the "expired"
> * flag, but we do not re-arm the timer (in case it's necessary,
> @@ -51,6 +57,21 @@ static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
> return HRTIMER_NORESTART;
> }
>
> +void timerfd_clock_was_set(void)
> +{
> + struct timerfd_ctx *ctx;
> + unsigned long flags;
> +
> + spin_lock(&notifiers_lock);
> + list_for_each_entry(ctx, &notifiers_list, notifiers_list) {
> + spin_lock_irqsave(&ctx->wqh.lock, flags);
> + ctx->ticks++;
> + wake_up_locked(&ctx->wqh);
> + spin_unlock_irqrestore(&ctx->wqh.lock, flags);
> + }
> + spin_unlock(&notifiers_lock);
> +}
> +
> static ktime_t timerfd_get_remaining(struct timerfd_ctx *ctx)
> {
> ktime_t remaining;
> @@ -65,6 +86,14 @@ static void timerfd_setup(struct timerfd_ctx *ctx, int flags,
> enum hrtimer_mode htmode;
> ktime_t texp;
>
> + if (flags & TFD_NOTIFY_CLOCK_SET) {
> + ctx->clock_notifier = 1;
> + ctx->ticks = 0;
> + INIT_LIST_HEAD(&ctx->notifiers_list);
> + list_add(&ctx->notifiers_list, &notifiers_list);
> + return;
> + }
> +
> htmode = (flags & TFD_TIMER_ABSTIME) ?
> HRTIMER_MODE_ABS: HRTIMER_MODE_REL;
>
> @@ -83,7 +112,12 @@ static int timerfd_release(struct inode *inode, struct file *file)
> {
> struct timerfd_ctx *ctx = file->private_data;
>
> - hrtimer_cancel(&ctx->tmr);
> + if (ctx->clock_notifier) {
> + spin_lock(&notifiers_lock);
> + list_del(&ctx->notifiers_list);
> + spin_unlock(&notifiers_lock);
> + } else
> + hrtimer_cancel(&ctx->tmr);
> kfree(ctx);
> return 0;
> }
> @@ -113,6 +147,7 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
>
> if (count < sizeof(ticks))
> return -EINVAL;
> +
> spin_lock_irq(&ctx->wqh.lock);
> if (file->f_flags & O_NONBLOCK)
> res = -EAGAIN;
> @@ -120,7 +155,7 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
> res = wait_event_interruptible_locked_irq(ctx->wqh, ctx->ticks);
> if (ctx->ticks) {
> ticks = ctx->ticks;
> - if (ctx->expired && ctx->tintv.tv64) {
> + if (ctx->expired && ctx->tintv.tv64 && !ctx->clock_notifier) {
> /*
> * If tintv.tv64 != 0, this is a periodic timer that
> * needs to be re-armed. We avoid doing it in the timer
> @@ -218,13 +253,20 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
> * it to the new values.
> */
> for (;;) {
> + spin_lock(&notifiers_lock);
> spin_lock_irq(&ctx->wqh.lock);
> if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> break;
> spin_unlock_irq(&ctx->wqh.lock);
> + spin_unlock(&notifiers_lock);
> cpu_relax();
> }
>
> + if (ctx->clock_notifier) {
> + ctx->clock_notifier = 0;
> + list_del(&ctx->notifiers_list);
> + }
> +
> /*
> * If the timer is expired and it's periodic, we need to advance it
> * because the caller may want to know the previous expiration time.
> @@ -243,6 +285,7 @@ SYSCALL_DEFINE4(timerfd_settime, int, ufd, int, flags,
> timerfd_setup(ctx, flags, &ktmr);
>
> spin_unlock_irq(&ctx->wqh.lock);
> + spin_unlock(&notifiers_lock);
> fput(file);
> if (otmr && copy_to_user(otmr, &kotmr, sizeof(kotmr)))
> return -EFAULT;
> @@ -262,6 +305,11 @@ SYSCALL_DEFINE2(timerfd_gettime, int, ufd, struct itimerspec __user *, otmr)
> ctx = file->private_data;
>
> spin_lock_irq(&ctx->wqh.lock);
> + if (ctx->clock_notifier) {
> + spin_unlock_irq(&ctx->wqh.lock);
> + return -EINVAL;
> + }
> +
> if (ctx->expired && ctx->tintv.tv64) {
> ctx->expired = 0;
> ctx->ticks +=
> diff --git a/include/linux/timerfd.h b/include/linux/timerfd.h
> index 2d07929..c3ddad9 100644
> --- a/include/linux/timerfd.h
> +++ b/include/linux/timerfd.h
> @@ -19,6 +19,7 @@
> * shared O_* flags.
> */
> #define TFD_TIMER_ABSTIME (1 << 0)
> +#define TFD_NOTIFY_CLOCK_SET (1 << 1)
> #define TFD_CLOEXEC O_CLOEXEC
> #define TFD_NONBLOCK O_NONBLOCK
>
> @@ -26,6 +27,6 @@
> /* Flags for timerfd_create. */
> #define TFD_CREATE_FLAGS TFD_SHARED_FCNTL_FLAGS
> /* Flags for timerfd_settime. */
> -#define TFD_SETTIME_FLAGS TFD_TIMER_ABSTIME
> +#define TFD_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_NOTIFY_CLOCK_SET)
>
> #endif /* _LINUX_TIMERFD_H */
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 72206cf..a779e0f 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -642,8 +642,10 @@ static void retrigger_next_event(void *arg)
> * resolution timer interrupts. On UP we just disable interrupts and
> * call the high resolution interrupt code.
> */
> +void timerfd_clock_was_set(clockid_t clockid);
> void clock_was_set(void)
> {
> + timerfd_clock_was_set(CLOCK_REALTIME);
> /* Retrigger the CPU local events everywhere */
> on_each_cpu(retrigger_next_event, NULL, 1);
> }
> --
> 1.7.2.1.45.gb66c2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


- Davide

2010-11-17 21:29:20

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Wed, Nov 17, 2010 at 12:42:52PM -0800, Davide Libenzi wrote:
> On Wed, 17 Nov 2010, Alexander Shishkin wrote:
>
> > > But what you folks really want for this stuff is an extension to
> > > timerfd as you want to be able to poll, right?
> > >
> > > So what about the following:
> > >
> > > Add a new flag TDF_NOTIFY_CLOCK_WAS_SET to the timerfd flags. Now this
> > > flag adds the timer to a separate list, which gets woken up when the
> > > clock is set.
> > >
> > > No new syscall, just a few lines of code in fs/timerfd.c and
> > > clock_was_set().
> > >
> > > Thoughts ?
> >
> > Something like this (sans ugliness)?
>
> Oh, gosh, please. This is interface-multiplexing-a-palooza.

Thomas made a suggestion, I came up with how it might look like so that
pros and cons are clearer to everyone (or at least me) and can be discussed
on technical grounds. Code talks, sort of. I'm not convinced that a timer
that returns to userspace when the clock changes is such a bad idea, could
you please elaborate?

> It should be decided if the feature makes sense, and then have proper
> interface, instead of multiplexing unrelated insterfaces.

It is not a question any more.

> This is a sort of system-event-report pattern. What is wrong with using a
> netlink-based transport for those kind of things?

What is wrong with eventfd-based implementation that's already there?

Thanks,
--
Alex

2010-11-17 21:34:17

by Kay Sievers

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Wed, Nov 17, 2010 at 22:29, Alexander Shishkin <[email protected]> wrote:
> On Wed, Nov 17, 2010 at 12:42:52PM -0800, Davide Libenzi wrote:
>> On Wed, 17 Nov 2010, Alexander Shishkin wrote:
>>
>> > > But what you folks really want for this stuff is an extension to
>> > > timerfd as you want to be able to poll, right?
>> > >
>> > > So what about the following:
>> > >
>> > > Add a new flag TDF_NOTIFY_CLOCK_WAS_SET to the timerfd flags. Now this
>> > > flag adds the timer to a separate list, which gets woken up when the
>> > > clock is set.
>> > >
>> > > No new syscall, just a few lines of code in fs/timerfd.c and
>> > > clock_was_set().
>> > >
>> > > Thoughts ?
>> >
>> > Something like this (sans ugliness)?
>>
>> Oh, gosh, please.  This is interface-multiplexing-a-palooza.
>
> Thomas made a suggestion, I came up with how it might look like so that
> pros and cons are clearer to everyone (or at least me) and can be discussed
> on technical grounds. Code talks, sort of. I'm not convinced that a timer
> that returns to userspace when the clock changes is such a bad idea, could
> you please elaborate?

I like it.

It's all bout timers, and timerfd is fine to use, I think. It has
nothing to do with "system-events", we ask for the timer to serve us,
but if the *time* changes underneath, we need to know to re-calculate.

I think it's simple and fits very well in the current interface.

Kay

2010-11-17 21:47:08

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Wed, 17 Nov 2010, Alexander Shishkin wrote:

> On Wed, Nov 17, 2010 at 12:42:52PM -0800, Davide Libenzi wrote:
> > On Wed, 17 Nov 2010, Alexander Shishkin wrote:
> >
> > > > But what you folks really want for this stuff is an extension to
> > > > timerfd as you want to be able to poll, right?
> > > >
> > > > So what about the following:
> > > >
> > > > Add a new flag TDF_NOTIFY_CLOCK_WAS_SET to the timerfd flags. Now this
> > > > flag adds the timer to a separate list, which gets woken up when the
> > > > clock is set.
> > > >
> > > > No new syscall, just a few lines of code in fs/timerfd.c and
> > > > clock_was_set().
> > > >
> > > > Thoughts ?
> > >
> > > Something like this (sans ugliness)?
> >
> > Oh, gosh, please. This is interface-multiplexing-a-palooza.
>
> Thomas made a suggestion, I came up with how it might look like so that

Yeah, I made the suggestion in the first place and I regretted it when
Davide told me that I'm nuts :)

Thanks,

tglx

2010-11-18 09:49:12

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Wed, Nov 17, 2010 at 10:46:48PM +0100, Thomas Gleixner wrote:
> On Wed, 17 Nov 2010, Alexander Shishkin wrote:
>
> > On Wed, Nov 17, 2010 at 12:42:52PM -0800, Davide Libenzi wrote:
> > > On Wed, 17 Nov 2010, Alexander Shishkin wrote:
> > >
> > > > > But what you folks really want for this stuff is an extension to
> > > > > timerfd as you want to be able to poll, right?
> > > > >
> > > > > So what about the following:
> > > > >
> > > > > Add a new flag TDF_NOTIFY_CLOCK_WAS_SET to the timerfd flags. Now this
> > > > > flag adds the timer to a separate list, which gets woken up when the
> > > > > clock is set.
> > > > >
> > > > > No new syscall, just a few lines of code in fs/timerfd.c and
> > > > > clock_was_set().
> > > > >
> > > > > Thoughts ?
> > > >
> > > > Something like this (sans ugliness)?
> > >
> > > Oh, gosh, please. This is interface-multiplexing-a-palooza.
> >
> > Thomas made a suggestion, I came up with how it might look like so that
>
> Yeah, I made the suggestion in the first place and I regretted it when
> Davide told me that I'm nuts :)

Well, I haven't seen any convincing technical arguments in either of
his emails. Kay, for example, made a clear point.

Regards,
--
Alex

2010-11-18 13:08:48

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Wed, 2010-11-17 at 12:42 -0800, Davide Libenzi wrote:
> On Wed, 17 Nov 2010, Alexander Shishkin wrote:
>
> > > But what you folks really want for this stuff is an extension to
> > > timerfd as you want to be able to poll, right?
> > >
> > > So what about the following:
> > >
> > > Add a new flag TDF_NOTIFY_CLOCK_WAS_SET to the timerfd flags. Now this
> > > flag adds the timer to a separate list, which gets woken up when the
> > > clock is set.
> > >
> > > No new syscall, just a few lines of code in fs/timerfd.c and
> > > clock_was_set().
> > >
> > > Thoughts ?
> >
> > Something like this (sans ugliness)?
>
> Oh, gosh, please. This is interface-multiplexing-a-palooza.
> It should be decided if the feature makes sense, and then have proper
> interface, instead of multiplexing unrelated insterfaces.
> This is a sort of system-event-report pattern. What is wrong with using a
> netlink-based transport for those kind of things?

Could you please explain in more details what is the problem?

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

2010-11-18 15:59:45

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCHv6 0/7] system time changes notification

On Wed, Nov 17, 2010 at 10:34:00PM +0100, Kay Sievers wrote:
> On Wed, Nov 17, 2010 at 22:29, Alexander Shishkin <[email protected]> wrote:
> > On Wed, Nov 17, 2010 at 12:42:52PM -0800, Davide Libenzi wrote:
> >> On Wed, 17 Nov 2010, Alexander Shishkin wrote:
> >>
> >> > > But what you folks really want for this stuff is an extension to
> >> > > timerfd as you want to be able to poll, right?
> >> > >
> >> > > So what about the following:
> >> > >
> >> > > Add a new flag TDF_NOTIFY_CLOCK_WAS_SET to the timerfd flags. Now this
> >> > > flag adds the timer to a separate list, which gets woken up when the
> >> > > clock is set.
> >> > >
> >> > > No new syscall, just a few lines of code in fs/timerfd.c and
> >> > > clock_was_set().
> >> > >
> >> > > Thoughts ?
> >> >
> >> > Something like this (sans ugliness)?
> >>
> >> Oh, gosh, please. ?This is interface-multiplexing-a-palooza.
> >
> > Thomas made a suggestion, I came up with how it might look like so that
> > pros and cons are clearer to everyone (or at least me) and can be discussed
> > on technical grounds. Code talks, sort of. I'm not convinced that a timer
> > that returns to userspace when the clock changes is such a bad idea, could
> > you please elaborate?
>
> I like it.
>
> It's all bout timers, and timerfd is fine to use, I think. It has
> nothing to do with "system-events", we ask for the timer to serve us,
> but if the *time* changes underneath, we need to know to re-calculate.
>
> I think it's simple and fits very well in the current interface.

What would you say timerfd_gettime() should return for such a timer? Should
it be current clock's time in it_value?

Regards,
--
Alex