Changes since v1:
- updated against 2.6.36-rc1,
- added notification/filtering options,
- added Documentation/ABI/sysfs-kernel-time-notify interface description.
Certain userspace applications (like "clock" desktop applets or cron) might
want to be notified when some other application changes the system time. It
might also be important for an application to be able to distinguish between
its own and somebody else's time changes.
This patch implements a notification interface via eventfd mechanism. Proccess
wishing to be notified about time changes should create an eventfd and echo
its file descriptor along with notification options to /sys/kernel/time_notify.
After that, any calls to settimeofday()/stime()/adjtimex() made by other
processes will be signalled to this eventfd. Credits for suggesting the eventfd
mechanism for this purpose go to Kirill Shutemov.
So far, this implementation can only filter out notifications caused by
time change calls made by the process that wrote the eventfd descriptor to
sysfs, but not its children which (might) have inherited the eventfd. It
is so far not clear to me whether this is bad and more confusing than
excluding such children as well.
Signed-off-by: Alexander Shishkin <[email protected]>
CC: Kirill A. Shutemov <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: John Stultz <[email protected]>
CC: Martin Schwidefsky <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Jon Hunter <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: "Paul E. McKenney" <[email protected]>
CC: David Howells <[email protected]>
CC: Avi Kivity <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: John Kacur <[email protected]>
CC: Alexander Shishkin <[email protected]>
CC: Chris Friesen <[email protected]>
CC: Kay Sievers <[email protected]>
CC: Greg KH <[email protected]>
CC: [email protected]
---
Documentation/ABI/testing/sysfs-kernel-time-notify | 39 ++++
include/linux/time.h | 11 ++
init/Kconfig | 8 +
kernel/Makefile | 1 +
kernel/time.c | 11 +-
kernel/time_notify.c | 184 ++++++++++++++++++++
6 files changed, 252 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-kernel-time-notify
create mode 100644 kernel/time_notify.c
diff --git a/Documentation/ABI/testing/sysfs-kernel-time-notify b/Documentation/ABI/testing/sysfs-kernel-time-notify
new file mode 100644
index 0000000..fe5c960
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-time-notify
@@ -0,0 +1,39 @@
+What: /sys/kernel/time_notify
+Date: August 2010
+Contact: Alexander Shishkin <[email protected]>
+Description:
+ This file is used by processes which want to receive
+ notifications about system time changes, to communicate their
+ eventfd descriptor number along with the type of event they
+ wish to subscribe to and filtering options. Every time one of
+ the processes in the system succeeds in doing a settimeofday()
+ or adjtimex(), this eventfd will be signalled.
+ This file is not readable. Upon write, this is the format
+ accepted by the kernel:
+ <fd> <want-others> <want-own> <want-set> <want-adj>
+ where
+ - <fd> is the file descriptor of the eventfd that the process
+ will listen to for time change events;
+ - <want-others> is 0 or 1 depending on whether the process wants
+ to be notified about other processes' time changes;
+ - <want-own> -- likewise, for process' own time changes;
+ - <want-set> is 0 or 1 depending on whether the process wants
+ to be notified about settimeofday()/stime() system calls;
+ - <want-adj> -- likewise, for adjtimex() system call.
+
+ A simple snippet of C code illustrates the usage pattern:
+
+ efd = eventfd(0, 0);
+ fd = open("/sys/kernel/time_notify", O_WRONLY);
+ fdprintf(fd, "%d 1 0 1 1", efd);
+ close(fd);
+
+ fds[0].fd = efd;
+ fds[0].events = POLLIN;
+ while (poll(fds, 1, -1) > 0) {
+ uint64_t n;
+
+ read(efd, &n, 8);
+ printf("time has been modified %d time(s)\n", n);
+ }
+
diff --git a/include/linux/time.h b/include/linux/time.h
index 9f15ac7..72b6060 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -252,6 +252,17 @@ static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
a->tv_nsec = ns;
}
+
+/* time change events */
+#define TIME_EVENT_SET 0
+#define TIME_EVENT_ADJ 1
+
+#ifdef CONFIG_TIME_NOTIFY
+void time_notify_all(int type);
+#else
+#define time_notify_all(x) do {} while (0)
+#endif
+
#endif /* __KERNEL__ */
#define NFDBITS __NFDBITS
diff --git a/init/Kconfig b/init/Kconfig
index 2de5b1c..aa4d890 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -980,6 +980,14 @@ config PERF_USE_VMALLOC
help
See tools/perf/design.txt for details
+config TIME_NOTIFY
+ bool
+ default y
+ depends on EVENTFD
+ help
+ Enable time change notification events to userspace via
+ eventfd.
+
menu "Kernel Performance Events And Counters"
config PERF_EVENTS
diff --git a/kernel/Makefile b/kernel/Makefile
index 0b72d1a..ac53c67 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -104,6 +104,7 @@ obj-$(CONFIG_PERF_EVENTS) += perf_event.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
obj-$(CONFIG_PADATA) += padata.o
+obj-$(CONFIG_TIME_NOTIFY) += time_notify.o
ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
# According to Alan Modra <[email protected]>, the -fno-omit-frame-pointer is
diff --git a/kernel/time.c b/kernel/time.c
index ba9b338..b4155b8 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -92,7 +92,9 @@ SYSCALL_DEFINE1(stime, time_t __user *, tptr)
if (err)
return err;
- do_settimeofday(&tv);
+ err = do_settimeofday(&tv);
+ if (!err)
+ time_notify_all(TIME_EVENT_SET);
return 0;
}
@@ -177,7 +179,10 @@ int do_sys_settimeofday(struct timespec *tv, struct timezone *tz)
/* SMP safe, again the code in arch/foo/time.c should
* globally block out interrupts when it runs.
*/
- return do_settimeofday(tv);
+ error = do_settimeofday(tv);
+ if (!error)
+ time_notify_all(TIME_EVENT_SET);
+ return error;
}
return 0;
}
@@ -215,6 +220,8 @@ SYSCALL_DEFINE1(adjtimex, struct timex __user *, txc_p)
if(copy_from_user(&txc, txc_p, sizeof(struct timex)))
return -EFAULT;
ret = do_adjtimex(&txc);
+ if (!ret)
+ time_notify_all(TIME_EVENT_ADJ);
return copy_to_user(txc_p, &txc, sizeof(struct timex)) ? -EFAULT : ret;
}
diff --git a/kernel/time_notify.c b/kernel/time_notify.c
new file mode 100644
index 0000000..21c1bea
--- /dev/null
+++ b/kernel/time_notify.c
@@ -0,0 +1,184 @@
+/*
+ * linux/kernel/time_notify.c
+ *
+ * Copyright (C) 2010 Nokia Corporation
+ * Alexander Shishkin
+ *
+ * This file implements an interface to communicate time changes to userspace.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/eventfd.h>
+#include <linux/kobject.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+#include <linux/sched.h>
+#include <linux/poll.h>
+#include <linux/err.h>
+
+/*
+ * A process can "subscribe" to receive a notification via eventfd that
+ * some other process has called stime/settimeofday/adjtimex.
+ */
+struct time_event {
+ struct eventfd_ctx *eventfd;
+ struct task_struct *watcher;
+ unsigned int want_others:1;
+ unsigned int want_own:1;
+ unsigned int want_set:1;
+ unsigned int want_adj:1;
+ struct work_struct remove;
+ wait_queue_t wq;
+ wait_queue_head_t *wqh;
+ poll_table pt;
+ struct list_head list;
+};
+
+static LIST_HEAD(event_list);
+static DEFINE_SPINLOCK(event_lock);
+
+/*
+ * Do the necessary cleanup when the eventfd is being closed
+ */
+static void time_event_remove(struct work_struct *work)
+{
+ struct time_event *evt = container_of(work, struct time_event, remove);
+
+ kfree(evt);
+}
+
+static int time_event_wakeup(wait_queue_t *wq, unsigned int mode, int sync,
+ void *key)
+{
+ struct time_event *evt = container_of(wq, struct time_event, wq);
+ unsigned long flags = (unsigned long)key;
+
+ if (flags & POLLHUP) {
+ __remove_wait_queue(evt->wqh, &evt->wq);
+ spin_lock(&event_lock);
+ list_del(&evt->list);
+ spin_unlock(&event_lock);
+
+ schedule_work(&evt->remove);
+ }
+
+ return 0;
+}
+
+static void time_event_ptable_queue_proc(struct file *file,
+ wait_queue_head_t *wqh, poll_table *pt)
+{
+ struct time_event *evt = container_of(pt, struct time_event, pt);
+
+ evt->wqh = wqh;
+ add_wait_queue(wqh, &evt->wq);
+}
+
+/*
+ * Process wishing to be notified about time changes should write its
+ * eventfd's descriptor to /sys/kernel/time_notify. This eventfd will
+ * then be signalled about any time changes made by any process *but*
+ * this one.
+ */
+static int time_notify_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ int ret;
+ unsigned int fd;
+ unsigned int want_others, want_own, want_set, want_adj;
+ struct file *file;
+ struct time_event *evt;
+
+ evt = kmalloc(sizeof(*evt), GFP_KERNEL);
+ if (!evt)
+ return -ENOMEM;
+
+ ret = sscanf(buf, "%u %u %u %u %u", &fd, &want_others, &want_own,
+ &want_set, &want_adj);
+ if (ret < 4) {
+ ret = -EINVAL;
+ goto out_free;
+ }
+
+ evt->want_others = !!want_others;
+ evt->want_own = !!want_own;
+ evt->want_set = !!want_set;
+ evt->want_adj = !!want_adj;
+
+ file = eventfd_fget(fd);
+ if (IS_ERR(file)) {
+ ret = -EINVAL;
+ goto out_free;
+ }
+
+ evt->eventfd = eventfd_ctx_fileget(file);
+ if (!evt->eventfd) {
+ ret = PTR_ERR(evt->eventfd);
+ goto out_fput;
+ }
+
+ INIT_LIST_HEAD(&evt->list);
+ INIT_WORK(&evt->remove, time_event_remove);
+
+ init_waitqueue_func_entry(&evt->wq, time_event_wakeup);
+ init_poll_funcptr(&evt->pt, time_event_ptable_queue_proc);
+
+ if (file->f_op->poll(file, &evt->pt) & POLLHUP) {
+ ret = 0;
+ goto out_fput;
+ }
+
+ evt->watcher = current;
+
+ spin_lock(&event_lock);
+ list_add(&event_list, &evt->list);
+ spin_unlock(&event_lock);
+
+ fput(file);
+
+ return n;
+
+out_fput:
+ fput(file);
+
+out_free:
+ kfree(evt);
+
+ return ret;
+}
+
+void time_notify_all(int type)
+{
+ struct list_head *tmp;
+
+ spin_lock(&event_lock);
+ list_for_each(tmp, &event_list) {
+ struct time_event *e = container_of(tmp, struct time_event,
+ list);
+
+ if (type == TIME_EVENT_SET && !e->want_set)
+ continue;
+ else if (type == TIME_EVENT_ADJ && !e->want_adj)
+ continue;
+
+ if (e->watcher == current && !e->want_own)
+ continue;
+ else if (e->watcher != current && !e->want_others)
+ continue;
+
+ eventfd_signal(e->eventfd, 1);
+ }
+ spin_unlock(&event_lock);
+}
+
+static struct kobj_attribute time_notify_attr =
+ __ATTR(time_notify, S_IWUGO, NULL, time_notify_store);
+
+static int time_notify_init(void)
+{
+ return sysfs_create_file(kernel_kobj, &time_notify_attr.attr);
+}
+
+core_initcall(time_notify_init);
--
1.7.1
On Wed, Aug 18, 2010 at 04:55:39PM +0300, Alexander Shishkin wrote:
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-time-notify
> @@ -0,0 +1,39 @@
> +What: /sys/kernel/time_notify
> +Date: August 2010
> +Contact: Alexander Shishkin <[email protected]>
> +Description:
> + This file is used by processes which want to receive
> + notifications about system time changes, to communicate their
> + eventfd descriptor number along with the type of event they
> + wish to subscribe to and filtering options. Every time one of
> + the processes in the system succeeds in doing a settimeofday()
> + or adjtimex(), this eventfd will be signalled.
> + This file is not readable. Upon write, this is the format
> + accepted by the kernel:
> + <fd> <want-others> <want-own> <want-set> <want-adj>
> + where
> + - <fd> is the file descriptor of the eventfd that the process
> + will listen to for time change events;
> + - <want-others> is 0 or 1 depending on whether the process wants
> + to be notified about other processes' time changes;
> + - <want-own> -- likewise, for process' own time changes;
> + - <want-set> is 0 or 1 depending on whether the process wants
> + to be notified about settimeofday()/stime() system calls;
> + - <want-adj> -- likewise, for adjtimex() system call.
> +
> + A simple snippet of C code illustrates the usage pattern:
> +
> + efd = eventfd(0, 0);
> + fd = open("/sys/kernel/time_notify", O_WRONLY);
> + fdprintf(fd, "%d 1 0 1 1", efd);
> + close(fd);
> +
> + fds[0].fd = efd;
> + fds[0].events = POLLIN;
> + while (poll(fds, 1, -1) > 0) {
> + uint64_t n;
> +
> + read(efd, &n, 8);
> + printf("time has been modified %d time(s)\n", n);
> + }
While this isn't the "nicest" sysfs file ever, it's not all that bad.
> +/* time change events */
> +#define TIME_EVENT_SET 0
> +#define TIME_EVENT_ADJ 1
> +
> +#ifdef CONFIG_TIME_NOTIFY
> +void time_notify_all(int type);
> +#else
> +#define time_notify_all(x) do {} while (0)
static inline please so we do a typecheck.
> +#endif
> +
> #endif /* __KERNEL__ */
>
> #define NFDBITS __NFDBITS
> diff --git a/init/Kconfig b/init/Kconfig
> index 2de5b1c..aa4d890 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -980,6 +980,14 @@ config PERF_USE_VMALLOC
> help
> See tools/perf/design.txt for details
>
> +config TIME_NOTIFY
> + bool
> + default y
Really? You obviously haven't added new Kconfig options to the kernel
in the past and been yelled at by Linus :)
> + evt->eventfd = eventfd_ctx_fileget(file);
> + if (!evt->eventfd) {
> + ret = PTR_ERR(evt->eventfd);
Are you sure this is correct?
thanks,
greg k-h
On Wed, 18 Aug 2010 16:55:39 +0300
Alexander Shishkin <[email protected]> wrote:
> Certain userspace applications (like "clock" desktop applets or cron) might
> want to be notified when some other application changes the system time.
The requirements sound a bit fluffy to me.
Any time-displaying application will find out the new time next time
it reads the time. So afaict this is only really useful for clock
applets which display once per minute, so they will show the new time
promptly after the time was altered, yes? Is that really worth adding
new code for?
> It
> might also be important for an application to be able to distinguish between
> its own and somebody else's time changes.
hm. Why? What are you thinking of here, specifically?
Also... what's the story with kernel-NTP updates?
On Wed, 2010-08-18 at 16:55 +0300, Alexander Shishkin wrote:
> Changes since v1:
> - updated against 2.6.36-rc1,
> - added notification/filtering options,
> - added Documentation/ABI/sysfs-kernel-time-notify interface description.
>
> Certain userspace applications (like "clock" desktop applets or cron) might
> want to be notified when some other application changes the system time. It
> might also be important for an application to be able to distinguish between
> its own and somebody else's time changes.
>
> This patch implements a notification interface via eventfd mechanism. Proccess
> wishing to be notified about time changes should create an eventfd and echo
> its file descriptor along with notification options to /sys/kernel/time_notify.
> After that, any calls to settimeofday()/stime()/adjtimex() made by other
> processes will be signalled to this eventfd. Credits for suggesting the eventfd
> mechanism for this purpose go to Kirill Shutemov.
Hey Alexander,
Glad to see this work continue! One thing did strike me as odd in
reading over this: Does adjtimex really make sense to trigger a
notification? Its not actually changing the time, but alters the freq
that time runs. This freq adjustment is transparent to applications or
timers (unlike something like settimeofday). So I'm not sure I see why
it is included here.
What use case did you have in mind for it?
thanks
-john
On 08/18/2010 03:57 PM, Andrew Morton wrote:
> On Wed, 18 Aug 2010 16:55:39 +0300
> Alexander Shishkin <[email protected]> wrote:
>
>> Certain userspace applications (like "clock" desktop applets or cron) might
>> want to be notified when some other application changes the system time.
>
> The requirements sound a bit fluffy to me.
>
> Any time-displaying application will find out the new time next time
> it reads the time. So afaict this is only really useful for clock
> applets which display once per minute, so they will show the new time
> promptly after the time was altered, yes? Is that really worth adding
> new code for?
>
Actually a much more significant use case was given for the "civil
periodic" type events: something that wants to happen "every day at
noon", for example. The logical thing of sleeping until the next noon
breaks if the walltime clock is changed. As such, things like cron have
to resort to wake up once a minute just to assure themselves that they
have nothing to do. This is inefficient, especially for battery-powered
devices.
Applications which display time aren't really as much affected, since
they generaly wake up every minute or every second anyway.
-hpa
On 08/18/2010 04:57 PM, Andrew Morton wrote:
> The requirements sound a bit fluffy to me.
>
> Any time-displaying application will find out the new time next time
> it reads the time. So afaict this is only really useful for clock
> applets which display once per minute, so they will show the new time
> promptly after the time was altered, yes? Is that really worth adding
> new code for?
There are other users of this functionality. We have an emulator
running on top of linux which in turn runs code originally intended for
another OS. It cares about walltime, so it wants to know ASAP if the
linux system time changes.
We've actually been carrying an internal patch doing something like this
for years now...never pushed it to mainline since we didn't think anyone
else would be interested.
Chris
--
Chris Friesen
Software Developer
GENBAND
[email protected]
http://www.genband.com
On Wed, 18 Aug 2010 16:43:48 -0700
"H. Peter Anvin" <[email protected]> wrote:
> On 08/18/2010 03:57 PM, Andrew Morton wrote:
> > On Wed, 18 Aug 2010 16:55:39 +0300
> > Alexander Shishkin <[email protected]> wrote:
> >
> >> Certain userspace applications (like "clock" desktop applets or cron) might
> >> want to be notified when some other application changes the system time.
> >
> > The requirements sound a bit fluffy to me.
> >
> > Any time-displaying application will find out the new time next time
> > it reads the time. So afaict this is only really useful for clock
> > applets which display once per minute, so they will show the new time
> > promptly after the time was altered, yes? Is that really worth adding
> > new code for?
> >
>
> Actually a much more significant use case was given for the "civil
> periodic" type events: something that wants to happen "every day at
> noon", for example. The logical thing of sleeping until the next noon
> breaks if the walltime clock is changed. As such, things like cron have
> to resort to wake up once a minute just to assure themselves that they
> have nothing to do. This is inefficient, especially for battery-powered
> devices.
OK.
Such applications might be better served via a wake-me-at-this-time
syscall instead of a sleep-me-for-this-long syscall. Although such a
thing is less general.
Is sysfs the right interface for this thing? Bear in mind that
CONFIG_SYSFS does exist.
> + fd = open("/sys/kernel/time_notify", O_WRONLY);
> + fdprintf(fd, "%d 1 0 1 1", efd);
why not
sys_time_notify(efd, 1, 0, 1, 1);
?
On Wed, Aug 18, 2010 at 04:53:03PM -0700, Andrew Morton wrote:
> Is sysfs the right interface for this thing? Bear in mind that
> CONFIG_SYSFS does exist.
>
> > + fd = open("/sys/kernel/time_notify", O_WRONLY);
> > + fdprintf(fd, "%d 1 0 1 1", efd);
>
> why not
>
> sys_time_notify(efd, 1, 0, 1, 1);
Yeah, that would be much better than a sysfs file, this is abusing the
sysfs interface quite a lot.
thanks,
greg k-h
On Wed, 18 Aug 2010 21:09:37 -0700 Greg KH <[email protected]> wrote:
> On Wed, Aug 18, 2010 at 04:53:03PM -0700, Andrew Morton wrote:
> > Is sysfs the right interface for this thing? Bear in mind that
> > CONFIG_SYSFS does exist.
> >
> > > + fd = open("/sys/kernel/time_notify", O_WRONLY);
> > > + fdprintf(fd, "%d 1 0 1 1", efd);
> >
> > why not
> >
> > sys_time_notify(efd, 1, 0, 1, 1);
>
> Yeah, that would be much better than a sysfs file, this is abusing the
> sysfs interface quite a lot.
>
(time_change_notify would be a better syscall name).
On Wed, Aug 18, 2010 at 09:09:37 -0700, Greg KH wrote:
> On Wed, Aug 18, 2010 at 04:53:03PM -0700, Andrew Morton wrote:
> > Is sysfs the right interface for this thing? Bear in mind that
> > CONFIG_SYSFS does exist.
> >
> > > + fd = open("/sys/kernel/time_notify", O_WRONLY);
> > > + fdprintf(fd, "%d 1 0 1 1", efd);
> >
> > why not
> >
> > sys_time_notify(efd, 1, 0, 1, 1);
>
> Yeah, that would be much better than a sysfs file, this is abusing the
> sysfs interface quite a lot.
Hm, ok. I used to think that adding new syscalls was discouraged, though.
Regards,
--
Alex
On Wed, Aug 18, 2010 at 09:09:37PM -0700, Greg KH wrote:
> On Wed, Aug 18, 2010 at 04:53:03PM -0700, Andrew Morton wrote:
> > Is sysfs the right interface for this thing? Bear in mind that
> > CONFIG_SYSFS does exist.
> >
> > > + fd = open("/sys/kernel/time_notify", O_WRONLY);
> > > + fdprintf(fd, "%d 1 0 1 1", efd);
> >
> > why not
> >
> > sys_time_notify(efd, 1, 0, 1, 1);
>
> Yeah, that would be much better than a sysfs file, this is abusing the
> sysfs interface quite a lot.
Do you really think, that increasing number of syscalls is better then
fs-based interfaces?
--
Kirill A. Shutemov
On Thu, Aug 19, 2010 at 10:36, Kirill A. Shutemov <[email protected]> wrote:
> On Wed, Aug 18, 2010 at 09:09:37PM -0700, Greg KH wrote:
>> On Wed, Aug 18, 2010 at 04:53:03PM -0700, Andrew Morton wrote:
>> > Is sysfs the right interface for this thing? Bear in mind that
>> > CONFIG_SYSFS does exist.
>> >
>> > > + fd = open("/sys/kernel/time_notify", O_WRONLY);
>> > > + fdprintf(fd, "%d 1 0 1 1", efd);
>> >
>> > why not
>> >
>> > sys_time_notify(efd, 1, 0, 1, 1);
>>
>> Yeah, that would be much better than a sysfs file, this is abusing the
>> sysfs interface quite a lot.
>
> Do you really think, that increasing number of syscalls is better then
> fs-based interfaces?
Not necessarily, but adding such, and from the outside scope 'random
stuff', to /sys/kernel/ is really not the way to do such things.
It seems to be new dumping ground for things that don't have an
obvious home, and since proc is out of fashion. :)
Can't timerfd be used, or extended in some way? It really sounds like
the place this should happen.
Kay
On Wed, 2010-08-18 at 16:53 -0700, Andrew Morton wrote:
>
>
> Such applications might be better served via a wake-me-at-this-time
> syscall instead of a sleep-me-for-this-long syscall. Although such a
> thing is less general.
>
We have timer_create(.clockid = CLOCK_REALTIME) for such things, no?
> > + fd = open("/sys/kernel/time_notify", O_WRONLY);
> > + fdprintf(fd, "%d 1 0 1 1", efd);
>
> why not
>
> sys_time_notify(efd, 1, 0, 1, 1);
And then there is the /dev/time proposal from Plan9:
http://lkml.org/lkml/2009/3/11/363
Their proposal was to add a poll() method to the device which would wake
on every change in time.
On Thu, Aug 19, 2010 at 11:50, Peter Zijlstra <[email protected]> wrote:
> On Wed, 2010-08-18 at 16:53 -0700, Andrew Morton wrote:
>>
>> Such applications might be better served via a wake-me-at-this-time
>> syscall instead of a sleep-me-for-this-long syscall. Although such a
>> thing is less general.
>>
> We have timer_create(.clockid = CLOCK_REALTIME) for such things, no?
Not for repeating events like cron needs.
Say we want to wakeup at 3pm, now it's 4pm, so we schedule it in 23
hours. Now the system time changes to 2pm, and we would expect to
wakeup in one hour, but we take 25.
Kay
On Thu, Aug 19, 2010 at 11:50:46 +0200, Peter Zijlstra wrote:
> On Wed, 2010-08-18 at 16:53 -0700, Andrew Morton wrote:
> >
> >
> > Such applications might be better served via a wake-me-at-this-time
> > syscall instead of a sleep-me-for-this-long syscall. Although such a
> > thing is less general.
> >
> We have timer_create(.clockid = CLOCK_REALTIME) for such things, no?
No. I'll try to summarize all the things that have been said about the
use cases for this functionality earlier in the followup version so as
to make things clearer.
>
> > > + fd = open("/sys/kernel/time_notify", O_WRONLY);
> > > + fdprintf(fd, "%d 1 0 1 1", efd);
> >
> > why not
> >
> > sys_time_notify(efd, 1, 0, 1, 1);
>
> And then there is the /dev/time proposal from Plan9:
>
> http://lkml.org/lkml/2009/3/11/363
>
> Their proposal was to add a poll() method to the device which would wake
> on every change in time.
I haven't seen any poll-related code there, but it looks like the interface
which duplicates existing functionality for no gain.
It would be nice if they did some work on replacing {get,set}timeofday()
with read()s/write()s in glibc etc. But as of now this /dev/time looks quite
a pointless thing to have.
Regards,
--
Alex
On Thu, Aug 19, 2010 at 11:36:12AM +0300, Kirill A. Shutemov wrote:
> On Wed, Aug 18, 2010 at 09:09:37PM -0700, Greg KH wrote:
> > On Wed, Aug 18, 2010 at 04:53:03PM -0700, Andrew Morton wrote:
> > > Is sysfs the right interface for this thing? Bear in mind that
> > > CONFIG_SYSFS does exist.
> > >
> > > > + fd = open("/sys/kernel/time_notify", O_WRONLY);
> > > > + fdprintf(fd, "%d 1 0 1 1", efd);
> > >
> > > why not
> > >
> > > sys_time_notify(efd, 1, 0, 1, 1);
> >
> > Yeah, that would be much better than a sysfs file, this is abusing the
> > sysfs interface quite a lot.
>
> Do you really think, that increasing number of syscalls is better then
> fs-based interfaces?
As you are pretty much creating a new syscall here anyway, there is no
problem with making it a real one, right? That way you can properly
handle the user/kernel documentation and persistance over time (i.e. you
can't change it.)
So yes, a syscall would be better, especially as this does not exactly
fit into the model of sysfs, right?
thanks,
greg k-h
On Thu, Aug 19, 2010 at 08:31:27AM -0700, Greg KH wrote:
> On Thu, Aug 19, 2010 at 11:36:12AM +0300, Kirill A. Shutemov wrote:
> > On Wed, Aug 18, 2010 at 09:09:37PM -0700, Greg KH wrote:
> > > On Wed, Aug 18, 2010 at 04:53:03PM -0700, Andrew Morton wrote:
> > > > Is sysfs the right interface for this thing? Bear in mind that
> > > > CONFIG_SYSFS does exist.
> > > >
> > > > > + fd = open("/sys/kernel/time_notify", O_WRONLY);
> > > > > + fdprintf(fd, "%d 1 0 1 1", efd);
> > > >
> > > > why not
> > > >
> > > > sys_time_notify(efd, 1, 0, 1, 1);
> > >
> > > Yeah, that would be much better than a sysfs file, this is abusing the
> > > sysfs interface quite a lot.
> >
> > Do you really think, that increasing number of syscalls is better then
> > fs-based interfaces?
>
> As you are pretty much creating a new syscall here anyway, there is no
> problem with making it a real one, right?
I think Linux has too many syscalls. Significant part these interfaces
would be better to map to a filesystem[s].
> That way you can properly
> handle the user/kernel documentation and persistance over time (i.e. you
> can't change it.)
On the other, hand properly designed fs-based interface requires less
modification of userspeace to use it. Acctually, you can use most of
fs-based intefaces directly from shell. No need in libc modifications and
utilities to use it from shell or other script language.
See cgroup, for example.
> So yes, a syscall would be better, especially as this does not exactly
> fit into the model of sysfs, right?
Yes, sysfs is not the best place for it, but...
--
Kirill A. Shutemov
On Fri, Aug 20, 2010 at 11:37:23AM +0300, Kirill A. Shutemov wrote:
> On Thu, Aug 19, 2010 at 08:31:27AM -0700, Greg KH wrote:
> > On Thu, Aug 19, 2010 at 11:36:12AM +0300, Kirill A. Shutemov wrote:
> > > On Wed, Aug 18, 2010 at 09:09:37PM -0700, Greg KH wrote:
> > > > On Wed, Aug 18, 2010 at 04:53:03PM -0700, Andrew Morton wrote:
> > > > > Is sysfs the right interface for this thing? Bear in mind that
> > > > > CONFIG_SYSFS does exist.
> > > > >
> > > > > > + fd = open("/sys/kernel/time_notify", O_WRONLY);
> > > > > > + fdprintf(fd, "%d 1 0 1 1", efd);
> > > > >
> > > > > why not
> > > > >
> > > > > sys_time_notify(efd, 1, 0, 1, 1);
> > > >
> > > > Yeah, that would be much better than a sysfs file, this is abusing the
> > > > sysfs interface quite a lot.
> > >
> > > Do you really think, that increasing number of syscalls is better then
> > > fs-based interfaces?
> >
> > As you are pretty much creating a new syscall here anyway, there is no
> > problem with making it a real one, right?
>
> I think Linux has too many syscalls. Significant part these interfaces
> would be better to map to a filesystem[s].
What is the difference between a syscall and a filesystem interface?
They are both things that we can not change in the future and need to be
preserved and documented.
Don't be afraid of syscall's, they don't bite :)
> > That way you can properly
> > handle the user/kernel documentation and persistance over time (i.e. you
> > can't change it.)
>
> On the other, hand properly designed fs-based interface requires less
> modification of userspeace to use it. Acctually, you can use most of
> fs-based intefaces directly from shell. No need in libc modifications and
> utilities to use it from shell or other script language.
> See cgroup, for example.
>
> > So yes, a syscall would be better, especially as this does not exactly
> > fit into the model of sysfs, right?
>
> Yes, sysfs is not the best place for it, but...
You just answered your own question. Please don't make it in sysfs,
make it a syscall as it does not fit into sysfs.
thanks,
greg "constantly fighting to keep /sys/ from being like /proc/" k-h
On Fri, Aug 20, 2010 at 08:33:46 -0700, Greg KH wrote:
> On Fri, Aug 20, 2010 at 11:37:23AM +0300, Kirill A. Shutemov wrote:
> > On Thu, Aug 19, 2010 at 08:31:27AM -0700, Greg KH wrote:
> > > On Thu, Aug 19, 2010 at 11:36:12AM +0300, Kirill A. Shutemov wrote:
> > > > On Wed, Aug 18, 2010 at 09:09:37PM -0700, Greg KH wrote:
> > > > > On Wed, Aug 18, 2010 at 04:53:03PM -0700, Andrew Morton wrote:
> > > > > > Is sysfs the right interface for this thing? Bear in mind that
> > > > > > CONFIG_SYSFS does exist.
> > > > > >
> > > > > > > + fd = open("/sys/kernel/time_notify", O_WRONLY);
> > > > > > > + fdprintf(fd, "%d 1 0 1 1", efd);
> > > > > >
> > > > > > why not
> > > > > >
> > > > > > sys_time_notify(efd, 1, 0, 1, 1);
> > > > >
> > > > > Yeah, that would be much better than a sysfs file, this is abusing the
> > > > > sysfs interface quite a lot.
> > > >
> > > > Do you really think, that increasing number of syscalls is better then
> > > > fs-based interfaces?
> > >
> > > As you are pretty much creating a new syscall here anyway, there is no
> > > problem with making it a real one, right?
> >
> > I think Linux has too many syscalls. Significant part these interfaces
> > would be better to map to a filesystem[s].
>
> What is the difference between a syscall and a filesystem interface?
> They are both things that we can not change in the future and need to be
> preserved and documented.
>
> Don't be afraid of syscall's, they don't bite :)
>
> > > That way you can properly
> > > handle the user/kernel documentation and persistance over time (i.e. you
> > > can't change it.)
> >
> > On the other, hand properly designed fs-based interface requires less
> > modification of userspeace to use it. Acctually, you can use most of
> > fs-based intefaces directly from shell. No need in libc modifications and
> > utilities to use it from shell or other script language.
> > See cgroup, for example.
> >
> > > So yes, a syscall would be better, especially as this does not exactly
> > > fit into the model of sysfs, right?
> >
> > Yes, sysfs is not the best place for it, but...
>
> You just answered your own question. Please don't make it in sysfs,
> make it a syscall as it does not fit into sysfs.
Hmm, how about a syscallfs? :)
Regards,
--
Alex