2005-12-07 22:08:15

by Shailabh Nagar

Subject: [RFC][Patch 0/5] Per-task delay accounting

The following patches add accounting for the delays seen by a task while
a) waiting for a CPU (while being runnable)
b) waiting for completion of synchronous block I/O initiated by the task
c) swapping in pages (i.e. capacity misses).

Such delays provide feedback for a task's cpu priority, io priority and
rss limit values. Long delays, especially relative to other tasks, can be
a trigger for changing a task's cpu/io priorities and modifying its rss usage
(either directly through sys_getprlimit(), which was proposed earlier on lkml, or
by throttling cpu consumption, having the process call sys_setrlimit, etc.)

There are quite a few differences from the earlier posting of these patches
(http://www.uwsg.indiana.edu/hypermail/linux/kernel/0511.1/2275.html):

- block I/O is (hopefully) being accounted properly now instead of just counting the
time spent in io_schedule() as done earlier.

- instead of accounting for time spent in all page faults, only swapping in of pages
is being counted since that's the only part that one can really control (capacity misses
vs. compulsory misses)

- a /proc interface is being used instead of a connector-based interface. Andrew Morton
suggested a generic connector-based interface useful for future usage of
connectors for stats. This revised connector-based interface will be posted separately
since it's useful for efficient delivery of any per-task statistics, not just the ones
being introduced by these patches.

- the timestamping code has been made generic (following the suggestions to Matt Helsley's
patches to add timestamps to process events connectors)


More comments in individual patches.

Series

nstimestamp-diff.patch
delayacct-init.patch
delayacct-blkio.patch
delayacct-swapin.patch
delayacct-procfs.patch


2005-12-07 22:13:09

by Shailabh Nagar

Subject: [RFC][Patch 1/5] nanosecond timestamps and diffs

Add kernel utility functions for
- nanosecond resolution timestamps, adjusted for lost ticks
- interval (diff) between two such timestamps, in nanoseconds, adjusting
for overflow

The timestamp part of this patch is identical to the one proposed by
Matt Helsley (as part of adding timestamps to process event connectors)
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0512.0/1373.html
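
For readers who want to try the interval arithmetic outside the kernel, the
following is a minimal userspace sketch of the same idea: convert both
timestamps to nanoseconds and clamp a negative difference to zero. The names
ts_to_ns() and ts_nsdiff() are made up for this illustration and are not part
of the patch. The sketch also widens tv_sec to 64 bits before multiplying,
which is an assumption on my part about how to avoid overflow where long is
32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC 1000000000LL

/* Convert a timespec to a signed 64-bit nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	/* Widen tv_sec first so the multiply cannot overflow a 32-bit long. */
	return (int64_t)ts->tv_sec * NS_PER_SEC + ts->tv_nsec;
}

/* Nanoseconds from start to end; zero if end is earlier than start. */
static uint64_t ts_nsdiff(const struct timespec *start,
			  const struct timespec *end)
{
	int64_t diff;

	assert(start && end);
	diff = ts_to_ns(end) - ts_to_ns(start);
	return diff < 0 ? 0 : (uint64_t)diff;
}
```

In userspace the two timestamps would typically come from
clock_gettime(CLOCK_MONOTONIC, ...), which already yields values unaffected by
settimeofday().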

Signed-off-by: Shailabh Nagar <[email protected]>

include/linux/time.h | 16 ++++++++++++++++
kernel/time.c | 22 ++++++++++++++++++++++
2 files changed, 38 insertions(+)

Index: linux-2.6.15-rc5/include/linux/time.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/time.h
+++ linux-2.6.15-rc5/include/linux/time.h
@@ -95,6 +95,7 @@ struct itimerval;
extern int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue);
extern int do_getitimer(int which, struct itimerval *value);
extern void getnstimeofday (struct timespec *tv);
+extern void getnstimestamp(struct timespec *ts);

extern struct timespec timespec_trunc(struct timespec t, unsigned gran);

@@ -113,6 +114,21 @@ set_normalized_timespec (struct timespec
ts->tv_nsec = nsec;
}

+/*
+ * timespec_nsdiff - Return difference of two timestamps in nanoseconds
+ * In the rare case of @end being earlier than @start, return zero
+ */
+static inline unsigned long long
+timespec_nsdiff(struct timespec *start, struct timespec *end)
+{
+ long long ret;
+
+ ret = end->tv_sec*(1000000000) + end->tv_nsec;
+ ret -= start->tv_sec*(1000000000) + start->tv_nsec;
+ if (ret < 0)
+ return 0;
+ return ret;
+}
#endif /* __KERNEL__ */

#define NFDBITS __NFDBITS
Index: linux-2.6.15-rc5/kernel/time.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/time.c
+++ linux-2.6.15-rc5/kernel/time.c
@@ -561,6 +561,28 @@ void getnstimeofday(struct timespec *tv)
EXPORT_SYMBOL_GPL(getnstimeofday);
#endif

+void getnstimestamp(struct timespec *ts)
+{
+ unsigned int seq;
+ struct timespec wall2mono;
+
+ /* synchronize with settimeofday() changes */
+ do {
+ seq = read_seqbegin(&xtime_lock);
+ getnstimeofday(ts);
+ wall2mono = wall_to_monotonic;
+ } while(unlikely(read_seqretry(&xtime_lock, seq)));
+
+ /* adjust to monotonically-increasing values */
+ ts->tv_sec += wall2mono.tv_sec;
+ ts->tv_nsec += wall2mono.tv_nsec;
+ while (unlikely(ts->tv_nsec >= NSEC_PER_SEC)) {
+ ts->tv_nsec -= NSEC_PER_SEC;
+ ts->tv_sec++;
+ }
+}
+EXPORT_SYMBOL_GPL(getnstimestamp);
+
#if (BITS_PER_LONG < 64)
u64 get_jiffies_64(void)
{

2005-12-07 22:16:03

by Shailabh Nagar

Subject: [RFC][Patch 2/5] Per-task delay accounting: Initialization, dynamic turn on/off

Changes since 11/14/05

- use nanosecond resolution, adjusted wall clock time for timestamps
instead of sched_clock (akpm, andi, marcelo)
- kernel param, sysctl option to control delay stats collection (parag)
- better CONFIG parameter name (parag)

11/14/05: First post

delayacct-init.patch

Initialization code related to collection of per-task "delay"
statistics which measure how long a task had to wait for cpu,
sync block io, swapping etc. The collection of statistics and
the interface are in other patches. This patch sets up the data
structures and allows the statistics collection to be dynamically
enabled (through a kernel boot parameter and through
/proc/sys/kernel/delayacct).


Signed-off-by: Shailabh Nagar <[email protected]>

Documentation/kernel-parameters.txt | 2 ++
include/linux/delayacct.h | 26 ++++++++++++++++++++++++++
include/linux/sched.h | 11 +++++++++++
include/linux/sysctl.h | 1 +
init/Kconfig | 13 +++++++++++++
kernel/Makefile | 1 +
kernel/delayacct.c | 36 ++++++++++++++++++++++++++++++++++++
kernel/fork.c | 2 ++
kernel/sysctl.c | 14 ++++++++++++++
9 files changed, 106 insertions(+)

Index: linux-2.6.15-rc5/init/Kconfig
===================================================================
--- linux-2.6.15-rc5.orig/init/Kconfig
+++ linux-2.6.15-rc5/init/Kconfig
@@ -162,6 +162,19 @@ config BSD_PROCESS_ACCT_V3
for processing it. A preliminary version of these tools is available
at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>.

+config TASK_DELAY_ACCT
+ bool "Enable per-task delay accounting (EXPERIMENTAL)"
+ help
+ Collect information on time spent by a task waiting for system
+ resources like cpu, synchronous block I/O completion and swapping
+ in pages. Such statistics can help in setting a task's priorities
+ relative to other tasks for cpu, io, rss limits etc.
+
+ Unlike BSD process accounting, this information is available
+ continuously during the lifetime of a task.
+
+ Say N if unsure.
+
config SYSCTL
bool "Sysctl support"
---help---
Index: linux-2.6.15-rc5/include/linux/sched.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/sched.h
+++ linux-2.6.15-rc5/include/linux/sched.h
@@ -541,6 +541,14 @@ struct sched_info {
extern struct file_operations proc_schedstat_operations;
#endif

+#ifdef CONFIG_TASK_DELAY_ACCT
+struct task_delay_info {
+ spinlock_t lock;
+
+ /* Add stats in pairs: uint64_t delay, uint32_t count */
+};
+#endif
+
enum idle_type
{
SCHED_IDLE,
@@ -857,6 +865,9 @@ struct task_struct {
int cpuset_mems_generation;
#endif
atomic_t fs_excl; /* holding fs exclusive resources */
+#ifdef CONFIG_TASK_DELAY_ACCT
+ struct task_delay_info delays;
+#endif
};

static inline pid_t process_group(struct task_struct *tsk)
Index: linux-2.6.15-rc5/kernel/fork.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/fork.c
+++ linux-2.6.15-rc5/kernel/fork.c
@@ -43,6 +43,7 @@
#include <linux/rmap.h>
#include <linux/acct.h>
#include <linux/cn_proc.h>
+#include <linux/delayacct.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -923,6 +924,7 @@ static task_t *copy_process(unsigned lon
if (p->binfmt && !try_module_get(p->binfmt->module))
goto bad_fork_cleanup_put_domain;

+ delayacct_tsk_init(p);
p->did_exec = 0;
copy_flags(clone_flags, p);
p->pid = pid;
Index: linux-2.6.15-rc5/include/linux/delayacct.h
===================================================================
--- /dev/null
+++ linux-2.6.15-rc5/include/linux/delayacct.h
@@ -0,0 +1,26 @@
+/* delayacct.h - per-task delay accounting
+ *
+ * Copyright (C) Shailabh Nagar, IBM Corp. 2005
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ */
+
+#ifndef _LINUX_TASKDELAYS_H
+#define _LINUX_TASKDELAYS_H
+
+#include <linux/sched.h>
+
+#ifdef CONFIG_TASK_DELAY_ACCT
+extern int delayacct_on; /* Delay accounting turned on/off */
+extern void delayacct_tsk_init(struct task_struct *tsk);
+#else
+static inline void delayacct_tsk_init(struct task_struct *tsk)
+{}
+#endif /* CONFIG_TASK_DELAY_ACCT */
+#endif /* _LINUX_TASKDELAYS_H */
Index: linux-2.6.15-rc5/kernel/sysctl.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/sysctl.c
+++ linux-2.6.15-rc5/kernel/sysctl.c
@@ -124,6 +124,10 @@ extern int sysctl_hz_timer;
extern int acct_parm[];
#endif

+#ifdef CONFIG_TASK_DELAY_ACCT
+extern int delayacct_on;
+#endif
+
int randomize_va_space = 1;

static int parse_table(int __user *, int, void __user *, size_t __user *, void __user *, size_t,
@@ -656,6 +660,16 @@ static ctl_table kern_table[] = {
.proc_handler = &proc_dointvec,
},
#endif
+#if defined(CONFIG_TASK_DELAY_ACCT)
+ {
+ .ctl_name = KERN_TASK_DELAY_ACCT,
+ .procname = "delayacct",
+ .data = &delayacct_on,
+ .maxlen = sizeof (int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+#endif
{ .ctl_name = 0 }
};

Index: linux-2.6.15-rc5/include/linux/sysctl.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/sysctl.h
+++ linux-2.6.15-rc5/include/linux/sysctl.h
@@ -146,6 +146,7 @@ enum
KERN_RANDOMIZE=68, /* int: randomize virtual address space */
KERN_SETUID_DUMPABLE=69, /* int: behaviour of dumps for setuid core */
KERN_SPIN_RETRY=70, /* int: number of spinlock retries */
+ KERN_TASK_DELAY_ACCT=71, /* turn task delay accounting on/off */
};


Index: linux-2.6.15-rc5/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.15-rc5.orig/Documentation/kernel-parameters.txt
+++ linux-2.6.15-rc5/Documentation/kernel-parameters.txt
@@ -410,6 +410,8 @@ running once the system is up.
Format: <area>[,<node>]
See also Documentation/networking/decnet.txt.

+ delayacct [KNL] Enable per-task delay accounting
+
devfs= [DEVFS]
See Documentation/filesystems/devfs/boot-options.

Index: linux-2.6.15-rc5/kernel/Makefile
===================================================================
--- linux-2.6.15-rc5.orig/kernel/Makefile
+++ linux-2.6.15-rc5/kernel/Makefile
@@ -32,6 +32,7 @@ obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
obj-$(CONFIG_SECCOMP) += seccomp.o
obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
+obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o

ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
# According to Alan Modra <[email protected]>, the -fno-omit-frame-pointer is
Index: linux-2.6.15-rc5/kernel/delayacct.c
===================================================================
--- /dev/null
+++ linux-2.6.15-rc5/kernel/delayacct.c
@@ -0,0 +1,36 @@
+/* delayacct.c - per-task delay accounting
+ *
+ * Copyright (C) Shailabh Nagar, IBM Corp. 2005
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ */
+
+#include <linux/sched.h>
+
+int delayacct_on; /* Delay accounting turned on/off */
+
+int __init delayacct_setup_enable(char *str)
+{
+ delayacct_on = 1;
+ return 1;
+}
+__setup("delayacct", delayacct_setup_enable);
+
+inline void delayacct_tsk_init(struct task_struct *tsk)
+{
+ memset(&tsk->delays, 0, sizeof(tsk->delays));
+ spin_lock_init(&tsk->delays.lock);
+}
+
+static int __init delayacct_init(void)
+{
+ delayacct_tsk_init(&init_task);
+ return 0;
+}
+core_initcall(delayacct_init);

2005-12-07 22:23:21

by Shailabh Nagar

Subject: [RFC][Patch 3/5] Per-task delay accounting: Sync block I/O delays

This patch attempts to record all the time spent by a task
waiting for completion of (user-initiated) block I/O. Ideally, it
would have been nice to be able to record the time spent by a task
waiting for I/O events that are related to async block I/O. While
that can be done now (by measuring time spent in wait_for_async_kiocb),
once (if?) network aio is implemented it won't, AFAIK, be possible
to distinguish async block and network aio events (and I suspect async
I/O to pipes too...), so async block I/O gets ignored for now.

Suggestions on how async block I/O wait can be accounted accurately would
be welcome.




Changes since 11/14/05

- use nanosecond resolution, adjusted wall clock time for timestamps
instead of sched_clock (akpm, andi, marcelo)
- collect stats only if delay accounting enabled (parag)
- stats collected for delays in all userspace-initiated block I/O
including fsync/fdatasync but not counting waits for async block io events.

11/14/05: First post


delayacct-blkio.patch

Record time spent by a task waiting for completion of
userspace initiated synchronous block I/O. This can help
determine the right I/O priority for the task.
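
As a rough userspace model of the bookkeeping described above (a sketch only,
not the kernel code): each statistic is a delay/count pair, and an interval is
added only while accounting is switched on. The names delay_stat,
accounting_on, span_ns() and delay_record() are hypothetical. The kernel
version takes a per-task spinlock around the update; this single-threaded
sketch omits locking.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical userspace stand-in for one delay/count pair. */
struct delay_stat {
	uint64_t delay_ns;	/* total time spent waiting */
	uint32_t count;		/* number of waits recorded */
};

static int accounting_on;	/* models the global on/off switch */

/* Nanoseconds from s to e, clamped to zero if e is earlier than s. */
static uint64_t span_ns(const struct timespec *s, const struct timespec *e)
{
	int64_t d = ((int64_t)e->tv_sec - s->tv_sec) * 1000000000LL
		    + (e->tv_nsec - s->tv_nsec);
	return d < 0 ? 0 : (uint64_t)d;
}

/* Add one wait interval to the stat, but only while accounting is enabled. */
static void delay_record(struct delay_stat *st,
			 const struct timespec *start,
			 const struct timespec *end)
{
	assert(st && start && end);
	if (!accounting_on)
		return;
	st->delay_ns += span_ns(start, end);
	st->count++;
}
```

A caller would take one timestamp before the blocking call and one after it,
then pass both to delay_record(), mirroring the getnstimestamp() pairs in the
hunks below.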

Signed-off-by: Shailabh Nagar <[email protected]>

fs/buffer.c | 6 ++++++
fs/read_write.c | 10 +++++++++-
include/linux/delayacct.h | 4 ++++
include/linux/sched.h | 2 ++
kernel/delayacct.c | 31 +++++++++++++++++++++++++++++++
mm/filemap.c | 10 +++++++++-
mm/memory.c | 17 +++++++++++++++--
7 files changed, 76 insertions(+), 4 deletions(-)

Index: linux-2.6.15-rc5/include/linux/sched.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/sched.h
+++ linux-2.6.15-rc5/include/linux/sched.h
@@ -546,6 +546,8 @@ struct task_delay_info {
spinlock_t lock;

/* Add stats in pairs: uint64_t delay, uint32_t count */
+ uint64_t blkio_delay; /* wait for sync block io completion */
+ uint32_t blkio_count;
};
#endif

Index: linux-2.6.15-rc5/fs/read_write.c
===================================================================
--- linux-2.6.15-rc5.orig/fs/read_write.c
+++ linux-2.6.15-rc5/fs/read_write.c
@@ -14,6 +14,8 @@
#include <linux/security.h>
#include <linux/module.h>
#include <linux/syscalls.h>
+#include <linux/time.h>
+#include <linux/delayacct.h>

#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -224,8 +226,14 @@ ssize_t do_sync_read(struct file *filp,
(ret = filp->f_op->aio_read(&kiocb, buf, len, kiocb.ki_pos)))
wait_on_retry_sync_kiocb(&kiocb);

- if (-EIOCBQUEUED == ret)
+ if (-EIOCBQUEUED == ret) {
+ __attribute__((unused)) struct timespec start, end;
+
+ getnstimestamp(&start);
ret = wait_on_sync_kiocb(&kiocb);
+ getnstimestamp(&end);
+ delayacct_blkio(&start, &end);
+ }
*ppos = kiocb.ki_pos;
return ret;
}
Index: linux-2.6.15-rc5/mm/filemap.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/filemap.c
+++ linux-2.6.15-rc5/mm/filemap.c
@@ -28,6 +28,8 @@
#include <linux/blkdev.h>
#include <linux/security.h>
#include <linux/syscalls.h>
+#include <linux/time.h>
+#include <linux/delayacct.h>
#include "filemap.h"
/*
* FIXME: remove all knowledge of the buffer layer from the core VM
@@ -1062,8 +1064,14 @@ generic_file_read(struct file *filp, cha

init_sync_kiocb(&kiocb, filp);
ret = __generic_file_aio_read(&kiocb, &local_iov, 1, ppos);
- if (-EIOCBQUEUED == ret)
+ if (-EIOCBQUEUED == ret) {
+ __attribute__((unused)) struct timespec start, end;
+
+ getnstimestamp(&start);
ret = wait_on_sync_kiocb(&kiocb);
+ getnstimestamp(&end);
+ delayacct_blkio(&start, &end);
+ }
return ret;
}

Index: linux-2.6.15-rc5/mm/memory.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/memory.c
+++ linux-2.6.15-rc5/mm/memory.c
@@ -48,6 +48,8 @@
#include <linux/rmap.h>
#include <linux/module.h>
#include <linux/init.h>
+#include <linux/time.h>
+#include <linux/delayacct.h>

#include <asm/pgalloc.h>
#include <asm/uaccess.h>
@@ -2200,11 +2202,22 @@ static inline int handle_pte_fault(struc
old_entry = entry = *pte;
if (!pte_present(entry)) {
if (pte_none(entry)) {
+ int ret;
+ __attribute__((unused)) struct timespec start, end;
+
if (!vma->vm_ops || !vma->vm_ops->nopage)
return do_anonymous_page(mm, vma, address,
pte, pmd, write_access);
- return do_no_page(mm, vma, address,
- pte, pmd, write_access);
+
+ if (vma->vm_file)
+ getnstimestamp(&start);
+ ret = do_no_page(mm, vma, address,
+ pte, pmd, write_access);
+ if (vma->vm_file) {
+ getnstimestamp(&end);
+ delayacct_blkio(&start, &end);
+ }
+ return ret;
}
if (pte_file(entry))
return do_file_page(mm, vma, address,
Index: linux-2.6.15-rc5/fs/buffer.c
===================================================================
--- linux-2.6.15-rc5.orig/fs/buffer.c
+++ linux-2.6.15-rc5/fs/buffer.c
@@ -41,6 +41,8 @@
#include <linux/bitops.h>
#include <linux/mpage.h>
#include <linux/bit_spinlock.h>
+#include <linux/time.h>
+#include <linux/delayacct.h>

static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
static void invalidate_bh_lrus(void);
@@ -337,6 +339,7 @@ static long do_fsync(unsigned int fd, in
struct file * file;
struct address_space *mapping;
int ret, err;
+ __attribute__((unused)) struct timespec start, end;

ret = -EBADF;
file = fget(fd);
@@ -349,6 +352,7 @@ static long do_fsync(unsigned int fd, in
goto out_putf;
}

+ getnstimestamp(&start);
mapping = file->f_mapping;

current->flags |= PF_SYNCWRITE;
@@ -371,6 +375,8 @@ static long do_fsync(unsigned int fd, in
out_putf:
fput(file);
out:
+ getnstimestamp(&end);
+ delayacct_blkio(&start, &end);
return ret;
}

Index: linux-2.6.15-rc5/include/linux/delayacct.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/delayacct.h
+++ linux-2.6.15-rc5/include/linux/delayacct.h
@@ -19,8 +19,12 @@
#ifdef CONFIG_TASK_DELAY_ACCT
extern int delayacct_on; /* Delay accounting turned on/off */
extern void delayacct_tsk_init(struct task_struct *tsk);
+extern void delayacct_blkio(struct timespec *start, struct timespec *end);
#else
static inline void delayacct_tsk_init(struct task_struct *tsk)
{}
+static inline void delayacct_blkio(struct timespec *start, struct timespec *end)
+{}
+
#endif /* CONFIG_TASK_DELAY_ACCT */
#endif /* _LINUX_TASKDELAYS_H */
Index: linux-2.6.15-rc5/kernel/delayacct.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/delayacct.c
+++ linux-2.6.15-rc5/kernel/delayacct.c
@@ -12,6 +12,7 @@
*/

#include <linux/sched.h>
+#include <linux/time.h>

int delayacct_on; /* Delay accounting turned on/off */

@@ -34,3 +35,33 @@ static int __init delayacct_init(void)
return 0;
}
core_initcall(delayacct_init);
+
+inline void delayacct_blkio(struct timespec *start, struct timespec *end)
+{
+ unsigned long long delay;
+
+ if (!delayacct_on)
+ return;
+
+ delay = timespec_nsdiff(start, end);
+
+ spin_lock(&current->delays.lock);
+ current->delays.blkio_delay += delay;
+ current->delays.blkio_count++;
+ spin_unlock(&current->delays.lock);
+}
+
+inline void delayacct_swapin(struct timespec *start, struct timespec *end)
+{
+ unsigned long long delay;
+
+ if (!delayacct_on)
+ return;
+
+ delay = timespec_nsdiff(start, end);
+
+ spin_lock(&current->delays.lock);
+ current->delays.swapin_delay += delay;
+ current->delays.swapin_count++;
+ spin_unlock(&current->delays.lock);
+}

2005-12-07 22:28:36

by Shailabh Nagar

Subject: [RFC][Patch 4/5] Per-task delay accounting: Swap in delays

Changes since 11/14/05

- use nanosecond resolution, adjusted wall clock time for timestamps
instead of sched_clock (akpm, andi, marcelo)
- collect stats only if delay accounting enabled (parag)
- collect delays for only swapin page faults instead of all page faults.

11/14/05: First post


delayacct-swapin.patch

Record time spent by a task waiting for its pages to be swapped in.
This statistic can help in adjusting the rss limits of
tasks (processes), especially relative to each other, when the system is
under memory pressure.

Signed-off-by: Shailabh Nagar <[email protected]>

include/linux/delayacct.h | 3 +++
include/linux/sched.h | 2 ++
mm/memory.c | 16 +++++++++-------
3 files changed, 14 insertions(+), 7 deletions(-)

Index: linux-2.6.15-rc5/include/linux/delayacct.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/delayacct.h
+++ linux-2.6.15-rc5/include/linux/delayacct.h
@@ -20,11 +20,14 @@
extern int delayacct_on; /* Delay accounting turned on/off */
extern void delayacct_tsk_init(struct task_struct *tsk);
extern void delayacct_blkio(struct timespec *start, struct timespec *end);
+extern void delayacct_swapin(struct timespec *start, struct timespec *end);
#else
static inline void delayacct_tsk_init(struct task_struct *tsk)
{}
static inline void delayacct_blkio(struct timespec *start, struct timespec *end)
{}
+static inline void delayacct_swapin(struct timespec *start, struct timespec *end)
+{}

#endif /* CONFIG_TASK_DELAY_ACCT */
#endif /* _LINUX_TASKDELAYS_H */
Index: linux-2.6.15-rc5/include/linux/sched.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/sched.h
+++ linux-2.6.15-rc5/include/linux/sched.h
@@ -548,6 +548,8 @@ struct task_delay_info {
/* Add stats in pairs: uint64_t delay, uint32_t count */
uint64_t blkio_delay; /* wait for sync block io completion */
uint32_t blkio_count;
+ uint64_t swapin_delay; /* wait for pages to be swapped in */
+ uint32_t swapin_count;
};
#endif

Index: linux-2.6.15-rc5/mm/memory.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/memory.c
+++ linux-2.6.15-rc5/mm/memory.c
@@ -2201,16 +2201,15 @@ static inline int handle_pte_fault(struc

old_entry = entry = *pte;
if (!pte_present(entry)) {
- if (pte_none(entry)) {
- int ret;
- __attribute__((unused)) struct timespec start, end;
+ int ret;
+ __attribute__((unused)) struct timespec start, end;

+ getnstimestamp(&start);
+ if (pte_none(entry)) {
if (!vma->vm_ops || !vma->vm_ops->nopage)
return do_anonymous_page(mm, vma, address,
pte, pmd, write_access);

- if (vma->vm_file)
- getnstimestamp(&start);
ret = do_no_page(mm, vma, address,
pte, pmd, write_access);
if (vma->vm_file) {
@@ -2222,8 +2221,11 @@ static inline int handle_pte_fault(struc
if (pte_file(entry))
return do_file_page(mm, vma, address,
pte, pmd, write_access, entry);
- return do_swap_page(mm, vma, address,
- pte, pmd, write_access, entry);
+ ret = do_swap_page(mm, vma, address,
+ pte, pmd, write_access, entry);
+ getnstimestamp(&end);
+ delayacct_swapin(&start, &end);
+ return ret;
}

ptl = pte_lockptr(mm, pmd);

2005-12-07 22:30:00

by Shailabh Nagar

Subject: [RFC][Patch 5/5] Per-task delay accounting: procfs interface

Creates a /proc/<pid>/delay interface for getting per-task
delay statistics (time spent by a task waiting for cpu,
sync block I/O completion, swapping in pages etc.). The cpu
stats are available only if CONFIG_SCHEDSTATS is enabled.

The interface allows a task's delay stats (excluding cpu)
to be reset to zero. This is particularly useful if
delay accounting is being turned on/off dynamically.
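
To give an idea of how a userspace tool might consume this file, here is a
minimal parsing sketch. It assumes the single-line, space-separated layout
produced by the snprintf() in delayacct_task_read() in this patch; the struct
and its field names (delay_sample, cpu_time and so on) are invented for the
example, and the layout may well change in later revisions.

```c
#include <assert.h>
#include <stdio.h>

/* Field names are illustrative; the order follows the snprintf() in the patch. */
struct delay_sample {
	unsigned long run_count;
	unsigned long long cpu_time;	/* current_sched_time() value */
	unsigned long long run_delay;
	unsigned int blkio_count;
	unsigned long long blkio_delay;
	unsigned int swapin_count;
	unsigned long long swapin_delay;
};

/* Parse one line; returns 0 on success, -1 if any field is missing. */
static int delay_parse(const char *line, struct delay_sample *out)
{
	int n;

	assert(line && out);
	n = sscanf(line, "%lu %llu %llu %u %llu %u %llu",
		   &out->run_count, &out->cpu_time, &out->run_delay,
		   &out->blkio_count, &out->blkio_delay,
		   &out->swapin_count, &out->swapin_delay);
	return n == 7 ? 0 : -1;
}
```

A tool would read the line from /proc/<pid>/delay with fgets() and hand it to
delay_parse(); writing "0" to the same file resets the non-cpu stats, as
described above.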

Signed-off-by: Shailabh Nagar <[email protected]>

fs/proc/base.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/delayacct.h | 6 ++++
kernel/delayacct.c | 33 +++++++++++++++++++++++
3 files changed, 104 insertions(+)

Index: linux-2.6.15-rc5/fs/proc/base.c
===================================================================
--- linux-2.6.15-rc5.orig/fs/proc/base.c
+++ linux-2.6.15-rc5/fs/proc/base.c
@@ -71,6 +71,8 @@
#include <linux/cpuset.h>
#include <linux/audit.h>
#include <linux/poll.h>
+#include <linux/delayacct.h>
+#include <linux/kernel.h>
#include "internal.h"

/*
@@ -166,6 +168,10 @@ enum pid_directory_inos {
PROC_TID_OOM_SCORE,
PROC_TID_OOM_ADJUST,

+#ifdef CONFIG_TASK_DELAY_ACCT
+ PROC_TID_DELAY_ACCT,
+ PROC_TGID_DELAY_ACCT,
+#endif
/* Add new entries before this */
PROC_TID_FD_DIR = 0x8000, /* 0x8000-0xffff */
};
@@ -220,6 +226,9 @@ static struct pid_entry tgid_base_stuff[
#ifdef CONFIG_AUDITSYSCALL
E(PROC_TGID_LOGINUID, "loginuid", S_IFREG|S_IWUSR|S_IRUGO),
#endif
+#ifdef CONFIG_TASK_DELAY_ACCT
+ E(PROC_TGID_DELAY_ACCT,"delay", S_IFREG|S_IRUGO),
+#endif
{0,0,NULL,0}
};
static struct pid_entry tid_base_stuff[] = {
@@ -262,6 +271,9 @@ static struct pid_entry tid_base_stuff[]
#ifdef CONFIG_AUDITSYSCALL
E(PROC_TID_LOGINUID, "loginuid", S_IFREG|S_IWUSR|S_IRUGO),
#endif
+#ifdef CONFIG_TASK_DELAY_ACCT
+ E(PROC_TID_DELAY_ACCT,"delay", S_IFREG|S_IRUGO),
+#endif
{0,0,NULL,0}
};

@@ -1066,6 +1078,53 @@ static struct file_operations proc_secco
};
#endif /* CONFIG_SECCOMP */

+#ifdef CONFIG_TASK_DELAY_ACCT
+ssize_t proc_delayacct_write(struct file *file, const char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ struct task_struct *tsk = proc_task(file->f_dentry->d_inode);
+ char kbuf[DELAYACCT_PROC_MAX_WRITE + 1];
+ int cmd, ret;
+
+ if (count > DELAYACCT_PROC_MAX_WRITE)
+ return -EINVAL;
+ if (copy_from_user(&kbuf, buffer, count))
+ return -EFAULT;
+
+ cmd = simple_strtoul(kbuf, NULL, 10);
+ ret = delayacct_task_write(tsk, cmd);
+
+ if (ret)
+ return ret;
+ return count;
+}
+
+ssize_t proc_delayacct_read(struct file *file, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ struct task_struct *tsk = proc_task(file->f_dentry->d_inode);
+ char kbuf[DELAYACCT_PROC_MAX_READ + 1];
+ size_t len;
+ loff_t __ppos = *ppos;
+
+ len = delayacct_task_read(tsk, kbuf);
+
+ if (__ppos >= len)
+ return 0;
+ if (count > len-__ppos)
+ count = len-__ppos;
+ if (copy_to_user(buffer, kbuf + __ppos, count))
+ return -EFAULT;
+ *ppos = __ppos + count;
+ return count;
+}
+
+static struct file_operations proc_delayacct_operations = {
+ .read = proc_delayacct_read,
+ .write = proc_delayacct_write,
+};
+#endif
+
static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
{
struct inode *inode = dentry->d_inode;
@@ -1786,6 +1845,12 @@ static struct dentry *proc_pident_lookup
inode->i_fop = &proc_loginuid_operations;
break;
#endif
+#ifdef CONFIG_TASK_DELAY_ACCT
+ case PROC_TID_DELAY_ACCT:
+ case PROC_TGID_DELAY_ACCT:
+ inode->i_fop = &proc_delayacct_operations;
+ break;
+#endif
default:
printk("procfs: impossible type (%d)",p->type);
iput(inode);
Index: linux-2.6.15-rc5/include/linux/delayacct.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/delayacct.h
+++ linux-2.6.15-rc5/include/linux/delayacct.h
@@ -16,11 +16,17 @@

#include <linux/sched.h>

+/* Maximum data that a user can read/write from/to /proc/<tgid>/delay */
+#define DELAYACCT_PROC_MAX_READ 256
+#define DELAYACCT_PROC_MAX_WRITE 8
+
#ifdef CONFIG_TASK_DELAY_ACCT
extern int delayacct_on; /* Delay accounting turned on/off */
extern void delayacct_tsk_init(struct task_struct *tsk);
extern void delayacct_blkio(struct timespec *start, struct timespec *end);
extern void delayacct_swapin(struct timespec *start, struct timespec *end);
+extern int delayacct_task_write(struct task_struct *tsk, int cmd);
+extern size_t delayacct_task_read(struct task_struct *tsk, char *buf);
#else
static inline void delayacct_tsk_init(struct task_struct *tsk)
{}
Index: linux-2.6.15-rc5/kernel/delayacct.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/delayacct.c
+++ linux-2.6.15-rc5/kernel/delayacct.c
@@ -13,6 +13,7 @@

#include <linux/sched.h>
#include <linux/time.h>
+#include <linux/delayacct.h>

int delayacct_on; /* Delay accounting turned on/off */

@@ -65,3 +66,35 @@ inline void delayacct_swapin(struct time
current->delays.swapin_count++;
spin_unlock(&current->delays.lock);
}
+
+/* User writes @cmd to /proc/<tgid>/delay */
+inline int delayacct_task_write(struct task_struct *tsk, int cmd)
+{
+ if (cmd == 0) {
+ spin_lock(&tsk->delays.lock);
+ memset(&tsk->delays, 0, sizeof(tsk->delays));
+ spin_unlock(&tsk->delays.lock);
+ }
+ return 0;
+}
+
+/* User reads from /proc/<tgid>/delay */
+inline size_t delayacct_task_read(struct task_struct *tsk, char *buf)
+{
+ unsigned long long run_delay = 0;
+ unsigned long run_count = 0;
+
+#ifdef CONFIG_SCHEDSTATS
+ run_delay = jiffies_to_usecs(tsk->sched_info.run_delay) * 1000;
+ run_count = tsk->sched_info.pcnt ;
+#endif
+ return snprintf(buf, DELAYACCT_PROC_MAX_READ,
+ "%lu %llu %llu %u %llu %u %llu\n",
+ run_count,
+ (uint64_t) current_sched_time(tsk),
+ (uint64_t) run_delay,
+ (unsigned int) tsk->delays.blkio_count,
+ (uint64_t) tsk->delays.blkio_delay,
+ (unsigned int) tsk->delays.swapin_count,
+ (uint64_t) tsk->delays.swapin_delay);
+}

2005-12-07 22:34:19

by Dave Hansen

Subject: Re: [ckrm-tech] [RFC][Patch 3/5] Per-task delay accounting: Sync block I/O delays

On Wed, 2005-12-07 at 22:23 +0000, Shailabh Nagar wrote:
>
> + if (-EIOCBQUEUED == ret) {
> + __attribute__((unused)) struct timespec start, end;
> +

Those "unused" things suck. They're really ugly.

Doesn't making your delay functions into static inlines make the unused
warnings go away?

-- Dave

2005-12-07 23:06:45

by Shailabh Nagar

Subject: Re: [ckrm-tech] [RFC][Patch 3/5] Per-task delay accounting: Sync block I/O delays

Dave Hansen wrote:
> On Wed, 2005-12-07 at 22:23 +0000, Shailabh Nagar wrote:
>
>>+ if (-EIOCBQUEUED == ret) {
>>+ __attribute__((unused)) struct timespec start, end;
>>+
>
>
> Those "unused" things suck. They're really ugly.
>
> Doesn't making your delay functions into static inlines make the unused
> warnings go away?

They do indeed. Thanks !
It was a holdover from when the delay funcs were macros. Will fix everywhere.

--Shailabh


>
> -- Dave

2005-12-12 18:50:54

by Christoph Lameter

Subject: Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs


On Wed, 7 Dec 2005, Shailabh Nagar wrote:

> +void getnstimestamp(struct timespec *ts)

There is already getnstimeofday in the kernel.

2005-12-12 19:31:52

by Shailabh Nagar

Subject: Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs

Christoph Lameter wrote:
> On Wed, 7 Dec 2005, Shailabh Nagar wrote:
>
>
>>+void getnstimestamp(struct timespec *ts)
>
>
> There is already getnstimeofday in the kernel.
>
>

Yes, and that function is being used within the getnstimestamp() being proposed.
However, John Stultz had advised that getnstimeofday could get affected by calls to
settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.

John, could you elaborate ?

Thanks,
Shailabh

2005-12-12 19:49:32

by john stultz

Subject: Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs

On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
> Christoph Lameter wrote:
> > On Wed, 7 Dec 2005, Shailabh Nagar wrote:
> >
> >
> >>+void getnstimestamp(struct timespec *ts)
> >
> >
> > There is already getnstimeofday in the kernel.
> >
>
> Yes, and that function is being used within the getnstimestamp() being proposed.
> However, John Stultz had advised that getnstimeofday could get affected by calls to
> settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
>
> John, could you elaborate ?

I think you pretty well have it covered.

getnstimeofday + wall_to_monotonic should be higher-res and more
reliable (than TSC-based sched_clock(), for example) for getting a
timestamp.

There may be performance concerns, as you have to access the clock
hardware in getnstimeofday(), but there really is no other way to get
reliable, fine-grained, monotonically increasing timestamps.
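
A user-space analogue of the distinction, as a sketch only: POSIX exposes the two clocks as CLOCK_REALTIME (wall time, which can be stepped) and CLOCK_MONOTONIC (which cannot be set). The helper clock_now_ns() is a hypothetical name for this example and says nothing about how the in-kernel functions are implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/*
 * Read the given POSIX clock and return it as nanoseconds,
 * or -1 if the clock cannot be read.
 */
static long long clock_now_ns(clockid_t which)
{
    struct timespec ts;

    if (clock_gettime(which, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Two successive reads of CLOCK_MONOTONIC never go backwards, which is what makes interval measurements taken with it safe against settimeofday(); the same is not guaranteed for CLOCK_REALTIME.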

thanks
-john

2005-12-12 20:00:50

by Shailabh Nagar

Subject: Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs

john stultz wrote:
> On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
>
>>Christoph Lameter wrote:
>>
>>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
>>>
>>>
>>>
>>>>+void getnstimestamp(struct timespec *ts)
>>>
>>>
>>>There is already getnstimeofday in the kernel.
>>>
>>
>>Yes, and that function is being used within the getnstimestamp() being proposed.
>>However, John Stultz had advised that getnstimeofday could get affected by calls to
>>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
>>
>>John, could you elaborate ?
>
>
> I think you pretty well have it covered.
>
> getnstimeofday + wall_to_monotonic should be higher-res and more
> reliable (then TSC based sched_clock(), for example) for getting a
> timestamp.
>
> There may be performance concerns as you have to access the clock
> hardware in getnstimeofday(), but there really is no other way for
> reliable finely grained monotonically increasing timestamps.
>
> thanks
> -john

Thanks, that clarifies it. I guess the other underlying question is whether these
improvements (in resolution and reliability) should go into getnstimeofday()
itself rather than into a new function. Or is it better to leave
getnstimeofday() as it is?

Thanks,
Shailabh

2005-12-12 20:07:22

by john stultz

Subject: Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs

On Mon, 2005-12-12 at 20:00 +0000, Shailabh Nagar wrote:
> john stultz wrote:
> > On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
> >
> >>Christoph Lameter wrote:
> >>
> >>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
> >>>>+void getnstimestamp(struct timespec *ts)
> >>>
> >>>There is already getnstimeofday in the kernel.
> >>
> >>Yes, and that function is being used within the getnstimestamp() being proposed.
> >>However, John Stultz had advised that getnstimeofday could get affected by calls to
> >>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
> >>
> >>John, could you elaborate ?
> >
> > I think you pretty well have it covered.
> >
> > getnstimeofday + wall_to_monotonic should be higher-res and more
> > reliable (then TSC based sched_clock(), for example) for getting a
> > timestamp.
> >
> > There may be performance concerns as you have to access the clock
> > hardware in getnstimeofday(), but there really is no other way for
> > reliable finely grained monotonically increasing timestamps.
> >

> Thanks, that clarifies. I guess the other underlying concern here would be whether these
> improvements (in resolution and reliability) should be going into getnstimeofday()
> itself (rather than creating a new func for the same) ? Or is it better to leave
> getnstimeofday as it is ?

No, getnstimeofday() is very much needed to get a nanosecond-grained
wall-time clock, so a new function is needed for the monotonic clock.

In my timeofday rework I have used the names "get_monotonic_clock()" and
"get_monotonic_clock_ts()" for basically the same functionality
(providing a ktime and a timespec, respectively). You might consider
naming it as such, but resolving these naming collisions shouldn't be
too difficult either way.

thanks
-john

2005-12-13 00:55:50

by George Anzinger

Subject: Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs

john stultz wrote:
> On Mon, 2005-12-12 at 20:00 +0000, Shailabh Nagar wrote:
>
>>john stultz wrote:
>>
>>>On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
>>>
>>>
>>>>Christoph Lameter wrote:
>>>>
>>>>
>>>>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
>>>>>
>>>>>>+void getnstimestamp(struct timespec *ts)
>>>>>
>>>>>There is already getnstimeofday in the kernel.
>>>>
>>>>Yes, and that function is being used within the getnstimestamp() being proposed.
>>>>However, John Stultz had advised that getnstimeofday could get affected by calls to
>>>>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
>>>>
>>>>John, could you elaborate ?
>>>
>>>I think you pretty well have it covered.
>>>
>>>getnstimeofday + wall_to_monotonic should be higher-res and more
>>>reliable (then TSC based sched_clock(), for example) for getting a
>>>timestamp.
>>>
>>>There may be performance concerns as you have to access the clock
>>>hardware in getnstimeofday(), but there really is no other way for
>>>reliable finely grained monotonically increasing timestamps.
>>>
>
>
>>Thanks, that clarifies. I guess the other underlying concern here would be whether these
>>improvements (in resolution and reliability) should be going into getnstimeofday()
>>itself (rather than creating a new func for the same) ? Or is it better to leave
>>getnstimeofday as it is ?
>
>
> No, getnstimeofday() is very much needed to get a nanosecond grained
> wall-time clock, so a new function is needed for the monotonic clock.
>
> In my timeofday re-work I have used the name "get_monotonic_clock()" and
> "get_monotonic_clock_ts()" for basically the same functionality
> (providing a ktime and a timespec respectively). You might consider
> naming it as such, but resolving these naming collisions shouldn't be
> too difficult either way.

Indeed. Let's use a name with "monotonic" in it, please. And
possibly not "clock". How about get_nsmonotonic_time() or some such?


--
George Anzinger
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

2005-12-13 03:48:38

by Nish Aravamudan

Subject: Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs

On 12/12/05, George Anzinger wrote:
> john stultz wrote:
> > On Mon, 2005-12-12 at 20:00 +0000, Shailabh Nagar wrote:
> >
> >>john stultz wrote:
> >>
> >>>On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
> >>>
> >>>
> >>>>Christoph Lameter wrote:
> >>>>
> >>>>
> >>>>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
> >>>>>
> >>>>>>+void getnstimestamp(struct timespec *ts)
> >>>>>
> >>>>>There is already getnstimeofday in the kernel.
> >>>>
> >>>>Yes, and that function is being used within the getnstimestamp() being proposed.
> >>>>However, John Stultz had advised that getnstimeofday could get affected by calls to
> >>>>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
> >>>>
> >>>>John, could you elaborate ?
> >>>
> >>>I think you pretty well have it covered.
> >>>
> >>>getnstimeofday + wall_to_monotonic should be higher-res and more
> >>>reliable (then TSC based sched_clock(), for example) for getting a
> >>>timestamp.
> >>>
> >>>There may be performance concerns as you have to access the clock
> >>>hardware in getnstimeofday(), but there really is no other way for
> >>>reliable finely grained monotonically increasing timestamps.
> >>>
> >
> >
> >>Thanks, that clarifies. I guess the other underlying concern here would be whether these
> >>improvements (in resolution and reliability) should be going into getnstimeofday()
> >>itself (rather than creating a new func for the same) ? Or is it better to leave
> >>getnstimeofday as it is ?
> >
> >
> > No, getnstimeofday() is very much needed to get a nanosecond grained
> > wall-time clock, so a new function is needed for the monotonic clock.
> >
> > In my timeofday re-work I have used the name "get_monotonic_clock()" and
> > "get_monotonic_clock_ts()" for basically the same functionality
> > (providing a ktime and a timespec respectively). You might consider
> > naming it as such, but resolving these naming collisions shouldn't be
> > too difficult either way.
>
> Indeed. Lets use a name with "monotonic" in it, please. And,
> possibly not "clock". How about get_nsmonotonic_time() or some such?

I agree, though as a personal preference I would put the units at the end,
i.e. get_monotonic_time_ns() or get_monotonic_time_nsecs().

Thanks,
Nish

2005-12-13 18:35:23

by Jay Lan

Subject: Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs

john stultz wrote:
> On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
>
>>Christoph Lameter wrote:
>>
>>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
>>>
>>>
>>>
>>>>+void getnstimestamp(struct timespec *ts)
>>>
>>>
>>>There is already getnstimeofday in the kernel.
>>>
>>
>>Yes, and that function is being used within the getnstimestamp() being proposed.
>>However, John Stultz had advised that getnstimeofday could get affected by calls to
>>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
>>
>>John, could you elaborate ?
>
>
> I think you pretty well have it covered.
>
> getnstimeofday + wall_to_monotonic should be higher-res and more
> reliable (then TSC based sched_clock(), for example) for getting a
> timestamp.

How is this proposed function different from
do_posix_clock_monotonic_gettime()?
It calls getnstimeofday() and also adjusts the result with wall_to_monotonic.

It seems to me we just need to EXPORT_SYMBOL_GPL
do_posix_clock_monotonic_gettime()?

Thanks,
- jay

>
> There may be performance concerns as you have to access the clock
> hardware in getnstimeofday(), but there really is no other way for
> reliable finely grained monotonically increasing timestamps.
>
> thanks
> -john
>

2005-12-13 21:16:34

by john stultz

Subject: Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs

On Tue, 2005-12-13 at 10:35 -0800, Jay Lan wrote:
> john stultz wrote:
> > On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
> >>Christoph Lameter wrote:
> >>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
> >>>>+void getnstimestamp(struct timespec *ts)
> >>>
> >>>There is already getnstimeofday in the kernel.
> >>
> >>Yes, and that function is being used within the getnstimestamp() being proposed.
> >>However, John Stultz had advised that getnstimeofday could get affected by calls to
> >>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
> >>
> >>John, could you elaborate ?
> >
> > I think you pretty well have it covered.
> >
> > getnstimeofday + wall_to_monotonic should be higher-res and more
> > reliable (then TSC based sched_clock(), for example) for getting a
> > timestamp.
>
> How is this proposed function different from
> do_posix_clock_monotonic_gettime()?
> It calls getnstimeofday(), it also adjusts with wall_to_monotinic.
>
> It seems to me we just need to EXPORT_SYMBOL_GPL the
> do_posix_clock_monotonic_gettime()?

Indeed, this would be the same.

thanks
-john

2005-12-13 21:44:59

by Shailabh Nagar

Subject: Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs

Jay Lan wrote:
> john stultz wrote:
>
>> On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
>>
>>> Christoph Lameter wrote:
>>>
>>>> On Wed, 7 Dec 2005, Shailabh Nagar wrote:
>>>>
>>>>
>>>>
>>>>> +void getnstimestamp(struct timespec *ts)
>>>>
>>>>
>>>>
>>>> There is already getnstimeofday in the kernel.
>>>>
>>>
>>> Yes, and that function is being used within the getnstimestamp()
>>> being proposed.
>>> However, John Stultz had advised that getnstimeofday could get
>>> affected by calls to
>>> settimeofday and had recommended adjusting the getnstimeofday value
>>> with wall_to_monotonic.
>>>
>>> John, could you elaborate ?
>>
>>
>>
>> I think you pretty well have it covered.
>> getnstimeofday + wall_to_monotonic should be higher-res and more
>> reliable (then TSC based sched_clock(), for example) for getting a
>> timestamp.
>
>
> How is this proposed function different from
> do_posix_clock_monotonic_gettime()?
> It calls getnstimeofday(), it also adjusts with wall_to_monotinic.
>
> It seems to me we just need to EXPORT_SYMBOL_GPL the
> do_posix_clock_monotonic_gettime()?
>
> Thanks,
> - jay
>

Hmmm. Looks like do_posix_clock_monotonic_gettime() will suffice for this patch.

I wonder why the clock parameter to do_posix_clock_monotonic_get is needed?
It doesn't seem to be used.

Is there any possibility of this set of functions changing its behaviour?

-- Shailabh

>>
>> There may be performance concerns as you have to access the clock
>> hardware in getnstimeofday(), but there really is no other way for
>> reliable finely grained monotonically increasing timestamps.
>>
>> thanks
>> -john
>>
>
>

2005-12-13 22:14:22

by George Anzinger

Subject: Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs

Shailabh Nagar wrote:
> Jay Lan wrote:
>
>>john stultz wrote:
>>
>>
>>>On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
>>>
>>>
>>>>Christoph Lameter wrote:
>>>>
>>>>
>>>>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>+void getnstimestamp(struct timespec *ts)
>>>>>
>>>>>
>>>>>
>>>>>There is already getnstimeofday in the kernel.
>>>>>
>>>>
>>>>Yes, and that function is being used within the getnstimestamp()
>>>>being proposed.
>>>>However, John Stultz had advised that getnstimeofday could get
>>>>affected by calls to
>>>>settimeofday and had recommended adjusting the getnstimeofday value
>>>>with wall_to_monotonic.
>>>>
>>>>John, could you elaborate ?
>>>
>>>
>>>
>>>I think you pretty well have it covered.
>>>getnstimeofday + wall_to_monotonic should be higher-res and more
>>>reliable (then TSC based sched_clock(), for example) for getting a
>>>timestamp.
>>
>>
>>How is this proposed function different from
>>do_posix_clock_monotonic_gettime()?
>>It calls getnstimeofday(), it also adjusts with wall_to_monotinic.
>>
>>It seems to me we just need to EXPORT_SYMBOL_GPL the
>>do_posix_clock_monotonic_gettime()?
>>
>>Thanks,
>> - jay
>>
>
>
> Hmmm. Looks like do_posix_clock_monotonic_gettime will suffice for this patch.
>
> Wonder why the clock parameter to do_posix_clock_monotonic_get is needed ?

Because it is called indirectly by the table-driven posix clocks and
timers code, where the clock is usually needed.
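
A minimal sketch of that pattern, for illustration only: every entry in a dispatch table must share one signature, so a handler that does not care which clock it serves still receives the clock id. All names here (demo_clockid, demo_clock_table, and so on) and the fixed return values are hypothetical and are not the kernel's posix-timers structures.

```c
#include <stddef.h>
#include <time.h>

typedef int demo_clockid;   /* hypothetical clock identifier */
typedef int (*demo_get_fn)(demo_clockid which, struct timespec *tp);

/* Handlers for a single clock: the id parameter is simply ignored. */
static int demo_realtime_get(demo_clockid which, struct timespec *tp)
{
    (void)which;
    tp->tv_sec = 1000;      /* fixed demo value, not a real clock read */
    tp->tv_nsec = 0;
    return 0;
}

static int demo_monotonic_get(demo_clockid which, struct timespec *tp)
{
    (void)which;
    tp->tv_sec = 10;        /* fixed demo value, not a real clock read */
    tp->tv_nsec = 0;
    return 0;
}

/* One table, one signature; the index doubles as the clock id. */
static const demo_get_fn demo_clock_table[] = {
    demo_realtime_get,      /* id 0 */
    demo_monotonic_get,     /* id 1 */
};

static int demo_clock_gettime(demo_clockid which, struct timespec *tp)
{
    size_t n = sizeof(demo_clock_table) / sizeof(demo_clock_table[0]);

    if (which < 0 || (size_t)which >= n)
        return -1;          /* unknown clock id */
    return demo_clock_table[which](which, tp);
}
```

A caller outside the table can invoke demo_monotonic_get() directly and pass any id, which mirrors the situation where the parameter looks unused.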

> Doesn't seem to be used.
>
> Any possibility of these set of functions changing their behaviour ?

Always :), but things are pretty stable now. Might want to add a
comment that it is being used outside of the posix "box".


--
George Anzinger
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

2005-12-13 23:11:53

by Matt Helsley

Subject: Re: [ckrm-tech] Re: [Lse-tech] [RFC][Patch 1/5] nanosecond timestamps and diffs

On Tue, 2005-12-13 at 10:35 -0800, Jay Lan wrote:
> john stultz wrote:
> > On Mon, 2005-12-12 at 19:31 +0000, Shailabh Nagar wrote:
> >
> >>Christoph Lameter wrote:
> >>
> >>>On Wed, 7 Dec 2005, Shailabh Nagar wrote:
> >>>
> >>>
> >>>
> >>>>+void getnstimestamp(struct timespec *ts)
> >>>
> >>>
> >>>There is already getnstimeofday in the kernel.
> >>>
> >>
> >>Yes, and that function is being used within the getnstimestamp() being proposed.
> >>However, John Stultz had advised that getnstimeofday could get affected by calls to
> >>settimeofday and had recommended adjusting the getnstimeofday value with wall_to_monotonic.
> >>
> >>John, could you elaborate ?
> >
> >
> > I think you pretty well have it covered.
> >
> > getnstimeofday + wall_to_monotonic should be higher-res and more
> > reliable (then TSC based sched_clock(), for example) for getting a
> > timestamp.
>
> How is this proposed function different from
> do_posix_clock_monotonic_gettime()?
> It calls getnstimeofday(), it also adjusts with wall_to_monotinic.
>
> It seems to me we just need to EXPORT_SYMBOL_GPL the
> do_posix_clock_monotonic_gettime()?
>
> Thanks,
> - jay

Ah, yes. I should've searched for gettime rather than gettimeofday when
I was looking for a suitable function.

Two minor differences exist:

1) getnstimestamp does not fetch an unused copy of jiffies_64
2) getnstimestamp uses and advertises an explicit maximum resolution

I don't think either of these really matters, so I'll post a series of
patches:

1) EXPORTing (_SYMBOL_GPL) do_posix_clock_monotonic_gettime()
2) using do_posix_clock_monotonic_gettime() as a timestamp
3) removing getnstimestamp()
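
For readers following along, here is a user-space sketch of how a pair of such timestamps might be turned into an accumulated delay. The struct delay_stat and the helpers timespec_diff_ns() and delay_account() are hypothetical names invented for this example, not the interfaces in the patches.

```c
#include <time.h>

#define DEMO_NS_PER_SEC 1000000000LL

/* Hypothetical per-task accumulator: number of delays and their sum. */
struct delay_stat {
    unsigned long count;
    long long total_ns;
};

/* Difference end - start in nanoseconds; handles the nanosecond borrow. */
static long long timespec_diff_ns(const struct timespec *start,
                                  const struct timespec *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * DEMO_NS_PER_SEC
           + (end->tv_nsec - start->tv_nsec);
}

/* Record one delay interval; negative intervals are ignored defensively. */
static void delay_account(struct delay_stat *st,
                          const struct timespec *start,
                          const struct timespec *end)
{
    long long d = timespec_diff_ns(start, end);

    if (d < 0)
        return;
    st->count++;
    st->total_ns += d;
}
```

With monotonic timestamps the negative case should not arise, which is part of the argument for not using wall time here.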

Thanks,
-Matt Helsley