2012-11-23 14:21:34

by Frederic Weisbecker

Subject: [PATCH 0/3] cputime: Cleanups on adjusted cputime code

Hi,

Nothing vtime related this time, just a few supplementary cleanups on the
part that computes the adjusted cputime values.

Thanks.

Frederic Weisbecker (3):
cputime: Move thread_group_cputime() to sched code
cputime: Rename thread_group_times to thread_group_cputime_adjusted
cputime: Consolidate cputime adjustment code

 fs/proc/array.c           |    4 +-
 include/linux/sched.h     |   13 +++++--
 kernel/exit.c             |    4 +-
 kernel/fork.c             |    2 +-
 kernel/posix-cpu-timers.c |   24 -------------
 kernel/sched/cputime.c    |   80 ++++++++++++++++++++++++++++++--------------
 kernel/sys.c              |    6 ++--
 7 files changed, 71 insertions(+), 62 deletions(-)

--
1.7.5.4


2012-11-23 14:21:41

by Frederic Weisbecker

Subject: [PATCH 2/3] cputime: Rename thread_group_times to thread_group_cputime_adjusted

We have thread_group_cputime() and thread_group_times(). The naming
doesn't provide enough information about the difference between
these two APIs.

To reduce the confusion, rename thread_group_times() to
thread_group_cputime_adjusted(). This name better suggests that
it's a version of thread_group_cputime() that does some stabilization
on the raw cputime values, i.e. here: scaling on top of the CFS
runtime stats and bounding the values from below for monotonicity.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Steven Rostedt <[email protected]>
---
 fs/proc/array.c        |    4 ++--
 include/linux/sched.h  |    4 ++--
 kernel/exit.c          |    4 ++--
 kernel/sched/cputime.c |    8 ++++----
 kernel/sys.c           |    6 +++---
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index c1c207c..d369670 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -438,7 +438,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,

min_flt += sig->min_flt;
maj_flt += sig->maj_flt;
- thread_group_times(task, &utime, &stime);
+ thread_group_cputime_adjusted(task, &utime, &stime);
gtime += sig->gtime;
}

@@ -454,7 +454,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
if (!whole) {
min_flt = task->min_flt;
maj_flt = task->maj_flt;
- task_times(task, &utime, &stime);
+ task_cputime_adjusted(task, &utime, &stime);
gtime = task->gtime;
}

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e1581a0..e75cab5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1751,8 +1751,8 @@ static inline void put_task_struct(struct task_struct *t)
__put_task_struct(t);
}

-extern void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st);
-extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st);
+extern void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);
+extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st);

/*
* Per process flags
diff --git a/kernel/exit.c b/kernel/exit.c
index 346616c..618f7ee 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1186,11 +1186,11 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
* as other threads in the parent group can be right
* here reaping other children at the same time.
*
- * We use thread_group_times() to get times for the thread
+ * We use thread_group_cputime_adjusted() to get times for the thread
* group, which consolidates times for all threads in the
* group including the group leader.
*/
- thread_group_times(p, &tgutime, &tgstime);
+ thread_group_cputime_adjusted(p, &tgutime, &tgstime);
spin_lock_irq(&p->real_parent->sighand->siglock);
psig = p->real_parent->signal;
sig = p->signal;
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e56f138..7dc1553 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -445,13 +445,13 @@ void account_idle_ticks(unsigned long ticks)
* Use precise platform statistics if available:
*/
#ifdef CONFIG_VIRT_CPU_ACCOUNTING
-void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
+void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
{
*ut = p->utime;
*st = p->stime;
}

-void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
+void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
{
struct task_cputime cputime;

@@ -516,7 +516,7 @@ static cputime_t scale_utime(cputime_t utime, cputime_t rtime, cputime_t total)
return (__force cputime_t) temp;
}

-void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
+void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
{
cputime_t rtime, utime = p->utime, total = utime + p->stime;

@@ -543,7 +543,7 @@ void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
/*
* Must be called with siglock held.
*/
-void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
+void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
{
struct signal_struct *sig = p->signal;
struct task_cputime cputime;
diff --git a/kernel/sys.c b/kernel/sys.c
index e6e0ece..265b376 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1046,7 +1046,7 @@ void do_sys_times(struct tms *tms)
cputime_t tgutime, tgstime, cutime, cstime;

spin_lock_irq(&current->sighand->siglock);
- thread_group_times(current, &tgutime, &tgstime);
+ thread_group_cputime_adjusted(current, &tgutime, &tgstime);
cutime = current->signal->cutime;
cstime = current->signal->cstime;
spin_unlock_irq(&current->sighand->siglock);
@@ -1704,7 +1704,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
utime = stime = 0;

if (who == RUSAGE_THREAD) {
- task_times(current, &utime, &stime);
+ task_cputime_adjusted(current, &utime, &stime);
accumulate_thread_rusage(p, r);
maxrss = p->signal->maxrss;
goto out;
@@ -1730,7 +1730,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
break;

case RUSAGE_SELF:
- thread_group_times(p, &tgutime, &tgstime);
+ thread_group_cputime_adjusted(p, &tgutime, &tgstime);
utime += tgutime;
stime += tgstime;
r->ru_nvcsw += p->signal->nvcsw;
--
1.7.5.4

2012-11-23 14:21:40

by Frederic Weisbecker

Subject: [PATCH 3/3] cputime: Consolidate cputime adjustment code

task_cputime_adjusted() and thread_group_cputime_adjusted()
essentially share the same code. They just don't use the same
source:

* The first function uses the cputime in the task struct and the
previous adjusted snapshot that ensures monotonicity.

* The second adds the cputime of all tasks in the group and the
previous adjusted snapshot of the whole group from the signal
structure.

Just consolidate the common code that does the adjustment. These
functions just need to fetch the values from the appropriate
source.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Steven Rostedt <[email protected]>
---
 include/linux/sched.h  |    9 +++++++--
 kernel/fork.c          |    2 +-
 kernel/sched/cputime.c |   46 +++++++++++++++++++++++-----------------------
 3 files changed, 31 insertions(+), 26 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e75cab5..d23204f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -433,6 +433,11 @@ struct cpu_itimer {
u32 incr_error;
};

+struct cputime {
+ cputime_t utime;
+ cputime_t stime;
+};
+
/**
* struct task_cputime - collected CPU time counts
* @utime: time spent in user mode, in &cputime_t units
@@ -581,7 +586,7 @@ struct signal_struct {
cputime_t gtime;
cputime_t cgtime;
#ifndef CONFIG_VIRT_CPU_ACCOUNTING
- cputime_t prev_utime, prev_stime;
+ struct cputime prev_cputime;
#endif
unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
@@ -1340,7 +1345,7 @@ struct task_struct {
cputime_t utime, stime, utimescaled, stimescaled;
cputime_t gtime;
#ifndef CONFIG_VIRT_CPU_ACCOUNTING
- cputime_t prev_utime, prev_stime;
+ struct cputime prev_cputime;
#endif
unsigned long nvcsw, nivcsw; /* context switch counts */
struct timespec start_time; /* monotonic time */
diff --git a/kernel/fork.c b/kernel/fork.c
index 8b20ab7..0e7cdb9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1222,7 +1222,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->utime = p->stime = p->gtime = 0;
p->utimescaled = p->stimescaled = 0;
#ifndef CONFIG_VIRT_CPU_ACCOUNTING
- p->prev_utime = p->prev_stime = 0;
+ p->prev_cputime.utime = p->prev_cputime.stime = 0;
#endif
#if defined(SPLIT_RSS_COUNTING)
memset(&p->rss_stat, 0, sizeof(p->rss_stat));
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 7dc1553..220fdc4 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -516,14 +516,18 @@ static cputime_t scale_utime(cputime_t utime, cputime_t rtime, cputime_t total)
return (__force cputime_t) temp;
}

-void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
+static void cputime_adjust(struct task_cputime *curr,
+ struct cputime *prev,
+ cputime_t *ut, cputime_t *st)
{
- cputime_t rtime, utime = p->utime, total = utime + p->stime;
+ cputime_t rtime, utime, total;

+ utime = curr->utime;
+ total = utime + curr->stime;
/*
* Use CFS's precise accounting:
*/
- rtime = nsecs_to_cputime(p->se.sum_exec_runtime);
+ rtime = nsecs_to_cputime(curr->sum_exec_runtime);

if (total)
utime = scale_utime(utime, rtime, total);
@@ -533,11 +537,22 @@ void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
/*
* Compare with previous values, to keep monotonicity:
*/
- p->prev_utime = max(p->prev_utime, utime);
- p->prev_stime = max(p->prev_stime, rtime - p->prev_utime);
+ prev->utime = max(prev->utime, utime);
+ prev->stime = max(prev->stime, rtime - prev->utime);
+
+ *ut = prev->utime;
+ *st = prev->stime;
+}

- *ut = p->prev_utime;
- *st = p->prev_stime;
+void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
+{
+ struct task_cputime cputime = {
+ .utime = p->utime,
+ .stime = p->stime,
+ .sum_exec_runtime = p->se.sum_exec_runtime,
+ };
+
+ cputime_adjust(&cputime, &p->prev_cputime, ut, st);
}

/*
@@ -545,24 +560,9 @@ void task_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
*/
void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
{
- struct signal_struct *sig = p->signal;
struct task_cputime cputime;
- cputime_t rtime, utime, total;

thread_group_cputime(p, &cputime);
-
- total = cputime.utime + cputime.stime;
- rtime = nsecs_to_cputime(cputime.sum_exec_runtime);
-
- if (total)
- utime = scale_utime(cputime.utime, rtime, total);
- else
- utime = rtime;
-
- sig->prev_utime = max(sig->prev_utime, utime);
- sig->prev_stime = max(sig->prev_stime, rtime - sig->prev_utime);
-
- *ut = sig->prev_utime;
- *st = sig->prev_stime;
+ cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
}
#endif
--
1.7.5.4

2012-11-23 14:22:12

by Frederic Weisbecker

Subject: [PATCH 1/3] cputime: Move thread_group_cputime() to sched code

thread_group_cputime() is a general cputime API that is not only used
by the posix cpu timers. Let's move this helper to the sched code.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Steven Rostedt <[email protected]>
---
 kernel/posix-cpu-timers.c |   24 ------------------------
 kernel/sched/cputime.c    |   28 ++++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 125cb67..d738402 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -217,30 +217,6 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
return 0;
}

-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
-{
- struct signal_struct *sig = tsk->signal;
- struct task_struct *t;
-
- times->utime = sig->utime;
- times->stime = sig->stime;
- times->sum_exec_runtime = sig->sum_sched_runtime;
-
- rcu_read_lock();
- /* make sure we can trust tsk->thread_group list */
- if (!likely(pid_alive(tsk)))
- goto out;
-
- t = tsk;
- do {
- times->utime += t->utime;
- times->stime += t->stime;
- times->sum_exec_runtime += task_sched_runtime(t);
- } while_each_thread(tsk, t);
-out:
- rcu_read_unlock();
-}
-
static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
{
if (b->utime > a->utime)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 8d859da..e56f138 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -288,6 +288,34 @@ static __always_inline bool steal_account_process_tick(void)
return false;
}

+/*
+ * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
+ * tasks (sum on group iteration) belonging to @tsk's group.
+ */
+void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
+{
+ struct signal_struct *sig = tsk->signal;
+ struct task_struct *t;
+
+ times->utime = sig->utime;
+ times->stime = sig->stime;
+ times->sum_exec_runtime = sig->sum_sched_runtime;
+
+ rcu_read_lock();
+ /* make sure we can trust tsk->thread_group list */
+ if (!likely(pid_alive(tsk)))
+ goto out;
+
+ t = tsk;
+ do {
+ times->utime += t->utime;
+ times->stime += t->stime;
+ times->sum_exec_runtime += task_sched_runtime(t);
+ } while_each_thread(tsk, t);
+out:
+ rcu_read_unlock();
+}
+
#ifndef CONFIG_VIRT_CPU_ACCOUNTING

#ifdef CONFIG_IRQ_TIME_ACCOUNTING
--
1.7.5.4

2012-11-25 19:09:41

by Paul Gortmaker

Subject: Re: [PATCH 3/3] cputime: Consolidate cputime adjustment code

[[PATCH 3/3] cputime: Consolidate cputime adjustment code] On 23/11/2012 (Fri 15:21) Frederic Weisbecker wrote:

> task_cputime_adjusted() and thread_group_cputime_adjusted()
> essentially share the same code. They just don't use the same
> source:
>
> * The first function uses the cputime in the task struct and the
> previous adjusted snapshot that ensures monotonicity.
>
> * The second adds the cputime of all tasks in the group and the
> previous adjusted snapshot of the whole group from the signal
> structure.
>
> Just consolidate the common code that does the adjustment. These
> functions just need to fetch the values from the appropriate
> source.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> ---
> include/linux/sched.h | 9 +++++++--
> kernel/fork.c | 2 +-
> kernel/sched/cputime.c | 46 +++++++++++++++++++++++-----------------------
> 3 files changed, 31 insertions(+), 26 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index e75cab5..d23204f 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -433,6 +433,11 @@ struct cpu_itimer {
> u32 incr_error;
> };
>
> +struct cputime {
> + cputime_t utime;
> + cputime_t stime;
> +};
> +

Hi Frederic,

This new struct cputime is a 2/3 subset of the three-variable struct
task_cputime we see right below. Maybe this is a stupid question, but I
was wondering why you didn't re-use task_cputime and just ignore the
sum_exec_runtime field -- vs. introducing this very similar struct?

Or maybe there is another way to consolidate the structs? With the
two being so similar, I wonder if it will be confusing when to use which
one of the two.

Thanks,
Paul.
--

/**
* struct task_cputime - collected CPU time counts
* @utime: time spent in user mode, in &cputime_t units
* @stime: time spent in kernel mode, in &cputime_t units
* @sum_exec_runtime: total time spent on the CPU, in nanoseconds
*
* This structure groups together three kinds of CPU time that are
* tracked for threads and thread groups. Most things considering
* CPU time want to group these counts together and treat all three
* of them in parallel.
*/
struct task_cputime {
cputime_t utime;
cputime_t stime;
unsigned long long sum_exec_runtime;
};

2012-11-26 00:16:59

by Frederic Weisbecker

Subject: Re: [PATCH 3/3] cputime: Consolidate cputime adjustment code

2012/11/25 Paul Gortmaker <[email protected]>:
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -433,6 +433,11 @@ struct cpu_itimer {
>> u32 incr_error;
>> };
>>
>> +struct cputime {
>> + cputime_t utime;
>> + cputime_t stime;
>> +};
>> +
>
> Hi Frederic,
>
> This new struct cputime is a 2/3 subset of the three variable struct
> task_cputime we see right below. Maybe this is a stupid question, but I
> was wondering why you didn't re-use task_cputime, and ignore the
> sum_exec_runtime field -- vs. introducing this very similar struct?

Not a stupid question, I need to add a comment on that. I avoided
reusing struct task_cputime because sum_exec_runtime would be unused
there, and 8 wasted bytes in the middle of task_struct is very
undesirable. The same goes for the signal struct.
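
For illustration, on a typical 64-bit build where cputime_t is
unsigned long (sizes indicative only, they depend on the arch):

struct cputime {
	cputime_t utime;
	cputime_t stime;
};	/* 16 bytes: just what the adjustment code needs */

struct task_cputime {
	cputime_t utime;
	cputime_t stime;
	unsigned long long sum_exec_runtime;
};	/* 24 bytes: sum_exec_runtime would be 8 dead bytes here */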

>
> Or maybe there is another way to consolidate the structs? With the
> two being so similar, I wonder if it will be confusing when to use which
> one of the two.

I can add a comment that tells when to use which. Other than that I'm
short on ideas to consolidate both without creating a mess with long
dereferencing expressions like tsk->cputime.utime and so on.

2012-11-26 18:56:08

by Steven Rostedt

Subject: Re: [PATCH 2/3] cputime: Rename thread_group_times to thread_group_cputime_adjusted

On Fri, 2012-11-23 at 15:21 +0100, Frederic Weisbecker wrote:
> We have thread_group_cputime() and thread_group_times(). The naming
> doesn't provide enough information about the difference between
> these two APIs.
>
> To reduce the confusion, rename thread_group_times() to
> thread_group_cputime_adjusted(). This name better suggests that
> it's a version of thread_group_cputime() that does some stabilization
> on the raw cputime values, i.e. here: scaling on top of the CFS
> runtime stats and bounding the values from below for monotonicity.

But, thread_group_times() does not do any type of adjustment. It only
retrieves the cpu times:

void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
{
struct task_cputime cputime;

thread_group_cputime(p, &cputime);

*ut = cputime.utime;
*st = cputime.stime;
}


It retrieves the current times; it doesn't adjust them.

I'm thinking the current name is more accurate.

-- Steve

2012-11-26 19:24:12

by Frederic Weisbecker

Subject: Re: [PATCH 2/3] cputime: Rename thread_group_times to thread_group_cputime_adjusted

2012/11/26 Steven Rostedt <[email protected]>:
> On Fri, 2012-11-23 at 15:21 +0100, Frederic Weisbecker wrote:
>> We have thread_group_cputime() and thread_group_times(). The naming
>> doesn't provide enough information about the difference between
>> these two APIs.
>>
>> To reduce the confusion, rename thread_group_times() to
>> thread_group_cputime_adjusted(). This name better suggests that
>> it's a version of thread_group_cputime() that does some stabilization
>> on the raw cputime values, i.e. here: scaling on top of the CFS
>> runtime stats and bounding the values from below for monotonicity.
>
> But, thread_group_times() does not do any type of adjustment. It only
> retrieves the cpu times:
>
> void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
> {
> struct task_cputime cputime;
>
> thread_group_cputime(p, &cputime);
>
> *ut = cputime.utime;
> *st = cputime.stime;
> }

This is the CONFIG_VIRT_CPU_ACCOUNTING-only version. It also needs
some monotonicity guard IMO, but that's another issue. Please look at
the other version.
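
Just to sketch the kind of guard I have in mind (rough idea only, not
tested; it assumes we'd maintain a prev_cputime snapshot in that
config too, which we don't do today):

void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime_t *st)
{
	struct task_cputime cputime;
	struct cputime *prev = &p->signal->prev_cputime; /* hypothetical here */

	thread_group_cputime(p, &cputime);

	/* never report less than what we reported last time */
	prev->utime = max(prev->utime, cputime.utime);
	prev->stime = max(prev->stime, cputime.stime);

	*ut = prev->utime;
	*st = prev->stime;
}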

> It retrieves the current times; it doesn't adjust them.
>
> I'm thinking the current name is more accurate.

2012-11-26 19:33:45

by Steven Rostedt

Subject: Re: [PATCH 2/3] cputime: Rename thread_group_times to thread_group_cputime_adjusted

On Mon, 2012-11-26 at 20:24 +0100, Frederic Weisbecker wrote:
> 2012/11/26 Steven Rostedt <[email protected]>:
> > On Fri, 2012-11-23 at 15:21 +0100, Frederic Weisbecker wrote:
> >> We have thread_group_cputime() and thread_group_times(). The naming
> >> doesn't provide enough information about the difference between
> >> these two APIs.
> >>
> >> To reduce the confusion, rename thread_group_times() to
> >> thread_group_cputime_adjusted(). This name better suggests that
> >> it's a version of thread_group_cputime() that does some stabilization
> >> on the raw cputime values, i.e. here: scaling on top of the CFS
> >> runtime stats and bounding the values from below for monotonicity.
> >
> > But, thread_group_times() does not do any type of adjustment. It only
> > retrieves the cpu times:
> >
> > void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
> > {
> > struct task_cputime cputime;
> >
> > thread_group_cputime(p, &cputime);
> >
> > *ut = cputime.utime;
> > *st = cputime.stime;
> > }
>
> This is the CONFIG_VIRT_CPU_ACCOUNTING only version. It also needs
> some monotonicity guard IMO but that's another issue.
> But please look at the other version.

Yeah, I just noticed. That turkey juice is still having an effect on
me ;-)

>
> > It retrieves the current times; it doesn't adjust them.
> >
> > I'm thinking the current name is more accurate.

OK, let's take a look at the other version now:

void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
{
struct signal_struct *sig = p->signal;
struct task_cputime cputime;
cputime_t rtime, utime, total;

thread_group_cputime(p, &cputime);

total = cputime.utime + cputime.stime;
rtime = nsecs_to_cputime(cputime.sum_exec_runtime);

if (total)
utime = scale_utime(cputime.utime, rtime, total);
else
utime = rtime;

sig->prev_utime = max(sig->prev_utime, utime);
sig->prev_stime = max(sig->prev_stime, rtime - sig->prev_utime);

*ut = sig->prev_utime;
*st = sig->prev_stime;
}

So this version also updates the task's signal->prev_[us]time fields.

I guess I'll wait for you to explain to me more about what is going
on :-)

-- Steve

2012-11-27 23:51:34

by Frederic Weisbecker

Subject: Re: [PATCH 2/3] cputime: Rename thread_group_times to thread_group_cputime_adjusted

2012/11/26 Steven Rostedt <[email protected]>:
> OK, let's take a look at the other version now:
>
> void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st)

So this does the same thing as thread_group_cputime(), i.e. fetch the
raw cputime stats from the task/signal struct, with two adjustments:

* It scales the raw values against the CFS stats.
* It ensures the stats increase monotonically across
thread_group_times() calls.

More details below:

> {
> struct signal_struct *sig = p->signal;
> struct task_cputime cputime;
> cputime_t rtime, utime, total;
>
> thread_group_cputime(p, &cputime);
>
> total = cputime.utime + cputime.stime;
> rtime = nsecs_to_cputime(cputime.sum_exec_runtime);
>
> if (total)
> utime = scale_utime(cputime.utime, rtime, total);
> else
> utime = rtime;

Raw cputime values (tsk->utime and tsk->stime) have a per-tick
granularity, so the precision is not the best. For example, a tick can
interrupt the same task 5 times while that task has actually run for
no more than a jiffy overall. This can happen if the task runs for
short slices and gets unlucky enough to often be running at the moment
the tick fires.

The opposite can also happen: the task has run for 5 jiffies but was
barely interrupted by the tick.

To fix this, we scale the utime and stime values against the CFS
accumulated runtime for the task, as follows:

total_ticks_runtime = utime + stime
utime = (utime * total_cfs_runtime) / total_ticks_runtime
stime = total_cfs_runtime - utime
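
As a standalone toy example of that arithmetic (made-up numbers, and
ignoring the cputime_t conversions that the real scale_utime() does):

#include <stdio.h>

int main(void)
{
	unsigned long long utime = 4, stime = 4; /* tick samples */
	unsigned long long rtime = 6;            /* CFS runtime, in jiffies */
	unsigned long long total = utime + stime;

	/* multiply before dividing, like scale_utime() */
	unsigned long long ut = total ? utime * rtime / total : rtime;
	unsigned long long st = rtime - ut;

	/* prints "ut=3 st=3": the 8 tick samples get rescaled down
	 * to the 6 jiffies the task really ran */
	printf("ut=%llu st=%llu\n", ut, st);
	return 0;
}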

>
> sig->prev_utime = max(sig->prev_utime, utime);
> sig->prev_stime = max(sig->prev_stime, rtime - sig->prev_utime);

Now this scaling brings another problem. If, between two calls of
thread_group_times(), tsk->utime has increased a lot while the CFS
task runtime hasn't increased that much, the resulting adjusted stime
may decrease from the 1st to the 2nd call of the function. But
userspace relies on the monotonicity of cputime. The same can happen
to utime if tsk->stime has increased a lot.

To fix this, we apply the above monotonicity fixup.
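
To make both the problem and the fixup concrete, here is a toy
sequence (made-up numbers, same shape as the kernel code):

#include <stdio.h>

#define max(a, b) ((a) > (b) ? (a) : (b))

static unsigned long long prev_utime, prev_stime;

static void adjust(unsigned long long utime, unsigned long long stime,
		   unsigned long long rtime)
{
	unsigned long long total = utime + stime;
	unsigned long long ut = total ? utime * rtime / total : rtime;

	prev_utime = max(prev_utime, ut);
	prev_stime = max(prev_stime, rtime - prev_utime);
	printf("ut=%llu st=%llu\n", prev_utime, prev_stime);
}

int main(void)
{
	adjust(4, 4, 8); /* ut=4 st=4 */
	/* utime grew a lot, rtime barely moved: the freshly scaled
	 * stime would be 9 - 6 = 3, i.e. it would go backward; the
	 * max() guard keeps st=4 */
	adjust(8, 4, 9); /* ut=6 st=4 */
	return 0;
}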

I can add these explanations as comments in a new patch.

>
> *ut = sig->prev_utime;
> *st = sig->prev_stime;
> }
>
> So this version also updates the task's signal->prev_[us]time fields.
>
> I guess I'll wait for you to explain to me more about what is going
> on :-)
>
> -- Steve
>
>

2012-11-28 00:52:17

by Steven Rostedt

Subject: Re: [PATCH 2/3] cputime: Rename thread_group_times to thread_group_cputime_adjusted

On Wed, 2012-11-28 at 00:51 +0100, Frederic Weisbecker wrote:

> To fix this we apply the above monotonicity fixup.

Thanks for the explanation, although I kind of figured it out
already ;-)

>
> I can add these explanations on comments in a new patch.

Comments are always good.

-- Steve