2015-11-19 15:47:43

by Frederic Weisbecker

Subject: [PATCH 0/7] cputime: A few tickless cputime accounting fixes and improvements

The first two patches from Hiroshi Shimamoto are fixes to be backported,
although it's debatable whether the 2nd patch needs a backport.

The patches that follow are further fixes and cleanups, and the last
one is an optimization.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
cputime/fixes

HEAD: e4b430e6598cf6e069b08d7a81ee2ca75fe385f2

Thanks,
Frederic
---

Frederic Weisbecker (5):
cputime: Clarify vtime symbols and document them
cputime: Correctly handle task guest time on housekeepers
cputime: Rename vtime_accounting_enabled to vtime_accounting_cpu_enabled
cputime: Introduce vtime accounting check for readers
cputime: Convert vtime_seqlock to seqcount

Hiroshi Shimamoto (2):
cputime: Fix invalid gtime in proc
cputime: Remove extra cost in task_cputime


include/linux/context_tracking.h | 4 +--
include/linux/init_task.h | 2 +-
include/linux/sched.h | 7 ++--
include/linux/vtime.h | 25 ++++++++++----
kernel/fork.c | 4 +--
kernel/sched/cputime.c | 75 +++++++++++++++++++++++++---------------
kernel/time/tick-sched.c | 2 +-
7 files changed, 77 insertions(+), 42 deletions(-)


2015-11-19 15:47:47

by Frederic Weisbecker

Subject: [PATCH 1/7] cputime: Fix invalid gtime in proc

From: Hiroshi Shimamoto <[email protected]>

/proc/<pid>/stat shows an invalid gtime when the thread is running in a
guest. When vtime accounting is not enabled, we cannot get a valid delta.
The delta is calculated as now - tsk->vtime_snap, but tsk->vtime_snap
is only updated when vtime accounting is runtime enabled.

This patch makes task_gtime() just return gtime without computing the
bogus, non-existing tickless delta when vtime accounting is not enabled.

Use context_tracking_is_enabled() to check whether vtime accounting is
enabled on some CPU; only in that case do we need to compute the tickless
delta. This way we fix the gtime value regression on machines not running
nohz_full.
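
For illustration, here is a minimal sketch of the reader path before this
patch (simplified, not the exact kernel code; the clock source in
particular is glossed over):

	/*
	 * The tickless delta is "now" minus the last snapshot. When vtime
	 * accounting never runs, vtime_snap is never moved forward, so the
	 * difference is meaningless and keeps growing.
	 */
	static u64 vtime_delta_sketch(struct task_struct *tsk)
	{
		u64 now = local_clock();	/* keeps advancing */

		return now - tsk->vtime_snap;	/* stale snapshot -> junk */
	}

	/*
	 * task_gtime() used to add that delta unconditionally:
	 *
	 *	gtime = t->gtime;
	 *	if (t->flags & PF_VCPU)
	 *		gtime += vtime_delta(t);  <- junk when vtime is off
	 */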

The kernel config contains CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
CONFIG_NO_HZ_FULL_ALL=n, and the kernel was booted without nohz_full.

I ran and then stopped a busy loop in a VM and watched the gtime on the
host, dumping the 43rd field (which shows the gtime) every second:
# while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
S 4348
R 7064566
R 7064766
R 7064967
R 7065168
S 4759
S 4759

While the busy loop is running, it returns a huge value.

After applying this patch, we can see the correct gtime:

# while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
S 5338
R 5365
R 5465
R 5566
R 5666
S 5726
S 5726

Signed-off-by: Hiroshi Shimamoto <[email protected]>
Cc: [email protected]
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/sched/cputime.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 26a5446..05de80b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -788,6 +788,9 @@ cputime_t task_gtime(struct task_struct *t)
unsigned int seq;
cputime_t gtime;

+ if (!context_tracking_is_enabled())
+ return t->gtime;
+
do {
seq = read_seqbegin(&t->vtime_seqlock);

--
2.5.3

2015-11-19 15:51:09

by Frederic Weisbecker

Subject: [PATCH 2/7] cputime: Remove extra cost in task_cputime

From: Hiroshi Shimamoto <[email protected]>

There is an extra cost in task_cputime() and task_cputime_scaled() when
nohz_full is not activated. When vtime accounting is not enabled, we
don't need to get the deltas of utime and stime under the vtime seqlock.

This patch removes that cost by adding a shortcut route that is taken
when vtime accounting is not enabled.

Use context_tracking_is_enabled() to check whether vtime accounting is
enabled on some CPU; only in that case do we need to compute the tickless
cputime delta.
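
For context, the cost being skipped is roughly the following seqlock read
loop (a condensed sketch of fetch_task_cputime() in this file; the scaled
variants and some details are glossed over):

	do {
		seq = read_seqbegin(&t->vtime_seqlock);	/* may loop on retry */

		*utime = t->utime;
		*stime = t->stime;

		/*
		 * Nothing to add if the task isn't accumulating vtime
		 * (VTIME_SLEEPING, renamed VTIME_INACTIVE later in this
		 * series) or is the idle task.
		 */
		if (t->vtime_snap_whence == VTIME_SLEEPING || is_idle_task(t))
			continue;

		delta = vtime_delta(t);		/* tickless catch-up */
		/* ... add delta to *utime or *stime depending on
		 * vtime_snap_whence ... */
	} while (read_seqretry(&t->vtime_seqlock, seq));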

Signed-off-by: Hiroshi Shimamoto <[email protected]>
Cc: [email protected]
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/sched/cputime.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 05de80b..1128d4b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -853,6 +853,14 @@ void task_cputime(struct task_struct *t, cputime_t *utime, cputime_t *stime)
{
cputime_t udelta, sdelta;

+ if (!context_tracking_is_enabled()) {
+ if (utime)
+ *utime = t->utime;
+ if (stime)
+ *stime = t->stime;
+ return;
+ }
+
fetch_task_cputime(t, utime, stime, &t->utime,
&t->stime, &udelta, &sdelta);
if (utime)
@@ -866,6 +874,14 @@ void task_cputime_scaled(struct task_struct *t,
{
cputime_t udelta, sdelta;

+ if (!context_tracking_is_enabled()) {
+ if (utimescaled)
+ *utimescaled = t->utimescaled;
+ if (stimescaled)
+ *stimescaled = t->stimescaled;
+ return;
+ }
+
fetch_task_cputime(t, utimescaled, stimescaled,
&t->utimescaled, &t->stimescaled, &udelta, &sdelta);
if (utimescaled)
--
2.5.3

2015-11-19 15:47:49

by Frederic Weisbecker

Subject: [PATCH 3/7] cputime: Clarify vtime symbols and document them

The VTIME_SLEEPING state happens either when:

1) The task is sleeping and no tickless delta is to be added to the task
cputime stats.
2) The CPU isn't running vtime at all, so the same property as in 1) applies.

Let's rename the vtime symbol to reflect both states.

Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/sched.h | 5 ++++-
kernel/fork.c | 2 +-
kernel/sched/cputime.c | 6 +++---
3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index edad7a4..1a837e7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1522,8 +1522,11 @@ struct task_struct {
seqlock_t vtime_seqlock;
unsigned long long vtime_snap;
enum {
- VTIME_SLEEPING = 0,
+ /* Task is sleeping or running in a CPU with VTIME inactive */
+ VTIME_INACTIVE = 0,
+ /* Task runs in userspace in a CPU with VTIME active */
VTIME_USER,
+ /* Task runs in kernelspace in a CPU with VTIME active */
VTIME_SYS,
} vtime_snap_whence;
#endif
diff --git a/kernel/fork.c b/kernel/fork.c
index f97f2c4..c0a1370 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1350,7 +1350,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
seqlock_init(&p->vtime_seqlock);
p->vtime_snap = 0;
- p->vtime_snap_whence = VTIME_SLEEPING;
+ p->vtime_snap_whence = VTIME_INACTIVE;
#endif

#if defined(SPLIT_RSS_COUNTING)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 1128d4b..4a18a6e 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -680,7 +680,7 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)
{
unsigned long long delta = vtime_delta(tsk);

- WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_SLEEPING);
+ WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);
tsk->vtime_snap += delta;

/* CHECKME: always safe to convert nsecs to cputime? */
@@ -764,7 +764,7 @@ void vtime_account_idle(struct task_struct *tsk)
void arch_vtime_task_switch(struct task_struct *prev)
{
write_seqlock(&prev->vtime_seqlock);
- prev->vtime_snap_whence = VTIME_SLEEPING;
+ prev->vtime_snap_whence = VTIME_INACTIVE;
write_sequnlock(&prev->vtime_seqlock);

write_seqlock(&current->vtime_seqlock);
@@ -829,7 +829,7 @@ fetch_task_cputime(struct task_struct *t,
*s_dst = *s_src;

/* Task is sleeping, nothing to add */
- if (t->vtime_snap_whence == VTIME_SLEEPING ||
+ if (t->vtime_snap_whence == VTIME_INACTIVE ||
is_idle_task(t))
continue;

--
2.5.3

2015-11-19 15:49:12

by Frederic Weisbecker

Subject: [PATCH 4/7] cputime: Correctly handle task guest time on housekeepers

When a task runs on a housekeeper (a CPU running with the periodic tick
while its neighbours run tickless), it doesn't account cputime using vtime
but relies on the tick. Such a task has its vtime_snap_whence value set
to VTIME_INACTIVE.

Readers won't handle that correctly though. As long as vtime is running
on some CPU, readers incorrectly assume that vtime runs on all CPUs and
always compute the tickless cputime delta, which is just junk on
housekeepers.

So let's fix this by checking, via the appropriate state check, that the
target runs on a vtime CPU before computing the tickless delta.

Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/sched/cputime.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 4a18a6e..5cf24e7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -795,7 +795,7 @@ cputime_t task_gtime(struct task_struct *t)
seq = read_seqbegin(&t->vtime_seqlock);

gtime = t->gtime;
- if (t->flags & PF_VCPU)
+ if (t->vtime_snap_whence == VTIME_SYS && t->flags & PF_VCPU)
gtime += vtime_delta(t);

} while (read_seqretry(&t->vtime_seqlock, seq));
--
2.5.3

2015-11-19 15:47:52

by Frederic Weisbecker

Subject: [PATCH 5/7] cputime: Rename vtime_accounting_enabled to vtime_accounting_cpu_enabled

vtime_accounting_enabled() checks whether vtime is running on the current
CPU and is as such a misnomer. Let's rename it to a name that reflects its
locality. We are going to need the current name for a function that tells
whether vtime runs at all on some CPU.

Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/context_tracking.h | 4 ++--
include/linux/vtime.h | 14 +++++++-------
kernel/sched/cputime.c | 2 +-
kernel/time/tick-sched.c | 2 +-
4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 68b575a..d259274 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -86,7 +86,7 @@ static inline void context_tracking_init(void) { }
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
static inline void guest_enter(void)
{
- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
vtime_guest_enter(current);
else
current->flags |= PF_VCPU;
@@ -100,7 +100,7 @@ static inline void guest_exit(void)
if (context_tracking_is_enabled())
__context_tracking_exit(CONTEXT_GUEST);

- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
vtime_guest_exit(current);
else
current->flags &= ~PF_VCPU;
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index c5165fd..ca23e83 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -10,14 +10,14 @@
struct task_struct;

/*
- * vtime_accounting_enabled() definitions/declarations
+ * vtime_accounting_cpu_enabled() definitions/declarations
*/
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-static inline bool vtime_accounting_enabled(void) { return true; }
+static inline bool vtime_accounting_cpu_enabled(void) { return true; }
#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */

#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-static inline bool vtime_accounting_enabled(void)
+static inline bool vtime_accounting_cpu_enabled(void)
{
if (context_tracking_is_enabled()) {
if (context_tracking_cpu_is_enabled())
@@ -29,7 +29,7 @@ static inline bool vtime_accounting_enabled(void)
#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */

#ifndef CONFIG_VIRT_CPU_ACCOUNTING
-static inline bool vtime_accounting_enabled(void) { return false; }
+static inline bool vtime_accounting_cpu_enabled(void) { return false; }
#endif /* !CONFIG_VIRT_CPU_ACCOUNTING */


@@ -44,7 +44,7 @@ extern void vtime_task_switch(struct task_struct *prev);
extern void vtime_common_task_switch(struct task_struct *prev);
static inline void vtime_task_switch(struct task_struct *prev)
{
- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
vtime_common_task_switch(prev);
}
#endif /* __ARCH_HAS_VTIME_TASK_SWITCH */
@@ -59,7 +59,7 @@ extern void vtime_account_irq_enter(struct task_struct *tsk);
extern void vtime_common_account_irq_enter(struct task_struct *tsk);
static inline void vtime_account_irq_enter(struct task_struct *tsk)
{
- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
vtime_common_account_irq_enter(tsk);
}
#endif /* __ARCH_HAS_VTIME_ACCOUNT */
@@ -78,7 +78,7 @@ extern void vtime_gen_account_irq_exit(struct task_struct *tsk);

static inline void vtime_account_irq_exit(struct task_struct *tsk)
{
- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
vtime_gen_account_irq_exit(tsk);
}

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5cf24e7..5727217 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -466,7 +466,7 @@ void account_process_tick(struct task_struct *p, int user_tick)
cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
struct rq *rq = this_rq();

- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
return;

if (sched_clock_irqtime) {
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 7c7ec45..b4a2a31 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -875,7 +875,7 @@ static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
unsigned long ticks;

- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
return;
/*
* We stopped the tick in idle. Update process times would miss the
--
2.5.3

2015-11-19 15:48:00

by Frederic Weisbecker

Subject: [PATCH 6/7] cputime: Introduce vtime accounting check for readers

Readers need to know whether vtime runs at all on some CPU somewhere; this
is a fast-path check to determine whether we need to go further and add up
any tickless cputime delta.

This fast-path check uses the context tracking state because vtime is tied
to context tracking as of now. That check is confusing though, so let's use
a vtime function that hides the context tracking details inside the vtime
implementation instead.

Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/vtime.h | 13 ++++++++++++-
kernel/sched/cputime.c | 6 +++---
2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index ca23e83..fa21969 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -17,9 +17,20 @@ static inline bool vtime_accounting_cpu_enabled(void) { return true; }
#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */

#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+/*
+ * Checks if vtime is enabled on some CPU. Cputime readers want to be careful
+ * in that case and compute the tickless cputime.
+ * For now vtime state is tied to context tracking. We might want to decouple
+ * those later if necessary.
+ */
+static inline bool vtime_accounting_enabled(void)
+{
+ return context_tracking_is_enabled();
+}
+
static inline bool vtime_accounting_cpu_enabled(void)
{
- if (context_tracking_is_enabled()) {
+ if (vtime_accounting_enabled()) {
if (context_tracking_cpu_is_enabled())
return true;
}
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5727217..9989c3f 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -788,7 +788,7 @@ cputime_t task_gtime(struct task_struct *t)
unsigned int seq;
cputime_t gtime;

- if (!context_tracking_is_enabled())
+ if (!vtime_accounting_enabled())
return t->gtime;

do {
@@ -853,7 +853,7 @@ void task_cputime(struct task_struct *t, cputime_t *utime, cputime_t *stime)
{
cputime_t udelta, sdelta;

- if (!context_tracking_is_enabled()) {
+ if (!vtime_accounting_enabled()) {
if (utime)
*utime = t->utime;
if (stime)
@@ -874,7 +874,7 @@ void task_cputime_scaled(struct task_struct *t,
{
cputime_t udelta, sdelta;

- if (!context_tracking_is_enabled()) {
+ if (!vtime_accounting_enabled()) {
if (utimescaled)
*utimescaled = t->utimescaled;
if (stimescaled)
--
2.5.3

2015-11-19 15:47:59

by Frederic Weisbecker

Subject: [PATCH 7/7] cputime: Convert vtime_seqlock to seqcount

The cputime can only be updated by the current task itself, even in the
vtime case. So we can safely use a seqcount instead of a seqlock, as there
is no writer concurrency involved.
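
As a generic illustration of why this is safe (a sketch of the pattern,
not code from this patch): a seqlock_t bundles a spinlock to serialize
concurrent writers, while a bare seqcount_t only provides the reader retry
protocol and expects the writer side to be serialized by other means. Here
the only writer is the task itself, so no extra lock is needed:

	/* Writer side: only current updates its own vtime state. */
	write_seqcount_begin(&tsk->vtime_seqcount);	/* sequence goes odd */
	tsk->vtime_snap_whence = VTIME_SYS;
	write_seqcount_end(&tsk->vtime_seqcount);	/* sequence even again */

	/* Reader side: retry if a write happened or is in progress. */
	do {
		seq = read_seqcount_begin(&t->vtime_seqcount);
		gtime = t->gtime;
	} while (read_seqcount_retry(&t->vtime_seqcount, seq));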

Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/init_task.h | 2 +-
include/linux/sched.h | 2 +-
kernel/fork.c | 2 +-
kernel/sched/cputime.c | 46 ++++++++++++++++++++++++----------------------
4 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 1c1ff7e..f2cb8d4 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -150,7 +150,7 @@ extern struct task_group root_task_group;

#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
# define INIT_VTIME(tsk) \
- .vtime_seqlock = __SEQLOCK_UNLOCKED(tsk.vtime_seqlock), \
+ .vtime_seqcount = SEQCNT_ZERO(tsk.vtime_seqcount), \
.vtime_snap = 0, \
.vtime_snap_whence = VTIME_SYS,
#else
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1a837e7..84f43c4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1519,7 +1519,7 @@ struct task_struct {
cputime_t gtime;
struct prev_cputime prev_cputime;
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
- seqlock_t vtime_seqlock;
+ seqcount_t vtime_seqcount;
unsigned long long vtime_snap;
enum {
/* Task is sleeping or running in a CPU with VTIME inactive */
diff --git a/kernel/fork.c b/kernel/fork.c
index c0a1370..eea32b5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1348,7 +1348,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
prev_cputime_init(&p->prev_cputime);

#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
- seqlock_init(&p->vtime_seqlock);
+ seqcount_init(&p->vtime_seqcount);
p->vtime_snap = 0;
p->vtime_snap_whence = VTIME_INACTIVE;
#endif
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 9989c3f..d5ff5c6 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -696,37 +696,37 @@ static void __vtime_account_system(struct task_struct *tsk)

void vtime_account_system(struct task_struct *tsk)
{
- write_seqlock(&tsk->vtime_seqlock);
+ write_seqcount_begin(&tsk->vtime_seqcount);
__vtime_account_system(tsk);
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seqcount);
}

void vtime_gen_account_irq_exit(struct task_struct *tsk)
{
- write_seqlock(&tsk->vtime_seqlock);
+ write_seqcount_begin(&tsk->vtime_seqcount);
__vtime_account_system(tsk);
if (context_tracking_in_user())
tsk->vtime_snap_whence = VTIME_USER;
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seqcount);
}

void vtime_account_user(struct task_struct *tsk)
{
cputime_t delta_cpu;

- write_seqlock(&tsk->vtime_seqlock);
+ write_seqcount_begin(&tsk->vtime_seqcount);
delta_cpu = get_vtime_delta(tsk);
tsk->vtime_snap_whence = VTIME_SYS;
account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seqcount);
}

void vtime_user_enter(struct task_struct *tsk)
{
- write_seqlock(&tsk->vtime_seqlock);
+ write_seqcount_begin(&tsk->vtime_seqcount);
__vtime_account_system(tsk);
tsk->vtime_snap_whence = VTIME_USER;
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seqcount);
}

void vtime_guest_enter(struct task_struct *tsk)
@@ -738,19 +738,19 @@ void vtime_guest_enter(struct task_struct *tsk)
* synchronization against the reader (task_gtime())
* that can thus safely catch up with a tickless delta.
*/
- write_seqlock(&tsk->vtime_seqlock);
+ write_seqcount_begin(&tsk->vtime_seqcount);
__vtime_account_system(tsk);
current->flags |= PF_VCPU;
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seqcount);
}
EXPORT_SYMBOL_GPL(vtime_guest_enter);

void vtime_guest_exit(struct task_struct *tsk)
{
- write_seqlock(&tsk->vtime_seqlock);
+ write_seqcount_begin(&tsk->vtime_seqcount);
__vtime_account_system(tsk);
current->flags &= ~PF_VCPU;
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seqcount);
}
EXPORT_SYMBOL_GPL(vtime_guest_exit);

@@ -763,24 +763,26 @@ void vtime_account_idle(struct task_struct *tsk)

void arch_vtime_task_switch(struct task_struct *prev)
{
- write_seqlock(&prev->vtime_seqlock);
+ write_seqcount_begin(&prev->vtime_seqcount);
prev->vtime_snap_whence = VTIME_INACTIVE;
- write_sequnlock(&prev->vtime_seqlock);
+ write_seqcount_end(&prev->vtime_seqcount);

- write_seqlock(&current->vtime_seqlock);
+ write_seqcount_begin(&current->vtime_seqcount);
current->vtime_snap_whence = VTIME_SYS;
current->vtime_snap = sched_clock_cpu(smp_processor_id());
- write_sequnlock(&current->vtime_seqlock);
+ write_seqcount_end(&current->vtime_seqcount);
}

void vtime_init_idle(struct task_struct *t, int cpu)
{
unsigned long flags;

- write_seqlock_irqsave(&t->vtime_seqlock, flags);
+ local_irq_save(flags);
+ write_seqcount_begin(&t->vtime_seqcount);
t->vtime_snap_whence = VTIME_SYS;
t->vtime_snap = sched_clock_cpu(cpu);
- write_sequnlock_irqrestore(&t->vtime_seqlock, flags);
+ write_seqcount_end(&t->vtime_seqcount);
+ local_irq_restore(flags);
}

cputime_t task_gtime(struct task_struct *t)
@@ -792,13 +794,13 @@ cputime_t task_gtime(struct task_struct *t)
return t->gtime;

do {
- seq = read_seqbegin(&t->vtime_seqlock);
+ seq = read_seqcount_begin(&t->vtime_seqcount);

gtime = t->gtime;
if (t->vtime_snap_whence == VTIME_SYS && t->flags & PF_VCPU)
gtime += vtime_delta(t);

- } while (read_seqretry(&t->vtime_seqlock, seq));
+ } while (read_seqcount_retry(&t->vtime_seqcount, seq));

return gtime;
}
@@ -821,7 +823,7 @@ fetch_task_cputime(struct task_struct *t,
*udelta = 0;
*sdelta = 0;

- seq = read_seqbegin(&t->vtime_seqlock);
+ seq = read_seqcount_begin(&t->vtime_seqcount);

if (u_dst)
*u_dst = *u_src;
@@ -845,7 +847,7 @@ fetch_task_cputime(struct task_struct *t,
if (t->vtime_snap_whence == VTIME_SYS)
*sdelta = delta;
}
- } while (read_seqretry(&t->vtime_seqlock, seq));
+ } while (read_seqcount_retry(&t->vtime_seqcount, seq));
}


--
2.5.3

2015-11-19 20:28:40

by Peter Zijlstra

Subject: Re: [PATCH 6/7] cputime: Introduce vtime accounting check for readers

On Thu, Nov 19, 2015 at 04:47:33PM +0100, Frederic Weisbecker wrote:
> +++ b/include/linux/vtime.h
> @@ -17,9 +17,20 @@ static inline bool vtime_accounting_cpu_enabled(void) { return true; }
> #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>
> #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> +/*
> + * Checks if vtime is enabled on some CPU. Cputime readers want to be careful
> + * in that case and compute the tickless cputime.
> + * For now vtime state is tied to context tracking. We might want to decouple
> + * those later if necessary.
> + */
> +static inline bool vtime_accounting_enabled(void)
> +{
> + return context_tracking_is_enabled();
> +}

Should this not also include a definition of this function for
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE and CONFIG_VIRT_CPU_ACCOUNTING ?

2015-11-23 14:19:25

by Frederic Weisbecker

Subject: Re: [PATCH 6/7] cputime: Introduce vtime accounting check for readers

On Thu, Nov 19, 2015 at 09:28:28PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 19, 2015 at 04:47:33PM +0100, Frederic Weisbecker wrote:
> > +++ b/include/linux/vtime.h
> > @@ -17,9 +17,20 @@ static inline bool vtime_accounting_cpu_enabled(void) { return true; }
> > #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
> >
> > #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> > +/*
> > + * Checks if vtime is enabled on some CPU. Cputime readers want to be careful
> > + * in that case and compute the tickless cputime.
> > + * For now vtime state is tied to context tracking. We might want to decouple
> > + * those later if necessary.
> > + */
> > +static inline bool vtime_accounting_enabled(void)
> > +{
> > + return context_tracking_is_enabled();
> > +}
>
> Should this not also include a definition of this function for
> CONFIG_VIRT_CPU_ACCOUNTING_NATIVE and CONFIG_VIRT_CPU_ACCOUNTING ?

I could, but I haven't found any user of it yet other than for
CONFIG_VIRT_CPU_ACCOUNTING_GEN. The NATIVE version, when enabled, runs on
all CPUs anyway.
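
(For completeness, a hypothetical sketch of what such fallback definitions
could look like if a user ever shows up; this is not part of the posted
series:)

	#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
	/* Native vtime accounting runs on every CPU, all the time. */
	static inline bool vtime_accounting_enabled(void) { return true; }
	#endif

	#ifndef CONFIG_VIRT_CPU_ACCOUNTING
	static inline bool vtime_accounting_enabled(void) { return false; }
	#endif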

2015-11-23 14:35:29

by Peter Zijlstra

Subject: Re: [PATCH 6/7] cputime: Introduce vtime accounting check for readers

On Mon, Nov 23, 2015 at 03:19:19PM +0100, Frederic Weisbecker wrote:
> On Thu, Nov 19, 2015 at 09:28:28PM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 19, 2015 at 04:47:33PM +0100, Frederic Weisbecker wrote:
> > > +++ b/include/linux/vtime.h
> > > @@ -17,9 +17,20 @@ static inline bool vtime_accounting_cpu_enabled(void) { return true; }
> > > #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
> > >
> > > #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> > > +/*
> > > + * Checks if vtime is enabled on some CPU. Cputime readers want to be careful
> > > + * in that case and compute the tickless cputime.
> > > + * For now vtime state is tied to context tracking. We might want to decouple
> > > + * those later if necessary.
> > > + */
> > > +static inline bool vtime_accounting_enabled(void)
> > > +{
> > > + return context_tracking_is_enabled();
> > > +}
> >
> > Should this not also include a definition of this function for
> > CONFIG_VIRT_CPU_ACCOUNTING_NATIVE and CONFIG_VIRT_CPU_ACCOUNTING ?
>
> I could but I haven't found any user of it yet for others than
> CONFIG_VIRT_CPU_ACCOUNTING_GEN. The NATIVE version, when enabled, runs on
> all CPUs anyway.

Aah, I see, task_gtime() etc. have a different definition for !GEN.
Tricky.

2015-11-23 15:22:16

by Frederic Weisbecker

Subject: Re: [PATCH 6/7] cputime: Introduce vtime accounting check for readers

On Mon, Nov 23, 2015 at 03:35:16PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 23, 2015 at 03:19:19PM +0100, Frederic Weisbecker wrote:
> > On Thu, Nov 19, 2015 at 09:28:28PM +0100, Peter Zijlstra wrote:
> > > On Thu, Nov 19, 2015 at 04:47:33PM +0100, Frederic Weisbecker wrote:
> > > > +++ b/include/linux/vtime.h
> > > > @@ -17,9 +17,20 @@ static inline bool vtime_accounting_cpu_enabled(void) { return true; }
> > > > #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
> > > >
> > > > #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> > > > +/*
> > > > + * Checks if vtime is enabled on some CPU. Cputime readers want to be careful
> > > > + * in that case and compute the tickless cputime.
> > > > + * For now vtime state is tied to context tracking. We might want to decouple
> > > > + * those later if necessary.
> > > > + */
> > > > +static inline bool vtime_accounting_enabled(void)
> > > > +{
> > > > + return context_tracking_is_enabled();
> > > > +}
> > >
> > > Should this not also include a definition of this function for
> > > CONFIG_VIRT_CPU_ACCOUNTING_NATIVE and CONFIG_VIRT_CPU_ACCOUNTING ?
> >
> > I could but I haven't found any user of it yet for others than
> > CONFIG_VIRT_CPU_ACCOUNTING_GEN. The NATIVE version, when enabled, runs on
> > all CPUs anyway.
>
> Aah, I see, task_gtime() etc.. have a different definition for !GEN.
> tricky.

Right, this whole CONFIG_VIRT_CPU_ACCOUNTING stuff is a bit messy. I tried
to consolidate as much code as I could between NATIVE and GEN but the result
is hard to parse. I'll see if I can clean a few things up there.

To begin with, VIRT_CPU_ACCOUNTING is a horrible misnomer, just like vtime.
VIRT suggests we are dealing with virtualization while that's absolutely
not the case. Perhaps something like BOUNDARY_CPU_ACCOUNTING would parse
better. Or TICKLESS_CPU_ACCOUNTING.

Subject: [tip:locking/core] sched/cputime: Fix invalid gtime in proc

Commit-ID: 2541117b0cf79977fa11a0d6e17d61010677bd7b
Gitweb: http://git.kernel.org/tip/2541117b0cf79977fa11a0d6e17d61010677bd7b
Author: Hiroshi Shimamoto <[email protected]>
AuthorDate: Thu, 19 Nov 2015 16:47:28 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Dec 2015 10:18:49 +0100

sched/cputime: Fix invalid gtime in proc

/proc/<pid>/stat shows an invalid gtime when the thread is running in a
guest. When vtime accounting is not enabled, we cannot get a valid delta.
The delta is calculated as now - tsk->vtime_snap, but tsk->vtime_snap
is only updated when vtime accounting is runtime enabled.

This patch makes task_gtime() just return gtime without computing the
bogus, non-existing tickless delta when vtime accounting is not enabled.

Use context_tracking_is_enabled() to check whether vtime accounting is
enabled on some CPU; only in that case do we need to compute the tickless
delta. This way we fix the gtime value regression on machines not running
nohz_full.

The kernel config contains CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
CONFIG_NO_HZ_FULL_ALL=n, and the kernel was booted without nohz_full.

I ran and then stopped a busy loop in a VM and watched the gtime on the
host, dumping the 43rd field (which shows the gtime) every second:

# while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
S 4348
R 7064566
R 7064766
R 7064967
R 7065168
S 4759
S 4759

While the busy loop is running, it returns a huge value.

After applying this patch, we can see the correct gtime:

# while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
S 5338
R 5365
R 5465
R 5566
R 5666
S 5726
S 5726

Signed-off-by: Hiroshi Shimamoto <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E . McKenney <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/cputime.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 26a5446..05de80b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -788,6 +788,9 @@ cputime_t task_gtime(struct task_struct *t)
unsigned int seq;
cputime_t gtime;

+ if (!context_tracking_is_enabled())
+ return t->gtime;
+
do {
seq = read_seqbegin(&t->vtime_seqlock);

Subject: [tip:sched/core] sched/cputime: Remove extra cost in task_cputime()

Commit-ID: 7877a0ba5ec63c7b0111b06c773f1696fa17b35a
Gitweb: http://git.kernel.org/tip/7877a0ba5ec63c7b0111b06c773f1696fa17b35a
Author: Hiroshi Shimamoto <[email protected]>
AuthorDate: Thu, 19 Nov 2015 16:47:29 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Dec 2015 10:34:43 +0100

sched/cputime: Remove extra cost in task_cputime()

There is an extra cost in task_cputime() and task_cputime_scaled() when
nohz_full is not activated. When vtime accounting is not enabled, we
don't need to get the deltas of utime and stime under the vtime seqlock.

This patch removes that cost by adding a shortcut route that is taken
when vtime accounting is not enabled.

Use context_tracking_is_enabled() to check whether vtime accounting is
enabled on some CPU; only in that case do we need to compute the tickless
cputime delta.

Signed-off-by: Hiroshi Shimamoto <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E . McKenney <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/cputime.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 05de80b..1128d4b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -853,6 +853,14 @@ void task_cputime(struct task_struct *t, cputime_t *utime, cputime_t *stime)
{
cputime_t udelta, sdelta;

+ if (!context_tracking_is_enabled()) {
+ if (utime)
+ *utime = t->utime;
+ if (stime)
+ *stime = t->stime;
+ return;
+ }
+
fetch_task_cputime(t, utime, stime, &t->utime,
&t->stime, &udelta, &sdelta);
if (utime)
@@ -866,6 +874,14 @@ void task_cputime_scaled(struct task_struct *t,
{
cputime_t udelta, sdelta;

+ if (!context_tracking_is_enabled()) {
+ if (utimescaled)
+ *utimescaled = t->utimescaled;
+ if (stimescaled)
+ *stimescaled = t->stimescaled;
+ return;
+ }
+
fetch_task_cputime(t, utimescaled, stimescaled,
&t->utimescaled, &t->stimescaled, &udelta, &sdelta);
if (utimescaled)

Subject: [tip:sched/core] sched/cputime: Clarify vtime symbols and document them

Commit-ID: 7098c1eac75dc03fdbb7249171a6e68ce6044a5a
Gitweb: http://git.kernel.org/tip/7098c1eac75dc03fdbb7249171a6e68ce6044a5a
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Thu, 19 Nov 2015 16:47:30 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Dec 2015 10:34:44 +0100

sched/cputime: Clarify vtime symbols and document them

The VTIME_SLEEPING state happens either when:

1) The task is sleeping and no tickless delta is to be added to the task
cputime stats.
2) The CPU isn't running vtime at all, so the same property as in 1) applies.

Let's rename the vtime symbol to reflect both states.

Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E . McKenney <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/sched.h | 5 ++++-
kernel/fork.c | 2 +-
kernel/sched/cputime.c | 6 +++---
3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f425aac..3533168 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1522,8 +1522,11 @@ struct task_struct {
seqlock_t vtime_seqlock;
unsigned long long vtime_snap;
enum {
- VTIME_SLEEPING = 0,
+ /* Task is sleeping or running in a CPU with VTIME inactive */
+ VTIME_INACTIVE = 0,
+ /* Task runs in userspace in a CPU with VTIME active */
VTIME_USER,
+ /* Task runs in kernelspace in a CPU with VTIME active */
VTIME_SYS,
} vtime_snap_whence;
#endif
diff --git a/kernel/fork.c b/kernel/fork.c
index f97f2c4..c0a1370 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1350,7 +1350,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
seqlock_init(&p->vtime_seqlock);
p->vtime_snap = 0;
- p->vtime_snap_whence = VTIME_SLEEPING;
+ p->vtime_snap_whence = VTIME_INACTIVE;
#endif

#if defined(SPLIT_RSS_COUNTING)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 1128d4b..4a18a6e 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -680,7 +680,7 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)
{
unsigned long long delta = vtime_delta(tsk);

- WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_SLEEPING);
+ WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);
tsk->vtime_snap += delta;

/* CHECKME: always safe to convert nsecs to cputime? */
@@ -764,7 +764,7 @@ void vtime_account_idle(struct task_struct *tsk)
void arch_vtime_task_switch(struct task_struct *prev)
{
write_seqlock(&prev->vtime_seqlock);
- prev->vtime_snap_whence = VTIME_SLEEPING;
+ prev->vtime_snap_whence = VTIME_INACTIVE;
write_sequnlock(&prev->vtime_seqlock);

write_seqlock(&current->vtime_seqlock);
@@ -829,7 +829,7 @@ fetch_task_cputime(struct task_struct *t,
*s_dst = *s_src;

/* Task is sleeping, nothing to add */
- if (t->vtime_snap_whence == VTIME_SLEEPING ||
+ if (t->vtime_snap_whence == VTIME_INACTIVE ||
is_idle_task(t))
continue;

Subject: [tip:sched/core] sched/cputime: Correctly handle task guest time on housekeepers

Commit-ID: cab245d68c38afff1a4c4d018ab7e1d316982f5d
Gitweb: http://git.kernel.org/tip/cab245d68c38afff1a4c4d018ab7e1d316982f5d
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Thu, 19 Nov 2015 16:47:31 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Dec 2015 10:34:44 +0100

sched/cputime: Correctly handle task guest time on housekeepers

When a task runs on a housekeeper (a CPU running with the periodic tick
while its neighbours run tickless), it doesn't account cputime using vtime
but relies on the tick. Such a task has its vtime_snap_whence value set
to VTIME_INACTIVE.

Readers won't handle that correctly though. As long as vtime is running
on some CPU, readers incorrectly assume that vtime runs on all CPUs and
always compute the tickless cputime delta, which is just junk on
housekeepers.

So let's fix this by checking, via the appropriate state check, that the
target runs on a vtime CPU before computing the tickless delta.

Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E . McKenney <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/cputime.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 4a18a6e..5cf24e7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -795,7 +795,7 @@ cputime_t task_gtime(struct task_struct *t)
seq = read_seqbegin(&t->vtime_seqlock);

gtime = t->gtime;
- if (t->flags & PF_VCPU)
+ if (t->vtime_snap_whence == VTIME_SYS && t->flags & PF_VCPU)
gtime += vtime_delta(t);

} while (read_seqretry(&t->vtime_seqlock, seq));

Subject: [tip:sched/core] sched/cputime: Rename vtime_accounting_enabled() to vtime_accounting_cpu_enabled()

Commit-ID: 55dbdcfa05533f44c9416070b8a9f6432b22314a
Gitweb: http://git.kernel.org/tip/55dbdcfa05533f44c9416070b8a9f6432b22314a
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Thu, 19 Nov 2015 16:47:32 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Dec 2015 10:34:45 +0100

sched/cputime: Rename vtime_accounting_enabled() to vtime_accounting_cpu_enabled()

vtime_accounting_enabled() checks whether vtime is running on the current
CPU and is as such a misnomer. Let's rename it to a name that reflects its
locality. We are going to need the current name for a function that tells
whether vtime runs at all on some CPU.

Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E . McKenney <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/context_tracking.h | 4 ++--
include/linux/vtime.h | 14 +++++++-------
kernel/sched/cputime.c | 2 +-
kernel/time/tick-sched.c | 2 +-
4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 68b575a..d259274 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -86,7 +86,7 @@ static inline void context_tracking_init(void) { }
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
static inline void guest_enter(void)
{
- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
vtime_guest_enter(current);
else
current->flags |= PF_VCPU;
@@ -100,7 +100,7 @@ static inline void guest_exit(void)
if (context_tracking_is_enabled())
__context_tracking_exit(CONTEXT_GUEST);

- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
vtime_guest_exit(current);
else
current->flags &= ~PF_VCPU;
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index c5165fd..ca23e83 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -10,14 +10,14 @@
struct task_struct;

/*
- * vtime_accounting_enabled() definitions/declarations
+ * vtime_accounting_cpu_enabled() definitions/declarations
*/
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-static inline bool vtime_accounting_enabled(void) { return true; }
+static inline bool vtime_accounting_cpu_enabled(void) { return true; }
#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */

#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-static inline bool vtime_accounting_enabled(void)
+static inline bool vtime_accounting_cpu_enabled(void)
{
if (context_tracking_is_enabled()) {
if (context_tracking_cpu_is_enabled())
@@ -29,7 +29,7 @@ static inline bool vtime_accounting_enabled(void)
#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */

#ifndef CONFIG_VIRT_CPU_ACCOUNTING
-static inline bool vtime_accounting_enabled(void) { return false; }
+static inline bool vtime_accounting_cpu_enabled(void) { return false; }
#endif /* !CONFIG_VIRT_CPU_ACCOUNTING */


@@ -44,7 +44,7 @@ extern void vtime_task_switch(struct task_struct *prev);
extern void vtime_common_task_switch(struct task_struct *prev);
static inline void vtime_task_switch(struct task_struct *prev)
{
- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
vtime_common_task_switch(prev);
}
#endif /* __ARCH_HAS_VTIME_TASK_SWITCH */
@@ -59,7 +59,7 @@ extern void vtime_account_irq_enter(struct task_struct *tsk);
extern void vtime_common_account_irq_enter(struct task_struct *tsk);
static inline void vtime_account_irq_enter(struct task_struct *tsk)
{
- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
vtime_common_account_irq_enter(tsk);
}
#endif /* __ARCH_HAS_VTIME_ACCOUNT */
@@ -78,7 +78,7 @@ extern void vtime_gen_account_irq_exit(struct task_struct *tsk);

static inline void vtime_account_irq_exit(struct task_struct *tsk)
{
- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
vtime_gen_account_irq_exit(tsk);
}

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5cf24e7..5727217 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -466,7 +466,7 @@ void account_process_tick(struct task_struct *p, int user_tick)
cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
struct rq *rq = this_rq();

- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
return;

if (sched_clock_irqtime) {
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 515edf3..11ce599 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -875,7 +875,7 @@ static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
unsigned long ticks;

- if (vtime_accounting_enabled())
+ if (vtime_accounting_cpu_enabled())
return;
/*
* We stopped the tick in idle. Update process times would miss the

Subject: [tip:sched/core] sched/cputime: Introduce vtime accounting check for readers

Commit-ID: e592539466380279a9e6e6fdfe4545aa54f22593
Gitweb: http://git.kernel.org/tip/e592539466380279a9e6e6fdfe4545aa54f22593
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Thu, 19 Nov 2015 16:47:33 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Dec 2015 10:34:45 +0100

sched/cputime: Introduce vtime accounting check for readers

Readers need to know whether vtime runs at all on some CPU somewhere; this
is a fast-path check to determine whether we need to go further and add up
any tickless cputime delta.

This fast-path check uses the context tracking state because vtime is tied
to context tracking as of now. That check is confusing though, so let's use
a vtime function that hides the context tracking details inside the vtime
implementation instead.

Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E . McKenney <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/vtime.h | 13 ++++++++++++-
kernel/sched/cputime.c | 6 +++---
2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index ca23e83..fa21969 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -17,9 +17,20 @@ static inline bool vtime_accounting_cpu_enabled(void) { return true; }
#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */

#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+/*
+ * Checks if vtime is enabled on some CPU. Cputime readers want to be careful
+ * in that case and compute the tickless cputime.
+ * For now vtime state is tied to context tracking. We might want to decouple
+ * those later if necessary.
+ */
+static inline bool vtime_accounting_enabled(void)
+{
+ return context_tracking_is_enabled();
+}
+
static inline bool vtime_accounting_cpu_enabled(void)
{
- if (context_tracking_is_enabled()) {
+ if (vtime_accounting_enabled()) {
if (context_tracking_cpu_is_enabled())
return true;
}
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5727217..9989c3f 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -788,7 +788,7 @@ cputime_t task_gtime(struct task_struct *t)
unsigned int seq;
cputime_t gtime;

- if (!context_tracking_is_enabled())
+ if (!vtime_accounting_enabled())
return t->gtime;

do {
@@ -853,7 +853,7 @@ void task_cputime(struct task_struct *t, cputime_t *utime, cputime_t *stime)
{
cputime_t udelta, sdelta;

- if (!context_tracking_is_enabled()) {
+ if (!vtime_accounting_enabled()) {
if (utime)
*utime = t->utime;
if (stime)
@@ -874,7 +874,7 @@ void task_cputime_scaled(struct task_struct *t,
{
cputime_t udelta, sdelta;

- if (!context_tracking_is_enabled()) {
+ if (!vtime_accounting_enabled()) {
if (utimescaled)
*utimescaled = t->utimescaled;
if (stimescaled)

Subject: [tip:sched/core] sched/cputime: Convert vtime_seqlock to seqcount

Commit-ID: b7ce2277f087fd052e7e1bbf432f7fecbee82bb6
Gitweb: http://git.kernel.org/tip/b7ce2277f087fd052e7e1bbf432f7fecbee82bb6
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Thu, 19 Nov 2015 16:47:34 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Dec 2015 10:34:46 +0100

sched/cputime: Convert vtime_seqlock to seqcount

The cputime can only be updated by the current task itself, even in the
vtime case. So we can safely use a seqcount instead of a seqlock, as there
is no writer concurrency involved.

Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Hiroshi Shimamoto <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul E . McKenney <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/init_task.h | 2 +-
include/linux/sched.h | 2 +-
kernel/fork.c | 2 +-
kernel/sched/cputime.c | 46 ++++++++++++++++++++++++----------------------
4 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 1c1ff7e..f2cb8d4 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -150,7 +150,7 @@ extern struct task_group root_task_group;

#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
# define INIT_VTIME(tsk) \
- .vtime_seqlock = __SEQLOCK_UNLOCKED(tsk.vtime_seqlock), \
+ .vtime_seqcount = SEQCNT_ZERO(tsk.vtime_seqcount), \
.vtime_snap = 0, \
.vtime_snap_whence = VTIME_SYS,
#else
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3533168..3b0de68 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1519,7 +1519,7 @@ struct task_struct {
cputime_t gtime;
struct prev_cputime prev_cputime;
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
- seqlock_t vtime_seqlock;
+ seqcount_t vtime_seqcount;
unsigned long long vtime_snap;
enum {
/* Task is sleeping or running in a CPU with VTIME inactive */
diff --git a/kernel/fork.c b/kernel/fork.c
index c0a1370..eea32b5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1348,7 +1348,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
prev_cputime_init(&p->prev_cputime);

#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
- seqlock_init(&p->vtime_seqlock);
+ seqcount_init(&p->vtime_seqcount);
p->vtime_snap = 0;
p->vtime_snap_whence = VTIME_INACTIVE;
#endif
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 9989c3f..d5ff5c6 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -696,37 +696,37 @@ static void __vtime_account_system(struct task_struct *tsk)

void vtime_account_system(struct task_struct *tsk)
{
- write_seqlock(&tsk->vtime_seqlock);
+ write_seqcount_begin(&tsk->vtime_seqcount);
__vtime_account_system(tsk);
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seqcount);
}

void vtime_gen_account_irq_exit(struct task_struct *tsk)
{
- write_seqlock(&tsk->vtime_seqlock);
+ write_seqcount_begin(&tsk->vtime_seqcount);
__vtime_account_system(tsk);
if (context_tracking_in_user())
tsk->vtime_snap_whence = VTIME_USER;
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seqcount);
}

void vtime_account_user(struct task_struct *tsk)
{
cputime_t delta_cpu;

- write_seqlock(&tsk->vtime_seqlock);
+ write_seqcount_begin(&tsk->vtime_seqcount);
delta_cpu = get_vtime_delta(tsk);
tsk->vtime_snap_whence = VTIME_SYS;
account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seqcount);
}

void vtime_user_enter(struct task_struct *tsk)
{
- write_seqlock(&tsk->vtime_seqlock);
+ write_seqcount_begin(&tsk->vtime_seqcount);
__vtime_account_system(tsk);
tsk->vtime_snap_whence = VTIME_USER;
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seqcount);
}

void vtime_guest_enter(struct task_struct *tsk)
@@ -738,19 +738,19 @@ void vtime_guest_enter(struct task_struct *tsk)
* synchronization against the reader (task_gtime())
* that can thus safely catch up with a tickless delta.
*/
- write_seqlock(&tsk->vtime_seqlock);
+ write_seqcount_begin(&tsk->vtime_seqcount);
__vtime_account_system(tsk);
current->flags |= PF_VCPU;
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seqcount);
}
EXPORT_SYMBOL_GPL(vtime_guest_enter);

void vtime_guest_exit(struct task_struct *tsk)
{
- write_seqlock(&tsk->vtime_seqlock);
+ write_seqcount_begin(&tsk->vtime_seqcount);
__vtime_account_system(tsk);
current->flags &= ~PF_VCPU;
- write_sequnlock(&tsk->vtime_seqlock);
+ write_seqcount_end(&tsk->vtime_seqcount);
}
EXPORT_SYMBOL_GPL(vtime_guest_exit);

@@ -763,24 +763,26 @@ void vtime_account_idle(struct task_struct *tsk)

void arch_vtime_task_switch(struct task_struct *prev)
{
- write_seqlock(&prev->vtime_seqlock);
+ write_seqcount_begin(&prev->vtime_seqcount);
prev->vtime_snap_whence = VTIME_INACTIVE;
- write_sequnlock(&prev->vtime_seqlock);
+ write_seqcount_end(&prev->vtime_seqcount);

- write_seqlock(&current->vtime_seqlock);
+ write_seqcount_begin(&current->vtime_seqcount);
current->vtime_snap_whence = VTIME_SYS;
current->vtime_snap = sched_clock_cpu(smp_processor_id());
- write_sequnlock(&current->vtime_seqlock);
+ write_seqcount_end(&current->vtime_seqcount);
}

void vtime_init_idle(struct task_struct *t, int cpu)
{
unsigned long flags;

- write_seqlock_irqsave(&t->vtime_seqlock, flags);
+ local_irq_save(flags);
+ write_seqcount_begin(&t->vtime_seqcount);
t->vtime_snap_whence = VTIME_SYS;
t->vtime_snap = sched_clock_cpu(cpu);
- write_sequnlock_irqrestore(&t->vtime_seqlock, flags);
+ write_seqcount_end(&t->vtime_seqcount);
+ local_irq_restore(flags);
}

cputime_t task_gtime(struct task_struct *t)
@@ -792,13 +794,13 @@ cputime_t task_gtime(struct task_struct *t)
return t->gtime;

do {
- seq = read_seqbegin(&t->vtime_seqlock);
+ seq = read_seqcount_begin(&t->vtime_seqcount);

gtime = t->gtime;
if (t->vtime_snap_whence == VTIME_SYS && t->flags & PF_VCPU)
gtime += vtime_delta(t);

- } while (read_seqretry(&t->vtime_seqlock, seq));
+ } while (read_seqcount_retry(&t->vtime_seqcount, seq));

return gtime;
}
@@ -821,7 +823,7 @@ fetch_task_cputime(struct task_struct *t,
*udelta = 0;
*sdelta = 0;

- seq = read_seqbegin(&t->vtime_seqlock);
+ seq = read_seqcount_begin(&t->vtime_seqcount);

if (u_dst)
*u_dst = *u_src;
@@ -845,7 +847,7 @@ fetch_task_cputime(struct task_struct *t,
if (t->vtime_snap_whence == VTIME_SYS)
*sdelta = delta;
}
- } while (read_seqretry(&t->vtime_seqlock, seq));
+ } while (read_seqcount_retry(&t->vtime_seqcount, seq));
}

2015-12-07 16:21:46

by Frederic Weisbecker

Subject: Re: [tip:locking/core] sched/cputime: Fix invalid gtime in proc

On Fri, Dec 04, 2015 at 03:53:13AM -0800, tip-bot for Hiroshi Shimamoto wrote:
> Commit-ID: 2541117b0cf79977fa11a0d6e17d61010677bd7b
> Gitweb: http://git.kernel.org/tip/2541117b0cf79977fa11a0d6e17d61010677bd7b
> Author: Hiroshi Shimamoto <[email protected]>
> AuthorDate: Thu, 19 Nov 2015 16:47:28 +0100
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Fri, 4 Dec 2015 10:18:49 +0100
>
> sched/cputime: Fix invalid gtime in proc
>
> /proc/stats shows invalid gtime when the thread is running in guest.
> When vtime accounting is not enabled, we cannot get a valid delta.
> The delta is calculated with now - tsk->vtime_snap, but tsk->vtime_snap
> is only updated when vtime accounting is runtime enabled.
>
> This patch makes task_gtime() just return gtime without computing the
> buggy non-existing tickless delta when vtime accounting is not enabled.
>
> Use context_tracking_is_enabled() to check if vtime is accounting on
> some cpu, in which case only we need to check the tickless delta. This
> way we fix the gtime value regression on machines not running nohz full.
>
> The kernel config contains CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
> CONFIG_NO_HZ_FULL_ALL=n and boot without nohz_full.
>
> I ran and stop a busy loop in VM and see the gtime in host.
> Dump the 43rd field which shows the gtime in every second:
>
> # while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
> S 4348
> R 7064566
> R 7064766
> R 7064967
> R 7065168
> S 4759
> S 4759
>
> During running busy loop, it returns large value.
>
> After applying this patch, we can see right gtime.
>
> # while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
> S 5338
> R 5365
> R 5465
> R 5566
> R 5666
> S 5726
> S 5726
>
> Signed-off-by: Hiroshi Shimamoto <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Cc: Chris Metcalf <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Luiz Capitulino <[email protected]>
> Cc: Mike Galbraith <[email protected]>
> Cc: Paul E . McKenney <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
> ---

Thanks for applying the patchset!

However we may want to backport this one; it's a regression fix affecting
CONFIG_NO_HZ_FULL=y with nohz_full off (the default for 99% of distros).

Thanks.

> kernel/sched/cputime.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 26a5446..05de80b 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -788,6 +788,9 @@ cputime_t task_gtime(struct task_struct *t)
> unsigned int seq;
> cputime_t gtime;
>
> + if (!context_tracking_is_enabled())
> + return t->gtime;
> +
> do {
> seq = read_seqbegin(&t->vtime_seqlock);
>

2015-12-08 05:34:16

by Ingo Molnar

Subject: Re: [tip:locking/core] sched/cputime: Fix invalid gtime in proc


* Frederic Weisbecker <[email protected]> wrote:

> On Fri, Dec 04, 2015 at 03:53:13AM -0800, tip-bot for Hiroshi Shimamoto wrote:
> > Commit-ID: 2541117b0cf79977fa11a0d6e17d61010677bd7b
> > Gitweb: http://git.kernel.org/tip/2541117b0cf79977fa11a0d6e17d61010677bd7b
> > Author: Hiroshi Shimamoto <[email protected]>
> > AuthorDate: Thu, 19 Nov 2015 16:47:28 +0100
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Fri, 4 Dec 2015 10:18:49 +0100
> >
> > sched/cputime: Fix invalid gtime in proc
> >
> > /proc/stats shows invalid gtime when the thread is running in guest.
> > When vtime accounting is not enabled, we cannot get a valid delta.
> > The delta is calculated with now - tsk->vtime_snap, but tsk->vtime_snap
> > is only updated when vtime accounting is runtime enabled.
> >
> > This patch makes task_gtime() just return gtime without computing the
> > buggy non-existing tickless delta when vtime accounting is not enabled.
> >
> > Use context_tracking_is_enabled() to check if vtime is accounting on
> > some cpu, in which case only we need to check the tickless delta. This
> > way we fix the gtime value regression on machines not running nohz full.
> >
> > The kernel config contains CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
> > CONFIG_NO_HZ_FULL_ALL=n and boot without nohz_full.
> >
> > I ran and stop a busy loop in VM and see the gtime in host.
> > Dump the 43rd field which shows the gtime in every second:
> >
> > # while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
> > S 4348
> > R 7064566
> > R 7064766
> > R 7064967
> > R 7065168
> > S 4759
> > S 4759
> >
> > During running busy loop, it returns large value.
> >
> > After applying this patch, we can see right gtime.
> >
> > # while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
> > S 5338
> > R 5365
> > R 5465
> > R 5566
> > R 5666
> > S 5726
> > S 5726
> >
> > Signed-off-by: Hiroshi Shimamoto <[email protected]>
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> > Cc: Chris Metcalf <[email protected]>
> > Cc: Christoph Lameter <[email protected]>
> > Cc: Linus Torvalds <[email protected]>
> > Cc: Luiz Capitulino <[email protected]>
> > Cc: Mike Galbraith <[email protected]>
> > Cc: Paul E . McKenney <[email protected]>
> > Cc: Paul E. McKenney <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Rik van Riel <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Link: http://lkml.kernel.org/r/[email protected]
> > Signed-off-by: Ingo Molnar <[email protected]>
> > ---
>
> Thanks for applying the patchset!
>
> However we may want to backport this one, it's a regression fix affecting
> CONFIG_NO_HZ_FULL=y with nohz_full off (99% of many distros defaults).
>
> Thanks.

Absolutely, not sure why I forgot that tag - I added it for other recent fixes.

Greg, please consider applying this upstream fix to -stable as well:

2541117b0cf7 ("sched/cputime: Fix invalid gtime in proc")

Note, the commit will cherry-pick cleanly all the way back to v3.9 (released 2.5+
years ago), but it will only build on v3.14 and later kernels.

Thanks,

Ingo