2012-10-08 22:36:22

by Frederic Weisbecker

Subject: [PATCH 0/3] cputime: Moar cleanups / enhancements

Hi,

I think that the 1st and 3rd patches are pretty uncontroversial given
how vtime_account() confusingly tries to do everything for
CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_IRQ_TIME_ACCOUNTING.

I believe the 2nd is also desirable. vtime_account() is called
twice per irq, sometimes more when softirqs are involved, so
I think we want to optimize that by directly calling its
specialized APIs when possible.

There is still some work to do but I'm proceeding step by step.
I may focus more on the generic vtime implementation next
time, to implement cputime accounting for the tickless patchset.
That, for sure, will inspire even more cputime optimizations/cleanups.

Thanks.

PS: tested on x86 and ppc64 (checked reliability of times and /proc/stat).
But only build-tested on s390 and ia64.

Frederic Weisbecker (3):
kvm: Directly account vtime to system on guest switch
cputime: Specialize irq vtime hooks
cputime: Separate irqtime accounting from generic vtime

arch/ia64/kernel/time.c | 1 +
arch/powerpc/kernel/time.c | 1 +
arch/s390/kernel/vtime.c | 4 ++
include/linux/hardirq.h | 80 +++++++++++++++++++++++++++++++++++--------
include/linux/kernel_stat.h | 8 ----
include/linux/kvm_host.h | 12 +++++-
kernel/sched/cputime.c | 8 ++--
kernel/softirq.c | 6 ++--
8 files changed, 88 insertions(+), 32 deletions(-)

--
1.7.5.4


2012-10-08 22:36:26

by Frederic Weisbecker

Subject: [PATCH 1/3] kvm: Directly account vtime to system on guest switch

Switching to or from guest context is done in ioctl context,
so by the time we call kvm_guest_enter() or kvm_guest_exit()
we know we are not running the idle task.

As a result, we can directly account the cputime using
vtime_account_system().

There are two good reasons to do this:

* We avoid one indirect call and some needless checks on guest
switch, which optimizes this fast path a bit.

* With CONFIG_IRQ_TIME_ACCOUNTING, calling vtime_account()
checks whether there is irq time to account. This is pointless
since we know we are not in an irq on guest switch, so it
wastes CPU cycles for no good reason. vtime_account_system(),
OTOH, is a no-op with this config option.

A further optimization may consist in introducing a vtime_account_guest()
that directly calls account_guest_time().

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Avi Kivity <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Alexander Graf <[email protected]>
Cc: Xiantao Zhang <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Cornelia Huck <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
arch/ia64/kernel/time.c | 1 +
arch/powerpc/kernel/time.c | 1 +
arch/s390/kernel/vtime.c | 4 ++++
include/linux/kernel_stat.h | 1 +
include/linux/kvm_host.h | 12 ++++++++++--
5 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 80ff9ac..3337e97 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -141,6 +141,7 @@ void vtime_account_system(struct task_struct *tsk)

account_system_time(tsk, 0, delta, delta);
}
+EXPORT_SYMBOL_GPL(vtime_account_system);

void vtime_account_idle(struct task_struct *tsk)
{
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index eaa9d0e..5547452 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -345,6 +345,7 @@ void vtime_account_system(struct task_struct *tsk)
if (stolen)
account_steal_time(stolen);
}
+EXPORT_SYMBOL_GPL(vtime_account_system);

void vtime_account_idle(struct task_struct *tsk)
{
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index cb5093c..972082d 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -140,6 +140,10 @@ void vtime_account(struct task_struct *tsk)
}
EXPORT_SYMBOL_GPL(vtime_account);

+void vtime_account_system(struct task_struct *tsk)
+__attribute__((alias("vtime_account")));
+EXPORT_SYMBOL_GPL(vtime_account_system);
+
void __kprobes vtime_stop_cpu(void)
{
struct s390_idle_data *idle = &__get_cpu_var(s390_idle);
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 36d12f0..6747d4b 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -136,6 +136,7 @@ extern void vtime_account_system(struct task_struct *tsk);
extern void vtime_account_idle(struct task_struct *tsk);
#else
static inline void vtime_task_switch(struct task_struct *prev) { }
+static inline void vtime_account_system(struct task_struct *tsk) { }
#endif

#endif /* _LINUX_KERNEL_STAT_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8a59e0a..148db3e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -685,7 +685,11 @@ static inline int kvm_deassign_device(struct kvm *kvm,
static inline void kvm_guest_enter(void)
{
BUG_ON(preemptible());
- vtime_account(current);
+ /*
+ * This is running in ioctl context so we can avoid
+ * the call to vtime_account() with its unnecessary idle check.
+ */
+ vtime_account_system(current);
current->flags |= PF_VCPU;
/* KVM does not hold any references to rcu protected data when it
* switches CPU into a guest mode. In fact switching to a guest mode
@@ -699,7 +703,11 @@ static inline void kvm_guest_enter(void)

static inline void kvm_guest_exit(void)
{
- vtime_account(current);
+ /*
+ * This is running in ioctl context so we can avoid
+ * the call to vtime_account() with its unnecessary idle check.
+ */
+ vtime_account_system(current);
current->flags &= ~PF_VCPU;
}

--
1.7.5.4

2012-10-08 22:36:31

by Frederic Weisbecker

Subject: [PATCH 2/3] cputime: Specialize irq vtime hooks

With CONFIG_VIRT_CPU_ACCOUNTING, when vtime_account()
is called on irq entry/exit, we perform a check on the
context: if we are interrupting the idle task when the
interrupt fires, account the pending cputime to idle,
otherwise account it to system time or its sub-areas: tsk->stime,
hardirq time, softirq time, ...

However, this check for idle only matters on hardirq entry:
we only account pending idle time when a hardirq interrupts the idle task.

In the other cases we always account to system/irq time:

* On hardirq exit we account the time to hardirq time.

* Softirqs don't interrupt idle directly. Either they follow
a hardirq that has already accounted the pending idle time,
or we are running ksoftirqd and idle time was accounted on a
previous context switch.

To optimize this and avoid the indirect call to vtime_account()
and the checks it performs, specialize the vtime irq APIs and
only perform the check on hardirq entry. Other vtime calls
can directly call vtime_account_system().

The CONFIG_IRQ_TIME_ACCOUNTING behaviour doesn't change and still
maps directly to its own vtime_account() implementation. One may want
to take advantage of the new APIs to optimize irq time accounting
as well in the future.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
include/linux/hardirq.h | 82 +++++++++++++++++++++++++++++++++++--------
include/linux/kernel_stat.h | 9 -----
kernel/softirq.c | 6 ++--
3 files changed, 70 insertions(+), 27 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index cab3da3..c126ffb 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -131,13 +131,65 @@ extern void synchronize_irq(unsigned int irq);

struct task_struct;

-#if !defined(CONFIG_VIRT_CPU_ACCOUNTING) && !defined(CONFIG_IRQ_TIME_ACCOUNTING)
-static inline void vtime_account(struct task_struct *tsk)
+#ifdef CONFIG_TICK_CPU_ACCOUNTING
+static inline void vtime_account(struct task_struct *tsk) { }
+static inline void vtime_account_irq_enter(struct task_struct *tsk,
+ unsigned long offset) { }
+static inline void vtime_account_irq_exit(struct task_struct *tsk,
+ unsigned long offset) { }
+#else /* !CONFIG_TICK_CPU_ACCOUNTING */
+extern void vtime_account(struct task_struct *tsk);
+#endif /* !CONFIG_TICK_CPU_ACCOUNTING */
+
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING
+extern void vtime_task_switch(struct task_struct *prev);
+extern void vtime_account_system(struct task_struct *tsk);
+extern void vtime_account_idle(struct task_struct *tsk);
+
+static inline void vtime_account_irq_enter(struct task_struct *tsk,
+ unsigned long offset)
{
+ /*
+ * On hardirq entry we need to check which context we are interrupting.
+ * Time may be accounted to either idle or system.
+ */
+ if (offset == HARDIRQ_OFFSET) {
+ vtime_account(tsk);
+ } else {
+ /*
+ * Softirqs never interrupt idle directly. Either the hardirq
+ * already did and accounted the idle time or we run in
+ * ksoftirqd and idle time was accounted on context switch.
+ */
+ vtime_account_system(tsk);
+ }
}
-#else
-extern void vtime_account(struct task_struct *tsk);
-#endif
+
+static inline void vtime_account_irq_exit(struct task_struct *tsk,
+ unsigned long offset)
+{
+ /* On hard|softirq exit we always account to hard|softirq cputime */
+ vtime_account_system(tsk);
+}
+#else /* !CONFIG_VIRT_CPU_ACCOUNTING */
+static inline void vtime_task_switch(struct task_struct *prev) { }
+static inline void vtime_account_system(struct task_struct *tsk) { }
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING */
+
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+static inline void vtime_account_irq_enter(struct task_struct *tsk,
+ unsigned long offset)
+{
+ vtime_account(tsk);
+}
+
+static inline void vtime_account_irq_exit(struct task_struct *tsk,
+ unsigned long offset)
+{
+ vtime_account(tsk);
+}
+#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+

#if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)

@@ -160,11 +212,11 @@ extern void rcu_nmi_exit(void);
* always balanced, so the interrupted value of ->hardirq_context
* will always be restored.
*/
-#define __irq_enter() \
- do { \
- vtime_account(current); \
- add_preempt_count(HARDIRQ_OFFSET); \
- trace_hardirq_enter(); \
+#define __irq_enter() \
+ do { \
+ vtime_account_irq_enter(current, HARDIRQ_OFFSET); \
+ add_preempt_count(HARDIRQ_OFFSET); \
+ trace_hardirq_enter(); \
} while (0)

/*
@@ -175,11 +227,11 @@ extern void irq_enter(void);
/*
* Exit irq context without processing softirqs:
*/
-#define __irq_exit() \
- do { \
- trace_hardirq_exit(); \
- vtime_account(current); \
- sub_preempt_count(HARDIRQ_OFFSET); \
+#define __irq_exit() \
+ do { \
+ trace_hardirq_exit(); \
+ vtime_account_irq_exit(current, HARDIRQ_OFFSET); \
+ sub_preempt_count(HARDIRQ_OFFSET); \
} while (0)

/*
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 6747d4b..2fbd905 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -130,13 +130,4 @@ extern void account_process_tick(struct task_struct *, int user);
extern void account_steal_ticks(unsigned long ticks);
extern void account_idle_ticks(unsigned long ticks);

-#ifdef CONFIG_VIRT_CPU_ACCOUNTING
-extern void vtime_task_switch(struct task_struct *prev);
-extern void vtime_account_system(struct task_struct *tsk);
-extern void vtime_account_idle(struct task_struct *tsk);
-#else
-static inline void vtime_task_switch(struct task_struct *prev) { }
-static inline void vtime_account_system(struct task_struct *tsk) { }
-#endif
-
#endif /* _LINUX_KERNEL_STAT_H */
diff --git a/kernel/softirq.c b/kernel/softirq.c
index cc96bdc..402d4b5 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -221,7 +221,7 @@ asmlinkage void __do_softirq(void)
current->flags &= ~PF_MEMALLOC;

pending = local_softirq_pending();
- vtime_account(current);
+ vtime_account_irq_enter(current, SOFTIRQ_OFFSET);

__local_bh_disable((unsigned long)__builtin_return_address(0),
SOFTIRQ_OFFSET);
@@ -272,7 +272,7 @@ restart:

lockdep_softirq_exit();

- vtime_account(current);
+ vtime_account_irq_exit(current, SOFTIRQ_OFFSET);
__local_bh_enable(SOFTIRQ_OFFSET);
tsk_restore_flags(current, old_flags, PF_MEMALLOC);
}
@@ -341,7 +341,7 @@ static inline void invoke_softirq(void)
*/
void irq_exit(void)
{
- vtime_account(current);
+ vtime_account_irq_exit(current, HARDIRQ_OFFSET);
trace_hardirq_exit();
sub_preempt_count(IRQ_EXIT_OFFSET);
if (!in_interrupt() && local_softirq_pending())
--
1.7.5.4

2012-10-08 22:36:52

by Frederic Weisbecker

Subject: [PATCH 3/3] cputime: Separate irqtime accounting from generic vtime

vtime_account() doesn't have the same role in
CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_IRQ_TIME_ACCOUNTING.

In the first case it handles time accounting in any context. In
the second case it only handles irq time accounting.

So when vtime_account() is called from outside vtime_account_irq_*(),
the call is pointless for CONFIG_IRQ_TIME_ACCOUNTING.

To fix the confusion, rename vtime_account() to irqtime_account_irq()
in CONFIG_IRQ_TIME_ACCOUNTING. This way we ensure future vtime_account()
calls won't waste cycles in the irqtime APIs.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
include/linux/hardirq.h | 26 ++++++++++++--------------
kernel/sched/cputime.c | 8 ++++----
2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index c126ffb..dc2052c 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -131,17 +131,8 @@ extern void synchronize_irq(unsigned int irq);

struct task_struct;

-#ifdef CONFIG_TICK_CPU_ACCOUNTING
-static inline void vtime_account(struct task_struct *tsk) { }
-static inline void vtime_account_irq_enter(struct task_struct *tsk,
- unsigned long offset) { }
-static inline void vtime_account_irq_exit(struct task_struct *tsk,
- unsigned long offset) { }
-#else /* !CONFIG_TICK_CPU_ACCOUNTING */
-extern void vtime_account(struct task_struct *tsk);
-#endif /* !CONFIG_TICK_CPU_ACCOUNTING */
-
#ifdef CONFIG_VIRT_CPU_ACCOUNTING
+extern void vtime_account(struct task_struct *tsk);
extern void vtime_task_switch(struct task_struct *prev);
extern void vtime_account_system(struct task_struct *tsk);
extern void vtime_account_idle(struct task_struct *tsk);
@@ -174,21 +165,28 @@ static inline void vtime_account_irq_exit(struct task_struct *tsk,
#else /* !CONFIG_VIRT_CPU_ACCOUNTING */
static inline void vtime_task_switch(struct task_struct *prev) { }
static inline void vtime_account_system(struct task_struct *tsk) { }
-#endif /* CONFIG_VIRT_CPU_ACCOUNTING */

#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+extern void irqtime_account_irq(struct task_struct *tsk);
+
static inline void vtime_account_irq_enter(struct task_struct *tsk,
unsigned long offset)
{
- vtime_account(tsk);
+ irqtime_account_irq(tsk);
}

static inline void vtime_account_irq_exit(struct task_struct *tsk,
unsigned long offset)
{
- vtime_account(tsk);
+ irqtime_account_irq(tsk);
}
-#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+#else /* !CONFIG_IRQ_TIME_ACCOUNTING */
+static inline void vtime_account_irq_enter(struct task_struct *tsk,
+ unsigned long offset) { }
+static inline void vtime_account_irq_exit(struct task_struct *tsk,
+ unsigned long offset) { }
+#endif /* !CONFIG_IRQ_TIME_ACCOUNTING */
+#endif /* !CONFIG_VIRT_CPU_ACCOUNTING */


#if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 81b763b..7ad407a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -10,11 +10,11 @@

/*
* There are no locks covering percpu hardirq/softirq time.
- * They are only modified in vtime_account, on corresponding CPU
+ * They are only modified in irqtime_account_irq, on corresponding CPU
* with interrupts disabled. So, writes are safe.
* They are read and saved off onto struct rq in update_rq_clock().
* This may result in other CPU reading this CPU's irq time and can
- * race with irq/vtime_account on this CPU. We would either get old
+ * race with irqtime_account_irq on this CPU. We would either get old
* or new value with a side effect of accounting a slice of irq time to wrong
* task when irq is in progress while we read rq->clock. That is a worthy
* compromise in place of having locks on each irq in account_system_time.
@@ -43,7 +43,7 @@ DEFINE_PER_CPU(seqcount_t, irq_time_seq);
* Called before incrementing preempt_count on {soft,}irq_enter
* and before decrementing preempt_count on {soft,}irq_exit.
*/
-void vtime_account(struct task_struct *curr)
+void irqtime_account_irq(struct task_struct *curr)
{
unsigned long flags;
s64 delta;
@@ -73,7 +73,7 @@ void vtime_account(struct task_struct *curr)
irq_time_write_end();
local_irq_restore(flags);
}
-EXPORT_SYMBOL_GPL(vtime_account);
+EXPORT_SYMBOL_GPL(irqtime_account_irq);

static int irqtime_account_hi_update(void)
{
--
1.7.5.4