2008-11-25 21:46:38

by djwong

Subject: [RFC 0/6] chargeback accounting patches

Hi all,

I've taken Vaidy's patches to implement chargeback accounting and modified
them a bit. The end result is still mostly the same--scaled utime and stime
via taskstats--but hopefully done in a less invasive way. The point of these
patches is to let a computer utilization accounting system determine that a
particular process was not completely CPU bound; it could then try to work
out whether the process was memory bound, for (perhaps) more optimal
scheduling later. Or put discounts on the bill.

For sure, this is not to be used as the sole method for measuring processing
capacity. :)

There are six patches in this series. Allow me to summarize them:

1. First, there are accounting bugs in the cpufreq_stats code that will
be exposed by a later patch because someone assumed that cputime =
jiffies.

2. The second patch moves the APERF/MPERF access code into a separate
file so that the chargeback accounting code and the acpi-cpufreq driver
can both access those MSRs without stepping on each other.

3. Next, we create a VIRT_CPU_ACCOUNTING config option. This enables us
to delegate timeslice accounting out of the generic kernel code into
arch-specific areas. In the arch-specific code, we can then use the
APERF/MPERF ratio to calculate the scaled utime/stime values (a rough
sketch of the arithmetic follows this list). The approach used is
similar to what is done in arch/powerpc/ to scale utime/stime values
via SPURR/PURR.

4. Currently, x86 assumes that cputime = jiffies. However, jiffies is an
integer counter, which means that fractional jiffies, such as what we
get when scaling a tick by CPU frequency, cannot be represented. If
we change the cputime units to nanoseconds, however, we can accomplish
this without having to muck around with the taskstats code.

5. Convert the acpi-cpufreq driver to use the functions defined in patch 2
to access APERF/MPERF. Previously the acpi-cpufreq driver would zero
the MSRs after accessing them; however, this doesn't play well with
multiple accessors. Luckily, on a practical level the registers are
wide enough that overflow won't happen for a long time.

6. Modify getdelays.c to report utime/stime/scaled_utime/scaled_stime.
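
As a rough sketch of the arithmetic behind patches 2-4 (illustration only;
the bookkeeping and names below are invented for this example, not taken
from the patches): MPERF counts at a constant reference rate while the CPU
is in C0 and APERF counts at the actual delivered rate, so the ratio of
their deltas over a tick is the fraction of "full speed" work done during
that tick.

    /* Illustration only: prev_aperf/prev_mperf/tick_ns are made-up names. */
    static u64 prev_aperf, prev_mperf;

    static u64 scaled_tick_ns(u64 tick_ns)
    {
            u64 aperf, mperf, da, dm;

            get_intel_aperf_mperf_registers(&aperf, &mperf);
            da = aperf - prev_aperf;        /* counted at actual frequency */
            dm = mperf - prev_mperf;        /* counted at reference frequency */
            prev_aperf = aperf;
            prev_mperf = mperf;

            if (!dm)
                    return tick_ns;         /* no data; charge the full tick */
            return tick_ns * da / dm;       /* e.g. 4,000,000 ns * 0.6 = 2,400,000 ns */
    }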

Let me know what you think of the patchset. It's been tested with assorted
heavy/moderate loads and looks ok, though YMMV. I'm curious to see what
you all think... for one thing, this patchset doesn't stray too far from
the notion that we charge one full tick to the non-scaled utime or stime,
depending on which space (user/system) we were in at the time of the tick.
On one hand that's still fairly close to the way we do things on x86 right
now; on the other hand, it's not terribly precise.

--D


2008-11-25 21:46:54

by djwong

Subject: [PATCH 1/6] cpufreq_stats: Correct jiffies64/cputime64 conversion

Since cpufreq_stats->time_in_state is a cputime64_t value, we ought
to convert the jiffies64 deltas to cputime64_t before adding them in.
On platforms where cputime != jiffies, the missing conversion leads to
accounting errors in the sysfs reports.
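
(For illustration only, not part of the patch; HZ=250 here is just an
example.)  Once cputime64_t is in nanoseconds (patch 4), skipping the
conversion under-counts time_in_state by a factor of NSEC_PER_SEC/HZ:

    u64 delta = cur_time - stat->last_time;   /* e.g. 3 jiffies */

    /* without conversion: 3 "nanoseconds" added -- essentially nothing */
    /* with conversion:    jiffies64_to_cputime64(3)
     *                     = 3 * (NSEC_PER_SEC / HZ) = 12,000,000 ns    */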

Signed-off-by: Darrick J. Wong <[email protected]>
---
drivers/cpufreq/cpufreq_stats.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index c0ff97d..b5ccf86 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -55,14 +55,18 @@ cpufreq_stats_update (unsigned int cpu)
{
struct cpufreq_stats *stat;
unsigned long long cur_time;
+ cputime64_t d;

cur_time = get_jiffies_64();
spin_lock(&cpufreq_stats_lock);
stat = per_cpu(cpufreq_stats_table, cpu);
- if (stat->time_in_state)
+ if (stat->time_in_state) {
+ d = jiffies64_to_cputime64(cputime_sub(cur_time,
+ stat->last_time));
stat->time_in_state[stat->last_index] =
cputime64_add(stat->time_in_state[stat->last_index],
- cputime_sub(cur_time, stat->last_time));
+ d);
+ }
stat->last_time = cur_time;
spin_unlock(&cpufreq_stats_lock);
return 0;

2008-11-25 21:47:21

by djwong

Subject: [PATCH 2/6] Centralize access to APERF and MPERF MSRs on Intel CPUs

This patch provides helper functions to detect and to access the APERF
and MPERF MSRs on certain Intel CPUs. These two registers are useful
for determining the measured performance over a period of time while
the CPU is in C0 state.
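
A rough usage sketch (the local variables here, including interval_ns, are
invented for the example): a consumer samples both MSRs at the start and end
of an interval, turns the raw readings into deltas with delta_perf(), and
scales the interval with scale_with_perf():

    u64 aperf, mperf, old_aperf, old_mperf, da, dm, scaled, interval_ns;

    get_intel_aperf_mperf_registers(&old_aperf, &old_mperf);
    /* ... the interval of interest runs here ... */
    get_intel_aperf_mperf_registers(&aperf, &mperf);

    da = delta_perf(aperf, &old_aperf);     /* delta, and saves the new value */
    dm = delta_perf(mperf, &old_mperf);
    scaled = scale_with_perf(interval_ns, da, dm);  /* ~ interval_ns * da / dm */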

Signed-off-by: Darrick J. Wong <[email protected]>
---
arch/x86/include/asm/system.h | 19 ++++++++
arch/x86/kernel/Makefile | 2 -
arch/x86/kernel/time.c | 103 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 123 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index 2ed3f0f..787f5c2 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -422,4 +422,23 @@ static inline void rdtsc_barrier(void)
alternative(ASM_NOP3, "lfence", X86_FEATURE_LFENCE_RDTSC);
}

+#define U64_MAX (u64)(~((u64)0))
+
+static inline u64 delta_perf(u64 now, u64 *old)
+{
+ u64 delta;
+
+ if (now >= *old)
+ delta = now - *old;
+ else
+ delta = now + (U64_MAX - *old) + 1;
+
+ *old = now;
+ return delta;
+}
+
+void get_intel_aperf_mperf_registers(u64 *aperf, u64 *mperf);
+u64 scale_with_perf(u64 input, u64 aperf, u64 mperf);
+#define CPUID_6_ECX_APERFMPERF_CAPABILITY (0x1)
+
#endif /* _ASM_X86_SYSTEM_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index e489ff9..7c20f6f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -25,7 +25,7 @@ CFLAGS_tsc.o := $(nostackp)

obj-y := process_$(BITS).o signal_$(BITS).o entry_$(BITS).o
obj-y += traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
-obj-y += time_$(BITS).o ioport.o ldt.o
+obj-y += time_$(BITS).o ioport.o ldt.o time.o
obj-y += setup.o i8259.o irqinit_$(BITS).o setup_percpu.o
obj-$(CONFIG_X86_VISWS) += visws_quirks.o
obj-$(CONFIG_X86_32) += probe_roms_32.o
diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
new file mode 100644
index 0000000..41ff323
--- /dev/null
+++ b/arch/x86/kernel/time.c
@@ -0,0 +1,103 @@
+#include <linux/kernel_stat.h>
+#include <linux/module.h>
+#include <linux/hardirq.h>
+
+void get_intel_aperf_mperf_registers(u64 *aperf, u64 *mperf)
+{
+ union {
+ struct {
+ u32 lo;
+ u32 hi;
+ } split;
+ u64 whole;
+ } aperf_cur, mperf_cur;
+ unsigned long flags;
+
+ /* Read current values of APERF and MPERF MSRs*/
+ local_irq_save(flags);
+ rdmsr(MSR_IA32_MPERF, mperf_cur.split.lo, mperf_cur.split.hi);
+ rdmsr(MSR_IA32_APERF, aperf_cur.split.lo, aperf_cur.split.hi);
+ local_irq_restore(flags);
+
+ *mperf = mperf_cur.whole;
+ *aperf = aperf_cur.whole;
+}
+EXPORT_SYMBOL_GPL(get_intel_aperf_mperf_registers);
+
+u64 scale_with_perf(u64 input, u64 aperf, u64 mperf)
+{
+ union {
+ struct {
+ u32 lo;
+ u32 hi;
+ } split;
+ u64 whole;
+ } aperf_cur, mperf_cur;
+
+ aperf_cur.whole = aperf;
+ mperf_cur.whole = mperf;
+
+#ifdef __i386__
+ /*
+ * We don't want to do a 64-bit divide on a 32-bit kernel, so we
+ * get an approximate value instead. Return 0 in case we cannot
+ * compute an approximation.
+ */
+ if (unlikely(aperf_cur.split.hi || mperf_cur.split.hi)) {
+ int shift_count;
+ u32 h;
+
+ h = max_t(u32, aperf_cur.split.hi, mperf_cur.split.hi);
+ shift_count = fls(h);
+
+ aperf_cur.whole >>= shift_count;
+ mperf_cur.whole >>= shift_count;
+ }
+
+ if (((unsigned long)(-1) / 100) < aperf_cur.split.lo) {
+ int shift_count = 7;
+ aperf_cur.split.lo >>= shift_count;
+ mperf_cur.split.lo >>= shift_count;
+ }
+
+ if (aperf_cur.split.lo && mperf_cur.split.lo)
+ return (aperf_cur.split.lo * input) / mperf_cur.split.lo;
+#else
+ if (unlikely(((unsigned long)(-1) / 100) < aperf_cur.whole)) {
+ int shift_count = 7;
+ aperf_cur.whole >>= shift_count;
+ mperf_cur.whole >>= shift_count;
+ }
+
+ if (aperf_cur.whole && mperf_cur.whole)
+ return (aperf_cur.whole * input) / mperf_cur.whole;
+#endif
+ return 0;
+}
+EXPORT_SYMBOL_GPL(scale_with_perf);
+
+int is_intel_cpu_with_aperf(void)
+{
+ struct cpuinfo_x86 *c;
+ static int has_aperf = -1;
+
+ if (has_aperf >= 0)
+ return has_aperf;
+
+ /* If cpuid_level = 0, it might not have been set yet */
+ c = &cpu_data(smp_processor_id());
+ if (!c->x86_vendor && !c->cpuid_level)
+ return 0;
+
+ /* Check for APERF/MPERF support in hardware */
+ has_aperf = 0;
+ if (c->x86_vendor == X86_VENDOR_INTEL && c->cpuid_level >= 6) {
+ unsigned int ecx;
+ ecx = cpuid_ecx(6);
+ if (ecx & CPUID_6_ECX_APERFMPERF_CAPABILITY)
+ has_aperf = 1;
+ }
+
+ return has_aperf;
+}
+EXPORT_SYMBOL_GPL(is_intel_cpu_with_aperf);

2008-11-25 21:47:42

by djwong

Subject: [PATCH 3/6] x86 chargeback accounting patch

Provides scaled utime and stime CPU usage statistics to facilitate chargeback
accounting for CPU use on the x86 architecture. This code allows taskstats
users to see not only the wall clock time that a process used, but also how
fast the CPU was actually running while doing that work, similar to how the
SPURR and PURR registers are used in arch/powerpc/ to provide scaled utime
and stime.

Signed-off-by: Darrick J. Wong <[email protected]>
---
arch/x86/Kconfig | 12 ++++++++++
arch/x86/include/asm/system.h | 4 +++
arch/x86/kernel/time.c | 48 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 64 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ac22bb7..44e32b1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -206,6 +206,18 @@ menu "Processor type and features"

source "kernel/time/Kconfig"

+config VIRT_CPU_ACCOUNTING
+ bool "Deterministic task and CPU time accounting"
+ default n
+ help
+ Select this option to enable more accurate task and CPU time
+ accounting. This is done by reading a CPU counter on each
+ kernel entry and exit and on transitions within the kernel
+ between system, softirq and hardirq state, so there is a
+ small performance impact.
+
+ If in doubt, say N here.
+
config SMP
bool "Symmetric multi-processing support"
---help---
diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index 787f5c2..3fabf5d 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -422,6 +422,10 @@ static inline void rdtsc_barrier(void)
alternative(ASM_NOP3, "lfence", X86_FEATURE_LFENCE_RDTSC);
}

+#ifdef CONFIG_VIRT_CPU_ACCOUNTING
+void account_system_vtime(struct task_struct *tsk);
+#endif
+
#define U64_MAX (u64)(~((u64)0))

static inline u64 delta_perf(u64 now, u64 *old)
diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index 41ff323..727aa7b 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -2,6 +2,10 @@
#include <linux/module.h>
#include <linux/hardirq.h>

+/* Buffer to store old values of aperf/mperf */
+DEFINE_PER_CPU(u64, vcpu_acct_old_aperf);
+DEFINE_PER_CPU(u64, vcpu_acct_old_mperf);
+
void get_intel_aperf_mperf_registers(u64 *aperf, u64 *mperf)
{
union {
@@ -101,3 +105,47 @@ static inline int is_intel_cpu_with_aperf(void)
return has_aperf;
}
EXPORT_SYMBOL_GPL(is_intel_cpu_with_aperf);
+
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING
+void account_system_vtime(struct task_struct *tsk)
+{
+ /* if intel with aperf */
+ if (!is_intel_cpu_with_aperf())
+ return;
+
+ /* record aperf/mperf right now */
+ get_intel_aperf_mperf_registers(
+ &(per_cpu(vcpu_acct_old_aperf, smp_processor_id())),
+ &(per_cpu(vcpu_acct_old_mperf, smp_processor_id())));
+}
+EXPORT_SYMBOL_GPL(account_system_vtime);
+
+static inline cputime_t cputime_to_scaled(const cputime_t ct)
+{
+ u64 a, m, ad, md;
+
+ /* if not intel with aperf, scale is 1 */
+ if (!is_intel_cpu_with_aperf())
+ return ct;
+
+ /* calculate delta of aperf/mperf since account_system_vtime() above */
+ get_intel_aperf_mperf_registers(&a, &m);
+ ad = delta_perf(a, &(per_cpu(vcpu_acct_old_aperf, smp_processor_id())));
+ md = delta_perf(m, &(per_cpu(vcpu_acct_old_mperf, smp_processor_id())));
+
+ return scale_with_perf(ct, ad, md);
+}
+
+void account_process_tick(struct task_struct *p, int user_tick)
+{
+ cputime_t one_jiffy = jiffies_to_cputime(1);
+
+ if (user_tick) {
+ account_user_time(p, one_jiffy);
+ account_user_time_scaled(p, cputime_to_scaled(one_jiffy));
+ } else {
+ account_system_time(p, HARDIRQ_OFFSET, one_jiffy);
+ account_system_time_scaled(p, cputime_to_scaled(one_jiffy));
+ }
+}
+#endif

2008-11-25 21:47:58

by djwong

Subject: [PATCH 4/6] Migrate x86 to nanosecond cputime64_t

This patch migrates x86 cputime_t/cputime64_t to nanosecond resolution
instead of jiffy resolution when VIRT_CPU_ACCOUNTING is enabled. This is
necessary so that we can report scaled utime and stime for chargeback
accounting: the scaled charge for a tick is generally a fraction of a jiffy,
and the taskstats data are only updated once per jiffy.
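
A quick worked example of why the resolution matters (HZ=250 and the 60%
figure are illustrative only):

    one tick          = 1 jiffy = 4,000,000 ns      (HZ = 250)
    APERF/MPERF ratio = 0.6 over that tick

    jiffy cputime:      scaled charge = 1 * 6 / 10 = 0 jiffies (truncated away)
    nanosecond cputime: scaled charge = 4,000,000 * 6 / 10 = 2,400,000 ns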

Signed-off-by: Darrick J. Wong <[email protected]>
---
arch/x86/include/asm/cputime.h | 109 ++++++++++++++++++++++++++++++++++++++++
1 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cputime.h b/arch/x86/include/asm/cputime.h
index 6d68ad7..dcce01a 100644
--- a/arch/x86/include/asm/cputime.h
+++ b/arch/x86/include/asm/cputime.h
@@ -1 +1,110 @@
+/*
+ * Definitions for measuring cputime on x86 machines.
+ *
+ * Based on <asm-ia64/cputime.h>.
+ *
+ * From the original file:
+ * Copyright (C) 2007 FUJITSU LIMITED
+ * Copyright (C) 2007 Hidetoshi Seto <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * If we have CONFIG_VIRT_CPU_ACCOUNTING, we measure cpu time in nsec.
+ * Otherwise we measure cpu time in jiffies using the generic definitions.
+ */
+
+#ifndef __X86_CPUTIME_H
+#define __X86_CPUTIME_H
+
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING
#include <asm-generic/cputime.h>
+#else
+
+#include <linux/time.h>
+#include <linux/jiffies.h>
+#include <asm/processor.h>
+
+typedef u64 cputime_t;
+typedef u64 cputime64_t;
+
+#define cputime_zero ((cputime_t)0)
+#define cputime_max ((~((cputime_t)0) >> 1) - 1)
+#define cputime_add(__a, __b) ((__a) + (__b))
+#define cputime_sub(__a, __b) ((__a) - (__b))
+#define cputime_div(__a, __n) ((__a) / (__n))
+#define cputime_halve(__a) ((__a) >> 1)
+#define cputime_eq(__a, __b) ((__a) == (__b))
+#define cputime_gt(__a, __b) ((__a) > (__b))
+#define cputime_ge(__a, __b) ((__a) >= (__b))
+#define cputime_lt(__a, __b) ((__a) < (__b))
+#define cputime_le(__a, __b) ((__a) <= (__b))
+
+#define cputime64_zero ((cputime64_t)0)
+#define cputime64_add(__a, __b) ((__a) + (__b))
+#define cputime64_sub(__a, __b) ((__a) - (__b))
+#define cputime_to_cputime64(__ct) (__ct)
+
+/*
+ * Convert cputime <-> jiffies (HZ)
+ */
+#define cputime_to_jiffies(__ct) ((__ct) / (NSEC_PER_SEC / HZ))
+#define jiffies_to_cputime(__jif) ((__jif) * (NSEC_PER_SEC / HZ))
+#define cputime64_to_jiffies64(__ct) ((__ct) / (NSEC_PER_SEC / HZ))
+#define jiffies64_to_cputime64(__jif) ((__jif) * (NSEC_PER_SEC / HZ))
+
+/*
+ * Convert cputime <-> milliseconds
+ */
+#define cputime_to_msecs(__ct) ((__ct) / NSEC_PER_MSEC)
+#define msecs_to_cputime(__msecs) ((__msecs) * NSEC_PER_MSEC)
+
+/*
+ * Convert cputime <-> seconds
+ */
+#define cputime_to_secs(__ct) ((__ct) / NSEC_PER_SEC)
+#define secs_to_cputime(__secs) ((__secs) * NSEC_PER_SEC)
+
+/*
+ * Convert cputime <-> timespec (nsec)
+ */
+static inline cputime_t timespec_to_cputime(const struct timespec *val)
+{
+ cputime_t ret = val->tv_sec * NSEC_PER_SEC;
+ return ret + val->tv_nsec;
+}
+static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val)
+{
+ val->tv_sec = ct / NSEC_PER_SEC;
+ val->tv_nsec = ct % NSEC_PER_SEC;
+}
+
+/*
+ * Convert cputime <-> timeval (msec)
+ */
+static inline cputime_t timeval_to_cputime(struct timeval *val)
+{
+ cputime_t ret = val->tv_sec * NSEC_PER_SEC;
+ return ret + val->tv_usec * NSEC_PER_USEC;
+}
+static inline void cputime_to_timeval(const cputime_t ct, struct timeval *val)
+{
+ val->tv_sec = ct / NSEC_PER_SEC;
+ val->tv_usec = (ct % NSEC_PER_SEC) / NSEC_PER_USEC;
+}
+
+/*
+ * Convert cputime <-> clock (USER_HZ)
+ */
+#define cputime_to_clock_t(__ct) ((__ct) / (NSEC_PER_SEC / USER_HZ))
+#define clock_t_to_cputime(__x) ((__x) * (NSEC_PER_SEC / USER_HZ))
+
+/*
+ * Convert cputime64 to clock.
+ */
+#define cputime64_to_clock_t(__ct) cputime_to_clock_t((cputime_t)__ct)
+
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING */
+#endif /* __X86_CPUTIME_H */

2008-11-25 21:48:29

by djwong

Subject: [PATCH 5/6] acpi_cpufreq: Use centralized APERF/MPERF function calls

Now that we've centralized the APERF/MPERF accessor functions and hooked the
chargeback accounting code up to them, convert the only other user of the
APERF/MPERF MSRs to use those functions as well.
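
The design point (sketched below with invented per-CPU variable names) is
that, because the MSRs are never zeroed any more, each consumer keeps its own
per-CPU snapshot of the last values it saw and computes deltas against that,
so the cpufreq driver and the accounting code no longer clobber each other:

    /* one private snapshot per consumer, per CPU */
    DEFINE_PER_CPU(u64, my_old_aperf);
    DEFINE_PER_CPU(u64, my_old_mperf);

    u64 aperf, mperf, da, dm;

    get_intel_aperf_mperf_registers(&aperf, &mperf);
    da = delta_perf(aperf, &per_cpu(my_old_aperf, smp_processor_id()));
    dm = delta_perf(mperf, &per_cpu(my_old_mperf, smp_processor_id()));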

Signed-off-by: Darrick J. Wong <[email protected]>
---
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 73 +++++-----------------------
1 files changed, 14 insertions(+), 59 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index 8e48c5d..21958af 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -57,7 +57,6 @@ enum {
};

#define INTEL_MSR_RANGE (0xffff)
-#define CPUID_6_ECX_APERFMPERF_CAPABILITY (0x1)

struct acpi_cpufreq_data {
struct acpi_processor_performance *acpi_data;
@@ -243,6 +242,9 @@ static u32 get_cur_val(const cpumask_t *mask)
return cmd.val;
}

+DEFINE_PER_CPU(u64, cpufreq_old_aperf);
+DEFINE_PER_CPU(u64, cpufreq_old_mperf);
+
/*
* Return the measured active (C0) frequency on this CPU since last call
* to this function.
@@ -259,16 +261,8 @@ static u32 get_cur_val(const cpumask_t *mask)
static unsigned int get_measured_perf(struct cpufreq_policy *policy,
unsigned int cpu)
{
- union {
- struct {
- u32 lo;
- u32 hi;
- } split;
- u64 whole;
- } aperf_cur, mperf_cur;
-
+ u64 aperf_cur, mperf_cur;
cpumask_t saved_mask;
- unsigned int perf_percent;
unsigned int retval;

saved_mask = current->cpus_allowed;
@@ -279,60 +273,21 @@ static unsigned int get_measured_perf(struct cpufreq_policy *policy,
return 0;
}

- rdmsr(MSR_IA32_APERF, aperf_cur.split.lo, aperf_cur.split.hi);
- rdmsr(MSR_IA32_MPERF, mperf_cur.split.lo, mperf_cur.split.hi);
-
- wrmsr(MSR_IA32_APERF, 0,0);
- wrmsr(MSR_IA32_MPERF, 0,0);
-
-#ifdef __i386__
- /*
- * We dont want to do 64 bit divide with 32 bit kernel
- * Get an approximate value. Return failure in case we cannot get
- * an approximate value.
- */
- if (unlikely(aperf_cur.split.hi || mperf_cur.split.hi)) {
- int shift_count;
- u32 h;
-
- h = max_t(u32, aperf_cur.split.hi, mperf_cur.split.hi);
- shift_count = fls(h);
-
- aperf_cur.whole >>= shift_count;
- mperf_cur.whole >>= shift_count;
- }
-
- if (((unsigned long)(-1) / 100) < aperf_cur.split.lo) {
- int shift_count = 7;
- aperf_cur.split.lo >>= shift_count;
- mperf_cur.split.lo >>= shift_count;
- }
-
- if (aperf_cur.split.lo && mperf_cur.split.lo)
- perf_percent = (aperf_cur.split.lo * 100) / mperf_cur.split.lo;
- else
- perf_percent = 0;
-
-#else
- if (unlikely(((unsigned long)(-1) / 100) < aperf_cur.whole)) {
- int shift_count = 7;
- aperf_cur.whole >>= shift_count;
- mperf_cur.whole >>= shift_count;
- }
-
- if (aperf_cur.whole && mperf_cur.whole)
- perf_percent = (aperf_cur.whole * 100) / mperf_cur.whole;
- else
- perf_percent = 0;
-
-#endif
+ /* Calculate difference in APERF and MPERF. */
+ get_intel_aperf_mperf_registers(&aperf_cur, &mperf_cur);
+ aperf_cur = delta_perf(aperf_cur, &(per_cpu(cpufreq_old_aperf,
+ smp_processor_id())));
+ mperf_cur = delta_perf(mperf_cur, &(per_cpu(cpufreq_old_mperf,
+ smp_processor_id())));

- retval = per_cpu(drv_data, policy->cpu)->max_freq * perf_percent / 100;
+ /* Scale CPU frequency according to APERF/MPERF. */
+ retval = scale_with_perf(per_cpu(drv_data, policy->cpu)->max_freq,
+ aperf_cur, mperf_cur);

put_cpu();
set_cpus_allowed_ptr(current, &saved_mask);

- dprintk("cpu %d: performance percent %d\n", cpu, perf_percent);
+ dprintk("cpu %d: performance freq %d\n", cpu, retval);
return retval;
}

2008-11-25 21:48:47

by djwong

Subject: [PATCH 6/6] getdelays.c: Show utime, stime, and scaled variants of the two

Since utime, stime, scaled_utime and scaled_stime have been available
via the taskstats interface for a while, modify the helper program to
show those values.

Signed-off-by: Darrick J. Wong <[email protected]>
---
Documentation/accounting/getdelays.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c
index cc49400..1894894 100644
--- a/Documentation/accounting/getdelays.c
+++ b/Documentation/accounting/getdelays.c
@@ -199,7 +199,8 @@ void print_delayacct(struct taskstats *t)
"SWAP %15s%15s\n"
" %15llu%15llu\n"
"RECLAIM %12s%15s\n"
- " %15llu%15llu\n",
+ " %15llu%15llu\n"
+ "TIME utime=%llu stime=%llu sutime=%llu sstime=%llu\n",
"count", "real total", "virtual total", "delay total",
(unsigned long long)t->cpu_count,
(unsigned long long)t->cpu_run_real_total,
@@ -213,7 +214,11 @@ void print_delayacct(struct taskstats *t)
(unsigned long long)t->swapin_delay_total,
"count", "delay total",
(unsigned long long)t->freepages_count,
- (unsigned long long)t->freepages_delay_total);
+ (unsigned long long)t->freepages_delay_total,
+ (unsigned long long)t->ac_utime,
+ (unsigned long long)t->ac_stime,
+ (unsigned long long)t->ac_utimescaled,
+ (unsigned long long)t->ac_stimescaled);
}

void task_context_switch_counts(struct taskstats *t)