From: "Darrick J. Wong" <djwong@us.ibm.com>
Subject: [RFC 0/6] chargeback accounting patches
To: "Darrick J. Wong" <djwong@us.ibm.com>,
       Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
       Dipankar Sarma <dipankar.sarma@in.ibm.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
       Balbir Singh <balbir@linux.vnet.ibm.com>
Date: Tue, 25 Nov 2008 13:44:37 -0800
Message-ID: <20081125214437.22900.82384.stgit@elm3a70.beaverton.ibm.com>
User-Agent: StGIT/0.13
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Hi all,

I've taken Vaidy's patches to implement chargeback accounting and modified
them a bit.  The end result is still mostly the same--scaled utime and stime
via taskstats--but hopefully done in a less invasive way.  The point of these
patches is to let a computer utilization accounting system determine that a
particular process was not completely CPU bound, at which point it could
try to determine whether the process was memory-bound, perhaps for better
scheduling later, or to put discounts on the bill.

For sure, this is not to be used as the sole method for measuring processing
capacity. :)

There are six patches in this series.  Allow me to summarize them:

1. First, there are accounting bugs in the cpufreq_stats code that will
   be exposed by a later patch because someone assumed that cputime =
   jiffies.

2. The second patch moves the APERF/MPERF access code into a separate
   file so that the chargeback accounting code and the acpi-cpufreq
   driver can both access those MSRs without stepping on each other.

3. Next, we create a VIRT_CPU_ACCOUNTING config option.  This enables us
   to delegate timeslice accounting out of the generic kernel code into
   arch-specific areas.  In the arch-specific code, we can then use the
   APERF/MPERF ratio to calculate the scaled utime/stime values.  The
   approach used is similar to what is done in arch/powerpc/ to scale
   utime/stime values via SPURR/PURR.

4. Currently, x86 assumes that cputime = jiffies.  However, jiffies is an
   integer counter, which means that fractional jiffies, such as what we
   might get when trying to scale for CPU frequency, can't be represented.
   If we change the cputime units to nanoseconds instead, we can represent
   them without having to muck around with the taskstats code.

5. Convert the acpi-cpufreq driver to use the functions defined in patch 2
   to access APERF/MPERF.  Previously the acpi-cpufreq driver would zero
   the MSRs after accessing them; however, this doesn't play well with
   multiple accessors.  Luckily, on a practical level the registers are
   wide enough that overflow won't happen for a long time.

6. Modify getdelays.c to report utime/stime/scaled_utime/scaled_stime.
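
As a rough illustration of items 3-5 (this is not code from the patches, and
every name in it is hypothetical), the sketch below scales a value by the
APERF/MPERF ratio seen between two counter snapshots.  It assumes MPERF counts
at a fixed reference rate and APERF at the rate the core actually ran, and it
works on deltas so that no reader has to zero the counters.  NSEC_PER_TICK
assumes HZ=250 purely for the sake of the example.

```c
#include <stdint.h>

/* Hypothetical: length of one tick in nanoseconds, assuming HZ=250. */
#define NSEC_PER_TICK 4000000ULL

/* Hypothetical snapshot of the two free-running counters. */
struct perf_sample {
	uint64_t aperf;	/* advances at the actual clock rate */
	uint64_t mperf;	/* advances at the reference clock rate */
};

/*
 * Scale "value" by the APERF/MPERF ratio observed between two snapshots.
 * Only deltas are used, so several readers can share the counters
 * without anyone resetting them.
 */
static uint64_t scale_units(uint64_t value,
			    const struct perf_sample *prev,
			    const struct perf_sample *cur)
{
	/* Unsigned subtraction stays correct across a counter wrap. */
	uint64_t da = cur->aperf - prev->aperf;
	uint64_t dm = cur->mperf - prev->mperf;

	if (dm == 0)
		return value;	/* no reference cycles seen: leave unscaled */

	/*
	 * Integer math: any fractional result is truncated.  The product
	 * can overflow for very large inputs; a real implementation
	 * would need to guard against that.
	 */
	return value * da / dm;
}
```

With snapshots showing the core at half the reference rate,
scale_units(NSEC_PER_TICK, ...) gives 2000000 ns, whereas scale_units(1, ...)
for a single jiffy truncates to 0.  That truncation is the motivation for
item 4.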

Let me know what you think of the patchset.  It's been tested with assorted
heavy/moderate loads and looks ok, though YMMV.  One thing I'm particularly
curious about: this patchset doesn't stray too far away from the notion that
we charge 1 tick to the non-scaled utime/stime depending on whichever space
(user/system) we were in at the time of the tick.  On one hand that's still
fairly close to the way we do things in x86 right now; on the other hand,
it's not terribly precise.
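
To make the precision concern concrete, here is a minimal sketch of
tick-sampled accounting.  It is an illustration only, not the kernel's code;
the struct, enum, and function names are made up.

```c
#include <stdint.h>

/* Hypothetical: which space the CPU was in when the tick fired. */
enum cpu_mode { MODE_USER, MODE_SYSTEM };

/* Hypothetical per-task counters, in nanoseconds. */
struct task_times {
	uint64_t utime_ns;
	uint64_t stime_ns;
};

/*
 * Charge one whole tick to whichever mode was interrupted.  Time spent
 * in the other mode earlier in the same tick is not seen at all.
 */
static void account_tick(struct task_times *t, enum cpu_mode mode_at_tick,
			 uint64_t tick_ns)
{
	if (mode_at_tick == MODE_USER)
		t->utime_ns += tick_ns;
	else
		t->stime_ns += tick_ns;
}
```

A task that spends most of each tick in the kernel but happens to be back in
user space whenever the tick fires would have all of its time charged to
utime_ns and none to stime_ns.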

--D