From: Peter Chubb <peter@chubb.wattle.id.au>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16945.9771.415044.444272@wombat.chubb.wattle.id.au>
Date: Fri, 11 Mar 2005 16:01:31 +1100
To: Andrew Morton <akpm@osdl.org>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>, linux-kernel@vger.kernel.org
Subject: Re: Microstate Accounting for 2.6.11
In-Reply-To: <20050310200808.306caf98.akpm@osdl.org>
References: <16945.5058.251259.828855@berry.gelato.unsw.EDU.AU>
	<20050310200808.306caf98.akpm@osdl.org>
Comments: Hyperbole mail buttons accepted, v04.18.
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2221
Lines: 52

>>>>> "Andrew" == Andrew Morton <akpm@osdl.org> writes:

Andrew> Peter Chubb <peterc@gelato.unsw.edu.au> wrote:
>>  Timing data on threads at present is pretty crude: when the timer
>> interrupt occurs, a tick is added to either system time or user
>> time for the currently running thread.  Thus in an unpatched kernel
>> one can distinguish three timed states: On-cpu in userspace, on-cpu
>> in system space, and not running.
>> 
>> The actual number of states is much larger.  A thread can be on a
>> runqueue or the expired queue (i.e., ready to run but not running),
>> sleeping on a semaphore or on a futex, having its time stolen to
>> service an interrupt, etc., etc.
>> 
>> This patch adds timers per-state to each struct task_struct, so
>> that time in all these states can be tracked.  This patch contains
>> the core code to do the timing, and to initialise the timers.
>> Subsequent patches enable the code (by adding Kconfig options) and
>> add hooks to track state changes.

Andrew> Why does the kernel need this feature?

I find that it's useful when trying to work out why a thread is going
more slowly than it needs to.  Userspace tools in the CVS repository
at gelato.unsw.edu.au let you graph in real time the time spent in
each state, so you get graphs like this:

     http://gelato.unsw.edu.au/patches/snapshot.png

which shows mplay skipping because of a slow disk/filesystem.

Andrew> Have you any numbers on the overhead?

Around 5% on LMbench context switch numbers for uniprocessor,
negligible on SMP (but SMP context switch results are horrible at the
moment according to LMbench2 -- almost 16 usec); select on 10 fds goes
from 1.665 usec to 1.701 usec.

Andrew> The preempt_disable() in sys_msa() seems odd.

Yes, I only added that yesterday.  It's to prevent migration while
updating the current timer.  All the other places where the current
timer is updated are naturally protected against this.  It should
probably be a local_irq_disable() instead.
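
For anyone who hasn't read the patch, here is a rough userspace model
of what "updating the current timer" means.  It is only a sketch, not
the code from the patch: the state names, struct msa_timers,
msa_init() and msa_switch() are all made up for illustration, and
timestamps are passed in by the caller rather than read from a clock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state list; the states in the real patch may differ. */
enum msa_state {
	MSA_ONCPU_USER,
	MSA_ONCPU_SYS,
	MSA_ON_RUNQUEUE,
	MSA_SLEEPING,
	MSA_INTERRUPTED,
	MSA_NR_STATES
};

/* Stand-in for the per-state timers kept in each struct task_struct. */
struct msa_timers {
	enum msa_state cur;		/* state currently being timed */
	uint64_t last_change;		/* timestamp of the last state change */
	uint64_t total[MSA_NR_STATES];	/* accumulated time in each state */
};

/* Zero all timers and start timing in the given state. */
static void msa_init(struct msa_timers *t, enum msa_state initial,
		     uint64_t now)
{
	for (int i = 0; i < MSA_NR_STATES; i++)
		t->total[i] = 0;
	t->cur = initial;
	t->last_change = now;
}

/*
 * Charge the time since the last change to the current state, then
 * start timing the next one.  This is a multi-step read-modify-write
 * of the current timer, so it must not be disturbed partway through.
 */
static void msa_switch(struct msa_timers *t, enum msa_state next,
		       uint64_t now)
{
	assert(now >= t->last_change);
	t->total[t->cur] += now - t->last_change;
	t->last_change = now;
	t->cur = next;
}
```

In this model a thread that starts in user mode at time 100, enters
the kernel at 150, sleeps at 170 and wakes at 400 ends up with 50
units charged to user, 20 to system and 230 to sleeping.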

Peter C
