2008-03-14 00:03:27

by Bill Huey (hui)

Subject: [PATCH RT 0/6] lockstat measurement extensions

Hello,

I'd like to announce extensions to the lockstat/lockdep framework that
measure how often adaptive spins and/or lock steals could happen
within the rtmutex implementation's slow path. This extends the common
rtmutex functions to pass the depmap and friends so that the
lock_note_contention() function can determine whether an
rtmutex->owner is live on another run queue. If so, it logs the event
in per-cpu storage with peterz's lockstat framework.

I had to extend a lot of function headers using some preprocessor
definitions to minimize the ifdef complexity, but it's still rather
complex even with some of the reductions. With that said and done, I'd
like to suggest that a better method would be to add fields to struct
rtmutex that pass values down to the lock_note_contention() function
and hold state/events that can be post-processed by the
LOCK_CONTENDED*() macros instead. This was originally proposed by
Peter Zijlstra, but I had already decided to complete this track just
to see where it would take me.
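
To make that alternative concrete, here is a minimal userspace sketch
of what carrying the events in the lock itself might look like.
Everything in it (struct rtmutex_model, the RTM_EV_* bits,
rtm_note_event(), rtm_post_process(), struct lock_counters) is
hypothetical and only illustrates the idea; none of it comes from the
patches or from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits recorded in the slow path. */
enum {
	RTM_EV_CONTENDED = 1u << 0,	/* slow path was entered */
	RTM_EV_SPINNABLE = 1u << 1,	/* owner was running on another CPU */
	RTM_EV_STOLEN    = 1u << 2,	/* lock was taken ahead of a waiter */
};

/* Stand-in for struct rtmutex with one extra field for pending events. */
struct rtmutex_model {
	void *owner;
	unsigned int events;
};

/* Hypothetical per-class counters that a top level macro would update. */
struct lock_counters {
	unsigned long contentions;
	unsigned long spinnable;
	unsigned long stolen;
};

/* Slow path: only set a bit, no extra function parameters are needed. */
void rtm_note_event(struct rtmutex_model *lock, unsigned int ev)
{
	assert(lock != NULL);
	lock->events |= ev;
}

/* Top level: fold the recorded bits into the counters, then reset them. */
void rtm_post_process(struct rtmutex_model *lock, struct lock_counters *c)
{
	assert(lock != NULL && c != NULL);
	if (lock->events & RTM_EV_CONTENDED)
		c->contentions++;
	if (lock->events & RTM_EV_SPINNABLE)
		c->spinnable++;
	if (lock->events & RTM_EV_STOLEN)
		c->stolen++;
	lock->events = 0;
}
```

The point of the design is that the slow-path prototypes stay
untouched: the events ride along in the lock and are only turned into
statistics once control is back at the top level.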

This is a reimplementation of my own lockstat work into Peter's stuff.
I hope to pass it over to the Novell folks so that they can maintain
and take over development of this feature which will shine a light on
whether adaptive locks and lateral steals are useful in -rt.

Some results here:
----

lock_stat version 0.2
spinnables_total = 320571
contentions_total = 1670097
stolen_total = 1161888
cpu range error = 0
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
class name    con-bounces    contentions    [adapt,steals]    waittime-min    waittime-max    waittime-total    acq-bounces    acquisitions    holdtime-min    holdtime-max    holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

&type->i_mutex_dir_key#2:    273392    659581    [172517, 658744]    18446744073709    580800.58    524881532.33    315374    1050378    18446744073709    605616.37    107058004.06
------------------------
  &type->i_mutex_dir_key#2    659581    [<ffffffff802a785b>] vfs_readdir+0x52/0xaf
  &type->i_mutex_dir_key#2    0    [<ffffffff802a2917>] do_lookup+0x84/0x1b4

...............................................................................................................................................................................................

irq_desc#1:    305009    305009    [1, 0]    0.51    13.55    1120626.91    620872    1783324    0.23    36.62    8316839.04
----------
  irq_desc#1    304949    [<ffffffff8026c3bf>] do_irqd+0x86/0x2c7
  irq_desc#1    39    [<ffffffff8026cfc8>] handle_fasteoi_irq+0x2a/0x10a

...............................................................................................................................................................................................

(raw_spinlock_t *)(&lock->wait_lock):    77005    77561    [623, 0]    0.32    44.14    93979.97    743444    10998216    0.24    53.89    4278155.20
------------------------------------
  (raw_spinlock_t *)(&lock->wait_lock)    14706    [<ffffffff804ff9bd>] rt_spin_lock_slowunlock+0xf/0x5c
  (raw_spinlock_t *)(&lock->wait_lock)    15631    [<ffffffff804ffa19>] rt_mutex_slowunlock+0xf/0x59
  (raw_spinlock_t *)(&lock->wait_lock)    61    [<ffffffff804ffe8e>] rt_mutex_slowlock+0x1f7/0x2d5
  (raw_spinlock_t *)(&lock->wait_lock)    9    [<ffffffff804ffbf0>] rt_spin_lock_slowlock+0x128/0x1cf

...............................................................................................................................................................................................

dcache_lock.wait_lock:    69559    72424    [34806, 0]    0.36    13.90    87260.09    379404    1234745    0.29    27.01    1749680.84
---------------------
  dcache_lock.wait_lock    12618    [<ffffffff804ff9bd>] rt_spin_lock_slowunlock+0xf/0x5c
  dcache_lock.wait_lock    51591    [<ffffffff804ffbf0>] rt_spin_lock_slowlock+0x128/0x1cf
  dcache_lock.wait_lock    8072    [<ffffffff804ffb02>] rt_spin_lock_slowlock+0x3a/0x1cf
  dcache_lock.wait_lock    143    [<ffffffff802623b7>] task_blocks_on_rt_mutex+0x1aa/0x1bf

----

The total number of contention events is printed at the very top of
the listing, and the adaptive-spin and steal counts for each lock
appear within "[]" to the right of that lock's contention count. It's
interesting how many steals actually happen: stolen_total is 1161888
out of 1670097 contentions, roughly 70%. This was against a load of
"find /" commands, which hit the inode locks pretty heavily. irq_desc
is interesting as well.
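
For anyone post-processing the totals, the ratios can be computed with
a small helper along these lines. This is only a sketch; percent_of()
is a made-up name and is not part of lock_stat.

```c
#include <assert.h>

/* Percentage of `part` within `total`; 0 when no events were recorded. */
double percent_of(unsigned long part, unsigned long total)
{
	if (total == 0)
		return 0.0;
	/* A sub-count can never exceed the count it is a part of. */
	assert(part <= total);
	return 100.0 * (double)part / (double)total;
}
```

With the totals from the listing, percent_of(1161888, 1670097) is the
share of contentions that ended in a steal, and
percent_of(320571, 1670097) is the share where the owner was
spinnable. The same helper applies per lock to the two values inside
"[]".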

Patches can be found here:
http://mmlinux.sourceforge.net/public/lockstat/patch?.diff

More of the output can be found here:
http://mmlinux.sourceforge.net/public/lockstat/output

bill


2008-03-14 00:07:59

by Bill Huey (hui)

Subject: Re: [PATCH RT 1/6] lockstat measurement extensions

Function parameter extension macros and function prototype redefinitions.
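
As a rough illustration of the technique (not the contents of
patch0.diff), a parameter extension macro lets one prototype serve
both the instrumented and the plain build. LOCKSTAT_PARAM,
LOCKSTAT_ARG(), lockstat_note(), slowlock_model() and struct
lockdep_map_model are names invented for this userspace sketch, and
CONFIG_LOCK_STAT is defined by hand here in place of the kernel config.

```c
#include <assert.h>
#include <stddef.h>

#define CONFIG_LOCK_STAT 1	/* remove to build the plain variant */

/* Stand-in for struct lockdep_map, reduced to a single counter. */
struct lockdep_map_model {
	unsigned long contentions;
};

#ifdef CONFIG_LOCK_STAT
/* Extra parameter in prototypes, extra argument at call sites. */
# define LOCKSTAT_PARAM		, struct lockdep_map_model *dep_map
# define LOCKSTAT_ARG(map)	, (map)
# define lockstat_note(map)	((map)->contentions++)
#else
/* Everything compiles away, so callers need no ifdefs of their own. */
# define LOCKSTAT_PARAM
# define LOCKSTAT_ARG(map)
# define lockstat_note(map)	do { } while (0)
#endif

/* One definition covers both configurations. */
int slowlock_model(int *lock_word LOCKSTAT_PARAM)
{
	assert(lock_word != NULL);
	if (*lock_word) {		/* already held: contended */
		lockstat_note(dep_map);
		return -1;
	}
	*lock_word = 1;
	return 0;
}
```

A call site is then written once as
slowlock_model(&word LOCKSTAT_ARG(&map)), with no ifdef around it.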

bill


Attachments:
(No filename) (80.00 B)
patch0.diff (6.86 kB)

2008-03-14 00:12:56

by Bill Huey (hui)

Subject: Re: [PATCH RT 3/6] lockstat measurement extensions

lock_contention/lock_acquire() refactor to put the common code that
searches for an hlock into a common function. A function pointer is
passed to this common function, which can then do the lock_note*
specific work. gcc 4.3 will inline this, so it will not affect codegen
quality for that compiler version (it's now hitting Ubuntu and other
distributions).
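
The shape of that refactor can be sketched in userspace as follows.
This is an assumption-laden model, not the code in patch2.diff:
struct held_lock_model, hlock_find_and_note() and the note_*()
callbacks are hypothetical names, and the held-lock stack is a plain
array.

```c
#include <assert.h>
#include <stddef.h>

struct lockdep_map_model {
	int id;
};

/* One entry of a task's held-lock stack, with two example counters. */
struct held_lock_model {
	struct lockdep_map_model *instance;
	unsigned long contentions;
	unsigned long acquisitions;
};

typedef void (*hlock_note_fn)(struct held_lock_model *hlock);

/*
 * Common part: walk the held-lock stack from the most recent entry
 * and hand the matching entry to the event-specific callback.
 * Returns 1 if the lock was found, 0 otherwise.
 */
int hlock_find_and_note(struct held_lock_model *held, size_t depth,
			struct lockdep_map_model *lock, hlock_note_fn note)
{
	assert(note != NULL);
	for (size_t i = depth; i-- > 0; ) {
		if (held[i].instance == lock) {
			note(&held[i]);
			return 1;
		}
	}
	return 0;
}

/* Event-specific parts, one per kind of note. */
void note_contended(struct held_lock_model *h)
{
	h->contentions++;
}

void note_acquired(struct held_lock_model *h)
{
	h->acquisitions++;
}
```

Each caller then shrinks to a single call such as
hlock_find_and_note(held, depth, lock, note_contended), and a compiler
that inlines through the constant function pointer produces the same
code as the hand-duplicated loops.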

bill


Attachments:
(No filename) (351.00 B)
patch2.diff (8.29 kB)

2008-03-14 00:15:57

by Bill Huey (hui)

Subject: Re: [PATCH RT 4/6] lockstat measurement extensions

rtmutex.c push down of lock_note_contention()/lock_acquire() and the
new function lock_note_stolen() into the rtmutex.c common code itself
from the LOCK_CONTENDED*() top level macros. This is so that I can
note events at the specific points where they occur.

bill


Attachments:
(No filename) (251.00 B)
patch3.diff (8.60 kB)

2008-03-14 00:18:20

by Bill Huey (hui)

Subject: Re: [PATCH RT 5/6] lockstat measurement extensions

Lock function bodies are extended to pass down a struct lockdep_map
and friends for contention logging.

bill


Attachments:
(No filename) (110.00 B)
patch4.diff (10.67 kB)

2008-03-14 00:20:02

by Bill Huey (hui)

Subject: Re: [PATCH RT 6/6] lockstat measurement extensions

Last one. Spin checking against the rtmutex->owner's run queue to
determine whether or not an adaptive spin is useful.
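
The decision being measured can be modelled in userspace roughly like
this. It is a sketch under stated assumptions: struct task_model,
cpu_curr_model[] and owner_spinnable() are invented names, and the
explicit "different CPU than the waiter" test is my reading of "live
on another run queue", not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4

struct task_model {
	int cpu;	/* CPU whose run queue the task belongs to */
	int on_rq;	/* non-zero while queued as runnable */
};

/* Per-CPU "currently running task" pointers, like a run queue's curr. */
struct task_model *cpu_curr_model[NR_CPUS_MODEL];

/*
 * Spinning only pays off when the owner is executing right now on a
 * different CPU than the waiter, so that it can release the lock soon.
 * Returns 1 if a spin looks worthwhile, 0 otherwise.
 */
int owner_spinnable(const struct task_model *owner, int waiter_cpu)
{
	if (owner == NULL || !owner->on_rq)
		return 0;
	assert(owner->cpu >= 0 && owner->cpu < NR_CPUS_MODEL);
	if (owner->cpu == waiter_cpu)
		return 0;
	return cpu_curr_model[owner->cpu] == owner;
}
```

A contention for which this returns 1 would be counted as "adapt" in
the listing; one for which it returns 0 means the waiter would have
had to block regardless.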

bill

2008-03-14 00:26:41

by Bill Huey (hui)

Subject: Re: [PATCH RT 0/6] lockstat measurement extensions

Also, this is not meant for inclusion in its current revision. It'll
need to have the various compilation mode
(CONFIG_PREEMPT_RT/CONFIG_LOCK_STAT) combinations enabled and such. I'm
handing this over to Peter Morreale and it'll be up to him to decide
what to do next.

However, it does work and should produce interesting numbers for the
discussion of whether or not adaptive spins and friends are useful.

bill

2008-03-14 00:40:27

by Bill Huey (hui)

Subject: Re: [PATCH RT 0/6] lockstat measurement extensions

Sorry, I was missing the patch. At Thomas's urging, this is inlined:

--- linux-2.6.24/kernel/sched.c 2008-02-25 15:32:05.000000000 -0800
+++ linux-2.6.24.working/kernel/sched.c 2008-03-13 13:53:24.000000000 -0700
@@ -1175,6 +1175,20 @@
return cpu_curr(task_cpu(p)) == p;
}

+int task_spinnable(struct task_struct *p)
+{
+/*
+ * The use of task_curr can crash the system since the struct thread_info seems
+ * to disappear when dereferenced arbitrarily, so be careful.
+ */
+#ifdef CONFIG_SMP
+	if (p && p->se.on_rq && task_curr(p))
+		return 1;
+#endif
+	/* Owner is not running on a CPU (or !SMP): spinning cannot help. */
+	return 0;
+}
+
/* Used instead of source_load when we know the type == 0 */
unsigned long weighted_cpuload(const int cpu)
{
@@ -1239,6 +1253,11 @@
*new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
u64 clock_offset;

+//--billh
+// if (old_cpu >= NR_CPUS)
+// panic("bogus cpu id %u\n", old_cpu);
+//
+//
clock_offset = old_rq->clock - new_rq->clock;

#ifdef CONFIG_SCHEDSTATS