I work with i386 16-way systems. When hyperthreading is enabled they
have 32 "cpus". I recently did a quick performance check on with 2.6
(as it turn out with a SElinux enabled) doing kernel builds.
My basic test was timing kernel makes using -j16 and -j32. I saw
tremendous differences between these two times.
for -j16
real 0m52.450s
user 6m24.572s
sys 2m25.331s
for -j32
real 2m56.743s
user 9m28.781s
sys 73m50.536s
This performance problem was not seen without hyperthreading; I have only
seen it on 16-way with HT. 2.6.10-rc1 was used to evaluate the problem.
Notice the system time goes through the roof: from 2.5 min to almost 74
min! We looked at schedstat data and tried various scheduler / NUMA
options without finding much to point at. We then did some oprofiling and saw
31999102 83.2007 _spin_lock_irqsave
for make -j32.
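For reference, a contended spin_lock_irqsave() spins with local
interrupts disabled, so every cycle a cpu spends waiting is charged as
system time. Very roughly, as a simplified sketch and not the actual
2.6 source (the real code uses arch-specific locked instructions):

#include <linux/spinlock.h>

/* Simplified sketch of an irq-saving spinlock acquisition.  With 32
 * logical cpus hammering one hot lock, nearly all of the time goes
 * into the busy-wait loop below, and it all counts as system time. */
unsigned long sketch_spin_lock_irqsave(spinlock_t *lock)
{
        unsigned long flags;

        local_irq_save(flags);          /* interrupts off on this cpu */
        preempt_disable();
        while (!_raw_spin_trylock(lock))
                cpu_relax();            /* busy-wait: pure system time */
        return flags;
}

That busy-wait loop is where the 74 minutes of system time went.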
After some lock profiling (keeping track of what locks were last used
and how many cycles were spent waiting) it became quite clear that the
avc_lock was to blame. The avc_lock is an SELinux lock.
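For the curious, the instrumentation amounted to something like the
sketch below; lock_stat and profiled_spin_lock_irqsave are names made
up for illustration, not the actual patch we ran with:

#include <linux/spinlock.h>
#include <linux/percpu.h>
#include <asm/timex.h>                  /* get_cycles() */

struct lock_stat {
        void *last_lock;                /* lock we most recently waited on */
        cycles_t wait_cycles;           /* total cycles spent spinning */
};

static DEFINE_PER_CPU(struct lock_stat, lock_stats);

/* Wrap the acquisition: remember the lock address and charge the
 * cycles spent waiting to this cpu's counter. */
#define profiled_spin_lock_irqsave(lock, flags)                 \
do {                                                            \
        cycles_t before = get_cycles();                         \
        spin_lock_irqsave(lock, flags);                         \
        __get_cpu_var(lock_stats).last_lock = (lock);           \
        __get_cpu_var(lock_stats).wait_cycles +=                \
                get_cycles() - before;                          \
} while (0)

Dumping the per-cpu counters after a build pointed straight at the
avc_lock.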
The theory was confirmed by booting with selinux=0. The performance of
make -j32 is now at an acceptable level (within 20%) when compared
to a make -j16.
It appears that SELinux only scales to somewhere between 16 and 32
cpus. I know very little about SELinux and its workings, but I wanted to
report what I have seen on my system. I can't say I am really happy
about this performance.
I would like to thank Derrick Wong and Rick Lindsley for helping to
identify this issue.
Keith Mannthey
LTC xSeries
Please cc me as I am not a regular subscriber to this list.
* keith ([email protected]) wrote:
> I work with i386 16-way systems. When hyperthreading is enabled they
> have 32 "cpus". I recently did a quick performance check on with 2.6
> (as it turn out with a SElinux enabled) doing kernel builds.
Have you tried any recent -mm kernel? avc_lock was refactored to RCU
locking, and benchmarks show it scales quite nicely now.
thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net
On Tue, 2004-11-23 at 14:22, keith wrote:
> After some lock profiling (keeping track of what locks were last used
> and how many cycles were spent waiting) it became quite clear that the
> avc_lock was to blame. The avc_lock is an SELinux lock.
Thanks to work by Kaigai Kohei of NEC, the global avc spinlock has been
replaced by an RCU-based scheme. Those changes are in the -mm patches
(e.g. 2.6.10-rc2-mm3) and will hopefully go upstream after 2.6.10 is
released. There is also ongoing work on baseline performance.
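The shape of the change is the classic RCU read-mostly pattern,
roughly like the sketch below (illustrative only, with invented names,
not the actual security/selinux/avc.c code): lookups, which dominate,
run under rcu_read_lock() with no global lock, and only insertions and
reclaim serialize.

#include <linux/types.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/rcupdate.h>

#define AVC_SLOTS 512

struct avc_entry {
        struct list_head list;
        u32 ssid, tsid, tclass;
        u32 allowed;                    /* cached access decision */
        struct rcu_head rcu;            /* deferred free via call_rcu() */
};

static struct list_head avc_slots[AVC_SLOTS];   /* INIT_LIST_HEAD at boot */
static spinlock_t avc_insert_lock = SPIN_LOCK_UNLOCKED;

/* Hot path: an RCU read-side critical section instead of the global
 * avc_lock; the decision is copied out before rcu_read_unlock(). */
static int avc_cached_allowed(u32 ssid, u32 tsid, u32 tclass,
                              u32 requested, u32 *allowed)
{
        struct avc_entry *e;
        int found = 0;

        rcu_read_lock();
        list_for_each_entry_rcu(e, &avc_slots[(ssid ^ tsid) % AVC_SLOTS],
                                list) {
                if (e->ssid == ssid && e->tsid == tsid &&
                    e->tclass == tclass) {
                        *allowed = e->allowed & requested;
                        found = 1;
                        break;
                }
        }
        rcu_read_unlock();
        return found;
}

/* Slow path: insertions still take a spinlock, but they are rare once
 * the cache warms up; removal defers freeing with call_rcu() so
 * in-flight readers stay safe. */
static void avc_insert(struct avc_entry *e)
{
        spin_lock(&avc_insert_lock);
        list_add_rcu(&e->list, &avc_slots[(e->ssid ^ e->tsid) % AVC_SLOTS]);
        spin_unlock(&avc_insert_lock);
}

The win is that the read side no longer bounces one lock's cache line
between 32 cpus; the locking cost moves to the rare update path.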
--
Stephen Smalley <[email protected]>
National Security Agency
On Tue, 23 Nov 2004 11:22:05 PST, keith said:
> After some lock profiling (keeping track of what locks were last used
> and how many cycles were spent waiting) it became quite clear that the
> avc_lock was to blame. The avc_lock is an SELinux lock.
Known issue - in the -mm kernels there are these patches:
selinux-scalability-add-spin_trylock_irq-and.patch
SELinux scalability: add spin_trylock_irq and spin_trylock_irqsave
selinux-scalability-convert-avc-to-rcu.patch
SELinux scalability: convert AVC to RCU
selinux-atomic_dec_and_test-bug.patch
SELinux: atomic_dec_and_test() bug
selinux-scalability-avc-statistics-and-tuning.patch
SELinux scalability: AVC statistics and tuning
I don't know if these patches require other infrastructure from the -mm
patch series, or if they'll apply cleanly to a 2.6.10-rc2 kernel.
On Tue, 2004-11-23 at 12:52, Chris Wright wrote:
> * keith ([email protected]) wrote:
> > I work with i386 16-way systems. When hyperthreading is enabled they
> > have 32 "cpus". I recently did a quick performance check on with 2.6
> > (as it turn out with a SElinux enabled) doing kernel builds.
>
> Have you tried any recent -mm kernel? avc_lock was refactored to RCU
> locking, and benchmarks show it scales quite nicely now.
I just tried 2.6.10-rc2-mm3 on my box. The patches in -mm fix the
avc_lock problem. The refactoring of the lock worked :)
Thanks,
Keith Mannthey
LTC xSeries