Date: Thu, 3 Mar 2005 16:08:49 -0800
From: Andrew Morton <akpm@osdl.org>
To: peterc@gelato.unsw.edu.au, linux-kernel@vger.kernel.org,
       scalability@gelato.org
Subject: Re: [PATCH] Fixing address space lock contention in 2.6.11
Message-Id: <20050303160849.0b80be76.akpm@osdl.org>
In-Reply-To: <20050302184653.3ea8e29d.akpm@osdl.org>
References: <16934.28560.325928.858464@berry.gelato.unsw.EDU.AU>
	<20050302184653.3ea8e29d.akpm@osdl.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2137
Lines: 55

Andrew Morton <akpm@osdl.org> wrote:
>
> Peter Chubb <peterc@gelato.unsw.edu.au> wrote:
> >
> > 
> > Hi,
> > 	As part of the Gelato scalability focus group, we've been running
> > OSDL's Re-AIM7 benchmark with an I/O intensive load with varying
> > numbers of processors.  The current kernel shows severe contention on
> > the tree_lock in the address space structure when running on tmpfs or
> > ext2 on a RAM disk.
> > 
> 
> Yup.
> 
> Problem is, an rwlock is a little bit slower than a spinlock on a P4 due to
> the buslocked unlock, and a lot of people have p4's.
> 
> Could you do some testing on a 2-way p4?

hm, turns out I did some P4 testing ages ago:

with:

dd if=/dev/zero of=foo bs=1 count=2M  0.80s user 4.15s system 99% cpu 4.961 total
dd if=/dev/zero of=foo bs=1 count=2M  0.73s user 4.26s system 100% cpu 4.987 total
dd if=/dev/zero of=foo bs=1 count=2M  0.79s user 4.25s system 100% cpu 5.034 total

dd if=foo of=/dev/null bs=1  0.80s user 3.12s system 99% cpu 3.928 total
dd if=foo of=/dev/null bs=1  0.77s user 3.15s system 100% cpu 3.914 total
dd if=foo of=/dev/null bs=1  0.92s user 3.02s system 100% cpu 3.935 total

(3.926: 1.87 usecs)

without:

dd if=/dev/zero of=foo bs=1 count=2M  0.85s user 3.92s system 99% cpu 4.780 total
dd if=/dev/zero of=foo bs=1 count=2M  0.78s user 4.02s system 100% cpu 4.789 total
dd if=/dev/zero of=foo bs=1 count=2M  0.82s user 3.94s system 99% cpu 4.763 total
dd if=/dev/zero of=foo bs=1 count=2M  0.71s user 4.10s system 99% cpu 4.810 tota

dd if=foo of=/dev/null bs=1  0.76s user 2.68s system 100% cpu 3.438 total
dd if=foo of=/dev/null bs=1  0.74s user 2.72s system 99% cpu 3.465 total
dd if=foo of=/dev/null bs=1  0.67s user 2.82s system 100% cpu 3.489 total
dd if=foo of=/dev/null bs=1  0.70s user 2.62s system 99% cpu 3.326 total

(3.430: 1.635 usecs)


So the spinlock->rwlock conversion costs us 240 nsecs on a one-byte-write.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/