2001-12-03 08:51:49

by Andrew Morton

Subject: ext3-0.9.16 against linux-2.4.17-pre2

An ext3 update which also applies to linux-2.4.16 is available at

http://www.zip.com.au/~akpm/linux/ext3/

Quite a lot of miscellany here. It would be appreciated if interested
parties could please test it in preparation for sending upstream. Thanks.

Changelog:


- Merged several ext2 sync-up patches from Christoph Hellwig

- Drop the big kernel lock across the call to block_prepare_write.
This was causing excessive contention on large SMP machines. Thanks
to Anton ("dbench") Blanchard for finding this.

- Fixed a couple of potential kmap leaks on error paths.

There is some question whether the core kernel should be changed so
that this is not necessary, but it is right for current kernels.

- Fixed bugs concerning the use of bit operations on 32 bit quantities,
which could cause problems on 64-bit hardware. Thanks davem.

- Fix failure to return EFBIG when an attempt is made to lengthen an
ext3 file to more than the maximum file size via ftruncate().

- Current ext3 can cause an assertion failure and take down the machine
when an I/O error is encountered while mapping journal blocks in
preparation for writing to the journal. Fix from Stephen turns the
filesystem readonly when this occurs.

- ext3 is presently marking data dirty itself, which defeats the core
kernel's dirty buffer balancing. Take that out and let the generic
layer mark the buffers dirty.

This change, along with core kernel changes in 2.4.17-pre2, can
potentially reduce system congestion under heavy write loads.

- Update Documentation/Changes to reflect requirement for e2fsprogs
version (1.25)

- Update Documentation/Locking to describe the two address_space
methods which ext3 introduced.


2001-12-05 12:32:55

by Florian Lohoff

Subject: Re: ext3-0.9.16 against linux-2.4.17-pre2

On Sun, Dec 02, 2001 at 09:51:01PM -0800, Andrew Morton wrote:
>
> An ext3 update which also applies to linux-2.4.16 is available at
>

It seems something broke between 2.4.15-pre2 and this update - I am
seeing filesystem corruption:

Procmail moans about "locked" mailboxes - opening them shows that
the last mail arrived about 4 hours ago, although new mail comes in
every minute.

procmail: Extraneous locallockfile ignored
procmail: Error while writing to "countpl/20011205"
procmail: Truncated file to former size
procmail: Error while writing to "archive/received-200112"
procmail: Truncated file to former size
From [email protected] Wed Dec 5 13:25:37 2001
Subject: Cron <nwmgmt@mgr1> /aol/bin/count.pl
Folder: /home/flo/Mail/inbox

(flo@ping)~# ls -la Mail/countpl/20011205 Mail/archive/received-200112
-rw------- 1 flo flo 51200000 Dec 5 13:25 Mail/archive/received-200112
-rw------- 1 flo flo 51200000 Dec 5 13:25 Mail/countpl/20011205

The last lines of the countpl/20011205 file contain only zeros -
cut'n'pasted from "most".

0x030D3C20: 3E0A4461 74653A20 5765642C 20203520 >.Date: Wed, 5
0x030D3C30: 44656320 32303031 2030393A 32303A32 Dec 2001 09:20:2
0x030D3C40: 32202B30 30303020 28474D54 290A0A45 2 +0000 (GMT)..E
0x030D3C50: 52522020 31373938 3520322E 72646967 RR 17985 2.rdig
0x030D3C60: 2E756B20 3A203632 2E35352E 382E3132 .uk : 62.55.8.12
0x030D3C70: 36206661 696C6564 20776169 74696E67 6 failed waiting
0x030D3C80: 20666F72 20706167 696E6720 72657175 for paging requ
0x030D3C90: 65737420 696E206C 32747020 73657373 est in l2tp sess
0x030D3CA0: 696F6E20 7461626C 650A0A00 00000000 ion table.......
0x030D3CB0: 00000000 00000000 00000000 00000000 ................
0x030D3CC0: 00000000 00000000 00000000 00000000 ................
0x030D3CD0: 00000000 00000000 00000000 00000000 ................
0x030D3CE0: 00000000 00000000 00000000 00000000 ................
0x030D3CF0: 00000000 00000000 00000000 00000000 ................
0x030D3D00: 00000000 00000000 00000000 00000000 ................
0x030D3D10: 00000000 00000000 00000000 00000000 ................
0x030D3D20: 00000000 00000000 00000000 00000000 ................
0x030D3D30: 00000000 00000000 00000000 00000000 ................
0x030D3D40: 00000000 00000000 00000000 00000000 ................
0x030D3D50: 00000000 00000000 00000000 00000000 ................
0x030D3D60: 00000000 00000000 00000000 00000000 ................
0x030D3D70: 00000000 00000000 00000000 00000000 ................
0x030D3D80: 00000000 00000000 00000000 00000000 ................
0x030D3D90: 00000000 00000000 00000000 00000000 ................
0x030D3DA0: 00000000 00000000 00000000 00000000 ................
0x030D3DB0: 00000000 00000000 00000000 00000000 ................
0x030D3DC0: 00000000 00000000 00000000 00000000 ................
0x030D3DD0: 00000000 00000000 00000000 00000000 ................
0x030D3DE0: 00000000 00000000 00000000 00000000 ................
0x030D3DF0: 00000000 00000000 00000000 00000000 ................
0x030D3E00: 00000000 00000000 00000000 00000000 ................
0x030D3E10: 00000000 00000000 00000000 00000000 ................
0x030D3E20: 00000000 00000000 00000000 00000000 ................
0x030D3E30: 00000000 00000000 00000000 00000000 ................
0x030D3E40: 00000000 00000000 00000000 00000000 ................
0x030D3E50: 00000000 00000000 00000000 00000000 ................
0x030D3E60: 00000000 00000000 00000000 00000000 ................
0x030D3E70: 00000000 00000000 00000000 00000000 ................
0x030D3E80: 00000000 00000000 00000000 00000000 ................
0x030D3E90: 00000000 00000000 00000000 00000000 ................
0x030D3EA0: 00000000 00000000 00000000 00000000 ................
0x030D3EB0: 00000000 00000000 00000000 00000000 ................
0x030D3EC0: 00000000 00000000 00000000 00000000 ................
0x030D3ED0: 00000000 00000000 00000000 00000000 ................
0x030D3EE0: 00000000 00000000 00000000 00000000 ................
0x030D3EF0: 00000000 00000000 00000000 00000000 ................
0x030D3F00: 00000000 00000000 00000000 00000000 ................
0x030D3F10: 00000000 00000000 00000000 00000000 ................
0x030D3F20: 00000000 00000000 00000000 00000000 ................
0x030D3F30: 00000000 00000000 00000000 00000000 ................
0x030D3F40: 00000000 00000000 00000000 00000000 ................
0x030D3F50: 00000000 00000000 00000000 00000000 ................
0x030D3F60: 00000000 00000000 00000000 00000000 ................
0x030D3F70: 00000000 00000000 00000000 00000000 ................
0x030D3F80: 00000000 00000000 00000000 00000000 ................
0x030D3F90: 00000000 00000000 00000000 00000000 ................
0x030D3FA0: 00000000 00000000 00000000 00000000 ................
0x030D3FB0: 00000000 00000000 00000000 00000000 ................
0x030D3FC0: 00000000 00000000 00000000 00000000 ................
0x030D3FD0: 00000000 00000000 00000000 00000000 ................
0x030D3FE0: 00000000 00000000 00000000 00000000 ................
0x030D3FF0: 00000000 00000000 00000000 00000000 ................
0x030D4000:

I am backing out the 2.4.17 changes now - I already did a forced fsck
(e2fs 1.25), which didn't find anything abnormal.

(flo@ping)~# uname -a
Linux ping.mediaways.net 2.4.16 #1 Tue Dec 4 19:42:30 CET 2001 i686 unknown

Flo
--
Florian Lohoff [email protected] +49-5201-669912
Nine nineth on september the 9th Welcome to the new billenium



2001-12-05 12:36:45

by Mike Fedyk

Subject: Re: ext3-0.9.16 against linux-2.4.17-pre2

On Wed, Dec 05, 2001 at 01:32:04PM +0100, Florian Lohoff wrote:
> On Sun, Dec 02, 2001 at 09:51:01PM -0800, Andrew Morton wrote:
> >
> > An ext3 update which also applies to linux-2.4.16 is available at
> >
>
> It seems something broke between 2.4.15-pre2 and this update - I am
> seeing filesystem corruption:
>

Hmm, that's strange.

> I am backing out the 2.4.17 changes now - I already did a forced fsck
> (e2fs 1.25), which didn't find anything abnormal.
>
> (flo@ping)~# uname -a
> Linux ping.mediaways.net 2.4.16 #1 Tue Dec 4 19:42:30 CET 2001 i686 unknown
>

Did you apply it against 2.4.16? It was meant for 2.4.17-pre2. Andrew, do
you know if that could be the cause of this problem?

Mike

2001-12-05 13:01:50

by Florian Lohoff

Subject: Re: ext3-0.9.16 against linux-2.4.17-pre2

On Wed, Dec 05, 2001 at 01:32:04PM +0100, Florian Lohoff wrote:
> It seems something broke between 2.4.15-pre2 and this update - I am
> seeing filesystem corruption:
>
> Procmail moans about "locked" mailboxes - opening them shows that
> the last mail arrived about 4 hours ago, although new mail comes in
> every minute.
>
> procmail: Extraneous locallockfile ignored
> procmail: Error while writing to "countpl/20011205"
> procmail: Truncated file to former size
> procmail: Error while writing to "archive/received-200112"
> procmail: Truncated file to former size
> From [email protected] Wed Dec 5 13:25:37 2001
> Subject: Cron <nwmgmt@mgr1> /aol/bin/count.pl
> Folder: /home/flo/Mail/inbox
>
> (flo@ping)~# ls -la Mail/countpl/20011205 Mail/archive/received-200112
> -rw------- 1 flo flo 51200000 Dec 5 13:25 Mail/archive/received-200112
> -rw------- 1 flo flo 51200000 Dec 5 13:25 Mail/countpl/20011205
>
> The last lines of the countpl/20011205 file contain only zeros -
> cut'n'pasted from "most".

Hand me the brown paper bag and let me die in shame :) Postfix, the
ulimit tweaker, bit me again ....
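
For anyone who hits the same symptom: both files above are exactly
51200000 bytes, and the hex dump ends at offset 0x030D4000, which is
51200000 in decimal. That is what a per-process file size limit
(RLIMIT_FSIZE) looks like from the outside, and the value appears to
match the default of Postfix's mailbox_size_limit. The sketch below is
only an illustration, not anything taken from procmail or Postfix; the
function name write_under_fsize_limit and the path used in the usage
note are made up for the example. It lowers the soft limit, writes
past it, and reports how far the write got.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the soft RLIMIT_FSIZE to `limit` bytes, then
 * try to write `total` bytes to `path` in 4 KiB chunks.
 * Returns the number of bytes that reached the file, or -1 on setup
 * failure. *err receives errno from the first failed call (0 if none).
 */
static long write_under_fsize_limit(const char *path, long limit,
                                    long total, int *err)
{
    struct rlimit old, rl;
    char buf[4096];
    long done = 0;
    int fd;

    assert(limit >= 0 && total >= 0);
    *err = 0;
    memset(buf, 'x', sizeof(buf));

    /* The default action for SIGXFSZ kills the process; ignore it so
     * that write() returns -1 with errno set to EFBIG instead. */
    signal(SIGXFSZ, SIG_IGN);

    if (getrlimit(RLIMIT_FSIZE, &old) != 0) {
        *err = errno;
        return -1;
    }
    rl = old;
    rl.rlim_cur = (rlim_t)limit;        /* soft limit only */
    if (setrlimit(RLIMIT_FSIZE, &rl) != 0) {
        *err = errno;
        return -1;
    }

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        *err = errno;
        setrlimit(RLIMIT_FSIZE, &old);
        return -1;
    }

    while (done < total) {
        size_t want = (size_t)(total - done);
        ssize_t n;

        if (want > sizeof(buf))
            want = sizeof(buf);
        n = write(fd, buf, want);       /* may be short at the limit */
        if (n < 0) {
            *err = errno;               /* EFBIG once the limit is hit */
            break;
        }
        done += n;
    }

    close(fd);
    setrlimit(RLIMIT_FSIZE, &old);      /* restore the previous soft limit */
    return done;
}
```

Calling write_under_fsize_limit("/tmp/fsize_demo.tmp", 10000, 20000, &err)
on Linux should stop at 10000 bytes with err set to EFBIG. The filesystem
underneath is not involved, which is consistent with fsck finding nothing
wrong.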

Flo
--
Florian Lohoff [email protected] +49-5201-669912
Nine nineth on september the 9th Welcome to the new billenium



2001-12-05 23:42:38

by Robert Love

Subject: Re: ext3-0.9.16 against linux-2.4.17-pre2

On Mon, 2001-12-03 at 00:51, Andrew Morton wrote:
> An ext3 update which also applies to linux-2.4.16 is available at
>
> http://www.zip.com.au/~akpm/linux/ext3/
>
> Quite a lot of miscellany here. It would be appreciated if interested
> parties could please test it in preparation for sending upstream. Thanks.

Running 2.4.17-pre4 + preempt-kernel + ext3-0.9.16.

System survived a preliminary stress test, involving I/O and VM
pressure, with no problems. Seems solid here.

Also, subjectively the combination of 2.4.17-pre2+ and this ext3 patch
yields better performance under load. Can't say which one provides the
benefit without testing, but hey, it's the user experience that counts.

Robert Love

2001-12-06 08:31:56

by Yusuf Goolamabbas

Subject: 2.4.17-pre2+ext3-0.9.16+anton's cache aligned smp

Running 2.4.17-pre2 + ext3-0.9.16 + Anton Blanchard's
cacheline_aligned_smp patch available at

http://samba.org/~anton/linux/cacheline_aligned/

Running this on a dual Xeon 500/2GB ram attached to a 3ware 6200 with
2x20 IDE disks. RH 7.2 [pain to install on a 440GX+3ware]. Make sure to
look at this bugzilla entry

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=54741

BTW, make install is borked on RH 7.2 (if you use grub) unless you
comment out the lilo in /sbin/installkernel

Workload was two client machines, each running 50 mysql clients that
send queries written by the local database jock to this machine: a mix
of inserts, selects, updates etc.

mysqladmin status on the server showed around 2100 queries/sec.
Seemed very responsive. I'll be adding some more client machines and
reducing server memory and testing further

With Anton's patch, the number of ctx-swtch/sec drops by around 3000
from avg of 9000 (for 17-pre2+ext3) to avg of 6000 (with anton) as seen
by vmstat 1

Load avg is around 4-5 for this compared to 10-12 for 2.4.7-10smp as
installed by RH

I'm also trying to see if I can test with Jens Axboe's blk-highmem
patch. It applies cleanly to 2.4.17-pre2+ext3-0.9.16 but I can't seem to
get CONFIG_HIGHIO configured via make {old,menu}config. Any gurus want
to take a look? I'd really like to reduce usage of bounce buffers.

Also, on #kernelnewbies, Andre Hedrick claims blk-highmem eats your
data. That didn't occur last time I tested it. I thought it was rock
solid and ready for inclusion. Anybody confirm/deny ?

> On Mon, 2001-12-03 at 00:51, Andrew Morton wrote:
> > An ext3 update which also applies to linux-2.4.16 is available at
> >
> > http://www.zip.com.au/~akpm/linux/ext3/
> >
> > Quite a lot of miscellany here. It would be appreciated if interested
> > parties could please test it in preparation for sending upstream. Thanks.
>
> Running 2.4.17-pre4 + preempt-kernel + ext3-0.9.16.
>
> System survived a preliminary stress test, involving I/O and VM
> pressure, with no problems. Seems solid here.
>
> Also, subjectively the combination of 2.4.17-pre2+ and this ext3 patch
> yields better performance under load. Can't say which one provides the
> benefit without testing, but hey, it's the user experience that counts.
>
> Robert Love

2001-12-06 08:40:26

by Jens Axboe

Subject: Re: 2.4.17-pre2+ext3-0.9.16+anton's cache aligned smp

On Thu, Dec 06 2001, Yusuf Goolamabbas wrote:
> I'm also trying to see if I can test with Jens Axboe's blk-highmem
> patch. It applies cleanly to 2.4.17-pre2+ext3-0.9.16 but I can't seem to
> get CONFIG_HIGHIO configured via make {old,menu}config. Any gurus want
> to take a look? I'd really like to reduce usage of bounce buffers.

There was a config bug; please just use Andrea's -aa kernels, which have
that fixed.

> Also, on #kernelnewbies, Andre Hedrick claims blk-highmem eats your
> data. That didn't occur last time I tested it. I thought it was rock
> solid and ready for inclusion. Anybody confirm/deny ?

Andre claims a lot of things.

--
Jens Axboe

2001-12-06 08:46:06

by Andrew Morton

Subject: Re: 2.4.17-pre2+ext3-0.9.16+anton's cache aligned smp

Yusuf Goolamabbas wrote:
>
> Running 2.4.17-pre2 + ext3-0.9.16 + Anton Blanchard's
> cacheline_aligned_smp patch available at
>
> http://samba.org/~anton/linux/cacheline_aligned/

omigod look at that graph.

Excuse me while I get frustrated. Will someone *please* send that
damn patch to [email protected]?

(It can be improved further by putting padding *behind* the lock
but hey).

> ...
>
> With Anton's patch, the number of ctx-swtch/sec drops by around 3000
> from avg of 9000 (for 17-pre2+ext3) to avg of 6000 (with anton) as seen
> by vmstat 1

Really? The spinlock cacheline alignment alone made that
difference? I wonder why.


Thanks for testing.


2001-12-06 13:06:21

by Anton Blanchard

Subject: Re: 2.4.17-pre2+ext3-0.9.16+anton's cache aligned smp


> omigod look at that graph.

:) Well, it's probably worth explaining the results a bit. This machine has
a 128 byte cacheline, and ppc uses a load-with-reservation/store-conditional
pair to do atomic operations. The reservation granularity is one
cacheline, and if another cpu stores into the cacheline we lose
the reservation. If this is the case, it's easy to see why forcing
hot locks into their own cacheline makes a big difference.
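
To make the reservation point concrete, here is a small user-space
sketch. It is an analogy under stated assumptions, not kernel code: the
names DEMO_CACHELINE, demo_hot_word, demo_atomic_inc and
demo_offset_in_line are invented for the example, and 128 is simply the
line size mentioned above. C11's weak compare-and-exchange is allowed
to fail spuriously, which on load-reserve/store-conditional machines
corresponds to a lost reservation, so the retry loop shows where the
extra work comes from when an unrelated store hits the same line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHELINE 128   /* assumption: line size quoted in the mail */

/* A hot word placed at the start of its own cacheline. */
static alignas(DEMO_CACHELINE) atomic_uint demo_hot_word;

/*
 * Atomic increment built from a weak compare-and-exchange.
 * A failed exchange refreshes `old` with the current value, so the
 * loop simply retries. On an LL/SC machine each retry is a fresh
 * reservation; stores by other CPUs into the same line cause retries
 * even when they touch a different variable.
 */
static unsigned int demo_atomic_inc(atomic_uint *word)
{
    unsigned int old = atomic_load_explicit(word, memory_order_relaxed);

    while (!atomic_compare_exchange_weak(word, &old, old + 1)) {
        /* reservation lost or value changed: try again */
    }
    return old + 1;
}

/* Byte offset of an address within its (assumed) cacheline. */
static size_t demo_offset_in_line(const void *p)
{
    return (size_t)((uintptr_t)p % DEMO_CACHELINE);
}
```

demo_offset_in_line(&demo_hot_word) is 0 here because of the alignas.
Aligning the start of an object does not by itself keep the next
variable off the same line; that part depends on what the linker places
behind it.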

> Excuse me while I get frustrated. Will someone *please* send that
> damn patch to [email protected]?

Marcelo had some concerns with my original patch (I changed some things
in UP too). I redid the patch (aligning kernel_flag too as suggested
by you) which does not affect UP, I'll forward it on :)

> (It can be improved further by putting padding *behind* the lock
> but hey).

Well, since we put these spinlocks into the cacheline_aligned section I
don't think we need any padding behind the lock. Can you check out the
offsets in System.map? It looks ok on ppc64 at least :)
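
For comparison, this is roughly what explicit padding behind a lock
would look like outside of a dedicated linker section. It is a sketch
only: demo_spinlock_t is a stand-in type rather than the kernel's
spinlock_t, the other names are hypothetical, and the 128 byte line
size is assumed. It relies on the GCC/Clang aligned attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHELINE 128   /* assumption: same line size as above */

/* Stand-in for a spinlock; not the kernel's spinlock_t. */
typedef struct {
    volatile unsigned int slock;
} demo_spinlock_t;

/*
 * The union pads the lock out to a full line and the attribute aligns
 * it, so nothing else can share the line in front of or behind it.
 */
union demo_padded_lock {
    demo_spinlock_t lock;
    char pad[DEMO_CACHELINE];
} __attribute__((aligned(DEMO_CACHELINE)));

static_assert(sizeof(union demo_padded_lock) == DEMO_CACHELINE,
              "padded lock must fill exactly one cacheline");

/* Two neighbouring locks: each one gets a line of its own. */
static union demo_padded_lock demo_locks[2];

/* Index of the (assumed) cacheline that an address falls into. */
static uintptr_t demo_line_index(const void *p)
{
    return (uintptr_t)p / DEMO_CACHELINE;
}
```

With every object in a section aligned to the line size, the next
object starts on the next line anyway, which is the argument made
above; the padding only matters when a lock sits among ordinary
variables.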

> Really? The spinlock cacheline alignment alone made that
> difference? I wonder why.

I'm guessing the Xeon cannot satisfy a cacheline miss by stealing directly
from another cpu's cache. If that is the case it could be quite expensive to
bounce a cacheline between cpus. But I am also surprised it made that
much difference.

Anton

diff -ru 2.4.17-pre2/arch/alpha/kernel/smp.c 2.4.17-pre2_work/arch/alpha/kernel/smp.c
--- 2.4.17-pre2/arch/alpha/kernel/smp.c Fri Nov 23 18:12:35 2001
+++ 2.4.17-pre2_work/arch/alpha/kernel/smp.c Thu Dec 6 22:47:23 2001
@@ -23,6 +23,7 @@
#include <linux/delay.h>
#include <linux/spinlock.h>
#include <linux/irq.h>
+#include <linux/cache.h>

#include <asm/hwrpb.h>
#include <asm/ptrace.h>
@@ -65,7 +66,7 @@
IPI_CPU_STOP,
};

-spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED;
+spinlock_t kernel_flag __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;

/* Set to a secondary's cpuid when it comes online. */
static unsigned long smp_secondary_alive;
diff -ru 2.4.17-pre2/arch/i386/kernel/smp.c 2.4.17-pre2_work/arch/i386/kernel/smp.c
--- 2.4.17-pre2/arch/i386/kernel/smp.c Thu Oct 25 11:29:51 2001
+++ 2.4.17-pre2_work/arch/i386/kernel/smp.c Thu Dec 6 22:47:26 2001
@@ -17,6 +17,7 @@
#include <linux/smp_lock.h>
#include <linux/kernel_stat.h>
#include <linux/mc146818rtc.h>
+#include <linux/cache.h>

#include <asm/mtrr.h>
#include <asm/pgalloc.h>
@@ -102,7 +103,7 @@
*/

/* The 'big kernel lock' */
-spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED;
+spinlock_t kernel_flag __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;

struct tlb_state cpu_tlbstate[NR_CPUS] = {[0 ... NR_CPUS-1] = { &init_mm, 0 }};

diff -ru 2.4.17-pre2/arch/ia64/kernel/smp.c 2.4.17-pre2_work/arch/ia64/kernel/smp.c
--- 2.4.17-pre2/arch/ia64/kernel/smp.c Fri Nov 23 18:12:36 2001
+++ 2.4.17-pre2_work/arch/ia64/kernel/smp.c Thu Dec 6 22:47:29 2001
@@ -30,6 +30,7 @@
#include <linux/kernel_stat.h>
#include <linux/mm.h>
#include <linux/delay.h>
+#include <linux/cache.h>

#include <asm/atomic.h>
#include <asm/bitops.h>
@@ -51,7 +52,7 @@
#include <asm/mca.h>

/* The 'big kernel lock' */
-spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED;
+spinlock_t kernel_flag __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;

/*
* Structure and data for smp_call_function(). This is designed to minimise static memory
diff -ru 2.4.17-pre2/arch/mips/kernel/smp.c 2.4.17-pre2_work/arch/mips/kernel/smp.c
--- 2.4.17-pre2/arch/mips/kernel/smp.c Fri Nov 23 18:12:37 2001
+++ 2.4.17-pre2_work/arch/mips/kernel/smp.c Thu Dec 6 22:47:37 2001
@@ -31,6 +31,7 @@
#include <linux/timex.h>
#include <linux/sched.h>
#include <linux/interrupt.h>
+#include <linux/cache.h>

#include <asm/atomic.h>
#include <asm/processor.h>
@@ -52,7 +53,7 @@


/* Ze Big Kernel Lock! */
-spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED;
+spinlock_t kernel_flag __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;
int smp_threads_ready; /* Not used */
int smp_num_cpus;
int global_irq_holder = NO_PROC_ID;
diff -ru 2.4.17-pre2/arch/mips64/kernel/smp.c 2.4.17-pre2_work/arch/mips64/kernel/smp.c
--- 2.4.17-pre2/arch/mips64/kernel/smp.c Thu Oct 25 11:29:20 2001
+++ 2.4.17-pre2_work/arch/mips64/kernel/smp.c Thu Dec 6 22:47:44 2001
@@ -5,6 +5,7 @@
#include <linux/time.h>
#include <linux/timex.h>
#include <linux/sched.h>
+#include <linux/cache.h>

#include <asm/atomic.h>
#include <asm/processor.h>
@@ -52,7 +53,7 @@
#endif /* CONFIG_SGI_IP27 */

/* The 'big kernel lock' */
-spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED;
+spinlock_t kernel_flag __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;
int smp_threads_ready; /* Not used */
atomic_t smp_commenced = ATOMIC_INIT(0);
struct cpuinfo_mips cpu_data[NR_CPUS];
diff -ru 2.4.17-pre2/arch/ppc/kernel/smp.c 2.4.17-pre2_work/arch/ppc/kernel/smp.c
--- 2.4.17-pre2/arch/ppc/kernel/smp.c Tue Nov 27 01:12:08 2001
+++ 2.4.17-pre2_work/arch/ppc/kernel/smp.c Thu Dec 6 22:47:54 2001
@@ -23,6 +23,7 @@
#include <linux/unistd.h>
#include <linux/init.h>
#include <linux/spinlock.h>
+#include <linux/cache.h>

#include <asm/ptrace.h>
#include <asm/atomic.h>
@@ -45,7 +46,7 @@
struct klock_info_struct klock_info = { KLOCK_CLEAR, 0 };
atomic_t ipi_recv;
atomic_t ipi_sent;
-spinlock_t kernel_flag __cacheline_aligned = SPIN_LOCK_UNLOCKED;
+spinlock_t kernel_flag __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;
unsigned int prof_multiplier[NR_CPUS];
unsigned int prof_counter[NR_CPUS];
cycles_t cacheflush_time;
diff -ru 2.4.17-pre2/arch/s390/kernel/smp.c 2.4.17-pre2_work/arch/s390/kernel/smp.c
--- 2.4.17-pre2/arch/s390/kernel/smp.c Fri Nov 23 18:12:37 2001
+++ 2.4.17-pre2_work/arch/s390/kernel/smp.c Thu Dec 6 22:48:00 2001
@@ -29,6 +29,7 @@
#include <linux/smp_lock.h>

#include <linux/delay.h>
+#include <linux/cache.h>

#include <asm/sigp.h>
#include <asm/pgalloc.h>
@@ -55,7 +56,7 @@
int smp_threads_ready=0; /* Set when the idlers are all forked. */
static atomic_t smp_commenced = ATOMIC_INIT(0);

-spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED;
+spinlock_t kernel_flag __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;

unsigned long cpu_online_map;

diff -ru 2.4.17-pre2/arch/s390x/kernel/smp.c 2.4.17-pre2_work/arch/s390x/kernel/smp.c
--- 2.4.17-pre2/arch/s390x/kernel/smp.c Fri Nov 23 18:12:37 2001
+++ 2.4.17-pre2_work/arch/s390x/kernel/smp.c Thu Dec 6 22:48:10 2001
@@ -29,6 +29,7 @@
#include <linux/smp_lock.h>

#include <linux/delay.h>
+#include <linux/cache.h>

#include <asm/sigp.h>
#include <asm/pgalloc.h>
@@ -55,7 +56,7 @@
int smp_threads_ready=0; /* Set when the idlers are all forked. */
static atomic_t smp_commenced = ATOMIC_INIT(0);

-spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED;
+spinlock_t kernel_flag __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;

unsigned long cpu_online_map;

diff -ru 2.4.17-pre2/arch/sparc/kernel/smp.c 2.4.17-pre2_work/arch/sparc/kernel/smp.c
--- 2.4.17-pre2/arch/sparc/kernel/smp.c Fri Nov 23 18:12:37 2001
+++ 2.4.17-pre2_work/arch/sparc/kernel/smp.c Thu Dec 6 22:48:17 2001
@@ -18,6 +18,7 @@
#include <linux/mm.h>
#include <linux/fs.h>
#include <linux/seq_file.h>
+#include <linux/cache.h>

#include <asm/ptrace.h>
#include <asm/atomic.h>
@@ -66,7 +67,7 @@
*/

/* Kernel spinlock */
-spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED;
+spinlock_t kernel_flag __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;

/* Used to make bitops atomic */
unsigned char bitops_spinlock = 0;
diff -ru 2.4.17-pre2/arch/sparc64/kernel/smp.c 2.4.17-pre2_work/arch/sparc64/kernel/smp.c
--- 2.4.17-pre2/arch/sparc64/kernel/smp.c Thu Dec 6 22:50:55 2001
+++ 2.4.17-pre2_work/arch/sparc64/kernel/smp.c Thu Dec 6 22:48:23 2001
@@ -17,6 +17,7 @@
#include <linux/spinlock.h>
#include <linux/fs.h>
#include <linux/seq_file.h>
+#include <linux/cache.h>

#include <asm/head.h>
#include <asm/ptrace.h>
@@ -49,7 +50,7 @@
static int smp_activated = 0;

/* Kernel spinlock */
-spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED;
+spinlock_t kernel_flag __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;

volatile int smp_processors_ready = 0;
unsigned long cpu_present_map = 0;
diff -ru 2.4.17-pre2/fs/block_dev.c 2.4.17-pre2_work/fs/block_dev.c
--- 2.4.17-pre2/fs/block_dev.c Fri Nov 23 18:12:43 2001
+++ 2.4.17-pre2_work/fs/block_dev.c Thu Dec 6 22:28:01 2001
@@ -234,7 +234,7 @@
#define HASH_SIZE (1UL << HASH_BITS)
#define HASH_MASK (HASH_SIZE-1)
static struct list_head bdev_hashtable[HASH_SIZE];
-static spinlock_t bdev_lock = SPIN_LOCK_UNLOCKED;
+static spinlock_t bdev_lock __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;
static kmem_cache_t * bdev_cachep;

#define alloc_bdev() \
diff -ru 2.4.17-pre2/fs/buffer.c 2.4.17-pre2_work/fs/buffer.c
--- 2.4.17-pre2/fs/buffer.c Thu Dec 6 22:50:56 2001
+++ 2.4.17-pre2_work/fs/buffer.c Thu Dec 6 22:28:01 2001
@@ -73,7 +73,7 @@
static rwlock_t hash_table_lock = RW_LOCK_UNLOCKED;

static struct buffer_head *lru_list[NR_LIST];
-static spinlock_t lru_list_lock = SPIN_LOCK_UNLOCKED;
+static spinlock_t lru_list_lock __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;
static int nr_buffers_type[NR_LIST];
static unsigned long size_buffers_type[NR_LIST];

diff -ru 2.4.17-pre2/fs/dcache.c 2.4.17-pre2_work/fs/dcache.c
--- 2.4.17-pre2/fs/dcache.c Thu Oct 25 11:29:42 2001
+++ 2.4.17-pre2_work/fs/dcache.c Thu Dec 6 22:28:01 2001
@@ -29,7 +29,7 @@
#define DCACHE_PARANOIA 1
/* #define DCACHE_DEBUG 1 */

-spinlock_t dcache_lock = SPIN_LOCK_UNLOCKED;
+spinlock_t dcache_lock __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;

/* Right now the dcache depends on the kernel lock */
#define check_lock() if (!kernel_locked()) BUG()
diff -ru 2.4.17-pre2/include/linux/cache.h 2.4.17-pre2_work/include/linux/cache.h
--- 2.4.17-pre2/include/linux/cache.h Thu Oct 25 11:29:44 2001
+++ 2.4.17-pre2_work/include/linux/cache.h Thu Dec 6 22:28:01 2001
@@ -34,4 +34,12 @@
#endif
#endif /* __cacheline_aligned */

+#ifndef __cacheline_aligned_in_smp
+#ifdef CONFIG_SMP
+#define __cacheline_aligned_in_smp __cacheline_aligned
+#else
+#define __cacheline_aligned_in_smp
+#endif /* CONFIG_SMP */
+#endif
+
#endif /* __LINUX_CACHE_H */
diff -ru 2.4.17-pre2/mm/filemap.c 2.4.17-pre2_work/mm/filemap.c
--- 2.4.17-pre2/mm/filemap.c Tue Nov 27 01:12:08 2001
+++ 2.4.17-pre2_work/mm/filemap.c Thu Dec 6 22:28:01 2001
@@ -53,7 +53,7 @@
EXPORT_SYMBOL(vm_min_readahead);


-spinlock_t pagecache_lock ____cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;
+spinlock_t pagecache_lock __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;
/*
* NOTE: to avoid deadlocking you must never acquire the pagemap_lru_lock
* with the pagecache_lock held.
@@ -63,7 +63,7 @@
* pagemap_lru_lock ->
* pagecache_lock
*/
-spinlock_t pagemap_lru_lock ____cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;
+spinlock_t pagemap_lru_lock __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;

#define CLUSTER_PAGES (1 << page_cluster)
#define CLUSTER_OFFSET(x) (((x) >> page_cluster) << page_cluster)
diff -ru 2.4.17-pre2/mm/highmem.c 2.4.17-pre2_work/mm/highmem.c
--- 2.4.17-pre2/mm/highmem.c Thu Oct 25 11:29:54 2001
+++ 2.4.17-pre2_work/mm/highmem.c Thu Dec 6 22:34:47 2001
@@ -32,7 +32,7 @@
*/
static int pkmap_count[LAST_PKMAP];
static unsigned int last_pkmap_nr;
-static spinlock_t kmap_lock = SPIN_LOCK_UNLOCKED;
+static spinlock_t kmap_lock __cacheline_aligned_in_smp = SPIN_LOCK_UNLOCKED;

pte_t * pkmap_page_table;

2001-12-07 15:25:49

by Daniel Phillips

Subject: Re: 2.4.17-pre2+ext3-0.9.16+anton's cache aligned smp

On December 6, 2001 09:45 am, Andrew Morton wrote:
> Yusuf Goolamabbas wrote:
> >
> > Running 2.4.17-pre2 + ext3-0.9.16 + Anton Blanchard's
> > cacheline_aligned_smp patch available at
> >
> > http://samba.org/~anton/linux/cacheline_aligned/
>
> omigod look at that graph.
>
> Excuse me while I get frustrated. Will someone *please* send that
> damn patch to [email protected]?
>
> (It can be improved further by putting padding *behind* the lock
> but hey).
>
> > ...
> >
> > With Anton's patch, the number of ctx-swtch/sec drops by around 3000
> > from avg of 9000 (for 17-pre2+ext3) to avg of 6000 (with anton) as seen
> > by vmstat 1
>
> Really? The spinlock cacheline alignment alone made that
> difference? I wonder why.

Before getting *too* excited, remember, it's dbench, so effects could easily
be magnified. Maybe test with something better behaved?

--
Daniel