2001-04-19 23:30:11

by D.W.Howells

[permalink] [raw]
Subject: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]


You asked for some benchmarks Andrea, so I've obtained some.

The set of test modules can be found at:

ftp://infradead.org/pub/people/dwh/rwsem-test.tar.bz2

(This also includes rwsem-stat.txt which has a copy of the benchmark results
in as well)

There are six test programs. They can be made for i386 by the following
command:

make

And can also be made to invoke the scheduler after each pass through the loop:

make SCHED=yes

I ran each individual test twice, hence the two sets of results for
permutation.

My machine is a Dual 400MHz PII with an 440BX chipset. All the tests were run
in runlevel 3 with no other applications running.

I benchmarked four different environments:

(1) 2.4.4-pre3 + Andrea's generic rwsem patch
(2) 2.4.4-pre4 using XADD to implement the rwsems
(3) same as (2) but with a tweak to make rwsem_wake() less fair
(4) 2.4.4-pre3 using my generic spinlock code to implement the rwsems

David


TEST NUM READERS NUM WRITERS CONTENTION
=============== =============== =============== ==========
rwsem-rw 4 2 r-w & w-w
rwsem-ro 4 0 no
rwsem-wo 0 4 w-w only
rwsem-r1 1 0 no
rwsem-w1 0 1 no
rwsem-r2 2 0 no


ENVIRONMENT TEST SCHED READERS WRITERS
=============================== ======= ======= =============== =======
Linux-2.4.4-pre3 + AA-rwsem rws-rw no 3330281 1009
3331972 994
yes 1767102 607091
1743420 642095
rws-ro no 2534630 n/a
3535202 n/a
yes 2837218 n/a
3164814 n/a
rws-wo no n/a 2507449
n/a 2399102
yes n/a 1568467
n/a 1412262
rws-r1 no 9232485 n/a
9217585 n/a
yes 5483757 n/a
5487028 n/a
rws-w1 no n/a 9900333
n/a 9918021
yes n/a 5745657
n/a 5747063
rws-r2 no 3499275 n/a
3518590 n/a
yes 3184431 n/a
3180287 n/a

------------------------------- ------- ------- --------------- -------
Linux-2.4.4-pre4 [XADD] rws-rw no 563388 283005
558159 280288
yes 683670 197017
700714 194316
rws-ro no 6316985 n/a
6314241 n/a
yes 4309406 n/a
4575043 n/a
rws-wo no n/a 765699
n/a 763876
yes n/a 650512
n/a 652287
rws-r1 no 15171532 n/a
15158899 n/a
yes 7222310 n/a
7229793 n/a
rws-w1 no n/a 13942744
n/a 13991823
yes n/a 7362605
n/a 7356127
rws-r2 no 5517671 n/a
5516168 n/a
yes 3452796 n/a
3331947 n/a

------------------------------- ------- ------- --------------- -------
Linux-2.4.4-pre4 [XADD] rws-rw no 531929 267129
+ slightly-unfair-contention 531093 266822
tweak yes 839560 185670
903995 183958
rws-ro no 6318293 n/a
6320336 n/a
yes 4257862 n/a
4315243 n/a
rws-wo no n/a 766427
n/a 766471
yes n/a 644036
n/a 642236


------------------------------- ------- ------- --------------- -------
Linux-2.4.4-pre4 [GENERIC-SPIN] rws-rw no 545138 274002
545378 273785
yes 755343 187874
745888 185562
rws-ro no 4500398 n/a
4506583 n/a
yes 3137883 n/a
3129119 n/a
rws-wo no n/a 763599
n/a 763059
yes n/a 641256
n/a 647319
rws-r1 no 13110083 n/a
13115436 n/a
yes 6950687 n/a
6951901 n/a
rws-w1 no n/a 13004627
n/a 13003754
yes n/a 6899764
n/a 6898953
rws-r2 no 4741615 n/a
4746860 n/a
yes 3340292 n/a
2967268 n/a
0?A8


2001-04-20 01:20:39

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]

On Fri, Apr 20, 2001 at 12:28:09AM +0100, D . W . Howells wrote:
> I benchmarked four different environments:
>
> (1) 2.4.4-pre3 + Andrea's generic rwsem patch
> (2) 2.4.4-pre4 using XADD to implement the rwsems
> (3) same as (2) but with a tweak to make rwsem_wake() less fair
> (4) 2.4.4-pre3 using my generic spinlock code to implement the rwsems
>
> David
>
>
> TEST NUM READERS NUM WRITERS CONTENTION
> =============== =============== =============== ==========
> rwsem-rw 4 2 r-w & w-w
> rwsem-ro 4 0 no
> rwsem-wo 0 4 w-w only
> rwsem-r1 1 0 no
> rwsem-w1 0 1 no
> rwsem-r2 2 0 no
>
>
> ENVIRONMENT TEST SCHED READERS WRITERS
> =============================== ======= ======= =============== =======
> Linux-2.4.4-pre3 + AA-rwsem rws-rw no 3330281 1009
> 3331972 994
[..]
> ------------------------------- ------- ------- --------------- -------
> Linux-2.4.4-pre4 [GENERIC-SPIN] rws-rw no 545138 274002
> 545378 273785
> yes 755343 187874
> 745888 185562

Some explanation on the above extreme difference. In the misc rw benchmark the
reason in the same amount of time I get a total number of down 3332966 and you
get only 819163 is that I provide recursive down_read and that in turn can
starve the down_write (my first patches weren't implementing fair semaphores).

As you can see in my post of yesterday I made my semaphores fair in my last
patches (from rwsem-generic-5). (you didn't said which patch you used exactly
but obviously it was earlier than the -5 revision)

I'm uncertain if I should drop the list_empty() check from the fast path and if
I should still allow up_* to be called from irq/softirq, if I reduce the max
number of sleepers to 2^16 and I will provide weaker wakeup semantics I won't
be penalizied anymore and then we'll really compare apples to orange making the
comparison more interesting (probably I will do because later on I can probably
re-add that two features without too much pain).

About the benchmark you wrote it looks good measure to me, thanks.

Andrea

2001-04-20 10:11:04

by David Howells

[permalink] [raw]
Subject: Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]


> About the benchmark you wrote it looks good measure to me, thanks.

As with all benchmarks, take with one pinch of salt and two of Mindcraft:-)

David

2001-04-20 16:55:49

by Andrea Arcangeli

[permalink] [raw]
Subject: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]

On Fri, Apr 20, 2001 at 03:42:15AM +0200, Andrea Arcangeli wrote:
> I'm uncertain if I should drop the list_empty() check from the fast path and if

While dropping the list_empty check to speed up the fast path I faced the same
complexity of the 2.4.4pre4 lib/rwsem.c and so before reinventing the wheel I
read how the problem was solved in 2.4.4pre4.

I couldn't get convinced that all the subtle possible cases was handled
correctly so I tried to stress them. To do that I changed the 2.4.4pre4 rwsem
(compiled for PII SMP) as below patch shows in order to reproduce more easily
those corner cases (note my changes cannot be the cause of the deadlock, as far
I can tell they can _only_ change the timings, and the timings have to be
flexible because they can be randomly influenced by interrupt handlers machine
checks and whatever else in whatever hardware):

--- rwsemref-1/lib/rwsem.c.~1~ Thu Apr 19 19:59:46 2001
+++ rwsemref-1/lib/rwsem.c Fri Apr 20 17:13:05 2001
@@ -6,6 +6,7 @@
#include <linux/rwsem.h>
#include <linux/sched.h>
#include <linux/module.h>
+#include <linux/delay.h>

/*
* wait for the read lock to be granted
@@ -18,6 +19,7 @@
DECLARE_WAITQUEUE(wait,tsk);
signed long count;

+ mdelay(1);
rwsemtrace(sem,"Entering rwsem_down_read_failed");

/* this waitqueue context flag will be cleared when we are granted the lock */
@@ -59,6 +61,7 @@
DECLARE_WAITQUEUE(wait,tsk);
signed long count;

+ mdelay(1);
rwsemtrace(sem,"Entering rwsem_down_write_failed");

/* this waitqueue context flag will be cleared when we are granted the lock */
@@ -115,6 +118,7 @@
goto out;
}

+ mdelay(1);
/* try to grant a single write lock if there's a writer at the front of the queue
* - note we leave the 'active part' of the count incremented by 1 and the waiting part
* incremented by 0x00010000
@@ -122,6 +126,7 @@
if (wake_up_ctx(&sem->wait,1,-RWSEM_WAITING_FOR_WRITE)==1)
goto out;

+ mdelay(10);
/* grant an infinite number of read locks to the readers at the front of the queue
* - note we increment the 'active part' of the count by the number of readers just woken,
* less one for the activity decrement we've already done
@@ -129,6 +134,7 @@
woken = wake_up_ctx(&sem->wait,65535,-RWSEM_WAITING_FOR_READ);
if (woken<=0)
goto counter_correction;
+ mdelay(1);

woken *= RWSEM_ACTIVE_BIAS-RWSEM_WAITING_BIAS;
woken -= RWSEM_ACTIVE_BIAS;

and I succeeded in deadlocking the 2.4.4pre[2345] rwsemaphores with kernel
looping forever in 2.4.4pre4/lib/rwsem.c:rwsem_wake() but at least the breakage
is 100% reproducible if you use the above patch.

All you have to do to reproduce is to apply the above patch and then to run the
rwsem proggy that I posted to the list two days ago (the one that shows my
latest -5/6 revisions to be just a bit faster in the slow path) that does the
mmap and page faults from different threads in loop, in another console you run
`while :; do ps xa; done`, then while `ps` is near the end of its run you kill
the parent thread of the rwsem testcase so that all thread-children gets killed
as well, and at that point `ps` deadlocks hard in R state unkillable (it keeps
looping forever in the try_again: counter_correction: paths).

[..]
285 ? SW 0:00 [rpciod]
461 ? S 0:00 /usr/sbin/inetd
463 tty1 S 0:00 /sbin/mingetty --noclear tty1
464 tty2 S 0:00 /sbin/mingetty tty2
465 tty3 S 0:00 /sbin/mingetty tty3
466 tty4 S 0:00 /sbin/mingetty tty4
467 tty5 S 0:00 /sbin/mingetty tty5
468 tty6 S 0:00 /sbin/mingetty tty6
469 ? S 0:00 in.rlogind
470 pts/0 S 0:00 login -- andrea
471 pts/0 S 0:00 -bash
530 ? S 0:00 in.rlogind
531 pts/1 S 0:00 login -- andrea
532 pts/1 S 0:00 -bash
666 pts/1 R 0:00 ps xa
667 pts/0 S 0:00 ./rwsem
668 pts/0 R 0:00 ./rwsem
669 pts/0 D 0:00 ./rwsem
670 pts/0 D 0:00 ./rwsem
671 pts/0 D 0:00 ./rwsem
672 pts/0 D 0:00 ./rwsem
673 pts/0 D 0:00 ./rwsem
674 pts/0 D 0:00 ./rwsem
675 pts/0 D 0:00 ./rwsem
676 pts/0 D 0:00 ./rwsem
677 pts/0 R 0:00 ./rwsem
678 pts/0 D 0:00 ./rwsem
679 pts/0 D 0:00 ./rwsem
680 pts/0 D 0:00 ./rwsem
681 pts/0 D 0:00 ./rwsem
682 pts/0 D 0:00 ./rwsem
683 pts/0 D 0:00 ./rwsem
684 pts/0 D 0:00 ./rwsem
685 pts/0 D 0:00 ./rwsem
686 pts/0 D 0:00 ./rwsem
687 pts/0 D 0:00 ./rwsem
688 pts/0 R 0:00 ./rwsem
689 pts/0 D 0:00 ./rwsem
690 pts/0 D 0:00 ./rwsem
691 pts/0 D 0:00 ./rwsem
692 pts/0 D 0:00 ./rwsem
693 pts/0 D 0:00 ./rwsem
694 pts/0 D 0:00 ./rwsem
695 pts/0 D 0:00 ./rwsem
696 pts/0 D 0:00 ./rwsem
697 pts/0 D 0:00 ./rwsem
698 pts/0 D 0:00 ./rwsem
699 pts/0 D 0:00 ./rwsem
700 pts/0 D 0:00 ./rwsem
701 pts/0 D 0:00 ./rwsem
702 pts/0 D 0:00 ./rwsem
703 pts/0 D 0:00 ./rwsem
[ hang here ]

from another terminal:

andrea@laser:~ > kill 666
andrea@laser:~ > kill 666
andrea@laser:~ > kill 666
andrea@laser:~ > kill 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > ps 666
PID TTY STAT TIME COMMAND
666 pts/1 R 0:31 ps xa
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > kill -9 666
andrea@laser:~ > ps 666
PID TTY STAT TIME COMMAND
666 pts/1 R 53:38 ps xa
andrea@laser:~ >

At first I had a short look to check if this bug could be unrelated to the rwsem and
to be instead a race between the /proc filesystem and the do_exit path, but
after a short review of the interesting places the task_lock seems to serialize
correctly the accesses to the tsk->mm so the semaphore shouldn't go away under
rwsem_wake and furthmore it would be quite strange if the mm goes away under
the /proc filesystem that it only reproducibly fails in the same way in
rwsem_wake and I never get other system malfunctions for example while reading
the contents of the mm.

As further confirm this is a bug in the rwsemaphores I changed your benchmark
to be a little more aggressive:

--- rwsem-rw.c.orig Thu Apr 19 22:37:27 2001
+++ rwsem-rw.c Fri Apr 20 18:56:06 2001
@@ -71,9 +71,8 @@

static inline void sched(void)
{
-#ifdef SCHED
- schedule();
-#endif
+ if (current->need_resched)
+ schedule();
}

int reader(void *arg)
@@ -86,8 +85,8 @@

while (atomic_read(&do_stuff)) {
dr();
- ur();
sched();
+ ur();
}

atomic_dec(&runners);
@@ -105,8 +104,8 @@

while (atomic_read(&do_stuff)) {
dw();
- uw();
sched();
+ uw();
}

atomic_dec(&runners);
@@ -125,22 +124,64 @@

init_rwsem(&rwsem_sem);
atomic_set(&do_stuff, 1);
+ wmb();

#ifdef DEBUG_RWSEM
rwsem_to_monitor = &rwsem_sem;
#endif
init_timer(&timer);
timer.function = stop_test;
- timer.expires = jiffies + 5 * HZ;
+ timer.expires = jiffies + 50 * HZ;
add_timer(&timer);

- kernel_thread(writer, "WriteA", 0);
- kernel_thread(writer, "WriteB", 0);
-
- kernel_thread(reader, "ReadA", 0);
- kernel_thread(reader, "ReadB", 0);
- kernel_thread(reader, "ReadC", 0);
- kernel_thread(reader, "ReadD", 0);
+ kernel_thread(writer, "Write0", 0);
+ kernel_thread(writer, "Write1", 0);
+ kernel_thread(writer, "Write2", 0);
+ kernel_thread(writer, "Write3", 0);
+ kernel_thread(writer, "Write4", 0);
+ kernel_thread(writer, "Write5", 0);
+ kernel_thread(writer, "Write6", 0);
+ kernel_thread(writer, "Write7", 0);
+ kernel_thread(writer, "Write8", 0);
+ kernel_thread(writer, "Write9", 0);
+ kernel_thread(writer, "Write10", 0);
+ kernel_thread(writer, "Write11", 0);
+ kernel_thread(writer, "Write12", 0);
+ kernel_thread(writer, "Write13", 0);
+ kernel_thread(writer, "Write14", 0);
+ kernel_thread(writer, "Write15", 0);
+
+ kernel_thread(reader, "Read0", 0);
+ kernel_thread(reader, "Read1", 0);
+ kernel_thread(reader, "Read2", 0);
+ kernel_thread(reader, "Read3", 0);
+ kernel_thread(reader, "Read4", 0);
+ kernel_thread(reader, "Read5", 0);
+ kernel_thread(reader, "Read6", 0);
+ kernel_thread(reader, "Read7", 0);
+ kernel_thread(reader, "Read8", 0);
+ kernel_thread(reader, "Read9", 0);
+ kernel_thread(reader, "Read10", 0);
+ kernel_thread(reader, "Read11", 0);
+ kernel_thread(reader, "Read12", 0);
+ kernel_thread(reader, "Read13", 0);
+ kernel_thread(reader, "Read14", 0);
+ kernel_thread(reader, "Read15", 0);
+ kernel_thread(reader, "Read16", 0);
+ kernel_thread(reader, "Read17", 0);
+ kernel_thread(reader, "Read18", 0);
+ kernel_thread(reader, "Read19", 0);
+ kernel_thread(reader, "Read20", 0);
+ kernel_thread(reader, "Read21", 0);
+ kernel_thread(reader, "Read22", 0);
+ kernel_thread(reader, "Read23", 0);
+ kernel_thread(reader, "Read24", 0);
+ kernel_thread(reader, "Read25", 0);
+ kernel_thread(reader, "Read26", 0);
+ kernel_thread(reader, "Read27", 0);
+ kernel_thread(reader, "Read28", 0);
+ kernel_thread(reader, "Read29", 0);
+ kernel_thread(reader, "Read30", 0);

return 0;
}
@@ -155,6 +196,8 @@
schedule();
printk("reads taken: %d\n", atomic_read(&reads_taken));
printk("writes taken: %d\n", atomic_read(&writes_taken));
+ printk("readers: %d\n", atomic_read(&readers));
+ printk("writers: %d\n", atomic_read(&writers));
#ifdef DEBUG_RWSEM
rwsem_to_monitor = 0;
#endif

and now also the benchmark deadlocked:

andrea@laser:~ > ps 669
PID TTY STAT TIME COMMAND
669 pts/0 R 6:25 [Read19]
andrea@laser:~ > su
root@laser:/home/andrea > rmmod rwsem-rw
[ hang here ]

Until we fix that bug I will keep running my rwsem-generic-6 patch to be 100%
sure to sit on rock solid rwsems.

BTW, this is the fix against pre4 for the 386+SMP rwsemaphores bug that I found
yesterday:

--- rwsemref-1/include/asm-i386/rwsem-spin.h.~1~ Fri Apr 20 05:03:59 2001
+++ rwsemref-1/include/asm-i386/rwsem-spin.h Fri Apr 20 19:05:52 2001
@@ -93,9 +93,9 @@
__asm__ __volatile__(
"# beginning down_read\n\t"
#ifdef CONFIG_SMP
+ "1:\n\t"
LOCK_PREFIX " decb "RWSEM_SPINLOCK_OFFSET_STR"(%%eax)\n\t" /* try to grab the spinlock */
" js 3f\n" /* jump if failed */
- "1:\n\t"
#endif
" incl (%%eax)\n\t" /* adds 0x00000001, returns the old value */
#ifdef CONFIG_SMP
@@ -132,9 +132,9 @@
__asm__ __volatile__(
"# beginning down_write\n\t"
#ifdef CONFIG_SMP
+ "1:\n\t"
LOCK_PREFIX " decb "RWSEM_SPINLOCK_OFFSET_STR"(%%eax)\n\t" /* try to grab the spinlock */
" js 3f\n" /* jump if failed */
- "1:\n\t"
#endif
" xchg %0,(%%eax)\n\t" /* retrieve the old value */
" add %0,(%%eax)\n\t" /* add 0xffff0001, result in memory */
@@ -173,9 +173,9 @@
__asm__ __volatile__(
"# beginning __up_read\n\t"
#ifdef CONFIG_SMP
+ "1:\n\t"
LOCK_PREFIX " decb "RWSEM_SPINLOCK_OFFSET_STR"(%%eax)\n\t" /* try to grab the spinlock */
" js 3f\n" /* jump if failed */
- "1:\n\t"
#endif
" xchg %0,(%%eax)\n\t" /* retrieve the old value */
" addl %0,(%%eax)\n\t" /* subtract 1, result in memory */
@@ -213,9 +213,9 @@
__asm__ __volatile__(
"# beginning __up_write\n\t"
#ifdef CONFIG_SMP
+ "1:\n\t"
LOCK_PREFIX " decb "RWSEM_SPINLOCK_OFFSET_STR"(%%eax)\n\t" /* try to grab the spinlock */
" js 3f\n" /* jump if failed */
- "1:\n\t"
#endif
" addl %3,(%%eax)\n\t" /* adds 0x00010001 */
#ifdef CONFIG_SMP
@@ -251,9 +251,9 @@
__asm__ __volatile__(
"# beginning rwsem_atomic_update\n\t"
#ifdef CONFIG_SMP
+ "1:\n\t"
LOCK_PREFIX " decb "RWSEM_SPINLOCK_OFFSET_STR"(%1)\n\t" /* try to grab the spinlock */
" js 3f\n" /* jump if failed */
- "1:\n\t"
#endif
" xchgl %0,(%1)\n\t" /* retrieve the old value */
" addl %0,(%1)\n\t" /* add 0xffff0001, result in memory */
@@ -287,9 +287,9 @@
__asm__ __volatile__(
"# beginning rwsem_cmpxchgw\n\t"
#ifdef CONFIG_SMP
+ "1:\n\t"
LOCK_PREFIX " decb "RWSEM_SPINLOCK_OFFSET_STR"(%3)\n\t" /* try to grab the spinlock */
" js 3f\n" /* jump if failed */
- "1:\n\t"
#endif
" cmpw %w1,(%3)\n\t"
" jne 4f\n\t" /* jump if old doesn't match sem->count LSW */

Andrea

2001-04-20 23:46:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]



On Fri, 20 Apr 2001, Andrea Arcangeli wrote:
>
> While dropping the list_empty check to speed up the fast path I faced the same
> complexity of the 2.4.4pre4 lib/rwsem.c and so before reinventing the wheel I
> read how the problem was solved in 2.4.4pre4.

I would suggest the following:

- the generic semaphores should use the lock that already exists in the
wait-queue as the semaphore spinlock.

- the generic semaphores should _not_ drop the lock. Right now it drops
the semaphore lock when it goes into the slow path, only to re-aquire
it. This is due to bad interfacing with the generic slow-path routines.

I suspect that this lock-drop is why Andrea sees problems with the
generic semaphores. The changes to "count" and "sleeper" aren't
actually atomic, because we don't hold the lock over them all. And
re-using the lock means that we don't need the two levels of
spinlocking for adding ourselves to the wait queue. Easily done by just
moving the locking _out_ of the wait-queue helper functions, no?

- the generic semaphores are entirely out-of-line, and are just declared
universally as regular FASTCALL() functions.

The fast-path x86 code looks ok to me. The debugging stuff makes it less
readable than it should be, I suspect, and is probably not worth it at
this stage. The users of rw-semaphores are so well-defined (and so well
debugged) that the debugging code only makes the code harder to follow
right now.

Comments? Andrea? Your patches have looked ok, but I absoutely refuse to
see the non-inlined fast-path for reasonable x86 hardware..

Linus

2001-04-21 14:04:10

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]

On Fri, Apr 20, 2001 at 04:45:32PM -0700, Linus Torvalds wrote:
> I would suggest the following:
>
> - the generic semaphores should use the lock that already exists in the
> wait-queue as the semaphore spinlock.

Ok, that is what my generic code does.

> - the generic semaphores should _not_ drop the lock. Right now it drops
> the semaphore lock when it goes into the slow path, only to re-aquire
> it. This is due to bad interfacing with the generic slow-path routines.

My generic code doesn't drop the lock.

> I suspect that this lock-drop is why Andrea sees problems with the
> generic semaphores. The changes to "count" and "sleeper" aren't
> actually atomic, because we don't hold the lock over them all. And
> re-using the lock means that we don't need the two levels of
> spinlocking for adding ourselves to the wait queue. Easily done by just
> moving the locking _out_ of the wait-queue helper functions, no?

Basically yes, however for the wakeup I wrote a dedicated routine that
knows how to do the wake-all-next-readers or wake-next-writer (it is not
the same helper function of sched.c).

> - the generic semaphores are entirely out-of-line, and are just declared
> universally as regular FASTCALL() functions.

This is what I implemented originally but then I moved the fast path inline
for the fast-path benchmark reasons. I think in real life it doesn't matter
much if the fast path is inline or not.

> The fast-path x86 code looks ok to me. The debugging stuff makes it less
> readable than it should be, I suspect, and is probably not worth it at
> this stage. The users of rw-semaphores are so well-defined (and so well
> debugged) that the debugging code only makes the code harder to follow
> right now.

yes I agree, infact I added the ->magic check only to catch uninitialized
semaphores (and this one doesn't hurt readability that much).

> Comments? Andrea? Your patches have looked ok, but I absoutely refuse to
> see the non-inlined fast-path for reasonable x86 hardware..

In my last patch the fast path is inline as said above but it is not in asm yet
because I couldn't get convinced it was right code. I plan to return looking
into the rwsem soon. I also seen David fixed the bug and dropped the buggy
rwsem-spin.h, so I suggest to merge his code for now, after a very short look
it seems certainly better than pre5.

Andrea

2001-04-21 14:18:14

by Russell King

[permalink] [raw]
Subject: Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]

On Sat, Apr 21, 2001 at 04:03:27PM +0200, Andrea Arcangeli wrote:
> On Fri, Apr 20, 2001 at 04:45:32PM -0700, Linus Torvalds wrote:
> > I would suggest the following:
> >
> > - the generic semaphores should use the lock that already exists in the
> > wait-queue as the semaphore spinlock.
>
> Ok, that is what my generic code does.

Erm, spin_lock()? What if up_read or up_write gets called from interrupt
context (is this allowed)?

If these are now allowed, then maybe we should either consider getting
the Stanford checker to check for this, or else we ought to do a debugging
if (in_interupt()) BUG(); thing.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-04-21 14:37:35

by Russell King

[permalink] [raw]
Subject: Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]

Andrea Arcangeli writes:
> That it is allowed by my generic code that does spin_lock_irq in down_* and
> spin_lock_irqsave in up_* but it's disallowed by the weaker semantics of the
> generic and x86 semaphores 2.4.4pre[2345] (or + David's last patch).

Hang on, who's code is in 2.4.4-pre5? It claims to be Davids, which does
suffer from the problem I described.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-04-21 14:30:14

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]

On Sat, Apr 21, 2001 at 03:17:37PM +0100, Russell King wrote:
> On Sat, Apr 21, 2001 at 04:03:27PM +0200, Andrea Arcangeli wrote:
> > On Fri, Apr 20, 2001 at 04:45:32PM -0700, Linus Torvalds wrote:
> > > I would suggest the following:
> > >
> > > - the generic semaphores should use the lock that already exists in the
> > > wait-queue as the semaphore spinlock.
> >
> > Ok, that is what my generic code does.
>
> Erm, spin_lock()? What if up_read or up_write gets called from interrupt
> context (is this allowed)?

That it is allowed by my generic code that does spin_lock_irq in down_* and
spin_lock_irqsave in up_* but it's disallowed by the weaker semantics of the
generic and x86 semaphores 2.4.4pre[2345] (or + David's last patch).

> If these are now allowed, then maybe we should either consider getting
> the Stanford checker to check for this, or else we ought to do a debugging
> if (in_interupt()) BUG(); thing.

Caller bug is the last of my worries. (and I seriously doubt that if somebody
doesn't know the semantics of the rwsem, he will care to enable the debugging
checks during regression testing ;)

Andrea

2001-04-21 14:38:05

by Russell King

[permalink] [raw]
Subject: Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]

On Sat, Apr 21, 2001 at 04:03:27PM +0200, Andrea Arcangeli wrote:
> On Fri, Apr 20, 2001 at 04:45:32PM -0700, Linus Torvalds wrote:
> > I would suggest the following:
> >
> > - the generic semaphores should use the lock that already exists in the
> > wait-queue as the semaphore spinlock.
>
> Ok, that is what my generic code does.

rwsem-spinlock.h requires linux/types.h to be included (you're using
__u16 at towards the bottom):

gcc -D__KERNEL__ -I/usr/src2/v2.4/linux-rpc/include -Wall -Wstrict-prototypes
-O2 -fno-strict-aliasing -pipe -mapcs-32 -march=armv3m -mtune=strongarm110
-mshort-load-bytes -msoft-float -c -o ieee1284.o ieee1284.c
In file included from /usr/src2/v2.4/linux-rpc/include/linux/rwsem.h:56,
from /usr/src2/v2.4/linux-rpc/include/asm/semaphore.h:10,
from /usr/src2/v2.4/linux-rpc/include/linux/parport.h:101,
from ieee1284.c:19:
/usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:154: parse error before `rwsem_cmpxchgw'
/usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:154: parse error before `__u16'
/usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:155: warning: return-type defaults to `int'
/usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:155: warning: function declaration isn't a prototype

Here is a patch that fixes this oversight against 2.4.4-pre5:

--- orig/include/linux/rwsem-spinlock.h Sat Apr 21 15:32:57 2001
+++ linux/include/linux/rwsem-spinlock.h Sat Apr 21 15:28:45 2001
@@ -11,6 +11,7 @@
#endif

#include <linux/spinlock.h>
+#include <linux/types.h>

#ifdef __KERNEL__



--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-04-21 15:05:19

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]

On Sat, Apr 21, 2001 at 03:37:05PM +0100, [email protected] wrote:
> Andrea Arcangeli writes:
> > That it is allowed by my generic code that does spin_lock_irq in down_* and
> > spin_lock_irqsave in up_* but it's disallowed by the weaker semantics of the
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > generic and x86 semaphores 2.4.4pre[2345] (or + David's last patch).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Hang on, who's code is in 2.4.4-pre5? It claims to be Davids, which does
> suffer from the problem I described.

Yes.

Andrea

2001-04-21 15:08:19

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]

On Sat, Apr 21, 2001 at 03:37:42PM +0100, Russell King wrote:
> On Sat, Apr 21, 2001 at 04:03:27PM +0200, Andrea Arcangeli wrote:
> > On Fri, Apr 20, 2001 at 04:45:32PM -0700, Linus Torvalds wrote:
> > > I would suggest the following:
> > >
> > > - the generic semaphores should use the lock that already exists in the
> > > wait-queue as the semaphore spinlock.
> >
> > Ok, that is what my generic code does.
>
> rwsem-spinlock.h requires linux/types.h to be included (you're using
> __u16 at towards the bottom):

As you are quoting me on "that is what my generic code does", note that my
code is only here:

ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_rwsem-generic-6

and no one single line of my rwsem patches is ever been included in any of
Linus's or Alan's kernels.

> gcc -D__KERNEL__ -I/usr/src2/v2.4/linux-rpc/include -Wall -Wstrict-prototypes
> -O2 -fno-strict-aliasing -pipe -mapcs-32 -march=armv3m -mtune=strongarm110
> -mshort-load-bytes -msoft-float -c -o ieee1284.o ieee1284.c
> In file included from /usr/src2/v2.4/linux-rpc/include/linux/rwsem.h:56,
> from /usr/src2/v2.4/linux-rpc/include/asm/semaphore.h:10,
> from /usr/src2/v2.4/linux-rpc/include/linux/parport.h:101,
> from ieee1284.c:19:
> /usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:154: parse error before `rwsem_cmpxchgw'
> /usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:154: parse error before `__u16'
> /usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:155: warning: return-type defaults to `int'
> /usr/src2/v2.4/linux-rpc/include/linux/rwsem-spinlock.h:155: warning: function declaration isn't a prototype
>
> Here is a patch that fixes this oversight against 2.4.4-pre5:
>
> --- orig/include/linux/rwsem-spinlock.h Sat Apr 21 15:32:57 2001
> +++ linux/include/linux/rwsem-spinlock.h Sat Apr 21 15:28:45 2001
> @@ -11,6 +11,7 @@
> #endif
>
> #include <linux/spinlock.h>
> +#include <linux/types.h>
>
> #ifdef __KERNEL__
>
>
>
> --
> Russell King ([email protected]) The developer of ARM Linux
> http://www.arm.linux.org.uk/personal/aboutme.html
>


Andrea

2001-04-21 17:18:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha "process table hang"]]]



On Sat, 21 Apr 2001, Russell King wrote:
>
> Erm, spin_lock()? What if up_read or up_write gets called from interrupt
> context (is this allowed)?

Currently that is not allowed.

We allow it for regular semaphores, but not for rw-semaphores.

We may some day have to revisit that issue, but I suspect we won't have
much reason to.

Linus