2002-03-04 03:59:59

by Rusty Russell

Subject: [PATCH] Fast Userspace Mutexes III.

1) Use mmap/mprotect bits, not new syscall (thanks RTH, Erik Biederman)
2) Fix wakeup race in kernel (thanks Martin Wirth, Paul Mackerras)
3) Simplify locking to a single atomic (no more arch specifics!)
4) Use wake-one by handcoding queues.
5) Comments added.

Thanks to all for feedback and review: I'd appreciate a comment from
those arches which need to do something with the PROT_SEM bit.

Once again, tested on 2.4.18 UP PPC, compiles on 2.5.6-pre1.

Bad news is that we're up to 206 lines again.
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/kernel/futex.c working-2.5.6-pre1-futex/kernel/futex.c
--- linux-2.5.6-pre1/kernel/futex.c Thu Jan 1 10:00:00 1970
+++ working-2.5.6-pre1-futex/kernel/futex.c Mon Mar 4 14:48:47 2002
@@ -0,0 +1,206 @@
+/*
+ * Fast Userspace Mutexes (which I call "Futexes!").
+ * (C) Rusty Russell, IBM 2002
+ *
+ * Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
+ * enough at me, Linus for the original (flawed) idea, Matthew
+ * Kirkwood for proof-of-concept implementation.
+ *
+ * "The futexes are also cursed."
+ * "But they come in a choice of three flavours!"
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#include <linux/kernel.h>
+#include <linux/spinlock.h>
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/hash.h>
+#include <linux/init.h>
+#include <linux/fs.h>
+#include <asm/atomic.h>
+
+/* These mutexes are a very simple counter: the winner is the one who
+ decrements from 1 to 0. The counter starts at 1 when the lock is
+ free. A value other than 0 or 1 means someone may be sleeping.
+ This is simple enough to work on all architectures, but has the
+ problem that if we never "up" the semaphore it could eventually
+ wrap around. */
+
+/* FIXME: This may be way too small. --RR */
+#define FUTEX_HASHBITS 6
+
+/* We use this instead of a normal wait_queue_t, so we can wake only
+ the relevant ones (hashed waitqueues may be shared) */
+struct futex_q {
+ struct list_head list;
+ struct task_struct *task;
+ atomic_t *count;
+};
+
+/* The key for the hash is the address + index + offset within page */
+static struct list_head futex_queues[1<<FUTEX_HASHBITS];
+static spinlock_t futex_lock = SPIN_LOCK_UNLOCKED;
+
+static inline struct list_head *hash_futex(struct page *page,
+ unsigned long offset)
+{
+ unsigned long h;
+
+ /* If someone is sleeping, page is pinned. ie. page_address
+ is a constant when we care about it. */
+ h = (unsigned long)page_address(page) + page->index + offset;
+ return &futex_queues[hash_long(h, FUTEX_HASHBITS)];
+}
+
+static inline void wake_one_waiter(struct list_head *head, atomic_t *count)
+{
+ struct list_head *i;
+
+ spin_lock(&futex_lock);
+ list_for_each(i, head) {
+ struct futex_q *this = list_entry(i, struct futex_q, list);
+
+ if (this->count == count) {
+ wake_up_process(this->task);
+ break;
+ }
+ }
+ spin_unlock(&futex_lock);
+}
+
+/* Add at end to avoid starvation */
+static inline void queue_me(struct list_head *head,
+ struct futex_q *q,
+ atomic_t *count)
+{
+ q->task = current;
+ q->count = count;
+
+ spin_lock(&futex_lock);
+ list_add_tail(&q->list, head);
+ spin_unlock(&futex_lock);
+}
+
+static inline void unqueue_me(struct futex_q *q)
+{
+ spin_lock(&futex_lock);
+ list_del(&q->list);
+ spin_unlock(&futex_lock);
+}
+
+/* Get kernel address of the user page and pin it. */
+static struct page *pin_page(unsigned long page_start)
+{
+ struct mm_struct *mm = current->mm;
+ struct page *page;
+ int err;
+
+ down_read(&mm->mmap_sem);
+ err = get_user_pages(current, current->mm, page_start,
+ 1 /* one page */,
+ 1 /* writable */,
+ 0 /* don't force */,
+ &page,
+ NULL /* don't return vmas */);
+ up_read(&mm->mmap_sem);
+
+ if (err < 0)
+ return ERR_PTR(err);
+ return page;
+}
+
+/* Simplified from arch/ppc/kernel/semaphore.c: Paul M. is a genius. */
+static int futex_down(struct list_head *head, atomic_t *count)
+{
+ int retval = 0;
+ struct futex_q q;
+
+ current->state = TASK_INTERRUPTIBLE;
+ queue_me(head, &q, count);
+
+ /* If we take the semaphore from 1 to 0, it's ours. */
+ while (!atomic_dec_and_test(count)) {
+ if (signal_pending(current)) {
+ retval = -EINTR;
+ break;
+ }
+ schedule();
+ current->state = TASK_INTERRUPTIBLE;
+ }
+ current->state = TASK_RUNNING;
+ unqueue_me(&q);
+ /* If we were signalled, we might have just been woken: we
+ must wake another one. Otherwise we need to wake someone
+ else (if they are waiting) so they drop the count below 0,
+ and when we "up" in userspace, we know there is a
+ waiter. */
+ wake_one_waiter(head, count);
+ return retval;
+}
+
+static int futex_up(struct list_head *head, atomic_t *count)
+{
+ atomic_set(count, 1);
+ smp_wmb();
+ wake_one_waiter(head, count);
+ return 0;
+}
+
+asmlinkage int sys_futex(void *uaddr, int op)
+{
+ int ret;
+ unsigned long pos_in_page;
+ struct list_head *head;
+ struct page *page;
+
+ pos_in_page = ((unsigned long)uaddr) % PAGE_SIZE;
+
+ /* Must be "naturally" aligned, and not on page boundary. */
+ if ((pos_in_page % __alignof__(atomic_t)) != 0
+ || pos_in_page + sizeof(atomic_t) > PAGE_SIZE)
+ return -EINVAL;
+
+ /* Simpler if it doesn't vanish underneath us. */
+ page = pin_page((unsigned long)uaddr - pos_in_page);
+ if (IS_ERR(page))
+ return PTR_ERR(page);
+
+ head = hash_futex(page, pos_in_page);
+ switch (op) {
+ case 1:
+ ret = futex_up(head, page_address(page) + pos_in_page);
+ break;
+ case -1:
+ ret = futex_down(head, page_address(page) + pos_in_page);
+ break;
+ /* Add other lock types here... */
+ default:
+ ret = -EINVAL;
+ }
+ put_page(page);
+
+ return ret;
+}
+
+static int __init init(void)
+{
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(futex_queues); i++)
+ INIT_LIST_HEAD(&futex_queues[i]);
+ return 0;
+}
+__initcall(init);
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/arch/i386/kernel/entry.S working-2.5.6-pre1-futex/arch/i386/kernel/entry.S
--- linux-2.5.6-pre1/arch/i386/kernel/entry.S Wed Feb 20 17:56:59 2002
+++ working-2.5.6-pre1-futex/arch/i386/kernel/entry.S Mon Mar 4 14:35:13 2002
@@ -716,6 +716,7 @@
.long SYMBOL_NAME(sys_lremovexattr)
.long SYMBOL_NAME(sys_fremovexattr)
.long SYMBOL_NAME(sys_tkill)
+ .long SYMBOL_NAME(sys_futex)

.rept NR_syscalls-(.-sys_call_table)/4
.long SYMBOL_NAME(sys_ni_syscall)
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/arch/ppc/kernel/misc.S working-2.5.6-pre1-futex/arch/ppc/kernel/misc.S
--- linux-2.5.6-pre1/arch/ppc/kernel/misc.S Wed Feb 20 17:57:04 2002
+++ working-2.5.6-pre1-futex/arch/ppc/kernel/misc.S Mon Mar 4 14:35:56 2002
@@ -1246,6 +1246,7 @@
.long sys_removexattr
.long sys_lremovexattr
.long sys_fremovexattr /* 220 */
+ .long sys_futex
.rept NR_syscalls-(.-sys_call_table)/4
.long sys_ni_syscall
.endr
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/arch/ppc/kernel/semaphore.c working-2.5.6-pre1-futex/arch/ppc/kernel/semaphore.c
--- linux-2.5.6-pre1/arch/ppc/kernel/semaphore.c Wed Feb 20 17:57:04 2002
+++ working-2.5.6-pre1-futex/arch/ppc/kernel/semaphore.c Fri Mar 1 22:52:06 2002
@@ -21,34 +21,6 @@
#include <asm/atomic.h>
#include <asm/semaphore.h>

-/*
- * Atomically update sem->count.
- * This does the equivalent of the following:
- *
- * old_count = sem->count;
- * tmp = MAX(old_count, 0) + incr;
- * sem->count = tmp;
- * return old_count;
- */
-static inline int __sem_update_count(struct semaphore *sem, int incr)
-{
- int old_count, tmp;
-
- __asm__ __volatile__("\n"
-"1: lwarx %0,0,%3\n"
-" srawi %1,%0,31\n"
-" andc %1,%0,%1\n"
-" add %1,%1,%4\n"
- PPC405_ERR77(0,%3)
-" stwcx. %1,0,%3\n"
-" bne 1b"
- : "=&r" (old_count), "=&r" (tmp), "=m" (sem->count)
- : "r" (&sem->count), "r" (incr), "m" (sem->count)
- : "cc");
-
- return old_count;
-}
-
void __up(struct semaphore *sem)
{
/*
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/include/asm-i386/atomic.h working-2.5.6-pre1-futex/include/asm-i386/atomic.h
--- linux-2.5.6-pre1/include/asm-i386/atomic.h Fri Nov 23 06:46:18 2001
+++ working-2.5.6-pre1-futex/include/asm-i386/atomic.h Fri Mar 1 22:51:25 2002
@@ -2,6 +2,7 @@
#define __ARCH_I386_ATOMIC__

#include <linux/config.h>
+#include <asm/system.h>

/*
* Atomic operations that C can't guarantee us. Useful for
@@ -201,4 +202,43 @@
#define smp_mb__before_atomic_inc() barrier()
#define smp_mb__after_atomic_inc() barrier()

+/*
+ * Atomically update count.
+ * This does the equivalent of the following:
+ *
+ * old_count = *count;
+ * tmp = MAX(old_count, 0) + incr;
+ * *count = tmp;
+ * return old_count;
+ */
+#ifdef CONFIG_M386
+#ifndef CONFIG_SMP
+#define __HAVE_ARCH_UPDATE_COUNT
+/* We can still do this (no userspace can be running). */
+static inline int __update_count(atomic_t *count, int incr)
+{
+ int old_count, tmp;
+
+ /* preempt_disable() */
+ old_count = atomic_read(count);
+ tmp = old_count > 0 ? old_count : 0;
+ atomic_set(count, tmp + incr);
+ return old_count;
+}
+#else
+/* SMP 386 update_count is not possible. 8( */
+#endif /* CONFIG_SMP */
+#else /* ! 386 */
+#define __HAVE_ARCH_UPDATE_COUNT
+static inline int __update_count(atomic_t *count, int incr)
+{
+ int old_count, tmp;
+
+ do {
+ old_count = atomic_read(count);
+ tmp = old_count > 0 ? old_count : 0;
+ } while (cmpxchg(&count->counter, old_count, tmp + incr) != old_count);
+
+ return old_count;
+}
+#endif /* CONFIG_M386 */
#endif
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/include/asm-i386/mman.h working-2.5.6-pre1-futex/include/asm-i386/mman.h
--- linux-2.5.6-pre1/include/asm-i386/mman.h Wed Mar 15 12:45:20 2000
+++ working-2.5.6-pre1-futex/include/asm-i386/mman.h Mon Mar 4 14:51:26 2002
@@ -4,6 +4,7 @@
#define PROT_READ 0x1 /* page can be read */
#define PROT_WRITE 0x2 /* page can be written */
#define PROT_EXEC 0x4 /* page can be executed */
+#define PROT_SEM 0x8 /* page may be used for atomic ops */
#define PROT_NONE 0x0 /* page can not be accessed */

#define MAP_SHARED 0x01 /* Share changes */
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/include/asm-i386/unistd.h working-2.5.6-pre1-futex/include/asm-i386/unistd.h
--- linux-2.5.6-pre1/include/asm-i386/unistd.h Wed Feb 20 17:56:40 2002
+++ working-2.5.6-pre1-futex/include/asm-i386/unistd.h Mon Mar 4 14:36:32 2002
@@ -243,6 +243,7 @@
#define __NR_lremovexattr 236
#define __NR_fremovexattr 237
#define __NR_tkill 238
+#define __NR_futex 239

/* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */

diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/include/asm-ppc/atomic.h working-2.5.6-pre1-futex/include/asm-ppc/atomic.h
--- linux-2.5.6-pre1/include/asm-ppc/atomic.h Wed Feb 20 17:57:18 2002
+++ working-2.5.6-pre1-futex/include/asm-ppc/atomic.h Sat Mar 2 10:34:29 2002
@@ -198,5 +198,33 @@
#define smp_mb__before_atomic_inc() smp_mb()
#define smp_mb__after_atomic_inc() smp_mb()

+/*
+ * Atomically update count.
+ * This does the equivalent of the following:
+ *
+ * old_count = *count;
+ * tmp = MAX(old_count, 0) + incr;
+ * *count = tmp;
+ * return old_count;
+ */
+#define __HAVE_ARCH_UPDATE_COUNT
+static inline int __update_count(atomic_t *count, int incr)
+{
+ int old_count, tmp;
+
+ __asm__ __volatile__("\n"
+"1: lwarx %0,0,%3\n"
+" srawi %1,%0,31\n"
+" andc %1,%0,%1\n"
+" add %1,%1,%4\n"
+ PPC405_ERR77(0,%3)
+" stwcx. %1,0,%3\n"
+" bne 1b"
+ : "=&r" (old_count), "=&r" (tmp), "=m" (*count)
+ : "r" (count), "r" (incr), "m" (*count)
+ : "cc");
+
+ return old_count;
+}
#endif /* __KERNEL__ */
#endif /* _ASM_PPC_ATOMIC_H_ */
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/include/asm-ppc/mman.h working-2.5.6-pre1-futex/include/asm-ppc/mman.h
--- linux-2.5.6-pre1/include/asm-ppc/mman.h Tue May 22 08:02:06 2001
+++ working-2.5.6-pre1-futex/include/asm-ppc/mman.h Mon Mar 4 14:51:26 2002
@@ -7,6 +7,7 @@
#define PROT_READ 0x1 /* page can be read */
#define PROT_WRITE 0x2 /* page can be written */
#define PROT_EXEC 0x4 /* page can be executed */
+#define PROT_SEM 0x8 /* page may be used for atomic ops */
#define PROT_NONE 0x0 /* page can not be accessed */

#define MAP_SHARED 0x01 /* Share changes */
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/include/asm-ppc/unistd.h working-2.5.6-pre1-futex/include/asm-ppc/unistd.h
--- linux-2.5.6-pre1/include/asm-ppc/unistd.h Wed Feb 20 17:57:18 2002
+++ working-2.5.6-pre1-futex/include/asm-ppc/unistd.h Mon Mar 4 14:36:23 2002
@@ -228,6 +228,7 @@
#define __NR_removexattr 218
#define __NR_lremovexattr 219
#define __NR_fremovexattr 220
+#define __NR_futex 221

#define __NR(n) #n

diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/include/linux/hash.h working-2.5.6-pre1-futex/include/linux/hash.h
--- linux-2.5.6-pre1/include/linux/hash.h Thu Jan 1 10:00:00 1970
+++ working-2.5.6-pre1-futex/include/linux/hash.h Fri Mar 1 22:51:25 2002
@@ -0,0 +1,58 @@
+#ifndef _LINUX_HASH_H
+#define _LINUX_HASH_H
+/* Fast hashing routine for a long.
+ (C) 2002 William Lee Irwin III, IBM */
+
+/*
+ * Knuth recommends primes in approximately golden ratio to the maximum
+ * integer representable by a machine word for multiplicative hashing.
+ * Chuck Lever verified the effectiveness of this technique:
+ * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
+ *
+ * These primes are chosen to be bit-sparse, that is operations on
+ * them can use shifts and additions instead of multiplications for
+ * machines where multiplications are slow.
+ */
+#if BITS_PER_LONG == 32
+/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
+#define GOLDEN_RATIO_PRIME 0x9e370001UL
+#elif BITS_PER_LONG == 64
+/* 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
+#define GOLDEN_RATIO_PRIME 0x9e37fffffffc0001UL
+#else
+#error Define GOLDEN_RATIO_PRIME for your wordsize.
+#endif
+
+static inline unsigned long hash_long(unsigned long val, unsigned int bits)
+{
+ unsigned long hash = val;
+
+#if BITS_PER_LONG == 64
+ /* Sigh, gcc can't optimise this alone like it does for 32 bits. */
+ unsigned long n = hash;
+ n <<= 18;
+ hash -= n;
+ n <<= 33;
+ hash -= n;
+ n <<= 3;
+ hash += n;
+ n <<= 3;
+ hash -= n;
+ n <<= 4;
+ hash += n;
+ n <<= 2;
+ hash += n;
+#else
+ /* On some cpus multiply is faster, on others gcc will do shifts */
+ hash *= GOLDEN_RATIO_PRIME;
+#endif
+
+ /* High bits are more random, so use them. */
+ return hash >> (BITS_PER_LONG - bits);
+}
+
+static inline unsigned long hash_ptr(void *ptr, unsigned int bits)
+{
+ return hash_long((unsigned long)ptr, bits);
+}
+#endif /* _LINUX_HASH_H */
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/include/linux/mmzone.h working-2.5.6-pre1-futex/include/linux/mmzone.h
--- linux-2.5.6-pre1/include/linux/mmzone.h Fri Mar 1 22:58:34 2002
+++ working-2.5.6-pre1-futex/include/linux/mmzone.h Mon Mar 4 14:37:39 2002
@@ -51,8 +51,7 @@
/*
* wait_table -- the array holding the hash table
* wait_table_size -- the size of the hash table array
- * wait_table_shift -- wait_table_size
- * == BITS_PER_LONG (1 << wait_table_bits)
+ * wait_table_bits -- wait_table_size == (1 << wait_table_bits)
*
* The purpose of all these is to keep track of the people
* waiting for a page to become available and make them
@@ -75,7 +74,7 @@
*/
wait_queue_head_t * wait_table;
unsigned long wait_table_size;
- unsigned long wait_table_shift;
+ unsigned long wait_table_bits;

/*
* Discontig memory support fields.
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/kernel/Makefile working-2.5.6-pre1-futex/kernel/Makefile
--- linux-2.5.6-pre1/kernel/Makefile Wed Feb 20 17:56:17 2002
+++ working-2.5.6-pre1-futex/kernel/Makefile Fri Mar 1 22:53:36 2002
@@ -15,7 +15,7 @@
obj-y = sched.o dma.o fork.o exec_domain.o panic.o printk.o \
module.o exit.o itimer.o info.o time.o softirq.o resource.o \
sysctl.o acct.o capability.o ptrace.o timer.o user.o \
- signal.o sys.o kmod.o context.o
+ signal.o sys.o kmod.o context.o futex.o

obj-$(CONFIG_UID16) += uid16.o
obj-$(CONFIG_MODULES) += ksyms.o
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/mm/filemap.c working-2.5.6-pre1-futex/mm/filemap.c
--- linux-2.5.6-pre1/mm/filemap.c Wed Feb 27 14:45:43 2002
+++ working-2.5.6-pre1-futex/mm/filemap.c Fri Mar 1 23:10:14 2002
@@ -25,6 +25,7 @@
#include <linux/iobuf.h>
#include <linux/compiler.h>
#include <linux/fs.h>
+#include <linux/hash.h>

#include <asm/pgalloc.h>
#include <asm/uaccess.h>
@@ -773,32 +774,8 @@
static inline wait_queue_head_t *page_waitqueue(struct page *page)
{
const zone_t *zone = page_zone(page);
- wait_queue_head_t *wait = zone->wait_table;
- unsigned long hash = (unsigned long)page;

-#if BITS_PER_LONG == 64
- /* Sigh, gcc can't optimise this alone like it does for 32 bits. */
- unsigned long n = hash;
- n <<= 18;
- hash -= n;
- n <<= 33;
- hash -= n;
- n <<= 3;
- hash += n;
- n <<= 3;
- hash -= n;
- n <<= 4;
- hash += n;
- n <<= 2;
- hash += n;
-#else
- /* On some cpus multiply is faster, on others gcc will do shifts */
- hash *= GOLDEN_RATIO_PRIME;
-#endif
-
- hash >>= zone->wait_table_shift;
-
- return &wait[hash];
+ return &zone->wait_table[hash_ptr(page, zone->wait_table_bits)];
}

/*
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/mm/mprotect.c working-2.5.6-pre1-futex/mm/mprotect.c
--- linux-2.5.6-pre1/mm/mprotect.c Wed Feb 20 17:57:21 2002
+++ working-2.5.6-pre1-futex/mm/mprotect.c Mon Mar 4 14:51:26 2002
@@ -280,7 +280,7 @@
end = start + len;
if (end < start)
return -EINVAL;
- if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC))
+ if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM))
return -EINVAL;
if (end == start)
return 0;
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.6-pre1/mm/page_alloc.c working-2.5.6-pre1-futex/mm/page_alloc.c
--- linux-2.5.6-pre1/mm/page_alloc.c Wed Feb 20 17:57:21 2002
+++ working-2.5.6-pre1-futex/mm/page_alloc.c Fri Mar 1 23:10:14 2002
@@ -776,8 +776,8 @@
* per zone.
*/
zone->wait_table_size = wait_table_size(size);
- zone->wait_table_shift =
- BITS_PER_LONG - wait_table_bits(zone->wait_table_size);
+ zone->wait_table_bits =
+ wait_table_bits(zone->wait_table_size);
zone->wait_table = (wait_queue_head_t *)
alloc_bootmem_node(pgdat, zone->wait_table_size
* sizeof(wait_queue_head_t));
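A side note on the hash.h hunk above: the 64-bit shift-and-add sequence in hash_long() is meant to be exactly a multiply by GOLDEN_RATIO_PRIME, just cheaper on machines with slow multipliers. The identity is easy to check with a small standalone program (not part of the patch; the function name is mine):

```c
#include <stdint.h>

/* 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1, from linux/hash.h */
#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001ULL

/* The BITS_PER_LONG == 64 branch of hash_long(), transcribed with
 * fixed-width types.  Each step shifts n further left and adds or
 * subtracts it, accumulating val * GOLDEN_RATIO_PRIME_64 mod 2^64. */
static uint64_t hash_by_shifts(uint64_t val)
{
	uint64_t hash = val;
	uint64_t n = hash;

	n <<= 18; hash -= n;	/* - 2^18 * val */
	n <<= 33; hash -= n;	/* - 2^51 * val */
	n <<= 3;  hash += n;	/* + 2^54 * val */
	n <<= 3;  hash -= n;	/* - 2^57 * val */
	n <<= 4;  hash += n;	/* + 2^61 * val */
	n <<= 2;  hash += n;	/* + 2^63 * val */
	return hash;
}
```

Unsigned arithmetic wraps mod 2^64, so the sum of the shifted terms agrees with the multiplication for every input.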


2002-03-04 19:50:20

by Robert Love

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Sun, 2002-03-03 at 22:55, Rusty Russell wrote:
> 1) Use mmap/mprotect bits, not new syscall (thanks RTH, Erik Biederman)
> 2) Fix wakeup race in kernel (thanks Martin Wirth, Paul Mackerras)
> 3) Simplify locking to a single atomic (no more arch specifics!)
> 4) Use wake-one by handcoding queues.
> 5) Comments added.
>
> Thanks to all for feedback and review: I'd appreciate a comment from
> those arch's which need to do something with the PROT_SEM bit.
>
> Once again, tested on 2.4.18 UP PPC, compiles on 2.5.6-pre1.
>
> Bad news is that we're up to 206 lines again.
> Rusty.

Good work. I likee.

I have a couple of comments and a question:

> +static spinlock_t futex_lock = SPIN_LOCK_UNLOCKED;

Could we make this per-waitqueue?

> +asmlinkage int sys_futex(void *uaddr, int op)
> [...]
> + switch (op) {
> + case 1:
> + ret = futex_up(head, page_address(page) + pos_in_page);
> + break;
> + case -1:

We should do:

#define FUTEX_UP 1
#define FUTEX_DOWN -1

and put them in a common header (i.e. include/linux so both the kernel
and glibc will use it) and use that in our code and the kernel code.
Just a finishing detail ...
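A minimal version of the shared header Robert describes might look like this (the file name and guard are assumptions, not from the patch; only the values 1 and -1 come from the sys_futex() switch above):

```c
/* Hypothetical include/linux/futex.h: operation codes shared by the
 * kernel's sys_futex() switch and the userspace library.  The values
 * mirror the literal 1 / -1 cases in kernel/futex.c. */
#ifndef _LINUX_FUTEX_H
#define _LINUX_FUTEX_H

#define FUTEX_UP	1
#define FUTEX_DOWN	(-1)

#endif /* _LINUX_FUTEX_H */
```

Userspace would then call sys_futex(addr, FUTEX_DOWN) instead of passing a bare -1.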

> +static inline int __update_count(atomic_t *count, int incr)
> +{
> + int old_count, tmp;
> +
> + /* preempt_disable() */
> + old_count = atomic_read(count);
> + tmp = old_count > 0 ? old_count : 0;
> + atomic_set(count, tmp + incr);
> + return old_count;
> +}

You will want to do:

int old_count, tmp;

preempt_disable();
old_count = atomic_read(count);
tmp = old_count > 0 ? old_count : 0;
atomic_set(count, tmp + incr);
preempt_enable();

return old_count;

here. The preempt statements compile away if CONFIG_PREEMPT is not set,
so you can just put them in, even on arches that don't do preemption
yet.
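For reference, what __update_count() computes can be modelled in userspace with C11 atomics. This sketch mirrors the cmpxchg variant of the patch (the name and types are stand-ins, not kernel code); the UP-386 variant gets the same effect with the preempt_disable()/preempt_enable() bracketing instead of a retry loop:

```c
#include <stdatomic.h>

/* Model of __update_count(): atomically perform
 *   old = *count;  *count = MAX(old, 0) + incr;  return old;
 * retrying if another thread changed the counter in between. */
static int update_count_model(atomic_int *count, int incr)
{
	int old_count, tmp;

	do {
		old_count = atomic_load(count);
		tmp = old_count > 0 ? old_count : 0;
	} while (!atomic_compare_exchange_weak(count, &old_count, tmp + incr));

	return old_count;
}
```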

... oh, and I would love an example of using it in userspace ;)

Nice work, Rusty.

Robert Love

2002-03-04 20:10:53

by Davide Libenzi

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On 4 Mar 2002, Robert Love wrote:

> On Sun, 2002-03-03 at 22:55, Rusty Russell wrote:
> > 1) Use mmap/mprotect bits, not new syscall (thanks RTH, Erik Biederman)
> > 2) Fix wakeup race in kernel (thanks Martin Wirth, Paul Mackerras)
> > 3) Simplify locking to a single atomic (no more arch specifics!)
> > 4) Use wake-one by handcoding queues.
> > 5) Comments added.
> >
> > Thanks to all for feedback and review: I'd appreciate a comment from
> > those arch's which need to do something with the PROT_SEM bit.
> >
> > Once again, tested on 2.4.18 UP PPC, compiles on 2.5.6-pre1.
> >
> > Bad news is that we're up to 206 lines again.
> > Rusty.
>
> Good work. I likee.

Ok, I reply to this because my 'd' Pine key is heavily used these days :-)
I took a look at the code last night and I have a stupid question,
Rusty. I do not know what code is used in the upper part of the
iceberg (userspace), but are you doing a dec_and_test() in userspace
before entering the kernel? Or is the kernel code the slow path, or do
you enter it by default?



- Davide


2002-03-04 20:21:26

by Matthew Kirkwood

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Mon, 4 Mar 2002, Davide Libenzi wrote:

[ cc: list trimmage ]

> Or, is the kernel code the slow path or you enter it by default ?

The kernel part is the slow path -- we only enter it
when we have contention and want to sleep.

Matthew.

2002-03-04 20:48:19

by Hubertus Franke

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Mon, Mar 04, 2002 at 12:13:50PM -0800, Davide Libenzi wrote:
> On 4 Mar 2002, Robert Love wrote:
>
> > On Sun, 2002-03-03 at 22:55, Rusty Russell wrote:
> > > 1) Use mmap/mprotect bits, not new syscall (thanks RTH, Erik Biederman)
> > > 2) Fix wakeup race in kernel (thanks Martin Wirth, Paul Mackerras)
> > > 3) Simplify locking to a single atomic (no more arch specifics!)
> > > 4) Use wake-one by handcoding queues.
> > > 5) Comments added.
> > >
> > > Thanks to all for feedback and review: I'd appreciate a comment from
> > > those arch's which need to do something with the PROT_SEM bit.
> > >
> > > Once again, tested on 2.4.18 UP PPC, compiles on 2.5.6-pre1.
> > >
> > > Bad news is that we're up to 206 lines again.
> > > Rusty.
> >
> > Good work. I likee.
>
> Ok, i reply to this because my 'd' Pine's key is heavily used in these days :-)
> I took a look at the code yesterday night and i've a stupid question
> Rusty. I do not know what is the code used in the upper part of the
> iceberg ( userspace ) but are you doing a dec_and_test() on userspace
> before entering the kernel ? Or, is the kernel code the slow path or you
> enter it by default ?
>
>
>
> - Davide
>

Yes, that section is missing. It should be just as you said.

int futex_down(atomic_t *count)
{
if (!atomic_dec_and_test(count))
return sys_futex(count,FUTEX_DOWN);
return 0;
}

One needs to ensure that the atomic nature is compiled in (i.e. SMP)
even if SMP is not enabled in the machine.
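For completeness, here is a guess at the matching up path, following the patch's comment that a sleeper drives the count negative, so an uncontended up takes it from 0 back to 1. GCC __sync builtins stand in for the kernel's atomic_t, and sys_futex() is stubbed out so the fast paths can be exercised without the kernel; none of these names are from Rusty's patch:

```c
/* Userspace fast paths, modelled on Hubertus's futex_down() sketch.
 * sys_futex() is a stub here; a real library would issue the syscall. */
#define FUTEX_UP	1
#define FUTEX_DOWN	(-1)

static int futexes_entered_kernel;	/* counts slow-path entries */

static int sys_futex(int *uaddr, int op)
{
	(void)uaddr; (void)op;
	futexes_entered_kernel++;
	return 0;
}

static int futex_down(int *count)
{
	/* The winner takes the count from 1 to 0; any other result
	 * means contention, so fall into the kernel to sleep. */
	if (__sync_sub_and_fetch(count, 1) != 0)
		return sys_futex(count, FUTEX_DOWN);
	return 0;
}

static int futex_up(int *count)
{
	/* If nobody slept, the count goes 0 -> 1 and we stay in user
	 * space; anything else means a waiter may need waking. */
	if (__sync_add_and_fetch(count, 1) != 1)
		return sys_futex(count, FUTEX_UP);
	return 0;
}
```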

Rusty, could you post the code that you are using?


Also, the check on PROT_SEM is missing. I tried this before and glibc filtered
these flags out when set. But effectively, one still needs to check
whether semaphores are allowed during the sys_futex call.
This is expensive, because the protections are associated with the VMA.
find_vma() is not an option here, which is why I used hashing and more
persistent objects rather than hashed wait queues.

I'd suggest dropping the requirement for the PROT_SEM flag.
I don't see tremendous value in it anyway, and it creates quite a
headache if it's to be enforced, which it is not in Rusty's code.
My code could do it, but it comes with the complexity discussed earlier.

-- Hubertus

2002-03-04 22:13:00

by Davide Libenzi

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Mon, 4 Mar 2002, Hubertus Franke wrote:

> On Mon, Mar 04, 2002 at 12:13:50PM -0800, Davide Libenzi wrote:
> > On 4 Mar 2002, Robert Love wrote:
> >
> > > On Sun, 2002-03-03 at 22:55, Rusty Russell wrote:
> > > > 1) Use mmap/mprotect bits, not new syscall (thanks RTH, Erik Biederman)
> > > > 2) Fix wakeup race in kernel (thanks Martin Wirth, Paul Mackerras)
> > > > 3) Simplify locking to a single atomic (no more arch specifics!)
> > > > 4) Use wake-one by handcoding queues.
> > > > 5) Comments added.
> > > >
> > > > Thanks to all for feedback and review: I'd appreciate a comment from
> > > > those arch's which need to do something with the PROT_SEM bit.
> > > >
> > > > Once again, tested on 2.4.18 UP PPC, compiles on 2.5.6-pre1.
> > > >
> > > > Bad news is that we're up to 206 lines again.
> > > > Rusty.
> > >
> > > Good work. I likee.
> >
> > Ok, i reply to this because my 'd' Pine's key is heavily used in these days :-)
> > I took a look at the code yesterday night and i've a stupid question
> > Rusty. I do not know what is the code used in the upper part of the
> > iceberg ( userspace ) but are you doing a dec_and_test() on userspace
> > before entering the kernel ? Or, is the kernel code the slow path or you
> > enter it by default ?
> >
> >
> >
> > - Davide
> >
>
> Yes, that section is missing. It should be just as you said.
>
> int futex_down(atomic_t *count)
> {
> if (!atomic_dec_and_test(count))
> return sys_futex(count,FUTEX_DOWN);
> return 0;
> }

That's great. What if the process holding the mutex dies while there're
sleeping tasks waiting for it ?




- Davide



2002-03-05 01:31:27

by Rusty Russell

Subject: Re: [PATCH] Fast Userspace Mutexes III.

In message <1015271393.15277.112.camel@phantasy> you write:
> > +static spinlock_t futex_lock = SPIN_LOCK_UNLOCKED;
>
> Could we make this per-waitqueue?

Yes, once someone gives benchmarks proving it's worth doing the whole
"multiple locks and cache aligned" thing. Until then, it's premature
optimization.

> We should do:
>
> #define FUTEX_UP 1
> #define FUTEX_DOWN -1

Ack. Definitely.

> here. The preempt statements compile away if CONFIG_PREEMPT is not set,
> so you can just put them in, even on arches that don't do preemption
> yet.

Oops, that code shouldn't have been in the patch, and the only reason
preempt_disable() was commented out is that I tested the patch on 2.4.

> ... oh, and I would love an example of using it in userspace ;)

I'll throw it in for patch IV. 8)

> Nice work, Rusty.

I don't know if I can accept the kudos: it's now hovering at about 70%
my code, but only 20% my ideas.

Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-03-05 01:50:20

by Robert Love

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Mon, 2002-03-04 at 17:15, Davide Libenzi wrote:

> That's great. What if the process holding the mutex dies while there're
> sleeping tasks waiting for it ?

I can't find an answer in the code (meaning the lock is lost...) and no
one has yet answered. Davide, have you noticed anything?

I think this needs a proper solution..

Robert Love

2002-03-05 02:50:10

by Davide Libenzi

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On 4 Mar 2002, Robert Love wrote:

> On Mon, 2002-03-04 at 17:15, Davide Libenzi wrote:
>
> > That's great. What if the process holding the mutex dies while there're
> > sleeping tasks waiting for it ?
>
> I can't find an answer in the code (meaning the lock is lost...) and no
> one has yet answered. Davide, have you noticed anything?

Not inside the code I saw yesterday ...


> I think this needs a proper solution..

I think so, even if it'll make things a bit more complex ...



- Davide


2002-03-05 03:42:46

by Rusty Russell

Subject: Re: [PATCH] Fast Userspace Mutexes III.

In message <1015293007.882.87.camel@phantasy> you write:
> On Mon, 2002-03-04 at 17:15, Davide Libenzi wrote:
>
> > That's great. What if the process holding the mutex dies while there're
> > sleeping tasks waiting for it ?
>
> I can't find an answer in the code (meaning the lock is lost...) and no
> one has yet answered. Davide, have you noticed anything?
>
> I think this needs a proper solution..

No. From my previous post:

Date: Mon, 4 Mar 2002 17:13:46 +1100
From: Rusty Russell <[email protected]>
To: Paul Jackson <[email protected]>
Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: [Lse-tech] Re: [PATCH] Lightweight userspace semaphores...
Message-Id: <[email protected]>
In-Reply-To: <[email protected]>
In-Reply-To: <[email protected]>
<[email protected]>

On Sun, 3 Mar 2002 14:13:45 -0800
Paul Jackson <[email protected]> wrote:

> On Sat, 2 Mar 2002, Hubertus Franke wrote:
> > But more conceptually, if you process dies while holding a lock ...
> > your app is toast at that point.
>
> Without application specific knowledge, certainly.
>
> Is there someway one could support a hook, to enable
> an application to register a handler that could clean
> up, for those apps that found that worthwhile?

If you want this, use fcntl locks (see TDB). If you don't tell the kernel
what you are doing (which is the reason these locks are fast), it cannot
clean up for you.

One could conceive of a solution where a process told the kernel
"here is where I keep my lock states: if I die, clean up". Of course,
there's a race there too.

IMHO, given that the lock is protecting something which is left in an
unknown state, this is something which would require serious testing
to be proven worthwhile.

Hope that helps,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-03-05 03:52:18

by Davide Libenzi

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Tue, 5 Mar 2002, Rusty Russell wrote:

> In message <1015293007.882.87.camel@phantasy> you write:
> > On Mon, 2002-03-04 at 17:15, Davide Libenzi wrote:
> >
> > > That's great. What if the process holding the mutex dies while there're
> > > sleeping tasks waiting for it ?
> >
> > I can't find an answer in the code (meaning the lock is lost...) and no
> > one has yet answered. Davide, have you noticed anything?
> >
> > I think this needs a proper solution..
>
> No. From my previous post:

Uhmm, you better comment this one.

"please do not die while holding it, please ..."

:-)



- Davide


2002-03-05 04:34:54

by Edgar Toernig

Subject: Re: [PATCH] Fast Userspace Mutexes III.

Robert Love wrote:
>
> On Mon, 2002-03-04 at 17:15, Davide Libenzi wrote:
>
> > That's great. What if the process holding the mutex dies while there're
> > sleeping tasks waiting for it ?
>
> I can't find an answer in the code (meaning the lock is lost...) and no
> one has yet answered. Davide, have you noticed anything?
>
> I think this needs a proper solution..

You can't do very much. The futex holder has probably damaged some
data. The only thing you could do is to kill all current and future
waiters too. But the "future waiters" is difficult. The process
may hold other locks the kernel does not know anything about.

The only thing one could do is to kill all processes that share a
PROT_SEM page with a dying process.

Ciao, ET.

2002-03-05 04:45:44

by Rusty Russell

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Mon, 4 Mar 2002 15:48:48 -0500
Hubertus Franke <[email protected]> wrote:

> Also, the check on PROT_SEM is missing. I tried this before and glibc filtered
> these flags out when set. But effectively, one still needs to check
> whether semaphores are allowed during the sys_futex call.

Neither arch I care about (ppc, x86) needs to do anything with PROT_SEM, so it's
OK. glibc will have to be fixed on any architectures which require help here,
and a hook will be needed somewhere in the kernel for them.

I didn't implement it because I don't *know* which archs will need something,
and what they will need. Hence my request for arch maintainers to step
forward (Linus said they exist, and I believe him).

Hope that clarifies this particular wart...
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-03-05 06:08:09

by Rusty Russell

Subject: Re: [PATCH] Fast Userspace Mutexes III.

In message <[email protected]> you write:
> On Tue, 5 Mar 2002, Rusty Russell wrote:
>
> > In message <1015293007.882.87.camel@phantasy> you write:
> > > On Mon, 2002-03-04 at 17:15, Davide Libenzi wrote:
> > >
> > > > That's great. What if the process holding the mutex dies while there're
> > > > sleeping tasks waiting for it ?

I don't see that. Your data, your program, your funeral, yes?

Want to autoclean the lock? There are many ways to do this in
userspace. For most cases, the problem is "and then what"?

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-03-05 15:09:06

by Hubertus Franke

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Monday 04 March 2002 10:45 pm, Rusty Russell wrote:
> In message <1015293007.882.87.camel@phantasy> you write:
> > On Mon, 2002-03-04 at 17:15, Davide Libenzi wrote:
> > > That's great. What if the process holding the mutex dies while there're
> > > sleeping tasks waiting for it ?
> >
> > I can't find an answer in the code (meaning the lock is lost...) and no
> > one has yet answered. Davide, have you noticed anything?
> >
> > I think this needs a proper solution..
>
>
> If you want this, use fcntl locks (see TDB). If you don't tell the kernel
> what you are doing (which is the reason these locks are fast), it cannot
> clean up for you.
>
> One could conceive of a solution where a process told the kernel
> "here is where I keep my lock states: if I die, clean up". Of course,
> there's a race there too.
>

Yes, the problem goes even deeper. A simple hook is not enough.
One must know who is actually holding the lock, so that the cleanup
routines do the right thing, e.g. store the pid with the lock. As Rusty
stated, this still has race conditions.
Anyway, this should be orthogonal to the low-level services
provided.
Another issue is that of rwlocks. Here it's perfectly OK to die if
you hold the lock in read mode and clean up before going away.

Again, this should not be part of the base service.

> IMHO, given that the lock is protecting something which is left in an
> unknown state, this is something which would require serious testing
> to be proven worthwhile.
>


> Hope that helps,
> Rusty.

--
-- Hubertus Franke ([email protected])

2002-03-05 15:15:06

by Hubertus Franke

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Monday 04 March 2002 11:48 pm, Rusty Russell wrote:
> On Mon, 4 Mar 2002 15:48:48 -0500
>
> Hubertus Franke <[email protected]> wrote:
> > Also, the check on PROT_SEM is missing. I tried this before and glibc
> > filtered these flags out when set. But effectively, one still needs to
> > check whether semaphores are allowed during the sys_futex call.
>
> Neither arch I care about (ppc, x86) needs to do anything with PROT_SEM, so
> it's OK. glibc will have to be fixed on any architectures which require
> help here, and a hook will be needed somewhere in the kernel for them.
>
> I didn't implement it because I don't *know* which archs will need
> something, and what they will need. Hence my request for arch maintainers
> to step forward (Linus said they exist, and I believe him).
>
> Hope that clarifies this particular wart...
> Rusty.

Clarifies only partially.

I agree to put it there if it's not used as a means to define whether
user locks are permitted or not. If that is the intention, then the current
futex will need to check every access through find_vma(), which we
both know nobody wants to do.

So it can only be used for architectural hints, agreed?

--
-- Hubertus Franke ([email protected])

2002-03-05 17:20:42

by Davide Libenzi

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Tue, 5 Mar 2002, Rusty Russell wrote:

> In message <[email protected]> you write:
> > On Tue, 5 Mar 2002, Rusty Russell wrote:
> >
> > > In message <1015293007.882.87.camel@phantasy> you write:
> > > > On Mon, 2002-03-04 at 17:15, Davide Libenzi wrote:
> > > >
> > > > > That's great. What if the process holding the mutex dies while there're
> > > > > sleeping tasks waiting for it ?
>
> I don't see that. Your data, your program, your funeral, yes?
>
> Want to autoclean the lock? There are many ways to do this in
> userspace. For most cases, the problem is "and then what"?

I tried hard to remember cases where features of IPC sems have been
handy to me, but I failed.

*as designed*



- Davide


2002-03-06 01:28:08

by Rusty Russell

Subject: Re: [PATCH] Fast Userspace Mutexes III.

In message <[email protected]> you write:
> I agree to put it there if it's not used as a means to define whether
> user locks are permitted or not. If that is the intention, then the current
> futex will need to check every access through find_vma(), which we
> both know nobody wants to do.
>
> So it can only be used for architectural hints, agreed?

Yes. It *might* work if you don't PROT_SEM the page the semaphore is
on, but it's still a bug waiting to happen. OTOH, it'd be nice if
PROT_SEM returning EINVAL were a reliable indicator of no futex support;
as it stands, you actually need to call the futex syscall once to see if
it works.

Cheers!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-03-07 01:53:09

by Richard Henderson

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Mon, Mar 04, 2002 at 02:55:36PM +1100, Rusty Russell wrote:
> + /* If we take the semaphore from 1 to 0, it's ours. */
> + while (!atomic_dec_and_test(count)) {
> + if (signal_pending(current)) {
> + retval = -EINTR;
> + break;

This is not safe from wraparound. Let one thread hold the
lock forever; let other threads keep trying to take the lock
while periodically getting SIGALRM. Eventually one of the
spinning threads will incorrectly acquire the mutex.

On sparc32, atomic_t is only 24 bits wide, so it wouldn't
take very long at all to wrap. It's slightly longer for
the other platforms, but it can still happen. And note
that even 64-bit platforms may be using a 32-bit atomic_t.

You really do need that cmpxchg loop.


r~

2002-03-07 01:59:21

by Richard Henderson

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Mon, Mar 04, 2002 at 02:15:58PM -0800, Davide Libenzi wrote:
> That's great. What if the process holding the mutex dies while there're
> sleeping tasks waiting for it ?

The lock is lost. The same thing would happen with locks completely
implemented in userspace.

I don't see that the kernel should do anything about this. If a
thread is killed with prejudice (i.e. without pthread_cancel) then
there are all sorts of cleanups that won't happen. Having the
kernel automatically unlock the locks doesn't help much, since
the data structures are quite likely in an inconsistent state.


r~

2002-03-07 02:07:14

by Davide Libenzi

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Wed, 6 Mar 2002, Richard Henderson wrote:

> On Mon, Mar 04, 2002 at 02:15:58PM -0800, Davide Libenzi wrote:
> > That's great. What if the process holding the mutex dies while there're
> > sleeping tasks waiting for it ?
>
> The lock is lost. The same thing would happen with locks completely
> implemented in userspace.
>
> I don't see that the kernel should do anything about this. If a
> thread is killed with prejudice (i.e. without pthread_cancel) then
> there are all sorts of cleanups that won't happen. Having the
> kernel automatically unlock the locks doesn't help much, since
> the data structures are quite likely in an inconsistent state.

Agreed, whatever the solution, it does not solve the problem completely
and makes things a lot more complex. It's not an issue ...



- Davide


2002-03-07 03:36:49

by Rusty Russell

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Wed, 6 Mar 2002 17:52:03 -0800
Richard Henderson <[email protected]> wrote:

> On Mon, Mar 04, 2002 at 02:55:36PM +1100, Rusty Russell wrote:
> > + /* If we take the semaphore from 1 to 0, it's ours. */
> > + while (!atomic_dec_and_test(count)) {
> > + if (signal_pending(current)) {
> > + retval = -EINTR;
> > + break;
>
> This is not safe from wraparound. Let one thread hold the
> lock forever; let other threads keep trying to take the lock
> while periodically getting SIGALRM. Eventually one of the
> spinning threads will incorrectly acquire the mutex.

Yes, this was noted. And yes, it's about time we fixed sparc32
or threw it out of the tree. But since the real problem here is
"lock held forever", I don't care.

> You really do need that cmpxchg loop.

Well, not decrementing if count < 0 already also works (as seen in
later patches), and I'm not going to break those SMP 386s if I don't
have to.

Cheers!
Rusty.
PS. Will Alpha have to do any special magic with the mmap PROT_SEM flag?
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-03-07 08:48:53

by Richard Henderson

Subject: Re: [PATCH] Fast Userspace Mutexes III.

On Thu, Mar 07, 2002 at 02:39:47PM +1100, Rusty Russell wrote:
> But since the real problem here is "lock held forever", I don't care.

No, "lock held forever" merely makes the example trivial.
"Lock held for a while" is the real problem.

> > You really do need that cmpxchg loop.
>
> Well, not decrementing if count < 0 already also works

How, exactly, are you planning on doing that atomically?
Clue: 386 SMP requires an extra spinlock.

> PS. Will Alpha have to do any special magic with the mmap PROT_SEM flag?

No.


r~

2002-03-07 09:14:33

by Rusty Russell

Subject: Re: [PATCH] Fast Userspace Mutexes III.

In message <[email protected]> you write:
> On Thu, Mar 07, 2002 at 02:39:47PM +1100, Rusty Russell wrote:
> > But since the real problem here is "lock held forever", I don't care.
>
> No, "lock held forever" merely makes the example trivial.
> "Lock held for a while" is the real problem.
>
> > > You really do need that cmpxchg loop.
> >
> > Well, not decrementing if count < 0 already also works
>
> How, exactly, are you planning on doing that atomically?
> Clue: 386 SMP requires an extra spinlock.

Fortunately, it doesn't need to be atomic. This got me too, but Paul
Mackerras convinced me.

We don't care if someone decrements it between us checking < 0 and
doing the atomic_dec_and_test. So the only thing we worry about is
the "up" case. A userspace "up" will go into the kernel (since count
< 0), and the kernel "up" will wake us since we are on the queue
before we do this test.

> > PS. Will Alpha have to do any special magic with the mmap PROT_SEM flag?
>
> No.

Damn, I was hoping to find out which arch actually is going to use
this. I hate untested code.

Thanks!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.