-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
... broke NPTL. Tests which worked with previous kernels fail now. One
test eventually succeeded, but the process somehow got stuck for about
30-40 seconds. Then it finished. Running strace showed a call to
clone() as the last operation but there were other threads running at
that time.
I tried to login (via ssh) into the system when the process hang and
this, too, was delayed and succeeded immediately when the strace'd
process continued.
I haven't looked at the changes made and wouldn't expect to understand
all the details either. And I can understand if you don't want to run
the glibc test suite. What I can offer are statically linked versions
of the tests. One is here:
http://people.redhat.com/drepper/tst-cond2.bz2
358572ec100de9b27833d6c4ee5ecdb5 tst-cond2
Let me know if you need more help.
- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
iD8DBQE/WPD32ijCOnn/RHQRApLUAJ9SiIelIOnUA/pOmol04AaM+hMvaQCgoA86
VNLbS7nFXoVggjDtWAJxZ5g=
=tNkI
-----END PGP SIGNATURE-----
On Fri, 05 Sep 2003 13:24:23 -0700
Ulrich Drepper <[email protected]> wrote:
> ... broke NPTL.
This might explain something odd I am seeing with today's kernel.
On login, Gnome starts and gets most of the way running, but the icons and
screen background are missing.
On Fri, 5 Sep 2003, Ulrich Drepper wrote:
> ... broke NPTL. Tests which worked with previous kernels fail now. One
> test eventually succeeded, but the process somehow got stuck for about
> 30-40 seconds. Then it finished. Running strace showed a call to
> clone() as the last operation but there were other threads running at
> that time.
>....
> What I can offer are statically linked versions of the tests.
> One is here: http://people.redhat.com/drepper/tst-cond2.bz2
Very helpful, thank you: it showed two bugs, one new and one old.
Does the patch below work for you, Ulrich?
The new bug is that "offset" has been declared as an alternative in
the union, instead of as an element in the structures comprising it,
effectively eliminating it from the key: keys match which should not.
The old bug is that if futex_requeue were called with identical
key1 and key2 (sensible? tended to happen given the first bug),
it was liable to loop for a long time holding futex_lock: guard
against that, still respecting the semantics of futex_requeue.
While here, please let's also fix the get_futex_key VM_NONLINEAR
case, which was returning the 1 from get_user_pages, taken as an
error by its callers. And save a few bytes and improve debuggability
by uninlining the top-level futex_wake, futex_requeue, futex_wait.
Hugh
--- 2.6.0-test4-bk8/kernel/futex.c Fri Sep 5 21:04:52 2003
+++ linux/kernel/futex.c Sat Sep 6 16:29:32 2003
@@ -49,16 +49,18 @@
struct {
unsigned long pgoff;
struct inode *inode;
+ int offset;
} shared;
struct {
unsigned long uaddr;
struct mm_struct *mm;
+ int offset;
} private;
struct {
unsigned long word;
void *ptr;
+ int offset;
} both;
- int offset;
};
/*
@@ -91,7 +93,7 @@
{
return &futex_queues[hash_long(key->both.word
+ (unsigned long) key->both.ptr
- + key->offset, FUTEX_HASHBITS)];
+ + key->both.offset, FUTEX_HASHBITS)];
}
/*
@@ -101,7 +103,7 @@
{
return (key1->both.word == key2->both.word
&& key1->both.ptr == key2->both.ptr
- && key1->offset == key2->offset);
+ && key1->both.offset == key2->both.offset);
}
/*
@@ -127,10 +129,10 @@
/*
* The futex address must be "naturally" aligned.
*/
- key->offset = uaddr % PAGE_SIZE;
- if (unlikely((key->offset % sizeof(u32)) != 0))
+ key->both.offset = uaddr % PAGE_SIZE;
+ if (unlikely((key->both.offset % sizeof(u32)) != 0))
return -EINVAL;
- uaddr -= key->offset;
+ uaddr -= key->both.offset;
/*
* The futex is hashed differently depending on whether
@@ -199,6 +201,7 @@
key->shared.pgoff =
page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
put_page(page);
+ return 0;
}
return err;
}
@@ -208,7 +211,7 @@
* Wake up all waiters hashed on the physical page that is mapped
* to this virtual address:
*/
-static inline int futex_wake(unsigned long uaddr, int num)
+static int futex_wake(unsigned long uaddr, int num)
{
struct list_head *i, *next, *head;
union futex_key key;
@@ -247,7 +250,7 @@
* Requeue all waiters hashed on one physical page to another
* physical page.
*/
-static inline int futex_requeue(unsigned long uaddr1, unsigned long uaddr2,
+static int futex_requeue(unsigned long uaddr1, unsigned long uaddr2,
int nr_wake, int nr_requeue)
{
struct list_head *i, *next, *head1, *head2;
@@ -282,6 +285,9 @@
this->key = key2;
if (ret - nr_wake >= nr_requeue)
break;
+ /* Make sure to stop if key1 == key2 */
+ if (head1 == head2 && head1 != next)
+ head1 = i;
}
}
}
@@ -320,7 +326,7 @@
return ret;
}
-static inline int futex_wait(unsigned long uaddr, int val, unsigned long time)
+static int futex_wait(unsigned long uaddr, int val, unsigned long time)
{
DECLARE_WAITQUEUE(wait, current);
int ret, curval;
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hugh Dickins wrote:
> Does the patch below work for you, Ulrich?
Indeed it does. This was so far only on a UP HT machine. I'll
hopefully later on can run it on a bigger SMP machine. Good work.
- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
iD8DBQE/WhoR2ijCOnn/RHQRAubxAJ9a4UeDkIQ61NBDIkwWWT8myvbsxACdEsnl
xyLcJXc57avwpVltynzRcQE=
=X+Xh
-----END PGP SIGNATURE-----
Ulrich Drepper wrote:
> > Does the patch below work for you, Ulrich?
>
> Indeed it does. This was so far only on a UP HT machine. I'll
> hopefully later on can run it on a bigger SMP machine. Good work.
The bugs fixed in Hugh's patch explain all the symptoms I saw with
your test program.
-- Jamie
Hugh Dickins wrote:
> The new bug is that "offset" has been declared as an alternative in
> the union, instead of as an element in the structures comprising it,
> effectively eliminating it from the key: keys match which should not.
Auch!
(Off to the post office for a pack of brown paper bags)
(And a pillow)
> The old bug is that if futex_requeue were called with identical
> key1 and key2 (sensible? tended to happen given the first bug),
> it was liable to loop for a long time holding futex_lock: guard
> against that, still respecting the semantics of futex_requeue.
That explains the hang I just saw in one run of Ulrich's test...
And it explain's why it's not repeatable.
With key1 == key2, you get to move nr_requeue waiters to the end of
the waiting list. I can't think of a good use for it, but it does do
something visible.
> + /* Make sure to stop if key1 == key2 */
> + if (head1 == head2 && head1 != next)
> + head1 = i;
Subtle, when nr_requeue > 1. That's a disturbingly nice trick :)
> While here, please let's also fix the get_futex_key VM_NONLINEAR
> case, which was returning the 1 from get_user_pages, taken as an
> error by its callers.
Yes. I just spotted it too.
> And save a few bytes and improve debuggability
> by uninlining the top-level futex_wake, futex_requeue, futex_wait.
Fair point about about debuggability, but does it really save bytes to
uninline these called-once functions?
-- Jamie
On Sat, 6 Sep 2003, Jamie Lokier wrote:
> > And save a few bytes and improve debuggability
> > by uninlining the top-level futex_wake, futex_requeue, futex_wait.
>
> Fair point about about debuggability, but does it really save bytes to
> uninline these called-once functions?
It saved about 50 with whatever 2.96 I'm using on this machine, but I
didn't try other gccs; it's probably added about that back in kallsyms...
Hugh
In message <[email protected]> you write:
> While here, please let's also fix the get_futex_key VM_NONLINEAR
> case, which was returning the 1 from get_user_pages, taken as an
> error by its callers. And save a few bytes and improve debuggability
> by uninlining the top-level futex_wake, futex_requeue, futex_wait.
OK, I've updated my patch on top of this. Mainly cosmetic, please
review.
Name: Minor Tweaks To Jamie Lokier's Futex Patch
Author: Rusty Russell
Status: Booted on 2.6.0-test4-bk9
Depends: Misc/futex-hugh.patch.gz
D: Minor changes to Jamie's excellent futex patch.
D: 1) Remove obsolete comment above hash array decl.
D: 2) Semantics of futex on read-only pages unclear: require write perm.
D: 3) Clarify comment about TASK_INTERRUPTIBLE.
D: 4) Andrew Morton says spurious wakeup is a bug. Catch it.
D: 5) Use Jenkins hash.
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .24731-linux-2.6.0-test4-bk9/kernel/futex.c .24731-linux-2.6.0-test4-bk9.updated/kernel/futex.c
--- .24731-linux-2.6.0-test4-bk9/kernel/futex.c 2003-09-08 10:44:26.000000000 +1000
+++ .24731-linux-2.6.0-test4-bk9.updated/kernel/futex.c 2003-09-08 12:01:23.000000000 +1000
@@ -33,7 +33,7 @@
#include <linux/poll.h>
#include <linux/fs.h>
#include <linux/file.h>
-#include <linux/hash.h>
+#include <linux/jhash.h>
#include <linux/init.h>
#include <linux/futex.h>
#include <linux/mount.h>
@@ -44,6 +44,7 @@
/*
* Futexes are matched on equal values of this key.
* The key type depends on whether it's a shared or private mapping.
+ * Don't rearrange without looking at hash_futex().
*/
union futex_key {
struct {
@@ -79,7 +80,6 @@ struct futex_q {
struct file *filp;
};
-/* The key for the hash is the address + index + offset within page */
static struct list_head futex_queues[1<<FUTEX_HASHBITS];
static spinlock_t futex_lock = SPIN_LOCK_UNLOCKED;
@@ -89,11 +89,12 @@ static struct vfsmount *futex_mnt;
/*
* We hash on the keys returned from get_futex_key (see below).
*/
-static inline struct list_head *hash_futex(union futex_key *key)
+static inline struct list_head *hash_futex(const union futex_key *key)
{
- return &futex_queues[hash_long(key->both.word
- + (unsigned long) key->both.ptr
- + key->both.offset, FUTEX_HASHBITS)];
+ u32 hash = jhash2((u32*)&key->both.word,
+ (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
+ key->both.offset);
+ return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)];
}
/*
@@ -145,17 +146,13 @@ static int get_futex_key(unsigned long u
/*
* Permissions.
*/
- if (unlikely((vma->vm_flags & (VM_IO|VM_READ)) != VM_READ))
- return (vma->vm_flags & VM_IO) ? -EPERM : -EACCES;
+ if (unlikely(vma->vm_flags & VM_IO))
+ return -EPERM;
+ if (unlikely(vma->vm_flags & (VM_READ|VM_WRITE)) != (VM_READ|VM_WRITE))
+ return -EACCES;
/*
* Private mappings are handled in a simple way.
- *
- * NOTE: When userspace waits on a MAP_SHARED mapping, even if
- * it's a read-only handle, it's expected that futexes attach to
- * the object not the particular process. Therefore we use
- * VM_MAYSHARE here, not VM_SHARED which is restricted to shared
- * mappings of _writable_ handles.
*/
if (likely(!(vma->vm_flags & VM_MAYSHARE))) {
key->private.mm = mm;
@@ -333,7 +330,6 @@ static int futex_wait(unsigned long uadd
union futex_key key;
struct futex_q q;
- try_again:
init_waitqueue_head(&q.waiters);
down_read(¤t->mm->mmap_sem);
@@ -367,10 +363,10 @@ static int futex_wait(unsigned long uadd
/*
* There might have been scheduling since the queue_me(), as we
* cannot hold a spinlock across the get_user() in case it
- * faults. So we cannot just set TASK_INTERRUPTIBLE state when
+ * faults, and we cannot just set TASK_INTERRUPTIBLE state when
* queueing ourselves into the futex hash. This code thus has to
- * rely on the futex_wake() code doing a wakeup after removing
- * the waiter from the list.
+ * rely on the futex_wake() code removing us from hash when it
+ * wakes us up.
*/
add_wait_queue(&q.waiters, &wait);
spin_lock(&futex_lock);
@@ -394,26 +390,19 @@ static int futex_wait(unsigned long uadd
* we are the only user of it.
*/
- /*
- * Were we woken or interrupted for a valid reason?
- */
- ret = unqueue_me(&q);
- if (ret == 0)
+ /* If we were woken (and unqueued), we succeeded, whatever. */
+ if (!unqueue_me(&q))
return 0;
if (time == 0)
return -ETIMEDOUT;
if (signal_pending(current))
return -EINTR;
- /*
- * No, it was a spurious wakeup. Try again. Should never happen. :)
- */
- goto try_again;
+ /* A spurious wakeup. Should never happen. */
+ BUG();
out_unqueue:
- /*
- * Were we unqueued anyway?
- */
+ /* If we were woken (and unqueued), we succeeded, whatever. */
if (!unqueue_me(&q))
ret = 0;
out_release_sem:
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
On Sat, 6 Sep 2003 17:28:44 +0100 (BST)
Hugh Dickins <[email protected]> wrote:
> On Fri, 5 Sep 2003, Ulrich Drepper wrote:
> > ... broke NPTL. Tests which worked with previous kernels fail now. One
> > test eventually succeeded, but the process somehow got stuck for about
> > 30-40 seconds. Then it finished. Running strace showed a call to
> > clone() as the last operation but there were other threads running at
> > that time.
> >....
> > What I can offer are statically linked versions of the tests.
> > One is here: http://people.redhat.com/drepper/tst-cond2.bz2
>
> Very helpful, thank you: it showed two bugs, one new and one old.
> Does the patch below work for you, Ulrich?
>
> The new bug is that "offset" has been declared as an alternative in
> the union, instead of as an element in the structures comprising it,
> effectively eliminating it from the key: keys match which should not.
>
> The old bug is that if futex_requeue were called with identical
> key1 and key2 (sensible? tended to happen given the first bug),
> it was liable to loop for a long time holding futex_lock: guard
> against that, still respecting the semantics of futex_requeue.
>
> While here, please let's also fix the get_futex_key VM_NONLINEAR
> case, which was returning the 1 from get_user_pages, taken as an
> error by its callers. And save a few bytes and improve debuggability
> by uninlining the top-level futex_wake, futex_requeue, futex_wait.
>
> Hugh
Everything is working fine for me now.
On Mon, 8 Sep 2003, Rusty Russell wrote:
> OK, I've updated my patch on top of this. Mainly cosmetic, please
> review.
>
> Name: Minor Tweaks To Jamie Lokier's Futex Patch
> Author: Rusty Russell
> Status: Booted on 2.6.0-test4-bk9
> Depends: Misc/futex-hugh.patch.gz
>
> D: Minor changes to Jamie's excellent futex patch.
> D: 1) Remove obsolete comment above hash array decl.
> D: 2) Semantics of futex on read-only pages unclear: require write perm.
> D: 3) Clarify comment about TASK_INTERRUPTIBLE.
> D: 4) Andrew Morton says spurious wakeup is a bug. Catch it.
> D: 5) Use Jenkins hash.
Most of it (the futex_wait tweaks) looked fine to me -
though I look forward to the first report of that BUG().
Part 2, requiring VM_WRITE and removing the comment on VM_MAYSHARE,
seems a regression to me. Perhaps I misinterpreted Linus' action in
taking Jamie's patch: I took that to mean he relented a little on his
hardline position about VM_SHARED, and now accepts that in this context
VM_MAYSHARE is more appropriate (easier to document). I know I argued
that readonly futices are pointless, but I thought Jamie gave a good
picture of how a readonly view could still be used. I'd rather that
part were a separate patch, so Linus can merge or not as he wishes.
In part 5, the Jenkins hashing, I was puzzled by the "/4" in
+ (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
both.ptr would be a multiple of 32 (? seems to be that way on PIII
and P4, though I gave up trying to work out quite how slab.c aligns),
and both.word would be a multiple of 1 in the shared.pgoff case, or
a multiple of PAGE_SIZE in the private.uaddr case (private.uaddr =
uaddr >> PAGE_SHIFT might make more sense). Whereas both.offset
would be a multiple of 4. I don't suppose Mr Jenkins will mind,
but I did find the "/4" puzzling in that context.
Hugh
On Mon, 8 Sep 2003, Hugh Dickins wrote:
>
> In part 5, the Jenkins hashing, I was puzzled by the "/4" in
> + (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
> [much blabbering snipped]
Sorry, I've just read that and noticed "sizeof": rosy dawn of enlightenment.
Hugh
Rusty Russell wrote:
> + u32 hash = jhash2((u32*)&key->both.word,
Have you checked the code size?
That does more work and has more code than is needed, especially on
32-bit archs. On 32-bit, jhash_3words() is much better because it
reduces to a single call to __jhash_mix(), instead of the two done by
jhash2 (only one is required for good hashing afaict).
It is probably worth adding a jhash_3longs() to jhash.h, which does
one call to __hash_mix() on 32-bit, two calls on 64-bit, and avoids
the loop in both cases.
[ Aside: For hashing individual integers, I prefer to use Thomas Wang's:
http://www.concentric.net/~Ttwang/tech/inthash.htm
He mentions Jenkin's function, and derived an integer mixing function
from correspondence with Jenkins.
For hashing 3 words together, Jenkins' hash does seem a bit more
compact - if you don't call __jhash_mix() multiple times that is! ]
> - if (unlikely((vma->vm_flags & (VM_IO|VM_READ)) != VM_READ))
> - return (vma->vm_flags & VM_IO) ? -EPERM : -EACCES;
> + if (unlikely(vma->vm_flags & VM_IO))
> + return -EPERM;
> + if (unlikely(vma->vm_flags & (VM_READ|VM_WRITE)) != (VM_READ|VM_WRITE))
> + return -EACCES;
Is there a good reason to disallow read-only waiters?
I agree with Hugh that it seems like a regression.
> + /* A spurious wakeup. Should never happen. */
> + BUG();
:)
The rest of your changes seem fine. I particularly appreciate your
grammatical improvements to my comment :)
-- Jamie
Jamie Lokier wrote:
> It is probably worth adding a jhash_3longs() to jhash.h, which does
> one call to __hash_mix() on 32-bit, two calls on 64-bit, and avoids
> the loop in both cases.
Bob Jenkins has a 64-bit version of the mixing function, which we
aren't using. It would be better to use a single iteration of that in
jhash_3longs().
See:
http://burtleburtle.net/bob/hash/evahash.html#hash64
-- Jamie
Rusty Russell <[email protected]> wrote:
>
> D: 4) Andrew Morton says spurious wakeup is a bug. Catch it.
Yes, but going BUG() is a bit rude. We can detect the error, we can
recover from it and it doesn't cause any user data corruption or anything.
A rude printk is all that is needed here.
Andrew Morton wrote:
> Rusty Russell <[email protected]> wrote:
> > D: 4) Andrew Morton says spurious wakeup is a bug. Catch it.
>
> Yes, but going BUG() is a bit rude. We can detect the error, we can
> recover from it and it doesn't cause any user data corruption or anything.
> A rude printk is all that is needed here.
Well, it should really _never_ happen. We are very careful. Is
there something like BUG() which doesn't terminate the process?
-- Jamie
On Mon, 8 Sep 2003, Jamie Lokier wrote:
>
> Well, it should really _never_ happen. We are very careful. Is
> there something like BUG() which doesn't terminate the process?
That's exactly what WARN() does.
Linus
Jamie Lokier <[email protected]> wrote:
>
> Andrew Morton wrote:
> > Rusty Russell <[email protected]> wrote:
> > > D: 4) Andrew Morton says spurious wakeup is a bug. Catch it.
> >
> > Yes, but going BUG() is a bit rude. We can detect the error, we can
> > recover from it and it doesn't cause any user data corruption or anything.
> > A rude printk is all that is needed here.
>
> Well, it should really _never_ happen. We are very careful. Is
> there something like BUG() which doesn't terminate the process?
>
A WARN_ON(1) is good enough. It will spit a stack dump and we will get to
hear about it. Going BUG merely reduces the chances of the info hitting
the logs and irritates our testers.
In message <[email protected]> you write:
> Most of it (the futex_wait tweaks) looked fine to me -
> though I look forward to the first report of that BUG().
Me too. But at least we'll *get* a report if someone does spurious
wakeups.
> Part 2, requiring VM_WRITE and removing the comment on VM_MAYSHARE,
> seems a regression to me. Perhaps I misinterpreted Linus' action in
> taking Jamie's patch: I took that to mean he relented a little on his
> hardline position about VM_SHARED, and now accepts that in this context
> VM_MAYSHARE is more appropriate (easier to document). I know I argued
> that readonly futices are pointless, but I thought Jamie gave a good
> picture of how a readonly view could still be used. I'd rather that
> part were a separate patch, so Linus can merge or not as he wishes.
Sure, I jumboed them together for feedback from you guys.
All users I am currently aware of won't care either way. Current
test-5 is:
Sees Changes Sees FUTEX_WAKE
from another MAP_SHARED from another MAP_SHARED
MAP_PRIVATE read-only: Y N
MAP_PRIVATE read-write: Y* N
MAP_SHARED read-only: Y Y
MAP_SHARED read-write: Y Y
[* Only until page is written to, after which COW splits them ]
Previously, the FUTEX_WAKE column was identical to the first column.
Now, IMHO, this new semantic is more sensible, but I don't really
mind.
But I don't recall anything about believable use of RO futexes: Jamie?
Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
In message <[email protected]> you write:
> Rusty Russell wrote:
> > + u32 hash = jhash2((u32*)&key->both.word,
>
> Have you checked the code size?
Good point: it's bigger, and there are 4 call sites, so it should be
inlined. Done.
> That does more work and has more code than is needed, especially on
> 32-bit archs. On 32-bit, jhash_3words() is much better because it
> reduces to a single call to __jhash_mix(), instead of the two done by
> jhash2 (only one is required for good hashing afaict).
>
> It is probably worth adding a jhash_3longs() to jhash.h, which does
> one call to __hash_mix() on 32-bit, two calls on 64-bit, and avoids
> the loop in both cases.
Well, I'm happy to let the compiler do this work 8) In fact, it does
it quite well: the jhash_2words version is still 4 bytes shorter
though, gcc 3.2.3, but then the hashes are slightly different (length
isn't added in jhash_2words).
> [ Aside: For hashing individual integers, I prefer to use Thomas Wang's:
>
> http://www.concentric.net/~Ttwang/tech/inthash.htm
>
> He mentions Jenkin's function, and derived an integer mixing function
> from correspondence with Jenkins.
Interesting: we could sub this for the specific jhash_x functions if
someone wants to do the analysis.
> > - if (unlikely((vma->vm_flags & (VM_IO|VM_READ)) != VM_READ))
> > - return (vma->vm_flags & VM_IO) ? -EPERM : -EACCES;
> > + if (unlikely(vma->vm_flags & VM_IO))
> > + return -EPERM;
> > + if (unlikely(vma->vm_flags & (VM_READ|VM_WRITE)) != (VM_READ|VM_WRITE))
> > + return -EACCES;
>
> Is there a good reason to disallow read-only waiters?
> I agree with Hugh that it seems like a regression.
Yes, I've reverted this part.
> > + /* A spurious wakeup. Should never happen. */
> > + BUG();
>
> :)
>
> The rest of your changes seem fine. I particularly appreciate your
> grammatical improvements to my comment :)
These days my biggest contribution to the futex code 8)
BTW, I'm guessing from your preference for multi-line comments that
you don't use a color-coding editor for source? I must say that once
I got used to it, I really prefer comments in green.
Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
Name: Minor Tweaks To Jamie Lokier's Futex Patch
Author: Rusty Russell
Status: Booted on 2.6.0-test5
D: Minor changes to Jamie's excellent futex patch.
D: 1) Remove obsolete comment above hash array decl.
D: 2) Clarify comment about TASK_INTERRUPTIBLE.
D: 3) Andrew Morton says spurious wakeup is a bug. Catch it.
D: 4) Try Jenkins hash.
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .24731-linux-2.6.0-test4-bk9/kernel/futex.c .24731-linux-2.6.0-test4-bk9.updated/kernel/futex.c
--- .24731-linux-2.6.0-test4-bk9/kernel/futex.c 2003-09-08 10:44:26.000000000 +1000
+++ .24731-linux-2.6.0-test4-bk9.updated/kernel/futex.c 2003-09-08 12:01:23.000000000 +1000
@@ -33,7 +33,7 @@
#include <linux/poll.h>
#include <linux/fs.h>
#include <linux/file.h>
-#include <linux/hash.h>
+#include <linux/jhash.h>
#include <linux/init.h>
#include <linux/futex.h>
#include <linux/mount.h>
@@ -44,6 +44,7 @@
/*
* Futexes are matched on equal values of this key.
* The key type depends on whether it's a shared or private mapping.
+ * Don't rearrange members without looking at hash_futex().
*/
union futex_key {
struct {
@@ -79,7 +80,6 @@ struct futex_q {
struct file *filp;
};
-/* The key for the hash is the address + index + offset within page */
static struct list_head futex_queues[1<<FUTEX_HASHBITS];
static spinlock_t futex_lock = SPIN_LOCK_UNLOCKED;
@@ -89,11 +89,12 @@ static struct vfsmount *futex_mnt;
/*
* We hash on the keys returned from get_futex_key (see below).
*/
-static inline struct list_head *hash_futex(union futex_key *key)
+static struct list_head *hash_futex(const union futex_key *key)
{
- return &futex_queues[hash_long(key->both.word
- + (unsigned long) key->both.ptr
- + key->both.offset, FUTEX_HASHBITS)];
+ u32 hash = jhash2((u32*)&key->both.word,
+ (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
+ key->both.offset);
+ return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)];
}
/*
@@ -333,7 +330,6 @@ static int futex_wait(unsigned long uadd
union futex_key key;
struct futex_q q;
- try_again:
init_waitqueue_head(&q.waiters);
down_read(¤t->mm->mmap_sem);
@@ -367,10 +363,10 @@ static int futex_wait(unsigned long uadd
/*
* There might have been scheduling since the queue_me(), as we
* cannot hold a spinlock across the get_user() in case it
- * faults. So we cannot just set TASK_INTERRUPTIBLE state when
+ * faults, and we cannot just set TASK_INTERRUPTIBLE state when
* queueing ourselves into the futex hash. This code thus has to
- * rely on the futex_wake() code doing a wakeup after removing
- * the waiter from the list.
+ * rely on the futex_wake() code removing us from hash when it
+ * wakes us up.
*/
add_wait_queue(&q.waiters, &wait);
spin_lock(&futex_lock);
@@ -394,26 +390,19 @@ static int futex_wait(unsigned long uadd
* we are the only user of it.
*/
- /*
- * Were we woken or interrupted for a valid reason?
- */
- ret = unqueue_me(&q);
- if (ret == 0)
+ /* If we were woken (and unqueued), we succeeded, whatever. */
+ if (!unqueue_me(&q))
return 0;
if (time == 0)
return -ETIMEDOUT;
if (signal_pending(current))
return -EINTR;
- /*
- * No, it was a spurious wakeup. Try again. Should never happen. :)
- */
- goto try_again;
+ /* A spurious wakeup. Should never happen. */
+ BUG();
out_unqueue:
- /*
- * Were we unqueued anyway?
- */
+ /* If we were woken (and unqueued), we succeeded, whatever. */
if (!unqueue_me(&q))
ret = 0;
out_release_sem:
In message <[email protected]> you write:
> Rusty Russell <[email protected]> wrote:
> >
> > D: 4) Andrew Morton says spurious wakeup is a bug. Catch it.
>
> Yes, but going BUG() is a bit rude. We can detect the error, we can
> recover from it and it doesn't cause any user data corruption or anything.
> A rude printk is all that is needed here.
OK. Changed.
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
Name: Minor Tweaks To Jamie Lokier's Futex Patch
Author: Rusty Russell
Status: Booted on 2.6.0-test5
D: Minor changes to Jamie's excellent futex patch.
D: 1) Remove obsolete comment above hash array decl.
D: 2) Clarify comment about TASK_INTERRUPTIBLE.
D: 3) Andrew Morton says spurious wakeup is a bug. Catch it.
D: 4) Try Jenkins hash.
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .9333-linux-2.6.0-test5/kernel/futex.c .9333-linux-2.6.0-test5.updated/kernel/futex.c
--- .9333-linux-2.6.0-test5/kernel/futex.c 2003-09-09 10:35:05.000000000 +1000
+++ .9333-linux-2.6.0-test5.updated/kernel/futex.c 2003-09-09 14:06:06.000000000 +1000
@@ -33,7 +33,7 @@
#include <linux/poll.h>
#include <linux/fs.h>
#include <linux/file.h>
-#include <linux/hash.h>
+#include <linux/jhash.h>
#include <linux/init.h>
#include <linux/futex.h>
#include <linux/mount.h>
@@ -44,6 +44,7 @@
/*
* Futexes are matched on equal values of this key.
* The key type depends on whether it's a shared or private mapping.
+ * Don't rearrange members without looking at hash_futex().
*/
union futex_key {
struct {
@@ -79,7 +80,6 @@ struct futex_q {
struct file *filp;
};
-/* The key for the hash is the address + index + offset within page */
static struct list_head futex_queues[1<<FUTEX_HASHBITS];
static spinlock_t futex_lock = SPIN_LOCK_UNLOCKED;
@@ -89,11 +89,12 @@ static struct vfsmount *futex_mnt;
/*
* We hash on the keys returned from get_futex_key (see below).
*/
-static inline struct list_head *hash_futex(union futex_key *key)
+static struct list_head *hash_futex(const union futex_key *key)
{
- return &futex_queues[hash_long(key->both.word
- + (unsigned long) key->both.ptr
- + key->both.offset, FUTEX_HASHBITS)];
+ u32 hash = jhash2((u32*)&key->both.word,
+ (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
+ key->both.offset);
+ return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)];
}
/*
@@ -333,7 +334,6 @@ static int futex_wait(unsigned long uadd
union futex_key key;
struct futex_q q;
- try_again:
init_waitqueue_head(&q.waiters);
down_read(¤t->mm->mmap_sem);
@@ -367,10 +367,10 @@ static int futex_wait(unsigned long uadd
/*
* There might have been scheduling since the queue_me(), as we
* cannot hold a spinlock across the get_user() in case it
- * faults. So we cannot just set TASK_INTERRUPTIBLE state when
+ * faults, and we cannot just set TASK_INTERRUPTIBLE state when
* queueing ourselves into the futex hash. This code thus has to
- * rely on the futex_wake() code doing a wakeup after removing
- * the waiter from the list.
+ * rely on the futex_wake() code removing us from hash when it
+ * wakes us up.
*/
add_wait_queue(&q.waiters, &wait);
spin_lock(&futex_lock);
@@ -394,26 +394,17 @@ static int futex_wait(unsigned long uadd
* we are the only user of it.
*/
- /*
- * Were we woken or interrupted for a valid reason?
- */
- ret = unqueue_me(&q);
- if (ret == 0)
+ /* If we were woken (and unqueued), we succeeded, whatever. */
+ if (!unqueue_me(&q))
return 0;
if (time == 0)
return -ETIMEDOUT;
- if (signal_pending(current))
- return -EINTR;
-
- /*
- * No, it was a spurious wakeup. Try again. Should never happen. :)
- */
- goto try_again;
+ /* A spurious wakeup should never happen. */
+ WARN_ON(!signal_pending(current));
+ return -EINTR;
out_unqueue:
- /*
- * Were we unqueued anyway?
- */
+ /* If we were woken (and unqueued), we succeeded, whatever. */
if (!unqueue_me(&q))
ret = 0;
out_release_sem:
Rusty Russell wrote:
> BTW, I'm guessing from your preference for multi-line comments that
> you don't use a color-coding editor for source? I must say that once
> I got used to it, I really prefer comments in green.
You mean the /*
*
*/ style?
No, I never use that style, except in futex.c to go along with the
style that was already there! I use Emacs with comments in red-orange :)
-- Jamie
In message <[email protected]> you write:
> Rusty Russell wrote:
> > BTW, I'm guessing from your preference for multi-line comments that
> > you don't use a color-coding editor for source? I must say that once
> > I got used to it, I really prefer comments in green.
>
> You mean the /*
> *
> */ style?
>
> No, I never use that style, except in futex.c to go along with the
> style that was already there! I use Emacs with comments in red-orange :)
I'll submit a trivial patch to condense them again: it's an Ingo-ism.
Comments in my parts of the code are less, um, verbose 8)
Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
Rusty Russell wrote:
> > No, I never use that style, except in futex.c to go along with the
> > style that was already there! I use Emacs with comments in red-orange :)
>
> I'll submit a trivial patch to condense them again: it's an Ingo-ism.
> Comments in my parts of the code are less, um, verbose 8)
Oh, the verbosity is definitely me, only the formatting is Ingo-ism :)
-- Jamie