2002-10-15 19:59:59

by Ingo Molnar

[permalink] [raw]
Subject: [patch] futex-2.5.42-A2


This is my current futex patchset against BK-curr. It mostly includes
must-have crash/correctness fixes from Martin Wirth, tested and reworked
somewhat by myself:

- crash fix: futex_close did not detach from the vcache. Detach cleanups.
(Martin Wirth)

- memory leak fix: forgotten put_page() in a rare path in __pin_page().
(Martin Wirth)

- crash fix: do not do any quickcheck in unqueue_me(). (Martin, me)

- correctness fix: the fastpath in __pin_page() now handles reserved
pages the same way get_user_pages() does. (Martin Wirth)

- queueing improvement: __attach_vcache() now uses list_add_tail() to
avoid the reversal of the futex queue if a COW happens. (Martin Wirth)

- simplified alignment check in sys_futex. (Martin Wirth)

- comment fix: make it clear how the vcache hash quickcheck works. (me)

compiles, boots & works just fine on x86 SMP and UP.

Ingo

--- linux/include/linux/vcache.h.orig 2002-10-15 21:35:55.000000000 +0200
+++ linux/include/linux/vcache.h 2002-10-15 21:36:07.000000000 +0200
@@ -18,7 +18,7 @@
struct mm_struct *mm,
void (*callback)(struct vcache_s *data, struct page *new_page));

-extern void detach_vcache(vcache_t *vcache);
+extern void __detach_vcache(vcache_t *vcache);

extern void invalidate_vcache(unsigned long address, struct mm_struct *mm,
struct page *new_page);
--- linux/kernel/futex.c.orig 2002-10-15 21:35:55.000000000 +0200
+++ linux/kernel/futex.c 2002-10-15 21:41:06.000000000 +0200
@@ -115,8 +115,9 @@
* Do a quick atomic lookup first - this is the fastpath.
*/
page = follow_page(mm, addr, 0);
- if (likely(page != NULL)) {
- get_page(page);
+ if (likely(page != NULL)) {
+ if (!PageReserved(page))
+ get_page(page);
return page;
}

@@ -140,8 +141,10 @@
* check for races:
*/
tmp = follow_page(mm, addr, 0);
- if (tmp != page)
+ if (tmp != page) {
+ put_page(page);
goto repeat_lookup;
+ }

return page;
}
@@ -176,6 +179,7 @@

if (this->page == page && this->offset == offset) {
list_del_init(i);
+ __detach_vcache(&this->vcache);
tell_waiter(this);
ret++;
if (ret >= num)
@@ -235,15 +239,15 @@
{
int ret = 0;

- detach_vcache(&q->vcache);
-
+ spin_lock(&vcache_lock);
spin_lock(&futex_lock);
if (!list_empty(&q->list)) {
list_del(&q->list);
+ __detach_vcache(&q->vcache);
ret = 1;
}
spin_unlock(&futex_lock);
-
+ spin_unlock(&vcache_lock);
return ret;
}

@@ -314,13 +318,7 @@
{
struct futex_q *q = filp->private_data;

- spin_lock(&futex_lock);
- if (!list_empty(&q->list)) {
- list_del(&q->list);
- /* Noone can be polling on us now. */
- BUG_ON(waitqueue_active(&q->waiters));
- }
- spin_unlock(&futex_lock);
+ unqueue_me(q);
unpin_page(q->page);
kfree(filp->private_data);
return 0;
@@ -436,9 +434,8 @@

pos_in_page = uaddr % PAGE_SIZE;

- /* Must be "naturally" aligned, and not on page boundary. */
- if ((pos_in_page % __alignof__(int)) != 0
- || pos_in_page + sizeof(int) > PAGE_SIZE)
+ /* Must be "naturally" aligned */
+ if (pos_in_page % sizeof(int))
return -EINVAL;

switch (op) {
--- linux/mm/vcache.c.orig 2002-10-15 21:35:55.000000000 +0200
+++ linux/mm/vcache.c 2002-10-15 21:36:07.000000000 +0200
@@ -41,14 +41,12 @@

hash_head = hash_vcache(address, mm);

- list_add(&vcache->hash_entry, hash_head);
+ list_add_tail(&vcache->hash_entry, hash_head);
}

-void detach_vcache(vcache_t *vcache)
+void __detach_vcache(vcache_t *vcache)
{
- spin_lock(&vcache_lock);
- list_del(&vcache->hash_entry);
- spin_unlock(&vcache_lock);
+ list_del_init(&vcache->hash_entry);
}

void invalidate_vcache(unsigned long address, struct mm_struct *mm,
@@ -61,12 +59,11 @@

hash_head = hash_vcache(address, mm);
/*
- * This is safe, because this path is called with the mm
- * semaphore read-held, and the add/remove path calls with the
- * mm semaphore write-held. So while other mm's might add new
- * entries in parallel, and *this* mm is locked out, so if the
- * list is empty now then we do not have to take the vcache
- * lock to see it's really empty.
+ * This is safe, because this path is called with the pagetable
+ * lock held. So while other mm's might add new entries in
+ * parallel, *this* mm is locked out, so if the list is empty
+ * now then we do not have to take the vcache lock to see it's
+ * really empty.
*/
if (likely(list_empty(hash_head)))
return;




2002-10-16 02:43:55

by Rusty Russell

[permalink] [raw]
Subject: Re: [patch] futex-2.5.42-A2

In message <[email protected]> you
write:
> - simplified alignment check in sys_futex. (Martin Wirth)

Um, this test existed for a reason:

> - /* Must be "naturally" aligned, and not on page boundary. */
> - if ((pos_in_page % __alignof__(int)) != 0
> - || pos_in_page + sizeof(int) > PAGE_SIZE)
> + /* Must be "naturally" aligned */
> + if (pos_in_page % sizeof(int))
> return -EINVAL;

If you do this, *please* add:
/* Above check not sufficient if align of int < size. Break link. */
if (__alignof__(int) < sizeof(int)) {
extern void __error_small_int_align();
__error_small_int_align();
}

Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-10-16 07:59:50

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] futex-2.5.42-A2


On Wed, 16 Oct 2002, Rusty Russell wrote:

> If you do this, *please* add:
> /* Above check not sufficient if align of int < size. Break link. */
> if (__alignof__(int) < sizeof(int)) {
> extern void __error_small_int_align();
> __error_small_int_align();
> }

are you aware of any platform we care about that has a word alignment like
this? But in any case, this should not matter these days, all we need is a
guarantee that the word is accessed atomically by the kernel when it reads
it, pure int alignment should be enough for that.

Ingo


2002-10-16 07:56:07

by Martin Wirth

[permalink] [raw]
Subject: Re: [patch] futex-2.5.42-A2

(My first reply seems to have been lost, so lets try a second time)


>Um, this test existed for a reason:
>
>> - /* Must be "naturally" aligned, and not on page boundary. */
>> - if ((pos_in_page % __alignof__(int)) != 0
>> - || pos_in_page + sizeof(int) > PAGE_SIZE)
>> + /* Must be "naturally" aligned */
>> + if (pos_in_page % sizeof(int))
>> return -EINVAL;
>
>If you do this, *please* add:
> /* Above check not sufficient if align of int < size. Break link. */
> if (__alignof__(int) < sizeof(int)) {
>
extern void __error_small_int_align();
>
__error_small_int_align();
> }

I suggested to tighten the above test, because if
__alignof__(int) < sizeof(int) the test leads to sporadic user space
errors if the futex variable accidentally crosses a page boundary. The
only sane way to control this is to force the user space compiler
to use __alignof__(int) == sizeof(int) for futex variables.

Anyway, the test dates back to times when the futex code did atomic
operations on the user space variable. But this is gone. The present
code only touches users space by get_user which does its on checks.
So from the point of keeping the kernel in a sane state we could drop the test
completely.


Martin

2002-10-16 08:05:54

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] futex-2.5.42-A2


On Wed, 16 Oct 2002, Martin Wirth wrote:

> > if (__alignof__(int) < sizeof(int)) {
> >
> extern void __error_small_int_align();

> I suggested to tighten the above test, because if __alignof__(int) <
> sizeof(int) the test leads to sporadic user space errors if the futex
> variable accidentally crosses a page boundary. [...]

i think you misunderstood Rusty's suggestion - what he suggests is to fail
the kernel compile with a linker error if alignof(int) < sizeof(int). This
is a pure compile-time thing, it does not relate to the alignment of the
futex variable in any way.

> Anyway, the test dates back to times when the futex code did atomic
> operations on the user space variable. But this is gone. The present
> code only touches users space by get_user which does its on checks. So
> from the point of keeping the kernel in a sane state we could drop the
> test completely.

actually, i think it's still important to do the alignment check to
enforce userspace to use sane alignment. We dont want one futex variable
out of 1000 potential futex to be misaligned, missed and blamed on the
kernel.

Ingo

2002-10-16 14:19:50

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] futex-2.5.42-A2

Hello.

Does not related to this patch, but...

On Sat, 05 Oct 2002, Oleg Nesterov wrote:
>
> Ingo Molnar wrote:
> > the new lookup code first does a lightweight follow_page(), then if no
> > page is present we do the get_user_pages() thing.
>
> What if futex placed in VM_HUGETLB area?
> Then follow_page() return garbage.

I think __pin_page(addr_in_hugetlb_area) has serious problems.

Oleg.

2002-10-18 09:27:43

by Rusty Russell

[permalink] [raw]
Subject: Re: [patch] futex-2.5.42-A2

In message <[email protected]> you
write:
>
> On Wed, 16 Oct 2002, Rusty Russell wrote:
> > If you do this, *please* add:
> > /* Above check not sufficient if align of int < size. Break link. */
>
> But in any case, this should not matter these days, all we need is a
> guarantee that the word is accessed atomically by the kernel when it reads
> it, pure int alignment should be enough for that.

Yes, this is true. I used to not do get_user() on the value (the page
is pinned anyway), so it mattered.

Sorry for the confusion,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.