LinuxLists.cc - [PATCH] RFC: futex fault handling and futex key references (NOT FOR INCLUSION)

2009-01-09 07:52:57

Subject: [PATCH] RFC: futex fault handling and futex key references (NOT FOR INCLUSION)

While trying to bend my brain around the various layers of fault handling in
futex.c, I think I may have uncovered some logical errors (or at least stale
code sections). I've attached two patches that address the alleged problems
against linux-tip/core/futexes. They are based on the following assumption:

Since the uaddr passed to a futex function isn't updated within the function,
and the mm doesn't change while we're in there, there should never be a need to
make repeat calls to futex_get_key(). Even if the queue_lock is dropped, the
futex_q might lose it's waiter (requeued) but the key stays the same.

I'd really appreciate any feedback.

Thanks in advance,

---

Darren Hart (2):
RFC: Fix futex_lock_pi fault handling (NOT FOR INCLUSION)
RFC: Fix futex_wake_op fault handling (NOT FOR INCLUSION)

kernel/futex.c | 40 ++++++++++++++++------------------------
1 files changed, 16 insertions(+), 24 deletions(-)

--
Signature

2009-01-09 07:53:21

by Darren Hart

[permalink] [raw]

Subject: [PATCH 2/2] RFC: Fix futex_lock_pi fault handling (NOT FOR INCLUSION)

Regardless of whether we call get_user or futex_handle_fault to deal with a
fault, the uaddr doesn't change, so the key won't change. There doesn't appear
to be a reason to re-get the key after a get_user call, but not after a
futex_handle_fault. This patch moves both jump points to right after the the
get_futex_key call. Also fix a missed put_futex_key() call and update the
comment to accurately depict the current code (we don't hold the mm sem now).

Signed-off-by: Darren Hart <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rusty Russell <[email protected]>
---

kernel/futex.c | 17 ++++++++---------
1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index c15c029..cd03229 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1351,13 +1351,13 @@ static int futex_lock_pi(u32 __user *uaddr, int fshared,
}

q.pi_state = NULL;
-retry:
+
q.key = FUTEX_KEY_INIT;
ret = get_futex_key(uaddr, fshared, &q.key);
if (unlikely(ret != 0))
goto out;

-retry_unlocked:
+retry:
hb = queue_lock(&q);

retry_locked:
@@ -1566,19 +1566,18 @@ out:

uaddr_faulted:
/*
- * We have to r/w *(int __user *)uaddr, and we have to modify it
- * atomically. Therefore, if we continue to fault after get_user()
- * below, we need to handle the fault ourselves, while still holding
- * the mmap_sem. This can occur if the uaddr is under contention as
- * we have to drop the mmap_sem in order to call get_user().
+ * We need to read and write *(int __user *)uaddr2 atomically.
+ * Therefore, if get_user below is not enough, we need to
+ * handle the fault ourselves.
*/
queue_unlock(&q, hb);
+ put_futex_key(fshared, &q.key);

if (attempt++) {
ret = futex_handle_fault((unsigned long)uaddr, attempt);
if (ret)
- goto out_put_key;
- goto retry_unlocked;
+ goto out;
+ goto retry;
}

ret = get_user(uval, uaddr);

2009-01-09 07:53:37

by Darren Hart

[permalink] [raw]

Subject: [PATCH 1/2] RFC: Fix futex_wake_op fault handling (NOT FOR INCLUSION)

As the the uaddr doesn't change after attempts to handle the fault, there is no
need to re-get the futex keys after get_user(). This patch makes successful
calls to futex_handle_fault() and get_user() start the retry from the same
point (right after the get_futex_key calls). Also simplify the logic and
corrects missing put on the futex keys. Finally, update the comment to more
accurate reflect the current code (we no hold the mm sem here).

Signed-off-by: Darren Hart <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rusty Russell <[email protected]>
---

kernel/futex.c | 23 ++++++++---------------
1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 206d4c9..c15c029 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -745,7 +745,6 @@ futex_wake_op(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
struct futex_q *this, *next;
int ret, op_ret, attempt = 0;

-retryfull:
ret = get_futex_key(uaddr1, fshared, &key1);
if (unlikely(ret != 0))
goto out;
@@ -782,25 +781,19 @@ retry:
}

/*
- * futex_atomic_op_inuser needs to both read and write
- * *(int __user *)uaddr2, but we can't modify it
- * non-atomically. Therefore, if get_user below is not
- * enough, we need to handle the fault ourselves, while
- * still holding the mmap_sem.
+ * We need to read and write *(int __user *)uaddr2 atomically.
+ * Therefore, if get_user below is not enough, we need to
+ * handle the fault ourselves.
*/
- if (attempt++) {
+ if (attempt++)
ret = futex_handle_fault((unsigned long)uaddr2,
attempt);
- if (ret)
- goto out_put_keys;
- goto retry;
- }
+ else
+ ret = get_user(dummy, uaddr2);

- ret = get_user(dummy, uaddr2);
if (ret)
- return ret;
-
- goto retryfull;
+ goto out_put_keys;
+ goto retry;
}

head = &hb1->chain;

2009-01-09 22:03:12

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH] RFC: futex fault handling and futex key references (NOT FOR INCLUSION)

On Thu, 2009-01-08 at 23:52 -0800, Darren Hart wrote:
> While trying to bend my brain around the various layers of fault handling in
> futex.c, I think I may have uncovered some logical errors (or at least stale
> code sections). I've attached two patches that address the alleged problems
> against linux-tip/core/futexes. They are based on the following assumption:
>
> Since the uaddr passed to a futex function isn't updated within the function,
> and the mm doesn't change while we're in there,

That's not quite true, you _can_ change the memory map by issuing
concurrent mmap/munmap/mremap etc.. calls from another thread.

The thing is, afaik futexes have never been completely safe wrt
concurrent mm modifications -- that is, as long as we fail the futex op
with -EFAULT or similar and not crash the kernel we're good.

> there should never be a need to
> make repeat calls to futex_get_key(). Even if the queue_lock is dropped, the
> futex_q might lose it's waiter (requeued) but the key stays the same.

Yes, so when we assume the mmap stable (and fail the op whenever that
assumption proves false) we can say the futex keys are stable and should
never need recomputation.

2009-01-10 05:54:19

by Darren Hart

[permalink] [raw]

Subject: Re: [PATCH] RFC: futex fault handling and futex key references (NOT FOR INCLUSION)

Peter Zijlstra wrote:
> On Thu, 2009-01-08 at 23:52 -0800, Darren Hart wrote:
>> While trying to bend my brain around the various layers of fault handling in
>> futex.c, I think I may have uncovered some logical errors (or at least stale
>> code sections). I've attached two patches that address the alleged problems
>> against linux-tip/core/futexes. They are based on the following assumption:
>>
>> Since the uaddr passed to a futex function isn't updated within the function,
>> and the mm doesn't change while we're in there,
>
> That's not quite true, you _can_ change the memory map by issuing
> concurrent mmap/munmap/mremap etc.. calls from another thread.

Well, what I meant was that the pointer "current->mm" doesn't change,
since that is what we store in the futex key union.

> The thing is, afaik futexes have never been completely safe wrt
> concurrent mm modifications -- that is, as long as we fail the futex op
> with -EFAULT or similar and not crash the kernel we're good.
>
>> there should never be a need to
>> make repeat calls to futex_get_key(). Even if the queue_lock is dropped, the
>> futex_q might lose it's waiter (requeued) but the key stays the same.
>
> Yes, so when we assume the mmap stable (and fail the op whenever that
> assumption proves false) we can say the futex keys are stable and should
> never need recomputation.

And if I understand correctly, we would catch this scenario any time we
try to use uaddr and find it faulting (during the cmpxchg* calls for
example). If the mmap changes too much, we'll exhaust our fault
tolerance (3) and exit with -EFAULT back to userspace.

Sound right?

Thanks for the review Peter,

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team