Changelog for 'v2':
Complete commit messages with needed git commit ids as Greg and Lee suggested.
Lee sent a patchset to update Futex for v4.9, see https://www.spinics.net/lists/stable/msg443081.html,
Then Xiaoming sent a follow-up patch for it, see https://lore.kernel.org/lkml/20210225093120.GD641347@dell/.
These 3 patches is directly picked from v4.9,
and they may also resolve following issues in 4.4.260 which have been reported in v4.9,
see https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/?h=linux-4.4.y&id=319f66f08de1083c1fe271261665c209009dd65a
> /*
> * The task is on the way out. When the futex state is
> * FUTEX_STATE_DEAD, we know that the task has finished
> * the cleanup:
> */
> int ret = (p->futex_state = FUTEX_STATE_DEAD) ? -ESRCH : -EAGAIN;
Here may be:
int ret = (p->futex_state == FUTEX_STATE_DEAD) ? -ESRCH : -EAGAIN;
> raw_spin_unlock_irq(&p->pi_lock);
> /*
> * If the owner task is between FUTEX_STATE_EXITING and
> * FUTEX_STATE_DEAD then store the task pointer and keep
> * the reference on the task struct. The calling code will
> * drop all locks, wait for the task to reach
> * FUTEX_STATE_DEAD and then drop the refcount. This is
> * required to prevent a live lock when the current task
> * preempted the exiting task between the two states.
> */
> if (ret == -EBUSY)
And here, the variable "ret" may only be "-ESRCH" or "-EAGAIN", but not "-EBUSY".
> *exiting = p;
> else
> put_task_struct(p);
Since 074e7d515783 ("futex: Ensure the correct return value from futex_lock_pi()") has
been merged in 4.4.260, I send the remain 3 patches.
Peter Zijlstra (1):
futex: Change locking rules
Thomas Gleixner (2):
futex: Cure exit race
futex: fix dead code in attach_to_pi_owner()
kernel/futex.c | 209 +++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 177 insertions(+), 32 deletions(-)
--
2.25.4
From: Thomas Gleixner <[email protected]>
commit da791a667536bf8322042e38ca85d55a78d3c273 upstream.
This patch comes directly from an origin patch (commit
9c3f3986036760c48a92f04b36774aa9f63673f80) in v4.9.
Stefan reported, that the glibc tst-robustpi4 test case fails
occasionally. That case creates the following race between
sys_exit() and sys_futex_lock_pi():
CPU0 CPU1
sys_exit() sys_futex()
do_exit() futex_lock_pi()
exit_signals(tsk) No waiters:
tsk->flags |= PF_EXITING; *uaddr == 0x00000PID
mm_release(tsk) Set waiter bit
exit_robust_list(tsk) { *uaddr = 0x80000PID;
Set owner died attach_to_pi_owner() {
*uaddr = 0xC0000000; tsk = get_task(PID);
} if (!tsk->flags & PF_EXITING) {
... attach();
tsk->flags |= PF_EXITPIDONE; } else {
if (!(tsk->flags & PF_EXITPIDONE))
return -EAGAIN;
return -ESRCH; <--- FAIL
}
ESRCH is returned all the way to user space, which triggers the glibc test
case assert. Returning ESRCH unconditionally is wrong here because the user
space value has been changed by the exiting task to 0xC0000000, i.e. the
FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
is a valid state and the kernel has to handle it, i.e. taking the futex.
Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
is set in the task which 'owns' the futex. If the value has changed, let
the kernel retry the operation, which includes all regular sanity checks
and correctly handles the FUTEX_OWNER_DIED case.
If it hasn't changed, then return ESRCH as there is no way to distinguish
this case from malfunctioning user space. This happens when the exiting
task did not have a robust list, the robust list was corrupted or the user
space value in the futex was simply bogus.
Reported-by: Stefan Liebler <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[Lee: Required to satisfy functional dependency from futex back-port.
Re-add the missing handle_exit_race() parts from:
3d4775df0a89 ("futex: Replace PF_EXITPIDONE with a state")]
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Zheng Yejian <[email protected]>
---
kernel/futex.c | 71 +++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 65 insertions(+), 6 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index b410752f5ad1..116766ef7de6 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1196,11 +1196,67 @@ static void wait_for_owner_exiting(int ret, struct task_struct *exiting)
put_task_struct(exiting);
}
+static int handle_exit_race(u32 __user *uaddr, u32 uval,
+ struct task_struct *tsk)
+{
+ u32 uval2;
+
+ /*
+ * If the futex exit state is not yet FUTEX_STATE_DEAD, wait
+ * for it to finish.
+ */
+ if (tsk && tsk->futex_state != FUTEX_STATE_DEAD)
+ return -EAGAIN;
+
+ /*
+ * Reread the user space value to handle the following situation:
+ *
+ * CPU0 CPU1
+ *
+ * sys_exit() sys_futex()
+ * do_exit() futex_lock_pi()
+ * futex_lock_pi_atomic()
+ * exit_signals(tsk) No waiters:
+ * tsk->flags |= PF_EXITING; *uaddr == 0x00000PID
+ * mm_release(tsk) Set waiter bit
+ * exit_robust_list(tsk) { *uaddr = 0x80000PID;
+ * Set owner died attach_to_pi_owner() {
+ * *uaddr = 0xC0000000; tsk = get_task(PID);
+ * } if (!tsk->flags & PF_EXITING) {
+ * ... attach();
+ * tsk->futex_state = } else {
+ * FUTEX_STATE_DEAD; if (tsk->futex_state !=
+ * FUTEX_STATE_DEAD)
+ * return -EAGAIN;
+ * return -ESRCH; <--- FAIL
+ * }
+ *
+ * Returning ESRCH unconditionally is wrong here because the
+ * user space value has been changed by the exiting task.
+ *
+ * The same logic applies to the case where the exiting task is
+ * already gone.
+ */
+ if (get_futex_value_locked(&uval2, uaddr))
+ return -EFAULT;
+
+ /* If the user space value has changed, try again. */
+ if (uval2 != uval)
+ return -EAGAIN;
+
+ /*
+ * The exiting task did not have a robust list, the robust list was
+ * corrupted or the user space value in *uaddr is simply bogus.
+ * Give up and tell user space.
+ */
+ return -ESRCH;
+}
+
/*
* Lookup the task for the TID provided from user space and attach to
* it after doing proper sanity checks.
*/
-static int attach_to_pi_owner(u32 uval, union futex_key *key,
+static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
struct futex_pi_state **ps,
struct task_struct **exiting)
{
@@ -1211,12 +1267,15 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
/*
* We are the first waiter - try to look up the real owner and attach
* the new pi_state to it, but bail out when TID = 0 [1]
+ *
+ * The !pid check is paranoid. None of the call sites should end up
+ * with pid == 0, but better safe than sorry. Let the caller retry
*/
if (!pid)
- return -ESRCH;
+ return -EAGAIN;
p = futex_find_get_task(pid);
if (!p)
- return -ESRCH;
+ return handle_exit_race(uaddr, uval, NULL);
if (unlikely(p->flags & PF_KTHREAD)) {
put_task_struct(p);
@@ -1235,7 +1294,7 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
* FUTEX_STATE_DEAD, we know that the task has finished
* the cleanup:
*/
- int ret = (p->futex_state = FUTEX_STATE_DEAD) ? -ESRCH : -EAGAIN;
+ int ret = handle_exit_race(uaddr, uval, p);
raw_spin_unlock_irq(&p->pi_lock);
/*
@@ -1301,7 +1360,7 @@ static int lookup_pi_state(u32 __user *uaddr, u32 uval,
* We are the first waiter - try to look up the owner based on
* @uval and attach to it.
*/
- return attach_to_pi_owner(uval, key, ps, exiting);
+ return attach_to_pi_owner(uaddr, uval, key, ps, exiting);
}
static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
@@ -1417,7 +1476,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
* attach to the owner. If that fails, no harm done, we only
* set the FUTEX_WAITERS bit in the user space variable.
*/
- return attach_to_pi_owner(uval, key, ps, exiting);
+ return attach_to_pi_owner(uaddr, newval, key, ps, exiting);
}
/**
--
2.25.4
On Thu, Mar 11, 2021 at 11:25:57AM +0800, Zheng Yejian wrote:
> Changelog for 'v2':
> Complete commit messages with needed git commit ids as Greg and Lee suggested.
>
> Lee sent a patchset to update Futex for v4.9, see https://www.spinics.net/lists/stable/msg443081.html,
> Then Xiaoming sent a follow-up patch for it, see https://lore.kernel.org/lkml/20210225093120.GD641347@dell/.
>
> These 3 patches is directly picked from v4.9,
> and they may also resolve following issues in 4.4.260 which have been reported in v4.9,
> see https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/?h=linux-4.4.y&id=319f66f08de1083c1fe271261665c209009dd65a
> > /*
> > * The task is on the way out. When the futex state is
> > * FUTEX_STATE_DEAD, we know that the task has finished
> > * the cleanup:
> > */
> > int ret = (p->futex_state = FUTEX_STATE_DEAD) ? -ESRCH : -EAGAIN;
>
> Here may be:
> int ret = (p->futex_state == FUTEX_STATE_DEAD) ? -ESRCH : -EAGAIN;
>
> > raw_spin_unlock_irq(&p->pi_lock);
> > /*
> > * If the owner task is between FUTEX_STATE_EXITING and
> > * FUTEX_STATE_DEAD then store the task pointer and keep
> > * the reference on the task struct. The calling code will
> > * drop all locks, wait for the task to reach
> > * FUTEX_STATE_DEAD and then drop the refcount. This is
> > * required to prevent a live lock when the current task
> > * preempted the exiting task between the two states.
> > */
> > if (ret == -EBUSY)
>
> And here, the variable "ret" may only be "-ESRCH" or "-EAGAIN", but not "-EBUSY".
>
> > *exiting = p;
> > else
> > put_task_struct(p);
>
> Since 074e7d515783 ("futex: Ensure the correct return value from futex_lock_pi()") has
> been merged in 4.4.260, I send the remain 3 patches.
>
> Peter Zijlstra (1):
> futex: Change locking rules
>
> Thomas Gleixner (2):
> futex: Cure exit race
> futex: fix dead code in attach_to_pi_owner()
>
> kernel/futex.c | 209 +++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 177 insertions(+), 32 deletions(-)
All now queued up, thanks.
greg k-h