Message-Id: <20181210152311.986181245@linutronix.de>
User-Agent: quilt/0.65
Date: Mon, 10 Dec 2018 16:23:06 +0100
From: Thomas Gleixner
To: LKML
Cc: Stefan Liebler, Heiko Carstens, Peter Zijlstra, Darren Hart, Ingo Molnar
Subject: [patch] futex: Cure exit race
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Stefan reported that the glibc tst-robustpi4 test case fails
occasionally. That case creates the following race between sys_exit() and
sys_futex(LOCK_PI):

 CPU0                                CPU1

 sys_exit()                          sys_futex()
  do_exit()                           futex_lock_pi()
   exit_signals(tsk)                   No waiters:
    tsk->flags |= PF_EXITING;          *uaddr == 0x00000PID
   mm_release(tsk)                     Set waiter bit
    exit_robust_list(tsk) {            *uaddr = 0x80000PID;
      Set owner died                   attach_to_pi_owner() {
     *uaddr = 0xC0000000;               tsk = get_task(PID);
    }                                   if (!tsk->flags & PF_EXITING) {
   ...                                    attach();
   tsk->flags |= PF_EXITPIDONE;         } else {
                                          if (!(tsk->flags & PF_EXITPIDONE))
                                            return -EAGAIN;
                                          return -ESRCH; <--- FAIL
                                        }

ESRCH is returned all the way to user space, which triggers the glibc test
case assert. Returning ESRCH unconditionally is wrong here because the user
space value has been changed by the exiting task to 0xC0000000, i.e. the
FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
is a valid state and the kernel has to handle it, i.e. by taking the futex.

Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
are set in the task which owns the futex. If the value has changed, let
the kernel retry the operation, which includes all regular sanity checks and
correctly handles the FUTEX_OWNER_DIED case.

If it hasn't changed, then return ESRCH as there is no way to distinguish
this case from malfunctioning user space. This happens when the exiting
task did not have a robust list, the robust list was corrupted or the user
space value in the futex was simply bogus.

Reported-by: Stefan Liebler
Signed-off-by: Thomas Gleixner
Cc: Heiko Carstens
Cc: Peter Zijlstra
Cc: Darren Hart
Cc: Ingo Molnar
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
---
 kernel/futex.c |   57 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 53 insertions(+), 4 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1148,11 +1148,60 @@ static int attach_to_pi_state(u32 __user
 	return ret;
 }
 
+static int handle_exit_race(u32 __user *uaddr, u32 uval, struct task_struct *tsk)
+{
+	u32 uval2;
+
+	/*
+	 * If PF_EXITPIDONE is not yet set try again.
+	 */
+	if (!(tsk->flags & PF_EXITPIDONE))
+		return -EAGAIN;
+
+	/*
+	 * Reread the user space value to handle the following situation:
+	 *
+	 * CPU0                            CPU1
+	 *
+	 * sys_exit()                      sys_futex()
+	 *  do_exit()                       futex_lock_pi()
+	 *   exit_signals(tsk)               No waiters:
+	 *    tsk->flags |= PF_EXITING;      *uaddr == 0x00000PID
+	 *   mm_release(tsk)                 Set waiter bit
+	 *    exit_robust_list(tsk) {        *uaddr = 0x80000PID;
+	 *      Set owner died               attach_to_pi_owner() {
+	 *     *uaddr = 0xC0000000;           tsk = get_task(PID);
+	 *    }                               if (!tsk->flags & PF_EXITING) {
+	 *   ...                                attach();
+	 *   tsk->flags |= PF_EXITPIDONE;     } else {
+	 *                                      if (!(tsk->flags & PF_EXITPIDONE))
+	 *                                        return -EAGAIN;
+	 *                                      return -ESRCH; <--- FAIL
+	 *                                    }
+	 *
+	 * Returning ESRCH unconditionally is wrong here because the
+	 * user space value has been changed by the exiting task.
+	 */
+	if (get_futex_value_locked(&uval2, uaddr))
+		return -EFAULT;
+
+	/* If the user space value has changed, try again. */
+	if (uval2 != uval)
+		return -EAGAIN;
+
+	/*
+	 * The exiting task did not have a robust list, the robust list was
+	 * corrupted or the user space value in *uaddr is simply bogus.
+	 * Give up and tell user space.
+	 */
+	return -ESRCH;
+}
+
 /*
  * Lookup the task for the TID provided from user space and attach to
  * it after doing proper sanity checks.
  */
-static int attach_to_pi_owner(u32 uval, union futex_key *key,
+static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
 			      struct futex_pi_state **ps)
 {
 	pid_t pid = uval & FUTEX_TID_MASK;
@@ -1187,7 +1236,7 @@ static int attach_to_pi_owner(u32 uval,
 		 * set, we know that the task has finished the
 		 * cleanup:
 		 */
-		int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
+		int ret = handle_exit_race(uaddr, uval, p);
 
 		raw_spin_unlock_irq(&p->pi_lock);
 		put_task_struct(p);
@@ -1244,7 +1293,7 @@ static int lookup_pi_state(u32 __user *u
 	 * We are the first waiter - try to look up the owner based on
 	 * @uval and attach to it.
 	 */
-	return attach_to_pi_owner(uval, key, ps);
+	return attach_to_pi_owner(uaddr, uval, key, ps);
 }
 
 static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
@@ -1352,7 +1401,7 @@ static int futex_lock_pi_atomic(u32 __us
 	 * attach to the owner. If that fails, no harm done, we only
 	 * set the FUTEX_WAITERS bit in the user space variable.
 	 */
-	return attach_to_pi_owner(uval, key, ps);
+	return attach_to_pi_owner(uaddr, uval, key, ps);
 }
 
 /**
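
For readers who want to poke at the decision table outside the kernel, the following is a minimal user-space sketch of the three outcomes that handle_exit_race() distinguishes. It is not the kernel code: the MODEL_* flag values, model_exit_race() and model_demo() are hypothetical names made up for illustration, the re-read of *uaddr is passed in as a plain argument (uval2), and the -EFAULT path of get_futex_value_locked() is left out. The FUTEX_WAITERS and FUTEX_OWNER_DIED bit values are the ones from the futex UAPI header; the TID 0x1234 is an arbitrary example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the task flags; values are illustrative only. */
#define MODEL_PF_EXITING	0x1u
#define MODEL_PF_EXITPIDONE	0x2u

/* Futex word bits as defined in the futex UAPI header. */
#define MODEL_FUTEX_WAITERS	0x80000000u
#define MODEL_FUTEX_OWNER_DIED	0x40000000u

/*
 * Model of the decision taken for an owner that is already exiting.
 *
 * flags: flags of the task named as owner by uval
 * uval:  futex value read before the owner lookup
 * uval2: futex value re-read after the owner was found exiting
 */
static int model_exit_race(unsigned int flags, uint32_t uval, uint32_t uval2)
{
	/* Exit cleanup not finished yet: retry. */
	if (!(flags & MODEL_PF_EXITPIDONE))
		return -EAGAIN;

	/* The exiting task changed the futex word: retry with the new value. */
	if (uval2 != uval)
		return -EAGAIN;

	/* Nothing changed: indistinguishable from broken user space. */
	return -ESRCH;
}

/* Walks the three cases with an example TID of 0x1234. */
void model_demo(void)
{
	const unsigned int exiting = MODEL_PF_EXITING;
	const unsigned int done = MODEL_PF_EXITING | MODEL_PF_EXITPIDONE;
	const uint32_t held = MODEL_FUTEX_WAITERS | 0x1234u;
	const uint32_t died = MODEL_FUTEX_WAITERS | MODEL_FUTEX_OWNER_DIED;

	assert(model_exit_race(exiting, held, held) == -EAGAIN);
	assert(model_exit_race(done, held, died) == -EAGAIN);
	assert(model_exit_race(done, held, held) == -ESRCH);
}
```

The middle case is the one from the race diagram: the exiting task has rewritten the word to 0xC0000000, so the caller retries and the regular FUTEX_OWNER_DIED handling takes over instead of ESRCH reaching user space.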