From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Thomas Gleixner, Ingo Molnar, "Peter Zijlstra (Intel)"
Subject: [PATCH 4.14 192/209] futex: Mark the begin of futex exit explicitly
Date: Wed, 4 Dec 2019 18:56:44 +0100
Message-Id: <20191204175336.705199503@linuxfoundation.org>
X-Mailer: git-send-email 2.24.0
In-Reply-To: <20191204175321.609072813@linuxfoundation.org>
References:
<20191204175321.609072813@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From: Thomas Gleixner

commit 18f694385c4fd77a09851fd301236746ca83f3cb upstream.

Instead of relying on PF_EXITING use an explicit state for the futex exit
and set it in the futex exit function. This moves the smp barrier and the
lock/unlock serialization into the futex code.

As with the DEAD state this is restricted to the exit path as exec
continues to use the same task struct.

This allows to simplify that logic in a next step.

Signed-off-by: Thomas Gleixner
Reviewed-by: Ingo Molnar
Acked-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20191106224556.539409004@linutronix.de
Signed-off-by: Greg Kroah-Hartman
---
 include/linux/futex.h |   31 +++----------------------------
 kernel/exit.c         |   13 +------------
 kernel/futex.c        |   37 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 40 insertions(+), 41 deletions(-)

--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -55,6 +55,7 @@ union futex_key {
 #ifdef CONFIG_FUTEX
 enum {
 	FUTEX_STATE_OK,
+	FUTEX_STATE_EXITING,
 	FUTEX_STATE_DEAD,
 };
 
@@ -69,33 +70,7 @@ static inline void futex_init_task(struc
 	tsk->futex_state = FUTEX_STATE_OK;
 }
 
-/**
- * futex_exit_done - Sets the tasks futex state to FUTEX_STATE_DEAD
- * @tsk: task to set the state on
- *
- * Set the futex exit state of the task lockless. The futex waiter code
- * observes that state when a task is exiting and loops until the task has
- * actually finished the futex cleanup. The worst case for this is that the
- * waiter runs through the wait loop until the state becomes visible.
- *
- * This has two callers:
- *
- * - futex_mm_release() after the futex exit cleanup has been done
- *
- * - do_exit() from the recursive fault handling path.
- *
- * In case of a recursive fault this is best effort. Either the futex exit
- * code has run already or not. If the OWNER_DIED bit has been set on the
- * futex then the waiter can take it over. If not, the problem is pushed
- * back to user space. If the futex exit code did not run yet, then an
- * already queued waiter might block forever, but there is nothing which
- * can be done about that.
- */
-static inline void futex_exit_done(struct task_struct *tsk)
-{
-	tsk->futex_state = FUTEX_STATE_DEAD;
-}
-
+void futex_exit_recursive(struct task_struct *tsk);
 void futex_exit_release(struct task_struct *tsk);
 void futex_exec_release(struct task_struct *tsk);
 
@@ -103,7 +78,7 @@ long do_futex(u32 __user *uaddr, int op,
 	      u32 __user *uaddr2, u32 val2, u32 val3);
 #else
 static inline void futex_init_task(struct task_struct *tsk) { }
-static inline void futex_exit_done(struct task_struct *tsk) { }
+static inline void futex_exit_recursive(struct task_struct *tsk) { }
 static inline void futex_exit_release(struct task_struct *tsk) { }
 static inline void futex_exec_release(struct task_struct *tsk) { }
 #endif
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -803,23 +803,12 @@ void __noreturn do_exit(long code)
 	 */
 	if (unlikely(tsk->flags & PF_EXITING)) {
 		pr_alert("Fixing recursive fault but reboot is needed!\n");
-		futex_exit_done(tsk);
+		futex_exit_recursive(tsk);
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		schedule();
 	}
 
 	exit_signals(tsk);  /* sets PF_EXITING */
-	/*
-	 * Ensure that all new tsk->pi_lock acquisitions must observe
-	 * PF_EXITING. Serializes against futex.c:attach_to_pi_owner().
-	 */
-	smp_mb();
-	/*
-	 * Ensure that we must observe the pi_state in exit_mm() ->
-	 * mm_release() -> exit_pi_state_list().
-	 */
-	raw_spin_lock_irq(&tsk->pi_lock);
-	raw_spin_unlock_irq(&tsk->pi_lock);
 
 	if (unlikely(in_atomic())) {
 		pr_info("note: %s[%d] exited with preempt_count %d\n",
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3702,10 +3702,45 @@ void futex_exec_release(struct task_stru
 	exit_pi_state_list(tsk);
 }
 
+/**
+ * futex_exit_recursive - Set the tasks futex state to FUTEX_STATE_DEAD
+ * @tsk: task to set the state on
+ *
+ * Set the futex exit state of the task lockless. The futex waiter code
+ * observes that state when a task is exiting and loops until the task has
+ * actually finished the futex cleanup. The worst case for this is that the
+ * waiter runs through the wait loop until the state becomes visible.
+ *
+ * This is called from the recursive fault handling path in do_exit().
+ *
+ * This is best effort. Either the futex exit code has run already or
+ * not. If the OWNER_DIED bit has been set on the futex then the waiter can
+ * take it over. If not, the problem is pushed back to user space. If the
+ * futex exit code did not run yet, then an already queued waiter might
+ * block forever, but there is nothing which can be done about that.
+ */
+void futex_exit_recursive(struct task_struct *tsk)
+{
+	tsk->futex_state = FUTEX_STATE_DEAD;
+}
+
 void futex_exit_release(struct task_struct *tsk)
 {
+	tsk->futex_state = FUTEX_STATE_EXITING;
+	/*
+	 * Ensure that all new tsk->pi_lock acquisitions must observe
+	 * FUTEX_STATE_EXITING. Serializes against attach_to_pi_owner().
+	 */
+	smp_mb();
+	/*
+	 * Ensure that we must observe the pi_state in exit_pi_state_list().
+	 */
+	raw_spin_lock_irq(&tsk->pi_lock);
+	raw_spin_unlock_irq(&tsk->pi_lock);
+
 	futex_exec_release(tsk);
-	futex_exit_done(tsk);
+
+	tsk->futex_state = FUTEX_STATE_DEAD;
 }
 
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,