Date: Mon, 10 Dec 2018 17:02:05 +0100
From: Peter Zijlstra
To: Thomas Gleixner
Cc: LKML, Stefan Liebler, Heiko Carstens, Darren Hart, Ingo Molnar
Subject: Re: [patch] futex: Cure exit race
Message-ID: <20181210160205.GQ5289@hirez.programming.kicks-ass.net>
References: <20181210152311.986181245@linutronix.de>
In-Reply-To: <20181210152311.986181245@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Dec 10, 2018 at 04:23:06PM +0100, Thomas Gleixner wrote:
>  kernel/futex.c |   57 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 53 insertions(+), 4 deletions(-)
>
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1148,11 +1148,60 @@ static int attach_to_pi_state(u32 __user
>  	return ret;
>  }
>
> +static int handle_exit_race(u32 __user *uaddr, u32 uval, struct task_struct *tsk)
> +{
> +	u32 uval2;
> +
> +	/*
> +	 * If PF_EXITPIDONE is not yet set try again.
> +	 */
> +	if (!(tsk->flags & PF_EXITPIDONE))
> +		return -EAGAIN;
> +
> +	/*
> +	 * Reread the user space value to handle the following situation:
> +	 *
> +	 * CPU0                              CPU1
> +	 *
> +	 * sys_exit()                        sys_futex()
> +	 *  do_exit()                         futex_lock_pi()
> +	 *   exit_signals(tsk)                 No waiters:
> +	 *    tsk->flags |= PF_EXITING;          *uaddr == 0x00000PID
> +	 *   mm_release(tsk)                   Set waiter bit
> +	 *    exit_robust_list(tsk) {            *uaddr = 0x80000PID;

Just to clarify, this is: sys_futex() <- futex_lock_pi() <- futex_lock_pi_atomic(), where we do:

	lock_pi_update_atomic(); // changes the futex word
	attach_to_pi_owner(); // possibly returns ESRCH after changing the word

> +	 *       Set owner died                attach_to_pi_owner() {
> +	 *       *uaddr = 0xC0000000;           tsk = get_task(PID);
> +	 *    }                                 if (!tsk->flags & PF_EXITING) {
> +	 *   ...                                  attach();
> +	 *   tsk->flags |= PF_EXITPIDONE;       } else {
> +	 *                                        if (!(tsk->flags & PF_EXITPIDONE))
> +	 *                                          return -EAGAIN;
> +	 *                                        return -ESRCH; <--- FAIL
> +	 *                                      }
> +	 *
> +	 * Returning ESRCH unconditionally is wrong here because the
> +	 * user space value has been changed by the exiting task.
> +	 */
> +	if (get_futex_value_locked(&uval2, uaddr))
> +		return -EFAULT;
> +
> +	/* If the user space value has changed, try again. */
> +	if (uval2 != uval)
> +		return -EAGAIN;

And this then goes back to futex_lock_pi(), which does a retry loop.

> +
> +	/*
> +	 * The exiting task did not have a robust list, the robust list was
> +	 * corrupted or the user space value in *uaddr is simply bogus.
> +	 * Give up and tell user space.
> +	 */
> +	return -ESRCH;

If it is unchanged, -ESRCH is a valid return value.

> +}

There is another caller of futex_lock_pi_atomic(), futex_proxy_trylock_atomic(), which is part of futex_requeue(); that too does a retry loop on -EAGAIN.

And there is another caller of attach_to_pi_owner(): lookup_pi_state(), and that too is in futex_requeue() and handles the retry case properly.

Yes, this all looks good.

Acked-by: Peter Zijlstra (Intel)
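For readers following the race, here is a minimal userspace sketch of the decision order in handle_exit_race() and the attach step from the diagram. It is not kernel code: struct model_task, the MODEL_PF_* bits, model_handle_exit_race() and model_attach_to_owner() are hypothetical names, the PF_* flag values are made up, and the re-read of the futex word is passed in as a plain value instead of calling get_futex_value_locked() (so the -EFAULT path is not modelled). Only the FUTEX_WAITERS / FUTEX_OWNER_DIED bit values are taken from <linux/futex.h>. The flag test is written as !(flags & BIT), which is the intended reading of the diagram's pseudocode.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same bit values as FUTEX_WAITERS / FUTEX_OWNER_DIED in <linux/futex.h>. */
#define MODEL_FUTEX_WAITERS    0x80000000u
#define MODEL_FUTEX_OWNER_DIED 0x40000000u

/* Hypothetical flag bits; the kernel's real PF_* values are not used. */
#define MODEL_PF_EXITING    0x1u
#define MODEL_PF_EXITPIDONE 0x2u

/* Hypothetical stand-in for the owner's task_struct. */
struct model_task {
	unsigned int flags;
};

/*
 * Decision order of handle_exit_race() from the quoted patch.
 * 'reread' stands in for the value get_futex_value_locked() would
 * return; 'uval' is the futex word the locker based its decision on.
 */
static int model_handle_exit_race(uint32_t reread, uint32_t uval,
				  const struct model_task *tsk)
{
	/* Only reached for an owner that is already exiting. */
	assert(tsk->flags & MODEL_PF_EXITING);

	/* Owner has not finished exiting yet: caller should retry. */
	if (!(tsk->flags & MODEL_PF_EXITPIDONE))
		return -EAGAIN;

	/* Exiting owner rewrote the word (robust list cleanup): retry. */
	if (reread != uval)
		return -EAGAIN;

	/* Word unchanged: the owner is gone without cleanup, report it. */
	return -ESRCH;
}

/*
 * Attach step from the diagram: a live owner is attached to (modelled
 * as returning 0), an exiting owner goes through the exit race check.
 */
static int model_attach_to_owner(uint32_t reread, uint32_t uval,
				 const struct model_task *tsk)
{
	if (!(tsk->flags & MODEL_PF_EXITING))
		return 0;	/* stands in for attach() */
	return model_handle_exit_race(reread, uval, tsk);
}
```

With the values from the diagram, the locker last saw 0x80000PID. Once the exiting task has stored 0xC0000000 (waiters plus owner died) and set PF_EXITPIDONE, the re-read differs from what the locker saw, so the result is -EAGAIN and the caller retries instead of failing with -ESRCH. The retry loops themselves (futex_lock_pi(), futex_requeue()) are not modelled here.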