Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000)
	id 2337620101B8C; Tue, 29 Jan 2019 11:35:57 +0100 (CET)
Date: Tue, 29 Jan 2019 11:35:57 +0100
From: Peter Zijlstra
To: Heiko Carstens
Cc: Thomas Gleixner, Ingo Molnar, Martin Schwidefsky, LKML,
	linux-s390@vger.kernel.org, Stefan Liebler, Sebastian Sewior
Subject: Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggered
Message-ID: <20190129103557.GF28485@hirez.programming.kicks-ass.net>
References: <20181127081115.GB3625@osiris>
	<20181129112321.GB3449@osiris>
	<20190128134410.GA28485@hirez.programming.kicks-ass.net>
	<20190128135804.GB28878@hirez.programming.kicks-ass.net>
	<20190129090108.GA26906@osiris>
	<20190129102409.GB26906@osiris>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190129102409.GB26906@osiris>
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 29, 2019 at 11:24:09AM +0100, Heiko Carstens wrote:

> Yes, sure. However ;) I reproduced the above with v5.0-rc4 + your
> patch. And now I am trying to reproduce with linux-next 20190129 +
> your patch and it doesn't trigger. Did I miss a patch which is only in
> linux-next which could fix this?
>

I'm forever confused on what patch is where; but -ESRCH makes me think
maybe you lost this one:

---

commit da791a667536bf8322042e38ca85d55a78d3c273
Author: Thomas Gleixner
Date:   Mon Dec 10 14:35:14 2018 +0100

    futex: Cure exit race

    Stefan reported, that the glibc tst-robustpi4 test case fails occasionally.
    That case creates the following race between sys_exit() and
    sys_futex_lock_pi():

     CPU0                               CPU1

     sys_exit()                         sys_futex()
      do_exit()                          futex_lock_pi()
       exit_signals(tsk)                  No waiters:
        tsk->flags |= PF_EXITING;         *uaddr == 0x00000PID
      mm_release(tsk)                     Set waiter bit
       exit_robust_list(tsk) {            *uaddr = 0x80000PID;
          Set owner died                  attach_to_pi_owner() {
        *uaddr = 0xC0000000;               tsk = get_task(PID);
       }                                   if (!tsk->flags & PF_EXITING) {
      ...                                    attach();
      tsk->flags |= PF_EXITPIDONE;         } else {
                                             if (!(tsk->flags & PF_EXITPIDONE))
                                               return -EAGAIN;
                                             return -ESRCH; <--- FAIL
                                           }

    ESRCH is returned all the way to user space, which triggers the glibc test
    case assert. Returning ESRCH unconditionally is wrong here because the user
    space value has been changed by the exiting task to 0xC0000000, i.e. the
    FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
    is a valid state and the kernel has to handle it, i.e. taking the futex.

    Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
    is set in the task which 'owns' the futex. If the value has changed, let
    the kernel retry the operation, which includes all regular sanity checks
    and correctly handles the FUTEX_OWNER_DIED case.

    If it hasn't changed, then return ESRCH as there is no way to distinguish
    this case from malfunctioning user space. This happens when the exiting
    task did not have a robust list, the robust list was corrupted or the user
    space value in the futex was simply bogus.

    Reported-by: Stefan Liebler
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra
    Cc: Heiko Carstens
    Cc: Darren Hart
    Cc: Ingo Molnar
    Cc: Sasha Levin
    Cc: stable@vger.kernel.org
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
    Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de

diff --git a/kernel/futex.c b/kernel/futex.c
index f423f9b6577e..5cc8083a4c89 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1148,11 +1148,65 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
 	return ret;
 }
 
+static int handle_exit_race(u32 __user *uaddr, u32 uval,
+			    struct task_struct *tsk)
+{
+	u32 uval2;
+
+	/*
+	 * If PF_EXITPIDONE is not yet set, then try again.
+	 */
+	if (tsk && !(tsk->flags & PF_EXITPIDONE))
+		return -EAGAIN;
+
+	/*
+	 * Reread the user space value to handle the following situation:
+	 *
+	 * CPU0                            CPU1
+	 *
+	 * sys_exit()                      sys_futex()
+	 *  do_exit()                       futex_lock_pi()
+	 *                                   futex_lock_pi_atomic()
+	 *   exit_signals(tsk)                No waiters:
+	 *    tsk->flags |= PF_EXITING;       *uaddr == 0x00000PID
+	 *  mm_release(tsk)                   Set waiter bit
+	 *   exit_robust_list(tsk) {          *uaddr = 0x80000PID;
+	 *      Set owner died                attach_to_pi_owner() {
+	 *    *uaddr = 0xC0000000;             tsk = get_task(PID);
+	 *   }                                 if (!tsk->flags & PF_EXITING) {
+	 *  ...                                  attach();
+	 *  tsk->flags |= PF_EXITPIDONE;       } else {
+	 *                                       if (!(tsk->flags & PF_EXITPIDONE))
+	 *                                         return -EAGAIN;
+	 *                                       return -ESRCH; <--- FAIL
+	 *                                     }
+	 *
+	 * Returning ESRCH unconditionally is wrong here because the
+	 * user space value has been changed by the exiting task.
+	 *
+	 * The same logic applies to the case where the exiting task is
+	 * already gone.
+	 */
+	if (get_futex_value_locked(&uval2, uaddr))
+		return -EFAULT;
+
+	/* If the user space value has changed, try again. */
+	if (uval2 != uval)
+		return -EAGAIN;
+
+	/*
+	 * The exiting task did not have a robust list, the robust list was
+	 * corrupted or the user space value in *uaddr is simply bogus.
+	 * Give up and tell user space.
+	 */
+	return -ESRCH;
+}
+
 /*
  * Lookup the task for the TID provided from user space and attach to
  * it after doing proper sanity checks.
  */
-static int attach_to_pi_owner(u32 uval, union futex_key *key,
+static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
 			      struct futex_pi_state **ps)
 {
 	pid_t pid = uval & FUTEX_TID_MASK;
@@ -1162,12 +1216,15 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
 	/*
 	 * We are the first waiter - try to look up the real owner and attach
 	 * the new pi_state to it, but bail out when TID = 0 [1]
+	 *
+	 * The !pid check is paranoid. None of the call sites should end up
+	 * with pid == 0, but better safe than sorry. Let the caller retry
 	 */
 	if (!pid)
-		return -ESRCH;
+		return -EAGAIN;
 	p = find_get_task_by_vpid(pid);
 	if (!p)
-		return -ESRCH;
+		return handle_exit_race(uaddr, uval, NULL);
 
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		put_task_struct(p);
@@ -1187,7 +1244,7 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
 		 * set, we know that the task has finished the
 		 * cleanup:
 		 */
-		int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
+		int ret = handle_exit_race(uaddr, uval, p);
 
 		raw_spin_unlock_irq(&p->pi_lock);
 		put_task_struct(p);
@@ -1244,7 +1301,7 @@ static int lookup_pi_state(u32 __user *uaddr, u32 uval,
 	 * We are the first waiter - try to look up the owner based on
 	 * @uval and attach to it.
 	 */
-	return attach_to_pi_owner(uval, key, ps);
+	return attach_to_pi_owner(uaddr, uval, key, ps);
 }
 
 static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
@@ -1352,7 +1409,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
 	 * attach to the owner. If that fails, no harm done, we only
 	 * set the FUTEX_WAITERS bit in the user space variable.
 	 */
-	return attach_to_pi_owner(uval, key, ps);
+	return attach_to_pi_owner(uaddr, newval, key, ps);
 }
 
 /**
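
For anyone who wants to poke at the return-value logic outside the kernel, the
decision order in handle_exit_race() can be sketched as a plain userspace
function. This is an illustration only and not part of the patch: struct
exit_race_model, its fields and model_handle_exit_race() are hypothetical
names made up here, and the task flag and the result of re-reading *uaddr are
passed in as plain values rather than read from a task_struct or from user
memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the state handle_exit_race() inspects. */
struct exit_race_model {
	bool task_found;     /* false models find_get_task_by_vpid() returning NULL */
	bool exit_pi_done;   /* models PF_EXITPIDONE being set in tsk->flags */
	bool reread_faults;  /* models get_futex_value_locked() failing */
	uint32_t uval;       /* futex value the caller saw before the lookup */
	uint32_t uval_now;   /* futex value found when re-reading *uaddr */
};

/* Mirrors the order of checks in the patch; returns a negative errno. */
static int model_handle_exit_race(const struct exit_race_model *m)
{
	/* Owner still exists but has not finished its PI cleanup: retry. */
	if (m->task_found && !m->exit_pi_done)
		return -EAGAIN;

	/* The re-read of the user space value itself failed. */
	if (m->reread_faults)
		return -EFAULT;

	/* The exiting task changed the value (e.g. set owner died): retry. */
	if (m->uval_now != m->uval)
		return -EAGAIN;

	/* Nothing changed: indistinguishable from broken user space. */
	return -ESRCH;
}
```

Feeding it the state from the race diagram (owner found, PF_EXITPIDONE set,
value changed from 0x80000PID to 0xC0000000) gives -EAGAIN, so the caller
retries and runs into the FUTEX_OWNER_DIED handling. Only an unchanged value
ends in -ESRCH.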