Date: Tue, 29 Jan 2019 09:49:04 +0100
From: Peter Zijlstra
To: Thomas Gleixner
Cc: Heiko Carstens, Ingo Molnar, Martin Schwidefsky, LKML,
 linux-s390@vger.kernel.org, Stefan Liebler, Sebastian Sewior
Subject: Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggered
Message-ID: <20190129084904.GB28485@hirez.programming.kicks-ass.net>
References: <20181127081115.GB3625@osiris>
 <20181129112321.GB3449@osiris>
 <20190128134410.GA28485@hirez.programming.kicks-ass.net>
 <20190128135804.GB28878@hirez.programming.kicks-ass.net>
User-Agent: Mutt/1.10.1 (2018-07-13)

On Mon, Jan 28, 2019 at 04:53:19PM +0100, Thomas Gleixner wrote:
> Right after staring long enough at it, the commit simply forgot to give
> __rt_mutex_start_proxy_lock() the same treatment as it gave to
> rt_mutex_wait_proxy_lock().
>
> Patch below cures that.

Yes, that is a very nice solution. Find below an updated patch that
includes a few comments; I found it harder than it should have been to
reconstruct this code. (Also, I flipped the label names.)

---
 kernel/futex.c           | 28 ++++++++++++++++++----------
 kernel/locking/rtmutex.c | 37 ++++++++++++++++++++++++++++++++-----
 2 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index fdd312da0992..9d8411d3142d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2861,35 +2861,39 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags,
 	 * and BUG when futex_unlock_pi() interleaves with this.
 	 *
 	 * Therefore acquire wait_lock while holding hb->lock, but drop the
-	 * latter before calling rt_mutex_start_proxy_lock(). This still fully
-	 * serializes against futex_unlock_pi() as that does the exact same
-	 * lock handoff sequence.
+	 * latter before calling __rt_mutex_start_proxy_lock(). This
+	 * interleaves with futex_unlock_pi() -- which does a similar lock
+	 * handoff -- such that the latter can observe the futex_q::pi_state
+	 * before __rt_mutex_start_proxy_lock() is done.
 	 */
 	raw_spin_lock_irq(&q.pi_state->pi_mutex.wait_lock);
 	spin_unlock(q.lock_ptr);
+	/*
+	 * __rt_mutex_start_proxy_lock() unconditionally enqueues the @rt_waiter
+	 * such that futex_unlock_pi() is guaranteed to observe the waiter when
+	 * it sees the futex_q::pi_state.
+	 */
 	ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
 	raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock);
 
 	if (ret) {
 		if (ret == 1)
 			ret = 0;
-
-		spin_lock(q.lock_ptr);
-		goto no_block;
+		goto cleanup;
 	}
 
-
 	if (unlikely(to))
 		hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS);
 
 	ret = rt_mutex_wait_proxy_lock(&q.pi_state->pi_mutex, to, &rt_waiter);
 
+cleanup:
 	spin_lock(q.lock_ptr);
 	/*
-	 * If we failed to acquire the lock (signal/timeout), we must
+	 * If we failed to acquire the lock (deadlock/signal/timeout), we must
 	 * first acquire the hb->lock before removing the lock from the
-	 * rt_mutex waitqueue, such that we can keep the hb and rt_mutex
-	 * wait lists consistent.
+	 * rt_mutex waitqueue, such that we can keep the hb and rt_mutex wait
+	 * lists consistent.
 	 *
 	 * In particular; it is important that futex_unlock_pi() can not
 	 * observe this inconsistency.
@@ -3013,6 +3017,10 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 		 * there is no point where we hold neither; and therefore
 		 * wake_futex_pi() must observe a state consistent with what we
 		 * observed.
+		 *
+		 * In particular; this forces __rt_mutex_start_proxy() to
+		 * complete such that we're guaranteed to observe the
+		 * rt_waiter. Also see the WARN in wake_futex_pi().
 		 */
 		raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 		spin_unlock(&hb->lock);
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 581edcc63c26..afaf37d0ac15 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1726,12 +1726,33 @@ void rt_mutex_proxy_unlock(struct rt_mutex *lock,
 	rt_mutex_set_owner(lock, NULL);
 }
 
+/**
+ * __rt_mutex_start_proxy_lock() - Start lock acquisition for another task
+ * @lock:		the rt_mutex to take
+ * @waiter:		the pre-initialized rt_mutex_waiter
+ * @task:		the task to prepare
+ *
+ * Starts the rt_mutex acquire; it enqueues the @waiter and does deadlock
+ * detection. It does not wait, see rt_mutex_wait_proxy_lock() for that.
+ *
+ * NOTE: does _NOT_ remove the @waiter on failure; must either call
+ * rt_mutex_wait_proxy_lock() or rt_mutex_cleanup_proxy_lock() after this.
+ *
+ * Returns:
+ *  0 - task blocked on lock
+ *  1 - acquired the lock for task, caller should wake it up
+ * <0 - error
+ *
+ * Special API call for PI-futex support.
+ */
 int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 			      struct rt_mutex_waiter *waiter,
 			      struct task_struct *task)
 {
 	int ret;
 
+	lockdep_assert_held(&lock->wait_lock);
+
 	if (try_to_take_rt_mutex(lock, task, NULL))
 		return 1;
 
@@ -1749,9 +1770,6 @@ int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 		ret = 0;
 	}
 
-	if (unlikely(ret))
-		remove_waiter(lock, waiter);
-
 	debug_rt_mutex_print_deadlock(waiter);
 
 	return ret;
@@ -1763,12 +1781,18 @@ int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
  * @waiter:		the pre-initialized rt_mutex_waiter
  * @task:		the task to prepare
  *
+ * Starts the rt_mutex acquire; it enqueues the @waiter and does deadlock
+ * detection. It does not wait, see rt_mutex_wait_proxy_lock() for that.
+ *
+ * NOTE: unlike __rt_mutex_start_proxy_lock this _DOES_ remove the @waiter
+ * on failure.
+ *
  * Returns:
  *  0 - task blocked on lock
  *  1 - acquired the lock for task, caller should wake it up
  * <0 - error
  *
- * Special API call for FUTEX_REQUEUE_PI support.
+ * Special API call for PI-futex support.
  */
 int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 			      struct rt_mutex_waiter *waiter,
@@ -1778,6 +1802,8 @@ int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 
 	raw_spin_lock_irq(&lock->wait_lock);
 	ret = __rt_mutex_start_proxy_lock(lock, waiter, task);
+	if (unlikely(ret))
+		remove_waiter(lock, waiter);
 	raw_spin_unlock_irq(&lock->wait_lock);
 
 	return ret;
@@ -1845,7 +1871,8 @@ int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
  * @lock:		the rt_mutex we were woken on
  * @waiter:		the pre-initialized rt_mutex_waiter
  *
- * Attempt to clean up after a failed rt_mutex_wait_proxy_lock().
+ * Attempt to clean up after a failed __rt_mutex_start_proxy_lock() or
+ * rt_mutex_wait_proxy_lock().
  *
  * Unless we acquired the lock; we're still enqueued on the wait-list and can
  * in fact still be granted ownership until we're removed. Therefore we can