From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, "Peter Zijlstra (Intel)",
    juri.lelli@arm.com, bigeasy@linutronix.de, xlpang@redhat.com,
    rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
    jdesfossez@efficios.com, dvhart@infradead.org, bristot@redhat.com,
    Thomas Gleixner, Zhen Lei, Joe Korty
Subject: [PATCH 4.4 05/11] futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
Date: Sun, 8 Aug 2021 09:22:40 +0200
Message-Id: <20210808072217.498729555@linuxfoundation.org>
In-Reply-To: <20210808072217.322468704@linuxfoundation.org>
References: <20210808072217.322468704@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

From: Peter Zijlstra

[ Upstream commit cfafcd117da0216520568c195cb2f6cd1980c4bb ]

By changing futex_lock_pi() to use rt_mutex_*_proxy_lock() all wait_list
modifications are done under both hb->lock and wait_lock.

This closes the obvious interleave pattern between futex_lock_pi() and
futex_unlock_pi(), but not entirely so. See below:

Before:

futex_lock_pi()                 futex_unlock_pi()
  unlock hb->lock

                                  lock hb->lock
                                  unlock hb->lock

                                  lock rt_mutex->wait_lock
                                  unlock rt_mutex->wait_lock
                                    -EAGAIN

  lock rt_mutex->wait_lock
  list_add
  unlock rt_mutex->wait_lock

  schedule()

  lock rt_mutex->wait_lock
  list_del
  unlock rt_mutex->wait_lock
                                    -EAGAIN
  lock hb->lock

After:

futex_lock_pi()                 futex_unlock_pi()

  lock hb->lock
  lock rt_mutex->wait_lock
  list_add
  unlock rt_mutex->wait_lock
  unlock hb->lock

  schedule()
  lock hb->lock
  unlock hb->lock
                                  lock hb->lock
                                  lock rt_mutex->wait_lock
                                  list_del
                                  unlock rt_mutex->wait_lock

                                  lock rt_mutex->wait_lock
                                  unlock rt_mutex->wait_lock
                                    -EAGAIN
                                  unlock hb->lock

It does, however, solve the earlier starvation/live-lock scenario
introduced with the -EAGAIN: unlike the before scenario, where the
-EAGAIN happens while futex_unlock_pi() holds no locks, in the after
scenario it happens while futex_unlock_pi() actually holds a lock, and
it is therefore serialized on that lock.
Signed-off-by: Peter Zijlstra (Intel)
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104152.062785528@infradead.org
Signed-off-by: Thomas Gleixner
Signed-off-by: Zhen Lei
Acked-by: Joe Korty
Signed-off-by: Greg Kroah-Hartman
---
 kernel/futex.c                  |   77 ++++++++++++++++++++++++++--------------
 kernel/locking/rtmutex.c        |   26 +++----------
 kernel/locking/rtmutex_common.h |    1 -
 3 files changed, 62 insertions(+), 42 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2284,20 +2284,7 @@ queue_unlock(struct futex_hash_bucket *h
 	hb_waiters_dec(hb);
 }
 
-/**
- * queue_me() - Enqueue the futex_q on the futex_hash_bucket
- * @q:	The futex_q to enqueue
- * @hb:	The destination hash bucket
- *
- * The hb->lock must be held by the caller, and is released here. A call to
- * queue_me() is typically paired with exactly one call to unqueue_me(). The
- * exceptions involve the PI related operations, which may use unqueue_me_pi()
- * or nothing if the unqueue is done as part of the wake process and the unqueue
- * state is implicit in the state of woken task (see futex_wait_requeue_pi() for
- * an example).
- */
-static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
-	__releases(&hb->lock)
+static inline void __queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
 {
 	int prio;
 
@@ -2314,6 +2301,24 @@ static inline void queue_me(struct futex
 	plist_node_init(&q->list, prio);
 	plist_add(&q->list, &hb->chain);
 	q->task = current;
+}
+
+/**
+ * queue_me() - Enqueue the futex_q on the futex_hash_bucket
+ * @q:	The futex_q to enqueue
+ * @hb:	The destination hash bucket
+ *
+ * The hb->lock must be held by the caller, and is released here. A call to
+ * queue_me() is typically paired with exactly one call to unqueue_me(). The
+ * exceptions involve the PI related operations, which may use unqueue_me_pi()
+ * or nothing if the unqueue is done as part of the wake process and the unqueue
+ * state is implicit in the state of woken task (see futex_wait_requeue_pi() for
+ * an example).
+ */
+static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
+	__releases(&hb->lock)
+{
+	__queue_me(q, hb);
 	spin_unlock(&hb->lock);
 }
 
@@ -2819,6 +2824,7 @@ static int futex_lock_pi(u32 __user *uad
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct task_struct *exiting = NULL;
+	struct rt_mutex_waiter rt_waiter;
 	struct futex_hash_bucket *hb;
 	struct futex_q q = futex_q_init;
 	int res, ret;
@@ -2879,25 +2885,52 @@ retry_private:
 		}
 	}
 
+	WARN_ON(!q.pi_state);
+
 	/*
 	 * Only actually queue now that the atomic ops are done:
 	 */
-	queue_me(&q, hb);
+	__queue_me(&q, hb);
 
-	WARN_ON(!q.pi_state);
-	/*
-	 * Block on the PI mutex:
-	 */
-	if (!trylock) {
-		ret = rt_mutex_timed_futex_lock(&q.pi_state->pi_mutex, to);
-	} else {
+	if (trylock) {
 		ret = rt_mutex_futex_trylock(&q.pi_state->pi_mutex);
 		/* Fixup the trylock return value: */
 		ret = ret ? 0 : -EWOULDBLOCK;
+		goto no_block;
 	}
 
+	/*
+	 * We must add ourselves to the rt_mutex waitlist while holding hb->lock
+	 * such that the hb and rt_mutex wait lists match.
+	 */
+	rt_mutex_init_waiter(&rt_waiter);
+	ret = rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
+	if (ret) {
+		if (ret == 1)
+			ret = 0;
+
+		goto no_block;
+	}
+
+	spin_unlock(q.lock_ptr);
+
+	if (unlikely(to))
+		hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS);
+
+	ret = rt_mutex_wait_proxy_lock(&q.pi_state->pi_mutex, to, &rt_waiter);
+	spin_lock(q.lock_ptr);
 	/*
+	 * If we failed to acquire the lock (signal/timeout), we must
+	 * first acquire the hb->lock before removing the lock from the
+	 * rt_mutex waitqueue, such that we can keep the hb and rt_mutex
+	 * wait lists consistent.
+	 */
+	if (ret && !rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter))
+		ret = 0;
+
+no_block:
+	/*
 	 * Fixup the pi_state owner and possibly acquire the lock if we
 	 * haven't already.
 	 */
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1489,19 +1489,6 @@ int __sched rt_mutex_lock_interruptible(
 EXPORT_SYMBOL_GPL(rt_mutex_lock_interruptible);
 
 /*
- * Futex variant with full deadlock detection.
- * Futex variants must not use the fast-path, see __rt_mutex_futex_unlock().
- */
-int __sched rt_mutex_timed_futex_lock(struct rt_mutex *lock,
-				      struct hrtimer_sleeper *timeout)
-{
-	might_sleep();
-
-	return rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE,
-				 timeout, RT_MUTEX_FULL_CHAINWALK);
-}
-
-/*
  * Futex variant, must not use fastpath.
  */
 int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
@@ -1774,12 +1761,6 @@ int rt_mutex_wait_proxy_lock(struct rt_m
 	/* sleep on the mutex */
 	ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter);
 
-	/*
-	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
-	 * have to fix that up.
-	 */
-	fixup_rt_mutex_waiters(lock);
-
 	raw_spin_unlock(&lock->wait_lock);
 
 	return ret;
@@ -1819,6 +1800,13 @@ bool rt_mutex_cleanup_proxy_lock(struct
 		fixup_rt_mutex_waiters(lock);
 		cleanup = true;
 	}
+
+	/*
+	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
+	 * have to fix that up.
+	 */
+	fixup_rt_mutex_waiters(lock);
+
 	raw_spin_unlock_irq(&lock->wait_lock);
 
 	return cleanup;
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -111,7 +111,6 @@ extern int rt_mutex_wait_proxy_lock(stru
 				     struct rt_mutex_waiter *waiter);
 extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
 				 struct rt_mutex_waiter *waiter);
-extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to);
 extern int rt_mutex_futex_trylock(struct rt_mutex *l);
 extern int __rt_mutex_futex_trylock(struct rt_mutex *l);