Received: by 2002:ac0:946b:0:0:0:0:0 with SMTP id j40csp2660859imj; Mon, 11 Feb 2019 06:37:53 -0800 (PST) X-Google-Smtp-Source: AHgI3IaZlicuc5gJa3wuZuw4sgkbYYToDqYp9/LrJwYi5isgNFJMzMGf3a1hKO2JnGSf+h0VXxH5 X-Received: by 2002:a63:f91c:: with SMTP id h28mr9296617pgi.14.1549895873095; Mon, 11 Feb 2019 06:37:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1549895873; cv=none; d=google.com; s=arc-20160816; b=WZYoX30I+7ne0xLMR24pLgoGw+6XFRlybLLNwCOMIyoAMajax77mvUa/NEBe8BMrlh hucrI1LBmu8p6l2Kz1M4p6PKONF/D/y9Hejaf12gcjvdYhyu30D10aJLpaVDTFeyVKlw nDW6zkwB7kCM+Mwpc72nIIZED0xeJ+DSpq3/Lb6/lkpXavVLSFLGGrk+lreqxF2DOj4w Oawlx8NR6Uv0PK5xhFze0QufZHcXx0iLcbMX6N5dhn2l/jFIav7Fvz1pnHo218BL6k43 wm7ed74eTVdB1ftCaeMWvstiAG9hYlX+UoknJYmOhcENwLfcV3StTAZ7s/p0njlOxLYS 0V4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=CRoNY3AGB+BpX8/mXbkDo/C1zjOsHPdUryiXQvWVBWk=; b=hDieOGFUdQOzee4uKHLbQx0I34Aur7xzVP8aPibf+1KSyevSQBDwWOlA0hHpFpK20J Svja9WfAbTb2o5OXDNMSkrzW+qa32Ijit4w37YrmBDNNt1zYAiQLkgXcOScK9mq2/5/U k0BUE7L43ii7k6cpNWGXKLGXqXSOtreINjmVXlpNTQaOaBiqiqx0Ptg0smcyoBq7hc9f IjCPLOOrl+h5yB8G4PFEgOoYXadnP2rGTUOee4P1bXSCrxZi2ifUo57GDqFHea02dNhN FNpZ/gRDK6TnqG0dcH55hJrPErF4+1jDaQYbrNdo8i+deidUoRfYYUm7kj47izqgfhft wpkQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=NJJu5i1F; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id c13si9429683pgi.531.2019.02.11.06.37.36; Mon, 11 Feb 2019 06:37:53 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=NJJu5i1F; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731323AbfBKOgr (ORCPT + 99 others); Mon, 11 Feb 2019 09:36:47 -0500 Received: from mail.kernel.org ([198.145.29.99]:45978 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731312AbfBKOgq (ORCPT ); Mon, 11 Feb 2019 09:36:46 -0500 Received: from localhost (5356596B.cm-6-7b.dynamic.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 7F64520700; Mon, 11 Feb 2019 14:36:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1549895805; bh=4Z7xXJSBAiIX//hgcTI11DeoW7T3aDaTyf9TcJpSiZk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NJJu5i1FUg9lOm8Kl4kmNFoQz+JZFXyvBH5ZrOmAmtjc2svGM1DQCxPLABGcCL0qW r6rzS1IT/Rc9vOhLv6iEdBAu2CYSIo6yWdle3xDHluM/oz1hzhEmZ1Ivct6NXdGehW RozTqL/eIULeEXJAwDAJ0y2N1SZINDIL2P31DmSU= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Heiko Carstens , Peter Zijlstra , Thomas Gleixner , Martin Schwidefsky , linux-s390@vger.kernel.org, Stefan Liebler , Sebastian Sewior Subject: [PATCH 4.20 328/352] futex: Handle early deadlock return correctly Date: Mon, 11 Feb 2019 15:19:15 +0100 Message-Id: <20190211141907.746487189@linuxfoundation.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190211141846.543045703@linuxfoundation.org> References: <20190211141846.543045703@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review X-Patchwork-Hint: ignore MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.20-stable review patch. If anyone has any objections, please let me know. ------------------ From: Thomas Gleixner commit 1a1fb985f2e2b85ec0d3dc2e519ee48389ec2434 upstream. commit 56222b212e8e ("futex: Drop hb->lock before enqueueing on the rtmutex") changed the locking rules in the futex code so that the hash bucket lock is not longer held while the waiter is enqueued into the rtmutex wait list. This made the lock and the unlock path symmetric, but unfortunately the possible early exit from __rt_mutex_proxy_start() due to a detected deadlock was not updated accordingly. That allows a concurrent unlocker to observe inconsitent state which triggers the warning in the unlock path. futex_lock_pi() futex_unlock_pi() lock(hb->lock) queue(hb_waiter) lock(hb->lock) lock(rtmutex->wait_lock) unlock(hb->lock) // acquired hb->lock hb_waiter = futex_top_waiter() lock(rtmutex->wait_lock) __rt_mutex_proxy_start() ---> fail remove(rtmutex_waiter); ---> returns -EDEADLOCK unlock(rtmutex->wait_lock) // acquired wait_lock wake_futex_pi() rt_mutex_next_owner() --> returns NULL --> WARN lock(hb->lock) unqueue(hb_waiter) The problem is caused by the remove(rtmutex_waiter) in the failure case of __rt_mutex_proxy_start() as this lets the unlocker observe a waiter in the hash bucket but no waiter on the rtmutex, i.e. inconsistent state. The original commit handles this correctly for the other early return cases (timeout, signal) by delaying the removal of the rtmutex waiter until the returning task reacquired the hash bucket lock. Treat the failure case of __rt_mutex_proxy_start() in the same way and let the existing cleanup code handle the eventual handover of the rtmutex gracefully. The regular rt_mutex_proxy_start() gains the rtmutex waiter removal for the failure case, so that the other callsites are still operating correctly. Add proper comments to the code so all these details are fully documented. Thanks to Peter for helping with the analysis and writing the really valuable code comments. Fixes: 56222b212e8e ("futex: Drop hb->lock before enqueueing on the rtmutex") Reported-by: Heiko Carstens Co-developed-by: Peter Zijlstra Signed-off-by: Peter Zijlstra Signed-off-by: Thomas Gleixner Tested-by: Heiko Carstens Cc: Martin Schwidefsky Cc: linux-s390@vger.kernel.org Cc: Stefan Liebler Cc: Sebastian Sewior Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1901292311410.1950@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman --- kernel/futex.c | 28 ++++++++++++++++++---------- kernel/locking/rtmutex.c | 37 ++++++++++++++++++++++++++++++++----- 2 files changed, 50 insertions(+), 15 deletions(-) --- a/kernel/futex.c +++ b/kernel/futex.c @@ -2850,35 +2850,39 @@ retry_private: * and BUG when futex_unlock_pi() interleaves with this. * * Therefore acquire wait_lock while holding hb->lock, but drop the - * latter before calling rt_mutex_start_proxy_lock(). This still fully - * serializes against futex_unlock_pi() as that does the exact same - * lock handoff sequence. + * latter before calling __rt_mutex_start_proxy_lock(). This + * interleaves with futex_unlock_pi() -- which does a similar lock + * handoff -- such that the latter can observe the futex_q::pi_state + * before __rt_mutex_start_proxy_lock() is done. */ raw_spin_lock_irq(&q.pi_state->pi_mutex.wait_lock); spin_unlock(q.lock_ptr); + /* + * __rt_mutex_start_proxy_lock() unconditionally enqueues the @rt_waiter + * such that futex_unlock_pi() is guaranteed to observe the waiter when + * it sees the futex_q::pi_state. + */ ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current); raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock); if (ret) { if (ret == 1) ret = 0; - - spin_lock(q.lock_ptr); - goto no_block; + goto cleanup; } - if (unlikely(to)) hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS); ret = rt_mutex_wait_proxy_lock(&q.pi_state->pi_mutex, to, &rt_waiter); +cleanup: spin_lock(q.lock_ptr); /* - * If we failed to acquire the lock (signal/timeout), we must + * If we failed to acquire the lock (deadlock/signal/timeout), we must * first acquire the hb->lock before removing the lock from the - * rt_mutex waitqueue, such that we can keep the hb and rt_mutex - * wait lists consistent. + * rt_mutex waitqueue, such that we can keep the hb and rt_mutex wait + * lists consistent. * * In particular; it is important that futex_unlock_pi() can not * observe this inconsistency. @@ -3002,6 +3006,10 @@ retry: * there is no point where we hold neither; and therefore * wake_futex_pi() must observe a state consistent with what we * observed. + * + * In particular; this forces __rt_mutex_start_proxy() to + * complete such that we're guaranteed to observe the + * rt_waiter. Also see the WARN in wake_futex_pi(). */ raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -1726,12 +1726,33 @@ void rt_mutex_proxy_unlock(struct rt_mut rt_mutex_set_owner(lock, NULL); } +/** + * __rt_mutex_start_proxy_lock() - Start lock acquisition for another task + * @lock: the rt_mutex to take + * @waiter: the pre-initialized rt_mutex_waiter + * @task: the task to prepare + * + * Starts the rt_mutex acquire; it enqueues the @waiter and does deadlock + * detection. It does not wait, see rt_mutex_wait_proxy_lock() for that. + * + * NOTE: does _NOT_ remove the @waiter on failure; must either call + * rt_mutex_wait_proxy_lock() or rt_mutex_cleanup_proxy_lock() after this. + * + * Returns: + * 0 - task blocked on lock + * 1 - acquired the lock for task, caller should wake it up + * <0 - error + * + * Special API call for PI-futex support. + */ int __rt_mutex_start_proxy_lock(struct rt_mutex *lock, struct rt_mutex_waiter *waiter, struct task_struct *task) { int ret; + lockdep_assert_held(&lock->wait_lock); + if (try_to_take_rt_mutex(lock, task, NULL)) return 1; @@ -1749,9 +1770,6 @@ int __rt_mutex_start_proxy_lock(struct r ret = 0; } - if (unlikely(ret)) - remove_waiter(lock, waiter); - debug_rt_mutex_print_deadlock(waiter); return ret; @@ -1763,12 +1781,18 @@ int __rt_mutex_start_proxy_lock(struct r * @waiter: the pre-initialized rt_mutex_waiter * @task: the task to prepare * + * Starts the rt_mutex acquire; it enqueues the @waiter and does deadlock + * detection. It does not wait, see rt_mutex_wait_proxy_lock() for that. + * + * NOTE: unlike __rt_mutex_start_proxy_lock this _DOES_ remove the @waiter + * on failure. + * * Returns: * 0 - task blocked on lock * 1 - acquired the lock for task, caller should wake it up * <0 - error * - * Special API call for FUTEX_REQUEUE_PI support. + * Special API call for PI-futex support. */ int rt_mutex_start_proxy_lock(struct rt_mutex *lock, struct rt_mutex_waiter *waiter, @@ -1778,6 +1802,8 @@ int rt_mutex_start_proxy_lock(struct rt_ raw_spin_lock_irq(&lock->wait_lock); ret = __rt_mutex_start_proxy_lock(lock, waiter, task); + if (unlikely(ret)) + remove_waiter(lock, waiter); raw_spin_unlock_irq(&lock->wait_lock); return ret; @@ -1845,7 +1871,8 @@ int rt_mutex_wait_proxy_lock(struct rt_m * @lock: the rt_mutex we were woken on * @waiter: the pre-initialized rt_mutex_waiter * - * Attempt to clean up after a failed rt_mutex_wait_proxy_lock(). + * Attempt to clean up after a failed __rt_mutex_start_proxy_lock() or + * rt_mutex_wait_proxy_lock(). * * Unless we acquired the lock; we're still enqueued on the wait-list and can * in fact still be granted ownership until we're removed. Therefore we can