Date: Fri, 1 Dec 2017 12:11:15 -0800
From: Darren Hart
To: Julia Cartwright
Cc: Thomas Gleixner, Peter Zijlstra, Gratian Crisan, linux-kernel@vger.kernel.org, Ingo Molnar
Subject: Re: PI futexes + lock stealing woes
Message-ID: <20171201201115.GB18881@fury>
In-Reply-To: <20171129175605.GA863@jcartwri.amer.corp.natinst.com>

On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote:
> Hey Thomas, Peter-
>
> Gratian and I have been debugging into a nasty and difficult race w/
> futexes seemingly the culprit. The original symptom we were seeing
> was a seemingly spurious -EDEADLK from a futex(LOCK_PI) operation.
>
> On further analysis, however, it appears the thread which gets the
> spurious -EDEADLK has observed a weird futex state: a prior
> futex(WAIT_REQUEUE_PI) operation has returned -ETIMEDOUT, but the uaddr2
> futex word owner field indicates that it's the owner.
>

Do you have a reproducer you can share?
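For readers paging in the requeue-PI API, the sketch below shows only the ordinary, uncontended way a FUTEX_WAIT_REQUEUE_PI call ends in ETIMEDOUT: nobody issues FUTEX_CMP_REQUEUE_PI before the deadline. It is not a reproducer of the race. It assumes Linux with the uapi `<linux/futex.h>` header; `wait_requeue_pi_timeout` is a hypothetical helper name. Note that the timeout for this operation is absolute and is measured on CLOCK_MONOTONIC unless FUTEX_CLOCK_REALTIME is set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait on uaddr with FUTEX_WAIT_REQUEUE_PI, naming
 * uaddr2 as the PI target, while no thread ever requeues us.
 * Returns 0 on success or the errno value on failure.
 */
static int wait_requeue_pi_timeout(long timeout_ms)
{
	uint32_t uaddr = 0;   /* non-PI "condvar" word; expected value passed is 0 */
	uint32_t uaddr2 = 0;  /* PI futex word; 0 means unowned */
	struct timespec deadline;

	/* Absolute deadline on CLOCK_MONOTONIC (FUTEX_CLOCK_REALTIME not set). */
	int clk = clock_gettime(CLOCK_MONOTONIC, &deadline);
	assert(clk == 0);
	(void)clk;
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	/* glibc has no futex() wrapper, so issue the raw syscall:
	 * futex(uaddr, op, val, timeout, uaddr2, val3). */
	long rc = syscall(SYS_futex, &uaddr,
			  FUTEX_WAIT_REQUEUE_PI | FUTEX_PRIVATE_FLAG,
			  0, &deadline, &uaddr2, 0);
	return rc == 0 ? 0 : errno;
}
```

In this uncontended case uaddr2 is never touched, so the owner field stays 0. The state described above (ETIMEDOUT returned while the uaddr2 word names the waiter as owner) only arises when a requeue and wakeup race with the timer.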
> Here's an attempt to boil down this situation into a pseudo trace; I'm
> happy to forward along the full traces as well, if that would be
> helpful:

Please do forward the full trace.

>
> waiter                      waker               stealer (prio > waiter)
>
> futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
>       timeout=[N ms])
>    futex_wait_requeue_pi()
>       futex_wait_queue_me()
>          freezable_schedule()
>                             futex(LOCK_PI, uaddr2)
>                             futex(CMP_REQUEUE_PI, uaddr,
>                                   uaddr2, 1, 0)
>                                /* requeues waiter to uaddr2 */
>                             futex(UNLOCK_PI, uaddr2)
>                                wake_futex_pi()
>                                   cmp_futex_value_locked(uaddr, waiter)
>                                   wake_up_q()
>          clears sleeper->task>
>                                                 futex(LOCK_PI, uaddr2)
>                                                    __rt_mutex_start_proxy_lock()
>                                                       try_to_take_rt_mutex() /* steals lock */
>                                                          rt_mutex_set_owner(lock, stealer)
>       rt_mutex_wait_proxy_lock()
>          __rt_mutex_slowlock()
>             try_to_take_rt_mutex() /* fails, lock held by stealer */
>             if (timeout && !timeout->task)
>                return -ETIMEDOUT;
>       fixup_owner()
>          /* lock wasn't acquired, so,
>             fixup_pi_state_owner skipped */
>    return -ETIMEDOUT;
>
> /* At this point, we've returned -ETIMEDOUT to userspace, but the
>  * futex word shows waiter to be the owner, and the pi_mutex has
>  * stealer as the owner */
> eeeeeeewwwweeee
> futex_lock(LOCK_PI, uaddr2)
>    -> bails with EDEADLK, futex word says we're owner.
>
> At some later point in execution, the stealer gets scheduled back in and
> will do fixup_owner() which fixes up the futex word, but at that point
> it's too late: the waiter has already observed the wonky state.
>
> fixup_owner() used to have additional seemingly relevant checks in place
> that were removed in 73d786bd043eb ("futex: Rework inconsistent
> rt_mutex/futex_q state").

This and the subsequent changes moving some of this out from under the
hb->lock are interesting - and were quite fun to review at the time.

Hrm. I'll continue paging this stuff in, although I suspect Peter will
beat me to it. In the meantime, if you can share the reproducer and/or
the trace you collected, that will be helpful.
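The last step of the trace relies on a deadlock check in the kernel's FUTEX_LOCK_PI path (futex_lock_pi_atomic() in kernel/futex.c): if the owner TID stored in the futex word is the caller's own TID, the call fails with EDEADLK instead of blocking. The sketch below isolates that check by writing the caller's TID into the word by hand, which fakes the end state the waiter observed. It does not reproduce the race. It assumes Linux with the uapi `<linux/futex.h>` header; `lock_pi_on_self_owned_word` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: store the caller's own TID in a PI futex word,
 * then try to acquire it with FUTEX_LOCK_PI.
 * Returns 0 on success or the errno value on failure.
 */
static int lock_pi_on_self_owned_word(void)
{
	uint32_t tid = (uint32_t)syscall(SYS_gettid);
	/* PI futex word: owner TID in the low bits, FUTEX_WAITERS clear. */
	uint32_t word = tid;

	/* Raw syscall (no glibc wrapper). NULL timeout: the call is
	 * expected to fail at once with EDEADLK rather than block. */
	long rc = syscall(SYS_futex, &word, FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG,
			  0, NULL, NULL, 0);
	if (rc == 0)
		return 0;

	int err = errno;
	/* The kernel rejects the call before modifying the futex word. */
	assert((word & FUTEX_TID_MASK) == tid);
	return err;
}
```

Used this way, the helper should report EDEADLK, the same error the waiter saw. The difference in the bug is that the waiter never wrote its own TID: the word was left naming it as owner after futex(WAIT_REQUEUE_PI) had already returned -ETIMEDOUT.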
>
> The actual kernel we've been testing is 4.9.33-rt23, w/ 153fbd1226fb3
> ("futex: Fix more put_pi_state() vs. exit_pi_state_list() races")

And this does not exhibit the behavior above, correct?

> cherry-picked w/ PREEMPT_RT_FULL. However, it appears that this issue
> may affect v4.15-rc1?

And this does?

>
> Thoughts on how to move forward?
>
> Nasty.
>
> Julia
>

-- 
Darren Hart
VMware Open Source Technology Center