Date: Wed, 23 Nov 2016 14:11:42 +0100
From: Daniel Vetter <daniel@ffwll.ch>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Nicolai =?iso-8859-1?Q?H=E4hnle?= <nhaehnle@gmail.com>,
        Nicolai =?iso-8859-1?Q?H=E4hnle?= <Nicolai.Haehnle@amd.com>,
        linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
        Ingo Molnar <mingo@redhat.com>, stable@vger.kernel.org,
        Maarten Lankhorst <maarten.lankhorst@canonical.com>
Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes
Message-ID: <20161123131142.g53re5uznh6jqtk6@phenom.ffwll.local>
Mail-Followup-To: Peter Zijlstra <peterz@infradead.org>,
        Nicolai =?iso-8859-1?Q?H=E4hnle?= <nhaehnle@gmail.com>,
        Nicolai =?iso-8859-1?Q?H=E4hnle?= <Nicolai.Haehnle@amd.com>,
        linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
        Ingo Molnar <mingo@redhat.com>, stable@vger.kernel.org,
        Maarten Lankhorst <maarten.lankhorst@canonical.com>
References: <1479900325-28358-1-git-send-email-nhaehnle@gmail.com>
 <20161123130046.GS3092@twins.programming.kicks-ass.net>
 <20161123130848.q6yw73fjdhttmbqh@phenom.ffwll.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20161123130848.q6yw73fjdhttmbqh@phenom.ffwll.local>
User-Agent: NeoMutt/20161104 (1.7.1)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3791
Lines: 83

On Wed, Nov 23, 2016 at 02:08:48PM +0100, Daniel Vetter wrote:
> On Wed, Nov 23, 2016 at 02:00:46PM +0100, Peter Zijlstra wrote:
> > > On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> > > > From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
> > > 
> > > Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
> > > the following example. Acquire context stamps are ordered like the thread
> > > numbers, i.e. thread #1 should back off when it encounters a mutex locked
> > > by thread #0 etc.
> > > 
> > > Thread #0    Thread #1    Thread #2    Thread #3
> > > ---------    ---------    ---------    ---------
> > >                                        lock(ww)
> > >                                        success
> > >              lock(ww')
> > >              success
> > >                           lock(ww)
> > >              lock(ww)        .
> > >                 .            .         unlock(ww) part 1
> > > lock(ww)        .            .            .
> > > success         .            .            .
> > >                 .            .         unlock(ww) part 2
> > >                 .         back off
> > > lock(ww')       .
> > >    .            .
> > > (stuck)      (stuck)
> > > 
> > > Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
> > > (without being protected by lock->base.wait_lock), meaning that thread #0
> > > can acquire ww in the fast path or, much more likely, the medium path
> > > in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
> > > won't wake up any of the waiters in ww_mutex_set_context_fastpath.
> > > 
> > > Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww. This is
> > > thread #2, since waiters are added at the tail. Thread #2 wakes up and
> > > backs off since it sees ww owned by a context with a lower stamp.
> > > 
> > > Meanwhile, thread #1 is never woken up, and so it won't back off its lock
> > > on ww'. So thread #0 gets stuck waiting for ww' to be released.
> > > 
> > > This patch fixes the deadlock by waking up all waiters in the slow path
> > > of ww_mutex_unlock.
> > > 
> > > We have an internal test case for amdgpu which continuously submits
> > > command streams from tens of threads, where all command streams reference
> > > hundreds of GPU buffer objects with a lot of overlap in the buffer lists
> > > between command streams. This test reliably caused a deadlock, and while I
> > > haven't completely confirmed that it is exactly the scenario outlined
> > > above, this patch does fix the test case.
> > > 
> > > v2:
> > > - use wake_q_add
> > > - add additional explanations
> > > 
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: stable@vger.kernel.org
> > > > Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
> > > > Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
> > 
> > Completely and utterly fails to apply; I think this patch is based on
> > code prior to the mutex rewrite.
> > 
> > Please rebase on tip/locking/core.
> > 
> > Also, is this a regression, or has this been a 'feature' of the ww_mutex
> > code from early on?
> 
> Sorry, forgot to mention that, but I checked. Seems to have been broken
> since day 1: at least looking at the original code, the wake-single-waiter
> stuff is as old as the mutex code added in 2006.

More details: For GPU drivers this originally worked, since the
ww_mutex implementation in ttm used wake_up_all. So this needs a

Fixes: 5e338405119a ("drm/ttm: convert to the reservation api")
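
To make the difference between waking one waiter and waking all of them
concrete, here is a rough toy replay of the quoted four-thread schedule.
It is only a sketch and not kernel code: the function name `deadlocks`,
the `wake_all` flag and the dict/list bookkeeping are made up for
illustration, and the schedule is hard-coded rather than raced.

```python
def deadlocks(wake_all):
    """Toy replay of the quoted schedule; True if thread #0 ends up stuck."""
    # Assumption: stamps equal the thread numbers and a lower stamp wins,
    # as stated in the quoted commit message.
    owner = {"ww": 3, "ww'": 1}   # thread #3 holds ww, thread #1 holds ww'
    waiters = [2, 1]              # wait list of ww: thread #2 first, then #1

    owner["ww"] = None            # unlock(ww) part 1: released, nobody woken
    owner["ww"] = 0               # thread #0 takes ww, wait list untouched

    # unlock(ww) part 2: wake only the first waiter, or every waiter.
    woken = waiters if wake_all else waiters[:1]
    for thread in woken:
        if owner["ww"] < thread:  # ww held by an older context: back off
            for lock in owner:
                if owner[lock] == thread:
                    owner[lock] = None   # drop every lock this thread holds

    # Thread #0 now wants ww'; it is stuck if thread #1 never released it.
    return owner["ww'"] is not None
```

In this model deadlocks(wake_all=False) returns True, because only
thread #2 is woken and thread #1 keeps holding ww'. With wake_all=True
thread #1 is woken as well, backs off, and the call returns False.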
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch