Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes
To: Peter Zijlstra, Nicolai Hähnle, Nicolai Hähnle, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Ingo Molnar, stable@vger.kernel.org, Maarten Lankhorst
References: <1479900325-28358-1-git-send-email-nhaehnle@gmail.com> <20161123130046.GS3092@twins.programming.kicks-ass.net> <20161123130848.q6yw73fjdhttmbqh@phenom.ffwll.local> <20161123131142.g53re5uznh6jqtk6@phenom.ffwll.local>
From: Maarten Lankhorst
Date: Wed, 23 Nov 2016 14:33:54 +0100
In-Reply-To: <20161123131142.g53re5uznh6jqtk6@phenom.ffwll.local>

On 23-11-16 at 14:11, Daniel Vetter wrote:
> On Wed, Nov 23, 2016 at 02:08:48PM +0100, Daniel Vetter wrote:
>> On Wed, Nov 23, 2016 at 02:00:46PM +0100, Peter Zijlstra wrote:
>>> On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
>>>> From: Nicolai Hähnle
>>>>
>>>> Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
>>>> the following example. Acquire context stamps are ordered like the thread
>>>> numbers, i.e. thread #1 should back off when it encounters a mutex locked
>>>> by thread #0 etc.
>>>>
>>>> Thread #0    Thread #1    Thread #2    Thread #3
>>>> ---------    ---------    ---------    ---------
>>>>                                        lock(ww)
>>>>                                        success
>>>>              lock(ww')
>>>>              success
>>>>                           lock(ww)
>>>>              lock(ww)     .
>>>>              .            .            unlock(ww) part 1
>>>> lock(ww)     .            .            .
>>>> success      .            .            .
>>>>              .            .            unlock(ww) part 2
>>>>              .            back off
>>>> lock(ww')    .
>>>> .            .
>>>> (stuck)      (stuck)
>>>>
>>>> Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
>>>> (without being protected by lock->base.wait_lock), meaning that thread #0
>>>> can acquire ww in the fast path or, much more likely, the medium path
>>>> in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
>>>> won't wake up any of the waiters in ww_mutex_set_context_fastpath.
>>>>
>>>> Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww. This is
>>>> thread #2, since waiters are added at the tail. Thread #2 wakes up and
>>>> backs off since it sees ww owned by a context with a lower stamp.
>>>>
>>>> Meanwhile, thread #1 is never woken up, and so it won't back off its lock
>>>> on ww'. So thread #0 gets stuck waiting for ww' to be released.
>>>>
>>>> This patch fixes the deadlock by waking up all waiters in the slow path
>>>> of ww_mutex_unlock.
>>>>
>>>> We have an internal test case for amdgpu which continuously submits
>>>> command streams from tens of threads, where all command streams reference
>>>> hundreds of GPU buffer objects with a lot of overlap in the buffer lists
>>>> between command streams.
>>>> This test reliably caused a deadlock, and while I haven't completely
>>>> confirmed that it is exactly the scenario outlined above, this patch
>>>> does fix the test case.
>>>>
>>>> v2:
>>>> - use wake_q_add
>>>> - add additional explanations
>>>>
>>>> Cc: Peter Zijlstra
>>>> Cc: Ingo Molnar
>>>> Cc: Chris Wilson
>>>> Cc: Maarten Lankhorst
>>>> Cc: dri-devel@lists.freedesktop.org
>>>> Cc: stable@vger.kernel.org
>>>> Reviewed-by: Christian König (v1)
>>>> Signed-off-by: Nicolai Hähnle
>>> Completely and utterly fails to apply; I think this patch is based on
>>> code prior to the mutex rewrite.
>>>
>>> Please rebase on tip/locking/core.
>>>
>>> Also, is this a regression, or has this been a 'feature' of the ww_mutex
>>> code from early on?
>> Sorry, forgot to mention that, but I checked. It seems to have been broken
>> since day 1; at least looking at the original code, the wake-single-waiter
>> stuff is as old as the mutex code added in 2006.
> More details: for GPU drivers this was originally working, since the
> ww_mutex implementation in ttm did use wake_up_all. So we need to add a
>
> Fixes: 5e338405119a ("drm/ttm: convert to the reservation api")

But it's broken in the original kernel ww_mutex implementation. I guess it
doesn't matter much, since ttm was the first in-kernel user anyway. :)

~Maarten
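
P.S. For anyone reading along, a rough sketch of what "wake all waiters in the
unlock slow path" amounts to, assuming the pre-rewrite mutex layout
(atomic_t count, wait_lock, wait_list). Names and structure are approximations
for illustration only, not the actual patch:

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/sched.h>
#include <linux/spinlock.h>

/* Illustrative sketch only; assumes the pre-rewrite struct mutex fields. */
static void ww_mutex_unlock_slowpath(struct mutex *lock)
{
	struct mutex_waiter *waiter;
	unsigned long flags;

	/*
	 * "part 1": the count is released before wait_lock is taken, so a
	 * fast-path or optimistic-spinning locker can already steal the
	 * mutex at this point.
	 */
	atomic_set(&lock->count, 1);

	spin_lock_irqsave(&lock->wait_lock, flags);

	/*
	 * "part 2": the broken version woke only the first waiter:
	 *
	 *	if (!list_empty(&lock->wait_list)) {
	 *		waiter = list_first_entry(&lock->wait_list,
	 *					  struct mutex_waiter, list);
	 *		wake_up_process(waiter->task);
	 *	}
	 *
	 * If that single waiter backs off, the remaining waiters are never
	 * woken and never get a chance to back off themselves. Waking all
	 * waiters lets each of them re-check the lock and its stamp:
	 */
	list_for_each_entry(waiter, &lock->wait_list, list)
		wake_up_process(waiter->task);

	spin_unlock_irqrestore(&lock->wait_lock, flags);
}

Per the v2 changelog, the actual patch defers the wakeups instead, collecting
the tasks with wake_q_add() and waking them once wait_lock has been dropped;
the direct wake_up_process() calls above are only for brevity.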