Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965397AbcKXL1D (ORCPT ); Thu, 24 Nov 2016 06:27:03 -0500 Received: from mail-wm0-f43.google.com ([74.125.82.43]:33876 "EHLO mail-wm0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964851AbcKXL1B (ORCPT ); Thu, 24 Nov 2016 06:27:01 -0500 Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes To: Peter Zijlstra , =?UTF-8?Q?Nicolai_H=c3=a4hnle?= , linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Ingo Molnar , stable@vger.kernel.org, Maarten Lankhorst References: <1479900325-28358-1-git-send-email-nhaehnle@gmail.com> <20161123140336.GU3092@twins.programming.kicks-ass.net> <20161123142525.ns2pkyp4bo2sa5z2@phenom.ffwll.local> From: =?UTF-8?Q?Nicolai_H=c3=a4hnle?= Message-ID: Date: Thu, 24 Nov 2016 12:26:57 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20161123142525.ns2pkyp4bo2sa5z2@phenom.ffwll.local> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3060 Lines: 71 On 23.11.2016 15:25, Daniel Vetter wrote: > On Wed, Nov 23, 2016 at 03:03:36PM +0100, Peter Zijlstra wrote: >> On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai H?hnle wrote: >>> @@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock) >>> */ >>> mutex_clear_owner(&lock->base); >>> #endif >>> - __mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath); >>> + /* >>> + * A previously _not_ waiting task may acquire the lock via the fast >>> + * path during our unlock. In that case, already waiting tasks may have >>> + * to back off to avoid a deadlock. Wake up all waiters so that they >>> + * can check their acquire context stamp against the new owner. >>> + */ >>> + __mutex_fastpath_unlock(&lock->base.count, >>> + __mutex_unlock_slowpath_wakeall); >>> } >> >> So doing a wake-all has obvious issues with thundering herd etc.. Also, >> with the new mutex, you'd not be able to do hand-off, which would >> introduce starvation cases. >> >> Ideally we'd iterate the blocked list and pick the waiter with the >> earliest stamp, or we'd maintain the list in stamp order instead of >> FIFO, for ww_mutex. > > Not sure we'll win that much, at least I think we still need to wake up > everyone with earlier stamp than the one of the task that just released > the lock. Otherwise there's deadlocks. So just cuts the wakeups in half, > on average. > > What we could do is do a handoff-hint with the timestamp of earliest task > we believe should get the lock. Everyone with a later timestamp that gets > woken then knows that they definitely have a deadlock situation and need > to back off (thread 2 in the example). > > thread 1 would get woken, and would be able to take the lock, except when > thread 0 successfully raced it and stole the lock. And anyone else racing > in with later timestamps would also immediately back off, ensuring > fairness. I did consider maintaining stamp order in the waiter list and originally decided against it because I just wanted a simple and conservative fix to avoid adding new regressions. Now that a different approach is needed for >= 4.9 anyway mostly due to the hand-off logic, I'm reconsidering this. I do believe we can win a bit by keeping the wait list sorted, if we also make sure that waiters don't add themselves in the first place if they see that a deadlock situation cannot be avoided. I will probably want to extend struct mutex_waiter with ww_mutex-specific fields to facilitate this (i.e. ctx pointer, perhaps stamp as well to reduce pointer-chasing). That should be fine since it lives on the stack. In the meantime, I'd appreciate it if patch #1 could be accepted as-is for stable updates to <= 4.8. It fixes a real (if rare) bug, and the stampede inefficiency isn't a problem in practice at least for GPU applications. Thanks, Nicolai > > Without thinking it through in detail this is a PI issue, except that we > replace boosting with wakeup&back-off. Could we perhaps steal something > from rt mutexes to make it fair&efficient? > -Daniel >