From: Nicolai Hähnle
To: linux-kernel@vger.kernel.org
Subject: [PATCH v2 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes
Date: Thu, 1 Dec 2016 15:06:43 +0100
Message-Id: <1480601214-26583-1-git-send-email-nhaehnle@gmail.com>

Changes to patches 1 & 5 based on feedback. I've also updated the branch at
https://cgit.freedesktop.org/~nh/linux/log/?h=mutex.

There's been the question of using a balanced tree rather than a list.
Frankly, I'd say the 99% use case doesn't need it. Also, dealing with waiters
without a context is easy with a list, but becomes trickier with a tree. I
think it could be done with an additional counter on the lock itself to
establish the FIFO order of context-less waiters, but it would make the code
harder to follow. I doubt that it's worth it.

(original cover letter below)

The basic idea is to make sure that:

1. All waiters that have a ww_ctx appear in stamp order in the wait list.
   Waiters without a ww_ctx are still supported and appear in FIFO order as
   before.

2. At most one of the waiters can be in a state where it has to check for
   back-off (i.e., ww_ctx->acquired > 0). Technically, there are short time
   windows in which more than one such waiter can be on the list, but all but
   the first one are running. This happens when a new waiter with
   ww_ctx->acquired > 0 adds itself at the front of the list and wakes up the
   previous head of the list, and of course multiple such chained cases can
   be in flight simultaneously.

Then we only ever have to wake up one task at a time. This is _not_ always
the head of the wait list, since there may be waiters without a context. But
among waiters with a context, we only ever have to wake the first one.

To achieve all this, the series adds a new field to mutex_waiter which is
only used for the w/w lock case. As a consequence, calling mutex_lock
directly on w/w locks is now definitely incorrect. That was likely the
intention previously anyway, but grepping through the source I did find one
place that had slipped through.

I've included timings taken from a contention-heavy stress test in some of
the patches. The stress test performs actual GPU operations which take a good
chunk of the wall time, but even so, the series still manages to improve the
wall time quite a bit.

Cheers,
Nicolai
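
P.S. For readers not following the individual patches, below is a rough,
simplified sketch of the kind of stamp-ordered insert described above. It is
illustrative only and not the code from the series: the struct layouts and
the helper name ww_mutex_list_add are made up for the example, and all
locking, wakeup and wound/backoff handling is omitted.

/* Illustrative only -- simplified types, not the actual kernel structures. */
#include <linux/list.h>

struct ww_acquire_ctx {
	unsigned long stamp;		/* lower stamp == older transaction */
};

struct mutex_waiter {
	struct list_head	list;
	struct ww_acquire_ctx	*ww_ctx;	/* NULL for context-less waiters */
};

/*
 * Add 'waiter' to 'wait_list' such that waiters with a context appear in
 * stamp order, while context-less waiters keep their FIFO positions.
 */
static void ww_mutex_list_add(struct mutex_waiter *waiter,
			      struct ww_acquire_ctx *ww_ctx,
			      struct list_head *wait_list)
{
	struct mutex_waiter *cur;
	struct list_head *pos = wait_list;

	waiter->ww_ctx = ww_ctx;

	if (!ww_ctx) {
		/* Plain waiters simply queue up FIFO at the tail. */
		list_add_tail(&waiter->list, wait_list);
		return;
	}

	/*
	 * Walk backwards from the tail, skipping context-less waiters, and
	 * stop at the first waiter whose stamp is older (smaller) than ours.
	 * The new waiter is inserted right after it; if no such waiter
	 * exists, it goes to the front of the list.
	 */
	list_for_each_entry_reverse(cur, wait_list, list) {
		if (!cur->ww_ctx)
			continue;
		if (cur->ww_ctx->stamp < ww_ctx->stamp) {
			pos = &cur->list;
			break;
		}
	}
	list_add(&waiter->list, pos);
}

Walking backwards means a brand-new context (largest stamp so far) usually
terminates the search at the first waiter that has a context, so the common
case stays cheap even with long wait lists.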