Subject: Re: futex: clarification needed with drop_futex_key_refs and memory barriers
From: Mike Galbraith
To: Petros Koutoupis, linux-kernel@vger.kernel.org
Cc: "Paul E. McKenney", Thomas Gleixner, Peter Zijlstra
Date: Sun, 27 Mar 2016 08:25:48 +0200

(futex ordering pop-flare)

On Sat, 2016-03-26 at 10:56 -0500, Petros Koutoupis wrote:
> I stumbled on an interesting scenario which I am unable to fully explain, and I
> was hoping to get some other opinions on why this would or wouldn't work.
> 
> In recent testing on a 48-core Haswell server, our multi-threaded user-space
> application was using 60% to 100% more CPU than on our smaller 24-core servers
> (running an identical load). After spending a considerable amount of time analyzing
> stack dumps and straces, it became apparent that the threads showing the higher
> CPU utilization were off in futex land.
> 
> Shortly afterward I stumbled on commit 76835b0ebf8a7fe85beb03c75121419a7dec52f0
> ("futex: Ensure get_futex_key_refs() always implies a barrier"), which addressed
> the handling of private futexes, preventing a race condition by completing the
> function with a memory barrier. Now, I fully understand why this patch was
> implemented: to have a memory barrier before checking the "waiters". It makes
> sense. What doesn't make sense (so far) is that when I apply the same change to
> the drop counterpart, drop_futex_key_refs(), the problem goes away. See the
> change and my notes below.
> 
> 
> --- linux/kernel/futex.c.orig	2016-03-25 19:45:08.169563263 -0500
> +++ linux/kernel/futex.c	2016-03-25 19:46:06.901562211 -0500
> @@ -438,11 +438,13 @@ static void drop_futex_key_refs(union fu
>  
>  	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
>  	case FUT_OFF_INODE:
> -		iput(key->shared.inode);
> +		iput(key->shared.inode); /* implies smp_mb(); (B) */
>  		break;
>  	case FUT_OFF_MMSHARED:
> -		mmdrop(key->private.mm);
> +		mmdrop(key->private.mm); /* implies smp_mb(); (B) */
>  		break;
> +	default:
> +		smp_mb(); /* explicit smp_mb(); (B) */
>  	}
>  }
> 
> 
> The iput() and mmdrop() routines in the switch statement eventually use
> atomic_dec_and_test(), which according to Documentation/memory-barriers.txt
> implies an smp_mb() on each side of the actual operation. Notice that private
> futexes aren't handled by this switch (read below).
> 
> Now there is a wrapper, put_futex_key(), which is called in a few functions as
> a way to clean up before retrying, but in every case, and before it is called,
> a check is made to see whether the futex is private and, if so, the code
> retries at a more appropriate point in its respective function.
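
For anyone following along, here is a minimal userspace sketch of the store/load
pairing that the (A)/(B) barrier comments are about. All names (futex_word,
waiters, waiter, waker) are invented; this is only a model of the ordering, not
the kernel code:

/* Illustrative only -- a userspace model of the futex wait/wake ordering. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int futex_word;	/* the futex value the waiter checks    */
static atomic_int waiters;	/* analogue of the hash-bucket waiters  */

static void *waiter(void *arg)
{
	(void)arg;
	atomic_fetch_add_explicit(&waiters, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* barrier (A) */
	if (atomic_load_explicit(&futex_word, memory_order_relaxed) == 0)
		puts("waiter: value unchanged, would sleep");
	else
		puts("waiter: value changed, would retry");
	return NULL;
}

static void *waker(void *arg)
{
	(void)arg;
	atomic_store_explicit(&futex_word, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* barrier (B) */
	if (atomic_load_explicit(&waiters, memory_order_relaxed))
		puts("waker: waiter visible, would wake it");
	else
		puts("waker: no waiters seen, skipping the wakeup");
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, waiter, NULL);
	pthread_create(&b, NULL, waker, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

With both fences present, at least one side must observe the other's store:
either the waiter sees the new value and retries, or the waker sees the waiter
and wakes it. Drop either fence and both loads can return stale values, which
is the lost wakeup that the barrier in get_futex_key_refs() guards against.
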
> 
> Now I have found two functions where this type of check/protection isn't made,
> and I am curious whether I have stumbled on what could potentially lead to a
> race condition in a large SMP environment. Please refer to futex_wait() (called
> indirectly via unqueue_me()) and futex_requeue().
> 
> Any thoughts or opinions would be greatly appreciated. Thank you in advance.
> 
> --
> Petros
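
The asymmetry on the drop side can be modelled the same way. The sketch below
uses invented names (fake_key, drop_key_refs) and only illustrates the point
that an atomic RMW already acts as a full barrier while the private path has no
atomic op at all; it is not the actual kernel code:

/* Illustrative only -- userspace model of the patch above. */
#include <stdatomic.h>
#include <stdio.h>

enum key_type { KEY_PRIVATE, KEY_INODE, KEY_MMSHARED };

struct fake_key {
	enum key_type type;
	atomic_int *refcount;	/* inode/mm refcount analogue, NULL if private */
};

static void drop_key_refs(struct fake_key *key)
{
	switch (key->type) {
	case KEY_INODE:
	case KEY_MMSHARED:
		/* seq_cst RMW: full barrier on both sides, standing in for
		 * the atomic_dec_and_test() inside iput()/mmdrop(). */
		if (atomic_fetch_sub(key->refcount, 1) == 1)
			puts("last reference dropped");
		break;
	case KEY_PRIVATE:
		/* No atomic op on this path, so any ordering has to be
		 * explicit -- the analogue of the added default: smp_mb(). */
		atomic_thread_fence(memory_order_seq_cst);
		break;
	}
}

int main(void)
{
	atomic_int inode_ref = 2;
	struct fake_key shared = { KEY_INODE, &inode_ref };
	struct fake_key priv   = { KEY_PRIVATE, NULL };

	drop_key_refs(&shared);
	drop_key_refs(&priv);
	printf("inode refcount is now %d\n", atomic_load(&inode_ref));
	return 0;
}

atomic_fetch_sub() with its default seq_cst ordering stands in for the barrier
implied by atomic_dec_and_test() in iput()/mmdrop(); the explicit fence in the
private case mirrors the added default: smp_mb() in the patch quoted above.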