Subject: Re: futex(3) man page, final draft for pre-release review
From: "Michael Kerrisk (man-pages)"
To: Torvald Riegel, Darren Hart
Cc: mtk.manpages@gmail.com, Thomas Gleixner, lkml, libc-alpha, linux-man,
    "Carlos O'Donell", Roland McGrath, Davidlohr Bueso, Jakub Jelinek,
    Ingo Molnar, bill o gallmeister, bert hubert, Jan Kiszka, Eric Dumazet,
    Arnd Bergmann, Rusty Russell, Heinrich Schuchardt, Andy Lutomirski,
    Daniel Wagner, Anton Blanchard, Steven Rostedt, Rich Felker,
    Jonathan Wakely, Mike Frysinger
Date: Sat, 19 Dec 2015 07:56:27 +0100
Message-ID: <5674FF9B.7070002@gmail.com>
In-Reply-To: <1450437714.26597.53.camel@localhost.localdomain>
References: <56701916.4090203@gmail.com> <20151215211816.GR11972@malice.jf.intel.com> <1450437714.26597.53.camel@localhost.localdomain>

On 12/18/2015 12:21 PM, Torvald Riegel wrote:
> On Tue, 2015-12-15 at 13:18 -0800, Darren Hart wrote:
>> On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
>>>
>>> When executing a futex operation that requests to block a thread,
>>> the kernel will block only if the futex word has the value that
>>> the calling thread supplied (as one of the arguments of the
>>> futex() call) as the expected value of the futex word.
>>> The loading of the futex word's value, the comparison of that value with
>>> the expected value, and the actual blocking will happen atomically
>>>
>>> FIXME: for next line, it would be good to have an explanation of
>>> "totally ordered" somewhere around here.
>>>
>>> and totally ordered with respect to concurrently executing
>>
>> Totally ordered with respect to futex operations refers to semantics of the
>> ACQUIRE/RELEASE operations and how they impact ordering of memory reads and
>> writes. The kernel futex operations are protected by spinlocks, which ensure
>> that all operations are serialized with respect to one another.
>>
>> This is a lot to attempt to define in this document. Perhaps a reference to
>> linux/Documentation/memory-barriers.txt as a footnote would be sufficient? Or
>> perhaps for this manual, "serialized" would be sufficient, with a footnote
>> regarding "totally ordered" and a pointer to the memory-barrier documentation?
>
> I'd strongly prefer to document the semantics for users here.

Yes, please.

> And I
> don't think users use the kernel's memory model -- instead, if we assume
> that most users will call futex ops from C or C++, then the best we have
> is the C11 / C++11 memory model.

Agreed.

> Therefore, if we want to expand that, I think

we should. And by we, I mean you ;-)

> we should specify semantics in terms of as-if equivalence to C11 pseudo
> code. I had proposed that in the past but, IIRC, Michael didn't want to
> add a C11 "dependency" in the semantics back then, at least for the
> initial release.

I'd like to avoid it if possible, since many of us don't understand
all the details of those C11 semantics--and by us, I mean me :-/.
But maybe I'll be forced to educate myself better.
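
To make the compare-and-block rule from the quoted man page text concrete, here is a minimal sketch, assuming a Linux system where SYS_futex is defined. The helper names futex_op(), try_futex_wait() and wake_waiters() are made up for this example; glibc exports no futex() wrapper, so the call goes through syscall(2). It only shows the non-blocking outcomes and says nothing about memory ordering.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw futex system call. */
static long futex_op(uint32_t *uaddr, int op, uint32_t val)
{
    assert(uaddr != NULL);
    /* No timeout, no second futex word, no val3. */
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/*
 * Ask the kernel to block only if *word still equals expected.
 * Returns 0 after a wake-up, or the errno value if the call failed
 * (EAGAIN when the word did not hold the expected value).
 */
int try_futex_wait(uint32_t *word, uint32_t expected)
{
    if (futex_op(word, FUTEX_WAIT, expected) == -1)
        return errno;
    return 0;
}

/* Wake at most n waiters; returns how many were actually woken. */
long wake_waiters(uint32_t *word, uint32_t n)
{
    return futex_op(word, FUTEX_WAKE, n);
}
```

With a word holding 1, try_futex_wait(&word, 0) returns EAGAIN instead of blocking, because the value the caller expected is no longer there. wake_waiters(&word, 1) returns 0 when nothing is blocked on the word.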
> Here's what I wrote back then (atomic_*_relaxed() is like C11
> atomic_*(..., memory_order_relaxed), lock/unlock have normal C11 mutex
> semantics):
>
> ========================
>
> For example, we could say that futex_wait is, in terms of
> synchronization semantics, *as if* we'd execute a piece of C11 code.
> Here's a part of the docs for a glibc-internal futex wrapper that I'm
> working on; this is futex_wait ... :
>
> /* Atomically wrt other futex operations, this blocks iff the value at
> *FUTEX matches the expected value. This is semantically equivalent to:
>   l = (FUTEX);
>   wait_flag = (FUTEX);
>   lock (l);
>   val = atomic_load_relaxed (FUTEX);
>   if (val != expected) { unlock (l); return EAGAIN; }
>   atomic_store_relaxed (wait_flag, 1);
>   unlock (l);
>   // Now block; can time out in futex_time_wait (see below)
>   while (atomic_load_relaxed(wait_flag));
>
> Note that no guarantee of a happens-before relation between a woken
> futex_wait and a futex_wake is documented; however, this does not matter
> in practice because we have to consider spurious wake-ups (see below),
> and thus would not be able to reason which futex_wake woke us anyway.
>
>
> ... and this is futex_wake:
>
> /* Atomically wrt other futex operations, this unblocks the specified
> number of processes, or all processes blocked on this futex if there are
> fewer than the specified number. Semantically, this is equivalent to:
>   l = (futex);
>   lock (l);
>   for (res = 0; processes_to_wake > 0; processes_to_wake--, res++) {
>     if () break;
>     wf = (futex);
>     // No happens-before guarantee with woken futex_wait (see above)
>     atomic_store_relaxed (wf, 0);
>   }
>   return res;
>
> This allows a programmer to really infer the guarantees he/she can get
> from a futex in terms of synchronization, without the docs having to use
> prose to describe that.
> This should also not constrain the kernel in
> terms of how to implement it, because it is a conceptual as-if relation
> (e.g., the kernel won't spin-wait the whole time, and we might want to
> make this clear for the PI case).
>
> Of course, there are several as-if representations we could use, and we
> might want to be a bit more pseudo-code-ish to make this also easy to
> understand for people not familiar with C11 (e.g., using mutex + condvar
> with some relaxation of condvar guarantees).

Okay -- I'm open to all of the above.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
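
As one possible mutex + condvar rendering of the as-if idea discussed in this thread, here is a sketch only. struct model_futex and the model_futex_*() functions are hypothetical names, not kernel or glibc interfaces. The ticket counters are an assumption of this sketch, added so that a late waiter cannot consume a wake-up meant for an earlier one. The model is also stricter than a real futex in three ways: the mutex provides ordering between waker and woken thread that the relaxed wait flag in the quoted pseudo-code does not, wake-ups are FIFO, and there are no spurious wake-ups.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model type; not a kernel or glibc structure. */
struct model_futex {
    _Atomic unsigned int word;   /* the futex word itself */
    pthread_mutex_t lock;        /* serializes the modeled futex operations */
    pthread_cond_t cond;         /* stands in for the kernel wait queue */
    unsigned long next_ticket;   /* ticket given to the next blocking waiter */
    unsigned long wake_upto;     /* waiters with ticket < wake_upto may return */
};

void model_futex_init(struct model_futex *f, unsigned int value)
{
    assert(f != NULL);
    atomic_init(&f->word, value);
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->cond, NULL);
    f->next_ticket = 0;
    f->wake_upto = 0;
}

/* Blocks iff the word equals expected; the check and the enqueue
   happen under one lock, so no wake can slip in between them. */
int model_futex_wait(struct model_futex *f, unsigned int expected)
{
    assert(f != NULL);
    pthread_mutex_lock(&f->lock);
    if (atomic_load_explicit(&f->word, memory_order_relaxed) != expected) {
        pthread_mutex_unlock(&f->lock);
        return EAGAIN;
    }
    unsigned long ticket = f->next_ticket++;
    while (ticket >= f->wake_upto)          /* loop absorbs condvar spurious wake-ups */
        pthread_cond_wait(&f->cond, &f->lock);
    pthread_mutex_unlock(&f->lock);
    return 0;
}

/* Releases up to nr_wake blocked waiters; returns how many were released. */
int model_futex_wake(struct model_futex *f, int nr_wake)
{
    assert(f != NULL);
    int woken = 0;
    pthread_mutex_lock(&f->lock);
    while (woken < nr_wake && f->wake_upto < f->next_ticket) {
        f->wake_upto++;
        woken++;
    }
    if (woken > 0)
        pthread_cond_broadcast(&f->cond);
    pthread_mutex_unlock(&f->lock);
    return woken;
}
```

A caller would typically store a new value into word before calling model_futex_wake(), so that a thread arriving afterwards sees the mismatch and gets EAGAIN rather than blocking.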