Message-ID: <1450437714.26597.53.camel@localhost.localdomain>
Subject: Re: futex(3) man page, final draft for pre-release review
From: Torvald Riegel <triegel@redhat.com>
To: Darren Hart <dvhart@infradead.org>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        lkml <linux-kernel@vger.kernel.org>,
        libc-alpha <libc-alpha@sourceware.org>,
        linux-man <linux-man@vger.kernel.org>,
        "Carlos O'Donell" <carlos@redhat.com>,
        Roland McGrath <roland@hack.frob.com>,
        Davidlohr Bueso <dave@stgolabs.net>, Jakub Jelinek <jakub@redhat.com>,
        Ingo Molnar <mingo@elte.hu>,
        bill o gallmeister <bgallmeister@gmail.com>,
        bert hubert <bert.hubert@netherlabs.nl>,
        Jan Kiszka <jan.kiszka@siemens.com>,
        Eric Dumazet <edumazet@google.com>, Arnd Bergmann <arnd@arndb.de>,
        Rusty Russell <rusty@rustcorp.com.au>,
        Heinrich Schuchardt <xypron.glpk@gmx.de>,
        Andy Lutomirski <luto@amacapital.net>, Daniel Wagner <wagi@monom.org>,
        Anton Blanchard <anton@samba.org>,
        Steven Rostedt <rostedt@goodmis.org>, Rich Felker <dalias@libc.org>,
        Jonathan Wakely <jwakely@redhat.com>,
        Mike Frysinger <vapier@gentoo.org>
Date: Fri, 18 Dec 2015 12:21:54 +0100
In-Reply-To: <20151215211816.GR11972@malice.jf.intel.com>
References: <56701916.4090203@gmail.com>
	 <20151215211816.GR11972@malice.jf.intel.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4822
Lines: 99

On Tue, 2015-12-15 at 13:18 -0800, Darren Hart wrote:
> On Tue, Dec 15, 2015 at 02:43:50PM +0100, Michael Kerrisk (man-pages) wrote:
> > 
> >        When executing a futex operation that requests to block a thread,
> >        the kernel will block only if the futex word has the  value  that
> >        the  calling  thread  supplied  (as  one  of the arguments of the
> >        futex() call) as the expected value of the futex word.  The load‐
> >        ing  of the futex word's value, the comparison of that value with
> >        the expected value, and the actual blocking  will  happen  atomi‐
> > 
> > FIXME: for next line, it would be good to have an explanation of
> > "totally ordered" somewhere around here.
> > 
> >        cally  and totally ordered with respect to concurrently executing
> 
> Totally ordered with respect to futex operations refers to semantics of the
> ACQUIRE/RELEASE operations and how they impact ordering of memory reads and
> writes. The kernel futex operations are protected by spinlocks, which ensure
> that all operations are serialized with respect to one another.
> 
> This is a lot to attempt to define in this document. Perhaps a reference to
> linux/Documentation/memory-barriers.txt as a footnote would be sufficient? Or
> perhaps for this manual, "serialized" would be sufficient, with a footnote
> regarding "totally ordered" and a pointer to the memory-barrier documentation?

I'd strongly prefer to document the semantics for users here.  And I
don't think users use the kernel's memory model -- instead, if we assume
that most users will call futex ops from C or C++, then the best we have
is the C11 / C++11 memory model.  Therefore, if we want to expand on that,
we should specify semantics in terms of as-if equivalence to C11 pseudo
code.  I had proposed that in the past but, IIRC, Michael didn't want to
add a C11 "dependency" in the semantics back then, at least for the
initial release.

Here's what I wrote back then (atomic_*_relaxed() is like C11
atomic_*_explicit(..., memory_order_relaxed); lock/unlock have normal
C11 mutex semantics):

========================

For example, we could say that futex_wait is, in terms of
synchronization semantics, *as if* we'd execute a piece of C11 code.
Here's a part of the docs for a glibc-internal futex wrapper that I'm
working on; this is futex_wait ... :

/* Atomically wrt other futex operations, this blocks iff the value at
   *FUTEX matches the expected value.  This is semantically equivalent to: 
     l = <get lock associated with futex> (FUTEX);
     wait_flag = <get wait_flag associated with futex> (FUTEX);
     lock (l);
     val = atomic_load_relaxed (FUTEX);
     if (val != expected) { unlock (l); return EAGAIN; }
     atomic_store_relaxed (wait_flag, 1);
     unlock (l);
     // Now block; can time out in futex_time_wait (see below)
     while (atomic_load_relaxed(wait_flag));

   Note that no guarantee of a happens-before relation between a woken
   futex_wait and a futex_wake is documented; however, this does not matter
   in practice because we have to consider spurious wake-ups (see below),
   and thus would not be able to reason which futex_wake woke us anyway.
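
To make the as-if relation above concrete, here is a minimal C11 sketch
of the futex_wait model.  It is not how the kernel or glibc implements
anything.  The names struct futex_model, model_lock, model_unlock,
model_futex_wait, and the demo functions are hypothetical.  Two further
assumptions: a test-and-set spinlock stands in for the mutex "l", and a
spin budget (max_spins) stands in for the timeout of the timed variant.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shorthand from the mail, spelled out with C11 <stdatomic.h>.  */
#define atomic_load_relaxed(p) \
  atomic_load_explicit ((p), memory_order_relaxed)
#define atomic_store_relaxed(p, v) \
  atomic_store_explicit ((p), (v), memory_order_relaxed)

/* Hypothetical model state.  The pseudocode looks the lock and the wait
   flag up from the futex address; here they simply sit next to the word.  */
struct futex_model
{
  atomic_int word;       /* the futex word itself */
  atomic_flag lock;      /* stand-in for the mutex "l" */
  atomic_int wait_flag;  /* nonzero while the single waiter is blocked */
};

/* Assumption: a spinlock with acquire/release ordering replaces a C11
   mtx_t so that the sketch needs no threading library.  */
static void
model_lock (struct futex_model *f)
{
  while (atomic_flag_test_and_set_explicit (&f->lock, memory_order_acquire))
    ;
}

static void
model_unlock (struct futex_model *f)
{
  atomic_flag_clear_explicit (&f->lock, memory_order_release);
}

/* As-if model of futex_wait.  Returns 0 when woken, EAGAIN on a value
   mismatch, ETIMEDOUT when the spin budget runs out.  A negative
   MAX_SPINS means wait without a timeout.  */
static int
model_futex_wait (struct futex_model *f, int expected, long max_spins)
{
  assert (f != NULL);
  model_lock (f);
  int val = atomic_load_relaxed (&f->word);
  if (val != expected)
    {
      model_unlock (f);
      return EAGAIN;
    }
  atomic_store_relaxed (&f->wait_flag, 1);
  model_unlock (f);

  /* Now block.  Only relaxed loads are used, so waking up establishes no
     happens-before relation with the waker.  */
  while (atomic_load_relaxed (&f->wait_flag))
    if (max_spins >= 0 && max_spins-- == 0)
      {
        /* Timed out: stop waiting, serialized with other operations.  */
        model_lock (f);
        atomic_store_relaxed (&f->wait_flag, 0);
        model_unlock (f);
        return ETIMEDOUT;
      }
  return 0;
}

/* The word holds 1, so expecting 2 must fail without blocking.  */
int
demo_mismatch (void)
{
  struct futex_model f
    = { .word = 1, .lock = ATOMIC_FLAG_INIT, .wait_flag = 0 };
  return model_futex_wait (&f, 2, 0);
}

/* The word matches and nobody wakes us, so the spin budget expires.  */
int
demo_timeout (void)
{
  struct futex_model f
    = { .word = 1, .lock = ATOMIC_FLAG_INIT, .wait_flag = 0 };
  return model_futex_wait (&f, 1, 1000);
}
```

The two demo functions only exercise the paths that can run in a single
thread: the value mismatch and the timeout.  The woken path needs a
second thread that clears wait_flag.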


... and this is futex_wake:

/* Atomically wrt other futex operations, this unblocks the specified
   number of processes, or all processes blocked on this futex if there are
   fewer than the specified number.  Semantically, this is equivalent to:
     l = <get lock associated with futex> (futex);
     lock (l);
     for (res = 0; processes_to_wake > 0; processes_to_wake--, res++) {
       if (<no process blocked on futex>) break;
       wf = <get wait_flag of a process blocked on futex> (futex);
       // No happens-before guarantee with woken futex_wait (see above)
       atomic_store_relaxed (wf, 0);
     }
     unlock (l);
     return res;
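
The futex_wake model can be sketched in C11 along the same lines.
Again, this is only an illustration of the as-if semantics, under stated
assumptions: struct futex_model, MAX_WAITERS, model_futex_wake, and
demo_wake are hypothetical names, a fixed array of wait flags stands in
for "the processes blocked on the futex", and a spinlock stands in for
the mutex.  The sketch releases the lock before returning, which the
model needs so that later futex operations can proceed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shorthand from the mail, spelled out with C11 <stdatomic.h>.  */
#define atomic_load_relaxed(p) \
  atomic_load_explicit ((p), memory_order_relaxed)
#define atomic_store_relaxed(p, v) \
  atomic_store_explicit ((p), (v), memory_order_relaxed)

#define MAX_WAITERS 4  /* hypothetical fixed capacity of the model */

/* Hypothetical model state: one lock per futex, one wait flag per
   potential waiter.  A flag value of 1 means that waiter is blocked.  */
struct futex_model
{
  atomic_flag lock;
  atomic_int wait_flag[MAX_WAITERS];
};

/* Assumption: a spinlock with acquire/release ordering replaces a C11
   mtx_t so that the sketch needs no threading library.  */
static void
model_lock (struct futex_model *f)
{
  while (atomic_flag_test_and_set_explicit (&f->lock, memory_order_acquire))
    ;
}

static void
model_unlock (struct futex_model *f)
{
  atomic_flag_clear_explicit (&f->lock, memory_order_release);
}

/* As-if model of futex_wake.  Unblocks up to PROCESSES_TO_WAKE waiters
   and returns how many were actually unblocked.  */
int
model_futex_wake (struct futex_model *f, int processes_to_wake)
{
  int res = 0;
  assert (processes_to_wake >= 0);
  model_lock (f);
  for (size_t i = 0; i < MAX_WAITERS && processes_to_wake > 0; i++)
    {
      if (!atomic_load_relaxed (&f->wait_flag[i]))
        continue;  /* nobody is blocked in slot i */
      /* Relaxed store: no happens-before guarantee with the woken waiter.  */
      atomic_store_relaxed (&f->wait_flag[i], 0);
      processes_to_wake--;
      res++;
    }
  model_unlock (f);
  return res;
}

/* Two waiters are marked as blocked, then up to N of them are woken.  */
int
demo_wake (int n)
{
  struct futex_model f = { .lock = ATOMIC_FLAG_INIT };
  atomic_store_relaxed (&f.wait_flag[0], 1);
  atomic_store_relaxed (&f.wait_flag[2], 1);
  return model_futex_wake (&f, n);
}
```

With two blocked waiters, demo_wake (1) wakes one of them, and
demo_wake (5) wakes both and reports 2 because there are fewer waiters
than requested.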

This allows a programmer to really infer the guarantees he/she can get
from a futex in terms of synchronization, without the docs having to use
prose to describe that.  This should also not constrain the kernel in
terms of how to implement it, because it is a conceptual as-if relation
(e.g., the kernel won't spin-wait the whole time, and we might want to
make this clear for the PI case).

Of course, there are several as-if representations we could use, and we
might want to be a bit more pseudo-code-ish to make this also easy to
understand for people not familiar with C11 (e.g., using mutex + condvar
with some relaxation of condvar guarantees).
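
As a rough illustration of the mutex + condvar alternative, here is a
sketch using POSIX threads.  All names (struct cv_futex, CV_FUTEX_INIT,
cv_futex_wait, cv_futex_wake, and the demo functions) are hypothetical.
Note the caveat: as written, the mutex gives a woken waiter a
happens-before relation with its waker, which is stronger than what is
documented for futexes.  A description based on this form would have to
state explicitly that this guarantee is not provided.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical model state; none of these names come from glibc or the
   kernel.  */
struct cv_futex
{
  atomic_int word;        /* the futex word */
  pthread_mutex_t mutex;  /* serializes futex operations on this word */
  pthread_cond_t cond;    /* where waiters sleep */
  unsigned waiters;       /* threads currently inside cv_futex_wait */
  unsigned wakeups;       /* wake-ups handed out but not yet consumed */
};

#define CV_FUTEX_INIT(v) \
  { (v), PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* Blocks iff the word equals EXPECTED; returns EAGAIN otherwise.  */
int
cv_futex_wait (struct cv_futex *f, int expected)
{
  pthread_mutex_lock (&f->mutex);
  if (atomic_load_explicit (&f->word, memory_order_relaxed) != expected)
    {
      pthread_mutex_unlock (&f->mutex);
      return EAGAIN;
    }
  f->waiters++;
  /* Deliberately no predicate loop: a spurious condvar return models a
     spurious futex wake-up, which callers have to tolerate anyway.  */
  pthread_cond_wait (&f->cond, &f->mutex);
  if (f->wakeups > 0)
    f->wakeups--;
  f->waiters--;
  pthread_mutex_unlock (&f->mutex);
  return 0;
}

/* Wakes up to PROCESSES_TO_WAKE waiters; returns how many were woken.  */
int
cv_futex_wake (struct cv_futex *f, int processes_to_wake)
{
  int res = 0;
  assert (processes_to_wake >= 0);
  pthread_mutex_lock (&f->mutex);
  /* Only waiters that have not already been handed a wake-up count.  */
  while (res < processes_to_wake && f->wakeups < f->waiters)
    {
      f->wakeups++;
      pthread_cond_signal (&f->cond);
      res++;
    }
  pthread_mutex_unlock (&f->mutex);
  return res;
}

/* The word holds 1, so expecting 2 must fail without blocking.  */
int
demo_cv_mismatch (void)
{
  struct cv_futex f = CV_FUTEX_INIT (1);
  return cv_futex_wait (&f, 2);
}

/* Nobody is waiting, so nothing can be woken.  */
int
demo_cv_wake_none (void)
{
  struct cv_futex f = CV_FUTEX_INIT (0);
  return cv_futex_wake (&f, 3);
}
```

The demo functions cover only the non-blocking paths.  Exercising the
blocking path needs one thread in cv_futex_wait and another calling
cv_futex_wake.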

=========================

I will go through the discussion pointed out by Davidlohr next.
