MIME-Version: 1.0
In-Reply-To: <CAPM31RJwFJXjxLnwFMCBE1=wVXyU06ttrT-_fNr1rb=+ewxVrg@mail.gmail.com>
References: <20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com>
 <CALCETrXAtYDZBbpwZceFyhLOnqFmTDqTxhGfbrrVrY+34cxSFg@mail.gmail.com>
 <CAPM31R+GUtD_9S+m7U0DGpeqSCT1n98bvQ0NUOwMHX7-CoKigQ@mail.gmail.com>
 <842897619.3710.1435281350583.JavaMail.zimbra@efficios.com> <CAPM31RJwFJXjxLnwFMCBE1=wVXyU06ttrT-_fNr1rb=+ewxVrg@mail.gmail.com>
From: Andy Lutomirski <luto@amacapital.net>
Date: Sat, 27 Jun 2015 09:25:02 -0700
Message-ID: <CALCETrVRE+6nM6DGa8ph84cX+CdRXh+qXyReU+jHgx9-+uCTyg@mail.gmail.com>
Subject: Re: [RFC PATCH 0/3] restartable sequences: fast user-space percpu
 critical sections
To: Paul Turner <pjt@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
        Peter Zijlstra <peterz@infradead.org>,
        "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
        Andrew Hunter <ahh@google.com>, Andi Kleen <andi@firstfloor.org>,
        Lai Jiangshan <laijs@cn.fujitsu.com>,
        linux-api <linux-api@vger.kernel.org>,
        LKML <linux-kernel@vger.kernel.org>, rostedt <rostedt@goodmis.org>,
        Josh Triplett <josh@joshtriplett.org>, Ingo Molnar <mingo@redhat.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Chris Lameter <cl@linux.com>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org

Let me try to summarize some of the approaches with their pros and cons:

--- percpu segment ---

This is probably the simplest and might make sense regardless.
cmpxchg can be used to do an atomic push onto a linked list.  I think
that unlocked cmpxchg16b can be used to get an atomic pop.  (You'd
have the list head pointer next to an auxiliary pointer to the second
element in the list, perhaps.)
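
A rough sketch of the push side follows. It assumes C11 atomics and uses glibc's sched_getcpu() as a stand-in for gs-relative access to a percpu segment. MAX_CPUS, struct node and heads[] are illustrative names, not an existing ABI. Pop is the harder direction. A plain CAS-based pop is exposed to ABA, because the head can be popped and re-pushed between reading head->next and the CAS. That is why pop needs a wider compare such as cmpxchg16b.

```c
#define _GNU_SOURCE
#include <sched.h>      /* sched_getcpu() */
#include <stdatomic.h>

#define MAX_CPUS 1024   /* assumption: upper bound on cpu numbers */

struct node {
    struct node *next;
    int value;
};

/* One list head per cpu.  With a percpu segment, "this cpu's head" would
 * be a gs-relative address; sched_getcpu() stands in for that here. */
static _Atomic(struct node *) heads[MAX_CPUS];

static int current_cpu(void)
{
    int cpu = sched_getcpu();
    return (cpu < 0 || cpu >= MAX_CPUS) ? 0 : cpu;
}

/* Lock-free push: retry the CAS until the head we linked to is still the
 * head.  If the task migrates after current_cpu(), the node just lands on
 * another cpu's list, which is still consistent. */
static void push(struct node *n)
{
    _Atomic(struct node *) *head = &heads[current_cpu()];
    struct node *old = atomic_load_explicit(head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(head, &old, n,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```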

You can also use this for limited forms of speculative locking.
Aborting cleanly if your lock is stolen might require the kernel's
help, though (you're now on the wrong cpu, so you can't atomically
poke the lock variable any more).

The ABI is straightforward, and the only limitation on multiple users
in the same process is that they need to coordinate their offsets into
the percpu segment.
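
One way for libraries to coordinate is a single process-wide allocator that hands out non-overlapping offsets. The sketch below is hypothetical: percpu_reserve(), PERCPU_AREA_SIZE and the bump-allocation scheme are assumptions, not part of any proposed ABI.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical size of each cpu's slice of the percpu segment. */
#define PERCPU_AREA_SIZE 4096

/* Next free byte offset, shared by every library in the process. */
static atomic_size_t percpu_next_offset;

/* Hypothetical helper: reserve `size` bytes aligned to `align` (a power of
 * two) and return the offset, or (size_t)-1 if the area is exhausted.
 * A library calls this once at init and uses the same offset on every cpu. */
static size_t percpu_reserve(size_t size, size_t align)
{
    size_t old = atomic_load(&percpu_next_offset);
    size_t start;
    do {
        start = (old + align - 1) & ~(align - 1);
        if (start > PERCPU_AREA_SIZE || size > PERCPU_AREA_SIZE - start)
            return (size_t)-1;
    } while (!atomic_compare_exchange_weak(&percpu_next_offset, &old,
                                           start + size));
    return start;
}
```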

--- vdso-provided atomic ops ---

This could be quite flexible.  The upside is that the ABI would be
straightforward (call a function with clearly-specified behavior).
The downside is that implementing it well might require percpu
segments and a certain amount of coordination, and it requires a
function call.

One nice thing about doing it in the vdso is that we can change the
implementation down the road.
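
To make the "clearly-specified behavior" point concrete, here is a hypothetical percpu add. Its contract is only "add delta to some cpu's slot; the sum of all slots is exact". The function pointer stands in for a symbol the vdso would export. The fallback uses sched_getcpu() plus an atomic add, so a migration between the two steps costs locality, not correctness. All names are made up for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>      /* sched_getcpu() */
#include <stdatomic.h>

#define MAX_CPUS 1024   /* assumption: upper bound on cpu numbers */

static atomic_long counters[MAX_CPUS];

/* Hypothetical contract: add `delta` to one cpu's slot. */
typedef void (*percpu_add_fn)(long delta);

/* Portable fallback: pick a slot by cpu number, then add atomically so a
 * migration between sched_getcpu() and the add is harmless. */
static void percpu_add_fallback(long delta)
{
    int cpu = sched_getcpu();
    if (cpu < 0 || cpu >= MAX_CPUS)
        cpu = 0;
    atomic_fetch_add_explicit(&counters[cpu], delta, memory_order_relaxed);
}

/* Stand-in for a vdso-provided symbol; callers only ever go through this. */
static percpu_add_fn percpu_add = percpu_add_fallback;

/* Reader side: the exact total is the sum over all slots. */
static long percpu_sum(void)
{
    long sum = 0;
    for (int i = 0; i < MAX_CPUS; i++)
        sum += atomic_load_explicit(&counters[i], memory_order_relaxed);
    return sum;
}
```

Because callers depend only on the contract, the kernel could later swap in a cheaper implementation, for example one built on percpu segments, without recompiling them.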

--- kernel preemption hooks ---

I'm defining a preemption hook as an action taken by the kernel when a
user task is preempted during a critical section.

As an upside, we get extremely efficient, almost arbitrary percpu
operations.  We don't need to worry about memory ordering at all,
because the whole sequence aborts if anything else might run on the
same cpu.  Push and pop are both easy.

One con is that actually defining where the critical section is might
be nasty.  If there's a single IP range, then two libraries could
fight over it.  We could have a variable somewhere that you write to
in order to arm the critical section, but that's a bit slower, since
every entry then costs an extra store.
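
Here is a model of the "arm it with a variable" variant. It assumes each library publishes a descriptor for its own section, so nothing forces a single process-wide IP range. struct cs_desc, armed_cs and preempt_fixup() are hypothetical. A real kernel would read the user's arming variable at preemption time rather than share a C variable with it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: one per critical section, owned by the library
 * that defines the section. */
struct cs_desc {
    uintptr_t start_ip;   /* first instruction of the critical section */
    uintptr_t commit_ip;  /* instruction after the final (committing) store */
    uintptr_t abort_ip;   /* where execution resumes after an abort */
};

/* Per-thread arming variable: user code stores a descriptor pointer here
 * before entering the section.  This store is the extra per-entry cost. */
static _Thread_local const struct cs_desc *armed_cs;

/* What a preemption hook might do when it preempts the thread at `ip`:
 * if the armed section covers ip, resume at the abort handler instead. */
static uintptr_t preempt_fixup(uintptr_t ip)
{
    const struct cs_desc *cs = armed_cs;
    if (cs && ip >= cs->start_ip && ip < cs->commit_ip)
        return cs->abort_ip;
    return ip;
}
```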

Another con is that you can't single-step through this type of
critical section.  It will be preempted every time.

--- kernel migration hooks ---

I'm not sure whether Paul or Mathieu discussed this, but another option
would be to have some special handling if a task is migrated during a
critical section or to allow a task to prevent migration entirely
during a critical section.  From the user's point of view, this is
weaker than preemption hooks: your task can start its critical
section, be preempted while another thread enters its own critical
section on the same cpu, and then be rescheduled on that cpu without
aborting.  Users would have to use local atomics (like cmpxchg) to
make it useful.

As a major advantage, single-stepping still works.

This shares the coordination downside with preemption hooks (users
have to tell the kernel about their critical sections somehow).

Push can certainly be implemented using cmpxchg.  The gs prefix isn't
even needed.  Pop might be harder to implement directly without
resorting to cmpxchg16b or similar.

--- Unnamed trick ---

On entry to a critical section, try to take a per-cpu lock that stores
the holder's tid.  This might require percpu segments.

If you get the lock, then start doing your thing.  For example, you
could pop by reading head->next and writing it back to head.

If, however, you miss the lock, then you need to either wait or
forcibly abort the lock holder.  You could do the latter by sending a
signal or possibly using a new syscall that atomically aborts the lock
holder and takes the lock.  Either way, you don't need to wait for the
holder to finish its critical section: all you need to do is queue the
signal and, if the lock holder is actually running, wait for signal
delivery to start.
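
Here is a sketch of the lock-acquisition and pop steps, assuming one such struct per cpu and C11 atomics. The abort path is deliberately not shown, whether that means signalling the holder or using the new syscall suggested above. struct percpu_list, try_lock() and pop_locked() are illustrative names only.

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>  /* SYS_gettid */
#include <sys/types.h>
#include <unistd.h>       /* syscall() */

struct node { struct node *next; };

/* Hypothetical per-cpu state: lock word holding the owner's tid (0 = free)
 * and a list head that only the lock holder touches. */
struct percpu_list {
    _Atomic pid_t owner;
    struct node *head;
};

static pid_t my_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Try to take the lock.  On failure, report the holder's tid so the caller
 * can wait or abort the holder (abort path not shown). */
static int try_lock(struct percpu_list *l, pid_t *holder)
{
    pid_t expected = 0;
    if (atomic_compare_exchange_strong_explicit(&l->owner, &expected, my_tid(),
                                                memory_order_acquire,
                                                memory_order_relaxed))
        return 1;
    *holder = expected;
    return 0;
}

static void unlock(struct percpu_list *l)
{
    atomic_store_explicit(&l->owner, 0, memory_order_release);
}

/* Pop by reading head->next and writing it back to head.  Plain loads and
 * stores suffice while the lock is held, provided the abort mechanism
 * guarantees an aborted holder never resumes this code. */
static struct node *pop_locked(struct percpu_list *l)
{
    struct node *n = l->head;
    if (n)
        l->head = n->next;
    return n;
}
```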


Thoughts?  I personally like the other options better than preemption
hooks.  I prefer solutions that don't interfere with debugging.

--Andy