Date: Thu, 7 Apr 2016 22:11:56 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
        "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
        Ingo Molnar <mingo@redhat.com>, Paul Turner <commonly@gmail.com>,
        Andi Kleen <andi@firstfloor.org>, Chris Lameter <cl@linux.com>,
        Dave Watson <davejwatson@fb.com>,
        Josh Triplett <josh@joshtriplett.org>,
        Linux API <linux-api@vger.kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Andrew Hunter <ahh@google.com>,
        Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu
 critical sections
Message-ID: <20160407201156.GC3448@twins.programming.kicks-ass.net>
References: <20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com>
 <CALCETrW=3bZyC9d5tUoESEsNt-rc-uhNhZpgEgeSC8W4FAVYkg@mail.gmail.com>
 <20160407120254.GY3448@twins.programming.kicks-ass.net>
 <CALCETrV0vcYcnBrs0axykJD=_BM28wKWVMG6bMzK8zh8R3m5fg@mail.gmail.com>
 <20160407152432.GZ3448@twins.programming.kicks-ass.net>
 <CALCETrU5ZL6Jajc=9up-j86vY_Xtt-gTFjdQE0sB0d=d-CJZ6A@mail.gmail.com>
 <20160407155312.GA3448@twins.programming.kicks-ass.net>
 <CALCETrVGo1Di3qamxx1NAFUSN_o=-HnYRDpeVp7zrQEBwe5u-g@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CALCETrVGo1Di3qamxx1NAFUSN_o=-HnYRDpeVp7zrQEBwe5u-g@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2418
Lines: 85

On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
> More concretely, this looks like (using totally arbitrary register
> assignments -- probably far from ideal, especially given how GCC's
> constraints work):
> 
> enter the critical section:
> 1:
> movq %[cpu], %%r12
> movq {address of counter for our cpu}, %%r13
> movq {some fresh value}, (%%r13)
> cmpq %[cpu], %%r12
> jne 1b
> 
> ... do whatever setup or computation is needed...
> 
> movq $%l[failed], %%rcx
> movq $1f, %[commit_instr]
> cmpq {whatever counter we chose}, (%%r13)
> jne %l[failed]
> cmpq %[cpu], %%r12
> jne %l[failed]
> 
> <-- a signal in here that conflicts with us would clobber (%%r13), and
> the kernel would notice and send us to the failed label
> 
> movq %[to_write], (%[target])
> 1: movq $0, %[commit_instr]

And the kernel, for every thread that has had the syscall called and a
thingy registered, needs to (at preempt/signal-setup):

	if (get_user(post_commit_ip, current->post_commit_ip))
		return -EFAULT;

	if (likely(!post_commit_ip))
		return 0;

	if (regs->ip >= post_commit_ip)
		return 0;

	if (get_user(seq, (u32 __user *)regs->r13))
		return -EFAULT;

	if (regs->$(which one holds our chosen seq?) == seq) {
		/* nothing changed, do not cancel, proceed to commit. */
		return 0;
	}

	if (put_user(0UL, current->post_commit_ip))
		return -EFAULT;

	regs->ip = regs->rcx;
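
As a rough illustration only, the check above can be modelled in plain
userspace C. Everything named model_* is hypothetical: the struct stands
in for the saved user registers (fail_ip for %rcx, seq_ptr for %r13,
seq_val for whichever register ends up holding the chosen seq), and the
get_user()/put_user() fault paths are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved user register state. */
struct model_regs {
	uintptr_t ip;            /* interrupted instruction pointer */
	uintptr_t fail_ip;       /* plays the role of %rcx */
	const uint32_t *seq_ptr; /* plays the role of %r13 */
	uint32_t seq_val;        /* register holding the chosen seq */
};

/*
 * Model of the preempt/signal-setup check.
 * Returns true if the sequence was cancelled and ip was redirected
 * to the failure label, false if the thread may simply continue.
 */
static bool model_rseq_fixup(struct model_regs *regs,
			     uintptr_t *post_commit_ip)
{
	assert(regs && post_commit_ip);

	if (*post_commit_ip == 0)
		return false;	/* not inside a finish block */

	if (regs->ip >= *post_commit_ip)
		return false;	/* the commit store already executed */

	if (*regs->seq_ptr == regs->seq_val)
		return false;	/* nothing changed, proceed to commit */

	*post_commit_ip = 0;	/* leave the finish block */
	regs->ip = regs->fail_ip;
	return true;
}
```

Note that the second cacheline (the seq counter) is only read once
post_commit_ip is known to be non-zero and ip is before the commit.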


> In contrast to Paul's scheme, this has two additional (highly
> predictable) branches and requires generation of a seqcount in
> userspace.  In its favor, though, it doesn't need preemption hooks,

Without preemption hooks, how would one thread preempting another at the
above <-- clobber anything and cause the commit to fail?

> it's inherently debuggable, 

It is more debuggable, agreed.

> and it allows multiple independent
> rseq-protected things to coexist without forcing each other to abort.

And the kernel only needs to load the second cacheline if it lands in
the middle of a finish block, which should be manageable overhead I
suppose.

But the userspace chunk is a lot slower as it needs to always touch
multiple lines, since the @cpu, @seq and @post_commit_ip all live in
separate lines (although I suppose @cpu and @post_commit_ip could live
in the same).
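
To make the cacheline argument concrete, here is one possible layout,
purely as a sketch. The struct names, field names and the 64-byte line
size are assumptions, not anything from the patches: @cpu and
@post_commit_ip share one per-thread line, while each per-cpu @seq
counter sits in a line of its own.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CACHELINE 64	/* assumption: 64-byte cachelines */

/* Hypothetical per-thread area: @cpu and @post_commit_ip together. */
struct model_thread_area {
	alignas(MODEL_CACHELINE) uint32_t cpu;
	uintptr_t post_commit_ip;
};

/* Hypothetical per-cpu sequence counter, one cacheline each. */
struct model_cpu_seq {
	alignas(MODEL_CACHELINE) uint32_t seq;
};

/* Both per-thread fields fit in a single line. */
static_assert(sizeof(struct model_thread_area) == MODEL_CACHELINE,
	      "thread area should occupy exactly one cacheline");
/* Neighbouring CPUs never share a counter line. */
static_assert(sizeof(struct model_cpu_seq) == MODEL_CACHELINE,
	      "each seq counter should occupy its own cacheline");
```

With this layout the fast path still touches two lines (the thread area
and the current CPU's counter), which is the cost being described.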

The finish thing needs 3 registers for:

 - fail ip
 - seq pointer
 - seq value

Which I suppose is possible even on register-constrained architectures
like i386.