Date: Thu, 7 Apr 2016 22:11:56 +0200
From: Peter Zijlstra
To: Andy Lutomirski
Cc: Mathieu Desnoyers, "Paul E. McKenney", Ingo Molnar, Paul Turner,
    Andi Kleen, Chris Lameter, Dave Watson, Josh Triplett, Linux API,
    "linux-kernel@vger.kernel.org", Andrew Hunter, Linus Torvalds
Subject: Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space
    percpu critical sections
Message-ID: <20160407201156.GC3448@twins.programming.kicks-ass.net>

On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
> More concretely, this looks like (using totally arbitrary register
> assignments -- probably far from ideal, especially given how GCC's
> constraints work):
>
> enter the critical section:
> 1:
>     movq %[cpu], %%r12
>     movq {address of counter for our cpu}, %%r13
>     movq {some fresh value}, (%%r13)
>     cmpq %[cpu], %%r12
>     jne 1b
>
> ... do whatever setup or computation is needed...
>
>     movq $%l[failed], %%rcx
>     movq $1f, %[commit_instr]
>     cmpq {whatever counter we chose}, (%%r13)
>     jne %l[failed]
>     cmpq %[cpu], %%r12
>     jne %l[failed]
>
> <-- a signal in here that conflicts with us would clobber (%%r13), and
> the kernel would notice and send us to the failed label
>
>     movq %[to_write], (%[target])
> 1:  movq $0, %[commit_instr]

And the kernel, for every thread that has had the syscall called and a
thingy registered, needs to (at preempt/signal-setup):

	if (get_user(post_commit_ip, current->post_commit_ip))
		return -EFAULT;

	if (likely(!post_commit_ip))
		return 0;

	if (regs->ip >= post_commit_ip)
		return 0;

	if (get_user(seq, (u32 __user *)regs->r13))
		return -EFAULT;

	if (regs->$(which one holds our chosen seq?) == seq) {
		/* nothing changed, do not cancel, proceed to commit. */
		return 0;
	}

	if (put_user(0UL, current->post_commit_ip))
		return -EFAULT;

	regs->ip = regs->rcx;

> In contrast to Paul's scheme, this has two additional (highly
> predictable) branches and requires generation of a seqcount in
> userspace.  In its favor, though, it doesn't need preemption hooks,

Without preemption hooks, how would one thread preempting another at the
above <-- clobber anything and cause the commit to fail?

> it's inherently debuggable,

It is more debuggable, agreed.

> and it allows multiple independent
> rseq-protected things to coexist without forcing each other to abort.

And the kernel only needs to load the second cacheline if it lands in
the middle of a finish block, which should be manageable overhead I
suppose.

But the userspace chunk is a lot slower, as it always needs to touch
multiple cachelines: @cpu, @seq and @post_commit_ip all live in separate
lines (although I suppose @cpu and @post_commit_ip could share one).

The finish thing needs 3 registers for:

 - fail ip
 - seq pointer
 - seq value

Which I suppose is possible even on register-constrained architectures
like i386.
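
For illustration, the preempt/signal-setup check above can be modelled
in plain user-space C. This is only a sketch under stated assumptions:
struct sim_regs, rseq_fixup() and the seq_reg field are hypothetical
names (which register holds the chosen seq is left open above), plain
loads and stores stand in for get_user()/put_user(), and the -EFAULT
paths are left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for the pt_regs fields the check looks at. */
struct sim_regs {
	uint64_t ip;		 /* interrupted instruction pointer */
	uint64_t rcx;		 /* fail ip, loaded before the finish block */
	uint64_t seq_reg;	 /* assumed: register holding the chosen seq */
	const uint32_t *seq_ptr; /* models r13: pointer to the per-cpu counter */
};

enum rseq_action { RSEQ_CONTINUE, RSEQ_ABORT };

/*
 * Model of the decision made at preempt/signal-setup.
 * *post_commit_ip models the per-thread user word; 0 means the thread
 * is not inside a finish block.
 */
static enum rseq_action rseq_fixup(struct sim_regs *regs,
				   uint64_t *post_commit_ip)
{
	/* Fast path: not in a finish block, nothing to do. */
	if (*post_commit_ip == 0)
		return RSEQ_CONTINUE;

	/* Already at or past the commit store: too late to cancel. */
	if (regs->ip >= *post_commit_ip)
		return RSEQ_CONTINUE;

	/* Counter still holds the value userspace chose: let it commit. */
	if (regs->seq_reg == *regs->seq_ptr)
		return RSEQ_CONTINUE;

	/* Counter was clobbered: clear the marker and divert to the fail ip. */
	*post_commit_ip = 0;
	regs->ip = regs->rcx;
	return RSEQ_ABORT;
}
```

Only the last path rewrites the instruction pointer; the first test is
the common case and touches nothing beyond the post_commit_ip word,
which matches the point about the second cacheline being loaded only
inside a finish block.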