From: Andy Lutomirski
Date: Fri, 8 Apr 2016 14:16:35 -0700
Subject: Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections
To: Mathieu Desnoyers
Cc: "Paul E. McKenney", Ingo Molnar, "linux-kernel@vger.kernel.org", linux-api, Paul Turner, Chris Lameter, Andi Kleen, Josh Triplett, Dave Watson, Andrew Hunter, Linus Torvalds, Peter Zijlstra

On Apr 8, 2016 10:46 AM, "Mathieu Desnoyers" wrote:
>
> ----- On Apr 7, 2016, at 10:05 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
>
> > ----- On Apr 7, 2016, at 9:21 PM, Andy Lutomirski luto@amacapital.net wrote:
> >
> >> On Thu, Apr 7, 2016 at 6:11 PM, Mathieu Desnoyers
> >> wrote:
> >>> ----- On Apr 7, 2016, at 6:05 PM, Andy Lutomirski luto@amacapital.net wrote:
> >>>
> >>>> On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra wrote:
> >>>>> On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
> >>> [...]
> >>>>>
> >>>>>> it's inherently debuggable,
> >>>>>
> >>>>> It is more debuggable, agreed.
> >>>>>
> >>>>>> and it allows multiple independent
> >>>>>> rseq-protected things to coexist without forcing each other to abort.
> >>>
> >>> [...]
> >>>
> >>> My understanding is that the main goal of this rather more complex
> >>> proposal is to make interaction with debuggers more straightforward in
> >>> cases of single-stepping through the rseq critical section.
> >>
> >> The things I like about my proposal are both that you can single-step
> >> through it just like any other code as long as you pin the thread to a
> >> CPU and that it doesn't make preemption magical. (Of course, you can
> >> *force* it to do something on resume and/or preemption by sticking a
> >> bogus value in the expected event count field, but that's not the
> >> intended use. Hmm, I guess it does need to hook preemption and/or
> >> resume for all processes that enable the thing so it can know to check
> >> for an enabled post_commit_rip, just like all the other proposals.)
> >>
> >> Also, mine lets you have a fairly long-running critical section that
> >> doesn't get aborted under heavy load and can interleave with other
> >> critical sections that don't conflict.
> >
> > Yes, those would be nice advantages. I'll have to do a few more
> > pseudo-code and execution scenarios to get a better understanding of
> > your idea.
> >
> >>
> >>>
> >>> I recently came up with a scheme that should allow us to handle such
> >>> situations in a fashion similar to debuggers handling ll/sc
> >>> restartable sequences of instructions on e.g. powerpc. The good news
> >>> is that my scheme does not require anything at the kernel level.
> >>>
> >>> The idea is simple: the userspace rseq critical sections now
> >>> become marked by 3 inline functions (rather than 2 in Paul's proposal):
> >>>
> >>> rseq_start(void *rseq_key)
> >>> rseq_finish(void *rseq_key)
> >>> rseq_abort(void *rseq_key)
> >>
> >> How do you use this thing? What are its semantics?
> >
> > You define one rseq_key variable (dummy 1 byte variable, can be an
> > empty structure) for each rseq critical section you have in your
> > program.
> >
> > A rseq critical section will typically have one entry point (rseq_start),
> > and one exit point (rseq_finish). I'm saying "typically" because there
> > may be more than one entry point, and more than one exit point per
> > critical section.
> >
> > Entry and exit points mark the beginning and end of each rseq critical
> > section. rseq_start loads the sequence counter from the TLS and copies
> > it onto the stack. It then gets passed to rseq_finish() to be compared
> > with the final seqnum TLS value just before the commit. rseq_finish is
> > the one responsible for storing into the post_commit_instr field of the
> > TLS and populating rcx with the failure insn label address. rseq_finish()
> > does the commit.
> >
> > And there is rseq_abort(), which would need to be called if we just want
> > to exit from a rseq critical section without doing the commit (no matching
> > call to rseq_finish after a rseq_start).
> >
> > Each of rseq_start, finish, and abort would need to receive a pointer
> > to the rseq_key as parameter.
> >
> > rseq_start would return the sequence number read from the TLS.
> >
> > rseq_finish would also receive as parameter that sequence number that has
> > been returned by rseq_start.
> >
> > Does it make sense ?
>
> By the way, the debugger can always decide to single-step through the
> first iteration of the rseq, and then after it loops, decide to skip
> single-stepping until the exit points are reached.
True, but you're assuming that someone will actually write that debugger code and that users will then know how to use it. That's something I like about my version: it needs no such support, since you can single-step through it like any other code. Admittedly, LL/SC and TSX have the same problem, but those are architectural, and their existence isn't a good excuse to add a new mechanism that's awkward to debug.

--Andy
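The rseq_start()/rseq_finish()/rseq_abort() scheme Mathieu describes in the quoted text can be modeled roughly as follows. This is a minimal single-threaded C11 sketch under stated assumptions, not the proposed implementation. The rseq_tls structure, simulate_preemption(), and the percpu_inc*() helpers are hypothetical. The model assumes the kernel bumps the TLS sequence counter when it preempts the thread. It also cannot reproduce the kernel-side part (post_commit_instr plus the failure label in rcx) that would make the final compare-and-store safe against preemption, so that step only appears as a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread area standing in for the rseq TLS block. */
struct rseq_tls {
    uint32_t seqnum; /* assumption: bumped by the kernel on preemption */
};

static _Thread_local struct rseq_tls rseq_tls;

/* One dummy 1-byte key per critical section; only its address matters. */
struct rseq_key {
    char unused;
};

/* Hypothetical stand-in for the kernel preempting this thread. */
static void simulate_preemption(void)
{
    rseq_tls.seqnum++;
}

/* Entry point: snapshot the TLS sequence counter for the caller. */
static uint32_t rseq_start(const struct rseq_key *key)
{
    (void)key; /* ties this entry point to one critical section */
    return rseq_tls.seqnum;
}

/*
 * Exit point that commits. In the proposal, rseq_finish() also stores
 * into the TLS post_commit_instr field and loads the failure label into
 * rcx, presumably so the kernel can redirect a thread preempted between
 * the compare and the store. This user-space model cannot do that.
 */
static bool rseq_finish(const struct rseq_key *key, long *target,
                        long newval, uint32_t start_seq)
{
    (void)key;
    if (rseq_tls.seqnum != start_seq)
        return false; /* preempted since rseq_start(): caller retries */
    *target = newval; /* the commit */
    return true;
}

/* Exit point without a commit (no matching rseq_finish()). */
static void rseq_abort(const struct rseq_key *key)
{
    (void)key; /* nothing was committed, so nothing to undo here */
}

/* Hypothetical helpers: one key per critical section. */
static struct rseq_key inc_key, inc_below_key;

/* Increment *counter, injecting `preemptions` simulated preemptions
 * inside the critical section; returns the number of attempts. */
static int percpu_inc(long *counter, int preemptions)
{
    int attempts = 0;
    for (;;) {
        uint32_t seq = rseq_start(&inc_key);
        long newval = *counter + 1;
        attempts++;
        if (preemptions > 0) {
            preemptions--;
            simulate_preemption();
        }
        if (rseq_finish(&inc_key, counter, newval, seq))
            return attempts;
    }
}

/* Increment *counter only while it is below limit; leaves through
 * rseq_abort() instead of committing once the limit is reached. */
static bool percpu_inc_below(long *counter, long limit)
{
    for (;;) {
        uint32_t seq = rseq_start(&inc_below_key);
        if (*counter >= limit) {
            rseq_abort(&inc_below_key);
            return false;
        }
        if (rseq_finish(&inc_below_key, counter, *counter + 1, seq))
            return true;
    }
}
```

For example, percpu_inc(&c, 2) on a counter starting at 0 takes three attempts. The first two snapshots are invalidated by the simulated preemptions, and the third commits c = 1. The retry loop lives in the caller, which is also where a debugger could, as Mathieu suggests, step through the first iteration and then run to an exit point.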