Date: Fri, 8 Apr 2016 02:05:05 +0000 (UTC)
From: Mathieu Desnoyers
To: Andy Lutomirski
Cc: Peter Zijlstra, "Paul E. McKenney", Ingo Molnar, Paul Turner,
    Andi Kleen, Chris Lameter, Dave Watson, Josh Triplett, linux-api,
    linux-kernel@vger.kernel.org, Andrew Hunter, Linus Torvalds
Subject: Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections

----- On Apr 7, 2016, at 9:21 PM, Andy Lutomirski luto@amacapital.net wrote:

> On Thu, Apr 7, 2016 at 6:11 PM, Mathieu Desnoyers wrote:
>> ----- On Apr 7, 2016, at 6:05 PM, Andy Lutomirski luto@amacapital.net wrote:
>>
>>> On Thu, Apr 7, 2016 at 1:11 PM, Peter Zijlstra wrote:
>>>> On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
>> [...]
>>>>
>>>>> it's inherently debuggable,
>>>>
>>>> It is more debuggable, agreed.
>>>>
>>>>> and it allows multiple independent
>>>>> rseq-protected things to coexist without forcing each other to abort.
>>
>> [...]
>>
>> My understanding is that the main goal of this rather more complex
>> proposal is to make interaction with debuggers more straightforward in
>> cases of single-stepping through the rseq critical section.
>
> The things I like about my proposal are both that you can single-step
> through it just like any other code as long as you pin the thread to a
> CPU and that it doesn't make preemption magical. (Of course, you can
> *force* it to do something on resume and/or preemption by sticking a
> bogus value in the expected event count field, but that's not the
> intended use. Hmm, I guess it does need to hook preemption and/or
> resume for all processes that enable the thing so it can know to check
> for an enabled post_commit_rip, just like all the other proposals.)
>
> Also, mine lets you have a fairly long-running critical section that
> doesn't get aborted under heavy load and can interleave with other
> critical sections that don't conflict.

Yes, those would be nice advantages. I'll have to work through a few
more pseudo-code and execution scenarios to get a better understanding
of your idea.

>
>>
>> I recently came up with a scheme that should allow us to handle such
>> situations in a fashion similar to debuggers handling ll/sc
>> restartable sequences of instructions on e.g. powerpc. The good news
>> is that my scheme does not require anything at the kernel level.
>>
>> The idea is simple: the userspace rseq critical sections now
>> become marked by 3 inline functions (rather than 2 in Paul's proposal):
>>
>> rseq_start(void *rseq_key)
>> rseq_finish(void *rseq_key)
>> rseq_abort(void *rseq_key)
>
> How do you use this thing? What are its semantics?

You define one rseq_key variable (a dummy 1-byte variable, which can
be an empty structure) for each rseq critical section you have in
your program.
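To make the calling convention concrete, here is a minimal single-threaded C model of the three functions, as a sketch under stated assumptions rather than the proposed implementation. A plain thread-local counter stands in for the TLS sequence number, the hypothetical simulate_preemption() helper stands in for the kernel bumping it, and the extra target/newval parameters of rseq_finish() are a hypothetical way to model the single commit store. The real rseq_finish() would also set post_commit_instr and rcx so the kernel can divert a preempted thread to the failure label; the plain comparison below cannot provide that atomicity on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL model: a plain thread-local counter stands in for the
 * TLS sequence number that the kernel would bump on preemption. */
static _Thread_local uint32_t rseq_tls_seqnum;

/* Hypothetical test hook standing in for a preemption event. */
static void simulate_preemption(void)
{
	rseq_tls_seqnum++;
}

/* One dummy 1-byte key per critical section; only its address matters. */
struct rseq_key {
	char dummy;
};
static struct rseq_key counter_inc_key;

/* Entry point: return the TLS sequence number so the caller keeps a
 * copy on its stack. */
static inline uint32_t rseq_start(void *rseq_key)
{
	assert(rseq_key != NULL);
	return rseq_tls_seqnum;
}

/* Exit point with commit: compare the saved sequence number with the
 * current TLS value, then do the single commit store. In the proposal,
 * the kernel (not this comparison alone) ensures nothing runs between
 * the check and the store. target/newval are assumptions of the model. */
static inline bool rseq_finish(void *rseq_key, uint32_t start_seq,
			       intptr_t *target, intptr_t newval)
{
	assert(rseq_key != NULL);
	if (rseq_tls_seqnum != start_seq)
		return false;	/* interrupted: caller must restart */
	*target = newval;	/* the commit */
	return true;
}

/* Exit point without commit: a no-op at run time, present only so the
 * address can be recorded as an exit for this key. */
static inline void rseq_abort(void *rseq_key)
{
	assert(rseq_key != NULL);
}

/* Hypothetical example section: add inc to *counter, leaving through
 * rseq_abort() when inc is 0. preempt_first injects one simulated
 * preemption on the first attempt. Returns the number of restarts. */
static int counter_add(intptr_t *counter, intptr_t inc, bool preempt_first)
{
	int restarts = 0;

	for (;;) {
		uint32_t seq = rseq_start(&counter_inc_key);

		if (inc == 0) {
			rseq_abort(&counter_inc_key);
			return restarts;
		}
		intptr_t newval = *counter + inc;
		if (preempt_first && restarts == 0)
			simulate_preemption();
		if (rseq_finish(&counter_inc_key, seq, counter, newval))
			return restarts;
		restarts++;
	}
}
```

In counter_add(), the sequence number returned by rseq_start() lives on the stack and is handed back to rseq_finish(); a refused commit restarts the whole section, while the inc == 0 path never reaches rseq_finish() and therefore exits through rseq_abort().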
An rseq critical section will typically have one entry point
(rseq_start) and one exit point (rseq_finish). I'm saying "typically"
because a critical section may have more than one entry point and more
than one exit point. Entry and exit points mark the beginning and end
of each rseq critical section.

rseq_start() loads the sequence counter from the TLS and copies it onto
the stack. That copy is then passed to rseq_finish(), which compares it
with the final TLS seqnum value just before the commit. rseq_finish()
is responsible for storing into the post_commit_instr field of the TLS
and for populating rcx with the failure insn label address; it then
performs the commit.

Then there is rseq_abort(), which needs to be called if we just want
to exit from an rseq critical section without doing the commit (that
is, when an rseq_start() has no matching call to rseq_finish()).

Each of rseq_start(), rseq_finish() and rseq_abort() would need to
receive a pointer to the rseq_key as a parameter. rseq_start() would
return the sequence number read from the TLS, and rseq_finish() would
also receive, as a parameter, the sequence number returned by
rseq_start().

Does that make sense?

Thanks,

Mathieu

>
> --Andy
>
>>
>> We associate each critical section with a unique "key" (dummy
>> 1 byte object in the process address space), so we can group
>> them. The new "rseq_abort" would mark exit points that would
>> exit the critical section without executing the final commit
>> instruction.
>>
>> Within each of rseq_start, rseq_finish and rseq_abort,
>> we declare a non-loadable section that gets populated
>> with the following tuples:
>>
>> (RSEQ_TYPE, insn address, rseq_key)
>>
>> Where RSEQ_TYPE is either RSEQ_START, RSEQ_FINISH, or RSEQ_ABORT.
>>
>> That special section would be found in the executable by the
>> debugger, which can then skip over entire restartable critical
>> sections when it encounters them by placing breakpoints at
>> all exit points (finish and abort) associated with the same
>> rseq_key as the entry point (start).
>>
>> This way we don't need to complicate the runtime code, at either
>> the kernel or the user-space level, and we get debuggability using
>> a trick similar to what ll/sc architectures already need to do.
>>
>> Of course, this requires extending gdb, which should not be
>> a show-stopper.
>>
>> Thoughts?
>>
>> Thanks,
>>
>> Mathieu
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
>
>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
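The debugger side of the scheme reduces to a lookup over the (RSEQ_TYPE, insn address, rseq_key) tuples. The C sketch below assumes the tuples have already been read out of the non-loadable section into a plain array (emitting and parsing that section is not shown); given a stop address, it returns the breakpoint addresses of every finish and abort exit that shares the entry point's key. The struct layout and the rseq_exit_breakpoints() name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum rseq_type { RSEQ_START, RSEQ_FINISH, RSEQ_ABORT };

/* One tuple per marker: (RSEQ_TYPE, insn address, rseq_key).
 * Hypothetical in-memory layout; the on-disk format is not specified. */
struct rseq_debug_entry {
	enum rseq_type type;
	uintptr_t insn_addr;
	uintptr_t rseq_key;
};

/* If pc is a recorded entry point, write into out[] (at most max
 * entries) the addresses of all finish and abort exits with the same
 * rseq_key, and return how many were written. Returns 0 when pc is not
 * an rseq entry point, so the debugger single-steps as usual. */
static size_t rseq_exit_breakpoints(const struct rseq_debug_entry *tab,
				    size_t n, uintptr_t pc,
				    uintptr_t *out, size_t max)
{
	const struct rseq_debug_entry *start = NULL;
	size_t i, count = 0;

	assert(out != NULL || max == 0);

	/* Is the stop address a recorded entry point? */
	for (i = 0; i < n; i++) {
		if (tab[i].type == RSEQ_START && tab[i].insn_addr == pc) {
			start = &tab[i];
			break;
		}
	}
	if (!start)
		return 0;

	/* Collect every exit point grouped under the same key. */
	for (i = 0; i < n && count < max; i++) {
		if (tab[i].rseq_key == start->rseq_key &&
		    (tab[i].type == RSEQ_FINISH || tab[i].type == RSEQ_ABORT))
			out[count++] = tab[i].insn_addr;
	}
	return count;
}
```

When single-stepping reaches an address for which this returns a non-zero count, the debugger would set breakpoints at the returned addresses and let the thread run, instead of stepping instruction by instruction through the critical section.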