From: Andy Lutomirski
Date: Fri, 8 Apr 2016 08:58:27 -0700
Subject: Re: [RFC PATCH 0/3] restartable sequences v2: fast user-space percpu critical sections
To: Peter Zijlstra
Cc: Mathieu Desnoyers, "Paul E. McKenney", Ingo Molnar, Paul Turner,
    Chris Lameter, Andi Kleen, Josh Triplett, Dave Watson, Linux API,
    linux-kernel@vger.kernel.org, Andrew Hunter, Linus Torvalds

On Apr 7, 2016 11:41 PM, "Peter Zijlstra" wrote:
>
> On Thu, Apr 07, 2016 at 09:43:33AM -0700, Andy Lutomirski wrote:
> > enter the critical section:
> > 1:
> >     movq %[cpu], %%r12
> >     movq {address of counter for our cpu}, %%r13
> >     movq {some fresh value}, (%%r13)
> >     cmpq %[cpu], %%r12
> >     jne 1b
>
> This is inherently racy; you forgot the detail of 'some fresh value',
> but since you want to avoid collisions you really want an increment.
>
> But load-store archs cannot do that. Or rather, they need to do:
>
>     load  Rn, $event
>     add   Rn, Rn, 1
>     store $event, Rn
>
> But if they're preempted in the middle, two threads will collide and
> generate the _same_ increment. Comparing CPU numbers will not fix that.

Even on x86 this won't work -- we have no actual guarantee that we're
still on the right CPU when the store executes, so we'd have to use an
atomic.

I was thinking we'd allocate from a per-thread pool (say 24 bits of
thread ID and the rest being a nonce).  On load-store architectures
this wouldn't be async-signal-safe, though.  Hmm.

--Andy
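
For concreteness, here is a minimal user-space sketch of the per-thread
pool scheme Andy describes above: each thread mints "fresh values" with
its own ID in the high 24 bits and a private counter in the rest, so two
threads can never produce the same value and no shared atomic increment
is needed.  The helper name, the exact bit split, and the 64-bit event
word are illustrative assumptions, not code from the rseq patch set.

    /*
     * Hypothetical helper, not from the patch set: mint a unique
     * "fresh value" for the per-cpu event word without any shared
     * atomic.  High 24 bits identify the thread; low 40 bits are a
     * per-thread nonce, so cross-thread collisions are impossible
     * no matter how the threads are preempted.
     */
    #include <stdint.h>

    #define TID_BITS    24
    #define NONCE_BITS  (64 - TID_BITS)
    #define NONCE_MASK  ((UINT64_C(1) << NONCE_BITS) - 1)

    static __thread uint64_t thread_nonce;  /* thread-private: no races */

    static inline uint64_t fresh_event_value(uint32_t tid)
    {
        /*
         * A plain read-modify-write is safe across threads because
         * only this thread touches thread_nonce.  It is NOT
         * async-signal-safe on load-store architectures, though: a
         * signal handler interrupting the update and calling this
         * again can repeat a nonce -- Andy's caveat above.
         */
        thread_nonce = (thread_nonce + 1) & NONCE_MASK;
        return ((uint64_t)tid << NONCE_BITS) | thread_nonce;
    }

On x86 the nonce update can be a single increment instruction, which
sidesteps the signal problem within a thread; on load-store machines it
is three instructions, which is exactly the window Andy is worried about.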