From: Andy Lutomirski
Date: Tue, 26 May 2015 14:44:06 -0700
Subject: Re: [RFC PATCH] percpu system call: fast userspace percpu critical sections
To: Mathieu Desnoyers
Cc: Andi Kleen, Borislav Petkov, "H. Peter Anvin", Lai Jiangshan,
    Ben Maurer, "Paul E. McKenney", Ingo Molnar, Josh Triplett,
    Andrew Morton, Michael Kerrisk, Linux API, Linux Kernel, Paul Turner,
    Peter Zijlstra, Linus Torvalds, Steven Rostedt, Andrew Hunter

On Tue, May 26, 2015 at 2:18 PM, Andy Lutomirski wrote:
> On Tue, May 26, 2015 at 2:04 PM, Mathieu Desnoyers wrote:
>>
>>>
>>> It's too bad that not all architectures have a single-instruction
>>> unlocked compare-and-exchange.
>>
>> Based on my benchmarks, it's not clear that single-instruction
>> unlocked CAS is actually faster than doing the same with many
>> instructions.
>
> True, but with a single instruction the user can't get preempted in the middle.
>
> Looking at your code, it looks like percpu_user_sched_in has some
> potentially nasty issues with page faults.  Avoiding touching user
> memory from the scheduler would be quite nice from an implementation
> POV, and the x86-specific gs hack wins in that regard.

ARM has "TLB lockdown entries" which could, I think, be used to
implement per-cpu or per-thread mappings.  I'm actually rather
surprised that Linux doesn't already use a TLB lockdown entry for TLS.
(Hmm.  Maybe it's because the interface to write the entries requires
actually touching the page.  Maybe not -- the ARM docs, in general,
seem to be much less clear than the Intel and AMD docs.)

ARM doesn't seem to have any single-instruction compare-exchange or
similar instruction, though, so this might not be all that useful.  On
the other hand, ARM can probably do reasonably efficient per-cpu
memory allocation and such with a single ldrex/strex pair.

--Andy
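
To make the ldrex/strex remark above a bit more concrete, here is a
minimal sketch of a bump allocator over one per-cpu pool.  It is written
with portable C11 atomics rather than raw ldrex/strex, on the assumption
that an ARMv7 compiler lowers the compare-exchange loop to an ldrex/strex
retry loop.  The names percpu_pool, POOL_SIZE and pool_alloc are made up
for this example and come from neither the patch nor the kernel.  C11
atomics are also atomic across CPUs, which is stronger than what a
per-cpu scheme strictly needs.

```c
#include <stdatomic.h>
#include <stddef.h>

#define POOL_SIZE 4096  /* hypothetical per-cpu pool size, in bytes */

/* Hypothetical per-cpu pool: one instance per CPU is assumed. */
struct percpu_pool {
    _Atomic size_t offset;          /* index of the next free byte in buf */
    unsigned char buf[POOL_SIZE];
};

/*
 * Bump-allocate len bytes from pool, or return NULL if they do not fit.
 * The load / compare / store sequence is the part that a single
 * ldrex/strex pair could cover on ARM (assumption, see above).
 */
static void *pool_alloc(struct percpu_pool *pool, size_t len)
{
    size_t old = atomic_load_explicit(&pool->offset, memory_order_relaxed);

    do {
        /* offset never exceeds POOL_SIZE, so this cannot underflow. */
        if (len > POOL_SIZE - old)
            return NULL;
        /* On failure, old is refreshed with the current offset; retry. */
    } while (!atomic_compare_exchange_weak_explicit(
                 &pool->offset, &old, old + len,
                 memory_order_relaxed, memory_order_relaxed));

    return &pool->buf[old];
}
```

A caller would pick the pool belonging to its current CPU and call
pool_alloc(pool, len).  How to pick that pool safely in the face of
preemption and migration is exactly the open question in this thread,
and the sketch does not address it.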