Date: Sun, 13 Dec 2015 21:18:41 +0100
From: Andi Kleen
To: Mathieu Desnoyers
Cc: Andi Kleen, Thomas Gleixner, linux-kernel@vger.kernel.org, Paul Turner,
    Andrew Hunter, Peter Zijlstra, Andy Lutomirski, Dave Watson,
    Chris Lameter, Ingo Molnar, Ben Maurer, rostedt, "Paul E. McKenney",
    Josh Triplett, Linus Torvalds, Andrew Morton, linux-api
Subject: Re: [RFC PATCH 1/2] thread_local_abi system call: caching current CPU number (x86)
Message-ID: <20151213201841.GW15533@two.firstfloor.org>
In-Reply-To: <450134747.239045.1450036728930.JavaMail.zimbra@efficios.com>

> In the context of restartable sequences [1] [2], the goal is to turn
> atomic operations on per-cpu data into a sequence of simple load/store
> operations. Therefore, improving getcpu from 12ns to 0.3ns will have a

I don't think LSL is 12ns. It's a few cycles.

> Moreover, AFAIU, restartable sequences cannot do the function call
> required by the vdso while within the c.s.: those need to entirely fit
> within an inline assembly. So this CPU number caching actually enables
> restartable sequences, whereas the vdso approach cannot be used in that
> context.

You can use LSL directly, though.
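A minimal sketch of what "using LSL directly" could look like from user space, assuming x86-64 Linux and GCC-style inline assembly. The selector value 0x7b (GDT index 15 with RPL 3) and the `(node << 12) | cpu` encoding of the segment limit are assumptions taken from how the kernel's vDSO getcpu is implemented, not a documented ABI; the macro and function names are hypothetical. The CPU number comes from one inline instruction in the caller, with no function call, and the getcpu system call is used only as a slow fallback.

```c
#define _GNU_SOURCE
#include <stddef.h>      /* NULL */
#include <sys/syscall.h> /* SYS_getcpu */
#include <unistd.h>      /* syscall() */

/*
 * ASSUMPTION: x86-64 Linux installs a per-CPU GDT descriptor at index 15
 * with DPL 3, so user space can name it with selector 15 * 8 + 3 = 0x7b.
 * Its segment limit encodes (node << 12) | cpu. Names are hypothetical.
 */
#define PERCPU_SEG_SELECTOR 0x7bU  /* hypothetical name for the selector */
#define PERCPU_CPU_MASK     0xfffU /* low 12 bits hold the CPU number */

/* Read the CPU number inline with LSL; returns -1 if LSL reports failure. */
static int lsl_getcpu(void)
{
#if defined(__x86_64__)
    unsigned int limit;
    unsigned char ok;
    /* LSL loads the segment limit and sets ZF=1 if the selector is valid
       and accessible; otherwise ZF=0 and the destination is unchanged. */
    __asm__ volatile("lsl %2, %0\n\tsetz %1"
                     : "=r"(limit), "=q"(ok)
                     : "r"(PERCPU_SEG_SELECTOR)
                     : "cc");
    return ok ? (int)(limit & PERCPU_CPU_MASK) : -1;
#else
    return -1; /* no segment-limit trick assumed outside x86-64 */
#endif
}

/* Fallback: the real getcpu system call (a full kernel entry, so slower). */
static int syscall_getcpu(void)
{
    unsigned int cpu = 0;
    if (syscall(SYS_getcpu, &cpu, NULL, NULL) != 0)
        return -1;
    return (int)cpu;
}

/* Prefer the inline LSL read; fall back to the system call if it fails. */
static int current_cpu(void)
{
    int cpu = lsl_getcpu();
    return cpu >= 0 ? cpu : syscall_getcpu();
}
```

A caller would just write `int cpu = current_cpu();`. As with any CPU-number query, the thread may migrate right after the read, so the value is only a hint unless it is consumed inside something like a restartable sequence.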
In practice people already rely on the LSL trick (and it's very cheap on
the kernel side), so it's a de facto ABI and could be documented.

So it's not function call vs. load, but LSL vs. load.

> Finally, even if overall this new system call is not deemed sufficiently
> interesting on x86, other popular architectures such as ARM32 don't have
> any vDSO for getcpu at the moment, mainly because they don't have similar
> segment selector tricks, and I'm not aware of other solutions than caching

Has that been confirmed by architecture experts? Maybe there is some trick
there too.

> I suspect that most of the difference between the vDSO approach and
> CPU number caching is simply the function call required for the vDSO.
> I doubt there is much to be done on this front.

Not sure about that. Basic function calls are not that expensive. Right
now there is some baggage, but that could be optimized. The only
unavoidable overhead would be the ABI register clobbering.

-Andi