From: Andy Lutomirski
Date: Sun, 13 Dec 2015 12:26:23 -0800
Subject: Re: [RFC PATCH 1/2] thread_local_abi system call: caching current CPU number (x86)
To: Andi Kleen
Cc: Mathieu Desnoyers, Thomas Gleixner, "linux-kernel@vger.kernel.org", Paul Turner, Andrew Hunter, Peter Zijlstra, Dave Watson, Chris Lameter, Ingo Molnar, Ben Maurer, rostedt, "Paul E. McKenney", Josh Triplett, Linus Torvalds, Andrew Morton, linux-api
In-Reply-To: <20151213201841.GW15533@two.firstfloor.org>
References: <1449761990-23525-1-git-send-email-mathieu.desnoyers@efficios.com> <20151213181527.GV15533@two.firstfloor.org> <450134747.239045.1450036728930.JavaMail.zimbra@efficios.com> <20151213201841.GW15533@two.firstfloor.org>

On Sun, Dec 13, 2015 at 12:18 PM, Andi Kleen wrote:
>> In the context of restartable sequences [1] [2], the goal is to turn
>> atomic operations on per-cpu data into a sequence of simple load/store
>> operations. Therefore, improving getcpu from 12ns to 0.3ns will have a
>
> I don't think LSL is 12ns. It's a few cycles.

11ns on my Skylake laptop.  (rdtscp is now almost as fast as lsl.)
FWIW, a failed LSL is 55ns.  We could play sneaky tricks and use SGDT
instead.

Long term on x86, I think we should be using per-cpu segments, though.

>
>> Moreover, AFAIU, restartable sequences cannot do the function call
>> required by the vdso while within the c.s.: those need to entirely fit
>> within an inline assembly. So this CPU number caching actually enables
>> restartable sequences, whereas the vdso approach cannot be used in that
>> context.
>
> You can use the LSL directly though. In practice people already rely
> on it (and it's very cheap on the kernel side), so it's a de facto ABI
> and could be documented.
>
> So it's not function call vs load, but LSL vs load.

I do wonder if the function call itself is cheap enough that we should
do this entirely within the vDSO.  Unfortunately, the vDSO can't use
TLS, so that's not so easy without trickery.

--Andy
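
For anyone who wants to check the function-call cost on their own machine, here is
a minimal sketch. It is not code from this thread. It assumes Linux with glibc,
where sched_getcpu() is usually (but not necessarily) serviced by the vDSO getcpu
path on x86-64. The helper names current_cpu() and ns_per_getcpu() are made up
for illustration, and the number it reports includes loop overhead.

```c
#define _GNU_SOURCE
#include <sched.h>   /* sched_getcpu(), a glibc extension */
#include <time.h>    /* clock_gettime(), struct timespec */

/* Hypothetical helper: CPU the calling thread is currently running on,
 * or -1 on error. The answer can already be stale when it is returned,
 * because the thread may migrate at any time. */
static int current_cpu(void)
{
    return sched_getcpu();
}

/* Hypothetical helper: average wall-clock cost of one current_cpu() call,
 * in nanoseconds, measured over iters calls. iters must be > 0. */
static double ns_per_getcpu(long iters)
{
    struct timespec t0, t1;
    volatile int sink = 0;  /* keeps the compiler from dropping the loop */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        sink += current_cpu();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Calling ns_per_getcpu() with a large iteration count (say ten million) and
printing the result gives a rough per-call figure to compare against the LSL and
plain-load numbers quoted above. It measures the whole libc call, not the bare
instruction.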