Date: Thu, 7 Apr 2016 15:59:10 +0000 (UTC)
From: Mathieu Desnoyers
To: Peter Zijlstra
Cc: Florian Weimer, "H. Peter Anvin", Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, linux-kernel@vger.kernel.org, linux-api, Paul Turner, Andrew Hunter, Andy Lutomirski, Andi Kleen, Dave Watson, Chris Lameter, Ben Maurer, rostedt, "Paul E. McKenney", Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Michael Kerrisk, Boqun Feng
Subject: Re: [RFC PATCH v6 1/5] Thread-local ABI system call: cache CPU number of running thread
Message-ID: <966747921.48958.1460044750563.JavaMail.zimbra@efficios.com>
In-Reply-To: <20160407122528.GS3430@twins.programming.kicks-ass.net>

----- On Apr 7, 2016, at 8:25 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Thu, Apr 07, 2016 at 02:03:53PM +0200, Florian Weimer wrote:
>> > struct tlabi {
>> > 	union {
>> > 		__u8[64] __foo;
>> > 		struct {
>> > 			/* fields go here */
>> > 		};
>> > 	};
>> > } __aligned__(64);
>>
>> That's not really “fixed size” as far as an ABI is concerned, due to the
>> possibility of future extensions.
>
> sizeof(struct tlabi) is always the same, right? How is that not fixed?
>
>> > People objected against the fixed size scheme, but it being possible to
>> > get a fixed TCB offset and reduce indirections is a big win IMO.
>>
>> It's a difficult trade-off. It's not an indirection as such, it's avoid
>> loading the dynamic TLS offset.
>
> What we _want_ is being able to use %[gf]s:offset and have it work (I
> forever forget which segment register userspace TLS uses).
>
>> Let me repeat that the ELF TLS GNU ABI has very limited support for
>> static offsets at present, and it is difficult to make them available
>> more widely without code generation at run time (in the form of text
>> relocations, but still).
>
> Do you have a pointer to something I can read? Because I'm clearly not
> understanding the full issue here.
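The struct quoted above is shorthand rather than compilable C: `__u8[64] __foo` puts the array bound on the type, and `__aligned__(64)` is not a valid attribute spelling on its own. A minimal sketch of the same fixed-size idea follows, assuming GCC or Clang in C11 mode, with `uint8_t` standing in for the kernel's `__u8` and two hypothetical fields (`features`, `cpu_id`) filling part of the reserved space. It only illustrates the layout property being debated, not the layout actually proposed in the patch.

```c
#include <assert.h>	/* static_assert (C11 macro) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>	/* uint8_t, uint32_t, int32_t */

/*
 * Sketch of the fixed-size padded layout discussed above.
 * uint8_t stands in for the kernel's __u8; the two named fields are
 * hypothetical examples of what "fields go here" could contain.
 * Needs C11 anonymous members and the GCC/Clang aligned attribute.
 */
struct tlabi {
	union {
		uint8_t __foo[64];		/* reserves the full 64 bytes */
		struct {
			uint32_t features;	/* hypothetical feature mask */
			int32_t cpu_id;		/* hypothetical cached CPU number */
			/* later fields would be added here */
		};
	};
} __attribute__((aligned(64)));

/* Size and alignment do not depend on how many fields are named. */
static_assert(sizeof(struct tlabi) == 64, "struct tlabi must be 64 bytes");
static_assert(_Alignof(struct tlabi) == 64, "struct tlabi must be 64-byte aligned");

/* Existing fields keep their offsets when new fields are appended. */
static_assert(offsetof(struct tlabi, features) == 0, "features at offset 0");
static_assert(offsetof(struct tlabi, cpu_id) == 4, "cpu_id at offset 4");
```

Because new fields would be carved out of the reserved bytes, `sizeof(struct tlabi)` and the offsets of existing fields stay the same as the layout grows; what changes between kernel versions is only which of those bytes carry meaning.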
For what it's worth, here are a couple of objdump snippets of my test
program, compiled without and with -fPIC:

* Compiled with -O2, *without* -fPIC, x86-64:

__thread __attribute__((weak)) volatile struct thread_local_abi __thread_local_abi;

static
int32_t read_cpu_id(void)
{
        if (unlikely(!(__thread_local_abi.features & TLABI_FEATURE_CPU_ID)))
  40064e:       64 8b 04 25 c0 ff ff    mov    %fs:0xffffffffffffffc0,%eax
  400655:       ff
  400656:       a8 01                   test   $0x1,%al
  400658:       74 71                   je     4006cb
                return sched_getcpu();
        return __thread_local_abi.cpu_id;
  40065a:       64 8b 14 25 c4 ff ff    mov    %fs:0xffffffffffffffc4,%edx
  400661:       ff
}

* Compiled with -O2, with -fPIC, x86-64:

__thread __attribute__((weak)) volatile struct thread_local_abi __thread_local_abi;
  4006de:       64 48 8b 04 25 00 00    mov    %fs:0x0,%rax
  4006e5:       00 00

static
int32_t read_cpu_id(void)
{
        if (unlikely(!(__thread_local_abi.features & TLABI_FEATURE_CPU_ID)))
  4006e7:       48 8d 80 c0 ff ff ff    lea    -0x40(%rax),%rax
  4006ee:       8b 10                   mov    (%rax),%edx
  4006f0:       83 e2 01                and    $0x1,%edx
  4006f3:       0f 84 80 00 00 00       je     400779
                return sched_getcpu();
        return __thread_local_abi.cpu_id;
  4006f9:       8b 50 04                mov    0x4(%rax),%edx
}

So with -fPIC (as used for libraries), TLS access adds an extra
indirection: the base address is first loaded from %fs:0x0. However, it
only needs to be loaded once (mov %fs:0x0 followed by lea -0x40), after
which both the "features" and "cpu_id" fields are read as offsets from
that base, at (%rax) and 0x4(%rax). For executables compiled without
-fPIC, there is no indirection: each field is read directly at a
constant %fs-relative offset.

This justifies the following paragraph in the proposed man page:

  The symbol __thread_local_abi is recommended to be used across
  libraries and applications wishing to register the thread-local ABI
  structure for tlabi_nr 0. The attribute "weak" is recommended when
  declaring this variable in libraries. Applications can choose to
  define their own version of this symbol without the weak attribute
  as a performance improvement.

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
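For readers who want to reproduce snippets like the ones above, here is a self-contained approximation of the test program. It is a sketch under stated assumptions, not the author's code: the structure layout (`features` at offset 0, `cpu_id` at offset 4) and the value of `TLABI_FEATURE_CPU_ID` (bit 0) are inferred from the disassembly (`test $0x1`, `mov 0x4(%rax)`), and `unlikely()` is a local definition using GCC's `__builtin_expect`. The registration system call is deliberately not shown, so on its own the fast path is never taken and `read_cpu_id()` falls back to `sched_getcpu()`.

```c
#define _GNU_SOURCE		/* for sched_getcpu() in glibc */
#include <assert.h>	/* static_assert */
#include <sched.h>	/* sched_getcpu */
#include <stddef.h>	/* offsetof */
#include <stdint.h>	/* uint32_t, int32_t */

/* Local branch-hint macro, as commonly defined; assumed, not from the patch. */
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Assumed feature bit, inferred from "test $0x1" in the disassembly. */
#define TLABI_FEATURE_CPU_ID	(1U << 0)

/* Assumed layout, inferred from the 0x0 and 0x4 offsets above. */
struct thread_local_abi {
	uint32_t features;	/* bits set once the kernel fills a field */
	int32_t cpu_id;		/* CPU number cached by the kernel */
};
static_assert(offsetof(struct thread_local_abi, cpu_id) == 4,
	      "cpu_id offset should match mov 0x4(%rax)");

/*
 * Weak definition, as recommended for libraries: a non-weak definition
 * of the same symbol elsewhere in the program takes precedence at link time.
 */
__thread __attribute__((weak)) volatile struct thread_local_abi __thread_local_abi;

static int32_t read_cpu_id(void)
{
	/* Fall back to the library call until the CPU_ID feature is set. */
	if (unlikely(!(__thread_local_abi.features & TLABI_FEATURE_CPU_ID)))
		return sched_getcpu();
	return __thread_local_abi.cpu_id;
}
```

Compiling it with `-O2` with and without `-fPIC` and disassembling with `objdump -dS` should give access patterns similar to the ones above, although the exact instructions depend on the compiler version, on whether the executable is built as PIE, and on how the linker relaxes the TLS access sequence.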