Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756296AbbDWGYq (ORCPT ); Thu, 23 Apr 2015 02:24:46 -0400 Received: from mail-wi0-f174.google.com ([209.85.212.174]:36328 "EHLO mail-wi0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750704AbbDWGYo (ORCPT ); Thu, 23 Apr 2015 02:24:44 -0400 Date: Thu, 23 Apr 2015 08:24:38 +0200 From: Ingo Molnar To: Josh Triplett Cc: Linus Torvalds , Andy Lutomirski , Denys Vlasenko , Steven Rostedt , Borislav Petkov , "H. Peter Anvin" , Oleg Nesterov , Frederic Weisbecker , Alexei Starovoitov , Will Drewry , Kees Cook , X86 ML , "linux-kernel@vger.kernel.org" , Andrew Morton , Peter Zijlstra , Thomas Gleixner Subject: Re: [PATCH 2/2] x86/asm/entry/32: Remove unnecessary optimization in stub32_clone Message-ID: <20150423062438.GB28898@gmail.com> References: <1429720808-7173-1-git-send-email-dvlasenk@redhat.com> <1429720808-7173-2-git-send-email-dvlasenk@redhat.com> <20150422171005.GA1020@jtriplet-mobl1> <20150422201241.GA954@jtriplet-mobl1> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150422201241.GA954@jtriplet-mobl1> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4681 Lines: 118 * Josh Triplett wrote: > On Wed, Apr 22, 2015 at 11:22:02AM -0700, Linus Torvalds wrote: > > On Wed, Apr 22, 2015 at 10:10 AM, Josh Triplett wrote: > > > > > > I do think my two-patch HAVE_COPY_THREAD_TLS series should go in fixing > > > this > > > > Ugh, I absolutely detesrt that patch. > > > > Don't make random crazy function signatures that depend on some config > > option. That's just evil. The patch is a mess of #ifdef's and should > > be shot in the head and staked with a silver stake to make sure it > > never re-appears. > > > > Either: > > > > (a) make the change for every architecture > > > > (b) have side-by-side interfaces. With different names! > > ...that's exactly what I did. They're called copy_thread and > copy_thread_tls; I very intentionally did not conditionally change > the signature of copy_thread, for exactly that reason. Those > functions are implemented in architecture-specific code, so the > config option just specifies which of the two functions the > architecture provides. > > *sys_clone* has different function signatures based on config > options, but I didn't touch that other than fixing the type of the > tls argument. That's historical baggage that we can't throw away > without breaking userspace. So you want to add a new clone() parameter. I strongly suspect that you won't be the last person who wants to do this. So why not leave the compatibility baggage alone and introduce a new clean clone syscall with a flexible, backwards and forwards compatible parameter ABI? Something like: SYSCALL_DEFINE1(clone_params, struct clone_params __user *, params) struct clone_params { __u32 size; /* User-space sets it to sizeof(struct clone_params) */ __u32 tls_val; /* Slightly out of order, for optimal structure packing */ __u64 clone_flags; __u64 new_sp_addr; __u64 parent_tid_addr; __u64 child_tid_addr; __u64 tls; }; The only real cost of this approach is that this parameter structure has to be copied (once): should be fast as it fits into a single cache line and is a constant size copy. Also note how this parameter block can be passed down by const reference from that point on, instead of copying/shuffling 5-6 parameters into increasingly long function parameter lists. So it might in fact turn out to be slightly faster as well. Note how easily extensible it is: a new field can be added by appending to the structure, and compatibility is achieved in the kernel without explicit versioning, by checking params->size: params->size == sizeof(*params): Good, kernel and userspace ABI version matches. This is a simple check and the most typical situation - it will be the fastest as well. params->size < sizeof(*params): Old binary calls into new kernel. Missing fields are set to 0. Binaries will be forward compatible without any versioning hacks. params->size > sizeof(*params): New user-space calls into old kernel. System call can still be serviced if the new field(s) are all zero. We return -ENOSYS if any field is non-zero. (i.e. if new user-space tries to make use of a new syscall feature that this kernel has not implemented yet.) This way user-space can figure out whether a particular new parameter is supported by a kernel, without having to add new system calls or versioning. Also note how 'fool proof' this kind of 'versioning' is: the parameter block is extended by appending to the end, cleanly extending the kernel internals to handle the new parameter - end of story. The zeroing logic on size mismatch will take care of the rest automatically, it's all ABI forwards and backwards compatible to the maximum possible extent. It's relatively easy to use this same interface from 32-bit compat environments as well: they can use the very same parameter block, the kernel's compat layer possibly just needs to do a minor amount of pointer translation for the _addr fields, for example to truncate them down to 32 bits for security, by filling in the high words of pointers with 0. (This too can be optimized further if needed, by open coding the copy and zeroing.) This too is forwards and backwards ABI compatible to the maximum possible extent. So introduce a single new syscall with the right ABI architecture and you can leave the old mistakes alone. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/