Date: Mon, 6 Jul 2015 19:32:23 +0200
From: Ingo Molnar
To: Linus Torvalds
Cc: Andy Lutomirski, Linux Kernel Mailing List, the arch/x86 maintainers, Jan Kara, Borislav Petkov, Denys Vlasenko
Subject: Re: [PATCH] x86: Fix detection of GCC -mpreferred-stack-boundary support
Message-ID: <20150706173223.GA30566@gmail.com>
References: <20150706134423.GA8094@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org


* Linus Torvalds wrote:

> On Mon, Jul 6, 2015 at 6:44 AM, Ingo Molnar wrote:
> >
> > So looking at this I question the choice of -mpreferred-stack-boundary=3. Why
> > not do -mpreferred-stack-boundary=2?
>
> It wouldn't make sense anyway - it would only make code worse (if it worked) and
> not any better.
>
> The reason the "=3" value is good is because 8-byte alignment is the "natural"
> alignment - it's what you get with a normal call sequence, simply because the
> return address is 8 bytes in size.
>
> That means that with "=3" you don't get extra code to align the stack for the
> simple functions that don't need a frame.
>
> Anything smaller than 3 wouldn't help even if it worked, because none of the
> normal stack operations (pushing/popping registers to save/restore them) would
> be any smaller anyway.
>
> But bigger values than 3 result in the compiler having to generate extra stack
> adjustments just to align the stack after a call that very naturally mis-aligned
> it. And it doesn't help anyway, since in the kernel we don't put stuff on the
> stack that needs bigger alignment (ok, the fxsave buffer is a counter-example,
> but it's a very odd one that we _shouldn't_ have put on the stack).

Ok, so it's all moot, but my (quite possibly flawed) thinking was that for deeper
call chains, using 4-byte RSP alignment (as opposed to 8 bytes) would allow, in
about 50% of the cases, the stack frame to be narrower by 4 bytes (depending on
whether the 'natural' stack boundary is properly aligned to 8 bytes or not).

For a 10-deep call chain that's a stack that is 20 bytes more compact on average
(10*4*0.5), resulting in a tiny bit denser D$.

My assumptions were:

 - no extra code is generated by GCC. (If it causes any extra code to be
   generated then it's an obvious loss.)

 - mis-aligning an 8-byte variable by 4 bytes is handled quite well by most
   x86 uarchs, without penalty in most cases.

But ... it's all moot, and even in the best case, if both my assumptions are fully
met (which is not a given), the advantages are pretty marginal, so consider the
idea dead by multiple mortal wounds.

Thanks,

	Ingo
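For illustration only, the 10*4*0.5 estimate above can be reproduced with a
small sketch. It is not kernel or GCC code: the helpers frame_size() and
chain_savings() and the list of frame sizes are hypothetical, and the model
assumes each frame's raw size is a multiple of 4 bytes with exactly half of
them already a multiple of 8. It ignores the point made in the quoted text
that return addresses and register pushes on x86-64 are 8 bytes wide anyway.

```python
def frame_size(raw_bytes, alignment):
    """Round a frame's raw size up to the next multiple of `alignment`."""
    return -(-raw_bytes // alignment) * alignment


def chain_savings(raw_frame_sizes, narrow=4, wide=8):
    """Total bytes saved over a call chain by padding each frame to
    `narrow` bytes instead of `wide` bytes."""
    return sum(frame_size(n, wide) - frame_size(n, narrow)
               for n in raw_frame_sizes)


# Hypothetical 10-deep chain: raw frame sizes are multiples of 4, and
# exactly half of them are already multiples of 8.
raw_frames = [8, 12, 16, 20, 24, 28, 32, 36, 40, 44]

saved = chain_savings(raw_frames)
expected = len(raw_frames) * 4 * 0.5  # the 10*4*0.5 estimate

print(saved, expected)  # 20 20.0
```

Under these assumptions every second frame sheds 4 bytes of padding, which is
where the 50% factor comes from. With real frame layouts the split need not be
even, so 20 bytes is an average-case figure rather than a guarantee.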