Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753096AbdCHRpZ (ORCPT ); Wed, 8 Mar 2017 12:45:25 -0500
Received: from mx1.redhat.com ([209.132.183.28]:47760 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751160AbdCHRpW (ORCPT ); Wed, 8 Mar 2017 12:45:22 -0500
Date: Wed, 8 Mar 2017 11:37:03 -0600
From: Josh Poimboeuf
To: Linus Torvalds
Cc: Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
        Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
        Peter Anvin, Peter Zijlstra, Thomas Gleixner
Subject: Re: v4.10: kernel stack frame pointer .. has bad value (null)
Message-ID: <20170308173703.2h57rsltma3smbcm@treble>
References: <20170222231808.hmr6ulbvfnrg2at7@treble>
        <20170223201039.GB5177@amd>
        <20170225050439.7dplheb6nyne4nkm@treble>
        <20170302234514.3qcqdozibcltkdai@treble>
        <20170306163807.GA20689@amd>
        <20170307173821.yknj5htr7plgdwxv@treble>
        <20170307182855.262ezbon2pm67qfd@treble>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.6.0.1 (2016-04-01)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
        (mx1.redhat.com [10.5.110.32]); Wed, 08 Mar 2017 17:37:15 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2424
Lines: 61

On Tue, Mar 07, 2017 at 10:40:14AM -0800, Linus Torvalds wrote:
> On Tue, Mar 7, 2017 at 10:28 AM, Josh Poimboeuf wrote:
> >
> > Also, the gcc documentation says -maccumulate-outgoing-args is
> > "generally beneficial for performance and size."
>
> Hmm. I wonder how true that is. I'm pretty sure it generates bigger
> code, although it's probably less noticeable in the kernel (as opposed
> to the traditional x86 "push everything" model) due to having the
> three register arguments.

It does seem to make it bigger.
With Pavel's config on gcc 6, if I add -maccumulate-outgoing-args:

      text     data      bss       dec      hex  filename
  12692555  5550652  9146368  27389575  1a1ee87  vmlinux.before
  13179531  5546556  9146368  27872455  1a94cc7  vmlinux.after

That's 3.8% more text on x86-32.  (FWIW, on x86-64, the size difference
is negligible.)

> And the "it's faster" is almost certainly garbage. It's true on P4 and
> some older AMD cores that couldn't do push/pops quickly.
>
> > Not to mention the fact that -maccumulate-outgoing-args seems to already
> > be enabled in most cases anyway.
>
> Yeah, that's the main argument for this patch, I think - just remove
> the (unusual) special case.

As it turns out, when optimizing for size, gcc seems to ignore
-maccumulate-outgoing-args completely.  So I guess we would have to live
with both cases anyway.  Which means I'll need to make the unwinder
smart enough to deal with it.

But that brings up another question.  If -maccumulate-outgoing-args is
ignored with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, wouldn't using that option
break the things which require -maccumulate-outgoing-args?

So, looking deeper at the various reasons this flag is enabled, they
seem to be mostly obsolete.

- CONFIG_FUNCTION_GRAPH_TRACER sets it on x86-32 because of a gcc bug
  where the stack gets aligned before the mcount call.  This issue
  should be mostly obsolete as most modern compilers now have -mfentry.
  We could make it dependent on CC_USING_FENTRY.

- CONFIG_JUMP_LABEL sets it on x86-32 because of a bug in gcc <= 4.5.1
  which has since been fixed with
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226.  We could probably
  make it gcc-version-dependent.

- x86-64 sets it to apparently make the no-longer-in-tree DWARF unwinder
  happy with older versions of gcc.

So it looks like -maccumulate-outgoing-args isn't actually needed in
most cases.

-- 
Josh
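
For anyone who wants to re-check the 3.8% figure, it can be re-derived
from the two vmlinux size rows quoted in the message.  This is only a
minimal sketch: the numbers are copied from those rows, while the
dictionary layout and the helper name growth_percent are illustrative
assumptions, not anything from the kernel tree or binutils.

```python
# Section sizes (bytes) copied from the vmlinux.before / vmlinux.after
# rows in the message. The dict layout is an illustrative assumption.
before = {"text": 12692555, "data": 5550652, "bss": 9146368}
after = {"text": 13179531, "data": 5546556, "bss": 9146368}


def growth_percent(old: int, new: int) -> float:
    """Relative growth of `new` over `old`, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


# Print the per-section byte delta and relative change.
for section in ("text", "data", "bss"):
    delta = after[section] - before[section]
    pct = growth_percent(before[section], after[section])
    print(f"{section:5s} {delta:+9d} bytes ({pct:+.2f}%)")
```

The text delta works out to 486976 bytes, data shrinks by 4096 bytes and
bss is unchanged, which matches the 482880-byte difference in the dec
column.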