Date: Tue, 12 May 2015 13:43:54 +0200
From: Ingo Molnar
To: Denys Vlasenko
Cc: Linus Torvalds, Thomas Graf, "David S. Miller", Bart Van Assche,
    Peter Zijlstra, David Rientjes, Andrew Morton, Oleg Nesterov,
    "Paul E. McKenney", linux-kernel@vger.kernel.org
Subject: Re: [PATCH] force inlining of spinlock ops
Message-ID: <20150512114353.GA13699@gmail.com>
References: <1431367042-31475-1-git-send-email-dvlasenk@redhat.com>
    <20150512074443.GA724@gmail.com> <5551DDBD.9010803@redhat.com>
In-Reply-To: <5551DDBD.9010803@redhat.com>


* Denys Vlasenko wrote:

> On 05/12/2015 09:44 AM, Ingo Molnar wrote:
> >
> > * Denys Vlasenko wrote:
> >
> >> With both gcc 4.7.2 and 4.9.2, sometimes gcc mysteriously doesn't inline
> >> very small functions we expect to be inlined. In particular,
> >> with this config: http://busybox.net/~vda/kernel_config
> >> there are more than a thousand copies of tiny spinlock-related functions:
> >
> > That's an x86-64 allyesconfig AFAICS, right?
>
> Close, but I disabled options which are clearly "heavy debugging" stuff.
> IOW: many developers run their work machines with lock debugging etc,
> but few would constantly use something which slows kernel down by a factor of 3!
>
> So, CONFIG_KASAN is off. CONFIG_STAGING is also off. And a few others I forgot.
>
> I'm using this config to see which inlines should be deinlined.
> For that, I need to cover all callsites of each inline.
> Thus, I need ~allyesconfig.
>
> The discovery that there also exists the opposite problem (wrongly
> *un*inlined functions) was accidental.
>
>
> > It's not mysterious, but an effect of -Os plus allowing GCC to do
> > inlining heuristics:
> >
> >   CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> >   CONFIG_OPTIMIZE_INLINING=y
> >
> > Does the problem go away if you unset of these config options?
>
> With CONFIG_CC_OPTIMIZE_FOR_SIZE off,
> problem greatly diminishes, but is not eliminated.
> Testing allyesconfig would take too long, so I just took defconfig.
>
> On defconfig kernel, the following functions below 16 bytes
> of machine code are auto-deinlined:
>
> #Calls_ Size(hex)_______ Name____________________
>       7 000000000000000b t hweight_long
>       5 000000000000000f t init_once
>       4 000000000000000d t cpumask_set_cpu
>       4 000000000000000b t udp_lib_close
>       4 0000000000000006 t udp_lib_hash
>       3 000000000000000a t nofill
>       3 0000000000000006 t sg_set_page.part.7
>       2 000000000000000f t udplite_sk_init
>       2 000000000000000f t ct_seq_next
>       2 000000000000000e t encode_cookie
>       2 000000000000000d t ktime_get_real
>       2 000000000000000b t spin_lock
>       2 000000000000000b t device_create_release
>       2 000000000000000b t cpu_smt_flags
>       2 000000000000000b t cpu_core_flags
>       2 0000000000000009 t default_write_file
>       2 0000000000000008 t __initcall_pl_driver_init6
>       2 0000000000000008 t __initcall_nf_defrag_init6
>       2 0000000000000008 t __initcall_hid_init6
>       2 0000000000000008 t __initcall_ch_driver_init6
>       2 0000000000000008 t default_read_file
>       2 0000000000000006 t wiphy_to_rdev.part.4
>       2 0000000000000006 t s_stop
>       2 0000000000000006 t sg_set_page.part.3
>       2 0000000000000006 t generic_print_tuple
>       2 0000000000000006 t exp_seq_stop
>       2 0000000000000006 t ct_seq_stop
>       2 0000000000000006 t ct_cpu_seq_stop
>
> In particular, one of the functions from my patches,
> spin_lock(), has been auto-deinlined:
>
> ffffffff8108adb0 <spin_lock>:
> ffffffff8108adb0:       55                      push   %rbp
> ffffffff8108adb1:       48 89 e5                mov    %rsp,%rbp
> ffffffff8108adb4:       e8 37 db 81 00          callq  ffffffff818a88f0 <_raw_spin_lock>
> ffffffff8108adb9:       5d                      pop    %rbp
> ffffffff8108adba:       c3                      retq
>
>
> > Furthermore, what is the size win on x86 defconfig with these options
> > set?
>
> CONFIG_OPTIMIZE_INLINING=y is in defconfig.
>
> Size difference for CC_OPTIMIZE_FOR_SIZE:
>
>     text    data     bss      dec    hex filename
> 12335864 1746152 1081344 15163360 e75fe0 vmlinux.CC_OPTIMIZE_FOR_SIZE=y
> 10373764 1684200 1077248 13135212 c86d6c vmlinux.CC_OPTIMIZE_FOR_SIZE=n
>
> Decrease by about 19%.

I suspect the 'filename' field wants to be flipped?

In any case, the interesting measurement would not be -Os comparisons
(which causes GCC to be too crazy), but to see the size effect of your
_patch_ that always-inlines spinlock ops, on plain defconfig and on
defconfig-Os.

Thanks,

	Ingo
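
For reference, a minimal user-space sketch of the distinction this thread
is about: a plain "static inline" wrapper is only a hint that the compiler
may ignore (notably at -Os), while the always_inline attribute forces the
wrapper into its callers. Everything named demo_* below is hypothetical and
exists only for illustration; the toy test-and-set lock is not the kernel's
spinlock_t, and demo_raw_lock() merely plays the role of _raw_spin_lock().

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's __always_inline macro. */
#define demo_always_inline inline __attribute__((__always_inline__))

/* Toy test-and-set lock, for illustration only. */
struct demo_lock {
	atomic_flag held;
};

/* Out-of-line lock body, analogous in role to _raw_spin_lock(). */
__attribute__((noinline)) void demo_raw_lock(struct demo_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

/* Out-of-line unlock body. */
__attribute__((noinline)) void demo_raw_unlock(struct demo_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hint only: the compiler may emit this as a separate tiny function. */
static inline void demo_lock_hint(struct demo_lock *l)
{
	demo_raw_lock(l);
}

/* Forced: the wrapper body is always expanded at the call site. */
static demo_always_inline void demo_lock_forced(struct demo_lock *l)
{
	demo_raw_lock(l);
}

static demo_always_inline void demo_unlock_forced(struct demo_lock *l)
{
	demo_raw_unlock(l);
}

/* Take and release the lock 'rounds' times via the hinted wrapper. */
int demo_roundtrip_hint(int rounds)
{
	struct demo_lock l = { .held = ATOMIC_FLAG_INIT };
	int done = 0;

	for (int i = 0; i < rounds; i++) {
		demo_lock_hint(&l);
		done++;
		demo_unlock_forced(&l);
	}
	return done;
}

/* Same loop, but through the force-inlined wrapper. */
int demo_roundtrip_forced(int rounds)
{
	struct demo_lock l = { .held = ATOMIC_FLAG_INIT };
	int done = 0;

	for (int i = 0; i < rounds; i++) {
		demo_lock_forced(&l);
		done++;
		demo_unlock_forced(&l);
	}
	return done;
}
```

Building this with "gcc -Os -c" and looking at "objdump -d" output is one
way to check whether a standalone copy of demo_lock_hint gets emitted.
Whether it does depends on the compiler version and flags, so treat any
single result as an observation, not a rule.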