Date: Tue, 11 Jul 2017 10:40:56 +0200
From: Ingo Molnar <mingo@kernel.org>
To: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
        live-patching@vger.kernel.org,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Andy Lutomirski <luto@kernel.org>, Jiri Slaby <jslaby@suse.cz>,
        "H. Peter Anvin" <hpa@zytor.com>,
        Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v2 4/8] objtool: add undwarf debuginfo generation
Message-ID: <20170711084055.pfrzl5kql7coxsxn@gmail.com>
References: <cover.1498659915.git.jpoimboe@redhat.com>
 <e255ec17d43e4a22b58accb42380fe78250cafe8.1498659915.git.jpoimboe@redhat.com>
 <20170629072512.pmkfnrgq4dci6od7@gmail.com>
 <20170629140404.qgcvxhcgm7iywrkb@treble>
 <20170629144618.vdzem7o6ib5nqab6@gmail.com>
 <20170629150652.r2dl7f3pzp6cj2i7@treble>
 <20170706203636.lcwfjsphmy2q464v@treble>
 <20170707094437.2vgosia5hjg2wsut@gmail.com>
 <20170711025807.62fzfgf2dhcgqur6@treble>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170711025807.62fzfgf2dhcgqur6@treble>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1132
Lines: 27


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> Anyway, I used some linker magic to temporarily move the unwinder code to the 
> end of .text, so that unwinder changes don't add unexpected side effects to the 
> microbenchmark behavior.  Now I'm getting more consistent results: the packed 
> struct is measuring ~2% slower.  The slight slowdown might just be explained by 
> the fact that GCC generates some extra instructions for extracting the fields 
> out of the packed struct.

Yeah, 16-bit fields are more complex to access than zero-extended 32-bit 
fields, even on x86, which has a fair amount of 16-bit legacy.
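
To illustrate the kind of extra work involved, here is a minimal sketch. The
struct layouts and helper names below are hypothetical and are not the actual
undwarf structure from the patch; they only assume a GCC-compatible compiler
that accepts __attribute__((packed)). The packed variant needs a widening load
for each 16-bit offset and a shift or mask for each sub-byte field, whereas
the wide variant reads every field with a plain 32-bit load.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "wide" layout: every field is a full 32-bit word. */
struct entry_wide {
	int32_t  sp_offset;
	int32_t  bp_offset;
	uint32_t sp_reg;
	uint32_t bp_reg;
};

/* Hypothetical packed layout: 16-bit offsets, two 4-bit registers. */
struct entry_packed {
	int16_t sp_offset;
	int16_t bp_offset;
	uint8_t regs;		/* low nibble: sp_reg, high nibble: bp_reg */
} __attribute__((packed));

static_assert(sizeof(struct entry_packed) < sizeof(struct entry_wide),
	      "packing is expected to shrink the entry");

/* Plain 32-bit load, then add. */
static inline long wide_cfa(const struct entry_wide *e, long sp)
{
	return sp + e->sp_offset;
}

/* The 16-bit offset has to be sign-extended before the add. */
static inline long packed_cfa(const struct entry_packed *e, long sp)
{
	return sp + e->sp_offset;
}

/* Sub-byte fields need an extra mask or shift to extract. */
static inline unsigned int packed_sp_reg(const struct entry_packed *e)
{
	return e->regs & 0xf;
}

static inline unsigned int packed_bp_reg(const struct entry_packed *e)
{
	return e->regs >> 4;
}
```

Both variants compute the same values; the difference is only in how many
instructions the compiler has to emit per field access.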

> In the meantime, I found a ~10% speedup by making the "fast lookup table" block 
> size a power-of-two (256) to get rid of the need for a slow 'div' instruction.
> 
> I think I'm done performance tweaking for now.  I'll keep the packed struct, and 
> add the code for the 'div' removal, and hope to submit v3 soon.

Sounds good to me!
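
For reference, the 'div' removal can be sketched roughly as follows. This is
only an illustration under assumptions: the names (LOOKUP_BLOCK_ORDER,
block_index_div(), block_index_shift()) are made up here and are not taken
from the patch. With a block size that is only known at run time, the index
computation needs a real division; with a fixed power-of-two block size of
256 it reduces to a shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; a 256-byte block is 1 << 8. */
#define LOOKUP_BLOCK_ORDER	8
#define LOOKUP_BLOCK_SIZE	(1UL << LOOKUP_BLOCK_ORDER)

static_assert((LOOKUP_BLOCK_SIZE & (LOOKUP_BLOCK_SIZE - 1)) == 0,
	      "block size must be a power of two");

/* Run-time block size: the compiler has to emit a division. */
static inline uintptr_t block_index_div(uintptr_t ip, uintptr_t text_start,
					 uintptr_t block_size)
{
	return (ip - text_start) / block_size;
}

/* Fixed power-of-two block size: the division becomes a right shift. */
static inline uintptr_t block_index_shift(uintptr_t ip, uintptr_t text_start)
{
	return (ip - text_start) >> LOOKUP_BLOCK_ORDER;
}
```

Both helpers return the same block index whenever block_size is 256, so the
lookup table contents do not change, only the cost of indexing into it.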

A ~2% slowdown in exchange for ~30% RAM savings, for a debug data structure 
that is about as large as a typical kernel's total .text, is a decent 
trade-off.

Thanks,

	Ingo