Date: Thu, 29 Jun 2017 09:55:47 +0200
From: Ingo Molnar <mingo@kernel.org>
To: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
        live-patching@vger.kernel.org,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Andy Lutomirski <luto@kernel.org>, Jiri Slaby <jslaby@suse.cz>,
        "H. Peter Anvin" <hpa@zytor.com>,
        Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v2 0/8] x86: undwarf unwinder
Message-ID: <20170629075547.y24s7aq4nqwt2rll@gmail.com>
References: <cover.1498659915.git.jpoimboe@redhat.com>
In-Reply-To: <cover.1498659915.git.jpoimboe@redhat.com>


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> Undwarf vs frame pointers
> -------------------------
> 
> With frame pointers enabled, GCC adds instrumentation code to every
> function in the kernel.  The kernel's .text size increases by about
> 3.2%, resulting in a broad kernel-wide slowdown.  Measurements by Mel
> Gorman [1] have shown a slowdown of 5-10% for some workloads.
> 
> In contrast, the undwarf unwinder has no effect on text size or runtime
> performance, because the debuginfo is out of band.  So if you disable
> frame pointers and enable undwarf, you get a nice performance
> improvement across the board, and still have reliable stack traces.
> 
> Another benefit of undwarf compared to frame pointers is that it can
> reliably unwind across interrupts and exceptions.  Frame pointer based
> unwinds can skip the caller of the interrupted function if it was a leaf
> function or if the interrupt hit before the frame pointer was saved.
> 
> The main disadvantage of undwarf compared to frame pointers is that it
> needs more memory to store the undwarf table: roughly 3-5MB depending on
> the kernel config.

Note that it's not just a performance improvement, but also an instruction cache
locality improvement: 3.2% .text savings translate almost directly into a
similarly sized reduction in cache footprint. That can translate into even higher
speedups for workloads whose cache locality is borderline.

I _really_ like this feature, and the independence of the debuginfo data format. 

Logistically, it's too bad that with the merge window only 3 days away it's too
late to pick this up:

>  56 files changed, 3466 insertions(+), 1765 deletions(-)

OTOH most of the diffstat is in objtool.

Any objections to applying the first 3 objtool patches straight away and seeing
whether anything breaks? That would significantly reduce the size of the rest of
the patch set.

> I'm not tied to the 'undwarf' name, other naming ideas are welcome.

Ha, a new bike shed painting job! ;-)

I think 'undwarf' isn't a bad name: it's short, catchy and describes the purpose
of the effort.

But I cannot resist some other suggestions: after 'elf' and 'dwarf' the obvious
candidates from the peoples of Middle-earth would be:

 - 'Hobbit'
 - 'Eagle'
 - 'Ent'
 - 'Dragon'
 - 'Troll'
 - 'Ainur'

'struct troll_entry' has a certain charm to it.

'Eagle' is even nicer IMHO: larger than a dwarf but so much faster - and eagles
are beautiful! Plus the name is 2 letters shorter than 'undwarf', win-win.

Thanks,

	Ingo