Date: Thu, 21 May 2015 14:12:18 +0200
From: Ingo Molnar
To: Linus Torvalds
Cc: Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Michal Marek, Peter Zijlstra, X86 ML, live-patching@vger.kernel.org, "linux-kernel@vger.kernel.org", Andy Lutomirski, Denys Vlasenko, Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton
Subject: Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
Message-ID: <20150521121218.GA18887@gmail.com>
References: <20150520103339.GA22205@gmail.com> <20150520141331.GA16995@treble.redhat.com> <20150520144810.GA10374@gmail.com> <20150520162537.GD16995@treble.redhat.com> <20150521075228.GA20782@gmail.com>
In-Reply-To: <20150521075228.GA20782@gmail.com>

* Ingo Molnar wrote:

> Especially on modern x86 CPUs with stack engines (latest Intel and
> AMD CPUs) that keeps ESP updates out of the later stages of
> execution pipelines, going from RBP framepointers to direct ESP use
> is beneficial to performance and compresses I$ footprint as well:
>
>      text    data     bss      dec     hex filename
>  12150606 2565544 1634304 16350454  f97cf6 linux-CONFIG_FRAME_POINTERS=n/vmlinux
>  13282884 2571744 1617920 17472548 10a9c24 linux-CONFIG_FRAME_POINTERS=y/vmlinux

Correction: I ran that with a 1-byte alignment patch still applied.
I reran all the numbers with the default 16-byte alignment as well, and
the gap between framepointers and no-framepointers became smaller, but
the various trends and conclusions still hold. Here are the updated
numbers:

     text    data     bss      dec     hex filename
 13548564 2571744 1617920 17738228 10ea9f4 linux-CONFIG_FRAME_POINTERS=n/vmlinux
 13797773 2571744 1617920 17987437 112776d linux-CONFIG_FRAME_POINTERS=y/vmlinux

> Here's the I$ cachemiss rate with the 'vfs-mix' workload that I used
> in the -falign-functions measurements gives this for
> CONFIG_FRAMEPOINTERS=y, on Intel Sandy Bridge (best of 9x10 runs):
>
>  #
>  # CONFIG_FRAMEPOINTERS=y
>  #
>  Performance counter stats for 'system wide' (10 runs):
>
>        728,328,347      L1-icache-load-misses    ( +- 0.08% )  (100.00%)
>     11,891,931,664      instructions             ( +- 0.00% )
>            300,023      context-switches         ( +- 0.00% )
>
>        7.324048170 seconds time elapsed          ( +- 0.09% )

 Performance counter stats for 'system wide' (10 runs):

       701,525,006      L1-icache-load-misses    ( +- 0.06% )  (100.00%)
    11,891,793,196      instructions             ( +- 0.01% )
           300,036      context-switches         ( +- 0.00% )

       7.354372294 seconds time elapsed          ( +- 0.82% )

>
> ... and these are the I$ miss perf stats from running the same
> workload on a CONFIG_FRAMEPOINTERS=n kernel:
>
>  #
>  # CONFIG_FRAMEPOINTERS are not set
>  #
>  Performance counter stats for 'system wide' (10 runs):
>
>        687,758,078      L1-icache-load-misses    ( +- 0.10% )  (100.00%)
>     10,984,908,013      instructions             ( +- 0.01% )
>            300,021      context-switches         ( +- 0.00% )
>
>        7.120867260 seconds time elapsed          ( +- 0.29% )

 Performance counter stats for 'system wide' (10 runs):

       685,107,089      L1-icache-load-misses    ( +- 0.08% )  (100.00%)
    10,983,861,590      instructions             ( +- 0.01% )
           300,031      context-switches         ( +- 0.00% )

       7.120738452 seconds time elapsed          ( +- 0.35% )

> So if we disable frame pointers, then on this workload:
>
>   - the kernel text size is 9.3% smaller
>   - the number of instructions executed went down by about 8.2%
>   - the cachemiss rate went down by about 5.9%
>   - performance went up by about 2.8%.

  - the kernel text size is 1.8% smaller: with 16-byte alignment
    there's quite some extra free space the frame pointer code can
    grow into, which reduces the size win.

  - the number of instructions executed went down by about 8.2% (as
    expected this is invariant of alignment.)

  - the cachemiss rate went down by about 2.7%: this is a smaller win
    again, partly because of the 'free space' 16-byte alignment gives
    us.

  - the best 'time elapsed' numbers out of 10 runs show a speedup of
    2.0% - close to the 2.8% with 1-byte alignment.
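
As an illustration only (not part of the original measurements), the 1.8%
text size figure can be rechecked from the 16-byte alignment 'size' output
with a short Python sketch. The helper name reduction_pct is made up for
this example, and it assumes the CONFIG_FRAME_POINTERS=y kernel is the
baseline for the percentage:

```python
# Text sizes copied from the 16-byte alignment `size` output in this message.
TEXT_FP_ON = 13797773   # linux-CONFIG_FRAME_POINTERS=y/vmlinux
TEXT_FP_OFF = 13548564  # linux-CONFIG_FRAME_POINTERS=n/vmlinux


def reduction_pct(with_fp, without_fp):
    """Percent by which a value shrinks when frame pointers are disabled.

    Assumption: the frame-pointer kernel (with_fp) is the baseline.
    """
    return 100.0 * (with_fp - without_fp) / with_fp


text_saving = reduction_pct(TEXT_FP_ON, TEXT_FP_OFF)
print(f"text size reduction: {text_saving:.1f}%")
```

The choice of baseline matters for figures like these: dividing by the
smaller (no-framepointers) value instead gives a slightly larger percentage.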
> The speedup is actually even better than 2.8%, if you look at
> average execution time:
>
>  linux-CONFIG_FRAME_POINTERS=y/res.txt:        7.324048170 seconds time elapsed    ( +- 0.09% )
>  linux-CONFIG_FRAME_POINTERS=y/res.txt:        7.470166715 seconds time elapsed    ( +- 1.01% )
>  linux-CONFIG_FRAME_POINTERS=y/res.txt:        7.365047474 seconds time elapsed    ( +- 0.25% )
>  linux-CONFIG_FRAME_POINTERS=y/res.txt:        7.828223324 seconds time elapsed    ( +- 2.04% )
>  linux-CONFIG_FRAME_POINTERS=y/res.txt:        7.427164489 seconds time elapsed    ( +- 0.70% )
>  linux-CONFIG_FRAME_POINTERS=y/res.txt:        7.385565350 seconds time elapsed    ( +- 0.35% )
>  linux-CONFIG_FRAME_POINTERS=y/res.txt:        7.560782318 seconds time elapsed    ( +- 1.68% )
>  linux-CONFIG_FRAME_POINTERS=y/res.txt:        7.399741309 seconds time elapsed    ( +- 0.74% )
>  linux-CONFIG_FRAME_POINTERS=y/res.txt:        7.303746766 seconds time elapsed    ( +- 0.04% )
>
>  avg = 7.451609

 linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.300875812 seconds time elapsed    ( +- 0.17% )
 linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.491652338 seconds time elapsed    ( +- 1.33% )
 linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.307877300 seconds time elapsed    ( +- 0.20% )
 linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.258946461 seconds time elapsed    ( +- 0.23% )
 linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.295113779 seconds time elapsed    ( +- 0.30% )
 linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.283375859 seconds time elapsed    ( +- 0.21% )
 linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.319320205 seconds time elapsed    ( +- 0.38% )
 linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.354372294 seconds time elapsed    ( +- 0.82% )
 linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.308955558 seconds time elapsed    ( +- 0.26% )
 linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.295267101 seconds time elapsed    ( +- 0.26% )

 avg=7.32

>  linux-CONFIG_FRAME_POINTERS=n/res.txt:        7.201498813 seconds time elapsed    ( +- 0.86% )
>  linux-CONFIG_FRAME_POINTERS=n/res.txt:        7.120867260 seconds time elapsed    ( +- 0.29% )
>  linux-CONFIG_FRAME_POINTERS=n/res.txt:        7.141642635 seconds time elapsed    ( +- 0.15% )
>  linux-CONFIG_FRAME_POINTERS=n/res.txt:        7.217213506 seconds time elapsed    ( +- 0.85% )
>  linux-CONFIG_FRAME_POINTERS=n/res.txt:        7.163046581 seconds time elapsed    ( +- 0.56% )
>  linux-CONFIG_FRAME_POINTERS=n/res.txt:        7.128939439 seconds time elapsed    ( +- 0.23% )
>  linux-CONFIG_FRAME_POINTERS=n/res.txt:        7.256172853 seconds time elapsed    ( +- 0.82% )
>  linux-CONFIG_FRAME_POINTERS=n/res.txt:        7.122946768 seconds time elapsed    ( +- 0.23% )
>  linux-CONFIG_FRAME_POINTERS=n/res.txt:        7.126018578 seconds time elapsed    ( +- 0.18% )
>
>  avg = 7.164260

 linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.135061084 seconds time elapsed    ( +- 0.39% )
 linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.132738388 seconds time elapsed    ( +- 0.34% )
 linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.174334895 seconds time elapsed    ( +- 0.32% )
 linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.215143851 seconds time elapsed    ( +- 0.71% )
 linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.131166029 seconds time elapsed    ( +- 0.19% )
 linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.270427197 seconds time elapsed    ( +- 1.22% )
 linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.120738452 seconds time elapsed    ( +- 0.35% )
 linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.168856127 seconds time elapsed    ( +- 0.27% )
 linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.268637173 seconds time elapsed    ( +- 1.28% )
 linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.178431781 seconds time elapsed    ( +- 0.32% )

 avg=7.18

> Then with framepointers disabled this workload gets faster by 4.0%
> on average.

With 16-byte alignment the average gets faster by 2.8%.

The conclusions are unchanged:

> The average result is also pretty stable in the no-framepointers
> case, while it fluctuates more in the framepointers case. (and this
> is why the 'best runtime' favors the framepointers case - the
> average is closer to reality.)
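
For anyone who wants to recheck the avg=7.32 and avg=7.18 figures for the
16-byte alignment runs, here is a minimal Python sketch that averages the
'seconds time elapsed' values per result file. It is an illustration, not
the script used for the measurements: the names PERF_LINES, ELAPSED_RE and
mean_elapsed are made up for this example, and the input is simply the
res2.txt lines from this message with whitespace normalized.

```python
import re
from collections import defaultdict
from statistics import mean

# The res2.txt lines from this message (16-byte alignment runs).
PERF_LINES = """\
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.300875812 seconds time elapsed ( +- 0.17% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.491652338 seconds time elapsed ( +- 1.33% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.307877300 seconds time elapsed ( +- 0.20% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.258946461 seconds time elapsed ( +- 0.23% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.295113779 seconds time elapsed ( +- 0.30% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.283375859 seconds time elapsed ( +- 0.21% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.319320205 seconds time elapsed ( +- 0.38% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.354372294 seconds time elapsed ( +- 0.82% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.308955558 seconds time elapsed ( +- 0.26% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt: 7.295267101 seconds time elapsed ( +- 0.26% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.135061084 seconds time elapsed ( +- 0.39% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.132738388 seconds time elapsed ( +- 0.34% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.174334895 seconds time elapsed ( +- 0.32% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.215143851 seconds time elapsed ( +- 0.71% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.131166029 seconds time elapsed ( +- 0.19% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.270427197 seconds time elapsed ( +- 1.22% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.120738452 seconds time elapsed ( +- 0.35% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.168856127 seconds time elapsed ( +- 0.27% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.268637173 seconds time elapsed ( +- 1.28% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt: 7.178431781 seconds time elapsed ( +- 0.32% )
"""

# "<file>: <seconds> seconds time elapsed ..." as it appears in the lines above.
ELAPSED_RE = re.compile(r"^(\S+):\s+([0-9.]+) seconds time elapsed")


def mean_elapsed(text):
    """Return {result file: mean elapsed seconds} for matching lines."""
    runs = defaultdict(list)
    for line in text.splitlines():
        m = ELAPSED_RE.match(line.strip())
        if m:
            runs[m.group(1)].append(float(m.group(2)))
    return {name: mean(values) for name, values in runs.items()}


averages = mean_elapsed(PERF_LINES)
for name, avg in averages.items():
    print(f"{name}: avg={avg:.2f}")
```

Grouping by the file name in front of the colon keeps the two kernel
configurations apart without hardcoding either of them.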
>
> So the performance advantages of not doing framepointers is not
> something we can ignore IMHO: but obviously performance isn't
> everything - so if stack unwinding is unrobust, then we need and
> want frame pointers.

Thanks,

	Ingo