From: Thomas Garnier
Subject: Re: x86: PIE support and option to extend KASLR randomization
Date: Thu, 21 Sep 2017 17:06:15 -0700
References: <20170815075609.mmzbfwritjzvrpsn@gmail.com> <20170816151235.oamkdva6cwpc4cex@gmail.com> <20170817080920.5ljlkktngw2cisfg@gmail.com> <20170825080443.tvvr6wzs362cjcuu@gmail.com> <20170921155919.skpyt7dutod5ul4t@gmail.com>
Cc: Herbert Xu, "David S. Miller", Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini, Radim Krčmář, Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov, Brian Gerst, "Kirill A. Shutemov", "Rafael J. Wysocki", Len Brown, Pavel Machek, Tejun Heo, Christoph La
To: Ingo Molnar
List-Id: linux-crypto.vger.kernel.org

On Thu, Sep 21, 2017 at 2:16 PM, Thomas Garnier wrote:
>
> On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar wrote:
> >
> > ( Sorry about the delay in answering this. I could blame the delay on the merge
> > window, but in reality I've been procrastinating this is due to the permanent,
> > non-trivial impact PIE has on generated C code. )
> >
> > * Thomas Garnier wrote:
> >
> >> 1) PIE sometime needs two instructions to represent a single
> >> instruction on mcmodel=kernel.
> >
> > What again is the typical frequency of this occurring in an x86-64 defconfig
> > kernel, with the very latest GCC?
>
> I am not sure what is the best way to measure that.

A very approximate approach would be to look at each instruction that
uses the signed trick with a _32S relocation. Not all _32S relocations
will be translated into more instructions, because some only relocate
part of an absolute mov that would actually be smaller if it were
relative.
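For readers less familiar with the "signed trick": mcmodel=kernel relies on the kernel image living in the top 2 GB of the address space, so a symbol's address equals the sign extension of its low 32 bits and can be encoded directly as an imm32/disp32 (which is what an R_X86_64_32S relocation patches). A rough Python sketch of that check follows; the helper names are made up for illustration, and the second example address is an arbitrary value outside the top 2 GB, not one taken from the patch set.

```python
# Sketch: can an absolute 64-bit address be encoded as a sign-extended
# 32-bit immediate/displacement (the R_X86_64_32S case)?
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def to_signed64(addr: int) -> int:
    """Interpret a 64-bit unsigned address as a two's-complement value."""
    addr &= (1 << 64) - 1
    return addr - (1 << 64) if addr >= (1 << 63) else addr


def fits_32s(addr: int) -> bool:
    """True if addr equals the sign extension of its low 32 bits."""
    return INT32_MIN <= to_signed64(addr) <= INT32_MAX


if __name__ == "__main__":
    # Inside the top 2 GB, where mcmodel=kernel expects the kernel image.
    print(hex(0xffffffff81000000), fits_32s(0xffffffff81000000))  # True
    # Arbitrary (hypothetical) address outside that window.
    print(hex(0xffffffff00000000), fits_32s(0xffffffff00000000))  # False
```

Under PIE, a symbol that may land outside that window has to be reached RIP-relative instead, typically an lea into a register followed by the original operation, which is where the extra instruction in point 1) comes from.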
I used this command to get a relative estimate:

objdump -dr ./baseline/vmlinux | egrep -A 2 '\-0x[0-9a-f]{8}' | grep _32S | wc -l

It found 6130 places. If you assume each one adds at least 7 bytes,
that is at least 42910 bytes added to the .text section. The .text
section is 78599 bytes bigger with PIE than in the baseline, so that
is at least 54% of the size difference. That assumes we found all of
them, and it doesn't account for the impact of using an additional
register.

A similar approach works for the switch tables, but it is a bit more
complex:

1) Find all constructs with an lea (%rip) followed by a jmp
   instruction inside a function (a typical unfolded switch case).
2) Remove destination addresses with fewer than 4 occurrences.

Result: 480 switch cases in 49 functions. Each case takes at least 9
bytes and the switch itself takes 16 bytes (assuming one per
function). That's 5104 bytes for the easy-to-identify switches (less
than 7% of the increase).

I am certainly missing a lot of differences. I checked whether the
percpu changes affected the size, and they don't (only 3 bytes added
with PIE). I also tried other ways to compare the .text section, such
as symbol sizes or the number of bytes in a full disassembly, but the
results are far off from the overall .text size difference, so I am
not sure that is the right way to go about it.

>
> >
> > Also, to make sure: which unwinder did you use for your measurements,
> > frame-pointers or ORC? Please use ORC only for future numbers, as
> > frame-pointers is obsolete from a performance measurement POV.
>
> I used the default configuration, which uses frame-pointers. I built all
> the different binaries with ORC and I see an improvement in size:
>
> On latest revision (just built and ran performance tests this week):
>
> With frame-pointers: PIE .text is 0.837324% larger than baseline
>
> With ORC: PIE .text is 0.814224% larger than baseline
>
> Comparing baselines only, ORC is 2.849832% smaller than frame-pointers.
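A rough Python restatement of the size estimate earlier in this reply, in case someone wants to rerun it. It assumes the same per-site minimums used there (7 bytes per _32S site, 9 bytes per switch case, 16 bytes per switch). count_32s_sites() is a hypothetical helper that mimics the egrep -A 2 / grep _32S / wc -l pipeline on objdump -dr output lines; the sample lines are invented and only roughly shaped like objdump output.

```python
import re

# Same pattern the egrep in the pipeline uses: a negative 32-bit hex value.
NEG32 = re.compile(r"-0x[0-9a-f]{8}")


def count_32s_sites(lines):
    # Mimics: egrep -A 2 <NEG32> | grep _32S | wc -l
    # egrep prints each matching line plus up to two lines of trailing
    # context, printing overlapping context only once, so collect line
    # indices in a set before counting the ones that mention _32S.
    selected = set()
    for i, line in enumerate(lines):
        if NEG32.search(line):
            selected.update(range(i, min(i + 3, len(lines))))
    return sum(1 for i in selected if "_32S" in lines[i])


def lower_bound(count, bytes_each, text_growth):
    # Minimum extra bytes and their share (percent) of the .text growth.
    extra = count * bytes_each
    return extra, 100.0 * extra / text_growth


def switch_lower_bound(cases, functions, text_growth,
                       case_bytes=9, switch_bytes=16):
    # Assumes one switch per function, as in the estimate in this reply.
    extra = cases * case_bytes + functions * switch_bytes
    return extra, 100.0 * extra / text_growth


TEXT_GROWTH = 78599  # .text bytes added by PIE vs. baseline

if __name__ == "__main__":
    sample = [  # invented lines, roughly objdump -dr shaped
        "ffffffff8100a010:\t48 8b 04 c5 00 10 60 82\tmov    -0x7d9ff000(,%rax,8),%rax",
        "\t\t\tffffffff8100a014: R_X86_64_32S\tsome_table",
        "ffffffff8100a018:\t48 89 c7\tmov    %rax,%rdi",
    ]
    print(count_32s_sites(sample))                   # 1
    print(lower_bound(6130, 7, TEXT_GROWTH))         # (42910, ~54.59)
    print(switch_lower_bound(480, 49, TEXT_GROWTH))  # (5104, ~6.49)
```

Using a set of line indices rather than counting per match is what keeps two nearby matches from counting the same _32S line twice, matching how egrep merges overlapping context.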
>
> >
> >> 2) GCC does not optimize switches in PIE in order to reduce relocations:
> >
> > Hopefully this can either be fixed in GCC or at least influenced via a compiler
> > switch in the future.
> >
> >> The switches are the biggest increase on small functions but I don't
> >> think they represent a large portion of the difference (number 1 is).
> >
> > Ok.
> >
> >> A side note, while testing gcc 7.2.0 on hackbench I have seen the PIE
> >> kernel being faster by 1% across multiple runs (comparing 50 runs done
> >> across 5 reboots twice). I don't think PIE is faster than a
> >> mcmodel=kernel but recent versions of gcc makes them fairly similar.
> >
> > So I think we are down to an overhead range where the inherent noise (both random
> > and systematic one) in 'hackbench' overwhelms the signal we are trying to measure.
> >
> > So I think it's the kernel .text size change that is the best noise-free proxy for
> > the overhead impact of PIE.
>
> I agree, but it might be hard to measure the exact impact. What is
> acceptable and what is not?
>
> >
> > It doesn't hurt to double check actual real performance as well, just don't expect
> > there to be much of a signal for anything but fully cached microbenchmark
> > workloads.
>
> That's aligned with what I see in the latest performance testing.
> Performance is close enough that it is hard to get exact numbers (PIE
> is just a bit slower than baseline on hackbench (~1%)).
>
> >
> > Thanks,
> >
> > Ingo
> >
>
> --
> Thomas

--
Thomas