Date: Wed, 13 Aug 2008 20:41:42 +0200
From: Andi Kleen
To: Linus Torvalds
Cc: Mathieu Desnoyers, Steven Rostedt, Jeremy Fitzhardinge, Andi Kleen, LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Andrew Morton, David Miller, Roland McGrath, Ulrich Drepper, Rusty Russell, Gregory Haskins, Arnaldo Carvalho de Melo, "Luis Claudio R. Goncalves", Clark Williams
Subject: Re: Efficient x86 and x86_64 NOP microbenchmarks
Message-ID: <20080813184142.GM1366@one.firstfloor.org>
References: <20080808190506.GD11376@Krystal> <87tzdv2g05.fsf@basil.nowhere.org> <489CE90D.1040902@goop.org> <20080813175213.GA8679@Krystal>
X-Mailing-List: linux-kernel@vger.kernel.org

> So microbenchmarking this way will probably make some things look
> unrealistically good.

We must be careful not to miss the big picture here.

We have two assumptions in this thread:

- Normal alternative() nops are relatively infrequent, typically at
  points with enough pipeline bubbles anyway, so it likely doesn't
  matter how they are encoded. They also don't have an issue with
  multi-part instructions, because they're not patched at runtime,
  so the best known nop can always be used.
- The one case where nops are very frequent and matter, and where
  multi-part nops are a problem, is ftrace nopping out the call to
  mcount at runtime, because that happens on every function entry.
  Even there the overhead is not that big, but it is at least
  measurable in kernel builds.

Now the numbers have shown that just by not using the frame pointer
(-pg right now implies frame pointer) you can get more benefit than
what you lose from using non-optimal nops.

So for me the best strategy would be to get rid of the frame pointer
and ignore the nops. This unfortunately would require going away from
-pg and instead post-processing gcc output to insert "call mcount"
manually. But the nice advantage of that is that you could actually
set up a custom table of callers built in an ELF section, and with that
you don't actually need the runtime patching (which is only done
currently because there's no global table of mcount calls), but could
do everything in stop_machine(). Without runtime patching you also
don't need single-part nops.

I think that would be the best option. I especially like it because
it would avoid forcing the frame pointer, which seems to be costlier
than any kind of nops.

-Andi
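
The call-site table idea above can be illustrated with a small user-space sketch. It is not code from this thread or from the kernel: the names `mcount_site`, `nop_out_sites`, `nop5` and `CALL_LEN` are hypothetical, the "text" is just a byte buffer, and the choice of the 5-byte `nopl` encoding is an assumption. The point is only that once every "call mcount" location is recorded in a table, all of them can be rewritten in one pass (which in the kernel would happen while the other CPUs are held, e.g. under stop_machine()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of an x86 "call rel32": opcode 0xE8 plus a 32-bit displacement. */
#define CALL_LEN 5

/* One possible 5-byte NOP, nopl 0x0(%rax,%rax,1). Assumed choice. */
static const uint8_t nop5[CALL_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Hypothetical table entry: offset of one "call mcount" site in the text. */
struct mcount_site {
    size_t offset;
};

/*
 * Walk the table once and overwrite every recorded call with a NOP.
 * Entries that fall outside the buffer or do not start with the call
 * opcode are skipped. Returns the number of sites actually patched.
 */
static size_t nop_out_sites(uint8_t *text, size_t text_len,
                            const struct mcount_site *tbl, size_t n)
{
    size_t patched = 0;

    assert(text != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t off = tbl[i].offset;

        if (off > text_len || text_len - off < CALL_LEN)
            continue;               /* site does not fit in the buffer */
        if (text[off] != 0xE8)
            continue;               /* not a call rel32, leave it alone */
        memcpy(text + off, nop5, CALL_LEN);
        patched++;
    }
    return patched;
}
```

Because the whole table is processed in a single pass with nothing else executing the text, the replacement does not have to be a single instruction; the single-instruction `nopl` is used here only for brevity.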