Date: Wed, 13 Aug 2008 21:37:15 +0200
From: Andi Kleen
To: Mathieu Desnoyers
Cc: Andi Kleen, Linus Torvalds, Steven Rostedt, Jeremy Fitzhardinge, LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Andrew Morton, David Miller, Roland McGrath, Ulrich Drepper, Rusty Russell, Gregory Haskins, Arnaldo Carvalho de Melo, "Luis Claudio R. Goncalves", Clark Williams
Subject: Re: Efficient x86 and x86_64 NOP microbenchmarks
Message-ID: <20080813193715.GQ1366@one.firstfloor.org>
References: <87tzdv2g05.fsf@basil.nowhere.org> <489CE90D.1040902@goop.org> <20080813175213.GA8679@Krystal> <20080813184142.GM1366@one.firstfloor.org> <20080813193011.GC15547@Krystal>
In-Reply-To: <20080813193011.GC15547@Krystal>

> Sorry to ask, I feel I must be missing something, but I'm trying to
> figure out where you propose to add the "call mcount" ? In the caller or
> in the callee ?

In the callee, like gcc does. Putting it in the caller would likely be
more bloated because there are more call sites than functions. Also, if
it were at the caller, more code would be needed because the function
currently being executed couldn't be read from the stack directly.

> Or is it a different scheme I don't see ? I am trying to figure out how
> you happen to do all that without dynamic code modification and manage
> not to hurt performance.

The dynamic code modification is only needed because there is no global
table of the mcount call sites. So instead it discovers them at runtime,
but that requires runtime-safe patching.

With a custom call scheme one could just build up a table of call sites
at link time using an ELF section, and then, when tracing is enabled or
disabled, always patch them all in one go in a stop_machine(). Then you
wouldn't need parallel-execution-safe patching anymore, and it doesn't
matter what the nops look like.

The other advantage is that it would allow getting rid of the frame
pointer.

-Andi
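
For illustration, the link-time table idea can be sketched in user space
roughly as below. This is only a sketch under assumptions, not the kernel
implementation: the section name "trace_sites", struct trace_site,
TRACE_SITE() and the two helpers are hypothetical names made up for the
example, and "patching" is simulated by flipping a flag instead of
rewriting instructions. It assumes GCC or clang with an ELF linker (GNU
ld or lld), which defines __start_<section> and __stop_<section> symbols
for sections whose names are valid C identifiers.

```c
#include <stddef.h>

/* Hypothetical record for one instrumented site. */
struct trace_site {
    const char *func;   /* function that contains the site */
    int enabled;        /* stands in for "nop" (0) vs. "call" (1) */
};

/*
 * Emit one record into a dedicated ELF section. "used" keeps the
 * otherwise unreferenced variable; the explicit alignment keeps the
 * compiler from padding entries, so the section can be walked as an array.
 */
#define TRACE_SITE(name)                                           \
    static struct trace_site trace_site_##name                     \
    __attribute__((used, section("trace_sites"),                   \
                   aligned(__alignof__(struct trace_site)))) =     \
        { #name, 0 }

/* Section bounds provided by the linker (assumption: GNU ld or lld). */
extern struct trace_site __start_trace_sites[];
extern struct trace_site __stop_trace_sites[];

TRACE_SITE(foo);
TRACE_SITE(bar);
TRACE_SITE(baz);

/* Number of sites collected at link time. */
static size_t trace_site_count(void)
{
    return (size_t)(__stop_trace_sites - __start_trace_sites);
}

/*
 * Flip every site in one pass and return how many were visited. In the
 * kernel this loop would run inside a stop_machine() callback and
 * rewrite the instruction at each site instead of setting a flag.
 */
static size_t trace_set_all(int on)
{
    size_t n = 0;

    for (struct trace_site *s = __start_trace_sites;
         s < __stop_trace_sites; s++) {
        s->enabled = on;
        n++;
    }
    return n;
}
```

Calling trace_set_all(1) turns every collected site on and
trace_set_all(0) turns them all off again. Because the table is complete
at link time, nothing has to be discovered while the code is running.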