Date: Wed, 13 Aug 2008 11:27:14 -0700 (PDT)
From: Linus Torvalds
To: Mathieu Desnoyers
cc: Steven Rostedt, Jeremy Fitzhardinge, Andi Kleen, LKML, Ingo Molnar,
    Thomas Gleixner, Peter Zijlstra, Andrew Morton, David Miller,
    Roland McGrath, Ulrich Drepper, Rusty Russell, Gregory Haskins,
    Arnaldo Carvalho de Melo, "Luis Claudio R. Goncalves", Clark Williams
Subject: Re: Efficient x86 and x86_64 NOP microbenchmarks
In-Reply-To: <20080813175213.GA8679@Krystal>
References: <20080808182104.GA11376@Krystal> <20080808190506.GD11376@Krystal>
    <87tzdv2g05.fsf@basil.nowhere.org> <489CE90D.1040902@goop.org>
    <20080813175213.GA8679@Krystal>

On Wed, 13 Aug 2008, Mathieu Desnoyers wrote:
>
> I also did some microbenchmarks on my Intel Xeon 64 bits, AMD64 and
> Intel Pentium 4 boxes to compare a baseline

Note that the biggest problems of a jump-based nop are likely to happen
when there are I$ misses and/or when there are other jumps involved.
Ie some microarchitectures tend to have issues with jumps to jumps, or
when there are multiple control changes in the same (possibly partial)
cacheline, because the instruction stream prediction may be predecoded
in the L1 I$, and multiple branches in the same cacheline - or in the
same execution cycle - can pollute that kind of thing.

So microbenchmarking this way will probably make some things look
unrealistically good.

On the P4, the trace cache makes things even more interesting, since
it's another level of I$ entirely, with very different behavior for the
hit case vs the miss case.

And I$ miss rates for the kernel are actually fairly high. Not in
microbenchmarks that tend to have very repetitive behavior and a small
I$ footprint, but in a lot of real-life loads the *bulk* of all action
is in user space, and then the kernel side is often invoked with few
loops (the kernel has very few loops indeed) and a cold I$.

So your numbers are interesting, but it would be really good to also get
some info from Intel/AMD who may know about microarchitectural issues
for the cases that don't show up in the hot-I$ environment.

		Linus