Date: Fri, 8 Aug 2008 12:04:00 -0700 (PDT)
From: Linus Torvalds
To: Steven Rostedt
cc: Mathieu Desnoyers, linux-kernel@vger.kernel.org, Ingo Molnar,
    Thomas Gleixner, Peter Zijlstra, Andrew Morton, David Miller,
    Roland McGrath, Ulrich Drepper, Rusty Russell, Jeremy Fitzhardinge,
    Gregory Haskins, Arnaldo Carvalho de Melo,
    "Luis Claudio R. Goncalves", Clark Williams
Subject: Re: [PATCH 0/5] ftrace: to kill a daemon
References: <20080807182013.984175558@goodmis.org>
    <20080807184741.GB18164@Krystal> <20080808172259.GB8244@Krystal>
    <20080808174607.GG8244@Krystal> <20080808182104.GA11376@Krystal>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 8 Aug 2008, Steven Rostedt wrote:
>
> Can a processor be preempted in a middle of nops?

Sure. If you have two nops in a row (and the kernel definition of the NOP
array does _not_ guarantee that it's a single-instruction one), you may get
a profile hit (ie any interrupt) on the second one. It's less _likely_, but
it certainly is not architecturally in any way guaranteed that the kernel
"nop[]" tables would be atomic.

> What do nops do for a processor?

Depends on the microarchitecture.
But many will squash them in the decode stage, and generate no uops for
them at all, so it's purely a decode throughput issue and has absolutely
_zero_ impact for any later CPU stages.

> Can it skip them nicely in one shot?

See above. It needs to decode them, and the decoder itself may well have
some limitations - for example, the longer nops may not even decode in all
decoders, which is why some uarchs might prefer two short nops to one long
one, but _generally_ a nop will not make it any further than the decoder.
But separate nops do count as separate instructions, ie they will hit all
the normal decode limits (mostly three or four instructions decoded per
cycle).

> I'm assuming that jmp is more expensive than the nops because otherwise
> a jmp 0 would have been used as a 5 byte nop.

Yes. A CPU core _could_ certainly do special decoding for 'jmp 0' too, but
I don't know any that do. The 'jmp' is much more likely to be special in
the front-end and the decoder, and can easily cause things like the
prefetcher to hiccup (ie it tries to start prefetching at the "new" target
address).

So I would absolutely _expect_ a 'jmp' to be noticeably more expensive
than one of the special nops that can be squashed by the decoder.

A nop that is squashed by the decoder will literally take absolutely no
other resources. It doesn't even need to be tracked from an instruction
completion standpoint (which _may_ end up meaning that a profiler would
actually never see a hit on the second nop, but it's quite likely to
depend on which decoder it hits etc).

		Linus
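
To make the two-nops-versus-one point concrete, here is a minimal C sketch
of three ways a 5-byte site could be filled. The byte sequences are common
x86 encodings and are not taken from this thread; the struct and function
names are hypothetical, and the instruction lengths are written out by hand
rather than decoded. It only counts instructions and interior instruction
boundaries, which is where an interrupt could land inside the site. It says
nothing about actual cost on any given microarchitecture.

```c
#include <assert.h>
#include <stddef.h>

#define SITE_LEN 5  /* size of the patched site, in bytes */

/* One candidate way of filling a 5-byte site (hypothetical type). */
struct site_fill {
    const char *name;
    unsigned char bytes[SITE_LEN];
    size_t insn_len[SITE_LEN];  /* length of each instruction; 0 ends the list */
};

/* Assumed encoding: the 5-byte long nop, 0f 1f 44 00 00. */
const struct site_fill single_nop5 = {
    "one 5-byte nop",
    { 0x0f, 0x1f, 0x44, 0x00, 0x00 },
    { 5 },
};

/* Assumed encoding: two 0x66-prefixed one-byte nops, 3 + 2 bytes. */
const struct site_fill nop3_nop2 = {
    "3-byte nop followed by 2-byte nop",
    { 0x66, 0x66, 0x90, 0x66, 0x90 },
    { 3, 2 },
};

/* jmp rel32 with a zero displacement, ie a jump to the next instruction. */
const struct site_fill jmp_next = {
    "jmp with zero displacement",
    { 0xe9, 0x00, 0x00, 0x00, 0x00 },
    { 5 },
};

/* Number of separate instructions the decoder has to handle. */
size_t insn_count(const struct site_fill *f)
{
    size_t n = 0, total = 0;

    while (n < SITE_LEN && f->insn_len[n] != 0)
        total += f->insn_len[n++];
    assert(total == SITE_LEN);  /* instructions must cover the site exactly */
    return n;
}

/*
 * Number of instruction boundaries strictly inside the site. Each one is a
 * point where an interrupt can be taken with the instruction pointer in the
 * middle of the 5 bytes.
 */
size_t interior_boundaries(const struct site_fill *f)
{
    return insn_count(f) - 1;
}
```

Only the two-nop fill has an interior boundary, which is why it cannot be
treated as atomic; the single nop and the jmp are both one instruction, and
differ only in how the front-end handles them.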