Subject: Re: [PATCH] no need to check for NULL before calling kfree()
	-fs/ext2/
From: Arjan van de Ven <arjan@infradead.org>
To: linux-os@analogic.com
Cc: Jesper Juhl <juhl-lkml@dif.dk>, ext2-devel@lists.sourceforge.net,
       Linux kernel <linux-kernel@vger.kernel.org>
In-Reply-To: <Pine.LNX.4.61.0503261811001.9945@chaos.analogic.com>
References: <Pine.LNX.4.62.0503252307010.2498@dragon.hyggekrogen.localhost>
	 <Pine.LNX.4.61.0503251726010.6354@chaos.analogic.com>
	 <1111825958.6293.28.camel@laptopd505.fenrus.org>
	 <Pine.LNX.4.61.0503261811001.9945@chaos.analogic.com>
Content-Type: text/plain
Date: Sun, 27 Mar 2005 10:45:30 +0200
Message-Id: <1111913130.6297.24.camel@laptopd505.fenrus.org>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3875
Lines: 92

On Sat, 2005-03-26 at 18:21 -0500, linux-os wrote:
> On Sat, 26 Mar 2005, Arjan van de Ven wrote:
> 
> > On Fri, 2005-03-25 at 17:29 -0500, linux-os wrote:
> >> Isn't it expensive of CPU time to call kfree() even though the
> >> pointer may have already been freed?
> >
> > nope
> >
> > a call instruction is effectively half a cycle or less, the branch
> 
> Wrong!

Oh? A call is effectively "push eip" plus loading a new eip. The new eip
is entirely free; the push eip takes half a cycle (or 1 full cycle, but on
only one of the two/three pipelines).

> 
> > predictor of the cpu can predict perfectly where the next instruction is
> > from. The extra if() you do in front is a different matter, that can
> > easily cost 100 cycles+. (And those are redundant cycles because kfree
> > will do the if again anyway). So what you propose is to spend 100+
> > cycles to save half a cycle. Not a good tradeoff ;)
> >
> 
> Wrong!

Is it wrong that the cpu can predict the target perfectly? No. Unless
you use function pointers (then it's a whole different ballgame).

> 
> Pure unmitigated bull-shit. I measure (with hardware devices)
> the execution time of real code in modern CPUs. I do this for
> a living so you don't have to stand in line for a couple of
> hours to have your baggage scanned at the airport.

OK, I used to do this kind of performance work for a living too, and
measured it to death as well.

> Always, always, a call will be more expensive than a branch
> on condition. 

It is not on modern out-of-order cpus.

> It's impossible to be otherwise. A call requires
> that the return address be written to memory (the stack),
> using register indirection (the stack-pointer).

And it's such a common pattern that it's optimized to death. Internally a
call gets transformed into 2 uops or so: one is the push eip, the other is
the jmp (which then just gets absorbed by the "what is the next eip" logic,
just as "jmp"s are 0 cycles).

> If somebody said; "I think that the code will look better
> and the few cycles lost will not be a consequence with modern
> CPUs...", then there is a point. But coming up with this
> disingenuous bullshit is something else.

I don't have to take this from you and I don't. You're calling me a liar
with zero evidence. Let's get some facts straight:
1) On a modern cpu, a miss of the branch predictor is quite expensive.
   The entire pipeline needs flushing if this happens, and on a p4 this
   will be in the order of 50 to 100 cycles at minimum.
2) absolute "jmp" is free on modern OOO cpus. Instead of taking an
   actual execution slot, all that happens is that the "what is the next
   EIP" logic gets a different value. (you can argue what happens if you
   have a sequence of jmps and that it's not free then, and I'll grant
   you that, but that corner case is not relevant here)
3) a "call" instruction gets translated into what basically is 
   "push EIP" and "jmp" uops.
4) modern processors have special logic to optimize push/pop
   instructions; for example a "push eax ; push ebx" sequence will
   execute in parallel in the same cycle even though there is a data
   dependency on esp, the cpu can perfectly predict the esp effect and
   will do so.
5) modern processors have a call/ret return-address stack they use to do
   branch prediction for the target of "ret" instructions. Unless you do
   unbalanced call/ret pairs the prediction will be perfect.
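
To make point 5 concrete, here is a toy model in C of how such a call/ret
predictor behaves. It is only a sketch: the names (struct ras, ras_call,
ras_ret) and the depth of 16 are made up for illustration and do not
describe any particular cpu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: depth and names are assumptions, not real hardware. */
#define RAS_DEPTH 16

struct ras {
	uintptr_t slot[RAS_DEPTH];
	size_t top;		/* number of outstanding calls */
};

/* "call": remember where the matching "ret" should go. */
static void ras_call(struct ras *r, uintptr_t ret_addr)
{
	assert(r != NULL);
	r->slot[r->top % RAS_DEPTH] = ret_addr;	/* deep nesting overwrites old entries */
	r->top++;
}

/* "ret": predict the target from the most recent call.
 * Returns 1 if the prediction matches the actual return address. */
static int ras_ret(struct ras *r, uintptr_t actual)
{
	assert(r != NULL);
	if (r->top == 0)
		return 0;	/* ret without a matching call: nothing to predict from */
	r->top--;
	return r->slot[r->top % RAS_DEPTH] == actual;
}
```

As long as every ras_call() is matched by a ras_ret() in reverse order, the
prediction in this model is always right; it only fails once the pairs get
out of balance.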

Based on this, the conclusion "a function call is really cheap versus a
conditional branch" is justified imo. Now you'd better come with proof of
which of the 5 things above I'm totally lying to you about, or you'd better
come with an apology.
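
For reference, the pattern the patch is about looks roughly like this. It is
a user-space sketch, not the ext2 code: my_kfree() is a hypothetical
stand-in for kfree() built on free(), and struct thing and the frees_done
counter exist only for the example.

```c
#include <assert.h>
#include <stdlib.h>

static int frees_done;	/* counts real frees, for illustration only */

/* Hypothetical stand-in for kfree(): accepts NULL and does nothing then. */
static void my_kfree(void *p)
{
	if (!p)
		return;
	free(p);
	frees_done++;
}

struct thing {
	char *name;
	char *data;
};

/* Before: the caller tests for NULL, then my_kfree() tests again. */
static void thing_cleanup_checked(struct thing *t)
{
	assert(t != NULL);
	if (t->name)
		my_kfree(t->name);
	if (t->data)
		my_kfree(t->data);
	t->name = t->data = NULL;
}

/* After: same behaviour, the NULL test lives in one place only. */
static void thing_cleanup(struct thing *t)
{
	assert(t != NULL);
	my_kfree(t->name);
	my_kfree(t->data);
	t->name = t->data = NULL;
}
```

Both versions free exactly the same pointers; the second one just drops the
duplicate test in the caller.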

