2005-03-27 02:04:57

by Horst H. von Brand

Subject: Re: [PATCH] no need to check for NULL before calling kfree() -fs/ext2/

Marcin Dalecki <[email protected]> said:
> On 2005-03-27, at 00:21, linux-os wrote:
> > Always, always, a call will be more expensive than a branch
> > on condition.

Wrong.

> > It's impossible to be otherwise.

Many, many counterexamples say otherwise...

> > A call requires
> > that the return address be written to memory (the stack),

Not necessarily right now; it can be done at leisure later on, while doing
other stuff.

> > using register indirection (the stack-pointer).

So what? The stack pointer is surely special. Modern programming languages
(and programming styles) encourage many calls, so this is very heavily
optimized.
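For readers following the subject line, here is a minimal userspace sketch of the pattern the patch removes. It uses free() rather than kfree(), on the assumption that the analogy holds: the C standard defines free(NULL) as a no-op, and kfree() likewise returns immediately for a NULL pointer. The `struct holder` type and the helper names are hypothetical and are not taken from fs/ext2/.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object with an optionally allocated buffer. */
struct holder {
    char *buf;
};

/* Style the patch removes: a redundant NULL test at the call site. */
static void release_checked(struct holder *h)
{
    if (h->buf)
        free(h->buf);
    h->buf = NULL;
}

/* Style the patch introduces: free(NULL) already does nothing,
 * so the caller needs no branch of its own. */
static void release_unchecked(struct holder *h)
{
    free(h->buf);
    h->buf = NULL;
}

/* Exercise both variants on an allocated and an unallocated buffer.
 * Returns 1 once every pointer has been released and cleared. */
int demo_release(void)
{
    struct holder a = { malloc(16) }, b = { NULL };
    struct holder c = { malloc(16) }, d = { NULL };

    release_checked(&a);
    release_checked(&b);
    release_unchecked(&c);
    release_unchecked(&d);

    assert(a.buf == NULL && b.buf == NULL);
    assert(c.buf == NULL && d.buf == NULL);
    return 1;
}
```

Both variants behave identically; the argument in this thread is only about whether the extra caller-side branch saves anything over the test already inside the callee.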

> Needless to say that there are enough architectures out there, which
> don't even have something like an explicit call as separate assembler
> instruction...

The mechanism exists somehow.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513


2005-03-27 03:18:35

by Marcin Dalecki

Subject: Re: [PATCH] no need to check for NULL before calling kfree() -fs/ext2/


On 2005-03-27, at 04:00, Horst von Brand wrote:
>
>> Needless to say that there are enough architectures out there, which
>> don't even have something like an explicit call as separate assembler
>> instruction...
>
> The mechanism exists somehow.

Most RISC architectures claim a huge register-set advantage over IA32.
However, in reality it's normal that:

1. Some of the registers take roles declared by the ABI: one is the stack
   pointer, one is the base pointer, and so on.
2. Only a subset of the registers is guaranteed to be preserved across
   calls.

Thus the mechanisms are simple calling conventions.

Compilers can frequently see what a subroutine does and can flatten out the
cost of function calls to something very much resembling just two jumps
instead of a single jump around a condition.
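A small sketch of the case described above, assuming an optimizing compiler that chooses to inline a tiny function whose body it can see. The helper `release()` and the `released` counter are hypothetical, added only to make the behaviour observable. Whether inlining actually happens is the compiler's decision; when it does, the call into the helper disappears and what remains at the call site is essentially the NULL test and branch plus the free() call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter so the sketch has an observable result. */
static int released;

/* The NULL test lives inside the callee. Because the body is visible
 * and tiny, the compiler may inline it at each call site, so no call
 * into this helper needs to remain there. */
static inline void release(void *p)
{
    if (!p)
        return;      /* early exit, equivalent to a caller-side check */
    free(p);
    released++;
}

/* Returns how many non-NULL pointers were actually freed
 * (1 if the allocation below succeeds). */
int demo_inline(void)
{
    void *a = malloc(8);
    void *b = NULL;

    released = 0;
    release(a);      /* frees and counts, unless malloc() failed */
    release(b);      /* returns immediately */
    return released;
}
```

The design point is that the check costs the same whether it is written at the call site or inside the callee, once the callee is inlined.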

On the other hand, most modern IA32 implementations (since the Cyrix 486)
are very efficient at mapping stack operations to a special cache between
the CPU and the L1 cache. I could even imagine them being more efficient
than plain jumps, which simply don't carry the same information for cache
prefetch and branch prediction.