Andrew - thanks for applying my last patch; thought you might be interested
in this trivial one too. Patch is versus 2.6.0-test4-bk8, I expect it
will also apply against current -mm.
-Mitch
diff -urN -X dontdiff linux-2.6.0-test4bk8-VIRGIN/drivers/char/vt.c linux-2.6.0-test4bk8mnb1/drivers/char/vt.c
--- linux-2.6.0-test4bk8-VIRGIN/drivers/char/vt.c 2003-07-13 20:37:27.000000000 -0700
+++ linux-2.6.0-test4bk8mnb1/drivers/char/vt.c 2003-09-06 13:52:58.000000000 -0700
@@ -2179,7 +2179,7 @@
}
set_cursor(currcons);
- if (!oops_in_progress)
+ if (likely(!oops_in_progress))
poke_blanked_console();
quit:
diff -urN -X dontdiff linux-2.6.0-test4bk8-VIRGIN/drivers/parisc/led.c linux-2.6.0-test4bk8mnb1/drivers/parisc/led.c
--- linux-2.6.0-test4bk8-VIRGIN/drivers/parisc/led.c 2003-07-27 12:57:39.000000000 -0700
+++ linux-2.6.0-test4bk8mnb1/drivers/parisc/led.c 2003-09-06 13:53:18.000000000 -0700
@@ -488,7 +488,7 @@
}
/* blink all LEDs twice a second if we got an Oops (HPMC) */
- if (oops_in_progress) {
+ if (unlikely(oops_in_progress)) {
currentleds = (count_HZ<=(HZ/2)) ? 0 : 0xff;
}
diff -urN -X dontdiff linux-2.6.0-test4bk8-VIRGIN/kernel/printk.c linux-2.6.0-test4bk8mnb1/kernel/printk.c
--- linux-2.6.0-test4bk8-VIRGIN/kernel/printk.c 2003-07-13 20:39:24.000000000 -0700
+++ linux-2.6.0-test4bk8mnb1/kernel/printk.c 2003-09-06 13:53:50.000000000 -0700
@@ -400,7 +400,7 @@
static char printk_buf[1024];
static int log_level_unknown = 1;
- if (oops_in_progress) {
+ if (unlikely(oops_in_progress)) {
/* If a crash is occurring, make sure we can't deadlock */
spin_lock_init(&logbuf_lock);
/* And make sure that we print immediately */
On Sun, Sep 07, 2003 at 01:42:04AM -0500, Mitchell Blank Jr wrote:
> Andrew - thanks for applying my last patch; thought you might be interested
> in this trivial one too. Patch is versus 2.6.0-test4-bk8, I expect it
> will also apply against current -mm.
none of this patch seems to touch particularly performance critical code.
Is it really worth adding these macros to every if statement in the kernel?
There comes a point where readability is lost, for no measurable gain.
Dave
--
Dave Jones http://www.codemonkey.org.uk
On Wed, Sep 10, 2003 at 04:20:31PM +0200, Pavel Machek wrote:
> > none of this patch seems to touch particularly performance critical code.
> > Is it really worth adding these macros to every if statement in the kernel?
> > There comes a point where readability is lost, for no measurable gain.
>
> Perhaps we should have macros ifu() and ifl(), that would be used as a
> plain if, just with likelyness-indication? That way we could have it
> in *every* statement and readability would not suffer that much...
You've got to be kidding.
Dave
--
Dave Jones http://www.codemonkey.org.uk
Hi!
> > Andrew - thanks for applying my last patch; thought you might be interested
> > in this trivial one too. Patch is versus 2.6.0-test4-bk8, I expect it
> > will also apply against current -mm.
>
> none of this patch seems to touch particularly performance critical code.
> Is it really worth adding these macros to every if statement in the kernel?
> There comes a point where readability is lost, for no measurable gain.
Perhaps we should have macros ifu() and ifl(), that would be used as a
plain if, just with likelyness-indication? That way we could have it
in *every* statement and readability would not suffer that much...
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
Hi!
> > > none of this patch seems to touch particularly performance critical code.
> > > Is it really worth adding these macros to every if statement in the kernel?
> > > There comes a point where readability is lost, for no measurable gain.
> >
> > Perhaps we should have macros ifu() and ifl(), that would be used as a
> > plain if, just with likelyness-indication? That way we could have it
> > in *every* statement and readability would not suffer that much...
>
> You've got to be kidding.
I'm pretty serious.
ifu (a==b) {
something();
}
looks better than
if (unlikely(a==b)) {
something();
}
sched.c alone probably would get more readable as a result of such
conversion...
[Okay, having it at every statement is prbably bad idea.]
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
On Wed, 10 Sep 2003, Pavel Machek wrote:
> Hi!
>
> > > > none of this patch seems to touch particularly performance critical code.
> > > > Is it really worth adding these macros to every if statement in the kernel?
> > > > There comes a point where readability is lost, for no measurable gain.
> > >
> > > Perhaps we should have macros ifu() and ifl(), that would be used as a
> > > plain if, just with likelyness-indication? That way we could have it
> > > in *every* statement and readability would not suffer that much...
> >
> > You've got to be kidding.
>
> I'm pretty serious.
>
> ifu (a==b) {
> something();
> }
>
> looks better than
>
> if (unlikely(a==b)) {
> something();
> }
>
> sched.c alone probably would get more readable as a result of such
> conversion...
>
> [Okay, having it at every statement is prbably bad idea.]
> Pavel
> --
> When do you have a heart between your knees?
> [Johanka's followup: and *two* hearts?]
Dumb question? Has anybody actually tested "__builtin_expect", and
friends to see if any of this "coloration" actually enhances performance?
"unlikely()" is the negative of the new built-in. I would guess that
any conversion operations are going to negate any advantage possibly
gained by reordering instructions.
For instance:
Given:
if(a == b)
foo();
else
bar();
... and ...
if(unlikely(a == b))
foo();
else
bar();
I would guess that the compiler output might be:
movl a(%esp), %eax # Get 'a' off the stack
cmpl b(%esp), %eax # Compare against 'b' on the stack
jnz 1f # Not the same
call foo # Are the same, call foo()
jmp 2f # Need to jump around stuff
1: call bar # Execute bar()
2: # Do more stuff.
Note that no matter what you do, comparing equal or not equal,
when you get down to the actual code, it's apples and apples.
You are always going to take an extra jump in one execution
path after the function, and you will take a conditional jump
before the function call in the other execution path. So, you
always have the "extra" jumps, no matter.
I would guess that most all of the conditional code will turn
out this way and, won't even be as efficient as what I have
written above.
Before a lot of readable code is modified (trashed???), maybe
somebody should benchmark the differences? The tests I have
made here, seem to show that any measurment differences are
in the noise and... favor leaving __builtin_expect() out of
the code.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
Note 96.31% of all statistics are fiction.
Richard B. Johnson wrote:
> I would guess that the compiler output might be:
Your guess is incorrect.
> You are always going to take an extra jump in one execution
> path after the function, and you will take a conditional jump
> before the function call in the other execution path. So, you
> always have the "extra" jumps, no matter.
That is not true. The "likely" path has no taken jumps.
Think about the code again.
How would you optimise it, if you were writing assembly language yourself?
(In more complex examples, another factor is that mis-predicted
conditional jumps are much slower than unconditional jumps, so it is
good to favour the latter in the likely path).
-- Jamie
On Wed, 10 Sep 2003, Jamie Lokier wrote:
> Richard B. Johnson wrote:
> > I would guess that the compiler output might be:
>
> Your guess is incorrect.
>
> > You are always going to take an extra jump in one execution
> > path after the function, and you will take a conditional jump
> > before the function call in the other execution path. So, you
> > always have the "extra" jumps, no matter.
>
> That is not true. The "likely" path has no taken jumps.
>
Absolutely, positively, irrefutably wrong! Any logical operation
with any real processor can only result in a jump upon condition. The
path not taken will always require a jump around the code that
handled the jump upon condition unless the code exists at
the end of a procedure where a 'return' will suffice. Period. There
is discussion if you don't understand this. If you insist upon
taking exception to everything I say, without even reading what
I say, then you are wasting a lot of energy.
All real processors make jumps based upon the preceeding logical
operation, i.e., if equal, if less, if greater, if true. With
Intel, you have the following construct:
After the conditional test, everybody has to execute from label
more_code:
cmpl $1, %eax
jz 1f
jc 2f
call do_default
jmp more_code
1: call do_something_if_equal
jmp more_code
2: call do_something_if_less
more_code:
In every case, one has to jump around code for other execution paths
except the last, where you have to jump on condition anyway. There
is no free liunch, and the straight-through route, do_default
uas just as many jumps as the last.
> Think about the code again.
> How would you optimise it, if you were writing assembly language yourself?
>
I did. And I do this for a living. And, after 30 years of this shit
they still haven't fired me. Learn something. I'm pissed.
> (In more complex examples, another factor is that mis-predicted
> conditional jumps are much slower than unconditional jumps, so it is
> good to favour the latter in the likely path).
>
Show me the money. With Intel, the testing of the condition, existing
in the flags, requires an instruction, unconditional or not.
> -- Jamie
>
Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
Note 96.31% of all statistics are fiction.
Hi!
> > Your guess is incorrect.
> >
> > > You are always going to take an extra jump in one execution
> > > path after the function, and you will take a conditional jump
> > > before the function call in the other execution path. So, you
> > > always have the "extra" jumps, no matter.
> >
> > That is not true. The "likely" path has no taken jumps.
> >
>
> Absolutely, positively, irrefutably wrong! Any logical operation
> with any real processor can only result in a jump upon condition. The
> path not taken will always require a jump around the code that
> handled the jump upon condition unless the code exists at
> the end of a procedure where a 'return' will suffice. Period.
No.
jz not_likely
likely_code
go_back:
more_likely_core
retn
not_likely:
do_whatever_you_need
jmp go_back
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
Richard B. Johnson wrote:
> cmpl $1, %eax
> jz 1f
> jc 2f
> call do_default
> jmp more_code
> 1: call do_something_if_equal
> jmp more_code
> 2: call do_something_if_less
> more_code:
>
> In every case, one has to jump around code for other execution paths
> except the last, where you have to jump on condition anyway. There
> is no free liunch, and the straight-through route, do_default
> uas just as many jumps as the last.
Here is your code optimised for no jumps in the "do_default" case:
cmpl $1,%eax
jbe 1f
call do_default
more_code:
.subsection 1
1: jnz 2f
call do_something_if_equal
jmp more_code
2: call do_something_if_less
jmp more_code
.previous
> > How would you optimise it, if you were writing assembly language yourself?
> I did. And I do this for a living. And, after 30 years of this shit
> they still haven't fired me. Learn something. I'm pissed.
It's ok to be pissed. I'd be pissed too :)
*ducks*
Enjoy :)
-- Jame
On Wed, 10 Sep 2003, Jamie Lokier wrote:
> Richard B. Johnson wrote:
> > cmpl $1, %eax
> > jz 1f
> > jc 2f
> > call do_default
> > jmp more_code
> > 1: call do_something_if_equal
> > jmp more_code
> > 2: call do_something_if_less
> > more_code:
> >
> > In every case, one has to jump around code for other execution paths
> > except the last, where you have to jump on condition anyway. There
> > is no free liunch, and the straight-through route, do_default
> > uas just as many jumps as the last.
>
> Here is your code optimised for no jumps in the "do_default" case:
>
> cmpl $1,%eax
> jbe 1f
> call do_default
> more_code:
> .subsection 1
> 1: jnz 2f
> call do_something_if_equal
> jmp more_code
> 2: call do_something_if_less
> jmp more_code
> .previous
>
You are a magician! Putting in a .subsection to hide the jump
is absolute bullshit. The built-in macros, ".subsection", and
".previous" just made the damn linker do the fixup. You just
did a long jump, out of the current code-stream, into some
other section. Then you jumped back. Hell of an optimization!
Might even reload the cache if you are lucky! Linker tricks
won't work for me. Also, putting some address on the stack
and executing 'ret' emulate a jump won't impress me either.
In any real code, only the last instruction in a procedure
gets to have a jump optimized away. Most of the times you
can't even do that because you need to restore different
stack-levels from different code paths (one reason to use
a frame-pointer, but still not good enough).
> > > How would you optimise it, if you were writing assembly language yourself?
>
> > I did. And I do this for a living. And, after 30 years of this shit
> > they still haven't fired me. Learn something. I'm pissed.
>
> It's ok to be pissed. I'd be pissed too :)
>
> *ducks*
>
> Enjoy :)
> -- Jame
>
Sure do. I love it. I even get paid for this kind of stuff!
Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
Note 96.31% of all statistics are fiction.
Richard B. Johnson wrote:
> > cmpl $1,%eax
> > jbe 1f
> > call do_default
> > more_code:
> > .subsection 1
> > 1: jnz 2f
> > call do_something_if_equal
> > jmp more_code
> > 2: call do_something_if_less
> > jmp more_code
> > .previous
> >
>
> You are a magician! Putting in a .subsection to hide the jump
> is absolute bullshit. The built-in macros, ".subsection", and
> ".previous" just made the damn linker do the fixup. You just
> did a long jump, out of the current code-stream, into some
> other section. Then you jumped back. Hell of an optimization!
".subsection" does not create jump instructions. The linker does not
create jump instructions. The only jump instructions are the ones
written in the source code.
The above code will execute three instructions in the do_default case:
"cmpl", "jbe" and "call". No jumps are taken in that case.
The code does exactly what you said is logically impossible: one of
the cases, presumably marked "likely" in C code, takes no jumps at
all. What the other cases do is irrelevant. That is called
optimising for the likely case, at the expense of the others.
Try assembling the above source, with a "ret" after it, and then
disassemble the object file, if you don't believe me.
Or just read Pavel's example.
If you don't understand Pavel's example, there is no hope of you
grokking the advanced stuff ;)
Seriously, you can't possibly have done asm programming for 30
years without optimising for fast paths... surely?
> Linker tricks won't work for me.
You'll be glad to know GCC does it without linker tricks.
> Also, putting some address on the stack and executing 'ret' emulate
> a jump won't impress me either.
It shouldn't. That would misalign the RSB (return stack buffer:
target prediction for "ret") causing several subsequent "ret"
instructions to mispredict their targets and stall the pipeline.
-- Jamie
Hi!
> > > cmpl $1, %eax
> > > jz 1f
> > > jc 2f
> > > call do_default
> > > jmp more_code
> > > 1: call do_something_if_equal
> > > jmp more_code
> > > 2: call do_something_if_less
> > > more_code:
> > >
> > > In every case, one has to jump around code for other execution paths
> > > except the last, where you have to jump on condition anyway. There
> > > is no free liunch, and the straight-through route, do_default
> > > uas just as many jumps as the last.
> >
> > Here is your code optimised for no jumps in the "do_default" case:
> >
> > cmpl $1,%eax
> > jbe 1f
> > call do_default
> > more_code:
> > .subsection 1
> > 1: jnz 2f
> > call do_something_if_equal
> > jmp more_code
> > 2: call do_something_if_less
> > jmp more_code
> > .previous
> >
>
> You are a magician! Putting in a .subsection to hide the jump
> is absolute bullshit. The built-in macros, ".subsection", and
> ".previous" just made the damn linker do the fixup. You just
> did a long jump, out of the current code-stream, into some
He's right. Even without subsections you can move code somewhere
outside the function. And gcc should be smart enough to do that.
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
Pavel Machek wrote:
> He's right. Even without subsections you can move code somewhere
> outside the function. And gcc should be smart enough to do that.
It is:
extern int a, b;
int test()
{
if (__builtin_expect(a > 1, 1))
foo();
else
bar();
return b;
}
Compiles to (with gcc -Os -fomit-frame-pointer):
test: cmpl $1, a
jle .L2
call foo
.L3: movl b, %eax
ret
.L2: call bar
jmp .L3
Enjoy,
-- Jamie
I feel sorry that my trivial little patch has spawned such a large thread.
Pavel Machek wrote:
> > There comes a point where readability is lost, for no measurable gain.
>
> Perhaps we should have macros ifu() and ifl(), that would be used as a
> plain if, just with likelyness-indication?
No, I think that would be a move in the wrong direction.
I've already described my personal opinion about unlikely() in a couple
private emails, but for the record:
When I see code like...
if (unlikely(condition)) {
}
...it reads to me like...
if (condition) { /* Unlikely error case */
}
In other words it documents the code making it MORE readable than before
(obviously this can be done to excess). If the compiler can also do
something useful with the information, so much the better.
Perhaps I'm the only one that feels that way, though.
Your ifu/ifl suggestion I think is pretty ugly. First off, it would break
syntax-highlighting editors which would hurt readability a lot. Also its
not obvious at all what "ifu (x == 0)" means - you can pretty much guess what
"if (unlikely(x == 0))" does the first time you see it.
-Mitch