The patch below was not yet tested. If it's correct as it is, please comment.
---
Fix Unlikely(x) == y
Signed-off-by: Roel Kluin <[email protected]>
---
diff --git a/arch/powerpc/platforms/ps3/interrupt.c b/arch/powerpc/platforms/ps3/interrupt.c
index 3a6db04..a14e5cd 100644
--- a/arch/powerpc/platforms/ps3/interrupt.c
+++ b/arch/powerpc/platforms/ps3/interrupt.c
@@ -709,7 +709,7 @@ static unsigned int ps3_get_irq(void)
asm volatile("cntlzd %0,%1" : "=r" (plug) : "r" (x));
plug &= 0x3f;
- if (unlikely(plug) == NO_IRQ) {
+ if (unlikely(plug == NO_IRQ)) {
pr_debug("%s:%d: no plug found: thread_id %lu\n", __func__,
__LINE__, pd->thread_id);
dump_bmp(&per_cpu(ps3_private, 0));
On Sat, 16 Feb 2008 17:08:01 +0100
Roel Kluin <[email protected]> wrote:
> The patch below was not yet tested. If it's correct as it is, please
> comment. ---
> Fix Unlikely(x) == y
>
you found a great set of bugs..
but to be honest... I suspect it's just best to remove unlikely altogether for these cases;
unlikely() is almost a go-faster-stripes thing, and if you don't know how to use it you shouldn't
be using it... so just removing it for all wrong cases is actually the best thing to do imo.
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Sat, Feb 16, 2008 at 09:25:52AM -0800, Arjan van de Ven wrote:
> On Sat, 16 Feb 2008 17:08:01 +0100
> Roel Kluin <[email protected]> wrote:
>
> > The patch below was not yet tested. If it's correct as it is, please
> > comment. ---
> > Fix Unlikely(x) == y
> >
>
> you found a great set of bugs..
> but to be honest... I suspect it's just best to remove unlikely altogether for these cases;
> unlikely() is almost a go-faster-stripes thing, and if you don't know how to use it you shouldn't
> be using it... so just removing it for all wrong cases is actually the best thing to do imo.
Well, eventhough the author may not know how to use it, "unlikely" at
least indicates the intention of the author, or his knowledge of what
should happen here. I'd suggest leaving it where it is because the
authot of this code is in best position to know that this branch is
unlikely to happen, eventhough he does not correctly use the macro.
Willy
On Sat, 16 Feb 2008 18:33:16 +0100
Willy Tarreau <[email protected]> wrote:
> On Sat, Feb 16, 2008 at 09:25:52AM -0800, Arjan van de Ven wrote:
> > On Sat, 16 Feb 2008 17:08:01 +0100
> > Roel Kluin <[email protected]> wrote:
> >
> > > The patch below was not yet tested. If it's correct as it is,
> > > please comment. ---
> > > Fix Unlikely(x) == y
> > >
> >
> > you found a great set of bugs..
> > but to be honest... I suspect it's just best to remove unlikely
> > altogether for these cases; unlikely() is almost a
> > go-faster-stripes thing, and if you don't know how to use it you
> > shouldn't be using it... so just removing it for all wrong cases is
> > actually the best thing to do imo.
>
> Well, eventhough the author may not know how to use it, "unlikely" at
> least indicates the intention of the author, or his knowledge of what
> should happen here. I'd suggest leaving it where it is because the
> authot of this code is in best position to know that this branch is
> unlikely to happen, eventhough he does not correctly use the macro.
>
you have more faith in the authors knowledge of how his code actually behaves than I think is warranted :)
Or faith in that he knows what "unlikely" means.
I should write docs about this; but unlikely() means:
1) It happens less than 0.01% of the cases.
2) The compiler couldn't have figured this out by itself
(NULL pointer checks are compiler done already, same for some other conditions)
3) It's a hot codepath where shaving 0.5 cycles (less even on x86) matters
(and the author is ok with taking a 500 cycles hit if he's wrong)
If you think unlikely() means something else, we should fix what it maps to towards gcc ;)
(to.. be empty ;)
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Sat, Feb 16, 2008 at 09:42:26AM -0800, Arjan van de Ven wrote:
> On Sat, 16 Feb 2008 18:33:16 +0100
> Willy Tarreau <[email protected]> wrote:
>
> > On Sat, Feb 16, 2008 at 09:25:52AM -0800, Arjan van de Ven wrote:
> > > On Sat, 16 Feb 2008 17:08:01 +0100
> > > Roel Kluin <[email protected]> wrote:
> > >
> > > > The patch below was not yet tested. If it's correct as it is,
> > > > please comment. ---
> > > > Fix Unlikely(x) == y
> > > >
> > >
> > > you found a great set of bugs..
> > > but to be honest... I suspect it's just best to remove unlikely
> > > altogether for these cases; unlikely() is almost a
> > > go-faster-stripes thing, and if you don't know how to use it you
> > > shouldn't be using it... so just removing it for all wrong cases is
> > > actually the best thing to do imo.
> >
> > Well, eventhough the author may not know how to use it, "unlikely" at
> > least indicates the intention of the author, or his knowledge of what
> > should happen here. I'd suggest leaving it where it is because the
> > authot of this code is in best position to know that this branch is
> > unlikely to happen, eventhough he does not correctly use the macro.
> >
>
> you have more faith in the authors knowledge of how his code actually behaves than I think is warranted :)
> Or faith in that he knows what "unlikely" means.
> I should write docs about this; but unlikely() means:
> 1) It happens less than 0.01% of the cases.
> 2) The compiler couldn't have figured this out by itself
> (NULL pointer checks are compiler done already, same for some other conditions)
> 3) It's a hot codepath where shaving 0.5 cycles (less even on x86) matters
> (and the author is ok with taking a 500 cycles hit if he's wrong)
>
> If you think unlikely() means something else, we should fix what it maps to towards gcc ;)
> (to.. be empty ;)
eventhough the gcc docs say it's just a hint to help the compiler optimize
the branch it takes by default, I too have noticed that it more often does
bad than good. Code gets completely reordered and even sometimes partially
duplicated (especially when the branch is a return).
Last but not least, gcc 4 tends to emit stupid checks, to the point that I
have replaced unlikely(x) with (x) in my code when gcc >= 4 is detected. What
I observe is that the following code :
if (unlikely(p == NULL)) ...
often gets coded like this :
reg1 = (p == NULL)
if (reg1 != 0) ...
... which clobbers reg1 for nothing and performs a double test.
But yes, I assumed that the author considered its use to be legitimate (I've
not looked at the code). Maybe you're right and it should be removed, but in
this case we would need a large audit of the abuses of unlikely()...
Willy
On Sat, 16 Feb 2008 18:58:49 +0100
> > If you think unlikely() means something else, we should fix what it
> > maps to towards gcc ;) (to.. be empty ;)
>
> eventhough the gcc docs say it's just a hint to help the compiler
> optimize the branch it takes by default, I too have noticed that it
> more often does bad than good. Code gets completely reordered and
> even sometimes partially duplicated (especially when the branch is a
> return).
>
> Last but not least, gcc 4 tends to emit stupid checks, to the point
> that I have replaced unlikely(x) with (x) in my code when gcc >= 4 is
> detected. What I observe is that the following code :
>
> if (unlikely(p == NULL)) ...
this is pure bad since GCC already assumes that NULL checks are unlikely,
and with the unlikely() code needing to normalize stuff... it will generate
worse code for sure yes.
>
> often gets coded like this :
>
> reg1 = (p == NULL)
> if (reg1 != 0) ...
>
> ... which clobbers reg1 for nothing and performs a double test.
>
> But yes, I assumed that the author considered its use to be
> legitimate (I've not looked at the code). Maybe you're right and it
> should be removed, but in this case we would need a large audit of
> the abuses of unlikely()...
no argument.. how about we start with all the cases where the author just got it
entirely wrong ... like the ones from this patch ;)
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On 02/16/2008 09:42 AM, Arjan van de Ven wrote:
> On Sat, 16 Feb 2008 18:33:16 +0100
> Willy Tarreau <[email protected]> wrote:
>
>> On Sat, Feb 16, 2008 at 09:25:52AM -0800, Arjan van de Ven wrote:
>> > On Sat, 16 Feb 2008 17:08:01 +0100
>> > Roel Kluin <[email protected]> wrote:
>> >
>> > > The patch below was not yet tested. If it's correct as it is,
>> > > please comment. ---
>> > > Fix Unlikely(x) == y
>> > >
>> >
>> > you found a great set of bugs..
>> > but to be honest... I suspect it's just best to remove unlikely
>> > altogether for these cases; unlikely() is almost a
>> > go-faster-stripes thing, and if you don't know how to use it you
>> > shouldn't be using it... so just removing it for all wrong cases is
>> > actually the best thing to do imo.
>>
>> Well, eventhough the author may not know how to use it, "unlikely" at
>> least indicates the intention of the author, or his knowledge of what
>> should happen here. I'd suggest leaving it where it is because the
>> authot of this code is in best position to know that this branch is
>> unlikely to happen, eventhough he does not correctly use the macro.
>>
>
> you have more faith in the authors knowledge of how his code actually behaves than I think is warranted :)
> Or faith in that he knows what "unlikely" means.
> I should write docs about this; but unlikely() means:
> 1) It happens less than 0.01% of the cases.
> 2) The compiler couldn't have figured this out by itself
> (NULL pointer checks are compiler done already, same for some other conditions)
> 3) It's a hot codepath where shaving 0.5 cycles (less even on x86) matters
> (and the author is ok with taking a 500 cycles hit if he's wrong)
>
> If you think unlikely() means something else, we should fix what it maps to towards gcc ;)
> (to.. be empty ;)
Well, I didn't consider what today's compiler does, but used it as a general
indicator, because I think that code will be around a long time. If you show
me some test results that prove it causes harm I might consider removing it.
-Geoff
On Sat, 16 Feb 2008 10:31:26 -0800
Geoff Levand <[email protected]> wrote:
> On 02/16/2008 09:42 AM, Arjan van de Ven wrote:
> > On Sat, 16 Feb 2008 18:33:16 +0100
> > Willy Tarreau <[email protected]> wrote:
> >
> >> On Sat, Feb 16, 2008 at 09:25:52AM -0800, Arjan van de Ven wrote:
> >> > On Sat, 16 Feb 2008 17:08:01 +0100
> >> > Roel Kluin <[email protected]> wrote:
> >> >
> >> > > The patch below was not yet tested. If it's correct as it is,
> >> > > please comment. ---
> >> > > Fix Unlikely(x) == y
> >> > >
> >> >
> >> > you found a great set of bugs..
> >> > but to be honest... I suspect it's just best to remove unlikely
> >> > altogether for these cases; unlikely() is almost a
> >> > go-faster-stripes thing, and if you don't know how to use it you
> >> > shouldn't be using it... so just removing it for all wrong cases
> >> > is actually the best thing to do imo.
> >>
> >> Well, eventhough the author may not know how to use it, "unlikely"
> >> at least indicates the intention of the author, or his knowledge
> >> of what should happen here. I'd suggest leaving it where it is
> >> because the authot of this code is in best position to know that
> >> this branch is unlikely to happen, eventhough he does not
> >> correctly use the macro.
> >>
> >
> > you have more faith in the authors knowledge of how his code
> > actually behaves than I think is warranted :) Or faith in that he
> > knows what "unlikely" means. I should write docs about this; but
> > unlikely() means: 1) It happens less than 0.01% of the cases.
> > 2) The compiler couldn't have figured this out by itself
> > (NULL pointer checks are compiler done already, same for some
> > other conditions) 3) It's a hot codepath where shaving 0.5 cycles
> > (less even on x86) matters (and the author is ok with taking a 500
> > cycles hit if he's wrong)
> >
> > If you think unlikely() means something else, we should fix what it
> > maps to towards gcc ;) (to.. be empty ;)
>
> Well, I didn't consider what today's compiler does, but used it as a
> general indicator, because I think that code will be around a long
> time. If you show me some test results that prove it causes harm I
> might consider removing it.
for mordern (last 10 years) x86 cpus... the cpu branchpredictor is better
than the coder in general. Same for most other architectures.
unlikely() creates bigger code as well.
Now... we're talking about your super duper hotpath function here right?
One where you care about 0.5 cycle speed improvement? (less on modern
systems ;)
Because if not, the bigger code and general "compiler second guessing" is just
more harmful than good, shown already here by the fact that the code was just incorrect
as a virtue of having the unlikely() in.
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On 02/16/2008 08:08 AM, Roel Kluin wrote:
> The patch below was not yet tested. If it's correct as it is, please comment.
> - if (unlikely(plug) == NO_IRQ) {
> + if (unlikely(plug == NO_IRQ)) {
A good catch! I'll put it in with some other 2.6.25 bug fixes.
-Geoff
On Feb 16, 2008 9:58 AM, Willy Tarreau <[email protected]> wrote:
> Last but not least, gcc 4 tends to emit stupid checks, to the point that I
> have replaced unlikely(x) with (x) in my code when gcc >= 4 is detected. What
> I observe is that the following code :
>
> if (unlikely(p == NULL)) ...
>
> often gets coded like this :
>
> reg1 = (p == NULL)
> if (reg1 != 0) ...
>
> ... which clobbers reg1 for nothing and performs a double test.
This really only can happen in GCC 4.0.x and 4.1.x and cannot happen
for 4.2 or 4.3 really because of the way __builtin_expect is handled
for those two.
-- Pinski
On Sun, Feb 17, 2008 at 01:45:23AM -0800, Andrew Pinski wrote:
> On Feb 16, 2008 9:58 AM, Willy Tarreau <[email protected]> wrote:
> > Last but not least, gcc 4 tends to emit stupid checks, to the point that I
> > have replaced unlikely(x) with (x) in my code when gcc >= 4 is detected. What
> > I observe is that the following code :
> >
> > if (unlikely(p == NULL)) ...
> >
> > often gets coded like this :
> >
> > reg1 = (p == NULL)
> > if (reg1 != 0) ...
> >
> > ... which clobbers reg1 for nothing and performs a double test.
>
> This really only can happen in GCC 4.0.x and 4.1.x and cannot happen
> for 4.2 or 4.3 really because of the way __builtin_expect is handled
> for those two.
Happy to know that, thanks for the info Andrew!
Willy
On Sat, 2008-02-16 at 10:39 -0800, Arjan van de Ven wrote:
> > >> > you found a great set of bugs..
> > >> > but to be honest... I suspect it's just best to remove unlikely
> > >> > altogether for these cases; unlikely() is almost a
> > >> > go-faster-stripes thing, and if you don't know how to use it you
> > >> > shouldn't be using it... so just removing it for all wrong cases
> > >> > is actually the best thing to do imo.
Hi Arjan,
In general I agree with you that unlikely() is overused, and often in
inappropriate places.
> for mordern (last 10 years) x86 cpus... the cpu branchpredictor is better
> than the coder in general. Same for most other architectures.
>
> unlikely() creates bigger code as well.
>
> Now... we're talking about your super duper hotpath function here right?
> One where you care about 0.5 cycle speed improvement? (less on modern
> systems ;)
The first patch was to platforms/ps3 code, which runs on the Cell, in
particular the PPE ... which is not an x86 :)
eg:
[michael@schoenaich ~]$ cat branch.c
#include <stdio.h>
int main(void)
{
int i, j;
for (i = 0, j = 0; i < 1000000000; i++)
if (i % 4 == 0)
j++;
printf("j = %d\n", j);
return 0;
}
[michael@schoenaich ~]$ ppu-gcc -Wall -O3 -o branch branch.c
[michael@schoenaich ~]$ time ./branch
real 0m5.172s
[michael@schoenaich ~]$ cat branch.c
..
for (i = 0, j = 0; i < 1000000000; i++)
if (__builtin_expect(i % 4 == 0, 0))
j++;
..
[michael@schoenaich ~]$ ppu-gcc -Wall -O3 -o branch branch.c
[michael@schoenaich ~]$ time ./branch
real 0m3.762s
Which looks as though unlikely() is helping. Admittedly we don't have a
lot of kernel code that looks like that, but at least unlikely is doing
what we want it to.
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
On Sun, Feb 17, 2008 at 10:50:03PM +1100, Michael Ellerman wrote:
> On Sat, 2008-02-16 at 10:39 -0800, Arjan van de Ven wrote:
>...
> > for mordern (last 10 years) x86 cpus... the cpu branchpredictor is better
> > than the coder in general. Same for most other architectures.
> >
> > unlikely() creates bigger code as well.
> >
> > Now... we're talking about your super duper hotpath function here right?
> > One where you care about 0.5 cycle speed improvement? (less on modern
> > systems ;)
>
> The first patch was to platforms/ps3 code, which runs on the Cell, in
> particular the PPE ... which is not an x86 :)
>
> eg:
>
> [michael@schoenaich ~]$ cat branch.c
> #include <stdio.h>
> int main(void)
> {
> int i, j;
>
> for (i = 0, j = 0; i < 1000000000; i++)
> if (i % 4 == 0)
> j++;
>
> printf("j = %d\n", j);
> return 0;
> }
> [michael@schoenaich ~]$ ppu-gcc -Wall -O3 -o branch branch.c
> [michael@schoenaich ~]$ time ./branch
> real 0m5.172s
>
> [michael@schoenaich ~]$ cat branch.c
> ..
> for (i = 0, j = 0; i < 1000000000; i++)
> if (__builtin_expect(i % 4 == 0, 0))
> j++;
> ..
> [michael@schoenaich ~]$ ppu-gcc -Wall -O3 -o branch branch.c
> [michael@schoenaich ~]$ time ./branch
> real 0m3.762s
>
>
> Which looks as though unlikely() is helping. Admittedly we don't have a
> lot of kernel code that looks like that, but at least unlikely is doing
> what we want it to.
This means it generates faster code with a current gcc for your platform.
But a future gcc might e.g. replace the whole loop with a division
(gcc SVN head (that will soon become gcc 4.3) already does
transformations like replacing loops with divisions [1]).
And your __builtin_expect() then might have unwanted effects on gcc.
Or the kernel code changes much but the likely/unlikely stays unchanged
although it becomes wrong.
If it is a real hotpath in the kernel where you have _measurable_
performance advantages from using likely/unlikely it's usage might be
justified, but otherwise it shouldn't be used.
> cheers
cu
Adrian
[1] e.g. the while() loop in timespec_add_ns() in include/linux/time.h
gets replaced by a division and a modulo (whether this
transformation is correct in this specific case is a different
question, but that's the level of code transformation gcc already
does today)
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
On Mon, 18 Feb 2008, Adrian Bunk wrote:
> On Sun, Feb 17, 2008 at 10:50:03PM +1100, Michael Ellerman wrote:
> > On Sat, 2008-02-16 at 10:39 -0800, Arjan van de Ven wrote:
> >...
> > > for mordern (last 10 years) x86 cpus... the cpu branchpredictor is better
> > > than the coder in general. Same for most other architectures.
> > >
> > > unlikely() creates bigger code as well.
> > >
> > > Now... we're talking about your super duper hotpath function here right?
> > > One where you care about 0.5 cycle speed improvement? (less on modern
> > > systems ;)
> >
> > The first patch was to platforms/ps3 code, which runs on the Cell, in
> > particular the PPE ... which is not an x86 :)
> >
> > eg:
> >
> > [michael@schoenaich ~]$ cat branch.c
> > #include <stdio.h>
> > int main(void)
> > {
> > int i, j;
> >
> > for (i = 0, j = 0; i < 1000000000; i++)
> > if (i % 4 == 0)
> > j++;
> >
> > printf("j = %d\n", j);
> > return 0;
> > }
> > [michael@schoenaich ~]$ ppu-gcc -Wall -O3 -o branch branch.c
> > [michael@schoenaich ~]$ time ./branch
> > real 0m5.172s
> >
> > [michael@schoenaich ~]$ cat branch.c
> > ..
> > for (i = 0, j = 0; i < 1000000000; i++)
> > if (__builtin_expect(i % 4 == 0, 0))
> > j++;
> > ..
> > [michael@schoenaich ~]$ ppu-gcc -Wall -O3 -o branch branch.c
> > [michael@schoenaich ~]$ time ./branch
> > real 0m3.762s
> >
> >
> > Which looks as though unlikely() is helping. Admittedly we don't have a
> > lot of kernel code that looks like that, but at least unlikely is doing
> > what we want it to.
>
> This means it generates faster code with a current gcc for your platform.
>
> But a future gcc might e.g. replace the whole loop with a division
> (gcc SVN head (that will soon become gcc 4.3) already does
> transformations like replacing loops with divisions [1]).
Hence shouldn't we ask the gcc people what's the purpose of __builtin_expect(),
if it doesn't live up to its promise?
With kind regards,
Geert Uytterhoeven
Software Architect
Sony Network and Software Technology Center Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium
Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: [email protected]
Internet: http://www.sony-europe.com/
Sony Network and Software Technology Center Europe
A division of Sony Service Centre (Europe) N.V.
Registered office: Technologielaan 7 · B-1840 Londerzeel · Belgium
VAT BE 0413.825.160 · RPR Brussels
Fortis Bank Zaventem · Swift GEBABEBB08A · IBAN BE39001382358619
On Mon, Feb 18, 2008 at 03:01:35PM +0100, Geert Uytterhoeven wrote:
> On Mon, 18 Feb 2008, Adrian Bunk wrote:
> >
> > This means it generates faster code with a current gcc for your platform.
> >
> > But a future gcc might e.g. replace the whole loop with a division
> > (gcc SVN head (that will soon become gcc 4.3) already does
> > transformations like replacing loops with divisions [1]).
>
> Hence shouldn't we ask the gcc people what's the purpose of __builtin_expect(),
> if it doesn't live up to its promise?
That's a different issue.
My point here is that we do not know how the latest gcc available in the
year 2010 might transform this code, and how a likely/unlikely placed
there might influence gcc's optimizations then.
If this is in hotpath code with a measurable speedup when using
likely/unlikely with a current gcc then it's worth it.
But otherwise it brings no real advantage today and the longterm effects
are not predictable.
> With kind regards,
>
> Geert Uytterhoeven
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
Geert Uytterhoeven <[email protected]> wrote:
> Hence shouldn't we ask the gcc people what's the purpose of
> __builtin_expect(), if it doesn't live up to its promise?
__builtin_expect() is useful on FRV where you _have_ to give each branch and
conditional branch instruction a measure of probability whether the branch
will be taken.
David
Arjan van de Ven <[email protected]> writes:
> you have more faith in the authors knowledge of how his code actually behaves than I think is warranted :)
iirc there was a mm patch some time ago to keep track of the actual unlikely
values at runtime and it showed indeed some wrong ones. But the
far majority of them are probably correct.
> Or faith in that he knows what "unlikely" means.
> I should write docs about this; but unlikely() means:
> 1) It happens less than 0.01% of the cases.
> 2) The compiler couldn't have figured this out by itself
> (NULL pointer checks are compiler done already, same for some other conditions)
> 3) It's a hot codepath where shaving 0.5 cycles (less even on x86) matters
> (and the author is ok with taking a 500 cycles hit if he's wrong)
One more thing unlikely() does is to move the unlikely code out of line.
So it should conserve some icache in critical functions, which might
well be worth some more cycles (don't have numbers though).
But overall I agree with you that unlikely is in most cases a bad
idea (and I submitted the original patch introducing it originally @). That is because
it is often used in situations where gcc's default branch prediction
heuristics do would make exactly the same decision
if (unlikely(x == NULL))
is simply totally useless because gcc already assumes all x == NULL
tests are unlikely. I appended some of the builtin heuristics from
a recent gcc source so people can see them.
Note in particular the last predictors; assuming branch ending
with goto, including call, causing early function return or
returning negative constant are not taken. Just these alone
are likely 95+% of the unlikelies in the kernel.
-Andi
/* Use number of loop iterations determined by # of iterations
analysis to set probability. We don't want to use Dempster-Shaffer
theory here, as the predictions is exact. */
DEF_PREDICTOR (PRED_LOOP_ITERATIONS, "loop iterations", PROB_ALWAYS,
PRED_FLAG_FIRST_MATCH)
/* Hints dropped by user via __builtin_expect feature. */
DEF_PREDICTOR (PRED_BUILTIN_EXPECT, "__builtin_expect", PROB_VERY_LIKELY,
PRED_FLAG_FIRST_MATCH)
/* Use number of loop iterations guessed by the contents of the loop. */
DEF_PREDICTOR (PRED_LOOP_ITERATIONS_GUESSED, "guessed loop iterations",
PROB_ALWAYS, PRED_FLAG_FIRST_MATCH)
/* Branch containing goto is probably not taken. */
DEF_PREDICTOR (PRED_CONTINUE, "continue", HITRATE (56), 0)
/* Branch to basic block containing call marked by noreturn attribute. */
DEF_PREDICTOR (PRED_NORETURN, "noreturn call", HITRATE (99),
PRED_FLAG_FIRST_MATCH)
/* Branch to basic block containing call marked by cold function attribute. */
DEF_PREDICTOR (PRED_COLD_FUNCTION, "cold function call", HITRATE (99),
PRED_FLAG_FIRST_MATCH)
/* Loopback edge is taken. */
DEF_PREDICTOR (PRED_LOOP_BRANCH, "loop branch", HITRATE (86),
PRED_FLAG_FIRST_MATCH)
/* Edge causing loop to terminate is probably not taken. */
DEF_PREDICTOR (PRED_LOOP_EXIT, "loop exit", HITRATE (91),
PRED_FLAG_FIRST_MATCH)
/* Pointers are usually not NULL. */
DEF_PREDICTOR (PRED_POINTER, "pointer", HITRATE (85), 0)
DEF_PREDICTOR (PRED_TREE_POINTER, "pointer (on trees)", HITRATE (85), 0)
/* NE is probable, EQ not etc... */
DEF_PREDICTOR (PRED_OPCODE_POSITIVE, "opcode values positive", HITRATE (79), 0)
DEF_PREDICTOR (PRED_OPCODE_NONEQUAL, "opcode values nonequal", HITRATE (71), 0)
DEF_PREDICTOR (PRED_FPOPCODE, "fp_opcode", HITRATE (90), 0)
DEF_PREDICTOR (PRED_TREE_OPCODE_POSITIVE, "opcode values positive (on trees)", HITRATE (70), 0)
DEF_PREDICTOR (PRED_TREE_OPCODE_NONEQUAL, "opcode values nonequal (on trees)", HITRATE (69), 0)
DEF_PREDICTOR (PRED_TREE_FPOPCODE, "fp_opcode (on trees)", HITRATE (90), 0)
/* Branch guarding call is probably taken. */
DEF_PREDICTOR (PRED_CALL, "call", HITRATE (69), 0)
/* Branch causing function to terminate is probably not taken. */
DEF_PREDICTOR (PRED_TREE_EARLY_RETURN, "early return (on trees)", HITRATE (54), 0)
/* Branch containing goto is probably not taken. */
DEF_PREDICTOR (PRED_GOTO, "goto", HITRATE (70), 0)
/* Branch guarding call is probably taken. */
DEF_PREDICTOR (PRED_CALL, "call", HITRATE (69), 0)
/* Branch causing function to terminate is probably not taken. */
DEF_PREDICTOR (PRED_TREE_EARLY_RETURN, "early return (on trees)", HITRATE (54), 0)
/* Branch containing goto is probably not taken. */
DEF_PREDICTOR (PRED_GOTO, "goto", HITRATE (70), 0)
/* Branch ending with return constant is probably not taken. */
DEF_PREDICTOR (PRED_CONST_RETURN, "const return", HITRATE (67), 0)
/* Branch ending with return negative constant is probably not taken. */
DEF_PREDICTOR (PRED_NEGATIVE_RETURN, "negative return", HITRATE (96), 0)
/* Branch ending with return; is probably not taken */
DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (96), 0)
/* Branches to a mudflap bounds check are extremely unlikely. */
DEF_PREDICTOR (PRED_MUDFLAP, "mudflap check", PROB_VERY_LIKELY, 0)
David Howells wrote:
> Geert Uytterhoeven <[email protected]> wrote:
>
>> Hence shouldn't we ask the gcc people what's the purpose of
>> __builtin_expect(), if it doesn't live up to its promise?
>
> __builtin_expect() is useful on FRV where you _have_ to give each branch and
> conditional branch instruction a measure of probability whether the branch
> will be taken.
>
> David
I was wondering whether some of the uses of likely illustrated below are
incorrect or not useful.
x = likely(X) || Y
for ( ... ; likely(...); ... )
while ( likely(X) )
if ( unlikely(X) &&/|| likely(Y) )
if ( X &&/|| unlikely(Y) )
return likely(X);
return likely(X) ? a : b;
On Mon, 18 Feb 2008 14:27:10 GMT, David Howells said:
> __builtin_expect() is useful on FRV where you _have_ to give each branch and
> conditional branch instruction a measure of probability whether the branch
> will be taken.
What does gcc do the 99.998% of the time we don't have likely/unlikely coded?
On Mon, 18 Feb 2008 13:11:06 -0500
[email protected] wrote:
> On Mon, 18 Feb 2008 14:27:10 GMT, David Howells said:
>
> > __builtin_expect() is useful on FRV where you _have_ to give each
> > branch and conditional branch instruction a measure of probability
> > whether the branch will be taken.
>
> What does gcc do the 99.998% of the time we don't have
> likely/unlikely coded?
see Andi's email.
It gets the exact same hints that 95%+ of the kernels unlikely/likely get you,
because the heuristics in it are usually the same as the kernel programmers
heuristics.
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Feb 18, 2008 6:01 AM, Geert Uytterhoeven
<[email protected]> wrote:
> > This means it generates faster code with a current gcc for your platform.
> >
> > But a future gcc might e.g. replace the whole loop with a division
> > (gcc SVN head (that will soon become gcc 4.3) already does
> > transformations like replacing loops with divisions [1]).
Yes but the issue is one optimization inside GCC does not take into
account the probability in one case.
And really there is a bug in the linux kernel for not implementing the
long long divide function (or really using libgcc) but that is a
different story and is part of the issue there anyways.
-- Pinski
On Mon, 2008-02-18 at 16:13 +0200, Adrian Bunk wrote:
> On Mon, Feb 18, 2008 at 03:01:35PM +0100, Geert Uytterhoeven wrote:
> > On Mon, 18 Feb 2008, Adrian Bunk wrote:
> > >
> > > This means it generates faster code with a current gcc for your platform.
> > >
> > > But a future gcc might e.g. replace the whole loop with a division
> > > (gcc SVN head (that will soon become gcc 4.3) already does
> > > transformations like replacing loops with divisions [1]).
> >
> > Hence shouldn't we ask the gcc people what's the purpose of __builtin_expect(),
> > if it doesn't live up to its promise?
>
> That's a different issue.
>
> My point here is that we do not know how the latest gcc available in the
> year 2010 might transform this code, and how a likely/unlikely placed
> there might influence gcc's optimizations then.
You're right, we don't know. But if giving the compiler _more_
information causes it to produce vastly inferior code then we should be
filing gcc bugs. After all the unlikely/likely is just a hint, if gcc
knows better it can always ignore it.
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
On Tuesday 19 February 2008 01:39, Andi Kleen wrote:
> Arjan van de Ven <[email protected]> writes:
> > you have more faith in the authors knowledge of how his code actually
> > behaves than I think is warranted :)
>
> iirc there was a mm patch some time ago to keep track of the actual
> unlikely values at runtime and it showed indeed some wrong ones. But the
> far majority of them are probably correct.
>
> > Or faith in that he knows what "unlikely" means.
> > I should write docs about this; but unlikely() means:
> > 1) It happens less than 0.01% of the cases.
> > 2) The compiler couldn't have figured this out by itself
> > (NULL pointer checks are compiler done already, same for some other
> > conditions) 3) It's a hot codepath where shaving 0.5 cycles (less even on
> > x86) matters (and the author is ok with taking a 500 cycles hit if he's
> > wrong)
>
> One more thing unlikely() does is to move the unlikely code out of line.
> So it should conserve some icache in critical functions, which might
> well be worth some more cycles (don't have numbers though).
I actually once measured context switching performance in the scheduler,
and removing the unlikely hint for testing RT tasks IIRC gave about 5%
performance drop.
This was on a P4 which is very different from more modern CPUs both in
terms of branch performance characteristics, and icache characteristics.
However, the P4's branch predictor is pretty good, and it should easily
be able to correctly predict the rt_task check if it has enough entries.
So I think much of the savings came from code transformation and movement.
Anyway, it is definitely worthwhile if used correctly.
Actually one thing I don't like about gcc is that I think it still emits
cmovs for likely/unlikely branches, which is silly (the gcc developers
seem to be in love with that instruction). If that goes away, then
branch hints may be even better.
>
> But overall I agree with you that unlikely is in most cases a bad
> idea (and I submitted the original patch introducing it originally @). That
> is because it is often used in situations where gcc's default branch
> prediction heuristics do would make exactly the same decision
>
> if (unlikely(x == NULL))
>
> is simply totally useless because gcc already assumes all x == NULL
> tests are unlikely. I appended some of the builtin heuristics from
> a recent gcc source so people can see them.
>
> Note in particular the last predictors; assuming branch ending
> with goto, including call, causing early function return or
> returning negative constant are not taken. Just these alone
> are likely 95+% of the unlikelies in the kernel.
Yes, gcc should be able to do pretty good heuristics, considering
the quite good numbers that cold CPU predictors can attain. However
for really performance critical code (or really "never" executed
code), then I think it is OK to have the hints and not have to rely
on gcc heuristics.
>
> -Andi
[snip]
Interesting, thanks!
On Tue, 19 Feb 2008 13:33:53 +1100
Nick Piggin <[email protected]> wrote:
>
> Actually one thing I don't like about gcc is that I think it still
> emits cmovs for likely/unlikely branches, which is silly (the gcc
> developers seem to be in love with that instruction). If that goes
> away, then branch hints may be even better.
only for -Os and only if the result is smaller afaik.
(cmov tends to be a performance loss most of the time so for -O2 and such it
isn't used as far as I know.. it does make for nice small code however ;-)
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Tuesday 19 February 2008 13:40, Arjan van de Ven wrote:
> On Tue, 19 Feb 2008 13:33:53 +1100
>
> Nick Piggin <[email protected]> wrote:
> > Actually one thing I don't like about gcc is that I think it still
> > emits cmovs for likely/unlikely branches, which is silly (the gcc
> > developers seem to be in love with that instruction). If that goes
> > away, then branch hints may be even better.
>
> only for -Os and only if the result is smaller afaik.
What is your evidence for saying this? Because here, with the latest
kernel and recent gcc-4.3 snapshot, it spits out cmov like crazy even
when compiled with -O2.
npiggin@am:~/usr/src/linux-2.6$ grep cmov kernel/sched.s | wc -l
45
And yes it even does for hinted branches and even at -O2/3
npiggin@am:~/tests$ cat cmov.c
int test(int a, int b)
{
if (__builtin_expect(a < b, 0))
return a;
else
return b;
}
npiggin@am:~/tests$ gcc-4.3 -S -O2 cmov.c
npiggin@am:~/tests$ head -13 cmov.s
.file "cmov.c"
.text
.p2align 4,,15
..globl test
.type test, @function
test:
..LFB2:
cmpl %edi, %esi
cmovle %esi, %edi
movl %edi, %eax
ret
..LFE2:
.size test, .-test
This definitely should be a branch, IMO.
> (cmov tends to be a performance loss most of the time so for -O2 and such
> it isn't used as far as I know.. it does make for nice small code however
> ;-)
It shouldn't be hard to work out the cutover point based on how
expensive cmov is, how expensive branch and branch mispredicts are,
and how often the branch is likely to be mispredicted. For an
unpredictable branch, cmov is normally quite a good win even on
modern CPUs. But gcc overuses it I think.
On Tue, Feb 19, 2008 at 01:33:53PM +1100, Nick Piggin wrote:
> > Note in particular the last predictors; assuming branch ending
> > with goto, including call, causing early function return or
> > returning negative constant are not taken. Just these alone
> > are likely 95+% of the unlikelies in the kernel.
>
> Yes, gcc should be able to do pretty good heuristics, considering
> the quite good numbers that cold CPU predictors can attain. However
> for really performance critical code (or really "never" executed
> code), then I think it is OK to have the hints and not have to rely
> on gcc heuristics.
in my experience, the real problem is that gcc does what *it* wants and not
what *you* want. I've been annoyed a lot by the way it coded some loops that
could really be blazingly fast, but which resulted in a ton of branches due
to its predictors. And using unlikely() there was a real mess, because instead
of just hinting the compiler with probabilities to write some linear code for
the *most* common case, it ended up with awful branches everywhere with code
sent far away and even duplicated for some branches.
Sometimes, for performance critical paths, I would like gcc to be dumb and
follow *my* code and not its hard-coded probabilities. For instance, in a
tree traversal, you really know how you want to build your loop. And these
days, it seems like the single method of getting it your way is doing asm,
which obviously is not portable :-(
Maybe one thing we would need would be the ability to assign probabilities
to each branch based on what we expect, so that gcc could build a better
tree keeping most frequently used code tight.
Hmm I've just noticed -fno-guess-branch-probability in the man, I never tried
it.
regards,
Willy
On Tuesday 19 February 2008 16:58, Willy Tarreau wrote:
> On Tue, Feb 19, 2008 at 01:33:53PM +1100, Nick Piggin wrote:
> > > Note in particular the last predictors; assuming branch ending
> > > with goto, including call, causing early function return or
> > > returning negative constant are not taken. Just these alone
> > > are likely 95+% of the unlikelies in the kernel.
> >
> > Yes, gcc should be able to do pretty good heuristics, considering
> > the quite good numbers that cold CPU predictors can attain. However
> > for really performance critical code (or really "never" executed
> > code), then I think it is OK to have the hints and not have to rely
> > on gcc heuristics.
>
> in my experience, the real problem is that gcc does what *it* wants and not
> what *you* want. I've been annoyed a lot by the way it coded some loops
> that could really be blazingly fast, but which resulted in a ton of
> branches due to its predictors. And using unlikely() there was a real mess,
> because instead of just hinting the compiler with probabilities to write
> some linear code for the *most* common case, it ended up with awful
> branches everywhere with code sent far away and even duplicated for some
> branches.
>
> Sometimes, for performance critical paths, I would like gcc to be dumb and
> follow *my* code and not its hard-coded probabilities. For instance, in a
> tree traversal, you really know how you want to build your loop. And these
> days, it seems like the single method of getting it your way is doing asm,
> which obviously is not portable :-(
Probably all true.
> Maybe one thing we would need would be the ability to assign probabilities
> to each branch based on what we expect, so that gcc could build a better
> tree keeping most frequently used code tight.
I don't know if that would *directly* lead to gcc being smarter. I
think perhaps they probably don't benchmark on code bases that have
much explicit annotation (I'm sure they wouldn't seriously benchmark
any parts of Linux as part of daily development). I think the key is
to continue to use annotations _properly_, and eventually gcc should
go in the right direction if enough code uses it.
And if you have really good examples like it sounds like above, then
I guess that should be reported to gcc?
On Tue, Feb 19, 2008 at 08:46:03AM +1100, Michael Ellerman wrote:
> On Mon, 2008-02-18 at 16:13 +0200, Adrian Bunk wrote:
> > On Mon, Feb 18, 2008 at 03:01:35PM +0100, Geert Uytterhoeven wrote:
> > > On Mon, 18 Feb 2008, Adrian Bunk wrote:
> > > >
> > > > This means it generates faster code with a current gcc for your platform.
> > > >
> > > > But a future gcc might e.g. replace the whole loop with a division
> > > > (gcc SVN head (that will soon become gcc 4.3) already does
> > > > transformations like replacing loops with divisions [1]).
> > >
> > > Hence shouldn't we ask the gcc people what's the purpose of __builtin_expect(),
> > > if it doesn't live up to its promise?
> >
> > That's a different issue.
> >
> > My point here is that we do not know how the latest gcc available in the
> > year 2010 might transform this code, and how a likely/unlikely placed
> > there might influence gcc's optimizations then.
>
> You're right, we don't know. But if giving the compiler _more_
> information causes it to produce vastly inferior code then we should be
> filing gcc bugs. After all the unlikely/likely is just a hint, if gcc
> knows better it can always ignore it.
It's the other way round, gcc assumes that you know better than gcc when
you give it a __builtin_expect().
The example you gave had only a 1:3 ratio, which is far outside of the
ratios where __builtin_expect() should be used.
What if you gave this annotation for the 1:3 case and gcc generates code
that performs better for ratios > 1:1000 but much worse for a 1:3 ratio
since your hint did override a better estimate of gcc?
And I'm sure that > 90% of all kernel developers (including me) are
worse in such respects than the gcc heuristics.
I'm a firm believer in the following:
- it's the programmer's job to write clean and efficient C code
- it's the compiler's job to convert C code into efficient assembler
code
The stable interface between the programmer and the compiler is C, and
when the programmer starts manually messing with internals of the
compiler that's a layering violation that requires a _good_
justification.
With a "good justification" not consisting of some microbenchmark but of
measurements of the actual annotations in the kernel code.
> cheers
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
On Tue, Feb 19, 2008 at 01:33:53PM +1100, Nick Piggin wrote:
> On Tuesday 19 February 2008 01:39, Andi Kleen wrote:
> > Arjan van de Ven <[email protected]> writes:
> > > you have more faith in the authors knowledge of how his code actually
> > > behaves than I think is warranted :)
> >
> > iirc there was a mm patch some time ago to keep track of the actual
> > unlikely values at runtime and it showed indeed some wrong ones. But the
> > far majority of them are probably correct.
> >
> > > Or faith in that he knows what "unlikely" means.
> > > I should write docs about this; but unlikely() means:
> > > 1) It happens less than 0.01% of the cases.
> > > 2) The compiler couldn't have figured this out by itself
> > > (NULL pointer checks are compiler done already, same for some other
> > > conditions) 3) It's a hot codepath where shaving 0.5 cycles (less even on
> > > x86) matters (and the author is ok with taking a 500 cycles hit if he's
> > > wrong)
> >
> > One more thing unlikely() does is to move the unlikely code out of line.
> > So it should conserve some icache in critical functions, which might
> > well be worth some more cycles (don't have numbers though).
>
> I actually once measured context switching performance in the scheduler,
> and removing the unlikely hint for testing RT tasks IIRC gave about 5%
> performance drop.
OT: what benchmarks did you use for that? I had a change some time
ago to the CFS scheduler to avoid unpredicted indirect calls for
the common case, but I wasn't able to benchmark a difference with the usual
suspect benchmark (lmbench). Since it increased code size by
a few bytes it was rejected then.
>
> This was on a P4 which is very different from more modern CPUs both in
> terms of branch performance characteristics,
> and icache characteristics.
Hmm, the P4 the trace cache actually should not care about inline
code that is not executed.
> However, the P4's branch predictor is pretty good, and it should easily
I think it depends on the generation. Prescott class branch
prediction should be much better than the earlier ones.
> Actually one thing I don't like about gcc is that I think it still emits
> cmovs for likely/unlikely branches,
That's -Os.
> which is silly (the gcc developers
It depends on the CPU. e.g. on K8 and P6 using CMOV if possible
makes sense. P4 doesn't like it though.
> the quite good numbers that cold CPU predictors can attain. However
> for really performance critical code (or really "never" executed
> code), then I think it is OK to have the hints and not have to rely
> on gcc heuristics.
But only when the explicit hints are different from what the implicit
branch predictors would predict anyways. And if you look at the
heuristics that is not often the case...
Also I really think that mm patch that measured unlikely efficiency
should be dug out and merged to mainline and them regularly rechecked.
-Andi
> Sometimes, for performance critical paths, I would like gcc to be dumb and
> follow *my* code and not its hard-coded probabilities.
If you really want that, simple: just disable optimization @)
> Maybe one thing we would need would be the ability to assign probabilities
> to each branch based on what we expect, so that gcc could build a better
> tree keeping most frequently used code tight.
Just use profile feedback then for user space. I don't think it's a good
idea for kernel code though because it leads to unreproducible binaries
which would wreck the development model.
> Hmm I've just noticed -fno-guess-branch-probability in the man, I never tried
> it.
Or -fno-reorder-blocks
-Andi
On Tuesday 19 February 2008 20:25, Andi Kleen wrote:
> On Tue, Feb 19, 2008 at 01:33:53PM +1100, Nick Piggin wrote:
> > I actually once measured context switching performance in the scheduler,
> > and removing the unlikely hint for testing RT tasks IIRC gave about 5%
> > performance drop.
>
> OT: what benchmarks did you use for that? I had a change some time
> ago to the CFS scheduler to avoid unpredicted indirect calls for
> the common case, but I wasn't able to benchmark a difference with the usual
> suspect benchmark (lmbench). Since it increased code size by
> a few bytes it was rejected then.
I think it was just a simple context switch benchmark, but not lmbench
(which I found to be a bit too variable). But it was a long time ago...
> > This was on a P4 which is very different from more modern CPUs both in
> > terms of branch performance characteristics,
> >
> > and icache characteristics.
>
> Hmm, the P4 the trace cache actually should not care about inline
> code that is not executed.
Yeah, which is why it is a bit different than other CPUs. Although
the L2 cache I guess is still going to suffer from sparse code, but
I guess that is a bit less important.
> > However, the P4's branch predictor is pretty good, and it should easily
>
> I think it depends on the generation. Prescott class branch
> prediction should be much better than the earlier ones.
I was using a Nocona Xeon, which I think is a Prescott class? And
don't they have much higher mispredict penalty (than older P4s)?
> > Actually one thing I don't like about gcc is that I think it still emits
> > cmovs for likely/unlikely branches,
>
> That's -Os.
And -O2 and -O3, on the gccs that I'm using, AFAIKS.
> > which is silly (the gcc developers
>
> It depends on the CPU. e.g. on K8 and P6 using CMOV if possible
> makes sense. P4 doesn't like it though.
If the branch is completely predictable (eg. annotated), then I
think branches should be used anyway. Even on well predicted
branches, cmov is similar speed on microbenchmarks, but it will
increase data hazards I think, so it will probably be worse for
some real world situations.
> > the quite good numbers that cold CPU predictors can attain. However
> > for really performance critical code (or really "never" executed
> > code), then I think it is OK to have the hints and not have to rely
> > on gcc heuristics.
>
> But only when the explicit hints are different from what the implicit
> branch predictors would predict anyways. And if you look at the
> heuristics that is not often the case...
But a likely branch will be _strongly_ predicted to be taken,
wheras a lot of the gcc heuristics simply have slightly more or
slightly less probability. So it's not just a question of which
way is more likely, but also _how_ likely it is to go that way.
On Tue, Feb 19, 2008 at 08:46:46PM +1100, Nick Piggin wrote:
> On Tuesday 19 February 2008 20:25, Andi Kleen wrote:
> > On Tue, Feb 19, 2008 at 01:33:53PM +1100, Nick Piggin wrote:
>
> > > I actually once measured context switching performance in the scheduler,
> > > and removing the unlikely hint for testing RT tasks IIRC gave about 5%
> > > performance drop.
> >
> > OT: what benchmarks did you use for that? I had a change some time
> > ago to the CFS scheduler to avoid unpredicted indirect calls for
> > the common case, but I wasn't able to benchmark a difference with the usual
> > suspect benchmark (lmbench). Since it increased code size by
> > a few bytes it was rejected then.
>
> I think it was just a simple context switch benchmark, but not lmbench
> (which I found to be a bit too variable). But it was a long time ago...
Do you still have it?
I thought about writing my own but ended up being too lazy for that @)
>
> > > However, the P4's branch predictor is pretty good, and it should easily
> >
> > I think it depends on the generation. Prescott class branch
> > prediction should be much better than the earlier ones.
>
> I was using a Nocona Xeon, which I think is a Prescott class?
Yes.
> And don't they have much higher mispredict penalty (than older P4s)?
They do have a longer pipeline, so yes more penalty (5 or 6 stages more iirc),
but also a lot better branch predictor which makes up for that.
>
>
> > > Actually one thing I don't like about gcc is that I think it still emits
> > > cmovs for likely/unlikely branches,
> >
> > That's -Os.
>
> And -O2 and -O3, on the gccs that I'm using, AFAIKS.
Well if it still happens on gcc 4.2 with P4 tuning you should
perhaps open a gcc PR. They tend to ignore these bugs mostly in
my experience, but sometimes they act on them.
>
>
> > > which is silly (the gcc developers
> >
> > It depends on the CPU. e.g. on K8 and P6 using CMOV if possible
> > makes sense. P4 doesn't like it though.
>
> If the branch is completely predictable (eg. annotated), then I
> think branches should be used anyway. Even on well predicted
> branches, cmov is similar speed on microbenchmarks, but it will
> increase data hazards I think, so it will probably be worse for
> some real world situations.
At least the respective optimization manuals say they should be used.
I presume they only made this recommendation after some extensive
benchmarking.
>
>
> > > the quite good numbers that cold CPU predictors can attain. However
> > > for really performance critical code (or really "never" executed
> > > code), then I think it is OK to have the hints and not have to rely
> > > on gcc heuristics.
> >
> > But only when the explicit hints are different from what the implicit
> > branch predictors would predict anyways. And if you look at the
> > heuristics that is not often the case...
>
> But a likely branch will be _strongly_ predicted to be taken,
> wheras a lot of the gcc heuristics simply have slightly more or
> slightly less probability. So it's not just a question of which
> way is more likely, but also _how_ likely it is to go that way.
Yes, but a lot of the heuristics are pretty strong (>80%) and gcc will
act on them unless it has a very strong contra cue. And that should
normally not be the case.
-Andi
On Tuesday 19 February 2008 20:57, Andi Kleen wrote:
> On Tue, Feb 19, 2008 at 08:46:46PM +1100, Nick Piggin wrote:
> > I think it was just a simple context switch benchmark, but not lmbench
> > (which I found to be a bit too variable). But it was a long time ago...
>
> Do you still have it?
>
> I thought about writing my own but ended up being too lazy for that @)
Had a quick look but couldn't find it. It was just two threads running
and switching to each other with a couple of mutexes or yield. If I
find it, then I'll send it over.
> > > > Actually one thing I don't like about gcc is that I think it still
> > > > emits cmovs for likely/unlikely branches,
> > >
> > > That's -Os.
> >
> > And -O2 and -O3, on the gccs that I'm using, AFAIKS.
>
> Well if it still happens on gcc 4.2 with P4 tuning you should
> perhaps open a gcc PR. They tend to ignore these bugs mostly in
> my experience, but sometimes they act on them.
I'm not sure about P4 tuning... But even IMO it should not on
predictable branches too much for any (especially OOOE) CPU.
> > > > which is silly (the gcc developers
> > >
> > > It depends on the CPU. e.g. on K8 and P6 using CMOV if possible
> > > makes sense. P4 doesn't like it though.
> >
> > If the branch is completely predictable (eg. annotated), then I
> > think branches should be used anyway. Even on well predicted
> > branches, cmov is similar speed on microbenchmarks, but it will
> > increase data hazards I think, so it will probably be worse for
> > some real world situations.
>
> At least the respective optimization manuals say they should be used.
> I presume they only made this recommendation after some extensive
> benchmarking.
What I have seen is that they tell you definitely not to use it for
predictable branches. Eg. the Intel optimization manual says
Use the setcc and cmov instructions to eliminate unpredictable
conditional branches where possible. Do not do this for predictable
branches. Do not use these instructions to eliminate all
unpredictable conditional branches, because using these instructions
will incur execution overhead due to executing both paths of a
conditional branch. In addition, converting conditional branches to
cmovs or setcc trades control-flow dependence for data dependence
and restricts the capability of the out-of-order engine.
> > But a likely branch will be _strongly_ predicted to be taken,
> > wheras a lot of the gcc heuristics simply have slightly more or
> > slightly less probability. So it's not just a question of which
> > way is more likely, but also _how_ likely it is to go that way.
>
> Yes, but a lot of the heuristics are pretty strong (>80%) and gcc will
> act on them unless it has a very strong contra cue. And that should
> normally not be the case.
True, but if you know a branch is 99%+, then use of likely/unlikely
can still be a good idea. 80% may not be enough to choose a branch
over a cmov for example.
On Tue, Feb 19, 2008 at 10:28:46AM +0100, Andi Kleen wrote:
> > Sometimes, for performance critical paths, I would like gcc to be dumb and
> > follow *my* code and not its hard-coded probabilities.
>
> If you really want that, simple: just disable optimization @)
already tried. It fixed some difficulties, but create new expected issues
with data being reloaded often from memory instead of being passed along
a few registers. Don't forget that optimizing for x86 requires a lot of
smartness from the compiler because of the very small number of registers!
> > Maybe one thing we would need would be the ability to assign probabilities
> > to each branch based on what we expect, so that gcc could build a better
> > tree keeping most frequently used code tight.
>
> Just use profile feedback then for user space. I don't think it's a good
> idea for kernel code though because it leads to unreproducible binaries
> which would wreck the development model.
I never found this to be practically usable in fact, because you have to
use it on the *exact* same source. End of game for cross-compilation. It
would be good to be able to use a few pragmas in the code to say "hey, I
want this block optimized like this". This is what I understood the
__builtin_expect() was for, except that it tends to throw unpredicted
branches too far away.
> > Hmm I've just noticed -fno-guess-branch-probability in the man, I never tried
> > it.
>
> Or -fno-reorder-blocks
Thanks for the hint, I will try it.
Willy