by Willy Tarreau

Subject: Re: [Cbe-oss-dev] [PATCH 1/3] Fix Unlikely(x) == y

On Sun, Feb 17, 2008 at 01:45:23AM -0800, Andrew Pinski wrote:
> On Feb 16, 2008 9:58 AM, Willy Tarreau <[email protected]> wrote:
> > Last but not least, gcc 4 tends to emit stupid checks, to the point that I
> > have replaced unlikely(x) with (x) in my code when gcc >= 4 is detected. What
> > I observe is that the following code :
> >
> > if (unlikely(p == NULL)) ...
> >
> > often gets coded like this :
> >
> > reg1 = (p == NULL)
> > if (reg1 != 0) ...
> >
> > ... which clobbers reg1 for nothing and performs a double test.
>
> This really only can happen in GCC 4.0.x and 4.1.x and cannot happen
> for 4.2 or 4.3 really because of the way __builtin_expect is handled
> for those two.

Happy to know that, thanks for the info Andrew!

Willy

2008-02-17 11:50:20
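
For readers following along, here is a minimal sketch of the macros under discussion and of the misplaced-parenthesis pattern the patch subject refers to. The macro definitions follow the common `__builtin_expect(!!(x), ...)` form; the helper names (first_byte, misplaced, intended) are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Common definition: !! normalizes any truthy value to exactly 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: returns -1 for a NULL pointer, else the first byte. */
int first_byte(const unsigned char *p)
{
    if (unlikely(p == NULL))
        return -1;          /* path hinted as rarely taken */
    return p[0];            /* path hinted as the common case */
}

/* Misplaced parenthesis: compares the normalized 0/1 value of x with y. */
int misplaced(int x, int y)
{
    return unlikely(x) == y;
}

/* Intended form: the whole comparison sits inside the hint. */
int intended(int x, int y)
{
    return unlikely(x == y);
}
```

With x = 5 and y = 5, misplaced() evaluates (1 == 5) and returns 0, while intended() returns 1, so the two forms differ in behavior and not only in the hint given to the compiler.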

On Mon, 18 Feb 2008 14:27:10 GMT, David Howells said:

> __builtin_expect() is useful on FRV where you _have_ to give each branch and
> conditional branch instruction a measure of probability whether the branch
> will be taken.

What does gcc do the 99.998% of the time we don't have likely/unlikely coded?

2008-02-18 18:34:53

by Arjan van de Ven

Subject: Re: [PATCH 1/3] Fix Unlikely(x) == y

On Mon, 18 Feb 2008 13:11:06 -0500
[email protected] wrote:

> On Mon, 18 Feb 2008 14:27:10 GMT, David Howells said:
>
> > __builtin_expect() is useful on FRV where you _have_ to give each
> > branch and conditional branch instruction a measure of probability
> > whether the branch will be taken.
>
> What does gcc do the 99.998% of the time we don't have
> likely/unlikely coded?

See Andi's email.
It gets the exact same hints that 95%+ of the kernel's unlikely/likely get you,
because the heuristics in it are usually the same as the kernel programmers'
heuristics.

--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-02-18 19:22:29

by Andrew Pinski

Subject: Re: [Cbe-oss-dev] [PATCH 1/3] Fix Unlikely(x) == y

On Feb 18, 2008 6:01 AM, Geert Uytterhoeven
<[email protected]> wrote:
> > This means it generates faster code with a current gcc for your platform.
> >
> > But a future gcc might e.g. replace the whole loop with a division
> > (gcc SVN head (that will soon become gcc 4.3) already does
> > transformations like replacing loops with divisions [1]).

Yes, but the issue is that one optimization inside GCC does not take the
probability into account in one case.

And really there is a bug in the Linux kernel for not implementing the
long long divide function (or really for not using libgcc), but that is a
different story, though it is part of the issue there anyway.

-- Pinski

2008-02-18 21:46:20
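
As a rough illustration of the loop-versus-division point above, the sketch below shows a hypothetical loop of the kind a compiler could rewrite as a division, next to the explicit division form. The function names and the NSEC_PER_SEC constant are assumptions made for the example, not code taken from the patch. On 32-bit targets a 64-bit division may be lowered to a libgcc helper call such as __udivdi3, which is why the rewrite matters for code that does not link against libgcc.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL   /* illustrative constant */

/* Loop form: cheap when ns is expected to be only slightly above a second. */
uint64_t secs_by_loop(uint64_t ns, uint64_t *rem)
{
    uint64_t sec = 0;

    while (ns >= NSEC_PER_SEC) {
        ns -= NSEC_PER_SEC;
        sec++;
    }
    *rem = ns;
    return sec;
}

/* Division form: same result, but needs a 64-bit divide and modulo. */
uint64_t secs_by_div(uint64_t ns, uint64_t *rem)
{
    *rem = ns % NSEC_PER_SEC;
    return ns / NSEC_PER_SEC;
}
```

Both functions return the same quotient and remainder; which one is faster depends on how many iterations the loop is expected to run, which is exactly the probability information the optimization would need to respect.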

On Tuesday 19 February 2008 20:57, Andi Kleen wrote:
> On Tue, Feb 19, 2008 at 08:46:46PM +1100, Nick Piggin wrote:

> > I think it was just a simple context switch benchmark, but not lmbench
> > (which I found to be a bit too variable). But it was a long time ago...
>
> Do you still have it?
>
> I thought about writing my own but ended up being too lazy for that @)

Had a quick look but couldn't find it. It was just two threads running
and switching to each other with a couple of mutexes or yield. If I
find it, then I'll send it over.

> > > > Actually one thing I don't like about gcc is that I think it still
> > > > emits cmovs for likely/unlikely branches,
> > >
> > > That's -Os.
> >
> > And -O2 and -O3, on the gccs that I'm using, AFAIKS.
>
> Well if it still happens on gcc 4.2 with P4 tuning you should
> perhaps open a gcc PR. They tend to ignore these bugs mostly in
> my experience, but sometimes they act on them.

I'm not sure about P4 tuning... But IMO gcc should not emit cmovs on
predictable branches very much for any (especially OOOE) CPU.

> > > > which is silly (the gcc developers
> > >
> > > It depends on the CPU. e.g. on K8 and P6 using CMOV if possible
> > > makes sense. P4 doesn't like it though.
> >
> > If the branch is completely predictable (eg. annotated), then I
> > think branches should be used anyway. Even on well predicted
> > branches, cmov is similar speed on microbenchmarks, but it will
> > increase data hazards I think, so it will probably be worse for
> > some real world situations.
>
> At least the respective optimization manuals say they should be used.
> I presume they only made this recommendation after some extensive
> benchmarking.

What I have seen is that they tell you definitely not to use it for
predictable branches. Eg. the Intel optimization manual says

    Use the setcc and cmov instructions to eliminate unpredictable
    conditional branches where possible. Do not do this for predictable
    branches. Do not use these instructions to eliminate all
    unpredictable conditional branches, because using these instructions
    will incur execution overhead due to executing both paths of a
    conditional branch. In addition, converting conditional branches to
    cmovs or setcc trades control-flow dependence for data dependence
    and restricts the capability of the out-of-order engine.

> > But a likely branch will be _strongly_ predicted to be taken,
> > whereas a lot of the gcc heuristics simply have slightly more or
> > slightly less probability. So it's not just a question of which
> > way is more likely, but also _how_ likely it is to go that way.
>
> Yes, but a lot of the heuristics are pretty strong (>80%) and gcc will
> act on them unless it has a very strong contra cue. And that should
> normally not be the case.

True, but if you know a branch is 99%+, then use of likely/unlikely
can still be a good idea. 80% may not be enough to choose a branch
over a cmov for example.

2008-02-20 07:33:17
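
The trade-off between a conditional branch and a cmov-style select can be sketched in plain C as follows. This is only an illustration under assumptions: the function names are hypothetical, and the compiler remains free to emit either a jump or a cmov for both versions depending on its version, flags and tuning.

```c
/* Branch form: hinted as strongly predictable, so a conditional jump
 * costs little when the prediction holds. */
int max_branch(int a, int b)
{
    if (__builtin_expect(a > b, 1))
        return a;
    return b;
}

/* Select form: no control-flow dependence, but the result now depends
 * on both inputs and on the comparison (a data dependence). */
int max_select(int a, int b)
{
    int mask = -(a > b);               /* all ones if a > b, else zero */

    return (a & mask) | (b & ~mask);
}
```

Comparing the assembly from `gcc -O2 -S` for the two functions is one way to see which form a given compiler actually picks.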

by Willy Tarreau

Subject: Re: [PATCH 1/3] Fix Unlikely(x) == y

On Tue, Feb 19, 2008 at 10:28:46AM +0100, Andi Kleen wrote:
> > Sometimes, for performance critical paths, I would like gcc to be dumb and
> > follow *my* code and not its hard-coded probabilities.
>
> If you really want that, simple: just disable optimization @)

Already tried. It fixed some difficulties, but created new, expected issues
with data being reloaded often from memory instead of being passed along
a few registers. Don't forget that optimizing for x86 requires a lot of
smartness from the compiler because of the very small number of registers!

> > Maybe one thing we would need would be the ability to assign probabilities
> > to each branch based on what we expect, so that gcc could build a better
> > tree keeping most frequently used code tight.
>
> Just use profile feedback then for user space. I don't think it's a good
> idea for kernel code though because it leads to unreproducible binaries
> which would wreck the development model.

I never found this to be practically usable in fact, because you have to
use it on the *exact* same source. End of game for cross-compilation. It
would be good to be able to use a few pragmas in the code to say "hey, I
want this block optimized like this". This is what I understood the
__builtin_expect() was for, except that it tends to throw unpredicted
branches too far away.

> > Hmm, I've just noticed -fno-guess-branch-probability in the man page; I
> > never tried it.
>
> Or -fno-reorder-blocks

Thanks for the hint, I will try it.

Willy
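
On the wish for per-branch probabilities: compilers released well after this thread (GCC 9 and later, and recent Clang) provide __builtin_expect_with_probability(), which takes an explicit probability. The sketch below assumes such a compiler and falls back to no hint at all otherwise; the expect_prob macro and the classify function are hypothetical names used only for this example.

```c
/* Use the probability-aware builtin when the compiler advertises it. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_expect_with_probability)
#    define expect_prob(x, p) \
            __builtin_expect_with_probability(!!(x), 1, (p))
#  endif
#endif

/* Fallback for other compilers: evaluate the condition without a hint. */
#ifndef expect_prob
#  define expect_prob(x, p) (!!(x))
#endif

/* Hypothetical parser step: most input bytes are assumed to be ASCII. */
int classify(unsigned char c)
{
    if (expect_prob(c < 0x80, 0.99))
        return 0;           /* expected hot path */
    return 1;               /* rarely taken path */
}
```

The probability must be a compile-time constant between 0.0 and 1.0. Building the same file with and without -fno-guess-branch-probability or -fno-reorder-blocks, the flags mentioned above, is a simple way to compare how the block layout changes.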