2004-03-26 20:54:39

by Daniel Forrest

Subject: Somewhat OT: gcc, x86, -ffast-math, and Linux

I've tried Googling for an answer on this, but have come up empty and
I think it likely that someone here probably knows the answer...

We are testing and breaking in 6 racks of compute nodes, each rack
containing 30 1U boxes, each box containing 2 x 2.8GHz Xeon CPUs.
Each rack contains identical hardware (single purchase) with the
exception that one rack has double the memory per node. The 6 racks
are located in six different labs across our campus. It is available
to me only as a "black box" queueing system.

I am running one of our applications that has been compiled using gcc
with the -ffast-math option. I am finding that the identical program
using the same input data files is producing different results on
different machines. However, the differences are all less than the
precision of a single-precision floating point number. By this I mean
that if the results (which are written to 15 digits of precision) are
only compared to 7 digits then the results are the same. Also, most
of the time the 15 digit values are the same.

My question is this: Why aren't the results always the same? What is
the -ffast-math option doing? How are the excess bits of precision
dealt with during context switches? Shouldn't the same binary with
the same inputs produce the same output on identical hardware?

I have run the same test with the program compiled without -ffast-math
enabled and the results are always identical.

Any insight would be appreciated.

--
Daniel K. Forrest
[email protected]
Laboratory for Molecular and Computational Genomics
University of Wisconsin, Madison


2004-03-26 21:25:53

by Richard B. Johnson

Subject: Re: Somewhat OT: gcc, x86, -ffast-math, and Linux

On Fri, 26 Mar 2004, Daniel Forrest wrote:

> I've tried Googling for an answer on this, but have come up empty and
> I think it likely that someone here probably knows the answer...
>
> We are testing and breaking in 6 racks of compute nodes, each rack
> containing 30 1U boxes, each box containing 2 x 2.8GHz Xeon CPUs.
> Each rack contains identical hardware (single purchase) with the
> exception that one rack has double the memory per node. The 6 racks
> are located in six different labs across our campus. It is available
> to me only as a "black box" queueing system.
>
> I am running one of our applications that has been compiled using gcc
> with the -ffast-math option. I am finding that the identical program
> using the same input data files is producing different results on
> different machines. However, the differences are all less than the
> precision of a single-precision floating point number. By this I mean
> that if the results (which are written to 15 digits of precision) are
> only compared to 7 digits then the results are the same. Also, most
> of the time the 15 digit values are the same.
>
> My question is this: Why aren't the results always the same? What is
> the -ffast-math option doing? How are the excess bits of precision
> dealt with during context switches? Shouldn't the same binary with
> the same inputs produce the same output on identical hardware?
>
> I have run the same test with the program compiled without -ffast-math
> enabled and the results are always identical.
>
> Any insight would be appreciated.

The gcc `man` page says that -ffast-math allows ANSI and IEEE rules
to be violated. It also says the option is never turned on by any -O
option, because it can result in incorrect output for programs that
depend on an exact implementation of the IEEE or ANSI rules.

So you get what you asked for.

The FPU has 80 bits of precision internally. Its state is always
saved and restored across context switches, so no "excess bits of
precision" are lost there, as you suggest.

The FPU's state is not saved on entry to system calls, which is why
the kernel is not supposed to use the FPU internally.

Look at <math.h> and the files it includes. Note that the math
library takes and returns type double. If you have declared your
floating-point variables as type float, you will have serious
rounding errors unless you closely adhere to the IEEE spec, and
possibly even then. If -ffast-math violates the IEEE spec, you may
have discovered the reason why you get strange values.
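
As a rough illustration of that float-versus-double drift, here is a
minimal C sketch (my own, not from your application; the exact
totals depend on your compiler and FPU settings):

    #include <stdio.h>

    int main(void)
    {
        volatile float  fsum = 0.0f;
        volatile double dsum = 0.0;
        int i;

        /* volatile forces each partial sum to be rounded to the
           declared type; the float accumulator loses a little on
           every addition */
        for (i = 0; i < 10000; i++) {
            fsum += 0.1f;
            dsum += 0.1;
        }
        printf("float sum:  %.15g\n", (double)fsum);
        printf("double sum: %.15g\n", dsum);
        return 0;
    }

The two totals disagree well before the 15th digit.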

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.


2004-03-26 21:45:30

by Andy Isaacson

Subject: Re: Somewhat OT: gcc, x86, -ffast-math, and Linux

linux-kernel isn't the right forum for this, but I'll take a stab
anyway.

On Fri, Mar 26, 2004 at 02:54:31PM -0600, Daniel Forrest wrote:
[snip: 180 dual Xeon boxes]
> I am running one of our applications that has been compiled using gcc
> with the -ffast-math option. I am finding that the identical program
> using the same input data files is producing different results on
> different machines. However, the differences are all less than the
> precision of a single-precision floating point number. By this I mean
> that if the results (which are written to 15 digits of precision) are
> only compared to 7 digits then the results are the same. Also, most
> of the time the 15 digit values are the same.
>
> My question is this: Why aren't the results always the same? What is
> the -ffast-math option doing? How are the excess bits of precision
> dealt with during context switches? Shouldn't the same binary with
> the same inputs produce the same output on identical hardware?

The kernel should be doing the right thing to preserve FPU state during
context switches. That doesn't prevent the app from doing things wrong
and thus getting the wrong answer (perhaps only under certain
circumstances). And of course the kernel might have bugs (though it's
unlikely to be as simple as "doesn't preserve FPU state correctly"; a
lot of people depend on that codepath being right.)

Likely there is some difference in one of the following areas:
- hardware problems
- kernel
- libraries
- CPU microcode

Or, you have a bug in your program which is triggered by some
environmental factor. (For example, an inter-thread race condition
affected by IO interrupts.)
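
As an aside, evaluation order alone is enough to change a
floating-point result. A minimal C sketch (mine, not from the
original application):

    #include <stdio.h>

    int main(void)
    {
        volatile double a = 1e16, b = -1e16, c = 1.0;
        volatile double t1 = a + b;  /* exactly 0 */
        volatile double t2 = b + c;  /* rounds back to -1e16 */

        /* FP addition is not associative, so any reordering, by
           -ffast-math or by threads combining partial results in
           whatever order they finish, can change the answer */
        printf("(a + b) + c = %g\n", t1 + c);  /* prints 1 */
        printf("a + (b + c) = %g\n", a + t2);  /* prints 0 */
        return 0;
    }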

To eliminate them:
- first, run memtest86 or a similar program to verify that you are not
simply the victim of a bad memory stick.
- next, check that the kernel, libc, and libm are identical across the
machines that display the problem.
- next, check /proc/cpuinfo and dmesg(1) output to verify that your
CPUs are the same stepping, and running the same microcode. (The
likelihood that this is the problem is so small as to be almost not
worth mentioning.)
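
One more thing worth comparing across the nodes is the x87 control
word; anything in a process that flips the precision-control bits
changes the results of all floating-point code that follows. A quick
C sketch (x86-specific, using GCC inline asm):

    #include <stdio.h>

    int main(void)
    {
        unsigned short cw;

        /* store the x87 FPU control word; bits 8-9 are the
           precision-control field: 0 = single, 2 = double,
           3 = extended */
        __asm__ __volatile__("fnstcw %0" : "=m"(cw));
        printf("x87 control word: 0x%04x (precision control: %d)\n",
               cw, (cw >> 8) & 3);
        return 0;
    }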

> I have run the same test with the program compiled without -ffast-math
> enabled and the results are always identical.

You don't say how many different results you've gotten. Is there just
one correct and one incorrect result? Or do different runs give
different incorrect results? What is the software environment?
(language, libraries, threading, etc.)

Basically, at this point you haven't provided us enough information to
be able to even point a finger at the kernel. It's certainly possible
that there's a bug, but it's pretty unlikely (IMO). I'd be looking at
hardware and at threading problems in the apps, first.

-andy

2004-03-27 14:25:07

by Jamie Lokier

Subject: Re: Somewhat OT: gcc, x86, -ffast-math, and Linux

Daniel Forrest wrote:
> What is the -ffast-math option doing?

It enables some optimisations and mathematical transformations which
do not satisfy the properties of IEEE floating point arithmetic.

(Not that GCC's output satisfies those properties without -ffast-math
on x86, but this flag enables much looser semantics).

> How are the excess bits of precision dealt with during context
> switches?

They are preserved - they are part of the floating point context.
If there is any failure to preserve all of that context, it's a kernel bug.

> Shouldn't the same binary with the same inputs produce the same
> output on identical hardware?

Is the hardware *identical*, or are they different x86 CPUs?

Different CPUs give different results for the trigonometric functions.
GCC's manual claims that, for GCC versions >= 2.6.1, the fsin, fcos
and fsqrt instructions are only used if the
-funsafe-math-optimizations flag is also given. However, you may find
that Glibc's <math.h> ends up using those instructions when
-ffast-math is used alone.
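
If you want to check whether your machines actually disagree on
those, printing results in hex float makes any difference
unambiguous. A sketch (%a needs a C99 printf, which Glibc provides):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* a large argument stresses range reduction, where the x87
           fsin instruction and the library code are most likely to
           differ; %a prints the exact bit pattern */
        double x = 1e6;
        printf("sin(%g) = %a\n", x, sin(x));
        return 0;
    }

Build it with and without -ffast-math (linking with -lm) and diff
the output across nodes.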

-- Jamie

2004-03-27 14:48:22

by Nick Warne

Subject: Re: Somewhat OT: gcc, x86, -ffast-math, and Linux

I had some very funny results building a Quake2 mod I am a coder on
(Quake2 dday).

Using -ffast-math made the HMG (for example) fire off aim by, say,
25°. Do a recompile with NO code change, and the HMG would be OK...
but then the pistol started firing all over the show. Do another
rebuild (NO code change) and then the rifle showed this... etc. etc.

Every new build produced a different result, although the code was
untouched.

In the end I had to build with the -ffast-math option left out to get
it to work correctly.

Maybe bad coding on my part, but what I didn't understand was why the
result was so random. Each weapon has its own code, so why one
routine worked and then after a rebuild didn't, and vice versa, I
don't know.

Nick
(Not subscribed to list).

--
"When you're chewing on life's gristle,
Don't grumble, Give a whistle..."

2004-03-27 15:14:07

by Jakub Jelinek

Subject: Re: Somewhat OT: gcc, x86, -ffast-math, and Linux

On Sat, Mar 27, 2004 at 02:24:59PM +0000, Jamie Lokier wrote:
> GCC's manual claims that fsin, fcos and fsqrt instructions are only
> used if the -funsafe-math-optimizations flag is also used, if the GCC
> version is >= 2.6.1. However you may find that Glibc's <math.h> ends
> up using those instructions when -ffast-math is used alone.

Well, -ffast-math sets -funsafe-math-optimizations (unless you do
-ffast-math -fno-unsafe-math-optimizations), so the difference is not
that big. The glibc math inlines will eventually be replaced by GCC
builtins, as soon as GCC is known to optimize at least as well as
glibc's math inlines, and then even that difference will cease to
exist.
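
(As an aside, if you need to know whether a translation unit is
being built this way, GCC defines the __FAST_MATH__ macro under
-ffast-math. A trivial sketch:)

    #include <stdio.h>

    int main(void)
    {
    #ifdef __FAST_MATH__
        printf("built with -ffast-math\n");
    #else
        printf("built without -ffast-math\n");
    #endif
        return 0;
    }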

Jakub

2004-03-29 08:47:44

by Eric W. Biederman

Subject: Re: Somewhat OT: gcc, x86, -ffast-math, and Linux

Daniel Forrest <[email protected]> writes:

> I've tried Googling for an answer on this, but have come up empty and
> I think it likely that someone here probably knows the answer...
>
> We are testing and breaking in 6 racks of compute nodes, each rack
> containing 30 1U boxes, each box containing 2 x 2.8GHz Xeon CPUs.
> Each rack contains identical hardware (single purchase) with the
> exception that one rack has double the memory per node. The 6 racks
> are located in six different labs across our campus. It is available
> to me only as a "black box" queueing system.

Testing and breaking in hardware with only black box remote access
sounds crippling. Hopefully you can work with someone to fix problems
as they occur.

> I am running one of our applications that has been compiled using gcc
> with the -ffast-math option. I am finding that the identical program
> using the same input data files is producing different results on
> different machines. However, the differences are all less than the
> precision of a single-precision floating point number. By this I mean
> that if the results (which are written to 15 digits of precision) are
> only compared to 7 digits then the results are the same. Also, most
> of the time the 15 digit values are the same.

How errors propagate depends on the specifics of the computation you
are doing.

> My question is this: Why aren't the results always the same?

Most likely memory errors. Do the machines have ECC memory? Is anything
reporting the ECC memory errors as they occur?

> What is the -ffast-math option doing? How are the excess bits of precision
> dealt with during context switches? Shouldn't the same binary with
> the same inputs produce the same output on identical hardware?

Yes, barring I/O-related variables.

> I have run the same test with the program compiled without -ffast-math
> enabled and the results are always identical.

This may simply be a case where you are not hitting the hardware as
hard. Or possibly compiler/optimizer bugs.

Or possibly you never ran your job on the faulty hardware?

> Any insight would be appreciated.

I don't have any except that universities are usually cheap and go
with the lowest bidder on hardware.

In general, tracking down this kind of problem comes down to applying
the scientific method: carefully looking at and controlling the
variables until a root cause is found.

Eric

2004-03-31 07:15:07

by J.A. Magallon

Subject: Re: Somewhat OT: gcc, x86, -ffast-math, and Linux


On 26 mar 2004, at 21:54, Daniel Forrest wrote:

> different machines. However, the differences are all less than the
> precision of a single-precision floating point number. By this I mean
> that if the results (which are written to 15 digits of precision) are
> only compared to 7 digits then the results are the same. Also, most
> of the time the 15 digit values are the same.
>

(sorry if this is stupid, but anyway...)

Don't blame -ffast-math, if I understood what you did...
With single-precision floats, you only have about 7 digits of
precision (scientific notation) [1].
If you ask printf() to write 15 decimal places, it just _lies_.
How it invents the rest of the digits is up to glibc. Just read the
glibc sources.

In short, anything past the 7th digit is crap, and can be different
depending on the box, cosmic rays, and a butterfly flapping its wings.

[1] cpp -dM /dev/null | grep EPSILON
cpp -dM /dev/null | grep FLT
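
For example, a tiny C check (the digits shown assume IEEE-754 single
precision):

    #include <stdio.h>

    int main(void)
    {
        float f = 0.1f;  /* nearest single-precision value to 0.1 */

        /* printed to 15 significant digits, everything past the
           7th digit reflects the binary representation, not your
           data; this prints 0.100000001490116 */
        printf("%.15g\n", (double)f);
        return 0;
    }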


--
J.A. Magallon <jamagallon()able!es> \ Software is like sex:
werewolf!able!es \ It's better when it's free
MacOS X 10.3.3, Build 7F44, Darwin Kernel Version 7.3.0