2004-06-16 20:32:16

by eliot

Subject: PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches

Hi,

I am the team lead and chief VM developer for a Smalltalk implementation based on a JIT execution engine. Our customers have been seeing rare incorrect floating-point results in intensive fp applications on 2.6 kernels using various x86 compatible processors. These problems do not occur on previous kernel versions. We recently had occasion to reimplement our fp primitives to avoid severe performance problems on Xeon processors that were traced to Xeon's relatively slow implementation of fnclex and fstsw. The older implementation would produce a result and test for a valid (non NaN, non Inf) result by examining the FPU status flags via fstsw. The newer implementation produces a result and tests its exponent for the NaN/Inf exponent. The new implementation does not show the rare incorrect floating-point results in intensive fp applications on 2.6 kernels. My conclusion is that context switches between the production of the result and the execution of the fstsw are the culprit, and that the context switch machinery fails to preserve the FPU status flags.
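[Editor's sketch] The "newer implementation" described above tests the exponent field of the result instead of reading the FPU status word. A minimal illustration of that idea in C follows. It assumes the result is an IEEE 754 double (11-bit exponent); the function name is hypothetical and this is not the VisualWorks code, which is generated by a JIT.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if x is NaN or +/-Inf, judged only from its exponent field.
 * Assumption: double is an IEEE 754 binary64 value. */
int is_nan_or_inf(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);            /* well-defined way to read the bits */
    return ((bits >> 52) & 0x7FFu) == 0x7FFu;  /* all 11 exponent bits set */
}
```

Because this check reads only the value itself, it does not depend on any sticky FPU state surviving between the operation and the test.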

I don't know whether any action on your part is appropriate. The use of the FPU status flags is presumably rare on linux (I believe that neither gcc nor glibc make use of them). But "exotic" execution machinery such as runtimes for dynamic or functional languages (language implementations that may not use IEEE arithmetic and instead flag Infs and NaNs as an error) may fall foul of this issue. Since previous versions of the kernel on x86 apparently do preserve the FPU status flags perhaps it's simple to preserve the old behaviour. At the very least let me suggest you document the limitation.

Sincerely,
---
Eliot Miranda ,,,^..^,,, mailto:[email protected]
VisualWorks Engineering, Cincom Smalltalk: scene not herd Tel +1 408 216 4581
3350 Scott Blvd, Bldg 36 Suite B, Santa Clara, CA 95054 USA Fax +1 408 216 4500



2004-06-16 21:06:09

by Richard B. Johnson

Subject: Re: PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches

On Wed, 16 Jun 2004 [email protected] wrote:

> Hi,
>
> I am the team lead and chief VM developer for a Smalltalk implementation based on a JIT execution engine. Our customers have been seeing rare incorrect floating-point results in intensive fp applications on 2.6 kernels using various x86 compatible processors. These problems do not occur on previous kernel versions. We recently had occasion to reimplement our fp primitives to avoid severe performance problems on Xeon processors that were traced to Xeon's relatively slow implementation of fnclex and fstsw. The older implementation would produce a result and test for a valid (non NaN, non Inf) result by examining the FPU status flags via fstsw. The newer implementation produces a result and tests its exponent for the NaN/Inf exponent. The new implementation does not show the rare incorrect floating-point results in intensive fp applications on 2.6 kernels. My conclusion is that context switches between the production of the result and the execution of the fstsw are the culprit, and that the context switch machinery fails to preserve the FPU status flags.
>
> I don't know whether any action on your part is appropriate. The use of the FPU status flags is presumably rare on linux (I believe that neither gcc nor glibc make use of them). But "exotic" execution machinery such as runtimes for dynamic or functional languages (language implementations that may not use IEEE arithmetic and instead flag Infs and NaNs as an error) may fall foul of this issue. Since previous versions of the kernel on x86 apparently do preserve the FPU status flags perhaps it's simple to preserve the old behaviour. At the very least let me suggest you document the limitation.
>
> Sincerely,
> ---
> Eliot Miranda ,,,^..^,,, mailto:[email protected]
> VisualWorks Engineering, Cincom Smalltalk: scene not herd Tel +1 408 216 4581
> 3350 Scott Blvd, Bldg 36 Suite B, Santa Clara, CA 95054 USA Fax +1 408 216 4500
>

All versions of the kernels preserve FPU state, including its flags
across context-switches. They use the FNSAVE/FRSTOR pair which saves
and restores the entire FPU environment including its flags. This
is very expensive, taking roughly 130 clocks for each operation.
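
[Editor's sketch] To make the "entire FPU environment including its flags" point concrete, the struct below mirrors the 108-byte image that FNSAVE writes in 32-bit protected-mode format, as documented in the Intel manuals. The struct and field names are hypothetical (they are not the kernel's own definitions); the point is only that the status word, which holds the exception flags and condition codes, is part of the saved image.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 32-bit protected-mode FNSAVE image. */
struct fnsave_image {
    uint32_t control_word;     /* FCW in the low 16 bits */
    uint32_t status_word;      /* FSW in the low 16 bits: flags, C0-C3, TOP */
    uint32_t tag_word;         /* FTW in the low 16 bits */
    uint32_t instr_offset;     /* last FPU instruction pointer */
    uint32_t instr_selector;   /* CS selector (low 16 bits), opcode bits above */
    uint32_t operand_offset;   /* last FPU operand pointer */
    uint32_t operand_selector; /* operand segment selector in the low 16 bits */
    uint8_t  st[8][10];        /* ST(0)..ST(7), 80-bit extended values */
};

_Static_assert(sizeof(struct fnsave_image) == 108, "FNSAVE writes 108 bytes");

/* Extract the 16-bit status word from a saved image. */
uint16_t saved_status_word(const struct fnsave_image *img)
{
    return (uint16_t)(img->status_word & 0xFFFFu);
}
```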

What they don't do is save/restore FPU state during system calls
because it is extremely wasteful of performance. So, if there is
any module loaded that (improperly) uses the FPU, the user's FPU
state can get trashed.

Also, I don't imagine you know what the [Enter] key does, do you?
If you hit this occasionally, somebody might be able to read your
text on a conventional terminal, rather than a Windows auto-warp
contraption.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.26 on an i686 machine (5570.56 BogoMips).
Note 96.31% of all statistics are fiction.


2004-06-16 21:40:26

by Andi Kleen

Subject: Re: PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches


[email protected] writes:

> I am the team lead and chief VM developer for a Smalltalk
> implementation based on a JIT execution engine. Our customers
> have been seeing rare incorrect floating-point results in
> intensive fp applications on 2.6 kernels using various x86
> compatible processors. These problems do not occur on
> previous kernel versions. We recently had occasion to
> reimplement our fp primitives to avoid severe performance
> problems on Xeon processors that were traced to Xeon's
> relatively slow implementation of fnclex and fstsw. The older

Funny, Linux just added fnclex to a critical path on popular request.
But I guess it will need to be removed again, we already discussed
that.


> I don't know whether any action on your part is appropriate. The
> use of the FPU status flags is presumably rare on linux (I believe
> that neither gcc nor glibc make use of them). But "exotic"
> execution machinery such as runtimes for dynamic or functional
> languages (language implementations that may not use IEEE arithmetic
> and instead flag Infs and NaNs as an error) may fall foul of this
> issue. Since previous versions of the kernel on x86 apparently do
> preserve the FPU status flags perhaps it's simple to preserve the old
> behaviour. At the very least let me suggest you document the
> limitation.

This sounds like a serious kernel bug that should be fixed if
true. Can you perhaps create a simple demo program that shows the
problem and post it?
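
[Editor's sketch] One possible shape for such a demo program is shown below. It is not the test the original poster later wrote. It assumes GCC or Clang on x86, where long double arithmetic is performed by the x87 unit; the function name and the use of sched_yield() to invite a context switch are illustrative choices. It sets the x87 divide-by-zero flag (ZE, bit 2 of the status word), yields the CPU, and then checks that the flag is still set.

```c
#include <sched.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X87 1
/* Read the x87 status word (fnstsw stores it into AX). */
static inline uint16_t x87_status(void)
{
    uint16_t sw;
    __asm__ volatile ("fnstsw %0" : "=a"(sw) : : "memory");
    return sw;
}
/* Clear the sticky x87 exception flags. */
static inline void x87_clear(void)
{
    __asm__ volatile ("fnclex" : : : "memory");
}
#else
#define HAVE_X87 0
#endif

/* Returns how many times the ZE flag was missing after yielding,
 * or -1 when not built for x86 with a GNU-compatible compiler. */
long count_lost_flags(long iterations)
{
#if HAVE_X87
    volatile long double one = 1.0L, zero = 0.0L, sink;
    long lost = 0;
    for (long i = 0; i < iterations; i++) {
        x87_clear();
        sink = one / zero;            /* x87 division: result Inf, sets ZE */
        sched_yield();                /* give the scheduler a chance to switch */
        if (!(x87_status() & 0x0004u))
            lost++;                   /* flag vanished between op and test */
    }
    (void)sink;
    return lost;
#else
    (void)iterations;
    return -1;
#endif
}
```

A non-zero return value would support the hypothesis in this thread; zero proves little unless another FPU-using process competes for the same CPU, and unless signals are delivered during the loop if the signal path is the suspect.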

On what CPUs does the failure occur? Linux uses different paths
depending on if the CPU supports SSE or not.

Does your program receive signals? Could it be related to them?

-Andi

2004-06-16 22:27:40

by eliot

Subject: Re: PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches

Hi Andi,


| Funny, Linux just added fnclex to a critical path on popular request.
| But I guess it will need to be removed again, we already discussed
| that.

Yes, this is a right royal pain. We have problems around fnclex. Because people can call arbitrary code from within Smalltalk we have to do an fnclex prior to an fp operation if we're to trap NaN/Inf results. But doing so prior to each fp operation is becoming increasingly slow on more "modern" x86 implementations. So we've now moved the fnclex to the return from an external language call as this tends to have lower dynamic frequency. So from where I sit (a mushroom-like position) the issue feels like a design flaw in the x87 fpu...

| > I don't know whether any action on your part is appropriate. The
| > use of the FPU status flags is presumably rare on linux (I believe
| > that neither gcc nor glibc make use of them). But "exotic"
| > execution machinery such as runtimes for dynamic or functional
| > languages (language implementations that may not use IEEE arithmetic
| > and instead flag Infs and NaNs as an error) may fall foul of this
| > issue. Since previous versions of the kernel on x86 apparently do
| > preserve the FPU status flags perhaps it's simple to preserve the old
| > behaviour. At the very least let me suggest you document the
| > limitation.

| This sounds like a serious kernel bug that should be fixed if
| true. Can you perhaps create a simple demo program that shows the
| problem and post it?

OK, I'm working on it. I have to get one of our customers to run the test because I don't have a 2.6 kernel handy. As I'm in release crunch mode right now there may be a couple of weeks' delay. But I should have a test program to you soon.

| On what CPUs does the failure occur? Linux uses different paths
| depending on if the CPU supports SSE or not.

This answer should be more prompt. Say tomorrow.

| Does your program receive signals? Could it be related to them?

Could be. Yes we do have to handle signals. But I'm pretty confident the issue is with the FPU flags because as far as fp goes the only significant change between the version that shows the problem and the one that doesn't is the use of the FPU flags (via fxam, fstsw). The version that uses fxam & fstsw doesn't show the problem on kernels prior to 2.6. In any case if I'm right the test program should show it pretty clearly.
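
[Editor's sketch] For readers unfamiliar with the fxam/fstsw pairing mentioned above: fxam classifies ST(0) into the condition-code bits C3, C2 and C0 of the status word (bits 14, 10 and 8), and fstsw copies that word out so it can be tested. The decoder below follows the encoding table in the Intel manuals. The enum and function names are hypothetical, and nothing here is taken from the VisualWorks source.

```c
#include <stdint.h>

enum fxam_class {
    FXAM_UNSUPPORTED, FXAM_NAN, FXAM_NORMAL, FXAM_INFINITY,
    FXAM_ZERO, FXAM_EMPTY, FXAM_DENORMAL
};

/* Decode the class that fxam left in a status word read by fstsw.
 * C1 (bit 9) holds the sign and is ignored here. */
enum fxam_class decode_fxam(uint16_t sw)
{
    unsigned c0 = (sw >> 8) & 1u;
    unsigned c2 = (sw >> 10) & 1u;
    unsigned c3 = (sw >> 14) & 1u;

    switch ((c3 << 2) | (c2 << 1) | c0) {
    case 1:  return FXAM_NAN;
    case 2:  return FXAM_NORMAL;
    case 3:  return FXAM_INFINITY;
    case 4:  return FXAM_ZERO;
    case 5:  return FXAM_EMPTY;
    case 6:  return FXAM_DENORMAL;
    default: return FXAM_UNSUPPORTED;  /* 0, or the undocumented value 7 */
    }
}

/* The validity test described in the thread: reject NaN and Inf results. */
int fxam_result_is_valid(uint16_t sw)
{
    enum fxam_class c = decode_fxam(sw);
    return c != FXAM_NAN && c != FXAM_INFINITY;
}
```

If the status word were altered between the fxam and the fstsw, a valid result could be decoded as NaN or Inf (or the reverse), which matches the kind of rare misclassification reported here.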

As I say, give me a couple of weeks or so.

Cheers,
---
Eliot Miranda ,,,^..^,,, mailto:[email protected]
VisualWorks Engineering, Cincom Smalltalk: scene not herd Tel +1 408 216 4581
3350 Scott Blvd, Bldg 36 Suite B, Santa Clara, CA 95054 USA Fax +1 408 216 4500


2004-06-16 23:02:17

by eliot

Subject: Re: PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches

Hi Andi,

you asked:
| On what CPUs does the failure occur? Linux uses different paths
| depending on if the CPU supports SSE or not.

Travis responded:

| We run on both AMDs (Durons and Athlons) as well as PII, PIII, and
| PIV's. Our kernels are all compiled as generic 586+. Though when we were
| testing for this, we did try the more generic 486+ option, as well as
| exact processor matches for the AMD at least. I don't remember it making
| a difference.

+-----------------------------
| Date: Wed, 16 Jun 2004 23:40:18 +0200
| From: Andi Kleen <[email protected]>
| Subject: Re: PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches



| [email protected] writes:

| > I am the team lead and chief VM developer for a Smalltalk
| > implementation based on a JIT execution engine. Our customers
| > have been seeing rare incorrect floating-point results in
| > intensive fp applications on 2.6 kernels using various x86
| > compatible processors. These problems do not occur on
| > previous kernel versions. We recently had occasion to
| > reimplement our fp primitives to avoid severe performance
| > problems on Xeon processors that were traced to Xeon's
| > relatively slow implementation of fnclex and fstsw. The older

| Funny, Linux just added fnclex to a critical path on popular request.
| But I guess it will need to be removed again, we already discussed
| that.


| > I don't know whether any action on your part is appropriate. The
| > use of the FPU status flags is presumably rare on linux (I believe
| > that neither gcc nor glibc make use of them). But "exotic"
| > execution machinery such as runtimes for dynamic or functional
| > languages (language implementations that may not use IEEE arithmetic
| > and instead flag Infs and NaNs as an error) may fall foul of this
| > issue. Since previous versions of the kernel on x86 apparently do
| > preserve the FPU status flags perhaps it's simple to preserve the old
| > behaviour. At the very least let me suggest you document the
| > limitation.

| This sounds like a serious kernel bug that should be fixed if
| true. Can you perhaps create a simple demo program that shows the
| problem and post it?

| On what CPUs does the failure occur? Linux uses different paths
| depending on if the CPU supports SSE or not.

| Does your program receive signals? Could it be related to them?

| -Andi
---
Eliot Miranda ,,,^..^,,, mailto:[email protected]
VisualWorks Engineering, Cincom Smalltalk: scene not herd Tel +1 408 216 4581
3350 Scott Blvd, Bldg 36 Suite B, Santa Clara, CA 95054 USA Fax +1 408 216 4500


2004-06-17 07:03:07

by Denis Vlasenko

Subject: Re: PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches

On Thursday 17 June 2004 00:03, Richard B. Johnson wrote:
> On Wed, 16 Jun 2004 [email protected] wrote:
> > Hi,
> >
> > I am the team lead and chief VM developer for a Smalltalk implementation
> > based on a JIT execution engine. Our customers have been seeing rare
> > incorrect floating-point results in intensive fp applications on 2.6
> > kernels using various x86 compatible processors. These problems do not
> > occur on previous kernel versions. We recently had occasion to
> > reimplement our fp primitives to avoid severe performance problems on
> > Xeon processors that were traced to Xeon's relatively slow implementation
> > of fnclex and fstsw. The older implementation would produce a result and
> > test for a valid (non NaN, non Inf) result by examining the FPU status
> > flags via fstsw. The newer implementation produces a result and tests
> > its exponent for the NaN/Inf exponent. The new implementation does not
> > show the rare incorrect floating-point results in intensive fp
> > applications on 2.6 kernels. My conclusion is that context switches

wow. Perfect place to install debug code.
Check _both_ exponent and flags, if they disagree, yell.
That may give some useful info about cause of this.
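
[Editor's sketch] The cross-check suggested above could look roughly like the following. It assumes the status word was captured with fstsw right after the operation, that the flags were cleared (fnclex) before it, that the operands were finite, and that the default rounding mode is in effect, so that a NaN or Inf result implies one of the invalid, divide-by-zero or overflow flags. The function name and the choice of those three bits are this sketch's assumptions, not details from the thread.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* x87 status-word exception bits (Intel manuals). */
#define X87_IE 0x0001u  /* invalid operation */
#define X87_ZE 0x0004u  /* divide by zero    */
#define X87_OE 0x0008u  /* overflow          */

/* Returns 1 when the exponent test and the status flags agree,
 * otherwise logs the disagreement to stderr and returns 0. */
int check_flags_vs_exponent(uint16_t status_word, double result)
{
    uint64_t bits;
    memcpy(&bits, &result, sizeof bits);

    int by_exponent = ((bits >> 52) & 0x7FFu) == 0x7FFu;
    int by_flags = (status_word & (X87_IE | X87_ZE | X87_OE)) != 0;

    if (by_exponent != by_flags) {
        fprintf(stderr, "FPU mismatch: status=0x%04x result bits=0x%016llx\n",
                (unsigned)status_word, (unsigned long long)bits);
        return 0;
    }
    return 1;
}
```

Logging the raw status word on each mismatch would also answer the later question of which status bit gets corrupted.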

BTW, I didn't pay attention, but some FPU-related patches
were floating around just recently. Check list archives.

> > between the production of the result and the execution of the fstsw are
> > the culprit, and that the context switch machinery fails to preserve the
> > FPU status flags.
> >
> > I don't know whether any action on your part is appropriate. The use of
> > the FPU status flags is presumably rare on linux (I believe that neither
> > gcc nor glibc make use of them). But "exotic" execution machinery such
> > as runtimes for dynamic or functional languages (language implementations
> > that may not use IEEE arithmetic and instead flag Infs and NaNs as an
> > error) may fall foul of this issue. Since previous versions of the
> > kernel on x86 apparently do preserve the FPU status flags perhaps it's
> > simple to preserve the old behaviour. At the very least let me suggest
> > you document the limitation.
--
vda

2004-06-17 10:35:34

by Andi Kleen

Subject: Re: PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches

On Wed, Jun 16, 2004 at 04:01:46PM -0700, [email protected] wrote:
> Hi Andi,
>
> you asked:
> | On what CPUs does the failure occur? Linux uses different paths
> | depending on if the CPU supports SSE or not.
>
> Travis responded:
>
> | We run on both AMDs (Durons and Athlons) as well as PII, PIII, and
> | PIV's. Our kernels are all compiled as generic 586+. Though when we were

And you saw it on all of them? (in particular both on PII and on PIV?)

I actually doubt the problem happens on a context switch - the kernel
just uses FXSAVE/FNSAVE for this and this is extremely hard to get wrong.
Either you have no FPU state saved at all or you have all, since the
CPU handles it completely in microcode.

Most likely candidate would be signal context saving. When a signal happens
and the process used floating point then the i386 kernel converts
the internal FXSAVE/FNSAVE image to another image derived from
iBCS on the signal stack (and then later back). If any problems
with subtle corruptions happen I would expect them in this process.

This would be more likely on SSE enabled CPUs though, on pre SSE
CPUs this code is much simpler.

Do you know which status bit gets corrupted exactly?

-Andi

2004-06-17 10:39:08

by Andi Kleen

Subject: Re: PROBLEM: 2.6 kernels on x86 do not preserve FPU flags across context switches

On Wed, Jun 16, 2004 at 03:26:55PM -0700, [email protected] wrote:
> Hi Andi,
>
>
> | Funny, Linux just added fnclex to a critical path on popular request.
> | But I guess it will need to be removed again, we already discussed
> | that.
>
> Yes, this is a right royal pain. We have problems around fnclex. Because people can call arbitrary code from within Smalltalk we have to do an fnclex prior to an fp operation if we're to trap NaN/Inf results. But doing so prior to each fp operation is becoming increasingly slow on more "modern" x86 implementations. So we've now moved the fnclex to the return from an external language call as this tends to have lower dynamic frequency. So from where I sit (a mushroom-like position) the issue feels like a design flaw in the x87 fpu...


You can use fwait and handle the exception in a signal handler if it occurs.
fwait is much faster.

>
> | > I don't know whether any action on your part is appropriate. The
> | > use of the FPU status flags is presumably rare on linux (I believe
> | > that neither gcc nor glibc make use of them). But "exotic"
> | > execution machinery such as runtimes for dynamic or functional
> | > languages (language implementations that may not use IEEE arithmetic
> | > and instead flag Infs and NaNs as an error) may fall foul of this
> | > issue. Since previous versions of the kernel on x86 apparently do
> | > preserve the FPU status flags perhaps it's simple to preserve the old
> | > behaviour. At the very least let me suggest you document the
> | > limitation.
>
> | This sounds like a serious kernel bug that should be fixed if
> | true. Can you perhaps create a simple demo program that shows the
> | problem and post it?
>
> OK, I'm working on it. I have to get one of our customers to run the test because I don't have a 2.6 kernel handy. As I'm in release crunch mode right now there may be a couple of weeks' delay. But I should have a test program to you soon.

Thanks.

>
> | On what CPUs does the failure occur? Linux uses different paths
> | depending on if the CPU supports SSE or not.
>
> This answer should be more prompt. Say tomorrow.
>
> | Does your program receive signals? Could it be related to them?
>
> Could be. Yes we do have to handle signals. But I'm pretty confident the issue is with the FPU flags because as far as fp goes the only significant change between the version that shows the problem and the one that doesn't is the use of the FPU flags (via fxam, fstsw). The version that uses fxam & fstsw doesn't show the problem on kernels prior to 2.6. In any case if I'm right the test program should show it pretty clearly.

I'm asking because signals save/restore FPU context.

-Andi