2003-03-10 19:49:31

by Chris Friesen

Subject: Re: [BK-2.5] Move "used FPU status" into new non-atomic thread_info->status field.

Linus Torvalds wrote:
> On Mon, 10 Mar 2003, David S. Miller wrote:
>
>>
>>At least on sparc{32,64}, we consider FPU state to be clobbered coming
>>into system calls, this eliminates a lot of hair wrt. FPU state
>>restoring in cases such as fork().
>>
>
> We could _probably_ do it on x86 too. The standard C calling convention on
> x86 says FPU register state is clobbered, if I remember correctly.
> However, some of the state is "long-term", like rounding modes, exception
> masking etc, and even if we didn't save the register state we would have
> to save that part.
>
> And once you save that part, you're better off saving the registers too,
> since it's all loaded and saved with the same fxsave/fxrstor instruction
> (ie we'd actually have to do _more_ work to save only part of the FP
> state).

Does this open the door for using FP in the kernel?

Chris


--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]


2003-03-10 20:01:05

by Linus Torvalds

Subject: Re: [BK-2.5] Move "used FPU status" into new non-atomic thread_info->status field.


On Mon, 10 Mar 2003, Chris Friesen wrote:
> >
> > And once you save that part, you're better off saving the registers too,
> > since it's all loaded and saved with the same fxsave/fxrstor instruction
> > (ie we'd actually have to do _more_ work to save only part of the FP
> > state).
>
> Does this open the door for using FP in the kernel?

Not any wider than it already is.

For a while now, x86-specific optimizations (and all such stuff is by
nature very much architecture-specific) have been able to do

	kernel_fpu_begin();
	...
	kernel_fpu_end();

and use the FP state in between. It generally sucks if the user-mode
process had touched FP state (we'll force it saved), but most of the time
that isn't true, and the only thing it does is to temporarily clear the
TS bit so that the FPU works again (and then sets it again in fpu_end,
although if this was a common thing we _could_ make that be a "work"
thing that is only done at return-to-user-mode).

Of course, clearing TS isn't exactly fast, so this really only works if
you have tons of stuff that you _really_ want to use the FPU for. And
since the FP cache is per-CPU, the whole region in question is
non-preemptible, so this can only be used for non-blocking stuff.

In other words: it's still very much a special case, and if the question
was "can I just use FP in the kernel" then the answer is still a
resounding NO, since other architectures may not support it AT ALL.

Linus
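
[Editorial sketch, not part of the original message.] The
kernel_fpu_begin()/kernel_fpu_end() bracket described above can be modelled
in plain userspace C to make the bookkeeping visible. Everything below is an
assumption made for illustration: struct cpu_model, the model_* functions and
their fields are hypothetical stand-ins, not kernel code, and the real
implementation differs in detail. The model only tracks the three things the
message mentions: the TS bit, a forced save when the user process had touched
FP state, and the non-preemptible region.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one CPU's FPU bookkeeping. */
struct cpu_model {
    bool ts_set;         /* models the TS bit: FPU use would trap while set */
    bool user_fpu_live;  /* the user-mode process has touched FP state */
    int  preempt_count;  /* > 0 means preemption is disabled */
    int  saves;          /* number of times user FP state was forced saved */
};

/* Stand-in for kernel_fpu_begin(): make the FPU usable by kernel code. */
static void model_kernel_fpu_begin(struct cpu_model *cpu)
{
    cpu->preempt_count++;            /* FP state is per-CPU: no preemption */
    if (cpu->user_fpu_live) {
        cpu->saves++;                /* slow path: force the user state saved */
        cpu->user_fpu_live = false;
    }
    cpu->ts_set = false;             /* FPU works again */
}

/* Stand-in for kernel_fpu_end(): set TS again and allow preemption. */
static void model_kernel_fpu_end(struct cpu_model *cpu)
{
    cpu->ts_set = true;
    cpu->preempt_count--;
}

/* Hypothetical FP-heavy helper: the work must not block inside the bracket. */
static double model_fp_work(struct cpu_model *cpu, const double *v, int n)
{
    double sum = 0.0;

    model_kernel_fpu_begin(cpu);
    assert(!cpu->ts_set && cpu->preempt_count > 0);
    for (int i = 0; i < n; i++)
        sum += v[i] * 0.5;           /* floating-point work goes here */
    model_kernel_fpu_end(cpu);

    return sum;
}
```

In this model a caller pays for one forced save only when user_fpu_live was
set on entry; otherwise the cost is just toggling TS around the work, which
is why the bracket is only worthwhile for a large amount of FP work.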