2004-06-12 22:08:16

by Andi Kleen

Subject: Re: timer + fpu stuff locks up computer

Sergey Vlasov writes:

> On Sat, Jun 12, 2004 at 07:14:22PM +0400, Sergey Vlasov wrote:
>> If the FPU state belongs to the userspace process, kernel_fpu_begin()
>> is safe even if some exceptions are pending. However, after
>> __clear_fpu() the FPU is "orphaned", and kernel_fpu_begin() does
>> nothing with it.
>>
>> Replacing fwait with fnclex instead of removing it completely should
>> avoid the fault later.
>
> Yes, it seems to be enough. Another case where it looks like FPU
> might be "orphaned" is exit(); however, it is handled as a normal task
> switch, __switch_to() calls __unlazy_fpu(), which clears pending
> exceptions.
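
As a rough illustration of the fwait/fnclex distinction in the quoted text, here is a toy C model of the x87 status word. It is not kernel code: model_fwait() and model_fnclex() are hypothetical helper names, and the model only tracks the architectural status word bits (exception flags in bits 0-5, SF in bit 6, ES in bit 7, B in bit 15).

```c
#include <stdbool.h>
#include <stdint.h>

/* x87 status word bits, as defined by the architecture. */
#define FSW_EXC_FLAGS 0x003Fu /* IE, DE, ZE, OE, UE, PE */
#define FSW_SF        0x0040u /* stack fault */
#define FSW_ES        0x0080u /* error summary: unmasked exception pending */
#define FSW_BUSY      0x8000u /* B flag */

/*
 * Hypothetical model of fwait: it checks for a pending unmasked
 * exception, so it faults (returns true here) whenever ES is set.
 */
static bool model_fwait(uint16_t fsw)
{
    return (fsw & FSW_ES) != 0;
}

/*
 * Hypothetical model of fnclex: it clears the exception flags, SF, ES
 * and B without checking for pending exceptions first, so it cannot
 * fault. TOP and the condition code bits are left alone in this model.
 */
static uint16_t model_fnclex(uint16_t fsw)
{
    return (uint16_t)(fsw & ~(FSW_EXC_FLAGS | FSW_SF | FSW_ES | FSW_BUSY));
}
```

In this model, a status word with a pending unmasked exception makes model_fwait() report a fault, while running model_fnclex() first leaves nothing for a later fwait to trip over.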

One problem on 486s/P5s would be the race that is described in D.2.1.3
of Volume 1 of the Intel architecture manual when the FPU is in MS-DOS
compatibility mode. When that happens we can still get the exception
later (e.g. on a following fwait, which the kernel can still execute).
The only way to handle that would be to check in the exception handler,
like my patch did. However, my patch was also not complete, since it
didn't handle it for all fwaits in the kernel.

Also BTW x86-64 must be fixed too.

-Andi


2004-06-13 13:07:08

by Sergey Vlasov

Subject: Re: timer + fpu stuff locks up computer

On Sun, Jun 13, 2004 at 12:08:10AM +0200, Andi Kleen wrote:
> One problem on 486s/P5s would be the race that is described in D.2.1.3
> of Volume 1 of the Intel architecture manual when the FPU is in MS-DOS
> compatibility mode. When that happens we can still get the exception later
> (e.g. on a following fwait which the kernel can still execute). The
> only way to handle that would be to check in the exception handler,
> like my patch did.

But in head.S we set the NE flag in CR0 for all 486 or better
processors, so the MSDOS compatibility mode is not used, and we don't
need to care about this race.
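
A minimal sketch of that distinction, assuming only the architectural definition of CR0.NE (bit 5). The enum and function names are made up for illustration and are not taken from head.S:

```c
#include <stdint.h>

/* CR0.NE (Numeric Error) is bit 5 of CR0. */
#define X86_CR0_NE (1u << 5)

/* Hypothetical names for the two x87 error reporting paths. */
enum fpu_error_path {
    FPU_ERR_IRQ13,        /* NE=0: MS-DOS compatibility mode, external IRQ13 */
    FPU_ERR_EXCEPTION_16  /* NE=1: native mode, exception 16 (#MF) */
};

/* Given a CR0 value, report which path an unmasked x87 error takes. */
static enum fpu_error_path fpu_error_path(uint32_t cr0)
{
    return (cr0 & X86_CR0_NE) ? FPU_ERR_EXCEPTION_16 : FPU_ERR_IRQ13;
}
```

With NE set, as described above, the error is reported as exception 16 and the asynchronous IRQ13 path is not used.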

> However my patch was also not complete, since it
> didn't handle it for all fwaits in the kernel.

Looked at your patch... I was also thinking about something similar.

You treat exception 16 and IRQ13 the same - is this really correct?
Asynchronous IRQ13 might break things. But this would be visible only
on a real 80386+80387 - does someone still have such hardware? ;)

