Date: Tue, 24 Feb 2015 02:14:10 +0000 (GMT)
From: "Maciej W. Rozycki" <macro@linux-mips.org>
To: Andy Lutomirski <luto@amacapital.net>
cc: Rik van Riel <riel@redhat.com>, Borislav Petkov <bp@alien8.de>,
        Ingo Molnar <mingo@kernel.org>, Oleg Nesterov <oleg@redhat.com>,
        X86 ML <x86@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC PATCH] x86, fpu: Use eagerfpu by default on all CPUs
In-Reply-To: <CALCETrVQM6ajmHWZxevWMz+WGF=Nv+N0rM8y6KBvoMqz+is0WQ@mail.gmail.com>
Message-ID: <alpine.LFD.2.11.1502240112070.17311@eddie.linux-mips.org>
References: <b0ba174ea882ed36cf7011e872baf427c23b7e09.1424458621.git.luto@amacapital.net> <20150221093150.GA27841@gmail.com> <20150221163840.GA32073@pd.tnic> <20150221172914.GB32073@pd.tnic> <alpine.LFD.2.11.1502212328210.11588@eddie.linux-mips.org>
 <CALCETrU=9Kvq82fBRfw9RLxzyj=LhnLzGV+vWtH+etpqypLatg@mail.gmail.com> <alpine.LFD.2.11.1502232014300.17311@eddie.linux-mips.org> <54EB99E8.2060500@redhat.com> <alpine.LFD.2.11.1502232152530.17311@eddie.linux-mips.org>
 <CALCETrVQM6ajmHWZxevWMz+WGF=Nv+N0rM8y6KBvoMqz+is0WQ@mail.gmail.com>

On Mon, 23 Feb 2015, Andy Lutomirski wrote:

> >> After a context switch, the instructions from the old task are no
> >> longer in the pipeline.
> >
> >  I'd say it's implementation-specific.  As I mentioned the i486 aborted
> > any transcendental x87 instruction in progress upon taking an exception or
> > interrupt.  That was a model like you refer to, but as I also mentioned it
> > had its shortcomings.
> 
> IRET is serializing, according to the docs (I think) and according
> to the Intel engineers I asked (I'm absolutely certain about this
> part).  So FPU ops are entirely done at the end of a normal context
> switch.

 No question about the serialising property of IRET, it has been like this 
since the original Pentium implementation.  Do you have an architecture 
specification reference to back up your claim though as far as the FPU is 
concerned?  I'm asking because I am genuinely curious.

 The x87 case is special in that there isn't really anything there that is 
externally observable or that should be affected by IRET or any other 
synchronisation barrier, apart from WAIT (or a waiting x87 instruction), 
which has been there for this purpose since forever.  Synchronising on 
IRET would also defeat some documented benefits of running the FP pipeline 
in parallel.

 And certainly such synchronisation didn't happen in the old days.

> We also always save the FPU context on every context switch away from
> a task that used the FPU, even in lazy mode.  This is because we might
> switch the task back in on a different CPU, and we don't want to use
> an IPI to move the FPU context.

 That's an interesting case too, although not necessarily related.  If you 
say that we always save the FP context eagerly for the purpose of process 
migration, then sure, that invalidates any benefit we'd have from letting 
the x87 proceed.

 However I can see different ways to address this case without the need 
for eager FP context saving or an IPI:

1. We could bind any currently suspended process with an unsaved FP 
   context to the CPU it last executed on.

2. We could mark such a process for migration next time and let it execute 
   on the CPU that holds its FP context once more, and then save the FP 
   context eagerly on the way out.

In some cases a lazily retained FP context would be preempted by another 
process before the process in question would resume anyway.  In this case 
any temporary binding to a CPU could be given up.
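
 As a rough illustration of the two options above, here is a toy model in 
C.  It is only a sketch and not kernel code: `struct task', its fields and 
all the function names are made up for this example, and a real scheduler 
would have to deal with locking, load balancing and much more.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_CPU (-1)

/* Hypothetical toy task; not the kernel's task_struct. */
struct task {
	int  fp_owner_cpu;	/* CPU still holding our unsaved FP state, or NO_CPU */
	bool migrate_pending;	/* option 2: migration deferred until FP state is saved */
	int  wanted_cpu;	/* option 2: CPU the scheduler wanted to move us to */
};

/* Option 1: a task with an unsaved FP context only runs on the CPU holding it. */
int pick_cpu_bound(const struct task *t, int preferred_cpu)
{
	return t->fp_owner_cpu != NO_CPU ? t->fp_owner_cpu : preferred_cpu;
}

/*
 * Option 2: run once more on the CPU holding the FP context and remember
 * that a migration has been requested.
 */
int pick_cpu_deferred(struct task *t, int preferred_cpu)
{
	if (t->fp_owner_cpu != NO_CPU && t->fp_owner_cpu != preferred_cpu) {
		t->migrate_pending = true;
		t->wanted_cpu = preferred_cpu;
		return t->fp_owner_cpu;
	}
	return preferred_cpu;
}

/*
 * Option 2, on the way out: returns true if the FP context has to be saved
 * eagerly now so that the task is free to migrate next time.
 */
bool switch_out_needs_eager_save(struct task *t)
{
	if (!t->migrate_pending)
		return false;
	assert(t->fp_owner_cpu != NO_CPU);
	t->migrate_pending = false;
	t->fp_owner_cpu = NO_CPU;	/* the state now lives in memory */
	return true;
}

/*
 * Another process took over the FPU on the owning CPU, which forced our
 * context out to memory; any temporary binding can be given up.
 */
void fp_context_preempted(struct task *t)
{
	t->fp_owner_cpu = NO_CPU;
	t->migrate_pending = false;
}
```

 In this model a task whose FP state is still held by CPU 2 keeps being 
placed on CPU 2 under option 1, whereas under option 2 it runs there once 
more, gets its context saved on the way out and can go anywhere afterwards.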

> Given that we're only talking about old CPUs here, I sincerely doubt
> that there's any relevant case in which an fxsave can usefully wait
> for a long-running transcendental op to finish while we continue doing
> useful work.  *Especially* since there will almost certainly be
> several more mfences or atomic ops before the end of the context
> switch, even if we're lucky enough to complete the context switching
> using sysret.

 I am not sure what you mean by FXSAVE usefully waiting for an op, please 
elaborate.  Once you've reached FXSAVE and an earlier x87 instruction 
hasn't completed, you've already lost: the pipeline will be stalled until 
the x87 instruction has completed, and that can take hundreds of cycles.  
My point therefore has been about not executing FXSAVE for the old task 
until absolutely necessary, which with lazy FP context switching would be 
at the next x87 (or SSE) instruction reached by the new task.
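
 The arithmetic behind this can be sketched with a toy cost model in C.  
It assumes that the x87 unit keeps working on the old task's instruction 
in parallel with the context switch, which is exactly the property being 
questioned in this thread, and it ignores the cost of the #NM trap itself.  
The function names and the cycle counts used with them are illustrative 
only, not measurements.

```c
/*
 * Eager switching: FXSAVE is issued at the context switch, so the pipeline
 * stalls for whatever is left of the x87 instruction still in flight.
 */
unsigned int eager_stall_cycles(unsigned int x87_cycles_left)
{
	return x87_cycles_left;
}

/*
 * Lazy switching: FXSAVE is put off until the new task reaches its first
 * x87 (or SSE) instruction.  Any cycles the new task has spent before that
 * point overlap with the old x87 instruction, so only the remainder, if
 * any, is a stall.  A new task that never touches the FPU never pays.
 */
unsigned int lazy_stall_cycles(unsigned int x87_cycles_left,
			       unsigned int cycles_before_new_fp_use)
{
	if (cycles_before_new_fp_use >= x87_cycles_left)
		return 0;
	return x87_cycles_left - cycles_before_new_fp_use;
}
```

 With, say, 300 cycles left of a transcendental instruction, the eager 
model stalls for all 300 of them, while the lazy model stalls for 200 if 
the new task issues its first FP instruction after 100 cycles and not at 
all if it does so after 1000.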

 Likewise I don't see why MFENCE or an atomic operation should affect the 
execution of, say, FSINCOS.  Whether the results of FSINCOS arrive before 
or after MFENCE, etc. is not externally observable.

 And I'm not sure that this all affects old CPUs only -- I don't know how 
much x87 software is out there, but after all these years I'd expect quite 
a lot.  Sure, much of it can be recompiled to use SSE instead, but not 
all, and even where it is feasible, that's an extra burden for people, 
beyond say a routine hardware, Linux distribution or, for that matter, 
lone kernel upgrade.  Therefore I think we need to be careful not to 
pessimise things too much for a subset of people, and ideally not at all.

 And to be clear, I am not against removing lazy FP context switching per 
se.  I am just emphasizing that we need to be careful with that and be 
absolutely sure that it does not cause excessive harm.

 I still wonder why Intel hasn't addressed some issues around this stuff 
-- is it that there are not enough people using proper IEEE 754 arithmetic 
on x86 hardware to attract the interest of hardware architecture 
maintainers?  After all the same issue applies to enabled IEEE 754 
exceptions; a #MF/#XM exception isn't going to take any less time than a 
#NM fault.  Or maybe I'm just missing something?

  Maciej