From: Andy Lutomirski
Date: Mon, 23 Feb 2015 18:31:57 -0800
Subject: Re: [RFC PATCH] x86, fpu: Use eagerfpu by default on all CPUs
To: "Maciej W. Rozycki"
Cc: Rik van Riel, Borislav Petkov, Ingo Molnar, Oleg Nesterov, X86 ML, linux-kernel@vger.kernel.org, Linus Torvalds

On Mon, Feb 23, 2015 at 6:14 PM, Maciej W. Rozycki wrote:
> On Mon, 23 Feb 2015, Andy Lutomirski wrote:
>
>> >> After a context switch, the instructions from the old task are no
>> >> longer in the pipeline.
>> >
>> > I'd say it's implementation-specific. As I mentioned, the i486 aborted
>> > any transcendental x87 instruction in progress upon taking an exception
>> > or interrupt. That was a model like the one you refer to, but as I also
>> > mentioned, it had its shortcomings.
>>
>> IRET is serializing, according to the docs (I think) and according
>> to the Intel engineers I asked (I'm absolutely certain about this
>> part). So FPU ops are entirely done at the end of a normal context
>> switch.
>
> No question about the serialising property of IRET; it has been like this
> since the original Pentium implementation. Do you have an architecture
> specification reference to back up your claim as far as the FPU is
> concerned, though? I'm asking because I am genuinely curious.
>
> The x87 case is so special: there isn't really anything there that is
> externally observable or that should be affected by IRET or any other
> synchronisation barrier, apart from WAIT (or a waiting x87 instruction),
> which has been there for this purpose since forever. And it would defeat
> some documented benefits of running the FP pipeline in parallel.

It's plausible that this is special, but I doubt it, especially since
this optimization would be nuts post-SSE2.

>
> And certainly such synchronisation didn't happen in the old days.
>
>> We also always save the FPU context on every context switch away from
>> a task that used the FPU, even in lazy mode. This is because we might
>> switch the task back in on a different CPU, and we don't want to use
>> an IPI to move the FPU context.
>
> That's an interesting case too, although not necessarily related. If you
> say that we always save the FP context eagerly for the purpose of process
> migration, then sure, that invalidates any benefit we'd have from letting
> the x87 proceed.
>
> However, I can see different ways to address this case that avoid the
> need for eager FP context saving or an IPI:
>
> 1. We could bind any currently suspended process with an unsaved FP
> context to the CPU it last executed on.

This would be insane.

>
> 2. We could mark such a process for migration next time and let it execute
> on the CPU that holds its FP context once more, and then save the FP
> context eagerly on the way out.

This would be worse than insane. Now, in order to wake such a process
on a different CPU, we'd have to force a *context switch* on the
source CPU. Now we're replacing a few hundred cycles at worst for a
transcendental function with at least 10k cycles (at a guess), and
possibly many orders of magnitude more if locks are held, plus
priority issues, plus total craziness.

>
> In some cases a lazily retained FP context would be preempted by another
> process before the process in question would resume anyway.
> In this case any temporary binding to a CPU could be given up.
>
>> Given that we're only talking about old CPUs here, I sincerely doubt
>> that there's any relevant case in which an fxsave can usefully wait
>> for a long-running transcendental op to finish while we continue doing
>> useful work. *Especially* since there will almost certainly be
>> several more mfences or atomic ops before the end of the context
>> switch, even if we're lucky enough to complete the context switching
>> using sysret.
>
> I am not sure what you mean by FXSAVE usefully waiting for an op; please
> elaborate. At the point you've reached FXSAVE and an earlier x87
> instruction hasn't completed, you've already lost. The pipeline will be
> stalled until the x87 instruction has completed, and that can take
> hundreds of cycles. My point therefore has been about avoiding executing
> FXSAVE for the old task until absolutely necessary, which with lazy FP
> context switching would be at the next x87 (or SSE) instruction reached
> by the new task.
>
> Likewise I don't see why MFENCE or an atomic operation should affect the
> execution of, say, FSINCOS. Whether the results of FSINCOS arrive before
> or after MFENCE, etc. is not externally observable.

FSINCOS; FXSAVE; MFENCE had better serialize all the way, no matter
what weird architectural crud is going on.

>
> And I'm not sure if this all affects old CPUs only -- I don't know how
> much x87 software is out there, but after all these years I'd expect quite
> some. Sure, lots of this can be recompiled to use SSE instead, but not
> all, and even where it is feasible, that's an extra burden for people,
> beyond, say, a routine hardware or Linux distribution upgrade, or for that
> matter a lone kernel upgrade. Therefore I think we need to be careful not
> to pessimise things for a subset of people too much, and ideally not at
> all.
>
> And to be clear, I am not against removing lazy FP context switching per
> se.
> I am just emphasizing that we should be careful with it and be absolutely
> sure that it does not cause excessive harm.

We're talking about the unusual case in which we context switch within
~100 cycles of a legacy transcendental operation, and, even so,
there's *still* no regression, since we don't optimize this case
today.

--Andy
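For readers following the migration argument, here is a minimal Python model of the policy described in this thread. It is purely illustrative: the Task and Cpu classes and their methods are hypothetical names, not kernel code, and the x87 pipeline and its timing are not modeled at all. In lazy mode the outgoing task's FPU state is still saved on every switch away, so the task can resume on another CPU without an IPI; only the restore is deferred until the task's first FPU instruction. That deferral point is the #NM trap, stood in for here by a boolean flag playing the role of CR0.TS.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical model for illustration only; none of these names are kernel APIs.


@dataclass
class Task:
    name: str
    used_fpu: bool = False
    # In-memory image of the task's FPU registers (FXSAVE area stand-in).
    saved_regs: list = field(default_factory=lambda: [0.0] * 8)


@dataclass
class Cpu:
    # The "hardware" FPU registers of this CPU.
    live_regs: list = field(default_factory=lambda: [0.0] * 8)
    owner: Optional[Task] = None  # task whose state is loaded right now
    trap_armed: bool = True       # stands in for CR0.TS: next FPU use traps (#NM)
    saves: int = 0
    restores: int = 0

    def switch_out(self, prev: Task) -> None:
        """Save prev's FPU state now, even in lazy mode, so prev can later
        resume on any CPU without asking this CPU for its registers (no IPI)."""
        if prev.used_fpu and self.owner is prev:
            prev.saved_regs = list(self.live_regs)  # FXSAVE stand-in
            self.saves += 1
        self.owner = None

    def switch_in(self, nxt: Task, eager: bool) -> None:
        if eager and nxt.used_fpu:
            self.live_regs = list(nxt.saved_regs)   # FXRSTOR done up front
            self.owner = nxt
            self.restores += 1
            self.trap_armed = False
        else:
            self.trap_armed = True                  # lazy: defer until first FPU use

    def fpu_use(self, cur: Task) -> None:
        """Called when cur executes an FPU instruction."""
        if self.trap_armed:                         # deferred work, as in the #NM handler
            if cur.used_fpu:
                self.live_regs = list(cur.saved_regs)
                self.restores += 1
            else:
                self.live_regs = [0.0] * 8          # FNINIT stand-in for a first-time user
            self.owner = cur
            self.trap_armed = False
        cur.used_fpu = True


cpu0, cpu1 = Cpu(), Cpu()
a = Task("A")

cpu0.switch_in(a, eager=False)
cpu0.fpu_use(a)
cpu0.live_regs[0] = 0.5          # pretend an x87 op left a result in st(0)
cpu0.switch_out(a)               # saved immediately, although the mode is lazy

cpu1.switch_in(a, eager=False)   # A migrates; CPU0 is never consulted
cpu1.fpu_use(a)                  # lazy restore from memory on first FPU use
print(cpu1.live_regs[0], cpu0.saves, cpu1.restores)  # 0.5 1 1
```

Because the save on CPU0 always happens at switch-out, the migrated task's state is already in memory when CPU1 needs it. The only thing lazy mode defers is the restore, which is why keeping an unsaved x87 context on the old CPU would require the binding or forced-switch schemes discussed above.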