From: Andy Lutomirski
Date: Wed, 5 Oct 2016 08:59:33 -0700
Subject: Re: [PATCH 2/9] x86/fpu: Hard-disable lazy fpu mode
To: Paolo Bonzini
Cc: Rik van Riel, "linux-kernel@vger.kernel.org", Dave Hansen, X86 ML, Thomas Gleixner, Ingo Molnar, Andrew Lutomirski, pa@zytor.com, Borislav Petkov

On Wed, Oct 5, 2016 at 7:03 AM, Paolo Bonzini wrote:
>
> On 05/10/2016 15:57, Rik van Riel wrote:
>> On Wed, 2016-10-05 at 09:14 +0200, Paolo Bonzini wrote:
>>>
>>> On 05/10/2016 02:34, riel@redhat.com wrote:
>>>>
>>>> From: Andy Lutomirski
>>>>
>>>> Since commit 58122bf1d856 ("x86/fpu: Default eagerfpu=on on all
>>>> CPUs") in Linux 4.6, eager FPU mode has been the default on all x86
>>>> systems, and no one has reported any regressions.
>>>>
>>>> This patch removes the ability to enable lazy mode: use_eager_fpu()
>>>> becomes "return true" and all of the FPU mode selection machinery
>>>> is removed.
>>>
>>> I haven't quite followed up on my promise to benchmark lazy vs. eager
>>> FPU, but I probably should do that now...
>>>
>>> I see two possible issues with this. First, AMD as far as I know does
>>> not have XSAVEOPT. Second, when using virtualization, depending on
>>> how you configure your cluster it's enough to have one pre-SandyBridge
>>> Intel machine to force no XSAVE on all machines.
>>
>> The "OPT" part of XSAVEOPT does not work across the
>> host/guest boundary, anyway.
>
> Yes, but it works for bare metal (and in fact eager FPU was keyed on
> XSAVEOPT before 58122bf1d856, not XSAVE).
>
> I'm not talking about KVM here; I am just saying that the lazy FPU code
> might be used more than we'd like to, because of AMD machines and of
> cases where XSAVE is hidden altogether from guests. Of course it is
> quite unlikely that it be reported as a regression, since things just
> work. But as far as I know 58122bf1d856 went in without any substantial
> (or not-so-substantial) benchmarking.

I actually benchmarked the underlying instructions quite a bit on
Intel. (Not on AMD, but I doubt the results are very different.)
Writes to CR0.TS are *incredibly* slow, as are device-not-available
exceptions. Keep in mind that, while there's a (slow) CLTS instruction,
there is no corresponding STTS instruction, so we're left with a fully
serializing, slowly microcoded move to CR0. On SVM, I think it's worse,
because IIRC SVM doesn't have fancy execution controls that let MOV to
CR0 avoid exiting. We're talking a couple hundred cycles best case for
a TS set/clear pair, and thousands of cycles if we actually take a
fault.

In contrast, an unconditional XSAVE + XRSTOR was considerably faster.
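(For concreteness, a minimal sketch of what the TS set/clear pair boils
down to. This is illustrative only: it has to run at CPL 0, and the
helper names here are mine, not necessarily the kernel's exact ones.)

/* Sketch only: must run in kernel context (CPL 0). */
static inline unsigned long sketch_read_cr0(void)
{
        unsigned long val;

        asm volatile("mov %%cr0, %0" : "=r" (val));
        return val;
}

static inline void sketch_write_cr0(unsigned long val)
{
        /* MOV to CR0 is fully serializing and microcoded -- this is
         * where the "couple hundred cycles" go. */
        asm volatile("mov %0, %%cr0" : : "r" (val));
}

#define SKETCH_CR0_TS   (1UL << 3)      /* CR0.TS is bit 3 */

static inline void sketch_stts(void)
{
        /* There is no STTS instruction, so setting TS means a full
         * read-modify-write of CR0. */
        sketch_write_cr0(sketch_read_cr0() | SKETCH_CR0_TS);
}

static inline void sketch_clts(void)
{
        /* CLTS clears TS directly, but is itself slow. */
        asm volatile("clts");
}

A lazy switch leaves TS set and relies on the #NM (device-not-available)
fault on the next FPU instruction to trigger the restore; that fault
path is the "thousands of cycles" case above.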
This leads to the counterintuitive result that, if we switch from task
A to B and back and task A is heavily using the FPU, then it's faster
to unconditionally save and restore the full state both ways than it
is to set and clear TS so we can avoid it.

I would guess that the lazy mode hasn't been a win under most workloads
for many years. It's worse on 64-bit CPUs, since almost all userspace
uses XMM regs for memcpy. At least on 32-bit CPUs, SIMD instructions
weren't always available and userspace was conservative.

--Andy
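(For anyone who wants a feel for the XSAVE + XRSTOR side of this, a
rough userspace timing sketch follows. It is mine, not the benchmark
described above; it assumes an x86-64 CPU and kernel with XSAVE
enabled, only times the x87/SSE components, and cannot show the
CR0.TS / #NM costs, which need ring 0.)

/* Build: gcc -O2 xsave_timing.c
 * Will die with SIGILL on a CPU/kernel without XSAVE enabled. */
#include <stdio.h>

static unsigned char xarea[4096] __attribute__((aligned(64)));

static inline unsigned long long rdtsc(void)
{
        unsigned int lo, hi;

        asm volatile("lfence\n\trdtsc" : "=a" (lo), "=d" (hi));
        return ((unsigned long long)hi << 32) | lo;
}

static inline void do_xsave(void *buf, unsigned int lo, unsigned int hi)
{
        asm volatile("xsave (%0)"
                     : : "r" (buf), "a" (lo), "d" (hi) : "memory");
}

static inline void do_xrstor(void *buf, unsigned int lo, unsigned int hi)
{
        asm volatile("xrstor (%0)"
                     : : "r" (buf), "a" (lo), "d" (hi) : "memory");
}

int main(void)
{
        /* Requested-feature bitmap: x87 + SSE only, which any
         * XSAVE-capable CPU/OS combination supports. */
        const unsigned int mask_lo = 0x3, mask_hi = 0;
        const long iters = 1000000;
        unsigned long long start, end;
        long i;

        do_xsave(xarea, mask_lo, mask_hi);      /* write a valid header */

        start = rdtsc();
        for (i = 0; i < iters; i++) {
                do_xsave(xarea, mask_lo, mask_hi);
                do_xrstor(xarea, mask_lo, mask_hi);
        }
        end = rdtsc();

        printf("~%llu cycles per XSAVE+XRSTOR pair\n",
               (end - start) / iters);
        return 0;
}

Run it pinned to one CPU (e.g. taskset -c 0) so the TSC readings stay
stable. Timing the lazy path needs a kernel module, since CR0 writes
and #NM handling are privileged.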