Date: Fri, 15 Jan 2021 01:05:57 +0100
From: Krzysztof Mazur
To: Andy Lutomirski
Cc: "Jason A. Donenfeld", Borislav Petkov, Thomas Gleixner, Ingo Molnar,
 X86 ML, LKML, stable
Donenfeld" , Borislav Petkov , Thomas Gleixner , Ingo Molnar , X86 ML , LKML , stable Subject: Re: [PATCH] x86/lib: don't use MMX before FPU initialization Message-ID: <20210115000557.GA12468@shrek.podlesie.net> References: <20201228160631.32732-1-krzysiek@podlesie.net> <20210112000923.GK25645@zn.tnic> <20210114092218.GA26786@shrek.podlesie.net> <20210114094425.GA12284@zn.tnic> <20210114123657.GA6358@shrek.podlesie.net> <20210114140737.GD12284@zn.tnic> <20210114145105.GA17363@shrek.podlesie.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.6.2 (2016-07-01) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 14, 2021 at 08:31:30AM -0800, Andy Lutomirski wrote: > This is gross. I realize this is only used for old CPUs that we don't > care about perf-wise Performance might be still important for embedded systems (Geode LX seems to be supported "until at least 2021"). > , but this code is nonsense -- it makes absolutely > to sense to put this absurd condition here to work around > kernel_fpu_begin() bugs. Yes, it's nonsense. But I think that the kernel might have a silent assumption, that the FPU cannot be used too early. > If we want to use MMX, we should check MMX. > And we should also check the correct condition re: irqs. So this code > should be: > > if (boot_cpu_has(X86_FEATURE_XMM) && irq_fpu_usable()) { > kernel_fpu_begin_mask(FPU_MASK_XMM); > ... This code does not use SSE (XMM). It uses only MMX and 3DNow!, so XMM check is not needed here. And this code is enabled only when a processor with 3DNow! is selected ("lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o" in Makefile). So: if (!irq_fpu_usable()) fallback_to_non_mmx_code(); is sufficient. But the original: if (!in_interrupt()) fallback_to_non_mmx_code(); is also valid. Changing it to irq_fpu_usable() might allow using MMX variant in more cases and even improve performance. But it can also introduce some regressions. > or we could improve code gen by adding a try_kernel_fpu_begin_mask() > so we can do a single call instead of two. > [...] > What do you all think? If you're generally in favor, I can write the > code and do a quick audit for other weird cases in the kernel. I think that the cleanest approach would be introducing fpu_usable(), which returns 0 (or false) if FPU is not initialized yet. Then irq_fpu_usable() could be changed to: bool irq_fpu_usable(void) { return fpu_usable() && (!in_interrupt() || interrupted_user_mode() || interrupted_kernel_fpu_idle()); } and in the _mmx_memcpy(): - if (unlikely(in_interrupt())) + if (unlikely(irq_fpu_usable())) return __memcpy(to, from, len); try_kernel_fpu_begin(), even without mask, is also a good idea. If the FPU is not available yet (FPU not initizalized yet) it can fail and a fallback code would be used. However, in some cases fpu_usable() + kernel_fpu_begin() might be better, if between fpu_usable() check and kernel_fpu_begin() long preparation is required. (kernel_fpu_begin() disables preemption). 
In the mmx_32.c case try_kernel_fpu_begin() looks good; only two simple
lines are added to the section that runs with preemption disabled:

diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..9c6dadd2b2ef 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -31,14 +31,12 @@ void *_mmx_memcpy(void *to, const void *from, size_t len)
 	void *p;
 	int i;
 
-	if (unlikely(in_interrupt()))
+	if (unlikely(try_kernel_fpu_begin()))
 		return __memcpy(to, from, len);
 
 	p = to;
 	i = len >> 6; /* len/64 */
 
-	kernel_fpu_begin();
-
 	__asm__ __volatile__ (
 		"1: prefetch (%0)\n"		/* This set is 28 bytes */
 		"   prefetch 64(%0)\n"

And if try_kernel_fpu_begin_mask() with a mask allows for some
optimizations, it might also be a good idea.

> Or we could flip the OSFXSR bit very early, I suppose.

I think that my original static_branch_likely() workaround might be the
simplest and touches only mmx_32.c.  Such an approach is already used in
the kernel, for instance in the crypto code:

$ git grep static_branch arch/x86/crypto/

And maybe using the FPU too early should be forbidden.

Best regards,
Krzysiek
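PS: for reference, a rough sketch of the static_branch_likely() idea
(the key name and the enable hook below are only illustrative, not taken
from the actual patch): the MMX path stays behind a static key that is
enabled once the FPU has been initialized, so a too-early caller simply
falls back to __memcpy().

static DEFINE_STATIC_KEY_FALSE(mmx_fpu_ready);

/* To be called once the FPU is fully initialized. */
void mmx_enable_fpu_memcpy(void)
{
	static_branch_enable(&mmx_fpu_ready);
}

void *_mmx_memcpy(void *to, const void *from, size_t len)
{
	/* Before FPU init (or in interrupt context) use the plain copy. */
	if (!static_branch_likely(&mmx_fpu_ready) || unlikely(in_interrupt()))
		return __memcpy(to, from, len);

	kernel_fpu_begin();
	/* ... MMX/3DNow! copy loop, unchanged ... */
	kernel_fpu_end();
	return to;
}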