Date: Fri, 15 Jan 2021 01:05:57 +0100
From: Krzysztof Mazur
To: Andy Lutomirski
Cc: "Jason A. Donenfeld", Borislav Petkov, Thomas Gleixner, Ingo Molnar,
 X86 ML, LKML, stable
Donenfeld" , Borislav Petkov , Thomas Gleixner , Ingo Molnar , X86 ML , LKML , stable Subject: Re: [PATCH] x86/lib: don't use MMX before FPU initialization Message-ID: <20210115000557.GA12468@shrek.podlesie.net> References: <20201228160631.32732-1-krzysiek@podlesie.net> <20210112000923.GK25645@zn.tnic> <20210114092218.GA26786@shrek.podlesie.net> <20210114094425.GA12284@zn.tnic> <20210114123657.GA6358@shrek.podlesie.net> <20210114140737.GD12284@zn.tnic> <20210114145105.GA17363@shrek.podlesie.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.6.2 (2016-07-01) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 14, 2021 at 08:31:30AM -0800, Andy Lutomirski wrote: > This is gross. I realize this is only used for old CPUs that we don't > care about perf-wise Performance might be still important for embedded systems (Geode LX seems to be supported "until at least 2021"). > , but this code is nonsense -- it makes absolutely > to sense to put this absurd condition here to work around > kernel_fpu_begin() bugs. Yes, it's nonsense. But I think that the kernel might have a silent assumption, that the FPU cannot be used too early. > If we want to use MMX, we should check MMX. > And we should also check the correct condition re: irqs. So this code > should be: > > if (boot_cpu_has(X86_FEATURE_XMM) && irq_fpu_usable()) { > kernel_fpu_begin_mask(FPU_MASK_XMM); > ... This code does not use SSE (XMM). It uses only MMX and 3DNow!, so XMM check is not needed here. And this code is enabled only when a processor with 3DNow! is selected ("lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o" in Makefile). So: if (!irq_fpu_usable()) fallback_to_non_mmx_code(); is sufficient. But the original: if (!in_interrupt()) fallback_to_non_mmx_code(); is also valid. Changing it to irq_fpu_usable() might allow using MMX variant in more cases and even improve performance. But it can also introduce some regressions. > or we could improve code gen by adding a try_kernel_fpu_begin_mask() > so we can do a single call instead of two. > [...] > What do you all think? If you're generally in favor, I can write the > code and do a quick audit for other weird cases in the kernel. I think that the cleanest approach would be introducing fpu_usable(), which returns 0 (or false) if FPU is not initialized yet. Then irq_fpu_usable() could be changed to: bool irq_fpu_usable(void) { return fpu_usable() && (!in_interrupt() || interrupted_user_mode() || interrupted_kernel_fpu_idle()); } and in the _mmx_memcpy(): - if (unlikely(in_interrupt())) + if (unlikely(irq_fpu_usable())) return __memcpy(to, from, len); try_kernel_fpu_begin(), even without mask, is also a good idea. If the FPU is not available yet (FPU not initizalized yet) it can fail and a fallback code would be used. However, in some cases fpu_usable() + kernel_fpu_begin() might be better, if between fpu_usable() check and kernel_fpu_begin() long preparation is required. (kernel_fpu_begin() disables preemption). 
In the mmx_32.c case try_kernel_fpu_begin() looks good; only two simple
lines are added to the section that runs with preemption disabled:

diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..9c6dadd2b2ef 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -31,14 +31,12 @@ void *_mmx_memcpy(void *to, const void *from, size_t len)
 	void *p;
 	int i;
 
-	if (unlikely(in_interrupt()))
+	if (unlikely(try_kernel_fpu_begin()))
 		return __memcpy(to, from, len);
 
 	p = to;
 	i = len >> 6; /* len/64 */
 
-	kernel_fpu_begin();
-
 	__asm__ __volatile__ (
 		"1: prefetch (%0)\n"		/* This set is 28 bytes */
 		"   prefetch 64(%0)\n"

And if try_kernel_fpu_begin_mask() with a mask allows for some
optimizations, it might also be a good idea.

> Or we could flip the OSFXSR bit very early, I suppose.

I think that my original static_branch_likely() workaround might be the
simplest and touches only mmx_32.c.  Such an approach is already used in
the kernel, for instance in the crypto code:

$ git grep static_branch arch/x86/crypto/

And maybe using the FPU too early should be forbidden.

Best regards,
Krzysiek
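PS: for reference, a rough sketch of the static_branch_likely() idea
(the key name and the enable hook below are only illustrative, not taken
from the actual patch): the MMX path stays behind a static key that is
enabled once the FPU has been initialized, so a too-early caller simply
falls back to __memcpy().

static DEFINE_STATIC_KEY_FALSE(mmx_fpu_ready);

/* To be called once the FPU is fully initialized. */
void mmx_enable_fpu_memcpy(void)
{
	static_branch_enable(&mmx_fpu_ready);
}

void *_mmx_memcpy(void *to, const void *from, size_t len)
{
	/* Before FPU init (or in interrupt context) use the plain copy. */
	if (!static_branch_likely(&mmx_fpu_ready) || unlikely(in_interrupt()))
		return __memcpy(to, from, len);

	kernel_fpu_begin();
	/* ... MMX/3DNow! copy loop, unchanged ... */
	kernel_fpu_end();
	return to;
}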