When enabled, the MMX 3DNow! optimized memcpy() is used very early,
even before FPU is initialized. It worked fine, but commit
7ad816762f9bf89e940e618ea40c43138b479e10 ("x86/fpu: Reset MXCSR
to default in kernel_fpu_begin()") broke that. After that commit
the kernel crashes just after "Booting the kernel." message.
It affects all kernels with CONFIG_X86_USE_3DNOW=y (enabled when
some AMD/Cyrix processors are selected). So, enable MMX 3DNow!
optimized memcpy() later.
Cc: <[email protected]> # 5.8+
Signed-off-by: Krzysztof Mazur <[email protected]>
---
Hi,
this patch fixes a kernel crash during boot observed on AMD Athlon XP
(with CONFIG_MK7=y).
The static key adds 5 byte NOP in the "fast" path. The 3DNow! usage
can be enabled earlier, but arch_initcall should be ok.
The static_cpu_has() does not work because the kernel with
CONFIG_X86_USE_3DNOW=y assumes that 3DNOW! is available,
so a static key is used instead. "Alternatives" should
also work, as long they not assume that required features
(REQUIRED_MASK*) are available early.
Similar bugs are possible. For easier debugging, maybe
kernel_fpu_begin() should catch such cases and print something?
Thanks,
Krzysiek
arch/x86/lib/mmx_32.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..375bf0798993 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -26,12 +26,14 @@
#include <asm/fpu/api.h>
#include <asm/asm.h>
+static __ro_after_init DEFINE_STATIC_KEY_FALSE(use_mmx);
+
void *_mmx_memcpy(void *to, const void *from, size_t len)
{
void *p;
int i;
- if (unlikely(in_interrupt()))
+ if (unlikely(in_interrupt()) || !static_branch_likely(&use_mmx))
return __memcpy(to, from, len);
p = to;
@@ -376,3 +378,11 @@ void mmx_copy_page(void *to, void *from)
fast_copy_page(to, from);
}
EXPORT_SYMBOL(mmx_copy_page);
+
+static int __init mmx_32_init(void)
+{
+ static_branch_enable(&use_mmx);
+ return 0;
+}
+
+arch_initcall(mmx_32_init);
--
2.27.0.rc1.207.gb85828341f
On Mon, Dec 28, 2020 at 05:06:31PM +0100, Krzysztof Mazur wrote:
> When enabled, the MMX 3DNow! optimized memcpy() is used very early,
> even before FPU is initialized. It worked fine, but commit
> 7ad816762f9bf89e940e618ea40c43138b479e10 ("x86/fpu: Reset MXCSR
> to default in kernel_fpu_begin()") broke that. After that commit
> the kernel crashes just after "Booting the kernel." message.
Have you figured out in the meantime what exactly is causing the
breakage?
Does it boot if you comment out
+ if (boot_cpu_has(X86_FEATURE_FPU))
+ asm volatile ("fninit");
in kernel_fpu_begin()?
I'm guessing K7 doesn't have X86_FEATURE_XMM so the LDMXCSR won't
matter...
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Jan 12, 2021 at 01:09:23AM +0100, Borislav Petkov wrote:
> On Mon, Dec 28, 2020 at 05:06:31PM +0100, Krzysztof Mazur wrote:
> > When enabled, the MMX 3DNow! optimized memcpy() is used very early,
> > even before FPU is initialized. It worked fine, but commit
> > 7ad816762f9bf89e940e618ea40c43138b479e10 ("x86/fpu: Reset MXCSR
> > to default in kernel_fpu_begin()") broke that. After that commit
> > the kernel crashes just after "Booting the kernel." message.
>
> Have you figured out in the meantime what exactly is causing the
> breakage?
>
> Does it boot if you comment out
>
> + if (boot_cpu_has(X86_FEATURE_FPU))
> + asm volatile ("fninit");
>
> in kernel_fpu_begin()?
>
> I'm guessing K7 doesn't have X86_FEATURE_XMM so the LDMXCSR won't
> matter...
K7 supports both XMM and FXSR:
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow cpuid 3dnowprefetch vmmcall
and the Linux 5.10 boots fine with:
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index eb86a2b831b1..08e5c5bea599 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -141,8 +141,10 @@ void kernel_fpu_begin(void)
}
__cpu_invalidate_fpregs_state();
+#if 0
if (boot_cpu_has(X86_FEATURE_XMM))
ldmxcsr(MXCSR_DEFAULT);
+#endif
if (boot_cpu_has(X86_FEATURE_FPU))
asm volatile ("fninit");
So, I'm guessing that the K7 does not like ldmxcsr(), when FXSR
or/and XMM are not enabled in CR4 (in fpu__init_cpu_generic()).
I verified that by adding kernel_fpu_begin()+kernel_fpu_end()
pair, before and after cr4_set_bits() in fpu__init_cpu_generic()
(on a kernel with disabled early MMX-optimized memcpy).
The kernel with kernel_fpu_begin() after cr4_set_bits():
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 701f196d7c68..6d5db9107411 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -25,6 +25,9 @@ static void fpu__init_cpu_generic(void)
if (cr4_mask)
cr4_set_bits(cr4_mask);
+ kernel_fpu_begin();
+ kernel_fpu_end();
+
cr0 = read_cr0();
cr0 &= ~(X86_CR0_TS|X86_CR0_EM); /* clear TS and EM */
if (!boot_cpu_has(X86_FEATURE_FPU))
boots fine, but the kernel with kernel_fpu_begin() before
cr4_set_bits():
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 701f196d7c68..fcf3e6fbd3f8 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -22,6 +22,9 @@ static void fpu__init_cpu_generic(void)
cr4_mask |= X86_CR4_OSFXSR;
if (boot_cpu_has(X86_FEATURE_XMM))
cr4_mask |= X86_CR4_OSXMMEXCPT;
+
+ kernel_fpu_begin();
+ kernel_fpu_end();
if (cr4_mask)
cr4_set_bits(cr4_mask);
crashes.
So the breakage is caused by using ldmxcsr(MXCSR_DEFAULT)
before X86_CR4_OSFXSR or/and X86_CR4_OSXMMEXCPT are set
in CR4.
So there are at least two alternative approaches (to disabling
MMX-optimized memcpy early):
a) initialize FXSR/XMM bits in CR4 very early, before first memcpy()
b) replacing boot_cpu_has(X86_FEATURE_XMM) in kernel_fpu_begin()
by something that is not enabled before CR4 is initialized.
This approach also comes with a runtime penalty, because
if kernel requires XMM, boot_cpu_has(X86_FEATURE_XMM) is
optimized out.
Best regards,
Krzysiek
On Thu, Jan 14, 2021 at 10:22:18AM +0100, Krzysztof Mazur wrote:
> So, I'm guessing that the K7 does not like ldmxcsr(), when FXSR
> or/and XMM are not enabled in CR4 (in fpu__init_cpu_generic()).
> I verified that by adding kernel_fpu_begin()+kernel_fpu_end()
> pair, before and after cr4_set_bits() in fpu__init_cpu_generic()
> (on a kernel with disabled early MMX-optimized memcpy).
Ah, ok, that makes sense. If X86_CR4_OSFXSR is not set, we cannot use
legacy SSE1 insns and LDMXCSR is one of them.
I believe the correct fix should be
if (unlikely(in_interrupt()) || !(cr4_read_shadow() & X86_CR4_OSFXSR))
return __memcpy(to, from, len);
in _mmx_memcpy() as you had it in your first patch.
Wanna try it and if it works, send a proper patch?
Also pls put a comment above it that that CR4 bit needs to be tested
before using SSE insns.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Jan 14, 2021 at 10:44:25AM +0100, Borislav Petkov wrote:
> I believe the correct fix should be
>
> if (unlikely(in_interrupt()) || !(cr4_read_shadow() & X86_CR4_OSFXSR))
> return __memcpy(to, from, len);
>
> in _mmx_memcpy() as you had it in your first patch.
>
The OSFXSR must be set only on CPUs with SSE. There
are some CPUs with 3DNow!, but without SSE and FXSR (like AMD
Geode LX, which is still used in many embedded systems).
So, I've changed that to:
if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) &&
unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR))))
However, I'm not sure that adding two taken branches (that
second unlikely() does not help gcc to generate better code) and
a call to cr4_read_shadow() (not inlined) is a good idea.
The static branch adds a single jump, which is changed to
NOP. The code with boot_cpu_has() and CR4 checks adds:
c09932ad: a1 50 50 c3 c0 mov 0xc0c35050,%eax
c09932b2: a9 00 00 00 02 test $0x2000000,%eax
/* this branch is taken */
c09932b7: 0f 85 13 01 00 00 jne c09933d0 <_mmx_memcpy+0x140>
c09932bd: /* MMX code is here */
/* cr4_read_shadow is not inlined */
c09933d0: e8 8b 01 78 ff call c0113560 <cr4_read_shadow>
c09933d5: f6 c4 02 test $0x2,%ah
/* this branch is also taken */
c09933d8: 0f 85 df fe ff ff jne c09932bd <_mmx_memcpy+0x2d>
However, except for some embedded systems, probably almost nobody uses
that code today, and the most imporant is avoiding future breakage. And
I'm not sure which approach is better. Because that CR4 test tests
for a feature that is not used in mmx_memcpy(), but it's used in
kernel_fpu_begin(). And in future kernel_fpu_begin() may change and
require also other stuff. So I think that the best approach would be
delay any FPU optimized memcpy() after fpu__init_system() is executed.
Best regards,
Krzysiek
-- >8 --
Subject: [PATCH] x86/lib: don't use mmx_memcpy() to early
The MMX 3DNow! optimized memcpy() is used very early,
even before FPU is initialized in the kernel. It worked fine, but commit
7ad816762f9bf89e940e618ea40c43138b479e10 ("x86/fpu: Reset MXCSR
to default in kernel_fpu_begin()") broke that. After that
commit the kernel_fpu_begin() assumes that FXSR is enabled in
the CR4 register on all processors with SSE. Because memcpy() is used
before FXSR is enabled, the kernel crashes just after "Booting the kernel."
message. It affects all kernels with CONFIG_X86_USE_3DNOW (enabled when
some AMD/Cyrix processors are selected) on processors with SSE
(like AMD K7, which supports both MMX 3DNow! and SSE).
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Krzysztof Mazur <[email protected]>
---
arch/x86/lib/mmx_32.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..70aa769570e6 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -25,13 +25,20 @@
#include <asm/fpu/api.h>
#include <asm/asm.h>
+#include <asm/tlbflush.h>
void *_mmx_memcpy(void *to, const void *from, size_t len)
{
void *p;
int i;
- if (unlikely(in_interrupt()))
+ /*
+ * kernel_fpu_begin() assumes that FXSR is enabled on all processors
+ * with SSE. Thus, MMX-optimized version can't be used
+ * before the kernel enables FXSR (OSFXSR bit in the CR4 register).
+ */
+ if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) &&
+ unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR))))
return __memcpy(to, from, len);
p = to;
--
2.27.0.rc1.207.gb85828341f
On Thu, Jan 14, 2021 at 01:36:57PM +0100, Krzysztof Mazur wrote:
> The OSFXSR must be set only on CPUs with SSE. There
> are some CPUs with 3DNow!, but without SSE and FXSR (like AMD
> Geode LX, which is still used in many embedded systems).
> So, I've changed that to:
>
> if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) &&
> unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR))))
Why?
X86_CR4_OSFXSR won't ever be set on those CPUs but the test will be
performed anyway. So there's no need for boot_cpu_has().
> However, except for some embedded systems, probably almost nobody uses
> that code today, and the most imporant is avoiding future breakage.
Yes, exactly. K7 is a don't-care performance-wise nowadays.
> And I'm not sure which approach is better. Because that CR4 test tests
> for a feature that is not used in mmx_memcpy(), but it's used in
> kernel_fpu_begin(). And in future kernel_fpu_begin() may change and
> require also other stuff. So I think that the best approach would be
> delay any FPU optimized memcpy() after fpu__init_system() is executed.
I'd prefer the simplest approach as this code is almost never used. So
no need to complicate things unnecessarily.
> Subject: [PATCH] x86/lib: don't use mmx_memcpy() to early
s/to/too/
> The MMX 3DNow! optimized memcpy() is used very early,
> even before FPU is initialized in the kernel. It worked fine, but commit
> 7ad816762f9bf89e940e618ea40c43138b479e10 ("x86/fpu: Reset MXCSR
> to default in kernel_fpu_begin()") broke that. After that
> commit the kernel_fpu_begin() assumes that FXSR is enabled in
> the CR4 register on all processors with SSE. Because memcpy() is used
> before FXSR is enabled, the kernel crashes just after "Booting the kernel."
> message. It affects all kernels with CONFIG_X86_USE_3DNOW (enabled when
> some AMD/Cyrix processors are selected) on processors with SSE
> (like AMD K7, which supports both MMX 3DNow! and SSE).
>
Fixes: 7ad816762f9b ("x86/fpu: Reset MXCSR to default in kernel_fpu_begin()")
...
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Jan 14, 2021 at 03:07:37PM +0100, Borislav Petkov wrote:
> On Thu, Jan 14, 2021 at 01:36:57PM +0100, Krzysztof Mazur wrote:
> > The OSFXSR must be set only on CPUs with SSE. There
> > are some CPUs with 3DNow!, but without SSE and FXSR (like AMD
> > Geode LX, which is still used in many embedded systems).
> > So, I've changed that to:
> >
> > if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) &&
> > unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR))))
>
> Why?
>
> X86_CR4_OSFXSR won't ever be set on those CPUs but the test will be
> performed anyway. So there's no need for boot_cpu_has().
Because the MMX version should be always used on those CPUs, even without
OSFXSR set. If the CPU does not support SSE, it is safe to
call kernel_fpu_begin() without OSFXSR set.
"!(cr4_read_shadow() & X86_CR4_OSFXSR)" will be always true on
those CPUs, and without boot_cpu_has() MMX version will be never used.
There are two cases:
3DNow! without SSE always use MMX version
3DNow! + SSE (K7) use MMX version only if FXSR is enabled
Thanks.
Best regards,
Krzysiek
-- >8 --
Subject: [PATCH] x86/lib: don't use mmx_memcpy() too early
The MMX 3DNow! optimized memcpy() is used very early,
even before FPU is initialized in the kernel. It worked fine, but commit
7ad816762f9bf89e940e618ea40c43138b479e10 ("x86/fpu: Reset MXCSR
to default in kernel_fpu_begin()") broke that. After that
commit the kernel_fpu_begin() assumes that FXSR is enabled in
the CR4 register on all processors with SSE. Because memcpy() is used
before FXSR is enabled, the kernel crashes just after "Booting the kernel."
message. It affects all kernels with CONFIG_X86_USE_3DNOW (enabled when
some AMD/Cyrix processors are selected) on processors with SSE
(like AMD K7, which supports both MMX 3DNow! and SSE).
Fixes: 7ad816762f9b ("x86/fpu: Reset MXCSR to default in kernel_fpu_begin()")
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: <[email protected]> # 5.8+
Signed-off-by: Krzysztof Mazur <[email protected]>
---
arch/x86/lib/mmx_32.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..70aa769570e6 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -25,13 +25,20 @@
#include <asm/fpu/api.h>
#include <asm/asm.h>
+#include <asm/tlbflush.h>
void *_mmx_memcpy(void *to, const void *from, size_t len)
{
void *p;
int i;
- if (unlikely(in_interrupt()))
+ /*
+ * kernel_fpu_begin() assumes that FXSR is enabled on all processors
+ * with SSE. Thus, MMX-optimized version can't be used
+ * before the kernel enables FXSR (OSFXSR bit in the CR4 register).
+ */
+ if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) &&
+ unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR))))
return __memcpy(to, from, len);
p = to;
--
2.27.0.rc1.207.gb85828341f
On Thu, Jan 14, 2021 at 6:51 AM Krzysztof Mazur <[email protected]> wrote:
>
> On Thu, Jan 14, 2021 at 03:07:37PM +0100, Borislav Petkov wrote:
> > On Thu, Jan 14, 2021 at 01:36:57PM +0100, Krzysztof Mazur wrote:
> > > The OSFXSR must be set only on CPUs with SSE. There
> > > are some CPUs with 3DNow!, but without SSE and FXSR (like AMD
> > > Geode LX, which is still used in many embedded systems).
> > > So, I've changed that to:
> > >
> > > if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) &&
> > > unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR))))
> >
> > Why?
> >
> > X86_CR4_OSFXSR won't ever be set on those CPUs but the test will be
> > performed anyway. So there's no need for boot_cpu_has().
>
> Because the MMX version should be always used on those CPUs, even without
> OSFXSR set. If the CPU does not support SSE, it is safe to
> call kernel_fpu_begin() without OSFXSR set.
> "!(cr4_read_shadow() & X86_CR4_OSFXSR)" will be always true on
> those CPUs, and without boot_cpu_has() MMX version will be never used.
>
> There are two cases:
>
> 3DNow! without SSE always use MMX version
> 3DNow! + SSE (K7) use MMX version only if FXSR is enabled
>
> Thanks.
>
> Best regards,
> Krzysiek
> -- >8 --
> Subject: [PATCH] x86/lib: don't use mmx_memcpy() too early
>
> The MMX 3DNow! optimized memcpy() is used very early,
> even before FPU is initialized in the kernel. It worked fine, but commit
> 7ad816762f9bf89e940e618ea40c43138b479e10 ("x86/fpu: Reset MXCSR
> to default in kernel_fpu_begin()") broke that. After that
> commit the kernel_fpu_begin() assumes that FXSR is enabled in
> the CR4 register on all processors with SSE. Because memcpy() is used
> before FXSR is enabled, the kernel crashes just after "Booting the kernel."
> message. It affects all kernels with CONFIG_X86_USE_3DNOW (enabled when
> some AMD/Cyrix processors are selected) on processors with SSE
> (like AMD K7, which supports both MMX 3DNow! and SSE).
>
> Fixes: 7ad816762f9b ("x86/fpu: Reset MXCSR to default in kernel_fpu_begin()")
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: <[email protected]> # 5.8+
> Signed-off-by: Krzysztof Mazur <[email protected]>
> ---
> arch/x86/lib/mmx_32.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
> index 4321fa02e18d..70aa769570e6 100644
> --- a/arch/x86/lib/mmx_32.c
> +++ b/arch/x86/lib/mmx_32.c
> @@ -25,13 +25,20 @@
>
> #include <asm/fpu/api.h>
> #include <asm/asm.h>
> +#include <asm/tlbflush.h>
>
> void *_mmx_memcpy(void *to, const void *from, size_t len)
> {
> void *p;
> int i;
>
> - if (unlikely(in_interrupt()))
> + /*
> + * kernel_fpu_begin() assumes that FXSR is enabled on all processors
> + * with SSE. Thus, MMX-optimized version can't be used
> + * before the kernel enables FXSR (OSFXSR bit in the CR4 register).
> + */
> + if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) &&
> + unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR))))
This is gross. I realize this is only used for old CPUs that we don't
care about perf-wise, but this code is nonsense -- it makes absolutely
to sense to put this absurd condition here to work around
kernel_fpu_begin() bugs. If we want to use MMX, we should check MMX.
And we should also check the correct condition re: irqs. So this code
should be:
if (boot_cpu_has(X86_FEATURE_XMM) && irq_fpu_usable()) {
kernel_fpu_begin_mask(FPU_MASK_XMM);
...
or we could improve code gen by adding a try_kernel_fpu_begin_mask()
so we can do a single call instead of two.
This also mostly fixes our little performance regression -- we'd make
kernel_fpu_begin() be an inline wrapper around
kernel_fpu_begin_mask(FPU_MASK_DEFAULTFP), and *that* would be
FPU_MASK_XYZMM on 64-bit and FPU_MASK_387 on 32-bit. (Okay, XYZMM
isn't a great name, but whatever.) And the EFI code can become
completely explicit: kernel_fpu_begin(FPU_MASK_ALL).
What do you all think? If you're generally in favor, I can write the
code and do a quick audit for other weird cases in the kernel.
Or we could flip the OSFSXR bit very early, I suppose.
--Andy
On Thu, Jan 14, 2021 at 08:31:30AM -0800, Andy Lutomirski wrote:
> This is gross. I realize this is only used for old CPUs that we don't
> care about perf-wise
Performance might be still important for embedded systems (Geode LX
seems to be supported "until at least 2021").
> , but this code is nonsense -- it makes absolutely
> to sense to put this absurd condition here to work around
> kernel_fpu_begin() bugs.
Yes, it's nonsense. But I think that the kernel might have a silent
assumption, that the FPU cannot be used too early.
> If we want to use MMX, we should check MMX.
> And we should also check the correct condition re: irqs. So this code
> should be:
>
> if (boot_cpu_has(X86_FEATURE_XMM) && irq_fpu_usable()) {
> kernel_fpu_begin_mask(FPU_MASK_XMM);
> ...
This code does not use SSE (XMM). It uses only MMX and 3DNow!, so XMM
check is not needed here. And this code is enabled only when a processor
with 3DNow! is selected ("lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o"
in Makefile). So:
if (!irq_fpu_usable())
fallback_to_non_mmx_code();
is sufficient.
But the original:
if (!in_interrupt())
fallback_to_non_mmx_code();
is also valid. Changing it to irq_fpu_usable() might allow using
MMX variant in more cases and even improve performance. But it
can also introduce some regressions.
> or we could improve code gen by adding a try_kernel_fpu_begin_mask()
> so we can do a single call instead of two.
> [...]
> What do you all think? If you're generally in favor, I can write the
> code and do a quick audit for other weird cases in the kernel.
I think that the cleanest approach would be introducing fpu_usable(),
which returns 0 (or false) if FPU is not initialized yet. Then
irq_fpu_usable() could be changed to:
bool irq_fpu_usable(void)
{
return fpu_usable() && (!in_interrupt() ||
interrupted_user_mode() ||
interrupted_kernel_fpu_idle());
}
and in the _mmx_memcpy():
- if (unlikely(in_interrupt()))
+ if (unlikely(irq_fpu_usable()))
return __memcpy(to, from, len);
try_kernel_fpu_begin(), even without mask, is also a good idea.
If the FPU is not available yet (FPU not initizalized yet) it can fail
and a fallback code would be used. However, in some cases fpu_usable()
+ kernel_fpu_begin() might be better, if between fpu_usable() check
and kernel_fpu_begin() long preparation is required. (kernel_fpu_begin()
disables preemption). In the mmx_32.c case try_kernel_fpu_begin()
looks good, only two simple lines are added to a section with disabled
preemption:
diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..9c6dadd2b2ef 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -31,14 +31,12 @@ void *_mmx_memcpy(void *to, const void *from, size_t len)
void *p;
int i;
- if (unlikely(in_interrupt()))
+ if (unlikely(try_kernel_fpu_begin()))
return __memcpy(to, from, len);
p = to;
i = len >> 6; /* len/64 */
- kernel_fpu_begin();
-
__asm__ __volatile__ (
"1: prefetch (%0)\n" /* This set is 28 bytes */
" prefetch 64(%0)\n"
And if try_kernel_fpu_begin_mask() with a mask will allow for some
optimizations it also might be a good idea.
> Or we could flip the OSFSXR bit very early, I suppose.
I think that my original static_branch_likely() workaround might
be the simplest and touches only mmx_32.c. Such approach is already
used in the kernel, for instance in the crypto code:
$ git grep static_branch arch/x86/crypto/
And maybe using FPU too early should be forbidden.
Best regards,
Krzysiek