2023-12-28 01:42:40

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 00/14] Unified cross-architecture kernel-mode FPU API

This series unifies the kernel-mode FPU API across several architectures
by wrapping the existing functions (where needed) in consistently-named
functions placed in a consistent header location, with mostly the same
semantics: they can be called from preemptible or non-preemptible task
context, and are not assumed to be reentrant. Architectures are also
expected to provide CFLAGS adjustments for compiling FPU-dependent code.
For the moment, SIMD/vector units are out of scope for this common API.

This allows us to remove the ifdeffery and duplicated Makefile logic at
each FPU user. It then implements the common API on RISC-V, and converts
a couple of users to the new API: the AMDGPU DRM driver, and the FPU
self test.

The underlying goal of this series is to allow using newer AMD GPUs
(e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those
GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode
FPU support.

Previous versions:
v1: https://lore.kernel.org/linux-kernel/[email protected]/
v0: https://lore.kernel.org/linux-kernel/[email protected]/

Changes in v2:
- Add documentation explaining the built-time and runtime APIs
- Add a linux/fpu.h header for generic isolation enforcement
- Remove file name from header comment
- Clean up arch/arm64/lib/Makefile, like for arch/arm
- Remove RISC-V architecture-specific preprocessor check
- Split altivec removal to a separate patch
- Use linux/fpu.h instead of asm/fpu.h in consumers
- Declare test_fpu() in a header

Michael Ellerman (1):
drm/amd/display: Only use hard-float, not altivec on powerpc

Samuel Holland (13):
arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
riscv: Add support for kernel-mode FPU
drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
selftests/fpu: Move FP code to a separate translation unit
selftests/fpu: Allow building on other architectures

Documentation/core-api/floating-point.rst | 78 +++++++++++++++++++
Documentation/core-api/index.rst | 1 +
Makefile | 5 ++
arch/Kconfig | 6 ++
arch/arm/Kconfig | 1 +
arch/arm/Makefile | 7 ++
arch/arm/include/asm/fpu.h | 15 ++++
arch/arm/lib/Makefile | 3 +-
arch/arm64/Kconfig | 1 +
arch/arm64/Makefile | 9 ++-
arch/arm64/include/asm/fpu.h | 15 ++++
arch/arm64/lib/Makefile | 6 +-
arch/loongarch/Kconfig | 1 +
arch/loongarch/Makefile | 5 +-
arch/loongarch/include/asm/fpu.h | 1 +
arch/powerpc/Kconfig | 1 +
arch/powerpc/Makefile | 5 +-
arch/powerpc/include/asm/fpu.h | 28 +++++++
arch/riscv/Kconfig | 1 +
arch/riscv/Makefile | 3 +
arch/riscv/include/asm/fpu.h | 16 ++++
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/kernel_mode_fpu.c | 28 +++++++
arch/x86/Kconfig | 1 +
arch/x86/Makefile | 20 +++++
arch/x86/include/asm/fpu.h | 13 ++++
drivers/gpu/drm/amd/display/Kconfig | 2 +-
.../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 35 +--------
drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 +--------
drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 +--------
include/linux/fpu.h | 12 +++
lib/Kconfig.debug | 2 +-
lib/Makefile | 26 +------
lib/raid6/Makefile | 31 ++------
lib/test_fpu.h | 8 ++
lib/{test_fpu.c => test_fpu_glue.c} | 37 ++-------
lib/test_fpu_impl.c | 37 +++++++++
37 files changed, 343 insertions(+), 190 deletions(-)
create mode 100644 Documentation/core-api/floating-point.rst
create mode 100644 arch/arm/include/asm/fpu.h
create mode 100644 arch/arm64/include/asm/fpu.h
create mode 100644 arch/powerpc/include/asm/fpu.h
create mode 100644 arch/riscv/include/asm/fpu.h
create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c
create mode 100644 arch/x86/include/asm/fpu.h
create mode 100644 include/linux/fpu.h
create mode 100644 lib/test_fpu.h
rename lib/{test_fpu.c => test_fpu_glue.c} (71%)
create mode 100644 lib/test_fpu_impl.c

--
2.42.0



2023-12-28 01:42:55

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 02/14] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

ARM provides an equivalent to the common kernel-mode FPU API, but in a
different header and using different function names. Add a wrapper
header, and export CFLAGS adjustments as found in lib/raid6/Makefile.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

Changes in v2:
- Remove file name from header comment

arch/arm/Kconfig | 1 +
arch/arm/Makefile | 7 +++++++
arch/arm/include/asm/fpu.h | 15 +++++++++++++++
3 files changed, 23 insertions(+)
create mode 100644 arch/arm/include/asm/fpu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index f8567e95f98b..92e21a4a2903 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -14,6 +14,7 @@ config ARM
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_KCOV
+ select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 5ba42f69f8ce..1dd860dba5f5 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -130,6 +130,13 @@ endif
# Accept old syntax despite ".syntax unified"
AFLAGS_NOWARN :=$(call as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W)

+# The GCC option -ffreestanding is required in order to compile code containing
+# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
+CC_FLAGS_FPU := -ffreestanding
+# Enable <arm_neon.h>
+CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include)
+CC_FLAGS_FPU += -march=armv7-a -mfloat-abi=softfp -mfpu=neon
+
ifeq ($(CONFIG_THUMB2_KERNEL),y)
CFLAGS_ISA :=-Wa,-mimplicit-it=always $(AFLAGS_NOWARN)
AFLAGS_ISA :=$(CFLAGS_ISA) -Wa$(comma)-mthumb
diff --git a/arch/arm/include/asm/fpu.h b/arch/arm/include/asm/fpu.h
new file mode 100644
index 000000000000..2ae50bdce59b
--- /dev/null
+++ b/arch/arm/include/asm/fpu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef __ASM_FPU_H
+#define __ASM_FPU_H
+
+#include <asm/neon.h>
+
+#define kernel_fpu_available() cpu_has_neon()
+#define kernel_fpu_begin() kernel_neon_begin()
+#define kernel_fpu_end() kernel_neon_end()
+
+#endif /* ! __ASM_FPU_H */
--
2.42.0


2023-12-28 01:43:28

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 04/14] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

arm64 provides an equivalent to the common kernel-mode FPU API, but in a
different header and using different function names. Add a wrapper
header, and export CFLAGS adjustments as found in lib/raid6/Makefile.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

Changes in v2:
- Remove file name from header comment

arch/arm64/Kconfig | 1 +
arch/arm64/Makefile | 9 ++++++++-
arch/arm64/include/asm/fpu.h | 15 +++++++++++++++
3 files changed, 24 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/include/asm/fpu.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b071a00425d..485ac389ac11 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -30,6 +30,7 @@ config ARM64
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
+ select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 9a2d3723cd0f..4a65f24c7998 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -36,7 +36,14 @@ ifeq ($(CONFIG_BROKEN_GAS_INST),y)
$(warning Detected assembler with broken .inst; disassembly will be unreliable)
endif

-KBUILD_CFLAGS += -mgeneral-regs-only \
+# The GCC option -ffreestanding is required in order to compile code containing
+# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
+CC_FLAGS_FPU := -ffreestanding
+# Enable <arm_neon.h>
+CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include)
+CC_FLAGS_NO_FPU := -mgeneral-regs-only
+
+KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU) \
$(compat_vdso) $(cc_has_k_constraint)
KBUILD_CFLAGS += $(call cc-disable-warning, psabi)
KBUILD_AFLAGS += $(compat_vdso)
diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h
new file mode 100644
index 000000000000..2ae50bdce59b
--- /dev/null
+++ b/arch/arm64/include/asm/fpu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef __ASM_FPU_H
+#define __ASM_FPU_H
+
+#include <asm/neon.h>
+
+#define kernel_fpu_available() cpu_has_neon()
+#define kernel_fpu_begin() kernel_neon_begin()
+#define kernel_fpu_end() kernel_neon_end()
+
+#endif /* ! __ASM_FPU_H */
--
2.42.0


2023-12-28 01:43:31

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 03/14] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS

Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v1)

arch/arm/lib/Makefile | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 650404be6768..0ca5aae1bcc3 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -40,8 +40,7 @@ $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
$(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S

ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
- NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
- CFLAGS_xor-neon.o += $(NEON_FLAGS)
+ CFLAGS_xor-neon.o += $(CC_FLAGS_FPU)
obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
endif

--
2.42.0


2023-12-28 01:43:39

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 05/14] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS

Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Signed-off-by: Samuel Holland <[email protected]>
---

Changes in v2:
- New patch for v2

arch/arm64/lib/Makefile | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 29490be2546b..13e6a2829116 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -7,10 +7,8 @@ lib-y := clear_user.o delay.o copy_from_user.o \

ifeq ($(CONFIG_KERNEL_MODE_NEON), y)
obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
-CFLAGS_REMOVE_xor-neon.o += -mgeneral-regs-only
-CFLAGS_xor-neon.o += -ffreestanding
-# Enable <arm_neon.h>
-CFLAGS_xor-neon.o += -isystem $(shell $(CC) -print-file-name=include)
+CFLAGS_xor-neon.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_xor-neon.o += $(CC_FLAGS_NO_FPU)
endif

lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
--
2.42.0


2023-12-28 01:44:18

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 07/14] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
asm/fpu.h, so it only needs to add kernel_fpu_available() and export
the CFLAGS adjustments.

Acked-by: WANG Xuerui <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v1)

arch/loongarch/Kconfig | 1 +
arch/loongarch/Makefile | 5 ++++-
arch/loongarch/include/asm/fpu.h | 1 +
3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index ee123820a476..65d4475565b8 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -15,6 +15,7 @@ config LOONGARCH
select ARCH_HAS_CPU_FINALIZE_INIT
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_KCOV
+ select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index 4ba8d67ddb09..1afe28feaba5 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -25,6 +25,9 @@ endif
32bit-emul = elf32loongarch
64bit-emul = elf64loongarch

+CC_FLAGS_FPU := -mfpu=64
+CC_FLAGS_NO_FPU := -msoft-float
+
ifdef CONFIG_DYNAMIC_FTRACE
KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
CC_FLAGS_FTRACE := -fpatchable-function-entry=2
@@ -46,7 +49,7 @@ ld-emul = $(64bit-emul)
cflags-y += -mabi=lp64s
endif

-cflags-y += -pipe -msoft-float
+cflags-y += -pipe $(CC_FLAGS_NO_FPU)
LDFLAGS_vmlinux += -static -n -nostdlib

# When the assembler supports explicit relocation hint, we must use it.
diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h
index c2d8962fda00..3177674228f8 100644
--- a/arch/loongarch/include/asm/fpu.h
+++ b/arch/loongarch/include/asm/fpu.h
@@ -21,6 +21,7 @@

struct sigcontext;

+#define kernel_fpu_available() cpu_has_fpu
extern void kernel_fpu_begin(void);
extern void kernel_fpu_end(void);

--
2.42.0


2023-12-28 01:44:41

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 01/14] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT

Several architectures provide an API to enable the FPU and run
floating-point SIMD code in kernel space. However, the function names,
header locations, and semantics are inconsistent across architectures,
and FPU support may be gated behind other Kconfig options.

Provide a standard way for architectures to declare that kernel space
FPU support is available. Architectures selecting this option must
implement what is currently the most common API (kernel_fpu_begin() and
kernel_fpu_end(), plus a new function kernel_fpu_available()) and
provide the appropriate CFLAGS for compiling floating-point C code.

Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

Changes in v2:
- Add documentation explaining the built-time and runtime APIs
- Add a linux/fpu.h header for generic isolation enforcement

Documentation/core-api/floating-point.rst | 78 +++++++++++++++++++++++
Documentation/core-api/index.rst | 1 +
Makefile | 5 ++
arch/Kconfig | 6 ++
include/linux/fpu.h | 12 ++++
5 files changed, 102 insertions(+)
create mode 100644 Documentation/core-api/floating-point.rst
create mode 100644 include/linux/fpu.h

diff --git a/Documentation/core-api/floating-point.rst b/Documentation/core-api/floating-point.rst
new file mode 100644
index 000000000000..a8d0d4b05052
--- /dev/null
+++ b/Documentation/core-api/floating-point.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+Floating-point API
+==================
+
+Kernel code is normally prohibited from using floating-point (FP) registers or
+instructions, including the C float and double data types. This rule reduces
+system call overhead, because the kernel does not need to save and restore the
+userspace floating-point register state.
+
+However, occasionally drivers or library functions may need to include FP code.
+This is supported by isolating the functions containing FP code to a separate
+translation unit (a separate source file), and saving/restoring the FP register
+state around calls to those functions. This creates "critical sections" of
+floating-point usage.
+
+The reason for this isolation is to prevent the compiler from generating code
+touching the FP registers outside these critical sections. Compilers sometimes
+use FP registers to optimize inlined ``memcpy`` or variable assignment, as
+floating-point registers may be wider than general-purpose registers.
+
+Usability of floating-point code within the kernel is architecture-specific.
+Additionally, because a single kernel may be configured to support platforms
+both with and without a floating-point unit, FPU availability must be checked
+both at build time and at run time.
+
+Several architectures implement the generic kernel floating-point API from
+``linux/fpu.h``, as described below. Some other architectures implement their
+own unique APIs, which are documented separately.
+
+Build-time API
+--------------
+
+Floating-point code may be built if the option ``ARCH_HAS_KERNEL_FPU_SUPPORT``
+is enabled. For C code, such code must be placed in a separate file, and that
+file must have its compilation flags adjusted using the following pattern::
+
+ CFLAGS_foo.o += $(CC_FLAGS_FPU)
+ CFLAGS_REMOVE_foo.o += $(CC_FLAGS_NO_FPU)
+
+Architectures are expected to define one or both of these variables in their
+top-level Makefile as needed. For example::
+
+ CC_FLAGS_FPU := -mhard-float
+
+or::
+
+ CC_FLAGS_NO_FPU := -msoft-float
+
+Normal kernel code is assumed to use the equivalent of ``CC_FLAGS_NO_FPU``.
+
+Runtime API
+-----------
+
+The runtime API is provided in ``linux/fpu.h``. This header cannot be included
+from files implementing FP code (those with their compilation flags adjusted as
+above). Instead, it must be included when defining the FP critical sections.
+
+.. c:function:: bool kernel_fpu_available( void )
+
+ This function reports if floating-point code can be used on this CPU or
+ platform. The value returned by this function is not expected to change
+ at runtime, so it only needs to be called once, not before every
+ critical section.
+
+.. c:function:: void kernel_fpu_begin( void )
+ void kernel_fpu_end( void )
+
+ These functions create a floating-point critical section. It is only
+ valid to call ``kernel_fpu_begin()`` after a previous call to
+ ``kernel_fpu_available()`` returned ``true``. These functions are only
+ guaranteed to be callable from (preemptible or non-preemptible) process
+ context.
+
+ Preemption may be disabled inside critical sections, so their size
+ should be minimized. They are *not* required to be reentrant. If the
+ caller expects to nest critical sections, it must implement its own
+ reference counting.
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 7a3a08d81f11..974beccd671f 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -48,6 +48,7 @@ Library functionality that is used throughout the kernel.
errseq
wrappers/atomic_t
wrappers/atomic_bitops
+ floating-point

Low level entry and exit
========================
diff --git a/Makefile b/Makefile
index ee995fc2b0e5..79c9e0b56ab8 100644
--- a/Makefile
+++ b/Makefile
@@ -969,6 +969,11 @@ KBUILD_CFLAGS += $(CC_FLAGS_CFI)
export CC_FLAGS_CFI
endif

+# Architectures can define flags to add/remove for floating-point support
+CC_FLAGS_FPU += -D_LINUX_FPU_COMPILATION_UNIT
+export CC_FLAGS_FPU
+export CC_FLAGS_NO_FPU
+
ifneq ($(CONFIG_FUNCTION_ALIGNMENT),0)
KBUILD_CFLAGS += -falign-functions=$(CONFIG_FUNCTION_ALIGNMENT)
endif
diff --git a/arch/Kconfig b/arch/Kconfig
index f4b210ab0612..e1c01ce819ed 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1478,6 +1478,12 @@ config ARCH_HAS_NONLEAF_PMD_YOUNG
address translations. Page table walkers that clear the accessed bit
may use this capability to reduce their search space.

+config ARCH_HAS_KERNEL_FPU_SUPPORT
+ bool
+ help
+ Architectures that select this option can run floating-point code in
+ the kernel, as described in Documentation/core-api/floating-point.rst.
+
source "kernel/gcov/Kconfig"

source "scripts/gcc-plugins/Kconfig"
diff --git a/include/linux/fpu.h b/include/linux/fpu.h
new file mode 100644
index 000000000000..2fb63e22913b
--- /dev/null
+++ b/include/linux/fpu.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_FPU_H
+#define _LINUX_FPU_H
+
+#ifdef _LINUX_FPU_COMPILATION_UNIT
+#error FP code must be compiled separately. See Documentation/core-api/floating-point.rst.
+#endif
+
+#include <asm/fpu.h>
+
+#endif
--
2.42.0


2023-12-28 01:44:57

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 09/14] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

x86 already provides kernel_fpu_begin() and kernel_fpu_end(), but in a
different header. Add a wrapper header, and export the CFLAGS
adjustments as found in lib/Makefile.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v1)

arch/x86/Kconfig | 1 +
arch/x86/Makefile | 20 ++++++++++++++++++++
arch/x86/include/asm/fpu.h | 13 +++++++++++++
3 files changed, 34 insertions(+)
create mode 100644 arch/x86/include/asm/fpu.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3762f41bb092..1fe7f2d8d017 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -81,6 +81,7 @@ config X86
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_KCOV if X86_64
+ select ARCH_HAS_KERNEL_FPU_SUPPORT
select ARCH_HAS_MEM_ENCRYPT
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1a068de12a56..71576c8dbe79 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -70,6 +70,26 @@ export BITS
KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
KBUILD_RUSTFLAGS += -Ctarget-feature=-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2

+#
+# CFLAGS for compiling floating point code inside the kernel.
+#
+CC_FLAGS_FPU := -msse -msse2
+ifdef CONFIG_CC_IS_GCC
+# Stack alignment mismatch, proceed with caution.
+# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
+# (8B stack alignment).
+# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
+#
+# The "-msse" in the first argument is there so that the
+# -mpreferred-stack-boundary=3 build error:
+#
+# -mpreferred-stack-boundary=3 is not between 4 and 12
+#
+# can be triggered. Otherwise gcc doesn't complain.
+CC_FLAGS_FPU += -mhard-float
+CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
+endif
+
ifeq ($(CONFIG_X86_KERNEL_IBT),y)
#
# Kernel IBT has S_CET.NOTRACK_EN=0, as such the compilers must not generate
diff --git a/arch/x86/include/asm/fpu.h b/arch/x86/include/asm/fpu.h
new file mode 100644
index 000000000000..b2743fe19339
--- /dev/null
+++ b/arch/x86/include/asm/fpu.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_X86_FPU_H
+#define _ASM_X86_FPU_H
+
+#include <asm/fpu/api.h>
+
+#define kernel_fpu_available() true
+
+#endif /* ! _ASM_X86_FPU_H */
--
2.42.0


2023-12-28 01:45:02

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 10/14] riscv: Add support for kernel-mode FPU

This is motivated by the amdgpu DRM driver, which needs floating-point
code to support recent hardware. That code is not performance-critical,
so only provide a minimal non-preemptible implementation for now.

Signed-off-by: Samuel Holland <[email protected]>
---

Changes in v2:
- Remove RISC-V architecture-specific preprocessor check

arch/riscv/Kconfig | 1 +
arch/riscv/Makefile | 3 +++
arch/riscv/include/asm/fpu.h | 16 ++++++++++++++++
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/kernel_mode_fpu.c | 28 ++++++++++++++++++++++++++++
5 files changed, 49 insertions(+)
create mode 100644 arch/riscv/include/asm/fpu.h
create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 24c1799e2ec4..4d4d1d64ce34 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -27,6 +27,7 @@ config RISCV
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
+ select ARCH_HAS_KERNEL_FPU_SUPPORT if FPU
select ARCH_HAS_MMIOWB
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PMEM_API
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index a74be78678eb..2e719c369210 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -81,6 +81,9 @@ KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64i

KBUILD_AFLAGS += -march=$(riscv-march-y)

+# For C code built with floating-point support, exclude V but keep F and D.
+CC_FLAGS_FPU := -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/')
+
KBUILD_CFLAGS += -mno-save-restore
KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET)

diff --git a/arch/riscv/include/asm/fpu.h b/arch/riscv/include/asm/fpu.h
new file mode 100644
index 000000000000..91c04c244e12
--- /dev/null
+++ b/arch/riscv/include/asm/fpu.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_RISCV_FPU_H
+#define _ASM_RISCV_FPU_H
+
+#include <asm/switch_to.h>
+
+#define kernel_fpu_available() has_fpu()
+
+void kernel_fpu_begin(void);
+void kernel_fpu_end(void);
+
+#endif /* ! _ASM_RISCV_FPU_H */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index fee22a3d1b53..662c483e338d 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/

obj-$(CONFIG_RISCV_MISALIGNED) += traps_misaligned.o
obj-$(CONFIG_FPU) += fpu.o
+obj-$(CONFIG_FPU) += kernel_mode_fpu.o
obj-$(CONFIG_RISCV_ISA_V) += vector.o
obj-$(CONFIG_SMP) += smpboot.o
obj-$(CONFIG_SMP) += smp.o
diff --git a/arch/riscv/kernel/kernel_mode_fpu.c b/arch/riscv/kernel/kernel_mode_fpu.c
new file mode 100644
index 000000000000..0ac8348876c4
--- /dev/null
+++ b/arch/riscv/kernel/kernel_mode_fpu.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#include <linux/export.h>
+#include <linux/preempt.h>
+
+#include <asm/csr.h>
+#include <asm/fpu.h>
+#include <asm/processor.h>
+#include <asm/switch_to.h>
+
+void kernel_fpu_begin(void)
+{
+ preempt_disable();
+ fstate_save(current, task_pt_regs(current));
+ csr_set(CSR_SSTATUS, SR_FS);
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+
+void kernel_fpu_end(void)
+{
+ csr_clear(CSR_SSTATUS, SR_FS);
+ fstate_restore(current, task_pt_regs(current));
+ preempt_enable();
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_end);
--
2.42.0


2023-12-28 01:45:30

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 11/14] drm/amd/display: Only use hard-float, not altivec on powerpc

From: Michael Ellerman <[email protected]>

The compiler flags enable altivec, but that is not required; hard-float
is sufficient for the code to build and function.

Drop altivec from the compiler flags and adjust the enable/disable code
to only enable FPU use.

Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

Changes in v2:
- New patch for v2

drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 12 ++----------
drivers/gpu/drm/amd/display/dc/dml/Makefile | 2 +-
drivers/gpu/drm/amd/display/dc/dml2/Makefile | 2 +-
3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 4ae4720535a5..0de16796466b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int line)
#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_begin();
#elif defined(CONFIG_PPC64)
- if (cpu_has_feature(CPU_FTR_VSX_COMP))
- enable_kernel_vsx();
- else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
- enable_kernel_altivec();
- else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+ if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
enable_kernel_fp();
#elif defined(CONFIG_ARM64)
kernel_neon_begin();
@@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int line)
#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_end();
#elif defined(CONFIG_PPC64)
- if (cpu_has_feature(CPU_FTR_VSX_COMP))
- disable_kernel_vsx();
- else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
- disable_kernel_altivec();
- else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+ if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
disable_kernel_fp();
#elif defined(CONFIG_ARM64)
kernel_neon_end();
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index 6042a5a6a44f..554c39024a40 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -31,7 +31,7 @@ dml_ccflags := $(dml_ccflags-y) -msse
endif

ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float -maltivec
+dml_ccflags := -mhard-float
endif

ifdef CONFIG_ARM64
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index acff3449b8d7..7b51364084b5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -30,7 +30,7 @@ dml2_ccflags := $(dml2_ccflags-y) -msse
endif

ifdef CONFIG_PPC64
-dml2_ccflags := -mhard-float -maltivec
+dml2_ccflags := -mhard-float
endif

ifdef CONFIG_ARM64
--
2.42.0


2023-12-28 01:45:51

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 13/14] selftests/fpu: Move FP code to a separate translation unit

This ensures no compiler-generated floating-point code can appear
outside kernel_fpu_{begin,end}() sections, and some architectures
enforce this separation.

Signed-off-by: Samuel Holland <[email protected]>
---

Changes in v2:
- Declare test_fpu() in a header

lib/Makefile | 3 ++-
lib/test_fpu.h | 8 +++++++
lib/{test_fpu.c => test_fpu_glue.c} | 32 +------------------------
lib/test_fpu_impl.c | 37 +++++++++++++++++++++++++++++
4 files changed, 48 insertions(+), 32 deletions(-)
create mode 100644 lib/test_fpu.h
rename lib/{test_fpu.c => test_fpu_glue.c} (71%)
create mode 100644 lib/test_fpu_impl.c

diff --git a/lib/Makefile b/lib/Makefile
index 6b09731d8e61..e7cbd54944a2 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -132,7 +132,8 @@ FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-st
endif

obj-$(CONFIG_TEST_FPU) += test_fpu.o
-CFLAGS_test_fpu.o += $(FPU_CFLAGS)
+test_fpu-y := test_fpu_glue.o test_fpu_impl.o
+CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS)

obj-$(CONFIG_TEST_LIVEPATCH) += livepatch/

diff --git a/lib/test_fpu.h b/lib/test_fpu.h
new file mode 100644
index 000000000000..4459807084bc
--- /dev/null
+++ b/lib/test_fpu.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _LIB_TEST_FPU_H
+#define _LIB_TEST_FPU_H
+
+int test_fpu(void);
+
+#endif
diff --git a/lib/test_fpu.c b/lib/test_fpu_glue.c
similarity index 71%
rename from lib/test_fpu.c
rename to lib/test_fpu_glue.c
index e82db19fed84..85963d7be826 100644
--- a/lib/test_fpu.c
+++ b/lib/test_fpu_glue.c
@@ -19,37 +19,7 @@
#include <linux/debugfs.h>
#include <asm/fpu/api.h>

-static int test_fpu(void)
-{
- /*
- * This sequence of operations tests that rounding mode is
- * to nearest and that denormal numbers are supported.
- * Volatile variables are used to avoid compiler optimizing
- * the calculations away.
- */
- volatile double a, b, c, d, e, f, g;
-
- a = 4.0;
- b = 1e-15;
- c = 1e-310;
-
- /* Sets precision flag */
- d = a + b;
-
- /* Result depends on rounding mode */
- e = a + b / 2;
-
- /* Denormal and very large values */
- f = b / c;
-
- /* Depends on denormal support */
- g = a + c * f;
-
- if (d > a && e > a && g > a)
- return 0;
- else
- return -EINVAL;
-}
+#include "test_fpu.h"

static int test_fpu_get(void *data, u64 *val)
{
diff --git a/lib/test_fpu_impl.c b/lib/test_fpu_impl.c
new file mode 100644
index 000000000000..777894dbbe86
--- /dev/null
+++ b/lib/test_fpu_impl.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <linux/errno.h>
+
+#include "test_fpu.h"
+
+int test_fpu(void)
+{
+ /*
+ * This sequence of operations tests that rounding mode is
+ * to nearest and that denormal numbers are supported.
+ * Volatile variables are used to avoid compiler optimizing
+ * the calculations away.
+ */
+ volatile double a, b, c, d, e, f, g;
+
+ a = 4.0;
+ b = 1e-15;
+ c = 1e-310;
+
+ /* Sets precision flag */
+ d = a + b;
+
+ /* Result depends on rounding mode */
+ e = a + b / 2;
+
+ /* Denormal and very large values */
+ f = b / c;
+
+ /* Depends on denormal support */
+ g = a + c * f;
+
+ if (d > a && e > a && g > a)
+ return 0;
+ else
+ return -EINVAL;
+}
--
2.42.0


2023-12-28 01:46:08

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 14/14] selftests/fpu: Allow building on other architectures

Now that ARCH_HAS_KERNEL_FPU_SUPPORT provides a common way to compile
and run floating-point code, this test is no longer x86-specific.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v1)

lib/Kconfig.debug | 2 +-
lib/Makefile | 25 ++-----------------------
lib/test_fpu_glue.c | 5 ++++-
3 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4405f81248fb..4596100eeb14 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2918,7 +2918,7 @@ config TEST_FREE_PAGES

config TEST_FPU
tristate "Test floating point operations in kernel space"
- depends on X86 && !KCOV_INSTRUMENT_ALL
+ depends on ARCH_HAS_KERNEL_FPU_SUPPORT && !KCOV_INSTRUMENT_ALL
help
Enable this option to add /sys/kernel/debug/selftest_helpers/test_fpu
which will trigger a sequence of floating point operations. This is used
diff --git a/lib/Makefile b/lib/Makefile
index e7cbd54944a2..b9f28558c9bd 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -109,31 +109,10 @@ CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE)
obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o
obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o

-#
-# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns
-# off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS
-# get appended last to CFLAGS and thus override those previous compiler options.
-#
-FPU_CFLAGS := -msse -msse2
-ifdef CONFIG_CC_IS_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
-#
-# The "-msse" in the first argument is there so that the
-# -mpreferred-stack-boundary=3 build error:
-#
-# -mpreferred-stack-boundary=3 is not between 4 and 12
-#
-# can be triggered. Otherwise gcc doesn't complain.
-FPU_CFLAGS += -mhard-float
-FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
-endif
-
obj-$(CONFIG_TEST_FPU) += test_fpu.o
test_fpu-y := test_fpu_glue.o test_fpu_impl.o
-CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS)
+CFLAGS_test_fpu_impl.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_test_fpu_impl.o += $(CC_FLAGS_NO_FPU)

obj-$(CONFIG_TEST_LIVEPATCH) += livepatch/

diff --git a/lib/test_fpu_glue.c b/lib/test_fpu_glue.c
index 85963d7be826..eef282a2715f 100644
--- a/lib/test_fpu_glue.c
+++ b/lib/test_fpu_glue.c
@@ -17,7 +17,7 @@
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/debugfs.h>
-#include <asm/fpu/api.h>
+#include <linux/fpu.h>

#include "test_fpu.h"

@@ -38,6 +38,9 @@ static struct dentry *selftest_dir;

static int __init test_fpu_init(void)
{
+ if (!kernel_fpu_available())
+ return -EINVAL;
+
selftest_dir = debugfs_create_dir("selftest_helpers", NULL);
if (!selftest_dir)
return -ENOMEM;
--
2.42.0


2023-12-28 01:46:15

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 06/14] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS

Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v1)

lib/raid6/Makefile | 31 ++++++++-----------------------
1 file changed, 8 insertions(+), 23 deletions(-)

diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index 1c5420ff254e..309fea97efc6 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -33,25 +33,6 @@ CFLAGS_REMOVE_vpermxor8.o += -msoft-float
endif
endif

-# The GCC option -ffreestanding is required in order to compile code containing
-# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
-ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
-NEON_FLAGS := -ffreestanding
-# Enable <arm_neon.h>
-NEON_FLAGS += -isystem $(shell $(CC) -print-file-name=include)
-ifeq ($(ARCH),arm)
-NEON_FLAGS += -march=armv7-a -mfloat-abi=softfp -mfpu=neon
-endif
-CFLAGS_recov_neon_inner.o += $(NEON_FLAGS)
-ifeq ($(ARCH),arm64)
-CFLAGS_REMOVE_recov_neon_inner.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon1.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon2.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon4.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon8.o += -mgeneral-regs-only
-endif
-endif
-
quiet_cmd_unroll = UNROLL $@
cmd_unroll = $(AWK) -v N=$* -f $(srctree)/$(src)/unroll.awk < $< > $@

@@ -75,10 +56,14 @@ targets += vpermxor1.c vpermxor2.c vpermxor4.c vpermxor8.c
$(obj)/vpermxor%.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
$(call if_changed,unroll)

-CFLAGS_neon1.o += $(NEON_FLAGS)
-CFLAGS_neon2.o += $(NEON_FLAGS)
-CFLAGS_neon4.o += $(NEON_FLAGS)
-CFLAGS_neon8.o += $(NEON_FLAGS)
+CFLAGS_neon1.o += $(CC_FLAGS_FPU)
+CFLAGS_neon2.o += $(CC_FLAGS_FPU)
+CFLAGS_neon4.o += $(CC_FLAGS_FPU)
+CFLAGS_neon8.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon4.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU)
targets += neon1.c neon2.c neon4.c neon8.c
$(obj)/neon%.c: $(src)/neon.uc $(src)/unroll.awk FORCE
$(call if_changed,unroll)
--
2.42.0


2023-12-28 01:47:23

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 08/14] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

PowerPC provides an equivalent to the common kernel-mode FPU API, but in
a different header and using different function names. The PowerPC API
also requires a non-preemptible context. Add a wrapper header, and
export the CFLAGS adjustments.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v1)

arch/powerpc/Kconfig | 1 +
arch/powerpc/Makefile | 5 ++++-
arch/powerpc/include/asm/fpu.h | 28 ++++++++++++++++++++++++++++
3 files changed, 33 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/include/asm/fpu.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f105ee4f3cf..e96cb5b7c571 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -137,6 +137,7 @@ config PPC
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_HUGEPD if HUGETLB_PAGE
select ARCH_HAS_KCOV
+ select ARCH_HAS_KERNEL_FPU_SUPPORT if PPC_FPU
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MEMREMAP_COMPAT_ALIGN if PPC_64S_HASH_MMU
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f19dbaa1d541..91106970a8c1 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -142,6 +142,9 @@ CFLAGS-$(CONFIG_PPC32) += $(call cc-option, $(MULTIPLEWORD))

CFLAGS-$(CONFIG_PPC32) += $(call cc-option,-mno-readonly-in-sdata)

+CC_FLAGS_FPU := $(call cc-option,-mhard-float)
+CC_FLAGS_NO_FPU := $(call cc-option,-msoft-float)
+
ifdef CONFIG_FUNCTION_TRACER
ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
@@ -163,7 +166,7 @@ asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1)

KBUILD_CPPFLAGS += -I $(srctree)/arch/$(ARCH) $(asinstr)
KBUILD_AFLAGS += $(AFLAGS-y)
-KBUILD_CFLAGS += $(call cc-option,-msoft-float)
+KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU)
KBUILD_CFLAGS += $(CFLAGS-y)
CPP = $(CC) -E $(KBUILD_CFLAGS)

diff --git a/arch/powerpc/include/asm/fpu.h b/arch/powerpc/include/asm/fpu.h
new file mode 100644
index 000000000000..ca584e4bc40f
--- /dev/null
+++ b/arch/powerpc/include/asm/fpu.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_POWERPC_FPU_H
+#define _ASM_POWERPC_FPU_H
+
+#include <linux/preempt.h>
+
+#include <asm/cpu_has_feature.h>
+#include <asm/switch_to.h>
+
+#define kernel_fpu_available() (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+
+static inline void kernel_fpu_begin(void)
+{
+ preempt_disable();
+ enable_kernel_fp();
+}
+
+static inline void kernel_fpu_end(void)
+{
+ disable_kernel_fp();
+ preempt_enable();
+}
+
+#endif /* ! _ASM_POWERPC_FPU_H */
--
2.42.0


2023-12-28 01:48:51

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v2 12/14] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

Now that all previously-supported architectures select
ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
of the existing list of architectures. It can also take advantage of the
common kernel-mode FPU API and method of adjusting CFLAGS.

Signed-off-by: Samuel Holland <[email protected]>
---

Changes in v2:
- Split altivec removal to a separate patch
- Use linux/fpu.h instead of asm/fpu.h in consumers

drivers/gpu/drm/amd/display/Kconfig | 2 +-
.../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 27 ++------------
drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 ++-----------------
drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 ++-----------------
4 files changed, 7 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
index 901d1961b739..5fcd4f778dc3 100644
--- a/drivers/gpu/drm/amd/display/Kconfig
+++ b/drivers/gpu/drm/amd/display/Kconfig
@@ -8,7 +8,7 @@ config DRM_AMD_DC
depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64
select SND_HDA_COMPONENT if SND_HDA_CORE
# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
- select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
+ select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG)
help
Choose this option if you want to use the new display engine
support for AMDGPU. This adds required support for Vega and
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 0de16796466b..e46f8ce41d87 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -26,16 +26,7 @@

#include "dc_trace.h"

-#if defined(CONFIG_X86)
-#include <asm/fpu/api.h>
-#elif defined(CONFIG_PPC64)
-#include <asm/switch_to.h>
-#include <asm/cputable.h>
-#elif defined(CONFIG_ARM64)
-#include <asm/neon.h>
-#elif defined(CONFIG_LOONGARCH)
-#include <asm/fpu.h>
-#endif
+#include <linux/fpu.h>

/**
* DOC: DC FPU manipulation overview
@@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line)
WARN_ON_ONCE(!in_task());
preempt_disable();
depth = __this_cpu_inc_return(fpu_recursion_depth);
-
if (depth == 1) {
-#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
+ BUG_ON(!kernel_fpu_available());
kernel_fpu_begin();
-#elif defined(CONFIG_PPC64)
- if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
- enable_kernel_fp();
-#elif defined(CONFIG_ARM64)
- kernel_neon_begin();
-#endif
}

TRACE_DCN_FPU(true, function_name, line, depth);
@@ -118,14 +102,7 @@ void dc_fpu_end(const char *function_name, const int line)

depth = __this_cpu_dec_return(fpu_recursion_depth);
if (depth == 0) {
-#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_end();
-#elif defined(CONFIG_PPC64)
- if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
- disable_kernel_fp();
-#elif defined(CONFIG_ARM64)
- kernel_neon_end();
-#endif
} else {
WARN_ON_ONCE(depth < 0);
}
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index 554c39024a40..be15d366b786 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -25,40 +25,8 @@
# It provides the general basic services required by other DAL
# subcomponents.

-ifdef CONFIG_X86
-dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
-dml_ccflags := $(dml_ccflags-y) -msse
-endif
-
-ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float
-endif
-
-ifdef CONFIG_ARM64
-dml_rcflags := -mgeneral-regs-only
-endif
-
-ifdef CONFIG_LOONGARCH
-dml_ccflags := -mfpu=64
-dml_rcflags := -msoft-float
-endif
-
-ifdef CONFIG_CC_IS_GCC
-ifneq ($(call gcc-min-version, 70100),y)
-IS_OLD_GCC = 1
-endif
-endif
-
-ifdef CONFIG_X86
-ifdef IS_OLD_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-dml_ccflags += -mpreferred-stack-boundary=4
-else
-dml_ccflags += -msse2
-endif
-endif
+dml_ccflags := $(CC_FLAGS_FPU)
+dml_rcflags := $(CC_FLAGS_NO_FPU)

ifneq ($(CONFIG_FRAME_WARN),0)
ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y)
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index 7b51364084b5..4f6c804a26ad 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -24,40 +24,8 @@
#
# Makefile for dml2.

-ifdef CONFIG_X86
-dml2_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
-dml2_ccflags := $(dml2_ccflags-y) -msse
-endif
-
-ifdef CONFIG_PPC64
-dml2_ccflags := -mhard-float
-endif
-
-ifdef CONFIG_ARM64
-dml2_rcflags := -mgeneral-regs-only
-endif
-
-ifdef CONFIG_LOONGARCH
-dml2_ccflags := -mfpu=64
-dml2_rcflags := -msoft-float
-endif
-
-ifdef CONFIG_CC_IS_GCC
-ifeq ($(call cc-ifversion, -lt, 0701, y), y)
-IS_OLD_GCC = 1
-endif
-endif
-
-ifdef CONFIG_X86
-ifdef IS_OLD_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-dml2_ccflags += -mpreferred-stack-boundary=4
-else
-dml2_ccflags += -msse2
-endif
-endif
+dml2_ccflags := $(CC_FLAGS_FPU)
+dml2_rcflags := $(CC_FLAGS_NO_FPU)

ifneq ($(CONFIG_FRAME_WARN),0)
ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y)
--
2.42.0


2023-12-28 03:19:32

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v2 08/14] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

Samuel Holland <[email protected]> writes:
> PowerPC provides an equivalent to the common kernel-mode FPU API, but in
> a different header and using different function names. The PowerPC API
> also requires a non-preemptible context. Add a wrapper header, and
> export the CFLAGS adjustments.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Samuel Holland <[email protected]>
> ---
>
> (no changes since v1)
>
> arch/powerpc/Kconfig | 1 +
> arch/powerpc/Makefile | 5 ++++-
> arch/powerpc/include/asm/fpu.h | 28 ++++++++++++++++++++++++++++
> 3 files changed, 33 insertions(+), 1 deletion(-)
> create mode 100644 arch/powerpc/include/asm/fpu.h

Acked-by: Michael Ellerman <[email protected]> (powerpc)

cheers

2023-12-28 06:19:22

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 01/14] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT

Thanks for all the great documentation!

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>


2023-12-28 06:19:44

by Christoph Hellwig

[permalink] [raw]

2023-12-28 06:20:22

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 10/14] riscv: Add support for kernel-mode FPU

On Wed, Dec 27, 2023 at 05:42:00PM -0800, Samuel Holland wrote:
> This is motivated by the amdgpu DRM driver, which needs floating-point
> code to support recent hardware. That code is not performance-critical,
> so only provide a minimal non-preemptible implementation for now.
>
> Signed-off-by: Samuel Holland <[email protected]>

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2024-01-03 14:28:13

by Alex Deucher

[permalink] [raw]
Subject: Re: [PATCH v2 00/14] Unified cross-architecture kernel-mode FPU API

On Thu, Dec 28, 2023 at 5:11 AM Samuel Holland
<[email protected]> wrote:
>
> This series unifies the kernel-mode FPU API across several architectures
> by wrapping the existing functions (where needed) in consistently-named
> functions placed in a consistent header location, with mostly the same
> semantics: they can be called from preemptible or non-preemptible task
> context, and are not assumed to be reentrant. Architectures are also
> expected to provide CFLAGS adjustments for compiling FPU-dependent code.
> For the moment, SIMD/vector units are out of scope for this common API.
>
> This allows us to remove the ifdeffery and duplicated Makefile logic at
> each FPU user. It then implements the common API on RISC-V, and converts
> a couple of users to the new API: the AMDGPU DRM driver, and the FPU
> self test.
>
> The underlying goal of this series is to allow using newer AMD GPUs
> (e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those
> GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode
> FPU support.

Series is:
Acked-by: Alex Deucher <[email protected]>

>
> Previous versions:
> v1: https://lore.kernel.org/linux-kernel/[email protected]/
> v0: https://lore.kernel.org/linux-kernel/[email protected]/
>
> Changes in v2:
> - Add documentation explaining the built-time and runtime APIs
> - Add a linux/fpu.h header for generic isolation enforcement
> - Remove file name from header comment
> - Clean up arch/arm64/lib/Makefile, like for arch/arm
> - Remove RISC-V architecture-specific preprocessor check
> - Split altivec removal to a separate patch
> - Use linux/fpu.h instead of asm/fpu.h in consumers
> - Declare test_fpu() in a header
>
> Michael Ellerman (1):
> drm/amd/display: Only use hard-float, not altivec on powerpc
>
> Samuel Holland (13):
> arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
> ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
> arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
> lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
> LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> riscv: Add support for kernel-mode FPU
> drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
> selftests/fpu: Move FP code to a separate translation unit
> selftests/fpu: Allow building on other architectures
>
> Documentation/core-api/floating-point.rst | 78 +++++++++++++++++++
> Documentation/core-api/index.rst | 1 +
> Makefile | 5 ++
> arch/Kconfig | 6 ++
> arch/arm/Kconfig | 1 +
> arch/arm/Makefile | 7 ++
> arch/arm/include/asm/fpu.h | 15 ++++
> arch/arm/lib/Makefile | 3 +-
> arch/arm64/Kconfig | 1 +
> arch/arm64/Makefile | 9 ++-
> arch/arm64/include/asm/fpu.h | 15 ++++
> arch/arm64/lib/Makefile | 6 +-
> arch/loongarch/Kconfig | 1 +
> arch/loongarch/Makefile | 5 +-
> arch/loongarch/include/asm/fpu.h | 1 +
> arch/powerpc/Kconfig | 1 +
> arch/powerpc/Makefile | 5 +-
> arch/powerpc/include/asm/fpu.h | 28 +++++++
> arch/riscv/Kconfig | 1 +
> arch/riscv/Makefile | 3 +
> arch/riscv/include/asm/fpu.h | 16 ++++
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/kernel_mode_fpu.c | 28 +++++++
> arch/x86/Kconfig | 1 +
> arch/x86/Makefile | 20 +++++
> arch/x86/include/asm/fpu.h | 13 ++++
> drivers/gpu/drm/amd/display/Kconfig | 2 +-
> .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 35 +--------
> drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 +--------
> drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 +--------
> include/linux/fpu.h | 12 +++
> lib/Kconfig.debug | 2 +-
> lib/Makefile | 26 +------
> lib/raid6/Makefile | 31 ++------
> lib/test_fpu.h | 8 ++
> lib/{test_fpu.c => test_fpu_glue.c} | 37 ++-------
> lib/test_fpu_impl.c | 37 +++++++++
> 37 files changed, 343 insertions(+), 190 deletions(-)
> create mode 100644 Documentation/core-api/floating-point.rst
> create mode 100644 arch/arm/include/asm/fpu.h
> create mode 100644 arch/arm64/include/asm/fpu.h
> create mode 100644 arch/powerpc/include/asm/fpu.h
> create mode 100644 arch/riscv/include/asm/fpu.h
> create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c
> create mode 100644 arch/x86/include/asm/fpu.h
> create mode 100644 include/linux/fpu.h
> create mode 100644 lib/test_fpu.h
> rename lib/{test_fpu.c => test_fpu_glue.c} (71%)
> create mode 100644 lib/test_fpu_impl.c
>
> --
> 2.42.0
>

2024-01-04 09:57:31

by Huacai Chen

[permalink] [raw]
Subject: Re: [PATCH v2 07/14] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

Hi, Samuel,

On Thu, Dec 28, 2023 at 9:42 AM Samuel Holland
<[email protected]> wrote:
>
> LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
> asm/fpu.h, so it only needs to add kernel_fpu_available() and export
> the CFLAGS adjustments.
>
> Acked-by: WANG Xuerui <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Samuel Holland <[email protected]>
> ---
>
> (no changes since v1)
>
> arch/loongarch/Kconfig | 1 +
> arch/loongarch/Makefile | 5 ++++-
> arch/loongarch/include/asm/fpu.h | 1 +
> 3 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index ee123820a476..65d4475565b8 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -15,6 +15,7 @@ config LOONGARCH
> select ARCH_HAS_CPU_FINALIZE_INIT
> select ARCH_HAS_FORTIFY_SOURCE
> select ARCH_HAS_KCOV
> + select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU
> select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> select ARCH_HAS_PTE_SPECIAL
> diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
> index 4ba8d67ddb09..1afe28feaba5 100644
> --- a/arch/loongarch/Makefile
> +++ b/arch/loongarch/Makefile
> @@ -25,6 +25,9 @@ endif
> 32bit-emul = elf32loongarch
> 64bit-emul = elf64loongarch
>
> +CC_FLAGS_FPU := -mfpu=64
> +CC_FLAGS_NO_FPU := -msoft-float
We will add LoongArch32 support later, maybe it should be -mfpu=32 in
that case, and do other archs have the case that only support FP32?

Huacai

> +
> ifdef CONFIG_DYNAMIC_FTRACE
> KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
> CC_FLAGS_FTRACE := -fpatchable-function-entry=2
> @@ -46,7 +49,7 @@ ld-emul = $(64bit-emul)
> cflags-y += -mabi=lp64s
> endif
>
> -cflags-y += -pipe -msoft-float
> +cflags-y += -pipe $(CC_FLAGS_NO_FPU)
> LDFLAGS_vmlinux += -static -n -nostdlib
>
> # When the assembler supports explicit relocation hint, we must use it.
> diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h
> index c2d8962fda00..3177674228f8 100644
> --- a/arch/loongarch/include/asm/fpu.h
> +++ b/arch/loongarch/include/asm/fpu.h
> @@ -21,6 +21,7 @@
>
> struct sigcontext;
>
> +#define kernel_fpu_available() cpu_has_fpu
> extern void kernel_fpu_begin(void);
> extern void kernel_fpu_end(void);
>
> --
> 2.42.0
>
>

2024-01-04 15:59:19

by Samuel Holland

[permalink] [raw]
Subject: Re: [PATCH v2 07/14] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

Hi Huacai,

On 2024-01-04 3:55 AM, Huacai Chen wrote:
> Hi, Samuel,
>
> On Thu, Dec 28, 2023 at 9:42 AM Samuel Holland
> <[email protected]> wrote:
>>
>> LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
>> asm/fpu.h, so it only needs to add kernel_fpu_available() and export
>> the CFLAGS adjustments.
>>
>> Acked-by: WANG Xuerui <[email protected]>
>> Reviewed-by: Christoph Hellwig <[email protected]>
>> Signed-off-by: Samuel Holland <[email protected]>
>> ---
>>
>> (no changes since v1)
>>
>> arch/loongarch/Kconfig | 1 +
>> arch/loongarch/Makefile | 5 ++++-
>> arch/loongarch/include/asm/fpu.h | 1 +
>> 3 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
>> index ee123820a476..65d4475565b8 100644
>> --- a/arch/loongarch/Kconfig
>> +++ b/arch/loongarch/Kconfig
>> @@ -15,6 +15,7 @@ config LOONGARCH
>> select ARCH_HAS_CPU_FINALIZE_INIT
>> select ARCH_HAS_FORTIFY_SOURCE
>> select ARCH_HAS_KCOV
>> + select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU
>> select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
>> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
>> select ARCH_HAS_PTE_SPECIAL
>> diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
>> index 4ba8d67ddb09..1afe28feaba5 100644
>> --- a/arch/loongarch/Makefile
>> +++ b/arch/loongarch/Makefile
>> @@ -25,6 +25,9 @@ endif
>> 32bit-emul = elf32loongarch
>> 64bit-emul = elf64loongarch
>>
>> +CC_FLAGS_FPU := -mfpu=64
>> +CC_FLAGS_NO_FPU := -msoft-float
> We will add LoongArch32 support later, maybe it should be -mfpu=32 in
> that case, and do other archs have the case that only support FP32?

Do you mean that LoongArch32 does not support double-precision FP in hardware?
At least both of the consumers in this series use double-precision, so my first
thought is that LoongArch32 could not select ARCH_HAS_KERNEL_FPU_SUPPORT.

Regards,
Samuel


2024-01-07 02:39:26

by Huacai Chen

[permalink] [raw]
Subject: Re: [PATCH v2 07/14] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

On Thu, Jan 4, 2024 at 11:58 PM Samuel Holland
<[email protected]> wrote:
>
> Hi Huacai,
>
> On 2024-01-04 3:55 AM, Huacai Chen wrote:
> > Hi, Samuel,
> >
> > On Thu, Dec 28, 2023 at 9:42 AM Samuel Holland
> > <[email protected]> wrote:
> >>
> >> LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
> >> asm/fpu.h, so it only needs to add kernel_fpu_available() and export
> >> the CFLAGS adjustments.
> >>
> >> Acked-by: WANG Xuerui <[email protected]>
> >> Reviewed-by: Christoph Hellwig <[email protected]>
> >> Signed-off-by: Samuel Holland <[email protected]>
> >> ---
> >>
> >> (no changes since v1)
> >>
> >> arch/loongarch/Kconfig | 1 +
> >> arch/loongarch/Makefile | 5 ++++-
> >> arch/loongarch/include/asm/fpu.h | 1 +
> >> 3 files changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >> index ee123820a476..65d4475565b8 100644
> >> --- a/arch/loongarch/Kconfig
> >> +++ b/arch/loongarch/Kconfig
> >> @@ -15,6 +15,7 @@ config LOONGARCH
> >> select ARCH_HAS_CPU_FINALIZE_INIT
> >> select ARCH_HAS_FORTIFY_SOURCE
> >> select ARCH_HAS_KCOV
> >> + select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU
> >> select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
> >> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> >> select ARCH_HAS_PTE_SPECIAL
> >> diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
> >> index 4ba8d67ddb09..1afe28feaba5 100644
> >> --- a/arch/loongarch/Makefile
> >> +++ b/arch/loongarch/Makefile
> >> @@ -25,6 +25,9 @@ endif
> >> 32bit-emul = elf32loongarch
> >> 64bit-emul = elf64loongarch
> >>
> >> +CC_FLAGS_FPU := -mfpu=64
> >> +CC_FLAGS_NO_FPU := -msoft-float
> > We will add LoongArch32 support later, maybe it should be -mfpu=32 in
> > that case, and do other archs have the case that only support FP32?
>
> Do you mean that LoongArch32 does not support double-precision FP in hardware?
> At least both of the consumers in this series use double-precision, so my first
> thought is that LoongArch32 could not select ARCH_HAS_KERNEL_FPU_SUPPORT.
Then is it possible to introduce CC_FLAGS_SP_FPU and CC_FLAGS_DP_FPU?
I think there may be some place where SP FP is enough.

Huacai

>
> Regards,
> Samuel
>

2024-01-08 09:37:23

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 07/14] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

On Sun, Jan 07, 2024 at 10:39:07AM +0800, Huacai Chen wrote:
> > Do you mean that LoongArch32 does not support double-precision FP in hardware?
> > At least both of the consumers in this series use double-precision, so my first
> > thought is that LoongArch32 could not select ARCH_HAS_KERNEL_FPU_SUPPORT.
> Then is it possible to introduce CC_FLAGS_SP_FPU and CC_FLAGS_DP_FPU?
> I think there may be some place where SP FP is enough.

Let's defer that until it is actually neeed.

2024-01-10 14:51:49

by Palmer Dabbelt

[permalink] [raw]
Subject: Re: [PATCH v2 10/14] riscv: Add support for kernel-mode FPU

On Wed, 27 Dec 2023 17:42:00 PST (-0800), [email protected] wrote:
> This is motivated by the amdgpu DRM driver, which needs floating-point
> code to support recent hardware. That code is not performance-critical,
> so only provide a minimal non-preemptible implementation for now.
>
> Signed-off-by: Samuel Holland <[email protected]>
> ---
>
> Changes in v2:
> - Remove RISC-V architecture-specific preprocessor check
>
> arch/riscv/Kconfig | 1 +
> arch/riscv/Makefile | 3 +++
> arch/riscv/include/asm/fpu.h | 16 ++++++++++++++++
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/kernel_mode_fpu.c | 28 ++++++++++++++++++++++++++++
> 5 files changed, 49 insertions(+)
> create mode 100644 arch/riscv/include/asm/fpu.h
> create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 24c1799e2ec4..4d4d1d64ce34 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -27,6 +27,7 @@ config RISCV
> select ARCH_HAS_GCOV_PROFILE_ALL
> select ARCH_HAS_GIGANTIC_PAGE
> select ARCH_HAS_KCOV
> + select ARCH_HAS_KERNEL_FPU_SUPPORT if FPU
> select ARCH_HAS_MMIOWB
> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> select ARCH_HAS_PMEM_API
> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
> index a74be78678eb..2e719c369210 100644
> --- a/arch/riscv/Makefile
> +++ b/arch/riscv/Makefile
> @@ -81,6 +81,9 @@ KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64i
>
> KBUILD_AFLAGS += -march=$(riscv-march-y)
>
> +# For C code built with floating-point support, exclude V but keep F and D.
> +CC_FLAGS_FPU := -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/')
> +
> KBUILD_CFLAGS += -mno-save-restore
> KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET)
>
> diff --git a/arch/riscv/include/asm/fpu.h b/arch/riscv/include/asm/fpu.h
> new file mode 100644
> index 000000000000..91c04c244e12
> --- /dev/null
> +++ b/arch/riscv/include/asm/fpu.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2023 SiFive
> + */
> +
> +#ifndef _ASM_RISCV_FPU_H
> +#define _ASM_RISCV_FPU_H
> +
> +#include <asm/switch_to.h>
> +
> +#define kernel_fpu_available() has_fpu()
> +
> +void kernel_fpu_begin(void);
> +void kernel_fpu_end(void);
> +
> +#endif /* ! _ASM_RISCV_FPU_H */
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index fee22a3d1b53..662c483e338d 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -62,6 +62,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
>
> obj-$(CONFIG_RISCV_MISALIGNED) += traps_misaligned.o
> obj-$(CONFIG_FPU) += fpu.o
> +obj-$(CONFIG_FPU) += kernel_mode_fpu.o
> obj-$(CONFIG_RISCV_ISA_V) += vector.o
> obj-$(CONFIG_SMP) += smpboot.o
> obj-$(CONFIG_SMP) += smp.o
> diff --git a/arch/riscv/kernel/kernel_mode_fpu.c b/arch/riscv/kernel/kernel_mode_fpu.c
> new file mode 100644
> index 000000000000..0ac8348876c4
> --- /dev/null
> +++ b/arch/riscv/kernel/kernel_mode_fpu.c
> @@ -0,0 +1,28 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2023 SiFive
> + */
> +
> +#include <linux/export.h>
> +#include <linux/preempt.h>
> +
> +#include <asm/csr.h>
> +#include <asm/fpu.h>
> +#include <asm/processor.h>
> +#include <asm/switch_to.h>
> +
> +void kernel_fpu_begin(void)
> +{
> + preempt_disable();
> + fstate_save(current, task_pt_regs(current));
> + csr_set(CSR_SSTATUS, SR_FS);
> +}
> +EXPORT_SYMBOL_GPL(kernel_fpu_begin);
> +
> +void kernel_fpu_end(void)
> +{
> + csr_clear(CSR_SSTATUS, SR_FS);
> + fstate_restore(current, task_pt_regs(current));
> + preempt_enable();
> +}
> +EXPORT_SYMBOL_GPL(kernel_fpu_end);

Reviewed-by: Palmer Dabbelt <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>

assuming you want to keep these together -- it touches a lot of stuff,
so LMK if you want me to pick something up.

Thanks!