2024-03-29 07:24:59

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API

This series unifies the kernel-mode FPU API across several architectures
by wrapping the existing functions (where needed) in consistently-named
functions placed in a consistent header location, with mostly the same
semantics: they can be called from preemptible or non-preemptible task
context, and are not assumed to be reentrant. Architectures are also
expected to provide CFLAGS adjustments for compiling FPU-dependent code.
For the moment, SIMD/vector units are out of scope for this common API.

This allows us to remove the ifdeffery and duplicated Makefile logic at
each FPU user. It then implements the common API on RISC-V, and converts
a couple of users to the new API: the AMDGPU DRM driver, and the FPU
self test.

The underlying goal of this series is to allow using newer AMD GPUs
(e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those
GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode
FPU support.

Previous versions:
v3: https://lore.kernel.org/linux-kernel/[email protected]/
v2: https://lore.kernel.org/linux-kernel/[email protected]/
v1: https://lore.kernel.org/linux-kernel/[email protected]/
v0: https://lore.kernel.org/linux-kernel/[email protected]/

Changes in v4:
- Add missed CFLAGS changes for recov_neon_inner.c
(fixes arm build failures)
- Fix x86 include guard issue (fixes x86 build failures)

Changes in v3:
- Rebase on v6.9-rc1
- Limit riscv ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT

Changes in v2:
- Add documentation explaining the built-time and runtime APIs
- Add a linux/fpu.h header for generic isolation enforcement
- Remove file name from header comment
- Clean up arch/arm64/lib/Makefile, like for arch/arm
- Remove RISC-V architecture-specific preprocessor check
- Split altivec removal to a separate patch
- Use linux/fpu.h instead of asm/fpu.h in consumers
- Declare test_fpu() in a header

Michael Ellerman (1):
drm/amd/display: Only use hard-float, not altivec on powerpc

Samuel Holland (14):
arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
x86/fpu: Fix asm/fpu/types.h include guard
x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
riscv: Add support for kernel-mode FPU
drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
selftests/fpu: Move FP code to a separate translation unit
selftests/fpu: Allow building on other architectures

Documentation/core-api/floating-point.rst | 78 +++++++++++++++++++
Documentation/core-api/index.rst | 1 +
Makefile | 5 ++
arch/Kconfig | 6 ++
arch/arm/Kconfig | 1 +
arch/arm/Makefile | 7 ++
arch/arm/include/asm/fpu.h | 15 ++++
arch/arm/lib/Makefile | 3 +-
arch/arm64/Kconfig | 1 +
arch/arm64/Makefile | 9 ++-
arch/arm64/include/asm/fpu.h | 15 ++++
arch/arm64/lib/Makefile | 6 +-
arch/loongarch/Kconfig | 1 +
arch/loongarch/Makefile | 5 +-
arch/loongarch/include/asm/fpu.h | 1 +
arch/powerpc/Kconfig | 1 +
arch/powerpc/Makefile | 5 +-
arch/powerpc/include/asm/fpu.h | 28 +++++++
arch/riscv/Kconfig | 1 +
arch/riscv/Makefile | 3 +
arch/riscv/include/asm/fpu.h | 16 ++++
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/kernel_mode_fpu.c | 28 +++++++
arch/x86/Kconfig | 1 +
arch/x86/Makefile | 20 +++++
arch/x86/include/asm/fpu.h | 13 ++++
arch/x86/include/asm/fpu/types.h | 6 +-
drivers/gpu/drm/amd/display/Kconfig | 2 +-
.../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 35 +--------
drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 +--------
drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 +--------
include/linux/fpu.h | 12 +++
lib/Kconfig.debug | 2 +-
lib/Makefile | 26 +------
lib/raid6/Makefile | 33 +++-----
lib/test_fpu.h | 8 ++
lib/{test_fpu.c => test_fpu_glue.c} | 37 ++-------
lib/test_fpu_impl.c | 37 +++++++++
38 files changed, 348 insertions(+), 193 deletions(-)
create mode 100644 Documentation/core-api/floating-point.rst
create mode 100644 arch/arm/include/asm/fpu.h
create mode 100644 arch/arm64/include/asm/fpu.h
create mode 100644 arch/powerpc/include/asm/fpu.h
create mode 100644 arch/riscv/include/asm/fpu.h
create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c
create mode 100644 arch/x86/include/asm/fpu.h
create mode 100644 include/linux/fpu.h
create mode 100644 lib/test_fpu.h
rename lib/{test_fpu.c => test_fpu_glue.c} (71%)
create mode 100644 lib/test_fpu_impl.c

--
2.44.0



2024-03-29 07:25:11

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

ARM provides an equivalent to the common kernel-mode FPU API, but in a
different header and using different function names. Add a wrapper
header, and export CFLAGS adjustments as found in lib/raid6/Makefile.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v2)

Changes in v2:
- Remove file name from header comment

arch/arm/Kconfig | 1 +
arch/arm/Makefile | 7 +++++++
arch/arm/include/asm/fpu.h | 15 +++++++++++++++
3 files changed, 23 insertions(+)
create mode 100644 arch/arm/include/asm/fpu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b14aed3a17ab..b1751c2cab87 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -15,6 +15,7 @@ config ARM
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_KCOV
+ select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index d82908b1b1bb..71afdd98ddf2 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -130,6 +130,13 @@ endif
# Accept old syntax despite ".syntax unified"
AFLAGS_NOWARN :=$(call as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W)

+# The GCC option -ffreestanding is required in order to compile code containing
+# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
+CC_FLAGS_FPU := -ffreestanding
+# Enable <arm_neon.h>
+CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include)
+CC_FLAGS_FPU += -march=armv7-a -mfloat-abi=softfp -mfpu=neon
+
ifeq ($(CONFIG_THUMB2_KERNEL),y)
CFLAGS_ISA :=-Wa,-mimplicit-it=always $(AFLAGS_NOWARN)
AFLAGS_ISA :=$(CFLAGS_ISA) -Wa$(comma)-mthumb
diff --git a/arch/arm/include/asm/fpu.h b/arch/arm/include/asm/fpu.h
new file mode 100644
index 000000000000..2ae50bdce59b
--- /dev/null
+++ b/arch/arm/include/asm/fpu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef __ASM_FPU_H
+#define __ASM_FPU_H
+
+#include <asm/neon.h>
+
+#define kernel_fpu_available() cpu_has_neon()
+#define kernel_fpu_begin() kernel_neon_begin()
+#define kernel_fpu_end() kernel_neon_end()
+
+#endif /* ! __ASM_FPU_H */
--
2.44.0


2024-03-29 07:25:19

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 01/15] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT

Several architectures provide an API to enable the FPU and run
floating-point SIMD code in kernel space. However, the function names,
header locations, and semantics are inconsistent across architectures,
and FPU support may be gated behind other Kconfig options.

Provide a standard way for architectures to declare that kernel space
FPU support is available. Architectures selecting this option must
implement what is currently the most common API (kernel_fpu_begin() and
kernel_fpu_end(), plus a new function kernel_fpu_available()) and
provide the appropriate CFLAGS for compiling floating-point C code.

Suggested-by: Christoph Hellwig <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v2)

Changes in v2:
- Add documentation explaining the built-time and runtime APIs
- Add a linux/fpu.h header for generic isolation enforcement

Documentation/core-api/floating-point.rst | 78 +++++++++++++++++++++++
Documentation/core-api/index.rst | 1 +
Makefile | 5 ++
arch/Kconfig | 6 ++
include/linux/fpu.h | 12 ++++
5 files changed, 102 insertions(+)
create mode 100644 Documentation/core-api/floating-point.rst
create mode 100644 include/linux/fpu.h

diff --git a/Documentation/core-api/floating-point.rst b/Documentation/core-api/floating-point.rst
new file mode 100644
index 000000000000..a8d0d4b05052
--- /dev/null
+++ b/Documentation/core-api/floating-point.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+Floating-point API
+==================
+
+Kernel code is normally prohibited from using floating-point (FP) registers or
+instructions, including the C float and double data types. This rule reduces
+system call overhead, because the kernel does not need to save and restore the
+userspace floating-point register state.
+
+However, occasionally drivers or library functions may need to include FP code.
+This is supported by isolating the functions containing FP code to a separate
+translation unit (a separate source file), and saving/restoring the FP register
+state around calls to those functions. This creates "critical sections" of
+floating-point usage.
+
+The reason for this isolation is to prevent the compiler from generating code
+touching the FP registers outside these critical sections. Compilers sometimes
+use FP registers to optimize inlined ``memcpy`` or variable assignment, as
+floating-point registers may be wider than general-purpose registers.
+
+Usability of floating-point code within the kernel is architecture-specific.
+Additionally, because a single kernel may be configured to support platforms
+both with and without a floating-point unit, FPU availability must be checked
+both at build time and at run time.
+
+Several architectures implement the generic kernel floating-point API from
+``linux/fpu.h``, as described below. Some other architectures implement their
+own unique APIs, which are documented separately.
+
+Build-time API
+--------------
+
+Floating-point code may be built if the option ``ARCH_HAS_KERNEL_FPU_SUPPORT``
+is enabled. For C code, such code must be placed in a separate file, and that
+file must have its compilation flags adjusted using the following pattern::
+
+ CFLAGS_foo.o += $(CC_FLAGS_FPU)
+ CFLAGS_REMOVE_foo.o += $(CC_FLAGS_NO_FPU)
+
+Architectures are expected to define one or both of these variables in their
+top-level Makefile as needed. For example::
+
+ CC_FLAGS_FPU := -mhard-float
+
+or::
+
+ CC_FLAGS_NO_FPU := -msoft-float
+
+Normal kernel code is assumed to use the equivalent of ``CC_FLAGS_NO_FPU``.
+
+Runtime API
+-----------
+
+The runtime API is provided in ``linux/fpu.h``. This header cannot be included
+from files implementing FP code (those with their compilation flags adjusted as
+above). Instead, it must be included when defining the FP critical sections.
+
+.. c:function:: bool kernel_fpu_available( void )
+
+ This function reports if floating-point code can be used on this CPU or
+ platform. The value returned by this function is not expected to change
+ at runtime, so it only needs to be called once, not before every
+ critical section.
+
+.. c:function:: void kernel_fpu_begin( void )
+ void kernel_fpu_end( void )
+
+ These functions create a floating-point critical section. It is only
+ valid to call ``kernel_fpu_begin()`` after a previous call to
+ ``kernel_fpu_available()`` returned ``true``. These functions are only
+ guaranteed to be callable from (preemptible or non-preemptible) process
+ context.
+
+ Preemption may be disabled inside critical sections, so their size
+ should be minimized. They are *not* required to be reentrant. If the
+ caller expects to nest critical sections, it must implement its own
+ reference counting.
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 7a3a08d81f11..974beccd671f 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -48,6 +48,7 @@ Library functionality that is used throughout the kernel.
errseq
wrappers/atomic_t
wrappers/atomic_bitops
+ floating-point

Low level entry and exit
========================
diff --git a/Makefile b/Makefile
index 763b6792d3d5..710f65e4249d 100644
--- a/Makefile
+++ b/Makefile
@@ -964,6 +964,11 @@ KBUILD_CFLAGS += $(CC_FLAGS_CFI)
export CC_FLAGS_CFI
endif

+# Architectures can define flags to add/remove for floating-point support
+CC_FLAGS_FPU += -D_LINUX_FPU_COMPILATION_UNIT
+export CC_FLAGS_FPU
+export CC_FLAGS_NO_FPU
+
ifneq ($(CONFIG_FUNCTION_ALIGNMENT),0)
# Set the minimal function alignment. Use the newer GCC option
# -fmin-function-alignment if it is available, or fall back to -falign-funtions.
diff --git a/arch/Kconfig b/arch/Kconfig
index 9f066785bb71..8e34b3acf73d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1569,6 +1569,12 @@ config ARCH_HAS_NONLEAF_PMD_YOUNG
address translations. Page table walkers that clear the accessed bit
may use this capability to reduce their search space.

+config ARCH_HAS_KERNEL_FPU_SUPPORT
+ bool
+ help
+ Architectures that select this option can run floating-point code in
+ the kernel, as described in Documentation/core-api/floating-point.rst.
+
source "kernel/gcov/Kconfig"

source "scripts/gcc-plugins/Kconfig"
diff --git a/include/linux/fpu.h b/include/linux/fpu.h
new file mode 100644
index 000000000000..2fb63e22913b
--- /dev/null
+++ b/include/linux/fpu.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_FPU_H
+#define _LINUX_FPU_H
+
+#ifdef _LINUX_FPU_COMPILATION_UNIT
+#error FP code must be compiled separately. See Documentation/core-api/floating-point.rst.
+#endif
+
+#include <asm/fpu.h>
+
+#endif
--
2.44.0


2024-03-29 07:25:24

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 03/15] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS

Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v1)

arch/arm/lib/Makefile | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 650404be6768..0ca5aae1bcc3 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -40,8 +40,7 @@ $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
$(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S

ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
- NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
- CFLAGS_xor-neon.o += $(NEON_FLAGS)
+ CFLAGS_xor-neon.o += $(CC_FLAGS_FPU)
obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
endif

--
2.44.0


2024-03-29 07:26:05

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 05/15] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS

Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v2)

Changes in v2:
- New patch for v2

arch/arm64/lib/Makefile | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 29490be2546b..13e6a2829116 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -7,10 +7,8 @@ lib-y := clear_user.o delay.o copy_from_user.o \

ifeq ($(CONFIG_KERNEL_MODE_NEON), y)
obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
-CFLAGS_REMOVE_xor-neon.o += -mgeneral-regs-only
-CFLAGS_xor-neon.o += -ffreestanding
-# Enable <arm_neon.h>
-CFLAGS_xor-neon.o += -isystem $(shell $(CC) -print-file-name=include)
+CFLAGS_xor-neon.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_xor-neon.o += $(CC_FLAGS_NO_FPU)
endif

lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
--
2.44.0


2024-03-29 07:26:11

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 06/15] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS

Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

Changes in v4:
- Add missed CFLAGS changes for recov_neon_inner.c
(fixes arm build failures)

lib/raid6/Makefile | 33 ++++++++++-----------------------
1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index 385a94aa0b99..0e88bfe6445b 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -33,25 +33,6 @@ CFLAGS_REMOVE_vpermxor8.o += -msoft-float
endif
endif

-# The GCC option -ffreestanding is required in order to compile code containing
-# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
-ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
-NEON_FLAGS := -ffreestanding
-# Enable <arm_neon.h>
-NEON_FLAGS += -isystem $(shell $(CC) -print-file-name=include)
-ifeq ($(ARCH),arm)
-NEON_FLAGS += -march=armv7-a -mfloat-abi=softfp -mfpu=neon
-endif
-CFLAGS_recov_neon_inner.o += $(NEON_FLAGS)
-ifeq ($(ARCH),arm64)
-CFLAGS_REMOVE_recov_neon_inner.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon1.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon2.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon4.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon8.o += -mgeneral-regs-only
-endif
-endif
-
quiet_cmd_unroll = UNROLL $@
cmd_unroll = $(AWK) -v N=$* -f $(srctree)/$(src)/unroll.awk < $< > $@

@@ -75,10 +56,16 @@ targets += vpermxor1.c vpermxor2.c vpermxor4.c vpermxor8.c
$(obj)/vpermxor%.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
$(call if_changed,unroll)

-CFLAGS_neon1.o += $(NEON_FLAGS)
-CFLAGS_neon2.o += $(NEON_FLAGS)
-CFLAGS_neon4.o += $(NEON_FLAGS)
-CFLAGS_neon8.o += $(NEON_FLAGS)
+CFLAGS_neon1.o += $(CC_FLAGS_FPU)
+CFLAGS_neon2.o += $(CC_FLAGS_FPU)
+CFLAGS_neon4.o += $(CC_FLAGS_FPU)
+CFLAGS_neon8.o += $(CC_FLAGS_FPU)
+CFLAGS_recov_neon_inner.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon4.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_recov_neon_inner.o += $(CC_FLAGS_NO_FPU)
targets += neon1.c neon2.c neon4.c neon8.c
$(obj)/neon%.c: $(src)/neon.uc $(src)/unroll.awk FORCE
$(call if_changed,unroll)
--
2.44.0


2024-03-29 07:26:23

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 07/15] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
asm/fpu.h, so it only needs to add kernel_fpu_available() and export
the CFLAGS adjustments.

Acked-by: WANG Xuerui <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v3)

Changes in v3:
- Rebase on v6.9-rc1

arch/loongarch/Kconfig | 1 +
arch/loongarch/Makefile | 5 ++++-
arch/loongarch/include/asm/fpu.h | 1 +
3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index a5f300ec6f28..2266c6c41c38 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -18,6 +18,7 @@ config LOONGARCH
select ARCH_HAS_CURRENT_STACK_POINTER
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_KCOV
+ select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index df6caf79537a..efb5440a43ec 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -26,6 +26,9 @@ endif
32bit-emul = elf32loongarch
64bit-emul = elf64loongarch

+CC_FLAGS_FPU := -mfpu=64
+CC_FLAGS_NO_FPU := -msoft-float
+
ifdef CONFIG_UNWINDER_ORC
orc_hash_h := arch/$(SRCARCH)/include/generated/asm/orc_hash.h
orc_hash_sh := $(srctree)/scripts/orc_hash.sh
@@ -59,7 +62,7 @@ ld-emul = $(64bit-emul)
cflags-y += -mabi=lp64s
endif

-cflags-y += -pipe -msoft-float
+cflags-y += -pipe $(CC_FLAGS_NO_FPU)
LDFLAGS_vmlinux += -static -n -nostdlib

# When the assembler supports explicit relocation hint, we must use it.
diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h
index c2d8962fda00..3177674228f8 100644
--- a/arch/loongarch/include/asm/fpu.h
+++ b/arch/loongarch/include/asm/fpu.h
@@ -21,6 +21,7 @@

struct sigcontext;

+#define kernel_fpu_available() cpu_has_fpu
extern void kernel_fpu_begin(void);
extern void kernel_fpu_end(void);

--
2.44.0


2024-03-29 07:26:25

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 04/15] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

arm64 provides an equivalent to the common kernel-mode FPU API, but in a
different header and using different function names. Add a wrapper
header, and export CFLAGS adjustments as found in lib/raid6/Makefile.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v2)

Changes in v2:
- Remove file name from header comment

arch/arm64/Kconfig | 1 +
arch/arm64/Makefile | 9 ++++++++-
arch/arm64/include/asm/fpu.h | 15 +++++++++++++++
3 files changed, 24 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/include/asm/fpu.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..67f0d3b5b7df 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -30,6 +30,7 @@ config ARM64
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
+ select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 0e075d3c546b..3e863e5b0169 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -36,7 +36,14 @@ ifeq ($(CONFIG_BROKEN_GAS_INST),y)
$(warning Detected assembler with broken .inst; disassembly will be unreliable)
endif

-KBUILD_CFLAGS += -mgeneral-regs-only \
+# The GCC option -ffreestanding is required in order to compile code containing
+# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
+CC_FLAGS_FPU := -ffreestanding
+# Enable <arm_neon.h>
+CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include)
+CC_FLAGS_NO_FPU := -mgeneral-regs-only
+
+KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU) \
$(compat_vdso) $(cc_has_k_constraint)
KBUILD_CFLAGS += $(call cc-disable-warning, psabi)
KBUILD_AFLAGS += $(compat_vdso)
diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h
new file mode 100644
index 000000000000..2ae50bdce59b
--- /dev/null
+++ b/arch/arm64/include/asm/fpu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef __ASM_FPU_H
+#define __ASM_FPU_H
+
+#include <asm/neon.h>
+
+#define kernel_fpu_available() cpu_has_neon()
+#define kernel_fpu_begin() kernel_neon_begin()
+#define kernel_fpu_end() kernel_neon_end()
+
+#endif /* ! __ASM_FPU_H */
--
2.44.0


2024-03-29 07:26:47

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 08/15] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

PowerPC provides an equivalent to the common kernel-mode FPU API, but in
a different header and using different function names. The PowerPC API
also requires a non-preemptible context. Add a wrapper header, and
export the CFLAGS adjustments.

Acked-by: Michael Ellerman <[email protected]> (powerpc)
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v1)

arch/powerpc/Kconfig | 1 +
arch/powerpc/Makefile | 5 ++++-
arch/powerpc/include/asm/fpu.h | 28 ++++++++++++++++++++++++++++
3 files changed, 33 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/include/asm/fpu.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..c42a57b6839d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -137,6 +137,7 @@ config PPC
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_HUGEPD if HUGETLB_PAGE
select ARCH_HAS_KCOV
+ select ARCH_HAS_KERNEL_FPU_SUPPORT if PPC_FPU
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MEMREMAP_COMPAT_ALIGN if PPC_64S_HASH_MMU
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 65261cbe5bfd..93d89f055b70 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -153,6 +153,9 @@ CFLAGS-$(CONFIG_PPC32) += $(call cc-option, $(MULTIPLEWORD))

CFLAGS-$(CONFIG_PPC32) += $(call cc-option,-mno-readonly-in-sdata)

+CC_FLAGS_FPU := $(call cc-option,-mhard-float)
+CC_FLAGS_NO_FPU := $(call cc-option,-msoft-float)
+
ifdef CONFIG_FUNCTION_TRACER
ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
@@ -174,7 +177,7 @@ asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1)

KBUILD_CPPFLAGS += -I $(srctree)/arch/powerpc $(asinstr)
KBUILD_AFLAGS += $(AFLAGS-y)
-KBUILD_CFLAGS += $(call cc-option,-msoft-float)
+KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU)
KBUILD_CFLAGS += $(CFLAGS-y)
CPP = $(CC) -E $(KBUILD_CFLAGS)

diff --git a/arch/powerpc/include/asm/fpu.h b/arch/powerpc/include/asm/fpu.h
new file mode 100644
index 000000000000..ca584e4bc40f
--- /dev/null
+++ b/arch/powerpc/include/asm/fpu.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_POWERPC_FPU_H
+#define _ASM_POWERPC_FPU_H
+
+#include <linux/preempt.h>
+
+#include <asm/cpu_has_feature.h>
+#include <asm/switch_to.h>
+
+#define kernel_fpu_available() (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+
+static inline void kernel_fpu_begin(void)
+{
+ preempt_disable();
+ enable_kernel_fp();
+}
+
+static inline void kernel_fpu_end(void)
+{
+ disable_kernel_fp();
+ preempt_enable();
+}
+
+#endif /* ! _ASM_POWERPC_FPU_H */
--
2.44.0


2024-03-29 07:27:24

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 11/15] riscv: Add support for kernel-mode FPU

This is motivated by the amdgpu DRM driver, which needs floating-point
code to support recent hardware. That code is not performance-critical,
so only provide a minimal non-preemptible implementation for now.

Support is limited to riscv64 because riscv32 requires runtime (libgcc)
assistance to convert between doubles and 64-bit integers.

Acked-by: Palmer Dabbelt <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v3)

Changes in v3:
- Limit riscv ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT

Changes in v2:
- Remove RISC-V architecture-specific preprocessor check

arch/riscv/Kconfig | 1 +
arch/riscv/Makefile | 3 +++
arch/riscv/include/asm/fpu.h | 16 ++++++++++++++++
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/kernel_mode_fpu.c | 28 ++++++++++++++++++++++++++++
5 files changed, 49 insertions(+)
create mode 100644 arch/riscv/include/asm/fpu.h
create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index be09c8836d56..3bcd0d250810 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -27,6 +27,7 @@ config RISCV
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
+ select ARCH_HAS_KERNEL_FPU_SUPPORT if 64BIT && FPU
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MMIOWB
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 252d63942f34..76ff4033c854 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -84,6 +84,9 @@ KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64i

KBUILD_AFLAGS += -march=$(riscv-march-y)

+# For C code built with floating-point support, exclude V but keep F and D.
+CC_FLAGS_FPU := -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/')
+
KBUILD_CFLAGS += -mno-save-restore
KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET)

diff --git a/arch/riscv/include/asm/fpu.h b/arch/riscv/include/asm/fpu.h
new file mode 100644
index 000000000000..91c04c244e12
--- /dev/null
+++ b/arch/riscv/include/asm/fpu.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_RISCV_FPU_H
+#define _ASM_RISCV_FPU_H
+
+#include <asm/switch_to.h>
+
+#define kernel_fpu_available() has_fpu()
+
+void kernel_fpu_begin(void);
+void kernel_fpu_end(void);
+
+#endif /* ! _ASM_RISCV_FPU_H */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 81d94a8ee10f..5b243d46f4b1 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_RISCV_MISALIGNED) += unaligned_access_speed.o
obj-$(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS) += copy-unaligned.o

obj-$(CONFIG_FPU) += fpu.o
+obj-$(CONFIG_FPU) += kernel_mode_fpu.o
obj-$(CONFIG_RISCV_ISA_V) += vector.o
obj-$(CONFIG_RISCV_ISA_V) += kernel_mode_vector.o
obj-$(CONFIG_SMP) += smpboot.o
diff --git a/arch/riscv/kernel/kernel_mode_fpu.c b/arch/riscv/kernel/kernel_mode_fpu.c
new file mode 100644
index 000000000000..0ac8348876c4
--- /dev/null
+++ b/arch/riscv/kernel/kernel_mode_fpu.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#include <linux/export.h>
+#include <linux/preempt.h>
+
+#include <asm/csr.h>
+#include <asm/fpu.h>
+#include <asm/processor.h>
+#include <asm/switch_to.h>
+
+void kernel_fpu_begin(void)
+{
+ preempt_disable();
+ fstate_save(current, task_pt_regs(current));
+ csr_set(CSR_SSTATUS, SR_FS);
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+
+void kernel_fpu_end(void)
+{
+ csr_clear(CSR_SSTATUS, SR_FS);
+ fstate_restore(current, task_pt_regs(current));
+ preempt_enable();
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_end);
--
2.44.0


2024-03-29 07:27:36

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 12/15] drm/amd/display: Only use hard-float, not altivec on powerpc

From: Michael Ellerman <[email protected]>

The compiler flags enable altivec, but that is not required; hard-float
is sufficient for the code to build and function.

Drop altivec from the compiler flags and adjust the enable/disable code
to only enable FPU use.

Signed-off-by: Michael Ellerman <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v2)

Changes in v2:
- New patch for v2

drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 12 ++----------
drivers/gpu/drm/amd/display/dc/dml/Makefile | 2 +-
drivers/gpu/drm/amd/display/dc/dml2/Makefile | 2 +-
3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 4ae4720535a5..0de16796466b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int line)
#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_begin();
#elif defined(CONFIG_PPC64)
- if (cpu_has_feature(CPU_FTR_VSX_COMP))
- enable_kernel_vsx();
- else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
- enable_kernel_altivec();
- else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+ if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
enable_kernel_fp();
#elif defined(CONFIG_ARM64)
kernel_neon_begin();
@@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int line)
#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_end();
#elif defined(CONFIG_PPC64)
- if (cpu_has_feature(CPU_FTR_VSX_COMP))
- disable_kernel_vsx();
- else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
- disable_kernel_altivec();
- else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+ if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
disable_kernel_fp();
#elif defined(CONFIG_ARM64)
kernel_neon_end();
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index c4a5efd2dda5..59d3972341d2 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -31,7 +31,7 @@ dml_ccflags := $(dml_ccflags-y) -msse
endif

ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float -maltivec
+dml_ccflags := -mhard-float
endif

ifdef CONFIG_ARM64
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index acff3449b8d7..7b51364084b5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -30,7 +30,7 @@ dml2_ccflags := $(dml2_ccflags-y) -msse
endif

ifdef CONFIG_PPC64
-dml2_ccflags := -mhard-float -maltivec
+dml2_ccflags := -mhard-float
endif

ifdef CONFIG_ARM64
--
2.44.0


2024-03-29 07:28:03

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 14/15] selftests/fpu: Move FP code to a separate translation unit

This ensures no compiler-generated floating-point code can appear
outside kernel_fpu_{begin,end}() sections, and some architectures
enforce this separation.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v2)

Changes in v2:
- Declare test_fpu() in a header

lib/Makefile | 3 ++-
lib/test_fpu.h | 8 +++++++
lib/{test_fpu.c => test_fpu_glue.c} | 32 +------------------------
lib/test_fpu_impl.c | 37 +++++++++++++++++++++++++++++
4 files changed, 48 insertions(+), 32 deletions(-)
create mode 100644 lib/test_fpu.h
rename lib/{test_fpu.c => test_fpu_glue.c} (71%)
create mode 100644 lib/test_fpu_impl.c

diff --git a/lib/Makefile b/lib/Makefile
index ffc6b2341b45..fcb35bf50979 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -133,7 +133,8 @@ FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-st
endif

obj-$(CONFIG_TEST_FPU) += test_fpu.o
-CFLAGS_test_fpu.o += $(FPU_CFLAGS)
+test_fpu-y := test_fpu_glue.o test_fpu_impl.o
+CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS)

# Some KUnit files (hooks.o) need to be built-in even when KUnit is a module,
# so we can't just use obj-$(CONFIG_KUNIT).
diff --git a/lib/test_fpu.h b/lib/test_fpu.h
new file mode 100644
index 000000000000..4459807084bc
--- /dev/null
+++ b/lib/test_fpu.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _LIB_TEST_FPU_H
+#define _LIB_TEST_FPU_H
+
+int test_fpu(void);
+
+#endif
diff --git a/lib/test_fpu.c b/lib/test_fpu_glue.c
similarity index 71%
rename from lib/test_fpu.c
rename to lib/test_fpu_glue.c
index e82db19fed84..85963d7be826 100644
--- a/lib/test_fpu.c
+++ b/lib/test_fpu_glue.c
@@ -19,37 +19,7 @@
#include <linux/debugfs.h>
#include <asm/fpu/api.h>

-static int test_fpu(void)
-{
- /*
- * This sequence of operations tests that rounding mode is
- * to nearest and that denormal numbers are supported.
- * Volatile variables are used to avoid compiler optimizing
- * the calculations away.
- */
- volatile double a, b, c, d, e, f, g;
-
- a = 4.0;
- b = 1e-15;
- c = 1e-310;
-
- /* Sets precision flag */
- d = a + b;
-
- /* Result depends on rounding mode */
- e = a + b / 2;
-
- /* Denormal and very large values */
- f = b / c;
-
- /* Depends on denormal support */
- g = a + c * f;
-
- if (d > a && e > a && g > a)
- return 0;
- else
- return -EINVAL;
-}
+#include "test_fpu.h"

static int test_fpu_get(void *data, u64 *val)
{
diff --git a/lib/test_fpu_impl.c b/lib/test_fpu_impl.c
new file mode 100644
index 000000000000..777894dbbe86
--- /dev/null
+++ b/lib/test_fpu_impl.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <linux/errno.h>
+
+#include "test_fpu.h"
+
+int test_fpu(void)
+{
+ /*
+ * This sequence of operations tests that rounding mode is
+ * to nearest and that denormal numbers are supported.
+ * Volatile variables are used to avoid compiler optimizing
+ * the calculations away.
+ */
+ volatile double a, b, c, d, e, f, g;
+
+ a = 4.0;
+ b = 1e-15;
+ c = 1e-310;
+
+ /* Sets precision flag */
+ d = a + b;
+
+ /* Result depends on rounding mode */
+ e = a + b / 2;
+
+ /* Denormal and very large values */
+ f = b / c;
+
+ /* Depends on denormal support */
+ g = a + c * f;
+
+ if (d > a && e > a && g > a)
+ return 0;
+ else
+ return -EINVAL;
+}
--
2.44.0


2024-03-29 07:28:15

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 15/15] selftests/fpu: Allow building on other architectures

Now that ARCH_HAS_KERNEL_FPU_SUPPORT provides a common way to compile
and run floating-point code, this test is no longer x86-specific.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v1)

lib/Kconfig.debug | 2 +-
lib/Makefile | 25 ++-----------------------
lib/test_fpu_glue.c | 5 ++++-
3 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c63a5fbf1f1c..f93e778e0405 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2890,7 +2890,7 @@ config TEST_FREE_PAGES

config TEST_FPU
tristate "Test floating point operations in kernel space"
- depends on X86 && !KCOV_INSTRUMENT_ALL
+ depends on ARCH_HAS_KERNEL_FPU_SUPPORT && !KCOV_INSTRUMENT_ALL
help
Enable this option to add /sys/kernel/debug/selftest_helpers/test_fpu
which will trigger a sequence of floating point operations. This is used
diff --git a/lib/Makefile b/lib/Makefile
index fcb35bf50979..e44ad11f77b5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -110,31 +110,10 @@ CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE)
obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o
obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o

-#
-# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns
-# off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS
-# get appended last to CFLAGS and thus override those previous compiler options.
-#
-FPU_CFLAGS := -msse -msse2
-ifdef CONFIG_CC_IS_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
-#
-# The "-msse" in the first argument is there so that the
-# -mpreferred-stack-boundary=3 build error:
-#
-# -mpreferred-stack-boundary=3 is not between 4 and 12
-#
-# can be triggered. Otherwise gcc doesn't complain.
-FPU_CFLAGS += -mhard-float
-FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
-endif
-
obj-$(CONFIG_TEST_FPU) += test_fpu.o
test_fpu-y := test_fpu_glue.o test_fpu_impl.o
-CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS)
+CFLAGS_test_fpu_impl.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_test_fpu_impl.o += $(CC_FLAGS_NO_FPU)

# Some KUnit files (hooks.o) need to be built-in even when KUnit is a module,
# so we can't just use obj-$(CONFIG_KUNIT).
diff --git a/lib/test_fpu_glue.c b/lib/test_fpu_glue.c
index 85963d7be826..eef282a2715f 100644
--- a/lib/test_fpu_glue.c
+++ b/lib/test_fpu_glue.c
@@ -17,7 +17,7 @@
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/debugfs.h>
-#include <asm/fpu/api.h>
+#include <linux/fpu.h>

#include "test_fpu.h"

@@ -38,6 +38,9 @@ static struct dentry *selftest_dir;

static int __init test_fpu_init(void)
{
+ if (!kernel_fpu_available())
+ return -EINVAL;
+
selftest_dir = debugfs_create_dir("selftest_helpers", NULL);
if (!selftest_dir)
return -ENOMEM;
--
2.44.0


2024-03-29 07:30:09

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 09/15] x86/fpu: Fix asm/fpu/types.h include guard

The include guard should match the filename, or it will conflict with
the newly-added asm/fpu.h.

Signed-off-by: Samuel Holland <[email protected]>
---

Changes in v4:
- New patch for v4

arch/x86/include/asm/fpu/types.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index ace9aa3b78a3..eb17f31b06d2 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -2,8 +2,8 @@
/*
* FPU data structures:
*/
-#ifndef _ASM_X86_FPU_H
-#define _ASM_X86_FPU_H
+#ifndef _ASM_X86_FPU_TYPES_H
+#define _ASM_X86_FPU_TYPES_H

#include <asm/page_types.h>

@@ -596,4 +596,4 @@ struct fpu_state_config {
/* FPU state configuration information */
extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg;

-#endif /* _ASM_X86_FPU_H */
+#endif /* _ASM_X86_FPU_TYPES_H */
--
2.44.0


2024-03-29 07:30:32

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

x86 already provides kernel_fpu_begin() and kernel_fpu_end(), but in a
different header. Add a wrapper header, and export the CFLAGS
adjustments as found in lib/Makefile.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v1)

arch/x86/Kconfig | 1 +
arch/x86/Makefile | 20 ++++++++++++++++++++
arch/x86/include/asm/fpu.h | 13 +++++++++++++
3 files changed, 34 insertions(+)
create mode 100644 arch/x86/include/asm/fpu.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 39886bab943a..7c9d032ee675 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -83,6 +83,7 @@ config X86
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_KCOV if X86_64
+ select ARCH_HAS_KERNEL_FPU_SUPPORT
select ARCH_HAS_MEM_ENCRYPT
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 662d9d4033e6..5a5f5999c505 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -74,6 +74,26 @@ KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
KBUILD_RUSTFLAGS += --target=$(objtree)/scripts/target.json
KBUILD_RUSTFLAGS += -Ctarget-feature=-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2

+#
+# CFLAGS for compiling floating point code inside the kernel.
+#
+CC_FLAGS_FPU := -msse -msse2
+ifdef CONFIG_CC_IS_GCC
+# Stack alignment mismatch, proceed with caution.
+# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
+# (8B stack alignment).
+# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
+#
+# The "-msse" in the first argument is there so that the
+# -mpreferred-stack-boundary=3 build error:
+#
+# -mpreferred-stack-boundary=3 is not between 4 and 12
+#
+# can be triggered. Otherwise gcc doesn't complain.
+CC_FLAGS_FPU += -mhard-float
+CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
+endif
+
ifeq ($(CONFIG_X86_KERNEL_IBT),y)
#
# Kernel IBT has S_CET.NOTRACK_EN=0, as such the compilers must not generate
diff --git a/arch/x86/include/asm/fpu.h b/arch/x86/include/asm/fpu.h
new file mode 100644
index 000000000000..b2743fe19339
--- /dev/null
+++ b/arch/x86/include/asm/fpu.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_X86_FPU_H
+#define _ASM_X86_FPU_H
+
+#include <asm/fpu/api.h>
+
+#define kernel_fpu_available() true
+
+#endif /* ! _ASM_X86_FPU_H */
--
2.44.0


2024-03-29 07:31:38

by Samuel Holland

[permalink] [raw]
Subject: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

Now that all previously-supported architectures select
ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
of the existing list of architectures. It can also take advantage of the
common kernel-mode FPU API and method of adjusting CFLAGS.

Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
---

(no changes since v2)

Changes in v2:
- Split altivec removal to a separate patch
- Use linux/fpu.h instead of asm/fpu.h in consumers

drivers/gpu/drm/amd/display/Kconfig | 2 +-
.../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 27 ++------------
drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 ++-----------------
drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 ++-----------------
4 files changed, 7 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
index 901d1961b739..5fcd4f778dc3 100644
--- a/drivers/gpu/drm/amd/display/Kconfig
+++ b/drivers/gpu/drm/amd/display/Kconfig
@@ -8,7 +8,7 @@ config DRM_AMD_DC
depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64
select SND_HDA_COMPONENT if SND_HDA_CORE
# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
- select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
+ select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG)
help
Choose this option if you want to use the new display engine
support for AMDGPU. This adds required support for Vega and
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 0de16796466b..e46f8ce41d87 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -26,16 +26,7 @@

#include "dc_trace.h"

-#if defined(CONFIG_X86)
-#include <asm/fpu/api.h>
-#elif defined(CONFIG_PPC64)
-#include <asm/switch_to.h>
-#include <asm/cputable.h>
-#elif defined(CONFIG_ARM64)
-#include <asm/neon.h>
-#elif defined(CONFIG_LOONGARCH)
-#include <asm/fpu.h>
-#endif
+#include <linux/fpu.h>

/**
* DOC: DC FPU manipulation overview
@@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line)
WARN_ON_ONCE(!in_task());
preempt_disable();
depth = __this_cpu_inc_return(fpu_recursion_depth);
-
if (depth == 1) {
-#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
+ BUG_ON(!kernel_fpu_available());
kernel_fpu_begin();
-#elif defined(CONFIG_PPC64)
- if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
- enable_kernel_fp();
-#elif defined(CONFIG_ARM64)
- kernel_neon_begin();
-#endif
}

TRACE_DCN_FPU(true, function_name, line, depth);
@@ -118,14 +102,7 @@ void dc_fpu_end(const char *function_name, const int line)

depth = __this_cpu_dec_return(fpu_recursion_depth);
if (depth == 0) {
-#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_end();
-#elif defined(CONFIG_PPC64)
- if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
- disable_kernel_fp();
-#elif defined(CONFIG_ARM64)
- kernel_neon_end();
-#endif
} else {
WARN_ON_ONCE(depth < 0);
}
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index 59d3972341d2..a94b6d546cd1 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -25,40 +25,8 @@
# It provides the general basic services required by other DAL
# subcomponents.

-ifdef CONFIG_X86
-dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
-dml_ccflags := $(dml_ccflags-y) -msse
-endif
-
-ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float
-endif
-
-ifdef CONFIG_ARM64
-dml_rcflags := -mgeneral-regs-only
-endif
-
-ifdef CONFIG_LOONGARCH
-dml_ccflags := -mfpu=64
-dml_rcflags := -msoft-float
-endif
-
-ifdef CONFIG_CC_IS_GCC
-ifneq ($(call gcc-min-version, 70100),y)
-IS_OLD_GCC = 1
-endif
-endif
-
-ifdef CONFIG_X86
-ifdef IS_OLD_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-dml_ccflags += -mpreferred-stack-boundary=4
-else
-dml_ccflags += -msse2
-endif
-endif
+dml_ccflags := $(CC_FLAGS_FPU)
+dml_rcflags := $(CC_FLAGS_NO_FPU)

ifneq ($(CONFIG_FRAME_WARN),0)
ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y)
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index 7b51364084b5..4f6c804a26ad 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -24,40 +24,8 @@
#
# Makefile for dml2.

-ifdef CONFIG_X86
-dml2_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
-dml2_ccflags := $(dml2_ccflags-y) -msse
-endif
-
-ifdef CONFIG_PPC64
-dml2_ccflags := -mhard-float
-endif
-
-ifdef CONFIG_ARM64
-dml2_rcflags := -mgeneral-regs-only
-endif
-
-ifdef CONFIG_LOONGARCH
-dml2_ccflags := -mfpu=64
-dml2_rcflags := -msoft-float
-endif
-
-ifdef CONFIG_CC_IS_GCC
-ifeq ($(call cc-ifversion, -lt, 0701, y), y)
-IS_OLD_GCC = 1
-endif
-endif
-
-ifdef CONFIG_X86
-ifdef IS_OLD_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-dml2_ccflags += -mpreferred-stack-boundary=4
-else
-dml2_ccflags += -msse2
-endif
-endif
+dml2_ccflags := $(CC_FLAGS_FPU)
+dml2_rcflags := $(CC_FLAGS_NO_FPU)

ifneq ($(CONFIG_FRAME_WARN),0)
ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y)
--
2.44.0


2024-03-29 17:30:13

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

On 3/29/24 00:18, Samuel Holland wrote:
> +#
> +# CFLAGS for compiling floating point code inside the kernel.
> +#
> +CC_FLAGS_FPU := -msse -msse2
> +ifdef CONFIG_CC_IS_GCC
> +# Stack alignment mismatch, proceed with caution.
> +# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
> +# (8B stack alignment).
> +# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
> +#
> +# The "-msse" in the first argument is there so that the
> +# -mpreferred-stack-boundary=3 build error:
> +#
> +# -mpreferred-stack-boundary=3 is not between 4 and 12
> +#
> +# can be triggered. Otherwise gcc doesn't complain.
> +CC_FLAGS_FPU += -mhard-float
> +CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
> +endif

I was expecting to see this (now duplicate) hunk come _out_ of
lib/Makefile somewhere in the series.

Did I miss that, or is there something keeping the duplicate there?

2024-03-29 17:30:28

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v4 09/15] x86/fpu: Fix asm/fpu/types.h include guard

On 3/29/24 00:18, Samuel Holland wrote:
> The include guard should match the filename, or it will conflict with
> the newly-added asm/fpu.h.

Acked-by: Dave Hansen <[email protected]>

2024-03-29 18:03:14

by Samuel Holland

[permalink] [raw]
Subject: Re: [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

On 2024-03-29 12:28 PM, Dave Hansen wrote:
> On 3/29/24 00:18, Samuel Holland wrote:
>> +#
>> +# CFLAGS for compiling floating point code inside the kernel.
>> +#
>> +CC_FLAGS_FPU := -msse -msse2
>> +ifdef CONFIG_CC_IS_GCC
>> +# Stack alignment mismatch, proceed with caution.
>> +# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
>> +# (8B stack alignment).
>> +# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
>> +#
>> +# The "-msse" in the first argument is there so that the
>> +# -mpreferred-stack-boundary=3 build error:
>> +#
>> +# -mpreferred-stack-boundary=3 is not between 4 and 12
>> +#
>> +# can be triggered. Otherwise gcc doesn't complain.
>> +CC_FLAGS_FPU += -mhard-float
>> +CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
>> +endif
>
> I was expecting to see this (now duplicate) hunk come _out_ of
> lib/Makefile somewhere in the series.
>
> Did I miss that, or is there something keeping the duplicate there?

This hunk is removed in patch 15/15, after the conversion of lib/test_fpu.c:

https://lore.kernel.org/linux-kernel/[email protected]/

Regards,
Samuel


2024-04-03 12:54:50

by Christian König

[permalink] [raw]
Subject: Re: [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API

I only skimmed over the platform patches and spend only a few minutes on
the amdgpu stuff.

From what I've seen this series seems to make perfect sense to me, I
just can't fully judge everything.

So feel free to add Acked-by: Christian König <[email protected]>
but I strongly suggest that Harry and Rodrigo take a look as well.

Regards,
Christian.

Am 29.03.24 um 08:18 schrieb Samuel Holland:
> This series unifies the kernel-mode FPU API across several architectures
> by wrapping the existing functions (where needed) in consistently-named
> functions placed in a consistent header location, with mostly the same
> semantics: they can be called from preemptible or non-preemptible task
> context, and are not assumed to be reentrant. Architectures are also
> expected to provide CFLAGS adjustments for compiling FPU-dependent code.
> For the moment, SIMD/vector units are out of scope for this common API.
>
> This allows us to remove the ifdeffery and duplicated Makefile logic at
> each FPU user. It then implements the common API on RISC-V, and converts
> a couple of users to the new API: the AMDGPU DRM driver, and the FPU
> self test.
>
> The underlying goal of this series is to allow using newer AMD GPUs
> (e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those
> GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode
> FPU support.
>
> Previous versions:
> v3: https://lore.kernel.org/linux-kernel/[email protected]/
> v2: https://lore.kernel.org/linux-kernel/[email protected]/
> v1: https://lore.kernel.org/linux-kernel/[email protected]/
> v0: https://lore.kernel.org/linux-kernel/[email protected]/
>
> Changes in v4:
> - Add missed CFLAGS changes for recov_neon_inner.c
> (fixes arm build failures)
> - Fix x86 include guard issue (fixes x86 build failures)
>
> Changes in v3:
> - Rebase on v6.9-rc1
> - Limit riscv ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT
>
> Changes in v2:
> - Add documentation explaining the built-time and runtime APIs
> - Add a linux/fpu.h header for generic isolation enforcement
> - Remove file name from header comment
> - Clean up arch/arm64/lib/Makefile, like for arch/arm
> - Remove RISC-V architecture-specific preprocessor check
> - Split altivec removal to a separate patch
> - Use linux/fpu.h instead of asm/fpu.h in consumers
> - Declare test_fpu() in a header
>
> Michael Ellerman (1):
> drm/amd/display: Only use hard-float, not altivec on powerpc
>
> Samuel Holland (14):
> arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
> ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
> arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
> lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
> LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> x86/fpu: Fix asm/fpu/types.h include guard
> x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> riscv: Add support for kernel-mode FPU
> drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
> selftests/fpu: Move FP code to a separate translation unit
> selftests/fpu: Allow building on other architectures
>
> Documentation/core-api/floating-point.rst | 78 +++++++++++++++++++
> Documentation/core-api/index.rst | 1 +
> Makefile | 5 ++
> arch/Kconfig | 6 ++
> arch/arm/Kconfig | 1 +
> arch/arm/Makefile | 7 ++
> arch/arm/include/asm/fpu.h | 15 ++++
> arch/arm/lib/Makefile | 3 +-
> arch/arm64/Kconfig | 1 +
> arch/arm64/Makefile | 9 ++-
> arch/arm64/include/asm/fpu.h | 15 ++++
> arch/arm64/lib/Makefile | 6 +-
> arch/loongarch/Kconfig | 1 +
> arch/loongarch/Makefile | 5 +-
> arch/loongarch/include/asm/fpu.h | 1 +
> arch/powerpc/Kconfig | 1 +
> arch/powerpc/Makefile | 5 +-
> arch/powerpc/include/asm/fpu.h | 28 +++++++
> arch/riscv/Kconfig | 1 +
> arch/riscv/Makefile | 3 +
> arch/riscv/include/asm/fpu.h | 16 ++++
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/kernel_mode_fpu.c | 28 +++++++
> arch/x86/Kconfig | 1 +
> arch/x86/Makefile | 20 +++++
> arch/x86/include/asm/fpu.h | 13 ++++
> arch/x86/include/asm/fpu/types.h | 6 +-
> drivers/gpu/drm/amd/display/Kconfig | 2 +-
> .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 35 +--------
> drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 +--------
> drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 +--------
> include/linux/fpu.h | 12 +++
> lib/Kconfig.debug | 2 +-
> lib/Makefile | 26 +------
> lib/raid6/Makefile | 33 +++-----
> lib/test_fpu.h | 8 ++
> lib/{test_fpu.c => test_fpu_glue.c} | 37 ++-------
> lib/test_fpu_impl.c | 37 +++++++++
> 38 files changed, 348 insertions(+), 193 deletions(-)
> create mode 100644 Documentation/core-api/floating-point.rst
> create mode 100644 arch/arm/include/asm/fpu.h
> create mode 100644 arch/arm64/include/asm/fpu.h
> create mode 100644 arch/powerpc/include/asm/fpu.h
> create mode 100644 arch/riscv/include/asm/fpu.h
> create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c
> create mode 100644 arch/x86/include/asm/fpu.h
> create mode 100644 include/linux/fpu.h
> create mode 100644 lib/test_fpu.h
> rename lib/{test_fpu.c => test_fpu_glue.c} (71%)
> create mode 100644 lib/test_fpu_impl.c
>


2024-04-10 22:26:11

by Thiago Jung Bauermann

[permalink] [raw]
Subject: Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT


Hello,

Samuel Holland <[email protected]> writes:

> Now that all previously-supported architectures select
> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
> of the existing list of architectures. It can also take advantage of the
> common kernel-mode FPU API and method of adjusting CFLAGS.
>
> Acked-by: Alex Deucher <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Samuel Holland <[email protected]>

Unfortunately this patch causes build failures on arm with allyesconfig
and allmodconfig. Tested with next-20240410.

Error with allyesconfig:

$ make -j 8 \
O=$HOME/.cache/builds/linux-cross-arm \
ARCH=arm \
CROSS_COMPILE=arm-linux-gnueabihf-
make[1]: Entering directory '/home/bauermann/.cache/builds/linux-cross-arm'

arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.o: in function `dcn20_populate_dml_pipes_from_context':
dcn20_fpu.c:(.text+0x20f4): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x210c): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x2124): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x213c): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.o: in function `pipe_ctx_to_e2e_pipe_params':
dcn_calcs.c:(.text+0x390): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.o:dcn_calcs.c:(.text+0x3a4): more undefined references to `__aeabi_l2d' follow
arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.o: in function `optimize_configuration':
dml2_wrapper.c:(.text+0xcbc): undefined reference to `__aeabi_d2ulz'
arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.o: in function `populate_dml_plane_cfg_from_plane_state':
dml2_translation_helper.c:(.text+0x9e4): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa20): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa58): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa90): undefined reference to `__aeabi_l2d'
make[3]: *** [/home/bauermann/src/linux/scripts/Makefile.vmlinux:37: vmlinux] Error 1
make[2]: *** [/home/bauermann/src/linux/Makefile:1165: vmlinux] Error 2
make[1]: *** [/home/bauermann/src/linux/Makefile:240: __sub-make] Error 2
make[1]: Leaving directory '/home/bauermann/.cache/builds/linux-cross-arm'
make: *** [Makefile:240: __sub-make] Error 2

The error with allmodconfig is slightly different:

$ make -j 8 \
O=$HOME/.cache/builds/linux-cross-arm \
ARCH=arm \
CROSS_COMPILE=arm-linux-gnueabihf-
make[1]: Entering directory '/home/bauermann/.cache/builds/linux-cross-arm'

ERROR: modpost: "__aeabi_d2ulz" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
ERROR: modpost: "__aeabi_l2d" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
make[3]: *** [/home/bauermann/src/linux/scripts/Makefile.modpost:145: Module.symvers] Error 1
make[2]: *** [/home/bauermann/src/linux/Makefile:1876: modpost] Error 2
make[1]: *** [/home/bauermann/src/linux/Makefile:240: __sub-make] Error 2
make[1]: Leaving directory '/home/bauermann/.cache/builds/linux-cross-arm'
make: *** [Makefile:240: __sub-make] Error 2

--
Thiago

2024-04-10 22:56:15

by Samuel Holland

[permalink] [raw]
Subject: Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

Hi Thiago,

On 2024-04-10 5:21 PM, Thiago Jung Bauermann wrote:
> Samuel Holland <[email protected]> writes:
>
>> Now that all previously-supported architectures select
>> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
>> of the existing list of architectures. It can also take advantage of the
>> common kernel-mode FPU API and method of adjusting CFLAGS.
>>
>> Acked-by: Alex Deucher <[email protected]>
>> Reviewed-by: Christoph Hellwig <[email protected]>
>> Signed-off-by: Samuel Holland <[email protected]>
>
> Unfortunately this patch causes build failures on arm with allyesconfig
> and allmodconfig. Tested with next-20240410.
>
> Error with allyesconfig:
>
> $ make -j 8 \
> O=$HOME/.cache/builds/linux-cross-arm \
> ARCH=arm \
> CROSS_COMPILE=arm-linux-gnueabihf-
> make[1]: Entering directory '/home/bauermann/.cache/builds/linux-cross-arm'
> ⋮
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.o: in function `dcn20_populate_dml_pipes_from_context':
> dcn20_fpu.c:(.text+0x20f4): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x210c): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x2124): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x213c): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.o: in function `pipe_ctx_to_e2e_pipe_params':
> dcn_calcs.c:(.text+0x390): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.o:dcn_calcs.c:(.text+0x3a4): more undefined references to `__aeabi_l2d' follow
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.o: in function `optimize_configuration':
> dml2_wrapper.c:(.text+0xcbc): undefined reference to `__aeabi_d2ulz'
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.o: in function `populate_dml_plane_cfg_from_plane_state':
> dml2_translation_helper.c:(.text+0x9e4): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa20): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa58): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa90): undefined reference to `__aeabi_l2d'
> make[3]: *** [/home/bauermann/src/linux/scripts/Makefile.vmlinux:37: vmlinux] Error 1
> make[2]: *** [/home/bauermann/src/linux/Makefile:1165: vmlinux] Error 2
> make[1]: *** [/home/bauermann/src/linux/Makefile:240: __sub-make] Error 2
> make[1]: Leaving directory '/home/bauermann/.cache/builds/linux-cross-arm'
> make: *** [Makefile:240: __sub-make] Error 2
>
> The error with allmodconfig is slightly different:
>
> $ make -j 8 \
> O=$HOME/.cache/builds/linux-cross-arm \
> ARCH=arm \
> CROSS_COMPILE=arm-linux-gnueabihf-
> make[1]: Entering directory '/home/bauermann/.cache/builds/linux-cross-arm'
> ⋮
> ERROR: modpost: "__aeabi_d2ulz" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
> ERROR: modpost: "__aeabi_l2d" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
> make[3]: *** [/home/bauermann/src/linux/scripts/Makefile.modpost:145: Module.symvers] Error 1
> make[2]: *** [/home/bauermann/src/linux/Makefile:1876: modpost] Error 2
> make[1]: *** [/home/bauermann/src/linux/Makefile:240: __sub-make] Error 2
> make[1]: Leaving directory '/home/bauermann/.cache/builds/linux-cross-arm'
> make: *** [Makefile:240: __sub-make] Error 2

In both cases, the issue is that the toolchain requires runtime support to
convert between `unsigned long long` and `double`, even when hardware FP is
enabled. There was some past discussion about GCC inlining some of these
conversions[1], but that did not get implemented.

The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
32-bit arm until we can provide these runtime library functions.

Regards,
Samuel

[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

2024-04-11 01:02:23

by Thiago Jung Bauermann

[permalink] [raw]
Subject: Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT


Hello Samuel,

Thank you for the quick reply!

Samuel Holland <[email protected]> writes:
> On 2024-04-10 5:21 PM, Thiago Jung Bauermann wrote:
>>
>> Unfortunately this patch causes build failures on arm with allyesconfig
>> and allmodconfig. Tested with next-20240410.

<snip>

> In both cases, the issue is that the toolchain requires runtime support to
> convert between `unsigned long long` and `double`, even when hardware FP is
> enabled. There was some past discussion about GCC inlining some of these
> conversions[1], but that did not get implemented.

Thank you for the explanation and the bugzilla reference. I added a
comment there mentioning that the problem came up again with this patch
series.

> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
> 32-bit arm until we can provide these runtime library functions.

Does this mean that patch 2 in this series:

[PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

will be dropped?

--
Thiago

2024-04-11 01:11:16

by Samuel Holland

[permalink] [raw]
Subject: Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

Hi Thiago,

On 2024-04-10 8:02 PM, Thiago Jung Bauermann wrote:
> Samuel Holland <[email protected]> writes:
>> On 2024-04-10 5:21 PM, Thiago Jung Bauermann wrote:
>>>
>>> Unfortunately this patch causes build failures on arm with allyesconfig
>>> and allmodconfig. Tested with next-20240410.
>
> <snip>
>
>> In both cases, the issue is that the toolchain requires runtime support to
>> convert between `unsigned long long` and `double`, even when hardware FP is
>> enabled. There was some past discussion about GCC inlining some of these
>> conversions[1], but that did not get implemented.
>
> Thank you for the explanation and the bugzilla reference. I added a
> comment there mentioning that the problem came up again with this patch
> series.
>
>> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
>> 32-bit arm until we can provide these runtime library functions.
>
> Does this mean that patch 2 in this series:
>
> [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
>
> will be dropped?

No, because later patches in the series (3, 6) depend on the definition of
CC_FLAGS_FPU from that patch. I will need to send a fixup patch unless I can
find a GPL-2 compatible implementation of the runtime library functions.

Regards,
Samuel


2024-04-11 01:30:16

by Thiago Jung Bauermann

[permalink] [raw]
Subject: Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT


Hello Samuel,

Samuel Holland <[email protected]> writes:

> On 2024-04-10 8:02 PM, Thiago Jung Bauermann wrote:
>> Samuel Holland <[email protected]> writes:
>>> On 2024-04-10 5:21 PM, Thiago Jung Bauermann wrote:
>>>>
>>>> Unfortunately this patch causes build failures on arm with allyesconfig
>>>> and allmodconfig. Tested with next-20240410.
>>
>> <snip>
>>
>>> In both cases, the issue is that the toolchain requires runtime support to
>>> convert between `unsigned long long` and `double`, even when hardware FP is
>>> enabled. There was some past discussion about GCC inlining some of these
>>> conversions[1], but that did not get implemented.
>>
>> Thank you for the explanation and the bugzilla reference. I added a
>> comment there mentioning that the problem came up again with this patch
>> series.
>>
>>> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
>>> 32-bit arm until we can provide these runtime library functions.
>>
>> Does this mean that patch 2 in this series:
>>
>> [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
>>
>> will be dropped?
>
> No, because later patches in the series (3, 6) depend on the definition of
> CC_FLAGS_FPU from that patch. I will need to send a fixup patch unless I can
> find a GPL-2 compatible implementation of the runtime library functions.

Ok, thank you for clarifying.

Andrew Pinski just responded on the GCC bugzilla and if I understood his
point correctly, it seems to be a matter of changing function names to
what GCC (or actually the arm EABI) expects...

--
Thiago

2024-04-11 07:16:05

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

(cc Arnd)

On Thu, 11 Apr 2024 at 03:11, Samuel Holland <[email protected]> wrote:
>
> Hi Thiago,
>
> On 2024-04-10 8:02 PM, Thiago Jung Bauermann wrote:
> > Samuel Holland <[email protected]> writes:
> >> On 2024-04-10 5:21 PM, Thiago Jung Bauermann wrote:
> >>>
> >>> Unfortunately this patch causes build failures on arm with allyesconfig
> >>> and allmodconfig. Tested with next-20240410.
> >
> > <snip>
> >
> >> In both cases, the issue is that the toolchain requires runtime support to
> >> convert between `unsigned long long` and `double`, even when hardware FP is
> >> enabled. There was some past discussion about GCC inlining some of these
> >> conversions[1], but that did not get implemented.
> >
> > Thank you for the explanation and the bugzilla reference. I added a
> > comment there mentioning that the problem came up again with this patch
> > series.
> >
> >> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
> >> 32-bit arm until we can provide these runtime library functions.
> >
> > Does this mean that patch 2 in this series:
> >
> > [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> >
> > will be dropped?
>
> No, because later patches in the series (3, 6) depend on the definition of
> CC_FLAGS_FPU from that patch. I will need to send a fixup patch unless I can
> find a GPL-2 compatible implementation of the runtime library functions.
>

Is there really a point to doing that? Do 32-bit ARM systems even have
enough address space to the map the BARs of the AMD GPUs that need
this support?

Given that this was not enabled before, I don't think the upshot of
this series should be that we enable support for something on 32-bit
ARM that may cause headaches down the road without any benefit.

So I'd prefer a fixup patch that opts ARM out of this over adding
support code for 64-bit conversions.

2024-04-11 07:52:03

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

On Thu, Apr 11, 2024, at 09:15, Ard Biesheuvel wrote:
> On Thu, 11 Apr 2024 at 03:11, Samuel Holland <[email protected]> wrote:
>> On 2024-04-10 8:02 PM, Thiago Jung Bauermann wrote:
>> > Samuel Holland <[email protected]> writes:
>>
>> >> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
>> >> 32-bit arm until we can provide these runtime library functions.
>> >
>> > Does this mean that patch 2 in this series:
>> >
>> > [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
>> >
>> > will be dropped?
>>
>> No, because later patches in the series (3, 6) depend on the definition of
>> CC_FLAGS_FPU from that patch. I will need to send a fixup patch unless I can
>> find a GPL-2 compatible implementation of the runtime library functions.
>>
>
> Is there really a point to doing that? Do 32-bit ARM systems even have
> enough address space to the map the BARs of the AMD GPUs that need
> this support?
>
> Given that this was not enabled before, I don't think the upshot of
> this series should be that we enable support for something on 32-bit
> ARM that may cause headaches down the road without any benefit.
>
> So I'd prefer a fixup patch that opts ARM out of this over adding
> support code for 64-bit conversions.

I have not found any dts file for a 32-bit platform with support
for a 64-bit prefetchable BAR, and there are very few that even
have a pcie slot (as opposed on on-board devices) you could
plug a card into.

That said, I also don't think we should encourage the use of
floating-point code in random device drivers. There is really
no excuse for the amdgpu driver to use floating point math
here, and we should get AMD to fix their driver instead.

Arnd

2024-04-12 01:54:47

by Dave Airlie

[permalink] [raw]
Subject: Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

On Thu, 11 Apr 2024 at 17:32, Arnd Bergmann <[email protected]> wrote:
>
> On Thu, Apr 11, 2024, at 09:15, Ard Biesheuvel wrote:
> > On Thu, 11 Apr 2024 at 03:11, Samuel Holland <[email protected]> wrote:
> >> On 2024-04-10 8:02 PM, Thiago Jung Bauermann wrote:
> >> > Samuel Holland <[email protected]> writes:
> >>
> >> >> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
> >> >> 32-bit arm until we can provide these runtime library functions.
> >> >
> >> > Does this mean that patch 2 in this series:
> >> >
> >> > [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> >> >
> >> > will be dropped?
> >>
> >> No, because later patches in the series (3, 6) depend on the definition of
> >> CC_FLAGS_FPU from that patch. I will need to send a fixup patch unless I can
> >> find a GPL-2 compatible implementation of the runtime library functions.
> >>
> >
> > Is there really a point to doing that? Do 32-bit ARM systems even have
> > enough address space to the map the BARs of the AMD GPUs that need
> > this support?
> >
> > Given that this was not enabled before, I don't think the upshot of
> > this series should be that we enable support for something on 32-bit
> > ARM that may cause headaches down the road without any benefit.
> >
> > So I'd prefer a fixup patch that opts ARM out of this over adding
> > support code for 64-bit conversions.
>
> I have not found any dts file for a 32-bit platform with support
> for a 64-bit prefetchable BAR, and there are very few that even
> have a pcie slot (as opposed on on-board devices) you could
> plug a card into.
>
> That said, I also don't think we should encourage the use of
> floating-point code in random device drivers. There is really
> no excuse for the amdgpu driver to use floating point math
> here, and we should get AMD to fix their driver instead.

That would be nice, but it won't happen, there are many reasons for
that code to exist like it does, unless someone can write an automated
converter to fixed point and validate it produces the same results for
a long series of input values, it isn't really something that will get
"fixed".

AMD's hardware team produces the calculations, and will only look into
hardware problems in that area if the driver is using the calculations
they produce and validate.

If you've looked at the calculation complexity you'd understand this
isn't a trivial use of float-point for no reason.

Dave.

2024-04-30 17:29:06

by Harry Wentland

[permalink] [raw]
Subject: Re: [PATCH v4 12/15] drm/amd/display: Only use hard-float, not altivec on powerpc

On 2024-03-29 03:18, Samuel Holland wrote:
> From: Michael Ellerman <[email protected]>
>
> The compiler flags enable altivec, but that is not required; hard-float
> is sufficient for the code to build and function.
>
> Drop altivec from the compiler flags and adjust the enable/disable code
> to only enable FPU use.
>
> Signed-off-by: Michael Ellerman <[email protected]>
> Acked-by: Alex Deucher <[email protected]>
> Signed-off-by: Samuel Holland <[email protected]>

Acked-by: Harry Wentland <[email protected]>

Harry

> ---
>
> (no changes since v2)
>
> Changes in v2:
> - New patch for v2
>
> drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 12 ++----------
> drivers/gpu/drm/amd/display/dc/dml/Makefile | 2 +-
> drivers/gpu/drm/amd/display/dc/dml2/Makefile | 2 +-
> 3 files changed, 4 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> index 4ae4720535a5..0de16796466b 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> @@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int line)
> #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
> kernel_fpu_begin();
> #elif defined(CONFIG_PPC64)
> - if (cpu_has_feature(CPU_FTR_VSX_COMP))
> - enable_kernel_vsx();
> - else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
> - enable_kernel_altivec();
> - else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
> + if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
> enable_kernel_fp();
> #elif defined(CONFIG_ARM64)
> kernel_neon_begin();
> @@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int line)
> #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
> kernel_fpu_end();
> #elif defined(CONFIG_PPC64)
> - if (cpu_has_feature(CPU_FTR_VSX_COMP))
> - disable_kernel_vsx();
> - else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
> - disable_kernel_altivec();
> - else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
> + if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
> disable_kernel_fp();
> #elif defined(CONFIG_ARM64)
> kernel_neon_end();
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> index c4a5efd2dda5..59d3972341d2 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> @@ -31,7 +31,7 @@ dml_ccflags := $(dml_ccflags-y) -msse
> endif
>
> ifdef CONFIG_PPC64
> -dml_ccflags := -mhard-float -maltivec
> +dml_ccflags := -mhard-float
> endif
>
> ifdef CONFIG_ARM64
> diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> index acff3449b8d7..7b51364084b5 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> @@ -30,7 +30,7 @@ dml2_ccflags := $(dml2_ccflags-y) -msse
> endif
>
> ifdef CONFIG_PPC64
> -dml2_ccflags := -mhard-float -maltivec
> +dml2_ccflags := -mhard-float
> endif
>
> ifdef CONFIG_ARM64


2024-04-30 17:29:25

by Harry Wentland

[permalink] [raw]
Subject: Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT



On 2024-03-29 03:18, Samuel Holland wrote:
> Now that all previously-supported architectures select
> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
> of the existing list of architectures. It can also take advantage of the
> common kernel-mode FPU API and method of adjusting CFLAGS.
>
> Acked-by: Alex Deucher <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>

Really nice set of changes.

Acked-by: Harry Wentland <[email protected]>

Harry

> Signed-off-by: Samuel Holland <[email protected]>
> ---
>
> (no changes since v2)
>
> Changes in v2:
> - Split altivec removal to a separate patch
> - Use linux/fpu.h instead of asm/fpu.h in consumers
>
> drivers/gpu/drm/amd/display/Kconfig | 2 +-
> .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 27 ++------------
> drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 ++-----------------
> drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 ++-----------------
> 4 files changed, 7 insertions(+), 94 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
> index 901d1961b739..5fcd4f778dc3 100644
> --- a/drivers/gpu/drm/amd/display/Kconfig
> +++ b/drivers/gpu/drm/amd/display/Kconfig
> @@ -8,7 +8,7 @@ config DRM_AMD_DC
> depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64
> select SND_HDA_COMPONENT if SND_HDA_CORE
> # !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
> - select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
> + select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG)
> help
> Choose this option if you want to use the new display engine
> support for AMDGPU. This adds required support for Vega and
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> index 0de16796466b..e46f8ce41d87 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> @@ -26,16 +26,7 @@
>
> #include "dc_trace.h"
>
> -#if defined(CONFIG_X86)
> -#include <asm/fpu/api.h>
> -#elif defined(CONFIG_PPC64)
> -#include <asm/switch_to.h>
> -#include <asm/cputable.h>
> -#elif defined(CONFIG_ARM64)
> -#include <asm/neon.h>
> -#elif defined(CONFIG_LOONGARCH)
> -#include <asm/fpu.h>
> -#endif
> +#include <linux/fpu.h>
>
> /**
> * DOC: DC FPU manipulation overview
> @@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line)
> WARN_ON_ONCE(!in_task());
> preempt_disable();
> depth = __this_cpu_inc_return(fpu_recursion_depth);
> -
> if (depth == 1) {
> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
> + BUG_ON(!kernel_fpu_available());
> kernel_fpu_begin();
> -#elif defined(CONFIG_PPC64)
> - if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
> - enable_kernel_fp();
> -#elif defined(CONFIG_ARM64)
> - kernel_neon_begin();
> -#endif
> }
>
> TRACE_DCN_FPU(true, function_name, line, depth);
> @@ -118,14 +102,7 @@ void dc_fpu_end(const char *function_name, const int line)
>
> depth = __this_cpu_dec_return(fpu_recursion_depth);
> if (depth == 0) {
> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
> kernel_fpu_end();
> -#elif defined(CONFIG_PPC64)
> - if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
> - disable_kernel_fp();
> -#elif defined(CONFIG_ARM64)
> - kernel_neon_end();
> -#endif
> } else {
> WARN_ON_ONCE(depth < 0);
> }
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> index 59d3972341d2..a94b6d546cd1 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> @@ -25,40 +25,8 @@
> # It provides the general basic services required by other DAL
> # subcomponents.
>
> -ifdef CONFIG_X86
> -dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
> -dml_ccflags := $(dml_ccflags-y) -msse
> -endif
> -
> -ifdef CONFIG_PPC64
> -dml_ccflags := -mhard-float
> -endif
> -
> -ifdef CONFIG_ARM64
> -dml_rcflags := -mgeneral-regs-only
> -endif
> -
> -ifdef CONFIG_LOONGARCH
> -dml_ccflags := -mfpu=64
> -dml_rcflags := -msoft-float
> -endif
> -
> -ifdef CONFIG_CC_IS_GCC
> -ifneq ($(call gcc-min-version, 70100),y)
> -IS_OLD_GCC = 1
> -endif
> -endif
> -
> -ifdef CONFIG_X86
> -ifdef IS_OLD_GCC
> -# Stack alignment mismatch, proceed with caution.
> -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
> -# (8B stack alignment).
> -dml_ccflags += -mpreferred-stack-boundary=4
> -else
> -dml_ccflags += -msse2
> -endif
> -endif
> +dml_ccflags := $(CC_FLAGS_FPU)
> +dml_rcflags := $(CC_FLAGS_NO_FPU)
>
> ifneq ($(CONFIG_FRAME_WARN),0)
> ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y)
> diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> index 7b51364084b5..4f6c804a26ad 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> @@ -24,40 +24,8 @@
> #
> # Makefile for dml2.
>
> -ifdef CONFIG_X86
> -dml2_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
> -dml2_ccflags := $(dml2_ccflags-y) -msse
> -endif
> -
> -ifdef CONFIG_PPC64
> -dml2_ccflags := -mhard-float
> -endif
> -
> -ifdef CONFIG_ARM64
> -dml2_rcflags := -mgeneral-regs-only
> -endif
> -
> -ifdef CONFIG_LOONGARCH
> -dml2_ccflags := -mfpu=64
> -dml2_rcflags := -msoft-float
> -endif
> -
> -ifdef CONFIG_CC_IS_GCC
> -ifeq ($(call cc-ifversion, -lt, 0701, y), y)
> -IS_OLD_GCC = 1
> -endif
> -endif
> -
> -ifdef CONFIG_X86
> -ifdef IS_OLD_GCC
> -# Stack alignment mismatch, proceed with caution.
> -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
> -# (8B stack alignment).
> -dml2_ccflags += -mpreferred-stack-boundary=4
> -else
> -dml2_ccflags += -msse2
> -endif
> -endif
> +dml2_ccflags := $(CC_FLAGS_FPU)
> +dml2_rcflags := $(CC_FLAGS_NO_FPU)
>
> ifneq ($(CONFIG_FRAME_WARN),0)
> ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y)