LinuxLists.cc - [PATCH 0/3] ARM: make use of UAL VFP mnemonics when possible

2020-03-10 22:03:51

Subject: [PATCH 0/3] ARM: make use of UAL VFP mnemonics when possible

To build the kernel with Clang's integrated assembler the VFP code needs
to make use of the unified assembler language (UAL) VFP mnemonics.

At first I tried to get rid of the co-processor instructions to access
the floating point unit along with the macros completely. However, due
to missing FPINST/FPINST2 argument support in older binutils versions we
have to keep them around. Once we drop support for binutils 2.24 and
older, the move to UAL VFP mnemonics will be straightforward with these
changes applied.

Tested using Clang with the integrated assembler as well as the external
(binutils) assembler, and various gcc/binutils versions down to 4.7/2.23.
Disassembled and compared the object files in arch/arm/vfp/ to make
sure these changes lead to the same code. Besides different inlining
behavior I was not able to spot a difference.

This replaces (and extends) my earlier patch "ARM: use assembly mnemonics
for VFP register access"
http://lore.kernel.org/r/8bb16ac4b15a7e28a8e819ef9aae20bfc3f75fbc.1582266841.git.stefan@agner.ch

--
Stefan

Stefan Agner (3):
ARM: use .fpu assembler directives instead of assembler arguments
ARM: use VFP assembler mnemonics in register load/store macros
ARM: use VFP assembler mnemonics if available

arch/arm/include/asm/vfp.h | 2 ++
arch/arm/include/asm/vfpmacros.h | 31 ++++++++++++++++++++++---------
arch/arm/vfp/Makefile | 5 ++++-
arch/arm/vfp/vfphw.S | 31 ++++++++++++++++++++-----------
arch/arm/vfp/vfpinstr.h | 23 +++++++++++++++++++----
5 files changed, 67 insertions(+), 25 deletions(-)

--
2.25.1

2020-03-10 23:01:20

by Stefan Agner

Subject: [PATCH 2/3] ARM: use VFP assembler mnemonics in register load/store macros

Clang's integrated assembler does not allow access to the VFP registers
through the coprocessor load/store instructions:
<instantiation>:4:6: error: invalid operand for instruction
LDC p11, cr0, [r10],#32*4 @ FLDMIAD r10!, {d0-d15}
^

Replace the coprocessor load/store instructions with explicit assembler
mnemonics for accessing the floating point coprocessor registers. Use
assembler directives to select the appropriate FPU version.

This allows these macros to be built with the GNU assembler as well as
with Clang's built-in assembler.

Signed-off-by: Stefan Agner <[email protected]>
---
arch/arm/include/asm/vfpmacros.h | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h
index 628c336e8e3b..947ee5395e1f 100644
--- a/arch/arm/include/asm/vfpmacros.h
+++ b/arch/arm/include/asm/vfpmacros.h
@@ -19,23 +19,25 @@

@ read all the working registers back into the VFP
.macro VFPFLDMIA, base, tmp
+ .fpu vfpv2
#if __LINUX_ARM_ARCH__ < 6
- LDC p11, cr0, [\base],#33*4 @ FLDMIAX \base!, {d0-d15}
+ fldmiax \base!, {d0-d15}
#else
- LDC p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d0-d15}
+ vldmia \base!, {d0-d15}
#endif
#ifdef CONFIG_VFPv3
+ .fpu vfpv3
#if __LINUX_ARM_ARCH__ <= 6
ldr \tmp, =elf_hwcap @ may not have MVFR regs
ldr \tmp, [\tmp, #0]
tst \tmp, #HWCAP_VFPD32
- ldclne p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
+ vldmiane \base!, {d16-d31}
addeq \base, \base, #32*4 @ step over unused register space
#else
VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
and \tmp, \tmp, #MVFR0_A_SIMD_MASK @ A_SIMD field
cmp \tmp, #2 @ 32 x 64bit registers?
- ldcleq p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31}
+ vldmiaeq \base!, {d16-d31}
addne \base, \base, #32*4 @ step over unused register space
#endif
#endif
@@ -44,22 +46,23 @@
@ write all the working registers out of the VFP
.macro VFPFSTMIA, base, tmp
#if __LINUX_ARM_ARCH__ < 6
- STC p11, cr0, [\base],#33*4 @ FSTMIAX \base!, {d0-d15}
+ fstmiax \base!, {d0-d15}
#else
- STC p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d0-d15}
+ vstmia \base!, {d0-d15}
#endif
#ifdef CONFIG_VFPv3
+ .fpu vfpv3
#if __LINUX_ARM_ARCH__ <= 6
ldr \tmp, =elf_hwcap @ may not have MVFR regs
ldr \tmp, [\tmp, #0]
tst \tmp, #HWCAP_VFPD32
- stclne p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
+ vstmiane \base!, {d16-d31}
addeq \base, \base, #32*4 @ step over unused register space
#else
VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0
and \tmp, \tmp, #MVFR0_A_SIMD_MASK @ A_SIMD field
cmp \tmp, #2 @ 32 x 64bit registers?
- stcleq p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31}
+ vstmiaeq \base!, {d16-d31}
addne \base, \base, #32*4 @ step over unused register space
#endif
#endif
--
2.25.1

2020-03-21 15:05:24

by Peter Smith

Subject: Re: [PATCH 0/3] ARM: make use of UAL VFP mnemonics when possible

> To build the kernel with Clang's integrated assembler the VFP code needs
> to make use of the unified assembler language (UAL) VFP mnemonics.
>
> At first I tried to get rid of the co-processor instructions to access
> the floating point unit along with the macros completely. However, due
> to missing FPINST/FPINST2 argument support in older binutils versions we
> have to keep them around. Once we drop support for binutils 2.24 and
> older, the move to UAL VFP mnemonics will be straightforward with these
> changes applied.
>
> Tested using Clang with the integrated assembler as well as the external
> (binutils) assembler, and various gcc/binutils versions down to 4.7/2.23.
> Disassembled and compared the object files in arch/arm/vfp/ to make
> sure these changes lead to the same code. Besides different inlining
> behavior I was not able to spot a difference.
>

From the perspective of an Arm toolchain developer, the substitutions in
this patch series look correct to me.

Some references I found helpful:

The v8-A Arm Architecture Reference Manual, chapter "Legacy Instruction Syntax
for AArch32 Instruction Sets", Table K6-2 "Pre-UAL instruction syntax for A32
floating-point instructions":

FMSR/FMRS is the pre-UAL name for VMOV (between general-purpose register and
single-precision)
FMDRR/FMRRD is the pre-UAL name for VMOV (between two general-purpose
registers and a doubleword floating-point register)
FLDMIAD is the pre-UAL name for VLDMIA
FSTMIAD is the pre-UAL name for VSTMIA

FLDMIAX and FSTMIAX have no UAL equivalent and are deprecated in ARMv6 and above.
They are equivalent to pre-UAL FLDMIAD/FSTMIAD except that the imm8 field is set
to twice the number of doubleword registers + 1, instead of twice the number of
doubleword registers. This description is taken from A8.8.50 "F*, former
Floating-point instruction mnemonics" in the v7-A Arm Architecture Reference
Manual.
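
As a rough illustration of the imm8 difference described above (a sketch only,
not taken from the patch; the function name vfp_multi_imm8 is hypothetical),
the following Python computes imm8 for both forms. For the sixteen registers
d0-d15 it gives 32 and 33 words, which lines up with the #32*4 and #33*4
post-index offsets in the LDC/STC forms that the patch replaces:

```python
def vfp_multi_imm8(num_dregs, x_form=False):
    """Return imm8 for a load/store multiple of num_dregs doubleword registers.

    x_form=True models the deprecated FLDMIAX/FSTMIAX encoding, which
    counts one extra word compared with FLDMIAD/FSTMIAD.
    """
    if not 1 <= num_dregs <= 16:
        raise ValueError("one instruction transfers 1 to 16 doubleword registers")
    imm8 = 2 * num_dregs                 # two 32-bit words per doubleword register
    return imm8 + 1 if x_form else imm8


# d0-d15: sixteen doubleword registers
imm8_d = vfp_multi_imm8(16)
imm8_x = vfp_multi_imm8(16, x_form=True)
print(imm8_d, imm8_d * 4)   # words, bytes for the FLDMIAD/FSTMIAD form
print(imm8_x, imm8_x * 4)   # words, bytes for the FLDMIAX/FSTMIAX form
```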

The mrrc/mcrr and mrc/mcr correspond to a VMOV instruction. The mrrc/mcrr and
mrc/mcr set opc2 to #4 when accessing registers 16 to 31 as the instructions
can only refer to 16 coprocessor registers. The same bit (7) in the VMOV
corresponds to N, with the register n = UInt(N:Vn) so VMOV can refer to 32
registers.
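
To make the bit arithmetic concrete, here is a small Python sketch (my own
illustration, not code from the series; vfp_reg_index is a hypothetical name).
It assumes the mrc/mcr layout where opc2 occupies instruction bits [7:5] and
treats the 4-bit CRn field as Vn, so opc2 = #4 sets bit 7, which VMOV
interprets as N:

```python
def vfp_reg_index(opc2, crn):
    """Combine opc2 and CRn of an mrc/mcr access into a 5-bit register number.

    Assumption: opc2 sits in instruction bits [7:5], so its top bit is
    instruction bit 7 (N in the VMOV encoding), and CRn plays the role of Vn.
    """
    if not 0 <= crn <= 15:
        raise ValueError("CRn is a 4-bit field")
    if not 0 <= opc2 <= 7:
        raise ValueError("opc2 is a 3-bit field")
    n_bit = (opc2 >> 2) & 1       # top bit of opc2 is instruction bit 7
    return (n_bit << 4) | crn     # n = UInt(N:Vn)


print(vfp_reg_index(0, 5))   # 5: opc2 = #0 selects the lower bank
print(vfp_reg_index(4, 5))   # 21: opc2 = #4 selects registers 16 to 31
```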

Ref: Arm V8-A https://static.docs.arm.com/ddi0487/fa/DDI0487F_a_armv8_arm.pdf
Ref: Arm V7-A https://static.docs.arm.com/ddi0406/c/DDI0406C_C_arm_architecture_reference_manual.pdf

Hope this helps move this forward

Peter

> This replaces (and extends) my earlier patch "ARM: use assembly mnemonics
> for VFP register access"
> http://lore.kernel.org/r/8bb16ac4b15a7e28a8e819ef9aae20bfc3f75fbc.1582266841.git.stefan@agner.ch
>
> --
> Stefan
>
> Stefan Agner (3):
> ARM: use .fpu assembler directives instead of assembler arguments
> ARM: use VFP assembler mnemonics in register load/store macros
> ARM: use VFP assembler mnemonics if available
>
> arch/arm/include/asm/vfp.h | 2 ++
> arch/arm/include/asm/vfpmacros.h | 31 ++++++++++++++++++++++---------
> arch/arm/vfp/Makefile | 5 ++++-
> arch/arm/vfp/vfphw.S | 31 ++++++++++++++++++++-----------
> arch/arm/vfp/vfpinstr.h | 23 +++++++++++++++++++----
> 5 files changed, 67 insertions(+), 25 deletions(-)
>
> --
> 2.25.1