From: WANG Xuerui <[email protected]>
Hi,
It's been a long time since the LoongArch port was upstreamed to LLVM,
and there seems to be evidence that Linux was successfully built with
Clang inside Loongson roughly around that time; however, a lot has
changed since then, and the Linux/LoongArch codebase now makes use of
more novel features that necessitate further work. (The enablement work
is tracked at [1].)
With this patch series and a patched LLVM/Clang/LLD ([2][3][4][5]), a
working kernel can be built with `make LLVM=1`. Although support for
CONFIG_RELOCATABLE and CONFIG_MODULES is still TODO, we've decided to
post the series early to hopefully reduce the rebase burden. The series
contains several useful cleanups anyway.
Regarding how to merge this: because only Patch 8 is outside
arch/loongarch, I'd prefer the series to get merged through Huacai's
tree. The series applies cleanly on top of next-20230622.
Thanks go to the ClangBuiltLinux team, and LoongArch toolchain
maintainers from Loongson and the community alike; without your help
this would come much later, if at all (my free time has been steadily
dwindling this year already).
Your comments are welcome!
[1]: https://github.com/ClangBuiltLinux/linux/issues/1787
[2]: https://reviews.llvm.org/D153609
[3]: https://reviews.llvm.org/D138135
[4]: https://reviews.llvm.org/D150196
[5]: https://reviews.llvm.org/D153707
Changes in v3:
- Squashed the two CFLAGS patches into one, and removed the CC_IS_CLANG
check in favor of a feature detection approach (Huacai and Ruoyao)
- Removed unnecessary BUILD_BUG_ONs in the invtlb wrappers, and made
them __always_inline (Ruoyao and Bibo)
- Linked to the explanation regarding the Clang triple's "libc" part
(Nick)
- Fixed incorrect punctuation in the commit message of Patch 4
Changes in v2:
- Merged the two FCSR-related patches, now using the same approach for
  assembly and C (given that the inline asm codepath cannot be removed
  right away); also changed the terminology to register "class" instead
  of "bank"
- Restored signatures of invtlb wrappers for ease of rebase (potentially
downstream product kernels?)
- Removed -G0 switches altogether (turned out it's useless upon closer
look)
- Fixed -mabi and -msoft-float handling in the CFLAGS patch; two more
  LLVM patches are now necessary (links [4] and [5] above), but the
  original and correct CFLAGS arrangement now works
WANG Rui (2):
LoongArch: Calculate various sizes in the linker script
LoongArch: extable: Also recognize ABI names of registers
WANG Xuerui (6):
LoongArch: Prepare for assemblers with proper FCSR class support
LoongArch: Make the CPUCFG and CSR ops simple aliases of compiler
built-ins
LoongArch: Simplify the invtlb wrappers
LoongArch: Tweak CFLAGS for Clang compatibility
LoongArch: Mark Clang LTO as working
Makefile: Add loongarch target flag for Clang compilation
arch/loongarch/Kconfig | 5 ++
arch/loongarch/Makefile | 21 +++++---
arch/loongarch/include/asm/fpregdef.h | 7 +++
arch/loongarch/include/asm/gpr-num.h | 30 +++++++++++
arch/loongarch/include/asm/loongarch.h | 72 +++++++-------------------
arch/loongarch/include/asm/percpu.h | 6 ++-
arch/loongarch/include/asm/tlb.h | 43 +++++++--------
arch/loongarch/kernel/efi-header.S | 6 +--
arch/loongarch/kernel/head.S | 8 +--
arch/loongarch/kernel/traps.c | 2 +-
arch/loongarch/kernel/vmlinux.lds.S | 7 +++
arch/loongarch/lib/dump_tlb.c | 6 +--
arch/loongarch/vdso/Makefile | 2 +-
scripts/Makefile.clang | 1 +
14 files changed, 118 insertions(+), 98 deletions(-)
--
2.40.0
From: WANG Rui <[email protected]>
Taking the address delta between symbols in different sections is not
supported by the LLVM IAS. Instead, do this in the linker script, so
the same data can be properly referenced in assembly.
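To sketch the difference (using the symbols actually involved), compare the
assembly expression that the LLVM IAS rejects with the linker-script form
that both toolchains handle:

```
/* In assembly: _end and __initdata_begin live in different sections, so
 * their difference is not a constant the assembler can compute: */
	.long	_end - __initdata_begin

/* In the linker script: both addresses are known at link time, so the
 * subtraction is an ordinary link-time constant: */
_kernel_vsize = _end - __initdata_begin;
```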
Signed-off-by: WANG Rui <[email protected]>
Signed-off-by: WANG Xuerui <[email protected]>
---
arch/loongarch/kernel/efi-header.S | 6 +++---
arch/loongarch/kernel/head.S | 8 ++++----
arch/loongarch/kernel/vmlinux.lds.S | 7 +++++++
3 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/arch/loongarch/kernel/efi-header.S b/arch/loongarch/kernel/efi-header.S
index 8c1d229a2afa..5f23b85d78ca 100644
--- a/arch/loongarch/kernel/efi-header.S
+++ b/arch/loongarch/kernel/efi-header.S
@@ -24,7 +24,7 @@
.byte 0x02 /* MajorLinkerVersion */
.byte 0x14 /* MinorLinkerVersion */
.long __inittext_end - .Lefi_header_end /* SizeOfCode */
- .long _end - __initdata_begin /* SizeOfInitializedData */
+ .long _kernel_vsize /* SizeOfInitializedData */
.long 0 /* SizeOfUninitializedData */
.long __efistub_efi_pe_entry - _head /* AddressOfEntryPoint */
.long .Lefi_header_end - _head /* BaseOfCode */
@@ -79,9 +79,9 @@
IMAGE_SCN_MEM_EXECUTE /* Characteristics */
.ascii ".data\0\0\0"
- .long _end - __initdata_begin /* VirtualSize */
+ .long _kernel_vsize /* VirtualSize */
.long __initdata_begin - _head /* VirtualAddress */
- .long _edata - __initdata_begin /* SizeOfRawData */
+ .long _kernel_rsize /* SizeOfRawData */
.long __initdata_begin - _head /* PointerToRawData */
.long 0 /* PointerToRelocations */
diff --git a/arch/loongarch/kernel/head.S b/arch/loongarch/kernel/head.S
index 0d8180153ec0..53b883db0786 100644
--- a/arch/loongarch/kernel/head.S
+++ b/arch/loongarch/kernel/head.S
@@ -23,7 +23,7 @@ _head:
.word MZ_MAGIC /* "MZ", MS-DOS header */
.org 0x8
.dword kernel_entry /* Kernel entry point */
- .dword _end - _text /* Kernel image effective size */
+ .dword _kernel_asize /* Kernel image effective size */
.quad PHYS_LINK_KADDR /* Kernel image load offset from start of RAM */
.org 0x38 /* 0x20 ~ 0x37 reserved */
.long LINUX_PE_MAGIC
@@ -32,9 +32,9 @@ _head:
pe_header:
__EFI_PE_HEADER
-SYM_DATA(kernel_asize, .long _end - _text);
-SYM_DATA(kernel_fsize, .long _edata - _text);
-SYM_DATA(kernel_offset, .long kernel_offset - _text);
+SYM_DATA(kernel_asize, .long _kernel_asize);
+SYM_DATA(kernel_fsize, .long _kernel_fsize);
+SYM_DATA(kernel_offset, .long _kernel_offset);
#endif
diff --git a/arch/loongarch/kernel/vmlinux.lds.S b/arch/loongarch/kernel/vmlinux.lds.S
index 0c7b041be9d8..79f238df029e 100644
--- a/arch/loongarch/kernel/vmlinux.lds.S
+++ b/arch/loongarch/kernel/vmlinux.lds.S
@@ -136,6 +136,13 @@ SECTIONS
DWARF_DEBUG
ELF_DETAILS
+ /* header symbols */
+ _kernel_asize = _end - _text;
+ _kernel_fsize = _edata - _text;
+ _kernel_offset = kernel_offset - _text;
+ _kernel_vsize = _end - __initdata_begin;
+ _kernel_rsize = _edata - __initdata_begin;
+
.gptab.sdata : {
*(.gptab.data)
*(.gptab.sdata)
--
2.40.0
From: WANG Xuerui <[email protected]>
The GNU assembler (as of 2.40) mis-treats FCSR operands as GPRs, but
the LLVM IAS does not. Probe for this and refer to FCSRs as "$fcsrNN"
if support is present.
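As a sketch, the same FCSR read under each convention (this is the
instruction the new Kconfig probe uses):

```
# With an assembler that has a proper FCSR register class (LLVM IAS):
	movfcsr2gr	$t0, $fcsr0

# With binutils as of 2.40, which expects a GPR name in the FCSR slot:
	movfcsr2gr	$t0, $r0
```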
Signed-off-by: WANG Xuerui <[email protected]>
---
arch/loongarch/Kconfig | 3 +++
arch/loongarch/include/asm/fpregdef.h | 7 +++++++
arch/loongarch/include/asm/loongarch.h | 9 ++++++++-
3 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 743d87655742..ac3564935281 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -242,6 +242,9 @@ config SCHED_OMIT_FRAME_POINTER
config AS_HAS_EXPLICIT_RELOCS
def_bool $(as-instr,x:pcalau12i \$t0$(comma)%pc_hi20(x))
+config AS_HAS_FCSR_CLASS
+ def_bool $(as-instr,x:movfcsr2gr \$t0$(comma)\$fcsr0)
+
config CC_HAS_LSX_EXTENSION
def_bool $(cc-option,-mlsx)
diff --git a/arch/loongarch/include/asm/fpregdef.h b/arch/loongarch/include/asm/fpregdef.h
index b6be527831dd..3eb7ff9e1d8e 100644
--- a/arch/loongarch/include/asm/fpregdef.h
+++ b/arch/loongarch/include/asm/fpregdef.h
@@ -40,6 +40,12 @@
#define fs6 $f30
#define fs7 $f31
+#ifdef CONFIG_AS_HAS_FCSR_CLASS
+#define fcsr0 $fcsr0
+#define fcsr1 $fcsr1
+#define fcsr2 $fcsr2
+#define fcsr3 $fcsr3
+#else
/*
* Current binutils expects *GPRs* at FCSR position for the FCSR
* operation instructions, so define aliases for those used.
@@ -48,5 +54,6 @@
#define fcsr1 $r1
#define fcsr2 $r2
#define fcsr3 $r3
+#endif
#endif /* _ASM_FPREGDEF_H */
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index ac83e60c60d1..ff4482fd8ad7 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -1445,11 +1445,18 @@ __BUILD_CSR_OP(tlbidx)
#define EXCCODE_INT_START 64
#define EXCCODE_INT_END (EXCCODE_INT_START + EXCCODE_INT_NUM - 1)
-/* FPU register names */
+/* FPU Status Register Names */
+#ifdef CONFIG_AS_HAS_FCSR_CLASS
+#define LOONGARCH_FCSR0 $fcsr0
+#define LOONGARCH_FCSR1 $fcsr1
+#define LOONGARCH_FCSR2 $fcsr2
+#define LOONGARCH_FCSR3 $fcsr3
+#else
#define LOONGARCH_FCSR0 $r0
#define LOONGARCH_FCSR1 $r1
#define LOONGARCH_FCSR2 $r2
#define LOONGARCH_FCSR3 $r3
+#endif
/* FPU Status Register Values */
#define FPU_CSR_RSVD 0xe0e0fce0
--
2.40.0
From: WANG Xuerui <[email protected]>
Now that the arch code is mostly ready for LLVM/Clang consumption, it is
time to re-organize the CFLAGS a little to actually enable the LLVM build.
Namely, all -G0 switches from CFLAGS are removed, and -mexplicit-relocs
and -mdirect-extern-access are now wrapped with cc-option (with the
related asm/percpu.h definition guarded against toolchain combos that
are known to not work).
A build with !RELOCATABLE && !MODULES is confirmed working in a QEMU
environment; support for the two features is currently blocked on
LLVM/Clang, and will come later.
Why -G0 can be removed:
In GCC, -G stands for "small data threshold", which instructs the
compiler to put data smaller than the specified threshold in a dedicated
"small data" section (called .sdata on LoongArch and several other
arches).
However, benefiting from this would require ABI cooperation, which is
not the case for LoongArch; and current GCC behaves the same whether -G0
(equivalent to disabling the optimization) is given or not. So, remove
-G0 from CFLAGS altogether for one less thing to care about. This also
benefits LLVM/Clang compatibility, where the -G switch is not supported.
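For illustration only, a hedged sketch of what the -G threshold means on
arches that do honor it (e.g. MIPS); on LoongArch the placement below does
not happen regardless of -G:

```c
/* Under a hypothetical -G8 on an arch with real small-data support: */
long small_counter = 1;     /* 8 bytes, fits the threshold -> .sdata,
                               reachable via cheap gp-relative addressing */
char big_buffer[64] = "x";  /* exceeds the threshold -> regular .data */
```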
Why -mexplicit-relocs can now be conditionally applied without
regressions:
Originally, -mexplicit-relocs was unconditionally added to CFLAGS
whenever CONFIG_AS_HAS_EXPLICIT_RELOCS was set, because omitting it
(i.e. old GCC + new binutils) would not work: modules would contain
R_LARCH_ABS_* relocs. Given the rarity of such a toolchain combo in the
wild, supporting it was deemed not worthwhile, so support for those
relocs in modules was never added back when explicit relocs support was
upstreamed; instead, -mexplicit-relocs was unconditionally added to fail
the build early.
Now that Clang compatibility is desired, and given that Clang has
behaved as if -mexplicit-relocs were enabled from day one but does not
support the CLI flag, we must ensure the flag is not passed when
building with Clang. However, explicit compiler-flavor checks are more
brittle than feature detection: what actually matters here is support
for __attribute__((model)) when building modules. Given that neither
older GCC nor current Clang supports this attribute, probing for the
attribute and #error'ing out provides a proper error message without
checking for Clang, and will also automatically work when Clang gains
support for the attribute in the future.
Why -mdirect-extern-access is now conditionally applied:
This is a nice-to-have optimization that can reduce GOT accesses, but
not having it is harmless. Because Clang does not support the option
currently, but might do so in the future, conditional application via
cc-option ensures compatibility with both current and future Clang
versions.
Suggested-by: Xi Ruoyao <[email protected]> # cc-option changes
Signed-off-by: WANG Xuerui <[email protected]>
---
arch/loongarch/Makefile | 21 +++++++++++++--------
arch/loongarch/include/asm/percpu.h | 6 +++++-
arch/loongarch/vdso/Makefile | 2 +-
3 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index a27e264bdaa5..a63683da3bcf 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -46,8 +46,8 @@ ld-emul = $(64bit-emul)
cflags-y += -mabi=lp64s
endif
-cflags-y += -G0 -pipe -msoft-float
-LDFLAGS_vmlinux += -G0 -static -n -nostdlib
+cflags-y += -pipe -msoft-float
+LDFLAGS_vmlinux += -static -n -nostdlib
# When the assembler supports explicit relocation hint, we must use it.
# GCC may have -mexplicit-relocs off by default if it was built with an old
@@ -56,13 +56,18 @@ LDFLAGS_vmlinux += -G0 -static -n -nostdlib
# When the assembler does not supports explicit relocation hint, we can't use
# it. Disable it if the compiler supports it.
#
-# If you've seen "unknown reloc hint" message building the kernel and you are
-# now wondering why "-mexplicit-relocs" is not wrapped with cc-option: the
-# combination of a "new" assembler and "old" compiler is not supported. Either
-# upgrade the compiler or downgrade the assembler.
+# The combination of a "new" assembler and "old" GCC is not supported, given
+# the rarity of this combo and the extra complexity needed to make it work.
+# Either upgrade the compiler or downgrade the assembler; the build will error
+# out if it is the case (by probing for the model attribute; all supported
+# compilers in this case would have support).
+#
+# Also, -mdirect-extern-access is useful in case of building with explicit
+# relocs, for avoiding unnecessary GOT accesses. It is harmless to not have
+# support though.
ifdef CONFIG_AS_HAS_EXPLICIT_RELOCS
-cflags-y += -mexplicit-relocs
-KBUILD_CFLAGS_KERNEL += -mdirect-extern-access
+cflags-y += $(call cc-option,-mexplicit-relocs)
+KBUILD_CFLAGS_KERNEL += $(call cc-option,-mdirect-extern-access)
else
cflags-y += $(call cc-option,-mno-explicit-relocs)
KBUILD_AFLAGS_KERNEL += -Wa,-mla-global-with-pcrel
diff --git a/arch/loongarch/include/asm/percpu.h b/arch/loongarch/include/asm/percpu.h
index ad8d88494554..b9f567e66016 100644
--- a/arch/loongarch/include/asm/percpu.h
+++ b/arch/loongarch/include/asm/percpu.h
@@ -14,7 +14,11 @@
* loaded. Tell the compiler this fact when using explicit relocs.
*/
#if defined(MODULE) && defined(CONFIG_AS_HAS_EXPLICIT_RELOCS)
-#define PER_CPU_ATTRIBUTES __attribute__((model("extreme")))
+# if __has_attribute(model)
+# define PER_CPU_ATTRIBUTES __attribute__((model("extreme")))
+# else
+# error compiler support for the model attribute is necessary when a recent assembler is used
+# endif
#endif
/* Use r21 for fast access */
diff --git a/arch/loongarch/vdso/Makefile b/arch/loongarch/vdso/Makefile
index 4c859a0e4754..ee4abcf5642e 100644
--- a/arch/loongarch/vdso/Makefile
+++ b/arch/loongarch/vdso/Makefile
@@ -25,7 +25,7 @@ endif
cflags-vdso := $(ccflags-vdso) \
-isystem $(shell $(CC) -print-file-name=include) \
$(filter -W%,$(filter-out -Wa$(comma)%,$(KBUILD_CFLAGS))) \
- -O2 -g -fno-strict-aliasing -fno-common -fno-builtin -G0 \
+ -O2 -g -fno-strict-aliasing -fno-common -fno-builtin \
-fno-stack-protector -fno-jump-tables -DDISABLE_BRANCH_PROFILING \
$(call cc-option, -fno-asynchronous-unwind-tables) \
$(call cc-option, -fno-stack-protector)
--
2.40.0
From: WANG Xuerui <[email protected]>
In addition to reducing visual clutter, this also makes Clang happy
regarding the const-ness of arguments. In the original approach, all
Clang sees is the incoming arguments, whose const-ness cannot be proven
without first being inlined; so Clang errors out while GCC happens to be
fine.
While at it, tweak several printk format strings because the return type
of csr_read64 becomes effectively unsigned long, instead of unsigned
long long.
Signed-off-by: WANG Xuerui <[email protected]>
---
arch/loongarch/include/asm/loongarch.h | 63 +++++---------------------
arch/loongarch/kernel/traps.c | 2 +-
arch/loongarch/lib/dump_tlb.c | 6 +--
3 files changed, 15 insertions(+), 56 deletions(-)
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index ff4482fd8ad7..ea8d1e82369d 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -56,10 +56,7 @@ __asm__(".macro parse_r var r\n\t"
#undef _IFC_REG
/* CPUCFG */
-static inline u32 read_cpucfg(u32 reg)
-{
- return __cpucfg(reg);
-}
+#define read_cpucfg(reg) __cpucfg(reg)
#endif /* !__ASSEMBLY__ */
@@ -207,56 +204,18 @@ static inline u32 read_cpucfg(u32 reg)
#ifndef __ASSEMBLY__
/* CSR */
-static __always_inline u32 csr_read32(u32 reg)
-{
- return __csrrd_w(reg);
-}
-
-static __always_inline u64 csr_read64(u32 reg)
-{
- return __csrrd_d(reg);
-}
-
-static __always_inline void csr_write32(u32 val, u32 reg)
-{
- __csrwr_w(val, reg);
-}
-
-static __always_inline void csr_write64(u64 val, u32 reg)
-{
- __csrwr_d(val, reg);
-}
-
-static __always_inline u32 csr_xchg32(u32 val, u32 mask, u32 reg)
-{
- return __csrxchg_w(val, mask, reg);
-}
-
-static __always_inline u64 csr_xchg64(u64 val, u64 mask, u32 reg)
-{
- return __csrxchg_d(val, mask, reg);
-}
+#define csr_read32(reg) __csrrd_w(reg)
+#define csr_read64(reg) __csrrd_d(reg)
+#define csr_write32(val, reg) __csrwr_w(val, reg)
+#define csr_write64(val, reg) __csrwr_d(val, reg)
+#define csr_xchg32(val, mask, reg) __csrxchg_w(val, mask, reg)
+#define csr_xchg64(val, mask, reg) __csrxchg_d(val, mask, reg)
/* IOCSR */
-static __always_inline u32 iocsr_read32(u32 reg)
-{
- return __iocsrrd_w(reg);
-}
-
-static __always_inline u64 iocsr_read64(u32 reg)
-{
- return __iocsrrd_d(reg);
-}
-
-static __always_inline void iocsr_write32(u32 val, u32 reg)
-{
- __iocsrwr_w(val, reg);
-}
-
-static __always_inline void iocsr_write64(u64 val, u32 reg)
-{
- __iocsrwr_d(val, reg);
-}
+#define iocsr_read32(reg) __iocsrrd_w(reg)
+#define iocsr_read64(reg) __iocsrrd_d(reg)
+#define iocsr_write32(val, reg) __iocsrwr_w(val, reg)
+#define iocsr_write64(val, reg) __iocsrwr_d(val, reg)
#endif /* !__ASSEMBLY__ */
diff --git a/arch/loongarch/kernel/traps.c b/arch/loongarch/kernel/traps.c
index 22179cf6f33c..8fb5e7a77145 100644
--- a/arch/loongarch/kernel/traps.c
+++ b/arch/loongarch/kernel/traps.c
@@ -999,7 +999,7 @@ asmlinkage void cache_parity_error(void)
/* For the moment, report the problem and hang. */
pr_err("Cache error exception:\n");
pr_err("csr_merrctl == %08x\n", csr_read32(LOONGARCH_CSR_MERRCTL));
- pr_err("csr_merrera == %016llx\n", csr_read64(LOONGARCH_CSR_MERRERA));
+ pr_err("csr_merrera == %016lx\n", csr_read64(LOONGARCH_CSR_MERRERA));
panic("Can't handle the cache error!");
}
diff --git a/arch/loongarch/lib/dump_tlb.c b/arch/loongarch/lib/dump_tlb.c
index c2cc7ce343c9..0b886a6e260f 100644
--- a/arch/loongarch/lib/dump_tlb.c
+++ b/arch/loongarch/lib/dump_tlb.c
@@ -20,9 +20,9 @@ void dump_tlb_regs(void)
pr_info("Index : 0x%0x\n", read_csr_tlbidx());
pr_info("PageSize : 0x%0x\n", read_csr_pagesize());
- pr_info("EntryHi : 0x%0*llx\n", field, read_csr_entryhi());
- pr_info("EntryLo0 : 0x%0*llx\n", field, read_csr_entrylo0());
- pr_info("EntryLo1 : 0x%0*llx\n", field, read_csr_entrylo1());
+ pr_info("EntryHi : 0x%0*lx\n", field, read_csr_entryhi());
+ pr_info("EntryLo0 : 0x%0*lx\n", field, read_csr_entrylo0());
+ pr_info("EntryLo1 : 0x%0*lx\n", field, read_csr_entrylo1());
}
static void dump_tlb(int first, int last)
--
2.40.0
From: WANG Rui <[email protected]>
When the kernel is compiled with LLVM, the register names seen while
building the exception fixups are ABI names instead of the bare $rNN
style. Add mappings for the ABI names for LLVM compatibility.
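A sketch of what the change provides: both spellings of the same register
must resolve to one GPR number (here GPR 4):

```
# Already handled, the bare style GCC emits:
	.equ	.L__gpr_num_$r4, 4
# Newly added, the ABI name Clang emits for the same register:
	.equ	.L__gpr_num_$a0, 4
```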
Signed-off-by: WANG Rui <[email protected]>
Signed-off-by: WANG Xuerui <[email protected]>
---
arch/loongarch/include/asm/gpr-num.h | 30 ++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/loongarch/include/asm/gpr-num.h b/arch/loongarch/include/asm/gpr-num.h
index e0941af20c7e..996038da806d 100644
--- a/arch/loongarch/include/asm/gpr-num.h
+++ b/arch/loongarch/include/asm/gpr-num.h
@@ -9,6 +9,22 @@
.equ .L__gpr_num_$r\num, \num
.endr
+ /* ABI names of registers */
+ .equ .L__gpr_num_$ra, 1
+ .equ .L__gpr_num_$tp, 2
+ .equ .L__gpr_num_$sp, 3
+ .irp num,0,1,2,3,4,5,6,7
+ .equ .L__gpr_num_$a\num, 4 + \num
+ .endr
+ .irp num,0,1,2,3,4,5,6,7,8
+ .equ .L__gpr_num_$t\num, 12 + \num
+ .endr
+ .equ .L__gpr_num_$s9, 22
+ .equ .L__gpr_num_$fp, 22
+ .irp num,0,1,2,3,4,5,6,7,8
+ .equ .L__gpr_num_$s\num, 23 + \num
+ .endr
+
#else /* __ASSEMBLY__ */
#define __DEFINE_ASM_GPR_NUMS \
@@ -16,6 +32,20 @@
" .irp num,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31\n" \
" .equ .L__gpr_num_$r\\num, \\num\n" \
" .endr\n" \
+" .equ .L__gpr_num_$ra, 1\n" \
+" .equ .L__gpr_num_$tp, 2\n" \
+" .equ .L__gpr_num_$sp, 3\n" \
+" .irp num,0,1,2,3,4,5,6,7\n" \
+" .equ .L__gpr_num_$a\\num, 4 + \\num\n" \
+" .endr\n" \
+" .irp num,0,1,2,3,4,5,6,7,8\n" \
+" .equ .L__gpr_num_$t\\num, 12 + \\num\n" \
+" .endr\n" \
+" .equ .L__gpr_num_$s9, 22\n" \
+" .equ .L__gpr_num_$fp, 22\n" \
+" .irp num,0,1,2,3,4,5,6,7,8\n" \
+" .equ .L__gpr_num_$s\\num, 23 + \\num\n" \
+" .endr\n" \
#endif /* __ASSEMBLY__ */
--
2.40.0
From: WANG Xuerui <[email protected]>
The LoongArch kernel is 64-bit and built with the soft-float ABI,
hence the loongarch64-linux-gnusf target. (The "libc" part can affect
the codegen of libcalls: other arches do not use a bare-metal target,
and currently the only fully supported libc on LoongArch is glibc
anyway.)
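For reference, a sketch of how the triple ends up consumed (hypothetical
invocation; requires an LLVM build with LoongArch support):

```
# loongarch64 (arch) - linux (os) - gnusf ("gnu" env + "sf" soft-float)
clang --target=loongarch64-linux-gnusf -c foo.c
```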
See: https://lore.kernel.org/loongarch/CAKwvOdnimxv8oJ4mVY74zqtt1x7KTMrWvn2_T9x22SFDbU6rHQ@mail.gmail.com/
Signed-off-by: WANG Xuerui <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
---
scripts/Makefile.clang | 1 +
1 file changed, 1 insertion(+)
diff --git a/scripts/Makefile.clang b/scripts/Makefile.clang
index 058a4c0f864e..6c23c6af797f 100644
--- a/scripts/Makefile.clang
+++ b/scripts/Makefile.clang
@@ -4,6 +4,7 @@
CLANG_TARGET_FLAGS_arm := arm-linux-gnueabi
CLANG_TARGET_FLAGS_arm64 := aarch64-linux-gnu
CLANG_TARGET_FLAGS_hexagon := hexagon-linux-musl
+CLANG_TARGET_FLAGS_loongarch := loongarch64-linux-gnusf
CLANG_TARGET_FLAGS_m68k := m68k-linux-gnu
CLANG_TARGET_FLAGS_mips := mipsel-linux-gnu
CLANG_TARGET_FLAGS_powerpc := powerpc64le-linux-gnu
--
2.40.0
From: WANG Xuerui <[email protected]>
Confirmed working with QEMU system emulation.
Signed-off-by: WANG Xuerui <[email protected]>
Acked-by: Nick Desaulniers <[email protected]>
---
arch/loongarch/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index ac3564935281..ed9a148cdcde 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -51,6 +51,8 @@ config LOONGARCH
select ARCH_SUPPORTS_ACPI
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_HUGETLBFS
+ select ARCH_SUPPORTS_LTO_CLANG
+ select ARCH_SUPPORTS_LTO_CLANG_THIN
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
--
2.40.0
From: WANG Xuerui <[email protected]>
The invtlb instruction has been supported by upstream LoongArch
toolchains from day one, so ditch the raw opcode trickery and just use
plain inline asm for it.
While at it, also make the invtlb asm statements barriers, for proper
modeling of the side effects. The functions are also marked as
__always_inline instead of just "inline", because they cannot work at
all if not inlined: the op argument will not be compile-time const in
that case, thus failing to satisfy the "i" constraint.
The signatures of the other, more specific invtlb wrappers contain unused
arguments right now, but these are not removed right away in order to
keep the patch focused. In the meantime, assertions are added to ensure
no accidental misuse happens before the refactor. (The more specific
wrappers cannot re-use the generic invtlb wrapper, because the ISA
manual says $zero shall be used in case a particular op does not take
the respective argument: re-using the generic wrapper would mean losing
control over the register usage.)
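A minimal generic sketch of the "i"-constraint point (not the actual invtlb
code; the asm body is a placeholder, and __always_inline is reproduced here
only to make the sketch standalone):

```c
/* Kernel's __always_inline, reproduced for a standalone sketch: */
#define __always_inline inline __attribute__((__always_inline__))

/* The "i" constraint demands an immediate operand. That only works when
 * the call is inlined and op is constant-propagated into the asm; a
 * plain "inline" may be ignored by the compiler (notably at -O0),
 * leaving op a runtime value that cannot satisfy "i". */
static __always_inline void take_immediate(int op)
{
	asm volatile("# imm %0" : : "i"(op) : "memory");
}

/* take_immediate(3);        OK: 3 folds to an immediate      */
/* take_immediate(some_var); build error: not a constant      */
```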
Signed-off-by: WANG Xuerui <[email protected]>
---
arch/loongarch/include/asm/tlb.h | 43 ++++++++++++++------------------
1 file changed, 19 insertions(+), 24 deletions(-)
diff --git a/arch/loongarch/include/asm/tlb.h b/arch/loongarch/include/asm/tlb.h
index 0dc9ee2b05d2..da7a3b5b9374 100644
--- a/arch/loongarch/include/asm/tlb.h
+++ b/arch/loongarch/include/asm/tlb.h
@@ -88,52 +88,47 @@ enum invtlb_ops {
INVTLB_GID_ADDR = 0x16,
};
-/*
- * invtlb op info addr
- * (0x1 << 26) | (0x24 << 20) | (0x13 << 15) |
- * (addr << 10) | (info << 5) | op
- */
-static inline void invtlb(u32 op, u32 info, u64 addr)
+static __always_inline void invtlb(u32 op, u32 info, u64 addr)
{
__asm__ __volatile__(
- "parse_r addr,%0\n\t"
- "parse_r info,%1\n\t"
- ".word ((0x6498000) | (addr << 10) | (info << 5) | %2)\n\t"
- :
- : "r"(addr), "r"(info), "i"(op)
+ "invtlb %0, %1, %2\n\t"
:
+ : "i"(op), "r"(info), "r"(addr)
+ : "memory"
);
}
-static inline void invtlb_addr(u32 op, u32 info, u64 addr)
+static __always_inline void invtlb_addr(u32 op, u32 info, u64 addr)
{
+ BUILD_BUG_ON(!__builtin_constant_p(info) || info != 0);
__asm__ __volatile__(
- "parse_r addr,%0\n\t"
- ".word ((0x6498000) | (addr << 10) | (0 << 5) | %1)\n\t"
- :
- : "r"(addr), "i"(op)
+ "invtlb %0, $zero, %1\n\t"
:
+ : "i"(op), "r"(addr)
+ : "memory"
);
}
-static inline void invtlb_info(u32 op, u32 info, u64 addr)
+static __always_inline void invtlb_info(u32 op, u32 info, u64 addr)
{
+ BUILD_BUG_ON(!__builtin_constant_p(addr) || addr != 0);
__asm__ __volatile__(
- "parse_r info,%0\n\t"
- ".word ((0x6498000) | (0 << 10) | (info << 5) | %1)\n\t"
- :
- : "r"(info), "i"(op)
+ "invtlb %0, %1, $zero\n\t"
:
+ : "i"(op), "r"(info)
+ : "memory"
);
}
-static inline void invtlb_all(u32 op, u32 info, u64 addr)
+static __always_inline void invtlb_all(u32 op, u32 info, u64 addr)
{
+ BUILD_BUG_ON(!__builtin_constant_p(info) || info != 0);
+ BUILD_BUG_ON(!__builtin_constant_p(addr) || addr != 0);
__asm__ __volatile__(
- ".word ((0x6498000) | (0 << 10) | (0 << 5) | %0)\n\t"
+ "invtlb %0, $zero, $zero\n\t"
:
: "i"(op)
- :
+ : "memory"
);
}
--
2.40.0
Queued for loongarch-next, thanks.
Huacai
On Sun, Jun 25, 2023 at 5:56 PM WANG Xuerui <[email protected]> wrote:
On 2023/6/25 17:56, WANG Xuerui wrote:
The macro parse_r is not used here, and it is no longer used anywhere else.
Can you also remove the definition of this macro?
Regards
Bibo Mao
>
> -static inline void invtlb_all(u32 op, u32 info, u64 addr)
> +static __always_inline void invtlb_all(u32 op, u32 info, u64 addr)
> {
> + BUILD_BUG_ON(!__builtin_constant_p(info) || info != 0);
> + BUILD_BUG_ON(!__builtin_constant_p(addr) || addr != 0);
> __asm__ __volatile__(
> - ".word ((0x6498000) | (0 << 10) | (0 << 5) | %0)\n\t"
> + "invtlb %0, $zero, $zero\n\t"
> :
> : "i"(op)
> - :
> + : "memory"
> );
> }
>
On Sun, Jun 25, 2023 at 5:57 PM WANG Xuerui <[email protected]> wrote:
>
> From: WANG Rui <[email protected]>
>
> Taking the address delta between symbols in different sections is not
> supported by the LLVM IAS. Instead, do this in the linker script, so
> the same data can be properly referenced in assembly.
>
> Signed-off-by: WANG Rui <[email protected]>
> Signed-off-by: WANG Xuerui <[email protected]>
> ---
> arch/loongarch/kernel/efi-header.S | 6 +++---
> arch/loongarch/kernel/head.S | 8 ++++----
> arch/loongarch/kernel/vmlinux.lds.S | 7 +++++++
> 3 files changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/arch/loongarch/kernel/efi-header.S b/arch/loongarch/kernel/efi-header.S
> index 8c1d229a2afa..5f23b85d78ca 100644
> --- a/arch/loongarch/kernel/efi-header.S
> +++ b/arch/loongarch/kernel/efi-header.S
> @@ -24,7 +24,7 @@
> .byte 0x02 /* MajorLinkerVersion */
> .byte 0x14 /* MinorLinkerVersion */
> .long __inittext_end - .Lefi_header_end /* SizeOfCode */
> - .long _end - __initdata_begin /* SizeOfInitializedData */
> + .long _kernel_vsize /* SizeOfInitializedData */
> .long 0 /* SizeOfUninitializedData */
> .long __efistub_efi_pe_entry - _head /* AddressOfEntryPoint */
> .long .Lefi_header_end - _head /* BaseOfCode */
> @@ -79,9 +79,9 @@
> IMAGE_SCN_MEM_EXECUTE /* Characteristics */
>
> .ascii ".data\0\0\0"
> - .long _end - __initdata_begin /* VirtualSize */
> + .long _kernel_vsize /* VirtualSize */
> .long __initdata_begin - _head /* VirtualAddress */
> - .long _edata - __initdata_begin /* SizeOfRawData */
> + .long _kernel_rsize /* SizeOfRawData */
> .long __initdata_begin - _head /* PointerToRawData */
>
> .long 0 /* PointerToRelocations */
> diff --git a/arch/loongarch/kernel/head.S b/arch/loongarch/kernel/head.S
> index 0d8180153ec0..53b883db0786 100644
> --- a/arch/loongarch/kernel/head.S
> +++ b/arch/loongarch/kernel/head.S
> @@ -23,7 +23,7 @@ _head:
> .word MZ_MAGIC /* "MZ", MS-DOS header */
> .org 0x8
> .dword kernel_entry /* Kernel entry point */
> - .dword _end - _text /* Kernel image effective size */
> + .dword _kernel_asize /* Kernel image effective size */
> .quad PHYS_LINK_KADDR /* Kernel image load offset from start of RAM */
> .org 0x38 /* 0x20 ~ 0x37 reserved */
> .long LINUX_PE_MAGIC
> @@ -32,9 +32,9 @@ _head:
> pe_header:
> __EFI_PE_HEADER
>
> -SYM_DATA(kernel_asize, .long _end - _text);
> -SYM_DATA(kernel_fsize, .long _edata - _text);
> -SYM_DATA(kernel_offset, .long kernel_offset - _text);
> +SYM_DATA(kernel_asize, .long _kernel_asize);
> +SYM_DATA(kernel_fsize, .long _kernel_fsize);
> +SYM_DATA(kernel_offset, .long _kernel_offset);
>
> #endif
>
> diff --git a/arch/loongarch/kernel/vmlinux.lds.S b/arch/loongarch/kernel/vmlinux.lds.S
> index 0c7b041be9d8..79f238df029e 100644
> --- a/arch/loongarch/kernel/vmlinux.lds.S
> +++ b/arch/loongarch/kernel/vmlinux.lds.S
> @@ -136,6 +136,13 @@ SECTIONS
> DWARF_DEBUG
> ELF_DETAILS
>
> + /* header symbols */
> + _kernel_asize = _end - _text;
> + _kernel_fsize = _edata - _text;
> + _kernel_offset = kernel_offset - _text;
When !CONFIG_EFI_STUB is set there is a build error; I fixed it when I applied.
Huacai
> + _kernel_vsize = _end - __initdata_begin;
> + _kernel_rsize = _edata - __initdata_begin;
> +
> .gptab.sdata : {
> *(.gptab.data)
> *(.gptab.sdata)
> --
> 2.40.0
>
>
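The technique in the patch above, moving cross-section address arithmetic out of assembly and into the linker script, can be sketched generically. This is a simplified, hypothetical fragment, not the kernel's vmlinux.lds.S; section and symbol names are illustrative:

```
/* Minimal linker-script sketch: the linker, unlike the LLVM
 * integrated assembler, can subtract symbols that live in
 * different sections, so the delta is computed here and exported
 * as a plain symbol that assembly can reference directly. */
SECTIONS
{
	.text : { _text_start = .; *(.text) }
	.data : { *(.data) _data_end = .; }

	/* cross-section delta, resolved at link time */
	_image_size = _data_end - _text_start;
}
```

An assembly file can then emit `.dword _image_size` instead of `.dword _data_end - _text_start`, which is the shape of the change made to head.S and efi-header.S.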