2024-01-01 17:23:05

by Xi Ruoyao

[permalink] [raw]
Subject: [PATCH] LoongArch: Fix and simplify fcsr initialization on execve

There has been a lingering bug in LoongArch Linux systems causing some
GCC tests to intermittently fail (see Closes link). I've made a minimal
reproducer:

zsh% cat measure.s
.align 4
.globl _start
_start:
movfcsr2gr $a0, $fcsr0
bstrpick.w $a0, $a0, 16, 16
beqz $a0, .ok
break 0
.ok:
li.w $a7, 93
syscall 0
zsh% cc mesaure.s -o measure -nostdlib
zsh% echo $((1.0/3))
0.33333333333333331
zsh% while ./measure; do ; done

This while loop should not stop as POSIX is clear that execve must set
fenv to the default, where FCSR should be zero. But in fact it will
just stop after running for a while (normally less than 30 seconds).
Note that "$((1.0/3))" is needed to reproduce the issue because it
raises FE_INVALID and makes fcsr0 non-zero.

The problem is we are relying on SET_PERSONALITY2 to reset
current->thread.fpu.fcsr. But SET_PERSONALITY2 is executed before
start_thread which calls lose_fpu(0). We can see if kernel preempt is
enabled, we may switch to another thread after SET_PERSONALITY2 but
before lose_fpu(0). Then bad thing happens: during the thread switch
the value of the fcsr0 register is stored into current->thread.fpu.fcsr,
making it dirty again.

The issue can be fixed by setting current->thread.fpu.fcsr after
lose_fpu(0) because lose_fpu clears TIF_USEDFPU, then the thread
switch won't touch current->thread.fpu.fcsr.

The only other architecture setting FCSR in SET_PERSONALITY2 is MIPS.
They do this for supporting different FP flavors (NaN encodings etc).
which do not exist on LoongArch. I'm not sure how MIPS evades the issue
(or maybe it's just buggy too) as I don't have a running MIPS hardware
now.

So for LoongArch, just remove the current->thread.fpu.fcsr setting from
SET_PERSONALITY2 and do it in start_thread, after lose_fpu(0). And we
just set it to 0, instead of boot_cpu_data.fpu_csr0 (because we should
provide the userspace a consistent configuration, no matter how hardware
and firmware behave).

The while loop failing with the mainline kernel has survived one hour
after this change.

Closes: https://github.com/loongson-community/discussions/issues/7
Fixes: 803b0fc5c3f2 ("LoongArch: Add process management")
Cc: [email protected]
Signed-off-by: Xi Ruoyao <[email protected]>
---
arch/loongarch/include/asm/elf.h | 5 -----
arch/loongarch/kernel/elf.c | 5 -----
arch/loongarch/kernel/process.c | 1 +
3 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/loongarch/include/asm/elf.h b/arch/loongarch/include/asm/elf.h
index 9b16a3b8e706..f16bd42456e4 100644
--- a/arch/loongarch/include/asm/elf.h
+++ b/arch/loongarch/include/asm/elf.h
@@ -241,8 +241,6 @@ void loongarch_dump_regs64(u64 *uregs, const struct pt_regs *regs);
do { \
current->thread.vdso = &vdso_info; \
\
- loongarch_set_personality_fcsr(state); \
- \
if (personality(current->personality) != PER_LINUX) \
set_personality(PER_LINUX); \
} while (0)
@@ -259,7 +257,6 @@ do { \
clear_thread_flag(TIF_32BIT_ADDR); \
\
current->thread.vdso = &vdso_info; \
- loongarch_set_personality_fcsr(state); \
\
p = personality(current->personality); \
if (p != PER_LINUX32 && p != PER_LINUX) \
@@ -340,6 +337,4 @@ extern int arch_elf_pt_proc(void *ehdr, void *phdr, struct file *elf,
extern int arch_check_elf(void *ehdr, bool has_interpreter, void *interp_ehdr,
struct arch_elf_state *state);

-extern void loongarch_set_personality_fcsr(struct arch_elf_state *state);
-
#endif /* _ASM_ELF_H */
diff --git a/arch/loongarch/kernel/elf.c b/arch/loongarch/kernel/elf.c
index 183e94fc9c69..0fa81ced28dc 100644
--- a/arch/loongarch/kernel/elf.c
+++ b/arch/loongarch/kernel/elf.c
@@ -23,8 +23,3 @@ int arch_check_elf(void *_ehdr, bool has_interpreter, void *_interp_ehdr,
{
return 0;
}
-
-void loongarch_set_personality_fcsr(struct arch_elf_state *state)
-{
- current->thread.fpu.fcsr = boot_cpu_data.fpu_csr0;
-}
diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
index 767d94cce0de..caed58770650 100644
--- a/arch/loongarch/kernel/process.c
+++ b/arch/loongarch/kernel/process.c
@@ -92,6 +92,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc, unsigned long sp)
clear_used_math();
regs->csr_era = pc;
regs->regs[3] = sp;
+ current->thread.fpu.fcsr = 0;
}

void flush_thread(void)
--
2.43.0



2024-01-02 02:36:05

by Huacai Chen

[permalink] [raw]
Subject: Re: [PATCH] LoongArch: Fix and simplify fcsr initialization on execve

Hi, Ruoyao,

On Tue, Jan 2, 2024 at 1:23 AM Xi Ruoyao <[email protected]> wrote:
>
> There has been a lingering bug in LoongArch Linux systems causing some
> GCC tests to intermittently fail (see Closes link). I've made a minimal
> reproducer:
>
> zsh% cat measure.s
> .align 4
> .globl _start
> _start:
> movfcsr2gr $a0, $fcsr0
> bstrpick.w $a0, $a0, 16, 16
> beqz $a0, .ok
> break 0
> .ok:
> li.w $a7, 93
> syscall 0
> zsh% cc mesaure.s -o measure -nostdlib
> zsh% echo $((1.0/3))
> 0.33333333333333331
> zsh% while ./measure; do ; done
>
> This while loop should not stop as POSIX is clear that execve must set
> fenv to the default, where FCSR should be zero. But in fact it will
> just stop after running for a while (normally less than 30 seconds).
> Note that "$((1.0/3))" is needed to reproduce the issue because it
> raises FE_INVALID and makes fcsr0 non-zero.
>
> The problem is we are relying on SET_PERSONALITY2 to reset
> current->thread.fpu.fcsr. But SET_PERSONALITY2 is executed before
> start_thread which calls lose_fpu(0). We can see if kernel preempt is
> enabled, we may switch to another thread after SET_PERSONALITY2 but
> before lose_fpu(0). Then bad thing happens: during the thread switch
> the value of the fcsr0 register is stored into current->thread.fpu.fcsr,
> making it dirty again.
>
> The issue can be fixed by setting current->thread.fpu.fcsr after
> lose_fpu(0) because lose_fpu clears TIF_USEDFPU, then the thread
> switch won't touch current->thread.fpu.fcsr.
>
> The only other architecture setting FCSR in SET_PERSONALITY2 is MIPS.
> They do this for supporting different FP flavors (NaN encodings etc).
> which do not exist on LoongArch. I'm not sure how MIPS evades the issue
> (or maybe it's just buggy too) as I don't have a running MIPS hardware
> now.
I think you can use QEMU. :)

>
> So for LoongArch, just remove the current->thread.fpu.fcsr setting from
> SET_PERSONALITY2 and do it in start_thread, after lose_fpu(0). And we
> just set it to 0, instead of boot_cpu_data.fpu_csr0 (because we should
> provide the userspace a consistent configuration, no matter how hardware
> and firmware behave).
I still prefer to set fcsr to boot_cpu_data.fpu_csr0, because we will
add LoongArch32 later, not sure whether something will change.

Huacai

>
> The while loop failing with the mainline kernel has survived one hour
> after this change.
>
> Closes: https://github.com/loongson-community/discussions/issues/7
> Fixes: 803b0fc5c3f2 ("LoongArch: Add process management")
> Cc: [email protected]
> Signed-off-by: Xi Ruoyao <[email protected]>
> ---
> arch/loongarch/include/asm/elf.h | 5 -----
> arch/loongarch/kernel/elf.c | 5 -----
> arch/loongarch/kernel/process.c | 1 +
> 3 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/elf.h b/arch/loongarch/include/asm/elf.h
> index 9b16a3b8e706..f16bd42456e4 100644
> --- a/arch/loongarch/include/asm/elf.h
> +++ b/arch/loongarch/include/asm/elf.h
> @@ -241,8 +241,6 @@ void loongarch_dump_regs64(u64 *uregs, const struct pt_regs *regs);
> do { \
> current->thread.vdso = &vdso_info; \
> \
> - loongarch_set_personality_fcsr(state); \
> - \
> if (personality(current->personality) != PER_LINUX) \
> set_personality(PER_LINUX); \
> } while (0)
> @@ -259,7 +257,6 @@ do { \
> clear_thread_flag(TIF_32BIT_ADDR); \
> \
> current->thread.vdso = &vdso_info; \
> - loongarch_set_personality_fcsr(state); \
> \
> p = personality(current->personality); \
> if (p != PER_LINUX32 && p != PER_LINUX) \
> @@ -340,6 +337,4 @@ extern int arch_elf_pt_proc(void *ehdr, void *phdr, struct file *elf,
> extern int arch_check_elf(void *ehdr, bool has_interpreter, void *interp_ehdr,
> struct arch_elf_state *state);
>
> -extern void loongarch_set_personality_fcsr(struct arch_elf_state *state);
> -
> #endif /* _ASM_ELF_H */
> diff --git a/arch/loongarch/kernel/elf.c b/arch/loongarch/kernel/elf.c
> index 183e94fc9c69..0fa81ced28dc 100644
> --- a/arch/loongarch/kernel/elf.c
> +++ b/arch/loongarch/kernel/elf.c
> @@ -23,8 +23,3 @@ int arch_check_elf(void *_ehdr, bool has_interpreter, void *_interp_ehdr,
> {
> return 0;
> }
> -
> -void loongarch_set_personality_fcsr(struct arch_elf_state *state)
> -{
> - current->thread.fpu.fcsr = boot_cpu_data.fpu_csr0;
> -}
> diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
> index 767d94cce0de..caed58770650 100644
> --- a/arch/loongarch/kernel/process.c
> +++ b/arch/loongarch/kernel/process.c
> @@ -92,6 +92,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc, unsigned long sp)
> clear_used_math();
> regs->csr_era = pc;
> regs->regs[3] = sp;
> + current->thread.fpu.fcsr = 0;
> }
>
> void flush_thread(void)
> --
> 2.43.0
>
>

2024-01-02 08:09:27

by Xi Ruoyao

[permalink] [raw]
Subject: Re: [PATCH] LoongArch: Fix and simplify fcsr initialization on execve

On Tue, 2024-01-02 at 10:35 +0800, Huacai Chen wrote:

/* snip */

> > The only other architecture setting FCSR in SET_PERSONALITY2 is MIPS.
> > They do this for supporting different FP flavors (NaN encodings etc).
> > which do not exist on LoongArch.  I'm not sure how MIPS evades the issue
> > (or maybe it's just buggy too) as I don't have a running MIPS hardware
> > now.
> I think you can use QEMU. :)

I'll investigate it later.

> > So for LoongArch, just remove the current->thread.fpu.fcsr setting from
> > SET_PERSONALITY2 and do it in start_thread, after lose_fpu(0).  And we
> > just set it to 0, instead of boot_cpu_data.fpu_csr0 (because we should
> > provide the userspace a consistent configuration, no matter how hardware
> > and firmware behave).
> I still prefer to set fcsr to boot_cpu_data.fpu_csr0, because we will
> add LoongArch32 later, not sure whether something will change.

I just seen fpu_csr0 is initialized to FPU_CSR_RN which is just 0 for
LA64, so my concern about firmware & hardware leaving non-zero FCSR is
not valid. I'll send v2 to keep using boot_cpu_data.fpu_csr0 then.

>
>
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University