LinuxLists.cc - [PATCH v2] LoongArch: Clean useless vcsr in loongarch

2022-07-04 15:44:17

Subject: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

The `vcsr` is not used anymore. Remove this member from `loongarch_fpu`.

From 3A5000(LoongArch), `vcsr` is removed in hardware. FP and LSX/LASX
both use `fcsr` as their csr.

Particularly, fcsr from $r16 to $r31 are reserved for LSX/LASX, and
using the registers in this area will cause SXD/ASXD if LSX/LASX is
not enabled.

Signed-off-by: Qi Hu <[email protected]>
---
V2:
- Add more details in the commit message.
---
arch/loongarch/include/asm/fpregdef.h | 1 -
arch/loongarch/include/asm/processor.h | 2 --
arch/loongarch/kernel/asm-offsets.c | 1 -
arch/loongarch/kernel/fpu.S | 10 ----------
4 files changed, 14 deletions(-)

diff --git a/arch/loongarch/include/asm/fpregdef.h b/arch/loongarch/include/asm/fpregdef.h
index adb16e4b4..b6be52783 100644
--- a/arch/loongarch/include/asm/fpregdef.h
+++ b/arch/loongarch/include/asm/fpregdef.h
@@ -48,6 +48,5 @@
#define fcsr1 $r1
#define fcsr2 $r2
#define fcsr3 $r3
-#define vcsr16 $r16

#endif /* _ASM_FPREGDEF_H */
diff --git a/arch/loongarch/include/asm/processor.h b/arch/loongarch/include/asm/processor.h
index 1d63c934b..57ec45aa0 100644
--- a/arch/loongarch/include/asm/processor.h
+++ b/arch/loongarch/include/asm/processor.h
@@ -80,7 +80,6 @@ BUILD_FPR_ACCESS(64)

struct loongarch_fpu {
unsigned int fcsr;
- unsigned int vcsr;
uint64_t fcc; /* 8x8 */
union fpureg fpr[NUM_FPU_REGS];
};
@@ -161,7 +160,6 @@ struct thread_struct {
*/ \
.fpu = { \
.fcsr = 0, \
- .vcsr = 0, \
.fcc = 0, \
.fpr = {{{0,},},}, \
}, \
diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c
index bfb65eb28..20cd9e16a 100644
--- a/arch/loongarch/kernel/asm-offsets.c
+++ b/arch/loongarch/kernel/asm-offsets.c
@@ -166,7 +166,6 @@ void output_thread_fpu_defines(void)

OFFSET(THREAD_FCSR, loongarch_fpu, fcsr);
OFFSET(THREAD_FCC, loongarch_fpu, fcc);
- OFFSET(THREAD_VCSR, loongarch_fpu, vcsr);
BLANK();
}

diff --git a/arch/loongarch/kernel/fpu.S b/arch/loongarch/kernel/fpu.S
index 75c6ce068..a631a7137 100644
--- a/arch/loongarch/kernel/fpu.S
+++ b/arch/loongarch/kernel/fpu.S
@@ -146,16 +146,6 @@
movgr2fcsr fcsr0, \tmp0
.endm

- .macro sc_save_vcsr base, tmp0
- movfcsr2gr \tmp0, vcsr16
- EX st.w \tmp0, \base, 0
- .endm
-
- .macro sc_restore_vcsr base, tmp0
- EX ld.w \tmp0, \base, 0
- movgr2fcsr vcsr16, \tmp0
- .endm
-
/*
* Save a thread's fp context.
*/
--
2.36.1

2022-07-05 08:56:41

by WANG Xuerui

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

Hi,

I just noticed that the subject line is in Chinglish too. It should
probably be just "LoongArch: remove mentions of vcsr" and without the
trailing period.

On 2022/7/4 23:36, Qi Hu wrote:
> The `vcsr` is not used anymore. Remove this member from `loongarch_fpu`.
>
> From 3A5000(LoongArch), `vcsr` is removed in hardware. FP and LSX/LASX
> both use `fcsr` as their csr.
>
> Particularly, fcsr from $r16 to $r31 are reserved for LSX/LASX, and
> using the registers in this area will cause SXD/ASXD if LSX/LASX is
> not enabled.

Actually I'm still not very satisfied with the explanation; the code
must be written with *something* in mind, since GS464V/LA464 is the only
LoongArch implementation so far, it must have a VCSR to begin with. And
you can't magically melt the VCSR on the tens of thousands of
3A5000/3C5000's already shipped, because the old-world kernel obviously
comes with LSX/LASX and it most likely utilizes the VCSR. In addition,
you didn't mention what will happen if LSX/LASX *is* enabled on this
new-world kernel, *and* fcsr16 is being accessed.

I think maybe you just want to remove the mentions of VCSR since they
are dead code right now, as I don't believe it's gone in shipped
hardware, as explained above. Except if there's magically a way to
implement LSX/LASX without touching the said-to-have-disappeared VCSR,
which I don't know of, and can't know because the LSX/LASX ISA manual is
still not publicly accessible; so I don't feel comfortable approving
this patch.

(BTW, the $rXX-style reference to the FCSRs is obviously an
implementation wart of the toolchain, the FCSR is nothing like GPR so it
obviously shouldn't be referred to as one. Binutils patch will be out
shortly.)

>
> Signed-off-by: Qi Hu <[email protected]>
> ---
> V2:
> - Add more details in the commit message.
> ---
> arch/loongarch/include/asm/fpregdef.h | 1 -
> arch/loongarch/include/asm/processor.h | 2 --
> arch/loongarch/kernel/asm-offsets.c | 1 -
> arch/loongarch/kernel/fpu.S | 10 ----------
> 4 files changed, 14 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/fpregdef.h b/arch/loongarch/include/asm/fpregdef.h
> index adb16e4b4..b6be52783 100644
> --- a/arch/loongarch/include/asm/fpregdef.h
> +++ b/arch/loongarch/include/asm/fpregdef.h
> @@ -48,6 +48,5 @@
> #define fcsr1 $r1
> #define fcsr2 $r2
> #define fcsr3 $r3
> -#define vcsr16 $r16
>
> #endif /* _ASM_FPREGDEF_H */
> diff --git a/arch/loongarch/include/asm/processor.h b/arch/loongarch/include/asm/processor.h
> index 1d63c934b..57ec45aa0 100644
> --- a/arch/loongarch/include/asm/processor.h
> +++ b/arch/loongarch/include/asm/processor.h
> @@ -80,7 +80,6 @@ BUILD_FPR_ACCESS(64)
>
> struct loongarch_fpu {
> unsigned int fcsr;
> - unsigned int vcsr;
> uint64_t fcc; /* 8x8 */
> union fpureg fpr[NUM_FPU_REGS];
> };
> @@ -161,7 +160,6 @@ struct thread_struct {
> */ \
> .fpu = { \
> .fcsr = 0, \
> - .vcsr = 0, \
> .fcc = 0, \
> .fpr = {{{0,},},}, \
> }, \
> diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c
> index bfb65eb28..20cd9e16a 100644
> --- a/arch/loongarch/kernel/asm-offsets.c
> +++ b/arch/loongarch/kernel/asm-offsets.c
> @@ -166,7 +166,6 @@ void output_thread_fpu_defines(void)
>
> OFFSET(THREAD_FCSR, loongarch_fpu, fcsr);
> OFFSET(THREAD_FCC, loongarch_fpu, fcc);
> - OFFSET(THREAD_VCSR, loongarch_fpu, vcsr);
> BLANK();
> }
>
> diff --git a/arch/loongarch/kernel/fpu.S b/arch/loongarch/kernel/fpu.S
> index 75c6ce068..a631a7137 100644
> --- a/arch/loongarch/kernel/fpu.S
> +++ b/arch/loongarch/kernel/fpu.S
> @@ -146,16 +146,6 @@
> movgr2fcsr fcsr0, \tmp0
> .endm
>
> - .macro sc_save_vcsr base, tmp0
> - movfcsr2gr \tmp0, vcsr16
> - EX st.w \tmp0, \base, 0
> - .endm
> -
> - .macro sc_restore_vcsr base, tmp0
> - EX ld.w \tmp0, \base, 0
> - movgr2fcsr vcsr16, \tmp0
> - .endm
> -
> /*
> * Save a thread's fp context.
> */

2022-07-05 10:01:44

by Xi Ruoyao

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

On Tue, 2022-07-05 at 16:46 +0800, WANG Xuerui wrote:

> Actually I'm still not very satisfied with the explanation; the code
> must be written with *something* in mind, since GS464V/LA464 is the only
> LoongArch implementation so far, it must have a VCSR to begin with. And
> you can't magically melt the VCSR on the tens of thousands of
> 3A5000/3C5000's already shipped, because the old-world kernel obviously
> comes with LSX/LASX and it most likely utilizes the VCSR. In addition,
> you didn't mention what will happen if LSX/LASX *is* enabled on this
> new-world kernel, *and* fcsr16 is being accessed.
>
> I think maybe you just want to remove the mentions of VCSR since they
> are dead code right now, as I don't believe it's gone in shipped
> hardware, as explained above. Except if there's magically a way to
> implement LSX/LASX without touching the said-to-have-disappeared VCSR,
> which I don't know of, and can't know because the LSX/LASX ISA manual is
> still not publicly accessible; so I don't feel comfortable approving
> this patch.

Let me make some bold guess here. In the MIPS-compatible 3A4000 we had
"MSACSR" register. According to the MSA manual, only the 24-th bit of
this register is used:

"Some MSA floating point instructions might not handle subnormal input
operands or compute tiny non-zero results. Such instructions may signal
the Unimplemented Operation Exception and let the software emulation
finalize the operation. If software emulation is not needed or desired,
MSACSR FS bit could be set to replace every tiny non-zero result and
subnormal input operand with zero of the same sign."

And, it says:

"Should the alternate exception handling attributes of the IEEE Standard
for Floating-Point Arithmetic 754-2008, Section 8 be desired, the MSACSR
FS bit should be zero, the Underflow Exception be enabled and a trap
handler be provided to carry out the execution of the alternate
exception handling attributes."

We can see Loongson has been trying to make 3A processors more IEEE-754
compatible. For example, 3A4000 is the only non-R6 MIPS-compatible
processor using IEEE-754-2008 NaN encoding. And LoongArch manual
directly refers to IEEE-754-2008 manual in many places. So I guess this
change means Loongson won't set this bit to 1 for 3A5000 at all, and any
attempt by the user to set this bit could be considered undefined
behavior (causing inconsistent behavior on 3A5000 as the kernel doesn't
save/restore VCSR at context switch, and SIGILL on 3A6000 as the
register is removed).
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2022-07-05 10:20:57

by Xi Ruoyao

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

On Tue, 2022-07-05 at 17:49 +0800, Xi Ruoyao wrote:
> On Tue, 2022-07-05 at 16:46 +0800, WANG Xuerui wrote:
>
> > Actually I'm still not very satisfied with the explanation; the code
> > must be written with *something* in mind, since GS464V/LA464 is the only
> > LoongArch implementation so far, it must have a VCSR to begin with. And
> > you can't magically melt the VCSR on the tens of thousands of
> > 3A5000/3C5000's already shipped, because the old-world kernel obviously
> > comes with LSX/LASX and it most likely utilizes the VCSR. In addition,
> > you didn't mention what will happen if LSX/LASX *is* enabled on this
> > new-world kernel, *and* fcsr16 is being accessed.
> >
> > I think maybe you just want to remove the mentions of VCSR since they
> > are dead code right now, as I don't believe it's gone in shipped
> > hardware, as explained above. Except if there's magically a way to
> > implement LSX/LASX without touching the said-to-have-disappeared VCSR,
> > which I don't know of, and can't know because the LSX/LASX ISA manual is
> > still not publicly accessible; so I don't feel comfortable approving
> > this patch.
>
> Let me make some bold guess here. In the MIPS-compatible 3A4000 we had
> "MSACSR" register. According to the MSA manual, only the 24-th bit of
> this register is used:
>
> "Some MSA floating point instructions might not handle subnormal input
> operands or compute tiny non-zero results. Such instructions may signal
> the Unimplemented Operation Exception and let the software emulation
> finalize the operation. If software emulation is not needed or desired,
> MSACSR FS bit could be set to replace every tiny non-zero result and
> subnormal input operand with zero of the same sign."
>
> And, it says:
>
> "Should the alternate exception handling attributes of the IEEE Standard
> for Floating-Point Arithmetic 754-2008, Section 8 be desired, the MSACSR
> FS bit should be zero, the Underflow Exception be enabled and a trap
> handler be provided to carry out the execution of the alternate
> exception handling attributes."
>
> We can see Loongson has been trying to make 3A processors more IEEE-754
> compatible. For example, 3A4000 is the only non-R6 MIPS-compatible
> processor using IEEE-754-2008 NaN encoding. And LoongArch manual
> directly refers to IEEE-754-2008 manual in many places. So I guess this
> change means Loongson won't set this bit to 1 for 3A5000 at all, and any
> attempt by the user to set this bit could be considered undefined
> behavior (causing inconsistent behavior on 3A5000 as the kernel doesn't
> save/restore VCSR at context switch, and SIGILL on 3A6000 as the
> register is removed).

P. S. I'm just playing an "intelligence game" writing the reply so maybe
I'm completely wrong. And, it does not indicate any position of me or
my affiliation about some "controversial topics". And, even if my guess
is correct, it's still better to add something like "the use of VCSR is
optional in LA464 and we've decided to remove it in future
architectures, so it's not supported to use it even on LA464" into the
commit message.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2022-07-06 02:48:19

by Huacai Chen

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

Hi, all,

On Tue, Jul 5, 2022 at 6:01 PM Xi Ruoyao <[email protected]> wrote:
>
> On Tue, 2022-07-05 at 17:49 +0800, Xi Ruoyao wrote:
> > On Tue, 2022-07-05 at 16:46 +0800, WANG Xuerui wrote:
> >
> > > Actually I'm still not very satisfied with the explanation; the code
> > > must be written with *something* in mind, since GS464V/LA464 is the only
> > > LoongArch implementation so far, it must have a VCSR to begin with. And
> > > you can't magically melt the VCSR on the tens of thousands of
> > > 3A5000/3C5000's already shipped, because the old-world kernel obviously
> > > comes with LSX/LASX and it most likely utilizes the VCSR. In addition,
> > > you didn't mention what will happen if LSX/LASX *is* enabled on this
> > > new-world kernel, *and* fcsr16 is being accessed.
> > >
> > > I think maybe you just want to remove the mentions of VCSR since they
> > > are dead code right now, as I don't believe it's gone in shipped
> > > hardware, as explained above. Except if there's magically a way to
> > > implement LSX/LASX without touching the said-to-have-disappeared VCSR,
> > > which I don't know of, and can't know because the LSX/LASX ISA manual is
> > > still not publicly accessible; so I don't feel comfortable approving
> > > this patch.
> >
> > Let me make some bold guess here. In the MIPS-compatible 3A4000 we had
> > "MSACSR" register. According to the MSA manual, only the 24-th bit of
> > this register is used:
> >
> > "Some MSA floating point instructions might not handle subnormal input
> > operands or compute tiny non-zero results. Such instructions may signal
> > the Unimplemented Operation Exception and let the software emulation
> > finalize the operation. If software emulation is not needed or desired,
> > MSACSR FS bit could be set to replace every tiny non-zero result and
> > subnormal input operand with zero of the same sign."
> >
> > And, it says:
> >
> > "Should the alternate exception handling attributes of the IEEE Standard
> > for Floating-Point Arithmetic 754-2008, Section 8 be desired, the MSACSR
> > FS bit should be zero, the Underflow Exception be enabled and a trap
> > handler be provided to carry out the execution of the alternate
> > exception handling attributes."
> >
> > We can see Loongson has been trying to make 3A processors more IEEE-754
> > compatible. For example, 3A4000 is the only non-R6 MIPS-compatible
> > processor using IEEE-754-2008 NaN encoding. And LoongArch manual
> > directly refers to IEEE-754-2008 manual in many places. So I guess this
> > change means Loongson won't set this bit to 1 for 3A5000 at all, and any
> > attempt by the user to set this bit could be considered undefined
> > behavior (causing inconsistent behavior on 3A5000 as the kernel doesn't
> > save/restore VCSR at context switch, and SIGILL on 3A6000 as the
> > register is removed).
>
> P. S. I'm just playing an "intelligence game" writing the reply so maybe
> I'm completely wrong. And, it does not indicate any position of me or
> my affiliation about some "controversial topics". And, even if my guess
> is correct, it's still better to add something like "the use of VCSR is
> optional in LA464 and we've decided to remove it in future
> architectures, so it's not supported to use it even on LA464" into the
> commit message.

Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
surely be upstream, this has nothing to do with cleanup VCSR16.
Because FP/LSX/LASX share the same control bits in FCSR now. Am I
right? Qi Hu?

If I'm right, Qi Hu, please tell us what control bits were in VCSR16,
and where their replacements in FCSR are. Thanks.

Huacai

> --
> Xi Ruoyao <[email protected]>
> School of Aerospace Science and Technology, Xidian University

2022-07-06 03:12:04

by Xi Ruoyao

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

On Wed, 2022-07-06 at 10:35 +0800, Huacai Chen wrote:

> Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
> surely be upstream, this has nothing to do with cleanup VCSR16.
> Because FP/LSX/LASX share the same control bits in FCSR now.

My guess:

Almost all behavior of vector unit is controlled by FCSR (for example,
the rounding of both FPU and vector unit should be controlled by FCSR
altogether), except one bit similar to the bit 24 of MSACSR ("flush to
zero") is in VCSR [^1]. And "flush to zero" is not really useful so it
will be removed in 3A6000, and we'll not use it for 3A5000.

[^1]: A more bold guess: the hardware engineers could have just said
"let's wire this register called MSACSR in GS464V as FCSR16/VCSR in
LA464, maybe it will be useful and who knows?" But now in practice it's
not useful.

Am I correct?

--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2022-07-06 04:26:26

by Qi Hu

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

On 2022/7/6 10:51, Xi Ruoyao wrote:
> On Wed, 2022-07-06 at 10:35 +0800, Huacai Chen wrote:
>
>> Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
>> surely be upstream, this has nothing to do with cleanup VCSR16.
>> Because FP/LSX/LASX share the same control bits in FCSR now.
> My guess:
>
> Almost all behavior of vector unit is controlled by FCSR (for example,
> the rounding of both FPU and vector unit should be controlled by FCSR
> altogether), except one bit similar to the bit 24 of MSACSR ("flush to
> zero") is in VCSR [^1]. And "flush to zero" is not really useful so it
> will be removed in 3A6000, and we'll not use it for 3A5000.
Actually, flush to zero has been removed in 3A5000.
>
> [^1]: A more bold guess: the hardware engineers could have just said
> "let's wire this register called MSACSR in GS464V as FCSR16/VCSR in
> LA464, maybe it will be useful and who knows?" But now in practice it's
> not useful.
>
> Am I correct?
The hardware(LA464) has removed the vcsr("has but not use" is
incorrect), and here are some details:

- For all FP operations, including LSX/LASX, they are controlled by
fcsr0/1/2/3.

- For LSX/LASX other operations, they are *not* controlled by any other
CSR now. And fcsr16 to fcsr31 are reserved to control these operations
(now they are *undefined*).

- Flush to zero(MSACSR.FS) is removed and not supported.

- If you use "movfcsr2gr" to read the fcsr16, the value is *UNDEFINED*.

2022-07-06 20:56:33

by Jiaxun Yang

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

在2022年7月6日七月上午5:00，Qi Hu写道：
> On 2022/7/6 10:51, Xi Ruoyao wrote:
>> On Wed, 2022-07-06 at 10:35 +0800, Huacai Chen wrote:
>>
>>> Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
>>> surely be upstream, this has nothing to do with cleanup VCSR16.
>>> Because FP/LSX/LASX share the same control bits in FCSR now.
>> My guess:
>>
>> Almost all behavior of vector unit is controlled by FCSR (for example,
>> the rounding of both FPU and vector unit should be controlled by FCSR
>> altogether), except one bit similar to the bit 24 of MSACSR ("flush to
>> zero") is in VCSR [^1]. And "flush to zero" is not really useful so it
>> will be removed in 3A6000, and we'll not use it for 3A5000.
> Actually, flush to zero has been removed in 3A5000.
>>
>> [^1]: A more bold guess: the hardware engineers could have just said
>> "let's wire this register called MSACSR in GS464V as FCSR16/VCSR in
>> LA464, maybe it will be useful and who knows?" But now in practice it's
>> not useful.
>>
>> Am I correct?
> The hardware(LA464) has removed the vcsr("has but not use" is
> incorrect), and here are some details:
>
> - For all FP operations, including LSX/LASX, they are controlled by
> fcsr0/1/2/3.
>
> - For LSX/LASX other operations, they are *not* controlled by any other
> CSR now. And fcsr16 to fcsr31 are reserved to control these operations
> (now they are *undefined*).
Sorry but what do you meant by “these” here?
If it means LSX/LASX, are you trying to say that future chip’s LSX/LASX won’t be
compatible with present 3A5000? As your said fcsr16 and fcsr31 are undefined
for now.

Thanks
-

>
> - Flush to zero(MSACSR.FS) is removed and not supported.
>
> - If you use "movfcsr2gr" to read the fcsr16, the value is *UNDEFINED*.

--
- Jiaxun

2022-07-07 01:31:48

by Qi Hu

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

On 2022/7/7 04:49, Jiaxun Yang wrote:
>
> 在2022年7月6日七月上午5:00，Qi Hu写道：
>> On 2022/7/6 10:51, Xi Ruoyao wrote:
>>> On Wed, 2022-07-06 at 10:35 +0800, Huacai Chen wrote:
>>>
>>>> Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
>>>> surely be upstream, this has nothing to do with cleanup VCSR16.
>>>> Because FP/LSX/LASX share the same control bits in FCSR now.
>>> My guess:
>>>
>>> Almost all behavior of vector unit is controlled by FCSR (for example,
>>> the rounding of both FPU and vector unit should be controlled by FCSR
>>> altogether), except one bit similar to the bit 24 of MSACSR ("flush to
>>> zero") is in VCSR [^1]. And "flush to zero" is not really useful so it
>>> will be removed in 3A6000, and we'll not use it for 3A5000.
>> Actually, flush to zero has been removed in 3A5000.
>>> [^1]: A more bold guess: the hardware engineers could have just said
>>> "let's wire this register called MSACSR in GS464V as FCSR16/VCSR in
>>> LA464, maybe it will be useful and who knows?" But now in practice it's
>>> not useful.
>>>
>>> Am I correct?
>> The hardware(LA464) has removed the vcsr("has but not use" is
>> incorrect), and here are some details:
>>
>> - For all FP operations, including LSX/LASX, they are controlled by
>> fcsr0/1/2/3.
>>
>> - For LSX/LASX other operations, they are *not* controlled by any other
>> CSR now. And fcsr16 to fcsr31 are reserved to control these operations
>> (now they are *undefined*).
> Sorry but what do you meant by “these” here?
"These operations" means "LSX/LASX other operations", except its
floating-point operations.
> If it means LSX/LASX, are you trying to say that future chip’s LSX/LASX won’t be
> compatible with present 3A5000? As your said fcsr16 and fcsr31 are undefined
> for now.
>
> Thanks
> -

"not compatible" is incorrect. If future chips add new features to
define and use these registers, some bits in CPUCFG should be set, like
CPUID in X86.

And at that time, if the applications do not use these new features(like
some old apps), they can run at *default* state which is the same as
current 3A5000.

So, "reserved" is just to prepare for adding something, not "incompatible".

Thanks.

>> - Flush to zero(MSACSR.FS) is removed and not supported.
>>
>> - If you use "movfcsr2gr" to read the fcsr16, the value is *UNDEFINED*.

2022-07-07 04:18:32

by Xi Ruoyao

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

On Thu, 2022-07-07 at 11:05 +0800, WANG Xuerui wrote:

> To be frank, at this point I think you're trying to hide something.
> (This is not your fault, blame someone else of course because they told
> you the fact.) In the old-world kernel the VCSR a.k.a. FCSR16 is
> certainly being saved/restored, and there's apparently no harm in doing
> so. And if the contents are indeed "undefined", why are the code there
> in the first place? Certainly the bits *are* meaningful, only that for
> some reason you aren't revealing the semantics and pretending that they
> are "undefined" and probably "do nothing externally observable" if
> accessed in the first place.

On a 3A5000LL, I did an experiment via a kernel module, which enables
LSX/LASX and tries to write and read fcsr16. I tried each bit (1, 2, 4,
8, ..., 1 << 31) one by one. The result: no matter which bit I wrote
into fcsr16, I always read out 0.

And I've objdump'ed a kernel shipped in an early Loongnix release. It
seems the only reference to fcsr16 is a "movgr2fcsr $r16, $r0"
instruction.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2022-07-07 04:27:12

by WANG Xuerui

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

On 2022/7/7 09:29, Qi Hu wrote:
>
> On 2022/7/7 04:49, Jiaxun Yang wrote:
>>
>> 在2022年7月6日七月上午5:00，Qi Hu写道：
>>> On 2022/7/6 10:51, Xi Ruoyao wrote:
>>>> On Wed, 2022-07-06 at 10:35 +0800, Huacai Chen wrote:
>>>>
>>>>> Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
>>>>> surely be upstream, this has nothing to do with cleanup VCSR16.
>>>>> Because FP/LSX/LASX share the same control bits in FCSR now.
>>>> My guess:
>>>>
>>>> Almost all behavior of vector unit is controlled by FCSR (for example,
>>>> the rounding of both FPU and vector unit should be controlled by FCSR
>>>> altogether), except one bit similar to the bit 24 of MSACSR ("flush to
>>>> zero") is in VCSR [^1]. And "flush to zero" is not really useful
>>>> so it
>>>> will be removed in 3A6000, and we'll not use it for 3A5000.
>>> Actually, flush to zero has been removed in 3A5000.
>>>> [^1]: A more bold guess: the hardware engineers could have just said
>>>> "let's wire this register called MSACSR in GS464V as FCSR16/VCSR in
>>>> LA464, maybe it will be useful and who knows?" But now in practice
>>>> it's
>>>> not useful.
>>>>
>>>> Am I correct?
>>> The hardware(LA464) has removed the vcsr("has but not use" is
>>> incorrect), and here are some details:
>>>
>>> - For all FP operations, including LSX/LASX, they are controlled by
>>> fcsr0/1/2/3.
>>>
>>> - For LSX/LASX other operations, they are *not* controlled by any other
>>> CSR now. And fcsr16 to fcsr31 are reserved to control these operations
>>> (now they are *undefined*).
>> Sorry but what do you meant by “these” here?
> "These operations" means "LSX/LASX other operations", except its
> floating-point operations.

This is getting hard to follow. Assuming the expression "LSX/LASX other
operations" is Chinglish (it's certainly not proper English), I think
you mean "the non-FP operations belonging to LSX/LASX" here right?

And it's strange, that these ops do exist in LSX/LASX, hence also
present in 3A5000, but the control bits are undefined. How come this is
even possible?

>> If it means LSX/LASX, are you trying to say that future chip’s
>> LSX/LASX won’t be
>> compatible with present 3A5000? As your said fcsr16 and fcsr31 are
>> undefined
>> for now.
>>
>> Thanks
>> -
>
> "not compatible" is incorrect. If future chips add new features to
> define and use these registers, some bits in CPUCFG should be set,
> like CPUID in X86.
>
> And at that time, if the applications do not use these new
> features(like some old apps), they can run at *default* state which is
> the same as current 3A5000.
>
> So, "reserved" is just to prepare for adding something, not
> "incompatible".

To be frank, at this point I think you're trying to hide something.
(This is not your fault, blame someone else of course because they told
you the fact.) In the old-world kernel the VCSR a.k.a. FCSR16 is
certainly being saved/restored, and there's apparently no harm in doing
so. And if the contents are indeed "undefined", why are the code there
in the first place? Certainly the bits *are* meaningful, only that for
some reason you aren't revealing the semantics and pretending that they
are "undefined" and probably "do nothing externally observable" if
accessed in the first place.

Then, either

(1) I am wrong and being hyper-aggressive here (and if it turns out to
be the case, I'd immediately apologize for that), and the VCSR is indeed
only meaningful for some early development steppings of 3A5000, and does
nothing on the 3A5000 we all have now;

or, (2) the VCSR is still relevant for the "other" applicable LSX/LASX
ops on 3A5000, but this patch is taking the VCSR out thus rendering such
ops unreliable/incorrect in case they depend on the additional state.

Either way, the new-world kernel is handling LSX/LASX differently from
the old-world, and I fear this would create unnecessary churn both in
kernel and downstream projects already carrying LSX/LASX assembly, in
case they actually used said ops. I can't rule out such possibility
because at this time all LSX/LASX assembly was written by Loongson
people with access to the manuals, and tested on old-world systems, so
they might not be aware that some non-FP ops are going to magically
change behavior on new-world systems because VCSR is suddenly not being
saved/restored.

(Actually, people outside of Loongson might never know if (2) is the
case, because the insns dependent on VCSR could be simply deleted from
manuals, so they "don't exist" for outsiders, and "compatibility" of
course doesn't suffer.)

>
> Thanks.
>
>>> - Flush to zero(MSACSR.FS) is removed and not supported.
>>>
>>> - If you use "movfcsr2gr" to read the fcsr16, the value is *UNDEFINED*.
>

2022-07-07 04:28:35

by WANG Xuerui

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

On 2022/7/7 12:04, Xi Ruoyao wrote:
> On Thu, 2022-07-07 at 11:05 +0800, WANG Xuerui wrote:
>
>> To be frank, at this point I think you're trying to hide something.
>> (This is not your fault, blame someone else of course because they told
>> you the fact.) In the old-world kernel the VCSR a.k.a. FCSR16 is
>> certainly being saved/restored, and there's apparently no harm in doing
>> so. And if the contents are indeed "undefined", why are the code there
>> in the first place? Certainly the bits *are* meaningful, only that for
>> some reason you aren't revealing the semantics and pretending that they
>> are "undefined" and probably "do nothing externally observable" if
>> accessed in the first place.
> On a 3A5000LL, I did an experiment via a kernel module, which enables
> LSX/LASX and tries to write and read fcsr16. I tried each bit (1, 2, 4,
> 8, ..., 1 << 31) one by one. The result: no matter which bit I wrote
> into fcsr16, I always read out 0.
>
> And I've objdump'ed a kernel shipped in an early Loongnix release. It
> seems the only reference to fcsr16 is a "movgr2fcsr $r16, $r0"
> instruction.

Hmm this is weird. I can't understand why the vcsr code was there in the
first place then... I'd like to check a few Loongnix/Kylin/UOS kernels
but currently I don't have the time.

If this is the case, indeed all vcsr-related code should be removed.
Although I'm still not sure how to best word the commit message.

2022-07-07 05:47:31

by Qi Hu

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

On 2022/7/7 11:05, WANG Xuerui wrote:
> On 2022/7/7 09:29, Qi Hu wrote:
>>
>> On 2022/7/7 04:49, Jiaxun Yang wrote:
>>>
>>> 在2022年7月6日七月上午5:00，Qi Hu写道：
>>>> On 2022/7/6 10:51, Xi Ruoyao wrote:
>>>>> On Wed, 2022-07-06 at 10:35 +0800, Huacai Chen wrote:
>>>>>
>>>>>> Maybe Xuerui and Ruoyao have some misunderstanding. LSX/LASX will
>>>>>> surely be upstream, this has nothing to do with cleanup VCSR16.
>>>>>> Because FP/LSX/LASX share the same control bits in FCSR now.
>>>>> My guess:
>>>>>
>>>>> Almost all behavior of vector unit is controlled by FCSR (for
>>>>> example,
>>>>> the rounding of both FPU and vector unit should be controlled by FCSR
>>>>> altogether), except one bit similar to the bit 24 of MSACSR
>>>>> ("flush to
>>>>> zero") is in VCSR [^1]. And "flush to zero" is not really useful
>>>>> so it
>>>>> will be removed in 3A6000, and we'll not use it for 3A5000.
>>>> Actually, flush to zero has been removed in 3A5000.
>>>>> [^1]: A more bold guess: the hardware engineers could have just said
>>>>> "let's wire this register called MSACSR in GS464V as FCSR16/VCSR in
>>>>> LA464, maybe it will be useful and who knows?" But now in
>>>>> practice it's
>>>>> not useful.
>>>>>
>>>>> Am I correct?
>>>> The hardware(LA464) has removed the vcsr("has but not use" is
>>>> incorrect), and here are some details:
>>>>
>>>> - For all FP operations, including LSX/LASX, they are controlled by
>>>> fcsr0/1/2/3.
>>>>
>>>> - For LSX/LASX other operations, they are *not* controlled by any
>>>> other
>>>> CSR now. And fcsr16 to fcsr31 are reserved to control these operations
>>>> (now they are *undefined*).
>>> Sorry but what do you meant by “these” here?
>> "These operations" means "LSX/LASX other operations", except its
>> floating-point operations.
>
> This is getting hard to follow. Assuming the expression "LSX/LASX
> other operations" is Chinglish (it's certainly not proper English), I
> think you mean "the non-FP operations belonging to LSX/LASX" here right?
That's right.
>
> And it's strange, that these ops do exist in LSX/LASX, hence also
> present in 3A5000, but the control bits are undefined. How come this
> is even possible?
The code is redundant, actually. Reading or writing the fcsr16 do not
have any effect on LSX/LASX.

2022-07-07 06:18:59

by Qi Hu

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

On 2022/7/7 12:22, WANG Xuerui wrote:
> On 2022/7/7 12:04, Xi Ruoyao wrote:
>> On Thu, 2022-07-07 at 11:05 +0800, WANG Xuerui wrote:
>>
>>> To be frank, at this point I think you're trying to hide something.
>>> (This is not your fault, blame someone else of course because they told
>>> you the fact.) In the old-world kernel the VCSR a.k.a. FCSR16 is
>>> certainly being saved/restored, and there's apparently no harm in doing
>>> so. And if the contents are indeed "undefined", why are the code there
>>> in the first place? Certainly the bits *are* meaningful, only that for
>>> some reason you aren't revealing the semantics and pretending that they
>>> are "undefined" and probably "do nothing externally observable" if
>>> accessed in the first place.
>> On a 3A5000LL, I did an experiment via a kernel module, which enables
>> LSX/LASX and tries to write and read fcsr16. I tried each bit (1, 2, 4,
>> 8, ..., 1 << 31) one by one. The result: no matter which bit I wrote
>> into fcsr16, I always read out 0.
>>
>> And I've objdump'ed a kernel shipped in an early Loongnix release. It
>> seems the only reference to fcsr16 is a "movgr2fcsr $r16, $r0"
>> instruction.
>
> Hmm this is weird. I can't understand why the vcsr code was there in
> the first place then... I'd like to check a few Loongnix/Kylin/UOS
> kernels but currently I don't have the time.
>
> If this is the case, indeed all vcsr-related code should be removed.
> Although I'm still not sure how to best word the commit message.
>
Thanks for the Ruoyao's experiment.

Removing the vcsr is the first step to trying to support LBT and
LSX/LASX in Kernel. In my opinion, the vcsr relevant code may be used
for debugging and forgot to remove.

2022-07-07 06:58:18

by WANG Xuerui

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

On 2022/7/7 14:09, Qi Hu wrote:
>
> On 2022/7/7 12:22, WANG Xuerui wrote:
>> On 2022/7/7 12:04, Xi Ruoyao wrote:
>>> On Thu, 2022-07-07 at 11:05 +0800, WANG Xuerui wrote:
>>>
>>>> To be frank, at this point I think you're trying to hide something.
>>>> (This is not your fault, blame someone else of course because they
>>>> told
>>>> you the fact.) In the old-world kernel the VCSR a.k.a. FCSR16 is
>>>> certainly being saved/restored, and there's apparently no harm in
>>>> doing
>>>> so. And if the contents are indeed "undefined", why are the code there
>>>> in the first place? Certainly the bits *are* meaningful, only that for
>>>> some reason you aren't revealing the semantics and pretending that
>>>> they
>>>> are "undefined" and probably "do nothing externally observable" if
>>>> accessed in the first place.
>>> On a 3A5000LL, I did an experiment via a kernel module, which enables
>>> LSX/LASX and tries to write and read fcsr16. I tried each bit (1,
>>> 2, 4,
>>> 8, ..., 1 << 31) one by one. The result: no matter which bit I wrote
>>> into fcsr16, I always read out 0.
>>>
>>> And I've objdump'ed a kernel shipped in an early Loongnix release. It
>>> seems the only reference to fcsr16 is a "movgr2fcsr $r16, $r0"
>>> instruction.
>>
>> Hmm this is weird. I can't understand why the vcsr code was there in
>> the first place then... I'd like to check a few Loongnix/Kylin/UOS
>> kernels but currently I don't have the time.
>>
>> If this is the case, indeed all vcsr-related code should be removed.
>> Although I'm still not sure how to best word the commit message.
>>
> Thanks for the Ruoyao's experiment.
>
> Removing the vcsr is the first step to trying to support LBT and
> LSX/LASX in Kernel. In my opinion, the vcsr relevant code may be used
> for debugging and forgot to remove.
>
Thinking about it harder, it actually makes sense. Given that access to
the LSX/LASX manuals is currently restricted, outsiders can never know
whether the code in question is really needed, so one has to err on the
conservative side. Thanks for the clarification, and my apologies for
being harsh in the previous reply.

I think the commit message could be reworded like:

"The VCSR (also known as FCSR16) is not present on retail steppings of
3A5000. FCSR16 through FCSR31 are reserved for non-floating-point
LSX/LASX operations, but on 3A5000 all writes to them are no-op and all
reads return zero. FP LSX/LASX operations reuse the FCSR0 as their CSR.

Remove VCSR handling that is probably leftover debugging code for an
earlier not-for-retail stepping."

(And remove the trailing period in your patch title, while at it; the
Linux kernel doesn't use a trailing period for majority of its commits.)

2022-07-07 06:59:00

by Qi Hu

[permalink] [raw]

Subject: Re: [PATCH v2] LoongArch: Clean useless vcsr in loongarch_fpu.

On 2022/7/7 14:24, WANG Xuerui wrote:
>
> On 2022/7/7 14:09, Qi Hu wrote:
>>
>> On 2022/7/7 12:22, WANG Xuerui wrote:
>>> On 2022/7/7 12:04, Xi Ruoyao wrote:
>>>> On Thu, 2022-07-07 at 11:05 +0800, WANG Xuerui wrote:
>>>>
>>>>> To be frank, at this point I think you're trying to hide something.
>>>>> (This is not your fault, blame someone else of course because they
>>>>> told
>>>>> you the fact.) In the old-world kernel the VCSR a.k.a. FCSR16 is
>>>>> certainly being saved/restored, and there's apparently no harm in
>>>>> doing
>>>>> so. And if the contents are indeed "undefined", why are the code
>>>>> there
>>>>> in the first place? Certainly the bits *are* meaningful, only that
>>>>> for
>>>>> some reason you aren't revealing the semantics and pretending that
>>>>> they
>>>>> are "undefined" and probably "do nothing externally observable" if
>>>>> accessed in the first place.
>>>> On a 3A5000LL, I did an experiment via a kernel module, which enables
>>>> LSX/LASX and tries to write and read fcsr16. I tried each bit (1,
>>>> 2, 4,
>>>> 8, ..., 1 << 31) one by one. The result: no matter which bit I wrote
>>>> into fcsr16, I always read out 0.
>>>>
>>>> And I've objdump'ed a kernel shipped in an early Loongnix release. It
>>>> seems the only reference to fcsr16 is a "movgr2fcsr $r16, $r0"
>>>> instruction.
>>>
>>> Hmm this is weird. I can't understand why the vcsr code was there in
>>> the first place then... I'd like to check a few Loongnix/Kylin/UOS
>>> kernels but currently I don't have the time.
>>>
>>> If this is the case, indeed all vcsr-related code should be removed.
>>> Although I'm still not sure how to best word the commit message.
>>>
>> Thanks for the Ruoyao's experiment.
>>
>> Removing the vcsr is the first step to trying to support LBT and
>> LSX/LASX in Kernel. In my opinion, the vcsr relevant code may be used
>> for debugging and forgot to remove.
>>
> Thinking about it harder, it actually makes sense. Given that access
> to the LSX/LASX manuals is currently restricted, outsiders can never
> know whether the code in question is really needed, so one has to err
> on the conservative side. Thanks for the clarification, and my
> apologies for being harsh in the previous reply.
>
> I think the commit message could be reworded like:
>
> "The VCSR (also known as FCSR16) is not present on retail steppings of
> 3A5000. FCSR16 through FCSR31 are reserved for non-floating-point
> LSX/LASX operations, but on 3A5000 all writes to them are no-op and
> all reads return zero. FP LSX/LASX operations reuse the FCSR0 as their
> CSR.
>
> Remove VCSR handling that is probably leftover debugging code for an
> earlier not-for-retail stepping."
>
> (And remove the trailing period in your patch title, while at it; the
> Linux kernel doesn't use a trailing period for majority of its commits.)
>
Thanks for reminding me about this. It is my first time committing the
patch to Linux Kernel, and the commit message will be modified at V4 patch.

-

Qi