Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align
configurable.
Not all LoongArch cores support h/w unaligned access; we can use the
-mstrict-align build parameter to prevent unaligned accesses.
This option is disabled by default to optimise for performance, but you
can enable it manually if you want to run the kernel on systems without
h/w unaligned access support.
Signed-off-by: Huacai Chen <[email protected]>
---
arch/loongarch/Kconfig | 10 ++++++++++
arch/loongarch/Makefile | 2 ++
2 files changed, 12 insertions(+)
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 9cc8b84f7eb0..7470dcfb32f0 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -441,6 +441,16 @@ config ARCH_IOREMAP
protection support. However, you can enable LoongArch DMW-based
ioremap() for better performance.
+config ARCH_STRICT_ALIGN
+ bool "Enable -mstrict-align to prevent unaligned accesses"
+ help
+ Not all LoongArch cores support h/w unaligned access; we can use
+ the -mstrict-align build parameter to prevent unaligned accesses.
+
+ This is disabled by default to optimise for performance; you can
+ enable it manually if you want to run the kernel on systems without
+ h/w unaligned access support.
+
config KEXEC
bool "Kexec system call"
select KEXEC_CORE
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index 4402387d2755..ccfb52700237 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -91,10 +91,12 @@ KBUILD_CPPFLAGS += -DVMLINUX_LOAD_ADDRESS=$(load-y)
# instead of .eh_frame so we don't discard them.
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
+ifdef CONFIG_ARCH_STRICT_ALIGN
# Don't emit unaligned accesses.
# Not all LoongArch cores support unaligned access, and as kernel we can't
# rely on others to provide emulation for these accesses.
KBUILD_CFLAGS += $(call cc-option,-mstrict-align)
+endif
KBUILD_CFLAGS += -isystem $(shell $(CC) -print-file-name=include)
--
2.39.0
From: Huacai Chen
> Sent: 02 February 2023 08:43
>
> Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
> configurable.
>
> Not all LoongArch cores support h/w unaligned access, we can use the
> -mstrict-align build parameter to prevent unaligned accesses.
>
> This option is disabled by default to optimise for performance, but you
> can enabled it manually if you want to run kernel on systems without h/w
> unaligned access support.
Should there be an associated run-time check during kernel initialisation
that a kernel compiled without -mstrict-align isn't being run on hardware
that doesn't support unaligned accesses?
It can be quite a while before you get a compiler-generated misaligned access.
Also isn't there a HAVE_EFFICIENT_MISALIGNED_ACCESS define that would
also need to be set correctly?
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Thu, Feb 2, 2023, at 09:42, Huacai Chen wrote:
> Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
> configurable.
>
> Not all LoongArch cores support h/w unaligned access, we can use the
> -mstrict-align build parameter to prevent unaligned accesses.
>
> This option is disabled by default to optimise for performance, but you
> can enabled it manually if you want to run kernel on systems without h/w
> unaligned access support.
>
> Signed-off-by: Huacai Chen <[email protected]>
This feels like it's a way too low-level option, I would not expect
users to be able to answer this correctly.
What I would do instead is to have Kconfig options for specific
CPU implementations and derive the alignment requirements from
that.
> +config ARCH_STRICT_ALIGN
> + bool "Enable -mstrict-align to prevent unaligned accesses"
> + help
> + Not all LoongArch cores support h/w unaligned access, we can use
> + -mstrict-align build parameter to prevent unaligned accesses.
> +
> + This is disabled by default to optimise for performance, you can
> + enabled it manually if you want to run kernel on systems without
> + h/w unaligned access support.
> +
There is already a global CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
option, I think you should use that one instead of adding another
one. Setting HAVE_EFFICIENT_UNALIGNED_ACCESS for CPUs that can
do unaligned access will enable some important optimizations in
the network stack and a few other places.
Arnd
On 2023/2/2 16:42, Huacai Chen wrote:
> Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
> configurable.
>
> Not all LoongArch cores support h/w unaligned access, we can use the
> -mstrict-align build parameter to prevent unaligned accesses.
>
> This option is disabled by default to optimise for performance, but you
> can enabled it manually if you want to run kernel on systems without h/w
> unaligned access support.
It's customary to accompany "performance-related" changes like this with
some benchmark numbers and concrete use cases where this would be
profitable. Especially given that arch/loongarch developer and user base
is relatively small, we probably don't want to allow customization of
such a low-level characteristic. In general kernel performance does not
vary much with compiler flags like this, so I'd really hope to see some
numbers here to convince people that this is *really* providing gains.
Also, defaulting to emitting unaligned accesses would mean those future,
likely embedded models (and AFAIK some existing models that haven't
reached GA yet) would lose support with the defconfig. Which means
downstream packagers that care about those use cases would have one more
non-default, non-generic option to carry within their Kconfig. We
probably don't want to repeat the history of other architectures (think
arch/arm or arch/mips) where there weren't really generic builds and
board-specific tweaks proliferated.
>
> Signed-off-by: Huacai Chen <[email protected]>
> ---
> arch/loongarch/Kconfig | 10 ++++++++++
> arch/loongarch/Makefile | 2 ++
> 2 files changed, 12 insertions(+)
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 9cc8b84f7eb0..7470dcfb32f0 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -441,6 +441,16 @@ config ARCH_IOREMAP
> protection support. However, you can enable LoongArch DMW-based
> ioremap() for better performance.
>
> +config ARCH_STRICT_ALIGN
> + bool "Enable -mstrict-align to prevent unaligned accesses"
> + help
> + Not all LoongArch cores support h/w unaligned access, we can use
> + -mstrict-align build parameter to prevent unaligned accesses.
> +
> + This is disabled by default to optimise for performance, you can
> + enabled it manually if you want to run kernel on systems without
> + h/w unaligned access support.
> +
> config KEXEC
> bool "Kexec system call"
> select KEXEC_CORE
> diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
> index 4402387d2755..ccfb52700237 100644
> --- a/arch/loongarch/Makefile
> +++ b/arch/loongarch/Makefile
> @@ -91,10 +91,12 @@ KBUILD_CPPFLAGS += -DVMLINUX_LOAD_ADDRESS=$(load-y)
> # instead of .eh_frame so we don't discard them.
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>
> +ifdef CONFIG_ARCH_STRICT_ALIGN
> # Don't emit unaligned accesses.
> # Not all LoongArch cores support unaligned access, and as kernel we can't
> # rely on others to provide emulation for these accesses.
> KBUILD_CFLAGS += $(call cc-option,-mstrict-align)
> +endif
>
> KBUILD_CFLAGS += -isystem $(shell $(CC) -print-file-name=include)
>
--
WANG "xen0n" Xuerui
Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/
Hi, David,
On Thu, Feb 2, 2023 at 5:01 PM David Laight <[email protected]> wrote:
>
> From: Huacai Chen
> > Sent: 02 February 2023 08:43
> >
> > Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
> > configurable.
> >
> > Not all LoongArch cores support h/w unaligned access, we can use the
> > -mstrict-align build parameter to prevent unaligned accesses.
> >
> > This option is disabled by default to optimise for performance, but you
> > can enabled it manually if you want to run kernel on systems without h/w
> > unaligned access support.
>
> Should there be an associated run-time check during kernel initialisation
> that a kernel compiled without -mstrict-align isn't being run on hardware
> that doesn't support unaligned accesses.
>
> It can be quite a while before you get a compiler-generated misaligned accesses.
If we don't use -mstrict-align, the kernel cannot run on hardware
that doesn't support unaligned accesses, so I think a run-time check
is useless: it would have no chance to run.
>
> Also isn't there a HAVE_EFFICIENT_MISALIGNED_ACCESS define that would
> also need to be set correctly??
Yes, HAVE_EFFICIENT_MISALIGNED_ACCESS should be kept consistent with
ARCH_STRICT_ALIGN, thank you.
Huacai
>
> David
>
>
Hi, Arnd,
On Thu, Feb 2, 2023 at 5:47 PM Arnd Bergmann <[email protected]> wrote:
>
> On Thu, Feb 2, 2023, at 09:42, Huacai Chen wrote:
> > Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
> > configurable.
> >
> > Not all LoongArch cores support h/w unaligned access, we can use the
> > -mstrict-align build parameter to prevent unaligned accesses.
> >
> > This option is disabled by default to optimise for performance, but you
> > can enabled it manually if you want to run kernel on systems without h/w
> > unaligned access support.
> >
> > Signed-off-by: Huacai Chen <[email protected]>
>
> This feels like it's a way too low-level option, I would not expect
> users to be able to answer this correctly.
>
> What I would do instead is to have Kconfig options for specific
> CPU implementations and derive the alignment requirements from
> that.
You mean providing something like CONFIG_CPU_XXXX as MIPS does? That
does not seem like a good idea either. If there are more than three
CONFIG_CPU_XXXX options, the complexity exceeds that of
CONFIG_ARCH_STRICT_ALIGN, and users are still unable to make a correct
selection. On the other hand, we can add more text under
CONFIG_ARCH_STRICT_ALIGN to describe which processors support hardware
unaligned accesses.
Huacai
>
> > +config ARCH_STRICT_ALIGN
> > + bool "Enable -mstrict-align to prevent unaligned accesses"
> > + help
> > + Not all LoongArch cores support h/w unaligned access, we can use
> > + -mstrict-align build parameter to prevent unaligned accesses.
> > +
> > + This is disabled by default to optimise for performance, you can
> > + enabled it manually if you want to run kernel on systems without
> > + h/w unaligned access support.
> > +
>
>
> There is already a global CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> option, I think you should use that one instead of adding another
> one. Setting HAVE_EFFICIENT_UNALIGNED_ACCESS for CPUs that can
> do unaligned access will enable some important optimizations in
> the network stack and a few other places.
>
> Arnd
From: Huacai Chen
> Sent: 03 February 2023 02:01
>
> Hi, David,
>
> On Thu, Feb 2, 2023 at 5:01 PM David Laight <[email protected]> wrote:
> >
> > From: Huacai Chen
> > > Sent: 02 February 2023 08:43
> > >
> > > Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
> > > configurable.
> > >
> > > Not all LoongArch cores support h/w unaligned access, we can use the
> > > -mstrict-align build parameter to prevent unaligned accesses.
> > >
> > > This option is disabled by default to optimise for performance, but you
> > > can enabled it manually if you want to run kernel on systems without h/w
> > > unaligned access support.
> >
> > Should there be an associated run-time check during kernel initialisation
> > that a kernel compiled without -mstrict-align isn't being run on hardware
> > that doesn't support unaligned accesses.
> >
> > It can be quite a while before you get a compiler-generated misaligned accesses.
>
> If we don't use -mstrict-align, the kernel cannot be run on hardware
> that doesn't support unaligned accesses, so I think the run-time check
> is useless, and it has no chance to run the checking.
If you don't add the check and someone boots the wrong type of kernel
then they'll probably get a panic well after booting.
You really do want a check in the boot code.
There is also the question of how userspace is compiled.
You pretty much don't want to be taking traps to fixup misaligned accesses.
So the default compiler options had better include -mstrict-align.
You should look at -mno-strict-align being a performance option when
running on known hardware, not a default.
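The decision such a boot-time check would make can be sketched as below. This is only an illustration, not kernel code: `struct cpu_caps`, `check_alignment()` and the `built_strict` flag are hypothetical names, and how the capability is actually probed on LoongArch is not shown. The check itself would also have to be written so that it cannot perform an unaligned access before it runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the running CPU; not a real kernel structure. */
struct cpu_caps {
    bool hw_unaligned;  /* core handles unaligned accesses in hardware */
};

enum boot_verdict { BOOT_OK, BOOT_REFUSE };

/*
 * built_strict: true if the image was compiled with -mstrict-align.
 * Only one combination is unsafe: an image that may contain unaligned
 * accesses running on a core that cannot perform them.
 */
static enum boot_verdict check_alignment(const struct cpu_caps *caps,
                                         bool built_strict)
{
    assert(caps != NULL);
    if (!built_strict && !caps->hw_unaligned)
        return BOOT_REFUSE;
    return BOOT_OK;
}
```

A caller would report the mismatch and stop early on BOOT_REFUSE, instead of hitting a fault at some arbitrary later point.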
David
On 2023/2/2 6:30 PM, WANG Xuerui wrote:
> On 2023/2/2 16:42, Huacai Chen wrote:
>> Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
>> configurable.
>>
>> Not all LoongArch cores support h/w unaligned access, we can use the
>> -mstrict-align build parameter to prevent unaligned accesses.
>>
>> This option is disabled by default to optimise for performance, but you
>> can enabled it manually if you want to run kernel on systems without h/w
>> unaligned access support.
>
> It's customary to accompany "performance-related" changes like this with
> some benchmark numbers and concrete use cases where this would be
> profitable. Especially given that arch/loongarch developer and user base
> is relatively small, we probably don't want to allow customization of
> such a low-level characteristic. In general kernel performance does not
> vary much with compiler flags like this, so I'd really hope to see some
> numbers here to convince people that this is *really* providing gains.
>
> Also, defaulting to emitting unaligned accesses would mean those future,
> likely embedded models (and AFAIK some existing models that haven't
> reached GA yet) would lose support with the defconfig. Which means
> downstream packagers that care about those use cases would have one more
> non-default, non-generic option to carry within their Kconfig. We
> probably don't want to repeat the history of other architectures (think
> arch/arm or arch/mips) where there wasn't really generic builds and
> board-specific tweaks proliferated.
>
Hi, Xuerui
I think the kernels produced with and without -mstrict-align mainly
differ in the following ways:
- Different size. I built two kernels (vmlinux): the kernel with
-mstrict-align is 26533376 bytes and the kernel without
-mstrict-align is 26123280 bytes.
- Different performance. For example, in the kernel function jhash(), the
assembly code excerpts with and without -mstrict-align are as follows:
without -mstrict-align:
900000000032736c <jhash>:
900000000032736c: 15bd5b6d lu12i.w $t1, -136485(0xdeadb)
9000000000327370: 03bbbdad ori $t1, $t1, 0xeef
9000000000327374: 001019ad add.w $t1, $t1, $a2
9000000000327378: 001015ae add.w $t2, $t1, $a1
900000000032737c: 0280300c addi.w $t0, $zero, 12(0xc)
9000000000327380: 00150091 move $t5, $a0
9000000000327384: 001501d0 move $t4, $t2
9000000000327388: 001501c4 move $a0, $t2
900000000032738c: 6c009585 bgeu $t0, $a1, 148(0x94) # 9000000000327420 <jhash+0xb4>
9000000000327390: 02803012 addi.w $t6, $zero, 12(0xc)
9000000000327394: 24000a2f ldptr.w $t3, $t5, 8(0x8)
9000000000327398: 2400022d ldptr.w $t1, $t5, 0
900000000032739c: 2400062c ldptr.w $t0, $t5, 4(0x4)
90000000003273a0: 001011e4 add.w $a0, $t3, $a0
90000000003273a4: 001111af sub.w $t3, $t1, $a0
90000000003273a8: 001039ef add.w $t3, $t3, $t2
90000000003273ac: 004cf08e rotri.w $t2, $a0, 0x1c
90000000003273b0: 0010418c add.w $t0, $t0, $t4
...
with -mstrict-align:
90000000003310c0 <jhash>:
90000000003310c0: 15bd5b6f lu12i.w $t3, -136485(0xdeadb)
90000000003310c4: 03bbbdef ori $t3, $t3, 0xeef
90000000003310c8: 001019ef add.w $t3, $t3, $a2
90000000003310cc: 001015e6 add.w $a2, $t3, $a1
90000000003310d0: 0280300d addi.w $t1, $zero, 12(0xc)
90000000003310d4: 0015008c move $t0, $a0
90000000003310d8: 001500d2 move $t6, $a2
90000000003310dc: 001500c4 move $a0, $a2
90000000003310e0: 6c0101a5 bgeu $t1, $a1, 256(0x100) # 90000000003311e0 <jhash+0x120>
90000000003310e4: 02803011 addi.w $t5, $zero, 12(0xc)
90000000003310e8: 2a002589 ld.bu $a5, $t0, 9(0x9)
90000000003310ec: 2a00218d ld.bu $t1, $t0, 8(0x8)
90000000003310f0: 2a002988 ld.bu $a4, $t0, 10(0xa)
90000000003310f4: 2a000587 ld.bu $a3, $t0, 1(0x1)
90000000003310f8: 2a002d8e ld.bu $t2, $t0, 11(0xb)
90000000003310fc: 2a00018b ld.bu $a7, $t0, 0
9000000000331100: 2a000994 ld.bu $t8, $t0, 2(0x2)
9000000000331104: 2a001593 ld.bu $t7, $t0, 5(0x5)
9000000000331108: 2a000d8f ld.bu $t3, $t0, 3(0x3)
900000000033110c: 00412129 slli.d $a5, $a5, 0x8
9000000000331110: 2a00118a ld.bu $a6, $t0, 4(0x4)
9000000000331114: 2a001990 ld.bu $t4, $t0, 6(0x6)
9000000000331118: 00153529 or $a5, $a5, $t1
...
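For readers less familiar with the pattern, the two listings correspond roughly to the two C forms below: a word load from a pointer with no alignment guarantee, and the same value assembled from single bytes. This is a user-space sketch, not the jhash() source; the function names are made up for illustration, and the byte-wise variant assumes little-endian order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Portable unaligned load. Compilers typically lower this memcpy to a
 * single word load when unaligned access is allowed, and to byte loads
 * when it is not (for example under -mstrict-align).
 */
static uint32_t load_u32_memcpy(const unsigned char *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Explicit little-endian byte-wise form: four byte loads, shifts and ors. */
static uint32_t load_u32_bytewise_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

Both functions are safe on any hardware; the difference is only in how many instructions the compiler has to emit for them.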
It is difficult for me to measure the performance difference in a real
kernel path with unaligned-access code, so I used a kernel module with
simple test code to show the difference on a 3A5000, as follows:
C code:
preempt_disable();
start = ktime_get_ns();
for (i = 0; i < n; i++)
assign(p1[i], q1[i]);
end = ktime_get_ns();
preempt_enable();
printk("mstrict-align-test took: %lld nsec\n", end - start);
assembly code without -mstrict-align:
0: 260000ac ldptr.d $t0, $a1, 0
4: 2700008c stptr.d $t0, $a0, 0
8: 4c000020 jirl $zero, $ra, 0
assembly code with -mstrict-align:
0: 2a0000b3 ld.bu $t7, $a1, 0
4: 2a0004b2 ld.bu $t6, $a1, 1(0x1)
8: 2a0008b1 ld.bu $t5, $a1, 2(0x2)
c: 2a000cb0 ld.bu $t4, $a1, 3(0x3)
10: 2a0010af ld.bu $t3, $a1, 4(0x4)
14: 2a0014ae ld.bu $t2, $a1, 5(0x5)
18: 2a0018ad ld.bu $t1, $a1, 6(0x6)
1c: 2a001cac ld.bu $t0, $a1, 7(0x7)
20: 29000093 st.b $t7, $a0, 0
24: 29000492 st.b $t6, $a0, 1(0x1)
28: 29000891 st.b $t5, $a0, 2(0x2)
2c: 29000c90 st.b $t4, $a0, 3(0x3)
30: 2900108f st.b $t3, $a0, 4(0x4)
34: 2900148e st.b $t2, $a0, 5(0x5)
38: 2900188d st.b $t1, $a0, 6(0x6)
3c: 29001c8c st.b $t0, $a0, 7(0x7)
40: 4c000020 jirl $zero, $ra, 0
and the test results (3 runs each) are as follows:
testing the module built without -mstrict-align:
[root@openEuler loongson]# insmod align-test.ko
[ 39.029931] mstrict-align-test took: 29603510 nsec
[root@openEuler loongson]# rmmod align-test.ko
[root@openEuler loongson]# insmod align-test.ko
[ 41.356007] mstrict-align-test took: 28816710 nsec
[root@openEuler loongson]# rmmod align-test.ko
[root@openEuler loongson]# insmod align-test.ko
[ 43.506624] mstrict-align-test took: 30030700 nsec
[root@openEuler loongson]# rmmod align-test.ko
testing the module built with -mstrict-align:
[root@openEuler ~]# insmod align-test.ko
[ 92.656477] mstrict-align-test took: 59629000 nsec
[root@openEuler ~]# rmmod align-test.ko
[root@openEuler ~]# insmod align-test.ko
[ 99.473011] mstrict-align-test took: 58972250 nsec
[root@openEuler ~]# rmmod align-test.ko
[root@openEuler ~]# insmod align-test.ko
[ 104.620103] mstrict-align-test took: 59419260 nsec
[root@openEuler ~]# rmmod align-test.ko
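To put the figures above side by side: the three runs average about 29.48 ms without -mstrict-align and about 59.34 ms with it, roughly a 2.01x slowdown for this copy loop, and the vmlinux size difference is 410096 bytes, about 1.57%. The small helper below only reproduces that arithmetic; the function names are illustrative, and the result says nothing about real kernel workloads.

```c
#include <assert.h>
#include <stdint.h>

/* Integer mean of three samples (nanoseconds), truncated. */
static uint64_t mean3(uint64_t a, uint64_t b, uint64_t c)
{
    return (a + b + c) / 3;
}

/* num/den multiplied by scale, truncated; den must be non-zero. */
static uint64_t scaled_ratio(uint64_t num, uint64_t den, uint64_t scale)
{
    assert(den != 0);
    return num * scale / den;
}
```

With scale 100 the ratio of the two means comes out as 201 (percent), and with scale 10000 the size difference relative to the smaller image comes out as 156 (basis points, truncated).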
Thanks!
Jianmin
>>
>> Signed-off-by: Huacai Chen <[email protected]>
>> ---
>> arch/loongarch/Kconfig | 10 ++++++++++
>> arch/loongarch/Makefile | 2 ++
>> 2 files changed, 12 insertions(+)
>>
>> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
>> index 9cc8b84f7eb0..7470dcfb32f0 100644
>> --- a/arch/loongarch/Kconfig
>> +++ b/arch/loongarch/Kconfig
>> @@ -441,6 +441,16 @@ config ARCH_IOREMAP
>> protection support. However, you can enable LoongArch DMW-based
>> ioremap() for better performance.
>> +config ARCH_STRICT_ALIGN
>> + bool "Enable -mstrict-align to prevent unaligned accesses"
>> + help
>> + Not all LoongArch cores support h/w unaligned access, we can use
>> + -mstrict-align build parameter to prevent unaligned accesses.
>> +
>> + This is disabled by default to optimise for performance, you can
>> + enabled it manually if you want to run kernel on systems without
>> + h/w unaligned access support.
>> +
>> config KEXEC
>> bool "Kexec system call"
>> select KEXEC_CORE
>> diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
>> index 4402387d2755..ccfb52700237 100644
>> --- a/arch/loongarch/Makefile
>> +++ b/arch/loongarch/Makefile
>> @@ -91,10 +91,12 @@ KBUILD_CPPFLAGS += -DVMLINUX_LOAD_ADDRESS=$(load-y)
>> # instead of .eh_frame so we don't discard them.
>> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>> +ifdef CONFIG_ARCH_STRICT_ALIGN
>> # Don't emit unaligned accesses.
>> # Not all LoongArch cores support unaligned access, and as kernel we can't
>> # rely on others to provide emulation for these accesses.
>> KBUILD_CFLAGS += $(call cc-option,-mstrict-align)
>> +endif
>>
>> KBUILD_CFLAGS += -isystem $(shell $(CC) -print-file-name=include)
>
On 2023/2/3 4:46 PM, David Laight wrote:
> From: Huacai Chen
>> Sent: 03 February 2023 02:01
>>
>> Hi, David,
>>
>> On Thu, Feb 2, 2023 at 5:01 PM David Laight <[email protected]> wrote:
>>>
>>> From: Huacai Chen
>>>> Sent: 02 February 2023 08:43
>>>>
>>>> Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
>>>> configurable.
>>>>
>>>> Not all LoongArch cores support h/w unaligned access, we can use the
>>>> -mstrict-align build parameter to prevent unaligned accesses.
>>>>
>>>> This option is disabled by default to optimise for performance, but you
>>>> can enabled it manually if you want to run kernel on systems without h/w
>>>> unaligned access support.
>>>
>>> Should there be an associated run-time check during kernel initialisation
>>> that a kernel compiled without -mstrict-align isn't being run on hardware
>>> that doesn't support unaligned accesses.
>>>
>>> It can be quite a while before you get a compiler-generated misaligned accesses.
>>
>> If we don't use -mstrict-align, the kernel cannot be run on hardware
>> that doesn't support unaligned accesses, so I think the run-time check
>> is useless, and it has no chance to run the checking.
>
> If you don't add the check and someone boots the wrong type of kernel
> then they'll probably get a panic well after booting.
> You really do want a check in the bot code.
>
Agreed, maybe it's reasonable to do the check at the beginning of CPU
probing.
> There is also the question of how userspace is compiled.
> You pretty much don't want to be taking traps to fixup misaligned accesses.
> So the default compiler options better include -mstrict-align.
>
> You should look at -mno-strict-align being a performance option when
> running on known hardware, not a default.
>
> David
>
I think the key point of the patch is to give users a high-performance
kernel on existing and future Loongson CPUs that support unaligned
access (mainly for desktop and server systems, also called *big* CPUs),
which are dominant compared with CPUs that do not support unaligned
access (mainly for customized embedded systems, also called *small*
CPUs). This way, *the vast majority of big CPU users* (desktop and
server OSes) can conveniently use a high-performance kernel without any
extra compile option, while customized embedded systems have to be
supported with an extra compile option. So it seems that we have to
reconcile the default compile option between small and big CPUs, and
sacrifice the convenience of small CPUs.
For some specific differences with and without -mstrict-align, see:
https://lore.kernel.org/all/[email protected]/
Thanks!
Jianmin
>
On Fri, Feb 3, 2023, at 03:08, Huacai Chen wrote:
> On Thu, Feb 2, 2023 at 5:47 PM Arnd Bergmann <[email protected]> wrote:
>>
>> On Thu, Feb 2, 2023, at 09:42, Huacai Chen wrote:
>> > Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
>> > configurable.
>> >
>> > Not all LoongArch cores support h/w unaligned access, we can use the
>> > -mstrict-align build parameter to prevent unaligned accesses.
>> >
>> > This option is disabled by default to optimise for performance, but you
>> > can enabled it manually if you want to run kernel on systems without h/w
>> > unaligned access support.
>> >
>> > Signed-off-by: Huacai Chen <[email protected]>
>>
>> This feels like it's a way too low-level option, I would not expect
>> users to be able to answer this correctly.
>>
>> What I would do instead is to have Kconfig options for specific
>> CPU implementations and derive the alignment requirements from
>> that.
> You mean provide something like CONFIG_CPU_XXXX as MIPS do? That
> seems not a good idea, too. If there are more than 3 CONFIG_CPU_XXXX,
> the complexity is more than CONFIG_ARCH_STRICT_ALIGN.
The way that mips does it is not useful since that forces you to
pick a single CPU from a 'choice' list in Kconfig, with the CPUs
being mutually exclusive in that list. What you need here is either
a strict hierarchy of CPUs like in arch/x86/Kconfig.cpus where each
option is a superset of the previous one, or a set of options
like in arch/arm/mm/Kconfig that are not mutually exclusive and
let you pick any combinations that you want to support in a kernel
image.
The important bit is that a kernel you build will by default
always work across all hardware except the ones that are
explicitly excluded.
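The rule being described, deriving the alignment requirement from the set of CPUs a kernel image is meant to support, can be sketched as a simple predicate. This is an illustration in C rather than Kconfig, and `struct cpu_model`, the model names and `need_strict_align()` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-model description; not taken from any real Kconfig. */
struct cpu_model {
    const char *name;
    bool hw_unaligned;  /* core handles unaligned accesses in hardware */
};

/*
 * A kernel that must run on every model in the list needs strict
 * alignment as soon as one of them lacks hardware unaligned access.
 */
static bool need_strict_align(const struct cpu_model *models, size_t n)
{
    assert(models != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!models[i].hw_unaligned)
            return true;
    }
    return false;
}
```

Under this rule a build that supports both a "big" and a "small" core would be strict-align, and only a build that explicitly excludes the small cores would drop the flag.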
> Then users are also unable to do a correct selection. On the other
> hand, we can add more words under CONFIG_ARCH_STRICT_ALIGN to
> describe which processors support hardware unaligned accesses.
Trying to handle this with help texts quickly gets out of hand
when you get to dozens of CPU specific optimizations that are
incompatible with other CPU cores. I assume you will need similar
options e.g. for the broken cpu-idle instruction on early cores
or the missing sub-word atomics, once these are fixed in new
CPU cores.
Arnd
On Mon, 2023-02-06 at 18:24 +0800, Jianmin Lv wrote:
> Hi, Xuerui
>
> I think the kernels produced with and without -mstrict-align have mainly
> following differences:
> - Diffirent size. I build two kernls (vmlinux), size of kernel with
> -mstrict-align is 26533376 bytes and size of kernel without
> -mstrict-align is 26123280 bytes.
> - Diffirent performance. For example, in kernel function jhash(), the
> assemble code slices with and without -mstrict-align are following:
But there are still questions remaining:
(1) Is the difference caused by bad code generation in GCC? If so,
it's better to improve GCC before someone starts to build a distro
for LA264, as that would benefit user space as well.
(2) Is there some "big bad unaligned access loop" on a hot spot in the
kernel code? If so, it may be better to just refactor the C code,
because doing so will benefit all ports, not only LoongArch. Otherwise,
it may not be worthwhile to optimize for some cold paths.
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
On 2023/2/6 7:18 PM, Xi Ruoyao wrote:
> On Mon, 2023-02-06 at 18:24 +0800, Jianmin Lv wrote:
>> Hi, Xuerui
>>
>> I think the kernels produced with and without -mstrict-align have mainly
>> following differences:
>> - Diffirent size. I build two kernls (vmlinux), size of kernel with
>> -mstrict-align is 26533376 bytes and size of kernel without
>> -mstrict-align is 26123280 bytes.
>> - Diffirent performance. For example, in kernel function jhash(), the
>> assemble code slices with and without -mstrict-align are following:
>
> But there are still questions remaining:
>
> (1) Is the difference contributed by a bad code generation of GCC? If
> true, it's better to improve GCC before someone starts to build a distro
> for LA264 as it would benefit the user space as well.
>
AFAIK, GCC by default (without -mstrict-align) produces binaries that
may use unaligned accesses, to improve user-space performance (smaller
size and higher run-time performance), which is also based on the fact
that the vast majority of LoongArch CPUs support unaligned access.
> (2) Is there some "big bad unaligned access loop" on a hot spot in the
> kernel code? If true, it may be better to just refactor the C code
> because doing so will benefit all ports, not only LoongArch. Otherwise,
> it may be unworthy to optimize for some cold paths.
>
Frankly, I'm not sure whether there is this kind of hot code in the
kernel; I have only seen the difference in kernel size and in the
assembly code excerpts. And I'm afraid that, if such code exists, it may
be difficult to judge whether it is reasonably hot or not.
On Mon, Feb 6, 2023, at 14:13, Jianmin Lv wrote:
> On 2023/2/6 7:18 PM, Xi Ruoyao wrote:
>> On Mon, 2023-02-06 at 18:24 +0800, Jianmin Lv wrote:
>>> Hi, Xuerui
>>>
>>> I think the kernels produced with and without -mstrict-align have mainly
>>> following differences:
>>> - Diffirent size. I build two kernls (vmlinux), size of kernel with
>>> -mstrict-align is 26533376 bytes and size of kernel without
>>> -mstrict-align is 26123280 bytes.
>>> - Diffirent performance. For example, in kernel function jhash(), the
>>> assemble code slices with and without -mstrict-align are following:
>>
>> But there are still questions remaining:
>>
>> (1) Is the difference contributed by a bad code generation of GCC? If
>> true, it's better to improve GCC before someone starts to build a distro
>> for LA264 as it would benefit the user space as well.
>>
> AFAIK, GCC builds to produce unaligned-access-enabled target binary by
> default (without -mstrict-align) for improving user space performance
> (small size and runtime high performance), which is also based the fact
> that the vast majority of LoongArch CPUs support unaligned-access.
>
>> (2) Is there some "big bad unaligned access loop" on a hot spot in the
>> kernel code? If true, it may be better to just refactor the C code
>> because doing so will benefit all ports, not only LoongArch. Otherwise,
>> it may be unworthy to optimize for some cold paths.
>>
> Frankly, I'm not sure if there is this kind of hot code in kernel, I
> just see the difference from different kernel size and different
> assemble code slice. And I'm afraid that it may be difficult to judge
> whether it is reasonable hot code or not if exists.
Just look for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, this will
show you code locations that use different implementations based on
whether the kernel should run on CPUs without unaligned access or
not.
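A typical shape for such code is a helper with a fast path that reads whole words and a fallback that goes byte by byte, selected at build time. The sketch below is a user-space illustration of that pattern, not a copy of any kernel function: `EFFICIENT_UNALIGNED` is a hypothetical stand-in for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, and `mac_equal()` is a made-up name. It uses memcpy for the word reads to stay within standard C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. */
#ifndef EFFICIENT_UNALIGNED
#define EFFICIENT_UNALIGNED 1
#endif

/* Compare two 6-byte addresses that may sit at any byte offset. */
static bool mac_equal(const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL && b != NULL);
#if EFFICIENT_UNALIGNED
    /* Fast path: one 32-bit and one 16-bit comparison. */
    uint32_t a4, b4;
    uint16_t a2, b2;

    memcpy(&a4, a, sizeof(a4));
    memcpy(&b4, b, sizeof(b4));
    memcpy(&a2, a + 4, sizeof(a2));
    memcpy(&b2, b + 4, sizeof(b2));
    return ((a4 ^ b4) | (uint32_t)(a2 ^ b2)) == 0;
#else
    /* Fallback: six single-byte comparisons. */
    for (int i = 0; i < 6; i++) {
        if (a[i] != b[i])
            return false;
    }
    return true;
#endif
}
```

Both branches return the same result; the option only changes which one is compiled in, which is why it has to match what the target CPUs can actually do.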
Arnd
On Mon, 2023-02-06 at 21:13 +0800, Jianmin Lv wrote:
> > (1) Is the difference contributed by a bad code generation of GCC? If
> > true, it's better to improve GCC before someone starts to build a distro
> > for LA264 as it would benefit the user space as well.
> >
> AFAIK, GCC builds to produce unaligned-access-enabled target binary by
> default (without -mstrict-align) for improving user space performance
> (small size and runtime high performance), which is also based the fact
> that the vast majority of LoongArch CPUs support unaligned-access.
I mean: if someone starts to build a distro for a less-capable LoongArch
processor, (s)he will need an entire user space compiled with
-mstrict-align. So it would be better to start preparation now.
And it's likely (s)he will either submit a GCC patch to make GCC
enable/disable -mstrict-align based on the -march= (--with-arch at
configure time) value, or hack GCC to enable -mstrict-align by default
for the distro. So I think we'll also need:
> +ifdef CONFIG_ARCH_STRICT_ALIGN may enable strict align by default.
> # Don't emit unaligned accesses.
> # Not all LoongArch cores support unaligned access, and as kernel we can't
> # rely on others to provide emulation for these accesses.
> KBUILD_CFLAGS += $(call cc-option,-mstrict-align)
+else
+# Distros designed for running on both kinds of processors may disable
+# strict align by default, but the user may want a no-strict-align
+# kernel for his/her specific hardware.
+KBUILD_CFLAGS += $(call cc-option,-mno-strict-align)
> +endif
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
On 2023/2/6 9:22 PM, Arnd Bergmann wrote:
> On Mon, Feb 6, 2023, at 14:13, Jianmin Lv wrote:
>> On 2023/2/6 7:18 PM, Xi Ruoyao wrote:
>>> On Mon, 2023-02-06 at 18:24 +0800, Jianmin Lv wrote:
>>>> Hi, Xuerui
>>>>
>>>> I think the kernels produced with and without -mstrict-align have mainly
>>>> following differences:
>>>> - Different size. I built two kernels (vmlinux): the size of the kernel
>>>> with -mstrict-align is 26533376 bytes and the size of the kernel without
>>>> -mstrict-align is 26123280 bytes.
>>>> - Different performance. For example, in the kernel function jhash(), the
>>>> assembly code slices with and without -mstrict-align are as follows:
>>>
>>> But there are still questions remaining:
>>>
>>> (1) Is the difference contributed by a bad code generation of GCC? If
>>> true, it's better to improve GCC before someone starts to build a distro
>>> for LA264 as it would benefit the user space as well.
>>>
>> AFAIK, GCC builds to produce unaligned-access-enabled target binary by
>> default (without -mstrict-align) for improving user space performance
>> (small size and runtime high performance), which is also based on the fact
>> that the vast majority of LoongArch CPUs support unaligned-access.
>>
>>> (2) Is there some "big bad unaligned access loop" on a hot spot in the
>>> kernel code? If true, it may be better to just refactor the C code
>>> because doing so will benefit all ports, not only LoongArch. Otherwise,
>>> it may be unworthy to optimize for some cold paths.
>>>
>> Frankly, I'm not sure if there is this kind of hot code in kernel, I
>> just see the difference from different kernel size and different
>> assemble code slice. And I'm afraid that it may be difficult to judge
>> whether it is reasonable hot code or not if exists.
>
> Just look for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, this will
> show you code locations that use different implementations based on
> whether the kernel should run on CPUs without unaligned access or
> not.
>
> Arnd
>
Got it, thank you very much. I grepped for
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and found many matches,
including in drivers, lib, net and so on. It seems reasonable to
use the high-performance path for CPUs with HAVE_EFFICIENT_UNALIGNED_ACCESS
configured.
On 2023/2/6 9:30 PM, Xi Ruoyao wrote:
> On Mon, 2023-02-06 at 21:13 +0800, Jianmin Lv wrote:
>>> (1) Is the difference contributed by a bad code generation of GCC? If
>>> true, it's better to improve GCC before someone starts to build a distro
>>> for LA264 as it would benefit the user space as well.
>>>
>> AFAIK, GCC builds to produce unaligned-access-enabled target binary by
>> default (without -mstrict-align) for improving user space performance
>> (small size and runtime high performance), which is also based on the fact
>> that the vast majority of LoongArch CPUs support unaligned-access.
>
> I mean: if someone starts to build a distro for a less-capable LoongArch
> processor, (s)he will need an entire user space compiled with
> -mstrict-align. So it would be better to start preparation now.
>
> And it's likely (s)he will either submit a GCC patch to make GCC
> enable/disable -mstrict-align based on the -march= (--with-arch at
> configure time) value, or hack GCC to enable -mstrict-align by default
> for the distro. So I think we'll also need:
>
>> +ifdef CONFIG_ARCH_STRICT_ALIGN
>> # Don't emit unaligned accesses.
>> # Not all LoongArch cores support unaligned access, and as kernel we can't
>> # rely on others to provide emulation for these accesses.
>> KBUILD_CFLAGS += $(call cc-option,-mstrict-align)
> +else
> +# Distros designed for running on both kinds of processors may enable
> +# strict align by default, but the user may want a no-strict-align
> +# kernel for his/her specific hardware.
> +KBUILD_CFLAGS += $(call cc-option,-mno-strict-align)
>> +endif
>
Thanks, Ruoyao, I think it's a good suggestion. After talking about it
with a GCC colleague, it's very likely that GCC will enable/disable
-mstrict-align based on the -march= value in the future, just as you said.
On 2023/2/6 18:28, Jianmin Lv wrote:
>
>
> On 2023/2/3 4:46 PM, David Laight wrote:
>> From: Huacai Chen
>>> Sent: 03 February 2023 02:01
>>>
>>> Hi, David,
>>>
>>> On Thu, Feb 2, 2023 at 5:01 PM David Laight <[email protected]>
>>> wrote:
>>>>
>>>> From: Huacai Chen
>>>>> Sent: 02 February 2023 08:43
>>>>>
>>>>> Introduce Kconfig option ARCH_STRICT_ALIGN to make -mstrict-align be
>>>>> configurable.
>>>>>
>>>>> Not all LoongArch cores support h/w unaligned access, we can use the
>>>>> -mstrict-align build parameter to prevent unaligned accesses.
>>>>>
>>>>> This option is disabled by default to optimise for performance, but
>>>>> you
>>>>> can enabled it manually if you want to run kernel on systems
>>>>> without h/w
>>>>> unaligned access support.
>>>>
>>>> Should there be an associated run-time check during kernel
>>>> initialisation
>>>> that a kernel compiled without -mstrict-align isn't being run on
>>>> hardware
>>>> that doesn't support unaligned accesses.
>>>>
>>>> It can be quite a while before you get a compiler-generated
>>>> misaligned accesses.
>>>
>>> If we don't use -mstrict-align, the kernel cannot be run on hardware
>>> that doesn't support unaligned accesses, so I think the run-time check
>>> is useless, and it has no chance to run the checking.
>>
>> If you don't add the check and someone boots the wrong type of kernel
>> then they'll probably get a panic well after booting.
>> You really do want a check in the boot code.
>>
> Agree, maybe it's reasonable to check it at the beginning of cpu probe
> stuff.
Yeah I think just performing a deliberate unaligned access very early
would be enough to stop "weaker" CPUs from continuing in this case.
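A minimal sketch of the decision such an early check has to make, written as a pure C function so it can be tested anywhere. It models the outcome only and does not perform the deliberate unaligned access itself. The names boot_action, check_unaligned_support() and both parameters are hypothetical, not existing kernel interfaces.

```c
#include <stdbool.h>

enum boot_action {
	BOOT_CONTINUE,	/* kernel build and CPU are compatible */
	BOOT_PANIC,	/* stop early with a clear message */
};

/*
 * built_strict_align:   true if the kernel was compiled with -mstrict-align.
 * cpu_has_hw_unaligned: true if the CPU handles unaligned accesses in
 *                       hardware.
 * Both inputs are assumptions of this sketch; how they would be obtained
 * at boot is left open.
 */
static enum boot_action check_unaligned_support(bool built_strict_align,
						 bool cpu_has_hw_unaligned)
{
	/*
	 * Only one combination is fatal: a kernel that may emit unaligned
	 * accesses running on a CPU that cannot execute them.
	 */
	if (!built_strict_align && !cpu_has_hw_unaligned)
		return BOOT_PANIC;

	return BOOT_CONTINUE;
}
```

A strict-align kernel is safe on both kinds of CPU, so the check only matters for the non-portable build.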
>
>> There is also the question of how userspace is compiled.
>> You pretty much don't want to be taking traps to fixup misaligned
>> accesses.
>> So the default compiler options better include -mstrict-align.
>>
>> You should look at -mno-strict-align being a performance option when
>> running on known hardware, not a default.
>>
>> David
>>
> I think the key point of the patch is providing users with a high
> performance kernel for existing and future unaligned-access-supported
> Loongson CPUs (mainly for desktop and server systems, also called *big*
> CPUs), which are dominant compared with unaligned-access-unsupported CPUs
> (mainly for customized embedded systems, also called *small* CPUs). In
> this way, we just want to provide *the vast majority of big CPU users*
> (desktop and server OS) with the convenience of directly using a high
> performance kernel without any extra compile option.
Market share and general availability may matter, but again, if you're
considering end users that most likely don't compile their own kernels,
Kconfig default or defconfig may not matter after all: distributions
invariably maintain their own Kconfig. And I think we should follow the
general principle of "least surprises" -- just make the default value
most universal. It's not like those comparatively small number of power
users / developers are not paying attention to the "Emit unaligned
accesses in kernel for performance" config option.
(Yes I've partially changed my mind after seeing Arnd's suggestion that
indeed some optimized codepaths can be enabled if we can know the CPU's
unaligned capability at config time. Now I'm in support of making this
codegen aspect tunable, but I still think keeping the default as-is
would be a better idea. It won't regress or surprise anyone and embedded
people's convenience wouldn't get sacrificed.)
> Instead, for customized embedded system, we have to support them with an extra
> compile option. So, it seems that we have to reconcile default compile
> option between small CPU and big CPU, and sacrifice the convenience of
> small CPU.
>
> For some specific differences with and without -mstrict-align, see:
> https://lore.kernel.org/all/[email protected]/
As someone who's dabbled with compilers I definitely agree the codegen
impact and/or performance benefit could be sizable, after all every
potentially unaligned access must be split into two guaranteed-aligned
insns if we can't rely on the hardware. But again microbenchmarks could
at times translate into real-world gains surprisingly poorly, so I still
think concrete use cases would make a better argument.
But again, since some other known-good optimizations can only be turned
on at config time, like in the network stack, arguably you don't have to
come up with this concrete number any more ;)
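For a feel of what the split looks like at the C level, here is a minimal sketch assuming a little-endian target. The helper names are made up for illustration, and the byte-wise form is just one common lowering; the exact instruction sequence a compiler emits under -mstrict-align may differ.

```c
#include <stdint.h>
#include <string.h>

/*
 * Strict-align style: assemble a 32-bit little-endian value from four
 * byte loads plus shifts. Valid at any address, but costs several
 * instructions per access.
 */
static uint32_t load_le32_bytewise(const unsigned char *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/*
 * One logical load in host byte order. memcpy() keeps this well-defined;
 * when the target allows unaligned accesses, compilers typically lower
 * it to a single load instruction.
 */
static uint32_t load_u32_any(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

On a little-endian machine both helpers return the same value for the same pointer; the difference is purely in instruction count and code size.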
--
WANG "xen0n" Xuerui
Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/
On Tue, Feb 7, 2023, at 06:24, WANG Xuerui wrote:
> (Yes I've partially changed my mind after seeing Arnd's suggestion that
> indeed some optimized codepaths can be enabled if we can know the CPU's
> unaligned capability at config time. Now I'm in support of making this
> codegen aspect tunable, but I still think keeping the default as-is
> would be a better idea. It won't regress or surprise anyone and embedded
> people's convenience wouldn't get sacrificed.)
I agree the default should always be to have a kernel that works on
every machine that has been produced, but this also depends on which
models specifically lack the unaligned access. If it's just about
pre-production silicon that is now all but scrapped, things are different
from a situation where users may actually use them for normal workloads.
Is there an overview of the available loongarch CPU cores that have
been produced so far, and which ones support unaligned access?
Arnd
On 2023/2/7 6:32 PM, Arnd Bergmann wrote:
> I agree the default should always be to have a kernel that works on
> every machine that has been produced, but this also depends on which
> models specifically lack the unaligned access. If it's just about
> pre-production silicon that is now all but scrapped, things are different
> from a situation where users may actually use them for normal workloads.
>
> Is there an overview of the available loongarch CPU cores that have
> been produced so far, and which ones support unaligned access?
So far, the LoongArch-based CPUs that have been produced are 3A5000,
3B5000, 3C5000L, 3C5000, 2K2000, 2K1000LA and 2K0500. Of these, 2K1000LA
and 2K0500 do not support unaligned access; the others do.
On Tue, Feb 7, 2023, at 14:28, Jianmin Lv wrote:
> On 2023/2/7 6:32 PM, Arnd Bergmann wrote:
>> I agree the default should always be to have a kernel that works on
>> every machine that has been produced, but this also depends on which
>> models specifically lack the unaligned access. If it's just about
>> pre-production silicon that is now all but scrapped, things are different
>> from a situation where users may actually use them for normal workloads.
>>
>> Is there an overview of the available loongarch CPU cores that have
>> been produced so far, and which ones support unaligned access?
>
> So far, produced CPUs based LoongArch include 3A5000, 3B5000, 3C5000L,
> 3C5000, 2K2000, 2K1000LA and 2K0500, where 2K1000LA and 2K0500 are
> unaligned-access-unsupported, and others are unaligned-access-supported.
Ok, so these are actually some of the newer (though low-end)
implementations that require the workaround, not the older chips.
In this case, I think both the kernel and toolchain need to default
to -mstrict-align, unless someone specifically asks for the variant
that can support unaligned access. The kernel option could be
guarded by 'depends on EXPERT' to ensure that this is not set by
default.
To be sure that this is set correctly, the
arch/loongarch/kernel/unaligned.c file should also never be included
when EFFICIENT_UNALIGNED_ACCESS is set, to ensure that any attempt
to run such a non-portable kernel on 2K1000LA results in a
kernel panic rather than silently fixing up the unaligned accesses
at a huge performance cost.
Arnd
Hi, Arnd,
On Tue, Feb 7, 2023 at 10:11 PM Arnd Bergmann <[email protected]> wrote:
>
> On Tue, Feb 7, 2023, at 14:28, Jianmin Lv wrote:
> > On 2023/2/7 6:32 PM, Arnd Bergmann wrote:
> >> I agree the default should always be to have a kernel that works on
> >> every machine that has been produced, but this also depends on which
> >> models specifically lack the unaligned access. If it's just about
> >> pre-production silicon that is now all but scrapped, things are different
> >> from a situation where users may actually use them for normal workloads.
> >>
> >> Is there an overview of the available loongarch CPU cores that have
> >> been produced so far, and which ones support unaligned access?
> >
> > So far, produced CPUs based LoongArch include 3A5000, 3B5000, 3C5000L,
> > 3C5000, 2K2000, 2K1000LA and 2K0500, where 2K1000LA and 2K0500 are
> > unaligned-access-unsupported, and others are unaligned-access-supported.
>
> Ok, so these are actually some of the newer (though low-end)
> implementations that require the workaround, not the older chips.
>
> In this case, I think both the kernel and toolchain need to default
> to -mstrict-align, unless someone specifically asks for the variant
> that can support unaligned access. The kernel option could be
> guarded by 'depends on EXPERT' to ensure that this is not set by
> default.
>
> To be sure that this is set correctly, the
> arch/loongarch/kernel/unaligned.c file should also never be included
> when EFFICIENT_UNALIGNED_ACCESS is set, to ensure that any attempt
> to run such a non-portable kernel on 2K1000LA results in a
> kernel panic rather than silently fixing up the unaligned accesses
> at a huge performance cost.
OK, sounds reasonable, I will send V2 for that.
Huacai
>
> Arnd
>