2019-06-21 08:59:18

by Mathieu Malaterre

Subject: [PATCH] powerpc/lib/xor_vmx: Relax frame size for clang

When building with clang-8 the frame size limit is hit:

../arch/powerpc/lib/xor_vmx.c:119:6: error: stack frame size of 1200 bytes in function '__xor_altivec_5' [-Werror,-Wframe-larger-than=]

Follow the same approach as commit 9c87156cce5a ("powerpc/xmon: Relax
frame size for clang") until a proper fix is implemented upstream in
clang and relax requirement for clang.

Link: https://github.com/ClangBuiltLinux/linux/issues/563
Cc: Joel Stanley <[email protected]>
Signed-off-by: Mathieu Malaterre <[email protected]>
---
arch/powerpc/lib/Makefile | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index c55f9c27bf79..b3f7d64caaf0 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -58,5 +58,9 @@ obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o

obj-$(CONFIG_ALTIVEC) += xor_vmx.o xor_vmx_glue.o
CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
+ifdef CONFIG_CC_IS_CLANG
+# See https://github.com/ClangBuiltLinux/linux/issues/563
+CFLAGS_xor_vmx.o += -Wframe-larger-than=4096
+endif

obj-$(CONFIG_PPC64) += $(obj64-y)
--
2.20.1


2022-09-07 17:41:20

by Christophe Leroy

Subject: Re: [PATCH] powerpc/lib/xor_vmx: Relax frame size for clang



On 21/06/2019 at 10:58, Mathieu Malaterre wrote:
> When building with clang-8 the frame size limit is hit:
>
> ../arch/powerpc/lib/xor_vmx.c:119:6: error: stack frame size of 1200 bytes in function '__xor_altivec_5' [-Werror,-Wframe-larger-than=]
>
> Follow the same approach as commit 9c87156cce5a ("powerpc/xmon: Relax
> frame size for clang") until a proper fix is implemented upstream in
> clang and relax requirement for clang.

With Clang 14 I get the following errors, but only with KASAN selected.

CC arch/powerpc/lib/xor_vmx.o
arch/powerpc/lib/xor_vmx.c:95:6: error: stack frame size (1040) exceeds
limit (1024) in '__xor_altivec_4' [-Werror,-Wframe-larger-than]
void __xor_altivec_4(unsigned long bytes,
^
arch/powerpc/lib/xor_vmx.c:124:6: error: stack frame size (1312) exceeds
limit (1024) in '__xor_altivec_5' [-Werror,-Wframe-larger-than]
void __xor_altivec_5(unsigned long bytes,
^


Is this patch still relevant ?

Or should frame size be relaxed when KASAN is selected ? After all the
stack size is multiplied by 2 when we have KASAN, so maybe the warning
limit should be increased as well ?

Thanks
Christophe

>
> Link: https://github.com/ClangBuiltLinux/linux/issues/563
> Cc: Joel Stanley <[email protected]>
> Signed-off-by: Mathieu Malaterre <[email protected]>
> ---
> arch/powerpc/lib/Makefile | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> index c55f9c27bf79..b3f7d64caaf0 100644
> --- a/arch/powerpc/lib/Makefile
> +++ b/arch/powerpc/lib/Makefile
> @@ -58,5 +58,9 @@ obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
>
> obj-$(CONFIG_ALTIVEC) += xor_vmx.o xor_vmx_glue.o
> CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
> +ifdef CONFIG_CC_IS_CLANG
> +# See https://github.com/ClangBuiltLinux/linux/issues/563
> +CFLAGS_xor_vmx.o += -Wframe-larger-than=4096
> +endif
>
> obj-$(CONFIG_PPC64) += $(obj64-y)

2022-09-08 00:45:04

by Michael Ellerman

Subject: Re: [PATCH] powerpc/lib/xor_vmx: Relax frame size for clang

Christophe Leroy <[email protected]> writes:
> On 21/06/2019 at 10:58, Mathieu Malaterre wrote:
>> When building with clang-8 the frame size limit is hit:
>>
>> ../arch/powerpc/lib/xor_vmx.c:119:6: error: stack frame size of 1200 bytes in function '__xor_altivec_5' [-Werror,-Wframe-larger-than=]
>>
>> Follow the same approach as commit 9c87156cce5a ("powerpc/xmon: Relax
>> frame size for clang") until a proper fix is implemented upstream in
>> clang and relax requirement for clang.
>
> With Clang 14 I get the following errors, but only with KASAN selected.
>
> CC arch/powerpc/lib/xor_vmx.o
> arch/powerpc/lib/xor_vmx.c:95:6: error: stack frame size (1040) exceeds
> limit (1024) in '__xor_altivec_4' [-Werror,-Wframe-larger-than]
> void __xor_altivec_4(unsigned long bytes,
> ^
> arch/powerpc/lib/xor_vmx.c:124:6: error: stack frame size (1312) exceeds
> limit (1024) in '__xor_altivec_5' [-Werror,-Wframe-larger-than]
> void __xor_altivec_5(unsigned long bytes,
> ^

That's a 32-bit build?

> Is this patch still relevant ?

The clang issue was closed because a different change fixed the issue:

https://github.com/ClangBuiltLinux/linux/issues/563

> Or should frame size be relaxed when KASAN is selected ? After all the
> stack size is multiplied by 2 when we have KASAN, so maybe the warning
> limit should be increased as well ?

Yeah that would make some sense.

On 64-bit the largest frame in that file is 1424, which is below the
default 2048 byte limit.

So maybe just increase it for 32-bit && KASAN.

What would be nice is if the FRAME_WARN value could be calculated as a
percentage of the THREAD_SHIFT, but that's not easily doable with the
way things are structured in Kconfig.
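
As a rough illustration of that idea (outside Kconfig, where it is not easily expressible), the limit could be derived from the stack size like this. The helper name and the permille parameter are hypothetical, and the shift values mentioned below are only illustrative:

```c
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel API: derive a frame-size warning
 * limit as a fraction of the kernel stack. The stack is assumed to be
 * (1 << thread_shift) bytes; the fraction is given in permille.
 */
static size_t frame_warn_from_thread_shift(unsigned int thread_shift,
					   unsigned int permille)
{
	size_t thread_size = (size_t)1 << thread_shift;

	return thread_size * permille / 1000;
}
```

With a shift of 13 and 125 permille this gives 1024 bytes; raising the shift by one (a doubled stack, as with KASAN) gives 2048 bytes for the same fraction.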

cheers

2022-09-08 06:37:53

by Christophe Leroy

Subject: Re: [PATCH] powerpc/lib/xor_vmx: Relax frame size for clang



On 08/09/2022 at 02:27, Michael Ellerman wrote:
> Christophe Leroy <[email protected]> writes:
>> On 21/06/2019 at 10:58, Mathieu Malaterre wrote:
>>> When building with clang-8 the frame size limit is hit:
>>>
>>> ../arch/powerpc/lib/xor_vmx.c:119:6: error: stack frame size of 1200 bytes in function '__xor_altivec_5' [-Werror,-Wframe-larger-than=]
>>>
>>> Follow the same approach as commit 9c87156cce5a ("powerpc/xmon: Relax
>>> frame size for clang") until a proper fix is implemented upstream in
>>> clang and relax requirement for clang.
>>
>> With Clang 14 I get the following errors, but only with KASAN selected.
>>
>> CC arch/powerpc/lib/xor_vmx.o
>> arch/powerpc/lib/xor_vmx.c:95:6: error: stack frame size (1040) exceeds
>> limit (1024) in '__xor_altivec_4' [-Werror,-Wframe-larger-than]
>> void __xor_altivec_4(unsigned long bytes,
>> ^
>> arch/powerpc/lib/xor_vmx.c:124:6: error: stack frame size (1312) exceeds
>> limit (1024) in '__xor_altivec_5' [-Werror,-Wframe-larger-than]
>> void __xor_altivec_5(unsigned long bytes,
>> ^
>
> That's a 32-bit build?

Yes, pmac32_defconfig

>
>> Is this patch still relevant ?
>
> The clang issue was closed because a different change fixed the issue:
>
> https://github.com/ClangBuiltLinux/linux/issues/563
>
>> Or should frame size be relaxed when KASAN is selected ? After all the
>> stack size is multiplied by 2 when we have KASAN, so maybe the warning
>> limit should be increased as well ?
>
> Yeah that would make some sense.
>
> On 64-bit the largest frame in that file is 1424, which is below the
> default 2048 byte limit.
>
> So maybe just increase it for 32-bit && KASAN.
>
> What would be nice is if the FRAME_WARN value could be calculated as a
> percentage of the THREAD_SHIFT, but that's not easily doable with the
> way things are structured in Kconfig.
>

Looking at it more deeply, I see strange things.

What is that frame size? I thought it was the number of bytes by which
r1 is decremented at the beginning of the function, but it seems not, at
least with GCC. It seems GCC subtracts 112 bytes while clang doesn't.

I set CONFIG_FRAME_WARN to 8 and, with GCC and without KASAN, I get no
warning, although I have:

00000000 <__xor_altivec_2>:
0: 94 21 ff f0 stwu r1,-16(r1)
00000078 <__xor_altivec_3>:
78: 94 21 ff f0 stwu r1,-16(r1)
0000010c <__xor_altivec_4>:
10c: 94 21 ff f0 stwu r1,-16(r1)
000001c4 <__xor_altivec_5>:
1c4: 94 21 ff e0 stwu r1,-32(r1)

With GCC and inline KASAN I get:

arch/powerpc/lib/xor_vmx.c: In function '__xor_altivec_2':
arch/powerpc/lib/xor_vmx.c:69:1: warning: the frame size of 96 bytes is
larger than 8 bytes [-Wframe-larger-than=]
arch/powerpc/lib/xor_vmx.c: In function '__xor_altivec_3':
arch/powerpc/lib/xor_vmx.c:93:1: warning: the frame size of 128 bytes is
larger than 8 bytes [-Wframe-larger-than=]
arch/powerpc/lib/xor_vmx.c: In function '__xor_altivec_4':
arch/powerpc/lib/xor_vmx.c:122:1: warning: the frame size of 80 bytes is
larger than 8 bytes [-Wframe-larger-than=]
arch/powerpc/lib/xor_vmx.c: In function '__xor_altivec_5':
arch/powerpc/lib/xor_vmx.c:156:1: warning: the frame size of 128 bytes
is larger than 8 bytes [-Wframe-larger-than=]

00000000 <__xor_altivec_2>:
0: 94 21 ff 30 stwu r1,-208(r1)
00000458 <__xor_altivec_3>:
458: 94 21 ff 00 stwu r1,-256(r1)
00000b94 <__xor_altivec_4>:
b94: 94 21 fe b0 stwu r1,-336(r1)
000015b8 <__xor_altivec_5>:
15b8: 94 21 fe 60 stwu r1,-416(r1)

With CLANG and without KASAN I get:

CC arch/powerpc/lib/xor_vmx.o
arch/powerpc/lib/xor_vmx.c:52:6: warning: stack frame size (144) exceeds
limit (8) in '__xor_altivec_2' [-Wframe-larger-than]
void __xor_altivec_2(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:71:6: warning: stack frame size (144) exceeds
limit (8) in '__xor_altivec_3' [-Wframe-larger-than]
void __xor_altivec_3(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:95:6: warning: stack frame size (160) exceeds
limit (8) in '__xor_altivec_4' [-Wframe-larger-than]
void __xor_altivec_4(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:124:6: warning: stack frame size (144)
exceeds limit (8) in '__xor_altivec_5' [-Wframe-larger-than]
void __xor_altivec_5(unsigned long bytes,

00000000 <__xor_altivec_2>:
0: 94 21 ff 70 stwu r1,-144(r1)
00000528 <__xor_altivec_3>:
528: 94 21 ff 70 stwu r1,-144(r1)
00000c4c <__xor_altivec_4>:
c4c: 94 21 ff 60 stwu r1,-160(r1)
000015a4 <__xor_altivec_5>:
15a4: 94 21 ff 70 stwu r1,-144(r1)

With CLANG and with inline KASAN I get:

arch/powerpc/lib/xor_vmx.c:52:6: warning: stack frame size (512) exceeds
limit (8) in '__xor_altivec_2' [-Wframe-larger-than]
void __xor_altivec_2(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:71:6: warning: stack frame size (768) exceeds
limit (8) in '__xor_altivec_3' [-Wframe-larger-than]
void __xor_altivec_3(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:95:6: warning: stack frame size (1040)
exceeds limit (8) in '__xor_altivec_4' [-Wframe-larger-than]
void __xor_altivec_4(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:124:6: warning: stack frame size (1312)
exceeds limit (8) in '__xor_altivec_5' [-Wframe-larger-than]
void __xor_altivec_5(unsigned long bytes,

00000000 <__xor_altivec_2>:
8: 94 21 fe 00 stwu r1,-512(r1)
00000a24 <__xor_altivec_3>:
a2c: 94 21 fd 00 stwu r1,-768(r1)
000019a4 <__xor_altivec_4>:
19ac: 94 21 fb f0 stwu r1,-1040(r1)
00002f20 <__xor_altivec_5>:
2f28: 94 21 fa e0 stwu r1,-1312(r1)


So it seems that GCC and clang don't warn about the same thing. Is that
expected? GCC subtracts 112 bytes, which is the minimum frame size on
ppc64, but here I'm building a ppc32 kernel, where the minimum frame
size is 16.

And clang still uses a lot more stack than GCC.

Christophe
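
For readers following the dumps above: the frame size can be read directly from the four instruction bytes of each stwu. Below is a minimal sketch, assuming the bytes are taken in the big-endian order shown by the disassembly; the function name is made up for this illustration.

```c
#include <stdint.h>

/*
 * Illustrative decoder: return the number of bytes allocated by a
 * "stwu r1,-N(r1)" instruction word, or -1 if the word is not a stwu
 * using r1 as both source and base with a negative displacement.
 */
static int stwu_r1_frame_bytes(uint32_t insn)
{
	/* Primary opcode 37 (stwu), RS = r1, RA = r1 form the top 16 bits. */
	if ((insn >> 16) != 0x9421)
		return -1;

	/* The low 16 bits hold a signed displacement; sign-extend it. */
	int32_t disp = (int32_t)(insn & 0xffff);
	if (disp & 0x8000)
		disp -= 0x10000;

	return disp < 0 ? (int)-disp : -1;
}
```

For example, "94 21 ff 30" decodes to 208 and "94 21 fa e0" to 1312. Comparing the GCC inline-KASAN dump with the sizes GCC reports gives gaps of 112, 128, 256 and 288 bytes for __xor_altivec_2 to __xor_altivec_5, while the sizes clang reports equal the stwu values.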

2022-09-08 14:21:42

by Segher Boessenkool

Subject: Re: [PATCH] powerpc/lib/xor_vmx: Relax frame size for clang

On Thu, Sep 08, 2022 at 06:00:24AM +0000, Christophe Leroy wrote:
> Looking at it more deeply, I see strange things.

I'll have to see full generated machine code to be able to see strange
things, there isn't enough information at all here yet. Sorry.

Use private mail if it is too big or uninteresting for the list :-)

> What is that frame size? I thought it was the number of bytes by which
> r1 is decremented at the beginning of the function, but it seems not, at
> least with GCC. It seems GCC subtracts 112 bytes while clang doesn't.

That is the vars size + the fixed size + the size of the parameter
save area + the size of the regs save area, rounded up to a multiple
of 16. Fixed size is 8 on 32-bit PowerPC ELF. Frame size used by GCC
here is just the vars size.
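
The formula above can be sketched as follows. This is only an illustration of the arithmetic described in this message, not code taken from GCC; the function and macro names are made up.

```c
#include <stddef.h>

#define PPC32_FIXED_SIZE 8	/* fixed size on 32-bit PowerPC ELF, per the text above */
#define FRAME_ALIGN 16		/* total frame is rounded up to a multiple of 16 */

/*
 * Illustrative only: total frame size as the sum of the local variable
 * area, the fixed area, the parameter save area and the register save
 * area, rounded up to the next multiple of 16.
 */
static size_t total_frame_size(size_t vars, size_t param_save,
			       size_t regs_save)
{
	size_t raw = vars + PPC32_FIXED_SIZE + param_save + regs_save;

	return (raw + FRAME_ALIGN - 1) & ~(size_t)(FRAME_ALIGN - 1);
}
```

With no variables and nothing saved, this yields 16, the minimum frame size on ppc32. The value -Wframe-larger-than compares against in GCC would correspond to the vars argument alone.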

> So it seems that GCC and clang don't warn about the same thing. Is that
> expected? GCC subtracts 112 bytes, which is the minimum frame size on
> ppc64, but here I'm building a ppc32 kernel, where the minimum frame
> size is 16.

I need to see the generated code to make sense of what is happening
here. It sounds like it is doing varargs calls or similar expensive
stack juggling. Or just saving a boatload of registers on the stack.

> And CLANG is still using stack a lot more than GCC.

Good to hear! Well, good for GCC, anyway ;-)


Segher

2022-09-08 15:16:08

by Arnd Bergmann

Subject: Re: [PATCH] powerpc/lib/xor_vmx: Relax frame size for clang

On Thu, Sep 8, 2022, at 2:27 AM, Michael Ellerman wrote:
> Christophe Leroy <[email protected]> writes:
>
> Yeah that would make some sense.
>
> On 64-bit the largest frame in that file is 1424, which is below the
> default 2048 byte limit.
>
> So maybe just increase it for 32-bit && KASAN.
>
> What would be nice is if the FRAME_WARN value could be calculated as a
> percentage of the THREAD_SHIFT, but that's not easily doable with the
> way things are structured in Kconfig.
>

Increasing the warning limit slightly for 32-bit with
CONFIG_KASAN_STACK makes sense, but there are a lot of
related concerns:

- I was hoping to still stay under 1280 bytes for the warning
limit, so that even with KASAN_STACK enabled, we are able to
catch warnings in functions that use a stupid amount of
local variables, without getting too many false positives.

- if the XOR code has its frame size explode like this, it's
probably an indication of the compiler doing something wrong,
not the kernel code. The result is likely that the "optimized"
XOR implementation is slower than the default version as a
result, and the kernel will pick the other one at boot time.
This needs to be confirmed of course, but an easier workaround
for this instance might be to just disable the xor_vmx module
when KASAN_STACK is set.

- The warning limit on 32-bit is actually 2048 bytes when
GCC_PLUGIN_LATENT_ENTROPY is set. I think this is a mistake
and we should lower /that/ limit instead, but a side-effect
here is that an allmodconfig kernel build with gcc will fail
to warn about bugs that exist both with gcc and clang, while
clang complains about it.
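
To illustrate the second point, here is a simplified sketch of how a boot-time selection between XOR implementations could work once each candidate has been benchmarked. The structure and function names are hypothetical and this is not the kernel's actual calibration code:

```c
#include <stddef.h>

/* Hypothetical record for one benchmarked XOR implementation. */
struct xor_candidate {
	const char *name;
	unsigned long speed;	/* measured throughput, arbitrary units */
};

/*
 * Return the candidate with the highest measured speed, or NULL when
 * the list is empty. The first candidate wins ties.
 */
static const struct xor_candidate *
pick_fastest(const struct xor_candidate *c, size_t n)
{
	const struct xor_candidate *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || c[i].speed > best->speed)
			best = &c[i];

	return best;
}
```

If heavy stack spilling makes the vectorized version measure slower than the generic one, a selection like this would simply not pick it.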

Arnd

2022-09-08 22:55:26

by Segher Boessenkool

Subject: Re: [PATCH] powerpc/lib/xor_vmx: Relax frame size for clang

Hi!

On Thu, Sep 08, 2022 at 05:07:24PM +0200, Arnd Bergmann wrote:
> - if the XOR code has its frame size explode like this, it's
> probably an indication of the compiler doing something wrong,
> not the kernel code.

On the contrary, it is most likely an indication that the kernel code
wants something unreasonable. Like, having 20 variables live at the
same time, but still wanting nicely scheduled machine code generated.

But I suspect GCC unrolled the loops here, even? Best way to prevent
that here is to put an option in the Makefile, for these files. We
don't want any of this unrolled after all? Or, alternatively, remove
all the manual unrolling from this code, let GCC do its thing, without
painting it in a corner.
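
For illustration, a two-source XOR without any manual unrolling could look like the sketch below. It uses plain unsigned long words rather than AltiVec vector types, so it is not the code in xor_vmx.c, and the function name is made up:

```c
#include <stddef.h>

/*
 * Illustrative only: XOR the buffer p2 into p1, one word per iteration,
 * leaving unrolling and scheduling decisions to the compiler. The
 * buffers must not overlap, and bytes is assumed to be a multiple of
 * sizeof(unsigned long).
 */
static void xor_words_2(size_t bytes, unsigned long *restrict p1,
			const unsigned long *restrict p2)
{
	size_t words = bytes / sizeof(unsigned long);

	for (size_t i = 0; i < words; i++)
		p1[i] ^= p2[i];
}
```

A loop in this form keeps only a few values live at a time, which is the property that keeps register pressure, and hence stack spilling, low.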

> The result is likely that the "optimized"
> XOR implementation is slower than the default version as a
> result, and the kernel will pick the other one at boot time.

Yes. So it's self-healing even, of a sort :-)


Segher

2022-09-09 05:25:48

by Christophe Leroy

Subject: Re: [PATCH] powerpc/lib/xor_vmx: Relax frame size for clang



On 08/09/2022 at 15:48, Segher Boessenkool wrote:
> On Thu, Sep 08, 2022 at 06:00:24AM +0000, Christophe Leroy wrote:
>> Looking at it more deeply, I see strange things.
>
> I'll have to see full generated machine code to be able to see strange
> things, there isn't enough information at all here yet. Sorry.

Well, what I call strange is the fact that with GCC the number of bytes
reported by -Wframe-larger-than doesn't match the offset used by the
stwu at the start of the function, while it does with clang.

>
> Use private mail if it is too big or uninteresting for the list :-)
>
>> What is that frame size? I thought it was the number of bytes by which
>> r1 is decremented at the beginning of the function, but it seems not, at
>> least with GCC. It seems GCC subtracts 112 bytes while clang doesn't.
>
> That is the vars size + the fixed size + the size of the parameter
> save area + the size of the regs save area, rounded up to a multiple
> of 16. Fixed size is 8 on 32-bit PowerPC ELF. Frame size used by GCC
> here is just the vars size.

Ok, so it means that the stack utilisation is underestimated when using
GCC ? Or is it clang that overestimates it ?

>
>> So it seems that GCC and clang don't warn about the same thing. Is that
>> expected? GCC subtracts 112 bytes, which is the minimum frame size on
>> ppc64, but here I'm building a ppc32 kernel, where the minimum frame
>> size is 16.
>
> I need to see the generated code to make sense of what is happening
> here. It sounds like it is doing varargs calls or similar expensive
> stack juggling. Or just saving a boatload of registers on the stack.
>

Ok, I'll send it to you. But once again, I don't mind what the code
really looks like, I'm just worried that GCC doesn't report the entire
stack usage.


Christophe