2023-10-18 18:24:51

by Hamza Mahfooz

[permalink] [raw]
Subject: [PATCH] lib/Kconfig.debug: disable FRAME_WARN for kasan and kcsan

With every release of LLVM, both of these sanitizers eat up more and
more of the stack. So, set FRAME_WARN to 0 if either of them is enabled
for a given build.

Cc: [email protected]
Signed-off-by: Hamza Mahfooz <[email protected]>
---
lib/Kconfig.debug | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 39d1d93164bd..15ad742729ca 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -429,11 +429,10 @@ endif # DEBUG_INFO
config FRAME_WARN
int "Warn for stack frames larger than"
range 0 8192
- default 0 if KMSAN
+ default 0 if KASAN || KCSAN || KMSAN
default 2048 if GCC_PLUGIN_LATENT_ENTROPY
default 2048 if PARISC
default 1536 if (!64BIT && XTENSA)
- default 1280 if KASAN && !64BIT
default 1024 if !64BIT
default 2048 if 64BIT
help
--
2.42.0


2023-10-18 18:30:21

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH] lib/Kconfig.debug: disable FRAME_WARN for kasan and kcsan

Hi Hamza,

On Wed, Oct 18, 2023 at 8:24 PM Hamza Mahfooz <[email protected]> wrote:
> With every release of LLVM, both of these sanitizers eat up more and
> more of the stack. So, set FRAME_WARN to 0 if either of them is enabled
> for a given build.
>
> Cc: [email protected]
> Signed-off-by: Hamza Mahfooz <[email protected]>

Thanks for your patch!

> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -429,11 +429,10 @@ endif # DEBUG_INFO
> config FRAME_WARN
> int "Warn for stack frames larger than"
> range 0 8192
> - default 0 if KMSAN
> + default 0 if KASAN || KCSAN || KMSAN

Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable?
Stack overflows do cause crashes.

> default 2048 if GCC_PLUGIN_LATENT_ENTROPY
> default 2048 if PARISC
> default 1536 if (!64BIT && XTENSA)
> - default 1280 if KASAN && !64BIT
> default 1024 if !64BIT
> default 2048 if 64BIT
> help

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-10-18 18:40:06

by Hamza Mahfooz

[permalink] [raw]
Subject: Re: [PATCH] lib/Kconfig.debug: disable FRAME_WARN for kasan and kcsan

On 10/18/23 14:29, Geert Uytterhoeven wrote:
> Hi Hamza,
>
> On Wed, Oct 18, 2023 at 8:24 PM Hamza Mahfooz <[email protected]> wrote:
>> With every release of LLVM, both of these sanitizers eat up more and
>> more of the stack. So, set FRAME_WARN to 0 if either of them is enabled
>> for a given build.
>>
>> Cc: [email protected]
>> Signed-off-by: Hamza Mahfooz <[email protected]>
>
> Thanks for your patch!
>
>> --- a/lib/Kconfig.debug
>> +++ b/lib/Kconfig.debug
>> @@ -429,11 +429,10 @@ endif # DEBUG_INFO
>> config FRAME_WARN
>> int "Warn for stack frames larger than"
>> range 0 8192
>> - default 0 if KMSAN
>> + default 0 if KASAN || KCSAN || KMSAN
>
> Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable?

They are all intended to be used for runtime debugging, so I'd imagine so.

> Stack overflows do cause crashes.

It is worth noting that FRAME_WARN has been disabled for KMSAN for quite
a while and as far as I can tell no one has complained.

>
>> default 2048 if GCC_PLUGIN_LATENT_ENTROPY
>> default 2048 if PARISC
>> default 1536 if (!64BIT && XTENSA)
>> - default 1280 if KASAN && !64BIT
>> default 1024 if !64BIT
>> default 2048 if 64BIT
>> help
>
> Gr{oetje,eeting}s,
>
> Geert
>
--
Hamza

2023-10-18 19:13:03

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH] lib/Kconfig.debug: disable FRAME_WARN for kasan and kcsan

Hi Hamza,

On Wed, Oct 18, 2023 at 8:39 PM Hamza Mahfooz <[email protected]> wrote:
> On 10/18/23 14:29, Geert Uytterhoeven wrote:
> > On Wed, Oct 18, 2023 at 8:24 PM Hamza Mahfooz <[email protected]> wrote:
> >> With every release of LLVM, both of these sanitizers eat up more and
> >> more of the stack. So, set FRAME_WARN to 0 if either of them is enabled
> >> for a given build.
> >>
> >> Cc: [email protected]
> >> Signed-off-by: Hamza Mahfooz <[email protected]>
> >
> > Thanks for your patch!
> >
> >> --- a/lib/Kconfig.debug
> >> +++ b/lib/Kconfig.debug
> >> @@ -429,11 +429,10 @@ endif # DEBUG_INFO
> >> config FRAME_WARN
> >> int "Warn for stack frames larger than"
> >> range 0 8192
> >> - default 0 if KMSAN
> >> + default 0 if KASAN || KCSAN || KMSAN
> >
> > Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable?
>
> They are all intended to be used for runtime debugging, so I'd imagine so.

Then I strongly suggest putting a nonzero value here. As you write
that "with every release of LLVM, both of these sanitizers eat up more and more
of the stack", don't you want to have at least some canary to detect
when "more and more" is guaranteed to run into problems?

> > Stack overflows do cause crashes.
>
> It is worth noting that FRAME_WARN has been disabled for KMSAN for quite
> a while and as far as I can tell no one has complained.

ROTFL...

> >> default 2048 if GCC_PLUGIN_LATENT_ENTROPY
> >> default 2048 if PARISC
> >> default 1536 if (!64BIT && XTENSA)
> >> - default 1280 if KASAN && !64BIT
> >> default 1024 if !64BIT
> >> default 2048 if 64BIT
> >> help

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-10-19 10:05:49

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH] lib/Kconfig.debug: disable FRAME_WARN for kasan and kcsan

> > > Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable?
> >
> > They are all intended to be used for runtime debugging, so I'd imagine so.
>
> Then I strongly suggest putting a nonzero value here. As you write
> that "with every release of LLVM, both of these sanitizers eat up more and more
> of the stack", don't you want to have at least some canary to detect
> when "more and more" is guaranteed to run into problems?

FRAME_WARN is a poor canary. First, it does not necessarily indicate
that a build is faulty (a single bloated stack frame won't crash the
system).
Second, devs are unlikely to fix a function because its stack frame is
too big under some exotic tool+compiler combination.
So the remaining option would be to just increase the frame size every
time a new function surpasses the limit.

2023-10-19 13:46:26

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] lib/Kconfig.debug: disable FRAME_WARN for kasan and kcsan

On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote:
>> > > Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable?
>> >
>> > They are all intended to be used for runtime debugging, so I'd imagine so.
>>
>> Then I strongly suggest putting a nonzero value here. As you write
>> that "with every release of LLVM, both of these sanitizers eat up more and more
>> of the stack", don't you want to have at least some canary to detect
>> when "more and more" is guaranteed to run into problems?
>
> FRAME_WARN is a poor canary. First, it does not necessarily indicate
> that a build is faulty (a single bloated stack frame won't crash the
> system).

I agree it's flawed, but it does catch a lot of bugs, both in the
driver and the compiler. What we should probably have is some better
runtime debugging in addition to FRAME_WARN, but it's better than
nothing.

One idea that I've suggested in the past is to add a soft stack
limit that is lower than THREAD_SIZE, using VMAP_STACK with a custom
stack start and a read-only page at the end to catch a thread
exceeding the soft limit and print a backtrace before marking
the page writable.

> Second, devs are unlikely to fix a function because its stack frame is
> too big under some exotic tool+compiler combination.

I've probably sent hundreds of fixes for these in the past. Most
of the time there is an actual driver bug, and almost always
the driver maintainers are responsive and treat the report with
the appropriate urgency: even if only some configurations actually
push it over the limit, the general case is some data structure that
is hundreds of bytes long and was not actually meant to be on
the stack.

The gcc bug reports also usually get addressed quickly, though
we've had problems with clang not making progress on known
bugs for years. It sounds like Nick has made some important
progress on clang very recently, so we should be able to
raise the minimum clang version for kasan and kcsan once
there is a known good release.

> So the remaining option would be to just increase the frame size every
> time a new function surpasses the limit.

That is clearly not an option, though we could try to
add Kconfig dependencies that avoid the known bad combinations,
such as annotating the AMD GPU driver as

depends on (CC_IS_GCC || CLANG_VERSION >=180000) || !(KASAN || KCSAN)

Arnd

2023-10-19 15:57:33

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH] lib/Kconfig.debug: disable FRAME_WARN for kasan and kcsan

On Thu, Oct 19, 2023 at 02:53:01PM +0200, Arnd Bergmann wrote:
> On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote:
> > So the remaining option would be to just increase the frame size every
> > time a new function surpasses the limit.
>
> That is clearly not an option, though we could try to
> add Kconfig dependencies that avoid the known bad combinations,
> such as annotating the AMD GPU driver as
>
> depends on (CC_IS_GCC || CLANG_VERSION >=180000) || !(KASAN || KCSAN)

This would effectively disable the AMDGPU driver for allmodconfig, which
is somewhat unfortunate as it is an easy testing target.

Taking a step back, this is all being done because of a couple of
warnings in the AMDGPU code. If fixing those in the source is too much
effort (I did note [1] that GCC is at the current limit for that file
even with Rodrigo's series applied [2]), couldn't we just take the
existing workaround that this Makefile has for this file and its high
stack usage and just extend it slightly for clang?

diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index 66431525f2a0..fd49e3526c0d 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -58,7 +58,7 @@ endif
endif

ifneq ($(CONFIG_FRAME_WARN),0)
-frame_warn_flag := -Wframe-larger-than=2048
+frame_warn_flag := -Wframe-larger-than=$(if $(CONFIG_CC_IS_CLANG),3072,2048)
endif

CFLAGS_$(AMDDALPATH)/dc/dml2/display_mode_core.o := $(dml2_ccflags) $(frame_warn_flag)

That would address the immediate concern of the warning breaking builds
with CONFIG_WERROR=y while not raising the limit for other files in the
kernel (just this one file in AMDGPU) and avoiding disabling the whole
driver. The number could be lower, I think ~2500 bytes is the most usage
I see with Rodrigo's series applied, so maybe 2800 would be a decent
limit? Once there is a fix in the compiler, this expression could be
changed to use clang-min-version or something of that sort.

[1]: https://lore.kernel.org/[email protected]/
[2]: https://lore.kernel.org/[email protected]/

Cheers,
Nathan

2023-10-19 20:17:58

by Hamza Mahfooz

[permalink] [raw]
Subject: Re: [PATCH] lib/Kconfig.debug: disable FRAME_WARN for kasan and kcsan

On 10/19/23 11:56, Nathan Chancellor wrote:
> On Thu, Oct 19, 2023 at 02:53:01PM +0200, Arnd Bergmann wrote:
>> On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote:
>>> So the remaining option would be to just increase the frame size every
>>> time a new function surpasses the limit.
>>
>> That is clearly not an option, though we could try to
>> add Kconfig dependencies that avoid the known bad combinations,
>> such as annotating the AMD GPU driver as
>>
>> depends on (CC_IS_GCC || CLANG_VERSION >=180000) || !(KASAN || KCSAN)
>
> This would effectively disable the AMDGPU driver for allmodconfig, which
> is somewhat unfortunate as it is an easy testing target.
>
> Taking a step back, this is all being done because of a couple of
> warnings in the AMDGPU code. If fixing those in the source is too much
> effort (I did note [1] that GCC is at the current limit for that file
> even with Rodrigo's series applied [2]), couldn't we just take the
> existing workaround that this Makefile has for this file and its high
> stack usage and just extend it slightly for clang?

I personally don't mind fixing these issues in the driver, but the fact
that they the creep back every time a new major version of Clang rolls
out (that has been true for the past couple of years at the very
least), makes it rather annoying to deal with.

>
> diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> index 66431525f2a0..fd49e3526c0d 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> @@ -58,7 +58,7 @@ endif
> endif
>
> ifneq ($(CONFIG_FRAME_WARN),0)
> -frame_warn_flag := -Wframe-larger-than=2048
> +frame_warn_flag := -Wframe-larger-than=$(if $(CONFIG_CC_IS_CLANG),3072,2048)
> endif
>
> CFLAGS_$(AMDDALPATH)/dc/dml2/display_mode_core.o := $(dml2_ccflags) $(frame_warn_flag)
>
> That would address the immediate concern of the warning breaking builds
> with CONFIG_WERROR=y while not raising the limit for other files in the
> kernel (just this one file in AMDGPU) and avoiding disabling the whole
> driver. The number could be lower, I think ~2500 bytes is the most usage
> I see with Rodrigo's series applied, so maybe 2800 would be a decent
> limit? Once there is a fix in the compiler, this expression could be
> changed to use clang-min-version or something of that sort.
>
> [1]: https://lore.kernel.org/[email protected]/
> [2]: https://lore.kernel.org/[email protected]/
>
> Cheers,
> Nathan
--
Hamza

2023-10-19 20:52:07

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH] lib/Kconfig.debug: disable FRAME_WARN for kasan and kcsan

On Thu, Oct 19, 2023 at 04:17:26PM -0400, Hamza Mahfooz wrote:
> On 10/19/23 11:56, Nathan Chancellor wrote:
> > On Thu, Oct 19, 2023 at 02:53:01PM +0200, Arnd Bergmann wrote:
> > > On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote:
> > > > So the remaining option would be to just increase the frame size every
> > > > time a new function surpasses the limit.
> > >
> > > That is clearly not an option, though we could try to
> > > add Kconfig dependencies that avoid the known bad combinations,
> > > such as annotating the AMD GPU driver as
> > >
> > > depends on (CC_IS_GCC || CLANG_VERSION >=180000) || !(KASAN || KCSAN)
> >
> > This would effectively disable the AMDGPU driver for allmodconfig, which
> > is somewhat unfortunate as it is an easy testing target.
> >
> > Taking a step back, this is all being done because of a couple of
> > warnings in the AMDGPU code. If fixing those in the source is too much
> > effort (I did note [1] that GCC is at the current limit for that file
> > even with Rodrigo's series applied [2]), couldn't we just take the
> > existing workaround that this Makefile has for this file and its high
> > stack usage and just extend it slightly for clang?
>
> I personally don't mind fixing these issues in the driver, but the fact
> that they the creep back every time a new major version of Clang rolls
> out (that has been true for the past couple of years at the very
> least), makes it rather annoying to deal with.

I am not sure I agree with that characterization of the situation. clang
has been pretty consistent for the most part (which is certainly on us),
as all versions that the kernel supports warns about this code. I
believe it is more so the fact that there is a new copy of the dcn code
added every year that has none of the fixes applied from earlier
generations... It is not just me that has fixed issues like this, just
run 'git log --grep=stack drivers/gpu/drm/amd/display'. It is not just
clang that complains about the code when sanitizers are turned on, GCC
does as well since Stephen reported them.

Cheers,
Nathan

> > diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> > index 66431525f2a0..fd49e3526c0d 100644
> > --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> > +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> > @@ -58,7 +58,7 @@ endif
> > endif
> > ifneq ($(CONFIG_FRAME_WARN),0)
> > -frame_warn_flag := -Wframe-larger-than=2048
> > +frame_warn_flag := -Wframe-larger-than=$(if $(CONFIG_CC_IS_CLANG),3072,2048)
> > endif
> > CFLAGS_$(AMDDALPATH)/dc/dml2/display_mode_core.o := $(dml2_ccflags) $(frame_warn_flag)
> >
> > That would address the immediate concern of the warning breaking builds
> > with CONFIG_WERROR=y while not raising the limit for other files in the
> > kernel (just this one file in AMDGPU) and avoiding disabling the whole
> > driver. The number could be lower, I think ~2500 bytes is the most usage
> > I see with Rodrigo's series applied, so maybe 2800 would be a decent
> > limit? Once there is a fix in the compiler, this expression could be
> > changed to use clang-min-version or something of that sort.
> >
> > [1]: https://lore.kernel.org/[email protected]/
> > [2]: https://lore.kernel.org/[email protected]/
> >
> > Cheers,
> > Nathan
> --
> Hamza
>